add --count and --batch args for data_export.py #522
base: main
Conversation
Thanks @jiangyinzuo. What is the actual use case for adding these parameters? The idea of …
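For context, a minimal sketch of what the two flags named in the PR title could look like in data_export.py. The flag names come from the title; the types, defaults, and help text below are assumptions, not the actual patch.

```python
# Hypothetical sketch only: types, defaults, and help text are guesses.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--count", type=int, default=None,
                    help="only export results computed for this neighbor count")
parser.add_argument("--batch", action="store_true",
                    help="only export results produced in batch mode")
args = parser.parse_args()

# The parsed values would then presumably be forwarded to load_all_results(...),
# which passes them down to build_result_filepath(...).
```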
@@ -84,7 +84,7 @@ def load_all_results(dataset: Optional[str] = None,
     Yields:
         tuple: A tuple containing properties as a dictionary and an h5py file object.
     """
-    for root, _, files in os.walk(build_result_filepath(dataset, count)):
+    for root, _, files in os.walk(build_result_filepath(dataset, count, batch_mode=batch_mode)):
If data_export.py is expected to export all the results that are found, then build_result_filepath(...) here should return all the sub-directories. However, look at build_result_filepath(...) in ann-benchmarks/ann_benchmarks/results.py (lines 11 to 38 in 3982de0):
def build_result_filepath(dataset_name: Optional[str] = None,
                          count: Optional[int] = None,
                          definition: Optional[Definition] = None,
                          query_arguments: Optional[Any] = None,
                          batch_mode: bool = False) -> str:
    """
    Constructs the filepath for storing the results.

    Args:
        dataset_name (str, optional): The name of the dataset.
        count (int, optional): The count of records.
        definition (Definition, optional): The definition of the algorithm.
        query_arguments (Any, optional): Additional arguments for the query.
        batch_mode (bool, optional): If True, the batch mode is activated.

    Returns:
        str: The constructed filepath.
    """
    d = ["results"]
    if dataset_name:
        d.append(dataset_name)
    if count:
        d.append(str(count))
    if definition:
        d.append(definition.algorithm + ("-batch" if batch_mode else ""))
        data = definition.arguments + query_arguments
        d.append(re.sub(r"\W+", "_", json.dumps(data, sort_keys=True)).strip("_") + ".hdf5")
    return os.path.join(*d)
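As a small illustration of why this matters (not from the repo; the dataset name and POSIX path separators are only for the example), dropping count collapses the path that load_all_results walks, and batch_mode only affects the path once a definition is given:

```python
from ann_benchmarks.results import build_result_filepath

print(build_result_filepath("glove-100-angular", 10))    # results/glove-100-angular/10
print(build_result_filepath("glove-100-angular", None))  # results/glove-100-angular
print(build_result_filepath(None, None))                 # results
```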
If count and batch_mode are None, the directories for specific counts and for batch mode are ignored. So if we expect data_export.py to export all the results, including every count argument, batch mode, and non-batch mode, maybe we should implement a new function for load_all_results(...)?
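One possible shape for such a helper, as a hedged sketch (the function name is invented and error handling is omitted): walk everything under results/<dataset> and yield every stored result, regardless of which count or batch directory it sits in.

```python
import os
from typing import Optional

import h5py


def load_all_results_any_mode(dataset: Optional[str] = None):
    """Yield (properties, h5py.File) for every result file, ignoring count/batch dirs."""
    root_dir = os.path.join("results", dataset) if dataset else "results"
    for root, _, files in os.walk(root_dir):
        for filename in files:
            if not filename.endswith(".hdf5"):
                continue
            with h5py.File(os.path.join(root, filename), "r+") as f:
                yield dict(f.attrs), f
```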
Good point. I think the problem is that this part of ann-benchmarks/ann_benchmarks/results.py (lines 90 to 97 in 3982de0):
                continue
            try:
                with h5py.File(os.path.join(root, filename), "r+") as f:
                    properties = dict(f.attrs)
                    if batch_mode != properties["batch_mode"]:
                        continue
                    yield properties, f
            except Exception:
doesn't differentiate between batch_mode being True or False and batch_mode being None. Since definition is None in the part above, it should try to load batch results, non-batch results, and every count value.
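A hedged sketch of that None-vs-bool distinction, assuming the batch_mode parameter of load_all_results were typed as Optional[bool]; the helper name is invented for illustration:

```python
from typing import Optional


def batch_mode_matches(requested: Optional[bool], stored: bool) -> bool:
    """None means 'no preference': accept both batch and non-batch results."""
    return requested is None or requested == bool(stored)


# The filter inside load_all_results(...) could then read:
#     if not batch_mode_matches(batch_mode, properties["batch_mode"]):
#         continue
assert batch_mode_matches(None, True) and batch_mode_matches(None, False)
assert batch_mode_matches(True, True) and not batch_mode_matches(True, False)
```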
I decided to mark this PR as a draft until it is ready.
Also convert X to numpy.array in the glove dataset function.
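A rough sketch of that conversion (the variable names and parsing are illustrative; the real glove loader in ann_benchmarks/datasets.py reads the downloaded embedding file):

```python
import numpy

# Stand-in for the parsed GloVe text: one token followed by its vector per line.
lines = ["the 0.1 0.2 0.3", "of 0.4 0.5 0.6"]
X = [[float(t) for t in line.split()[1:]] for line in lines]
X = numpy.array(X)  # the added step: hand downstream code an ndarray, not a list of lists
```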