add --count and --batch args for data_export.py #522
base: main
Conversation
Thanks @jiangyinzuo. What is the actual use case for adding these parameters? The idea of …
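For context, a minimal sketch of what the two flags named in the PR title could look like in data_export.py. The flag names come from the title; the types, defaults, and help text below are assumptions, not the actual patch.

```python
# Hypothetical sketch only: types, defaults, and help text are guesses.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--count", type=int, default=None,
                    help="only export results computed for this neighbor count")
parser.add_argument("--batch", action="store_true",
                    help="only export results produced in batch mode")
args = parser.parse_args()

# The parsed values would then presumably be forwarded to load_all_results(...),
# which passes them down to build_result_filepath(...).
```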
@@ -84,7 +84,7 @@ def load_all_results(dataset: Optional[str] = None,
     Yields:
         tuple: A tuple containing properties as a dictionary and an h5py file object.
     """
-    for root, _, files in os.walk(build_result_filepath(dataset, count)):
+    for root, _, files in os.walk(build_result_filepath(dataset, count, batch_mode=batch_mode)):
If data_export.py is expected to export all the results that are found, then build_result_filepath(...) here should return all the sub-directories. However, look at build_result_filepath(...) in ann-benchmarks/ann_benchmarks/results.py (lines 11 to 38 in 3982de0):
def build_result_filepath(dataset_name: Optional[str] = None,
                          count: Optional[int] = None,
                          definition: Optional[Definition] = None,
                          query_arguments: Optional[Any] = None,
                          batch_mode: bool = False) -> str:
    """
    Constructs the filepath for storing the results.

    Args:
        dataset_name (str, optional): The name of the dataset.
        count (int, optional): The count of records.
        definition (Definition, optional): The definition of the algorithm.
        query_arguments (Any, optional): Additional arguments for the query.
        batch_mode (bool, optional): If True, the batch mode is activated.

    Returns:
        str: The constructed filepath.
    """
    d = ["results"]
    if dataset_name:
        d.append(dataset_name)
    if count:
        d.append(str(count))
    if definition:
        d.append(definition.algorithm + ("-batch" if batch_mode else ""))
        data = definition.arguments + query_arguments
        d.append(re.sub(r"\W+", "_", json.dumps(data, sort_keys=True)).strip("_") + ".hdf5")
    return os.path.join(*d)
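As a small illustration of why this matters (not from the repo; the dataset name and POSIX path separators are only for the example), dropping count collapses the path that load_all_results walks, and batch_mode only affects the path once a definition is given:

```python
from ann_benchmarks.results import build_result_filepath

print(build_result_filepath("glove-100-angular", 10))    # results/glove-100-angular/10
print(build_result_filepath("glove-100-angular", None))  # results/glove-100-angular
print(build_result_filepath(None, None))                 # results
```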
If count and batch_mode are None, the directories for specific counts and for batch mode are ignored. So if we expect data_export.py to export all the results, including every count argument, batch mode, and non-batch mode, maybe we should implement a new function for load_all_results(...)?
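One possible shape for such a helper, as a hedged sketch (the function name is invented and error handling is omitted): walk everything under results/<dataset> and yield every stored result, regardless of which count or batch directory it sits in.

```python
import os
from typing import Optional

import h5py


def load_all_results_any_mode(dataset: Optional[str] = None):
    """Yield (properties, h5py.File) for every result file, ignoring count/batch dirs."""
    root_dir = os.path.join("results", dataset) if dataset else "results"
    for root, _, files in os.walk(root_dir):
        for filename in files:
            if not filename.endswith(".hdf5"):
                continue
            with h5py.File(os.path.join(root, filename), "r+") as f:
                yield dict(f.attrs), f
```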
Good point. I think the problem is that this part of ann-benchmarks/ann_benchmarks/results.py (lines 90 to 97 in 3982de0):
                continue
            try:
                with h5py.File(os.path.join(root, filename), "r+") as f:
                    properties = dict(f.attrs)
                    if batch_mode != properties["batch_mode"]:
                        continue
                    yield properties, f
            except Exception:
doesn't differentiate between batch_mode being True or False and batch_mode being None. Since definition is None in the part above, it should try to load batch results, non-batch results, and every count value.
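A hedged sketch of that None-vs-bool distinction, assuming the batch_mode parameter of load_all_results were typed as Optional[bool]; the helper name is invented for illustration:

```python
from typing import Optional


def batch_mode_matches(requested: Optional[bool], stored: bool) -> bool:
    """None means 'no preference': accept both batch and non-batch results."""
    return requested is None or requested == bool(stored)


# The filter inside load_all_results(...) could then read:
#     if not batch_mode_matches(batch_mode, properties["batch_mode"]):
#         continue
assert batch_mode_matches(None, True) and batch_mode_matches(None, False)
assert batch_mode_matches(True, True) and not batch_mode_matches(True, False)
```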
I decided to mark this PR as a draft until it is ready.
Also convert X to numpy.array in the glove dataset function.
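A rough sketch of that conversion (the variable names and parsing are illustrative; the real glove loader in ann_benchmarks/datasets.py reads the downloaded embedding file):

```python
import numpy

# Stand-in for the parsed GloVe text: one token followed by its vector per line.
lines = ["the 0.1 0.2 0.3", "of 0.4 0.5 0.6"]
X = [[float(t) for t in line.split()[1:]] for line in lines]
X = numpy.array(X)  # the added step: hand downstream code an ndarray, not a list of lists
```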