Index checkpoint files incrementally #27

Open
bencottier opened this issue Jun 18, 2021 · 1 comment
Labels
enhancement New feature or request question Further information is requested

Comments

@bencottier
Contributor

`index_checkpoint_files` walks a given path to find checkpoint files and organises the segments of each checkpoint path into tags.

One might want to call this many times on the same top-level checkpoint directory, to analyse checkpoint data while the program is still running and new checkpoints are being added. For example, checkpoints might be written at regular time intervals, with the timestamp used as a tag.

If there are a lot of checkpoint files (e.g. hundreds), re-walking the whole tree on every call is wasteful. One could index only a subdirectory of the top-level checkpoint directory, but then not all of the tags would be found, because the tags are taken from the path.

Is there a way to update the checkpoint index incrementally, based on diffs in the file tree? For example, if I reindex at every timestep, it would search only the checkpoints for that timestep and add them to an existing index, while still knowing all of the tags.
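To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not the library's implementation: `update_index`, `tags_for` and `CHECKPOINT_SUFFIX` are hypothetical names, and it assumes the tags are simply the directory names between the top-level directory and the checkpoint file.

```python
import os
from pathlib import Path

# Assumption: checkpoint files are recognised by this suffix.
CHECKPOINT_SUFFIX = ".pt"


def tags_for(path, top):
    """Return the directory segments between `top` and the file as tags."""
    return Path(path).relative_to(top).parts[:-1]


def update_index(index, top, subdir=None):
    """Add checkpoints found under `top/subdir` to `index` in place.

    Only `subdir` is walked, but tags are computed relative to `top`,
    so every path segment is still seen. With `subdir=None` the whole
    tree is walked, which is the non-incremental case.
    """
    top = Path(top)
    root = top if subdir is None else top / subdir
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(CHECKPOINT_SUFFIX):
                path = Path(dirpath) / name
                key = path.relative_to(top).as_posix()
                index[key] = tags_for(path, top)
    return index
```

Calling `update_index(index, top, "run1/t1")` after each timestep would then touch only the new subdirectory while keeping the entries already in `index`.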

@bencottier bencottier added enhancement New feature or request question Further information is requested labels Jun 18, 2021
@bencottier
Contributor Author

bencottier commented Jun 18, 2021

Of course, as soon as I wrote this... is it much simpler than I thought? As long as you give the full path starting from the top-level directory, even if it points to a subdirectory, will it be able to read all the tags in that path?

I think the problem with that is that you need to know the structure of the path: the ordering and values of the tags. In my use case I don't have information that specific, and I want to query what the tags are.
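For the querying side, something like the following is what I mean by asking what the tags are without knowing the path layout in advance. This is only a sketch under assumptions: `tag_values_by_depth` is a hypothetical helper, and the index is assumed to be a plain dict mapping each checkpoint path to a tuple of tags, which may not match the library's real data structure.

```python
from collections import defaultdict


def tag_values_by_depth(index):
    """Collect the distinct tag values seen at each path depth."""
    values = defaultdict(set)
    for tags in index.values():
        for depth, tag in enumerate(tags):
            values[depth].add(tag)
    # Sort for a stable, readable result.
    return {depth: sorted(found) for depth, found in sorted(values.items())}


# Hypothetical index with two timesteps under one run.
example_index = {
    "run1/t0/model.pt": ("run1", "t0"),
    "run1/t1/model.pt": ("run1", "t1"),
}
print(tag_values_by_depth(example_index))
# {0: ['run1'], 1: ['t0', 't1']}
```

Because the result is built from the accumulated index, it would still report every tag after an incremental update of a single subdirectory.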
