How to speedup the process for new files? #241

Open
mostafa8026 opened this issue Sep 25, 2021 · 2 comments

@mostafa8026

Is there a way to improve this technique? When I add a new file, I have to redo everything to find similarities. Is there a way to speed up the process of adding new files?

@mostafa8026
Author

Any suggestions for implementing it myself would be appreciated. Thanks.

@duhaime
Contributor

duhaime commented Sep 25, 2021

@mostafa8026 Good question!

The data processing pipeline has a few steps, the first of which transforms each image into a vector. The image vectors are computed and cached (in outputs/data/image-vectors), so after the first run they can be read directly from disk instead of being recomputed, which should greatly expedite processing.
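To illustrate the caching idea in general terms, here is a minimal sketch, not PixPlot's actual code. The function name `load_or_compute_vector`, the `vectorize` callback, and the one-`.npy`-file-per-image layout are all assumptions made for the example. Only images without a cached vector trigger the expensive computation:

```python
from pathlib import Path

import numpy as np


def load_or_compute_vector(image_path, cache_dir, vectorize):
    """Return the vector for image_path, computing it only on a cache miss.

    `vectorize` is a hypothetical callable mapping an image path to a vector.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one .npy file per image, keyed by the image file name
    cache_file = cache_dir / (Path(image_path).name + ".npy")
    if cache_file.exists():
        # Cache hit: read the stored vector instead of recomputing it
        return np.load(cache_file)
    # Cache miss: compute the vector once and store it for later runs
    vector = np.asarray(vectorize(image_path))
    np.save(cache_file, vector)
    return vector
```

With a scheme like this, adding a new file costs one vectorization; every previously seen image is just a read from disk.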

It's also worth noting that one can use a GPU to accelerate the creation of those image vectors. See the sections of the README on CUDA acceleration if that's an option for you.

From there, we need to project the vectors down to 2D for visualization. Right now we create a new UMAP model for this projection each time a user runs the pixplot command, but we could cache the model from the first run and reuse it on subsequent runs. The tradeoff here is between model accuracy and performance: a cached model will run faster, but it will make the data less expressive and could fail to reveal some patterns that are latent in the distribution, while creating a new model on each run maximizes data expressivity but slows down processing.

If you're interested in the idea, check out the UMAP docs on projecting new data with an extant model. We have some code for saving models and loading saved models you could consult if you wanted to try using cached models when processing data. If that sounds interesting, please feel free to send a PR and we'll be happy to review and help it get accepted!
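As a starting point, a cached-model workflow might look like the sketch below. It is not PixPlot's implementation: the names `fit_umap` and `load_or_fit_model` and the model file path are hypothetical, and pickle is assumed as the storage format. The UMAP calls used (`umap.UMAP(n_components=2).fit(...)` and `.transform(...)`) come from the umap-learn package:

```python
import pickle
from pathlib import Path


def fit_umap(vectors):
    """Fit a fresh 2D UMAP model (requires the umap-learn package)."""
    import umap  # imported lazily so the caching logic has no hard dependency

    return umap.UMAP(n_components=2).fit(vectors)


def load_or_fit_model(vectors, model_path, fit=fit_umap):
    """Load a previously saved model if one exists, else fit and save one."""
    model_path = Path(model_path)
    if model_path.exists():
        # Only unpickle files you created yourself; pickle can run arbitrary code
        with model_path.open("rb") as f:
            return pickle.load(f)
    model = fit(vectors)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model


# Hypothetical usage:
#   model = load_or_fit_model(initial_vectors, "umap-model.pkl")
#   new_positions = model.transform(new_image_vectors)
```

New images are then placed with `transform` into the layout learned on the first run. Deleting the saved model file forces a full refit, which is how you would trade speed back for expressivity once enough new data has accumulated.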
