dscquery takes a long time to load data #203
@gaow As we discussed in person, I think the best way to approach this is to provide more information to the user about what …
@pcarbo I implemented a simple progress bar that shows the percentage of tasks left and the estimated time remaining.
It might be worth also adding some monitoring stats for internal diagnosis, such as CPU usage and disk I/O status, to see whether there are other improvements we can make.
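For readers curious how "percentage of tasks left" and "estimated time left" can be derived, here is a minimal Python sketch. It is not the actual DSC implementation. It assumes every task takes roughly the same time, so the remaining time is extrapolated from the average time per finished task. The class name `ProgressBar` and its `clock` parameter are hypothetical. The `clock` parameter exists only so that the example can use a fake clock.

```python
import time


class ProgressBar:
    """Hypothetical progress tracker: share of tasks left plus a naive ETA.

    Assumes tasks take roughly equal time, so
    ETA = (elapsed time / tasks done) * tasks left.
    """

    def __init__(self, total, clock=time.monotonic):
        self.total = total
        self.done = 0
        self.clock = clock  # injectable so tests can use a fake clock
        self.start = clock()

    def update(self, n=1):
        self.done += n

    def percent_left(self):
        return 100.0 * (self.total - self.done) / self.total

    def eta_seconds(self):
        if self.done == 0:
            return None  # nothing finished yet, so no rate to extrapolate from
        elapsed = self.clock() - self.start
        return elapsed / self.done * (self.total - self.done)

    def render(self):
        eta = self.eta_seconds()
        eta_txt = "unknown" if eta is None else f"{eta:.0f}s"
        return f"{self.percent_left():.1f}% of tasks left, about {eta_txt} remaining"


# Usage with a fake clock: 1 of 4 tasks done after 10 seconds.
ticks = iter([0.0, 10.0])
bar = ProgressBar(total=4, clock=lambda: next(ticks))
bar.update(1)
print(bar.render())  # 75.0% of tasks left, about 30s remaining
```

The equal-cost assumption is the weak point. If a few output files are much larger than the rest, the ETA will swing as those files are reached. This is one reason the CPU and disk I/O stats mentioned above would help with diagnosis.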
@gaow Very nice! That is certainly an improvement.
I'm testing out @fmorgante's example myself. I noticed that even running a regular query via the …
@gaow One thing that might be helpful here would be to establish a "lower bound" on runtime. For instance, suppose I load matrices from hundreds of …
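One rough way to get such a lower bound is to time only the reading of every output file a query touches, with no extraction or merging. Any query that must read those files cannot finish faster than this. Below is a minimal Python sketch, assuming outputs are stored as pickle (.pkl) files. The function `load_time_lower_bound` and the temporary example files are hypothetical and are not part of DSC or dscrutils.

```python
import pickle
import tempfile
import time
from pathlib import Path


def load_time_lower_bound(paths):
    """Time how long it takes just to deserialize every file in `paths`.

    Returns (number of files loaded, seconds elapsed). OS file caching can
    make repeated runs faster, so measure on a cold cache where possible.
    """
    start = time.perf_counter()
    n_loaded = 0
    for path in paths:
        with open(path, "rb") as fh:
            pickle.load(fh)  # the whole file is read and unpickled
        n_loaded += 1
    return n_loaded, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a DSC output directory: a few small .pkl files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            with open(Path(tmp) / f"result_{i}.pkl", "wb") as fh:
                pickle.dump({"score": i, "matrix": [[0.0] * 100 for _ in range(100)]}, fh)
        n, secs = load_time_lower_bound(sorted(Path(tmp).glob("*.pkl")))
        print(f"loaded {n} files in {secs:.3f}s")
```

If a real query takes far longer than this bound, the overhead lies elsewhere, for example in how the extracted values are combined.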
One thing I noticed is lines 571 to 582 of the current …

```r
dsc <- dscrutils::dscquery("./",
                           c("score.err", "score", "simulate", "fit_pred",
                             "simulate.n_traits", "simulate.pve",
                             "simulate.n_signal", "small_data.subsetN"),
                           return.type = "list", verbose = TRUE,
                           cache = "test.csv")
```

… since I added the …

By "stuck" I mean 2 hours so far, and still counting! Improving this could be a good first issue for someone with some computational background. Still, it would be nice if you could verify it. I suspect it might be easier (at least for me) to deal with it at the level of …
@gaow One bottleneck was … There are some other places where the code is unnecessarily slow due to a naive implementation. I will continue to work on this. In any case, this is a very useful test case. (And it is the first time I've tried running …
@fmorgante complains about the low performance of dscquery at the scale of the DSC he's working on. @fmorgante, it would be helpful if you could tell us: …

Also, since we now use RDS and PKL files to save output, we have to load an entire file to extract a specific quantity. This is a limitation we cannot resolve unless we switch to another data storage solution, as has long been discussed.