Provide for runs covering a volume of parameter space. #20
Comments
I am aware this is a lot of work, so I'd love to discuss it first, including with @jhidding, because there are some aspects of his original config file I don't grasp (and I cannot run the Fortran code that uses it, so I cannot try it out to understand them). The motivation for this request is as follows:
It is not slow as of now. It may become slow for certain parameter values, but, again, we will only run ranges of parameters for long model times if we get oscillations. So perhaps this is not a priority now.
Why do we need to end up with a ton of hdf5 files? I thought we could use this format to store the outputs from multiple runs in one file, possibly even together with the input parameters, in a structured way. Alternatively, we can ask our RDM team to set up metadata for the ton of files; they really enjoy that sort of work, but it's a bit experimental.

We have thought about such a graph already! Specifically, about a pipeline for detecting oscillations. That was a bit optimistic. I think the graph is something @NiklasHohmann and I can work on, as it requires much less skill than you have. And for us it's good learning. Of course, only if you agree.
I guess that will be covered by #19.
Okay, thanks for clarifying that. Perhaps I should add a "Priority low" label to this issue.
Oh yeah, you're right. That is indeed possible.
Absolutely, please go ahead. I mean, even with the output from all the runs stored in a single hdf5 file, some data reduction, i.e. some form of combining the data from all the runs, is needed to enable an analysis.
I guess one can append to one hdf5 file even if these are multiple individual runs from different config files. |
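Appending runs to a shared file is indeed how hdf5 is usually used for this. A minimal sketch with `h5py` (assuming it is available), where the group layout, parameter names, and field names are hypothetical stand-ins for whatever the solver actually produces:

```python
import os
import tempfile

import h5py          # assumption: h5py is available for hdf5 I/O
import numpy as np

def append_run(store_path, run_id, params, fields):
    """Append one run's output to a shared hdf5 file.

    The layout here (one group per run, inputs as attributes, one dataset
    per field) is hypothetical; the real code would store whatever the
    solver produces.
    """
    with h5py.File(store_path, "a") as f:        # "a" creates or appends
        grp = f.create_group(f"runs/{run_id}")
        for name, value in params.items():       # input parameters as attributes
            grp.attrs[name] = value
        for name, profile in fields.items():     # one dataset per output field
            grp.create_dataset(name, data=profile)

store = os.path.join(tempfile.mkdtemp(), "sweep.h5")

# Two illustrative runs with made-up parameters and depth profiles
append_run(store, "run_000", {"b": 5.0}, {"aragonite": np.linspace(1.0, 0.0, 10)})
append_run(store, "run_001", {"b": 6.0}, {"aragonite": np.linspace(1.0, 0.2, 10)})

with h5py.File(store, "r") as f:
    run_names = sorted(f["runs"])
print(run_names)     # ['run_000', 'run_001']
```

Storing the input parameters as attributes on each run's group also covers the point above about keeping inputs and outputs together in one structured file.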
This request from @EmiliaJarochowska needs some discussion, because there are a number of aspects to decide on. It is a big topic and will take quite some work to implement properly.
One could create a special version of this parameter file, not covering single parameter values but ranges of values for some parameters, with some bin size for the sampling. Then every combination of parameters could be mapped onto a single hdf5 file.
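Enumerating the combinations from such a range-and-bin specification is straightforward. A sketch in pure Python, where the parameter names and ranges are made up for illustration (the real ones would come from the config file):

```python
from itertools import product

# Hypothetical sweep specification: (start, stop, number of samples)
# per swept parameter.
sweep = {
    "sedimentation_rate": (0.01, 0.10, 4),
    "dissolution_rate": (0.50, 2.00, 4),
}

def sample(start, stop, n):
    """Return n evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

axes = {name: sample(*spec) for name, spec in sweep.items()}

# Cartesian product of all sampled axes: one dict of parameter values
# per run, i.e. per output hdf5 file.
combinations = [dict(zip(axes, values)) for values in product(*axes.values())]

print(len(combinations))    # 16: one run (or hdf5 file) per combination
print(combinations[0])
```

This also makes the scaling concern concrete: the number of runs is the product of the bin counts, so it grows quickly with each added parameter.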
Also, a special setup is needed to make sure that the runs for all the combinations of parameters are executed consecutively in an automated way.
Or in parallel, on multiple nodes of a cluster. That would be faster, but would require an MPI version of this codebase.
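Short of a full MPI port, the runs could already be parallelised within one node from a driver script, since each parameter combination is independent. A sketch, where `run_model` is a hypothetical stand-in for launching the actual executable:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(params):
    """Stand-in for one model run. In a real driver this would write a
    config file for `params` and launch the Fortran executable (e.g. via
    subprocess.run); threads suffice because the actual work would happen
    in those external processes."""
    b, k = params
    return {"params": params, "result": b * k}   # made-up output

# Made-up parameter combinations; in practice these would come from
# the sweep enumeration.
combinations = [(b, k) for b in (5.0, 6.0) for k in (0.1, 0.2)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_model, combinations))   # input order preserved

print(len(results))   # 4: one result per parameter combination
```

On a cluster the same driver could instead submit one scheduler job per combination, which avoids MPI entirely as long as individual runs fit on one node.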
However, one could end up with a ton of hdf5 files, which would require further data reduction and analysis in order to draw any scientific conclusions.
Ideally, instead of a ton of hdf5 files, one might prefer a single multidimensional graph depicting the effect of the parameter variations on the depth profiles of the five fields, i.e. on the aragonite and calcite compositions, on the pore-water concentrations of the two ions, and on the porosity.
Exactly how this graph should be compiled needs some thought.
But perhaps a ton of hdf5 files as output is good enough as a first step.