
Provide for runs covering a volume of parameter space. #20

Open
HannoSpreeuw opened this issue Jul 5, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@HannoSpreeuw
Contributor

This request from @EmiliaJarochowska needs some discussion, because there are a number of aspects to decide on. It is a big topic, and implementing it properly will take quite some work.

One could create a special version of the parameter file that specifies not single parameter values but ranges of values for some parameters, with some bin size for the sampling. Every combination of parameters could then be mapped onto a single HDF5 file.
A special setup is also needed to make sure that the runs for all the parameter combinations are executed consecutively in an automated way.
Or in parallel, on multiple nodes of a cluster. That would be faster, but would require an MPI version of this codebase.
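For illustration, enumerating such a grid of combinations could look like the sketch below. The parameter names and ranges are made up; the real ones would come from the special version of the parameter file, and `run_model` is a placeholder for the actual simulation call.

```python
from itertools import product

# Hypothetical parameter ranges; the real names and values would come
# from the parameter file described above.
param_grid = {
    "dissolution_rate": [0.1, 0.5, 1.0],
    "sedimentation_rate": [0.01, 0.02],
}

# Build every combination of the sampled values (the Cartesian product).
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*param_grid.values())]

# One run, and in the first-step scheme one HDF5 file, per combination:
for i, params in enumerate(combinations):
    output_file = f"run_{i:04d}.h5"
    # run_model(params, output_file)  # placeholder for the actual model call

print(len(combinations))  # 3 * 2 = 6 combinations
```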

However, one could end up with a ton of HDF5 files, which would require further data reduction and analysis in order to draw any scientific conclusions.

Ideally, instead of a ton of HDF5 files, one might prefer a single multidimensional graph depicting the effect of the parameter variations on the depth profiles of the five fields, i.e. on the aragonite and calcite compositions, on the pore water concentrations of the two ions, and on the porosity.
Exactly how this graph should be compiled needs some thought.

But perhaps a ton of HDF5 files as output is good enough as a first step.

@HannoSpreeuw HannoSpreeuw added the enhancement New feature or request label Jul 5, 2023
@HannoSpreeuw HannoSpreeuw self-assigned this Jul 5, 2023
@EmiliaJarochowska
Contributor

I am aware this is a lot of work, so I'd love to discuss it first, including @jhidding, because I don't grasp some aspects of his original config file (and I cannot run the Fortran code that uses it, so I cannot try it out to understand it).

The motivation for this request is as follows:

  1. Testing the code when we try to reproduce the oscillations - this does not require scanning a range of parameters, only storing the parameters (ideally together with the output in an HDF5 file). We often wanted to go back to a previous run to recall what a given modification changed. Not having the parameters (and output) stored in a systematic way quickly led to confusion.
  2. Scanning a set of parameters will only really be needed if we reproduce oscillations. So we can decide against that for now and focus on the model itself and on having reproducible runs with a record of parameters. The only consideration is whether it is worth rewriting it for a range of parameters if we agree on 1.

> Or in parallel, on multiple nodes of a cluster. That would be faster, but would require an MPI version of this codebase.

It is not slow as of now. Maybe it will become slow for certain parameter values, but, again, we'll only run ranges of parameters for long model times if we get oscillations. So perhaps not a priority now.

> Ideally, instead of a ton of HDF5 files, one might prefer a single multidimensional graph depicting the effect of the parameter variations on the depth profiles of the five fields, i.e. on the aragonite and calcite compositions, on the pore water concentrations of the two ions, and on the porosity.
> Exactly how this graph should be compiled needs some thought.

Why do we need to end up with a ton of HDF5 files? I thought we could use this format to store the outputs from multiple runs in one file, possibly even with the input parameters, in a structured way.

Alternatively, we can use our RDM team to set up metadata for the ton of files - they really enjoy that sort of work, but it's a bit experimental.

We have already thought about such a graph! Specifically, about a pipeline for detecting oscillations. That was a bit optimistic.

I think the graph is something @NiklasHohmann and I can work on, as it requires much less skill than you have. And for us it's good learning. Of course, only if you agree.

@HannoSpreeuw
Contributor Author

> We often wanted to go back to a previous run to recall what a given modification changed. Not having the parameters (and output) stored in a systematic way quickly led to confusion.

I guess that will be covered by #19.

@HannoSpreeuw
Contributor Author

> It is not slow as of now. Maybe it will become slow for certain parameter values, but, again, we'll only run ranges of parameters for long model times if we get oscillations. So perhaps not a priority now.

Okay, thanks for clarifying that. Perhaps I should add a "Priority low" label to this issue.

@HannoSpreeuw HannoSpreeuw added the low priority Not yet needed label Jul 6, 2023
@HannoSpreeuw
Contributor Author

> Why do we need to end up with a ton of HDF5 files? I thought we could use this format to store the outputs from multiple runs in one file, possibly even with the input parameters, in a structured way.

Oh yeah, you're right. That is indeed possible.
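For reference, a minimal sketch of that single-file layout, assuming h5py is available: one group per run, the input parameters stored as attributes on the group, and the output fields as datasets. The group names, parameter names, and the dummy porosity profile are all illustrative.

```python
import h5py
import numpy as np

# In-memory HDF5 file (nothing written to disk) just to show the layout.
f = h5py.File("sweep.h5", "w", driver="core", backing_store=False)

runs = [
    {"dissolution_rate": 0.1, "sedimentation_rate": 0.01},  # made-up names
    {"dissolution_rate": 0.5, "sedimentation_rate": 0.01},
]

for i, params in enumerate(runs):
    grp = f.create_group(f"run_{i:04d}")   # one group per run
    grp.attrs.update(params)               # store inputs next to the outputs
    grp.create_dataset("porosity", data=np.full(10, 0.5))  # dummy profile

run_names = sorted(f.keys())
print(run_names)  # ['run_0000', 'run_0001']
f.close()
```

Appending a later run from a different config file would then just mean opening the same file in `"a"` mode and adding another group.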

@HannoSpreeuw
Contributor Author

> I think the graph is something @NiklasHohmann and I can work on, as it requires much less skill than you have. And for us it's good learning. Of course, only if you agree.

Absolutely, please go ahead. I mean, even with the output from all the runs stored in a single HDF5 file, data reduction, i.e. some form of combining the data from all the runs, is needed to enable an analysis.
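As a rough sketch of what that reduction could mean in practice (with made-up data; in reality the profiles would be read back from the per-run HDF5 groups), the per-run depth profiles can be stacked into a single array indexed by run, from which a summary graph can then be built:

```python
import numpy as np

n_runs, n_depths = 6, 10

# Stand-in for reading one porosity depth profile per run from the HDF5 file.
profiles = [np.linspace(0.5, 0.5 - 0.01 * i, n_depths) for i in range(n_runs)]

# Combine all runs into a single (n_runs, n_depths) array ...
stacked = np.stack(profiles)

# ... so summaries across the parameter sweep become one-liners:
mean_profile = stacked.mean(axis=0)   # average profile over all runs
spread = np.ptp(stacked, axis=0)      # min-max spread at each depth

print(stacked.shape)  # (6, 10)
```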

@EmiliaJarochowska
Contributor

> Absolutely, please go ahead. I mean, even with the output from all the runs stored in a single HDF5 file, data reduction, i.e. some form of combining the data from all the runs, is needed to enable an analysis.

I guess one can append to a single HDF5 file even if these are multiple individual runs from different config files.
We'll wait for #19 to work on data reduction - it will probably naturally fall after I am back from holidays.

@HannoSpreeuw HannoSpreeuw added high priority Work on this first and removed low priority Not yet needed labels Nov 21, 2023
@EmiliaJarochowska EmiliaJarochowska added wontfix This will not be worked on and removed high priority Work on this first labels Jul 23, 2024