MPC-58 integrate training with mlflow #38

dnerini · 2024-11-19T07:55:22Z

No description provided.


dnerini · 2024-11-19T08:07:10Z

rainforest/ml/rf_train.py

@@ -75,6 +75,13 @@ def main():
                      help="If set to 1 (default), the input parquet files (homogeneized tables) for the ml routines will be recomputed from the current database rows"+
                      "This takes a bit of time but is needed if you updated the database and want to use the new data in the training",
                      metavar="MODELS")
+
+    parser.add_option("-l", "--logmlflow", 


We need a flag to decide whether to log the model artifact, false by default?

parser.add_option("-l", "--logmlflow", type="choice",
                  choices=["none", "metrics", "all"],
                  dest="logmlflow", default="none",
                  help="Specify the logging mode for MLFlow. Choices are:"
                       + " 'none' (default, no logging), 'metrics' (log metrics only),"
                       + " or 'all' (log metrics and model)."
                       + " To log to a remote ML server, the environment variable MLFLOW_TRACKING_URI needs to be set.")

How about a choice like this?
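A minimal sketch of how the suggested three-level choice could gate what gets logged, assuming an `optparse` parser like the one in rf_train.py. The helper `logging_plan` is hypothetical and not part of the PR; it only illustrates that 'metrics' enables metric logging, while 'all' additionally enables logging the model artifact.

```python
from optparse import OptionParser


def build_parser():
    """Parser carrying only the suggested --logmlflow option."""
    parser = OptionParser()
    parser.add_option("-l", "--logmlflow", type="choice",
                      choices=["none", "metrics", "all"],
                      dest="logmlflow", default="none",
                      help="MLflow logging mode: 'none' (default), "
                           "'metrics' or 'all' (metrics and model).")
    return parser


def logging_plan(mode):
    """Hypothetical helper: map the mode to (log_metrics, log_model) flags."""
    log_metrics = mode in ("metrics", "all")
    log_model = mode == "all"
    return log_metrics, log_model


if __name__ == "__main__":
    options, _ = build_parser().parse_args(["--logmlflow", "metrics"])
    print(logging_plan(options.logmlflow))  # (True, False)
```

`optparse` rejects any value outside `choices` and exits with a usage error, so a typo such as `-l metric` fails fast instead of silently disabling logging.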

dnerini · 2024-11-19T08:14:48Z

rainforest/ml/rfdefinitions.py

 import os
 from scipy.interpolate import UnivariateSpline
 from pathlib import Path
+import mlflow


Should we make mlflow an optional dependency?

For example:

try:
    import mlflow
    MLFLOW_INSTALLED = True
except ImportError:
    MLFLOW_INSTALLED = False

Good idea, we will change that.
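A sketch of how the guarded import could be used downstream. The block repeats the guarded import so it runs on its own; `resolve_logging_mode` is a hypothetical helper (not in the PR) that downgrades the requested mode to 'none' with a warning when mlflow is missing, instead of failing at import time.

```python
import warnings

try:
    import mlflow  # noqa: F401  (used by the logging code when available)
    MLFLOW_INSTALLED = True
except ImportError:
    MLFLOW_INSTALLED = False


def resolve_logging_mode(requested):
    """Hypothetical helper: fall back to 'none' when mlflow is not installed."""
    if requested != "none" and not MLFLOW_INSTALLED:
        warnings.warn("mlflow is not installed; MLflow logging is disabled.")
        return "none"
    return requested
```

With this pattern, training still runs on machines without mlflow; only an explicit request for logging triggers the warning.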

wolfidan · 2024-11-19T08:42:33Z

I just added the possibility to also log test errors to MLflow, using cross-validation. Sorry for the stupid commit name.

It works by passing the argument -C <number_of_crossval_iterations> to rf_train.py. The default is 0: no cross-validation and no test error.
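A stdlib-only sketch of what `-C <number_of_crossval_iterations>` could do, assuming each iteration is one fold of a shuffled K-fold split. A mean predictor stands in for the random forest so the example runs without scikit-learn; the dest name `n_cv` and the metric names are hypothetical. The point it shows is that each fold's test error is measured on samples the model was not fitted on, and that 0 (the default) skips the step entirely.

```python
import math
import random
from optparse import OptionParser


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint test folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        # The first `extra` folds get one more sample so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_val_rmse(y, n_folds):
    """Per-fold test RMSE of a placeholder mean predictor (stand-in for the RF)."""
    scores = []
    for test_idx in kfold_indices(len(y), n_folds):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        pred = sum(train) / len(train)  # "fit" on the training folds only
        mse = sum((y[i] - pred) ** 2 for i in test_idx) / len(test_idx)
        scores.append(math.sqrt(mse))
    return scores


def cv_summary(scores):
    """Flat dict of scalar metrics (hypothetical names)."""
    return {"cv_rmse_mean": sum(scores) / len(scores), "cv_rmse_max": max(scores)}


parser = OptionParser()
# Hypothetical dest name; 0 (default) means no cross-validation and no test error.
parser.add_option("-C", type="int", dest="n_cv", default=0,
                  help="Number of cross-validation iterations (0: no CV).")

if __name__ == "__main__":
    options, _ = parser.parse_args(["-C", "5"])
    y = [float(v) for v in range(20)]
    if options.n_cv > 0:
        print(cv_summary(cross_val_rmse(y, options.n_cv)))
```

The summary is a flat mapping of names to floats, which is the shape `mlflow.log_metrics()` accepts, so logging it would be a single call inside an active run.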


MicheleCattaneo added 7 commits November 8, 2024 15:43

Added initial mlflow logging code

9c43d97

logging regressor params to mlflow

5a89f12

Made mlflow logging optional and controlled by an input flag

dd3b740

switched to mlflow autologging

8b4798e

Merge branch 'dev' into MPC-58-Integrate-training-with-MLflow

67688d2

Reverted to manual logging, model is gzipped and then logged

eab8969

Merge remote-tracking branch 'origin/dev' into MPC-58-Integrate-train…

1524f3a

…ing-with-MLflow
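The commit "Reverted to manual logging, model is gzipped and then logged" could look roughly like this sketch. It assumes the model is picklable and that the model should only be shipped in 'all' mode; the file name `rf_model.p.gz`, the `model` artifact subfolder and the helper names are hypothetical. `mlflow.log_artifact` copies a local file into the active run's artifact store, so the temporary directory can be removed afterwards.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped(model, path):
    """Pickle `model` into a gzip-compressed file and return the path."""
    with gzip.open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def load_gzipped(path):
    """Inverse of save_gzipped: decompress and unpickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


def log_model_artifact(model, log_mode, filename="rf_model.p.gz"):
    """Hypothetical helper: gzip the model and log it only when log_mode == 'all'.

    Returns True if the artifact was handed to MLflow, False otherwise.
    """
    if log_mode != "all":
        return False
    import mlflow  # deferred import so the rest of the sketch works without mlflow

    with tempfile.TemporaryDirectory() as tmp:
        path = save_gzipped(model, os.path.join(tmp, filename))
        # Copies the file into the active run's artifacts under "model/".
        mlflow.log_artifact(path, artifact_path="model")
    return True
```

Compressing before logging keeps the artifact small on the tracking server; reading it back after a download only needs `load_gzipped`.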

dnerini requested a review from wolfidan November 19, 2024 07:55

dnerini commented Nov 19, 2024


wolfidan added 2 commits November 19, 2024 09:38

first commit

95a8077

first commit

455d34f

MicheleCattaneo and others added 7 commits November 19, 2024 17:21

CV is performed before overall fit. Added biascorrection in cv

712eb51

logmflow flag for rf_train specifies whether to log eveything or metr…

ba35bff

…ics only

rf model can be downloaded from mlflow's server

c7f9bf9

ENH: logging of additional cv metrics in rf fit

bf8a5ec

ENH: logging of additional cv metrics in rf fit

574eeb5

Merge branch 'MPC-58-Integrate-training-with-MLflow' of https://githu…

180e26d

…b.com/MeteoSwiss/rainforest into MPC-58-Integrate-training-with-MLflow

FIX: model_intercomparison in rf.py from master branch

44e8789
