Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPC-58 integrate training with mlflow #38

Open
wants to merge 16 commits into
base: dev
Choose a base branch
from

Conversation

dnerini
Copy link
Member

@dnerini dnerini commented Nov 19, 2024

No description provided.

@@ -75,6 +75,13 @@ def main():
help="If set to 1 (default), the input parquet files (homogeneized tables) for the ml routines will be recomputed from the current database rows"+
"This takes a bit of time but is needed if you updated the database and want to use the new data in the training",
metavar="MODELS")

parser.add_option("-l", "--logmlflow",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a flag to decide whether to log the model artifact, false by default?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser.add_option("-l", "--logmlflow", 
                  type="choice", 
                  choices=["none", "metrics", "all"],
                  dest="logmlflow", 
                  default="none",
                  help="Specify the logging mode for MLFlow. Choices are:" +
                       " 'none' (default, no logging), 'metrics' (log metrics only)," +
                       " or 'all' (log metrics and model)." +
                       " To log to a remote ML server, the environment variable MLFLOW_TRACKING_URI needs to be set.")

How about a choice like this?

import os
from scipy.interpolate import UnivariateSpline
from pathlib import Path
import mlflow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make mlflow an optional dependency?

for example

try:
    import mlflow

    MLFLOW_INSTALLED = True
except ImportError:
   MLFLOW_INSTALLED = False

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, we will change that

@wolfidan
Copy link
Collaborator

wolfidan commented Nov 19, 2024

I just added the possibility to log also test errrors to mlflow., using cross-validation. Sorry for the stupid commit name.

It works by using the argument -C <number_of_crossval_iterations> in rf_train.py. Default is 0 : no test error, no cross-val

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants