100-days-mlops-kodekloud

Run and Compare DVC Experiments

Problem

The xFusionCorp Industries data science team compares multiple training runs with different hyperparameters using DVC experiments. Run three experiments that vary the n_estimators hyperparameter, identify the best-performing one, and promote it to the tracked workspace.

  1. A project exists at /root/code/fraud-detection/ with a parameterised DVC pipeline already in place. params.yaml contains n_estimators: 100 and the baseline pipeline has been run once.

  2. Run three DVC experiments, each with a different value for n_estimators across a reasonable range (for example 50, 200, and 500). Each experiment should produce a fresh metrics.json.

  3. Compare the experiments and choose the one whose f1_score is the highest.

  4. Apply the chosen experiment to the workspace so its n_estimators, metrics.json, and models/model.pkl become the tracked state.

The DVC extension’s EXPERIMENTS section, under the DVC view, lists every experiment alongside its parameters and metrics. It supports running fresh experiments through the + action and applying a selected experiment to the workspace from the right-click menu. Every operation in this lab can be performed either through the extension UI or with the equivalent dvc exp commands.

Solution

To run and compare DVC experiments, follow these steps:

  1. First, ensure you are in the project directory:

    cd /root/code/fraud-detection/
    
  2. Run three DVC experiments with different n_estimators values:

    dvc exp run -S n_estimators=50
    dvc exp run -S n_estimators=200
    dvc exp run -S n_estimators=500
    
  3. After running the experiments, compare the results using:

    dvc exp show
    

    This will display a table of experiments with their parameters and metrics. Identify the experiment with the highest f1_score.

    You can sort experiments by f1_score to make the best run easier to spot:

    dvc exp show --sort-by metrics.json:f1_score --sort-order desc
    

    --sort-order desc puts the highest score first; without it, dvc exp show sorts in ascending order.

  4. Once you have identified the best experiment, apply it to the workspace using:

    dvc exp apply <experiment_name>
    
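  5. Verify that the selected experiment is now the workspace state. params.yaml should show the chosen n_estimators value, and metrics.json should show the matching f1_score: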

    cat params.yaml
    cat metrics.json
    dvc status
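
The steps above can also be scripted. The following is a minimal sketch rather than part of the lab: the function names run_sweep and best_experiment and the n-est-<value> experiment names are hypothetical, and the parser assumes that dvc exp show --csv emits a header row with an Experiment column and an f1_score column (possibly prefixed as metrics.json:f1_score). Check the actual CSV header in your DVC version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only. The n-est-<value> names and the CSV column names are
# assumptions for illustration, not something the lab specifies.

# Run one named experiment per n_estimators value given as an argument.
run_sweep() {
  local n
  for n in "$@"; do
    dvc exp run -n "n-est-${n}" -S "n_estimators=${n}" || return 1
  done
}

# Read `dvc exp show --csv` output on stdin and print the name of the
# experiment with the highest f1_score. Only rows whose name starts with
# the given prefix (default: n-est-) are considered, so the workspace and
# baseline rows are ignored. Splitting on commas is naive: it assumes no
# quoted fields that contain commas.
best_experiment() {
  local prefix="${1:-n-est-}"
  awk -F',' -v prefix="$prefix" '
    NR == 1 {
      # Locate the columns by header name instead of by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "Experiment") name_col = i
        if ($i == "f1_score" || $i ~ /:f1_score$/) score_col = i
      }
      next
    }
    name_col && score_col && $score_col != "" && index($name_col, prefix) == 1 {
      if (!seen || $score_col + 0 > best) {
        best = $score_col + 0
        best_name = $name_col
        seen = 1
      }
    }
    END {
      if (!seen) exit 1
      print best_name
    }
  '
}

# Example usage (run from /root/code/fraud-detection/):
#   run_sweep 50 200 500
#   best="$(dvc exp show --csv | best_experiment)"
#   dvc exp apply "$best"
```

Naming each run with -n keeps the sweep's experiments easy to tell apart from the baseline, and the function exits with a non-zero status when no matching row has an f1_score, so the apply step is not run against an empty name.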
    

Good to Know?