100-days-mlops-kodekloud

Run and Compare DVC Experiments

Problem

The xFusionCorp Industries data science team compares multiple training runs with different hyperparameters using DVC experiments. Run three experiments that vary the n_estimators hyperparameter, identify the best-performing one, and promote it to the tracked workspace.

A project exists at /root/code/fraud-detection/ with a parameterised DVC pipeline already in place. params.yaml contains n_estimators: 100 and the baseline pipeline has been run once.
Run three DVC experiments, each with a different value for n_estimators across a reasonable range (for example 50, 200, and 500). Each experiment should produce a fresh metrics.json.
Compare the experiments and choose the one whose f1_score is the highest.
Apply the chosen experiment to the workspace so its n_estimators, metrics.json, and models/model.pkl become the tracked state.

The DVC extension’s EXPERIMENTS section under the DVC view lists every experiment with its parameters and metrics. You can run new experiments from the + action and apply a selected experiment to the workspace from the right-click menu. Every operation in this lab works either through the extension UI or through the equivalent dvc exp commands.

Solution

To run and compare DVC experiments, follow these steps:

First, ensure you are in the project directory:
```
cd /root/code/fraud-detection/
```

Run three DVC experiments with different n_estimators values:

```
dvc exp run -S n_estimators=50
dvc exp run -S n_estimators=200
dvc exp run -S n_estimators=500
```
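The three runs can also be scripted as a loop. This is a minimal sketch that assumes a POSIX shell with dvc on the PATH. run_n_estimators_sweep is a hypothetical helper name, not a DVC command. It stops at the first failed run so a broken stage does not go unnoticed.

```shell
# Hypothetical helper (not part of DVC): run one experiment per
# n_estimators value passed as an argument.
run_n_estimators_sweep() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_n_estimators_sweep VALUE..." >&2
    return 2
  fi
  for n in "$@"; do
    echo "==> dvc exp run -S n_estimators=$n"
    # -S overrides the parameter for this experiment run only.
    dvc exp run -S "n_estimators=$n" || return 1
  done
}
```

Usage: run_n_estimators_sweep 50 200 500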

After running the experiments, compare the results using:
```
dvc exp show
```
This will display a table of experiments with their parameters and metrics. Identify the experiment with the highest f1_score.

You can sort experiments by f1_score to make the best run easier to spot:
```
dvc exp show --sort-by metrics.json:f1_score --sort-order desc
```
--sort-by takes the metric as file:key, and --sort-order desc lists the highest f1_score first. The default order is ascending.
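To pick the winner in a script instead of reading the table, you could parse the output of dvc exp show --csv. The sketch below relies on several unverified assumptions about that output. The header has an Experiment column with the experiment name and a column whose name ends in f1_score. A typ column, if present, marks baseline rows. No field contains an embedded comma. best_exp_by_f1 is a hypothetical helper. Check its answer against the table before you apply anything.

```shell
# Hypothetical helper (not part of DVC): read `dvc exp show --csv` output on
# stdin and print the name of the experiment with the highest f1_score.
# Assumed columns: "Experiment", a header ending in "f1_score", optional "typ".
best_exp_by_f1() {
  awk -F, '
    # Strip a trailing carriage return in case the CSV uses CRLF line endings.
    { sub(/\r$/, "") }
    NR == 1 {
      # Locate the needed columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "Experiment") name_col = i
        if ($i == "typ") typ_col = i
        if ($i ~ /f1_score$/) f1_col = i
      }
      if (!name_col || !f1_col) { print "required columns not found" > "/dev/stderr"; exit 1 }
      next
    }
    {
      name = $name_col; f1 = $f1_col
      if (name == "") next                          # unnamed rows such as the workspace
      if (typ_col && $typ_col == "baseline") next   # skip baseline commits
      if (f1 !~ /^-?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/) next  # missing or error values
      if (!found || f1 + 0 > best + 0) { best = f1; best_name = name; found = 1 }
    }
    END {
      if (!found) exit 1
      print best_name
    }
  '
}
```

Usage, if the assumptions hold: dvc exp apply "$(dvc exp show --csv | best_exp_by_f1)"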
Once you have identified the best experiment, apply it to the workspace using:
```
dvc exp apply <experiment_name>
```
Verify that the selected experiment is now the workspace state:
```
cat params.yaml
cat metrics.json
dvc status
```
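The same checks can be folded into one pass or fail step. This is a minimal sketch that assumes n_estimators is a top-level key in params.yaml, as the problem statement describes. It relies on dvc status -q, which prints nothing and exits non-zero when data or pipelines are out of date. verify_promoted is a hypothetical helper name.

```shell
# Hypothetical helper (not part of DVC): confirm the promoted experiment is
# now the workspace state. Assumes n_estimators is top-level in params.yaml.
verify_promoted() {
  if [ "$#" -ne 1 ]; then
    echo "usage: verify_promoted EXPECTED_N_ESTIMATORS" >&2
    return 2
  fi
  # Look for a line such as "n_estimators: 200".
  if ! grep -Eq "^n_estimators:[[:space:]]*$1[[:space:]]*\$" params.yaml; then
    echo "params.yaml does not set n_estimators to $1" >&2
    return 1
  fi
  # With -q, dvc status is silent and exits non-zero if anything is out of date.
  if ! dvc status -q; then
    echo "dvc status reports changes; run 'dvc status' to see them" >&2
    return 1
  fi
  echo "OK: n_estimators=$1 and dvc status is clean"
}
```

Usage, with the value of the experiment you applied: verify_promoted 200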

Good to Know

dvc exp run -S key=value overrides a parameter for that experiment only and does not change the Git commit the experiments are based on. Experiments run in the workspace by default, so the workspace files can still hold the results of the most recent run. Use dvc exp apply to promote the run you actually chose.
The baseline value n_estimators: 100 is already in params.yaml and has been run once, so you do not need another experiment with 100 unless you want to reproduce the baseline.
Choose the experiment by f1_score, because the problem asks for the highest f1_score, not the highest accuracy.
dvc exp show --json is useful for automation. The default dvc exp show table, optionally sorted, is easier to compare by eye.
If the main row of dvc exp show reports FileNotFoundError for metrics.json, the baseline commit most likely does not contain that file. The experiments can still show metrics because each run generated its own metrics.json.
dvc exp apply only changes the workspace. If the lab expects the promoted result to persist in Git, commit the changed tracked files afterwards, such as params.yaml, metrics.json, and dvc.lock.
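A commit step could look like the sketch below. It assumes metrics.json is tracked by Git, for example because dvc.yaml declares it with cache: false. If DVC caches the file instead, git add rejects it as ignored, so drop it from the list. models/model.pkl stays in the DVC cache and is referenced through dvc.lock. commit_promoted is a hypothetical helper name.

```shell
# Hypothetical helper (not part of DVC): commit the files dvc exp apply
# changed. Assumes metrics.json is Git-tracked (e.g. cache: false in dvc.yaml).
commit_promoted() {
  git add params.yaml dvc.lock metrics.json || return 1
  # The optional first argument overrides the default commit message.
  git commit -m "${1:-Promote best DVC experiment}"
}
```

Usage: commit_promoted "Promote n_estimators=200 experiment"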
