100-days-mlops-kodekloud

Track ML Metrics with DVC

Problem

After training a model, the xFusionCorp Industries ML team wants DVC to surface metrics through dvc metrics show and the DVC extension’s METRICS view. The fraud-detection pipeline already trains a model and writes a metrics.json, but DVC does not recognise the file as a metric. Wire it in correctly.

  1. A project exists at /root/code/fraud-detection/ with a three-stage DVC pipeline (process_data, split_data, train). The train stage runs src/models/train.py, which writes the model to models/model.pkl and metrics to metrics.json. Do not modify the Python files.

  2. The train stage in dvc.yaml must declare metrics.json as a DVC metric output, not as a regular file output. The metric must be declared with cache: false so the JSON lives in Git for diff history rather than in the DVC cache.

  3. Re-run the pipeline with dvc repro so the metric registration takes effect.

  4. After your changes, dvc metrics show must report the accuracy and f1_score values from metrics.json.

The DVC extension’s METRICS section under the DVC view will surface the same values directly in the editor once the metric is registered.
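
For orientation, DVC reads metric values from the keys of a structured file such as JSON. The sketch below shows roughly how a training script could write a file of that shape. It is not the actual src/models/train.py (which must stay untouched); the function name write_metrics and any values passed to it are placeholders for illustration.

```python
import json
from pathlib import Path


def write_metrics(path, accuracy, f1_score):
    """Write a flat JSON metrics file with the two keys the task expects."""
    # Hypothetical helper: the real train.py may build this dict differently.
    metrics = {"accuracy": accuracy, "f1_score": f1_score}
    Path(path).write_text(json.dumps(metrics, indent=2) + "\n")
    return metrics
```

A flat file like this is what dvc metrics show reports once the path is registered as a metric in dvc.yaml.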

Solution

To register metrics.json as a DVC metric, edit the train stage in dvc.yaml as follows:

  1. Open the dvc.yaml file located at /root/code/fraud-detection/dvc.yaml.
  2. Locate the train stage. If metrics.json is listed under outs, move it to a metrics section and set cache: false, so that the stage looks something like this:

     train:
         cmd: python src/models/train.py
         deps:
             - data/processed/train.csv
             - src/models/train.py
         outs:
             # Regular output, stored in the DVC cache
             - models/model.pkl
         metrics:
             # Metric output; cache: false keeps the file in Git
             - metrics.json:
                 cache: false
    

    See day_16_dvc.yaml for the full pipeline file.

  3. After modifying the pipeline, run the following commands:

    dvc repro
    dvc metrics show
    

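    dvc repro should regenerate metrics.json, and dvc metrics show should then display the accuracy and f1_score values. Because the metric is declared with cache: false, commit metrics.json to Git together with dvc.yaml and dvc.lock.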
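
As a quick sanity check after the steps above, the following sketch confirms two things: that a parsed train stage lists metrics.json under metrics with cache: false, and that a metrics file contains the accuracy and f1_score keys. The helper names (metric_is_uncached, read_metrics) are made up for this example. The stage is passed as a plain dict so the snippet needs only the standard library; loading dvc.yaml itself would need a YAML parser such as PyYAML, which is an assumption about the environment.

```python
import json
from pathlib import Path


def metric_is_uncached(stage, metric_path="metrics.json"):
    """Return True if the stage declares metric_path under `metrics` with cache: false."""
    for entry in stage.get("metrics", []):
        # An entry is either a bare path string or a one-key mapping {path: {options}}.
        if isinstance(entry, dict) and metric_path in entry:
            options = entry[metric_path] or {}
            return options.get("cache") is False
    return False


def read_metrics(path, required=("accuracy", "f1_score")):
    """Load a JSON metrics file and fail loudly if a required key is missing."""
    metrics = json.loads(Path(path).read_text())
    missing = [key for key in required if key not in metrics]
    if missing:
        raise KeyError(f"missing metric keys: {missing}")
    return {key: metrics[key] for key in required}


# The train stage from dvc.yaml, written out as the dict a YAML parser would produce.
train_stage = {
    "cmd": "python src/models/train.py",
    "deps": ["data/processed/train.csv", "src/models/train.py"],
    "outs": ["models/model.pkl"],
    "metrics": [{"metrics.json": {"cache": False}}],
}

print(metric_is_uncached(train_stage))  # True
```

Run from the project root, read_metrics("metrics.json") should return the same two values that dvc metrics show prints.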