The xFusionCorp Industries ML team keeps different dataset and model versions on different Git branches so that the team can roll between versions cleanly. Tag the current state as v1.0, produce a v2-improved branch based on a newer dataset, and confirm that switching back restores the original data.
A project exists at /root/code/fraud-detection/ with a working DVC pipeline and the baseline data/raw/transactions.csv already tracked.
An improved dataset has been pre-staged at /root/code/fraud-detection/data/raw/transactions_v2.csv and is visible in the file explorer. Do not delete this file.
On the main branch, tag the current state as v1.0.
Create a new branch named v2-improved. Replace the tracked dataset with the contents of the v2 file, re-track it with DVC, re-run the pipeline, and commit the changes.
Switch back to the main branch and use dvc checkout to restore the v1 dataset on disk. The restored content must match the hash recorded by the v1.0 tag.
The DVC extension’s DVC TRACKED section in the EXPLORER panel will reflect the current branch’s tracked state—it should show different dataset hashes on main and v2-improved.
To version the dataset across branches, follow these steps:
Go to project directory and tag current main state as v1.0:
cd /root/code/fraud-detection/
git switch main
git tag v1.0
git push origin v1.0
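Before branching, it can help to confirm that the tag really sits on the commit you are on. The following is a minimal sketch, not part of the original steps: the helper name tag_points_at_head is hypothetical, and it assumes it is run inside the Git repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given tag resolves to the same
# commit as HEAD in the current repository.
tag_points_at_head() {
  local tag="$1" tag_commit head_commit
  # ^{commit} peels an annotated tag down to the commit it references.
  tag_commit=$(git rev-parse --verify --quiet "refs/tags/${tag}^{commit}") || return 1
  head_commit=$(git rev-parse --verify --quiet "HEAD^{commit}") || return 1
  [ "$tag_commit" = "$head_commit" ]
}

# Usage (from /root/code/fraud-detection/ on main):
#   tag_points_at_head v1.0 && echo "v1.0 marks the current commit"
```

The check returns a non-zero status both when the tag is missing and when it points at a different commit, so it can guard the branch-creation step in a script.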
Create branch for improved dataset:
git switch -c v2-improved
Replace tracked dataset with staged v2 file, then re-track it with DVC:
cp data/raw/transactions_v2.csv data/raw/transactions.csv
dvc add data/raw/transactions.csv
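A quick way to confirm that the tracked path now holds the v2 contents, and that the pre-staged v2 file is still present, is a byte-for-byte comparison. This is a sketch under the assumption that cmp is available; the helper name same_content is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two files are byte-for-byte identical.
# Fails if either file is missing, so it also catches an accidental delete.
same_content() {
  cmp -s "$1" "$2"
}

# Usage (from the project root, after the cp step):
#   same_content data/raw/transactions_v2.csv data/raw/transactions.csv \
#     && echo "tracked dataset now holds the v2 contents"
```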
Re-run DVC pipeline and commit updated branch state:
dvc repro
git add data/raw/transactions.csv.dvc data/raw/.gitignore dvc.lock
git commit -m "Use improved transactions dataset"
git push origin v2-improved
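Once the branch is committed, the dataset hashes recorded on the two version lines should differ. One way to check this without touching the workspace is to read the pointer file straight from each Git ref. The sketch below assumes the usual .dvc pointer layout with an md5 entry under outs; the helper names pointer_md5_at_ref and dataset_changed_between are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first md5 value found in a .dvc pointer
# file as it exists at a given Git ref (tag, branch, or commit).
# The pointer path is relative to the repository root.
pointer_md5_at_ref() {
  local ref="$1" pointer="$2"
  git show "${ref}:${pointer}" | awk '
    !found && $1 == "md5:"              { print $2; found = 1 }
    !found && $1 == "-" && $2 == "md5:" { print $3; found = 1 }
  '
}

# Hypothetical helper: succeeds when both refs record a hash for the
# pointer and the two hashes differ.
dataset_changed_between() {
  local old new
  old=$(pointer_md5_at_ref "$1" "$3")
  new=$(pointer_md5_at_ref "$2" "$3")
  [ -n "$old" ] && [ -n "$new" ] && [ "$old" != "$new" ]
}

# Usage (from the project root):
#   dataset_changed_between v1.0 v2-improved data/raw/transactions.csv.dvc \
#     && echo "v2-improved records a different dataset hash than v1.0"
```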
Switch back to main and restore original tracked data from DVC:
git switch main
dvc checkout
Verify transactions.csv on main matches hash stored at v1.0:
git show v1.0:data/raw/transactions.csv.dvc
dvc status
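The two commands above show the recorded hash and the workspace status, but they do not directly compare the file on disk with the hash stored at the tag. A minimal sketch of that comparison follows. It assumes md5sum is available and that the pointer stores the plain MD5 of the file bytes, as DVC 3.x pointers do (older DVC versions may hash text files differently); the helper name matches_pointer_at_ref is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the MD5 of a workspace file equals the
# md5 recorded in its .dvc pointer at the given Git ref.
# Assumption: the pointer's md5 is the plain MD5 of the file contents.
matches_pointer_at_ref() {
  local ref="$1" file="$2" recorded actual
  # Read the pointer as committed at the ref, not the copy in the workspace.
  recorded=$(git show "${ref}:${file}.dvc" | awk '
    !found && $1 == "md5:"              { print $2; found = 1 }
    !found && $1 == "-" && $2 == "md5:" { print $3; found = 1 }
  ')
  actual=$(md5sum "$file" | awk '{ print $1 }')
  [ -n "$recorded" ] && [ "$recorded" = "$actual" ]
}

# Usage (from the project root, on main after dvc checkout):
#   matches_pointer_at_ref v1.0 data/raw/transactions.csv \
#     && echo "restored dataset matches the hash recorded at v1.0"
```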
The v2-improved branch now points to the newer dataset and regenerated pipeline outputs, while main recovers the original v1.0 tracked state after dvc checkout.
git tag v1.0 marks current Git+DVC state before data swap. Use it when you need exact rollback point.
git switch -c v2-improved makes branch for dataset/model variant. Use it when you need parallel version line.
dvc init creates .dvc/ metadata in a Git repo. Use it once when starting DVC tracking.
dvc add data/raw/transactions.csv tracks a data/model file and writes a .dvc pointer file. Use it when raw data, trained model, or large artifact should be versioned outside Git.
dvc remove data/raw/transactions.csv.dvc removes a DVC-tracked output from DVC metadata. Use it when a tracked artifact should leave the project.
dvc move old/path new/path moves a DVC-tracked file and updates metadata. Use it instead of plain mv for tracked artifacts.
dvc unprotect path/to/file makes a cached file writable. Use it when DVC linked data must be edited in place.
dvc status shows workspace drift against dvc.lock and .dvc files. Use it before commits to find changed inputs or stale outputs.
dvc diff compares DVC-tracked files between Git commits, branches, or tags. Use it to see dataset/model changes between main, v1.0, and v2-improved.
dvc checkout restores tracked files from current Git branch metadata. Use it after git switch so disk data matches current branch.
dvc commit records changed outputs without re-running a stage. Use it only when output files were intentionally changed manually.
dvc repro rebuilds pipeline stages whose dependencies changed. Use it after data/code/params updates.
dvc dag prints pipeline dependency graph. Use it to understand which stage depends on which data, params, or outputs.
dvc stage add -n split -d src/data/split_data.py -d data/raw/transactions.csv -o data/processed python src/data/split_data.py creates pipeline stage. Use it to define reproducible commands with dependencies and outputs.
dvc stage list lists pipeline stages. Use it to inspect available stage names before dvc repro stage_name.
dvc freeze stage_name prevents a stage from running during dvc repro. Use it when one stage output must stay fixed.
dvc unfreeze stage_name allows a frozen stage to run again. Use it when pipeline should refresh normally.
dvc params diff compares parameter changes across Git refs. Use it when model behavior changed because params.yaml changed.
dvc metrics show displays metrics from files declared as metrics. Use it to read current model scores.
dvc metrics diff main v2-improved compares metrics between branches or commits. Use it to prove improved dataset/model helped.
dvc plots show renders plots from tracked plot files. Use it to inspect curves or prediction distributions.
dvc plots diff main v2-improved compares plots between Git refs. Use it for visual model evaluation drift.
dvc exp run runs an experiment without making a normal Git commit. Use it for quick model/data/param trials.
dvc exp show lists experiment results. Use it to compare params, metrics, and commits.
dvc exp apply exp-name applies one experiment to workspace. Use it when an experiment should become real branch work.
dvc exp remove exp-name deletes unwanted experiments. Use it to clean noisy trial history.
dvc exp gc removes unused experiment objects. Use it to clean old experiment data.
dvc remote add -d storage s3://bucket/path registers default remote storage. Use it so team can share cache/data.
dvc remote list shows configured remotes. Use it to check project storage targets.
dvc remote modify storage key value changes remote config. Use it for credentials, endpoint URLs, or remote options.
dvc remote remove storage removes remote config. Use it when storage target is obsolete.
dvc push uploads tracked data/model/cache objects to remote. Use it after dvc add, dvc repro, or model training so teammates/CI can pull artifacts.
dvc pull downloads tracked data/model/cache objects needed by current Git state. Use it after clone, branch switch, or checkout.
dvc fetch downloads cache objects without checking them out to workspace. Use it to prefetch data for later use.
dvc gc deletes unused local/remote cache objects. Use it carefully after old branches/experiments are no longer needed.
dvc cache dir shows or changes cache directory. Use it when cache needs to live on another disk.
dvc cache list lists cache entries. Use it to inspect stored objects.
dvc get repo-url path/to/file downloads a file from another DVC repo without adding it to current project. Use it for one-off data fetch.
dvc get-url https://host/file.csv downloads a file from a URL without tracking it. Use it for quick external data pulls.
dvc import repo-url path/to/file imports and tracks a file from another DVC repo. Use it when upstream data should remain reproducible and updatable.
dvc import-url https://host/file.csv data/raw/file.csv imports and tracks external URL data. Use it when dataset source is HTTP/S3/GCS/etc.
dvc import-db imports data from a database query. Use it when dataset source lives in SQL/database storage.
dvc update data/raw/file.csv.dvc refreshes imported data from its original source. Use it when upstream dataset changed.
dvc list repo-url lists files in another DVC repo. Use it before dvc get or dvc import.
dvc list-url s3://bucket/path lists files at external storage URL. Use it to inspect remote data locations.
dvc data status checks data status across workspace/cloud depending on DVC version. Use it for higher-level data tracking checks.
dvc data ls lists DVC-tracked data entries when supported. Use it to browse project data inventory.
dvc artifacts manages model/artifact registry features when supported. Use it when promoting named models/artifacts.
dvc queue start starts experiment queue workers. Use it when running many queued experiments.
dvc queue stop stops experiment queue workers. Use it after queued runs finish.
dvc queue status shows queued experiment state. Use it to monitor batch experiment runs.
dvc config core.autostage true changes DVC config. Use it when DVC metadata should be auto-staged in Git.
dvc install installs Git hooks for DVC. Use it to automate checkout hooks and improve Git+DVC workflow.
dvc completion -s zsh prints shell completion script. Use it to enable command autocomplete.
dvc root prints project root. Use it inside scripts that need stable repo paths.
dvc doctor prints environment/debug information. Use it when DVC behaves unexpectedly.
dvc version prints DVC version and dependency details. Use it in bug reports and CI logs.
dvc destroy removes DVC metadata from a repo. Use only when intentionally removing DVC from project.