100-days-mlops-kodekloud

Configure a DVC Remote Storage

Problem

The xFusionCorp Industries ML team uses SeaweedFS as the shared S3-compatible object store for DVC-tracked data. A .dvc/config already declares a remote called s3 for the fraud-detection project, but dvc push currently fails. Correct the configuration and push the tracked data into the SeaweedFS bucket.

  1. A project exists at /root/code/fraud-detection/ with DVC initialised and data/raw/transactions.csv already tracked.

  2. SeaweedFS is already running on the controlplane:

    • S3 endpoint: http://localhost:8333
    • Filer UI: open the SeaweedFS Filer button at the top of the lab (forwarded port 8888) – buckets are visible under /buckets/.
    • Credentials: weedadmin / weedadmin123 (already set in .dvc/config)
    • Bucket name: dvc-storage (already created and visible in the Filer UI under /buckets/dvc-storage)
  3. Review the existing .dvc/config and correct everything that prevents dvc push from succeeding. The remote called s3 must:

    • point at the dvc-storage bucket using s3://;
    • use the correct SeaweedFS S3 endpoint URL;
    • be marked as the default remote.
  4. Push the tracked data. After the push, the dvc-storage bucket in the SeaweedFS Filer UI must contain at least one object under the files/md5/... prefix.

Solution

  1. Inspect .dvc/config and correct remote s3 so it points to SeaweedFS bucket:

     ['remote "s3"']
         url = s3://dvc-storage
         endpointurl = http://localhost:8333
         access_key_id = weedadmin
         secret_access_key = weedadmin123
    

    Here we have corrected bucket name and the endpoint url. Make sure you have set the correct one.

  2. Set s3 as default remote:

     dvc remote default s3
    
  3. Push tracked data:

     dvc push
    
  4. Verify SeaweedFS Filer UI shows object under files/md5/... in dvc-storage.

Good to Know