This repository contains the code and resources for the 100 Days of MLOps challenge by KodeKloud. The course is designed to help you learn and practice MLOps concepts and techniques over a span of 100 days.
This challenge walks through core MLOps practice end to end. You start with local Python setup, project structure, and code quality, then move through data versioning, experiment tracking, model registry, training pipelines, serving, monitoring, CI/CD, and Kubernetes-based deployment.
Before starting, you should know basic Python, Git, command-line work, and simple machine learning workflows. Helpful extras: Docker, YAML, and comfort editing files in a terminal. I recommend finishing the 100 Days of DevOps challenge before starting here.
By the end of the 100 days, you should be able to:
#100DaysOfMLOps

Want to start the official challenge? Use this KodeKloud link - it helps support this project!
| Day | Challenge | Topics | Solution |
|---|---|---|---|
| 01 | Create a Python Virtual Environment for ML | Python, virtual environments, project setup | Solved |
| 02 | Set Up and Configure Jupyter Notebook Server | Jupyter, notebooks, local dev setup | Solved |
| 03 | Fix a Broken uv Lockfile Specification | uv, dependency locking, package resolution | Solved |
| 04 | Create a Standard ML Project Structure | repository structure, packaging, conventions | Solved |
| 05 | Create a Makefile for ML Workflow Automation | Makefile, task automation, developer workflow | Solved |
| 06 | Set Up Code Quality Tools for ML Code | linting, formatting, static analysis | Solved |
| 07 | Package an ML Project as Installable Python Package | packaging, setuptools, installation | Solved |
| 08 | Configure Pre-Commit Hooks for ML Repository | pre-commit, git hooks, code quality | Solved |
| 09 | Create a Custom ML Project Template with Cookiecutter | Cookiecutter, templates, scaffolding | Solved |
| 10 | Install and Initialize DVC in an ML Project | DVC, data versioning, initialization | Solved |
| 11 | Track a Dataset with DVC | DVC, dataset tracking, reproducibility | Solved |
| 12 | Configure a DVC Remote Storage | DVC, remote storage, artifact sync | Solved |
| 13 | Pull DVC-Tracked Data from Remote | DVC, pull, data restore | Solved |
| 14 | Create a DVC Pipeline for Data Processing | DVC pipelines, data processing, stages | Solved |
| 15 | Parameterize a DVC Pipeline | DVC, parameters, pipeline configuration | Solved |
| 16 | Track ML Metrics with DVC | DVC, metrics, experiment tracking | Solved |
| 17 | Run and Compare DVC Experiments | DVC, experiments, comparison | Solved |
| 18 | Version Datasets and Models Across Git Branches | Git branches, model versioning, datasets | Solved |
| 19 | Build Complete DVC ML Pipeline with Remote Storage and Experiments | DVC, remote storage, full pipeline | Solved |
| 20 | Install and Start the MLflow Tracking Server | MLflow, tracking server, setup | Solved |
| 21 | Log an ML Experiment to MLflow | MLflow, experiment logging, runs | Solved |
| 22 | Create and Organize MLflow Experiments | MLflow, experiment management, organization | Solved |
| 23 | Search and Query MLflow Runs | MLflow, search, querying runs | Pending |
| 24 | Enable MLflow Autologging | MLflow, autologging, instrumentation | Pending |
| 25 | Register, Version, and Manage Model Lifecycle | MLflow, model registry, lifecycle | Pending |
| 26 | Compare Model Runs and Select the Best | MLflow, model comparison, selection | Pending |
| 27 | Load Model from Registry with Custom Preprocessing | MLflow, registry, preprocessing | Pending |
| 28 | Fix a Broken MLflow Project and Re-Run It | MLflow, debugging, project recovery | Pending |
| 29 | Configure MLflow with Remote Tracking Server and Artifact Store | MLflow, remote tracking, artifacts | Pending |
| 30 | End-to-End MLflow Lifecycle: Train, Register, Serve, Monitor | MLflow, lifecycle, deployment, monitoring | Pending |
| 31 | Train a Scikit-Learn Model with Reproducible Script | scikit-learn, training script, reproducibility | Pending |
| 32 | Manage Training Configuration with YAML | YAML, configuration, training params | Pending |
| 33 | Evaluate a Trained Model and Generate Classification Report | evaluation, metrics, classification report | Pending |
| 34 | Implement Cross-Validation for Model Selection | cross-validation, model selection, evaluation | Pending |
| 35 | Hyperparameter Tuning with Optuna | Optuna, hyperparameter tuning, optimization | Pending |
| 36 | Automated Model Selection with FLAML AutoML | FLAML, AutoML, model selection | Pending |
| 37 | Distributed Model Training with Joblib Parallelization | joblib, parallelization, distributed training | Pending |
| 38 | Build Modular Training Pipeline with Config-Driven Stages | modular design, config-driven pipeline, training | Pending |
| 39 | Train a PyTorch Model with GPU Support and Checkpointing | PyTorch, GPU, checkpointing | Pending |
| 40 | Production Training System: Tracking, Tuning, and Model Selection | production training, tracking, tuning | Pending |
| 41 | Install and Initialize a Feast Feature Store | Feast, feature store, initialization | Pending |
| 42 | Define Feature Views in Feast | Feast, feature views, feature engineering | Pending |
| 43 | Materialize Features to the Online Store | Feast, online store, materialization | Pending |
| 44 | Store MLflow’s Admin Password in HashiCorp Vault | Vault, secrets, MLflow integration | Pending |
| 45 | Fix a Broken Vault KV Policy for the MLflow Reader | Vault, policies, access control | Pending |
| 46 | Author Data-Quality Expectations with Great Expectations | Great Expectations, data quality, expectations | Pending |
| 47 | Debug a Failing Great Expectations Checkpoint | Great Expectations, checkpoints, debugging | Pending |
| 48 | Publish Great Expectations Data Docs as a CI Artefact | Great Expectations, CI, documentation | Pending |
| 49 | Secrets + Data-Quality Integration Capstone | secrets management, data quality, capstone | Pending |
| 50 | Create Docker Image for ML Training Environment | Docker, containerization, training env | Pending |
| 51 | Create Multi-Stage Docker Build for ML Serving | Docker, multi-stage builds, serving | Pending |
| 52 | Set Up Local ML Dev Environment with Docker Compose | Docker Compose, local dev, services | Pending |
| 53 | Create GPU-Enabled Docker Image for Deep Learning | Docker, GPU, deep learning | Pending |
| 54 | Push ML Model Images to Container Registry | container registry, image push, Docker | Pending |
| 55 | Add Health Checks and Graceful Shutdown to ML Containers | health checks, shutdown, reliability | Pending |
| 56 | Automate ML Docker Image Building in CI Pipeline | CI, Docker build, automation | Pending |
| 57 | Serve an ML Model with Flask | Flask, serving, API | Pending |
| 58 | Serve an ML Model with FastAPI | FastAPI, serving, API | Pending |
| 59 | Run Batch Predictions on a Dataset | batch inference, predictions, dataset processing | Pending |
| 60 | Package a Model as a BentoML Service | BentoML, packaging, model service | Pending |
| 61 | Containerize an ML Model API with Docker | Docker, API containerization, serving | Pending |
| 62 | Implement A/B Testing for Model Deployment | A/B testing, deployment, experimentation | Pending |
| 63 | Implement Async Batch Prediction with Task Queue | async jobs, task queues, batch prediction | Pending |
| 64 | Serve Multiple Models Behind Unified API Gateway | API gateway, multi-model serving, routing | Pending |
| 65 | Implement Canary Deployment for Model Updates | canary deployment, rollout, release strategy | Pending |
| 66 | Production Model Serving with Docker Compose | Docker Compose, production serving, orchestration | Pending |
| 67 | Add Prometheus as a Grafana Data Source | Prometheus, Grafana, observability | Pending |
| 68 | Generate a Model Performance Report | model performance, reporting, metrics | Pending |
| 69 | Generate a Data Quality Report | data quality, reporting, validation | Pending |
| 70 | Create Automated Tests with Evidently Test Suites | Evidently, testing, monitoring | Pending |
| 71 | Set Up Evidently Monitoring Dashboard | Evidently, dashboard, monitoring | Pending |
| 72 | Set Up Drift Detection Alerts | drift detection, alerts, monitoring | Pending |
| 73 | Automatic Retraining Triggered by Drift Detection | drift detection, retraining, automation | Pending |
| 74 | Monitor Custom Business Metrics Alongside ML Metrics | business metrics, ML metrics, monitoring | Pending |
| 75 | End-to-End Monitoring: Prometheus, Grafana, Evidently | monitoring stack, observability, ML metrics | Pending |
| 76 | Create CI Pipeline for ML Code Linting and Testing | CI, linting, testing | Pending |
| 77 | Add Data Validation to CI Pipeline | CI, data validation, quality gates | Pending |
| 78 | Add Model Validation Tests to CI | CI, model validation, testing | Pending |
| 79 | Generate Model Performance Reports in CI with CML | CML, CI, reporting | Pending |
| 80 | Automate Model Registration in CI/CD | CI/CD, model registry, automation | Pending |
| 81 | Automate Model Deployment with CD Pipeline | CD, deployment, automation | Pending |
| 82 | End-to-End ML CI/CD Pipeline | CI/CD, automation, deployment | Pending |
| 83 | Automated Model Rollback with Health Checks | rollback, health checks, deployment safety | Pending |
| 84 | Production ML CI/CD with Multi-Environment Promotion | CI/CD, promotion, environments | Pending |
| 85 | Install Argo Workflows on Kubernetes | Argo Workflows, Kubernetes, installation | Pending |
| 86 | Create a Basic ML Training Workflow in Argo | Argo Workflows, training workflow, Kubernetes | Pending |
| 87 | Pass Data Between Argo Steps with Output Parameters and Branching | Argo, parameters, branching | Pending |
| 88 | Create an ML Pipeline with Prefect | Prefect, workflow orchestration, pipelines | Pending |
| 89 | Parallel Model Training with Argo withItems Fan-Out | Argo, parallelism, fan-out | Pending |
| 90 | Automated Retraining with Argo CronWorkflow | Argo, CronWorkflow, scheduled retraining | Pending |
| 91 | Production ML Pipeline: Argo Workflows + MLflow on Kubernetes | Argo, MLflow, Kubernetes pipeline | Pending |
| 92 | Deploy an ML Model on Kubernetes | Kubernetes, deployment, serving | Pending |
| 93 | Configure HPA for ML Serving Deployment | Kubernetes, HPA, autoscaling | Pending |
| 94 | Deploy a Model with KServe InferenceService | KServe, inference service, Kubernetes | Pending |
| 95 | Kubeflow Pipelines - Install and Run a Basic KFP Pipeline | Kubeflow Pipelines, installation, orchestration | Pending |
| 96 | GitOps Model Deployment with ArgoCD | GitOps, ArgoCD, deployment | Pending |
| 97 | Capstone (1/4): End-to-End ML System - Train, Register, Serve | capstone, training, registry, serving | Pending |
| 98 | Capstone (2/4): Monitoring and Automated Retraining | capstone, monitoring, retraining | Pending |
| 99 | Capstone (3/4): Orchestrate the Full MLOps Loop with Argo Workflows | capstone, orchestration, Argo Workflows | Pending |
| 100 | Capstone (4/4): Close the Loop with Prometheus + Grafana Observability | capstone, Prometheus, Grafana, observability | Pending |
Contributions are welcome! If you think any solution steps could be improved, here's how you can help:
1. Create a feature branch (`git checkout -b feature/improvement-day-number`)
2. Commit your changes (`git commit -am 'Add new solution for days-number'`)
3. Push the branch (`git push origin feature/improvement-day-number`)

If this repository helped you in your DevOps journey:
This project is licensed under the GPL License - see the LICENSE file for details.
Happy Learning!
Remember: Start with Day 1 and build your skills progressively!