100-days-mlops-kodekloud

Create a Standard ML Project Structure

Problem

A colleague has started a new ML project at /root/code/fraud-detection/, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team’s conventions.

  1. Inspect the existing project at /root/code/fraud-detection/.

  2. The final layout must match the tree below exactly:

     fraud-detection/
     ├── data/
     │   ├── raw/
     │   └── processed/
     ├── models/
     ├── notebooks/
     ├── src/
     │   ├── data/
     │   ├── features/
     │   ├── models/
     │   └── utils/
     ├── tests/
     ├── configs/
     ├── requirements.txt
     └── README.md
    
  3. Every subdirectory under src/ must contain an __init__.py file so that Python recognises it as a package.

  4. requirements.txt must list the following dependencies, one per line: scikit-learn, pandas, numpy, and mlflow. Use the canonical PyPI name scikit-learn for that package, not the import name sklearn.

  5. README.md must begin with the heading # fraud-detection.

  6. Review the existing project and correct everything that does not match the requirements above.

Solution

  1. Update README.md so that it begins with the heading required by requirement 5:

     # fraud-detection
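
If you would rather script this step than edit the file by hand, the following is a minimal sketch. The function name ensure_heading is hypothetical, and the sketch assumes it is acceptable to prepend the heading and keep the existing content below it; a wrong heading already in the file would stay on the second line and still has to be removed manually.

```shell
# ensure_heading is a hypothetical helper name, not part of the original solution.
ensure_heading() {
    readme="$1"
    heading="# fraud-detection"

    # Nothing to do when the first line is already the required heading.
    if [ "$(head -n 1 "$readme" 2>/dev/null)" = "$heading" ]; then
        return 0
    fi

    # Write the heading first, then append whatever the file already contained.
    tmp="$readme.tmp"
    printf '%s\n' "$heading" > "$tmp"
    [ -f "$readme" ] && cat "$readme" >> "$tmp"
    mv "$tmp" "$readme"
}
```

From /root/code it could be called as ensure_heading fraud-detection/README.md. Running it twice does not add the heading a second time.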
    
    
  2. Compare the existing layout with the required structure:

    • The raw and processed subdirectories are missing under data/.
    • The tests and configs directories are also missing.
    • Create them with the following commands, run from /root/code:
     mkdir -p fraud-detection/data/{raw,processed}
     mkdir -p fraud-detection/{tests,configs}
    
  3. In my case, two directories under src/ had the wrong names (feature and util). Rename them:

     mv fraud-detection/src/feature fraud-detection/src/features
     mv fraud-detection/src/util fraud-detection/src/utils
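
One caveat with a plain mv: if the target directory already exists, mv moves the source inside it (for example src/features/feature) instead of renaming it. The sketch below guards against that case. The function name rename_if_needed is hypothetical, and the paths assume the commands are run from /root/code.

```shell
# rename_if_needed is a hypothetical helper name, not part of the original solution.
rename_if_needed() {
    old="$1"
    new="$2"
    if [ -d "$old" ] && [ ! -e "$new" ]; then
        # Safe case: the misnamed directory exists and the target name is free.
        mv "$old" "$new"
    elif [ -d "$old" ]; then
        # Both exist: refuse to nest one inside the other.
        echo "both $old and $new exist; merge them by hand" >&2
        return 1
    fi
}

rename_if_needed fraud-detection/src/feature fraud-detection/src/features
rename_if_needed fraud-detection/src/util fraud-detection/src/utils
```

When the misnamed directory is absent, the function does nothing, so it is safe to run even if your copy of the project has different mistakes.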
    
  4. For requirement 3, check that every subdirectory under src/ contains an __init__.py file. If any are missing, create them with the matching commands below:

     touch fraud-detection/src/data/__init__.py
     touch fraud-detection/src/features/__init__.py
     touch fraud-detection/src/models/__init__.py
     touch fraud-detection/src/utils/__init__.py
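
The four touch commands can also be written as a loop over whatever subdirectories src/ contains. This is a sketch only: the function name add_init_files is hypothetical, and it assumes the directories have already been renamed as in the previous step.

```shell
# add_init_files is a hypothetical helper name, not part of the original solution.
add_init_files() {
    for dir in "$1"/*/; do
        # Skip the unexpanded pattern when the directory has no subdirectories.
        [ -d "$dir" ] || continue
        # touch creates an empty file, or only updates the timestamp of an existing one.
        touch "${dir}__init__.py"
    done
}

add_init_files fraud-detection/src
```

Because touch leaves the contents of an existing file alone, the loop does not overwrite any __init__.py that is already present.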
    
  5. Update requirements.txt so that it lists the required packages, one per line:

     echo -e "scikit-learn\npandas\nnumpy\nmlflow" > fraud-detection/requirements.txt
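
Before submitting, it can help to check the result against the requirements. The following is a rough sketch of such a check; the function name check_layout is hypothetical, and it only verifies that the required items are present. It does not detect extra directories (such as a leftover src/feature), so a final look at the tree by eye is still worthwhile.

```shell
# check_layout is a hypothetical helper name, not part of the original task.
check_layout() {
    project="$1"
    rc=0

    # Required directories from the target tree.
    for dir in data/raw data/processed models notebooks \
               src/data src/features src/models src/utils tests configs; do
        [ -d "$project/$dir" ] || { echo "missing directory: $dir"; rc=1; }
    done

    # Each package under src/ needs an __init__.py.
    for pkg in data features models utils; do
        [ -f "$project/src/$pkg/__init__.py" ] || { echo "missing file: src/$pkg/__init__.py"; rc=1; }
    done

    # requirements.txt must list each dependency on a line of its own.
    for dep in scikit-learn pandas numpy mlflow; do
        grep -qxF "$dep" "$project/requirements.txt" 2>/dev/null || { echo "missing dependency: $dep"; rc=1; }
    done

    # README.md must begin with the required heading.
    [ "$(head -n 1 "$project/README.md" 2>/dev/null)" = "# fraud-detection" ] || { echo "README.md does not begin with the heading"; rc=1; }

    return "$rc"
}
```

Run from /root/code as check_layout fraud-detection. It prints one line per problem found and returns a non-zero status if anything is missing.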