# Data Science Boilerplate
- What: A standardised project structure for doing and sharing data science work, laid out to encourage best practices.
- When: 2022
- Who: Me
# Usage
- Install cookiecutter: `pip install cookiecutter`
- Start a new project with `cookiecutter gh:andrewjkuo/ds-boilerplate`. You will be prompted to enter some configuration values.
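
The same template can also be generated from Python instead of the command line. This is a minimal sketch, assuming the `cookiecutter` package is installed. `create_project` is a hypothetical helper name, and any `extra_context` keys would have to match the template's own prompts, which are not listed here.

```python
TEMPLATE = "gh:andrewjkuo/ds-boilerplate"  # same template as the CLI command


def create_project(output_dir=".", no_input=False, extra_context=None):
    """Generate a project from the template and return the new project's path.

    create_project is a hypothetical helper, not part of the template itself.
    """
    # Imported lazily so this module loads even before cookiecutter is installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE,
        output_dir=output_dir,        # directory in which the project is created
        no_input=no_input,            # True skips prompts and uses template defaults
        extra_context=extra_context,  # optional dict overriding prompt values
    )
```

Calling `create_project(no_input=True)` would accept the template defaults without prompting, which may be convenient in scripts.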
# Project Structure
The directory structure of your new project looks like this:
├── Dockerfile <- Dockerfile to build a basic image.
├── Makefile <- Makefile with useful commands for project setup and running analysis.
├── README.md <- The top-level README for developers using this project.
├── conf
│ ├── base <- Configuration files that can be stored in source control.
│ └── local <- Local secrets and credentials that should not be stored in source control.
├── data
│ ├── 01_raw <- The original, immutable data dump.
│ ├── 02_intermediate <- Intermediate data that has been transformed.
│ ├── 03_model_input <- The final, canonical data sets for modeling.
│ └── 04_model_output <- Outputs from models (e.g. predictions).
├── models <- Trained and serialised models or model summaries.
├── notebooks <- Jupyter notebooks.
├── references <- Data dictionaries, manuals and all other explanatory materials.
├── requirements.txt <- The requirements file for reproducing the analysis environment.
├── setup.py <- Makes project pip installable so src can be imported.
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python package.
│ ├── data <- Scripts to download or generate data.
│ ├── features <- Scripts to turn raw data into features for modeling.
│ ├── model <- Scripts to train models and make predictions.
│ ├── utils <- Scripts with utility functions.
│ └── visualisation <- Scripts to create exploratory and results-oriented visualisations.
└── tests <- Tests for functions in src.
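
To confirm that a generated or freshly cloned project still matches this layout, a small check along the following lines may help. It is a sketch only: `EXPECTED_DIRS` is copied from the tree above, and `missing_dirs` is a hypothetical helper that the template does not ship.

```python
from pathlib import Path

# Directory names taken from the project tree (files are not checked).
EXPECTED_DIRS = [
    "conf/base",
    "conf/local",
    "data/01_raw",
    "data/02_intermediate",
    "data/03_model_input",
    "data/04_model_output",
    "models",
    "notebooks",
    "references",
    "src/data",
    "src/features",
    "src/model",
    "src/utils",
    "src/visualisation",
    "tests",
]


def missing_dirs(project_root):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list means every directory is present. Because Git does not track empty directories, a check like this can be useful after cloning a project whose data folders were never populated.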
# GitHub Repository
https://github.com/andrewjkuo/ds-boilerplate