# Data Science Boilerplate
- What: A standardised project structure for doing and sharing data science work that enforces best practices.
- When: 2022
- Who: Me
- Install cookiecutter.
  ```
  pip install cookiecutter
  ```
- Start a new project by running `cookiecutter` with the location of this template (a local path or Git repository URL). You will be prompted to enter some configuration values.
# Project Structure
The directory structure of your new project looks like this:
```
├── Dockerfile            <- Dockerfile to build a basic image.
├── Makefile              <- Makefile with useful commands for project setup and running analysis.
├── README.md             <- The top-level README for developers using this project.
├── conf
│   ├── base              <- Configuration files that can be stored in source control.
│   └── local             <- Local secrets and credentials that should not be stored in source control.
├── data
│   ├── 01_raw            <- The original, immutable data dump.
│   ├── 02_intermediate   <- Intermediate data that has been transformed.
│   ├── 03_model_input    <- The final, canonical data sets for modeling.
│   └── 04_model_output   <- Outputs from models (e.g. predictions).
├── models                <- Trained and serialised models or model summaries.
├── notebooks             <- Jupyter notebooks.
├── references            <- Data dictionaries, manuals and all other explanatory materials.
├── requirements.txt      <- The requirements file for reproducing the analysis environment.
├── setup.py              <- Makes project pip installable so src can be imported.
├── src                   <- Source code for use in this project.
│   ├── __init__.py       <- Makes src a Python module.
│   ├── data              <- Scripts to download or generate data.
│   ├── features          <- Scripts to turn raw data into features for modeling.
│   ├── model             <- Scripts to train models and make predictions.
│   ├── utils             <- Scripts with utility functions.
│   └── visualisation     <- Scripts to create exploratory and results-oriented visualisations.
└── tests                 <- Tests for functions in src.
```
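The `conf/base` and `conf/local` split implies a layering rule: shared settings live in `base`, and anything in `local` (secrets, machine-specific values) overrides them. The sketch below shows one way code under `src/` might apply that rule and build paths into the numbered `data/` layers. It is a hypothetical helper, not part of the template: the function names `load_config`, `deep_merge` and `data_path`, and the assumption that settings live in JSON files named `parameters.json`, are all illustrative (projects using this layout often store configuration as YAML instead).

```python
"""Hypothetical helper (for example src/utils/paths.py) for the layout above.

Assumption: configuration is stored as JSON; the template itself may use YAML.
"""
import json
from pathlib import Path

# Numbered data layers from the directory tree, in pipeline order.
DATA_LAYERS = ("01_raw", "02_intermediate", "03_model_input", "04_model_output")


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(project_root: Path, name: str = "parameters.json") -> dict:
    """Load conf/base/<name>, then overlay conf/local/<name> if it exists."""
    config: dict = {}
    for env in ("base", "local"):  # later environments override earlier ones
        path = project_root / "conf" / env / name
        if path.is_file():
            config = deep_merge(config, json.loads(path.read_text()))
    return config


def data_path(project_root: Path, layer: str, filename: str) -> Path:
    """Build a path inside one of the numbered data layers, rejecting unknown layers."""
    if layer not in DATA_LAYERS:
        raise ValueError(f"unknown data layer {layer!r}; expected one of {DATA_LAYERS}")
    return project_root / "data" / layer / filename
```

With only `conf/base/parameters.json` present, `load_config(root)` returns the shared settings unchanged; adding a `conf/local/parameters.json` overrides individual keys without copying the whole file. Keeping `conf/local` out of version control (for example through `.gitignore`) is what keeps credentials out of the repository.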