Milestone 4

Overall project summary

In this course you will work in assigned teams of three or four (see group assignments in Canvas) to answer a predictive question using a publicly available data set suited to that question. You will perform a complete data analysis in R and/or Python, from data import to communication of results, while placing significant emphasis on reproducible and trustworthy workflows.

Your data analysis project will evolve throughout the course from a single, monolithic Jupyter notebook to a fully reproducible and robust data analysis project, comprised of:

An example final project from another course (where the project is similar) can be seen here: Breast Cancer Predictor

Milestone 4 summary

In this final project milestone, you will:

Final project specifics

1. Abstract project functions to their own software package

To further refine your project, you are going to split the code for your analysis into two version control repositories. One will hold your analysis; the other will be a separate repository that packages up the functions you wrote in milestone 3. Decoupling these two code bases will make it easier for you, and for others who might find them useful, to reuse those functions in future projects.
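As a rough illustration of this split, the sketch below shows a function that might live in the package repository and how the analysis code would then call it. The function `count_classes`, the package name `pkgname`, and the module path are all hypothetical; your own functions from milestone 3 will differ.

```python
from collections import Counter

# Hypothetical function that would live in the package repository,
# for example in src/pkgname/summaries.py (name and path are assumptions).
def count_classes(labels):
    """Return a dict mapping each class label to how often it occurs."""
    if labels is None:
        raise TypeError("labels must be an iterable, not None")
    return dict(Counter(labels))

# In the analysis repository, the local definition above would be removed
# and replaced by an import from the installed package, for example:
#   from pkgname.summaries import count_classes
counts = count_classes(["benign", "malignant", "benign"])
print(counts)
```

Once the function is installed as a package, the analysis repository no longer needs its own copy of the source, only the import and a pinned dependency on the package.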

Your new version control repository for your packaged functions should:

You should update the analysis code in your data analysis repository so that:

2. Address any feedback received in earlier milestones and during peer review

Revise your data analysis project to address feedback received from the DSCI 310 teaching team on past milestones, as well as feedback received from the peer review. 50% of your final grade for this milestone will be based on whether you have addressed this feedback to improve your project.

To help us easily, and correctly, assess this, please create a GitHub issue in your analysis project’s GitHub repository with the title “Feedback addressed”. In this issue describe any improvements you made to the project based on feedback, and point to evidence of these improvements. You can point us to evidence of addressing it by providing URLs to reference specific lines of code, commit messages, pull requests, etc. Be sure to add some narration when sharing these URLs so that it is easy for us to identify which changes to your work addressed which pieces of feedback.

You will be graded on a sliding scale for this: the more improvements you make, the higher your grade for this part of the milestone will be. At a minimum, there should be one improvement per team member, and each should significantly improve the project. This minimum could earn at most 37.5/50. To earn more, you need to exceed these minimum improvements.

Submission Instructions

Just before you submit milestone 4, create a release on your analysis project repository on GitHub and name it something like 3.0.0 (how to create a release). This release allows us and you to easily jump to the state of your repository at the time of submission for grading purposes, while you continue to work on your project for the next milestone.

Also create a release on your software package repository (do this manually if you are not using an automated tool for this in your continuous deployment).
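If you prefer the command line to the GitHub web interface, the release step can be sketched as below. This is only one possible approach: it assumes the GitHub CLI (`gh`) is installed and authenticated, and the helper names, release title, and notes text are made up for illustration. By default it only prints the command rather than running it.

```python
import re
import subprocess

# Accept simple MAJOR.MINOR.PATCH version names such as 3.0.0.
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")

def release_command(version, title):
    """Build the `gh release create` command for a given version (hypothetical helper)."""
    if not VERSION_PATTERN.match(version):
        raise ValueError(f"expected a version like 3.0.0, got {version!r}")
    notes = f"State of the repository at submission ({version})."
    return ["gh", "release", "create", version, "--title", title, "--notes", notes]

def create_release(version, title, dry_run=True):
    """Print the command (dry run) or execute it inside the repository."""
    cmd = release_command(version, title)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd

# Dry run: shows what would be executed without touching GitHub.
create_release("3.0.0", "Milestone 4 submission")
```

Run it once in each repository (analysis project and software package) with `dry_run=False` to actually create the releases.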

You will submit a PDF to Gradescope for milestone 4 that includes:

  1. the URL of your analysis project’s GitHub.com repository
  2. the URL of a GitHub release of your analysis project’s GitHub.com repository
  3. the URL to the “Feedback addressed” issue that outlines the feedback you have addressed
  4. the URL of your software package’s GitHub.com repository
  5. the URL of a GitHub release of your software package’s GitHub.com repository

Expectations