Milestone 3

Overall project summary

In this course you will work in assigned teams of three or four (see group assignments in Canvas) to answer a predictive question using a publicly available data set suited to that question. To answer this question, you will perform a complete data analysis in R and/or Python, from data import to communication of results, while placing significant emphasis on reproducible and trustworthy workflows.

Your data analysis project will evolve throughout the course from a single, monolithic Jupyter notebook to a fully reproducible and robust data analysis project, composed of:

An example final project from another course with a similar project can be seen here: Breast Cancer Predictor

Milestone 3 summary

In this milestone, you will:

  1. Abstract some code from your scripts to functions in a separate file, and write tests for those functions

  2. Continue to manage issues professionally

1. Abstract some code from your scripts to functions in a separate file, and write tests for those functions

In every data science project, some code is repetitive, and other code may not be repetitive in the current project but would likely be very useful in related future projects. It is well worth abstracting such code into functions, making it easily reusable in future work. Examples of code that is often repetitive in data analysis projects:

Abstracting our analysis code into functions also makes it testable, meaning you can assess whether your code works as expected. This alone is reason enough to use functions in your analysis code.
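As a minimal sketch, assuming a Python project, here is how a piece of data-cleaning code that is often copied between scripts could be pulled into a function. The name `clean_column_names` and its behaviour are hypothetical illustrations, not part of the course materials. Once the logic lives in a function, a single assertion is enough to check that it behaves as expected.

```python
import re


def clean_column_names(names):
    """Convert raw column names to lowercase snake_case.

    Parameters
    ----------
    names : list of str
        Column names as they appear in the raw data file.

    Returns
    -------
    list of str
        Cleaned column names, in the same order as ``names``.
    """
    cleaned = []
    for name in names:
        # Replace every run of non-alphanumeric characters with one underscore
        snake = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
        # Drop leading/trailing underscores and lowercase the result
        cleaned.append(snake.strip("_").lower())
    return cleaned


# Because the logic is now a function, its behaviour can be checked directly
assert clean_column_names(["Mean Radius", "  area (mm^2) "]) == [
    "mean_radius",
    "area_mm_2",
]
```

In a real project the assertion would move into the test suite rather than sit next to the function.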

Your job here is to create at least one function per group member (i.e., at least 3-4 functions) from your scripts. When doing this task, follow the workflow for writing functions and tests for data science; remember that this process will include:

If you are using R, these functions will live in a .R file named after the function (or functions). It is OK to have one function per file, or all functions in one file. This/these file(s) will live in a sub-directory called R. If you are using Python, these functions will live in a .py file named after the function (or functions). Again, it is OK to have one function per file, or all functions in one file. This/these file(s) will live in a sub-directory called src.

You will source (in the case of R) or import (in the case of Python) these functions in your scripts to use them in your analysis. Tests will live in a test directory, with files/subdirectories organized as per the testing framework you are using. If you are using R for your data analysis code, we expect you to use the testthat R package framework for writing software tests. If you are using Python, we expect you to use the pytest Python package framework.
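A minimal pytest sketch, assuming Python: pytest's default discovery collects files named `test_*.py` and, inside them, functions whose names start with `test_`, each using a plain `assert`. The function `count_classes`, the file paths, and the import line shown in the comment are hypothetical. To keep the sketch runnable on its own, the function is defined inline instead of being imported from `src`.

```python
from collections import Counter


# In the project this function would live in src/count_classes.py, and the
# test file would import it with: from src.count_classes import count_classes
def count_classes(labels):
    """Return a dict mapping each class label to how many times it occurs."""
    return dict(Counter(labels))


# tests/test_count_classes.py: pytest collects functions named test_*
def test_count_classes_counts_each_label():
    assert count_classes(["benign", "malignant", "benign"]) == {
        "benign": 2,
        "malignant": 1,
    }


def test_count_classes_empty_input_returns_empty_dict():
    assert count_classes([]) == {}


def test_count_classes_single_label():
    assert count_classes(["benign"]) == {"benign": 1}
```

Running `pytest` from the project root would then find and run every test file in the test directory. Covering an expected case, an edge case (empty input), and a simple case is one reasonable way to structure a small test suite.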

Of course, if it makes sense to have more than 3-4 functions, you are welcome to write more! However, all functions must meet the same standards with regard to software robustness. Your functions will be assessed on their quality (e.g., each function should do one thing and generally return an object, unless it was specifically designed for side effects), usability, readability (follow the tidyverse style guide for R, or the black style guide for Python), documentation, and the quality of the test suite.
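The sketch below, assuming Python, illustrates several of these quality criteria in one hypothetical function, `proportion_missing`: it does one thing, returns a value instead of printing it, documents its parameters and return value in a numpydoc-style docstring, and fails loudly on input for which the result is undefined. The treatment of `None` and float NaN as "missing" is an assumption chosen for illustration.

```python
import math


def proportion_missing(values):
    """Compute the fraction of entries in ``values`` that are missing.

    A value counts as missing if it is ``None`` or a float NaN.

    Parameters
    ----------
    values : list
        The values from a single column.

    Returns
    -------
    float
        A number between 0 and 1.

    Raises
    ------
    ValueError
        If ``values`` is empty, since the proportion is undefined.
    """
    if len(values) == 0:
        raise ValueError("`values` must contain at least one element")
    n_missing = sum(
        1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
    )
    # Return the result instead of printing it, so callers and tests can use it
    return n_missing / len(values)
```

Keeping the computation free of side effects means the calling script decides whether to print, plot, or save the result, and the function can be tested with ordinary assertions.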

2. Continue to manage issues professionally

Continue managing issues effectively through project boards and milestones, making it clear who is responsible for what and which project milestone each task is associated with. In particular, create an issue for each task and/or sub-task needed for this milestone. Each of these issues must be assigned to a single person on the team. We want all of you to get coding experience in the project, and each team member should be responsible for an approximately equal portion of the code.

Submission Instructions

You will submit two URLs to Canvas in the provided text box for milestone 3:

  1. the URL of your project’s GitHub.com repository
  2. the URL of a GitHub release of your project’s GitHub.com repository for this milestone.

Creating a release on GitHub.com

Just before you submit milestone 3, create a release on your project repository on GitHub and name it something like 2.0.0 (how to create a release). This release allows both us and you to easily jump to the state of your repository at the time of submission for grading purposes, while you continue to work on your project for the next milestone.

Expectations