Wrap Up
Reproducible and Trustworthy Workflows for Data Science
Finally, we’ll wrap up with how we manage the actual codebase surrounding our work.
21 Deploying and publishing packages
22 Copyright and licenses