Reproducible and Trustworthy Workflows for Data Science
Wrap Up
Finally, we’ll wrap up with how we manage the actual codebase surrounding our work.