Reproducible and Trustworthy Workflows for Data Science
Course notes
How do reproducible and trustworthy workflows impact data science?
Introduction to the Bash Shell
SSH for authentication
Version control (for transparency and collaboration) I
Version control (for transparency and collaboration) II
Project management using GitHub
Filenames and data science project organization, Integrated development environments
Virtual environments
Conda lock: reproducible lock files for conda environments
Introduction to containerization
Using and running containers
Customizing and building containers
Data validation
Non-interactive scripts
Reproducible reports
Data analysis pipelines
Introduction to testing code for data science
Packaging and documenting code
Automated testing and continuous integration
Deploying and publishing packages
Copyright and licenses
Workflows for reproducible and trustworthy data science wrap-up
Appendix - Reproducible reports
Appendix - Defining functions
Appendix - Defining functions in Python
Index