Continuous Integration
Reproducible and Trustworthy Workflows for Data Science
Welcome
Introduction
1
How do reproducible and trustworthy workflows impact data science?
2
Introduction to the Bash Shell
Version Control
3
SSH for authentication
4
Version control (for transparency and collaboration) I
5
Version control (for transparency and collaboration) II
6
Project management using GitHub
Projects, Environments, and Containers
7
Filenames and data science project organization, Integrated development environments
8
Virtual environments
9
Conda lock: reproducible lock files for conda environments
10
Introduction to containerization
11
Using and running containers
12
Customizing and building containers
Data Validation
13
Data validation
Automation
14
Non-interactive scripts
15
Reproducible reports
16
Data analysis pipelines with scripts
17
Data analysis pipelines with GNU Make
Software Testing for Data Science
18
Introduction to testing code for data science
19
Testing functions in R with testthat
20
Testing functions in Python with pytest
21
Testing: Images
22
Evaluating test suite quality
Packaging
23
Introduction to R & Python packages
24
Packaging: Python
25
Packaging: R
26
Packaging: Conclusion
27
Package Testing with Python pytest
28
Code Coverage: R
29
Code Coverage: Python
Continuous Integration
30
Automated testing and continuous integration
31
Case Study: pypkgs-cookiecutter’s ci.yml workflow
32
Case study: a simplified version of the R check-release.yaml workflow
33
Package Documentation: Python
34
Package Documentation: R
Continuous Deployment
35
Deploying and publishing packages
36
Continuous Deployment: Python
37
What about CD with R packages
38
Peer review facilitates package publishing
Wrap Up
39
Copyright and licenses
40
Workflows for reproducible and trustworthy data science wrap-up
Appendix
41
Defining functions in Python
42
Defining functions in R
43
Reproducible reports