Continuous Deployment
Reproducible and Trustworthy Workflows for Data Science
Welcome
Introduction
1
How do reproducible and trustworthy workflows impact data science?
2
Introduction to the Bash Shell
Version Control
3
SSH for authentication
4
Version control (for transparency and collaboration) I
5
Version control (for transparency and collaboration) II
6
Project management using GitHub
Projects, Environments, and Containers
7
Filenames and data science project organization, Integrated development environments
8
Virtual environments
9
R Environments: renv
10
Python Environments: conda
11
Conda lock: reproducible lock files for conda environments
12
Introduction to containerization
13
Using and running containers
14
docker-compose
15
Customizing and building containers
Data Validation
16
Data validation
17
R Data Validation: pointblank
18
Python Data Validation: Pandera
19
Python Data Validation: deepchecks
Automation
20
Non-interactive scripts
21
Reproducible reports
22
Data analysis pipelines with scripts
23
Data analysis pipelines with GNU Make
Software Testing for Data Science
24
Introduction to testing code for data science
25
Testing functions in R with testthat
26
Testing functions in Python with pytest
27
Testing: Images
28
Evaluating test suite quality
Packaging
29
Introduction to R & Python packages
30
Packaging: Python
31
Packaging: R
32
Packaging: Conclusion
33
Package Testing with Python pytest
34
Code Coverage: R
35
Code Coverage: Python
Continuous Integration
36
Automated testing and continuous integration
37
GitHub Actions
38
Actions: Matrix Workflows
39
Actions: Testing Workflows
40
Case Study: pypkgs-cookiecutter’s ci.yml workflow
41
Case study: a simplified version of the R check-release.yaml workflow
42
Package Documentation: Python
43
Package Documentation: R
Continuous Deployment
44
Deploying and publishing packages
45
Continuous Deployment: Python
46
What about CD with R packages?
47
Peer review facilitates package publishing
Wrap Up
48
Copyright and licenses
49
Workflows for reproducible and trustworthy data science wrap-up
Appendix
50
Defining functions in Python
51
Defining functions in R
52
Reproducible reports