Continuous Deployment
Reproducible and Trustworthy Workflows for Data Science
Welcome
Introduction
1
How do reproducible and trustworthy workflows impact data science?
2
Introduction to the Bash Shell
Version Control
3
SSH for authentication
4
Version control (for transparency and collaboration) I
5
Version control (for transparency and collaboration) II
6
Project management using GitHub
Projects, Environments, and Containers
7
Filenames and data science project organization, Integrated development environments
8
Virtual environments
9
R Environments: renv
10
Python Environments: conda
11
Conda lock: reproducible lock files for conda environments
12
Introduction to containerization
13
Using and running containers
14
docker-compose
15
Customizing and building containers
Data Validation
16
Data validation
17
R Data Validation: pointblank
18
Python Data Validation: Pandera
19
Python Data Validation: deepchecks
20
Python Data Validation: pointblank
Automation
21
Non-interactive scripts
22
Reproducible reports
23
Data analysis pipelines with scripts
24
Data analysis pipelines with GNU Make
Software Testing for Data Science
25
Introduction to testing code for data science
26
Testing functions in R with testthat
27
Testing functions in Python with pytest
28
Testing: Images
29
Evaluating test suite quality
Packaging
30
Introduction to R & Python packages
31
Packaging: Python
32
Packaging: R
33
Packaging: Conclusion
34
Package Testing with Python pytest
35
Code Coverage: R
36
Code Coverage: Python
Continuous Integration
37
Automated testing and continuous integration
38
GitHub Actions
39
Actions: Matrix Workflows
40
Actions: Testing Workflows
41
Case Study: pypkgs-cookiecutter’s ci.yml workflow
42
Case study: a simplified version of the R check-release.yaml workflow
43
Package Documentation: Python
44
Package Documentation: R
Continuous Deployment
45
Deploying and publishing packages
46
Continuous Deployment: Python
47
What about CD with R packages?
48
Peer review facilitates package publishing
Wrap Up
49
Copyright and licenses
50
Workflows for reproducible and trustworthy data science wrap-up
Appendix
51
Defining functions in Python
52
Defining functions in R
53
Reproducible reports
54
Quarto and GitHub Pages