Reproducible and Trustworthy Workflows for Data Science
46 Package Documentation: Python - quartodoc
46.1 Learning objectives
Generate well-formatted function and package-level documentation for Python packages using quartodoc
46.2 Quartodoc
https://machow.github.io/quartodoc
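As a minimal sketch of the kind of input quartodoc works from: it builds reference pages from the docstrings in a Python package, and numpydoc-style docstrings are a format it can parse. The function `count_classes` below is hypothetical, chosen only for illustration, and is not part of the course materials.

```python
def count_classes(labels):
    """Count how many times each class label appears.

    Parameters
    ----------
    labels : list of str
        Class labels, one per observation.

    Returns
    -------
    dict
        Mapping from each label to its number of occurrences.

    Examples
    --------
    >>> count_classes(["a", "b", "a"])
    {'a': 2, 'b': 1}
    """
    # Hypothetical example function; the docstring is the part that a
    # documentation generator would read and render.
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Assuming the function lives in an installed package that is listed under the `quartodoc` section of the project's `_quarto.yml`, running `quartodoc build` followed by `quarto preview` should generate and display a reference page for it. The quartodoc site linked above describes the exact configuration.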