35 Deploying and publishing packages
35.1 Topic learning objectives
By the end of this topic, students should be able to:
- Define continuous deployment and argue the costs and benefits of continuous deployment
- Explain why continuous deployment is superior to manually deploying software
- Store and use GitHub Actions credentials safely via GitHub Secrets
- Use GitHub Actions to set-up automated deployment of Python packages upon push to the main branch
- Explain semantic versioning, and define what constitutes patch, minor, major and breaking changes
- Write conventional commit messages that are useful for semantic release
- Publish Python packages to test PyPI
- Publish R packages to GitHub, document how to install them via
devtools::install_github
35.2 Continuous Deployment (CD)
Defined as the practice of automating the deployment of software that has successfully run through your test-suite.
For example, upon merging a pull request to master, an automation process builds the Python package and publishes to PyPI without further human intervention.
35.2.1 Why use CD?
- little to no effort in deploying new version of the software allows new features to be rolled out quickly and frequently
- also allows for quick implementation and release of bug fixes
- deployment can be done by many contributors, not just one or two people with a high level of Software Engineering expertise
35.2.2 Why use CD?
Perhaps this story is more convincing:
The company, let’s call them ABC Corp, had 16 instances of the same software, each as a different white label hosted on separate Linux machines in their data center. What I ended up watching (for 3 hours) was how the client remotely connected to each machine individually and did a “capistrano deploy”. For those unfamiliar, Capistrano is essentially a scripting tool which allows for remote execution of various tasks. The deployment process involved running multiple commands on each machine and then doing manual testing to make sure it worked.
The best part was that this developer and one other were the only two in the whole company who knew how to run the deployment, meaning they were forbidden from going on vacation at the same time. And if one of them was sick, the other had the responsibility all for themselves. This deployment process was done once every two weeks.
Source: Tylor Borgeson
Infrequent & manual deployment makes me feel like this when it comes time to do it:
and so it can become a viscious cycle of delaying deployment because its hard, and then making it harder to do deployment because a lot of changes have been made since the last deployment…
So to avoid this, we are going to do continuous deployment when we can! And where we can’t, we will automate as much as we can up until the point where we need to manually step in.
35.3 Examples of CD being used for data science
Python packages (but not R, more on this later…)
Package documentation (e.g.,
pkgdown
websites for R, ReadtheDocs websites for Python)Books and websites (e.g.,
jupyter-book
,bookdown
,distill
websites, etc)
35.4 Conditionals for when to run the job
We only want our cd
job to run if certain conditions are true, these are:
if the
ci
job passesif this is a commit to the
main
branch
We can accomplish this in our cd
job be writing a conditional using the needs
and if
keywords at the top of the job, right after we set the permissions:
cd:
permissions:
id-token: write
contents: write
# Only run this job if the "ci" job passes
needs: ci
# Only run this job if new work is pushed to "main"
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
35.4.1 Exercise: read the cd
job of ci-cd.yml
To make sure we understand what is happening in our workflow that performs CD, let’s convert each step to a human-readable explanation:
Sets up Python on the runner
Checkout our repository files from GitHub and put them on the runner
…
…
…
…
…
Note: I filled in the steps we went over last class, so you can just fill in the new stuff