35  Deploying and publishing packages

35.1 Topic learning objectives

By the end of this topic, students should be able to:

  1. Define continuous deployment and argue the costs and benefits of continuous deployment
  2. Explain why continuous deployment is superior to manually deploying software
  3. Store and use GitHub Actions credentials safely via GitHub Secrets
  4. Use GitHub Actions to set up automated deployment of Python packages upon push to the main branch
  5. Explain semantic versioning, and define what constitutes patch, minor, major and breaking changes
  6. Write conventional commit messages that are useful for semantic release
  7. Publish Python packages to test PyPI
  8. Publish R packages to GitHub and document how to install them via devtools::install_github

35.2 Continuous Deployment (CD)

Continuous deployment is the practice of automating the deployment of software that has successfully passed your test suite.

For example, upon merging a pull request to main, an automated process builds the Python package and publishes it to PyPI without further human intervention.

35.2.1 Why use CD?

  • little to no effort in deploying new versions of the software allows new features to be rolled out quickly and frequently
  • also allows for quick implementation and release of bug fixes
  • deployment can be done by many contributors, not just one or two people with a high level of Software Engineering expertise

35.2.2 Why use CD?

Perhaps this story is more convincing:

The company, let’s call them ABC Corp, had 16 instances of the same software, each as a different white label hosted on separate Linux machines in their data center. What I ended up watching (for 3 hours) was how the client remotely connected to each machine individually and did a “capistrano deploy”. For those unfamiliar, Capistrano is essentially a scripting tool which allows for remote execution of various tasks. The deployment process involved running multiple commands on each machine and then doing manual testing to make sure it worked.

The best part was that this developer and one other were the only two in the whole company who knew how to run the deployment, meaning they were forbidden from going on vacation at the same time. And if one of them was sick, the other had the responsibility all for themselves. This deployment process was done once every two weeks.

Source: Tylor Borgeson

Infrequent & manual deployment makes me feel like this when it comes time to do it:

and so it can become a vicious cycle of delaying deployment because it's hard, and then making deployment harder because a lot of changes have been made since the last deployment…

So to avoid this, we are going to do continuous deployment when we can! And where we can’t, we will automate as much as we can up until the point where we need to manually step in.

35.3 Examples of CD being used for data science

35.4 Conditionals for when to run the job

We only want our cd job to run if two conditions are true:

  1. if the ci job passes

  2. if this is a commit to the main branch

We can accomplish this in our cd job by writing a conditional using the needs and if keywords at the top of the job, right after we set the permissions:

cd:
    permissions:
      id-token: write
      contents: write

    # Only run this job if the "ci" job passes
    needs: ci

    # Only run this job if new work is pushed to "main"
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
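
To check our understanding of how these two gates combine, here is a minimal Python sketch of the same logic. It is only an illustration, not how GitHub Actions is implemented: the function name should_run_cd and its parameters are hypothetical, and it assumes the ci job reports its result as the string "success" when it passes.

```python
def should_run_cd(ci_result: str, event_name: str, ref: str) -> bool:
    """Mirror the two gates on the cd job (illustrative only).

    needs: ci  -> the ci job must have finished successfully
    if: ...    -> the triggering event must be a push to the main branch
    """
    # Gate 1: the job we depend on must have passed.
    ci_passed = ci_result == "success"

    # Gate 2: same expression as the `if` line in the workflow.
    pushed_to_main = event_name == "push" and ref == "refs/heads/main"

    # Both gates must hold for the cd job to run.
    return ci_passed and pushed_to_main


if __name__ == "__main__":
    # A push to main with passing tests: deploy.
    print(should_run_cd("success", "push", "refs/heads/main"))
    # A pull request run: tests may pass, but we do not deploy.
    print(should_run_cd("success", "pull_request", "refs/pull/1/merge"))
    # A push to main with failing tests: we do not deploy.
    print(should_run_cd("failure", "push", "refs/heads/main"))
```

The point to notice is that the two conditions are joined with "and": a passing test suite on a pull request does not trigger a deployment, and neither does a push to main whose tests fail.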

35.4.1 Exercise: read the cd job of ci-cd.yml

To make sure we understand what is happening in our workflow that performs CD, let’s convert each step to a human-readable explanation:

  1. Sets up Python on the runner

  2. Checks out our repository files from GitHub and puts them on the runner

Note: I filled in the steps we went over last class, so you can just fill in the new stuff

35.4.2 Demo of Continuous Deployment!