18  Introduction to testing code for data science

Learning Objectives

  1. Fully specify a function and write comprehensive tests against the specification.
  2. Reproducibly generate test data (e.g., data frames, models, plots).
  3. Discuss the observability of unit outputs in data science (e.g., plot objects), and how this should be taken into account when designing software.
  4. Explain why test-driven development (TDD) affords testability
  5. Use exceptions when writing code.
  6. Test if a function is throwing an exception when it should, and that is does not do so when it shouldn’t.
  7. Evaluate test suite quality.

How do we ensure our code works as expected? That the code outputs are what we expect, and our code handles user errors in expected ways. We test it! Tests help to increase the robustness and trustworthiness of your code. In this chapter we will introduce formal software testing concepts and practices that are applicable to data science.

The first concept we will consider is testability. Testability is defined as the degree to which a system or component facilitates the establishment of test objectives and the execution of tests to determine whether those objectives have been achieved. In order to be successful, a test needs to be able to execute the code you wish to test, in a way that can trigger a defect that will propagate an incorrect result to a program point where it can be checked against the expected behaviour. From this, we can derive four high-level properties required for effective test writing and execution. These are controllability, observability, isolateablilty, and automatability.

  • controllability: the code under test needs to be able to be programmatically controlled
  • observability: the outcome of the code under test needs to be able to be verified
  • isolateablilty: the code under test needs to be able to be validated on its own
  • automatability: the tests should be able to be executed automatically

Source: CPSC 310 & CPSC 410 class notes from Reid Holmes, UBC]

Functions are individual units of code that perform a specific task. They are useful for increasing code modularity - this helps with reusability and readability. Functions also are critical in ensuring our code is testable by facilitating that our code has the four high-level properties required for effective test writing and execution introduced above. Functions allow our code to be controlled and automated (we can execute it by calling the function) and if we write our functions using best practices (e.g., functions do just one and they typically return an object) then our code is also isolateable and observable.

When should you write a function? Is it just when you start re-writing code for the second or third time? No, actually most code you write should be a function, as that helps with testability. You can even write your tests before you write your function, you just have to have planned what function inputs and outputs to expect. Writing your tests before implementing your function can even improve your productivity (Erdogmus, Morisio, and Torchiano 2005)! In software engineering, writting your tests before your functions is called “Test Driven Development” (Becker 2002).

What kinds of tests do we write for our functions? I like to think about three broad categories of tests, and then write 2-3 tests for each (or more if the function is complex):

  1. Simple expected use cases
  2. Edge cases (unexpected, or rare use cases)
  3. Abnormal, error or adversarial use cases.

We will walk through some specific examples of each of these later in this chapter.

How many tests do we write? There is no concrete answer to this question. It is finding the right balance between having enough test cases that you cover a variety of the possible uses cases of your function, ensuring that the correct behaviour occurs in each of these, without spending too much time on writing tests. As with many functions, the possible ways a function could be used, and the many resultant outputs from it means that it is often impossible to test everything.

18.1 Workflow for writing functions and tests for data science

How should we get started writing functions and tests for data science? There are many ways one could proceed, however some paths will be more efficient and less error-prone, and more robust than others. Borrowing from software development best practices, one recommended workflow is shown below:

  1. Write the function specifications and documentation - but do not implement the function. This means that you will have an empty function, that specifies and documents what the name of the function is, what arguments it takes, and what it returns.

  2. Plan the test cases and document them. Your tests should assess whether the function works as expected when given correct inputs, as well as that it behaves as expected when given incorrect inputs (e.g., throws an error when the wrong type is given for an argument). For the cases of correct inputs, you will want to test the top, middle and bottom range of these, as well as all possible combinations of argument inputs possible. Also, the test data should be as simple and tractable as possible while still being able to assess your function.

  3. Create test data that is useful for assessing whether your function works as expected. In data science, you likely need to create both the data that you would provide as inputs to your function, as well as the data that you would expect your function to return.

  4. Write the tests to evaluate your function based on the planned test cases and test data.

  5. Implement the function by writing the needed code in the function body to pass the tests.

  6. Iterate between steps 2-5 to improve the test coverage and function.