Data validation
Topic learning objectives
By the end of this topic, students should be able to:
Explain why it is important to validate the data used in a data analysis project, and give examples of the consequences of analyzing invalid data.
Discuss where data validation should happen in a data analysis project.
List the major checks that should be performed when validating data for data analysis, and justify their use.
Use the Python Pandera package to define data schemas and validate data against them.
Use the Python Pandera package to drop invalid rows.
List other commonly used data validation packages for Python and R.