20  Python Data Validation: pointblank

This chapter walks through pointblank, a data validation package that originated as the {pointblank} R package and has been ported to Python: https://posit-dev.github.io/pointblank/

import pointblank as pb

small_table = pb.load_dataset(dataset="small_table")
small_table
shape: (13, 8)
date_time date a b c d e f
datetime[μs] date i64 str i64 f64 bool str
2016-01-04 11:00:00 2016-01-04 2 "1-bcd-345" 3 3423.29 true "high"
2016-01-04 00:32:00 2016-01-04 3 "5-egh-163" 8 9999.99 true "low"
2016-01-05 13:32:00 2016-01-05 6 "8-kdg-938" 3 2343.23 true "high"
2016-01-06 17:23:00 2016-01-06 2 "5-jdo-903" null 3892.4 false "mid"
2016-01-09 12:36:00 2016-01-09 8 "3-ldm-038" 7 283.94 true "low"
… … … … … … … …
2016-01-20 04:30:00 2016-01-20 3 "5-bce-642" 9 837.93 false "high"
2016-01-20 04:30:00 2016-01-20 3 "5-bce-642" 9 837.93 false "high"
2016-01-26 20:07:00 2016-01-26 4 "2-dmx-010" 7 833.98 true "low"
2016-01-28 02:51:00 2016-01-28 2 "7-dmx-010" 8 108.34 false "low"
2016-01-30 11:23:00 2016-01-30 1 "3-dka-303" null 2230.09 true "high"

20.1 Validation rules

Each validation method, such as col_vals_lt(), adds one step to a validation plan. Nothing is checked until .interrogate() is called, which is why the first two reports in this section state that no interrogation was performed.

pb.Validate(small_table).col_vals_lt(columns="a", value=10)
Pointblank Validation (No Interrogation Performed)
None
Polars
STEP  COLUMNS  VALUES  TBL  EVAL  UNITS  PASS  FAIL  W  E  C  EXT
1  col_vals_lt()  a  10  —
pb.Validate(small_table).col_vals_lt(columns="a", value=5)
Pointblank Validation (No Interrogation Performed)
None
Polars
STEP  COLUMNS  VALUES  TBL  EVAL  UNITS  PASS  FAIL  W  E  C  EXT
1  col_vals_lt()  a  5  —
validation = (
    pb.Validate(small_table)
    .col_vals_between(columns="d", left=0, right=5000)
    .col_vals_le(columns="c", value=5)
    .col_exists(columns=["date", "date_time"])
    .interrogate()
)

validation
Pointblank Validation
2025-03-21|11:38:22
Polars
STEP                  COLUMNS    VALUES     TBL  EVAL  UNITS  PASS       FAIL      W  E  C  EXT
1 col_vals_between()  d          [0, 5000]       ✓     13     12 (0.92)  1 (0.08)  —  —  —
2 col_vals_le()       c          5               ✓     13     5 (0.38)   8 (0.62)  —  —  —
3 col_exists()        date       —               ✓     1      1 (1.00)   0 (0.00)  —  —  —  —
4 col_exists()        date_time  —               ✓     1      1 (1.00)   0 (0.00)  —  —  —  —
2025-03-21 11:38:22 UTC | < 1 s | 2025-03-21 11:38:23 UTC
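
The UNITS, PASS, and FAIL figures in the report can be reproduced by hand. The following is a minimal plain-Python sketch, not pointblank's implementation. The helper `tally` is hypothetical, the values of `c` and `d` are copied from the `small_table` rows printed in this chapter, and the sketch assumes that a null value counts as a failure, which is consistent with the 8 failing units reported for column `c`.

```python
# Values of columns c and d from small_table, in row order.
# None stands for a null value.
c = [3, 8, 3, None, 7, 4, 3, 2, 9, 9, 7, 8, None]
d = [3423.29, 9999.99, 2343.23, 3892.4, 283.94, 3291.03, 843.34,
     1035.64, 837.93, 837.93, 833.98, 108.34, 2230.09]


def tally(values, check):
    """Return (units, n_pass, n_fail, frac_pass, frac_fail) for one step.

    Hypothetical helper. Assumption: a null (None) value fails the check.
    """
    units = len(values)
    n_pass = sum(1 for v in values if v is not None and check(v))
    n_fail = units - n_pass
    return units, n_pass, n_fail, round(n_pass / units, 2), round(n_fail / units, 2)


# Mirrors col_vals_between(columns="d", left=0, right=5000)
step_1 = tally(d, lambda v: 0 <= v <= 5000)
# Mirrors col_vals_le(columns="c", value=5)
step_2 = tally(c, lambda v: v <= 5)

print(step_1)
print(step_2)
```

Each row of the column is one test unit, so both value checks report 13 units. The col_exists() steps report a single unit each because they test the table's schema rather than individual rows.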

20.2 Post-interrogation

validation = (
    pb.Validate(small_table)
    .col_vals_between(columns="d", left=0, right=5000)
    .col_vals_le(columns="c", value=5)
    .col_exists(columns=["date", "date_time"])
    .interrogate()
)
validation.get_sundered_data(type="pass")
shape: (5, 8)
date_time date a b c d e f
datetime[μs] date i64 str i64 f64 bool str
2016-01-04 11:00:00 2016-01-04 2 "1-bcd-345" 3 3423.29 true "high"
2016-01-05 13:32:00 2016-01-05 6 "8-kdg-938" 3 2343.23 true "high"
2016-01-11 06:15:00 2016-01-11 4 "2-dhe-923" 4 3291.03 true "mid"
2016-01-15 18:46:00 2016-01-15 7 "1-knw-093" 3 843.34 true "high"
2016-01-17 11:27:00 2016-01-17 4 "5-boe-639" 2 1035.64 false "low"
validation.get_sundered_data(type="fail")
shape: (8, 8)
date_time date a b c d e f
datetime[μs] date i64 str i64 f64 bool str
2016-01-04 00:32:00 2016-01-04 3 "5-egh-163" 8 9999.99 true "low"
2016-01-06 17:23:00 2016-01-06 2 "5-jdo-903" null 3892.4 false "mid"
2016-01-09 12:36:00 2016-01-09 8 "3-ldm-038" 7 283.94 true "low"
2016-01-20 04:30:00 2016-01-20 3 "5-bce-642" 9 837.93 false "high"
2016-01-20 04:30:00 2016-01-20 3 "5-bce-642" 9 837.93 false "high"
2016-01-26 20:07:00 2016-01-26 4 "2-dmx-010" 7 833.98 true "low"
2016-01-28 02:51:00 2016-01-28 2 "7-dmx-010" 8 108.34 false "low"
2016-01-30 11:23:00 2016-01-30 1 "3-dka-303" null 2230.09 true "high"
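
The two outputs above split the 13 rows into 5 that pass and 8 that fail. The plain-Python sketch below illustrates that split under one assumption: a row lands in the "pass" piece only if it passes every row-wise step, and in the "fail" piece otherwise. This is an illustration of the idea, not pointblank's code. The function `passes_all` is hypothetical, the `(c, d)` pairs are copied from the tables in this chapter, and the col_exists() steps are left out because they check columns rather than rows.

```python
# (c, d) pairs from small_table, in row order. None stands for a null value.
rows = [
    (3, 3423.29), (8, 9999.99), (3, 2343.23), (None, 3892.4), (7, 283.94),
    (4, 3291.03), (3, 843.34), (2, 1035.64), (9, 837.93), (9, 837.93),
    (7, 833.98), (8, 108.34), (None, 2230.09),
]


def passes_all(row):
    """Hypothetical helper: True if the row passes both row-wise steps."""
    c, d = row
    d_ok = 0 <= d <= 5000               # step 1: d between 0 and 5000
    c_ok = c is not None and c <= 5     # step 2: c <= 5 (assumes null fails)
    return d_ok and c_ok


passed = [row for row in rows if passes_all(row)]
failed = [row for row in rows if not passes_all(row)]

print(len(passed), len(failed))
```

The row with `d = 9999.99` fails both steps, so it is counted once in the "fail" piece. That is why the 1 failure from step 1 and the 8 failures from step 2 add up to 8 failing rows rather than 9.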