Data Quality Checks That Actually Catch Real Problems (Not Just Nulls)
The dashboard is wrong, but it is wrong in a new and exciting way
Null checks are like smoke detectors that only beep when the battery is missing. Helpful, but not the kind of helpful you need when the kitchen is actively on fire.
Most analytics breakages I see are not "column is null." They are subtler. An upstream change causes silent undercounting. A join key stops matching. A backfill doubles yesterday. A filter starts including internal users. Your metric drifts and nobody notices until someone has a very awkward meeting.
You want checks that catch those.
TL;DR
- Null checks are necessary but not sufficient. Most real issues are freshness, duplication, join breakage, and semantic drift.
- Start with checks on the tables that power decisions, not every table in your warehouse.
- Monitor freshness, volume, uniqueness, referential integrity, and distribution drift.
- Add one "does the metric still make sense?" check for your top metrics.
- Build a simple incident loop: alert → triage → fix → prevent → document.
Why "basic tests" miss real failures
1) The data arrives, but it is incomplete
You get a successful pipeline run, but a source only delivered part of the day. Everything is "non-null," and everything is wrong.
How it shows up:
- Daily totals are down
- Segment breakdowns look weird
- Nobody trusts the dashboard for a few days, which kills momentum
2) Joins fail quietly
A join key changes format, a new ID system is introduced, or a dimension table lags. Your fact table is fine. Your final model is undercounted.
3) Duplicates appear (and revenue gets… creative)
Reprocessing without idempotency, retries at ingestion, or a new event emitter that fires twice. Your totals look great. Too great.
4) Semantics change while the schema stays the same
The column is still called status, but now it means something else. Your checks are green. Your meaning is red.
5) Business logic changes and nobody updates definitions
Refund logic shifts. Trial rules change. Internal users expand. Your metric logic becomes a fossil.
The checks that catch real problems
The best quality checks are the ones that align with how analytics fails in real life. Here are the highest ROI categories.
1) Freshness checks (are we late?)
What it is: Verify that key tables updated within an expected window.
- Check max(event_time) is within X hours of now
- Check "data delivered for yesterday" exists by a specific time
Small team version
One freshness check per critical table, daily.
Growing team version
Different SLAs per source, plus an "expected arrival calendar" for weekend/holiday patterns.
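As a concrete sketch: assuming you can already fetch max(event_time) from the table as a timezone-aware timestamp (the query itself is warehouse-specific and not shown), a freshness check reduces to a lag comparison.

```python
from datetime import datetime, timedelta, timezone

def freshness_ok(max_event_time: datetime, max_lag_hours: float) -> bool:
    """Return True if the most recent event is within the allowed lag window."""
    lag = datetime.now(timezone.utc) - max_event_time
    return lag <= timedelta(hours=max_lag_hours)

# Example: alert if the events table has nothing newer than 6 hours old
recent = datetime.now(timezone.utc) - timedelta(hours=1)
stale = datetime.now(timezone.utc) - timedelta(hours=12)
```

The SLA (6 hours here) is the part worth arguing about with the source owners, not the code.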
2) Volume anomaly checks (did something drop or spike?)
What it is: Compare today's row counts (or event counts) to a baseline.
- Day-over-day percent change thresholds
- Week-over-week comparisons (often more stable)
- Per-segment checks for critical segments
Why it catches real issues
If ingestion breaks or instrumentation drops, volume shifts are your early warning system.
Small team version
Alert if day-over-day changes exceed a threshold (example: ±30%). Tune later.
Growing team version
Use rolling baselines and per-segment rules for the handful of segments that matter.
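A minimal version of both flavors, assuming you keep a short history of daily counts per table (how you store that history is up to you):

```python
def rolling_baseline(counts: list[int], window: int = 7) -> float:
    """Mean of the trailing window, e.g. the last 7 daily counts."""
    recent = counts[-window:]
    return sum(recent) / len(recent)

def volume_anomaly(today: int, baseline: float, threshold: float = 0.30) -> bool:
    """Flag if today's count deviates from the baseline by more than
    the threshold fraction (0.30 = the ±30% starting point above)."""
    if baseline == 0:
        return today != 0  # any volume where there was none is itself anomalous
    return abs(today - baseline) / baseline > threshold
```

Start loose (±30%), watch the alerts for two weeks, then tighten per table.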
3) Uniqueness checks (are we duplicating?)
What it is: Ensure primary identifiers are unique where they should be.
- Event ID uniqueness in raw events
- Order ID uniqueness in facts
- One row per (user_id, date) in daily rollups
Common failure this catches
Retry loops and backfills that double-count.
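A sketch of the check, assuming rows come back as dicts (a warehouse query would normally do this with GROUP BY ... HAVING count > 1; this is the in-memory equivalent):

```python
from collections import Counter

def duplicate_keys(rows: list[dict], key_fields: list[str]) -> list[tuple]:
    """Return the key tuples that appear more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

Passing ["user_id", "date"] as key_fields covers the one-row-per-(user_id, date) rollup case with the same function.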
4) Referential integrity checks (did the join break?)
What it is: Make sure foreign keys still match dimension tables.
- % of fact rows with missing dimension matches
- Count of nulls after join
- Existence of new unmatched keys
This one is underrated.
A single broken join can nuke a metric while everything else looks fine.
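The check itself is cheap. A sketch, assuming you can pull the foreign keys from the fact table and the primary keys from the dimension:

```python
def unmatched_fraction(fact_keys: list, dim_keys: list) -> float:
    """Fraction of fact rows whose foreign key has no match in the dimension."""
    dims = set(dim_keys)
    if not fact_keys:
        return 0.0
    missing = sum(1 for k in fact_keys if k not in dims)
    return missing / len(fact_keys)
```

Alert when the fraction rises above a small tolerance (some orphan rate is often normal for lagging dimensions), and especially when it jumps.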
5) Distribution drift checks (did the shape change?)
What it is: Track distributions of key fields over time.
- Status breakdowns
- Plan types
- Country/region
- Device types
Why it matters
If a field's distribution shifts drastically, you might have a new emitter, a bug, or a semantics change.
Small team version
Monitor top 5 categories and "other."
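The top-5-plus-"other" approach can be sketched like this; the 10-point shift threshold is an illustrative starting value, not a recommendation:

```python
from collections import Counter

def category_shares(values: list, top_n: int = 5) -> dict:
    """Share of each of the top-N categories, remainder bucketed as 'other'."""
    counts = Counter(values)
    total = sum(counts.values())
    shares = {cat: n / total for cat, n in counts.most_common(top_n)}
    shares["other"] = 1 - sum(shares.values())
    return shares

def drift_alert(baseline: dict, current: dict, max_shift: float = 0.10) -> bool:
    """True if any category's share moved by more than max_shift (absolute)."""
    cats = set(baseline) | set(current)
    return any(abs(current.get(c, 0) - baseline.get(c, 0)) > max_shift
               for c in cats)
```

Compare today's shares against a weekly baseline rather than yesterday's, so normal day-of-week variation does not page you.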
6) "Semantic" checks for your most important metrics (does this still make sense?)
You need at least one check that asks: "Is the metric behaving plausibly?"
Examples (choose what fits your business):
- Conversion rate should not jump 3x overnight without a known change
- Paid subscriptions should not exceed total accounts
- Refunds should not exceed gross revenue
- Active users cannot exceed total users
These are not perfect, but they catch "obviously wrong" fast.
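In code, these are just assertions over a metrics snapshot. The metric names below are placeholders; swap in whatever your metrics layer actually exposes:

```python
def semantic_violations(metrics: dict) -> list[str]:
    """Return human-readable violations of plausibility rules.
    Rules and metric names here are illustrative, not a standard set."""
    problems = []
    if metrics["paid_subscriptions"] > metrics["total_accounts"]:
        problems.append("paid subscriptions exceed total accounts")
    if metrics["refunds"] > metrics["gross_revenue"]:
        problems.append("refunds exceed gross revenue")
    if metrics["active_users"] > metrics["total_users"]:
        problems.append("active users exceed total users")
    return problems
```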
Build a minimum viable data quality system (1–2 weeks)
Do this in order:
- Identify the top 10 tables/models that power decisions
- Add freshness + volume checks to them
- Add uniqueness checks for key identifiers
- Add one referential integrity check for your most important join
- Add one semantic check for your top metric
- Wire alerts to a place humans see (email or a channel)
- Write a tiny runbook: "If alert X triggers, check Y first"
That gets you most of the benefit without turning into a monitoring project that needs its own monitoring.
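The wiring can stay this simple at first. A sketch of a check runner; delivering the failures to email or a channel is left to whatever notifier you already use:

```python
def run_checks(checks: dict) -> list[str]:
    """Run named zero-argument check callables; return names of failures.
    A check that raises is treated as failing, not as silently skipped."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check is a failing check
        if not ok:
            failures.append(name)
    return failures
```

Each name in the dict should map directly to a runbook entry: "If alert X triggers, check Y first."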
Next level: quality checks that scale
- Per-segment anomaly detection for key segments
- Automated backfill validation comparing pre/post metrics
- "Certified data" tiers and ownership
- Changelog discipline so metric logic changes are transparent
Text diagram: quality checks wrapped around the pathway
sources
↓
ingestion (freshness + schema checks)
↓
raw (uniqueness + volume)
↓
transforms (referential integrity + distribution drift)
↓
metrics layer (semantic plausibility checks)
↓
dashboards (performance + adoption signals)
Common mistakes
- Monitoring everything. Monitor decision-driving tables first.
- Alert fatigue. If everything is red, nothing is red.
- Treating quality checks as "data team only." Stakeholders should know what's certified.
- No runbook. The goal is fast recovery, not mystery.
When to bring in help
Consider outside help if:
- Your dashboards are frequently "untrusted"
- Issues are detected days later
- Backfills are scary and avoided
- Nobody knows which checks matter most
Wrap-up
Null checks are table stakes. Real trust comes from catching the failures that actually happen: lateness, duplication, broken joins, and semantic drift.
Want a second set of eyes?
Request a free 20-minute fit call.
- We'll identify the 5–10 highest-impact checks for your stack and key metrics
- We'll outline a lightweight alerting + runbook approach that prevents repeat incidents
No prep needed. No pressure.
Request a fit call