How to Set Data Quality Thresholds That Actually Make Sense

"Zero errors allowed" sounds rigorous but isn't practical. Here's how to set data quality thresholds that reflect real business risk — not theoretical perfection.

A data quality threshold is the minimum acceptable level of quality for a specific field or dataset — the point at which quality is good enough for its intended use, below which action is required.

A customer email field at 99.5% completeness has a very different risk profile than an optional demographic field at 70%. Applying one threshold to every field produces either constant false alarms or a check that everyone ignores.

Why Thresholds Matter

Without defined thresholds, data quality monitoring produces one of two outcomes:

Everything fails: Constant red alerts that teams start ignoring because "it's always red."

Nothing fails: No thresholds means nothing is ever flagged. Quality drifts undetected until something breaks visibly.

Thresholds convert continuous measurement into a meaningful signal: above threshold is acceptable, below is actionable.
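As a minimal sketch of that conversion, the snippet below measures completeness for one column and compares it to a minimum threshold. The function names, the sample values, and the 95% threshold are illustrative assumptions, not part of any particular tool.

```python
def completeness_pct(values):
    """Percentage of values that are neither None nor blank."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v is not None and str(v).strip())
    return 100.0 * filled / len(values)


def status(metric_pct, threshold_pct):
    """Thresholds are minimums: at or above is acceptable, below is actionable."""
    return "acceptable" if metric_pct >= threshold_pct else "actionable"


# Hypothetical sample column: 3 of 5 values are filled in.
emails = ["a@example.com", "", "b@example.com", None, "c@example.com"]
score = completeness_pct(emails)
print(f"{score:.1f}% -> {status(score, threshold_pct=95.0)}")
```

Here the column is 60% complete, which falls below the 95% minimum, so the result is actionable.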

Sohovi tracks quality trends across runs and alerts you when a metric — null rate, duplicate count, score — moves outside its normal range.

The Three Factors That Determine the Right Threshold

Factor 1: How the field is used. A field used in customer-facing communications needs a much higher threshold than one used only for internal analytics. Ask: "What breaks if this field has bad values?"

Factor 2: The cost of false positives vs. false negatives. A threshold too high triggers unnecessary alerts. A threshold too low allows real problems to pass. Set closer to the high end for fields where problems are costly.

Factor 3: Historical baseline. If your email completeness has consistently been 97–98%, a threshold of 95% gives meaningful headroom. A threshold of 99% would trigger constant alerts for normal variation.

Sohovi profiles every column in your dataset for completeness and flags the exact rows where values are missing — free to try.

Setting Thresholds by Quality Dimension

Completeness thresholds:

| Field Use | Recommended Threshold |
|---|---|
| Customer-facing required field (email for campaigns) | ≥ 98% |
| Sales/CRM identifier (phone for outreach) | ≥ 95% |
| Internal analytics field (industry, company size) | ≥ 80% |
| Optional enrichment field | ≥ 60% |
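One way to apply per-field completeness thresholds like these is a simple lookup table checked against each column. This is a sketch only: the field names (`email`, `phone`, `industry`, `linkedin_url`) and the sample rows are hypothetical, and the threshold values are the recommended starting points listed above.

```python
# Hypothetical field names mapped to minimum completeness (percent).
COMPLETENESS_THRESHOLDS = {
    "email": 98.0,         # customer-facing required field
    "phone": 95.0,         # sales/CRM identifier
    "industry": 80.0,      # internal analytics field
    "linkedin_url": 60.0,  # optional enrichment field
}


def completeness_pct(rows, field):
    """Percentage of rows where `field` is present and non-blank."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if str(row.get(field) or "").strip())
    return 100.0 * filled / len(rows)


def check_completeness(rows, thresholds):
    """Return {field: (score, minimum)} for every field below its threshold."""
    failures = {}
    for field, minimum in thresholds.items():
        score = completeness_pct(rows, field)
        if score < minimum:
            failures[field] = (score, minimum)
    return failures


rows = [
    {"email": "a@example.com", "phone": "555-0100", "industry": "Retail", "linkedin_url": ""},
    {"email": "b@example.com", "phone": "", "industry": "", "linkedin_url": ""},
    {"email": "c@example.com", "phone": "555-0102", "industry": "SaaS", "linkedin_url": "https://example.com/c"},
    {"email": "d@example.com", "phone": "555-0103", "industry": "Retail", "linkedin_url": "https://example.com/d"},
]
failures = check_completeness(rows, COMPLETENESS_THRESHOLDS)
for field, (score, minimum) in failures.items():
    print(f"{field}: {score:.1f}% is below the {minimum:.1f}% minimum")
```

Keeping the thresholds in one dictionary makes them easy to review and adjust without touching the checking logic.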

Uniqueness thresholds:

| Entity Type | Recommended Threshold |
|---|---|
| Primary identifier (customer ID, order ID) | 100% unique |
| Business-critical identifier (email in contact DB) | ≥ 99.5% unique |
| Semi-controlled identifier (company name) | ≥ 97% unique |
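Uniqueness can be measured in more than one way. The sketch below assumes one common convention: distinct values divided by non-empty values, after trimming whitespace and lowercasing. Both the normalization step and the sample data are assumptions for illustration; your definition of a duplicate may differ.

```python
def uniqueness_pct(values):
    """Percentage of non-empty values that are distinct after normalization."""
    normalized = [v.strip().lower() for v in values if v and v.strip()]
    if not normalized:
        return 100.0  # nothing to duplicate
    return 100.0 * len(set(normalized)) / len(normalized)


# Hypothetical contact emails; the first two differ only by case and whitespace.
contact_emails = ["a@example.com", "A@example.com ", "b@example.com", "c@example.com"]
score = uniqueness_pct(contact_emails)
meets_threshold = score >= 99.5  # business-critical identifier
print(f"{score:.1f}% unique, meets threshold: {meets_threshold}")
```

Normalizing before counting matters here: without it, the two variants of the same address would be counted as distinct and the column would look cleaner than it is.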

Validity thresholds:

| Field Type | Recommended Threshold |
|---|---|
| Customer-facing format (email, phone) | ≥ 99% valid |
| Date fields | ≥ 97% valid |
| Categorical/enum fields | ≥ 95% valid |
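A validity rate is the share of populated values that pass a format rule. The sketch below uses a deliberately simple email pattern as the rule. The pattern is an assumption for illustration and is far looser than a full address validator, and empty values are skipped so that validity is not double-counted with completeness.

```python
import re

# Simplified email shape: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validity_pct(values, pattern):
    """Percentage of non-empty values that fully match `pattern`."""
    present = [v for v in values if v]
    if not present:
        return 100.0  # no populated values to judge
    valid = sum(1 for v in present if pattern.fullmatch(v))
    return 100.0 * valid / len(present)


# Hypothetical sample: two well-formed addresses, two malformed ones.
samples = ["a@example.com", "not-an-email", "b@example", "c@example.org"]
score = validity_pct(samples, EMAIL_PATTERN)
print(f"{score:.1f}% valid, meets 99% threshold: {score >= 99.0}")
```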

[IMAGE: A table or dashboard showing threshold indicators for different fields, with green/amber/red zones marked]

Rule of thumb: Start at 5 percentage points below your current average. If email completeness has averaged 97.5%, set the threshold at 92.5%. This catches genuine declines without flagging normal variation.
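The rule of thumb is simple arithmetic, sketched here as a helper. The function name and the sample history are hypothetical; the 5-point headroom is the default described above.

```python
from statistics import mean


def baseline_threshold(history_pct, headroom_points=5.0):
    """Starting threshold: the historical average minus a fixed headroom."""
    if not history_pct:
        raise ValueError("need at least one historical measurement")
    return max(0.0, mean(history_pct) - headroom_points)


# Hypothetical email completeness from four past runs (averages 97.5%).
email_history = [97.0, 98.0, 97.5, 97.5]
threshold = baseline_threshold(email_history)
print(f"Starting threshold: {threshold:.1f}%")
```

With an average of 97.5%, the starting threshold comes out at 92.5%, matching the worked example above.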

Using Industry Benchmarks as Starting Points

Some published benchmarks as reference:

  • Email hard bounce rate: best practice is below 0.5%; above 2% is a problem
  • CRM data accuracy: Gartner research suggests best-in-class maintains 95%+ contact accuracy

These are starting points, not rules. Your threshold should reflect your specific data, usage, and risk tolerance.

How to Adjust Thresholds Based on Experience

Adjust when:

  • Triggering too frequently: Lower the threshold or add a persistence rule (alert only if the metric is below threshold for two consecutive periods)
  • Never triggers but problems emerge: Your threshold is too low — raise it
  • Business requirements change: Revisit thresholds when a field gets a new use case
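The "two consecutive periods" idea for noisy alerts can be sketched as a check on the most recent scores. The function name and the sample scores are assumptions for illustration.

```python
def should_alert(scores, threshold, consecutive=2):
    """Alert only if the last `consecutive` scores are all below threshold."""
    if len(scores) < consecutive:
        return False
    return all(score < threshold for score in scores[-consecutive:])


# Hypothetical run history, oldest first, against a 95% minimum.
one_off_dip = [97.0, 94.0, 96.0]       # dipped once, then recovered
sustained_drop = [97.0, 94.5, 94.0]    # below threshold twice in a row

print(should_alert(one_off_dip, threshold=95.0))
print(should_alert(sustained_drop, threshold=95.0))
```

A single bad run no longer pages anyone, but a decline that persists still does. The trade-off is that a real problem is reported one period later.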

Frequently Asked Questions

Q: What is a data quality threshold? A data quality threshold is the minimum acceptable level of quality for a specific field or dataset — defined as a percentage for completeness and validity, or a count/percentage for uniqueness. When a metric falls below the threshold, action is required.

Q: Should all fields have the same data quality threshold? No. The right threshold depends on how the field is used, what breaks if values are wrong, and the historical baseline. A customer email field needs a much higher threshold than an internal notes field.

Q: What is a good data completeness threshold? For customer-facing required fields (email used in campaigns), 98% or higher is appropriate. For internal analytics fields, 80% may be sufficient. The right threshold depends on the field's use case.

Q: Is 100% data quality always the right goal? No. For most fields, 100% is unachievable without rejecting legitimate records that genuinely have no value for that field. Some customers don't have phone numbers. Setting 100% as a threshold for optional fields produces constant false alarms.

Q: What's the difference between a data quality threshold and a data quality standard? A standard defines what "good" looks like (email must be valid format). A threshold defines the minimum acceptable percentage of records meeting that standard (at least 99% of emails must be valid). Standards define the rule; thresholds define how often it must be met.

Q: How do I handle a field where quality varies by segment? Set segment-specific thresholds if the variation is meaningful. If managing segment-level thresholds is too complex, use the most conservative threshold that applies across all segments.

Q: What should I do when a field consistently falls just below threshold? Investigate whether the threshold is right or whether there's a real quality problem. If historical data shows the field has never reached the threshold, it may be set too high. If it used to meet the threshold and recently declined, investigate the quality issue.

Q: How should I communicate thresholds to my team? Make them part of your data quality policy and data quality checklist. Document them where they'll be seen — in runbooks and import procedures. The team needs to know the standard before they can meet it.

Q: What's the difference between an alert threshold and a hard stop threshold? An alert threshold triggers a notification when a metric warrants attention. A hard stop threshold blocks a process until the issue is resolved. Alert thresholds sit at a higher quality level, so they fire first; hard stop thresholds sit lower and mark a more serious quality failure.
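A two-tier check might look like the sketch below. The function name and the example levels (alert at 98%, hard stop at 90%) are hypothetical and should be replaced with values that fit your own field.

```python
def gate(score, alert_at, hard_stop_at):
    """Classify a score as 'pass', 'alert', or 'block' using two minimums."""
    if hard_stop_at > alert_at:
        raise ValueError("hard stop level must not exceed the alert level")
    if score < hard_stop_at:
        return "block"   # serious failure: stop the import or pipeline
    if score < alert_at:
        return "alert"   # needs attention, but the process continues
    return "pass"


for score in (99.0, 95.0, 85.0):
    print(score, gate(score, alert_at=98.0, hard_stop_at=90.0))
```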

Q: Can I use the same thresholds across all my datasets? You can use consistent defaults, but the right thresholds differ by dataset and field based on usage and risk. Universal thresholds are better than no thresholds, but field-specific thresholds are more accurate.


Start with your current quality baseline. Set your threshold 5 percentage points below where you currently are. Adjust as you learn what "good enough" actually means for each field and use case.

Sohovi shows your actual completeness, uniqueness, and validity rates for every column when you upload a CSV. That baseline is the foundation for setting meaningful thresholds. Free to try — no credit card, no code required.

Sohovi Team

Data quality, for people who ship

The Sohovi team writes practical guides on data quality, profiling, and governance to help teams ship better data.
