Accuracy Is the Hardest Quality Dimension to Measure
Accuracy measures the degree to which data correctly represents the real-world entity or event it describes. Unlike completeness (is a value present?) or validity (does it conform to formatting rules?), accuracy requires an external reference for comparison.
Is this email address actually associated with this person? Is this revenue figure what was actually charged? Is this address where this customer actually lives? Answering these questions requires either checking an authoritative external source or trusting your data entry process — and trusting your data entry process is risky.
This is why accuracy is systematically underestimated: it's hard to measure without effort, so most organizations don't measure it at all. They assume their data is accurate until something breaks and proves it isn't.
Sohovi scores your dataset against your own accuracy standards and highlights the columns and rows where values fall outside expected ranges.
Why Accuracy Problems Are Structural, Not Just Occasional
Self-reported data is often inaccurate by design: People give fake email addresses to avoid marketing. They round numbers when reporting. They misremember dates. They describe their company size as larger than it is. Self-reported data has a structural accuracy problem that no validation rule can fully address — a syntactically valid email address like throwaway123@gmail.com is accurate in format and completely useless in practice.
Data decays over time: An accurate address in 2022 may be inaccurate in 2026. Industry estimates suggest that B2B contact data decays at roughly 30% per year — job titles change, people switch companies, addresses change. An email that was valid 18 months ago has a meaningful probability of being invalid or associated with the wrong person today. Accuracy is a point-in-time measurement that degrades continuously.
Manual entry produces errors at predictable rates: Transposing digits in a phone number. Misspelling a company name. Entering the wrong year. Accidentally overwriting a field. Industry estimates suggest that manual data entry produces error rates of 1–4% in most environments — which sounds small until you multiply it across a 50,000-record database.
Copy-paste errors are invisible to validators: A copied email address that's correct in format but belongs to the wrong person passes every format check. Accuracy errors that don't affect format are completely invisible to standard validation.
Sohovi lets you set up validation rules for any column and instantly see which rows fall outside them — no code or SQL required.
How to Assess Data Accuracy
Spot verification (manual, high effort): Sample records and manually verify them against an authoritative source. Call the phone number. Send a verification email. Check the address in a directory. This gives you a real accuracy estimate but is labor-intensive — typically used for small, high-value datasets or as a periodic audit of a sample.
Cross-referencing against payment processor records: For financial data, compare recorded transaction amounts against your payment processor's transaction history. Discrepancies reveal accuracy problems. This is low-cost and highly reliable for the financial data dimension.
User-reported corrections as a signal: Track how often customers update their own records after receiving a communication from you. A high correction rate ("this email was sent to the wrong address") signals low initial accuracy. Even unsubscribes can signal accuracy problems when the person says "I never signed up for this."
Sohovi scores your dataset against your own accuracy standards and highlights the columns and rows where values fall outside expected ranges.
Return mail and bounce rates: Email hard bounces and physical mail returns are direct evidence of address inaccuracy. A hard bounce rate above 2% on a list you believe is clean suggests significant accuracy problems.
External data enrichment: Services that verify contact information against their own databases can tell you what percentage of your records match their records and flag discrepancies. The match rate itself is an accuracy signal.
Building an Accuracy Culture
Accuracy is a human problem as much as a technical one. The systems that produce the most accurate data share a few practices:
Minimize manual entry: Every field that's populated automatically by a system — captured from a web form, copied from a verified source, generated by a process — is more likely to be accurate than a field a human typed by hand. Design your data collection to reduce manual entry for critical fields wherever possible.
Validate against authoritative sources at entry: Address lookup APIs confirm that an address exists as the user types. Email verification services confirm deliverability before the record is saved. These cost a small amount per verification but prevent accuracy problems from entering your data in the first place.
Prompt periodic self-verification: Once or twice a year, ask your active customers to verify their contact information. A simple email with a pre-filled form that they can update in one click. The customers who update something are telling you your data was wrong.
Capture the source and method for key fields: Knowing whether an email address was self-reported via a web form, imported from a purchased list, or manually entered by a sales rep gives you a proxy for expected accuracy. Self-entered via web form is generally more accurate than imported from a third-party list.
Sohovi surfaces format and pattern validity in your CSV data instantly — while accuracy checking requires external verification, catching validity problems first eliminates one category of inaccuracy and shows you which records need deeper review.
Accuracy vs. Other Dimensions: Why the Distinction Matters
- Completeness: Is there a value? (Present/absent)
- Validity: Does the value conform to expected rules? (Format check)
- Accuracy: Does the value correctly represent reality? (Factual check)
A record can be complete (all fields filled in), valid (all fields formatted correctly), and still be thoroughly inaccurate. A phone number that's 10 digits in the right format but belongs to a different person passes completeness and validity checks while failing accuracy.
This is why accuracy is the most expensive dimension to improve: it requires human judgment or external verification, not just automated rules. Focus your accuracy investment on the fields that matter most for your highest-stakes use cases.
If you're ready to start with what you can measure automatically — format validity, completeness, and duplicate detection — Sohovi gives you an instant report. Upload your CSV free, no setup required.