What Is a Data Quality Score and How Is It Calculated?

A data quality score is a single number that summarizes how fit your data is for use. Here's how it's calculated and what a good score actually looks like.

You want to know: "Is our data good?" A data quality score gives you a single number to answer that question — one that can be tracked over time, compared across datasets, and used to prioritize improvement efforts.

A data quality score is an aggregate measure of how well a dataset performs across the key quality dimensions: completeness, accuracy, validity, uniqueness, consistency, and timeliness. Most scoring approaches produce a percentage from 0 to 100, where 100 represents data that fully meets all defined quality criteria.

How Data Quality Scores Are Calculated

There's no single universal formula — different tools and frameworks weight dimensions differently. But the most common approach is:

1. Score each dimension individually

  • Completeness score: percentage of required fields that have values
  • Validity score: percentage of values that pass defined format/rule checks
  • Uniqueness score: percentage of records that have no exact duplicate
  • (And so on for each applicable dimension)
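As a rough illustration of the per-dimension scores listed above, here is a minimal Python sketch. The sample records, the field names, and the simplified email pattern are all assumptions made up for this example, and the functions assume a non-empty dataset.

```python
import re
from collections import Counter

# Hypothetical sample records; None or "" counts as a missing value.
records = [
    {"email": "ana@example.com", "phone": "555-0100"},
    {"email": "ben@example.com", "phone": ""},
    {"email": "not-an-email", "phone": "555-0101"},
    {"email": "ana@example.com", "phone": "555-0100"},  # exact duplicate of row 1
]

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, fields):
    """Share of required field slots that hold a non-empty value."""
    filled = sum(1 for row in rows for f in fields if row.get(f) not in (None, ""))
    return filled / (len(rows) * len(fields))


def validity(rows, field, pattern):
    """Share of non-empty values in `field` that match `pattern`."""
    values = [row[field] for row in rows if row.get(field) not in (None, "")]
    return sum(1 for v in values if pattern.match(v)) / len(values)


def uniqueness(rows):
    """Share of records that have no exact duplicate elsewhere in the data."""
    keys = [tuple(sorted(row.items())) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


print(completeness(records, ["email", "phone"]))  # 7 of 8 slots filled
print(validity(records, "email", EMAIL_RE))       # 3 of 4 emails match
print(uniqueness(records))                        # 2 of 4 rows have no duplicate
```

Each function returns a value between 0 and 1; multiply by 100 to express it as a percentage.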

2. Weight the dimensions by importance

Not all dimensions matter equally for every use case. For a customer contact list, uniqueness and completeness of email are more important than precision. For financial records, accuracy and validity might be weighted highest.

3. Aggregate into a composite score

Multiply each dimension score by its weight, then sum the results, using weights that add up to 1. A dataset that's 95% complete, 90% valid, and 85% unique produces a composite score of 90% if weighted equally, or a lower score if uniqueness, its weakest dimension, is weighted more heavily.
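The aggregation step can be sketched as a weighted average. The function name and the weight values below are illustrative assumptions; only the three dimension scores come from the example in the text.

```python
def composite_score(scores, weights):
    """Weighted average of dimension scores; weights are normalized to sum to 1."""
    total_weight = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


scores = {"completeness": 95, "validity": 90, "uniqueness": 85}

# Equal weights: (95 + 90 + 85) / 3
equal = composite_score(scores, {"completeness": 1, "validity": 1, "uniqueness": 1})

# Uniqueness counted twice as heavily (hypothetical): (95 + 90 + 2 * 85) / 4
uniqueness_heavy = composite_score(scores, {"completeness": 1, "validity": 1, "uniqueness": 2})

print(equal)             # 90.0
print(uniqueness_heavy)  # 88.75
```

Shifting weight toward a dimension pulls the composite toward that dimension's score, so the composite only drops when the extra weight goes to a dimension that scores below the current average.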

What Is a Good Data Quality Score?

Context matters, but as a general benchmark:

  • 90–100%: Excellent. Suitable for high-stakes use cases like revenue reporting, marketing campaigns, and regulatory compliance.
  • 70–89%: Acceptable with caveats. Usable for most purposes but specific known gaps should be documented and communicated.
  • 50–69%: Problematic. Significant cleanup needed before relying on this data for decisions.
  • Below 50%: Not fit for purpose. Major remediation required.

These thresholds vary by industry and use case. A 95% validity score on medical records may be acceptable for a general wellness database but completely unacceptable for a clinical trial dataset.
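One way to encode these bands so they can be adjusted per use case is a small lookup like the sketch below. The default cutoffs mirror the benchmark list, treated as lower bounds; the stricter STRICT_BANDS values are invented for illustration and are not a recommendation.

```python
# Default bands from the general benchmark; each cutoff is a lower bound.
DEFAULT_BANDS = [
    (90, "Excellent"),
    (70, "Acceptable with caveats"),
    (50, "Problematic"),
    (0, "Not fit for purpose"),
]

# Hypothetical stricter bands for a high-stakes dataset.
STRICT_BANDS = [
    (99, "Excellent"),
    (97, "Acceptable with caveats"),
    (90, "Problematic"),
    (0, "Not fit for purpose"),
]


def rate_score(score, bands=DEFAULT_BANDS):
    """Return the label of the first band whose cutoff the score meets."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for cutoff, label in bands:
        if score >= cutoff:
            return label
    raise ValueError("bands must include a cutoff of 0")


print(rate_score(95))                # Excellent
print(rate_score(95, STRICT_BANDS))  # Problematic
```

The same 95% lands in different bands depending on which thresholds apply, which is the point of keeping the bands configurable.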

Building a Data Quality Score for Your Business

You don't need enterprise software to calculate a meaningful data quality score. A simple approach using spreadsheets:

Step 1: Define the dimensions you'll measure (typically completeness and uniqueness are easiest to start with).

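Step 2: For each dimension, write a formula that produces a score between 0 and 1. Use a bounded range that matches your data rather than a whole column, because ROWS(A:A) counts every row in the sheet, not just the rows in use. With data in A2:A1001, completeness is =COUNTA(A2:A1001)/ROWS(A2:A1001). Uniqueness, measured as the share of distinct values, is =SUMPRODUCT(1/COUNTIF(A2:A1001,A2:A1001))/ROWS(A2:A1001). The uniqueness formula returns an error if the range contains blank cells, so fill or filter them out first.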

Step 3: Assign weights to each dimension. Start with equal weights if you're unsure.

Step 4: Multiply each dimension score by its weight and sum. That's your composite score.

Step 5: Record this score in a tracking sheet monthly. Watching the trend is more valuable than any single score.
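If the tracking sheet is kept as a CSV file rather than a spreadsheet tab, the monthly recording step could look like this minimal sketch. The file name, column names, and function name are assumptions for illustration.

```python
import csv
from datetime import date
from pathlib import Path


def record_score(path, score, on=None):
    """Append one (date, score) row to a CSV tracking sheet.

    Writes a header row first if the file does not exist yet.
    """
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "composite_score"])
        writer.writerow([(on or date.today()).isoformat(), f"{score:.1f}"])


# Example usage (hypothetical file name):
# record_score("quality_scores.csv", 78.0)
```

Appending rather than overwriting keeps the full history, which is what makes the trend visible later.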

Sohovi lets you upload your CSV and get an instant data quality report — no setup, no code required. It calculates completeness, uniqueness, and format consistency scores for every column automatically.

Why the Score Matters Less Than the Trend

A data quality score of 78% tells you something, but it doesn't tell you whether that's good or bad without context. The same score tracked over 12 months tells you something much more valuable: is data quality improving, declining, or stable?

A score that goes from 62% to 78% over six months means the cleanup efforts are working. A score that goes from 85% to 78% signals that something upstream changed — a new data source was added, a validation rule was removed, or a new team started entering data without proper training.

Tracking the trend transforms a static number into an early warning system.
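A simple version of that early warning is a check for period-over-period drops. The sketch below assumes a list of (label, score) pairs in chronological order; the 5-point alert threshold and the sample history are arbitrary placeholders.

```python
def flag_declines(history, max_drop=5.0):
    """Return (previous_label, label, drop) for each consecutive drop larger than max_drop points."""
    alerts = []
    for (prev_label, prev_score), (label, score) in zip(history, history[1:]):
        drop = prev_score - score
        if drop > max_drop:
            alerts.append((prev_label, label, drop))
    return alerts


# Hypothetical monthly composite scores.
history = [("Jan", 85.0), ("Feb", 84.0), ("Mar", 78.0), ("Apr", 79.0)]

print(flag_declines(history))  # [('Feb', 'Mar', 6.0)]
```

A flagged month is a prompt to look upstream for what changed, not a diagnosis in itself.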

Using Scores to Prioritize Improvement Efforts

A column-level quality score (not just a file-level aggregate) is more actionable than a single composite. When you can see that email completeness is 95% but phone completeness is 34%, you know exactly where to focus.

The prioritization framework:

  1. Fix the lowest-scoring fields that have the highest business impact
  2. Set a minimum acceptable score per field (not the same threshold for every field)
  3. Track each field's score separately from the composite

This turns "we need to improve our data quality" into "phone completeness is 34%, needs to reach 60% before the Q3 campaign."
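The prioritization framework above can be sketched as a ranking of fields that fall below their own minimum. The email and phone figures come from the text; the postcode row, the 1 to 5 impact ratings, and the "gap times impact" ordering are assumptions chosen for illustration.

```python
# (field, current score %, minimum acceptable %, business impact 1-5)
# Impact ratings and the postcode row are hypothetical.
fields = [
    ("email_completeness", 95, 90, 5),
    ("phone_completeness", 34, 60, 4),
    ("postcode_validity", 70, 80, 2),
]


def prioritize(fields):
    """Fields below their minimum, ordered by gap-to-target times impact, largest first."""
    below = [
        (name, minimum - score, impact)
        for name, score, minimum, impact in fields
        if score < minimum
    ]
    return sorted(below, key=lambda item: item[1] * item[2], reverse=True)


for name, gap, impact in prioritize(fields):
    print(f"{name}: {gap} points below target, impact {impact}")
```

Fields that already meet their minimum drop out of the list, so attention goes to the gaps that matter most.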

Data Quality Score in Vendor Contracts

If you purchase data from vendors, include data quality score thresholds in your contract. Minimum acceptable completeness and validity scores give you a measurable basis for rejecting or returning files that don't meet standards — instead of relying on vague "quality" language that's hard to enforce.
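An acceptance check against contracted minimums might look like the sketch below. The threshold values and the delivered scores are invented for the example; real numbers would come from your contract and your own measurement of the file.

```python
def check_delivery(scores, minimums):
    """Return the contract thresholds a delivered file fails; an empty list means accept."""
    failures = []
    for dimension, minimum in minimums.items():
        actual = scores.get(dimension)
        # A dimension that was not measured counts as a failure.
        if actual is None or actual < minimum:
            failures.append((dimension, actual, minimum))
    return failures


contract_minimums = {"completeness": 95.0, "validity": 98.0}  # hypothetical contract terms
delivered = {"completeness": 96.2, "validity": 91.4}          # hypothetical measured scores

print(check_delivery(delivered, contract_minimums))  # [('validity', 91.4, 98.0)]
```

Returning the failing dimensions, rather than a bare pass or fail, gives you the specific figures to cite when rejecting a file.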

Communicating Data Quality Scores to Non-Technical Stakeholders

A score of 78% is meaningless to someone who doesn't know what it's measuring. When presenting data quality scores to leadership or non-technical stakeholders, translate the number into business impact: "Our customer email completeness is 78%, which means roughly 2,200 of our 10,000 customers can't be reached by email. That affects any email-based campaign we run."

The translation from metric to impact is what makes data quality scores actionable for decision-makers. A score alone is a data point. A score plus business consequence is a business problem that leadership can choose to prioritize.
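The arithmetic behind that kind of translation is small enough to sketch directly. The function name is an assumption; the figures are the ones from the example above.

```python
def affected_records(total_records, completeness_pct):
    """Number of records missing the field, given a completeness percentage."""
    return round(total_records * (100 - completeness_pct) / 100)


missing = affected_records(10_000, 78)
print(f"{missing:,} of 10,000 customers can't be reached by email")
# prints "2,200 of 10,000 customers can't be reached by email"
```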

If you're ready to stop guessing about your data quality, Sohovi is built for exactly this. Upload your first CSV free — no credit card, no IT team, no code needed.

Sohovi Team

Data quality, for people who ship

The Sohovi team writes practical guides on data quality, profiling, and governance to help teams ship better data.
