Skip to main content
Data Profiling

How to Find Outliers in Your Data Without Writing Code

Outliers in your data are values that fall far outside the expected range. Some are data errors; some are real. Here's how to find them without writing code.

An order with a quantity of 50,000 in a database where typical orders are 1–100. A customer with a credit score of 0 in a range of 300–850. A date of 1901 in a field that should contain recent transactions. These are outliers — and some of them are data errors masquerading as valid values.

An outlier is a value that deviates significantly from the rest of the values in a column. In the context of data quality, outliers are important because they're often signals of data entry errors, system glitches, or import problems — not legitimate extreme values.

Types of Outliers in Data Quality

Entry errors — A quantity of 10000 where the user meant 100. An age of 200 where they entered two fields in the wrong order. These are clearly wrong.

System defaults masquerading as values — A birth date of "01/01/1900" is often a system default for "unknown date," not an actual customer born in 1900.

Unit mismatch errors — Revenue entered in thousands rather than actual values, causing some records to appear 1,000x larger than others.

Legitimate extreme values — Some outliers are real: a customer who genuinely placed a $500,000 order, or an employee who genuinely has been with the company for 40 years. Context determines whether an outlier is an error or a fact.

How to Detect Outliers Without Code

Sort the column and inspect the extremes — Sort ascending and descending to see the top and bottom values. Are the extremes plausible? This is the fastest manual method.

Look at min and max values — When profiling a column, the minimum and maximum values immediately reveal whether the range is plausible for the field's purpose.

Check value distribution — A histogram of values shows whether data clusters normally or has unexpected spikes at specific values (like 0 or 9999) that might indicate defaults or placeholders.

Apply business rules — Define plausible ranges for key numeric fields and flag any records outside those ranges. Age between 0–120. Revenue between 0 and your largest-ever deal. Order quantity between 1 and your practical maximum. Any record outside the range is a candidate for review.

Outlier Detection in Excel and Google Sheets

You don't need specialized software to find outliers. These spreadsheet techniques work well:

Conditional formatting — Highlight cells above the 95th percentile or below the 5th percentile. This visually surfaces extreme values immediately.

PERCENTILE formula — Use =PERCENTILE(A:A, 0.95) to find the 95th percentile value. Any value above this is in the top 5% and worth examining.

STDEV and AVERAGE — Calculate the mean and standard deviation. Values more than 3 standard deviations from the mean are statistical outliers and warrant investigation.

Filter and sort — Filter out nulls, sort the column, and manually review the top 20 and bottom 20 values. For columns with fewer than 10,000 rows, this takes under a minute.

Sohovi lets you upload your CSV and get an instant data quality report — no setup, no code required. It flags outliers and distributional anomalies across every numeric column automatically.

A Practical Outlier Review Workflow

When you find outlier candidates, resist the urge to delete them immediately. Instead, follow this workflow:

  1. Export the outlier rows — Filter to the flagged records and export them separately for review
  2. Check the original source — Can you trace the record to its source? A CRM entry, a form submission, an import file?
  3. Classify: error, anomaly, or legitimate? — Is this clearly wrong (age of 300)? Suspicious but possible (order of 10,000 units from a new customer)? Or verifiably correct (a confirmed bulk order)?
  4. Correct or flag — Fix clear errors. Flag suspicious values with a note. Leave verified legitimate extremes as-is.

Documenting your decisions matters: if you delete a record, note why. If you correct a value, note what it was and what you changed it to. This trail is valuable if questions come up later.

Outliers in Date Columns

Date columns are particularly prone to outlier issues. Common problems:

  • Future dates that shouldn't exist yet (a transaction date of 2099 indicates a data entry error)
  • Dates before the organization existed (a customer "created" in 1985 when the business opened in 2010)
  • Unix timestamp 0 (January 1, 1970) appearing as a date default
  • Null dates stored as "00/00/0000" which some tools interpret as the year 0

For date columns, filter for dates outside your plausible business range. If your business opened in 2015, any customer record created before 2015 needs investigation.

Outliers in Categorical Columns

Outliers aren't only numeric. In categorical columns, look for:

  • Values that appear only once or twice when every other value appears hundreds of times — often a misspelling or variant of a standard category
  • Values that don't match your defined vocabulary ("Actve" instead of "Active," "NY" instead of "New York" in an inconsistent state field)
  • Blank values in columns that should be required

A frequency table — count of records per unique value — makes categorical outliers immediately visible. Sort by count ascending and the rare values surface at the top.

When Outliers Indicate Systemic Problems

A single outlier might be a one-off error. Multiple outliers of the same type — 50 records with a date of "01/01/1900," or 200 records with a quantity of exactly 9999 — indicate a systemic problem: a broken import, a misconfigured default, or a data entry pattern that's been quietly creating bad data for months.

When you find a cluster of identical outliers, trace them to their source and fix the process that created them.

If you're ready to stop guessing about your data quality, Sohovi is built for exactly this. Upload your first CSV free — no credit card, no IT team, no code needed.

Sohovi Team

Data quality, for people who ship

The Sohovi team writes practical guides on data quality, profiling, and governance to help teams ship better data.

Start for free

Stop guessing. Start knowing your data quality.

Sohovi profiles your datasets in minutes — surfacing completeness gaps, type mismatches, and duplicate patterns before they reach production.

No credit card required · Free forever plan