What Is ETL (Extract, Transform, Load)?

ETL stands for Extract, Transform, Load — the three-step pipeline process that copies data from one or more source systems, applies transformations to clean and restructure it, and loads the result into a destination system such as a data warehouse, analytics platform, or operational database.

ETL is the backbone of most business intelligence and analytics infrastructure. Every time your BI tool shows data from your CRM, your billing system, and your marketing platform together in one report, ETL is almost certainly the mechanism making that possible.

The Three Steps of ETL

Extract: Data is pulled from source systems — databases, APIs, flat files, SaaS applications. The extraction captures data in its native format without modification. Sources might include a Postgres database, a Salesforce CRM export, a Stripe API, or a CSV file from a partner.

Sohovi finds gaps, duplicates, and format errors in your CRM data — so your team is working from records they can trust.

Transform: The extracted data is cleaned, restructured, and enriched to fit the target schema and meet quality standards. Transformations include: standardizing date formats, joining data from multiple sources, calculating derived metrics, filtering out invalid records, and applying business rules.

Load: The transformed data is written to the destination — typically a data warehouse (Snowflake, BigQuery, Redshift), an analytics platform, or an operational database.

ETL vs. ELT

A modern variant, ELT (Extract, Load, Transform), reverses the last two steps. Data is extracted and loaded into the destination in raw form, and transformations happen inside the destination using tools like dbt. ELT is increasingly common with cloud data warehouses that can handle transformations at scale efficiently. The distinction matters for data quality: with ETL, quality checks are applied before loading; with ELT, they're applied after loading, using tools like dbt tests.

Sohovi applies your data quality rules automatically across the whole dataset and highlights every violation — so nothing slips through.

[IMAGE: A pipeline diagram showing the ETL flow: Source Systems → Extract → Transform (clean, restructure, enrich) → Load → Data Warehouse → BI Tools]

ETL and Data Quality

ETL pipelines are one of the most common places where data quality problems are introduced or compounded. Poorly designed transforms create data quality failures that propagate silently into downstream reports. Best practice: add data quality checks at each stage — validate extracted data before transforming, validate transformed data before loading.

Frequently Asked Questions

Q: What does ETL stand for? ETL stands for Extract, Transform, Load — the three-step pipeline process for moving data from source systems to a destination. Extract pulls data from sources, Transform cleans and restructures it, and Load writes it to the destination.

Q: What is the difference between ETL and ELT? ETL transforms data before loading it into the destination. ELT loads raw data first and transforms it inside the destination. ELT is increasingly common with cloud warehouses that can handle large-scale transformations efficiently. dbt is the primary tool for ELT transformations.

Q: What are common ETL tools? Traditional ETL tools include enterprise ETL platforms, enterprise ETL platforms, Microsoft SSIS, and Pentaho. Modern cloud ETL tools include cloud data pipeline tools, Airbyte, cloud data pipeline tools, and AWS Glue. For ELT, dbt handles transformations after loading. Each has different tradeoffs in cost, complexity, and capability.

Q: What is a data pipeline and how does it relate to ETL? An ETL process is a type of data pipeline — a sequence of steps that moves and transforms data. "Data pipeline" is the broader term that encompasses ETL, ELT, streaming data processing, and real-time integration patterns.

Q: Why does ETL cause data quality problems? Poorly designed transformations can introduce errors: join logic that drops records unintentionally, type casting that corrupts values, filter conditions that exclude valid data, or aggregations that double-count. Without quality checks at each stage, these errors propagate silently.

Q: What is schema drift in an ETL context? Schema drift occurs when a source system changes its schema — adding, removing, or renaming columns — without warning. ETL pipelines built against the original schema break silently when the schema changes, often producing wrong values or null outputs rather than explicit errors.

Q: How do you add data quality checks to an ETL pipeline? Add validation steps at key points: validate extracted data before transforming (check schema, null rates, expected volume), validate transformed data before loading (check business rules, referential integrity, expected output ranges), and monitor loaded data over time (alert on anomalies).

Q: What is a delta load vs. a full load in ETL? A full load replaces all data in the destination with the current source data. A delta load (or incremental load) loads only the records that have changed since the last load. Delta loads are faster for large datasets but more complex to implement correctly.

Q: What is the role of dbt in modern ETL? dbt (Data Build Tool) handles the Transform step in an ELT pattern. It applies SQL-based transformations inside the data warehouse and includes built-in support for data quality tests — checking uniqueness, not-null constraints, accepted values, and referential integrity on transformed models.

Q: What happens to data quality when an ETL job fails? It depends on the failure mode. A complete failure usually produces no data in the destination. A partial failure can produce incomplete data — some records transformed correctly, others missing or corrupted. This is why monitoring ETL job completion and data volume is part of data quality monitoring.

ETL moves your data — but without quality checks built into the pipeline, it also moves your data quality problems. Adding validation at each stage protects your downstream analytics from errors introduced upstream.

The Three Steps of ETL

ETL vs. ELT

ETL and Data Quality

Frequently Asked Questions

Stop guessing. Start knowing your data quality.

More from Data Quality Glossary

What Is Data Lineage? A Plain-English Guide for Business Owners

What Is Data Stewardship? And Who Should Own It at Your Company?

What Is Data Enrichment?