The difference between Datafold tests and dbt tests

Opening a pull request to modify a dbt model (or many at once!) can be a nerve-racking process. Even if your CI pipelines run smoothly and every ✅ dbt ✅ test ✅ passes, that doesn’t tell you how the data itself will change once your code is merged.

Are you confident that your code changes won’t introduce errors into your data? 

If your dbt tests pass, but breaking data changes sneak through, you could end up in crisis mode–with broken dashboards, malfunctioning pipelines, and lost stakeholder trust. 

“Wait–whatever do you mean? I thought my dbt tests cover all scenarios that could break my pipelines,” you say.

Reader–would that it were so simple.

dbt tests prevent some data quality issues, but not all. Let’s go over three major differences between Datafold and dbt tests, and why data teams need Datafold in CI in addition to dbt tests: the two techniques provide complementary test coverage and protect against fundamentally different data quality issues.

What is your benchmark for data quality?

Your CI pipeline should be able to verify the accuracy, completeness, and consistency of data whenever your code changes modify your data. Anything less means that you cannot guarantee quality data when you push data changes to production.

1. Datafold finds value-level differences between staging and production 

Datafold compares two versions of the data and identifies differences, while dbt tests evaluate one version of the data and test assertions.
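To make that distinction concrete, here is a minimal Python sketch. It is a toy illustration, not how Datafold or dbt actually work under the hood: the tables, the `order_id` key, the column names, and both helper functions are made up for this example. The assertion-style check looks at one version of the data, while the diff compares two versions row by row.

```python
# Toy illustration only: not Datafold's or dbt's actual implementation.
# Rows are keyed by a hypothetical primary key (order_id).
production = {
    1: {"amount": 100.0, "status": "paid"},
    2: {"amount": 250.0, "status": "paid"},
    3: {"amount": 75.0, "status": "refunded"},
}
staging = {
    1: {"amount": 100.0, "status": "paid"},
    2: {"amount": 25.0, "status": "paid"},  # amount silently changed
    # order 3 is missing, e.g. because of a faulty join
}


def assertions_pass(table):
    """Assertion-style check on a single version: amounts are non-null and in range."""
    return all(
        row["amount"] is not None and 0 <= row["amount"] <= 10_000
        for row in table.values()
    )


def diff_tables(old, new):
    """Compare two versions of a table keyed by primary key."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = {}
    for key in sorted(old.keys() & new.keys()):
        cols = {
            col: (old[key][col], new[key].get(col))
            for col in old[key]
            if old[key][col] != new[key].get(col)
        }
        if cols:
            changed[key] = cols
    return {"removed": removed, "added": added, "changed": changed}


staging_passes = assertions_pass(staging)
report = diff_tables(production, staging)
print("assertions pass on staging:", staging_passes)
print("diff against production:", report)
```

In this sketch the staging table passes every assertion, yet the diff still reports one dropped primary key and one changed amount.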

With Datafold, you can prevent issues such as:

  • Errors in individual data values: Event timestamps or transaction amounts changing when they should be immutable
  • Problematic distribution shifts: The distribution of customer ages shifting, even if individual values remain within an acceptable range
  • Primary keys and rows dropped: Entire sections of tables missing due to faulty joins or filters

In contrast, dbt tests prevent issues such as:

  • Values outside of a range you explicitly set
  • Broken relationships between primary and foreign keys
  • Primary keys that are duplicated or null

Unlike dbt tests, Datafold Cloud helps catch row-level value differences
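The distribution-shift case from the list above is worth a quick sketch too. The ages, the accepted range, and the helper functions below are all hypothetical, and comparing means is just one simple way to spot a shift, not a description of how Datafold profiles data. The point is that a range check can pass on both versions even when the overall distribution has moved.

```python
from statistics import mean

# Hypothetical customer ages: every value sits inside the accepted range.
production_ages = [22, 25, 31, 38, 44, 52, 60, 67]
staging_ages = [61, 63, 64, 66, 67, 68, 70, 72]


def in_range(values, low=18, high=120):
    """Range-style assertion: every value falls within [low, high]."""
    return all(low <= v <= high for v in values)


def mean_shift(old, new):
    """How far the average moved between the two versions."""
    return mean(new) - mean(old)


range_ok = in_range(production_ages) and in_range(staging_ages)
shift = mean_shift(production_ages, staging_ages)
print("range check passes on both versions:", range_ok)
print("shift in mean age:", shift)
```

Both versions pass the range check here, but the average age jumps by more than two decades, which only shows up when the two versions are compared.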

2. Datafold prevents a broad range of downstream issues

Because dbt models are interlinked with each other, data sources, and BI tools, small changes in one file can create a ripple effect and wreak havoc on downstream dependencies. Data quality issues can also affect the underlying data infrastructure and computing resources. 

Datafold Cloud’s column-level lineage identifies downstream tables and dashboards that will be impacted by data changes if the code in the pull request is deployed to production.

Datafold Cloud's Column-Level Lineage UI offers a comprehensive visual graph of workflows
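Conceptually, finding everything downstream of a changed column is a graph traversal. The sketch below assumes a tiny hand-written lineage graph. The column names, dashboard names, and the `downstream` helper are hypothetical and are not part of Datafold's API.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns
# or dashboards that read from it directly.
lineage = {
    "stg_orders.amount": ["fct_orders.revenue"],
    "fct_orders.revenue": [
        "dim_customers.lifetime_value",
        "dashboard:Weekly Revenue",
    ],
    "dim_customers.lifetime_value": ["dashboard:Customer Health"],
    "stg_orders.status": ["fct_orders.is_refunded"],
}


def downstream(graph, start):
    """Breadth-first walk that collects everything reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


impacted = downstream(lineage, "stg_orders.amount")
print(impacted)
```

Changing `stg_orders.amount` in this toy graph touches two downstream columns and two dashboards, while the unrelated `fct_orders.is_refunded` column is left alone.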

3. Datafold requires no manual test configuration or maintenance

Each dbt test must be manually defined and maintained for every column you want covered. If you have multiple tests per column, configuring all of them can take significant time, and the test suite gets harder to maintain as your dbt project scales.

Let’s take a look at how Datafold validates your data during your CI workflow in deployment testing. 

When you open a pull request with some code changes, your CI pipeline kicks off several jobs, including Data Diff. The Datafold bot leaves a comment identifying modified tables and columns with differing values.

To investigate these discrepancies further, you can click on View details to scrutinize the value-level differences within the Datafold app.

Now that you’ve set up your workflow for Data Diffs, you don’t need to configure any more tests or invest in continued maintenance. This is quite different from dbt tests, which must be continuously updated as your dbt project evolves to ensure complete coverage.

Datafold and dbt tests work really well together

As you can see, Datafold and dbt test coverage is complementary, and each performs distinct and essential tests.

Integrating Datafold into your existing dbt project’s CI pipeline is straightforward, and Datafold’s data diffing in CI can help your team avoid shipping code that breaks production data.

Datafold is the fastest way to validate dbt model changes during development, deployment & migrations. Datafold allows data engineers to audit their work in minutes without writing tests or custom queries. Integrated into CI, Datafold enables data teams to deploy with full confidence, ship faster, and leave tedious QA and firefighting behind.
