When dbt Is the Right Call (And When It Isn't): A Data Engineer's Honest Take
Most dbt implementations fail not because dbt is wrong, but because teams adopt it for the wrong reasons. Here's an honest framework from engineers who have deployed dbt across Snowflake, Databricks, and Redshift.
Your SQL Transformations Are Probably a Mess
This is not a criticism. It is a pattern.
You have 400 stored procedures in your data warehouse. Some were written three years ago by an engineer who left. Half of them have no documentation. Twenty of them run every night in a sequence that nobody fully understands, and when one fails at 3am, someone manually restarts it and hopes the downstream dashboards are correct by 9am.
Your finance team asks why last month's revenue number changed between Monday and Wednesday. You cannot answer that question in under an hour.
If any of this sounds familiar, you have a transformation layer problem. And dbt might be the right tool to fix it — but only if you adopt it for the right reasons.
What dbt Actually Solves
dbt is not a pipeline tool. It does not extract data. It does not load data. It transforms data that is already in your warehouse — and it does this one thing exceptionally well.
But the real value is not the transformation itself. It is everything around it:
Version control for your SQL. Every transformation is a .sql file in a Git repository. You can see who changed what, when, and why. You can code-review a transformation the same way you code-review application code.
Testing as a first-class concept. You can assert that a column is never null, that values are unique, that a foreign key relationship holds. These tests run automatically before any downstream model builds.
Documentation that lives with the code. Model descriptions, column definitions, and lineage are defined in YAML files next to the SQL. The documentation site is auto-generated and always up to date.
Dependency-aware execution. dbt understands which models depend on which. When you change a staging model, it rebuilds only the downstream models that are affected.
Lineage visibility. You can trace any metric in your dashboard back through every transformation to the raw source table. When finance asks "why did this number change?", you can answer in minutes, not hours.
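As a concrete sketch of the testing and documentation points above, both live in a YAML file next to the model's SQL. The model and column names here are hypothetical:

```yaml
# models/staging/stg_orders.yml — hypothetical model and column names
version: 2

models:
  - name: stg_orders
    description: "One row per order, cleaned from the raw source."
    columns:
      - name: order_id
        description: "Primary key for orders."
        tests:
          - not_null
          - unique
      - name: customer_id
        description: "Foreign key to stg_customers."
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```

Running `dbt test` executes these assertions against the warehouse; the same file feeds the auto-generated documentation site.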
When dbt Is the Right Call
1. You are on a modern cloud warehouse
dbt works because it pushes computation into your warehouse. If you are on Snowflake, Databricks, BigQuery, Redshift, or Azure Synapse, dbt can leverage the warehouse's compute engine directly. This is the architecture it was designed for.
2. Your transformation logic lives in stored procedures nobody understands
If you have a library of stored procedures that were written incrementally over years, with no tests, no documentation, and no version control — dbt gives you a structured way to migrate that logic into a testable, reviewable, documented format. Not overnight, but model by model.
3. Your analytics engineers and data analysts know SQL
dbt is SQL-first. If your team thinks in SQL, dbt meets them where they are. They do not need to learn Python, Spark, or a new programming paradigm. They write SELECT statements, and dbt handles the DDL, dependency resolution, and execution.
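A dbt model is literally just a SELECT statement in a file. The table and column names below are illustrative, but the shape is what every model looks like: dbt wraps the query in the appropriate CREATE TABLE or CREATE VIEW DDL and resolves the ref() call into both a qualified table name and a dependency edge.

```sql
-- models/marts/fct_daily_revenue.sql (illustrative names)
-- dbt compiles ref() to the fully qualified table name and
-- registers stg_orders as an upstream dependency in the DAG.
select
    order_date,
    sum(order_total) as daily_revenue,
    count(*)         as order_count
from {{ ref('stg_orders') }}
where status = 'completed'
group by order_date
```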
4. You need a shared definition of business metrics
Does "revenue" mean three different things to three different teams? dbt's semantic layer and metric definitions give you one governed place to define what "revenue" means — and every downstream dashboard pulls from that definition.
5. You are building a data platform that your team needs to own after the consultants leave
This is the scenario where dbt shines brightest. Because the transformation logic is SQL in Git, your internal team can read it, modify it, and extend it without needing specialised tooling knowledge. The handover documentation is the code itself.
When dbt Is NOT the Right Call
1. You need real-time or sub-second transformations
dbt runs in batch. It compiles SQL and executes it against your warehouse. If you need streaming transformations, event-driven processing, or sub-second latency, dbt is not the tool. You need Kafka Streams, Flink, or Spark Structured Streaming.
dbt can sit downstream of a streaming pipeline — transforming landed data in micro-batches — but it cannot replace the streaming layer itself.
2. Your transformations require heavy procedural logic
Some transformations are genuinely procedural: loops, conditional branching across rows, iterative algorithms, API calls mid-pipeline. dbt supports Jinja macros for templating, but if your transformation logic looks more like a Python script than a SQL query, you will fight dbt rather than benefit from it.
In these cases, a Python-based framework (PySpark, Pandas on a compute cluster, or a custom Airflow DAG with Python operators) is a better fit.
3. You do not have a cloud warehouse yet
If your data still lives exclusively in an on-premise SQL Server or Oracle database, adopting dbt before migrating to a cloud warehouse creates friction. dbt supports some on-premise adapters, but the experience is second-class. Migrate first, then adopt dbt.
What a dbt Implementation Actually Looks Like
This is the architecture we deploy for most mid-market clients. It is not the only pattern, but it is the one that survives contact with production.
Key decisions in this architecture:
- Fivetran or Airbyte for ingestion — dbt does not extract or load data. You need a separate EL tool to get data into the warehouse. We typically recommend Fivetran for teams that want managed connectors, and Airbyte for teams that want open-source flexibility.
- Three-layer dbt project — staging (raw cleanup), intermediate (business logic), marts (consumption-ready). This is not optional. Teams that skip the staging layer and write business logic directly against raw tables end up with the same spaghetti they had in stored procedures.
- Tests on every layer — not just on the mart models. If a source system sends null values in a column that should never be null, you want to catch that in staging, not after it has propagated through 15 downstream models.
- Orchestration via Airflow or dbt Cloud — dbt needs something to trigger it on a schedule. We use Apache Airflow for clients who want full control, and dbt Cloud for teams that prefer a managed scheduler.
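On disk, the three-layer structure described above follows dbt's recommended project layout. The source systems and domain folders below are examples, not a prescription:

```
models/
├── staging/            # one folder per source system; light cleanup only
│   ├── stripe/
│   │   ├── stg_stripe__payments.sql
│   │   └── stg_stripe__payments.yml
│   └── salesforce/
├── intermediate/       # reusable business logic, not exposed to BI tools
│   └── int_payments_joined.sql
└── marts/              # consumption-ready models, one folder per domain
    ├── finance/
    │   └── fct_revenue.sql
    └── marketing/
```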
Common Mistakes We See
Mistake 1: Adopting dbt without fixing your data ingestion first. If garbage data lands in your warehouse, dbt will transform garbage into well-documented, tested, version-controlled garbage. Fix your extraction and loading pipeline first. Then adopt dbt.
Mistake 2: One massive dbt project with 500+ models and no folder structure. We have seen projects where every model is in a single directory, naming conventions are inconsistent, and there is no separation between staging and mart layers. This is the dbt equivalent of putting all your code in one file. Use the recommended project structure from day one.
Mistake 3: Skipping incremental models because they are harder. Full-refresh models are simple but expensive. When your fact table hits 100 million rows, a full refresh takes 45 minutes and costs real money on Snowflake credits. Incremental models are harder to write correctly, but they are essential for cost control at scale.
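A minimal incremental model looks like the sketch below: on the first run dbt builds the full table, and on every subsequent run the is_incremental() block filters the scan down to new rows only. The source model and timestamp column are hypothetical:

```sql
-- models/marts/fct_events.sql (hypothetical source and columns)
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    user_id,
    event_type,
    occurred_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- only process rows newer than what already exists in the target
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

The hard part is the filter predicate: late-arriving data, updated rows, and timezone drift are where incremental models go wrong, which is why teams defer them — and then pay for full refreshes instead.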
Mistake 4: Not integrating dbt tests into CI/CD. dbt tests should run on every pull request, not just in production. If a developer changes a model and breaks a test, you want to catch that before it merges — not at 3am when the production run fails.
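As one way to wire this up, assuming a dbt Core project hosted on GitHub, a pull-request check might look like the sketch below. The adapter, secrets, and target name are placeholders to adapt to your warehouse and CI system:

```yaml
# .github/workflows/dbt-ci.yml — illustrative sketch, placeholder values
name: dbt CI
on: pull_request

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake   # swap in your warehouse adapter
      - run: dbt deps
      - run: dbt build --target ci       # builds and tests every model
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```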
Mistake 5: Treating dbt as a silver bullet for data quality. dbt tests validate structure and basic assertions. They do not replace a data quality monitoring tool like Great Expectations, Monte Carlo, or Elementary. dbt tells you "this column has no nulls." A data quality tool tells you "this column usually has 2% nulls, but today it has 15% — something changed upstream."
How We Work
Our Data Engineering & Pipeline Development service includes dbt implementation as a standard component for clients on modern cloud warehouses. A typical engagement looks like:
- Assessment (1-2 weeks) — audit your current transformation logic, identify migration candidates, and design the dbt project structure.
- Foundation build (2-4 weeks) — set up the dbt project, CI/CD pipeline, staging models for your highest-priority data sources, and initial mart models for one business domain.
- Migration & expansion (4-8 weeks) — systematically migrate stored procedures and legacy transformations to dbt models, adding tests and documentation as we go.
- Handover & training (1 week) — your team runs the project independently. We document everything and run a hands-on training session covering model development, testing, and deployment.
The entire transformation layer is your code, in your Git repository, documented in your style. When we leave, you own it completely.
If you are evaluating whether dbt fits your data stack, schedule a free strategy session with a senior data architect. We will tell you honestly whether dbt is the right call for your situation — and if not, what is.
Frequently Asked Questions
How long does a dbt implementation take?
For a mid-market company with 20-50 data sources, expect 8-14 weeks for a production-ready dbt project covering your highest-priority business domains. This includes project setup, CI/CD, staging and mart models, tests, documentation, and team training. Full migration of legacy stored procedures can take 3-6 months depending on complexity.
Does dbt replace Airflow or other orchestration tools?
No. dbt handles the T in ELT — transformation only. You still need an orchestration tool (Airflow, Dagster, Prefect, or dbt Cloud) to schedule dbt runs, and an extraction tool (Fivetran, Airbyte, custom scripts) to load data into your warehouse. dbt sits between ingestion and consumption.
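Whichever orchestrator you choose, the scheduled job it triggers usually boils down to a short command sequence like this (shown with dbt Core's CLI):

```
dbt deps    # install packages declared in packages.yml
dbt build   # run models, tests, seeds, and snapshots in DAG order
```

The orchestrator's job is scheduling, retries, and alerting around those commands — not the transformation logic itself.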
Can dbt work with our existing Snowflake / Databricks setup?
Yes. dbt connects to your warehouse via a database adapter. It supports Snowflake, Databricks (via Spark), BigQuery, Redshift, Azure Synapse, PostgreSQL, and more. It does not require changes to your warehouse infrastructure — it runs SQL against it.
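The connection details live in a profiles.yml file read by dbt at runtime. A minimal Snowflake profile looks like this sketch; every value below is a placeholder for your own account details:

```yaml
# ~/.dbt/profiles.yml — placeholder values
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your_account_identifier
      user: your_username
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: dbt_dev
      threads: 4
```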
Is dbt only for large enterprises?
No. dbt is just as valuable for a 3-person data team with 10 sources as for a 50-person data org with 500 sources. The project structure scales. What changes is the governance layer — larger teams need more rigorous CI/CD, model ownership conventions, and testing standards.
What is the cost of dbt?
dbt Core is open-source and free. dbt Cloud (managed scheduling, IDE, semantic layer) starts at ~$100/developer/month. The real cost is engineering time to set up the project, migrate logic, and train your team — which is where a consulting engagement typically provides the fastest path to production.
Not Sure Where to Start?
Book a free 30-minute strategy session with a senior data architect — no pitch, no obligation.
Schedule Your Free Strategy Session