Campaign Fulfillment at Scale: Migrating from SQL Server to Databricks — 4 Hours to 14 Minutes
A campaign fulfillment engine that worked fine at 100,000 contacts collapsed under the weight of 2 million. SQL Server was not the problem — the architecture was. Here is how we redesigned it for Databricks and cut fulfillment time by 94%.
The Breaking Point
The client operated one of the largest automotive marketing platforms in the US — serving 4,000+ dealerships with targeted campaign automation, including coupon-based offers delivered to prospect audiences segmented by vehicle interest, geography, purchase intent, and financing history.
The scale trajectory:
- 2019: A campaign to 200,000 contacts ran in 18 minutes
- 2021: A campaign to 600,000 contacts was taking 1.5 hours
- 2022: A campaign to 800,000 contacts was taking 2.5 hours with occasional timeouts
- Early 2023: Campaigns targeting 2 million+ contacts were failing to complete within the business day
The contractual SLA to dealerships was "campaign live within 2 hours of approval." That commitment was being broken weekly.
The system was collapsing under its own success. The question was whether to patch or redesign.
Previous company engagement. Client details are anonymized.
What the Campaign Fulfillment Engine Did
The fulfillment pipeline had four stages:
- Audience resolution — evaluate the campaign's targeting rules against the full contact database to produce a recipient list
- Coupon assignment — for coupon-based campaigns, assign a unique, one-time-use coupon code to each recipient (strict no-duplicate constraint)
- Personalization — render each recipient's message with their name, vehicle interest, assigned coupon, dealership details, and offer copy
- Delivery queuing — push personalized payloads to the delivery service (email, SMS, direct mail)
All four stages were implemented as stored procedures in SQL Server. Written when campaigns had 50,000 contacts. Patched and extended for three years. By 2023, they were a labyrinth of temporary tables, cursor loops, and cross-database joins that no single engineer could fully trace.
Root Cause Analysis
Before touching the migration, we spent two weeks profiling the existing system under production-representative load.
Finding 1: Coupon assignment was a cursor loop
Assigning unique coupons to 2M recipients was done row-by-row:
-- Simplified pseudocode of the original stored procedure:
-- one SELECT and one UPDATE per recipient, single-threaded
DECLARE recipient_cursor CURSOR FOR
    SELECT recipient_id FROM campaign_recipients;  -- table name simplified
OPEN recipient_cursor;
FETCH NEXT FROM recipient_cursor INTO @recipient_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT TOP 1 @coupon_id = coupon_id
    FROM coupons
    WHERE used = 0
      AND pool_id = @pool_id;

    UPDATE coupons
    SET used = 1, assigned_to = @recipient_id
    WHERE coupon_id = @coupon_id;

    FETCH NEXT FROM recipient_cursor INTO @recipient_id;
END
CLOSE recipient_cursor;
DEALLOCATE recipient_cursor;
At 2M recipients, this is 2 million sequential SELECT-and-UPDATE iterations executed one row at a time on a single thread. This loop alone accounted for ~68% of total fulfillment time.
Finding 2: Audience resolution was doing full table scans
The contact database held 8.5M records. The audience filter queries were missing composite indexes on the most common filter dimensions, resulting in multiple full-table scans per campaign execution.
Finding 3: Lock contention under concurrent load
Multiple campaigns running simultaneously (standard during business hours) were acquiring table-level locks on shared reference tables (coupon pools, contact attributes). Concurrent campaigns queued behind each other, causing cascading delays and occasional deadlocks.
Finding 4: Zero parallelism
The entire four-stage pipeline was single-threaded. SQL Server was sitting at 12% CPU utilization during campaign runs. The bottleneck was sequential stored procedure logic — not compute resource saturation.
The Redesign: Databricks + Delta Lake
We redesigned the pipeline as distributed Spark jobs on Databricks, with Delta Lake as the data layer.
Key Design Decision: Coupon Assignment Without Duplicates at Scale
This was the most complex problem. We had to guarantee that:
- Every recipient gets exactly one coupon
- No coupon is assigned to more than one recipient
- This holds under concurrent campaign execution (multiple campaigns running simultaneously)
- At 5M+ recipient scale
The solution: partition-aware batch join with Delta Lake optimistic concurrency.
The available coupons table is partitioned by coupon pool ID. Each recipient is assigned to a partition bucket using a deterministic hash. Within each partition, coupon assignment is a single shuffle-free join — no row-by-row iteration. The assignment write is a conditional Delta Lake MERGE:
# Simplified — actual implementation handles retry and partition logic.
# delta_coupons is a DeltaTable handle; assignments_df pairs each
# recipient with a candidate coupon_id from its partition bucket.
(
    delta_coupons.alias("coupons")
    .merge(
        assignments_df.alias("assignments"),
        "coupons.coupon_id = assignments.coupon_id AND coupons.assigned_to IS NULL",
    )
    .whenMatchedUpdate(
        set={
            "assigned_to": "assignments.recipient_id",
            "assigned_at": "current_timestamp()",
        }
    )
    .execute()
)
If a concurrent job claims a coupon in the same partition before this write commits, Delta Lake's optimistic concurrency detects the conflict and the affected assignments retry against the next available partition. Tested and verified: zero duplicate assignments at 5M recipient load across 50 concurrent test campaigns.
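The conflict-and-retry behavior can be sketched generically. Delta Lake surfaces a losing optimistic write as a concurrent-modification exception (e.g. `ConcurrentAppendException` in delta-spark); the wrapper below is a library-agnostic sketch using a stand-in exception, with illustrative backoff values:

```python
import time


class WriteConflict(Exception):
    """Stand-in for Delta's concurrent-modification exceptions."""


def commit_with_retry(write_fn, max_attempts=5, base_delay=0.01):
    """Retry an optimistic write; each retry re-reads state and
    (in the real job) targets the next available partition."""
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except WriteConflict:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff


# Demo: a write that conflicts twice, then commits.
attempts = {"n": 0}


def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise WriteConflict()
    return "committed"


result = commit_with_retry(flaky_write)
```

The key property is that a retry never re-claims the coupons it lost; it re-reads the current table state, so the uniqueness guarantee holds across concurrent campaigns.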
Audience Resolution with Predicate Pushdown
The contacts Delta table is Z-ordered on the four most common filter dimensions: geography_state, vehicle_category, purchase_intent_score, and last_contact_date. Audience resolution queries push filter predicates down to file-level skipping — a typical campaign filter touches 3–8% of data files instead of scanning the full 8.5M-record table.
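File-level skipping works because Delta stores per-file min/max statistics for each column; the planner prunes any file whose range cannot contain a matching row. A toy model of that pruning (file names and stats are illustrative):

```python
# Each Delta data file carries min/max stats per column; a filter can
# skip any file whose [min, max] range cannot contain matching rows.
files = [
    {"path": "part-000", "state_min": "AZ", "state_max": "CA"},
    {"path": "part-001", "state_min": "FL", "state_max": "GA"},
    {"path": "part-002", "state_min": "NY", "state_max": "TX"},
]


def files_to_scan(files, state):
    """Keep only files whose min/max range could contain `state`."""
    return [
        f["path"]
        for f in files
        if f["state_min"] <= state <= f["state_max"]
    ]


# A filter on geography_state = 'FL' touches one file, not three.
print(files_to_scan(files, "FL"))
```

Z-ordering makes this effective: it clusters rows so each file covers a narrow range of the filter columns, which is why most files can be skipped outright.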
Migration Approach: Shadow Mode, Not Big Bang
We ran parallel pipelines for 6 weeks:
- SQL Server continued processing all production campaigns
- Databricks processed the same campaigns in shadow mode
- Output comparison: recipient lists, coupon assignments, and personalized payloads were diff'd against the SQL Server output after each campaign
After zero discrepancies across 200+ shadow-mode campaigns, we cut over. The SQL Server pipeline was kept on cold standby for 30 days post-cutover as rollback insurance.
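The per-campaign comparison amounts to a symmetric diff of keyed outputs. A minimal sketch of the check run after each shadow campaign (record shapes are illustrative):

```python
def diff_outputs(legacy_rows, shadow_rows, key="recipient_id"):
    """Return records present in one pipeline's output but not the
    other, plus keys whose records differ between the two."""
    legacy = {r[key]: r for r in legacy_rows}
    shadow = {r[key]: r for r in shadow_rows}
    missing_in_shadow = [legacy[k] for k in legacy.keys() - shadow.keys()]
    missing_in_legacy = [shadow[k] for k in shadow.keys() - legacy.keys()]
    mismatched = [
        k for k in legacy.keys() & shadow.keys() if legacy[k] != shadow[k]
    ]
    return missing_in_shadow, missing_in_legacy, mismatched


# Demo: same recipients, one divergent coupon assignment.
legacy = [{"recipient_id": 1, "coupon": "A"}, {"recipient_id": 2, "coupon": "B"}]
shadow = [{"recipient_id": 1, "coupon": "A"}, {"recipient_id": 2, "coupon": "C"}]
gone, extra, changed = diff_outputs(legacy, shadow)
```

Any non-empty result fails the shadow run for that campaign, so cutover confidence rests on concrete record-level evidence rather than aggregate counts.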
Results
| Metric | SQL Server (before) | Databricks (after) | Change |
|---|---|---|---|
| 2M audience campaign time | 4.2 hours | 14 minutes | −94% |
| Max practical audience size | ~800K | 5M+ | 6.25x capacity |
| Coupon duplication errors | Occasional under load | Zero | Eliminated |
| Concurrent campaign capacity | 2 (lock contention) | 8+ | 4x |
| Infrastructure cost per campaign run | Baseline | −74% | 3.8x cheaper per run |
| SLA compliance (2-hour delivery) | 67% | 99.2% | +32 percentage points |
The cost reduction is structural: the Databricks job cluster auto-scales, runs the campaign, and terminates. There is no always-on legacy database instance sized for peak load sitting idle during off-hours.
What This Demonstrates
This is a canonical data engineering migration: a system built for one scale, hitting the wall at 10x that scale, requiring architectural redesign rather than incremental patching.
The skills demonstrated here are transferable to any data platform migration:
- Profile before migrating — the cursor loop and missing index findings came from measurement, not assumption
- Partition-aware distributed job design — the coupon assignment pattern applies to any distributed write with uniqueness constraints
- Shadow mode validation — running parallel pipelines to validate correctness before cutover sharply reduces migration risk
- Phased cutover with rollback — the 30-day cold standby gave the business confidence to cut over
The industry is automotive. The pattern applies everywhere.
Not Sure Where to Start?
Book a free 30-minute strategy session with a senior data architect — no pitch, no obligation.
Schedule Your Free Strategy Session