Why Organizational Silos Wreck Cohort Retention

"What's our retention rate?" Ask three executives and you'll get three different answers.

Sound familiar? As data engineers, we build elegant pipelines and craft pristine SQL, only to watch our masterpieces fuel organizational warfare. We thought we were building bridges, but somehow constructed battlegrounds instead.

Nowhere is this data civil war more evident than in cohort retention analysis—a business-critical metric that should unite teams but often tears them apart.

The Organizational Battleground: When "Retention" Means 10 Different Things

Picture this: It's the quarterly business review. The CMO proudly announces, "Our retention is up 12%!" only for the CPO to frown: "Actually, our numbers show it's down 3%." The CFO glances at her report: "According to Finance, it's flat."

Same company. Same users. Three different truths.

Before a single line of SQL is written, the organizational disconnect begins: each department views retention through its own specialized lens.

This departmental Rashomon effect creates real damage:

  1. Truth becomes tribal: "Marketing's numbers are wrong. Product's definition is the only one that matters."
  2. Data trust erodes: "These dashboards contradict each other. Can we trust any of our data?"
  3. Decision paralysis sets in: "We can't approve this initiative without agreement on the baseline metrics."
  4. Resource battles intensify: "Why invest in X when Y is clearly more impactful for retention?"

As the data engineers who built the pipes these conflicting metrics flow through, we cringe as our beautifully normalized tables fuel organizational chaos.

Technical Deep Dive: Where Silos Infiltrate Your SQL

Let's dissect how these organizational rifts manifest in code. Below is a simplified cohort retention SQL calculation—in real life, these queries can span hundreds of lines across multiple files.

-- Step 1: Define the Cohort - Who are we tracking?
WITH user_cohorts AS (
  SELECT
    user_id,
    MIN(signup_date) AS cohort_date -- Assuming 'signup_date' marks the start
    -- OR MIN(first_activity_timestamp)::DATE AS cohort_date -- Alternative definition
  FROM users -- Or an activity table if signup date isn't available
  GROUP BY 1
  -- >> Organizational Impact: Is 'signup_date' the agreed-upon cohort definition?
  --    What if Marketing uses 'first_marketing_touch_date'?
),
 
-- Step 2: Identify Relevant User Activity - What counts as "active"?
user_activity AS (
  SELECT
    user_id,
    activity_timestamp::DATE AS activity_date
  FROM activity_log
  WHERE event_type = 'core_product_action' -- Example: Product team's definition
  -- >> Organizational Impact: This 'event_type' filter IS the definition of active.
  --    Marketing might need 'email_open' or 'ad_click'. Finance might need 'purchase'.
  --    A single, hardcoded definition here serves only one team well.
),
 
-- Step 3: Map Activity Back to Cohorts and Calculate Periods
cohort_activity_periods AS (
  SELECT
    uc.user_id,
    uc.cohort_date,
    -- Calculate the week number since cohort start (Week 0 = first 7 days).
    -- Note: DATE_DIFF syntax varies by SQL dialect; adjust for your warehouse.
    FLOOR(DATE_DIFF('day', uc.cohort_date, ua.activity_date) / 7) AS week_number
  FROM user_cohorts uc
  JOIN user_activity ua
    ON uc.user_id = ua.user_id
  WHERE ua.activity_date >= uc.cohort_date -- Only count activity on or after the cohort start date
),
 
-- Step 4: Aggregate for Retention Metrics
final_cohort_retention AS (
  SELECT
    cohort_date,
    week_number,
    COUNT(DISTINCT user_id) AS retained_users
  FROM cohort_activity_periods
  GROUP BY 1, 2
),
 
-- Step 5: Calculate Cohort Size and Retention Percentage
cohort_sizes AS (
  SELECT
    cohort_date,
    COUNT(DISTINCT user_id) AS cohort_size
  FROM user_cohorts
  GROUP BY 1
)
 
-- Final Output
SELECT
  cs.cohort_date,
  cs.cohort_size,
  fcr.week_number,
  fcr.retained_users,
  (fcr.retained_users::FLOAT / cs.cohort_size) * 100 AS retention_percentage
FROM cohort_sizes cs
LEFT JOIN final_cohort_retention fcr
  ON cs.cohort_date = fcr.cohort_date
ORDER BY 1, 3;

The SQL Battlefield: Where Definitions Become Dogma

Look closely at our query. Three seemingly innocent choices (the cohort definition, the activity filter, and the time grain) create organizational landmines. Watch how each team bends them:

Marketing's Definition of Retention

Users who engaged with our email campaigns or visited our landing pages this month.

WITH user_cohorts AS (
  SELECT
    user_id,
    MIN(first_campaign_click_date) AS cohort_date
  FROM marketing_touchpoints
  GROUP BY 1
),
user_activity AS (
  SELECT
    user_id,
    activity_date
  FROM activity_log
  WHERE event_type IN ('email_open', 'campaign_click', 'landing_page_visit')
)
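For contrast, here are sketches of how Product and Finance might implement the same two CTEs. The Product version reuses tables from the main query; the Finance tables and columns (subscriptions, payments, first_payment_date) are illustrative assumptions.

Product's Definition of Retention

Users who completed a core product action loop at least once per week.

WITH user_cohorts AS (
  SELECT
    user_id,
    MIN(signup_date) AS cohort_date
  FROM users
  GROUP BY 1
),
user_activity AS (
  SELECT
    user_id,
    activity_timestamp::DATE AS activity_date
  FROM activity_log
  WHERE event_type = 'core_product_action'
)

Finance's Definition of Retention

Users with a successful payment or an active subscription in the billing cycle.

WITH user_cohorts AS (
  SELECT
    user_id,
    MIN(first_payment_date) AS cohort_date
  FROM subscriptions
  GROUP BY 1
),
user_activity AS (
  SELECT
    user_id,
    payment_date::DATE AS activity_date
  FROM payments
  WHERE status = 'successful'
)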

Insight

These three conflicting SQL implementations all claim to measure the same thing: "retention." But they're tracking fundamentally different user behaviors with dramatically different business implications.

This isn't just academic: a query that is perfect for one team becomes a source of conflict the moment another team runs a contradictory analysis built on a different definition.

From Code Monkey to Organizational Bridge Builder

We data engineers are uniquely positioned to solve this problem—not just with technical solutions, but by reimagining our role. Here's how:

1. Build Flexible Foundations, Not Fixed Reports

⚠️ Hardcoded SQL definitions are organizational landmines waiting to explode in your next cross-functional meeting.

Instead of hardcoding assumptions, create parameterized solutions:

-- Inflexible approach that only works for one team:
WHERE event_type = 'core_product_action'

Why it matters: parameterizing the event filter allows the same query to adapt to different team definitions without changing the core logic.

Create reusable components with explicit, configurable parameters. Build with dbt macros, stored procedures, or view generators that accept parameters like the cohort-defining event, the activity filter (event_type), and the time grain (weekly vs. monthly).
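As one possible shape (the macro and parameter names here are hypothetical), a dbt macro could generate the activity CTE for any team:

{% macro active_users(activity_table, event_types) %}
  SELECT
    user_id,
    activity_timestamp::DATE AS activity_date
  FROM {{ activity_table }}
  WHERE event_type IN ({{ event_types | join(', ') }})
{% endmacro %}

-- Marketing's model calls it one way, Product's another:
-- {{ active_users('activity_log', ["'email_open'", "'campaign_click'"]) }}
-- {{ active_users('activity_log', ["'core_product_action'"]) }}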

2. Facilitate the Definition Discussions

Become the translator between business and data: sit Marketing, Product, and Finance at the same table, make each team's implicit definition explicit, and document where they diverge and why.

Lead the diplomatic mission: you don't need one definition to win. You need every definition to be named, versioned, and visible, so "retention" always arrives with a qualifier.

3. Engineer for Transparency, Not Just Performance

Make definitions visible: embed each team's business definition directly in the model, as comments and documentation, so anyone reading the query knows whose "retention" it computes.

Surface conflicts proactively: when two teams' definitions diverge, quantify the gap before the quarterly review does it for you.
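One concrete way to surface those conflicts is a side-by-side comparison. This sketch assumes both definitions have been materialized as views (marketing_retention and product_retention are hypothetical names) with the same output columns as the main query above:

-- Quantify how far two teams' "retention" numbers diverge per cohort
SELECT
  m.cohort_date,
  m.week_number,
  m.retention_percentage AS marketing_retention_pct,
  p.retention_percentage AS product_retention_pct,
  m.retention_percentage - p.retention_percentage AS definition_gap
FROM marketing_retention m
JOIN product_retention p
  ON m.cohort_date = p.cohort_date
 AND m.week_number = p.week_number
ORDER BY ABS(m.retention_percentage - p.retention_percentage) DESC;

Publishing this gap table alongside the dashboards turns "whose number is right?" into "here is exactly where, and by how much, the definitions differ."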

From Data Plumbers to Strategic Partners

How Different Teams View Retention

Marketing

Key Metric: Campaign Engagement
Timeframe: Monthly
Definition: Email opens, ad clicks, page visits within the month.

Product

Key Metric: Feature Adoption
Timeframe: Weekly
Definition: Completed core product action loop at least once per week.

Finance

Key Metric: Revenue Retention
Timeframe: Billing Cycle
Definition: Successful renewal payment or active subscription status.

Customer Success

Key Metric: Account Health
Timeframe: Quarterly
Definition: Maintained positive health score and had no critical support escalations during the quarter.

This organizational challenge presents an opportunity. By addressing these retention definition conflicts, data engineers transition from backend implementers to strategic partners who:

  1. Facilitate cross-functional alignment by making metric definitions explicit and negotiable
  2. Build flexible technical foundations that adapt to evolving business needs
  3. Create transparency around data assumptions that builds organizational trust

The most valuable SQL you'll ever write isn't the one with the cleverest optimization—it's the one that helps your organization reach consensus on what its data actually means.

So the next time you're asked to build yet another retention dashboard, remember: your true value isn't just in joining tables, but in joining teams around a shared understanding of what the data means.

After all, the most important joins we make aren't in our queries—they're between the people who use them.


Question

What organizational data conflicts have you encountered? How did you solve them?

not made by a 🤖