002 ·  Selected work  ·  Digital banking ·  2022–2023

An active customer,
defined once.

Fourteen source systems, fourteen definitions of the same word, and an OSFI reviewer asking the same opening question every cycle: where does this number come from. We wrote a metric contract, landed bronze / silver / gold under audit lineage, and put one answer on every desk.

Duration: ~5 months (Build phase of a 14-month engagement)  ·  Team: 4  ·  Programs: Build · Advance ·  Year: 2022–2023

Brief

The ask

A Canadian digital bank under OSFI supervision asked us to stand up one analytics layer the executive team could read together — product, finance, risk, and compliance all reconciling against the same row. The constraint was the regulator. Every transformation had to be auditable, versioned, and explainable to a reviewer who would never see the SQL. Ad-hoc query writing at the modeled layer was disallowed by the time we arrived.

Starting state: fourteen source systems — core banking, mobile-app telemetry, KYC, AML scoring, marketing automation, two CRM instances, a support tool, NPS, a rates engine, the card ledger, and three operational CSV drops. Sixty-plus analysts on read-replicas of operational systems, each reporting a different number for the same word. A quarterly board deck that shipped late and got reconciled by footnote. The Build phase ran ~5 months inside a 14-month engagement.

Problem

The real diagnosis

The reported symptom was clean: "we can't agree on how many active customers we have." The COO had asked the question in three consecutive ops reviews and received three different numbers. Marketing counted email-engaged in the rolling 30. Product counted weekly-app-launch. Core banking counted any settled transaction. KYC counted verified-and-funded. Compliance counted something else again. Pairwise, the definitions diverged by 18–34% depending on the business unit and the week.

The real diagnosis was not measurement — it was governance. Every team owned its own definition because nobody had ever been paid to write the one. Audit lineage was missing from anything that ended up on a slide. The opening question of every OSFI examination was the same: where does this number come from. The bank could not answer it from a report. A forensic crawl through three systems by an analyst with tribal knowledge counted as a control — until the next analyst hire.

Quantify it. Fourteen production systems feeding reports. Sixty-plus analysts writing ad-hoc SQL against operational read-replicas at a cadence that meant the same business question produced different answers on different desks the same morning. The quarterly board deck shipped reconciled by methodology footnote. The Head of Analytics ran a Friday standing call to reconcile two analyst numbers that should have been the same number. That call existed for eighteen months before we were brought in.

Approach

Five moves, in order

Move 01

Sign the metric contract. One definition, on one page.

Three weeks of interviews across product, marketing, core banking, KYC, AML, and compliance produced a one-page contract. Active customer was defined once: primary-product holder, at least one customer-initiated event in the trailing 30 days, opted-in to operational contact, not in a frozen or under-review state. The CFO signed it. The Chief Compliance Officer signed it. Every downstream model carried a comment tying the column it produced back to the contract clause it implemented.

Move 02

Land bronze. Fivetran for SaaS, custom CDC for the core, audit lineage everywhere.

Snowflake as the warehouse. Fivetran for the eleven SaaS sources. A custom CDC pipeline ran in-VPC for the core banking platform and the card ledger — regulated data, could not leave the network boundary. Bronze was strictly load-as-is, no transforms, PII tagged at ingest with classification inherited from the source contract. Datadog watched freshness against the source SLAs and paged before the analyst noticed.

           ┌────────────────────────────────────────────────┐
           │  14 SOURCE SYSTEMS                              │
           │  core_banking · kyc · aml · mobile_app · crm_a  │
           │  crm_b · marketing_automation · support · nps   │
           │  rates_engine · card_ledger · csv_drops (x3)    │
           └─────────────────────┬──────────────────────────┘
                                 │  Fivetran (SaaS)
                                 │  Custom CDC (regulated, in-VPC)
                                 ▼
           ┌────────────────────────────────────────────────┐
           │  BRONZE                raw_<source>.<table>     │
           │  load-as-is · PII-tagged · freshness-monitored  │
           └─────────────────────┬──────────────────────────┘
                                 │  dbt: staging models
                                 ▼
           ┌────────────────────────────────────────────────┐
           │  SILVER                stg_<entity>             │
           │  typed · de-duplicated · contract-conformed     │
           │  one row per business-key · lineage-tagged      │
           └─────────────────────┬──────────────────────────┘
                                 │  dbt: dimensional models
                                 ▼
           ┌────────────────────────────────────────────────┐
           │  GOLD                  dim_customer (SCD-2)     │
           │                        fct_journey_event        │
           │                        mart_weekly_funnel       │
           │  one canonical definition per metric · tested   │
           └────────────────────────────────────────────────┘

Move 03

Build dim_customer as SCD-2, identity-resolved across fourteen sources.

One canonical customer dimension, slowly-changing on every attribute that affected the funnel — product holdings, KYC state, AML risk band, opt-in posture, frozen flag. Identity resolution joined the fourteen sources by a deterministic key set (core banking party-ID first, then a hashed-SIN fallback, then a phone+email composite for the long tail). The is_active flag is computed once, from the contract, on this table. Nothing downstream redefines it.

-- models/marts/dim_customer.sql
-- Slowly-changing dimension, Type 2, one row per (party_id, valid_from).
-- is_active follows the signed metric contract:
--   primary-product holder + customer-initiated event in trailing 30d
--   + opted-in + not in frozen/under-review state.
-- Lineage tag: METRIC-CONTRACT-2022-Q3 · clause 3.1 (active customer).

{{ config(
    materialized = 'incremental',
    unique_key   = 'customer_sk',
    on_schema_change = 'fail',
    tags         = ['gold', 'lineage:metric-contract-2022-q3']
) }}

with resolved as (

    select
        coalesce(cb.party_id, ph.party_id, pc.party_id)         as party_id,
        coalesce(cb.legal_name_hash, ph.sin_hash)               as identity_hash,
        cb.primary_product_code,
        cb.primary_product_opened_at,
        ks.kyc_state,
        ks.kyc_verified_at,
        am.aml_risk_band,
        cm.opt_in_operational,
        cb.account_status,                                     -- 'active' / 'frozen' / 'under_review'
        ev.last_customer_event_at,                             -- max across journey-affecting events
        current_timestamp::timestamp_ntz                       as snapshot_at

    from   {{ ref('stg_core_banking__party')         }} cb
    left   join {{ ref('stg_kyc__verification_state') }} ks
           on ks.party_id          = cb.party_id
    left   join {{ ref('stg_aml__risk_band')          }} am
           on am.party_id          = cb.party_id
    left   join {{ ref('stg_crm__contact_marketing')  }} cm
           on cm.party_id          = cb.party_id
    left   join {{ ref('stg_journey__last_event')     }} ev
           on ev.party_id          = cb.party_id
    -- fallback joins for the long tail (no core-banking party row yet)
    left   join {{ ref('stg_kyc__pending_handoff')    }} ph
           on ph.sin_hash          = cb.legal_name_hash
    left   join {{ ref('stg_crm__phone_email_composite') }} pc
           on pc.contact_hash      = cb.legal_name_hash

),

scored as (

    select
        r.*,
        case
            when r.primary_product_code is not null
             and r.last_customer_event_at >= dateadd(day, -30, r.snapshot_at)
             and r.opt_in_operational = true
             and r.account_status     = 'active'
            then true
            else false
        end                                                     as is_active

    from   resolved r

),

hashed as (

    select
        hash(party_id, snapshot_at)                             as customer_sk,
        party_id,
        identity_hash,
        primary_product_code,
        primary_product_opened_at,
        kyc_state,
        kyc_verified_at,
        aml_risk_band,
        opt_in_operational,
        account_status,
        last_customer_event_at,
        is_active,
        snapshot_at                                             as valid_from,
        cast(null as timestamp_ntz)                             as valid_to,
        true                                                    as is_current
    from   scored

)

{% if is_incremental() %}

-- close out the previous current row when any tracked attribute changes
merge into {{ this }} tgt
using (
    select  h.*
    from    hashed h
    join    {{ this }} prev
      on    prev.party_id  = h.party_id
     and    prev.is_current = true
    where   prev.primary_product_code   is distinct from h.primary_product_code
       or   prev.kyc_state              is distinct from h.kyc_state
       or   prev.aml_risk_band          is distinct from h.aml_risk_band
       or   prev.account_status         is distinct from h.account_status
       or   prev.opt_in_operational     is distinct from h.opt_in_operational
       or   prev.is_active              is distinct from h.is_active
) chg
on   tgt.party_id   = chg.party_id
 and tgt.is_current = true
when matched then update set
     valid_to   = chg.valid_from,
     is_current = false;

{% endif %}

select * from hashed

Move 04

Capture every customer-affecting event in fct_journey_event.

One fact table, one row per event, stage-attributed at write time against the four contracted stages — onboarded, activated, engaged, retained. Source-system column on every row. Audit-lineage tag on every column tying it back to the metric clause it implemented. Datadog watched freshness; a missed source SLA paged the analytics engineering on-call before Monday's funnel rendered. The fact-table grain stayed event-level so any new stage definition could be replayed without a backfill.

Move 05

Decommission read-replica access. Move the sixty onto the modeled layer.

We did not flip a switch. Product analytics migrated first, then marketing, then risk, then operations. Each team got a one-week pairing engagement: their three highest-volume queries rewritten against gold, plus office hours. Read-replica grants on operational systems were revoked team by team, with the analyst's manager in the meeting. By the end of the Build phase, no analyst was querying operational systems directly. The weekly executive funnel — six panels, one PDF, Monday 07:00 — replaced the quarterly board deck with the CFO's blessing.

Outcome

60 analysts.
14 sources.
1 customer.

One definition, written and signed. Sixty analysts reading the same tables. Fourteen production systems collapsed into a single governed layer whose lineage survives a regulator's opening question — without anyone leaving the room to look something up.

14 → 1

Source systems collapsed into one governed customer-journey warehouse with audit lineage tagged to a signed metric contract.

60 → 0

Analysts writing ad-hoc SQL against operational read-replicas. All sixty moved onto tested dbt models at the modeled layer.

Quarterly → weekly

Executive funnel cadence. The deck nobody read was retired; the Monday funnel is the report the COO opens first.

1 definition

Of active customer — primary product, last 30 days, opted-in. CFO-signed, CCO-signed, used end to end.

What this enabled, beyond the headline. The first OSFI review after migration opened on a lineage screen, not a forensic crawl — the reviewer's opening question was answered by clicking a column. Compliance reviews of new marketing campaigns moved from days to hours because the contract was queryable. The product team began reading A/B results against the weekly funnel inside the same week, replacing a six-to-eight-week analyst cycle. The Friday reconciliation call the Head of Analytics had been running for eighteen months was cancelled and not rescheduled.

Credits

Who shipped this

Lead Analyst
Haris Naeem
Analytics Engineering
Contractor — anonymized
Data Engineering
Internal team — anonymized
Client Stakeholder
Head of Analytics — anonymized
Client Stakeholder
VP Product — anonymized
Client Sponsor
Chief Operating Officer — anonymized
Governance Review
Compliance team — anonymized

Client-side names withheld under a multi-year confidentiality agreement. Named references available on request after a discovery call.

Next

Building a governed warehouse under regulator supervision?

Send a note describing the source systems you're unifying, the regulator your team answers to, and where your current definitions diverge across business units. If it's a fit, a named reference from this engagement is available after a discovery call.

© 2026 Coded Level — Toronto analytics studio.Toronto, ON · Canada