Medallion Architecture in Practice: Six Layers, One Source of Truth

Dilkesh Tilokchandani

Jun 04, 2026 (4 min read)

Two hundred source definitions. Forty-seven job pipelines. Five hundred and sixteen transformation models. And not a single bespoke DAG written for any of them.

That's the headline result of a data platform we built to ingest, transform, and distribute healthcare and commercial data at scale - all driven by YAML configuration rather than hand-written pipeline code. Here's the architecture behind it, and what we're proud of delivering.

The Core Idea: Configuration Over Code

The guiding principle behind this platform is simple: adding a new data source should mean writing a YAML file, not a new DAG.

Instead of writing logic per source, we built general-purpose DAG generators that read metadata and produce the right Airflow behavior dynamically. The YAML files are the source of truth. This single decision shaped everything else about the platform.

Architecture Overview

The platform has two connected repositories working in tandem: an orchestration layer built on Airflow, and a transformation layer built on dbt. Together they form a clean end-to-end pipeline - from raw vendor files to business-ready outputs distributed to downstream consumers.

The Ingestion Layer

For each data source, a YAML file defines everything the system needs to know: where the file comes from (SFTP, S3, Salesforce SOQL, or an external API), the filename pattern, load type, delimiter and encoding, archive behavior, and column-level data quality expectations.
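
To make the shape of such a definition concrete, here is a minimal sketch of how a parsed source file might be validated before any DAG is generated. The field names, defaults, and allowed values are assumptions made for illustration; the article lists the kinds of settings a source YAML carries but not the actual schema. The input is a plain dict, as a YAML loader would return.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the four source kinds named in the article.
ALLOWED_SOURCE_TYPES = {"sftp", "s3", "salesforce_soql", "api"}
# Assumed load types; the real platform may support others.
ALLOWED_LOAD_TYPES = {"full", "incremental"}


@dataclass(frozen=True)
class SourceConfig:
    """One source definition. All field names here are illustrative."""
    name: str
    source_type: str
    filename_pattern: str
    load_type: str = "full"
    delimiter: str = ","
    encoding: str = "utf-8"
    archive: bool = True
    # Column-level data quality expectations, e.g. {"patient_id": ["not_null"]}.
    column_checks: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.source_type not in ALLOWED_SOURCE_TYPES:
            raise ValueError(f"{self.name}: unknown source_type {self.source_type!r}")
        if self.load_type not in ALLOWED_LOAD_TYPES:
            raise ValueError(f"{self.name}: unknown load_type {self.load_type!r}")


def parse_source(raw: dict) -> SourceConfig:
    """Build a validated config; misspelled keys fail with a TypeError."""
    return SourceConfig(**raw)
```

Rejecting unknown keys and values at parse time means a typo in a source file surfaces before the pipeline runs rather than as a silent default.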

A DAG generator reads these files at runtime and produces one Airflow DAG per source. Each DAG follows a consistent, reliable sequence:

  1. Fetch files from the source system
  2. Land them in an S3 input bucket
  3. Log ingestion status to a Postgres/RDS metadata store
  4. Load into Snowflake raw tables using COPY INTO
  5. Run dbt tests on raw data
  6. Build base models
  7. Run dbt tests again
  8. Archive processed files
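
The eight steps above can be sketched as a single template that is stamped out once per source. This is a simplified illustration in plain Python rather than real Airflow code: the step identifiers, the `ingest_` DAG prefix, and both function names are hypothetical, and a real generator would create Airflow tasks instead of tuples.

```python
from typing import Optional

# One shared template: every source gets the same ordered steps.
STEP_TEMPLATE = [
    "fetch_files",
    "land_in_s3_input",
    "log_ingestion_status",
    "copy_into_raw",
    "dbt_test_raw",
    "dbt_build_base",
    "dbt_test_again",
    "archive_files",
]


def build_source_plan(source_name: str) -> list[tuple[str, Optional[str]]]:
    """Return (task_id, upstream_task_id) pairs forming a linear chain."""
    plan = []
    upstream = None
    for step in STEP_TEMPLATE:
        task_id = f"{source_name}__{step}"
        plan.append((task_id, upstream))
        upstream = task_id  # the next step depends on this one
    return plan


def build_all_plans(source_names: list[str]) -> dict[str, list]:
    """One plan per source, keyed by a hypothetical DAG id."""
    return {f"ingest_{name}": build_source_plan(name) for name in source_names}
```

Because the sequence lives in one place, changing it (say, adding a step) would apply to every source at once, which is the practical payoff of generating DAGs from metadata.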

Sensitive datasets follow a dedicated PII-specific path that routes them into an isolated Snowflake schema with separate bucket and stage configurations - keeping compliance requirements cleanly separated from the main pipeline with no extra work from the engineers adding new sources.
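
One way to picture framework-enforced routing is a single function that picks the schema, bucket, and stage from a flag in the source definition. The sketch below assumes a hypothetical `contains_pii` flag, and every schema, bucket, and stage name is a placeholder rather than the platform's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTarget:
    schema: str
    bucket: str
    stage: str


# Placeholder names, invented for this example.
STANDARD_TARGET = LoadTarget("RAW", "example-input", "RAW.INPUT_STAGE")
PII_TARGET = LoadTarget("RAW_PII", "example-input-pii", "RAW_PII.INPUT_STAGE")


def resolve_target(source_cfg: dict) -> LoadTarget:
    """Pick the load target from the (assumed) contains_pii flag."""
    return PII_TARGET if source_cfg.get("contains_pii", False) else STANDARD_TARGET


def copy_into_sql(table: str, target: LoadTarget) -> str:
    """Render a COPY INTO statement pointing at the resolved schema and stage."""
    return f"COPY INTO {target.schema}.{table} FROM @{target.stage}/{table}/"
```

The engineer adding a source only sets the flag; everything that depends on it is derived in one place. A stricter variant might require the flag to be stated explicitly for every source instead of defaulting to the standard path.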

The Medallion Architecture: Six Layers of Trust

The heart of the transformation layer is a medallion-style architecture implemented in dbt. Rather than consolidating everything into a single schema and hoping for the best, we designed six distinct layers - each with a clear purpose, increasing levels of refinement, and explicit contracts between them.

The idea is straightforward: an engineer can look at any model path and instantly know how mature and trustworthy the data is. A raw_ prefix signals unprocessed ingestion straight from source. A reporting_ prefix signals a polished, validated, business-approved output. The layers in between are the journey from one to the other.
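
A naming convention like this is easy to check mechanically. The following sketch infers a model's layer from its name; the article confirms the `raw_`, `core_`, and `reporting_` prefixes, while `base_`, `analytics_`, and `forecasting_` are assumed here by analogy.

```python
# Layer names from the article; using each as a model-name prefix is
# confirmed only for raw_, core_ and reporting_ and assumed for the rest.
LAYERS = ("raw", "base", "core", "analytics", "reporting", "forecasting")


def layer_of(model_name: str) -> str:
    """Return the layer encoded in a model name, or raise if there is none."""
    prefix, separator, rest = model_name.partition("_")
    if not separator or not rest or prefix not in LAYERS:
        raise ValueError(f"{model_name!r} does not follow the <layer>_<entity> convention")
    return prefix
```

A check of this kind could run in CI so that a model without a recognised prefix never reaches the warehouse.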

RAW is the landing zone. Data arrives here exactly as it came from the source - no transformations, no business logic. Every source table is preserved in its original form, giving a full audit trail and the ability to reprocess from scratch if needed.

BASE is where raw data gets its first treatment. Light cleaning happens here: standardising column names, enforcing data types, trimming whitespace, handling nulls, and normalising encodings. BASE models are thin but consistent - they create a reliable foundation for everything built on top.

CORE is where the real transformation happens. This layer joins tables, deduplicates records, applies business rules, and produces the canonical representation of each business entity. If BASE is about technical correctness, CORE is about business correctness. A model like core_patients or core_prescriptions reflects what those entities actually mean to the organisation - not just what arrived in a file. CORE is the truth layer the entire downstream stack is built on.

ANALYTICS builds on CORE to produce aggregated, metric-level models - pre-computed measures, period-over-period comparisons, cohort rollups. By materialising these in the warehouse rather than in BI tools, every consumer querying the data sees consistent numbers.

REPORTING packages analytics outputs into consumption-ready datasets shaped to the exact needs of specific consumers - dashboards, stakeholder reports, or downstream applications. These models are denormalised and labelled with business-friendly names, requiring minimal transformation by the end consumer.

FORECASTING is the specialised layer for predictive and modelling workflows. It draws from CORE and ANALYTICS to feed statistical models, demand forecasts, and scenario analyses - keeping predictive logic cleanly separated from the operational reporting stack.
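
The contracts between layers can be written down as a map of allowed upstream layers and checked against each model's references. The map below is one strict reading of the descriptions above (for example, it does not let a CORE model reference another CORE model), so treat it as an assumption rather than the platform's actual rules. The `refs` input is a hypothetical model-to-parents mapping, such as one extracted from dbt's manifest.

```python
# Which layers each layer may read from, under the assumptions stated above.
ALLOWED_UPSTREAM = {
    "raw": set(),
    "base": {"raw"},
    "core": {"base"},
    "analytics": {"core"},
    "reporting": {"analytics"},
    "forecasting": {"core", "analytics"},
}


def find_violations(refs: dict[str, list[str]]) -> list[str]:
    """List 'model -> parent' edges that skip or reverse the layer order."""
    violations = []
    for model, parents in sorted(refs.items()):
        layer = model.split("_", 1)[0]
        allowed = ALLOWED_UPSTREAM.get(layer, set())
        for parent in parents:
            parent_layer = parent.split("_", 1)[0]
            if parent_layer not in allowed:
                violations.append(f"{model} -> {parent}")
    return violations
```

For instance, a reporting model that selects straight from a raw table would be flagged, while a forecasting model reading from both CORE and ANALYTICS would pass.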

Job Orchestration

Job orchestration reads job YAML files to create sequential Airflow DAGs. A job can trigger source DAGs, trigger other jobs, run dbt commands, wait on external tasks, capture dbt artifacts, and send status or data quality emails. This composability is what makes the whole system feel like one cohesive platform rather than a collection of independent pipelines.
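
As an illustration of that composability, the sketch below turns a job definition into an ordered list of steps by dispatching on a step type. The step type names and their keys are hypothetical stand-ins for the capabilities listed above, and the output is a list of descriptions rather than real Airflow tasks.

```python
# Hypothetical step types, one per capability named in the article.
RENDERERS = {
    "trigger_source": lambda s: f"trigger source DAG {s['source']}",
    "trigger_job": lambda s: f"trigger job {s['job']}",
    "dbt": lambda s: f"run {s['command']}",
    "wait_external": lambda s: f"wait for {s['dag']}.{s['task']}",
    "capture_artifacts": lambda s: "capture dbt artifacts",
    "email": lambda s: f"email {s['kind']} report",
}


def build_job(job: dict) -> list[str]:
    """Render a job's steps in order; each runs after the one before it."""
    tasks = []
    for index, step in enumerate(job["steps"], start=1):
        kind = step["type"]
        if kind not in RENDERERS:
            raise ValueError(f"{job['name']}: unsupported step type {kind!r}")
        tasks.append(f"{index:02d} {RENDERERS[kind](step)}")
    return tasks
```

A job that triggers a source, runs `dbt build --select tag:core`, and then emails a data quality report would be three entries in the `steps` list, with no Python written by the person defining it.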

Outbound Publishing

Data doesn't just flow in - it flows out. A feeds framework reads feed YAML files, executes Snowflake queries, generates output files, and uploads them to S3 or pushes directly back to Salesforce. A separate file transfer manager handles standalone SFTP-to-S3 delivery for vendor partners.
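
The query-to-file part of a feed can be sketched in a few lines. Here the feed keys (`query`, `filename_template`, `delimiter`) are assumed names, and `run_query` is a hypothetical callable standing in for a Snowflake connection so the example stays self-contained; the upload to S3 or Salesforce is left out.

```python
import csv
import io
from datetime import date


def build_feed_file(feed: dict, run_query, run_date: date) -> tuple[str, str]:
    """Run the feed's query and return (filename, delimited file content).

    run_query is expected to return (column_names, rows) for a SQL string.
    """
    columns, rows = run_query(feed["query"])
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=feed.get("delimiter", ","),
        lineterminator="\n",
    )
    writer.writerow(columns)   # header line
    writer.writerows(rows)     # one line per result row
    filename = feed["filename_template"].format(date=run_date.strftime("%Y%m%d"))
    return filename, buffer.getvalue()
```

Passing the query runner in as an argument keeps the file-building logic testable without a warehouse connection.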

What We Achieved

Scale without complexity. The platform handles over 200 source YAML definitions, 47 job definitions, 15 feed definitions, and around 516 SQL models - spanning commercial data domains including Veeva, Prognos, Claritas, SFMC, MMIT, Komodo, Cardinal 3PL, and others. Adding a new vendor typically involves writing a source YAML and a handful of dbt models. No new DAG code, no new operators.

Operational reliability built in. Every source pipeline - without exception - gets ingestion logging, file archiving, DQ test gates, and dbt artifact capture. These aren't features someone has to remember to add; they're part of the framework, and every YAML-defined source inherits them automatically.

Compliance handled gracefully. PII data flows through a fully isolated path with separate infrastructure, enforced by the framework - not by individual engineers remembering to do the right thing.

A trustworthy data contract across six layers. The medallion architecture gives every downstream consumer a clear signal about what they're working with. RAW preserves the original record. BASE standardises it. CORE establishes business truth. ANALYTICS aggregates it consistently. REPORTING delivers it consumption-ready. FORECASTING powers predictive work, cleanly separated from operational flows. Each layer earns its name.

Real-world edge cases solved. ZIP file ingestion, incremental file detection, encoding variations, outbound Salesforce syncs, custom QC reporting - these aren't afterthoughts. They're built into the framework as first-class capabilities, accumulated through the real experience of working with dozens of external vendors.
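
Incremental file detection, for example, can be reduced to comparing what the source currently offers with what the metadata store already records as loaded. This is a minimal sketch under that assumption; the function name and inputs are hypothetical, and the real framework may track files differently (by checksum or timestamp, say).

```python
from fnmatch import fnmatchcase


def new_files(available: list[str], pattern: str, already_loaded: set[str]) -> list[str]:
    """Return files that match the source's pattern and were not loaded before."""
    return sorted(
        name
        for name in available
        if fnmatchcase(name, pattern) and name not in already_loaded
    )
```

Given the files `claims_20260601.csv`, `claims_20260602.csv`, and `readme.txt`, the pattern `claims_*.csv`, and a log showing the first file as loaded, only `claims_20260602.csv` would be picked up.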

The Bigger Picture

The metadata-driven approach isn't new - it's a well-established pattern in data engineering. What made it work here was the discipline to commit to it consistently across ingestion, transformation, orchestration, and outbound publishing, rather than applying it selectively and falling back to bespoke code when things got complicated.

The result is a platform that scales with the business. New data sources, new business domains, new downstream consumers - all handled through configuration. The engineering effort goes into improving the framework, not repeating the same work for the hundredth source.

For teams considering a similar path, our advice: invest in the YAML schema design before writing a single line of generator code. The shape of your configuration files determines everything downstream, and it's far easier to get that right at the start than to migrate 200 source definitions later.

Want to learn more about how we approach data platform engineering? Reach out to our team - we'd love to hear how you're tackling similar challenges.
