dbt schema models
dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. It helps teams work directly within the warehouse to produce trusted datasets for reporting, ML modeling, and operational workflows.
dbt performs the T (Transform) of ELT (rather than traditional ETL), but it does not offer support for Extract and Load operations. It allows companies to write transformations as SQL queries and orchestrate them more efficiently.
Multiple SQL databases are supported, including PostgreSQL, Oracle, Databricks, Redshift, BigQuery, Snowflake, etc.
For selected targets, currently only Snowflake, it is possible to generate the schema.yml model files for the entities in a model. The feature is available in Tools > Forward-Engineering > dbt models.
dbt models
Warning: the term "model" in dbt terminology is different from its meaning in Hackolade. In dbt, <model name> is the technical name of an entity/table/collection/record in Hackolade Studio.
Model properties are declared in .yml files, generally in a models/ directory (or possibly elsewhere). These files can be named whatever_you_want.yml (for us, it will be either the schema's technical name or the entity's technical name, depending on the option chosen).
Example:
version: 2

models:
  - name: <model name>
    description: <markdown_string>
    columns:
      - name: <column_name> # required
        data_type: <string>
        description: <markdown_string>
        constraints:
          - <constraint>
        tags: [<string>]
      - name: ... # declare properties of additional columns
    constraints:
      - <constraint>
    config:
      <model_config>: <config_value>
Users can choose to generate either one file per entity or one file per schema; in the latter case, multiple entities are listed inside the same model file.
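For example, with the one-file-per-schema option, a single .yml file lists every entity of the schema. A minimal sketch, assuming two hypothetical entities dim_customers and fct_orders:

version: 2

models:
  - name: dim_customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
  - name: fct_orders
    description: One record per order
    columns:
      - name: order_id
        description: Primary key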
dbt keywords
There are many keywords in dbt, called properties and configs. Below is a list of the relevant keywords in the context of Hackolade Studio.
Name
Name: must be letters, digits and underscores only, and cannot start with a digit
name: string
We use the technical name if present, otherwise the business name for the object.
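For illustration, a minimal sketch with hypothetical names showing which values satisfy the naming rule:

models:
  - name: dim_customers    # valid: letters, digits, and underscores only
  # - name: 2024_orders    # invalid: starts with a digit
  # - name: dim-customers  # invalid: contains a hyphen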
Description
Description: a user-defined description, which can be used to document a model and its columns.
version: 2

models:
  - name: dim_customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
For a multiline description, we use YAML block notation to split a longer description over multiple lines:
version: 2

models:
  - name: dim_customers
    description: >
      One record per customer. Note that a customer must have made a purchase to
      be included in this table — customer accounts that were created but never
      used have been filtered out.
    columns:
      - name: customer_id
        description: Primary key
Markdown in a description is possible too, but it requires quoting the description to ensure that the YAML parser doesn't get confused by special characters.
version: 2

models:
  - name: dim_customers
    description: "**[Read more](https://www.google.com/)**"
    columns:
      - name: customer_id
        description: Primary key.
Columns
Columns: can define sub-properties, including name, description, data_type, and constraints.
Data type
data_type: dbt supports scalar and complex semi-structured data types:
- numeric: integer, decimal, float
- string: string, char, varchar, text, character
- date: date, datetime, time, timestamp
- boolean
- semi-structured: JSON and array
models:
  - name: dim_customers
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string
      - name: non_integer
        data_type: numeric(38,3)
Constraints (TBA)
Constraints may be defined for a single column, or at the model level for one or more columns.
If you are defining multiple primary_key constraints for a single model (entity), those must be defined at the model level. Defining multiple primary_key constraints at the column level is not supported.
The structure of a constraint is:
- type (required): one of not_null, unique, primary_key, foreign_key, check, custom
- expression: Free text input to qualify the constraint. Required for certain constraint types, and optional for others.
- name (optional): Human-friendly name for this constraint. Supported by some data platforms.
- columns (model-level only): List of column names to apply the constraint over
models:
  - name: <model_name>

    # required
    config:
      contract:
        enforced: true

    # model-level constraints
    constraints:
      - type: primary_key
        columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
      - type: foreign_key # multi_column
        columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
        expression: "OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_FIRST_COLUMN, OTHER_MODEL_SECOND_COLUMN, ...)"
      - type: check
        columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
        expression: "FIRST_COLUMN != SECOND_COLUMN"
        name: HUMAN_FRIENDLY_NAME
      - type: ...

    columns:
      - name: FIRST_COLUMN
        data_type: DATA_TYPE

        # column-level constraints
        constraints:
          - type: not_null
          - type: unique
          - type: foreign_key
            expression: OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_COLUMN)
          - type: ...
Config (TBA)
config: allows resources to be configured at the same time as their properties, within the same YAML files.
version: 2

models:
  - name: <model_name>
    config:
      <model_config>: <config_value>
      ...
where
version: 2

models:
  - name: [<model-name>]
    config:
      database: <string>
      schema: <string>
      contract: {<dictionary>}
There are platform-specific configs for Snowflake, Databricks, BigQuery, Synapse, SQL Server, Oracle, PostgreSQL, Redshift, and Teradata. Currently, Hackolade Studio does not generate platform-specific configs.
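For reference only, since Hackolade Studio does not generate it, a Snowflake-specific config could look like the sketch below; the transient and snowflake_warehouse configs come from the dbt Snowflake adapter, and the warehouse name is a hypothetical placeholder:

version: 2

models:
  - name: dim_customers
    config:
      transient: true
      snowflake_warehouse: REPORTING_WH # hypothetical warehouse name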
Contract (TBA)
Contract: when the contract configuration is enforced, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in YAML:
- name and data_type for every column
- additional constraints, as supported for this materialization and data platform
This ensures that the people querying your model downstream, both inside and outside dbt, have a predictable and consistent set of columns to use in their analyses. dbt uses built-in type aliasing for the data_type defined in your YAML. For example, you can specify string in your contract, and on Postgres/Redshift, dbt will convert it to text. If dbt doesn't recognize the data_type name among its known aliases, it will pass it through as-is. Type aliasing is enabled by default, but you can opt out by setting alias_types to false.
models:
  - name: dim_customers
    config:
      materialized: table
      contract:
        enforced: true
        alias_types: false # true by default
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string
      - name: non_integer
        data_type: numeric(38,3)