Documentation

No results
    gitHub

    dbt schema models

    dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. It helps teams work directly within the warehouse to produce trusted datasets for reporting, ML modeling, and operational workflows. 

     

    dbt performs the T (Transform) of ETL (actually ELT) but it doesn’t offer support for Extraction and Load operations. It allows companies to write transformations as queries and orchestrate them in a more efficient way. 

     

    Multiple SQL-like databases are supported, including: PostgreSQL, Oracle, Databricks, Redshift, BigQuery, Snowflake, etc..

     

    For selected targets, currently only Snowflake, it is possible to generate the models schema.yml for entities in a model.  THe feature is available in Tools > Forwad-Engineering > dbt models

     

    dbt models

    Warning: the term "model" in dbt terminology is different than for Hackolade.  In dbt <model name> is the technical name of an entity/table/collection/record in Hackolade Studio.

     

    Model properties are declared in .yml files generally in a models/ directory (or possibly elsewhere.)  These files can be named whatever_you_want.yml (for us, it will be either the schema's technical name or the entity's technical name, depending on the option chosen.)

     

    Example:

    version: 2
    models:
        - name: <model name>
          description: <markdown_string>
          columns:
            - name: <column_name> # required
              data_type: <string>
              description: <markdown_string>
              constraints:
                - <constraint>
              tags: [<string>]
            - name: ... # declare properties of additional columns
        constraints:
            - <constraint>
              config:
                <model_config>: <config_value>

     

    Users are able to choose to have one file per entity or one file per schema, in which case, multiple entities are listed inside the same model file.

     

    dbt keywords

    There are many keywords in dbt, called properties and configs.  Below is a list of the relevant keywords in the context of Hackolade Studio.

     

    Name

    Name: must be letters, digits and underscores only, and cannot start with a digit

     

    name: string

     

    We use the technical name if present, otherwise the business name for the object.

     

    Description

    Description: a user-defined description. Can be used to document a model, and model columns

     

    version: 2
    models:
        - name: dim_customers
          description: One record per customer
          columns:
            - name: customer_id
              description: Primary key

     

    For multiline description: we use YAML block notation to split a longer description over multiple lines

     


    version: 2
    models:
      - name: dim_customers
        description: >
          One record per customer. Note that a customer must have made a purchase to
          be included in this <Term id="table" /> — customer accounts that were created but never
          used have been filtered out.
        columns:
          - name: customer_id
            description: Primary key

     

    Markdown in description is possible too, but requires to quote description to ensure that the YAML parser doesn't get confused by special characters.

     

    version: 2
    models:
        - name: dim_customers
          description: "**[Read more](https://www.google.com/)\*\*"\       columns:
            - name: customer_id
            description: Primary key.

     

    Columns

    Columns: can define sub-properties, including name, description, data_type, constraints.

     

     

    Data type

    data_type: dbt supports scalar and complex semi-structured data types:

     

    - numeric: integer, decimal, float

    - string: string, char, varchar, text, character

    - date: date, datetime, time, timestamp

    - Boolean

    - semi-structured: JSON, and array

     

    models:
        - name: dim_customers
          columns:
            - name: customer_id
              data_type: int
              constraints:
                - type: not_null
            - name: customer_name
              data_type: string
            - name: non_integer
              data_type: numeric(38,3)

     

    Constraints (TBA)

    Constraints: Constraints may be defined for a single column, or at the model level for one or more columns.

    If you are defining multiple primary_key constraints for a single model (entity), those must be defined at the model level. Defining multiple primary_key constraints at the column level is not supported.  

     

    The structure of a constraint is:

    - type (required): one of not_null, unique, primary_key, foreign_key, check, custom

    - expression: Free text input to qualify the constraint. Required for certain constraint types, and optional for others.

    - name (optional): Human-friendly name for this constraint. Supported by some data platforms.

    - columns (model-level only): List of column names to apply the constraint over

     

    models:
        - name: <model_name>
          # required
          config
            contract:
            enforced: true
          # model-level constraints
          constraints:
            - type: primary_key
              columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
            - type: FOREIGN_KEY # multi_column
              columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
              expression: "OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_FIRST_COLUMN,OTHER_MODEL_SECOND_COLUMN, ...)"
            - type: check
              columns: [FIRST_COLUMN, SECOND_COLUMN, ...]
              expression: "FIRST_COLUMN != SECOND_COLUMN"
              name: HUMAN_FRIENDLY_NAME
            - type: ...


          columns:
            - name: FIRST_COLUMN
              data_type: DATA_TYPE

        # column-level constraints
        constraints:
            - type: not_null
            - type: unique
            - type: foreign_key
              expression: OTHER_MODEL_SCHEMA.OTHER_MODEL_NAME (OTHER_MODEL_COLUMN)
            - type: ...

     

    Config (TBA)

    config: allows to configure resources at the same time as properties in YAML files.

     

    version: 2
    models:
        - name: <model_name>
          config:
            <model_config>: <config_value>
            ...

     

    where

    version: 2
    models:
        - name: [<model-name>]
        config:
            database: <string>
            schema: <string>
            contract: {<dictionary>}

     

    There platform-specific configs for Snowflake, Databricks, BigQuery, Synapse, SQL Server, Oracle, PostgreSQL, Redshift, and Teradata.  Currently, Hackolade Studio does not generate platform-specific configs.

     

    Contract (TBA)

    Contract: when the contract configuration is enforced, dbt will ensure that your model's returned dataset exactly matches the attributes you have defined in yaml:

     

    - name and data_type for every column

    - additional constraints, as supported for this materialization and data platform

     

    This is to ensure that the people querying your model downstream—both inside and outside dbt—have a predictable and consistent set of columns to use in their analyses. dbt uses built-in type aliasing for the data_type defined in your YAML. For example, you can specify string in your contract, and on Postgres/Redshift, dbt will convert it to text. If dbt doesn't recognize the data_type name among its known aliases, it will pass it through as-is. This is enabled by default, but you can opt-out by setting alias_types to false.

     

    models:
        - name: dim_customers
          config:
            materialized: table
          contract:
            enforced: true
            alias_types: false # true by default
          columns:
            - name: customer_id
              data_type: int
              constraints:
                - type: not_null
            - name: customer_name
              data_type: string
            - name: non_integer
              data_type: numeric(38,3)