Skip to content

dbt Integration

Overview

grai.build seamlessly integrates with dbt (data build tool) to import your existing data models as graph entities and relations. This enables a powerful workflow where you can:

  1. Define your data transformations in dbt
  2. Import dbt models as grai.build entities
  3. Load data from your warehouse into Neo4j
  4. Visualize relationships in your knowledge graph

This integration respects dbt's conventions while extending them into the graph database world.

Prerequisites

  • A dbt project with compiled models (dbt compile or dbt run)
  • The manifest.json file (found in target/manifest.json)
  • grai.build installed (pip install grai-build)

Quick Start

1. Compile Your dbt Project

cd my-dbt-project
dbt compile  # or dbt run

This generates target/manifest.json containing all model metadata.

2. Import dbt Models

# Initialize grai.build project
grai init my-graph-project
cd my-graph-project

# Import all dbt models
grai import dbt --manifest ../my-dbt-project/target/manifest.json

This creates entity YAML files in entities/ directory, one per dbt model.

3. Validate and Build

# Validate the imported entities
grai validate

# Compile to Cypher
grai build

# Create schema in Neo4j
grai run

Import Options

Filter by Pattern

Import only specific models using include/exclude patterns:

# Import only fact and dimension tables
grai import dbt --manifest target/manifest.json --include "fct_,dim_"

# Exclude staging models
grai import dbt --manifest target/manifest.json --exclude "stg_"

# Combine both
grai import dbt \
  --manifest target/manifest.json \
  --include "fct_,dim_" \
  --exclude "test_"

Custom Output Directory

# Write to custom directory
grai import dbt --manifest target/manifest.json --output my-entities/

Overwrite Existing Files

# Force overwrite existing entity files
grai import dbt --manifest target/manifest.json --force

How Models Are Converted

Basic Entity Conversion

dbt models become grai.build entities with full metadata preservation:

dbt model (models/customers.sql):

version: 2

models:
  - name: customers
    description: Customer master table
    columns:
      - name: customer_id
        description: Unique customer identifier
        data_type: string
        tests:
          - unique
          - not_null

      - name: email
        description: Customer email address
        data_type: string

      - name: created_at
        description: Account creation timestamp
        data_type: timestamp

Generated entity (entities/customers.yml):

entity: customers
source:
  name: analytics.customers
  type: table
  database: my-project
  db_schema: analytics
  metadata:
    dbt_model: customers
    dbt_unique_id: model.my_project.customers
    dbt_package: my_project
    dbt_path: models/customers.sql
    materialization: table
keys:
  - customer_id
properties:
  - name: customer_id
    type: string
    description: Unique customer identifier
  - name: email
    type: string
    description: Customer email address
  - name: created_at
    type: datetime
    description: Account creation timestamp
description: Customer master table

Key Inference

grai.build automatically infers entity keys from dbt tests:

# dbt model with tests
columns:
  - name: customer_id
    tests:
      - unique # ← Inferred as key
      - not_null

# Becomes in grai.build:
keys:
  - customer_id

Supported test patterns:

  • unique
  • not_null + unique combination
  • primary_key (from dbt-utils)

Fallback: If no unique tests are found, the first column is used as a key.

Type Mapping

dbt data types are automatically mapped to grai.build types:

dbt Type grai.build Type
string, varchar, text string
integer, bigint, int integer
float, double, numeric float
boolean, bool boolean
date date
timestamp, datetime datetime
json, jsonb, variant json

Relationship Detection

Automatic Detection

grai.build can automatically detect relationship tables (junction tables) by analyzing dbt's relationships tests:

dbt relationship table (models/order_items.sql):

models:
  - name: order_items
    description: Line items for orders
    columns:
      - name: order_id
        tests:
          - relationships:
              to: ref('orders')
              field: order_id

      - name: product_id
        tests:
          - relationships:
              to: ref('products')
              field: product_id

      - name: quantity
        data_type: integer

      - name: price
        data_type: float

When a model has 2 or more foreign key relationships, it's treated as a relation:

Generated relation (relations/order_items.yml):

relation: ORDER_ITEMS
from: orders
to: products
source:
  name: analytics.order_items
  type: table
  database: my-project
  db_schema: analytics
  metadata:
    dbt_model: order_items
    is_relationship_table: true
mappings:
  from_key: order_id
  to_key: product_id
properties:
  - name: quantity
    type: integer
  - name: price
    type: float
description: Line items for orders

Explicit Relation Configuration

For more control, use the meta field in your dbt models:

models:
  - name: order_items
    description: Line items for orders
    meta:
      grai:
        type: relation
        relation_name: CONTAINS
        from_entity: order
        from_key: order_id
        to_entity: product
        to_key: product_id
    columns:
      - name: order_id
        data_type: string
      - name: product_id
        data_type: string
      - name: quantity
        data_type: integer

This gives you full control over:

  • Relation name (CONTAINS instead of ORDER_ITEMS)
  • Entity names (singular vs plural)
  • Key mappings

Complete Workflow Example

1. dbt Project Structure

my-dbt-project/
├── models/
│   ├── staging/
│   │   ├── stg_customers.sql
│   │   └── stg_products.sql
│   ├── marts/
│   │   ├── dim_customers.sql
│   │   ├── dim_products.sql
│   │   └── fct_orders.sql
│   └── schema.yml
└── target/
    └── manifest.json

2. Import to grai.build

# Initialize grai project
grai init my-knowledge-graph
cd my-knowledge-graph

# Import only mart models (exclude staging)
grai import dbt \
  --manifest ../my-dbt-project/target/manifest.json \
  --include "dim_,fct_" \
  --output entities/

Result:

entities/
├── dim_customers.yml
├── dim_products.yml
└── fct_orders.yml

3. Add Custom Relations

Create relations/purchased.yml:

relation: PURCHASED
from: dim_customers
to: dim_products
source: analytics.fct_orders
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
  - name: order_date
    type: date
  - name: amount
    type: float

4. Configure Connection Profile

Edit ~/.grai/profiles.yml:

default:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth
        project: my-project
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}"

5. Build and Load

# Validate definitions
grai validate

# Create schema in Neo4j
grai build
grai run

# Load data from BigQuery to Neo4j
grai load dim_customers
grai load dim_products
grai load PURCHASED

6. Generate Documentation

# Generate interactive docs
grai docs --serve

Opens at http://localhost:8080 with full entity/relation catalog and graph visualization.

Advanced Patterns

Handling Large dbt Projects

For large projects with 100+ models:

# Import incrementally by layer
grai import dbt --manifest target/manifest.json --include "dim_"
grai import dbt --manifest target/manifest.json --include "fct_" --force
grai import dbt --manifest target/manifest.json --include "bridge_" --force

Selective Model Import

Import only models tagged in dbt:

# dbt schema.yml
models:
  - name: customers
    meta:
      grai:
        import: true # Mark for import

Then filter the manifest before import:

# Use dbt's selector syntax
dbt compile --select tag:graph
grai import dbt --manifest target/manifest.json

Multi-Environment Setup

Use profiles for different environments:

# ~/.grai/profiles.yml
default:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        project: my-project-dev
        dataset: analytics_dev
      graph:
        uri: bolt://localhost:7687

    prod:
      warehouse:
        type: bigquery
        project: my-project-prod
        dataset: analytics
      graph:
        uri: bolt://prod.neo4j.example.com:7687
# Import from dev
grai import dbt --manifest target/manifest.json

# Load to production
grai load dim_customers --target prod

Best Practices

1. Use Descriptive Model Names

# ✅ Good - Clear entity names
models:
  - name: customer
  - name: product
  - name: order

# ❌ Avoid - Prefixes make poor entity names
models:
  - name: stg_customers  # Becomes entity: stg_customers
  - name: int_products

Solution: Use --exclude to filter staging models, or rename in YAML after import.

2. Document Your Models

# dbt schema.yml
models:
  - name: customers
    description: Customer master table with enriched attributes
    columns:
      - name: customer_id
        description: Unique customer identifier from source system

Descriptions are preserved in grai.build entities and appear in generated documentation.

3. Define Unique Tests

columns:
  - name: customer_id
    tests:
      - unique
      - not_null

This ensures grai.build can infer proper entity keys.

4. Use Meta for Relationships

For junction tables, explicitly configure relationships:

models:
  - name: customer_purchases
    meta:
      grai:
        type: relation
        relation_name: PURCHASED
        from_entity: customer
        from_key: customer_id
        to_entity: product
        to_key: product_id

5. Keep dbt and grai.build in Sync

After updating dbt models:

# Recompile dbt
cd my-dbt-project
dbt compile

# Re-import with --force
cd my-graph-project
grai import dbt --manifest ../my-dbt-project/target/manifest.json --force

# Check for schema changes
grai migrate create "Update from dbt changes" --compare-to HEAD~1

Troubleshooting

No Entity Keys Generated

Problem: Entity has empty keys: []

Cause: No unique tests found in dbt model.

Solution: Add unique test to key column:

columns:
  - name: id
    tests:
      - unique

Or manually edit the generated YAML:

entity: customer
keys:
  - customer_id # Add manually

Wrong Columns Detected as Foreign Keys

Problem: Non-FK columns detected as relationships.

Cause: Column has relationships test but isn't actually a foreign key.

Solution: Remove the test or use explicit meta configuration:

meta:
  grai:
    type: entity # Force entity, not relation

Type Mapping Issues

Problem: Unknown data type in dbt model.

Cause: Custom or platform-specific data type.

Solution: The parser defaults to string. Edit the generated YAML:

properties:
  - name: custom_column
    type: json # Change from default string

Import Takes Too Long

Problem: Large manifest with 500+ models.

Cause: Processing all models at once.

Solution: Use filters to import incrementally:

# Import by prefix
for prefix in dim fct bridge; do
  grai import dbt --manifest target/manifest.json --include "${prefix}_" --force
done

Manifest Not Found

Problem: Error: Manifest file not found

Cause: dbt project not compiled.

Solution:

cd my-dbt-project
dbt compile  # or dbt run
ls target/manifest.json  # Verify exists

Comparison with dbt

Feature dbt grai.build
Purpose SQL transformations Graph schema & loading
Input Raw data Transformed data (from dbt)
Output Tables/views Graph nodes & edges
Testing Data quality Schema validation
Docs Table lineage Graph visualization
Orchestration Via Airflow/Prefect Via Airflow/Prefect

Together: dbt transforms → grai.build loads → Neo4j stores → Graph insights

Integration Architecture

┌─────────────┐
│   Raw Data  │
│  (Sources)  │
└──────┬──────┘
┌─────────────┐
│     dbt     │  ← Transform & model data
│ (SQL Models)│
└──────┬──────┘
       │ manifest.json
┌─────────────┐
│ grai.build  │  ← Import models as entities
│   (Import)  │
└──────┬──────┘
       │ Entity/Relation YAMLs
┌─────────────┐
│ grai.build  │  ← Load data into graph
│   (Load)    │
└──────┬──────┘
┌─────────────┐
│   Neo4j     │  ← Query & visualize
│   (Graph)   │
└─────────────┘

Next Steps

Examples

Check out example dbt integrations:

# Clone examples repository
git clone https://github.com/grai-build/examples
cd examples/dbt-integration

# Follow README for complete workflow

FAQ

Can I use grai.build without dbt?

Yes! The dbt integration is optional. You can define entities and relations in YAML from scratch.

Does this replace dbt?

No. dbt handles SQL transformations; grai.build handles graph schema and loading. They complement each other.

Can I import dbt sources (not just models)?

Currently, only models are imported. Source import is planned for a future release. Track progress in GitHub Issues.

How do I handle dbt snapshots?

Snapshots are treated like regular models. They'll be imported as entities with SCD columns preserved.

Can I import from dbt Cloud?

Yes, if you can access the manifest.json file. Download it from your dbt Cloud project's artifacts.

What about dbt exposures and metrics?

These are not currently imported but could be added in future versions. Open an issue if you need this feature!

Support