dbt Integration¶
Overview¶
grai.build seamlessly integrates with dbt (data build tool) to import your existing data models as graph entities and relations. This enables a powerful workflow where you can:
- Define your data transformations in dbt
- Import dbt models as grai.build entities
- Load data from your warehouse into Neo4j
- Visualize relationships in your knowledge graph
This integration respects dbt's conventions while extending them into the graph database world.
Prerequisites¶
- A dbt project with compiled models (`dbt compile` or `dbt run`)
- The `manifest.json` file (found in `target/manifest.json`)
- grai.build installed (`pip install grai-build`)
Quick Start¶
1. Compile Your dbt Project¶
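From the root of your dbt project, compile (or run) it so dbt writes out the manifest; `my-dbt-project` is a placeholder directory name:

```bash
# Compile the dbt project to produce target/manifest.json
cd my-dbt-project
dbt compile   # or: dbt run
```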
This generates `target/manifest.json`, containing all model metadata.
2. Import dbt Models¶
# Initialize grai.build project
grai init my-graph-project
cd my-graph-project
# Import all dbt models
grai import dbt --manifest ../my-dbt-project/target/manifest.json
This creates entity YAML files in the `entities/` directory, one per dbt model.
3. Validate and Build¶
# Validate the imported entities
grai validate
# Compile to Cypher
grai build
# Create schema in Neo4j
grai run
Import Options¶
Filter by Pattern¶
Import only specific models using include/exclude patterns:
# Import only fact and dimension tables
grai import dbt --manifest target/manifest.json --include "fct_,dim_"
# Exclude staging models
grai import dbt --manifest target/manifest.json --exclude "stg_"
# Combine both
grai import dbt \
--manifest target/manifest.json \
--include "fct_,dim_" \
--exclude "test_"
Custom Output Directory¶
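By default, imported entities are written to `entities/`. A minimal sketch using the `--output` flag (the same flag used in the workflow example below); the directory name is illustrative:

```bash
# Write generated entity files to a custom directory
grai import dbt --manifest target/manifest.json --output graph_entities/
```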
Overwrite Existing Files¶
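If entity files already exist from a previous import, pass `--force` to overwrite them, as the sync workflow later on this page does:

```bash
# Overwrite previously generated entity files
grai import dbt --manifest target/manifest.json --force
```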
How Models Are Converted¶
Basic Entity Conversion¶
dbt models become grai.build entities with full metadata preservation:
dbt model (`models/customers.sql`) with its `schema.yml` definition:
version: 2
models:
- name: customers
description: Customer master table
columns:
- name: customer_id
description: Unique customer identifier
data_type: string
tests:
- unique
- not_null
- name: email
description: Customer email address
data_type: string
- name: created_at
description: Account creation timestamp
data_type: timestamp
Generated entity (`entities/customers.yml`):
entity: customers
source:
name: analytics.customers
type: table
database: my-project
db_schema: analytics
metadata:
dbt_model: customers
dbt_unique_id: model.my_project.customers
dbt_package: my_project
dbt_path: models/customers.sql
materialization: table
keys:
- customer_id
properties:
- name: customer_id
type: string
description: Unique customer identifier
- name: email
type: string
description: Customer email address
- name: created_at
type: datetime
description: Account creation timestamp
description: Customer master table
Key Inference¶
grai.build automatically infers entity keys from dbt tests:
# dbt model with tests
columns:
- name: customer_id
tests:
- unique # ← Inferred as key
- not_null
# Becomes in grai.build:
keys:
- customer_id
Supported test patterns:
- `unique`
- `not_null` + `unique` combination
- `primary_key` (from dbt-utils)
Fallback: If no unique tests are found, the first column is used as a key.
Type Mapping¶
dbt data types are automatically mapped to grai.build types:
| dbt Type | grai.build Type |
|---|---|
| `string`, `varchar`, `text` | `string` |
| `integer`, `bigint`, `int` | `integer` |
| `float`, `double`, `numeric` | `float` |
| `boolean`, `bool` | `boolean` |
| `date` | `date` |
| `timestamp`, `datetime` | `datetime` |
| `json`, `jsonb`, `variant` | `json` |
Relationship Detection¶
Automatic Detection¶
grai.build can automatically detect relationship tables (junction tables) by analyzing dbt's `relationships` tests.
dbt relationship table (`models/order_items.sql`):
models:
- name: order_items
description: Line items for orders
columns:
- name: order_id
tests:
- relationships:
to: ref('orders')
field: order_id
- name: product_id
tests:
- relationships:
to: ref('products')
field: product_id
- name: quantity
data_type: integer
- name: price
data_type: float
When a model has two or more foreign-key relationships, it is treated as a relation.
Generated relation (`relations/order_items.yml`):
relation: ORDER_ITEMS
from: orders
to: products
source:
name: analytics.order_items
type: table
database: my-project
db_schema: analytics
metadata:
dbt_model: order_items
is_relationship_table: true
mappings:
from_key: order_id
to_key: product_id
properties:
- name: quantity
type: integer
- name: price
type: float
description: Line items for orders
Explicit Relation Configuration¶
For more control, use the `meta` field in your dbt models:
models:
- name: order_items
description: Line items for orders
meta:
grai:
type: relation
relation_name: CONTAINS
from_entity: order
from_key: order_id
to_entity: product
to_key: product_id
columns:
- name: order_id
data_type: string
- name: product_id
data_type: string
- name: quantity
data_type: integer
This gives you full control over:
- Relation name (`CONTAINS` instead of `ORDER_ITEMS`)
- Entity names (singular vs. plural)
- Key mappings
Complete Workflow Example¶
1. dbt Project Structure¶
my-dbt-project/
├── models/
│ ├── staging/
│ │ ├── stg_customers.sql
│ │ └── stg_products.sql
│ ├── marts/
│ │ ├── dim_customers.sql
│ │ ├── dim_products.sql
│ │ └── fct_orders.sql
│ └── schema.yml
└── target/
└── manifest.json
2. Import to grai.build¶
# Initialize grai project
grai init my-knowledge-graph
cd my-knowledge-graph
# Import only mart models (exclude staging)
grai import dbt \
--manifest ../my-dbt-project/target/manifest.json \
--include "dim_,fct_" \
--output entities/
Result: one entity YAML per imported mart model in `entities/` (e.g. `dim_customers.yml`, `dim_products.yml`, `fct_orders.yml`).
3. Add Custom Relations¶
Create `relations/purchased.yml`:
relation: PURCHASED
from: dim_customers
to: dim_products
source: analytics.fct_orders
mappings:
from_key: customer_id
to_key: product_id
properties:
- name: order_id
type: string
- name: order_date
type: date
- name: amount
type: float
4. Configure Connection Profile¶
Edit `~/.grai/profiles.yml`:
default:
target: dev
outputs:
dev:
warehouse:
type: bigquery
method: oauth
project: my-project
dataset: analytics
graph:
type: neo4j
uri: bolt://localhost:7687
user: neo4j
password: "{{ env_var('NEO4J_PASSWORD') }}"
5. Build and Load¶
# Validate definitions
grai validate
# Create schema in Neo4j
grai build
grai run
# Load data from BigQuery to Neo4j
grai load dim_customers
grai load dim_products
grai load PURCHASED
6. Generate Documentation¶
Opens at http://localhost:8080 with a full entity/relation catalog and graph visualization.
Advanced Patterns¶
Handling Large dbt Projects¶
For large projects with 100+ models:
# Import incrementally by layer
grai import dbt --manifest target/manifest.json --include "dim_"
grai import dbt --manifest target/manifest.json --include "fct_" --force
grai import dbt --manifest target/manifest.json --include "bridge_" --force
Selective Model Import¶
Import only models tagged in dbt:
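One common way to do this (standard dbt `tags` config; the `graph` tag matches the selector used in the next snippet, and the model name is illustrative):

```yaml
# models/schema.yml
models:
  - name: dim_customers
    config:
      tags: ["graph"]
```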
Then filter the manifest before import:
# Use dbt's selector syntax
dbt compile --select tag:graph
grai import dbt --manifest target/manifest.json
Multi-Environment Setup¶
Use profiles for different environments:
# ~/.grai/profiles.yml
default:
target: dev
outputs:
dev:
warehouse:
type: bigquery
project: my-project-dev
dataset: analytics_dev
graph:
uri: bolt://localhost:7687
prod:
warehouse:
type: bigquery
project: my-project-prod
dataset: analytics
graph:
uri: bolt://prod.neo4j.example.com:7687
# Import from dev
grai import dbt --manifest target/manifest.json
# Load to production
grai load dim_customers --target prod
Best Practices¶
1. Use Descriptive Model Names¶
# ✅ Good - Clear entity names
models:
- name: customer
- name: product
- name: order
# ❌ Avoid - Prefixes make poor entity names
models:
- name: stg_customers # Becomes entity: stg_customers
- name: int_products
Solution: Use `--exclude` to filter staging models, or rename entities in the YAML after import.
2. Document Your Models¶
# dbt schema.yml
models:
- name: customers
description: Customer master table with enriched attributes
columns:
- name: customer_id
description: Unique customer identifier from source system
Descriptions are preserved in grai.build entities and appear in generated documentation.
3. Define Unique Tests¶
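Add `unique` (and ideally `not_null`) tests to key columns in `schema.yml`, mirroring the key-inference example above; model and column names are illustrative:

```yaml
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
```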
This ensures grai.build can infer proper entity keys.
4. Use Meta for Relationships¶
For junction tables, explicitly configure relationships:
models:
- name: customer_purchases
meta:
grai:
type: relation
relation_name: PURCHASED
from_entity: customer
from_key: customer_id
to_entity: product
to_key: product_id
5. Keep dbt and grai.build in Sync¶
After updating dbt models:
# Recompile dbt
cd my-dbt-project
dbt compile
# Re-import with --force
cd my-graph-project
grai import dbt --manifest ../my-dbt-project/target/manifest.json --force
# Check for schema changes
grai migrate create "Update from dbt changes" --compare-to HEAD~1
Troubleshooting¶
No Entity Keys Generated¶
Problem: Entity has an empty `keys: []` list.
Cause: No unique tests found in dbt model.
Solution: Add a `unique` test to the key column:
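For example (column name illustrative):

```yaml
columns:
  - name: customer_id
    tests:
      - unique
      - not_null
```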
Or manually edit the generated YAML:
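A minimal sketch of the manual fix, following the generated-entity format shown earlier (entity and key names illustrative):

```yaml
# entities/customers.yml
entity: customers
keys:
  - customer_id
```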
Wrong Columns Detected as Foreign Keys¶
Problem: Non-FK columns detected as relationships.
Cause: Column has a `relationships` test but isn't actually a foreign key.
Solution: Remove the test or use explicit meta configuration:
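For example, pin the relation's endpoints explicitly via `meta.grai` (as in the explicit configuration section above) so only the intended foreign keys are used; names here are illustrative:

```yaml
models:
  - name: order_items
    meta:
      grai:
        type: relation
        relation_name: CONTAINS
        from_entity: order
        from_key: order_id
        to_entity: product
        to_key: product_id
```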
Type Mapping Issues¶
Problem: Unknown data type in dbt model.
Cause: Custom or platform-specific data type.
Solution: The parser defaults to `string`. Edit the generated YAML:
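For example, correct the property's type by hand in the generated entity file (property name and type are illustrative):

```yaml
# entities/customers.yml
properties:
  - name: account_balance
    type: float   # was imported as string
```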
Import Takes Too Long¶
Problem: Large manifest with 500+ models.
Cause: Processing all models at once.
Solution: Use filters to import incrementally:
# Import by prefix
for prefix in dim fct bridge; do
grai import dbt --manifest target/manifest.json --include "${prefix}_" --force
done
Manifest Not Found¶
Problem: Error: Manifest file not found
Cause: dbt project not compiled.
Solution:
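Compile the dbt project first so the manifest is generated, then point `--manifest` at it:

```bash
cd my-dbt-project
dbt compile
ls target/manifest.json   # confirm the file exists
```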
Comparison with dbt¶
| Feature | dbt | grai.build |
|---|---|---|
| Purpose | SQL transformations | Graph schema & loading |
| Input | Raw data | Transformed data (from dbt) |
| Output | Tables/views | Graph nodes & edges |
| Testing | Data quality | Schema validation |
| Docs | Table lineage | Graph visualization |
| Orchestration | Via Airflow/Prefect | Via Airflow/Prefect |
Together: dbt transforms → grai.build loads → Neo4j stores → Graph insights
Integration Architecture¶
┌─────────────┐
│ Raw Data │
│ (Sources) │
└──────┬──────┘
│
▼
┌─────────────┐
│ dbt │ ← Transform & model data
│ (SQL Models)│
└──────┬──────┘
│
│ manifest.json
▼
┌─────────────┐
│ grai.build │ ← Import models as entities
│ (Import) │
└──────┬──────┘
│
│ Entity/Relation YAMLs
▼
┌─────────────┐
│ grai.build │ ← Load data into graph
│ (Load) │
└──────┬──────┘
│
▼
┌─────────────┐
│ Neo4j │ ← Query & visualize
│ (Graph) │
└─────────────┘
Next Steps¶
- Profiles Configuration - Set up warehouse and graph connections
- Data Loading - Load data from BigQuery/Snowflake to Neo4j
- CLI Reference - Full command documentation
- Source Configuration - Advanced source configuration options
Examples¶
Check out example dbt integrations:
# Clone examples repository
git clone https://github.com/grai-build/examples
cd examples/dbt-integration
# Follow README for complete workflow
FAQ¶
Can I use grai.build without dbt?¶
Yes! The dbt integration is optional. You can define entities and relations in YAML from scratch.
Does this replace dbt?¶
No. dbt handles SQL transformations; grai.build handles graph schema and loading. They complement each other.
Can I import dbt sources (not just models)?¶
Currently, only models are imported. Source import is planned for a future release. Track progress in GitHub Issues.
How do I handle dbt snapshots?¶
Snapshots are treated like regular models. They'll be imported as entities with SCD columns preserved.
Can I import from dbt Cloud?¶
Yes, if you can access the `manifest.json` file. Download it from your dbt Cloud project's artifacts.
What about dbt exposures and metrics?¶
These are not currently imported but could be added in future versions. Open an issue if you need this feature!
Support¶
- Documentation: https://grai.build
- GitHub: https://github.com/grai-build/grai.build
- Issues: Report a bug or request a feature