# 🎯 Philosophy & Design Principles
## What is grai.build?

grai.build is "dbt for knowledge graphs": a schema-as-code tool for managing graph database schemas declaratively. Just as dbt transformed SQL analytics with declarative modeling, grai.build brings the same approach to graph databases.
## 🤔 The Problem We Solve

### Traditional Graph Development Problems
- Schema Drift: Graph schemas evolve organically, becoming inconsistent
- No Version Control: Hard to track what entities/relations exist
- Manual Cypher: Writing constraints and indexes by hand
- No Documentation: Graph structure lives only in developers' heads
- No CI/CD: Can't validate schema changes before deployment
### What We're NOT Solving
We are not an ETL tool. We don't:
- Extract data from source systems (use Airbyte, Fivetran, custom APIs)
- Load data in real-time (use Kafka, CDC, application code)
- Replace your data pipelines (use Airflow, Prefect, dbt)
- Manage data transformations (use dbt for that)
## 🎯 Core Philosophy

### 1. Schema, Not Data

```yaml
# grai.build defines WHAT your graph looks like
entity: customer
keys: [customer_id]
properties:
  - name: customer_id
  - name: email

# Your ETL pipeline handles HOW data gets loaded
```
Think of it like database migrations:
- Alembic/Flyway manage schema changes
- Your application manages data
- grai.build is the Alembic for graphs
### 2. Declarative, Not Imperative

```yaml
# Declarative (grai.build)
entity: customer
source: analytics.customers
keys: [customer_id]

# Not imperative
# (no "run this script to create customers")
```

You declare what you want; grai.build generates the Cypher to make it happen.
### 3. Version Control Everything

```bash
git diff entities/customer.yml
# See exactly what changed in your schema

git blame relations/purchased.yml
# Know when and why relations were added
```

Your graph schema lives in version control, just like your application code.
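A relation file follows the same pattern as an entity file. Here is a minimal sketch of what relations/purchased.yml might contain; the `relation`, `from`, and `to` field names are illustrative assumptions that mirror the entity format, not confirmed grai.build syntax:

```yaml
# relations/purchased.yml -- a sketch; `relation`, `from`, and `to` are
# assumed field names for illustration, mirroring the entity file format
relation: purchased
from: customer
to: product
properties:
  - name: purchased_at
  - name: quantity
```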
### 4. Separation of Concerns

```text
┌───────────────────────────────────────────────────┐
│ grai.build (Schema Layer)                         │
│  • Define entities/relations                      │
│  • Generate constraints/indexes                   │
│  • Validate consistency                           │
│  • Generate documentation                         │
└───────────────────────────────────────────────────┘
                      ↓ (generates Cypher)
┌───────────────────────────────────────────────────┐
│ Your ETL Pipeline (Data Layer)                    │
│  • Extract from sources (Postgres, APIs, files)   │
│  • Transform data                                 │
│  • Load into Neo4j (using generated schema)       │
│  • Scheduled via Airflow/Prefect/dbt              │
└───────────────────────────────────────────────────┘
```
### 5. CI/CD First

```yaml
# .github/workflows/graph-schema.yml
- name: Validate Graph Schema
  run: grai validate

- name: Check for Breaking Changes
  run: grai diff --fail-on-breaking

- name: Deploy Schema
  run: grai run --schema-only
```
Schema changes go through code review and CI, just like application code.
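As a fuller sketch, those steps might sit in a workflow file like the one below. The trigger, checkout/setup actions, and pip package name are assumptions; only the `grai` commands come from the snippet above:

```yaml
# .github/workflows/graph-schema.yml -- illustrative sketch
name: Graph Schema
on:
  pull_request:
    paths: ["entities/**", "relations/**"]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install grai-build   # package name is an assumption
      - run: grai validate
      - run: grai diff --fail-on-breaking
```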
## 🏗️ Architecture Principles

### Inspired by Modern Data Tools

- dbt (SQL Transformations) → grai.build (Graph Schema)
- Terraform (Infrastructure as Code) → grai.build (Schema as Code)
- Alembic (Database Migrations) → grai.build (Graph Migrations)
## 📊 Comparison to Other Tools

### vs. Neo4j Desktop / Browser
- Neo4j: Manual Cypher in a GUI
- grai.build: Declarative schema in version control
### vs. neo4j-admin import
- neo4j-admin: Bulk CSV loading tool
- grai.build: Schema management tool (use both together)
### vs. Apache AGE / TigerGraph

- Apache AGE / TigerGraph: Different graph database engines
- grai.build: Could support multiple backends (Neo4j first)
### vs. dbt
- dbt: SQL transformations in data warehouses
- grai.build: Schema definitions for graph databases
- Use together: dbt transforms relational data → grai.build defines graph schema
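Concretely, an entity's `source` can point at a table that dbt builds. A minimal sketch, where the `analytics.dim_customers` model name is an illustrative assumption:

```yaml
# entities/customer.yml -- `source` references a dbt-built table;
# the table name analytics.dim_customers is an illustrative assumption
entity: customer
source: analytics.dim_customers
keys: [customer_id]
properties:
  - name: customer_id
  - name: email
```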
## 🎯 When to Use grai.build

### ✅ Perfect Use Cases

- Microservices with Shared Graph: multiple services write to Neo4j → need a consistent schema across services → grai.build enforces the schema contract
- Analytics Graphs
- Knowledge Graphs
- CI/CD Pipelines
### ❌ Not Ideal Use Cases
- Simple Application CRUD
- One-off Data Imports
- Exploratory Analysis
## 🔄 Recommended Workflows

### Development Workflow

```bash
# 1. Define schema locally
vim entities/customer.yml

# 2. Validate
grai validate

# 3. See generated Cypher
grai build
cat target/neo4j/compiled.cypher

# 4. Test locally with sample data
grai run --schema-only
grai run --load-csv   # Quick test with CSV samples

# 5. Commit
git add entities/customer.yml
git commit -m "Add customer entity"
```
### Production Workflow

```yaml
# CI pipeline (GitHub Actions, GitLab CI, etc.)
steps:
  - grai validate
  - grai build
  - grai run --schema-only --uri $PROD_URI
```

```python
# Data pipeline (Airflow, Prefect, etc.) -- your DAG, in pseudocode:
extract_from_postgres()
transform_data()
load_to_neo4j()  # uses the schema generated by grai.build
```
### Team Workflow

```text
Developer A                Developer B
    │                          │
    ├─ Add entity              ├─ Add relation
    ├─ grai validate           ├─ grai validate
    ├─ PR → Review             ├─ PR → Review
    │                          │
    └────────────┬─────────────┘
                 │
          Merge to main
                 │
          CI validates
                 │
      Deploy schema to prod
                 │
     ETL pipeline loads data
```
## 🚀 Future Vision

### Phase 1: Schema Management (Current)
- ✅ Define entities/relations in YAML
- ✅ Generate Cypher constraints/indexes
- ✅ Validate schema consistency
- ✅ Basic visualization
### Phase 2: Integration Templates (Next)
- 🔄 Generate ETL boilerplate code
- 🔄 dbt integration (graph models)
- 🔄 Airflow operators for graph loading
- 🔄 FastAPI endpoints for graph CRUD
### Phase 3: Multi-Backend (Future)
- ⏳ Apache AGE support
- ⏳ TigerGraph support
- ⏳ Gremlin-compatible databases
- ⏳ Cross-platform schema abstraction
### Phase 4: Advanced Features (Future)
- ⏳ Schema migrations (like Alembic)
- ⏳ Breaking change detection
- ⏳ Auto-generated GraphQL APIs
- ⏳ Graph testing framework
## 💡 Key Insights

### 1. CSV Loading is for Development Only

The `--load-csv` flag exists for:
- Quick local testing
- Demos and tutorials
- Validating schema with sample data
In production, you need proper ETL pipelines.
### 2. grai.build Generates, You Execute

```bash
# grai.build generates the Cypher
grai build   # writes target/neo4j/compiled.cypher

# You decide when and how to execute it:
# Option 1: CLI
grai run --schema-only

# Option 2: In your pipeline
cat target/neo4j/compiled.cypher | cypher-shell

# Option 3: Application code (pseudocode)
driver.execute_cypher(read_file('compiled.cypher'))
```
### 3. Schema Evolution > Data Migration

Unlike relational databases, where migrations are complex (see the sketch after this list):
- Graphs are schema-flexible
- New properties/labels can be added easily
- Focus on evolution, not migration
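As a sketch of what that looks like in practice, adding a property is a one-line schema change; the `loyalty_tier` property below is an illustrative assumption:

```yaml
# entities/customer.yml -- adding a property is one new line;
# loyalty_tier is an illustrative example property
entity: customer
keys: [customer_id]
properties:
  - name: customer_id
  - name: email
  - name: loyalty_tier   # new property; existing nodes lack it until backfilled
```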
### 4. Documentation is a First-Class Output
Documentation stays in sync with code automatically.
## 🎓 Learning from dbt's Success

### What dbt Got Right
- Separation of Concerns: Analysts own transformations, engineers own pipelines
- Version Control: SQL lives in git, not in tools
- Testing Built-in: Data tests run in CI/CD
- Documentation: Auto-generated from code
- Community: Open-source, extensible
### What We're Applying
- Separation: Graph architects define schema, engineers load data
- Version Control: YAML in git, not in Neo4j Browser
- Testing: Schema validation in CI/CD
- Documentation: Auto-generated visualizations
- Community: Open-source, extensible to other graph DBs
## 🎯 Success Metrics

We know we're successful when:

- Teams can onboard faster
  - New devs understand the graph structure from YAML
  - Documentation is always up-to-date
- Schema stays consistent
  - No more "wait, does this node have this property?"
  - CI catches schema violations
- Deployment is automated
  - Schema changes deploy through CI/CD
  - No manual Cypher in production
- Knowledge is shared
  - Graph structure is documented
  - Lineage is tracked
  - Changes are reviewable
## 📚 Further Reading
- Getting Started - Quick start guide
- CLI Usage - Complete command reference
- Data Loading - ETL integration patterns
- Neo4j Setup - Local development setup
## 💬 Questions?
"Should I use grai.build if I'm just building a simple app?"
Probably not. If your app is the only thing writing to Neo4j, just use the driver directly. grai.build adds value when you have:
- Multiple services/teams sharing a graph
- Need for schema governance
- CI/CD pipelines
- Complex ETL processes
"Can grai.build replace my ETL pipeline?"
No. grai.build manages your graph schema. Your ETL pipeline manages your data. Use them together.
"How does this relate to dbt?"
Use dbt to transform data in your warehouse, then use grai.build to define the schema when loading that data into a graph. They complement each other.
"Why not just write Cypher directly?"
Same reason you use dbt instead of raw SQL:
- Version control
- Validation
- Documentation
- Consistency
- Team collaboration
Remember: grai.build is a schema management tool, not a data loading tool. Focus on defining your graph structure, and let your existing data pipelines handle the loading.