YAML Schema Reference¶
Complete reference for entity and relation YAML definitions.
Entity Schema¶
Entities represent nodes in your graph.
Basic Structure¶
entity: string # Required: Entity name
source: string | null # Required: Data source reference
keys: list[string] # Required: Unique identifier field(s)
properties: list[Property] # Required: List of properties
description: string # Optional: Entity description
tags: list[string] # Optional: Tags for organization
metadata: dict # Optional: Custom metadata
Example¶
entity: customer
source: analytics.customers
keys: [customer_id]
description: Customer entities from the CRM system
tags: [core, customer]
properties:
  - name: customer_id
    type: string
    description: Unique customer identifier
  - name: email
    type: string
  - name: created_at
    type: datetime
Fields¶
entity (required)¶
The name of the entity. This becomes the Neo4j node label.
- Type: string
- Rules: Alphanumeric characters and underscores only, no spaces
- Example: customer, product, order_item
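The naming rule for entity can be checked with a regular expression. This is a minimal sketch, assuming the rule means ASCII letters, digits, and underscores only; is_valid_entity_name is a hypothetical helper for illustration, not part of grai.build.

```python
import re

# Assumption: "alphanumeric + underscore" means ASCII letters, digits, and "_".
ENTITY_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_entity_name(name: str) -> bool:
    """Return True if the whole name consists of letters, digits, and underscores."""
    return ENTITY_NAME_PATTERN.fullmatch(name) is not None


# Names with spaces or hyphens are rejected under this reading of the rule.
for candidate in ["customer", "order_item", "order item", "order-item"]:
    print(candidate, is_valid_entity_name(candidate))
```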
source (required)¶
Reference to the data source. Can be null for schema-only entities.
- Type: string or null
- Format: Depends on source type
  - BigQuery: project.dataset.table or dataset.table
  - SQL: schema.table
  - null: Schema-only (no data loading)
keys (required)¶
List of property names that uniquely identify the entity.
- Type: list of strings
- Rules: Each key must exist in properties
- Creates: Unique constraint in Neo4j
properties (required)¶
List of properties for the entity.
- Type: list of Property objects
- Minimum: 1 property (the key)
- See: Property Schema
description (optional)¶
Human-readable description of the entity.
- Type: string
- Usage: Displayed in documentation
tags (optional)¶
Tags for categorizing entities.
- Type: list of strings
- Usage: Filtering, organization, documentation
metadata (optional)¶
Custom metadata for your use.
- Type: dictionary
- Usage: Store any custom information
Relation Schema¶
Relations represent edges in your graph.
Basic Structure¶
relation: string # Required: Relation type name
from: string # Required: Source entity name
to: string # Required: Target entity name
source: string | null # Required: Data source reference
mappings: Mappings # Required: Key mappings
properties: list[Property] # Optional: Relation properties
description: string # Optional: Relation description
tags: list[string] # Optional: Tags
metadata: dict # Optional: Custom metadata
Example¶
relation: PURCHASED
from: customer
to: product
source: analytics.orders
description: Customer purchase transactions
tags: [transaction, core]
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
  - name: order_date
    type: datetime
  - name: quantity
    type: integer
  - name: total_amount
    type: float
Fields¶
relation (required)¶
The type of relationship. This becomes the Neo4j relationship type.
- Type: string
- Convention: UPPER_CASE with underscores
- Example: PURCHASED, FOLLOWS, WORKS_FOR
from (required)¶
The source entity for the relationship.
- Type: string
- Rules: Must match an existing entity name
- Example: customer
to (required)¶
The target entity for the relationship.
- Type: string
- Rules: Must match an existing entity name
- Can be the same as from: For self-referencing relations
source (required)¶
Reference to the data source for the relation data.
- Type: string or null
- Format: Same as entity source
mappings (required)¶
Defines how to join entities.
- Type: Mappings object
- Fields:
  - from_key: Property name in the from entity
  - to_key: Property name in the to entity
For composite keys:
properties (optional)¶
Properties of the relationship itself.
- Type: list of Property objects
- Example: order_date, quantity, amount
Property Schema¶
Properties define attributes of entities and relations.
Structure¶
name: string # Required: Property name
type: string # Required: Data type
description: string # Optional: Description
required: boolean # Optional: Is required (default: false)
indexed: boolean # Optional: Create index (default: false)
unique: boolean # Optional: Unique constraint (default: false)
default: any # Optional: Default value
Example¶
properties:
  - name: email
    type: string
    description: Customer email address
    required: true
    indexed: true
    unique: true
  - name: age
    type: integer
    required: false
    default: 0
  - name: created_at
    type: datetime
    required: true
    indexed: true
Fields¶
name (required)¶
Property name.
- Type: string
- Rules: Valid Neo4j property name
- Example: customer_id, email, created_at
type (required)¶
Data type of the property.
- Type: string
- Values:
Type | Neo4j Type | Python Type | Example |
---|---|---|---|
string | String | str | "Alice" |
integer | Integer | int | 42 |
float | Float | float | 99.99 |
boolean | Boolean | bool | true |
datetime | DateTime | datetime | 2024-01-15T10:30:00Z |
date | Date | date | 2024-01-15 |
time | Time | time | 10:30:00 |
list | List | list | ["tag1", "tag2"] |
map | Map | dict | {"key": "value"} |
- name: id
  type: string
- name: price
  type: float
- name: active
  type: boolean
- name: created_at
  type: datetime
- name: tags
  type: list
description (optional)¶
Human-readable description.
required (optional)¶
Whether the property is required.
- Type: boolean
- Default: false
- Effect: Validation only (not enforced in Neo4j)
indexed (optional)¶
Whether to create an index on this property.
- Type: boolean
- Default: false
- Effect: Creates Neo4j index for faster lookups
unique (optional)¶
Whether the property must be unique.
- Type: boolean
- Default: false
- Effect: Creates unique constraint in Neo4j
default (optional)¶
Default value if not provided.
- Type: any (matching property type)
- Usage: Data loading fallback
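To illustrate what a "data loading fallback" could look like, here is a small sketch that fills in defaults for one source row. It assumes that a missing field and a None value are both treated as "not provided"; the reference does not say how grai.build handles this, and apply_defaults is a hypothetical helper.

```python
def apply_defaults(row: dict, properties: list[dict]) -> dict:
    """Return a copy of row with absent or None values filled from property defaults."""
    filled = dict(row)
    for prop in properties:
        # Assumption: only properties that declare a default are filled in.
        if "default" in prop and filled.get(prop["name"]) is None:
            filled[prop["name"]] = prop["default"]
    return filled


# Property definitions as they would look after parsing the YAML into dicts.
properties = [
    {"name": "email", "type": "string"},
    {"name": "age", "type": "integer", "default": 0},
]

row = apply_defaults({"email": "alice@example.com"}, properties)
print(row)
```

The input row is copied rather than mutated, so the original source record stays available for logging or error reporting.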
Complete Examples¶
E-commerce Example¶
Customer Entity:
entity: customer
source: analytics.customers
keys: [customer_id]
description: Customer master data
tags: [core, pii]
properties:
  - name: customer_id
    type: string
    required: true
  - name: email
    type: string
    required: true
    indexed: true
    unique: true
  - name: name
    type: string
    required: true
  - name: region
    type: string
    indexed: true
  - name: created_at
    type: datetime
    required: true
  - name: lifetime_value
    type: float
    default: 0.0
Product Entity:
entity: product
source: catalog.products
keys: [product_id]
description: Product catalog
tags: [core, catalog]
properties:
  - name: product_id
    type: string
    required: true
  - name: name
    type: string
    required: true
    indexed: true
  - name: category
    type: string
    indexed: true
  - name: price
    type: float
    required: true
  - name: in_stock
    type: boolean
    default: true
Purchase Relation:
relation: PURCHASED
from: customer
to: product
source: analytics.orders
description: Customer purchase transactions
tags: [transaction, revenue]
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
    required: true
    indexed: true
  - name: order_date
    type: datetime
    required: true
    indexed: true
  - name: quantity
    type: integer
    required: true
    default: 1
  - name: unit_price
    type: float
    required: true
  - name: total_amount
    type: float
    required: true
  - name: status
    type: string
    default: "pending"
Social Network Example¶
User Entity:
entity: user
source: social.users
keys: [user_id]
description: Social network users
tags: [core, user]
properties:
  - name: user_id
    type: string
    required: true
  - name: username
    type: string
    required: true
    indexed: true
    unique: true
  - name: email
    type: string
    required: true
    indexed: true
    unique: true
  - name: display_name
    type: string
  - name: bio
    type: string
  - name: joined_at
    type: datetime
    required: true
  - name: verified
    type: boolean
    default: false
Follow Relation (self-referencing):
relation: FOLLOWS
from: user
to: user
source: social.follows
description: User follow relationships
tags: [social, engagement]
mappings:
  from_key: follower_id
  to_key: followee_id
properties:
  - name: followed_at
    type: datetime
    required: true
    indexed: true
  - name: notifications_enabled
    type: boolean
    default: true
Validation Rules¶
grai.build validates your YAML against these rules:
Entity Validation¶
- ✅ entity name is unique across the project
- ✅ All keys exist in properties
- ✅ Property names are unique within the entity
- ✅ Property types are valid
- ✅ If source is not null, it matches a configured source
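The entity rules above can be sketched as a single function over a parsed definition. This is an illustration only, not grai.build's implementation: validate_entity, known_sources, and seen_names are hypothetical names, and "matches a configured source" is assumed to mean an exact string match.

```python
SUPPORTED_TYPES = {
    "string", "integer", "float", "boolean",
    "datetime", "date", "time", "list", "map",
}


def validate_entity(entity: dict, known_sources: set[str], seen_names: set[str]) -> list[str]:
    """Return a list of rule violations for one parsed entity definition."""
    errors = []
    name = entity["entity"]
    if name in seen_names:
        errors.append(f"entity name '{name}' is not unique")

    properties = entity.get("properties", [])
    prop_names = [p["name"] for p in properties]
    for key in entity.get("keys", []):
        if key not in prop_names:
            errors.append(f"key '{key}' is not listed in properties")
    for dup in sorted({n for n in prop_names if prop_names.count(n) > 1}):
        errors.append(f"property '{dup}' is defined more than once")
    for prop in properties:
        if prop.get("type") not in SUPPORTED_TYPES:
            errors.append(f"property '{prop['name']}' has unsupported type '{prop.get('type')}'")

    # Assumption: a source "matches" when the exact string is configured.
    source = entity.get("source")
    if source is not None and source not in known_sources:
        errors.append(f"source '{source}' is not configured")
    return errors


# A deliberately broken definition: one key is missing and one type is invalid.
customer = {
    "entity": "customer",
    "source": "analytics.customers",
    "keys": ["customer_id", "account_id"],
    "properties": [
        {"name": "customer_id", "type": "string"},
        {"name": "email", "type": "text"},
    ],
}
entity_errors = validate_entity(customer, {"analytics.customers"}, set())
for message in entity_errors:
    print(message)
```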
Relation Validation¶
- ✅ from entity exists
- ✅ to entity exists
- ✅ from_key exists in the from entity's properties
- ✅ to_key exists in the to entity's properties
- ✅ Property names are unique within the relation
- ✅ Property types are valid
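A comparable sketch for the relation rules follows. It assumes that parsed entities are available in a dictionary keyed by entity name; validate_relation and that dictionary layout are hypothetical and only illustrate the checks listed above.

```python
SUPPORTED_TYPES = {
    "string", "integer", "float", "boolean",
    "datetime", "date", "time", "list", "map",
}


def validate_relation(relation: dict, entities: dict[str, dict]) -> list[str]:
    """Return a list of rule violations for one parsed relation definition."""
    errors = []
    mappings = relation.get("mappings", {})

    # Check both endpoints: the entity must exist and the mapped key must be one of its properties.
    for side, key_field in (("from", "from_key"), ("to", "to_key")):
        entity_name = relation[side]
        entity = entities.get(entity_name)
        if entity is None:
            errors.append(f"{side} entity '{entity_name}' does not exist")
            continue
        entity_props = {p["name"] for p in entity.get("properties", [])}
        key = mappings.get(key_field)
        if key not in entity_props:
            errors.append(f"{key_field} '{key}' is not a property of '{entity_name}'")

    properties = relation.get("properties", [])
    prop_names = [p["name"] for p in properties]
    for dup in sorted({n for n in prop_names if prop_names.count(n) > 1}):
        errors.append(f"property '{dup}' is defined more than once")
    for prop in properties:
        if prop.get("type") not in SUPPORTED_TYPES:
            errors.append(f"property '{prop['name']}' has unsupported type '{prop.get('type')}'")
    return errors


entities = {
    "customer": {"properties": [{"name": "customer_id", "type": "string"}]},
    "product": {"properties": [{"name": "product_id", "type": "string"}]},
}
purchased = {
    "relation": "PURCHASED",
    "from": "customer",
    "to": "product",
    "mappings": {"from_key": "customer_id", "to_key": "sku"},  # "sku" is not a product property
    "properties": [{"name": "quantity", "type": "integer"}],
}
relation_errors = validate_relation(purchased, entities)
for message in relation_errors:
    print(message)
```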
Property Validation¶
- ✅ name is a valid identifier
- ✅ type is one of the supported types
- ✅ default value matches type
- ✅ If unique: true, then indexed: true (implied)
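The property rules can be sketched in the same style, using the Python types from the type table. Several choices here are assumptions rather than documented behavior: Python identifier rules stand in for "valid identifier", a null default is accepted for any type, and integer defaults are accepted for float properties. validate_property is a hypothetical helper.

```python
import datetime

# Python types taken from the type table in this reference.
PYTHON_TYPES = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": bool,
    "datetime": datetime.datetime,
    "date": datetime.date,
    "time": datetime.time,
    "list": list,
    "map": dict,
}


def default_matches_type(default, type_name: str) -> bool:
    """Check a default value against the declared property type."""
    if default is None:
        return True  # Assumption: null defaults are allowed for every type.
    if isinstance(default, bool):
        return type_name == "boolean"  # bool is a subclass of int in Python.
    if type_name == "float":
        return isinstance(default, (int, float))  # Assumption: 0 is fine for a float.
    return isinstance(default, PYTHON_TYPES[type_name])


def validate_property(prop: dict) -> tuple[dict, list[str]]:
    """Return the property with implied settings applied, plus any rule violations."""
    errors = []
    name = str(prop.get("name", ""))
    if not name.isidentifier():
        errors.append(f"'{name}' is not a valid identifier")
    type_name = prop.get("type")
    if type_name not in PYTHON_TYPES:
        errors.append(f"property '{name}' has unsupported type '{type_name}'")
    elif "default" in prop and not default_matches_type(prop["default"], type_name):
        errors.append(f"default for '{name}' does not match type '{type_name}'")

    normalized = dict(prop)
    if normalized.get("unique"):
        normalized["indexed"] = True  # unique: true implies indexed: true
    return normalized, errors


email, email_errors = validate_property({"name": "email", "type": "string", "unique": True})
age, age_errors = validate_property({"name": "age", "type": "integer", "default": "zero"})
print(email, email_errors)
print(age, age_errors)
```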
Tips and Best Practices¶
Naming Conventions¶
# Entities: lowercase, singular
entity: customer
entity: product
entity: order_item
# Relations: UPPERCASE, descriptive verb/preposition
relation: PURCHASED
relation: BELONGS_TO
relation: WORKS_FOR
# Properties: lowercase, snake_case
properties:
  - name: customer_id
  - name: created_at
  - name: total_amount
Key Selection¶
# ✅ Good: Use stable, immutable keys
keys: [customer_id]
keys: [email] # If email never changes
# ❌ Bad: Don't use mutable fields as keys
keys: [name] # Names can change
keys: [status] # Status is mutable
Indexing Strategy¶
# Index fields you'll query frequently
properties:
  - name: email
    type: string
    indexed: true # For lookups
  - name: created_at
    type: datetime
    indexed: true # For range queries
  - name: internal_id
    type: string
    indexed: false # Rarely queried
Required vs Optional¶
# Mark fields required if they're critical
properties:
  - name: customer_id
    type: string
    required: true # Always needed
  - name: phone
    type: string
    required: false # Optional contact method
    default: null
See Also¶
- Getting Started - Tutorial with examples
- Command Reference - CLI commands
- Data Loading - Loading data from sources