
YAML Parser Implementation

Overview

The YAML parser module (grai/core/parser/) loads entity and relation definitions, plus the grai.yml project manifest, from YAML files into Pydantic models.

Features

✅ Core Functionality

  1. Individual File Parsing
    • parse_entity_file() - Parse a single entity YAML file
    • parse_relation_file() - Parse a single relation YAML file

  2. Batch Directory Loading
    • load_entities_from_directory() - Load all entities from a directory
    • load_relations_from_directory() - Load all relations from a directory

  3. Project Loading
    • load_project_manifest() - Load the grai.yml project configuration
    • load_project() - Load a complete project with entities, relations, and config

  4. Error Handling
    • ParserError - Base exception for all parser errors
    • YAMLParseError - YAML syntax or file errors
    • ValidationParserError - Pydantic validation errors

✅ Key Features

  • Automatic File Discovery: Recursively finds .yml and .yaml files
  • Robust Validation: Uses Pydantic for type-safe validation
  • Clear Error Messages: Provides file paths and detailed error context
  • Flexible Structure: Supports custom directory names and paths
  • Property Parsing: Automatically converts property definitions to Property models
  • Mapping Support: Handles relation mappings between entities

Usage Examples

Parse Individual Files

from grai.core.parser import parse_entity_file, parse_relation_file

# Parse an entity
customer = parse_entity_file("entities/customer.yml")
print(f"Entity: {customer.entity}")
print(f"Keys: {customer.keys}")

# Parse a relation
purchased = parse_relation_file("relations/purchased.yml")
print(f"Relation: {purchased.relation}")
print(f"From: {purchased.from_entity} -> To: {purchased.to_entity}")

Load Entire Directories

from grai.core.parser import load_entities_from_directory, load_relations_from_directory

# Load all entities
entities = load_entities_from_directory("entities/")
print(f"Loaded {len(entities)} entities")

# Load all relations
relations = load_relations_from_directory("relations/")
print(f"Loaded {len(relations)} relations")

Load Complete Project

from grai.core.parser import load_project

# Load entire project
project = load_project("my-project/")
print(f"Project: {project.name} v{project.version}")
print(f"Entities: {len(project.entities)}")
print(f"Relations: {len(project.relations)}")

Project Structure

The parser expects this directory structure:

my-project/
├── grai.yml              # Project manifest (required)
├── entities/             # Entity definitions (optional)
│   ├── customer.yml
│   └── product.yml
└── relations/            # Relation definitions (optional)
    └── purchased.yml
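A minimal sketch of how the layout rules above could be checked, using only what the tree states: grai.yml must exist, while entities/ and relations/ may be absent. The helper name `resolve_layout` and its return shape are hypothetical, not part of grai.core.parser.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def resolve_layout(root: Union[str, Path]) -> Dict[str, Optional[Path]]:
    """Hypothetical helper: locate the pieces of a grai project under `root`."""
    root = Path(root)
    manifest = root / "grai.yml"
    if not manifest.is_file():
        # The manifest is the only required file in the layout.
        raise FileNotFoundError(f"Project manifest not found: {manifest}")

    def optional_dir(name: str) -> Optional[Path]:
        # Missing definition directories are allowed; report them as None.
        path = root / name
        return path if path.is_dir() else None

    return {
        "manifest": manifest,
        "entities": optional_dir("entities"),
        "relations": optional_dir("relations"),
    }
```

A project with only grai.yml and entities/ would resolve with "relations" set to None rather than failing.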

YAML Format

Entity YAML

entity: customer
source: analytics.customers
keys:
  - customer_id
properties:
  - name: customer_id
    type: string
    required: true
    description: Unique identifier
  - name: name
    type: string
description: Customer entity
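The Property and Entity models themselves are not shown in this document. As a stdlib-only sketch, the dataclasses below stand in for the Pydantic models, and the dict literal is what yaml.safe_load would return for the entity YAML above. Field names follow the YAML keys; the defaults (type "string", required False) and the helper `entity_from_dict` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Property:
    name: str
    type: str = "string"  # assumed default
    required: bool = False  # assumed default
    description: Optional[str] = None


@dataclass
class Entity:
    entity: str
    source: str
    keys: List[str]
    properties: List[Property] = field(default_factory=list)
    description: Optional[str] = None


def entity_from_dict(data: Dict[str, Any]) -> Entity:
    """Hypothetical: convert a parsed entity mapping into model objects."""
    # Each property mapping becomes a Property instance.
    properties = [Property(**p) for p in data.get("properties") or []]
    return Entity(
        entity=data["entity"],  # KeyError here if a required field is missing
        source=data["source"],
        keys=list(data["keys"]),
        properties=properties,
        description=data.get("description"),
    )


customer = entity_from_dict({
    "entity": "customer",
    "source": "analytics.customers",
    "keys": ["customer_id"],
    "properties": [
        {"name": "customer_id", "type": "string", "required": True,
         "description": "Unique identifier"},
        {"name": "name", "type": "string"},
    ],
    "description": "Customer entity",
})
print(customer.keys, [p.name for p in customer.properties])
```

Pydantic would additionally coerce and type-check each field and raise a single ValidationError listing every problem; the dataclass stand-in only fails on the first missing key.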

Relation YAML

relation: PURCHASED
from: customer
to: product
source: analytics.orders
mappings:
  from_key: customer_id
  to_key: product_id
properties:
  - name: order_id
    type: string
description: Purchase relation
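`from` is a reserved word in Python, which is presumably why the Parse Individual Files example reads `purchased.from_entity` and `purchased.to_entity` instead of attributes named after the YAML keys. The sketch below shows that renaming and turns `mappings` into a RelationMapping-like object. In the real module a Pydantic field alias probably does this, but that is an assumption; the dataclasses and `relation_from_dict` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RelationMapping:
    from_key: str
    to_key: str


@dataclass
class Relation:
    relation: str
    from_entity: str  # YAML key "from" (a Python keyword, so it cannot be an attribute name)
    to_entity: str    # YAML key "to", renamed for symmetry
    source: str
    mappings: RelationMapping
    properties: List[Dict[str, Any]] = field(default_factory=list)  # kept as raw dicts in this sketch
    description: Optional[str] = None


def relation_from_dict(data: Dict[str, Any]) -> Relation:
    """Hypothetical: map YAML keys onto Python-safe attribute names."""
    return Relation(
        relation=data["relation"],
        from_entity=data["from"],
        to_entity=data["to"],
        source=data["source"],
        mappings=RelationMapping(**data["mappings"]),
        properties=list(data.get("properties") or []),
        description=data.get("description"),
    )


purchased = relation_from_dict({
    "relation": "PURCHASED",
    "from": "customer",
    "to": "product",
    "source": "analytics.orders",
    "mappings": {"from_key": "customer_id", "to_key": "product_id"},
    "properties": [{"name": "order_id", "type": "string"}],
    "description": "Purchase relation",
})
print(f"{purchased.from_entity} -[{purchased.relation}]-> {purchased.to_entity}")
```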

Project Manifest (grai.yml)

name: my-project
version: 1.0.0
config:
  neo4j:
    uri: bolt://localhost:7687
  compiler:
    backend: neo4j
    output_dir: target/neo4j
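Per the API reference, load_project_manifest() returns the manifest as a plain Dict[str, Any], so nested settings such as compiler.output_dir are reached by walking dictionaries. A small sketch of a dotted-path lookup with a fallback; the helper `get_setting` is hypothetical, and the dict literal mirrors the grai.yml above.

```python
from typing import Any, Mapping


def get_setting(manifest: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Hypothetical: look up 'config.compiler.output_dir'-style keys in a nested dict."""
    node: Any = manifest
    for part in dotted_key.split("."):
        # Stop early if the path runs past a leaf or names a missing key.
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


manifest = {
    "name": "my-project",
    "version": "1.0.0",
    "config": {
        "neo4j": {"uri": "bolt://localhost:7687"},
        "compiler": {"backend": "neo4j", "output_dir": "target/neo4j"},
    },
}
print(get_setting(manifest, "config.compiler.output_dir"))          # target/neo4j
print(get_setting(manifest, "config.neo4j.user", default="neo4j"))  # neo4j (fallback)
```

When writing grai.yml, note that PyYAML reads an unquoted `version: 1.0` as the float 1.0, while `1.0.0` stays a string; quoting the version avoids the difference.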

Error Handling

The parser provides detailed error messages:

from grai.core.parser import ParserError, load_project

try:
    project = load_project("my-project/")
except ParserError as e:
    print(f"Error: {e}")
    if e.file_path:
        print(f"File: {e.file_path}")

Test Coverage

  • 33 tests covering all functionality
  • 87% code coverage across models and parser
  • Tests for:
    • Valid file parsing
    • Invalid YAML syntax
    • Missing required fields
    • Directory traversal
    • Error handling
    • Custom configurations

Implementation Details

File Discovery

The parser uses Path.glob() for file discovery:

  • Searches for both .yml and .yaml extensions
  • Returns sorted list of paths for consistent ordering
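A sketch of the discovery rules listed above. The `**/` glob prefix is an assumption made to match the "recursively finds" feature; a plain `*.yml` pattern passed to Path.glob() would only search the top level. The function name and the treatment of a missing directory as empty are hypothetical.

```python
from pathlib import Path
from typing import List, Union


def find_yaml_files(directory: Union[str, Path]) -> List[Path]:
    """Hypothetical: collect .yml and .yaml files under `directory`, sorted."""
    directory = Path(directory)
    if not directory.is_dir():
        # Assumption: a missing directory counts as empty (entities/ and relations/ are optional).
        return []
    # "**/" makes Path.glob() descend into subdirectories.
    found = set(directory.glob("**/*.yml")) | set(directory.glob("**/*.yaml"))
    # Sorting gives a stable load order regardless of filesystem iteration order.
    return sorted(found)
```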

Validation Flow

  1. Load YAML file with yaml.safe_load()
  2. Parse properties into Property models
  3. Parse mappings into RelationMapping models
  4. Create Entity/Relation with Pydantic validation
  5. Collect all entities/relations into Project
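The per-file part of this flow can be sketched with the YAML loader and the model constructor passed in as callables, which keeps the flow independent of any particular model. In the real module the loader would be yaml.safe_load (step 1); passing it as a parameter and the helper names are assumptions. One detail worth handling: yaml.safe_load returns None for an empty file, so the top-level type is checked before a model is built.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def load_definition(
    path: Path,
    loader: Callable[[str], Any],
    build: Callable[[Dict[str, Any]], T],
) -> T:
    """Hypothetical: read one file, parse it, and build a model from the mapping."""
    data = loader(path.read_text(encoding="utf-8"))  # step 1 (yaml.safe_load, assumed)
    if not isinstance(data, dict):
        # An empty YAML file yields None; a top-level list is not a definition either.
        raise ValueError(f"{path}: expected a mapping, got {type(data).__name__}")
    return build(data)  # steps 2-4: properties, mappings, model validation


def load_definitions(
    paths: Iterable[Path],
    loader: Callable[[str], Any],
    build: Callable[[Dict[str, Any]], T],
) -> List[T]:
    """Hypothetical: apply load_definition to each file in order; the result feeds step 5."""
    return [load_definition(p, loader, build) for p in paths]
```

With PyYAML installed, a call such as `load_definitions(paths, yaml.safe_load, build_entity)` would be the intended use, where `build_entity` is whatever converts a mapping into an Entity.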

Error Context

All errors include:

  • Human-readable message
  • File path where error occurred
  • Original exception details
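A minimal sketch of an exception hierarchy that carries the three pieces of context listed above. The class names come from this document and `file_path` from the error-handling example; the constructor signature and the `original` attribute name are assumptions.

```python
from pathlib import Path
from typing import Optional, Union


class ParserError(Exception):
    """Base class: message plus optional file path and underlying exception."""

    def __init__(
        self,
        message: str,
        file_path: Optional[Union[str, Path]] = None,
        original: Optional[BaseException] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.file_path = Path(file_path) if file_path is not None else None
        self.original = original  # hypothetical attribute name

    def __str__(self) -> str:
        parts = [self.message]
        if self.file_path is not None:
            parts.append(f"(file: {self.file_path})")
        if self.original is not None:
            parts.append(f"caused by {type(self.original).__name__}: {self.original}")
        return " ".join(parts)


class YAMLParseError(ParserError):
    """YAML syntax or file I/O problems."""


class ValidationParserError(ParserError):
    """Model validation failures."""


caught: Optional[ParserError] = None
try:
    data = {"entity": "customer"}
    try:
        source = data["source"]
    except KeyError as exc:
        # Wrap the low-level error and keep it chained via `from`.
        raise ValidationParserError("Missing required field 'source'",
                                    "entities/customer.yml", exc) from exc
except ParserError as err:  # the base class catches every parser error
    caught = err
    print(err)
```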

Next Steps

The parser is production-ready and provides the foundation for:

  1. Validator - Verify entity references are consistent
  2. Compiler - Generate Cypher from parsed models
  3. CLI - User-friendly commands for building projects
  4. Loader - Execute compiled Cypher against Neo4j
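The Validator step is not part of this module yet. As a hedged sketch of what "entity references are consistent" could mean, the check below reports relations whose from/to side names an entity that was never loaded. It works on plain names so it stays independent of the models; the function name, the triple format, and the message wording are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_dangling_references(
    entity_names: Iterable[str],
    relations: Iterable[Tuple[str, str, str]],
) -> List[str]:
    """Hypothetical: relations are (relation, from_entity, to_entity) triples."""
    known = set(entity_names)
    problems = []
    for name, from_entity, to_entity in relations:
        for role, entity in (("from", from_entity), ("to", to_entity)):
            if entity not in known:
                problems.append(f"{name}: unknown '{role}' entity '{entity}'")
    return problems


print(find_dangling_references(
    ["customer", "product"],
    [("PURCHASED", "customer", "product"), ("REVIEWED", "customer", "review")],
))
```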

Performance

  • YAML parsing with PyYAML (yaml.safe_load)
  • Minimal memory overhead
  • On-demand loading: parse individual files with parse_entity_file() or parse_relation_file() instead of loading a whole directory
  • File discovery with Path.glob() patterns

API Reference

Main Functions

| Function | Description | Returns |
|---|---|---|
| parse_entity_file(path) | Parse a single entity file | Entity |
| parse_relation_file(path) | Parse a single relation file | Relation |
| load_entities_from_directory(dir) | Load all entities in a directory | List[Entity] |
| load_relations_from_directory(dir) | Load all relations in a directory | List[Relation] |
| load_project_manifest(path) | Load grai.yml | Dict[str, Any] |
| load_project(root) | Load a complete project | Project |

Exceptions

| Exception | Description |
|---|---|
| ParserError | Base exception for all parser errors |
| YAMLParseError | YAML syntax or file I/O errors |
| ValidationParserError | Pydantic validation failures |

Status: ✅ Complete and tested
Coverage: 87% (222 statements, 28 missed)
Tests: 33 passing