Connection Profiles

Similar to dbt's profiles.yml, grai.build uses a profiles system to manage connections to data warehouses and graph databases. This makes it easy to switch between development, staging, and production environments.

Profile File Location

By default, profiles are stored at ~/.grai/profiles.yml. You can override this location by setting the GRAI_PROFILES_DIR environment variable:

export GRAI_PROFILES_DIR=/path/to/custom/location
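The lookup amounts to a single fallback. The Python sketch below assumes that GRAI_PROFILES_DIR names the directory holding profiles.yml rather than the file itself, the way dbt's DBT_PROFILES_DIR works. profiles_path is a hypothetical helper for illustration and is not part of grai.build.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def profiles_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: locate profiles.yml, honoring GRAI_PROFILES_DIR.

    Assumes GRAI_PROFILES_DIR names the directory that holds profiles.yml.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default ~/.grai directory.
    base = env.get("GRAI_PROFILES_DIR") or "~/.grai"
    return Path(base).expanduser() / "profiles.yml"


print(profiles_path({"GRAI_PROFILES_DIR": "/path/to/custom/location"}).as_posix())
# /path/to/custom/location/profiles.yml
```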

Creating Your First Profile

When you run grai init, a default profiles.yml file is created at ~/.grai/profiles.yml. You can also create one manually:

# ~/.grai/profiles.yml
default:
  target: dev
  outputs:
    dev:
      # Data warehouse configuration
      warehouse:
        type: bigquery
        method: oauth
        project: my-gcp-project
        dataset: analytics
        location: US
        timeout_seconds: 300

      # Graph database configuration
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: mypassword
        database: neo4j
        encrypted: true

    prod:
      warehouse:
        type: bigquery
        method: service-account
        project: my-prod-project
        dataset: analytics_prod
        location: US
        keyfile: "{{ env_var('GCP_KEYFILE_PATH') }}"

      graph:
        type: neo4j
        uri: bolt://prod-neo4j.example.com:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}"
        database: neo4j
        encrypted: true

Profile Structure

Each profile has:

  • Profile name (e.g., default, my_project): the top-level key
  • target: the output used when none is specified (e.g., dev, prod)
  • outputs: named configurations, one per environment; each output contains:
      ◦ warehouse: data warehouse connection (BigQuery, PostgreSQL, Snowflake, etc.)
      ◦ graph: graph database connection (Neo4j)
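To make the nesting concrete, here is a minimal Python sketch of how a tool could pick one output from a parsed profiles file. resolve_output and ProfileError are hypothetical names. The error messages are borrowed from the Troubleshooting section, and grai.build's own loader may work differently.

```python
from typing import Any, Optional


class ProfileError(LookupError):
    """Raised when a profile or target cannot be found."""


def resolve_output(
    profiles: dict[str, Any], profile_name: str, target: Optional[str] = None
) -> dict[str, Any]:
    """Return the output block (warehouse + graph) for a profile and target."""
    if profile_name not in profiles:
        available = ", ".join(sorted(profiles))
        raise ProfileError(
            f"Profile '{profile_name}' not found. Available profiles: {available}"
        )
    profile = profiles[profile_name]
    chosen = target or profile["target"]  # fall back to the profile's default
    outputs = profile.get("outputs") or {}
    if chosen not in outputs:
        raise ProfileError(f"Target '{chosen}' not found in profile '{profile_name}'")
    return outputs[chosen]


# A parsed profiles.yml, written as a plain dict to stay dependency-free.
profiles = {
    "default": {
        "target": "dev",
        "outputs": {
            "dev": {"warehouse": {"type": "bigquery"}, "graph": {"type": "neo4j"}},
            "prod": {"warehouse": {"type": "bigquery"}, "graph": {"type": "neo4j"}},
        },
    }
}
print(resolve_output(profiles, "default")["warehouse"]["type"])  # bigquery
```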

Using Environment Variables

You can reference environment variables in your profiles using Jinja-style syntax:

default:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth
        project: "{{ env_var('GCP_PROJECT') }}"
        dataset: analytics

      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}"

Then set the environment variables:

export GCP_PROJECT=my-dev-project
export NEO4J_PASSWORD=mypassword
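The substitution itself is simple. The sketch below uses a regular expression instead of a full Jinja renderer. It handles only the one-argument env_var('NAME') form shown here. render_env_vars is a hypothetical helper, and its error message mirrors the one listed under Troubleshooting.

```python
import os
import re
from typing import Mapping, Optional

# Matches {{ env_var('NAME') }} with single or double quotes and optional spaces.
ENV_VAR = re.compile(r"""\{\{\s*env_var\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def render_env_vars(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each env_var() reference in a profile value with its value."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise LookupError(f"Environment variable '{name}' is not set")
        return env[name]

    return ENV_VAR.sub(substitute, value)


print(render_env_vars("{{ env_var('GCP_PROJECT') }}", {"GCP_PROJECT": "my-dev-project"}))
# my-dev-project
```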

BigQuery Configuration

OAuth Authentication (Development)

warehouse:
  type: bigquery
  method: oauth
  project: my-project
  dataset: analytics
  location: US
  timeout_seconds: 300

This uses your local Application Default Credentials, which you can create with gcloud:

gcloud auth application-default login

Service Account (Production)

warehouse:
  type: bigquery
  method: service-account
  project: my-project
  dataset: analytics
  location: US
  keyfile: /path/to/service-account.json
  timeout_seconds: 600

Service Account JSON (Alternative)

warehouse:
  type: bigquery
  method: service-account-json
  project: my-project
  dataset: analytics
  location: US
  keyfile_json:
    type: service_account
    project_id: my-project
    private_key_id: "..."
    private_key: "..."
    client_email: "..."
    # ... rest of service account JSON
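The three methods differ mainly in which credential key they need. The following sketch checks a warehouse block against a table of required keys. That table is an assumption read off the examples above, not grai.build's documented schema, and missing_bigquery_keys is a hypothetical helper.

```python
from typing import Any

# Keys each method uses in the examples on this page. An assumption drawn
# from those examples, not grai.build's published schema.
REQUIRED_BY_METHOD = {
    "oauth": {"project", "dataset"},
    "service-account": {"project", "dataset", "keyfile"},
    "service-account-json": {"project", "dataset", "keyfile_json"},
}


def missing_bigquery_keys(warehouse: dict[str, Any]) -> list[str]:
    """Return the keys a BigQuery warehouse block seems to be missing."""
    method = warehouse.get("method")
    if method not in REQUIRED_BY_METHOD:
        raise ValueError(f"Unknown BigQuery method: {method!r}")
    return sorted(REQUIRED_BY_METHOD[method] - set(warehouse))


print(missing_bigquery_keys({"type": "bigquery", "method": "service-account",
                             "project": "my-project", "dataset": "analytics"}))
# ['keyfile']
```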

PostgreSQL Configuration

Basic Configuration

warehouse:
  type: postgres
  host: localhost
  port: 5432
  database: analytics
  user: "{{ env_var('POSTGRES_USER') }}"
  password: "{{ env_var('POSTGRES_PASSWORD') }}"
  schema: public
  sslmode: prefer

SSL/TLS Configuration

warehouse:
  type: postgres
  host: prod-postgres.example.com
  port: 5432
  database: analytics_prod
  user: grai_user
  password: "{{ env_var('POSTGRES_PASSWORD') }}"
  schema: analytics
  sslmode: require # Options: disable, allow, prefer, require, verify-ca, verify-full

Amazon RDS PostgreSQL

warehouse:
  type: postgres
  host: my-db.xxxxx.us-east-1.rds.amazonaws.com
  port: 5432
  database: production
  user: "{{ env_var('RDS_USER') }}"
  password: "{{ env_var('RDS_PASSWORD') }}"
  schema: public
  sslmode: require

Google Cloud SQL PostgreSQL

warehouse:
  type: postgres
  host: /cloudsql/project:region:instance # Unix socket path
  port: 5432
  database: analytics
  user: postgres
  password: "{{ env_var('CLOUDSQL_PASSWORD') }}"
  schema: public
  sslmode: disable # SSL handled by Cloud SQL Proxy

Snowflake Configuration

Password Authentication

warehouse:
  type: snowflake
  account: abc12345.us-east-1 # Account identifier (includes region)
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC

SSO Authentication (Browser-based)

warehouse:
  type: snowflake
  account: abc12345.us-east-1
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  authenticator: externalbrowser # Opens browser for SSO
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC

With Specific Role and Warehouse

warehouse:
  type: snowflake
  account: myorg-prod.us-west-2.aws
  user: data_engineer
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  role: DATA_ENGINEER # Specific role for permissions
  database: PROD_ANALYTICS
  warehouse: ETL_WH # Dedicated warehouse for ETL
  schema: GRAPH_STAGING

Okta Authentication

warehouse:
  type: snowflake
  account: abc12345.us-east-1
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  authenticator: https://mycompany.okta.com # Okta URL
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC
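The Snowflake keys above share their names with keyword arguments of snowflake.connector.connect() in snowflake-connector-python. That includes authenticator values such as externalbrowser and an Okta URL. This hypothetical sketch shows one way a block could be mapped onto those arguments; whether grai.build does exactly this is an assumption.

```python
from typing import Any

# Keyword arguments of snowflake.connector.connect() that appear above.
CONNECT_KEYS = ("account", "user", "password", "authenticator",
                "role", "database", "warehouse", "schema")


def snowflake_connect_kwargs(block: dict[str, Any]) -> dict[str, Any]:
    """Map a `type: snowflake` warehouse block to connector keyword arguments."""
    kwargs = {key: block[key] for key in CONNECT_KEYS if key in block}
    authenticator = kwargs.setdefault("authenticator", "snowflake")  # connector default
    if authenticator == "externalbrowser":
        # Browser SSO collects credentials interactively, so no password is sent.
        kwargs.pop("password", None)
    elif authenticator == "snowflake" or authenticator.startswith("https://"):
        # Password login and native Okta SSO both need a password.
        if "password" not in kwargs:
            raise ValueError(f"authenticator {authenticator!r} requires a password")
    return kwargs


kwargs = snowflake_connect_kwargs({
    "type": "snowflake", "account": "abc12345.us-east-1", "user": "analyst",
    "authenticator": "externalbrowser", "role": "ANALYST",
    "database": "ANALYTICS", "warehouse": "COMPUTE_WH", "schema": "PUBLIC",
})
print(sorted(kwargs))
# ['account', 'authenticator', 'database', 'role', 'schema', 'user', 'warehouse']
# With snowflake-connector-python installed: snowflake.connector.connect(**kwargs)
```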

Neo4j Configuration

graph:
  type: neo4j
  uri: bolt://localhost:7687
  user: neo4j
  password: "{{ env_var('NEO4J_PASSWORD') }}"
  database: neo4j # Optional: defaults to 'neo4j'
  encrypted: true # Optional: defaults to true
  trust: TRUST_SYSTEM_CA_SIGNED_CERTIFICATES # Optional

Neo4j Aura (Cloud)

graph:
  type: neo4j
  uri: neo4j+s://xxxxx.databases.neo4j.io
  user: neo4j
  password: "{{ env_var('NEO4J_AURA_PASSWORD') }}"
  database: neo4j
  encrypted: true
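The URI scheme decides how TLS is configured. With neo4j+s:// or bolt+s:// (and the +ssc variants for self-signed certificates), encryption is built into the scheme, and the official Neo4j Python driver rejects separate encrypted or trust settings. With plain bolt:// or neo4j://, encryption follows the encrypted setting, and the server must have Bolt TLS enabled. Assuming grai.build passes these keys through to that driver (not confirmed here), the hypothetical check below flags combinations worth a second look, such as the encrypted: true line in the Aura example.

```python
from typing import Any
from urllib.parse import urlsplit

SECURE_SCHEMES = {"bolt+s", "bolt+ssc", "neo4j+s", "neo4j+ssc"}  # TLS set by scheme
PLAIN_SCHEMES = {"bolt", "neo4j"}  # TLS controlled by the encrypted setting


def check_neo4j_tls(graph: dict[str, Any]) -> list[str]:
    """Return warnings about how encryption is configured for a graph block."""
    scheme = urlsplit(graph["uri"]).scheme
    if scheme not in SECURE_SCHEMES | PLAIN_SCHEMES:
        raise ValueError(f"Unsupported Neo4j URI scheme: {scheme!r}")
    issues = []
    if scheme in SECURE_SCHEMES and ("encrypted" in graph or "trust" in graph):
        issues.append(
            f"{scheme}:// already configures TLS; the Neo4j Python driver "
            "rejects 'encrypted'/'trust' with this scheme"
        )
    elif scheme in PLAIN_SCHEMES and graph.get("encrypted"):
        issues.append(f"encrypted: true over {scheme}:// needs Bolt TLS enabled on the server")
    return issues


aura = {"type": "neo4j", "uri": "neo4j+s://xxxxx.databases.neo4j.io", "encrypted": True}
print(check_neo4j_tls(aura))
```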

Using Profiles in Your Project

In grai.yml

Reference a profile in your project manifest:

name: my-project
version: 1.0.0

# Use this profile by default
profile: default

Command Line

Override the profile or target at runtime:

# Use default profile and target
grai load customer

# Use specific profile
grai load customer --profile my_project

# Use specific target within a profile
grai load customer --target prod

# Use both
grai load customer --profile my_project --target staging

Environment Variables

Set environment variables to override defaults:

# Override profile
export GRAI_PROFILE=my_project

# Override target
export GRAI_TARGET=prod

# Now these use the prod target
grai load customer
grai load PURCHASED
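This page does not spell out which setting wins when a flag and an environment variable disagree. The sketch below assumes the usual order: command-line flag, then GRAI_PROFILE or GRAI_TARGET, then the profile named in grai.yml, then the default profile. select_profile_and_target is hypothetical. A target of None means the selected profile's own target key applies.

```python
import os
from typing import Mapping, Optional


def select_profile_and_target(
    cli_profile: Optional[str] = None,
    cli_target: Optional[str] = None,
    project_profile: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, Optional[str]]:
    """Pick the profile and target, using an assumed order of precedence."""
    env = os.environ if env is None else env
    # Flag, then environment variable, then grai.yml, then "default".
    profile = cli_profile or env.get("GRAI_PROFILE") or project_profile or "default"
    # None means "use the selected profile's own target key".
    target = cli_target or env.get("GRAI_TARGET")
    return profile, target


print(select_profile_and_target(env={"GRAI_TARGET": "prod"}))
# ('default', 'prod')
print(select_profile_and_target(cli_target="staging", project_profile="my_project",
                                env={"GRAI_TARGET": "prod"}))
# ('my_project', 'staging')
```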

Multiple Projects

You can define multiple profiles in one file:

# Project A
ecommerce:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        project: ecommerce-dev
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: devpass

# Project B
social_network:
  target: dev
  outputs:
    dev:
      warehouse:
        type: snowflake
        account: xyz789.us-west-2
        user: analyst
        database: SOCIAL
      graph:
        type: neo4j
        uri: bolt://localhost:7688
        user: neo4j
        password: devpass2

Then specify which profile to use:

cd ecommerce-project
grai load customer --profile ecommerce

cd ../social-project
grai load user --profile social_network

Best Practices

1. Use Environment Variables for Secrets

Never commit passwords or API keys to version control:

# ✅ Good
password: "{{ env_var('NEO4J_PASSWORD') }}"

# ❌ Bad
password: supersecretpassword123

2. Separate Profiles by Environment

Use different targets for dev, staging, and prod:

myproject:
  target: dev
  outputs:
    dev:
      # Development settings
    staging:
      # Staging settings
    prod:
      # Production settings

3. Document Your Profiles

Add comments to explain configuration choices:

myproject:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth # Uses local gcloud credentials
        project: myproject-dev
        dataset: analytics
        # Timeout increased for long-running transformations
        timeout_seconds: 600

4. Share Profile Template

Create a profiles.yml.example in your project repository:

# profiles.yml.example
# Copy to ~/.grai/profiles.yml and fill in your credentials

myproject:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth
        project: "{{ env_var('GCP_PROJECT') }}" # Set this
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}" # Set this
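A small pre-flight check can catch unset variables before a load fails. This sketch lists every variable a template references through env_var() and reports the ones missing from the environment. missing_env_vars is a hypothetical helper, not a grai.build command.

```python
import os
import re
from typing import Mapping, Optional

# Captures NAME from env_var('NAME') or env_var("NAME").
ENV_VAR_REF = re.compile(r"""env_var\(\s*['"](\w+)['"]""")


def missing_env_vars(text: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the referenced variables that are not set, sorted and deduplicated."""
    env = os.environ if env is None else env
    referenced = sorted(set(ENV_VAR_REF.findall(text)))
    return [name for name in referenced if name not in env]


template = '''
project: "{{ env_var('GCP_PROJECT') }}"
password: "{{ env_var('NEO4J_PASSWORD') }}"
'''
print(missing_env_vars(template, {"GCP_PROJECT": "my-dev-project"}))
# ['NEO4J_PASSWORD']
```

In practice you would pass the contents of profiles.yml.example, for instance via Path("profiles.yml.example").read_text().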

Troubleshooting

Profile Not Found

Error: Profile file not found at ~/.grai/profiles.yml

Solution: Run grai init, create the file manually, or set GRAI_PROFILES_DIR if your profiles live elsewhere.

Profile Name Not Found

Error: Profile 'myproject' not found. Available profiles: default

Solution: Check that the profile name in profiles.yml matches the one requested through grai.yml, --profile, or GRAI_PROFILE.

Environment Variable Not Set

Error: Environment variable 'NEO4J_PASSWORD' is not set

Solution: Set the required environment variable:

export NEO4J_PASSWORD=yourpassword

Target Not Found

Error: Target 'prod' not found in profile 'default'

Solution: Add the target under the profile's outputs, or check the --target flag and GRAI_TARGET for typos.

Example Workflows

Development Workflow

# Use local development environment
export GRAI_TARGET=dev
export NEO4J_PASSWORD=devpass

grai load customer
grai load product
grai load PURCHASED

Production Deployment

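# Use production environment; stop at the first failing command
set -euo pipefail
export GRAI_TARGET=prod
# Assign before exporting: `export VAR=$(cmd)` would hide a failed vault read from set -e
NEO4J_PASSWORD=$(vault read -field=password secret/neo4j/prod)
export NEO4J_PASSWORD
# Read by a prod profile whose keyfile is "{{ env_var('GCP_KEYFILE_PATH') }}"
export GCP_KEYFILE_PATH=/etc/gcp/service-account.json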

grai load customer --limit 10  # Test with small batch
grai load customer            # Full load

CI/CD Pipeline

# .github/workflows/load-data.yml
name: Load Data to Neo4j

on:
  schedule:
    - cron: "0 2 * * *" # Daily at 02:00 UTC (GitHub Actions schedules use UTC)

jobs:
  load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install grai.build
        run: pip install grai-build

      - name: Load data
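        env:
          GRAI_TARGET: prod
          # The runner has no ~/.grai/profiles.yml. This assumes the repository
          # commits one under .grai/ that reads every secret via env_var(), and
          # that warehouse credentials are provisioned on the runner as well.
          GRAI_PROFILES_DIR: ${{ github.workspace }}/.grai
          NEO4J_PASSWORD: ${{ secrets.NEO4J_PASSWORD }}
          GCP_PROJECT: ${{ secrets.GCP_PROJECT }}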
        run: |
          grai load customer
          grai load product
          grai load PURCHASED

See Also