Connection Profiles¶
Similar to dbt's `profiles.yml`, grai.build uses a profiles system to manage connections to data warehouses and graph databases. This makes it easy to switch between development, staging, and production environments.
Profile File Location¶
By default, profiles are stored at `~/.grai/profiles.yml`. You can override this location by setting the `GRAI_PROFILES_DIR` environment variable:
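For example (the directory path below is just a placeholder):

```shell
# Point grai.build at an alternative directory containing profiles.yml
export GRAI_PROFILES_DIR=/path/to/profiles-dir
```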
Creating Your First Profile¶
When you run `grai init`, a default `profiles.yml` file is created at `~/.grai/profiles.yml`. You can also create one manually:
```yaml
# ~/.grai/profiles.yml
default:
  target: dev
  outputs:
    dev:
      # Data warehouse configuration
      warehouse:
        type: bigquery
        method: oauth
        project: my-gcp-project
        dataset: analytics
        location: US
        timeout_seconds: 300

      # Graph database configuration
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: mypassword
        database: neo4j
        encrypted: true

    prod:
      warehouse:
        type: bigquery
        method: service-account
        project: my-prod-project
        dataset: analytics_prod
        location: US
        keyfile: /path/to/service-account.json

      graph:
        type: neo4j
        uri: bolt://prod-neo4j.example.com:7687
        user: neo4j
        password: prodpassword
        database: neo4j
        encrypted: true
```
Profile Structure¶
Each profile has:
- **Profile name** (e.g., `default`, `my_project`): Top-level key
- **target**: The default environment to use (e.g., `dev`, `prod`)
- **outputs**: Named configurations for different environments
    - **warehouse**: Data warehouse connection (BigQuery, Snowflake, etc.)
    - **graph**: Graph database connection (Neo4j)
Using Environment Variables¶
You can reference environment variables in your profiles using Jinja-style syntax:
```yaml
default:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth
        project: "{{ env_var('GCP_PROJECT') }}"
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}"
```
Then set the environment variables:
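For the profile above, two variables are needed (the values shown are placeholders):

```shell
# Referenced by {{ env_var(...) }} in the profile above
export GCP_PROJECT=my-gcp-project
export NEO4J_PASSWORD=mysecretpassword
```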
BigQuery Configuration¶
OAuth Authentication (Development)¶
```yaml
warehouse:
  type: bigquery
  method: oauth
  project: my-project
  dataset: analytics
  location: US
  timeout_seconds: 300
```
This uses your local gcloud credentials:
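If you have not already authenticated locally, the gcloud CLI can create the application-default credentials that BigQuery clients pick up:

```shell
# Opens a browser and stores application-default credentials locally
gcloud auth application-default login
```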
Service Account (Production)¶
```yaml
warehouse:
  type: bigquery
  method: service-account
  project: my-project
  dataset: analytics
  location: US
  keyfile: /path/to/service-account.json
  timeout_seconds: 600
```
Service Account JSON (Alternative)¶
```yaml
warehouse:
  type: bigquery
  method: service-account-json
  project: my-project
  dataset: analytics
  location: US
  keyfile_json:
    type: service_account
    project_id: my-project
    private_key_id: "..."
    private_key: "..."
    client_email: "..."
    # ... rest of service account JSON
```
PostgreSQL Configuration¶
Basic Configuration¶
```yaml
warehouse:
  type: postgres
  host: localhost
  port: 5432
  database: analytics
  user: "{{ env_var('POSTGRES_USER') }}"
  password: "{{ env_var('POSTGRES_PASSWORD') }}"
  schema: public
  sslmode: prefer
```
SSL/TLS Configuration¶
```yaml
warehouse:
  type: postgres
  host: prod-postgres.example.com
  port: 5432
  database: analytics_prod
  user: grai_user
  password: "{{ env_var('POSTGRES_PASSWORD') }}"
  schema: analytics
  sslmode: require  # Options: disable, allow, prefer, require, verify-ca, verify-full
```
Amazon RDS PostgreSQL¶
```yaml
warehouse:
  type: postgres
  host: my-db.xxxxx.us-east-1.rds.amazonaws.com
  port: 5432
  database: production
  user: "{{ env_var('RDS_USER') }}"
  password: "{{ env_var('RDS_PASSWORD') }}"
  schema: public
  sslmode: require
```
Google Cloud SQL PostgreSQL¶
```yaml
warehouse:
  type: postgres
  host: /cloudsql/project:region:instance  # Unix socket path
  port: 5432
  database: analytics
  user: postgres
  password: "{{ env_var('CLOUDSQL_PASSWORD') }}"
  schema: public
  sslmode: disable  # SSL handled by Cloud SQL Proxy
```
Snowflake Configuration¶
Password Authentication¶
```yaml
warehouse:
  type: snowflake
  account: abc12345.us-east-1  # Account identifier (includes region)
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC
```
SSO Authentication (Browser-based)¶
```yaml
warehouse:
  type: snowflake
  account: abc12345.us-east-1
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  authenticator: externalbrowser  # Opens browser for SSO
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC
```
With Specific Role and Warehouse¶
```yaml
warehouse:
  type: snowflake
  account: myorg-prod.us-west-2.aws
  user: data_engineer
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  role: DATA_ENGINEER    # Specific role for permissions
  database: PROD_ANALYTICS
  warehouse: ETL_WH      # Dedicated warehouse for ETL
  schema: GRAPH_STAGING
```
Okta Authentication¶
```yaml
warehouse:
  type: snowflake
  account: abc12345.us-east-1
  user: "{{ env_var('SNOWFLAKE_USER') }}"
  password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
  authenticator: https://mycompany.okta.com  # Okta URL
  role: ANALYST
  database: ANALYTICS
  warehouse: COMPUTE_WH
  schema: PUBLIC
```
Neo4j Configuration¶
```yaml
graph:
  type: neo4j
  uri: bolt://localhost:7687
  user: neo4j
  password: "{{ env_var('NEO4J_PASSWORD') }}"
  database: neo4j  # Optional: defaults to 'neo4j'
  encrypted: true  # Optional: defaults to true
  trust: TRUST_SYSTEM_CA_SIGNED_CERTIFICATES  # Optional
```
Neo4j Aura (Cloud)¶
```yaml
graph:
  type: neo4j
  uri: neo4j+s://xxxxx.databases.neo4j.io
  user: neo4j
  password: "{{ env_var('NEO4J_AURA_PASSWORD') }}"
  database: neo4j
  encrypted: true
```
Using Profiles in Your Project¶
In grai.yml¶
Reference a profile in your project manifest:
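The exact manifest schema is not shown here; assuming grai.build follows dbt's convention of a top-level `profile:` key in the project file, a sketch might look like:

```yaml
# grai.yml (the profile key name is an assumption)
name: my_project
profile: my_project  # must match a top-level key in profiles.yml
```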
Command Line¶
Override the profile or target at runtime:
```bash
# Use default profile and target
grai load customer

# Use specific profile
grai load customer --profile my_project

# Use specific target within a profile
grai load customer --target prod

# Use both
grai load customer --profile my_project --target staging
```
Environment Variables¶
Set environment variables to override defaults:
```bash
# Override profile
export GRAI_PROFILE=my_project

# Override target
export GRAI_TARGET=prod

# Now these use the prod target
grai load customer
grai load PURCHASED
```
Multiple Projects¶
You can define multiple profiles in one file:
```yaml
# Project A
ecommerce:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        project: ecommerce-dev
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: devpass

# Project B
social_network:
  target: dev
  outputs:
    dev:
      warehouse:
        type: snowflake
        account: xyz789.us-west-2
        user: analyst
        database: SOCIAL
      graph:
        type: neo4j
        uri: bolt://localhost:7688
        user: neo4j
        password: devpass2
```
Then specify which profile to use:
```bash
cd ecommerce-project
grai load customer --profile ecommerce

cd ../social-project
grai load user --profile social_network
```
Best Practices¶
1. Use Environment Variables for Secrets¶
Never commit passwords or API keys to version control:
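Instead of hardcoding secrets, reference them with `env_var()` as shown earlier:

```yaml
# Bad: secret committed to version control
# password: mypassword

# Good: resolved from the environment at runtime
password: "{{ env_var('NEO4J_PASSWORD') }}"
```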
2. Separate Profiles by Environment¶
Use different targets for dev, staging, and prod:
```yaml
myproject:
  target: dev
  outputs:
    dev:
      # Development settings
    staging:
      # Staging settings
    prod:
      # Production settings
```
3. Document Your Profiles¶
Add comments to explain configuration choices:
```yaml
myproject:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth  # Uses local gcloud credentials
        project: myproject-dev
        dataset: analytics
        # Timeout increased for long-running transformations
        timeout_seconds: 600
```
4. Share Profile Template¶
Create a `profiles.yml.example` in your project repository:
```yaml
# profiles.yml.example
# Copy to ~/.grai/profiles.yml and fill in your credentials
myproject:
  target: dev
  outputs:
    dev:
      warehouse:
        type: bigquery
        method: oauth
        project: "{{ env_var('GCP_PROJECT') }}"  # Set this
        dataset: analytics
      graph:
        type: neo4j
        uri: bolt://localhost:7687
        user: neo4j
        password: "{{ env_var('NEO4J_PASSWORD') }}"  # Set this
```
Troubleshooting¶
Profile Not Found¶
Solution: Run `grai init` or create the file manually.
Profile Name Not Found¶
Solution: Check that the profile name in your `profiles.yml` matches the one you are passing via `--profile` or `GRAI_PROFILE`.
Environment Variable Not Set¶
Solution: Set the required environment variable:
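For example, if the missing variable is `NEO4J_PASSWORD` (the value is a placeholder):

```shell
export NEO4J_PASSWORD=mysecretpassword
```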
Target Not Found¶
Solution: Add the target to your profile or check for typos.
Example Workflows¶
Development Workflow¶
```bash
# Use local development environment
export GRAI_TARGET=dev
export NEO4J_PASSWORD=devpass

grai load customer
grai load product
grai load PURCHASED
```
Production Deployment¶
```bash
# Use production environment
export GRAI_TARGET=prod
export NEO4J_PASSWORD=$(vault read -field=password secret/neo4j/prod)
export GCP_KEYFILE_PATH=/etc/gcp/service-account.json

grai load customer --limit 10  # Test with small batch
grai load customer             # Full load
```
CI/CD Pipeline¶
```yaml
# .github/workflows/load-data.yml
name: Load Data to Neo4j

on:
  schedule:
    - cron: "0 2 * * *"  # Daily at 2 AM

jobs:
  load:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install grai.build
        run: pip install grai-build

      - name: Load data
        env:
          GRAI_TARGET: prod
          NEO4J_PASSWORD: ${{ secrets.NEO4J_PASSWORD }}
          GCP_PROJECT: ${{ secrets.GCP_PROJECT }}
        run: |
          grai load customer
          grai load product
          grai load PURCHASED
```
See Also¶
- Data Loading - Loading data from warehouses
- dbt Integration - Importing dbt models
- Neo4j Setup - Setting up Neo4j