Database Schema¶

Carbon Connect uses PostgreSQL 16 with the pgvector extension for combined relational storage and 768-dimensional vector similarity search. The database is managed through SQLAlchemy 2.0 (async) with Alembic for migrations.

Entity Relationship Diagram¶

erDiagram
    tenants ||--o{ users : "has many"
    tenants ||--o{ companies : "has many"
    companies ||--o{ matches : "has many"
    companies ||--o{ applications : "has many"
    grants ||--o{ matches : "has many"
    grants ||--o{ applications : "has many"
    users ||--o| notification_preferences : "has one"
    users ||--o{ notification_history : "has many"
    applications ||--o{ documents : "has many"
    matches }o--|| applications : "optional link"

    tenants {
        uuid id PK
        string name
        string slug UK
        jsonb settings
        string subscription_tier
        boolean is_active
        timestamp created_at
        timestamp updated_at
    }

    users {
        uuid id PK
        uuid tenant_id FK
        string email
        string hashed_password
        string full_name
        string role
        boolean is_active
        boolean email_verified
        timestamp last_login_at
        timestamp created_at
        timestamp updated_at
    }

    companies {
        uuid id PK
        uuid tenant_id FK
        string name
        text description
        array nace_codes
        string country
        vector description_embedding
        float scope1_emissions
        float scope2_emissions
        float scope3_emissions
        array certifications
        vector carbon_profile_embedding
        timestamp created_at
        timestamp updated_at
    }

    grants {
        uuid id PK
        string external_id
        string source
        string title
        text description
        float funding_amount_min
        float funding_amount_max
        timestamp deadline
        array countries
        array nace_codes
        boolean is_carbon_focused
        array carbon_categories
        vector title_embedding
        vector description_embedding
        string content_hash
        timestamp created_at
        timestamp updated_at
    }

    matches {
        uuid id PK
        uuid tenant_id
        uuid company_id FK
        uuid grant_id FK
        float total_score
        float semantic_score
        float rule_score
        float carbon_score
        float collaborative_score
        float recency_score
        boolean is_viewed
        boolean is_saved
        boolean is_dismissed
        timestamp calculated_at
        timestamp created_at
        timestamp updated_at
    }

    applications {
        uuid id PK
        uuid tenant_id
        uuid company_id FK
        uuid grant_id FK
        uuid match_id FK
        string status
        float requested_amount
        float approved_amount
        jsonb generated_content
        timestamp submitted_at
        timestamp created_at
        timestamp updated_at
    }

Base Model¶

All models inherit from BaseModel which provides a UUID primary key and automatic timestamps:

Column	Type	Description
`id`	`UUID`	Auto-generated UUID v4 primary key
`created_at`	`TIMESTAMP WITH TIME ZONE`	Auto-set on creation via `func.now()`
`updated_at`	`TIMESTAMP WITH TIME ZONE`	Auto-updated on modification via `func.now()`

Source: backend/app/db/base.py

Table Definitions¶

tenants¶

Multi-tenant organizations with subscription tiers. All tenant-scoped data is isolated via PostgreSQL Row-Level Security policies.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`name`	VARCHAR(255)	No	--	--	Organization name
`slug`	VARCHAR(100)	No	--	UNIQUE	URL-safe identifier
`settings`	JSONB	No	`{}`	--	Tenant-specific settings
`subscription_tier`	VARCHAR(50)	No	`explorer`	--	Tier: `explorer`, `professional`, `enterprise`
`is_active`	BOOLEAN	No	`true`	Yes	Active status
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Relationships: Has many users, has many companies (both cascade delete).

users¶

User accounts with tenant association and role-based access control.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`tenant_id`	UUID	No	--	Yes, FK	References `tenants.id` (CASCADE)
`email`	VARCHAR(255)	No	--	Yes	User email address
`hashed_password`	VARCHAR(255)	No	--	--	bcrypt-hashed password
`full_name`	VARCHAR(255)	Yes	--	--	Display name
`role`	VARCHAR(50)	No	`user`	--	Role: `user`, `admin`, `owner`
`is_active`	BOOLEAN	No	`true`	--	Active status
`email_verified`	BOOLEAN	No	`false`	--	Email verification status
`last_login_at`	TIMESTAMPTZ	Yes	--	--	Last login timestamp
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Relationships: Belongs to tenant, has one notification_preferences, has many notification_history.

companies¶

SME profiles with industry classification, carbon profile, and 768-dimensional description embeddings for semantic matching.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`tenant_id`	UUID	No	--	Yes, FK	References `tenants.id` (CASCADE)
`name`	VARCHAR(500)	No	--	--	Company name
`description`	TEXT	Yes	--	--	Business description
`nace_codes`	VARCHAR[]	No	`{}`	--	European industry classification codes
`sic_codes`	VARCHAR[]	No	`{}`	--	UK Standard Industrial Classification
`legal_form`	VARCHAR(100)	Yes	--	--	Legal form (Ltd, GmbH, SARL, etc.)
`employee_count_min`	INTEGER	Yes	--	--	Minimum employee range
`employee_count_max`	INTEGER	Yes	--	--	Maximum employee range
`annual_revenue_min`	NUMERIC(15,2)	Yes	--	--	Minimum annual revenue (EUR)
`annual_revenue_max`	NUMERIC(15,2)	Yes	--	--	Maximum annual revenue (EUR)
`founding_year`	INTEGER	Yes	--	--	Year founded
`country`	VARCHAR(3)	No	--	Yes	ISO 3166-1 alpha-3 country code
`region`	VARCHAR(100)	Yes	--	--	Region/state
`city`	VARCHAR(100)	Yes	--	--	City
`postal_code`	VARCHAR(20)	Yes	--	--	Postal code
`website`	VARCHAR(500)	Yes	--	--	Company website
`contact_email`	VARCHAR(255)	Yes	--	--	Contact email
`description_embedding`	VECTOR(768)	Yes	--	HNSW	Embedding for semantic matching
`extra_data`	JSONB	No	`{}`	--	Additional metadata
`is_active`	BOOLEAN	No	`true`	--	Active status
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Carbon Profile Fields¶

Column	Type	Nullable	Default	Description
`carbon_profile_completed`	BOOLEAN	No	`false`	Whether carbon profile is fully completed
`has_carbon_baseline`	BOOLEAN	No	`false`	Whether baseline emissions are calculated
`scope1_emissions`	NUMERIC(15,4)	Yes	--	Scope 1 direct emissions in tCO2e/year
`scope1_sources`	JSONB	No	`{}`	Breakdown: `{natural_gas_kwh, diesel_liters, ...}`
`scope2_emissions`	NUMERIC(15,4)	Yes	--	Scope 2 indirect emissions in tCO2e/year
`scope2_sources`	JSONB	No	`{}`	Breakdown: `{electricity_kwh, district_heating_kwh, ...}`
`scope3_emissions`	NUMERIC(15,4)	Yes	--	Scope 3 value chain emissions in tCO2e/year
`scope3_categories`	JSONB	No	`{}`	GHG Protocol category breakdown
`reduction_target_percent`	NUMERIC(5,2)	Yes	--	Target emission reduction percentage
`reduction_target_year`	INTEGER	Yes	--	Year to achieve reduction target
`net_zero_target_year`	INTEGER	Yes	--	Net-zero target year
`certifications`	VARCHAR[]	No	`{}`	Certifications: `ISO_14001`, `SBTi_COMMITTED`, `CDP_DISCLOSURE`, `B_CORP`
`carbon_intensity_revenue`	NUMERIC(15,6)	Yes	--	tCO2e per EUR revenue
`carbon_intensity_employee`	NUMERIC(10,4)	Yes	--	tCO2e per employee
`carbon_profile_embedding`	VECTOR(768)	Yes	--	Embedding for carbon profile matching

Computed Properties (Python, not in database):

company_size -- Returns micro (< 10), small (< 50), medium (< 250), or large based on employee_count_max
total_emissions -- Sum of Scope 1 + Scope 2 + Scope 3 emissions
carbon_readiness_score -- 0-100 score based on profile completeness

Relationships: Belongs to tenant, has many matches, has many applications.

grants¶

Grant opportunities with eligibility criteria, carbon classification, and 768-dimensional embeddings. Grants are NOT tenant-scoped -- they are shared across all tenants.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`external_id`	VARCHAR(255)	Yes	--	--	ID from source system
`source`	VARCHAR(50)	No	--	Yes	Source: `cordis`, `eu_portal`, `innovate_uk`, etc.
`source_url`	TEXT	Yes	--	--	Original URL at source
`title`	VARCHAR(500)	No	--	--	Grant title
`description`	TEXT	Yes	--	--	Full description
`summary`	TEXT	Yes	--	--	Short description for cards
`funding_amount_min`	NUMERIC(15,2)	Yes	--	--	Minimum funding amount
`funding_amount_max`	NUMERIC(15,2)	Yes	--	--	Maximum funding amount
`funding_rate_min`	NUMERIC(5,2)	Yes	--	--	Minimum funding rate (0-100%)
`funding_rate_max`	NUMERIC(5,2)	Yes	--	--	Maximum funding rate (0-100%)
`currency`	VARCHAR(3)	No	`EUR`	--	Currency code
`deadline`	TIMESTAMPTZ	Yes	--	Yes	Application deadline
`publication_date`	TIMESTAMPTZ	Yes	--	--	Publication date
`start_date`	TIMESTAMPTZ	Yes	--	--	Grant program start date
`end_date`	TIMESTAMPTZ	Yes	--	--	Grant program end date
`countries`	VARCHAR[]	No	`{}`	--	Eligible countries
`regions`	VARCHAR[]	No	`{}`	--	Eligible regions
`nace_codes`	VARCHAR[]	No	`{}`	--	Eligible NACE industry codes
`company_sizes`	VARCHAR[]	No	`{}`	--	Eligible sizes: `micro`, `small`, `medium`, `large`
`legal_forms`	VARCHAR[]	No	`{}`	--	Eligible legal forms
`min_employees`	INTEGER	Yes	--	--	Minimum employee count
`max_employees`	INTEGER	Yes	--	--	Maximum employee count
`eligibility_criteria`	JSONB	No	`{}`	--	Additional eligibility data
`grant_type`	VARCHAR(100)	Yes	--	--	Grant type classification
`thematic_areas`	VARCHAR[]	No	`{}`	--	Thematic area tags
`title_embedding`	VECTOR(768)	Yes	--	HNSW	Title text embedding
`description_embedding`	VECTOR(768)	Yes	--	HNSW	Description text embedding
`content_hash`	VARCHAR(64)	Yes	--	Yes	SHA-256 hash for deduplication
`status`	VARCHAR(50)	No	`active`	Yes	Status: `active`, `closed`, `upcoming`
`is_active`	BOOLEAN	No	`true`	--	Active flag
`extra_data`	JSONB	No	`{}`	--	Additional metadata
`raw_data`	JSONB	Yes	--	--	Original data from source (for debugging)
`last_synced_at`	TIMESTAMPTZ	Yes	--	--	Last pipeline sync timestamp
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Carbon Classification Fields¶

Column	Type	Nullable	Default	Indexed	Description
`is_carbon_focused`	BOOLEAN	No	`false`	Yes	Primary carbon indicator
`carbon_categories`	VARCHAR[]	No	`{}`	--	Carbon categories (see list below)
`min_emission_reduction_percent`	NUMERIC(5,2)	Yes	--	--	Required minimum emission reduction %
`eligible_scopes`	VARCHAR[]	No	`{}`	--	Eligible emission scopes: `scope1`, `scope2`, `scope3`
`eu_taxonomy_aligned`	BOOLEAN	No	`false`	--	EU Taxonomy compliance flag
`taxonomy_objectives`	VARCHAR[]	No	`{}`	--	EU Taxonomy objectives (see list below)
`green_deal_aligned`	BOOLEAN	No	`false`	--	EU Green Deal alignment
`fit_for_55_relevant`	BOOLEAN	No	`false`	--	Fit for 55 package relevance
`supports_csrd_compliance`	BOOLEAN	No	`false`	--	CSRD reporting support
`supports_sbti_alignment`	BOOLEAN	No	`false`	--	Science Based Targets support
`carbon_extra_data`	JSONB	No	`{}`	--	Additional carbon metadata

Carbon Categories:

energy_efficiency, renewable_energy, clean_technology, circular_economy,
sustainable_transport, green_buildings, carbon_capture, hydrogen,
industrial_decarbonization, sustainable_agriculture, biodiversity,
water_management, waste_reduction, climate_adaptation

EU Taxonomy Objectives:

climate_mitigation, climate_adaptation, water_marine_resources,
circular_economy, pollution_prevention, biodiversity_ecosystems

Relationships: Has many matches, has many applications.

matches¶

Company-grant matching scores with full component breakdown. Matches are tenant-scoped through the tenant_id column and company relationship.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`tenant_id`	UUID	No	--	Yes	Tenant isolation column
`company_id`	UUID	No	--	Yes, FK	References `companies.id` (CASCADE)
`grant_id`	UUID	No	--	Yes, FK	References `grants.id` (CASCADE)
`total_score`	NUMERIC(5,4)	No	--	--	Final weighted score (0.0000 - 1.0000)
`semantic_score`	NUMERIC(5,4)	Yes	--	--	Embedding cosine similarity (25% weight)
`rule_score`	NUMERIC(5,4)	Yes	--	--	Rule-based criteria (30% weight)
`carbon_score`	NUMERIC(5,4)	Yes	--	--	Carbon alignment (25% weight)
`collaborative_score`	NUMERIC(5,4)	Yes	--	--	Peer signals (10% weight)
`recency_score`	NUMERIC(5,4)	Yes	--	--	Deadline urgency (10% weight)
`score_breakdown`	JSONB	No	`{}`	--	Human-readable score explanation
`is_viewed`	BOOLEAN	No	`false`	--	User has viewed this match
`is_saved`	BOOLEAN	No	`false`	--	User has saved this match
`is_dismissed`	BOOLEAN	No	`false`	--	User has dismissed this match
`user_rating`	INTEGER	Yes	--	--	User rating (1-5)
`calculated_at`	TIMESTAMPTZ	No	`utcnow()`	--	Score calculation timestamp
`expires_at`	TIMESTAMPTZ	Yes	--	--	Score expiration timestamp
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Unique Constraint: (company_id, grant_id) -- prevents duplicate matches.

Relationships: Belongs to company, belongs to grant.

applications¶

Grant application tracking with a Kanban-style status workflow.

Column	Type	Nullable	Default	Indexed	Description
`id`	UUID	No	uuid4()	PK	Primary key
`tenant_id`	UUID	No	--	Yes	Tenant isolation column
`company_id`	UUID	No	--	Yes, FK	References `companies.id` (CASCADE)
`grant_id`	UUID	No	--	Yes, FK	References `grants.id`
`match_id`	UUID	Yes	--	FK	References `matches.id` (optional link)
`status`	VARCHAR(50)	No	`draft`	Yes	Workflow status (see transitions below)
`application_reference`	VARCHAR(255)	Yes	--	--	External reference number
`submitted_at`	TIMESTAMPTZ	Yes	--	--	Submission timestamp
`decision_at`	TIMESTAMPTZ	Yes	--	--	Decision timestamp
`requested_amount`	NUMERIC(15,2)	Yes	--	--	Amount requested
`approved_amount`	NUMERIC(15,2)	Yes	--	--	Amount approved
`documents`	JSONB	No	`[]`	--	Document references (array)
`notes`	TEXT	Yes	--	--	User-visible notes
`internal_notes`	TEXT	Yes	--	--	Internal team notes
`generated_content`	JSONB	No	`{}`	--	AI-generated application content
`extra_data`	JSONB	No	`{}`	--	Additional metadata
`created_at`	TIMESTAMPTZ	No	`now()`	--	Creation timestamp
`updated_at`	TIMESTAMPTZ	No	`now()`	--	Last update timestamp

Status Workflow:

stateDiagram-v2
    [*] --> draft
    draft --> in_progress
    draft --> submitted
    in_progress --> draft
    in_progress --> submitted
    submitted --> under_review
    under_review --> approved
    under_review --> rejected
    approved --> [*]
    rejected --> [*]

Relationships: Belongs to company, belongs to grant, has many documents.

Additional Tables¶

The database includes several supporting tables:

Table	Description
`documents`	File metadata for S3-stored documents linked to applications
`notification_preferences`	Per-user notification frequency, channels, and thresholds
`notification_history`	Sent notification log with delivery status
`partners`	Partner/referral program accounts
`referrals`	Partner referral tracking with conversion status
`commissions`	Partner commission calculations and payment status
`sync_runs`	Grant pipeline sync execution history
`sync_status`	Per-source sync status tracking
`vcri_projects`	VCRI (Verified Carbon Reduction Initiative) project records
`vcri_measurements`	VCRI emission measurement data points
`vcri_certificates`	VCRI certificates with optional blockchain anchoring
`alembic_version`	Alembic migration version tracking

Indexes¶

Standard Indexes¶

Composite and single-column indexes for common query patterns:

Table	Index	Columns	Purpose
`tenants`	B-tree	`slug` (UNIQUE)	Tenant lookup by slug
`tenants`	B-tree	`is_active`	Active tenant filtering
`users`	B-tree	`tenant_id`	Tenant-scoped user queries
`users`	B-tree	`email`	Login lookup
`companies`	B-tree	`tenant_id`	Tenant-scoped company queries
`companies`	B-tree	`country`	Country filtering
`grants`	B-tree	`source`	Source filtering
`grants`	B-tree	`deadline`	Deadline sorting and filtering
`grants`	B-tree	`status`	Status filtering
`grants`	B-tree	`is_carbon_focused`	Carbon grant filtering
`grants`	B-tree	`content_hash`	Deduplication lookups
`matches`	B-tree	`tenant_id`	Tenant-scoped match queries
`matches`	B-tree	`company_id`	Company match lookups
`matches`	B-tree	`grant_id`	Grant match lookups
`matches`	UNIQUE	`(company_id, grant_id)`	Prevent duplicate matches
`applications`	B-tree	`tenant_id`	Tenant-scoped queries
`applications`	B-tree	`company_id`	Company application lookups
`applications`	B-tree	`grant_id`	Grant application lookups
`applications`	B-tree	`status`	Status filtering

Vector Indexes (pgvector)¶

HNSW (Hierarchical Navigable Small World) indexes for approximate nearest-neighbor search on 768-dimensional embeddings:

Table	Column	Dimensions	Algorithm	Purpose
`companies`	`description_embedding`	768	HNSW	Semantic company matching
`companies`	`carbon_profile_embedding`	768	HNSW	Carbon profile similarity
`grants`	`title_embedding`	768	HNSW	Grant title similarity
`grants`	`description_embedding`	768	HNSW	Grant description similarity

HNSW vs IVFFlat

The project uses HNSW indexes over IVFFlat because HNSW provides better recall at query time without requiring periodic retraining. HNSW index build time is higher but query performance is superior for the expected dataset size (100k-500k grants).

Alembic Migrations¶

Configuration¶

Alembic is configured in alembic/env.py to use the synchronous psycopg2 driver for migrations (asyncpg has compatibility issues on Windows). The database URL is pulled from the application Settings class.

All SQLAlchemy models are imported in alembic/env.py so that autogenerate can detect schema changes:

from backend.app.models.database import (
    Application, Company, Document, Grant, Match,
    NotificationHistory, NotificationPreferences,
    SyncRun, SyncStatus, Tenant, User,
)

Common Commands¶

# Apply all pending migrations
poetry run alembic upgrade head

# Roll back one migration
poetry run alembic downgrade -1

# Roll back to a specific revision
poetry run alembic downgrade <revision_id>

# Generate a new migration from model changes
poetry run alembic revision --autogenerate -m "Add new column to grants"

# View current migration state
poetry run alembic current

# View migration history
poetry run alembic history

# View migration history with verbose output
poetry run alembic history -v

Migration Best Practices¶

Always Review Autogenerated Migrations

Alembic's --autogenerate flag detects most schema changes but may miss:

Index name changes
Constraint modifications
Custom SQL (RLS policies, functions)
pgvector-specific operations

Always review the generated migration file before applying it.

pgvector Extension

The pgvector extension is created by scripts/init-extensions.sql during Docker container initialization, not by Alembic. If you see pgvector not defined errors during migration, ensure the init script has run.

Connection Strings¶

DevelopmentProduction

# Async (application runtime)
postgresql+asyncpg://grant_user:grant_password_dev@localhost:5433/grant_engine?ssl=disable

# Sync (Alembic migrations)
postgresql://grant_user:grant_password_dev@localhost:5433/grant_engine

# Async (application runtime)
postgresql+asyncpg://{user}:{password}@{rds_endpoint}:5432/{database}

# Sync (Alembic migrations)
postgresql://{user}:{password}@{rds_endpoint}:5432/{database}