Database Schema¶
Carbon Connect uses PostgreSQL 16 with the pgvector extension for combined relational storage and 768-dimensional vector similarity search. The database is managed through SQLAlchemy 2.0 (async) with Alembic for migrations.
Entity Relationship Diagram¶
erDiagram
tenants ||--o{ users : "has many"
tenants ||--o{ companies : "has many"
companies ||--o{ matches : "has many"
companies ||--o{ applications : "has many"
grants ||--o{ matches : "has many"
grants ||--o{ applications : "has many"
users ||--o| notification_preferences : "has one"
users ||--o{ notification_history : "has many"
applications ||--o{ documents : "has many"
matches }o--|| applications : "optional link"
tenants {
uuid id PK
string name
string slug UK
jsonb settings
string subscription_tier
boolean is_active
timestamp created_at
timestamp updated_at
}
users {
uuid id PK
uuid tenant_id FK
string email
string hashed_password
string full_name
string role
boolean is_active
boolean email_verified
timestamp last_login_at
timestamp created_at
timestamp updated_at
}
companies {
uuid id PK
uuid tenant_id FK
string name
text description
array nace_codes
string country
vector description_embedding
float scope1_emissions
float scope2_emissions
float scope3_emissions
array certifications
vector carbon_profile_embedding
timestamp created_at
timestamp updated_at
}
grants {
uuid id PK
string external_id
string source
string title
text description
float funding_amount_min
float funding_amount_max
timestamp deadline
array countries
array nace_codes
boolean is_carbon_focused
array carbon_categories
vector title_embedding
vector description_embedding
string content_hash
timestamp created_at
timestamp updated_at
}
matches {
uuid id PK
uuid tenant_id
uuid company_id FK
uuid grant_id FK
float total_score
float semantic_score
float rule_score
float carbon_score
float collaborative_score
float recency_score
boolean is_viewed
boolean is_saved
boolean is_dismissed
timestamp calculated_at
timestamp created_at
timestamp updated_at
}
applications {
uuid id PK
uuid tenant_id
uuid company_id FK
uuid grant_id FK
uuid match_id FK
string status
float requested_amount
float approved_amount
jsonb generated_content
timestamp submitted_at
timestamp created_at
timestamp updated_at
} Base Model¶
All models inherit from BaseModel which provides a UUID primary key and automatic timestamps:
| Column | Type | Description |
|---|---|---|
id | UUID | Auto-generated UUID v4 primary key |
created_at | TIMESTAMP WITH TIME ZONE | Auto-set on creation via func.now() |
updated_at | TIMESTAMP WITH TIME ZONE | Auto-updated on modification via func.now() |
Source: backend/app/db/base.py
Table Definitions¶
tenants¶
Multi-tenant organizations with subscription tiers. All tenant-scoped data is isolated via PostgreSQL Row-Level Security policies.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
name | VARCHAR(255) | No | -- | -- | Organization name |
slug | VARCHAR(100) | No | -- | UNIQUE | URL-safe identifier |
settings | JSONB | No | {} | -- | Tenant-specific settings |
subscription_tier | VARCHAR(50) | No | explorer | -- | Tier: explorer, professional, enterprise |
is_active | BOOLEAN | No | true | Yes | Active status |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Relationships: Has many users, has many companies (both cascade delete).
users¶
User accounts with tenant association and role-based access control.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
tenant_id | UUID | No | -- | Yes, FK | References tenants.id (CASCADE) |
email | VARCHAR(255) | No | -- | Yes | User email address |
hashed_password | VARCHAR(255) | No | -- | -- | bcrypt-hashed password |
full_name | VARCHAR(255) | Yes | -- | -- | Display name |
role | VARCHAR(50) | No | user | -- | Role: user, admin, owner |
is_active | BOOLEAN | No | true | -- | Active status |
email_verified | BOOLEAN | No | false | -- | Email verification status |
last_login_at | TIMESTAMPTZ | Yes | -- | -- | Last login timestamp |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Relationships: Belongs to tenant, has one notification_preferences, has many notification_history.
companies¶
SME profiles with industry classification, carbon profile, and 768-dimensional description embeddings for semantic matching.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
tenant_id | UUID | No | -- | Yes, FK | References tenants.id (CASCADE) |
name | VARCHAR(500) | No | -- | -- | Company name |
description | TEXT | Yes | -- | -- | Business description |
nace_codes | VARCHAR[] | No | {} | -- | European industry classification codes |
sic_codes | VARCHAR[] | No | {} | -- | UK Standard Industrial Classification |
legal_form | VARCHAR(100) | Yes | -- | -- | Legal form (Ltd, GmbH, SARL, etc.) |
employee_count_min | INTEGER | Yes | -- | -- | Minimum employee range |
employee_count_max | INTEGER | Yes | -- | -- | Maximum employee range |
annual_revenue_min | NUMERIC(15,2) | Yes | -- | -- | Minimum annual revenue (EUR) |
annual_revenue_max | NUMERIC(15,2) | Yes | -- | -- | Maximum annual revenue (EUR) |
founding_year | INTEGER | Yes | -- | -- | Year founded |
country | VARCHAR(3) | No | -- | Yes | ISO 3166-1 alpha-3 country code |
region | VARCHAR(100) | Yes | -- | -- | Region/state |
city | VARCHAR(100) | Yes | -- | -- | City |
postal_code | VARCHAR(20) | Yes | -- | -- | Postal code |
website | VARCHAR(500) | Yes | -- | -- | Company website |
contact_email | VARCHAR(255) | Yes | -- | -- | Contact email |
description_embedding | VECTOR(768) | Yes | -- | HNSW | Embedding for semantic matching |
extra_data | JSONB | No | {} | -- | Additional metadata |
is_active | BOOLEAN | No | true | -- | Active status |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Carbon Profile Fields¶
| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
carbon_profile_completed | BOOLEAN | No | false | Whether carbon profile is fully completed |
has_carbon_baseline | BOOLEAN | No | false | Whether baseline emissions are calculated |
scope1_emissions | NUMERIC(15,4) | Yes | -- | Scope 1 direct emissions in tCO2e/year |
scope1_sources | JSONB | No | {} | Breakdown: {natural_gas_kwh, diesel_liters, ...} |
scope2_emissions | NUMERIC(15,4) | Yes | -- | Scope 2 indirect emissions in tCO2e/year |
scope2_sources | JSONB | No | {} | Breakdown: {electricity_kwh, district_heating_kwh, ...} |
scope3_emissions | NUMERIC(15,4) | Yes | -- | Scope 3 value chain emissions in tCO2e/year |
scope3_categories | JSONB | No | {} | GHG Protocol category breakdown |
reduction_target_percent | NUMERIC(5,2) | Yes | -- | Target emission reduction percentage |
reduction_target_year | INTEGER | Yes | -- | Year to achieve reduction target |
net_zero_target_year | INTEGER | Yes | -- | Net-zero target year |
certifications | VARCHAR[] | No | {} | Certifications: ISO_14001, SBTi_COMMITTED, CDP_DISCLOSURE, B_CORP |
carbon_intensity_revenue | NUMERIC(15,6) | Yes | -- | tCO2e per EUR revenue |
carbon_intensity_employee | NUMERIC(10,4) | Yes | -- | tCO2e per employee |
carbon_profile_embedding | VECTOR(768) | Yes | -- | Embedding for carbon profile matching |
Computed Properties (Python, not in database):
company_size-- Returnsmicro(< 10),small(< 50),medium(< 250), orlargebased onemployee_count_maxtotal_emissions-- Sum of Scope 1 + Scope 2 + Scope 3 emissionscarbon_readiness_score-- 0-100 score based on profile completeness
Relationships: Belongs to tenant, has many matches, has many applications.
grants¶
Grant opportunities with eligibility criteria, carbon classification, and 768-dimensional embeddings. Grants are NOT tenant-scoped -- they are shared across all tenants.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
external_id | VARCHAR(255) | Yes | -- | -- | ID from source system |
source | VARCHAR(50) | No | -- | Yes | Source: cordis, eu_portal, innovate_uk, etc. |
source_url | TEXT | Yes | -- | -- | Original URL at source |
title | VARCHAR(500) | No | -- | -- | Grant title |
description | TEXT | Yes | -- | -- | Full description |
summary | TEXT | Yes | -- | -- | Short description for cards |
funding_amount_min | NUMERIC(15,2) | Yes | -- | -- | Minimum funding amount |
funding_amount_max | NUMERIC(15,2) | Yes | -- | -- | Maximum funding amount |
funding_rate_min | NUMERIC(5,2) | Yes | -- | -- | Minimum funding rate (0-100%) |
funding_rate_max | NUMERIC(5,2) | Yes | -- | -- | Maximum funding rate (0-100%) |
currency | VARCHAR(3) | No | EUR | -- | Currency code |
deadline | TIMESTAMPTZ | Yes | -- | Yes | Application deadline |
publication_date | TIMESTAMPTZ | Yes | -- | -- | Publication date |
start_date | TIMESTAMPTZ | Yes | -- | -- | Grant program start date |
end_date | TIMESTAMPTZ | Yes | -- | -- | Grant program end date |
countries | VARCHAR[] | No | {} | -- | Eligible countries |
regions | VARCHAR[] | No | {} | -- | Eligible regions |
nace_codes | VARCHAR[] | No | {} | -- | Eligible NACE industry codes |
company_sizes | VARCHAR[] | No | {} | -- | Eligible sizes: micro, small, medium, large |
legal_forms | VARCHAR[] | No | {} | -- | Eligible legal forms |
min_employees | INTEGER | Yes | -- | -- | Minimum employee count |
max_employees | INTEGER | Yes | -- | -- | Maximum employee count |
eligibility_criteria | JSONB | No | {} | -- | Additional eligibility data |
grant_type | VARCHAR(100) | Yes | -- | -- | Grant type classification |
thematic_areas | VARCHAR[] | No | {} | -- | Thematic area tags |
title_embedding | VECTOR(768) | Yes | -- | HNSW | Title text embedding |
description_embedding | VECTOR(768) | Yes | -- | HNSW | Description text embedding |
content_hash | VARCHAR(64) | Yes | -- | Yes | SHA-256 hash for deduplication |
status | VARCHAR(50) | No | active | Yes | Status: active, closed, upcoming |
is_active | BOOLEAN | No | true | -- | Active flag |
extra_data | JSONB | No | {} | -- | Additional metadata |
raw_data | JSONB | Yes | -- | -- | Original data from source (for debugging) |
last_synced_at | TIMESTAMPTZ | Yes | -- | -- | Last pipeline sync timestamp |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Carbon Classification Fields¶
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
is_carbon_focused | BOOLEAN | No | false | Yes | Primary carbon indicator |
carbon_categories | VARCHAR[] | No | {} | -- | Carbon categories (see list below) |
min_emission_reduction_percent | NUMERIC(5,2) | Yes | -- | -- | Required minimum emission reduction % |
eligible_scopes | VARCHAR[] | No | {} | -- | Eligible emission scopes: scope1, scope2, scope3 |
eu_taxonomy_aligned | BOOLEAN | No | false | -- | EU Taxonomy compliance flag |
taxonomy_objectives | VARCHAR[] | No | {} | -- | EU Taxonomy objectives (see list below) |
green_deal_aligned | BOOLEAN | No | false | -- | EU Green Deal alignment |
fit_for_55_relevant | BOOLEAN | No | false | -- | Fit for 55 package relevance |
supports_csrd_compliance | BOOLEAN | No | false | -- | CSRD reporting support |
supports_sbti_alignment | BOOLEAN | No | false | -- | Science Based Targets support |
carbon_extra_data | JSONB | No | {} | -- | Additional carbon metadata |
Carbon Categories:
energy_efficiency, renewable_energy, clean_technology, circular_economy,
sustainable_transport, green_buildings, carbon_capture, hydrogen,
industrial_decarbonization, sustainable_agriculture, biodiversity,
water_management, waste_reduction, climate_adaptation
EU Taxonomy Objectives:
climate_mitigation, climate_adaptation, water_marine_resources,
circular_economy, pollution_prevention, biodiversity_ecosystems
Relationships: Has many matches, has many applications.
matches¶
Company-grant matching scores with full component breakdown. Matches are tenant-scoped through the tenant_id column and company relationship.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
tenant_id | UUID | No | -- | Yes | Tenant isolation column |
company_id | UUID | No | -- | Yes, FK | References companies.id (CASCADE) |
grant_id | UUID | No | -- | Yes, FK | References grants.id (CASCADE) |
total_score | NUMERIC(5,4) | No | -- | -- | Final weighted score (0.0000 - 1.0000) |
semantic_score | NUMERIC(5,4) | Yes | -- | -- | Embedding cosine similarity (25% weight) |
rule_score | NUMERIC(5,4) | Yes | -- | -- | Rule-based criteria (30% weight) |
carbon_score | NUMERIC(5,4) | Yes | -- | -- | Carbon alignment (25% weight) |
collaborative_score | NUMERIC(5,4) | Yes | -- | -- | Peer signals (10% weight) |
recency_score | NUMERIC(5,4) | Yes | -- | -- | Deadline urgency (10% weight) |
score_breakdown | JSONB | No | {} | -- | Human-readable score explanation |
is_viewed | BOOLEAN | No | false | -- | User has viewed this match |
is_saved | BOOLEAN | No | false | -- | User has saved this match |
is_dismissed | BOOLEAN | No | false | -- | User has dismissed this match |
user_rating | INTEGER | Yes | -- | -- | User rating (1-5) |
calculated_at | TIMESTAMPTZ | No | utcnow() | -- | Score calculation timestamp |
expires_at | TIMESTAMPTZ | Yes | -- | -- | Score expiration timestamp |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Unique Constraint: (company_id, grant_id) -- prevents duplicate matches.
Relationships: Belongs to company, belongs to grant.
applications¶
Grant application tracking with a Kanban-style status workflow.
| Column | Type | Nullable | Default | Indexed | Description |
|---|---|---|---|---|---|
id | UUID | No | uuid4() | PK | Primary key |
tenant_id | UUID | No | -- | Yes | Tenant isolation column |
company_id | UUID | No | -- | Yes, FK | References companies.id (CASCADE) |
grant_id | UUID | No | -- | Yes, FK | References grants.id |
match_id | UUID | Yes | -- | FK | References matches.id (optional link) |
status | VARCHAR(50) | No | draft | Yes | Workflow status (see transitions below) |
application_reference | VARCHAR(255) | Yes | -- | -- | External reference number |
submitted_at | TIMESTAMPTZ | Yes | -- | -- | Submission timestamp |
decision_at | TIMESTAMPTZ | Yes | -- | -- | Decision timestamp |
requested_amount | NUMERIC(15,2) | Yes | -- | -- | Amount requested |
approved_amount | NUMERIC(15,2) | Yes | -- | -- | Amount approved |
documents | JSONB | No | [] | -- | Document references (array) |
notes | TEXT | Yes | -- | -- | User-visible notes |
internal_notes | TEXT | Yes | -- | -- | Internal team notes |
generated_content | JSONB | No | {} | -- | AI-generated application content |
extra_data | JSONB | No | {} | -- | Additional metadata |
created_at | TIMESTAMPTZ | No | now() | -- | Creation timestamp |
updated_at | TIMESTAMPTZ | No | now() | -- | Last update timestamp |
Status Workflow:
stateDiagram-v2
[*] --> draft
draft --> in_progress
draft --> submitted
in_progress --> draft
in_progress --> submitted
submitted --> under_review
under_review --> approved
under_review --> rejected
approved --> [*]
rejected --> [*] Relationships: Belongs to company, belongs to grant, has many documents.
Additional Tables¶
The database includes several supporting tables:
| Table | Description |
|---|---|
documents | File metadata for S3-stored documents linked to applications |
notification_preferences | Per-user notification frequency, channels, and thresholds |
notification_history | Sent notification log with delivery status |
partners | Partner/referral program accounts |
referrals | Partner referral tracking with conversion status |
commissions | Partner commission calculations and payment status |
sync_runs | Grant pipeline sync execution history |
sync_status | Per-source sync status tracking |
vcri_projects | VCRI (Verified Carbon Reduction Initiative) project records |
vcri_measurements | VCRI emission measurement data points |
vcri_certificates | VCRI certificates with optional blockchain anchoring |
alembic_version | Alembic migration version tracking |
Indexes¶
Standard Indexes¶
Composite and single-column indexes for common query patterns:
| Table | Index | Columns | Purpose |
|---|---|---|---|
tenants | B-tree | slug (UNIQUE) | Tenant lookup by slug |
tenants | B-tree | is_active | Active tenant filtering |
users | B-tree | tenant_id | Tenant-scoped user queries |
users | B-tree | email | Login lookup |
companies | B-tree | tenant_id | Tenant-scoped company queries |
companies | B-tree | country | Country filtering |
grants | B-tree | source | Source filtering |
grants | B-tree | deadline | Deadline sorting and filtering |
grants | B-tree | status | Status filtering |
grants | B-tree | is_carbon_focused | Carbon grant filtering |
grants | B-tree | content_hash | Deduplication lookups |
matches | B-tree | tenant_id | Tenant-scoped match queries |
matches | B-tree | company_id | Company match lookups |
matches | B-tree | grant_id | Grant match lookups |
matches | UNIQUE | (company_id, grant_id) | Prevent duplicate matches |
applications | B-tree | tenant_id | Tenant-scoped queries |
applications | B-tree | company_id | Company application lookups |
applications | B-tree | grant_id | Grant application lookups |
applications | B-tree | status | Status filtering |
Vector Indexes (pgvector)¶
HNSW (Hierarchical Navigable Small World) indexes for approximate nearest-neighbor search on 768-dimensional embeddings:
| Table | Column | Dimensions | Algorithm | Purpose |
|---|---|---|---|---|
companies | description_embedding | 768 | HNSW | Semantic company matching |
companies | carbon_profile_embedding | 768 | HNSW | Carbon profile similarity |
grants | title_embedding | 768 | HNSW | Grant title similarity |
grants | description_embedding | 768 | HNSW | Grant description similarity |
HNSW vs IVFFlat
The project uses HNSW indexes over IVFFlat because HNSW provides better recall at query time without requiring periodic retraining. HNSW index build time is higher but query performance is superior for the expected dataset size (100k-500k grants).
Alembic Migrations¶
Configuration¶
Alembic is configured in alembic/env.py to use the synchronous psycopg2 driver for migrations (asyncpg has compatibility issues on Windows). The database URL is pulled from the application Settings class.
All SQLAlchemy models are imported in alembic/env.py so that autogenerate can detect schema changes:
from backend.app.models.database import (
Application, Company, Document, Grant, Match,
NotificationHistory, NotificationPreferences,
SyncRun, SyncStatus, Tenant, User,
)
Common Commands¶
# Apply all pending migrations
poetry run alembic upgrade head
# Roll back one migration
poetry run alembic downgrade -1
# Roll back to a specific revision
poetry run alembic downgrade <revision_id>
# Generate a new migration from model changes
poetry run alembic revision --autogenerate -m "Add new column to grants"
# View current migration state
poetry run alembic current
# View migration history
poetry run alembic history
# View migration history with verbose output
poetry run alembic history -v
Migration Best Practices¶
Always Review Autogenerated Migrations
Alembic's --autogenerate flag detects most schema changes but may miss:
- Index name changes
- Constraint modifications
- Custom SQL (RLS policies, functions)
- pgvector-specific operations
Always review the generated migration file before applying it.
pgvector Extension
The pgvector extension is created by scripts/init-extensions.sql during Docker container initialization, not by Alembic. If you see pgvector not defined errors during migration, ensure the init script has run.