System Architecture¶
This page describes the high-level architecture of Carbon Connect, the data flow between components, and the design of the carbon-enhanced matching algorithm.
System Overview¶
Carbon Connect follows a layered architecture that separates the presentation layer (Next.js), the API layer (FastAPI), the data layer (PostgreSQL, Meilisearch, Valkey, and S3), and the background processing layer (Celery workers and Celery Beat).
graph TB
subgraph Client Layer
Browser["Browser"]
end
subgraph Presentation Layer
NextJS["Next.js 14<br/>App Router<br/>:3000"]
end
subgraph API Layer
FastAPI["FastAPI<br/>REST API<br/>:8000"]
Auth["JWT Auth<br/>Middleware"]
Tenant["Tenant<br/>Middleware"]
end
subgraph Data Layer
PG["PostgreSQL 16<br/>+ pgvector<br/>:5433"]
MS["Meilisearch 1.6<br/>Full-text Search<br/>:7700"]
VK["Valkey 8<br/>Cache / Broker<br/>:6379"]
S3["AWS S3<br/>Document Storage"]
end
subgraph Background Processing
CeleryW["Celery Workers<br/>(4 queues)"]
CeleryB["Celery Beat<br/>Scheduler"]
end
subgraph External APIs
Claude["Anthropic Claude<br/>LLM API"]
Climatiq["Climatiq<br/>Carbon API"]
CORDIS["CORDIS<br/>EU Research"]
EUPortal["EU Funding<br/>Portal"]
Cohesion["Cohesion<br/>Open Data"]
InnovateUK["Innovate UK<br/>Gateway"]
end
Browser --> NextJS
NextJS --> FastAPI
FastAPI --> Auth
Auth --> Tenant
Tenant --> PG
FastAPI --> MS
FastAPI --> VK
FastAPI --> S3
FastAPI --> Claude
FastAPI --> Climatiq
CeleryW --> PG
CeleryW --> VK
CeleryW --> MS
CeleryW --> CORDIS
CeleryW --> EUPortal
CeleryW --> Cohesion
CeleryW --> InnovateUK
CeleryB --> VK
Component Responsibilities¶
Presentation Layer¶
| Component | Technology | Responsibility |
|---|---|---|
| Next.js App | Next.js 14, React 18, TypeScript | Server-side rendering, client routing, UI components |
| UI Library | Tailwind CSS, shadcn/ui | Design system, accessible components |
| State Management | React Query (TanStack Query) | Server state caching, optimistic updates |
API Layer¶
| Component | Technology | Responsibility |
|---|---|---|
| FastAPI | Python 3.11, Pydantic | REST endpoints, request validation, OpenAPI docs |
| JWT Authentication | PyJWT, passlib, bcrypt | Token-based auth with access/refresh tokens |
| Tenant Middleware | Custom FastAPI deps | Multi-tenant context injection, cross-tenant prevention |
| Rate Limiter | Per-tenant throttling | Request rate limiting |
Data Layer¶
| Component | Technology | Responsibility |
|---|---|---|
| PostgreSQL | PostgreSQL 16 + pgvector | Primary data store, vector similarity (HNSW indexes), RLS |
| Meilisearch | Meilisearch 1.6 | Sub-100ms full-text search with facets and filters |
| Valkey | Valkey 8 (Redis-compatible) | Session cache, Celery broker/backend, rate limit counters |
| AWS S3 | S3 + pre-signed URLs | Document storage with tenant-prefixed paths |
Background Processing¶
| Component | Technology | Responsibility |
|---|---|---|
| Celery Workers | Celery 5.3, 4 queues | Grant scraping, embedding generation, email, notifications |
| Celery Beat | Periodic scheduler | Scheduled grant syncs, deadline reminders |
External APIs¶
| Service | Purpose | Rate Limit |
|---|---|---|
| Anthropic Claude | Application generation, content analysis | Per-key |
| Climatiq | GHG Protocol emission factor calculations | Per-key |
| CORDIS | EU research funding data | 1 req/sec |
| EU Funding Portal | Real-time EU grant opportunities | 2 req/sec |
| Cohesion Open Data | ERDF/ESF structural fund programs | 2 req/sec |
| Innovate UK | UK research funding (UKRI) | 2 req/sec |
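The per-source limits in the table can be enforced on the client side by spacing requests evenly. The following is a minimal sketch, not the project's actual scraper code: the `MinIntervalLimiter` class and the dictionary keys are hypothetical names introduced for illustration, and only the numeric limits come from the table.

```python
import asyncio
import time

# Requests-per-second limits copied from the table above; the dictionary
# keys are hypothetical identifiers chosen for this sketch.
SOURCE_RATE_LIMITS = {
    "cordis": 1.0,
    "eu_funding_portal": 2.0,
    "cohesion_open_data": 2.0,
    "innovate_uk": 2.0,
}


class MinIntervalLimiter:
    """Space calls at least 1 / rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec: float) -> None:
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self.min_interval = 1.0 / rate_per_sec
        self._next_allowed = 0.0  # monotonic timestamp of the next free slot
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        """Block until the caller may issue the next request."""
        async with self._lock:
            delay = self._next_allowed - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_allowed = time.monotonic() + self.min_interval
```

A scraper task would create one limiter per source inside the running event loop and call `await limiter.wait()` before each HTTP request.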
Request Flow¶
Authenticated API Request¶
sequenceDiagram
participant C as Client
participant F as FastAPI
participant A as Auth Middleware
participant T as Tenant Middleware
participant DB as PostgreSQL
C->>F: GET /api/v1/companies (Bearer token)
F->>A: Validate JWT token
A->>A: Check token blacklist
A->>DB: Fetch user by email
A->>A: Verify user is active
A->>DB: Fetch tenant by tenant_id
A->>A: Verify tenant is active
A-->>F: Return User object
F->>T: Get tenant context
T-->>F: Return Tenant object
F->>DB: SELECT * FROM companies WHERE tenant_id = ?
DB-->>F: Company records
F-->>C: 200 OK (paginated response)
Grant Search Flow¶
sequenceDiagram
participant C as Client
participant F as FastAPI
participant MS as Meilisearch
participant DB as PostgreSQL
C->>F: GET /api/v1/grants/search?q=solar+energy
F->>MS: Search "solar energy" (with filters)
alt Meilisearch Available
MS-->>F: Matching grant IDs + highlights
F->>DB: Fetch full grant records by IDs
DB-->>F: Complete grant data
else Meilisearch Unavailable
F->>DB: ILIKE search fallback
DB-->>F: Matching grants
end
F-->>C: 200 OK (search results with pagination)
Grant Matching Flow¶
The matching pipeline processes company profiles against available grants through a multi-stage scoring algorithm.
flowchart TD
A["Company Profile"] --> B{"Country Match?"}
B -->|No| Z["Score: 0<br/>(Disqualified)"]
B -->|Yes| C{"Size Match?"}
C -->|No| Z
C -->|Yes| D["Rule-Based Score<br/>(30% weight)"]
C -->|Yes| E["Semantic Score<br/>(25% weight)"]
C -->|Yes| F["Carbon Score<br/>(25% weight)"]
C -->|Yes| G["Collaborative Score<br/>(10% weight)"]
C -->|Yes| H["Recency Score<br/>(10% weight)"]
D --> I["Weighted Sum"]
E --> I
F --> I
G --> I
H --> I
I --> J{"Carbon Focused<br/>Grant?"}
J -->|Yes| K["Apply 1.2x Bonus<br/>(capped at 1.0)"]
J -->|No| L["Final Score"]
K --> L
Matching Algorithm¶
The hybrid matching system uses five weighted scoring components optimized for carbon funding.
Score Component Weights¶
| Component | Weight | Description |
|---|---|---|
| Rule-Based Criteria | 30% | Country, NACE codes, company size, legal form, eligibility criteria |
| Semantic Similarity | 25% | Cosine similarity between company and grant description embeddings (768-dim, all-mpnet-base-v2) |
| Carbon Alignment | 25% | Carbon category overlap, certification matching, EU Taxonomy objectives, scope compatibility |
| Collaborative Filtering | 10% | Peer interaction signals from similar companies (saved, viewed, dismissed patterns) |
| Recency Bonus | 10% | Deadline urgency scoring based on days until deadline |
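The weights in the table combine into a single score as a plain weighted sum. The sketch below assumes each component has already been normalized to the range 0 to 1; the key names and the `weighted_score` function are illustrative and not taken from the codebase.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "rule_based": 0.30,
    "semantic": 0.25,
    "carbon": 0.25,
    "collaborative": 0.10,
    "recency": 0.10,
}


def weighted_score(components: dict[str, float]) -> float:
    """Combine per-component scores (each assumed in [0, 1]) into one sum."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example = {
    "rule_based": 0.8,
    "semantic": 0.6,
    "carbon": 0.9,
    "collaborative": 0.5,
    "recency": 0.9,
}
print(round(weighted_score(example), 3))  # 0.755
```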
Disqualifying Conditions¶
These conditions immediately return a score of 0, regardless of other component scores:
- Country mismatch: The company's country is not in the grant's eligible countries list.
- Size mismatch: The grant explicitly restricts company sizes and the company does not match.
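These gates can be expressed as a check that runs before any scoring. The sketch below is an assumption-laden illustration: the `Company` and `Grant` dataclasses, their field names, and the convention that `eligible_sizes=None` means "no size restriction" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    country: str  # e.g. an ISO country code
    size: str     # e.g. "small", "medium", "large"


@dataclass
class Grant:
    eligible_countries: set[str]
    # None is assumed to mean the grant does not restrict company size.
    eligible_sizes: Optional[set[str]] = None


def is_disqualified(company: Company, grant: Grant) -> bool:
    """Return True when the pair must score 0 regardless of other components."""
    if company.country not in grant.eligible_countries:
        return True
    if grant.eligible_sizes is not None and company.size not in grant.eligible_sizes:
        return True
    return False
```

Running this check first avoids computing embeddings or collaborative signals for pairs that can never match.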
Carbon Scoring Components¶
The 25% carbon alignment score is composed of:
- Carbon category overlap: Intersection between grant categories and company focus areas (e.g., energy_efficiency, renewable_energy, clean_technology)
- Certification matching: Overlap between company certifications (ISO 14001, SBTi, CDP, B Corp) and grant requirements
- EU Taxonomy objective alignment: Matching taxonomy objectives (climate_mitigation, climate_adaptation, etc.)
- Scope compatibility: Whether the grant's eligible emission scopes match the company's reported scopes
- Green Deal and Fit-for-55 alignment: Boolean alignment checks
- CSRD compliance support: Whether the grant supports CSRD reporting
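This page does not state how the sub-signals are combined, so the following is only a sketch of one plausible shape. It assumes set-valued signals are scored as the fraction of the grant's items that the company covers, and that all sub-signals are averaged with equal weight; both choices, and the function names, are placeholders rather than the documented behavior.

```python
def overlap_ratio(wanted: set[str], available: set[str]) -> float:
    """Fraction of `wanted` items present in `available`.

    Returns 1.0 when nothing is wanted (assumed: no requirement, no penalty).
    """
    if not wanted:
        return 1.0
    return len(wanted & available) / len(wanted)


def carbon_alignment(signals: dict[str, float]) -> float:
    """Average the sub-signals; equal weighting is a placeholder assumption."""
    if not signals:
        return 0.0
    return sum(signals.values()) / len(signals)


signals = {
    "categories": overlap_ratio(
        {"energy_efficiency", "renewable_energy"}, {"renewable_energy"}
    ),
    "certifications": overlap_ratio({"ISO 14001"}, {"ISO 14001", "SBTi"}),
    "green_deal": 1.0,  # boolean checks mapped to 1.0 / 0.0
    "csrd": 0.0,
}
score = carbon_alignment(signals)
```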
Special Rules¶
- Carbon-focused bonus: Grants marked as is_carbon_focused = true receive a 1.2x multiplier on the total score, capped at 1.0.
- NACE code partial matching: Same two-digit sector prefix gets 50% of full match score.
- Recency scoring: Deadline within 14 days scores 1.0, within 30 days scores 0.9, decaying to 0.2 for distant deadlines.
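The three rules above can be sketched as small pure functions. Several details are assumptions made only for illustration: NACE codes are taken to be strings such as "25.11" without a section letter, expired grants are assumed to be filtered out upstream, and the decay between 30 days and "distant" deadlines is modeled as linear down to 0.2 at a hypothetical 180-day horizon, since the exact curve is not specified here.

```python
def apply_carbon_bonus(score: float, is_carbon_focused: bool) -> float:
    """Apply the 1.2x multiplier for carbon-focused grants, capped at 1.0."""
    return min(score * 1.2, 1.0) if is_carbon_focused else score


def nace_match(company_code: str, grant_code: str) -> float:
    """1.0 for an exact match, 0.5 for the same two-digit prefix, else 0.0."""
    if company_code == grant_code:
        return 1.0
    if company_code[:2] == grant_code[:2]:
        return 0.5
    return 0.0


def recency_score(days_until_deadline: int, horizon_days: int = 180) -> float:
    """Score deadline urgency; the decay shape past 30 days is an assumption."""
    if days_until_deadline < 0:
        raise ValueError("expired grants are assumed to be filtered out earlier")
    if days_until_deadline <= 14:
        return 1.0
    if days_until_deadline <= 30:
        return 0.9
    if days_until_deadline >= horizon_days:
        return 0.2
    # Linear interpolation from 0.9 at 30 days down to 0.2 at the horizon.
    fraction = (days_until_deadline - 30) / (horizon_days - 30)
    return 0.9 - 0.7 * fraction
```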
Data Pipeline Architecture¶
The grant data pipeline runs as scheduled Celery tasks that scrape, normalize, classify, and embed grant data from multiple sources.
flowchart LR
subgraph Sources
S1["CORDIS"]
S2["EU Portal"]
S3["Cohesion Data"]
S4["Innovate UK"]
end
subgraph Pipeline
F["Fetch<br/>(async HTTP)"]
N["Normalize<br/>(common schema)"]
D["Deduplicate<br/>(content hash)"]
CC["Carbon Classify<br/>(category tagging)"]
E["Embed<br/>(all-mpnet-base-v2)"]
I["Index<br/>(Meilisearch)"]
end
subgraph Storage
PG["PostgreSQL"]
MS["Meilisearch"]
end
S1 --> F
S2 --> F
S3 --> F
S4 --> F
F --> N --> D --> CC --> E --> PG
E --> I --> MS
Pipeline Stages:
- Fetch: Async HTTP clients with rate limiting and exponential backoff retry logic scrape each data source.
- Normalize: Source-specific data is mapped to the common Grant schema.
- Deduplicate: Content hashes prevent duplicate entries across sync runs.
- Carbon Classify: Grants are tagged with carbon categories, EU Taxonomy alignment, and sustainability flags.
- Embed: Title and description text are converted to 768-dimensional vectors using all-mpnet-base-v2.
- Index: Grant documents are indexed in Meilisearch for full-text search.
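Two of the stages above, deduplication by content hash and retry with exponential backoff, can be sketched with the standard library alone. The choice of hashed fields (`HASH_FIELDS`), the SHA-256 digest, and the backoff parameters are assumptions for illustration; the pipeline's real field set and retry settings are not given on this page.

```python
import hashlib
import json

# Assumed set of fields that define a grant's identity for deduplication.
HASH_FIELDS = ("source", "title", "description")


def content_hash(grant: dict) -> str:
    """Hash a canonical JSON form of the selected fields."""
    canonical = json.dumps(
        {name: grant.get(name) for name in HASH_FIELDS},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(grants: list[dict], seen: set[str]) -> list[dict]:
    """Return grants whose hash is not in `seen`, updating `seen` in place."""
    fresh = []
    for grant in grants:
        digest = content_hash(grant)
        if digest not in seen:
            seen.add(digest)
            fresh.append(grant)
    return fresh


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * 2 ** attempt)
```

Persisting `seen` between sync runs (for example as a unique column in PostgreSQL) is what prevents duplicates across runs; the in-memory set here only stands in for that store.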
Performance Targets¶
| Metric | Target | Implementation |
|---|---|---|
| API response time | < 200ms | Async SQLAlchemy, connection pooling |
| Search latency | < 100ms for 100k documents | Meilisearch with PostgreSQL fallback |
| Matching calculation | < 500ms per company | Pre-computed embeddings, vectorized scoring |
| LLM application generation | < 3 seconds | Claude Sonnet 4 with streaming support |
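The search-latency row depends on the fallback path described in the grant search flow: Meilisearch is tried first, and PostgreSQL answers when it is unavailable. The sketch below shows only that control flow. The `search_grants` function, the callable-based interface, and the use of `ConnectionError` as the "unavailable" signal are assumptions; a real client library may raise its own exception types.

```python
from typing import Callable

# A search backend is modeled as any callable taking a query string.
SearchFn = Callable[[str], list[dict]]


def search_grants(
    query: str, primary: SearchFn, fallback: SearchFn
) -> tuple[list[dict], str]:
    """Try the primary backend; on a connection failure use the fallback.

    Returns the results together with the name of the backend that served them.
    """
    try:
        return primary(query), "meilisearch"
    except ConnectionError:
        # Assumed failure signal; the fallback would run an ILIKE query.
        return fallback(query), "postgres"
```

Returning the backend name alongside the results makes it straightforward to log or monitor how often the slower fallback path is taken.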
Security Architecture¶
graph TB
subgraph Public Subnet
ALB["Application<br/>Load Balancer"]
CF["CloudFront<br/>CDN"]
end
subgraph Private Subnet
ECS["ECS Fargate<br/>(API + Workers)"]
end
subgraph Data Subnet
RDS["RDS PostgreSQL<br/>(encrypted)"]
EC["ElastiCache<br/>(Valkey)"]
end
subgraph Security
KMS["AWS KMS<br/>Encryption Keys"]
SM["Secrets Manager<br/>Credentials"]
WAF["WAF<br/>Firewall"]
end
Internet --> WAF --> ALB --> ECS
Internet --> CF
ECS --> RDS
ECS --> EC
RDS --> KMS
SM --> ECS
Key security measures:
- Encryption at rest: AES-256 via AWS KMS
- Encryption in transit: TLS 1.2+
- Multi-tenant isolation: PostgreSQL Row-Level Security
- Secret management: AWS Secrets Manager (no credentials in code)
- Pre-commit hooks: detect-secrets scanning for accidental credential exposure
- JWT authentication: Access tokens (30 min) + refresh tokens (7 days) with blacklisting
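The token lifetimes and blacklisting in the last item can be illustrated without any signing library. This sketch builds and checks claims only; signing and verification (handled by PyJWT in the API layer) are deliberately left out. The claim layout, the `build_claims` and `is_usable` helpers, and the in-memory blacklist set are hypothetical stand-ins.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetimes from the list above.
TOKEN_TTL = {"access": timedelta(minutes=30), "refresh": timedelta(days=7)}


def build_claims(
    subject: str, tenant_id: str, token_type: str, now: Optional[datetime] = None
) -> dict:
    """Build the claims for an access or refresh token (signing not shown)."""
    now = now or datetime.now(timezone.utc)
    return {
        "sub": subject,
        "tenant_id": tenant_id,
        "type": token_type,
        "jti": uuid.uuid4().hex,  # unique ID, used as the blacklist key
        "iat": int(now.timestamp()),
        "exp": int((now + TOKEN_TTL[token_type]).timestamp()),
    }


def is_usable(
    claims: dict, blacklist: set[str], now: Optional[datetime] = None
) -> bool:
    """Reject blacklisted or expired tokens."""
    now = now or datetime.now(timezone.utc)
    if claims["jti"] in blacklist:
        return False
    return int(now.timestamp()) < claims["exp"]
```

On logout, adding the token's `jti` to the blacklist invalidates it before its natural expiry; in production the set would live in a shared store rather than process memory.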