System Architecture

This page describes the high-level architecture of Carbon Connect, the data flow between components, and the design of the carbon-enhanced matching algorithm.

System Overview

Carbon Connect follows a layered architecture that separates the presentation layer (Next.js), the API layer (FastAPI), the data layer (PostgreSQL, Meilisearch, Valkey, and S3), and the background processing layer (Celery workers).

graph TB
    subgraph Client Layer
        Browser["Browser"]
    end

    subgraph Presentation Layer
        NextJS["Next.js 14<br/>App Router<br/>:3000"]
    end

    subgraph API Layer
        FastAPI["FastAPI<br/>REST API<br/>:8000"]
        Auth["JWT Auth<br/>Middleware"]
        Tenant["Tenant<br/>Middleware"]
    end

    subgraph Data Layer
        PG["PostgreSQL 16<br/>+ pgvector<br/>:5433"]
        MS["Meilisearch 1.6<br/>Full-text Search<br/>:7700"]
        VK["Valkey 8<br/>Cache / Broker<br/>:6379"]
        S3["AWS S3<br/>Document Storage"]
    end

    subgraph Background Processing
        CeleryW["Celery Workers<br/>(4 queues)"]
        CeleryB["Celery Beat<br/>Scheduler"]
    end

    subgraph External APIs
        Claude["Anthropic Claude<br/>LLM API"]
        Climatiq["Climatiq<br/>Carbon API"]
        CORDIS["CORDIS<br/>EU Research"]
        EUPortal["EU Funding<br/>Portal"]
        Cohesion["Cohesion<br/>Open Data"]
        InnovateUK["Innovate UK<br/>Gateway"]
    end

    Browser --> NextJS
    NextJS --> FastAPI
    FastAPI --> Auth
    Auth --> Tenant
    Tenant --> PG
    FastAPI --> MS
    FastAPI --> VK
    FastAPI --> S3
    FastAPI --> Claude
    FastAPI --> Climatiq

    CeleryW --> PG
    CeleryW --> VK
    CeleryW --> MS
    CeleryW --> CORDIS
    CeleryW --> EUPortal
    CeleryW --> Cohesion
    CeleryW --> InnovateUK
    CeleryB --> VK

Component Responsibilities

Presentation Layer

Component	Technology	Responsibility
Next.js App	Next.js 14, React 18, TypeScript	Server-side rendering, client routing, UI components
UI Library	Tailwind CSS, shadcn/ui	Design system, accessible components
State Management	React Query (TanStack Query)	Server state caching, optimistic updates

API Layer

Component	Technology	Responsibility
FastAPI	Python 3.11, Pydantic	REST endpoints, request validation, OpenAPI docs
JWT Authentication	PyJWT, passlib, bcrypt	Token-based auth with access/refresh tokens
Tenant Middleware	Custom FastAPI deps	Multi-tenant context injection, cross-tenant prevention
Rate Limiter	Valkey counters	Per-tenant request throttling

Data Layer

Component	Technology	Responsibility
PostgreSQL	PostgreSQL 16 + pgvector	Primary data store, vector similarity (HNSW indexes), RLS
Meilisearch	Meilisearch 1.6	Sub-100ms full-text search with facets and filters
Valkey	Valkey 8 (Redis-compatible)	Session cache, Celery broker/backend, rate limit counters
AWS S3	S3 + pre-signed URLs	Document storage with tenant-prefixed paths

Background Processing

Component	Technology	Responsibility
Celery Workers	Celery 5.3, 4 queues	Grant scraping, embedding generation, email, notifications
Celery Beat	Periodic scheduler	Scheduled grant syncs, deadline reminders

External APIs

Service	Purpose	Rate Limit
Anthropic Claude	Application generation, content analysis	Per-key
Climatiq	GHG Protocol emission factor calculations	Per-key
CORDIS	EU research funding data	1 req/sec
EU Funding Portal	Real-time EU grant opportunities	2 req/sec
Cohesion Open Data	ERDF/ESF structural fund programs	2 req/sec
Innovate UK	UK research funding (UKRI)	2 req/sec
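
The per-source limits in the table could be enforced with a minimum-interval throttle. The following is a minimal sketch, not Carbon Connect's actual scraper code: the dictionary keys and the `MinIntervalThrottle` class are hypothetical names, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional

# Requests per second taken from the table above. The keys are
# illustrative names, not identifiers from the codebase.
SOURCE_RATE_LIMITS = {
    "cordis": 1.0,
    "eu_funding_portal": 2.0,
    "cohesion_open_data": 2.0,
    "innovate_uk": 2.0,
}


class MinIntervalThrottle:
    """Spaces consecutive calls at least 1/rate seconds apart (hypothetical helper)."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the caller is allowed to proceed.
        self._last_call = now + delay
        return delay
```

A scraper would call `wait()` on the throttle for its source immediately before each HTTP request. An async client would need an awaitable sleep instead of `time.sleep`.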

Request Flow

Authenticated API Request

sequenceDiagram
    participant C as Client
    participant F as FastAPI
    participant A as Auth Middleware
    participant T as Tenant Middleware
    participant DB as PostgreSQL

    C->>F: GET /api/v1/companies (Bearer token)
    F->>A: Validate JWT token
    A->>A: Check token blacklist
    A->>DB: Fetch user by email
    A->>A: Verify user is active
    A->>DB: Fetch tenant by tenant_id
    A->>A: Verify tenant is active
    A-->>F: Return User object
    F->>T: Get tenant context
    T-->>F: Return Tenant object
    F->>DB: SELECT * FROM companies WHERE tenant_id = ?
    DB-->>F: Company records
    F-->>C: 200 OK (paginated response)
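
The order of checks in the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not the project's middleware: `claims` stands in for a JWT payload whose signature has already been verified, the `jti` and `sub` claim names are assumed, the tenant is looked up through the user's `tenant_id`, and in-memory mappings replace the database queries.

```python
from dataclasses import dataclass
from typing import Mapping, Set


class AuthError(Exception):
    """Raised when any check in the sequence fails (hypothetical error type)."""


@dataclass(frozen=True)
class Tenant:
    id: str
    is_active: bool


@dataclass(frozen=True)
class User:
    email: str
    tenant_id: str
    is_active: bool


def authenticate(
    claims: Mapping[str, str],
    blacklist: Set[str],
    users: Mapping[str, User],
    tenants: Mapping[str, Tenant],
) -> User:
    """Run the checks in the same order as the sequence diagram."""
    # 1. Reject tokens that were explicitly revoked.
    if claims["jti"] in blacklist:
        raise AuthError("token has been revoked")
    # 2. Fetch the user by email and verify the account is active.
    user = users.get(claims["sub"])
    if user is None or not user.is_active:
        raise AuthError("unknown or inactive user")
    # 3. Fetch the tenant by tenant_id and verify it is active.
    tenant = tenants.get(user.tenant_id)
    if tenant is None or not tenant.is_active:
        raise AuthError("unknown or inactive tenant")
    return user
```

Checking the blacklist first means a revoked token is rejected before any database round trip.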

Grant Search Flow

sequenceDiagram
    participant C as Client
    participant F as FastAPI
    participant MS as Meilisearch
    participant DB as PostgreSQL

    C->>F: GET /api/v1/grants/search?q=solar+energy
    F->>MS: Search "solar energy" (with filters)
    alt Meilisearch Available
        MS-->>F: Matching grant IDs + highlights
        F->>DB: Fetch full grant records by IDs
        DB-->>F: Complete grant data
    else Meilisearch Unavailable
        F->>DB: ILIKE search fallback
        DB-->>F: Matching grants
    end
    F-->>C: 200 OK (search results with pagination)
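
The two branches of the diagram amount to a try-then-fall-back pattern. A minimal sketch follows, assuming a hypothetical `SearchUnavailable` exception and three injected callables that stand in for the Meilisearch client and the PostgreSQL queries; none of these names come from the codebase.

```python
from typing import Callable, List, Sequence


class SearchUnavailable(Exception):
    """Hypothetical error signalling that the search engine cannot be reached."""


def search_grants(
    query: str,
    search_index: Callable[[str], Sequence[int]],
    fetch_by_ids: Callable[[Sequence[int]], List[dict]],
    ilike_search: Callable[[str], List[dict]],
) -> List[dict]:
    """Use the search index when it responds, otherwise fall back to SQL."""
    try:
        # Primary path: the index returns matching grant IDs.
        grant_ids = search_index(query)
    except SearchUnavailable:
        # Fallback path: a slower ILIKE query against PostgreSQL.
        return ilike_search(query)
    # Hydrate full grant records from the primary database.
    return fetch_by_ids(grant_ids)
```

Only the unavailability error triggers the fallback, so other failures in the index call still surface to the caller.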

Grant Matching Flow

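The matching pipeline scores each company profile against available grants in three stages: two eligibility gates (country and size), a weighted sum of five component scores, and an optional bonus for carbon-focused grants.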

flowchart TD
    A["Company Profile"] --> B{"Country Match?"}
    B -->|No| Z["Score: 0<br/>(Disqualified)"]
    B -->|Yes| C{"Size Match?"}
    C -->|No| Z
    C -->|Yes| D["Rule-Based Score<br/>(30% weight)"]
    C -->|Yes| E["Semantic Score<br/>(25% weight)"]
    C -->|Yes| F["Carbon Score<br/>(25% weight)"]
    C -->|Yes| G["Collaborative Score<br/>(10% weight)"]
    C -->|Yes| H["Recency Score<br/>(10% weight)"]
    D --> I["Weighted Sum"]
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J{"Carbon Focused<br/>Grant?"}
    J -->|Yes| K["Apply 1.2x Bonus<br/>(capped at 1.0)"]
    J -->|No| L["Final Score"]
    K --> L

Matching Algorithm

The hybrid matching system combines five scoring components into a weighted sum, with a quarter of the total weight reserved for carbon alignment.

Score Component Weights

Component	Weight	Description
Rule-Based Criteria	30%	Country, NACE codes, company size, legal form, eligibility criteria
Semantic Similarity	25%	Cosine similarity between company and grant description embeddings (768-dim, all-mpnet-base-v2)
Carbon Alignment	25%	Carbon category overlap, certification matching, EU Taxonomy objectives, scope compatibility, Green Deal and Fit-for-55 alignment, CSRD support
Collaborative Filtering	10%	Peer interaction signals from similar companies (saved, viewed, dismissed patterns)
Recency Bonus	10%	Deadline urgency scoring based on days until deadline
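
The weights in the table combine into a single score as a weighted sum. The sketch below assumes each component score is already normalized to the range 0 to 1; the dictionary keys and the `weighted_score` function are illustrative names rather than the project's own.

```python
import math
from typing import Mapping

# Weights from the table above; the keys are illustrative names.
WEIGHTS = {
    "rule_based": 0.30,
    "semantic": 0.25,
    "carbon": 0.25,
    "collaborative": 0.10,
    "recency": 0.10,
}

# Guard against a weight being edited without rebalancing the others.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def weighted_score(components: Mapping[str, float]) -> float:
    """Combine per-component scores (each assumed in [0, 1]) into one score."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Because the weights sum to 1, the combined score stays in the same 0 to 1 range as its inputs.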

Disqualifying Conditions

If either of these conditions holds, matching returns a score of 0 immediately, regardless of the other component scores:

Country mismatch: The company's country is not in the grant's eligible countries list.
Size mismatch: The grant explicitly restricts company sizes and the company does not match.
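
Both gates can be expressed as one predicate that runs before any scoring. This is a sketch with hypothetical parameter names; it assumes a grant with no size restriction represents that as `None` or an empty collection.

```python
from typing import Collection, Optional


def is_disqualified(
    company_country: str,
    company_size: str,
    eligible_countries: Collection[str],
    eligible_sizes: Optional[Collection[str]],
) -> bool:
    """Return True when the pair should score 0 without further evaluation."""
    # Country gate: the company's country must be on the grant's list.
    if company_country not in eligible_countries:
        return True
    # Size gate: applies only when the grant explicitly restricts sizes.
    if eligible_sizes and company_size not in eligible_sizes:
        return True
    return False
```

Running the gates first avoids computing embeddings-based and collaborative scores for pairs that can never match.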

Carbon Scoring Components

The 25% carbon alignment score is composed of:

Carbon category overlap: Intersection between grant categories and company focus areas (e.g., energy_efficiency, renewable_energy, clean_technology)
Certification matching: Overlap between company certifications (ISO 14001, SBTi, CDP, B Corp) and grant requirements
EU Taxonomy objective alignment: Matching taxonomy objectives (climate_mitigation, climate_adaptation, etc.)
Scope compatibility: Whether the grant's eligible emission scopes match the company's reported scopes
Green Deal and Fit-for-55 alignment: Boolean alignment checks
CSRD compliance support: Whether the grant supports CSRD reporting
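
The source does not state how these six checks are combined, so the following is only one plausible sketch: set-based checks are scored as the fraction of the grant's items that the company covers, Boolean checks as 0 or 1, and the results are averaged with equal weight. The function names, the equal weighting, and the choice to score an empty grant list as 0 are all assumptions.

```python
from typing import Collection, Sequence


def coverage(grant_items: Collection[str], company_items: Collection[str]) -> float:
    """Fraction of the grant's items that the company also has."""
    required = set(grant_items)
    if not required:
        # Assumption: a grant that lists nothing contributes no signal.
        return 0.0
    return len(required & set(company_items)) / len(required)


def carbon_alignment(set_scores: Sequence[float], boolean_checks: Sequence[bool]) -> float:
    """Equal-weight mean of set-overlap scores and Boolean checks (assumed)."""
    parts = list(set_scores) + [1.0 if flag else 0.0 for flag in boolean_checks]
    return sum(parts) / len(parts) if parts else 0.0
```

For example, `coverage` would be called once each for carbon categories, certifications, taxonomy objectives, and emission scopes, and the Green Deal, Fit-for-55, and CSRD flags would be passed as `boolean_checks`.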

Special Rules

Carbon-focused bonus: Grants with is_carbon_focused = true receive a 1.2x multiplier on the weighted total, and the result is capped at 1.0.
NACE code partial matching: When the company and grant NACE codes differ but share the same two-digit prefix, the pair receives 50% of the full match score.
Recency scoring: Deadline within 14 days scores 1.0, within 30 days scores 0.9, decaying to 0.2 for distant deadlines.
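
The three rules can be sketched as small pure functions. Several details are assumptions rather than documented behaviour: NACE codes are taken to be strings such as "35.11" with no section letter, and the recency decay beyond 30 days is modelled as linear down to 0.2 at a hypothetical 180-day horizon, since the source gives only the endpoints.

```python
def apply_carbon_bonus(total: float, is_carbon_focused: bool) -> float:
    """Multiply by 1.2 for carbon-focused grants, capping the result at 1.0."""
    return min(1.0, total * 1.2) if is_carbon_focused else total


def nace_match(company_code: str, grant_code: str) -> float:
    """Full score for identical codes, half for a shared two-digit prefix."""
    if company_code == grant_code:
        return 1.0
    if company_code[:2] == grant_code[:2]:
        return 0.5
    return 0.0


# Assumed horizon at which the recency score bottoms out at 0.2.
DECAY_HORIZON_DAYS = 180


def recency_score(days_until_deadline: int) -> float:
    """1.0 within 14 days, 0.9 within 30, then an assumed linear decay to 0.2."""
    if days_until_deadline <= 14:
        return 1.0
    if days_until_deadline <= 30:
        return 0.9
    elapsed = min(days_until_deadline, DECAY_HORIZON_DAYS) - 30
    fraction = elapsed / (DECAY_HORIZON_DAYS - 30)
    return max(0.2, 0.9 - 0.7 * fraction)
```

This sketch does not handle deadlines that have already passed; a real implementation would likely filter those grants out before scoring.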

Data Pipeline Architecture

The grant data pipeline runs as scheduled Celery tasks that scrape, normalize, classify, and embed grant data from multiple sources.

flowchart LR
    subgraph Sources
        S1["CORDIS"]
        S2["EU Portal"]
        S3["Cohesion Data"]
        S4["Innovate UK"]
    end

    subgraph Pipeline
        F["Fetch<br/>(async HTTP)"]
        N["Normalize<br/>(common schema)"]
        D["Deduplicate<br/>(content hash)"]
        CC["Carbon Classify<br/>(category tagging)"]
        E["Embed<br/>(all-mpnet-base-v2)"]
        I["Index<br/>(Meilisearch)"]
    end

    subgraph Storage
        PG["PostgreSQL"]
        MS["Meilisearch"]
    end

    S1 --> F
    S2 --> F
    S3 --> F
    S4 --> F
    F --> N --> D --> CC --> E --> PG
    E --> I --> MS

Pipeline Stages:

Fetch: Async HTTP clients scrape each data source, applying per-source rate limits and retrying failed requests with exponential backoff.
Normalize: Source-specific data is mapped to the common Grant schema.
Deduplicate: Content hashes prevent duplicate entries across sync runs.
Carbon Classify: Grants are tagged with carbon categories, EU Taxonomy alignment, and sustainability flags.
Embed: Title and description text are converted to 768-dimensional vectors using all-mpnet-base-v2.
Index: Grant documents are indexed in Meilisearch for full-text search.
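
The Deduplicate stage can be illustrated with a short sketch using SHA-256 from the standard library. Which fields feed the hash is not stated in the source; this example assumes the normalized title and description, and both function names are hypothetical.

```python
import hashlib
import json
from typing import Iterable, Iterator, Set


def content_hash(grant: dict) -> str:
    """Stable SHA-256 digest over the fields assumed to define identity."""
    canonical = json.dumps(
        {
            "title": grant["title"].strip().lower(),
            "description": grant["description"].strip().lower(),
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(grants: Iterable[dict], seen_hashes: Set[str]) -> Iterator[dict]:
    """Yield only grants whose hash has not been seen in this or earlier runs."""
    for grant in grants:
        digest = content_hash(grant)
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        yield grant
```

Passing in `seen_hashes` loaded from storage lets the same check cover duplicates across sync runs as well as within a single batch.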

Performance Targets

Metric	Target	Implementation
API response time	< 200ms	Async SQLAlchemy, connection pooling
Search latency	< 100ms for 100k documents	Meilisearch with PostgreSQL fallback
Matching calculation	< 500ms per company	Pre-computed embeddings, vectorized scoring
LLM application generation	< 3 seconds	Claude Sonnet 4 with streaming support

Security Architecture

graph TB
    subgraph Public Subnet
        ALB["Application<br/>Load Balancer"]
        CF["CloudFront<br/>CDN"]
    end

    subgraph Private Subnet
        ECS["ECS Fargate<br/>(API + Workers)"]
    end

    subgraph Data Subnet
        RDS["RDS PostgreSQL<br/>(encrypted)"]
        EC["ElastiCache<br/>(Valkey)"]
    end

    subgraph Security
        KMS["AWS KMS<br/>Encryption Keys"]
        SM["Secrets Manager<br/>Credentials"]
        WAF["WAF<br/>Firewall"]
    end

    Internet --> WAF --> ALB --> ECS
    Internet --> CF
    ECS --> RDS
    ECS --> EC
    RDS --> KMS
    SM --> ECS

Key security measures:

Encryption at rest: AES-256 via AWS KMS
Encryption in transit: TLS 1.2+
Multi-tenant isolation: PostgreSQL Row-Level Security
Secret management: AWS Secrets Manager (no credentials in code)
Pre-commit hooks: detect-secrets scanning for accidental credential exposure
JWT authentication: Access tokens (30 min) + refresh tokens (7 days) with blacklisting