
System Architecture

This page describes the high-level architecture of Carbon Connect, the data flow between components, and the design of the carbon-enhanced matching algorithm.


System Overview

Carbon Connect follows a layered architecture that separates the presentation layer (Next.js), the API layer (FastAPI), the data layer (PostgreSQL, Meilisearch, Valkey, and S3), and the background processing layer (Celery workers and Celery Beat).

graph TB
    subgraph Client Layer
        Browser["Browser"]
    end

    subgraph Presentation Layer
        NextJS["Next.js 14<br/>App Router<br/>:3000"]
    end

    subgraph API Layer
        FastAPI["FastAPI<br/>REST API<br/>:8000"]
        Auth["JWT Auth<br/>Middleware"]
        Tenant["Tenant<br/>Middleware"]
    end

    subgraph Data Layer
        PG["PostgreSQL 16<br/>+ pgvector<br/>:5433"]
        MS["Meilisearch 1.6<br/>Full-text Search<br/>:7700"]
        VK["Valkey 8<br/>Cache / Broker<br/>:6379"]
        S3["AWS S3<br/>Document Storage"]
    end

    subgraph Background Processing
        CeleryW["Celery Workers<br/>(4 queues)"]
        CeleryB["Celery Beat<br/>Scheduler"]
    end

    subgraph External APIs
        Claude["Anthropic Claude<br/>LLM API"]
        Climatiq["Climatiq<br/>Carbon API"]
        CORDIS["CORDIS<br/>EU Research"]
        EUPortal["EU Funding<br/>Portal"]
        Cohesion["Cohesion<br/>Open Data"]
        InnovateUK["Innovate UK<br/>Gateway"]
    end

    Browser --> NextJS
    NextJS --> FastAPI
    FastAPI --> Auth
    Auth --> Tenant
    Tenant --> PG
    FastAPI --> MS
    FastAPI --> VK
    FastAPI --> S3
    FastAPI --> Claude
    FastAPI --> Climatiq

    CeleryW --> PG
    CeleryW --> VK
    CeleryW --> MS
    CeleryW --> CORDIS
    CeleryW --> EUPortal
    CeleryW --> Cohesion
    CeleryW --> InnovateUK
    CeleryB --> VK

Component Responsibilities

Presentation Layer

| Component | Technology | Responsibility |
| --- | --- | --- |
| Next.js App | Next.js 14, React 18, TypeScript | Server-side rendering, client routing, UI components |
| UI Library | Tailwind CSS, shadcn/ui | Design system, accessible components |
| State Management | React Query (TanStack Query) | Server state caching, optimistic updates |

API Layer

| Component | Technology | Responsibility |
| --- | --- | --- |
| FastAPI | Python 3.11, Pydantic | REST endpoints, request validation, OpenAPI docs |
| JWT Authentication | PyJWT, passlib, bcrypt | Token-based auth with access/refresh tokens |
| Tenant Middleware | Custom FastAPI deps | Multi-tenant context injection, cross-tenant prevention |
| Rate Limiter | Per-tenant throttling | Request rate limiting |

Data Layer

| Component | Technology | Responsibility |
| --- | --- | --- |
| PostgreSQL | PostgreSQL 16 + pgvector | Primary data store, vector similarity (HNSW indexes), RLS |
| Meilisearch | Meilisearch 1.6 | Sub-100ms full-text search with facets and filters |
| Valkey | Valkey 8 (Redis-compatible) | Session cache, Celery broker/backend, rate limit counters |
| AWS S3 | S3 + pre-signed URLs | Document storage with tenant-prefixed paths |
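
As an illustration of tenant-prefixed paths, the sketch below builds an object key under a per-tenant prefix and checks ownership before a URL would be signed. The `tenants/<tenant_id>/documents/` layout and both function names are assumptions made for this example; the source only states that paths are tenant-prefixed.

```python
from pathlib import PurePosixPath


def tenant_object_key(tenant_id: str, filename: str) -> str:
    """Build an object key namespaced under the tenant's prefix.

    The key layout is hypothetical; only the tenant prefix idea comes
    from the architecture description.
    """
    if not tenant_id or "/" in tenant_id:
        raise ValueError("tenant_id must be a non-empty string without '/'")
    # Keep only the final path component so "../" cannot escape the prefix.
    safe_name = PurePosixPath(filename).name
    if safe_name in ("", ".", ".."):
        raise ValueError("filename must contain a file name")
    return f"tenants/{tenant_id}/documents/{safe_name}"


def key_belongs_to_tenant(key: str, tenant_id: str) -> bool:
    """Check a key against the tenant prefix before signing a URL for it."""
    return key.startswith(f"tenants/{tenant_id}/")
```

The trailing slash in the prefix check matters: without it, tenant `t1` would also match keys that belong to tenant `t10`.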

Background Processing

| Component | Technology | Responsibility |
| --- | --- | --- |
| Celery Workers | Celery 5.3, 4 queues | Grant scraping, embedding generation, email, notifications |
| Celery Beat | Periodic scheduler | Scheduled grant syncs, deadline reminders |

External APIs

| Service | Purpose | Rate Limit |
| --- | --- | --- |
| Anthropic Claude | Application generation, content analysis | Per-key |
| Climatiq | GHG Protocol emission factor calculations | Per-key |
| CORDIS | EU research funding data | 1 req/sec |
| EU Funding Portal | Real-time EU grant opportunities | 2 req/sec |
| Cohesion Open Data | ERDF/ESF structural fund programs | 2 req/sec |
| Innovate UK | UK research funding (UKRI) | 2 req/sec |
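
One simple way a scraper could honor the per-source limits is to enforce a minimum gap between requests. The sketch below is an assumption about how this might be done, not the project's actual limiter; the dictionary keys and the `MinIntervalThrottle` class are hypothetical names, and only the rates are taken from the table.

```python
import time
from typing import Callable

# Requests per second per source (rates from the table; keys are hypothetical).
SOURCE_RATE_LIMITS = {
    "cordis": 1.0,
    "eu_portal": 2.0,
    "cohesion": 2.0,
    "innovate_uk": 2.0,
}


class MinIntervalThrottle:
    """Enforce a minimum gap of 1 / requests_per_second between calls."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot relative to whichever is later.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

A scraper would call `throttle.wait()` before each request. The clock and sleep functions are injectable so the behavior can be tested without real delays.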

Request Flow

Authenticated API Request

sequenceDiagram
    participant C as Client
    participant F as FastAPI
    participant A as Auth Middleware
    participant T as Tenant Middleware
    participant DB as PostgreSQL

    C->>F: GET /api/v1/companies (Bearer token)
    F->>A: Validate JWT token
    A->>A: Check token blacklist
    A->>DB: Fetch user by email
    A->>A: Verify user is active
    A->>DB: Fetch tenant by tenant_id
    A->>A: Verify tenant is active
    A-->>F: Return User object
    F->>T: Get tenant context
    T-->>F: Return Tenant object
    F->>DB: SELECT * FROM companies WHERE tenant_id = ?
    DB-->>F: Company records
    F-->>C: 200 OK (paginated response)
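
The order of checks in the diagram can be sketched in plain Python. This is a simplified illustration that uses in-memory dictionaries in place of PostgreSQL and assumes the JWT signature has already been verified; the `jti` and `sub` claim names, the dataclasses, and `AuthError` are assumptions made for the example.

```python
from dataclasses import dataclass


class AuthError(Exception):
    """Raised when any check in the chain fails."""


@dataclass
class Tenant:
    id: str
    is_active: bool


@dataclass
class User:
    email: str
    tenant_id: str
    is_active: bool


def authenticate(
    claims: dict,
    blacklist: set[str],
    users: dict[str, User],
    tenants: dict[str, Tenant],
) -> tuple[User, Tenant]:
    """Run the checks in the order shown in the sequence diagram.

    `claims` stands in for an already signature-verified JWT payload.
    """
    if claims["jti"] in blacklist:
        raise AuthError("token has been revoked")
    user = users.get(claims["sub"])  # fetch user by email
    if user is None or not user.is_active:
        raise AuthError("user not found or inactive")
    tenant = tenants.get(user.tenant_id)  # fetch tenant by tenant_id
    if tenant is None or not tenant.is_active:
        raise AuthError("tenant not found or inactive")
    return user, tenant
```

Checking the blacklist first avoids any database round trip for revoked tokens.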

Grant Search Flow

sequenceDiagram
    participant C as Client
    participant F as FastAPI
    participant MS as Meilisearch
    participant DB as PostgreSQL

    C->>F: GET /api/v1/grants/search?q=solar+energy
    F->>MS: Search "solar energy" (with filters)
    alt Meilisearch Available
        MS-->>F: Matching grant IDs + highlights
        F->>DB: Fetch full grant records by IDs
        DB-->>F: Complete grant data
    else Meilisearch Unavailable
        F->>DB: ILIKE search fallback
        DB-->>F: Matching grants
    end
    F-->>C: 200 OK (search results with pagination)
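
The fallback branch in this flow might look like the following sketch. The three callables are hypothetical stand-ins for the Meilisearch client and the database queries, and treating unavailability as a `ConnectionError` is an assumption; a real client may raise its own exception types.

```python
from typing import Callable, Sequence


def search_grants(
    query: str,
    search_index: Callable[[str], Sequence[int]],
    fetch_by_ids: Callable[[Sequence[int]], list],
    ilike_search: Callable[[str], list],
) -> list:
    """Try the search index first; fall back to a database ILIKE query."""
    try:
        grant_ids = search_index(query)
    except ConnectionError:
        # Index unreachable: degrade to the slower database search.
        return ilike_search(query)
    # Index answered: hydrate the ranked IDs into full grant records.
    return fetch_by_ids(grant_ids)
```

Only the index call sits inside the `try`, so a database error while fetching records is not mistaken for a search outage.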

Grant Matching Flow

The matching pipeline processes company profiles against available grants through a multi-stage scoring algorithm.

flowchart TD
    A["Company Profile"] --> B{"Country Match?"}
    B -->|No| Z["Score: 0<br/>(Disqualified)"]
    B -->|Yes| C{"Size Match?"}
    C -->|No| Z
    C -->|Yes| D["Rule-Based Score<br/>(30% weight)"]
    C -->|Yes| E["Semantic Score<br/>(25% weight)"]
    C -->|Yes| F["Carbon Score<br/>(25% weight)"]
    C -->|Yes| G["Collaborative Score<br/>(10% weight)"]
    C -->|Yes| H["Recency Score<br/>(10% weight)"]
    D --> I["Weighted Sum"]
    E --> I
    F --> I
    G --> I
    H --> I
    I --> J{"Carbon Focused<br/>Grant?"}
    J -->|Yes| K["Apply 1.2x Bonus<br/>(capped at 1.0)"]
    J -->|No| L["Final Score"]
    K --> L

Matching Algorithm

The hybrid matching system uses five weighted scoring components optimized for carbon funding.

Score Component Weights

| Component | Weight | Description |
| --- | --- | --- |
| Rule-Based Criteria | 30% | Country, NACE codes, company size, legal form, eligibility criteria |
| Semantic Similarity | 25% | Cosine similarity between company and grant description embeddings (768-dim, all-mpnet-base-v2) |
| Carbon Alignment | 25% | Carbon category overlap, certification matching, EU Taxonomy objectives, scope compatibility |
| Collaborative Filtering | 10% | Peer interaction signals from similar companies (saved, viewed, dismissed patterns) |
| Recency Bonus | 10% | Deadline urgency scoring based on days until deadline |
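
The weights above combine into a single score as a weighted sum. The sketch below assumes each component has already been normalized to the range 0 to 1; the dictionary keys and the function name are hypothetical.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "rule_based": 0.30,
    "semantic": 0.25,
    "carbon": 0.25,
    "collaborative": 0.10,
    "recency": 0.10,
}


def weighted_score(components: dict[str, float]) -> float:
    """Combine per-component scores in [0, 1] into one score in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, component scores of 1.0 (rule-based), 0.8 (semantic), 0.6 (carbon), 0.5 (collaborative), and 0.9 (recency) give 0.30 + 0.20 + 0.15 + 0.05 + 0.09 = 0.79.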

Disqualifying Conditions

Either of the following conditions immediately yields a score of 0, regardless of the other component scores:

  1. Country mismatch: The company's country is not in the grant's eligible countries list.
  2. Size mismatch: The grant explicitly restricts company sizes and the company does not match.
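
These two hard filters could be expressed as a single predicate that runs before any scoring. In this sketch, an empty or missing `eligible_sizes` is taken to mean the grant does not restrict size; that representation, like the function and parameter names, is an assumption.

```python
from typing import Optional


def is_disqualified(
    company_country: str,
    company_size: str,
    eligible_countries: set[str],
    eligible_sizes: Optional[set[str]],
) -> bool:
    """Return True when a hard filter fails and the score must be 0."""
    if company_country not in eligible_countries:
        return True
    # Size only disqualifies when the grant explicitly restricts sizes.
    if eligible_sizes and company_size not in eligible_sizes:
        return True
    return False
```

Running this check first lets the pipeline skip the more expensive component scores for grants that can never match.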

Carbon Scoring Components

The 25% carbon alignment score is composed of:

  • Carbon category overlap: Intersection between grant categories and company focus areas (e.g., energy_efficiency, renewable_energy, clean_technology)
  • Certification matching: Overlap between company certifications (ISO 14001, SBTi, CDP, B Corp) and grant requirements
  • EU Taxonomy objective alignment: Matching taxonomy objectives (climate_mitigation, climate_adaptation, etc.)
  • Scope compatibility: Whether the grant's eligible emission scopes match the company's reported scopes
  • Green Deal and Fit-for-55 alignment: Boolean alignment checks
  • CSRD compliance support: Whether the grant supports CSRD reporting
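
The overlap-based items in this list (categories and certifications) can be illustrated with set intersection. The source does not say how an overlap is normalized or how the sub-scores are combined, so the ratio below, the fraction of the grant's values that the company also has, is only one plausible choice, and `overlap_ratio` is a hypothetical helper.

```python
def overlap_ratio(grant_values: set[str], company_values: set[str]) -> float:
    """Fraction of the grant's values that the company also has (0.0 to 1.0)."""
    if not grant_values:
        # Nothing to match against; treated as no overlap in this sketch.
        return 0.0
    return len(grant_values & company_values) / len(grant_values)


# Example inputs using category and certification names from the list above.
category_score = overlap_ratio(
    {"energy_efficiency", "renewable_energy"},
    {"renewable_energy", "clean_technology"},
)
certification_score = overlap_ratio({"ISO 14001"}, {"ISO 14001", "SBTi"})
```

Here the company covers one of the grant's two categories and the grant's only required certification.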

Special Rules

  • Carbon-focused bonus: Grants marked as is_carbon_focused = true receive a 1.2x multiplier on the total score, capped at 1.0.
  • NACE code partial matching: Same two-digit sector prefix gets 50% of full match score.
  • Recency scoring: Deadline within 14 days scores 1.0, within 30 days scores 0.9, decaying to 0.2 for distant deadlines.
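
The first two rules can be sketched as small helper functions. The function names are hypothetical, and the NACE helper assumes codes are given in numeric form such as `35.11` and that a full match is worth 1.0.

```python
def apply_carbon_bonus(score: float, is_carbon_focused: bool) -> float:
    """Multiply by 1.2 for carbon-focused grants, never exceeding 1.0."""
    return min(1.0, score * 1.2) if is_carbon_focused else score


def nace_match(company_code: str, grant_code: str) -> float:
    """1.0 for an exact match, 0.5 for the same two-digit prefix, else 0.0."""
    if company_code == grant_code:
        return 1.0
    if company_code[:2] == grant_code[:2]:
        return 0.5
    return 0.0
```

Because of the cap, the bonus only changes the result for weighted scores below roughly 0.83; anything higher is clamped to 1.0.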

Data Pipeline Architecture

The grant data pipeline runs as scheduled Celery tasks that scrape, normalize, classify, and embed grant data from multiple sources.

flowchart LR
    subgraph Sources
        S1["CORDIS"]
        S2["EU Portal"]
        S3["Cohesion Data"]
        S4["Innovate UK"]
    end

    subgraph Pipeline
        F["Fetch<br/>(async HTTP)"]
        N["Normalize<br/>(common schema)"]
        D["Deduplicate<br/>(content hash)"]
        CC["Carbon Classify<br/>(category tagging)"]
        E["Embed<br/>(all-mpnet-base-v2)"]
        I["Index<br/>(Meilisearch)"]
    end

    subgraph Storage
        PG["PostgreSQL"]
        MS["Meilisearch"]
    end

    S1 --> F
    S2 --> F
    S3 --> F
    S4 --> F
    F --> N --> D --> CC --> E --> PG
    E --> I --> MS

Pipeline Stages:

  1. Fetch: Async HTTP clients with rate limiting and exponential backoff retry logic scrape each data source.
  2. Normalize: Source-specific data is mapped to the common Grant schema.
  3. Deduplicate: Content hashes prevent duplicate entries across sync runs.
  4. Carbon Classify: Grants are tagged with carbon categories, EU Taxonomy alignment, and sustainability flags.
  5. Embed: Title and description text are converted to 768-dimensional vectors using all-mpnet-base-v2.
  6. Index: Grant documents are indexed in Meilisearch for full-text search.
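
Two of these stages lend themselves to a short sketch: content-hash deduplication and the exponential backoff schedule used when fetching. The fields included in the hash, the default base and cap values, and all function names are assumptions for illustration; in practice the set of seen hashes would be persisted between sync runs, for example in a database column.

```python
import hashlib
import json


def content_hash(grant: dict) -> str:
    """Stable SHA-256 digest over the fields that identify a grant."""
    # Field choice is an assumption; sort_keys keeps the digest order-independent.
    identity = {key: grant.get(key) for key in ("source", "title", "description")}
    payload = json.dumps(identity, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(grants: list[dict], seen_hashes: set[str]) -> list[dict]:
    """Return grants whose hash is new, recording each hash in seen_hashes."""
    fresh = []
    for grant in grants:
        digest = content_hash(grant)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(grant)
    return fresh


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff schedule: base * 2**attempt, limited to cap seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

Hashing a normalized subset of fields, instead of the raw source payload, keeps the digest stable when a source changes incidental metadata.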

Performance Targets

| Metric | Target | Implementation |
| --- | --- | --- |
| API response time | < 200ms | Async SQLAlchemy, connection pooling |
| Search latency | < 100ms for 100k documents | Meilisearch with PostgreSQL fallback |
| Matching calculation | < 500ms per company | Pre-computed embeddings, vectorized scoring |
| LLM application generation | < 3 seconds | Claude Sonnet 4 with streaming support |

Security Architecture

graph TB
    subgraph Public Subnet
        ALB["Application<br/>Load Balancer"]
        CF["CloudFront<br/>CDN"]
    end

    subgraph Private Subnet
        ECS["ECS Fargate<br/>(API + Workers)"]
    end

    subgraph Data Subnet
        RDS["RDS PostgreSQL<br/>(encrypted)"]
        EC["ElastiCache<br/>(Valkey)"]
    end

    subgraph Security
        KMS["AWS KMS<br/>Encryption Keys"]
        SM["Secrets Manager<br/>Credentials"]
        WAF["WAF<br/>Firewall"]
    end

    Internet --> WAF --> ALB --> ECS
    Internet --> CF
    ECS --> RDS
    ECS --> EC
    RDS --> KMS
    SM --> ECS

Key security measures:

  • Encryption at rest: AES-256 via AWS KMS
  • Encryption in transit: TLS 1.2+
  • Multi-tenant isolation: PostgreSQL Row-Level Security
  • Secret management: AWS Secrets Manager (no credentials in code)
  • Pre-commit hooks: detect-secrets scanning for accidental credential exposure
  • JWT authentication: Access tokens (30 min) + refresh tokens (7 days) with blacklisting
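
The token lifetimes and blacklisting in the last item can be sketched as follows. Signing and encoding are deliberately left out; the claim names (`sub`, `type`, `jti`, `iat`, `exp`) and both function names are assumptions for this example, not the project's confirmed payload format.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from uuid import uuid4

# Lifetimes from the list above.
ACCESS_TOKEN_TTL = timedelta(minutes=30)
REFRESH_TOKEN_TTL = timedelta(days=7)
TOKEN_TTLS = {"access": ACCESS_TOKEN_TTL, "refresh": REFRESH_TOKEN_TTL}


def build_claims(email: str, token_type: str, now: Optional[datetime] = None) -> dict:
    """Build the payload for an access or refresh token (signing not shown)."""
    if token_type not in TOKEN_TTLS:
        raise ValueError("token_type must be 'access' or 'refresh'")
    issued_at = now or datetime.now(timezone.utc)
    return {
        "sub": email,
        "type": token_type,
        "jti": str(uuid4()),  # unique ID so a single token can be blacklisted
        "iat": issued_at,
        "exp": issued_at + TOKEN_TTLS[token_type],
    }


def is_token_valid(claims: dict, blacklist: set[str], now: datetime) -> bool:
    """A token is usable only if it is unexpired and not blacklisted."""
    return now < claims["exp"] and claims["jti"] not in blacklist
```

Giving each token its own `jti` lets logout revoke one token without invalidating the user's other sessions.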