Matching Engine

The matching engine is the core intelligence layer of Carbon Connect. It implements a hybrid scoring algorithm that evaluates how well a grant opportunity aligns with a company profile across five weighted dimensions.

Source: backend/app/services/matching_engine.py


Architecture Overview

flowchart LR
    A[Company Profile] --> B[Matching Engine]
    C[Grant Database] --> B
    B --> D{Hybrid Scoring}
    D --> E[Rule-Based<br/>30%]
    D --> F[Semantic<br/>25%]
    D --> G[Carbon<br/>25%]
    D --> H[Collaborative<br/>10%]
    D --> I[Recency<br/>10%]
    E --> J[Final Score]
    F --> J
    G --> J
    H --> J
    I --> J
    J --> K[Ranked Matches]

Weight Breakdown

The hybrid scoring system allocates weights as follows:

| Component | Weight | Constant | Description |
| --- | --- | --- | --- |
| Semantic Similarity | 25% | SEMANTIC_WEIGHT = 0.25 | Embedding cosine similarity between company and grant descriptions |
| Rule-Based Criteria | 30% | RULE_WEIGHT = 0.30 | Hard criteria: country, NACE codes, company size |
| Carbon Alignment | 25% | CARBON_WEIGHT = 0.25 | Carbon categories, certifications, taxonomy, scopes |
| Collaborative Filtering | 10% | COLLABORATIVE_WEIGHT = 0.10 | Peer interaction signals from similar companies |
| Recency Bonus | 10% | RECENCY_WEIGHT = 0.10 | Deadline urgency scoring |

The final score is calculated as:

final_score = (semantic * 0.25) + (rule * 0.30) + (carbon * 0.25) + (collaborative * 0.10) + (recency * 0.10)

Carbon-focused grants receive a 1.2x multiplier (CARBON_FOCUS_BONUS = 1.2), capped at 1.0.
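The weighted sum and bonus cap described above can be sketched as follows (the helper name combine_scores is illustrative, not taken from the source):

```python
def combine_scores(
    semantic: float,
    rule: float,
    carbon: float,
    collaborative: float,
    recency: float,
    is_carbon_focused: bool = False,
) -> float:
    """Weighted sum of the five components, with the carbon focus bonus."""
    score = (
        semantic * 0.25        # SEMANTIC_WEIGHT
        + rule * 0.30          # RULE_WEIGHT
        + carbon * 0.25        # CARBON_WEIGHT
        + collaborative * 0.10  # COLLABORATIVE_WEIGHT
        + recency * 0.10       # RECENCY_WEIGHT
    )
    if is_carbon_focused:
        score *= 1.2  # CARBON_FOCUS_BONUS
    return min(score, 1.0)  # final score is capped at 1.0
```

Note that the cap means two strongly-matching carbon grants can both saturate at 1.0 even if their raw bonused scores differ.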


Scoring Components

Rule-Based Scoring (30%)

The rule-based component evaluates hard eligibility criteria. It is subdivided internally:

| Sub-component | Internal Weight | Behavior |
| --- | --- | --- |
| Country match | 40% | Disqualifying -- returns 0.0 if no match |
| NACE code match | 35% | Exact match = full score; same section (first 2 digits) = 50% |
| Company size match | 25% | Disqualifying -- returns 0.0 if grant restricts sizes and company does not qualify |

Disqualifying Criteria

Hard Disqualifiers

Two criteria cause the entire match to score 0.0, regardless of other components:

  1. Country mismatch -- If the company's country is not in the grant's eligible countries list, the match is disqualified.
  2. Company size mismatch -- If the grant explicitly restricts eligible company sizes and the company's size category does not match, the match is disqualified.

Company Size Classification

The engine maps employee counts to EU SME categories:

def _get_company_size(employee_count: int | None) -> str | None:
    if employee_count is None:
        return None
    if employee_count < 10:
        return "micro"
    if employee_count < 50:
        return "small"
    if employee_count < 250:
        return "medium"
    return "large"

NACE Code Partial Matching

NACE codes are matched at two levels of granularity:

  • Exact match (e.g., C25.1 matches C25.1): Full NACE weight (35%)
  • Section match (e.g., C25.1 and C28.9 share the two-character prefix C2): 50% of NACE weight (17.5%)

# Sections are the first two characters of each code (e.g. "C25.1" -> "C2")
company_sections = {code[:2] for code in company_naces}
grant_sections = {code[:2] for code in grant_naces}

# Full match on exact NACE codes
if company_naces & grant_naces:
    score += nace_weight

# Partial match on first 2 digits (same sector)
elif company_sections & grant_sections:
    score += nace_weight * 0.5

Semantic Similarity Scoring (25%)

The semantic component uses 768-dimensional embeddings from the all-mpnet-base-v2 sentence transformer model to compute cosine similarity between company and grant descriptions.

def calculate_semantic_score(company: Company, grant: Grant) -> float:
    company_embedding = company.description_embedding
    grant_embedding = grant.description_embedding

    if company_embedding is not None and grant_embedding is not None:
        service = get_embedding_service()
        return service.compute_similarity(company_embedding, grant_embedding)

    # Fallback heuristic when embeddings are unavailable
    return _heuristic_text_similarity(company, grant)

When embeddings are not available (e.g., newly created companies or grants), a keyword-based heuristic fallback provides a baseline score.
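For reference, cosine similarity over two embedding vectors reduces to a dot product over the product of norms; a dependency-free sketch (the real service uses the embedding service's compute_similarity):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. 768-dim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embedding: no meaningful similarity
    return dot / (norm_a * norm_b)
```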


Carbon Alignment Scoring (25%)

The carbon component evaluates alignment across multiple dimensions:

| Sub-component | Description |
| --- | --- |
| Category overlap | Intersection of company's carbon focus areas and grant's carbon_categories |
| Certification matching | Matching certifications: ISO 14001, SBTi, CDP, B Corp |
| EU Taxonomy alignment | Overlap of taxonomy objectives (climate mitigation, adaptation, etc.) |
| Scope compatibility | Grant's eligible_scopes vs. company's reported scopes (1, 2, 3) |
| Reduction target alignment | Whether company's reduction targets meet grant's min_emission_reduction_percent |
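The category-overlap sub-component can be sketched as simple set intersection; normalising by the grant's category count is an assumption here, not confirmed by the source:

```python
def category_overlap_score(company_focus: set[str], grant_categories: set[str]) -> float:
    """Fraction of the grant's carbon categories covered by the company's focus areas."""
    if not grant_categories:
        return 0.0  # grant declares no carbon categories
    return len(company_focus & grant_categories) / len(grant_categories)
```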

Carbon Categories

The engine recognizes 14 carbon categories for matching:

CARBON_CATEGORIES = [
    "energy_efficiency",
    "renewable_energy",
    "clean_technology",
    "circular_economy",
    "sustainable_transport",
    "green_buildings",
    "carbon_capture",
    "hydrogen",
    "industrial_decarbonization",
    "sustainable_agriculture",
    "biodiversity",
    "water_management",
    "waste_reduction",
    "climate_adaptation",
]

Carbon Focus Bonus

CARBON_FOCUS_BONUS = 1.2

Grants where is_carbon_focused = True receive a 1.2x multiplier on the final score, capped at 1.0. This ensures carbon-aligned grants are surfaced preferentially.


Collaborative Filtering (10%)

The collaborative component analyzes interaction patterns from similar companies to provide peer-based signals:

  • Companies with overlapping NACE codes and country
  • Interaction signals: saved, viewed, applied
  • Grants that peers have saved or applied to receive a boost

This score is derived from the Match table's interaction data (interaction_type and counts).
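One way to turn peer interaction counts into a bounded score is to weight each signal type and normalise; the weights and cap below are purely illustrative assumptions, not values from the source:

```python
# Illustrative weights: applying is a stronger signal than saving or viewing.
INTERACTION_WEIGHTS = {"viewed": 0.2, "saved": 0.6, "applied": 1.0}

def collaborative_score(peer_interactions: list[str], max_signal: float = 5.0) -> float:
    """Normalise weighted peer interaction signals into [0, 1]."""
    total = sum(INTERACTION_WEIGHTS.get(kind, 0.0) for kind in peer_interactions)
    return min(total / max_signal, 1.0)
```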


Recency Scoring (10%)

Deadline urgency is scored on a decay curve:

| Days to Deadline | Score | Label |
| --- | --- | --- |
| Less than 14 days | 1.0 | Urgent |
| 14--30 days | 0.9 | Soon |
| 30--60 days | 0.7 | Moderate |
| 60--90 days | 0.5 | Normal |
| 90--180 days | 0.3 | Distant |
| More than 180 days | 0.2 | Far |
| No deadline set | 0.5 | Default |

Match Result Data Class

Every match calculation produces a MatchResult with full score breakdown:

@dataclass
class MatchResult:
    grant: Grant
    score: float           # Final weighted score (0.0 to 1.0)
    rule_score: float      # Rule-based component
    semantic_score: float  # Embedding similarity
    carbon_score: float    # Carbon alignment
    collaborative_score: float  # Peer signals
    recency_score: float   # Deadline urgency
    match_reasons: list[str]  # Human-readable reasons

Match Persistence

Matches are persisted to the database using an upsert strategy:

  • If a match already exists for a (company_id, grant_id) pair, the scores are updated
  • If no match exists, a new record is created
  • Match reasons are stored as a list of human-readable strings
  • Individual component scores are stored for transparency and debugging
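The upsert strategy can be illustrated in miniature with an in-memory store keyed by the (company_id, grant_id) pair; the real implementation writes to PostgreSQL (e.g. via an INSERT ... ON CONFLICT-style upsert), and all names below are illustrative:

```python
def upsert_match(
    matches: dict[tuple[str, str], dict],
    company_id: str,
    grant_id: str,
    scores: dict,
    reasons: list[str],
) -> dict:
    """Insert or update the match record for a (company_id, grant_id) pair."""
    key = (company_id, grant_id)
    if key in matches:
        matches[key].update(scores)          # existing match: refresh scores
        matches[key]["match_reasons"] = reasons
    else:
        matches[key] = {**scores, "match_reasons": reasons}  # new match record
    return matches[key]
```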

Match Refresh Flow

sequenceDiagram
    participant API as Matches API
    participant Engine as Matching Engine
    participant DB as PostgreSQL

    API->>Engine: refresh_matches(company_id)
    Engine->>DB: Fetch company profile
    Engine->>DB: Fetch active grants
    loop For each grant
        Engine->>Engine: calculate_rule_score()
        Engine->>Engine: calculate_semantic_score()
        Engine->>Engine: calculate_carbon_score()
        Engine->>Engine: calculate_collaborative_score()
        Engine->>Engine: calculate_recency_score()
        Engine->>Engine: Apply carbon focus bonus
    end
    Engine->>DB: Upsert match results
    Engine-->>API: Return ranked matches

API Endpoints

The matching engine is exposed through the following endpoints:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/matches/{company_id} | Real-time match calculation |
| POST | /api/v1/matches/{company_id}/refresh | Refresh and persist matches |
| GET | /api/v1/matches/{company_id}/saved | Get saved matches |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/view | Mark as viewed |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/save | Toggle saved |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/dismiss | Toggle dismissed |
| POST | /api/v1/matches/calculate | Trigger match calculation |
| GET | /api/v1/matches/stats | Match statistics |

Performance Considerations

  • Target: Match calculation completes within 500ms per company
  • Embedding comparison is O(n) where n = number of active grants
  • Database queries use composite indexes on (tenant_id, company_id) and (tenant_id, grant_id)
  • Batch processing is used for refresh operations to avoid N+1 queries
  • Carbon-focused index on is_carbon_focused accelerates filtered queries

Testing

The matching engine has 32 dedicated tests covering:

  • Unit tests for each scoring component
  • Integration tests with real database records
  • API endpoint tests
  • Performance tests ensuring sub-500ms calculation time
  • Edge cases: missing embeddings, no carbon profile, empty grant lists