Matching Engine

The matching engine is the core intelligence layer of Carbon Connect. It implements a hybrid scoring algorithm that evaluates how well a grant opportunity aligns with a company profile across five weighted dimensions.

Source: backend/app/services/matching_engine.py


Architecture Overview

flowchart LR
    A[Company Profile] --> B[Matching Engine]
    C[Grant Database] --> B
    B --> D{Hybrid Scoring}
    D --> E[Rule-Based<br/>30%]
    D --> F[Semantic<br/>25%]
    D --> G[Carbon<br/>25%]
    D --> H[Collaborative<br/>10%]
    D --> I[Recency<br/>10%]
    E --> J[Final Score]
    F --> J
    G --> J
    H --> J
    I --> J
    J --> K[Ranked Matches]

Weight Breakdown

The hybrid scoring system allocates weights as follows:

| Component | Weight | Constant | Description |
| --- | --- | --- | --- |
| Semantic Similarity | 25% | SEMANTIC_WEIGHT = 0.25 | Embedding cosine similarity between company and grant descriptions |
| Rule-Based Criteria | 30% | RULE_WEIGHT = 0.30 | Hard criteria: country, NACE codes, company size |
| Carbon Alignment | 25% | CARBON_WEIGHT = 0.25 | Carbon categories, certifications, taxonomy, scopes |
| Collaborative Filtering | 10% | COLLABORATIVE_WEIGHT = 0.10 | Peer interaction signals from similar companies |
| Recency Bonus | 10% | RECENCY_WEIGHT = 0.10 | Deadline urgency scoring |

The final score is calculated as:

final_score = (semantic * 0.25) + (rule * 0.30) + (carbon * 0.25) + (collaborative * 0.10) + (recency * 0.10)

Carbon-focused grants receive a 1.2x multiplier (CARBON_FOCUS_BONUS = 1.2), capped at 1.0.
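The weighted sum and bonus cap described above can be sketched as follows (the helper name combine_scores is illustrative, not taken from the source):

```python
def combine_scores(
    semantic: float,
    rule: float,
    carbon: float,
    collaborative: float,
    recency: float,
    is_carbon_focused: bool = False,
) -> float:
    """Weighted sum of the five components, with the carbon focus bonus."""
    score = (
        semantic * 0.25        # SEMANTIC_WEIGHT
        + rule * 0.30          # RULE_WEIGHT
        + carbon * 0.25        # CARBON_WEIGHT
        + collaborative * 0.10  # COLLABORATIVE_WEIGHT
        + recency * 0.10       # RECENCY_WEIGHT
    )
    if is_carbon_focused:
        score *= 1.2  # CARBON_FOCUS_BONUS
    return min(score, 1.0)  # final score is capped at 1.0
```

Note that the cap means two strongly-matching carbon grants can both saturate at 1.0 even if their raw bonused scores differ.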


Scoring Components

Rule-Based Scoring (30%)

The rule-based component evaluates hard eligibility criteria. It is subdivided internally:

| Sub-component | Internal Weight | Behavior |
| --- | --- | --- |
| Country match | 40% | Disqualifying -- returns 0.0 if no match |
| NACE code match | 35% | Exact match = full score; same section (first 2 digits) = 50% |
| Company size match | 25% | Disqualifying -- returns 0.0 if grant restricts sizes and company does not qualify |

Disqualifying Criteria

Hard Disqualifiers

Two criteria cause the entire match to score 0.0, regardless of other components:

  1. Country mismatch -- If the company's country is not in the grant's eligible countries list, the match is disqualified.
  2. Company size mismatch -- If the grant explicitly restricts eligible company sizes and the company's size category does not match, the match is disqualified.

Company Size Classification

The engine maps employee counts to EU SME categories:

def _get_company_size(employee_count: int | None) -> str | None:
    if employee_count is None:
        return None
    if employee_count < 10:
        return "micro"
    if employee_count < 50:
        return "small"
    if employee_count < 250:
        return "medium"
    return "large"

NACE Code Partial Matching

NACE codes are matched at two levels of granularity:

  • Exact match (e.g., C25.1 matches C25.1): Full NACE weight (35%)
  • Section match (e.g., C25.1 and C28.9 share the two-character prefix C2): 50% of NACE weight (17.5%)

# Sections are the first two characters of each code (e.g. "C25.1" -> "C2")
company_sections = {code[:2] for code in company_naces}
grant_sections = {code[:2] for code in grant_naces}

# Full match on exact NACE codes
if company_naces & grant_naces:
    score += nace_weight

# Partial match on first 2 digits (same sector)
elif company_sections & grant_sections:
    score += nace_weight * 0.5

Semantic Similarity Scoring (25%)

The semantic component uses 768-dimensional embeddings from the all-mpnet-base-v2 sentence transformer model to compute cosine similarity between company and grant descriptions.

def calculate_semantic_score(company: Company, grant: Grant) -> float:
    company_embedding = company.description_embedding
    grant_embedding = grant.description_embedding

    if company_embedding is not None and grant_embedding is not None:
        service = get_embedding_service()
        return service.compute_similarity(company_embedding, grant_embedding)

    # Fallback heuristic when embeddings are unavailable
    return _heuristic_text_similarity(company, grant)

When embeddings are not available (e.g., newly created companies or grants), a keyword-based heuristic fallback provides a baseline score.
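For reference, cosine similarity over two embedding vectors reduces to a dot product over the product of norms; a dependency-free sketch (the real service uses the embedding service's compute_similarity):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. 768-dim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate embedding: no meaningful similarity
    return dot / (norm_a * norm_b)
```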


Carbon Alignment Scoring (25%)

The carbon component evaluates alignment across multiple dimensions:

| Sub-component | Description |
| --- | --- |
| Category overlap | Intersection of company's carbon focus areas and grant's carbon_categories |
| Certification matching | Matching certifications: ISO 14001, SBTi, CDP, B Corp |
| EU Taxonomy alignment | Overlap of taxonomy objectives (climate mitigation, adaptation, etc.) |
| Scope compatibility | Grant's eligible_scopes vs. company's reported scopes (1, 2, 3) |
| Reduction target alignment | Whether company's reduction targets meet grant's min_emission_reduction_percent |
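The category-overlap sub-component can be sketched as simple set intersection; normalising by the grant's category count is an assumption here, not confirmed by the source:

```python
def category_overlap_score(company_focus: set[str], grant_categories: set[str]) -> float:
    """Fraction of the grant's carbon categories covered by the company's focus areas."""
    if not grant_categories:
        return 0.0  # grant declares no carbon categories
    return len(company_focus & grant_categories) / len(grant_categories)
```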

Carbon Categories

The engine recognizes 14 carbon categories for matching:

CARBON_CATEGORIES = [
    "energy_efficiency",
    "renewable_energy",
    "clean_technology",
    "circular_economy",
    "sustainable_transport",
    "green_buildings",
    "carbon_capture",
    "hydrogen",
    "industrial_decarbonization",
    "sustainable_agriculture",
    "biodiversity",
    "water_management",
    "waste_reduction",
    "climate_adaptation",
]

Carbon Focus Bonus

CARBON_FOCUS_BONUS = 1.2

Grants where is_carbon_focused = True receive a 1.2x multiplier on the final score, capped at 1.0. This ensures carbon-aligned grants are surfaced preferentially.


Collaborative Filtering (10%)

The collaborative component analyzes interaction patterns from similar companies to provide peer-based signals:

  • Companies with overlapping NACE codes and country
  • Interaction signals: saved, viewed, applied
  • Grants that peers have saved or applied to receive a boost

This score is derived from the Match table's interaction data (interaction_type and counts).
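One way to turn peer interaction counts into a bounded score is to weight each signal type and normalise; the weights and cap below are purely illustrative assumptions, not values from the source:

```python
# Illustrative weights: applying is a stronger signal than saving or viewing.
INTERACTION_WEIGHTS = {"viewed": 0.2, "saved": 0.6, "applied": 1.0}

def collaborative_score(peer_interactions: list[str], max_signal: float = 5.0) -> float:
    """Normalise weighted peer interaction signals into [0, 1]."""
    total = sum(INTERACTION_WEIGHTS.get(kind, 0.0) for kind in peer_interactions)
    return min(total / max_signal, 1.0)
```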


Recency Scoring (10%)

Deadline urgency is scored on a decay curve:

| Days to Deadline | Score | Label |
| --- | --- | --- |
| Less than 14 days | 1.0 | Urgent |
| 14--30 days | 0.9 | Soon |
| 30--60 days | 0.7 | Moderate |
| 60--90 days | 0.5 | Normal |
| 90--180 days | 0.3 | Distant |
| More than 180 days | 0.2 | Far |
| No deadline set | 0.5 | Default |

Match Result Data Class

Every match calculation produces a MatchResult with full score breakdown:

@dataclass
class MatchResult:
    grant: Grant
    score: float           # Final weighted score (0.0 to 1.0)
    rule_score: float      # Rule-based component
    semantic_score: float  # Embedding similarity
    carbon_score: float    # Carbon alignment
    collaborative_score: float  # Peer signals
    recency_score: float   # Deadline urgency
    match_reasons: list[str]  # Human-readable reasons

Match Persistence

Matches are persisted to the database using an upsert strategy:

  • If a match already exists for a (company_id, grant_id) pair, the scores are updated
  • If no match exists, a new record is created
  • Match reasons are stored as a list of human-readable strings
  • Individual component scores are stored for transparency and debugging
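The upsert strategy can be illustrated in miniature with an in-memory store keyed by the (company_id, grant_id) pair; the real implementation writes to PostgreSQL (e.g. via an INSERT ... ON CONFLICT-style upsert), and all names below are illustrative:

```python
def upsert_match(
    matches: dict[tuple[str, str], dict],
    company_id: str,
    grant_id: str,
    scores: dict,
    reasons: list[str],
) -> dict:
    """Insert or update the match record for a (company_id, grant_id) pair."""
    key = (company_id, grant_id)
    if key in matches:
        matches[key].update(scores)          # existing match: refresh scores
        matches[key]["match_reasons"] = reasons
    else:
        matches[key] = {**scores, "match_reasons": reasons}  # new match record
    return matches[key]
```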

Match Refresh Flow

sequenceDiagram
    participant API as Matches API
    participant Engine as Matching Engine
    participant DB as PostgreSQL

    API->>Engine: refresh_matches(company_id)
    Engine->>DB: Fetch company profile
    Engine->>DB: Fetch active grants
    loop For each grant
        Engine->>Engine: calculate_rule_score()
        Engine->>Engine: calculate_semantic_score()
        Engine->>Engine: calculate_carbon_score()
        Engine->>Engine: calculate_collaborative_score()
        Engine->>Engine: calculate_recency_score()
        Engine->>Engine: Apply carbon focus bonus
    end
    Engine->>DB: Upsert match results
    Engine-->>API: Return ranked matches

API Endpoints

The matching engine is exposed through the following endpoints:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/matches/{company_id} | Real-time match calculation |
| POST | /api/v1/matches/{company_id}/refresh | Refresh and persist matches |
| GET | /api/v1/matches/{company_id}/saved | Get saved matches |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/view | Mark as viewed |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/save | Toggle saved |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/dismiss | Toggle dismissed |
| POST | /api/v1/matches/calculate | Trigger match calculation |
| GET | /api/v1/matches/stats | Match statistics |

Performance Considerations

  • Target: Match calculation completes within 500ms per company
  • Embedding comparison is O(n) where n = number of active grants
  • Database queries use composite indexes on (tenant_id, company_id) and (tenant_id, grant_id)
  • Batch processing is used for refresh operations to avoid N+1 queries
  • Carbon-focused index on is_carbon_focused accelerates filtered queries

Testing

The matching engine has 32 dedicated tests covering:

  • Unit tests for each scoring component
  • Integration tests with real database records
  • API endpoint tests
  • Performance tests ensuring sub-500ms calculation time
  • Edge cases: missing embeddings, no carbon profile, empty grant lists