Matching Engine¶
The matching engine is the core intelligence layer of Carbon Connect. It implements a hybrid scoring algorithm that evaluates how well a grant opportunity aligns with a company profile across five weighted dimensions.
Source: backend/app/services/matching_engine.py
Architecture Overview¶
```mermaid
flowchart LR
A[Company Profile] --> B[Matching Engine]
C[Grant Database] --> B
B --> D{Hybrid Scoring}
D --> E[Rule-Based<br/>30%]
D --> F[Semantic<br/>25%]
D --> G[Carbon<br/>25%]
D --> H[Collaborative<br/>10%]
D --> I[Recency<br/>10%]
E --> J[Final Score]
F --> J
G --> J
H --> J
I --> J
J --> K[Ranked Matches]
```

Weight Breakdown¶
The hybrid scoring system allocates weights as follows:
| Component | Weight | Constant | Description |
|---|---|---|---|
| Semantic Similarity | 25% | SEMANTIC_WEIGHT = 0.25 | Embedding cosine similarity between company and grant descriptions |
| Rule-Based Criteria | 30% | RULE_WEIGHT = 0.30 | Hard criteria: country, NACE codes, company size |
| Carbon Alignment | 25% | CARBON_WEIGHT = 0.25 | Carbon categories, certifications, taxonomy, scopes |
| Collaborative Filtering | 10% | COLLABORATIVE_WEIGHT = 0.10 | Peer interaction signals from similar companies |
| Recency Bonus | 10% | RECENCY_WEIGHT = 0.10 | Deadline urgency scoring |
The final score is calculated as:
final_score = (semantic * 0.25) + (rule * 0.30) + (carbon * 0.25) + (collaborative * 0.10) + (recency * 0.10)
Carbon-focused grants receive a 1.2x multiplier (CARBON_FOCUS_BONUS = 1.2), with the boosted score capped at 1.0.
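Expressed as code, the combination looks roughly like the following. This is an illustrative sketch of the weighting described above, not the module's exact implementation; component scores are assumed to be floats in the range 0.0 to 1.0.

```python
# Illustrative sketch of the weighted combination (not the exact source code).
SEMANTIC_WEIGHT = 0.25
RULE_WEIGHT = 0.30
CARBON_WEIGHT = 0.25
COLLABORATIVE_WEIGHT = 0.10
RECENCY_WEIGHT = 0.10
CARBON_FOCUS_BONUS = 1.2


def combine_scores(
    semantic: float,
    rule: float,
    carbon: float,
    collaborative: float,
    recency: float,
    is_carbon_focused: bool,
) -> float:
    """Combine component scores into a final match score in [0.0, 1.0]."""
    score = (
        semantic * SEMANTIC_WEIGHT
        + rule * RULE_WEIGHT
        + carbon * CARBON_WEIGHT
        + collaborative * COLLABORATIVE_WEIGHT
        + recency * RECENCY_WEIGHT
    )
    if is_carbon_focused:
        score *= CARBON_FOCUS_BONUS
    return min(score, 1.0)
```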
Scoring Components¶
Rule-Based Scoring (30%)¶
The rule-based component evaluates hard eligibility criteria. It is subdivided internally:
| Sub-component | Internal Weight | Behavior |
|---|---|---|
| Country match | 40% | Disqualifying -- returns 0.0 if no match |
| NACE code match | 35% | Exact match = full score; same section (first 2 digits) = 50% |
| Company size match | 25% | Disqualifying -- returns 0.0 if grant restricts sizes and company does not qualify |
Disqualifying Criteria¶
Hard Disqualifiers
Two criteria cause the entire match to score 0.0, regardless of other components:
- Country mismatch -- If the company's country is not in the grant's eligible countries list, the match is disqualified.
- Company size mismatch -- If the grant explicitly restricts eligible company sizes and the company's size category does not match, the match is disqualified.
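A minimal sketch of how these sub-weights and disqualifiers could fit together is shown below. It is illustrative only; attribute names such as grant.eligible_countries and grant.eligible_company_sizes are assumptions, and _get_company_size is the helper documented in the next subsection.

```python
# Illustrative sketch only; attribute names are assumptions based on the tables above.
COUNTRY_WEIGHT = 0.40
NACE_WEIGHT = 0.35
SIZE_WEIGHT = 0.25


def rule_score_sketch(company, grant) -> float:
    # Hard disqualifier: company country not in the grant's eligible countries.
    if grant.eligible_countries and company.country not in grant.eligible_countries:
        return 0.0

    # Hard disqualifier: grant restricts sizes and the company does not qualify.
    size = _get_company_size(company.employee_count)
    if grant.eligible_company_sizes and size not in grant.eligible_company_sizes:
        return 0.0

    score = COUNTRY_WEIGHT + SIZE_WEIGHT  # both hard criteria satisfied

    # NACE: exact match earns the full weight, same section earns half.
    company_naces = set(company.nace_codes or [])
    grant_naces = set(grant.nace_codes or [])
    if company_naces & grant_naces:
        score += NACE_WEIGHT
    elif {c[:2] for c in company_naces} & {g[:2] for g in grant_naces}:
        score += NACE_WEIGHT * 0.5
    return score
```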
Company Size Classification¶
The engine maps employee counts to EU SME categories:
```python
def _get_company_size(employee_count: int | None) -> str | None:
    if employee_count is None:
        return None
    if employee_count < 10:
        return "micro"
    if employee_count < 50:
        return "small"
    if employee_count < 250:
        return "medium"
    return "large"
```
NACE Code Partial Matching¶
NACE codes are matched at two levels of granularity:
- Exact match (e.g., `C25.1` matches `C25.1`): full NACE weight (35%)
- Section match (e.g., `C25` matches `C28`, both in section `C2`): 50% of the NACE weight (17.5%)
```python
# Full match
if company_naces & grant_naces:
    score += nace_weight
# Partial match on first 2 digits (same sector)
elif company_sections & grant_sections:
    score += nace_weight * 0.5
```
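The snippet assumes company_sections and grant_sections have been precomputed. A small illustrative example of how they could be derived (the values and variable handling are examples, not the module's code):

```python
# Illustrative derivation of the section sets used above (values are examples).
company_naces = {"C25.1", "C28.9"}
grant_naces = {"C28"}

# The first two characters act as the section prefix, e.g. "C25.1" -> "C2".
company_sections = {code[:2] for code in company_naces}
grant_sections = {code[:2] for code in grant_naces}

# No exact code overlap here, but both sides share section "C2",
# so the partial-match branch awards 50% of the NACE weight.
```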
Semantic Similarity Scoring (25%)¶
The semantic component uses 768-dimensional embeddings from the all-mpnet-base-v2 sentence transformer model to compute cosine similarity between company and grant descriptions.
```python
def calculate_semantic_score(company: Company, grant: Grant) -> float:
    company_embedding = company.description_embedding
    grant_embedding = grant.description_embedding
    if company_embedding is not None and grant_embedding is not None:
        service = get_embedding_service()
        return service.compute_similarity(company_embedding, grant_embedding)
    # Fallback heuristic when embeddings are unavailable
    return _heuristic_text_similarity(company, grant)
```
When embeddings are not available (e.g., newly created companies or grants), a keyword-based heuristic fallback provides a baseline score.
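Under the hood, compute_similarity amounts to cosine similarity over the 768-dimensional vectors. A minimal, self-contained sketch, assuming numpy arrays and assuming the result is clamped to the 0.0 to 1.0 range:

```python
import numpy as np


def cosine_similarity_sketch(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity clamped to [0.0, 1.0]; clamping is an assumption."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    if denom == 0.0:
        return 0.0
    return float(max(0.0, min(1.0, float(np.dot(a, b)) / denom)))
```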
Carbon Alignment Scoring (25%)¶
The carbon component evaluates alignment across multiple dimensions:
| Sub-component | Description |
|---|---|
| Category overlap | Intersection of company's carbon focus areas and grant's carbon_categories |
| Certification matching | Matching certifications: ISO 14001, SBTi, CDP, B Corp |
| EU Taxonomy alignment | Overlap of taxonomy objectives (climate mitigation, adaptation, etc.) |
| Scope compatibility | Grant's eligible_scopes vs. company's reported scopes (1, 2, 3) |
| Reduction target alignment | Whether company's reduction targets meet grant's min_emission_reduction_percent |
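One way these sub-components could be combined is sketched below. The sub-weights and attribute names are assumptions for illustration; the actual breakdown lives in matching_engine.py.

```python
# Illustrative carbon alignment sketch; sub-weights and attribute names are assumptions.
def carbon_score_sketch(company, grant) -> float:
    score = 0.0

    # Category overlap between company focus areas and the grant's carbon_categories.
    if set(company.carbon_focus_areas or []) & set(grant.carbon_categories or []):
        score += 0.4

    # Certification matching (e.g. ISO 14001, SBTi, CDP, B Corp).
    if set(company.certifications or []) & set(grant.required_certifications or []):
        score += 0.2

    # EU Taxonomy objective overlap.
    if set(company.taxonomy_objectives or []) & set(grant.taxonomy_objectives or []):
        score += 0.2

    # Scope compatibility (scopes 1, 2, 3).
    if set(company.reported_scopes or []) & set(grant.eligible_scopes or []):
        score += 0.1

    # Reduction target alignment.
    if (
        grant.min_emission_reduction_percent is not None
        and company.reduction_target_percent is not None
        and company.reduction_target_percent >= grant.min_emission_reduction_percent
    ):
        score += 0.1

    return min(score, 1.0)
```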
Carbon Categories¶
The engine recognizes 14 carbon categories for matching:
```python
CARBON_CATEGORIES = [
    "energy_efficiency",
    "renewable_energy",
    "clean_technology",
    "circular_economy",
    "sustainable_transport",
    "green_buildings",
    "carbon_capture",
    "hydrogen",
    "industrial_decarbonization",
    "sustainable_agriculture",
    "biodiversity",
    "water_management",
    "waste_reduction",
    "climate_adaptation",
]
```
Carbon Focus Bonus¶
Grants where is_carbon_focused = True receive a 1.2x multiplier on the final score, capped at 1.0. This ensures carbon-aligned grants are surfaced preferentially.
Collaborative Filtering (10%)¶
The collaborative component analyzes interaction patterns from similar companies to provide peer-based signals:
- Companies with overlapping NACE codes and country
- Interaction signals: saved, viewed, applied
- Grants that peers have saved or applied to receive a boost
This score is derived from the Match table's interaction data (interaction_type and counts).
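A simplified sketch of how these peer signals could translate into a score (the boost values and the query that selects peer matches are assumptions, not the module's exact logic):

```python
# Illustrative peer-signal scoring; interaction boosts are assumptions.
def collaborative_score_sketch(peer_matches) -> float:
    """peer_matches: Match rows for similar companies (shared NACE codes / country)."""
    score = 0.0
    for match in peer_matches:
        if match.interaction_type == "applied":
            score += 0.3
        elif match.interaction_type == "saved":
            score += 0.2
        elif match.interaction_type == "viewed":
            score += 0.1
    return min(score, 1.0)
```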
Recency Scoring (10%)¶
Deadline urgency is scored on a decay curve:
| Days to Deadline | Score | Label |
|---|---|---|
| Less than 14 days | 1.0 | Urgent |
| 14--30 days | 0.9 | Soon |
| 30--60 days | 0.7 | Moderate |
| 60--90 days | 0.5 | Normal |
| 90--180 days | 0.3 | Distant |
| More than 180 days | 0.2 | Far |
| No deadline set | 0.5 | Default |
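A minimal sketch of this decay curve follows; boundary handling (inclusive vs. exclusive ranges) is an assumption.

```python
from datetime import date


# Illustrative mapping of deadline urgency to a score.
def recency_score_sketch(deadline: date | None, today: date) -> float:
    if deadline is None:
        return 0.5  # default when no deadline is set
    days = (deadline - today).days
    if days < 14:
        return 1.0  # urgent
    if days <= 30:
        return 0.9  # soon
    if days <= 60:
        return 0.7  # moderate
    if days <= 90:
        return 0.5  # normal
    if days <= 180:
        return 0.3  # distant
    return 0.2      # far
```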
Match Result Data Class¶
Every match calculation produces a MatchResult with full score breakdown:
```python
@dataclass
class MatchResult:
    grant: Grant
    score: float                 # Final weighted score (0.0 to 1.0)
    rule_score: float            # Rule-based component
    semantic_score: float        # Embedding similarity
    carbon_score: float          # Carbon alignment
    collaborative_score: float   # Peer signals
    recency_score: float         # Deadline urgency
    match_reasons: list[str]     # Human-readable reasons
```
Match Persistence¶
Matches are persisted to the database using an upsert strategy:
- If a match already exists for a `(company_id, grant_id)` pair, the scores are updated
- If no match exists, a new record is created
- Match reasons are stored as a list of human-readable strings
- Individual component scores are stored for transparency and debugging
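A sketch of the upsert pattern, assuming SQLAlchemy async sessions and a Match model with the fields described above (not the exact persistence code):

```python
# Illustrative upsert (SQLAlchemy-style); the Match model and field names are assumptions.
from sqlalchemy import select


async def upsert_match_sketch(session, company_id, result) -> None:
    existing = await session.scalar(
        select(Match).where(
            Match.company_id == company_id,
            Match.grant_id == result.grant.id,
        )
    )
    if existing is None:
        existing = Match(company_id=company_id, grant_id=result.grant.id)
        session.add(existing)

    # Store the final score, each component score, and the human-readable reasons.
    existing.score = result.score
    existing.rule_score = result.rule_score
    existing.semantic_score = result.semantic_score
    existing.carbon_score = result.carbon_score
    existing.collaborative_score = result.collaborative_score
    existing.recency_score = result.recency_score
    existing.match_reasons = result.match_reasons
    await session.flush()
```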
Match Refresh Flow¶
```mermaid
sequenceDiagram
participant API as Matches API
participant Engine as Matching Engine
participant DB as PostgreSQL
API->>Engine: refresh_matches(company_id)
Engine->>DB: Fetch company profile
Engine->>DB: Fetch active grants
loop For each grant
Engine->>Engine: calculate_rule_score()
Engine->>Engine: calculate_semantic_score()
Engine->>Engine: calculate_carbon_score()
Engine->>Engine: calculate_collaborative_score()
Engine->>Engine: calculate_recency_score()
Engine->>Engine: Apply carbon focus bonus
end
Engine->>DB: Upsert match results
Engine-->>API: Return ranked matches
```

API Endpoints¶
The matching engine is exposed through the following endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/matches/{company_id} | Real-time match calculation |
| POST | /api/v1/matches/{company_id}/refresh | Refresh and persist matches |
| GET | /api/v1/matches/{company_id}/saved | Get saved matches |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/view | Mark as viewed |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/save | Toggle saved |
| PATCH | /api/v1/matches/{company_id}/{grant_id}/dismiss | Toggle dismissed |
| POST | /api/v1/matches/calculate | Trigger match calculation |
| GET | /api/v1/matches/stats | Match statistics |
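As a usage illustration, a client could refresh and then fetch matches as shown below. This is an httpx sketch; the base URL, placeholder ID, and absence of authentication are assumptions.

```python
# Illustrative client usage; base URL, IDs, and response shape are assumptions.
import httpx

BASE_URL = "http://localhost:8000"  # assumed local dev server
company_id = "11111111-1111-1111-1111-111111111111"  # placeholder company ID

with httpx.Client(base_url=BASE_URL) as client:
    # Recompute and persist matches for the company.
    client.post(f"/api/v1/matches/{company_id}/refresh").raise_for_status()

    # Fetch the ranked matches.
    matches = client.get(f"/api/v1/matches/{company_id}").json()
    for match in matches[:5]:
        print(match)
```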
Performance Considerations¶
- Target: Match calculation completes within 500ms per company
- Embedding comparison is O(n) where n = number of active grants
- Database queries use composite indexes on `(tenant_id, company_id)` and `(tenant_id, grant_id)`
- Batch processing is used for refresh operations to avoid N+1 queries
- An index on `is_carbon_focused` accelerates carbon-focused filtered queries
Testing¶
The matching engine has 32 dedicated tests covering:
- Unit tests for each scoring component
- Integration tests with real database records
- API endpoint tests
- Performance tests ensuring sub-500ms calculation time
- Edge cases: missing embeddings, no carbon profile, empty grant lists
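For illustration, a unit test in this style might look as follows, reusing the hypothetical recency_score_sketch helper from earlier on this page (these are not the repository's actual tests):

```python
# Illustrative pytest-style tests; function names are assumptions.
from datetime import date, timedelta


def test_recency_score_urgent_deadline():
    deadline = date.today() + timedelta(days=7)
    assert recency_score_sketch(deadline, date.today()) == 1.0


def test_recency_score_no_deadline_defaults_to_half():
    assert recency_score_sketch(None, date.today()) == 0.5
```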