Two datasets. 103 farmers. 42 polygons. No shared identifiers. Between them sits a carbon-credit registration that can't move forward. We built a pipeline that reconciles them, tells you which matches it trusts, and resolves the ones it doesn't.
The farmer list is a spreadsheet. The polygon map is drawn by field staff. Both are accurate. Neither references the other. Matching them by hand works at this scale. It doesn't work at ten thousand farmers.
Farmers are grouped A through E. Polygons aren't grouped at all. We use weighted K-means on polygon centroids to produce five spatial clusters, then match each cluster to a farmer group by comparing total hectares. Every polygon gets a group label, inferred directly from where it sits on the map.
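As a rough illustration of the grouping step, here is a minimal sketch in plain Python. It assumes each polygon is reduced to a centroid and an area in hectares, and that the area is what weights the K-means update. The farthest-point seeding, the function names, and the rank-order pairing of clusters to groups are our assumptions for the sketch, not a description of the production code. The toy data uses two clusters instead of five.

```python
import math

def weighted_kmeans(points, weights, k, iters=50):
    """Lloyd's algorithm where each point pulls its centre in proportion to its weight."""
    # Deterministic farthest-point seeding (an assumption, to keep the sketch reproducible).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        # Update step: weighted mean of each cluster's members.
        new_centres = []
        for j in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centres.append(centres[j])  # an empty cluster keeps its old centre
                continue
            new_centres.append((sum(p[0] * w for p, w in members) / total,
                                sum(p[1] * w for p, w in members) / total))
        if new_centres == centres:
            break
        centres = new_centres
    return labels

def match_clusters_to_groups(cluster_ha, group_ha):
    """Pair clusters with farmer groups by ranking both on total hectares."""
    clusters_sorted = sorted(cluster_ha, key=cluster_ha.get)
    groups_sorted = sorted(group_ha, key=group_ha.get)
    return dict(zip(clusters_sorted, groups_sorted))

# Toy data: six polygon centroids and their areas in hectares.
centroids = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
hectares = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
labels = weighted_kmeans(centroids, hectares, k=2)

cluster_ha = {}
for label, ha in zip(labels, hectares):
    cluster_ha[label] = cluster_ha.get(label, 0.0) + ha

# Hypothetical farmer-group totals taken from the spreadsheet side.
group_of_cluster = match_clusters_to_groups(cluster_ha, {"A": 3.2, "B": 8.5})
polygon_groups = [group_of_cluster[label] for label in labels]
```

Pairing by rank keeps the sketch short. If the totals of two groups are close, a stricter comparison or a manual check of that pairing would be safer.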
With 103 farmers on 42 polygons, each polygon holds about 2.5 farmers on average, so this is a many-to-one assignment. We solve it as a minimum-cost flow problem: each polygon has a capacity, each farmer sends one unit of flow, and the cost of an edge is the mismatch between the farmer's reported area and their share of the polygon.
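The flow formulation can be sketched with a small successive-shortest-path solver in plain Python. This is a simplified sketch under stated assumptions: a farmer's share of a polygon is taken to be an equal split (polygon area divided by capacity), costs are scaled to integer thousandths of a hectare, and the function name and input shapes are hypothetical. A production pipeline would more likely call an established solver.

```python
from collections import deque

def min_cost_assignment(reported_ha, polygons):
    """reported_ha: {farmer: hectares}; polygons: {polygon: (hectares, capacity)}."""
    farmers, plots = list(reported_ha), list(polygons)
    source, sink = 0, 1 + len(farmers) + len(plots)
    # Each edge is [to, residual capacity, cost, index of its reverse edge].
    graph = [[] for _ in range(sink + 1)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    candidate_edges = []
    for i, f in enumerate(farmers, start=1):
        add_edge(source, i, 1, 0)  # each farmer sends one unit of flow
        for j, p in enumerate(plots, start=1 + len(farmers)):
            area, capacity = polygons[p]
            # Assumption: equal shares, so the expected share is area / capacity.
            cost = round(abs(reported_ha[f] - area / capacity) * 1000)
            candidate_edges.append((f, p, i, len(graph[i])))
            add_edge(i, j, 1, cost)
    for j, p in enumerate(plots, start=1 + len(farmers)):
        add_edge(j, sink, polygons[p][1], 0)  # polygon capacity

    while True:
        # Bellman-Ford (queue-based) shortest path in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        queued = [False] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v], prev[v] = dist[u] + cost, (u, idx)
                    if not queued[v]:
                        queue.append(v)
                        queued[v] = True
        if prev[sink] is None:
            break  # no augmenting path left
        # Source edges have capacity 1, so every path carries exactly one unit.
        v = sink
        while v != source:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u

    # A saturated farmer-to-polygon edge means that farmer was placed there.
    return {f: p for f, p, u, idx in candidate_edges if graph[u][idx][1] == 0}

assignment = min_cost_assignment(
    {"f1": 1.0, "f2": 1.1, "f3": 3.0},
    {"p1": (2.0, 2), "p2": (3.0, 1)},
)
```

In this toy case the two small farmers share `p1` and the large one takes `p2`. If total capacity is smaller than the number of farmers, the unplaced farmers are simply missing from the returned dictionary.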
Click a polygon to see its assigned farmers, their shares, and any matches flagged for review.
For each farmer, we compute the cost gap between their best and second-best candidate polygons, then compress it to a confidence score between zero and one. A high score means the best polygon clearly beat the runner-up. A low score means the optimizer saw two nearly equal options. The registry gets to see exactly where the pipeline is certain and where it isn't.
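A minimal sketch of such a score, assuming lower cost means a better fit. The exponential squashing and the `scale` parameter are our assumptions for illustration; any monotone map from the gap to the unit interval would play the same role.

```python
import math

def confidence(costs, scale=1.0):
    """Map the gap between the two cheapest candidate costs to a score in [0, 1]."""
    if len(costs) < 2:
        return 1.0  # assumption: a single candidate has nothing to be confused with
    best, second = sorted(costs)[:2]
    # A zero gap gives 0.0; a gap much larger than `scale` approaches 1.0.
    return 1.0 - math.exp(-(second - best) / scale)

tied = confidence([0.5, 0.5, 2.0])   # two equally good polygons
clear = confidence([0.0, 5.0])       # one polygon far ahead of the runner-up
```

Choosing `scale` sets what counts as a meaningful gap, so it should be expressed in the same units as the assignment cost.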
We don't claim accuracy without ground truth. Instead, we publish the distribution of confidence scores. High-confidence matches are ready for J-Credit registration as-is. Low-confidence ones route to the resolution stage, where we narrow each one down to its top few candidate polygons and prepare it for field-staff verification.
This is what makes the pipeline auditable. Every number has a provenance. Every decision has a reason.
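The publish-and-route step described above could look something like the following sketch. The threshold of 0.5, the five histogram bins, and the function name are hypothetical values chosen for illustration, not the pipeline's actual settings.

```python
def summarize(scores, threshold=0.5, bins=5):
    """Bucket confidence scores into a histogram and split farmers into two queues."""
    histogram = [0] * bins
    ready, review = [], []
    for farmer, score in scores.items():
        # A score of exactly 1.0 falls into the last bin.
        histogram[min(int(score * bins), bins - 1)] += 1
        (ready if score >= threshold else review).append(farmer)
    return histogram, ready, review

histogram, ready, review = summarize({"f1": 0.95, "f2": 0.1, "f3": 0.7, "f4": 1.0})
```

Here `ready` holds the matches that can go to registration as-is, and `review` holds the ones sent on to the resolution stage.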
A flagged assignment means the optimizer wasn't sure. The resolver takes each flagged farmer, looks at their top three candidate polygons, and uses the confident assignments on those polygons as spatial anchors. If two farmers on a candidate polygon are already matched with high confidence, that's strong evidence the flagged farmer belongs there too. Pick a farmer from the queue to see how it works.
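One way to picture the anchoring logic is the sketch below. It assumes the resolver re-ranks a flagged farmer's three cheapest polygons by how many high-confidence farmers already sit on each, with the original cost as a tie-breaker. The function name, the input shapes, and that ranking rule are our assumptions for the sketch.

```python
def resolve(candidate_costs, confident, top_n=3):
    """Rank a flagged farmer's candidate polygons for field-staff verification.

    candidate_costs: {polygon: cost} for the flagged farmer (lower is better).
    confident: {farmer: polygon} for matches already accepted with high confidence.
    """
    # Keep only the cheapest few candidates.
    shortlist = sorted(candidate_costs, key=candidate_costs.get)[:top_n]
    # Count confident farmers already placed on each shortlisted polygon.
    anchors = {p: sum(1 for q in confident.values() if q == p) for p in shortlist}
    # More anchors first; cost breaks ties.
    return sorted(shortlist, key=lambda p: (-anchors[p], candidate_costs[p]))

ranked = resolve(
    {"p1": 0.30, "p2": 0.32, "p3": 0.90, "p4": 2.00},
    {"a": "p2", "b": "p2", "c": "p1"},
)
```

In this toy case `p2` moves ahead of `p1` despite a slightly higher cost, because two confident farmers are already anchored there, and `p4` never makes the shortlist.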