Why station-level totals beat county-level totals for evidence

What this is / is not. This piece discusses general patterns in election integrity infrastructure. It does not endorse or critique any specific candidate, party, or past electoral outcome.

Two ways to know a number

Take any aggregated election figure - a county total, a constituency total, a national projection. There are two grounds on which you might accept the figure as true.

The first is that someone with authority announced it. The second is that you can reconstruct it from primary documents produced under direct observation, line by line, station by station, with the source artifacts attached.

Both are real claims. Only the second is evidence. The difference matters whenever a result is contested, and it is the entire reason station-level data is the right unit of analysis for any platform whose output is meant to be defensible.

The county-total failure mode

Imagine a campaign command center that ingests county-level totals as they arrive. The dashboard shows aggregated turnout, candidate share, swing relative to a prior cycle. Anomalies are detected at the county level: this county is reporting faster than expected, that county shows a turnout spike outside its historical band.

That information is useful for narrative. It is almost useless for evidence. A county total is a sum. When a sum is wrong, you cannot tell whether one station produced a bad number, whether ten stations produced slightly wrong numbers, or whether every station's figures were correct on the signed forms but were altered or misread in transmission. The county-total layer hides the variable you actually need to examine.
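A small sketch makes the point concrete. The station identifiers and vote counts below are invented for illustration only. Two quite different error patterns produce the same wrong county total, so the total alone cannot tell them apart; only the station rows can.

```python
# Hypothetical station figures for one candidate in one county (invented data).
true_counts = {"ST-001": 310, "ST-002": 275, "ST-003": 198, "ST-004": 402}

# Scenario A: a single station is off by 40 votes.
scenario_a = {**true_counts, "ST-003": 238}

# Scenario B: all four stations are each off by 10 votes.
scenario_b = {station: votes + 10 for station, votes in true_counts.items()}

county_total_true = sum(true_counts.values())
county_total_a = sum(scenario_a.values())
county_total_b = sum(scenario_b.values())


def differing_stations(reported, reference):
    """Return the station ids whose reported figure differs from the reference."""
    return sorted(s for s in reference if reported[s] != reference[s])


# The county totals are identical in both scenarios ...
print(county_total_true, county_total_a, county_total_b)

# ... but the station rows show one bad station in A and four in B.
print(differing_stations(scenario_a, true_counts))
print(differing_stations(scenario_b, true_counts))
```

In both scenarios a county-level dashboard would show the same discrepancy of 40 votes and nothing more.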

It also hides the corruption surface. The space between a station's signed Form 34A (in Kenya, the presidential results declaration form completed and signed at each polling station) and the constituency tallying center is the space where transcription errors, deliberate substitutions, and good-faith mistakes occur. A platform that starts from county totals has already lost the data it would need to detect any of those things.

The station-level model

A station-scoped data model treats the polling station as the atomic unit. Every figure in the system - candidate vote, rejected ballot, registered voter count, turnout - exists as a row tied to one station, one Form 34A photograph, one agent, one reviewer, and one timestamp. County totals exist, but they are derived. They are computed from the underlying rows on every read, never stored as a primary value.
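One way to picture such a model is the minimal sketch below. It is an illustration under stated assumptions, not a description of any particular platform: the class name StationResult, its field names, the image path scheme, and the sample values are all hypothetical. The only points it tries to show are that each row carries its own provenance and that the county total is a function of the rows rather than a stored value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StationResult:
    """One candidate's vote count at one station (hypothetical schema)."""
    station_id: str
    county: str
    candidate: str
    votes: int
    form_34a_image: str  # reference to the photographed form (assumed path scheme)
    agent_id: str        # who captured the form
    reviewer_id: str     # who verified the extracted figure
    recorded_at: datetime


def county_total(rows, county, candidate):
    """Derive a county total from station rows on every call; nothing is stored."""
    return sum(r.votes for r in rows if r.county == county and r.candidate == candidate)


# Invented sample rows; the timestamp is a placeholder, not a real election date.
stamp = datetime(2030, 1, 1, 18, 5, tzinfo=timezone.utc)
rows = [
    StationResult("ST-001", "County A", "Candidate X", 310, "forms/ST-001.jpg", "agent-17", "reviewer-3", stamp),
    StationResult("ST-002", "County A", "Candidate X", 275, "forms/ST-002.jpg", "agent-22", "reviewer-3", stamp),
    StationResult("ST-101", "County B", "Candidate X", 198, "forms/ST-101.jpg", "agent-40", "reviewer-5", stamp),
]

print(county_total(rows, "County A", "Candidate X"))
```

A fuller model would carry rejected ballots, registered voters, and turnout in the same row-per-station shape; they are left out here to keep the sketch short.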

Three architectural consequences fall out of this choice.

Disagreement becomes local. When a candidate's legal team challenges a result, the conversation is no longer "is the county total wrong?" It is "for which of these 612 stations do we have a station-level objection, and what is the evidence for each?" The unit of dispute matches the unit of evidence. Legal arguments that previously required forensic accountants now require examining a finite set of rows.

Anomalies become specific. A turnout spike at the county level is a story. A turnout spike at one station, when the surrounding stations are within historical bands, is an evidence claim. Station-level anomaly detection produces artifacts a campaign manager can actually act on - send a supervisor to that station, request the original form, escalate to the returning officer - within the few hours when action still matters.
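A rough sketch of that check, assuming each station carries its own historical turnout band, might look like the following. The StationTurnout class, the band values, and the station figures are all invented for illustration, and a real system would need to decide how the bands are derived. In this sample the county-wide turnout stays inside the same band while one station falls well outside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationTurnout:
    """Turnout inputs for one station (hypothetical schema)."""
    station_id: str
    registered: int
    votes_cast: int
    band_low: float   # assumed lower edge of the station's historical turnout band
    band_high: float  # assumed upper edge of the station's historical turnout band

    @property
    def turnout(self) -> float:
        if self.registered <= 0:
            raise ValueError(f"{self.station_id}: registered voters must be positive")
        return self.votes_cast / self.registered


def outside_band(stations):
    """Return the stations whose turnout falls outside their own historical band."""
    return [s for s in stations if not (s.band_low <= s.turnout <= s.band_high)]


# Invented figures: two ordinary stations and one with near-total turnout.
stations = [
    StationTurnout("ST-001", 600, 390, 0.55, 0.75),
    StationTurnout("ST-002", 550, 352, 0.55, 0.75),
    StationTurnout("ST-003", 500, 490, 0.55, 0.75),
]

county_turnout = sum(s.votes_cast for s in stations) / sum(s.registered for s in stations)
flagged = outside_band(stations)

print(f"county turnout: {county_turnout:.3f}")
for s in flagged:
    print(f"{s.station_id}: turnout {s.turnout:.2f} outside [{s.band_low}, {s.band_high}]")
```

The output of a check like this is a short list of station identifiers, which is the form a supervisor or returning officer can act on.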

Aggregation is auditable. Because totals are derived, the system can show the sum and the inputs in the same view. A campaign analyst, a lawyer, a journalist, and an observer can all answer the same question - which stations contributed to this number - without trusting an intermediate layer.
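The idea of showing the sum and its inputs together can be sketched as below. This is one possible shape, not a prescribed design: Row, AuditedTotal, and audited_total are hypothetical names, and the image references and vote counts are invented. The total is never returned without the rows, and the form images, that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One station's figure plus a pointer to its source form (hypothetical schema)."""
    station_id: str
    votes: int
    form_34a_image: str  # assumed reference to the photographed form


@dataclass(frozen=True)
class AuditedTotal:
    """A derived total bundled with every row that contributed to it."""
    total: int
    inputs: tuple


def audited_total(rows):
    """Compute the total and return it together with its inputs, sorted by station."""
    inputs = tuple(sorted(rows, key=lambda r: r.station_id))
    return AuditedTotal(total=sum(r.votes for r in inputs), inputs=inputs)


# Invented sample rows.
result = audited_total([
    Row("ST-002", 275, "forms/ST-002.jpg"),
    Row("ST-001", 310, "forms/ST-001.jpg"),
    Row("ST-003", 198, "forms/ST-003.jpg"),
])

print(result.total)
for row in result.inputs:
    print(row.station_id, row.votes, row.form_34a_image)
```

Because the inputs travel with the total, anyone reading the figure can recompute it from the listed rows and follow each row back to its form image.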

Why granularity is not free

The station-level model is more expensive. Kenya operates roughly 46,000 polling stations in a presidential election. A platform that maintains row-level integrity for each one needs an agent-deployment plan, a verification workflow, a storage strategy for tens of thousands of source images, an anomaly-detection layer that operates at the row level, and an export format that preserves the chain of custody. None of that is incidental.

It is also the only model that survives serious scrutiny. The cost of building a station-level pipeline is paid once, by engineers. The cost of building only a county-level pipeline is paid on election night, by legal teams who discover that the evidence they needed to make a case does not exist in a form a court can use.

What to look for

When evaluating any election platform, the granularity test is simple: ask whether you can click through from any aggregated number to the station-level rows that compose it, and from each row to the Form 34A photograph it was extracted from. If you can, the platform is producing evidence. If you cannot, the platform is producing a dashboard. Both have uses. Only the first survives a courtroom.