Monday, February 21, 2022

2020 San Francisco RCV Election Not Reproducible

The good news is that San Francisco publishes full ballot data from its elections [1]. It's a quarter-gigabyte zip file full of JSON, and I can dig into it and look for deeper patterns in how people rank candidates on their ballots. Is there a hidden story in who the 2nd-choice candidates were and what people really want? Did IRV screw up, and should we switch to Condorcet?

The bad news is that I can't reproduce their results. They publish the full data, but the program they run isn't open source.

Board of Supervisors District 1 was a really close race. In the official results, after 6 rounds of Instant Runoff Voting (IRV) elimination, it came down to 17,142 to 17,017. That is a margin of just 125 votes, in a race where 3,726 ballots were blank from the start and another 2,024 ballots weren't fully ranked and had no mark for either of the last two candidates.
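For anyone new to IRV, the round-by-round elimination works roughly like the sketch below. It is a minimal illustration, not San Francisco's counting program: it assumes each ballot has already been cleaned into a simple ordered list of candidate names, and the alphabetical tie-break for last place is an arbitrary choice made for the sketch, not the city's tie rule.

```python
from collections import Counter


def irv_rounds(ballots):
    """Run instant-runoff elimination and return each round's tally.

    ballots: list of rankings, each a list of candidate names with the
    1st choice first. Assumes at most one candidate per rank (already cleaned).
    Stops when a candidate has a majority of continuing ballots or two remain.
    """
    continuing = {c for b in ballots for c in b}
    rounds = []
    while continuing:
        tally = Counter({c: 0 for c in continuing})
        for b in ballots:
            # A ballot counts for its highest-ranked continuing candidate;
            # if none of its candidates remain, it is exhausted.
            for c in b:
                if c in continuing:
                    tally[c] += 1
                    break
        rounds.append(dict(tally))
        active = sum(tally.values())  # non-exhausted ballots this round
        top = max(tally.values())
        if 2 * top > active or len(continuing) <= 2:
            break
        # Eliminate last place; ties broken alphabetically (arbitrary here).
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        continuing.remove(loser)
    return rounds


# Tiny made-up example: C is eliminated and C's ballots move to B.
ballots = [["A", "B"]] * 4 + [["B", "A"]] * 3 + [["C", "B"]] * 2
rounds = irv_rounds(ballots)
for i, r in enumerate(rounds, 1):
    print(i, r)
```

Note how the 2nd choices on C's ballots decide the race in round 2; that is exactly where small differences in how ballots are read can flip a close result.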

I can't even reproduce the first round count:

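| Candidate | Official | My count | Diff |
|---|---:|---:|---:|
| AMANDA INOCENCIO | 702 | 703 | 1 |
| ANDREW N. MAJALYA | 312 | 314 | 2 |
| CONNIE CHAN | 13508 | 13455 | -53 |
| DAVID E. LEE | 6293 | 6294 | 1 |
| MARJAN PHILHOUR | 12383 | 12379 | -4 |
| SHERMAN R. D'SILVA | 1558 | 1558 | 0 |
| VERONICA SHINZATO | 1320 | 1322 | 2 |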
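The diff column is my count minus the official count. A short Python check of that column, with the numbers copied straight from the table above (the `FIRST_ROUND` name and layout are just for illustration):

```python
# First-round counts from the table above: candidate -> (official, my count).
FIRST_ROUND = {
    "AMANDA INOCENCIO": (702, 703),
    "ANDREW N. MAJALYA": (312, 314),
    "CONNIE CHAN": (13508, 13455),
    "DAVID E. LEE": (6293, 6294),
    "MARJAN PHILHOUR": (12383, 12379),
    "SHERMAN R. D'SILVA": (1558, 1558),
    "VERONICA SHINZATO": (1320, 1322),
}


def diffs(table):
    """Return {candidate: my_count - official} for each row."""
    return {name: mine - official for name, (official, mine) in table.items()}


d = diffs(FIRST_ROUND)
total_official = sum(o for o, _ in FIRST_ROUND.values())
total_mine = sum(m for _, m in FIRST_ROUND.values())

# Largest discrepancies first.
for name, delta in sorted(d.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:20} {delta:+d}")
print("net:", total_mine - total_official)
```

Summed over all candidates, my first-round total comes out 51 votes below the official one (36,025 vs. 36,076), with Connie Chan's 53-vote gap as the biggest single discrepancy.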

Just getting this close took a few unexpected tweaks to the data. The official count effectively erases all the write-in votes. Ranking two candidates at the same rank is considered an RCV user error [2], and a frequent user error is to mark a candidate's bubble and then also write in that same name. People do this on pick-one ballots too. Once the write-in marks are erased, those ballots no longer have two marks at one rank, so a bunch more votes get counted. Yay democracy! The other weirdness is that the official RCV "first round" is effectively a second round, in which the write-in candidates have already been disqualified for getting too few votes.
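Here is a rough sketch of that ballot cleaning step. The input format, a list of `(rank, candidate)` marks, and the `WRITE_IN` label are hypothetical simplifications, not the actual JSON schema in San Francisco's data. The overvote rule (a rank with two different candidates ends the ballot) follows the strict rule described in footnote [2]; blank ranks are simply skipped here, which is an assumption of the sketch.

```python
WRITE_IN = "Write-in"  # hypothetical label for write-in marks in this sketch


def clean_ballot(marks, drop_write_ins=True):
    """Turn raw (rank, candidate) marks into an ordered list of candidates.

    If drop_write_ins is True, write-in marks are erased first. The ballot is
    then read rank by rank: a rank holding two different candidates (an
    overvote) exhausts the ballot from there on, a candidate repeated at a
    lower rank is skipped, and blank ranks are ignored.
    """
    by_rank = {}
    for rank, cand in marks:
        if drop_write_ins and cand == WRITE_IN:
            continue
        by_rank.setdefault(rank, set()).add(cand)

    ranking = []
    for rank in sorted(by_rank):
        cands = by_rank[rank]
        if len(cands) > 1:
            break  # overvote: nothing at or below this rank counts
        (cand,) = cands
        if cand not in ranking:
            ranking.append(cand)
    return ranking


# Bubble filled for a candidate AND the same name written in at rank 1.
marks = [(1, "CONNIE CHAN"), (1, WRITE_IN), (2, "DAVID E. LEE")]
print(clean_ballot(marks))                        # write-in erased: ballot counts
print(clean_ballot(marks, drop_write_ins=False))  # overvote at rank 1: ballot is dead
```

Toggling `drop_write_ins` is the kind of switch that moves first-round counts around, which is why small rule differences like this matter when trying to reproduce an official result.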

I've posted a Jupyter notebook and Python code that produce the counts above. If anyone else can reproduce the official San Francisco November 2020 results by tweaking my code or by using other open source software, I'd love to hear about it.

---
[2] I forget which law I read (Maine, maybe), but it said that as soon as the IRV process reached a rank where a ballot had two candidates at the same rank, that ballot was effectively dead and discarded. There's no need for this. I have software that can work around it in a few ways:
1. Temporarily set the vote aside; maybe one of those two candidates will get eliminated, and then the ballot can be used as normal.
2. Give half a fractional vote to each candidate at that rank; again, after an elimination maybe someone gets a whole vote.
3. Use Condorcet's method, where it just doesn't matter and equal rankings are fine.
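A small sketch of options 2 and 3 (option 1 is not shown). It is not the software mentioned above; it assumes a hypothetical ballot format where each ballot is a list of tiers, each tier a set of candidates ranked equally, and it treats unranked candidates as tied for last place in the pairwise count.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def fractional_tally(ballots, continuing):
    """Option 2: split each ballot evenly among the continuing candidates in
    its highest tier that still has any continuing candidate."""
    tally = defaultdict(Fraction)
    for tiers in ballots:
        for tier in tiers:
            live = [c for c in tier if c in continuing]
            if live:
                share = Fraction(1, len(live))
                for c in live:
                    tally[c] += share
                break
    return dict(tally)


def pairwise(ballots, candidates):
    """Option 3: wins[(x, y)] counts ballots ranking x strictly above y.
    Equal-ranked pairs add nothing either way."""
    wins = defaultdict(int)
    for tiers in ballots:
        pos = {c: i for i, tier in enumerate(tiers) for c in tier}
        bottom = len(tiers)  # unranked candidates share last place
        for x, y in combinations(candidates, 2):
            px, py = pos.get(x, bottom), pos.get(y, bottom)
            if px < py:
                wins[(x, y)] += 1
            elif py < px:
                wins[(y, x)] += 1
    return wins


def condorcet_winner(ballots, candidates):
    """Return the candidate who beats every other head-to-head, else None."""
    wins = pairwise(ballots, candidates)
    for x in candidates:
        if all(wins[(x, y)] > wins[(y, x)] for y in candidates if y != x):
            return x
    return None


ballots = [
    [{"A", "B"}, {"C"}],  # A and B tied for 1st
    [{"B"}, {"C"}],
    [{"C"}, {"A"}],
    [{"B"}, {"A", "C"}],  # A and C tied for 2nd
]
print(fractional_tally(ballots, {"A", "B", "C"}))  # A and B each get 1/2 from ballot 1
print(fractional_tally(ballots, {"A", "C"}))       # with B out, ballot 1 is a whole vote for A
print(condorcet_winner(ballots, ["A", "B", "C"]))
```

Using `Fraction` keeps the split votes exact, so round totals still add up to the number of non-exhausted ballots.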
