Monday, November 14, 2016

I scraped all the 2016 U.S. election data

I'm a weird guy. I enjoy building web scrapers; I find it relaxing. I have no real plans to use the 2016 U.S. election data, no particular horses to grind (I'm not too thrilled at the outcome, but hey), but I've been hanging around /r/datasets on Reddit, and lots of people were asking for the data and wondering if someone was going to scrape it. So, I did.

All of the data is in this GitHub repo. Obviously, I did not put a license of any sort on it, so feel free to use it. If you end up doing any interesting analyses with it (even boring ones; I'm not picky), I'd love to hear about it!

If I understand correctly, official, verified (as opposed to reported) results will start to be released by the states in 2-4 weeks, but it seemed a shame that the data was not, as far as I could tell, easily available right now.

(Of course, given my experience of how the universe works, there will probably be a better data dump of all this info somewhere soon, or someone will point out that it already exists somewhere my Google-fu wasn't strong enough to find, making my 10-12 hours of effort redundant. But that wouldn't have happened if I hadn't forced fate's hand!)

Is scraping legal? Yes. Is it ethical? Pretty much. Feel free to disagree; my position is that this is all public data, not under any copyright, presented publicly, and I scraped it by automating an actual web browser, so I did not use up any more of the websites' resources than a regular visitor would. If I have breached any terms of service, I'll just have to live with the consequences, of which there are likely to be none. Here's a good Quora post about the subject.

