Monday, March 3, 2014

The mystery of Emma

In this week's post, I note that the U.S. Social Security baby names database has an unusual number of boys named Emma (21) born around 1900. (BTW, here's my IPython notebook with everything you need to reproduce this graph, all my posted graphs, and plenty of graphs you might want to see on your own.)

I can't figure it out, so I'm posting the data here. Other names show peaks around 1930 and 1980, and the error rate is often high when a name's popularity is low, as it was for Emma around 1970, but you seldom see a popular name with a high error rate before 1920. My first thought was bad OCR turning "rn" into "m", or a rash of particularly poor spellings of "Ezra", but neither seems supported by the data. Unless there's another, similar name that doesn't begin with E or end with a? I can't think of one.
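For anyone who wants to poke at this without opening the notebook, here is a minimal sketch of the calculation, assuming "error rate" means the share of a name's records filed under the unexpected sex. The `year,name,sex,count` row layout, the `male_share_by_year` helper, and the sample numbers are all my own illustration, not the real Social Security counts.

```python
import csv
import io
from collections import defaultdict

# Toy rows in an assumed layout: year,name,sex,count.
# These numbers are made up for illustration; they are NOT real counts.
SAMPLE = """\
1900,Emma,F,1000
1900,Emma,M,10
1970,Emma,F,200
1970,Emma,M,5
1900,Ezra,M,50
"""

def male_share_by_year(rows, name):
    """Return {year: male count / total count} for a single name."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for year, row_name, sex, count in rows:
        if row_name == name:
            counts[int(year)][sex] += int(count)
    return {
        year: c["M"] / (c["M"] + c["F"])
        for year, c in sorted(counts.items())
        if c["M"] + c["F"] > 0
    }

rows = list(csv.reader(io.StringIO(SAMPLE)))
shares = male_share_by_year(rows, "Emma")
for year, share in shares.items():
    print(year, f"{share:.2%}")
```

Swapping the toy string for the real files should only require adapting the column order to whatever layout the download actually uses.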

  1. Can you narrow it down to a part of the country, or some similar specific variable? Was there any sort of mechanical data entry/tabulation at that time?

  2. You make an excellent point, Muhanned. The original database is divided up by state. I'll see if these male Emmas cluster together; that could be quite informative. Another factor is that people born in 1900 were only entered into Social Security records in 1935, as adults; maybe there was some miscommunication when husbands went to the office to sign up their wives, and the clerk checked the "male" box about once every 200 times without thinking because there was a man standing in front of them? Seems plausible, but it's purely speculation on my part.
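A rough sketch of the by-state check suggested above might look like this. The `state,sex,year,name,count` layout, the `male_rate_by_state` helper, and the sample rows are assumptions for illustration only, not real data.

```python
import csv
import io
from collections import Counter

# Toy rows in an assumed layout: state,sex,year,name,count.
# Illustrative numbers only; NOT real Social Security data.
SAMPLE_BY_STATE = """\
NY,F,1900,Emma,300
NY,M,1900,Emma,6
OH,F,1900,Emma,200
OH,M,1900,Emma,5
TX,F,1900,Emma,100
"""

def male_rate_by_state(rows, name, year):
    """Return {state: male count / total count} for one name and birth year."""
    male, total = Counter(), Counter()
    for state, sex, row_year, row_name, count in rows:
        if row_name == name and int(row_year) == year:
            total[state] += int(count)
            if sex == "M":
                male[state] += int(count)
    return {state: male[state] / total[state] for state in sorted(total)}

state_rows = list(csv.reader(io.StringIO(SAMPLE_BY_STATE)))
rates = male_rate_by_state(state_rows, "Emma", 1900)
for state, rate in rates.items():
    print(state, f"{rate:.2%}")
```

If the male Emmas pile up in a handful of states, that would point toward something local, such as a particular office or form. A roughly uniform rate everywhere would fit the absent-minded-clerk guess better.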