Statistics in the Nude
HAHA, you wish, right?
Today (or yesterday, or some time recently) CNN.com put up a link to the 100 most popular boy and girl baby names of 2011. Said link is here.
My personal opinions:
- Hate the name Aiden
- Dig the name Sophia
- Annabelle’s my kitty’s name!
- Xavier? Really?
- What’s with the excessive overuse of “y” as a replacement vowel for…well, pretty much every other vowel?
- Half of these names I would never expect to be in the 100 most popular for this year.
Anyway.
Me being me, I decided to run a few quick little stats to see what’s what with these names. Consider this a delightful little romp through basic descriptive and inferential stats.
- Test 1: Is there a difference between the mean number of syllables in the 100 most popular names for boys vs. the 100 most popular names for girls? (2-sample t-test)
- Test 2: Is there a difference between the mean syllable of emphasis in the 100 most popular names for boys vs. the 100 most popular names for girls? (2-sample t-test)
- Test 3: Does either the number of syllables or the syllable of emphasis statistically predict the rank of the names for either boys or girls? (Regressions! Regressions!)
Since these are small baby analyses I won’t go through the analyses in depth; I’ll just give you the results.
Test 1
I wanted to determine with this test if the top 100 male and 100 female names had a statistically different number of syllables. No names in either list had more than 4 syllables.
Mean number of syllables for male names: 2.45
Mean number of syllables for female names: 2.09
Results of the t-test: t(187.956) = 3.80, p < 0.001 (0.0001967)
This means that there is a statistically significant difference in the number of syllables in the 100 most popular male names and the 100 most popular female names (with male names having, on average, more syllables).
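For the curious, here's a minimal sketch of how a t-test like this one works under the hood. Those non-integer degrees of freedom are what you get from the unequal-variance (Welch) version of the 2-sample t-test, so that's what this does. It's Python, it isn't the code behind the numbers above, and the syllable counts in it are made up purely so it runs. The name `welch_t` is mine.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard error of each group mean
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation: this is why df is rarely a whole number
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up syllable counts, NOT the real top-100 lists
boys = [2, 3, 2, 3, 2, 2, 3, 4, 2, 3]
girls = [2, 2, 3, 2, 1, 2, 3, 2, 2, 2]

t, df = welch_t(boys, girls)
print(f"t({df:.3f}) = {t:.2f}")
```

To get a p-value you'd look the t up against a t distribution with that many degrees of freedom, which any stats package will do for you.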
Test 2
I wanted to see if there was any difference between the two lists of names in terms of where the emphasis was placed in the name. Did one list contain more names where the emphasis was on the first syllable (e.g., “E-than,” “KA-thy”, “CA-ro-line”), or more names where the emphasis was later in the name (e.g. “a-LEX-a,” “nath-AN-iel,” “el-ISE”)? This was simplified somewhat by my coding; I just had “1,” “2,” “3,” and “4” as the codes for the emphasis falling on the first, second, third, or fourth syllable, respectively.
Mean syllable of emphasis for male names: 1.09
Mean syllable of emphasis for female names: 1.29
Results of the t-test: t(167.09) = 3.04, p = 0.0027
This means that there is a statistically significant difference between the location of the syllable of emphasis in the 100 most popular male names and the 100 most popular female names (with the emphasis being placed earlier in the name for males than females).
Test 3
I didn’t save the printout results of the regressions because afterward I realized how bad it was to attempt an inferential statistic with such a pittance of a data set, but I figured I’d let you know what I got anyway: performing a Poisson regression (y variable is a count variable, bitches!) revealed that neither the number of syllables nor the location of the emphasis in the name was a statistically significant predictor of the rank of the names.
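Since the printouts are gone, here's a rough sketch of what a Poisson regression with one predictor boils down to: Newton's method on the log-likelihood with a log link. It's Python, the function name `fit_poisson` and the data are hypothetical, and it is not a reconstruction of the analysis above. It handles a single x variable only.

```python
from math import exp, log, sqrt

def fit_poisson(x, y, iters=25):
    """Fit log(E[y]) = b0 + b1*x by Newton's method. Returns (b0, b1, se_b1)."""
    b0, b1 = log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of the slope, from the information matrix at the last iterate
    se_b1 = sqrt(h00 / det)
    return b0, b1, se_b1

# Hypothetical (syllables, rank) pairs, NOT the real name data
syllables = [1, 2, 2, 3, 3, 4]
ranks = [10, 40, 55, 30, 70, 90]

b0, b1, se_b1 = fit_poisson(syllables, ranks)
print(f"slope = {b1:.3f}, z = {b1 / se_b1:.2f}")
```

The slope divided by its standard error gives the z statistic you'd check for significance.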
Ta-da!
Even in my dreams, man, even in my dreams
So I had this dream last night where I had to come up with something to statistically analyze within thirty seconds or this group of guys would murder me. It was like the Statistics Mafia or something.
Anyway.
My idea in the dream was to go to the Sporcle quiz of “50 states in 10 minutes” and see whether or not there were any correlations between the percentage of times a state name is remembered and:
a) Geographical location
b) State population
c) Number of syllables in the state name
My dreams: scary stuff. So guess what I did today?
Part I: Percentage of times a state name is remembered vs. geographical location
States were categorized by geographic location and those locations were numbered from northwest to southeast:
Northwest: 1
Southwest: 2
North central: 3
South central: 4
Northeast: 5
Southeast: 6
The percentage of times each state name is remembered was averaged within each geographic location, and these mean percentages were plotted against the location numbers. Here is said plot:
As you can see, the mean percentage of times that the state name is remembered is lowest for the two central regions and the Northeast. This could be due to lower populations in the central areas and the sheer number of itty bitty states in the Northeast. Or something else, who knows.
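The averaging step is simple enough to sketch. This is Python, not what I actually ran, and both the percentages and which state goes in which region are placeholders I made up for the example. Only the region codes come from the list above.

```python
from collections import defaultdict
from statistics import mean

# Region codes, numbered northwest to southeast as in the post
REGION_CODES = {
    "Northwest": 1, "Southwest": 2, "North central": 3,
    "South central": 4, "Northeast": 5, "Southeast": 6,
}

# (state, region, percent remembered): placeholder values, NOT the Sporcle numbers
rows = [
    ("Washington", "Northwest", 80.0),
    ("Oregon", "Northwest", 70.0),
    ("Nebraska", "North central", 60.0),
    ("Maine", "Northeast", 65.0),
    ("Vermont", "Northeast", 55.0),
    ("Florida", "Southeast", 90.0),
]

def mean_by_region(rows):
    """Average the percentages within each region code."""
    groups = defaultdict(list)
    for _state, region, pct in rows:
        groups[REGION_CODES[region]].append(pct)
    return {code: mean(vals) for code, vals in sorted(groups.items())}

print(mean_by_region(rows))
```

The resulting code-to-mean pairs are what would go on the plot.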
Part II: Percentage of times a state name is remembered vs. state population
This was a bit easier. This is just a plot of the percentage versus the state population. Ignore that x-axis; I can’t remember how to quickly fix it in R and it’s late.
Here you’ve got those few outlier states like California and Texas with mega populations that also appear to be at the top of the percentage remembered axis. Other than that, though, there doesn’t appear to be any sort of trend going on here. At least, in my opinion.
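One way to put a number on "a couple of big states are doing all the work" is to compute a correlation with and without them. Here's a minimal Python sketch of that idea. The data points are invented for illustration and are NOT real populations or Sporcle percentages, and `pearson_r` is just my name for a plain Pearson correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented (population in millions, percent remembered) pairs, NOT real data
data = [(37.0, 95.0), (25.0, 94.0), (6.0, 80.0), (5.0, 85.0),
        (3.0, 78.0), (1.0, 84.0), (0.6, 81.0)]

pops, pcts = zip(*data)
r_all = pearson_r(pops, pcts)

# Drop the two largest populations and see how much of the correlation was theirs
rest = sorted(data)[:-2]
r_rest = pearson_r(*zip(*rest))

print(f"all points: {r_all:.2f}, without the two largest: {r_rest:.2f}")
```

If the correlation collapses once the giants are removed, the "trend" was really just those outliers.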
Part III: Percentage of times a state name is remembered vs. number of syllables in the state name
The percentage of times each state name is remembered was averaged within each number of syllables (1 through 5), and these mean percentages were plotted against the number of syllables.
Haha, I don’t really know what this says. There aren’t that many 2- and 5-syllable states anyway. And Maine is the only 1-syllable state. I don’t think this is the best of the three graphs to look at.
Anyway.
Fun times.
Today’s song: Thinking About You by Radiohead



