# Waiter! There’s a Super Nova in my ANOVA! How in the World…?

Ladies and gentlemen, I present to you the first actual statistical analysis of my blogs. It’s a crappy one (just a SRS + proportion estimate) because I couldn’t think of anything else that was interesting and thus couldn’t think of anything worthy of a two-sample t-test. So disappointing!

But anyway.

Goal of analysis: to discover what proportion of my blogs are surveys.

Method:
1) try several candidate bounds on the error of estimation (B)
2) using the best candidate bound, calculate an acceptable sample size (n) from which to gather data
3) use the data gathered in step 2 to estimate the population proportion of blogs that are surveys, with a reasonable bound on the error of estimation

Formulae used:
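To estimate appropriate sample size:

$$n = \frac{Npq}{(N-1)D + pq}, \qquad q = 1 - p, \quad D = \frac{B^2}{4}$$

To estimate population proportion (where $y_i$ is 1 if sampled blog $i$ is a survey and 0 otherwise):

$$\hat{p} = \frac{\sum_{i=1}^{n} y_i}{n}$$

To estimate variance and bound on the error of estimation, respectively (with $\hat{q} = 1 - \hat{p}$):

$$\hat{V}(\hat{p}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right), \qquad B = 2\sqrt{\hat{V}(\hat{p})}$$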

Procedure:
The population size N was 663, as that was the total number of blogs. It was found best to set p = .5, as that maximizes pq and therefore gives the most conservative estimate: a sample size at least as large as would be necessary. Several magnitudes of B were plugged into the sample size equation, and the best was found to be B = .17. This was used in the sample size equation and an n of 33 was obtained (32.93, rounded up).
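For anyone who'd rather not punch this into a calculator, here's a minimal sketch of the sample size step, assuming the standard SRS sample size formula for a proportion, n = Npq / ((N - 1)D + pq) with D = B²/4. The function name `sample_size` and the list of candidate B values are just for illustration; only B = .17 is the bound actually used here.

```python
import math


def sample_size(N, B, p=0.5):
    """Sample size needed to estimate a proportion within bound B.

    Assumes n = N*p*q / ((N - 1)*D + p*q), with q = 1 - p and D = B**2 / 4.
    """
    q = 1.0 - p
    D = B ** 2 / 4.0
    return N * p * q / ((N - 1) * D + p * q)


N = 663  # total number of blogs

# Candidate bounds are hypothetical, apart from 0.17.
for B in (0.05, 0.10, 0.15, 0.17, 0.20):
    n = sample_size(N, B)
    print(f"B = {B:.2f} -> n = {n:.2f} (round up to {math.ceil(n)})")
```

With N = 663, p = .5 and B = .17 this works out to about 32.93, which rounds up to 33. Smaller bounds push n up quickly, which is why a tight bound wasn't practical.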

Using a random numbers table, a SRS of 33 blogs was obtained. Each specific blog was looked up and marked as to whether or not it contained a survey. Results from this SRS are below (a ‘0’ indicates no survey, a ‘1’ indicates a survey):

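| Blog | Survey? |
|------|---------|
| 139 | 0 |
| 163 | 0 |
| 198 | 0 |
| 41 | 0 |
| 145 | 0 |
| 66 | 0 |
| 301 | 0 |
| 253 | 0 |
| 380 | 1 |
| 2 | 0 |
| 408 | 1 |
| 400 | 0 |
| 440 | 1 |
| 259 | 1 |
| 351 | 0 |
| 273 | 0 |
| 487 | 0 |
| 183 | 0 |
| 599 | 1 |
| 510 | 0 |
| 473 | 0 |
| 170 | 0 |
| 534 | 0 |
| 257 | 0 |
| 279 | 0 |
| 151 | 0 |
| 394 | 0 |
| 186 | 0 |
| 604 | 1 |
| 577 | 0 |
| 388 | 0 |
| 568 | 1 |
| 221 | 0 |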

These results were used to calculate the estimated population proportion of blogs that are surveys. With 7 surveys among the 33 sampled blogs, the result was .21. The variance of this estimate was then calculated from the sample proportion (≈ .004963) and used to calculate the bound on the error of estimation, which came out to be about .14.

Therefore, we can extrapolate that 21% +/- 14% of my blogs are surveys. Or, anywhere from 7% to 35% of my blogs are surveys.

Yes, yes, I know that’s a horrible, horrible bound on the error of estimation (seriously, 14% either way?! Blasphemy!) but I don’t think you realize how hard it is to actually go back and figure out the specific number of each blog I’ve ever written. There are 663 of them, you know.
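If you want to check the arithmetic behind the estimate and its bound, here's a rough sketch. It assumes the usual SRS estimators for a proportion with a finite population correction: variance = p̂q̂/(n - 1) · (N - n)/N and bound = 2·sqrt(variance). The names `variance_of_proportion` and `survey_blogs` are made up for this example. With the sample's own p̂ plugged in, the bound lands near .14. Plugging the conservative p = .5 into the same formula gives a bound near .17.

```python
import math


def variance_of_proportion(p, n, N):
    """Estimated variance of a sample proportion under SRS,
    including the finite population correction (N - n) / N."""
    q = 1.0 - p
    return (p * q / (n - 1)) * ((N - n) / N)


N = 663  # total number of blogs
n = 33   # blogs in the sample

# Sampled blogs marked '1' (contains a survey) in the table.
survey_blogs = [380, 408, 440, 259, 599, 604, 568]

p_hat = len(survey_blogs) / n
var_hat = variance_of_proportion(p_hat, n, N)
bound = 2 * math.sqrt(var_hat)

# Same formula with the conservative p = .5 instead of p_hat.
bound_conservative = 2 * math.sqrt(variance_of_proportion(0.5, n, N))

print(f"p_hat = {p_hat:.4f}")
print(f"variance = {var_hat:.6f}")
print(f"bound = {bound:.4f}")
print(f"bound with p = .5: {bound_conservative:.4f}")
print(f"interval: {p_hat - bound:.2%} to {p_hat + bound:.2%}")
```

The gap between the two bounds is the price of assuming the worst case: p = .5 is the safe choice when picking a sample size, but once the data are in, the sample proportion is the natural thing to plug into the variance.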

So yeah. That’s all I’ve done tonight, basically. Do you people have any ideas for possible blog-related things I could statistically analyze? I’m dyin’ here.