Guess who successfully defended their thesis this morning?
It honestly didn’t feel like as big of a deal as my first defense did back in 2011. Maybe it’s because I knew what to expect since I’d done it before. Maybe it’s because I felt more prepared and confident about the subject matter this time around. Maybe it’s because I’m just older and more mature than I was in 2011 and am less prone to freaking out.*
But it doesn’t matter, ‘cause now I’m DONE! I just have a few minor corrections/additions to make, then send the thesis off to the online Vault so that it’s officially submitted.
*Who am I kidding? I’ll always be prone to freaking out.
I am DONE with the first draft of my thesis introduction. Last time I did this, I thought the introduction was the hardest/most work-intensive part of the whole thing to write. So considering I’ve got all my simulations done and just need to write up the results, I’m guessing it’s the same sort of thing this time around.
So it’s nice to have that first rough draft done, even though it is just a rough draft.
Plus, I don’t think I’d made this much progress until April or so last time.
Okay, so unlike Thesis: Round I in 2011, I’ve actually started my thesis writing now—in January—versus in March. Which is probably a good thing. I also feel like I understand what I’m actually doing WAAAAAAAAAAAAAAY better than I did back then.
I also care a lot more.
So yeah. Hopefully things will go a lot smoother than they did last time, but I guess we’ll see.
Haha. So. I found this post tonight where someone plotted average (well, median) dissertation and thesis lengths by area of study.
So I took a screenshot of the one for the theses and put a vertical red line at the length of my UBC thesis, just to see how it compared (a better, clearer picture of this graph can be found at the above link).
TO BE FAIR, many of my pages were plots, not text. But still. It was a long thesis.
AND NOW I HAVE TO DO ANOTHER ONE LASDJFLAKSDFHALFJWELKF.
So school starts up again tomorrow. And I’m super nervous about it.
Well, okay, I’m always super nervous about school. I’ve basically been in high anxiety mode since 2006. You might wonder what the hell I’m so nervous about. I’ve had almost a decade of college now (DEAR GOD, THAT’S DEPRESSING) and I’ve already gone through a master’s program. What’s the big deal?
The big deal is the following: that first master’s program? That was the worst two goddamn years of my life. Every day was miserable and the thought of going to campus made me physically ill on several occasions. I hated meeting with my supervisor because I knew I would get berated to some degree no matter what we were talking about.
I was so nervous and stressed out that most of the second year is gone from my memory. Seriously. I don’t remember much school-wise beyond just being miserable and wanting to quit. I wanted to quit so badly. I honestly have no idea how I finished that thesis and successfully defended it. My fear and anxiety made me procrastinate and I really didn’t get started on things until March (I had to defend in June). Really, I was just not in a good mental place that entire year.
And while I know this time is almost completely different in every aspect, I still have that fear and anxiety about the process. And I still get nauseous whenever I have to meet with Dr. Chen just because of how bad things were between my supervisor and me before. I know Dr. Chen’s not like that at all, but there’s still that fear.
There’s still the fear about everything regarding this whole thing.
And that’s why I’m super nervous.
For my presentation* in my seminar class, I’m basically presenting some of the main results of my MA thesis. This has required me to dredge up this old notebook from way back when.
This notebook brings back bad memories.
This notebook brings back thoughts of UBC.
This notebook brings back thoughts of how, every morning, I would dread going to campus with every fiber of my being.
This notebook brings back thoughts of how I would have a panic attack every Thursday because Thursday was the day I was supposed to meet with my supervisor.
This notebook brings back thoughts of how much I eventually stopped caring about school—something I hate admitting even now.
This notebook reminds me that I lost two years of my early twenties to misery, fear, dread, and depression, among other things.
This notebook brings back bad memories.
Aaaaaaaand now I’m sad.
*You may be asking, “if this brings back so many bad memories, why the hell are you doing your presentation on your old thesis results?” Because the presentation is focused more on our presentation skills than on the content, so it was recommended that we just use some results/ideas that we’ve come up with in the past and focus on the “presenting” part rather than try to come up with something new.
This is probably the strangest result I found while writing my thesis. I’ll explain what it’s showing ‘cause it certainly isn’t obvious from this graph (especially if you don’t know structural equation modeling, aka SEM) and then tell you why I’d like to study something like this in depth.
SEM is basically the process by which researchers attempt to construct models of the relationships amongst variables that best fit a given data set. For example, if the data I’m interested in are a bunch of variables related to the Big Five personality factors and I as a researcher have evidence to support a specific structure of relationships amongst these variables and factors, I can construct a structural equation model that numerically represents how I think the variables are related. I can then test my model against the actual relationships amongst the variables in the actual data.
Fit indices, the whole topic of my thesis, are calculations which allow researchers to quantify the degree to which their hypothesized model accurately represents the real structure of the relationships amongst the variables in the data. Most fit indices range from 0 to 1, though the meaning of scores of 0 and 1 differs depending on the index. Model fit can be affected by a bunch of stuff, but most obviously (and importantly) it is affected by inaccuracies in the hypothesized model.
For example, say I had a model in which I had variable A, variable B, and variable C all related to factor X but all uncorrelated with one another (good luck with that setup, but it’s good for this example). I fit this to data which, indeed, has A, B, and C all related to X but also has B and C covarying via their errors. The fact that my model is missing this covariation would factor into the calculation of the fit index, lowering its value.
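Here is a rough sketch of that A/B/C example in plain Python. Every number in it (the loadings, the error variances, the size of the B-C error covariance) is made up for illustration and is not from the thesis. It also holds the parameters fixed instead of actually fitting the model, so it only shows where the two covariance structures disagree, not what a real fit index would report.

```python
# One-factor model: A, B, and C all load on X.
# All numeric values below are illustrative assumptions, not thesis values.
loadings = {"A": 0.7, "B": 0.6, "C": 0.5}
factor_variance = 1.0
error_variances = {"A": 0.51, "B": 0.64, "C": 0.75}
error_cov_bc = 0.2  # B and C covary via their errors in the data

names = list(loadings)


def implied_covariance(extra_error_cov):
    """Model-implied covariances, keyed by (row, column) variable names."""
    sigma = {}
    for i in names:
        for j in names:
            # Part of the covariance that comes from the shared factor X.
            value = loadings[i] * factor_variance * loadings[j]
            if i == j:
                value += error_variances[i]
            elif {i, j} == {"B", "C"}:
                value += extra_error_cov
            sigma[(i, j)] = value
    return sigma


# Structure behind the data (has the B-C error covariance).
true_sigma = implied_covariance(error_cov_bc)
# Structure assumed by the model (errors uncorrelated).
hypothesized_sigma = implied_covariance(0.0)

# Where the two disagree: only the B-C entries are nonzero.
residuals = {
    pair: true_sigma[pair] - hypothesized_sigma[pair] for pair in true_sigma
}
for pair, value in residuals.items():
    print(pair, round(value, 3))
```

With the parameters held fixed like this, the only nonzero residuals sit on the B-C pair. When a model is really fit to data, the loadings get re-estimated and the misfit gets spread across other entries too.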
Without going into the gory details of how these simulations were constructed and what model misspecification we added so that the fit index would have a discrepancy to work with (that is, the proposed model in the simulations purposefully didn’t match the underlying structure of the data and thus would have a fit index indicating a certain degree of misspecification), I’ll tell you what we did for this plot. I’ll tell you as I describe the plot, actually, ‘cause I think that’d be easiest.
Recall from like 20 sentences ago: SEM is about creating an accurate representation of the real relationships that exist amongst a set of variables. This representation of the true relationships amongst the data (called the “true model”) takes the form of a researcher’s proposed model (called the “hypothesized model”). I’ve labeled the pic above appropriately.
For the plot at the beginning of this blog, there were actually 18 simulated models, each with two factors and 24 indicator variables. The only difference between these models was how many indicator variables loaded onto each of the two factors. For example, one model looked like this (click to make these pics bigger, BTW):
And another model looked like this:
For each model, all the errors of the indicators were uncorrelated except for V1 and V2 (indicated by the crappily-drawn red arrows). You don’t really need to know what that means to get the rest of this blog; basically all you need to know is that each of the models had one extra “path” (or relationship between variables) in addition to the relationship between the two factors and the 24 indicator-to-factor relationships. So for each model, there were a total of 26 pathways or relationships between variables.
Now remember, I said these were simulated models. These models are actually what the data I created are arising from. Hence, they can be considered in the context of SEM as “true models” (see above).
Okay, so we’ve got a bunch of true models. How in the heck do we assess the performance of fit indices?
Easy! By creating a “hypothesized model” that (purposefully, in this case) omits a pathway that’s actually present in the data arising from the true model. In this simulation, that meant that for each true model, there would be a hypothesized model created that would fit every path correctly BUT would omit the correlation between the errors for V1 and V2 (the red-arrow-represented relationship between V1 and V2 would not exist in the hypothesized model).
See what I’m getting at? I’m purposefully creating a hypothesized model that doesn’t fit the true model exactly so that I can analyze which fit indices appropriately reflect the discrepancy. Indices that freak out and say “OH YOUR MODEL SUCKS, IT’S TOTALLY NOT AN ACCURATE REPRESENTATION OF THE UNDERLYING DATA STRUCTURE AT ALL” would be too sensitive, as a model that accurately represents 25 out of 26 possible pathways is a pretty damn good one (and is almost unheard of in psychology-related data). However, an index that says, “Hey, you’re a pretty badass researcher, ‘cause your model fits PERFECTLY!” isn’t right either; you’re missing a whole pathway, how can the fit be perfect?
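To make the true-versus-hypothesized setup concrete, here is a minimal Python sketch of one such pair of models, using a 3-and-21 split of the 24 indicators as the example. The function name `build_covariance` and every numeric value (loading of 0.7, factor correlation of 0.3, error covariance of 0.2) are assumptions for illustration, not the values used in the simulations. As before, the parameters are held fixed and nothing is actually estimated.

```python
def build_covariance(n_first, n_total=24, loading=0.7,
                     factor_corr=0.3, error_cov_v1_v2=0.2):
    """Implied covariance matrix for a two-factor model.

    V1..V{n_first} load on factor 1 and the rest load on factor 2.
    All default values are illustrative assumptions.
    """
    factor_of = [0 if i < n_first else 1 for i in range(n_total)]
    error_var = 1.0 - loading ** 2  # gives every indicator unit variance
    sigma = [[0.0] * n_total for _ in range(n_total)]
    for i in range(n_total):
        for j in range(n_total):
            # Same factor: factor variance of 1. Different factors: their correlation.
            phi = 1.0 if factor_of[i] == factor_of[j] else factor_corr
            sigma[i][j] = loading * phi * loading
            if i == j:
                sigma[i][j] += error_var
    # The one extra path: correlated errors between V1 and V2.
    sigma[0][1] += error_cov_v1_v2
    sigma[1][0] += error_cov_v1_v2
    return sigma


true_sigma = build_covariance(n_first=3)
hypothesized_sigma = build_covariance(n_first=3, error_cov_v1_v2=0.0)

# Entries (upper triangle, 0-based) where the two structures disagree.
differing = [
    (i, j)
    for i in range(24)
    for j in range(i, 24)
    if abs(true_sigma[i][j] - hypothesized_sigma[i][j]) > 1e-12
]

# Path count from the post: 24 loadings + 1 factor correlation + 1 error covariance.
paths_true = 24 + 1 + 1
paths_hypothesized = paths_true - 1

print(differing, paths_true, paths_hypothesized)
```

The only entry that differs is the V1-V2 pair, which is the single omitted pathway out of 26.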
Wow, that was like 20 paragraphs longer than I was expecting.
[INTERMISSION TIME! Go grab some popcorn or something. I’m watching Chicago Hope at the moment, actually. Love that show. Thank you, Hulu. INTERMISSION OVER!]
Back to the plot.
So now that you know what the models were in this case, I can tell you that the x-axis of this plot represents the 18 different models I had created. You’ll note the axis label states “# of Indicators per Factor with Misspecification.” This means that for the tick labeled “3,” the correlated errors of V1 and V2 in the true model occurred under the factor with three variables (with the other factor, Factor 2, having the remaining 21 indicator variables loading onto it). The hypothesized model, then, which omits this relationship, looks like this:
On the opposite side of the plot then, the tick labeled “21” is opposite—the error covariance occurs between variables that load onto the factor with the 21 indicator variables loading onto it.
Probably not ‘cause I’m writing this at like 5 AM and sleep is for wusses and thus I haven’t been partaking in much of it, but I SHALL CARRY ON FOR THE GOOD OF THE NATION!
Remember, for each of the 18 true models, I fit a hypothesized model that matched the true model perfectly, except it OMITTED the error covariance occurring between two indicator variables.
Now let’s look at the y-axis, shall we? You’ll see its label reads “SRMR,” which stands for the Standardized Root Mean Square Residual fit index. This index, as can be seen by the y-axis values, ranges from 0 to 1. The closer the index gets to 1, the better the hypothesized model is said to fit the true model, or the true underlying structure of the data.
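For anyone curious what goes into an SRMR-type number, here is a small Python sketch of one common textbook form: the root mean square of the standardized differences between the observed and model-implied covariances. This is an assumption about the general recipe, not the exact computation behind the plot. In this raw form, 0 means there is no residual at all, so how the plotted y-axis values relate to it is not something the sketch tries to settle. The two 3x3 matrices are made-up numbers.

```python
import math


def srmr(observed, implied):
    """Root mean square of standardized residuals over the lower triangle.

    One common textbook form; software packages differ in the details.
    """
    p = len(observed)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            # Standardize each residual by the observed standard deviations.
            scale = math.sqrt(observed[i][i] * observed[j][j])
            total += ((observed[i][j] - implied[i][j]) / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


# Made-up matrices: they disagree only on the covariance between variables 2 and 3.
observed = [
    [1.00, 0.42, 0.35],
    [0.42, 1.00, 0.50],
    [0.35, 0.50, 1.00],
]
implied = [
    [1.00, 0.42, 0.35],
    [0.42, 1.00, 0.30],
    [0.35, 0.30, 1.00],
]

value = srmr(observed, implied)
print(value)
```

Here a single residual of 0.2 is averaged over the 6 unique entries of the matrix, so the result is 0.2 divided by the square root of 6. With 24 indicators there are 300 unique entries, which is one way to see how a single omitted path can end up looking small.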
Okay, and NOW let’s look at the colored lines. The different colors represent the different strengths of correlation between the two factors in the model. But that’s probably the least important thing right now. So I guess just ignore them, haha, sorry.
Alrighty. Now that you (hopefully kind of sort of) mucked through my crappy, haphazard, rushed explanation of what this graph is showing, take a look at it, particularly at how the lines change as you move left to right on the x-axis.
Do you all see how weird of a pattern that is? This plot is basically showing me that the fit index SRMR is sensitive to misspecification in the form of an omitted pathway (relationship between variables), but that this sensitivity jumps all over the damn place depending on the size of the factor on which it occurs. Notice how all the lines take a dive toward a y-axis value of zero (poor fit) when there are 7 indicators belonging to the factor containing the misspecification (and 17 indicators belonging to the factor without the misspecification). Isn’t that WEIRD? Why in the hell does that particular shaped model have such a poor fit according to this index? Why does fit magically improve once this 7:17 ratio is surpassed and more indicator variables are included on the factor with the error?* By the way, that’s this model:
Freaking SRMR, man. And the worst part of all this is the fact that this is NOT such an aberrant result. ALL of the fit indices I looked at (I looked at seven of them), at least once, performed really, really poorly/counter-intuitively.
This is why this stuff needs studying, yo. Also why new and better indices need to be developed.
Haha, okay, I’m done. Sorry for that.
*Actually this sort of makes sense—the more indicator variables there are loading onto the factor with the error, the more “diluted” that error becomes and it’s harder for fit indices to pick it up. However, there’s not really an explanation as to why the fit takes a dive UP TO the 7:17 ratio.
YAY, my thesis has been submitted and approved for publication. It shall now be available to the public within 4 or 5 days (edit: here it is!)
In other news:
– My family is extremely strange.
– My internet addiction is not as bad as I thought it was.
– My father, on the other hand, is severely addicted to Facebook. It doesn’t help that he’s also Captain Hunt n’ Peck when it comes to typing and therefore takes an hour and a half to write a single wall post.
– Eggplant hummus from Trader Joe’s is phenomenal.
– Broccoli may be the best food ever.
– Eggplant hummus from Trader Joe’s + broccoli = OMG TASTE BUDS ASPLODE
– I suck at swimming.
– I finally found The Adventures of Augie March by Saul Bellow, a book that has been perpetually checked out at every library I’ve ever been to.
– I will perhaps make a video log of the cruise, I’m not sure yet.
– It is SO SUNNY HERE. My body doesn’t know what to do with all this vitamin D.
– The end!
HOLY SHIT IT’S OVER.
I successfully defended my thesis this afternoon. Received a nice high grade and was told it was PhD-level work.
Relief level: beyond belief.
Sanity level: depleted, but recharging slowly.
To-do list for the next few days: absolutely nothing, except incessant Fallout playing.
My revisions are limited to typos and a few additional references to check out, then I’m set to turn it in to Grad Studies and get it out there.
I’m going to go do something mindless for a while. Because I finally can.
Longer entry on thesis conclusion to come later, promise.
Kill me now. I honestly don’t know if I’ll make it through tomorrow.
Anybody want to come to a party at my house tonight? We’ll be talking about structural equation modeling. Model fit, in particular.
My poor mother will have to undergo three semesters’ worth of math and stats in about half an hour so she can be caught up on my research and follow my defense presentation well enough to ask questions.
The fact that she agreed to do this is one of like five billion reasons why I love her.
Anyway, you all should come. There will be a huge whiteboard and a lot of lambdas.
Be there or be square.
Thesis = done.
It shall be turned in tomorrow.
Right now I’m going to not do anything productive (read: play Fallout and not work on my defense slides) and maybe clean.
Due to yesterday’s events, my mom shall be coming up here tomorrow.
A month from now, I will be defending my thesis.
Scared all to hell.
Solution: apartment temperature cranked to 78 degrees, Top Chef, making pretty graphs for my thesis, and not going to campus.
Why isn’t there a MyLifeIsPathetic.com yet?