R Stuff!!!!!1!!!!11one11!


HI GUYS!

I have something cool today. (Well, not really. It’s cool to me, at least, but that means absolutely nothing as far as its actual coolness goes. And it’s R code, so who knows if that is really anything interesting to any of you out there. It’s not even the code itself; I’m just showing the results. Whatevs, just read. Or don’t!)

Now that I’ve parentheticalled you all to death, here’s the story:

One of the ladies in the office next to ours is a second-year Master’s student working on her thesis. She has a lot of huge matrices of data and is doing a lot of the matrix construction in R. I don’t know her research very well or why she had to remedy this particular problem, but today she came to me with a question. She has a matrix that looks like this (this is just an example of the first few rows):

[Image: the first few rows of her matrix, with columns p1 through p6, u, and y]

The columns p1 through p6 contain probabilities based on some distribution (I can’t remember which one, it was a weird one), the column u contains values drawn from a uniform distribution between 0 and 1, and the y column contains values based on where u falls relative to the other columns. For example, if the u probability is greater than p2 but smaller than p1 for a specific row, that row’s y value is 2. If the u probability for another row is greater than p4 but smaller than p5, that row’s y value is a 4. Things like that.

The problem, though, is that because of the distribution from which the p1-p6 values are drawn, there are a lot more 1’s and 2’s in the resulting y column than there are 3’s, 4’s, 5’s, etc. So she wanted to know if there was an easy way to “even out” the distribution of the y numbers so that their frequencies are approximately equal (that is, there are about as many 1’s as 2’s, 2’s as 3’s, 3’s as 4’s, etc.) while still being initially based on the p1-p6 values.
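For anyone who wants to poke at this kind of setup, here is a minimal sketch in R of a matrix like the one described. It is not her code or mine, and it makes some loud assumptions: I treat p1 through p6 as running (cumulative) totals that end at 1, I invent a skewed set of probabilities to build them from, and I set y to the index of the first p column that u falls below. The real rule in her thesis may well differ.

```r
set.seed(1)
n <- 8  # just a few rows, like the example

# Assumption: made-up per-row category probabilities, skewed toward 1 and 2
raw <- matrix(rexp(n * 6), n, 6) * rep(c(6, 5, 3, 2, 1, 1), each = n)
probs <- raw / rowSums(raw)  # each row now sums to 1

# Assumption: p1..p6 are running totals across each row
p <- t(apply(probs, 1, cumsum))
p[, 6] <- 1  # guard against floating-point sums landing just under 1
colnames(p) <- paste0("p", 1:6)

# One uniform draw per row
u <- runif(n)

# Assumption: y is the index of the first threshold that u falls below
# (count the thresholds u has reached, then add 1)
y <- rowSums(u >= p) + 1

m <- cbind(p, u = u, y = y)
print(round(m, 3))
```

Because the made-up probabilities lean toward the first two columns, most rows end up with a y of 1 or 2, which is the lopsidedness she was running into.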

Because of a few other stipulations, it took me a while to work it out, but I finally got some code that did it! To test it, I wrote some other code that generated a matrix similar to hers:

[Image: the first few rows of the generated test matrix]

Here are the frequencies of the numbers in the y column prior to applying my fixing code:

[Image: frequency table of the y values before the fix]

And after:

[Image: frequency table of the y values after the fix]

Yay! I hope it’s what she wanted.
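In case a concrete version helps, here is one way the “evening out” could be sketched in R. To be clear, this is not the code from this post (which had to satisfy a few other stipulations); it is a stand-in technique I am assuming for illustration. The idea is to repeatedly divide each probability column by its average and then rescale each row to sum to 1 (a form of iterative proportional fitting), so every category ends up with roughly the same average probability while each row still starts from its original values. The names `draw_y` and `rebalance`, the skew weights, and the 50 iterations are all hypothetical.

```r
set.seed(42)
n <- 10000
k <- 6

# Assumption: made-up per-row category probabilities, skewed toward 1 and 2
raw <- matrix(rexp(n * k), n, k) * rep(c(6, 5, 3, 2, 1, 1), each = n)
probs <- raw / rowSums(raw)

# Hypothetical helper: turn probabilities into running totals, then set
# y to the index of the first running total that u falls below
draw_y <- function(probs, u) {
  cum <- t(apply(probs, 1, cumsum))
  cum[, ncol(cum)] <- 1  # guard against sums landing just under 1
  rowSums(u >= cum) + 1
}

# Hypothetical helper: alternate between flattening the column averages
# and rescaling each row back to a total of 1
rebalance <- function(probs, iterations = 50) {
  for (i in seq_len(iterations)) {
    probs <- sweep(probs, 2, colMeans(probs), "/")
    probs <- probs / rowSums(probs)
  }
  probs
}

u <- runif(n)
y_before <- draw_y(probs, u)
y_after <- draw_y(rebalance(probs), u)

# Frequencies of each y value before and after rebalancing
print(table(factor(y_before, levels = 1:k)))
print(table(factor(y_after, levels = 1:k)))
```

The same u values are reused for both draws, so the only thing that changes between the two tables is the adjusted probabilities.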
