Week 14: The Median Absolute Deviation Test for Identifying Outliers


Today we’re going to do something a little bit different by talking about the median absolute deviation test for identifying outliers!

When Would You Use It?
The median absolute deviation test for identifying outliers is used to determine whether or not a specific score in a sample of n observations should be classified as an outlier.

What Type of Data?
The median absolute deviation test for identifying outliers requires interval or ratio data.

Test Assumptions
None listed.

Test Process
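The equation employed in this test is as follows:

$$\frac{|X - M|}{MAD} > Max$$

Here X is an individual score, M is the median of the dataset, MAD is the median absolute deviation, and Max is the cutoff chosen in Step 3.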

Step 1: Compute the median M of the dataset.

Step 2: Compute the median absolute deviation, or MAD. To do so:

a) Calculate the absolute values of the difference between each score and the median.
b) Arrange these absolute deviations in order from lowest to highest.
c) Find the median of these absolute deviations; this is the MAD value.
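
As a quick sketch of Steps 1 and 2, here's the computation on a small made-up vector (the values are hypothetical, not from any real dataset). One thing to watch for: R's built-in mad() function multiplies by a constant of 1.4826 by default, so you need constant = 1 to get the raw MAD described here.

```r
# Hypothetical scores, just for illustration
x <- c(2, 4, 4, 5, 7, 9, 30)

M <- median(x)                  # Step 1: median of the data
absdev <- sort(abs(x - M))      # Steps 2a-2b: absolute deviations, sorted
MAD <- median(absdev)           # Step 2c: median of the absolute deviations

# Built-in check: constant = 1 turns off the default 1.4826 scaling
MAD_builtin <- mad(x, constant = 1)

print(c(M = M, MAD = MAD, MAD_builtin = MAD_builtin))
```

Sorting in Step 2b isn't strictly needed in R, since median() handles the ordering itself; it's only there to mirror the by-hand steps.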

Step 3: Determine the Max value. While the selection of this value is somewhat arbitrary, a recommended choice is Max = 5. If the data are assumed to come from an approximately normal distribution, this value is very likely to flag only extreme scores as outliers.
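
To get a feel for what Max = 5 means under normality, here's a rough sketch. It relies on the standard fact that for normally distributed data the MAD is about 0.6745 standard deviations (the 75th percentile of the standard normal). The variable names are my own, and this is only an illustration of the scale of the cutoff, not part of the test itself.

```r
# For a normal distribution, MAD = qnorm(0.75) * sigma (about 0.6745 * sigma)
mad_per_sd <- qnorm(0.75)

Max <- 5
cutoff_in_sd <- Max * mad_per_sd     # how many SDs from the median Max = 5 is

# Chance that a truly normal score lands beyond the cutoff (both tails)
p_flag <- 2 * pnorm(-cutoff_in_sd)

print(round(cutoff_in_sd, 2))        # about 3.37
print(p_flag)
```

So with Max = 5, a score has to sit more than about 3.4 standard deviations from the median before it gets flagged, which is rare for normal data.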

Step 4: Plug each X value into the equation to determine whether it is an outlier. X is an outlier if the left-hand side of the equation exceeds the Max value. If doing this test by hand, the best way to go about this step is to start with the X that deviates the most from the median and work down from there, stopping at the first X that isn't flagged. If using a program, it’s easy enough to just test them all at once.
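
If you're using a program, Step 4 can be done in one vectorized line instead of a loop. Here's a minimal sketch: mad_outliers is a hypothetical helper name, the data are made up, and the check for MAD = 0 is my own addition (the test statistic is undefined when more than half the scores are identical).

```r
# Hypothetical helper: returns the values of x flagged as outliers
mad_outliers <- function(x, Max = 5) {
  M <- median(x)                       # Step 1
  MAD <- median(abs(x - M))            # Step 2
  if (MAD == 0) stop("MAD is zero, so the test statistic is undefined")
  x[abs(x - M) / MAD > Max]            # Step 4: keep scores exceeding Max
}

x <- c(2, 4, 4, 5, 7, 9, 30)           # made-up scores
print(mad_outliers(x))                 # 30
```

Here the median is 5 and the MAD is 2, so the score of 30 gives |30 - 5| / 2 = 12.5, which exceeds 5. The next most extreme score, 9, gives only 2.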

Example
Today’s data is from my 2013 music. I have the lengths (in seconds) of all n = 365 songs from that year, and I want to determine which values are outliers.

Computations:
M = 226
MAD = 36

With Max = 5, I found that the songs with the following lengths are outliers:

891
564
636
516
580
534
597
574
537
595
486
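
As a sanity check on that list, the cutoffs can be worked out directly from M = 226, MAD = 36, and Max = 5. This is just arithmetic on the reported values; the variable names are my own.

```r
M <- 226; MAD <- 36; Max <- 5    # values from the computations above

upper <- M + Max * MAD           # 406: songs longer than this are outliers
lower <- M - Max * MAD           # 46: songs shorter than this would be too

# Test statistic for the longest and shortest flagged songs
stat_891 <- abs(891 - M) / MAD   # about 18.47
stat_486 <- abs(486 - M) / MAD   # about 7.22

print(c(lower = lower, upper = upper))
```

Every length in the list is above 406 seconds, so all of the flagged songs are unusually long ones.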

This was done using R; the code is below.

Example in R

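x = read.table('clipboard', header=TRUE)[[1]]  # data: first column as a numeric vector
M = median(x)                   # Step 1: median
absdev = abs(x - M)             # Step 2a: absolute deviations from the median
MAD = median(absdev)            # Step 2c: median absolute deviation
Max = 5                         # Step 3: cutoff value
for (i in seq_along(x)){        # if an x value is an outlier, this loop will
  dev = abs(x[i] - M)/MAD       # print its value
  if (dev > Max) { print(x[i]) }}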