Week 14: The Median Absolute Deviation Test for Identifying Outliers
Today we’re going to do something a little bit different by talking about the median absolute deviation test for identifying outliers!
When Would You Use It?
The median absolute deviation test for identifying outliers is used to determine whether or not a specific score in a sample of n observations should be classified as an outlier.
What Type of Data?
The median absolute deviation test for identifying outliers requires interval or ratio data.
Test Assumptions
None listed.
Test Process
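The equation employed in this test is as follows:
$$\frac{|X - M|}{\mathrm{MAD}} > \mathrm{Max}$$
Here X is the score being tested, M is the median of the dataset, and MAD is the median absolute deviation. The steps are: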
Step 1: Compute the median M of the dataset.
Step 2: Compute the median absolute deviation, or MAD. To do so:
a) Calculate the absolute values of the difference between each score and the median.
b) Arrange these absolute deviations in order from lowest to highest.
c) Find the median of these absolute deviations; this is the MAD value.
Step 3: Choose the Max value. The choice is somewhat arbitrary, but a recommended value is Max = 5. If the data are assumed to come from an approximately normal distribution, the MAD is about 0.6745 standard deviations, so Max = 5 flags scores that lie more than roughly 3.4 standard deviations from the median. Scores that far out are very likely to be extreme or outlier scores.
Step 4: Plug each X value into the equation to determine whether it is an outlier. X is an outlier if the left-hand side of the equation exceeds the Max value. If doing this test by hand, the best way to go about this step is to start with the X that deviates the most from the median and work down from there, stopping at the first X that is not an outlier. If using a program, it’s easy enough to just test them all at once.
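To make the four steps concrete, here is a minimal sketch in R on a small made-up vector (the numbers are invented for illustration and are not the song data used below). It flags a score when |X - M| / MAD exceeds Max, testing every score at once rather than working down from the most extreme one.

```r
# Toy data, made up for illustration only
x <- c(2, 3, 4, 5, 6, 7, 30)

# Step 1: median of the data
M <- median(x)                  # 5

# Step 2: median of the absolute deviations from M
absdev <- abs(x - M)            # 3 2 1 0 1 2 25
MAD <- median(absdev)           # 2

# Step 3: cutoff
Max <- 5

# Step 4: test every score at once
score <- absdev / MAD           # 1.5 1.0 0.5 0.0 0.5 1.0 12.5
outliers <- x[score > Max]
print(outliers)                 # [1] 30
```

Only the value 30 is flagged: it sits 12.5 MADs from the median, while every other score is within 1.5 MADs.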
Example
Today’s data is from my 2013 music. I have the lengths (in seconds) of all n = 365 songs from that year, and I want to determine which values are outliers.
Computations:
M = 226
MAD = 36
With Max = 5, I found that the songs with the following lengths are outliers:
891
564
636
516
580
534
597
574
537
595
486
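As a quick sanity check on these results, the rule can be turned into a cutoff on the raw song lengths: a song is flagged when its length is more than Max × MAD = 5 × 36 = 180 seconds away from the median of 226. The sketch below uses only the values reported above; the variable names `lower`, `upper` and `listed` are my own.

```r
M   <- 226   # median reported above
MAD <- 36    # median absolute deviation reported above
Max <- 5

lower <- M - Max * MAD   # 46 seconds
upper <- M + Max * MAD   # 406 seconds

# The eleven song lengths listed above
listed <- c(891, 564, 636, 516, 580, 534, 597, 574, 537, 595, 486)
all(listed > upper)      # TRUE: every listed song is longer than 406 seconds
```

So all eleven outliers are unusually long songs; for a song to be flagged on the short side it would have to run under 46 seconds.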
This was done using R; the code is below.
Example in R
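x = read.table('clipboard', header=TRUE)[[1]] # data: first column as a numeric vector
M = median(x)                # Step 1: median of the data
absdev = abs(x - M)          # Step 2a: absolute deviations from the median
MAD = median(absdev)         # Step 2c: median absolute deviation
Max = 5                      # Step 3: cutoff
for (i in 1:length(x)) {     # if an x value is an outlier, this loop will print its value
  dev = abs(x[i] - M) / MAD  # left-hand side of the equation
  if (dev > Max) { print(x[i]) }
}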
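If you would rather not write the loop, the same test can be sketched as a small vectorized function. This is one possible version, not the code used for the results above: the function name `mad_outliers` is hypothetical, and it relies on R's built-in `mad()` with `constant = 1`, because by default `mad()` multiplies the result by 1.4826 and would not match the raw MAD used in this test. The guard against MAD = 0 is my own addition; it covers the case where at least half of the scores equal the median, which would otherwise mean dividing by zero.

```r
# Return the values of x that the MAD test flags as outliers.
# Hypothetical helper; Max defaults to the recommended value of 5.
mad_outliers <- function(x, Max = 5) {
  M <- median(x)
  MAD <- mad(x, center = M, constant = 1)  # raw median absolute deviation
  if (MAD == 0) {
    stop("MAD is zero, so the test statistic is undefined")
  }
  x[abs(x - M) / MAD > Max]
}

mad_outliers(c(10, 12, 11, 13, 12, 95))  # returns 95
```

In that small example the median is 12 and the MAD is 1, so 95 sits 83 MADs from the median and is the only value flagged.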
