The original Seven Essays, in a cool, new outfit ...

Removing Outliers from Data

While exploring data collected for an investigation of the resources needed for software testing and improvement, you have found an outlier. The study focuses on two variables: the number of bugs found in the code and the time (in person-hours) required to fix them. The outlier corresponds to a very large project with many bugs that took a large amount of time to repair, and the numbers appear to be correct. Your supervisor recalls something about ‘outliers being bad’ and recommends that you remove this observation from the data set. A co-worker comments that the outlier actually seems typical for a large project. All agree that large projects are indeed part of the universe under study. Should the outlier be removed? Justify your answer.
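
One way to examine the co-worker's comment is to fit a straight line of person-hours against bug count with and without the large project, and to see how closely the line fitted to the other projects predicts it. The sketch below is only an illustration of that check: the numbers are invented (the essay supplies no data), and the names `projects`, `large_project`, and `fit_line` are hypothetical. It does not settle the question by itself; it only shows one piece of evidence that could support a justification.

```python
# Hypothetical (bugs, person-hours) pairs, invented purely for illustration.
projects = [(10, 42), (15, 58), (22, 91), (30, 118), (41, 170), (55, 214)]
# The "outlier": a very large project, far from the rest in both variables.
large_project = (400, 1610)


def fit_line(points):
    """Ordinary least-squares fit of hours = intercept + slope * bugs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Fit once without the large project and once with it.
intercept_without, slope_without = fit_line(projects)
intercept_with, slope_with = fit_line(projects + [large_project])

# How well do the smaller projects alone predict the large one?
predicted_hours = intercept_without + slope_without * large_project[0]
relative_gap = abs(large_project[1] - predicted_hours) / large_project[1]

print(f"slope without large project: {slope_without:.2f} hours per bug")
print(f"slope with large project:    {slope_with:.2f} hours per bug")
print(f"predicted hours for large project: {predicted_hours:.0f}")
print(f"relative gap to observed hours:    {relative_gap:.1%}")
```

With these made-up numbers the two slopes are close and the large project falls near the line fitted to the smaller ones, which is the pattern the co-worker describes. With real data the comparison could come out differently, so the check should be run on the actual observations before drawing any conclusion.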