Abstract
Mining outliers has become more and more important in recent years. It has wide applications in military surveillance for enemy activities, detection of potential terrorist attacks, credit card fraud detection, network intrusion, computer virus attack, clinical trials, severe weather prediction, athlete performance analysis, and many other data mining tasks. In today's big data age, multivariate data sets are very complex. Variables among different dimensions are usually correlated with different variations. Classical data mining methods with Euclidean distance measure are not working well for mining multivariate outliers. In this study, we propose a normal mixture model-based framework of multivariate outlier detection. We fit the model parameters through the robust EM algorithm. The K-means clustering algorithm is used to provide the initial inputs for the EM algorithm. The well-know Mahalanobis distance is used to determine the cutoff points for outlier detection via the chi-square distribution critical values. Implementation details of this framework are also discussed.