Anomaly detection on big data system logs using deep learning

Uday R Soni

Open access

Master of Science (MS), California State University, Sacramento

03/01/2021

Handle: https://hdl.handle.net/10211.3/218647

Keywords: HDFS logs; Log parsing; Convolutional neural network (ConvNets)

Abstract

Big data systems are widely used across industries to handle enormous amounts of data. The Hadoop Distributed File System (HDFS) is one example. Such systems run on hundreds of machines operating in parallel to process huge volumes of data, so any failure or downtime can mean losing valuable data and insight, or losing valuable processing time. The logs these systems generate can be used to detect issues or failures, which are treated as anomalies. The identified anomalies provide insight for patching the system and improving its overall performance. As logs grow in size and complexity, manually detecting the errors, warnings, and exceptions associated with anomalies or failures becomes a daunting task, and manual analysis cannot keep pace with the incoming data.
Machine learning is a highly promising approach for automating anomaly detection. Traditional machine learning methods such as logistic regression, decision trees, and support vector machines (SVMs) can be applied to anomaly detection, but they generally do not match deep learning in accuracy and training time. SVMs are popular in areas such as intrusion detection and anomaly detection, but they struggle when the data is massive and has many features. Deep learning methods are widely used because of their broad applicability, the variety of available architectures, and frameworks that make models simple to implement. They also have an advantage over traditional machine learning techniques in accuracy, in extracting knowledge from data, and in training time. Some researchers have used deep learning methods such as long short-term memory (LSTM) networks to learn log sequences and patterns for anomaly detection, but LSTMs require longer training and extensive hyperparameter tuning.
Therefore, this project focuses on other deep learning methods, namely the multi-layer perceptron (MLP) and convolutional neural networks (ConvNets), to detect anomalies through pattern analysis and recognition. The project analyzes and parses the log data to extract useful information from the log messages. It then implements an MLP, a ConvNet, and a multi-head ConvNet, and uses these models' pattern-recognition ability to detect anomalies in the parsed data.
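The abstract does not describe how the logs were parsed. As a rough illustration only, the sketch below shows one common way HDFS logs are turned into model inputs: strip the log header, mask variable fields (block IDs, IP addresses, paths, numbers) to obtain an event template, group events by block ID, and count how often each template occurs per block. The regular expressions, the function names (`to_template`, `build_event_count_matrix`), and the sample lines are hypothetical and not taken from the project; real pipelines usually rely on a dedicated template miner such as Drain rather than hand-written masks.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample lines shaped like HDFS logs (illustrative, not project data).
SAMPLE_LOGS = [
    "081109 203518 143 INFO dfs.DataNode$DataXceiver: Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106 dest: /10.250.19.102:50010",
    "081109 203518 35 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /mnt/hadoop/mapred/system/job_200811092030_0001/job.jar. blk_-1608999687919862906",
    "081109 203519 143 INFO dfs.DataNode$DataXceiver: Receiving block blk_7503483334202473044 src: /10.251.215.16:55695 dest: /10.251.215.16:50010",
    "081109 203520 145 WARN dfs.DataNode$DataXceiver: Exception writing block blk_7503483334202473044 to mirror 10.251.71.16:50010",
]

# Assumed header layout: date, time, pid, level, component, then free-text content.
HEADER_RE = re.compile(r"^(\d{6}) (\d{6}) (\d+) (\w+) ([\w.$]+): (.*)$")
BLOCK_RE = re.compile(r"blk_-?\d+")


def to_template(content):
    """Mask variable fields so lines of the same event type share one template.

    Hypothetical masking rules; order matters (block IDs before plain numbers).
    """
    t = BLOCK_RE.sub("<BLK>", content)
    t = re.sub(r"/?\d+\.\d+\.\d+\.\d+(:\d+)?", "<IP>", t)  # IPv4 with optional port
    t = re.sub(r"(/[\w.\-]+)+", "<PATH>", t)                # file-system paths
    t = re.sub(r"\b\d+\b", "<NUM>", t)                      # any remaining numbers
    return t


def build_event_count_matrix(lines):
    """Group events by block ID and count template occurrences per block."""
    template_ids = {}              # template string -> column index
    sequences = defaultdict(list)  # block ID -> ordered list of template indices
    for line in lines:
        m = HEADER_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed header format
        content = m.group(6)
        idx = template_ids.setdefault(to_template(content), len(template_ids))
        for blk in set(BLOCK_RE.findall(content)):
            sequences[blk].append(idx)
    # One fixed-length count vector per block, one column per template.
    matrix = {}
    for blk, seq in sequences.items():
        counts = Counter(seq)
        matrix[blk] = [counts.get(i, 0) for i in range(len(template_ids))]
    return template_ids, dict(sequences), matrix


if __name__ == "__main__":
    templates, seqs, matrix = build_event_count_matrix(SAMPLE_LOGS)
    for tpl, i in templates.items():
        print(i, tpl)
    for blk, row in matrix.items():
        print(blk, row)
```

Each row of `matrix` is a fixed-length vector that an MLP could consume directly, while each ordered list in `sequences` preserves event order, which is the kind of input a one-dimensional ConvNet could scan for local patterns. The abstract does not state which representation the project used.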

Details

Title: Anomaly detection on big data system logs using deep learning
Creators: Uday R Soni
Contributors: Xiaoyan Sun (Advisor) - California State University, Sacramento, Computer Science Department
Jun Dai (Committee Member) - California State University, Sacramento, Computer Science Department
Academic Unit: Computer Science Department
Theses and Dissertations: Master of Science (MS); Computer Science; California State University, Sacramento; 12/2020
Publication Details: 03/01/2021
Identifiers: 99257889214901671; https://hdl.handle.net/10211.3/218647
Resource Type: Masters Project
Language: English