Abstract

Marlina Abdul Latib, Saiful Adli Ismail, Haslina Md Sarkan, Rasimah Che Mohd Yusoff

ICRIIS 2015, December 2015, Melaka

Log analysis is a crucial process in most system and network activities, where log data is used for purposes such as performance monitoring, security auditing, reporting, and profiling. However, over the years, the volume of log data has grown along with the size of systems and the number of users involved. Traditional log analyzer tools are not able to handle this massive amount of data, and Big Data technologies offer a way to overcome this limitation. The main purpose of this paper is to present a review of log file analysis in a Big Data environment based on previous research works. The paper also highlights the characteristics of Big Data and of the Hadoop framework, which has been widely used for Big Data applications. Results from the reviewed papers show that the majority of researchers applied MapReduce as the main Hadoop component for analyzing log files and HDFS for data storage. Previous researchers have also used other tools and algorithms together with the Hadoop framework for analysis purposes. The findings of this paper provide a comprehensible review of Hadoop's usage and performance in analyzing different types of log files, and offer understandable results for end users to draw on in future work.

Keywords: big data, big data characteristics, data preprocessing, Hadoop, HDFS, log analysis, log files, MapReduce