DIAGNOSING VIRTUALIZED HADOOP PERFORMANCE FROM BENCHMARK RESULTS: AN EXPLORATORY STUDY

Authors

  • Sudhir Allam, Sr. Data Scientist, Department of Information Technology, USA

Keywords:

Artificial Intelligence, Hadoop, Virtualized Hadoop

Abstract

This article explores the importance of virtualization technologies in Hadoop, a platform now widely adopted by businesses to improve performance on broad data sets. Hadoop benefits from virtualization in several ways, including increased resource availability and cluster stability, and customers can request virtual resources such as CPU, RAM, and disks from service providers (e.g., Amazon) on a pay-as-you-go basis [1]. These advantages, however, are meaningless to consumers if unreasonable performance loss occurs when moving from a physical to a virtual platform. Existing research on virtualized Hadoop performance shows that inappropriate network and storage settings in open-source virtual deployments cause significant performance degradation; yet, because of the complexity of hardware and applications, virtualization setups, and deployment scales, performance tuning remains an extremely difficult practice [1]. To bridge the virtualized Hadoop implementation gap, this paper recommends a performance diagnostic approach that incorporates statistical analysis at several levels, along with a heuristic performance diagnostic tool that evaluates the reliability and accuracy of virtualized Hadoop by tracking worker traces from common big data benchmarks. Using the insights this tool provides, users can readily detect bottlenecks, validate the assessment against performance data generated by the guest OS and hypervisor, and continue optimizing virtualized Hadoop by running the tool several times.

Virtualization systems are generally used by administrators to maximize resource use while lowering operational costs, and they fall into two classes. The first is heavy virtualization, based on the principle of virtual machines (VMs): each VM replicates the hardware and runs its own operating system (OS), entirely independent of the host OS [2]. The second is light virtualization, based on containers, which share the host OS kernel while maintaining isolation [2]. This paper examines the efficiency of Hadoop software that utilizes these virtualization technologies.
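The heuristic diagnosis described above can be sketched in a minimal form: aggregate per-resource utilization from a benchmark run and flag the most saturated resource as the likely bottleneck. The metric names, threshold, and function below are illustrative assumptions, not the paper's actual tool.

```python
# Hypothetical sketch of heuristic bottleneck diagnosis from benchmark
# metrics (e.g., averaged utilization collected during a TeraSort run
# on a virtualized Hadoop cluster). Values are fractions in [0.0, 1.0].

def diagnose(metrics, threshold=0.85):
    """Return the resource with the highest average utilization if it
    exceeds the saturation threshold; otherwise None (no clear bottleneck)."""
    resource, util = max(metrics.items(), key=lambda kv: kv[1])
    return resource if util >= threshold else None

# Example run: disk I/O is near saturation, so it is flagged as the
# probable bottleneck; the user would then validate this against
# guest-OS and hypervisor counters, retune, and rerun the benchmark.
run = {"cpu": 0.42, "memory": 0.55, "disk_io": 0.93, "network": 0.61}
print(diagnose(run))  # prints "disk_io"
```

In the workflow the abstract outlines, such a check would be repeated after each tuning step, with the guest OS and hypervisor counters serving as independent validation of the flagged resource.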

Published

2019-07-25

Section

Articles