Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Hard-disk drives (HDDs) are generally used in big-data analysis, and the effectiveness of the Hadoop platform can be optimized by enhancing its I/O performance. HDD performance varies depending on whether the data are stored in the inner or outer disk zones. This paper proposes a method that utilizes the knowledge of job characteristics to realize efficient data storage in HDDs, which in turn, helps improve Hadoop performance. Per the proposed method, job files that need to be frequently accessed are stored in outer disk tracks which are capable of facilitating sequential-access speeds that are higher than those provided by inner tracks. Thus, the proposed method stores temporary and permanent files in the outer and inner zones, respectively, thereby facilitating fast access to frequently required data. Results of performance evaluation demonstrate that the proposed method improves Hadoop performance by 15.4% when compared to normal cases when file placement is not used. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.
Makoto NAKAGAMI
Kogakuin University
Jose A.B. FORTES
University of Florida
Saneyasu YAMAGUCHI
Kogakuin University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Makoto NAKAGAMI, Jose A.B. FORTES, Saneyasu YAMAGUCHI, "Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 10, pp. 2083-2093, October 2020, doi: 10.1587/transinf.2019EDP7337.
Abstract: Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Hard-disk drives (HDDs) are generally used in big-data analysis, and the effectiveness of the Hadoop platform can be optimized by enhancing its I/O performance. HDD performance varies depending on whether the data are stored in the inner or outer disk zones. This paper proposes a method that utilizes the knowledge of job characteristics to realize efficient data storage in HDDs, which in turn, helps improve Hadoop performance. Per the proposed method, job files that need to be frequently accessed are stored in outer disk tracks which are capable of facilitating sequential-access speeds that are higher than those provided by inner tracks. Thus, the proposed method stores temporary and permanent files in the outer and inner zones, respectively, thereby facilitating fast access to frequently required data. Results of performance evaluation demonstrate that the proposed method improves Hadoop performance by 15.4% when compared to normal cases when file placement is not used. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.
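The method's premise is that an HDD's sequential throughput is highest in the outer zones, which on conventional drives are mapped to the lowest logical block addresses (LBAs). The following Python sketch is not code from the paper; it merely illustrates this zone-dependent throughput by timing sequential reads near the start and end of a raw block device. The device path and sample sizes are assumptions; run it against an idle HDD (not an SSD) with root privileges, and drop the page cache first (e.g., echo 3 > /proc/sys/vm/drop_caches) for meaningful numbers.

# Minimal sketch (assumed setup, not the paper's implementation): compare
# sequential read throughput of the outer zone (low LBA) and inner zone
# (high LBA) of an HDD.
import os
import time

DEVICE = "/dev/sdb"   # assumed HDD block device; adjust to your system
CHUNK = 1 << 20       # 1 MiB per read
TOTAL = 256 << 20     # sample 256 MiB per zone

def sequential_read_mbps(device, offset):
    """Sequentially read TOTAL bytes starting at `offset`; return MB/s."""
    fd = os.open(device, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        start = time.perf_counter()
        remaining = TOTAL
        while remaining > 0:
            data = os.read(fd, min(CHUNK, remaining))
            if not data:          # reached end of device
                break
            remaining -= len(data)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (TOTAL - remaining) / (1024 * 1024) / elapsed

if __name__ == "__main__":
    fd = os.open(DEVICE, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)   # total device capacity in bytes
    os.close(fd)
    outer = sequential_read_mbps(DEVICE, 0)             # outer zone (low LBA)
    inner = sequential_read_mbps(DEVICE, size - TOTAL)  # inner zone (high LBA)
    print(f"outer zone: {outer:.1f} MB/s, inner zone: {inner:.1f} MB/s")

On typical 3.5-inch drives the outer-zone figure is commonly around 1.5-2x the inner-zone figure; this asymmetry is what the proposed placement of temporary (frequently accessed) files in outer zones and permanent files in inner zones exploits.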
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7337/_p
@ARTICLE{e103-d_10_2083,
author={Makoto NAKAGAMI and Jose A.B. FORTES and Saneyasu YAMAGUCHI},
journal={IEICE TRANSACTIONS on Information},
title={Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance},
year={2020},
volume={E103-D},
number={10},
pages={2083-2093},
abstract={Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Hard-disk drives (HDDs) are generally used in big-data analysis, and the effectiveness of the Hadoop platform can be optimized by enhancing its I/O performance. HDD performance varies depending on whether the data are stored in the inner or outer disk zones. This paper proposes a method that utilizes the knowledge of job characteristics to realize efficient data storage in HDDs, which in turn, helps improve Hadoop performance. Per the proposed method, job files that need to be frequently accessed are stored in outer disk tracks which are capable of facilitating sequential-access speeds that are higher than those provided by inner tracks. Thus, the proposed method stores temporary and permanent files in the outer and inner zones, respectively, thereby facilitating fast access to frequently required data. Results of performance evaluation demonstrate that the proposed method improves Hadoop performance by 15.4% when compared to normal cases when file placement is not used. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.},
keywords={},
doi={10.1587/transinf.2019EDP7337},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance
T2 - IEICE TRANSACTIONS on Information
SP - 2083
EP - 2093
AU - Makoto NAKAGAMI
AU - Jose A.B. FORTES
AU - Saneyasu YAMAGUCHI
PY - 2020
DO - 10.1587/transinf.2019EDP7337
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2020
AB - Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Hard-disk drives (HDDs) are generally used in big-data analysis, and the effectiveness of the Hadoop platform can be optimized by enhancing its I/O performance. HDD performance varies depending on whether the data are stored in the inner or outer disk zones. This paper proposes a method that utilizes the knowledge of job characteristics to realize efficient data storage in HDDs, which in turn, helps improve Hadoop performance. Per the proposed method, job files that need to be frequently accessed are stored in outer disk tracks which are capable of facilitating sequential-access speeds that are higher than those provided by inner tracks. Thus, the proposed method stores temporary and permanent files in the outer and inner zones, respectively, thereby facilitating fast access to frequently required data. Results of performance evaluation demonstrate that the proposed method improves Hadoop performance by 15.4% when compared to normal cases when file placement is not used. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.
ER -