When applying estimation methods, the issue of outliers is unavoidable. The extent of their influence has not been clarified, although several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and how much care is required when collecting project data. Therefore, the goal of this study is to present a guideline that suggests how sensitively outliers should be handled. In the analysis, we experimentally added outliers to three datasets to analyze their influence. We varied the percentage of outliers, their extent (e.g., the actual effort was changed from 100 to 200 person-hours when the extent was 100%), the variables containing outliers (e.g., adding outliers to function points or to effort), and the locations of the outliers in a dataset. Effort was then estimated using these datasets. We used multiple linear regression analysis and analogy-based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, linear regression analysis was less affected by outliers than analogy-based estimation.
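The outlier-injection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `inject_outliers`, the uniformly random choice of which projects become outliers, and the purely upward multiplicative inflation are all assumptions made here. The only detail taken from the abstract is that an extent of 100% turns an effort of 100 person-hours into 200. The study also varies where outliers are placed in a dataset, which this sketch does not model.

```python
import random


def inject_outliers(values, percentage, extent, seed=0):
    """Return (modified_values, outlier_indices).

    percentage: fraction of entries to turn into outliers (0.2 means 20%).
    extent: relative inflation of each chosen entry (1.0 means +100%,
            so an effort of 100 person-hours becomes 200).
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    n_outliers = round(len(values) * percentage)
    # Assumption: outlier positions are drawn uniformly at random.
    chosen = set(rng.sample(range(len(values)), n_outliers))
    modified = [v * (1.0 + extent) if i in chosen else v
                for i, v in enumerate(values)]
    return modified, sorted(chosen)


# Hypothetical effort values in person-hours, not taken from the paper.
effort = [100.0, 250.0, 80.0, 400.0, 120.0, 300.0, 90.0, 150.0, 220.0, 60.0]
noisy, outlier_idx = inject_outliers(effort, percentage=0.2, extent=1.0)
print(outlier_idx)
print(noisy)
```

With `percentage=0.2` and `extent=1.0`, two of the ten hypothetical projects have their effort doubled, which corresponds to the "considerable" setting named in the abstract. The same function could be applied to a function-point column instead of effort to change which variable carries the outliers.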
Kenichi ONO
Nara Institute of Science and Technology
Masateru TSUNODA
Kindai University
Akito MONDEN
Okayama University
Kenichi MATSUMOTO
Nara Institute of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kenichi ONO, Masateru TSUNODA, Akito MONDEN, Kenichi MATSUMOTO, "Influence of Outliers on Estimation Accuracy of Software Development Effort" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 1, pp. 91-105, January 2021, doi: 10.1587/transinf.2020MPP0005.
Abstract: When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020MPP0005/_p
@ARTICLE{e104-d_1_91,
author={Kenichi ONO and Masateru TSUNODA and Akito MONDEN and Kenichi MATSUMOTO},
journal={IEICE TRANSACTIONS on Information},
title={Influence of Outliers on Estimation Accuracy of Software Development Effort},
year={2021},
volume={E104-D},
number={1},
pages={91-105},
abstract={When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.},
keywords={},
doi={10.1587/transinf.2020MPP0005},
ISSN={1745-1361},
month={January},
}
TY - JOUR
TI - Influence of Outliers on Estimation Accuracy of Software Development Effort
T2 - IEICE TRANSACTIONS on Information
SP - 91
EP - 105
AU - Kenichi ONO
AU - Masateru TSUNODA
AU - Akito MONDEN
AU - Kenichi MATSUMOTO
PY - 2021
DO - 10.1587/transinf.2020MPP0005
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2021
AB - When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
ER -