The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Empirical research on industrial software development requires data from real industry software projects. However, only a few such industry data sets are publicly available, and most of them are very old. In addition, most of today's software companies cannot open their data, because software development involves many stakeholders and the confidentiality of the data must therefore be strongly preserved. To that end, this study proposes a method for artificially generating a "mimic" software project data set whose characteristics (such as mean, standard deviation and correlation coefficients) are very similar to those of a given confidential data set. Instead of using the original (confidential) data set, researchers are expected to use the mimic data set to obtain results similar to those from the original. The proposed method uses the Box-Muller transform to generate normally distributed random numbers, and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as a potential application domain for mimic data. Estimation models are built from 8 reference data sets and their corresponding mimic data. Our experiments confirmed that models built from mimic data sets show effort estimation performance similar to that of models built from the original data sets, which indicates that the proposed method is capable of generating representative samples.
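The abstract names the Box-Muller transform and an exponential transformation as building blocks. The following is a minimal sketch of those two steps only, not the authors' implementation: the function names, the seed handling, and the `log_mean` / `log_std` parameters are assumptions introduced for illustration, and the number-reordering step used to reproduce correlations is not shown because the abstract does not describe it in detail.

```python
import math
import random
from typing import List, Tuple


def box_muller(rng: random.Random) -> Tuple[float, float]:
    """Turn two uniform samples into two independent standard-normal samples."""
    # 1 - random() lies in (0, 1], which keeps log() away from zero.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n: int, seed: int = 0) -> List[float]:
    """Collect n standard-normal samples (hypothetical helper)."""
    rng = random.Random(seed)
    out: List[float] = []
    while len(out) < n:
        out.extend(box_muller(rng))  # each call yields two samples
    return out[:n]


def exp_transform(z: List[float], log_mean: float, log_std: float) -> List[float]:
    """Map standard-normal values to positive, right-skewed values.

    Assumption: the target variable is treated as log-normal, so log_mean and
    log_std are the mean and standard deviation of its logarithm.
    """
    return [math.exp(log_mean + log_std * v) for v in z]


if __name__ == "__main__":
    z = normal_samples(5, seed=42)
    print(exp_transform(z, log_mean=7.0, log_std=1.0))
```

Under these assumptions, the logarithms of the generated values have approximately the chosen mean and standard deviation, which is one way skewed project metrics such as effort could be imitated without exposing the original records.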
Maohua GAN
Okayama University
Zeynep YÜCEL
Okayama University
Akito MONDEN
Okayama University
Kentaro SASAKI
Okayama University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Maohua GAN, Zeynep YÜCEL, Akito MONDEN, Kentaro SASAKI, "Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 10, pp. 2094-2103, October 2020, doi: 10.1587/transinf.2019EDP7150.
Abstract: To conduct empirical research on industry software development, it is necessary to obtain data of real software projects from industry. However, only few such industry data sets are publicly available; and unfortunately, most of them are very old. In addition, most of today's software companies cannot make their data open, because software development involves many stakeholders, and thus, its data confidentiality must be strongly preserved. To that end, this study proposes a method for artificially generating a “mimic” software project data set, whose characteristics (such as average, standard deviation and correlation coefficients) are very similar to a given confidential data set. Instead of using the original (confidential) data set, researchers are expected to use the mimic data set to produce similar results as the original data set. The proposed method uses the Box-Muller transform for generating normally distributed random numbers; and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as potential application domain for employing mimic data. Estimation models are built from 8 reference data sets and their concerning mimic data. Our experiments confirmed that models built from mimic data sets show similar effort estimation performance as the models built from original data sets, which indicate the capability of the proposed method in generating representative samples.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7150/_p
@ARTICLE{e103-d_10_2094,
author={Maohua GAN and Zeynep YÜCEL and Akito MONDEN and Kentaro SASAKI},
journal={IEICE TRANSACTIONS on Information},
title={Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation},
year={2020},
volume={E103-D},
number={10},
pages={2094-2103},
abstract={To conduct empirical research on industry software development, it is necessary to obtain data of real software projects from industry. However, only few such industry data sets are publicly available; and unfortunately, most of them are very old. In addition, most of today's software companies cannot make their data open, because software development involves many stakeholders, and thus, its data confidentiality must be strongly preserved. To that end, this study proposes a method for artificially generating a “mimic” software project data set, whose characteristics (such as average, standard deviation and correlation coefficients) are very similar to a given confidential data set. Instead of using the original (confidential) data set, researchers are expected to use the mimic data set to produce similar results as the original data set. The proposed method uses the Box-Muller transform for generating normally distributed random numbers; and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as potential application domain for employing mimic data. Estimation models are built from 8 reference data sets and their concerning mimic data. Our experiments confirmed that models built from mimic data sets show similar effort estimation performance as the models built from original data sets, which indicate the capability of the proposed method in generating representative samples.},
keywords={},
doi={10.1587/transinf.2019EDP7150},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation
T2 - IEICE TRANSACTIONS on Information
SP - 2094
EP - 2103
AU - Maohua GAN
AU - Zeynep YÜCEL
AU - Akito MONDEN
AU - Kentaro SASAKI
PY - 2020
DO - 10.1587/transinf.2019EDP7150
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2020
AB - To conduct empirical research on industry software development, it is necessary to obtain data of real software projects from industry. However, only few such industry data sets are publicly available; and unfortunately, most of them are very old. In addition, most of today's software companies cannot make their data open, because software development involves many stakeholders, and thus, its data confidentiality must be strongly preserved. To that end, this study proposes a method for artificially generating a “mimic” software project data set, whose characteristics (such as average, standard deviation and correlation coefficients) are very similar to a given confidential data set. Instead of using the original (confidential) data set, researchers are expected to use the mimic data set to produce similar results as the original data set. The proposed method uses the Box-Muller transform for generating normally distributed random numbers; and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as potential application domain for employing mimic data. Estimation models are built from 8 reference data sets and their concerning mimic data. Our experiments confirmed that models built from mimic data sets show similar effort estimation performance as the models built from original data sets, which indicate the capability of the proposed method in generating representative samples.
ER -