The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper introduces a technique that, given a short text, acquires documents in the same category. Treating the given text as a training document, the system marks the most similar document, or all sufficiently similar documents, in the document domain (or the entire Web). The system then adds the marked documents to the training set and learns from it again, repeating the process until no further documents are marked. Requiring the similarity to be monotone increasing as the system learns enables it to 1) detect the point at which no documents remain to be marked and 2) determine the threshold value the classifier uses. In addition, provided that normalization is restricted to dividing the term weights by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity using English and German documents randomly selected from the Web.
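The acquisition loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula, so the sketch assumes a simple binary similarity (the fraction of a document's distinct terms that already occur in the training set, i.e. binary weights divided by the document's 1-norm). The names `terms`, `similarity`, and `acquire`, and the fixed `threshold` of 0.5, are hypothetical choices made for illustration. Under this assumed similarity, adding documents to the training set can never lower any document's score, which is the monotone increasing behaviour the abstract relies on.

```python
import re


def terms(text):
    """Binary indexing: represent a document as the set of its word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(doc_terms, profile):
    """Assumed similarity: share of the document's terms found in the profile.

    The profile is the union of all training-document terms, so this value
    can only stay the same or grow as the training set grows.
    """
    if not doc_terms:
        return 0.0
    return len(doc_terms & profile) / len(doc_terms)


def acquire(seed_text, domain, threshold=0.5):
    """Iteratively mark documents similar to the growing training set.

    Returns the indices of marked documents in the order they were acquired.
    The threshold is a hypothetical fixed value, not one derived as in the paper.
    """
    profile = terms(seed_text)  # the seed text is the first training document
    remaining = {i: terms(d) for i, d in enumerate(domain)}
    marked = []
    while True:
        newly = [i for i, t in remaining.items()
                 if similarity(t, profile) >= threshold]
        if not newly:  # nothing left to mark: stop
            return marked
        for i in newly:
            profile |= remaining.pop(i)  # "learn" the marked document
            marked.append(i)


seed = "document similarity classifier"
domain = [
    "binary similarity for document classifier",
    "binary classifier for text",
    "fresh pasta recipe with tomato sauce",
]
result = acquire(seed, domain)
print(result)
```

In this toy run the second document is too dissimilar to the seed alone, but it crosses the threshold once the first document has been added to the training set, which is the chaining effect the iterative process is meant to exploit. The third document never shares a term with the training set, so the loop stops.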
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Izumi SUZUKI, Yoshiki MIKAMI, Ario OHSATO, "Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category" in IEICE TRANSACTIONS on Information, vol. E91-D, no. 11, pp. 2545-2551, November 2008, doi: 10.1093/ietisy/e91-d.11.2545.
Abstract: A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.11.2545/_p
@ARTICLE{e91-d_11_2545,
author={Izumi SUZUKI and Yoshiki MIKAMI and Ario OHSATO},
journal={IEICE TRANSACTIONS on Information},
title={Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category},
year={2008},
volume={E91-D},
number={11},
pages={2545-2551},
abstract={A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.},
keywords={},
doi={10.1093/ietisy/e91-d.11.2545},
ISSN={1745-1361},
month={November},}
TY - JOUR
TI - Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category
T2 - IEICE TRANSACTIONS on Information
SP - 2545
EP - 2551
AU - Izumi SUZUKI
AU - Yoshiki MIKAMI
AU - Ario OHSATO
PY - 2008
DO - 10.1093/ietisy/e91-d.11.2545
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2008
AB - A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.
ER -