The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. Copyrights notice
Os dados semiestruturados compreendem uma estrutura irregular e não possuem esquema de banco de dados a priori, portanto, encontramos vários problemas, como recuperação ineficiente de dados e armazenamento de dados desnecessário. Para lidar com tais problemas, foram propostos alguns algoritmos de extração de esquemas sobre dados semiestruturados, nos quais os dados são modelados como um não ordenado árvore. No entanto, a ordem dos elementos é indispensável para os dados do documento, por isso consideramos extrair um esquema de banco de dados ideal em um banco de dados. ordenado árvore. Consideramos um problema de otimização para extrair o menor esquema de banco de dados tal que a densidade de cada classe não seja inferior a um determinado limite, onde a densidade de uma classe representa uma semelhança entre o tipo da classe e os dos objetos da classe. Primeiro provamos que o problema de decisão correspondente é fortemente NP-completo e mostramos que outra versão do problema é fortemente NP-difícil e pertence a Δ2 P. Então mostramos que para qualquer r <3/2, não há tempo polinomial r-algoritmo de aproximação que resolve o problema de otimização a menos que P = NP. Por fim, propomos uma espécie de classe chamada classe limitada que pode ser construído de forma eficiente e, em seguida, mostre um algoritmo de tempo polinomial para construir um esquema de banco de dados usando classes limitadas.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copiar
Nobutaka SUZUKI, Yoichirou SATO, Michiyoshi HAYASE, "Complexity and a Method of Extracting a Database Schema over Semistructured Documents" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 6, pp. 940-949, June 2002, doi: .
Abstract: Semistructured data comprises irregular structure and has no a-priori database schema, therefore we encounter several problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, some schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, therefore we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem to extract a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents a similarity between the type of the class and those of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ2 P. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm that solves the optimization problem unless P = NP. Finally, we propose a kind of class called bounded class that can be constructed efficiently, then show a polynomial-time algorithm for constructing a database schema by using bounded classes.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_6_940/_p
Copiar
@ARTICLE{e85-d_6_940,
author={Nobutaka SUZUKI, Yoichirou SATO, Michiyoshi HAYASE, },
journal={IEICE TRANSACTIONS on Information},
title={Complexity and a Method of Extracting a Database Schema over Semistructured Documents},
year={2002},
volume={E85-D},
number={6},
pages={940-949},
abstract={Semistructured data comprises irregular structure and has no a-priori database schema, therefore we encounter several problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, some schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, therefore we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem to extract a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents a similarity between the type of the class and those of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ2 P. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm that solves the optimization problem unless P = NP. Finally, we propose a kind of class called bounded class that can be constructed efficiently, then show a polynomial-time algorithm for constructing a database schema by using bounded classes.},
keywords={},
doi={},
ISSN={},
month={June},}
Copiar
TY - JOUR
TI - Complexity and a Method of Extracting a Database Schema over Semistructured Documents
T2 - IEICE TRANSACTIONS on Information
SP - 940
EP - 949
AU - Nobutaka SUZUKI
AU - Yoichirou SATO
AU - Michiyoshi HAYASE
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2002
AB - Semistructured data comprises irregular structure and has no a-priori database schema, therefore we encounter several problems such as inefficient data retrieval and wasteful data storage. To cope with such problems, some schema extraction algorithms over semistructured data have been proposed, in which data is modeled as an unordered tree. However, the order of elements is indispensable for document data, therefore we consider extracting an optimal database schema over an ordered tree. We consider an optimization problem to extract a smallest database schema such that the density of each class is no less than a given threshold, where the density of a class represents a similarity between the type of the class and those of the objects in the class. We first prove that the corresponding decision problem is strongly NP-complete, and show that another version of the problem is strongly NP-hard and belongs to Δ2 P. Then we show that for any r < 3/2, there is no polynomial-time r-approximation algorithm that solves the optimization problem unless P = NP. Finally, we propose a kind of class called bounded class that can be constructed efficiently, then show a polynomial-time algorithm for constructing a database schema by using bounded classes.
ER -