XML-based retrieval method oriented to constraint on integrated paths of large amount of small-size XML documents

A path-constrained retrieval technique for small XML documents, applied in the database field. It addresses the difficulty of choosing fragment granularity, the risk of returning too little information, and the limited applicability of plain-text weighting schemes.

Inactive Publication Date: 2010-08-18
NANKAI UNIV
2 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

However, because XML documents are hierarchically structured, the tf-idf weighting scheme designed for plain text is not fully applicable. How to improve the index-term weighting scheme and the vector space model for computing the similarity between documents and queries has therefore become a further research question.
[0005] At present, for the retrieval of large-scale XML documents, most researchers hold that it is unnecessary to return the entire document to the user; only the document fragments that satisfy the retrieval conditions should be returned. However, it is usually difficult to determine the granularity of those fragments.
Moreover, for the retrieval of massive numbers of small XML documents, the documents themselves are short, so returning only some fragments is likely to provide too little information to meet users' needs.



Examples

Embodiment Construction

[0126] The present invention targets large collections of small XML documents and proposes a new XML document retrieval method. The overall flow of the method is shown in Figure 1. The core of the method is explained below using the sample XML document shown in Figure 2 and the sample user query "article / title / xmlbody / section / title / DTD".

[0127] 1. Preprocessing the XML document;

[0128] All XML documents in the retrieval system need to be preprocessed. First, every XML document is modeled as an XML document tree, and the entire tree is encoded with Dewey encoding. Figure 3 shows the encoded document tree corresponding to the sample XML document in Figure 2. Second, an inverted index from index terms to node codes is built over the element node names, attribute node names, and text node contents of all XML documents in the retrieval system. Finally, the frequency of index terms in each XML document in the system is ca...
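The preprocessing step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the sample document, the child-numbering order (attributes first, then text, then child elements), the tokenizer, and the function names `dewey_encode`, `build_inverted_index`, and `term_frequencies` are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document; this is NOT the Figure 2 document from the source.
SAMPLE = """<article id="a1">
  <title>XML retrieval</title>
  <body>
    <section>
      <title>DTD basics</title>
    </section>
  </body>
</article>"""


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]


def dewey_encode(xml_text):
    """Return (dewey_code, kind, value) for element, attribute, and text nodes.

    The root gets code "1"; the k-th child of a node with code c gets "c.k".
    Assumed ordering: attributes, then the leading text node, then child
    elements. Tail text after a child element is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    nodes = []

    def visit(elem, code):
        nodes.append((code, "element", elem.tag))
        child_no = 0
        for name in elem.attrib:  # only attribute names are indexed
            child_no += 1
            nodes.append((f"{code}.{child_no}", "attribute", name))
        if elem.text and elem.text.strip():
            child_no += 1
            nodes.append((f"{code}.{child_no}", "text", elem.text.strip()))
        for child in elem:
            child_no += 1
            visit(child, f"{code}.{child_no}")

    visit(root, "1")
    return nodes


def build_inverted_index(doc_id, nodes, index=None):
    """Map each index term to the list of (doc_id, dewey_code) where it occurs."""
    index = defaultdict(list) if index is None else index
    for code, _kind, value in nodes:
        for term in tokenize(value):
            index[term].append((doc_id, code))
    return index


def term_frequencies(nodes):
    """Count how often each index term occurs in one document."""
    return Counter(term for _code, _kind, value in nodes for term in tokenize(value))


if __name__ == "__main__":
    nodes = dewey_encode(SAMPLE)
    index = build_inverted_index("d1", nodes)
    print(index["title"])
    print(term_frequencies(nodes))
```

Because a Dewey code encodes the full ancestor chain, the path from the root to any matched node can be recovered from the code alone, which is what makes it convenient for path-constrained matching.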


Abstract

The invention provides a retrieval method for large collections of small XML (Extensible Markup Language) documents that incorporates path constraints. The method comprises the following steps. First, the user submits a query as keywords with a path constraint written in XPath form, which lets the user express the query requirements more easily and more accurately. Second, a new path-constrained retrieval ranking model is applied: building on the conventional VSM (vector space model) and exploiting the hierarchical structure of XML documents, it applies the N-Gram idea to the matching calculation of the path constraint, and thereby obtains the degree of correlation between each document and the user's query. Finally, the documents are ranked by that degree of correlation. The technical scheme allows the user's query requirements to be expressed accurately and computes the correlation between document and query by making full use of the path constraints of the XML document, so the retrieval results better meet the user's requirements. The method is therefore applicable to the retrieval of XML documents and to the database field.
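The idea of applying N-Grams to path-constraint matching can be sketched as follows. The source does not give the scoring formula, so the measure below (the average, over n = 1..`max_n`, of the fraction of query path n-grams that also occur in the node path) is an assumption chosen only to illustrate the idea; the function names and the example paths are hypothetical, and the VSM keyword weighting is left out.

```python
def path_ngrams(path, n):
    """Split a '/'-separated path into steps and return its contiguous n-grams."""
    steps = [s.strip() for s in path.split("/") if s.strip()]
    return [tuple(steps[i:i + n]) for i in range(len(steps) - n + 1)]


def path_match_score(query_path, node_path, max_n=3):
    """Assumed measure: mean over n of (query n-grams found in node path) / (query n-grams).

    Unigrams reward shared element names, longer n-grams reward shared
    parent-child sequences, so an exact path match scores 1.0.
    """
    scores = []
    for n in range(1, max_n + 1):
        query_grams = path_ngrams(query_path, n)
        if not query_grams:
            break  # the query path is shorter than n steps
        node_grams = set(path_ngrams(node_path, n))
        hits = sum(1 for g in query_grams if g in node_grams)
        scores.append(hits / len(query_grams))
    return sum(scores) / len(scores) if scores else 0.0


def rank_paths(query_path, node_paths, max_n=3):
    """Order candidate node paths by descending path-match score."""
    return sorted(
        node_paths,
        key=lambda p: path_match_score(query_path, p, max_n),
        reverse=True,
    )


if __name__ == "__main__":
    query = "article/section/title"
    candidates = ["article/title", "article/body/section/title", "article/section/title"]
    for path in rank_paths(query, candidates):
        print(path, round(path_match_score(query, path), 3))
```

A graded score of this kind lets a document whose structure only partly matches the path constraint still be retrieved, ranked below exact matches; in a full ranking model it would be combined with the keyword weights from the vector space model.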

Description

Technical field

[0001] The invention belongs to the technical field of databases, and in particular relates to a novel scheme for retrieving large numbers of small XML documents that incorporates path constraints.

Background

[0002] Extensible Markup Language (XML) has become the most popular standard for information representation and data exchange thanks to its self-describing, extensible, and semi-structured characteristics, and it has been widely supported and applied in many fields. With the massive emergence of data and information in XML form, how to obtain the information users are interested in from massive collections of XML documents has become a problem of close concern. It is this demand that has led information retrieval, a traditional data management and acquisition technology, to move into the field of XML data.

[0003] Because of its simple use and concise int...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 袁晓洁, 张莹, 温延龙, 刘众奇, 汪陈应
Owner: NANKAI UNIV