XML document keyword searching and clustering method based on semantic distance model

A keyword search and clustering method based on a semantic distance model, applied in the field of Web data management. It addresses three problems of existing methods: meaningful results are discarded, node combinations are incomparable, and results cannot be ranked.

Status: Inactive
Publication Date: 2008-08-13
FUDAN UNIV

AI Technical Summary

Problems solved by technology

[0005] In effect, the SLCA method selects, according to certain rules, some "optimal" combinations containing all keywords from the many combinations of nodes that contain the individual keywords. The SLCA method considers a combination "optimal" when its LCA is relatively low, and results with a relatively high LCA are discarded. This is the fundamental reason why the SLCA method loses meaningful results.

Besides losing some meaningful results, the SLCA method has other problems:

(1) During its computation not all node combinations are comparable, because not every pair of LCAs stands in an ancestor-descendant relationship. For example, in Figure 1 the combination {0.2.2.0.1.0, 0.2.2.0.1.1} and the combination {0.0.1.0, 0.0.1.1} are incomparable, so the selected "best" results are only relatively optimal.

(2) Since there is no ancestor-descendant relationship between the LCAs of the results selected by the SLCA method, the results are not comparable and therefore cannot be ranked. This is clearly unsuitable when the result set is large.

(3) The SLCA method requires every result to contain all keywords, which amounts to an "AND" relationship between all keywords of the query. The keyword search engines used in everyday life (such as Google) also support an "OR" relationship, meaning that a result may contain all or only some of the keywords. This makes the search results more likely to match the user's possible intent, is clearly more reasonable, and should also be applied to XML keyword search.
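The incomparability described in problem (1) can be illustrated with a minimal sketch. It assumes the dotted labels from Figure 1 are Dewey-style path codes, so that the LCA of a set of nodes is the longest common prefix of their labels; the source does not state this explicitly. The helper names `parse`, `lca`, `is_ancestor_or_self`, `comparable`, and `slca_filter` are hypothetical and not part of the patent.

```python
def parse(label):
    """Turn a dotted label such as "0.0.1.0" into a tuple of ints."""
    return tuple(int(part) for part in label.split("."))


def lca(labels):
    """LCA of several nodes, assuming Dewey-style labels:
    the longest common prefix of their label tuples."""
    prefix = []
    for parts in zip(*(parse(label) for label in labels)):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def is_ancestor_or_self(a, b):
    """True if node a is b itself or an ancestor of b."""
    return len(a) <= len(b) and b[:len(a)] == a


def comparable(a, b):
    """Two LCAs are comparable only if one is an ancestor of the other."""
    return is_ancestor_or_self(a, b) or is_ancestor_or_self(b, a)


def slca_filter(lcas):
    """Keep only LCAs that have no other LCA below them (the 'smallest' ones)."""
    return [a for a in lcas
            if not any(a != b and is_ancestor_or_self(a, b) for b in lcas)]


# The two combinations named in the text.
lca_a = lca(["0.2.2.0.1.0", "0.2.2.0.1.1"])
lca_b = lca(["0.0.1.0", "0.0.1.1"])

print(lca_a, lca_b)
print(comparable(lca_a, lca_b))
print(slca_filter([(0,), lca_a, lca_b]))
```

Under these assumptions the two LCAs sit in different subtrees, so neither is an ancestor of the other and no order between them can be derived from the tree alone. The last line also shows the discarding behaviour: a candidate rooted at `0` is dropped because lower LCAs exist beneath it.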

Examples

Embodiment Construction

[0126] The core of the present invention is the design of three clustering algorithms on the basis of the semantic distance model. The pseudocode of the concrete implementation is as follows.

[0127] (1) GKSC algorithm

[0128] ALGORITHM 1 (GRAPH-BASED CLUSTERING)

[0129] Input: a hierarchical structure H

[0130] Output: a set of optimal clusters C

[0131] 1. for (every list l in H) // top-down

[0132] 2.   for (every node x_i in l) // left-right

[0133] 3.     find a right neighbor x_j

[0134] 4.     while (dis(x_i, x_j) <= ω)

[0135] 5.       link(x_i, x_j); // link two nodes

[0136] 6.       find next right neighbor x_j;

[0137] 7.     for (every list l′ in floor(depth(x_i)·ω) layers below l)

[0138] 8.       p ← findDescPosition(x_i, l′);

[0139] 9.       traverse leftward and rightward from p until distance

[0140]          overflows and link x_i with neighbors close enough;

[0141] 10. Use graph partition algorithm to get optimal cluster set C;

[0142] findDesc...
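The linking phase of Algorithm 1 can be sketched in Python as follows. This is an illustration under several assumptions, not the patent's implementation. First, node labels are treated as Dewey-style dotted codes. Second, the source does not define `dis` in this excerpt, so `assumed_dis` is a hypothetical stand-in (tree-path length divided by the combined depth). Third, because the definition of `findDescPosition` is truncated, the sketch simply scans each lower layer in full. Fourth, the unspecified "graph partition algorithm" of step 10 is replaced by plain connected components.

```python
from math import floor


def parse(label):
    """Turn a dotted label such as "0.0.1.0" into a tuple of ints."""
    return tuple(int(part) for part in label.split("."))


def depth(node):
    """Depth of a node; the root "0" has depth 0."""
    return len(node) - 1


def assumed_dis(x, y):
    # ASSUMPTION: placeholder distance, not the patent's semantic distance.
    # Tree-path length between x and y, normalized by their combined depth.
    common = 0
    for a, b in zip(x, y):
        if a != b:
            break
        common += 1
    path_edges = (len(x) - common) + (len(y) - common)
    return path_edges / max(depth(x) + depth(y), 1)


def gksc_sketch(labels, omega, dis=assumed_dis):
    # Hierarchical structure H: one list per depth, nodes in left-to-right order.
    layers = {}
    for node in sorted(parse(label) for label in labels):
        layers.setdefault(depth(node), []).append(node)
    links = {node: set() for row in layers.values() for node in row}

    def link(x, y):
        links[x].add(y)
        links[y].add(x)

    for d in sorted(layers):                      # step 1: top-down
        row = layers[d]
        for i, x in enumerate(row):               # step 2: left-right
            for y in row[i + 1:]:                 # steps 3-6: right neighbors
                if dis(x, y) > omega:
                    break
                link(x, y)
            # Steps 7-9, simplified: full scan instead of findDescPosition.
            for lower in range(d + 1, d + floor(depth(x) * omega) + 1):
                for y in layers.get(lower, []):
                    if dis(x, y) <= omega:
                        link(x, y)

    # Step 10 stand-in: connected components instead of graph partitioning.
    clusters, seen = [], set()
    for start in sorted(links):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in links[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(".".join(map(str, n)) for n in members))
    return clusters


nodes = ["0.2.2.0.1.0", "0.2.2.0.1.1", "0.0.1.0", "0.0.1.1"]
print(gksc_sketch(nodes, omega=0.5))
```

With the placeholder distance and ω = 0.5, the four labels taken from the Figure 1 example fall into two clusters, one per sibling pair. A real graph partitioning step would further split large connected components, which this stand-in does not attempt.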

Abstract

The present invention belongs to the technical field of Web data management and specifically concerns an XML keyword search method based on a semantic distance model, called XKLuster. The invention proposes a new model, the "XML keyword semantic distance model", which measures the search semantics of XML keywords by fully considering the hierarchical structure of XML. Based on this model, three clustering algorithms are designed from different angles: a graph-based keyword clustering algorithm, a core-set-driven keyword clustering algorithm, and a relaxed core-set-driven clustering algorithm. A ranking model is also provided to rank all search results before they are returned to the user. Compared with existing methods, the invention returns more reasonable results. The invention can be used for Internet XML document search, XML database search, and similar fields.

Description

Technical field

[0001] The invention belongs to the technical field of Web data management, and in particular relates to a method for keyword search over Extensible Markup Language (XML) databases or documents based on clustering.

Background

[0002] Thanks to its friendly interface and ease of use, keyword search has achieved great success in information retrieval, as in the Google and Baidu search engines. Their search targets are usually collections of HTML documents or ordinary text documents. The purpose of the search is to find which documents the keywords appear in and to return all or some of the documents containing them. With the massive growth and wide application of XML data, the demand for keyword search over XML documents is becoming increasingly urgent. In recent years, XML keyword search has received much attention from both industry and academia [1][2][3][4][5][6][7]. XML keyword search is dif...

Application Information

IPC(8): G06F17/30
Inventors: 杨卫东 (Yang Weidong), 朱皓 (Zhu Hao)
Owner: FUDAN UNIV