Literature retrieval method based on semantic small-world model

A document retrieval technology based on a semantic small-world model, applied in the computer field. It addresses the large overhead of updating index information, the unsuitability of existing lookup methods for full-text retrieval, and heavy network load, and it achieves faster queries, reduced information storage, and high accuracy.

Inactive Publication Date: 2009-07-22
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, the above methods all require precise metadata matching (such as file names or keywords) to satisfy a search. Because the semantic information of other nodes in the network cannot be obtained, a large number of nodes must be searched blindly to ensure the recall rate of information retrieval, which causes severe network load.
Guiding query messages with improved neighbor-node index information (such as local indexes) can improve query performance, but updating the index information incurs very large additional overhead.
A structured peer-to-peer network based on a distributed hash table (such as CAN or Chord) provides good scalability and effective search performance, but it supports only keyword/value lookup, which is not suitable for full-text search in the information retrieval field. The overhead of maintaining the structured peer-to-peer network is also very high.

Examples

[0047] (1) The specific implementation of establishing a network topology structure with semantic small-world characteristics includes the following steps:

[0048] (1.1) Use latent semantic indexing to extract document feature vectors, as follows:

[0049] Latent semantic indexing is an extension of the traditional vector space model in information retrieval. In the vector space model, documents and queries are represented as vectors of weights over all words in the document collection, and the similarity between a query and a document is given by the cosine of the angle between the two vectors. If a collection of $d$ documents contains $t$ different words, the collection is represented by the word-document matrix $A = (a_{ij}) \in \mathbb{R}^{t \times d}$. Each column vector $a_j$ corresponds to document $j$, and $a_{ij}$ is the weight of word $i$ in document $j$. Through singular value decomposition, the matrix $A$ is decomposed into three matrices $U$, $\Sigma$ and $V$, where $\Sigma$ is a diagona...
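The vector space model that paragraph [0049] starts from can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the matrix values, the query vector, and the helper names `cosine`, `column` and `rank_documents` are all hypothetical, and the singular value decomposition step that latent semantic indexing adds on top is not shown here.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no words with anything
    return dot / (norm_u * norm_v)

# Hypothetical word-document matrix A with t = 4 words (rows) and
# d = 3 documents (columns); A[i][j] is the weight of word i in document j.
A = [
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 0.0],
]

def column(matrix, j):
    """Return column j of the matrix, i.e. the vector for document j."""
    return [row[j] for row in matrix]

def rank_documents(matrix, query):
    """Score every document against the query, most similar first."""
    d = len(matrix[0])
    scores = [(j, cosine(query, column(matrix, j))) for j in range(d)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical query that uses word 0 and word 1 with equal weight.
query = [1.0, 1.0, 0.0, 0.0]
ranking = rank_documents(A, query)
for doc_index, score in ranking:
    print(doc_index, round(score, 4))
```

With these made-up weights, document 0 ranks first because it shares both query words, and document 1 scores zero because it shares none.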

Abstract

The invention discloses a document retrieval method based on a semantic small-world model. First, latent semantic indexing is used to extract document feature vectors and to reduce their dimensionality while retaining as much of the document features as possible, which reduces the amount of document information to be stored. Next, a support vector machine classifies all the documents shared by a node to form classification information, which records the node's interest ratio for each literature category. Finally, drawing on the small-world phenomenon in social networks, each node in the peer-to-peer network maintains a small number of short links to nodes with similar interests and a very small number of long links to nodes with a very high interest ratio in a certain literature category, forming a network topology with semantic small-world characteristics. With this topology, a query message is routed to the node most likely to answer the request, which improves query efficiency. By making full use of long links, a query can be routed quickly to other parts of the network, which improves the recall rate and reduces network load.
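The link structure described in the abstract can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the category names, the sample labels (standing in for the output of a support vector machine classifier), the link counts, the use of cosine similarity between interest-ratio vectors to pick short links, and the rule for choosing a next hop. The patent text available here does not specify these details.

```python
import math
from collections import Counter

CATEGORIES = ["cs", "bio", "physics"]  # hypothetical literature categories

def interest_ratios(doc_labels):
    """Fraction of a node's shared documents that falls in each category."""
    counts = Counter(doc_labels)
    total = len(doc_labels)
    return [counts[c] / total if total else 0.0 for c in CATEGORIES]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_links(node, ratios, n_short=2, n_long=1):
    """Pick short links by interest similarity, then long links by peak ratio.

    Assumption: a long link targets a node outside the short-link set whose
    highest single-category interest ratio is largest.
    """
    others = [n for n in ratios if n != node]
    by_similarity = sorted(
        others, key=lambda n: cosine(ratios[node], ratios[n]), reverse=True
    )
    short = by_similarity[:n_short]
    rest = [n for n in others if n not in short]
    long_ = sorted(rest, key=lambda n: max(ratios[n]), reverse=True)[:n_long]
    return short, long_

def next_hop(query_category, links, ratios):
    """Forward a query to the linked node most interested in its category."""
    idx = CATEGORIES.index(query_category)
    return max(links, key=lambda n: ratios[n][idx])

# Hypothetical per-node classifier output: one category label per shared document.
labels = {
    "a": ["cs", "cs", "cs", "bio"],
    "b": ["cs", "cs", "bio", "bio"],
    "c": ["cs", "cs", "cs", "cs"],
    "d": ["physics", "physics", "physics", "physics"],
    "e": ["bio", "bio", "bio", "physics"],
}
ratios = {node: interest_ratios(docs) for node, docs in labels.items()}
short_links, long_links = build_links("a", ratios)
hop = next_hop("physics", short_links + long_links, ratios)
print(short_links, long_links, hop)
```

In this toy network, node `a` keeps short links to the two nodes whose interests most resemble its own, and its single long link gives a physics query a one-hop path to a node that specializes in a category none of its similar neighbors cover.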

Description

Technical field

[0001] The invention belongs to distributed computing and information retrieval in the computer field, and specifically relates to a document retrieval method based on a semantic small-world model. It mainly uses the semantic small-world model to solve the problem of efficient information storage and retrieval in a peer-to-peer network for document information sharing.

Background technique

[0002] Peer-to-peer network systems have attracted more and more attention in the field of large-scale information retrieval because of their scalability, fault tolerance, autonomy and self-organization. However, in a peer-to-peer network for document information sharing, how to store and retrieve information effectively is still a very challenging problem.

[0003] The small-world phenomenon widely exists in social networks, that is, everyone in the world can be connected through a very short social relationship chain, and the length of the social relationship...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 金海, 宁小敏, 袁平鹏, 武浩, 余一娇
Owner: HUAZHONG UNIV OF SCI & TECH