Multisource semantic analysis based information retrieval method

An information retrieval technique based on multi-source semantic analysis, intended to address problems such as reduced query accuracy.

Inactive Publication Date: 2016-11-23
BEIJING UNIV OF TECH
2 Cites, 34 Cited by

AI Technical Summary

Problems solved by technology

However, this method is based on the initial query. If the first search result is not good, it may extrac

Examples

Example Embodiment

[0039] To help those skilled in the art better understand the solution of the present invention, its embodiments are described in detail below with reference to the accompanying drawings.

[0040] As shown in Figure 1, the general idea of the multi-source semantic analysis based information retrieval method of the present invention is as follows. First, LDA modeling is performed on the preprocessed documents to obtain each term's ability to represent a document at the latent topic level. An inverted index is then built from the terms together with their representation ability for each document, so that text information can be represented in the form of low-dimensional topics. Next, the user's initial query text is obtained and preprocessed, and then, according to whether each query term is a professional medical term, multi-dimensional a...
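The indexing step described in [0040] can be illustrated with a minimal sketch. It assumes that a term's "representation ability" for a document is the usual LDA mixture probability, the sum over topics of p(term | topic) × p(topic | document); the source does not confirm this formula. The toy distributions and the names `doc_topic`, `topic_term`, `term_doc_weight`, and `build_inverted_index` are hypothetical, and a real system would estimate the distributions with an LDA library.

```python
from collections import defaultdict

# Hypothetical toy LDA output; in practice these are estimated from the corpus.
# doc_topic[d][k] = p(topic k | document d)
doc_topic = {
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
}
# topic_term[k][w] = p(term w | topic k)
topic_term = [
    {"tumor": 0.6, "therapy": 0.3, "index": 0.1},
    {"tumor": 0.1, "therapy": 0.2, "index": 0.7},
]


def term_doc_weight(term, doc_id):
    """Assumed representation ability: sum_k p(term | k) * p(k | doc)."""
    return sum(
        p_topic * topic_term[k].get(term, 0.0)
        for k, p_topic in enumerate(doc_topic[doc_id])
    )


def build_inverted_index(doc_terms):
    """Map each term to postings (doc_id, weight), highest weight first."""
    index = defaultdict(list)
    for doc_id, terms in doc_terms.items():
        for term in set(terms):
            index[term].append((doc_id, term_doc_weight(term, doc_id)))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)


# Preprocessed documents as bags of terms (toy data).
doc_terms = {"d1": ["tumor", "therapy"], "d2": ["tumor", "index"]}
index = build_inverted_index(doc_terms)
```

Storing the topic-level weight in each posting lets a later ranking step read a term's contribution to a document directly from the index instead of recomputing it per query.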

Abstract

The invention discloses an information retrieval method based on multi-source semantic analysis. The method comprises the following steps: documents are acquired and preprocessed; the documents are modeled with an LDA model and an inverted index is built; the user's initial query is obtained and preprocessed; multi-dimensional analysis is performed according to whether each query term is a professional medical term, and term weighting and query expansion are performed based on WordNet and the UMLS Metathesaurus; the similarity between the expanded query term set and the documents after LDA dimensionality reduction is calculated, the documents are ranked in descending order of similarity, and those whose similarity is not lower than a preset threshold are returned to the user. The method combines the characteristics of WordNet and the UMLS Metathesaurus to analyze, weight, and expand the initial query along multiple dimensions, so that the user's query intention is understood more accurately. It uses the LDA model to analyze how well terms represent documents at the latent topic level, which improves document retrieval performance for the user.
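The query-side steps in the abstract (term weighting, expansion, similarity ranking, threshold filtering) might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the two small dictionaries merely stand in for WordNet and the UMLS Metathesaurus, the weight constants are arbitrary, and cosine similarity in topic space is one plausible similarity measure, not necessarily the one used in the patent.

```python
import math

# Hypothetical stand-ins for the UMLS Metathesaurus and WordNet lookups.
UMLS_SYNONYMS = {"tumor": ["neoplasm"]}
WORDNET_SYNONYMS = {"treatment": ["therapy"]}

# Assumed weights: medical terms count more; expansion terms are discounted.
MEDICAL_WEIGHT = 1.0
GENERAL_WEIGHT = 0.5
EXPANSION_DISCOUNT = 0.5


def expand_query(terms):
    """Return {term: weight} for the original terms plus their expansions."""
    weighted = {}
    for term in terms:
        is_medical = term in UMLS_SYNONYMS
        base = MEDICAL_WEIGHT if is_medical else GENERAL_WEIGHT
        source = UMLS_SYNONYMS if is_medical else WORDNET_SYNONYMS
        weighted[term] = max(weighted.get(term, 0.0), base)
        for synonym in source.get(term, []):
            discounted = base * EXPANSION_DISCOUNT
            weighted[synonym] = max(weighted.get(synonym, 0.0), discounted)
    return weighted


def query_topic_vector(weighted, topic_term):
    """Project the weighted term set onto the LDA topics."""
    return [
        sum(w * dist.get(term, 0.0) for term, w in weighted.items())
        for dist in topic_term
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(terms, doc_topic, topic_term, threshold):
    """Rank documents by similarity, keeping those at or above threshold."""
    query_vec = query_topic_vector(expand_query(terms), topic_term)
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in doc_topic.items()]
    kept = [(doc, score) for doc, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy LDA output (hypothetical values).
topic_term = [
    {"tumor": 0.5, "neoplasm": 0.3, "therapy": 0.2},
    {"index": 0.7, "therapy": 0.2, "tumor": 0.1},
]
doc_topic = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}

results = retrieve(["tumor", "treatment"], doc_topic, topic_term, threshold=0.5)
```

With this toy data only `d1` clears the 0.5 threshold; lowering the threshold to 0.0 returns both documents with `d1` ranked first.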

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval, and in particular relates to an information retrieval method based on multi-source semantic analysis.

Background technique

[0002] Information retrieval is a research field that has grown with the development of science and technology and the sharp increase in information of all forms. With the popularity of the Internet, medical researchers and doctors often use search engines to obtain the medical information they need. How to accurately grasp the user's retrieval intention, and how to accurately extract the information the user is interested in from massive data and return it to the user, has therefore become a primary topic. In response to this problem, using query expansion to discover and exploit the content of medical literature has become one of the most popular means of improving retrieval performance.

[0003] Query expansion is one ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/903
Inventor: 亢阳阳, 李建强, 田猛, 孙靖超, 赵旭, 莫豪文
Owner BEIJING UNIV OF TECH