Integrated retrieval method for multi-language information retrieval

A multilingual information retrieval technique in the information technology field. It addresses the loss of source-language information and the noisy, inaccurate retrieval results of existing approaches, improving accuracy and reducing noise.

Active Publication Date: 2010-06-30
哈尔滨工业大学高新技术开发总公司

AI Technical Summary

Problems solved by technology

[0003] To solve the loss of source-language information, the heavy noise, and the low accuracy of retrieval results caused by existing multilingual information retrieval in separated mode, the present invention provides an integrated retrieval method for multilingual information retrieval.



Examples


Specific Embodiment 1

[0021] Specific Embodiment 1. The integrated retrieval method for multilingual information retrieval comprises the following steps:

[0022] Step 1: translate each source-language query keyword $q_i$ entered by the user into target-language keywords $t_{ij}$, where $t_{ij}$ denotes the $j$-th reasonable translation of the source-language query keyword $q_i$;

[0023] Step 2: according to the word order, the modification and collocation relationships, and the word distance of the words in each target-language keyword $t_{ij}$ obtained in Step 1, divide the keywords into three relationship patterns: the exact match pattern, the co-occurrence pattern, and the independent pattern. In the exact match pattern, the words of a phrase must occur adjacent to one another and in order. In the co-occurrence pattern, the phrase is considered to occur whenever the words that form it co-occur within a preset window. In the independent pattern, in ph...
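The three relationship patterns in Step 2 can be illustrated with a minimal Python sketch over tokenized text. The function names, the `window` parameter, and the tokenization by whitespace are assumptions made for illustration and do not come from the patent. Because the source description of the independent pattern is truncated, the sketch assumes it simply matches each word on its own, regardless of position.

```python
from typing import List


def exact_match(phrase: List[str], doc: List[str]) -> bool:
    """Exact match pattern: phrase words occur adjacent and in order."""
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


def co_occurs(phrase: List[str], doc: List[str], window: int) -> bool:
    """Co-occurrence pattern: all phrase words fall inside one span of
    `window` consecutive tokens, in any order."""
    needed = set(phrase)
    last_start = max(1, len(doc) - window + 1)
    return any(needed <= set(doc[i:i + window]) for i in range(last_start))


def independent_matches(phrase: List[str], doc: List[str]) -> List[str]:
    """Independent pattern (assumed reading): each word is matched on its
    own; returns the phrase words that appear anywhere in the document."""
    present = set(doc)
    return [w for w in phrase if w in present]


doc = "cross language information retrieval system".split()
print(exact_match(["information", "retrieval"], doc))
print(co_occurs(["retrieval", "information"], doc, window=3))
print(independent_matches(["cross", "system", "query"], doc))
```

The exact match check is the strictest of the three, and the independent check the loosest, which is why a method combining them can weight a document differently depending on how tightly the translated phrase appears in it.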

Specific Embodiment 2

[0042] Specific Embodiment 2. This embodiment differs from Specific Embodiment 1 in how Step 3 obtains the conditional probability $P(t_{ij} \mid D, \theta_{Ex})$ of the exact match pattern in the query document $D$. In the exact match pattern, the entire phrase can be treated as a single vocabulary item and estimated by maximum likelihood. The calculation is expressed as:

[0043] $P(t_{ij} \mid D, \theta_{Ex}) = \mathrm{Len}(t_{ij}) \times \dots$
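The idea of treating a whole phrase as one vocabulary item under maximum likelihood estimation can be sketched as follows. This is a generic relative-frequency estimate, not the patent's formula: the source equation is cut off after the `Len(t_ij)` factor, so that factor and any remaining terms are deliberately left out. The function name `phrase_mle` and the choice of document length as the denominator are assumptions.

```python
from typing import List


def phrase_mle(phrase: List[str], doc: List[str]) -> float:
    """Relative frequency of a phrase counted as a single unit.

    Counts positions where the phrase words occur adjacent and in order,
    then divides by the document length in tokens (an assumed normalizer).
    """
    n = len(phrase)
    if n == 0 or len(doc) < n:
        return 0.0
    hits = sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == phrase)
    return hits / len(doc)


doc = "a b a b c".split()
print(phrase_mle(["a", "b"], doc))
```

In practice an estimate like this is usually smoothed against collection statistics so that unseen phrases do not get zero probability; the source text shown here does not say which smoothing, if any, the method uses.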

Specific Embodiment 3

[0045] Specific Embodiment 3. This embodiment differs from Specific Embodiment 1 or 2 in how Step 3 obtains the conditional probability $P(t_{ij} \mid D, \theta_{Co})$ of the co-occurrence pattern in the query document $D$. By counting how often words co-occur within the preset window in the document, and taking into account the characteristics of phrases in cross-language retrieval, the following co-occurrence pattern is obtained:

[0046] $P(t_{ij} \mid D, \theta_{Co}) = \sum_{s=1}^{n-1} \sum_{t = s + \dots}$
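The counting step behind the co-occurrence pattern can be sketched in Python. The double sum over word positions $s < t$ in the truncated equation suggests a pairwise treatment of the phrase words, so this sketch counts, for every pair of phrase words, how many times they appear within the window of each other. The function name, the `window` semantics (token distance strictly less than `window`), and the returned dictionary are assumptions; how the patent turns these counts into a probability is not visible in the source and is not reproduced here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pair_cooccurrence_counts(
    phrase: List[str], doc: List[str], window: int
) -> Dict[Tuple[str, str], int]:
    """For each pair of phrase words (s < t), count document position pairs
    where the two words lie less than `window` tokens apart."""
    positions: Dict[str, List[int]] = {}
    for idx, tok in enumerate(doc):
        positions.setdefault(tok, []).append(idx)

    counts: Dict[Tuple[str, str], int] = {}
    for s, t in combinations(range(len(phrase)), 2):
        ws, wt = phrase[s], phrase[t]
        counts[(ws, wt)] = sum(
            1
            for i in positions.get(ws, [])
            for j in positions.get(wt, [])
            if 0 < abs(i - j) < window
        )
    return counts


doc = "multilingual retrieval of multilingual information retrieval".split()
phrase = ["multilingual", "information", "retrieval"]
print(pair_cooccurrence_counts(phrase, doc, window=3))
```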



Abstract

An integrated retrieval method for multilingual information retrieval that solves the loss of source-language information, the heavy noise, and the low accuracy of retrieval results caused by existing multilingual information retrieval in separated mode. The method comprises the following steps: step one, translating the source-language query keywords entered by the user into target-language keywords; step two, dividing the target-language keywords into three relationship patterns (the exact match pattern, the co-occurrence pattern, and the independent pattern) according to the word order, the modification and collocation relationships, and the word distance of the words; step three, obtaining the conditional probabilities of the exact match pattern, the co-occurrence pattern, and the independent pattern in the query document D; step four, calculating the probability that the query document D generates the query; step five, calculating the similarity between the source-language query keywords and the feature vector of the query document; step six, calculating the conditional probability of the multilingual information retrieval; step seven, returning the retrieval results. The method is suitable for cross-language information retrieval.

Description

Technical field

[0001] The invention relates to the field of information technology, and in particular to a multilingual information retrieval method.

Background

[0002] With the explosive growth of information on the Internet, the languages in which that information is written are becoming increasingly international, and people place higher demands on information retrieval: they are no longer satisfied with searching a document set in a single language and instead want search results to include multilingual information. Querying a multilingual document set is becoming more and more common. To obtain more comprehensive and more accurate information, and to overcome language barriers, people hope to describe their query in the language they know best (for example, Chinese or English) and at the same time be presented with the document set written in other languages ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/28
Inventor: 郑德权, 朱红垒, 赵铁军
Owner: 哈尔滨工业大学高新技术开发总公司