Index term weighting computation method based on structural constraints in Chinese information retrieval

A technique for information retrieval with structural constraints, applied in computing, instruments, electrical digital data processing, etc. It addresses the problem that the accuracy of index term weight calculation cannot be guaranteed, because existing methods do not consider an index term's context or its position relative to the words in the word list.

Active Publication Date: 2011-05-11
THE HONG KONG POLYTECHNIC UNIV

AI Technical Summary

Problems solved by technology

In Chinese, new words appear frequently, so the word list must also be updated frequently. Once the word list is updated, the previously used list becomes obsolete, the weights of the index terms must be recalculated, and the index must be rebuilt. Such frequent updating of the engine is difficult to achieve, and the accuracy of the weight calculation cannot be guaranteed.
[0017] In this case, the weight calculation for index terms that are not in the word list is particularly important. In the prior art, the weight of such an index term is calculated as follows. If the index is word-based, the weight is calculated from single words. For example, the weight of the word "Hong Kong" is calculated from the words "Hong Kong" and "Li", without considering the relationship between the word and its context, that is, the relationship between "Hong Kong" and "Science and Technology". The calculation is therefore inaccurate.
If the index is not word-based, a statistical n-gram method is usually used to segment the text and calculate the weights. When calculating the weight of an index term obtained in this way, the method does not consider whether the index term is part of a word in the word list, whether it lies on the boundary between two words in the word list, or whether it is itself a word, so the calculated weight is also inaccurate.
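The three cases named above (an n-gram that is itself a word, one that is part of a word, and one that lies on a word boundary) can be illustrated with a minimal sketch. It is not the patent's method: the function names, the labels `word`, `inside`, and `boundary`, and the sample string are assumptions chosen for illustration, and the segmentation is supplied by hand rather than looked up in a real word list.

```python
from typing import List, Tuple


def char_ngrams(text: str, n: int = 2) -> List[Tuple[int, str]]:
    """Return (start_offset, gram) for every overlapping character n-gram."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


def classify(start: int, gram: str, words: List[str]) -> str:
    """Classify an n-gram against a segmentation of the same text.

    Hypothetical labels:
      'word'     - the n-gram coincides exactly with one segmented word
      'inside'   - the n-gram lies within a single, longer word
      'boundary' - the n-gram spans the boundary between words
    """
    end = start + len(gram)
    pos = 0
    for w in words:
        w_start, w_end = pos, pos + len(w)
        if start == w_start and end == w_end:
            return "word"
        if w_start <= start and end <= w_end:
            return "inside"
        pos = w_end
    return "boundary"


if __name__ == "__main__":
    text = "香港理工大学"                # illustrative string, not from the source
    words = ["香港", "理工", "大学"]     # hand-made segmentation (assumption)
    for offset, gram in char_ngrams(text, 2):
        print(offset, gram, classify(offset, gram, words))
```

A weighting scheme that ignores these labels treats all five bigrams of the sample string alike, which is the inaccuracy the paragraph above describes.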

Examples

Embodiment Construction

[0036] Specific embodiments of the present invention will be described in detail below. It should be noted that the embodiments described here are for illustration only, and are not intended to limit the present invention.

[0037] Figure 2 shows a flow chart of the index term weight calculation method based on structural constraints in the present invention. A query D1 can be structured by following the steps shown in Figure 1 to form a structured query D2. Let q be a Chinese query that is structured as in formula (1). If there is no string group, formula (1) can be simplified as:

[0038] $[(q_{1,1}, t_{1,1}), T_1], \ldots, [(q_{m,1}, t_{m,1}), T_m] \qquad (2)$

[0039] Where there is no type identifier, that is, no $T_i$, then (2) can be further simplified as:

[0040] $(q_{1,1}, t_{1,1}), \ldots, (q_{m,1}, t_{m,1}) \qquad (3)$

[0041] If there is no part-of-speech tag, that is, there is no $t_{i,j}$, then in this special case the structured query can be simplified to: ...
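The step from formula (2) to formula (3) can be pictured as a small data-structure sketch. The class name `Group`, the function name `drop_type_ids`, and the tag values `"n"` and `"NP"` are hypothetical and are not taken from the patent. The sketch covers only the step from (2) to (3), since the further simplified form is elided in the text above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Group:
    """One element of formula (2): a term q_{i,1}, its part-of-speech
    tag t_{i,1}, and an optional type identifier T_i (None if absent)."""
    term: str
    pos: str
    type_id: Optional[str] = None


def drop_type_ids(groups: List[Group]) -> List[Tuple[str, str]]:
    """Formula (2) -> formula (3): keep only the (q, t) pairs."""
    return [(g.term, g.pos) for g in groups]


if __name__ == "__main__":
    # Hypothetical structured query; the tags are placeholders.
    query = [Group("香港", "n", "NP"), Group("理工", "n", "NP")]
    print(drop_type_ids(query))
```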

Abstract

The invention relates to an index term weighting computation method based on structural constraints in Chinese information retrieval. The method comprises the following steps: a. structuring a query to obtain a structured query result, wherein the structuring comprises one or more of the following steps: segmenting the query into words, part-of-speech tagging of the segmented words, shallow parsing of the query, or parsing of the query; b. determining an index term according to the structured query result, and then determining a query-context property set of the index term according to the parts of the structured query result that are adjacent to the index term and present in a word list; c. computing the weight of each property in the query-context property set; d. combining the weights of all properties into a property value of the index term through a first composite function; and e. combining the property values of the index term through a second composite function to obtain the index term weight. The method can compute the weight accurately regardless of whether the index term exists in the word list.
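Steps d and e of the abstract can be sketched as two nested folds. The abstract does not say what the two composite functions are, so the defaults below (`mean` for the first, `max` for the second) are placeholders. The property names are hypothetical. The sketch also assumes, as one possible reading, that an index term can have several query-context property sets (for instance one per occurrence in the query), each of which yields one property value.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A composite function folds a sequence of numbers into one number.
Composite = Callable[[Sequence[float]], float]


def property_value(property_weights: Dict[str, float],
                   first: Composite = mean) -> float:
    """Step d: fold the weights of one query-context property set into a
    single property value (placeholder composite: arithmetic mean)."""
    return first(list(property_weights.values()))


def index_term_weight(property_sets: List[Dict[str, float]],
                      first: Composite = mean,
                      second: Composite = max) -> float:
    """Step e: fold the property values of an index term into its final
    weight (placeholder composite: maximum)."""
    return second([property_value(ps, first) for ps in property_sets])


if __name__ == "__main__":
    # Hypothetical property weights, as step c might produce them.
    sets = [
        {"left_neighbour_in_word_list": 1.0, "right_neighbour_in_word_list": 0.5},
        {"left_neighbour_in_word_list": 0.25, "right_neighbour_in_word_list": 0.25},
    ]
    print(index_term_weight(sets))
```

Passing the composites as parameters keeps the sketch neutral about their actual form: any function from a sequence of numbers to a number can be substituted for `first` or `second`.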

Description

Technical field

[0001] The invention relates to Chinese information retrieval technology, and in particular to a method for calculating the weight of index terms based on structural constraints in Chinese information retrieval.

Background technique

[0002] With the popularity of the Internet, a large amount of information is rapidly accumulated and widely used. Distance in time and space is therefore no longer the biggest obstacle to accessing and using information. Instead, the problem is the lack of an efficient way to find the desired information in the vast amount of information on the Internet. Information retrieval technologies have received special attention in recent years because they give users convenient ways to access and use the desired information.

[0003] A search engine is built on information retrieval technology. An important function of a search engine is to provide retrieva...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventor: 陆永邦
Owner: THE HONG KONG POLYTECHNIC UNIV