Generating method and device of substitute dictionaries

A dictionary and word technology, applied in the field of data search, that addresses the low accuracy and recall of replacement dictionaries.

Active Publication Date: 2015-04-01
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
2 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0004] The biggest disadvantage of the above approach is that the replacement dictionary is generated directly with the IBM model, which results in low accuracy and recall for the generated replacement dictionary.

Examples

Embodiment 1

[0100] Referring to Figure 1, the method for generating a replacement dictionary provided in this embodiment specifically includes operations 101 to 104.

[0101] In operation 101, sentence pair resources are obtained.

[0102] Specifically, a sentence pair resource consists of a query sentence input by the user and the title of a result that the user clicked for that query; the words in the title that correspond to the query are shown in bold. These sentence pair resources are available on the Internet. For example, using the Baidu search tool, the user enters "teen movie", and Baidu displays the following results:

[0103] Top 10 teenage movies for girls of all time 2014 – Squidoo

[0104] www.Squidoo.com>…>Movies>Blockbuster Movies ▼ translate this page

[0105] These are my favorite high school movies. It’s probably a bit juvenile of me, but I always love a good teenage movie. And since I’m a girl, I guess…

[0106] Ranking the 10 Best Teen Films of 2013 Thus Far | BlackBook

[0107] www.bbo...
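
As a rough illustration of operation 101, the sketch below collects sentence pairs from a click log. The log layout, the `SentencePair` type, the `build_sentence_pairs` helper, and the `min_clicks` filter are all assumptions made for this example; the source only states that a pair combines the user's query with the clicked title.

```python
from collections import Counter
from typing import NamedTuple


class SentencePair(NamedTuple):
    query: str  # what the user typed
    title: str  # title of the result the user clicked


def build_sentence_pairs(click_log, min_clicks=1):
    """Collect (query, clicked title) pairs seen at least min_clicks times.

    click_log is a hypothetical iterable of (query, clicked_title) tuples.
    """
    counts = Counter((q.strip().lower(), t.strip()) for q, t in click_log)
    return [SentencePair(q, t) for (q, t), n in counts.items() if n >= min_clicks]


click_log = [
    ("teen movie", "Top 10 teenage movies for girls of all time 2014"),
    ("Teen movie", "Top 10 teenage movies for girls of all time 2014"),
    ("teen movie", "Ranking the 10 Best Teen Films of 2013 Thus Far"),
]
pairs = build_sentence_pairs(click_log, min_clicks=2)
```

Requiring more than one click per pair is one simple way to drop accidental clicks before alignment; the threshold would need tuning on real logs.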

Embodiment 2

[0159] Based on the foregoing embodiments, this embodiment provides another method for generating a replacement dictionary.

[0160] Referring to Figure 3, the method for generating a replacement dictionary provided in this embodiment specifically includes operations 201 to 208.

[0161] In operation 201, sentence pair resources are obtained. For details, refer to the description in Embodiment 1 above, and details will not be repeated here.

[0162] In operation 202, the sentence pair resources are preprocessed.

[0163] This operation performs error correction processing, word segmentation processing, part-of-speech tagging, proper name recognition, word segmentation correction processing, and data normalization processing on sentence pair resources. Through the above preprocessing, more erroneous data in the sentence pair resources can be filtered out, and alignment errors caused by partial word segmentation errors can be avoided. For example, before the word segmentat...
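
The data normalization and filtering part of operation 202 might look roughly like the sketch below. It covers only normalization and a crude filter; error correction, part-of-speech tagging, and proper name recognition are not shown. The whitespace tokenizer stands in for a real word segmenter, and `normalize`, `preprocess`, and `max_len_ratio` are hypothetical names chosen for this example.

```python
import re
import unicodedata


def normalize(text):
    """Fold full-width characters, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # full-width forms -> ASCII
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def preprocess(pairs, max_len_ratio=5.0):
    """Normalize and tokenize (query, title) pairs, dropping suspicious ones.

    Assumption: whitespace splitting replaces real word segmentation, and a
    title far longer than its query is treated as noise.
    """
    cleaned = []
    for query, title in pairs:
        q_tokens = normalize(query).split()
        t_tokens = normalize(title).split()
        if not q_tokens or not t_tokens:
            continue  # empty side: nothing to align
        if len(t_tokens) / len(q_tokens) > max_len_ratio:
            continue  # length mismatch: likely a poor pair
        cleaned.append((q_tokens, t_tokens))
    return cleaned


cleaned = preprocess([("Ｔｅｅｎ　 Movie ", "Top 10 teenage movies"), ("", "x")])
```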

Embodiment 3

[0198] Referring to Figure 5, the device for generating a replacement dictionary provided in this embodiment specifically includes an acquisition module 11, a rule alignment module 12, a statistical alignment module 13, and a generation module 14.

[0199] The acquisition module 11 is used to obtain sentence pair resources;

[0200] The rule alignment module 12 is used to perform rule alignment on the sentence pair resources by using language prior knowledge, so as to generate a first replacement dictionary;

[0201] The statistical alignment module 13 is used to perform statistical alignment on the remaining corpus in the sentence pair resources by using an IBM model that incorporates language prior knowledge, so as to generate a second replacement dictionary, wherein the remaining corpus consists of the words left in the sentence pair resources after the rule alignment module has performed rule alignment;

[0202] The generation module 14 is used for generating a third replacement d...
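
To make the hand-off between the rule alignment module and the statistical alignment module concrete, here is a minimal sketch of rule alignment that also returns the remaining words. The plural-stripping rule in `_stem` is a toy stand-in for "language prior knowledge"; the source does not say which rules are actually used, and all function names here are hypothetical.

```python
def _stem(word):
    # Toy rule (assumption): treat a trailing "s" as a plural marker.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def rule_align(query_tokens, title_tokens):
    """Align words that share a stem; return what is left for the next stage.

    Returns (first_dict, remaining_query, remaining_title), where first_dict
    is a set of (query word, replacement word) pairs.
    """
    first_dict = set()
    remaining_title = list(title_tokens)
    remaining_query = []
    for q in query_tokens:
        match = next((t for t in remaining_title if _stem(t) == _stem(q)), None)
        if match is None:
            remaining_query.append(q)  # left for statistical alignment
            continue
        remaining_title.remove(match)
        if match != q:  # identical words align trivially, no replacement needed
            first_dict.add((q, match))
    return first_dict, remaining_query, remaining_title


first_dict, rest_q, rest_t = rule_align(
    ["teen", "movie"], ["top", "10", "teenage", "movies"]
)
```

In this example "movie" and "movies" are aligned by rule, so only "teen" and the unaligned title words are passed on as the remaining corpus.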

Abstract

The invention discloses a method and device for generating substitute dictionaries. The method includes: acquiring sentence pair resources, and subjecting the sentence pair resources to rule alignment by using language prior knowledge so as to generate a first substitute dictionary; subjecting the remaining corpora of the sentence pair resources to statistical alignment by using an IBM model into which the language prior knowledge is integrated, so as to generate a second substitute dictionary; and generating a third substitute dictionary, usable online, according to the first and second substitute dictionaries. The remaining corpora are the words and expressions left after the sentence pair resources have been subjected to rule alignment. The method and device help increase the accuracy and recall rate of the substitute dictionaries.
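
The statistical stage and the final merge can be sketched as follows. This is plain IBM Model 1 trained with EM, not the prior-knowledge variant the abstract describes, and the probability threshold used in `merge` is an assumption: the source does not state how the third dictionary is derived from the first two.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Plain IBM Model 1. pairs: list of (query_tokens, title_tokens).

    Returns t where t[(s, w)] estimates P(title word w | query word s).
    """
    vocab = {w for _, title in pairs for w in title}
    t = defaultdict(lambda: 1.0 / len(vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for query, title in pairs:
            for w in title:
                norm = sum(t[(s, w)] for s in query)
                for s in query:
                    frac = t[(s, w)] / norm  # expected alignment count
                    count[(s, w)] += frac
                    total[s] += frac
        t = defaultdict(float, {k: c / total[k[0]] for k, c in count.items()})
    return t


def merge(first_dict, t, threshold=0.5):
    """Hypothetical merge: rule pairs plus confident statistical pairs."""
    third = set(first_dict)
    third.update((s, w) for (s, w), p in t.items() if p >= threshold and s != w)
    return third


remaining = [
    (["teen"], ["teenage"]),
    (["teen", "film"], ["teenage", "movie"]),
    (["film"], ["movie"]),
]
t = ibm_model1(remaining)
third_dict = merge({("movie", "movies")}, t)
```

On this toy corpus the model settles on "teen" to "teenage" and "film" to "movie", and the merged dictionary keeps the rule-derived pair alongside them.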

Description

Technical field

[0001] Embodiments of the present invention relate to data search technology, and in particular to a method and device for generating a replacement dictionary.

Background technique

[0002] When the search engine retrieves the sentence input by the user, in order to return more search results, it needs to replace the keywords in the sentence with synonyms, and then use the replaced synonyms to search. In the search engine, the rewriting module is responsible for synonymous replacement of the keywords in the sentence according to the replacement dictionary. Therefore, the quality of the replacement dictionary directly determines the retrieval effect, and improving the accuracy and recall of the replacement dictionary will directly bring relevance benefits.

[0003] At present, the common way to generate a replacement dictionary is: for sentence pair resources, first use the IBM model to perform statistical alignment to generate a replacement dictionary, and t...
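
The role of the rewriting module described in [0002] can be illustrated with a small sketch that expands a query by swapping one keyword at a time for its dictionary entries. The dictionary layout (a mapping from a word to a list of replacements) and the `rewrite_query` function are assumptions for this example, not the search engine's actual interface.

```python
def rewrite_query(query, replacement_dict):
    """Return the original query plus single-word synonym variants."""
    tokens = query.split()
    variants = [query]
    for i, token in enumerate(tokens):
        for synonym in replacement_dict.get(token, []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants


variants = rewrite_query("teen movie", {"teen": ["teenage"], "movie": ["film"]})
```

A wrong dictionary entry would inject an irrelevant variant into every matching query, which is why the accuracy of the dictionary bears so directly on retrieval quality.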

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/374
Inventor: 石磊, 李朋凯, 曾增烽, 林英展
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD