
Substitution dictionary generating method and device

A technology relating to dictionaries and words, applied in the field of data search, that addresses the low accuracy and recall rate of replacement dictionaries

Active Publication Date: 2017-11-03
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] The main disadvantage of the above approach is that it generates the replacement dictionary directly with the IBM model, which results in low accuracy and recall for the generated dictionary.


Examples


Embodiment 1

[0100] Referring to Figure 1, the replacement dictionary generation method provided in this embodiment includes operations 101 to 104.

[0101] In operation 101, a sentence pair resource is obtained.

[0102] Specifically, each sentence pair in the resource consists of a query entered by the user and the words of the search-result title that the user clicked for that query (shown in bold in the original results). Such sentence pair resources can be collected from the Internet. For example, when a user enters "teen movie" in the Baidu search tool, Baidu returns the following results:

[0103] Top 10 teenage movies for girls of all time 2014 – Squidoo

[0104] www.Squidoo.com > … > Movies > Blockbuster Movies ▼ translate this page

[0105] These are my favorite high school movies. It’s probably a bit juvenile of me, but I always love a good teenage movie. And since I’m a girl, I guess……

[0106] Ranking the 10 Best Teen Films of 2013 Thus Far | BlackBook

[0107] www.bbook.com / ranking-the-10-best-teen-fil...
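The following Python sketch makes the idea of a sentence pair concrete. It aggregates a click log into (query, clicked title) pairs. The log format, the `build_sentence_pairs` function, and the `min_clicks` noise filter are hypothetical illustrations. The patent does not specify how click data is stored or filtered.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical click-log record: (query the user typed, title of the result they clicked).
ClickRecord = Tuple[str, str]


def build_sentence_pairs(log: Iterable[ClickRecord], min_clicks: int = 1) -> List[Tuple[str, str]]:
    """Aggregate click records into (query, clicked title) sentence pairs.

    `min_clicks` is a hypothetical noise filter: pairs clicked fewer times are dropped.
    """
    counts = Counter(
        (query.strip().lower(), title.strip().lower())
        for query, title in log
        if query.strip() and title.strip()  # skip records with an empty side
    )
    # Most-clicked pairs first; ties broken alphabetically for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, n in ranked if n >= min_clicks]
```

Consider a log in which "teen movie" led to two clicks on the Squidoo title and one click on the BlackBook title. With `min_clicks=2`, only the Squidoo pair is kept.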

Embodiment 2

[0159] Building on the foregoing embodiment, this embodiment provides another replacement dictionary generation method.

[0160] Referring to Figure 3, the replacement dictionary generation method provided in this embodiment includes operations 201 to 208.

[0161] In operation 201, sentence pair resources are obtained. For details, refer to the description in Embodiment 1; they are not repeated here.

[0162] In operation 202, the sentence pair resources are preprocessed.

[0163] This operation applies error correction, word segmentation, part-of-speech tagging, proper-name recognition, word segmentation correction, and data normalization to the sentence pair resources. This preprocessing filters out much of the erroneous data in the sentence pair resources and avoids alignment errors caused by incorrect word segmentation. For example, prior to word segmentation processing, first perform error correcti...
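The preprocessing stage can be sketched as a small pipeline. This toy illustration rests on several assumptions:

- Whitespace splitting stands in for a real word segmenter. Chinese text would need a dedicated segmenter.
- `TYPO_FIXES` and `PROPER_NAMES` are hypothetical lookup tables. They stand in for the error-correction and proper-name recognition models.
- Part-of-speech tagging is omitted.
- Normalization runs first so that the lookups are case-insensitive. This order is also an assumption. The patent only states that error correction precedes segmentation.

```python
import re
import unicodedata
from typing import Dict, List, Set

# Hypothetical lookup tables standing in for real error-correction and proper-name models.
TYPO_FIXES: Dict[str, str] = {"moive": "movie", "teh": "the"}
PROPER_NAMES: Set[str] = {"new york", "high school musical"}


def normalize(text: str) -> str:
    """Data normalization: NFKC folds full-width forms; then lowercase and collapse spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def correct_errors(text: str) -> str:
    """Error correction before segmentation, here a whole-word table lookup."""
    return re.sub(r"\b\w+\b", lambda m: TYPO_FIXES.get(m.group(0), m.group(0)), text)


def merge_proper_names(tokens: List[str], max_len: int = 3) -> List[str]:
    """Segmentation correction: rejoin known multi-word proper names into one token."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):  # prefer the longest matching name
            if i + n <= len(tokens) and " ".join(tokens[i:i + n]) in PROPER_NAMES:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:  # no name starts here; keep the single token
            out.append(tokens[i])
            i += 1
    return out


def preprocess(sentence: str) -> List[str]:
    text = correct_errors(normalize(sentence))
    tokens = text.split()  # whitespace split stands in for a real word segmenter
    return merge_proper_names(tokens)
```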

Embodiment 3

[0198] Referring to Figure 5, the replacement dictionary generation device provided in this embodiment includes an acquisition module 11, a rule alignment module 12, a statistical alignment module 13, and a generation module 14.

[0199] The acquisition module 11 is configured to obtain sentence pair resources;

[0200] The rule alignment module 12 is configured to perform rule-based alignment on the sentence pair resources using linguistic prior knowledge, to generate a first replacement dictionary;

[0201] The statistical alignment module 13 is configured to perform statistical alignment on the remaining corpus of the sentence pair resources, using an IBM model that incorporates linguistic prior knowledge, to generate a second replacement dictionary, where the remaining corpus consists of the words in the sentence pair resources that remain after the rule alignment module 12 has performed rule-based alignment;

[0202] The generation module 14 is configured to generate a third replacement dictionary avail...
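A minimal sketch of rule-based alignment and of how the remaining corpus arises follows. The matching rules are illustrative stand-ins for the unspecified linguistic prior knowledge:

- identical words;
- a hypothetical `KNOWN_VARIANTS` table;
- a naive plural rule.

`rule_align` is likewise a hypothetical name. Words that no rule aligns are passed on as the remaining corpus for statistical alignment.

```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[List[str], List[str]]  # (query words, clicked-title words)

# Hypothetical prior knowledge: known variant pairs such as abbreviations or spelling variants.
KNOWN_VARIANTS: Set[Tuple[str, str]] = {("teen", "teenage")}


def rule_match(q: str, t: str) -> bool:
    """Illustrative alignment rules standing in for linguistic prior knowledge."""
    return (
        q == t  # identical word
        or (q, t) in KNOWN_VARIANTS  # listed variant
        or q + "s" == t
        or t + "s" == q  # naive singular/plural rule
    )


def rule_align(pairs: List[Pair]) -> Tuple[Counter, List[Pair]]:
    """Return (first replacement dictionary as pair counts, remaining corpus)."""
    first_dict: Counter = Counter()
    remaining: List[Pair] = []
    for query, title in pairs:
        used_q: Set[int] = set()
        used_t: Set[int] = set()
        for i, q in enumerate(query):
            for j, t in enumerate(title):
                if j not in used_t and rule_match(q, t):
                    if q != t:  # identical words align but are not useful substitutes
                        first_dict[(q, t)] += 1
                    used_q.add(i)
                    used_t.add(j)
                    break  # each query word aligns to at most one title word
        # Unaligned words on both sides form the remaining corpus.
        rest_q = [q for i, q in enumerate(query) if i not in used_q]
        rest_t = [t for j, t in enumerate(title) if j not in used_t]
        if rest_q and rest_t:
            remaining.append((rest_q, rest_t))
    return first_dict, remaining
```

Take the pairs ("teen movie", "top 10 teenage movies for girls") and ("teen movie", "best teen films"). These rules put ("teen", "teenage") and ("movie", "movies") into the first dictionary. They leave (["movie"], ["best", "films"]) for statistical alignment.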


Abstract

The invention discloses a method and device for generating substitute dictionaries. The method includes: acquiring sentence pair resources and performing rule alignment on them using linguistic prior knowledge, to generate a first substitute dictionary; performing statistical alignment on the remaining corpus of the sentence pair resources using an IBM model that integrates the linguistic prior knowledge, to generate a second substitute dictionary; and generating, from the first and second substitute dictionaries, a third substitute dictionary that is usable online. The remaining corpus consists of the words and expressions left after rule alignment has been performed on the sentence pair resources. The method and device help increase the accuracy and recall rate of the substitute dictionaries.
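The statistical step can be illustrated with IBM Model 1, trained by expectation maximization on the remaining corpus. The patent does not say how prior knowledge is fused into the IBM model. This sketch makes several assumptions:

- Prior knowledge enters only by boosting the starting weight of word pairs listed in a hypothetical `prior` table.
- The NULL word is omitted.
- The third dictionary is a plain union of the first two.
- The 0.5 threshold is arbitrary.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[List[str], List[str]]  # (query words, clicked-title words)
Prob = Dict[Tuple[str, str], float]  # (query word, title word) -> t(title word | query word)


def ibm1_with_prior(corpus: List[Pair], prior: Prob, iterations: int = 10) -> Prob:
    """Estimate IBM Model 1 translation probabilities with EM (no NULL word).

    Assumption: prior knowledge only boosts the starting weight of listed pairs.
    """
    corpus = [(q, f) for q, f in corpus if q and f]  # EM needs both sides non-empty
    if not corpus:
        return {}
    vocab_size = len({w for _, title in corpus for w in title})
    # Unnormalized starting weights; the first M-step turns them into probabilities.
    t: Prob = defaultdict(lambda: 1.0 / vocab_size)
    for pair, boost in prior.items():
        t[pair] = boost / vocab_size
    for _ in range(iterations):
        count: Prob = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        for query, title in corpus:
            for f in title:
                # E-step: share each title word among the query words that may generate it.
                z = sum(t[(e, f)] for e in query)
                for e in query:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[e] += c
        # M-step: normalize expected counts per query word.
        t = defaultdict(float, {(e, f): c / total[e] for (e, f), c in count.items()})
    return dict(t)


def second_dictionary(t: Prob, threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Keep confident, non-identical word pairs as replacement candidates."""
    return {(e, f) for (e, f), p in t.items() if p >= threshold and e != f}


def third_dictionary(first: Set[Tuple[str, str]], second: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Assumption: the online dictionary is the union of both sources."""
    return first | second
```

Consider a single ambiguous pair such as ("teen movie", "adolescent films"). Plain Model 1 leaves every probability at 0.5. A prior boost on ("teen", "adolescent") breaks the tie. This is one plausible way prior knowledge could help align sparse data.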

Description

Technical field

[0001] The embodiments of the present invention relate to data search technology, and in particular to a method and device for generating a replacement dictionary.

Background

[0002] When searching for a sentence entered by a user, a search engine needs to replace keywords in the sentence with synonyms in order to return more search results, and then searches using the substituted synonyms. In a search engine, the rewriting module is responsible for replacing keywords in a sentence with synonyms according to the replacement dictionary. The quality of the replacement dictionary therefore directly determines search effectiveness, and improving its accuracy and recall directly improves relevance.

[0003] At present, the common method for generating a replacement dictionary is as follows: for the sentence pair resources, first perform statistical alignment with the IBM model to generate a replacement dictionary, and then use the language prio...
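One simple way a rewriting module could apply a replacement dictionary is to generate query variants, one substitution at a time. The dictionary contents and the `rewrite_query` function are hypothetical. The patent does not describe how the rewriting module combines substitutes.

```python
from typing import Dict, List

# Hypothetical replacement dictionary: word -> substitutes produced offline.
REPLACEMENTS: Dict[str, List[str]] = {"teen": ["teenage"], "movie": ["film", "movies"]}


def rewrite_query(tokens: List[str], replacements: Dict[str, List[str]]) -> List[List[str]]:
    """Return the original query plus one variant per single-word substitution."""
    variants = [list(tokens)]
    for i, word in enumerate(tokens):
        for sub in replacements.get(word, []):
            # Replace only position i so each variant differs by one substitution.
            variants.append(tokens[:i] + [sub] + tokens[i + 1:])
    return variants
```

For ["teen", "movie"], this returns the original query plus ["teenage", "movie"], ["teen", "film"], and ["teen", "movies"]. A search engine could then retrieve results for each variant.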


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/374
Inventors: Shi Lei (石磊), Li Pengkai (李朋凯), Zeng Zengfeng (曾增烽), Lin Yingzhan (林英展)
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD