Method and device for generating replacement dictionaries
A technology relating to dictionaries and words, applied in the field of data search, that addresses the low accuracy and low recall rate of replacement dictionaries.
Examples
Embodiment 1
[0100] Referring to Figure 1, the method for generating a replacement dictionary provided in this embodiment specifically includes operations 101 to 104.
[0101] In operation 101, sentence pair resources are obtained.
[0102] Specifically, a sentence pair resource consists of a query sentence entered by the user and the words corresponding to that query in the title of the search result the user clicked (displayed in bold by the search engine). Such sentence pair resources are available on the Internet. For example, when a user enters "teen movie" into the Baidu search engine, Baidu displays the following results:
Top 10 teenage movies for girls of all time 2014 – Squidoo
[0104] www.Squidoo.com>…>Movies>Blockbuster Movies ▼ translate this page
[0105] These are my favorite high school movies. It’s probably a bit juvenile of me, but I always love a good teenage movie. And since I’m a girl, I guess…
[0106] Ranking the 10 Best Teen Films of 2013 Thus Far | BlackBook
[0107] www.bbo...
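A minimal Python sketch of operation 101, assuming the sentence pairs are harvested from a hypothetical click log of (query, clicked title) records. The record format, the `SentencePair` type and the de-duplication step are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One sentence pair: a user query and the title of the result it clicked."""
    query: str
    title: str


def build_sentence_pairs(click_log):
    """Turn (query, clicked_title) records into de-duplicated sentence pairs.

    `click_log` is a hypothetical iterable of (query, clicked_title) tuples;
    the patent does not specify how the click data is stored.
    """
    seen = set()
    pairs = []
    for query, title in click_log:
        pair = SentencePair(query.strip(), title.strip())
        # Skip records with an empty side and exact repeats of earlier pairs.
        if pair.query and pair.title and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

For the "teen movie" example, feeding in the clicked titles "Top 10 teenage movies" (twice) and "Ranking the 10 Best Teen Films" yields two distinct pairs, each pairing the query with one title.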
Embodiment 2
[0159] Based on the foregoing embodiments, this embodiment provides another method for generating a replacement dictionary.
[0160] Referring to Figure 3, the method for generating a replacement dictionary provided in this embodiment specifically includes operations 201 to 208.
[0161] In operation 201, sentence pair resources are obtained. For details, refer to the description in Embodiment 1 above, and details will not be repeated here.
[0162] In operation 202, the sentence pair resources are preprocessed.
[0163] This operation applies error correction, word segmentation, part-of-speech tagging, proper-name recognition, word segmentation correction, and data normalization to the sentence pair resources. This preprocessing filters out much of the erroneous data in the sentence pair resources and prevents alignment errors caused by individual word segmentation errors. For example, before the word segmentat...
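A minimal sketch of two of the preprocessing steps in operation 202, data normalization and word segmentation, assuming whitespace-delimited text such as the English example above. The regex segmenter is a hypothetical stand-in for a real word segmenter (Chinese input would need a proper one), and error correction, part-of-speech tagging and proper-name recognition are omitted. Splitting letter runs from digit runs happens to repair fused tokens like "10teenage", which illustrates how segmentation correction avoids alignment errors.

```python
import re
import unicodedata

# Hypothetical segmenter: runs of letters or runs of digits. Splitting letters
# from digits also repairs fused tokens such as "10teenage" -> "10", "teenage".
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")


def normalize(text: str) -> str:
    """Data normalization: NFKC folds full-width forms to half-width,
    then the text is lowercased and whitespace is collapsed."""
    text = unicodedata.normalize("NFKC", text).lower()
    return " ".join(text.split())


def segment(text: str) -> list[str]:
    """Stand-in word segmenter based on a regular expression."""
    return _TOKEN_RE.findall(text)


def preprocess(pairs):
    """Normalize and segment (query, title) string pairs, dropping any pair
    whose query or title yields no tokens (treated as erroneous data)."""
    cleaned = []
    for query, title in pairs:
        q_tokens = segment(normalize(query))
        t_tokens = segment(normalize(title))
        if q_tokens and t_tokens:
            cleaned.append((q_tokens, t_tokens))
    return cleaned
```

For example, `preprocess([("Teen Movie", "Top 10teenage movies")])` returns `[(['teen', 'movie'], ['top', '10', 'teenage', 'movies'])]`.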
Embodiment 3
[0198] Referring to Figure 5, the device for generating a replacement dictionary provided in this embodiment specifically includes an acquisition module 11, a rule alignment module 12, a statistical alignment module 13, and a generation module 14.
[0199] The acquisition module 11 is configured to obtain sentence pair resources;
[0200] The rule alignment module 12 is configured to perform rule alignment on the sentence pair resources using linguistic prior knowledge, to generate a first replacement dictionary;
[0201] The statistical alignment module 13 is configured to perform statistical alignment on the remaining corpus in the sentence pair resources, using an IBM model that incorporates linguistic prior knowledge, to generate a second replacement dictionary; wherein the remaining corpus consists of the words in the sentence pair resources that remain after the rule alignment module has performed rule alignment;
[0202] The generation module 14 is used for generating a third replacement d...
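The rule alignment performed by module 12 can be sketched as below. The patent does not list its prior-knowledge rules, so the seed synonym table `SEED_SYNONYMS` and the shared-prefix rule with `MIN_PREFIX = 4` (a crude stand-in for morphological knowledge such as teen/teenage) are hypothetical. Input is a list of (query_tokens, title_tokens) pairs; identical words are aligned and consumed but not recorded, since a word is not a useful replacement for itself.

```python
from collections import Counter

# Hypothetical prior knowledge; the patent does not list its rules.
SEED_SYNONYMS = {("tv", "television"), ("television", "tv")}
MIN_PREFIX = 4  # crude stand-in for morphological knowledge


def rule_match(q: str, t: str) -> bool:
    """True if a prior-knowledge rule links query word q to title word t."""
    if (q, t) in SEED_SYNONYMS:
        return True
    # Morphological variant: one word is a prefix of the other (teen/teenage).
    return min(len(q), len(t)) >= MIN_PREFIX and (t.startswith(q) or q.startswith(t))


def rule_align(pairs):
    """Rule-align (query_tokens, title_tokens) pairs.

    Returns (first_dict, remaining): first_dict counts aligned, non-identical
    (query_word, title_word) pairs; remaining holds the unaligned query and
    title tokens of every pair, to be passed on to statistical alignment.
    """
    first_dict = Counter()
    remaining = []
    for q_tokens, t_tokens in pairs:
        unused_title = list(t_tokens)
        unaligned_query = []
        for q in q_tokens:
            if q in unused_title:
                # Identical words align trivially but are not replacements.
                unused_title.remove(q)
                continue
            match = next((t for t in unused_title if rule_match(q, t)), None)
            if match is None:
                unaligned_query.append(q)
            else:
                unused_title.remove(match)
                first_dict[(q, match)] += 1
        remaining.append((unaligned_query, unused_title))
    return first_dict, remaining
```

On the two "teen movie" pairs, this records teen→teenage and movie→movies, while movie/films stays in the remaining corpus: no rule in this sketch links them, which is exactly the kind of pair left to statistical alignment.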
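The statistical alignment performed by module 13 can be sketched with IBM Model 1 trained by expectation maximization, with the remaining corpus represented as (query_tokens, title_tokens) lists. The patent says only "an IBM model that incorporates language prior knowledge", so both the choice of Model 1 and the way prior knowledge enters are assumptions: here a hypothetical `prior` set of (query_word, title_word) pairs simply receives a boosted initial probability. The `threshold` used to extract dictionary entries is likewise illustrative.

```python
from collections import defaultdict

NULL = "<null>"  # empty source word, as in standard IBM Model 1


def ibm1_align(corpus, prior=None, boost=5.0, iterations=10, threshold=0.5):
    """IBM Model 1 EM over (query_tokens, title_tokens) pairs.

    prior: hypothetical set of (query_word, title_word) pairs from linguistic
    prior knowledge; their initial probabilities are multiplied by `boost`.
    Returns {query_word: (best_title_word, probability)} for entries whose
    probability reaches `threshold`.
    """
    prior = prior or set()
    target_vocab = {e for _, title in corpus for e in title}
    uniform = 1.0 / max(len(target_vocab), 1)
    # t[(e, f)] approximates P(title word e | query word f).
    t = defaultdict(lambda: uniform)
    for f, e in prior:
        t[(e, f)] = uniform * boost

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each title word among the query words plus NULL.
        for query, title in corpus:
            sources = [NULL] + list(query)
            for e in title:
                z = sum(t[(e, f)] for f in sources)
                for f in sources:
                    share = t[(e, f)] / z
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})

    best = {}
    for (e, f), p in t.items():
        if f == NULL or e == f:
            continue
        if p >= threshold and p > best.get(f, ("", 0.0))[1]:
            best[f] = (e, p)
    return best
```

Because the M-step renormalizes from expected counts, the boost in this sketch only steers the early iterations; a stronger way to inject prior knowledge would be a design decision beyond what the patent text states.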