Cross-language non-standard word recognition method and device
A cross-language non-standard word recognition method, applied in the field of text processing, addresses the complex and diverse forms of non-standard words, their ambiguity, and high cost, and achieves easy extension, reduced dependence on expert resources, and lower cost.
Examples
example 2
[0055] Example 2: La Chaux du Milieu has an area, as of 2009, of 17.3 square kilometers (6.7 sq mi).
[0056] In example 1 and example 2, the results after word segmentation are result 1 and result 2 respectively.
[0057] Result 1: On March 26, 2013, in "Morning News", male CCTV reporters who appeared on the screen asked three questions in English.
[0058] Result 2: La Chaux du Milieu has an area , as of 2009 , of 17.3 square kilometers ( 6.7 sq mi ) .
[0059] Selecting a word segmentation method suited to the characteristics of each language ensures accurate word segmentation and improves the accuracy of subsequent non-standard word recognition.
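For a whitespace-delimited language such as English, the segmentation shown in Result 2 can be approximated with a simple token pattern. The following is only a minimal sketch: the function name `segment_english` and the regular expression are assumptions made for illustration, not the segmenter used in the disclosure, and a language such as Chinese would need a dedicated segmenter instead.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the disclosure):
# decimal numbers stay whole, words stay whole, and every other
# non-space character becomes its own token.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")


def segment_english(text: str) -> list[str]:
    """Split English text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = ("La Chaux du Milieu has an area, as of 2009, "
            "of 17.3 square kilometers (6.7 sq mi).")
# Joining the tokens with spaces gives the segmented form of the sentence.
print(" ".join(segment_english(sentence)))
```

Joining the tokens with single spaces reproduces the spacing of Result 2 for this sentence; other inputs (hyphenated numbers, abbreviations) may need a richer pattern.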
[0060] Step S13, preprocessing the non-standard words in the target language corpus after the word segmentation.
[0061] In an embodiment of the present disclosure, preprocessing the non-standard words in the target language corpus after word segmentation includes: converting the numbers in the non-standard ...
example 3-1
[0069] Example 3-1: International Callers Call: 1-505-998-3793, totally free.
[0070] After preprocessing:
example 3-2
[0071] Example 3-2: International Callers Call : D_1-D_3-D_3-D_4 , totally free .
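Comparing Example 3-1 with Example 3-2 suggests that each run of digits is replaced by a placeholder recording its length (`1` becomes `D_1`, `3793` becomes `D_4`). The sketch below illustrates that reading only; the function name `mask_digits` is hypothetical, and the rule is inferred from the example rather than from the full text of the preprocessing step.

```python
import re


def mask_digits(segmented_text: str) -> str:
    """Replace every maximal run of digits with D_<run length>.

    Assumption: the D_n convention is inferred from Example 3-2.
    """
    return re.sub(r"\d+", lambda m: f"D_{len(m.group())}", segmented_text)


example = "International Callers Call : 1-505-998-3793 , totally free ."
print(mask_digits(example))
```

Masking digits this way keeps the shape of a number (here, the grouping of a telephone number) while removing the specific values, which reduces the vocabulary a downstream classifier has to handle.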
[0072] In the current English bilingual dictionary, find the following entries corresponding to the target word, and determine the replacement word and its replacement word vector in the source language.
[0073]
[0074] In one embodiment, setting the substitution weight based on the target word includes: if the target word corresponds to one or more source language words in the bilingual dictionary, setting the substitution weight of each replacement word to 1; if the target word has no corresponding source language word in the bilingual dictionary, determining the substitution weight based on the distance between the candidate word and the target word. The substitution weights carry the semantic replacement information from the conversion process into the category classification model, and this weighting can improve the accuracy of the classification model.
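The weighting rule can be sketched as follows. This is an illustration under stated assumptions: the function name `substitution_weights`, the shape of its inputs, and in particular the mapping `1 / (1 + distance)` are hypothetical, since the text only says that the weight is determined from the distance. The sketch also assumes that the distance-based rule applies when the dictionary has no entry for the target word.

```python
def substitution_weights(target, bilingual_dict, candidates):
    """Return a {replacement word: substitution weight} mapping.

    bilingual_dict: target-language word -> list of source-language words.
    candidates: (source-language word, distance to target) pairs, assumed
        to come from a separate candidate search step.
    """
    translations = bilingual_dict.get(target, [])
    if translations:
        # Dictionary hit: every replacement word gets weight 1.
        return {word: 1.0 for word in translations}
    # No dictionary entry: weight shrinks as distance grows.
    # The 1 / (1 + distance) form is an assumption for illustration.
    return {word: 1.0 / (1.0 + dist) for word, dist in candidates}


# Placeholder words stand in for real dictionary entries.
toy_dict = {"free": ["src_a", "src_b"]}
print(substitution_weights("free", toy_dict, []))
print(substitution_weights("oov", toy_dict, [("src_c", 0.0), ("src_d", 1.0)]))
```

Any monotonically decreasing function of distance would serve the same purpose; the choice affects how strongly distant candidates are discounted in the classification model.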
[0075] In one embodiment, searching for candidate words throu...


