Cross-language non-standard word recognition method and device

A non-standard word recognition technology, applied in the field of text processing, that addresses the complex and diverse forms of non-standard words, their ambiguity, and the high cost of processing them. The method is easy to extend, reduces dependence on expert resources, and lowers costs.

Active Publication Date: 2020-07-14
北京海天瑞声科技股份有限公司
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

Non-standard words take complex and diverse forms that are difficult to generalize. They are also often ambiguous and must be interpreted with the help of context. As a result, processing non-standard words is often expensive.

Method used

Word vectors and a bilingual dictionary are used to determine a substitute word vector and a substitute weight in the source language for each target word, so that the source-language non-standard word recognition model can determine the non-standard word category in the target language.


Examples


example 2

[0055] Example 2: La Chaux du Milieu has an area, as of 2009, of 17.3 square kilometers (6.7 sq mi).

[0056] In example 1 and example 2, the results after word segmentation are result 1 and result 2 respectively.

[0057] Result 1: On March 26, 2013, in "Morning News", male CCTV reporters who appeared on the screen asked three questions in English.

[0058] Result 2: La Chaux du Milieu has an area , as of 2009 , of 17.3 square kilometers ( 6.7 sq mi ) .

[0059] A word segmentation method suited to the characteristics of each language is selected. This ensures the accuracy of word segmentation, which in turn improves the accuracy of the subsequent non-standard word recognition.
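The English segmentation shown in Result 2 can be approximated with a small rule-based tokenizer. The following is a minimal sketch, not the segmenter used in the disclosure: the function name `segment_english` and the regular expression are assumptions chosen so that punctuation is split off while decimal numbers and hyphenated digit groups stay in one token.

```python
import re

# Hypothetical pattern (an assumption, not taken from the disclosure):
# 1. digit groups joined by "." or "-" stay together (17.3, 1-505-998-3793)
# 2. any other run of word characters forms one token
# 3. every remaining non-space symbol becomes its own token
TOKEN_PATTERN = re.compile(r"\d+(?:[.\-]\d+)*|\w+|[^\w\s]")


def segment_english(text):
    """Split an English sentence into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = ("La Chaux du Milieu has an area, as of 2009, "
            "of 17.3 square kilometers (6.7 sq mi).")
print(" ".join(segment_english(sentence)))
```

Applied to Example 2, the space-joined tokens match Result 2. A language without whitespace word boundaries would need a different segmenter, which is the point of paragraph [0059].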

[0060] Step S13, preprocessing the non-standard words in the target language corpus after the word segmentation.

[0061] In an embodiment of the present disclosure, preprocessing the non-standard words in the target language corpus after word segmentation includes: converting the numbers in the non-standard ...

example 3-1

[0069] Example 3-1: International Callers Call: 1-505-998-3793, totally free.

[0070] After preprocessing:

example 3-2

[0071] Example 3-2: International Callers Call : D_1-D_3-D_3-D_4 , totally free .
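The transformation from Example 3-1 to Example 3-2 can be sketched as follows. The rule is inferred from this single example only: each run of digits appears to be replaced by `D_` followed by the number of digits in the run. The function name `mask_digit_runs` is hypothetical, and the full preprocessing rules of the disclosure are truncated above, so this is an illustration rather than the complete step.

```python
import re


def mask_digit_runs(token):
    """Replace each run of digits with D_<run length> (rule inferred from Example 3-2)."""
    return re.sub(r"\d+", lambda match: f"D_{len(match.group())}", token)


# Tokens of Example 3-1 after word segmentation.
tokens = ["International", "Callers", "Call", ":",
          "1-505-998-3793", ",", "totally", "free", "."]
print(" ".join(mask_digit_runs(token) for token in tokens))
```

Masking the digits this way keeps the shape of a number (how many groups, how long each group is) while discarding the specific digits, so different phone numbers map to the same token.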

[0072] The current English bilingual dictionary is searched for the entries corresponding to the target word, shown below, and the substitute word and its substitute word vector in the source language are determined from them.

[0073]

[0074] In one embodiment, setting the substitute weight based on the target word includes: if the target word corresponds to one or more source language words in the bilingual dictionary, the substitute weight of each substitute word is set to 1; if the target word has no corresponding source language word, the substitute weight is determined based on the distance between the candidate word and the target word. Through the substitute weights, the semantic substitution information from the conversion process is passed to the category classification model, and the weighting improves the accuracy of the classification model.
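The weighting rule in paragraph [0074] can be illustrated with a short sketch. Several details are assumptions, because the disclosure text here does not specify them: cosine similarity stands in for the distance-based weight, candidate words are assumed to be target-language words that do have dictionary entries, and the names `substitute_weights`, `bilingual_dict`, and `vectors` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def substitute_weights(target, bilingual_dict, vectors, candidates):
    """Map each source-language substitute word to its weight.

    Dictionary hits get weight 1. Otherwise the weight is derived from the
    candidate-to-target similarity (an assumed stand-in for "distance").
    """
    entries = bilingual_dict.get(target)
    if entries:
        return {source_word: 1.0 for source_word in entries}
    weights = {}
    for candidate in candidates:
        similarity = cosine_similarity(vectors[target], vectors[candidate])
        for source_word in bilingual_dict.get(candidate, []):
            # Keep the strongest evidence if several candidates share a translation.
            weights[source_word] = max(weights.get(source_word, 0.0), similarity)
    return weights


# Toy data: placeholder words and vectors, not taken from the disclosure.
bilingual_dict = {"call": ["src_call"], "phone": ["src_phone"]}
vectors = {"dial": [1.0, 0.0], "call": [0.6, 0.8], "phone": [0.0, 1.0]}
print(substitute_weights("call", bilingual_dict, vectors, []))
print(substitute_weights("dial", bilingual_dict, vectors, ["call", "phone"]))
```

In this toy setup, "call" is in the dictionary and so receives weight 1, while "dial" is not and inherits lower weights from its candidates in proportion to how close they are in the vector space.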

[0075] In one embodiment, searching for candidate words throu...

Abstract

The disclosure relates to a cross-language non-standard word recognition method and device, electronic equipment, and a computer-readable storage medium. The cross-language non-standard word recognition method includes: obtaining a target language corpus; performing word segmentation on the target language corpus; preprocessing the non-standard words in the segmented target language corpus; determining, for a target word in the target language corpus, a substitute word vector and a substitute weight in the source language; and, using the substitute word vector and the substitute weight, determining the non-standard word category in the target language with the non-standard word recognition model of the source language. By using the semantic information carried by word vectors and bilingual dictionaries, the source-language non-standard word category recognition model is transferred to the target language, where it identifies non-standard word categories. This avoids the problem of scarce target-language corpora and gives the method good portability.
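The final step of the abstract, feeding substitute word vectors and weights to the source-language model, can be sketched as below. How the disclosure actually combines vectors and weights is not stated in this excerpt, so the weighted average is an assumption, and `weighted_substitute_vector`, `classify_target_word`, and the toy `source_model` with its category labels are hypothetical.

```python
def weighted_substitute_vector(substitutes):
    """Weighted average of (vector, weight) pairs in the source-language space."""
    total = sum(weight for _, weight in substitutes)
    if total == 0:
        raise ValueError("at least one substitute needs a positive weight")
    dimension = len(substitutes[0][0])
    return [
        sum(vector[i] * weight for vector, weight in substitutes) / total
        for i in range(dimension)
    ]


def classify_target_word(substitutes, source_model):
    """Classify a target-language word with a model trained on the source language."""
    return source_model(weighted_substitute_vector(substitutes))


def source_model(vector):
    """Toy stand-in for the source-language model; the labels are invented."""
    return "CATEGORY_A" if vector[0] > vector[1] else "CATEGORY_B"


substitutes = [([1.0, 0.0], 3.0), ([0.0, 1.0], 1.0)]
print(weighted_substitute_vector(substitutes))
print(classify_target_word(substitutes, source_model))
```

Because the classifier only ever sees vectors in the source-language space, no labeled target-language corpus is needed, which is the portability argument the abstract makes.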

Description

Technical field

[0001] The present disclosure relates to the field of text processing, and in particular to a cross-language non-standard word recognition method and device, electronic equipment, and a computer-readable storage medium.

Background

[0002] In a text, words composed of characters of the language and conforming to its orthography are called standard words. Besides the characters and punctuation marks of the language, a text contains many other symbols, such as Arabic numerals (0-9), currency symbols (such as ¥, $, €), mathematical symbols (such as ≥, +), and physical symbols (such as km, kg, ℃). These symbols or words cannot be found in commonly used dictionaries, their pronunciation cannot be derived from normal pronunciation rules, and their meaning and pronunciation often vary with context. Such words are called non-standard words. The following are examples of non-standa...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06F40/247
Inventor: 闫启伟, 郝玉峰, 黄宇凯, 曹琼, 李科, 宋琼
Owner: 北京海天瑞声科技股份有限公司