Chinese error correction method based on a pinyin encoding and decoding model

An error correction method built on an encoding and decoding model, applied in the field of natural language processing. It addresses the complexity of existing correction processes and aims to improve accuracy, strengthen long-distance information extraction, and improve the relevance of context.

Active Publication Date: 2019-03-19
SHANDONG IND TECH RES INST OF ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

The prior-art patent requires two separate judgment steps, which makes the process complicated.


Examples


Embodiment

[0039] Referring to Figure 1 and Figure 2, the Chinese error correction method based on a pinyin encoding and decoding model of this embodiment comprises the following steps:

[0040] S100: Preprocess the Chinese text dataset.

[0041] The invention preprocesses the text training set by controlling its distribution, so that the model fits the real environment more closely during error correction.

[0042] S101: Count the frequency of each sentence in the original dataset, and sort the sentences by frequency from high to low;

[0043] S102: Control the maximum sentence frequency of the dataset, and use a natural exponential function to adjust the frequency of the sentences in the dataset;

[0044] S103: Convert the Chinese text sequences in the dataset into their corresponding pinyin sequences one by one. The labels of the corresponding Chinese text sequence are: "where", "home", "medicine", "hospital";
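Steps S101 to S103 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the excerpt does not give the exact exponential formula, so `rescale_frequency` uses one plausible saturating form; the `PINYIN` table is a hypothetical four-entry lookup standing in for a full lexicon; and the characters 哪, 家, 医, 院 are a guessed back-translation of the labels "where", "home", "medicine", "hospital".

```python
import math
from collections import Counter

# Hypothetical character-to-pinyin table; a real system would need a full lexicon.
# The characters are an assumed back-translation of the labels in [0044].
PINYIN = {"哪": "na", "家": "jia", "医": "yi", "院": "yuan"}


def rank_by_frequency(sentences):
    """S101: count how often each sentence occurs, sorted from high to low."""
    return Counter(sentences).most_common()


def rescale_frequency(count, max_freq):
    """S102 (assumed form): saturate a raw count below max_freq.

    max_freq * (1 - e^(-count / max_freq)) grows with count but never
    exceeds max_freq; every sentence is kept at least once.
    """
    return max(1, round(max_freq * (1.0 - math.exp(-count / max_freq))))


def to_pinyin(sentence):
    """S103: pair each character's pinyin (model input) with the character (label)."""
    return [(PINYIN[ch], ch) for ch in sentence]


def preprocess(sentences, max_freq=3):
    """Run S101-S103 and return the rebalanced list of (pinyin, label) sequences."""
    dataset = []
    for sentence, count in rank_by_frequency(sentences):
        pairs = to_pinyin(sentence)
        dataset.extend([pairs] * rescale_frequency(count, max_freq))
    return dataset
```

With `max_freq=3`, a sentence that occurs ten times in the raw data is kept three times, while a sentence that occurs once is kept once, so very frequent sentences no longer dominate the training distribution.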

[0...


Abstract

The invention discloses a Chinese error correction method based on a pinyin encoding and decoding model, belonging to the field of natural language processing. The method comprises the following steps: converting Chinese characters into pinyin sequences; vectorizing the pinyin sequences and inputting them into an encoding model, which encodes them; and decoding the current target Chinese text sequence in both the forward and reverse directions with a decoding model that uses an attention mechanism. After the forward-decoded and reverse-decoded Chinese sequences are combined by probability-weighted addition, the method judges whether the probability corresponding to each target sequence character is greater than a threshold: if so, the predicted character is used; if not, the original Chinese sequence character is used. The resulting target Chinese sequence is the corrected Chinese sequence. The deep learning model, structured as an encoding model with forward and reverse decoding models, extracts effective text information features during encoding and decoding, which improves the correlation of context in the Chinese error correction task and improves the accuracy of the model.
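The final fusion and threshold step described in the abstract might look like the sketch below. It assumes the forward and reverse decoders have already produced, for each position, a dictionary of candidate characters and probabilities aligned left to right. The function name `fuse_and_correct`, the `weight` parameter, and the default `threshold` are illustrative choices, not values taken from the patent, and the threshold is applied here to the top fused candidate, which is one reading of the abstract.

```python
def fuse_and_correct(original, forward_probs, backward_probs,
                     weight=0.5, threshold=0.6):
    """Fuse forward and reverse decoder outputs, then apply the threshold rule.

    original: the input Chinese characters, one per position.
    forward_probs, backward_probs: per-position dicts mapping a candidate
        character to its probability, aligned to the same positions.
    weight: share given to the forward decoder (assumed hyperparameter).
    threshold: minimum fused probability needed to replace a character.
    """
    corrected = []
    for orig_char, fwd, bwd in zip(original, forward_probs, backward_probs):
        candidates = set(fwd) | set(bwd)
        # Probability-weighted addition of the two decoding directions.
        fused = {
            c: weight * fwd.get(c, 0.0) + (1.0 - weight) * bwd.get(c, 0.0)
            for c in candidates
        }
        best = max(fused, key=fused.get)
        # Use the prediction only when it clears the threshold;
        # otherwise keep the original character.
        corrected.append(best if fused[best] > threshold else orig_char)
    return "".join(corrected)
```

A higher threshold makes the corrector more conservative: low-confidence predictions fall back to the original character instead of overwriting it.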

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a Chinese error correction method based on a pinyin encoding and decoding model.

Background technique

[0002] With the great progress of deep learning in fields such as image and speech recognition, methods based on deep learning are also widely used in natural language processing. Compared with traditional algorithms, computer systems based on deep learning currently achieve remarkable results in natural language tasks such as named entity recognition, machine translation, and aspect extraction.

[0003] Chinese text error correction is an important research direction in natural language processing and has received extensive attention in the computer field in recent years. Its task is to correct the errors caused by human factors in Chinese text according to it...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27
CPC: G06F40/126, G06F40/211, G06F40/232, G06F40/30
Inventor: 吴健, 胡汉一, 王文哲, 陆逸飞, 吴福理
Owner: SHANDONG IND TECH RES INST OF ZHEJIANG UNIV