Text detection and correction method based on Pinyin similarity and language model

A technique based on pinyin similarity and a language model, applied in natural language data processing, semantic analysis, and character and pattern recognition. It addresses problems of existing text correction methods, such as dependencies, low correction accuracy, and the failure to fully consider sentence semantics and context, with the aim of avoiding semantic and contextual errors and improving accuracy.

Active Publication Date: 2021-01-15
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a text detection and correction method based on pinyin similarity and language model to solve the problem that the existing text correction met...

Examples

Embodiment Construction

[0059] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0060] The embodiment of the present invention discloses a text detection and correction method based on pinyin similarity and a language model. The method is applied to correcting instruction texts in professional fields. Because terms in professional fields are usually specific, the method can effectively detect professional words while taking the semantic information of the instruction statement into account.

[0061] As shown in Figure 1, the present embodiment provides a text detection and correction method based on pinyin similarity and a language model, including the following steps:

[0062] Step 1, collect a large number of correct instruction text sentences as training sentences;

[0063] Step 2, select the word of professional fi...

Abstract

The invention discloses a text detection and correction method based on pinyin similarity and a language model. The method comprises the following steps: collecting a large number of correct instruction text statements to serve as training statements; selecting professional-field words from the training statements and constructing a custom dictionary; segmenting the training statements into words using the HanLP language processing toolkit and the custom dictionary; counting how often each word and each word combination in the segmentation result occurs across all training statements, and constructing a Bi-Gram language model; converting the statement to be corrected into its corresponding pinyin, and converting the words of the custom dictionary into their corresponding dictionary pinyin; and correcting the statement according to the pinyin similarity between its pinyin and the dictionary pinyin, combined with the sentence rationality of the statement, to obtain a corrected statement. Through word-level pinyin similarity calculation and sentence rationality analysis, the semantic information and context of sentences are considered, wrong words in the sentences can be detected, and the correction accuracy is improved.
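The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the visible text does not state how pinyin similarity or sentence rationality is computed, so the sketch assumes a character-level similarity ratio from `difflib` and an add-one-smoothed bigram probability. The training sentences, the dictionary, the `PINYIN` lookup table, the `threshold` value, and all function names are hypothetical; word segmentation (done with HanLP in the source) and pinyin conversion are replaced by hand-written toy data.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical, already segmented training sentences. The source performs this
# step with HanLP and a custom dictionary; segmentation is skipped here.
TRAIN = [
    ["起飞", "跑道", "三"],
    ["降落", "跑道", "三"],
    ["起飞", "跑道", "五"],
]

# Hypothetical custom dictionary of professional words with hand-written pinyin.
DICT_PINYIN = {"起飞": "qifei", "跑道": "paodao", "降落": "jiangluo"}

# Hypothetical pinyin lookup for tokens that may appear in input sentences.
PINYIN = {"泡到": "paodao", "三": "san", "五": "wu"}


def build_bigram(sentences):
    """Count words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_score(tokens, unigrams, bigrams):
    """Bigram probability of a sentence with add-one smoothing (an assumption)."""
    padded = ["<s>"] + tokens + ["</s>"]
    vocab_size = len(unigrams)
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return prob


def pinyin_similarity(a, b):
    """Similarity in [0, 1] between two pinyin strings (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def correct(tokens, unigrams, bigrams, threshold=0.8):
    """Replace non-dictionary tokens by similar-sounding dictionary words
    when the replacement makes the sentence more probable under the model."""
    result = list(tokens)
    for i, token in enumerate(tokens):
        if token in DICT_PINYIN or token not in PINYIN:
            continue  # already a dictionary word, or pinyin unknown
        best_word = token
        best_score = sentence_score(result, unigrams, bigrams)
        for word, word_pinyin in DICT_PINYIN.items():
            if pinyin_similarity(PINYIN[token], word_pinyin) < threshold:
                continue
            candidate = result[:i] + [word] + result[i + 1:]
            score = sentence_score(candidate, unigrams, bigrams)
            if score > best_score:
                best_word, best_score = word, score
        result[i] = best_word
    return result


UNIGRAMS, BIGRAMS = build_bigram(TRAIN)
print(correct(["起飞", "泡到", "三"], UNIGRAMS, BIGRAMS))
```

With this toy data, the misrecognized token 泡到 shares its pinyin with the dictionary word 跑道, and the replacement raises the bigram score of the sentence, so it is substituted. Gating candidates on pinyin similarity first and then ranking them by sentence score mirrors the combination of the two signals described above; the specific threshold and smoothing are illustrative choices only.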

Description

Technical field

[0001] The invention relates to the technical field of text detection, in particular to a text detection and correction method based on pinyin similarity and a language model.

Background

[0002] When open-domain speech recognition is applied directly to a professional field, noise, user accents, and missing professional vocabulary introduce errors into the recognized text, which reduces the analyzability of the text. Chinese correction technology is an important technology for automatically checking and correcting Chinese sentences. Its purpose is to improve the correctness of the language and reduce the cost of manual verification. Most existing research on text correction targets normative open-domain texts such as newspapers, books, and periodicals. In specific domains, general-domain speech recognition engines have a low recognition rate for domain-specific sentences, and...

Application Information

IPC(8): G06F40/194, G06F40/211, G06F40/242, G06F40/284, G06F40/30, G06K9/62
CPC: G06F40/194, G06F40/211, G06F40/242, G06F40/284, G06F40/30, G06F18/22, Y02D10/00
Inventor: 韩竞, 李晓冬, 梁木, 吴蔚, 王鑫鹏
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP