Error word correction method and device, computer device and storage medium

A pinyin-based wrong-word correction technology, applied in the fields of wrong-word correction methods and devices, computer devices, and computer storage media. It addresses problems such as poor correction of speech recognition results, proprietary words being recognized as common words, and such errors being difficult to detect.

Active Publication Date: 2019-08-09
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Companies that develop products with speech recognition functions commonly use the speech recognition module of a general-purpose system. Because such a module is not tuned to the specific application scenario, it easily recognizes proprietary words as common words.
For example, "Who needs to be insured" is recognized as "Who needs to be Taobao". Since the result contains no obvious error, it is difficult for existing typo correction systems to detect such mistakes.
At present, there is no effective solution for improving the correction of speech recognition results in practical application scenarios.

Examples

Embodiment 1

[0063] Figure 1 is a flow chart of the wrong-word correction method provided by Embodiment 1 of the present invention. The method is applied to a computer device.

[0064] The wrong-word correction method of the present invention corrects sentences obtained by speech recognition. It addresses the problem that a general-purpose speech recognition system cannot accurately predict the proprietary words of a specific field, strengthens the error correction system's ability to find wrong words when proprietary words have been replaced with common words, and improves the user experience.

[0065] As shown in Figure 1, the wrong-word correction method comprises:

[0066] Step 101: acquire a general natural language data set that includes multiple sentences.

[0067] The general-purpose natural language dataset is a Chinese text contain...

Embodiment 2

[0111] Figure 2 is a structural diagram of the wrong-word correction device provided by Embodiment 2 of the present invention. The wrong-word correction device 20 is applied to a computer device. As shown in Figure 2, the device 20 may include a first acquisition module 201, a conversion module 202, a generation module 203, a pre-training module 204, a second acquisition module 205, a fine-tuning module 206, and an error correction module 207.

[0112] The first acquisition module 201 is configured to acquire a general natural language data set that includes a plurality of sentences.

[0113] The general natural language data set is Chinese text containing everyday expressions.
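The module layout above can be pictured as a simple pipeline. The following is a minimal sketch, assuming each module is a plain callable and that the modules run in numerical order (which matches Steps 101-107 of the method). The class name `WrongWordCorrectionDevice`, the field names, and the `run` method are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class WrongWordCorrectionDevice:
    """Hypothetical wiring of modules 201-207; each field is a plain callable."""

    acquire_general: Callable[[], Any]      # first acquisition module 201
    convert: Callable[[Any], Any]           # conversion module 202
    generate: Callable[[Any], Any]          # generation module 203
    pretrain: Callable[[Any], Any]          # pre-training module 204
    acquire_domain: Callable[[], Any]       # second acquisition module 205
    fine_tune: Callable[[Any, Any], Any]    # fine-tuning module 206
    correct: Callable[[Any, Any], Any]      # error correction module 207

    def run(self, pinyin_to_correct):
        # Each module consumes the output of the previous stage.
        sentences = self.acquire_general()
        pairs = self.convert(sentences)
        first_samples = self.generate(pairs)
        model = self.pretrain(first_samples)
        second_samples = self.acquire_domain()
        tuned = self.fine_tune(model, second_samples)
        return self.correct(tuned, pinyin_to_correct)
```

Keeping the modules as injected callables is only one possible design; it makes each stage replaceable and easy to stub in tests.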

[0114] The general natural language data set can be collected from data sources such as books, news, web pages (such as Baidu Encyclopedia, Wikipedia, etc.). For example, character recognition can be perf...

Embodiment 3

[0158] This embodiment provides a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the above wrong-word correction method embodiment are implemented, for example Steps 101-107 shown in Figure 1:

[0159] Step 101, obtaining a general natural language data set, the general natural language data set including a plurality of sentences;

[0160] Step 102, converting each sentence contained in the general natural language data set into a pinyin sequence to obtain the pinyin-sentence pairs of the general natural language data set;

[0161] Step 103, selecting a plurality of pinyin-sentence pairs from the pinyin-sentence pairs of the general natural language data set, replacing part of the pinyin of each selected pinyin-sentence pair with similar pinyin to obtain the replaced pinyin-sentence pairs, and forming a first sample set from the unselected pinyin-sentence pairs and the replaced pinyin-sentence...
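Steps 102 and 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the character-to-pinyin table `TOY_PINYIN`, the confusion table `SIMILAR_PINYIN`, the function names, and the `replace_ratio` parameter are all hypothetical, and the Chinese sentences are an assumed rendering of the "insured" versus "Taobao" example. A real system would use a complete pinyin dictionary and a much larger similar-pinyin table.

```python
import random

# Hypothetical toy lexicon (toneless pinyin); a real system needs a full dictionary.
TOY_PINYIN = {
    "谁": "shei", "需": "xu", "要": "yao",
    "投": "tou", "保": "bao", "淘": "tao", "宝": "bao",
}

# Hypothetical confusion table of similar-sounding syllables.
SIMILAR_PINYIN = {"tou": ["tao"], "tao": ["tou"]}


def to_pinyin(sentence):
    """Step 102: convert a sentence into a pinyin sequence, one syllable per character."""
    return [TOY_PINYIN[ch] for ch in sentence]


def replace_similar(pinyin, rng):
    """Step 103: replace one replaceable syllable with a similar-sounding one."""
    candidates = [i for i, syl in enumerate(pinyin) if syl in SIMILAR_PINYIN]
    if not candidates:
        return list(pinyin)  # nothing to replace, keep the sequence unchanged
    i = rng.choice(candidates)
    noisy = list(pinyin)
    noisy[i] = rng.choice(SIMILAR_PINYIN[noisy[i]])
    return noisy


def build_first_sample_set(sentences, replace_ratio, seed=0):
    """Combine unselected pairs with replaced pairs into the first sample set."""
    rng = random.Random(seed)
    pairs = [(to_pinyin(s), s) for s in sentences]
    k = round(len(pairs) * replace_ratio)
    selected = set(rng.sample(range(len(pairs)), k))
    return [
        (replace_similar(p, rng), s) if i in selected else (p, s)
        for i, (p, s) in enumerate(pairs)
    ]
```

Note that only the pinyin side of a selected pair is perturbed; the target sentence stays correct, so a model trained on these pairs learns to map noisy pinyin back to the intended text.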

Abstract

The invention provides an error word correction method and device, a computer device, and a storage medium. The method comprises: obtaining a general natural language data set; converting each sentence contained in the general natural language data set into a pinyin sequence to obtain pinyin-sentence pairs of the general natural language data set; performing pinyin replacement on part of the pinyin-sentence pairs of the general natural language data set to obtain a first sample set; pre-training a neural network model with the first sample set to obtain a pre-trained neural network model; acquiring pinyin-sentence pairs that contain similar pinyin and relate to a specific field as a second sample set; fine-tuning the pre-trained neural network model with the second sample set to obtain a fine-tuned neural network model; and inputting the pinyin sequence of the sentence to be corrected into the fine-tuned neural network model to obtain the corrected sentence. The invention can correct proprietary words that speech recognition has recognized as common words.
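The two-stage flow in the abstract (pre-train on the first sample set, fine-tune on the domain-specific second sample set, then correct) can be illustrated with a toy stand-in. The sketch below deliberately swaps the neural network for a count-based lookup table, so it shows only the data flow, not the patented model. The class `LookupCorrector`, the `weight` parameter, and the sample data are hypothetical; the Chinese sentences are an assumed rendering of the "insured" versus "Taobao" example.

```python
from collections import Counter, defaultdict


class LookupCorrector:
    """Count-based stand-in for the neural network model (an assumption)."""

    def __init__(self):
        # Maps a pinyin sequence to vote counts for candidate sentences.
        self.votes = defaultdict(Counter)

    def train(self, samples, weight=1):
        """Accumulate votes from (pinyin sequence, sentence) pairs."""
        for pinyin, sentence in samples:
            self.votes[tuple(pinyin)][sentence] += weight

    def correct(self, pinyin):
        """Return the highest-voted sentence, or None for an unseen sequence."""
        seen = self.votes.get(tuple(pinyin))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# Toy first sample set (general text) and second sample set (insurance domain).
general = [
    (["shei", "xu", "yao", "tao", "bao"], "谁需要淘宝"),
    (["shei", "xu", "yao", "tou", "bao"], "谁需要投保"),
]
domain = [(["shei", "xu", "yao", "tao", "bao"], "谁需要投保")]

query = ["shei", "xu", "yao", "tao", "bao"]

model = LookupCorrector()
model.train(general)             # pre-training stage
before = model.correct(query)    # general reading: "谁需要淘宝"
model.train(domain, weight=5)    # fine-tuning stage, weighted toward the domain
after = model.correct(query)     # domain reading: "谁需要投保"
```

The higher `weight` during the second call mimics how fine-tuning shifts a general model toward domain-specific readings of similar-sounding pinyin.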

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a wrong-word correction method and device, a computer device, and a computer storage medium.

Background

[0002] With the rapid expansion of speech recognition application scenarios, speech recognition technology is becoming more mature, and market demand for high-accuracy speech recognition keeps growing. Companies that develop products with speech recognition functions commonly use the speech recognition module of a general-purpose system. Because such a module is not tuned to the specific application scenario, it easily recognizes proprietary words as common words. For example, "who needs to be insured" is recognized as "who needs to be Taobao". Since the result contains no obvious mistake, it is difficult for existing typo correction systems to find such errors.

[0003] At present, there is no effective solution ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F17/27, G06N3/04, G10L25/30, G10L15/08
CPC: G06F16/3343, G06F16/3344, G10L25/30, G10L15/08, G10L2015/088, G06F40/232, G06F40/279, G06N3/045
Inventor: 解笑, 徐国强, 邱寒
Owner: PING AN TECH (SHENZHEN) CO LTD