
An error correction method and system based on a language model and word features

An error correction technology based on a language model, applied in natural language data processing, special data processing applications, and related fields. It addresses problems such as over-reliance on the quality of word segmentation.

Pending Publication Date: 2018-12-07
ZHONGAN INFORMATION TECH SERVICES CO LTD

AI Technical Summary

Problems solved by technology

Traditional error correction segments the sentence into words first, so it relies too heavily on the quality of the word segmentation.



Examples


Embodiment 1

[0052] Figure 1 shows a schematic flow chart of a preferred embodiment of the error correction method based on a language model and word features according to the present invention. The method includes the following steps:

[0053] S1: Obtain the first sentence and input it into the wrong word detection system, which uses the language model to examine the first sentence and returns the suspect word;

[0054] S2: Input the obtained suspect words into the candidate word recommendation system, and combine at least two different similarity algorithms to select candidate words and output them;

[0055] S3: Replace the suspect word in the first sentence with the candidate word obtained in S2 to form a second sentence, perform sentence scoring on the first sentence and the second sentence, and select a sentence with a higher score for output.

[0056] The above is a basic implementation manner of the technical solution. In this technical solution, the inventor uses at least two differen...
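Steps S1 to S3 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the add-one-smoothed bigram model, the toy English corpus, the rule "the token with the lowest bigram probability is the suspect word", and all function names are assumptions made for the example. The candidate recommender of S2 is passed in as a function.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); a real system would use a large corpus.
CORPUS = [
    "i want to buy insurance",
    "i want to buy a car",
    "we want to buy insurance",
]

unigrams, bigrams = Counter(), Counter()
for line in CORPUS:
    tokens = ["<s>"] + line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))
VOCAB_SIZE = len(unigrams)


def bigram_logprob(prev, word):
    """Add-one-smoothed log P(word | prev); the smoothing choice is an assumption."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE))


def score_sentence(tokens):
    """Sentence score = sum of bigram log-probabilities (higher is better)."""
    padded = ["<s>"] + tokens
    return sum(bigram_logprob(p, w) for p, w in zip(padded, padded[1:]))


def detect_suspect(tokens):
    """S1 (assumed rule): index of the token the model finds least likely."""
    padded = ["<s>"] + tokens
    scores = [bigram_logprob(p, w) for p, w in zip(padded, padded[1:])]
    return min(range(len(tokens)), key=scores.__getitem__)


def correct(tokens, recommend):
    """S2 + S3: try each candidate in place of the suspect word, keep the best-scoring sentence."""
    i = detect_suspect(tokens)
    best = tokens
    for candidate in recommend(tokens[i]):
        second = tokens[:i] + [candidate] + tokens[i + 1:]
        if score_sentence(second) > score_sentence(best):
            best = second
    return best
```

For example, `correct("i want to buy insurence".split(), lambda w: ["insurance", "car"])` flags the last token as the suspect word and keeps whichever replacement sentence the toy model scores highest.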

Embodiment 2

[0089] This embodiment is another preferred implementation that builds on the basic implementation of Embodiment 1 above. It differs from Embodiment 1 in that step S2 specifically includes:

[0090] S21: Obtain the suspect word and check whether the dictionary in the candidate word recommendation system contains a word identical to the suspect word. If yes, return the suspect word; if not, go to S22;

[0091] S22: Use at least two different similarity algorithms to compute and match candidate words similar to the suspect word; each algorithm yields one or more candidate words for output.

[0092] It should be noted that, in this step, if a word identical to the suspect word is found in the dictionary of the candidate word recommendation system during the S21 lookup, the suspect word is correct, and the original suspect word re...
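The dictionary lookup of S21 and the multi-algorithm matching of S22 might look like the sketch below. The text does not name the similarity algorithms, so the two used here (the `difflib.SequenceMatcher` ratio and a character-set Jaccard similarity), the toy dictionary, and the `top_k` parameter are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary held by the candidate word recommendation system.
DICTIONARY = {"insurance", "instance", "assurance", "policy"}


def ratio_similarity(a, b):
    """Similarity based on matching subsequences (standard-library difflib)."""
    return SequenceMatcher(None, a, b).ratio()


def jaccard_similarity(a, b):
    """Similarity based on the overlap of the two character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# "At least two different similarity algorithms"; these two are assumed.
ALGORITHMS = (ratio_similarity, jaccard_similarity)


def recommend(suspect, top_k=1):
    # S21: a dictionary hit means the suspect word is returned unchanged.
    if suspect in DICTIONARY:
        return [suspect]
    # S22: each algorithm contributes its own top_k candidates.
    candidates = []
    for similarity in ALGORITHMS:
        ranked = sorted(DICTIONARY, key=lambda w: (-similarity(suspect, w), w))
        for word in ranked[:top_k]:
            if word not in candidates:
                candidates.append(word)
    return candidates
```

Letting every algorithm contribute candidates independently means a word missed by one measure can still be proposed by another; the final choice is left to the sentence scoring in S3.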

Embodiment 3

[0106] This embodiment is another preferred implementation that builds on the basic implementation of Embodiment 1 above. It differs from Embodiment 1 in that step S3 also includes:

[0107] The difference between the scores of the first sentence and the second sentence is used as the error correction confidence value. If the error correction confidence value is greater than the second threshold, the sentence with the higher score is selected for output; if the error correction confidence value is less than the second threshold, the first sentence is output.
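A minimal sketch of this confidence check follows. The function name, the use of the absolute score difference, and the handling of a confidence exactly equal to the threshold (treated here like the "less than" case) are assumptions; the text does not specify them.

```python
def select_output(first, second, score, threshold):
    """Return the sentence to output, given a scoring function and the second threshold."""
    s1, s2 = score(first), score(second)
    confidence = abs(s1 - s2)  # error correction confidence value (assumed absolute)
    if confidence > threshold:
        # Confident enough: output whichever sentence scores higher.
        return first if s1 >= s2 else second
    # Not confident enough: keep the original first sentence.
    return first


# Usage with hypothetical log-probability scores.
scores = {"original": -10.0, "corrected": -4.0}
print(select_output("original", "corrected", scores.get, threshold=2.0))
print(select_output("original", "corrected", scores.get, threshold=8.0))
```

With these scores the confidence value is 6.0, so the corrected sentence is output for a threshold of 2.0 and the original sentence is kept for a threshold of 8.0.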

[0108] Since the language model cannot score every Chinese word collocation perfectly, the wrong word detection method above produces some false positives, that is, some originally correct word collocations are reported as wrong words, so we can judge whether the n-gram is a ...



Abstract

The invention discloses an error correction method based on a language model and word features. The method comprises the following steps: S1, acquiring a first sentence and inputting it into a wrong word detection system, which uses the language model to examine the first sentence and returns a suspect word; S2, inputting the obtained suspect word into a candidate word recommendation system, which selects candidate words using at least two different similarity algorithms and outputs them; S3, replacing the suspect word in the first sentence with a candidate word obtained in S2 to form a second sentence, scoring the first sentence and the second sentence respectively, and selecting the sentence with the higher score for output. The system comprises a detection module, a recommendation module and a scoring module. This technical scheme can improve error correction accuracy.

Description

Technical field

[0001] The present invention relates to the technical field of language processing, in particular to an error correction method based on a language model and word features, and further to a system that applies the method.

Background art

[0002] The technical architecture of a traditional error correction system can be realized in many ways. The most common approach first segments the erroneous sentence into words, then compares each resulting word with the words in a standard vocabulary. Any unregistered word is regarded as a potential wrong word and is corrected in one of several ways.

[0003] For example, Chinese invention patent 201611233791.8 discloses an error correction method and device for an input sentence, which includes: constructing and training a language model based on a training corpus; obtaining the error judgment threshold of the language model, which indicates the criticality of the input senten...


Application Information

IPC(8): G06F17/27
CPC: G06F40/232, G06F40/205, G06F40/289
Inventor: 雷画雨, 周笑添, 倪博溢
Owner ZHONGAN INFORMATION TECH SERVICES CO LTD