Entity-based lexical inspection method and device, computer equipment and storage medium

A lexical checking technique based on substantive words, applied to entity-based lexical checking methods and devices, computer equipment and storage media. It addresses problems such as grammar detection algorithms being limited to finding homophone errors, frequent misjudgments, and a reduced error-correction success rate, and it achieves wide coverage.

Pending Publication Date: 2020-09-11
SHENZHEN GIISO INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0002] In current technology, lexical detection usually relies on edit distance, language models, and dependency syntax to check for homophone typos, so it is limited by the quality of the language model and the homophone lexicon. Current grammar detection algorithms are limited to finding homophone errors and produce a large number of misjudgments.
Because current grammar detection mainly recognizes single words, it also causes problems when two consecutive words need correction. When edit distance is used to replace words, correction is often performed in word order, which greatly reduces the error-correction success rate.

Examples

Embodiment 1

[0047] As shown in the method flowchart of Figure 1, Embodiment 1 of the present invention is an entity-based lexical checking method comprising the following steps:

[0048] Step S1: perform word segmentation and character segmentation on the text to be processed;

[0049] Step S2: calculate the word-level N-Gram score of three adjacent words and the character-level N-Gram score of three adjacent characters;

[0050] Step S3: calculate the mean absolute deviation of the word-level and character-level N-Gram scores, preliminarily identify words whose value exceeds the threshold as erroneous words, and create an erroneous-word set;

[0051] Step S4: count the erroneous words and create a candidate set, then substitute each candidate word in the candidate set into the original text in order, replacing the previously identified erroneous word;

[0052] Step S5: perform word segmentation and character segmentation on the combined new text, use word-level N-Gram to calcul...
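Steps S2 and S3 can be illustrated with a minimal Python sketch. The excerpt does not specify the language model, the smoothing, or how the deviation is compared with the threshold, so everything here is an assumption made for illustration: an add-one smoothed trigram model, a per-token score equal to the average log-probability of the three trigram windows covering the token, and a rule that flags tokens whose score lies below the mean by more than `threshold` times the mean absolute deviation. The function names (`train`, `token_scores`, `flag_suspects`) are hypothetical.

```python
import math
from collections import Counter


def _trigrams(tokens):
    """Return all trigram windows over the tokens, padded on both sides."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>", "</s>"]
    return [tuple(padded[k:k + 3]) for k in range(len(padded) - 2)]


def train(corpus):
    """Count trigrams and their two-token contexts in tokenized sentences."""
    tri, ctx, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for tokens in corpus:
        vocab.update(tokens)
        for gram in _trigrams(tokens):
            tri[gram] += 1
            ctx[gram[:2]] += 1
    return tri, ctx, len(vocab)


def token_scores(tokens, model):
    """Score each token as the mean add-one smoothed log-probability
    of the three trigram windows that contain it (assumed scoring rule)."""
    tri, ctx, vocab_size = model
    logp = [math.log((tri[g] + 1) / (ctx[g[:2]] + vocab_size))
            for g in _trigrams(tokens)]
    return [sum(logp[i:i + 3]) / 3 for i in range(len(tokens))]


def flag_suspects(scores, threshold=1.5):
    """Return indices whose score is below the mean by more than
    threshold * mean absolute deviation (assumed decision rule)."""
    mean = sum(scores) / len(scores)
    mad = sum(abs(s - mean) for s in scores) / len(scores)
    if mad == 0:
        return []
    return [i for i, s in enumerate(scores)
            if s < mean and abs(s - mean) / mad > threshold]


if __name__ == "__main__":
    # Toy corpus; a real system would train on a large segmented corpus.
    model = train([["the", "cat", "sat"], ["the", "cat", "ran"]])
    scores = token_scores(["the", "cot", "sat"], model)
    print(scores, flag_suspects(scores))
```

The same functions accept either a list of words or a list of characters, so they can be run once per level. How the method combines the two levels into a single decision is not stated in this excerpt.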

Abstract

The invention discloses an entity-based lexical checking method and device, computer equipment and a storage medium. The method comprises: calculating the word-level N-Gram score of three adjacent words and the character-level N-Gram score of three adjacent characters; calculating the mean absolute deviation of the word-level and character-level N-Gram scores, preliminarily identifying words whose value exceeds a threshold as erroneous words, and creating an erroneous-word set; counting the erroneous words and creating a candidate set, then substituting each candidate word in the candidate set into the original text in order, replacing the previously identified erroneous word; performing word segmentation and character segmentation on the combined new text, calculating the word-level sentence perplexity with a word-level N-Gram and the character-level sentence perplexity with a character-level N-Gram, and calculating the average perplexity of the sentence; and, after calculating the perplexity for each candidate word in the candidate set, comparing it with the perplexity of the original sentence and selecting the candidate word with the lowest perplexity as the optimal candidate.
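The candidate-selection stage described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the trigram model uses add-one smoothing, the sentence score is taken as the plain average of word-level and character-level perplexity (the abstract does not say how the two are combined), and the names `train`, `perplexity`, `sentence_perplexity` and `best_candidate` are hypothetical.

```python
import math
from collections import Counter


def _trigrams(tokens):
    """Trigram windows that predict each token and the end marker."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    return [tuple(padded[k:k + 3]) for k in range(len(padded) - 2)]


def train(corpus):
    """Count trigrams and their two-token contexts in tokenized sentences."""
    tri, ctx, vocab = Counter(), Counter(), {"</s>"}
    for tokens in corpus:
        vocab.update(tokens)
        for gram in _trigrams(tokens):
            tri[gram] += 1
            ctx[gram[:2]] += 1
    return tri, ctx, len(vocab)


def perplexity(tokens, model):
    """Perplexity of a token sequence under an add-one smoothed trigram model."""
    tri, ctx, vocab_size = model
    grams = _trigrams(tokens)
    log_sum = sum(math.log((tri[g] + 1) / (ctx[g[:2]] + vocab_size))
                  for g in grams)
    return math.exp(-log_sum / len(grams))


def sentence_perplexity(words, word_model, char_model):
    """Average of word-level and character-level perplexity (assumed combination)."""
    chars = list("".join(words))
    return (perplexity(words, word_model) + perplexity(chars, char_model)) / 2


def best_candidate(words, index, candidates, word_model, char_model):
    """Substitute each candidate at `index` in order and keep the one with the
    lowest perplexity; the original word stays if no candidate improves on it."""
    best_word = words[index]
    best_ppl = sentence_perplexity(words, word_model, char_model)
    for cand in candidates:
        trial = words[:index] + [cand] + words[index + 1:]
        ppl = sentence_perplexity(trial, word_model, char_model)
        if ppl < best_ppl:
            best_word, best_ppl = cand, ppl
    return best_word, best_ppl


if __name__ == "__main__":
    # Toy corpus; a real system would train on a large segmented corpus.
    corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "cat", "sat"]]
    word_model = train(corpus)
    char_model = train([list("".join(s)) for s in corpus])
    print(best_candidate(["the", "cot", "sat"], 1, ["cut", "cat"],
                         word_model, char_model))
```

Seeding the search with the original sentence's perplexity means a replacement is only accepted when it actually lowers the score, which mirrors the comparison against the original sentence described in the abstract.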

Description

Technical field

[0001] The invention relates to the technical field of statistical natural language processing, in particular to an entity-based lexical checking method and device, computer equipment and storage media.

Background art

[0002] In current technology, lexical detection usually relies on edit distance, language models, and dependency syntax to check for homophone typos, so it is limited by the quality of the language model and the homophone lexicon. Current grammar detection algorithms are limited to finding homophone errors and produce a large number of misjudgments. Because current grammar detection mainly recognizes single words, it also causes problems when two consecutive words need correction. When edit distance is used to replace words, correction is often performed in word order, which greatly reduces the error-correction success rate.

[0003] The information disclosed in this Background ...

Application Information

IPC(8): G06F40/253, G06F40/289, G06F40/295, G06N3/04
CPC: G06F40/253, G06F40/289, G06F40/295, G06N3/044, G06N3/045
Inventor: 李勇斌, 郑海涛, 冯勤宇, 赵从志, 卢炳干
Owner SHENZHEN GIISO INFORMATION TECH