Chinese text error detection method and system based on word order and semantic conjoint analysis

A joint-analysis text technology applied in semantic analysis, natural language data processing, instruments, etc. It addresses problems such as unsatisfactory weight distribution and poor error-detection performance, and achieves anti-interference ability, an increased number of features, and a deeper understanding of text semantics.

Pending Publication Date: 2022-05-27
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

However, current mainstream techniques have difficulty uncovering the semantic problems of words, so they cannot detect errors well.
Moreover, different words relate to one another differently, so different weights must be assigned to represent their relevance, and existing methods do not distribute these weights well.

Detailed Description of the Embodiments

[0071] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The overall process is shown in Figure 1 and proceeds as follows:

[0072] Step 1: Preprocess the input data received by the model.

[0073] The preprocessing process is divided into the following four steps:

[0074] 1-1 Create a dictionary. Perform word segmentation on all text sentences to construct a set of candidate Chinese characters. Count how often each word occurs in this set, filter out words whose frequency is lower than 3, and deduplicate the filtered set to form the Chinese character set D(w). Insert special symbols such as the "START" start symbol, the "END" terminator, the "CLS" separator, the "UNKNOW" unknown-character symbol, and the "PAD" filler into D(w); these symbols help the computer fit the text better. Then use the index to mark each word in the Chinese wo...
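Step 1-1 can be illustrated with a minimal sketch, assuming the sentences have already been word-segmented into token lists. The frequency threshold of 3 and the special symbols come from the text above; the order of the special symbols, placing them at the start of the index, sorting the remaining tokens for deterministic indices, and mapping out-of-dictionary tokens to "UNKNOW" are assumptions, because the source is truncated before it describes the indexing. The helper names `build_vocab` and `encode` are hypothetical.

```python
from collections import Counter

# Special symbols named in step 1-1 ("UNKNOW" spelled as in the source).
# Their order, and placing them first in the index, is an assumption.
SPECIAL_TOKENS = ["PAD", "START", "END", "CLS", "UNKNOW"]


def build_vocab(segmented_sentences, min_freq=3):
    """Hypothetical helper: build a token -> index dictionary D(w).

    segmented_sentences: list of token lists (segmentation assumed done).
    Tokens that occur fewer than min_freq times are filtered out.
    """
    counts = Counter(tok for sent in segmented_sentences for tok in sent)
    # Counter keys are already unique, so this step also deduplicates;
    # sorting is an assumption made only to get reproducible indices.
    kept = sorted(tok for tok, c in counts.items() if c >= min_freq)
    vocab = {}
    for tok in SPECIAL_TOKENS + kept:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Hypothetical helper: map tokens to indices, unknown ones to UNKNOW."""
    unk = vocab["UNKNOW"]
    return [vocab.get(t, unk) for t in tokens]


sentences = [["我", "爱", "中", "国"]] * 3 + [["你", "好"]]
vocab = build_vocab(sentences)
print(encode(["我", "好"], vocab))  # [7, 4]: "好" occurs once, so it maps to UNKNOW
```

Because the special symbols are inserted before any data-derived token, "PAD" always receives index 0 in this sketch, which is a common convenience for padding masks rather than something the source specifies.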

Abstract

The invention discloses a Chinese text error detection method and system based on joint analysis of word order and semantics. Existing Chinese text error detection methods cannot deeply understand the semantics of Chinese text and cannot distribute weights automatically. To solve this, a Chinese text error prediction model is designed that treats a text as a one-dimensional picture, uses a bidirectional recurrent neural network to fit the text, and uses a self-attention mechanism to distribute weights. The model adopts a semantic understanding module (FR) composed of a fully convolutional network (FCN) and a residual network (ResNet), which offers two advantages. First, the FCN treats one-dimensional text data as a one-dimensional picture in order to understand text semantics, addressing the lack of semantic processing in the prior art. Second, the ResNet deepens the network, increases the number of features, and deepens the model's understanding of text semantics.
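The abstract names the building blocks but gives no layer sizes, activations, or equations. As a hedged illustration only, the pure-Python sketch below shows two of the ideas it relies on: a 1-D convolution slid along the token axis (the "one-dimensional picture" view) wrapped in a ResNet-style skip connection, and scaled dot-product self-attention that gives each token a weight over every other token. The kernel width, ReLU activation, zero padding, and the parameter-free Q = K = V attention are assumptions; the bidirectional recurrent layer is omitted; and the helper names `conv1d_same`, `residual_block`, and `self_attention` are hypothetical. This is not the inventors' implementation.

```python
import math
import random

# Hypothetical sketch: kernel width, ReLU, zero padding and Q = K = V
# attention are assumptions, not details taken from the patent.


def conv1d_same(x, kernel, bias):
    """1-D convolution along the token axis with zero "same" padding.

    x: L rows (tokens) x C_in features; kernel: K x C_in x C_out with K odd;
    bias: C_out values. Sliding only along positions is what treating a
    sentence as a one-dimensional "picture" amounts to.
    """
    length, c_in = len(x), len(x[0])
    k, c_out = len(kernel), len(bias)
    pad = k // 2
    out = []
    for i in range(length):
        row = list(bias)
        for j in range(k):
            p = i + j - pad  # input position under kernel tap j
            if 0 <= p < length:
                for ci in range(c_in):
                    xv = x[p][ci]
                    for co in range(c_out):
                        row[co] += xv * kernel[j][ci][co]
        out.append(row)
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def residual_block(x, kernel, bias):
    """ResNet-style skip connection: y = ReLU(conv(x)) + x (needs C_out == C_in)."""
    h = relu(conv1d_same(x, kernel, bias))
    return [[a + b for a, b in zip(hr, xr)] for hr, xr in zip(h, x)]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections).

    Returns (output, weights); weights[i][j] is the share of attention that
    token i assigns to token j, and each row sums to 1.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, kv)) for kv in x]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    out = [[sum(w * v[c] for w, v in zip(wrow, x)) for c in range(d)]
           for wrow in weights]
    return out, weights


if __name__ == "__main__":
    random.seed(0)
    n_tokens, dim, width = 5, 4, 3  # arbitrary toy sizes
    x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    kernel = [[[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
              for _ in range(width)]
    features = residual_block(x, kernel, [0.0] * dim)
    _, attn = self_attention(features)
    print([round(sum(row), 6) for row in attn])  # each row sums to 1.0
```

The skip connection is why the block can be stacked deeply: with an all-zero kernel it reduces to the identity, so extra layers need not degrade the features, which matches the abstract's point that ResNet lets the network grow deeper.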

Description

Technical Field
[0001] The invention belongs to the fields of Chinese text processing, text cleaning, and text error detection, and relates to a Chinese text error detection method and system based on joint analysis of word order and semantics.
Background Art
[0002] With the development of science and technology and the spread of 4G and 5G, society is becoming more digitized every day. Online and remote work are no longer a fantasy, and the paperless era has arrived. As a result, more and more information is stored on storage devices in electronic form. Because of the particular nature of text, even slight differences can produce completely different meanings; adding a single word may change the meaning of an entire sentence. Such problems have caused people great trouble and losses. Like official documents, academic papers, legal documents, and case documents, th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30; G06F40/232; G06F40/211; G06F40/117; G06N3/04
CPC: G06F40/30; G06F40/211; G06F40/232; G06F40/117; G06N3/048
Inventors: 周仁杰, 沈佳冰, 任永坚, 张纪林, 万健, 曾艳, 寇亮, 袁俊峰, 王星
Owner: HANGZHOU DIANZI UNIV