Text filtering method and device and computer storage medium

A text filtering technology, applied in computing, natural language data processing, and special data processing applications, that addresses the high cost, low efficiency, and time-consuming, labor-intensive nature of filtering text.

Pending Publication Date: 2021-05-18
PENG CHENG LAB

AI Technical Summary

Problems solved by technology

[0004] In view of this, a text filtering method is provided to solve the problems of manual text filtering, which is time-consuming, labor-intensive, inefficient, and costly, and which yields low-quality results.

Examples

First embodiment

[0050] Referring to Figure 2, which shows the first embodiment of the text filtering method of the present application, the method includes:

[0051] Step S110: Obtain text fluency based on the language model.

[0052] This application computes text fluency using a language model, with perplexity as the fluency evaluation standard. The language model can be an N-gram language model, a topic model, a neural network model, or a pre-trained language model such as GPT (Generative Pre-Training), BERT (Bidirectional Encoder Representations from Transformers), or XLNet. The BERT model is used as the example in the detailed process description, but the language model in this application is not limited to those listed above and can be any other language model.
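As a rough illustration of perplexity as a fluency score, the sketch below uses an add-one-smoothed bigram model, one of the N-gram options the paragraph lists, rather than BERT. The function names, the whitespace tokenization, and the toy corpus are assumptions made for this example and are not taken from the application.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(text, unigrams, bigrams):
    """Perplexity of `text` under an add-one-smoothed bigram model.

    Lower values mean the model finds the text more predictable,
    which this sketch treats as "more fluent".
    """
    tokens = ["<s>"] + text.split() + ["</s>"]
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen tokens
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n_predicted = len(tokens) - 1  # every token after <s> is predicted
    return math.exp(-log_prob / n_predicted)


# Toy corpus (hypothetical); a real system would train on far more text.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
unigrams, bigrams = train_bigram(corpus)
fluent_score = perplexity("the cat sat on the mat", unigrams, bigrams)
garbled_score = perplexity("mat the on sat cat the", unigrams, bigrams)
print(fluent_score < garbled_score)
```

With a pre-trained model such as BERT, the per-token probabilities would come from the model instead of bigram counts, but the final step of averaging the log-probabilities and exponentiating stays the same.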

[0053] The goal of the BERT model is to train on a large-scale unlabeled corpus to obtain text representations that contain rich semantic information, and then fine-tune the t...

Abstract

The invention discloses a text filtering method, a device, and a computer storage medium. The method comprises the following steps: obtaining the text fluency based on a language model; obtaining an effective word rate based on a user-defined effective word dictionary; and, when the text fluency meets a first preset threshold and the effective word rate meets a second preset threshold, executing a filtering operation on the text. The method addresses the time and labor consumption, low efficiency, high cost, and low quality of manual text screening and filtering, and it improves semantic-level and character-level text screening quality in the corpus, which in turn improves model training and service quality and reduces computational overhead.
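The two checks in the abstract can be sketched as follows. This is a minimal illustration under several assumptions that the text here does not confirm: the effective word rate is taken to be the fraction of tokens found in the dictionary, "meeting" the thresholds is taken to mean a perplexity at or below the first threshold and a word rate at or above the second, and the filtering operation is taken to keep the texts that meet both. The function names and threshold values are hypothetical.

```python
def effective_word_rate(text, effective_words):
    """Fraction of whitespace tokens that appear in the effective word dictionary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # empty text contains no effective words
    hits = sum(1 for token in tokens if token in effective_words)
    return hits / len(tokens)


def meets_thresholds(perplexity, word_rate,
                     max_perplexity=50.0, min_word_rate=0.6):
    """True when both scores satisfy their thresholds.

    The comparison directions and the default values are assumptions.
    """
    return perplexity <= max_perplexity and word_rate >= min_word_rate


def filter_corpus(scored_texts, effective_words):
    """Keep texts whose precomputed perplexity and word rate both qualify.

    `scored_texts` is a list of (text, perplexity) pairs; the perplexity
    is assumed to come from whichever language model is in use.
    """
    kept = []
    for text, ppl in scored_texts:
        rate = effective_word_rate(text, effective_words)
        if meets_thresholds(ppl, rate):
            kept.append(text)
    return kept


# Hypothetical dictionary and scores, for illustration only.
dictionary = {"the", "cat", "sat", "on", "mat"}
samples = [("the cat sat on the mat", 12.0), ("zzzz qqqq the cat", 15.0)]
print(filter_corpus(samples, dictionary))
```

Keeping the two scores separate lets each threshold be tuned on its own: the perplexity check targets semantic-level quality, while the dictionary check targets character-level noise.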

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a text filtering method, device, and computer storage medium.

Background

[0002] With the rapid development of artificial intelligence technology, the importance of artificial intelligence security has become increasingly prominent, and natural language processing has been widely adopted within artificial intelligence. Natural language processing takes text as its processing target and supports production and daily life. However, in the big data era, and in the untrusted environment of multi-source big data, low-quality text data poses a serious threat to the training and testing of natural language processing models. Many text filtering schemes have emerged to deal with the problem of low-quality text.

[0003] At present, text filtering methods are mostly based on filtering rules formulate...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/335, G06F40/216, G06F40/242, G06F40/289
CPC: G06F16/335, G06F40/216, G06F40/242, G06F40/289
Inventor: 程正涛, 张伟哲, 束建钢, 艾建文, 钟晓雄
Owner: PENG CHENG LAB