Text classification method, computing device and computer storage medium

A text classification technology, applied in text database clustering/classification, computing, unstructured text data retrieval, etc., that addresses the growing difficulty and cost of maintaining a sensitive-word lexicon and the increasingly time-consuming text review, and achieves the effect of improving review speed and classification accuracy.

Active Publication Date: 2020-07-31
ZHANGYUE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the inventor found that the prior art has at least the following deficiencies in the process of realizing the present invention: on the one hand, with the development of language habits, the sensitive lexicon will continue to expand, which increases the maintenance difficulty and cost of the sensitive lexicon and causes text review to take longer and longer; on the other hand, in order to evade content review, content creators use allusion, borrowed terms, etc. to convey sensitive information. Such vocabulary looks normal on the surface but carries sensitive information at the semantic level, which cannot be discovered by string matching alone.



Embodiment Construction

[0030] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0031] Figure 1 shows a flow chart of a text classification method provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0032] Step S110: by training on an unsupervised corpus, extract the semantic features of each character and each common word in the unsupervised corpus to obtain a corpus feature set.

[0033] Pre-train on a large-scale unsupervised corpus to extract the s...


Abstract

The invention discloses a text classification method, a computing device and a computer storage medium. The method comprises: training on an unsupervised corpus to extract the semantic features of each character and each common word in the unsupervised corpus, obtaining a corpus feature set; performing word segmentation on a labeled sample corpus to obtain a word segmentation result, and determining the common words and non-common words contained in that result; splitting each non-common word into the characters it contains; obtaining, from the corpus feature set, the semantic features corresponding to the common words in the word segmentation result and the semantic features corresponding to the characters of the non-common words; training a violation classification model on the obtained semantic features and the annotation information of the labeled sample corpus; and classifying the text to be classified with the violation classification model. The method enables content classification at the semantic level and improves text classification accuracy.
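The feature-lookup part of the steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `split_into_units` and `lookup_features`, the toy two-dimensional vectors, and the choice to skip units missing from the feature set are all assumptions made for the example. The pre-training that produces the corpus feature set and the violation classification model itself are outside the scope of the sketch.

```python
from typing import Dict, List, Set

Vector = List[float]


def split_into_units(tokens: List[str], common_words: Set[str]) -> List[str]:
    """Keep common words whole; split non-common words into characters."""
    units: List[str] = []
    for token in tokens:
        if token in common_words:
            units.append(token)
        else:
            # Iterating over a string yields its individual characters.
            units.extend(token)
    return units


def lookup_features(units: List[str], feature_set: Dict[str, Vector]) -> List[Vector]:
    """Fetch the semantic feature of each unit from the corpus feature set."""
    # Assumption: units absent from the feature set are skipped.
    return [feature_set[u] for u in units if u in feature_set]


# Hypothetical corpus feature set: one vector per common word and per character.
feature_set: Dict[str, Vector] = {
    "text": [1.0, 0.0],
    "a": [0.1, 0.2],
    "b": [0.3, 0.4],
}
common_words: Set[str] = {"text"}

# Hypothetical word segmentation result for one labeled sample.
tokens = ["text", "ab"]

units = split_into_units(tokens, common_words)
features = lookup_features(units, feature_set)
```

In this toy run the common word "text" is looked up as a whole, while the non-common word "ab" contributes the features of its characters "a" and "b". The resulting feature sequence, together with the sample's label, is what a classifier would be trained on.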

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a text classification method, a computing device and a computer storage medium.

Background art

[0002] Books, articles and other creative content need to be reviewed before they are released online to filter out sensitive content such as pornography, terrorism, and politics. In the prior art, a sensitive word library is usually built, and the text under review is searched for sensitive words by string matching to filter out sensitive content, which saves manual effort.

[0003] However, the inventor found that the prior art has at least the following deficiencies in the process of realizing the present invention: on the one hand, with the development of language habits, the sensitive lexicon will continue to expand, which increases the maintenance difficulty and cost of the sensitive lexicon, and will cause text review to take longer and longer;...
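For contrast, the prior-art review by string matching described in [0002] can be sketched as below. The function name `contains_sensitive_word`, the one-word lexicon and the sample sentences are hypothetical and chosen only to illustrate the limitation; a real sensitive word library would be far larger.

```python
from typing import Iterable


def contains_sensitive_word(text: str, lexicon: Iterable[str]) -> bool:
    """Flag the text if any lexicon entry occurs in it as a substring."""
    # Every review scans the whole lexicon, so cost grows as the lexicon grows.
    return any(word in text for word in lexicon)


# Hypothetical one-entry sensitive word library.
lexicon = {"gambling"}

# A direct mention is caught by string matching.
direct = contains_sensitive_word("join our gambling club", lexicon)

# An indirect phrasing with no lexicon entry passes unnoticed.
indirect = contains_sensitive_word("join our lucky numbers club", lexicon)
```

Here `direct` is flagged and `indirect` is not, even though a human reviewer might read both sentences the same way. Closing that gap by adding ever more entries to the lexicon is the maintenance burden the background section describes.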


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/35, G06F40/30, G06F40/289
CPC: G06F16/35, Y02D10/00
Inventor: 柳燕煌
Owner: ZHANGYUE TECH CO LTD