Short text auditing method and device fusing variant word recognition

Applied in the field of text analysis, this short-text and variant-word technology addresses three problems: harmless text being mistakenly flagged, the large volume of harmful text, and the inability to identify variant words. Its effects are a lower chance of misjudgment, a higher system recall rate, and faster iteration.

Pending Publication Date: 2021-01-29
INST OF AUTOMATION CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

Manual review has obvious disadvantages: (1) harmful content iterates quickly, while updates to the sensitive-word lexicon lag behind;
(2) sensitive-word matching may wrongly flag harmless texts, so a second, manual review is required;
(3) the volume of harmful text is large, and manual review is costly.
[0004] Later, text review methods based on machine learning appeared in the industry and reduced the cost of manual review to some extent. These methods have the following disadvantages: (1) harmful short-text classification based on traditional machine learning is not very accurate. Because social media posts are short and carry little content, traditional machine learning methods easily misclassify harmless short texts that contain sensitive words;
(2) harmful information is expressed irregularly. Publishers replace sensitive words with variants of those words (such as homonyms), and machine-learning-based review methods cannot identify such variants;
(3) the topics and content of harmful information change quickly, so the model must be updated frequently to maintain its recall rate.
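
The variant-word problem in point (2) can be illustrated with a minimal Python sketch. It assumes a hand-built table that maps each variant spelling to the sensitive word it replaces; the table, the word list, and the function names are hypothetical and are not taken from the patent, which does not disclose its recognition procedure in this excerpt.

```python
import re

# Hypothetical variant table: each variant spelling maps to the
# sensitive word it stands for. Not taken from the patent.
VARIANT_MAP = {
    "l0ttery": "lottery",
    "lottary": "lottery",
    "c4sino": "casino",
}

# Hypothetical sensitive-word list.
SENSITIVE_WORDS = {"lottery", "casino"}


def normalize_variants(text, variant_map):
    """Lower-case the text and replace known variants with their canonical word."""
    lowered = text.lower()
    if not variant_map:
        return lowered
    # Try longer variants first so a short variant never pre-empts a longer one.
    alternatives = sorted(variant_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in alternatives))
    return pattern.sub(lambda m: variant_map[m.group(0)], lowered)


def find_sensitive_words(text, variant_map, sensitive_words):
    """Return the sensitive words found after variant normalization, sorted."""
    normalized = normalize_variants(text, variant_map)
    tokens = re.findall(r"\w+", normalized)
    return sorted(set(tokens) & sensitive_words)
```

With an empty variant table the text "Win the l0ttery" yields no match, which is the failure mode described above; with the table in place the same text matches "lottery".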




Embodiment Construction

[0062] The application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain related inventions, not to limit the invention. It should also be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.

[0063] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0064] The present invention provides a text review method that fuses variant word recognition with feature vector analysis. The method comprises:

[0065] Step S100, constructing a configuration lexicon; the configuration of the configuration lexicon include...


Abstract

The invention belongs to the field of short text auditing, and in particular relates to a short text auditing method and device fusing variant word recognition. It aims to solve the problem of how to fuse variant word recognition into a harmful text auditing task and update the model automatically. The method comprises: constructing a configuration lexicon; obtaining to-be-audited text data from a social media platform; screening the to-be-audited text data to obtain suspicious text data; removing meaningless information; calculating a text feature vector and a statistical feature vector; fusing the text feature vector and the statistical feature vector; identifying harmful texts with a trained harmful text classification model based on a support vector machine; extracting the sensitive words of the harmful texts with a preset keyword extraction algorithm; and writing those sensitive words into the configuration lexicon. Because variant word recognition is fused into the calculation of the text features and statistical features, the method updates the model automatically and improves both the accuracy of text auditing and the speed of updating.
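
The later steps of the abstract (feature fusion, classification, and writing extracted words back into the lexicon) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the bag-of-words text features, the three statistical features, fusion by concatenation, and the frequency-based keyword extraction are all stand-ins chosen for brevity, and the weights and bias are supplied by the caller rather than learned. The decision rule is that of a linear support vector machine, used here in place of the trained model.

```python
from collections import Counter


def text_features(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (stand-in text feature vector)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def statistical_features(text, lexicon):
    """Assumed statistics: token count, lexicon hits, and hit ratio."""
    tokens = text.lower().split()
    hits = sum(1 for token in tokens if token in lexicon)
    ratio = hits / len(tokens) if tokens else 0.0
    return [float(len(tokens)), float(hits), ratio]


def fuse(text_vec, stat_vec):
    """Feature fusion by simple concatenation (an assumption)."""
    return text_vec + stat_vec


def linear_svm_predict(features, weights, bias):
    """Linear SVM decision rule: harmful when w.x + b > 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0


def extract_keywords(text, stopwords, top_k=2):
    """Toy keyword extraction: the most frequent non-stopword tokens."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def audit(text, lexicon, vocabulary, weights, bias, stopwords):
    """Classify one text; if harmful, write its keywords into the lexicon."""
    fused = fuse(text_features(text, vocabulary),
                 statistical_features(text, lexicon))
    harmful = linear_svm_predict(fused, weights, bias)
    if harmful:
        lexicon.update(extract_keywords(text, stopwords))
    return harmful
```

Because `audit` mutates the lexicon it is given, the statistical features of later texts reflect words learned from earlier harmful ones, which mirrors the automatic updating the abstract describes.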

Description

Technical field

[0001] The invention belongs to the field of text analysis, and in particular relates to a short text review method and device fusing variant word recognition.

Background

[0002] As social media platforms such as Twitter and Weibo have matured, the threshold for disseminating information has steadily fallen, and users can easily spread information on the Internet. While the volume of user contributions continues to grow, disorderly content is also becoming increasingly prominent. Some lawbreakers use social media to spread politically sensitive, maliciously promotional, pornographic, and violent content. Such harmful content not only degrades the user experience, but also exposes the platform to significant legal risk and damages the online environment. Identifying and filtering harmful content from massive amounts of information has therefore become an important issue.

[0003] The traditional content review m...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/30
CPC: G06F40/295, G06F40/30
Inventor: 孔庆超, 王婧宜, 王宇琪, 王磊, 毛文吉, 曾大军, 王祥, 王元杰
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI