Classification and identification method and device for junk short messages, computer equipment and storage medium

A spam short message classification and identification technology in the field of data processing. It addresses two problems: spam text messages are classified inaccurately, and entities are extracted from them poorly because spam text messages are written in non-standard ways. It achieves accurate classification and identification and accurate entity extraction.

Pending Publication Date: 2021-03-12
EVERSEC BEIJING TECH

AI Technical Summary

Problems solved by technology

[0004] Prior-art methods for classifying spam text messages and extracting entities from them perform poorly. Spam text messages are short and fall into many categories, so classification accuracy is poor. Spam text messages are also written in non-standard ways, using homophones, look-alike characters, and other variant forms, so entity extraction on spam messages is poor as well.



Examples


Embodiment 1

[0023] Figure 1 is a flow chart of a method for classifying and identifying spam text messages provided by Embodiment 1 of the present invention. This embodiment applies to identifying spam text messages among massive volumes of text messages, classifying them, and extracting the entity information they contain. The method can be executed by a device for classifying and identifying junk messages. The device can be implemented in software and/or hardware and is generally integrated in computer equipment.

[0024] As shown in Figure 1, the technical solution of this embodiment of the present invention specifically includes the following steps:

[0025] S110. Perform text filtering on the short message text collection to obtain a spam short message text collection.

[0026] The short message text collection includes a plurality of short message texts obtained from the short message platform, and the short message text collection ...
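This excerpt does not say how S110 separates spam from legitimate messages. The following is a minimal sketch of the step's input and output only. It assumes a hypothetical keyword-weight scoring rule in place of the patent's actual filter. The keyword table, the weights, and `THRESHOLD` are all illustrative and do not come from the patent.

```python
# Hypothetical spam keywords and weights; the patent does not disclose its filter.
SPAM_KEYWORDS = {"prize": 2.0, "loan": 1.5, "click": 1.0}
THRESHOLD = 2.0  # hypothetical decision threshold


def spam_score(text: str) -> float:
    """Sum the weights of the hypothetical keywords that occur in the message."""
    lowered = text.lower()
    return sum(weight for kw, weight in SPAM_KEYWORDS.items() if kw in lowered)


def filter_spam(messages: list[str]) -> list[str]:
    """S110 sketch: keep only messages whose score reaches THRESHOLD."""
    return [m for m in messages if spam_score(m) >= THRESHOLD]


if __name__ == "__main__":
    sms = ["You won a prize! Click here", "Meeting moved to 3pm", "Low-rate loan, reply now"]
    print(filter_spam(sms))  # ['You won a prize! Click here']
```

A trained binary classifier could replace `spam_score` without changing the shape of the step: a short message text collection goes in, and a spam short message text collection comes out.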

Embodiment 2

[0044] Figure 2 is a flow chart of a method for classifying and identifying spam text messages provided by Embodiment 2 of the present invention. Building on the above embodiment, this embodiment further specifies three processes: text filtering, classifying the spam text message collection into spam text message collections of multiple categories, and entity information extraction. It also adds whitelist and/or blacklist filtering before spam classification and text preprocessing after text filtering.
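Paragraph [0044] adds whitelist and/or blacklist filtering before spam classification and text preprocessing after text filtering. It does not say what the lists match on or what the preprocessing does. This minimal sketch assumes both. The lists match hypothetical sender numbers (`WHITELIST`, `BLACKLIST`). Preprocessing is a hypothetical combination of NFKC normalization, lower-casing, and whitespace collapsing. `Sms`, `list_filter`, and `preprocess` are illustrative names only.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Sms:
    sender: str  # sender number (assumed to be available with each message)
    text: str


# Hypothetical lists; the patent does not publish their contents or match keys.
WHITELIST = {"10086"}
BLACKLIST = {"17000000000"}


def list_filter(messages: list[Sms]) -> tuple[list[Sms], list[Sms]]:
    """Split messages into (known_spam, undecided) by sender number.

    Whitelisted senders are dropped as legitimate, blacklisted senders are
    taken as spam without classification, and the rest go to the classifiers.
    """
    known_spam: list[Sms] = []
    undecided: list[Sms] = []
    for m in messages:
        if m.sender in WHITELIST:
            continue
        if m.sender in BLACKLIST:
            known_spam.append(m)
        else:
            undecided.append(m)
    return known_spam, undecided


def preprocess(text: str) -> str:
    """Hypothetical preprocessing: NFKC normalization (e.g. full-width to
    half-width characters), lower-casing, and whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()
```

NFKC is a plausible choice for SMS text because spammers often use full-width letters and digits to evade keyword matching. Whether the patent performs this normalization is not stated.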

[0045] Correspondingly, as shown in Figure 2, the technical solution of this embodiment of the present invention specifically includes the following steps:

[0046] S210. Train a machine learning model on the labeled training short message text collection and the constructed variant character library to obtain an entity information extractio...
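S210 relies on a constructed library of variant spellings, such as homophones and look-alike characters, to help recover entities from spam messages. The excerpt truncates before describing how the library enters training. The following is therefore only a dictionary-lookup sketch of how such a library could restore variant spellings to their standard forms. It is not the trained extraction model. `VARIANT_LIBRARY` and `restore_variants` are hypothetical, and the three entries are illustrative examples.

```python
# Hypothetical variant library (variant spelling -> standard form). The
# patent's actual library and how it feeds model training are not shown here.
VARIANT_LIBRARY: dict[str, str] = {
    "薇信": "微信",  # homophone variant of 微信 (WeChat)
    "徽信": "微信",  # look-alike variant of 微信
    "扣扣": "QQ",    # homophone spelling of QQ
}


def restore_variants(text: str, library: dict[str, str] = VARIANT_LIBRARY) -> str:
    """Replace known variant spellings with their standard forms.

    Longer variants are tried first so that a long entry is not broken up
    by a shorter entry that overlaps it.
    """
    for variant in sorted(library, key=len, reverse=True):
        text = text.replace(variant, library[variant])
    return text


if __name__ == "__main__":
    print(restore_variants("加薇信领红包, 或加扣扣"))  # 加微信领红包, 或加QQ
```

A library like this could serve in either direction. It could rewrite standard entities into variants to augment labeled training data, or it could restore variants in text the model has tagged. Which of these the patent does is not stated in this excerpt.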

Embodiment 3

[0086] Figure 3 is a schematic structural diagram of a device for classifying and identifying spam messages provided by Embodiment 3 of the present invention. The device can be implemented in software and/or hardware and is generally integrated into computer equipment. The device includes a text filtering module 310, a category spam short message text collection acquisition module 320, and an entity information extraction module 330, where:

[0087] The text filtering module 310 is configured to perform text filtering on the short message text collection to obtain a spam short message text collection;

[0088] The category spam text collection acquisition module 320 is configured to input the spam text collection into the first-level classification model and then the second-level classification model to obtain spam text collections of multiple categories;

[0089] The entity information extraction module 330 is configured to input the text sets of various types of spam messages into the entity informatio...
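Module 320 feeds the spam text collection through a first-level classification model and then a second-level one. The excerpt names neither the models nor the categories. This sketch therefore makes two assumptions. First, hypothetical keyword rules stand in for the trained models. Second, each first-level category has its own second-level model. The sketch shows only how two cascaded classifiers can produce one spam text collection per category. All category names and the functions `first_level` and `classify_two_level` are illustrative.

```python
from typing import Callable

# Hypothetical stand-ins for trained models: each maps a message to a label.
Classifier = Callable[[str], str]


def first_level(text: str) -> str:
    """Hypothetical first-level model: assigns a coarse spam category."""
    return "fraud" if ("transfer" in text or "account" in text) else "advertising"


# Hypothetical second-level models, one per coarse category (an assumption).
SECOND_LEVEL: dict[str, Classifier] = {
    "fraud": lambda t: "fraud/impersonation" if "bank" in t else "fraud/other",
    "advertising": lambda t: "advertising/loan" if "loan" in t else "advertising/other",
}


def classify_two_level(spam_texts: list[str]) -> dict[str, list[str]]:
    """Module 320 sketch: pass each spam text through the first-level model,
    then through that category's second-level model, and group the texts
    into one spam text collection per fine-grained category."""
    category_sets: dict[str, list[str]] = {}
    for text in spam_texts:
        fine = SECOND_LEVEL[first_level(text)](text)
        category_sets.setdefault(fine, []).append(text)
    return category_sets
```

For example, `classify_two_level(["bank account frozen, transfer now", "cheap loan today"])` groups the first message under `fraud/impersonation` and the second under `advertising/loan`. Each resulting collection would then go to the entity information extraction module 330.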



Abstract

The invention discloses a junk short message classification and identification method and device, computer equipment, and a storage medium. The method comprises the following steps. Text filtering is performed on a short message text set to obtain a junk short message text set. The junk short message text set is input into a first-level classification model and then a second-level classification model to obtain junk short message text sets of a plurality of categories. Each category's junk short message text set is then input into an entity information extraction model, which identifies or restores the entity information in that set. The technical scheme of the invention enables accurate classification and identification of massive volumes of short messages and accurate extraction of entity information from junk short messages.

Description

Technical field

[0001] The embodiments of the present invention relate to data processing technology, and in particular to a method, device, computer equipment, and storage medium for classifying and identifying spam short messages.

Background

[0002] As a large-scale information exchange platform, SMS makes transmitting everyday information convenient. However, some criminals use the SMS platform to spread spam, which harms social security management and people's daily lives.

[0003] The SMS platform carries a large number of SMS texts. Before these texts are delivered, operators need to identify and intercept spam SMS texts and extract entity information from the intercepted ones. This helps regulators trace the sources of spam SMS messages and promotes a clean ("green") information exchange platform. In the prior art, the text content of a large number of s...


Application Information

Patent type & authority: Applications (China)
IPC (8): H04W12/088; H04W12/128; G06F16/35; H04W4/14
CPC: G06F16/35; H04W4/14
Inventors: 黄之李林翰周小明陈浩武林红侯立冬孟宝权梁彧田野傅强王杰杨满智蔡琳金红陈晓光
Owner: EVERSEC BEIJING TECH