Method, device and electronic equipment for identifying new words in short message spam

A technology for recognizing new words in short messages, applied in electronic digital data processing, digital data information retrieval, instruments, etc. It addresses the low recall and low precision of new-word identification in spam text messages, with the effects of improving detection accuracy, recall, and precision.

Active Publication Date: 2022-05-24
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0006] This application provides a method for identifying new words in spam text messages, to solve the problems of low recall and low precision of new-word identification in spam text messages in the prior art.

Examples

Example 1

[0137] Please refer to Figure 1, which is a flowchart of an embodiment of the method for identifying new words in spam short messages provided by the present application. The method is executed by a device for identifying new words in spam short messages, and includes:

[0138] Step S101: Acquire a short message set.

[0139] Short messages (also called text messages or SMS) include, but are not limited to, mobile phone text messages, and may also take other forms such as instant messages.

[0140] The short message set includes a plurality of spam short messages and a plurality of normal short messages. Each spam short message is labeled with the category "spam short message", and each normal short message is labeled with the category "normal short message".
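The labeled set acquired in Step S101 can be represented as a list of (text, category) records. The following is a minimal sketch, assuming the hypothetical names `ShortMessage`, `acquire_message_set`, and `split_by_category`, and the label strings `"spam"` and `"normal"`; none of these come from the source.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical label strings for the two short message categories.
SPAM = "spam"
NORMAL = "normal"


@dataclass(frozen=True)
class ShortMessage:
    text: str
    category: str  # "spam" or "normal", taken from the message's category label


def acquire_message_set(raw: Iterable[Tuple[str, str]]) -> List[ShortMessage]:
    """Build a labeled short message set (Step S101), validating every label."""
    messages = []
    for text, category in raw:
        if category not in (SPAM, NORMAL):
            raise ValueError(f"unknown category: {category!r}")
        messages.append(ShortMessage(text, category))
    # The set is described as containing both spam and normal messages.
    has_spam = any(m.category == SPAM for m in messages)
    has_normal = any(m.category == NORMAL for m in messages)
    if not (has_spam and has_normal):
        raise ValueError("the set needs both spam and normal messages")
    return messages


def split_by_category(messages: List[ShortMessage]) -> Tuple[List[str], List[str]]:
    """Return (spam texts, normal texts) for the later indicator steps."""
    spam = [m.text for m in messages if m.category == SPAM]
    normal = [m.text for m in messages if m.category == NORMAL]
    return spam, normal
```

Keeping the label on each record lets later steps compute per-category statistics without a second lookup table.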

[0141] Step S103: Determine a candidate word set corresponding to the multiple spam short...
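The excerpt does not say how the candidate word set of Step S103 is built. As an assumption only, a common approach for unsegmented text such as Chinese is to enumerate character n-grams from the spam messages and drop those already in a known vocabulary. The function `candidate_words` and its parameters `known_vocab`, `min_n`, and `max_n` are hypothetical.

```python
from typing import Iterable, Set


def candidate_words(spam_texts: Iterable[str], known_vocab: Set[str],
                    min_n: int = 2, max_n: int = 4) -> Set[str]:
    """Collect character n-grams from spam messages that are not known words.

    Assumed candidate generator for illustration; not taken from the source.
    """
    candidates: Set[str] = set()
    for text in spam_texts:
        for n in range(min_n, max_n + 1):
            for i in range(len(text) - n + 1):
                gram = text[i:i + n]
                # Skip grams that span whitespace or are already in the vocabulary.
                if any(ch.isspace() for ch in gram) or gram in known_vocab:
                    continue
                candidates.add(gram)
    return candidates
```

For example, `candidate_words(["abc"], {"ab"}, 2, 3)` keeps `"bc"` and `"abc"` and drops the known word `"ab"`. Drawing candidates only from spam messages keeps the set small, and the later indicators decide which candidates are actually spam-specific.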

Example 2

[0206] Please refer to Figure 5, which is a schematic diagram of an embodiment of the apparatus for identifying new words in spam short messages of the present application. Since the apparatus embodiment is basically similar to the method embodiment, it is described relatively briefly; for related parts, refer to the description of the method embodiment. The apparatus embodiments described below are merely illustrative.

[0207] The application further provides a device for identifying new words in spam short messages, including:

[0208] A short message set obtaining unit 501, configured to obtain a short message set, the short message set including a plurality of spam short messages and a plurality of normal short messages;

[0209] A candidate word set determining unit 503, configured to determine candidate word sets corresponding to the multiple spam short messages;

[0210] An indicator determination un...

Example 3

[0227] Please refer to Figure 6, which is a schematic diagram of an embodiment of the electronic device of the present application. Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for related parts, refer to the description of the method embodiment. The device embodiments described below are merely illustrative.

[0228] An electronic device in this embodiment includes a processor 601 and a memory 602. The memory stores a program implementing the method for identifying new words in spam short messages; after the device is powered on and the processor runs this program, the following steps are performed: acquiring a short message set, the short message set including multiple spam short messages and multiple normal short messages; determining a candidate word set corresponding to the multiple spam short messa...

Abstract

The application discloses a method, a device and an electronic device for identifying new words in spam short messages. The method includes: obtaining a short message set; determining a candidate word set corresponding to the multiple spam short messages; determining, according to the short message category information, short message category tendency indicators for the candidate words; obtaining document rarity indicators for the candidate words; determining a spam new-word score for each candidate word according to the category tendency and document rarity indicators; and determining the new words in spam short messages from the candidate word set according to the spam new-word scores. With this approach, most of the new words recalled according to the category tendency indicators are representative of spam messages, which avoids recalling many normal words; therefore, precision can be effectively improved. At the same time, the document rarity indicators make it possible to recall low-frequency new words in spam text messages; therefore, recall can be effectively improved.
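The abstract combines two indicators into one score but does not give their formulas. The following is a minimal sketch under stated assumptions: category tendency is taken as the smoothed share of messages containing the word that are labeled spam; document rarity is taken as a smoothed inverse document frequency over the whole set; the score is their product; and new words are the top-k candidates. Word occurrence is tested by plain substring matching. The functions `doc_freq` and `spam_new_words` and the parameters `top_k` and `alpha` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def doc_freq(words: Set[str], texts: Sequence[str]) -> Dict[str, int]:
    """Number of messages containing each word (substring match, an assumption)."""
    return {w: sum(1 for t in texts if w in t) for w in words}


def spam_new_words(candidates: Set[str], spam: Sequence[str],
                   normal: Sequence[str], top_k: int = 10,
                   alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Rank candidates by an assumed tendency x rarity score; return the top_k."""
    spam_df = doc_freq(candidates, spam)
    normal_df = doc_freq(candidates, normal)
    n_total = len(spam) + len(normal)
    scored = []
    for w in candidates:
        df = spam_df[w] + normal_df[w]
        if df == 0:
            continue  # word never occurs in the set, nothing to score
        # Category tendency (assumed form): smoothed P(spam | message contains w).
        tendency = (spam_df[w] + alpha) / (df + 2 * alpha)
        # Document rarity (assumed form): smoothed IDF, larger for rarer words.
        rarity = math.log((n_total + 1) / (df + 1)) + 1.0
        scored.append((w, tendency * rarity))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

With spam messages `"win cash now"` and `"claim cash prize"`, normal messages `"see you at lunch"` and `"lunch at noon"`, and candidates `{"cash", "lunch", "prize"}`, the ranking is `prize`, `cash`, `lunch`. The rarer spam word `prize` outranks the more frequent `cash` because of the rarity factor, which mirrors the abstract's point that rarity helps recall low-frequency spam words, while the tendency factor pushes the normal word `lunch` to the bottom.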

Description

Technical field

[0001] The present application relates to the technical field of text mining, and in particular to a method and device for identifying new words in spam short messages, and an electronic device.

Background art

[0002] A typical SMS sending scenario is one in which merchants send SMS messages to consumers through a network platform, so that information such as product promotions reaches consumers in a timely manner, ensuring the effective implementation of the merchant's sales plan and improving user experience. However, alongside these benefits, a large number of spam messages have also appeared. The proliferation of spam messages seriously affects consumers' daily lives, the image of online platforms, and even social stability.

[0003] With the continuous development of Internet technology, more and more network platforms use SMS content security systems to analyze the content of business-to-customer (B2C) SMS messages, and ...

Application Information

Patent Type & Authority Patents(China)
IPC(8): G06F40/289; G06F40/30; G06F16/332
CPC: G06F16/332; G06F40/289; G06F40/30
Inventors: 高喆, 康杨杨, 周笑添, 孙常龙, 刘晓钟, 司罗
Owner ALIBABA GRP HLDG LTD