Junk short message new word identification method and device and electronic equipment

A new-word recognition technology, applied in the fields of electronic digital data processing, digital data information retrieval, instruments, etc., that can solve the problems of low accuracy and low recall when identifying new words in spam text messages.

Active Publication Date: 2020-03-24
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0006] This application provides a method for identifying new spam words in text messages, to solve the prior-art problem that new words in spam text messages are recalled with low accuracy and a low recall rate.



Examples

Example 1

[0137] Please refer to Figure 1, which is a flow chart of an embodiment of the spam short message vocabulary recognition method provided by the present application. The method is executed by a spam short message vocabulary recognition device. The method includes:

[0138] Step S101: Obtain a short message collection.

[0139] A short message, also known as a text message or SMS, includes but is not limited to a short message on a mobile phone; it may also take other forms, such as an instant message.

[0140] The short message collection includes a plurality of spam short messages and a plurality of normal short messages. The message category of each spam short message is labeled as spam, and the message category of each normal short message is labeled as normal.
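The labeled collection described in step S101 can be sketched as a small Python data structure. This is only an illustration: the names `ShortMessage`, `SPAM`, `NORMAL`, and `split_by_category`, as well as the sample texts, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

SPAM = "spam"      # hypothetical label value for spam short messages
NORMAL = "normal"  # hypothetical label value for normal short messages


@dataclass(frozen=True)
class ShortMessage:
    """One short message together with its labeled category."""
    text: str
    category: str  # either SPAM or NORMAL


def split_by_category(messages: List[ShortMessage]) -> Tuple[List[str], List[str]]:
    """Return (spam texts, normal texts) from a labeled collection."""
    spam = [m.text for m in messages if m.category == SPAM]
    normal = [m.text for m in messages if m.category == NORMAL]
    return spam, normal


# Made-up sample collection, for illustration only.
collection = [
    ShortMessage("win prize now", SPAM),
    ShortMessage("claim prize today", SPAM),
    ShortMessage("see you today", NORMAL),
    ShortMessage("call me now", NORMAL),
]
spam_texts, normal_texts = split_by_category(collection)
```

Keeping the category on each message lets later steps count, per candidate word, how many spam and how many normal messages contain it.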

[0141] Step S103: Determine a set of candidate words correspond...

Example 2

[0206] Please see Figure 5, which is a schematic diagram of an embodiment of the device for identifying new spam short message words of the present application. Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment. The device embodiment described below is only illustrative.

[0207] The present application additionally provides a device for identifying new spam short message words, including:

[0208] A short message collection acquisition unit 501, configured to obtain a short message collection; the short message collection includes a plurality of spam short messages and a plurality of normal short messages;

[0209] A candidate word set determining unit 503, configured to determine the candidate word set corresponding to the plurality of spam short messages;

[0210] The index determining unit 505 is used...

Example 3

[0227] Please refer to Figure 6, which is a schematic diagram of an electronic device embodiment of the present application. Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment. The device embodiment described below is only illustrative.

[0228] The electronic device of this embodiment comprises a processor 601 and a memory 602. The memory stores a program that implements the method for identifying new spam short message words. After the device is powered on and the processor runs this program, the following steps are performed: obtain a short message collection, which includes a plurality of spam short messages and a plurality of normal short messages; determine the set of ...

Abstract

The invention discloses a method and device for identifying new junk short message words, and electronic equipment. The method comprises the following steps: acquiring a short message set that includes a plurality of junk short messages and a plurality of normal short messages; determining the candidate word set corresponding to the plurality of junk short messages; determining a short message category tendency index for each candidate word according to the short message category information; obtaining a document scarcity index for each candidate word; determining a junk short message new word score for each candidate word according to its category tendency index and its document scarcity index; and determining the junk short message new words from the candidate word set according to these scores. With this processing mode, most new words recalled according to the category tendency index are representative of junk short messages, and recalling many normal words is avoided, so recall accuracy can be effectively improved. At the same time, low-frequency junk short message new words can be recalled according to the document scarcity index, so the recall rate can be effectively improved.
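The scoring flow in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the formulas of the application, which are not given in the text available here. The assumptions are: candidate words are whitespace-separated tokens of the junk messages; the category tendency index is the share of messages containing the word that are junk; the document scarcity index is a smoothed inverse document frequency; and the score is their product. The function names and sample messages are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def document_frequency(texts: List[str]) -> Counter:
    """Count, for each token, how many messages contain it at least once."""
    df: Counter = Counter()
    for text in texts:
        df.update(set(text.split()))  # assumption: whitespace tokenization
    return df


def score_candidates(spam: List[str], normal: List[str]) -> Dict[str, float]:
    """Score every token seen in junk messages (the candidate word set)."""
    spam_df = document_frequency(spam)
    normal_df = document_frequency(normal)
    total_docs = len(spam) + len(normal)
    scores: Dict[str, float] = {}
    for word, in_spam in spam_df.items():
        in_normal = normal_df.get(word, 0)
        # Assumed category tendency index: fraction of containing messages that are junk.
        tendency = in_spam / (in_spam + in_normal)
        # Assumed document scarcity index: smoothed inverse document frequency.
        scarcity = math.log((1 + total_docs) / (1 + in_spam + in_normal)) + 1.0
        scores[word] = tendency * scarcity
    return scores


def top_new_words(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring candidates; ties are broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up sample data, for illustration only.
spam_messages = ["win prize now", "claim prize today"]
normal_messages = ["see you today", "call me now"]
scores = score_candidates(spam_messages, normal_messages)
best = top_new_words(scores, 2)
```

In this sketch, a word that also appears in normal messages is penalized by the tendency term, while a word that appears in few messages overall is boosted by the scarcity term, which mirrors the two effects the abstract attributes to the two indexes.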

Description

Technical field

[0001] The present application relates to the technical field of text mining, and specifically relates to a method and device for identifying new text spam words, and electronic equipment.

Background

[0002] A typical short message sending scenario is that merchants send short messages to consumers through the network platform, so as to send information such as product promotions to consumers in a timely manner, thereby ensuring the effective implementation of merchant sales plans and improving user experience. However, along with these beneficial effects, a large number of spam messages have also appeared. The proliferation of spam text messages has seriously affected the normal life of consumers, the image of online platforms and even social stability.

[0003] With the continuous development of Internet technology, more and more network platforms use SMS content security systems to analyze the content of Business-to-Customer (B2C) SMS, and perf...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/30, G06F16/332
CPC: G06F16/332, G06F40/289, G06F40/30
Inventor: 高喆, 康杨杨, 周笑添, 孙常龙, 刘晓钟, 司罗
Owner ALIBABA GRP HLDG LTD