Junk text library establishing method and system and junk text filtering method

A method for establishing a junk text library, applied in the field of electrical digital data processing (computing). It addresses two problems: the time-consuming, laborious statistics needed to determine spam feature words, and the lack of further learning of related keywords. The library can be built from only a few collected junk text samples, which saves time and labor.

Active Publication Date: 2017-05-24
北京粉笔蓝天科技有限公司
Cites: 4. Cited by: 3.

AI Technical Summary

Problems solved by technology

[0004] 1. When extracting advertising keywords, existing technical solutions must compare a large number of spam texts with normal texts to determine the spam feature words, and these statistics are time-consuming and laborious;
[0005] 2. Once keywords have been recorded, there is no function for further learning the related junk keywords;

Examples

Embodiment Construction

[0036] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in combination with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are exemplary only, and are not intended to limit the scope of the present invention. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present invention.

[0037] See Figure 1, which is a flow chart of the method for establishing a junk text library provided by the first embodiment of the present invention.

[0038] As shown in Figure 1, in this embodiment, the method for establishing the junk text library comprises:

[0039] Step S100: Obtain at least one pre-collected junk text sample from the text. Step S200: Detect whether there is a long featur...

Abstract

An embodiment of the invention discloses a method for establishing a junk text library, belonging to the technical field of computer text library establishment. The method comprises: S100, acquiring at least one pre-collected junk text sample from text; S200, detecting whether long feature words are present in each junk text sample and, if so, recording them in a long feature word set; S300, classifying the junk text samples corresponding to the long feature word set with a Bayes classifier to obtain junk text and non-junk text; S400, comparing the number of new junk texts with a preset convergence threshold, executing step S500 if the number is less than the convergence threshold and step S600 otherwise; S500, finishing the establishment of the junk text library and ending the process; S600, acquiring new junk text samples from the text and returning to execute steps S200 to S500. According to the embodiment, the junk text library can be established from only a few collected text samples, which saves time and labor and improves precision.
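The loop of steps S100 to S600 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several points are assumptions because the text visible here does not define them: a "long feature word" is taken to be any whitespace-separated token of at least `MIN_LONG_WORD_LEN` characters, the Bayes classifier is a small multinomial naive Bayes with add-one smoothing trained on the current library plus a hypothetical set of normal texts, and "texts corresponding to the long feature word set" is read as unlabeled texts that contain at least one recorded long feature word. All function and variable names are hypothetical.

```python
import math
from collections import Counter

MIN_LONG_WORD_LEN = 4  # assumption: "long" means at least this many characters


def long_feature_words(text, min_len=MIN_LONG_WORD_LEN):
    """S200 (assumed rule): tokens of at least min_len characters."""
    return {tok for tok in text.lower().split() if len(tok) >= min_len}


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self, junk_texts, normal_texts):
        self.docs = {"junk": len(junk_texts), "normal": len(normal_texts)}
        self.counts = {"junk": Counter(), "normal": Counter()}
        for label, texts in (("junk", junk_texts), ("normal", normal_texts)):
            for text in texts:
                self.counts[label].update(text.lower().split())
        self.vocab = set(self.counts["junk"]) | set(self.counts["normal"])

    def _log_score(self, label, tokens):
        total_tokens = sum(self.counts[label].values())
        denom = total_tokens + len(self.vocab) + 1  # +1 slot for unseen tokens
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokens:
            score += math.log((self.counts[label][tok] + 1) / denom)
        return score

    def is_junk(self, text):
        tokens = text.lower().split()
        return self._log_score("junk", tokens) > self._log_score("normal", tokens)


def build_junk_library(seed_junk, normal_texts, corpus, threshold=1, max_rounds=10):
    library = list(seed_junk)   # S100: pre-collected junk samples (non-empty)
    samples = list(seed_junk)
    remaining = list(corpus)    # unlabeled texts still to be examined
    feature_words = set()
    for _ in range(max_rounds):
        # S200: record long feature words found in the current samples
        for sample in samples:
            feature_words |= long_feature_words(sample)
        # S300: classify texts that share a long feature word with the set
        clf = NaiveBayes(library, normal_texts)
        candidates = [t for t in remaining if long_feature_words(t) & feature_words]
        new_junk = [t for t in candidates if clf.is_junk(t)]
        library.extend(new_junk)
        remaining = [t for t in remaining if t not in new_junk]
        # S400: fewer new junk texts than the threshold means convergence
        if len(new_junk) < threshold:
            break               # S500: library is complete
        samples = new_junk      # S600: continue with the newly found junk
    return library, feature_words


# Toy data, invented purely for illustration.
seed = ["cheap discount pills online", "discount casino bonus online"]
normal = ["meeting notes for the project", "lunch with the team today"]
pool = [
    "huge casino bonus today",
    "project meeting moved to friday",
    "free pills and casino jackpot",
    "jackpot winner claim prize",
]
library, words = build_junk_library(seed, normal, pool)
```

With this toy data the library grows over successive rounds: "jackpot" is learned only after a first-round text containing it is classified as junk, which then makes the last pool entry a candidate in the next round. The `max_rounds` guard is an addition of this sketch, not something the abstract states.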

Description

Technical field

[0001] The invention relates to the technical field of computer text library establishment, in particular to a method for establishing a junk text library, a method for filtering junk text, and a system for establishing a junk text library.

Background

[0002] With the popularization of the Internet industry and the continued development of e-commerce applications, people interact more and more frequently on the Internet. However, as the amount of information grows, unnecessary spam information grows with it. Users receive unwanted junk information while obtaining information and may therefore make wrong judgments or choices.

[0003] In the prior art, some online games or forums provide a function for detecting spam comments. The usual processing steps are: 1. Segment the text entered by the user; 2. Perform keyword matching on the word segmentation results; 3. If the keyword is matched, the...
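The first two prior-art steps (segmentation, then keyword matching) can be sketched as follows. This is only an illustration under stated assumptions: segmentation is a plain whitespace split, whereas Chinese text would need a real word segmenter, and the keyword set and function names are hypothetical. What happens after a match is not shown because the source text is truncated at that point.

```python
def segment(text):
    """Step 1 (assumed): lowercase and split on whitespace."""
    return text.lower().split()


def match_keywords(text, keywords):
    """Step 2: return the segmented words that appear in the keyword set."""
    return [word for word in segment(text) if word in keywords]


# Hypothetical keyword list maintained by hand, as in the prior-art approach.
SPAM_KEYWORDS = {"casino", "discount"}

hits = match_keywords("Big DISCOUNT at the casino", SPAM_KEYWORDS)
clean = match_keywords("see you at lunch", SPAM_KEYWORDS)
```

A fixed keyword set like this is exactly what the stated problems refer to: it has to be compiled by comparing many spam and normal texts, and it does not learn new related keywords on its own.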

Application Information

IPC(8): G06F17/30
CPC: G06F16/353
Inventor: 张凯
Owner: 北京粉笔蓝天科技有限公司