Text filtering method and system based on keyword weight value

A text filtering and keyword technology applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the low accuracy with which text is judged and thereby improves judgment accuracy.

Active Publication Date: 2014-06-04
CHINA MOBILE COMM GRP CO LTD

AI Technical Summary

Problems solved by technology

[0020] To solve the problem that the system judges text with low accuracy, the application provides a text filtering method based on keyword weights, which includes the following steps: calculating the weight of a keyword; and filtering text based on the calculated keyword weight. The step of calculating the keyword weight includes: judging whether the keyword is a new keyword; if not, calculating the number M of correct judgment data and the number N of wrong judgment data in the historical judgment data, as well as the number M1 of correct judgment data containing the keyword and the number N1 of wrong judgment data containing the keyword; and calculating t...
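Because the weight formula itself is cut off above, the following Python sketch is only a hypothetical illustration of how the counts M, N, M1 and N1 could be turned into a weight; it is not the formula of this application. It assumes that a brand-new keyword has no usable history and therefore receives a hypothetical default weight (`DEFAULT_NEW_KEYWORD_WEIGHT`), and that the weight compares how often the keyword appears in correct judgments (M1/M) with how often it appears in wrong judgments (N1/N). The names `HistoryCounts` and `keyword_weight` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prior used when no history is available (assumption).
DEFAULT_NEW_KEYWORD_WEIGHT = 0.5


@dataclass
class HistoryCounts:
    """Counts taken from historical (manually reviewed) judgment data."""
    m: int   # M: number of correct judgment records
    n: int   # N: number of wrong judgment records
    m1: int  # M1: correct judgment records containing the keyword
    n1: int  # N1: wrong judgment records containing the keyword


def keyword_weight(counts: Optional[HistoryCounts], is_new: bool) -> float:
    """Hypothetical weight in [0, 1]; NOT the application's (truncated) formula."""
    if is_new or counts is None:
        # A brand-new keyword has no history, so fall back to the prior.
        return DEFAULT_NEW_KEYWORD_WEIGHT
    if counts.m == 0 or counts.n == 0:
        # Cannot form both rates without both kinds of judgment data.
        return DEFAULT_NEW_KEYWORD_WEIGHT
    # Rate at which the keyword occurs in correct vs. wrong judgments,
    # normalised by M and N so unequal pool sizes do not dominate.
    p_correct = counts.m1 / counts.m
    p_wrong = counts.n1 / counts.n
    if p_correct + p_wrong == 0:
        # Keyword never appeared in any judgment data.
        return DEFAULT_NEW_KEYWORD_WEIGHT
    return p_correct / (p_correct + p_wrong)
```

For example, with M = 1000, N = 200, M1 = 300 and N1 = 20, the keyword appears in 30% of correct judgments and 10% of wrong ones, so the sketch returns about 0.75. Normalising by M and N keeps a large pool of correct judgments from inflating the weight on its own.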

Embodiment Construction

[0025] The number of keyword samples used for information filtering is generally small (hundreds to thousands), whereas hundreds of billions of objects (texts) are judged against the keyword lexicon every day, and tens of thousands of samples require manual review every day.

[0026] The same keyword, such as "gun", may appear both in normal text and in violent web pages. Because existing filtering systems include a manual review step, the review results can be used to determine how much a keyword contributes to correct judgments and to wrong judgments. Analysing the keyword's positive and negative effects on judgment together then determines its weight.
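One way to derive the counts M, N, M1 and N1 from manual review results is sketched below. It assumes each reviewed record is stored as a pair of the text and a Boolean review outcome (a hypothetical layout), and plain substring matching stands in for whatever keyword matching the real system uses; `count_judgments` is an illustrative name.

```python
from typing import Iterable, Tuple


def count_judgments(records: Iterable[Tuple[str, bool]],
                    keyword: str) -> Tuple[int, int, int, int]:
    """Return (M, N, M1, N1) from manually reviewed records.

    Each record is (text, judged_correctly), where judged_correctly is the
    manual-review verdict on the system's judgment (hypothetical layout).
    """
    m = n = m1 = n1 = 0
    for text, judged_correctly in records:
        contains = keyword in text  # naive substring match (assumption)
        if judged_correctly:
            m += 1
            m1 += int(contains)
        else:
            n += 1
            n1 += int(contains)
    return m, n, m1, n1
```

For instance, for the reviewed records `[("gun violence video", True), ("water gun toy sale", False)]` and keyword `"gun"`, the function returns `(1, 1, 1, 1)`: the keyword contributes once to a correct judgment and once to a wrong one, which is exactly the ambiguity described for "gun".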

[0027] This application proposes a mechanism for optimizing and setting keyword weights based on classified samples. The mechanism is referred to as a keyword weight setting mechanism based on sample friction, which divides samples ...

Abstract

The invention provides a text filtering method based on keyword weights. The method comprises: calculating the weight of a keyword; and filtering text based on the calculated keyword weight. Calculating the keyword weight comprises: judging whether the keyword is a brand-new keyword; if it is not, calculating the number of correct judgment data and the number of wrong judgment data in the historical judgment data, as well as the number of correct judgment data and the number of wrong judgment data that contain the keyword; and calculating the weight of the keyword. The invention further provides a text filtering system based on keyword weights.
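The abstract does not state how keyword weights are combined when a text is filtered. A minimal sketch, assuming a hypothetical rule that sums the weights of every keyword found in the text and compares the sum with a hypothetical threshold of 1.0, maps each text onto the two categories the description uses, "OK" and "Needs filtering". The function name `classify_text` is illustrative.

```python
from typing import Dict


def classify_text(text: str, weights: Dict[str, float],
                  threshold: float = 1.0) -> str:
    """Hypothetical rule: sum the weights of matched keywords and compare
    the total with a threshold (both the rule and 1.0 are assumptions)."""
    # Naive substring matching stands in for real keyword matching.
    score = sum(weight for keyword, weight in weights.items() if keyword in text)
    return "Needs filtering" if score >= threshold else "OK"
```

With `weights = {"gun": 0.5, "illegal": 0.9}`, the text `"illegal gun sale"` scores 1.4 and is classified as "Needs filtering", while `"water gun toy"` scores 0.5 and is classified as "OK". A low-weight ambiguous keyword such as "gun" therefore only triggers filtering together with other evidence.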

Description

Technical Field

[0001] The present application relates to the fields of security and data services, and in particular to a text filtering method and system based on keyword weights.

Background

[0002] Text is the most widely disseminated type of content on the mobile Internet, including web pages, short messages, multimedia messages, instant messaging tools, etc. Filtering sensitive content (such as politics, pornography, gambling...) in transmitted text is an important Internet technology. In general, the system classifies text as "OK" or "Needs filtering."

[0003] In terms of information volume, the text data accessed by users on each link (10G) amounts to hundreds of millions of items per day, and to hundreds of billions across the entire network, while the proportion of information that needs to be filtered is very small, generally less than 1%, so it is difficult to accurately capture the information to be fi...
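The difficulty noted in [0003] follows from the low base rate. As a worked illustration using hypothetical detector figures (99% recall and a 1% false-positive rate, not values from this application), Bayes' rule gives the share of flagged texts that really need filtering when only 1% of texts do:

```python
def precision(prevalence: float, recall: float, false_positive_rate: float) -> float:
    """Fraction of flagged texts that truly need filtering (Bayes' rule)."""
    true_pos = prevalence * recall                          # correctly flagged share
    false_pos = (1.0 - prevalence) * false_positive_rate    # wrongly flagged share
    return true_pos / (true_pos + false_pos)
```

With these figures, `precision(0.01, 0.99, 0.01)` is about 0.5: roughly half of all flagged texts would be wrong judgments, which illustrates why reducing wrong judgments matters at this scale.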

Application Information

IPC(8): G06F17/27
Inventors: 粟栗 (Su Li), 张峰 (Zhang Feng), 付俊 (Fu Jun)
Owner CHINA MOBILE COMM GRP CO LTD