Text categorizing method, device and system

A text classification technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses the poor classification performance of existing text classification methods by reducing the feature space and improving classification accuracy.

Publication Date: 2013-08-14 (status: Inactive)
SHENZHEN SHI JI GUANG SU INFORMATION TECH
2 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the embodiments of the present invention is to provide a text classification method, which aims to s



Examples


Example Embodiment

[0017] In order to make the objectives, technical solutions and advantages of the present invention clearer, the following further describes the present invention in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0018] The embodiment of the present invention uses the part of speech of each word in the text as the feature for feature extraction and classification, which greatly reduces the feature space. A relatively complex and accurate classification model can therefore be selected in the classifier to classify the text, which improves classification accuracy.
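As a rough illustration of the idea in [0018], the sketch below maps a text to a fixed-length vector of part-of-speech frequencies. The tagset, the toy lexicon and the function names are hypothetical and are not taken from the patent; a real system would use a trained part-of-speech tagger. The point is only that the vector length equals the size of the tagset, not the size of the vocabulary.

```python
from collections import Counter

# Hypothetical tagset and toy lexicon, for illustration only.
# The patent excerpt does not specify a tagset or a tagger.
TAGSET = ["NOUN", "VERB", "ADJ", "ADV", "NUM", "OTHER"]
TOY_LEXICON = {
    "buy": "VERB", "click": "VERB", "is": "VERB", "read": "VERB",
    "cheap": "ADJ", "free": "ADJ", "good": "ADJ",
    "now": "ADV", "today": "ADV",
    "pills": "NOUN", "book": "NOUN", "weather": "NOUN",
}


def tag(word):
    """Look up a part-of-speech tag; unknown words fall back to OTHER."""
    if word.isdigit():
        return "NUM"
    return TOY_LEXICON.get(word.lower(), "OTHER")


def pos_features(text):
    """Return the relative frequency of each tag, in TAGSET order."""
    tags = [tag(w) for w in text.split()]
    counts = Counter(tags)
    total = len(tags) or 1  # avoid division by zero on empty text
    return [counts[t] / total for t in TAGSET]


vec = pos_features("buy cheap pills now")
print(len(vec), vec)
```

However large the vocabulary grows, the feature vector here keeps six dimensions, which is the reduction in feature space that the paragraph describes.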

[0019] Figure 1 shows the implementation process of the text classification method provided by the first embodiment of the present invention, which is detailed as follows:

[0020] In step S101, features of the text to be classifie...



Abstract

The invention is applicable to the technical field of Internet text classification and provides a text classification method, device and system. The method includes: extracting features of the text to be classified; and classifying the text according to those features to obtain normal text and junk text, wherein the features include the parts of speech of the words in the text to be classified. By using the part of speech of each word in the text as the feature for feature extraction and classification, the feature space is greatly reduced, a relatively complex and precise classification model can be selected in the classifier to classify the text, and classification accuracy is greatly improved.
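The two steps in the abstract (extract part-of-speech features, then classify into normal and junk text) can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the excerpt does not name a classification model, so a nearest-centroid classifier is swapped in purely for illustration, and the tagset, lexicon, training texts and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical tagset and toy lexicon; a real system would use a trained tagger.
TAGSET = ["NOUN", "VERB", "ADJ", "OTHER"]
TOY_LEXICON = {
    "buy": "VERB", "is": "VERB", "read": "VERB",
    "cheap": "ADJ", "free": "ADJ", "good": "ADJ",
    "pills": "NOUN", "prizes": "NOUN", "weather": "NOUN", "book": "NOUN",
}


def pos_vector(text):
    """Step 1: extract features as relative part-of-speech frequencies."""
    tags = [TOY_LEXICON.get(w.lower(), "OTHER") for w in text.split()]
    counts = Counter(tags)
    total = len(tags) or 1
    return [counts[t] / total for t in TAGSET]


def train(labeled_texts):
    """Compute one centroid per label (assumed model, not from the patent)."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(pos_vector(text))
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(model, text):
    """Step 2: assign the label whose centroid is nearest to the text's vector."""
    v = pos_vector(text)
    return min(model, key=lambda label: math.dist(v, model[label]))


# Made-up training examples, for illustration only.
model = train([
    ("buy cheap pills", "junk"),
    ("free cheap prizes", "junk"),
    ("the weather is good", "normal"),
    ("we read the book", "normal"),
])

print(classify(model, "free pills"))
print(classify(model, "we read the weather"))
```

Because the feature vector has only as many dimensions as there are tags, a more elaborate model could replace the centroid step without the training-data requirements growing with the vocabulary.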

Description

Technical field

[0001] The invention belongs to the technical field of Internet text classification, and in particular relates to a text classification method, device and system.

Background

[0002] The openness and interactivity of the Internet have brought about the problem of spam text. Some unscrupulous users publish large amounts of political, advertising and pornographic content through the Internet, which seriously endangers public network security. It is therefore necessary to classify the text uploaded by users in order to filter out junk text.

[0003] Existing text classification methods extract features based on words. Since any language has a large vocabulary, word-based feature extraction on the one hand suffers from a huge feature space, which limits the performance of the classifier. On the other hand, compared with the huge feature space, the number of texts that can be obtained for training is rel...


Application Information

IPC(8): G06F17/30
Inventor 何晓宁勇凤伟
Owner SHENZHEN SHI JI GUANG SU INFORMATION TECH