Text classification feature extraction method, classification method and device

A text classification and feature extraction technology, applied to text database clustering/classification, unstructured text data retrieval, and special data processing applications, that addresses problems such as the poor performance of classification operations.

Active Publication Date: 2014-07-02
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

When implementing the prior art, the inventor found that the classification method using the above method has relatively poor classification opera


Examples

Embodiment Construction

[0098] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0099] Figure 1 is a schematic flow chart of a text classification feature extraction method in an embodiment of the present invention. The method can be applied to various types of text application servers. The training set involved is a preset collection of multiple texts, referred to as the training set texts; the text type of each text in the training set is known, and the feature extraction of...

Abstract

An embodiment of the invention discloses a text classification feature extraction method, a classification method, and a device. The feature extraction method includes: obtaining a feature word set of the training set texts; determining a feature score for each feature word according to its relevance to the preset text categories; and recording the feature words whose scores are higher than a preset score threshold, thereby obtaining a text feature set of the training set texts. The method effectively reduces the number of feature words while still retaining feature words capable of expressing the text information, so that when texts are classified, classification time is reduced, computation time and space overhead are lowered, and computing cost is saved.
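The three steps in the abstract (collect feature words, score each word by its relevance to the preset categories, keep the words above a threshold) can be sketched as below. This is only an illustration under stated assumptions: the text shown here does not say how relevance is measured, so the chi-square statistic is used as a stand-in, and the function name `extract_text_features`, its parameters, and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def extract_text_features(training_set, score_threshold):
    """Return {feature_word: score} for words scoring above the threshold.

    training_set: list of (words, category) pairs, where `words` is the
    already-segmented word list of one training text (assumed layout).
    The relevance score is the chi-square statistic, taken as the maximum
    over categories; this measure is an assumption, not the patent's.
    """
    n_docs = len(training_set)
    cat_docs = Counter(cat for _, cat in training_set)
    word_docs = Counter()                  # texts containing each word
    word_cat_docs = defaultdict(Counter)   # same, split by category
    for words, cat in training_set:
        for w in set(words):
            word_docs[w] += 1
            word_cat_docs[w][cat] += 1

    features = {}
    for w, df in word_docs.items():
        best = 0.0
        for cat, n_cat in cat_docs.items():
            a = word_cat_docs[w][cat]   # texts in cat that contain w
            b = df - a                  # texts outside cat that contain w
            c = n_cat - a               # texts in cat without w
            d = n_docs - n_cat - b      # texts outside cat without w
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi2 = n_docs * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        # Record only words whose score is higher than the preset threshold.
        if best > score_threshold:
            features[w] = best
    return features
```

With a toy training set of two sports texts and two finance texts, a word that appears in both categories (for example "team") scores lower than a word confined to one category, so raising `score_threshold` shrinks the recorded feature set to the category-specific words.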

Description

Technical field

[0001] The invention relates to the field of text classification, in particular to a text classification feature extraction method, classification method and device.

Background technique

[0002] With the rapid development of Internet technology, the number of network texts has grown explosively. How to manage these texts effectively is a current hot issue. As a key technology for managing massive data, text classification has been widely used.

[0003] Current statistics-based text classification methods can classify new instance texts fairly well by learning from already-classified texts. To classify a new instance, the instance text must first be segmented to obtain a word set containing several words, and text classification processing is then performed based on all the words in the word set to complete the classification of the instance text. When implementing the prior art, the inventor found that the classification meth...
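The prior-art pipeline described in [0003] (segment the instance text into a word set, then classify using every word) might look roughly like the sketch below. The source does not name a specific classifier or segmenter, so two assumptions are made here: a multinomial naive Bayes model with add-one smoothing stands in for the statistics-based classifier, and whitespace splitting stands in for word segmentation. The names `segment`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def segment(text):
    # Whitespace splitting stands in for a real word segmenter (assumption).
    return text.lower().split()


def train(labeled_texts):
    """Learn per-category word counts from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, cat in labeled_texts:
        words = segment(text)
        word_counts[cat].update(words)
        class_counts[cat] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Score every category using all known words of the instance text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    words = [w for w in segment(text) if w in vocab]
    best_cat, best_score = None, -math.inf
    for cat in class_counts:
        total = sum(word_counts[cat].values())
        score = math.log(class_counts[cat] / total_docs)
        for w in words:
            # Add-one smoothing keeps unseen word/category pairs finite.
            score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat
```

Because the inner loop visits every word of the instance text for every category, the cost of one classification grows with the size of the word set, which is the overhead a smaller feature set is meant to cut.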

Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 李鑫, 张延祥
Owner: TENCENT TECH (SHENZHEN) CO LTD