Method and device for text classification, and method and device for characteristic processing of text classification

A text classification and processing method in the field of data processing. It addresses the problems of a large feature library and large memory occupation in machine learning, and reduces both the size of the feature library and the memory it occupies.

Status: Inactive; Publication date: 2013-08-14
ALIBABA GRP HLDG LTD
Cites: 1; Cited by: 17

AI Technical Summary

Problems solved by technology

[0006] The main purpose of this application is to provide a text classification method and device, and a text classification feature processing method and device, to solve the problem that the large feature library used in text classification leads to large memory usage during machine learning.



Embodiment Construction

[0021] To enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application shall fall within the protection scope of this application.

[0022] First, the text classification device in the embodiment of the present application is described. As shown in Figure 1, the text classification device includes a feature processing device 20, a training module 40 and a classification module 60.
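The three components named in [0022] can be pictured as a simple pipeline. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: the class names `FeatureProcessor`, `TrainingModule` and `ClassificationModule` are hypothetical, and the multinomial naive Bayes learner is an assumption, since this excerpt does not say which learning algorithm the training module uses. The feature processing device is reduced to a filter over an already-selected vocabulary of learning features.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for the components in [0022]: feature processing
# device 20, training module 40 and classification module 60.


class FeatureProcessor:
    """Feature processing device (20): keeps only the selected learning features."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def transform(self, tokens):
        # Drop every token that is not one of the selected learning features.
        return [t for t in tokens if t in self.vocabulary]


class TrainingModule:
    """Training module (40): a multinomial naive Bayes learner (assumed, not from the source)."""

    def fit(self, docs, labels, processor):
        self.vocab = processor.vocabulary
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(processor.transform(tokens))
        return self


class ClassificationModule:
    """Classification module (60): picks the class with the highest log-probability."""

    def __init__(self, model, processor):
        self.model = model
        self.processor = processor

    def classify(self, tokens):
        total_docs = sum(self.model.class_counts.values())
        v = len(self.model.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.model.class_counts.items():
            counts = self.model.word_counts[label]
            denom = sum(counts.values()) + v  # Laplace (add-one) smoothing
            score = math.log(n_docs / total_docs)
            for t in self.processor.transform(tokens):
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage: only the four selected features reach the learner.
docs = [["cheap", "phone", "sale"], ["phone", "battery", "review"],
        ["cheap", "sale", "discount"], ["battery", "review", "screen"]]
labels = ["ad", "tech", "ad", "tech"]
processor = FeatureProcessor({"cheap", "sale", "battery", "review"})
model = TrainingModule().fit(docs, labels, processor)
clf = ClassificationModule(model, processor)
print(clf.classify(["cheap", "sale", "today"]))  # ad
```

Because the feature processor sits in front of both training and classification, shrinking its vocabulary directly shrinks the per-class count tables the learner keeps in memory, which is the effect the application targets.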

[0023] Before the machine learning task of text classification, a certain amount of learning materials must be pro...


Abstract

The invention discloses a text classification method and device, and a feature processing method and device for text classification. The feature processing method includes: obtaining a feature set of the learning materials used for text classification; calculating, for each feature word, the sum of its information gain values over all classification types; and extracting a preset number of feature words from the feature set as the learning features for text classification, such that the learning features are a subset of the feature words remaining in the feature set after stop words are removed, and the information gain sums of the extracted feature words are larger than those of the non-extracted feature words. Applying the method and device to feature extraction for text classification effectively prevents noise features from being introduced into the machine learning process, which improves text classification accuracy, greatly reduces the scale of the feature library, and reduces memory usage.
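The feature-selection step in the abstract can be sketched as follows. The abstract does not give the information gain formula, so this sketch assumes the common per-class definition: for a word t and a class c, treat "t occurs in the document" and "the document belongs to c" as binary events and compute IG(t, c) = sum over the four combinations of P(T, C) * log(P(T, C) / (P(T) * P(C))). A word's score is the sum of IG(t, c) over all classes. Stop words are removed before ranking and the top `n_features` words by summed IG are kept, so every extracted word scores at least as high as every non-extracted one. The function names, the document-presence counting and the alphabetical tie-breaking are hypothetical choices.

```python
import math


def _mi_term(n_joint, n_term, n_class, n_total):
    """One summand P(T,C) * log(P(T,C) / (P(T) * P(C))), computed from document counts."""
    if n_joint == 0:
        return 0.0  # 0 * log(0) is taken as 0
    return (n_joint / n_total) * math.log(n_joint * n_total / (n_term * n_class))


def information_gain(word, cls, doc_sets, labels):
    """IG between the binary events 'word occurs in the document' and 'document is in cls'."""
    n = len(doc_sets)
    has_word = [word in d for d in doc_sets]
    in_cls = [label == cls for label in labels]
    ig = 0.0
    for w_val in (True, False):
        n_w = sum(1 for h in has_word if h == w_val)
        for c_val in (True, False):
            n_c = sum(1 for c in in_cls if c == c_val)
            n_wc = sum(1 for h, c in zip(has_word, in_cls) if h == w_val and c == c_val)
            ig += _mi_term(n_wc, n_w, n_c, n)
    return ig


def select_features(docs, labels, stop_words, n_features):
    """Return the n_features non-stop words with the largest IG summed over all classes."""
    doc_sets = [set(d) for d in docs]
    classes = sorted(set(labels))
    # Candidate features: every word in the learning materials except stop words.
    vocabulary = set().union(*doc_sets) - set(stop_words)
    ig_sum = {
        w: sum(information_gain(w, c, doc_sets, labels) for c in classes)
        for w in vocabulary
    }
    # Highest summed IG first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(vocabulary, key=lambda w: (-ig_sum[w], w))
    return ranked[:n_features], ig_sum


docs = [
    ["the", "cheap", "phone", "sale"],
    ["the", "phone", "battery", "review"],
    ["a", "cheap", "sale", "discount"],
    ["a", "battery", "review", "screen"],
]
labels = ["ad", "tech", "ad", "tech"]
features, scores = select_features(docs, labels, stop_words={"the", "a"}, n_features=4)
print(features)  # ['battery', 'cheap', 'review', 'sale']
```

In this toy corpus `phone` occurs equally often in both classes, so its summed IG is zero and it is left out, while the stop words `the` and `a` never enter the ranking at all. With only two classes, IG(t, c) is the same for a class and its complement, so the sum simply doubles the per-class value; summing starts to change the ranking once there are three or more classes.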

Description

Technical field

[0001] The present application relates to the field of data processing, and in particular to a text classification method and device, and a text classification feature processing method and device.

Background art

[0002] Machine learning algorithms rely on extracting effective feature data to achieve good learning results. Extracting effective features while avoiding interference from noise features is an important way to improve machine learning performance.

[0003] At present, when the learning features for machine learning are obtained, all words are often used as features. This makes the feature library huge, so machine learning occupies a large amount of memory; the library is also mixed with many noise features, and the text classification effect is poor.

[0004] To remove noise features, the words remaining after stop words are deleted are used as features, but this eliminates noise features only to a certain extent, and the fe...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30; G06F17/27
Inventor: 许文奇 (Xu Wenqi)
Owner: ALIBABA GRP HLDG LTD