Attribute weighting method based on information gain ratios and text classification methods

An attribute weighting technique based on the information gain ratio, applied to text classification in the field of artificial intelligence data mining. It addresses the high time complexity of existing attribute weighting methods and their inapplicability to high-dimensional text data.

Status: Inactive. Publication date: 2015-07-29
CHINA UNIV OF GEOSCIENCES (WUHAN)
Cites: 1. Cited by: 8.

AI Technical Summary

Problems solved by technology

The heuristic search used by the CFS attribute weighting method has a high time complexity, which makes it unsuitable for high-dimensional text data, where the number of attributes can exceed ten thousand.

Examples

Detailed Description of the Embodiments

[0056] The present invention is further described below with reference to an embodiment.

[0057] The present invention provides an attribute weighting method based on the information gain ratio, comprising the following steps:

[0058] (1) For a known training document set D, any document d in D is represented as a word vector $d = \langle w_1, w_2, \ldots, w_m \rangle$, where $w_i$ is the i-th word in document d and m is the number of words in document d;

[0059] The information gain ratio of each attribute in the training document set D is calculated with the following formula:

[0060] $\mathrm{GainRatio}(D, w_i) = \mathrm{Gain}(D, w_i) \,/\, S\ldots$
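
The calculation in [0059] can be sketched as follows. Because the formula above is truncated in this copy, the sketch assumes the standard C4.5-style definition (information gain divided by split information) and treats each word as a binary attribute that is either present in or absent from a document. The function names `entropy` and `gain_ratio` and the present/absent binarization are illustrative assumptions, not taken from the patent text.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(docs, labels, word):
    """Gain ratio of `word`, treated as a binary attribute (present / absent).

    Assumption: gain ratio = information gain / split information.
    `docs` is a list of token lists and `labels` the matching class labels.
    """
    # Partition the class labels by whether the word occurs in the document.
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)

    # Entropy of the class distribution before and after the split.
    base = entropy(list(Counter(labels).values()))
    cond = sum(
        len(part) / n * entropy(list(Counter(part).values()))
        for part in (present, absent)
        if part
    )

    # Split information: entropy of the present/absent partition itself.
    split_info = entropy([len(present), len(absent)])

    # A word found in every document (or in none) gives no usable split.
    return (base - cond) / split_info if split_info > 0 else 0.0
```

Each call makes a single pass over the training documents, so scoring every word costs on the order of the vocabulary size times the number of documents, with no search over attribute subsets.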

Abstract

The invention provides an attribute weighting method based on information gain ratios. First, the information gain ratio of each attribute is calculated; the weight of each attribute is then calculated from these information gain ratios. The invention also provides three text classification methods that rely on this attribute weighting method: a multinomial naive Bayes method, a complement naive Bayes method, and a combined multinomial and complement naive Bayes method. The invention improves the classification accuracy of the original naive Bayes text classifiers while retaining the simplicity and time complexity of the original naive Bayes algorithm.
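
As a rough illustration of how gain-ratio weights might enter a multinomial naive Bayes classifier, the sketch below scales each word's log-likelihood term by its weight. The abstract does not state the weight formula or the weighted decision rule, so both are assumptions here: weights are normalized so that they average to 1, and the score is the log prior plus the weighted sum of word frequency times log word probability. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def weights_from_gain_ratios(gain_ratios):
    """Scale gain ratios so the weights average to 1 (assumed normalization)."""
    total = sum(gain_ratios.values())
    m = len(gain_ratios)
    if total == 0:
        return {w: 1.0 for w in gain_ratios}
    return {w: g * m / total for w, g in gain_ratios.items()}


def train_mnb(docs, labels):
    """Laplace-smoothed log priors and log word probabilities per class."""
    vocab = sorted({w for d in docs for w in d})
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    log_prior = {
        c: math.log((n + 1) / (len(docs) + len(class_docs)))
        for c, n in class_docs.items()
    }
    log_cond = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        log_cond[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_cond


def classify_weighted(doc, log_prior, log_cond, weights):
    """Pick the class with the highest weighted log score (assumed rule)."""
    freqs = Counter(doc)

    def score(c):
        # Words without a weight fall back to 1.0; unseen words are skipped.
        return log_prior[c] + sum(
            weights.get(w, 1.0) * f * log_cond[c][w]
            for w, f in freqs.items()
            if w in log_cond[c]
        )

    return max(log_prior, key=score)
```

With every weight equal to 1.0 the score reduces to ordinary multinomial naive Bayes, and the weighting adds only one multiplication per word at classification time.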

Description

Technical field

[0001] The invention relates to an attribute weighting method based on the information gain ratio and to a text classification method, and belongs to the technical field of data mining and classification in artificial intelligence.

Background

[0002] The naive Bayes text classifier is widely used for text classification because it is simple and efficient, but the attribute independence assumption that makes it efficient also limits its classification performance to some extent. Given a document d represented as a word vector $\langle w_1, w_2, \ldots, w_m \rangle$, multinomial naive Bayes (MNB), complement naive Bayes (CNB) and the combined model of the two (OVA) classify document d using Equations 1, 2 and 3, respectively.

[0003] $c(d) = \arg\max_{c \in C} [\,\ldots$
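
The three background classifiers can be compared side by side in a short sketch. Equations 1 to 3 are truncated in this copy, so the scoring rules below are the commonly cited forms and should be read as assumptions: MNB scores a class by its log prior plus frequency-weighted log word probabilities, CNB scores it by the negated equivalent computed on all other classes, and OVA adds the two. The Laplace smoothing and all function names are likewise illustrative.

```python
import math
from collections import Counter


def fit_counts(docs, labels):
    """Collect the vocabulary, per-class document counts and word counts."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = Counter(labels)
    n_words = {c: Counter() for c in n_docs}
    for d, y in zip(docs, labels):
        n_words[y].update(d)
    return vocab, n_docs, n_words


def log_estimates(vocab, n_docs, n_words, classes):
    """Laplace-smoothed log prior and log word probabilities pooled over `classes`."""
    pooled = Counter()
    for c in classes:
        pooled.update(n_words[c])
    docs_in = sum(n_docs[c] for c in classes)
    log_prior = math.log((docs_in + 1) / (sum(n_docs.values()) + len(n_docs)))
    total = sum(pooled.values())
    log_cond = {w: math.log((pooled[w] + 1) / (total + len(vocab))) for w in vocab}
    return log_prior, log_cond


def classify(doc, vocab, n_docs, n_words, model="MNB"):
    """Return the best class under the assumed MNB, CNB or OVA scoring rule."""
    known = set(vocab)
    freqs = Counter(w for w in doc if w in known)  # out-of-vocabulary words are ignored
    scores = {}
    for c in n_docs:
        others = [k for k in n_docs if k != c]
        prior_c, cond_c = log_estimates(vocab, n_docs, n_words, [c])
        prior_not, cond_not = log_estimates(vocab, n_docs, n_words, others)
        # MNB: evidence for the class itself.
        mnb = prior_c + sum(f * cond_c[w] for w, f in freqs.items())
        # CNB: negated evidence for the complement of the class.
        cnb = -prior_not - sum(f * cond_not[w] for w, f in freqs.items())
        scores[c] = {"MNB": mnb, "CNB": cnb, "OVA": mnb + cnb}[model]
    return max(scores, key=scores.get)
```

The sketch recomputes the estimates for every class on each call to keep it short; a practical implementation would compute them once at training time.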

Application Information

IPC(8): G06F17/30, G06K9/62
Inventors: Zhang Lungan (张伦干), Jiang Liangxiao (蒋良孝), Li Chaoqun (李超群)
Owner CHINA UNIV OF GEOSCIENCES (WUHAN)