Feature selection method based on covariance metric factor

A feature selection method based on a covariance metric factor, applicable to special-purpose data processing, instruments, and electrical digital data processing. It addresses the drop in classifier performance caused by an excessively large feature space dimension and provides a reliable feature selection algorithm.

Pending Publication Date: 2021-12-14
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

If the feature space dimension is too large, the classification performance of the classifier is reduced.


Embodiment Construction

[0047] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0048] The present invention provides a feature selection method based on a covariance measurement factor. As shown in Figure 1, the specific steps are as follows:

[0049] Step 1. Select data sets of different text types and preprocess them: perform word segmentation and remove stop words. Represent the text data with the vector space model, and remove feature words that appear in more than 25% of all documents or in fewer than 3 documents. Split each data set at a ratio of 9:1, that is, randomly select 90% of the samples as the training set and use the remaining 10% as the test set.
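The filtering and splitting rules in Step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names `filter_vocabulary` and `split_train_test`, the parameter names, and the fixed random seed are assumptions introduced here, and the input is assumed to be already segmented with stop words removed.

```python
import random
from collections import Counter


def filter_vocabulary(tokenized_docs, max_df_ratio=0.25, min_df=3):
    """Return the terms kept after the document-frequency filter.

    A term is removed if it appears in more than max_df_ratio of all
    documents or in fewer than min_df documents (hypothetical names).
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # Count each term once per document.
        doc_freq.update(set(tokens))
    return {
        term
        for term, count in doc_freq.items()
        if min_df <= count <= max_df_ratio * n_docs
    }


def split_train_test(samples, train_ratio=0.9, seed=0):
    """Randomly split samples into training and test parts (9:1 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```

The seed only makes the random split reproducible; the excerpt does not say whether the split is stratified by category, so this sketch does not stratify.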

[0050] Step 2. Set the size of the optimal feature subset to C, use the feature sorting function to calculate t...


Abstract

A feature selection method based on a covariance measurement factor. Building on the original triangular comparison measurement (TCM) algorithm, the method introduces a covariance measurement factor and further measures the correlation between features and categories at the document-frequency level by calculating the covariance of feature words and categories. To verify performance, a naive Bayes classifier is used for classification, and macro-F1 and micro-F1 are used to evaluate the classification results. The method better screens out feature words that are highly related to the categories, is a reliable feature selection algorithm, and improves classification accuracy and efficiency.
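To make the idea of a document-frequency-level covariance concrete, here is a minimal Python sketch. It is an assumption, not the patent's formula: the excerpt does not give the exact covariance measurement factor or how it is combined with TCM, so the code uses the standard covariance of two binary indicators (term present in a document, document belongs to a category), Cov(X, Y) = E[XY] - E[X]E[Y]. The function names and the rule of ranking each term by its maximum score over categories are hypothetical.

```python
from collections import defaultdict


def covariance_scores(tokenized_docs, labels):
    """Return {(term, category): covariance of the two binary indicators}."""
    n_docs = len(tokenized_docs)
    class_count = defaultdict(int)  # documents per category
    term_count = defaultdict(int)   # documents containing the term
    joint_count = defaultdict(int)  # documents in category containing the term
    for tokens, label in zip(tokenized_docs, labels):
        class_count[label] += 1
        for term in set(tokens):
            term_count[term] += 1
            joint_count[(term, label)] += 1

    scores = {}
    for term, t_count in term_count.items():
        for label, c_count in class_count.items():
            joint = joint_count.get((term, label), 0)
            # E[XY] - E[X] * E[Y], estimated from document frequencies.
            scores[(term, label)] = (
                joint / n_docs - (t_count / n_docs) * (c_count / n_docs)
            )
    return scores


def top_features(scores, size):
    """Rank terms by their best covariance over categories (assumed rule)."""
    best = {}
    for (term, _label), score in scores.items():
        best[term] = max(best.get(term, float("-inf")), score)
    return sorted(best, key=lambda term: (-best[term], term))[:size]
```

A positive score means the term appears in the category more often than independence would predict; a score near zero means the term carries little information about that category.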

Description

Technical Field

[0001] The invention belongs to the technical field of text classification methods, and in particular relates to a feature selection method based on a covariance measurement factor.

Background Art

[0002] With the wide application of big data technology, a large amount of unstructured text information emerges on the World Wide Web and is stored and processed by computers, for example user comments on music and video software, user feedback and purchase records on e-commerce platforms, and articles, comments, etc. Processing such large volumes of unstructured text data requires technologies such as data mining and natural language processing. Among these, text classification is widely used: text data is divided into different categories through model learning, which facilitates further data processing. Text data often consists of tens of thousands of feature words, including a large number of irrelevant and redundant features, which have a negative impact on class...


Application Information

IPC(8): G06F16/35, G06F40/216, G06F40/289, G06K9/62
CPC: G06F16/35, G06F40/216, G06F40/289, G06F18/2113, G06F18/213, G06F18/24155
Inventor: 周红芳, 李想, 王晨光, 连延彬
Owner: XIAN UNIV OF TECH