Feature selection method based on entry distribution

A feature selection method based on entry distribution, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as low classification accuracy, the influence of word frequency on feature selection, and insufficient consideration of the interaction between categories in the data set, and achieves the effect of improving classification accuracy.

Status: Inactive
Publication Date: 2015-07-22
XIAN UNIV OF TECH
Cites: 3, Cited by: 2
AI Technical Summary

Problems solved by technology

However, word frequency also has a strong influence on feature selection during classification. The t-test method was therefore proposed. Although that method is also based on word frequency, its classification accuracy is not high because it does not adequately consider the interaction between categories in the data set.

Examples

Embodiment Construction

[0027] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0028] The core idea of the present invention is based on word frequency. First, the texts in the data set are segmented into individual entries, and the class average word frequency and the overall average word frequency of each entry are calculated. Then, the class average word frequency and the overall average word frequency are used to calculate the weight value of each entry. Finally, the entries are sorted by weight value in descending order, and the first m entries are selected as the feature vector for text classification.
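The flow described in [0028] can be sketched in Python as follows. This is a minimal illustration, not the patented formulas: the excerpt does not define how the averages or the weight value are computed, so the sketch assumes that an average word frequency is the mean number of occurrences of an entry per document, and it uses a hypothetical `placeholder_weight` function (the largest gap between any class average and the overall average) purely as a stand-in. The function names, the input layout (pre-segmented token lists plus labels), and the tie-breaking rule are all assumptions.

```python
from collections import Counter, defaultdict


def average_frequencies(docs, labels):
    """Return (class_avg, overall_avg) for pre-segmented documents.

    ASSUMPTION: "average word frequency" is taken to mean the mean
    count of an entry per document; the source excerpt does not say.
    class_avg[c][t] -> mean count of entry t per document of class c
    overall_avg[t]  -> mean count of entry t per document overall
    """
    class_totals = defaultdict(Counter)
    class_sizes = Counter(labels)
    overall_totals = Counter()
    for tokens, label in zip(docs, labels):
        counts = Counter(tokens)
        class_totals[label].update(counts)
        overall_totals.update(counts)
    class_avg = {
        c: {t: n / class_sizes[c] for t, n in totals.items()}
        for c, totals in class_totals.items()
    }
    overall_avg = {t: n / len(docs) for t, n in overall_totals.items()}
    return class_avg, overall_avg


def placeholder_weight(entry, class_avg, overall_avg):
    """HYPOTHETICAL weight, not the patent's formula: the largest
    absolute gap between a class average and the overall average."""
    return max(
        abs(avg.get(entry, 0.0) - overall_avg[entry])
        for avg in class_avg.values()
    )


def select_top_m(docs, labels, m, weight_fn=placeholder_weight):
    """Rank entries by weight (descending) and keep the first m."""
    class_avg, overall_avg = average_frequencies(docs, labels)
    weights = {t: weight_fn(t, class_avg, overall_avg) for t in overall_avg}
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:m]


docs = [
    ["ball", "goal", "ball"],
    ["goal", "team"],
    ["stock", "market"],
    ["stock", "stock", "bank"],
]
labels = ["sport", "sport", "finance", "finance"]
print(select_top_m(docs, labels, m=3))
```

Because the weight is passed in as `weight_fn`, the actual weighting scheme from the full patent text could be substituted without changing the counting or ranking steps.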

[0029] The present invention is a feature selection method based on entry distribution which, as shown in Figure 1, includes the following steps:

[0030] Step 1, collecting several texts of different categories to form a data set;

[0031] Step 2, preprocessing all the texts in the data set to obtain several entries, denoted as t_i, i ...

Abstract

The invention discloses a feature selection method based on entry distribution. The method includes the following steps: a plurality of texts of different classes are collected to form a data set; all the texts in the data set are preprocessed to obtain a plurality of entries; the class average word frequency of each entry in the corresponding class is calculated; the overall average word frequency of each entry in the whole data set is calculated; the weight value of each entry is calculated from the class average word frequency and the overall average word frequency of that entry; all the entries are sorted in descending order of weight value; and the top 5-30 percent of the entries in the data set are selected as the feature words used for classifying the texts. The method takes word frequency as its basis and can improve classification precision.
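The final selection step in the abstract, keeping the top 5-30 percent of the ranked entries, might look like the following Python sketch. It assumes the weight values have already been computed and are held in a dictionary; the function name `select_top_fraction`, the rounding rule (floor, with at least one entry kept), and the alphabetical tie-break are illustrative assumptions that the source does not specify.

```python
import math


def select_top_fraction(weights, fraction=0.10):
    """Keep the top `fraction` of entries, ranked by weight descending.

    `weights` maps each entry to its precomputed weight value.
    ASSUMPTIONS: the cut-off count is floor(n * fraction), at least 1,
    and ties are broken alphabetically; neither is given in the source.
    """
    if not 0.05 <= fraction <= 0.30:
        raise ValueError("fraction should lie in the 5-30 percent range")
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    if not ranked:
        return []
    keep = max(1, math.floor(len(ranked) * fraction))
    return ranked[:keep]


# Toy weights for 20 entries; larger index means larger weight.
toy_weights = {f"t{i:02d}": float(i) for i in range(20)}
print(select_top_fraction(toy_weights, fraction=0.10))
```

The selected entries would then serve as the feature words for whichever classifier is trained on the data set.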

Description

Technical field

[0001] The invention belongs to the technical field of data mining and relates to a feature selection method based on entry distribution.

Background

[0002] With the rapid growth of network information, the number of electronic texts has increased dramatically, and how to organize these resources effectively has attracted the attention of more and more researchers. Text classification is the key technology for solving this problem. Text classification uses a set of labeled texts to construct a classifier, and then uses that classifier to assign unlabeled texts automatically to predefined categories. This technology is widely used in Web text classification, information retrieval, mail filtering, spam SMS filtering and other fields.

[0003] At present, there are a large number of classification algorithms, such as decision trees, k-Nearest Neighbors (kNN), Support Vector Machine (SVM) and so on...

Application Information

IPC(8): G06F17/30
Inventor: 周红芳, 郭杰, 段文聪, 王心怡, 何馨依, 刘杰, 李锦
Owner: XIAN UNIV OF TECH