Feature selection method based on rough set and swarm intelligence

A feature selection method based on rough set technology, applied in special data processing applications, instruments, electrical digital data processing, etc., which addresses problems such as high computational complexity, long running time, and the inability to guarantee an optimal feature subset.

Status: Inactive. Publication date: 2018-11-23
JINGCHU UNIV OF TECH
Cites: 0. Cited by: 3.

AI Technical Summary

Problems solved by technology

For example, when a bank builds a personal credit scoring system, it must screen the indicators used for personal credit scoring. This raises several problems: first, how to select the necessary indicators; then screening speed and other issues.
[0008] Of the three types of methods, only the exhaustive method can guarantee the optimal feature subset. However, it must find all feature subsets that meet the requirements, which has high computational complexity and takes a great deal of time, so it is not suitable for large data sets. The heuristic method is simple, fast and efficient, but because complete heuristic information is lacking, it cannot guarantee finding the optimal feature subset. Although the random method can provide a better feature selection solution, it is very time-consuming, requires a large amount of computation, and cannot guarantee that the optimal feature subset is obtained every time.
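To make the trade-off in [0008] concrete, here is a minimal Python sketch that is not taken from the patent; every function name and the toy decision table are hypothetical. It uses the standard rough-set dependency degree γ_B(D) = |POS_B(D)| / |U| (the fraction of objects whose B-indiscernibility class has a single decision value) and compares an exhaustive search, which returns a smallest subset with full dependency but may examine all 2^n subsets, with a greedy QuickReduct-style heuristic of the kind cited in [0157]. In the toy table the decision is the XOR of features 0 and 1, so the greedy step is drawn to the partially informative feature 2 first.

```python
from itertools import combinations

def partition(rows, attrs):
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(classes.values())

def dependency(rows, labels, attrs):
    """gamma_attrs(D) = |POS_attrs(D)| / |U|: share of rows in decision-consistent classes."""
    pos = sum(len(c) for c in partition(rows, attrs)
              if len({labels[i] for i in c}) == 1)
    return pos / len(rows)

def exhaustive_reduct(rows, labels, n):
    """Try subsets in order of size; the first one matching the full dependency is minimal."""
    full = dependency(rows, labels, range(n))
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if dependency(rows, labels, subset) == full:
                return list(subset)

def quick_reduct(rows, labels, n):
    """Greedy QuickReduct-style search: add the feature that raises dependency most."""
    full = dependency(rows, labels, range(n))
    reduct = []
    while dependency(rows, labels, reduct) < full:
        best, best_gamma = None, -1.0
        for a in range(n):
            if a not in reduct:
                gamma = dependency(rows, labels, reduct + [a])
                if gamma > best_gamma:  # ties keep the lower-indexed feature
                    best, best_gamma = a, gamma
        reduct.append(best)
    return reduct

# Hypothetical toy decision table: the decision is a0 XOR a1.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
LABELS = [0, 1, 1, 0, 0, 0]

print(exhaustive_reduct(ROWS, LABELS, 3))  # [0, 1]
print(quick_reduct(ROWS, LABELS, 3))       # [2, 0, 1]
```

The exhaustive search finds the two-feature subset [0, 1], while the greedy search ends with the valid but non-minimal subset [2, 0, 1], which is the weakness of heuristic methods noted above.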



Examples


Embodiment 1

[0084] Embodiment 1 applies the present invention to feature selection tests on 12 low-dimensional discrete data sets.

[0085] Experimental environment and data:

[0086] Hardware environment: Intel Core i5-3470 at 3.20 GHz, 8.0 GB memory, 1 TB hard disk.

[0087] Software environment: MATLAB R2017a, 64-bit Windows 7 operating system.

[0088] Experimental data: 12 discrete data sets (see Table 1) from the UCI Machine Learning Repository (UCI Machine Learning Repository [DB/OL]. http://archive.ics.uci.edu/ml/datasets.html) were selected as test data sets.

[0089] Table 1: 12 test data sets (low-dimensional)

[0090]

[0091] Table 1 lists, for each data set, its name, the number of instances, the number of conditional features, the feature evaporation rate, the missing-data ratio, and the use of the Weka 3 tool (Data Mining Software in Java [EB/OL]. http://www.cs.waikato.ac.nz/ml/weka/index.html) to complete the specific metho...

Embodiment 2

[0146] Embodiment 2 applies the present invention to feature selection tests on 8 high-dimensional discrete data sets.

[0147] Experimental environment and data:

[0148] Hardware environment: Intel Core i5-3470 at 3.20 GHz, 8.0 GB memory, 1 TB hard disk.

[0149] Software environment: MATLAB R2017a, 64-bit Windows 7 operating system.

[0150] Experimental data: 8 discrete data sets (see Table 6) from the UCI Machine Learning Repository were selected as test data sets.

[0151] Table 6: 8 test data sets (high-dimensional)

[0152]

[0153]

[0154] In Table 6, five data sets (Arrhythmia, Hill, Musk1, Musk2, and Semeion) have fewer than 300 conditional features, while the remaining three (Isolet, Micromass, and Secom) have more, all above 500; Micromass in particular has 1300 features. In addition, there are 3 large-scale data sets, and the product of the number of conditional features...

Embodiment 3

[0156] Embodiment 3: performance test of this method (RPA) on the 12 low-dimensional data sets

[0157] To examine the performance of this method (RPA), 20 tests were carried out on each data set of Embodiment 1, and four feature selection methods (the heuristic methods QUICKREDUCT (A. Chouchoulas, Q. Shen. Rough Set-Aided Keyword Reduction for Text Categorization[J]. Applied Artificial Intelligence, 2001, 15(9): 843-873) and MIBARK (Miao Duoqian, Hu Guirong. A heuristic algorithm for knowledge reduction[J]. Computer Research and Development, 1999, 36(6): 681-684), and the swarm-intelligence-based hybrid methods IDS (C.S. Bae, W.C. Yeh, Y.Y. Chung, S.L. Liu. Feature Selection with Intelligent Dynamic Swarm and Rough Set[J]. Expert Systems with Applications, 2010, 37(10): 7026-7032) and NDABC (Y.R. Hu, L.X. Ding, D.T. Xie, S.W. Wang. A Novel Discrete Artificial Bee Colony Algorithm for Rough Set-Based Feature Selection[J]. International Journal of Advancements in Computing Technology, 2012, 4(6): 295-3...
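MIBARK, one of the baselines cited above, is a heuristic reduction algorithm built on information measures. The Python sketch below only illustrates that general idea; it is not a reproduction of the published MIBARK algorithm (which may, for example, start from the core or break ties differently), and the function names and toy table are hypothetical. It greedily adds the feature that maximizes the mutual information I(B; D) = H(D) - H(D | B) between the selected subset B and the decision D, and stops once I(B; D) reaches I(C; D) for the full feature set C.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of decision values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def conditional_entropy(rows, labels, attrs):
    """H(D | attrs): decision entropy inside each class, weighted by class size."""
    groups = {}
    for row, d in zip(rows, labels):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(d)
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(rows, labels, attrs):
    """I(attrs; D) = H(D) - H(D | attrs)."""
    return entropy(labels) - conditional_entropy(rows, labels, attrs)

def mi_forward_selection(rows, labels, n, tol=1e-12):
    """Greedy forward selection until the subset carries as much information as all features."""
    target = mutual_information(rows, labels, range(n))
    selected = []
    while mutual_information(rows, labels, selected) < target - tol:
        remaining = [a for a in range(n) if a not in selected]
        # max() returns the first maximal feature, so ties go to the lower index.
        best = max(remaining,
                   key=lambda a: mutual_information(rows, labels, selected + [a]))
        selected.append(best)
    return selected

# Hypothetical toy table: decision = a0 OR a1; feature 2 is noise.
ROWS = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
LABELS = [0, 1, 1, 1, 0, 1]

print(mi_forward_selection(ROWS, LABELS, 3))  # [0, 1]
```

Feature 2 has zero mutual information with the decision in this table, so it is never selected and the search ends with [0, 1].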



Abstract

The invention belongs to the technical field of data processing and analysis, in particular business data processing and analysis, and specifically relates to a feature selection method based on rough sets and swarm intelligence. The method comprises the following steps: 1, setting the parameters of the method; 2, calculating the feature kernel using rough set and mutual information knowledge; and optionally one or more of the following steps: 3, initializing a population; 4, calculating the fitness values of feasible solutions, the individual extremum Pbest and the global extremum Gbest; 5, performing iteration; and 6, outputting the optimal feature subset REDU. The method is computationally simple and converges quickly (not all feature subsets need to be computed); it can process large data sets and obtain the optimal feature subset without falling into a local optimum, thereby achieving the feature selection goal of removing noise and obtaining the optimal feature subset. With the bank personal credit scoring indicator screening method provided by the invention, a simplified personal credit scoring indicator system can be obtained quickly and accurately.
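The steps listed in the abstract (parameters, feature kernel, population, fitness with Pbest and Gbest, iteration, output of REDU) resemble a binary particle swarm search seeded with the rough-set core. The text here does not give the update rules or the fitness function, so the following Python sketch fills them with common textbook choices, all of which are assumptions: the kernel is computed as the rough-set core from the dependency degree only (the mutual-information part of step 2 is omitted), the fitness is the hypothetical weighted sum α·γ_B(D) + (1 - α)·(n - |B|)/n, and positions use the sigmoid rule of binary PSO. Function names, parameter values, and the toy table are hypothetical.

```python
import math
import random

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of objects in decision-consistent classes."""
    classes = {}
    for row, d in zip(rows, labels):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(d)
    pos = sum(len(c) for c in classes.values() if len(set(c)) == 1)
    return pos / len(rows)

def core(rows, labels, n):
    """Core: features whose removal lowers the dependency of the full feature set."""
    full = dependency(rows, labels, range(n))
    return [a for a in range(n)
            if dependency(rows, labels, [b for b in range(n) if b != a]) < full]

def bpso_reduct(rows, labels, n, swarm=10, iters=30, alpha=0.9,
                w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Hypothetical binary PSO; step 1 (parameters) is the keyword arguments."""
    rng = random.Random(seed)
    kernel = set(core(rows, labels, n))  # step 2: feature kernel (core)

    def fitness(bits):
        # Assumed fitness: reward dependency, lightly reward dropping features.
        attrs = [j for j in range(n) if bits[j]]
        return alpha * dependency(rows, labels, attrs) + (1 - alpha) * (n - len(attrs)) / n

    def repair(bits):
        # Keep every core feature switched on in each candidate subset.
        return [1 if j in kernel else b for j, b in enumerate(bits)]

    # Step 3: random initial population (bit j = 1 means feature j is selected).
    xs = [repair([rng.randint(0, 1) for _ in range(n)]) for _ in range(swarm)]
    vs = [[0.0] * n for _ in range(swarm)]
    # Step 4: fitness, individual extremum (Pbest) and global extremum (Gbest).
    pbest = [x[:] for x in xs]
    pfit = [fitness(x) for x in xs]
    g = max(range(swarm), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]

    # Step 5: iterate velocity and position updates.
    for _ in range(iters):
        for i in range(swarm):
            for j in range(n):
                v = (w * vs[i][j]
                     + c1 * rng.random() * (pbest[i][j] - xs[i][j])
                     + c2 * rng.random() * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-vmax, min(vmax, v))  # clamp velocity
                prob_one = 1.0 / (1.0 + math.exp(-vs[i][j]))  # sigmoid transfer
                xs[i][j] = 1 if rng.random() < prob_one else 0
            xs[i] = repair(xs[i])
            f = fitness(xs[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = xs[i][:], f
                if f > gfit:
                    gbest, gfit = xs[i][:], f
    # Step 6: output the selected feature indices (REDU).
    return [j for j in range(n) if gbest[j]]

# Hypothetical toy decision table: the decision is a0 XOR a1.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
LABELS = [0, 1, 1, 0, 0, 0]

print(core(ROWS, LABELS, 3))  # [0, 1]
print(bpso_reduct(ROWS, LABELS, 3))
```

Here the core is [0, 1], and since every particle is forced to contain it, the search should settle on [0, 1], the only candidate that scores higher than the full feature set. Forcing the core into each particle is one plausible reading of why step 2 precedes population initialization: it shrinks the search space to subsets that contain the core.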

Description

Technical field
[0001] The invention belongs to the technical field of data processing and analysis, in particular commercial data processing and analysis, and specifically relates to a feature selection method based on rough sets and swarm intelligence.

Background art
[0002] With the new wave of science and technology and the growing popularity of computer and Internet technology, the era of big data has quietly arrived. Big data is becoming an important strategic resource, and analyzing and mining it is extremely important. In data mining, the dimensionality of the features describing the data keeps increasing. However, most of these features may be irrelevant to the mining task or redundant with one another, which increases the time and space complexity of the learning algorithms used in data mining and degrades their results. This phenomenon is known as the "curse of dimensionality". In the face of the "curse of dimensionali...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06N3/00; G06N99/00; G06F17/30
CPC: G06N3/006
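Inventors: 胡玉荣 (Hu Yurong), 余晨阳 (Yu Chenyang), 余建国 (Yu Jianguo), 胡斌 (Hu Bin), 李祥琴 (Li Xiangqin), 李冉 (Li Ran), 田雯 (Tian Wen), 陆焱 (Lu Yan)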
Owner JINGCHU UNIV OF TECH