Feature extraction and feature selection methods for background multi-source data

A feature selection method for multi-source data, applied in data processing, character and pattern recognition, instruments, and related fields, which addresses problems such as the sensitivity of prediction results to small changes in features.

Inactive Publication Date: 2017-10-27
NANJING UNIV +1

AI Technical Summary

Problems solved by technology

For LR classifiers, small changes in features can have a large impact on the final prediction results.

Embodiment Construction

[0029] As shown in Figure 1, a dedicated feature extraction method is applied to each source of the background multi-source data. The Group Lasso method is then used to perform group feature selection on the extracted multi-source features, and a machine learning model is built on the selected feature groups to predict off-network users.
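As a rough illustration of group feature selection, the following minimal sketch runs proximal gradient descent with a block soft-thresholding step. It is not the patent's implementation: the squared loss, the toy design matrix, the function names `group_soft_threshold` and `group_lasso`, and the values of `lam` and `step` are all assumptions made for this example.

```python
import math

def group_soft_threshold(v, t):
    """Shrink vector v toward zero by t in Euclidean norm; zero it if ||v|| <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def group_lasso(X, y, groups, lam, step, n_iter=50):
    """Proximal gradient descent for the (assumed) squared-loss objective
        (1/2n) * ||y - X w||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2.

    X: list of rows, y: list of targets,
    groups: list of index lists, one per feature group (hypothetical layout).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals and gradient of the smooth least-squares term.
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [w[j] - step * grad[j] for j in range(p)]
        # Proximal step: each group is shrunk (or zeroed) as a single block.
        for g in groups:
            t = step * lam * math.sqrt(len(g))
            shrunk = group_soft_threshold([z[j] for j in g], t)
            for j, val in zip(g, shrunk):
                w[j] = val
    return w

# Toy design: four mutually orthogonal +/-1 columns (Walsh functions), 8 rows.
X = [[float((-1) ** bin(i & j).count("1")) for j in (1, 2, 3, 4)] for i in range(8)]
# The target depends only on the first feature group (columns 0 and 1).
y = [2.0 * row[0] - 1.0 * row[1] for row in X]
groups = [[0, 1], [2, 3]]  # two hypothetical "sources", two features each

w = group_lasso(X, y, groups, lam=0.1, step=1.0)
print(w)
```

In this toy setup the second group carries no signal, so its coefficients are set exactly to zero as a block, while the first group is kept and slightly shrunk. Dropping or keeping whole groups is the behavior that distinguishes Group Lasso from plain Lasso.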

[0030] As shown in Figure 2, the data from May 2013 to February 2014 are divided into a training set and a test set.
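A month-based split of this kind could be sketched as follows. The record layout `(user_id, date, value)`, the helper names, and in particular the cut-off month are hypothetical; the text does not state where the boundary between training and test data lies.

```python
from datetime import date

def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def split_by_month(records, first_test_month):
    """Split records into train/test by the (year, month) of their date.

    records: iterable of (user_id, date, value) tuples (hypothetical layout).
    first_test_month: (year, month); records on or after it go to the test set.
    """
    train, test = [], []
    for rec in records:
        key = (rec[1].year, rec[1].month)
        (test if key >= first_test_month else train).append(rec)
    return train, test

months = list(month_range((2013, 5), (2014, 2)))
# One toy record per month for a single hypothetical user.
records = [("u1", date(y, m, 1), 0.0) for (y, m) in months]
# The cut-off below is an arbitrary assumption, not taken from the patent.
train, test = split_by_month(records, first_test_month=(2013, 12))
print(len(months), len(train), len(test))
```

Splitting on calendar months rather than at random keeps all test observations later in time than the training observations, which avoids leaking future behavior into the model.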

[0031] Figure 3 is a line chart of the daily online time of 50 users in May. The volume of data on users going online and offline is huge and contains a great deal of information.

[0032] As shown in Figures 4 and 5, the method proposed by the present invention for extracting online time trend features based on multi-scale histogram statistics includes the following steps:

[0033] (1) This time series is not a typical time series in the traditional sense, ...
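The remaining steps are not reproduced in this excerpt. As a generic illustration only, one possible reading of "multi-scale histogram statistics" is to summarize a user's daily online time with histograms at several bin resolutions and concatenate them. The bin counts, the value range of 0 to 24 hours, and the function names below are assumptions, not the patent's procedure.

```python
def histogram(values, n_bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp out-of-range values (and v == hi) into the first/last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        idx = max(idx, 0)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins

def multi_scale_histogram(series, scales=(2, 4, 8), lo=0.0, hi=24.0):
    """Concatenate histograms of one series at several bin resolutions."""
    features = []
    for n_bins in scales:
        features.extend(histogram(series, n_bins, lo, hi))
    return features

# Hypothetical daily online time, in hours, for one user.
series = [0.0, 3.0, 6.0, 12.0, 23.5, 24.0]
feats = multi_scale_histogram(series)
print(len(feats), feats[:2])
```

Coarse bins capture the overall level of usage, while finer bins expose more detailed structure; the concatenated vector can then be treated as one feature group for that data source.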

Abstract

A feature extraction and feature selection method for background multi-source data, characterized in that it includes the following steps: (1) dividing the background data of multiple months into a training set and a test set; (2) extracting the corresponding group features from each data source in the training set; (3) using the Group Lasso method to select feature groups through cross-validation on the test set. The beneficial effects of the present invention are as follows: on the selected group features, a C4.5 decision tree is used to build a classifier for off-network user analysis, and the accuracy rate of off-network user prediction reaches 45%. The accuracy rate reached 88%.

Description

Technical Field

[0001] The invention relates to a feature extraction and feature selection method for background multi-source data, used for off-network user analysis.

Background Technique

[0002] For each household's daily online time series, there is currently no good way to characterize the trend in users' online time. Lasso is a sparse feature selection method. When Lasso is applied directly to a model with a group structure, it tends to select individual features and thereby destroys the group structure of the features. For LR classifiers, small changes in features can also have a large impact on the final prediction results.

[0003] The Group Lasso method extends the Lasso penalty function to select features at the group level. The Filter method is a feature selection method that is independent of the learning algorithm; it selects a subset of features according to some measure. A commonly used measure is the Pearson co...
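For reference, the contrast between the two penalties can be written in the standard textbook form. The notation is introduced here and is not taken from the patent text: $\beta$ is the coefficient vector, $\lambda$ the regularization weight, and the $p$ features are partitioned into $G$ groups, with $\beta_g$ the coefficients of group $g$ and $p_g$ its size. A squared loss is assumed for simplicity.

```latex
\begin{aligned}
\text{Lasso:}\quad
  & \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Group Lasso:}\quad
  & \min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
\end{aligned}
```

The Lasso penalty acts on each coefficient separately, so it can zero out individual features inside a group. The Group Lasso penalty uses the unsquared Euclidean norm of each block, so a group is either removed entirely or kept as a whole.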

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06K9/46
CPC: G06Q10/04, G06V10/50, G06F18/24
Inventor: 范剑锋, 杨琬琪, 高阳, 史颖欢, 孙良君
Owner: NANJING UNIV