Nominal attribute-based continuous type feature construction method

A continuous feature construction method, applied in computing models, machine learning, computing, etc., that addresses problems such as one-time feature extraction and excessive feature dimensions, and produces features with strong interpretability, more obvious differences between samples, and simple feature selection.

Inactive Publication Date: 2017-06-27
SOUTH CHINA UNIV OF TECH
5 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

At present, most feature construction is manual, rule-based extraction that depends largely on the engineer's understanding of the business background. It is difficult to extract comprehensive features in a short period of time, especially for nominal attributes or categorical variables. For a color feature with values such as "yellow, red, and blue", the nominal attribute is usually converted into sparse vectors of equal length, constructed by One-Hot encoding or Dummy encoding.
Each dimension of such an encoding indicates whether a particular nominal value appears, so it has a certain physical meaning. However, the representation implicitly assumes the same fixed distance between any two samples with different values, which may be contrary to reality. In addition, when the nominal attribute has too many values, this encoding leads to an excessive feature dimension.
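
The two drawbacks named above can be checked with a small sketch. It is an illustration only, not part of the patent: the helper names `one_hot` and `euclidean` are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math

def one_hot(value, vocabulary):
    """Return a one-hot vector for `value` over an ordered `vocabulary`."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

colors = ["yellow", "red", "blue"]
vectors = {c: one_hot(c, colors) for c in colors}

# Distance between every pair of distinct colors (assumed metric: Euclidean).
pairwise = {
    (a, b): euclidean(vectors[a], vectors[b])
    for i, a in enumerate(colors)
    for b in colors[i + 1:]
}

# The vector length equals the number of distinct values of the attribute.
dimension = len(vectors["yellow"])
```

Every entry of `pairwise` is the same value, the square root of 2, regardless of which two colors are compared, and `dimension` grows by one for each additional nominal value.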


Embodiment Construction

[0053] The present invention will be further described below in conjunction with specific examples.

[0054] As shown in Figure 1, the continuous feature construction method based on nominal attributes described in this embodiment is an important part of the entire machine learning system. It is responsible for generating all the features required for training the model and determines the upper limit of the accuracy of the entire prediction model. The method is divided into two parts: offline training and online prediction. The features are constructed offline, and the features of a sample to be predicted are generated online from the existing training set without recalculation. Specifically, the method includes the following steps:
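
The offline/online split described in [0054] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the statistic (the mean of a target field per nominal value), the fallback for unseen values, and the names `fit_offline` and `transform_online` are all hypothetical.

```python
from collections import defaultdict

def fit_offline(rows, attribute, target):
    """Offline: compute one continuous value per nominal value.

    Assumption: the value is the mean of `target` within each group.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[attribute]] += row[target]
        counts[row[attribute]] += 1
    table = {k: sums[k] / counts[k] for k in sums}
    default = sum(sums.values()) / sum(counts.values())
    return table, default

def transform_online(sample, attribute, table, default):
    """Online: look the feature up; nothing is recomputed from raw data.

    Assumption: unseen nominal values fall back to the global mean.
    """
    return table.get(sample[attribute], default)

training_rows = [
    {"color": "red", "clicked": 1},
    {"color": "red", "clicked": 0},
    {"color": "blue", "clicked": 1},
    {"color": "yellow", "clicked": 0},
]
table, default = fit_offline(training_rows, "color", "clicked")
red_feature = transform_online({"color": "red"}, "color", table, default)
unseen_feature = transform_online({"color": "green"}, "color", table, default)
```

The lookup table is the only state carried from the offline phase to the online phase, which is what allows an online sample to receive its features without another pass over the training data.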

[0055] 1) Data preprocessing, including data table integration, data representation, and missing value processing. Data table integration refers to integrating the existing data tables and putting all the fields in the data s...

Abstract

The invention discloses a nominal attribute-based continuous type feature construction method. The method comprises the following steps: 1) performing data preprocessing; 2) setting a feature construction framework according to business background knowledge; 3) generating concrete feature construction paths; 4) constructing the corresponding features according to the feature construction paths and generating a training set; 5) performing feature selection on the training set and constructing a prediction model; 6) saving the relevant data sets and the prediction model and terminating the offline training process; 7) performing preprocessing and feature extraction on the sample data that requires online prediction; and 8) using the prediction model obtained through offline training to predict the sample. The method can be applied not only to a user-item scene but also to more general classification and regression prediction problems with nominal attributes or categorical variable features. Compared with traditional One-Hot and Dummy encoding, the features generated by the method make the differences between samples more obvious and have strong interpretability.
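
Steps 2) to 4) of the abstract can be pictured with a rough sketch. The source does not specify what a framework or a path looks like, so everything here is an assumption made for illustration: a framework is taken to be a set of nominal attributes to group by plus a set of statistics, a path is one (attribute, statistic) pair, and the names `framework`, `generate_paths`, and `build_feature` are hypothetical.

```python
from itertools import product
from statistics import mean

# Hypothetical framework: nominal attributes to group by, statistics to compute.
framework = {
    "group_by": ["user", "item"],
    "statistic": ["count", "mean_label"],
}

def generate_paths(framework):
    """Enumerate concrete paths as (attribute, statistic) pairs."""
    return list(product(framework["group_by"], framework["statistic"]))

def build_feature(rows, path):
    """Build a lookup from nominal value to a continuous value for one path."""
    key, statistic = path
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row["label"])
    if statistic == "count":
        return {k: float(len(v)) for k, v in groups.items()}
    return {k: mean(v) for k, v in groups.items()}

rows = [
    {"user": "u1", "item": "i1", "label": 1},
    {"user": "u1", "item": "i2", "label": 0},
    {"user": "u2", "item": "i1", "label": 1},
]
paths = generate_paths(framework)
lookups = {path: build_feature(rows, path) for path in paths}

# One continuous column per path, followed by the label.
training_set = [
    [lookups[path][row[path[0]]] for path in paths] + [row["label"]]
    for row in rows
]
```

In this sketch the number of feature columns equals the number of paths rather than the number of distinct nominal values. The statistics are computed on the same rows they describe only to keep the example short; a real pipeline would normally compute label-based statistics on separate data to avoid leaking the target.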

Description

Technical field

[0001] The invention relates to the field of feature engineering in machine learning, in particular to a continuous feature construction method based on nominal attributes.

Background technique

[0002] With the advent of the era of big data and the rise of the Internet, various machine learning algorithms are used to mine the commercially valuable information contained in data. Feature engineering is a key step in a machine learning system and determines the upper limit of the system's accuracy, and feature construction is an important part of feature engineering. At present, most feature construction is manual, rule-based extraction that depends largely on the engineer's understanding of the business background. It is difficult to extract comprehensive features in a short period of time, especially for nominal attributes or categorical variables. Color features such as "yellow, red, and blue" often con...

Application Information

IPC(8): G06N99/00
CPC: G06N20/00
Inventors: 董守斌, 马雅从, 张晶, 胡金龙
Owner: SOUTH CHINA UNIV OF TECH