
Method for enhancing high-dimensional category feature expression capability

A feature expression and category processing technology, applied in character and pattern recognition, instruments, computing, etc. It addresses high hardware resource requirements, long training time, and weak feature expression ability, and improves model performance without increasing training time.

Pending Publication Date: 2019-04-19
SICHUAN XW BANK CO LTD

AI Technical Summary

Problems solved by technology

[0004] In view of the above problems, the purpose of the present invention is to provide a method for enhancing the expressive ability of high-dimensional category features. It addresses two shortcomings of the prior art when the number of input feature categories is large. With massive data, the complexity of the model parameters grows sharply, which leads to high memory consumption, long training time, slow computation, and high hardware resource requirements. With little data, the features have weak expressive ability, which results in weak model performance.
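To make the parameter growth concrete, the sketch below counts how many input columns one-hot encoding produces for a few category variables. The variable names, the toy values, and the helpers `one_hot_width` and `one_hot_row` are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def one_hot_width(columns: Dict[str, Sequence[str]]) -> int:
    """Total number of input columns after one-hot encoding every variable."""
    return sum(len(set(values)) for values in columns.values())


def one_hot_row(value: str, vocabulary: List[str]) -> List[int]:
    """One-hot vector for a single value, given an ordered vocabulary."""
    return [1 if value == entry else 0 for entry in vocabulary]


# Toy data (hypothetical): two category variables with four distinct values each.
toy_columns = {
    "city": ["A", "B", "C", "A", "D", "B"],
    "product": ["p1", "p2", "p1", "p3", "p4", "p2"],
}

width_one_hot = one_hot_width(toy_columns)  # one column per distinct value
width_single = len(toy_columns)             # one numeric column per variable
```

With one-hot encoding the input width grows with the number of distinct values, whereas an encoding that maps each category to a single number keeps one column per variable regardless of cardinality.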

Examples

Embodiment 1

[0041] The data for this example come from Porto Seguro’s Safe Driver Prediction competition on the Kaggle platform.

[0042] The specific link is: https://www.kaggle.com/c/porto-seguro-safe-driver-prediction. Because the data set is large and each single-category variable has many attribute values, only the names of the single-category variables are given below. This embodiment does not reproduce the data itself (it can be found at the link), and it can be provided separately if necessary.

[0043] The single-category variables used are: "ps_ind_02_cat", "ps_ind_04_cat", "ps_ind_05_cat", "ps_car_01_cat", "ps_car_02_cat", "ps_car_03_cat", "ps_car_04_cat", "ps_car_05_cat", "ps_car_06_cat", "ps_car_07_cat", "ps_car_08_cat", "ps_car_09_cat", "ps_car_10_cat" and "ps_car_11_cat". These variable names are public, and their meanings are known in the art.

[0044] When using the target conversion formula in the present in...

Embodiment 2

[0048] To further illustrate that the attribute target feature variable produced by the present invention can improve model performance, a second example is given as follows:

[0049] The data come from customer loan data of Lending Club (a US peer-to-peer lending company). The purpose is to predict whether an applicant is "good" or "bad". The link is as follows:

[0050] https://raw.githubusercontent.com/h2oai/app-consumer-loan/master/data/loan.csv. Because the data set is large and each single-category variable has many attribute values, only the name of the single-category variable is given below. This embodiment does not reproduce the data itself (it can be found at the link), and it can be provided separately if necessary.

[0051] The target conversion formula of the present invention is mainly used to process the category variable "addr_state", so as to compare the performance of the GBDT model before and after processing, and the evaluation crite...
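The source does not say how the encoded values were fitted for this comparison. One common precaution when a category such as "addr_state" is replaced by a statistic of the label is to compute each row's value from the other rows only, so the model never sees its own label through the feature. The sketch below is a minimal illustration under that assumption. The function `out_of_fold_encode`, the fold assignment by row index, the additive smoothing, and the toy rows are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def out_of_fold_encode(
    categories: Sequence[str],
    labels: Sequence[float],
    n_folds: int = 5,
    smoothing: float = 10.0,
) -> List[float]:
    """Encode each row using label statistics from the other folds only."""
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")

    n_rows = len(categories)
    encoded = [0.0] * n_rows
    for fold in range(n_folds):
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        total, seen = 0.0, 0
        # Accumulate statistics on every row that is NOT in the held-out fold.
        for i in range(n_rows):
            if i % n_folds == fold:
                continue
            sums[categories[i]] += labels[i]
            counts[categories[i]] += 1
            total += labels[i]
            seen += 1
        prior = total / seen if seen else 0.0
        # Encode the held-out rows with the statistics gathered above.
        for i in range(fold, n_rows, n_folds):
            category = categories[i]
            denominator = counts.get(category, 0) + smoothing
            if denominator > 0:
                encoded[i] = (sums.get(category, 0.0) + smoothing * prior) / denominator
            else:
                encoded[i] = prior  # category unseen in training folds, no smoothing
    return encoded


# Toy rows (hypothetical): two states, binary "bad" label.
states = ["CA", "CA", "NY", "NY"]
bad = [1, 0, 1, 0]
features = out_of_fold_encode(states, bad, n_folds=2, smoothing=0.0)
```

In the toy call each row is encoded from the other row of the same state, so its own label never contributes to its feature value.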

Abstract

The invention discloses a method for enhancing the expression capability of high-dimensional category features, and belongs to the technical fields of feature engineering, machine learning algorithms, and classification prediction. It addresses two problems of the prior art: one-hot encoding and embedding strategies give features weak expressive ability, which makes the model's expressive ability weak, and the complexity of the model parameters increases greatly when there is a large number of feature categories. The method comprises the following steps: S1, constructing a conversion formula for converting the attributes corresponding to category variables into attribute features; S2, performing regularization processing on the conversion formula to obtain a target conversion formula; and S3, processing the attributes corresponding to the category variables through the target conversion formula to obtain the final attribute target feature variable. The method is used for enhancing the expression capability of high-dimensional category features.
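The three steps can be sketched as a smoothed target encoding. This is an illustrative assumption: the abstract does not state the conversion formula or the form of the regularization, so the per-category target mean (for S1) and the additive shrinkage toward the global mean (for S2) are stand-ins, and the names `fit_target_encoding`, `transform`, and `smoothing` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_target_encoding(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    smoothing: float = 10.0,
) -> Tuple[Dict[Hashable, float], float]:
    """Learn one numeric value per category from a binary or numeric target."""
    if len(categories) != len(targets) or len(categories) == 0:
        raise ValueError("categories and targets must be non-empty and equal in length")
    if smoothing < 0:
        raise ValueError("smoothing must be non-negative")

    prior = sum(targets) / len(targets)  # global target mean
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    # S1 (assumed form): the raw conversion is the per-category target mean,
    #     sums[c] / counts[c].
    # S2 (assumed form): regularize by shrinking that mean toward the prior,
    #     so categories with few rows do not receive extreme values.
    mapping = {
        c: (sums[c] + smoothing * prior) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, prior


def transform(
    categories: Sequence[Hashable],
    mapping: Dict[Hashable, float],
    prior: float,
) -> List[float]:
    """S3: replace each category by its learned value; unseen ones get the prior."""
    return [mapping.get(c, prior) for c in categories]


# Toy usage: category "c" never appeared during fitting.
mapping, prior = fit_target_encoding(["a", "a", "b"], [1, 0, 1], smoothing=1.0)
encoded = transform(["a", "b", "c"], mapping, prior)
```

Each category variable becomes a single numeric column, so the input width does not depend on the number of categories. A larger `smoothing` value pulls rare categories more strongly toward the global mean.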

Description

Technical field

[0001] A method for enhancing the expressive ability of high-dimensional category features, belonging to the technical fields of feature engineering, machine learning algorithms, and classification prediction.

Background technique

[0002] In the field of machine learning there is a broad consensus: data and features determine the upper limit of machine learning, and models and algorithms only approach this upper limit. The importance of feature engineering is therefore self-evident. The essence of feature processing is to enhance the expressive ability of features and thereby improve the performance of the model. In particular, how to process high-dimensional category features so as to increase their expressive ability and improve model performance has long been a difficult point in academic and industrial research.

[0003] For high-dimensional category features, there are t...

Application Information

IPC(8): G06K9/62
CPC: G06F18/21
Inventor: 罗时超
Owner: SICHUAN XW BANK CO LTD