Automatic data exploration method and system based on contribution degree of target label

A technique based on target labels and contribution degree, applied in fields such as digital data information retrieval, electronic digital data processing, and special data processing applications. It addresses heavy workload, low work efficiency, and repetitive manual work, improving efficiency and enabling automatic feature screening.

Pending Publication Date: 2022-06-24
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

Every modeling run requires manual processing, including hand-written SQL statements for statistical analysis and feature exploration through modeling methods. This involves many tedious, repetitive tasks, a heavy workload, and low work efficiency.




Detailed Description of the Embodiments

[0040] To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments, without creative work, fall within the protection scope of the present invention.

[0041] The present invention is described in further detail below with reference to the accompanying drawings:

[0042] As shown in Figure 1, an automatic data exploration method based on target label contribution degree provided by the present invention comprises:

[0043] Associate and integrate the feature data with the t...
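
The paragraph above is truncated in the source. Going by the abstract, this step associates the feature data with the target label to form one data set. A minimal sketch of such a step follows, assuming the two sides share a record key; the key name `user_id`, the column name `label`, and the function `integrate` are hypothetical and do not come from the patent.

```python
def integrate(feature_rows, labels, key="user_id", label_name="label"):
    """Inner-join feature rows with target labels on a shared key.

    feature_rows: list of dicts, one per record (assumed layout).
    labels: dict mapping key value -> target label (assumed layout).
    Rows without a label are dropped, since they cannot be scored
    against the target.
    """
    dataset = []
    for row in feature_rows:
        if row[key] in labels:
            # Copy the row so the caller's data is not mutated.
            dataset.append({**row, label_name: labels[row[key]]})
    return dataset


if __name__ == "__main__":
    features = [
        {"user_id": 1, "age": 30},
        {"user_id": 2, "age": 41},
        {"user_id": 3, "age": 25},  # no label, so it is dropped
    ]
    print(integrate(features, {1: 0, 2: 1}))
```

An inner join is only one possible choice here; the source does not say how unlabeled records are handled.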


Abstract

The invention discloses an automatic data exploration method and system based on the contribution degree of a target label. The method comprises: associating and integrating feature data with the target label to obtain a data set; inputting the data set into a random forest and cross-validation model, an information gain ratio algorithm, and a random forest algorithm, each of which produces a ranking table of the contribution degree of every feature field in the feature data relative to the target label; combining the three ranking tables by voting to obtain a feature contribution degree ranking table; taking a preset number of top-ranked feature fields from that table, performing a pairwise correlation check on them, and removing feature fields whose correlation exceeds a preset threshold; and, for the feature fields that remain after the correlation check, performing data distribution analysis according to each field's basic data type. This technical scheme achieves automatic screening of feature fields, greatly improves feature screening efficiency, and eliminates tedious, repetitive manual work.
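
The voting and correlation-pruning steps in the abstract can be illustrated with a small sketch. It assumes the three ranking tables have already been produced and are given as best-first lists of field names. The abstract does not specify the voting rule, which correlation measure is used, or which field of a correlated pair is removed, so the sketch assumes a Borda-style position sum, the Pearson coefficient, and removal of the lower-ranked field. All function names and sample values are hypothetical.

```python
from math import sqrt


def borda_vote(rankings):
    """Merge best-first rankings by summing positions (lower total is better)."""
    fields = rankings[0]
    score = {f: 0 for f in fields}
    for ranking in rankings:
        for position, field in enumerate(ranking):
            score[field] += position
    # Ties are broken by field name so the result is deterministic.
    return sorted(fields, key=lambda f: (score[f], f))


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (sx * sy)


def prune_correlated(ranked_fields, columns, top_n, threshold):
    """Take the top_n fields; drop any field that is too strongly
    correlated with a higher-ranked field that was already kept."""
    kept = []
    for field in ranked_fields[:top_n]:
        if all(abs(pearson(columns[field], columns[k])) <= threshold for k in kept):
            kept.append(field)
    return kept


if __name__ == "__main__":
    rankings = [
        ["age", "income", "tenure", "calls"],
        ["income", "age", "calls", "tenure"],
        ["age", "tenure", "income", "calls"],
    ]
    columns = {
        "age": [20, 30, 40, 50],
        "income": [2.0, 3.1, 3.9, 5.0],  # nearly linear in age
        "tenure": [5, 1, 4, 2],
        "calls": [7, 3, 9, 1],
    }
    merged = borda_vote(rankings)
    print(merged)
    print(prune_correlated(merged, columns, top_n=3, threshold=0.9))
```

In this toy data, `income` is dropped because it is almost perfectly correlated with the higher-ranked `age`, while `tenure` survives. The final step of the abstract, distribution analysis by basic data type, would then run only on the surviving fields.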

Description

Technical field

[0001] The invention relates to the technical field of data mining, and in particular to an automatic data exploration method and an automatic data exploration system based on target label contribution degree.

Background

[0002] Feature engineering is a very important step in data mining and also the one with the largest workload. Features that contribute to the target label must be selected, and data exploration and analysis are needed to provide input for model construction and model interpretability. Each modeling run requires manual processing, including hand-written SQL statements for statistical analysis and feature exploration through modeling methods. This involves many tedious and repetitive tasks, a heavy workload, and low work efficiency.

Summary of the invention

[0003] In view of the above problems, the present invention provides ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9035, G06N20/00
CPC: G06F16/9035, G06N20/00
Inventor: 吕宁
Owner: 北京思特奇信息技术股份有限公司