High-dimensional feature data classification method and system based on distributed parallel decision tree

A technology for feature data classification, applied in special data processing applications, relational databases, database models, and similar fields. It addresses the inability to process high-dimensional feature data efficiently, shortening the model establishment time and improving parallel efficiency.

Active Publication Date: 2020-06-09

INST OF COMPUTING TECH CHINESE ACAD OF SCI

Cites: 8; Cited by: 3

Problems solved by technology

[0009] The purpose of the present invention is to overcome the inability of existing parallel decision tree algorithms to process high-dimensional feature data efficiently. It proposes a parallel decision tree algorithm that parallelizes at the node level and the feature level simultaneously. While maintaining the same efficiency, it effectively improves the processing efficiency for high-dimensional feature data.

Embodiment Construction

[0037] During large-scale data mining research, the inventors found that the data dimensionality was very large and that existing decision tree algorithms could not handle such data well. The reason is that serial decision trees cannot handle large-scale data, while existing parallel decision tree algorithms have a low degree of parallelism: even the fastest algorithm is parallel only at the node level, not in the optimal feature selection step. When the feature dimension is large and features take many values, a multiway decision tree produces too many nodes, causing excessive memory usage and overfitting. A binary decision tree must traverse all possible splits, compute the information gain of each split, and determine the optimal one, which is also very time-consuming. Existing parallel decision tree algorithms do not take this into account, because naturally occurring data rarely have particularly large feature dime...
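The cost of the binary-split traversal described above can be illustrated with a minimal sketch. It is not the patented method: it assumes a single numeric feature, midpoint thresholds between distinct values, and entropy-based information gain. The names `entropy` and `best_binary_split` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_binary_split(values, labels):
    """Try every midpoint threshold between distinct values of one feature.

    Returns (threshold, information_gain) for the best split, or (None, 0.0)
    if the feature has fewer than two distinct values.
    Assumption: numeric feature, split rule "value <= threshold".
    """
    base = entropy(labels)
    distinct = sorted(set(values))
    best_threshold, best_gain = None, 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weighted entropy of the two children after the split.
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


threshold, gain = best_binary_split([1, 2, 3, 4], ["a", "a", "b", "b"])
print(threshold, gain)
```

Each candidate threshold rescans all samples, so the work grows with both the number of distinct values and the number of features, which is the time cost the paragraph refers to.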

Abstract

The invention provides a high-dimensional feature data classification method and system based on a distributed parallel decision tree. It implements a Spark-based parallel decision tree algorithm oriented to high-dimensional feature data. The algorithm has a high degree of parallelism and can process large-scale data sets: it performs parallel computation not only across nodes on the same layer of the decision tree but also at the feature level, which raises the degree of parallelism for high-dimensional data and effectively reduces the processing time for high-dimensional features.
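The idea of parallelizing across same-layer nodes and across features at once can be sketched as scoring independent (node, feature) tasks and then reducing per node. This is only an illustration under stated assumptions: a standard-library thread pool stands in for Spark tasks (in CPython it shows the task structure rather than real CPU speedup), features are numeric, and `feature_gain` and `best_splits_for_layer` are hypothetical names, not functions from the patent or from Spark.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def feature_gain(task):
    """Score one (node, feature) pair: best information gain over midpoint thresholds."""
    node_id, feature, rows, labels = task
    base, best_gain, best_t = entropy(labels), 0.0, None
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        rest = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - rest > best_gain:
            best_gain, best_t = base - rest, t
    return node_id, feature, best_gain, best_t


def best_splits_for_layer(layer, n_features, max_workers=4):
    """layer maps node_id -> (rows, labels) for every node on one tree level.

    One task is created per (node, feature) pair, so the amount of parallel
    work grows with the feature dimension as well as with the layer width.
    """
    tasks = [(nid, f, rows, labels)
             for nid, (rows, labels) in layer.items()
             for f in range(n_features)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(feature_gain, tasks))
    best = {}  # node_id -> (feature, gain, threshold)
    for nid, f, gain, t in results:
        if nid not in best or gain > best[nid][1]:
            best[nid] = (f, gain, t)
    return best


layer = {
    0: ([[1, 5], [2, 5], [3, 5], [4, 5]], ["a", "a", "b", "b"]),
    1: ([[7, 1], [7, 2], [7, 3], [7, 4]], ["x", "y", "y", "y"]),
}
print(best_splits_for_layer(layer, n_features=2))
```

Flattening to (node, feature) tasks is one possible design; with node-level parallelism alone, a layer with few nodes would leave most workers idle however many features there are.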

Description

Technical field

[0001] The invention relates to the field of tree classification, and in particular to a method and system for classifying high-dimensional feature data based on distributed parallel decision trees.

Background

[0002] The decision tree classification algorithm is an instance-based inductive learning method that can extract a tree-shaped classification model from a given set of unordered training samples. Each non-leaf node in the tree records which feature is used to judge the category, and each leaf node represents the final category. The path from the root node to each leaf node forms a classification rule. To test a new sample, start from the root node, apply the test at each branch node, and recursively enter the subtree along the corresponding branch until a leaf node is reached. The category represented by that leaf node is the predicted category of the current test sample. Quinlan proposed the famous ID3 algorithm i...
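The root-to-leaf prediction procedure described in [0002] can be sketched as follows. This is a minimal illustration that assumes binary nodes with numeric threshold tests; the `Node` class and `predict` function are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature tested at a branch node
    threshold: Optional[float] = None  # assumed rule: go left if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on leaf nodes


def predict(node, sample):
    """Recursively descend from the given node until a leaf is reached."""
    if node.label is not None:
        return node.label
    branch = node.left if sample[node.feature] <= node.threshold else node.right
    return predict(branch, sample)


tree = Node(
    feature=0, threshold=2.5,
    left=Node(label="a"),
    right=Node(feature=1, threshold=0.5,
               left=Node(label="b"),
               right=Node(label="c")),
)
print(predict(tree, [1.0, 9.0]))  # takes the left branch at the root
```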

Application Information

IPC(8): G06K9/62, G06F16/27, G06F16/28, G06F16/2458

CPC: G06F16/27, G06F16/285, G06F16/2462, G06F18/24323, Y02D10/00

Inventor: 孙莹, 庄福振, 敖翔, 何清

Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI
