Improved hierarchical cascade support vector machine parallelization method

A support vector machine technology applied in the fields of computer components, instruments, and character and pattern recognition. It addresses idle node resources, long training time, and low cluster resource utilization, with the effect of shortening training time and improving the efficiency of model training and text classification.

Pending Publication Date: 2017-09-22
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

Assuming model training is performed on a cluster of N nodes, a large number of node resources sit idle during training, which leads to low cluster resource utilization.
[0007] (2) Across the layered training process, most non-support vectors are eliminated after the first layer of SVM training. The subsequent layers filter out only a small number of non-support vectors, yet they consume a lot of computation time.
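
The idle-node problem can be illustrated with a minimal sketch. It assumes the classic cascade layout in which the results of every two nodes are merged at each layer, and it assumes every layer takes the same wall-clock time; both are simplifications, and the function names are hypothetical rather than taken from the patent.

```python
def active_nodes_per_layer(n_nodes):
    """Count the busy nodes in each layer of a pairwise (binary) cascade."""
    layers = []
    active = n_nodes
    while True:
        layers.append(active)
        if active == 1:
            break
        # Every two partial results merge into one; an odd one carries over.
        active = (active + 1) // 2
    return layers


def mean_utilization(n_nodes):
    """Average fraction of busy nodes, assuming equal time per layer."""
    layers = active_nodes_per_layer(n_nodes)
    return sum(layers) / (n_nodes * len(layers))


print(active_nodes_per_layer(8))  # [8, 4, 2, 1]
print(mean_utilization(8))        # 0.46875
```

Under these assumptions, more than half of an 8-node cluster is idle on average, and the idle share grows with N.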



Examples


Embodiment Construction

[0037] The following examples describe the present invention in more detail.

[0038] The improved Cascade SVM algorithm proposed by the present invention is combined with the Spark parallel framework; the algorithm flow chart is shown in Figure 3. Combining Spark's parallel processing capabilities with the improved Cascade SVM algorithm improves the efficiency of model training and classification.

[0039] Algorithm: Spark-based parallelization of the improved Cascade SVM algorithm.

[0040] Input: training data set, the number of partitions N;

[0041] Output: the support vector machine model.

[0042] (1) Upload the training data set to HDFS (Hadoop Distributed File System), the distributed file storage system;

[0043] (2) Read the training data set from HDFS, generate an RDD (Resilient Distributed Dataset), and divide it into N sub-datasets with uniformly distributed samples, where N is the number of parallel machines;...
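
Step (2) can be sketched in plain Python, without Spark, to show what an even split into N sub-datasets might look like. This is an illustration only: it reads "uniform samples" as a shuffle followed by a split into near-equal parts, which is an assumption, and `split_uniform` is a hypothetical helper. In Spark the partition count would typically be set when the RDD is created or repartitioned.

```python
import random


def split_uniform(samples, n_parts, seed=0):
    """Shuffle the samples, then deal them into n_parts near-equal subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Striding by n_parts keeps subset sizes within one sample of each other.
    return [shuffled[i::n_parts] for i in range(n_parts)]


parts = split_uniform(range(10), n_parts=3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Shuffling first keeps any ordering in the input file (for example, samples sorted by class) from concentrating one class in a single sub-dataset.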



Abstract

The invention provides an improved hierarchical cascade support vector machine parallelization method, optimized through an improved Cascade SVM algorithm. First, the improved algorithm introduces a quantity c to measure how the number of model support vectors obtained at each layer changes during hierarchical training. Then, by adjusting the merging strategy and the hierarchical structure used during model training, the way the support vectors obtained at each layer are merged is changed from merging every two sets to merging all sets and splitting the result evenly. This avoids a defect of pairwise merging, namely that non-boundary samples are filtered insufficiently. Without reducing classification accuracy, the method effectively shortens model training time and improves the training and classification efficiency of the model by means of Spark, a current mainstream parallel framework.
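
The two merging strategies contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the patented procedure: support vectors are represented by integer sample ids, and the function names and the `n_parts` parameter are hypothetical, since the visible text does not state how many subsets each layer produces.

```python
def merge_pairwise(sv_sets):
    """Original cascade: merge the support vector sets two at a time."""
    merged = []
    for i in range(0, len(sv_sets), 2):
        merged.append(sorted(set().union(*sv_sets[i:i + 2])))
    return merged


def merge_all_then_split(sv_sets, n_parts):
    """Improved strategy: pool every set, then split the pool evenly."""
    pool = sorted(set().union(*sv_sets))
    return [pool[i::n_parts] for i in range(n_parts)]


sv_sets = [[1, 2], [3], [4, 5, 6], [7]]
print(merge_pairwise(sv_sets))           # [[1, 2, 3], [4, 5, 6, 7]]
print(merge_all_then_split(sv_sets, 2))  # [[1, 3, 5, 7], [2, 4, 6]]
```

With pairwise merging, each next-layer subset only sees support vectors from two neighbouring partitions. Pooling first lets every next-layer subset draw on support vectors from all partitions.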

Description

Technical field

[0001] The invention relates to an improved Cascade SVM parallelization method.

Background technique

[0002] In the era of big data, the rapid development and wide application of the Internet and information technology have generated massive amounts of data closely related to people's lives. Unstructured text accounts for the largest share of this unorganized data. Faced with such a huge volume of text, it is difficult for people to quickly find the information that is useful to them. How to process and mine this data quickly has become a major problem, which has in turn promoted in-depth research on and wide application of text classification technologies. Text classification is an important and popular technology in the field of data mining. It can process huge amounts of unstructured text data in a form that computers can understand, so as to help people better and quickly obtain the informati...


Application Information

Patent Type & Authority Applications (China)
IPC IPC(8): G06K9/62
CPC G06F18/2148, G06F18/2411
Inventor 王念滨, 陈龙, 何鸣, 周连科, 王红滨, 童鹏鹏, 王瑛琦, 陈锡瑞, 赵新杰, 王昆明
Owner HARBIN ENG UNIV