Categorizing method oriented to Internet unbalanced application flow

A classification method for application flows, applied in special-purpose data processing applications, instruments, electrical digital data processing, etc., that addresses unbalanced application flows, low classification accuracy for small classes, and low overall byte classification accuracy.

Status: Inactive. Publication date: 2014-10-15
SOUTH CHINA UNIV OF TECH
Cites 2; cited by 35

AI Technical Summary

Problems solved by technology

[0009] The embodiments of the present invention aim to provide a classification method for unbalanced Internet application flows, intended to solve ...


Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0038] The classification method for unbalanced Internet application flows of the present invention comprises five steps grouped into three parts: flow data preprocessing (S101, S102 and S103), offline training of the flow classification models (S104), and flow classification (S105).

[0039] S101: Use the k-means algorithm to divide the data set into multiple dense, disjoint subsets, each with its own cluster center;

[0040] S102: For each subset obtained in S101, expand the small-class flow samples according to the oversampling ratio;

[0041] S103: For each subset obtained in S102, apply a heuristic rule to undersample the large-class flow samples;

[0042] S104: Using each subset obtained in S103 as a training set, train k ensemble classification models offline, one per subset;

[0043] S105: Combine the k ensemble classification models to classify the test flow samples.
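The abstract states that S105 classifies a test flow with the ensemble model whose training subset has the cluster center nearest to that flow. A minimal sketch of that routing step follows, assuming Euclidean distance on the flow feature vectors and treating each trained ensemble model as an opaque callable; the names `nearest_center`, `classify_flow` and the toy labels are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical: each trained ensemble model is a callable that maps a flow's
# feature vector to an application label.
Model = Callable[[Vector], str]


def nearest_center(x: Vector, centers: Sequence[Vector]) -> int:
    """Return the index of the cluster center closest to x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def classify_flow(x: Vector, centers: Sequence[Vector], models: Sequence[Model]) -> str:
    """S105 sketch: route the test flow to the model trained on its nearest cluster."""
    if len(centers) != len(models):
        raise ValueError("expected one model per cluster center")
    return models[nearest_center(x, centers)](x)


# Toy usage: two cluster centers, each paired with a stand-in "model".
centers = [(0.0, 0.0), (10.0, 10.0)]
models: List[Model] = [lambda x: "web", lambda x: "p2p"]
print(classify_flow((9.0, 8.0), centers, models))  # p2p
```

Because only one of the k models is evaluated per test flow, classification cost stays that of a single ensemble regardless of k.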

Embodiment 2

[0045] S101: Use the k-means clustering algorithm to divide the traffic data set into multiple dense, disjoint data subsets. The number of clusters k is found by searching over candidate values of k with the sum of squared errors (SSE), which measures the total within-cluster scatter when the data set is divided into k clusters. The SSE summed over all clusters is given by formula (1), where $x_i$ denotes the i-th flow sample, $\mu_j$ denotes the j-th cluster center, and $n_j$ denotes the number of flow samples in the j-th cluster:

[0046]
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \lVert x_i - \mu_j \rVert^2 \qquad (1)$$
...
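To make the k search concrete, the sketch below runs plain Lloyd k-means for k = 1..k_max and evaluates formula (1) for each k. The text does not say how the SSE values are turned into a choice of k; reading off the "elbow" where SSE stops dropping sharply is a common heuristic and is an assumption here, as is seeding k-means with the first k points (done only to keep the example deterministic). All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 100):
    """Plain Lloyd k-means seeded with the first k points (assumption)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each flow sample joins its closest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: labels match the final centers
        centers = new_centers
    return centers, labels


def sse(points, centers, labels) -> float:
    """Formula (1): sum of squared distances from each sample to its cluster center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def sse_curve(points, k_max: int) -> Dict[int, float]:
    """SSE for every k in 1..k_max, to be inspected for an elbow."""
    return {k: sse(points, *kmeans(points, k)) for k in range(1, k_max + 1)}


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(sse_curve(points, 3))  # {1: 201.0, 2: 1.0, 3: 0.5}
```

On this toy data the SSE falls from 201 to 1 when k goes from 1 to 2 and barely moves afterwards, so an elbow reading would pick k = 2.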

Embodiment 3

[0052] S102: In this classification method for unbalanced Internet application flows, expanding the small-class flow samples according to the oversampling ratio comprises setting the oversampling ratio and then expanding the small-class flow samples. The "oversampling ratio" is the ratio, within the oversampling subset, of the number of flow samples of the largest class to the number of flow samples of a small class; it is set manually. The oversampling subset is the flow sample data set obtained by expanding the small-class flow samples of the current subset. The current subset contains samples of multiple classes, divided into three parts: samples of the single largest class, samples of one or more small classes, and samples of one or more other classes. The largest class is the class with the most samples in the current subset; a small class is a class in the current subset that has at least one sample expanded and satisfies ...
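The abstract says small-class samples are expanded by interpolating between feature values. The sketch below assumes a SMOTE-style interpolation (pick a small-class sample, pick one of its nearest same-class neighbours, emit a point on the segment between them) and reads the oversampling ratio as a target: keep adding samples until the largest-class count divided by the small-class count is at most the ratio. The neighbour count, seeding, and the function name `oversample_small_class` are hypothetical, and the patent's exact selection condition (truncated above) is not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample_small_class(small: Sequence[Point], n_largest: int, ratio: float,
                           k_neighbors: int = 3, seed: int = 0) -> List[Point]:
    """Expand a small class by interpolation until n_largest / len(result) <= ratio.

    SMOTE-style neighbour interpolation is an assumption, not the patent's rule.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    result = [tuple(p) for p in small]
    target = math.ceil(n_largest / ratio)
    if len(result) >= target:
        return result  # ratio already satisfied, nothing to add
    if len(small) < 2:
        raise ValueError("need at least two small-class samples to interpolate")
    rng = random.Random(seed)
    while len(result) < target:
        i = rng.randrange(len(small))
        base = small[i]
        # Indices of the k nearest same-class neighbours of the base sample.
        neighbours = sorted((j for j in range(len(small)) if j != i),
                            key=lambda j: sq_dist(base, small[j]))[:k_neighbors]
        other = small[rng.choice(neighbours)]
        u = rng.random()  # interpolation weight in [0, 1)
        result.append(tuple(b + u * (o - b) for b, o in zip(base, other)))
    return result


small = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
expanded = oversample_small_class(small, n_largest=12, ratio=2.0)
print(len(expanded))  # 6
```

Since every synthetic point lies on a segment between two real small-class samples, the expansion stays inside the region those samples already occupy in feature space.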


Abstract

The invention discloses a classification method for unbalanced Internet application flows. Once the collected traffic packets have been assembled into flows, the statistical feature values of each flow computed, and each flow labeled with its category, yielding flow samples and a flow data set, the method: divides the data set with a clustering algorithm into multiple dense, disjoint subsets; expands the small-class flow samples of each subset by interpolating between their feature values; sets undersampling rules for the large-class flow samples according to the neighbor relations among the flow samples of the current subset and the byte counts of those samples; trains the ensemble classification models one by one with a boosting-style ensemble learning algorithm that explicitly accounts for ensemble diversity; and, for each test-set flow sample, computes its distance to the cluster center of each training subset, classifies it with the ensemble model corresponding to the nearest cluster center, and outputs the application category to which it belongs. With this method, the classification model improves small-class classification accuracy and overall byte classification accuracy without lowering large-class classification accuracy.
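The abstract names the two inputs to the undersampling rule (neighbor relations and byte counts) but not the rule itself. The sketch below is a hypothetical instantiation only: a large-class flow is dropped when its nearest neighbour in the subset belongs to another class and it carries fewer bytes than a chosen threshold, so heavy flows, which dominate byte accuracy, are never discarded. The threshold, the record layout, and the name `undersample_large_class` are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical flow record: (feature vector, application label, byte count).
Flow = Tuple[Tuple[float, ...], str, int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def undersample_large_class(flows: Sequence[Flow], large_label: str,
                            byte_threshold: int) -> List[Flow]:
    """Hypothetical rule: drop a large-class flow only if (a) its nearest
    neighbour in the subset has a different label and (b) it carries fewer
    than byte_threshold bytes. Heavy flows are always kept."""
    kept: List[Flow] = []
    for i, flow in enumerate(flows):
        x, label, nbytes = flow
        if label != large_label or nbytes >= byte_threshold:
            kept.append(flow)  # other classes and heavy flows are untouched
            continue
        others = [f for j, f in enumerate(flows) if j != i]
        if not others:
            kept.append(flow)
            continue
        nearest = min(others, key=lambda f: sq_dist(x, f[0]))
        if nearest[1] == large_label:
            kept.append(flow)  # surrounded by its own class: keep
    return kept


subset = [
    ((0.0, 0.0), "web", 100),
    ((0.1, 0.0), "web", 100),
    ((5.0, 5.0), "p2p", 100),
    ((5.1, 5.0), "web", 100),     # light web flow next to a p2p flow
    ((5.0, 5.2), "web", 10_000),  # heavy web flow, kept regardless
]
print(len(undersample_large_class(subset, "web", byte_threshold=1_000)))  # 4
```

Removing light large-class flows that intrude on a small-class region clears the boundary for the small class, while the byte check keeps the flows that carry most of the traffic volume.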

Description

Technical field

[0001] The invention belongs to the technical field of traffic classification for Internet traffic measurement, and in particular relates to a classification method for unbalanced Internet application flows.

Background

[0002] In recent years, continuous advances in Internet access technology and access equipment have driven rapid growth in the number of Internet users, and the rapid increase in Internet applications has in turn produced a rapid increase in Internet traffic. Since the emergence of the P2P (Peer-to-Peer) architecture in 1999, applications such as P2P file sharing and streaming media have come into wide use. According to the 32nd "Statistical Report on Internet Development in China", as of June 2013 online video users had grown by 4.5% over the half year to 389 million, a usage rate of 65.8%. The rapid growth of Internet traffic driven by heavy hitters causes excessive consumption of network bandwidt...

Claims


Application Information

IPC(8): G06F17/30
CPC: G06F18/23213; G06F18/24
Inventors: 刘琼 (Liu Qiong), 刘珍 (Liu Zhen)
Owner: SOUTH CHINA UNIV OF TECH