Real-time parallel classification method for mass data flow

Classification method and data-stream technology, applied in the Internet field; addresses the problem of low cost-effectiveness.

Active Publication Date: 2016-11-09
SICHUAN UNIV
Cites: 5 | Cited by: 7

AI Technical Summary

Problems solved by technology

When the dimensionality of the data is not high enough, partitioning the data and computing in parallel may not be cost-effective.
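The cost argument above can be made concrete with a small sketch. It assumes a simple round-robin vertical partitioning, in which attributes (not records) are spread across workers; the scheme and the function names `partition_attributes` and `idle_workers` are illustrative assumptions, not taken from the patent. With only a few attributes, most workers receive nothing to do, while their scheduling and communication overhead remains.

```python
def partition_attributes(num_attributes: int, num_workers: int) -> list[list[int]]:
    """Assumed scheme: assign attribute indices to workers round-robin."""
    partitions: list[list[int]] = [[] for _ in range(num_workers)]
    for attr in range(num_attributes):
        partitions[attr % num_workers].append(attr)
    return partitions


def idle_workers(num_attributes: int, num_workers: int) -> int:
    """Count workers that receive no attributes and therefore do no useful work."""
    return sum(1 for p in partition_attributes(num_attributes, num_workers) if not p)


if __name__ == "__main__":
    # Low-dimensional stream: 3 attributes spread over 8 workers.
    print(partition_attributes(3, 8))  # [[0], [1], [2], [], [], [], [], []]
    print(idle_workers(3, 8))          # 5
    # Higher-dimensional stream: 100 attributes over 8 workers.
    print([len(p) for p in partition_attributes(100, 8)])  # [13, 13, 13, 13, 12, 12, 12, 12]
```

In the low-dimensional case five of eight workers sit idle, so the parallel overhead buys little; with 100 attributes every worker carries a comparable share.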

Embodiment Construction

[0062] All features disclosed in this specification, or steps in all methods or processes disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.

[0063] Any feature disclosed in this specification (including any appended claims, abstract and drawings), unless expressly stated otherwise, may be replaced by alternative features which are equivalent or serve a similar purpose. That is, unless expressly stated otherwise, each feature is one example only of a series of equivalent or similar features.

[0064] Specific implementations of the present invention are described in detail below with reference to the drawings and embodiments.

[0065] According to an embodiment of the present invention, this embodiment discloses a real-time parallel classification method for massive data streams. The method is based on the Storm real-time data stream processing framework and can be applied to big data scenarios. Experimental result...
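The embodiment runs on Storm, where a Spout emits tuples and a chain of Bolts processes them. As a rough, single-process sketch of the five-stage chain listed in the abstract, the code below models each stage as a Python generator. It does not use the Storm API; the CSV input format, the `MajorityModel` stand-in for the decision tree, and test-then-train (prequential) evaluation are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional

# A record is (attribute values, class label).
Record = tuple[tuple[str, ...], str]


def data_spout(lines: Iterable[str]) -> Iterator[Record]:
    """Stage 1, data Spout: parse comma-separated lines; the last field is the label."""
    for line in lines:
        fields = line.strip().split(",")
        yield tuple(fields[:-1]), fields[-1]


def filter_and_batch_bolt(records: Iterable[Record], num_attributes: int,
                          batch_size: int) -> Iterator[list[Record]]:
    """Stage 2: drop malformed records and group the rest into fixed-size batches."""
    batch: list[Record] = []
    for values, label in records:
        if len(values) != num_attributes or "" in values or not label:
            continue  # malformed: wrong arity or an empty field
        batch.append((values, label))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly partial, batch
        yield batch


class MajorityModel:
    """Hypothetical stand-in for the tree model: predicts the most frequent class."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()

    def predict(self, values: tuple[str, ...]) -> Optional[str]:
        if not self.class_counts:
            return None
        return self.class_counts.most_common(1)[0][0]


def model_bolt(batches: Iterable[list[Record]], model: MajorityModel):
    """Stage 3: pair each batch with the current model."""
    for batch in batches:
        yield batch, model


def local_statistics_bolt(items, stats: Counter):
    """Stage 4: test-then-train on each record and update (attr, value, class) counts."""
    for batch, model in items:
        correct = 0
        for values, label in batch:
            correct += int(model.predict(values) == label)  # test before training
            model.class_counts[label] += 1
            for i, v in enumerate(values):
                stats[(i, v, label)] += 1
        yield correct, len(batch)


def evaluation_bolt(results) -> float:
    """Stage 5: aggregate per-batch results into overall prequential accuracy."""
    correct = total = 0
    for batch_correct, batch_size in results:
        correct += batch_correct
        total += batch_size
    return correct / total if total else 0.0
```

For example, feeding the lines `["a,x,yes", "b,x,yes", "a,y,no", "bad line", "b,y,yes"]` with two attributes and a batch size of 2 drops the malformed line, produces two batches, and yields a prequential accuracy of 0.5. Because the stages are lazy generators, records flow through the chain one batch at a time, loosely mirroring how tuples stream through a topology.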

Abstract

The invention discloses a real-time parallel classification method for massive data streams. The method comprises the following steps: (1) a data-acquisition Spout; (2) a filtering and batching Bolt; (3) a model-acquisition Bolt; (4) a local statistics and computation Bolt; and (5) an evaluation Bolt. Targeting three of the '4V' characteristics of big data (Volume, Velocity and Value) and the need to process massive data efficiently, a vertically parallel P-VFDT algorithm is implemented on the Storm platform. Experiments on large-scale data show that P-VFDT achieves classification performance similar to that of VFDT, while its running time is reduced by 12% relative to VFDT in a single-machine multi-core environment and by about 8% in a cluster environment.
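The abstract does not spell out how P-VFDT decides when to split a node. The sketch below shows the standard VFDT split test (the Hoeffding-tree rule of Domingos and Hulten) combined with one plausible reading of "vertical" parallelism: attributes are partitioned across local-statistics workers, each worker reports its best two information gains computed from its own counts n_ijk, and an evaluator applies the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), with R = log2(number of classes). The partitioning, the function names, and the defaults delta = 1e-7 and tau = 0.05 are assumptions for illustration, not the patent's specification.

```python
import math
from collections import Counter, defaultdict


def entropy(class_counts) -> float:
    """Shannon entropy (bits) of a mapping label -> count."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c)


def info_gain(value_stats, class_counts) -> float:
    """Information gain of one attribute; value_stats maps value -> Counter(label)."""
    total = sum(class_counts.values())
    remainder = sum(sum(cc.values()) / total * entropy(cc)
                    for cc in value_stats.values())
    return entropy(class_counts) - remainder


def build_stats(records, attrs):
    """Sufficient statistics n_ijk, kept only for this worker's attribute subset."""
    stats = defaultdict(lambda: defaultdict(Counter))
    for values, label in records:
        for a in attrs:
            stats[a][values[a]][label] += 1
    return stats


def local_best(worker_stats, class_counts):
    """One worker: (gain, attribute) pairs for its own attributes, top two only."""
    gains = [(info_gain(s, class_counts), a) for a, s in worker_stats.items()]
    return sorted(gains, reverse=True)[:2]


def hoeffding_bound(value_range, delta, n) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def should_split(local_results, class_counts, delta=1e-7, tau=0.05):
    """Merge the workers' top-two lists; return the attribute to split on, or None."""
    candidates = sorted((g for res in local_results for g in res), reverse=True)
    if len(candidates) < 2:
        return None
    (g1, best_attr), (g2, _) = candidates[0], candidates[1]
    n = sum(class_counts.values())
    r = math.log2(max(len(class_counts), 2))  # range of information gain
    eps = hoeffding_bound(r, delta, n)
    # VFDT rule: split if the gap exceeds epsilon, or break a near-tie once epsilon < tau.
    return best_attr if (g1 - g2 > eps or eps < tau) else None


if __name__ == "__main__":
    # Attribute 0 determines the label; attribute 1 carries no information.
    records = [(("p" if i % 2 == 0 else "q", "u" if i % 4 < 2 else "v"),
                "yes" if i % 2 == 0 else "no") for i in range(1000)]
    class_counts = Counter(label for _, label in records)
    # Vertical partitioning: worker A holds attribute 0, worker B attribute 1.
    local = [local_best(build_stats(records, [0]), class_counts),
             local_best(build_stats(records, [1]), class_counts)]
    print(should_split(local, class_counts))  # 0
```

Sending only each worker's top two candidates to the evaluator is sufficient: the global best and second-best attributes must each rank in the top two of whichever worker holds them. With only four records, the same data gives epsilon of about 1.42, so no split is made yet.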

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a real-time parallel classification method for massive data streams.

Background art

[0002] With the continuous development of the Internet and data processing technology, applications such as search engines, e-commerce, Weibo and instant messaging have provided people with massive amounts of information and convenient services, enriching people's lives while greatly improving their work efficiency and enjoyment of life. People also generate many types of data while using these applications and services, such as sending search requests to search engines, browsing products on e-commerce websites, commenting on and forwarding Weibo posts, and chatting online. Over time, the accumulated volume of these data has become very large, and it continues to grow at a relatively high rate. The "4V" characteristics of big data - Volume (large amount), Veloci...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/24568
Inventors: 李川 (Li Chuan), 李旺龙 (Li Wanglong)
Owner: SICHUAN UNIV