A real-time parallel classification method for massive data streams

A classification method and data stream technology, applied in the Internet field, that addresses the problem of low cost-effectiveness

Active Publication Date: 2019-04-09
SICHUAN UNIV
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

When the dimensionality of the data is not high enough, partitioning the data for parallel computation may not be cost-effective.
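The source does not say how the work is divided among workers. As a minimal sketch, assuming a round-robin assignment of attribute indices to workers (both function names below are hypothetical), the following Python shows why low dimensionality limits vertical parallelism: with fewer attributes than workers, some workers receive nothing to compute while the topology still pays per-batch coordination costs.

```python
from typing import List


def partition_attributes(num_attributes: int, num_workers: int) -> List[List[int]]:
    """Assign attribute indices to workers round-robin (hypothetical vertical split)."""
    groups: List[List[int]] = [[] for _ in range(num_workers)]
    for attr in range(num_attributes):
        groups[attr % num_workers].append(attr)
    return groups


def idle_workers(num_attributes: int, num_workers: int) -> int:
    """Count workers that receive no attributes and therefore do no useful work."""
    return sum(1 for g in partition_attributes(num_attributes, num_workers) if not g)


if __name__ == "__main__":
    print(partition_attributes(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
    print(idle_workers(3, 4))           # 1
```

Even when no worker is idle, each worker's share of the per-attribute statistics shrinks as the number of workers grows relative to the number of attributes, so fixed messaging overhead makes up a larger fraction of the total cost.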

Examples

Embodiment Construction

[0062] All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.

[0063] Any feature disclosed in this specification (including any appended claims, abstract and drawings) may, unless expressly stated otherwise, be replaced by an alternative feature that is equivalent or serves a similar purpose. That is, unless expressly stated otherwise, each feature is only one example of a series of equivalent or similar features.

[0064] Specific embodiments of the present invention are described in detail below with reference to the drawings and embodiments.

[0065] This embodiment of the present invention discloses a real-time parallel classification method for massive data streams. The method is based on the Storm real-time stream processing framework and is applicable to big data scenarios. Experimental result...

Abstract

The invention discloses a real-time parallel classification method for massive data streams. The method comprises the following steps: Step 1, data Spout; Step 2, filtering and batching Bolt; Step 3, model Bolt; Step 4, local statistics and calculation Bolt; Step 5, evaluation Bolt. To address three of the "4V" characteristics of big data, namely Volume (mass), Velocity (high speed) and Value, together with the demand for efficient processing of massive data, the invention implements a vertically parallel P-VFDT algorithm on the Storm platform. Experiments on large-scale data show that P-VFDT and VFDT achieve similar classification performance, but P-VFDT takes about 12% less time than VFDT in a single-machine multi-core environment and about 8% less time than VFDT in a cluster environment.
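The five steps in the abstract can be illustrated outside Storm with a single-process Python sketch. It is not the inventors' implementation: the synthetic data source, the round-robin attribute split across workers, the root-only tree, and the placement of the split test in the model stage are all assumptions, and every function name is hypothetical. The split rule is the standard VFDT one: split when the information-gain gap between the two best attributes exceeds the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), with R = log2(number of classes).

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple

Record = Tuple[List[int], int]  # (binary attribute values, class label)
NUM_ATTRS, NUM_CLASSES = 4, 2  # hypothetical toy dimensions


def data_spout(n: int, seed: int = 0) -> Iterator[Record]:
    """Step 1 (stand-in for a Spout): synthetic records whose label equals attribute 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(NUM_ATTRS)]
        yield x, x[0]


def filter_and_batch(records: Iterator[Record], batch_size: int) -> Iterator[List[Record]]:
    """Step 2: drop malformed records and group the rest into fixed-size batches."""
    batch: List[Record] = []
    for x, y in records:
        if len(x) != NUM_ATTRS or y not in range(NUM_CLASSES):
            continue
        batch.append((x, y))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def entropy(counts: Dict[int, int]) -> float:
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def info_gain(value_class: Dict[int, Dict[int, int]], class_counts: Dict[int, int]) -> float:
    """Information gain of one attribute from its sufficient statistics n[value][class]."""
    total = sum(class_counts.values())
    remainder = sum(sum(cc.values()) / total * entropy(cc) for cc in value_class.values())
    return entropy(class_counts) - remainder


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """VFDT split-confidence bound: sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def local_statistics(batch: List[Record], attrs: List[int], stats) -> None:
    """Step 4: one worker updates the counts for its own attribute subset only."""
    for x, y in batch:
        for a in attrs:
            stats[a][x[a]][y] += 1


def run_pipeline(n: int = 2000, batch_size: int = 100, num_workers: int = 2,
                 delta: float = 1e-6) -> Tuple[Optional[int], int, float]:
    groups = [list(range(w, NUM_ATTRS, num_workers)) for w in range(num_workers)]
    worker_stats = [{a: defaultdict(lambda: defaultdict(int)) for a in g} for g in groups]
    class_counts: Dict[int, int] = defaultdict(int)
    split_attr: Optional[int] = None
    split_at = correct = seen = 0
    for batch in filter_and_batch(data_spout(n), batch_size):
        for x, y in batch:
            # Step 3 (model): predict before training (prequential test-then-train).
            # Simplification: after the split, reuse the root's counts for the split
            # attribute instead of growing fresh leaves with their own statistics.
            if split_attr is None:
                counts = class_counts
            else:
                counts = worker_stats[split_attr % num_workers][split_attr][x[split_attr]]
            pred = max(range(NUM_CLASSES), key=lambda c: counts.get(c, 0))
            correct += pred == y
            class_counts[y] += 1
        # Step 4: in Storm these updates would run in parallel Bolt tasks.
        for attrs, stats in zip(groups, worker_stats):
            local_statistics(batch, attrs, stats)
        seen += len(batch)
        if split_attr is None:
            # Step 3 (model): merge per-attribute gains and apply the Hoeffding test.
            gains = sorted(((info_gain(s[a], class_counts), a)
                            for attrs, s in zip(groups, worker_stats) for a in attrs),
                           reverse=True)
            (g1, a1), (g2, _) = gains[0], gains[1]
            if g1 - g2 > hoeffding_bound(math.log2(NUM_CLASSES), delta, seen):
                split_attr, split_at = a1, seen
    # Step 5 (evaluate): prequential accuracy over the whole stream.
    return split_attr, split_at, correct / seen


if __name__ == "__main__":
    print(run_pipeline())
```

Because the global best two attributes are always among the union of each worker's local top two, a distributed version could send only two (gain, attribute) pairs per worker to the model stage rather than the full counts; whether P-VFDT does this is not stated in the source.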

Description

Technical Field

[0001] The invention relates to the technical field of the Internet, and in particular to a real-time parallel classification method for massive data streams.

Background Art

[0002] With the continuous development of the Internet and data processing technology, applications such as search engines, e-commerce, Weibo and instant messaging provide people with massive amounts of information and convenient services, enriching their lives while greatly improving their work efficiency and enjoyment of life. In using these applications and services, people also generate various types of data, such as search requests sent to search engines, product views on e-commerce websites, comments and reposts on Weibo, and online chats. Accumulated over time, these data have reached a very large scale and continue to grow at a relatively high rate. The "4V" characteristics of big data - Volume (large amount), Veloci...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/2455
CPC: G06F16/24568
Inventors: Li Chuan (李川), Li Wanglong (李旺龙)
Owner: SICHUAN UNIV