A distributed traffic big data parallel clustering method suitable for a wide area network

A clustering method based on big data technology, applied in the field of data processing, that addresses the problem that traditional LAN-based parallel clustering cannot be applied directly to structurally complex big data distributed across a wide area network.

Inactive Publication Date: 2019-02-19
洪月华
4 Cites, 8 Cited by
AI Technical Summary

Problems solved by technology

[0002] Big data is stored in a distributed manner in a wide area network environment. Traditional LAN-based parallel clustering requires that this data, which is complex in structure and huge in volume (TB or even PB level), first be moved and concentrated in one place. Because of the cost in time, money, and equipment, this approach is not directly applicable.


Examples

Detailed Description of Embodiments

[0083] The technical solution of the present invention is described more completely and clearly below with reference to specific embodiments and the accompanying drawings. It should be understood that the embodiments described here serve only to explain the present invention and do not limit it. The method of the present invention can also be used to discover group behavior patterns in distributed big data from other industries.

[0084] The present invention uses electric vehicle traffic violation data as an example to illustrate the specific implementation of the technical solution.

[0085] As the number of electric vehicles continues to grow, electric vehicle violations of various kinds keep occurring, creating serious hidden dangers for traffic safety. If the group behavior patterns behind electric vehicle violations can be found, and targeted measures can be taken accordingly, to...

Abstract

The invention discloses a distributed traffic big data parallel clustering method suitable for a wide area network (WAN). The parallel clustering of distributed big data is divided, by time-series cycle, into one historical full-data stage and multiple periodic incremental stages that are executed continuously. First, the max-min distance method is used to optimize the k-means clustering algorithm. Second, a MapReduce-based distributed parallel clustering framework suitable for a WAN is constructed, and within this framework the improved clustering algorithm is optimized again so that it executes in a distributed, parallel fashion across the WAN. The algorithm is then applied to the entire historical data set, from which the characteristic groups and their behavior patterns are extracted. Finally, the clustering results for the historical data are corrected periodically by clustering the incremental big data of each period, dynamically updating existing clusters or generating new ones. Distributed computing avoids copying and moving big data across the WAN and avoids repeated clustering, which reduces data movement costs and improves computational efficiency.
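The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure, whose details are not visible in this excerpt. Everything here is an assumption made for illustration: the function names, the fixed `k` (the max-min distance method is often described with a distance threshold instead), the way per-site seed candidates are merged, and the `new_cluster_radius` rule for opening a new cluster in the incremental stage. The point it tries to show is that only seeds and per-cluster sums and counts leave a site, never the raw records.

```python
import math

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def max_min_seeds(points, k):
    """Max-min distance seeding (fixed-k variant, an assumption): start from
    the first point, then repeatedly add the point whose distance to its
    nearest already-chosen seed is largest."""
    seeds = [tuple(points[0])]
    while len(seeds) < min(k, len(points)):
        far = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(tuple(far))
    return seeds

def map_site(points, centers):
    """Map step, run locally at one site: per-cluster coordinate sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in points:
        j = nearest(p, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def reduce_sites(partials, centers):
    """Reduce step: merge the small per-site summaries into new centers."""
    merged = []
    for j, old in enumerate(centers):
        n = sum(counts[j] for _, counts in partials)
        if n == 0:
            merged.append(old)  # keep an empty cluster's center unchanged
        else:
            merged.append(tuple(
                sum(sums[j][d] for sums, _ in partials) / n
                for d in range(len(old))))
    return merged

def distributed_kmeans(sites, k, iters=20):
    """Historical full-data stage: k-means over data that stays at each site."""
    # Assumption: each site proposes k local seeds; the coordinator applies
    # max-min seeding again to the pooled candidates.
    candidates = [s for pts in sites for s in max_min_seeds(pts, k)]
    centers = max_min_seeds(candidates, k)
    for _ in range(iters):
        new = reduce_sites([map_site(pts, centers) for pts in sites], centers)
        if new == centers:
            break
        centers = new
    final = [map_site(pts, centers)[1] for pts in sites]
    counts = [sum(c[j] for c in final) for j in range(len(centers))]
    return centers, counts

def incremental_update(centers, counts, batch, new_cluster_radius):
    """Incremental stage (hypothetical rule): update the running mean of the
    nearest cluster, or open a new cluster if the point is too far away."""
    centers, counts = [tuple(c) for c in centers], list(counts)
    for p in batch:
        j = nearest(p, centers)
        if math.dist(p, centers[j]) > new_cluster_radius:
            centers.append(tuple(p))
            counts.append(1)
        else:
            counts[j] += 1
            centers[j] = tuple(c + (x - c) / counts[j]
                               for c, x in zip(centers[j], p))
    return centers, counts
```

A call such as `distributed_kmeans([site_a_points, site_b_points], k=2)` returns the merged centers and their member counts; passing those to `incremental_update` with a later batch of records adjusts existing clusters without re-clustering the historical data.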

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and in particular relates to a distributed traffic big data parallel clustering method suitable for wide area networks.

Background

[0002] Big data is stored in a distributed manner in a wide area network environment. Traditional LAN-based parallel clustering requires that this data, which is complex in structure and huge in volume (TB or even PB level), first be moved and concentrated in one place. Because of the cost in time, money, and equipment, this approach is not directly applicable. Sampling to reduce the size of the data and dimensionality reduction to reduce its complexity both affect the accuracy of the clustering results. The traditional LAN-based clustering and mining approach therefore urgently needs to change so that both the efficiency and the accuracy of data clustering can be improved.

[0003] In terms of practical application, the clustering ...

Application Information

IPC(8): G06K9/62, G06F9/50
CPC: G06F9/5066, G06F18/23213
Inventor 洪月华
Owner 洪月华