Streaming large-scale power data analysis method based on Spark Streaming

A technique for large-scale power data, applied in the fields of electrical digital data processing, data processing applications, and digital information retrieval, that achieves fast and effective clustering.

Pending Publication Date: 2019-07-23
STATE GRID ZHEJIANG ELECTRIC POWER +1

AI Technical Summary

Problems solved by technology

At present, there are few research results on data streams and distributed computing, and the field is still at an early exploratory stage.



Examples


Embodiment

[0077] The conventional K-means clustering algorithm and the method of the present invention were both evaluated on a UCI data set; the clustering accuracy curves are shown in Figure 2. The figure shows clearly that, compared with conventional K-means, the method of the present invention clusters better and achieves higher clustering accuracy.

[0078] The conventional CluStream algorithm and the method of the present invention were both evaluated on the users' real data set; the results are shown in Figure 3, where panel (a) compares running time and panel (b) compares SSE (sum of squared errors). The figure shows that CluStream takes less time to run on the data set, because both its online and offline clustering stages are implemented with simple k-means and the number of iterations is capped. Since the method of the present in...
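The SSE metric used in the comparison above can be illustrated with a short sketch. This is a plain-Python illustration, not the patent's Spark implementation; the function name `sse` and the sample points are assumptions made here for the example. It sums, over all points, the squared Euclidean distance from each point to its nearest cluster center.

```python
def sse(points, centers):
    """Sum of squared errors: squared distance from each point to its nearest center.

    `points` and `centers` are sequences of equal-length numeric tuples.
    (Hypothetical helper for illustration; not taken from the patent text.)
    """
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest center.
        total += min(
            sum((x - c) ** 2 for x, c in zip(p, center))
            for center in centers
        )
    return total


# Illustrative data only: three 2-D points and two candidate centers.
sample_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
sample_centers = [(1.0, 0.0), (10.0, 0.0)]
sample_sse = sse(sample_points, sample_centers)
```

A lower SSE means the points sit closer to their assigned centers, which is why it is commonly reported alongside running time when two clustering methods are compared.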


Abstract

The invention discloses a streaming large-scale power data analysis method based on Spark Streaming, comprising the following steps: (1) performing similarity search on the online power data stream using an SS-tree so as to cluster the power data; and (2) clustering the offline power data stream using an improved Spark parallel K-means clustering method, in which the initial cluster centers and the initial number of clusters are taken from the clustering result of step 1. Experimental evaluation on a UCI data set shows that the method outperforms the traditional K-means clustering algorithm. Tests on the users' real data set further show that user power data can be clustered quickly and effectively.
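The two-step idea in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the online stage here uses a linear nearest-center scan with a distance threshold as a stand-in for the SS-tree similarity search, and the offline stage is ordinary single-machine K-means rather than the improved Spark parallel version. The names `online_seed_centers`, `kmeans`, and `radius`, and the sample data, are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_seed_centers(stream, radius):
    """Stage 1 stand-in: one pass over the stream.

    Each point joins the nearest existing center if it lies within `radius`
    (updating that center's running mean); otherwise it starts a new center.
    A linear scan replaces the SS-tree similarity search (assumption).
    """
    centers, counts = [], []
    for p in stream:
        best, best_d = None, float("inf")
        for i, c in enumerate(centers):
            d = dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= radius:
            counts[best] += 1
            n = counts[best]
            centers[best] = tuple(c + (x - c) / n for c, x in zip(centers[best], p))
        else:
            centers.append(tuple(p))
            counts.append(1)
    return centers


def kmeans(points, init_centers, max_iter=20):
    """Stage 2 stand-in: Lloyd's K-means seeded with the stage-1 centers.

    The number of clusters is simply len(init_centers).
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[idx].append(p)
        # Recompute each center as the mean of its group; keep empty groups unchanged.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Illustrative data only: two well-separated groups of 2-D readings.
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2), (4.9, 5.2)]
seeds = online_seed_centers(stream, radius=1.0)
final_centers = kmeans(stream, seeds)
```

The point of the seeding is that the offline K-means does not need a user-supplied cluster count or random initial centers; both come from the online pass.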

Description

Technical field

[0001] The invention relates to a power data analysis method, in particular to a streaming large-scale power data analysis method based on Spark Streaming.

Background

[0002] In recent years, people all over the world have placed ever higher demands on environmental protection and sustainable development. In this context, how to make electricity consumption behavior intelligent has become a very important research topic. A large amount of basic electricity consumption data has been accumulated. These data are large in volume and high in frequency. At the same time, user power data is generated continuously, and newly generated data better reflects a user's power consumption characteristics. Distributed clustering of user power data makes it possible to offer different incentives to different users. This can help power grid companies understand users' consumption habits and provide users with personalized and differentiated services. In addition, it h...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06Q50/06, G06F16/245, G06K9/62
CPC: G06Q50/06, G06F16/245, G06F18/23213, G06F18/23
Inventor 黄建平钱仲文张旭东夏洪涛王文杨少杰王政陈浩张建松沈思琪正卓凡毛宾一吴敏彦王亿陈显辉黄杰王炎陈耀军沈峰周明磊纪德良
Owner STATE GRID ZHEJIANG ELECTRIC POWER