
A power grid timing sequence large data parallel loading method

A method applying time-series and big-data technology, classified under program loading/starting and program-control devices. It addresses problems such as the inability to load in parallel, excessive loading time, and client waiting, and achieves reduced loading time and efficient parallel loading.

Active Publication Date: 2015-03-11
CHINA REALTIME DATABASE
Cites 5 · Cited by 16

AI Technical Summary

Problems solved by technology

Loading a large volume of historical time-series data from a single client cannot exploit the cluster's distributed concurrent-processing capability and takes a long time to complete. With typical multi-client parallel loading, on the other hand, multiple clients read and write the index mapping table file simultaneously during loading, producing a large number of disk IO conflicts; together with the network communication overhead between different cluster nodes, this prevents genuinely parallel loading and causes waiting. A preliminary search found no existing technical solution to these problems.
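A minimal illustration of the remedy implied above: if every client consults one shared index-mapping file, all reads and writes funnel through it and collide on disk. Partitioning the index so each client works against its own disjoint shard removes the contention. The function name and the hash-based sharding scheme below are assumptions for illustration, not details taken from the patent.

```python
# Hypothetical sketch: map each measurement point to an index-mapping
# shard so that clients loading disjoint point ranges never touch the
# same index file. The modulo scheme is an assumed, simple choice.

def shard_for_point(point_id: int, num_shards: int) -> int:
    """Return the index-mapping shard responsible for a measurement point."""
    return point_id % num_shards

# Example: a 5-shard index on a 5-node cluster.
print(shard_for_point(123_456, 5))  # each client reads/writes only its shard
```

With disjoint shards, the disk IO conflicts described above are confined to within a single client's workload rather than shared across all of them.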




Detailed Description of the Embodiments

[0018] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.

[0019] This embodiment describes the present invention through an application example in a power-grid business scenario. Assume a cluster based on Hadoop and HBase consisting of 5 machines, with a high-availability (HA) configuration applied to the cluster. The configuration of each machine is shown in Table 1. In this scenario there are 600,000 measurement points, the data collection frequency is 60 frames/min, and each collected data record is about 70 bytes, so the 600,000 measurement points generate about 3.3 TB of data per day (24 hours). The implementation of the method is described below, taking the loading of this 3.3 TB of data into the big data system as an example.
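The 3.3 TB figure above can be checked by back-of-envelope arithmetic from the stated parameters (it matches when the "T" is read as tebibytes, i.e. 2^40 bytes):

```python
# Verify the daily data volume quoted in the embodiment.
POINTS = 600_000        # measurement points
RECORDS_PER_MIN = 60    # collection frequency: 60 frames/min per point
RECORD_BYTES = 70       # approximate size of one collected record

records_per_day = POINTS * RECORDS_PER_MIN * 60 * 24
daily_bytes = records_per_day * RECORD_BYTES
daily_tib = daily_bytes / 2**40   # tebibytes

print(f"{records_per_day:,} records/day, {daily_tib:.1f} TiB/day")
# ≈ 51.8 billion records/day, 3.3 TiB/day
```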

[0020]

[0021] Table 1 Configuration of each machine in the cluster

[0022] The flow chart of the inventive method is shown in Figure 2...



Abstract

The present invention discloses a parallel loading method for power-grid time-series big data, solving the waiting that occurs when multiple clients fail to load a large volume of historical time-series data in parallel. The method partitions the index mapping table, pre-partitions the historical time-series data storage table according to the volume of data to be loaded, and preserves the data locality of the data to be loaded according to the partition ranges of the storage table distributed across the data nodes. These steps effectively reduce both the disk IO conflicts encountered when multiple clients read the index mapping data file during parallel loading and the network communication overhead between different cluster nodes, and they avoid the performance problems caused by overloading a single node. The method fully exploits distributed parallel processing to greatly reduce the time needed to load large volumes of historical time-series data.
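The pre-partitioning step in the abstract can be sketched as computing split keys for the storage table before any data is loaded, so each client writes into a disjoint, pre-created region range instead of hot-spotting a single region. The key format, function names, and the even split over the point-ID key space below are assumptions for illustration, not details from the patent's claims.

```python
# Hypothetical sketch: pre-compute region split keys for an HBase-style
# storage table so the historical data table is partitioned before the
# parallel load begins. Zero-padded point IDs as row-key prefixes are an
# assumed key design.

def compute_split_keys(num_points: int, num_regions: int) -> list[bytes]:
    """Evenly partition the point-ID key space into num_regions ranges."""
    step = num_points // num_regions
    # Region i covers point IDs [i*step, (i+1)*step); the boundaries
    # between regions become the table's split keys.
    return [f"{i * step:08d}".encode() for i in range(1, num_regions)]

# Example: 600,000 points spread over 15 pre-split regions (e.g. 3 per
# node on the 5-node cluster from the embodiment).
splits = compute_split_keys(num_points=600_000, num_regions=15)
print(len(splits), splits[0])  # 14 boundary keys; first is b'00040000'
```

Each client is then assigned one or more region ranges that are local to the node holding them, which is how the method keeps both the disk IO and the cross-node network traffic disjoint per client.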

Description

Technical Field

[0001] The invention relates to a data parallel loading method. It belongs to the fields of big data processing and distributed real-time databases, and is particularly suited to the parallel loading of massive historical time-series data in smart grids and the Internet of Things.

Background

[0002] With the continuing development of industrialization and informatization, large-scale process-industry enterprises generate ever larger volumes of historical time-series data as production becomes information-driven. Taking the power system as an example, the number of measurement points keeps growing and is expected to reach tens of millions or even more than one hundred million; this places higher demands on the processing scale and processing speed of the real-time database.

[0003] Constrained by its traditional software architecture, the traditional real-time database can no longer meet the ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F9/445
Inventors: 王远, 袁军, 包建国, 胡健, 张珂珩
Owner CHINA REALTIME DATABASE