Method and device for sequentially segmenting data on Spark platform

A technique for time series data on the Spark platform. It addresses the inability to split data according to its sequence, relaxing rigid requirements and avoiding network transmission.

Pending Publication Date: 2020-10-02
INDUSTRIAL AND COMMERCIAL BANK OF CHINA

AI Technical Summary

Problems solved by technology

[0005] In the prior art, the Spark platform, which relies on parallel computing, only provides a function for splitting data randomly and cannot split data in order. It is therefore difficult to split time-series data sequentially on the Spark platform.
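To make the problem concrete, the sketch below uses plain Python (no Spark) to mimic a weighted, per-record random split and compares it with a sequential cut. The helper names `random_split` and `is_contiguous` are hypothetical and chosen only for illustration; they are not Spark APIs, and the sampling scheme is an assumption about how a random split typically behaves.

```python
import random


def random_split(records, weights, seed):
    """Assign each record to a split by an independent random draw."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative upper bounds of each split's share of [0, 1).
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    splits = [[] for _ in weights]
    for record in records:
        x = rng.random()
        for i, bound in enumerate(bounds):
            # The last split also catches any floating-point remainder.
            if x < bound or i == len(bounds) - 1:
                splits[i].append(record)
                break
    return splits


def is_contiguous(indices):
    """True when the indices form one unbroken run of time steps."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


# Time steps 0..99 stand in for an ordered time series.
series = list(range(100))
rand_train, rand_test = random_split(series, [0.7, 0.3], seed=42)
seq_train, seq_test = series[:70], series[70:]
```

Every record still lands in exactly one random split, but the two random splits interleave in time, so neither is a usable time segment. The sequential cut keeps each segment as one unbroken run.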


Embodiment Construction

[0038] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0039] Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspe...

Abstract

The invention discloses a method and device for sequentially segmenting data on a Spark platform. The method comprises: acquiring distribution information of a data set of time series data on the Spark platform; obtaining a first data segmentation point and a second data segmentation point, which are determined in advance according to a target data segment in the preset time series data; determining the two-dimensional data set coordinates of the first data segmentation point and of the second data segmentation point according to the distribution information; and, according to those two-dimensional coordinates, determining the data in the data set that lies between the first and second data segmentation points and generating a data set corresponding to the target data segment. The invention provides a data set segmentation method with low memory and network consumption on a Spark platform.
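The steps in the abstract can be sketched in plain Python, with a list of lists standing in for the partitions of a distributed data set. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the "distribution information" is the record count of each partition, that a "two-dimensional data set coordinate" is a (partition index, offset within partition) pair, and that the target segment is the half-open range between the two points. The function names `partition_sizes`, `to_coordinate`, and `slice_between` are hypothetical. On Spark itself, per-partition counts could plausibly be gathered with an operation such as `mapPartitionsWithIndex`, but that wiring is not shown here.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_sizes(partitions):
    """Distribution information: the record count of each partition."""
    return [len(p) for p in partitions]


def to_coordinate(global_index, sizes):
    """Map a global record index to (partition index, offset in partition)."""
    if not 0 <= global_index < sum(sizes):
        raise IndexError("segmentation point outside the data set")
    ends = list(accumulate(sizes))  # cumulative record counts
    # First partition whose cumulative end lies beyond the index;
    # empty partitions are skipped automatically.
    part = bisect_right(ends, global_index)
    start = ends[part] - sizes[part]
    return part, global_index - start


def slice_between(partitions, first, second):
    """Return the records with global index in [first, second).

    The result keeps one slice per touched partition, so each slice
    could be cut locally without moving records between partitions.
    """
    if first >= second:
        return []
    sizes = partition_sizes(partitions)
    p1, o1 = to_coordinate(first, sizes)
    p2, o2 = to_coordinate(second - 1, sizes)  # last record included
    result = []
    for p in range(p1, p2 + 1):
        lo = o1 if p == p1 else 0
        hi = o2 + 1 if p == p2 else sizes[p]
        result.append(partitions[p][lo:hi])
    return result


# Nine ordered records spread over three partitions.
partitions = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
segment = slice_between(partitions, 2, 6)
```

Because only the per-partition counts need to be collected centrally, the records themselves can stay where they are. This is consistent with the low memory and network consumption that the abstract claims, though the patent's actual mechanism may differ.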

Description

Technical field

[0001] The present invention relates to the Spark platform, in particular to a method and device for sequentially segmenting data on the Spark platform.

Background technique

[0002] Financial time series data refers to the values that financial random variables take in chronological order. Financial time series data have distinctive statistical characteristics, such as volatility clustering and leverage effects. To describe these characteristics well, it is very important to build a reasonable statistical model of the financial time series data. Before statistical modeling, the financial time series data must go through a series of data processing steps. Segmenting the data in time order is one of these steps, yielding one or more pieces of data.

[0003] Apache Spark is an analytical engine for large-scale data processing, used to build large-scale, low-latency data analysis applicatio...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/27, G06F16/22, G06F16/2458
CPC: G06F16/278, G06F16/2282, G06F16/2474
Inventor: 饶彭彦
Owner: INDUSTRIAL AND COMMERCIAL BANK OF CHINA