Data processing method and device

A data processing technique, applied in the field of data processing, that addresses the problems that existing data skew optimization schemes are tedious to implement, slow to compute, and can therefore lose their purpose.

Pending Publication Date: 2021-06-04
THE FOURTH PARADIGM BEIJING TECH CO LTD

AI Technical Summary

Problems solved by technology

Data skew occurs when a disproportionately large amount of data is allocated to one computing node, so that this data is processed much more slowly than the average and the entire calculation slows down.
[0004] Because data skew arises from different causes, existing data skew optimization schemes are cumbersome to implement, and the schemes themselves consume considerable resources (such as time), which can defeat the purpose of the optimization.
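
To make the skew described above concrete, the following is a minimal sketch, not the method of the invention. It assumes integer keys and a hypothetical modulo partitioner, and the names `partition_loads` and `skew_ratio` as well as the sample data are made up for illustration.

```python
def partition_loads(keys, num_nodes):
    """Count how many records each node receives under key-based partitioning."""
    loads = [0] * num_nodes
    for key in keys:
        # Hypothetical partitioner: integer key modulo the node count.
        loads[key % num_nodes] += 1
    return loads


def skew_ratio(loads):
    """Ratio of the busiest node's load to the mean load (1.0 means balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Made-up data: 10 records for each key 0..8, plus 910 extra records for key 7.
keys = [7] * 910 + [k for k in range(9) for _ in range(10)]
loads = partition_loads(keys, 4)
print(loads, skew_ratio(loads))
```

Because every record with the same key lands on the same node, the node that receives the dominant key carries most of the work, and the whole job waits for it.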


Embodiment Construction

[0037] To enable those skilled in the art to better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific implementations.

[0038] A window-based feature calculation task is one in which the feature calculation for each piece of data in the data set depends on other data within a window. The window denotes the set of data that must be used when calculating features for a given piece of data.
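
As a rough illustration of a window-based feature, the sketch below assumes that a row's window is the set of all rows sharing its key. That window definition, the function name `window_sum`, and the field names are assumptions chosen for the example, not details taken from the description.

```python
from collections import defaultdict


def window_sum(rows, key_field, value_field):
    """For each row, sum value_field over every row that shares its key (the window)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_field]] += row[value_field]
    # Each row's feature depends on other rows in its window, not only on itself.
    return [totals[row[key_field]] for row in rows]


# Made-up sample rows.
rows = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
]
features = window_sum(rows, "user", "amount")
print(features)
```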

[0039] A feature calculation task based on an ordered window is one in which the feature calculation for each piece of data in the data set depends on other data within an ordered window. The ordered window denotes the ordered range of data that must be used when calculating features for a given piece of data. The ordered window may be, but is not limited to...
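
One possible reading of an ordered window is sketched below: rows with the same key are ordered by a timestamp, and each row's window covers itself plus a fixed number of preceding rows. The row-count window, the `(key, ts, value)` layout, and the name `ordered_window_sum` are assumptions for illustration only, and timestamps are assumed unique within a key.

```python
from collections import defaultdict


def ordered_window_sum(rows, preceding):
    """rows: iterable of (key, ts, value).

    Returns {(key, ts): sum of value over the row itself and up to
    `preceding` earlier rows with the same key, ordered by ts}.
    """
    by_key = defaultdict(list)
    for key, ts, value in rows:
        by_key[key].append((ts, value))

    features = {}
    for key, items in by_key.items():
        items.sort()  # order the window by timestamp
        for i, (ts, _) in enumerate(items):
            start = max(0, i - preceding)
            features[(key, ts)] = sum(v for _, v in items[start:i + 1])
    return features


# Made-up sample rows, deliberately out of order.
rows = [("a", 3, 30), ("a", 1, 10), ("a", 2, 20), ("b", 1, 5)]
features = ordered_window_sum(rows, preceding=1)
print(features)
```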

Abstract

The invention provides a data processing method, device, and storage medium. A first operation statement is generated based on an analysis result of the feature calculation task; the first operation statement is used to compute statistics on the data set to be processed. A second operation statement is generated based on the statistical result obtained by executing the first operation statement; it is used to mark the data in the data set, where the mark of a piece of data identifies the block in which that data is located. A third operation statement is generated based on the marking result obtained by executing the second operation statement; it is used to supplement the dependent data that a block lacks, where dependent data is data that the feature calculation for data in the block depends on but that does not exist in the block. Executing the third operation statement expands the data set, and the data in the expanded data set is divided into a plurality of blocks. In this way, data skew that may arise when executing the feature calculation task is resolved while the correctness of the calculation logic for the data in each partition is preserved.
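
The three stages in the abstract (statistics, marking, supplementing dependent data) can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: it uses plain Python functions rather than generated operation statements, handles the rows of a single key, assumes unique timestamps, and assumes a window of the current row plus `preceding` earlier rows. The names `split_with_dependents` and `block_features` are hypothetical.

```python
def split_with_dependents(values, num_blocks, preceding):
    """values: list of (ts, value) for one key.

    Returns a list of blocks; each block is a list of (ts, value, is_dependent).
    """
    ordered = sorted(values)
    # Stage 1 (statistics): count the rows and derive a block size.
    n = len(ordered)
    block_size = -(-n // num_blocks)  # ceiling division
    # Stage 2 (marking): tag every row with the id of its block.
    marked = [(i // block_size, ts, v) for i, (ts, v) in enumerate(ordered)]
    # Stage 3 (expanding): copy the rows each block depends on but does not own.
    blocks = []
    for b in range(num_blocks):
        first = b * block_size
        if first >= n:
            break
        deps = [(ts, v, True) for _, ts, v in marked[max(0, first - preceding):first]]
        own = [(ts, v, False) for blk, ts, v in marked if blk == b]
        blocks.append(deps + own)
    return blocks


def block_features(block, preceding):
    """Windowed sum per owned row; dependent rows only feed other rows' windows."""
    out = {}
    for i, (ts, _, is_dependent) in enumerate(block):
        if is_dependent:
            continue
        start = max(0, i - preceding)
        out[ts] = sum(v for _, v, _ in block[start:i + 1])
    return out


# Made-up data: timestamps 1..6 with values 10..60.
values = [(t, t * 10) for t in range(1, 7)]
blocks = split_with_dependents(values, num_blocks=3, preceding=2)
merged = {}
for block in blocks:
    merged.update(block_features(block, preceding=2))
print(merged)
```

Each block can be processed on a different node because it carries copies of the rows its windows reach back into, so the merged result matches what a single unsplit pass would produce.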

Description

Technical field

[0001] The present invention generally relates to the field of data processing, and more specifically to a data processing method, device, and storage medium.

Background art

[0002] In the field of data processing, it is usually necessary to perform feature calculations on data in order to extract features that represent specific meanings. Taking a feature calculation task based on an ordered window as an example, the feature calculation for each piece of data can be performed on the data within the ordered window corresponding to that piece of data.

[0003] When performing feature calculation tasks, especially feature calculation tasks based on ordered windows, data skew should be avoided as far as possible. Data skew means that a large amount of data is allocated to a computing node to perform calculations, making the calculation speed of these data much lower than the average calculation speed, result...

Application Information

IPC(8): G06F16/242, G06F16/2458, G06F16/22
CPC: G06F16/2433, G06F16/2462, G06F16/2465, G06F16/2282, G06F2216/03
Inventors: 王子贤, 陈迪豪, 包新启, 王太泽
Owner: THE FOURTH PARADIGM BEIJING TECH CO LTD