A data processing method and device

A data processing method and device, applied in the field of data processing, that addresses the many IO operations, long processing time, and low processing efficiency of existing solutions by reducing the number of jobs and IO operations, thereby improving processing efficiency.

Active Publication Date: 2020-11-20
AGRICULTURAL BANK OF CHINA +1

AI Technical Summary

Problems solved by technology

Existing solutions must first compute the data distribution, split the data files and write them to disk, classify and process each data file, and then merge the data files, which requires four jobs in total. Each job that Hadoop starts takes considerable time and incurs additional IO operations, so processing efficiency is low.



Embodiment Construction

[0039] The core idea of the present invention is to group and count the key values in the join field of the left table, and to process the corresponding data records differently in two stages: when the count value does not exceed the record number threshold, and when the count value exceeds it. This reduces the number of jobs and IO operations, thereby improving processing efficiency.
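The grouping step described in [0039] can be illustrated with a minimal sketch. It runs in memory rather than as a Hadoop job, and the names `split_keys_by_count`, `record_threshold`, and the `cust_id` field are hypothetical, chosen only for illustration; the patent text does not specify them.

```python
from collections import Counter


def split_keys_by_count(left_rows, key_field, record_threshold):
    """Count each join-key value in the left table and partition the keys.

    Returns (counts, normal_keys, skewed_keys), where normal_keys have a
    count that does not exceed the threshold and skewed_keys exceed it.
    """
    counts = Counter(row[key_field] for row in left_rows)
    normal_keys = {k for k, n in counts.items() if n <= record_threshold}
    skewed_keys = {k for k, n in counts.items() if n > record_threshold}
    return counts, normal_keys, skewed_keys


# Hypothetical left table: key "A" is heavily repeated, "B" and "C" are not.
left_table = (
    [{"cust_id": "A", "amount": i} for i in range(5)]
    + [{"cust_id": "B", "amount": 7}, {"cust_id": "C", "amount": 9}]
)

counts, normal_keys, skewed_keys = split_keys_by_count(
    left_table, "cust_id", record_threshold=2
)
```

Records whose key falls in `normal_keys` can then take the ordinary join path, while records whose key falls in `skewed_keys` take the separate path for over-threshold keys.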

[0040] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0041] This embod...


Abstract

The invention discloses a data processing method and device. After the identical key values in the join field of the left table are counted, the left table is joined with the table to be joined on the first key values to obtain first data, and the left table and the table to be joined are processed on the second key values to obtain second data; the second data is then joined with a replicated copy of the table to be joined on key values carrying an added second random-number suffix to obtain third data. The first data and the third data together form the final required data set. Compared with the prior art, the data skew problem is handled with only two jobs, which reduces the number of jobs in the process, reduces IO operations, and improves processing efficiency.
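The abstract's flow resembles what is commonly called a salted join. The sketch below is one possible reading of it, not the patented implementation: it assumes string keys, a `#` separator for the random suffix, a fixed number of `replicas`, and an in-memory hash join standing in for the MapReduce jobs. All function and field names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def hash_join(left_rows, right_rows, key):
    """Plain inner join of two lists of dicts on `key`."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[key]].append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[key], [])]


def skew_aware_join(left_rows, right_rows, key, threshold, replicas, seed=0):
    """Join that spreads over-threshold keys across `replicas` salted keys."""
    rng = random.Random(seed)
    counts = Counter(l[key] for l in left_rows)
    hot = {k for k, n in counts.items() if n > threshold}

    # First data: keys at or under the threshold are joined directly.
    first = hash_join([l for l in left_rows if l[key] not in hot], right_rows, key)

    # Second data: left records with hot keys get a random suffix (assumed format).
    second = [
        {**l, key: f"{l[key]}#{rng.randrange(replicas)}"}
        for l in left_rows if l[key] in hot
    ]
    # Replicated right table: one copy of each hot-key record per suffix.
    copied = [
        {**r, key: f"{r[key]}#{i}"}
        for r in right_rows if r[key] in hot
        for i in range(replicas)
    ]
    # Third data: join on the salted key, then strip the suffix again.
    third = hash_join(second, copied, key)
    for row in third:
        row[key] = row[key].rsplit("#", 1)[0]

    return first + third


left = [{"cust_id": "A", "amount": i} for i in range(5)] + [{"cust_id": "B", "amount": 99}]
right = [{"cust_id": "A", "name": "Alice"}, {"cust_id": "B", "name": "Bob"}]
joined = skew_aware_join(left, right, "cust_id", threshold=2, replicas=3)
```

Because every suffix value has a matching replicated record on the right side, each left record still finds exactly the rows it would have matched in an ordinary join, while the hot key's records are spread over several distinct join keys.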

Description

Technical field

[0001] The present invention relates to the field of data processing, and more specifically, to a data processing method and device.

Background

[0002] At present, Hadoop solutions are often used to meet the growing demand for massive data processing. The core of the Hadoop framework is HDFS (Hadoop Distributed File System) and MapReduce. HDFS provides storage for massive data. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with very large data sets.

[0003] MapReduce provides computation for massive amounts of data. MapReduce is a programming model for parallel operations on large-scale data sets (greater than 1 TB). Map (mapping) and Reduce (reduction) are the main ideas of MapReduce, both borrowed from functional programming languages, as well as features borrowe...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/22, G06F16/27, G06F16/182
Inventor: 孟洋, 郭会, 韩大志
Owner: AGRICULTURAL BANK OF CHINA