Data file processing method and device

A data file processing technology, applied in the field of data processing, that addresses the long processing time, poor timeliness, and processing difficulty associated with very large data files, and improves the efficiency of data processing.

Active Publication Date: 2016-08-31
AGRICULTURAL BANK OF CHINA

AI Technical Summary

Problems solved by technology

For a data file with a huge amount of data, storing and importing the data during data interaction consumes a lot of time, which makes processing difficult and reduces timeliness.



Examples


Specific embodiments

[0088] For example, if the HDFS cluster has 10 nodes, the original data file can be split into 10 sub-files. The size of each sub-file and the upper and lower limits of its value range are determined from the remaining storage space of each node and the range distribution of the specific key values. For example, suppose the primary key IDs of 10,000 records in a bank transaction flow table are distributed between 1,000 and 9,999, and 9,000 of those records have IDs from 1,000 to 3,000. These 9,000 records can then be split into 9 sub-files, and the records with IDs from 3,000 to 9,000 form one sub-file. The number of sub-files, the size of each sub-file, and the placement of each sub-file on a node with suitable capacity according to its size are together referred to as the storage strategy. The rules for how to split the original data file are referred to as the split strategy.
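The example in [0088] can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`equal_count_boundaries`, `split_by_key`, `assign_to_nodes`), the equal-record-count cut rule, and the greedy placement rule are all assumptions chosen for illustration, and no HDFS API is called.

```python
from bisect import bisect_right


def equal_count_boundaries(keys, n_subfiles):
    """Pick cut points so each key range holds roughly the same number of records.

    Assumed split strategy: equal record counts per sub-file.
    """
    ordered = sorted(keys)
    step = len(ordered) / n_subfiles
    cuts = []
    for i in range(1, n_subfiles):
        cut = ordered[int(i * step)]
        # Skip duplicate cut points so the ranges stay strictly increasing.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def split_by_key(records, key_of, cuts):
    """Route each record to the sub-file whose key range contains its key."""
    subfiles = [[] for _ in range(len(cuts) + 1)]
    for rec in records:
        subfiles[bisect_right(cuts, key_of(rec))].append(rec)
    return subfiles


def assign_to_nodes(subfile_sizes, free_space):
    """Assumed storage strategy: largest sub-file first, onto the node
    that currently has the most remaining space."""
    remaining = dict(free_space)
    placement = {}
    by_size = sorted(range(len(subfile_sizes)), key=lambda i: -subfile_sizes[i])
    for idx in by_size:
        node = max(remaining, key=remaining.get)
        if remaining[node] < subfile_sizes[idx]:
            raise ValueError(f"no node has room for sub-file {idx}")
        placement[idx] = node
        remaining[node] -= subfile_sizes[idx]
    return placement


# Synthetic keys shaped like the example: 9,000 IDs packed into
# 1,000..2,999 and 1,000 IDs spread over 3,000..9,993.
keys = [1000 + (i % 2000) for i in range(9000)] + [3000 + 7 * i for i in range(1000)]
cuts = equal_count_boundaries(keys, 10)
subfiles = split_by_key(keys, lambda k: k, cuts)
placement = assign_to_nodes([5, 3, 2], {"a": 6, "b": 6})
```

With this synthetic data, the dense ID range is cut into nine narrow ranges and the sparse range becomes a single wide one, which mirrors the 9 + 1 split described above.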

[0089] It should be noted ...


Abstract

The invention discloses a data file processing method and device. The data file processing method includes: searching the original data file according to a defined search field and collecting the specific key values that match the search field; analyzing the specific key values to obtain their range distribution; determining a file storage strategy and a file splitting strategy based on the usage of cluster resources in a Hadoop data storage environment; splitting the original data file into a plurality of sub-files according to the file splitting strategy; and finally storing the sub-files on different nodes of an HDFS cluster. The method and device achieve distributed storage of data files. Because the sub-files are stored in a distributed manner, the data file can be operated on by multiple threads, so that several sub-files can be processed at the same time and the efficiency of data processing is improved.
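The first two steps of the abstract (collecting key values by search field and obtaining their range distribution) might look like the following Python sketch. The CSV layout, the column name `id`, the fixed bucket width, and the helper names `collect_key_values` and `range_distribution` are assumptions made for illustration; the source does not specify a file format or a bucketing rule.

```python
import csv
import io
from collections import Counter


def collect_key_values(lines, search_field):
    """Read delimited records and collect the integer value of the search field."""
    reader = csv.DictReader(lines)
    return [int(row[search_field]) for row in reader]


def range_distribution(keys, bucket_width):
    """Count how many keys fall into each fixed-width bucket.

    Each bucket is identified by its lower bound.
    """
    counts = Counter((k // bucket_width) * bucket_width for k in keys)
    return dict(sorted(counts.items()))


# Tiny in-memory stand-in for an original data file (hypothetical columns).
sample = io.StringIO("id,amount\n1000,5\n1500,7\n2999,1\n3000,2\n9999,9\n")
keys = collect_key_values(sample, "id")
dist = range_distribution(keys, 1000)
```

A histogram like `dist` shows where the key values cluster, which is the information a splitting strategy needs in order to choose range boundaries.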

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a data file processing method and device.

Background

[0002] At present, ultra-large-scale data files, such as transaction flow data in a bank transaction system, may reach the TB level in data volume. In the prior art, such an ultra-large-scale data file is usually stored as a whole, as a single large data file. For a data file with such a huge amount of data, storing and importing the data during data interaction consumes a lot of time, which makes processing difficult and reduces timeliness.

[0003] Moreover, since the data table is stored as a whole in a single data file, operations on such a huge data file can only be single-threaded, so processing the data file also consumes a lot of time.

Summary of the invention

[0004] In view of this, the present invention provides a data file processing method ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/113, G06F16/182
Inventor: 杨声钢, 李晓轩, 和宏涛, 金鹏
Owner: AGRICULTURAL BANK OF CHINA