File merging method and device for big data platform

A file merging technology for big data platforms, applied in the computer field. It addresses three problems: the number of small files is not properly reduced, computing resources cannot be requested dynamically, and cluster computing resources are wasted. Its effects are improved file-merging efficiency, resource savings, and more rational allocation of resources.

Active Publication Date: 2021-05-07
CHINA UNITECHS

AI Technical Summary

Problems solved by technology

[0003] However, existing file merging schemes can only be scheduled by time. This maintenance mode has several disadvantages. First, the development work is fragmented and costly. Second, the schedule cannot be arranged according to the actual state of the data: there may be few small files when the task starts, which wastes cluster computing resources, or new files may be written to the directory while the task is running, so the small-file problem is not properly resolved. Third, the computing resources for each file-processing run cannot be requested dynamically according to the actual situation.

Detailed Description of Embodiments

[0030] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions are intended to explain the present invention, not to limit it.

[0031] Figure 1 is a schematic flowchart of a file merging method for a big data platform according to an embodiment of the present invention. As shown in Figure 1, the file merging method of some embodiments may include:

[0032] Step S110: monitor directory changes on the big data platform, and determine whether the number of files in a changed directory has changed;

[0033] Step S120: when the number of files in the changed directory has changed, group the files with similar characteristics in that directory;

[0034] ...
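
Steps S110 and S120 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the snapshot dictionaries, the function names `changed_directories` and `group_by_characteristic`, and the use of the file suffix as the "similar characteristic" are all hypothetical, since the text does not specify how changes are detected or which characteristics are compared.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def changed_directories(previous, current):
    """Return directories whose file count differs between two snapshots.

    Each snapshot is assumed to map a directory path to its file count.
    Directories that are new in `current` are reported as changed;
    directories that disappeared are ignored in this sketch.
    """
    return sorted(d for d in current if current[d] != previous.get(d))


def group_by_characteristic(file_paths, key=lambda p: PurePosixPath(p).suffix):
    """Group file paths by a characteristic.

    The default key (file suffix) is an assumed stand-in for whatever
    "similar characteristics" the method actually compares.
    """
    groups = defaultdict(list)
    for path in file_paths:
        groups[key(path)].append(path)
    return dict(groups)


# Hypothetical snapshots taken before and after a monitoring interval.
before = {"/data/a": 3, "/data/b": 5}
after = {"/data/a": 3, "/data/b": 7, "/data/c": 1}
changed = changed_directories(before, after)

# Hypothetical listing of one changed directory.
groups = group_by_characteristic(
    ["/data/b/x.orc", "/data/b/y.orc", "/data/b/z.txt"]
)
```

Passing a different `key` function swaps in another grouping rule (for example, a partition name or a schema identifier) without changing the rest of the sketch.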


Abstract

The present invention provides a file merging method and device for a big data platform. The method includes: monitoring directory changes on the big data platform and determining whether the number of files in a changed directory has changed; when the number of files has changed, grouping the files with similar characteristics in the changed directory; determining whether the files of a group include a set number of small files, that is, files smaller than an integer multiple of a set data block size; and, if such small files exist in the group, acquiring the small files of that group and merging them. This solution reduces the number of small files, optimizes namenode memory usage, and lets the big data platform accommodate more files.
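
The small-file check described in the abstract can be sketched as follows. This is an illustrative reading, not the claimed implementation: the function names, the `multiple` and `min_count` parameters, and the 128 MiB block size are assumptions chosen for the example.

```python
MIB = 1024 * 1024


def select_small_files(sizes, block_size, multiple=1):
    """Return the names of files smaller than multiple * block_size.

    `sizes` is assumed to map a file name to its size in bytes.
    """
    threshold = multiple * block_size
    return sorted(name for name, size in sizes.items() if size < threshold)


def plan_merge(sizes, block_size, multiple=1, min_count=2):
    """Return the small files of one group to merge.

    An empty list means fewer than `min_count` small files were found,
    so no merge is planned for this group.
    """
    small = select_small_files(sizes, block_size, multiple)
    return small if len(small) >= min_count else []


# Hypothetical group of files, with sizes in bytes.
group_sizes = {"a.orc": 10 * MIB, "b.orc": 200 * MIB, "c.orc": 5 * MIB}
to_merge = plan_merge(group_sizes, block_size=128 * MIB)
```

Here `a.orc` and `c.orc` fall below the assumed 128 MiB threshold and are selected for merging, while `b.orc` is left alone. Raising `min_count` to 3 would skip the merge for this group.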

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a file merging method and device for a big data platform.

Background

[0002] In a big data platform such as a Hadoop cluster, the data directory often contains a large number of small files during data analysis. These small files put heavy pressure on the namenode and reduce the computing efficiency of the cluster by several times or even dozens of times. In the prior art, a functional component has to be developed for each group of data directories or each type of target data in order to merge files.

[0003] However, existing file merging schemes can only be scheduled by time. This maintenance mode has several disadvantages: first, the development work is fragmented and costly; second, the schedule cannot be arranged according to the actual state of the data, an...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/16, G06F16/18
Inventor: 毛恒
Owner: CHINA UNITECHS