File merging method and device for big data platform

Active Publication Date: 2019-03-08
CHINA UNITECHS

AI Technical Summary

Problems solved by technology

[0003] However, existing file merging schemes can only be scheduled by time. This maintenance mode has several disadvantages. First, the development work is fragmented and costly. Second, the schedule cannot be arranged according to the actual state of the data: there may be few small files when a task starts, which wastes cluster computing resources, or new files may be written into the directory while the task is running, so the small-file problem is not properly resolved. Third, the computing resources for each file-processing job cannot be requested dynamically according to actual conditions.


Examples

Embodiment Construction

[0030] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions are used to explain the present invention, not to limit it.

[0031] Figure 1 is a schematic flowchart of a method for merging files on a big data platform according to an embodiment of the present invention. As shown in Figure 1, the file merging method of the big data platform of some embodiments may include:

[0032] Step S110: monitor the directory changes of the big data platform, and determine whether the number of files in the changed directory has changed;

[0033] Step S120: when the number of files in the changed directory changes, group files with similar characteristics in the changed directory;

[0034] ...
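
Steps S110 and S120 can be illustrated with a minimal sketch. It is not the patented implementation: directory listings are modeled as plain dictionaries rather than calls to a real file system, the function names `changed_directories` and `group_by_feature` are hypothetical, and the file extension is assumed as the "similar characteristic" because the text above does not define one.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def changed_directories(previous, current):
    """Return directories whose file count differs between two snapshots.

    Each snapshot maps a directory path to the list of file paths in it.
    """
    changed = []
    for directory, files in current.items():
        if len(files) != len(previous.get(directory, [])):
            changed.append(directory)
    return sorted(changed)


def group_by_feature(files):
    """Group file paths by a feature key.

    Assumption: the feature is the file extension; the source does not
    say which characteristics are compared.
    """
    groups = defaultdict(list)
    for path in files:
        groups[PurePosixPath(path).suffix].append(path)
    return dict(groups)


# Hypothetical snapshots taken before and after a directory change.
previous = {
    "/data/logs": ["/data/logs/a.csv"],
    "/data/raw": ["/data/raw/x.orc"],
}
current = {
    "/data/logs": ["/data/logs/a.csv", "/data/logs/b.csv", "/data/logs/c.json"],
    "/data/raw": ["/data/raw/x.orc"],
}

for directory in changed_directories(previous, current):
    print(directory, group_by_feature(current[directory]))
```

Only `/data/logs` is reported, because the file count of `/data/raw` did not change; its files are then split into a `.csv` group and a `.json` group.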

Abstract

The invention provides a file merging method and device for a big data platform. The method comprises: monitoring directory changes on the big data platform and judging whether the number of files in a changed directory has changed; when it has, grouping files with similar characteristics in that directory; judging whether any group contains small files smaller than an integer multiple of a set number of data blocks; and, when such small files exist in a group, obtaining them and merging them. This scheme reduces the number of small files, optimizes NameNode memory usage, and lets the big data platform accommodate more files.
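
The small-file check and merge selection described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the threshold is read as `block_multiple * block_size`, the block size is assumed to be 128 MiB (the HDFS default in Hadoop 2 and later), groups are given as lists of `(name, size_in_bytes)` pairs, and the names `small_files` and `plan_merges` are hypothetical. The sketch only selects which files to merge; it does not write the merged file.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MiB)
MIB = 1024 * 1024


def small_files(group, block_multiple=1, block_size=BLOCK_SIZE):
    """Return names of files smaller than block_multiple * block_size.

    `group` is a list of (name, size_in_bytes) pairs.
    """
    threshold = block_multiple * block_size
    return [name for name, size in group if size < threshold]


def plan_merges(groups, block_multiple=1, block_size=BLOCK_SIZE):
    """Map each group key to the small files that should be merged.

    Assumption: a group with fewer than two small files is skipped,
    since merging a single file would not reduce the file count.
    """
    plan = {}
    for key, group in groups.items():
        small = small_files(group, block_multiple, block_size)
        if len(small) >= 2:
            plan[key] = small
    return plan


# Hypothetical groups, e.g. the output of a grouping step.
groups = {
    ".csv": [("a.csv", 4 * MIB), ("b.csv", 10 * MIB), ("big.csv", 300 * MIB)],
    ".json": [("c.json", 1 * MIB)],
}

print(plan_merges(groups))
```

With these inputs only `a.csv` and `b.csv` are selected: `big.csv` exceeds the threshold, and the `.json` group holds a single small file.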

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a file merging method and device for a big data platform.

Background

[0002] In a big data platform such as a Hadoop cluster, the data directories used for data analysis often contain a large number of small files. These small files put heavy pressure on the NameNode and can reduce the computing efficiency of the cluster by a factor of several or even dozens. In the prior art, a functional component for merging files has to be developed for each group of data directories or each type of target data.

[0003] However, existing file merging schemes can only be scheduled by time. This maintenance mode has several disadvantages: first, the development work is fragmented and costly; second, the scheduling plan cannot be arranged according to the actual data situation, an...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/16, G06F16/18
Inventor: 毛恒
Owner: CHINA UNITECHS