Data merging method and device, electronic equipment, storage medium and program product

A data merging and data writing technology, applied in the computer field, that addresses the problem of excessive small files affecting the stable operation of the cluster and improves the efficiency of file management.

Pending Publication Date: 2021-06-11
LAKALA PAYMENT CO LTD

AI Technical Summary

Problems solved by technology

When data is written to Hive through Spark SQL or Spark Streaming, or written directly to HDFS, too many small files put heavy pressure on NameNode memory management and affect the stable operation of the entire cluster.



Examples


Embodiment Construction

[0087] Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. For the sake of clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.

[0088] In the present disclosure, it should be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to exclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof exist or are added.

[0089] It should also be noted that the features in the present disclosure may be combined with each other provided that they do not conflict. The present disclosure will be described in detail below with reference to the drawings.

[0090] The technical solution provided herein provi...


Abstract

The embodiments of the invention disclose a data merging method and device, electronic equipment, a storage medium and a program product. The method comprises: in response to a data-write success event of a distributed file system, reading file information under the distributed file directory associated with the current data writing operation; determining, according to the file information, the target files whose file size is smaller than a first preset threshold; and, when there are multiple target files, merging them. With this technical solution, excessive small files can be prevented from being generated when Spark writes files, so that the file management efficiency, data query performance and the like of the distributed file system can be improved.
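The three steps in the abstract (react to a write-success event, select files below a size threshold, merge them when there is more than one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a local directory stands in for the distributed file directory, and the names `SMALL_FILE_THRESHOLD`, `find_small_files`, `merge_small_files`, `on_write_success` and the output name `merged-00000` are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical stand-in for the "first preset threshold", in bytes.
SMALL_FILE_THRESHOLD = 1024


def find_small_files(directory: Path, threshold: int) -> List[Path]:
    """Read file information and return the files smaller than `threshold`."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.stat().st_size < threshold
    )


def merge_small_files(directory: Path, threshold: int,
                      merged_name: str = "merged-00000") -> Optional[Path]:
    """Concatenate the small files into one; return it, or None if no merge ran."""
    tmp = directory / (merged_name + ".tmp")
    targets = [p for p in find_small_files(directory, threshold) if p != tmp]
    if len(targets) < 2:
        return None  # zero or one small file: nothing to merge
    # Write to a temporary file first so no target is truncated before it is read.
    with tmp.open("wb") as out:
        for part in targets:
            out.write(part.read_bytes())
    for part in targets:
        part.unlink()
    merged = directory / merged_name
    tmp.replace(merged)
    return merged


def on_write_success(directory: Path) -> Optional[Path]:
    """Hypothetical handler invoked after a write to `directory` succeeds."""
    return merge_small_files(directory, SMALL_FILE_THRESHOLD)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        d = Path(tmpdir)
        (d / "part-0").write_bytes(b"a" * 10)
        (d / "part-1").write_bytes(b"b" * 20)
        (d / "big").write_bytes(b"c" * 5000)
        print(on_write_success(d))
        print(sorted(p.name for p in d.iterdir()))
```

Byte-level concatenation only makes sense for formats that tolerate it, such as plain text. For columnar formats like Parquet or ORC, the merge step would more plausibly be a rewrite through the compute engine; that substitution is an assumption, since the abstract does not say how the merge is carried out.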

Description

Technical field

[0001] Embodiments of the present disclosure relate to the field of computer technology, and specifically to a data merging method, apparatus, electronic device, storage medium, and program product.

Background

[0002] In the era of big data, with the rapid rise of Internet technology, the volume of data collected in different fields has reached an unprecedented scale. At the same time, the ways in which data is produced, stored, and processed have undergone revolutionary changes; people's work and life are now largely digitized, and data is queried very frequently.

[0003] Spark is a fast, general-purpose computing engine designed for large-scale data processing, and a broad ecosystem has formed around it. Spark supports a variety of operations, including SQL queries, text processing, and machine learning. Spark also offers a large number of libraries, including Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX. However, when writing Hive or ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/16, G06F16/182, G06F16/13
CPC: G06F16/134, G06F16/16, G06F16/182
Inventor: not disclosed (name withheld at the inventor's request)
Owner: LAKALA PAYMENT CO LTD