File merging method and device based on cassandra database

A data file and database technology in the database field that addresses the heavy disk read/write burden, heavy merge I/O pressure, and large disk space occupation of existing approaches, with the effect of optimizing the file storage structure and improving I/O efficiency.

Pending Publication Date: 2020-11-03
WUHAN GREENET INFORMATION SERVICE
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0011] Aiming at the above defects or improvement needs of the prior art, the present invention addresses the heavy disk read/write burden, large disk space occupation, an...



Examples


Embodiment 1

[0034] In the Cassandra database, when a client writes data, the client program determines the server node to which the data should be sent according to the token ranges on the cluster. The server receives data in parallel with multiple threads; each thread sorts the data it receives and generates data files smaller than 10 MB. In some implementation scenarios, when the amount of data processed by a single Cassandra node reaches 4 TB/day, the number of data files that the database process needs to open exceeds 200,000; when data must be retained for 7 days, that number reaches 1.4 million. The data files therefore need to be merged to reduce the number of files. This embodiment provides a new dynamic small-file merging method that avoids the defects of existing file merging strategies.
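The write path and file-count arithmetic in [0034] can be illustrated with a minimal Python sketch. It assumes a hypothetical `(key, value)` record layout and plain key ordering; Cassandra itself orders partitions by token and uses its own sstable format, so `record_size`, `sort_and_split`, `min_files_per_day`, and `open_files` are illustrative names, not Cassandra APIs.

```python
from typing import List, Tuple

# Hypothetical record layout: (partition key, serialized value).
Record = Tuple[str, bytes]

# Per-file size cap taken from [0034] ("data files smaller than 10M").
MAX_FILE_BYTES = 10 * 1024 * 1024


def record_size(rec: Record) -> int:
    """Approximate on-disk size of a record: key bytes plus value bytes."""
    key, value = rec
    return len(key.encode("utf-8")) + len(value)


def sort_and_split(records: List[Record], max_bytes: int = MAX_FILE_BYTES) -> List[List[Record]]:
    """Sort one thread's records by key and cut them into files of at most max_bytes.

    A single record larger than max_bytes still ends up alone in its own file.
    """
    files: List[List[Record]] = []
    current: List[Record] = []
    current_bytes = 0
    for rec in sorted(records, key=lambda r: r[0]):
        size = record_size(rec)
        if current and current_bytes + size > max_bytes:
            files.append(current)  # close the current file and start a new one
            current, current_bytes = [], 0
        current.append(rec)
        current_bytes += size
    if current:
        files.append(current)
    return files


def min_files_per_day(daily_bytes: int, max_file_bytes: int) -> int:
    """Lower bound on files written per day if no file exceeds max_file_bytes."""
    return (daily_bytes + max_file_bytes - 1) // max_file_bytes  # ceiling division


def open_files(files_per_day: int, retention_days: int) -> int:
    """Files held open when every day's files are retained for retention_days."""
    return files_per_day * retention_days
```

With the figures in [0034], `open_files(200_000, 7)` gives 1,400,000, and `min_files_per_day(4 * 10**12, 10 * 10**6)` gives 400,000 in decimal units, which is consistent with the statement that the daily file count exceeds 200,000.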

[0035] Cassandra is a NoSQL distributed database that adopts the Log Structured Merge ...
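Because [0035] breaks off after "Log Structured Merge", the following is only a generic toy of the log-structured merge pattern, not Cassandra's implementation. Writes land in an in-memory memtable, which is flushed as an immutable key-sorted run once it reaches a hypothetical `flush_threshold` (counted in keys here); reads check the memtable first and then the runs from newest to oldest. Every flush adds one more file, which is the growth that file merging is meant to curb.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class ToyLSM:
    """Toy log-structured store; each flushed run stands in for one sstable."""

    def __init__(self, flush_threshold: int = 4) -> None:
        self.flush_threshold = flush_threshold            # hypothetical flush trigger (key count)
        self.memtable: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []   # oldest run first

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as an immutable, key-sorted run."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            # (key,) sorts just before (key, value), so bisect lands on the key's slot.
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```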

Embodiment 2

[0076] On the basis of the cassandra-database-based file merging method provided in Embodiment 1 above, the present invention further provides a cassandra-database-based file merging device that can be used to implement that method. Figure 5 is a schematic diagram of the device architecture of this embodiment. The device includes one or more processors 21 and a memory 22; in Figure 5, one processor 21 is taken as an example.

[0077] The processor 21 and the memory 22 may be connected via a bus or by other means; Figure 5 takes a bus connection as an example.

[0078] Memory 22, as a non-volatile computer-readable storage medium for the cassandra-database-based file merging method, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the one based on the file merging method f...



Abstract

The invention relates to the field of databases, in particular to a file merging method and device based on a cassandra database. The method mainly comprises the steps of: receiving data files generated by the database and generating a merge file list for each disk; obtaining, by the merging process of each disk, the merge file list of the corresponding disk and the sizes of the data files to be merged in that list; starting a parallel merging process of the database; and, when the sum of the sizes of the data files obtained by the merging processes of all disks reaches a merge file threshold, merging the data files to be merged on all disks at one time through the parallel merging process. With this method, small files can be merged in time while using few merge layers and temporary files; the merging frequency is reduced, the disk space occupied by files is reduced, disk I/O frequency and disk I/O contention are reduced, file merging performance is improved, and the read/write stability of the database is improved.
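The steps in the abstract can be pictured with a single-threaded Python sketch. The `DataFile` and `ParallelMerger` names, the byte-based `threshold`, and the choice to record each batch instead of actually rewriting files are assumptions for illustration; in the described method each disk has its own merging process and a separate parallel merging process performs the merge, which this sketch collapses into one object.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    path: str  # hypothetical file path
    disk: str  # disk the file lives on
    size: int  # size in bytes


@dataclass
class ParallelMerger:
    """Hypothetical scheduler: keeps one merge file list per disk and triggers a
    single merge of all disks' listed files once their combined size reaches threshold."""

    threshold: int
    merge_lists: Dict[str, List[DataFile]] = field(default_factory=lambda: defaultdict(list))
    merges: List[List[DataFile]] = field(default_factory=list)

    def receive(self, f: DataFile) -> None:
        # Step 1: a newly generated data file joins its disk's merge file list.
        self.merge_lists[f.disk].append(f)
        if self.pending_bytes() >= self.threshold:
            self.merge_all()

    def pending_bytes(self) -> int:
        # Steps 2-3: each disk's list reports its file sizes; sum them across all disks.
        return sum(f.size for files in self.merge_lists.values() for f in files)

    def merge_all(self) -> None:
        # Step 4: hand every listed file on every disk to one merge pass.
        # A real implementation would run a compaction here; the sketch only records the batch.
        batch = [f for disk in sorted(self.merge_lists) for f in self.merge_lists[disk]]
        self.merges.append(batch)
        self.merge_lists.clear()
```

Accumulating the backlog across disks, rather than per disk, triggers one merge as soon as the combined size is large enough, which is the mechanism the abstract credits with reducing the merging frequency.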

Description

Technical Field

[0001] The invention relates to the field of databases, in particular to a method and device for merging files based on a cassandra database.

Background

[0002] The Cassandra database is an open-source distributed hybrid storage solution with features such as decentralization, scalability, high availability, fault tolerance, and configurable consistency. When Cassandra flushes cached data to disk sequentially, it generates multiple data files (Sorted String Tables, abbreviated as sstables) of about 1-10 MB each. When the Cassandra database processes a large amount of data, the huge number of files seriously affects the stability of the database and slows down queries.

[0003] To reduce the number of files to be processed, the Cassandra database provides a file merging (compaction) mechanism that merges multiple small files into a small number of large files. Currently commonly used file merging strategies are:

[0004]...
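[0003] describes compaction as merging many sstables into fewer, larger ones. Since the list of strategies is cut off at [0004], the sketch below does not model any particular strategy; it only shows the core merge step, a k-way merge of key-sorted files in which the version with the highest write timestamp wins (last-write-wins, the rule Cassandra uses to reconcile conflicting cells). The `Cell` layout is hypothetical, and tombstones and TTLs are ignored.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical cell layout: (key, write timestamp, value).
Cell = Tuple[str, int, str]


def compact(sstables: Iterable[List[Cell]]) -> List[Cell]:
    """Merge key-sorted sstables into one sorted run, keeping only the newest
    version (highest timestamp) of each key."""
    # Merge order: key ascending, then timestamp descending, so the first
    # cell seen for each key is its newest version. Each input holds a key
    # at most once, so every input is already sorted by this merge key.
    merged = heapq.merge(*sstables, key=lambda c: (c[0], -c[1]))
    out: List[Cell] = []
    for cell in merged:
        if out and out[-1][0] == cell[0]:
            continue  # an older version of a key that was already emitted
        out.append(cell)
    return out
```

Because `heapq.merge` streams its inputs, the same approach works when the sstables are large iterators rather than in-memory lists; the output size is bounded by the number of distinct keys.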


Application Information

IPC(8): G06F16/16; G06F16/182
CPC: G06F16/16; G06F16/182
Inventors: 叶志钢 (Ye Zhigang), 王化民 (Wang Huamin), 张本军 (Zhang Benjun), 王赟 (Wang Yun), 谭国权 (Tan Guoquan), 赵雨佳 (Zhao Yujia)
Owner: WUHAN GREENET INFORMATION SERVICE