
De-duplication storage method, device and equipment for massive log data and storage medium

A database and log technology in the field of data processing. It addresses the problem that the disk capacity of a single computer is insufficient and cannot be expanded indefinitely, and avoids an excessive demand for disk capacity.

Active Publication Date: 2018-03-23
RUN TECH CO LTD BEIJING
Cites: 14; Cited by: 15

AI Technical Summary

Problems solved by technology

[0007] The present invention provides a method, device, equipment and storage medium for deduplication and warehousing of massive log data. The aim is to make deduplication of massive log data more efficient. It also avoids a problem that arises as the volume of log data grows: the disk capacity of a single computer becomes insufficient and cannot be expanded indefinitely.



Examples


Embodiment 1

[0027] Figure 1 is a flow chart of a method for deduplication and storage of massive log data provided by Embodiment 1 of the present invention. This embodiment applies when valuable information from massive log data is to be stored after deduplication. One example is providing deduplicated log data to the police to facilitate investigations. The method can be performed by the device for deduplication and storage of massive log data provided by the embodiments of the present invention. The device can be implemented in software and/or hardware and can generally be integrated in computer equipment. As shown in Figure 1, the method of this embodiment specifically includes:

[0028] S110. Obtain the massive log data to be stored in the first time interval.

[0029] The first time interval is a preset time interval. The massive logs within this interval are deduplicated and stored in the database. Preferably, the first time interval is one day, t...
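Step S110 only says that logs are collected over a preset interval, preferably one day. The minimal sketch below shows one way to assign timestamped log lines to such intervals so that each interval can be handled as one batch. The `(timestamp, line)` record shape, the `origin` parameter and both function names are illustrative assumptions and do not come from the patent text.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (event timestamp, raw log line).
LogLine = Tuple[datetime, str]


def interval_start(ts: datetime, origin: datetime, interval: timedelta) -> datetime:
    """Return the start of the preset interval that contains ts."""
    index = (ts - origin) // interval  # timedelta // timedelta -> int (floored)
    return origin + index * interval


def bucket_by_interval(
    records: Iterable[LogLine],
    origin: datetime,
    interval: timedelta = timedelta(days=1),  # assumed default: one day
) -> Dict[datetime, List[str]]:
    """Group log lines by the interval they fall into, one batch per interval."""
    buckets: Dict[datetime, List[str]] = defaultdict(list)
    for ts, line in records:
        buckets[interval_start(ts, origin, interval)].append(line)
    return dict(buckets)
```

With `origin = datetime(2018, 3, 22)`, lines stamped 2018-03-22 10:00 and 23:59 fall into the same daily batch. A line stamped 2018-03-23 01:00 starts the next batch.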

Embodiment 2

[0064] Figure 2 is a schematic structural diagram of a device for deduplication and storage of massive log data provided by Embodiment 2 of the present invention. This embodiment applies when valuable information from massive log data is to be stored after deduplication. One example is providing deduplicated log data to the police to facilitate investigations. The device can be implemented in software and/or hardware and can generally be integrated in computer equipment. As shown in Figure 2, the device for deduplication and warehousing of massive log data specifically includes the following modules:

- a to-be-stored data acquisition module 210;
- a to-be-stored pre-deduplication result acquisition module 220;
- a full deduplication result acquisition module 230;
- a database update module 240.

The modules are described as follows.

[0065] The to-be-stored data acquisition module 210 is configured to acquire the massive log data to be stored within the first time interval;

[0066] The pre-...
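Embodiment 2 splits the method across four modules, numbered 210 to 240. The minimal sketch below shows one way such modules might be wired together. It makes three assumptions for illustration only:

- each log record has already been reduced to a primary-key string;
- the full deduplication result keeps an occurrence count per key, which would support the statistics the abstract mentions;
- an in-memory `dict` stands in for the log database.

The class names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


class ToBeStoredDataAcquisitionModule:  # module 210
    def __init__(self, source: Callable[[str], Iterable[str]]):
        self.source = source  # hypothetical: yields primary keys for one interval

    def acquire(self, interval_id: str) -> List[str]:
        return list(self.source(interval_id))


class PreDedupResultModule:  # module 220: local deduplication
    def run(self, keys: Iterable[str]) -> Counter:
        return Counter(keys)  # one entry per distinct key, with its count


class FullDedupResultModule:  # module 230: global deduplication
    def run(self, pre: Counter, reference: Counter) -> Counter:
        return reference + pre  # new Counter: merged keys, summed counts


class DatabaseUpdateModule:  # module 240
    def __init__(self, db: Dict[str, int]):
        self.db = db  # in-memory stand-in for the log database

    def update(self, full: Counter) -> None:
        self.db.update(full)


class LogDedupDevice:
    def __init__(self, source: Callable[[str], Iterable[str]], db: Dict[str, int]):
        self.acquisition = ToBeStoredDataAcquisitionModule(source)
        self.pre_dedup = PreDedupResultModule()
        self.full_dedup = FullDedupResultModule()
        self.db_update = DatabaseUpdateModule(db)
        self.reference: Counter = Counter()  # full result of the previous run

    def process(self, interval_id: str) -> Counter:
        keys = self.acquisition.acquire(interval_id)
        pre = self.pre_dedup.run(keys)
        full = self.full_dedup.run(pre, self.reference)
        self.db_update.update(full)
        self.reference = full  # becomes the reference for the next interval
        return full
```

In this sketch, module 230 only merges two summaries that are already deduplicated, so raw logs from earlier intervals never need to be reread.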

Embodiment 3

[0089] Figure 3 is a schematic diagram of the hardware structure of a computer device provided by Embodiment 3 of the present invention. As shown in Figure 3, the computer device includes:

[0090] one or more processors 310 (Figure 3 takes one processor 310 as an example);

[0091] memory 320;

[0092] The computer device may further include an input device 330 and an output device 340.

[0093] The processor 310, the memory 320, the input device 330 and the output device 340 in the computer device can be connected by a bus or by other means. Figure 3 takes a bus connection as an example.

[0094] The memory 320, as a non-transitory computer-readable storage medium, can be used to store software programs, computer-executable programs and modules. Examples are program instructions/modules such as those shown in Figure 2: the to-be-stored data acquisition module 210, the to-be-stored pre-deduplication result acquisition ...



Abstract

The invention discloses a deduplication storage method, device and equipment for massive log data, and a storage medium. The method comprises the following steps:

- acquiring the massive log data to be stored within a first time interval;
- performing local deduplication on the massive log data to be stored, to obtain a to-be-stored pre-deduplication result;
- performing global deduplication on the to-be-stored pre-deduplication result together with a reference full deduplication result, to obtain the full deduplication result corresponding to the first time interval. The reference full deduplication result is the full deduplication result obtained by the previous deduplication storage operation;
- updating a log database according to the full deduplication result corresponding to the first time interval.

The method achieves deduplication and storage of massive log data and removes the need for excessive disk capacity on a single computer. It also greatly improves the efficiency of deduplicating, counting and storing massive log data.
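The abstract's four steps can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation. It makes three assumptions:

- each record is a `(primary_key, payload)` pair;
- local deduplication keeps the first record seen for each key;
- the log database only needs the records whose keys are absent from the reference full result.

All function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (primary key, payload).
LogRecord = Tuple[str, str]


def local_dedup(records: Iterable[LogRecord]) -> Dict[str, str]:
    """Pre-deduplication result: keep the first record seen per key in this interval."""
    pre: Dict[str, str] = {}
    for key, payload in records:
        pre.setdefault(key, payload)
    return pre


def global_dedup(
    pre: Dict[str, str], reference: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Merge the pre-deduplication result with the reference full result.

    Returns (full result for this interval, records whose keys are new).
    """
    new_records = {k: v for k, v in pre.items() if k not in reference}
    full = {**reference, **new_records}  # existing keys keep their earlier record
    return full, new_records


def store_interval(
    records: Iterable[LogRecord], reference: Dict[str, str], db: Dict[str, str]
) -> Dict[str, str]:
    """Run one deduplication storage operation and return the new reference."""
    pre = local_dedup(records)
    full, new_records = global_dedup(pre, reference)
    db.update(new_records)  # write only previously unseen records to the database
    return full
```

In this sketch only the compact reference result is carried from one run to the next. Each run therefore handles one interval's logs plus one summary, not the whole history. The local step is a natural place to split work across machines, but this excerpt does not say how the patent distributes it.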

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of data processing, and in particular to a method, device, equipment and storage medium for deduplication and warehousing of massive log data.

Background art

[0002] In computing, a log file is a file that records events occurring during the operation of an operating system or other software, or messages exchanged between different users of communication software. Work and daily life are now inseparable from computers, and the total volume of log data exceeds one trillion. It is therefore necessary to extract valuable information from massive log data and store it after deduplication.

[0003] There are usually two methods for deduplicating massive log data:

[0004] The first method uses the Redis cache database to save the primary key information of the log data. The system reads the massive log data one by one, obtains the primary key information of the log data fr...
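The first prior-art method in [0004] checks each log's primary key against a Redis key store. The excerpt cuts off before the remaining steps, so the sketch below assumes the usual pattern: skip a log whose key is already stored; otherwise record the key and keep the log.

- A Python `set` stands in for Redis. In Redis itself, `SADD` returns 1 only when the member is newly added, so the same check can be done in one command.
- The comma-separated line format and the `first_field` helper are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def first_field(line: str) -> str:
    """Hypothetical primary-key extractor: the first comma-separated field."""
    return line.split(",", 1)[0]


def dedup_one_by_one(
    lines: Iterable[str],
    extract_key: Callable[[str], str],
    key_store: Set[str],  # in-memory stand-in for the Redis key store
) -> List[str]:
    """Read logs one by one and keep only those whose primary key is unseen."""
    kept: List[str] = []
    for line in lines:
        key = extract_key(line)
        if key in key_store:
            continue  # duplicate: this key was stored earlier
        key_store.add(key)
        kept.append(line)
    return kept
```

In this sketch every distinct key ends up in the single key store, so the store grows with the total number of distinct logs ever processed.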


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/215; G06F16/2282; G06F16/2358
Inventors: 谢永恒, 邹焱, 火一莽, 万月亮
Owner RUN TECH CO LTD BEIJING