Method and system for building big data distributed log

A distributed big data logging technology, applied in the field of big data processing, that addresses problems such as failure to meet real-time requirements, performance bottlenecks, and log data loss. It achieves flexibility and reliability, good scalability and stability, and flexible networking within the system.

Active Publication Date: 2015-12-09
北京思特奇信息技术股份有限公司

AI Technical Summary

Problems solved by technology

In a large-scale system with high business volume, high concurrency, and many server clusters, the simple approach of writing logs to local disk or synchronizing them to a relational database can no longer meet requirements. This log processing method has the following disadvantages:
[0003] The first is unreliability. A large-scale system is a cluster of many servers, so it is common for a server node to fail, and important log data stored on the failed server may be lost. The traditional log backup method performs a scheduled backup once a day, so data generated between two consecutive backups cannot be recovered.
[0004] The second is failure to meet real-time requirements. The value of much log data decreases over time. For example, if the business monitoring system cannot obtain the latest business log data in real time, problems occurring in the system will not be reported immediately.
[0005] The third is failure to meet the performance requirements of log data processing. In large-scale business systems, the volume of log data to be processed is at least at the GB or TB level. If this data is stored on a single server node or in a relational database, processing performance is very low because of the single machine's IOPS (read and write operations per second) limit.
[0006] The fourth is poor scalability. As log volume grows, data processing takes longer and performance degrades. With the traditional log processing method it is difficult to raise processing capacity simply by adding servers, and as the data grows even storage space may run out. In addition, the maintainability problems caused by increased complexity after system expansion need careful consideration.
[0007] Beyond the above problems, some distributed data mining algorithms iterate repeatedly over the data when mining logs. A log system built only on a disk-based distributed file system remains limited by disk I/O speed and will inevitably hit performance bottlenecks.

Examples

Embodiment Construction

[0057] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

[0058] Figure 1 is a flow chart of the method for constructing a big data distributed log according to the present invention.

[0059] As shown in Figure 1, a method for constructing a big data distributed log includes the following steps:

[0060] Step S1, the log transmission subsystem receives log data from the business system, generates a UUID for each piece of received log data and, after load balancing, sends the log data with its UUID through multiple nodes to the log storage subsystem;

[0061] Step S2, the log storage subsystem receives the log data, stores it in a horizontally scalable manner, and then sends the log data to the batch log processing subsystem;

[0062] St...
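
Steps S1 and S2 above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `TransferNode` and `ShardedLogStore`, the round-robin balancing policy, and the hash-based shard placement are all assumptions introduced here, since the text only says that records are load balanced across multiple nodes and stored with horizontal scaling.

```python
import hashlib
import itertools
import uuid


class ShardedLogStore:
    """Hypothetical storage subsystem: records are spread over several shards."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, record_id):
        # Hash the UUID so records spread across shards regardless of arrival order.
        digest = hashlib.sha256(record_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, record):
        self._shard_for(record["id"])[record["id"]] = record

    def get(self, record_id):
        return self._shard_for(record_id).get(record_id)


class TransferNode:
    """Hypothetical transmission node that forwards records to the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.forwarded = 0

    def forward(self, record):
        self.store.put(record)
        self.forwarded += 1


def transmit(messages, nodes):
    """Tag each message with a UUID and hand it to the nodes in round-robin order."""
    ids = []
    for message, node in zip(messages, itertools.cycle(nodes)):
        record = {"id": str(uuid.uuid4()), "message": message}
        node.forward(record)
        ids.append(record["id"])
    return ids


store = ShardedLogStore(shard_count=4)
nodes = [TransferNode(f"node-{i}", store) for i in range(3)]
ids = transmit([f"order {i} created" for i in range(9)], nodes)
print([node.forwarded for node in nodes])
print(store.get(ids[0])["message"])
```

Modulo placement is used here only for brevity. It remaps most records when the shard count changes, so a store that must grow by adding servers would more plausibly use consistent hashing or a similar scheme; the source does not say which.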

Abstract

The invention relates to a method and a system for building a big data distributed log. The method comprises the following steps: 1, a log transmission subsystem receives log data from a business system, generates a UUID (Universally Unique Identifier) for each piece of received log data and, after load balancing, sends the log data with its UUID through a plurality of nodes to a log storage subsystem; 2, the log storage subsystem receives the log data and stores it in a horizontally scalable manner; and 3, a batch log processing subsystem extracts the log data stored in the log storage subsystem, uses a MapReduce algorithm to batch pre-process the log data on a regular schedule, and generates hourly and daily reports for use by an external business reporting system. The method and the system meet the requirements for reliability, real-time performance, high performance, high scalability and maintainability of a log system under big data conditions.
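
The batch pre-processing in step 3 can be sketched as a single-process imitation of the map, shuffle, and reduce phases. This is an illustration only: the record layout (an ISO 8601 `timestamp` field), the function names, and the choice of a simple record count as the report metric are assumptions, and a real deployment would run the phases on a MapReduce framework rather than in one Python process.

```python
from collections import defaultdict
from datetime import datetime


def map_record(record):
    """Map phase: emit one (key, 1) pair per report granularity."""
    ts = datetime.fromisoformat(record["timestamp"])
    yield ("hour", ts.strftime("%Y-%m-%d %H:00")), 1
    yield ("day", ts.strftime("%Y-%m-%d")), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def build_reports(records):
    """Run map, shuffle and reduce, then split results into hourly and daily reports."""
    pairs = (pair for record in records for pair in map_record(record))
    reduced = dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
    hourly = {bucket: n for (kind, bucket), n in reduced.items() if kind == "hour"}
    daily = {bucket: n for (kind, bucket), n in reduced.items() if kind == "day"}
    return hourly, daily


# Hypothetical sample records; only the timestamp field is used here.
records = [
    {"timestamp": "2015-12-09T10:05:00", "message": "order created"},
    {"timestamp": "2015-12-09T10:45:00", "message": "order paid"},
    {"timestamp": "2015-12-09T11:15:00", "message": "order shipped"},
    {"timestamp": "2015-12-10T09:00:00", "message": "order created"},
]
hourly, daily = build_reports(records)
print(hourly)
print(daily)
```

Emitting both granularities from one map pass means the stored log data is read once per scheduled run, and the external reporting system can then read the pre-computed counts instead of scanning raw logs.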

Description

Technical field

[0001] The invention relates to the field of big data processing, in particular to a method and system for constructing a big data distributed log.

Background

[0002] General software systems have no dedicated, independent system for processing logs. Existing software systems simply write logs to local disk or synchronize them to a relational database for later retrieval. However, in a large-scale system with high business volume, high concurrency, and many server clusters, this simple log processing method can no longer meet requirements. It has the following disadvantages:

[0003] The first is unreliability. A large-scale system is a cluster of many servers, so it is common for a server node to fail, and important log data stored on the failed server may be lost. The traditional log backup method is generally to perform regular backups every da...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/182, G06F16/25, G06F16/27
Inventor: 王萍末
Owner: 北京思特奇信息技术股份有限公司