Method and system for collecting data in log file

A log file data collection technology, applied in the field of big data processing, that addresses problems such as spooldir not supporting breakpoint resume, spooldir requiring file content to be immutable, and poor timeliness.

Inactive Publication Date: 2016-12-21
BEIJING GEO POLYMERIZATION TECH

AI Technical Summary

Problems solved by technology

[0004] Problem 1 (ExecSource): If the Flume agent process dies unexpectedly, running tail -F again after the restart causes problems
[0005] Problem 2 (ExecSource): When the log rolls, for example when app.log rolls and app.log.20160526 is generated
[0007] Problem 1 (spooldir): spooldir monitors the log directory for new log files and requires the content of a file to be immutable while it is being read.
That is, spooldir can read the log in app.log.20160422 but cannot read the log in app.log, because new log data is continuously appended to app.log.
As a result, spooldir only reads log files produced by a rollover, so timeliness is poor. If the log files roll daily, they can be collected only once a day.
Rolling the logs faster, for example every minute, generates many small files, which makes management and maintenance harder.
[0008] Problem 2 (spooldir): spooldir does not support breakpoint resume


Embodiment Construction

[0020] As shown in Figure 1, the method for collecting data in a log file includes the following steps:

[0021] (1) Periodically check the information of the log files in the log directory. The information includes the log file name, the log file length, and the inode value of the log file.
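Step (1) could be sketched roughly as follows. This is only an illustration under assumptions, not the patented implementation: the names `LogFileInfo` and `scan_log_dir` are hypothetical, and a POSIX file system (where `st_ino` is meaningful) is assumed. A caller would invoke `scan_log_dir` on a timer to obtain the periodic check.

```python
import os
from typing import Dict, NamedTuple


class LogFileInfo(NamedTuple):
    """The three pieces of information named in step (1)."""
    name: str    # log file name
    length: int  # log file length in bytes
    inode: int   # inode value of the log file


def scan_log_dir(log_dir: str) -> Dict[int, LogFileInfo]:
    """Take one snapshot of the regular files in log_dir, keyed by inode.

    Hypothetical helper: keying by inode (rather than by name) is a design
    assumption so that a renamed file maps to the same entry.
    """
    snapshot: Dict[int, LogFileInfo] = {}
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks
            st = os.stat(entry.path, follow_symlinks=False)
            snapshot[st.st_ino] = LogFileInfo(entry.name, st.st_size, st.st_ino)
    return snapshot
```

Comparing two consecutive snapshots shows which inodes are new (newly generated files) and which have grown in length (files with unread data).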

[0022] Why the inode value is recorded: the file name changes when the log rolls, so a unique identifier is needed to keep track of the file. When a file on the same physical disk is renamed or moved (rename or mv), its inode value remains unchanged, and log rolling is equivalent to a rename or mv operation. For example, if the inode of app.log is 5914332, then after the log rolls, the generated log file app.log.20160422 still has inode 5914332, while the new app.log file has a new inode value, for example 5914335.
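The inode behaviour described in [0022] can be checked with a small experiment. The sketch below simulates a rollover with `os.rename`; the function names `inode_of` and `roll_log` are hypothetical, and it assumes a POSIX file system with the rename staying within one file system.

```python
import os
from typing import Tuple


def inode_of(path: str) -> int:
    """Return the inode number of the file at path."""
    return os.stat(path).st_ino


def roll_log(log_dir: str, active: str = "app.log",
             suffix: str = "20160422") -> Tuple[int, int, int]:
    """Simulate a log rollover: rename the active log, then create a new one.

    Returns (inode before the roll, inode of the rolled file,
    inode of the new active file).
    """
    active_path = os.path.join(log_dir, active)
    rolled_path = f"{active_path}.{suffix}"
    before = inode_of(active_path)
    os.rename(active_path, rolled_path)   # same effect as mv within one file system
    with open(active_path, "w"):          # the logger recreates an empty app.log
        pass
    return before, inode_of(rolled_path), inode_of(active_path)
```

The first two returned values are expected to be equal (the rolled file keeps its inode), while the third differs because the new app.log is a different file that exists at the same time as the rolled one.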

[0023] (2) Real-time collection: collect the logs; the consumption offset of a newly generated file is 0, and record the inode value and consumption offset of the ...
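A minimal sketch of offset-based consumption follows. The source does not specify the metadata file format, so the JSON layout (inode mapped to offset) and the names `load_offsets`, `save_offsets`, and `consume` are assumptions made for illustration only.

```python
import json
import os
from typing import Dict, List


def load_offsets(meta_path: str) -> Dict[str, int]:
    """Load the inode -> consumption offset map (assumed JSON format)."""
    if not os.path.exists(meta_path):
        return {}
    with open(meta_path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_offsets(meta_path: str, offsets: Dict[str, int]) -> None:
    """Write the map to a temp file, then swap it in with os.replace."""
    tmp_path = meta_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(offsets, f)
    os.replace(tmp_path, meta_path)


def consume(path: str, meta_path: str) -> List[bytes]:
    """Read complete lines appended since the recorded offset.

    After consuming, the inode value and the new offset are recorded
    in the metadata file.
    """
    offsets = load_offsets(meta_path)
    inode = str(os.stat(path).st_ino)
    offset = offsets.get(inode, 0)  # a newly generated file starts at offset 0
    lines: List[bytes] = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            lines.append(line)
            offset += len(line)
    offsets[inode] = offset
    save_offsets(meta_path, offsets)
    return lines
```

Stopping at a line without a trailing newline is a design choice in this sketch: a partially written line is left for the next call instead of being emitted in two pieces.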


Abstract

The application discloses a method for collecting data in a log file. The data in the log file can be collected in real time, breakpoint resume is supported, and no data is lost when the log rolls. The method comprises the following steps: (1) periodically checking the information of the log files in a log directory, wherein the information comprises the log file name, the log file length, and the inode value of the log file; (2) real-time collection: collecting the logs, wherein the consumption offset of a newly generated file is 0, and recording the inode value and the consumption offset of the file in a metadata file after consumption; (3) breakpoint resume: if the log collection program exits abnormally, loading the consumption offset when the log collection program is restarted, and continuing to consume from that offset. The invention further provides a system for collecting data in a log file.
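The three steps of the abstract can be combined into one small illustrative collector. This is a sketch under assumptions, not the patented system: the class name `Collector`, the JSON metadata format, and the rule that writers append whole lines are all hypothetical, and the metadata file is assumed to live outside the log directory.

```python
import json
import os
from typing import Dict, List


class Collector:
    """Toy log collector: scans a directory, tracks offsets by inode."""

    def __init__(self, log_dir: str, meta_path: str) -> None:
        self.log_dir = log_dir
        self.meta_path = meta_path
        self.offsets: Dict[str, int] = {}
        # Step (3): on (re)start, load the recorded consumption offsets.
        if os.path.exists(meta_path):
            with open(meta_path, "r", encoding="utf-8") as f:
                self.offsets = json.load(f)

    def _save(self) -> None:
        tmp_path = self.meta_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.offsets, f)
        os.replace(tmp_path, self.meta_path)

    def poll(self) -> List[str]:
        """Steps (1) and (2): check each file, consume any unread data."""
        collected: List[str] = []
        for name in sorted(os.listdir(self.log_dir)):
            path = os.path.join(self.log_dir, name)
            st = os.stat(path)
            key = str(st.st_ino)              # identify the file by inode, not name
            start = self.offsets.get(key, 0)  # unseen inode: start at offset 0
            if st.st_size <= start:
                continue                      # nothing new in this file
            with open(path, "rb") as f:
                f.seek(start)
                data = f.read(st.st_size - start)
            collected.extend(data.decode("utf-8").splitlines())
            self.offsets[key] = st.st_size
        self._save()
        return collected
```

Creating a second `Collector` with the same metadata path stands in for a restart after an abnormal exit: it continues from the recorded offsets, and because offsets are keyed by inode, lines written to app.log just before a rollover are still picked up from the renamed file.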

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a method for collecting data in log files and a system for collecting data in log files.

Background technique

[0002] Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive volumes of log data. Flume supports customizing various data senders in the log system for data collection; it can also perform simple processing on data and write it to various (customizable) data receivers. Flume provides two sources: ExecSource, which executes the system command tail -F, and spooldir. These can support real-time collection of data in log files to a certain extent, but each has problems.

[0003] When ExecSource executes the system command tail -F to collect data in log files in real time, the problems are as follows:

[0004] Problem 1: If the agent process of...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F11/14
CPC: G06F11/1438, G06F11/1443, G06F16/11, G06F16/1815
Inventor: 范卫卫, 张翼, 温宗臣, 崔晶晶, 林佳婕
Owner: BEIJING GEO POLYMERIZATION TECH