Combined type data cutting method based on regular expression and separators

A data cutting method using regular expressions and delimiters, applied in the fields of electrical digital data processing, special data processing applications, and natural language data processing. It addresses the problems of poor scalability and of data that can be cut but not accurately extracted, and it achieves good scalability.

Active Publication Date: 2016-12-07
上海轻维软件有限公司

AI Technical Summary

Problems solved by technology

Fixed-delimiter cutting is highly efficient, but it can only extract data that is formatted with a fixed delimiter.
[0005] 3. Use regular expressions to extract data. Data sources today are very diverse in type and format, and their output often does not follow a fixed format. A fixed-delimiter cutting scheme often cannot handle this, so only a solution with very high cutting flexibility can cut such data.
[0006] The programming approach of method 1 has the following disadvantage: for each type of data, a program must be written and tested once, ...

Examples

Embodiment Construction

[0018] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0019] Figure 1 is an overall flowchart of log file parsing according to the present invention.

[0020] As shown in Figure 1, the combined data cutting method based on regular expressions and delimiters provided by the present invention includes two steps: Event (model) confirmation and type cutting. The Event (model) confirmation methods are as follows:

[0021] 1. Timestamp identification method

[0022] This method checks each line of log data with a time recognition algorithm. If the line contains a time format, it is treated as an event boundary; otherwise it is not an event boundary. The time recognition algorithm is as follows:

[0023] (1) Initialize the data, using Chinese and English month names as the key month information when matching times.

[0024] (2) Divide the log data into characters, numbers and c...
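The remaining steps of the time recognition algorithm are truncated in this text, so the following Python code is only a minimal sketch of the general idea: a line that contains a recognizable time format opens a new event, and any other line is attached to the current event. The regular expressions, the month vocabulary, and the function names `is_event_boundary` and `split_events` are illustrative assumptions, not the patented algorithm.

```python
import re

# Assumed month vocabulary; the source only states that Chinese and
# English month names are used as key information when matching times.
ENGLISH_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical timestamp shapes, chosen for illustration only.
TIMESTAMP_PATTERNS = [
    # Numeric form, e.g. "2016-12-07 10:15:02"
    re.compile(r"\d{4}[-/]\d{1,2}[-/]\d{1,2}[ T]\d{1,2}:\d{2}:\d{2}"),
    # English month form, e.g. "Dec  7 10:15:03"
    re.compile(r"(?:" + "|".join(ENGLISH_MONTHS) +
               r")[a-z]*\.?\s+\d{1,2}\s+\d{1,2}:\d{2}:\d{2}"),
    # Chinese month form, e.g. "12月7日 10:15:04"
    re.compile(r"(?:\d{1,2}|十[一二]|[一二三四五六七八九十])月"
               r"\d{1,2}日\s*\d{1,2}:\d{2}:\d{2}"),
]


def is_event_boundary(line):
    """Return True if the line contains any assumed time format."""
    return any(p.search(line) for p in TIMESTAMP_PATTERNS)


def split_events(lines):
    """Group lines into events; a line with a timestamp opens a new event."""
    events = []
    for line in lines:
        if is_event_boundary(line) or not events:
            events.append([line])      # boundary line (or very first line)
        else:
            events[-1].append(line)    # continuation of the current event
    return events
```

With this grouping, a multi-line record such as a stack trace stays attached to the timestamped line that introduced it.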

Abstract

The invention discloses a combined data cutting method based on regular expressions and separators. The method comprises the following steps: a) reading a log file; b) recognizing and extracting events in the log file according to a timestamp or a start symbol; c) extracting data from the extracted event content using either a fixed separator or a regular expression. In step b), a plurality of preset character strings are selected as start symbols and converted into regular expressions; each line of log data is then traversed and matched against each regular expression in order. With this method, different cutting methods are used for different event types, each cutting method corresponds to the logs within its scope, and each method provides string operations, so that a variety of complex logs can be recognized and cut quickly. The method is simple, easy to use, and highly extensible.
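The start-symbol variant of step b) can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the start symbols are treated as literal strings that must appear at the beginning of a line, and lines seen before the first start symbol are discarded. Neither choice is stated in the abstract, and the function names `compile_start_symbols` and `extract_events` are hypothetical.

```python
import re


def compile_start_symbols(symbols):
    """Convert preset literal strings into compiled regular expressions.

    re.escape makes characters such as '[' or '<' match literally.
    """
    return [re.compile(re.escape(symbol)) for symbol in symbols]


def extract_events(lines, start_patterns):
    """Traverse each line and test it against each pattern in order.

    Assumption: a start symbol must appear at the beginning of the line,
    so Pattern.match (anchored at position 0) is used.
    """
    events = []
    for line in lines:
        if any(p.match(line) for p in start_patterns):
            events.append([line])        # start symbol found: new event
        elif events:
            events[-1].append(line)      # body line of the current event
        # Assumption: lines before the first start symbol are ignored.
    return events
```

For example, with the hypothetical start symbols `"[BEGIN]"` and `"<event"`, the lines `"[BEGIN] a"`, `"detail"`, `"<event id=1>"` yield two events, the first containing two lines.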

Description

Technical field

[0001] The invention relates to a data cutting and extraction method, in particular to a combined data cutting method based on regular expressions and delimiters.

Background technique

[0002] Existing data cutting and extraction methods mainly include the following three:

[0003] 1. Write a custom program to cut and extract data. A dedicated data cutting program is written for each data format, and the cutting rules, extraction, and output are controlled in the program. Programs can be written in different programming languages for different platforms. This approach is very flexible and can meet essentially all data cutting needs.

[0004] 2. Use fixed separators to cut and extract data. This solution is well suited to data that is relatively well formatted. Usually the data is formatted with a fixed separator, and it only needs to be split along that separator, ...
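To make the contrast between fixed-separator cutting and regular-expression extraction concrete, here is a minimal Python sketch that selects one of the two per event type. The rule table `RULES`, the event type names, the field names, and the sample log layouts are all hypothetical; the source does not specify a configuration format.

```python
import re


def cut_by_delimiter(text, delimiter, field_names):
    """Split on a fixed delimiter and label the parts by position."""
    values = (value.strip() for value in text.split(delimiter))
    return dict(zip(field_names, values))


def cut_by_regex(text, pattern):
    """Extract named groups with a regular expression; {} if no match."""
    match = re.search(pattern, text)
    return match.groupdict() if match else {}


# Hypothetical per-event-type cutting rules (illustrative only).
RULES = {
    "access": {
        "mode": "delimiter",
        "delimiter": "|",
        "fields": ["time", "user", "action"],
    },
    "error": {
        "mode": "regex",
        "pattern": r"(?P<level>[A-Z]+)\s+code=(?P<code>\d+)\s+(?P<message>.*)",
    },
}


def cut_event(event_type, text):
    """Dispatch to the cutting method configured for this event type."""
    rule = RULES[event_type]
    if rule["mode"] == "delimiter":
        return cut_by_delimiter(text, rule["delimiter"], rule["fields"])
    return cut_by_regex(text, rule["pattern"])
```

Under these assumptions, regularly formatted events take the cheap split path, while irregular events fall back to a regular expression, and supporting a new event type means adding a rule rather than writing a new program.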

Application Information

IPC(8): G06F17/21, G06F17/22, G06F17/27
CPC: G06F40/10, G06F40/131, G06F40/20, G06F40/279
Inventor 程永新宋辉谢涛谭林罗成
Owner 上海轻维软件有限公司