
Regular expression generation method and data extraction method based on regular expression

A data extraction and regular expression technology in the field of data processing. It addresses the difficulty and time cost of writing regular expressions by hand, which lowers data processing efficiency, and aims to make regular expressions easier to write and data extraction more efficient.

Active Publication Date: 2020-05-15
BEIJING QIANXIN TECH +1
Cites 7; cited by 5

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for generating regular expressions, a method for extracting data based on regular expressions, a device, computer equipment, and a storage medium, in order to solve the problem that, in the prior art, regular expressions must be written manually, which is difficult and time-consuming and therefore reduces the efficiency of data processing.



Examples


Embodiment 1

[0035] This embodiment of the present invention provides a method for generating a regular expression. The method can be applied in the field of data processing, for example to data extraction, data matching, and filtering. It reduces the difficulty of writing regular expressions, and once a regular expression has been obtained, using it to extract data from log files of the same type can greatly improve the efficiency of data extraction. Specifically, Figure 1 is a flow chart of the regular expression generation method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method provided in this embodiment includes the following steps S101 to S103.

[0036] Step S101: Determine the fields to be extracted and the fields not to be extracted in the original data string;
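Step S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the user marks each field to be extracted as a hypothetical `(start, end)` character offset into the sample, and the helper name `split_fields` is likewise hypothetical. Every stretch of text that is not marked becomes a field not to be extracted, and the output keeps all fields in their original order, which the later combination step relies on.

```python
from typing import List, Tuple

# (field text, True if the field is to be extracted), in sample order.
Field = Tuple[str, bool]


def split_fields(sample: str, spans: List[Tuple[int, int]]) -> List[Field]:
    """Split a raw sample into ordered fields.

    `spans` are hypothetical (start, end) offsets, end exclusive, marking the
    fields to be extracted; every gap between them is a field not to be extracted.
    """
    fields: List[Field] = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor or start >= end or end > len(sample):
            raise ValueError(f"invalid or overlapping span ({start}, {end})")
        if start > cursor:
            fields.append((sample[cursor:start], False))  # gap before the marked field
        fields.append((sample[start:end], True))
        cursor = end
    if cursor < len(sample):
        fields.append((sample[cursor:], False))  # trailing text after the last field
    return fields


print(split_fields("2020-05-15 ERROR user=alice", [(0, 10), (22, 27)]))
# [('2020-05-15', True), (' ERROR user=', False), ('alice', True)]
```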

[0037] In implementation, the user can first provide a piece of raw data that needs to gener...

Embodiment 2

[0112] Building on Embodiment 1 above, Embodiment 2 of the present invention provides a data extraction method based on regular expressions. As shown in Figure 3, the method includes the following steps S301 to S303:

[0113] S301: acquire the raw data from which data is to be extracted;

[0114] S302: analyze the raw data to generate a corresponding regular expression;

[0115] S303: extract data from the raw data according to the generated regular expression.
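Step S303 can be illustrated with Python's standard `re` module. The sketch assumes, hypothetically, that S302 has already produced the expression stored in `GENERATED` for a sample such as `2020-05-15 ERROR user=alice`, with one capture group per field to be extracted. The capture groups, the constant name, and the `extract` helper are illustrative assumptions, not details given in the text.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical output of step S302 for the sample "2020-05-15 ERROR user=alice":
# one capture group per field to be extracted, a lazy wildcard in between.
GENERATED = re.compile(r"(\d{4}\-\d{2}\-\d{2}).*?([a-z]{5})")


def extract(
    records: Iterable[str], pattern: "re.Pattern[str]"
) -> Tuple[List[Tuple[str, ...]], List[str]]:
    """Step S303: apply the generated expression to each raw record.

    Returns the captured fields of every matching record, plus the records
    that did not match so they can be reviewed instead of silently dropped.
    """
    extracted: List[Tuple[str, ...]] = []
    unmatched: List[str] = []
    for record in records:
        match = pattern.fullmatch(record)  # the expression describes the whole record
        if match:
            extracted.append(match.groups())
        else:
            unmatched.append(record)
    return extracted, unmatched


rows, rejected = extract(
    ["2020-05-15 ERROR user=alice", "2021-01-02 WARN user=bobby", "malformed line"],
    GENERATED,
)
print(rows)      # [('2020-05-15', 'alice'), ('2021-01-02', 'bobby')]
print(rejected)  # ['malformed line']
```

Using `fullmatch` rather than `search` treats the generated expression as a description of the whole record, so a record that only partly resembles the sample is not silently accepted; such records are returned separately instead of being discarded.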

[0116] The corresponding regular expression in step S302 is generated by the regular expression generation method of Embodiment 1.

[0117] In a specific implementation, an operation interface can be provided, and the user can enter, at a first designated position of the interface, a sample for which a regular expression is required; this sample serves as the original data. After the background server receives t...

Embodiment 3

[0120] Corresponding to Embodiment 1 above, Embodiment 3 of the present invention provides a device for generating a regular expression. For related technical features and their technical effects, refer to Embodiment 1; they are not repeated here. Figure 4 is a structural block diagram of the device for generating regular expressions provided by Embodiment 3 of the present invention. As shown in Figure 4, the device includes: a determination module 401, configured to determine the fields to be extracted and the fields not to be extracted in the original data string; a first generation control module 402, configured to apply wildcard filtering to the fields not to be extracted to obtain their regular expressions, and to traverse the character string of each field to be extracted to obtain its regular expression through a one-to-one correspondence between characters and expressions; and a second generation control module 403, configured to combi...



Abstract

The invention provides a regular expression generation method and a data extraction method based on a regular expression. The generation method comprises: determining the to-be-extracted fields and the non-to-be-extracted fields in an original data character string; for the non-to-be-extracted fields, performing general matching (wildcard) filtering to obtain their regular expressions; for the to-be-extracted fields, traversing their characters so that each character corresponds to one expression, thereby obtaining their regular expressions; and combining the regular expressions of the to-be-extracted and non-to-be-extracted fields in the order in which those fields appear in the original data character string, to obtain the regular expression of the original data. This reduces the difficulty of writing regular expressions and improves data extraction efficiency.
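The generation method summarized above can be sketched in Python as follows. The concrete choices are assumptions, since the text does not specify them: a lazy wildcard `.*?` stands in for the "general matching filtering" of fields not to be extracted; each character of a field to be extracted maps to one expression (`\d` for ASCII digits, `[a-z]` or `[A-Z]` for ASCII letters, `\s` for whitespace, and an escaped literal otherwise); runs of identical expressions are merged into a quantifier such as `\d{4}`, which matches exactly the same strings as the one-to-one form; and each field to be extracted becomes a capture group. The function names are hypothetical.

```python
import re
from itertools import groupby
from typing import List, Tuple

# Assumption: a lazy wildcard replaces every field not to be extracted.
WILDCARD = ".*?"


def char_to_regex(ch: str) -> str:
    """Map a single character to a single expression (one-to-one)."""
    if "0" <= ch <= "9":
        return r"\d"
    if "a" <= ch <= "z":
        return "[a-z]"
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # any other character is kept as an escaped literal


def field_to_regex(text: str) -> str:
    """Traverse a field to be extracted and build its expression.

    Runs of identical expressions are merged into a quantifier (for example,
    four consecutive digits yield one digit class with a {4} quantifier).
    """
    pieces = []
    for expr, group in groupby(char_to_regex(ch) for ch in text):
        count = len(list(group))
        pieces.append(expr if count == 1 else f"{expr}{{{count}}}")
    return "".join(pieces)


def build_regex(fields: List[Tuple[str, bool]]) -> str:
    """Combine per-field expressions in the order the fields occur in the sample."""
    parts = []
    for text, to_extract in fields:
        # Assumption: each field to be extracted becomes one capture group.
        parts.append(f"({field_to_regex(text)})" if to_extract else WILDCARD)
    return "".join(parts)


fields = [("2020-05-15", True), (" ERROR user=", False), ("alice", True)]
pattern = build_regex(fields)
print(re.fullmatch(pattern, "2021-01-02 WARN user=bobby").groups())
# ('2021-01-02', 'bobby')
```

Because each character of an extracted field maps to exactly one expression, the generated pattern fixes the length of that field: a record ending in `user=bobby` matches the pattern built from `user=alice`, while one ending in `user=bob` does not. Handling variable-length fields would need a further generalization step that the text shown here does not describe.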

Description

Technical Field

[0001] The invention relates to the technical field of data processing, and in particular to a method for generating regular expressions and a method for extracting data based on regular expressions.

Background

[0002] In existing data extraction technology, extracting data with regular expressions is a common approach. During data extraction, writers must manually write regular expressions based on data samples.

[0003] However, the semantics of regular expressions are obscure, and writers need highly specialized skills. After writing, the regular expressions must be verified against new data. Even a skilled worker who is good at regular expressions often needs tens of minutes or even hours to complete this process. Writing regular expressions therefore costs writers a great deal of time and effort, which also makes extracting data from raw data with regular expressions inefficient.

[0004] Aiming at the problem ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/903; G06F16/9032
CPC: G06F16/9032; G06F16/90344
Inventors: 孙洪亮 (Sun Hongliang), 张勇 (Zhang Yong)
Owner: BEIJING QIANXIN TECH