Information extraction method, device and equipment and storage medium

An information extraction technology based on a preset format, applied in the field of data processing, addresses the problem of complex field extraction rules. It saves computing space and query time, facilitates query statistics, and reduces labor costs.

Status: Inactive; Publication Date: 2020-09-25
BEIJING YOUTEJIE INFORMATION TECH
5 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

However, in log data processing, writing field extraction rules consumes a great deal of manpower. As log formats keep changing, these rules not only grow increasingly complicated but also need continuous updating and maintenance.


Examples


Embodiment 1

[0023] Figure 1 is a flow chart of an information extraction method provided in Embodiment 1 of the present invention. This embodiment is applicable to field extraction from a large number of logs in different formats. The method can be executed by the information extraction device provided in the embodiments of the present invention; the device can be implemented in software and/or hardware and can generally be integrated into computer equipment.

[0024] As shown in Figure 1, the information extraction method provided in this embodiment specifically includes:

[0025] S110. Obtain a log to be subjected to field extraction.

[0026] The log to be subjected to field extraction is a log from which fields need to be extracted.

[0027] S120. Determine a target log format that matches the log to be subjected to field extraction.

[0028] A log format refers to the encoding format of a log. Specifically, it can be the way related information such as the date, time, user, and action is described and separated ...
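Steps S110 and S120 can be pictured with a small sketch. It assumes (the source does not say this) that each known log format is described by a regular expression whose named groups mark fields such as date, time, user, and action, and that the target format is the first one that matches the whole line. `KNOWN_FORMATS` and `determine_target_format` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical registry of known log formats. Each format is a regular
# expression whose named groups describe how date, time, user, and action
# are expressed and separated.
KNOWN_FORMATS = {
    "space_separated": re.compile(
        r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<user>\S+) (?P<action>\S+)"
    ),
    "pipe_separated": re.compile(
        r"(?P<date>\d{4}/\d{2}/\d{2})\|(?P<time>\d{2}:\d{2}:\d{2})\|"
        r"(?P<user>[^|]+)\|(?P<action>[^|]+)"
    ),
}


def determine_target_format(log_line: str) -> Optional[str]:
    """S120 sketch: return the name of the first known format matching the whole line."""
    for name, pattern in KNOWN_FORMATS.items():
        if pattern.fullmatch(log_line):
            return name
    return None  # no known format matches this log


print(determine_target_format("2020-09-25 10:15:00 alice login"))  # space_separated
print(determine_target_format("2020/09/25|10:15:00|bob|logout"))   # pipe_separated
print(determine_target_format("unstructured text"))                # None
```

First-match order is a simplification; a production matcher might score candidate formats instead.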

Embodiment 2

[0060] Figure 2 is a flowchart of an information extraction method provided in Embodiment 2 of the present invention. This embodiment builds on the technical solution above; specifically, determining the target log format that matches the log to be subjected to field extraction may include:

[0061] When it is determined that there is no custom parsing rule corresponding to the log to be extracted, a target log format matching the log to be extracted is determined.

[0062] Further, when it is determined that there is a custom parsing rule corresponding to the log to be extracted, field extraction is performed on the log to be extracted according to the custom parsing rule.

[0063] As shown in Figure 2, the information extraction method provided in this embodiment specifically includes:

[0064] S210. Obtain a log to be subjected to field extraction.

[0065] S220. Determine whether there is a custom parsing rule corresponding to the log to...
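The precedence described in paragraphs [0061] and [0062] (use a custom parsing rule when one exists, otherwise fall back to format matching) can be sketched as follows. It assumes custom rules are looked up by a log source name and that the automatic branch uses regex templates; `CUSTOM_RULES`, `TEMPLATES`, `parse_audit_csv`, and `extract_fields` are hypothetical.

```python
import re
from typing import Callable, Dict, Optional


def parse_audit_csv(line: str) -> Dict[str, str]:
    # Hypothetical hand-written rule for a comma-separated audit log.
    return dict(zip(("user", "action"), line.split(",", 1)))


# Hypothetical custom parsing rules, keyed by an assumed log source name.
CUSTOM_RULES: Dict[str, Callable[[str], Dict[str, str]]] = {
    "audit.csv": parse_audit_csv,
}

# Hypothetical field extraction templates, one per known log format.
TEMPLATES: Dict[str, re.Pattern] = {
    "user_action": re.compile(r"(?P<user>\S+) (?P<action>\S+)"),
}


def extract_fields(source: str, line: str) -> Optional[Dict[str, str]]:
    # S220: check whether a custom parsing rule corresponds to this log.
    rule = CUSTOM_RULES.get(source)
    if rule is not None:
        # [0062]: a custom rule exists, so it takes precedence.
        return rule(line)
    # [0061]: no custom rule, so determine the matching format and apply its template.
    for pattern in TEMPLATES.values():
        match = pattern.fullmatch(line)
        if match:
            return match.groupdict()
    return None  # no matching format was found
```

For example, `extract_fields("audit.csv", "alice,login")` goes through the custom rule, while `extract_fields("app.log", "bob logout")` falls back to the template and returns `{"user": "bob", "action": "logout"}`.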

Embodiment 3

[0082] Figure 3 is a schematic structural diagram of an information extraction device provided in Embodiment 3 of the present invention. The device is applicable to extracting fields from a large number of logs in different formats, can be implemented in software and/or hardware, and can generally be integrated into computer equipment.

[0083] As shown in Figure 3, the information extraction device specifically includes a log acquisition module 310, a log format determination module 320, and a field extraction module 330, wherein:

[0084] The log acquisition module 310 is configured to obtain the log to be subjected to field extraction;

[0085] The log format determination module 320 is configured to determine the target log format that matches the log to be subjected to field extraction;

[0086] The field extraction module 330 is configured to extract the field name and field value conforming to the preset format according to the field extraction template matchin...
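Paragraphs [0083] to [0086] describe three cooperating modules. The sketch below wires hypothetical Python classes named after modules 310, 320, and 330 into a pipeline. It assumes that log formats and field extraction templates are both regular expressions with named groups, and that the "preset format" is a flat dict of field name to field value; none of these choices come from the source.

```python
import re
from typing import Dict, List, Optional


class LogAcquisitionModule:
    """Module 310 (sketch): obtains the logs to be subjected to field extraction."""

    def __init__(self, lines: List[str]):
        self._lines = lines

    def acquire(self) -> List[str]:
        # Drop trailing newlines and skip blank lines.
        return [line.rstrip("\n") for line in self._lines if line.strip()]


class LogFormatDeterminationModule:
    """Module 320 (sketch): returns the first format whose pattern matches the whole log."""

    def __init__(self, formats: Dict[str, re.Pattern]):
        self._formats = formats

    def determine(self, line: str) -> Optional[str]:
        for name, pattern in self._formats.items():
            if pattern.fullmatch(line):
                return name
        return None


class FieldExtractionModule:
    """Module 330 (sketch): applies the template of the target format and returns
    field names and values as a flat dict (the assumed "preset format")."""

    def __init__(self, templates: Dict[str, re.Pattern]):
        self._templates = templates

    def extract(self, line: str, fmt: str) -> Dict[str, str]:
        match = self._templates[fmt].fullmatch(line)
        return match.groupdict() if match else {}


class InformationExtractionDevice:
    """Hypothetical composition of modules 310, 320, and 330."""

    def __init__(self, lines: List[str], templates: Dict[str, re.Pattern]):
        self.acquisition = LogAcquisitionModule(lines)
        # Simplification: the same regexes serve as format signatures and templates.
        self.format_determination = LogFormatDeterminationModule(templates)
        self.field_extraction = FieldExtractionModule(templates)

    def run(self) -> List[Dict[str, str]]:
        records = []
        for line in self.acquisition.acquire():
            fmt = self.format_determination.determine(line)
            if fmt is not None:  # logs with no matching format are skipped
                records.append(self.field_extraction.extract(line, fmt))
        return records
```

For example, with the single template `user=(?P<user>\S+) action=(?P<action>\S+)`, the inputs `user=alice action=login`, a blank line, and `garbage` yield exactly one record, `{"user": "alice", "action": "login"}`.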


Abstract

The embodiments of the invention disclose an information extraction method, device, equipment, and storage medium. The method comprises: obtaining a log to be subjected to field extraction; determining a target log format that matches this log; and, for this log, extracting a field name and a field value that conform to a preset format according to a field extraction template matched with the target log format, where the field extraction template is determined through log clustering training. With this technical solution, the fields in logs are extracted and converted into a unified preset format, so unstructured data can be converted into structured data. This facilitates query statistics, saves computing space and query time, removes the need to manually write parsing rules for logs of each specific format, and reduces labor costs.
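The abstract says field extraction templates are determined through log clustering training but, in this excerpt, does not give the algorithm. The sketch below substitutes a deliberately simple stand-in technique, not the patent's method: number-like tokens are masked, lines are grouped by token count and first token, positions that differ within a group become wildcards, and each wildcard becomes a named regex group. The names `train_templates`, `template_to_regex`, and `field_N` are hypothetical; a real system would presumably assign meaningful field names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

WILDCARD = "<*>"


def _mask(token: str) -> str:
    # Pre-mask obviously variable tokens (dates, numbers, IPs, hex IDs).
    if re.fullmatch(r"[\d.:/-]+|0x[0-9a-fA-F]+", token):
        return WILDCARD
    return token


def train_templates(lines: List[str]) -> List[str]:
    """Cluster lines sharing token count and first token; positions that are
    constant across a cluster stay literal, all others become wildcards."""
    clusters: Dict[Tuple[int, str], List[List[str]]] = defaultdict(list)
    for line in lines:
        tokens = [_mask(t) for t in line.split()]
        if tokens:
            clusters[(len(tokens), tokens[0])].append(tokens)
    templates = []
    for members in clusters.values():
        template = [
            column[0] if all(t == column[0] for t in column) else WILDCARD
            for column in zip(*members)
        ]
        templates.append(" ".join(template))
    return templates


def template_to_regex(template: str) -> re.Pattern:
    """Turn each wildcard into a numbered named group (field_0, field_1, ...)."""
    parts, index = [], 0
    for token in template.split():
        if token == WILDCARD:
            parts.append(f"(?P<field_{index}>\\S+)")
            index += 1
        else:
            parts.append(re.escape(token))
    return re.compile(" ".join(parts))
```

Two login lines that differ only in date, user, and IP collapse into the template `<*> user <*> login from <*>`. A cluster with a single member keeps all of its unmasked tokens as constants, which is one reason real clustering methods use similarity thresholds instead.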

Description

Technical Field

[0001] The embodiments of the present invention relate to the technical field of data processing, and in particular to an information extraction method, device, equipment, and storage medium.

Background

[0002] Data in computer information systems can be divided into structured data and unstructured data. Unstructured data comes in highly diverse formats that follow diverse standards, which makes it difficult to understand and use directly. Once converted into structured data, it can be stored in search engines, relational databases, non-relational databases, and other systems for further analysis; stored in databases for analysis by business intelligence software; imported into other systems by format as a real-time stream; or imported into other systems by batch processing.

[0003] Converting unstructured data into structured data first requires classifying and extracting the information in logs. Usually,...
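Paragraph [0002] lists where structured data can go once converted. As a minimal sketch, assuming extracted records are dicts of field name to field value and choosing newline-delimited JSON as the target (an assumption, not stated in the source), a fixed field list gives every row the same schema for bulk import into a search engine or database. `to_json_lines` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, List, Optional


def to_json_lines(records: Iterable[Dict[str, str]], fields: List[str]) -> str:
    """Serialize records as newline-delimited JSON with a fixed field order,
    filling missing fields with None (JSON null) so every row shares one schema."""
    rows = []
    for record in records:
        row: Dict[str, Optional[str]] = {name: record.get(name) for name in fields}
        rows.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(rows)


print(to_json_lines([{"user": "alice", "action": "login"}, {"user": "bob"}], ["user", "action"]))
# {"user": "alice", "action": "login"}
# {"user": "bob", "action": null}
```

`ensure_ascii=False` keeps non-ASCII field values (for example Chinese user names) readable in the output.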


Application Information

IPC(8): G06F16/31; G06F16/35
CPC: G06F16/313; G06F16/353
Inventors: 饶琛琳, 梁玫娟
Owner: BEIJING YOUTEJIE INFORMATION TECH