Data monitoring method, device, equipment and storage medium

By establishing a lineage graph in the big data platform and monitoring data changes and anomalies in the application processing chain, the problem of inaccurate task dependencies was solved, enabling real-time identification and monitoring of the impact on key applications and improving the accuracy and efficiency of data analysis.

CN116302829B · Active · Publication Date: 2026-06-16 · CHINA PING AN PROPERTY INSURANCE CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA PING AN PROPERTY INSURANCE CO LTD
Filing Date: 2023-03-23
Publication Date: 2026-06-16

Application Information

Patent Timeline

23 Mar 2023: Application
16 Jun 2026: Publication (CN116302829B)

IPC: G06F11/30; G06F16/242; G06F16/2455; G06F16/28

CPC: G06F11/302; G06F11/3051; G06F11/3072; G06F16/2433; G06F16/2455; G06F16/285; Y02D10/00

AI Tagging

Application Domain

Relational databases; Hardware monitoring



Figure CN116302829B_ABST


Abstract

The application relates to the technical field of big data and discloses a data monitoring method, device, equipment and storage medium. The method comprises: obtaining task records generated by each application in a processing link and parsing the task records to obtain the SQL statement corresponding to each task; obtaining execution logs of the SQL statements and parsing the execution logs to obtain the source tables and target tables of each task; taking each application as a root node and the table name of each table as a child node, establishing a lineage graph between the tasks, the source tables and the target tables according to the processing levels of each application's processing link; and monitoring each node in the lineage graph and the downstream lineage applications of each node. Compared with data analysis that relies on task scheduling dependencies, the method accurately identifies the data involved in an application's processing link and establishes a lineage graph, and by monitoring that data, the influence of data changes on lineage-related applications can be known in real time.


Description

Technical Field

[0001] This invention relates to the field of big data technology, and in particular to a data monitoring method, apparatus, device, and storage medium.

Background Technology

[0002] As data plays an increasingly important role in daily business analysis, most enterprises have established big data platforms, BI (Business Intelligence) reporting and analysis systems, and data mining application ecosystems. Due to the complexity of enterprise operations, different business systems are usually established, such as: auto insurance policy issuance, non-auto insurance policy issuance, group insurance policy issuance, auto insurance claims, and property insurance claims, resulting in data fragmentation between different systems. This necessitates the unified and centralized management of data from the source systems through data layering and standardization, roughly divided into ODS (Operational Data Store) -> DWD (Data Warehouse Detail) -> DWM (Data Warehouse Middle) -> DWS (Data Warehouse Service) -> APP (Application). Data is processed layer by layer to the application end, and often an application has several to more than a dozen layers in the link from the source to the APP. In addition, there are numerous data applications, typically involving tens of thousands of application interface tables; however, the core metrics, dimensions, and applications of actual enterprises are not that many. This necessitates hierarchical management of applications, focusing on the core, controlling the important ones, and mastering all application assets.

[0003] Current technologies mainly construct directed acyclic graphs from source to application based on task scheduling dependencies in big data. However, this task-based hierarchical relationship has two main drawbacks: first, the task dependencies are manually maintained, which may result in inaccurate information; second, in many scenarios, multiple tasks write data to a single table simultaneously. This limits the analysis to the task level, and the information may be inaccurate, making it impossible to accurately determine the impact of table changes on the application.

Summary of the Invention

[0004] This invention provides a data monitoring method, apparatus, device, and storage medium that can accurately identify the data involved in the application processing chain and establish a lineage graph. By monitoring the data, the impact of data changes on lineage-related applications can be known in real time.

[0005] To solve the above-mentioned technical problems, one technical solution adopted by the present invention is to provide a data monitoring method, comprising:

[0006] Obtain the task records generated by each application in the processing link and parse the task records to obtain the SQL statements corresponding to each task;

[0007] Obtain the execution log of the SQL statement and parse the execution log to obtain the source table and target table of each task;

[0008] With the application as the root node and the table name of each table as a child node, a lineage graph is established between each task, source table, and target table based on the processing level of the processing link of each application.

[0009] Monitor each node in the lineage graph and the downstream lineage applications of each node.

[0010] According to one embodiment of the present invention, the monitoring of each node in the lineage graph and the downstream lineage applications of each node further includes:

[0011] Monitor in real time whether any node in the lineage graph has changed;

[0012] If so, locate the changed node and identify all downstream lineage applications of the changed node according to the lineage graph;

[0013] Determine whether the downstream lineage application is a preset key application;

[0014] If so, send the changed node to the maintenance personnel corresponding to the preset key application.

[0015] According to one embodiment of the present invention, the monitoring of each node in the lineage graph and the downstream lineage applications of each node further includes:

[0016] Monitor in real time whether any node in the lineage graph is abnormal;

[0017] If so, locate the abnormal node and identify all downstream lineage applications of the abnormal node according to the lineage graph;

[0018] Determine whether the downstream lineage application is a preset key application;

[0019] If so, send the abnormal node to the maintenance personnel corresponding to the preset key application.

[0020] According to one embodiment of the present invention, obtaining the execution log of the SQL statement and parsing the execution log to obtain the source table and target table of each task includes:

[0021] Obtain the execution log of the SQL statement and parse the execution log;

[0022] Based on preset regular expressions, the first and second preset keywords are extracted from the execution log by keyword search;

[0023] The source table of each task is identified based on the first preset keyword, and the target table of each task is identified based on the second preset keyword.

[0024] According to an embodiment of the present invention, obtaining and parsing the execution log of the SQL statement includes:

[0025] The execution logs are synchronized to the big data cluster, and the execution logs are cleaned.

[0026] The cleaned execution logs are segmented to obtain field information.

[0027] According to one embodiment of the present invention, after obtaining the execution log of the SQL statement and parsing the execution log to obtain the source table and target table of each task, the method further includes:

[0028] Determine whether the data structure relationship between the source table and the target table of each task satisfies the requirements of a directed acyclic graph;

[0029] If so, then retain the source table and the target table;

[0030] If not, then remove the source table and the target table.

[0031] According to an embodiment of the present invention, before establishing the lineage graph between tasks, source tables, and target tables based on the processing hierarchy of the processing links of each application, with the application as the root node and the table names of each table as child nodes, the method further includes:

[0032] The applications are classified into different levels;

[0033] Based on the classification results, key applications are identified and tagged.

[0034] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is: to provide a data monitoring device, comprising:

[0035] The first acquisition module is used to acquire task records generated by each application in the processing link and parse the task records to obtain SQL statements corresponding to each task.

[0036] The second acquisition module is used to acquire the execution log of the SQL statement and parse the execution log to obtain the source table and target table of each task;

[0037] The establishment module is used to establish a lineage graph between tasks, source tables, and target tables, with the application as the root node and the table name of each table as a child node, based on the processing level of the processing link of each application.

[0038] The monitoring module is used to monitor each node in the lineage graph and the downstream lineage applications of each node.

[0039] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is to provide a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the data monitoring method.

[0040] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is to provide a computer storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned data monitoring method when executed by a processor.

[0041] The beneficial effects of this invention are as follows: By acquiring and parsing the task records generated by each application in the processing chain, the SQL statements corresponding to each task are obtained; by acquiring and parsing the execution logs of the SQL statements, the source tables and target tables of each task are obtained; with the application as the root node and the table name of each table as the child node, a lineage relationship graph between each task, source table, and target table is established according to the processing level of each application's processing chain; by monitoring each node in the lineage relationship graph and the downstream lineage applications of each node, compared with the method of parsing data based on the scheduling dependency relationship of dependent tasks, the data involved in the application processing chain can be accurately identified and a lineage relationship graph can be established. By monitoring the data, the impact of data changes on related applications with lineage relationships can be known in real time.

Attached Figure Description

[0042] Figure 1 is a flowchart illustrating a data monitoring method according to an embodiment of the present invention;

[0043] Figure 2 is a flowchart illustrating a data monitoring method according to another embodiment of the present invention;

[0044] Figure 3 is a flowchart illustrating a data monitoring method according to another embodiment of the present invention;

[0045] Figure 4 is a flowchart illustrating a data monitoring method according to another embodiment of the present invention;

[0046] Figure 5 is a flowchart illustrating a data monitoring method according to another embodiment of the present invention;

[0047] Figure 6 is a schematic diagram of the structure of the data monitoring device according to an embodiment of the present invention;

[0048] Figure 7 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention;

[0049] Figure 8 is a schematic diagram of the structure of a computer storage medium according to an embodiment of the present invention.

Detailed Implementation

[0050] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0051] The terms "first," "second," and "third" used in this invention are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified. All directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of this invention are only used to explain the relative positional relationships and movements between components in a specific orientation (as shown in the figures). If the specific orientation changes, the directional indications also change accordingly. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.

[0052] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.

[0053] Figure 1 is a flowchart illustrating a data monitoring method according to an embodiment of the present invention. It should be noted that, provided substantially the same result is achieved, the method of the present invention is not limited to the process sequence shown in Figure 1. As shown in Figure 1, the method includes the following steps:

[0054] Step S101: Obtain the task records generated by each application in the processing link and parse the task records to obtain the SQL statements corresponding to each task.

[0055] In step S101, the daily running schedule tasks of each application are obtained from the production environment. Through the schedule tasks, all task records executed on the big data platform can be obtained from the logs of YARN (Yet Another Resource Negotiator). The raw data is extracted into a JSON format file, and the data in the file is parsed to obtain the execution statement containing key information such as JOB_ID and JOB. The execution statement is the SQL statement.
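The parsing described in step S101 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the source names the `JOB_ID` and `JOB` keys but does not specify the file layout, so the JSON-array structure, the function name `extract_sql_statements`, and the sample records are all assumptions.

```python
import json

def extract_sql_statements(raw_json: str) -> dict[str, str]:
    """Map each JOB_ID to its SQL statement (the JOB field).

    Assumes (hypothetically) that the extracted file is a JSON array of
    objects with "JOB_ID" and "JOB" keys; records missing either are skipped.
    """
    statements = {}
    for record in json.loads(raw_json):
        job_id = record.get("JOB_ID")
        sql = record.get("JOB")
        if job_id and sql:
            statements[job_id] = sql.strip()
    return statements

# Hypothetical sample records, for illustration only.
sample = json.dumps([
    {"JOB_ID": "job_001", "JOB": "INSERT INTO dwd.policy SELECT * FROM ods.policy "},
    {"JOB_ID": "job_002"},  # no SQL recorded, so this record is skipped
])
sql_by_job = extract_sql_statements(sample)
```

Keying the result by `JOB_ID` keeps each SQL statement tied to its task, which the later steps rely on when associating tables with tasks.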

[0056] Step S102: Obtain the execution log of the SQL statement and parse the execution log to obtain the source table and target table of each task.

[0057] In step S102, the SQL statement execution log carries the table name of the source table recording the original processed data and the table name of the target table that needs to write data when executing the SQL statement. By parsing the execution log, the source and target tables used to execute each task can be identified. In this embodiment, a task can correspond to multiple source tables and multiple target tables, and a target table can also correspond to multiple source tables, and a source table can also correspond to multiple target tables. However, the data structure relationship between the source tables and target tables of each task should meet the requirements of a directed acyclic graph, that is, the data structure relationship cannot form a closed loop.

[0058] In one embodiment, the association between each task and the source table and the target table is established, and the task name, the corresponding source table name, and the target table name are displayed.

[0059] This embodiment obtains the source and target tables of each task through the execution log of SQL statements, which can accurately obtain the processed data of the tables and the relationship between the tables and tasks. It can be used not only to analyze the impact of tasks on applications, but also to analyze the impact of tables on applications, effectively improving data accuracy and solving the problem of low data accuracy obtained by the original application processing chain through task scheduling dependencies.

[0060] In one feasible embodiment, referring to Figure 2, step S102 further includes:

[0061] Step S201: Obtain the execution log of the SQL statement and parse the execution log.

[0062] In step S201, the operation log is synchronized to the big data cluster, and the operation log is cleaned. The cleaned operation log is then segmented to obtain field information. This embodiment improves the accuracy of data extraction through data cleaning, ensuring high reliability of data information. Segmentation facilitates subsequent keyword searching, improving search efficiency.

[0063] Specifically, the runtime logs are synchronized to the big data Hadoop cluster. Synchronization tools such as Filebeat, Filesycn, and ETL are used to feed the runtime logs into the Hadoop cluster, and the log data is filtered and cleaned according to the log format. The logs are then split into fields using a word segmentation tool according to the format, such as date, executed SQL, execution time, returned result data volume, whether there was an error, and error message.
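The cleaning and field-splitting stage might look like the sketch below. The field list (date, executed SQL, execution time, result volume, error flag, error message) comes from the paragraph above; the pipe-delimited line layout, the `LogRecord` type, and the function name `parse_log_line` are assumptions made for illustration, since the source does not give the actual log format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    date: str
    sql: str
    execution_seconds: float
    result_rows: int
    has_error: bool
    error_message: str

def parse_log_line(line: str, sep: str = "|") -> Optional[LogRecord]:
    """Clean one raw log line and split it into fields.

    Returns None for lines that do not match the assumed six-field layout;
    dropping such lines is how this sketch "cleans" the log.
    """
    parts = [p.strip() for p in line.strip().split(sep)]
    if len(parts) != 6:
        return None
    date, sql, seconds, rows, error_flag, message = parts
    try:
        return LogRecord(date, sql, float(seconds), int(rows),
                         error_flag.lower() == "true", message)
    except ValueError:
        return None  # non-numeric time or row count: treat as dirty data

# Hypothetical log lines, for illustration only.
raw_lines = [
    "2023-03-23 | INSERT INTO dwd.policy SELECT * FROM ods.policy | 12.5 | 1000 | false | ",
    "corrupted line without separators",
]
records = [r for r in map(parse_log_line, raw_lines) if r is not None]
```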

[0064] Step S202: Extract the first and second preset keywords from the running logs by searching for keywords based on preset regular expressions.

[0065] In step S202, the first preset keyword is "table scan" and the second preset keyword is "loadingdata".

[0066] Step S203: Identify the source table of each task based on the first preset keyword, and identify the target table of each task based on the second preset keyword.

[0067] In step S203, the field after the first preset keyword is the name of the source table, and the field after the second preset keyword is the name of the target table. Therefore, the source table and the target table can be identified by locating the first and second preset keywords through keyword search.
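Steps S202 and S203 can be illustrated with a short regular-expression sketch. The two keywords are taken from step S202 as written; the surrounding log layout, the assumption that the table name is the next whitespace-delimited token, and the function name `extract_tables` are hypothetical.

```python
import re

# Keywords as given in step S202; the table name is assumed to be the
# token that immediately follows the keyword.
SOURCE_PATTERN = re.compile(r"table scan\s+([\w.]+)", re.IGNORECASE)
TARGET_PATTERN = re.compile(r"loadingdata\s+([\w.]+)", re.IGNORECASE)

def extract_tables(log_text: str) -> tuple[set[str], set[str]]:
    """Return (source tables, target tables) found in one task's execution log."""
    sources = set(SOURCE_PATTERN.findall(log_text))
    targets = set(TARGET_PATTERN.findall(log_text))
    return sources, targets

# Hypothetical log excerpt, for illustration only.
sample_log = """\
stage-1 table scan ods.policy
stage-1 table scan ods.customer
stage-2 loadingdata dwd.policy_detail
"""
sources, targets = extract_tables(sample_log)
```

Returning sets reflects the statement in step S102 that one task may have several source tables and several target tables.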

[0068] In one feasible embodiment, referring to Figure 3, after step S102 the method further includes:

[0069] Step S301: Determine whether the data structure relationship between the source table and the target table of each task satisfies the requirements of a directed acyclic graph.

[0070] In step S301, the data structure relationship between the source table and the target table of each task should meet the requirements of a directed acyclic graph, that is, the data structure relationship cannot form a closed loop. For example, if a task has target table A and source table B, the same task must not also have target table B and source table A. Data that satisfies this condition is normal; data that forms a closed loop is abnormal.

[0071] Step S302: If yes, then retain the source table and the target table.

[0072] In step S302, if the data structure relationship between the source table and the target table of each task satisfies the requirements of a directed acyclic graph, it indicates that the data is normal and can be retained.

[0073] Step S303: If not, remove the source table and the target table.

[0074] In step S303, if the data structure relationship between the source table and the target table of each task does not meet the requirements of a directed acyclic graph, it indicates that the data is abnormal and needs to be removed.
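The check in steps S301 to S303 amounts to cycle detection on each task's source-to-target edges. The source does not name an algorithm; the sketch below uses Kahn's topological sort as one common choice, and the function names and sample tasks are hypothetical.

```python
from collections import defaultdict, deque

def is_acyclic(edges: list[tuple[str, str]]) -> bool:
    """Kahn's algorithm: True if the (source -> target) edges form a DAG."""
    successors = defaultdict(set)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if dst not in successors[src]:
            successors[src].add(dst)
            in_degree[dst] += 1
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    # Any node never reaching in-degree 0 lies on a closed loop.
    return visited == len(nodes)

def filter_tasks(task_edges: dict[str, list[tuple[str, str]]]) -> dict[str, list[tuple[str, str]]]:
    """S302/S303: retain tasks whose tables form a DAG, remove the rest."""
    return {task: edges for task, edges in task_edges.items() if is_acyclic(edges)}

# Hypothetical tasks: task_b reads B to write A and also reads A to write B.
tasks = {
    "task_a": [("B", "A")],
    "task_b": [("B", "A"), ("A", "B")],
}
kept = filter_tasks(tasks)
```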

[0075] Step S103: With the application as the root node and the table name of each table as the child node, establish a lineage graph between each task, source table, and target table according to the processing level of each application's processing link.

[0076] In step S103, the processing chain for each application refers to the process from source processing to the application product, and the processing levels can range from several to over a dozen layers. In one embodiment, the processing levels can be ODS (Operational Data Store) -> DWD (Data Warehouse Detail) -> DWM (Data Warehouse Middle) -> DWS (Data Warehouse Service) -> APP (Application). This lineage diagram illustrates the application's processing flow and can be used for data tracing.
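One way to picture the graph built in step S103 is an adjacency map rooted at each application, with tables as child nodes linked back toward their sources. The sketch below is an assumption about the data structure, which the source does not specify; the function names and the table, task, and application names are hypothetical.

```python
from collections import defaultdict

def build_lineage(app_tables, task_edges):
    """Build the lineage graph.

    app_tables: application -> tables it consumes directly (root -> children)
    task_edges: iterable of (task, source_table, target_table)
    Returns (children, writers): children maps a node to its upstream child
    nodes; writers maps a table to the tasks that write to it.
    """
    children = defaultdict(set)
    writers = defaultdict(set)
    for app, tables in app_tables.items():
        children[app].update(tables)
    for task, source, target in task_edges:
        children[target].add(source)
        writers[target].add(task)
    return children, writers

def trace_upstream(children, root):
    """Collect every table reachable from an application root (data tracing)."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# Hypothetical three-layer chain ending in one application.
app_tables = {"policy_dashboard": {"dws.policy_summary"}}
task_edges = [
    ("task_load_dwd", "ods.policy", "dwd.policy"),
    ("task_build_dws", "dwd.policy", "dws.policy_summary"),
]
children, writers = build_lineage(app_tables, task_edges)
chain = trace_upstream(children, "policy_dashboard")
```

Recording the writing tasks per table keeps the table-level view while still allowing several tasks to write to the same table, the case the background section identifies as a weakness of task-level dependency graphs.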

[0077] In one feasible embodiment, each application is classified into levels, key applications are determined based on the classification results, and key applications are tagged. The node attributes may include the application name, application level, and the name of the task written to the table.

[0078] Specifically, table parameter information of the target and source tables used by each application can be obtained. Based on this table parameter information, each application can be classified into different levels. The table parameter information includes the number of table calls, key field information, and database access frequency. Higher values for these metrics indicate greater importance of the corresponding table and related application. In one embodiment, application levels can be divided into four categories: core, important, medium, and general. Applications at the core and important levels can be defined as critical applications, and these critical applications can be tagged accordingly.
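The classification could be sketched as a simple scoring rule. The four level names and the rule that core and important applications count as critical come from the paragraph above; the scoring weights, the thresholds, the reading of "key field information" as a count, and all names in the code are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    call_count: int          # number of table calls
    key_field_count: int     # assumed numeric proxy for key field information
    daily_access_count: int  # database access frequency

def classify_application(tables: list[TableStats]) -> str:
    """Assign a level from the stats of the tables an application uses.

    The weighting and thresholds below are hypothetical; the source only
    says that higher values indicate greater importance.
    """
    score = sum(t.call_count + t.daily_access_count + 10 * t.key_field_count
                for t in tables)
    if score >= 1000:
        return "core"
    if score >= 500:
        return "important"
    if score >= 100:
        return "medium"
    return "general"

def is_key_application(level: str) -> bool:
    """Core and important applications are tagged as critical."""
    return level in {"core", "important"}

# Hypothetical applications and statistics.
levels = {
    "policy_dashboard": classify_application([TableStats(600, 5, 400)]),
    "adhoc_report": classify_application([TableStats(10, 0, 5)]),
}
key_apps = {app for app, level in levels.items() if is_key_application(level)}
```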

[0079] Step S104: Monitor each node in the lineage graph and the downstream lineage applications of each node.

[0080] In step S104, each node in the lineage graph and its downstream lineage applications are monitored. Common monitoring methods include real-time monitoring or scheduled task monitoring. The lineage graph records the detailed processing levels of each application from the source system to the application, specifically which tables and tasks are used. By monitoring changes in the tables, changes in the tasks or applications related to the tables can be tracked.

[0081] In one feasible implementation, in a monitoring scenario prior to version release, referring to Figure 4, step S104 may include the following steps:

[0082] Step S401: Monitor in real time whether any node in the lineage graph has changed.

[0083] Step S402: If yes, locate the change node and identify all downstream related applications of the change node according to the lineage graph.

[0084] Step S403: Determine whether the downstream lineage application is a preset critical application.

[0085] Step S404: If yes, send the change node to the maintenance personnel corresponding to the preset critical application.

[0086] In this embodiment, a node change refers to a change in the table structure or script. If a node change is detected, the changed node is located and all downstream related applications are identified based on the lineage graph. Based on the tags, the system determines whether a preset critical application is among them. If so, the relevant operations and maintenance personnel are notified via an email service, thereby enabling monitoring of changes on the critical application chain. Whether the changed node actually affects the critical application is then reviewed manually.
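Steps S401 to S404 can be sketched as a snapshot comparison followed by a downstream walk of the lineage graph. The source does not say how changes are detected; comparing SHA-256 fingerprints of table structures or scripts is an assumption, as are the function names, table names, and addresses used here.

```python
import hashlib
from collections import deque

def fingerprint(definition: str) -> str:
    """Hash a table structure or script so snapshots can be compared cheaply."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()

def find_changed_nodes(previous: dict[str, str], current: dict[str, str]) -> set[str]:
    """S401: nodes whose fingerprint differs between two snapshots."""
    return {node for node, digest in current.items()
            if node in previous and previous[node] != digest}

def downstream_applications(downstream, start, applications):
    """S402: breadth-first walk from a node to every reachable application."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if nxt in applications:
                    found.add(nxt)
    return found

def plan_notifications(changed, downstream, app_levels, owners):
    """S403-S404: keep only key applications and pair each with its maintainer."""
    key_levels = {"core", "important"}
    plan = []
    for node in sorted(changed):
        for app in sorted(downstream_applications(downstream, node, set(app_levels))):
            if app_levels[app] in key_levels:
                plan.append((owners[app], app, node))
    return plan

# Hypothetical snapshots and lineage, for illustration only.
before = {"ods.policy": fingerprint("id INT, premium DECIMAL"),
          "ods.claim": fingerprint("id INT")}
after = {"ods.policy": fingerprint("id INT, premium DECIMAL, channel STRING"),
         "ods.claim": fingerprint("id INT")}
downstream = {"ods.policy": {"dwd.policy"},
              "dwd.policy": {"policy_dashboard", "adhoc_report"}}
app_levels = {"policy_dashboard": "core", "adhoc_report": "general"}
owners = {"policy_dashboard": "ops-a@example.com", "adhoc_report": "ops-b@example.com"}

plan = plan_notifications(find_changed_nodes(before, after), downstream, app_levels, owners)
```

In this sample only `ods.policy` changed, and of its two downstream applications only the core-level one produces a notification entry.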

[0087] In another feasible embodiment, in the monitoring scenario where the task is actually running, referring to Figure 5, step S104 may include the following steps:

[0088] Step S501: Monitor in real time whether any node in the lineage graph is abnormal.

[0089] Step S502: If yes, locate the abnormal node and identify all downstream related applications of the abnormal node according to the lineage graph.

[0090] Step S503: Determine whether the downstream lineage application is a preset key application.

[0091] Step S504: If so, send the abnormal node to the operation and maintenance personnel corresponding to the preset critical application.

[0092] In this embodiment, a node anomaly refers to a node reporting an error or operational anomaly. If a node anomaly is detected, the abnormal node is located and all downstream related applications are identified based on the lineage graph. The system identifies whether a preset critical application is included based on the tag. If so, the relevant operations and maintenance personnel are notified via email service, thereby enabling the monitoring of changes on the critical application chain. Furthermore, the abnormal node is manually reviewed to determine whether it has affected the critical application.
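The email notification mentioned above might be composed as in the following sketch, which uses Python's standard `email.message.EmailMessage`. The message wording, the function name `build_anomaly_alert`, and the addresses are hypothetical, and actually sending the message (for example through an SMTP server) is omitted because it depends on the deployment.

```python
from email.message import EmailMessage

def build_anomaly_alert(node: str, error_text: str, app: str, recipient: str,
                        sender: str = "lineage-monitor@example.com") -> EmailMessage:
    """Compose the alert sent to the maintainer of an affected key application."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[lineage alert] anomaly at {node} affects {app}"
    message.set_content(
        f"Abnormal node: {node}\n"
        f"Reported error: {error_text}\n"
        f"Affected key application: {app}\n"
        "Please review whether the application is impacted."
    )
    return message

# Hypothetical anomaly, for illustration only.
alert = build_anomaly_alert(
    node="dwd.policy",
    error_text="task failed with exit code 1",
    app="policy_dashboard",
    recipient="ops-a@example.com",
)
```

Including the abnormal node and the affected application in the message gives the maintainer what is needed for the manual review described in this embodiment.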

[0093] An embodiment of the data monitoring method of the present invention obtains task records generated by each application in the processing chain and parses the task records to obtain the SQL statements corresponding to each task; obtains and parses the execution logs of the SQL statements to obtain the source table and target table of each task; establishes a lineage relationship graph between each task, source table, and target table based on the processing level of each application's processing chain, with the application as the root node and the table name of each table as the child node; and monitors each node in the lineage relationship graph and its downstream lineage applications. Compared with the method of parsing data based on the scheduling dependency relationship of dependent tasks, this method can accurately identify the data involved in the application processing chain and establish a lineage relationship graph. By monitoring the data, the method can know in real time the impact of data changes on related applications with lineage relationships.

[0094] Figure 6 is a schematic diagram of the structure of the data monitoring device according to an embodiment of the present invention. As shown in Figure 6, the device 60 includes a first acquisition module 61, a second acquisition module 62, an establishment module 63, and a monitoring module 64.

[0095] The first acquisition module 61 is used to acquire task records generated by each application in the processing link and parse the task records to obtain SQL statements corresponding to each task;

[0096] The second acquisition module 62 is used to acquire and parse the execution log of the SQL statement to obtain the source table and target table of each task;

[0097] The establishment module 63 is used to establish a lineage graph between tasks, source tables, and target tables, with the application as the root node and the table name of each table as a child node, based on the processing level of each application's processing chain.

[0098] The monitoring module 64 is used to monitor each node in the lineage graph and the downstream lineage applications of each node.

[0099] In one possible embodiment, the monitoring module 64 may include a first monitoring unit, a first execution unit, a first judgment unit, and a second execution unit;

[0100] The first monitoring unit is used to monitor in real time whether any node in the lineage graph has changed;

[0101] If so, the first execution unit is used to locate the change node and identify all downstream related applications of the change node according to the lineage graph;

[0102] The first judgment unit is used to determine whether the downstream lineage application is a preset key application;

[0103] The second execution unit is used to send the change node to the operation and maintenance personnel corresponding to the preset key application if the condition is met.

[0104] In another possible embodiment, the monitoring module 64 may include a second monitoring unit, a third execution unit, a second judgment unit, and a fourth execution unit;

[0105] The second monitoring unit is used to monitor in real time whether any node in the lineage graph is abnormal.

[0106] The third execution unit is used to locate the abnormal node and identify all downstream related applications of the abnormal node according to the lineage graph.

[0107] The second judgment unit is used to determine whether the downstream lineage application is a preset key application;

[0108] The fourth execution unit is used to send the abnormal node to the operation and maintenance personnel corresponding to the preset critical application if the condition is met.

[0109] In another possible embodiment, the second acquisition module 62 may further include an acquisition unit, an extraction unit, and an identification unit;

[0110] The acquisition unit is used to retrieve and parse the execution logs of SQL statements;

[0111] The extraction unit is used to extract the first and second preset keywords from the operation log based on the keyword search using preset regular expressions;

[0112] The identification unit is used to identify the source table of each task according to the first preset keyword, and to identify the target table of each task according to the second preset keyword.

[0113] Referring to Figure 7, Figure 7 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention. As shown in Figure 7, the computer device 70 includes a processor 71 and a memory 72 coupled to the processor 71.

[0114] The memory 72 stores program instructions for implementing the data monitoring method described in any of the above embodiments.

[0115] The data monitoring method includes the following steps:

[0116] Obtain the task records generated by each application in the processing chain and parse the task records to obtain the SQL statements corresponding to each task;

[0117] Obtain and parse the execution logs of the SQL statements to obtain the source and target tables for each task;

[0118] With the application as the root node and the table name of each table as a child node, a lineage graph is established between each task, source table, and target table based on the processing level of each application's processing chain.

[0119] Monitor each node in the lineage graph and the downstream lineage applications of each node.
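
The graph-building step above can be sketched as follows. This is a simplified illustration rather than the implementation of the embodiment: it assumes the source and target tables of each task have already been parsed, it prefixes application root nodes with `app:` to keep them distinct from table names, and it omits task labels on edges. The function name and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def build_lineage_graph(task_tables):
    """Build an adjacency mapping for the lineage graph.

    task_tables: iterable of (application, task, source_tables, target_tables).
    Each application root points at its first-level tables, that is, tables it
    reads but does not itself produce; each source table points at its targets.
    """
    edges = defaultdict(set)     # node -> child nodes
    read = defaultdict(set)      # application -> tables read by its tasks
    produced = defaultdict(set)  # application -> tables written by its tasks
    for app, _task, sources, targets in task_tables:
        read[app].update(sources)
        produced[app].update(targets)
        for src in sources:
            edges[src].update(targets)  # source table -> target tables
    for app in read:
        # Root node: attach only tables at the first processing level.
        edges[f"app:{app}"].update(read[app] - produced[app])
    return {node: sorted(children) for node, children in edges.items()}
```

Monitoring can then watch each key of the returned mapping and follow its children to find what is affected downstream.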

[0120] The processor 71 is used to execute program instructions stored in the memory 72 to monitor data.

[0121] The processor 71 can also be referred to as a CPU (Central Processing Unit). The processor 71 may be an integrated circuit chip with signal processing capabilities. The processor 71 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. A general-purpose processor can be a microprocessor or any conventional processor.

[0122] Please refer to Figure 8, which is a schematic diagram of the structure of a computer storage medium according to an embodiment of the present invention. The computer storage medium of this embodiment stores a program file 81 capable of implementing all the above methods. The program file 81 can be stored in the computer storage medium in the form of a software product, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention.

[0123] The data monitoring method includes the following steps:

[0124] Obtain the task records generated by each application in the processing chain and parse the task records to obtain the SQL statements corresponding to each task;

[0125] Obtain and parse the execution logs of the SQL statements to obtain the source and target tables for each task;

[0126] With the application as the root node and the table name of each table as a child node, a lineage graph is established between each task, source table, and target table based on the processing level of each application's processing chain.

[0127] Monitor each node in the lineage graph and the downstream lineage applications of each node.

[0128] The aforementioned computer storage media include: USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks or optical disks, and other media that can store program code, or terminal devices such as computers, servers, mobile phones, and tablets.

[0129] In the embodiments provided by this invention, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a logical functional division, and in actual implementation there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

[0130] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0131] The above are merely embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A data monitoring method, characterized in that it comprises: obtaining the task records generated by each application in the processing chain and parsing the task records to obtain the SQL statement corresponding to each task; obtaining the execution log of the SQL statement and parsing the execution log to obtain the source table and target table of each task; with the application as the root node and the table name of each table as a child node, establishing a lineage graph between each task, source table, and target table based on the processing level of the processing chain of each application; and monitoring each node in the lineage graph and the downstream lineage applications of each node; wherein the step of obtaining the execution log of the SQL statement and parsing the execution log comprises: synchronizing the execution log to the big data cluster and cleaning the execution log; and segmenting the cleaned execution log to obtain field information; and wherein, before the step of establishing the lineage graph between each task, source table, and target table based on the processing level of the processing chain of each application, with the application as the root node and the table name of each table as a child node, the method further comprises: classifying the applications into different levels; and identifying and tagging key applications based on the classification results.

2. The data monitoring method according to claim 1, characterized in that the step of monitoring each node in the lineage graph and the downstream lineage applications of each node further comprises: monitoring in real time whether any node in the lineage graph has changed; if so, locating the changed node and identifying all downstream lineage applications of the changed node according to the lineage graph; determining whether the downstream lineage application is a preset key application; and if so, sending the changed node to the operation and maintenance personnel corresponding to the preset key application.

3. The data monitoring method according to claim 1, characterized in that the step of monitoring each node in the lineage graph and the downstream lineage applications of each node further comprises: monitoring in real time whether an abnormality occurs at any node in the lineage graph; if so, locating the abnormal node and identifying all downstream lineage applications of the abnormal node according to the lineage graph; determining whether the downstream lineage application is a preset key application; and if so, sending the abnormal node to the operation and maintenance personnel corresponding to the preset key application.

4. The data monitoring method according to claim 1, characterized in that the step of obtaining the execution log of the SQL statement and parsing the execution log to obtain the source table and target table of each task comprises: obtaining the execution log of the SQL statement and parsing the execution log; extracting the first and second preset keywords from the execution log by keyword search based on preset regular expressions; and identifying the source table of each task based on the first preset keyword and the target table of each task based on the second preset keyword.

5. The data monitoring method according to claim 1, characterized in that, after the step of obtaining the execution log of the SQL statement and parsing the execution log to obtain the source table and target table of each task, the method further comprises: determining whether the data structure relationship between the source table and the target table of each task satisfies the requirements of a directed acyclic graph; if so, retaining the source table and the target table; and if not, removing the source table and the target table.

6. A data monitoring device, characterized in that it comprises: a first acquisition module, used to acquire the task records generated by each application in the processing chain and parse the task records to obtain the SQL statement corresponding to each task; a second acquisition module, used to acquire the execution log of the SQL statement and parse the execution log to obtain the source table and target table of each task; an establishment module, used to establish a lineage graph between each task, source table, and target table based on the processing level of the processing chain of each application, with the application as the root node and the table name of each table as a child node; and a monitoring module, used to monitor each node in the lineage graph and the downstream lineage applications of each node; wherein the step of obtaining the execution log of the SQL statement and parsing the execution log comprises: synchronizing the execution log to the big data cluster and cleaning the execution log; and segmenting the cleaned execution log to obtain field information; and wherein, before the step of establishing the lineage graph between each task, source table, and target table based on the processing level of the processing chain of each application, with the application as the root node and the table name of each table as a child node, the method further comprises: classifying the applications into different levels; and identifying and tagging key applications based on the classification results.

7. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the data monitoring method as described in any one of claims 1-5.

8. A computer storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the data monitoring method as described in any one of claims 1-5.