A multi-dimensional evaluation method and system for the full lifecycle of industrial data
By constructing a global data flow topology graph and applying a multi-dimensional evaluation method, the invention addresses the isolated and static nature of data evaluation in existing technologies. It enables dynamic quantification and scenario adaptation of data assets as they flow through the network, improving the practicality of evaluation and providing an intelligent closed-loop drive for governance.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: CHINA NAT INST OF STANDARDIZATION
- Filing Date: 2026-04-01
- Publication Date: 2026-06-30
Figure CN122309484A_ABST
Description
Technical Field
[0001] This invention relates to the field of industrial big data, and in particular to a multi-dimensional evaluation method and system for the entire lifecycle of industrial data.
Background Technology
[0002] In the field of industrial big data, effective assessment of data assets is fundamental to governance and value extraction. Existing typical technologies mainly focus on static measurement of data quality, auditing specific datasets for dimensions such as accuracy and completeness through preset rules, and generating static quality reports to support the identification of data problems and basic governance.
[0003] Existing static evaluation methods are limited by their isolated and static nature. They treat data objects as independent entities and therefore fail to reflect how data influences, and is influenced by, other data within its global circulation network. For example, the effect that a change in upstream data quality has on the value of downstream derived data cannot be quantified. Fixed-dimension evaluations struggle to adapt to the differing value demands placed on data in various business scenarios, and so provide insufficient guidance for optimizing data applications in specific contexts. Furthermore, there is no intelligent closed-loop drive between evaluation and governance that is based on networked connections and contextualized value.
Summary of the Invention
[0004] In view of the aforementioned existing problems, the present invention is proposed.
[0005] Therefore, this invention provides a multi-dimensional assessment method for the entire lifecycle of industrial data to address the problem that existing assessment methods, due to their isolated and static nature, cannot quantify the dynamic impact of data in networked circulation, are difficult to adapt to the value demands of multiple business scenarios, and thus lack intelligent closed-loop driving force between assessment and governance.
[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution. In a first aspect, the present invention provides a multi-dimensional evaluation method for the entire lifecycle of industrial data, which includes: collecting industrial data sources and extracting static metadata information to obtain registered data asset objects; monitoring the operation of the data pipeline, identifying the flow and evolution relationships between registered data asset objects, and constructing a global data flow topology graph; initializing a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph and obtaining a set of multi-dimensional evaluation vectors after networked updating; capturing the access context in which data asset nodes are accessed in the global data flow topology graph, identifying a topology access pattern based on the access context, dynamically reconstructing the multi-dimensional evaluation vectors, and generating a contextualized final evaluation result; and storing the contextualized final evaluation result, inputting it into the strategy engine, and triggering data governance operation instructions based on it to drive the full lifecycle management of industrial data assets.
[0007] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, obtaining the registered data asset objects includes: acquiring industrial data entities from the industrial data sources and extracting static metadata information; assigning a globally unique asset identifier to each industrial data entity based on the extracted static metadata information; and creating an asset record in the data asset register for the industrial data entity with the globally unique asset identifier and the corresponding extracted static metadata information, thereby obtaining a registered data asset object.
[0008] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, constructing the global data flow topology graph includes: deploying lineage probes at the processing nodes of the running data pipeline to capture data lineage logs; parsing the data lineage logs to identify the registered data asset objects involved and obtain their flow and evolution relationships; and, using registered data asset objects as nodes and flow and evolution relationships as directed edges, assembling and updating the graph in real time in a graph database to construct the global data flow topology graph.
[0009] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, obtaining the set of multi-dimensional evaluation vectors after networked updating includes: defining, for each data asset node in the global data flow topology graph, a multi-dimensional evaluation vector with a data quality dimension and a business value dimension, and calculating the initial scores for the two dimensions based on the metadata of the data asset node to obtain the initialized multi-dimensional evaluation vector; triggering an evaluation update event when a score in the initialized multi-dimensional evaluation vector of a data asset node changes due to a quality audit or a business call; defining, based on the direction and semantic type of the connecting edges in the global data flow topology graph, the coefficients and influence functions with which the evaluation update event propagates along the edges, calculating the resulting score change, and superimposing it on the corresponding dimension of the initialized multi-dimensional evaluation vectors of the upstream and downstream data asset nodes; and traversing the data asset nodes in the global data flow topology graph whose updates are triggered by direct or indirect association, and collecting their updated multi-dimensional evaluation vectors to form the set of multi-dimensional evaluation vectors after networked updating.
[0010] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, capturing the access context includes: monitoring every operation that accesses a data asset node in the global data flow topology graph and recording basic log information such as access time, visitor identity, access operation type, and access performance requirements; extracting, from the basic log information, the source and target node paths of the access operation in the global data flow topology graph, and combining them with the metadata of the data asset nodes to generate semantic tags that describe the location and intent of the access within the topology; and integrating and structuring the basic log information together with the semantic tags to form the access context of the data asset nodes in the global data flow topology graph.
[0011] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, identifying the topology access pattern includes: extracting, from the access context of the data asset nodes in the global data flow topology graph, the path depth, branch complexity, and real-time performance of the access request to form a structured feature vector for identifying topology access patterns; inputting the structured feature vector into a pre-trained and fine-tuned graph embedding model that classifies topology access patterns to obtain a probability distribution, and taking the category with the highest probability as the topology access pattern identifier for the access; and retrieving, from a dynamically evolving pattern knowledge graph, the topology access pattern bound to the topology access pattern identifier, where the topology access pattern comprises a complete description of its typical scenarios, evaluation focus, and reconstruction rules.
[0012] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, generating the contextualized final evaluation result includes: searching a predefined strategy library for the dynamic reconstruction strategy that matches the topology access pattern, where the strategy specifies the evaluation dimensions to be operated on, the operation types, and the operation parameters; performing, according to the operation types and operation parameters specified in the dynamic reconstruction strategy, nonlinear scaling and selective show/hide operations on the scores of the specified evaluation dimensions in the set of multi-dimensional evaluation vectors after networked updating to obtain intermediate reconstructed evaluation vectors; and normalizing and formatting the intermediate reconstructed evaluation vectors and attaching the topology access pattern identifier and a timestamp to generate the contextualized final evaluation result.
[0013] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, triggering the data governance operation instructions includes: writing the contextualized final evaluation result into the evaluation result history database and publishing it as a real-time message to the message input queue of the strategy engine; and matching the evaluation dimension scores in the contextualized final evaluation result against preset threshold rules and, when a match succeeds, generating a data governance operation instruction to be executed.
[0014] As a preferred embodiment of the multi-dimensional evaluation method for the entire lifecycle of industrial data described in this invention, driving the full lifecycle management includes: sending the data governance operation instruction to be executed to the corresponding data governance executor, which executes the operations defined in the instruction. Successful execution changes the status, location, or configuration of the registered data asset objects, thereby driving the full lifecycle management of industrial data assets.
[0015] In a second aspect, the present invention provides a multi-dimensional evaluation system for the entire lifecycle of industrial data, including: a data acquisition module, which collects industrial data sources and extracts static metadata information to obtain registered data asset objects; a construction module, which monitors the operation of the data pipeline, identifies the flow and evolution relationships between registered data asset objects, and constructs a global data flow topology graph; an initialization module, which initializes a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph and obtains a set of multi-dimensional evaluation vectors after networked updating; an identification module, which captures the access context of data asset nodes in the global data flow topology graph, identifies the topology access pattern based on the access context, dynamically reconstructs the multi-dimensional evaluation vectors, and generates a contextualized final evaluation result; and a driver module, which stores the contextualized final evaluation result, inputs it into the strategy engine, and triggers data governance operation instructions based on it to drive the full lifecycle management of industrial data assets.
[0016] The beneficial effects of this invention are as follows: By defining multi-dimensional evaluation vectors for data asset nodes and enabling them to be updated collaboratively in a networked manner along the global data flow topology, the evaluation results are dynamically propagated and globally linked in the data lineage network. This allows changes in the quality and value of upstream data to be quantified and transmitted to downstream related assets, thereby accurately identifying key influencing nodes and risk transmission paths in the data ecosystem and providing a basis for the precise allocation of governance resources. By capturing the context of data access and identifying topology access patterns, the evaluation vectors are dynamically reconstructed based on scenarios. This allows the same data asset to generate highly adaptable and directly operational contextualized evaluation results according to different business scenarios, achieving a leap from general evaluation to scenario-optimal evaluation, improving the practicality and guiding value of the evaluation. By automatically triggering corresponding data governance operation instructions with contextualized evaluation results, an intelligent closed loop of evaluation-diagnosis-decision-execution is formed, driving lean management of industrial data assets throughout their entire lifecycle from passive response to proactive optimization.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a flowchart of a multi-dimensional evaluation method for the entire lifecycle of industrial data.
[0019] Figure 2 is a schematic diagram of a multi-dimensional evaluation system for the entire lifecycle of industrial data.
Detailed Implementation
[0020] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
[0021] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0022] Secondly, the term "one embodiment" or "example" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the invention. Appearances of "in one embodiment" in different places in this specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments.
[0023] Referring to Figures 1-2, one embodiment of the present invention provides a multi-dimensional evaluation method for the entire lifecycle of industrial data, including the following steps:
S1. Collect industrial data sources and extract static metadata information to obtain registered data asset objects.
[0024] S1.1 Obtain industrial data entities from industrial data sources and extract static metadata information.
[0025] Furthermore, by deploying adapters or acquisition agents on the data source side, data content is extracted in streaming or batch processing to ensure the complete acquisition of data entities. During the acquisition process, static parsing of industrial data entities is performed simultaneously to extract their static metadata information. The extracted static metadata information includes technical metadata and basic business metadata. Technical metadata involves data structure, field definitions, storage location, and format encoding, while basic business metadata covers the data entity name, its business line, and preliminary data classification and grading tags. Information extraction is completed based on the parsing of the industrial data entity's own structure and awareness of the data source context, ensuring that each acquired industrial data entity is accompanied by accurate and complete static metadata information.
[0026] S1.2. Assign a globally unique asset identifier to the industrial data entity based on the extracted static metadata information. Create an asset record in the data asset register for the industrial data entity with the globally unique asset identifier and the corresponding extracted static metadata information, and obtain the registered data asset object.
[0027] Furthermore, the process of assigning globally unique asset identifiers follows predefined naming conventions or employs a universally unique identifier generation algorithm to ensure the uniqueness and persistence of the identifiers across the entire evaluation system. Subsequently, the generated globally unique asset identifier, the logical reference to the industrial data entity, and the extracted static metadata information are bound together to form a structured asset record. This asset record is persistently written to a centrally managed data asset register, which serves as the authoritative directory and index for all data asset objects. Once the write operation is complete, the industrial data entity is formally assigned asset status, transforming into a registered data asset object with a standard identity, complete attribute description, and verifiable records in the central directory. This completes the transformation from raw data to a manageable and traceable standardized data asset.
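The registration step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `AssetRecord` and `AssetRegister` are hypothetical, the register is an in-memory dictionary standing in for a persistent store, and deriving the identifier with a name-based UUID (version 5) is only one of the generation options the text allows.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    asset_id: str    # globally unique asset identifier
    entity_ref: str  # logical reference to the industrial data entity
    metadata: dict   # extracted static metadata (technical + business)


class AssetRegister:
    """In-memory stand-in for the centrally managed data asset register."""

    def __init__(self):
        self._records = {}

    def register(self, entity_ref: str, metadata: dict) -> AssetRecord:
        # Assumption: a name-based UUID derived from the entity reference,
        # so re-registering the same entity yields the same identifier.
        asset_id = str(uuid.uuid5(uuid.NAMESPACE_URL, entity_ref))
        record = AssetRecord(asset_id, entity_ref, dict(metadata))
        # Keep the first record written for an identifier (persistence).
        self._records.setdefault(asset_id, record)
        return self._records[asset_id]

    def get(self, asset_id: str):
        return self._records.get(asset_id)
```

A name-based identifier keeps registration idempotent; a random UUID would also satisfy the uniqueness requirement but would need a separate lookup to detect duplicates.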
[0028] S2. Monitor the operation of the data pipeline, identify the flow and evolution relationships between registered data asset objects, and construct a global data flow topology diagram.
[0029] S2.1 Deploy lineage probes at the processing nodes of the running data pipeline to capture data lineage logs.
[0030] Furthermore, the lineage probe, in the form of a lightweight agent or plugin, is embedded within key processing nodes that perform data reading, transformation, computation, and writing operations. These processing nodes include ETL job engines, stream processing computation tasks, data cleaning script execution environments, and API service endpoints. While data processing logic executes within these nodes, the lineage probe synchronously listens for and intercepts operation events. Event information includes the operation type, a precise timestamp, the identifier set of registered data asset objects used as input, and the identifier set of registered data asset objects used as output. The lineage probe encapsulates this event information in real-time into a structured log record, i.e., a data lineage log, according to a predefined format. The data lineage log is continuously generated and fed into a unified message bus or log collection component, forming a time-ordered sequence of data lineage logs reflecting all production and consumption behaviors in the data pipeline, providing an atomic event stream for subsequent relationship identification.
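One way to picture a lineage probe is as a wrapper around a processing step that emits one structured log record per invocation. The sketch below makes several assumptions not stated in the source: the probe is a Python decorator, the processing function receives its input and output asset identifiers as its first two arguments, and a module-level list named `LINEAGE_LOG` stands in for the message bus or log collection component.

```python
import functools
import time

LINEAGE_LOG = []  # stand-in for the message bus / log collection component


def lineage_probe(operation_type):
    """Wrap a processing step so each call emits one data lineage log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(input_ids, output_ids, *args, **kwargs):
            result = func(input_ids, output_ids, *args, **kwargs)
            # Record the event only after the step completed successfully.
            LINEAGE_LOG.append({
                "operation_type": operation_type,
                "timestamp": time.time(),
                "input_ids": list(input_ids),
                "output_ids": list(output_ids),
            })
            return result
        return wrapper
    return decorator


@lineage_probe("transform")
def clean_sensor_data(input_ids, output_ids):
    # Placeholder for real cleaning logic (hypothetical example step).
    return len(input_ids)
```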
[0031] S2.2. Parse the data lineage logs to identify the registered data asset objects involved and obtain their flow and evolution relationships.
[0032] Furthermore, the parsing process consumes the continuously flowing data lineage logs, applying predefined semantic parsing rules to each log entry. These rules extract core elements of the operational events from the logs, including lists of input and output identifiers, and operation type codes. Based on these elements, and combined with the records of registered data asset objects in the data asset register, the validity and correspondence of the input and output identifiers are verified. For each valid pair of input and output identifiers, the operation type code maps them to a semantically defined flow and evolution relationship, such as a derivative, transformation, fusion, or consumption relationship. The parsing process generates a structured record for each identified relationship, containing the relationship type, source data asset object identifier, target data asset object identifier, and a reference to the original data lineage log that generated the relationship. This record constitutes a flow and evolution relationship. Continuous parsing transforms the data lineage log stream into a flow and evolution relationship stream, dynamically revealing the micro-mechanisms of how registered data asset objects are interconnected and evolve.
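The parsing rules above can be illustrated with a short sketch. The operation type codes and their mapping to relationship types in `OPERATION_TO_RELATION` are hypothetical, as are the dictionary field names; the source only states that such a mapping exists and that identifiers are validated against the data asset register.

```python
# Hypothetical mapping from operation type codes to relationship types.
OPERATION_TO_RELATION = {
    "copy": "derivative",
    "transform": "transformation",
    "join": "fusion",
    "read": "consumption",
}


def parse_lineage_log(log, registered_ids, log_ref):
    """Turn one lineage log into zero or more relationship records."""
    relation = OPERATION_TO_RELATION.get(log["operation_type"])
    if relation is None:
        return []  # unknown operation type: no relationship is produced
    records = []
    for src in log["input_ids"]:
        for dst in log["output_ids"]:
            # Keep only pairs whose identifiers exist in the asset register.
            if src in registered_ids and dst in registered_ids:
                records.append({
                    "type": relation,
                    "source": src,
                    "target": dst,
                    "log_ref": log_ref,  # reference to the originating log
                })
    return records
```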
[0033] S2.3. Using the registered data asset objects as nodes and the flow and evolution relationships as directed edges, the data is assembled and updated in real time in the graph database to construct a global data flow topology graph.
[0034] Furthermore, the graph database, acting as a storage and query engine, predefines data schemas conforming to the property graph model. Node schemas correspond to registered data asset objects, and edge schemas correspond to flow and evolution relationships. The assembly process continuously consumes the stream of flow and evolution relationships. For each flow and evolution relationship, a creation or update transaction is executed in the graph database. The transaction first ensures that the source and target registered data asset objects involved in the relationship exist in the graph as nodes with the corresponding attributes; if they do not, the nodes are created from the information in the data asset register. The transaction then creates a directed edge between the source and target nodes according to the relationship type defined in the flow and evolution relationship, and attaches the other attributes of the relationship (such as generation time and operation reference) as attributes of the edge. The update process handles the modification or logical deletion of existing relationships. Through the continuous execution of such transactions, the network of nodes and edges in the graph database grows and changes dynamically, ultimately forming an in-memory or persistent graph structure covering all known registered data asset objects and all known flow and evolution relationships between them. This structure is the global data flow topology graph.
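The create-or-update transaction can be sketched without a real graph database. In the illustration below, the class `TopologyGraph` and its method names are hypothetical, plain dictionaries replace the graph store, and the asset register is assumed to be a mapping from asset identifier to metadata.

```python
class TopologyGraph:
    """Minimal in-memory property graph: nodes and typed directed edges."""

    def __init__(self, asset_register):
        self.asset_register = asset_register  # dict: asset_id -> metadata
        self.nodes = {}                        # asset_id -> attributes
        self.edges = {}                        # (source, target, type) -> attributes

    def _ensure_node(self, asset_id):
        # Create the node from register information if it is missing.
        if asset_id not in self.nodes:
            self.nodes[asset_id] = dict(self.asset_register.get(asset_id, {}))

    def upsert_relationship(self, rel):
        self._ensure_node(rel["source"])
        self._ensure_node(rel["target"])
        key = (rel["source"], rel["target"], rel["type"])
        attrs = self.edges.setdefault(key, {"deleted": False})
        # Re-seeing a relationship refreshes its attributes and revives it.
        attrs.update({"log_ref": rel.get("log_ref"), "deleted": False})

    def delete_relationship(self, source, target, rel_type):
        # Logical deletion: the edge is flagged, not physically removed.
        key = (source, target, rel_type)
        if key in self.edges:
            self.edges[key]["deleted"] = True

    def out_edges(self, asset_id):
        return [key for key, attrs in self.edges.items()
                if key[0] == asset_id and not attrs["deleted"]]
```

Keying edges by source, target, and type makes repeated lineage events for the same relationship update one edge rather than create duplicates.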
[0035] S3. Initialize a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph to obtain the set of multi-dimensional evaluation vectors after network update.
[0036] S3.1 Define a multi-dimensional evaluation vector for each data asset node in the global data flow topology diagram, which includes data quality dimension and business value dimension. Calculate the initial scores for data quality dimension and business value dimension based on the metadata of the data asset node to obtain the initialized multi-dimensional evaluation vector.
[0037] Furthermore, defining the multi-dimensional evaluation vector with data quality and business value dimensions involves creating a vector container with a fixed structure for each data asset node. This container explicitly contains two storage locations: one for the data quality dimension and one for the business value dimension. The initial score for the data quality dimension, $Q_0$, is calculated using a data quality dimension aggregation function $f_Q$. The input parameters of $f_Q$ are indicators extracted and quantified from the metadata of the data asset node: a structural indicator $I_s$, a completeness indicator $I_c$, a timeliness indicator $I_t$, and an accuracy reference indicator $I_a$. These indicators are derived from metadata fields through predefined measurement rules, and $f_Q$ aggregates and normalizes them to output the initial score $Q_0$. The initial score for the business value dimension, $V_0$, is calculated using a business value dimension aggregation function $f_V$. The input parameters of $f_V$ are a correlation indicator $I_r$, calculated from the connectivity characteristics of the data asset node in the global data flow topology graph; a key business indicator $I_k$, obtained by mapping the business classification and grading information in the metadata; and a potential demand indicator $I_d$, predicted from historical access patterns or similarities. By combining these factors, which reflect the potential and importance of the data, $f_V$ outputs the initial score $V_0$. The calculated $Q_0$ and $V_0$ are filled into the vector container of the data asset node, completing the construction of the initialized multi-dimensional evaluation vector for that node and thereby attaching a preliminary quality and value quantification label to each node in the global data flow topology graph.
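The source does not specify the form of the two aggregation functions. As a minimal sketch, assuming each indicator has already been normalized to the range [0, 1] and that aggregation is a weighted mean, the initialization could look like this; the weight values and all names are illustrative assumptions.

```python
def weighted_mean(indicators, weights):
    """Aggregate indicators in [0, 1] into one normalized score in [0, 1]."""
    total_weight = sum(weights[name] for name in indicators)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(indicators[name] * weights[name] for name in indicators)
    return score / total_weight


# Hypothetical weights; the source does not specify any.
QUALITY_WEIGHTS = {"structure": 1.0, "completeness": 2.0,
                   "timeliness": 1.0, "accuracy_ref": 2.0}
VALUE_WEIGHTS = {"correlation": 1.0, "business_key": 2.0,
                 "potential_demand": 1.0}


def init_evaluation_vector(quality_indicators, value_indicators):
    """Fill the two-slot vector container for one data asset node."""
    return {
        "quality": weighted_mean(quality_indicators, QUALITY_WEIGHTS),
        "value": weighted_mean(value_indicators, VALUE_WEIGHTS),
    }
```

Dividing by the total weight keeps the output in [0, 1] whenever the inputs are, which is one simple way to satisfy the normalization the text calls for.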
[0038] Specifically, data quality and business value are defined as two orthogonal core dimensions of the evaluation vector, and their initial scores are calculated separately using aggregation functions based on node metadata and network topology characteristics. Traditional evaluations either focus on quality alone or conflate quality and value. This design separates and quantifies the two, acknowledging that data possesses both intrinsic attributes and extrinsic utility. The data quality dimension aggregation function $f_Q$ combines multiple objective technical indicators, while the business value dimension aggregation function $f_V$ integrates network correlations, business rules, and demand signals, so that the initial evaluation vector reflects both the objective state of the data and its potential business significance.
The initial score expression for the data quality dimension is:
$$Q_0 = f_Q(I_s, I_c, I_t, I_a)$$
where $Q_0$ is the initial score for the data quality dimension, $I_s$ is the structural indicator, $I_c$ is the completeness indicator, $I_t$ is the timeliness indicator, $I_a$ is the accuracy reference indicator, and $f_Q$ is the aggregation function for the data quality dimension.
The initial score expression for the business value dimension is:
$$V_0 = f_V(I_r, I_k, I_d)$$
where $V_0$ is the initial score for the business value dimension, $I_r$ is the correlation indicator, $I_k$ is the key business indicator, $I_d$ is the potential demand indicator, and $f_V$ is the aggregation function for the business value dimension.
S3.2 In the global data flow topology graph, when a score in the initialized multi-dimensional evaluation vector of a data asset node changes due to a quality audit or a business call, an evaluation update event is triggered.
[0039] Furthermore, a quality audit refers to generating a quality report after executing predefined data quality inspection rules on a data asset node. The report's conclusions lead to a correction of the initial score for the data quality dimension. A business call refers to the access and consumption of a data asset node by applications, analytical models, or interface services. This access behavior or its result is recorded and analyzed to verify or correct the initial score for the business value dimension. When either score is corrected, a difference arises between the corrected score and the corresponding score in the originally initialized multi-dimensional evaluation vector. This difference, along with the identifier of the changed dimension (data quality or business value), the reason for the change, the timestamp, and the triggering node identifier, is encapsulated to generate an evaluation update event. The evaluation update event is published to an asynchronous event bus, marking the beginning of an evaluation state transition driven by changes in the data's own state or by external usage feedback.
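The event described above carries the score difference, the changed dimension, the reason, a timestamp, and the triggering node. A minimal sketch follows, under the assumptions that evaluation vectors are stored as nested dictionaries and that a plain list stands in for the asynchronous event bus; `EvaluationUpdateEvent` and `correct_score` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationUpdateEvent:
    node_id: str      # triggering node identifier
    dimension: str    # "quality" or "value"
    delta: float      # corrected score minus previous score
    reason: str       # e.g. "quality_audit" or "business_call"
    timestamp: float


def correct_score(vectors, node_id, dimension, new_score, reason, bus):
    """Apply a corrected score and publish an event if it actually changed."""
    old_score = vectors[node_id][dimension]
    delta = new_score - old_score
    if delta == 0:
        return None  # no change, so no event is generated
    vectors[node_id][dimension] = new_score
    event = EvaluationUpdateEvent(node_id, dimension, delta, reason, time.time())
    bus.append(event)  # a plain list stands in for the asynchronous event bus
    return event
```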
[0040] Specifically, quality audits and business calls—two key activities in the data lifecycle that best reflect changes in data status and value realization—serve as triggers for assessment updates. Once such activities occur and new insights are generated, a structured assessment update event is immediately produced. This event-driven model ensures timely assessment updates with clear causal relationships. The assessment vector evolves closely following actual data usage and governance activities, achieving synchronization between assessment and the data lifecycle. This avoids information lag and resource waste associated with periodic assessments, improving the real-time nature and relevance of the assessment.
[0041] S3.3 Based on the direction and semantic type of the connecting edges in the global data flow topology graph, define the coefficients and influence functions for the evaluation update event to propagate along the edges, calculate the score change of the data asset node, and superimpose it on the corresponding dimension of the initial multi-dimensional evaluation vector of the upstream and downstream data asset nodes.
[0042] Furthermore, the definitions of the propagation coefficients and influence functions are based on a pre-configured strategy that maps different types of edges in the global data flow topology graph to different propagation behaviors. For example, for edges of the derivative type, a higher forward propagation coefficient may be defined, as an increase in the initial score of the upstream data quality dimension will significantly enhance the quality reputation of the downstream data. For edges of the consumption type, a backpropagation influence function may be defined, as the confirmation of the initial score of the downstream business value dimension can be partially traced back to the upstream data, increasing its value contribution. When processing an evaluation update event, the event trigger node is located, and all edges directly connected to that node in the global data flow topology graph are queried. For each edge, the corresponding propagation coefficient and influence function are found based on its semantic type. The score change, propagation coefficient, influence function, and edge direction in the evaluation update event are used as inputs to obtain the adjustment amount to be applied to the corresponding evaluation dimension of the adjacent nodes. The adjustment amount is then superimposed on the corresponding dimension score of the currently initialized multi-dimensional evaluation vector of the adjacent data asset nodes, completing a local propagation of the evaluation impact.
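A single hop of this propagation can be sketched as below. The coefficient values in `PROPAGATION_POLICY`, the linear influence function, and the representation of edges as `(source, target, type)` tuples are all assumptions made for illustration; the source leaves the concrete coefficients and functions to a pre-configured strategy.

```python
# Hypothetical policy: edge type -> forward and backward coefficients.
PROPAGATION_POLICY = {
    "derivative":  {"forward": 0.75, "backward": 0.0},
    "consumption": {"forward": 0.0, "backward": 0.5},
}


def linear_influence(delta, coefficient):
    """Simplest possible influence function: scale the change linearly."""
    return delta * coefficient


def propagate_once(edges, vectors, node_id, dimension, delta):
    """Apply one hop of propagation from node_id; return the adjustments."""
    adjustments = {}
    for source, target, edge_type in edges:
        policy = PROPAGATION_POLICY.get(edge_type)
        if policy is None:
            continue
        if source == node_id:    # trigger is upstream: propagate forward
            neighbor, coeff = target, policy["forward"]
        elif target == node_id:  # trigger is downstream: propagate backward
            neighbor, coeff = source, policy["backward"]
        else:
            continue
        adjustment = linear_influence(delta, coeff)
        if adjustment != 0:
            # Superimpose on the neighbor's score (no clamping in this sketch).
            vectors[neighbor][dimension] += adjustment
            adjustments[neighbor] = adjustments.get(neighbor, 0.0) + adjustment
    return adjustments
```

With this policy, a quality change on an upstream node flows forward along a derivative edge, while a value change on a consumer flows backward along a consumption edge, matching the two examples in the text.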
[0043] Specifically, by predefining differentiated propagation coefficients and influence functions for different types of edges, this mechanism can finely characterize the strength and manner in which different types of data relationships influence the assessment. For example, quality defects may propagate more strongly on derived edges but weakly on reference edges; value confirmation may propagate in the reverse direction on consumption edges. This allows the assessment model to reflect the true dependency and influence logic within the data ecosystem. A change in the assessment of a node is no longer isolated but spreads like ripples throughout the network, naturally giving the assessment results network synergy and global consistency, revealing the root causes of data problems and the source of value.
[0044] S3.4. Traverse the data asset nodes in the global data flow topology graph that are triggered to update due to direct or indirect association, summarize the multi-dimensional evaluation vectors after node updates, and form a set of multi-dimensional evaluation vectors after network updates.
[0045] Furthermore, the traversal process begins at the node that initially triggers the evaluation update event. According to the propagation logic, its direct impact updates the evaluation vectors of its direct neighbors. The updated neighbor nodes, whose own multi-dimensional evaluation vectors change due to the propagation effect, are also considered new evaluation update event sources, but with changes attenuated by propagation. This process iterates along the edges of the global data flow topology graph with a finite depth or finite number of hops, each iteration following defined propagation rules until the propagation impact falls below a preset threshold or reaches the maximum number of iterations. During propagation, all data asset nodes whose evaluation vector scores have changed and their updated multi-dimensional evaluation vectors are recorded. When the propagation process terminates, the multi-dimensional evaluation vectors of all recorded changed data asset nodes, containing the latest data quality and business value dimension scores, are collected and aggregated. This aggregated vector set contains the latest evaluation status of the initial trigger node and all its associated nodes within the propagation influence range; this set constitutes the networked updated multi-dimensional evaluation vector set.
[0046] Specifically, the diffusion process involves a finite number of iterations along the topology, with the impact gradually diminishing, summarizing the latest assessment states of all affected nodes. This simulates the transmission and superposition of influences in complex systems in the real world. By generating a set of networked, updated, multi-dimensional assessment vectors, it becomes a dynamic snapshot of assessments that records the collaborative evolution caused by the same source event. This set clearly demonstrates the scope and depth of impact of assessment changes, allowing users to intuitively see how an improvement in data quality or value confirmation is transmitted step by step in the data lineage network, ultimately improving the assessment level of the entire related data chain. This provides a powerful analytical tool for understanding the global interconnected value and risks of data assets, and is a key manifestation of the shift from point-based thinking to network-based thinking in assessment.
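The bounded, attenuating traversal of S3.4 could, for example, be sketched as a breadth-first walk. The function name `propagate_network`, the single-dimension score map, and the rule that each node is updated at most once per event are simplifying assumptions of this sketch rather than features stated in the embodiment.

```python
from collections import deque

def propagate_network(scores, neighbors, start, delta, threshold=0.01, max_hops=3):
    """Spread a score change outward from `start`, hop by hop.

    scores:    node -> score of one evaluation dimension (updated in place)
    neighbors: node -> list of (adjacent node, propagation coefficient)
    Returns the updated scores of every node whose score changed.
    """
    scores[start] += delta
    changed = {start: scores[start]}
    visited = {start}                 # assumption: update each node once per event
    queue = deque([(start, delta, 0)])
    while queue:
        node, change, hops = queue.popleft()
        if hops == max_hops:
            continue                  # maximum number of hops reached
        for adjacent, coefficient in neighbors.get(node, []):
            attenuated = change * coefficient
            if adjacent in visited or abs(attenuated) < threshold:
                continue              # impact fell below the preset threshold
            visited.add(adjacent)
            scores[adjacent] += attenuated
            changed[adjacent] = scores[adjacent]
            queue.append((adjacent, attenuated, hops + 1))
    return changed
```

The returned mapping plays the role of the networked updated evaluation set for one dimension: it lists only the trigger node and the nodes that the attenuated change actually reached.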
[0047] S3.5. In the global data flow topology graph, monitor each operation that accesses a data asset node and record basic log information such as access time, visitor identity, access operation type, and access performance requirements.
[0048] Furthermore, monitoring is achieved by deploying log hooks on data service gateways, query engine proxies, or data API endpoints. When an access request for a registered data asset object arrives and is processed, the log hook captures detailed information about the request. The access time records the precise timestamp of the request; the visitor identity identifies the requesting application, service account, or user entity; the access operation type describes the intent of the request, such as query, read, write, or subscription; and the access performance requirements may include the maximum latency tolerance or throughput expectation declared in the request. This information is formatted into a structured record, the basic log information, and immediately sent to a unified log collection pipeline for further processing, creating a detailed and standardized audit trail for each data access.
[0049] Specifically, the monitoring and logging of data access is elevated to a continuous and standardized infrastructure capability. It ensures comprehensive auditing of every node and every access in the global data flow topology. The recorded basic log information is not simply access counts, but a multi-dimensional snapshot including identity, intent, and performance requirements. Fine-grained, real-time access logs are indispensable raw inputs for subsequent high-level contextual understanding and pattern recognition, enabling the evaluation system to see how data is actually used, not just know that it has been used, providing a data foundation for inferring value and scenarios from usage behavior.
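A log hook of the kind described in S3.5 might, for instance, wrap a request handler as sketched below. The record fields mirror the basic log information named above; the class `AccessLogRecord`, the wrapper `log_hook`, and the in-memory list standing in for the log collection pipeline are hypothetical.

```python
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class AccessLogRecord:
    asset_id: str                            # identifier of the accessed data asset node
    access_time: float                       # timestamp of the request (seconds since epoch)
    visitor: str                             # requesting application, service account, or user
    operation: str                           # query, read, write, or subscription
    max_latency_ms: Optional[float] = None   # declared performance requirement, if any

def log_hook(handler, sink):
    """Wrap `handler` so that every call appends one structured record to `sink`."""
    def wrapped(asset_id, visitor, operation, max_latency_ms=None):
        record = AccessLogRecord(asset_id, time.time(), visitor, operation, max_latency_ms)
        sink.append(asdict(record))          # stand-in for the unified log collection pipeline
        return handler(asset_id)
    return wrapped
```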
[0050] S3.6. Extract the source and target node paths of the access operation in the global data flow topology graph from the basic log information, and combine them with the metadata of the data asset nodes to generate semantic tags that describe the location and intent of this access in the topology.
[0051] Furthermore, the basic log information is parsed to identify the identifiers of the accessed target data asset nodes, and potentially the identifiers of related source data asset nodes mentioned in the access request or inferred from parameters. These identifiers are used to query the global data flow topology graph, calculating the shortest or reachable path from the source node to the target node; this path represents the footprint of this access in the data network. Simultaneously, the metadata of the target data asset nodes, particularly their business classification, data theme information, and access operation type, is analyzed using a predefined semantic rule engine. The rule engine combines path features, node business attributes, and operation type to generate phrases with business meaning, such as cross-domain fusion analysis, real-time consumption of the core pipeline, and batch backtracking of historical data. These phrases serve as semantic tags for this access. These semantic tags encapsulate the core characteristics of this access in terms of topology and business intent.
[0052] Specifically, by querying the topology map to reveal the span and path of access behavior in the data network, and then combining this with metadata to understand its business context, the complex background of an access can be summarized with concise labels. For example, a query might be marked as a critical path for real-time consumption, which immediately conveys the high timeliness requirement of this access, its core position in the network, and the consumption intent.
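By way of example only, the path query and rule-based tagging of S3.6 could be sketched as follows. The adjacency-dictionary graph, the `domains` metadata map, the operation names, and the three rules are assumptions made for illustration; an actual rule engine would be configured separately.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search over a directed adjacency dict; returns a node list or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk back to the source through the parent links
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None                          # target is not reachable from source

def semantic_tags(path, domains, operation):
    """Combine path features, business domains, and operation type into tags."""
    tags = []
    if len({domains[n] for n in path}) > 1:
        tags.append("cross-domain fusion analysis")
    if operation == "subscription":
        tags.append("real-time consumption of the core pipeline")
    if operation == "batch_read":
        tags.append("batch backtracking of historical data")
    return tags
```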
[0053] S3.7. Integrate and structure the basic log information with the semantic tags to form the context of access to data asset nodes in the global data flow topology graph.
[0054] Furthermore, the fusion process creates a new structured document whose field set is the union of all fields from the basic log information and semantic tags. The basic log information provides objective factual details of the access, while the semantic tags provide a processed semantic summary. Both are aligned and combined according to a predefined contextual data model. For example, fields from the basic log information, such as access time and visitor identity, are organized together with the semantic tags in a message body in JSON or Protocol Buffer format. This complete, structured message body constitutes the context for the data asset node's access in the global data flow topology. This context not only records the original facts of the access but also carries an interpretation of these facts from the perspectives of the data network and business, providing the evaluation system with a complete, self-descriptive information package of what this access actually entails.
[0055] Specifically, the raw, technical basic log information and the derived, business-related semantic tags are encapsulated into a whole. This context object, accessed by the data asset node in the global data flow topology, becomes a holographic archive for the evaluation system to understand a single data access event. The evaluation algorithm no longer needs to parse logs and tags separately, but directly consumes this pre-processed, information-dense context object. This makes the subsequent context-aware evaluation logic simpler and more efficient, and ensures that the factual basis and semantic understanding on which the evaluation is based are synchronous and consistent, laying a reliable data input foundation for generating accurate contextualized evaluation results.
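The fusion of S3.7 might be sketched as assembling one JSON-serializable message body. The field names `semantic_tags` and `path`, and the helper `build_access_context`, are hypothetical; the contextual data model itself is not specified in the embodiment.

```python
import json

def build_access_context(log_record, tags, path):
    """Merge the basic log fields with the semantic tags into one context object."""
    context = dict(log_record)             # objective facts of the access
    context["semantic_tags"] = list(tags)  # derived semantic summary
    context["path"] = list(path)           # footprint of the access in the topology
    return context

def to_message_body(context):
    """Serialize the context as a JSON message body."""
    return json.dumps(context, sort_keys=True)
```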
[0056] S3.8. From the context of the data asset node's access in the global data flow topology graph, extract the path depth, branch complexity, and real-time performance of the access request, and construct a structured feature vector to identify the topology access pattern.
[0057] Furthermore, the extraction process analyzes the path information contained in the context. Path depth is calculated as the number of edges traversed by the shortest path from the source node to the target node in the global data flow topology graph, representing the indirectness of the access. Branch complexity is quantified by analyzing the average out-degree or in-degree of nodes on the path, or the structural complexity index of the subgraph traversed by the path, representing the complexity of the data relationships involved in the access. Real-time performance is derived from the access performance requirement field in the basic log information, or by the difference between the access time and the data update timestamp, mapped to a value representing the urgency of timeliness. The three values of path depth, branch complexity, and real-time performance are arranged in a fixed order to form a numerical array, which is the structured feature vector used to identify the topology access pattern. The structured feature vector, in a compact numerical form, characterizes the key topological features of a single access in the data network structure and time dimensions.
[0058] Specifically, three core graph structure features—path depth, branch complexity, and real-time performance—are defined to characterize access patterns. Indicators reflecting the structural form and timeliness of access behavior within the network are extracted from the context of data asset nodes accessing the global data flow topology graph. Path depth and branch complexity reveal whether the access is local or cross-domain, simple or complex; real-time performance distinguishes whether the access is immediate or has tolerable latency. The structured feature vector formed by these three features can effectively distinguish fundamentally different topology access patterns, such as shallow real-time point queries and deep, complex offline analysis, providing the most discriminative input features for accurate pattern classification based on machine learning.
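One possible way to compute the three features of S3.8 is sketched below. Branch complexity is taken here as the mean out-degree of the nodes on the path, which is one of the options mentioned above; the mapping from a declared latency tolerance to an urgency value is purely an assumption of this sketch.

```python
def feature_vector(path, graph, max_latency_ms):
    """Return [path depth, branch complexity, real-time urgency] for one access."""
    depth = len(path) - 1                                          # edges on the path
    branch = sum(len(graph.get(n, [])) for n in path) / len(path)  # mean out-degree
    urgency = 1.0 / (1.0 + max_latency_ms / 1000.0)                # hypothetical mapping into (0, 1]
    return [float(depth), branch, urgency]
```

For a path of three nodes with out-degrees 2, 1, and 0 and a declared tolerance of 1000 ms, this sketch yields a depth of 2, a branch complexity of 1.0, and an urgency of 0.5.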
[0059] S3.9. Input the structured feature vector into a pre-trained graph embedding fine-tuning model, which takes topology access patterns as its classification target, to obtain a probability distribution; the category with the highest probability is determined as the topology access pattern identifier for this access.
[0060] Furthermore, the graph embedding fine-tuning model is a trained machine learning classifier whose input layer receives structured feature vectors. During pre-training, the model may use unsupervised methods to learn the node and structural representations of the global data flow topology graph. During fine-tuning, it uses supervised training with historical access context samples labeled with topology access patterns to learn the mapping from structured feature vectors to different topology access patterns. During forward propagation, the model processes individual input structured feature vectors, ultimately producing a probability distribution at the output layer. Each probability value in the distribution corresponds to the likelihood of a predefined topology access pattern category. The category with the highest probability value, and its corresponding unique code or name, is selected as the topology access pattern identifier for this visit.
[0061] Specifically, a finely tuned graph embedding model is employed. Because this model has been exposed to the structural information of the global data flow topology graph during the pre-training phase, it can better understand the deep semantics of these topological features. It can capture the complex nonlinear relationships between different feature combinations, thus more accurately distinguishing access patterns that are easily confused under simple rules. It can handle complex and varied access behaviors in the real world, ensuring the reliability of the generated topological access pattern identifiers.
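The final selection step of S3.9 can be illustrated with a deliberately small stand-in. A linear layer followed by softmax replaces the graph embedding fine-tuning model here, and the pattern names, weights, and biases are invented for the sketch; only the step from probability distribution to identifier follows the text.

```python
import math

PATTERNS = ["shallow_realtime_point_query", "deep_offline_analysis"]
# Hypothetical weights, one row per pattern, over [depth, branch complexity, urgency].
WEIGHTS = [[-1.0, -0.5, 3.0], [1.0, 0.5, -3.0]]
BIASES = [0.0, 0.0]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features):
    """Return (pattern identifier with the highest probability, probability distribution)."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(WEIGHTS, BIASES)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PATTERNS[best], probs
```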
[0062] S3.10. Based on the topology access pattern identifier, retrieve the topology access pattern that is bound to the topology access pattern identifier and has a complete description of the typical scenarios, evaluation focus and reconstruction rules from the dynamically evolving pattern knowledge graph.
[0063] Furthermore, the pattern knowledge graph is a graph database or document library storing complete knowledge of various topology access patterns. Each topology access pattern identifier is an entity node, which is associated with multiple knowledge items through attributes or edges. Typical scenarios describe common business use cases for this pattern. The evaluation focus indicates which aspects or combinations of data quality and business value dimensions should be prioritized under this pattern. The reconstruction rules specify in detail how to dynamically transform the networked, updated multi-dimensional evaluation vector under this pattern to generate contextualized results. The retrieval operation uses the topology access pattern identifier as the key to query the pattern knowledge graph, returning all attributes and relationships directly associated with that identifier. This information together constitutes a complete and actionable topology access pattern description, serving as a knowledge package or configuration template to guide the subsequent contextualized evaluation.
[0064] S4. Capture the context of data asset nodes accessing the global data flow topology graph, identify the topology access pattern based on the access context, dynamically reconstruct the multi-dimensional evaluation vector, and generate a contextualized final evaluation result.
[0065] S4.1. Based on the topology access pattern, search for and match the corresponding dynamic reconstruction strategy from the predefined strategy library, specifying the evaluation dimensions, operation types and operation parameters to be operated, and obtain the dynamic reconstruction strategy.
[0066] Furthermore, the search operation uses the pattern identifier contained in the topology access pattern as the search key to query a centralized policy configuration repository, i.e., the predefined strategy library. The predefined strategy library is organized in the form of key-value pairs, configuration files, or database tables. Each entry maps a topology access pattern to a detailed strategy description document. The strategy description document explicitly lists the specific evaluation dimensions targeted by this dynamic reconstruction, such as whether it targets data quality, business value, or specific sub-indicators of both. The operation type defines the types of transformations applied to these dimensions, mainly including non-linear scaling and selective explicit / implicit operations (that is, selectively showing or hiding dimensions). The operation parameters further refine the specific behavior of each operation type. For example, for non-linear scaling, the parameters may specify the type and strength of the scaling function used; for selective explicit / implicit operations, the parameters list the dimensions that need to be hidden. Upon successful retrieval, this complete strategy description document, containing the evaluation dimensions, operation types, and operation parameters, is loaded and instantiated as the currently executable dynamic reconstruction strategy.
[0067] S4.2. Based on the operation type and operation parameters specified in the dynamic reconstruction strategy, perform nonlinear scaling and selective explicit / implicit operations on the scores of the specified evaluation dimensions in the multi-dimensional evaluation vector set after network update to obtain intermediate reconstruction evaluation vectors.
[0068] Furthermore, the execution process locates the multi-dimensional evaluation vector of the specific data asset node related to the current access request within the network-updated multi-dimensional evaluation vector set. For non-linear scaling operations, the scores of the target evaluation dimensions are transformed according to the scaling function type and strength parameters specified in the dynamic reconstruction strategy. Using functions such as sigmoid functions, exponential functions, or piecewise functions aims to amplify or compress differences in specific numerical intervals to more prominently reflect the importance gradient in that context. For example, in scenarios with extremely high real-time requirements, an exponential scaling function is applied to the timeliness dimension, causing a significant difference in scores between high- and low-timeliness data in that dimension. For selective explicit / implicit operations, based on the dimension list given in the dynamic reconstruction strategy, dimensions considered secondary or irrelevant in the current context are temporarily removed or folded from the evaluation vector, retaining only dimensions strongly correlated with the current topology access pattern. After the above series of non-linear scaling and selective explicit / implicit operations, the original network-updated multi-dimensional evaluation vector is transformed into a new vector with changes in both dimension value scaling and dimension composition; this new vector is the intermediate reconstruction evaluation vector.
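As an illustrative sketch of S4.1 and S4.2, a strategy looked up by pattern identifier might drive the scaling and hiding of dimensions as follows. The library contents, the dimension names, and the exact scaling formulas are assumptions; the text names exponential and sigmoid functions only as examples and does not fix their form. Scores are assumed to lie in the interval from zero to one.

```python
import math

# Hypothetical strategy library keyed by topology access pattern identifier.
POLICY_LIBRARY = {
    "critical_realtime_consumption": {
        "scale": {"timeliness": {"function": "exponential", "strength": 3.0}},
        "hide": ["completeness"],
    },
}

def scale(score, function, strength):
    """Non-linear scaling of a score assumed to lie in [0, 1]."""
    if function == "exponential":
        # Convex map of [0, 1] onto [0, 1]; widens gaps between high scores.
        return math.expm1(strength * score) / math.expm1(strength)
    if function == "sigmoid":
        return 1.0 / (1.0 + math.exp(-strength * (score - 0.5)))
    raise ValueError(f"unknown scaling function: {function}")

def reconstruct(vector, pattern_id):
    """Apply the matched strategy; return the intermediate reconstruction vector."""
    policy = POLICY_LIBRARY[pattern_id]
    result = {}
    for dimension, score in vector.items():
        if dimension in policy["hide"]:
            continue                          # selectively hidden in this context
        params = policy["scale"].get(dimension)
        result[dimension] = scale(score, **params) if params else score
    return result
```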
[0069] S4.3. Normalize and format the intermediate reconstruction evaluation vector, add topology access mode identifier and timestamp, and generate contextualized final evaluation results.
[0070] Furthermore, normalization ensures that the scores of each dimension in the intermediate reconstruction evaluation vector conform to the range or standard agreed upon in the output. For example, all dimension values are mapped to a closed interval between zero and one, or normalization is performed to eliminate the dimensional effects that may be caused by different scaling operations. Formatting and encapsulation then assemble the normalized intermediate reconstruction evaluation vector, along with necessary metadata, according to a predefined structured data format. The required metadata includes a topology access mode identifier that triggered this evaluation reconstruction, used to trace the contextual source of the evaluation result; and a timestamp indicating the effective time point of the evaluation result. The final generated complete data package, containing the normalized evaluation vector, topology access mode identifier, and timestamp, is the contextualized final evaluation result. This data package is self-describing, containing both the core numerical values of the evaluation conclusion and the contextual and temporal context in which this conclusion was generated.
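The normalization and encapsulation of S4.3 could be sketched as below. Clamping each score to the closed interval from zero to one is only one possible normalization, chosen here for brevity; the field names of the resulting package and the helper `finalize` are hypothetical.

```python
from datetime import datetime, timezone

def finalize(vector, pattern_id, now=None):
    """Clamp scores to [0, 1] and attach the pattern identifier and a timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "scores": {d: min(1.0, max(0.0, s)) for d, s in vector.items()},
        "pattern_id": pattern_id,       # contextual source of the result
        "timestamp": now.isoformat(),   # effective time point of the result
    }
```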
[0071] S5. Store the contextualized final evaluation results and input them into the strategy engine. Trigger data governance operation instructions based on the contextualized final evaluation results to drive the full lifecycle management of industrial data assets.
[0072] S5.1. Write the contextualized final evaluation result into the evaluation result history database, and publish the contextualized final evaluation result as a real-time message to the message input queue of the strategy engine.
[0073] Furthermore, the contextualized final evaluation results are persistently stored in an evaluation result history database. This database can be a time-series database, a relational database, or a document database, used for efficient storage and querying of historical evaluation records by time range, data asset identifier, and other dimensions. Simultaneously, the contextualized final evaluation results are published as a real-time message to the message input queue of the strategy engine, maintained by the message middleware. Message publishing ensures that the contextualized final evaluation results can be consumed by the strategy engine almost in real-time after generation. This achieves a dual output path for evaluation results, satisfying both the needs for long-term historical tracing and analysis and the needs for real-time event-driven immediate decision-making.
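The dual output path of S5.1 might be sketched with standard-library stand-ins: an in-memory SQLite database in place of the evaluation result history database and `queue.Queue` in place of the message middleware. The table name, column names, and helper names are assumptions of this sketch.

```python
import json
import queue
import sqlite3

def open_history_db(path=":memory:"):
    """Open (or create) a minimal evaluation result history database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS evaluation_history "
        "(asset_id TEXT, ts TEXT, payload TEXT)"
    )
    return db

def store_and_publish(result, asset_id, db, engine_queue):
    """Persist one result for historical queries and publish it to the strategy engine."""
    db.execute(
        "INSERT INTO evaluation_history (asset_id, ts, payload) VALUES (?, ?, ?)",
        (asset_id, result["timestamp"], json.dumps(result)),
    )
    db.commit()
    engine_queue.put({"asset_id": asset_id, "result": result})
```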
[0074] S5.2. Match the evaluation dimension scores in the contextualized final evaluation results with the preset threshold rules. When the match is successful, generate the data governance operation instructions to be executed.
[0075] Furthermore, the strategy engine continuously monitors the message input queue, consuming the arriving contextualized final evaluation results. For each contextualized final evaluation result, the strategy engine extracts evaluation dimension scores and, based on the topology access mode identifier attached to the contextualized final evaluation result, selects a set of pre-defined threshold rules associated with it. The pre-defined threshold rules have a condition-action structure. The condition part defines the logical judgment of the evaluation dimension scores, such as a data quality dimension score below threshold X and a topology access mode identifier indicating critical real-time consumption. The strategy engine substitutes the extracted evaluation dimension scores into the condition parts of these rules for evaluation. When all conditions of a rule are met, i.e., a successful match, the strategy engine instantiates the action part corresponding to that rule. The action part defines the data governance operation type to be generated, the target object identifier, and the operation parameters, thereby generating a structured, executable data governance operation instruction to be executed.
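The condition-action matching of S5.2 may be illustrated as follows. The rule set, the threshold of 0.6, the operation name `quality_repair`, and the shape of the result dictionary are all hypothetical and serve only to show the structure.

```python
# Hypothetical threshold rules, grouped by topology access mode identifier.
RULES = {
    "critical_realtime_consumption": [
        {
            "condition": lambda scores: scores.get("quality", 1.0) < 0.6,
            "action": {"operation": "quality_repair", "params": {"priority": "high"}},
        },
    ],
}

def match_rules(result, asset_id):
    """Return the governance instructions whose conditions the result satisfies."""
    instructions = []
    for rule in RULES.get(result["pattern_id"], []):
        if rule["condition"](result["scores"]):
            # Instantiate the action part for the concrete target object.
            instructions.append({**rule["action"], "target": asset_id})
    return instructions
```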
[0076] S5.3. Send the data governance operation instructions to be executed to the corresponding data governance executor. The data governance executor executes the operation defined in the instruction. The successful execution of the operation changes the status, location or configuration of the registered data asset object, driving the full lifecycle management of industrial data assets.
[0077] Furthermore, based on the operation type specified in the data governance operation instruction to be executed, the instruction is routed to a registered data governance executor capable of handling that type of operation. A data governance executor is an independent service or agent specifically responsible for executing a particular type of data governance operation, such as a data quality repair executor, a data archiving executor, or a data access permission adjustment executor. After receiving the data governance operation instruction to be executed, the data governance executor parses the operation parameters and target object in the instruction, and calls its own implemented operation logic to complete the actual task. The operation directly affects the registered data asset object, potentially changing its state (e.g., marking it as repaired or archived); changing its location (e.g., migrating data from hot storage to cold storage); or changing its configuration (e.g., adjusting its access control list or data retention policy). Successful execution of the operation signifies that the actual state of the data asset has changed as expected according to the assessment-driven decision, thus achieving proactive management of one stage of the entire lifecycle of industrial data assets.
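Routing an instruction to a registered executor, as in S5.3, could be sketched as follows. The registry decorator, the `archive` executor, and the dictionary representation of a registered data asset object are assumptions introduced for illustration.

```python
EXECUTORS = {}  # operation type -> data governance executor

def executor(operation):
    """Register a function as the executor for one operation type."""
    def register(fn):
        EXECUTORS[operation] = fn
        return fn
    return register

@executor("archive")
def archive(asset, params):
    # Changes both the location and the state of the data asset object.
    asset["location"] = params.get("tier", "cold")
    asset["state"] = "archived"

def dispatch(instruction, registry):
    """Route an instruction to the executor registered for its operation type."""
    handler = EXECUTORS.get(instruction["operation"])
    if handler is None:
        raise LookupError("no executor registered for " + instruction["operation"])
    handler(registry[instruction["target"]], instruction.get("params", {}))
```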
[0078] This embodiment also provides a multi-dimensional evaluation system for the entire lifecycle of industrial data, including: a data acquisition module, which collects industrial data sources and extracts static metadata information to obtain registered data asset objects; a topology graph construction module, which monitors the operation of the data pipeline, identifies the flow and evolution relationships between registered data asset objects, and constructs a global data flow topology graph; an initialization module, which initializes a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph and obtains a set of multi-dimensional evaluation vectors after network update; an identification module, which captures the context of data asset nodes accessing the global data flow topology graph, identifies the topology access pattern based on the access context, dynamically reconstructs the multi-dimensional evaluation vector, and generates a contextualized final evaluation result; and a driver module, which stores the contextualized final evaluation results, inputs them into the strategy engine, and, based on the contextualized final evaluation results, triggers data governance operation instructions to drive the full lifecycle management of industrial data assets.
[0079] This embodiment also provides a computer device applicable to the multi-dimensional evaluation method of the entire lifecycle of industrial data, including: a memory and a processor; the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to realize the multi-dimensional evaluation method of the entire lifecycle of industrial data as proposed in the above embodiment.
[0080] The computer device can be a terminal, comprising a processor, memory, communication interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. The display screen can be an LCD screen or an e-ink screen. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad on the computer device's casing, or an external keyboard, touchpad, or mouse.
[0081] This embodiment also provides a storage medium storing a computer program that, when executed by a processor, implements the multi-dimensional evaluation method for the entire lifecycle of industrial data as proposed in the above embodiments. The storage medium can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0082] In summary, this invention defines multi-dimensional evaluation vectors for data asset nodes and enables them to be updated collaboratively in a networked manner along the global data flow topology. This achieves dynamic propagation and global linkage of evaluation results in the data lineage network, allowing changes in the quality and value of upstream data to be quantified and transmitted to downstream related assets. This accurately identifies key influencing nodes and risk transmission paths in the data ecosystem, providing a basis for the precise allocation of governance resources. By capturing the context of data access and identifying topology access patterns, the evaluation vectors are dynamically reconstructed based on scenarios. This allows the same data asset to generate highly adaptable and directly operational contextualized evaluation results according to different business scenarios, achieving a leap from general evaluation to scenario-optimal evaluation. This enhances the practicality and guiding value of the evaluation. By automatically triggering corresponding data governance operation instructions with contextualized evaluation results, an intelligent closed loop of evaluation-diagnosis-decision-execution is formed, driving lean management of industrial data assets throughout their entire lifecycle from passive response to proactive optimization.
[0083] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A multi-dimensional evaluation method for the entire lifecycle of industrial data, characterized in that it comprises: collecting data from industrial data sources and extracting static metadata to obtain registered data asset objects; monitoring the operation of the data pipeline, identifying the flow and evolution relationships between registered data asset objects, and constructing a global data flow topology graph; initializing a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph to obtain a set of multi-dimensional evaluation vectors after network update; capturing the context of data asset nodes accessing the global data flow topology graph, identifying topology access patterns based on the access context, dynamically reconstructing the multi-dimensional evaluation vectors, and generating contextualized final evaluation results; and storing the contextualized final evaluation results and inputting them into the strategy engine, and triggering data governance operation instructions based on the contextualized final evaluation results to drive the full lifecycle management of industrial data assets.
2. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 1, characterized in that obtaining the registered data asset objects comprises: collecting data from industrial data sources to obtain industrial data entities, and extracting static metadata information; assigning a globally unique asset identifier to each industrial data entity based on the extracted static metadata information; and creating an asset record in the data asset register from the industrial data entity with the globally unique asset identifier and the corresponding extracted static metadata information, resulting in a registered data asset object.
3. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 2, characterized in that constructing the global data flow topology graph comprises: deploying lineage probes at the processing nodes of the running data pipeline to capture data lineage logs; analyzing the data lineage logs to identify the registered data asset objects and obtain their flow and evolution relationships; and, using registered data asset objects as nodes and flow and evolution relationships as directed edges, assembling and updating the graph in real time in the graph database to construct the global data flow topology graph.
4. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 3, characterized in that obtaining the set of multi-dimensional evaluation vectors after network update comprises: defining a multi-dimensional evaluation vector with data quality and business value dimensions for each data asset node in the global data flow topology graph, and calculating the initial scores of the data quality and business value dimensions based on the metadata of the data asset node to obtain the initialized multi-dimensional evaluation vector; triggering an evaluation update event when the initialized multi-dimensional evaluation vector of a data asset node in the global data flow topology graph changes due to a quality audit or business call; defining, based on the direction and semantic type of the connecting edges in the global data flow topology graph, the propagation coefficients and influence functions with which the evaluation update event propagates along the edges, calculating the score change of the data asset node, and superimposing it on the corresponding dimension of the initialized multi-dimensional evaluation vectors of the upstream and downstream data asset nodes; and traversing the data asset nodes in the global data flow topology graph that are triggered to update due to direct or indirect association, summarizing the multi-dimensional evaluation vectors after node updates, and forming the set of multi-dimensional evaluation vectors after network update.
5. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 4, characterized in that capturing the access context includes: monitoring every operation that accesses a data asset node in the global data flow topology graph, and recording basic log information including access time, visitor identity, access operation type, and access performance requirements; extracting, from the basic log information, the source and target node paths of the access operation in the global data flow topology graph, and combining them with the metadata of the data asset nodes to generate semantic tags that describe the location and intent of the access within the topology; and integrating and structuring the basic log information together with the semantic tags to form the access context of the data asset nodes in the global data flow topology graph.
6. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 5, characterized in that identifying the topology access pattern includes: extracting, from the access context of the data asset nodes in the global data flow topology graph, the path depth, branch complexity, and real-time performance of the access request in the global data flow topology graph to form a structured feature vector for identifying topology access patterns; inputting the structured feature vector into a pre-trained and fine-tuned graph embedding model that classifies topology access patterns to obtain a probability distribution over pattern categories; determining the category with the highest probability as the topology access pattern identifier for the access; and retrieving, from a dynamically evolving pattern knowledge graph, the topology access pattern bound to the topology access pattern identifier, the topology access pattern comprising a complete description of its typical scenarios, evaluation focus, and reconstruction rules.
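The selection and lookup steps of claim 6 can be illustrated with the following sketch. The pattern names, the weight matrix `WEIGHTS`, and the dictionary `PATTERN_KB` are hypothetical, and the linear scorer followed by a softmax is only a stand-in for the pre-trained graph embedding model, which the claims do not specify in detail.

```python
import math

# Hypothetical pattern identifiers and a stand-in linear scorer
# (one weight row per pattern; columns: path depth, branch complexity, real-time).
PATTERN_IDS = ["deep_trace", "fan_out_analysis", "realtime_monitor"]
WEIGHTS = [
    [1.0, 0.1, -0.5],
    [0.1, 1.0, -0.2],
    [-0.5, -0.2, 2.0],
]
# Dictionary standing in for the pattern knowledge graph.
PATTERN_KB = {
    "deep_trace": {"evaluation_focus": "lineage completeness"},
    "fan_out_analysis": {"evaluation_focus": "business value"},
    "realtime_monitor": {"evaluation_focus": "data quality and timeliness"},
}

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def identify_pattern(features):
    """Return the most probable pattern identifier and the full distribution."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PATTERN_IDS[best], probs

# Normalized features: shallow path, few branches, strict real-time demand.
pattern_id, probs = identify_pattern([0.2, 0.3, 0.9])
pattern = PATTERN_KB[pattern_id]
```

With these illustrative weights, an access with a strict real-time requirement is assigned the `realtime_monitor` identifier, and its description is then read from the lookup table.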
7. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 6, characterized in that generating the contextualized final evaluation result includes: searching a predefined strategy library for the dynamic reconstruction strategy that matches the topology access pattern, the dynamic reconstruction strategy specifying the evaluation dimensions to be operated on, the operation types, and the operation parameters; performing, according to the operation types and operation parameters specified in the dynamic reconstruction strategy, nonlinear scaling and selective explicit/implicit operations on the scores of the specified evaluation dimensions in the set of multi-dimensional evaluation vectors after network update to obtain an intermediate reconstructed evaluation vector; and normalizing and formatting the intermediate reconstructed evaluation vector and adding the topology access pattern identifier and a timestamp to generate the contextualized final evaluation result.
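A minimal sketch of the reconstruction in claim 7 follows. It assumes scores in the range [0, 1], uses power-law scaling as one possible nonlinear scaling, and interprets the explicit/implicit operation as showing or hiding a dimension; the strategy library contents and all field names are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical strategy library: pattern identifier -> ordered operations.
STRATEGY_LIBRARY = {
    "realtime_monitor": [
        {"dimension": "quality", "op": "scale", "gamma": 0.5},
        {"dimension": "value", "op": "hide"},
    ],
}

def reconstruct(vector, pattern_id):
    """Apply the matched strategy to one evaluation vector."""
    result = dict(vector)
    for step in STRATEGY_LIBRARY.get(pattern_id, []):
        dim = step["dimension"]
        if dim not in result:
            continue
        if step["op"] == "scale":
            # Power-law scaling; gamma < 1 emphasizes the dimension.
            result[dim] = result[dim] ** step["gamma"]
        elif step["op"] == "hide":
            del result[dim]  # dimension is left out of the final result
    # Normalize by clipping to [0, 1] and rounding to a fixed precision.
    scores = {d: round(min(max(s, 0.0), 1.0), 4) for d, s in result.items()}
    return {
        "pattern_id": pattern_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scores": scores,
    }

final = reconstruct({"quality": 0.64, "value": 0.6}, "realtime_monitor")
```

Here the quality score 0.64 is raised to 0.8 and the business value dimension is hidden, so the same asset would be presented differently to a real-time monitoring access than to, for example, a value analysis access.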
8. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 7, characterized in that triggering the data governance operation instructions includes: writing the contextualized final evaluation result into an evaluation result history database, and publishing the contextualized final evaluation result as a real-time message to the message input queue of the strategy engine; and matching the evaluation dimension scores in the contextualized final evaluation result against preset threshold rules and, when a match succeeds, generating a data governance operation instruction to be executed.
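The threshold matching in claim 8 can be sketched as follows. The rule format, the threshold values, and the instruction names are hypothetical, and the history database and message queue are omitted so that only the matching step is shown.

```python
# Hypothetical threshold rules: a rule fires when the score of its
# dimension falls below the given bound.
THRESHOLD_RULES = [
    {"dimension": "quality", "below": 0.6, "instruction": "trigger_cleaning"},
    {"dimension": "value", "below": 0.2, "instruction": "archive_to_cold_storage"},
]

def match_rules(result, rules=THRESHOLD_RULES):
    """Return the governance instructions triggered by one evaluation result."""
    instructions = []
    for rule in rules:
        score = result["scores"].get(rule["dimension"])
        if score is not None and score < rule["below"]:
            instructions.append({
                "asset": result["asset"],
                "action": rule["instruction"],
                "reason": f"{rule['dimension']}={score} < {rule['below']}",
            })
    return instructions

result = {"asset": "sensor_clean", "scores": {"quality": 0.45, "value": 0.7}}
pending = match_rules(result)
```

Dimensions that are missing from the result, for instance because they were hidden during reconstruction, simply do not trigger any rule in this sketch.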
9. The multi-dimensional evaluation method for the entire lifecycle of industrial data as described in claim 8, characterized in that driving the full lifecycle management includes: sending the data governance operation instructions to be executed to the corresponding data governance executor, which executes the operations defined in the instructions; wherein successful execution of the operations changes the status, location, or configuration of the registered data asset objects, thereby driving the full lifecycle management of industrial data assets.
10. A multi-dimensional evaluation system for the entire lifecycle of industrial data, based on the multi-dimensional evaluation method for the entire lifecycle of industrial data as described in any one of claims 1 to 9, characterized in that the system includes: a data acquisition module, which collects data from industrial data sources and extracts static metadata information to obtain registered data asset objects; a construction module, which monitors the operation of the data pipeline, identifies the flow and evolution relationships between the registered data asset objects, and constructs a global data flow topology graph; an initialization module, which initializes a multi-dimensional evaluation vector for each data asset node in the global data flow topology graph and obtains a set of multi-dimensional evaluation vectors after network update; an identification module, which captures the access context of the data asset nodes in the global data flow topology graph, identifies the topology access pattern based on the access context, dynamically reconstructs the multi-dimensional evaluation vector, and generates a contextualized final evaluation result; and a driver module, which stores the contextualized final evaluation result, inputs it into the strategy engine, and, based on the contextualized final evaluation result, triggers data governance operation instructions to drive the full lifecycle management of industrial data assets.