Data asset visualization method and system based on business requirements

By classifying multi-source heterogeneous data into major categories and performing stability analysis, and constructing an effective difference distance, the problems of accurate classification and dynamic mapping in traditional data asset visualization processing are solved, and high-precision matching and visualization updates of data assets and enterprise management indicators are achieved.

CN122240901A · Pending · Publication Date: 2026-06-19 · HUBEI CENT CHINA TECH DEV OF ELECTRIC POWER

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: HUBEI CENT CHINA TECH DEV OF ELECTRIC POWER
Filing Date: 2026-05-25
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Traditional data asset visualization processing cannot achieve accurate classification, making it difficult to dynamically update the matching and linkage between enterprise management indicators and data assets, thus affecting the accuracy and reliability of visualization results.

Method used

Multi-source heterogeneous data is acquired and initially divided into major categories based on its source. The stable difference values and temporal correlations of each data type in each collection period are analyzed, an effective difference distance is constructed, clustering is performed, and a two-way mapping relationship between the data and enterprise management indicators is established to achieve precise visualization processing of data assets.

Benefits of technology

It improves the classification accuracy of data assets, ensures the dynamic mapping and updating of enterprise management indicators and data assets, and enhances the accuracy and credibility of visualization results.

Abstract

This application relates to the field of visualization processing technology, specifically to a data asset visualization method and system based on business needs. The method includes: acquiring multi-source heterogeneous data and initially classifying it into multiple homogeneous data categories based on the data source; for a single homogeneous data category, obtaining the stable difference value of a single data type in each collection period; obtaining the stability analysis confidence level of the single data type; forming a sample to be divided by combining the stable difference values of all data types within each collection period, and obtaining the effective difference distance between any two samples to be divided; clustering all samples to be divided and extracting the effective collection period; establishing a two-way mapping relationship between the data within the effective collection period and enterprise management indicators, and visualizing the data assets. This application aims to achieve accurate visualization processing of data assets.

Description

Technical Field

[0001] This application relates to the field of visualization processing technology, specifically to data asset visualization methods and systems based on business needs.

Background Technology

[0002] Data asset visualization based on business needs can aggregate and manage dispersed, heterogeneous, and multi-source data assets in a standardized manner. It is a core processing tool for enterprise management, business operations, and decision analysis. In the visualization process, complex data assets can be transformed into intuitive forms such as charts, dashboards, and large screens, clearly showing the data distribution, quality status, and relationships. This can help managers and technicians quickly grasp the overall picture of data assets, improve data retrieval efficiency, and assist in monitoring business indicators, locating problems, issuing risk warnings, and optimizing resources, thereby realizing the visualized management and analysis of data assets.

[0003] However, traditional data asset visualization processing uses static and fixed classification methods, which cannot achieve accurate classification. This makes it difficult to achieve accurate matching and linkage between enterprise management indicators and data assets. When indicator definitions change or data asset fields are adjusted, the mapping relationship cannot be automatically updated, resulting in reduced accuracy and reliability of visualization results.

Summary of the Invention

[0004] In light of the above, it is necessary to provide a data asset visualization method and system based on business needs, which, compared to traditional data asset visualization methods, achieves precise visualization processing of data assets.

[0005] In a first aspect, embodiments of this application provide a data asset visualization method based on business needs, the method comprising the following steps:

[0006] Acquire multi-source heterogeneous data and initially classify it into several major categories of data from the same source based on the data source;

[0007] For a single data category from the same source, the stable difference value of a single data type in each collection period is obtained by analyzing the difference in probability distribution and numerical change between each collection period and other collection periods.

[0008] By analyzing the temporal correlation of stable differences between a single data type and other data types within a single data category from the same source, and the dispersion of stable differences of a single data type across all collection periods, the confidence level of stability analysis for a single data type is obtained.

[0009] The stable difference values of all data types within each collection period are combined into a sample to be divided. The effective difference distance between any two samples to be divided is obtained from the differences in the stable difference values of each data type between the two samples and from the proportion of each data type's stability analysis confidence level relative to the stability analysis confidence levels of all data types. All samples to be divided are then clustered using the effective difference distance, and the effective collection period is extracted according to the number of samples to be divided in each cluster. Finally, a two-way mapping relationship between the data within the effective collection period and enterprise management indicators is established to visualize the data assets.

[0010] In one embodiment, the process of obtaining the stable difference value is as follows:

[0011] Perform probability distribution statistics on a single data type within each acquisition period to obtain the probability distribution curve of that data type within each acquisition period; calculate the sum of the two-way KL divergence values between the probability distribution curves of the single data type in each acquisition period and in each other acquisition period.

[0012] Measure the difference between the time-ordered values of the single data type in each acquisition period and in each other acquisition period.

[0013] By combining the sum and the difference of the single data type between each acquisition period and all other acquisition periods, the stable difference value of the single data type in each acquisition period is obtained.

[0014] In one embodiment, the calculation process for the stable difference value is as follows:

[0015] The product of the normalized sum and the difference is denoted as the difference measure product.

[0016] The stable difference value is obtained from the difference measure products of the single data type between each acquisition period and all other acquisition periods.

[0017] In one embodiment, the process of obtaining the stability analysis confidence level is as follows:

[0018] The stable difference values of each data type in all collection periods are arranged in chronological order to form a stable difference value sequence for each data type; the mean of the absolute values of the correlation coefficients between the stable difference value sequence of a single data type and those of all other data types is calculated.

[0019] The confidence level of the stability analysis is positively correlated with the mean and negatively correlated with the dispersion.

[0020] In one embodiment, the stability analysis confidence level is the ratio of the mean to the dispersion.

[0021] In one embodiment, the process of obtaining the effective difference distance is as follows:

[0022] Calculate the deviation between the stable difference values of each data type in any two samples to be divided;

[0023] Calculate the arithmetic mean of the stable difference values of each data type in any two samples to be divided;

[0024] The effective difference distance is positively correlated with the deviation of all types of data between any two samples to be divided, positively correlated with the proportion of all types of data, and negatively correlated with the arithmetic mean of all types of data between any two samples to be divided.

[0025] In one embodiment, the calculation process for the effective difference distance is as follows:

[0026] The ratio of the deviation of various data between any two samples to be divided to the arithmetic mean is denoted as the relative deviation value.

[0027] The effective difference distance is positively correlated with the relative deviation value of all types of data between any two samples to be divided, and is also positively correlated with the numerical proportion of all types of data.

[0028] In one embodiment, the effective difference distance is calculated as follows:

[0029] The product of the relative deviation value of each data type between any two samples to be divided and its numerical proportion is denoted as the comprehensive product;

[0030] The effective difference distance is the average of the comprehensive product of all types of data between any two samples to be divided.
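The embodiment in [0025] to [0030] can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function name `effective_difference_distance` is hypothetical, the "deviation" is assumed to be the absolute difference between the two stable difference values, and the guard for a zero arithmetic mean reuses the preset value of 0.001 that the embodiment mentions for zero denominators.

```python
EPSILON = 0.001  # preset positive value used when a denominator is 0


def effective_difference_distance(sample_a, sample_b, confidences):
    """Distance between two samples to be divided.

    sample_a, sample_b: stable difference values, one entry per data type.
    confidences: stability analysis confidence level of each data type
                 (assumed positive so that the proportions are well defined).
    """
    total_confidence = sum(confidences)
    products = []
    for value_a, value_b, confidence in zip(sample_a, sample_b, confidences):
        deviation = abs(value_a - value_b)  # ASSUMPTION: absolute difference
        arithmetic_mean = (value_a + value_b) / 2
        if arithmetic_mean == 0:
            arithmetic_mean += EPSILON
        relative_deviation = deviation / arithmetic_mean
        proportion = confidence / total_confidence
        products.append(relative_deviation * proportion)  # comprehensive product
    # Average of the comprehensive products over all data types.
    return sum(products) / len(products)


a = [0.2, 0.4]
b = [0.2, 0.8]
s = [3.0, 1.0]
distance = effective_difference_distance(a, b, s)
```

In this example only the second data type differs between the two samples, and because its confidence proportion is 0.25 its contribution to the distance is scaled down accordingly.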

[0031] In one embodiment, the effective acquisition period is the acquisition period containing the samples to be divided in the cluster with the largest number of samples to be divided.
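Selecting the effective acquisition periods from a clustering result might look like the sketch below. It assumes the cluster labels have already been produced by some clustering algorithm run on the effective difference distance; the function name `effective_periods` and the label encoding are hypothetical.

```python
from collections import Counter


def effective_periods(labels):
    """Return indices of the periods whose samples fall in the largest cluster.

    labels[x] is the cluster label assigned to the sample to be divided
    that was built from acquisition period x.
    """
    largest_label, _ = Counter(labels).most_common(1)[0]
    return [x for x, label in enumerate(labels) if label == largest_label]


labels = [0, 0, 1, 0, 2, 0]
selected = effective_periods(labels)
```

If two clusters tie for the largest size, `Counter.most_common` returns the one encountered first; the text does not say how ties should be resolved.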

[0032] In a second aspect, embodiments of this application also provide a data asset visualization system based on business needs, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of any of the above-described data asset visualization methods based on business needs.

[0033] This application has at least the following beneficial effects:

[0034] This application performs a preliminary classification of multi-source heterogeneous data based on the data source, which can classify multi-source heterogeneous data according to the source, effectively reduce data complexity, avoid analysis bias caused by the mixing of features of data from different sources, and help improve the accuracy of subsequent periodic stability judgment.

[0035] Furthermore, probability distribution differences reflect shifts in the inherent statistical characteristics of the data, while numerical variation differences reflect deviations in the temporal evolution trend. By integrating these two dimensions—probability distribution differences and numerical variation differences—we can effectively identify data fluctuations caused by equipment failures, communication anomalies, or external disturbances. This comprehensively characterizes the degree of stability deviation of a single type of data in a specific collection period compared to other periods, providing a quantitative basis for screening high-quality stable data. Furthermore, by introducing temporal correlation analysis, we can utilize the inherent correlations between various data within the same data category to verify the reliability of single-data stability analysis. Simultaneously, by combining dispersion, we can comprehensively measure periodic fluctuations and random abnormal fluctuations, thereby comprehensively evaluating the credibility of the stability analysis results and providing reliable weighting parameters for subsequently measuring the differences between samples to be divided.

[0036] Furthermore, by constructing an effective difference distance, the differences between samples to be divided can be measured in a way that adaptively amplifies dimensions with significant data fluctuations and suppresses the influence of dimensions with low analysis confidence, thereby improving the accuracy with which effective collection periods are screened. This allows high-quality data that can establish a stable mapping relationship with enterprise management indicators to be extracted accurately, improving the accuracy of the mapping relationship between enterprise management indicators and data assets. Visualization rendering is then performed on the extracted high-quality data to achieve precise visualization processing of data assets.

Attached Figure Description

[0037] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0038] Figure 1 is a flowchart of the steps of a data asset visualization method based on business needs, provided in one embodiment of this application;

[0039] Figure 2 is a schematic diagram of the extraction process for the effective collection period.

Detailed Implementation

[0040] In the description of the embodiments in this application, the words "exemplary," "or," and "for example" are used to indicate examples, illustrations, or descriptions. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of the words "exemplary," "or," and "for example" is intended to present the relevant concepts in a specific manner.

[0041] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It should be understood that, unless otherwise stated, " / " in this application means "or".

[0042] It should also be noted that the terms "first" and "second" in this application are used to distinguish similar objects, rather than to describe a specific order or sequence.

[0043] The following, in conjunction with the accompanying drawings, details the specific solutions for the data asset visualization method and system based on business needs provided in this application.

[0044] Referring to Figure 1, which shows a flowchart of a data asset visualization method based on business needs provided in an embodiment of this application, the method includes the following steps:

[0045] Step 1: Obtain multi-source heterogeneous data.

[0046] To achieve data asset visualization processing based on business needs, this application acquires the multi-source heterogeneous data that requires visualization processing. Access to internal business systems, databases, IoT devices, and external data is unified, and real-time synchronization is combined with batch acquisition so that structured, semi-structured, and unstructured data are all covered. Specifically, this application uses the FlinkCDC, Sqoop, and Kafka components to collect the multi-source heterogeneous data, which includes four categories: power grid operation measurement data, voltage and current load time-series data, IoT device data, and external power grid environment data. Power grid operation measurement data refers to measurement data collected by the power grid dispatch and control system that reflects the real-time operating status of the power grid, including at least the measured voltage, measured current, active power, and reactive power at each monitoring point. Voltage and current load time-series data refers to time-series data used for load analysis and forecasting, including at least the active power load value and reactive power load value for each power consumption area or user. IoT device data refers to data collected through IoT terminal devices such as smart meters and distribution automation terminals, including at least user electricity consumption, transformer temperature, and winding temperature. External power grid environment data refers to data on external environmental factors that affect the operating status of the power grid, including at least temperature, humidity, wind speed, and precipitation. Subsequently, the multi-source heterogeneous data is cleaned, transformed, and standardized using algorithms for duplicate data identification, intelligent missing value imputation, outlier correction, and format standardization. This process removes redundant data, repairs missing values, corrects erroneous values, and unifies data formats, field encoding, and business definitions. It avoids issues such as data duplication, missing data, garbled characters, conflicting definitions, and time misalignment that could lead to distorted visualization results, incorrect indicator calculations, or decreased monitoring accuracy. This ensures that the data entering the classification and visualization stages is accurate, complete, and consistent, providing a high-quality data foundation for subsequent high-precision classification, association mapping, and visualization rendering. It should be noted that data cleaning, transformation, and standardization techniques are well known to those skilled in the art and are not elaborated upon here.

[0047] Step 2: Based on the data source, the data is initially divided into multiple major categories of data from the same source; for a single major category of data from the same source, the stable difference value of a single data type in each collection period is obtained; the stability analysis confidence level of the single data type is obtained; the stable difference values of all data types in each collection period are combined into a sample to be divided, and the effective difference distance between any two samples to be divided is obtained; all samples to be divided are clustered to extract the effective collection period.

[0048] Step 2.1: Based on the data source, the data is initially divided into several major categories with the same source.

[0049] The collection and preprocessing in step 1 yield preprocessed multi-source heterogeneous data from different sources. In data asset visualization based on business needs, if the multi-source heterogeneous data is classified and managed inaccurately, the two-way association between enterprise management indicators and data assets may fail to meet the requirements of dynamic mapping and updating. Therefore, in this application, the multi-source heterogeneous data is initially divided into several major categories of data from the same source based on the data source. The specific process is as follows:

[0050] Multi-source heterogeneous data, after collection and preprocessing, is used as input. Based on the data source system, business affiliation, and collection channel, a preliminary classification is performed using a homogeneous grouping rule and a tree-structured directory partitioning method. Data from the same business system, database, IoT device type, or collection interface are grouped into the same category, resulting in multiple homogeneous data categories after source normalization. Based on this processing, data from the same source are grouped into the same category, allowing for further analysis of the periodic stability characteristics of various data within each homogeneous data category. This enables the selection of highly stable and effective data, reducing data complexity and providing a reliable data foundation for subsequent feature extraction and accurate clustering.
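As a rough illustration of the source-based grouping described in [0050], the sketch below groups preprocessed records by their originating system and collection channel. The record fields (`source_system`, `channel`, `type`, `value`) and the function name `group_by_source` are hypothetical, and the flat tuple key stands in for whatever tree-structured directory layout an implementer actually uses.

```python
from collections import defaultdict


def group_by_source(records):
    """Group records into same-source categories.

    Records sharing the same (source system, collection channel) pair
    are placed in the same category.
    """
    categories = defaultdict(list)
    for record in records:
        key = (record["source_system"], record["channel"])
        categories[key].append(record)
    return dict(categories)


# Illustrative records only; the values are made up.
records = [
    {"source_system": "dispatch", "channel": "kafka", "type": "voltage", "value": 220.1},
    {"source_system": "dispatch", "channel": "kafka", "type": "current", "value": 5.2},
    {"source_system": "iot", "channel": "flink_cdc", "type": "transformer_temp", "value": 61.0},
]
categories = group_by_source(records)
```

Each resulting category then holds several data types from one source, which is the unit on which the later stability analysis operates.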

[0051] Step 2.2: For a single data category from the same source, the stable difference value of a single data type in each collection period is obtained by analyzing the difference in probability distribution and numerical change between each collection period and other collection periods.

[0052] Furthermore, in actual processing the quality of a single data type varies significantly over time. If a single data type deviates greatly when reflecting enterprise management indicators, it degrades the accuracy of dynamically updating the mapping when enterprise management indicators and data assets are linked in both directions. Therefore, after the initial division, the data in each major category of data from the same source must be analyzed and the data with high stability selected as valid data. The higher the stability of the valid data, the higher the accuracy of the visualization processing of the two-way linkage between enterprise management indicators and data assets.

[0053] Based on the above analysis, the subsequent analysis takes a single data category from the same source as an example. By analyzing the differences in probability distribution and numerical change of a single data type within that category between each collection period and the other collection periods, the stable difference value of the single data type in each collection period is obtained. The specific process is as follows:

[0054] Probability distribution statistics are computed for a single data type in each collection period to obtain the probability distribution curve of that data type in each collection period; the normalized value of the sum of the two-way KL divergence values between the probability distribution curves of the single data type in each collection period and in each other collection period is denoted as the probability distribution difference degree.

[0055] Within a category of data from the same source, a single data type normally varies little from one collection period to another, that is, its periodic variation is relatively stable. To identify the periods whose data characteristics are stable, the difference between the time-ordered values of the single data type in each collection period and in each other collection period is measured and recorded as the numerical change difference degree of the single data type between the two collection periods.

[0056] The product of the probability distribution difference degree and the numerical change difference degree is denoted as the difference measure product;

[0057] The stable difference value is obtained from the difference measure products of the single data type between each acquisition period and all other acquisition periods. The calculation of the KL divergence value is a well-known technique and is not elaborated upon in this application.

[0058] It should be noted that the collection period is preset by the user. The collection period for all types of data in a single data category is the same. In this embodiment, the collection period is 5 minutes. The implementer can set the collection period length according to the actual situation. This application does not impose any special restrictions.

[0059] In this embodiment, the probability distribution curve is obtained by nonlinearly fitting the statistical results of the probability distribution using the least squares method. The least squares method is a well-known technique and will not be described in detail here. As other implementation methods, based on the ability to perform nonlinear fitting on the statistical results of the probability distribution, implementers may use other existing feasible techniques, and this application does not impose any special restrictions.

[0060] In this embodiment, because KL divergence is asymmetric, the same pair of acquisition periods yields two different KL divergence values depending on which distribution is taken as the baseline. Therefore, the probability distribution difference analysis uses the sum of the two-way KL divergence values.
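The sum of the two-way KL divergence values can be illustrated with a short sketch. Note the substitution: the embodiment fits a probability distribution curve by least squares, whereas this example uses a plain histogram over shared bin edges to keep the code short. The helper names, the bin edges, and the small smoothing constant that avoids taking the logarithm of zero are all assumptions.

```python
import math


def histogram(values, edges):
    """Smoothed bin probabilities of values over shared bin edges.

    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    smoothed = [c + 1e-6 for c in counts]  # ASSUMPTION: avoids log(0)
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """KL divergence of p from q; asymmetric in its arguments."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def two_way_kl(p, q):
    """Sum of the KL divergences taken in both directions (symmetric)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


edges = [0, 1, 2, 3]
p = histogram([0.5, 0.6, 1.5, 2.5], edges)
q = histogram([0.4, 1.2, 1.7, 2.9], edges)
difference = two_way_kl(p, q)
```

Because both directions are summed, swapping the two periods gives the same value, which is the property the paragraph above relies on.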

[0061] In this embodiment, the difference between the time-ordered values of a single data type in each acquisition period and in each other acquisition period is measured as follows: the values of the single data type within each acquisition period are arranged chronologically to form the time series of that data type for that period; the DTW (Dynamic Time Warping) distance between the time series of the single data type in each acquisition period and in each other acquisition period is calculated; and the normalized value of the DTW distance is taken as the difference between the two acquisition periods. The calculation of the DTW distance is a well-known technique and is not described in detail in this application. In other implementations, any existing feasible technique capable of measuring the degree of difference between time series, such as the Euclidean distance, may be used. This application does not impose any special restrictions.
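For readers unfamiliar with DTW, a textbook dynamic-programming version is sketched below. It uses the absolute difference as the local cost and applies no window constraint; both choices are assumptions, since the text does not specify them.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The first pair has distance 0 because warping lets the repeated 2 align with the single 2, which is why DTW suits periods whose curves have the same shape but slightly different timing.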

[0062] In this application, unless otherwise specified, all normalization operations are performed using the maximum value normalization method, which is a well-known technique and will not be described in detail here.
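One possible reading of maximum value normalization is to divide every value by the largest value in the set, which maps non-negative inputs into [0, 1]. The sketch below follows that reading; it is an assumption, and min-max scaling would be another plausible interpretation. The function name `max_normalize` is hypothetical.

```python
def max_normalize(values):
    """Divide each value by the maximum of the set.

    ASSUMPTION: inputs are non-negative (distances and divergences are).
    If the maximum is 0, every value is 0 and zeros are returned.
    """
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


normalized = max_normalize([2, 4, 8])
```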

[0063] In this embodiment, the expression for the stable difference value of a single type of data in each acquisition cycle is:

[0064] In the formula, the quantities are: the stable difference value of a single data type in the x-th collection period; the probability distribution difference degree of the single data type between the x-th and i-th collection periods; the numerical change difference degree of the single data type between the x-th and i-th collection periods; and n, the total number of collection periods for the single data type. The product of the probability distribution difference degree and the numerical change difference degree is denoted as the difference measure product.

[0065] It should be noted that: the greater the calculated probability distribution difference degree, the more the probability distribution of the single data type in that collection period differs from its distribution in the other collection periods; the greater the calculated numerical change difference degree, the more the variation of the single data type over time differs between collection periods; the greater the calculated stable difference value, the more the stability of the single data type in that collection period deviates from the other collection periods, and the greater the impact on the accuracy of the subsequent association matching with enterprise management indicators.
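The following sketch shows how a stable difference value per collection period could be assembled from the two difference degrees. The text as reproduced here does not spell out how the difference measure products are aggregated over the other periods, so the arithmetic mean used below is an assumption, as are the function name and the matrix layout. The inputs are taken to be already normalized.

```python
def stable_difference_values(prob_diff, num_diff):
    """Stable difference value of one data type for every collection period.

    prob_diff[x][i]: probability distribution difference degree between
                     periods x and i.
    num_diff[x][i]:  numerical change difference degree between periods x and i.
    Requires at least two collection periods.
    """
    n = len(prob_diff)
    values = []
    for x in range(n):
        # Difference measure product against every other period.
        products = [prob_diff[x][i] * num_diff[x][i] for i in range(n) if i != x]
        # ASSUMPTION: aggregate with the arithmetic mean.
        values.append(sum(products) / len(products))
    return values


prob_diff = [[0.0, 0.2, 1.0], [0.2, 0.0, 0.8], [1.0, 0.8, 0.0]]
num_diff = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0], [0.9, 1.0, 0.0]]
stable = stable_difference_values(prob_diff, num_diff)
```

In this made-up example the third period differs most from the other two on both measures, so it receives the largest stable difference value.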

[0066] Step 2.3: By analyzing the temporal correlation of the stable difference values between a single data type and the other data types within a single data category from the same source, and the dispersion of the stable difference values of the single data type across all collection periods, the stability analysis confidence level of the single data type is obtained.

[0067] Furthermore, by analyzing the temporal correlation of stable differences between a single data type and other data types within a single data category, and the dispersion of stable differences for a single data type across all collection periods, the confidence level of stability analysis for that single data type is obtained. The specific process is as follows:

[0068] The stable difference values of each data type in all collection periods are arranged in chronological order to form a stable difference value sequence for each data type; the mean of the absolute values of the correlation coefficients between the stable difference value sequence of a single data type and those of all other data types is calculated.

[0069] The expression for the stability analysis confidence level of a single data type is:

$$S = \frac{p}{b}$$

[0070] In the formula, S represents the confidence level of stability analysis of a single data type; p represents the mean of the absolute values ​​of the correlation coefficients between the single data type and all other data types; and b represents the dispersion of the stable difference values ​​of the single data type across all collection periods.

[0071] In this embodiment, the correlation coefficient between stable differential value sequences is the Pearson correlation coefficient. The calculation of the Pearson correlation coefficient is a well-known technique and will not be described in detail here. As other implementation methods, based on the ability to measure the correlation between stable differential value sequences, implementers may use other existing feasible techniques, such as the Spearman correlation coefficient, etc. This application does not impose any special restrictions.

[0072] In this embodiment, the dispersion is specifically the variance. As other implementation methods, based on the ability to measure the dispersion of stable difference values, implementers may use other existing feasible techniques, such as standard deviation, coefficient of variation, etc. This application does not impose any special restrictions.

[0073] It should be added that: In this application, when calculating the ratio, if there is a case where the denominator is 0, the denominator is first mapped to a positive number before subsequent calculations are performed. There are many ways to map data to a positive number, and implementers can choose existing feasible methods according to the actual situation. In this embodiment, the purpose of mapping the data to a positive number is achieved by calculating the sum of the data and a preset value greater than 0. The value of the preset value greater than 0 is preset by humans, and implementers can set it according to the actual situation. This application does not impose any special restrictions. In this embodiment, the value of the preset value greater than 0 is 0.001.

[0074] It should be added that when a single data category from the same source contains only one data type, p is assigned a value of 1 in the formula for the stability analysis confidence level of that data type.

[0075] It should be noted that: the larger the calculated mean of the absolute values of the correlation coefficients, the stronger the correlation between the single data type and all other data types, and the higher the reliability of the stability analysis results for that data type; the larger the calculated dispersion, the more pronounced the fluctuation of the single data type across collection periods, and the lower the reliability of its stability analysis results; the higher the calculated stability analysis confidence level, the more reliable the stability analysis results for that data type, and the more they can be trusted when judging the differences between data from different collection periods.
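Step 2.3 can be sketched as below, using the Pearson correlation coefficient and the variance named in this embodiment, the preset value 0.001 for a zero denominator, and p = 1 when the category holds a single data type. The function names and the dictionary layout are hypothetical. Two details are assumptions: the population variance is used (the text does not say population or sample), and a constant sequence is treated as uncorrelated.

```python
from statistics import mean, pvariance

EPSILON = 0.001  # preset positive value used when the denominator is 0


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        return 0.0  # ASSUMPTION: a constant sequence counts as uncorrelated
    return covariance / (spread_x * spread_y)


def stability_confidence(sequences, name):
    """Confidence S = p / b for the data type called name.

    sequences maps each data type to its stable difference value sequence.
    """
    target = sequences[name]
    others = [seq for key, seq in sequences.items() if key != name]
    if others:
        p = mean([abs(pearson(target, seq)) for seq in others])
    else:
        p = 1.0  # only one data type in the category
    b = pvariance(target)  # ASSUMPTION: population variance as the dispersion
    if b == 0:
        b += EPSILON
    return p / b


sequences = {
    "voltage": [0.1, 0.2, 0.3, 0.4],
    "current": [0.2, 0.4, 0.6, 0.8],
    "power": [0.4, 0.3, 0.2, 0.1],
}
confidence = stability_confidence(sequences, "voltage")
```

Here the voltage sequence is perfectly correlated (positively or negatively) with the other two, so p is 1 and the confidence is determined by the variance alone.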

[0076] Step 2.4: Combine the stable difference values of all data types within each collection period into a sample to be divided. Obtain the effective difference distance between any two samples to be divided from the difference in the stable difference values of each data type between the two samples and from the proportion of each data type's stability analysis confidence in the total stability analysis confidence of all data types. Cluster all samples to be divided using the effective difference distance, and extract the effective collection period according to the number of samples to be divided in each cluster.

[0077] Furthermore, to avoid the impact of cyclical differences in different types of data within a single category of data from the same source on subsequent matching analysis of enterprise management indicators, it is necessary to select collection periods where all types of data are relatively stable, and use the data within these relatively stable collection periods as the basis for subsequent two-way linkage analysis between enterprise management indicators and data assets.

[0078] Based on the above analysis, the stable difference values of all data types within each collection period are combined into a sample to be divided. The effective difference distance between any two samples to be divided is obtained from the differences in the stable difference values of each data type between the two samples and from the proportion of each data type's stability analysis confidence in the total stability analysis confidence of all data types (the value proportion). The specific process is as follows:

[0079] For each data type, calculate the deviation between the stable difference values of any two samples to be divided, and calculate the arithmetic mean of those two stable difference values; the ratio of the deviation to the arithmetic mean is recorded as the relative deviation value of that data type.

[0080] For each data type, the product of its relative deviation value between any two samples to be divided and its value proportion (the share of that data type's stability analysis confidence in the total stability analysis confidence of all data types) is denoted as the comprehensive product;

[0081] The effective difference distance is the average of the comprehensive product of all types of data between any two samples to be divided.
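The three steps above can be sketched as a single function. This is an illustrative sketch under stated assumptions: the function and parameter names are hypothetical, the deviation is the absolute difference used in this embodiment, stable difference values are assumed to be non-negative, and a zero denominator is shifted by the preset value 0.001.

```python
EPSILON = 0.001  # preset value used when a denominator is 0


def effective_difference_distance(sample_a, sample_b, confidences):
    """Effective difference distance between two samples to be divided.

    sample_a, sample_b: stable difference values, one entry per data type.
    confidences: stability analysis confidence of each data type, same order.
    """
    total_confidence = sum(confidences)
    if total_confidence == 0:
        total_confidence += EPSILON
    products = []
    for a, b, confidence in zip(sample_a, sample_b, confidences):
        deviation = abs(a - b)            # absolute difference, as in this embodiment
        arithmetic_mean = (a + b) / 2
        if arithmetic_mean == 0:
            arithmetic_mean += EPSILON
        relative_deviation = deviation / arithmetic_mean
        value_proportion = confidence / total_confidence
        products.append(relative_deviation * value_proportion)  # comprehensive product
    return sum(products) / len(products)
```

For example, with samples `[1, 3]` and `[3, 1]` and confidences `[1, 3]`, both relative deviation values are 1, the value proportions are 0.25 and 0.75, and the distance is their average, 0.5.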

[0082] In this embodiment, the deviation between stable difference values ​​is the absolute value of the difference. As other implementation methods, based on the ability to measure the degree of difference between stable difference values, the implementer may use other calculation methods, such as the square of the difference, the ratio, etc. This application does not impose any special restrictions.

[0083] It should be noted that: the larger the calculated relative deviation value of a data type, the more significant the difference in stability between the two samples to be divided for that data type; the larger the calculated value proportion of a data type, the more that data type can be trusted when judging the difference between the two samples to be divided after the period-by-period comparative analysis.

[0084] Furthermore, all samples to be divided are clustered based on the calculated effective difference distance between them. Through cluster analysis, the collection periods in which every data type is relatively stable are selected, so that inconsistent period-to-period changes among different data types from the same source do not degrade the accuracy of the subsequent correlation matching with enterprise management indicators.

[0085] In this embodiment, an agglomerative hierarchical clustering algorithm is used to obtain the clustering results of all samples to be divided. The effective difference distance between samples to be divided is used as the distance metric in the clustering process, and the optimal number of clusters is determined based on the silhouette coefficient. Both the silhouette coefficient and the agglomerative hierarchical clustering algorithm are well-known technologies and will not be described in detail in this application. As other implementation methods, provided that all samples to be divided can be clustered, implementers may use other existing feasible technologies. This application does not impose any special restrictions.
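The clustering step can be illustrated with a small standard-library sketch that works on a precomputed distance matrix. It is not the applicant's implementation. The function names are hypothetical, average linkage is an assumption (the source does not name a linkage criterion), singleton clusters are given a silhouette of 0 by convention, and at least three samples are needed for the search over cluster counts to run.

```python
from itertools import combinations


def agglomerative_labels(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        # Find the two clusters with the smallest average pairwise distance.
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
            avg = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(dist, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def cluster_samples(dist):
    """Try every cluster count from 2 to n-1 and keep the best silhouette."""
    best_labels, best_score = None, float("-inf")
    for k in range(2, len(dist)):
        labels = agglomerative_labels(dist, k)
        score = silhouette(dist, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Here `dist[i][j]` would hold the effective difference distance between samples `i` and `j`. For larger sample counts an established library implementation that accepts a precomputed distance matrix is likely a better choice than this quadratic-per-merge sketch.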

[0086] Since the power system operates stably during most data collection periods, the number of samples to be divided in each cluster is counted. The data collection periods corresponding to the samples to be divided in the cluster with the largest number of samples are taken as the effective data collection periods. The data characteristics within the effective data collection periods characterize the baseline distribution of the normal operating state. The data within the effective data collection periods is used as valid data for subsequent secondary classification, thereby improving the accuracy of the mapping relationship analysis of enterprise management indicators after secondary classification. A schematic diagram of the effective data collection period extraction process is shown in Figure 2.
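Selecting the effective collection periods from the clustering result can be sketched as below. The function name and the period identifiers are hypothetical. If two clusters tie for the largest size, this sketch keeps the one whose label appears first, which is an assumption the source does not address.

```python
from collections import Counter


def effective_periods(period_ids, labels):
    """Return the collection periods whose samples fall in the largest cluster.

    period_ids: identifier of the collection period behind each sample.
    labels: cluster label of each sample, in the same order.
    """
    largest_label, _ = Counter(labels).most_common(1)[0]
    return [p for p, lab in zip(period_ids, labels) if lab == largest_label]
```

For example, `effective_periods(["d1", "d2", "d3", "d4", "d5"], [0, 0, 0, 1, 1])` keeps `d1`, `d2` and `d3`.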

[0087] Step 3: Establish a two-way mapping relationship between data within the effective collection period and enterprise management indicators in order to visualize data assets.

[0088] Furthermore, the valid data retained after filtering individual data types are grouped into valid data subsets, which are then secondary-classified based on the enterprise management indicator system, indicator definitions, calculation logic, and display dimensions. A two-way matching rule between enterprise management indicators and data is adopted to establish a precise mapping relationship between data fields and enterprise management indicators: First, the enterprise management indicators corresponding to each valid data subset are identified, such as voltage qualification rate and load factor. Then, the data fields within the valid data subset are matched one-to-one with the calculation fields, statistical definitions, and display formats of the enterprise management indicators, ensuring that each type of data serves only the specified enterprise management indicator. For example, the measured voltage value and its collection time in the power grid operation measurement data are mapped to the voltage qualification rate indicator, and the active power and its operating time in the voltage and current load time series data are mapped to the load factor indicator, avoiding conflicts and calculation errors caused by mixing data from multiple enterprise management indicators. Finally, a secondary classification result set with a one-to-one correspondence with enterprise management indicators, stable correlation, and unified definitions is obtained, such as a dedicated dataset for voltage qualification rate and a dedicated dataset for load factor.
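The one-to-one field matching described above can be sketched as a small lookup table with a conflict check. The indicator keys and field names below are hypothetical placeholders modeled on the two examples in the text, and the function name is an assumption.

```python
# Hypothetical indicator-to-field configuration based on the examples in the text.
INDICATOR_FIELDS = {
    "voltage_qualification_rate": ["measured_voltage", "voltage_collection_time"],
    "load_factor": ["active_power", "operating_time"],
}


def build_field_index(indicator_fields):
    """Map each data field to the single indicator it serves.

    Raises ValueError if a field is claimed by more than one indicator,
    since each data field should serve only its specified indicator.
    """
    field_to_indicator = {}
    for indicator, fields in indicator_fields.items():
        for field in fields:
            if field in field_to_indicator:
                raise ValueError(
                    f"field {field!r} is mapped to both "
                    f"{field_to_indicator[field]!r} and {indicator!r}"
                )
            field_to_indicator[field] = indicator
    return field_to_indicator
```

Rejecting a field that is claimed twice is one simple way to enforce the rule that data from multiple indicators is not mixed.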

[0089] It should be noted that the effective data subset extracted through clustering serves to provide a statistically stable benchmark sample for the mapping relationship between fields and enterprise management indicators, ensuring a high confidence level in the mapping relationship. After the mapping rules are bound, they are sent to the real-time processing engine as global configuration parameters. During actual business operation, all real-time collected data are uniformly associated using the mapping rules. Through the above processing, even if physical faults occur in equipment such as transformers, lines, or smart meters, causing abnormal data fluctuations, the abnormal information can still be accurately pushed to the visualization interface through the established mapping channel, realizing the risk warning function.

[0090] Based on the results of secondary classification, a two-way linkage and mapping between enterprise management indicators and data assets is established. The calculation chain of enterprise management indicators is traced through a data lineage analysis algorithm, and the relationship between enterprise management indicators and data fields is determined using a field dependency identification algorithm. The Apriori algorithm is used to mine frequent itemset associations between enterprise management indicators and data assets, completing the intelligent binding of enterprise management indicators and data assets, forming a mutually verifiable and traceable two-way correspondence. When the definition of an enterprise management indicator changes, the changes are automatically identified, triggering updates to the underlying data asset mapping relationship, permission synchronization, and data quality re-verification to ensure that enterprise management indicators and data assets remain consistent, avoiding errors in visualization results due to inconsistent definitions. When a data asset field is modified, deleted, or adjusted, the associated enterprise management indicators are automatically identified, triggering recalculation and anomaly labeling of the enterprise management indicators, while simultaneously transmitting the change results to the visualization layer, achieving automatic and accurate updates of the visualization data. The data lineage analysis algorithm, field dependency identification algorithm, and Apriori algorithm are all well-known technologies and will not be described in detail here.
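A minimal sketch of the two-way correspondence and its change propagation is given below. It illustrates only the bookkeeping, not the lineage analysis, field dependency identification, or Apriori mining named above. The class name, method names, and action labels are all hypothetical.

```python
from collections import defaultdict


class TwoWayLinkage:
    """Keeps indicator -> fields and field -> indicators lookups in sync."""

    def __init__(self):
        self.fields_by_indicator = defaultdict(set)
        self.indicators_by_field = defaultdict(set)

    def bind(self, indicator, field):
        """Record that an indicator depends on a data asset field."""
        self.fields_by_indicator[indicator].add(field)
        self.indicators_by_field[field].add(indicator)

    def on_indicator_changed(self, indicator):
        """Actions to trigger when an indicator definition changes."""
        fields = sorted(self.fields_by_indicator.get(indicator, ()))
        actions = ("update_mapping", "sync_permissions", "recheck_quality")
        return [(action, field) for field in fields for action in actions]

    def on_field_changed(self, field):
        """Actions to trigger when a data asset field is modified or deleted."""
        indicators = sorted(self.indicators_by_field.get(field, ()))
        actions = ("recalculate", "flag_anomaly", "refresh_visualization")
        return [(action, ind) for ind in indicators for action in actions]
```

Because both lookups are updated in `bind`, a change on either side can be traced to the other side without scanning the whole mapping.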

[0091] To achieve accurate visualization of data assets based on business needs, this application uses data association mapping results for visualization rendering and output. Specifically: First, enterprise management indicators are dynamically weighted according to business priorities. Weight values are set through numerical input. Higher weights indicate greater impact of enterprise management indicators on business decisions and quality assessments, ensuring the visualization model aligns with business priorities. Based on the weight configuration results, and combined with the asset structure and bidirectional linkage after classification and grading, a visualization model is constructed using SpringBoot, Vue3, Element Plus, and data quality platform visualization modeling components, forming a standardized, scalable, and iterative visualization framework.

[0092] Furthermore, based on the constructed visualization model, visualization rendering engines such as ECharts, AntV, and DataV are used to visually display data asset distribution, quality scores, problem proportions, trend changes, and interrelationships in the form of pie charts, bar charts, line charts, dashboards, heatmaps, and tables, ensuring that the displayed content is highly aligned with business needs. After rendering, visualization results such as data asset dashboards, quality monitoring dashboards, analysis reports, and early warning panels are generated. All data displayed in the visualization layer is automatically updated based on bidirectional linkage, ensuring high accuracy, strong business collaboration, and high credibility of the visualization results, achieving high-quality data asset visualization management.

[0093] Based on the same inventive concept as the above methods, this application also provides a data asset visualization system based on business needs, including a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, it implements the steps of any one of the above-described data asset visualization methods based on business needs.

[0094] In summary, this application makes a preliminary classification of multi-source heterogeneous data based on the data source, which can classify multi-source heterogeneous data according to the source, effectively reduce data complexity, avoid analysis bias caused by the mixing of features of data from different sources, and help improve the accuracy of subsequent periodic stability judgment.

[0095] Furthermore, probability distribution differences reflect shifts in the inherent statistical characteristics of the data, while numerical variation differences reflect deviations in the temporal evolution trend. By integrating these two dimensions—probability distribution differences and numerical variation differences—we can effectively identify data fluctuations caused by equipment failures, communication anomalies, or external disturbances. This comprehensively characterizes the degree of stability deviation of a single type of data in a specific collection period compared to other periods, providing a quantitative basis for screening high-quality stable data. Furthermore, by introducing temporal correlation analysis, we can utilize the inherent correlations between various data within the same data category to verify the reliability of single-data stability analysis. Simultaneously, by combining dispersion, we can comprehensively measure periodic fluctuations and random abnormal fluctuations, thereby comprehensively evaluating the credibility of the stability analysis results and providing reliable weighting parameters for subsequently measuring the differences between samples to be divided.

[0096] Furthermore, by constructing an effective difference distance, when measuring the differences between samples to be divided, it is possible to adaptively amplify dimensions with significant data fluctuations and suppress the influence of dimensions with low analytical credibility, thereby improving the accuracy of the effective screening period. This allows for the accurate extraction of high-quality data that can establish a stable mapping relationship with enterprise management indicators, improving the accuracy of the mapping relationship between enterprise management indicators and data assets. Based on the extracted high-quality data, visualization rendering is performed to achieve precise visualization processing of data assets.

[0097] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in a block diagram and / or flowchart, and combinations of blocks in a block diagram and / or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0098] It will be apparent to those skilled in the art that this application is not limited to the details of the exemplary embodiments described above, and that this application can be implemented in other specific forms without departing from its essential characteristics. Therefore, the embodiments described above should be considered exemplary and non-limiting in all respects.

Claims

1. A data asset visualization method based on business needs, characterized in that the method includes the following steps: acquiring multi-source heterogeneous data and initially classifying it into several same-source data categories based on the data source; for a single same-source data category, obtaining the stable difference value of a single data type in each collection period by analyzing the differences in probability distribution and numerical change between each collection period and the other collection periods; obtaining the stability analysis confidence level of the single data type by analyzing the temporal correlation of the stable difference values between the single data type and the other data types within the single same-source data category, and the dispersion of the stable difference values of the single data type across all collection periods; combining the stable difference values of all data types within each collection period into a sample to be divided; obtaining the effective difference distance between any two samples to be divided from the differences in the stable difference values of each data type between the two samples and from the proportion of each data type's stability analysis confidence in the total stability analysis confidence of all data types; clustering all samples to be divided using the effective difference distance; extracting the effective collection period according to the number of samples to be divided in each cluster; and establishing a two-way mapping relationship between the data within the effective collection period and enterprise management indicators to visualize the data assets.

2. The data asset visualization method based on business needs as described in claim 1, characterized in that the process for obtaining the stable difference value is as follows: performing probability distribution statistics on a single data type within each collection period to obtain the probability distribution curve of the single data type within each collection period; calculating the sum of the two-way KL divergence values of the probability distribution curves of the single data type between each collection period and each other collection period; measuring the difference of the single data type, arranged chronologically, between each collection period and each other collection period; and combining the sum and the difference between each collection period and all other collection periods to obtain the stable difference value of the single data type in each collection period.

3. The business requirement based data asset visualization method of claim 2, wherein, The calculation process for the stable difference value is as follows: The product of the normalized sum and the difference is denoted as the difference measure product. The stable difference value is obtained by multiplying the difference measure of a single data type between each acquisition period and all other acquisition periods.

4. The data asset visualization method based on business needs as described in claim 1, characterized in that the process for obtaining the stability analysis confidence level is as follows: the stable difference values of each data type in all collection periods are arranged in time order to form a stable difference value sequence for each data type; the mean of the absolute values of the correlation coefficients between the stable difference value sequence of a single data type and those of all other data types is calculated; the stability analysis confidence level is positively correlated with the mean and negatively correlated with the dispersion.

5. The data asset visualization method based on business needs as described in claim 4, characterized in that, The confidence level for stability analysis is the ratio of the mean to the dispersion.

6. The data asset visualization method based on business needs as described in claim 1, characterized in that the process of obtaining the effective difference distance is as follows: calculating the deviation of the stable difference values of each data type between any two samples to be divided; calculating the arithmetic mean of the stable difference values of each data type between any two samples to be divided; the effective difference distance is positively correlated with the deviation of all data types between any two samples to be divided, positively correlated with the proportion of all data types, and negatively correlated with the arithmetic mean of all data types between any two samples to be divided.

7. The data asset visualization method based on business needs as described in claim 6, characterized in that, The calculation process for the effective difference distance is as follows: The ratio of the deviation of various data between any two samples to be divided to the arithmetic mean is denoted as the relative deviation value. The effective difference distance is positively correlated with the relative deviation value of all types of data between any two samples to be divided, and is also positively correlated with the numerical proportion of all types of data.

8. The data asset visualization method based on business needs as described in claim 7, characterized in that the method for calculating the effective difference distance is as follows: the product of the relative deviation value of each data type between any two samples to be divided and the value proportion of that data type is denoted as the comprehensive product; the effective difference distance is the average of the comprehensive products of all data types between any two samples to be divided.

9. The data asset visualization method based on business needs as described in claim 1, characterized in that the effective collection period is the collection period corresponding to the samples to be divided in the cluster with the largest number of samples to be divided.

10. A data asset visualization system based on business needs, comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the data asset visualization method based on business needs as described in any one of claims 1-9.