Intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining
By constructing a knowledge graph of data assets and a graph neural network, combined with user interest modeling, the problem of low relevance in data asset recommendations in existing technologies is solved, and more accurate and efficient data asset recommendations are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUBEI CENT CHINA TECH DEV OF ELECTRIC POWER
- Filing Date
- 2026-02-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing data asset catalog systems fail to fully utilize the inherent relationships between data assets when recommending them, resulting in low relevance of the recommendation results and an inability to effectively mine multidimensional features, thus requiring improvement in recommendation accuracy.
By constructing a knowledge graph of data assets, extracting multi-dimensional related feature vectors, and using graph neural networks for embedding computation, combined with user interest modeling, and comprehensively considering user preferences and the correlation of data assets, intelligent recommendations are made.
It improves the relevance and accuracy of recommendation results, enables the discovery of data assets that are relevant to user interests and have potential value, and enhances the coverage and efficiency of recommendations.
Figure CN121722979B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing and information retrieval technology, specifically to an intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining.
Background Technology
[0002] As enterprises deepen their digital transformation, they accumulate massive amounts of data assets, including data tables, datasets, data interfaces, and data reports. To effectively manage and utilize these data assets, enterprises typically establish data asset catalog systems to uniformly register, classify, and display them. However, with the rapid growth in the scale of data assets, how to help users quickly discover the data they need from massive amounts of data assets has become a significant challenge in the field of data asset management.
[0003] Existing data asset catalog systems primarily use keyword search and category browsing to help users find data. Keyword search requires users to know the name and key characteristics of the data they need, which makes it difficult for users unfamiliar with data assets to accurately express their search intent. While category browsing provides a hierarchical data organization structure, when the amount of data assets is large, users need to browse through a large number of categories, which is inefficient.
[0004] To address the aforementioned issues, some data asset catalog systems have introduced recommendation functions, automatically recommending relevant data assets based on users' historical behavior and preferences. However, existing recommendation methods have the following shortcomings:
[0005] First, they consider only users' historical access behavior and ignore the inherent relationships between data assets, resulting in low relevance of the recommendation results;
[0006] Second, they rely on simple collaborative filtering algorithms that do not fully exploit the multidimensional features of data assets, so recommendation accuracy needs to be improved;
[0007] Third, they do not effectively utilize the semantic and structural relationships between data assets, which makes it difficult to discover potentially high-value data assets.
[0008] Therefore, there is a need for an intelligent recommendation method that can fully utilize the multidimensional relationships between data assets to improve the accuracy and relevance of recommendations.
Summary of the Invention
[0009] The purpose of this invention is to provide an intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining.
[0010] To achieve the above objectives, the present invention provides the following technical solution:
[0011] A data asset catalog intelligent recommendation method based on graph computing and multidimensional association mining includes the following steps:
[0012] S1. Data Asset Acquisition and Preprocessing: Collect metadata information of data assets, clean, deduplicate and standardize the metadata information to obtain a standardized data asset set;
[0013] S2. Data Asset Knowledge Graph Construction: Based on the standardized data asset set, extract data asset entities and the relationships between entities to construct a data asset knowledge graph.
[0014] The data asset entities include data table entities, field entities, business theme entities, and data tag entities;
[0015] The relationships include attribution, referencing, lineage, and semantic similarity.
[0016] S3. Multidimensional association feature extraction: Based on the data asset knowledge graph, extract multidimensional association feature vectors of data assets from structural, semantic and behavioral dimensions;
[0017] S4. Graph Neural Network Embedding Calculation: Input the data asset knowledge graph into the graph neural network model, and calculate the embedding vector of each data asset node through neighborhood aggregation and feature propagation;
[0018] S5. User interest modeling: Obtain the target user's historical access records and operation behavior data, and construct a user interest feature vector based on the historical access records and operation behavior data;
[0019] S6. Intelligent Recommendation Calculation: The multidimensional association feature vector, the embedding vector, and the user interest feature vector are fused and calculated to obtain the recommendation score of the candidate data assets. The data assets are sorted from high to low according to the recommendation score, and a data asset catalog recommendation list is output.
[0020] As a further aspect of the present invention: in step S1, the metadata information includes data asset name, data asset description, data asset type, data asset creation time, data asset update time, data asset department, and data asset access permissions;
[0021] The standardization process includes: unifying data formats, standardizing naming rules, and filling in missing fields.
[0022] As a further aspect of the present invention: the method for constructing a data asset knowledge graph in step S2 includes:
[0023] S21. Entity Recognition: Using a named entity recognition algorithm, identify data table entities, field entities, business theme entities, and data tag entities from the standardized data asset set;
[0024] S22. Relation Extraction: Based on preset relation templates and semantic analysis methods, extract the attribution relationship, reference relationship, lineage relationship and semantic similarity relationship between entities;
[0025] S23. Graph storage: The identified entities are used as nodes and the extracted relationships are used as edges to construct and store the data asset knowledge graph.
[0026] As a further aspect of the present invention: in step S3, the method for extracting the multidimensional associated feature vector includes:
[0027] S31. Structural dimension feature extraction: Calculate the degree centrality value, betweenness centrality value, and clustering coefficient value of each data asset node in the data asset knowledge graph, and combine the degree centrality value, betweenness centrality value, and clustering coefficient value into a structural feature vector;
[0028] S32. Semantic Dimension Feature Extraction: A pre-trained language model is used to encode the names and descriptions of the data assets to obtain semantic feature vectors;
[0029] S33. Behavioral dimension feature extraction: Statistically calculate the access frequency, number of favorites, and number of downloads for each data asset, and combine the access frequency, number of favorites, and number of downloads after normalization into a behavioral feature vector;
[0030] S34. Feature Fusion: The structural feature vector, semantic feature vector, and behavioral feature vector are concatenated to obtain the multidimensional associated feature vector.
[0031] As a further aspect of the present invention: in step S4, the graph neural network model employs a graph attention network, and the calculation process of the graph attention network includes:
[0032] S41. Initialize embedding: Use the multidimensional association feature vector of each data asset node as the initial node embedding vector;
[0033] S42. Attention Weight Calculation: For each node, calculate the attention weight between it and its neighboring nodes;
[0034] S43. Neighborhood Aggregation: Perform a weighted aggregation of the embedding vectors of neighboring nodes based on the attention weights;
[0035] S44. Feature Update: Combine the aggregated neighborhood features with the current node features, and update the node embedding vector using a non-linear activation function;
[0036] S45. Multi-layer propagation: Repeat steps S42 to S44 for a total of K layers, where K is the preset number of propagation layers, to obtain the final node embedding vector.
[0037] As a further aspect of the present invention: the method for constructing the user interest feature vector in step S5 includes:
[0038] S51. Historical Behavior Extraction: Obtain the records of data assets that the target user accessed, favorited, or downloaded within a preset time window;
[0039] S52. Interest Weight Calculation: Calculate the interest weight of each historical behavior record based on behavior type and time decay factor.
[0040] S53. Interest Vector Aggregation: The embedding vectors of the data assets that the target user has interacted with in the past are weighted and summed according to the interest weights to obtain the user interest feature vector.
[0041] As a further aspect of the present invention: in step S52, the formula for calculating the interest weight is:
[0042] $w_i = b_i \cdot e^{-\lambda \cdot \Delta t_i}$;
[0043] where $w_i$ denotes the interest weight of the $i$-th historical behavior record;
[0044] $b_i$ denotes the behavior type weight of the $i$-th record: 0.3 for access behavior, 0.5 for favorite behavior, and 0.8 for download behavior;
[0045] $\lambda$ denotes the time decay coefficient, with a value of 0.1;
[0046] $\Delta t_i$ denotes the number of days between the $i$-th historical behavior record and the current moment;
[0047] $e$ denotes the natural constant.
[0048] As a further aspect of the present invention: in step S6, the method for calculating the recommendation score includes:
[0049] S61. Similarity Calculation: Calculate the cosine similarity between the user interest feature vector and the embedding vector of each candidate data asset to obtain the interest matching score;
[0050] S62. Relevance Calculation: Based on the data asset knowledge graph, calculate the graph distance between candidate data assets and user historical interaction data assets, and convert the graph distance into a relevance score.
[0051] S63. Comprehensive score calculation: The interest matching score and the relevance score are weighted and summed to obtain the recommendation score.
[0052] As a further aspect of the present invention: in step S63, the formula for calculating the recommendation score is:
[0053] $Score_j = \omega_1 \cdot Sim_j + \omega_2 \cdot Rel_j$;
[0054] where $Score_j$ denotes the recommendation score of the $j$-th candidate data asset;
[0055] $Sim_j$ denotes the interest matching score of the $j$-th candidate data asset;
[0056] $Rel_j$ denotes the relevance score of the $j$-th candidate data asset;
[0057] $\omega_1$ denotes the interest matching weight coefficient, with a value of 0.6;
[0058] $\omega_2$ denotes the relevance weight coefficient, with a value of 0.4.
[0059] As a further aspect of the present invention: in step S62, the formula for calculating the relevance score is:
[0060] $Rel_j = \frac{1}{1 + d_j}$;
[0061] where $Rel_j$ denotes the relevance score of the $j$-th candidate data asset;
[0062] $d_j$ denotes the shortest of the graph distances between the $j$-th candidate data asset and all data assets in the user's historical interactions;
[0063] The shortest graph distance is calculated in the data asset knowledge graph using a breadth-first search algorithm.
[0064] Compared with the prior art, the beneficial effects of the present invention by adopting the above technical solution are as follows:
[0065] (1) By constructing a knowledge graph of data assets, this invention explicitly models the attribution, reference, lineage, and semantic similarity relationships between data assets, thereby fully exploiting the inherent connections between data assets and improving the relevance of recommendation results.
[0066] (2) This invention extracts multidimensional association features of data assets from structural, semantic and behavioral dimensions, comprehensively characterizes the feature information of data assets, and provides rich feature support for accurate recommendation.
[0067] (3) The present invention uses graph neural networks for embedding computation. Through neighborhood aggregation and feature propagation mechanisms, the embedding vector of data assets can be integrated with the information of its neighboring nodes, thereby enhancing the expressive power of feature representation.
[0068] (4) This invention combines user interest modeling and graph association analysis, taking into account user preferences and data asset correlation, and can discover data assets that are related to user interests and have potential value, thereby improving the accuracy and coverage of recommendations.
Attached Figure Description
[0069] Figure 1 is an overall flowchart of the intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining of the present invention.
[0070] Figure 2 is a flowchart of the data asset knowledge graph construction process of the present invention.
[0071] Figure 3 is a flowchart of the multidimensional association feature extraction process of the present invention.
[0072] Figure 4 is a flowchart of the graph neural network embedding computation process of the present invention.
[0073] Figure 5 is a flowchart of the user interest modeling process of the present invention.
[0074] Figure 6 is a flowchart of the intelligent recommendation calculation process of the present invention.
[0075] Figure 7 is a schematic diagram of the data asset knowledge graph structure of the present invention.
Detailed Implementation
[0076] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0077] Example 1
[0078] As shown in Figure 1, this invention provides an intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining, comprising the following steps:
[0079] Step S1: Data asset collection and preprocessing.
[0080] This step involves collecting metadata information from data assets, cleaning, deduplicating, and standardizing the metadata information to obtain a standardized set of data assets.
[0081] Specifically, metadata information includes data asset name, data asset description, data asset type, data asset creation time, data asset update time, data asset department, and data asset access permissions. Data asset types include data tables, datasets, data interfaces, and data reports.
[0082] Cleaning processes include: removing whitespace characters, special characters, and invalid content from metadata; deduplication processes include: removing duplicate data asset records based on the unique identifier of the data asset; and standardization processes include: unifying data formats (e.g., unifying the time format to "YYYY-MM-DD HH:MM:SS"), standardizing naming rules (e.g., using camelCase naming) and filling missing fields (e.g., filling missing description fields with "No description available").
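The cleaning, deduplication, and standardization described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the record keys (`asset_id`, `name`, `description`, `created_at`), the helper names, the set of characters treated as "special", and the list of accepted input time formats are all assumptions introduced for the example.

```python
import re
from datetime import datetime

def clean_text(value):
    """Drop special characters and collapse runs of whitespace (assumed rule set)."""
    value = re.sub(r"[^\w\s\-.,()]", "", value or "")
    return re.sub(r"\s+", " ", value).strip()

def to_camel_case(name):
    """Standardize a name to camelCase, splitting on spaces, underscores, hyphens."""
    parts = [p for p in re.split(r"[\s_\-]+", name) if p]
    if not parts:
        return ""
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])

def standardize_time(raw):
    """Convert a few assumed input formats to 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    return raw  # leave unrecognized formats untouched

def preprocess(records):
    """Clean, deduplicate by unique identifier, and standardize metadata records."""
    seen, result = set(), []
    for rec in records:
        if rec["asset_id"] in seen:  # deduplication on the unique identifier
            continue
        seen.add(rec["asset_id"])
        description = clean_text(rec.get("description")) or "No description available"
        result.append({
            "asset_id": rec["asset_id"],
            "name": to_camel_case(clean_text(rec["name"])),
            "description": description,
            "created_at": standardize_time(rec["created_at"]),
        })
    return result
```

Keeping the first record for each identifier is one possible deduplication policy; the source does not say which duplicate is retained.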
[0083] Step S2: Construction of data asset knowledge graph.
[0084] As shown in Figure 2, this step extracts data asset entities and the relationships between entities from the standardized data asset set and constructs a data asset knowledge graph.
[0085] Data asset entities include four types:
[0086] Data table entity: Represents a data table in the database, including table name, table description, and database information;
[0087] Field entity: Represents a field in a data table, including field name, field type, and field description information;
[0088] Business theme entity: Represents the business domain to which the data asset belongs, such as "customer management" or "sales analysis";
[0089] Data tag entity: Represents the tag attributes of data assets, such as "core data" or "sensitive data".
[0090] There are four types of relationships:
[0091] Attribution relationship: Indicates that a field entity belongs to a data table entity, and that a data table entity belongs to a business theme entity;
[0092] Reference relationship: Represents the foreign key reference relationships between data tables;
[0093] Lineage relationship: Represents the data flow and processing relationships between data tables;
[0094] Semantic similarity relationship: Represents the semantic similarity between data assets based on their name and description text.
[0095] As shown in Figure 7, the specific method for constructing the data asset knowledge graph includes:
[0096] Step S21, Entity Recognition: A named entity recognition (NER) algorithm is used to identify data table entities, field entities, business theme entities, and data tag entities from the standardized data asset set. Specifically, by parsing the structured metadata information of the data assets, the data table name is extracted as the data table entity, the field name is extracted as the field entity, the business classification information is extracted as the business theme entity, and the tag information is extracted as the data tag entity.
[0097] Step S22, Relationship Extraction: Based on the preset relationship template and semantic analysis method, extract the attribution relationship, reference relationship, lineage relationship and semantic similarity relationship between entities.
[0098] For attribution relationships, they are directly extracted based on the inclusion relationship between the data table and the field, and the classification relationship between the data table and the business theme.
[0099] For reference relationships, the foreign key constraint information of the data table is parsed for extraction.
[0100] For lineage relationships, extraction is performed by parsing the input and output configuration information of the data processing tasks.
[0101] For semantic similarity relationships, a pre-trained language model is used to calculate the semantic similarity between the data asset name and the description text. When the semantic similarity is greater than a preset threshold (set to 0.8 in this embodiment), a semantic similarity relationship is established.
[0102] Step S23, Graph Storage: Using the identified entities as nodes and the extracted relationships as edges, construct and store the data asset knowledge graph using a graph database (such as Neo4j graph database software, or other graph database management systems that support attribute graph models).
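Steps S21 to S23 can be illustrated with a small in-memory property graph. This is only a sketch under stated assumptions: the `KnowledgeGraph` class, the input dictionary layout, and the relation labels (`belongs_to`, `references`, `lineage`, `similar_to`) are hypothetical names chosen for the example, the text vectors stand in for the output of a pre-trained language model, and a production system would write to a graph database instead. How data tag entities attach to tables is not specified in the source, so the attribution edge used for tags below is an assumption.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # preset threshold used in this embodiment

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class KnowledgeGraph:
    """Minimal property graph: typed nodes plus (source, relation, target) edges."""
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = {"type": node_type}

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

def build_graph(tables, lineage_jobs, text_vectors):
    kg = KnowledgeGraph()
    for t in tables:
        # Entities come straight from structured metadata.
        kg.add_node(t["name"], "table")
        kg.add_node(t["theme"], "theme")
        kg.add_edge(t["name"], "belongs_to", t["theme"])      # attribution
        for field in t["fields"]:
            field_id = t["name"] + "." + field
            kg.add_node(field_id, "field")
            kg.add_edge(field_id, "belongs_to", t["name"])    # attribution
        for tag in t.get("tags", []):
            kg.add_node(tag, "tag")
            kg.add_edge(t["name"], "belongs_to", tag)         # assumed linkage
        for target in t.get("foreign_keys", []):
            kg.add_edge(t["name"], "references", target)      # reference
    for job in lineage_jobs:
        # Every input of a processing task flows into every output.
        for src in job["inputs"]:
            for dst in job["outputs"]:
                kg.add_edge(src, "lineage", dst)
    names = sorted(text_vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(text_vectors[a], text_vectors[b]) > SIMILARITY_THRESHOLD:
                kg.add_edge(a, "similar_to", b)               # semantic similarity
    return kg
```

The pairwise similarity loop is quadratic in the number of assets; for a large catalog an approximate nearest-neighbor index would likely be needed, which is outside the scope of this sketch.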
[0103] Step S3: Extraction of multidimensional correlation features.
[0104] As shown in Figure 3, this step extracts multidimensional association feature vectors of data assets from the structural, semantic, and behavioral dimensions, based on the data asset knowledge graph.
[0105] Step S31, Structural Dimension Feature Extraction: Calculate the degree centrality, betweenness centrality, and clustering coefficient of each data asset node in the data asset knowledge graph, and combine the degree centrality, betweenness centrality, and clustering coefficient into a structural feature vector.
[0106] Degree centrality represents the number of connections a node has, reflecting the breadth of a data asset's associations. Betweenness centrality represents the extent to which a node lies on the shortest paths between other nodes, reflecting the bridging role of a data asset. The clustering coefficient represents how tightly a node's neighbors are connected to one another, reflecting the density of the local network in which the data asset resides.
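As a rough illustration of the three structural measures, the following sketch computes them on an undirected, unweighted adjacency map (node to set of neighbors). The function names are hypothetical, betweenness is computed with Brandes' algorithm and left unnormalized, and treating the knowledge graph as undirected is an assumption; the source does not state which variant or normalization is used.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def clustering_coefficient(adj):
    """Share of a node's neighbor pairs that are themselves connected."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        nbr_list = list(nbrs)
        links = sum(1 for i, a in enumerate(nbr_list)
                    for b in nbr_list[i + 1:] if b in adj[a])
        result[v] = 2.0 * links / (k * (k - 1))
    return result

def structural_features(adj):
    """3-dimensional structural feature vector per node."""
    deg = degree_centrality(adj)
    btw = betweenness_centrality(adj)
    clu = clustering_coefficient(adj)
    return {v: [deg[v], btw[v], clu[v]] for v in adj}
```

For a triangle A, B, C with an extra node D attached to C, node C has the highest degree and is the only node with non-zero betweenness, since every shortest path to D passes through it.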
[0107] Step S32, Semantic Dimension Feature Extraction: The name and description text of the data asset are encoded using a pre-trained language model to obtain a semantic feature vector. In this embodiment, the BERT model is used as the pre-trained language model. The name and description text of the data asset are concatenated and input into the BERT model, and the output CLS vector is taken as the semantic feature vector.
[0108] Step S33, Behavioral Dimension Feature Extraction: Count the access frequency, number of favorites, and number of downloads of each data asset, normalize these three indicators, and combine them into a behavioral feature vector. Normalization uses the min-max method, which maps each indicator value to the interval [0, 1].
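A minimal sketch of the min-max normalization in step S33 follows. The dictionary keys (`visits`, `favorites`, `downloads`) and function names are assumptions, and mapping a constant column to 0.0 is one possible convention for the degenerate case that the source does not address.

```python
def min_max_normalize(values):
    """Map values to [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def behavior_vectors(stats):
    """stats: asset id -> {'visits', 'favorites', 'downloads'} raw counts.
    Returns asset id -> 3-dimensional normalized behavioral feature vector."""
    ids = list(stats)
    columns = [
        min_max_normalize([stats[a][key] for a in ids])
        for key in ("visits", "favorites", "downloads")
    ]
    return {a: [col[i] for col in columns] for i, a in enumerate(ids)}
```

Each indicator is normalized across all assets independently, so a heavily visited asset does not dominate the favorites or downloads components.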
[0109] Step S34, Feature Fusion: Concatenate the structural feature vector (3 dimensions), semantic feature vector (768 dimensions), and behavioral feature vector (3 dimensions) to obtain a multidimensional associated feature vector with a dimension of 774.
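The concatenation in step S34 is simple, but a dimension check makes the 3 + 768 + 3 = 774 layout explicit. The function name and the strict length validation are assumptions added for illustration.

```python
STRUCTURAL_DIM, SEMANTIC_DIM, BEHAVIORAL_DIM = 3, 768, 3

def fuse_features(structural, semantic, behavioral):
    """Concatenate the three feature vectors into one 774-dimensional vector."""
    expected = (STRUCTURAL_DIM, SEMANTIC_DIM, BEHAVIORAL_DIM)
    actual = (len(structural), len(semantic), len(behavioral))
    if actual != expected:
        raise ValueError(f"expected dimensions {expected}, got {actual}")
    return list(structural) + list(semantic) + list(behavioral)
```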
[0110] Step S4: Graph neural network embedding computation.
[0111] As shown in Figure 4, this step inputs the data asset knowledge graph into the graph neural network model and calculates the embedding vector of each data asset node through neighborhood aggregation and feature propagation.
[0112] The graph neural network model uses a graph attention network (GAT), and its computation process includes:
[0113] Step S41, Initialize Embedding: Use the multidimensional association feature vector of each data asset node as the initial node embedding vector, denoted as $h_i^{(0)}$, where $i$ denotes the node index.
[0114] Step S42, Attention Weight Calculation: For each node $i$, calculate the attention weight between it and each neighboring node $j$. First, the embedding vectors of node $i$ and node $j$ are each linearly transformed and then concatenated; the attention score is then calculated using the attention vector and normalized with the softmax function to obtain the attention weight $\alpha_{ij}$.
[0115] Step S43, Neighborhood Aggregation: Based on the attention weights, the embedding vectors of neighboring nodes are weighted and aggregated to obtain the aggregated vector $m_i$.
[0116] Step S44, Feature Update: Combine the aggregated neighborhood features with the current node features, and update the node embedding vector using a non-linear activation function (the ReLU activation function is used in this embodiment) to obtain $h_i^{(l+1)}$, where $l$ denotes the current layer index.
[0117] Step S45, Multi-layer Propagation: Repeat steps S42 to S44 for a total of K layers, where K is the preset number of propagation layers (K=2 in this embodiment), to obtain the final node embedding vector $h_i^{(K)}$.
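Steps S41 to S45 can be sketched as a forward pass in plain Python. This is an illustrative, untrained single-head layer, not the patented model: the function names are hypothetical, the LeakyReLU applied to the raw attention score follows the commonly used graph attention formulation rather than anything stated in the source, and "combining with the current node features" is realized here by including the node itself among its neighbors (a self-loop), which is an assumption.

```python
import math

def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(h, adj, W, a):
    """One attention layer.
    h: node -> feature list; adj: node -> iterable of neighbor ids;
    W: out_dim x in_dim weight matrix; a: attention vector of length 2 * out_dim."""
    z = {v: matvec(W, x) for v, x in h.items()}      # linear transformation
    new_h = {}
    for i in h:
        nbrs = list(adj[i]) + [i]                    # self-loop (assumption)
        # S42: score = LeakyReLU(a . [z_i || z_j]), then softmax over neighbors
        scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, z[i] + z[j])))
                  for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]
        # S43: attention-weighted aggregation of transformed neighbor vectors
        agg = [sum(al * z[j][k] for al, j in zip(alpha, nbrs))
               for k in range(len(z[i]))]
        # S44: non-linear update with ReLU
        new_h[i] = [max(0.0, x) for x in agg]
    return new_h

def gat_embed(h0, adj, layers):
    """S45: apply K layers in sequence; layers is a list of (W, a) pairs."""
    h = h0
    for W, a in layers:
        h = gat_layer(h, adj, W, a)
    return h
```

With K=2 as in this embodiment, `layers` would hold two `(W, a)` pairs. In practice the weights would be learned by training, and an established graph learning library would normally replace this hand-written loop.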
[0118] Step S5: User interest modeling.
[0119] As shown in Figure 5, this step obtains the target user's historical access records and operation behavior data, and constructs a user interest feature vector based on them.
[0120] Step S51, Historical Behavior Extraction: Obtain the records of data assets that the target user accessed, favorited, or downloaded within a preset time window (30 days in this embodiment). Each record includes the user ID, data asset ID, behavior type, and behavior time.
[0121] Step S52, Interest Weight Calculation: Calculate the interest weight for each historical behavior record based on the behavior type and time decay factor. The formula for calculating the interest weight is as follows:
[0122] $w_i = b_i \cdot e^{-\lambda \cdot \Delta t_i}$;
[0123] where $w_i$ denotes the interest weight of the $i$-th historical behavior record;
[0124] $b_i$ denotes the behavior type weight of the $i$-th record: 0.3 for access behavior, 0.5 for favorite behavior, and 0.8 for download behavior;
[0125] $\lambda$ denotes the time decay coefficient, with a value of 0.1;
[0126] $\Delta t_i$ denotes the number of days between the $i$-th historical behavior record and the current moment;
[0127] $e$ denotes the natural constant, with a value of approximately 2.71828.
[0128] The formula is designed on the following principle: download behavior indicates that the user has a clear need to use the data asset, so it carries the highest weight;
[0129] favorite behavior indicates that the user intends to keep following the data asset, so its weight is second;
[0130] access behavior may be mere browsing, so it carries the lowest weight. The time decay factor gives recent behavior a higher weight than earlier behavior, reflecting the timeliness of user interests.
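The interest weight of step S52 can be sketched as below, assuming the weight is the product of the behavior type weight and an exponential time decay term, which is how the definitions above read. The constant and function names are hypothetical.

```python
import math

BEHAVIOR_WEIGHTS = {"access": 0.3, "favorite": 0.5, "download": 0.8}
DECAY = 0.1  # time decay coefficient per day

def interest_weight(behavior_type, days_ago):
    """Behavior type weight multiplied by exponential decay over elapsed days."""
    return BEHAVIOR_WEIGHTS[behavior_type] * math.exp(-DECAY * days_ago)
```

Under these values a download made 10 days ago weighs about 0.294 (0.8 multiplied by e to the power -1), slightly less than an access made today (0.3), which shows how quickly recency offsets behavior type with a decay coefficient of 0.1.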
[0131] Step S53, Interest Vector Aggregation: The embedding vectors of the data assets the target user has historically interacted with are weighted and summed according to the interest weights to obtain the user interest feature vector. Let the set of data assets that user $u$ has historically interacted with be $A_u = \{a_1, a_2, \ldots, a_n\}$, with corresponding embedding vectors $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n$. Then the user interest feature vector $\mathbf{p}_u$ is calculated as:
[0132] $\mathbf{p}_u = \sum_{i=1}^{n} w_i \cdot \mathbf{h}_i$;
[0133] where $\mathbf{p}_u$ denotes the interest feature vector of user $u$;
[0134] $n$ denotes the number of data assets in the user's historical interactions;
[0135] $w_i$ denotes the interest weight of the $i$-th historical behavior record;
[0136] $\mathbf{h}_i$ denotes the embedding vector of the $i$-th historically interacted data asset.
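The weighted sum in step S53 can be sketched as follows. The function name is hypothetical, and the sum is left unnormalized as an assumption; because the next step compares vectors by cosine similarity, which ignores vector length, dividing by the total weight would not change the resulting interest matching scores.

```python
def user_interest_vector(weights, embeddings):
    """Weighted sum of the embeddings of historically interacted data assets.
    weights[i] is the interest weight paired with embeddings[i]."""
    if not embeddings or len(weights) != len(embeddings):
        raise ValueError("weights and embeddings must be non-empty and equal length")
    dim = len(embeddings[0])
    return [sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)]
```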
[0137] Step S6: Intelligent recommendation calculation.
[0138] As shown in Figure 6, this step fuses the multidimensional association feature vector, embedding vector, and user interest feature vector to obtain the recommendation score of each candidate data asset. The candidates are then sorted from high to low by recommendation score, and the data asset catalog recommendation list is output.
[0139] Step S61, Similarity Calculation: Calculate the cosine similarity between the user's interest feature vector and the embedding vector of each candidate data asset to obtain the interest matching score. The formula for calculating the cosine similarity is:
[0140] $Sim_j = \frac{\mathbf{p}_u \cdot \mathbf{h}_j}{\|\mathbf{p}_u\| \, \|\mathbf{h}_j\|}$;
[0141] where $Sim_j$ denotes the interest matching score of the $j$-th candidate data asset;
[0142] $\mathbf{p}_u$ denotes the user interest feature vector;
[0143] $\mathbf{h}_j$ denotes the embedding vector of the $j$-th candidate data asset;
[0144] $\cdot$ denotes the dot product of two vectors;
[0145] $\|\mathbf{p}_u\|$ denotes the norm (length) of the vector $\mathbf{p}_u$;
[0146] $\|\mathbf{h}_j\|$ denotes the norm (length) of the vector $\mathbf{h}_j$.
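A small sketch of the cosine similarity used for the interest matching score follows. Returning 0.0 when either vector has zero length is an assumed convention for a case the source does not discuss.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed fallback for zero vectors
    return dot / (norm_u * norm_v)
```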
[0147] Step S62, Relevance Calculation: Based on the data asset knowledge graph, calculate the graph distance between each candidate data asset and the user's historically interacted data assets, and convert the graph distance into a relevance score. The formula for calculating the relevance score is as follows:
[0148] $Rel_j = \frac{1}{1 + d_j}$;
[0149] where $Rel_j$ denotes the relevance score of the $j$-th candidate data asset;
[0150] $d_j$ denotes the shortest of the graph distances between the $j$-th candidate data asset and all data assets in the user's historical interactions;
[0151] The shortest graph distance is calculated in the data asset knowledge graph using a breadth-first search algorithm.
[0152] The design principle of this formula is: the closer the candidate data asset is to the user's historically interacted data assets in the graph, the higher the relevance score. When $d_j$ is 0 (i.e., the candidate data asset is itself one that the user has interacted with), the relevance score is 1;
[0153] when $d_j$ is 1, the relevance score is 0.5;
[0154] as the distance increases, the relevance score gradually decreases.
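The breadth-first search and the distance-to-score conversion in step S62 can be sketched as follows. The relevance function assumes the score is 1 / (1 + d), which matches the stated values of 1 at distance 0 and 0.5 at distance 1. The function names, the undirected adjacency map, and the score of 0.0 for candidates with no path to any history asset are assumptions.

```python
from collections import deque

def shortest_distance_to_history(adj, candidate, history):
    """Hops from the candidate to the nearest historically interacted asset,
    found by breadth-first search; None if no history asset is reachable."""
    targets = set(history)
    if candidate in targets:
        return 0
    seen = {candidate}
    queue = deque([(candidate, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr in seen:
                continue
            if nbr in targets:
                return dist + 1  # BFS reaches the nearest target first
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return None

def relevance_score(distance):
    """Convert a graph distance into a score in (0, 1]; unreachable -> 0.0."""
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + distance)
```

Searching outward from the candidate and stopping at the first history asset avoids running one search per history item.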
[0155] Step S63, Comprehensive Score Calculation: The interest matching score and relevance score are weighted and summed to obtain the recommendation score. The formula for calculating the recommendation score is:
[0156] $Score_j = \omega_1 \cdot Sim_j + \omega_2 \cdot Rel_j$;
[0157] where $Score_j$ denotes the recommendation score of the $j$-th candidate data asset;
[0158] $Sim_j$ denotes the interest matching score of the $j$-th candidate data asset;
[0159] $Rel_j$ denotes the relevance score of the $j$-th candidate data asset;
[0160] $\omega_1$ denotes the interest matching weight coefficient, with a value of 0.6;
[0161] $\omega_2$ denotes the relevance weight coefficient, with a value of 0.4.
[0162] Step S64, Sorting and Output: Sort all candidate data assets from high to low according to their recommendation scores, take the top N (N=10 in this embodiment) data assets as the recommendation results, and output the data asset catalog recommendation list.
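Steps S63 and S64 reduce to a weighted sum followed by a sort. The sketch below assumes the interest matching and relevance scores have already been computed for each candidate; the function name and the input layout are hypothetical.

```python
def recommend(candidates, top_n=10, w_interest=0.6, w_relevance=0.4):
    """candidates: asset id -> (interest matching score, relevance score).
    Returns the top_n (asset id, recommendation score) pairs, highest first."""
    scored = [
        (asset, w_interest * sim + w_relevance * rel)
        for asset, (sim, rel) in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

For example, a candidate with an interest matching score of 0.9 and a relevance score of 0.5 receives 0.6 × 0.9 + 0.4 × 0.5 = 0.74.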
[0163] Example 2
[0164] This embodiment uses a company's data asset catalog system as an example to illustrate the specific application of the method of the present invention.
[0165] The company has 1,000 data assets, including 600 data tables, 200 datasets, 100 data interfaces, and 100 data reports. Using the method of this invention, the metadata information of these 1,000 data assets is first collected and then cleaned, deduplicated, and standardized.
[0166] Then, a knowledge graph of data assets was constructed, identifying 1,000 data table / dataset / data interface / data report entities, 5,000 field entities, 50 business theme entities, and 100 data tag entities. A total of 10,000 attribution relationships, 500 reference relationships, 300 lineage relationships, and 2,000 semantic similarity relationships were extracted.
[0167] Next, the multidimensional association feature vector of each data asset is extracted, and the node embedding vector is calculated through a two-layer graph attention network.
[0168] When a user accesses the data asset catalog, the system retrieves the user's historical behavior records for the past 30 days, constructs a user interest feature vector, calculates the recommendation score for all candidate data assets, and returns the top 10 data assets with the highest scores as the recommendation results.
[0169] Actual testing showed that the click-through rate of the recommended results increased by 35% after adopting the method of this invention, and the average time for users to find the target data asset was shortened by 50%, effectively improving the discovery efficiency of data assets.
Claims
1. A data asset catalog intelligent recommendation method based on graph computing and multidimensional association mining, characterized in that it includes the following steps:
S1. Data Asset Acquisition and Preprocessing: Collect metadata information of data assets, clean, deduplicate and standardize the metadata information to obtain a standardized data asset set;
S2. Data Asset Knowledge Graph Construction: Based on the standardized data asset set, extract data asset entities and the relationships between entities to construct a data asset knowledge graph; the data asset entities include data table entities, field entities, business theme entities, and data tag entities; the relationships include attribution, reference, lineage, and semantic similarity relationships; the semantic similarity relationship is established by using a pre-trained language model to calculate the semantic similarity between the data asset name and the description text, when the semantic similarity is greater than a preset threshold;
S3. Multidimensional association feature extraction: Based on the data asset knowledge graph, extract multidimensional association feature vectors of data assets from structural, semantic and behavioral dimensions; the structural feature vector, semantic feature vector, and behavioral feature vector are concatenated to obtain the multidimensional association feature vector;
S4. Graph Neural Network Embedding Calculation: Input the data asset knowledge graph into the graph attention network model, use the multidimensional association feature vector as the initial node embedding vector, and calculate the embedding vector of each data asset node through attention weight calculation, neighborhood aggregation and feature propagation;
S5. User interest modeling: Obtain the target user's historical access records and operation behavior data, and construct a user interest feature vector based on the historical access records and operation behavior data;
S6. Intelligent Recommendation Calculation: The multidimensional association feature vector, the embedding vector, and the user interest feature vector are fused and calculated to obtain the recommendation score of the candidate data assets. The data assets are sorted from high to low according to the recommendation score, and a data asset catalog recommendation list is output.
2. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S1, the metadata information includes data asset name, data asset description, data asset type, data asset creation time, data asset update time, data asset department, and data asset access permissions; the standardization process includes unifying data formats, standardizing naming rules, and filling in missing fields.
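The cleaning, deduplication, and standardization described in claim 2 can be illustrated with a minimal sketch. The field keys, the naming rule (lower case with underscores), the accepted date formats, and the `"unknown"` fill value are all assumptions made for illustration; the claim does not specify them.

```python
from datetime import datetime

# Hypothetical keys standing in for the seven metadata items listed in claim 2.
REQUIRED_FIELDS = ("name", "description", "type", "created",
                   "updated", "department", "access")


def normalize_date(value):
    """Unify date formats; assumed inputs are YYYY-MM-DD or YYYY/MM/DD."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except (ValueError, TypeError):
            continue
    return "unknown"


def standardize(records):
    """Clean, deduplicate, and standardize a list of raw metadata dicts."""
    seen, out = set(), []
    for rec in records:
        # Assumed naming rule: trimmed, lower case, underscores for spaces.
        name = "_".join(str(rec.get("name", "")).strip().lower().split())
        if not name or name in seen:  # drop nameless and duplicate entries
            continue
        seen.add(name)
        # Fill in missing fields with a placeholder value.
        clean = {f: rec.get(f) or "unknown" for f in REQUIRED_FIELDS}
        clean["name"] = name
        for f in ("created", "updated"):
            clean[f] = normalize_date(clean[f])
        out.append(clean)
    return out
```

Deduplication here keys on the normalized name only; a production catalog would more plausibly key on a stable asset identifier.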
3. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S2, the method for constructing the data asset knowledge graph includes:
S21. Entity recognition: using a named entity recognition algorithm, identify data table entities, field entities, business theme entities, and data tag entities from the standardized data asset set;
S22. Relation extraction: based on preset relation templates and semantic analysis methods, extract the attribution, reference, lineage, and semantic similarity relationships between entities;
S23. Graph storage: the identified entities are used as nodes and the extracted relationships are used as edges to construct and store the data asset knowledge graph.
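The graph storage of step S23 can be sketched as a small in-memory structure with typed nodes and typed edges. The class name `AssetGraph`, its methods, and the string labels for entity and relation types are hypothetical; the claim does not name a storage engine, and entity recognition and relation extraction (S21, S22) are assumed to have already produced the inputs.

```python
from collections import defaultdict


class AssetGraph:
    """Tiny in-memory store: typed nodes and typed, directed edges."""

    ENTITY_TYPES = {"table", "field", "theme", "tag"}
    RELATION_TYPES = {"attribution", "reference", "lineage",
                      "semantic_similarity"}

    def __init__(self):
        self.nodes = {}                # node id -> entity type
        self.edges = defaultdict(set)  # node id -> {(relation, target id)}

    def add_entity(self, node_id, entity_type):
        if entity_type not in self.ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.nodes[node_id] = entity_type

    def add_relation(self, source, relation, target):
        if relation not in self.RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered entities")
        self.edges[source].add((relation, target))

    def neighbors(self, node_id, relation=None):
        """Targets reachable from node_id, optionally filtered by relation."""
        return sorted(t for r, t in self.edges.get(node_id, ())
                      if relation in (None, r))
```

Rejecting edges whose endpoints are not registered keeps the stored graph consistent with the entity set produced in S21.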
4. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S3, the method for extracting the multidimensional association feature vector includes:
S31. Structural dimension feature extraction: calculate the degree centrality, betweenness centrality, and clustering coefficient of each data asset node in the data asset knowledge graph, and combine these three values into a structural feature vector;
S32. Semantic dimension feature extraction: a pre-trained language model is used to encode the names and descriptions of the data assets to obtain semantic feature vectors;
S33. Behavioral dimension feature extraction: count the access frequency, number of favorites, and number of downloads of each data asset, normalize these three counts, and combine them into a behavioral feature vector;
S34. Feature fusion: the structural feature vector, semantic feature vector, and behavioral feature vector are concatenated to obtain the multidimensional association feature vector.
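Parts of steps S31, S33, and S34 can be sketched as follows, assuming the knowledge graph is reduced to an undirected adjacency mapping (node to set of neighbors). Betweenness centrality is omitted for brevity, min-max scaling is an assumed choice of normalization, and all function names are illustrative.

```python
def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0
            for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0
            continue
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff


def min_max(values):
    """Assumed normalization for the behavior counts in S33."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in values]


def fuse(structural, semantic, behavioral):
    """S34: concatenate the three per-dimension vectors."""
    return list(structural) + list(semantic) + list(behavioral)
```

Because the three parts differ greatly in length (a few structural values against a long language-model encoding), a real system may need to rescale them before concatenation; the claim does not address this.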
5. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S4, the graph neural network model employs a graph attention network, and the computation process of the graph attention network includes:
S41. Embedding initialization: use the multidimensional association feature vector of each data asset node as the initial node embedding vector;
S42. Attention weight calculation: for each node, calculate the attention weight between it and each of its neighboring nodes;
S43. Neighborhood aggregation: perform a weighted aggregation of the embedding vectors of neighboring nodes based on the attention weights;
S44. Feature update: combine the aggregated neighborhood features with the current node features, and update the node embedding vector using a non-linear activation function;
S45. Multi-layer propagation: repeat steps S42 to S44 for a total of K layers, where K is the preset number of propagation layers, to obtain the final node embedding vector.
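Steps S42 to S45 can be sketched in plain Python. The claim does not give the attention scoring function, so this sketch assumes the commonly used graph attention form (a LeakyReLU over a learned vector `a` applied to the concatenated projections, followed by a softmax), a self-loop to combine a node with its neighborhood, and `tanh` as the non-linear activation. The weights `W` and `a` would be learned in practice; here they are plain arguments.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, adj, W, a):
    """One attention layer.

    h: node -> embedding vector; adj: node -> list of neighbors;
    W: projection matrix (list of rows); a: vector of length 2 * len(W).
    """
    z = {v: matvec(W, x) for v, x in h.items()}
    out = {}
    for v in h:
        nbrs = list(adj[v]) + [v]  # assumed self-loop: keep own features (S44)
        # S42: raw scores, then a numerically stable softmax over the neighborhood.
        scores = [leaky_relu(sum(ai * zi for ai, zi in zip(a, z[v] + z[u])))
                  for u in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # S43: attention-weighted sum of the projected neighbor embeddings.
        agg = [sum(w * z[u][d] for w, u in zip(alpha, nbrs))
               for d in range(len(W))]
        # S44: non-linear update (tanh is an assumed choice).
        out[v] = [math.tanh(x) for x in agg]
    return out


def propagate(h, adj, layers):
    """S45: apply K layers in sequence; `layers` is a list of (W, a) pairs."""
    for W, a in layers:
        h = gat_layer(h, adj, W, a)
    return h
```

With `a` set to all zeros every neighbor receives the same attention weight, so the layer reduces to mean aggregation, which is a convenient sanity check.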
6. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S5, the method for constructing the user interest feature vector includes:
S51. Historical behavior extraction: obtain the records of data assets that the target user has accessed, favorited, or downloaded within a preset time window;
S52. Interest weight calculation: calculate the interest weight of each historical behavior record based on the behavior type and a time decay factor;
S53. Interest vector aggregation: the embedding vectors of the data assets that the target user has interacted with are weighted by the interest weights and summed to obtain the user interest feature vector.
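Steps S51 and S53 can be sketched as a window filter followed by a weighted sum of embeddings. The record layout (a dict with `asset`, `behavior`, and `days_ago` keys) is a hypothetical format, and the interest weights are taken as already computed in S52.

```python
def recent_records(records, window_days):
    """S51: keep only records inside the preset time window."""
    return [r for r in records if r["days_ago"] <= window_days]


def interest_vector(weighted_assets, embeddings):
    """S53: weighted sum of the embeddings of interacted assets.

    weighted_assets: list of (asset_id, interest_weight) pairs;
    embeddings: asset_id -> embedding vector (all of equal length).
    """
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for asset_id, weight in weighted_assets:
        for d, x in enumerate(embeddings[asset_id]):
            vec[d] += weight * x
    return vec
```

The sum is left unnormalized, following the wording of S53; if the vector is later compared by cosine similarity, its overall scale does not affect the result.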
7. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 6, characterized in that, in step S52, the formula for calculating the interest weight is: $w_i = b_i \cdot e^{-\lambda \Delta t_i}$; where $w_i$ denotes the interest weight of the $i$-th historical behavior record; $b_i$ denotes the behavior type weight, with the access behavior weighted 0.3, the favorite behavior 0.5, and the download behavior 0.8; $\lambda$ denotes the time decay coefficient, with a value of 0.1; $\Delta t_i$ denotes the number of days between the $i$-th historical behavior record and the current moment; and $e$ denotes the natural constant.
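As a worked illustration of claim 7, the sketch below assumes the interest weight is the behavior type weight multiplied by an exponential decay in the number of elapsed days, which is the form suggested by the listed terms (behavior type weight, time decay coefficient, day count, natural constant). The constant and function names are illustrative.

```python
import math

# Values stated in claim 7.
BEHAVIOR_WEIGHT = {"access": 0.3, "favorite": 0.5, "download": 0.8}
DECAY = 0.1  # time decay coefficient


def interest_weight(behavior, days_ago):
    """Assumed form: behavior type weight times exp(-DECAY * days_ago)."""
    return BEHAVIOR_WEIGHT[behavior] * math.exp(-DECAY * days_ago)
```

Under this assumption a download made today has weight 0.8, while an access made 10 days ago has weight 0.3 · e^(-1), roughly 0.110, so recent, high-intent actions dominate the user interest vector.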
8. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 1, characterized in that, in step S6, the method for calculating the recommendation score includes:
S61. Similarity calculation: calculate the cosine similarity between the user interest feature vector and the embedding vector of each candidate data asset to obtain the interest matching score;
S62. Relevance calculation: based on the data asset knowledge graph, calculate the graph distance between each candidate data asset and the data assets in the user's historical interactions, and convert the graph distance into a relevance score;
S63. Comprehensive score calculation: the interest matching score and the relevance score are weighted and summed to obtain the recommendation score.
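The interest matching score of step S61 is a cosine similarity, which can be sketched directly. Returning 0.0 for a zero-length vector is an assumed convention for users or assets without a usable embedding.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```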
9. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 8, characterized in that, in step S63, the formula for calculating the recommendation score is: $S_i = \alpha \cdot M_i + \beta \cdot R_i$; where $S_i$ denotes the recommendation score of the $i$-th candidate data asset; $M_i$ denotes the interest matching score of the $i$-th candidate data asset; $R_i$ denotes the relevance score of the $i$-th candidate data asset; $\alpha$ denotes the interest matching weight coefficient, with a value of 0.6; and $\beta$ denotes the relevance weight coefficient, with a value of 0.4.
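The weighted sum of claim 9, together with the descending sort of step S6, can be sketched as follows. The coefficients 0.6 and 0.4 come from the claim; the function name, the input layout, and the `top_n` parameter are illustrative assumptions.

```python
ALPHA, BETA = 0.6, 0.4  # interest matching and relevance weight coefficients


def recommend(candidates, top_n=None):
    """Rank candidates by ALPHA * interest_match + BETA * relevance.

    candidates: asset_id -> (interest_match, relevance);
    returns a list of (asset_id, score), highest score first.
    """
    scored = {a: ALPHA * m + BETA * r for a, (m, r) in candidates.items()}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```

For example, a candidate with interest matching score 0.5 and relevance score 1.0 scores 0.70 and outranks one with 0.9 and 0.2, which scores 0.62: graph proximity can lift an asset whose embedding match is only moderate.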
10. The intelligent recommendation method for data asset catalogs based on graph computing and multidimensional association mining according to claim 8, characterized in that, in step S62, the relevance score is calculated by a formula in which $R_i$ denotes the relevance score of the $i$-th candidate data asset and $d_i$ denotes the shortest graph distance between the $i$-th candidate data asset and all data assets in the user's historical interactions; the shortest graph distance is calculated in the data asset knowledge graph using a breadth-first search algorithm.
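The breadth-first search of claim 10 can be sketched as a multi-source search that starts from all of the user's historical assets at once, which gives each node its hop count to the nearest of them. The conversion from distance to relevance score shown here, 1 / (1 + d), is purely an assumption for illustration and is not taken from the claim; the graph is assumed to be an undirected adjacency mapping.

```python
from collections import deque


def shortest_distance_to_history(adj, history):
    """Multi-source BFS: hop count from each reachable node to its nearest
    history asset. Unreachable nodes are absent from the result."""
    dist = {h: 0 for h in history if h in adj}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def relevance(distance):
    """Assumed conversion (not from the claim): 1 / (1 + d); unreachable -> 0."""
    return 0.0 if distance is None else 1.0 / (1.0 + distance)
```

A candidate's score is then `relevance(dist.get(candidate))`, so assets with no path to anything in the user's history receive a relevance of zero.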