Trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association

Hierarchical permission constraints and multi-dimensional feature association address insufficient data security and low retrieval efficiency in industrial data traceability. The method constructs a complete traceability chain and provides fine-grained permission management and multi-dimensional information integration, improving the accuracy and security of traceability retrieval.

CN122241736A · Pending · Publication Date: 2026-06-19 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING UNIV OF POSTS & TELECOMM
Filing Date: 2026-03-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing industrial data traceability methods suffer from insufficient data security protection, low retrieval efficiency, and incomplete traceability chain construction, making it difficult to achieve fine-grained access control, multi-dimensional information integration, and anomaly detection.

Method used

The method adopts hierarchical permission constraints and multi-dimensional feature association: it collects and preprocesses multi-source heterogeneous data, quantitatively grades data fields and maps user identity levels to them, and combines semantic retrieval with multi-dimensional feature fusion to construct a traceability chain that supports anomaly detection and auditing.

Benefits of technology

It achieves fine-grained access control, improves the accuracy of source tracing and the integrity of the source chain, and ensures data security and retrieval efficiency.



Abstract

This invention provides a trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association. The method includes: preprocessing multi-source heterogeneous data to generate a standardized dataset; classifying data fields by security level; establishing mapping rules between user identity levels and data field levels and generating permission constraints; retrieving semantically relevant candidate information through vectorized retrieval; extracting temporal, spatial, and business attribute features of nodes and splicing the nodes into a complete traceability chain through comprehensive similarity calculation; detecting abnormal nodes in the traceability chain, marking a node as abnormal when the deviation of its business attribute values from the standard reference values exceeds a threshold; sorting and filtering the link inference results and optimizing the retrieval strategy; and verifying the traceability chain and auditing access behavior. The invention ensures data security through hierarchical permission constraints and constructs a complete traceability chain through multi-dimensional feature association, offering high security, accurate retrieval, and complete traceability.

Description

Technical Field

[0001] This invention relates to the field of industrial data traceability technology, specifically to a trusted industrial data traceability method based on hierarchical permission constraints and multi-dimensional feature association.

Background Technology

[0002] With the deepening of the Industrial Internet and digital transformation, industrial data traceability has become an important means to ensure product quality, optimize production processes, and meet compliance requirements. However, existing industrial data traceability methods face many challenges in practical applications: traditional methods use coarse-grained access control strategies, making it difficult to achieve fine-grained permission management and posing a risk of data leakage; keyword-based retrieval methods struggle to accurately understand user query intent, resulting in insufficient relevance of search results; existing methods fail to effectively integrate multi-dimensional information such as time, space, and business logic, leading to broken or logically inconsistent traceability chains; and anomaly detection relies solely on single-dimensional threshold judgments, lacking comprehensive analysis of multi-dimensional business attributes, so that anomaly detection accuracy fails to meet practical needs. This invention, prioritizing industrial data security, proposes a trusted industrial data traceability method based on hierarchical permission management and multi-dimensional feature association. The method improves the accuracy of traceability retrieval and the integrity of the traceability chain while ensuring data security.

Summary of the Invention

[0003] This invention provides a reliable traceability method for industrial data based on hierarchical permission constraints and multidimensional feature association, characterized in that the method includes:

[0004] Step 1: Collect multi-source heterogeneous traceability data and preprocess it by missing-value imputation, outlier removal, and weighted normalization to generate a standardized traceability dataset $D$;

[0005] Step 2: Based on indicators of multiple dimensions, quantitatively grade the data fields into several security levels, each security level corresponding to a level value;

[0006] Step 3: Divide user identities into several levels according to user roles, and establish mapping rules between identity levels and data field levels;

[0007] Step 4: Upon receiving a traceability query request, verify the user's identity level $L_u$, generate the permission constraint $C = \{ f \in F \mid l(f) \le L_u \}$ according to the mapping rules, where $F$ is the set of all data fields and $l(f)$ is the security level value of field $f$, and embed the constraint into the retrieval process;

[0008] Step 5: Under the permission constraint $C$, retrieve from the standardized traceability dataset $D$ the candidate information relevant to the user query $q$: vectorize $q$ into a query vector $v_q$; divide the data content within the permission scope into multiple text blocks and vectorize each block to obtain a set of text vectors $\{v_i\}$; compute the cosine similarity $\mathrm{sim}(v_q, v_i) = \frac{v_q \cdot v_i}{\lVert v_q \rVert \, \lVert v_i \rVert}$ between the query vector and each text vector; and recall the $K$ text blocks with the highest similarity as the candidate information set;

[0009] Step 6: Construct a directed graph with traceability nodes as vertices and logical relationships as edges; extract node features from the standardized dataset $D$ and compute the comprehensive similarity between nodes $i$ and $j$ as $S(i,j) = \alpha S_{\text{time}}(i,j) + \beta S_{\text{space}}(i,j) + \gamma S_{\text{logic}}(i,j)$, where $\alpha$, $\beta$, $\gamma$ are balance coefficients, $S_{\text{time}}$ is the temporal similarity based on the difference between node timestamps, $S_{\text{space}}$ is the spatial similarity based on the distance between the nodes' geographical locations, and $S_{\text{logic}}$ is the logical similarity based on the similarity of the nodes' business attribute vectors; and splice the related nodes into a complete traceability chain;

[0010] Step 7: Perform abnormal-node detection on the complete traceability chain generated in Step 6 by computing, as the anomaly degree, the relative deviation of each node's business attribute values from the standard reference values, $A_i = \max_k \frac{\lvert x_{i,k} - r_k \rvert}{\lvert r_k \rvert + \epsilon}$, where $x_{i,k}$ is the $k$-th business attribute value of node $i$, $r_k$ is the standard reference value of that attribute, and $\epsilon$ is a positive number approaching 0; when the anomaly degree exceeds a preset threshold, mark the node as abnormal and output an alarm;

[0011] Step 8: Sort and filter the link inference results, and optimize the retrieval strategy by combining user historical queries and feedback;

[0012] Step 9: Verify the completeness and logical rationality of the generated traceability chain, and audit user access behavior, record legitimate operations and block abnormal access, and store all logs in the trusted audit system.

[0013] Specifically, in Step 1, collecting multi-source heterogeneous traceability data and preprocessing it by missing-value imputation, outlier removal, and weighted normalization to generate a standardized traceability dataset includes: cleaning the collected multi-source heterogeneous data and performing weighted normalization with the formula $x'_k = w_k \cdot \frac{x_k - x_k^{\min}}{x_k^{\max} - x_k^{\min} + \epsilon}$, where $x_k$ is the raw data of the $k$-th dimension, $x_k^{\max}$ and $x_k^{\min}$ are the global maximum and minimum values of that dimension, respectively, $w_k$ is the traceability relevance weight, and $\epsilon$ is a positive number approaching 0; the standardized dataset is denoted $D$.

[0014] Specifically, in Step 2, quantitatively grading the data fields into several security levels based on indicators of multiple dimensions, each security level corresponding to a level value, includes: the indicators comprise data sensitivity, traceability necessity, and privacy leakage risk, each taking values within a fixed range and determined either by expert rating or automatically from data attributes; the weight of each indicator is determined using the analytic hierarchy process (AHP); and, based on the resulting overall score, data fields are divided into four security levels: public, internal, sensitive, and highly sensitive.

[0015] Specifically, in Step 3, dividing user identities into several levels according to user roles and establishing mapping rules between identity levels and data field levels includes: dividing user identities into four levels (ordinary users, general participants, core participants, and administrators), each with a corresponding identity level value; the mapping rule is that a user of a given identity level may access the data fields whose security level value does not exceed that level; and a dynamic identity adjustment mechanism based on user behavior and credibility is incorporated.

[0016] Specifically, in Step 4, the permission constraint is generated from the mapping between the user identity level $L_u$ and the security levels of the data fields, including: the permission constraint $C = \{ f \in F \mid l(f) \le L_u \}$, where $F$ is the set of all data fields and $l(f)$ is the security level value of field $f$; the constraint is injected as a filter condition into the retrieval engine's query interface so that results contain only fields satisfying $l(f) \le L_u$, thereby achieving fine-grained, field-level access control.

[0017] Specifically, in Step 5, recalling candidate information semantically related to the user query from the standardized dataset by vectorized retrieval under the permission constraint includes: using a pre-trained language model as the encoder $E$ to map the user query $q$ to a query vector $v_q = E(q) \in \mathbb{R}^d$, where $d$ is the vector dimension; dividing the data content that the permission constraint $C$ allows to be accessed into a set of text blocks $\{c_1, \dots, c_n\}$; mapping each text block to a text vector $v_i = E(c_i)$ with the same encoder; computing the cosine similarity $\mathrm{sim}(v_q, v_i) = \frac{v_q \cdot v_i}{\lVert v_q \rVert \, \lVert v_i \rVert}$ between the query vector and each text vector; and sorting all text blocks in descending order of similarity score and selecting the $K$ highest-scoring text blocks as the candidate information set.

[0018] Specifically, in Step 6, constructing a graph structure from the temporal, spatial, and business attribute features of the nodes and splicing related nodes into a complete traceability chain through comprehensive similarity calculation includes: extracting from the standardized dataset $D$ the multi-dimensional features of each node $i$, namely the timestamp $t_i$, the geographical coordinates $p_i$, and the business attribute vector $b_i \in \mathbb{R}^m$, where $m$ is the number of business attribute dimensions; computing, for any two nodes $i$ and $j$, the temporal similarity $S_{\text{time}}(i,j)$, which is governed by a time decay factor, the spatial similarity $S_{\text{space}}(i,j)$, and the logical similarity $S_{\text{logic}}(i,j)$; computing the comprehensive similarity between nodes by weighted fusion, $S(i,j) = \alpha S_{\text{time}}(i,j) + \beta S_{\text{space}}(i,j) + \gamma S_{\text{logic}}(i,j)$, where $\alpha$, $\beta$, $\gamma$ are balance coefficients; constructing a directed graph with the traceability nodes as vertices and the comprehensive similarity as edge weights; and performing a path search on the directed graph from the starting node, selecting and splicing the node sequence with the highest similarity into a complete traceability chain.

[0019] Specifically, in Step 7, detecting abnormal nodes on the constructed complete traceability chain by computing the deviation between each node's business attribute values and the standard reference values includes: presetting a standard reference value $r_k$ for each business attribute dimension $k$, determined from historical data statistics or domain knowledge; for each traceability node $i$, computing the relative deviation $\delta_{i,k} = \frac{\lvert x_{i,k} - r_k \rvert}{\lvert r_k \rvert + \epsilon}$ of each business attribute value $x_{i,k}$ from the corresponding standard reference value; taking the maximum deviation over all dimensions as the anomaly degree of the node, $A_i = \max_k \delta_{i,k}$; and comparing the anomaly degree with a preset threshold, marking the node as abnormal and recording the abnormal dimension information when the threshold is exceeded.

[0020] Specifically, in Step 8, sorting and filtering the multiple candidate results obtained from link inference and dynamically optimizing the retrieval strategy using the user's historical query records and feedback behavior includes: constructing a comprehensive evaluation function that combines an answer relevance score, a factual correctness score, and a contextual accuracy score through weighting coefficients; sorting and filtering the link inference results by this function; and dynamically adjusting the weighting coefficients, which are subject to a preset constraint, based on the user's historical queries and feedback.

[0021] Specifically, in Step 9, verifying the integrity and logical rationality of the generated traceability chain while auditing each user's data access behavior in real time, recording legitimate operations and intercepting abnormal access, includes: verifying the integrity of the generated traceability chain; calculating the arithmetic mean of the comprehensive similarity between all adjacent nodes on the traceability chain and, when the mean is lower than a preset threshold, determining the chain to be logically unreasonable and triggering a re-retrieval; and auditing user access behavior such that, when the security level of a field requested by a user is higher than the user's identity level, the access is deemed unauthorized, immediately blocked, and recorded in the audit log.

[0022] This invention addresses the problems of insufficient data security protection, low retrieval efficiency, and incomplete traceability chain construction in existing industrial data traceability methods. It proposes a trusted industrial data traceability method based on hierarchical permission constraints and multi-dimensional feature association. The method ensures data security through fine-grained permission control, improves query accuracy through semantic retrieval, and constructs a complete traceability chain through multi-dimensional feature fusion. It has the advantages of high security, accurate retrieval, and complete traceability.

Description of the Drawings

[0023] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0024] Figure 1 is a flowchart of the trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association;

[0025] Figure 2 is a flowchart of traceability chain construction and anomaly detection in the trusted traceability method for industrial data.

Detailed Implementation

[0026] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0027] Existing industrial data traceability methods suffer from insufficient data security, low retrieval efficiency, and incomplete traceability chains. This invention addresses these issues by proposing a trusted industrial data traceability method based on hierarchical permission constraints and multi-dimensional feature association. It ensures data security through fine-grained permission control, improves query accuracy through semantic retrieval, and constructs a complete traceability chain through multi-dimensional feature fusion. The specific steps are as follows:

[0028] S101: Collect multi-source heterogeneous traceability data, perform missing value imputation, outlier removal and normalization preprocessing to generate a standardized traceability dataset.

[0029] Specifically, multi-source heterogeneous traceability data, including structured and unstructured data, is collected from multiple data sources such as industrial production lines, logistics systems, and quality inspection. The collected raw data is cleaned: missing values are filled using mean imputation or regression imputation, and outliers are identified and removed using methods based on statistical distribution or clustering. The cleaned data then undergoes weighted normalization. For the raw data $x_k$ of the $k$-th dimension, the normalized value is computed by the formula $x'_k = w_k \cdot \frac{x_k - x_k^{\min}}{x_k^{\max} - x_k^{\min} + \epsilon}$, where $x_k^{\max}$ and $x_k^{\min}$ are the global maximum and minimum values of that dimension in the historical data, respectively, $w_k$ is the traceability relevance weight preset from domain knowledge or business importance, and $\epsilon$ is a very small positive number that prevents a zero denominator. The normalized values of all dimensions together constitute the standardized traceability dataset $D$.
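The weighted normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `weighted_normalize`, the sample rows, and the weights are hypothetical, and the per-dimension minimum and maximum are taken from the sample itself rather than from historical data.

```python
def weighted_normalize(rows, weights, eps=1e-9):
    """Min-max scale each dimension, then multiply by its relevance weight.

    eps keeps the denominator non-zero when a dimension is constant.
    """
    n_dims = len(weights)
    lo = [min(r[k] for r in rows) for k in range(n_dims)]
    hi = [max(r[k] for r in rows) for k in range(n_dims)]
    return [
        [weights[k] * (r[k] - lo[k]) / (hi[k] - lo[k] + eps) for k in range(n_dims)]
        for r in rows
    ]

# Hypothetical readings: (temperature, pressure) with assumed weights.
rows = [[20.0, 1.0], [30.0, 3.0], [25.0, 2.0]]
normalized = weighted_normalize(rows, weights=[0.6, 0.4])
```

Each normalized value lies between 0 and the weight of its dimension, so dimensions with larger weights contribute more to later similarity calculations.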

[0030] S102: Based on multiple dimensions of indicators, the data field is divided into several security levels, and each security level corresponds to a level value.

[0031] Specifically, data sensitivity, traceability necessity, and privacy leakage risk are selected as the dimensions for quantitative grading. Each indicator takes values within a fixed range, with larger values indicating higher sensitivity, stronger necessity, or greater risk. A judgment matrix is constructed using the analytic hierarchy process (AHP); the eigenvector corresponding to the largest eigenvalue of the judgment matrix is calculated and normalized to obtain the weight coefficient of each indicator. For each data field, the values of the three indicators are evaluated from its attribute characteristics, and the security score of the field is calculated as the overall score of the weighted indicators. Based on the score, the data fields are divided into four security levels, each score interval corresponding to one level and one level value: public, internal, sensitive, and highly sensitive, in ascending order of score. A field-security level mapping table is established and saved to record the correspondence between each data field and its security level value.
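As a rough sketch of this grading step, the code below approximates the principal eigenvector of an AHP judgment matrix by power iteration and maps a weighted score to a level. Several details are assumptions because the source does not preserve them: indicator scores are taken to lie in [0, 1], the overall score is a weighted sum, the level values are 1 to 4, and the cut-offs in `LEVEL_CUTOFFS` are purely hypothetical.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise judgment matrix
    by power iteration, normalized so the weights sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w

# Hypothetical cut-offs: (upper bound of score, level value).
LEVEL_CUTOFFS = [(0.25, 1), (0.50, 2), (0.75, 3)]

def security_level(scores, weights):
    """Weighted sum of (sensitivity, necessity, risk), mapped to levels 1..4."""
    total = sum(w * s for w, s in zip(weights, scores))
    for cutoff, level in LEVEL_CUTOFFS:
        if total < cutoff:
            return level
    return 4

# A perfectly consistent judgment matrix with importance ratios 2 : 1 : 4.
judgment = [[1, 2, 1 / 2], [1 / 2, 1, 1 / 4], [2, 4, 1]]
weights = ahp_weights(judgment)
```

In practice the judgment matrix would come from expert pairwise comparisons and should be checked for consistency before its weights are used.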

[0032] S103: Divide user identities into several levels based on user roles, and establish mapping rules between identity levels and data field levels.

[0033] Specifically, based on the user's role, responsibilities, and scope of authority in the traceability system, user identities are divided into four levels: ordinary users, general participants, core participants, and administrators, each corresponding to an identity level value. Ordinary users have only basic query permissions, general participants can access data related to their business, core participants have broader data access permissions, and administrators have the highest level of data access permissions. The mapping rule is that a user of a given identity level may access all data fields whose security level value does not exceed that level: ordinary users can access public-level data; general participants can access public and internal-level data; core participants can access public, internal, and sensitive-level data; and administrators can access data at all security levels. A dynamic identity adjustment mechanism based on user behavior and trustworthiness is also introduced, which adjusts the identity level according to the user's cumulative active time and historical behavior trustworthiness score, allowing long-term active and trustworthy users to receive temporary identity level upgrades.
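One way to express this mapping and the temporary upgrade is sketched below. The numeric level values 1 to 4, the role keys, the thresholds `min_days` and `min_trust`, and the one-step upgrade capped at the highest level are all illustrative assumptions; the source states only that long-term active, trustworthy users may be upgraded temporarily.

```python
# Assumed identity level values for the four roles named in the text.
ROLE_LEVEL = {"ordinary": 1, "general": 2, "core": 3, "admin": 4}

def effective_level(role, active_days, trust_score, min_days=180, min_trust=0.9):
    """Return the identity level, with a one-step temporary upgrade for
    long-term active, trustworthy users (thresholds are hypothetical)."""
    base = ROLE_LEVEL[role]
    if active_days >= min_days and trust_score >= min_trust:
        return min(base + 1, max(ROLE_LEVEL.values()))
    return base

def can_access(identity_level, field_level):
    """Mapping rule: a field is accessible when its level does not exceed the user's."""
    return field_level <= identity_level
```

For example, a general participant (level 2) who meets both thresholds would temporarily see sensitive-level fields (level 3), while a newer account with the same role would not.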

[0034] S104: Upon receiving a source tracing query request, verify the user's identity level, generate permission constraints according to the mapping rules, and embed them into the retrieval process.

[0035] Specifically, when a user initiates a traceability query request, the system first extracts the user's identity identifier from the request, verifies the user's identity through the identity authentication system, and obtains the user's current identity level $L_u$. It then iterates over the set $F$ of all data fields, querying the pre-established field-security level mapping table to retrieve the security level value $l(f)$ of each field $f$. Based on the mapping rules established in Step 3, it filters out the fields that the current user is entitled to access, that is, those with $l(f) \le L_u$, and generates a field whitelist. The whitelist is converted into filtering predicates that the database query interface can recognize, such as field selection clauses in SQL queries or projection conditions in NoSQL queries, and injected into the retrieval process so that all subsequent data access operations are performed only within the fields specified by the whitelist, thereby achieving fine-grained access control.
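The whitelist-to-predicate conversion might look like the following sketch for the SQL case. The field names, their level values, and the table name are hypothetical. Column names are interpolated into the query string only because they come from the trusted mapping table, never from user input.

```python
# Hypothetical field-security level mapping table.
FIELD_LEVELS = {"batch_id": 1, "line_temp": 2, "operator": 3, "formula": 4}

def field_whitelist(identity_level, field_levels=FIELD_LEVELS):
    """Fields whose security level value does not exceed the identity level."""
    return sorted(f for f, lvl in field_levels.items() if lvl <= identity_level)

def build_select(table, identity_level):
    """Turn the whitelist into a SQL field selection clause."""
    cols = field_whitelist(identity_level)
    if not cols:
        raise PermissionError("no accessible fields for this identity level")
    return f"SELECT {', '.join(cols)} FROM {table}"

query = build_select("trace_events", identity_level=2)
```

Because the projection is fixed before the query runs, fields outside the whitelist never reach the retrieval stage at all, rather than being filtered from results afterwards.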

[0036] S105: Under permission constraints, retrieve candidate information semantically relevant to the user query from a standardized dataset using vectorized retrieval.

[0037] Specifically, a pre-trained language model is used as the encoder $E$ to map the natural-language query $q$ entered by the user to a dense vector $v_q = E(q)$. The data content that the permission constraint $C$ allows to be accessed is divided, by fixed length or along semantic boundaries, into a set of text blocks $\{c_1, \dots, c_n\}$, and each text block is mapped to a text vector $v_i = E(c_i)$ using the same encoder. The cosine similarity $\mathrm{sim}(v_q, v_i) = \frac{v_q \cdot v_i}{\lVert v_q \rVert \, \lVert v_i \rVert}$ between the query vector and each text vector is calculated; the closer the value is to 1, the stronger the semantic relevance. All text blocks are ranked in descending order of similarity score, and the $K$ highest-scoring text blocks, selected using a max-heap or a quickselect algorithm, constitute the candidate information set.
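The recall step can be illustrated with plain Python lists standing in for encoder output. The encoder itself is omitted; the two-dimensional vectors and block names below are toy values, and `heapq.nlargest` (which keeps a bounded heap internally) is used here in place of the max-heap or quickselect mentioned above.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, blocks, k):
    """blocks: list of (text, vector). Returns the k best (score, text) pairs,
    highest score first."""
    scored = ((cosine(query_vec, vec), text) for text, vec in blocks)
    return heapq.nlargest(k, scored)

# Toy vectors standing in for encoder output.
blocks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
candidates = top_k([1.0, 0.0], blocks, k=2)
```

Only blocks already inside the permission scope should be passed in, so the ranking never sees content the user is not entitled to.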

[0038] S106: Construct a graph structure based on the time, space and business attributes of nodes, and form a traceability chain by splicing them together through comprehensive similarity calculation.

[0039] Specifically, the multi-dimensional features of each traceability node $i$ are extracted from the standardized dataset $D$, including the timestamp $t_i$ recording when the event occurred (usually in Unix timestamp format), the geographic coordinates $p_i$ (longitude and latitude), and the vector $b_i \in \mathbb{R}^m$ composed of multiple business attributes, where $m$ is the number of business attribute dimensions, covering traceability-related business information such as temperature, humidity, pressure, equipment status, and operators. For any two nodes $i$ and $j$, the temporal similarity $S_{\text{time}}(i,j)$ is calculated from the timestamp difference using a time decay factor, so that the smaller the time difference, the higher the similarity. The spatial similarity $S_{\text{space}}(i,j)$ is calculated from the Euclidean distance between the nodes, so that the smaller the distance, the higher the similarity. The logical similarity $S_{\text{logic}}(i,j)$ is the cosine similarity of the business attribute vectors and reflects how similar the two nodes are in their business attributes. The comprehensive similarity between nodes is calculated by weighted fusion, $S(i,j) = \alpha S_{\text{time}}(i,j) + \beta S_{\text{space}}(i,j) + \gamma S_{\text{logic}}(i,j)$, where $\alpha$, $\beta$, $\gamma$ are balance coefficients. A directed graph is constructed with the traceability nodes as vertices and the comprehensive similarity as edge weights. On this graph, a greedy path search is performed from the starting node: at each step, the next node with the highest comprehensive similarity to the current node is selected, until the path can no longer be extended. Finally, the resulting node sequence is spliced into a complete traceability chain.
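A compact sketch of the three similarities and the greedy search follows. The exact temporal and spatial formulas are not preserved in the source, so an exponential decay and a `1 / (1 + distance)` form are assumed here; the balance coefficients, the decay value, the forward-in-time edge direction, and the sample nodes are likewise hypothetical.

```python
import math

def time_sim(t1, t2, decay=1e-3):
    # Assumed form: exponential decay in the timestamp difference.
    return math.exp(-decay * abs(t1 - t2))

def space_sim(p1, p2):
    # Assumed form: 1 / (1 + Euclidean distance).
    return 1.0 / (1.0 + math.dist(p1, p2))

def logic_sim(a, b):
    # Cosine similarity of the business attribute vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def combined_sim(n1, n2, alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted fusion of temporal, spatial, and logical similarity."""
    return (alpha * time_sim(n1["t"], n2["t"])
            + beta * space_sim(n1["pos"], n2["pos"])
            + gamma * logic_sim(n1["attrs"], n2["attrs"]))

def greedy_chain(nodes, start):
    """Repeatedly append the unvisited node most similar to the current one."""
    chain, visited = [start], {start}
    while True:
        cur = nodes[chain[-1]]
        # Assumption: edges point forward in time, so only later nodes qualify.
        candidates = [(combined_sim(cur, n), name) for name, n in nodes.items()
                      if name not in visited and n["t"] >= cur["t"]]
        if not candidates:
            return chain
        _, best = max(candidates)
        chain.append(best)
        visited.add(best)

nodes = {
    "A": {"t": 0, "pos": (0.0, 0.0), "attrs": [1.0, 0.0]},
    "B": {"t": 10, "pos": (0.0, 1.0), "attrs": [1.0, 0.1]},
    "C": {"t": 20, "pos": (0.0, 2.0), "attrs": [1.0, 0.2]},
}
chain = greedy_chain(nodes, "A")
```

Greedy selection is cheap but only locally optimal: it never revisits an earlier choice, which is one reason a later verification of the finished chain is useful.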

[0040] S107: Calculate the deviation between each node's business attribute values and the standard reference values to identify abnormal nodes in the traceability chain.

[0041] Specifically, a standard reference value $r_k$ is preset for each business attribute dimension $k$. The standard reference values are determined from the statistical distribution characteristics of historical data (such as the mean or median) or from the knowledge and experience of domain experts; for business attributes with clear specifications, the standard reference values can directly adopt industry standards or process requirements. For each traceability node $i$, the relative deviation of each business attribute value $x_{i,k}$ from the corresponding standard reference value is calculated one by one as $\delta_{i,k} = \frac{\lvert x_{i,k} - r_k \rvert}{\lvert r_k \rvert + \epsilon}$, where $\epsilon$ is a very small positive number that prevents division by zero. The maximum deviation across all dimensions is taken as the anomaly degree of the node, $A_i = \max_k \delta_{i,k}$. The anomaly degree is compared with a preset threshold; if it exceeds the threshold, node $i$ is determined to be abnormal, and an alarm containing the abnormal node identifier and its abnormal dimension information is output so that corrective measures can be taken in a timely manner.
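The detection rule above reduces to a few lines. In this sketch the absolute value in the denominator is an assumption (it keeps the deviation non-negative for negative reference values), and the reference values, threshold, and node readings are hypothetical.

```python
def anomaly_degree(values, references, eps=1e-9):
    """Return (max relative deviation, index of the worst dimension)."""
    deviations = [abs(x - r) / (abs(r) + eps) for x, r in zip(values, references)]
    worst = max(range(len(deviations)), key=deviations.__getitem__)
    return deviations[worst], worst

def detect(chain, references, threshold):
    """chain: list of (node_id, attribute values). Returns one alarm per abnormal node."""
    alarms = []
    for node_id, values in chain:
        degree, dim = anomaly_degree(values, references)
        if degree > threshold:
            alarms.append((node_id, dim, round(degree, 3)))
    return alarms

# Hypothetical references: temperature 25.0, humidity 50.0.
references = [25.0, 50.0]
chain = [("n1", [25.5, 49.0]), ("n2", [40.0, 50.0])]
alarms = detect(chain, references, threshold=0.2)
```

Each alarm carries the node identifier and the index of the dimension that deviated most, matching the "abnormal dimension information" the text asks to be recorded.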

[0042] S108: Sort and filter the link inference results, and optimize the retrieval strategy by combining user historical queries and feedback.

[0043] Specifically, a comprehensive evaluation function is constructed to score each link inference result by combining three weighted scores. The answer relevance score measures the degree of semantic matching between the result and the user's query and is obtained by calculating the cosine similarity between the result text and the query vector. The factual correctness score measures the degree to which the result is consistent with known facts and is obtained by comparison with a knowledge base or standard answer. The contextual accuracy score measures how well the result matches the current dialogue context and is calculated from the user's historical query sequence and the current dialogue state. The weighting coefficients are subject to a preset constraint, and their initial values are set based on experience and business requirements. During system operation, user feedback behaviors such as clicking, accepting, correcting, or rejecting historical search results are recorded to construct a feedback feature vector. The weighting coefficients are then dynamically adjusted by gradient descent, governed by a learning rate and the feedback feature values, with normalization applied, so that the scoring results better match the user's actual needs. The current link inference results are reordered according to the adjusted comprehensive evaluation function, and the top-scoring results are selected and returned to the user.
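The scoring and feedback loop can be sketched as follows. The source's update formula is not preserved, so this example substitutes a simple additive feedback step followed by renormalization for the gradient-descent update; the linear form of the evaluation function, the assumption that the weights sum to 1, and all numeric values are hypothetical.

```python
def score(result, w):
    """Assumed linear evaluation function over the three scores."""
    return (w[0] * result["relevance"]
            + w[1] * result["factual"]
            + w[2] * result["context"])

def update_weights(w, feedback, lr=0.1):
    """Additive feedback step (stand-in for gradient descent), clipped at 0
    and renormalized so the weights again sum to 1."""
    stepped = [max(wi + lr * fi, 0.0) for wi, fi in zip(w, feedback)]
    total = sum(stepped)
    return [s / total for s in stepped] if total else list(w)

def rank(results, w, top_n=1):
    """Sort results by descending score and keep the top_n."""
    return sorted(results, key=lambda r: score(r, w), reverse=True)[:top_n]

results = [
    {"id": "a", "relevance": 0.9, "factual": 0.3, "context": 0.5},
    {"id": "b", "relevance": 0.5, "factual": 0.9, "context": 0.5},
]
initial = [0.5, 0.3, 0.2]
# Hypothetical feedback: users corrected factually weak answers.
adjusted = update_weights(initial, feedback=[0.0, 1.0, 0.0])
```

With the initial weights result "a" ranks first; after feedback raises the weight on factual correctness, result "b" overtakes it.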

[0044] S109: Verify the integrity and logical rationality of the traceability chain, audit user data access behavior in real time, record legitimate operations and block abnormal access.

[0045] Specifically, the generated traceability chain is verified for completeness and logical rationality. The arithmetic mean of the comprehensive similarity between all adjacent nodes is calculated; when this mean is lower than a preset continuity threshold, the traceability chain is determined to contain a break or logical inconsistency, and a re-retrieval process is triggered. At the same time, user access behavior is audited in real time by monitoring every data access request. When the security level of a field requested by a user is detected to be higher than the user's current identity level, the request is deemed unauthorized and is immediately blocked. Detailed information such as the user ID, access time, and requested fields is recorded in the audit log, and all logs are stored in the trusted audit system.
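Both checks are simple enough to sketch directly. The function names, the log record layout, and the treatment of a chain with no edges as incoherent are assumptions; the access time is passed in explicitly so the example stays deterministic.

```python
def chain_is_coherent(edge_sims, continuity_threshold):
    """edge_sims: comprehensive similarity of each adjacent node pair.
    A mean below the threshold signals a break and should trigger re-retrieval."""
    if not edge_sims:
        return False  # assumption: a chain without edges cannot be verified
    return sum(edge_sims) / len(edge_sims) >= continuity_threshold

def audit(user_id, identity_level, field, field_level, log, now):
    """Record every request; allow it only if the field level does not
    exceed the user's identity level."""
    allowed = field_level <= identity_level
    log.append({"user": user_id, "time": now, "field": field, "allowed": allowed})
    return allowed

log = []
ok = audit("u42", 2, "line_temp", 2, log, now="2026-03-19T08:00:00Z")
blocked = audit("u42", 2, "formula", 4, log, now="2026-03-19T08:00:05Z")
```

Logging both allowed and blocked requests gives the audit system a complete record, which a real deployment would write to tamper-evident storage rather than an in-memory list.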

[0046] To aid understanding of the present invention, Figure 2 shows a detailed flowchart of traceability chain construction and anomaly detection in the trusted traceability method for industrial data, using a real industrial scenario as an example to illustrate how multi-dimensional feature fusion and anomaly identification are implemented.

[0047] As shown in Figure 2, traceability chain construction and anomaly detection in the trusted traceability method for industrial data specifically include:

[0048] S201: Receive user queries under permission constraints, map the queries to vectors, and also map the data content within the permission scope, after dividing it into text blocks, to vectors.

[0049] Specifically, the system first verifies the user's identity level $L_u$ and ensures that the query stays within the scope of the permission constraint $C$. A pre-trained language model is then used as the encoder $E$ to map the natural-language query $q$ entered by the user to a dense query vector $v_q \in \mathbb{R}^d$, where $d$ is the vector dimension, typically 768 or 1024. At the same time, the accessible data content is segmented along semantic boundaries into a set of text blocks $\{c_1, \dots, c_n\}$, and the same model maps each text block to a text vector $v_i = E(c_i)$. A vector index library is established to store all text blocks and their corresponding vector representations in preparation for subsequent semantic retrieval.

[0050] S202: Calculate the cosine similarity between the query vector and each text vector, sort them in descending order of similarity score, and recall the K text blocks with the highest similarity as a candidate information set.

[0051] Specifically, the cosine similarity between the query vector and each text vector is calculated by the formula $\mathrm{sim}(v_q, v_i) = \frac{v_q \cdot v_i}{\lVert v_q \rVert \, \lVert v_i \rVert}$. A higher score indicates a better match with the user's query intent. The system selects the $K$ text blocks with the highest similarity scores to form the candidate information set, which is used for subsequent link inference.

[0052] S203: Extract multidimensional features of each traceability node from the standardized dataset, including timestamps, geographic coordinates, and business attribute vectors.

[0053] Specifically, the multi-dimensional features of each traceability node $i$ are extracted from the standardized dataset $D$, including the timestamp $t_i$ recording when the event occurred (Unix timestamp format), the geographic coordinates $p_i$ (longitude and latitude), and the business attribute vector $b_i \in \mathbb{R}^m$, where $m$ is the number of business attribute dimensions, covering traceability-related business information such as temperature, humidity, pressure, equipment status, and operators.

[0054] S204: Calculate the temporal similarity, spatial similarity, and logical similarity between any two nodes, and obtain the comprehensive similarity between the nodes through weighted fusion.

[0055] Specifically, the temporal similarity $S_{\text{time}}(i,j)$ is calculated from the difference between the timestamps of the two nodes using a time decay factor; the spatial similarity $S_{\text{space}}(i,j)$ is calculated from the Euclidean distance between the geographical locations of the nodes; and the logical similarity $S_{\text{logic}}(i,j)$ is calculated as the cosine similarity of the nodes' business attribute vectors. The three are fused with preset weighting coefficients to obtain the comprehensive similarity $S(i,j) = \alpha S_{\text{time}}(i,j) + \beta S_{\text{space}}(i,j) + \gamma S_{\text{logic}}(i,j)$, where $\alpha$, $\beta$, $\gamma$ are the balance coefficients.

[0056] S205: Construct a directed graph with the traceability nodes as vertices and the comprehensive similarity as the edge weight. Starting from the starting node, perform a path search and splice the sequence of nodes with the highest similarity to form a complete traceability chain.

[0057] Specifically, a directed graph is constructed with the traceability nodes as vertices and the comprehensive similarity as the edge weight. A similarity threshold is set so that only strongly correlated connections are retained. A greedy search strategy is adopted: starting from the starting node, the next node with the highest comprehensive similarity to the current node is selected each time, until the chain can no longer be extended. Finally, the node sequence obtained by the search is spliced to form a complete traceability chain.
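The greedy search in S205 can be sketched as below. This is an illustrative reading rather than the patented procedure: the edge dictionary `sim`, the function name `build_chain`, and the node labels are hypothetical; retaining edges whose similarity is greater than or equal to the threshold is an assumption (the exact comparison is not recoverable from this text); and the visited set that prevents revisiting a node is an added safeguard not stated in the source.

```python
def build_chain(sim, start, threshold):
    """Greedily extend a chain from `start`.

    sim maps a directed edge (u, v) to its comprehensive similarity.
    """
    chain = [start]
    visited = {start}  # assumption: a node appears at most once in the chain
    current = start
    while True:
        # Keep only strong edges leaving the current node to unvisited nodes.
        candidates = [(score, v) for (u, v), score in sim.items()
                      if u == current and v not in visited and score >= threshold]
        if not candidates:
            break  # the chain can no longer be extended
        _, current = max(candidates)  # highest similarity wins
        chain.append(current)
        visited.add(current)
    return chain

sim = {
    ("A", "B"): 0.90, ("A", "C"): 0.60,
    ("B", "A"): 0.95, ("B", "C"): 0.80,
    ("C", "D"): 0.30,
}
chain = build_chain(sim, "A", threshold=0.5)
```

Here the search moves A to B to C and stops, because the only edge leaving C falls below the threshold.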

[0058] S206: Perform abnormal node detection on the complete traceability chain, calculate the deviation of each node's business attribute value from the standard reference value, mark it as an abnormal node and output an alarm when the deviation exceeds the preset threshold, and sort, filter and audit the results at the same time.

[0059] Specifically, a standard reference value is preset for each business attribute dimension, and the relative deviation between the node's actual value and the standard value is calculated. The maximum deviation among all dimensions is taken as the node's anomaly degree; when the anomaly degree exceeds the preset threshold, the node is identified as abnormal and an alarm is output. At the same time, a comprehensive evaluation function is used to sort and filter the link inference results, the best result is returned to the user, and all access behaviors are audited in real time. When a violation is detected, it is immediately identified as unauthorized access and blocked.
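The anomaly check in S206 can be sketched as follows. The relative-deviation formula is not reproduced in this text, so its form here is an assumption: the absolute difference divided by the magnitude of the reference value plus a small positive constant `eps`, which guards against division by zero. The function names, the attribute values, and the threshold of 0.15 are hypothetical.

```python
def relative_deviations(values, references, eps=1e-9):
    """Per-dimension relative deviation from the standard reference values."""
    # Assumed form: |x - ref| / (|ref| + eps); eps avoids division by zero.
    return [abs(x - ref) / (abs(ref) + eps) for x, ref in zip(values, references)]

def detect_anomaly(values, references, threshold):
    """Return (is_abnormal, anomaly_degree, worst_dimension_index)."""
    deviations = relative_deviations(values, references)
    degree = max(deviations)          # maximum deviation over all dimensions
    worst = deviations.index(degree)  # dimension responsible for the anomaly
    return degree > threshold, degree, worst

# Hypothetical node: temperature and humidity against their reference values.
node_values = [82.0, 40.0]
reference_values = [80.0, 50.0]
is_abnormal, degree, worst = detect_anomaly(node_values, reference_values, threshold=0.15)
```

Returning the index of the worst dimension alongside the flag lets the caller record which attribute triggered the alarm.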

[0060] This invention addresses the problems of insufficient data security protection, low retrieval efficiency, and incomplete traceability chain construction in current industrial data traceability. It proposes a trusted industrial data traceability method based on hierarchical permission constraints and multi-dimensional feature association. This method is applicable to data traceability tasks in various industrial scenarios. While ensuring data security, it improves the accuracy of traceability retrieval and the integrity of the traceability chain. It is convenient, efficient, and highly generalizable, and can be directly applied to practical applications such as quality traceability, anomaly location, and compliance auditing in industrial production.

Claims

1. A trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association, characterized in that the method includes the following steps:
Step 1: Collect multi-source heterogeneous traceability data and preprocess it by missing value imputation, outlier removal, and weighted normalization to generate a standardized traceability dataset;
Step 2: Based on indicators of multiple dimensions, quantify and classify the data fields into several security levels, each security level corresponding to a level value;
Step 3: Divide user identities into several levels according to user roles, and establish mapping rules between identity levels and data field levels;
Step 4: Upon receiving a traceability query request, verify the user's identity level and generate permission constraints according to the mapping rules, the constraints being defined over the set of all data fields and the security level value of each field, and embed the constraints in the retrieval process;
Step 5: Under the permission constraints, retrieve candidate information relevant to the query from the standardized traceability dataset, including: vectorizing the user query into a query vector; dividing the data content within the permission scope into multiple text blocks and vectorizing each text block to obtain a set of text vectors; calculating the cosine similarity between the query vector and each text vector; and recalling the K text blocks with the highest similarity as the candidate information set;
Step 6: Construct a directed graph with traceability nodes as vertices and logical relationships as edges; extract node features from the standardized dataset and calculate the comprehensive similarity between nodes as a weighted fusion, using three balance coefficients, of temporal similarity measured from the difference between node timestamps, spatial similarity measured from the distance between node geographic locations, and logical similarity measured from the similarity of node business attribute vectors; and splice the related nodes to form a complete traceability chain;
Step 7: Perform abnormal node detection on the complete traceability chain generated in Step 6 by calculating, as the anomaly degree, the relative deviation of each node's business attribute values from the standard reference values, the calculation using each business attribute value of the node, the standard reference value of that attribute, and a positive number approaching 0; when the anomaly degree exceeds a preset threshold, mark the node as abnormal and output an alarm;
Step 8: Sort and filter the link inference results, and optimize the retrieval strategy by combining the user's historical queries and feedback;
Step 9: Verify the integrity and logical rationality of the generated traceability chain, audit user access behavior, record legitimate operations and block abnormal access, and store all logs in the trusted audit system.

2. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 1, in which multi-source heterogeneous traceability data is collected and preprocessed by missing value imputation, outlier removal, and weighted normalization to generate a standardized traceability dataset, specifically involves cleaning the collected multi-source heterogeneous data and then performing weighted normalization using a formula defined in terms of the raw data of each dimension, the global maximum and minimum values of that dimension, a source relevance weight, and a positive number approaching 0; the result is the standardized dataset.

3. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 2, in which the data fields are quantified and classified into several security levels based on indicators of multiple dimensions, with each security level corresponding to a level value, specifically involves the following: the indicators include data sensitivity, necessity of traceability, and privacy breach risk; the range of values for each indicator is [missing information]; the scores are determined by expert rating or calculated automatically from data attributes; the weight of each indicator is determined using the analytic hierarchy process (AHP); and, based on the overall score, the data fields are divided into four security levels, namely public, internal, sensitive, and highly sensitive, each with its corresponding level value.

4. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 3, in which user identities are divided into several levels according to user roles and mapping rules between identity levels and data field levels are established, specifically involves dividing user identities into four levels, namely ordinary users, general participants, core participants, and administrators, each corresponding to an identity level value; the mapping rule is that a user of a given identity level may access data fields whose security level value does not exceed that level; and a dynamic identity adjustment mechanism based on user behavior and credibility is incorporated.

5. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that the permission constraints described in Step 4 are generated from the mapping relationship between the user's identity level and the security levels of the data fields, specifically as follows: the permission constraints are defined over the set of all data fields and the security level value of each field; the constraints are injected as filtering conditions into the query interface of the search engine, so that the search results contain only fields that satisfy the constraints, thereby achieving field-level fine-grained access control.

6. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 5, in which candidate information semantically relevant to the user's query is retrieved from the standardized dataset by vectorized retrieval under the permission constraints, specifically involves: using a pre-trained language model as the encoder to map the user query to a query vector of a given vector dimension; dividing the data content that the permission constraints allow to be accessed into a set of text blocks, and mapping each text block to a text vector using the same encoder; calculating the cosine similarity between the query vector and each text vector; and sorting all text blocks in descending order of similarity score and selecting the K highest-scoring text blocks to form the candidate information set.

7. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 6, in which a graph structure is constructed from the multi-dimensional features of the nodes, such as time, space, and business attributes, and related nodes are spliced into a complete traceability chain through comprehensive similarity calculation, specifically involves: extracting the multi-dimensional features of each node from the standardized dataset, including the timestamp, the geographic coordinates, and the business attribute vector, whose dimension is the number of business attributes; calculating, for any two nodes, the temporal similarity, which depends on a time decay factor, the spatial similarity, and the logical similarity; calculating the comprehensive similarity between the nodes through weighted fusion using three balance coefficients; constructing a directed graph with the traceability nodes as vertices and the comprehensive similarity as the edge weight; and, on the constructed directed graph, performing a path search from the starting node and selecting and splicing the node sequence with the highest similarity to form a complete traceability chain.

8. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 7, in which abnormal nodes in the constructed complete traceability chain are identified by calculating the deviation of each node's business attribute values from the standard reference values, specifically involves: presetting a standard reference value for each business attribute dimension, the standard reference value being determined from historical data statistics or domain knowledge; for each traceability node, calculating the relative deviation of each business attribute value from the corresponding standard reference value; taking the maximum deviation among all dimensions as the anomaly degree of that node; and comparing the anomaly degree with a preset threshold and, when the anomaly degree exceeds the threshold, marking the node as abnormal and recording the abnormal dimension information.

9. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 8, in which the multiple possible results obtained from link inference are sorted and filtered and the retrieval strategy is dynamically optimized by combining the user's historical query records and feedback behavior, specifically involves constructing a comprehensive evaluation function to sort and filter the inference result of each link, the function combining an answer relevance score, a factual accuracy score, and a contextual accuracy score, whose weighting coefficients satisfy a specified constraint and are dynamically adjusted based on the user's historical queries and feedback.

10. The trusted traceability method for industrial data based on hierarchical permission constraints and multi-dimensional feature association according to claim 1, characterized in that Step 9, in which the integrity and logical rationality of the generated traceability chain are verified while each user's data access behavior is audited in real time, legitimate operations are recorded, and abnormal access is blocked, specifically involves: verifying the integrity of the generated traceability chain by calculating the arithmetic mean of the comprehensive similarity between all adjacent nodes on the chain and, if this mean is lower than a preset threshold, judging the chain logically unreasonable and triggering re-retrieval; and auditing user access behavior by recording legitimate operations and, when the user requests access to a field whose security level is higher than the user's identity level, judging the access unauthorized, blocking it immediately, and recording an audit log.
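The chain verification and access audit described in claim 10 can be illustrated with the sketch below. It is a simplified reading under stated assumptions: security levels and identity levels are modeled as integers on a shared scale, the audit log is an in-memory list, an empty chain is treated as failing verification, and all names (`chain_is_reasonable`, `audit_access`, the log fields) are hypothetical.

```python
def chain_is_reasonable(adjacent_similarities, min_mean):
    """Check the arithmetic mean of similarities between adjacent chain nodes."""
    if not adjacent_similarities:
        return False  # assumption: a chain with no edges fails verification
    mean = sum(adjacent_similarities) / len(adjacent_similarities)
    return mean >= min_mean  # a mean below the threshold triggers re-retrieval

def audit_access(user_level, field, field_level, audit_log):
    """Allow access only if the field's level does not exceed the user's level."""
    allowed = field_level <= user_level
    audit_log.append({
        "field": field,
        "user_level": user_level,
        "field_level": field_level,
        "decision": "allowed" if allowed else "blocked",
    })
    return allowed

audit_log = []
ok_chain = chain_is_reasonable([0.9, 0.8, 0.7], min_mean=0.75)
weak_chain = chain_is_reasonable([0.9, 0.2], min_mean=0.75)
granted = audit_access(user_level=2, field="batch_id", field_level=1, audit_log=audit_log)
denied = audit_access(user_level=2, field="operator_id", field_level=4, audit_log=audit_log)
```

Logging both allowed and blocked requests keeps the record of legitimate operations alongside the blocked ones, which matches the claim's requirement that every access be audited.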