Dynamic data distribution method and system based on multi-attribute perception and intelligent decision

By constructing a multi-dimensional feature system and using machine learning and large-scale model collaborative decision-making, the problems of insufficient attribute perception, static strategy decision-making, and the disconnect between security and efficiency in existing data distribution technologies are solved. This enables a dynamically adaptive and optimized data distribution strategy, improving the efficiency and security of data transmission.

CN121880889B · Active · Publication Date: 2026-06-19 · THE FIFTH RES INST OF TELECOMM SCI & TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
THE FIFTH RES INST OF TELECOMM SCI & TECH CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing data distribution technologies suffer from several problems when dealing with multi-attribute data, including single attribute dimensions, static strategy decision-making mechanisms, a disconnect between security and efficiency, and insufficient integration of cutting-edge technologies. These issues lead to a disconnect between data distribution strategies and needs, making dynamic adaptive optimization impossible.

Method used

We construct a multi-dimensional feature system and achieve adaptive optimization of dynamic data distribution strategies through collaborative decision-making between machine learning and large models. We adopt multi-attribute perception and intelligent decision-making methods, combined with L1 regularization, random forest model, BERT model and gradient descent algorithm, to extract, filter, classify and match data features and dynamically adjust the distribution strategy.

Benefits of technology

It achieves deep perception of core data needs, improves network resource utilization efficiency, ensures the security and stability of data transmission, has strong adaptive capabilities, and is suitable for various heterogeneous network environments.



Abstract

This invention discloses a dynamic data distribution method and system based on multi-attribute perception and intelligent decision-making, belonging to the field of data processing technology. The method first extracts multi-dimensional attribute features from the data and filters them to obtain key feature vectors. Then, it classifies scenes and calculates feature weights using a random forest model, while a fine-tuned pre-trained large model performs scene semantic understanding and policy matching. Next, it fuses the outputs of the two models and determines the optimal distribution strategy using a decision function. Finally, it executes the distribution according to the strategy and dynamically optimizes the decision model based on real-time performance feedback. This invention achieves dynamic and accurate matching of distribution strategies through multi-attribute collaborative perception and combined machine learning and large model decision-making, effectively improving data transmission efficiency, security, and adaptability in complex network environments.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a dynamic data distribution method and system based on multi-attribute perception and intelligent decision-making.

Background Technology

[0002] With the deep integration and widespread application of cloud computing, edge computing, and the Internet of Things (IoT) technologies, data has become a core production factor in the digital economy era. Data distribution, as a crucial link in achieving efficient and secure data flow from source to destination, directly impacts the service quality and user experience of upper-layer applications. Currently, application scenarios are becoming increasingly complex, and the data to be distributed exhibits significant heterogeneity: it includes not only real-time monitoring video streams and interactive commands with extremely high timeliness requirements, but also large-scale batch business data that can tolerate a certain delay, and highly sensitive content containing user privacy or trade secrets. This diversity in data type, importance, real-time requirements, and transmission format presents unprecedented challenges to the intelligence, refinement, and adaptability of data distribution strategies.

[0003] Traditional data distribution technologies and existing improvements exhibit the following main limitations when faced with the aforementioned complex requirements:

[0004] First, at the data feature perception level, existing solutions suffer from limited attribute dimensions and insufficient feature utilization. Most distribution strategies make decisions based on only one or a few attributes, such as data size and data type, lacking a systematic perception and comprehensive analysis of all data attributes. For example, adaptive distribution methods based on data size primarily rely on data size attributes, failing to consider key characteristics such as data timeliness requirements, business importance levels, and security sensitivity. This one-sided perception leads to a severe disconnect between strategies and the actual, complex needs of the data, potentially misdirecting highly time-sensitive sensitive data to high-latency or weakly secure transmission paths.

[0005] Secondly, at the strategic decision-making mechanism level, existing technologies generally lack deep intelligence and dynamic adaptive capabilities. Many solutions employ strategies based on fixed rules (such as always choosing the shortest path or nearest node) or static configurations, failing to dynamically adjust according to real-time network topology changes, node load fluctuations (such as resource contention at edge nodes during peak business periods), and data attribute combinations. Some solutions attempt to introduce machine learning for optimization, such as using BP neural networks for path prediction; however, their intelligent applications are often limited to the single stage of path selection, and the models are simple, failing to form a closed-loop intelligent decision-making process covering "attribute awareness - strategy generation - execution verification." When the network environment or data characteristics change dynamically, such strategies are prone to failure, leading to transmission congestion, interruptions, or service quality degradation.

[0006] Furthermore, in terms of balancing security and efficiency, existing methods are often fragmented or overly simplistic. Security protection relies heavily on traditional and relatively static technologies such as end-to-end encryption and access control, failing to dynamically correlate with the importance and sensitivity level of the data content itself, and thus unable to achieve "tiered protection and on-demand application." Regarding efficiency, for large-scale or complex data, there is a lack of priority-based intelligent segmentation and differentiated transmission mechanisms, making it difficult to balance prioritizing the delivery of security-critical data with optimizing overall transmission costs. Existing security measures and efficiency optimization strategies are typically designed separately, making it difficult to achieve an integrated optimal balance in complex multi-attribute data distribution scenarios.

[0007] Finally, at the level of technological integration and innovation, the application of cutting-edge artificial intelligence technologies is fragmented, failing to achieve synergistic effects. Although machine learning and large-scale modeling technologies have demonstrated powerful capabilities in pattern recognition and semantic understanding, respectively, existing solutions have failed to effectively integrate the two for data distribution decisions. Machine learning models (such as neural networks) excel at learning quantitative patterns from structured features, but have limited ability to understand the contextual semantics of complex scenarios; while large-scale language models can deeply understand the semantics of requirements, they struggle to directly process quantitative data such as network performance metrics. Currently, there is a lack of a mechanism that can combine the precise quantitative analysis of machine learning with the deep semantic understanding of large-scale models to achieve an accurate and reliable mapping from multi-attribute data features to the optimal distribution strategy in complex network environments.

[0008] Therefore, there is an urgent need for a new generation of data distribution strategy implementation method that can systematically perceive the multi-dimensional attributes of data, deeply integrate machine learning and large model intelligence, realize dynamic strategy generation, hierarchical security protection, and continuous adaptive optimization, and solve the technical problems of existing data distribution technologies in terms of the comprehensiveness of multi-attribute feature perception, the dynamic intelligence of decision-making mechanism, the synergy of security and efficiency, and the integration of cutting-edge technologies.

Summary of the Invention

[0009] The purpose of this invention is to overcome the shortcomings of the prior art and provide a dynamic data distribution method and system based on multi-attribute perception and intelligent decision-making. It aims to construct a multi-dimensional feature system covering basic data attributes, importance attributes, and carrying attributes, and designs an integrated processing flow of "extraction-screening-quantification". It establishes a dynamic optimization mechanism based on feedback of key performance indicators, and adjusts the parameters of the decision model in real time by optimizing the objective function and gradient descent algorithm to achieve adaptive iteration of the distribution strategy.

[0010] To achieve the above objectives, this application proposes a dynamic data distribution method based on multi-attribute perception and intelligent decision-making, comprising the following steps:

[0011] Step S1: Extract multi-dimensional attribute features from the data to be distributed, and standardize and filter the extracted features to obtain a simplified data attribute feature vector;

[0012] Step S2: Based on the data attribute feature vector, classify the distribution scenario using a machine learning model and calculate the weight of each attribute feature;

[0013] Step S3: Input the data attribute feature vector, distribution scenario classification result, attribute feature weight and current network state information into the pre-trained language model for semantic understanding and policy matching to obtain at least one candidate distribution policy and its matching score.

[0014] Step S4: Combine the matching scores of the distribution scenario classification results and the candidate distribution strategies, and select the optimal distribution strategy through a decision function;

[0015] Step S5: Perform dynamic data distribution operation according to the optimal distribution strategy;

[0016] Step S6: Collect performance metrics during the dynamic data distribution process, construct an optimization objective function based on the performance metrics, and dynamically adjust the parameters of the machine learning model and / or the pre-trained language model according to the optimization results.

[0017] As a further solution, in step S1, the multi-dimensional attribute features include at least:

[0018] Basic attributes: data size, data type, data timeliness requirements;

[0019] Importance attributes: data importance level, data sensitivity;

[0020] Bearer attributes: data bearer type, link bandwidth requirements, and data transmission reliability requirements;

[0021] After performing Min-Max standardization on the multi-dimensional attribute features, L1 regularization is used for feature filtering to obtain the simplified data attribute feature vector.

[0022] As a further solution, in step S2,

[0023] The machine learning model is a random forest model, which is used to classify the data attribute feature vectors and output the distribution scenario category;

[0024] The information entropy of each feature in the data attribute feature vector is calculated using the entropy weight method, and the attribute feature weight is calculated based on the information entropy.

[0025] As a further solution, in step S3,

[0026] The pre-trained language model is based on the BERT architecture and has been fine-tuned using distribution scenario datasets;

[0027] The input to the pre-trained language model is formatted text containing the data attribute feature vector, distribution scenario category, attribute feature weight, network topology information, and node load information;

[0028] The output of the pre-trained language model is a set of candidate distribution strategies and corresponding matching scores. The candidate distribution strategies include at least one of proximity distribution, shortest path distribution, and fusion distribution.

[0029] As a further solution, in step S4, the decision function is:

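[0030] $$S^{*} = \arg\max_{s_i \in S_{\mathrm{candidate}}} \left[ \alpha \cdot \mathrm{Score}(s_i) + \beta \cdot \mathrm{Fit}(k, s_i) \right]$$

[0031] where $\arg\max$ is the function that selects the maximizing argument, $S^{*}$ is the optimal distribution strategy, $S_{\mathrm{candidate}}$ is the set of candidate distribution strategies, $\mathrm{Score}(s_i)$ is the matching score of candidate distribution strategy $s_i$, $\mathrm{Fit}(k, s_i)$ is the historical fit between distribution scenario category $k$ and candidate distribution strategy $s_i$, and α and β are decision weight coefficients with α+β=1.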

[0032] As a further solution, in step S5, when the optimal distribution strategy is a fusion distribution strategy, the specific execution steps include:

[0033] Step S51: Based on the attribute feature weights, calculate the priority score of each sub-data block in the data to be distributed, and sort them according to priority;

[0034] Step S52: Divide the data to be distributed into multiple sub-data blocks according to the data type;

[0035] Step S53: Different distribution strategies are applied to transmit sub-data blocks of different priorities. Among them, the encryption and proximity distribution strategy is used for high-priority sub-data blocks, and the shortest path distribution or batch distribution strategy is used for medium and low-priority sub-data blocks.

[0036] Step S54: Reassemble all received sub-data blocks at the target node to obtain complete data.

[0037] As a further solution, in step S5, when performing the distribution operation, the following general steps are also performed:

[0038] Based on the average link bandwidth and the data timeliness requirements, the number of packets into which the data is split is dynamically determined, and the data is unpacked into that many packets;

[0039] A hash algorithm is used to generate a content digest of the data to be distributed and sent with the data packet;

[0040] The target node verifies the content digest of the received data. If the digests match, the data packets are reassembled; otherwise, a retransmission is requested.
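The general steps above (splitting, digest generation, verification) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes SHA-256 as the hash algorithm, takes the packet count as an input because the text does not give the rule that derives it from bandwidth and timeliness, and uses hypothetical helper names (`split_packets`, `send`, `receive`). Because the digest covers the whole payload, the sketch rebuilds the payload first and accepts it only if the digest matches.

```python
import hashlib


def split_packets(data: bytes, n_packets: int):
    """Split data into at most n_packets chunks of (nearly) equal size."""
    if n_packets < 1:
        raise ValueError("n_packets must be >= 1")
    size = max(1, -(-len(data) // n_packets))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def send(data: bytes, n_packets: int):
    """Attach the SHA-256 digest of the whole payload to every packet."""
    digest = hashlib.sha256(data).hexdigest()
    return [
        {"seq": seq, "digest": digest, "payload": chunk}
        for seq, chunk in enumerate(split_packets(data, n_packets))
    ]


def receive(packets):
    """Rebuild the payload; return None if the digest check fails."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    data = b"".join(p["payload"] for p in ordered)
    if hashlib.sha256(data).hexdigest() != ordered[0]["digest"]:
        return None  # the caller should request a retransmission
    return data


original = b"sensor-frame-" * 10           # 130 bytes of hypothetical data
packets = send(original, n_packets=4)
restored = receive(list(reversed(packets)))  # arrival order does not matter

tampered = [dict(p) for p in packets]
tampered[1]["payload"] = b"x" + tampered[1]["payload"][1:]
rejected = receive(tampered)
```

In this sketch a `None` result stands in for the retransmission request; a real transport would also need timeouts and per-packet loss handling, which the text does not specify.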

[0041] As a further solution, in step S6, the performance metrics include at least transmission latency, transmission success rate, and node load rate;

[0042] The optimization objective function is a weighted combination of the transmission delay, the transmission success rate, and the node load rate, each with its own optimization weighting coefficient;

[0043] When the value of the optimization objective function is lower than a preset threshold, the parameters of the machine learning model and / or the pre-trained language model are updated using the gradient descent algorithm.

[0044] As a further solution, the priority score is calculated using the following formula:

[0045] $$P(D_i) = \sum_{j=1}^{L} w_j \, x_{ij}$$

[0046] where $P(D_i)$ is the priority score of the i-th sub-data block, $L$ is the number of attribute features involved in the priority scoring, $w_j$ is the weight of the j-th attribute, and $x_{ij}$ is the standardized value of the j-th attribute of sub-data block $D_i$. Sorting by $P(D_i)$ in descending order yields the priority sequence of the sub-data blocks.
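As an illustration of the priority scoring, the sketch below assumes the score is the weighted sum of each block's standardized attribute values, which is how the definitions above read. The block identifiers, weights, and feature values are hypothetical.

```python
def priority_score(weights, values):
    """Weighted sum of standardized attribute values for one sub-data block."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


def rank_blocks(blocks, weights):
    """Return the blocks sorted by priority score, highest first."""
    return sorted(
        blocks,
        key=lambda block: priority_score(weights, block["features"]),
        reverse=True,
    )


# Hypothetical attribute weights and standardized features per block.
weights = [0.5, 0.3, 0.2]
blocks = [
    {"id": "a", "features": [0.2, 0.1, 0.9]},
    {"id": "b", "features": [1.0, 1.0, 0.1]},
    {"id": "c", "features": [0.6, 0.5, 0.5]},
]
ranked_ids = [block["id"] for block in rank_blocks(blocks, weights)]
```

With these numbers the scores are 0.31 for `a`, 0.82 for `b`, and 0.55 for `c`, so `b` is sent first.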

[0047] In another aspect, the present invention provides a dynamic data distribution system based on multi-attribute perception and intelligent decision-making, for implementing the dynamic data distribution method based on multi-attribute perception and intelligent decision-making as described in any of the preceding claims, comprising:

[0048] The data feature processing module is used to extract, standardize, and filter multi-dimensional attribute features of the data to be distributed, and obtain data attribute feature vectors.

[0049] The intelligent decision-making module is connected to the data feature processing module. It is used to perform scene classification and weight calculation based on the data attribute feature vector, and to perform policy matching and score calculation through a pre-trained language model. Finally, it integrates and decides the optimal distribution strategy.

[0050] The dynamic distribution execution module, connected to the intelligent decision-making module, is used to perform data distribution operations according to the optimal distribution strategy. It supports multiple strategies such as nearest distribution, shortest path distribution, and fusion distribution, and integrates intelligent unpacking and content digest verification functions.

[0051] The dynamic optimization feedback module, connected to the dynamic distribution execution module and the intelligent decision-making module, is used to collect distribution performance indicators, construct an optimization objective function, and dynamically adjust the parameters of the model in the intelligent decision-making module.

[0052] Compared with related technologies, the dynamic data distribution method and system based on multi-attribute perception and intelligent decision-making provided by this invention has the following advantages:

[0053] 1. This invention constructs a comprehensive feature system covering fundamental, importance, and carrying attributes, and employs L1 regularization for precise feature selection, achieving a deep understanding of core data needs. Based on this, through collaborative intelligent decision-making using machine learning and large-scale models, it can dynamically match optimal distribution strategies (such as proximity, shortest path, or fusion distribution) for data with different feature combinations. Compared to traditional fixed strategies or methods based on single attributes, this method can more fully utilize network resources, avoid congested nodes and inefficient paths, and effectively optimize bandwidth and computing resource utilization while ensuring service quality.

[0054] 2. This invention abandons the single, static security protection method and creatively binds security mechanisms deeply with data content characteristics. By determining the data sensitivity attributes, highly sensitive data is automatically subjected to strong encryption (such as AES) and distributed with high priority via the nearest or dedicated path; at the same time, intelligent packet unpacking and SHA-256 content digest verification technology serve as a general security layer, ensuring the integrity and tamper-proof nature of data during transmission, meeting the stringent requirements of high-security-level services.

[0055] 3. The closed-loop dynamic optimization feedback mechanism designed in this invention can collect key performance indicators such as transmission delay, success rate, and node load in real time. When network topology changes or node load fluctuations cause performance indicators to deteriorate, the system can automatically adjust the parameters of the intelligent decision-making model (such as random forest or large model) through gradient descent algorithm to optimize subsequent distribution decisions in real time. This enables the system to have strong self-healing and adaptive capabilities when facing complex dynamic scenarios such as network jitter and sudden load, significantly improving the overall service stability and robustness, and continuously ensuring the achievement of Service Level Agreements (SLAs).

[0056] 4. The proposed method framework is independent of specific network architectures or data types. Its multi-attribute feature system is flexibly expandable, and its intelligent decision-making module is compatible with different machine learning and model algorithms. Therefore, this solution can be seamlessly applied to various heterogeneous network environments, such as data synchronization between cloud computing centers, real-time data processing in edge computing scenarios, and data aggregation from massive IoT devices. Whether it is text, images, video streams, or structured data, the system can perform unified processing and policy matching through its feature vectors.

Attached Figure Description

[0057] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0058] To more clearly illustrate the technical solutions in the embodiments of this application or related technologies, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, those skilled in the art can obtain other drawings based on these drawings without creative effort.

[0059] Figure 1 is a schematic diagram illustrating the steps of a dynamic data distribution method based on multi-attribute perception and intelligent decision-making provided by the present invention;

[0060] Figure 2 is a schematic diagram of a dynamic data distribution system based on multi-attribute perception and intelligent decision-making provided by the present invention;

[0061] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0062] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0063] Example 1

[0064] Referring to Figure 1, this embodiment provides a dynamic data distribution method based on multi-attribute perception and intelligent decision-making, including the following steps:

[0065] Step S1: Extract multi-dimensional attribute features from the data to be distributed, and standardize and filter the extracted features to obtain a simplified data attribute feature vector;

[0066] Step S2: Based on the data attribute feature vector, classify the distribution scenario using a machine learning model and calculate the weight of each attribute feature;

[0067] Step S3: Input the data attribute feature vector, distribution scenario classification result, attribute feature weight and current network state information into the pre-trained language model for semantic understanding and policy matching to obtain at least one candidate distribution policy and its matching score.

[0068] Step S4: Combine the matching scores of the distribution scenario classification results and the candidate distribution strategies, and select the optimal distribution strategy through a decision function;

[0069] Step S5: Perform dynamic data distribution operation according to the optimal distribution strategy;

[0070] Step S6: Collect performance metrics during the dynamic data distribution process, construct an optimization objective function based on the performance metrics, and dynamically adjust the parameters of the machine learning model and / or the pre-trained language model according to the optimization results.

[0071] It should be noted that this embodiment first extracts multi-attribute features from the data to be distributed to construct a data attribute feature set; then, it classifies and ranks the data attributes based on a machine learning model, and combines a large model to perform semantic understanding and strategy matching for the distribution scenario; finally, based on the decision results, it dynamically selects strategies such as nearest distribution, fusion distribution, and shortest path distribution, and combines content digest verification, intelligent unpacking and reassembly, and other technologies to achieve secure and efficient delivery of data.

[0072] Furthermore, in step S1, the multi-dimensional attribute features include at least:

[0073] Basic attributes: data size, data type, data timeliness requirements;

[0074] Importance attributes: data importance level, data sensitivity;

[0075] Bearer attributes: data bearer type, link bandwidth requirements, and data transmission reliability requirements;

[0076] After performing Min-Max standardization on the multi-dimensional attribute features, L1 regularization is used for feature filtering to obtain the simplified data attribute feature vector.

[0077] In one specific embodiment, the specific technical solution is as follows:

[0078] Multi-attribute feature extraction (step S1): This step extracts multi-dimensional attribute features from the data to be distributed, constructs a complete feature set, and provides high-quality data support for subsequent intelligent decision-making.

[0079] To address the issue of multi-attribute redundancy, this step employs a two-step approach: feature extraction and feature selection. First, all attribute features are extracted. Then, machine learning algorithms are used to select key features. The specific process is as follows:

[0080] First, we extract all multi-dimensional attribute features, covering three core categories:

[0081] Basic attributes: Data size D (unit: Byte), data type T (text / image / video / audio / structured data, etc., represented using one-hot encoding as T=[t1,t2,t3,t4,t5], where t_i ∈ {0,1}), data generation time T0 (timestamp format), and data timeliness requirement τ (unit: s, i.e., data must be delivered within τ time). These attributes determine the time constraints and basic resource requirements for data transmission.

[0082] Importance attributes: Data importance level I (set based on business needs, using a 1-5 rating system, with level 1 being the lowest and level 5 being the highest), data sensitivity level S (sensitive / non-sensitive, using a 0-1 encoding system, with sensitive being 1 and non-sensitive being 0). The sensitivity level is determined by matching keywords in the data content (such as keywords involving user privacy or trade secrets). This type of attribute determines the security protection level.

[0083] Data carrying attributes: Data carrying type C (streaming data / batch data / real-time interactive data, encoded as C=[c1,c2,c3], where c_i ∈ {0,1}), the bandwidth requirement of the transmission link B (unit: Mbps), and the data transmission reliability requirement R (e.g., R=99.99% for financial data, R=99% for ordinary log data). These attributes determine the selection criteria for the transmission link.

[0084] Construct the initial data attribute feature vector X, as shown in the following expression:

[0085] $$X = [D, T, T_0, \tau, I, S, C, B, R]$$
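One way to assemble the initial feature vector from the attributes listed above is sketched below. The class name, field names, and category labels are hypothetical; only the attribute set (D, T, T0, τ, I, S, C, B, R) and the one-hot and 0-1 encodings come from the text.

```python
from dataclasses import dataclass

# Assumed category order for the one-hot encodings of T and C.
DATA_TYPES = ["text", "image", "video", "audio", "structured"]
BEARER_TYPES = ["streaming", "batch", "realtime_interactive"]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


@dataclass
class DataAttributes:
    size_bytes: int        # D
    data_type: str         # T
    generated_at: float    # T0, as a Unix timestamp
    deadline_s: float      # tau
    importance: int        # I, level 1..5
    sensitive: bool        # S
    bearer_type: str       # C
    bandwidth_mbps: float  # B
    reliability: float     # R, e.g. 0.9999

    def to_vector(self):
        """Flatten the attributes into X = [D, T, T0, tau, I, S, C, B, R]."""
        if not 1 <= self.importance <= 5:
            raise ValueError("importance level must be in 1..5")
        return (
            [float(self.size_bytes)]
            + one_hot(self.data_type, DATA_TYPES)
            + [float(self.generated_at), float(self.deadline_s),
               float(self.importance), 1.0 if self.sensitive else 0.0]
            + one_hot(self.bearer_type, BEARER_TYPES)
            + [float(self.bandwidth_mbps), float(self.reliability)]
        )


sample = DataAttributes(2_000_000, "video", 1_700_000_000.0, 0.5, 4, True,
                        "streaming", 20.0, 0.9999)
x = sample.to_vector()
```

The flattened vector has 15 entries here because the two one-hot groups expand to five and three positions.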

[0086] To eliminate the dimensional differences between different attribute dimensions, the feature vectors are first standardized using Min-Max, as shown in the following formula:

[0087] $$x_i' = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}}$$

[0088] where $x_i$ is the original value of the i-th dimension of the feature vector, $x_{i,\min}$ and $x_{i,\max}$ are the minimum and maximum values of the feature in this dimension, respectively, and $x_i'$ is the standardized feature value. To eliminate redundant features and improve the efficiency of subsequent decision-making, L1 regularization is introduced for feature selection. The selection function is:

[0089] $$\min_{\omega} \sum_{i=1}^{N} \left( y_i - \omega^{T} \mathbf{x}_i \right)^2 + \lambda \sum_{j} |\omega_j|$$

[0090] where $\mathbf{x}_i$ is the standardized feature vector of the i-th sample, $y_i$ is the sample label (i.e., the optimal distribution strategy category), ω is the feature weight vector, $\omega_j$ is the coefficient of the j-th attribute feature (such as data size, timeliness, importance, etc.) in the regression model, and λ is the regularization parameter (determined by cross-validation, λ∈[0.01,0.1]). This function retains the key features whose weights are non-zero, and finally yields the simplified feature vector X_selected, which provides efficient input for subsequent intelligent decision-making.
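The standardization and L1-based selection described above can be sketched in plain Python. Several things here are assumptions rather than details from the text: coordinate descent is used as the solver, the squared-error term is scaled by 1/(2N) (which only rescales λ), no intercept is fitted, and the toy samples and labels are invented for illustration.

```python
def min_max_scale(rows):
    """Column-wise Min-Max scaling to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-penalised coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2N) * sum_i (y_i - w.x_i)^2 + lam * sum_j |w_j|."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_wo_j = sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return w


# Hypothetical raw samples: [informative, weakly related, constant].
raw = [[100, 9, 5], [200, 1, 5], [300, 1, 5], [400, 9, 5]]
labels = [0.0, 1.0, 2.0, 3.0]  # numerically encoded strategy categories

X_scaled = min_max_scale(raw)
w = lasso_coordinate_descent(X_scaled, labels, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
X_selected = [[row[j] for j in selected] for row in X_scaled]
```

On this toy data only the first column keeps a non-zero coefficient, so `X_selected` contains a single feature; the weakly related column is zeroed by the L1 penalty and the constant column carries no signal.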

[0091] Furthermore, in step S2,

[0092] The machine learning model is a random forest model, which is used to classify the data attribute feature vectors and output the distribution scenario category;

[0093] The information entropy of each feature in the data attribute feature vector is calculated using the entropy weight method, and the attribute feature weight is calculated based on the information entropy.

[0094] In one specific embodiment, the specific technical solution is as follows:

[0095] Machine learning attribute classification and weight calculation (step S2): The Random Forest (RF) model classifies the standardized feature vector X_selected and outputs the corresponding distribution scenario category (e.g., real-time interactive scenario, batch transmission scenario, sensitive data transmission scenario, etc.). The decision function of the random forest model is:

[0096] $$\hat{k} = \arg\max_{k \in \{1, \ldots, K\}} \sum_{m=1}^{M} I\left( h_m(X_{\mathrm{selected}}) = k \right)$$

[0097] where K is the total number of distribution scenario categories, M is the number of decision trees in the random forest, $h_m(\cdot)$ is the classification function of the m-th decision tree, and I(·) is the indicator function (1 if the condition is met, 0 otherwise).
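The voting rule of the random forest can be illustrated with a few hand-written stand-in trees. The three rules, the feature names, and the category labels below are hypothetical; in practice the trees would be learned from training data. The function counts, for each category, how many trees predict it, and returns the category with the most votes.

```python
def forest_predict(trees, x, categories):
    """Majority vote: argmax over k of the number of trees with h_m(x) == k."""
    votes = {k: sum(1 for h in trees if h(x) == k) for k in categories}
    # max() returns the first category with the highest count on ties.
    return max(categories, key=lambda k: votes[k])


CATEGORIES = ["realtime_interactive", "batch", "sensitive"]

# Hypothetical stand-ins for trained decision trees.
trees = [
    lambda x: "realtime_interactive" if x["deadline_s"] < 1.0 else "batch",
    lambda x: "sensitive" if x["sensitive"] else "batch",
    lambda x: "realtime_interactive" if x["bearer"] == "realtime" else "batch",
]

features = {"deadline_s": 0.2, "sensitive": False, "bearer": "realtime"}
scene = forest_predict(trees, features, CATEGORIES)
```

Two of the three stand-in trees vote for the real-time interactive scenario, so that category wins.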

[0098] Simultaneously, the entropy weight method is used to calculate the weights of each attribute feature, evaluating the degree of influence of different attributes on the selection of the distribution strategy. The steps are as follows:

[0099] Calculate the information entropy $E_j$ of the j-th attribute:

[0100] $$E_j = -\frac{1}{\ln N} \sum_{i=1}^{N} p_{ij} \ln p_{ij}, \quad j = 1, \ldots, L$$

[0101] where L is the total number of attribute features, N is the number of samples, and $p_{ij}$ is the probability of the i-th sample on the j-th attribute;

[0102] Calculate the weight $w_j$ of the j-th attribute:

[0103] $$w_j = \frac{1 - E_j}{\sum_{l=1}^{L} (1 - E_l)}$$

[0104] The final attribute weight vector is obtained as W = [w1, w2, ..., w_L].
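A compact sketch of the entropy weight method follows. It assumes the usual formulation: the probability of sample i on attribute j is its standardized value divided by the column sum, the entropy is normalized by ln N, and each weight is proportional to one minus the entropy. The sample matrix is hypothetical.

```python
import math


def entropy_weights(rows):
    """Return one weight per column using the entropy weight method."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two samples are required")
    divergences = []
    for col in zip(*rows):
        total = sum(col)
        if total == 0:
            entropy = 1.0  # an all-zero column carries no information
        else:
            probs = [v / total for v in col]
            entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / len(divergences)] * len(divergences)
    return [d / norm for d in divergences]


# Column 0 is identical across samples, column 1 varies strongly.
samples = [[0.5, 0.1], [0.5, 0.9], [0.5, 0.0]]
W = entropy_weights(samples)
```

An attribute that takes the same value on every sample has maximal entropy and therefore receives a weight close to zero, while the attribute that discriminates between samples receives nearly all of the weight.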

[0105] Furthermore, in step S3,

[0106] The pre-trained language model is based on the BERT architecture and has been fine-tuned using distribution scenario datasets;

[0107] The input to the pre-trained language model is formatted text containing the data attribute feature vector, distribution scenario category, attribute feature weight, network topology information, and node load information;

[0108] The output of the pre-trained language model is a set of candidate distribution strategies and corresponding matching scores. The candidate distribution strategies include at least one of proximity distribution, shortest path distribution, and fusion distribution.

[0109] In one specific embodiment, the specific technical solution is as follows:

[0110] Large-model scene semantic understanding and strategy matching (step S3): A pre-trained large model (a domain-adapted model based on BERT) is introduced and fine-tuned on a scene dataset to improve the accuracy of semantic understanding of the distribution scene. Specifically, a distribution scene dataset of more than 100,000 samples is first constructed, each sample being a four-tuple of "data attribute features - network environment - node load - optimal distribution strategy"; the BERT model is then fine-tuned on this dataset, optimizing the model's attention mechanism so that it accurately captures the semantic relationship between data attributes and distribution strategies. The fine-tuning process uses the cross-entropy loss function:

[0111] $$\mathcal{L}_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log \hat{y}_{ik}$$

[0112] where N is the number of samples, K is the number of strategies, $y_{ik}$ is the true label (0-1 encoding) of strategy k for sample i, and $\hat{y}_{ik}$ is the model's predicted probability that sample i belongs to strategy k. The input text format of the fine-tuned large model is: "Key data features: {X_selected}, Scene category: {k}, Attribute weight: {W}, Current network topology: {Topology information, including node connection relationship and link bandwidth}, Node load: {CPU utilization and memory usage of each node}, Please output the candidate distribution policy and matching score".
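As a rough illustration of the two elements described here, the following Python sketch computes a sample-averaged cross-entropy over one-hot strategy labels and fills in the quoted input text template. The function names `cross_entropy` and `build_prompt`, the averaging over samples, and the example argument values are assumptions made for illustration; the specification does not give an implementation.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy; each row holds one-hot labels / predicted probabilities."""
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        # Clip probabilities away from zero so that log() is always defined
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(labels, probs))
    return total / len(y_true)

def build_prompt(x_selected, scene, weights, topology, node_load):
    """Fill the formatted input text quoted in the description."""
    return (
        f"Key data features: {x_selected}, Scene category: {scene}, "
        f"Attribute weight: {weights}, Current network topology: {topology}, "
        f"Node load: {node_load}, "
        "Please output the candidate distribution policy and matching score"
    )

# Hypothetical usage with made-up values
prompt = build_prompt([0.9, 0.8], "streaming", [0.6, 0.4],
                      "A-B 100Mbps", "A: CPU 30%, MEM 40%")
```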

[0113] The large model outputs a set of candidate distribution strategies S_candidate = [s_1, s_2, ..., s_m] (such as nearest distribution, shortest path distribution, fusion distribution, encrypted splitting distribution, etc.), together with a matching score score(s_i) for each candidate strategy (0-10 points; the higher the score, the better the match). To improve the stability of the output, the idea of ensemble learning is introduced: the weighted average of three model outputs is taken (the weights are set based on the output confidence), finally yielding a stable set of candidate strategies and matching scores.
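The confidence-weighted averaging of model outputs can be sketched as follows. This is a minimal Python illustration; the function name `ensemble_scores` and the representation of each output as a (confidence, scores) pair are assumptions, since the specification does not say how the three outputs or their confidences are obtained.

```python
def ensemble_scores(outputs):
    """outputs: list of (confidence, {strategy: score}) pairs, one per model output."""
    totals, weights = {}, {}
    for confidence, scores in outputs:
        for strategy, score in scores.items():
            # Accumulate confidence-weighted scores per strategy
            totals[strategy] = totals.get(strategy, 0.0) + confidence * score
            weights[strategy] = weights.get(strategy, 0.0) + confidence
    # Normalize by the total confidence seen for each strategy
    return {s: totals[s] / weights[s] for s in totals if weights[s] > 0}
```

Normalizing per strategy means a strategy proposed by only some of the outputs is still averaged over the confidences of the outputs that actually proposed it.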

[0114] Furthermore, in step S4, by combining the scene category output by the machine learning model with the policy matching score output by the large model, a policy decision function is constructed to select the optimal distribution policy. The decision function is as follows:

[0115] $$s^{*} = \arg\max_{s_i \in S_{candidate}} \left[\alpha \cdot \mathrm{score}(s_i) + \beta \cdot \mathrm{fit}(k, s_i)\right]$$

[0116] where argmax is the maximum-selection function, $s^{*}$ is the optimal distribution strategy, $S_{candidate}$ is the set of candidate distribution strategies, score(s_i) is the matching score of candidate distribution strategy s_i, fit(k, s_i) is the historical fit between distribution scenario category k and candidate distribution strategy s_i (based on historical data statistics, with a value range of 0-10), and α and β are decision weight coefficients with α+β=1.

[0117] The historical fit can be defined as the success rate (or average performance score) of strategy s_i when used in scenario k within historical data; it is stored in the database and continuously updated as the system runs. For example, fit(k, s_i) may be obtained from the historical database as the comprehensive historical average performance score of strategy s_i under distribution scenario category k; it ranges from 0 to 10 and is dynamically updated based on the optimization feedback in step S6.
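The decision step can be sketched in Python as a weighted combination of the matching score and the historical fit, followed by selecting the maximum. The function name `select_strategy` is hypothetical, and the historical fit values in the usage lines (8.0 and 9.0) are assumed figures chosen so that, with α=0.6 and β=0.4 and matching scores of 8.5 and 7, the combined scores come out at 8.3 and 7.8 as in the worked embodiment; the specification does not state those fit values.

```python
def select_strategy(match_scores, historical_fit, alpha=0.6):
    """Return (best_strategy, combined_scores) using alpha*score + beta*fit, beta = 1 - alpha."""
    beta = 1.0 - alpha
    combined = {
        s: alpha * score + beta * historical_fit.get(s, 0.0)
        for s, score in match_scores.items()
    }
    # argmax over the candidate set
    best = max(combined, key=combined.get)
    return best, combined

# Assumed historical fit values, for illustration only
best, combined = select_strategy(
    {"fusion": 8.5, "encrypted_nearest": 7.0},
    {"fusion": 8.0, "encrypted_nearest": 9.0},
    alpha=0.6,
)
```

A strategy with no recorded history is given a fit of 0 in this sketch, which biases the choice toward strategies with a track record; another default could equally be chosen.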

[0118] Furthermore, in step S5, when the optimal distribution strategy is a fusion distribution strategy, the specific execution steps include:

[0119] Step S51: Based on the attribute feature weights, calculate the priority score of each sub-data block in the data to be distributed, and sort them according to priority;

[0120] Step S52: Divide the data to be distributed into multiple sub-data blocks according to the data type;

[0121] Step S53: Different distribution strategies are applied to transmit sub-data blocks of different priorities. Among them, the encryption and proximity distribution strategy is used for high-priority sub-data blocks, and the shortest path distribution or batch distribution strategy is used for medium and low-priority sub-data blocks.

[0122] Step S54: Reassemble all received sub-data blocks at the target node to obtain complete data.

[0123] In one specific embodiment, the specific technical solution is as follows:

[0124] This embodiment is based on the optimal strategy output by intelligent decision-making. The system executes corresponding distribution operations, combining content digest verification and intelligent packet reassembly technologies to ensure the efficiency and security of data transmission. The specific implementations of each strategy are as follows:

[0125] 1. Nearest (proximity) distribution strategy (suitable for small-volume, non-sensitive data with high timeliness requirements)

[0126] Based on the edge computing node deployment architecture, a node distance evaluation function d(u, v) is constructed, representing the combined value of the physical distance and network latency between the data source node u and the target node v:

[0127] $$d(u, v) = \gamma \cdot \mathrm{dist}(u, v) + (1 - \gamma) \cdot \mathrm{delay}(u, v)$$

[0128] where dist(u, v) is the physical distance between u and v (unit: km), delay(u, v) is the network latency (unit: ms), and γ is the weighting coefficient (0 < γ < 1). The edge node with the smallest d(u, v) is selected as the distribution node, so that data is delivered via the nearest node.
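Nearest-node selection can be illustrated with a short Python sketch. It assumes the evaluation function has the form d(u, v) = γ·distance + (1 - γ)·latency, which matches the description of a single coefficient γ weighting physical distance against network latency; the function names and the candidate values are hypothetical.

```python
def combined_distance(dist_km, delay_ms, gamma=0.5):
    """Assumed form: gamma * physical distance + (1 - gamma) * network latency."""
    return gamma * dist_km + (1.0 - gamma) * delay_ms

def nearest_node(candidates, gamma=0.5):
    """candidates: {node_id: (distance_km, delay_ms)}; returns the node minimizing d(u, v)."""
    return min(
        candidates,
        key=lambda node: combined_distance(*candidates[node], gamma),
    )

# Hypothetical edge nodes: (distance in km, latency in ms)
chosen = nearest_node({"edge-a": (5.0, 10.0), "edge-b": (2.0, 40.0)}, gamma=0.5)
```

Because kilometres and milliseconds are on different scales, a practical implementation would likely normalize both terms before combining them; the sketch leaves the raw values as given.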

[0129] 2. Shortest path distribution strategy (suitable for batch data and scenarios sensitive to transmission costs)

[0130] An improved Dijkstra algorithm is used to calculate the optimal transmission path. A path weight function w(e) is defined, comprehensively considering factors such as link bandwidth, latency, and packet loss rate.

[0131] $$w(e) = \lambda_1 \cdot \frac{1}{B(e)} + \lambda_2 \cdot \mathrm{delay}(e) + \lambda_3 \cdot \mathrm{loss}(e)$$

[0132] where e is a network link, B(e) is the link bandwidth, delay(e) is the link delay, loss(e) is the link packet loss rate, and λ1, λ2, and λ3 are weighting coefficients (Σλ_i = 1). The algorithm finds the path with the minimum sum of weights, which is then used as the optimal path for data distribution.
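The path computation can be sketched in Python with the standard Dijkstra algorithm over composite link weights. The specification does not say what the "improved" variant changes, so plain Dijkstra is used here; the weight form λ1/B(e) + λ2·delay(e) + λ3·loss(e), the default coefficients, and the function names are all assumptions for illustration.

```python
import heapq

def link_weight(bandwidth, delay, loss, lambdas=(0.4, 0.3, 0.3)):
    """Assumed composite weight: higher bandwidth lowers cost; delay and loss raise it."""
    l1, l2, l3 = lambdas
    return l1 / bandwidth + l2 * delay + l3 * loss

def shortest_path(graph, source, target):
    """graph: {node: {neighbour: weight}}; returns (path, total_weight) or (None, inf)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor chain back from the target
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]
```

Dijkstra requires non-negative edge weights, which holds here as long as the bandwidth is positive and the delay and loss terms are non-negative.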

[0133] 3. Fusion and Distribution Strategy (Applicable to large-size, multi-type fusion data)

[0134] For large-scale, multi-type fused data (such as monitoring data containing video, audio, and sensitive metadata), the sub-data blocks are first prioritized based on their importance attributes. Then, they are split into multiple sub-data blocks according to their type. Differentiated distribution strategies are applied to sub-data blocks of different priorities and types. Finally, they are merged and reassembled at the target node, achieving the dual goals of "priority adaptation + efficiency optimization." The priority ranking uses a comprehensive scoring function based on the entropy weight method.

[0135] $$P(D_i) = \sum_{j=1}^{L} w_j \, x_{ij}$$

[0136] where P(D_i) is the priority score (0-10 points) of the i-th sub-data block, L is the number of attribute features involved in the priority scoring, w_j is the attribute weight, and x_ij is the standardized value of the j-th attribute of sub-data block D_i. The sub-data blocks are sorted in descending order of P(D_i) to obtain the priority sequence. The splitting function is:

[0137] $$\mathrm{Split}(D) = \{D_1, D_2, \ldots, D_n\}$$

[0138] where D is the original data, D_i is the i-th sub-data block after splitting, and n is the number of splits (dynamically determined based on the number and size of data types and the link bandwidth, satisfying size(D_i) ≤ B_avg·τ). For high-priority sub-data blocks (P(D_i) ≥ 7 points), an "encryption + nearest distribution" strategy is adopted to ensure transmission timeliness and security; for medium- and low-priority sub-data blocks, the shortest path or batch distribution strategy is used to reduce transmission costs. After each sub-data block is transmitted, the blocks are reassembled through a fusion function F.

[0139] For each D_i, the intelligent decision-making module is invoked again, based on the attribute characteristics of that block, to select a distribution strategy; after transmission, the sub-data blocks are reassembled through the fusion function F:

[0140] $$\hat{D} = F\left(D_1^{recv}, D_2^{recv}, \ldots, D_n^{recv}\right)$$

[0141] where $D_i^{recv}$ is the i-th sub-data block received and $\hat{D}$ is the complete data after reassembly.
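The priority scoring and per-block strategy assignment of the fusion strategy can be sketched in Python as follows. The sketch assumes the priority score is a plain weighted sum of attribute values that have already been scaled to the 0-10 range, and it uses the 7-point threshold from the description; the function names, strategy labels, and sample values are hypothetical.

```python
def priority_score(weights, values):
    """Weighted sum of standardized attribute values (assumed to be on a 0-10 scale)."""
    return sum(w * x for w, x in zip(weights, values))

def plan_fusion(blocks, weights, high_threshold=7.0):
    """blocks: {name: attribute values}; returns (name, score, strategy) in descending priority."""
    scored = sorted(
        ((priority_score(weights, values), name) for name, values in blocks.items()),
        reverse=True,
    )
    return [
        (name, score, "encrypt+nearest" if score >= high_threshold else "shortest_path")
        for score, name in scored
    ]

# Hypothetical sub-data blocks with two attributes each
plan = plan_fusion({"metadata": [10, 8], "audio": [6, 6]}, weights=[0.5, 0.5])
```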

[0142] 4. Intelligent packet splitting and content digest verification strategy (applicable to all scenarios, ensuring data security)

[0143] The data to be transmitted is intelligently fragmented, and the fragment size is dynamically adjusted according to the link bandwidth. The number of fragments, k, satisfies the following:

[0144] $$k = \left\lceil \frac{\mathrm{size}(D)}{B_{avg} \cdot \tau} \right\rceil$$

[0145] where B_avg is the average bandwidth of the link and τ is the data timeliness requirement. Each data packet is given a header containing the sequence number, the total number of packets, and a checksum. Simultaneously, a data digest is generated using the SHA-256 algorithm; the digest function is:

[0146] $$H = \mathrm{SHA\text{-}256}(D)$$

[0147] After receiving all data packets, the target node first verifies the consistency of the digest (compares the digest generated from the received data with the digest sent by the sender). If they match, the data packets are disassembled and reassembled; otherwise, a retransmission is requested to ensure the integrity and security of data transmission.
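The splitting and verification flow can be sketched in Python using the standard `hashlib` and `zlib` modules. The sketch assumes the fragment count is the ceiling of size(D) / (B_avg·τ) with B_avg expressed in bytes per second, and it uses CRC-32 as the per-packet checksum; the header layout and function names are illustrative assumptions rather than the specification's format.

```python
import hashlib
import math
import zlib

def fragment(data, b_avg_bytes_per_s, tau_s):
    """Split data into k packets and return (packets, SHA-256 digest of the whole payload)."""
    chunk = max(1, int(b_avg_bytes_per_s * tau_s))
    k = max(1, math.ceil(len(data) / chunk))
    digest = hashlib.sha256(data).hexdigest()
    packets = []
    for i in range(k):
        payload = data[i * chunk:(i + 1) * chunk]
        # Header fields: sequence number, total packet count, per-packet checksum
        packets.append({"seq": i, "total": k,
                        "checksum": zlib.crc32(payload), "payload": payload})
    return packets, digest

def reassemble(packets, expected_digest):
    """Return the reassembled bytes, or None to signal that a retransmission is needed."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if not ordered or len(ordered) != ordered[0]["total"]:
        return None  # missing packets
    data = b"".join(p["payload"] for p in ordered)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        return None  # digest mismatch
    return data
```

Returning `None` stands in for the retransmission request; a real receiver would also use the per-packet checksum to identify which packet to request again.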

[0148] Furthermore, in step S5, the following general steps are also performed during the distribution operation:

[0149] Based on the average link bandwidth and the data timeliness requirements, the number of packets into which the data is to be split is dynamically determined, and the data is split accordingly;

[0150] A hash algorithm is used to generate a content digest of the data to be distributed and sent with the data packet;

[0151] The target node verifies the content digest of the received data. If they match, they are reassembled; otherwise, a retransmission is requested.

[0152] Furthermore, in step S6, the performance metrics at least include transmission delay, transmission success rate, and node load rate;

[0153] The optimization objective function is: $J = \omega_1 \cdot \frac{1}{t} + \omega_2 \cdot r - \omega_3 \cdot l$; where t is the transmission delay, r is the transmission success rate, l is the node load rate, and ω1, ω2, and ω3 are optimization weight coefficients;

[0154] When the value of the optimization objective function is lower than the preset threshold, the gradient descent algorithm is used to update the parameters of the machine learning model and / or the pre-trained language model.

[0155] Specifically, if J < Jth (Jth is the preset threshold), the parameters of the machine learning model and the matching weights of the large model are adjusted through the gradient descent algorithm to optimize the decision result and achieve the dynamic adaptability of the distribution strategy. The parameter update formula for gradient descent is:

[0156] $$\theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta)$$

[0157] where θ is the parameter to be updated (such as the decision tree parameters of the random forest, the matching weights of the large model, etc.), η is the learning rate (0 < η < 1), and $\nabla_{\theta} J(\theta)$ is the gradient of the objective function at θ.
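The feedback loop can be sketched in Python as an objective evaluation, a threshold check, and a single gradient descent step. The sketch assumes the objective has the form J = ω1·(1/t) + ω2·r - ω3·l with weights 0.4, 0.4 and 0.2, as used in the worked embodiment; the function names are hypothetical, and how the gradient is obtained for non-differentiable parameters such as tree depth is not specified in the text, so the gradient is simply passed in.

```python
def objective(t, r, l, weights=(0.4, 0.4, 0.2)):
    """J rewards low delay t and high success rate r, and penalizes node load rate l."""
    w1, w2, w3 = weights
    return w1 * (1.0 / t) + w2 * r - w3 * l

def needs_update(j_value, j_threshold=0.3):
    """Parameters are updated only when J falls below the preset threshold."""
    return j_value < j_threshold

def gradient_step(theta, grad, eta=0.1):
    """One update theta <- theta - eta * grad, applied element-wise."""
    return [p - eta * g for p, g in zip(theta, grad)]

# Hypothetical measurements: 20 s delay, 90% success rate, 50% node load
j = objective(20.0, 0.9, 0.5)
```

Since a larger J indicates better performance here, the gradient supplied to `gradient_step` would in practice be that of a loss that decreases as J increases (for example -J); that sign convention is an assumption of this sketch.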

[0158] Embodiment 2

[0159] Referring to Figure 2, and building on Embodiment 1, this embodiment provides a dynamic data distribution system based on multi-attribute perception and intelligent decision-making, including:

[0160] A data feature processing module, used to extract, standardize, and screen the multi-dimensional attribute features of the data to be distributed, and obtain a data attribute feature vector;

[0161] An intelligent decision-making module, connected to the data feature processing module, used to perform scenario classification and weight calculation through a machine learning model based on the data attribute feature vector, and perform policy matching and score calculation through a pre-trained language model, and finally fuse and decide the optimal distribution strategy;

[0162] A dynamic distribution execution module, connected to the intelligent decision-making module, used to execute data distribution operations according to the optimal distribution strategy, support multiple strategies such as proximity distribution, shortest path distribution, and fusion distribution, and integrate intelligent packet splitting and content digest verification functions;

[0163] The dynamic optimization feedback module, connected to the dynamic distribution execution module and the intelligent decision-making module, is used to collect distribution performance indicators, construct an optimization objective function, and dynamically adjust the parameters of the model in the intelligent decision-making module.

[0164] In a more specific embodiment, the system deployment of the present invention includes four parts: a data source node, an edge computing node, a core network node, and a target node. The data source node is responsible for collecting data to be distributed and extracting multi-attribute features. The edge computing node hosts the intelligent decision-making module and part of the distribution execution module to achieve nearest distribution and real-time decision-making. The core network node is responsible for shortest path distribution and large-model semantic understanding. The target node is responsible for data reception, digest verification, packet reassembly, and data fusion. All nodes are connected via high-speed network links, supporting real-time data transmission and interaction.

[0165] Data feature processing module: The data source node collects real-time monitoring data (including video, audio, and sensitive metadata of equipment) of a smart park and extracts full attribute features: D=1024MB, T=video (encoded [0,1,0,0,0]), T0=1699999999, τ=30s, I=level 4, S=sensitive (1), C=streaming data ([1,0,0]), B=100Mbps, R=99.9%. The initial feature vector X is constructed, and after Min-Max standardization, the simplified feature vector X_selected=[τ, I, S, B, R] is obtained by L1 regularization (λ=0.05);

[0166] Intelligent decision-making module: X_selected is input into a random forest model (M=100 decision trees), which outputs the scenario category "high-priority sensitive streaming data transmission scenario"; the entropy weight method calculates the weight vector W=[0.3, 0.25, 0.2, 0.15, 0.1] (timeliness and importance have the highest weights); X_selected, the scenario category, W, and network information (edge node 5km from the data source, latency 10ms, load rate 30%) are input into the fine-tuned BERT model, which outputs a candidate strategy set S_candidate=[fusion distribution, encrypted nearest distribution] with weighted average matching scores of 8.5 and 7 respectively; applying the decision function (α=0.6, β=0.4), fusion distribution scores 8.3 and encrypted nearest distribution scores 7.8, so the fusion distribution strategy is ultimately selected;

[0167] Dynamic distribution execution module: The data is split by type into D1 (video frames, 512MB, P=8.2), D2 (audio, 256MB, P=6.5), and D3 (sensitive metadata, 256MB, P=9.0); D3 (highest priority) uses "AES encryption + nearest distribution", D1 (medium-high priority) uses nearest distribution, and D2 (medium priority) is distributed over the shortest path calculated by the improved Dijkstra algorithm; each sub-data block is split into packets according to the formula k = size(D_i) / (B_avg·τ) (D1 into 3 packets, D2 into 1 packet, D3 into 2 packets), header information is added to each packet, and an SHA-256 digest is generated;

[0168] Data reception and verification: The target node receives all data packets, verifies the SHA-256 digests one by one (all are consistent, with no packet loss), and reassembles the packets in descending order of priority (D3 first, then D1 and D2). The complete data is then reconstructed through the fusion function F; the reconstruction takes 0.8 seconds.

[0169] Dynamic optimization feedback module: Transmission indicators are collected: t=25s (satisfying τ=30s), r=100%, l=35%, giving J=0.4×(1/25)+0.4×1-0.2×0.35=0.346 (> Jth=0.3). Network fluctuations are then simulated (edge node load rises to 70%), and J drops to 0.28 < Jth; the depth of the random forest decision trees (adjusted from 8 layers to 10 layers) and the model matching weights (α=0.55, β=0.45) are updated through the gradient descent algorithm. After optimization, J recovers to 0.39 and the transmission delay stabilizes within 28s.

[0170] In summary, the present invention achieves the following technical effects:

[0171] 1. Significantly improved distribution efficiency: Through multi-attribute feature filtering and intelligent strategy matching, compared with traditional fixed strategies (such as single shortest path), the average transmission latency is reduced by more than 30% (traditional strategy latency is 42s, this invention is 25-28s), and the transmission success rate is increased to more than 99.5% (traditional strategy is about 95%), which is especially suitable for high time-sensitive data scenarios.

[0172] 2. Enhanced security protection capabilities: A full-process security mechanism of "encrypted hierarchical transmission + SHA-256 digest verification" is constructed, which reduces the risk of sensitive data leakage by more than 80%, and ensures 100% data transmission integrity, solving the problem of poor adaptability of traditional encryption technologies;

[0173] 3. Strong adaptability: Through a closed-loop dynamic optimization mechanism, it can respond in real time to complex scenarios such as network topology changes and node load fluctuations (stable transmission can still be maintained even when the load rate increases from 30% to 70%), and the adaptability in multi-service heterogeneous data scenarios is improved by 40%;

[0174] 4. Wide applicability: Applicable to various distribution scenarios such as cloud computing, edge computing, and the Internet of Things, it supports multiple types of data transmission such as text, images, videos, and sensitive metadata. Compared with existing single-scenario solutions, the coverage of applicable scenarios is increased by 50%, making it extremely valuable for promotion.

[0175] The above are only some embodiments of this application and do not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.

Claims

1. A dynamic data distribution method based on multi-attribute perception and intelligent decision-making, characterized in that it includes the following steps: Step S1: Extract multi-dimensional attribute features from the data to be distributed, and standardize and filter the extracted features to obtain a simplified data attribute feature vector; Step S2: Based on the data attribute feature vector, classify the distribution scenario using a machine learning model and calculate the weight of each attribute feature; Step S3: Input the data attribute feature vector, the distribution scenario classification result, the attribute feature weights, and current network state information into the pre-trained language model for semantic understanding and strategy matching to obtain at least one candidate distribution strategy and its matching score; the pre-trained language model is based on the BERT architecture and has been fine-tuned using distribution scenario datasets; the input to the pre-trained language model is formatted text containing the data attribute feature vector, distribution scenario category, attribute feature weights, network topology information, and node load information; the output of the pre-trained language model is a set of candidate distribution strategies and corresponding matching scores; the candidate distribution strategies include at least one of proximity distribution, shortest path distribution, and fusion distribution; Step S4: Combine the distribution scenario classification result with the matching scores of the candidate distribution strategies, and select the optimal distribution strategy through a decision function; Step S5: Perform the dynamic data distribution operation according to the optimal distribution strategy; Step S6: Collect performance metrics during the dynamic data distribution process, construct an optimization objective function based on the performance metrics, and dynamically adjust the parameters of the machine learning model and / or the pre-trained language model according to the optimization results.

2. The method of claim 1, wherein, In step S1, the multi-dimensional attribute features include at least: Basic attributes: data size, data type, data timeliness requirements; Importance attributes: data importance level, data sensitivity; Bearer attributes: data bearer type, link bandwidth requirements, and data transmission reliability requirements; After performing Min-Max standardization on the multi-dimensional attribute features, L1 regularization is used for feature filtering to obtain the simplified data attribute feature vector.

3. The method of claim 1, wherein, In step S2, The machine learning model is a random forest model, which is used to classify the data attribute feature vectors and output the distribution scenario category; The information entropy of each feature in the data attribute feature vector is calculated using the entropy weight method, and the attribute feature weight is calculated based on the information entropy.

4. The method of claim 1, wherein, in step S4, the decision function is: $s^{*} = \arg\max_{s_i \in S_{candidate}} \left[\alpha \cdot \mathrm{score}(s_i) + \beta \cdot \mathrm{fit}(k, s_i)\right]$, wherein argmax is a maximum selection function, $s^{*}$ is the optimal distribution strategy, $S_{candidate}$ is a set of candidate distribution strategies, score(s_i) is a matching score of a candidate distribution strategy s_i, fit(k, s_i) is a historical fit between the distribution scenario category k and the candidate distribution strategy s_i, and α and β are decision weight coefficients with α + β = 1.

5. The method of claim 1, wherein, In step S5, when the optimal distribution strategy is a fusion distribution strategy, the specific execution steps include: Step S51: Based on the attribute feature weights, calculate the priority score of each sub-data block in the data to be distributed, and sort them according to priority; Step S52: Divide the data to be distributed into multiple sub-data blocks according to the data type; Step S53: Different distribution strategies are applied to transmit sub-data blocks of different priorities. Among them, the encryption and proximity distribution strategy is used for high-priority sub-data blocks, and the shortest path distribution or batch distribution strategy is used for medium and low-priority sub-data blocks. Step S54: Reassemble all received sub-data blocks at the target node to obtain complete data.

6. The method of claim 2, wherein, In step S5, during the distribution operation, the following general steps are also performed: Based on the average link bandwidth and the data timeliness requirements, the number of data packets to be unpacked is dynamically determined and unpacked accordingly. A hash algorithm is used to generate a content digest of the data to be distributed and sent with the data packet; The target node verifies the content digest of the received data. If they match, they are reassembled; otherwise, a retransmission is requested.

7. The method of claim 1, wherein, in step S6, the performance indicators include at least transmission latency, transmission success rate, and node load rate; the optimization objective function is: $J = \omega_1 \cdot \frac{1}{t} + \omega_2 \cdot r - \omega_3 \cdot l$; wherein t is a transmission delay, r is a transmission success rate, l is a node load rate, and ω1, ω2, and ω3 are optimization weight coefficients; when the value of the optimization objective function is lower than a preset threshold, the parameters of the machine learning model and / or the pre-trained language model are updated using the gradient descent algorithm.

8. The dynamic data distribution method based on multi-attribute perception and intelligent decision-making according to claim 5, characterized in that the formula for calculating the priority score is as follows: $P(D_i) = \sum_{j=1}^{L} w_j \, x_{ij}$, wherein P(D_i) is the priority score of the i-th sub-data block, L is the number of attribute features involved in the priority scoring, w_j is the attribute weight, and x_ij is the standardized value of the j-th attribute of sub-data block D_i; the priority sequence of sub-data blocks is obtained by sorting in descending order of P(D_i).

9. A dynamic data distribution system based on multi-attribute perception and intelligent decision-making, used to implement the dynamic data distribution method based on multi-attribute perception and intelligent decision-making as described in any one of claims 1 to 8, characterized in that it includes: a data feature processing module, used to extract, standardize, and filter multi-dimensional attribute features of the data to be distributed, and obtain a data attribute feature vector; an intelligent decision-making module, connected to the data feature processing module, used to perform scenario classification and weight calculation based on the data attribute feature vector, to perform strategy matching and score calculation through a pre-trained language model, and finally to fuse the results and decide the optimal distribution strategy; a dynamic distribution execution module, connected to the intelligent decision-making module, used to perform data distribution operations according to the optimal distribution strategy, supporting multiple strategies such as nearest distribution, shortest path distribution, and fusion distribution, and integrating intelligent packet splitting and content digest verification functions; and a dynamic optimization feedback module, connected to the dynamic distribution execution module and the intelligent decision-making module, used to collect distribution performance indicators, construct an optimization objective function, and dynamically adjust the parameters of the models in the intelligent decision-making module.