A weighted network key node identification method based on statistical significance test
By introducing node strength and a stochastic model into a weighted network, combined with a significance test method, the problems of missing statistical assumptions and high false alarm rate in node identification in existing technologies are solved, and quantitative and reliable identification of key nodes is achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: DALIAN UNIV
- Filing Date: 2026-01-28
- Publication Date: 2026-06-19
Figure CN122241037A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of complex network mining technology, and specifically relates to a method for identifying key nodes in a weighted network based on statistical significance testing.
Background Technology
[0002] Complex network research originates from long-term empirical analysis of real-world systems such as computer networks, engineering technology networks, brain functional networks, and social relationship networks. With the improvement of large-scale data acquisition and computing capabilities, it has gradually developed into an important research direction in network science and information science. In various practical application scenarios, a few nodes in a network often play a crucial role in information transmission, structural stability, or system function maintenance. Therefore, how to quickly and accurately identify such influential core nodes in complex networks has significant theoretical research value and broad engineering application implications. For example, in social networks, it can be used to discover potential opinion leaders and key disseminators; in the fields of information dissemination and public safety, it can be used to design viral marketing strategies or suppress the spread of rumors and diseases; in transportation systems, it can be used to identify bottleneck nodes to alleviate congestion; and in power systems, it helps prevent cascading failures caused by the failure of critical nodes.
[0003] To address these needs, scholars have proposed the concept of "node centrality" in the field of complex network structure analysis and, based on it, have constructed various index systems for measuring the criticality of nodes. In existing research, measures such as degree centrality, betweenness centrality, and closeness centrality are widely used to characterize the structural position and potential influence of nodes, and have achieved certain application results in different types of unweighted networks. However, most of these methods assume that an edge between two nodes only reflects whether a connection exists, without distinguishing the strength of the connection.
[0004] In real-world network systems, interactions between nodes often exhibit varying strengths. Information such as communication frequency, traffic volume, similarity level, or physical coupling is typically weighted and added to network edges, resulting in a weighted topology. To address this characteristic, patents CN120196922A and CN120528807A propose two methods for identifying key nodes, aiming to more accurately reflect the criticality of nodes in weighted network environments. While these methods can be applied to key node mining in weighted networks, several shortcomings remain. First, most existing methods only output a relative ranking of node criticality, lacking a basis for determining whether a node is statistically critical, making it difficult to distinguish between truly critical nodes and pseudo-critical nodes caused by random fluctuations. Second, in practical applications, network data inevitably contains noise and measurement errors. Existing methods lack effective quality control mechanisms, potentially identifying numerous core nodes with no real significance even in random or near-random networks, thus affecting the reliability of the analysis results. Furthermore, different networks vary significantly in size, edge density, and weight distribution. Existing centrality measures are highly dependent on specific data characteristics, making it difficult to set universal judgment thresholds and limiting the application of the methods in various scenarios. Moreover, the construction of existing indicators is mostly based on empirical design or heuristic rules, lacking a rigorous statistical foundation corresponding to random graph models, making it difficult to derive and quantify the significance of node criticality levels.
[0005] Therefore, existing technologies lack a method that evaluates node criticality in weighted networks within a statistical hypothesis testing framework, that is, a method that gives a quantitative, statistically significant determination of node criticality while preserving the weight information.
Summary of the Invention
[0006] To address the problems of missing statistical verification, high false alarm rate, and difficulty in unifying thresholds in existing weighted network node identification methods, this invention provides a weighted network key node identification method based on statistical significance testing. By introducing node strength as the test statistic and constructing a weighted random network model, the significance probability value of each node's keyness is calculated, so that the degree of keyness is expressed as a significance probability. Combined with a testing and correction mechanism, the method identifies key nodes in weighted networks quantitatively and with statistical significance, thereby improving the reliability and stability of the identification results.
[0007] The technical solution adopted by this invention to solve its technical problem is as follows: A method for identifying key nodes in a weighted network based on statistical significance testing includes the following steps:
S1. Obtain the weighted network data to be analyzed, represent it as a weighted network containing nodes, edges and edge weights, and calculate the node degree and node strength of each node, wherein the node strength is the sum of the weights of the edges connected to the node.
S2. Construct a weighted random network model under the null hypothesis, generate random control networks while keeping the number of network nodes, the number of edges, and the set of edge weights unchanged, and calculate the significance probability value and its upper bound for each node based on the weighted random network model and the random control networks.
S3. Compare the upper bound of the significance probability value with the preset significance threshold, determine the nodes whose significance probability values meet the condition as key nodes in the weighted network, and output the identification result.
[0008] Furthermore: The node strength mentioned in step S1 is obtained by accumulating the weights of all edges connected to the target node, and is used to characterize the connection strength of the node in the weighted network.
[0009] Furthermore, the null hypothesis in step S2 is specifically: for each node, a null hypothesis and an alternative hypothesis are constructed respectively, wherein the null hypothesis is: the node is not a critical node, and the alternative hypothesis is: the node is a critical node.
[0010] Furthermore: the weighted random network model in step S2 is constructed based on the Erdős–Rényi random graph model; a random control network with the same edge weight set as the original weighted network is generated by randomly selecting node pairs from the complete graph to form an edge set and randomly assigning the edge weights of the original weighted network to that edge set.
[0011] Further: In step S2, calculating the significance probability value and its upper bound for each node includes the following steps: divide the edge weights of the network into k groups such that all weights within a group are equal, arrange the groups in descending order of weight, and record the number of edge weights in each group; calculate the number A of generated random control networks in which the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network; calculate the number B of all generated random control networks; based on the values of A and B, calculate the probability $p_i$ that, in a random control network, the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network, which is the significance probability value of the node; and calculate the upper bound of this probability value.
[0012] Furthermore: The specific formula for calculating A is: [formula missing in source]; where $c_1, c_2, \ldots, c_k$ denote, for a selection of edges drawn from all the edges of the network, the numbers of selected edges whose weights belong to groups 1 to k respectively [constraint missing in source]; the remaining symbols denote the degree of the node, the number of ways of selecting edges from all the edges such that the sum of the selected weights is not less than the strength value of the node, and the strength value of the node; N is the number of nodes in the network.
[0013] Furthermore: The specific formula for calculating B is: [formula missing in source].
[0014] Furthermore: the probability value $p_i$ is the proportion of random control networks that satisfy the condition, $p_i = A / B$.
[0015] Furthermore: the upper bound of the probability value $p_i$ is denoted as: [formula missing in source].
[0016] Further: Step S3 specifically involves: sorting the upper bounds of the probability values of all nodes, and using the Benjamini-Hochberg method in conjunction with a preset significance level to set a threshold for the test results of multiple nodes; when the upper bound of the probability value corresponding to a node is not greater than the threshold, the node is determined to be a key node in the weighted network.
[0017] The beneficial effects of this invention include: The method approaches the identification of key nodes in weighted networks from the perspective of statistical significance and recasts it as a rigorous statistical hypothesis testing problem, filling the gap left by the absence of a significance measurement framework in this field. For weighted networks, the method constructs a null hypothesis and a weighted random network model to calculate the significance probability value of each node's keyness and its upper bound, providing a quantitative basis for deciding whether a node is statistically significant. With node strength as the core test statistic, edge weight information is systematically incorporated into the significance testing framework for the first time, giving a statistically grounded quantitative evaluation of node keyness while maintaining computational efficiency. Furthermore, this invention introduces a multiple hypothesis testing control strategy in the node significance determination stage, using the Benjamini–Hochberg method to jointly constrain the test results of all nodes. The final set of key nodes is obtained with the overall misclassification rate under control, which improves the stability, reliability, and engineering application value of the key node identification results.
Attached Figure Description
[0018] Figure 1 is a flowchart of the overall method of the present invention; Figure 2 is a diagram of a real weighted network structure.
Detailed Implementation
[0019] The technical solution of the present invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] Furthermore, the technical features involved in the different embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
[0021] This invention proposes a weighted network key node identification method based on statistical significance testing. Node strength, i.e., the sum of the weights of all edges connected to a node, is used as the test statistic. An Erdős–Rényi random graph model is used to construct the null distribution. The probability value of each node is then calculated; it can be defined as the proportion of random graphs, among a large number of random graphs following the Erdős–Rényi model, in which the node strength is not less than the observed value. To allow the calculation to be completed in polynomial time, this invention further derives and adopts an upper bound for this probability value, and performs the subsequent multiple-testing correction and key node determination on that upper bound.
[0022] Example 1: The specific implementation steps of the present invention include: S1. Obtain the weighted network data to be analyzed, represent it as a weighted network containing nodes, edges, and edge weights, and calculate the degree and strength of each node, where the node strength is the sum of the weights of the edges connected to the node. Specifically, this includes the following steps:
S11. Obtain the structured data of the weighted network $G = (V, E, W)$, in which $V$ is the set of nodes, $E$ is the set of edges, and $W$ is the set of weights of the edges;
S12. Calculate the degree of each node;
S13. Compute the strength value $S_i$ of node $v_i$. It is obtained by summing the weights of all edges connected to the target node, and is used to characterize the connection strength of the node in the weighted network: $S_i = \sum_{j \in \Gamma_i} w_{ij}$, where $\Gamma_i$ is the set of neighbors of node $v_i$ and $w_{ij}$ is the weight of the edge connecting node $v_i$ and node $v_j$.
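Steps S12 and S13 can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is given as a list of undirected `(u, v, w)` edge triples; the function name `degree_and_strength` and the input layout are illustrative choices, not part of the patent text.

```python
from collections import defaultdict


def degree_and_strength(edges):
    """Return (degree, strength) dicts for an undirected weighted edge list.

    edges: iterable of (u, v, w) triples; this layout is an assumption
    made for illustration only.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        # Each undirected edge contributes to both of its endpoints.
        for node in (u, v):
            degree[node] += 1       # S12: number of incident edges
            strength[node] += w     # S13: sum of incident edge weights
    return dict(degree), dict(strength)


if __name__ == "__main__":
    toy_edges = [("a", "b", 2.0), ("a", "c", 3.0), ("b", "c", 1.0)]
    deg, st = degree_and_strength(toy_edges)
    print(deg["a"], st["a"])
```

In the toy triangle, node `a` touches the edges of weight 2.0 and 3.0, so its degree is 2 and its strength is 5.0.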
[0023] S2. Under the null hypothesis, construct a weighted random network model. While keeping the number of network nodes, the number of edges, and the set of edge weights unchanged, generate random control networks. Based on the weighted random network model and the random control networks, calculate the significance probability value of each node and its upper bound, and perform the statistical test. Specifically, this includes the following steps: S21. For each node $v_i$, construct the corresponding null and alternative hypotheses: null hypothesis $H_0$: node $v_i$ is not a key node; alternative hypothesis $H_1$: node $v_i$ is a key node.
[0024] S22. Construct a weighted random network model based on the Erdős–Rényi random graph model. The weighted random network model generates a random control network as follows:
S221. Randomly and independently draw unordered node pairs from the complete graph, as many as there are edges in the original network, to form an edge set;
S222. Randomly and independently assign the edge weights of the original network one-to-one to the edge set, yielding a random control network whose edge weight set is identical to that of the original network;
S23. Divide the edge weights into k groups such that all weights within a group are equal, arrange the groups in descending order of weight, and record the number of edge weights in each group;
S24. Among all generated random control networks, the number A of random control networks in which the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network is calculated as follows: [formula missing in source]; where $c_1, c_2, \ldots, c_k$ denote, for a selection of edges drawn from all the edges of the network, the numbers of selected edges whose weights belong to groups 1 to k respectively [constraint missing in source]; the remaining symbols denote the degree of the node, the number of ways of selecting edges from all the edges such that the sum of the selected weights is not less than the strength value of the node, and the strength value of the node; N is the number of nodes in the network.
[0025] S25. The number B of all random control networks generated based on the weighted random graph model is calculated as follows: [formula missing in source].
[0026] S26. Compute the probability value $p_i$ that, in a random control network, the degree of the node is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network, together with its upper bound: [formulas missing in source]
[0027] In step S2, the upper bound of the probability value is obtained by analytical derivation, and the upper bound is used to replace the probability value for subsequent statistical determination, so as to reduce the computational complexity.
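The quantity that the upper bound stands in for can also be approximated by direct simulation. The sketch below is a Monte Carlo estimate of the proportion described in paragraph [0021] and steps S221 to S26; it is not the analytical upper bound of the invention, whose formula is not reproduced here. It assumes nodes are labelled `0..n_nodes-1`, that the drawn node pairs are distinct, and that weights are integers or otherwise exactly comparable; the name `monte_carlo_p_values` and its parameters are hypothetical.

```python
import itertools
import random


def monte_carlo_p_values(n_nodes, edges, n_samples=2000, seed=0):
    """Estimate, per node, the fraction of random control networks in which
    both degree and strength are >= their observed values.

    edges: list of (u, v, w) with 0 <= u, v < n_nodes (assumed layout).
    """
    rng = random.Random(seed)
    weights = [w for _, _, w in edges]
    m = len(edges)

    # Observed degree and strength in the real weighted network.
    obs_deg = [0] * n_nodes
    obs_str = [0] * n_nodes
    for u, v, w in edges:
        obs_deg[u] += 1
        obs_deg[v] += 1
        obs_str[u] += w
        obs_str[v] += w

    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    hits = [0] * n_nodes
    for _ in range(n_samples):
        # S221: draw m distinct unordered node pairs from the complete graph.
        pairs = rng.sample(all_pairs, m)
        # S222: assign the original weights one-to-one in random order.
        shuffled = weights[:]
        rng.shuffle(shuffled)
        deg = [0] * n_nodes
        strength = [0] * n_nodes
        for (u, v), w in zip(pairs, shuffled):
            deg[u] += 1
            deg[v] += 1
            strength[u] += w
            strength[v] += w
        for i in range(n_nodes):
            if deg[i] >= obs_deg[i] and strength[i] >= obs_str[i]:
                hits[i] += 1
    return [h / n_samples for h in hits]


if __name__ == "__main__":
    # Toy star network: node 0 is connected to every other node.
    star = [(0, j, 5) for j in range(1, 6)]
    print(monte_carlo_p_values(6, star))
```

In the toy star network the hub receives an estimate near zero, while each leaf, with degree 1, receives a large one. Simulation cost grows with the number of samples needed to resolve small probabilities, which is the motivation the text gives for using an analytical upper bound instead.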
[0028] S3. Compare the upper bound of the significance probability value with a preset significance threshold, determine the nodes whose significance probability values meet the condition as key nodes in the weighted network, and output the identification result. Specifically: the upper bounds of the probability values of all nodes are sorted in ascending order, and the Benjamini–Hochberg method is used with a preset significance level $\alpha$ to set a threshold for the test results of the multiple nodes, the threshold for the $j$-th smallest upper bound among $n$ nodes being $\frac{j}{n}\alpha$, in order to control the misjudgment rate in the key node identification process. The sorted upper bounds are compared with their thresholds in turn; when the upper bound of the probability value corresponding to a node is not greater than the threshold, the node is determined to be a key node in the weighted network.
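The thresholding in step S3 follows the standard Benjamini–Hochberg step-up procedure, which can be sketched as follows. The sketch assumes the per-node values (probability values or their upper bounds) are already available as a list indexed by node; the function name `benjamini_hochberg` and the default `alpha=0.05` are illustrative assumptions, not values taken from the patent.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of nodes declared significant by the
    Benjamini-Hochberg step-up procedure at level alpha."""
    n = len(p_values)
    # Node indices ordered by ascending p-value (or upper bound).
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose value is within its own threshold.
        if p_values[idx] <= rank / n * alpha:
            cutoff_rank = rank
    # Every node ranked at or below the critical rank is retained.
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(toy, alpha=0.05))
```

Because the procedure is step-up, a node whose value exceeds its own rank threshold is still retained if a node ranked after it passes; this matches the "critical sequence number" described in Example 2. Feeding upper bounds instead of exact probability values makes the decision more conservative.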
[0029] Example 2: With reference to Figure 2, the real weighted network contains 34 nodes and 78 edges. Substituting these values: the weighted random network model generates a random control network by randomly and independently selecting 78 unordered node pairs from the complete graph to form an edge set, and then randomly and independently assigning the 78 edge weights of the real network one-to-one to the edge set, yielding a random control network whose edge weight set is identical to that of the real network. The 78 weights are divided into 7 groups, with all weights within a group being equal; the groups are arranged in descending order of weight, and the number of weights in each group is recorded: [values missing in source].
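The grouping of equal weights used in step S23 and in this example can be sketched as below. The 78 weights of the example network are not listed in the text, so the sketch runs on a small made-up weight list; the function name `group_weights` is hypothetical.

```python
from collections import Counter


def group_weights(weights):
    """Group equal edge weights and return (weight, count) pairs
    in descending order of weight."""
    counts = Counter(weights)
    return sorted(counts.items(), key=lambda kv: kv[0], reverse=True)


if __name__ == "__main__":
    # Made-up weights for illustration; not the weights of Example 2.
    toy_weights = [3, 1, 3, 2, 1, 3]
    for weight, count in group_weights(toy_weights):
        print(weight, count)
```

The length of the returned list corresponds to the number of groups k, and the counts correspond to the per-group numbers of edge weights recorded in step S23.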
[0030] Among all the generated random control networks, the number A of random control networks in which the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network is calculated as follows: [formula missing in source]
[0031] where the symbols denote [constraint missing in source], the degree of the node, and the number of ways of selecting edges from the 78 edges such that the sum of the selected weights is not less than the strength value of the node.
[0032] The number B of all random control networks generated based on the weighted random graph model is calculated as follows: [formula missing in source]
[0033] The probability value $p_i$ that, in a random control network, the degree of the node is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network is: [formula missing in source]
[0034] Upper bounds are calculated for two intermediate quantities [formulas missing in source], from which the upper bound of the probability value of the node is obtained: [formula missing in source]
[0035] Arrange the probability values of the 34 nodes obtained in step S2 in ascending order: [values missing in source]. The Benjamini–Hochberg method is used to set the significance threshold, and the critical sequence number is calculated: [formula missing in source]
[0036] All nodes whose sequence numbers do not exceed the critical sequence number are retained as the final significant nodes, and these nodes and their corresponding strengths are written to the results file, completing the selection and output of the key nodes of the network.
[0037] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.
Claims
1. A method for identifying key nodes in a weighted network based on statistical significance testing, characterized in that it includes the following steps: S1. Obtain the weighted network data to be analyzed, represent it as a weighted network containing nodes, edges and edge weights, and calculate the node degree and node strength of each node, wherein the node strength is the sum of the weights of the edges connected to the node. S2. Construct a weighted random network model under the null hypothesis, generate random control networks while keeping the number of network nodes, the number of edges, and the set of edge weights unchanged, and calculate the significance probability value and its upper bound for each node based on the weighted random network model and the random control networks. S3. Compare the upper bound of the significance probability value with the preset significance threshold, determine the nodes whose significance probability values meet the condition as key nodes in the weighted network, and output the identification result.
2. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 1, characterized in that: The node strength mentioned in step S1 is obtained by accumulating the weights of all edges connected to the target node, and is used to characterize the connection strength of the node in the weighted network.
3. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 2, characterized in that: The null hypothesis in step S2 is as follows: for each node, a null hypothesis and an alternative hypothesis are constructed respectively, wherein the null hypothesis is: the node is not a critical node, and the alternative hypothesis is: the node is a critical node.
4. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 3, characterized in that: The weighted random network model described in step S2 is constructed based on the Erdős–Rényi random graph model; a random control network with the same edge weight set as the original weighted network is generated by randomly selecting node pairs from the complete graph to form an edge set and randomly assigning the edge weights of the original weighted network to that edge set.
5. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 4, characterized in that: In step S2, calculating the significance probability value and its upper bound for each node includes the following steps: divide the edge weights of the network into k groups such that all weights within a group are equal, arrange the groups in descending order of weight, and record the number of edge weights in each group; calculate the number A of generated random control networks in which the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network; calculate the number B of all generated random control networks; based on the values of A and B, calculate the probability $p_i$ that, in a random control network, the node's degree is greater than or equal to its degree in the real weighted network and its strength value is greater than or equal to its strength value in the real weighted network, which is the significance probability value of the node; and calculate the upper bound of this probability value.
6. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 5, characterized in that: The specific formula for calculating A is: [formula missing in source]; where $c_1, c_2, \ldots, c_k$ denote, for a selection of edges drawn from all the edges of the network, the numbers of selected edges whose weights belong to groups 1 to k respectively [constraint missing in source]; the remaining symbols denote the degree of the node, the number of ways of selecting edges from all the edges such that the sum of the selected weights is not less than the strength value of the node, and the strength value of the node; N is the number of nodes in the network.
7. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 6, characterized in that: The specific formula for calculating B is: [formula missing in source].
8. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 7, characterized in that: the specific calculation formula of the probability value $p_i$ is as follows: [formula missing in source].
9. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 8, characterized in that: the upper bound of the probability value $p_i$ is denoted as: [formula missing in source].
10. The method for identifying key nodes in a weighted network based on statistical significance testing according to claim 9, characterized in that: Step S3 specifically involves: sorting the upper bounds of the probability values of all nodes, and using the Benjamini–Hochberg method in conjunction with a preset significance level to set a threshold for the test results of multiple nodes; when the upper bound of the probability value corresponding to a node is not greater than the threshold, the node is determined to be a key node in the weighted network.