A method for identifying important software based on a software dependency graph
By constructing a software dependency graph and combining algorithms for quantity, depth, integration, and ecosystem indicators, the problem of neglecting dependency relationships in software impact assessment is solved, and a more comprehensive assessment of software importance is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV
- Filing Date
- 2023-08-14
- Publication Date
- 2026-06-12
AI Technical Summary
Existing software impact assessment methods fail to effectively utilize the dependencies between software, resulting in the neglect of these relationships during identification and an incomplete impact assessment.
We construct a software dependency graph, identify important nodes through algorithms for quantity, depth, integration, and ecosystem indicators, and use the AHP method to set weights to comprehensively evaluate the software's impact.
By constructing a software dependency graph, the impact of software can be comprehensively assessed, providing a more accurate ranking of software importance and improving the comprehensiveness and scalability of the assessment.
Figure CN117009239B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of software engineering, and in particular to an important software identification method based on software dependency graphs.
Background Technology
[0002] Software is both a system product and a research and development tool; its essence is the crystallization of human thought, and it is also an important carrier of science and technology. However, in a research evaluation system primarily driven by publications, the value of software has long been underestimated or even ignored. Quantifying software impact can not only better reflect the contributions of developers but also provide decision-making basis for science and technology management departments in formulating science and technology policies and investing in software research and development, in order to address potential national bottlenecks or gain patent advantages in specialized fields. Although some scholars have studied software impact, there is relatively little research evaluating software impact from the perspective of R&D dependence.
[0003] Currently, there are different approaches in the field of software influence research. The most direct and earliest method is the questionnaire survey, which involves distributing questionnaires to different groups such as universities, enterprises, and users to investigate the importance of software in their work, which types of software are more important, and which software within the same type is more influential. Another approach is to measure software influence using the number of citations in scientific papers, which is more common from a research and development perspective. However, due to improper citations, the software citation omission rate often exceeds 50%. Given the serious problem of missing software citations, some scholars have proposed using the frequency and breadth of software mentions in academic papers to measure software influence. Furthermore, indicators such as the number of software downloads, registered users, user reviews, and reuse frequency can also be used to evaluate software influence.
Summary of the Invention
[0004] To address the problem that existing software influence assessment methods rarely construct software dependency graphs to identify software influence from a graph theory perspective, leading to the neglect of relationships between software during the identification process, this invention proposes an important software identification method based on software dependency graphs. By constructing software dependency graphs, this invention comprehensively obtains software influence indicators. This invention is achieved through the following technical solutions.
[0005] A method for identifying important software based on software dependency graphs, characterized by the following steps:
[0006] Step 1) Define a directed graph to describe the dependencies between software, parse the software and dependency description files, obtain the dependencies between software, and construct a software dependency graph;
[0007] Step 2) Use the quantitative index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of the quantitatively important nodes;
[0008] Step 3) Use the depth index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of depth important nodes;
[0009] Step 4) Use the integration index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of important nodes by integration degree;
[0010] Step 5) Use the ecological index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of ecologically important nodes;
[0011] Step 6) Use the AHP method to assign weights to each indicator and combine different indicators to obtain the overall influence ranking of the nodes.
[0012] The aforementioned method for identifying important software based on software dependency graphs is characterized in that step 1) obtaining the dependencies between software and constructing a software dependency graph involves using software as nodes and the dependencies between software as edges to construct the software dependency graph. The ordered pair <software_a, software_b> represents an edge from node software_a to node software_b, meaning that software_a depends on software_b.
[0013] The above-mentioned method for identifying important software based on software dependency graphs is characterized in that the software described in step 1) is a software package in different programming languages.
[0014] The aforementioned method for identifying important software based on software dependency graphs is characterized in that step 2) of obtaining the ranking of quantitatively important nodes in the dependency graph using a quantitative index algorithm includes the following steps:
[0015] Step 21) Set the initial voting power Vp = 1 for each node in the dependency graph, so that all nodes start with the same value. Set the decay factor v1 to the reciprocal of the average in-degree of the nodes in the graph.
[0016] Step 22) Calculate the score P for each node. For the target node i, P_i is the sum of the voting power of the nodes that depend on the target node;
[0017] Step 23) Sort all nodes by score and take the node with the highest score, N_highest1. Subtract v1 from the voting power of every node that voted for N_highest1; if the result is negative, set it to 0.
[0018] Step 24) Add node N_highest1 to the end of list R1 and set the voting power of node N_highest1 to 0;
[0019] Step 25) Repeat steps 22) through 24) until all nodes have been added to list R1.
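One possible reading of steps 21) to 24) in symbols is given below. The symbols V (node set), E (edge set), and In(i) (the set of nodes that depend directly on node i) are introduced here only for illustration and do not appear in the original text.

```latex
v_1 = \frac{1}{\bar{k}_{\mathrm{in}}} = \frac{|V|}{|E|},
\qquad
P_i = \sum_{j \in \mathrm{In}(i)} V_p(j),
\qquad
V_p(j) \leftarrow \max\bigl(0,\; V_p(j) - v_1\bigr)
\quad \text{for all } j \in \mathrm{In}(N_{\mathrm{highest1}})
```

The first identity uses the fact that the average in-degree of a directed graph equals the number of edges divided by the number of nodes.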
[0020] The aforementioned method for identifying important software based on software dependency graphs is characterized in that step 3) of obtaining the ranking of deeply important nodes in the dependency graph using a depth index algorithm includes the following steps:
[0021] Step 31) Calculate the dependency level D of each node; the calculation method is as follows: for target node i, if the target node i does not depend on any node, then the dependency level D_i of the target node i is 1; otherwise, take the square mean of the dependency levels of all nodes that the target node depends on and add 1 to obtain the dependency level of the target node.
[0022] Step 32) Calculate the score of all nodes; the calculation method is as follows: for target node i, subtract D_i from the dependency level of each node that directly or indirectly depends on target node i, sum the differences, and divide the sum by N-1, where N is the number of nodes in the dependency graph.
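Steps 31) and 32) might be written as follows. This is a sketch under two assumptions that the text does not settle: "square mean" is read as the quadratic mean (root mean square), and the graph is acyclic so that the recursion terminates. Out(i) (the nodes that i depends on directly), Anc(i) (the nodes that depend on i directly or indirectly), and S_i (the depth score) are illustrative symbols, not names from the original.

```latex
D_i =
\begin{cases}
1, & \mathrm{Out}(i) = \emptyset, \\[4pt]
\sqrt{\dfrac{1}{|\mathrm{Out}(i)|} \displaystyle\sum_{j \in \mathrm{Out}(i)} D_j^{2}} \; + 1, & \text{otherwise},
\end{cases}
\qquad
S_i = \frac{1}{N-1} \sum_{j \in \mathrm{Anc}(i)} \bigl(D_j - D_i\bigr)
```

For a chain in which app depends on lib and lib depends on util, this gives D = 1, 2, 3 for util, lib, app, and S_util = ((2-1) + (3-1)) / 2 = 1.5.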
[0023] The aforementioned method for identifying important software based on software dependency graphs is characterized in that step 4) of obtaining the ranking of important nodes in the dependency graph using an integration index algorithm includes the following steps:
[0024] Step 41) Set the initial voting power Vr = 1 for each path in the dependency graph, so that all paths start with the same value. Set the decay factor v2 = the reciprocal of the average path length of the graph minus 2.
[0025] Step 42) Calculate the score of each node; the calculation method is as follows: for the target node i, find the set R_i of paths in which node i appears as a non-endpoint node, and add up the voting power of the paths in the set to get the score of node i.
[0026] Step 43) Sort all nodes by score and take the node with the highest score, N_highest2. Subtract v2 from the voting power of every path that voted for N_highest2; if the result is negative, set it to 0.
[0027] Step 44) Add node N_highest2 to list R2;
[0028] Step 45) Repeat steps 42)-44) until all nodes have been added to list R2.
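In symbols, steps 41) to 43) could be read as shown below. The grouping of "the reciprocal of the average path length of the graph minus 2" is ambiguous in the text; this sketch assumes it means the reciprocal of (average path length minus 2), with path length counted in nodes so that the denominator is the average number of interior nodes per path. The symbols \bar{L} (average path length) and S_i (integration score) are illustrative.

```latex
v_2 = \frac{1}{\bar{L} - 2},
\qquad
S_i = \sum_{r \in R_i} V_r(r),
\qquad
V_r(r) \leftarrow \max\bigl(0,\; V_r(r) - v_2\bigr)
\quad \text{for all } r \in R_{N_{\mathrm{highest2}}}
```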
[0029] The above-mentioned method for identifying important software based on software dependency graphs is characterized in that step 5) of obtaining the ranking of ecologically important nodes in the dependency graph using an ecological index algorithm includes the following steps:
[0030] Step 51) Define the set of all nodes in the dependency graph as G. Calculate the subgraph of every node: for target node i, find the set of nodes that directly or indirectly depend on i, and construct the subgraph g_i from the nodes in this set and the dependencies between them in the original dependency graph;
[0031] Step 52) Select any node n and, in its subgraph, calculate the joint influence CI_i of each node i. The calculation method is as follows: for each node reachable from node i via a path of length l with no repeated edges, subtract 1 from its degree; sum the results; then multiply the sum by the degree of node i minus 1. In this method, l is taken as 3. Nodes are then sorted in descending order of their CI values.
[0032] Step 53) In the subgraph, delete the node with the highest CI value. Calculate λ by summing the CI values of the current nodes in the graph and dividing by the sum of their degrees, then taking the (l+1)th root of the result; if λ is greater than 1, return to step 52); otherwise, consider the graph destroyed, record the total number of nodes deleted at this time and use it as the score of node n;
[0033] Step 54) Delete n from G. If G is not empty, repeat steps 52) to 53).
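The two quantities in steps 52) and 53) can be written compactly as follows. Here k_i denotes the degree of node i in the current subgraph and \partial B(i, l) denotes the set of nodes reachable from i by a path of length l; both symbols are introduced for illustration. The expressions have the same form as the collective influence measure known from network science, although the text does not name it as such.

```latex
CI_i = (k_i - 1) \sum_{j \in \partial B(i,\, l)} (k_j - 1), \quad l = 3,
\qquad
\lambda = \left( \frac{\sum_i CI_i}{\sum_i k_i} \right)^{\frac{1}{l+1}}
```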
[0034] The aforementioned method for identifying important software based on software dependency graphs is characterized in that step 6) uses the AHP method to combine the four importance indicators into a comprehensive influence ranking of the nodes. The specific implementation process includes the following steps:
[0035] Step 61) Use the AHP method to construct a feature matrix for the four indicators in steps 2)-5) and calculate the weight of each indicator;
[0036] Step 62) For each of the four indicators obtained in steps 2)-5), sort all software in descending order of score. For each of the top K software under an indicator, take the natural logarithm of its score and divide it by the mean of the logarithms of those K scores; the result is the actual score of that software for the corresponding indicator. The actual score of software not included in the top K is 0. In this method, K is 2000. Using the weights obtained in step 61), calculate the weighted average of the actual scores of each software across the indicators to obtain its comprehensive influence.
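Step 62) can be summarized in symbols as below. The notation is illustrative only: s_m(x) is the score of software x under indicator m, T_m is the set of the top K software under indicator m, w_m is the AHP weight of indicator m with the weights summing to 1, and I(x) is the comprehensive influence.

```latex
\tilde{s}_m(x) =
\begin{cases}
\dfrac{\ln s_m(x)}{\dfrac{1}{K} \displaystyle\sum_{y \in T_m} \ln s_m(y)}, & x \in T_m, \\[12pt]
0, & x \notin T_m,
\end{cases}
\qquad
I(x) = \sum_{m=1}^{4} w_m \, \tilde{s}_m(x),
\qquad
K = 2000
```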
[0037] The above-described method for identifying important software based on software dependency graphs is characterized in that the weights of different indicators in step 6) can be adjusted according to the actual situation of the constructed software dependency graph.
[0038] The present invention adopts the above technical solution and has the following beneficial effects:
[0039] (1) This method constructs a software dependency graph and incorporates the relationships between software into the scope of software influence identification.
[0040] (2) This method uses software dependency graphs to identify important software, which can introduce graph theory related techniques and provide new ideas for the field of software engineering.
[0041] (3) This method is scalable. The weights assigned between different dimensional methods can be adjusted according to the graph features of the software dependency graph.
[0042] (4) This method uses indicators of different dimensions, making it more comprehensive.
Attached Figure Description
[0043] Figure 1 is an overall flowchart of the important software identification method based on software dependency graphs according to an embodiment of the present invention.
[0044] Figure 2 is a flowchart illustrating the calculation of the quantity indicator in an embodiment of the present invention.
[0045] Figure 3 is a flowchart illustrating the calculation of the depth index in an embodiment of the present invention.
[0046] Figure 4 is a flowchart illustrating the calculation of the integration index in an embodiment of the present invention.
[0047] Figure 5 is a flowchart illustrating the calculation of the ecological indicator in an embodiment of the present invention.
Detailed Implementation
[0048] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0049] This invention proposes an important software identification method based on software dependency graphs to solve the existing problems. As shown in Figure 1, the important software identification method based on software dependency graphs in this embodiment of the invention includes the following steps:
[0050] Step 1) Define a directed graph to describe the dependencies between software, parse the software and dependency description files, obtain the dependencies between software, and construct a software dependency graph. The specific implementation process is as follows:
[0051] A software dependency graph is constructed using software as nodes and the dependencies between software as edges. The ordered pair <software_a, software_b> represents an edge from node software_a to node software_b, meaning that software_a depends on software_b.
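As a minimal sketch of step 1), the following Python code builds the edge set from dependency data that has already been parsed. The input format (`manifests`, a mapping from a package name to the packages it depends on) and the function name are assumptions made for illustration; parsing real dependency description files is not shown.

```python
from collections import defaultdict


def build_dependency_graph(manifests):
    """Build a directed dependency graph from already-parsed manifests.

    `manifests` is a hypothetical input: a dict mapping each package name
    to the names of the packages it depends on.
    """
    nodes = set()
    edges = set()
    for package, requirements in manifests.items():
        nodes.add(package)
        for required in requirements:
            nodes.add(required)
            if required != package:  # assumption: self-dependencies are ignored
                edges.add((package, required))  # package depends on required

    dependencies = defaultdict(set)  # a -> packages that a depends on
    dependents = defaultdict(set)    # b -> packages that depend on b
    for a, b in edges:
        dependencies[a].add(b)
        dependents[b].add(a)
    return nodes, edges, dependencies, dependents


manifests = {"app": ["lib", "util"], "lib": ["util"], "util": []}
nodes, edges, dependencies, dependents = build_dependency_graph(manifests)
print(sorted(edges))
```

Keeping both adjacency directions is a design convenience: the indicator algorithms mostly ask which nodes depend on a given node, while the depth index also needs the nodes that a given node depends on.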
[0052] Step 2) Use the quantitative index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of the quantitatively important nodes, as shown in Figure 2, which is a flowchart illustrating the calculation of the quantity indicator in an embodiment of the present invention. The specific implementation process includes the following steps:
[0053] Step 21) Set the initial voting power Vp = 1 for each node in the dependency graph, so that all nodes start with the same value. Set the decay factor v1 to the reciprocal of the average in-degree of the nodes in the graph.
[0054] Step 22) Calculate the score P for each node. For the target node i, P_i is the sum of the voting power of the nodes that depend on the target node;
[0055] Step 23) Sort all nodes by score and take the node with the highest score, N_highest1. Subtract v1 from the voting power of every node that voted for N_highest1; if the result is negative, set it to 0.
[0056] Step 24) Add node N_highest1 to the end of list R1 and set the voting power of node N_highest1 to 0;
[0057] Step 25) Repeat steps 22) through 24) until all nodes have been added to list R1.
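Steps 21) to 25) can be sketched in Python as follows. Two points are assumptions rather than part of the described method: only nodes not yet in R1 are candidates in each round, and ties are broken alphabetically. The function name and the `(a, b)` edge tuples (a depends on b) are illustrative.

```python
def quantity_ranking(nodes, edges):
    """Return list R1: nodes ranked by the voting procedure of steps 21)-25)."""
    dependents = {n: set() for n in nodes}
    for a, b in edges:  # a depends on b, so a votes for b
        dependents[b].add(a)

    # Step 21: equal initial voting power; decay = 1 / average in-degree.
    avg_in_degree = len(edges) / len(nodes) if nodes else 0.0
    decay = 1.0 / avg_in_degree if avg_in_degree > 0 else 0.0
    power = {n: 1.0 for n in nodes}

    remaining = set(nodes)
    ranking = []
    while remaining:
        # Step 22: score = total voting power of direct dependents.
        scores = {n: sum(power[v] for v in dependents[n]) for n in remaining}
        # Step 23: pick the top node (alphabetical tie-break is an assumption).
        best = max(sorted(remaining), key=lambda n: scores[n])
        for voter in dependents[best]:
            power[voter] = max(0.0, power[voter] - decay)
        # Step 24: record the node and silence its own vote.
        power[best] = 0.0
        ranking.append(best)
        remaining.remove(best)
    return ranking


edges = [("app", "lib"), ("app", "util"), ("lib", "util"), ("tool", "util")]
print(quantity_ranking({"app", "lib", "util", "tool"}, edges))
```

In this example the average in-degree is 1, so each voter for the selected node loses its whole voting power after one round; on graphs with a higher average in-degree the decay is more gradual.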
[0058] Step 3) Use the depth index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of depth-important nodes, as shown in Figure 3, which is a flowchart illustrating the calculation of the depth index in an embodiment of the present invention. The specific implementation process includes the following steps:
[0059] Step 31) Calculate the dependency level D of each node; the calculation method is as follows: for target node i, if the target node i does not depend on any node, then the dependency level D_i of the target node i is 1; otherwise, take the square mean of the dependency levels of all nodes that the target node depends on and add 1 to obtain the dependency level of the target node.
[0060] Step 32) Calculate the score of all nodes; the calculation method is as follows: for target node i, subtract D_i from the dependency level of each node that directly or indirectly depends on target node i, sum the differences, and divide the sum by N-1, where N is the number of nodes in the dependency graph.
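The depth index of steps 31) and 32) might look like this in Python. The sketch assumes that "square mean" means the quadratic mean (root mean square) and that the dependency graph has no cycles; neither is stated explicitly in the text. The recursive helper is fine for small examples but would need an iterative rewrite for very deep graphs.

```python
import math


def depth_scores(nodes, edges):
    """Return (dependency levels D, depth scores) for steps 31) and 32)."""
    dependencies = {n: set() for n in nodes}
    dependents = {n: set() for n in nodes}
    for a, b in edges:  # a depends on b
        dependencies[a].add(b)
        dependents[b].add(a)

    level = {}

    def dependency_level(n):
        # Step 31: 1 for nodes without dependencies, otherwise the
        # quadratic mean of the dependencies' levels plus 1 (assumption).
        if n not in level:
            if not dependencies[n]:
                level[n] = 1.0
            else:
                squares = [dependency_level(d) ** 2 for d in dependencies[n]]
                level[n] = math.sqrt(sum(squares) / len(squares)) + 1.0
        return level[n]

    def all_dependents(n):
        # Nodes that depend on n directly or indirectly.
        seen, stack = set(), list(dependents[n])
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(dependents[m])
        return seen

    total = len(nodes)
    scores = {}
    for n in nodes:
        d_n = dependency_level(n)
        # Step 32: sum of level differences, divided by N - 1.
        diff = sum(dependency_level(m) - d_n for m in all_dependents(n))
        scores[n] = diff / (total - 1) if total > 1 else 0.0
    return level, scores


levels, scores = depth_scores({"app", "lib", "util"},
                              [("app", "lib"), ("lib", "util")])
print(levels, scores)
```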
[0061] Step 4) Use the integration index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of important nodes by integration degree, as shown in Figure 4, which is a flowchart illustrating the calculation of the integration index in an embodiment of the present invention. The specific implementation process includes the following steps:
[0062] Step 41) Set the initial voting power Vr = 1 for each path in the dependency graph, so that all paths start with the same value. Set the decay factor v2 = the reciprocal of the average path length of the graph minus 2.
[0063] Step 42) Calculate the score of each node; the calculation method is as follows: for the target node i, find the set R_i of paths in which node i appears as a non-endpoint node, and add up the voting power of the paths in the set to get the score of node i.
[0064] Step 43) Sort all nodes by score and take the node with the highest score, N_highest2. Subtract v2 from the voting power of every path that voted for N_highest2; if the result is negative, set it to 0.
[0065] Step 44) Add node N_highest2 to list R2;
[0066] Step 45) Repeat steps 42)-44) until all nodes have been added to list R2.
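A rough Python sketch of steps 41) to 45) follows. The text does not define which paths are counted, so this example makes several assumptions: a path is any directed simple path with at least one interior node, path length is counted in nodes, v2 is read as 1 / (average path length - 2), already-ranked nodes are no longer candidates, and ties are broken alphabetically. Enumerating all simple paths is exponential in general, so this is only practical on small graphs.

```python
def enumerate_paths(nodes, edges):
    """All directed simple paths with at least three nodes (assumption)."""
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
    paths = []

    def extend(path):
        if len(path) >= 3:
            paths.append(tuple(path))
        for nxt in successors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in sorted(nodes):
        extend([start])
    return paths


def integration_ranking(nodes, edges):
    """Return list R2: nodes ranked by the path-voting procedure."""
    paths = enumerate_paths(nodes, edges)
    # Step 41: equal path voting power; decay = 1 / (average length - 2).
    if paths:
        avg_len = sum(len(p) for p in paths) / len(paths)
        decay = 1.0 / (avg_len - 2)  # avg_len >= 3, so this is safe
    else:
        decay = 0.0
    power = [1.0] * len(paths)

    through = {n: [] for n in nodes}  # node -> indices of paths it is inside
    for idx, path in enumerate(paths):
        for n in path[1:-1]:
            through[n].append(idx)

    remaining, ranking = set(nodes), []
    while remaining:
        # Step 42: score = voting power of paths passing through the node.
        scores = {n: sum(power[i] for i in through[n]) for n in remaining}
        best = max(sorted(remaining), key=lambda n: scores[n])
        # Step 43: weaken every path that voted for the selected node.
        for i in through[best]:
            power[i] = max(0.0, power[i] - decay)
        ranking.append(best)  # Step 44
        remaining.remove(best)
    return ranking


edges = [("app", "lib"), ("tool", "lib"), ("lib", "core"), ("core", "base")]
print(integration_ranking({"app", "tool", "lib", "core", "base"}, edges))
```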
[0067] Step 5) Use the ecological index algorithm to calculate the important nodes in the dependency graph and obtain the ranking of ecologically important nodes, as shown in Figure 5, which is a flowchart illustrating the calculation of the ecological indicator in an embodiment of the present invention. The specific implementation process includes the following steps:
[0068] Step 51) Define the set of all nodes in the dependency graph as G. Calculate the subgraph of every node: for target node i, find the set of nodes that directly or indirectly depend on i, and construct the subgraph g_i from the nodes in this set and the dependencies between them in the original dependency graph;
[0069] Step 52) Select any node n and, in its subgraph, calculate the joint influence CI_i of each node i. The calculation method is as follows: for each node reachable from node i via a path of length l with no repeated edges, subtract 1 from its degree; sum the results; then multiply the sum by the degree of node i minus 1. In this method, l is taken as 3. Nodes are then sorted in descending order of their CI values.
[0070] Step 53) In the subgraph, delete the node with the highest CI value. Calculate λ by summing the CI values of the current nodes in the graph and dividing by the sum of their degrees, then taking the (l+1)th root of the result; if λ is greater than 1, return to step 52); otherwise, consider the graph destroyed, record the total number of nodes deleted at this time and use it as the score of node n;
[0071] Step 54) Delete n from G. If G is not empty, repeat steps 52) to 53).
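The ecological index of steps 51) to 54) can be approximated by the sketch below. It rests on assumptions that go beyond the text: the subgraph is treated as undirected when computing degrees and distances, "reachable via a path of length l" is taken to mean shortest-path distance exactly l, CI values are recomputed after each deletion before λ is evaluated, and λ is treated as 0 when no edges remain. All function names are illustrative.

```python
from collections import deque

RADIUS = 3  # the parameter l in steps 52) and 53)


def collective_influence(adj, radius=RADIUS):
    """CI_i = (k_i - 1) * sum of (k_j - 1) over nodes j at distance `radius`."""
    ci = {}
    for i in adj:
        dist, queue, frontier = {i: 0}, deque([i]), []
        while queue:
            u = queue.popleft()
            if dist[u] == radius:
                frontier.append(u)
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ci[i] = (len(adj[i]) - 1) * sum(len(adj[j]) - 1 for j in frontier)
    return ci


def ecological_scores(nodes, edges):
    """Score each node n by how many deletions destroy its subgraph g_n."""
    dependents = {n: set() for n in nodes}
    for a, b in edges:  # a depends on b
        dependents[b].add(a)

    scores = {}
    for n in nodes:
        # Step 51: nodes that depend on n directly or indirectly.
        members, stack = set(), list(dependents[n])
        while stack:
            m = stack.pop()
            if m not in members:
                members.add(m)
                stack.extend(dependents[m])
        adj = {m: set() for m in members}
        for a, b in edges:
            if a in members and b in members and a != b:
                adj[a].add(b)
                adj[b].add(a)

        removed = 0
        while adj:
            ci = collective_influence(adj)                    # Step 52
            target = max(sorted(adj), key=lambda m: ci[m])
            for w in adj.pop(target):                         # Step 53
                adj[w].discard(target)
            removed += 1
            ci = collective_influence(adj)
            degree_sum = sum(len(v) for v in adj.values())
            lam = ((sum(ci.values()) / degree_sum) ** (1.0 / (RADIUS + 1))
                   if degree_sum else 0.0)
            if lam <= 1:
                break
        scores[n] = removed
    return scores


print(ecological_scores({"app", "lib", "util"},
                        [("app", "lib"), ("lib", "util")]))
```

On a graph this small the subgraphs collapse after a single deletion; the iteration in step 53) only becomes visible on larger, denser ecosystems.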
[0072] Step 6) Use the AHP method to assign weights to each indicator, and combine different indicators to obtain the overall influence ranking of the nodes. The specific implementation process includes the following steps:
[0073] Step 61) Use the AHP method to construct a feature matrix for the four indicators in steps 2)-5) and calculate the weight of each indicator;
[0074] Step 62) For each of the four indicators obtained in steps 2)-5), sort all software in descending order of score. For each of the top K software under an indicator, take the natural logarithm of its score and divide it by the mean of the logarithms of those K scores; the result is the actual score of that software for the corresponding indicator. The actual score of software not included in the top K is 0. In this method, K is 2000. Using the weights obtained in step 61), calculate the weighted average of the actual scores of each software across the indicators to obtain its comprehensive influence.
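Steps 61) and 62) could be sketched as follows. The text does not say how weights are extracted from the AHP matrix, so this example uses the row geometric mean, a common approximation of the principal eigenvector; the pairwise comparison values in the usage example are hypothetical. The normalization assumes indicator scores greater than 1 so that the logarithms are positive; non-positive scores are simply left out of the top K here.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP weights by the normalized row geometric mean."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]


def normalized_top_k(scores, k=2000):
    """Step 62: ln(score) / mean ln(score) for the top k, 0 for the rest."""
    positive = [item for item in scores.items() if item[1] > 0]
    top = sorted(positive, key=lambda kv: kv[1], reverse=True)[:k]
    logs = {name: math.log(value) for name, value in top}
    mean_log = sum(logs.values()) / len(logs) if logs else 0.0
    if mean_log == 0:
        return {name: 0.0 for name in scores}
    return {name: logs.get(name, 0.0) / mean_log for name in scores}


def comprehensive_influence(indicator_scores, weights, k=2000):
    """Weighted average of the normalized scores of each software."""
    normalized = [normalized_top_k(s, k) for s in indicator_scores]
    names = set().union(*indicator_scores)
    return {
        name: sum(w * norm.get(name, 0.0)
                  for w, norm in zip(weights, normalized))
        for name in names
    }


# Hypothetical pairwise comparison matrix for the four indicators
# (quantity, depth, integration, ecology); the values are illustrative only.
pairwise = [
    [1.0, 2.0, 2.0, 1.0],
    [0.5, 1.0, 1.0, 0.5],
    [0.5, 1.0, 1.0, 0.5],
    [1.0, 2.0, 2.0, 1.0],
]
weights = ahp_weights(pairwise)
indicators = [{"util": 40.0, "lib": 12.0, "app": 3.0}] * 4
print(weights)
print(comprehensive_influence(indicators, weights, k=2))
```

Because the weights sum to 1, the weighted sum in `comprehensive_influence` is the weighted average described in step 62).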
[0075] The above description is merely a preferred embodiment of the present invention, but the specific embodiments described herein are only for explaining the present invention and are not intended to limit the present invention. Any simple modifications, equivalent changes, and alterations made by those skilled in the art to the above embodiments based on the technical essence of the present invention without departing from the principles and spirit of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for identifying important software based on software dependency graphs, characterized in that, The method includes the following steps: Step 1) Define a directed graph to describe the dependencies between software, parse the software and dependency description files, obtain the dependencies between software, and construct a software dependency graph; Step 2) Use the quantitative index algorithm to calculate the important nodes in the dependency graph and obtain the quantitative index score of each software node; Step 3) Use the depth index algorithm to calculate the important nodes in the dependency graph and obtain the depth index score of each software node; Step 3) specifically includes: Step 31) Calculate the dependency level D of each node; the calculation method is as follows: for target node i, if the target node i does not depend on any node, then the dependency level Di of the target node i is 1; otherwise, calculate the square mean of the dependency levels of all nodes that the target node depends on + 1 as the dependency level of the target node. Step 32) Calculate the score of all nodes; the calculation method is as follows: for target node i, subtract Di from the dependency level of all nodes that depend on or indirectly depend on target node i, calculate the number N of all nodes in the dependency graph, and then add the differences and divide by N-1. Step 4) Use the integration index algorithm to calculate the integration index score of each software node by calculating the important nodes in the dependency graph; Step 4) specifically includes: Step 41) Set the initial voting capacity Vr=1 for each path in the dependency graph. The initial values are all the same. Set the decay factor v2=the reciprocal of the average path length of the graph minus 2. 
Step 42) Calculate the score of each node; the calculation method is as follows: for the target node i, find the set Ri of paths in which node i appears as a non-endpoint node, and add up the voting power of the paths in the set to get the score of node i. Step 43) Sort all the nodes by their scores and take the node with the highest score, Nhighest2. Subtract v2 from the voting power of all paths that vote for node Nhighest2. If the result is negative, take 0. Step 44) Add node Nhighest2 to list R2; Step 45) Repeat steps 42) through 44) until all nodes have been added to list R2; Step 5) Use the ecological index algorithm to calculate the ecological index score of each software node from the important nodes in the dependency graph; Step 5) specifically includes: Step 51) Define the set of all nodes in the dependency graph as G; calculate the subgraph of every node by finding the set of nodes that directly or indirectly depend on the target node i, and constructing the subgraph gi from the nodes in the set and the dependency relationships between them in the original dependency graph; Step 52) Select any node n and in its subgraph, calculate the joint influence CIi for each node i; the calculation method is to subtract 1 from the degree of each node that can be reached from node i by a path of length l with no repeated edges, sum the results, and then multiply the sum by the degree of node i minus 1; l is 3; sort the nodes in descending order of CI value. Step 53) In the subgraph, delete the node with the highest CI value; calculate λ by summing the CI values of the current nodes in the graph and dividing by the sum of their degrees, then taking the (l+1)th root of the result; if λ is greater than 1, return to step 52); otherwise, consider the graph to be destroyed, record the total number of nodes deleted at this time and use it as the score of node n. Step 54) Delete n from G. If G is not empty, repeat steps 52) to 53).
Step 6) Use the AHP method to assign weights to each indicator and combine different indicators to obtain the node's overall influence score.
2. The method for identifying important software based on software dependency graphs according to claim 1, characterized in that, Step 1) obtaining the dependencies between software and constructing a software dependency graph involves building the software dependency graph with software as nodes and the dependencies between software as edges. The ordered pair <software_a, software_b> represents an edge from node software_a to node software_b, meaning that software_a depends on software_b during development.
3. The method for identifying important software based on software dependency graphs according to claim 1 or 2, characterized in that, The software described in step 1) is software in different programming languages.
4. The method for identifying important software based on software dependency graphs according to claim 1, characterized in that, Step 2) involves obtaining the quantitative index scores for each software node in the dependency graph using a quantitative index algorithm. The specific implementation process includes the following steps: Step 21) Set the initial voting power Vp=1 for each node in the dependency graph. The initial values are all the same. Set the decay factor v1=the reciprocal of the average in-degree of the nodes in the graph. Step 22) Calculate the score P for each node; for the target node i, the calculation method is to add up the voting power of the nodes that depend on the target node; Step 23) Sort all the nodes by their scores and take the node with the highest score, Nhighest1. Subtract v1 from the voting power of all nodes that voted for Nhighest1. If the result is negative, take 0. Step 24) Add node Nhighest1 to the end of list R1 and set the voting power of node Nhighest1 to 0; Step 25) Repeat steps 22) through 24) until all nodes are added to list R1.
5. The method for identifying important software based on software dependency graphs according to claim 1, characterized in that, Step 6) Using the AHP method to obtain the comprehensive influence ranking of the four important nodes includes the following steps: Step 61) Use the AHP method to construct a feature matrix for the four indicators in steps 2)-5) and calculate the weight of each indicator; Step 62) Sort the scores of the four software indicators of all software obtained in Steps 2)-5) in descending order of indicators. Take the natural logarithm of the scores of the top K software in each indicator and divide it by the average score of the logarithm of the top K software. The calculated score is the actual score of the software in the corresponding indicator. The actual score of software not included in the top K is 0. K is 2000. According to the weight of each indicator obtained in Step 61), the weighted average of the actual scores of each software in each indicator is calculated as its comprehensive influence.
6. The method for identifying important software based on software dependency graphs according to claim 1 or 5, characterized in that, The weights of different indicators and the parameters in the calculation of specific indicators in step 6) can be adjusted according to the actual situation of the constructed software dependency graph.