An API misuse detection and correction method based on feedback mechanism
By employing a feedback-based API misuse detection method, which utilizes API usage graphs and frequent pattern mining algorithms to detect API misuse and adjusts the dataset based on user feedback, this approach addresses the issues of low detection accuracy and insufficient correction in existing technologies, achieving higher precision in API misuse detection and correction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
- Filing Date
- 2023-04-04
- Publication Date
- 2026-06-12
AI Technical Summary
Existing API misuse detection methods suffer from low detection accuracy, high false alarm rate, and lack of effective correction suggestions, making it difficult to effectively detect and repair software defects caused by API misuse.
We employ a feedback-based approach, collecting source code of both correct and misused APIs. We then use API usage graphs and frequent pattern mining algorithms to discover patterns, combine graph distance algorithms to detect API misuse, and adjust the dataset based on user feedback to provide correction suggestions.
It improves the accuracy of API misuse detection and the precision of correction suggestions, reduces false alarms, and enhances the security and efficiency of software development.
Figure CN116483700B_ABST
Description
Technical Field
[0001] This invention belongs to the field of software engineering technology, specifically relating to a method for detecting and correcting API misuse based on a feedback mechanism.
Background Technology
[0002] In modern software development, developers often rely on third-party libraries that provide reusable functionality, accessed through Application Programming Interfaces (APIs). APIs provide software developers with a means to interact with software development kits, libraries, operating systems, frameworks, and cloud services. Using APIs, developers can implement specific methods and directly complete corresponding functions by calling the corresponding API, without needing to access the source code or understand the internal workings of the API. Therefore, by using APIs, software developers can simplify their work, improve efficiency and code quality, and reduce the overhead of reinventing existing functionality by leveraging existing software, thus lowering development costs.
[0003] When software developers call APIs, they need to follow the constraints and usage guidelines. For example, exception handling is required when using file read / write streams to read and write files. However, due to the complexity of APIs themselves, the often implicit assumptions behind usage constraints, incomplete or ambiguous API documentation, and untimely updates, developers face significant challenges in learning and using APIs, leading to frequent API misuse issues during software development. Furthermore, API misuse also often occurs due to a disconnect between software developers and the internal workings of the APIs they use, or oversights by the API users themselves.
[0004] API misuse refers to violations of the proper usage constraints of an API, such as incorrect method calls, missing condition checks, and missing exception handling. In real-world projects, API misuse can lead to functional errors, performance issues, and security vulnerabilities, becoming a common cause of software performance degradation, crashes, and vulnerabilities, posing significant security risks to software development. Furthermore, due to varying levels of security awareness among developers and the lack of high-quality API documentation, software problems caused by API misuse persist, seriously jeopardizing software security. Therefore, detecting API misuse is a crucial task in software development, and ideally, the development environment should be able to provide accurate corrective suggestions for detected API misuse.
[0005] To reduce API misuse, many API misuse detection methods have been proposed, which can be basically divided into the following two categories:
[0006] The first category involves inference based on API documentation. This method uses natural language processing to analyze API documents and employs heuristic language patterns to infer specific types of API constraints, extracting usage rules to detect API misuse. For example, Ren et al. proposed a method using a fine-grained API constraint knowledge graph to detect whether API usage violates known usage constraints, such as call order and preconditions, thereby detecting API misuse. This method develops an open information extraction approach, crawling online API documentation to obtain API call constraints, then converting them into a declaration graph and comparing it with the source code to detect violations of API call constraints. However, due to limitations imposed by API documentation—namely, in many libraries, developers are unwilling or lack the ability to write high-quality documentation—many API constraints cannot be correctly inferred from their documentation. Furthermore, because constraint extraction from API documentation cannot be well integrated with actual software development processes, the accuracy of API misuse detection needs improvement.
[0007] Another approach involves transforming API usage instances into API call patterns based on an existing set of API projects, extracting API call rules, and then using these rules to detect API misuse. Amann et al. proposed an API misuse detection tool called MuDetect. This method mines API usage graphs from cross-project code instances, using cross-project data to improve the extraction of API call patterns, and then uses these patterns to detect API misuse. Many static API misuse detectors mine usage patterns (equivalent API usages that occur frequently) and report any deviation from these patterns as potential API misuse. These methods all rest on the assumption that any deviation from a frequent usage pattern is a potential misuse, but existing detectors still produce a large number of false positives because there may be uncommon but correct usages that do not conform to the mined patterns.
[0008] In the area of API misuse correction, automated defect repair methods are generally employed. This involves creating patches using the program's test suite as a specification of its expected behavior, and automatically generating patches to fix software defects, thereby improving the efficiency of defect repair. Zhang et al. developed Seader, a sample-based detection tool that infers vulnerability repair patterns and applies these patterns to vulnerability detection and remediation recommendations. Seader infers API misuse templates by comparing code snippets, combining intra-program and inter-program analysis to search for API misuses and provide highly accurate remediation suggestions. However, current automated defect repair methods still suffer from shortcomings in fixing API misuse defects, including limited defect types that can be repaired, reliance on predefined repair templates, and low efficiency.
[0009] Meanwhile, many current API misuse detection tools can only detect API misuse without providing correction suggestions or defect remediation methods. Therefore, current API misuse detection methods still have certain shortcomings, and new methods are urgently needed to improve existing technologies.
Summary of the Invention
[0010] The purpose of this invention is to propose a novel method for detecting API misuse based on a feedback mechanism, while also providing corrective suggestions for detected API misuse. Our new method comprises four main phases: data collection, code pattern mining, API misuse detection, and API misuse correction. The main objectives of each phase are as follows:
[0011] During the data collection phase, a large number of high-quality API correct usage projects and API misuse code sets are collected to obtain a representative and comprehensive collection of source code, ensuring the richness and diversity of API types.
[0012] In the code pattern mining phase, API usage patterns are mined using API usage graphs and frequent pattern mining algorithms to obtain a comprehensive dataset of correct / incorrect API usage patterns.
[0013] In the API misuse detection phase, the code to be detected is judged against two datasets: the API correct usage pattern dataset and the API misuse pattern dataset. This reduces the influence of the naive assumption that any deviation from a frequent usage pattern is a potential misuse, thereby further improving the accuracy of API misuse detection and reducing false alarms.
[0014] During the API misuse correction phase, correction suggestions are proposed for detected API misuses, and the API usage pattern dataset is continuously adjusted based on user feedback. By recording user interaction information, the accuracy of API misuse detection and API correction suggestion generation is further improved.
[0015] A method for detecting and correcting API misuse based on a feedback mechanism includes the following steps:
[0016] 1) Collect source code sets for correct use of Application Programming Interfaces (APIs) and source code sets for API misuse;
[0017] 2) Extract correct API usage patterns and API misuse patterns from the correct API usage source code set and the incorrect API usage source code set;
[0018] 3) Given the code to be detected, convert the code to be detected into an API usage graph to be detected, and use the graph distance algorithm to detect whether API misuse has occurred;
[0019] 4) After detecting API misuse, propose modifications to the misused API.
[0020] Preferably, the implementation process of step 1) is as follows:
[0021] Step 1.1) Select a large, real-world open-source client codebase as the source code set for proper API use;
[0022] For the obtained API correct usage source code set, multiple source files of the API correct usage source code set are filtered according to the development language, and source files ending with .java are retained; the source files ending with .java are parsed by the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body contained in the source files ending with .java, the target API is extracted from it, and the corresponding usage examples of the target API are extracted using program slicing technology. The extracted usage examples are used as API correct usage examples;
[0023] Step 1.2) Obtain a collection of source code examples of API misuses through the collective wisdom of the technical Q&A website Stack Overflow;
[0024] Extract API types from official documentation, use a search engine to search for and link to the corresponding API types on the technical Q&A website Stack Overflow; at the same time, select examples of API misuse by searching for API types and keywords that appear in the title or question of posts.
[0025] Preferably, in step 2):
[0026] The process of converting a source code set into an API usage graph is as follows: 1) Represent objects, values, and literals in API usage using data nodes; 2) Represent method calls, operators, and instructions in API usage using action nodes; 3) Represent the control flow and data flow between the entities represented by data nodes and action nodes using edges, which are divided into eight types: receiver edges, parameter edges, definition edges, order edges, condition edges, throw edges, handle edges, and synchronization edges. Here, API usage covers both correct API usage and API misuse;
[0027] Modify and improve the obtained API usage graph: first, add a type attribute to each order edge of the API usage graph to indicate the order of method calls; second, distinguish fields and parameters from local variables when representing the information in the data nodes; finally, from the code blocks containing constructors and field initializations, find the statements that provide information necessary for identifying misuse, and link them through order edges to the API usage graphs of the methods that use them.
[0028] Step 2.1) Based on the conversion process, convert the API correct usage source code set and the API misuse source code set into API correct usage graphs and API misuse graphs respectively;
[0029] Step 2.2) Mining API correct usage patterns: Using the API correct usage graph and the minimum threshold min_sup as input to the frequent subgraph mining algorithm gSpan, identify subgraphs with a frequency higher than the minimum threshold min_sup, that is, mine the API correct usage patterns and obtain the API correct usage pattern dataset.
[0030] For API misuse pattern mining: Since there are many forms of API misuse, and each type of API misuse exists as a separate API misuse pattern, API misuse patterns are directly represented by API misuse graphs to obtain API misuse pattern datasets;
[0031] Step 2.3) For the obtained API correct usage patterns, perform an initial sorting based on the frequency of support.
[0032] Preferably, the implementation process of step 3) is as follows:
[0033] Step 3.1) Convert the code to be detected into a graph of API usage according to the conversion process;
[0034] Step 3.2) Detect whether API misuse has occurred using a graph distance algorithm:
[0035] For the API usage graph to be detected, a graph distance algorithm is used to compare the API usage graph to be detected with the API correct usage graph and the API misuse graph. The relative distance between the API usage graphs is used to determine whether API misuse has occurred.
[0036] First, define dist as a distance function. The relative distance between any two API usage graphs augi and augj is represented as dist(augi, augj)∈[0,1], where 0 indicates that the two API usage graphs are used in exactly the same way, and 1 indicates that the two usages are completely different. In addition, each API usage graph in the set of APIs used correctly is represented as augc, and each API usage graph in the set of APIs used incorrectly is represented as augm.
[0037] For each API usage graph to be detected, the API name is used as the search keyword. Full-text search is performed in the source code set of correct API usage and the source code set of misused API to obtain a set of API usage graph datasets describing correct usage C = {augc1, augc2, ..., augcm} and a set of misused API usage graph datasets M = {augm1, augm2, ..., augmn}.
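[0038] Based on the graph distance algorithm, the API usage graph to be detected is denoted as augt. When the API to be detected is used correctly, the following is expected:
[0039] $$\min_{\mathit{aug}_{c} \in C} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{c}) < \min_{\mathit{aug}_{m} \in M} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{m})$$
[0040] When the API to be detected is misused, the following is expected:
[0041] $$\min_{\mathit{aug}_{c} \in C} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{c}) \geq \min_{\mathit{aug}_{m} \in M} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{m})$$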
[0042] Preferably, the implementation process of step 4) is as follows:
[0043] Step 4.1) Present suggested code corrections:
[0044] For detected API misuse, the API correct usage pattern dataset is retrieved based on the API name, and the top 5 API usage patterns are selected based on the correction suggestion scores. Since these patterns are represented directly by API usage graphs, the top 5 API usage graphs are obtained. Then, the nodes and edges in the API usage graphs are traversed to extract the order of API calls, the parameters of each API call, and the type information of the returned results, and API code is generated.
[0045] Step 4.2) User selects and records feedback information:
[0046] For the five pieces of API code provided to the user for each detected API misuse in step 4.1), record the feedback information during user interaction, specifically divided into the following three types:
[0047] i) If the user chooses to adopt the suggested API code corresponding to a certain pattern, provide feedback to the API correct usage pattern dataset and set a positive feedback score for that API correct usage pattern;
[0048] ii) If the user chooses to rewrite the code themselves, the rewritten API code is recorded as a correct API usage pattern. This pattern is then converted into an API usage graph, included in the API correct usage pattern dataset, and given a positive feedback score.
[0049] iii) If the user rejects all suggested correction codes and does not rewrite the code themselves, the API usage graph to be detected is considered to be correct. It is reclassified from erroneous code to a correct API code pattern, a positive feedback score is set for it, and the API usage graph corresponding to the original API code is included in the API correct usage pattern dataset.
[0050] Step 4.3) Reorder using feedback information:
[0051] After receiving user feedback, correction suggestion scores are calculated and the original API correct usage pattern dataset is reordered: When no user feedback was initially generated, the correction suggestion score for each possible corrected API usage graph is calculated using the following formula:
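[0052] $$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{ci})} + v \cdot \mathrm{Frequent}(i)$$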
[0053] Where FinalScore(i) represents the score of the correction suggestion for the i-th correction API usage graph, Frequent(i) represents the frequency of support, and u and v are weight coefficients;
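[0054] After user feedback is generated, the correction suggestion score for each possible corrected API usage graph is calculated using the following formula:
[0055] $$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{ci})} + v \cdot \mathrm{Frequent}(i) + w \cdot \mathrm{Feedback}(i)$$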
[0056] Where Feedback(i) represents the corresponding feedback score, and w is the weight corresponding to the feedback score;
[0057] As user feedback increases, the API usage pattern dataset is continuously adjusted based on the suggested correction scores, thereby improving the accuracy of API misuse detection and making the proposed code modification templates for API misuse more accurate.
[0058] Beneficial effects:
[0059] 1) In this invention, we propose a method for API misuse detection and correction based on a feedback mechanism. This method utilizes two opposing datasets, a set of projects that use APIs correctly and a set of API misuse code, to detect API misuse from two opposite perspectives.
[0060] 2) This invention proposes a method for recording user interaction feedback information and using it to further adjust the dataset, and details the usage of different feedback information. It further improves the accuracy of API misuse detection and API misuse correction by utilizing user interaction.
Attached Figure Description
[0061] Figure 1 is a flowchart of API misuse detection and correction based on a feedback mechanism;
[0062] Figure 2 is an example of API usage and its corresponding AUG;
[0063] Figure 3 is a schematic diagram illustrating the specific process of recording feedback information.
Detailed Implementation
[0064] Phase 1. Data Collection
[0065] We collected datasets on correct API usage from high-quality client-side projects and datasets on API misuse from technical Q&A websites, obtaining a representative and comprehensive collection of source code to ensure the richness and diversity of API types.
[0066] Step 1.1 API Client Project Code Collection and Processing
[0067] To collect correct-usage source code, large-scale, real-world open-source client code projects on code hosting platforms are selected as the source code collection. Code hosting platforms can manage user code versions; currently popular platforms include GitHub, GitLab, BitBucket, CODING, and SourceForge. In this invention, we obtain data by collecting high-quality client code projects from GitHub. This invention filters Java open-source projects with more than 2000 stars on GitHub, comprehensively considering the project's domain, data size, and API complexity. The projects are then downloaded with the `git clone` command.
[0068] For the collected software projects, multiple source files within the projects are filtered based on the development language, retaining only those ending in .java. The Java source code is then parsed using the JavaParser tool to obtain the abstract syntax tree of each method body within the source files. The target API is extracted from this tree, and program slicing techniques are used to extract usage examples corresponding to the target API. Here, we will use the API usage examples obtained from high-quality client code as correct usage examples for subsequent exploration of correct API usage patterns.
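The language filter in this step can be sketched as follows. This is a minimal illustration in Python covering only the selection of files ending in .java; parsing with JavaParser and program slicing are not shown, and the function name `collect_java_sources` is hypothetical.

```python
from pathlib import Path
from typing import List


def collect_java_sources(project_root: str) -> List[Path]:
    """Return every regular file under project_root whose name ends with .java."""
    root = Path(project_root)
    # rglob walks the cloned project recursively; sorting keeps the order stable.
    return sorted(p for p in root.rglob("*.java") if p.is_file())
```

The returned paths would then be handed to the parsing stage described above.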
[0069] Step 1.2 Collection and processing of misused API codes on Q&A websites
[0070] In collecting source code examples of API misuse, we leveraged the collective wisdom of the technical Q&A website Stack Overflow. Because Stack Overflow is a popular technical Q&A website attracting millions of developers, we could utilize the collective wisdom of developers asking and answering questions on the site to obtain examples of API misuse and some corrective measures.
[0071] Due to the richness and diversity of API types, we chose to extract API types from the official documentation. Specifically, the official API documentation exists as a set of HTML web pages, each explaining a specific API type in detail, and the pages share a consistent formatting style. We extracted the corresponding API type by parsing the title of each webpage. Furthermore, since developers tend to use API abbreviations in Q&A, we extracted API abbreviations from the API documentation to accurately match APIs in Q&A and code samples. When there are conflicts in the same unqualified name across different packages, the fully qualified name is used for differentiation.
[0072] For API types extracted from the API documentation, a search engine was used to search Stack Overflow and link to the relevant API types. For examples of API misuse, Stack Overflow posts typically use keywords to describe the actual problem. Therefore, we chose to capture corresponding examples of API misuse by searching for API types and keywords appearing in the post title or question. The keywords here are "misuse," "error," "exception," "fail," "issue," "flaw," and "incorrect usage."
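The keyword-based selection described above can be sketched as follows. This is an illustrative Python sketch, assuming each post is available as a dictionary with hypothetical `title` and `question` fields; how posts are retrieved from Stack Overflow is not shown, and plain substring matching is an assumption.

```python
from typing import Dict, Iterable, List

# Keywords listed in the text above.
MISUSE_KEYWORDS = ("misuse", "error", "exception", "fail", "issue", "flaw", "incorrect usage")


def select_misuse_posts(posts: Iterable[Dict[str, str]], api_type: str) -> List[Dict[str, str]]:
    """Keep posts whose title or question mentions api_type and at least one keyword."""
    needle = api_type.lower()
    selected = []
    for post in posts:
        # Assumption: case-insensitive substring matching over title plus question.
        text = (post.get("title", "") + " " + post.get("question", "")).lower()
        if needle in text and any(keyword in text for keyword in MISUSE_KEYWORDS):
            selected.append(post)
    return selected
```

Substring matching is deliberately loose (for example, "fail" also matches "failure"); a stricter tokenizer could be substituted if it selects too many posts.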
[0073] Phase 2. Code Pattern Discovery
[0074] After obtaining the code representation of API usage, it is necessary to extract the correct and incorrect API usage patterns to generalize the API usage and facilitate subsequent API misuse detection in the code to be tested and template recommendations for correcting misuse.
[0075] Step 2.1 Convert the code into a graphical representation
[0076] Uncovering API usage patterns from code typically requires converting the code into an intermediate representation to achieve better generalization. Commonly used intermediate representations include call sequences, abstract syntax trees (ASTs), and graph structures. Compared to call sequences and ASTs, graph structures are better at representing interactions between variables and at encoding elements, structures, and data dependencies. Therefore, the code is converted into an API usage graph to uncover API usage patterns.
[0077] An API Usage Graph (AUG) is a directed connected graph with labeled nodes and edges that captures usage attributes related to identifying API misuse. The specific conversion process for transforming code into AUGs is as follows: 1) Represent objects, values, and literals in the API usage using data nodes; 2) Represent method calls, operators, and instructions in the API usage using action nodes; 3) Represent the control flow and data flow between the entities represented by data nodes and action nodes using edges, which are divided into eight types: receiver edges, parameter edges, definition edges, order edges, condition edges, throw edges, handle edges, and synchronization edges. An example of API usage and its corresponding AUG is shown in Figure 2.
[0078] To represent API constraints in more detail, we can modify and enhance the AUG to better assist in API misuse detection. First, we add a type attribute to each sequence edge to indicate the order of method calls; that is, the `order` edge in the AUG is represented as `order[precede]` for the preceding call constraint and `order[follow]` for the following call constraint, depending on the call order constraint. Second, we use fields and parameters, different from local variables, to represent the information in the data nodes. In addition to using the `para` parameter edge to indicate that a specific variable is passed as a parameter in a method call, we also mark the parameters of the current method in the data node as `param`. Finally, since constructors and field initializations provide necessary information for identifying misuse, we select the corresponding statements from the code block containing constructors and field initializations and link them to the AUG of the method using these fields via sequence edges. In practice, we can choose to use either the basic AUG or the modified AUG to represent API usage based on specific needs.
[0079] Converting code into AUGs makes it easier to compare the AUGs of the code to be detected with the AUGs of API constraints in the dataset, thereby more accurately detecting API misuse.
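As an illustration of the graph structure described in this step, the following Python sketch stores an AUG as labeled nodes and typed edges. It is a minimal data structure under stated assumptions, not the conversion procedure itself: `order` and `para` are the edge names used in the text, while the other abbreviations, the class name `AUG`, and the helper `iterator_example` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# "order" and "para" appear in the text; the other abbreviations are hypothetical.
EDGE_TYPES = {"recv", "para", "def", "order", "cond", "throw", "handle", "sync"}
NODE_KINDS = {"data", "action"}


@dataclass
class AUG:
    """A directed graph with labeled nodes and typed edges."""
    nodes: List[Tuple[str, str]] = field(default_factory=list)       # (kind, label)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (source, target, type)

    def add_node(self, kind: str, label: str) -> int:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes.append((kind, label))
        return len(self.nodes) - 1

    def add_edge(self, source: int, target: int, edge_type: str) -> None:
        # Strip a type attribute such as order[precede] before validating.
        base_type = edge_type.split("[", 1)[0]
        if base_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if not (0 <= source < len(self.nodes) and 0 <= target < len(self.nodes)):
            raise IndexError("edge endpoint is not a node of this graph")
        self.edges.append((source, target, edge_type))

    def labeled_edges(self) -> Set[Tuple[str, str, str]]:
        """Edges as (source label, target label, type), convenient for comparison."""
        return {(self.nodes[s][1], self.nodes[t][1], e) for s, t, e in self.edges}


def iterator_example() -> AUG:
    """Hand-built AUG for the Java fragment: if (it.hasNext()) { it.next(); }"""
    aug = AUG()
    it = aug.add_node("data", "Iterator")
    has_next = aug.add_node("action", "Iterator.hasNext()")
    nxt = aug.add_node("action", "Iterator.next()")
    aug.add_edge(it, has_next, "recv")
    aug.add_edge(it, nxt, "recv")
    aug.add_edge(has_next, nxt, "order")
    aug.add_edge(has_next, nxt, "cond")
    return aug
```

Representing edges by the labels of their endpoints, as `labeled_edges` does, makes two AUGs comparable without reference to internal node numbering.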
[0080] Step 2.2 Frequent Pattern Mining
[0081] In client-side project code, the frequency of API usage generally indicates correctness and determinism. Therefore, for mining correct API usage patterns, a frequent pattern mining algorithm is selected to extract frequent patterns from AUGs converted from project code. AUGs with a frequency of at least a specified frequency threshold are considered correct API usage patterns.
[0082] Here, the frequent subgraph mining algorithm gSpan is chosen for frequent pattern mining. Taking AUGs and a minimum support threshold (min_sup) as input, it outputs the subgraphs whose frequency is higher than min_sup. gSpan maps each subgraph to its minimum depth-first search (DFS) code and enumerates subgraphs in DFS code order using depth-first search. Furthermore, gSpan uses heuristics to prune branches while traversing the DFS code tree, so that subgraphs are mined in less time. The resulting subgraphs with a frequency higher than min_sup are the mined API usage patterns.
[0083] Since API misuse takes many forms, each type of API misuse can exist as a separate API misuse pattern. Therefore, API misuse patterns can be directly represented by misuse AUG, without the need for frequent pattern mining.
[0084] Step 2.3 Initial sorting of API usage patterns
[0085] After API usage patterns are discovered, the correct API usage patterns discovered by the frequent pattern mining algorithm are initially sorted according to their frequency of support. Subsequently, the API usage patterns are re-sorted based on feedback and a weighted average of frequency of support.
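The idea of support counting and initial sorting can be illustrated with a much simpler stand-in than gSpan. The Python sketch below is not gSpan: it only counts the support of single labeled edges across graphs, which corresponds to the first level of frequent subgraph mining, and then sorts the results by support. Each graph is assumed to be a set of hypothetical (source label, target label, edge type) triples, and the threshold is applied as "at least min_sup", which is an assumption.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source label, target label, edge type)


def frequent_edges(graphs: Iterable[FrozenSet[Edge]], min_sup: int) -> List[Tuple[Edge, int]]:
    """Return single-edge patterns with support >= min_sup, most frequent first."""
    support: Counter = Counter()
    for graph in graphs:
        # A pattern is counted once per graph that contains it.
        support.update(set(graph))
    frequent = [(edge, count) for edge, count in support.items() if count >= min_sup]
    # Initial ranking: descending support, ties broken by the edge itself.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))
```

A full miner would grow these frequent edges into larger subgraphs and use canonical DFS codes to avoid enumerating the same subgraph twice.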
[0086] Phase 3. API Misuse Detection
[0087] Given the code to be detected, it can be converted into an AUG to be detected, and the graph distance algorithm can be used to detect whether API misuse has occurred.
[0088] Step 3.1 Convert the code to be detected into a graphical representation.
[0089] To check the code to be detected, it must first be converted into a graph structure following the conversion process in step 2.1, yielding the AUGs to be detected (the test AUGs).
[0090] Step 3.2 Detect misuse using the graph distance algorithm.
[0091] Based on a large-scale dataset of correct and incorrect API usage patterns, for test AUGs, a graph distance algorithm is used to compare the test AUGs with the correct / incorrect usage pattern datasets. The relative distance between AUGs is used to determine whether the API usage is incorrect. The specific implementation process is as follows:
[0092] First, we define dist as a distance function, where the relative distance between any two AUGs (augi and augj) is represented as dist(augi, augj) ∈ [0, 1]. Here, 0 indicates that the two AUGs are used in exactly the same way, and 1 indicates that their usages are completely different. Furthermore, each AUG in the API correct usage pattern dataset is denoted as augc, and each AUG in the API misuse dataset is denoted as augm.
[0093] For each API usage to be detected, the API name is used as the search keyword. A full-text search is performed in the API correct usage pattern dataset and the API misuse pattern dataset to obtain a set of AUGs C = {augc1, augc2, ..., augcm} describing correct usage and a set of AUGs M = {augm1, augm2, ..., augmn} describing misuse.
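[0094] Based on the general idea of graph distance algorithms, the AUG to be detected is denoted as augt. When the AUG to be detected is a correct usage, the following is expected:
[0095] $$\min_{\mathit{aug}_{c} \in C} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{c}) < \min_{\mathit{aug}_{m} \in M} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{m})$$
[0096] When the AUG to be detected is a misuse, the following is expected:
[0097] $$\min_{\mathit{aug}_{c} \in C} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{c}) \geq \min_{\mathit{aug}_{m} \in M} \mathrm{dist}(\mathit{aug}_{t}, \mathit{aug}_{m})$$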
[0098] Therefore, a graph distance algorithm can be used to calculate the relative distances between the usage to be judged and the correct and incorrect usages, thereby determining whether API misuse has occurred. If the minimum distance between the AUG to be judged and any correct usage in C is less than the minimum distance between the AUG and any incorrect usage in M, then the AUG to be tested is considered a correct usage; otherwise, it is considered an API misuse.
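The decision rule in this step can be sketched in Python as follows. The text does not fix a particular graph distance, so the sketch assumes a Jaccard distance over labeled edges as a stand-in for dist; it satisfies the stated range [0, 1], with 0 for identical graphs. The function names and the treatment of an empty pattern set (distance 1) are assumptions.

```python
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[str, str, str]  # (source label, target label, edge type)
Graph = FrozenSet[Edge]


def dist(a: Graph, b: Graph) -> float:
    """Stand-in distance (Jaccard over labeled edges): 0 = identical, 1 = disjoint."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def is_misuse(aug_t: Graph, correct: Iterable[Graph], misuse: Iterable[Graph]) -> bool:
    """Correct only if the nearest correct pattern is strictly closer than the nearest misuse."""
    # Assumption: an empty pattern set counts as maximally distant.
    min_c = min((dist(aug_t, c) for c in correct), default=1.0)
    min_m = min((dist(aug_t, m) for m in misuse), default=1.0)
    return not (min_c < min_m)
```

For example, an AUG that calls `Iterator.next()` without a `hasNext()` condition edge sits closer to an unguarded misuse pattern than to the guarded correct pattern, so it is reported as a misuse.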
[0099] Phase 4. API Misuse Correction
[0100] After detecting API misuse, suggested modifications are provided to facilitate user corrections. User feedback is also recorded to further improve the accuracy of API misuse detection and correction. A flowchart illustrating the feedback recording process is shown in Figure 3.
[0101] Step 4.1 Present suggested code corrections
[0102] For detected API misuses, the correct API usage dataset is retrieved based on the API name, and the top 5 API usage patterns are selected according to their correction suggestion scores. The calculation of the correction suggestion score is explained in detail in step 4.3.
[0103] After obtaining the top 5 corrective AUGs with the highest correction suggestion scores, the nodes and edges in the AUGs are traversed to extract information such as the order of API calls, the parameters of each API call, and the type of the return result. Based on this information, the corresponding API code is generated for users to refer to when correcting misused APIs.
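One part of this traversal, recovering the order of API calls, can be sketched as a topological sort over order edges. The Python sketch below covers only call ordering; extracting parameters and return types and emitting Java code are not shown. It assumes the action nodes and order edges have already been read from the AUG, and the function name `call_sequence` is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import List, Tuple


def call_sequence(actions: List[str], order_edges: List[Tuple[str, str]]) -> List[str]:
    """Return the action labels in an order consistent with every (before, after) edge."""
    sorter: TopologicalSorter = TopologicalSorter({action: set() for action in actions})
    for before, after in order_edges:
        # 'after' depends on 'before', so 'before' is emitted first.
        sorter.add(after, before)
    # Raises graphlib.CycleError if the order edges are contradictory.
    return list(sorter.static_order())
```

When several calls are not related by any order edge, more than one sequence is valid and the sorter returns one of them.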
[0104] Step 4.2 User selects and records feedback information
[0105] For the five correct code templates provided to the user for each misused API in step 4.1, the feedback information during user interaction is recorded, which can be divided into the following three types:
[0106] i) If a user chooses to adopt the API correction suggestion corresponding to a certain pattern, then provide feedback to the API correct usage pattern dataset and set a positive feedback score for that correct API usage pattern.
[0107] ii) If the user chooses to rewrite it themselves, record the rewritten API code as the correct code pattern, convert it to AUG, include it in the correct usage pattern dataset, and set a positive feedback score.
[0108] iii) If the user rejects all the modification suggestions and does not rewrite the code themselves, the original AUG to be detected is considered to be correct. It is reclassified from erroneous code to a correct API code pattern, a positive feedback score is set for it, and the AUG corresponding to the original API code is included in the API correct usage pattern dataset.
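The three feedback cases above can be sketched as a single update function. This Python sketch assumes the correct usage pattern dataset keeps one feedback score per pattern key and that a positive score is accumulated by addition; the function name `record_feedback`, the string keys, and the default reward of 1.0 are hypothetical.

```python
from typing import Dict, Optional


def record_feedback(
    scores: Dict[str, float],
    adopted: Optional[str] = None,
    rewritten: Optional[str] = None,
    original: Optional[str] = None,
    reward: float = 1.0,
) -> str:
    """Update feedback scores for one user interaction and return the rewarded key."""
    if adopted is not None:
        key = adopted        # case i: the user adopted a suggested pattern
    elif rewritten is not None:
        key = rewritten      # case ii: the user's own rewrite becomes a correct pattern
    elif original is not None:
        key = original       # case iii: all suggestions rejected, original treated as correct
    else:
        raise ValueError("one of adopted, rewritten, or original is required")
    # Patterns not yet in the dataset are added with the reward as their first score.
    scores[key] = scores.get(key, 0.0) + reward
    return key
```

In cases ii and iii the key stands for an AUG that is newly added to the correct usage pattern dataset, so later detections can match it directly.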
[0109] Step 4.3 Reordering using feedback information
[0110] After receiving user feedback, it is necessary to calculate the correction suggestion score and reorder the original API correct usage pattern dataset.
[0111] The correction suggestion score is the basis for ranking the correct patterns corresponding to detected API misuses. Since a feedback mechanism is introduced here, the correction suggestion score is initially determined by both graph distance and frequent support. After the feedback mechanism starts running, the correction suggestion score is determined by graph distance, frequent support, and the feedback score. Here, graph distance refers to the distance, calculated using the graph distance algorithm, between the misuse AUG and each corresponding correct AUG in the API correct usage dataset. According to the definition in step 3.2, the misuse AUG is represented as `augt`, and each corresponding correct AUG is represented as `augci`. The graph distance can be represented as `dist(augt, augci)`. Since a smaller `dist(augt, augci)` means the two AUGs are more similar, the reciprocal of `dist` is taken in the final calculation.
[0112] Before any user feedback has been generated, the correction suggestion score for each candidate correct AUG is calculated as follows:
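[0113] $$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(aug_t, aug_{ci})} + v \cdot \mathrm{Frequent}(i)$$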
[0114] Here, FinalScore(i) is the correction suggestion score of the i-th candidate correct AUG, Frequent(i) is its frequent support, and u and v are the weights of the respective terms.
[0115] After user feedback has been generated, the correction suggestion score for each candidate correct AUG is calculated as follows:
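[0116] $$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(aug_t, aug_{ci})} + v \cdot \mathrm{Frequent}(i) + w \cdot \mathrm{Feedback}(i)$$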
[0117] Where Feedback(i) represents the corresponding feedback score, and w is the weight corresponding to the feedback score.
[0118] As user feedback accumulates, the API usage pattern dataset is continuously adjusted according to the correction suggestion scores, which improves the accuracy of API misuse detection and makes the suggested code modification templates for API misuse more accurate.
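The reordering in step 4.3 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: it assumes the score is a weighted sum of the reciprocal graph distance, the frequent support, and the feedback score, with weights u, v, and w. The names `Candidate`, `final_score`, and `top_k` are hypothetical, and the `eps` guard for a distance of 0 is an added assumption that the specification does not address.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    dist: float            # dist(augt, augci), expected in [0, 1]
    frequent: float        # frequent support of the pattern
    feedback: float = 0.0  # feedback score, 0 until the user responds


def final_score(c, u=1.0, v=1.0, w=0.0, eps=1e-6):
    """Weighted sum of reciprocal distance, frequent support, and feedback.

    With w == 0 this is the pre-feedback score; eps avoids division by zero
    when the two graphs are identical (assumption, not in the source).
    """
    return u / max(c.dist, eps) + v * c.frequent + w * c.feedback


def top_k(candidates, k=5, **weights):
    """Return the k highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=lambda c: final_score(c, **weights), reverse=True)
    return ranked[:k]


candidates = [
    Candidate("p1", dist=0.5, frequent=0.2),
    Candidate("p2", dist=0.25, frequent=0.1),
    Candidate("p3", dist=0.5, frequent=0.1, feedback=3.0),
]
before = [c.name for c in top_k(candidates, k=2)]        # feedback ignored (w=0)
after = [c.name for c in top_k(candidates, k=2, w=1.0)]  # feedback weighted in
```

With these illustrative numbers, `p3` is ranked last before feedback is taken into account and first once w is set to 1, which is the reordering effect the feedback mechanism is meant to produce.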
[0119] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on its differences from other embodiments. In particular, the device embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments. The above descriptions are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for detecting and correcting API misuse based on a feedback mechanism, characterized in that it includes the following steps:
1) Collect a source code set of correct Application Programming Interface (API) usage and a source code set of API misuse;
2) Identify API correct usage patterns and API misuse patterns from the API correct usage source code set and the API misuse source code set;
3) Given the code to be detected, convert the code into an API usage graph to be detected, and use the graph distance algorithm to detect whether API misuse has occurred;
4) After API misuse is detected, provide correction suggestions for the misused API;
The implementation process of step 4) is as follows:
Step 4.1) Present suggested code corrections: For a detected API misuse, the API correct usage pattern dataset is searched by API name, and the top 5 API misuse patterns are selected based on the correction suggestion scores. Since API misuse patterns are directly represented by API usage graphs, the top 5 API usage graphs are obtained. Then, the nodes and edges in the API usage graphs are traversed to extract the order of API calls, the parameters of each API call, and the type information of the returned results, and API code is generated;
Step 4.2) User selects and records feedback information: For the five API codes provided to the user for each API misuse pattern in step 4.1), record the feedback information during user interaction, which is divided into the following three types:
i) If the user chooses to adopt the API code corresponding to a certain API misuse pattern, feed this back to the API correct usage pattern dataset and set a positive feedback score for the API correct usage pattern;
ii) If the user chooses to rewrite the code themselves, the rewritten API code is recorded as a correct API usage pattern, which is then converted into an API usage graph, included in the API correct usage pattern dataset, and given a positive feedback score;
iii) If the user rejects all suggested correction codes and does not rewrite the code themselves, the API usage graph to be detected is considered correct. The API usage graph to be detected is changed from error code to a correct API code pattern, a positive feedback score is set for it, and the API usage graph corresponding to the original API code is included in the API correct usage pattern dataset;
Step 4.3) Reorder using feedback information: After user feedback is received, the correction suggestion scores are calculated and the original API correct usage pattern dataset is reordered:
When no user feedback has yet been generated, the correction suggestion score for each candidate corrected API usage graph is calculated as follows:
$$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(aug_t, aug_{ci})} + v \cdot \mathrm{Frequent}(i)$$
where FinalScore(i) represents the correction suggestion score of the i-th corrected API usage graph, and Frequent(i) represents the frequent support; u and v are weight coefficients; dist is defined as the distance function, and the relative distance between any two API usage graphs augi and augj is represented as dist(augi, augj)∈[0,1], where 0 indicates that the two API usage graphs have the same usage, and 1 indicates that the two usages are completely different;
After user feedback is generated, the correction suggestion score for each candidate corrected API usage graph is calculated as follows:
$$\mathrm{FinalScore}(i) = u \cdot \frac{1}{\mathrm{dist}(aug_t, aug_{ci})} + v \cdot \mathrm{Frequent}(i) + w \cdot \mathrm{Feedback}(i)$$
where Feedback(i) represents the corresponding feedback score, and w is the weight corresponding to the feedback score.
2. The API misuse detection and correction method based on a feedback mechanism as described in claim 1, characterized in that the implementation process of step 1) is as follows:
Step 1.1) Select a large, real-world open-source client codebase as the API correct usage source code set; filter the source files of this set by development language and retain the source files ending with .java; parse the source files ending with .java with the Java code parsing tool JavaParser to obtain the abstract syntax tree of each method body they contain, extract the target API from it, and extract the corresponding usage examples of the target API using program slicing technology; the extracted usage examples are used as API correct usage examples;
Step 1.2) Obtain the API misuse source code set through the collective wisdom of the technical Q&A website Stack Overflow: extract API types from official documentation, and use a search engine to search for and link to the corresponding API types on Stack Overflow; at the same time, select examples of API misuse by searching for API types and keywords that appear in the title or question of posts.
3. The API misuse detection and correction method based on a feedback mechanism as described in claim 2, characterized in that, in step 2):
The process of converting a source code set into API usage graphs is as follows:
1) Represent the objects, values, and text in the API usage using data nodes;
2) Represent method calls, operators, and instructions in the API usage using action nodes;
3) Represent the control flow and data flow between the entities represented by the data nodes and action nodes using edges, which are divided into eight types: receiving edges, parameter edges, definition edges, sequence edges, conditional edges, output edges, processing edges, and synchronization edges; here, API usage includes both correct API usage and API misuse;
Modify and improve the obtained API usage graph: first, add a type attribute to each order edge of the API usage graph to indicate the order of method calls; second, use fields and parameters, as distinct from local variables, to represent the information in the data nodes; finally, find the statements that identify misuse of basic information in the code blocks containing constructors and field initializations, and link the statements found, through order edges, to the API usage graphs of the methods that use them;
Step 2.1) Following the conversion process, convert the API correct usage source code set and the API misuse source code set into an API correct usage graph set and an API misuse graph set, respectively;
Step 2.2) Mine API correct usage patterns: using the API correct usage graph set and the minimum threshold min_sup as input to the frequent subgraph mining algorithm gSpan, identify the subgraphs whose frequency is higher than min_sup, that is, mine the API correct usage patterns and obtain the API correct usage pattern dataset. For API misuse pattern mining: since API misuse takes many forms, and each type of API misuse exists as a separate API misuse pattern, API misuse patterns are directly represented by API misuse graphs to obtain the API misuse pattern dataset;
Step 2.3) Perform an initial sorting of the obtained API correct usage patterns based on their frequent support.
4. The API misuse detection and correction method based on a feedback mechanism as described in claim 3, characterized in that the implementation process of step 3) is as follows:
Step 3.1) Convert the code to be detected into an API usage graph to be detected according to the conversion process;
Step 3.2) Detect whether API misuse has occurred using a graph distance algorithm: for the API usage graph to be detected, a graph distance algorithm is used to compare it with the API correct usage graphs and the API misuse graphs, and the relative distance between the API usage graphs is used to determine whether API misuse has occurred. Each API usage graph in the API correct usage source code set is represented as augc, and each API usage graph in the API misuse source code set is represented as augm. For each API usage graph to be detected, the API name is used as the search keyword, and a full-text search is performed in the API correct usage source code set and the API misuse source code set to obtain a set of API usage graphs describing correct usage, C={augc1, augc2, …, augcm}, and a set of API usage graphs describing misuse, M={augm1, augm2, …, augmn}. Based on the graph distance algorithm, the API usage graph to be detected is denoted as augt. When the API to be detected is used correctly, the following is expected: When the API to be detected is misused, the following is expected: