Repeated Pull Request detection method based on graph neural network

A detection method based on neural network technology, applied in the fields of neural learning methods, biological neural network models, and neural architectures. It addresses three problems of existing approaches: they consider few features, they rely on outdated methods for detecting similarity between changed code, and they detect the semantic similarity of code poorly. The method achieves the effect of reducing the workload.

Pending Publication Date: 2022-03-25
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Existing work falls into two main categories. The first computes similarity from the title and description of a Pull Request only, so it considers few features, whereas duplicate Pull Request detection needs to take more features into account.
The second category obtains similarity from the title, description, changed files, and changed code, and trains a model with machine learning. Although it considers the features adequately, its method for detecting similarity between changed code is relatively outdated and cannot detect the semantic similarity of code well.

Examples

Embodiment Construction

[0016] Detailed implementation

[0017] The technical solutions of the present invention are further elaborated below with reference to the drawings and the embodiments. After reading the present invention, those skilled in the art may make modifications in various equivalent forms, and such modifications fall within the scope defined by the appended claims of the present application.

[0018] The specific steps of the present invention are as follows:

[0019] 1) For each Pull Request in the obtained data set, retrieve the required feature information through the GitHub API, including the title, description, commit information, changed-file information, and changed-code information. Filter out Pull Requests that change more than 50 files or add or delete more than 10,000 lines of code.
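
Step 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the inventors' code. It assumes the per-PR metadata comes from the GitHub REST pull-request endpoint, whose JSON includes `changed_files`, `additions`, and `deletions`, and it reads "more than 10,000 lines added or deleted" as `additions + deletions` (an assumed interpretation). The helper names and the toy PR records are hypothetical. Commit messages and per-file diffs would need further calls (for example the `/commits` and `/files` sub-resources of the same endpoint), which are omitted here.

```python
import json
import urllib.request
from typing import List, Optional

# Thresholds taken from step 1 of the method.
MAX_CHANGED_FILES = 50
MAX_CHANGED_LINES = 10_000


def pr_api_url(owner: str, repo: str, number: int) -> str:
    """GitHub REST endpoint for a single pull request."""
    return f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"


def fetch_pull_request(owner: str, repo: str, number: int, token: Optional[str] = None) -> dict:
    """Download one pull request's JSON metadata (title, body, changed_files, ...)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(pr_api_url(owner, repo, number), headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def keep_pull_request(pr: dict) -> bool:
    """Apply the step-1 filter; summing additions and deletions is an assumption."""
    if pr["changed_files"] > MAX_CHANGED_FILES:
        return False
    return pr["additions"] + pr["deletions"] <= MAX_CHANGED_LINES


def filter_pull_requests(prs: List[dict]) -> List[dict]:
    """Keep only Pull Requests small enough for the later similarity steps."""
    return [pr for pr in prs if keep_pull_request(pr)]


# Toy metadata (illustrative values only, not from the patent).
prs = [
    {"number": 1, "changed_files": 3, "additions": 120, "deletions": 40},
    {"number": 2, "changed_files": 51, "additions": 10, "deletions": 0},
    {"number": 3, "changed_files": 5, "additions": 9_000, "deletions": 1_500},
]
print([pr["number"] for pr in filter_pull_requests(prs)])  # [1]
```

PR 2 is dropped for touching 51 files, and PR 3 for 10,500 changed lines.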

[0020] 2) Compute the similarity of the title, description, and commit information using natural language processing techniques, and use the longest common...
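
The patent abstract pairs these natural language features with cosine similarity and scores changed-file paths with a longest common sub-path algorithm. The following is a minimal sketch of both, not the patent's exact pipeline. Assumptions: texts are compared as plain term-frequency vectors (no stemming, stop-word removal, or embeddings, any of which the patent may use), a path's sub-path score is normalized by the longer path's component count, and `file_set_similarity` is a hypothetical way to aggregate per-file scores into one Pull Request feature.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens (a deliberately simple assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def longest_common_subpath(p: List[str], q: List[str]) -> int:
    """Length of the longest contiguous run of path components shared by p and q."""
    best = 0
    prev = [0] * (len(q) + 1)
    for i in range(1, len(p) + 1):
        cur = [0] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            if p[i - 1] == q[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at p[i-1], q[j-1]
                best = max(best, cur[j])
        prev = cur
    return best


def path_similarity(a: str, b: str) -> float:
    """Longest common sub-path divided by the longer path (assumed normalization)."""
    p, q = a.strip("/").split("/"), b.strip("/").split("/")
    return longest_common_subpath(p, q) / max(len(p), len(q))


def file_set_similarity(files_a: List[str], files_b: List[str]) -> float:
    """Hypothetical PR-level score: symmetric mean of each file's best path match."""
    if not files_a or not files_b:
        return 0.0

    def directed(xs: List[str], ys: List[str]) -> float:
        return sum(max(path_similarity(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(files_a, files_b) + directed(files_b, files_a)) / 2


print(round(cosine_similarity("Fix null pointer in parser", "Fix parser null pointer crash"), 2))  # 0.8
print(path_similarity("src/main/java/Parser.java", "src/main/java/Lexer.java"))  # 0.75
```

The two titles share four of five tokens, and the two paths share the three-component run `src/main/java` out of four components.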

Abstract

The invention discloses a repeated Pull Request detection method based on a graph neural network. The method comprises the following steps. 1) Process the data set and obtain data: according to the Pull Request numbers in the data set, obtain the title, description, commit information, changed-file information, and changed-code information of each Pull Request by calling the GitHub API, and filter out Pull Requests that change more than 50 files or add and delete more than 10,000 lines of code. 2) Calculate the similarity of the title, description, and commit information using natural language processing methods combined with cosine similarity; calculate the path similarity of the changed files using a longest common sub-path algorithm; and calculate the change-location similarity by determining the specific positions of the changed code in the two Pull Requests, computing the length of the overlapping change positions, and dividing that overlap length by the total length. 3) Based on a large code clone data set, train a graph neural network model using a flow-augmented abstract syntax tree, a graph matching network, and a mean squared error loss, and use it to calculate the similarity of the changed code. 4) Using the title similarity, description similarity, commit information similarity, changed-file path similarity, change-location similarity, and changed-code similarity as feature values, together with labels indicating whether each pair is a duplicate, train a repeated Pull Request detection model with the AdaBoost machine learning algorithm.
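
Two of the steps above can be sketched compactly: the change-location similarity of step 2 and the AdaBoost classifier of step 4. This is an illustrative sketch under stated assumptions, not the patented implementation. It assumes each change is a `(file, start_line, end_line)` hunk with inclusive line numbers, reads "total length" as the union of both Pull Requests' changed lines, and uses textbook discrete AdaBoost with one-feature decision stumps. The graph-neural-network code similarity of step 3 is not reproduced; the sixth feature stands in for its output. All function names and the toy feature values are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Hunk = Tuple[str, int, int]  # (file path, first line, last line), inclusive
Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def changed_lines(hunks: Sequence[Hunk]) -> Set[Tuple[str, int]]:
    """Expand hunks into a set of (file, line) positions."""
    return {(path, line) for path, start, end in hunks for line in range(start, end + 1)}


def location_similarity(hunks_a: Sequence[Hunk], hunks_b: Sequence[Hunk]) -> float:
    """Overlapping changed lines divided by all changed lines of the pair (union assumed)."""
    a, b = changed_lines(hunks_a), changed_lines(hunks_b)
    total = len(a | b)
    return len(a & b) / total if total else 0.0


def stump_predict(x: Sequence[float], feature: int, threshold: float, polarity: int) -> int:
    """One-feature decision stump returning +1 or -1."""
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost over decision stumps; labels are +1 (duplicate) or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model: List[Stump] = []
    eps = 1e-10
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(w for x, label, w in zip(X, y, weights)
                              if stump_predict(x, f, thr, pol) != label)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, eps), 1 - eps)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        if err <= eps:  # a perfect stump: further rounds add nothing
            break
        # Up-weight misclassified pairs, down-weight correct ones, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(x, f, thr, pol))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted stump vote."""
    score = sum(alpha * stump_predict(x, f, thr, pol) for alpha, f, thr, pol in model)
    return 1 if score >= 0 else -1


# Feature order: title, description, commit, file path, change location, code (GNN).
# Toy values for illustration only; they are not data from the patent.
X = [
    [0.9, 0.8, 0.7, 0.9, 0.6, 0.95],
    [0.8, 0.7, 0.9, 0.8, 0.5, 0.90],
    [0.1, 0.2, 0.0, 0.3, 0.0, 0.20],
    [0.2, 0.1, 0.1, 0.0, 0.0, 0.10],
]
y = [1, 1, -1, -1]
model = train_adaboost(X, y)
print([predict(model, x) for x in X])  # [1, 1, -1, -1]
print(round(location_similarity([("a.py", 1, 10)], [("a.py", 6, 15)]), 3))  # 0.333
```

Expanding hunks into line sets keeps the overlap arithmetic obvious; for large diffs an interval-merging approach would avoid materializing every line. In practice a library implementation such as scikit-learn's `AdaBoostClassifier` would replace the hand-written booster.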

Description

Technical field
[0001] The invention belongs to the field of software engineering development and maintenance, and in particular relates to a repeated Pull Request detection method based on a graph neural network.
Background
[0002] In recent years, with the popularity of collaborative development, more and more developers contribute to open source projects by submitting Pull Requests. A typical Pull Request workflow on GitHub includes the following steps: first, contributors follow well-known developers and join attractive projects; then they clone the project locally and add new features, fix bugs, and so on; then they submit Pull Requests to the original repository; the reviewers of the original repository then review each Pull Request, discuss whether it meets the requirements and whether its quality needs further improvement, and give suggestions; next, the contributors make improvements according to the reviewers' suggestions and updat...

Application Information

IPC (8): G06F11/36; G06N3/04; G06N3/08
CPC: G06F11/3628; G06F11/3668; G06N3/08; G06N3/045
Inventors: Zhang Weifeng (张卫丰), Cui Boxi (崔博夕)
Owner: NANJING UNIV OF POSTS & TELECOMM