Related patch recommendation method based on heterogeneous data

A related-patch recommendation technology based on heterogeneous data, applied in the field of code review. It addresses the labor and time cost of manually finding related submissions across a large number of code iterations, with the aim of improving work efficiency, saving costs, and improving the reliability and stability of recommendations.

Active Publication Date: 2020-04-21
SUN YAT SEN UNIV
Cites: 9 | Cited by: 1

AI Technical Summary

Problems solved by technology

However, because each project goes through many code iterations and contains many code files, manually finding the other submissions related to the submission currently under review, or manually labeling in advance whether submissions are related, costs a great deal of labor and time.



Examples


[0092] 1. Use a web crawler to automatically obtain the review records, files, and other data of multiple projects on Gerrit. The crawled data is heterogeneous and spans many types, including at least the following three:

[0093] 1) Basic information of the patch (the source of the patch meta-features), which is discrete data such as the submitter and the reviewer. The crawler collects the submission information of each patch, including the name of the patch's reviewer, the name of its submitter, the name of its author, the name of the project and of the project branch the patch belongs to, the time the patch was submitted, the number of people participating in the review, the number of modified code files, etc.

[0094] 2) The brief description of each patch submission, which is text data.

[0095] 3) The patch is crawled by means of a craw...
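As a minimal sketch of items 1) and 2), the snippet below parses one change record as returned by Gerrit's REST API (`GET /changes/`), which prefixes every JSON body with a `)]}'` line, and extracts a few of the discrete meta-features plus the one-line subject. The field names (`project`, `branch`, `owner`, `subject`, `created`, `reviewers`, `current_revision`, `revisions`) follow Gerrit's `ChangeInfo` entity; treating `owner` as the "submitter" is an assumption, and the patent does not say which API or fields its crawler actually used. Names, reviewer lists, and file lists are only filled in when the query requests them through options such as `DETAILED_ACCOUNTS`, `DETAILED_LABELS`, `CURRENT_REVISION`, and `CURRENT_FILES`. `PatchMeta`, `strip_xssi`, and `parse_change` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List

XSSI_PREFIX = ")]}'"


@dataclass
class PatchMeta:
    """Hypothetical record for the meta-features and subject of one patch."""
    project: str
    branch: str
    owner: str           # uploader of the change (assumed to be the "submitter")
    subject: str         # one-line description, the text data of item 2)
    created: datetime
    reviewers: List[str]
    num_files: int


def strip_xssi(body: str) -> str:
    """Remove the ")]}'" line Gerrit prepends to its JSON responses."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return body.lstrip()


def parse_change(body: str) -> PatchMeta:
    """Turn one ChangeInfo JSON body into a PatchMeta record."""
    change = json.loads(strip_xssi(body))
    # Gerrit timestamps look like "2019-05-03 08:15:42.000000000" (UTC);
    # keep only the first 19 characters so strptime can parse them.
    created = datetime.strptime(change["created"][:19], "%Y-%m-%d %H:%M:%S")
    # "reviewers" maps a state such as "REVIEWER" or "CC" to account lists;
    # collect unique names, falling back to the numeric account id.
    reviewers = sorted({
        acct.get("name", str(acct.get("_account_id")))
        for accounts in change.get("reviewers", {}).values()
        for acct in accounts
    })
    # Files of the current revision, present only if they were requested;
    # skip Gerrit's magic entries such as "/COMMIT_MSG" if they appear.
    files = {}
    rev = change.get("current_revision")
    if rev:
        files = change.get("revisions", {}).get(rev, {}).get("files", {})
    num_files = sum(1 for path in files if not path.startswith("/"))
    return PatchMeta(
        project=change["project"],
        branch=change["branch"],
        owner=change.get("owner", {}).get("name", ""),
        subject=change.get("subject", ""),
        created=created,
        reviewers=reviewers,
        num_files=num_files,
    )
```

Discrete fields such as `owner`, `project`, and `branch` would still need encoding (for example one-hot) before they can be spliced into a numeric feature vector; that step is not shown.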



Abstract

The invention discloses a related patch recommendation method based on heterogeneous data. The method comprises the following steps: crawling the multivariate heterogeneous data of code review; cleaning the data and splicing the heterogeneous features into patch feature vectors; forming patch pairs, taking patches related to the patch being predicted as positive samples and patches unrelated to it as negative samples, and assigning binary classification labels to both; dividing the samples into a training set and a validation set, and using the training set to train a logistic regression model, a random forest model, and a LightGBM model, each of which yields a probability and a predicted label; calculating each model's accuracy from its predicted labels; and finally constructing a prediction score as the weighted sum of the fusion weights and the corresponding probabilities to obtain the optimal prediction score. By using machine learning to evaluate the correlation between submissions to the code review system, the method produces an optimal recommendation, improves recommendation reliability and stability, and saves labor cost.
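The abstract leaves most implementation details open. The sketch below illustrates only the bookkeeping around the three models: splicing a patch's feature parts into one vector, labeling (target, candidate) pairs as positive or negative samples, computing a model's accuracy from its predicted labels, and fusing per-model probabilities into one score by weighted summation. It assumes the fusion weights are the validation accuracies normalized to sum to 1; the abstract does not state how the weights are derived. The probabilities themselves would come from the trained models, for example `predict_proba(X)[:, 1]` on scikit-learn's `LogisticRegression` and `RandomForestClassifier` or on LightGBM's `LGBMClassifier`, which are not shown. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def splice_features(meta: Sequence[float], text: Sequence[float]) -> List[float]:
    """Concatenate encoded meta-features and a text vector into one patch vector."""
    return list(meta) + list(text)


def label_pairs(target: str, candidates: Sequence[str],
                related: Set[str]) -> List[Tuple[Pair, int]]:
    """Pair the target patch with each candidate and attach a binary label.

    Candidates known to be related to the target are positive samples (1);
    all others are negative samples (0).
    """
    return [((target, c), 1 if c in related else 0)
            for c in candidates if c != target]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fusion_weights(accs: Dict[str, float]) -> Dict[str, float]:
    """ASSUMPTION: weight each model by its validation accuracy, normalized to sum to 1."""
    total = sum(accs.values())
    return {name: a / total for name, a in accs.items()}


def fused_score(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-model probabilities that a pair is related."""
    return sum(weights[name] * probs[name] for name in weights)


def recommend(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate patches with the highest fused score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For instance, validation accuracies of 0.5, 0.75, and 0.75 for the three models give weights of 0.25, 0.375, and 0.375, so per-model probabilities of 0.2, 0.6, and 0.8 fuse to 0.05 + 0.225 + 0.3 = 0.575. Weighting by accuracy lets the more reliable models dominate the score without discarding the others.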

Description

Technical field

[0001] The invention relates to the field of code review, and in particular to a related patch recommendation method based on heterogeneous data.

Background art

[0002] Code review is an important foundation for the smooth iteration of software engineering projects. It consists of many complex small tasks, such as revising code to meet specifications and adding supplementary code comments. At present, manual review is widely used in software engineering to update code and manage versions, and its labor cost is high.

[0003] The software engineering industry generally uses git, Gerrit, and similar systems for code management and review. Each code update on these systems is called a code modification, or a patch. For each patch submission, these systems provide roles such as reviewer and committer, through which queries and comments are made. The basic process of patch review is:

[0004] 1) The programmer, that is, the author ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F8/658; G06F40/284
CPC: G06F8/658
Inventors: Zheng Zibin (郑子彬), Chen Zhihao (陈志豪), Li Quanzhong (李全忠)
Owner: SUN YAT SEN UNIV