Decision-based code annotation generation method fusing information retrieval and deep learning

A technology combining information retrieval and deep learning, applied in the computer field, that addresses problems such as the difficulty of covering low-frequency words. It makes code easier to develop and maintain and improves the efficiency of code comprehension.

Pending Publication Date: 2021-12-03
NANTONG UNIVERSITY

AI Technical Summary

Problems solved by technology

Experimental studies show that these methods achieve better performance and generalize better, but the annotations they generate tend to consist of high-frequency words and rarely cover low-frequency words.



Examples


Embodiment 1

[0035] As shown in Figure 1 and Figure 2, the present invention provides a decision-based code annotation generation method that fuses information retrieval and deep learning, which specifically includes the following:

[0036] (1) Collect two large-scale corpora, PCSD (a Python code annotation dataset) and JCSD (a Java code annotation dataset), as experimental subjects. Both consist of code-comment pairs mined from GitHub and are further divided into subsets. Their statistics (such as the number of code segments, the average number of tokens per code segment and the average number of tokens per comment) are shown in Table 1.

[0037] Table 1

[0038]
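The statistics named in step (1) can be computed along the following lines. This is a minimal sketch, not the procedure used in the patent: the function name `corpus_stats`, the whitespace tokenization and the toy pairs are all assumptions made for illustration, and none of the values come from PCSD or JCSD.

```python
from statistics import mean


def corpus_stats(pairs):
    """Return the pair count and average token counts for code and comments.

    Assumption: tokens are obtained by splitting on whitespace; the source
    does not say which tokenizer was used.
    """
    code_lens = [len(code.split()) for code, _ in pairs]
    comment_lens = [len(comment.split()) for _, comment in pairs]
    return {
        "num_pairs": len(pairs),
        "avg_code_tokens": mean(code_lens),
        "avg_comment_tokens": mean(comment_lens),
    }


# Made-up code-comment pairs, for illustration only.
toy_pairs = [
    ("def add ( a , b ) : return a + b", "add two numbers"),
    ("def neg ( x ) : return - x", "negate a number"),
]
stats = corpus_stats(toy_pairs)
print(stats)
```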

[0039] (2) Build a model based on information retrieval: for the target code segment c, find the most similar code segment c_sim in the training set by integrating semantic, lexical and syntactic similarities, and then, by reusing the comment of the most similar code segment as the comment of the targ...
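The retrieval idea in step (2) can be sketched as follows. The source does not state how the three similarities are computed or combined, so everything here is an assumption: lexical similarity is taken as Jaccard overlap of tokens, syntactic similarity as the match ratio between AST node-type sequences, semantic similarity as a cosine over token counts (a simple stand-in for learned embeddings), and the three are averaged with equal weights. All function names are hypothetical.

```python
import ast
import difflib
import math
import re
from collections import Counter


def tokens(code):
    # Assumed tokenizer: identifiers, integers, and single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def lexical_sim(a, b):
    # Assumption: Jaccard overlap of the two token sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def syntactic_sim(a, b):
    # Assumption: compare the sequences of AST node types (Python code only).
    na = [type(n).__name__ for n in ast.walk(ast.parse(a))]
    nb = [type(n).__name__ for n in ast.walk(ast.parse(b))]
    return difflib.SequenceMatcher(None, na, nb).ratio()


def semantic_sim(a, b):
    # Stand-in for an embedding-based score: cosine over token counts.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(target, training_pairs, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return (c_sim, its comment, fused similarity) for the target code."""
    def score(code):
        parts = (semantic_sim(target, code),
                 lexical_sim(target, code),
                 syntactic_sim(target, code))
        return sum(w * p for w, p in zip(weights, parts))

    best_code, best_comment = max(training_pairs, key=lambda p: score(p[0]))
    return best_code, best_comment, score(best_code)


# Made-up training pairs, for illustration only.
training_pairs = [
    ("def add(a, b):\n    return a + b", "add two numbers"),
    ("def read(path):\n    return open(path).read()", "read a file"),
]
c_sim, reused_comment, sim = retrieve(
    "def plus(x, y):\n    return x + y", training_pairs
)
print(reused_comment, round(sim, 3))
```

The fused score is returned together with the reused comment because the later decision step needs the similarity between c and c_sim.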


Abstract

The invention provides a decision-based code annotation generation method fusing information retrieval and deep learning, which comprises the following steps: (1) mining high-quality open source projects from GitHub, collecting the functions and annotations in their code to form a corpus, and dividing the corpus into a training set and a validation set; (2) constructing a model based on information retrieval that, by fusing semantic, lexical and syntactic similarities, searches the training set for the code segment c_sim most similar to the target code segment c and reuses the annotation of c_sim; (3) constructing a model based on deep learning that takes adversarial samples into account and generates a code annotation with a Transformer-based model; (4) analyzing, on the validation set, the relationship between the performance of the two models and the similarity score between the code segments c and c_sim, and determining a similarity threshold; and (5) when an annotation is to be generated for a new target code segment, outputting the corresponding code annotation according to the decision module. The method has the beneficial effect that high-quality code annotations can be generated.
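Steps (4) and (5) can be illustrated with a small sketch. The abstract says only that a similarity threshold is determined on the validation set and that a decision module chooses the output. The rule below (reuse the retrieved annotation when the similarity reaches the threshold, otherwise use the generated one), the threshold search that maximizes the average quality score, and the names `choose_threshold` and `decide` are all assumptions. The validation scores are invented.

```python
def choose_threshold(validation):
    """Pick the similarity threshold with the best average validation score.

    validation: list of (similarity, ir_score, dl_score) tuples, where the
    two scores are any per-sample quality metric for the two models.
    """
    # Candidate thresholds: every observed similarity, plus "never use IR".
    candidates = sorted({s for s, _, _ in validation}) + [float("inf")]

    def avg_quality(t):
        # Assumed routing rule: IR output when similarity >= t, else DL output.
        return sum(ir if s >= t else dl for s, ir, dl in validation) / len(validation)

    return max(candidates, key=avg_quality)


def decide(similarity, threshold, ir_annotation, dl_annotation):
    """Decision module: choose which model's annotation to output."""
    return ir_annotation if similarity >= threshold else dl_annotation


# Invented validation records, for illustration only.
validation = [
    (0.9, 0.8, 0.5),
    (0.7, 0.6, 0.5),
    (0.4, 0.2, 0.5),
    (0.2, 0.1, 0.4),
]
threshold = choose_threshold(validation)
print(threshold)
print(decide(0.8, threshold, "retrieved annotation", "generated annotation"))
```

With these invented records, the retrieval model scores higher only on the two most similar samples, so the search settles on the lowest similarity at which reuse still pays off.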

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a decision-based code annotation generation method fusing information retrieval and deep learning.

Background

[0002] Code comment generation aims to help software developers understand the design intent and functionality of code snippets. Because project development budgets are limited or the writing of code comments receives too little attention, the quality of code comments is hard to guarantee. Furthermore, code comments are often not updated as code snippets evolve. There is therefore an urgent need for effective methods that analyze the semantics of a target code segment and automatically generate high-quality code comments.

[0003] Previous studies usually use information retrieval for code comment generation tasks and achieve good performance. Since code cloning is prevalent in software development, information retrie...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/169, G06F8/41, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/169, G06F8/436, G06F8/42, G06N3/084, G06N3/044, G06N3/045, G06F18/2415
Inventor: 陈翔, 周彦琳, 杨光, 于池, 刘珂
Owner: NANTONG UNIVERSITY