Robust code summary generation method based on a self-attention mechanism

A robust code summary generation technique at the intersection of software engineering and natural language processing. It addresses poor summary quality caused by noisy training corpora and by long-distance dependencies in the code structure, and achieves better evaluation results than existing baseline methods.

Active Publication Date: 2018-09-11
WUHAN UNIV
4 Cites, 72 Cited by

AI Technical Summary

Problems solved by technology

[0010] The present invention aims to improve the accuracy and naturalness of generated code summaries, and to remain robust when processing the noisy code-description corpus pairs (the descriptive text of a question and the code fragment of its reply) extracted from programming question-and-answer communities. It overcomes the noise introduced when parallel corpora are collected directly to train a summary generation model. It also introduces a self-attention mechanism into the sequence model, which reduces the reliance on long-distance sequence dependencies in the code structure during seq2seq learning, a reliance that otherwise leads to poor summary generation.
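To illustrate why self-attention helps with long-distance dependencies, here is a minimal sketch of standard scaled dot-product self-attention. It is not the patent's model: the excerpt does not give the exact formulation, and this sketch assumes that queries, keys and values are the input vectors themselves (no learned projections). The names `softmax`, `self_attention` and `tokens` are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over equal-length vectors.

    Assumption: Q, K and V are the inputs themselves (no learned
    projection matrices), to keep the sketch short.
    """
    d = len(seq[0])
    outputs, weights = [], []
    for q in seq:
        # Score this position against every position, near or far.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all vectors.
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, seq)) for i in range(d)]
        )
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # toy token embeddings
outputs, attn = self_attention(tokens)
```

Every position attends to every other position in a single step, so the first and third tokens (which are identical here) receive the same weight regardless of how far apart they sit in the sequence.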

Examples

Specific embodiments

[0076] Step 1: Preprocess the initially collected noisy corpus: remove data that does not conform to the format, construct a feature matrix from the social-attribute feature values, fill missing values with the mean, remove noise points whose frequency is below 1%, and normalize all data.
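The preprocessing operations in Step 1 can be sketched as below. This is an illustrative reading, not the patent's implementation. Three assumptions: missing values are represented as `None`, the 1% rule is interpreted as dropping rows whose feature value occurs in fewer than 1% of rows, and normalization is min-max scaling. All function names are hypothetical.

```python
from collections import Counter


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def drop_rare(rows, key_index, min_share=0.01):
    """Drop rows whose value at key_index occurs in < min_share of rows.

    Assumption: this is one possible reading of the 1% frequency rule.
    """
    counts = Counter(row[key_index] for row in rows)
    n = len(rows)
    return [row for row in rows if counts[row[key_index]] / n >= min_share]


def min_max_normalize(column):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


votes = [10, None, 30, 20]  # toy social-attribute column
filled = fill_missing_with_mean(votes)
normalized = min_max_normalize(filled)
```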

[0077] Step 2: Use the DB-WTFF feature fusion framework to extract high-quality (PN, NL) corpus pairs. The wavelet transform is implemented with the PyWavelets toolkit, using the Daubechies 7 wavelet for a 5-level decomposition. This yields the high-quality corpus (C#: 88,000; SQL: 46,000).
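A 5-level Daubechies 7 decomposition with PyWavelets can be sketched as follows. Only the wavelet family and the level come from the text. The input `signal` is a random stand-in for a 1-D feature sequence, and how DB-WTFF consumes the coefficients is not shown in this excerpt. The sketch assumes `numpy` and `pywt` are installed.

```python
import numpy as np
import pywt

# Stand-in 1-D sequence (assumption); db7 needs at least 416 samples
# for a 5-level decomposition, so 512 is used here.
rng = np.random.default_rng(0)
signal = rng.normal(size=512)

wavelet = pywt.Wavelet("db7")
max_level = pywt.dwt_max_level(len(signal), wavelet.dec_len)

# wavedec returns [cA5, cD5, cD4, cD3, cD2, cD1].
coeffs = pywt.wavedec(signal, wavelet, level=5)
approx, details = coeffs[0], coeffs[1:]

# The transform is invertible, so the signal can be reconstructed.
reconstructed = pywt.waverec(coeffs, wavelet)
```

The approximation coefficients `approx` carry the low-frequency trend, while `details` holds one array per level of high-frequency content.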

[0078] Step 3: Use the TFLearn framework to build the T-SNNC network. The text serialization length is fixed at 33, and the output fully connected layer has dimension 32. In the DS-Bi-LSTM encoding module, the word embedding dimension is 128 and the bidirectional LSTM has 128 hidden units. Split the manually label...
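Fixing the serialization length at 33 means every token sequence is padded or truncated before it reaches the network. A minimal sketch of that step in plain Python is shown below. The pad id of 0, right-side padding and the helper name `pad_or_truncate` are assumptions, and the network itself is not reproduced here.

```python
MAX_LEN = 33  # serialization length stated in Step 3


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=0):
    """Return a list of exactly max_len ids.

    Assumptions: long inputs are cut at the end and short inputs are
    right-padded with pad_id.
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


short = pad_or_truncate([7, 8, 9])
long = pad_or_truncate(list(range(1, 51)))
```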

Abstract

The invention discloses a robust code summary generation method based on a self-attention mechanism. The method comprises the following steps: first, high-quality code and description corpus pairs (query description texts and reply code texts) are extracted from a programming community; then redundant information in the code and description corpus pairs is filtered out; then the query description texts corresponding to the code are converted into declarative statements; lastly, a code summary is generated by a sequence model based on the self-attention mechanism. The method effectively removes redundant information and noisy content, improves the accuracy of the generated summaries under both automatic and manual evaluation, and achieves evaluation results superior to those of existing baseline methods.

Description

Technical field

[0001] The invention belongs to the cross field of software engineering and natural language processing technology. It specifically relates to a method for generating robust code summaries that combines the strengths of a wavelet time-frequency transform EM algorithm and a sequence model based on a self-attention mechanism. It is especially suited to code-description data with noisy information from programming communities.

Background

[0002] In the evolution of large-scale software projects, code comments are key to software maintenance. High-quality comments give developers efficient reference information for understanding and updating code. Although many tools and methods exist for program comprehension, understanding code still takes longer than writing it. Developers also tend to skim the code (code entities such as method signatures) to reduce the workload, but this ignores key infor...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/70, G06F8/72
CPC: G06F8/70, G06F8/72
Inventor: 彭敏, 胡刚, 袁梦霆, 王清, 曲金帅
Owner: WUHAN UNIV