Bash code annotation generation method based on dual information retrieval

A technology in information retrieval and coding, applied in the computer field, that addresses problems such as low efficiency and high time cost and achieves high-quality results.

Pending Publication Date: 2022-01-21
NANTONG UNIVERSITY

AI Technical Summary

Problems solved by technology

However, such methods usually require a large amount of computing resources to train the model, which incurs a high time cost and low efficiency.

Examples

Embodiment 1

[0036] As shown in Figure 1, the present invention provides a Bash code annotation generation method based on dual information retrieval, which specifically comprises the following steps:

[0037] (1) Collect data from the NL2Bash corpus and the data provided by the NLC2CMD competition to obtain a high-quality corpus, and deduplicate the data in the corpus. The final corpus contains 10,592 entries, and the data format is .
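The deduplication in step (1) can be sketched as follows. The patent does not state its matching criterion, so this sketch assumes two entries are duplicates when both the code and the comment agree after whitespace normalization; the helper names `normalize` and `deduplicate` and the sample pairs are hypothetical, not taken from the corpus.

```python
from typing import List, Tuple

# Hypothetical entry shape: (bash_code, comment)
Pair = Tuple[str, str]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially reformatted copies compare equal."""
    return " ".join(text.split())


def deduplicate(pairs: List[Pair]) -> List[Pair]:
    """Drop repeated (code, comment) pairs, keeping the first occurrence in order."""
    seen = set()
    unique: List[Pair] = []
    for code, comment in pairs:
        key = (normalize(code), normalize(comment))
        if key not in seen:
            seen.add(key)
            unique.append((code, comment))
    return unique


corpus = [
    ("find . -name '*.java'", "Find all Java files"),
    ("find .  -name '*.java'", "Find all Java files"),  # extra space: a duplicate
    ("mkdir -p build", "Create the build directory"),
]
print(len(deduplicate(corpus)))  # 2
```

An exact-match key keeps the step linear in corpus size; near-duplicate detection would need a similarity threshold, which the source does not specify.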

[0038] (2) Compute statistics on the data in the corpus. Tables 1 and 2 show detailed statistics of code-fragment length and code-comment length in the corpus, respectively.
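Length statistics of the kind summarized in Tables 1 and 2 can be computed with a short helper. The patent does not say how length is measured, so this sketch assumes whitespace-separated tokens; the helper `length_stats`, the chosen summary fields, and the sample commands are hypothetical.

```python
import statistics
from typing import Dict, List


def length_stats(texts: List[str]) -> Dict[str, float]:
    """Summarize lengths, counting whitespace-separated tokens (an assumption)."""
    lengths = [len(t.split()) for t in texts]
    return {
        "avg": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


codes = ["find . -name '*.java'", "mkdir -p build", "grep -rn TODO src"]
print(length_stats(codes))
```

The same function applies unchanged to the comment side of each pair.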

[0039] Table 1

[0040]

[0041] Table 2

[0042]

[0043] (3) To ensure a fair comparison with the baseline methods, the corpus is partitioned following previous studies: 1063 pairs are extracted as the test set, and the rest of the corpus is used as the training set.
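The exact partitioning procedure of the previous studies is not given here, so the following is only a sketch assuming a seeded random split; the function `split_corpus` and the seed value are hypothetical. It shows how a fixed test-set size of 1063 leaves the remainder of a 10,592-entry corpus for training.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(
    data: List[T], test_size: int = 1063, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the corpus with a fixed seed, then carve off test_size items."""
    if test_size > len(data):
        raise ValueError("test_size exceeds corpus size")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder entries standing in for the real (code, comment) pairs.
corpus = [(f"cmd_{i}", f"comment_{i}") for i in range(10592)]
train, test = split_corpus(corpus)
print(len(train), len(test))  # 9529 1063
```

Fixing the seed matters for the fairness goal stated above: every method is evaluated on the identical test set.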

[0044] (4) Enter the targe...

Abstract

The invention provides a Bash code annotation generation method based on dual information retrieval, comprising the following steps: (1) collecting data to obtain a high-quality corpus and deduplicating it to obtain a data set; (2) extracting semantic features of the code with CodeBERT; (3) treating code snippets as sets of lexical elements and computing the lexical similarity between these sets via text edit distance, so that the code snippet with the highest lexical similarity to the target code is selected from the k candidate code snippets; and (4) retrieving the code snippet most similar to the target code from the data set and using its corresponding annotation as the annotation of the target code. The beneficial effects are that the most similar code can be retrieved from the code library for a given target code, so that high-quality code annotations are generated, the readability and understandability of Bash code are improved, and developers are helped to understand Bash code quickly.
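A minimal sketch of the two retrieval stages described in the abstract follows. Assumptions, not confirmed by the source: CodeBERT embeddings are precomputed (in practice they might come from a checkpoint such as `microsoft/codebert-base`, which this sketch does not load); the k candidates are chosen by cosine similarity of those embeddings; and, because edit distance needs an order, lexical elements are treated as whitespace-separated token sequences rather than unordered sets. All function names and the toy three-dimensional vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical entry shape: (code, comment, precomputed embedding)
Entry = Tuple[str, str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over token sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, tb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ta
                         cur[j - 1] + 1,            # insert tb
                         prev[j - 1] + (ta != tb))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """1 - edit distance normalized by the longer token sequence."""
    ta, tb = a.split(), b.split()
    longest = max(len(ta), len(tb))
    return 1.0 - edit_distance(ta, tb) / longest if longest else 1.0


def retrieve_comment(target_code: str, target_vec: List[float],
                     corpus: List[Entry], k: int = 3) -> str:
    # Stage 1 (semantic): keep the k entries whose embeddings are closest.
    ranked = sorted(corpus, key=lambda e: cosine(target_vec, e[2]), reverse=True)
    candidates = ranked[:k]
    # Stage 2 (lexical): re-rank candidates by token-level edit similarity.
    best = max(candidates, key=lambda e: lexical_similarity(target_code, e[0]))
    return best[1]  # reuse the retrieved snippet's comment


corpus: List[Entry] = [
    ("find . -name '*.py'", "Find all Python files", [0.9, 0.1, 0.0]),
    ("find . -name '*.java' -print", "Find all Java source files", [0.8, 0.2, 0.1]),
    ("mkdir -p build", "Create the build directory", [0.0, 0.1, 0.9]),
]
target = "find . -name '*.java'"
target_vec = [0.85, 0.15, 0.05]
print(retrieve_comment(target, target_vec, corpus, k=1))
print(retrieve_comment(target, target_vec, corpus, k=2))
```

With `k=1` the semantically closest snippet (the Python one) wins outright; with `k=2` the lexical stage re-ranks the two candidates and prefers the Java snippet, whose tokens differ from the target by a single insertion. This is the motivation for combining the two retrieval signals: semantic similarity narrows the search, and lexical similarity breaks near-ties.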

Description

Technical Field
[0001] The invention relates to the field of computer technology, and in particular to a method for generating Bash code annotations based on dual information retrieval.
Background Art
[0002] The shell is the interface through which developers interact with the Linux operating system. Linux supports several types of shells, among which Bash is the default shell command language of Linux and is widely used in program development. As a scripting language, Bash usually includes three basic components: utilities (e.g., find, mkdir, cd, grep), option flags (e.g., -name, -i), and parameters (e.g., "*.java", "TODO"). Compared with traditional programming languages such as C, Java, and Python, Bash has fewer usage scenarios, but its role in the development and maintenance of Linux systems cannot be ignored. In addition, the Bash language also has the char...
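The three-part structure of a Bash command described above can be illustrated with a small heuristic tokenizer built on Python's standard `shlex` module. The classification rules are assumptions for illustration, not part of the patent: the first word of each pipeline stage is treated as a utility, later words starting with `-` as option flags, and everything else as parameters. The function name `classify_tokens` is hypothetical.

```python
import shlex
from typing import Dict, List


def classify_tokens(command: str) -> Dict[str, List[str]]:
    """Heuristically sort a Bash command's words into utilities, options, parameters.

    Assumptions: pipes are written with surrounding whitespace ('a | b'), and the
    first word of each pipeline stage is the utility being invoked.
    """
    parts: Dict[str, List[str]] = {"utilities": [], "options": [], "parameters": []}
    expect_utility = True
    for tok in shlex.split(command):  # shlex strips quotes: "*.java" -> *.java
        if tok == "|":
            expect_utility = True  # next word starts a new pipeline stage
        elif expect_utility:
            parts["utilities"].append(tok)
            expect_utility = False
        elif tok.startswith("-") and tok != "-":
            parts["options"].append(tok)
        else:
            parts["parameters"].append(tok)
    return parts


print(classify_tokens("grep -i TODO notes.txt | wc -l"))
# {'utilities': ['grep', 'wc'], 'options': ['-i', '-l'], 'parameters': ['TODO', 'notes.txt']}
```

The heuristic is deliberately simple: a utility passed as an argument (for example to `xargs`) would be counted as a parameter, and pipes without surrounding spaces are not split.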

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/73, G06F40/169, G06F40/253, G06F40/284, G06F40/30, G06N20/00
CPC: G06F8/73, G06F40/169, G06F40/253, G06F40/284, G06F40/30, G06N20/00
Inventors: 陈翔 (Chen Xiang), 于池 (Yu Chi), 杨光 (Yang Guang), 刘珂 (Liu Ke), 夏鸿崚 (Xia Hongling), 胡新宇 (Hu Xinyu), 顾亚锋 (Gu Yafeng)
Owner NANTONG UNIVERSITY