
Layered semantic perception code representation learning method

A code representation learning method in the field of distributed vector representation. It addresses the problem that no deep learning model capable of simultaneously encoding a program's sequential and structural semantics has yet been found, and achieves the effect of improving the model's feature representation capability.

Pending Publication Date: 2022-07-29
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] To date, no deep learning model capable of simultaneously encoding both the sequential and structural semantics of a program has been found.

Method used



Examples


Embodiment 1

[0062] Take the C code for calculating the greatest common divisor (GCD) as an example (as shown in Figure 3) to analyze the construction process of the hierarchical program compound graph.

[0063] (1) First, parse the source code into an AST, as shown in Figure 4(a);

[0064] (2) Then, parse the source code into a PDG, as shown in Figure 4(c);

[0065] (3) Finally, replace the statement nodes in the PDG with the syntax subtrees from the AST to construct the hierarchical program compound graph. The specific replacement process is shown in Figure 5.
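The patent parses C source (Figures 3-5); as a minimal self-contained analogue, the sketch below uses Python's standard-library `ast` module on an equivalent Python `gcd` function to show how each top-level statement yields the "syntax subtree" that replaces the corresponding PDG statement node. The variable names and the one-statement-per-PDG-node mapping are illustrative assumptions, not the patent's implementation.

```python
import ast

# Python analogue of the GCD running example (the patent's Figure 3 uses C).
SOURCE = """
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a
"""

tree = ast.parse(SOURCE)
func = tree.body[0]  # the gcd function definition

# Assumption for illustration: each top-level statement of the function
# corresponds to one PDG statement node, and its AST subtree is the
# "syntax subtree" substituted into the hierarchical program compound graph.
statement_subtrees = {i: stmt for i, stmt in enumerate(func.body, start=1)}

for node_id, stmt in statement_subtrees.items():
    print(node_id, type(stmt).__name__)  # e.g. 1 While, 2 Return
```

In a real pipeline the PDG would come from a program-analysis tool; here only the AST side of the substitution is demonstrated.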

Embodiment 2

[0067] Taking the hierarchical program compound graph corresponding to the C code for calculating the greatest common divisor (GCD) as an example, a depth-first traversal algorithm is used to remove certain directed edges in the program compound graph, forming the program's directed acyclic semantic graph.

[0068] Deleting a cycle from the program compound graph is equivalent to deleting a cycle from the PDG, because cyclic structures can only appear in the data-dependence and control-dependence relations between statements (i.e., in the PDG), not within the syntax subtree corresponding to an individual statement. Therefore, for simplicity, this example uses the id of the PDG node corresponding to each source statement to represent its syntax subtree in the hierarchical program compound graph. Following step 2, the process of constructing the directed acyclic semantic graph of the C program in Figure 3 by removing certain directed edges is shown in Figure 6. For example, Figure 6 ...
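The edge-removal step described above can be sketched as a depth-first traversal that drops back edges. The edge list below is a made-up toy PDG (not the patent's Figure 6): the loop body node 2 feeds back into the loop header node 1, creating the cycle 1 → 2 → 1.

```python
# Hypothetical PDG adjacency list for illustration only.
edges = {1: [2, 3], 2: [1], 3: []}

def remove_back_edges(adj, root):
    """Depth-first traversal that drops any edge closing a cycle,
    yielding a directed acyclic semantic graph (the step 2 analogue)."""
    acyclic = {u: [] for u in adj}
    on_stack, visited = set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in adj[u]:
            if v in on_stack:      # back edge: target is an ancestor
                continue           # on the DFS stack, so drop the edge
            acyclic[u].append(v)   # keep forward/tree/cross edges
            if v not in visited:
                dfs(v)
        on_stack.discard(u)

    dfs(root)
    return acyclic

dag = remove_back_edges(edges, 1)
print(dag)  # the 2 -> 1 back edge is gone
```

The patent may use a different tie-breaking or traversal order; this only shows that removing back edges during DFS is sufficient to make the graph acyclic.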

Embodiment 3

[0070] Taking the directed acyclic semantic graph corresponding to the C code for calculating the greatest common divisor (GCD) as an example, the Graph-LSTM model is used to learn the global semantic vector representation of sentences.

[0071] First, according to step 41, the dependencies between the statement nodes in the semantic graph corresponding to the sample code are extracted. Then, according to step 42, the statement nodes are topologically sorted, yielding the processing order "3→2→1→5→7→4→6→8". Combining the nodes' order information and dependencies gives the topologically sorted statement-node relation graph shown in Figure 7. According to step 43, the statement nodes are processed in topological order. Taking node 4 as an example, the neighbor nodes that have direct dependencies with statement node 4 (node 2 and node 7) are used as the global context of this node, and the...
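Steps 42 and 43 can be sketched as a topological sort followed by an order-respecting state update. The dependency graph and the mean-of-predecessors update below are illustrative stand-ins (they are not the patent's Figure 7 graph, and the update replaces the Graph-LSTM gating with a trivial aggregation just to show the data flow).

```python
from collections import deque

# Toy dependency graph for illustration: edge u -> v means statement v
# depends on statement u (NOT the patent's example graph).
deps = {1: [4], 2: [4], 3: [], 4: [5], 5: []}

def topological_order(adj):
    """Kahn's algorithm: a statement is processed only after every
    statement it depends on (the step 42 analogue)."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(sorted(u for u in adj if indeg[u] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

order = topological_order(deps)

# Step 43 analogue: each node combines its own local vector (standing in
# for the Tree-LSTM output) with the mean state of its direct
# predecessors, which act as its global context. Real Graph-LSTM gates
# are omitted; this only demonstrates the processing order.
local = {u: float(u) for u in deps}
preds = {v: [u for u in deps if v in deps[u]] for v in deps}
state = {}
for u in order:
    ctx = sum(state[p] for p in preds[u]) / len(preds[u]) if preds[u] else 0.0
    state[u] = local[u] + ctx

print(order, state)
```

Because the graph is acyclic (after the back-edge removal of step 2), every predecessor state is guaranteed to exist when a node is visited.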



Abstract

The invention discloses a hierarchical semantic-aware code representation learning method comprising the following steps: for a given source code, first construct a directed acyclic semantic graph of the program using program analysis techniques; then extract the syntax-subtree information in the semantic graph and learn a local semantic vector representation of each statement in the program using a Tree-LSTM model; finally, based on the statements' local semantic vector representations, learn the structural and sequential semantic information of the code using a Graph-LSTM model. The method proposes, for the first time, a graph-based LSTM model (Graph-LSTM) suitable for encoding program structural semantics, along with a new framework capable of fusing source code sequence information into the code representation learning process, improving the feature representation capability of the model.

Description

Technical field

[0001] The invention belongs to the field of software engineering and relates to a hierarchical semantic-aware code representation learning method, in particular to a method that converts a program into a distributed vector representation containing deep semantic information of the code by means of program analysis and deep learning techniques.

Background technique

[0002] At present, in the field of software engineering, deep learning technology has been widely applied to various software development and maintenance tasks such as program classification, clone detection, code completion, and code summarization, achieving excellent performance on a large number of tasks and effectively helping developers reduce time and labor costs. Developing effective code representation learning methods that capture the deep semantic information of programs is the key to enabling models to achieve good performance on downstream tasks. Deep lear...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F8/74; G06N3/04; G06N3/08
CPC: G06F8/74; G06N3/08; G06N3/044
Inventor: 蒋远, 苏小红, 郑伟宁, 王甜甜
Owner HARBIN INST OF TECH