
A Code Classification Method Based on Neural Network Language Model

A language-model and code-classification technology, applied in the field of code classification based on neural network language models, that addresses problems such as the curse of dimensionality.

Active Publication Date: 2020-08-04
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] In view of the above technical problems, the present invention provides a code classification method based on a neural network language model, which aims to solve the curse of dimensionality caused by symbolic representation methods in code classification.




Detailed Description of the Embodiments

[0050] All features disclosed in this specification, except mutually exclusive features and/or steps, may be combined in any manner.

[0051] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0052] A code classification method based on a neural network language model, comprising the following steps:

[0053] Step 1: Use the pycparser tool to convert the code into an abstract syntax tree (AST).
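Step 1 names pycparser, which parses C source (for example, `c_parser.CParser().parse(text)` returns a `FileAST` whose nodes expose `children()`). To keep the sketch dependency-free, it swaps in Python's standard-library `ast` module on Python source instead; the idea shown is the same: flatten a parse tree into indexed nodes, each with a list of child indices. The function name `ast_nodes` and the `(index, type, child_indices)` record layout are illustrative assumptions, not the patent's data structure.

```python
import ast


def ast_nodes(source):
    """Parse Python source and list every AST node in preorder as (index, type, child_indices)."""
    records = []

    def visit(node):
        idx = len(records)
        records.append((idx, type(node).__name__, []))
        # Recurse into direct children; each child gets a larger preorder index than its parent.
        for child in ast.iter_child_nodes(node):
            records[idx][2].append(visit(child))
        return idx

    visit(ast.parse(source))
    return records


for idx, kind, kids in ast_nodes("total = price * 2"):
    print(idx, kind, kids)
```

With pycparser the traversal would look the same, iterating over `node.children()` (which yields `(name, child)` pairs) instead of `ast.iter_child_nodes`.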

[0054] Step 2: Initialize the vector $\mathrm{vec}(c_i)$ of each AST node $c_i$, that is, assign a random-valued vector to every node in the AST. Among the nodes $c_i$, the vector of a non-leaf node $p_k$ is $\mathrm{vec}(p_k)_1$, and the vector of a child node $t_x$ of the non-leaf node $p_k$ is $\mathrm{vec}(t_x)$, where $\mathrm{vec}(p_k)_1 \in \mathrm{vec}(c_i)$ and $\mathrm{vec}(t_x) \in \mathrm{vec}(c_i)$; here $i$ is the index of the node, $k$ the index of the non-leaf node, and $x$ the index of the child node.
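Step 2 can be sketched as follows, assuming nodes are indexed 0..N-1 (as a parser traversal would produce) and a hypothetical `children` mapping from each node index to its child indices. The vector dimension of 30, the uniform range [-0.1, 0.1], and the helper names are illustrative choices; the text only states that each node receives a random vector, and that the non-leaf vectors $\mathrm{vec}(p_k)_1$ and child vectors $\mathrm{vec}(t_x)$ are both drawn from the node vectors $\mathrm{vec}(c_i)$.

```python
import random


def init_node_vectors(num_nodes, dim=30, scale=0.1, seed=None):
    """Assign every AST node c_i a random vector vec(c_i) drawn from U(-scale, scale)."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(num_nodes)]


def split_vectors(vectors, children):
    """Expose vec(p_k)_1 for non-leaf nodes p_k and vec(t_x) for their children t_x.

    Both dictionaries reference the same vector objects as vec(c_i); nothing is copied.
    """
    parent_vecs = {k: vectors[k] for k, kids in children.items() if kids}
    child_vecs = {x: vectors[x] for kids in children.values() for x in kids}
    return parent_vecs, child_vecs


# Hypothetical tree: node 0 has children 1 and 2; node 2 has child 3.
children = {0: [1, 2], 1: [], 2: [3], 3: []}
vecs = init_node_vectors(len(children), dim=4, seed=42)
parents, kids = split_vectors(vecs, children)
print(sorted(parents), sorted(kids))  # [0, 2] [1, 2, 3]
```

Note that node 2 appears in both dictionaries: it is a non-leaf node $p_k$ and also a child $t_x$ of node 0, which is why both sets are described as subsets of $\mathrm{vec}(c_i)$.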

[0055] Step 3: In order to make ...



Abstract

The invention belongs to the field of software engineering and discloses a code classification method based on a neural network language model. First, the code is converted into an AST and the vector of each AST node $c_i$ is initialized; the vectors of the child nodes $t_x$ are used to obtain the reconstruction vector of each non-leaf node $p_k$. The AST_Node2Vec model is then used to update the vectors of the nodes $c_i$ and the reconstruction vectors of the non-leaf nodes. If the loop condition is not satisfied, the loop continues; once it is satisfied, the AST with updated node vectors and the updated reconstruction vectors of the non-leaf nodes are output. These are used as the input of a tree-based convolutional neural network, which completes the classification of the code. Classifying code with this method effectively avoids the curse of dimensionality while capturing semantic similarity, making it well suited to classifying code by function.
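The abstract does not give the reconstruction formula, the AST_Node2Vec objective, or the loop condition, so the sketch below fills them with labeled assumptions. The reconstruction of a non-leaf node $p_k$ follows the continuous-binary-tree form used in Mou et al.'s program-vector learning, $\tanh\big(\sum_x l_x W_x \mathrm{vec}(t_x) + b\big)$, where $l_x$ is the share of $p_k$'s leaves that lie under $t_x$ and $W_x$ interpolates between a left and a right weight matrix by child position. The update is a partial gradient step that moves each non-leaf vector toward its current reconstruction while holding children and weights fixed, and the loop stops when the squared reconstruction error drops below a tolerance or an iteration cap is hit. All function names and hyperparameters are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leaf_count(node, children):
    """Number of leaves in the subtree rooted at node."""
    kids = children.get(node, [])
    return 1 if not kids else sum(leaf_count(c, children) for c in kids)


def reconstruct(p, children, vecs, w_l, w_r, bias):
    """Assumed reconstruction vector of non-leaf node p_k from its children t_x."""
    kids = children[p]
    n = len(kids)
    total = leaf_count(p, children)
    acc = list(bias)
    for pos, t in enumerate(kids):
        # Position weight: leftmost child uses w_l, rightmost w_r, a lone child the average.
        a = 0.5 if n == 1 else (n - 1 - pos) / (n - 1)
        w = [[a * l + (1 - a) * r for l, r in zip(row_l, row_r)]
             for row_l, row_r in zip(w_l, w_r)]
        coef = leaf_count(t, children) / total  # share of p's leaves under t
        acc = [s + coef * y for s, y in zip(acc, matvec(w, vecs[t]))]
    return [math.tanh(s) for s in acc]


def ast_node2vec_sketch(children, vecs, dim, lr=0.2, tol=1e-4, max_iter=500, seed=0):
    """Hypothetical update loop: repeat until the loop condition (error < tol) holds."""
    rng = random.Random(seed)
    w_l = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    w_r = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    parents = [p for p, kids in children.items() if kids]

    def evaluate():
        recs = {p: reconstruct(p, children, vecs, w_l, w_r, bias) for p in parents}
        err = sum(sum((v - r) ** 2 for v, r in zip(vecs[p], recs[p])) for p in parents)
        return recs, err

    recs, err = evaluate()
    steps = 0
    while err >= tol and steps < max_iter:  # condition not met: keep looping
        for p in parents:
            # Gradient of ||vec(p) - rec(p)||^2 w.r.t. vec(p), holding rec(p) fixed.
            vecs[p] = [v - lr * 2 * (v - r) for v, r in zip(vecs[p], recs[p])]
        recs, err = evaluate()
        steps += 1
    # Output: updated node vectors and updated reconstruction vectors of non-leaf nodes.
    return vecs, recs, err
```

Because leaf vectors stay fixed and each parent is pulled toward a target that depends only on its descendants, this toy loop settles level by level on any tree. A practical objective would typically also train the weight matrices and add a contrastive term (for example, negative sampling) so that the vectors do not collapse to a trivial solution.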
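The final classification stage can be sketched as the forward pass of a small tree-based CNN, loosely following the TBCNN design of Mou et al.: each convolution window covers a node and its direct children, each child is weighted between a left and a right matrix by its position, the window outputs are max-pooled over all nodes, and a softmax layer yields class probabilities. How the patent merges the updated node vectors with the reconstruction vectors is not stated in this excerpt; the hypothetical `build_inputs` helper assumes plain concatenation. Weights are random and untrained, and all names and sizes are illustrative.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def build_inputs(vecs, recs):
    """Hypothetical merge: concatenate vec(c_i) with its reconstruction (leaves reuse vec(c_i))."""
    return [v + recs.get(i, v) for i, v in enumerate(vecs)]


class TreeCNNSketch:
    """Untrained forward pass: tree convolution -> max pooling over nodes -> softmax."""

    def __init__(self, in_dim, conv_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.w_top = rand_matrix(conv_dim, in_dim, rng)
        self.w_left = rand_matrix(conv_dim, in_dim, rng)
        self.w_right = rand_matrix(conv_dim, in_dim, rng)
        self.b_conv = [0.0] * conv_dim
        self.w_out = rand_matrix(num_classes, conv_dim, rng)
        self.b_out = [0.0] * num_classes

    def convolve(self, node, children, x):
        """One convolution window: the node itself plus its direct children."""
        acc = [a + b for a, b in zip(matvec(self.w_top, x[node]), self.b_conv)]
        kids = children.get(node, [])
        n = len(kids)
        for pos, t in enumerate(kids):
            eta_r = 0.5 if n == 1 else pos / (n - 1)  # 0 for leftmost child, 1 for rightmost
            left = matvec(self.w_left, x[t])
            right = matvec(self.w_right, x[t])
            acc = [s + (1 - eta_r) * l + eta_r * r for s, l, r in zip(acc, left, right)]
        return [math.tanh(s) for s in acc]

    def predict_proba(self, children, x):
        features = [self.convolve(node, children, x) for node in range(len(x))]
        pooled = [max(col) for col in zip(*features)]  # max pooling over all nodes
        logits = [z + b for z, b in zip(matvec(self.w_out, pooled), self.b_out)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

Max pooling over all nodes is what lets a single fixed-size classifier handle trees of any shape; in training, the class with the highest probability would be compared against the code's functional label.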

Description

Technical Field
[0001] The invention relates to a code classification method, in particular to a code classification method based on a neural network language model that can classify code according to its function.
Background Art
[0002] Hindle et al. used statistical methods to compare programming languages with natural languages and found that they have very similar statistical properties. These properties are very difficult for humans to capture, but they demonstrate that learning-based methods can be applied to code analysis. Machine-learning-based code analysis methods have been studied for a long time; when solving problems such as code error detection and code duplication analysis, they rely on a large number of hand-crafted features. For each specific problem, these features require a large amount of labeled data. Moreover, the data representation used by such methods is a one-hot representation, that is, an N-dimensional vector is used to encode the N wo...
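The one-hot scheme mentioned in [0002] gives each of the N vocabulary tokens its own dimension, so the vector length grows with the vocabulary and any two distinct tokens are orthogonal, which means their cosine similarity is always zero. The toy vocabulary below is made up for illustration.

```python
import math


def one_hot(word, vocab):
    """N-dimensional one-hot code: 1 at the word's index, 0 elsewhere (N = vocabulary size)."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


vocab = ["for", "while", "if", "return", "printf"]
print(len(one_hot("for", vocab)))                              # 5
print(cosine(one_hot("for", vocab), one_hot("while", vocab)))  # 0.0
```

Even though `for` and `while` play similar roles, their one-hot codes share nothing. A learned dense embedding instead keeps a fixed, small dimension regardless of N and can place functionally related tokens near each other, which is the property the invention relies on.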


Application Information

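Patent type & authority: Patents (China)
IPC (8): G06F11/36; G06F16/35; G06N3/08
CPC: G06F11/3608; G06F16/35; G06N3/08
Inventors: 屈鸿 (Qu Hong), 杨林川 (Yang Linchuan), 涂强 (Tu Qiang), 张书州 (Zhang Shuzhou), 王淼 (Wang Miao), 颜志鹏 (Yan Zhipeng), 王一鸣 (Wang Yiming)
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA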