Code classification method based on a neural network language model

A code classification technique based on a neural network language model, applied in the field of code classification, that addresses problems such as the curse of dimensionality.
CN107220180A · Active · Publication Date: 2017-09-29 · UNIV OF ELECTRONICS SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
Publication Date: 2017-09-29

Abstract

The invention belongs to the field of software engineering and discloses a code classification method based on a neural network language model. The method comprises the following steps: first, the code is converted to an abstract syntax tree (AST), the vector of each AST node ci is initialized, and a reconstruction vector of each non-leaf node pk is obtained from the vectors of its child nodes tx. The vector of node ci is then updated using an AST-Node2Vec model. If the loop termination condition is not satisfied, the loop continues; if it is satisfied, the AST with the updated node vectors and the updated reconstruction vectors of the non-leaf nodes is output. This AST is used as the input of a tree-based convolutional neural network, which completes the code classification. Classifying code with this method effectively avoids the curse of dimensionality, exposes semantic similarity, and classifies code more accurately according to its function.
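
The first steps of the abstract (converting code to an AST, initializing node vectors, and reconstructing a non-leaf node from its children) can be sketched as follows. This is a minimal illustration only, not the patented procedure: the use of Python's standard `ast` module, the embedding size `DIM`, the random initialization, and the "tanh of the mean of the child vectors" reconstruction rule are all assumptions chosen for the example, and the helper names `init_node_vectors` and `reconstruct` are hypothetical. The AST-Node2Vec update step is not specified in the text above and is not implemented here.

```python
import ast
import math
import random

DIM = 8  # assumed embedding size, not taken from the patent


def init_node_vectors(tree, dim=DIM, seed=0):
    """Assign one small random vector per AST node type (assumed initialization)."""
    rng = random.Random(seed)
    vectors = {}
    for node in ast.walk(tree):
        name = type(node).__name__
        if name not in vectors:
            vectors[name] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return vectors


def reconstruct(node, vectors):
    """Rebuild a non-leaf node's vector from its children.

    Assumed rule: tanh of the element-wise mean of the child vectors.
    Returns None for leaf nodes, which have no children to combine.
    """
    children = list(ast.iter_child_nodes(node))
    if not children:
        return None
    dim = len(next(iter(vectors.values())))
    sums = [0.0] * dim
    for child in children:
        for i, value in enumerate(vectors[type(child).__name__]):
            sums[i] += value
    return [math.tanh(s / len(children)) for s in sums]


source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)                         # code -> AST
vectors = init_node_vectors(tree)                # vector for each node type
root_reconstruction = reconstruct(tree, vectors) # non-leaf node from children
```

In a training loop, the distance between a non-leaf node's own vector and its reconstruction would typically serve as the quantity to minimize; the exact objective and stopping condition used by the invention are not given in this excerpt.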

Description

Technical field

[0001] The invention relates to a code classification method, in particular to a code classification method based on a neural network language model, which can classify code according to its function.

Background

[0002] Hindle et al. used statistical methods to compare programming languages with natural languages and found that they have very similar statistical properties. These properties are very difficult for humans to capture, but they demonstrate that learning-based methods can be applied to the field of code analysis. Code analysis methods based on machine learning have been studied for a long time, but they rely on a large number of hand-crafted features when solving problems such as code error detection and code duplication analysis. For a specific problem, these features require a large amount of labeled data. Moreover, the data representation of this approach is a one-hot representation, that is, an N-dimensional vector is used to encode the N wo...
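
The one-hot representation mentioned in paragraph [0002] can be illustrated with a short sketch. The four-token vocabulary and the helper name `one_hot` are hypothetical and chosen only for the example. It shows the two general properties of one-hot encoding: the vector length equals the vocabulary size, and any two distinct tokens have a dot product of zero, so the encoding carries no notion of similarity.

```python
def one_hot(token, vocabulary):
    """Return a vector with a single 1 at the token's index in the vocabulary."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(token)] = 1
    return vec


# Hypothetical toy vocabulary; a real code corpus would have far more entries,
# and every vector would be as long as the whole vocabulary.
vocabulary = ["if", "for", "while", "return"]

v_for = one_hot("for", vocabulary)
v_while = one_hot("while", vocabulary)

# Distinct tokens are always orthogonal, even when they are semantically close.
dot = sum(a * b for a, b in zip(v_for, v_while))
```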
