A C source code vulnerability detection method based on the Bert model and BiLSTM

A technology in the field of vulnerability detection and source code analysis, applied in code compilation, program code conversion, neural learning methods, etc. It addresses problems in existing methods such as missing semantic information, the inability to effectively learn complex graph nodes, and low detection accuracy.

Active Publication Date: 2022-05-13
STATE GRID GASU ELECTRIC POWER RES INST +2

AI Technical Summary

Problems solved by technology

In existing methods, semantic information is inevitably lost when word vectors are generated, which reduces the detection accuracy of the model.
Patent CN201911363149 uses semi-supervised learning, taking both labeled and unlabeled data as training data, and inputs code elements directly into the ELMo model to predict whether the source code contains vulnerability information. Although this saves code processing time, the ELMo model requires the parameters of each layer to be set during downstream training, can only encode single words, and has no negative sampling process, so it cannot guarantee high detection accuracy.
Patent CN202010576421 proposes a source code vulnerability detection method based on a graph convolutional neural network. The method obtains the code property graph corresponding to the source code, constructs a code slice graph structure based on vulnerability characteristics, and then uses a graph convolutional network to learn the vector representation of each graph node, training a source code vulnerability detection model. However, when the training code has a more complex structure, the generated code property graph is also more complex, and the graph convolutional network cannot effectively learn complex graph nodes, so the problem of low detection accuracy remains.

Examples

Embodiment Construction

[0029] The present invention will be further described below in combination with specific embodiments.

[0030] A C source code vulnerability detection method based on the Bert (Bidirectional Encoder Representations from Transformers) model and the bidirectional long short-term memory network BiLSTM (Bi-directional Long Short-Term Memory), which mainly includes the following steps:

[0031] Step A: Generate program slices. Based on the software source code, use the Joern tool to generate the program dependence graph PDG (Program Dependence Graph) and the abstract syntax tree AST (Abstract Syntax Tree) corresponding to the source code. The PDG contains the control dependence graph CDG (Control Dependence Graph) and the data dependence graph DDG (Data Dependence Graph) between code statements, and the AST contains syntactic information between program statements; based on the control dependency information and data dependency information in the control dependence graph CDG and data dependence graph ...
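
The slicing idea in Step A can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation and not Joern's API: the dependence edges are written by hand instead of being exported from Joern, the C fragment is invented for illustration, and `SOURCE`, `CDG`, `DDG`, `backward_slice`, and `slice_to_code_block` are hypothetical names. It also assumes a backward slice from one chosen statement of interest, since the text above is cut off before it says how slice starting points are selected.

```python
from collections import deque
from itertools import chain

# Hypothetical C fragment, keyed by line number (illustration only).
SOURCE = {
    1: "char buf[8];",
    2: "int n = read_len();",
    3: "int unused = 0;",
    4: "if (n > 0)",
    5: "    memcpy(buf, src, n);",
}

# Hand-written dependence edges: statement -> statements it depends on.
# In the described method these would come from the PDG produced by Joern.
DDG = {4: [2], 5: [1, 2]}  # data dependence: line 4 uses n; line 5 uses buf and n
CDG = {5: [4]}             # control dependence: line 5 runs only if line 4 holds


def backward_slice(criterion, cdg, ddg):
    """Collect every statement the criterion depends on, directly or
    transitively, through control or data dependence edges."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        for dep in chain(cdg.get(node, []), ddg.get(node, [])):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


def slice_to_code_block(lines, source):
    """Join the sliced statements, in source order, into one code block."""
    return "\n".join(source[n].strip() for n in lines)


slice_lines = backward_slice(5, CDG, DDG)
code_block = slice_to_code_block(slice_lines, SOURCE)
print(code_block)
```

Slicing from the `memcpy` call on line 5 keeps lines 1, 2, 4, and 5 and drops line 3, which has no dependence relation to the call. The resulting code block is the kind of unit that would then be cleaned, labeled, and passed on to the embedding stage.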

Abstract

A C source code vulnerability detection method based on the Bert model and BiLSTM. First, the software source code is analyzed to construct a control dependency graph and a data dependency graph, and the code is sliced according to the control and data dependency relationships between statements to generate slice-level code blocks. The generated code blocks are then cleaned and preprocessed, and each one is labeled to indicate whether it contains vulnerability information. Second, the processed code blocks are input as a training set into the pre-trained Bert model to fine-tune the standard Bert model and obtain a new Bert model. The code blocks are then input into the new Bert model, which learns the semantic information and contextual relationships between code elements in an unsupervised manner and performs word embedding encoding on the code blocks, yielding word vectors that retain as much code semantic information and contextual relationship as possible. Finally, the word vectors are input into a BiLSTM to train the detection model, yielding the source code vulnerability detection model. The invention can improve the accuracy of vulnerability detection and reduce the false alarm rate.
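
The final stage of the abstract, feeding word vectors into a BiLSTM, can be written out with the standard BiLSTM equations. This is the textbook formulation, given as a sketch; the source does not state the patent's exact layer sizes, gate variant, or classification head. The symbols are assumptions for illustration: $x_1, \dots, x_T$ are the word vectors produced by the Bert model for one code block, $W$, $U$, $b$, $w$ are learned parameters, and $\hat{y}$ is the predicted probability that the block is vulnerable.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t), \\[4pt]
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fwd}}\big(x_t, \overrightarrow{h}_{t-1}\big), &
\overleftarrow{h}_t &= \mathrm{LSTM}_{\mathrm{bwd}}\big(x_t, \overleftarrow{h}_{t+1}\big), \\[4pt]
\hat{y} &= \sigma\big(w^{\top}[\overrightarrow{h}_T ; \overleftarrow{h}_1] + b\big).
\end{aligned}
```

The forward pass reads the code block from the first token to the last and the backward pass reads it in reverse, so the concatenated state $[\overrightarrow{h}_T ; \overleftarrow{h}_1]$ summarizes the context on both sides of every token. Using that concatenation with a sigmoid output is one common choice for binary classification and is assumed here.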

Description

Technical field

[0001] The invention relates to a software source code vulnerability detection method, in particular to a C source code vulnerability detection method based on the Bert model and BiLSTM.

Background art

[0002] Most network attack security incidents today are based on software vulnerabilities in device software. Software vulnerabilities are software defects introduced by developers during the development stage due to factors such as technical problems and lack of experience. These defects persist throughout software deployment and operation. Attackers can therefore use exploit tools to attack the target system through such vulnerabilities at any time and from any place, obtaining administrator privileges, system data, and command and control privileges in order to disrupt the normal operation of the system or gain economic benefit.

[0003] The existing relatively mature vulnerability mini...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F21/56, G06F21/57, G06F8/41, G06N3/04, G06N3/08
CPC: G06F21/563, G06F21/577, G06F8/42, G06F8/436, G06N3/08, G06F2221/033, G06N3/044
Inventor: 马之力, 马宏忠, 李志茹, 张学军, 盖继扬, 杨启帆, 赵红, 张驯, 弥海峰, 谭任远, 李玺, 朱小琴, 白万荣, 杨勇, 魏峰, 龚波, 杨凡, 高丽娜
Owner: STATE GRID GASU ELECTRIC POWER RES INST