C source code vulnerability detection method based on Bert model and BiLSTM

A source code vulnerability detection technology, applied in the fields of code compilation, program code conversion, and neural learning methods. It addresses problems of existing approaches such as unguaranteed or low detection accuracy and the inability to effectively learn complex graph nodes, with the effect of improving accuracy and reducing the false positive rate.

Active Publication Date: 2021-09-21
STATE GRID GASU ELECTRIC POWER RES INST +2

AI Technical Summary

Problems solved by technology

Therefore, in the process of generating word vectors, semantic information is inevitably lost, which reduces the detection accuracy of the model.
Patent CN201911363149 uses semi-supervised learning, taking both labeled and unlabeled data as training data, and feeds code elements directly into an ELMo model to predict whether the source code contains vulnerability information. Although this saves code processing time, the ELMo model requires the parameters of each layer to be set during downstream training, can only encode a single word, and has no negative sampling process, so it cannot guarantee high detection accuracy.
Patent CN202



Examples


Detailed Description of the Embodiments

[0029] The present invention will be further described below in combination with specific embodiments.

[0030] A C source code vulnerability detection method based on the Bert (Bidirectional Encoder Representations from Transformers) model and the bidirectional long short-term memory network BiLSTM (Bi-directional Long Short-Term Memory), which mainly comprises the following steps:

[0031] Step A: Generate program slices. Starting from the software source code, the present invention uses the Joern tool to generate the program dependence graph (PDG) and the abstract syntax tree (AST) corresponding to the source code. The PDG contains the control dependence graph (CDG) and the data dependence graph (DDG) between code statements, and the AST contains the syntactic information between program statements; based on the control dependency information and data dependency information in the control dependenc...
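The slicing idea in Step A can be illustrated with a minimal sketch. It assumes a backward slice (the text visible here does not say which slicing direction the invention uses) and works on a hand-written toy dependence graph rather than real Joern output; the program, the edge lists, and the function name `backward_slice` are all hypothetical.

```python
from collections import deque

# Hypothetical toy C program (line number -> statement); not taken from the patent.
PROGRAM = {
    1: "char buf[8];",
    2: "int n = read_len();",
    3: "int unused = 0;",
    4: "if (n > 0)",
    5: "    memcpy(buf, src, n);",
}

# Dependence edges written as (source, target): `target` depends on `source`.
# In the described method these would come from the DDG and CDG produced by Joern.
DATA_DEPS = [(1, 5), (2, 4), (2, 5)]
CONTROL_DEPS = [(4, 5)]


def backward_slice(criterion, data_deps, control_deps):
    """Return the sorted line numbers that `criterion` transitively depends on."""
    predecessors = {}
    for source, target in list(data_deps) + list(control_deps):
        predecessors.setdefault(target, set()).add(source)

    seen = {criterion}
    queue = deque([criterion])
    while queue:  # breadth-first walk against the direction of the edges
        node = queue.popleft()
        for pred in predecessors.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


# Slice with respect to the memcpy call on line 5.
slice_lines = backward_slice(5, DATA_DEPS, CONTROL_DEPS)
code_block = "\n".join(PROGRAM[i] for i in slice_lines)
print(code_block)
```

Line 3 has no data or control dependence path to the `memcpy` call, so it is left out of the resulting slice-level code block.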



Abstract

A C source code vulnerability detection method based on a Bert model and BiLSTM. First, the software source code is analyzed, a control dependence graph and a data dependence graph are constructed, and the code is sliced according to the control and data dependency relations between statements to generate slice-level code blocks; the generated code blocks are then cleaned and preprocessed, and each one is labeled to indicate whether it contains vulnerability information. Second, the processed code blocks are input as a training set into the pre-trained Bert model to fine-tune the standard Bert model, yielding a new Bert model; the code blocks are then input into the new Bert model, which learns the semantic information and context relationships between code statements in an unsupervised manner and performs word-embedding encoding on the code blocks to obtain word vectors that maximize the code semantic information and context relationships. Finally, the obtained word vectors are input into a BiLSTM to train a detection model, yielding the source code vulnerability detection model. The method can improve vulnerability detection accuracy and reduce the false alarm rate.
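The two quantities the abstract refers to, detection accuracy and false alarm (false positive) rate, can be computed from per-block labels as in the following minimal sketch. It uses the standard definitions, accuracy = (TP + TN) / total and FPR = FP / (FP + TN); the label lists are made-up illustration data, not results from the patent, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = vulnerable, 0 = not vulnerable)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of code blocks classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(fp, tn):
    """Fraction of non-vulnerable code blocks wrongly flagged as vulnerable."""
    return fp / (fp + tn)


# Made-up labels for eight code blocks (illustration only).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("false positive rate:", false_positive_rate(fp, tn))
```

A detector can raise accuracy while still flagging many clean code blocks, which is why the two figures are reported separately.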

Description

Technical field

[0001] The invention relates to a software source code vulnerability detection method, in particular to a C source code vulnerability detection method based on the Bert model and BiLSTM.

Background art

[0002] Most network attack security incidents today exploit software vulnerabilities in device software. Software vulnerabilities are software defects introduced by developers during the development stage due to factors such as technical problems and lack of experience. These defects persist throughout the deployment and operation of the software. Attackers can therefore use exploit tools to attack the target system through such vulnerabilities at any time and from any place, obtaining administrator privileges, system data, and command-and-control privileges in order to disrupt the normal operation of the system or to gain economic benefit.

[0003] The existing relatively mature vulnerability mini...


Application Information

IPC(8): G06F21/56, G06F21/57, G06F8/41, G06N3/04, G06N3/08
CPC: G06F21/563, G06F21/577, G06F8/42, G06F8/436, G06N3/08, G06F2221/033, G06N3/044
Inventor: 马之力, 马宏忠, 李志茹, 张学军, 盖继扬, 杨启帆, 赵红, 张驯, 弥海峰, 谭任远, 李玺, 朱小琴, 白万荣, 杨勇, 魏峰, 龚波, 杨凡, 高丽娜
Owner STATE GRID GASU ELECTRIC POWER RES INST