Software security vulnerability detection method based on grammatical features and semantic features

A software security technology based on grammatical and semantic features, applied in software testing/debugging, error detection/correction, computer security devices, etc. It addresses problems such as missing semantics, incomplete semantic expression, and large loss of original information, thereby improving accuracy and completeness and, in turn, detection performance.

Active Publication Date: 2021-03-23
BEIJING INSTITUTE OF TECHNOLOGY +1
Cites: 3. Cited by: 13.

AI Technical Summary

Problems solved by technology

[0006] (1) Existing techniques use AST-based feature extraction: a search algorithm converts the AST into a node sequence, and features are then extracted from that sequence. The drawback is that an AST is a tree structure that reflects node types and the relationships between nodes. When the tree is converted into a node sequence, existing search algorithms (such as depth-first search) cannot reflect the adjacency and ordering relationships between nodes in the same layer; that is, nodes in the same layer can end up arranged in several different ways after conversion. The resulting sequence therefore cannot retain the structural information of the original tree, and much of the original information is lost.
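The loss of structure described above can be illustrated with a small sketch. It is not taken from the patent: the tuple-based tree encoding, the node labels, and the helper names `preorder` and `parent_child_edges` are assumptions chosen for illustration. It shows two different trees that flatten to the same depth-first node sequence, while their parent-child edges differ.

```python
# Hypothetical tree encoding: (node_type, [children]).

def preorder(tree):
    """Flatten a tree into a depth-first (preorder) sequence of node types."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(preorder(child))
    return seq


def parent_child_edges(tree):
    """List the (parent_type, child_type) edges of the tree."""
    label, children = tree
    edges = []
    for child in children:
        edges.append((label, child[0]))
        edges.extend(parent_child_edges(child))
    return edges


# In tree_a, "Name" is a child of "Call"; in tree_b it is a sibling of "Call".
tree_a = ("If", [("Call", [("Name", [])]), ("Return", [])])
tree_b = ("If", [("Call", []), ("Name", []), ("Return", [])])

same_sequence = preorder(tree_a) == preorder(tree_b)
same_structure = parent_child_edges(tree_a) == parent_child_edges(tree_b)
print(same_sequence, same_structure)
```

Because both trees produce the same node sequence, a model that only sees the sequence cannot tell them apart, which is the motivation for learning on the tree structure directly.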
[0007] (2) Existing techniques use CFG-based feature extraction: a graph neural network extracts semantic features from the CFG. The drawback is that the CFG contains only the control flow information of the program; it lacks data flow information, so the semantic expression is incomplete.
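A minimal sketch of the gap between control flow and data flow, under stated assumptions: the three-statement straight-line program, its variable names, and the helpers `control_flow_edges` and `data_dependence_edges` are hypothetical and do not come from the patent. The data dependence from statement 0 to statement 2 has no corresponding control flow edge.

```python
# Toy straight-line program: (statement text, variable defined, variables used).
program = [
    ("n = read_len()", "n", set()),
    ("buf = alloc(16)", "buf", set()),
    ("copy(buf, src, n)", None, {"buf", "src", "n"}),
]


def control_flow_edges(stmts):
    """Straight-line code: each statement flows to the next one."""
    return {(i, i + 1) for i in range(len(stmts) - 1)}


def data_dependence_edges(stmts):
    """Link each use of a variable to the most recent statement defining it."""
    last_def = {}
    edges = set()
    for i, (_, defined, used) in enumerate(stmts):
        for var in used:
            if var in last_def:
                edges.add((last_def[var], i))
        if defined is not None:
            last_def[defined] = i
    return edges


cfg = control_flow_edges(program)
ddg = data_dependence_edges(program)
missing_from_cfg = ddg - cfg
print(sorted(missing_from_cfg))
```

A graph that combines both edge sets, such as the program dependence graph used later in the method, carries the dependence of the copy length on `n` that the CFG alone does not express.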
[0011] (6) Existing detection methods based on source code syntactic and semantic features have certain deficiencies. Bag-of-words models based on n-gram sequences and word vector models of token sequences have difficulty accurately describing the syntax and semantics of code. In AST-based detection methods, the AST describes the syntax of the code well but cannot represent the execution semantics of the program, so many vulnerabilities related to execution semantics cannot be detected. In detection methods based on the control flow graph, the CFG represents the execution process of the program well but does not contain variable declarations, so some semantics are missing, which has a great impact on the detection and localization of vulnerabilities.

Examples

Embodiment Construction

[0059] Embodiments of the present invention are described in detail below in conjunction with the accompanying drawings. As shown in Figure 1, the method of the present invention comprises the following steps:

[0060] Step 1. Determine the granularity of the detection object:

[0061] The detection object may be a function, a file, a component, or any related code fragment; the granularity is determined according to the needs of the actual detection project. The language of the detection project may be C/C++, Java, or PHP.

[0062] Step 2. Establish a software history vulnerability library:

[0063] Collect, from public software vulnerability libraries, software security vulnerabilities written in the same programming language as the software project under detection, and establish a language-specific vulnerability sample library. The size of each sample matches the detection granularity. The vulnerability library indicates that the samples with the detection granularity have vulnerabilit...
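Steps 1 and 2 can be pictured with a small data-structure sketch. Everything here is an assumption for illustration: the `Sample` record, its fields, the `build_library` helper, and the placeholder code strings are hypothetical, and the patent text available here does not define a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One labeled code sample at a chosen detection granularity (hypothetical)."""
    language: str      # e.g. "C", "Java", "PHP"
    granularity: str   # e.g. "function", "file", "component"
    code: str          # source text of the sample
    vulnerable: bool   # label taken from the public vulnerability data


def build_library(records, language, granularity):
    """Keep only samples matching the project's language and granularity."""
    return [
        r for r in records
        if r.language == language and r.granularity == granularity
    ]


# Placeholder records; the code strings are stand-ins, not real vulnerabilities.
records = [
    Sample("C", "function", "void f(char *s) { /* ... */ }", True),
    Sample("C", "function", "int g(int x) { return x + 1; }", False),
    Sample("Java", "function", "void h() { /* ... */ }", True),
    Sample("C", "file", "/* whole file */", False),
]

library = build_library(records, language="C", granularity="function")
print(len(library), sum(s.vulnerable for s in library))
```

Filtering by both fields keeps the training samples the same size and language as the objects that will later be checked.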

Abstract

The invention discloses a software security vulnerability detection method based on grammatical features and semantic features. The method comprises the following steps: 1, determining the granularity of the detection object; 2, establishing a software historical vulnerability library; 3, establishing an abstract syntax tree (AST) of the detection object; 4, embedding the abstract syntax tree; 5, compiling the source code of the detection object software; 6, establishing a program dependence graph (PDG) of the detection object; 7, embedding the program dependence graph; 8, learning the features of the AST by using a graph convolutional neural network; and 9, learning the features of the PDG by using a bidirectional LSTM. The method has the following advantages: the precision, accuracy, and recall of the detection model are improved; and the AST tree structure is learned directly by a graph neural network, so that no information is lost, and extracting features directly with the graph neural network can greatly improve the detection performance of the model.
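Step 8 learns AST features with a graph convolutional network. The text here does not specify the layer, so the following is only a sketch under an assumption: it uses the widely used propagation rule H' = ReLU(Â H W), where Â is the adjacency matrix with self-loops, symmetrically normalized by node degree. The function names, the three-node AST, and the identity weight matrix are placeholders; in a real model W would be learned.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph convolution step: ReLU(a_hat @ features @ weights)."""
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in z]


# Toy AST: node 0 is the root with children 1 and 2.
ast_edges = [(0, 1), (0, 2)]
a_hat = normalized_adjacency(3, ast_edges)

# One-hot node-type features and a placeholder (identity) weight matrix.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden = gcn_layer(a_hat, identity, identity)
print(hidden)
```

Each output row mixes a node's own features with those of its tree neighbors, so parent-child structure enters the representation without first flattening the tree into a sequence.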

Description

Technical field

[0001] The invention belongs to the technical field of software security, and in particular relates to a method for detecting software security vulnerabilities based on grammatical features and semantic features.

Background

[0002] At present, with the large-scale disclosure of software source code and its vulnerability data, relevant data can be obtained in large quantities at low cost, and data-driven methods are used for vulnerability detection. The idea is to use the feature learning ability of deep learning to automatically extract the vulnerability characteristics of source code modules and establish a vulnerability detection model. The whole process is divided into two stages: the model building stage and the model application stage. In the model building stage, the granularity of the analysis object is determined first, that is, the size of the software source code module. The software soft ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F21/57, G06F11/36, G06N3/04
CPC: G06F21/577, G06F11/3668, G06N3/044, G06N3/045
Inventor: 危胜军, 胡昌振, 钟浩, 陶莎, 赵敬宾
Owner: BEIJING INSTITUTE OF TECHNOLOGY