Online detection method for semantically similar code based on deep learning

A deep-learning-based semantic similarity technique, applied in the field of software defect prediction in software engineering, that addresses the problem that semantically similar code cannot be detected by existing methods, and is simple and easy to operate.

Active Publication Date: 2021-05-25
NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI

AI Technical Summary

Problems solved by technology

Research results show that existing clone code detection methods detect less than 1% of semantically similar code; that is, more than 99% of semantically similar code goes undetected by these methods.


Embodiment Construction

[0018] The method of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0019] The present invention uses static code analysis and natural language processing to mine the natural language and program semantic information hidden in program text identifiers. It then trains a deep neural network model to perform semantic similarity mapping, and predicts semantic similarity from the network's output.
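The pair-scoring idea in paragraph [0019] can be illustrated with a minimal sketch. The excerpt does not give the network architecture, the vector dimensions, or the training procedure, so everything below is an assumption made for illustration: a single hidden layer, random untrained weights, and the hypothetical helper names `make_layer`, `dense`, and `similarity_score`. It only shows how two function vectors might be concatenated and mapped to a score between 0 and 1.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights (untrained, illustrative)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply a dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def similarity_score(vec_a, vec_b, hidden, output):
    """Concatenate two function vectors and map them to a score in (0, 1)."""
    pair = vec_a + vec_b                      # concatenation of the two vectors
    h = [max(0.0, z) for z in dense(pair, hidden)]   # ReLU hidden layer (assumed)
    logit = dense(h, output)[0]
    return 1.0 / (1.0 + math.exp(-logit))     # sigmoid output


rng = random.Random(0)
dim = 4                                       # assumed vector size per function
hidden_layer = make_layer(2 * dim, 8, rng)
output_layer = make_layer(8, 1, rng)

func_a = [0.2, 0.0, 1.0, 0.5]                 # made-up function vectors
func_b = [0.1, 0.0, 0.9, 0.4]
score = similarity_score(func_a, func_b, hidden_layer, output_layer)
```

With untrained weights the score carries no meaning; in a real system the weights would be learned from labelled pairs of similar and dissimilar functions, and a threshold on the score would give the final prediction.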

[0020] As shown in Figure 1, an online detection method for semantically similar code based on deep learning comprises the following steps:

[0021] Step 1: Use static code analysis technology to extract function information from the sample library.

[0022] First, the source code in the sample library is analyzed by static analysis technology, and the structure information and text identifier information of each function are extracted. Then, store th...
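As a rough illustration of the extraction in Step 1, the sketch below uses Python's standard `ast` module to collect, for each function, its identifiers and a coarse list of statement types. The excerpt does not say which programming language the sample library contains or which analysis tool is used, so treating the input as Python source, the helper name `extract_function_info`, and the record layout are all assumptions.

```python
import ast


def extract_function_info(source):
    """Return one record per function: name, identifiers, statement-type sequence."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Text identifier information: variable names and parameter names.
            names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            params = {a.arg for a in ast.walk(node) if isinstance(a, ast.arg)}
            # Structure information: statement node types in traversal order.
            structure = [type(n).__name__ for n in ast.walk(node)
                         if isinstance(n, ast.stmt)]
            records.append({
                "name": node.name,
                "identifiers": sorted(names | params),
                "structure": structure,
            })
    return records


sample = "def add_nums(a, b):\n    total = a + b\n    return total\n"
records = extract_function_info(sample)
```

Each record could then be stored for later processing; how the method itself stores the extracted information is cut off in the text above and is not assumed here.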


Abstract

The invention discloses an online detection method for semantically similar code based on deep learning, and belongs to the technical field of software defect prediction in software engineering. Based on identifier text similarity and semantic similarity, the method judges online whether two given code snippets (functions) are semantically similar. The method comprises the following steps: first, extracting the relevant identifier text and function structure information of each function from a sample library; performing natural language processing such as word segmentation, abbreviation expansion and part-of-speech tagging on the identifier text information; abstracting the program structure information and converting it into a vector representation; and concatenating the vector representations of the two functions and feeding them into a deep neural network for semantic similarity learning and automatic detection. The invention makes full use of deep learning to mine the similar semantic information implied in program text, and ensures the real-time performance and accuracy of online detection.
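The identifier processing named in the abstract (word segmentation and abbreviation expansion) might look like the following sketch. The splitting rules, the tiny `ABBREVIATIONS` table, and the function names are hypothetical and chosen only for illustration; the abstract does not specify them, and part-of-speech tagging is left out because it would need an external NLP library.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"num": "number", "buf": "buffer", "idx": "index", "msg": "message"}


def split_identifier(identifier):
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs, capitalised or lowercase words, and digit runs.
        words.extend(re.findall(
            r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return [w.lower() for w in words]


def normalize_identifier(identifier):
    """Segment an identifier, then expand any known abbreviations."""
    return [ABBREVIATIONS.get(w, w) for w in split_identifier(identifier)]


camel = normalize_identifier("getMaxNum")
snake = normalize_identifier("read_buf_idx")
acronym = split_identifier("HTTPResponse")
```

After this step, identifiers written in different styles, such as `getMaxNum` and `get_max_number`, reduce to the same word sequence, which is what makes a text-level comparison of identifiers possible.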

Description

Technical field

[0001] The invention relates to an online detection method for semantically similar codes, in particular to an online detection method for semantically similar codes based on deep learning, and belongs to the technical field of software engineering software defect prediction.

Background technique

[0002] In the field of software defect prediction technology, clone code refers to code fragments with the same or similar functions, which can be divided into four categories: clone code with the same text, clone code with similar text, clone code with similar syntax, and clone code with similar semantics.

[0003] The existence of clone code not only increases the redundancy of software, but also brings certain difficulties to software maintenance and software evolution. To this end, different methods have been proposed for detecting, eliminating and managing cloned code. Among them, effective detection of cloned codes is the premise and basis for managing and...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75, G06F16/33, G06N3/08
CPC: G06F8/751, G06F16/3344, G06F16/3334, G06F16/3338, G06N3/08
Inventor: 李光杰, 唐艺, 张翔, 易比一, 侯胜杰
Owner: NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI