A multi-dimensional graph tensor fusion representation and embedding method for code

CN116720185B · Active · Publication Date: 2026-06-26 · HUAZHONG UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUAZHONG UNIV OF SCI & TECH
Filing Date: 2023-05-23
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing text-based and symbol-based code embedding methods are inaccurate, fail to capture the structured semantic information of source code, and require a large training corpus, which makes them inefficient. GINN ignores long-range information and node order, which leads to loss of context.

Method used

A multi-dimensional graph tensor fusion representation method is adopted. Heterogeneous code graph structures (AST, DDG, CFG, and NCS) are generated from source code files, and graph convolution is combined with tensor loop calculation to produce a high-dimensional code graph tensor. This tensor is used to learn the internal features of the code and is applied to tasks such as malicious code identification and vulnerability detection.
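
As a rough illustration of the fusion step, the four graph views can be stacked into one tensor when they share a node set. The sketch below is an assumption for illustration only: the function names, the dense 0/1 adjacency encoding, and the toy edge lists are hypothetical and are not taken from the patent, which does not specify its tensor layout in this summary.

```python
# Hypothetical sketch: one adjacency slice per code graph view, stacked in a fixed order.
VIEWS = ("AST", "DDG", "CFG", "NCS")


def adjacency(num_nodes, edges):
    """Build a dense 0/1 adjacency matrix from a directed edge list."""
    mat = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        mat[src][dst] = 1
    return mat


def build_graph_tensor(num_nodes, edges_by_view):
    """Return a list of adjacency matrices, one per view, in VIEWS order.

    A view that is missing from edges_by_view yields an all-zero slice.
    """
    return [adjacency(num_nodes, edges_by_view.get(view, [])) for view in VIEWS]


# Toy edge lists over four nodes; the node ids and edges are illustrative only.
edges_by_view = {
    "AST": [(0, 1), (0, 2), (2, 3)],  # parent -> child
    "DDG": [(1, 2), (1, 3)],          # definition -> use
    "CFG": [(1, 2), (2, 3)],          # statement -> possible successor
    "NCS": [(0, 1), (1, 2), (2, 3)],  # token order in the source text
}

tensor = build_graph_tensor(4, edges_by_view)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Keeping the slices in a fixed order means a downstream layer can treat the first axis as a "view" dimension, in the same way an image model treats colour channels.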

Benefits of technology

It improves the accuracy and efficiency of code semantic embedding, better captures the contextual information and long-term dependencies of the code, enhances the classification ability of neural networks, and improves the detection accuracy of downstream tasks.


Figure CN116720185B_ABST

Abstract

The application discloses a multi-dimensional graph tensor fusion representation and embedding method for code, together with its application, and belongs to the field of artificial intelligence. The method includes: extracting the syntax information and hierarchical structure information of source code files and binary files; simultaneously generating four heterogeneous code graph structures from the source code files and binary files, namely the abstract syntax tree (AST), the data dependence graph (DDG), the control flow graph (CFG), and the natural code sequence (NCS); combining the four heterogeneous code graph structures to generate a high-dimensional graph tensor; and using an interpretable graph tensor convolution network (GTCN) to generate accurate code semantic embeddings and capture the internal features of the code. The technique is applied to various downstream tasks, such as malicious code identification, where it achieves a good balance between detection efficiency and accuracy.
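
The abstract does not define the GTCN layer, so the following is only a minimal sketch of what convolving over a stack of graph slices could look like. It assumes a generic scheme (row-normalised propagation with self-loops on each slice, a weighted sum across slices, ReLU, then mean pooling) and omits trainable weight matrices. All names and the toy inputs are hypothetical and should not be read as the patented method.

```python
def normalize_with_self_loops(adj):
    """Row-normalise (A + I) so each node averages itself and its successors."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1 if i == j else 0) for j in range(n)]
        total = sum(row)
        out.append([value / total for value in row])
    return out


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def tensor_conv_layer(tensor, features, view_weights):
    """Propagate features over each graph slice, then fuse slices by weighted sum."""
    n, d = len(features), len(features[0])
    fused = [[0.0] * d for _ in range(n)]
    for adj, weight in zip(tensor, view_weights):
        propagated = matmul(normalize_with_self_loops(adj), features)
        for i in range(n):
            for j in range(d):
                fused[i][j] += weight * propagated[i][j]
    return [[max(0.0, value) for value in row] for row in fused]  # ReLU


def embed(tensor, features, view_weights):
    """Mean-pool node features after one layer to get a single code embedding."""
    hidden = tensor_conv_layer(tensor, features, view_weights)
    n = len(hidden)
    return [sum(row[j] for row in hidden) / n for j in range(len(hidden[0]))]


# Toy input: two graph slices over two nodes, with one-hot node features.
toy_tensor = [
    [[0, 1], [0, 0]],  # slice 0: edge 0 -> 1
    [[0, 0], [1, 0]],  # slice 1: edge 1 -> 0
]
toy_features = [[1.0, 0.0], [0.0, 1.0]]
embedding = embed(toy_tensor, toy_features, view_weights=[0.5, 0.5])
```

In a trained model the per-view weights and a feature projection would be learned, and the pooled vector would feed a classifier for a downstream task such as malicious code identification.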