A binary code analysis method and system for cross-architecture knowledge transfer

The method trains a word embedding model for each CPU architecture and aligns the resulting spaces with linear transformation alignment matrices to establish a unified instruction semantic space. This bridges the analysis differences between CPU architectures, improves analysis accuracy for rarely used (low-frequency) CPU architectures, and strengthens cross-architecture applicability.

CN122195451A · Pending · Publication Date: 2026-06-12 · SHANGHAI PALMIN TECH +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: SHANGHAI PALMIN TECH
Filing Date: 2026-03-27
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing binary code analysis methods are hard to reuse across CPU architectures. For rarely used (low-frequency) CPU architectures in particular, code samples and labeled data are scarce, so analysis tools have low accuracy and coverage. In addition, existing cross-architecture conversion methods lose architecture-specific semantic information, which further reduces analysis accuracy.

Method used

A binary code corpus is generated for each CPU architecture, and a word embedding model is trained on it to obtain that architecture's instruction vector space. Linear transformation alignment matrices then align the instruction semantic spaces of the different CPU architectures into a unified general word embedding vector space, which enables cross-architecture knowledge transfer.
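
The corpus-generation step can be illustrated with a minimal sketch. It assumes that basic blocks have already been recovered by a disassembler and arrive as lists of instruction strings. The normalization rule (numeric literals become `IMM`) and the names `normalize` and `build_corpus` are illustrative assumptions, not details taken from the patent. Each instruction becomes one token and each basic block one sentence, which is the input shape a word embedding model expects.

```python
import re

# Assumed normalization rule (not from the patent): hexadecimal or decimal
# literals are replaced by IMM so that the vocabulary stays small.
_IMM = re.compile(r"(?<![\w.])#?-?(?:0x[0-9a-fA-F]+|\d+)\b")


def normalize(instruction: str) -> str:
    """Turn one disassembled instruction into a single corpus token."""
    text = _IMM.sub("IMM", instruction.strip().lower())
    mnemonic, _, operands = text.partition(" ")
    operands = re.sub(r"\s+", "", operands)  # drop spaces between operands
    return f"{mnemonic}_{operands}" if operands else mnemonic


def build_corpus(basic_blocks):
    """Each basic block becomes one sentence (a list of instruction tokens)."""
    return [[normalize(ins) for ins in block] for block in basic_blocks]


# Hypothetical input: one x86 block and one ARM64 block.
corpus = build_corpus([["mov eax, 0x10", "ret"], ["add x0, x1, #4"]])
```

In practice one corpus would be built per architecture, and the resulting sentences passed to whichever word embedding trainer is in use.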

Benefits of technology

The method allows binary code analysis knowledge to be reused across CPU architectures, improves analysis accuracy for rarely used (low-frequency) CPU architectures, provides a unified vector space that supports cross-architecture tasks such as instruction prediction and vulnerability detection, and offers good scalability.
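
One way a unified vector space could support cross-architecture tasks is nearest-neighbor lookup: given an instruction vector from one architecture, find the most similar instruction from another. The sketch below assumes cosine similarity as the metric and uses made-up three-dimensional vectors; the tokens, the `nearest` helper, and the `(architecture, token)` key layout are hypothetical and not specified by the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, unified_space, exclude_arch=None):
    """Return ((arch, token), similarity) of the closest entry in the shared space."""
    best = None
    for (arch, token), vec in unified_space.items():
        if arch == exclude_arch:
            continue
        sim = cosine(query, vec)
        if best is None or sim > best[1]:
            best = ((arch, token), sim)
    return best


# Toy unified space with invented vectors, for illustration only.
unified_space = {
    ("x86", "add_eax,IMM"): [0.9, 0.1, 0.0],
    ("x86", "ret"): [0.0, 0.1, 0.9],
    ("arm64", "add_x0,x0,IMM"): [0.8, 0.2, 0.1],
    ("arm64", "ret"): [0.1, 0.0, 0.95],
}

# Look up the x86 instruction closest to an ARM64 add.
hit = nearest(unified_space[("arm64", "add_x0,x0,IMM")], unified_space, exclude_arch="arm64")
```

With these toy vectors the ARM64 add maps to the x86 add rather than to `ret`, which is the behavior a well-aligned space is meant to provide.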

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides a binary code analysis method and system for cross-architecture knowledge transfer, and relates to static binary code analysis in the field of software code analysis. It addresses two technical problems: binary code analysis methods are difficult to reuse across CPU architectures, and labeled data and analysis knowledge are scarce for rarely used (low-frequency) CPU architectures. The application extracts basic blocks from the binary code of multiple CPU architectures and trains word embeddings on them to obtain a word embedding vector space for each architecture. A sparse matrix then marks semantically similar instruction pairs across architectures, and a linear transformation alignment matrix is computed through iterative optimization. Finally, the vector space of the frequently used (high-frequency) CPU architecture serves as the reference, and the vector spaces of the other architectures are aligned to it by linear transformation and merged into a unified general word embedding vector space. Compared with the prior art, the application achieves semantic alignment and knowledge transfer across instruction sets, so that analysis knowledge accumulated on the high-frequency CPU architecture can be reused for binary code analysis on low-frequency CPU architectures, effectively addressing the low analysis accuracy caused by insufficient labeled data.
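
The alignment and merging steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the sparse pair matrix is stored in dictionary-of-keys form, the iterative optimization is plain gradient descent on the mean squared error, and the embeddings, tokens, and architecture names are toy values. The patent does not specify the loss function, the optimizer, or the sparse format, so all of these choices are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_alignment(src, base, pairs, dim, lr=0.1, epochs=500):
    """Iteratively fit W so that W @ src[i] is close to base[j] for marked pairs.

    Gradient descent on the mean squared error is an assumption; the patent
    only states that the matrix is obtained by iterative optimization.
    """
    W = [[float(r == c) for c in range(dim)] for r in range(dim)]  # identity start
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for (i, j), weight in pairs.items():
            err = [p - t for p, t in zip(matvec(W, src[i]), base[j])]
            for r in range(dim):
                for c in range(dim):
                    grad[r][c] += 2.0 * weight * err[r] * src[i][c]
        for r in range(dim):
            for c in range(dim):
                W[r][c] -= lr * grad[r][c] / len(pairs)
    return W


def merge(base_vocab, base, src_vocab, src, W, base_arch, src_arch):
    """Map source vectors into the reference space and pool both vocabularies."""
    unified = {(base_arch, tok): vec for tok, vec in zip(base_vocab, base)}
    unified.update({(src_arch, tok): matvec(W, vec) for tok, vec in zip(src_vocab, src)})
    return unified


# Toy 2-D embeddings (invented). The source space is the reference space
# rotated by 90 degrees, so an exact linear alignment exists.
base_vocab = ["add", "sub", "ret"]       # high-frequency (reference) architecture
base = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
src_vocab = ["addu", "subu", "jr_ra"]    # low-frequency architecture
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Sparse pair matrix: entry (i, j) = 1.0 marks source instruction i as
# semantically similar to reference instruction j; all other entries are zero.
pairs = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0}

W = fit_alignment(src, base, pairs, dim=2)
unified = merge(base_vocab, base, src_vocab, src, W, "x86", "mips")
```

Because only the marked pairs contribute to the gradient, the cost of each iteration grows with the number of nonzero entries rather than with the product of the two vocabulary sizes, which is one plausible reason for using a sparse representation.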