A steganographic detection and attack method based on data format deviation

By analyzing the format deviations and object reference relationships in BPlist files, the system identifies and attacks steganographic data, thus solving the problem of detecting and blocking steganographic data in BPlist files and achieving efficient steganography detection and defense.

CN122241744A · Pending · Publication Date: 2026-06-19 · XIAMEN MEIYABAIKE INFORMATION SECURITY RES INST CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN MEIYABAIKE INFORMATION SECURITY RES INST CO LTD
Filing Date
2026-01-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies lack effective methods for steganography detection and attack targeting the binary property list (BPlist) file format commonly used on iOS devices. They cannot identify and block the transmission of steganographic data without destroying the file structure.

Method used

By parsing the header, object table, and offset table of the BPlist file, the system determines whether the offset relationships, boundaries, and content of objects conform to the format specifications, constructs object reference closures, and combines lenient and strict parsing methods to identify and attack steganographic data blocks, performing replacement or erasure operations.

Benefits of technology

It achieves efficient identification and blocking of steganographic data in BPlist files, with high detection accuracy and strong adversarial capabilities. It is applicable to multiple versions of BPlist files and significantly improves steganography detection and defense capabilities.



Abstract

This invention discloses a steganography detection and attack method based on data format deviation, specifically including: acquiring a BinaryPlist file to be detected and reading the file as a binary byte stream; parsing the file header of the BinaryPlist file to determine whether the file conforms to a preset BinaryPlist file identifier format; parsing the offset table information in the BinaryPlist file to obtain the file offset address corresponding to each object in the object table; based on the offset address, parsing each object in the object table one by one to obtain the object type, object length, and object content; determining whether the offset relationship, object boundary, and object content of the object deviate from the format specification of the BinaryPlist file; if a format deviation is detected, determining the position of the corresponding steganography data block; and performing an attack operation on the steganography data block, including steganography data replacement or steganography data erasure, to destroy or block the steganography information.

Description

Technical Field

[0001] This invention relates to the field of document processing technology, and mainly to a method for steganography detection and attack based on data format deviation.

Background Technology

[0002] Plist files are a unique file format frequently encountered in iOS application development or iOS device forensics. As one of the system data persistence solutions, this file format is very convenient and quick to use.

[0003] Modern steganography commonly uses images, audio, and video as carrier files. It combines methods such as least significant bit (LSB) embedding, frequency-domain steganography, DCT steganography algorithms, and data encryption to hide the desired text or file for covert transmission.

[0004] Binary Plist (BPlist) is a common file format on iOS devices, often used to store basic device information and application configuration data. The structure of this file format can also be used to embed steganographic data without disrupting the viewing of the original BPlist file data. Currently, there are no methods to detect or attack steganography in this type of file format.

Summary of the Invention

[0005] To address the aforementioned problems, this invention proposes a BPlist file steganography detection and attack method based on data format deviation. This method supports attacks on steganographic files, replacing the data the sender intends to transmit; it also supports the complete erasure of steganographic data, disrupting the message transmission chain. The specific steps are as follows:
S1. Obtain the Binary Plist file to be detected and read the file as a binary byte stream;
S2. Parse the header of the Binary Plist file to determine whether the file conforms to the preset Binary Plist file identifier format;
S3. Parse the offset table information in the Binary Plist file to obtain the file offset address corresponding to each object in the object table;
S4. Based on the offset address, parse each object in the object table one by one to obtain the object type, object length and object content;
S5. Determine whether the offset relationship, object boundary, and object content of the object deviate from the format specification of the Binary Plist file;
S6. If a format deviation is detected, determine the location of the corresponding steganographic data block;
S7. For the steganographic data block, perform an attack operation, including steganographic data replacement or steganographic data erasure, to destroy or block the steganographic information.

[0006] Preferably, in step S3, when parsing the offset table information, the starting position of the offset table, the number of objects, and the length of the offset field are determined based on the trailer structure at the end of the file.

[0007] Preferably, in step S4, when parsing the object, the object type is determined based on the high-order information of the object identifier byte, and the object length or length encoding method is determined based on the low-order information.

[0008] Preferably, the determination of format deviation in step S5 specifically includes: the object offset address exceeding the legal range of the file, the abnormal spacing between adjacent object offset addresses, the object type identifier not conforming to the Binary Plist file specification, and the object declaration length being inconsistent with the actual parsable length.

[0009] Preferably, a reference graph between objects in the Binary Plist file is constructed, and the object reference closure is calculated based on the top-level object; it is then determined whether there are any objects not contained in the reference closure, or objects whose content changes while the object reference relationship remains unchanged, in order to identify the steganographic data object.

[0010] Preferably, when parsing a dictionary type object, the key-value pair logical relationship is reconstructed, and the reasonableness of the value object corresponding to the key in terms of type or length is judged based on the preset key-value semantic rules.

[0011] Preferably, the complete steganographic data block boundary is determined by merging multiple consecutive objects with abnormal deviations.

[0012] Preferably, for Binary Plist files generated by different versions or different generation tools, a combination of loose and strict parsing is used to perform steganography detection.

[0013] According to a second aspect of the present invention, a computer program product is provided, on which one or more computer programs are stored, which, when executed by a computer processor, implement the method described above.

[0014] The above-described one or more technical solutions in the embodiments of this application have at least one of the following technical effects: This invention identifies detached or anomalous objects based on the inherent object reference invariants of Binary Plist files, effectively identifying steganographic data blocks embedded under conditions of valid format and complete structure. Furthermore, without disrupting the original object reference structure or the parsability of the file, this invention can perform replacement, perturbation, or removal operations on detected steganographic data, thereby interfering with and blocking steganographic communication. This invention offers high detection accuracy, strong adversarial capabilities, and applicability to multiple versions of Binary Plist files, improving the detection and defense capabilities against steganographic behavior in this type of file.

Description of the Drawings

[0015] The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and, together with the description, serve to explain the principles of the invention. Other embodiments and many anticipated advantages of the embodiments will be readily recognized as they become better understood through reference to the following detailed description. Elements in the drawings are not necessarily to scale. The same reference numerals refer to corresponding similar parts.

[0016] Figure 1 shows a framework diagram of a Binary Plist file according to an embodiment of the present invention.

[0017] Figure 2a shows a schematic diagram of the raw hexadecimal data in the boundary region between the file header and the object table of a sample file according to an embodiment of the present invention.

[0018] Figure 2b shows a schematic diagram of the raw hexadecimal data in the boundary region between the object table and the offset table of a sample file according to an embodiment of the present invention.

[0019] Figure 2c shows a schematic diagram of the raw hexadecimal data in the middle area of the object table of a sample file according to an embodiment of the present invention.

[0020] Figure 2d shows a schematic diagram of the raw hexadecimal data in the spare byte area at the end of a sample file according to an embodiment of the present invention.

[0021] Figure 3 shows a flowchart of a steganography detection and attack method based on data format deviation according to an embodiment of the present invention.

[0022] Figure 4 shows a schematic diagram of the specific process of steganography detection based on data format deviation according to an embodiment of the present invention.

[0023] Figure 5 is a schematic diagram of the structure of a computer system suitable for implementing the electronic devices of the embodiments of the present application.

Detailed Implementation

[0024] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0025] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0026] Each Binary Plist file consists of four main parts. As shown in Figure 1, these are, in order, the header, the object table, the offset table, and the trailer (tail). The standard format of a BPlist file contains only these four parts, which are ordered and contiguous, with no other information in between; the data within the object table and the offset table are also contiguous.

[0027] When reading BPlist file data, the file type is first identified from the file header. The last 32 bytes of the file define the starting position of the offset table, the number of elements in the offset table, the byte length of each integer in the offset table, and the byte length of the object reference integers used in the object table. Using these values together with the offset of each element written in the offset table, the object table is traversed to retrieve the corresponding data.
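The trailer lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the commonly documented 32-byte trailer layout (6 spare bytes, 1-byte offset integer size, 1-byte object reference size, then three 8-byte big-endian integers for object count, top object index, and offset table start). The names `Trailer`, `parse_trailer`, and `read_offsets` are hypothetical.

```python
import struct
from typing import List, NamedTuple

class Trailer(NamedTuple):
    offset_size: int         # bytes per entry in the offset table
    ref_size: int            # bytes per object reference in the object table
    num_objects: int         # number of entries in the offset table
    top_object: int          # index of the top-level object
    offset_table_start: int  # file offset where the offset table begins

def parse_trailer(data: bytes) -> Trailer:
    # Assumed layout: 6 spare bytes, 2 size bytes, three 8-byte integers.
    if len(data) < 8 + 32:
        raise ValueError("file too short to hold a header and a trailer")
    return Trailer(*struct.unpack(">6xBBQQQ", data[-32:]))

def read_offsets(data: bytes, t: Trailer) -> List[int]:
    # Read every offset table entry as a big-endian unsigned integer.
    end = t.offset_table_start + t.num_objects * t.offset_size
    if end > len(data) - 32:
        raise ValueError("offset table overruns the trailer")
    return [int.from_bytes(data[i:i + t.offset_size], "big")
            for i in range(t.offset_table_start, end, t.offset_size)]
```

The returned offsets are what the later boundary checks compare against the header size and against each other.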

[0028] 1) The boundary region between the file header and the object table. In the standard BPlist file format, the file header and object table are contiguous. The file header is fixed at 8 bytes: the first 6 bytes are the format identifier, HEX: 62 70 6C 69 73 74 (ASCII: bplist), and the last 2 bytes are the version, HEX: 30 30 (ASCII: 00). The start of the object table is given by the value of the first element of the offset table. As shown in Figure 2a, the first element of the offset table is located at 0x3A, and its value 0x14 indicates that the object table starts at 0x14 in the file.

[0029] According to the standard format, the object table should start at 0x08, so it can be determined that the 0x08~0x13 block in the figure contains steganographic data.
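The header check and the leading-gap test can be illustrated with a short sketch. The function names are hypothetical, ranges are half-open `(start, end)`, and taking the smallest offset as the start of the object table is an assumption (the text above uses the first element of the offset table).

```python
from typing import List, Optional, Tuple

MAGIC = b"bplist"   # HEX 62 70 6C 69 73 74
HEADER_SIZE = 8     # 6-byte identifier plus 2-byte version

def check_header(data: bytes) -> str:
    """Return the 2-character version string, or raise if the magic is wrong."""
    if data[:6] != MAGIC:
        raise ValueError("not a Binary Plist file")
    return data[6:8].decode("ascii")

def leading_gap(offsets: List[int]) -> Optional[Tuple[int, int]]:
    """Return the half-open byte range between header and first object, if any."""
    first = min(offsets)  # assumption: the lowest offset marks the table start
    return (HEADER_SIZE, first) if first > HEADER_SIZE else None
```

With the Figure 2a values, a first object at 0x14 yields the range (0x08, 0x14), which corresponds to the 0x08~0x13 block.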

[0030] 2) The boundary region between the object table and the offset table. In the standard format of a BPlist file, the object table and the offset table are contiguous, and the first element of the offset table should immediately follow the end of the object table.

[0031] As shown in Figure 2b, the first element of the offset table is at 0x3A, but the last byte of the object table is at 0x2D. The two are not contiguous and do not conform to the standard format. Therefore, it can be determined that the block from 0x2E to 0x39 in the figure contains steganographic data.

[0032] 3) The middle area of the object table. In the standard format of a BPlist file, the elements in the object table are contiguous, meaning that the beginning of each element description should immediately follow the end of the previous element description. The element description includes the object type, length (if any), and content (if any).

[0033] There are 11 object types. Their marker byte values and lengths are as follows, where X is the low 4 bits of the marker byte:
Single byte (HEX: 0X): X=0 indicates null, X=8 indicates Boolean false, and X=9 indicates Boolean true.
Integer (HEX: 1X): the value of the integer is the 2^X bytes following this byte.
Floating-point number (HEX: 2X): the value of the floating-point number is the 2^X bytes following this byte.
Date (HEX: 33): the 8 bytes following this byte are the timestamp.
Binary (HEX: 4X): the X bytes following this byte are the binary content. If X=F, the following bytes are parsed as an integer object, and the resulting number is the number of bytes.
String (HEX: 5X): ASCII encoding, where X is the number of bytes in the data. If X=F, the following bytes are parsed as an integer object, and the resulting number is the number of bytes.
String (HEX: 6X): Unicode encoding, where X is the number of 2-byte characters in the data. If X=F, the following bytes are parsed as an integer object, and twice the resulting number is the number of bytes.
UID (HEX: 8X): the content of the UID is the X+1 bytes following this byte.
Array (HEX: AX): X is the number of elements. If X=F, the following bytes are parsed as an integer object, and the resulting number is the number of elements. The count is followed by that many element references, each giving the position of an element in the offset table.
Set (HEX: CX): X is the number of elements. If X=F, the following bytes are parsed as an integer object, and the resulting number is the number of elements. The count is followed by that many element references, each giving the position of an element in the offset table.
Dictionary (HEX: DX): X is the number of key-value pairs. If X=F, the following bytes are parsed as an integer object, and the resulting number is the number of pairs. The count is followed by the offset table positions of the X keys and then the offset table positions of the X values.
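The marker rules above can be turned into a size calculation, which is what boundary checks need. The following is a sketch under the rules as listed, with hypothetical function names; `ref_size` is the object reference width taken from the last 32 bytes of the file, and unknown markers are reported as a `ValueError`.

```python
from typing import Tuple

def read_count(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (count, header_length) for the object whose marker is at pos."""
    low = data[pos] & 0x0F
    if low != 0x0F:
        return low, 1
    int_marker = data[pos + 1]          # X=F: an integer object follows
    if int_marker >> 4 != 0x1:
        raise ValueError("extended length is not an integer object")
    n = 1 << (int_marker & 0x0F)        # the integer occupies 2^X bytes
    return int.from_bytes(data[pos + 2:pos + 2 + n], "big"), 2 + n

def object_size(data: bytes, pos: int, ref_size: int) -> int:
    """Total number of bytes occupied by the object starting at pos."""
    marker = data[pos]
    kind, low = marker >> 4, marker & 0x0F
    if kind == 0x0:
        if low in (0x0, 0x8, 0x9):      # null, false, true
            return 1
        raise ValueError(f"unknown marker 0x{marker:02X}")
    if kind in (0x1, 0x2):              # integer, floating-point number
        return 1 + (1 << low)
    if marker == 0x33:                  # date: 8-byte timestamp
        return 9
    if kind == 0x8:                     # UID: X+1 bytes
        return 1 + low + 1
    # bytes per counted unit for data, strings, and containers
    units = {0x4: 1, 0x5: 1, 0x6: 2,
             0xA: ref_size, 0xC: ref_size, 0xD: 2 * ref_size}
    if kind in units:
        count, header = read_count(data, pos)
        return header + count * units[kind]
    raise ValueError(f"unknown marker 0x{marker:02X}")
```

Adding the returned size to an object's offset gives the position where the next object is expected to begin.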

[0034] As shown in Figure 2c, 0x0B is the offset of the second element, and 0x2D is the offset of the third element. The value at 0x0B is 0x5F, indicating that this element is a string whose length is given by a following integer object. The next byte, 0x10, is the marker of that integer object and indicates that the length occupies the next 2^0 = 1 byte, whose value is 0x13 (at 0x0D in the figure). The content of the second element in the object table is therefore the 19-byte string "NSHTTPAcceptCookies", and the second element ends at 0x20. According to the standard format, the third element should start at 0x21, but in the figure it starts at 0x2D. Therefore, it can be determined that the block from 0x21 to 0x2C in the figure contains steganographic data.
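Both the gap inside the object table and the gap between the object table and the offset table reduce to finding uncovered byte ranges. A minimal sketch, assuming each object is already described by a hypothetical `(start, size)` pair and that ranges are half-open:

```python
from typing import List, Tuple

def find_gaps(spans: List[Tuple[int, int]],
              table_start: int, table_end: int) -> List[Tuple[int, int]]:
    """Return half-open byte ranges in [table_start, table_end) not covered
    by any object span (start, size)."""
    gaps = []
    cursor = table_start
    for start, size in sorted(spans):
        if start > cursor:
            gaps.append((cursor, start))     # bytes no object accounts for
        cursor = max(cursor, start + size)   # tolerate overlapping spans
    if cursor < table_end:
        gaps.append((cursor, table_end))     # slack before the offset table
    return gaps
```

Using the Figure 2c values, a 22-byte object at 0x0B followed by an object at 0x2D gives the range (0x21, 0x2D), that is, 0x21 to 0x2C. Using the Figure 2b values, an object table ending at 0x2D with an offset table starting at 0x3A gives (0x2E, 0x3A). The 1-byte size used for the last object in these examples is an illustrative assumption.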

[0035] 4) The spare bytes area at the end of the file. In the standard BPlist file format, the structure at the end of the file contains 6 spare bytes, HEX: 00 00 00 00 00 00. As shown in Figure 2d, the area from 0x31 to 0x36 is the spare byte region, and its content does not conform to the standard format. Therefore, it can be determined that the block from 0x31 to 0x36 in the figure contains steganographic data.
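A check of the spare byte region could look like the sketch below. It assumes the 6 spare bytes are the first 6 bytes of the final 32-byte structure, which matches the commonly documented layout but is not stated explicitly in the text above; the function names are hypothetical.

```python
from typing import Tuple

TRAILER_SIZE = 32
SPARE_LEN = 6  # number of spare bytes, as described above

def spare_region(data: bytes) -> Tuple[int, int]:
    """Return the half-open file range assumed to hold the spare bytes."""
    if len(data) < TRAILER_SIZE:
        raise ValueError("file too short to contain a trailer")
    start = len(data) - TRAILER_SIZE
    return start, start + SPARE_LEN

def spare_is_clean(data: bytes) -> bool:
    """True when every spare byte is 0x00."""
    start, end = spare_region(data)
    return not any(data[start:end])
```

A file for which `spare_is_clean` returns False would be flagged, and the range from `spare_region` is the candidate steganographic block.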

[0036] Figure 3 shows the flowchart of the steganography detection and attack method based on data format deviation, and Figure 4 shows the specific flow of steganography detection. As shown in Figure 3 and Figure 4, the method specifically includes:
S1. Obtain the Binary Plist file to be detected and read the file as a binary byte stream.

[0037] S2. Parse the header of the Binary Plist file to determine whether the file conforms to the preset Binary Plist file identifier format.

[0038] S3. Parse the offset table information in the Binary Plist file to obtain the file offset address corresponding to each object in the object table; when parsing the offset table information, determine the starting position of the offset table, the number of objects, and the length of the offset field based on the trailer structure at the end of the file.

[0039] S4. Based on the offset address, each object in the object table is parsed one by one to obtain the object type, object length and object content; when parsing the object, the object type is determined according to the high-order information of the object identifier byte, and the object length or length encoding method is determined according to the low-order information; when parsing a dictionary type object, the key-value pair logical relationship is reconstructed, and the rationality of the value object corresponding to the key in terms of type or length is judged based on the preset key-value semantic rules.
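The key-value semantic check in S4 can be sketched as a rule table lookup over an already decoded dictionary. The source does not give concrete rules, so the table below, including the expected type and length limit for "NSHTTPAcceptCookies", is purely hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical rules: key -> (expected Python type, maximum length or None)
KEY_RULES: Dict[str, Tuple[type, Optional[int]]] = {
    "NSHTTPAcceptCookies": (str, 32),
}

def check_pairs(pairs: Dict[str, Any],
                rules: Dict[str, Tuple[type, Optional[int]]]) -> List[Tuple[str, str]]:
    """Return (key, reason) for each value that breaks its type or length rule."""
    findings = []
    for key, value in pairs.items():
        rule = rules.get(key)
        if rule is None:
            continue                      # no rule known for this key
        expected_type, max_len = rule
        if not isinstance(value, expected_type):
            findings.append((key, "type"))
        elif max_len is not None and len(value) > max_len:
            findings.append((key, "length"))
    return findings
```

A value that is far longer than its key plausibly needs, or of an unexpected type, is reported for closer inspection rather than treated as proof of steganography.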

[0040] S5. Determine whether the offset relationship, object boundary, and object content of the object deviate from the format specification of the Binary Plist file. The conditions for determining deviation are: the object offset address exceeds the legal range of the file, the spacing between adjacent object offset addresses is abnormal, the object type identifier does not conform to the Binary Plist file specification, and the object declaration length is inconsistent with the actual parsable length.

[0041] S6. If a format deviation is detected, determine the location of the corresponding steganographic data block, and determine the complete boundary of the steganographic data block by merging multiple consecutive objects with abnormal deviations.
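Merging consecutive anomalous regions into one block is a standard interval merge. A minimal sketch with half-open `(start, end)` ranges; the `max_gap` tolerance parameter is an assumption added for illustration.

```python
from typing import List, Tuple

def merge_ranges(ranges: List[Tuple[int, int]],
                 max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge half-open ranges that touch, overlap, or lie within max_gap bytes."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + max_gap:
            # extend the previous block instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, two adjacent anomalies (0x21, 0x27) and (0x27, 0x2D) collapse into the single block (0x21, 0x2D).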

[0042] S7. For the steganographic data block, perform an attack operation, including steganographic data replacement or steganographic data erasure, to destroy or block the steganographic information.
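The attack step can be illustrated with an in-place overwrite. The source does not specify how erasure or replacement is carried out; this sketch assumes the block is overwritten without changing the file length, so that existing offsets stay valid. `attack_block` is a hypothetical name.

```python
from typing import Optional

def attack_block(data: bytes, start: int, end: int,
                 replacement: Optional[bytes] = None) -> bytes:
    """Overwrite data[start:end] with zeros (erasure) or with a repeated
    replacement pattern, keeping the overall length unchanged."""
    if not 0 <= start <= end <= len(data):
        raise ValueError("block lies outside the file")
    length = end - start
    if replacement is None:
        fill = bytes(length)              # erasure: all 0x00
    else:
        if not replacement:
            raise ValueError("replacement pattern must not be empty")
        # repeat the pattern and trim it to the block length
        fill = (replacement * (length // len(replacement) + 1))[:length]
    return data[:start] + fill + data[end:]
```

Calling `attack_block(data, 0x21, 0x2D)` would zero the block found in the Figure 2c example, while passing `replacement=b"..."` substitutes decoy content.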

[0043] Furthermore, a crucial structural characteristic of Binary Plist files is that all objects must ultimately be referenced from `topObject`, forming a complete reference closure. A well-formed file therefore contains no detached objects and no data that is parsed but not logically used. Because every object that exists is used, steganographic data is most easily hidden inside existing objects, in content that does not affect the reference semantics. This embodiment therefore implements an object reference closure consistency determination algorithm, which specifically includes:
First, input the Binary Plist object table, offset table, and topObject index.
Construct an object reference graph, treating each object as a node; if the content of object A references object B, a directed edge from A to B is created, resulting in a directed graph G.
Starting from topObject, perform a DFS or BFS traversal of graph G; the set of all nodes that can be reached is denoted Closure(topObject).
For each object, determine whether it belongs to the reference closure. If it does not, it is a detached object and is strongly suspected of carrying steganographic data.
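The closure computation can be sketched with a breadth-first search. The graph is assumed to be given as a hypothetical mapping from each object index to the indexes it references (array elements, set elements, dictionary keys and values).

```python
from collections import deque
from typing import Dict, List, Set

def reference_closure(refs: Dict[int, List[int]], top_object: int) -> Set[int]:
    """Return Closure(topObject): every object index reachable from top_object."""
    seen = {top_object}
    queue = deque([top_object])
    while queue:
        node = queue.popleft()
        for child in refs.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def detached_objects(refs: Dict[int, List[int]],
                     num_objects: int, top_object: int) -> List[int]:
    """Return the indexes of objects that lie outside the reference closure."""
    return sorted(set(range(num_objects)) - reference_closure(refs, top_object))
```

For instance, with `refs = {0: [1, 2], 2: [3]}` and five objects in the offset table, object 4 is never reached from the top object and would be reported as detached.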

[0044] If the content of a referenced object changes, but its referencing method, reference count, and reference path remain completely unchanged, this is extremely rare in normal configuration files but very suitable for steganography. The determination steps are as follows:
Calculate the in-degree of the object node (in_degree) and the reference path set (RefPathSet).
Perform content length change detection, content entropy change detection, or content interpretability detection on the object.
If RefPathSet remains unchanged, in_degree remains unchanged, but the content undergoes non-semantic changes, the object is determined to be a steganographic object.
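One way to approximate the "non-semantic change" test is a Shannon entropy comparison against a baseline copy of the same object. The source does not define the metric or any threshold, so the `entropy_jump` value of 1.0 bit per byte and the function names below are assumptions.

```python
import math
from collections import Counter
from typing import FrozenSet, Tuple

def shannon_entropy(content: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not content:
        return 0.0
    n = len(content)
    return -sum(c / n * math.log2(c / n) for c in Counter(content).values())

def is_suspicious_change(old: bytes, new: bytes,
                         old_in_degree: int, new_in_degree: int,
                         old_paths: FrozenSet[Tuple[int, ...]],
                         new_paths: FrozenSet[Tuple[int, ...]],
                         entropy_jump: float = 1.0) -> bool:
    """Flag content that changed and became markedly more random while
    in_degree and RefPathSet stayed the same."""
    same_refs = old_in_degree == new_in_degree and old_paths == new_paths
    jump = shannon_entropy(new) - shannon_entropy(old)
    return same_refs and old != new and jump >= entropy_jump
```

Encrypted or compressed payloads tend to have entropy close to 8 bits per byte, so a sharp rise in an otherwise stable object is a useful, though not conclusive, signal.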

[0045] For Binary Plist files generated by different versions or different generation tools, a combination of lenient and strict parsing methods is used to perform steganography detection.

[0046] In this combination, lenient parsing is used to keep object references resolvable even when object offsets, object lengths, or object types are non-standard, while strict parsing is used to verify the consistency of the object format.
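The interplay of the two modes can be sketched on a single check, the offset range test. In this illustration the lenient pass records every violation and keeps going, while the strict pass stops at the first one; the function names and the message format are hypothetical, and a real parser would apply the same pattern to lengths and type markers.

```python
from typing import List, Tuple

HEADER_SIZE = 8

def check_offsets(offsets: List[int], object_table_end: int,
                  strict: bool) -> List[str]:
    """Validate that each offset lies inside the object table.
    Strict mode raises on the first violation; lenient mode collects them."""
    issues = []
    for index, off in enumerate(offsets):
        if not HEADER_SIZE <= off < object_table_end:
            msg = f"object {index}: offset 0x{off:X} outside object table"
            if strict:
                raise ValueError(msg)
            issues.append(msg)
    return issues

def lenient_then_strict(offsets: List[int],
                        object_table_end: int) -> Tuple[bool, List[str]]:
    """Return (strict_ok, issues_found_by_the_lenient_pass)."""
    issues = check_offsets(offsets, object_table_end, strict=False)
    try:
        check_offsets(offsets, object_table_end, strict=True)
        strict_ok = True
    except ValueError:
        strict_ok = False
    return strict_ok, issues
```

A file that only the lenient pass can process, together with the list of recorded issues, points to the places where the format has been bent.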

[0047] This invention proposes a method for steganography detection and attack on BPlist files based on data format deviation. This method has been verified and can effectively detect the presence of steganographic data, and supports steganographic data replacement and erasure. It complements modern steganography techniques, has strong practicality, and has high application value.

[0048] Reference is now made to Figure 5, which shows a schematic diagram of the structure of a computer system 500 suitable for implementing electronic devices according to embodiments of the present application. The electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0049] As shown in Figure 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 502 or programs loaded from storage section 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0050] The following components are connected to I / O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a liquid crystal display (LCD) and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card and a modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to I / O interface 505 as needed. A removable medium 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 510 as needed so that computer programs read from it can be installed into storage section 508 as needed.

[0051] Specifically, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable medium 511. When the computer program is executed by central processing unit (CPU) 501, it performs the functions defined in the methods of this application. It should be noted that the computer-readable medium of this application can be a computer-readable signal medium or a computer-readable storage medium or any combination thereof. The computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0052] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof. Programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0053] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0054] The modules described in the embodiments of this application can be implemented in software or in hardware.

[0055] In another aspect, this application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist independently without being assembled into the electronic device. The aforementioned computer-readable storage medium carries one or more programs. When the one or more programs are executed by the electronic device, they cause the electronic device to:
S1. Obtain the Binary Plist file to be detected and read the file as a binary byte stream;
S2. Parse the file header of the Binary Plist file to determine whether the file conforms to a preset Binary Plist file identifier format;
S3. Parse the offset table information in the Binary Plist file to obtain the file offset address corresponding to each object in the object table;
S4. Based on the offset address, parse each object in the object table one by one to obtain the object type, object length, and object content;
S5. Determine whether the offset relationship, object boundary, and object content of the object deviate from the format specification of the Binary Plist file;
S6. If a format deviation is detected, determine the location of the corresponding steganographic data block;
S7. Perform an attack operation on the steganographic data block, including steganographic data replacement or steganographic data erasure, to destroy or block the steganographic information.

[0056] The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-described inventive concept. For example, it also covers technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) this application.

Claims

1. A steganography detection and attack method based on data format deviation, characterized in that it includes:
S1. Obtain the BinaryPlist file to be detected and read the file as a binary byte stream;
S2. Parse the header of the BinaryPlist file to determine whether the file conforms to the preset BinaryPlist file identifier format;
S3. Parse the offset table information in the BinaryPlist file to obtain the file offset address corresponding to each object in the object table;
S4. Based on the offset address, parse each object in the object table one by one to obtain the object type, object length and object content;
S5. Determine whether the offset relationship, object boundary and object content of the object deviate from the format specification of the BinaryPlist file;
S6. If a format deviation is detected, determine the location of the corresponding steganographic data block;
S7. For the steganographic data block, perform an attack operation, including steganographic data replacement or steganographic data erasure, to destroy or block the steganographic information.

2. The steganography detection and attack method according to claim 1, characterized in that, in step S3, when parsing the offset table information, the starting position of the offset table, the number of objects, and the length of the offset field are determined based on the trailer structure at the end of the file.

3. The steganography detection and attack method according to claim 1, characterized in that, in step S4, when parsing the object, the object type is determined based on the high-order information of the object identifier byte, and the object length or length encoding method is determined based on the low-order information.

4. The steganography detection and attack method according to claim 1, characterized in that the specific criteria for determining format deviation in step S5 include: object offset address exceeding the legal range of the file, abnormal spacing between adjacent object offset addresses, object type identifier not conforming to the BinaryPlist file specification, and object declaration length inconsistent with the actual parsable length.

5. The steganography detection and attack method according to claim 1, characterized in that it further includes: constructing a graph of reference relationships between objects in the BinaryPlist file, and calculating the object reference closure based on the top-level object; and determining whether there are any objects not contained in the reference closure, or objects whose content changes while the object reference relationship remains unchanged, in order to identify the steganographic data object.

6. The steganography detection and attack method according to claim 1, characterized in that it further includes: when parsing a dictionary-type object, reconstructing the key-value pair logical relationship, and judging the reasonableness of the value object corresponding to the key in terms of type or length based on the preset key-value semantic rules.

7. The steganography detection and attack method according to claim 1, characterized in that it further includes: determining the complete boundary of the steganographic data block by merging multiple consecutive objects with abnormal deviations.

8. The steganography detection and attack method according to claim 1, characterized in that it further includes: for BinaryPlist files generated by different versions or different generation tools, using a combination of lenient and strict parsing methods to perform steganography detection.

9. A computer program product, characterized in that it stores a computer program that, when executed by a processor, implements the method as described in any one of claims 1-8.

10. A computing system, characterized in that it includes a processor and a memory, the processor being configured to perform the method as described in any one of claims 1-8.