A method and system for serialization and deserialization of structured data based on meta-tag self-description

Meta tags address three problems in existing technologies: inaccurate data type representation, poor readability, and lost semantics. The result is a self-describing, type-accurate, AI-friendly method of serialization and deserialization that improves the efficiency and accuracy of data exchange.

CN122309596A · Pending · Publication Date: 2026-06-30 · ZHENZHI (BEIJING) CONSULTING CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: ZHENZHI (BEIJING) CONSULTING CO LTD
Filing Date: 2026-04-13
Publication Date: 2026-06-30

Figure CN122309596A_ABST

Abstract

This invention discloses a method and system for serializing and deserializing structured data based on meta-tag self-description, in the field of computer data serialization and deserialization technology. It addresses the pain points of existing formats: incomplete type systems, separation of structure from data, poor human readability, difficult debugging, and inefficient interaction with AI. The method and system balance human readability, AI friendliness, and machine efficiency.

Description

Technical Field

[0001] This invention relates to the field of computer data serialization and deserialization technology.

Background Technology

[0002] Currently used data serialization and deserialization formats have many shortcomings, mainly including:

[0003] 1. Text formats such as JSON cannot represent data types precisely: they do not distinguish between types such as int8, uint16, and float32.

[0004] 2. Binary formats such as Protobuf are unreadable by humans, difficult to debug, and separate data structure definitions from data.

[0005] 3. Formats such as MessagePack describe only type and length, so the complete data structure and its semantic information are lost.

[0006] 4. Existing formats lack semantic descriptions and validation rules, making it difficult for artificial intelligence to understand the meaning of the data and to accurately parse and use the data without external documentation.

[0007] To address the aforementioned issues, this invention proposes a method and system for structured data serialization and deserialization based on meta-tag self-description.

Summary of the Invention

[0008] Through meta tags, this invention binds data structure, semantic information, and validation rules tightly to the original data. This solves the problems of insufficient type accuracy, poor readability, missing semantics, and difficulty of AI understanding in existing technologies.

[0009] The specific technical solution is as follows:

[0010] 1. Introduction to Meta Tags: A meta tag is a key-value data structure that defines the data type and describes the data's function, validation rules, and other attribute values. In the data structures, data classes, and data objects of programming languages, meta tags are represented as tags, annotations, etc. In text formats such as JSONC, they are represented as comments.

[0011] 2. Standardized Data Types: The main classes are simple values, integers, decimals, strings, byte arrays, containers, and tag types. Subclasses cover more specific types such as int, uint, float32, and float64.

[0012] 3. Package data with meta tags:

[0013] a. Serialization: If there are meta tags, wrap the data with the meta tags and encode it.

[0014] b. Deserialization: First determine the main data type, then parse out the meta tags to extract the specific type and value.

[0015] 4. Representation of meta tags: In each programming language, meta tags are represented with their complete attributes. In the serialized form and in the text representation, attributes may be omitted where this causes no ambiguity.

[0016] 5. Core content of meta tags:

[0017] a. Data types: such as int8, uuid, url, etc.

[0018] b. Data functions: such as name, user login time, etc.

[0019] c. Data validation: For example, the email field must conform to the email specification format.

[0020] d. Data example: If the data is null, an example value of the defined type is created so that the caller can reconstruct the data structure without loss.

[0021] e. Data attributes: such as IPv4 or IPv6, etc.

[0022] 6. Strong type constraints: All data must have a clearly defined type; missing types are not allowed. This ensures data consistency across languages and systems.
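
The meta tag contents and the strong type constraint listed above can be sketched in Go as follows. This is a minimal illustration, not the format defined by the invention: the names MetaTag, Value, CheckType, and ErrMissingType, the field layout, and the small set of supported type names are all assumptions introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// MetaTag is a hypothetical in-memory form of a meta tag.
// The field names are illustrative; the source defines no concrete layout.
type MetaTag struct {
	Type       string            // precise data type, e.g. "int8", "uuid", "url"
	Function   string            // what the data means, e.g. "user login time"
	Validation string            // validation rule, e.g. "email"
	Example    string            // example value for use when the data is null
	Attrs      map[string]string // other attributes, e.g. "ip": "v4"
}

// Value pairs raw data with the meta tag that describes it.
type Value struct {
	Tag  MetaTag
	Data any
}

// ErrMissingType is returned when a value declares no type at all.
var ErrMissingType = errors.New("meta tag has no type")

// CheckType enforces the strong type constraint: a type must be declared,
// and the Go value must match it. Only a few types are covered here.
func CheckType(v Value) error {
	ok := false
	switch v.Tag.Type {
	case "":
		return ErrMissingType
	case "int8":
		_, ok = v.Data.(int8)
	case "uint16":
		_, ok = v.Data.(uint16)
	case "float32":
		_, ok = v.Data.(float32)
	case "string", "url", "uuid":
		_, ok = v.Data.(string)
	default:
		return fmt.Errorf("unknown type %q", v.Tag.Type)
	}
	if !ok {
		return fmt.Errorf("value %v does not match declared type %q", v.Data, v.Tag.Type)
	}
	return nil
}

func main() {
	count := Value{
		Tag:  MetaTag{Type: "uint16", Function: "login count"},
		Data: uint16(42),
	}
	fmt.Println("typed value:", CheckType(count))
	fmt.Println("untyped value:", CheckType(Value{Data: "orphan"}))
}
```

Rejecting an empty Type before any other check mirrors the rule that missing types are not allowed.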

[0023] Core advantages of the invention

[0024] 1. Self-description: The data comes with its own structure, semantics, and rules, requiring no external documentation.

[0025] 2. High-precision types: Supports precise types such as int8 / uint16 to avoid precision loss.

[0026] 3. Human-readable: can be mapped to JSONC / YAML / TOML, and meta tags are presented as comments.

[0027] 4. AI-friendly: The semantics are clear, and the meaning of the data can be understood without documentation.

[0028] 5. High efficiency and compact design: Low-level binary encoding, small size, and fast encoding and decoding.

[0029] 6. Automatic code generation: It can automatically generate structs, classes, etc. in various languages.

[0030] This invention is applicable to scenarios such as network communication, API interaction, configuration files, AI tool calls, and cross-system data exchange.

Attached Figure Description

[0031] Figure 1: Serialization and Deserialization Flowchart

[0032] Figure 2: Example of JSONC text representation

Detailed Implementation

[0033] Example 1: Serialization (URL type) 1. Determine that the data to be encoded is of URL type; 2. Automatically add URL type tags, format rules, and validation rules based on the meta tag system; 3. Perform validity checks on the URL data; 4. Integrate meta tags and URL data into a compact binary encoding; 5. Output the serialized binary data.

[0034] Example 2: Deserialization (URL type) 1. Input serialized binary data; 2. Parse the data, extract meta tags, and identify the data type as URL; 3. Parse the data content according to URL type rules; 4. Output the complete URL data.
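
Examples 1 and 2 can be sketched together as a round trip in Go. The byte layout below (a one-byte marker, then a length-prefixed type name, then a length-prefixed payload) is an assumption made only for this sketch; the source does not specify the binary encoding. EncodeURL, DecodeURL, and mainTypeTagged are hypothetical names, and the validity check (scheme and host required) is likewise an illustrative choice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net/url"
)

// mainTypeTagged is a hypothetical marker for "value wrapped in a meta tag".
const mainTypeTagged byte = 0x01

// appendField appends a uvarint length followed by the bytes of s.
func appendField(dst []byte, s string) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(len(s)))
	dst = append(dst, buf[:n]...)
	return append(dst, s...)
}

// readField reads one length-prefixed field and returns the remaining bytes.
func readField(src []byte) (string, []byte, error) {
	n, k := binary.Uvarint(src)
	if k <= 0 || uint64(len(src)-k) < n {
		return "", nil, errors.New("truncated field")
	}
	end := k + int(n)
	return string(src[k:end]), src[end:], nil
}

// EncodeURL validates raw, then wraps it with a "url" type tag.
func EncodeURL(raw string) ([]byte, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, errors.New("url needs a scheme and a host")
	}
	out := []byte{mainTypeTagged}
	out = appendField(out, "url") // meta tag: specific type
	out = appendField(out, raw)   // payload
	return out, nil
}

// DecodeURL checks the main type first, then the meta tag, then the payload.
func DecodeURL(data []byte) (*url.URL, error) {
	if len(data) == 0 || data[0] != mainTypeTagged {
		return nil, errors.New("not a tagged value")
	}
	typ, rest, err := readField(data[1:])
	if err != nil {
		return nil, err
	}
	if typ != "url" {
		return nil, fmt.Errorf("unexpected type %q", typ)
	}
	raw, _, err := readField(rest)
	if err != nil {
		return nil, err
	}
	return url.Parse(raw)
}

func main() {
	data, err := EncodeURL("https://example.com/docs?lang=en")
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	u, err := DecodeURL(data)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(len(data), "bytes; host:", u.Host, "path:", u.Path)
}
```

The decoder follows the order given in the summary: it determines the main data type first, then parses the meta tag to find the specific type, and only then interprets the value.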

[0035] Example 3: JSONC Text Representation 1. Input the parsed URL data; 2. Convert meta tags into JSONC comments, annotating their type, function, format, etc.; 3. Output the data as text; 4. Generate JSONC format text that is readable by humans and understandable by AI.
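
One possible shape for the JSONC output of Example 3 is sketched below in Go. The comment wording ("type: ...; function: ...; format: ...") and the names Field and RenderJSONC are assumptions for illustration; the source states only that meta tags become comments.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Field is a hypothetical decoded value together with its meta tag attributes.
type Field struct {
	Name     string
	Value    string
	Type     string
	Function string
	Format   string
}

// RenderJSONC writes each field as a JSON member preceded by a line comment
// that carries the meta tag. Comment text is assumed to be single-line.
func RenderJSONC(fields []Field) (string, error) {
	var b strings.Builder
	b.WriteString("{\n")
	for i, f := range fields {
		fmt.Fprintf(&b, "  // type: %s; function: %s; format: %s\n", f.Type, f.Function, f.Format)
		key, err := json.Marshal(f.Name)
		if err != nil {
			return "", err
		}
		val, err := json.Marshal(f.Value)
		if err != nil {
			return "", err
		}
		b.WriteString("  " + string(key) + ": " + string(val))
		if i < len(fields)-1 {
			b.WriteString(",")
		}
		b.WriteString("\n")
	}
	b.WriteString("}\n")
	return b.String(), nil
}

func main() {
	out, err := RenderJSONC([]Field{{
		Name:     "homepage",
		Value:    "https://example.com/docs",
		Type:     "url",
		Function: "project homepage",
		Format:   "absolute URL",
	}})
	if err != nil {
		fmt.Println("render:", err)
		return
	}
	fmt.Print(out)
}
```

Because the meta tag lives in a comment, stripping comments leaves plain JSON, so the readable form costs nothing for parsers that do not understand the tags.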

[0036] Example 4: Automatic Generation of Go Language Code 1. Input a data structure containing complete meta tags; 2. Generate Go language struct code based on data type, fields, semantic information, etc.; 3. Convert meta tags to Go language struct tags; 4. Output Go language code files that can be directly compiled.
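
The generation steps of Example 4 might look like the following Go sketch. FieldSpec, GenerateStruct, and the `meta:"..."` struct tag key are hypothetical; the source does not say how meta tags are spelled as Go struct tags. The standard go/format package is used to format the result, which also rejects output that is not syntactically valid Go.

```go
package main

import (
	"fmt"
	"go/format"
	"strings"
)

// FieldSpec is a hypothetical description of one field and its meta tag.
type FieldSpec struct {
	Name     string // Go field name
	GoType   string // Go type to emit, e.g. "url.URL"
	MetaType string // meta tag type, e.g. "url"
	Function string // meta tag function, e.g. "project homepage"
}

// GenerateStruct emits a Go source file with one struct whose fields carry
// the meta tag as a struct tag. The caller lists the imports it needs.
func GenerateStruct(pkg, name string, imports []string, fields []FieldSpec) (string, error) {
	var b strings.Builder
	fmt.Fprintf(&b, "package %s\n\n", pkg)
	for _, imp := range imports {
		fmt.Fprintf(&b, "import %q\n", imp)
	}
	fmt.Fprintf(&b, "\n// %s is generated from meta tags.\ntype %s struct {\n", name, name)
	for _, f := range fields {
		tag := fmt.Sprintf("meta:%q", "type="+f.MetaType+",function="+f.Function)
		fmt.Fprintf(&b, "\t%s %s `%s`\n", f.Name, f.GoType, tag)
	}
	b.WriteString("}\n")

	// format.Source parses and formats the code; it does not type-check it.
	src, err := format.Source([]byte(b.String()))
	if err != nil {
		return "", fmt.Errorf("generated code does not parse: %w", err)
	}
	return string(src), nil
}

func main() {
	code, err := GenerateStruct("model", "Site", []string{"net/url"}, []FieldSpec{
		{Name: "Homepage", GoType: "url.URL", MetaType: "url", Function: "project homepage"},
	})
	if err != nil {
		fmt.Println("generate:", err)
		return
	}
	fmt.Print(code)
}
```

Since the formatter checks syntax only, a real generator would still need to compile or type-check its output before claiming that the file compiles.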

[0037] Example 5: Go Language Data Binding 1. Input the parsed URL data; 2. Based on the URL type in the meta tag, automatically map it to the Go language standard library url.URL type; 3. Bind the data values to the corresponding structure fields; 4. Output a Go language struct object that can be used directly.
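
The binding in Example 5 can be sketched with Go reflection. The `meta` struct tag key, the Site struct, and the Bind function are assumptions made for this sketch. The field is declared as *url.URL rather than url.URL because url.Parse in the standard library returns a pointer.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"reflect"
)

// Site is a hypothetical target struct; the meta tag names each field's type.
type Site struct {
	Name     string   `meta:"string"`
	Homepage *url.URL `meta:"url"`
}

var urlPtrType = reflect.TypeOf((*url.URL)(nil))

// Bind copies decoded values into the struct that dst points to, choosing
// the conversion for each field from its meta type.
func Bind(dst any, values map[string]string) error {
	rv := reflect.ValueOf(dst)
	if rv.Kind() != reflect.Ptr || rv.Elem().Kind() != reflect.Struct {
		return errors.New("dst must be a pointer to a struct")
	}
	rv = rv.Elem()
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		field := rt.Field(i)
		raw, ok := values[field.Name]
		if !ok || !rv.Field(i).CanSet() {
			continue // no value supplied, or field is unexported
		}
		switch metaType := field.Tag.Get("meta"); metaType {
		case "string":
			if field.Type.Kind() != reflect.String {
				return fmt.Errorf("field %s: meta type string on %s", field.Name, field.Type)
			}
			rv.Field(i).SetString(raw)
		case "url":
			if field.Type != urlPtrType {
				return fmt.Errorf("field %s: meta type url on %s", field.Name, field.Type)
			}
			u, err := url.Parse(raw)
			if err != nil {
				return fmt.Errorf("field %s: %w", field.Name, err)
			}
			rv.Field(i).Set(reflect.ValueOf(u))
		default:
			return fmt.Errorf("field %s: unsupported meta type %q", field.Name, metaType)
		}
	}
	return nil
}

func main() {
	var s Site
	err := Bind(&s, map[string]string{
		"Name":     "docs",
		"Homepage": "https://example.com/docs",
	})
	if err != nil {
		fmt.Println("bind:", err)
		return
	}
	fmt.Println(s.Name, s.Homepage.Host, s.Homepage.Path)
}
```

Checking the Go field type against the meta type before setting the value keeps a mismatched tag from causing a reflection panic at run time.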

Claims

1. A method for constructing a data intermediate structure, characterized in that: It receives source data in programming language structures, classes, or text formats such as JSONC, YAML, and TOML; it parses meta-tag information from annotations, tags, attributes, or similar structures of programming language structures and classes, or from text-formatted comments; it constructs an intermediate structure by unifying the data content and meta-tag information; the meta-tag information includes data structure definitions, semantic descriptions, and data validation rules; and it can complete data validation and structure reconstruction without external documentation.

2. A data serialization method, characterized in that: The intermediate structure constructed according to claim 1 is encoded, and the meta tag information and data content are integrated and encapsulated into a compact binary format.

3. A data deserialization method, characterized in that: The binary data described in claim 2 is deserialized to restore the intermediate structure; meta-tag information and data content can be obtained from the intermediate structure.

4. A method for representing data as text, characterized in that: The intermediate structure described in claim 1 is mapped to text formats such as JSONC, YAML, and TOML; meta-tag information is presented in the form of comments.

5. A method for automatic code generation and data binding in multiple programming languages, characterized in that: Based on the meta-tag information in the intermediate structure according to claim 1, structures, classes, etc. are automatically generated for the target programming language, and the meta-tag information is mapped to annotations, tags, attributes, or similar structures of that language. Simultaneously, the data content in the intermediate structure is bound to the generated structures, classes, etc., forming directly usable programming language objects.