Streaming session processing methods, products, and equipment for heterogeneous database playback

CN122309378A · Pending · Publication Date: 2026-06-30 · CETC JINCANG (BEIJING) TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CETC JINCANG (BEIJING) TECH CO LTD
Filing Date
2026-04-13
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies lack session-level continuity and state management when handling the MySQL protocol, resulting in the inability to handle MySQL commands across multiple data packets, the inability to associate multiple requests and responses within the same session, and the inability to guarantee the correctness of session state, SQL execution, and replay.

Method used

By obtaining the identification information of data packets to determine the TCP stream structure, maintaining session state information, reassembling data packets into the data buffer of the TCP stream structure, and parsing them, TCP stream-level session state management is achieved, including doubly linked list management, data packet reassembly, protocol type parsing, and state information updating.

Benefits of technology

It implements complete parsing of the MySQL protocol, ensuring that the replay process is highly consistent with the source business behavior, improving the accuracy, completeness and reliability of heterogeneous database replay, supporting the processing of cross-packet long SQL commands and fragmented transmission result sets, and meeting the testing requirements for database localization migration.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122309378A_ABST

Abstract

This invention provides a streaming session processing method, product, and device for heterogeneous database playback. The method includes: acquiring data packets to be parsed; determining the TCP stream structure to which the data packets belong based on their identification information; each TCP stream structure maintaining the state information of its corresponding session; reassembling the data packets into the data buffer of the TCP stream structure; the data buffer recording the payload information in the data packets; parsing the payload information in the data buffer; and updating the state information in the TCP stream structure based on the parsing result. This method enables unified management and state maintenance of TCP stream sessions, ensuring a high degree of consistency between the playback process and the source-end business behavior, improving the accuracy, completeness, and reliability of heterogeneous database playback, and providing stable and reliable traffic parsing and playback support for domestic database substitution migration.

Description

Technical Field

[0001] This invention relates to the field of database technology, and in particular to a streaming session processing method, product, and device for heterogeneous database playback.

Background Technology

[0002] Currently, foreign databases such as Oracle and MySQL still dominate key sectors in China, creating a deep technological dependence. Under the domestic IT innovation policy, enterprises hope to achieve domestic substitution at minimal cost, but face three major challenges: performance and compatibility differences among domestic databases, difficulty in simulating real production scenarios in testing environments, and low efficiency and error-proneness of manual migration. To address these challenges, heterogeneous database replay technology has become a key solution. It captures real production workloads for precise testing and can automatically compare and analyze data to quickly pinpoint performance bottlenecks and compatibility issues after migration.

[0003] Traditional heterogeneous database playback solutions based on network protocol analysis and packet capture typically employ a single-packet parsing model when handling the MySQL protocol, and therefore lack session-level continuity and state management. Existing analysis methods for the MySQL protocol usually operate on individual data packets and have no session management at the TCP stream level. This makes it impossible to handle MySQL commands spanning multiple data packets, or to correlate multiple requests and responses within the same session.

[0004] Under such an implementation, key metrics for measuring the correctness of replay, such as session state, SQL execution, and final comparison after replay, cannot be guaranteed. Therefore, designing and implementing a MySQL streaming session protocol parsing system for heterogeneous database replay is a fundamental requirement for enabling replay functionality from a MySQL database to other heterogeneous systems.

Summary of the Invention

[0005] In view of the above problems, the present invention proposes a streaming session processing method, product and device for heterogeneous database replay that overcomes or at least partially solves the above problems.

[0006] One objective of this invention is to achieve TCP stream-level MySQL protocol parsing and session state management.

[0007] A further objective of this invention is to ensure the integrity, authenticity, and consistency of the database replay process, thereby improving the accuracy and reliability of migration testing for domestically produced databases.

[0008] Specifically, this invention provides a streaming session processing method for heterogeneous database playback, comprising: acquiring the data packets to be parsed and determining, based on their identification information, the TCP stream structure to which the data packets belong, where each TCP stream structure maintains the state information of its corresponding session; reassembling the data packets into the data buffer of the TCP stream structure, where the data buffer records the payload information in the data packets; parsing the payload information in the data buffer; and updating the state information in the TCP stream structure based on the parsing result.

[0009] Optionally, the state information includes: source and destination IPs and ports, session ID, sequence number status, reassembled data stream, and login information; all TCP stream structures are managed uniformly through a doubly linked list; and the step of determining the TCP stream structure to which a data packet belongs based on its identification information includes: finding the corresponding TCP stream structure based on the source and destination IP addresses and ports of the data packet.

[0010] Optionally, the step of finding the corresponding TCP stream structure based on the source and destination IPs and ports of the data packet includes: determining whether a TCP stream structure corresponding to the data packet exists in the doubly linked list; if it exists, locating the TCP stream structure and executing the step of reassembling the data packets into the data buffer of the TCP stream structure; if it does not exist, creating a new TCP stream structure, adding it to the doubly linked list, and then executing the step of reassembling the data packets into the data buffer of the TCP stream structure.

[0011] Optionally, the state information also includes a prepared statement list, and the step of parsing the payload information in the data buffer includes: when a data packet is parsed as a prepared statement preparation request, extracting the statement identifier, statement text, and parameter information and storing them in the prepared statement list of the TCP stream structure; when a data packet is parsed as a prepared statement execution request, looking up the corresponding statement information in the prepared statement list according to the statement identifier to complete the instruction parsing; and when a data packet is parsed as a prepared statement release request, removing the corresponding statement information from the prepared statement list according to the statement identifier.

[0012] Optionally, the step of reassembling data packets into the data buffer of the TCP stream structure includes: extracting the sequence number of the data packet and performing validity verification and sorting on the sequence number; and, in the sorted sequence number order, writing the payload data of the data packet into the data buffer of the corresponding TCP stream structure, thereby completing the continuous data reassembly of the data packets.

[0013] Optionally, the session lifecycle state corresponding to each TCP stream structure is detected in real time; when any TCP stream structure is detected to meet the preset cleanup conditions, a cleanup operation is performed on that TCP stream structure. The preset cleanup conditions include: no activity on the TCP stream structure within a preset time threshold, or capture of the end packet or reset packet corresponding to the TCP stream structure.

[0014] Optionally, the step of parsing the payload information in the data buffer includes: determining the protocol type of the payload information; when the protocol type is a database command, performing database operation instruction parsing on the payload information; and when the protocol type is a database response, performing database execution result parsing on the payload information.

[0015] Optionally, after the step of parsing the payload information in the data buffer, the method further includes: performing time-series association and binding on database commands and database responses within the same session to obtain request-response pairs; and recording the request-response pairs in the TCP stream structure for result comparison after replay on the heterogeneous database.

[0016] According to another aspect of the present invention, a computer program product is also provided, comprising a computer program that, when executed by a processor, implements the steps of the streaming session processing method for heterogeneous database playback as described above.

[0017] According to another aspect of the present invention, a computer device is also provided, including a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor executes the machine-executable program to implement the steps of the streaming session processing method for heterogeneous database playback as described above.

[0018] The heterogeneous database playback streaming session processing method of this invention first acquires the data packets to be parsed, determines the TCP stream structure to which the data packets belong based on the packet identification information, and each TCP stream structure is used to maintain the state information of the corresponding session; then, the data packets are reassembled into the data buffer of the TCP stream structure, and the data buffer is used to record the payload information in the data packets; the data packets are processed in the data buffer; and the state information in the TCP stream structure is updated according to the processing result. This method enables unified management and state maintenance of TCP stream sessions, thereby ensuring a high degree of consistency between the playback process and the source end business behavior, improving the accuracy, completeness, and reliability of heterogeneous database playback, and providing stable and reliable traffic parsing and playback support for the migration of domestically produced databases.

[0019] The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings.

Description of the Drawings

[0020] The following sections will describe some specific embodiments of the invention in detail by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar parts or portions. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:

Figure 1 is a flowchart illustrating a streaming session processing method for heterogeneous database playback according to an embodiment of the present invention;

Figure 2 is a flowchart illustrating a streaming session processing method for heterogeneous database playback according to another embodiment of the present invention;

Figure 3 is a schematic diagram of a computer program product according to an embodiment of the present invention;

Figure 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention; and

Figure 5 is a schematic diagram of a computer device according to an embodiment of the present invention.

Detailed Implementation

[0021] Those skilled in the art should understand that the embodiments described below are merely a part of the embodiments of the present invention, and not all of the embodiments of the present invention. These partial embodiments are intended to explain the technical principles of the present invention and are not intended to limit the scope of protection of the present invention. Based on the embodiments provided by the present invention, all other embodiments obtained by those skilled in the art without creative effort should still fall within the scope of protection of the present invention.

[0022] It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them).

[0023] Currently, foreign databases such as Oracle and MySQL still dominate key sectors in China, creating a deep technological dependence. Under the domestic IT innovation policy, enterprises hope to achieve domestic substitution at minimal cost, but face three major challenges: performance and compatibility differences among domestic databases, difficulty in simulating real production scenarios in testing environments, and low efficiency and error-proneness of manual migration. To address these challenges, heterogeneous database replay technology has become a key solution. It captures real production workloads for precise testing and can automatically compare and analyze data to quickly pinpoint performance bottlenecks and compatibility issues after migration.

[0024] Traditional heterogeneous database playback solutions based on network protocol analysis and packet capture typically employ a single-packet parsing model when handling the MySQL protocol, and therefore lack session-level continuity and state management. Current analysis of the MySQL protocol usually operates on individual data packets and has no session management at the TCP stream level. This makes it impossible to handle MySQL commands spanning multiple data packets, or to correlate multiple requests and responses within the same session. Such processing cannot effectively address the following situations: (1) long MySQL commands spanning multiple TCP packets; (2) the preparation and execution process of prepared statements; (3) fragmented transmission of query result sets; (4) state transitions within the session lifecycle.

[0025] Under such an implementation scheme, key metrics for measuring the correctness of replay, such as session status, SQL execution, and final comparison after replay, cannot be guaranteed.

[0026] In similar database load replays or simulations, limitations are common. Some parsing schemes completely skip the response packet in the communication protocol and only focus on the request packet. Others only consider the request and response of some commands, ignoring session-level state and continuity. These schemes will result in session-related errors or SQL failures due to incomplete parsing during the final replay, causing severe distortion of the replay results.

[0027] However, in actual customer business scenarios, users usually have a strong demand for the consistency and correctness of the replay results. This requires that database replay must restore the behavior at the time of source capture as much as possible in terms of session establishment and state maintenance, correct SQL parsing and replay, and comparison of replay results.

[0028] To address this need, this solution proposes an innovative approach, implementing a MySQL streaming session protocol parsing system for heterogeneous database playback. This solution supports parsing mimicking source-end capture, ensuring playback more closely reflects real-world business scenarios and improving the applicability and operability of database playback.

[0029] Specifically, this invention provides a streaming session processing method for heterogeneous database playback. Figure 1 is a flowchart illustrating a streaming session processing method for heterogeneous database playback according to an embodiment of the present invention. As shown in Figure 1, the method includes at least the following steps S101 to S104.

[0030] Step S101: Obtain the data packet to be parsed and determine its TCP stream structure based on the packet's identification information. Each TCP stream structure is used to maintain the state information of the corresponding session.

[0031] To capture the complete context of a single MySQL session from its establishment to its closure, so that all operations within the same session can be processed continuously in a unified environment, the state information typically includes: source and destination IPs and ports, session ID, sequence number status, reassembled data stream, prepared statement list, and login information. Furthermore, to manage, locate, and schedule session resources for a large number of active sessions, this invention uses a doubly linked list for unified management of all TCP stream structures. The doubly linked list supports traversal, addition, and removal of active sessions, which suits the management of many concurrent sessions in high-concurrency business scenarios.
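By way of illustration only, the state information listed above could be held in a structure along the following lines. This is a minimal sketch, not the structure of the invention: the class names `TcpStream` and `PreparedStatement`, all field names, and the choice to store login information as a user name and default database are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PreparedStatement:
    """One entry of the prepared statement list (field names are hypothetical)."""
    statement_id: int
    sql_text: str
    param_count: int


@dataclass
class TcpStream:
    """Per-session state for one TCP connection (illustrative layout only)."""
    # Connection identity: source and destination IPs and ports.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    session_id: int = 0
    # Sequence number status: next expected byte, unknown until first packet.
    next_seq: Optional[int] = None
    # Reassembled data stream awaiting protocol parsing.
    buffer: bytearray = field(default_factory=bytearray)
    # Prepared statement list, keyed by statement identifier.
    prepared: Dict[int, PreparedStatement] = field(default_factory=dict)
    # Login information (assumed here to be user and default database).
    login_user: Optional[str] = None
    login_db: Optional[str] = None
    # Links for the doubly linked list of active streams.
    prev: Optional["TcpStream"] = field(default=None, repr=False, compare=False)
    next: Optional["TcpStream"] = field(default=None, repr=False, compare=False)


stream = TcpStream("10.0.0.5", 50000, "10.0.0.9", 3306)
stream.prepared[1] = PreparedStatement(1, "SELECT * FROM t WHERE id = ?", 1)
```

Each stream gets its own buffer and statement dictionary through `default_factory`, so state is never shared between sessions.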

[0032] In some optional embodiments, the step of determining the TCP flow structure to which a data packet belongs based on its identification information generally includes: searching for the corresponding TCP flow structure based on the source and destination IP addresses and ports of the data packet. Using the source and destination IP addresses and ports as the basis for determining session affiliation ensures that bidirectional data packets of the same TCP connection are accurately grouped into the same TCP flow structure, thereby quickly and accurately matching data packets with session affiliation and avoiding the fragmentation of data packets from the same session, which would disrupt session continuity.

[0033] Optionally, the steps for finding the corresponding TCP stream structure based on the source and destination IPs and ports of the data packet generally include: determining whether a TCP stream structure corresponding to the data packet exists in the doubly linked list; if it exists, locating the TCP stream structure and performing the step of reassembling the data packet into the data buffer of the TCP stream structure; if it does not exist, creating a new TCP stream structure and adding it to the doubly linked list, and then performing the step of reassembling the data packet into the data buffer of the TCP stream structure. This process enables the automatic creation of new sessions and the precise reuse of existing sessions, ensuring that each data packet belongs to its corresponding session carrier.
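The lookup-or-create logic described above can be sketched as follows, under stated assumptions. The names `StreamNode`, `StreamList`, and `flow_key` are hypothetical, and the direction-independent key (sorting the two endpoints) is one possible way to make both directions of a connection map to the same structure; the source does not specify how the match is computed.

```python
def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Build a key that is identical for both directions of one connection."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)


class StreamNode:
    """A TCP stream structure reduced to its key, buffer, and list links."""
    def __init__(self, key):
        self.key = key
        self.buffer = bytearray()
        self.prev = None
        self.next = None


class StreamList:
    """Doubly linked list of active TCP stream structures."""
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def find(self, key):
        node = self.head
        while node is not None:          # linear traversal, as in the text
            if node.key == key:
                return node
            node = node.next
        return None

    def append(self, node):
        node.prev, node.next = self.tail, None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.size += 1

    def remove(self, node):
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None
        self.size -= 1

    def find_or_create(self, src_ip, src_port, dst_ip, dst_port):
        """Reuse the existing stream for this connection or create a new one."""
        key = flow_key(src_ip, src_port, dst_ip, dst_port)
        node = self.find(key)
        if node is None:
            node = StreamNode(key)
            self.append(node)
        return node


streams = StreamList()
request_side = streams.find_or_create("10.0.0.5", 50000, "10.0.0.9", 3306)
response_side = streams.find_or_create("10.0.0.9", 3306, "10.0.0.5", 50000)
```

In this sketch `request_side` and `response_side` refer to the same node, so client and server packets of one connection share a single buffer and state. Removal is constant-time once a node is located, which is the usual reason for choosing a doubly linked list.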

[0034] Step S102: Reassemble the data packets into the data buffer of the TCP stream structure.

[0035] To ensure the integrity and accuracy of the reassembled data stream and to avoid out-of-order, duplicate, or abnormal data packets interfering with the subsequent parsing process, in some optional embodiments, the step of reassembling data packets into the data buffer of the TCP stream structure may generally include: extracting the sequence number of the data packet and performing validity verification and order sorting on the sequence number; writing the payload data of the data packet into the data buffer of the corresponding TCP stream structure according to the sorted sequence number order, thereby completing the continuous data reassembly of the data packet.

[0036] This step is the core of achieving complete cross-packet data parsing, using TCP sequence numbers to achieve ordered concatenation of scattered data packets. The sequence number, typically a field in the TCP header used to identify the byte order of the data stream, is a core identifier ensuring reliable TCP data transmission. During reassembly, validity checks filter out duplicate, out-of-order, and invalid data packets exceeding the receive window range, preventing abnormal data from interfering with the parsing process. Then, according to the sequence numbers, the payload data of the data packets is sequentially written into the data buffer of the corresponding TCP stream structure, concatenating the protocol data scattered across multiple TCP packets into a continuous and complete data stream. This step enables the complete concatenation of long SQL commands transmitted across multiple TCP packets or fragmented query result sets, completely overcoming the limitations of traditional single-packet parsing in handling long and fragmented data.
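A minimal sketch of sequence-number-based reassembly for one direction of a connection is given below. It is an illustration under assumptions, not the claimed implementation: the class name `Reassembler` and the `window` parameter are hypothetical, 32-bit sequence number wraparound is ignored for brevity, and segments that arrive ahead of the expected position are held back until the gap is filled rather than discarded.

```python
class Reassembler:
    """Orders TCP payloads by sequence number into one contiguous buffer."""

    def __init__(self, initial_seq, window=65535):
        self.next_seq = initial_seq   # sequence number of the next expected byte
        self.window = window          # assumed limit for "too far ahead" segments
        self.pending = {}             # seq -> payload for segments that arrived early
        self.buffer = bytearray()     # reassembled, in-order payload bytes

    def add_segment(self, seq, payload):
        """Return True if the segment was accepted, False if it was filtered out."""
        if not payload:
            return False
        end = seq + len(payload)
        if end <= self.next_seq:
            return False              # duplicate or retransmission, already consumed
        if seq - self.next_seq > self.window:
            return False              # outside the assumed receive window
        if seq > self.next_seq:
            self.pending.setdefault(seq, payload)   # early segment, wait for the gap
            return True
        # In-order (or partially overlapping) segment: append only the new bytes.
        self.buffer += payload[self.next_seq - seq:]
        self.next_seq = end
        self._drain()
        return True

    def _drain(self):
        """Append any held-back segments that have become contiguous."""
        progressed = True
        while progressed:
            progressed = False
            for seq in sorted(self.pending):
                if seq > self.next_seq:
                    break             # still a gap before this segment
                payload = self.pending.pop(seq)
                end = seq + len(payload)
                if end > self.next_seq:
                    self.buffer += payload[self.next_seq - seq:]
                    self.next_seq = end
                progressed = True
                break


r = Reassembler(initial_seq=1000)
r.add_segment(1005, b"WORLD")   # arrives first, held back
r.add_segment(1000, b"HELLO")   # fills the gap, both segments are appended
```

After these two calls the buffer holds the ten bytes in stream order, regardless of the order in which the segments were captured.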

[0037] Optionally, the data buffer can be implemented as a dynamic circular buffer, which automatically adjusts its memory usage according to the data length and automatically releases the used space after data parsing is completed.
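One possible shape of such a dynamic circular buffer is sketched below. The class name `RingBuffer`, its methods, and the doubling growth policy are assumptions made for illustration; the source does not describe the buffer's internal design.

```python
class RingBuffer:
    """Growable circular byte buffer: write at the tail, consume from the head."""

    def __init__(self, capacity=16):
        self._data = bytearray(max(1, capacity))
        self._start = 0   # index of the oldest unread byte
        self._size = 0    # number of unread bytes

    def __len__(self):
        return self._size

    def peek(self, n):
        """Return up to n unread bytes without consuming them."""
        n = min(n, self._size)
        cap = len(self._data)
        first = min(n, cap - self._start)
        head = self._data[self._start:self._start + first]
        return bytes(head) + bytes(self._data[:n - first])

    def _grow(self, needed):
        """Double the capacity until `needed` bytes fit, unwrapping the content."""
        new_cap = len(self._data)
        while new_cap < needed:
            new_cap *= 2
        new_data = bytearray(new_cap)
        new_data[:self._size] = self.peek(self._size)
        self._data = new_data
        self._start = 0

    def write(self, chunk):
        if self._size + len(chunk) > len(self._data):
            self._grow(self._size + len(chunk))
        cap = len(self._data)
        pos = (self._start + self._size) % cap
        first = min(len(chunk), cap - pos)
        self._data[pos:pos + first] = chunk[:first]
        self._data[:len(chunk) - first] = chunk[first:]   # wrapped remainder
        self._size += len(chunk)

    def consume(self, n):
        """Release the first n bytes once they have been parsed."""
        n = min(n, self._size)
        self._start = (self._start + n) % len(self._data)
        self._size -= n
        if self._size == 0:
            self._start = 0


rb = RingBuffer(capacity=8)
rb.write(b"abcdef")
rb.consume(4)          # parsed bytes are released
rb.write(b"ghijk")     # wraps around the end of the 8-byte storage
```

Consumed space is reused by later writes, so a long-lived session does not need a buffer larger than its largest unparsed backlog.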

[0038] Step S103: Parse the payload information in the data buffer.

[0039] To fully cover the bidirectional interaction logic of a MySQL session and overcome the core deficiency of traditional solutions that only parse request messages and ignore response messages, in some optional embodiments, the step of parsing the payload information in the data buffer generally includes: determining the protocol type of the payload information; if the protocol type is a database command, performing database operation instruction parsing on the payload information; if the protocol type is a database response, performing database execution result parsing on the payload information. This step is the core of achieving complete MySQL protocol parsing, strictly following the MySQL native protocol specification, and performing bidirectional full parsing on the reassembled continuous data stream. First, based on the transmission direction of the data packets and the protocol header identifier, the protocol type of the payload information is distinguished: messages sent from the client to the server are classified as database commands, and messages returned from the server to the client are classified as database responses. Both types of messages are fully parsed simultaneously, overcoming the deficiency of traditional solutions that only parse request messages and ignore response messages.
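As a rough illustration of parsing the reassembled stream, the sketch below splits a buffer into MySQL protocol packets and classifies them by direction. It relies on the MySQL packet header layout (a 3-byte little-endian payload length followed by a 1-byte sequence ID). The function names, the default server port of 3306, and the direction-only classification are assumptions for this example; payloads of 0xFFFFFF bytes that continue in a following packet are not merged here.

```python
def split_mysql_packets(buffer):
    """Extract every complete MySQL packet from the front of `buffer`.

    Returns a list of (sequence_id, payload) tuples and removes the consumed
    bytes from `buffer`; an incomplete trailing packet is left in place until
    more reassembled data arrives.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 4:
        length = int.from_bytes(buffer[offset:offset + 3], "little")
        if len(buffer) - offset - 4 < length:
            break                                   # payload not complete yet
        sequence_id = buffer[offset + 3]
        payload = bytes(buffer[offset + 4:offset + 4 + length])
        packets.append((sequence_id, payload))
        offset += 4 + length
    del buffer[:offset]
    return packets


def classify(src_port, server_port=3306):
    """Label traffic by direction: from the server is a response, else a command."""
    return "response" if src_port == server_port else "command"


# A COM_QUERY command (first payload byte 0x03) arriving in two TCP segments.
payload = b"\x03SELECT 1"
wire = len(payload).to_bytes(3, "little") + b"\x00" + payload
buf = bytearray(wire[:6])
first_try = split_mysql_packets(buf)    # nothing yet, packet is incomplete
buf += wire[6:]
second_try = split_mysql_packets(buf)   # now yields the whole command
```

Because incomplete data stays in the buffer, a command split across several TCP segments is only handed to the parser once it is whole.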

[0040] The parsing content for database operation instructions typically includes: SQL statement text, execution type (query / write / update / delete), prepared statement ID, parameters, transaction instructions (BEGIN / COMMIT / ROLLBACK), login authentication information, etc. The parsing content for database execution results typically includes: query result set (multi-row, multi-column data), execution success / failure status, number of rows affected, error code, error message, column names, column types, result end marker, etc. In the MySQL protocol, the message formats for these two types of data are completely different, so they need to be distinguished before parsing to improve parsing efficiency.

[0041] In some alternative embodiments, the step of parsing the payload information in the data buffer may also include: when a data packet is parsed as a prepared statement request, extracting the statement identifier, statement text, and parameter information, and storing them in the prepared statement list of the TCP stream structure; when a data packet is parsed as a prepared statement execution request, searching for the corresponding statement information in the prepared statement list according to the statement identifier to complete the instruction parsing; when a data packet is parsed as a prepared statement release request, removing the corresponding statement information from the prepared statement list according to the statement identifier. For frequently used MySQL prepared statements in business applications, this invention also designs a full lifecycle parsing and tracking logic: when a prepared statement request is parsed, the unique identifier, SQL text, and parameter information of the statement are extracted and stored in the prepared statement list of the current TCP stream structure; when an execution request with the corresponding identifier is parsed, the pre-stored statement information is directly retrieved from the list to complete the complete instruction parsing; when a release request is parsed, the corresponding statement information in the list is promptly cleaned up to avoid invalid data occupying memory. This design fully supports the entire process of preparing, executing, and releasing prepared statements, solving the core pain point that traditional solutions cannot handle prepared statements across requests, leading to SQL execution failures during replay.
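The prepare, execute, and release tracking described above might look like the following sketch. The command bytes (0x16 for COM_STMT_PREPARE, 0x17 for COM_STMT_EXECUTE, 0x19 for COM_STMT_CLOSE) and the layout of the prepare OK response follow the MySQL client/server protocol. The class name `PreparedStatementList`, its methods, and the decision to take the statement identifier from the server's prepare response are assumptions of this example, and parameter value decoding is omitted.

```python
import struct

COM_STMT_PREPARE = 0x16
COM_STMT_EXECUTE = 0x17
COM_STMT_CLOSE = 0x19


class PreparedStatementList:
    """Per-session registry of prepared statements (illustrative only)."""

    def __init__(self):
        self.statements = {}       # statement id -> {"sql": ..., "num_params": ...}
        self._pending_sql = None   # SQL of a prepare request awaiting its response

    def on_command(self, payload):
        """Handle a client command; returns statement info for an execute."""
        opcode = payload[0]
        if opcode == COM_STMT_PREPARE:
            self._pending_sql = payload[1:].decode("utf-8", "replace")
        elif opcode == COM_STMT_EXECUTE:
            (statement_id,) = struct.unpack_from("<I", payload, 1)
            return self.statements.get(statement_id)
        elif opcode == COM_STMT_CLOSE:
            (statement_id,) = struct.unpack_from("<I", payload, 1)
            self.statements.pop(statement_id, None)   # release on close
        return None

    def on_prepare_response(self, payload):
        """Handle the first server packet that answers a prepare request."""
        sql, self._pending_sql = self._pending_sql, None
        if sql is None or payload[0] != 0x00:
            return None                               # no prepare pending, or error
        # Layout: status(1) statement_id(4) num_columns(2) num_params(2) ...
        statement_id, _num_columns, num_params = struct.unpack_from("<IHH", payload, 1)
        self.statements[statement_id] = {"sql": sql, "num_params": num_params}
        return statement_id


registry = PreparedStatementList()
registry.on_command(b"\x16SELECT * FROM t WHERE id = ?")
registry.on_prepare_response(b"\x00" + struct.pack("<IHH", 7, 1, 1) + b"\x00\x00\x00")
info = registry.on_command(b"\x17" + struct.pack("<I", 7) + b"\x00\x01\x00\x00\x00")
```

Here `info` carries the SQL text of statement 7, which is what allows an execute request that contains only an identifier and parameter values to be replayed as a complete statement.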

[0042] In addition, to achieve a one-to-one correspondence between requests and responses, and to provide accurate traceability for subsequent consistency comparison of replay results, the process can optionally include, after parsing the load information in the data buffer, the following steps: temporally associating and binding database commands and responses within the same session to obtain request-response pairs; and recording these pairs in the TCP stream structure for result comparison after heterogeneous database replay. This way, after protocol parsing, database commands and corresponding database responses within the same session can be bound one-to-one according to the temporal sequence of business interactions, forming traceable request-response pairs, which are then stored in the current TCP stream structure. This associated data serves as the core basis for result consistency comparison and performance bottleneck localization after subsequent heterogeneous database replay, ensuring that the replay effect is verifiable and traceable.
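The time-ordered binding of commands and responses could be sketched as below. The names `Exchange` and `ExchangeRecorder` are hypothetical, and the sketch assumes that each response belongs to the oldest command that has not yet been answered; commands that never receive a response would need separate handling, which is omitted here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Exchange:
    """One request-response pair within a session."""
    request: bytes
    request_time: float
    responses: List[bytes] = field(default_factory=list)
    response_time: Optional[float] = None


class ExchangeRecorder:
    """Pairs each response with the oldest unanswered command of the session."""

    def __init__(self):
        self.pending: Deque[Exchange] = deque()
        self.completed: List[Exchange] = []

    def on_command(self, timestamp, payload):
        self.pending.append(Exchange(payload, timestamp))

    def on_response(self, timestamp, payload, is_last=True):
        """Attach a response packet; `is_last` marks the end of the result."""
        if not self.pending:
            return None                     # response without a captured request
        exchange = self.pending[0]
        exchange.responses.append(payload)
        if not is_last:
            return None                     # more result packets will follow
        exchange.response_time = timestamp
        self.completed.append(self.pending.popleft())
        return exchange


recorder = ExchangeRecorder()
recorder.on_command(1.00, b"\x03SELECT 1")
recorder.on_response(1.02, b"<column definitions>", is_last=False)
pair = recorder.on_response(1.03, b"<rows and end marker>")
```

The completed pairs keep both timestamps, so the same record can serve result comparison and latency analysis after replay.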

[0043] Step S104: Update the state information in the TCP stream structure based on the parsing result.

[0044] To keep state controllable throughout the entire session lifecycle and to guarantee that all operations within the same session are executed in the correct context, avoiding parsing errors and replay anomalies caused by missing or incorrect state, this step synchronizes all critical information generated during parsing to the state information of the corresponding TCP stream structure in real time. This includes login authentication state, sequence number update state, prepared statement execution state, and result set transmission state. By maintaining session state in real time, the entire lifecycle state transition from session establishment to closure can be fully tracked, preventing replay problems caused by state anomalies and keeping the session's execution logic consistent with the production source.

[0045] In addition, it is necessary to monitor the session lifecycle status corresponding to the TCP stream structure in real time. When any TCP stream structure is detected to meet the preset cleanup conditions, a cleanup operation is performed on the TCP stream structure. The preset cleanup conditions include: the TCP stream structure has no activity within a preset time threshold or the corresponding end packet or reset packet is captured. This operation enables automated closed-loop management of session resources. The system continuously polls the running status of all TCP stream structures to promptly identify ended or invalid sessions. The end packet and reset packet here correspond to the FIN end packet and RST reset packet of the TCP protocol, respectively, indicating the normal closure and abnormal disconnection of the TCP connection. The preset time threshold is a configurable session timeout threshold that can be flexibly adjusted according to business scenarios to identify idle sessions with no data interaction for a long time. When a session is detected to meet any cleanup condition, the system automatically removes the corresponding TCP stream structure from the doubly linked list and releases the corresponding memory resources to avoid memory leaks caused by invalid sessions occupying resources for a long time.
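The cleanup conditions above can be illustrated with the following sketch. The TCP flag bit values (FIN = 0x01, RST = 0x04) are those of the TCP header; the names `TrackedStream`, `observe_packet`, `needs_cleanup`, and `sweep` are hypothetical, and a plain Python list stands in for the doubly linked list to keep the example short.

```python
TCP_FIN = 0x01   # normal connection close
TCP_RST = 0x04   # abnormal disconnect


class TrackedStream:
    """Minimal stand-in for a TCP stream structure."""
    def __init__(self, name, now):
        self.name = name
        self.last_activity = now
        self.closed = False


def observe_packet(stream, tcp_flags, now):
    """Record activity and note whether an end or reset packet was captured."""
    stream.last_activity = now
    if tcp_flags & (TCP_FIN | TCP_RST):
        stream.closed = True


def needs_cleanup(stream, now, idle_timeout):
    """True if the stream is closed or idle longer than the configured threshold."""
    return stream.closed or (now - stream.last_activity) > idle_timeout


def sweep(streams, now, idle_timeout):
    """Remove streams that meet a cleanup condition and return the removed ones."""
    removed = [s for s in streams if needs_cleanup(s, now, idle_timeout)]
    streams[:] = [s for s in streams if not needs_cleanup(s, now, idle_timeout)]
    return removed


streams = [TrackedStream("a", 0), TrackedStream("b", 0), TrackedStream("c", 0)]
observe_packet(streams[1], 0x11, now=5)     # FIN+ACK seen on stream "b"
observe_packet(streams[2], 0x10, now=100)   # plain ACK keeps stream "c" active
removed = sweep(streams, now=120, idle_timeout=60)
```

In this run stream "a" is removed for being idle, stream "b" for having captured a FIN, and stream "c" remains. A periodic call to `sweep` corresponds to the polling described in the text.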

[0046] The method of this invention achieves MySQL streaming session protocol parsing for heterogeneous database replay through a complete closed loop, including unified session management at the TCP stream level, streaming reassembly of data packets, bidirectional full parsing of the MySQL protocol, real-time maintenance of session state, and automatic cleanup of session resources. This method can handle complex scenarios such as long SQL commands spanning multiple packets, fragmented transmission of result sets, and the full lifecycle of prepared statements. It tracks session state changes throughout the entire lifecycle and provides complete, accurate, and traceable parsing results. This improves the authenticity, consistency, and reliability of heterogeneous database replay and meets the requirements of load replay, compatibility verification, and performance tuning during the migration of databases to domestic production environments.

[0047] Figure 2 is a flowchart illustrating a streaming session processing method for heterogeneous database playback according to another embodiment of the present invention. As shown in Figure 2, the method includes at least the following steps S201 to S210.

[0048] Step S201: Receive data packets. This step is the entry point for the entire method. The received data packets are TCP packets captured from the network link layer, representing the bidirectional interaction between the MySQL client and the database server. These packets serve as the foundational data source for all subsequent session parsing, reassembly, and processing.

[0049] Step S202: Determine if a corresponding TCP flow structure exists. This step is the core decision-making step for implementing unified session-level management. Its core function is to determine whether a TCP flow structure to which a data packet belongs exists within the currently managed active sessions using the core identifier of the data packet. In actual execution, the source IP address, source port, destination IP address, and destination port of the data packet are extracted as unique matching identifiers. The doubly linked list of all managed active TCP flow structures is then traversed to complete the matching judgment. This pre-emptive session attribution determination ensures that all data packets from the same session can be aggregated into the same processing link, avoiding the core defects of traditional single-packet parsing without session attribution and scattered data processing from the root of the process.

[0050] Step S203: If the determination in step S202 is yes, locate the existing TCP stream structure. When a corresponding TCP stream structure already exists, directly locate the TCP stream structure specific to that session, reusing existing session context, data buffer, state information, and other resources. This step ensures the continuity of packet processing context within the same session while avoiding the memory and computational overhead of repeatedly creating data structures, significantly improving packet processing efficiency in high-concurrency scenarios.

[0051] Step S204: If the determination in step S202 is no, create a new TCP stream structure. When no matching TCP stream structure is found, the data packet corresponds to a newly initiated MySQL session. In this case, a new TCP stream structure is created according to the preset structure specification, and core fields such as the session ID, sequence number status, prepared statement list, and login information are initialized. At the same time, the newly created TCP stream structure is added to the doubly linked list that manages active sessions. Optionally, the session ID can be generated by hashing the client IP address and port so that each session is identified uniquely. This step enables full lifecycle tracking of newly created MySQL sessions, ensuring that all business sessions are covered without omission.
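Steps S202 to S204 can be illustrated with a minimal Python sketch of a find-or-create lookup over a doubly linked list. The names `flow_key`, `TcpStream`, and `StreamList` are hypothetical and do not come from this embodiment. The sketch assumes that both directions of a connection should resolve to the same structure, so the two endpoints are sorted before matching, and it assigns session IDs from a counter rather than the optional hash described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]
FlowKey = Tuple[Endpoint, Endpoint]


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> FlowKey:
    """Sort the two endpoints so requests and responses share one key (assumption)."""
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (lo, hi)


@dataclass(eq=False)
class TcpStream:
    key: FlowKey
    session_id: int
    buffer: bytearray = field(default_factory=bytearray)   # reassembled payload
    prepared: Dict[int, str] = field(default_factory=dict)  # prepared statement list
    logged_in: bool = False
    prev: Optional["TcpStream"] = None
    next: Optional["TcpStream"] = None


class StreamList:
    """Doubly linked list of active TCP stream structures."""

    def __init__(self) -> None:
        self.head: Optional[TcpStream] = None
        self.tail: Optional[TcpStream] = None
        self._next_id = 1

    def find(self, key: FlowKey) -> Optional[TcpStream]:
        node = self.head
        while node is not None:          # S202: traverse and match
            if node.key == key:
                return node
            node = node.next
        return None

    def find_or_create(self, src_ip, src_port, dst_ip, dst_port):
        key = flow_key(src_ip, src_port, dst_ip, dst_port)
        node = self.find(key)
        if node is not None:             # S203: reuse existing context
            return node, False
        node = TcpStream(key=key, session_id=self._next_id)  # S204: new session
        self._next_id += 1
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node
        return node, True
```

A linear traversal matches the description above; an implementation handling many sessions might additionally index the nodes by key, which is outside what this embodiment states.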

[0052] Step S205: Perform data packet reassembly.

[0053] Step S206: Reassemble the data packets into the data buffer of the corresponding TCP stream structure according to the sequence number of the data packets.

[0054] Together, the two steps above complete the ordered streaming reassembly of TCP packets, which is the key to parsing long MySQL commands and fragmented query result sets that span multiple TCP packets. In actual execution, the TCP sequence number of the current packet is first extracted and its validity is verified, in order to filter out duplicate, out-of-order, or invalid packets that exceed the TCP receive window. The packets with valid sequence numbers are then sorted in sequence order, and their payload data is written into the data buffer of the corresponding TCP stream structure in the sorted byte order, completing the continuous reassembly of protocol data scattered across multiple TCP packets.

[0055] The data buffer can typically be implemented using a dynamic circular buffer, which automatically adjusts its upper capacity based on the data length, balancing the storage needs of long commands with efficient use of memory resources. This sequential reassembly, based on sequence numbers, allows fragmented TCP packets to be pieced together into a continuous and complete protocol data stream, completely overcoming the limitations of traditional single-packet parsing in handling long SQL commands and fragmented result sets, fundamentally ensuring the integrity of protocol parsing.
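The sequence-number-based reassembly of steps S205 and S206 can be sketched as follows. This is a simplified illustration, not the implementation of this embodiment: the class name `Reassembler` and the parameter `initial_seq` are hypothetical, a plain `bytearray` stands in for the dynamic circular buffer, and 32-bit sequence number wraparound and receive window checks are deliberately left out.

```python
class Reassembler:
    """Appends TCP payload to a buffer in sequence-number order."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq   # sequence number of the next expected byte
        self.pending = {}             # seq -> payload of segments that arrived early
        self.buffer = bytearray()     # contiguous, reassembled payload

    def add(self, seq: int, payload: bytes) -> None:
        if not payload:
            return
        if seq + len(payload) <= self.next_seq:
            return                    # fully duplicated data (retransmission)
        # Keep the longest payload seen for a given starting sequence number.
        held = self.pending.get(seq)
        if held is None or len(payload) > len(held):
            self.pending[seq] = payload
        self._drain()

    def _drain(self) -> None:
        # Move every segment that now touches the expected position into the buffer.
        while self.pending:
            seq = min(self.pending)
            if seq > self.next_seq:
                break                 # gap: wait for the missing segment
            chunk = self.pending.pop(seq)
            skip = self.next_seq - seq          # bytes already delivered
            if skip < len(chunk):
                self.buffer += chunk[skip:]
                self.next_seq += len(chunk) - skip
```

For example, a segment that arrives ahead of a gap is held in `pending` until the missing bytes arrive, after which both are appended in order.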

[0056] Step S207: Determine the protocol type of the payload information. This step is a preliminary classification step for implementing bidirectional full parsing of the MySQL protocol. The core is to determine the protocol type corresponding to the payload information in the buffer based on the transmission direction of the data packets and the identifier bits in the MySQL protocol header. For example, messages sent from the client to the server can generally be classified as database commands; messages returned from the server to the client can generally be classified as database responses. This preliminary protocol type classification provides a clear basis for subsequent differentiated parsing, while ensuring that both request and response messages in the MySQL session are fully covered, overcoming the core deficiency of traditional solutions that only parse request messages and ignore response messages.
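As an illustration of step S207, the following Python sketch frames complete MySQL packets out of the reassembled buffer and classifies them by direction. It relies on the MySQL packet header layout (a 3-byte little-endian payload length followed by a 1-byte sequence ID). The function names and the default server port 3306 are assumptions made for the example, and payloads of 0xFFFFFF bytes that continue in a following packet are not handled.

```python
from typing import List, Tuple


def split_mysql_packets(buffer: bytearray) -> List[Tuple[int, bytes]]:
    """Remove every complete MySQL packet from the front of `buffer`.

    Returns (sequence_id, payload) tuples. An incomplete trailing packet is
    left in the buffer until more TCP payload has been reassembled.
    """
    packets = []
    while len(buffer) >= 4:
        length = int.from_bytes(buffer[0:3], "little")
        if len(buffer) < 4 + length:
            break                          # wait for the rest of this packet
        sequence_id = buffer[3]
        packets.append((sequence_id, bytes(buffer[4:4 + length])))
        del buffer[:4 + length]
    return packets


def classify_direction(src_port: int, server_port: int = 3306) -> str:
    """Client-to-server traffic is a command, server-to-client a response."""
    return "response" if src_port == server_port else "command"
```

Because incomplete packets stay in the buffer, a long SQL command that spans several TCP segments is only handed to the parser once all of its bytes have been reassembled.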

[0057] Step S208: When the protocol type is determined to be a database command in step S207, perform database operation instruction parsing on the payload information.

[0058] When the protocol type is determined to be a database command, the MySQL protocol is followed, and the continuous data stream in the data buffer is completely parsed to extract the corresponding database operation instructions, such as login authentication commands, ordinary SQL execution commands, prepared statement operation commands, transaction control commands, and other types of MySQL client commands. Through the parsing of all command types, all operation requests initiated by the client to the database can be accurately reconstructed, providing a complete instruction set for subsequent heterogeneous database replay and ensuring that the replay operation is completely consistent with the source production business behavior.
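A minimal sketch of the command parsing in step S208 is shown below. The command byte values are those defined by the MySQL client/server protocol; the function name `parse_command` and the returned dictionary layout are hypothetical. The sketch assumes the query attributes capability has not been negotiated, so the SQL text directly follows the command byte, and it does not cover the login handshake response, which has no command byte.

```python
COMMANDS = {
    0x01: "COM_QUIT",
    0x02: "COM_INIT_DB",
    0x03: "COM_QUERY",
    0x0E: "COM_PING",
    0x16: "COM_STMT_PREPARE",
    0x17: "COM_STMT_EXECUTE",
    0x19: "COM_STMT_CLOSE",
}


def parse_command(payload: bytes) -> dict:
    """Decode the command byte and the fields needed for replay."""
    if not payload:
        raise ValueError("empty command payload")
    code = payload[0]
    name = COMMANDS.get(code, f"UNKNOWN_0x{code:02x}")
    result = {"command": name}
    if name in ("COM_QUERY", "COM_STMT_PREPARE"):
        # The remainder of the payload is the SQL text.
        result["sql"] = payload[1:].decode("utf-8", errors="replace")
    elif name in ("COM_STMT_EXECUTE", "COM_STMT_CLOSE"):
        # A 4-byte little-endian statement ID follows the command byte.
        result["statement_id"] = int.from_bytes(payload[1:5], "little")
    elif name == "COM_INIT_DB":
        result["schema"] = payload[1:].decode("utf-8", errors="replace")
    return result
```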

[0059] Step S209: When the protocol type is determined to be a database response in step S207, perform database execution result parsing on the payload information.

[0060] When the protocol type is determined to be a database response, the system strictly adheres to the native MySQL protocol specifications, performing a complete parsing of the continuous data stream in the data buffer to extract the corresponding database execution feedback information. This includes, but is not limited to, all types of server-side feedback such as login status responses, result set responses, execution status responses, and error message responses. Through comprehensive response parsing, all execution results of the database's responses to client requests can be accurately reconstructed, providing benchmark data for subsequent consistency comparisons of replay results and ensuring that the replay effect is verifiable and traceable.
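The response parsing of step S209 can be sketched by classifying a server packet from its first byte, following the generic response packets of the MySQL protocol (0x00 for OK, 0xFF for ERR, 0xFE for EOF). The function name `classify_response` is hypothetical. The sketch is intentionally context-free: in practice the interpretation depends on the session phase and the preceding command (for example, the initial handshake or the reply to a prepare request), which is exactly the state kept in the TCP stream structure.

```python
def classify_response(payload: bytes) -> dict:
    """Classify a server-to-client MySQL packet by its first byte."""
    if not payload:
        raise ValueError("empty response payload")
    first = payload[0]
    if first == 0x00:
        return {"type": "OK"}
    if first == 0xFF:
        # ERR packet: 2-byte error code, optional '#' + 5-char SQLSTATE, message.
        code = int.from_bytes(payload[1:3], "little")
        rest = payload[3:]
        sqlstate = None
        if rest[:1] == b"#":
            sqlstate = rest[1:6].decode("ascii", errors="replace")
            rest = rest[6:]
        return {
            "type": "ERR",
            "code": code,
            "sqlstate": sqlstate,
            "message": rest.decode("utf-8", errors="replace"),
        }
    if first == 0xFE and len(payload) < 9:
        return {"type": "EOF"}
    if first < 0xFB:
        # Start of a result set: the first byte is the column count (< 251).
        return {"type": "RESULT_SET", "column_count": first}
    return {"type": "OTHER"}
```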

[0061] Step S210: Update the state information in the TCP stream structure based on the parsing result.

[0062] During protocol parsing, all key information generated, such as login authentication status, sequence number updates, prepared statement execution status, and result set transmission status, is written in real time to the state information fields of the corresponding TCP stream structure. State transitions are triggered automatically based on the parsing results, so that subsequent data packets within the same session are always processed in the correct context. This real-time state update allows the state changes across the entire lifecycle of a MySQL session to be tracked completely, preventing parsing errors and replay anomalies caused by missing or incorrect state. It keeps the execution logic of the whole session consistent with the production source and improves the fidelity of heterogeneous database replay.

[0063] Based on the core process of this embodiment, the following optional solutions can be used to further improve the parsing completeness, scenario adaptability and operational stability of the method. All extended solutions are fully compatible with the core process of this embodiment and can be flexibly combined and used according to business needs.

[0064] Firstly, a full lifecycle parsing and tracking scheme for prepared statements can be added. During the database operation instruction parsing process in step S208, the full-process tracking logic for prepared statements can be extended: when a PREPARE request for a prepared statement is parsed, the unique identifier of the statement, the SQL text, and parameter information are extracted and stored in the prepared statement list of the corresponding TCP stream structure; when an EXECUTE execution request with the corresponding statement identifier is parsed, the pre-stored statement information is directly retrieved from the prepared statement list, and the complete parsing of the instruction is completed by combining it with the passed parameters; when a DEALLOCATE release request is parsed, the information of the corresponding statement identifier is cleared from the prepared statement list, and the prepared statement status of the TCP stream structure is updated synchronously. The technical effect of this extended scheme is that, through session-level full lifecycle tracking of prepared statements, it fully supports the entire process of MySQL prepared statement preparation-execution-release, solving the core pain point that traditional solutions cannot handle prepared statements across requests, leading to SQL execution failures during replay, and significantly improving the adaptability of the method to complex business scenarios.
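The prepare, execute, and release tracking described above can be sketched as a small per-session registry. The class `PreparedStatements` and its method names are hypothetical. Note that in the MySQL binary protocol the statement ID is assigned by the server and returned in the response to the prepare request, so an implementation would typically call `on_prepare` once that response has been parsed; the sketch simply takes the ID as an argument.

```python
class PreparedStatements:
    """Per-session prepared statement list kept in the TCP stream structure."""

    def __init__(self) -> None:
        self._statements = {}   # statement_id -> {"sql": ..., "param_count": ...}

    def on_prepare(self, statement_id: int, sql: str, param_count: int) -> None:
        # Store the statement text and parameter information for later lookups.
        self._statements[statement_id] = {"sql": sql, "param_count": param_count}

    def on_execute(self, statement_id: int, params: list) -> dict:
        # Combine the stored statement with the parameters of this execution.
        stmt = self._statements.get(statement_id)
        if stmt is None:
            raise KeyError(f"unknown prepared statement {statement_id}")
        if len(params) != stmt["param_count"]:
            raise ValueError("parameter count mismatch")
        return {"sql": stmt["sql"], "params": list(params)}

    def on_release(self, statement_id: int) -> bool:
        # Remove the statement; returns False if it was not registered.
        return self._statements.pop(statement_id, None) is not None

    def __len__(self) -> int:
        return len(self._statements)
```

Keeping the registry per session matters because statement IDs are only meaningful within the connection that prepared them.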

[0065] Secondly, a request-response time-series association binding scheme can be added. After the parsing processing is completed in steps S208 and S209, the request-response association binding logic can be extended: according to the business interaction sequence of the MySQL session, the database commands parsed within the same session are bound one-to-one with the corresponding database execution results, forming a traceable request-response association pair, and this association pair is synchronously recorded in the status information of the corresponding TCP stream structure. The technical effect of this extended scheme is that, through time-series association binding, a one-to-one correspondence between client requests and server responses is achieved, providing accurate traceability for consistency comparison of results after subsequent heterogeneous database replay and performance bottleneck location, ensuring the verifiability of the replay effect.
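A minimal sketch of the request-response binding follows. It assumes that, within one session, responses arrive in the same order as the commands that triggered them, so a first-in-first-out queue is enough to pair them. The class `RequestResponseBinder`, the timestamp arguments, and the `latency` field are illustrative assumptions rather than part of this embodiment.

```python
from collections import deque


class RequestResponseBinder:
    """Pairs each parsed response with the oldest unanswered command."""

    def __init__(self) -> None:
        self._pending = deque()   # commands still waiting for a response
        self.pairs = []           # completed request-response association pairs

    def on_command(self, timestamp: int, command: dict) -> None:
        self._pending.append((timestamp, command))

    def on_response(self, timestamp: int, response: dict):
        if not self._pending:
            return None           # response without a captured request
        sent_at, command = self._pending.popleft()
        pair = {
            "command": command,
            "response": response,
            "latency": timestamp - sent_at,
        }
        self.pairs.append(pair)
        return pair
```

The recorded latency is one example of data that could support the performance bottleneck location mentioned above.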

[0066] Thirdly, an automated session resource cleanup scheme can be implemented. In addition to the core process of this embodiment, a session lifecycle detection and resource cleanup process can be executed in parallel: the running status of all TCP stream structures in the doubly linked list is polled in real time to check whether each session meets the preset cleanup conditions; when any TCP stream structure is detected to meet the cleanup conditions, the node corresponding to that structure is removed from the doubly linked list, and all memory resources it occupies are released, including the storage space for the data buffer, the prepared statement list, and the state information. The preset cleanup conditions fall into two categories: first, a FIN packet or RST packet is captured for the TCP connection corresponding to the TCP stream, indicating that the connection has been closed normally or abnormally; second, the TCP stream structure has had no data packet interaction within a preset, configurable timeout threshold, indicating an idle, invalid session. The technical effect of this extended scheme is closed-loop management of the entire lifecycle of session resources: it avoids memory leaks caused by invalid sessions occupying resources for a long time, keeps the system stable in high-concurrency, long-running scenarios, and improves the efficiency of session matching and lookup by keeping the active session list short.
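The cleanup pass can be sketched as a sweep over the doubly linked list that unlinks every node meeting one of the two conditions. The names `StreamNode` and `sweep`, the `closed` flag (set when a FIN or RST packet has been seen), and the numeric timestamps are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class StreamNode:
    name: str
    last_seen: float              # time of the most recent packet
    closed: bool = False          # True once FIN or RST has been captured
    prev: Optional["StreamNode"] = None
    next: Optional["StreamNode"] = None


def sweep(head: Optional[StreamNode], now: float,
          idle_timeout: float) -> Tuple[Optional[StreamNode], List[str]]:
    """Unlink closed or idle streams; return the new head and removed names."""
    removed = []
    node = head
    while node is not None:
        following = node.next     # remember the successor before unlinking
        if node.closed or now - node.last_seen > idle_timeout:
            if node.prev is not None:
                node.prev.next = node.next
            else:
                head = node.next
            if node.next is not None:
                node.next.prev = node.prev
            node.prev = node.next = None   # drop references so memory can be freed
            removed.append(node.name)
        node = following
    return head, removed
```

In a long-running capture process, a sweep of this kind would be invoked periodically alongside packet processing.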

[0067] Fourth, the entire process of this embodiment can be implemented based on a multi-threaded lock-free concurrency mechanism. To meet the massive concurrent session processing requirements of enterprise-level production environments, an independent data packet processing thread can be allocated to each CPU core, with each processing thread maintaining its own doubly linked list and set of TCP stream structures. After a data packet is received in step S201, it is dispatched by hashing its identification information, so that all data packets of the same session are always assigned to the same processing thread. The steps within one processing thread are executed serially, while different processing threads run fully in parallel. The technical effect of this solution is that the lock-free, hash-partitioned design preserves the processing order of data packets within a session, avoids the performance loss caused by lock contention between threads, and allows performance to scale approximately linearly with the number of CPU cores, which can support the processing of tens of thousands of concurrent sessions per second and suits the traffic parsing scenarios of large-scale business systems.
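The hash-based dispatch can be illustrated with a small routing function. The function name `worker_for` is hypothetical, CRC32 is used only as an example of a stable hash, and the endpoints are sorted on the assumption that both directions of a connection must reach the same processing thread. The sketch shows only the routing decision, not the threads themselves.

```python
import zlib


def worker_for(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               workers: int) -> int:
    """Map a packet to a worker index that is stable for its whole session."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    # Sort the endpoints so requests and responses hash to the same value.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode("ascii")
    return zlib.crc32(key) % workers
```

Because each session is pinned to one worker, the per-worker stream list never needs to be shared, which is what makes the lock-free design possible.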

[0068] This method enables TCP stream-level MySQL session parsing and full lifecycle state management, addressing the core pain points of existing technologies, which cannot handle long commands that span packets, prepared statements, fragmented result sets, or session state transitions. It provides stable, accurate, high-fidelity parsing support for heterogeneous database replay and meets business requirements such as load replay, compatibility verification, and performance tuning during migration to domestic databases.

[0069] The flowchart provided in this embodiment is not intended to indicate that the operations of the method will be performed in any particular order, or that all operations of the method are included in every case. Furthermore, the method may include additional operations. Within the scope of the technical concept provided by the method in this embodiment, additional variations can be made to the above method.

[0070] It should be understood that in some embodiments, the components may be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented using software or firmware stored in memory and executed by a suitable instruction execution system.

[0071] This embodiment also provides a computer program product 10, a computer-readable storage medium 20, and a computer device 30. Figure 3 is a schematic diagram of a computer program product 10 according to an embodiment of the present invention. Figure 4 is a schematic diagram of a computer-readable storage medium 20 according to an embodiment of the present invention. Figure 5 is a schematic diagram of a computer device 30 according to an embodiment of the present invention. The computer program product 10 includes a computer program 11, which, when executed by the processor 32, implements the steps of the streaming session processing method for heterogeneous database playback described above. The computer-readable storage medium 20 stores the computer program 11, which, when executed by the processor 32, implements the steps of the streaming session processing method for heterogeneous database playback described above. The computer device 30 may include a memory 31, a processor 32, and the computer program 11 stored in the memory 31 and running on the processor 32.

[0072] The computer program 11 used to perform the operations of this invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or source code or object code written in any combination of one or more programming languages and procedural programming languages. The computer program 11 may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, to perform aspects of this invention, electronic circuits, including, for example, programmable logic circuits, Field-Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), may execute computer-readable program instructions to personalize the electronic circuits by utilizing state information from computer-readable program instructions.

[0073] For the purposes of this embodiment, computer program product 10 is a related product containing computer program 11. For the purposes of this embodiment, computer-readable storage medium 20 is a tangible device capable of holding and storing computer program 11, and can be any device capable of containing, storing, communicating, propagating, or transmitting program 11 for use by or in conjunction with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable storage medium 20 include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable optical disc read-only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanical encoding device, and any suitable combination thereof.

[0074] Computer device 30 can be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone. In some examples, computer device 30 can be a cloud computing node. Computer device 30 can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Typically, program modules can include routines, programs, object programs, components, logic, data structures, etc., that perform specific tasks or implement specific abstract data types. Computer device 30 can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules can reside on local or remote computing system storage media, including storage devices.

[0075] Computer device 30 may include a processor 32 adapted to execute stored instructions and a memory 31 that provides temporary storage space for the operation of said instructions during operation. The processor 32 may be a single-core processor, a multi-core processor, a computing cluster, or any other configuration. The memory 31 may include random access memory (RAM), read-only memory, flash memory, or any other suitable storage system.

[0076] Computer device 30 may also include a network adapter / interface and an input / output (I / O) interface. The I / O interface allows external devices that can be connected to the computer device to input and output data. The network adapter / interface provides communication between the computer device and a network, typically represented as a communication network.

[0077] Therefore, those skilled in the art should recognize that although numerous exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications conforming to the principles of the present invention can be directly determined or derived from the disclosure of the present invention without departing from the spirit and scope of the invention. Thus, the scope of the present invention should be understood and construed as covering all such other variations or modifications.

Claims

1. A streaming session processing method for heterogeneous database playback, comprising: obtaining a data packet to be parsed, and determining, based on identification information of the data packet, the TCP stream structure to which the data packet belongs, wherein each TCP stream structure is used to maintain state information of a corresponding session; reassembling the data packet into a data buffer of the TCP stream structure, wherein the data buffer is used to record payload information in the data packet; parsing the payload information in the data buffer; and updating the state information in the TCP stream structure based on the parsing result.

2. The streaming session processing method for heterogeneous database playback according to claim 1, wherein the state information includes: source and destination IP addresses and ports, a session ID, sequence number status, a reassembled data stream, and login information; all of the TCP stream structures are managed uniformly using a doubly linked list; and the step of determining the TCP stream structure to which the data packet belongs based on its identification information includes: searching for the corresponding TCP stream structure based on the source and destination IP addresses and ports of the data packet.

3. The streaming session processing method for heterogeneous database playback according to claim 2, wherein the step of searching for the corresponding TCP stream structure based on the source and destination IP addresses and ports of the data packet includes: determining whether the TCP stream structure corresponding to the data packet exists in the doubly linked list; if it exists, locating the TCP stream structure and executing the step of reassembling the data packet into the data buffer of the TCP stream structure; and if it does not exist, creating a new TCP stream structure, adding it to the doubly linked list, and then executing the step of reassembling the data packet into the data buffer of the TCP stream structure.

4. The streaming session processing method for heterogeneous database playback according to claim 2, wherein the state information also includes a prepared statement list; and the step of parsing the payload information in the data buffer includes: when the data packet is parsed as a prepared statement preparation request, extracting the statement identifier, statement text, and parameter information and storing them in the prepared statement list of the TCP stream structure; when the data packet is parsed as a prepared statement execution request, looking up the corresponding statement information in the prepared statement list according to the statement identifier to complete the instruction parsing; and when the data packet is parsed as a prepared statement release request, removing the corresponding statement information from the prepared statement list according to the statement identifier.

5. The streaming session processing method for heterogeneous database playback according to claim 1, wherein the step of reassembling the data packet into the data buffer of the TCP stream structure includes: extracting the sequence number of the data packet, and performing a validity check and sorting on the sequence number; and writing the payload data of the data packet into the data buffer of the corresponding TCP stream structure in the sorted sequence number order, thereby completing the continuous data reassembly of the data packet.

6. The streaming session processing method for heterogeneous database playback according to claim 1, further comprising: detecting in real time the session lifecycle status corresponding to the TCP stream structure; and when any of the TCP stream structures is detected to meet preset cleanup conditions, performing a cleanup operation on that TCP stream structure, wherein the preset cleanup conditions include: the TCP stream structure has no activity within a preset time threshold, or an end packet or reset packet corresponding to the TCP stream structure is captured.

7. The streaming session processing method for heterogeneous database playback according to claim 1, wherein the step of parsing the payload information in the data buffer includes: determining the protocol type of the payload information; when the protocol type is a database command, performing database operation instruction parsing on the payload information; and when the protocol type is a database response, performing database execution result parsing on the payload information.

8. The streaming session processing method for heterogeneous database playback according to claim 7, wherein the step of parsing the payload information in the data buffer further includes: binding the database commands and database responses in the same session by time-series association to obtain request-response association pairs; and recording the request-response association pairs in the TCP stream structure for comparison of results after the heterogeneous database replay.

9. A computer program product comprising a computer program that, when executed by a processor, implements the steps of the streaming session processing method for heterogeneous database playback as described in any one of claims 1 to 8.

10. A computer device comprising a memory, a processor, and a machine-executable program stored in the memory and running on the processor, wherein the processor, when executing the machine-executable program, implements the steps of the streaming session processing method for heterogeneous database playback according to any one of claims 1 to 8.