A full-amount SQL data acquisition method, device and system

By using DPDK driver and multi-level queue technology, the real-time performance and performance loss issues of full SQL data collection on cloud platforms are solved, achieving efficient and low-loss data collection and processing, which is suitable for real-time monitoring and governance of cloud databases.

CN117615271B · Active · Publication Date: 2026-06-12 · CHINA TELECOM CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA TELECOM CLOUD TECH CO LTD
Filing Date
2023-11-23
Publication Date
2026-06-12

Abstract

The application relates to a full-amount SQL data acquisition method, device and system, wherein a DPDK driver is configured in advance for a cloud database, and a DPDK environment is initialized; a data communication port corresponding to the cloud database is listened to, so that a data packet is acquired; the acquired data packet is filtered, and the filtered data packet is forwarded to at least one of multistage queues; the multistage queues respectively process the received data packet, and store the data packet to different regions of a pre-allocated large page storage buffer; and the data packet stored in the large page storage buffer is transmitted to a data receiving center. The application utilizes the multi-queue packet capturing capacity provided by the DPDK, directly captures data in the user state, avoids interrupt overhead, has extremely high real-time performance, is of great significance for real-time data management, and is dependent on CPU and memory, so that the performance loss of the database itself is extremely low, thereby providing a full-amount SQL data acquisition scheme with high real-time performance, low loss and high efficiency.

Description

Technical Field

[0001] This invention relates to the field of cloud computing technology, and in particular to a method, apparatus and system for acquiring full SQL data.

Background Technology

[0002] In the cloud computing field, monitoring, operation, and governance services for cloud databases often rely on the real-time collection of full SQL data to support downstream applications such as SQL execution monitoring, SQL insight and optimization, and SQL security auditing and early warning. Unlike ordinary database SQL collection, cloud databases have more related services and a much larger volume of full SQL data, placing extremely high performance requirements on the collection end.

[0003] There are three main approaches to existing cloud platform solutions for full SQL collection:

[0004] (1) Output SQL data based on database kernel. This type of method has good performance and collects rich data, but it requires modification of kernel source code, which has a very high technical threshold and extremely high operational complexity.

[0005] (2) Periodic data collection based on database logs. This type of method has low operational complexity, but because it cannot continuously query SQL logs, its real-time performance is poor, and frequent operations on database logs will greatly affect the performance of the database itself.

[0006] (3) Packet capture methods: The mainstream methods for capturing database packets include pcap, pf_ring, etc. These methods capture and parse the TCP protocol packets generated by the interaction between the user and the cloud database to obtain the full SQL data. To capture packets, they must switch into kernel mode through an interrupt within the device, which results in poor stability, moderate collection speed, and average real-time performance.

Summary of the Invention

[0007] In view of the shortcomings of the prior art, the purpose of the invention is to provide a method, device and system for full SQL data acquisition, which provides a full SQL data acquisition solution with high real-time performance, low loss and high efficiency, taking into account the characteristics of SQL data packets.

[0008] In a first aspect, this application provides a method for collecting full SQL data, in which the DPDK driver is pre-configured for the cloud database and the DPDK environment is initialized; the method includes:

[0009] Listen to the data communication port corresponding to the cloud database to obtain data packets;

[0010] The acquired data packets are filtered, and the filtered data packets are forwarded to at least one of the multi-level queues;

[0011] The multi-level queues process the received data packets respectively and store the data packets in different areas of the pre-allocated DPDK large page storage buffer;

[0012] The data packets stored in the large page storage buffer are transmitted to the data receiving center.

[0013] Optionally, the step of pre-configuring the DPDK driver for the cloud database and initializing the DPDK environment includes:

[0014] Configure an RSS routing policy in DPDK to forward packets to at least one of the multi-level queues according to the RSS routing policy;

[0015] Configure the number of multi-level queues for receiving data packets, configure the memory allocation for the large page storage buffer, and bind the database's data communication port to the DPDK's UIO protocol stack.

[0016] Optionally, the memory allocation for setting the large page storage buffer includes:

[0017] The large page storage buffer is divided into a normal buffer and a large SQL buffer according to a preset initial ratio. The page size of the large SQL buffer is larger than the page size of the normal buffer, and the memory allocation of the normal buffer and the large SQL buffer can be dynamically adjusted.

[0018] Optionally, the multi-level queue includes at least a primary receiving queue and a secondary processing queue; the secondary processing queue includes a session information queue and a large SQL processing queue.

[0019] The step of filtering the acquired data packets and forwarding the filtered data packets to at least one of the multi-level queues includes:

[0020] The system determines whether the acquired data packet is a TCP protocol packet; if so, it is received; otherwise, it is discarded.

[0021] For TCP protocol packets, determine whether they are invalid packets generated during the TCP connection process; if they are invalid packets and do not contain database session information, they are discarded; if they are invalid packets and contain database session information, they are forwarded to the session information queue of the secondary processing queue.

[0022] If the received data packet is a large SQL data packet exceeding the preset threshold, the data packet will be forwarded to the large SQL processing queue of the secondary processing queue.

[0023] Other data packets that do not meet the above conditions will enter the first-level receive queue.

[0024] Optionally, the multi-level queues process the received data packets respectively, storing the data packets in different areas of the pre-allocated DPDK large page storage buffer, including:

[0025] The session information queue stores the received session information data into the normal buffer after removing duplicates;

[0026] The large SQL processing queue stores data packets into the large SQL buffer;

[0027] The primary receiving queue caches valid ordinary data packets into the normal buffer.

[0028] Optionally, after transmitting the data packets stored in the large page storage buffer to the data receiving center, the method further includes:

[0029] The data receiving center distributes the received data packets to the distributed processing cluster, so that the distributed processing cluster can process the data and store the processed data in the database.

[0030] Optionally, the data receiving center distributes the received data packets to the distributed processing cluster, including:

[0031] For large SQL data packets, the large SQL data packets are fragmented and transmitted based on the fragmentation function of DPDK. The data receiving center then distributes them to a pre-specified set of distributed processing clusters for reassembly, processing, and/or compression.

[0032] Optionally, after the distributed processing cluster processes the data, storing the processed data in the database includes:

[0033] The distributed processing cluster parses the collected data packets to obtain the SQL execution statement, creation time, execution time and source information, and performs structured processing.

[0034] Wildcards are used to replace user data in SQL statements except for SQL keywords, resulting in de-identified data;

[0035] The processed data is stored in the database.

[0036] In a second aspect, this application provides a full SQL data acquisition device, comprising:

[0037] The pre-configured module is designed to pre-configure the DPDK driver for the cloud database and initialize the DPDK environment.

[0038] The listening module is configured to listen to the data communication port corresponding to the cloud database in order to obtain data packets;

[0039] The filtering module is configured to filter the acquired data packets and forward the filtered data packets to at least one of the multi-level queues;

[0040] A multi-level queue processing module, wherein the multi-level queues process the received data packets respectively and store the data packets in different areas of the pre-allocated DPDK large page storage buffer;

[0041] The transmission module is configured to transmit data packets stored in the large page storage buffer to the data receiving center.

[0042] In a third aspect, this application provides a full SQL data acquisition system, including:

[0043] At least one processor; and

[0044] At least one memory storing a computer program;

[0045] When the computer program is executed by the at least one processor, the full SQL data acquisition system performs the steps of the full SQL data acquisition method described above.

[0046] The beneficial effects of this invention are as follows:

[0047] The full SQL data acquisition method described in this invention involves pre-configuring the DPDK driver for the cloud database and initializing the DPDK environment; listening to the data communication port corresponding to the cloud database to acquire data packets; filtering the acquired data packets and forwarding the filtered data packets to at least one of the multi-level queues; processing the received data packets in each of the multi-level queues and storing the data packets in different areas of a pre-allocated DPDK large page storage buffer; and transmitting the data packets stored in the large page storage buffer to the data receiving center. This application utilizes the multi-queue packet capture capability provided by DPDK to capture data directly in user space based on the UIO protocol, which avoids interrupt overhead and gives extremely high real-time performance. This is of great significance for real-time data governance. Furthermore, DPDK relies on CPU and memory, and with reasonable resource allocation in advance, it imposes only minimal performance overhead on the database itself, thus providing a full SQL data acquisition solution with high real-time performance, low loss, and high efficiency.

[0048] In addition, this application also provides a full SQL data acquisition device and system with the above-mentioned technical effects.

Attached Figure Description

[0049] The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Throughout the drawings, the same reference numerals denote the same parts. It is obvious that the drawings described below are merely some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings.

[0050] Figure 1 is a flowchart of the full SQL data acquisition method provided in this application;

[0051] Figure 2 is a flowchart of the data processing performed by the distributed processing cluster in an embodiment of the invention;

[0052] Figure 3 is a flowchart of another specific implementation of the full SQL data acquisition method provided in this application;

[0053] Figure 4 is a schematic diagram of another specific implementation of the full SQL data acquisition method provided in this application;

[0054] Figure 5 is a structural block diagram of the full SQL data acquisition device provided in this application;

[0055] Figure 6 is a structural block diagram of the full SQL data acquisition system provided in this application.

Detailed Implementation

[0056] To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0057] Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts disclosed in this invention.

[0058] In the description of this invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicating orientation or positional relationships based on the orientation or positional relationships shown in the accompanying drawings, are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. The terms "installed," "connected," and "linked" should be interpreted broadly; for example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal communication of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0059] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of methods and systems consistent with some aspects of the invention as detailed in the appended claims.

[0060] This application provides a method for collecting full SQL data. As shown in Figure 1, a flowchart of the full SQL data collection method provided in this application, the method specifically includes:

[0061] S100: Pre-configure the DPDK driver for the cloud database and initialize the DPDK environment.

[0062] In this application, SQL (Structured Query Language) is a domain-specific language used to manage and operate relational database systems. It is a standardized programming language used to perform various database operations, including data querying, insertion, updating, deletion, etc.

[0063] DPDK (Data Plane Development Kit) is a set of libraries and drivers designed primarily for high-performance packet processing. It offers the following four core functions and advantages:

[0064] (1) Based on the UIO (Userspace I/O) protocol, it bypasses the kernel protocol stack and connects directly to the network card, so data packets are accessed and processed in user space, which reduces interrupt overhead and improves processing performance.

[0065] (2) Multi-queue support: data packets can be received and processed on multiple CPU cores simultaneously, giving high parallelism.

[0066] (3) Large page storage: using large pages (generally >= 2MB) as the buffer eliminates the need to access the disk, reduces the number of page table entries, and improves memory management efficiency.

[0067] (4) Zero-copy technology: avoids unnecessary repeated copies of in-memory data, reducing CPU and memory overhead. A single database TCP protocol packet is generally larger than 4 bytes and the data volume is large, making it very suitable for large page storage.

[0068] Specifically, the initialization process may include: configuring the DPDK driver for each cloud database service and initializing the DPDK environment. This includes, but is not limited to, adapting network interface card attributes, configuring RSS routing policies in DPDK to forward data packets to at least one of the multi-level queues according to the RSS routing policies, setting the number of multi-level queues for receiving data packets, setting the memory allocation for the large page storage buffer, and binding the database's data communication port to the DPDK's UIO protocol stack.

[0069] The large page storage buffer is divided into a normal buffer and a large SQL buffer according to a preset initial ratio. The page size of the large SQL buffer is larger than the page size of the normal buffer, and the memory allocation of the normal buffer and the large SQL buffer can be dynamically adjusted.

[0070] S101: Listen to the data communication port corresponding to the cloud database to obtain data packets.

[0071] It can be understood that the data communication port can be a network interface card (NIC) port; data packets are obtained by listening to the NIC port.

[0072] S102: Filter the acquired data packets and forward the filtered data packets to at least one of the multi-level queues.

[0073] In real-world production environments, capturing TCP packets destined for databases often runs into issues such as invalid packets and large SQL packets. Invalid packets carry no valid SQL information; they are generated during the three-way handshake that establishes a TCP connection with the database and the four-way handshake that closes it. They make up a significant share of the traffic, and capturing them severely degrades acquisition performance. Discarding them outright, however, loses the packets that carry database session information, because this information is present only in packets generated during the three-way handshake phase and is easily lost. Large SQL packets, which contain large SQL statements, are fewer in number but can stall acquisition and transmission, affecting the system's real-time performance.

[0074] As one specific implementation, the multi-level queues in this application all use lock-free circular queues, and the specific structure of the multi-level queues can be divided into two levels. It includes at least a primary receiving queue and a secondary processing queue; the secondary processing queue includes a session information queue and a large SQL processing queue.
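The circular-queue structure described above can be illustrated with a minimal sketch. It shows only the index arithmetic of a fixed-capacity ring in which the producer advances the tail and the consumer advances the head; it is not DPDK's queue implementation and makes no lock-free guarantee of its own. The class name `RingQueue` and the power-of-two capacity rule are assumptions made for this illustration.

```python
class RingQueue:
    """Fixed-capacity circular queue (illustrative; names are hypothetical)."""

    def __init__(self, capacity: int) -> None:
        # A power-of-two capacity lets the slot index be computed with a mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next position to read, advanced only by the consumer
        self._tail = 0  # next position to write, advanced only by the producer

    def enqueue(self, item) -> bool:
        """Store item and return True, or return False when the ring is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1
        return True

    def dequeue(self):
        """Return the oldest item, or None when the ring is empty."""
        if self._head == self._tail:
            return None
        index = self._head & self._mask
        item = self._slots[index]
        self._slots[index] = None  # release the reference held by the slot
        self._head += 1
        return item

    def __len__(self) -> int:
        return self._tail - self._head
```

Because each counter is written by only one side, a single producer and a single consumer never write the same variable, which is the property that lets ring designs of this kind avoid locks.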

[0075] The step of filtering the acquired data packets and forwarding the filtered data packets to at least one of the multi-level queues includes:

[0076] The system determines whether the acquired data packet is a TCP protocol packet; if so, it is received; otherwise, it is discarded.

[0077] For TCP protocol packets, determine whether they are invalid packets generated during the TCP connection process; if they are invalid packets and do not contain database session information, they are discarded; if they are invalid packets and contain database session information, they are forwarded to the session information queue of the secondary processing queue.

[0078] If the received data packet is a large SQL data packet exceeding the preset threshold, the data packet will be forwarded to the large SQL processing queue of the secondary processing queue.

[0079] Other data packets that do not meet the above conditions will enter the first-level receive queue.

[0080] It can be understood that, during data collection, the database port can be bound so that only TCP protocol data packets generated by the interaction between the cloud database and the server are received. The SYN/ACK flags are used to determine whether a data packet is an invalid packet generated during the TCP connection process. If it is an invalid packet and contains database session information, it is forwarded to the session information queue of the secondary processing queue; if it is an invalid packet without session information, it is discarded directly. A large SQL data packet is forwarded to the large SQL processing queue of the secondary processing queue. Ordinary data packets enter the first-level receiving queue directly.
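The filtering and forwarding rules above can be sketched as a single decision function. This is an illustration under stated assumptions, not the filter of the application: the `Packet` fields, the `has_session_info` flag (assumed to be set by an upstream parser), and the rule that a packet with SYN or FIN set or with an empty payload counts as a connection-process packet are all hypothetical. The default threshold uses the 100MB example given later in the description.

```python
from dataclasses import dataclass
from enum import Enum

TCP_PROTOCOL = 6                          # IANA protocol number for TCP
FIN, SYN = 0x01, 0x02                     # TCP flag bits
LARGE_SQL_THRESHOLD = 100 * 1024 * 1024   # example threshold in bytes


class Route(Enum):
    DROP = "drop"
    SESSION_QUEUE = "session information queue"
    LARGE_SQL_QUEUE = "large SQL processing queue"
    PRIMARY_QUEUE = "first-level receiving queue"


@dataclass
class Packet:
    ip_protocol: int
    tcp_flags: int
    payload: bytes
    has_session_info: bool = False  # hypothetical flag from an upstream parser


def classify(pkt: Packet, large_threshold: int = LARGE_SQL_THRESHOLD) -> Route:
    """Apply the rules in order: TCP check, invalid-packet check, size check."""
    if pkt.ip_protocol != TCP_PROTOCOL:
        return Route.DROP
    # Assumed heuristic for "invalid packet generated during the TCP connection".
    connection_packet = bool(pkt.tcp_flags & (SYN | FIN)) or not pkt.payload
    if connection_packet:
        return Route.SESSION_QUEUE if pkt.has_session_info else Route.DROP
    if len(pkt.payload) > large_threshold:
        return Route.LARGE_SQL_QUEUE
    return Route.PRIMARY_QUEUE
```

For example, `classify(Packet(6, 0x18, b"SELECT 1"))` yields `Route.PRIMARY_QUEUE`, since the packet is TCP, carries a payload, and is below the threshold.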

[0081] S103: The multi-level queues process the received data packets respectively and store the data packets in different areas of the pre-allocated DPDK large page storage buffer;

[0082] The session information queue stores the received session information data into the normal buffer after removing duplicates; the large SQL processing queue stores data packets into the large SQL buffer; and the first-level receiving queue caches valid ordinary data packets into the normal buffer.

[0083] It can be understood that, among the multi-level queues, the first-level receiving queue polls, through the UIO protocol, for all data packets on the cloud database's network card port that conform to the filter rules, and caches the valid ordinary data packets in the normal buffer of DPDK. In the secondary processing queue, the session information queue deduplicates the received session information data by source before storing it in the buffer, which avoids a large amount of redundancy, while the large SQL processing queue stores its data packets in the large SQL buffer.
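Deduplicating session information by source can be sketched as follows. The description does not say which fields identify a source, so the key used here (source IP, source port, user) and the dictionary layout of a session record are assumptions; the `buffer` list merely stands in for the normal buffer.

```python
class SessionDeduplicator:
    """Keep the first session record seen for each source (illustrative)."""

    def __init__(self) -> None:
        self._seen = set()   # source keys already stored
        self.buffer = []     # stand-in for the normal buffer

    def offer(self, session: dict) -> bool:
        """Store the record if its source is new; return whether it was stored."""
        # Hypothetical source key; a real deployment may identify sources differently.
        key = (session["src_ip"], session["src_port"], session["user"])
        if key in self._seen:
            return False
        self._seen.add(key)
        self.buffer.append(session)
        return True
```

A set lookup keeps the per-record cost constant, so repeated handshake traffic from the same source adds no entries to the buffer.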

[0084] S104: Transmit the data packets stored in the large page storage buffer to the data receiving center.

[0085] It can be understood that the data receiving center can specifically be a data receiving gateway. The data receiving gateway distributes the received data packets to the distributed processing cluster responsible for data processing, and the processed data is finally stored in a real-time analytical database through a message queue.

[0086] The data received in the large page storage buffer is transmitted to the data receiving gateway in real time via socket. During this transmission, a retransmission mechanism can be set to ensure the quality of data transmission in case of transmission failure.
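A retransmission mechanism of the kind mentioned above could look like the sketch below. The description does not specify retry counts or delays, so `max_attempts` and `backoff_s` are hypothetical parameters, and `send` is any callable that raises `OSError` on failure (for instance the `sendall` method of a connected socket).

```python
import time


def send_with_retry(send, payload: bytes, max_attempts: int = 3,
                    backoff_s: float = 0.0) -> int:
    """Call send(payload) until it succeeds; return the number of attempts used."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait a little longer after each failure before retrying.
                time.sleep(backoff_s * attempt)
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Bounding the number of attempts keeps a dead gateway from blocking the collection side indefinitely; what to do with the payload after the final failure is left to the caller.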

[0087] In addition, a packet capture monitor can be set up to dynamically adjust the buffer size based on the traffic and buffer memory status, and to clean up the buffer periodically to avoid memory overflow.
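One way such a monitor might rebalance the two buffer regions is sketched below. The policy is an assumption: the description states only that the allocation can be adjusted dynamically, so the 80% high-water mark, the 5-point step, and the 10% to 90% bounds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BufferPlan:
    total_bytes: int
    large_percent: int  # share of the buffer reserved for the large SQL buffer

    @property
    def large_bytes(self) -> int:
        return self.total_bytes * self.large_percent // 100

    @property
    def normal_bytes(self) -> int:
        return self.total_bytes - self.large_bytes


def rebalance(plan: BufferPlan, normal_used: int, large_used: int,
              step: int = 5, high_water: float = 0.8,
              min_percent: int = 10, max_percent: int = 90) -> BufferPlan:
    """Shift capacity toward whichever region alone is above the high-water mark."""
    normal_pressure = normal_used / plan.normal_bytes
    large_pressure = large_used / plan.large_bytes
    percent = plan.large_percent
    if large_pressure > high_water and normal_pressure <= high_water:
        percent = min(max_percent, percent + step)
    elif normal_pressure > high_water and large_pressure <= high_water:
        percent = max(min_percent, percent - step)
    return BufferPlan(plan.total_bytes, percent)
```

When both regions are under pressure at once, this sketch leaves the split unchanged, since moving capacity would only shift the shortage from one region to the other.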

[0088] This invention aims to provide real-time, full-volume SQL data for the management, monitoring, and governance of cloud platform databases. Addressing the shortcomings of existing solutions, this application does not rely on complex customization of the database kernel or on operating on database logs, so its impact on database performance is minimal. Utilizing the multi-queue packet capture capability provided by DPDK, data is captured directly in user space based on the UIO protocol, which avoids interrupt overhead and achieves extremely high real-time performance. This is significant for real-time data governance. Furthermore, DPDK relies on CPU and memory; with proper resource allocation in advance, it imposes only minimal performance overhead on the database itself, thus providing a full-volume SQL data acquisition solution with high real-time performance, low loss, and high efficiency. In addition, this application can solve online problems in real-world production environments, such as the large number of invalid packets generated by database TCP connections, the easy loss of session information, and the performance degradation caused by large SQL data.

[0089] By setting filtering and traffic distribution strategies, this application addresses a series of issues commonly encountered in real-world production environments, such as excessive invalid packets, stalls caused by large SQL data packets, and easy loss of session information. Furthermore, by utilizing DPDK's large-page caching and zero-copy technology, it achieves data partitioning and fragmented transmission, reducing storage overhead, improving data packet transmission efficiency, and better meeting the system's real-time requirements.

[0090] Based on the above embodiments, after transmitting the data packet stored in the large page storage buffer to the data receiving center, this application further includes: the data receiving center distributing the received data packet to the distributed processing cluster, so that the distributed processing cluster processes the data and stores the processed data in the database.

[0091] The data receiving center distributes the received data packets to the distributed processing clusters, including: for large SQL data packets, the large SQL data packets are fragmented and transmitted based on the fragmentation function of DPDK, and then uniformly distributed by the data receiving center to a pre-specified set of distributed processing clusters for reassembly, processing and/or compression.

[0092] Referring to Figure 2, which shows a flowchart of the data processing performed by the distributed processing cluster, storing the processed data in the database after the distributed processing cluster processes the data may specifically include the following steps:
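The fragment-then-reassemble flow can be illustrated generically. The sketch below does not use or imitate DPDK's fragmentation function; the tuple layout `(packet_id, index, total, chunk)` and the function names are assumptions made so that the receiving side can restore the original order.

```python
def fragment(packet_id: int, data: bytes, max_fragment: int) -> list:
    """Split data into (packet_id, index, total, chunk) tuples."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    total = max(1, -(-len(data) // max_fragment))  # ceiling division
    return [
        (packet_id, index, total,
         data[index * max_fragment:(index + 1) * max_fragment])
        for index in range(total)
    ]


def reassemble(fragments: list) -> bytes:
    """Rebuild the original bytes from fragments received in any order."""
    if not fragments:
        raise ValueError("no fragments to reassemble")
    total = fragments[0][2]
    by_index = {index: chunk for _, index, _, chunk in fragments}
    if set(by_index) != set(range(total)):
        raise ValueError("missing fragments")
    return b"".join(by_index[i] for i in range(total))
```

Carrying the index and total in every fragment lets the processing cluster detect a missing piece instead of silently producing a truncated SQL statement.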

[0093] S201: The distributed processing cluster parses the collected data packets to obtain the SQL execution statement, creation time, execution time and source information, and performs structured processing.

[0094] S202: Use wildcards to replace user data in SQL statements other than SQL keywords to obtain de-identified data;

[0095] S203: Store the processed data in the database.

[0096] Processed data is promptly sent to the message queue and ultimately stored in the real-time analytics database.
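The wildcard replacement in S202 can be approximated with a regular expression that swaps literal values for `?`. This is a simplified sketch: it treats quoted strings and numeric literals as the user data to mask, which is an assumption, and a production implementation would more likely rely on a proper SQL tokenizer to handle comments, hex literals, and dialect-specific quoting.

```python
import re

# Matches quoted strings (with backslash or doubled-quote escapes) and numbers.
_LITERAL = re.compile(
    r"""
    '(?:[^'\\]|\\.|'')*'      # single-quoted string literal
  | "(?:[^"\\]|\\.|"")*"      # double-quoted string literal
  | \b\d+(?:\.\d+)?\b         # integer or decimal literal
    """,
    re.VERBOSE,
)


def anonymize_sql(statement: str) -> str:
    """Replace literal values with '?' while leaving keywords and names intact."""
    return _LITERAL.sub("?", statement)
```

For example, `anonymize_sql("SELECT * FROM users WHERE name = 'alice' AND age > 30")` returns `SELECT * FROM users WHERE name = ? AND age > ?`. Digits inside identifiers such as `t1` are left alone because the number pattern requires a word boundary before the first digit.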

[0097] This application separates data acquisition from data processing, reducing the performance overhead of the acquisition end, while using a distributed processing cluster for fast data processing, thus improving the real-time performance of data acquisition.

[0098] The full SQL data collection method provided in this application is described below with reference to a specific embodiment. Figure 3 is a flowchart of another specific implementation of the full SQL data acquisition method provided in this application, and Figure 4 is a schematic diagram of that implementation. The specific implementation process is as follows:

[0099] S301: Configure the DPDK driver for the cloud database service, install the latest DPDK version and compile it, and initialize the DPDK environment.

[0100] The network card is bound using the dpdk-devbind tool. The RSS traffic splitting strategy of the network card in DPDK is set to symmetric flow balancing. The large page storage is partitioned using rte_malloc_socket provided by DPDK into two memory regions: normal_mem (page size 2MB) for the normal buffer and huge_mem (page size 1GB) for the large SQL buffer.
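The idea behind symmetric flow balancing is that both directions of one connection are steered to the same queue. The sketch below shows that property only: hardware RSS typically computes a Toeplitz hash with a configured key, whereas this stand-in sorts the two endpoints and applies CRC32. The function name and the default of 8 queues are assumptions for illustration.

```python
import zlib


def symmetric_queue_index(src_ip: str, src_port: int,
                          dst_ip: str, dst_port: int,
                          num_queues: int = 8) -> int:
    """Map a flow to a queue so that swapping source and destination
    gives the same result (stand-in hash, not the NIC's RSS function)."""
    # Sorting the endpoints removes the direction from the hash input.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = "|".join(f"{ip}:{port}" for ip, port in endpoints).encode()
    return zlib.crc32(key) % num_queues
```

Keeping requests and responses of a session on one queue means the SQL statement and its result can be correlated without coordination between cores.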

[0101] S302: Define filters that adapt to the characteristics of SQL data packets.

[0102] A filter adapted to the characteristics of SQL data packets is defined using `rte_flow_item` provided by DPDK. By binding to the database service port (default 3306), it receives only the TCP protocol data packets generated by the interaction between the cloud database and the server. It determines whether a packet is an invalid packet generated during the TCP connection phase based on the SYN/ACK/PSH/FIN flags in the packet. If it is an invalid packet and contains database session information, it is forwarded to the session information queue of the secondary processing queue for deduplication; if it is an invalid packet without session information, it is discarded. If the data packet contains a large SQL statement, it is forwarded to the large SQL processing queue of the secondary processing queue; ordinary data packets are placed directly into the first-level receiving queue. As a specific example, data packets larger than 100MB can be treated as large SQL data packets.

[0103] S303: Create a multi-level queue.

[0104] Use rte_ring to create the multi-level queues; the number of first-level queues can be set to 8. Listen to the port corresponding to the database service. Among the multi-level queues, the first-level receiving queue polls, through the UIO protocol, for all data packets on the cloud database's network card port and caches non-empty normal data packets in the normal_mem area of DPDK. In the secondary processing queue, the session information queue deduplicates the received session information data and stores it in the normal_mem area, and the large SQL processing queue stores its data packets in the huge_mem area.

[0105] S304: Store the collected data packets into the DPDK's large-page storage buffer and distribute the data to the data receiving gateway in real time via socket.

[0106] Large SQL data packets, which are handled in large page storage through mbuf, need to be fragmented for transmission; if transmission fails, the data is retransmitted. A watcher is configured to dynamically adjust the buffer size based on traffic and memory usage; it can be set to clean the memory pool every minute to prevent memory overflow.

[0107] S305: The data receiving gateway distributes the received data packets to the worker cluster responsible for data processing.

[0108] The specific distribution strategy is as follows: one worker is dedicated to handling session information, one worker cluster handles large SQL data packets, and another worker cluster handles ordinary data packets; the ratio can be set to 1:6. The workers first parse the collected data packets and extract key information including, but not limited to, the SQL execution statement, creation time, execution time, source user, IP address, and database, and structure this information as JSON. Large SQL data must first be reassembled; its valid information is then extracted, structured, and anonymized before being compressed with the Snappy algorithm. Secondly, the SQL data is anonymized: user data in the SQL statements, other than SQL keywords, is uniformly replaced with the wildcard '?' to remove sensitive information.
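The JSON structuring step performed by a worker might look like the following sketch. The field names in the record are hypothetical, since the description lists the kinds of information extracted but not a schema; compression is omitted here because Snappy is not part of the Python standard library.

```python
import json
from datetime import datetime, timezone


def to_record(sql: str, created_at: datetime, execution_ms: float,
              user: str, ip: str, database: str) -> str:
    """Serialize the parsed fields of one SQL execution as a JSON string."""
    record = {
        "sql": sql,                              # SQL execution statement
        "created_at": created_at.isoformat(),    # creation time
        "execution_ms": execution_ms,            # execution time
        "user": user,                            # source user
        "ip": ip,                                # source IP address
        "database": database,                    # target database
    }
    # sort_keys gives a stable field order, which simplifies downstream diffing.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

A call such as `to_record("SELECT ?", datetime(2023, 11, 23, tzinfo=timezone.utc), 1.5, "app", "10.0.0.5", "orders")` produces one self-describing line that can be placed on a message queue as is.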

[0109] S306: The processed data is sent to the Kafka message queue and finally stored in the real-time analytics database ClickHouse.

[0110] In addition, this application also provides a full SQL data acquisition device. As shown in Figure 5, a structural block diagram of the full SQL data acquisition device provided in this application, the device specifically includes:

[0111] The pre-configured module 500 is configured to pre-configure the DPDK driver for the cloud database and initialize the DPDK environment;

[0112] The listening module 501 is configured to listen to the data communication port corresponding to the cloud database in order to obtain data packets;

[0113] The filtering module 502 is configured to filter the acquired data packets and forward the filtered data packets to at least one of the multi-level queues;

[0114] The multi-level queue processing module 503, in which the multi-level queues respectively process the received data packets and store them in different areas of the pre-allocated DPDK large page storage buffer;

[0115] The transmission module 504 is configured to transmit data packets stored in the large page storage buffer to the data receiving center.

[0116] In addition, this application also provides a full SQL data acquisition system. Figure 6 shows the structural block diagram of the full SQL data acquisition system provided in this application. The system specifically includes:

[0117] At least one processor 61; and

[0118] At least one memory 62 storing a computer program;

[0119] When the computer program is executed by the at least one processor 61, the full SQL data acquisition system performs the steps of the full SQL data acquisition method described above.

[0120] It is understood that the specific implementation process of the full SQL data acquisition device and full SQL data acquisition system provided in this application can refer to the specific implementation process of the full SQL data acquisition method described above, and will not be repeated here.

[0121] This application pre-configures the DPDK driver for the cloud database and initializes the DPDK environment; it listens to the data communication port corresponding to the cloud database to obtain data packets; it filters the obtained data packets and forwards the filtered data packets to at least one of the multi-level queues; the multi-level queues process the received data packets respectively and store the data packets in different areas of the pre-allocated DPDK large page storage buffer; and it transmits the data packets stored in the large page storage buffer to the data receiving center. This application uses the multi-queue packet capture capability provided by DPDK to capture data directly in user space based on UIO (userspace I/O), which avoids interrupt overhead and gives extremely high real-time performance; this is of great significance for real-time data governance. Furthermore, DPDK relies only on CPU and memory, and with proper prior resource allocation it imposes only minimal performance overhead on the database itself, thus providing a full SQL data acquisition solution with high real-time performance, low loss, and high efficiency.
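The filtering and forwarding step summarized above, with the rules as set out in claim 1, might be sketched as follows. This is an illustrative Python model rather than DPDK code: the `Packet` fields, the `Route` names, and the threshold value are assumptions, and how a packet is recognized as an invalid connection-phase packet or as carrying session information is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DROP = "drop"
    SESSION_QUEUE = "session_info_queue"        # secondary processing queue
    LARGE_SQL_QUEUE = "large_sql_queue"         # secondary processing queue
    PRIMARY_QUEUE = "primary_receiving_queue"


@dataclass
class Packet:
    is_tcp: bool
    is_invalid: bool          # invalid packet generated during the TCP connection process
    has_session_info: bool    # carries database session information
    payload_len: int          # payload size in bytes


# Hypothetical value; the application only speaks of a "preset threshold".
LARGE_SQL_THRESHOLD = 16 * 1024


def classify(pkt: Packet, threshold: int = LARGE_SQL_THRESHOLD) -> Route:
    """Decide where a captured packet goes, following the filtering rules in order."""
    if not pkt.is_tcp:
        return Route.DROP
    if pkt.is_invalid:
        return Route.SESSION_QUEUE if pkt.has_session_info else Route.DROP
    if pkt.payload_len > threshold:
        return Route.LARGE_SQL_QUEUE
    return Route.PRIMARY_QUEUE
```

The order of the checks mirrors the claim wording: protocol check first, then connection-phase packets, then the size threshold, with everything else falling through to the primary receiving queue.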

[0122] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the protection scope of the present invention.

Claims

1. A method for collecting full SQL data, characterized in that the method includes: pre-configuring the DPDK driver for the cloud database and initializing the DPDK environment; listening to the data communication port corresponding to the cloud database to obtain data packets; filtering the acquired data packets and forwarding the filtered data packets to at least one of the multi-level queues; the multi-level queues respectively processing the received data packets and storing the data packets in different areas of the pre-allocated DPDK large page storage buffer; and transmitting the data packets stored in the large page storage buffer to the data receiving center; wherein the multi-level queues include at least a primary receiving queue and a secondary processing queue, and the secondary processing queue includes a session information queue and a large SQL processing queue; and wherein the step of filtering the acquired data packets and forwarding the filtered data packets to at least one of the multi-level queues includes: determining whether the acquired data packet is a TCP protocol packet, and if so, receiving it, otherwise discarding it; for TCP protocol packets, determining whether they are invalid packets generated during the TCP connection process, discarding them if they are invalid packets and do not contain database session information, and forwarding them to the session information queue of the secondary processing queue if they are invalid packets and contain database session information; if the acquired data packet is a large SQL data packet exceeding the preset threshold, forwarding the data packet to the large SQL processing queue of the secondary processing queue; and placing other data packets that do not meet the above conditions into the primary receiving queue.

2. The full SQL data acquisition method according to claim 1, characterized in that pre-configuring the DPDK driver for the cloud database and initializing the DPDK environment includes: configuring an RSS routing policy in DPDK, so that data packets are forwarded to at least one of the multi-level queues according to the RSS routing policy; and configuring the number of multi-level queues for receiving data packets, configuring the memory allocation of the large page storage buffer, and binding the database's data communication port to the DPDK's UIO protocol stack.

3. The full SQL data acquisition method according to claim 2, characterized in that configuring the memory allocation of the large page storage buffer includes: dividing the large page storage buffer into an ordinary buffer and a large SQL buffer according to a preset initial ratio, wherein the page size of the large SQL buffer is larger than the page size of the ordinary buffer, and the memory allocation of the ordinary buffer and the large SQL buffer can be dynamically adjusted.

4. The full SQL data acquisition method according to claim 1, characterized in that the multi-level queues respectively processing the received data packets and storing the data packets in different areas of the pre-allocated DPDK large page storage buffer includes: the session information queue deduplicating the received session information data and storing it in the ordinary buffer; the large SQL processing queue storing data packets in the large SQL buffer; and the primary receiving queue caching valid ordinary data packets in the ordinary buffer.

5. The full SQL data acquisition method according to any one of claims 1 to 4, characterized in that, after transmitting the data packets stored in the large page storage buffer to the data receiving center, the method further includes: the data receiving center distributing the received data packets to the distributed processing cluster, so that the distributed processing cluster processes the data and stores the processed data in the database.

6. The full SQL data acquisition method according to claim 5, characterized in that the data receiving center distributing the received data packets to the distributed processing cluster includes: for large SQL data packets, fragmenting and transmitting the large SQL data packets based on the fragmentation function of DPDK, the data receiving center then distributing them to a pre-specified set of distributed processing clusters for reassembly, processing, and/or compression.

7. The full SQL data acquisition method according to claim 5, characterized in that the distributed processing cluster processing the data and storing the processed data in the database includes: the distributed processing cluster parsing the collected data packets to obtain the SQL execution statement, creation time, execution time, and source information, and performing structured processing; replacing user data in the SQL statements, except for SQL keywords, with wildcards to obtain de-identified data; and storing the processed data in the database.

8. A full SQL data acquisition device, characterized in that it includes: a pre-configured module configured to pre-configure the DPDK driver for the cloud database and initialize the DPDK environment; a listening module configured to listen to the data communication port corresponding to the cloud database in order to obtain data packets; a filtering module configured to filter the acquired data packets and forward the filtered data packets to at least one of the multi-level queues; a multi-level queue processing module, wherein the multi-level queues respectively process the received data packets and store the data packets in different areas of a pre-allocated DPDK large page storage buffer, the multi-level queues include at least a primary receiving queue and a secondary processing queue, and the secondary processing queue includes a session information queue and a large SQL processing queue; and a transmission module configured to transmit the data packets stored in the large page storage buffer to the data receiving center; wherein the filtering module is configured to: determine, based on the protocol type, whether the acquired data packet is a TCP protocol data packet, and if so, receive it, otherwise discard it; for TCP protocol data packets, determine whether they are invalid packets generated during the TCP connection process, discard them if they are invalid packets and do not contain database session information, and forward them to the session information queue of the secondary processing queue if they are invalid packets and contain database session information; if the acquired data packet is a large SQL data packet exceeding a preset threshold, forward the data packet to the large SQL processing queue of the secondary processing queue; and place other data packets that do not meet the above conditions into the primary receiving queue.

9. A full SQL data acquisition system, comprising: at least one processor; and at least one memory storing a computer program; wherein, when the computer program is executed by the at least one processor, the full SQL data acquisition system performs the steps of the full SQL data acquisition method according to any one of claims 1 to 7.