Data processing method and device, computer device, storage medium and program product

By parsing tablespace files and table structure files in an offline MySQL database to obtain row data, the problem of low data extraction efficiency in traditional MySQL databases is solved, achieving efficient and accurate data extraction.

CN115982156B · Active · Publication Date: 2026-06-12 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2022-12-14
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Traditional methods for extracting data from MySQL databases are inefficient.

Method used

By obtaining the tablespace files and table structure files generated by the storage engines supported by the MySQL database, the table structure information is parsed and row data is obtained when the MySQL database is offline. Data extraction is performed using information such as the page header information and row record format of the table structure and tablespace files.

Benefits of technology

This technology improves data extraction efficiency and accuracy when the MySQL database is offline, reduces reliance on database operation, and enables fast and accurate data extraction.

✦ Generated by Eureka AI based on patent content.

Figure CN115982156B_ABST

Abstract

The application relates to a data processing method and device, computer equipment, a storage medium and a program product, and relates to the technical field of big data. The method comprises the following steps: obtaining a tablespace file and a table structure file of a MySQL database from a preset storage space, where the tablespace file and the table structure file are generated based on a storage engine supported by the MySQL database; parsing the table structure file to obtain table structure information corresponding to the tablespace file; and obtaining row data of the MySQL database according to the table structure information and the tablespace file. The method can improve data extraction efficiency.

Description

Technical Field

[0001] This application relates to the field of big data technology, and in particular to a data processing method, apparatus, computer equipment, storage medium, and program product.

Background Technology

[0002] Because MySQL databases are open-source and lightweight, more and more enterprises are choosing them as the basis for secondary development. Typically, an enterprise using a MySQL database first extracts the data needed for different application scenarios from it and then uses the extracted data for further development and application.

[0003] However, traditional data extraction methods suffer from low extraction efficiency.

Summary of the Invention

[0004] Therefore, to address the above technical problem, it is necessary to provide a data processing method, apparatus, computer equipment, storage medium, and program product that can improve the efficiency of data extraction.

[0005] Firstly, this application provides a data processing method. The method includes:

[0006] The tablespace file and table structure file of the MySQL database are obtained from the preset storage space; the tablespace file and the table structure file are generated based on the storage engine supported by the MySQL database.

[0007] The table structure file is parsed to obtain the table structure information corresponding to the tablespace file;

[0008] Based on the table structure information and the tablespace file, obtain the row data of the MySQL database.

[0009] In one embodiment, obtaining row data from the MySQL database based on the table structure information and the tablespace file includes:

[0010] Based on the page header information of the tablespace file, obtain the data pages in the tablespace file;

[0011] Based on the table structure information, obtain the row record format of the data page;

[0012] The row data is obtained based on the table structure information, the row record format, and the data page.

[0013] In one embodiment, obtaining the row data based on the table structure information, the row record format, and the data page includes:

[0014] The data page is split according to the row record format, multiple data rows corresponding to the data page are obtained, and the multiple data rows are stored in a preset message queue;

[0015] Based on the length information of the data columns in the table structure information, the multiple data rows stored in the message queue are split to obtain multiple byte arrays;

[0016] Based on the row record format and the table structure information, the multiple byte arrays are parsed concurrently to obtain the row data.

[0017] In one embodiment, obtaining the data page in the tablespace file based on the page header information of the tablespace file includes:

[0018] Obtain the file header information from the page header information;

[0019] Based on the file header information, the data page is obtained from the tablespace file.

[0020] In one embodiment, parsing the table structure file to obtain the table structure information corresponding to the tablespace file includes:

[0021] A preset command is invoked to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0022] In one embodiment, the storage engine includes any one of the InnoDB storage engine, the MyISAM storage engine, and the MERGE storage engine.

[0023] In one embodiment, the method further includes:

[0024] Based on the target application scenario, the storage strategy corresponding to the target application scenario is adopted to store the row data in the target object.

[0025] Secondly, this application also provides a data processing apparatus. The apparatus includes:

[0026] The first acquisition module is used to acquire the tablespace file and table structure file of the MySQL database from a preset storage space; the tablespace file and the table structure file are generated based on the storage engine supported by the MySQL database;

[0027] The second acquisition module is used to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0028] The third acquisition module is used to acquire row data of the MySQL database based on the table structure information and the tablespace file.

[0029] Thirdly, this application also provides a computer device. The computer device includes a memory and a processor; the memory stores a computer program, and the processor, when executing the computer program, implements the data processing method described in the first aspect.

[0030] Fourthly, this application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the data processing method described in the first aspect.

[0031] Fifthly, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, implements the data processing method described in the first aspect.

[0032] The above data processing method, apparatus, computer equipment, storage medium, and program product obtain, from a preset storage space, the tablespace file and table structure file generated by the storage engine supported by the MySQL database, parse the table structure file to obtain the table structure information corresponding to the tablespace file, and retrieve the row data of the MySQL database based on the table structure information and the tablespace file. Because the files generated by the storage engine are stored on disk, data can, unlike with traditional techniques, be extracted even when the MySQL database is offline. Extraction therefore no longer depends on the database running, which reduces its limitations and improves efficiency. Furthermore, the parsed table structure information allows the data storage format to be determined quickly, and parsing the tablespace file according to this format yields accurate row data, further improving the accuracy of the extracted data.

Attached Figure Description

[0033] Figure 1 is a diagram illustrating the application environment of a data processing method in one embodiment;

[0034] Figure 2 is a flowchart illustrating a data processing method in one embodiment;

[0035] Figure 3 is a flowchart illustrating the data processing method in another embodiment;

[0036] Figure 4 is a flowchart illustrating the data processing method in another embodiment;

[0037] Figure 5 is a flowchart illustrating the data processing method in another embodiment;

[0038] Figure 6 is a structural block diagram of a data processing device in one embodiment;

[0039] Figure 7 is a structural block diagram of a data processing device in another embodiment.

Detailed Implementation

[0040] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0041] It should be noted that the data processing methods, apparatus, devices, storage media, and program products of this application can be applied in the field of big data, as well as in other technical fields. This application does not limit the application fields of the data processing methods, apparatus, devices, storage media, and program products.

[0042] In this application, the reference to "embodiment" means that a specific feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment that is mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described in this application may be combined with other embodiments without conflict.

[0043] Unless otherwise defined, the technical or scientific terms used in this application shall have the ordinary meaning understood by one of ordinary skill in the art to which this application pertains. The terms “a,” “an,” “the,” and similar words used in this application do not indicate a limitation of quantity and may indicate singular or plural. The terms “comprising,” “including,” “having,” and any variations thereof used in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units not listed, or other steps or units inherent to these processes, methods, products, or devices. The term “multiple” used in this application means two or more. “And/or” describes the relationship between associated objects and indicates that three relationships may exist; for example, “A and/or B” can indicate A alone, A and B simultaneously, or B alone. The terms “first,” “second,” etc., used in this application merely distinguish similar objects and do not represent a specific ordering of objects. In the description of this invention, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0044] The data processing method provided in the embodiments of this application can be applied in the application environment shown in Figure 1. Figure 1 shows a computer device, which may be a server, together with its internal structure: the computer device includes a processor, memory, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database stores data involved in data processing. The network interface communicates with external terminals via a network connection. When executed by the processor, the computer program implements a data processing method.

[0045] Those skilled in the art will understand that the structure shown in Figure 1 is merely a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which this application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently.

[0046] It should be noted that MySQL databases support different types of storage engines, such as MyISAM, MERGE, and InnoDB. MyISAM is suitable for applications dominated by read and insert operations with few updates and deletes, where requirements for transaction integrity and concurrency are relatively low. The MERGE engine logically combines a set of identically structured MyISAM tables and references the combination as a single object; distributing the underlying tables across multiple disks effectively improves the access efficiency of MERGE tables, making it suitable for very large databases (VLDBs) such as data warehouses. InnoDB offers superior read/write and concurrency capabilities, making it suitable for applications with high consistency requirements and frequent data updates. When processing data with InnoDB, data stored on disk is loaded into memory, modified or added there, and then written back to disk; in other words, InnoDB is disk-based. Because of the speed gap between the Central Processing Unit (CPU) and the disk, the InnoDB storage engine uses a buffer pool, which is a memory area, to improve overall database performance. When a page is read from the database, it is first placed in the buffer pool. The next time the same page is read, InnoDB checks whether the page is in the buffer pool: if it is, this is a cache hit and the page is read directly; otherwise, the page is read from disk. When a page is modified, the copy in the buffer pool is modified first, and the modified page is then flushed to disk at a certain frequency, overwriting the original page on disk. The modified page is not flushed to disk every time it changes.
Therefore, based on the InnoDB storage engine, data in the MySQL database persists even when the server is shut down or power is lost. This application uses InnoDB as an example to illustrate the data processing method; other storage engines supported by MySQL can also be implemented using the solutions provided in this application.
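To make the buffer-pool behaviour in [0046] concrete, the following Python sketch models a buffer pool as a plain LRU cache with dirty-page tracking. It is a simplification under stated assumptions: the class name `BufferPool` and the dict standing in for the disk are hypothetical, and real InnoDB uses a midpoint-insertion LRU with background flushing rather than the explicit `flush()` call shown here.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool (hypothetical): plain LRU over pages, with dirty-page tracking."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict page_no -> bytes, stands in for the data file
        self.pages = OrderedDict()    # page_no -> bytearray, least recently used first
        self.dirty = set()            # pages modified in memory but not yet on disk
        self.hits = 0
        self.misses = 0

    def read(self, page_no):
        if page_no in self.pages:     # cache hit: serve from memory
            self.hits += 1
            self.pages.move_to_end(page_no)
        else:                         # miss: load the page from "disk"
            self.misses += 1
            self._evict_if_full()
            self.pages[page_no] = bytearray(self.disk[page_no])
        return self.pages[page_no]

    def write(self, page_no, data):
        page = self.read(page_no)     # modify the in-memory copy first
        page[:len(data)] = data
        self.dirty.add(page_no)

    def flush(self):
        """Write all dirty pages back, overwriting the originals on disk."""
        for page_no in sorted(self.dirty):
            self.disk[page_no] = bytes(self.pages[page_no])
        self.dirty.clear()

    def _evict_if_full(self):
        if len(self.pages) >= self.capacity:
            victim, data = self.pages.popitem(last=False)
            if victim in self.dirty:  # never drop unsaved changes
                self.disk[victim] = bytes(data)
                self.dirty.discard(victim)
```

The point the sketch illustrates is that a write changes only the in-memory page until it is flushed or evicted, which is why on-disk files eventually hold the persisted data.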

[0047] In one embodiment, as shown in Figure 2, a data processing method is provided. Taking its application to the computer device in Figure 1 as an example, the method includes the following steps:

[0048] S201, retrieve the tablespace file and table structure file of the MySQL database from the preset storage space; the tablespace file and table structure file are generated based on the storage engine supported by the MySQL database.

[0049] In this context, storage space refers to the buffer used by the MySQL database to store data. The buffer stores tablespace files and table structure files. When the MySQL database is running, tablespace files and table structure files can be generated based on the InnoDB storage engine and stored in the buffer. The space in the buffer used to manage data files is called a tablespace. The managed data files include tablespace files and table structure files. Tablespace files are files that store data, and table structure files are files that store the data structures stored in the tablespace files. The storage engine refers to the storage method or format in which the database stores data in the file system. Before data is saved to the data files, it is transferred to the storage engine and then stored according to the storage format of the different storage engines.

[0050] It should be noted that a database can contain multiple tablespaces, and a tablespace can contain multiple data files. A tablespace is divided into multiple segments, a segment consists of multiple extents, and an extent consists of a set of contiguous pages. A page is the smallest unit of disk management in a storage engine and is used to store the related data of the database. For example, in this embodiment, a page can be the smallest unit of disk management in the InnoDB storage engine.
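As a worked example of the page and extent hierarchy, the sketch below computes where a page lives in a tablespace file. It assumes InnoDB's default 16 KiB page size (`innodb_page_size`) and 1 MiB extents, which hold for page sizes up to 16 KiB; larger page sizes use larger extents and are rejected here. The function name `page_location` is illustrative.

```python
PAGE_SIZE = 16 * 1024        # default innodb_page_size
EXTENT_SIZE = 1024 * 1024    # 1 MiB per extent for page sizes up to 16 KiB


def page_location(page_no, page_size=PAGE_SIZE):
    """Return (byte offset in the .ibd file, extent number, page index in extent)."""
    if page_size > 16 * 1024:
        raise ValueError("32/64 KiB pages use 2/4 MiB extents; not modelled here")
    pages_per_extent = EXTENT_SIZE // page_size   # 64 for 16 KiB pages
    offset = page_no * page_size                  # pages are stored back to back
    return offset, page_no // pages_per_extent, page_no % pages_per_extent
```

For example, page 130 of a 16 KiB tablespace starts at byte 2,129,920 and is the third page (index 2) of extent 2.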

[0051] Understandably, creating a table in the database generates a tablespace file with the extension ".ibd", which stores the table's data, and a table structure file with the extension ".frm", which stores the table's data structure. In this embodiment, in a scenario where the MySQL database uses the InnoDB storage engine, the `innodb_file_per_table` option can be enabled so that each table is stored in its own tablespace file, from which the tablespace file and table structure file containing the data can be obtained.
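The sketch below pairs each `.ibd` tablespace file with its `.frm` table structure file under a copied data directory. It assumes `innodb_file_per_table` was enabled and the MySQL 5.x `<datadir>/<schema>/<table>.{ibd,frm}` layout; MySQL 8.0 keeps table metadata in its data dictionary and no longer writes `.frm` files. `find_table_files` is a hypothetical helper, and partitioned-table file names are not handled specially.

```python
from pathlib import Path


def find_table_files(datadir):
    """Map "schema.table" -> (ibd_path, frm_path or None) under a MySQL datadir.

    Assumes one directory per schema and one .ibd per table (file-per-table).
    """
    pairs = {}
    for ibd in sorted(Path(datadir).glob("*/*.ibd")):
        frm = ibd.with_suffix(".frm")           # structure file sits next to the data
        key = f"{ibd.parent.name}.{ibd.stem}"
        pairs[key] = (ibd, frm if frm.exists() else None)
    return pairs
```

A `None` structure path marks a table whose structure must be supplied some other way, such as the manual entry mentioned in [0054].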

[0052] S202, parse the table structure file to obtain the table structure information corresponding to the tablespace file.

[0053] The table structure information includes the row record format of the data stored in the tablespace file, the column names, column order, column data types, column lengths, etc.

[0054] Understandably, when the MySQL database is offline, the table structure information of the target table cannot be queried directly from the database. Therefore, in this embodiment, the table structure file of the MySQL database can be parsed by executing a command to obtain the information it stores about the corresponding tablespace file, thereby obtaining the table structure information corresponding to the tablespace file. In an optional embodiment, if the table structure file is missing, the table structure information can be entered manually.

[0055] S203: Retrieve row data from the MySQL database based on table structure information and tablespace file.

[0056] Row data refers to the actual data of each item stored in the MySQL database.

[0057] In this embodiment, the data storage format in the corresponding tablespace file can be obtained based on the parsed table structure information. Based on the data storage format, the data type and data object of the stored data can be obtained. The tablespace file is then parsed based on the obtained data type and data object of the stored data to obtain the row data of the MySQL database.

[0058] In the above data processing method, tablespace files and table structure files generated based on the storage engine supported by the MySQL database are obtained from a preset storage space. The table structure file is parsed to obtain the table structure information corresponding to the tablespace file. Based on the table structure information and the tablespace file, the row data of the MySQL database is obtained. Since the files generated by the storage engine supported by the MySQL database are stored on the disk, compared with traditional techniques, data can be extracted from the database even when the MySQL database is offline. This makes data extraction independent of the database operation, reduces the limitations of data extraction, and thus improves the efficiency of data extraction. In addition, by parsing the table structure information, the data storage format can be quickly obtained. Therefore, by parsing the tablespace file according to the data storage format, accurate row data of the MySQL database can be obtained, thereby improving the accuracy of the extracted data.

[0059] In the scenario described above, row data is obtained from the MySQL database based on the table structure information and the tablespace file. In one embodiment, as shown in Figure 3, S203 includes:

[0060] S301, based on the page header information of the tablespace file, retrieve the data pages in the tablespace file.

[0061] The page header information records the page's status information, which may include the page type, page format, number of rows in the page, page index, and page space usage. The page status information includes common header information and transaction information. The common header information includes the page type, the starting position of the data record on the disk, the starting position of the free list, the page directory information, and the page checksum. The transaction information includes active transaction information and historical transaction information. A data page refers to the page in the MySQL database's tablespace file used to store the actual data.
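In InnoDB specifically, the page type lives in the 38-byte file (FIL) header, and the status fields of an index page (record count, B-tree level, index ID, and so on) follow in a page header starting at byte 38. The sketch below decodes those fields with `struct`; the field order follows InnoDB's INDEX page header, while the function name and the returned dictionary keys are illustrative.

```python
import struct

FIL_HEADER_SIZE = 38   # the INDEX page header starts right after the FIL header
# Nine 2-byte fields, then PAGE_MAX_TRX_ID (8), PAGE_LEVEL (2), PAGE_INDEX_ID (8).
PAGE_HEADER = struct.Struct(">9HQHQ")


def parse_index_page_header(page):
    """Decode the status fields of an InnoDB INDEX page header (big-endian)."""
    (n_dir_slots, heap_top, n_heap_raw, _free, garbage, _last_insert,
     _direction, _n_direction, n_recs, max_trx_id, level,
     index_id) = PAGE_HEADER.unpack_from(page, FIL_HEADER_SIZE)
    return {
        "n_dir_slots": n_dir_slots,            # entries in the page directory
        "heap_top": heap_top,                  # first free byte of the record heap
        "n_heap": n_heap_raw & 0x7FFF,         # heap records, incl. infimum/supremum
        "compact": bool(n_heap_raw & 0x8000),  # high bit set: COMPACT-family format
        "garbage_bytes": garbage,              # bytes held by deleted records
        "n_recs": n_recs,                      # user records on the page
        "max_trx_id": max_trx_id,
        "level": level,                        # 0 = leaf page holding row data
        "index_id": index_id,
    }
```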

[0062] It should be noted that in a database, a page is the smallest unit of data management. A page consists of a file header, page data, and a file footer, and different page types have different characteristics. Pages in a tablespace can be divided into types based on their content: pages that store actual data, pages that store management data, and pages that store backup-related data. Pages that store actual data include data pages, index pages, row overflow pages, etc. Pages that store management data include pages that manage available space and pages that manage the extents allocated by allocation units, etc. Pages that store backup-related data include pages recording the extents modified since the last BACKUP DATABASE statement. Different types of pages have different file header information.

[0063] In this embodiment, the data page used to store the actual data can be determined based on the page type contained in the page header information of the page in the tablespace, thereby obtaining the data page in the tablespace file.

[0064] S302, Based on the table structure information, obtain the row record format of the data page.

[0065] Here, the row record format refers to the format in which data is stored in the database. In a data page, data is stored as row records, and each row record represents the complete information of one entity.

[0066] In this embodiment, the data page can be parsed into several row records according to the applicable type of row record format, thereby obtaining the row record format of the data page.

[0067] S303: Retrieve row data based on table structure information, row record format, and data page.

[0068] In this embodiment, the format of the data stored in the data page can be determined according to the row record format. The starting position of each row of data in the data page can be obtained according to the format of the data stored in the data page. Then, the starting position of each column of data in each row of data can be obtained according to the length of each column of data stored in the table structure information. Thus, each row of data is split into each column of data according to the starting position of each column of data. Then, each column of data is parsed into data types and data objects that the program can directly recognize, and row data is obtained.
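To locate where each row starts, one approach on an InnoDB COMPACT-format page is to follow the singly linked record list: every record stores, in the two bytes just before its origin, a relative offset to the next record in key order. The sketch below assumes the COMPACT layout with infimum at byte 99 and supremum at byte 112 (the REDUNDANT format uses different positions and absolute pointers), and `record_origins` is a hypothetical helper.

```python
import struct

PAGE_SIZE = 16384
INFIMUM = 99     # origin of the infimum record on a COMPACT page
SUPREMUM = 112   # origin of the supremum record on a COMPACT page


def record_origins(page, page_size=PAGE_SIZE):
    """Yield the origin offset of each user record, in key order.

    Reads the 2-byte relative "next record" pointer stored just before each
    record origin, starting at infimum and stopping at supremum.
    """
    offset = INFIMUM
    for _ in range(page_size):                # guard against a corrupt (cyclic) chain
        (rel,) = struct.unpack_from(">H", page, offset - 2)
        offset = (offset + rel) % page_size   # unsigned add wraps to a negative step
        if offset in (SUPREMUM, 0):
            return
        yield offset
    raise ValueError("record chain did not reach supremum")
```

Each yielded origin marks the start of a row's column data; the column boundaries inside it then come from the table structure information, as [0068] describes.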

[0069] In this embodiment, by using the header information of the tablespace file, the data pages in the tablespace file can be quickly obtained. Then, based on the table structure information, the row record format of the data pages can be accurately obtained. Thus, based on the table structure information, row record format, and data pages, row data can be quickly and accurately obtained, thereby improving the efficiency and accuracy of data extraction.

[0070] In the scenario described above, row data is obtained based on the table structure information, the row record format, and the data page. In one embodiment, as shown in Figure 4, S303 includes:

[0071] S401: Split the data page according to the row record format, obtain multiple data rows corresponding to the data page, and store the multiple data rows in a preset message queue.

[0072] Message queues are containers that store messages during transmission. They are used to receive messages and store them as files, and are a technology for exchanging information between distributed applications.

[0073] In this embodiment, the acquired data page can be split and parsed according to the row record format in the table structure information to obtain the multiple data rows contained in the data page, which are then stored in a preset message queue. Optionally, the message queue in this embodiment can be a regular queue, an Advanced Message Queuing Protocol (AMQP) message queue, or an open-source message queue such as Kafka; this embodiment does not impose any restrictions.

[0074] In an optional embodiment, a high watermark and a low watermark can be set in the message queue to indicate the number of messages stored. Furthermore, if the number of messages in the message queue exceeds the preset high watermark, the data page parsing process is paused; if the number of messages in the message queue is lower than the preset low watermark, the data page parsing process is re-executed.
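The high/low watermark rule in [0074] is a form of backpressure with hysteresis. The sketch below is one way to implement it inside a single process with `threading.Condition`, standing in for the message queue; the class name `WatermarkQueue` is hypothetical. Page parsing (the producer) pauses once the queue exceeds `high` and resumes when consumers drain it below `low`.

```python
import threading
from collections import deque


class WatermarkQueue:
    """In-process queue with high/low watermarks for the page-parsing producer."""

    def __init__(self, high, low):
        if not 0 < low <= high:
            raise ValueError("need 0 < low <= high")
        self.high, self.low = high, low
        self.items = deque()
        self.paused = False               # True while page parsing must wait
        self.cond = threading.Condition()

    def put(self, row):
        """Called by the page parser; blocks while parsing is paused."""
        with self.cond:
            while self.paused:
                self.cond.wait()
            self.items.append(row)
            if len(self.items) > self.high:
                self.paused = True        # above high watermark: pause parsing
            self.cond.notify_all()        # wake consumers waiting for data

    def get(self):
        """Called by column-splitting workers; blocks while the queue is empty."""
        with self.cond:
            while not self.items:
                self.cond.wait()
            row = self.items.popleft()
            if self.paused and len(self.items) < self.low:
                self.paused = False       # below low watermark: resume parsing
                self.cond.notify_all()
            return row
```

Keeping `low` below `high` avoids the producer toggling between paused and running on every message.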

[0075] It should be noted that a column of a long data type may not fit within a single data page. This is called row overflow, and the overflow data must be stored in pages of a specific type. Different row record formats in MySQL store overflow data differently. For example, in the Redundant row record format, a VARCHAR column longer than 8098 bytes causes row overflow: the first 768 bytes of prefix data are stored in the current data page, and the remainder is stored in BLOB-type pages referenced by an offset pointer. Therefore, when splitting data pages that contain overflow data, the pages referenced from the current data page must also be split.
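For the REDUNDANT and COMPACT formats, an externally stored column keeps a 768-byte local prefix followed by a 20-byte reference to the first overflow (BLOB) page; DYNAMIC and COMPRESSED keep only the 20-byte reference. The sketch below assumes the column's in-page bytes have already been isolated and splits them into the prefix and the decoded reference. The function name is hypothetical; the reference layout (space ID, page number, offset, and an 8-byte length whose top byte carries flag bits) follows InnoDB's external field reference.

```python
import struct

LOCAL_PREFIX = 768      # bytes kept in-page for REDUNDANT/COMPACT rows
EXTERN_REF_SIZE = 20    # pointer to the first overflow (BLOB) page


def split_overflow_field(field):
    """Split an externally stored column into (local prefix, reference dict)."""
    if len(field) != LOCAL_PREFIX + EXTERN_REF_SIZE:
        raise ValueError("unexpected external field length")
    prefix, ref = field[:LOCAL_PREFIX], field[LOCAL_PREFIX:]
    space_id, page_no, offset, len_hi, len_lo = struct.unpack(">IIIII", ref)
    return prefix, {
        "space_id": space_id,
        "page_no": page_no,     # first BLOB page holding the remainder
        "offset": offset,       # byte offset of the data on that page
        "extern_len": len_lo,   # bytes stored off-page (low 32 bits of the length)
        "flags": len_hi >> 24,  # owner/inherited flag bits in the top byte
    }
```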

[0076] S402, based on the length information of the data columns in the table structure information, split the multiple data rows stored in the message queue to obtain multiple byte arrays.

[0077] In this context, a data column refers to the constituent unit of a data row. A row can contain one or more columns, and different columns have different types, lengths, and stored values.

[0078] In this embodiment, the bytes of the multiple data rows stored in the message queue can be split according to the length information, stored in the table structure information, of the data column corresponding to each row record in the different data pages, yielding multiple byte arrays, each corresponding to a specific column of a row record.
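The column split in [0078] can be sketched as cumulative offsets over the column lengths. This assumes every column is fixed-length (for example INT, or CHAR in a single-byte character set); for VARCHAR and other variable-length columns InnoDB records per-row lengths in the record header, which this sketch does not read. `split_columns` is a hypothetical name.

```python
from itertools import accumulate


def split_columns(row, column_lengths):
    """Split one row's bytes into one byte array per column (fixed-length columns)."""
    if sum(column_lengths) != len(row):
        raise ValueError("column lengths do not match row length")
    ends = list(accumulate(column_lengths))   # end offset of each column
    starts = [0] + ends[:-1]                  # each column starts where the last ended
    return [row[s:e] for s, e in zip(starts, ends)]
```

For instance, a row laid out as INT (4 bytes), CHAR(4) and SMALLINT (2 bytes) splits into three byte arrays of lengths 4, 4 and 2.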

[0079] Optionally, in this embodiment, the stored row records can be split one at a time in first-in, last-out (LIFO) order, or split concurrently. Concurrent processing can be scheduled according to the following model:

[0080] Let M be the total concurrent execution capacity in memory. After the system starts and begins concurrent processing, the number of concurrent processes can be calculated from both the CPU and the memory perspective. The baseline value for available CPU resources can be set to... The baseline value for available memory resources is set to...

[0081] (1) Determine the maximum number of concurrent connections based on 80% CPU resource utilization, where C in the calculation formula represents the number of CPUs.

[0082] (2) Determine the maximum number of concurrent connections based on the maximum memory capacity, where m in the calculation formula is the memory capacity.

[0083] The maximum number of concurrent processes that can handle the multiple data rows in the message queue is:

[0084] Once the system is running, the potential for increased concurrency per unit of CPU can be reassessed based on the threads already running. Assuming linear CPU consumption, the utilization per unit of CPU is: The number of concurrent connections that can be added on the CPU side is:

[0085] Then, the increase in concurrency that can be achieved by concurrently splitting the multiple data rows in the message queue is $K = \min(n_{\mathrm{CPU}}, n_{\mathrm{mem\_max}})$, where $n_{\mathrm{CPU}}$ is the number of concurrent connections that can be added on the CPU side and $n_{\mathrm{mem\_max}}$ is the number of memory units that can be allocated at this stage.
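Because the per-resource formulas of [0081] to [0084] are not reproduced in the text, the sketch below uses a simple assumed model that keeps only what the text states: a target of 80% CPU utilization, linear CPU cost per worker, and $K = \min(n_{\mathrm{CPU}}, n_{\mathrm{mem\_max}})$. All parameter names and units are hypothetical.

```python
import math


def extra_workers(cpu_count, cpu_busy, cpu_per_worker, free_mem, mem_per_worker,
                  cpu_target=0.8):
    """Estimate how many more splitting workers to start: K = min(n_cpu, n_mem).

    Assumed inputs: cpu_busy is CPU-equivalents already in use, cpu_per_worker
    the measured cost of one worker (linear model), memory values share a unit.
    """
    if cpu_per_worker <= 0 or mem_per_worker <= 0:
        raise ValueError("per-worker costs must be positive")
    cpu_headroom = cpu_target * cpu_count - cpu_busy          # capacity left under 80%
    n_cpu = max(0, math.floor(cpu_headroom / cpu_per_worker))
    n_mem = max(0, free_mem // mem_per_worker)                # whole memory units left
    return min(n_cpu, n_mem)
```

For example, with 8 CPUs, 2 CPUs already busy, 0.5 CPU per worker, 1024 MiB free and 256 MiB per worker, the CPU side allows 8 more workers but memory allows only 4, so K = 4.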

[0086] S403: Based on the row record format and table structure information, multiple byte arrays are parsed concurrently to obtain row data.

[0087] In this embodiment, each byte array can be converted into data of the corresponding type, such as integer, floating-point, or string data, according to the data type stored in the table structure information. The converted data can then be mapped to the corresponding data object in the row record format; for example, the name is string data, the number is integer data, and the price is floating-point data.
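As a sketch of the type conversion in [0087], the functions below decode a few common column types from their InnoDB on-disk bytes. They rely on InnoDB's storage conventions: signed integers are stored big-endian with the sign bit inverted so that byte order matches numeric order, FLOAT and DOUBLE are stored as little-endian IEEE 754 values, and CHAR values are space-padded. Unsigned integers, DECIMAL, and date/time types are out of scope, and the type-name strings are illustrative.

```python
import struct


def decode_signed_int(raw):
    """Undo InnoDB's sign-bit flip on a big-endian signed integer column."""
    return int.from_bytes(raw, "big") - (1 << (8 * len(raw) - 1))


def decode_column(raw, col_type, charset="utf-8"):
    """Convert one column's byte array into a Python value."""
    if col_type in ("TINYINT", "SMALLINT", "MEDIUMINT", "INT", "BIGINT"):
        return decode_signed_int(raw)
    if col_type == "FLOAT":
        return struct.unpack("<f", raw)[0]
    if col_type == "DOUBLE":
        return struct.unpack("<d", raw)[0]
    if col_type == "CHAR":
        return raw.decode(charset).rstrip(" ")   # strip CHAR padding
    if col_type == "VARCHAR":
        return raw.decode(charset)
    raise NotImplementedError(col_type)
```

Applied to the example in [0087], a CHAR name, an INT number and a DOUBLE price would decode to a string, an integer and a float respectively.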

[0088] In this embodiment, data pages can be split according to the row record format to obtain multiple data rows corresponding to the data page. Furthermore, the parsed multiple data rows can be segmented according to the column length of each row record stored in the table structure information to obtain multiple byte arrays. This allows for further concurrent parsing of each byte array. Since the number of bytes in each segmented byte array is reduced compared to the original whole data row, the computational load of the parsing process is reduced, thereby improving the efficiency of data extraction. In addition, since the obtained data rows are stored in a message queue, concurrent parsing processing can be performed on the stored data rows, further improving the efficiency of data extraction.

[0089] In the scenario described above, the data pages in the tablespace file are retrieved based on the page header information of the tablespace file. In one embodiment, as shown in Figure 5, S301 includes:

[0090] S501, retrieve the file header information from the page header information.

[0091] The file header information records the page type information, the page number used for unique identification, pointers to the previous and next pages, and the offset value of the page in the tablespace.

[0092] Optionally, in this embodiment, the header information of the tablespace file can be obtained by executing the corresponding header retrieval command, and the file header information is then extracted from it. Alternatively, all pages in the tablespace file can be scanned, and the file header information is obtained from the scanned header information.
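The 38-byte file (FIL) header at the start of every InnoDB page carries the fields listed in [0091]. The sketch below decodes it with `struct`; field names follow InnoDB's `FIL_PAGE_*` constants, while `FilHeader` and `parse_fil_header` are hypothetical names. Note that the header stores the page number rather than a byte offset; the page's byte offset in the file is its page number multiplied by the page size.

```python
import struct
from typing import NamedTuple


class FilHeader(NamedTuple):
    checksum: int    # FIL_PAGE_SPACE_OR_CHKSUM
    page_no: int     # FIL_PAGE_OFFSET: unique page number within the tablespace
    prev_page: int   # FIL_PAGE_PREV (0xFFFFFFFF means none)
    next_page: int   # FIL_PAGE_NEXT (0xFFFFFFFF means none)
    lsn: int         # FIL_PAGE_LSN of the last modification
    page_type: int   # FIL_PAGE_TYPE, e.g. 17855 for INDEX pages
    flush_lsn: int   # FIL_PAGE_FILE_FLUSH_LSN
    space_id: int    # FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID


FIL_HEADER = struct.Struct(">IIIIQHQI")   # 38 bytes, big-endian, no padding


def parse_fil_header(page):
    """Decode the FIL header at the start of one page."""
    return FilHeader(*FIL_HEADER.unpack_from(page, 0))
```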

[0093] S502, retrieve the data pages from the tablespace file based on the file header information.

[0094] In this embodiment, the page type can be obtained from the acquired file header information, and the pages whose type is data page can then be located and retrieved from the tablespace file according to that page type.
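The scan in [0094] can be sketched as reading the tablespace file one page at a time and keeping pages whose FIL header type is `FIL_PAGE_INDEX` (17855), the type InnoDB uses for B-tree pages that hold records. This assumes the default 16 KiB page size and an uncompressed tablespace; INDEX pages also include non-leaf and secondary-index pages, which a full extractor would filter further by page level and index ID. `iter_pages_of_type` is a hypothetical helper.

```python
import struct

PAGE_SIZE = 16384
FIL_PAGE_TYPE_OFFSET = 24   # position of FIL_PAGE_TYPE inside the FIL header
FIL_PAGE_INDEX = 17855      # 0x45BF: B-tree page holding records
FIL_PAGE_TYPE_BLOB = 10     # overflow (off-page) column data


def iter_pages_of_type(path, wanted=FIL_PAGE_INDEX, page_size=PAGE_SIZE):
    """Scan a .ibd file page by page and yield (page_no, page) for matching types."""
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                break                         # end of file or truncated last page
            (page_type,) = struct.unpack_from(">H", page, FIL_PAGE_TYPE_OFFSET)
            if page_type == wanted:
                yield page_no, page
            page_no += 1
```

Passing `FIL_PAGE_TYPE_BLOB` instead collects the overflow pages discussed in [0075].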

[0095] In this embodiment, by obtaining the file header information from the page header information, and then obtaining the data page from the tablespace file based on the obtained file header information, the data page storing the actual data can be obtained quickly and accurately, thereby improving the efficiency and accuracy of obtaining row data.

[0096] In the scenario described above for obtaining table structure information, the table structure information can be obtained by parsing the table structure file. In one embodiment, S202 includes: calling a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0097] The preset command is used to obtain table structure information. In this embodiment, after the table structure file with the extension ".frm" is obtained, the mysqlfrm command can be executed on it to obtain the corresponding table structure information.
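The sketch below shows one way to turn the output of the preset command into column information. It assumes that `mysqlfrm` (from the MySQL Utilities package) is on the PATH and, in `--diagnostic` mode, prints a `CREATE TABLE` statement for the `.frm` file; `.frm` files exist only up to MySQL 5.7. The regular expression and the shape of the returned dictionaries are hypothetical and handle only simple one-column-per-line definitions.

```python
import re
import subprocess

# One column definition per line, e.g. "  `name` varchar(32) DEFAULT NULL,"
COLUMN_RE = re.compile(
    r"^\s*`(?P<name>[^`]+)`\s+(?P<type>[a-z]+)(?:\((?P<args>[^)]*)\))?(?P<rest>.*?),?\s*$",
    re.IGNORECASE,
)


def run_mysqlfrm(frm_path: str) -> str:
    """Run mysqlfrm in diagnostic mode (no server needed) and return its stdout."""
    result = subprocess.run(
        ["mysqlfrm", "--diagnostic", frm_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_create_table(ddl: str) -> list[dict]:
    """Extract column name, type, type arguments and nullability from CREATE TABLE text."""
    columns = []
    for line in ddl.splitlines():
        match = COLUMN_RE.match(line)  # key/index lines do not start with a backtick
        if match:
            columns.append({
                "name": match["name"],
                "type": match["type"].lower(),
                "args": match["args"],
                "nullable": "NOT NULL" not in match["rest"].upper(),
            })
    return columns
```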

[0098] In this embodiment, by calling a preset command to parse the table structure file, the table structure information corresponding to the tablespace file can be obtained. Compared with traditional technology, the table structure information can be obtained directly and quickly by executing the preset command, thereby improving the efficiency of obtaining row data based on the table structure information.

[0099] In the scenario described above for obtaining row data from a MySQL database, the obtained row data can be stored. In one embodiment, the method further includes: based on the target application scenario, storing the row data in the target object using the storage strategy corresponding to the target application scenario.

[0100] Here, the target application scenario is the specific scenario in which the row data of the MySQL database is used; optionally, it can be a data sharing scenario, a data recovery scenario, a data analysis scenario, or the like. The storage strategy is the way data is stored for the target application scenario; optionally, it can be full table storage, incremental table storage, snapshot table storage, or the like. The target object is the object used to store and use the data in the target application scenario.

[0101] Understandably, before the data is stored, the acquired row data can be filtered according to preset data retention or filtering rules. For example, the filtering can be achieved by applying a WHERE clause that expresses the data retention or filtering rules.

[0102] In this embodiment, the filtered data can be stored in different file formats according to the storage strategy and application requirements of the target application scenario. For example, in a data sharing scenario, the data can be sent to the Kafka distributed message queue through the Kafka client interface to achieve message sharing. In a data recovery scenario, batch INSERT SQL statements can be exported to a text file and then imported with MySQL commands; alternatively, the data can be exported as a load file, that is, with fields separated by delimiters according to the row record format, and imported with the MySQL LOAD DATA command; or the data can be exported as a CSV or XLS file for further processing according to the application scenario. In a data analysis scenario, the data can be exported in whatever file format the user needs; for example, for Python-based data analysis, it can be exported as a CSV or load file.
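For the data recovery scenario, the two text exports mentioned above can be sketched as follows. This is a minimal illustration assuming rows arrive as Python tuples in column order, that the target server uses the default `sql_mode` (backslash escapes enabled), and that the load file will be imported with `LOAD DATA` using its default field and line settings (tab-separated fields, backslash escapes, `\N` for NULL). Function names are hypothetical.

```python
def sql_literal(value) -> str:
    """Render a Python value as a MySQL literal (assumes default sql_mode escaping)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def to_batch_insert(table: str, columns: list[str], rows: list[tuple],
                    batch_size: int = 500) -> list[str]:
    """Group rows into multi-row INSERT statements of at most batch_size rows each."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    statements = []
    for start in range(0, len(rows), batch_size):
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")"
            for row in rows[start:start + batch_size]
        )
        statements.append(f"INSERT INTO `{table}` ({col_list}) VALUES\n{values};")
    return statements


def load_field(value) -> str:
    """Escape one field for LOAD DATA's defaults (tab-separated, backslash escapes)."""
    if value is None:
        return "\\N"  # LOAD DATA reads \N as NULL
    return str(value).replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def to_load_lines(rows: list[tuple]) -> list[str]:
    """One tab-delimited line per row, following the row's column order."""
    return ["\t".join(load_field(v) for v in row) for row in rows]
```

Batching keeps each INSERT statement well below typical `max_allowed_packet` limits while still avoiding one round trip per row; the batch size of 500 is an arbitrary illustrative default.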

[0103] In this embodiment, based on the target application scenario, the storage strategy corresponding to the target application scenario is adopted to store row data in the target object. This allows for flexible selection of data storage methods according to different application scenarios, meeting different data management needs and further improving data management capabilities and efficiency.

[0104] To facilitate understanding by those skilled in the art, the data processing method provided in this application will be described in detail below. This method may include:

[0105] S1: While the MySQL database is offline, run a preset command to retrieve, from the disk on which the MySQL database resides, the tablespace file and table structure file generated by the InnoDB engine.

[0106] S2: Parse the table structure file with the mysqlfrm command to obtain the table structure information.

[0107] S3: Obtain the file header information from the tablespace file based on the page header information of the tablespace file.

[0108] S4: Based on the obtained file header information, identify and retrieve the data pages in the tablespace file.

[0109] S5: Split each data page according to the row record format recorded in the table structure information to obtain multiple data rows.

[0110] S6: Store the obtained data rows in a preset message queue.

[0111] S7: Based on the column lengths recorded in the table structure information, divide each data row in the message queue into multiple byte arrays.

[0112] S8: Parse the segmented byte arrays concurrently, based on the row record format and the table structure information, to obtain the row data.

[0113] S9: Store the obtained row data in the target object using the storage strategy corresponding to the target application scenario.
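Step S5 (splitting a data page into rows) can be illustrated for InnoDB pages in the COMPACT family of row formats, where records form a singly linked list that starts at the infimum record (origin offset 99) and ends at the supremum record (origin offset 112). Each record's 5-byte header ends with a 2-byte next-record offset relative to the record's origin, and the addition wraps modulo the page size. The sketch below walks that list and returns record origins; it is a minimal illustration under these assumptions, does not decode the records themselves, and does not cover the REDUNDANT row format, whose page layout differs.

```python
import struct
from typing import Iterator

PAGE_NEW_INFIMUM = 99    # origin of the infimum record in a COMPACT-format page
PAGE_NEW_SUPREMUM = 112  # origin of the supremum record in a COMPACT-format page


def iter_record_origins(page: bytes) -> Iterator[int]:
    """Follow next-record pointers from infimum to supremum, yielding user records."""
    page_size = len(page)
    pos = PAGE_NEW_INFIMUM
    for _ in range(page_size):  # bound the walk to guard against pointer cycles
        # The 2 bytes just before the record origin hold the relative next offset.
        (rel,) = struct.unpack_from(">H", page, pos - 2)
        if rel == 0:
            raise ValueError(f"broken record chain at offset {pos}")
        pos = (pos + rel) % page_size  # wraps, so backward links work too
        if pos == PAGE_NEW_SUPREMUM:
            return
        yield pos
    raise ValueError("record chain does not terminate")
```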

[0114] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0115] Based on the same inventive concept, this application also provides a data processing apparatus for implementing the data processing method described above. The solution provided by this apparatus is similar to the implementation scheme described in the above method; therefore, the specific limitations in one or more data processing apparatus embodiments provided below can be found in the limitations of the data processing method described above, and will not be repeated here.

[0116] In one embodiment, as shown in Figure 6, a data processing device is provided, including a first acquisition module 11, a second acquisition module 12, and a third acquisition module 13, wherein:

[0117] The first acquisition module 11 is used to acquire the tablespace file and table structure file of the MySQL database from the preset storage space; the tablespace file and table structure file are generated based on the storage engine supported by the MySQL database.

[0118] The second acquisition module 12 is used to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0119] The third acquisition module 13 is used to acquire row data from the MySQL database based on the table structure information and the tablespace file.

[0120] Optionally, the storage engine can be any one of the InnoDB, MyISAM, and MERGE storage engines.

[0121] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0122] In one embodiment, as shown in Figure 7, the third acquisition module 13 includes a first acquisition unit 131, a second acquisition unit 132, and a third acquisition unit 133, wherein:

[0123] The first acquisition unit 131 is used to acquire data pages in the tablespace file based on the page header information of the tablespace file;

[0124] The second acquisition unit 132 is used to acquire the row record format of the data page based on the table structure information;

[0125] The third acquisition unit 133 is used to acquire row data based on the table structure information, the row record format, and the data page.

[0126] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0127] In one embodiment, with continued reference to Figure 7, the third acquisition unit 133 is specifically used to perform the following:

[0128] The data page is split according to the row record format, and multiple data rows corresponding to the data page are obtained and stored in a preset message queue. According to the length information of the data column in the table structure information, the multiple data rows stored in the message queue are split to obtain multiple byte arrays. According to the row record format and table structure information, the multiple byte arrays are parsed concurrently to obtain the row data.

[0129] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0130] In one embodiment, with continued reference to Figure 7, the first acquisition unit 131 is specifically used to perform the following:

[0131] Retrieve the file header information from the page header information; based on the file header information, retrieve the data page from the tablespace file.

[0132] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0133] In one embodiment, with continued reference to Figure 7, the second acquisition module 12 includes:

[0134] The fourth acquisition unit 121 is used to call a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0135] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0136] In one embodiment, with continued reference to Figure 7, the device further includes:

[0137] The storage module 14 is used to store row data in the target object based on the target application scenario and the storage strategy corresponding to the target application scenario.

[0138] The data processing device provided in this embodiment can execute the above method embodiment, and its implementation principle and technical effect are similar, so it will not be described again here.

[0139] Each module in the aforementioned data processing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the operations corresponding to each module.
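When the modules are implemented as software, one possible composition is sketched below. The class name, field names and callable signatures are hypothetical; the sketch only shows how modules 11 to 14 could be chained so that each module's output feeds the next, with the storage module left optional as in the embodiments above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DataProcessingDevice:
    """Hypothetical software composition of modules 11 to 14 (names are illustrative)."""
    # Module 11: fetch (tablespace file bytes, table structure file bytes) from storage.
    first_acquisition: Callable[[str], tuple[bytes, bytes]]
    # Module 12: parse the table structure file into table structure information.
    second_acquisition: Callable[[bytes], Any]
    # Module 13: extract row data from the tablespace using the structure information.
    third_acquisition: Callable[[Any, bytes], list]
    # Module 14 (optional): store row data according to the application scenario.
    storage: Optional[Callable[[list], None]] = None

    def run(self, location: str) -> list:
        """Chain the modules: acquire files, parse structure, extract rows, store."""
        tablespace, structure_file = self.first_acquisition(location)
        table_info = self.second_acquisition(structure_file)
        rows = self.third_acquisition(table_info, tablespace)
        if self.storage is not None:
            self.storage(rows)
        return rows
```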

[0140] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to perform the following steps:

[0141] Retrieve the tablespace file and table structure file of the MySQL database from the preset storage space; the tablespace file and table structure file are generated based on the storage engine supported by the MySQL database;

[0142] Parse the table structure file to obtain the table structure information corresponding to the tablespace file;

[0143] Retrieve row data from the MySQL database based on the table structure information and tablespace file.

[0144] Optionally, the storage engine can be any one of the InnoDB, MyISAM, and MERGE storage engines.

[0145] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0146] Based on the page header information of the tablespace file, retrieve the data pages in the tablespace file;

[0147] Based on the table structure information, obtain the row record format of the data page;

[0148] Retrieve row data based on the table structure information, the row record format, and the data page.

[0149] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0150] The data page is split according to the row record format, multiple data rows corresponding to the data page are obtained, and the multiple data rows are stored in a preset message queue;

[0151] Based on the length information of the data columns in the table structure information, the multiple data rows stored in the message queue are split to obtain multiple byte arrays;

[0152] Based on the row record format and table structure information, multiple byte arrays are parsed concurrently to obtain row data.

[0153] In one embodiment, when the processor executes the computer program, it further performs the following steps: obtaining file header information from the page header information;

[0154] Based on the file header information, retrieve the data pages from the tablespace file.

[0155] In one embodiment, when the processor executes the computer program, it also performs the following steps: calling a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0156] In one embodiment, when the processor executes the computer program, it further performs the following steps: based on the target application scenario, adopting the storage strategy corresponding to the target application scenario, storing the row data into the target object.

[0157] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

[0158] Retrieve the tablespace file and table structure file of the MySQL database from the preset storage space; the tablespace file and table structure file are generated based on the storage engine supported by the MySQL database;

[0159] Parse the table structure file to obtain the table structure information corresponding to the tablespace file;

[0160] Retrieve row data from the MySQL database based on the table structure information and tablespace file.

[0161] Optionally, the storage engine can be any one of the InnoDB, MyISAM, and MERGE storage engines.

[0162] In one embodiment, when the computer program is executed by a processor, it further performs the following steps:

[0163] Based on the page header information of the tablespace file, retrieve the data pages in the tablespace file;

[0164] Based on the table structure information, obtain the row record format of the data page;

[0165] Retrieve row data based on the table structure information, the row record format, and the data page.

[0166] In one embodiment, when the computer program is executed by a processor, it further performs the following steps:

[0167] The data page is split according to the row record format, multiple data rows corresponding to the data page are obtained, and the multiple data rows are stored in a preset message queue;

[0168] Based on the length information of the data columns in the table structure information, the multiple data rows stored in the message queue are split to obtain multiple byte arrays;

[0169] Based on the row record format and table structure information, multiple byte arrays are parsed concurrently to obtain row data.

[0170] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: obtaining file header information from the page header information;

[0171] Based on the file header information, retrieve the data pages from the tablespace file.

[0172] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: invoking a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0173] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: based on the target application scenario, adopting the storage strategy corresponding to the target application scenario, storing the row data into the target object.

[0174] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, performs the following steps:

[0175] Retrieve the tablespace file and table structure file of the MySQL database from the preset storage space; the tablespace file and table structure file are generated based on the storage engine supported by the MySQL database;

[0176] Parse the table structure file to obtain the table structure information corresponding to the tablespace file;

[0177] Retrieve row data from the MySQL database based on the table structure information and tablespace file.

[0178] Optionally, the storage engine can be any one of the InnoDB, MyISAM, and MERGE storage engines.

[0179] In one embodiment, when the computer program is executed by a processor, it further performs the following steps:

[0180] Based on the page header information of the tablespace file, retrieve the data pages in the tablespace file;

[0181] Based on the table structure information, obtain the row record format of the data page;

[0182] Retrieve row data based on the table structure information, the row record format, and the data page.

[0183] In one embodiment, when the computer program is executed by a processor, it further performs the following steps:

[0184] The data page is split according to the row record format, multiple data rows corresponding to the data page are obtained, and the multiple data rows are stored in a preset message queue;

[0185] Based on the length information of the data columns in the table structure information, the multiple data rows stored in the message queue are split to obtain multiple byte arrays;

[0186] Based on the row record format and table structure information, multiple byte arrays are parsed concurrently to obtain row data.

[0187] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: obtaining file header information from the page header information;

[0188] Based on the file header information, retrieve the data pages from the tablespace file.

[0189] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: invoking a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

[0190] In one embodiment, when the computer program is executed by the processor, it further performs the following steps: based on the target application scenario, adopting the storage strategy corresponding to the target application scenario, storing the row data into the target object.

[0191] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.

[0192] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0193] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0194] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A data processing method, characterized in that the method includes: obtaining a tablespace file and a table structure file of a MySQL database from a preset storage space, the tablespace file and the table structure file being generated based on a storage engine supported by the MySQL database; parsing the table structure file to obtain table structure information corresponding to the tablespace file; and obtaining row data of the MySQL database based on the table structure information and the tablespace file; wherein the step of obtaining row data of the MySQL database based on the table structure information and the tablespace file includes: obtaining data pages in the tablespace file based on page header information of the tablespace file; obtaining a row record format of the data pages based on the table structure information; splitting the data pages according to the row record format to obtain multiple data rows corresponding to the data pages, and storing the multiple data rows in a preset message queue; splitting the multiple data rows stored in the message queue based on length information of data columns in the table structure information to obtain multiple byte arrays; and parsing the multiple byte arrays concurrently based on the row record format and the table structure information to obtain the row data.

2. The method according to claim 1, characterized in that the step of obtaining the data pages in the tablespace file based on the page header information of the tablespace file includes: obtaining file header information from the page header information; and obtaining the data pages from the tablespace file based on the file header information.

3. The method according to claim 1, characterized in that the step of parsing the table structure file to obtain the table structure information corresponding to the tablespace file includes: invoking a preset command to parse the table structure file and obtain the table structure information corresponding to the tablespace file.

4. The method according to claim 1, characterized in that the storage engine includes any one of the InnoDB, MyISAM, and MERGE storage engines.

5. The method according to claim 1, characterized in that the method further includes: based on a target application scenario, storing the row data in a target object using a storage strategy corresponding to the target application scenario.

6. A data processing apparatus, characterized in that the apparatus includes: a first acquisition module, configured to acquire a tablespace file and a table structure file of a MySQL database from a preset storage space, the tablespace file and the table structure file being generated based on a storage engine supported by the MySQL database; a second acquisition module, configured to parse the table structure file and obtain table structure information corresponding to the tablespace file; and a third acquisition module, configured to acquire row data of the MySQL database based on the table structure information and the tablespace file; wherein the third acquisition module is specifically configured to: acquire data pages in the tablespace file based on page header information of the tablespace file; acquire a row record format of the data pages according to the table structure information; split the data pages according to the row record format to acquire multiple data rows corresponding to the data pages, and store the multiple data rows in a preset message queue; segment the multiple data rows stored in the message queue according to length information of data columns in the table structure information to acquire multiple byte arrays; and parse the multiple byte arrays concurrently according to the row record format and the table structure information to obtain the row data.

7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.

9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.