A database data storage optimization method and device, electronic equipment and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANJIN NANKAI UNIV GENERAL DATA TECH
- Filing Date
- 2026-02-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing databases suffer from performance degradation and query latency issues when storing LONGBLOB data, especially when dealing with large binary files, resulting in low storage efficiency and excessive memory consumption.
By performing morphological recognition and classification on the LONGBLOB data type, initialization and compression are performed using enumeration flags, and data storage is optimized using a preset compression algorithm, including the differentiation and processing of morphological types such as variable strings, transaction files, global files, network files, and stable storage.
It improves storage space utilization, reduces network transmission and I/O overhead, enhances the overall performance and query efficiency of the database, and is suitable for distributed database environments.
Figure CN122240684A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the fields of distributed database and big data technology, and specifically to a database data storage optimization method, apparatus, electronic device and storage medium.
Background Technology
[0002] BLOB (Binary Large Object) is a data type for storing large binary objects, mainly unstructured binary data such as images, audio, video, or documents. In databases, the BLOB type can store large amounts of binary data. Currently, the maximum size of a single BLOB record in a database is 32KB (kilobytes). When the data length exceeds the BLOB range, the LONGBLOB (Long Binary Large Object) type is needed. However, the LONGBLOB type performs significantly worse than the BLOB type. The reason is that, during data transmission, data smaller than 32KB is typically stored in data units as Varchar (a variable-length string type) and transmitted in that form, whereas LONGBLOB data exceeding 32KB must be stored in a temporary file and sent separately. Large file transfers put significant pressure on the network and disk, reducing storage efficiency. Furthermore, if a database table contains a LONGBLOB field, the query performance of the database is significantly affected.
Summary of the Invention
[0003] In view of the above problems, this application provides a database data storage optimization method, apparatus, electronic device and storage medium.
[0004] According to the first aspect of this application, a database data storage optimization method is provided, comprising: identifying the data processing form of a target data type in the database based on historical processing information of the database to obtain a form identification result, and marking the form identification result with an enumeration type to obtain an enumeration identification result; dividing the storage unit of the target data type to obtain a compression-form flag, and initializing the compression-form flag using the enumeration identification result to obtain an initialized compression-form flag; and optimizing the storage of data of the target data type based on the data processing scenario of the database, using the initialized compression-form flag, to obtain storage-optimized data.
[0005] According to embodiments of this application, the above enumeration identification result includes the variable string form, transaction file form, global file form, network file form, temporary file form, stable storage form, optional compression state, format mask, and optional mask of the target data type.
[0006] According to embodiments of this application, the variable string form represents data of the target data type that is smaller than 32KB; the transaction file form represents data of the target data type used to describe database transaction management information; the global file form represents data of the target data type that is larger than 32KB; the network file form represents data of the target data type that is network data; the temporary file form represents data of the target data type that is temporary data used for transmission; and the stable storage form represents data of the target data type that is stably stored in the database; wherein, when data of the target data type is in the temporary file form or the stable storage form, the data is written to a binary format file.
[0007] According to an embodiment of this application, the above-mentioned initialization of the compression-format flag bit using the enumeration identifier result to obtain the initialized compression-format flag bit includes: initializing the first segment of the compression-format flag bit by performing an AND operation on the optional mask, wherein when the value of the first segment is non-zero, the target data type is in a compressed state; and initializing the second segment of the compression-format flag bit by performing an AND operation on the format mask, wherein the value of the second segment represents the current data format of the target data type.
[0008] According to an embodiment of this application, optimizing the storage of data of the target data type based on the data processing scenario of the database, using the initialized compression-format flag, to obtain storage-optimized data includes: when the data processing scenario is data writing, obtaining the storage path, storage capacity, and data format of the data of the target data type using the initialized compression-format flag; compressing the storage path, storage capacity, and data format using a preset compression algorithm to obtain compressed information; and storing the compressed information in a multi-level binary format file to obtain the storage-optimized data.
[0009] According to an embodiment of this application, optimizing the storage of data of the target data type based on the data processing scenario of the database, using the initialized compression-format flag, to obtain storage-optimized data further includes: when the data processing scenario is data reading, calling the preset compression algorithm to parse the binary format file of the data of the target data type to obtain the storage path, storage capacity, and data format of that data, and using the storage path, storage capacity, and data format to perform data consistency verification on the decompressed data of the target data type.
[0010] According to embodiments of this application, the aforementioned preset compression algorithm includes a standard fast compression algorithm based on entropy coding.
[0011] According to a second aspect of this application, a database data storage optimization device is provided, comprising: a morphological marking module, used to identify the data processing morphology of a target data type in the database based on historical processing information of the database, obtain a morphological recognition result, and use an enumeration type to mark the morphological recognition result to obtain an enumeration marking result; a flag bit acquisition module, used to divide the storage unit of the target data type to obtain a compression-morphological flag bit, and use the enumeration marking result to initialize the compression-morphological flag bit to obtain an initialized compression-morphological flag bit; and a storage optimization module, used to optimize the storage of the target data type based on the data processing scenario of the database, using the initialized compression-morphological flag bit to obtain storage-optimized data.
[0012] A third aspect of this application provides an electronic device comprising: one or more processors; and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the steps of the method described above.
[0013] A fourth aspect of this application also provides a computer-readable storage medium having a computer program or instructions stored thereon, which, when executed by a processor, implement the steps of the above-described method.
[0014] The database data storage optimization method provided in this application classifies and stores different types of data precisely by dividing storage units and adding form flags; it adjusts the storage strategy dynamically according to the actual data processing scenario, which can improve storage space utilization and query performance; and, by combining compression flags with form flags, it can reduce storage space usage, reduce the network transmission and I/O (Input/Output) overhead of the database, and improve the overall performance of the database.
Description of the Drawings
[0015] The above-mentioned contents, other objects, features and advantages of this application will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:
[0016] Figure 1 shows an application scenario diagram of the database data storage optimization method according to an embodiment of this application.
[0017] Figure 2 shows a flowchart of a database data storage optimization method according to an embodiment of this application.
[0018] Figure 3 shows a structural block diagram of a database data storage optimization apparatus according to an embodiment of this application.
[0019] Figure 4 shows a block diagram of an electronic device suitable for implementing a database data storage optimization method according to an embodiment of this application.
Detailed Implementation
[0020] The embodiments of this application will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of this application. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of this application for ease of explanation. However, it will be apparent that one or more embodiments may be implemented without these specific details. Furthermore, descriptions of well-known structures and technologies are omitted in the following description to avoid unnecessarily obscuring the concepts of this application.
[0021] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of this application. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
[0022] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.
[0023] Expressions such as "at least one of A, B and C" should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, a system having B alone, a system having C alone, a system having A and B, a system having A and C, a system having B and C, and/or a system having A, B and C, etc.).
[0024] In existing databases, using the LONGBLOB data type to store large binary files typically leads to decreased database storage efficiency and increased query latency. Furthermore, processing large binary files requires significant memory, making it easy for databases to encounter data processing errors when handling massive amounts of such files. Therefore, it is necessary to provide a storage optimization solution for the LONGBLOB data type to at least address one of the problems in existing technologies.
[0025] This invention provides a database storage optimization method that improves the insertion performance of the LONGBLOB data type by optimizing the LONGBLOB data type, thereby solving problems such as performance degradation in the database.
[0026] Figure 1 shows an application scenario diagram of the database data storage optimization method according to an embodiment of this application.
[0027] As shown in Figure 1, application scenario 100 according to this embodiment may include application scenarios such as databases and big data. Network 104 serves as a medium that provides a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
[0028] Users can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 via the network 104 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social media platform software, etc. (for example only).
[0029] The first terminal device 101, the second terminal device 102, and the third terminal device 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers.
[0030] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103 (this is just an example). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.
[0031] It should be noted that the database data storage optimization method provided in this application embodiment can generally be executed by server 105. Correspondingly, the database data storage optimization device provided in this application embodiment can generally be located in server 105. The database data storage optimization method provided in this application embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or server 105. Correspondingly, the database data storage optimization device provided in this application embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or server 105.
[0032] It should be understood that the number of terminal devices, networks, and servers shown in Figure 1 is merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0033] The database data storage optimization method of the disclosed embodiments is described in detail below, based on the scenario shown in Figure 1 and with reference to the accompanying drawings and specific embodiments.
[0034] Figure 2 shows a flowchart of a database data storage optimization method according to an embodiment of this application.
[0035] As shown in Figure 2, the database data storage optimization method in this embodiment includes operations S210 to S230.
[0036] In operation S210, the data processing form of the target data type in the database is identified based on the historical processing information of the database to obtain a form identification result, and the form identification result is marked with an enumeration type to obtain an enumeration identification result.
[0037] The data processing form of the target data type refers to the type of data object (e.g., high-definition image, audio, video) and the storage format of that data object while data of this type is being processed.
[0038] The aforementioned databases include centralized databases or distributed databases, wherein distributed databases, for example, can be column-based analytical databases.
[0039] The target data type mentioned above is the LONGBLOB data type, which is used to store video, audio, high-definition images, or large PDF (Portable Document Format) files. Those skilled in the art can apply the technical solutions of this application to other large data types in databases based on the disclosure herein.
[0040] In operation S220, the storage unit of the target data type is divided to obtain the compression-format flag bit, and the compression-format flag bit is initialized using the enumeration identification result to obtain the initialized compression-format flag bit.
[0041] Based on the characteristics and storage requirements of the target data type, the data is logically divided into fixed-size blocks (e.g., dividing 8 bits in 1 byte) to form multiple storage units, each containing a unique identifier and basic metadata information.
[0042] In operation S230, based on the data processing scenario of the database, the storage of data of the target data type is optimized using the initialized compression-format flag bit to obtain storage-optimized data.
[0043] The database data storage optimization method provided in this application classifies and stores different types of data precisely by dividing storage units and adding form flags; it adjusts the storage strategy dynamically according to the actual data processing scenario, which can improve storage space utilization and query performance; and, by combining compression flags with form flags, it can reduce storage space usage, reduce the network transmission and I/O (Input/Output) overhead of the database, and improve the overall performance of the database.
[0044] According to embodiments of this application, the above enumeration identification result includes the variable string form, transaction file form, global file form, network file form, temporary file form, stable storage form, optional compression state, format mask, and optional mask of the target data type.
[0045] According to embodiments of this application, the variable string form represents data of the target data type that is smaller than 32KB; the transaction file form represents data of the target data type used to describe database transaction management information; the global file form represents data of the target data type that is larger than 32KB; the network file form represents data of the target data type that is network data; the temporary file form represents data of the target data type that is temporary data used for transmission; and the stable storage form represents data of the target data type that is stably stored in the database; wherein, when data of the target data type is in the temporary file form or the stable storage form, the data is written to a binary format file.
[0046] The following uses the LONGBLOB data type as an example to illustrate how this application modifies the marking of the target data type by adding a compression flag to the form flag.
[0047] LONGBLOB data exists in multiple forms during internal processing, and these forms are marked by the following enumeration values:
[0048] enum longblob_flag{
[0049] LONG_BLOB_DATA=0x01,
[0050] LONG_BLOB_FILE,
[0051] LONG_BLOB_GLOBAL,
[0052] LONG_BLOB_NET,
[0053] LONG_BLOB_TEMP,
[0054] LONG_BLOB_BLOCK
[0055] };
[0056] where:
[0057] LONG_BLOB_DATA: variable string (i.e., variable-length string) form; data smaller than 32KB is stored in data units in Varchar format;
[0058] LONG_BLOB_FILE: transaction file form, a data form that may be used within a transaction;
[0059] LONG_BLOB_GLOBAL: global file form, the materialized form of LONGBLOB data larger than 32KB;
[0060] LONG_BLOB_NET: network file form; this flag is used when the LONGBLOB data type is TYPE_URL;
[0061] LONG_BLOB_TEMP: temporary file form, the transmission form of LONGBLOB data larger than 32KB; it is a temporary file that can be written directly into the BLK file after being sent to the other end;
[0062] LONG_BLOB_BLOCK: stable storage form, the final stable storage form of LONGBLOB data larger than 32KB; only a link to the corresponding BLK file is stored, and when data is read, the corresponding BLK file is accessed through this link.
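As an illustration only, the form classification above can be sketched in C. The helper names `longblob_transfer_form` and `longblob_needs_blk_file` and the constant `LONGBLOB_INLINE_LIMIT` are hypothetical and do not come from this application. Treating data of exactly 32KB as large is also an assumption, since the description only covers sizes below and above 32KB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Form flags, copied from the enumeration in the description. */
enum longblob_flag {
    LONG_BLOB_DATA = 0x01, /* variable string form, data smaller than 32KB */
    LONG_BLOB_FILE,        /* transaction file form */
    LONG_BLOB_GLOBAL,      /* global file form */
    LONG_BLOB_NET,         /* network file form */
    LONG_BLOB_TEMP,        /* temporary file form used for transmission */
    LONG_BLOB_BLOCK        /* stable storage form (link to a BLK file) */
};

/* Hypothetical constant name; the 32KB value comes from the description. */
#define LONGBLOB_INLINE_LIMIT ((size_t)32 * 1024)

/* Hypothetical helper: choose the transmission form from the data size.
 * Assumption: a size of exactly 32KB is handled as a temporary file. */
static enum longblob_flag longblob_transfer_form(size_t size)
{
    return size < LONGBLOB_INLINE_LIMIT ? LONG_BLOB_DATA : LONG_BLOB_TEMP;
}

/* Hypothetical helper: true for the forms that the description says are
 * written to a binary format file (temporary file and stable storage). */
static bool longblob_needs_blk_file(enum longblob_flag form)
{
    assert(form >= LONG_BLOB_DATA && form <= LONG_BLOB_BLOCK);
    return form == LONG_BLOB_TEMP || form == LONG_BLOB_BLOCK;
}
```

Under these assumptions, a 100-byte value is sent as `LONG_BLOB_DATA` and needs no BLK file, while a 40000-byte value is sent as `LONG_BLOB_TEMP` and is written to a BLK file.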
[0063] The embodiments of this application, by classifying data according to its form (such as variable strings, transaction files, global files, etc.), can select the most suitable storage method for different data characteristics, thereby improving overall storage efficiency. Different processing logics are adopted according to different data forms; for example, temporary files and stable storage data need to be written to binary format files, enhancing the system's adaptability and scalability. This solution is suitable for distributed database scenarios, helping to achieve more reasonable data distribution and management in multi-node environments. Special processing of transaction files and stable storage data helps ensure data integrity and consistency in distributed environments. The introduction of optional compression states and format mask mechanisms allows data to be compressed or converted according to actual needs, further saving storage space and improving transmission efficiency. The embodiments of this application, through structured classification and differentiated processing, significantly improve the storage efficiency and management capabilities of LONGBLOB type data in distributed databases.
[0064] According to an embodiment of this application, the above-mentioned initialization of the compression-format flag bit using the enumeration identifier result to obtain the initialized compression-format flag bit includes: initializing the first segment of the compression-format flag bit by performing an AND operation on the optional mask, wherein when the value of the first segment is non-zero, the target data type is in a compressed state; and initializing the second segment of the compression-format flag bit by performing an AND operation on the format mask, wherein the value of the second segment represents the current data format of the target data type.
[0065] To distinguish whether data is compressed, a compression flag is added based on the longblob_flag mentioned above. The implementation scheme is as follows:
[0066] The LONGBLOB data tag (1 byte) is divided into two usable segments:
[0067] Segment A: 1 bit, the LONGBLOB compression flag, which records whether the current LONGBLOB data is compressed;
[0068] Segment B: 4 bits, the LONGBLOB data form, which records the current form of the LONGBLOB data.
[0069] The highest three bits are currently unused, so that the tag byte is not mistaken for a visible character.
[0070] The following enumeration values have been added to longblob_flag:
[0071] LONG_BLOB_FMT_MASK=0x0F (decimal 15),
[0072] LONG_BLOB_OPT_COMP=0x10 (decimal 16),
[0073] LONG_BLOB_OPT_MASK=LONG_BLOB_OPT_COMP;
[0074] where:
[0075] LONG_BLOB_OPT_COMP: indicates that the LONGBLOB data is compressed.
[0076] LONG_BLOB_FMT_MASK and LONG_BLOB_OPT_MASK are used as masks in a bitwise AND operation to retain the flag bits of segment A or segment B. The specific operation is as follows:
[0077] For segment A, performing an AND operation between the flag and LONG_BLOB_OPT_MASK yields the compression state of the LONGBLOB data: a non-zero result indicates compressed data, and a zero result indicates uncompressed data. For segment B, performing an AND operation between the flag and LONG_BLOB_FMT_MASK yields the actual data form.
[0078] For example, if the LONGBLOB flag 0x16 is received, the AND operation for segment A (0x16 AND 0x10) yields 0x10, a non-zero value (that is, the segment A bit is 1), indicating that the current data is compressed; the AND operation for segment B (0x16 AND 0x0F) yields 0x06, indicating that the current data form is LONG_BLOB_BLOCK.
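The mask operations can be sketched in C as follows. This is a minimal illustration, not the implementation of this application: the enumeration values are taken from the description, while the helper names `longblob_make_flag`, `longblob_is_compressed`, and `longblob_form` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form flags plus the mask values listed in the description. */
enum longblob_flag {
    LONG_BLOB_DATA = 0x01,
    LONG_BLOB_FILE,
    LONG_BLOB_GLOBAL,
    LONG_BLOB_NET,
    LONG_BLOB_TEMP,
    LONG_BLOB_BLOCK,
    LONG_BLOB_FMT_MASK = 0x0F,               /* low 4 bits: data form */
    LONG_BLOB_OPT_COMP = 0x10,               /* bit 4: data is compressed */
    LONG_BLOB_OPT_MASK = LONG_BLOB_OPT_COMP
};

/* Hypothetical helper: build the 1-byte tag from a form and a
 * compression state. The highest three bits are left at zero. */
static uint8_t longblob_make_flag(enum longblob_flag form, bool compressed)
{
    assert(form >= LONG_BLOB_DATA && form <= LONG_BLOB_BLOCK);
    uint8_t flag = (uint8_t)(form & LONG_BLOB_FMT_MASK);
    if (compressed) {
        flag |= (uint8_t)LONG_BLOB_OPT_COMP;
    }
    return flag;
}

/* Hypothetical helper: a non-zero AND result means "compressed". */
static bool longblob_is_compressed(uint8_t flag)
{
    return (flag & LONG_BLOB_OPT_MASK) != 0;
}

/* Hypothetical helper: the AND with the format mask yields the form. */
static enum longblob_flag longblob_form(uint8_t flag)
{
    return (enum longblob_flag)(flag & LONG_BLOB_FMT_MASK);
}
```

With these helpers, `longblob_make_flag(LONG_BLOB_BLOCK, true)` produces the tag 0x16 used in the example, and decoding 0x16 reports compressed data in the LONG_BLOB_BLOCK form.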
[0079] The above embodiments of this application divide the compression-format flag into a first segment and a second segment, which are used to represent the compression state and data format, respectively. The structure is clear and easy to manage and parse. The non-zero value of the first segment directly determines whether the target data is in a compressed state, which improves the efficiency and accuracy of state identification. The second segment determines the data format through the AND operation of the format mask, which can accurately reflect the current specific format of the target data type. The storage optimization of the LONGBLOB data type in the distributed database helps to improve the efficiency and performance of large data volume storage.
[0080] According to an embodiment of this application, optimizing the storage of data of the target data type based on the data processing scenario of the database, using the initialized compression-format flag, to obtain storage-optimized data includes: when the data processing scenario is data writing, obtaining the storage path, storage capacity, and data format of the data of the target data type using the initialized compression-format flag; compressing the storage path, storage capacity, and data format using a preset compression algorithm to obtain compressed information; and storing the compressed information in a multi-level binary format file to obtain the storage-optimized data.
[0081] According to an embodiment of this application, optimizing the storage of data of the target data type based on the data processing scenario of the database, using the initialized compression-format flag, to obtain storage-optimized data further includes: when the data processing scenario is data reading, calling the preset compression algorithm to parse the binary format file of the data of the target data type to obtain the storage path, storage capacity, and data format of that data, and using the storage path, storage capacity, and data format to perform data consistency verification on the decompressed data of the target data type.
[0082] According to embodiments of this application, the aforementioned preset compression algorithm includes a standard fast compression algorithm based on entropy coding. A standard fast compression algorithm based on entropy coding is, for example, the ZSTD algorithm.
[0083] New compression and decompression functions have been added for compressing and decompressing LONGBLOB data. The compression algorithm uses the ZSTD algorithm with a level of 5.
[0084] A new function has been added to construct and retrieve compression information, which is used to mark the position and length of LONGBLOB data. The compression information is constructed synchronously when the compression function is called, for use during subsequent decompression.
[0085] After the write operation calls the compression function, it calls the function to construct compression information, concatenating the location and length of the LONGBLOB data to the BLK file path and saving it. The read operation first calls the function to obtain the compression information, which obtains the location and length of the LONGBLOB data concatenated after the BLK file path. Then, based on the relevant information, it calls the decompression function to restore the actual data and complete the original data length verification.
[0086] For example, the final data file information generated using the above scheme is 0-0-BLK.dat_0-19-40000, where 0-0-BLK.dat represents the BLK file name that stores the LONGBLOB data, and 0-19-40000 represents the specific position and length information in the above BLK file: the offset of the data start position in the file - the length of the compressed data - the length of the original data.
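The construction and parsing of this compression information can be sketched in C as follows. The string layout (BLK file name, an underscore, then offset, compressed length, and original length separated by hyphens) follows the example above; the struct `longblob_locator`, the function names, and the 64-byte name buffer are hypothetical, and the sketch performs only minimal input validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed compression information. */
struct longblob_locator {
    char blk_name[64];            /* BLK file name, e.g. "0-0-BLK.dat" */
    unsigned long long offset;    /* start offset of the data in the file */
    unsigned long long comp_len;  /* length of the compressed data */
    unsigned long long orig_len;  /* length of the original data */
};

/* Write "<blk_name>_<offset>-<comp_len>-<orig_len>" into out.
 * Returns 0 on success, -1 if the buffer is too small. */
static int longblob_build_locator(char *out, size_t out_size,
                                  const char *blk_name,
                                  unsigned long long offset,
                                  unsigned long long comp_len,
                                  unsigned long long orig_len)
{
    assert(out != NULL && blk_name != NULL);
    int n = snprintf(out, out_size, "%s_%llu-%llu-%llu",
                     blk_name, offset, comp_len, orig_len);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Split at the last '_' and read the three numbers after it.
 * Returns 0 on success, -1 if the text does not have the expected shape. */
static int longblob_parse_locator(const char *text,
                                  struct longblob_locator *loc)
{
    const char *sep = strrchr(text, '_');
    if (sep == NULL) {
        return -1;
    }
    size_t name_len = (size_t)(sep - text);
    if (name_len == 0 || name_len >= sizeof loc->blk_name) {
        return -1;
    }
    memcpy(loc->blk_name, text, name_len);
    loc->blk_name[name_len] = '\0';

    char trailing; /* only filled if unexpected characters follow */
    int fields = sscanf(sep + 1, "%llu-%llu-%llu%c", &loc->offset,
                        &loc->comp_len, &loc->orig_len, &trailing);
    return fields == 3 ? 0 : -1;
}

/* Length check applied after decompression: the decompressed size must
 * match the original length recorded in the compression information. */
static bool longblob_length_ok(const struct longblob_locator *loc,
                               size_t decompressed_len)
{
    return loc->orig_len == (unsigned long long)decompressed_len;
}
```

Building a locator from `"0-0-BLK.dat"`, offset 0, compressed length 19, and original length 40000 reproduces the string in the example, and parsing that string recovers the same four values.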
[0087] The embodiments described above in this application first employ a compression information construction mechanism. During data writing, the database first compresses the LONGBLOB data and simultaneously constructs compression information. This compression information includes three key elements: the starting offset of the data within the BLK file, the length of the compressed data, and the original data length. This information is concatenated with the BLK file path in a specific format to form a complete data location identifier. Then, during data reading, the database first calls a function to retrieve the compression information, parsing the BLK file path and the additional position and length information from the stored complete identifier. Based on this metadata, the system can accurately locate the specific position of the compressed data within the file and obtain the length information before and after compression, providing necessary parameters for subsequent decompression and data verification. Secondly, through a data verification mechanism, after decompression, the database verifies the original data length stored in the compression information to ensure the integrity of the decompressed data. This dual verification mechanism effectively prevents data corruption or loss that may occur during compression / decompression. The embodiments described above in this application exhibit good adaptability to distributed environments, simplifying the complexity of cross-node data location by binding location information to file paths. Each node only needs to parse the data identifier to obtain complete access information, eliminating the need for additional metadata query operations and improving data access efficiency in a distributed environment. 
At the same time, by compressing and storing large-capacity LONGBLOB data and managing key location information together with file paths, this solution reduces storage space usage, simplifies data management, and improves the overall performance of distributed databases in processing large object data. Furthermore, the embodiments of this application store LONGBLOB data in compressed form, which reduces network transmission and I/O overhead and significantly improves performance. Considering the characteristics of LONGBLOB data, the compression algorithm uses the ZSTD algorithm at level 5, which increases speed without significantly sacrificing compression effectiveness and is therefore better suited to LONGBLOB data storage scenarios.
[0088] Figure 3 shows a structural block diagram of a database data storage optimization apparatus according to an embodiment of this application.
[0089] As shown in Figure 3, the database data storage optimization device 300 of this embodiment includes a morphological marking module 310, a flag bit acquisition module 320, and a storage optimization module 330.
[0090] The morphology marking module 310 is used to identify the data processing morphology of the target data type in the database based on the historical processing information of the database, obtain the morphology recognition result, and use the enumeration type to mark the morphology recognition result to obtain the enumeration marking result; in one embodiment, the morphology marking module 310 can be used to perform the operation S210 described above, which will not be repeated here.
[0091] The flag acquisition module 320 is used to divide the storage unit of the target data type to obtain the compression-morphology flag, and to initialize the compression-morphology flag using the enumeration marking result to obtain the initialized compression-morphology flag. In one embodiment, the flag acquisition module 320 can be used to perform the operation S220 described above, which will not be repeated here.
[0092] The storage optimization module 330 is used to optimize, based on the data processing scenario of the database, the storage of data of the target data type using the initialized compression-morphology flag, thereby obtaining storage-optimized data. In one embodiment, the storage optimization module 330 can be used to perform the operation S230 described above, which will not be repeated here.
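One way the flag handling performed by modules 310 and 320 might look is sketched below. The source names the enumeration members (the six forms, the optional compression state, the format mask and the optional mask) and the 32KB boundary, but gives no numeric encodings. Every value here, the one-byte flag split into two 4-bit segments, the treatment of data of exactly 32KB, and the names LongBlobFlag, make_flag, is_compressed, form_of and initial_form are assumptions made for illustration.

```python
from enum import IntEnum

THRESHOLD = 32 * 1024  # 32KB boundary between variable-string and file forms


class LongBlobFlag(IntEnum):
    # All numeric values are hypothetical; the source lists only the names.
    VAR_STRING = 0x01      # data smaller than 32KB
    TRANS_FILE = 0x02      # transaction management information
    GLOBAL_FILE = 0x03     # data larger than 32KB
    NET_FILE = 0x04        # network data
    TEMP_FILE = 0x05       # temporary data used for transmission
    STABLE_STORAGE = 0x06  # data stored stably in the database
    FORMAT_MASK = 0x0F     # selects the segment holding the current form
    COMPRESSED = 0x10      # optional compression state
    OPTIONAL_MASK = 0xF0   # selects the segment holding optional bits


def make_flag(form: LongBlobFlag, compressed: bool) -> int:
    """Build a one-byte flag from a form and a compression state."""
    flag = int(form) & LongBlobFlag.FORMAT_MASK
    if compressed:
        flag |= LongBlobFlag.COMPRESSED
    return flag


def is_compressed(flag: int) -> bool:
    """Bitwise AND with the optional mask; non-zero means compressed."""
    return (flag & LongBlobFlag.OPTIONAL_MASK) != 0


def form_of(flag: int) -> LongBlobFlag:
    """Bitwise AND with the format mask yields the current data form."""
    return LongBlobFlag(flag & LongBlobFlag.FORMAT_MASK)


def initial_form(size: int) -> LongBlobFlag:
    """Pick a form from data size alone; exactly 32KB is treated as a file here."""
    return LongBlobFlag.VAR_STRING if size < THRESHOLD else LongBlobFlag.GLOBAL_FILE
```

Keeping the form and the optional bits in separate segments means either can be read with a single AND, and new optional bits can be added later without renumbering the forms.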
[0093] According to embodiments of this application, any plurality of modules among the morphology marking module 310, flag acquisition module 320, and storage optimization module 330 can be merged into one module, or any one of these modules can be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules can be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of this application, at least one of the morphology marking module 310, flag acquisition module 320, and storage optimization module 330 can be at least partially implemented as hardware circuitry, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-package, an application-specific integrated circuit (ASIC), or any other reasonable means of integrating or packaging circuitry, or implemented in software, hardware, or firmware, or in any appropriate combination of any of these three implementation methods. Alternatively, at least one of the morphology marking module 310, flag acquisition module 320, and storage optimization module 330 can be at least partially implemented as a computer program module, which, when run, can perform corresponding functions.
[0094] Figure 4 shows a block diagram of an electronic device suitable for implementing a database data storage optimization method according to an embodiment of this application.
[0095] As shown in Figure 4, an electronic device 400 according to an embodiment of this application includes a processor 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The processor 401 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 401 may also include onboard memory for caching purposes. The processor 401 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of this application.
[0096] RAM 403 stores various programs and data required for the operation of electronic device 400. Processor 401, ROM 402, and RAM 403 are interconnected via bus 404. Processor 401 executes various operations of the method flow according to embodiments of this application by executing programs in ROM 402 and / or RAM 403. It should be noted that the programs may also be stored in one or more memories other than ROM 402 and RAM 403. Processor 401 may also execute various operations of the method flow according to embodiments of this application by executing programs stored in said one or more memories.
[0097] According to embodiments of this application, the electronic device 400 may further include an input / output (I / O) interface 405, which is also connected to a bus 404. The electronic device 400 may also include one or more of the following components connected to the input / output (I / O) interface 405: an input section 406 including a keyboard, mouse, etc.; an output section 407 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 408 including a hard disk, etc.; and a communication section 409 including a network interface card such as a LAN card, modem, etc. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the input / output (I / O) interface 405 as needed. A removable medium 411, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 410 as needed so that computer programs read from it can be installed into the storage section 408 as needed.
[0098] This application also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs, which, when executed, implement the method according to the embodiments of this application.
[0099] According to embodiments of this application, the computer-readable storage medium can be a non-volatile computer-readable storage medium, such as including but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, the computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this application, the computer-readable storage medium may include ROM 402 and / or RAM 403 and / or one or more memories other than ROM 402 and RAM 403 described above.
[0100] Embodiments of this application also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code enables the computer system to implement the database data storage optimization method provided in the embodiments of this application.
[0101] When the computer program is executed by the processor 401, it performs the functions defined in the system / apparatus of this application embodiment. According to the embodiments of this application, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0102] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and downloaded and installed via communication section 409, and / or installed from removable medium 411. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.
[0103] In such an embodiment, the computer program can be downloaded and installed from a network via communication section 409, and / or installed from removable medium 411. When the computer program is executed by processor 401, it performs the functions defined in the system of this application embodiment. According to embodiments of this application, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0104] According to embodiments of this application, program code for executing the computer programs provided in the embodiments of this application can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code can be executed entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0105] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0106] Those skilled in the art will understand that the features described in the various embodiments of this application can be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in this application. In particular, the features described in the various embodiments of this application can be combined and/or integrated in various ways without departing from the spirit and teachings of this application. All such combinations and/or integrations fall within the scope of this application.
[0107] The embodiments of this application have been described above. However, these embodiments are merely illustrative and not intended to limit the scope of this application. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. Without departing from the scope of this application, those skilled in the art can make various substitutions and modifications, all of which should fall within the scope of this application.
Claims
1. A database data storage optimization method, characterized in that it comprises: identifying, based on historical processing information of the database, the data processing form of a target data type in the database to obtain a form identification result, and marking the form identification result with an enumeration type to obtain an enumeration identification result; dividing the storage unit of the target data type to obtain a compression-format flag bit, and initializing the compression-format flag bit using the enumeration identification result to obtain an initialized compression-format flag bit; and optimizing, based on the data processing scenario of the database, the storage of data of the target data type using the initialized compression-format flag bit to obtain storage-optimized data.
2. The method according to claim 1, characterized in that the enumeration identification result includes the variable string form, transaction file form, global file form, network file form, temporary file form, stable storage form, optional compression state, format mask, and optional mask of the target data type.
3. The method according to claim 2, characterized in that the variable string form represents data of the target data type with a data size of less than 32KB; the transaction file form represents data of the target data type used to describe transaction management information of the database; the global file form represents data of the target data type with a data size of more than 32KB; the network file form represents data of the target data type that is network data; the temporary file form represents data of the target data type that is temporary data used for transmission; and the stable storage form represents data of the target data type that is stably stored in the database; wherein, if the target data type is in the temporary file form or the stable storage form, the data of the target data type is written into a binary format file.
4. The method according to claim 3, characterized in that initializing the compression-format flag bit using the enumeration identification result to obtain the initialized compression-format flag bit comprises: initializing a first segment of the compression-format flag bit by performing a bitwise AND operation with the optional mask, wherein the target data type is in a compressed state when the value of the first segment is non-zero; and initializing a second segment of the compression-format flag bit by performing a bitwise AND operation with the format mask, wherein the value of the second segment represents the current data format of the target data type.
5. The method according to claim 1, characterized in that optimizing, based on the data processing scenario of the database, the storage of data of the target data type using the initialized compression-format flag bit to obtain storage-optimized data comprises: when the data processing scenario is data writing, obtaining the storage path, storage capacity and data format of the target data type using the initialized compression-format flag bit; compressing the storage path, storage capacity and data format using a preset compression algorithm to obtain compression information; and storing the compression information in multiple levels in the form of binary format files to obtain the storage-optimized data.
6. The method according to claim 5, characterized in that it further comprises: when the data processing scenario is data reading, invoking the preset compression algorithm to parse the binary format file of the target data type to obtain the storage path, storage capacity and data format of the target data type; and performing data consistency verification on the decompressed data of the target data type using the storage path, storage capacity and data format.
7. The method according to claim 5 or 6, characterized in that the preset compression algorithm includes a standard fast compression algorithm based on entropy coding.
8. A database data storage optimization device, characterized in that it comprises: a morphology marking module, used to identify the data processing morphology of the target data type in the database based on the historical processing information of the database, obtain the morphology recognition result, and use an enumeration type to mark the morphology recognition result to obtain the enumeration marking result; a flag acquisition module, used to divide the storage unit of the target data type to obtain the compression-format flag bit, and to initialize the compression-format flag bit using the enumeration marking result to obtain the initialized compression-format flag bit; and a storage optimization module, used to optimize, based on the data processing scenario of the database, the storage of data of the target data type using the initialized compression-format flag bit, so as to obtain the storage-optimized data.
9. An electronic device, comprising: one or more processors; and a memory used to store one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program or instructions stored thereon, characterized in that, when the computer program or instructions are executed by a processor, they implement the steps of the method according to any one of claims 1 to 7.