Array compression processing method, device, equipment and storage medium
By dividing memory address data into offset and location data, a compressed array is generated and stored in a list, which solves the problem of excessive storage space consumption in large memory environments and improves analysis efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- APOLLO INTELLIGENT CONNECTIVITY (BEIJING) TECH CO LTD
- Filing Date
- 2023-03-09
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional memory image generation and analysis methods are inefficient in large memory environments, failing to meet time and cost requirements, and affecting analysis efficiency and problem localization.
By dividing memory address data into offsets and location data, using offsets to represent the same parts and location data to represent different parts, a compressed array is generated and stored in a compressed array list.
It significantly reduces the storage space occupied by the memory address array while maintaining indexing and memory addressing efficiency, thus improving memory analysis efficiency.
Description
TECHNICAL FIELD
[0001] The present disclosure relates to the field of computer technology, in particular to cloud computing and big data, and more particularly to an array compression processing method, apparatus, device, and storage medium.
BACKGROUND
[0002] With the development of cloud computing and big data technology, the memory of a single server keeps growing (for example, a 64-bit operating system can address at most 2^64 bytes = 16,777,216 TB), reaching several hundred gigabytes or even several terabytes, so that memory images occupy too much storage space.
[0003] However, generating a memory image with traditional core dump technology and then downloading it to a local machine cannot meet time and cost requirements. For example, transferring a 200 GB memory image takes from one hour to several hours, not counting the time needed to analyze the memory, which greatly reduces the efficiency of analysis and of locating and resolving problems.
SUMMARY
[0004] The present disclosure provides an array compression processing method, device, equipment and storage medium.
[0005] According to a first aspect of the present disclosure, an array compression processing method is provided, the method comprising:
[0006] obtaining a plurality of memory address data contained in a to-be-compressed array;
[0007] dividing, according to the plurality of memory address data, the to-be-compressed array into an offset and a plurality of positioning data respectively corresponding to the plurality of memory address data, wherein the offset is used to represent the same part between the plurality of memory address data, and the positioning data is used to represent the different part between the plurality of memory address data;
[0008] obtaining a compressed array based on the positioning data respectively corresponding to the plurality of memory address data;
[0009] obtaining a compressed array list based on the offset and the compressed array.
[0010] Further, according to the plurality of memory address data, the to-be-compressed array is divided into an offset and a plurality of positioning data respectively corresponding to the plurality of memory address data, comprising:
[0011] According to the plurality of memory address data, the offset between the plurality of memory address data is determined;
[0012] The offset is compared with the plurality of memory address data respectively to obtain the positioning data respectively corresponding to the plurality of memory address data.
[0013] Further, determining the offset between the multiple memory address data based on the multiple memory address data includes:
[0014] Obtain the same bytes among multiple memory address data;
[0015] The offset is obtained based on the same bytes among the multiple memory address data.
[0016] Further, the offset is compared with multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data, including:
[0017] The offset is compared with the multiple memory address data respectively to obtain the different bytes among the multiple memory address data;
[0018] Based on the different bytes among the multiple memory address data, the location data corresponding to each of the multiple memory address data is obtained.
[0019] Furthermore, before obtaining the multiple memory address data contained in the array to be compressed, the method further includes:
[0020] Obtain an array to be sharded, wherein the amount of memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed;
[0021] Shard the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed;
[0022] Store the sharded array in the array to be compressed, so that after the array to be compressed is full, the multiple memory address data contained in the array to be compressed can be obtained.
[0023] Furthermore, after obtaining the list of compressed arrays based on the offset and the compressed array, the method further includes:
[0024] Clear the array to be compressed so that the cleared array can continue to receive new memory address data.
[0025] Furthermore, the index values of the compressed array and the index values of the array to be compressed are the same. After obtaining the list of compressed arrays based on the offset and the compressed array, the method further includes:
[0026] In response to a memory address query request, obtain the index value carried in the memory address query request;
[0027] The target compressed array is determined by querying the compressed array list based on the index value.
[0028] The target compressed array is queried according to the index value to determine the location data in the target compressed array corresponding to the index value;
[0029] The array before compression is obtained based on the positioning data and the offset.
[0030] According to a second aspect of this disclosure, an array compression processing apparatus is provided, the apparatus comprising:
[0031] The data acquisition unit is used to acquire multiple memory address data contained in the array to be compressed;
[0032] A partitioning unit is used to divide the array to be compressed into offsets and positioning data corresponding to each of the multiple memory address data according to the multiple memory address data, wherein the offsets are used to represent the common parts among the multiple memory address data, and the positioning data are used to represent the different parts among the multiple memory address data;
[0033] The first processing unit is used to obtain a compressed array based on the location data corresponding to each of the multiple memory address data;
[0034] The second processing unit is used to obtain a list of compressed arrays based on the offset and the compressed array.
[0035] Further, the partitioning unit includes:
[0036] The determining module is configured to determine the offset between the multiple memory address data based on the multiple memory address data.
[0037] The comparison module is used to compare the offset with multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data.
[0038] Furthermore, the determining module is also used for:
[0039] Obtain the common bytes among the multiple memory address data; obtain the offset based on the common bytes among the multiple memory address data.
[0040] Furthermore, the comparison module is also used for:
[0041] The offset is compared with multiple memory address data to obtain the different bytes among the multiple memory address data; based on the different bytes among the multiple memory address data, the positioning data corresponding to each of the multiple memory address data is obtained.
[0042] Furthermore, the device also includes:
[0043] An array acquisition unit is used to acquire an array to be sharded, wherein the amount of memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed.
[0044] The sharding processing unit is used to shard the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed.
[0045] A storage unit is used to store the sharded array into the array to be compressed, so that after the array to be compressed is full, the multiple memory address data contained in the array to be compressed can be obtained.
[0046] Furthermore, the index values of the compressed array are consistent with the index values of the array to be compressed, and the device further includes:
[0047] A response unit is used to respond to a memory address query request and obtain the index value carried in the memory address query request;
[0048] The first determining unit is configured to query the compressed array list based on the index value to determine the target compressed array;
[0049] The second determining unit is used to query the target compressed array according to the index value to determine the positioning data in the target compressed array corresponding to the index value;
[0050] The third processing unit is used to obtain the uncompressed array based on the positioning data and the offset.
[0051] According to a third aspect of this disclosure, an electronic device is provided, comprising:
[0052] At least one processor; and
[0053] A memory communicatively connected to the at least one processor; wherein,
[0054] The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform any of the methods described.
[0055] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are configured to cause a computer to perform the method according to the first aspect.
[0056] According to a fifth aspect of this disclosure, a computer program product is provided, the computer program product comprising: a computer program stored in a readable storage medium, wherein at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the electronic device to perform the method described in the first aspect.
[0057] According to the technology disclosed herein, the following steps are performed: First, multiple memory address data contained in the array to be compressed are obtained. Then, based on the multiple memory address data, the array to be compressed is divided into offsets and corresponding location data for each of the multiple memory address data. The offsets are used to represent the common parts among the multiple memory address data, and the location data are used to represent the different parts among the multiple memory address data. Based on the location data corresponding to each of the multiple memory address data, a compressed array is obtained. Finally, based on the offsets and the compressed array, a compressed array list is obtained.
[0058] This disclosure uses a common portion (e.g., the same byte) among multiple memory address data as an offset and different portions (e.g., different bytes) as positioning data for indexing. The positioning data corresponding to each memory address data is then stored in a compressed array. The offset and the compressed array are then stored in a compressed array list. Therefore, this embodiment of the disclosure can achieve compressed storage of memory address data, significantly reducing the storage space occupied by the memory address array, while not affecting the efficiency of indexing and memory addressing, thereby improving memory analysis efficiency.
[0059] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
DESCRIPTION OF THE DRAWINGS
[0060] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure. Wherein:
[0061] Figure 1 is a flowchart of an array compression processing method provided according to an embodiment of the present disclosure;
[0062] Figure 2 is a flowchart of an optional array compression processing method provided according to an embodiment of the present disclosure;
[0063] Figure 3 is a flowchart of an optional array compression processing method provided according to an embodiment of the present disclosure;
[0064] Figure 4 is a schematic diagram of a scenario in which an optional array compression processing method can be implemented according to the embodiments of this disclosure;
[0065] Figure 5 is a flowchart of an optional array compression processing method provided according to an embodiment of the present disclosure;
[0066] Figure 6 is a schematic diagram of the framework of an array compression processing apparatus provided according to an embodiment of the present disclosure;
[0067] Figure 7 is a schematic diagram of the framework of an optional array compression processing apparatus provided according to an embodiment of the present disclosure;
[0068] Figure 8 is a block diagram of an electronic device used to implement an array compression processing method according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0069] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.
[0070] First, the terms used in this application are explained:
[0071] A standalone server is a single server that provides all the functions, and all client connections are made on a single server.
[0072] A core dump, sometimes jokingly referred to as "spitting out the kernel" in Chinese, is a disk file created by the operating system when a process terminates due to certain signals. It writes the contents of the process's address space and other information about the process's state. This information is often used for debugging. When a process is about to terminate abnormally, you can choose to save all of the process's user-space memory data to disk; the filename is usually "core," and this is called a core dump.
[0073] Memory image: A backup of the memory data used by an application during runtime in an operating system, stored in a local file.
[0074] The Java Development Kit (JDK) is the software development kit for the Java language; it bundles the Java runtime environment and Java development tools.
[0075] jhat is a Java Virtual Machine heap dump snapshot analysis tool that ships with the JDK.
[0076] Hadoop Distributed File System, or HDFS for short, is a distributed file system that typically contains a metadata node (namenode) and data nodes (datanodes). The namenode mainly stores file metadata information. If the namenode fails, the entire HDFS becomes unusable.
[0077] Currently, jhat, the Java Virtual Machine heap dump snapshot analysis tool included in the Java Development Kit (JDK), can analyze Java memory usage and provide thread stack information. However, because jhat loads the memory image directly into memory and decompresses some compressed data, it cannot handle programs that use large amounts of memory. As a result, jhat has limited or no ability to analyze memory images larger than 4GB.
[0078] Another method uses the program debugging tool GDB to analyze memory images. However, GDB needs to load part or all of the memory image to reconstruct the program's execution, and the analysis is purely manual, so detailed memory analysis with GDB is impractical.
[0079] To address the aforementioned issues, this disclosure provides an array compression processing method, apparatus, device, and storage medium, applicable to the fields of cloud computing and big data technologies. This array compression processing method is used for online analysis on a target server, and under normal circumstances, memory compression can achieve a compression rate of over 60%, while not affecting the efficiency of indexing and memory addressing.
[0080] Example 1
[0081] Figure 1 is a flowchart of an array compression processing method provided according to embodiments of the present disclosure. As shown in Figure 1, this disclosure provides an array compression processing method, which includes the following steps:
[0082] S101, obtain multiple memory address data contained in the array to be compressed.
[0083] S102, based on the multiple memory address data, divide the array to be compressed into an offset and positioning data corresponding to each of the multiple memory address data.
[0084] The offset is used to characterize the common parts among the multiple memory address data, and the positioning data is used to characterize the different parts among the multiple memory address data.
[0085] S103, obtain a compressed array based on the positioning data corresponding to each of the multiple memory address data.
[0086] S104, obtain a compressed array list based on the offset and the compressed array.
[0087] In this example, the array to be compressed can be an array containing multiple memory address data, such as an array that is already full of memory address data. Since the array that is already full of memory address data will occupy too much storage space, the array compression method of this embodiment can be used to compress the array to reduce the storage space occupied by the memory address array.
[0088] Optionally, the multiple memory address data can be all or part of the array to be compressed, and the memory address data can be the memory address of an object.
[0089] In one example, the memory of a typical physical server (i.e., a single server) is generally between 4GB and several terabytes. In some applications, most objects (typed variables or instances that occupy a certain amount of memory after allocation) are very small, a few bytes or tens of bytes and usually under a few kilobytes, while the address range from 0x0000 to 0xffff alone spans 65,536 bytes (64 KB).
[0090] In one example, on a 64-bit operating system, a memory address occupies 64 bits (8 bytes) of storage space. For instance, the range from 0xffffffffffff0000 to 0xffffffffffffffff, which varies only in the low two bytes, can hold an object of up to 64 KB.
[0091] As shown in Figure 2, 0xffffffffff000111-0xffffffffff000631 is an array to be compressed consisting of multiple memory address data. Each memory address data in this array occupies 8 bytes, but 6 of those bytes are the same across entries (e.g., the offset shown in Figure 2 is 0xffffffffff000). Therefore, when saving an object's memory address, the address can be compressed directly: the common part among the multiple memory address data (the same data, such as the same bytes) is used as the offset, and the remaining different parts (such as the different bytes) are used as the positioning data. Each positioning data occupies only two bytes in the compressed array, so each memory address data takes 6 fewer bytes in the compressed array.
[0092] Then, the location data corresponding to each of the multiple memory address data is stored into the compressed array, so that the compressed array list is formed based on the offset and multiple compressed arrays (e.g., compressed array 1, compressed array 2, compressed array 3, etc.).
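The split described for Figure 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of this disclosure: it assumes the common part falls on a whole-byte boundary (the high 6 bytes) and that the positioning data is the low 2 bytes. The names `compress`, `SUFFIX_BYTES`, and the middle address in the sample array are hypothetical.

```python
from typing import List, Tuple

SUFFIX_BYTES = 2  # assumed width of each positioning datum (low 2 bytes)


def compress(addresses: List[int]) -> Tuple[int, List[int]]:
    """Split 64-bit addresses into one shared offset and per-address positioning data."""
    if not addresses:
        raise ValueError("the array to be compressed is empty")
    suffix_bits = SUFFIX_BYTES * 8
    mask = (1 << suffix_bits) - 1
    offset = addresses[0] >> suffix_bits  # shared high 6 bytes
    positioning = []
    for addr in addresses:
        if addr >> suffix_bits != offset:
            raise ValueError(f"{addr:#x} does not share the offset {offset:#x}")
        positioning.append(addr & mask)  # differing low 2 bytes
    return offset, positioning


# First and last addresses come from the Figure 2 range; the middle one is made up.
to_be_compressed = [0xFFFFFFFFFF000111, 0xFFFFFFFFFF000321, 0xFFFFFFFFFF000631]
offset, compressed_array = compress(to_be_compressed)
compressed_array_list = [(offset, compressed_array)]  # offset stored next to its compressed array
```

Because positions in `compressed_array` match positions in `to_be_compressed`, an index into the original array is still valid after compression.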
[0093] In this embodiment, the common parts (same data, such as the same byte) between multiple memory address data are used as an offset, and the different parts (different data, such as different bytes) between multiple memory address data are used as positioning data for indexing. The positioning data corresponding to each of the multiple memory address data are then stored to obtain a compressed array. The offset and the compressed array are then stored in a compressed array list. Therefore, this embodiment can achieve compressed storage of memory address data, greatly reducing the storage space occupied by the memory address array, while not affecting the efficiency of indexing and memory addressing, thereby improving memory analysis efficiency.
[0094] In another example, the memory address array compression method provided in this disclosure can be used for online analysis on a target server. Generally, memory compression can achieve a compression rate of over 60% without affecting indexing and memory addressing efficiency. For example, if the original memory address array (i.e., the array to be compressed) occupies 16*8 bits of space and the compressed array occupies only 5*8 bits, the compressed array is 5 / 16 = 31.25% of the original size, that is, a space saving of 68.75%.
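The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative only; the figures 16*8 and 5*8 bits are taken from the paragraph above.

```python
original_bits = 16 * 8    # size of the array to be compressed
compressed_bits = 5 * 8   # size of the compressed array

ratio = compressed_bits / original_bits  # fraction of the original size that remains
saving = 1 - ratio                       # fraction of storage space saved

print(f"compressed/original = {ratio:.2%}, space saved = {saving:.2%}")
```

Read this way, the 31.25% figure is the remaining size, and the corresponding saving is what matches the "over 60%" claim.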
[0095] Example 2
[0096] According to one or more embodiments of this disclosure, based on multiple memory address data, the array to be compressed is divided into offsets and positioning data corresponding to each of the multiple memory address data, including:
[0097] S201, determine the offset between multiple memory address data based on multiple memory address data.
[0098] S202, compare the offset with multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data.
[0099] In the above embodiments, each memory address data includes: location data and offset corresponding to the memory address data.
[0100] As an optional implementation, for an array to be compressed that holds 64-bit memory address data, the offset among the multiple memory address data, that is, their common bytes, can be calculated first. As shown in Figure 3, after the offset (0xffffffffff000) is calculated, it is compared with every memory address data in the array to be compressed to obtain the bytes in which each memory address data differs from the others. These differing bytes are used as the positioning data corresponding to each memory address data. Finally, the compressed array is obtained from the multiple positioning data.
[0101] Since the location data is used to characterize the different bytes between multiple memory address data, the index value of the compressed array is consistent with the index value of the array to be compressed. In this embodiment of the disclosure, the above compression processing method can preserve the fast index lookup performance of the array while ensuring a certain compression ratio.
[0102] According to one or more embodiments of this disclosure, determining the offset between multiple memory address data based on multiple memory address data includes:
[0103] S301, obtain the common bytes among the multiple memory address data.
[0104] S302, obtain the offset based on the common bytes among the multiple memory address data.
[0105] Still as shown in Figure 2 or Figure 3, the common bytes among the multiple memory address data are obtained and used as the offset of the multiple memory address data (0xffffffffff000); this can also be regarded as the offset shared by the multiple positioning data in the compressed array.
[0106] In the same memory address array, the identical bytes among multiple memory address data are obviously more important than the different bytes among multiple memory address data for memory analysis, indexing, and memory addressing. However, these identical bytes occupy too much storage space. Therefore, the embodiments of this disclosure can be used to extract the identical bytes among multiple memory address data and use them as offsets. Each compressed array has a corresponding offset, which allows the compressed array to be restored to the array to be compressed. Furthermore, the different bytes among multiple memory address data are extracted and used as the content to be included in the compressed array.
[0107] In this embodiment of the disclosure, the identical bytes in the array to be compressed are condensed into an offset, and the different parts are stored as the main content, which can be used for indexing and memory addressing.
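One way to picture steps S301-S302 is as a search for the longest run of leading bytes shared by every address. The Python sketch below assumes big-endian byte order and whole-byte granularity; neither is specified by this disclosure, and the function name `common_prefix_bytes` is hypothetical.

```python
from typing import List

ADDRESS_BYTES = 8  # 64-bit memory address


def common_prefix_bytes(addresses: List[int]) -> bytes:
    """Return the leading bytes shared by every address (big-endian order)."""
    if not addresses:
        raise ValueError("no memory address data given")
    encoded = [a.to_bytes(ADDRESS_BYTES, "big") for a in addresses]
    prefix = encoded[0]
    for item in encoded[1:]:
        n = 0
        # Advance while the candidate prefix still matches this address.
        while n < len(prefix) and prefix[n] == item[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


offset_bytes = common_prefix_bytes([0xFFFFFFFFFF000111, 0xFFFFFFFFFF000631])
```

For the two endpoint addresses of Figure 2, the shared prefix found this way is six bytes long, which agrees with the "6 bytes" stated for that example.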
[0108] According to one or more embodiments of this disclosure, the offset is compared with multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data, including:
[0109] S401, compare the offset with multiple memory address data respectively to obtain the different bytes among the multiple memory address data.
[0110] S402, based on the different bytes among the above memory address data, obtain the location data corresponding to each of the multiple memory address data.
[0111] In one example, after extracting the common bytes between multiple memory address data as offsets, the offsets can be compared with the multiple memory address data separately to obtain the non-common bytes between each memory address data and other memory address data. These non-common bytes are then used as the location data corresponding to each memory address data.
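Steps S401-S402 can be illustrated by stripping a known offset from each address and keeping what remains. The sketch below assumes the offset is a whole-byte, big-endian prefix; `positioning_data` is a hypothetical helper name, not one used by this disclosure.

```python
from typing import List

ADDRESS_BYTES = 8  # 64-bit memory address


def positioning_data(offset: bytes, addresses: List[int]) -> List[bytes]:
    """Compare each address with the offset and keep only the non-common bytes."""
    result = []
    for addr in addresses:
        raw = addr.to_bytes(ADDRESS_BYTES, "big")
        if not raw.startswith(offset):
            raise ValueError(f"{addr:#x} does not start with offset {offset.hex()}")
        result.append(raw[len(offset):])  # bytes after the common part
    return result


offset_example = bytes.fromhex("ffffffffff00")  # 6 common bytes, assumed known
compressed_example = positioning_data(
    offset_example, [0xFFFFFFFFFF000111, 0xFFFFFFFFFF000631]
)
```

Each entry of `compressed_example` is two bytes long, and entry i corresponds to address i of the input, so the index values are preserved.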
[0112] Therefore, the embodiments of this disclosure can achieve compressed storage of memory address data, greatly reducing the storage space occupied by the memory address array, while retaining the location data corresponding to each memory address data, without affecting the efficiency of indexing and memory addressing.
[0113] Example 3
[0114] In one example, in big data and recommendation applications, where a single application can consume a large amount of memory (from 100 GB to several terabytes), memory information is analyzed primarily on the target server and is rarely downloaded for analysis. For example, as shown in Figure 4, in an HDFS big data file system, a Namenode application with 200 GB of memory stores the file system's metadata. To analyze the application's memory structure and storage usage, a memory image file can be generated with jmap and saved locally on the target server. The memory image parsing module of the target server can then use this file to analyze the application's object structure, such as object version, signature, size, type, and distribution. This allocation information reveals how memory is used. In turn, it can be used to optimize the memory allocator, for example by determining the block size and the granularity of each block.
[0115] In one example, when a running application creates an object, it first requests memory of the corresponding size from the memory allocator. The memory allocator usually requests a large contiguous region (e.g., 500 MB) from the system and divides it into smaller blocks (e.g., 1 KB each, so 500 MB yields 500 × 1024 blocks). When an object is created (say, 100 bytes), the memory allocator reserves 100 bytes within one of the 1 KB memory blocks.
[0116] However, if an object is 4 KB, a single block is not enough, and several blocks must be joined before being handed to the requester. If no run of contiguous blocks forms a 4 KB space, the allocator cannot allocate the memory and the program fails, even if the non-contiguous free space exceeds 4 KB in total. This is memory fragmentation, which leaves memory unusable and lowers memory utilization. Knowing the object size distribution, that is, roughly knowing the memory requirements of objects, allows memory block sizes and memory strategies (such as memory reclamation) to be configured in advance.
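The fragmentation scenario above can be reproduced with a toy model of a fixed-size block allocator. This Python sketch is a deliberately simplified illustration, not a description of any real allocator: the free map is a plain list of booleans, and `find_contiguous` and the sample layouts are hypothetical.

```python
from typing import List, Optional

BLOCK_SIZE = 1024  # 1 KB blocks, as in the example above


def find_contiguous(free: List[bool], size: int) -> Optional[int]:
    """Return the index of the first run of free blocks that can hold `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    needed = -(-size // BLOCK_SIZE)  # ceiling division: blocks required
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == needed:
            return i - needed + 1
    return None  # no contiguous run is long enough


# Eight 1 KB blocks with every other block in use: 4 KB free in total, never contiguous.
fragmented = [True, False, True, False, True, False, True, False]
# The same amount of free space, but in one contiguous run.
compact = [False, False, True, True, True, True, False, False]

small_ok = find_contiguous(fragmented, 100)      # a 100-byte object fits in one block
large_fails = find_contiguous(fragmented, 4096)  # a 4 KB object finds no run of 4 blocks
large_ok = find_contiguous(compact, 4096)        # the same request succeeds when contiguous
```

Both layouts have exactly 4 KB free, yet only the compact one can serve the 4 KB request, which is the situation the paragraph describes.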
[0117] Large-scale applications such as big data or recommendation systems consume enormous amounts of memory, with hundreds of millions or even billions of objects or variables. Storing all of their memory addresses in a single array is impractical, because an array occupies contiguous memory and would become excessively large. Array sharding can therefore be used to break the large memory address array into smaller arrays to be compressed. This avoids occupying one large contiguous region of memory and allows a larger offset (a longer common part) during compression, leading to a higher compression ratio.
[0118] According to one or more embodiments of this disclosure, before obtaining the multiple memory address data contained in the array to be compressed, the above method further includes:
[0119] S501, obtain an array to be sharded, where the amount of memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed.
[0120] S502, shard the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed.
[0121] S503, use the sharded array as the array to be compressed, so that after the array to be compressed is full, the multiple memory address data contained in it can be obtained.
[0122] In one example, as shown in Figure 5, the memory image parsing module of the target server can check whether an array containing memory address data is an array to be sharded. If the amount of memory address data in a memory address array exceeds the array capacity limit of the array to be compressed, that memory address array is an array to be sharded. In this embodiment of the disclosure, an array sharder can be used to shard it into sharded arrays that meet the array capacity limit of the array to be compressed.
[0123] In one embodiment, after the sharded array is used as the array to be compressed, it is determined whether the array to be compressed is full. If it is full, the multiple memory address data contained in it are obtained and the array compression processing method provided in the above embodiments, such as steps S101-S104, is executed.
[0124] In another embodiment, after determining that the array to be compressed is not full, the system continues to wait for new memory address data to be input until the array to be compressed is full, and then executes the array compression processing method provided in steps S101-S104 above.
[0125] In this embodiment of the present disclosure, a large array of data to be compressed can be processed into individual arrays to be compressed, thus avoiding the use of contiguous memory space. At the same time, the offset during compression processing can be larger, resulting in a higher compression ratio.
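The sharding in steps S501-S503 amounts to cutting an oversized array into pieces no larger than the capacity limit. A minimal Python sketch follows; the function name `shard` and the capacity of 4 are illustrative assumptions, since this disclosure does not state a concrete capacity.

```python
from typing import Iterator, List


def shard(addresses: List[int], capacity: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `addresses`, each holding at most `capacity` items."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    for start in range(0, len(addresses), capacity):
        yield addresses[start:start + capacity]


# Ten hypothetical addresses with a capacity limit of 4 give shards of 4, 4, and 2 items.
big_array = [0xFFFFFFFFFF000000 + i for i in range(10)]
shards = list(shard(big_array, 4))
```

The last shard may be only partly filled, which corresponds to the case in which the array to be compressed waits for more memory address data before it is compressed.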
[0126] In another example, after obtaining the list of compressed arrays based on the offset and the compressed array, the above method also includes: clearing the array to be compressed so that the cleared array to be compressed can continue to receive new memory address data.
[0127] It is understandable that, since the array to be compressed receives externally stored memory address data, in order to achieve continuous and large-capacity storage of memory address data, in this embodiment of the disclosure, the array to be compressed that is already full is compressed, and then the offset and the compressed array are stored in the compressed array list. In order to avoid duplicate storage of memory address data, the array to be compressed can be cleared thereafter, so that the cleared array to be compressed can continue to receive new memory address data, thereby achieving continuous storage of more memory address data and avoiding a large amount of storage space occupation.
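The sharding, filling, and clearing flow described above can be sketched as follows. This is a minimal illustration rather than the implementation of the disclosure: the names `shard`, `PendingArray`, and `on_full` are hypothetical, the capacity of 4 and the addresses are made up, and the compression of steps S101-S104 is represented only by a callback.

```python
from typing import Callable, Iterator, List


def shard(addresses: List[int], capacity: int) -> Iterator[List[int]]:
    """Split an oversized address array into shards of at most `capacity` items."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    for start in range(0, len(addresses), capacity):
        yield addresses[start:start + capacity]


class PendingArray:
    """Fixed-capacity 'array to be compressed' that is handed off when full."""

    def __init__(self, capacity: int, on_full: Callable[[List[int]], None]) -> None:
        self.capacity = capacity
        self.on_full = on_full  # hypothetical hook standing in for steps S101-S104
        self.items: List[int] = []

    def extend(self, shard_items: List[int]) -> None:
        for address in shard_items:
            self.items.append(address)
            if len(self.items) == self.capacity:
                self.on_full(list(self.items))  # pass a copy to the compressor
                self.items.clear()              # cleared array can receive new data


flushed: List[List[int]] = []
pending = PendingArray(capacity=4, on_full=flushed.append)
to_be_sharded = [0x7F0000001000 + 8 * i for i in range(10)]  # made-up addresses
for piece in shard(to_be_sharded, capacity=4):
    pending.extend(piece)
```

In this sketch, ten addresses with a capacity of four produce two full arrays that are handed to the callback, while the remaining two addresses wait in the pending array until more data arrives.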
[0128] Example 4
[0129] According to one or more embodiments of this disclosure, Figure 6 is a flowchart of an optional array compression processing method provided according to an embodiment of the present disclosure. As shown in Figure 6, the index values of the compressed array are the same as the index values of the array to be compressed. After obtaining the compressed array list based on the offset and the compressed array, the above method further includes:
[0130] S601: in response to a memory address query request, obtain the index value carried in the memory address query request.
[0131] S602: query the compressed array list based on the index value to determine the target compressed array.
[0132] S603: query the target compressed array according to the index value to determine the positioning data in the target compressed array corresponding to the index value.
[0133] S604: obtain the array before compression based on the positioning data and the offset.
[0134] In this embodiment of the disclosure, since the index value of the compressed array and the index value of the array to be compressed are the same, after obtaining the list of compressed arrays, when indexing the memory address in the array, the position of the compressed array in the list of compressed arrays can be indexed first to find the corresponding target compressed array. Then, the corresponding positioning data can be found in the corresponding target compressed array, and the array before compression can be obtained based on the positioning data and the offset.
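As a non-limiting sketch of the query flow in steps S601-S604, the following example recovers one memory address from a compressed array list. Several details are assumptions made only for illustration: every compressed array except possibly the last holds exactly `capacity` entries, so the target compressed array can be located by integer division; the offset is stored as an integer already shifted into its high-order position; and the names `CompressedArray` and `lookup` are hypothetical.

```python
from typing import List, NamedTuple


class CompressedArray(NamedTuple):
    offset: int             # shared high-order part, already shifted into place
    positioning: List[int]  # per-address low-order part, in the original order


def lookup(compressed_list: List[CompressedArray], index: int, capacity: int) -> int:
    """Return the original memory address stored at global position `index`."""
    list_pos, inner = divmod(index, capacity)  # S602: locate the target compressed array
    target = compressed_list[list_pos]
    positioning = target.positioning[inner]    # S603: same index as before compression
    return target.offset | positioning         # S604: recombine offset and positioning data


compressed_list = [
    CompressedArray(offset=0x7F0000000000, positioning=[0x1000, 0x1008, 0x1010, 0x1018]),
    CompressedArray(offset=0x7F1200000000, positioning=[0x0020, 0x0040]),
]
address = lookup(compressed_list, index=5, capacity=4)
```

Both steps are plain array indexing, which is consistent with the statement that indexing speed is preserved after compression.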
[0135] In this embodiment of the disclosure, the above-described compression method can preserve the fast indexing and querying performance of the array while ensuring a certain compression ratio.
[0136] According to one or more of the above embodiments, the array compression processing method provided in this disclosure is used to obtain multiple memory address data contained in the array to be compressed; based on the multiple memory address data, the array to be compressed is divided into offsets and positioning data corresponding to each of the multiple memory address data; based on the positioning data corresponding to each of the multiple memory address data, a compressed array is obtained; based on the offsets and the compressed array, a compressed array list is obtained.
[0137] This disclosure uses a common byte among multiple memory address data as an offset and different bytes among multiple memory address data as positioning data for indexing. Furthermore, since the positioning data corresponding to each of the multiple memory address data is stored as a compressed array, the index values of this compressed array are consistent with the index values of the array to be compressed. This compression method preserves both fast array index lookup and a certain compression ratio.
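The division into offset and positioning data can be illustrated with the following sketch. It assumes, beyond what the disclosure states, that addresses are 8 bytes wide, that the common bytes are the leading (most significant) bytes, and that the positioning data are the remaining trailing bytes; `split_addresses` and `restore` are hypothetical names.

```python
from typing import List, Tuple

WIDTH = 8  # assumed address width in bytes (64-bit addresses)


def split_addresses(addresses: List[int]) -> Tuple[bytes, List[bytes]]:
    """Split addresses into one shared leading-byte offset and per-address positioning data."""
    if not addresses:
        raise ValueError("need at least one address")
    raw = [a.to_bytes(WIDTH, "big") for a in addresses]
    common = 0
    # Count the leading bytes that are identical in every address.
    while common < WIDTH and all(r[common] == raw[0][common] for r in raw):
        common += 1
    offset = raw[0][:common]
    positioning = [r[common:] for r in raw]  # entry i still corresponds to addresses[i]
    return offset, positioning


def restore(offset: bytes, positioning: bytes) -> int:
    """Rebuild one address from the shared offset and its positioning data."""
    return int.from_bytes(offset + positioning, "big")


addresses = [0x00007F3A12345678, 0x00007F3A1234ABCD, 0x00007F3A12FF0000]
offset, positioning = split_addresses(addresses)
```

In this example the three addresses share their first five bytes, so the offset is stored once and each positioning datum shrinks from eight bytes to three, while position i in the compressed array still refers to the same address as position i in the array to be compressed.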
[0138] Therefore, the embodiments of this disclosure can achieve compressed storage of memory address data, greatly reducing the storage space occupied by the memory address array, while not affecting the efficiency of indexing and memory addressing, thereby improving memory analysis efficiency.
[0139] In one example, an array contains many elements (such as memory address data), each with a sequence number used to locate its position within the array. This sequence number is the index value, which can then be used to find the corresponding element in the array. In this embodiment, because the index values of the compressed array and the array to be compressed are consistent, fast indexing and lookup of the compressed array is still possible. Using array compression not only saves storage space but also does not affect the efficiency of indexing and looking up data in the array.
[0140] This disclosed solution can compress the original memory address array (i.e., the array to be compressed) to a certain extent, saving storage space. Moreover, the compression process does not affect the fast indexing and querying of the compressed array. Therefore, it plays an important role in memory analysis.
[0141] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
[0142] Example 5
[0143] According to embodiments of this disclosure, Figure 7 is a schematic diagram of the framework of an array compression processing apparatus provided according to an embodiment of the present disclosure. As shown in Figure 7, this disclosure also provides an array compression processing apparatus 700, which includes:
[0144] The data acquisition unit 701 is used to acquire multiple memory address data contained in the array to be compressed.
[0145] The partitioning unit 702 is used to partition the array to be compressed into offsets and positioning data corresponding to each of the multiple memory address data according to the multiple memory address data, wherein the offsets are used to represent the common parts among the multiple memory address data, and the positioning data are used to represent the different parts among the multiple memory address data.
[0146] The first processing unit 703 is used to obtain a compressed array based on the positioning data corresponding to each of the multiple memory address data.
[0147] The second processing unit 704 is used to obtain a compressed array list based on the offset and the compressed array.
[0148] Using the array compression processing apparatus provided in this embodiment, multiple memory address data contained in the array to be compressed are obtained; the array to be compressed is divided into offsets and positioning data corresponding to each of the multiple memory address data according to the multiple memory address data; a compressed array is obtained based on the positioning data corresponding to each of the multiple memory address data; and a compressed array list is obtained based on the offsets and the compressed array.
[0149] In this embodiment, the common parts (same data, such as the same byte) between multiple memory address data are used as an offset, and the different parts (different data, such as different bytes) between multiple memory address data are used as positioning data for indexing. The positioning data corresponding to each of the multiple memory address data are then stored to obtain a compressed array. The offset and the compressed array are then stored in a compressed array list. Therefore, this embodiment can achieve compressed storage of memory address data, greatly reducing the storage space occupied by the memory address array, while not affecting the efficiency of indexing and memory addressing, thereby improving memory analysis efficiency.
[0150] According to one or more embodiments of this disclosure, the above-mentioned partitioning unit includes:
[0151] The determination module is used to determine the offset between the multiple memory address data based on the multiple memory address data.
[0152] The comparison module is used to compare the offset with the multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data.
[0153] According to one or more embodiments of this disclosure, the determination module is further configured to:
[0154] Obtain the common bytes among the multiple memory address data; based on the common bytes among the multiple memory address data, obtain the offset.
[0155] According to one or more embodiments of this disclosure, the comparison module is further configured to:
[0156] Compare the offset with the multiple memory address data respectively to obtain the different bytes among the multiple memory address data; based on the different bytes among the multiple memory address data, obtain the positioning data corresponding to each of the multiple memory address data.
[0157] According to one or more embodiments of this disclosure, the above-described apparatus further includes:
[0158] The array acquisition unit is used to acquire the array to be sharded, wherein the memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed.
[0159] The sharding processing unit is used to shard the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed.
[0160] The storage unit is used to store the sharded array into the array to be compressed, so that after the array to be compressed is full, the multiple memory address data contained in the array to be compressed can be obtained.
[0161] According to one or more embodiments of this disclosure, the index values of the compressed array and the index values of the array to be compressed are the same, and the apparatus further includes:
[0162] The response unit is used to respond to the memory address query request and obtain the index value carried in the memory address query request.
[0163] The first determining unit is used to query the compressed array list based on the index value to determine the target compressed array.
[0164] The second determining unit is used to query the target compressed array based on the index value to determine the positioning data in the target compressed array corresponding to the index value.
[0165] The third processing unit is used to obtain the array before compression based on the above positioning data and the above offset.
[0166] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.
[0167] According to embodiments of the present disclosure, the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method according to any one of the above claims.
[0168] According to embodiments of this disclosure, a computer program product is provided, comprising: a computer program stored in a readable storage medium, at least one processor of an electronic device being able to read the computer program from the readable storage medium, and the at least one processor executing the computer program causing the electronic device to perform the solution provided in any of the above embodiments.
[0169] According to embodiments of this disclosure, this disclosure also provides an electronic device. Figure 8 shows a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and/or claimed herein.
[0170] As shown in Figure 8, device 800 includes a computing unit 801, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 802 or a computer program loaded from storage unit 808 into random access memory (RAM) 803. RAM 803 may also store various programs and data required for the operation of device 800. The computing unit 801, ROM 802, and RAM 803 are interconnected via bus 804. Input/output (I/O) interface 805 is also connected to bus 804.
[0171] Multiple components in device 800 are connected to I / O interface 805, including: input unit 806, such as keyboard, mouse, etc.; output unit 807, such as various types of monitors, speakers, etc.; storage unit 808, such as disk, optical disk, etc.; and communication unit 809, such as network card, modem, wireless transceiver, etc. Communication unit 809 allows device 800 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0172] The computing unit 801 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as the array compression processing method. For example, in some embodiments, the array compression processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program may be loaded and / or installed on device 800 via ROM 802 and / or communication unit 809. When the computer program is loaded into RAM 803 and executed by the computing unit 801, one or more steps of the array compression processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the array compression processing method by any other suitable means (e.g., by means of firmware).
[0173] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0174] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0175] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0176] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0177] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0178] Computer systems can include clients and servers. Clients and servers are generally geographically separated and typically interact via communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. A server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service ecosystem that addresses the shortcomings of traditional physical hosts and virtual private server (VPS) services, such as high management difficulty and weak business scalability. Servers can also be servers for distributed systems or servers incorporating blockchain technology.
[0179] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.
[0180] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. An array compression processing method, the method comprising: obtaining an array to be sharded, wherein the memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed; sharding the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed; storing the sharded array in the array to be compressed, so that after the array to be compressed is full, multiple memory address data contained in the array to be compressed are obtained; based on the multiple memory address data, dividing the array to be compressed into an offset and positioning data corresponding to each of the multiple memory address data, wherein the offset is used to represent the same bytes among the multiple memory address data, and the positioning data are used to represent the different bytes among the multiple memory address data; obtaining a compressed array based on the positioning data corresponding to each of the multiple memory address data, wherein the index value of the compressed array is the same as the index value of the array to be compressed; obtaining a compressed array list based on the offset and the compressed array; and clearing the array to be compressed so that the cleared array can continue to receive new memory address data.
2. The method according to claim 1, wherein dividing the array to be compressed into the offset and the positioning data corresponding to each of the multiple memory address data based on the multiple memory address data comprises: determining the offset between the multiple memory address data based on the multiple memory address data; and comparing the offset with the multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data.
3. The method according to claim 2, wherein determining the offset between the multiple memory address data based on the multiple memory address data comprises: obtaining the same bytes among the multiple memory address data; and obtaining the offset based on the same bytes among the multiple memory address data.
4. The method according to claim 2, wherein comparing the offset with the multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data comprises: comparing the offset with the multiple memory address data respectively to obtain the different bytes among the multiple memory address data; and obtaining the positioning data corresponding to each of the multiple memory address data based on the different bytes among the multiple memory address data.
5. The method according to any one of claims 1 to 4, wherein after obtaining the compressed array list based on the offset and the compressed array, the method further comprises: in response to a memory address query request, obtaining the index value carried in the memory address query request; querying the compressed array list based on the index value to determine a target compressed array; querying the target compressed array according to the index value to determine the positioning data in the target compressed array corresponding to the index value; and obtaining the array before compression based on the positioning data and the offset.
6. An array compression processing apparatus, the apparatus comprising: an array acquisition unit, used to acquire an array to be sharded, wherein the memory address data in the array to be sharded exceeds the array capacity limit of the array to be compressed; a sharding processing unit, used to shard the array to be sharded to obtain a sharded array that meets the array capacity limit of the array to be compressed; a storage unit, used to store the sharded array into the array to be compressed, so that after the array to be compressed is full, multiple memory address data contained in the array to be compressed can be obtained; a data acquisition unit, used to acquire the multiple memory address data contained in the array to be compressed; a partitioning unit, used to divide the array to be compressed into an offset and positioning data corresponding to each of the multiple memory address data according to the multiple memory address data, wherein the offset is used to represent the same bytes among the multiple memory address data, and the positioning data are used to represent the different bytes among the multiple memory address data; a first processing unit, used to obtain a compressed array based on the positioning data corresponding to each of the multiple memory address data, wherein the index value of the compressed array is the same as the index value of the array to be compressed; and a second processing unit, used to obtain a compressed array list based on the offset and the compressed array; wherein the array to be compressed is cleared so that the cleared array can continue to receive new memory address data.
7. The apparatus according to claim 6, wherein the partitioning unit includes: a determining module, configured to determine the offset between the multiple memory address data based on the multiple memory address data; and a comparison module, used to compare the offset with the multiple memory address data respectively to obtain the positioning data corresponding to each of the multiple memory address data.
8. The apparatus according to claim 7, wherein the determining module is further used to: obtain the common bytes among the multiple memory address data; and obtain the offset based on the common bytes among the multiple memory address data.
9. The apparatus according to claim 7, wherein the comparison module is further used to: compare the offset with the multiple memory address data to obtain the different bytes among the multiple memory address data; and obtain the positioning data corresponding to each of the multiple memory address data based on the different bytes among the multiple memory address data.
10. The apparatus according to any one of claims 6 to 9, wherein the apparatus further comprises: a response unit, used to respond to a memory address query request and obtain the index value carried in the memory address query request; a first determining unit, configured to query the compressed array list based on the index value to determine a target compressed array; a second determining unit, used to query the target compressed array according to the index value to determine the positioning data in the target compressed array corresponding to the index value; and a third processing unit, used to obtain the array before compression based on the positioning data and the offset.
11. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method according to any one of claims 1-5.
13. A computer program product comprising a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-5.