A data storage method and apparatus

By dynamically adjusting the bitmap structure in variable-length and fixed-length modes, the problems of space occupation and query efficiency when storing massive amounts of time-stamped data are solved, achieving efficient data storage and querying.

CN116028497B · Active · Publication Date: 2026-06-19 · ZHEJIANG ZEEKR INTELLIGENT TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG ZEEKR INTELLIGENT TECH CO LTD
Filing Date: 2023-01-06
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

When storing massive amounts of time-stamped data, existing technologies result in large storage space consumption and low query efficiency.

Method used

Data is stored using bitmap structures with both variable-length and fixed-length modes. By determining the bitmap structure mode, the data storage method is dynamically adjusted, and storage space and query efficiency are optimized by combining time offset and data bit settings.

Benefits of technology

It effectively saves storage space, reduces space complexity, and has low time complexity during queries, thus improving query efficiency.



Abstract

This disclosure provides a data storage method and apparatus. The method includes: acquiring target data to be stored; reading a bitmap structure from a database based on a target identifier in the target data; and determining the mode of the bitmap structure. When the bitmap structure is determined to be in a variable-length mode, writing a target time from the target data into a newly added time unit in the data portion of the bitmap structure, wherein the data portion of the variable-length bitmap structure contains at least one time unit. When the bitmap structure is determined to be in a fixed-length mode, setting corresponding data bits in the data portion of the bitmap structure based on the target time in the target data, wherein the storage location of each data bit in the data portion of the fixed-length bitmap structure corresponds to a time point. This method can reduce the space occupied by data storage, thereby reducing storage costs, and improve the efficiency of data retrieval, thereby reducing query latency.

Description

Technical Field

[0001] This disclosure relates to the field of computer technology, and more specifically to a data storage method and apparatus.

Background Technology

[0002] With the development of internet technology, internet users generate a large amount of exposure data every day. If this massive amount of time-stamped data is stored as is, without processing, it occupies a large amount of storage space. Moreover, querying the data requires traversing every piece of data, which is inefficient.

Summary of the Invention

[0003] In view of this, embodiments of the present disclosure provide at least one data storage method and apparatus.

[0004] Specifically, the embodiments of this disclosure are implemented through the following technical solutions:

[0005] In a first aspect, a data storage method is provided, the method comprising:

[0006] Obtain the target data to be stored, read the bitmap structure from the database based on the target identifier in the target data, and determine the mode of the bitmap structure;

[0007] When the bitmap structure is determined to be in variable length mode, the target time in the target data is written into the newly added time unit in the data part of the bitmap structure, and the data part of the bitmap structure in variable length mode contains at least one time unit.

[0008] When the bitmap structure is determined to be in fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

[0009] In conjunction with any embodiment of this disclosure, the step of writing the target time from the target data into a newly added time unit in the data portion of the bitmap structure includes:

[0010] The target time in the target data is converted into a time offset, and the time offset is written into the newly added time unit in the data part of the bitmap structure.

[0011] In conjunction with any embodiment of this disclosure, when it is determined that the bitmap structure is in variable-length mode, writing the target time from the target data into a newly added time unit in the data portion of the bitmap structure includes:

[0012] Parse the data portion of the bitmap structure and determine whether the data length of the data portion is less than a preset threshold.

[0013] If the data length is less than the preset threshold, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure.

[0014] In conjunction with any embodiment of this disclosure, the method further includes:

[0015] If the data length is not less than the preset threshold, the bitmap structure of the variable-length mode is converted into the bitmap structure of the fixed-length mode, and the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data.

[0016] In conjunction with any embodiment of this disclosure, before acquiring the target data to be stored, the method further includes:

[0017] Obtain multiple historical data sets;

[0018] For each piece of historical data, the historical data is converted into a first bitmap structure of the variable-length mode and a second bitmap structure of the fixed-length mode, and the data length deviation between the first bitmap structure and the second bitmap structure is calculated.

[0019] Determine the target data length deviation that is the smallest positive value among the multiple data length deviations;

[0020] The data length of the first bitmap structure of the variable-length mode corresponding to the target data length deviation is determined as the preset threshold.

[0021] In any embodiment of this disclosure, the data portion of the bitmap structure in the fixed-length mode is stored in a compressed state in the database, and the step of setting the corresponding data bits in the data portion of the bitmap structure according to the target time in the target data includes:

[0022] The compressed data portion in the bitmap structure is decompressed to obtain the uncompressed data portion.

[0023] Based on the target time in the target data, set the corresponding data bits in the uncompressed data portion;

[0024] The data portion of the bitmap structure, with the data bits now set, is compressed to generate a new bitmap structure.

[0025] In conjunction with any embodiment of this disclosure, determining the mode of the bitmap structure includes:

[0026] The mode of the bitmap structure is determined based on the identifier portion of the bitmap structure.

[0027] In conjunction with any embodiment of this disclosure, the method further includes:

[0028] In response to receiving a query request, based on the target identifier in the query conditions of the query request, the bitmap structure in the database is read, and the mode of the bitmap structure is determined; the query conditions also include the time range to be queried; the query request is used to query the number of target data corresponding to the target identifier within the time range;

[0029] When it is determined that the bitmap structure is in variable-length mode, the data part of the bitmap structure is parsed, the time units in the data part are traversed, and the number of target time units, that is, time units whose stored time falls within the time range, is determined.

[0030] When the bitmap structure is determined to be in a fixed-length mode, the number of data bits that have been set in at least one storage location of the data portion corresponding to the time range is determined.

[0031] In any embodiment of this disclosure, the target identifier is a user identifier, the target data is user behavior data, and the target time is behavior time;

[0032] The step of reading the bitmap structure from the database based on the target identifier in the target data includes:

[0033] The bitmap structure of the target user is read from the database based on the user identifier in the user behavior data.

[0034] In a second aspect, a data storage device is provided, the device comprising:

[0035] The data reading module is used to: acquire target data to be stored, read the bitmap structure in the database according to the target identifier in the target data, and determine the mode of the bitmap structure;

[0036] The data writing module is used to: when it is determined that the bitmap structure is in a variable length mode, write the target time in the target data into a newly added time unit in the data part of the bitmap structure, wherein the data part of the bitmap structure in the variable length mode contains at least one time unit;

[0037] When the bitmap structure is determined to be in fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

[0038] In a third aspect, an electronic device is provided, the device including a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement the data storage method described in any embodiment of this disclosure when executing the computer instructions.

[0039] In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the data storage method described in any embodiment of the present disclosure.

[0040] The data storage method provided in this disclosure uses a bitmap structure with two modes to store target data corresponding to a target identifier. In variable-length mode, when storing target data, the target time is written into a newly added time unit in the data part of the bitmap structure. Each write adds one time unit, so the data length of the data part grows with every write, but only the target time is stored rather than the complete target data, which effectively saves storage space. In fixed-length mode, when storing target data, the corresponding data bit in the data part of the bitmap structure is set according to the target time in the target data. The data length of the data part in fixed-length mode remains unchanged, and the storage location of each data bit corresponds to a time point. Each write only changes the value of an existing data bit, which greatly reduces space complexity and makes it possible to query data directly by storage location with minimal time complexity.

Description of the Drawings

[0041] To more clearly illustrate the technical solutions in one or more embodiments or related technologies of this disclosure, the accompanying drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments recorded in one or more embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0042] Figure 1 is a flowchart of a data storage method according to at least one embodiment of the present disclosure;

[0043] Figure 2 is a schematic diagram of the two modes of a bitmap structure according to at least one embodiment of the present disclosure;

[0044] Figure 3 is a flowchart of a data storage process according to at least one embodiment of the present disclosure;

[0045] Figure 4 is a flowchart of a data query process according to at least one embodiment of the present disclosure;

[0046] Figure 5 is a schematic diagram of exposure data stored in a bitmap structure in fixed-length mode according to at least one embodiment of the present disclosure;

[0047] Figure 6 is a block diagram of a data storage device according to at least one embodiment of the present disclosure;

[0048] Figure 7 is a block diagram of another data storage device according to at least one embodiment of the present disclosure;

[0049] Figure 8 is a schematic diagram of the hardware structure of an electronic device according to at least one embodiment of the present disclosure.

Detailed Description

[0050] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.

[0051] The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to be limiting of this specification. The singular forms “a,” “an,” and “the” as used in this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

[0052] It should be understood that although the terms first, second, third, etc., may be used in this specification to describe various information, this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."

[0053] As shown in Figure 1, which is a flowchart of a data storage method according to at least one embodiment, the method can be used in a device with computing and processing capabilities and includes the following steps:

[0054] In step 102, the target data to be stored is obtained, the bitmap structure in the database is read according to the target identifier in the target data, and the mode of the bitmap structure is determined.

[0055] The target data to be stored is data containing the target time and target identifier. It can be data sent by other devices, such as data read from a message queue. The message queue can contain multiple target data to be stored. The target data in the message queue is read one by one and stored into the database sequentially using the data storage method of this embodiment.

[0056] The target time can be the generation time of the target data, and the target identifier is used to identify the owner of different target data. For example, when the target data is user behavior data, the target time is the behavior time, and the target identifier is the user identifier. Or, for example, when the target data is network fault data, the target time is the time the fault occurred, and the target identifier is the device identifier of the device where the fault occurred.

[0057] In this embodiment, target data with the same target identifier is stored in the same bitmap structure. The bitmap structure corresponding to the target identifier can be read from the database storing multiple bitmap structures through the mapping relationship between the target identifier and the bitmap structure.

[0058] In one embodiment, determining the mode of the bitmap structure includes: determining the mode of the bitmap structure based on the identifier portion of the bitmap structure.

[0059] The bitmap structure includes an identifier portion and a data portion. The values of the data bits in the identifier portion differ in different modes, and the mode can be identified through the identifier portion. In other embodiments, the mode of the bitmap structure can also be determined by the data length of the bitmap structure, since the data lengths of fixed-length and variable-length modes are different.

[0060] A bitmap structure is a data structure used to store data. A bitmap structure consists of multiple data bits, also known as "bits," with values of 0 or 1. In this embodiment, the bitmap structure has two modes:

[0061] The first type is variable-length mode. The bitmap structure in variable-length mode includes an identifier portion and a data portion. The data portion has a variable-length structure and contains at least one time unit. Each time unit contains multiple data bits, and the data length of each time unit is the same. Each time unit is used to store a point in time. As more data is stored, the data portion becomes longer. In one example, when the data length of the data portion exceeds a certain threshold, it will switch to fixed-length mode, which can compress space.

[0062] The second type is the fixed-length mode. The bitmap structure in fixed-length mode also includes an identifier portion and a data portion. The data portion has a fixed length, which can be determined by those skilled in the art based on actual business needs. The smallest unit of the data portion is 1 bit. Each data bit in the data portion corresponds to a specific point in time. For example, when the data portion is 00010010, the 4th and 7th data bits have the value 1; the storage location of the 4th bit represents the 4th second, and the storage location of the 7th bit represents the 7th second. This indicates that the target data corresponding to the 4th and 7th seconds has been written into the bitmap structure.

[0063] The identifier for the fixed-length mode is different from that for the variable-length mode, in order to distinguish between the two modes.

[0064] The two modes of the bitmap structure are explained below with reference to Figure 2. In this example, in order to further save storage space, a time offset is stored in the data part of the bitmap structure. That is, a base time is set in advance, and the offset between the target time and the base time is stored in the bitmap structure.

[0065] As shown in Figure 2, in the bitmap structure of the variable-length mode, bits 1 to 16 are the identifier portion, and all of these bits are 0. The bits after bit 16 are the data portion. The time units in the data portion store time offsets, and each time unit is 32 bits long. For example, if the base time is 00:00:00 on November 20, 2022, and the target time is 15:10:01 on November 22, 2022, the time offset written into the time unit is 2 days 15:10:01.

[0066] In the bitmap structure of the fixed-length mode, bits 1 to 16 are also the identifier portion, which stores the date offset of the bitmap structure. For example, if the base date in the base time is November 20, 2022, and the target date in the target time is November 22, 2022, the date offset is 2, and the identifier portion is stored as 0000 0000 0000 0010. The bits after bit 16 are the data portion, which has a fixed length and a minimum unit of 1 bit. The value of each data bit is 0 or 1, indicating whether target data for the corresponding time point has been written. The first bit of the data portion is bit 16+1 of the bitmap structure. For a given bit in the data portion, such as bit T+16, if the target data is user behavior data, a value of 1 indicates that a user behavior occurred at second T of the target date, and a value of 0 indicates that no behavior occurred.

[0067] The mode of the bitmap structure can be determined from its first 16 bits: if they are all 0, the bitmap structure is in variable-length mode; if they are not all 0, the bitmap structure is in fixed-length mode.
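As a minimal illustrative sketch of the layout and mode check described above, the following Python code treats a bitmap structure as a `bytes` value whose first two bytes are the 16-bit identifier portion. The big-endian byte order and the names `HEADER_BYTES`, `detect_mode`, and `split_bitmap` are assumptions made for illustration and do not come from the disclosure.

```python
HEADER_BYTES = 2  # the 16-bit identifier portion


def detect_mode(bitmap: bytes) -> str:
    """Return 'variable' if all 16 identifier bits are 0, otherwise 'fixed'."""
    if len(bitmap) < HEADER_BYTES:
        raise ValueError("bitmap structure is shorter than its identifier portion")
    identifier = int.from_bytes(bitmap[:HEADER_BYTES], "big")
    return "variable" if identifier == 0 else "fixed"


def split_bitmap(bitmap: bytes):
    """Split a bitmap structure into (identifier value, data portion)."""
    return int.from_bytes(bitmap[:HEADER_BYTES], "big"), bitmap[HEADER_BYTES:]


# Variable-length mode: all-zero identifier followed by one 32-bit time unit.
variable = bytes(HEADER_BYTES) + (227401).to_bytes(4, "big")
# Fixed-length mode: identifier holds date offset 2, followed by data bits.
fixed = (2).to_bytes(HEADER_BYTES, "big") + bytes([0b00010010])

print(detect_mode(variable), detect_mode(fixed))  # variable fixed
```

Under this scheme a fixed-length structure needs a non-zero identifier, so the sketch assumes the date offset stored there is at least 1.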

[0068] In step 104, when it is determined that the bitmap structure is in variable length mode, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure, and the data part of the bitmap structure in variable length mode contains at least one time unit.

[0069] When the bitmap structure is determined to be in variable length mode, a new time unit can be added directly to the end of the original data part, and the target time is written into the newly added time unit. After each writing of the target time, the data length of the data part of the bitmap structure in variable length mode increases, and each time data is written, a time unit is added, thus generating a new bitmap structure.

[0070] In one embodiment, the step of writing the target time from the target data into a newly added time unit in the data portion of the bitmap structure includes:

[0071] The target time in the target data is converted into a time offset, and the time offset is written into the newly added time unit in the data part of the bitmap structure.

[0072] With a preset base time, the time offset can be obtained by subtracting the base time from the target time. The time offset is then written into the time unit. Because an offset is smaller than a full timestamp, the time unit can be made shorter, meaning that each time unit includes fewer data bits, which further reduces the storage space occupied.
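The offset calculation can be sketched as follows, using the base time and target time from the example above. The choice of seconds as the offset unit, the big-endian encoding, and the names `BASE_TIME`, `to_time_offset`, and `encode_time_unit` are illustrative assumptions.

```python
from datetime import datetime

BASE_TIME = datetime(2022, 11, 20, 0, 0, 0)  # base time from the example in the text


def to_time_offset(target_time: datetime, base_time: datetime = BASE_TIME) -> int:
    """Offset of target_time from base_time, in whole seconds (assumed unit)."""
    offset = int((target_time - base_time).total_seconds())
    if not 0 <= offset < 2**32:
        raise ValueError("offset does not fit in a 32-bit time unit")
    return offset


def encode_time_unit(offset: int) -> bytes:
    """Encode an offset as one 32-bit time unit."""
    return offset.to_bytes(4, "big")


offset = to_time_offset(datetime(2022, 11, 22, 15, 10, 1))
print(offset)  # 227401, i.e. 2 days 15:10:01 expressed in seconds
```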

[0073] As the data length in variable-length mode gradually increases, the storage space it occupies becomes larger and larger. To save space, the bitmap structure then needs to be converted to fixed-length mode. In one embodiment, when it is determined that the bitmap structure is in variable-length mode, writing the target time from the target data into a newly added time unit in the data portion of the bitmap structure includes:

[0074] Parse the data portion of the bitmap structure and determine whether the data length of the data portion is less than a preset threshold.

[0075] If the data length is less than the preset threshold, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure.

[0076] If the data length is not less than the preset threshold, the bitmap structure of the variable-length mode is converted into the bitmap structure of the fixed-length mode, and the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data.

[0077] There is a critical value, namely the preset threshold, between variable-length mode and fixed-length mode. When the data length of the data part in variable-length mode is less than the preset threshold, storing the data in variable-length mode uses fewer data bits. When it is greater than the preset threshold, storing the data in fixed-length mode uses fewer data bits.

[0078] Therefore, when the data length is less than the preset threshold, the variable-length mode is used for storage. When the data length is not less than the preset threshold, the variable-length mode is converted to the fixed-length mode, and the data is stored in the fixed-length mode. In this way, data is stored flexibly according to the actual storage situation, which further saves storage space.
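The append-or-convert decision can be sketched as below. This is only an illustration under several assumptions that are not stated in the disclosure: time units hold offsets in seconds, the fixed-length data part has one bit per second of a single day, seconds are numbered from 0 (second s uses data bit index s), all time units being converted fall on the same date as the new write, the date offset is at least 1, and the fixed-length data part is kept uncompressed. The names `write`, `to_fixed`, and `set_second` are hypothetical.

```python
HEADER_BYTES = 2                          # 16-bit identifier portion
UNIT_BYTES = 4                            # one 32-bit time unit
SECONDS_PER_DAY = 86_400
FIXED_DATA_BYTES = SECONDS_PER_DAY // 8   # assumed: one data bit per second of a day


def set_second(fixed: bytearray, second: int) -> None:
    """Set the data bit for `second` (0-based numbering, an assumption)."""
    fixed[HEADER_BYTES + second // 8] |= 0x80 >> (second % 8)


def to_fixed(variable: bytes, date_offset: int) -> bytearray:
    """Rebuild a variable-length structure as a fixed-length one for one date."""
    if date_offset == 0:
        raise ValueError("an all-zero identifier is reserved for variable-length mode")
    fixed = bytearray(date_offset.to_bytes(HEADER_BYTES, "big"))
    fixed += bytearray(FIXED_DATA_BYTES)
    body = variable[HEADER_BYTES:]
    for i in range(0, len(body), UNIT_BYTES):
        offset = int.from_bytes(body[i:i + UNIT_BYTES], "big")
        set_second(fixed, offset % SECONDS_PER_DAY)  # assumes the same date
    return fixed


def write(bitmap: bytes, offset: int, threshold_bytes: int) -> bytes:
    """Append a time unit below the threshold; otherwise convert and set a bit."""
    identifier = int.from_bytes(bitmap[:HEADER_BYTES], "big")
    if identifier == 0:  # variable-length mode
        if len(bitmap) - HEADER_BYTES < threshold_bytes:
            return bitmap + offset.to_bytes(UNIT_BYTES, "big")
        fixed = to_fixed(bitmap, offset // SECONDS_PER_DAY)
    else:                # already in fixed-length mode
        fixed = bytearray(bitmap)
    set_second(fixed, offset % SECONDS_PER_DAY)
    return bytes(fixed)


bm = bytes(HEADER_BYTES)
day2 = 2 * SECONDS_PER_DAY
for s in (4, 7, 9):
    bm = write(bm, day2 + s, threshold_bytes=8)  # tiny threshold, for demonstration only
print(len(bm), bm[:4].hex())
```

With the deliberately small threshold of 8 bytes, the first two writes append time units and the third triggers the conversion to fixed-length mode.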

[0079] The following example illustrates a method for calculating a preset threshold. It should be noted that those skilled in the art can employ other methods for calculating preset thresholds based on actual needs, and this embodiment does not impose any limitations on this. Before acquiring the target data to be stored, the method further includes:

[0080] Obtain multiple historical data sets;

[0081] For each piece of historical data, the historical data is converted into a first bitmap structure of the variable-length mode and a second bitmap structure of the fixed-length mode, and the data length deviation between the first bitmap structure and the second bitmap structure is calculated.

[0082] Determine the target data length deviation that is the smallest positive value among the multiple data length deviations;

[0083] The data length of the first bitmap structure of the variable-length mode corresponding to the target data length deviation is determined as the preset threshold.

[0084] For example, each historical data set i is converted into a first bitmap structure Bi in variable-length mode and a second bitmap structure Gi in fixed-length mode, and the data length deviation Ti = Bi - Gi, that is, the difference between the data lengths of the two structures, is calculated for each. All Ti are arranged in ascending order and the first Ti > 0 is found, or all Ti are arranged in descending order and the last Ti > 0 is found. This determines the target data length deviation. The Bi corresponding to the target data length deviation Ti is returned, and the data length of the data part of Bi is the preset threshold.
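The threshold selection can be sketched as follows. The function takes pairs of data lengths (Bi, Gi) and returns the Bi whose deviation Ti = Bi - Gi is the smallest positive value. The function name `preset_threshold`, the sample record counts, and the use of an uncompressed 86,400-bit fixed-length data part are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def preset_threshold(lengths: List[Tuple[int, int]]) -> Optional[int]:
    """Pick B_i for the smallest positive deviation T_i = B_i - G_i.

    `lengths` holds (B_i, G_i): the data lengths, in bits, of the
    variable-length and fixed-length encodings of each historical data set.
    Returns None when no deviation is positive.
    """
    best = None  # (T_i, B_i) of the smallest positive deviation seen so far
    for b, g in lengths:
        t = b - g
        if t > 0 and (best is None or t < best[0]):
            best = (t, b)
    return None if best is None else best[1]


UNIT_BITS = 32        # one time unit per stored time point
FIXED_BITS = 86_400   # assumed: uncompressed, one bit per second of a day
history_counts = [100, 2_000, 2_701, 3_500, 10_000]  # hypothetical record counts
pairs = [(n * UNIT_BITS, FIXED_BITS) for n in history_counts]

print(preset_threshold(pairs))  # 86432
```

In this made-up sample, the data set with 2,701 records is the first one whose variable-length encoding is longer than the fixed-length one, so its data length becomes the threshold.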

[0085] In step 106, when the bitmap structure is determined to be in a fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

[0086] When the initial value of the data bit is 0, the target time can be stored by setting the value of the data bit corresponding to the target time to 1; or, when the initial value of the data bit is 1, the target time can be stored by setting the value of the data bit corresponding to the target time to 0.

[0087] Since the data portion of the fixed-length bitmap structure has a fixed length and is often only sparsely populated, many unset data bits still occupy a large amount of storage space. To avoid the problem of sparse bitmaps wasting storage space, the data portion of the bitmap structure in fixed-length mode can be compressed to further reduce the storage space occupied.

[0088] In one embodiment, the data portion of the bitmap structure in the fixed-length mode is stored in a compressed state in the database, and setting the corresponding data bits in the data portion of the bitmap structure according to the target time in the target data includes:

[0089] The compressed data portion in the bitmap structure is decompressed to obtain the uncompressed data portion.

[0090] Based on the target time in the target data, set the corresponding data bits in the uncompressed data portion;

[0091] The data portion of the bitmap structure, with the data bits now set, is compressed to generate a new bitmap structure.

[0092] This embodiment does not limit the compression and decompression algorithms used. For example, run-length encoding (RLE) can be used, or an open-source compression tool that implements the roaring-bitmap algorithm.
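The decompress, set, and recompress cycle can be sketched with a deliberately simple byte-level run-length encoding. This toy RLE stands in for whichever algorithm an implementation actually uses and is not the roaring-bitmap format. The function names, the (run length, byte value) pair layout, and the 0-based numbering of seconds are assumptions; only the data portion is handled here, without the identifier portion.

```python
def rle_compress(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs, with runs of at most 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decompress(blob: bytes) -> bytes:
    """Expand (run length, byte value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(blob), 2):
        out += bytes([blob[k + 1]]) * blob[k]
    return bytes(out)


def set_bit_compressed(blob: bytes, second: int) -> bytes:
    """Decompress the data portion, set one data bit, and compress it again."""
    data = bytearray(rle_decompress(blob))
    data[second // 8] |= 0x80 >> (second % 8)  # 0-based second numbering (assumption)
    return rle_compress(bytes(data))


empty = rle_compress(bytes(10_800))          # 86,400 unset data bits
updated = set_bit_compressed(empty, 54_601)  # mark 15:10:01 within the day
print(len(empty), len(updated))
```

For a sparse data portion, both compressed forms are far smaller than the 10,800 uncompressed bytes, which is the effect the paragraph above describes.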

[0093] After writing the target time from the target data into the data portion of the bitmap structure, or after setting the corresponding data bits in the data portion of the bitmap structure, the method further includes: writing a new bitmap structure into the database based on the target identifier.

[0094] After data is stored with reduced space complexity using the above method, it can also be queried with reduced time complexity, which lowers query latency. Based on the above embodiments, the method further includes:

[0095] In response to receiving a query request, based on the target identifier in the query conditions of the query request, the bitmap structure in the database is read, and the mode of the bitmap structure is determined; the query conditions also include the time range to be queried; the query request is used to query the number of target data corresponding to the target identifier within the time range;

[0096] When it is determined that the bitmap structure is in variable-length mode, the data part of the bitmap structure is parsed, the time units in the data part are traversed, and the number of target time units, that is, time units whose stored time falls within the time range, is determined.

[0097] When the bitmap structure is determined to be in a fixed-length mode, the number of data bits that have been set in at least one storage location of the data portion corresponding to the time range is determined.

[0098] In this example, when querying the number of target data corresponding to a target identifier within a certain time range, the corresponding bitmap structure is first read from the database based on the mapping relationship between the target identifier and the bitmap structure. The time complexity of this read operation is O(1), meaning that it remains unchanged as the data size increases. This is because the database, Redis in this example, indexes the stored bitmap structures by target identifier, so the lookup is not affected by the data size.

[0099] Then, determine the mode of the bitmap structure.

[0100] When in variable length mode, the time units in the data portion are traversed, and the time points stored in the time units are compared to see if they are within the time range. The number of target time units whose stored time is within the time range is also counted.

[0101] In fixed-length mode, since the data bits in the bitmap use their own storage locations to represent the time points they store, the number of time points within the time range can be determined directly by querying whether the corresponding data bit is in the target state. For example, when the initial value of a data bit is 0, the target state can be 1, and the target time can be stored by setting the value of the data bit corresponding to the target time to 1; or, when the initial value of a data bit is 1, the target state can be 0, and the target time can be stored by setting the value of the data bit corresponding to the target time to 0. The time complexity of this query method is O(T), where T is the time range to be queried. In other words, the time complexity of this step is only related to the time range to be queried, without needing to traverse the entire data portion, resulting in very high query efficiency.
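The two query paths can be sketched as follows. The code assumes time offsets in seconds from the base time, a half-open query range [start, end), 0-based numbering of seconds in the fixed-length data part, and a fixed-length data part that has already been decompressed. The function name `count_in_range` and the sample values are hypothetical.

```python
HEADER_BYTES = 2
UNIT_BYTES = 4
SECONDS_PER_DAY = 86_400


def count_in_range(bitmap: bytes, start: int, end: int) -> int:
    """Count stored time points t with start <= t < end (offsets in seconds)."""
    identifier = int.from_bytes(bitmap[:HEADER_BYTES], "big")
    data = bitmap[HEADER_BYTES:]
    if identifier == 0:
        # Variable-length mode: traverse every time unit and compare its offset.
        return sum(
            start <= int.from_bytes(data[i:i + UNIT_BYTES], "big") < end
            for i in range(0, len(data), UNIT_BYTES)
        )
    # Fixed-length mode: inspect only the data bits that fall inside the range.
    day_start = identifier * SECONDS_PER_DAY
    lo = max(start - day_start, 0)
    hi = min(end - day_start, len(data) * 8)
    return sum((data[s // 8] >> (7 - s % 8)) & 1 for s in range(lo, hi))


variable = bytes(HEADER_BYTES) + b"".join(t.to_bytes(4, "big") for t in (10, 20, 30))

day2 = 2 * SECONDS_PER_DAY
fixed_data = bytearray(SECONDS_PER_DAY // 8)
fixed_data[0] = 0b00001001  # seconds 4 and 7 of the day under 0-based numbering
fixed = (2).to_bytes(HEADER_BYTES, "big") + bytes(fixed_data)

print(count_in_range(variable, 15, 31))         # 2
print(count_in_range(fixed, day2 + 5, day2 + 8))  # 1
```

The variable-length branch does work proportional to the number of stored time units, while the fixed-length branch does work proportional to the width of the queried range, matching the O(T) behavior described above.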

[0102] This embodiment is used to query target data within a certain time range. Similarly, it can also be used to query whether target data exists at a specific time point, which will not be elaborated here.

[0103] In one embodiment, the data storage method of this embodiment can be used to store user behavior data on the network, wherein the target identifier is a user identifier, the target data is user behavior data, and the target time is behavior time; the step of reading the bitmap structure in the database according to the target identifier in the target data includes: reading the bitmap structure of the target user in the database according to the user identifier in the user behavior data.

[0104] The process of writing user behavior data into the database is explained below with reference to Figure 3.

[0105] Step 1: Retrieve a user behavior data entry from the message queue. The user behavior data includes information such as the user ID (identifier) and the behavior time. Using the user ID, read the bitmap structure of the current user stored in the Redis database. In other examples, other types of databases can also be used.

[0106] Step 2: Parse the bitmap structure and determine if it is a variable-length bitmap. If yes, proceed to step S1; otherwise, proceed to step S2.

[0107] S1: Parse the data and determine if the data length is less than the threshold. If yes, proceed with process K1; otherwise, proceed with process K2.

[0108] K1: Convert the behavior time into a time offset, write it to the data part, and generate a new bitmap structure.

[0109] K2: Convert the variable-length bitmap structure into a fixed-length bitmap structure, set the corresponding bits in the data part of the bitmap to 1 according to the behavior time, and then use the roaring-bitmap to compress the data part and generate a new bitmap structure.

[0110] S2: Use the roaring-bitmap to decompress the data portion, set the corresponding bit in the bitmap structure to 1 according to the behavior time, and then use the roaring-bitmap to compress the data portion to generate a new bitmap structure.

[0111] Step 3: Write the new bitmap structure to Redis based on the user ID. Retrieve the next message from the message queue and continue to Step 1.
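The write flow of Steps 1 to 3 can be sketched as below, under several loudly stated assumptions: a plain dictionary named `store` stands in for Redis, the bitmap structure is modeled as a dictionary with hypothetical `mode`, `base`, and `data` fields, the threshold is counted in stored offsets, times are in epoch seconds, a missing structure starts in variable-length mode, and roaring-bitmap compression is omitted for brevity.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400   # assumed: one data bit per second of the day
THRESHOLD = 4              # hypothetical preset threshold, in stored offsets

store: Dict[str, dict] = {}  # in-memory stand-in for the Redis database


def write_behavior(user_id: str, behavior_time: int, base_time: int) -> None:
    """Store one behavior time (epoch seconds) for a user."""
    # Step 1: read the user's bitmap structure; start in variable-length mode if absent.
    bm = store.get(user_id) or {"mode": "variable", "base": base_time, "data": []}
    offset = behavior_time - bm["base"]  # time offset from the base time
    if not 0 <= offset < SECONDS_PER_DAY:
        raise ValueError("behavior time falls outside the day covered by this bitmap")

    if bm["mode"] == "variable":                      # Step 2 -> S1
        if len(bm["data"]) < THRESHOLD:               # K1: append a new time unit
            bm["data"].append(offset)
        else:                                         # K2: convert to fixed-length mode
            bits = bytearray(SECONDS_PER_DAY // 8)
            for off in bm["data"] + [offset]:
                bits[off // 8] |= 1 << (off % 8)
            bm = {"mode": "fixed", "base": bm["base"], "data": bits}
    else:                                             # Step 2 -> S2
        bm["data"][offset // 8] |= 1 << (offset % 8)

    store[user_id] = bm                               # Step 3: write back


BASE = 1_663_516_800  # 2022-09-19 00:00:00 UTC+8, in epoch seconds
for second in (11, 3600, 7200, 9000, 13600):
    write_behavior("vhnvsprsgrboqqps", BASE + second, BASE)
print(store["vhnvsprsgrboqqps"]["mode"])  # fixed
```

With the assumed threshold of 4, the first four writes append offsets and the fifth triggers the conversion to fixed-length mode.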

[0112] The process of reading user behavior data from the database and querying the number of behaviors within a specific time period is explained below with reference to Figure 4.

[0113] Step 1: Read the bitmap structure from Redis based on the user ID in the query conditions. The query conditions include the user ID, the current time (local), and the time period T to be queried.

[0114] Step 2: Determine if the bitmap structure is in variable length mode. If yes, proceed to step S1; otherwise, proceed to step S2.

[0115] S1: Parse the data, iterate through the time offsets, convert each time offset into a behavior time using the base time, and count the number R of behavior times that fall within the range [local-T, local).

[0116] S2: Use the roaring-bitmap to decompress the data portion and count the number of bits R with a value of 1 in the range [local-T,local).

[0117] Step 3: Return the query result R, where R is the number of user actions corresponding to the user ID within the time period T from the current local time.
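The query flow above can be sketched in the same spirit. The dictionary layout (`mode`, `base`, `data`) and the function name `count_behaviors` are hypothetical, times are in epoch seconds, and decompression of the fixed-length data portion is left out.

```python
def count_behaviors(bm: dict, local: int, period: int) -> int:
    """Return R, the number of behaviors in the half-open range [local - period, local)."""
    lo, hi = local - period, local
    if bm["mode"] == "variable":
        # S1: turn each offset back into a behavior time and test it against the range.
        return sum(1 for off in bm["data"] if lo <= bm["base"] + off < hi)
    # S2: map the range to bit positions, clamped to the extent of the bitmap.
    start = max(lo - bm["base"], 0)
    end = min(hi - bm["base"], len(bm["data"]) * 8)
    return sum((bm["data"][i // 8] >> (i % 8)) & 1 for i in range(start, end))


BASE = 1_663_516_800  # 2022-09-19 00:00:00 UTC+8, in epoch seconds
variable_bm = {"mode": "variable", "base": BASE, "data": [11, 3600]}
fixed_bits = bytearray(86_400 // 8)
for off in (11, 3600):
    fixed_bits[off // 8] |= 1 << (off % 8)
fixed_bm = {"mode": "fixed", "base": BASE, "data": fixed_bits}

print(count_behaviors(variable_bm, BASE + 3600, 3600))  # 1
print(count_behaviors(fixed_bm, BASE + 3601, 3600))     # 2
```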

[0118] For example, suppose that all users together generate 100 million impressions online in a single day and that there are 5 million users, which is an average of 20 impressions per user. Here, "impression" refers to exposure to a specific object. For instance, when a friend publishes a post on social media and a user later scrolls past and sees it, that counts as one impression. The object in this context is the friend's post.

[0119] Assuming each user ID is a 16-byte string and the behavior time is a Long type (64 bits, 8 bytes), then if the exposure data or user behavior data is stored as is without structuring, the storage format is as shown in Table 1 below:

[0120]
User ID             Behavior Time
vhnvsprsgrboqqps    1663516811000 (2022-09-19 00:00:11)
vhnvsprsgrboqqps    1663520400000 (2022-09-19 01:00:00)
xrcufjemzmkpivqx    1663530400000 (2022-09-19 03:46:40)
...                 ...

[0121] Table 1

[0122] For 100 million exposures, each data entry occupies 24 bytes of storage space (16 bytes + 8 bytes), so 100 million data entries would occupy approximately 2.4 billion bytes (approximately 2.2 GB).

[0123] If the data storage method of this embodiment is used instead, the bitmap structure of the above user ID vhnvsprsgrboqqps can, for example, use the fixed-length mode to store that user's exposures for one day. Figure 5 shows the uncompressed fixed-length mode. In this example, the identifier portion indicates the date September 19, 2022, and the 11th and 3600th bits of the data portion have a value of 1, indicating that exposures occurred at 00:00:11 and 01:00:00 on September 19, 2022, respectively. After compression with the roaring bitmap, the record occupies 36 bytes (16 bytes for the user ID + 20 bytes for the compressed bitmap structure). Therefore, 5 million users would occupy approximately 0.16 GB. The comparison shows that this solution reduces the storage space by a factor of about 13.75 (2.2 GB versus 0.16 GB).
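The storage comparison can be checked with a few lines of arithmetic. The sketch below assumes that "GB" in the text means 2^30 bytes, which is the reading under which 2.4 billion bytes comes out at about 2.2 GB; all variable names are illustrative. Note that the unrounded ratio is about 13.3, while 13.75 is the quotient of the rounded figures 2.2 and 0.16.

```python
GIB = 2**30  # assumption: "GB" in the text is read as 2^30 bytes

raw_bytes = 100_000_000 * (16 + 8)    # 100 million entries: user ID + Long timestamp
bitmap_bytes = 5_000_000 * (16 + 20)  # 5 million users: user ID + compressed bitmap

raw_gib = raw_bytes / GIB
bitmap_gib = bitmap_bytes / GIB
ratio = raw_bytes / bitmap_bytes

print(f"{raw_gib:.2f}")     # 2.24
print(f"{bitmap_gib:.3f}")  # 0.168
print(f"{ratio:.2f}")       # 13.33
```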

[0124] Furthermore, consider the computational time complexity when querying data. If the storage and querying method of this embodiment is not used, and the data is instead stored as is and traversed for querying, then the 100 million exposures generated by users in a day would require traversing 100 million data entries, comparing the user ID and behavior time of each entry against the query conditions, and returning the number of matching entries. The time complexity is O(M), where M is the total number of exposures. This means that the query time depends on the total amount of data stored in the database, resulting in low query efficiency and significant time consumption.

[0125] If the above method of this embodiment is used for querying, the bitmap can be retrieved directly from Redis by user ID; because of the indexing mechanism, the time complexity of this lookup is O(1). Then the number of exposures within the most recent T seconds is queried from the data portion of the bitmap structure, with a time complexity of O(T). The larger M is relative to T, the greater the efficiency advantage of this method. The data storage method of this embodiment can be applied to fields such as recommendation, online broadcasting, and search, where it is used to store data such as user clicks, user exposures, and user search volume, and it can store the generated data in real time.

[0126] Figure 6 is a block diagram illustrating a data storage device according to at least one embodiment of the present disclosure. As shown in Figure 6, the device comprises:

[0127] The data reading module 61 is used to: acquire target data to be stored, read the bitmap structure in the database according to the target identifier in the target data, and determine the mode of the bitmap structure;

[0128] The data writing module 62 is used to: when it is determined that the bitmap structure is in a variable length mode, write the target time in the target data into a newly added time unit in the data part of the bitmap structure, wherein the data part of the bitmap structure in the variable length mode contains at least one time unit;

[0129] When the bitmap structure is determined to be in fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

[0130] In one implementation, the data writing module 62, when used to write the target time from the target data into the newly added time unit of the data portion of the bitmap structure, specifically performs the following:

[0131] The target time in the target data is converted into a time offset, and the time offset is written into the newly added time unit in the data part of the bitmap structure.

[0132] In one implementation, the data writing module 62, when used to write the target time from the target data into a newly added time unit in the data portion of the bitmap structure when it is determined that the bitmap structure is in a variable-length mode, is specifically used for:

[0133] Parse the data portion of the bitmap structure and determine whether the data length of the data portion is less than a preset threshold.

[0134] If the data length is less than the preset threshold, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure.

[0135] In one embodiment, the data writing module 62 is further configured to: if the data length is not less than the preset threshold, convert the bitmap structure of the variable-length mode into the bitmap structure of the fixed-length mode, and set the corresponding data bits in the data portion of the bitmap structure according to the target time in the target data.

[0136] In one implementation, as shown in Figure 7, the device further includes a threshold calculation module 60, configured to: acquire multiple historical data before acquiring the target data to be stored; for each historical data, convert the historical data into a first bitmap structure of the variable-length mode and a second bitmap structure of the fixed-length mode, and calculate the data length deviation between the first bitmap structure and the second bitmap structure; determine the target data length deviation that is the smallest positive value among the multiple data length deviations; and determine the data length of the data portion of the first bitmap structure of the variable-length mode corresponding to the target data length deviation as the preset threshold.
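The threshold selection performed by the threshold calculation module can be sketched as follows. The text does not state the sign convention of the data length deviation; this sketch assumes it is the variable-length data length minus the fixed-length data length, so the smallest positive deviation marks the point where the variable-length form has just become larger. The function name `choose_threshold` and the pre-measured byte lengths are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def choose_threshold(lengths: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Pick the preset threshold from (variable_len, fixed_len) pairs, one per historical sample."""
    best = None  # (deviation, variable_len) of the smallest positive deviation seen so far
    for variable_len, fixed_len in lengths:
        deviation = variable_len - fixed_len  # assumed sign convention
        if deviation > 0 and (best is None or deviation < best[0]):
            best = (deviation, variable_len)
    # The threshold is the variable-length data length of the selected sample.
    return None if best is None else best[1]


# Hypothetical measured lengths in bytes for four historical samples.
samples = [(8, 20), (16, 20), (24, 20), (32, 22)]
print(choose_threshold(samples))  # 24
```

If no sample yields a positive deviation, the sketch returns `None`; how the method handles that case is not described in the text.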

[0137] In one embodiment, the data portion of the bitmap structure in the fixed-length mode is stored in a compressed state in the database. The data writing module 62, when setting the corresponding data bit in the data portion of the bitmap structure according to the target time in the target data, is specifically used for:

[0138] The compressed data portion in the bitmap structure is decompressed to obtain the uncompressed data portion.

[0139] Based on the target time in the target data, set the corresponding data bits in the uncompressed data portion;

[0140] The data portion in which the data bits have been set is compressed to generate a new bitmap structure.
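The decompress, set, and recompress cycle can be sketched as below. The text uses a roaring bitmap for compression; this sketch swaps in Python's standard `zlib` module purely as a stand-in so that it runs without third-party packages, and the function name `set_bit_compressed` is hypothetical.

```python
import zlib


def set_bit_compressed(compressed: bytes, offset: int) -> bytes:
    """Decompress the data portion, set one data bit, and compress it again."""
    bits = bytearray(zlib.decompress(compressed))   # uncompressed data portion
    bits[offset // 8] |= 1 << (offset % 8)          # set the bit for the target time
    return zlib.compress(bytes(bits))               # new compressed data portion


# Start from an all-zero day at an assumed one-second resolution (zlib stands in for roaring).
blob = zlib.compress(bytes(86_400 // 8))
blob = set_bit_compressed(blob, 11)     # 00:00:11
blob = set_bit_compressed(blob, 3600)   # 01:00:00
print(len(blob) < 86_400 // 8)          # True: the sparse bitmap compresses well
```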

[0141] In one implementation, the data reading module 61, when used to determine the mode of the bitmap structure, is specifically used for:

[0142] The mode of the bitmap structure is determined based on the identifier portion of the bitmap structure.

[0143] In one embodiment, the device further includes: a data query module 63, used for:

[0144] In response to receiving a query request, based on the target identifier in the query conditions of the query request, the bitmap structure in the database is read, and the mode of the bitmap structure is determined; the query conditions also include the time range to be queried; the query request is used to query the number of target data corresponding to the target identifier within the time range;

[0145] When it is determined that the bitmap structure is in variable length mode, the data part of the bitmap structure is parsed, the time units in the data part are traversed, and the number of target time units whose time stored in the time unit is within the time range is determined.

[0146] When the bitmap structure is determined to be in a fixed-length mode, the number of data bits that have been set in at least one storage location of the data portion corresponding to the time range is determined.

[0147] In one implementation, the target identifier is a user identifier, the target data is user behavior data, and the target time is behavior time;

[0148] The data reading module 61, when used to read the bitmap structure in the database based on the target identifier in the target data, is specifically used for:

[0149] The bitmap structure of the target user is read from the database based on the user identifier in the user behavior data.

[0150] The specific implementation process of the functions and roles of each module in the above device can be found in the implementation process of the corresponding steps in the above method, and will not be repeated here.

[0151] This disclosure also provides an electronic device. As shown in Figure 8, the electronic device includes a memory 81 and a processor 82. The memory 81 is used to store computer instructions that can be run on the processor, and the processor 82 is used to implement the data storage method described in any embodiment of this disclosure when executing the computer instructions.

[0152] This disclosure also provides a computer program product, which includes a computer program / instructions that, when executed by a processor, implement the data storage method described in any embodiment of this disclosure.

[0153] This disclosure also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the data storage method described in any embodiment of this disclosure.

[0154] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of the solution in this specification according to actual needs. Those skilled in the art can understand and implement this without creative effort.

[0155] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.

[0156] Other embodiments of this specification will readily occur to those skilled in the art upon consideration of the specification and practice of the invention claimed herein. This specification is intended to cover any variations, uses, or adaptations that follow the general principles of this specification and include common knowledge or customary techniques in the art not claimed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this specification are indicated by the following claims.

[0157] It should be understood that this specification is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this specification is limited only by the appended claims.

[0158] The above description is merely a preferred embodiment of this specification and is not intended to limit this specification. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this specification should be included within the scope of protection of this specification.

Claims

1. A data storage method, characterized by, The method includes: Obtain the target data to be stored, read the bitmap structure from the database based on the target identifier in the target data, and determine the mode of the bitmap structure; When the bitmap structure is determined to be in variable length mode, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure. The data part of the bitmap structure in variable length mode contains at least one time unit, and the target time is the generation time of the target data. When the bitmap structure is determined to be in fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

2. The method of claim 1, wherein, The step of adding a new time unit to the data portion of the bitmap structure by writing the target time from the target data includes: The target time in the target data is converted into a time offset, and the time offset is written into the newly added time unit in the data part of the bitmap structure.

3. The method of claim 1, wherein, When it is determined that the bitmap structure is in variable length mode, writing the target time from the target data into a newly added time unit in the data portion of the bitmap structure includes: Parse the data portion of the bitmap structure and determine whether the data length of the data portion is less than a preset threshold. If it is less than the preset threshold, the target time in the target data is written into the newly added time unit of the data part of the bitmap structure.

4. The method of claim 3, wherein, The method further includes: If it is not less than the preset threshold, the bitmap structure of the variable length mode is converted into the bitmap structure of the fixed length mode, and the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data.

5. The method of claim 3, wherein, Before acquiring the target data to be stored, the method further includes: Obtain multiple historical data; For each piece of historical data, the historical data is converted into a first bitmap structure of the variable-length mode and a second bitmap structure of the fixed-length mode, and the data length deviation between the first bitmap structure and the second bitmap structure is calculated. Determine the target data length deviation that is the smallest positive value among the multiple data length deviations; The data length of the data portion of the first bitmap structure of the variable-length mode corresponding to the target data length deviation is determined as the preset threshold.

6. The method of claim 1, wherein, The data portion of the bitmap structure in the fixed-length mode is stored in a compressed state in the database. Setting the corresponding data bits in the data portion of the bitmap structure according to the target time in the target data includes: The compressed data portion in the bitmap structure is decompressed to obtain the uncompressed data portion. Based on the target time in the target data, set the corresponding data bits in the uncompressed data portion; The data portion in which the data bits have been set is compressed to generate a new bitmap structure.

7. The method of claim 1, wherein, The determination of the mode of the bitmap structure includes: The mode of the bitmap structure is determined based on the identifier portion of the bitmap structure.

8. The method of claim 1, wherein, The method further includes: In response to receiving a query request, based on the target identifier in the query conditions of the query request, the bitmap structure in the database is read, and the mode of the bitmap structure is determined; the query conditions also include the time range to be queried; the query request is used to query the number of target data corresponding to the target identifier within the time range; When it is determined that the bitmap structure is in variable length mode, the data part of the bitmap structure is parsed, the time units in the data part are traversed, and the number of target time units whose time stored in the time unit is within the time range is determined. When the bitmap structure is determined to be in a fixed-length mode, the number of data bits that have been set in at least one storage location of the data portion corresponding to the time range is determined.

9. The method according to any one of claims 1 to 8, characterized in that, The target identifier is a user identifier, the target data is user behavior data, and the target time is the behavior time; The step of reading the bitmap structure from the database based on the target identifier in the target data includes: The bitmap structure of the target user is read from the database based on the user identifier in the user behavior data.

10. A data storage device, characterized by The device includes: The data reading module is used to: acquire target data to be stored, read the bitmap structure in the database according to the target identifier in the target data, and determine the mode of the bitmap structure; The data writing module is used to: when it is determined that the bitmap structure is in variable length mode, write the target time in the target data into the newly added time unit of the data part of the bitmap structure, wherein the data part of the bitmap structure in variable length mode contains at least one time unit, and the data length of the data part of the bitmap structure in variable length mode increases after each writing of the target time, wherein the target time is the generation time of the target data; When the bitmap structure is determined to be in fixed-length mode, the corresponding data bits in the data part of the bitmap structure are set according to the target time in the target data. The storage location of each data bit in the data part of the fixed-length bitmap structure corresponds to a time point.

11. An electronic device, characterized in that, The device includes a memory and a processor, the memory being used to store computer instructions that can run on the processor, and the processor being used to implement the method of any one of claims 1 to 9 when executing the computer instructions.

12. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the method according to any one of claims 1 to 9.

Citation Information

Patent Citations

  • CN108509592A: Redis (Remote dictionary server)-based data storage method and reading method and device
  • CN113742332A: Data storage method and device, equipment and storage medium
  • GB0306838D0: Variable-length / fixed-length data conversion method and apparatus