A big data storage and retrieval method in a high concurrency scenario

By aligning variable-length data and introducing multi-dimensional attribute bitmaps and physical tombstone bitmap mechanisms, the problems of cache consistency conflicts and write amplification effects in high-concurrency scenarios are solved, improving system throughput and query response speed, and achieving efficient storage and retrieval.

CN122240522A · Pending · Publication Date: 2026-06-19 · BEIJING BAIGUAN TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING BAIGUAN TECHNOLOGY CO LTD
Filing Date: 2026-03-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In high-concurrency scenarios, traditional storage engines are prone to cache consistency conflicts, write amplification effects, and excessive garbage collection overhead when processing variable-length data, leading to reduced system throughput and increased query response latency, making it difficult to meet high availability requirements.

Method used

Variable-length data is aligned to the hardware cache line size and padded with placeholder bytes, and multi-dimensional attribute bitmaps together with a physical tombstone bitmap mechanism provide strict isolation and asynchronous processing of data packet envelopes. An invalid byte accumulator is used to calculate the fragmentation rate, which optimizes the storage and retrieval process.

Benefits of technology

It effectively prevents cache consistency conflicts, reduces write amplification, improves data write throughput, reduces query latency, and ensures efficient reclamation of storage space and retrieval consistency.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to the technical field of computer data processing and underlying storage architecture, and discloses a method for big data storage and retrieval in high-concurrency scenarios. The method aligns variable-length data together with its metadata header to the hardware cache line size, appends placeholder bytes to form a data packet envelope, and appends the envelope to an active data block. When the capacity limit threshold is reached, a physical barrier flush command is issued to convert the active data block into a closed data block. A discrete bit interval is calculated from the absolute physical offset and the envelope length, and a single-instruction multiple-data (SIMD) instruction atomically sets the corresponding bits in the physical tombstone bitmap while the invalid byte accumulator is updated. For retrieval, active data blocks are scanned linearly, asynchronous read requests are generated for closed data blocks, and the results are merged and deduplicated. Based on the physical fragmentation rate, a hardware instruction for calculating the Hamming weight is called to count the set bits and locate the target data block, whose surviving data is migrated and whose space is reclaimed. This invention eliminates concurrent false sharing and write amplification effects, improving system throughput.

Description

Technical Field

[0001] This invention relates to the technical field of computer data processing and underlying storage architecture, specifically a method for big data storage and retrieval in high-concurrency scenarios.

Background Technology

[0002] With the increasing prevalence of high-concurrency business scenarios, underlying storage systems need to handle massive, variable-length data writes and real-time query requests. Traditional storage engines typically employ a compact, contiguous memory allocation mechanism when handling concurrent ingestion of variable-length data. In a multi-core, multi-threaded architecture, this contiguous stacking of data without physical alignment and isolation can easily lead to memory addresses operated on by different threads falling within the same underlying hardware cache line, thereby causing cache consistency conflicts at the underlying bus level and severely limiting the overall multi-threaded write throughput of the system.

[0003] For modifying or logically deleting historical data, existing technologies mostly rely on in-situ overwriting or backend structure merging mechanisms. In-situ overwriting requires reading the original physical data page and acquiring an exclusive lock on the entire page, which can cause significant input / output amplification and read / write lock contention and blocking under high load. While append-only structure merging mechanisms alleviate single-point write conflicts, they often require a full scan of the physical file to determine data invalidation states during space compaction and garbage collection, which significantly consumes the disk bandwidth required for frontend business retrieval.

[0004] Furthermore, in mixed read / write scenarios, the real-time ingested active data in memory and the closed historical data already written to disk are physically separated. Conventional concurrent query processes, to ensure strong consistency of global data, must introduce complex global mutex locks or high-overhead multi-version control linked list traversal. This overhead of lock waiting and addressing across physical media leads to a significant increase in long-tail latency in query responses when dealing with sudden surges in concurrent traffic, making it difficult to meet the stringent timeliness requirements of high-availability systems.

[0005] Therefore, this invention proposes a method for big data storage and retrieval in high-concurrency scenarios to address the shortcomings of existing technologies.

Summary of the Invention

[0006] To address the shortcomings of existing technologies, this invention provides a method for storing and retrieving big data in high-concurrency scenarios, which solves the problems of false sharing conflicts in the underlying cache, write amplification effect, and excessive garbage collection overhead that are easily caused by variable-length data access in high-concurrency read-write mixed scenarios.

[0007] To achieve the above objectives, the present invention provides a method for big data storage and retrieval in high-concurrency scenarios, comprising the following steps: aligning the total length of the variable-length data concatenated with its generated metadata header to the hardware cache line size, calculating the envelope length, and appending placeholder bytes to form a data packet envelope; appending the data packet envelope to the data block in the active state, updating the multi-dimensional attribute bitmap set, and, when the physical length of the data reaches the capacity limit threshold, issuing a physical barrier flush command to convert the data block in the active state into a data block in the closed state; calculating the discrete bit interval based on the absolute physical offset of the data to be processed and the envelope length, performing atomic setting in the physical tombstone bitmap using a single-instruction multiple-data (SIMD) instruction, and performing atomic accumulation on the associated invalid byte accumulator; performing a linear scan on data blocks in the active state, generating asynchronous read requests for data blocks in the closed state based on the hit physical offset, merging the search results, and removing duplicates using the record primary key; and calculating the physical fragmentation rate based on the invalid byte accumulator, invoking, when the trigger threshold is exceeded, the hardware instruction for calculating the Hamming weight to count the number of bits set in the physical tombstone bitmap and thereby locate the target data block, migrating the surviving data of the target data block, and reclaiming the original physical space.

[0008] Preferably, the step of aligning the total length of the variable-length data concatenated with the generated metadata header to the hardware cache line size, calculating the envelope length, and appending placeholder bytes to form the data packet envelope includes: obtaining the original data length of the variable-length data, and generating the metadata header containing the original data length, logical write timestamp, data version number, and cyclic redundancy checksum; obtaining the envelope length by adding the original data length to the fixed length of the metadata header, dividing the sum by the hardware cache line size, rounding up, and then multiplying by the hardware cache line size; and, based on the envelope length, appending zero-filled placeholder bytes to the end of the variable-length data.

[0009] Preferably, appending the data packet envelope to the active data block, updating the multi-dimensional attribute bitmap set, and issuing a physical barrier flush command when the physical length of the data reaches the capacity limit threshold to convert the active data block into a closed data block includes: recording the absolute physical offset of the data packet envelope in the storage layer, extracting the target business attributes, and using the compare-and-swap instruction to set the bit corresponding to the global identifier of the active data block to 1 in the multi-dimensional attribute bitmap set; monitoring the physical length of the accumulated written data in the active data block, and triggering an asynchronous write command when it crosses an integer multiple of the physical sector boundary; and, when the physical length of the accumulated written data reaches the capacity limit threshold, extracting the primary keys of all variable-length data contained in the active data block to generate a hash dictionary, appending the hash dictionary to the end of the active data block, issuing the physical barrier flush command, and, after the underlying physical medium returns a synchronization success confirmation message, converting the active data block into a closed data block.

[0010] Preferably, calculating the discrete bit interval based on the absolute physical offset of the data to be processed and the envelope length includes: Divide the absolute physical offset by the hardware cache line size and round down to obtain the starting bit index; Subtract one from the sum of the absolute physical offset and the envelope length, divide by the hardware cache line size, and round down to obtain the end bit index; The discrete bit range is defined by the start bit index and the end bit index.

[0011] Preferably, performing atomic setting in the physical tombstone bitmap using a single-instruction multiple-data (SIMD) instruction and performing atomic accumulation on the associated invalid byte accumulator includes: invoking the Advanced Vector Extensions (AVX) instruction set as the SIMD instruction to set all bits of the discrete bit interval within the physical tombstone bitmap to 1 in parallel; configuring the invalid byte accumulator as an unsigned integer variable based on the spatial location of the data to be processed; and, using the atomic fetch-and-add instruction, adding the envelope length of the data to be processed directly to the invalid byte accumulator.
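
As a non-limiting illustration, the marking step can be sketched in Python as follows. Python exposes neither SIMD stores nor atomic fetch-and-add, so a lock stands in for the hardware atomics; the class name `TombstoneSegment` and its methods are hypothetical and do not appear in the application.

```python
import threading

CACHE_LINE = 64  # bytes covered by one tombstone bit (preferred value in the text)


class TombstoneSegment:
    """Tombstone bitmap plus invalid byte accumulator for one storage segment."""

    def __init__(self, capacity_bytes: int):
        n_bits = capacity_bytes // CACHE_LINE
        self.bits = bytearray((n_bits + 7) // 8)
        self.invalid_bytes = 0
        # Assumption: a lock replaces the atomic OR / fetch-and-add of real hardware.
        self._lock = threading.Lock()

    def mark_dead(self, start_bit: int, end_bit: int, envelope_len: int) -> None:
        """Set every bit in [start_bit, end_bit] and add the envelope length."""
        with self._lock:
            for i in range(start_bit, end_bit + 1):
                self.bits[i >> 3] |= 1 << (i & 7)
            self.invalid_bytes += envelope_len

    def is_dead(self, bit: int) -> bool:
        return bool(self.bits[bit >> 3] & (1 << (bit & 7)))
```

In a native implementation the loop over the interval would be replaced by wide vector stores, and the accumulator update by a single atomic add.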

[0012] Preferably, performing a linear scan of the data blocks in the active state includes: invoking the read memory barrier instruction to force a flush of the local cache, and performing a lock-free linear scan from the starting address of the active data block to the current write cursor; when parsing the metadata header of a data packet envelope, extracting the actual length and the cyclic redundancy checksum; and reading the variable-length data payload based on the actual length and recalculating the cyclic redundancy checksum, the data packet envelope being extracted as a real-time record residing in memory if and only if the recalculated checksum is consistent with the cyclic redundancy checksum stored in the metadata header.
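
As a non-limiting illustration, the checksum-guarded scan can be sketched in Python as follows. The header layout (`HEADER`), the use of CRC-32 from `zlib`, and the decision to stop at the first mismatch are assumptions made for this sketch; the memory barrier has no Python equivalent and is omitted.

```python
import struct
import zlib

CACHE_LINE = 64
# Hypothetical header layout: payload length, logical timestamp, version, CRC-32.
HEADER = struct.Struct("<IQII")


def checksum(payload: bytes) -> int:
    # CRC over the length field followed by the payload content.
    return zlib.crc32(struct.pack("<I", len(payload)) + payload)


def scan_active_block(block, write_cursor: int):
    """Yield (offset, timestamp, version, payload) for every intact envelope."""
    offset = 0
    while offset + HEADER.size <= write_cursor:
        length, ts, version, crc = HEADER.unpack_from(block, offset)
        end = offset + HEADER.size + length
        if end > write_cursor:
            break  # envelope extends past the cursor: still being written
        payload = bytes(block[offset + HEADER.size:end])
        if checksum(payload) != crc:
            break  # torn write: the length field cannot be trusted, so stop
        yield offset, ts, version, payload
        # Advance by the envelope length (header + payload rounded up to a line).
        offset += -(-(HEADER.size + length) // CACHE_LINE) * CACHE_LINE
```

Stopping at the first failed checksum, rather than skipping ahead, reflects that in an append-only block a mismatch can only occur at the tail that a writer has not yet completed.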

[0013] Preferably, for data blocks in a closed state, an asynchronous read request is generated based on the hit physical offset, including: The starting physical page index is calculated by dividing the hit physical offset by the physical page size and rounding down; the ending physical page index is calculated by subtracting one from the sum of the hit physical offset and the payload length, dividing by the physical page size, and rounding down. All calculated physical page indexes are arranged in ascending order of numerical value to form a page sequence; the page sequence is traversed to calculate the difference between adjacent physical page indexes. When the difference is less than or equal to the span threshold, they are merged into a continuous asynchronous read request. When the difference is greater than the span threshold, the page sequence is split into multiple independent asynchronous read requests.
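
As a non-limiting illustration, the page index calculation and the merge-or-split rule can be sketched in Python as follows. The function name `plan_reads` and the default span threshold of 1 are assumptions; the application does not state a threshold value.

```python
PAGE_SIZE = 4096


def plan_reads(hits, span_threshold=1, page_size=PAGE_SIZE):
    """Group hit pages into read requests.

    hits: iterable of (physical_offset, payload_length), length > 0.
    Returns a list of (first_page, last_page) ranges, one per request.
    """
    pages = set()
    for offset, length in hits:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))

    requests = []
    for page in sorted(pages):
        if requests and page - requests[-1][1] <= span_threshold:
            requests[-1][1] = page        # close enough: extend the current request
        else:
            requests.append([page, page])  # gap too large: start a new request
    return [tuple(r) for r in requests]
```

With a span threshold above 1, a merged request also reads the unneeded pages inside the gap, trading a little extra bandwidth for fewer submissions.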

[0014] Preferably, merging search results and using record primary keys for deduplication includes: The historical records returned by the asynchronous read request are merged with the real-time records in full, and the primary key of each data record in the merged result is extracted. A hash table is constructed using the primary key of the record as the mapping feature. When an entry with the same primary key is detected in the hash bucket, the logical write timestamp, data version number, and state priority of the conflicting record are compared. The data record originating from the active data block and with the latest logical write timestamp is retained.
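
As a non-limiting illustration, the merge and deduplication step can be sketched in Python as follows. The `Record` type and the comparison order (timestamp, then version, then active-block priority) are assumptions; the application names the three criteria without fixing their precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    primary_key: str
    timestamp: int      # logical write timestamp
    version: int        # data version number
    from_active: bool   # True if read from the active (in-memory) block
    payload: bytes


def _rank(rec: Record):
    # Assumed precedence: newest timestamp, then version, then active over closed.
    return (rec.timestamp, rec.version, rec.from_active)


def merge_and_dedupe(historical, realtime):
    """Merge both result sets and keep one winning record per primary key."""
    best = {}
    for rec in list(historical) + list(realtime):
        kept = best.get(rec.primary_key)
        if kept is None or _rank(rec) > _rank(kept):
            best[rec.primary_key] = rec
    return list(best.values())
```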

[0015] Preferably, the physical fragmentation rate is calculated based on the invalid byte accumulator, and when it exceeds the trigger threshold, the hardware instruction for calculating the Hamming weight is invoked to count the number of bits set in the physical tombstone bitmap to locate the target data block, including: The physical fragmentation rate is obtained by dividing the total number of invalid bytes recorded in real time in the invalid byte accumulator by the pre-allocated total physical capacity. When the physical fragmentation rate exceeds the trigger threshold and the asynchronous input / output queue depth is lower than the idle threshold, the corresponding physical tombstone bitmap is loaded into the memory layer. The hardware instruction for calculating Hamming weight is invoked to count the total number of invalid bits marked in the physical tombstone bitmap, and the total number of bits is multiplied by the hardware cache line size to obtain the invalid byte size of each data block. The invalid byte size of each data block is sorted in descending order, and the target data block is located by combining the physical creation timestamp.
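
As a non-limiting illustration, the trigger condition and the target selection can be sketched in Python as follows. The function names are hypothetical, the bit count uses Python's integer arithmetic in place of a hardware population-count instruction, and preferring the older block on a tie is an assumption about how the physical creation timestamp is combined.

```python
CACHE_LINE = 64


def fragmentation_rate(invalid_bytes: int, total_capacity: int) -> float:
    return invalid_bytes / total_capacity


def should_compact(rate, trigger, queue_depth, idle_depth) -> bool:
    # Compaction runs only when fragmentation is high AND the I/O queue is idle.
    return rate > trigger and queue_depth < idle_depth


def invalid_bytes_in_block(tombstone_bits: bytes) -> int:
    # Hamming weight of the block's bitmap times the bytes covered per bit.
    set_bits = bin(int.from_bytes(tombstone_bits, "little")).count("1")
    return set_bits * CACHE_LINE


def pick_target(blocks):
    """blocks: dict of block_id -> (tombstone_bits, created_at).

    Returns the block with the most invalid bytes; ties go to the older block.
    """
    return min(
        blocks,
        key=lambda b: (-invalid_bytes_in_block(blocks[b][0]), blocks[b][1]),
    )
```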

[0016] Preferably, migrating and reclaiming the surviving data of the target data block from its original physical space includes: Based on the unset bit indices in the physical tombstone bitmap of the target data block, the physical offset of the surviving data is deduced in reverse. The live data is read and re-appended to the active data block; Using the compare and swap instruction, the bit position corresponding to the global identifier of the new active data block is set to 1 in the multidimensional attribute bitmap set. After confirming that the surviving data has been completely migrated, the bit position corresponding to the global identifier of the old data block is cleared to zero using the compare and swap instruction. A space release command is issued to the underlying file system to erase the physical pages occupied by the target data block in order to complete the physical reclamation.
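
As a non-limiting illustration, deriving the surviving regions from the unset bits can be sketched in Python as follows. The function name `live_extents` and the `used_lines` and `block_base` parameters are hypothetical. The sketch returns contiguous live byte ranges; splitting a range into individual envelopes would still require parsing each metadata header.

```python
CACHE_LINE = 64


def live_extents(tombstone_bits: bytes, used_lines: int, block_base: int = 0):
    """Return (physical_offset, length) runs whose tombstone bits are clear.

    used_lines: number of cache-line slots actually written in the block.
    block_base: absolute physical offset of the block's first byte.
    """
    extents = []
    run_start = None
    for i in range(used_lines):
        dead = (tombstone_bits[i >> 3] >> (i & 7)) & 1
        if not dead and run_start is None:
            run_start = i                      # a live run begins
        elif dead and run_start is not None:
            extents.append((block_base + run_start * CACHE_LINE,
                            (i - run_start) * CACHE_LINE))
            run_start = None                   # the live run ends
    if run_start is not None:
        extents.append((block_base + run_start * CACHE_LINE,
                        (used_lines - run_start) * CACHE_LINE))
    return extents
```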

[0017] This invention provides a method for big data storage and retrieval in high-concurrency scenarios. It has the following beneficial effects: 1. This invention performs spatial alignment calculations on variable-length data based on the size of the underlying hardware cache line, and appends placeholder bytes to encapsulate it into a data packet envelope whose length is an integer multiple of the cache line size, which is then appended to the active data block. This technical feature ensures that the memory regions operated on by concurrent write threads begin and end on cache line boundaries in physical space, preventing the false sharing interference that cache coherency protocols cause when multi-core processors concurrently access nearby memory addresses. Therefore, it improves the system's data write throughput without introducing operating system mutex locks.

[0018] 2. This invention introduces a physical tombstone bitmap mechanism for updating the state of historical data. Based on the absolute physical offset and envelope length of the data to be processed in the storage medium, a discrete bit interval is calculated, and a single-instruction multiple-data (SIMD) instruction is invoked directly to set the interval atomically. This feature replaces the physical overwrite of real data with a lightweight in-memory bitmap marking operation, completely eliminating blocking reads and write-backs of the original data page during update operations, avoiding write amplification on the underlying storage medium, and thus reducing the long-tail latency of query requests in mixed read-write concurrency scenarios.

[0019] 3. During the storage space reclamation phase, this invention calculates the physical fragmentation rate based on an invalid byte accumulator to achieve lazy triggering, and directly counts the number of bits set in the physical tombstone bitmap loaded into memory by calling hardware instructions for calculating Hamming weight. This mechanism avoids the input / output bus bandwidth consumed by a full scan of the underlying physical file during the garbage collection evaluation phase, and can accurately locate the target data block to be compacted with extremely low processor clock cycle overhead. At the same time, after the surviving data is migrated, compare-and-swap instructions are used to complete the mapping replacement in the multi-dimensional attribute bitmap, ensuring the addressing consistency of front-end data retrieval during background physical space reclamation.

Attached Figure Description

[0020] Figure 1 is a schematic diagram of the system architecture of the present invention; Figure 2 is a schematic diagram of the method flow of the present invention; Figure 3 is a schematic diagram of the multi-threaded concurrent write throughput test of the present invention; Figure 4 is a schematic diagram of the P99 long-tail latency test in a mixed read-write scenario according to the present invention.

[0021] Reference numerals: 10, memory layer; 20, storage layer; 100, write module; 200, update module; 300, retrieval module; 400, reclamation module.

Detailed Implementation

[0022] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0023] Refer to Figure 1, which is a schematic diagram of a big data storage and retrieval system architecture for high-concurrency scenarios according to an embodiment of the present invention. In terms of hierarchical topology, the system is divided into a memory layer 10 and a storage layer 20, which cooperate with each other; internally, the system includes a write module 100, an update module 200, a retrieval module 300, and a reclamation module 400.

[0024] The physical space of storage layer 20 is located under a preset absolute path on the storage medium. The absolute path is set to the directory /var/lib/db_engine/wal/. The system pre-allocates a fixed-capacity storage segment under the absolute path. Each storage segment is internally divided into fixed-length data blocks. The system opens the underlying storage medium in O_DIRECT mode through the device file interface, bypassing the operating system's page cache mechanism. The underlying storage medium is equipped with power-loss protection capacitors.

[0025] Memory layer 10 allocates a cache pool in user space. The cache pool is allocated via the posix_memalign interface and follows a 4096-byte physical page alignment standard. Data blocks have two states during system runtime: active and closed. Active data blocks reside in the cache pool of memory layer 10 and receive appended data. Closed data blocks are those for which hash dictionaries have been generated and physical barriers have been flushed to disk; they release their occupied space in the cache pool and become read-only.

[0026] The write module 100 handles the system's data write operations. The write module 100 performs spatial envelope calculation based on cache line alignment on the received variable-length data, generating a metadata header carrying the data length and checksum. The write module 100 continuously appends the metadata header, variable-length data, and placeholder bytes to the active data blocks in the cache pool, and updates the multi-dimensional attribute bitmap set in memory layer 10. When data crosses the underlying physical sector boundary, an asynchronous write instruction is triggered; when the capacity limit is reached, a physical barrier flush instruction is issued.

[0027] The update module 200 handles logical deletion and update operations for historical data. The update module 200 calculates discrete bit intervals based on the absolute physical offset and envelope length of the data to be processed in storage layer 20. The update module 200 uses a single-instruction multiple-data (SIMD) instruction to atomically set the interval in the physical tombstone bitmap and performs numerical accumulation on the invalid byte accumulator corresponding to the storage segment in memory layer 10.

[0028] The retrieval module 300 handles multi-dimensional attribute query operations. It performs a linear scan of active data blocks in the cache pool, using checksums to prevent fragmented data reading. For closed data blocks, the retrieval module 300 obtains the physical offset using a multi-dimensional attribute bitmap set, executes a page masking and hole truncation algorithm based on a 4096-byte boundary to generate an aggregated asynchronous read request, and then merges the two results and deduplicates before outputting.

[0029] The reclamation module 400 handles the reclamation of space in storage layer 20. The reclamation module 400 calculates the physical fragmentation rate as the ratio of the invalid byte accumulator to the total allocated space. When a set threshold is reached, the reclamation module 400 loads the physical tombstone bitmap to locate the data block with the highest number of invalid bytes, migrates the surviving data to the currently active data block, updates the multi-dimensional attribute bitmap set in memory layer 10 through compare-and-swap instructions, and finally reclaims the physical space occupied by the original data block.

[0030] Refer to Figure 2, which is a schematic flowchart of a big data storage and retrieval method for high-concurrency scenarios according to an embodiment of the present invention. The method comprises the following steps:
S100: perform variable-length data append writes based on cache line alignment, update the multi-dimensional attribute bitmap, and trigger asynchronous writes or physical barrier flushes based on the underlying physical boundaries to complete the data block state transition;
S200: calculate the discrete bit interval corresponding to the data to be processed and perform atomic setting in the persistent physical tombstone bitmap, while performing numerical accumulation on the invalid byte accumulator in the memory layer;
S300: concurrently scan active data blocks in the cache pool and closed data blocks in the storage layer, execute page masking and hole truncation algorithms to generate asynchronous read requests, merge and deduplicate, and output the final retrieval results;
S400: load the physical tombstone bitmap based on the physical fragmentation rate to locate the data block with the highest number of invalid bytes, migrate the surviving data and update the multi-dimensional attribute bitmap, and reclaim the original physical space.

[0031] To further clarify the implementation of each technical aspect of the present invention, the following will provide a detailed description of the implementation of each functional module involved above and its internal processing flow.

[0032] In this embodiment, the detailed implementation of step S100 is as follows: data append writing and state transition based on cache line alignment. The write module 100 handles the underlying operating mechanism of variable-length data ingestion, multi-dimensional attribute bitmap projection, and submission of underlying physical media barriers. Specifically, it includes the following sub-steps: S101, the write module 100 calculates the envelope length of the variable-length data, which is aligned upwards to the 64-byte cache line boundary, and generates a metadata header containing the actual length and checksum.

[0033] In this embodiment, modern processors typically use the cache line as the basic unit of memory access. To avoid cache coherency traffic storms (i.e., false sharing) caused by multiple threads concurrently accessing nearby memory addresses, the write module 100 obtains the original data length of the variable-length data when it receives a variable-length data write request. Further, the write module 100 generates a fixed-length metadata header, which records the original data length, the logical write timestamp obtained from the system clock, and the initialized data version number. A cyclic redundancy checksum is calculated over the original data length and the variable-length data content and stored in the metadata header. The write module 100 employs an adaptive space filling mechanism based on hardware cache line alignment: it rounds the total length of the variable-length data and the metadata header up to a multiple of the hardware cache line size to obtain the envelope length. The formula for calculating the aligned envelope length is:

$$L_{env} = \left\lfloor \frac{L_{data} + L_{head} + C - 1}{C} \right\rfloor \times C$$

In the formula, $L_{env}$ indicates the envelope length; $L_{data}$ indicates the original data length of the variable-length data, whose value is determined dynamically by the data body of the external business write request; $L_{head}$ indicates the fixed length of the metadata header, which is the sum of the system's preset metadata structure fields; $C$ indicates the size of the underlying hardware cache line, which as a preferred implementation is fixed at 64 bytes to match the L1 data cache line specification of mainstream processors; and $\lfloor \cdot \rfloor$ represents the floor function operator. It is important to note that this formula strictly quantizes space so that variable-length data of arbitrary length is encapsulated into an integer multiple of the cache line size, achieving physical isolation at the memory operation level.

[0034] Based on the aforementioned envelope length, the writing module 100 appends placeholder bytes to the end of the variable-length data, ensuring that the total physical length of the concatenated data equals the envelope length, thereby forming an aligned data envelope. Preferably, the placeholder bytes are filled with zero-valued data.
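
As a non-limiting illustration of sub-step S101, building an aligned envelope can be sketched in Python as follows. The 20-byte header layout (`HEADER`) and the use of CRC-32 from `zlib` are assumptions; the application specifies which fields the header carries but not their widths or the checksum polynomial.

```python
import struct
import zlib

CACHE_LINE = 64
# Hypothetical header layout: payload length, logical timestamp, version, CRC-32.
HEADER = struct.Struct("<IQII")


def envelope_length(data_len: int, header_len: int = HEADER.size,
                    line: int = CACHE_LINE) -> int:
    """Round header + payload up to the next multiple of the cache line size."""
    return -(-(data_len + header_len) // line) * line


def build_envelope(payload: bytes, timestamp: int, version: int = 0) -> bytes:
    """Return header + payload + zero padding, sized to the envelope length."""
    crc = zlib.crc32(struct.pack("<I", len(payload)) + payload)
    body = HEADER.pack(len(payload), timestamp, version, crc) + payload
    padding = envelope_length(len(payload)) - len(body)
    return body + bytes(padding)
```

Under this assumed layout, a 5-byte payload occupies one 64-byte envelope, and a 45-byte payload spills into a second cache line for a 128-byte envelope.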

[0035] S102, the write module 100 continuously appends the aligned data packet envelope to the active data blocks in the cache pool of memory layer 10, and updates the multi-dimensional attribute bitmap set of memory layer 10 in a lock-free manner.

[0036] Based on the completed space alignment encapsulation, the cache pool is configured as a direct memory access cache region conforming to the 4096-byte physical page alignment standard. The write module 100 obtains the memory write pointer of the active data block in the cache pool, sequentially writes the data packet envelope to the location of the memory write pointer, and records the absolute physical offset of the data packet envelope in the storage layer 20. The absolute physical offset serves as the addressing reference for subsequent retrieval and status update operations.

[0037] After appending the data, the write module 100 extracts the target business attribute of the variable-length data and locates the corresponding multi-dimensional attribute bitmap set in memory layer 10. The multi-dimensional attribute bitmap set consists of multiple discrete bitmap structures resident in memory. The write module 100 uses atomic operation instructions to set the bit corresponding to the global identifier of the active data block in the multi-dimensional attribute bitmap set to 1. Specifically, the write module 100 relies on the compare and swap instructions provided by the underlying hardware to implement state projection. This mechanism can effectively ensure that update operations in a multi-threaded concurrent execution environment do not generate read-write conflicts without introducing operating system mutex lock scheduling.
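
As a non-limiting illustration, the compare-and-swap retry loop used for state projection can be sketched in Python as follows. Python has no hardware compare-and-swap, so the hypothetical `AtomicWord` class emulates one with a lock purely to show the control flow; a native implementation would use the processor instruction directly.

```python
import threading


class AtomicWord:
    """Software stand-in for a machine word with hardware compare-and-swap."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()  # emulation only, not part of the method

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def set_block_bit(word: AtomicWord, block_id: int) -> None:
    """Set the bit for block_id, retrying if another thread changed the word."""
    mask = 1 << block_id
    while True:
        old = word.load()
        if old & mask:
            return  # bit already set, nothing to do
        if word.compare_and_swap(old, old | mask):
            return  # our update won; otherwise reload and retry
```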

[0038] S103, when the appended data crosses the boundary of the underlying 4096-byte physical sector, the write module 100 triggers an asynchronous write command, and when the data block reaches the capacity limit, it issues a physical barrier flush command to complete the closed state transition.

[0039] In the embodiments of this disclosure, the write module 100 implements a dual-state sector batching and physical barrier degradation submission mechanism in the cache pool. To establish precise alignment with the underlying storage hardware, the physical sector boundary size of the underlying physical medium is set to 4096 bytes. This value maps directly to the minimum physical programming page size of mainstream NAND flash memory chips, and is chosen to avoid the read-modify-write amplification caused by misaligned writes. During operation, the write module 100 monitors the cumulative physical length of written data in active data blocks in real time. When the cumulative physical length of written data crosses an integer multiple of the physical sector boundary, the write module 100 triggers an asynchronous write instruction through the asynchronous input / output interface, submitting the full 4096-byte memory block data to the hardware cache of the underlying physical medium. The calling logic of the asynchronous input / output interface can be implemented by those skilled in the art using existing asynchronous submission queue technology; its task distribution mechanism is well known in the field and will not be described further here.
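
As a non-limiting illustration, detecting which full sectors become ready for asynchronous submission after an append can be sketched in Python as follows. The function name `sectors_to_submit` is hypothetical, and the actual submission through an asynchronous input / output queue is left out.

```python
SECTOR = 4096


def sectors_to_submit(prev_len: int, new_len: int, sector: int = SECTOR):
    """Return (start, end) byte ranges of sectors completed by this append.

    prev_len / new_len: cumulative written length before and after the append.
    A sector is submitted once the write cursor has reached or passed its end.
    """
    first = prev_len // sector
    last = new_len // sector
    return [(i * sector, (i + 1) * sector) for i in range(first, last)]
```

An append that stays inside the current sector yields an empty list, while a single large envelope may complete several sectors at once.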

[0040] In addition, a capacity limit threshold is set for the data block. This capacity limit threshold refers to the maximum physical storage space pre-allocated by the system for a single data block. As a preferred approach, the value of this capacity limit threshold is set between 64KB and 256KB. The logic for determining this threshold range is not based on a single extreme value, but rather on a weighted balance calculation based on the physical carrying capacity limit of the solid-state drive's concurrent I / O operations per second and the latency overhead of a single synchronization barrier instruction. When the physical length of the accumulated written data in an active data block reaches this capacity limit threshold, the write module 100 extracts the primary keys of all variable-length data contained in the data block, generates a fixed-length hash dictionary, and appends the hash dictionary to the end of the data block. Here, the hash dictionary serves as a verification identifier for the closed state of the data block.

[0041] Subsequently, the write module 100 issues a forced synchronization command containing a physical barrier flush operation to the asynchronous input/output queue. The physical barrier flush operation forces the underlying physical medium to synchronously write the sector data residing in the hardware cache to the non-volatile memory chips. After the underlying physical medium returns a synchronization success confirmation, the write module 100 changes the state of the data block from active to closed and releases the write buffer that the data block occupies in the cache pool.
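As a non-limiting illustration, the closing sequence (capacity check, hash dictionary, barrier flush, state change) can be sketched as follows. The dictionary layout (one 8-byte BLAKE2b digest per primary key), the function name `maybe_close_block`, and the use of `os.fsync` as a stand-in for the physical barrier flush are all assumptions made for this sketch; the disclosure does not specify them.

```python
import hashlib
import os


def maybe_close_block(fd: int, written_len: int, primary_keys: list,
                      capacity_limit: int = 64 * 1024) -> str:
    """Close the block behind file descriptor fd once written_len reaches
    capacity_limit; return the resulting block state."""
    if written_len < capacity_limit:
        return "active"
    # Hypothetical dictionary layout: one fixed 8-byte digest per primary key.
    dictionary = b"".join(
        hashlib.blake2b(key, digest_size=8).digest() for key in primary_keys
    )
    os.write(fd, dictionary)
    # Stand-in for the physical barrier flush: block until the data is durable.
    os.fsync(fd)
    return "closed"
```

The state is only reported as closed after the flush call returns, mirroring the requirement that the state change follows the synchronization success confirmation.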

[0042] In this embodiment, the detailed implementation of step S200 is as follows: non-overlapping state mutation and hierarchical state vector accumulation. The update module 200 executes the underlying physical addressing mechanism for logical deletion of historical data and multi-level failure state updates. Specifically, it includes the following sub-steps: S201, the update module 200 calculates the corresponding discrete bit interval in the physical tombstone bitmap based on the absolute physical offset and alignment envelope length of the data to be processed in the storage layer 20.

[0043] In this embodiment, for large-scale underlying storage systems, frequent physical overwriting of data causes severe write amplification and leads to read-write lock contention. To avoid the consistency issues caused by modifying existing data, the system employs a logical append mechanism based on a physical tombstone bitmap for modifying or deleting historical data. Specifically, when the update module 200 receives a data update or deletion request, it obtains the absolute physical offset of the data to be processed recorded in the storage layer 20 and the envelope length determined in the aforementioned steps. Based on the spatial alignment rules of the underlying hardware cache line size, the physical tombstone bitmap implements a fixed-length spatial mapping. As a preferred approach, a single bit in the physical tombstone bitmap strictly corresponds to a physical storage interval in the storage medium whose length equals the hardware cache line size. The update module 200 calculates the starting bit index from the absolute physical offset and calculates the ending bit index in combination with the envelope length, thereby defining the discrete bit interval of the target operation. The starting bit index and the ending bit index are calculated as follows:

$$I_{start} = \left\lfloor \frac{O_{abs}}{S_{cl}} \right\rfloor$$

$$I_{end} = \left\lfloor \frac{O_{abs} + L_{env} - 1}{S_{cl}} \right\rfloor$$

In the formulas, $I_{start}$ is the starting bit index; $O_{abs}$ is the absolute physical offset of the data to be processed in the storage layer, taken from the system's addressing record made when the data was first written; $S_{cl}$ is the size of the hardware cache line; $\lfloor \cdot \rfloor$ is the floor function operator; $I_{end}$ is the ending bit index; and $L_{env}$ is the envelope length of the data to be processed.

[0044] It should be noted that the technical purpose of the above calculation is to convert the one-dimensional contiguous byte address space on the storage medium into a compact bit index range in memory by division mapping, providing an addressing reference for the subsequent fast masking. Specifically, subtracting the constant 1 when calculating the ending bit index ensures that, when the envelope length is exactly an integer multiple of the hardware cache line size, the calculated ending bit index stays within the actual physical boundary and no out-of-bounds mapping occurs at the algorithmic level. Because the hardware cache line size is always set to 64 bytes, the denominator of this division is a non-zero fixed constant, which rules out division-by-zero errors at the logical level.
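As a non-limiting worked example, the bit interval calculation described above can be sketched as follows. The function and parameter names are illustrative only; the 64-byte cache line size is the value stated in the text.

```python
CACHE_LINE = 64  # hardware cache line size stated in the text


def tombstone_bit_range(offset: int, envelope_len: int,
                        line: int = CACHE_LINE) -> tuple:
    """Map a byte range [offset, offset + envelope_len) to an inclusive
    (start_bit, end_bit) range, one bit per cache line."""
    start = offset // line
    # Subtracting 1 keeps the end index inside the envelope when the envelope
    # ends exactly on a cache line boundary.
    end = (offset + envelope_len - 1) // line
    return start, end


# An envelope of 192 bytes (three cache lines) at offset 128 covers
# bits 2, 3 and 4. Without the "- 1" the end index would be 5.
print(tombstone_bit_range(128, 192))
```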

[0045] S202, the update module 200 uses a single instruction multiple data stream instruction to perform an atomic set operation on the discrete bit range in the physical tombstone bitmap of the storage layer 20.

[0046] Based on the discrete bit range determined by the above calculation, the update module 200 loads the physical tombstone bitmap corresponding to the data block to which the data to be processed belongs in the storage layer 20. In multi-threaded concurrent scenarios, conventional bit-by-bit loop assignment easily leads to cache line thrashing and bus contention. As a preferred underlying implementation mechanism, the update module 200 calls the advanced vector extension instruction set of the underlying architecture to set all bits in the discrete bit range of the physical tombstone bitmap to 1 in parallel, while ensuring that the memory address meets the physical alignment requirements of the data bus. Through this hardware-accelerated operation, the system completes the atomic setting of a large-span bit range with wide logical OR operations. Since all data packet envelopes in the write phase are strictly aligned to integer multiples of the hardware cache line size, and the absolute physical offset is always the starting point of an alignment boundary, the discrete bit ranges mapped by different variable-length data are independent and do not overlap. This rules out adjacent data sharing the same mapped bit, provides a concurrent setting mechanism based on strict physical boundary isolation, and prevents state overlap and pollution during concurrent updates.
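As a non-limiting illustration, the wide logical OR used to set a whole bit range at once can be sketched as follows. This sketch uses an arbitrary-precision Python integer as the bitmap; it shows only the mask construction and is neither a vector instruction nor an atomic operation, both of which are hardware-specific and outside its scope. The function name is hypothetical.

```python
def set_bit_range(bitmap: int, start: int, end: int) -> int:
    """Return bitmap with every bit in the inclusive range [start, end] set."""
    width = end - start + 1
    mask = ((1 << width) - 1) << start  # run of `width` ones, shifted into place
    return bitmap | mask                # one wide OR instead of a per-bit loop


# Two envelopes with non-overlapping ranges never touch each other's bits.
bitmap = set_bit_range(0, 2, 4)       # marks cache lines 2..4 as invalid
bitmap = set_bit_range(bitmap, 5, 5)  # marks cache line 5 as invalid
print(bin(bitmap))
```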

[0047] S203, the update module 200 extracts the invalid byte accumulator bound to the storage segment to which the data to be processed belongs in the memory layer 10, and performs an atomic accumulation operation to complete the macro state update.

[0048] After setting the underlying bitmap, the update module 200 synchronously performs the higher-level status update. In high-concurrency systems, frequently scanning the underlying physical tombstone bitmap to obtain the fragmentation level of the storage space consumes a large amount of input/output bandwidth. In this embodiment, the memory layer 10 independently maintains a purely in-memory invalid byte accumulator for each storage segment. The update module 200 locates the corresponding invalid byte accumulator in the memory layer 10 according to the storage segment to which the data to be processed belongs. To prevent overflow of the accumulated value in high-concurrency, massive data scenarios, as a preferred configuration, the invalid byte accumulator is pre-configured as a 64-bit unsigned integer variable in memory. The update module 200 then adds the envelope length of the data to be processed directly to the invalid byte accumulator using a hardware-level atomic fetch-and-add instruction. The invalid byte accumulator thus records, in real time, the total physical byte size of the data marked as invalid in the corresponding storage segment. This macroscopic state update decouples the computation of the bottom-level fine-grained invalidation marks from the top-level coarse-grained state monitoring, providing a direct decision basis for subsequent space compaction and resource recovery.
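As a non-limiting illustration, a per-segment invalid byte accumulator can be sketched as follows. Python has no hardware fetch-and-add primitive, so a `threading.Lock` is used here as a stand-in for the atomic instruction; the class name and the wrap-around to 64 bits are assumptions of this sketch.

```python
import threading

U64_MASK = 0xFFFFFFFFFFFFFFFF  # models a 64-bit unsigned integer


class InvalidByteAccumulator:
    """Per-segment counter of bytes marked invalid (illustrative only)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()  # stand-in for a hardware atomic add

    def fetch_add(self, envelope_len: int) -> int:
        """Add envelope_len and return the previous value."""
        with self._lock:
            old = self._value
            self._value = (old + envelope_len) & U64_MASK
            return old

    @property
    def value(self) -> int:
        with self._lock:
            return self._value
```

Each update or deletion would call `fetch_add` once with the envelope length of the invalidated data, so the fragmentation level can later be read without scanning any bitmap.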

[0049] In this embodiment, the detailed implementation of step S300 is as follows: dual-state concurrent retrieval funnel and physical hole truncation. When performing multi-dimensional attribute queries, the retrieval module 300 applies read-write decoupling, physical aggregation addressing, and a deduplication mechanism that guards against background data drift. Specifically, it includes the following sub-steps: S301, the retrieval module 300 performs a lock-free linear scan of the active data blocks in the cache pool of memory layer 10, uses checksums to filter out segments torn by power loss, and extracts the real-time records residing in memory.

[0050] In this embodiment, the most recently written data in a high-concurrency scenario may not yet have triggered a physical barrier flush. To keep such memory-resident data visible to real-time queries, the retrieval module 300, upon receiving a query request, directly locates the active data block mapped to the target business attribute in memory layer 10. Since the active data block continuously receives append writes, the retrieval module 300 uses memory pointer offsets to perform a lock-free linear scan from the starting address of the active data block to the current write cursor. To meet the ordering requirements of multi-threaded concurrent reads and writes, as a preferred approach, the retrieval module 300 issues the underlying hardware's read memory barrier instruction before performing the linear scan, forcing the processor's local cache view to be refreshed and thereby ensuring that the memory state observed before the write cursor is globally consistent. During the concurrent scan, to avoid reading incomplete residual data left by a sudden power loss or an interrupted writer thread, the retrieval module 300 extracts the actual record length and the checksum from the metadata header of each data packet envelope as it parses the envelope. The retrieval module 300 then reads the corresponding variable-length data payload according to the actual length and recalculates the cyclic redundancy check value of the payload in memory. Only if the recalculated check value exactly matches the checksum stored in the metadata header does the retrieval module 300 treat the data packet envelope as complete and valid and extract it as a real-time record residing in memory. This embodiment prevents torn data from being read through strict data integrity verification, achieving physical-level decoupling of querying and writing within the same memory space.
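As a non-limiting illustration, the checksum-verified linear scan can be sketched as follows. The envelope layout used here (an 8-byte header holding a little-endian payload length and a CRC32, followed by the payload and zero padding to a 64-byte multiple) is a hypothetical layout chosen for the sketch, as are the function names. The memory barrier is not modelled, and the sketch stops at the first envelope that fails verification, since the length field of a torn header cannot be trusted to locate the next envelope.

```python
import struct
import zlib

CACHE_LINE = 64
HEADER = struct.Struct("<II")  # hypothetical header: payload length, CRC32


def pack_envelope(payload: bytes) -> bytes:
    """Build header + payload, zero-padded to a whole number of cache lines."""
    raw = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    padded_len = -(-len(raw) // CACHE_LINE) * CACHE_LINE  # round up
    return raw.ljust(padded_len, b"\x00")


def scan_active_block(buf: bytes, write_cursor: int) -> list:
    """Return the payloads of all intact envelopes before write_cursor."""
    records, pos = [], 0
    while pos + HEADER.size <= write_cursor:
        length, checksum = HEADER.unpack_from(buf, pos)
        end = pos + HEADER.size + length
        if length == 0 or end > write_cursor:
            break  # empty or truncated envelope: treat as torn tail
        payload = bytes(buf[pos + HEADER.size:end])
        if zlib.crc32(payload) != checksum:
            break  # recomputed CRC differs from the header: torn record
        records.append(payload)
        pos += -(-(HEADER.size + length) // CACHE_LINE) * CACHE_LINE
    return records
```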

[0051] S302, the retrieval module 300 extracts the hit physical offset for the data block in the closed state, performs mask calculation across a physical page of 4096 bytes, and merges or splits the page sequence according to the set span threshold to generate an asynchronous read request.

[0052] In the embodiments of this disclosure, in parallel with the scanning operation in the cache pool, the retrieval module 300 locates the closed data blocks that meet the query conditions in the multi-dimensional attribute bitmap set, and extracts the hit physical offset and the payload length of the target data. Because the underlying storage medium performs input/output operations in units of pages, the retrieval module 300 must convert discrete byte-level addressing into the smallest standard read unit recognized by the underlying hardware before reading. Specifically, the retrieval module 300 performs a masking calculation across 4096-byte boundaries on the extracted hit physical offset to cover every physical page that the target data may span. The physical page indices are calculated as follows:

$$P_{start} = \left\lfloor \frac{O_{hit}}{S_{page}} \right\rfloor$$

$$P_{end} = \left\lfloor \frac{O_{hit} + L_{payload} - 1}{S_{page}} \right\rfloor$$

In the formulas, $P_{start}$ is the index of the starting physical page where the target data is located; $O_{hit}$ is the physical offset of the target data in the storage medium, taken from the result of the preceding bitmap lookup; $S_{page}$ is the physical page size of the underlying physical medium, which as a preferred configuration is fixed at 4096 bytes; $\lfloor \cdot \rfloor$ is the floor function operator; $P_{end}$ is the index of the last physical page where the target data is located; and $L_{payload}$ is the payload length of the target data. It should be noted that the technical purpose of the above calculation is to delineate the actual span of the target data on the physical medium. Subtracting the constant 1 when calculating the last physical page index prevents an unnecessary read of the next physical page when the end of the payload coincides with a physical page boundary, ensuring that the algorithm remains correct under boundary conditions. In addition, because the page size in the denominator is a non-zero fixed value, the calculation cannot raise a division-by-zero error.
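As a non-limiting worked example, the page span calculation described above can be sketched as follows. The function and parameter names are illustrative only; the 4096-byte page size is the preferred value stated in the text.

```python
PAGE_SIZE = 4096  # physical page size stated in the text


def page_span(offset: int, payload_len: int, page: int = PAGE_SIZE) -> tuple:
    """Return the inclusive (first_page, last_page) covered by the byte
    range [offset, offset + payload_len)."""
    first = offset // page
    last = (offset + payload_len - 1) // page  # "- 1" avoids touching the next page
    return first, last


print(page_span(4000, 200))   # payload straddles the boundary between pages 0 and 1
print(page_span(4096, 4096))  # payload ends exactly on a boundary: page 1 only
```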

[0053] Based on this, the retrieval module 300 sorts all the calculated physical page indices in ascending order to form a page sequence. To avoid wasting input/output bus bandwidth by merging and reading a large number of consecutive invalid pages, the system presets a span threshold. The span threshold is the maximum gap of non-hit physical pages that may be bridged within a single aggregated read request. As a preferred approach, the span threshold is set to between 2 and 4 physical pages. This value is not derived from a single empirical extreme; it is a weighted balance point that takes into account the context switching latency incurred when the underlying storage hardware initiates an independent read command, the bus transmission time consumed by sequentially reading invalid bytes, and the current depth of the system input/output queue. The retrieval module 300 traverses the page sequence and calculates the difference between adjacent physical page indices. When the difference is less than or equal to the span threshold and the total number of merged pages does not exceed the maximum single transfer unit of the underlying storage controller, the retrieval module 300 determines that the pages are densely distributed and merges them into a single contiguous asynchronous read request. When the difference exceeds the span threshold, the retrieval module 300 executes the hole truncation logic, cutting the page sequence at the current position and splitting it into multiple independent asynchronous read requests. This page-boundary-aware hole truncation algorithm reduces the concurrent load on the storage controller.
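As a non-limiting illustration, the merge-or-split traversal can be sketched as follows. The function name, the default `max_pages` value standing in for the controller's maximum single transfer unit, and the representation of a request as an inclusive `(first_page, last_page)` pair are assumptions of this sketch.

```python
def build_read_requests(pages, span_threshold: int = 2, max_pages: int = 32) -> list:
    """Group hit page indices into inclusive (first_page, last_page) requests.

    Adjacent pages are merged when their index difference is at most
    span_threshold and the merged request stays within max_pages pages;
    otherwise the sequence is cut and a new request is started.
    """
    requests = []
    for page in sorted(set(pages)):
        if requests:
            first, last = requests[-1]
            if page - last <= span_threshold and page - first + 1 <= max_pages:
                requests[-1] = (first, page)  # dense: extend the current request
                continue
        requests.append((page, page))         # hole too wide: truncate and restart
    return requests


# Pages 1, 2 and 4 are close enough to merge; the jump to page 10 starts a new request.
print(build_read_requests([4, 1, 2, 10, 11]))
```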

[0054] S303, the retrieval module 300 merges the above two retrieval results in the memory layer 10, performs hash deduplication operation using the extracted record primary key, and outputs the final consistent result.

[0055] After the underlying storage medium returns the historical data requested by the generated asynchronous read requests, the retrieval module 300 merges that historical data with the real-time records extracted in step S301 in the working area of memory layer 10. Because the space reclamation mechanism running in the background may trigger asynchronous physical migration of surviving data, the target data may exist simultaneously in both the original closed data block and the new active data block for a very short time window. To prevent upper-layer services from receiving duplicate data, the retrieval module 300 extracts the record primary key of each data record in the merged result and constructs a hash table in memory keyed by the record primary key. When a data record is inserted into the hash table and an entry with the same record primary key already exists in the hash bucket, the retrieval module 300 triggers multi-dimensional conflict comparison logic. Specifically, the retrieval module 300 jointly compares the logical write timestamp, the data version number, and the state priority of the data block in which each conflicting record is located, thereby avoiding misjudgments caused by relying solely on the block state. Based on this comparison, the retrieval module 300 retains the record with the latest timestamp originating from the active data block and discards the older record originating from the closed data block. In this embodiment, the duplicate-read elimination mechanism based on primary key collisions in the hash table shields upper-layer logic from the underlying physical drift of data, ensuring strong consistency of output results in high-concurrency retrieval scenarios.
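As a non-limiting illustration, the primary-key deduplication with multi-dimensional conflict comparison can be sketched as follows. The `Record` fields and the comparison order (timestamp first, then version number, then active-over-closed block state) are assumptions of this sketch; the text names the three criteria but does not fix their precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str           # record primary key
    timestamp: int     # logical write timestamp
    version: int       # data version number
    from_active: bool  # True if read from an active data block
    payload: bytes


def deduplicate(records) -> list:
    """Keep one record per primary key, preferring newer timestamp, then
    higher version, then the copy held in an active data block."""
    best = {}
    for rec in records:
        current = best.get(rec.key)
        rank = (rec.timestamp, rec.version, rec.from_active)
        if current is None or rank > (current.timestamp, current.version,
                                      current.from_active):
            best[rec.key] = rec
    return list(best.values())
```

With this ordering, a record that was migrated unchanged during compaction appears twice with identical timestamp and version, and the copy from the active block is the one retained.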

[0056] In this embodiment, step S400 is specifically implemented as an on-demand garbage collection and lock-free state alternation mechanism. The recycling module 400 executes the underlying sequencing logic that triggers lock-free physical compaction, migration of surviving data, and safe closed-loop release of system resources. Specifically, it includes the following sub-steps: S401, the recycling module 400 periodically calculates the physical fragmentation rate of the storage segment as the ratio of the invalid byte accumulator to the total allocated capacity, and loads the corresponding physical tombstone bitmap into the memory layer 10 on demand when the physical fragmentation rate exceeds the threshold.

[0057] In the embodiments of this disclosure, to avoid input/output bus congestion caused by frequent scanning of the underlying storage medium, the system adopts a lazy triggering mechanism based on macroscopic indicator monitoring. Specifically, the recycling module 400 reads the invalid byte accumulators bound to each storage segment in the memory layer 10 at a preset time interval. Before calculating specific indicators, the system obtains the total pre-allocated capacity of each storage segment based on a continuous physical address space allocation mechanism. The recycling module 400 then combines this total capacity with the accumulated invalid bytes to calculate the corresponding physical fragmentation rate. The formula for calculating the physical fragmentation rate is:

$$R_{frag} = \frac{B_{invalid}}{C_{total}}$$

In the formula, $R_{frag}$ is the physical fragmentation rate; $B_{invalid}$ is the total number of invalid bytes recorded in real time in the invalid byte accumulator; and $C_{total}$ is the total physical capacity pre-allocated to the target storage segment during system initialization. The physical meaning of the formula is to quantify the proportion of the current storage segment that is occupied by discarded data. As a preferred configuration, $C_{total}$ is forcibly set to a non-zero constant when system space is allocated, which structurally rules out a zero denominator.

[0058] The system presets a trigger threshold for the physical fragmentation rate. As a preferred implementation, the trigger threshold is set between 0.4 and 0.6. This value is not determined by a single empirical judgment, but by a joint evaluation of the remaining erase/write cycle margin of the underlying solid-state storage medium and the background write amplification factor, with the aim of balancing the additional write burden of garbage collection against the performance gain from releasing space. To avoid system congestion caused by judging on a single extreme value, the recycling module 400 adopts a multi-dimensional weighted judgment logic. Only when the calculated physical fragmentation rate exceeds the trigger threshold and the current depth of the controller's asynchronous input/output queue is lower than a preset idle threshold does the recycling module 400 determine that the storage segment has entered a severely fragmented state. It then loads the physical tombstone bitmaps of all data blocks contained in the storage segment from storage layer 20 into the working area of memory layer 10, page by page and on demand.
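As a non-limiting illustration, the combined trigger condition (fragmentation rate above the threshold and queue depth below the idle threshold) can be sketched as follows. The function name and the default `idle_queue_depth` value are hypothetical; the default fragmentation threshold of 0.5 is simply the midpoint of the preferred 0.4 to 0.6 range.

```python
def should_compact(invalid_bytes: int, total_capacity: int, queue_depth: int,
                   frag_threshold: float = 0.5, idle_queue_depth: int = 8) -> bool:
    """Decide whether a storage segment should be compacted now."""
    if total_capacity <= 0:
        # The text requires the pre-allocated capacity to be a non-zero constant.
        raise ValueError("total_capacity must be positive")
    frag_rate = invalid_bytes / total_capacity
    # Both conditions must hold: the segment is fragmented AND the device is idle.
    return frag_rate > frag_threshold and queue_depth < idle_queue_depth


# 60% of a 1 MiB segment is invalid and the queue is nearly empty: compact.
print(should_compact(invalid_bytes=629146, total_capacity=1 << 20, queue_depth=2))
```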

[0059] S402, the recycling module 400 calls the hardware instruction to calculate the Hamming weight to count the number of bits set in each physical tombstone bitmap, and accurately locates the data block with the highest invalid byte according to the byte mapping relationship.

[0060] Based on the physical tombstone bitmaps loaded into memory layer 10, the recycling module 400 performs a further quantitative analysis of invalid bytes at the data block level. Traditional bit-by-bit traversal consumes excessive CPU clock cycles on large bitmaps. In this embodiment, the recycling module 400 directly calls the hardware instruction built into the underlying architecture for calculating the Hamming weight, performing wide-word counting over the contiguous memory address segment in which the bitmap is located. This hardware instruction directly returns the total number of bits set to 1 in the input register within a single instruction cycle. The recycling module 400 then uses the pre-established strict alignment mapping between the bitmap and physical space to convert this bit count into a specific invalid byte size. The formula for calculating the invalid byte size is:

$$B_{block} = W_{H} \times S_{cl}$$

In the formula, $B_{block}$ is the accumulated size of invalid bytes within the target data block; $W_{H}$ is the Hamming weight of the physical tombstone bitmap calculated using the hardware instruction, that is, the total number of bits marked as invalid in the bitmap; and $S_{cl}$ is the size of the underlying hardware cache line. The technical purpose of this calculation step is to use the underlying spatial quantization relationship to convert discrete state flags back into actual physical space occupancy.

[0061] Through the above calculation steps, the recycling module 400 can obtain the actual physical fragmentation distribution of each data block within the storage segment. Based on the results, the recycling module 400 sorts the invalid byte sizes of each data block in descending order. For multiple candidate data blocks with identical invalid byte sizes, the recycling module 400 introduces the physical creation timestamp recorded during system initialization allocation as a second-dimensional sorting weight, prioritizing the data block with the earliest timestamp to take into account the wear leveling strategy of the underlying storage medium. Based on this multi-dimensional sorting result, the recycling module 400 selects the highest-ranking data block as the target data block to be compacted. In this embodiment, this precise quantitative compaction mechanism based on underlying hardware instructions does not require any disk read operations on the original business data throughout the process, achieving low-overhead evaluation of candidate blocks.
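As a non-limiting illustration, the bit counting and target selection can be sketched as follows. A portable bit count over a Python integer is used in place of the hardware Hamming weight instruction; the function names and the `(block_id, bitmap, created_ts)` tuple layout are assumptions of this sketch.

```python
CACHE_LINE = 64  # one tombstone bit represents one cache line


def invalid_bytes(bitmap: bytes) -> int:
    """Hamming weight of the tombstone bitmap times the cache line size."""
    set_bits = bin(int.from_bytes(bitmap, "little")).count("1")
    return set_bits * CACHE_LINE


def pick_target(blocks) -> str:
    """blocks: iterable of (block_id, bitmap, created_ts).

    Choose the block with the most invalid bytes; break ties by the
    earliest creation timestamp.
    """
    chosen = min(blocks, key=lambda b: (-invalid_bytes(b[1]), b[2]))
    return chosen[0]


candidates = [("a", b"\x0f", 20), ("b", b"\xff", 30), ("c", b"\xff", 10)]
print(pick_target(candidates))  # "b" and "c" tie on invalid bytes; "c" is older
```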

[0062] S403, the recycling module 400 migrates the surviving data in the old data block to the currently active data block, completes the replacement of the old mapping by the new mapping in the multi-dimensional attribute bitmap set in the memory layer 10 using compare-and-swap instructions, and finally notifies the storage layer 20 to reclaim the original physical space.

[0063] After identifying the target data block to be compacted, the recycling module 400 enters the physical migration phase of the surviving data. Specifically, the recycling module 400 uses the unset bit indices in the physical tombstone bitmap of the target data block to deduce the physical offset of the surviving data in the storage layer 20. Based on this physical offset, the recycling module 400 reads the corresponding surviving data into the memory layer 10 and re-appends it to the currently active data blocks in the cache pool.
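As a non-limiting illustration, deriving the location of surviving data from the unset bits of a tombstone bitmap can be sketched as follows. The function name and the grouping of consecutive live cache lines into `(offset, length)` extents are assumptions of this sketch; offsets are relative to the start of the data block.

```python
def live_extents(bitmap: int, total_lines: int, line: int = 64) -> list:
    """Return (byte_offset, byte_length) runs of cache lines whose tombstone
    bit is still 0, i.e. ranges that hold surviving data."""
    extents, run_start = [], None
    for i in range(total_lines + 1):
        # Index total_lines acts as a sentinel that closes a trailing run.
        live = i < total_lines and not (bitmap >> i) & 1
        if live and run_start is None:
            run_start = i
        elif not live and run_start is not None:
            extents.append((run_start * line, (i - run_start) * line))
            run_start = None
    return extents


# Lines 1 and 2 are tombstoned; line 0 and lines 3..5 survive.
print(live_extents(0b000110, total_lines=6))
```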

[0064] After the surviving data completes its physical location shift, the system must ensure that the data retrieval path provided to the outside world can switch to the new location in real time. In this embodiment, the recycling module 400 performs a lock-free bitmap update operation that ensures mapping consistency. Specifically, to prevent routing loss anomalies caused by underlying concurrent operations, the recycling module 400 relies on the underlying compare-and-swap instructions to perform atomic state transitions in the multidimensional attribute bitmap set. During this update process, the recycling module 400 extracts the target business attributes of the surviving data, and uses the compare-and-swap instructions to set the bit position corresponding to the global identifier of the new active data block to 1 in the multidimensional attribute bitmap set; after confirming that the surviving data has completely physically shifted, it again uses the compare-and-swap instructions to clear the bit position corresponding to the global identifier of the old data block to zero. This mechanism eliminates addressing conflicts and routing loss problems that may occur during concurrent retrieval without introducing a global mutex lock.
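As a non-limiting illustration, the two-step bitmap switch (set the bit of the new block, then clear the bit of the old block) can be sketched as follows. Python exposes no hardware compare-and-swap, so the `AtomicWord` class emulates one with a lock; the class, the retry loop, and the function names are assumptions of this sketch.

```python
import threading


class AtomicWord:
    """Emulated machine word with compare-and-swap (illustrative only)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # stand-in for the hardware instruction

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def cas_update(word: AtomicWord, fn) -> None:
    """Retry until fn(old) is installed without an intervening change."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, fn(old)):
            return


def remap(word: AtomicWord, old_block: int, new_block: int) -> None:
    # Step 1: make the new block visible first, so readers never lose the route.
    cas_update(word, lambda v: v | (1 << new_block))
    # Step 2: only then retire the old block's bit.
    cas_update(word, lambda v: v & ~(1 << old_block))
```

Between the two steps both bits are set, so a concurrent reader may see the record in both blocks; this is the duplicate window that the primary-key deduplication in retrieval is described as handling.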

[0065] After the mapping and replacement of the multi-dimensional attribute bitmap set are all successful, the recycling module 400 determines that all valid data in the target data block has been safely transferred. Finally, the recycling module 400 issues a space release command to the underlying file system of storage layer 20, erases the physical pages occupied by the target data block, and returns them to the system's available free physical resource pool, thereby completing the physical recycling loop of system resources.

[0066] To facilitate a more intuitive understanding of the core logic and physical implementation process of the technical solution of this invention by those skilled in the art, this embodiment takes a financial transaction flow processing scenario as an example to elaborate in detail the complete operational lifecycle of the system from data ingestion and concurrent retrieval to physical recycling. Simultaneously, in conjunction with supporting performance verification data and accompanying drawings, the technical effects of this invention are objectively explained.

[0067] In a real-world deployment of a core financial transaction system, the absolute path for the system's underlying data storage is fixed at the directory /mnt/nvme01/financial_wal/. Since financial transaction records carry multi-dimensional business tags (such as user identifiers and transaction types) and the length of a single record varies with transaction complexity, this embodiment treats them as typical variable-length data.

[0068] Regarding the physical memory layout mechanism during the data ingestion phase: When the system is in a high-concurrency business cycle, the write module 100 continuously receives variable-length transaction logs from the front end. In this embodiment, the system does not adopt the traditional compact contiguous memory copy, but introduces a space alignment envelope mechanism based on hardware physical characteristics. The general principle of this mechanism is that modern multi-core processors do not read memory byte by byte, but use fixed-size cache lines as the basic access unit. If concurrent write operations of multiple threads happen to fall within the same cache line, it is very easy to trigger a cache consistency traffic storm (i.e., false sharing phenomenon) at the underlying bus level, resulting in a precipitous drop in system throughput.

[0069] To avoid the aforementioned physical-layer read/write interference, the write module 100 performs a padding calculation on each received transaction record. Specifically, let the original data length of a single variable-length transaction record be $L_{data}$, let the fixed length of the metadata header that the system generates for it be $L_{hdr}$, and let the underlying hardware cache line size be $S_{cl}$. The system calculates the alignment envelope length $L_{env}$ from these values. The calculation formula is:

$$L_{env} = \left\lfloor \frac{L_{data} + L_{hdr} + S_{cl} - 1}{S_{cl}} \right\rfloor \times S_{cl}$$

In the formula, $\lfloor \cdot \rfloor$ is the floor function operator. The system appends $L_{env} - L_{data} - L_{hdr}$ zero-value placeholder bytes to the end of the variable-length transaction record so that the encapsulated data packet envelope strictly occupies an integer number of cache lines in physical memory. Subsequently, the write module 100 writes the data packet envelope into the cache pool of memory layer 10 in a lock-free append manner. When the total amount of data accumulated in the cache pool reaches the boundary of the underlying physical sector, the system bypasses the operating system's page cache and triggers an asynchronous physical flush to the absolute path /mnt/nvme01/financial_wal/ in direct input/output mode.
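As a non-limiting worked example, the padding calculation can be sketched as follows, assuming the common idiom of rounding up to a multiple of the cache line size using only floor division. The 8-byte header length and the function names are hypothetical values chosen for the sketch.

```python
CACHE_LINE = 64


def envelope_length(data_len: int, header_len: int = 8,
                    line: int = CACHE_LINE) -> int:
    """Round header + data up to the next multiple of the cache line size."""
    return (data_len + header_len + line - 1) // line * line


def padding_bytes(data_len: int, header_len: int = 8,
                  line: int = CACHE_LINE) -> int:
    """Number of zero-value placeholder bytes appended after the data."""
    return envelope_length(data_len, header_len, line) - data_len - header_len


# A 100-byte record with an 8-byte header needs two cache lines (128 bytes),
# so 20 placeholder bytes are appended.
print(envelope_length(100), padding_bytes(100))
```

A record whose header and data already fill a whole number of cache lines receives no padding under this formula.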

[0070] During system operation, logical updates to historical data are inevitable. For example, when a transaction that has already been written to disk is reversed (refunded), a traditional relational database typically reads the original data page, modifies its status, and then writes the entire page back. Under high load this leads to severe read/write lock contention and I/O amplification. In this embodiment, the update module 200 adopts the principle of in-situ state replacement. It does not touch any actual data payload; instead, it calculates the discrete bit range from the absolute physical offset of the original transaction in the storage medium, uses a single instruction multiple data stream instruction to set the corresponding bits to 1 in the persistent physical tombstone bitmap, and adds the envelope length to the in-memory invalid byte accumulator of the corresponding storage segment, without generating any disk read operations throughout the process.

[0071] Ensuring that concurrent background retrievals obtain the latest and consistent data state without locking is another core consideration of this solution. When performing risk control retrievals or historical transaction audits, the retrieval module 300 operates across memory layer 10 and storage layer 20. For closed storage segments under the absolute path, the retrieval module 300 extracts physical offsets using the multi-dimensional attribute bitmap set and aggregates read instructions for adjacent physical pages into batches of asynchronous requests. For active data blocks in the cache pool that have not yet been flushed to disk, the retrieval module 300 directly performs a linear offset scan in memory. If a sudden hardware power failure has torn the tail data, the system recalculates the checksum of the data payload during the scan and compares it with the checksum in the metadata header, blocking the reading of incomplete data at the memory addressing stage. After the two data streams are aggregated in the working area, the system performs hash deduplication keyed by the transaction primary key, thereby outputting strongly consistent query results to the business layer.

[0072] As the system runs continuously for extended periods, the proportion of discarded historical data gradually increases. To achieve closed-loop recycling of storage space, the system is configured with a lazily triggered space compaction mechanism. The system periodically reads the invalid byte accumulator value of each storage segment and divides it by the total physical capacity to obtain the physical fragmentation rate. When the fragmentation rate exceeds the set trigger threshold, the recycling module 400 loads the associated physical tombstone bitmaps into memory layer 10 and calls the wide-word counting instructions of the underlying hardware (such as the Hamming weight calculation instruction) to quickly identify the target data block with the largest invalid byte size. After migrating the surviving data within this target data block to the active data block, the recycling module 400 uses compare-and-swap instructions to perform a lock-free switch between the old and new blocks in the multi-dimensional attribute bitmap set, and finally safely erases the physical space under the original absolute path.

[0073] Based on the specific embodiment of the above financial transaction flow processing scenario, and in order to verify the actual operating effect of the technical solution of the present invention, the R&D team constructed a C++-based test dataset in an independent test cluster. The load testing tool simulated a high-frequency trading concurrency environment and independently generated performance statistics, and the load testing logs were output to the fixed absolute path /var/log/benchmark_results/metrics.csv. The content of Figure 3 and Figure 4 is generated from the actual stress test data extracted from this absolute path. The following is an objective comparative analysis of the test results with reference to the attached figures.

[0074] Referring to Figure 3, this experiment aims to evaluate the system's physical capacity under extreme concurrent write loads. Comparison scheme A uses a key-value storage engine based on an LSM-Tree architecture; comparison scheme B uses a relational database storage engine based on a B+ tree architecture. As shown in Figure 3, when the number of concurrent write threads is below 32, all three schemes maintain normal operation. As the number of concurrent threads increases exponentially to 256, scheme B, constrained by page splitting in the tree structure and severe mutex lock contention, experiences a sharp drop in throughput to approximately 30K IOPS. Scheme A, limited by write lock contention in the in-memory skip list and input/output blocking caused by background data merging, reaches a hardware bottleneck at around 150K IOPS. The present invention, by introducing a cache line-aligned space isolation mechanism and a lock-free state rotation mechanism, eliminates read/write conflicts between threads at the physical bus level. Its throughput exhibits good linear scalability with the number of concurrent threads and remains stable at approximately 420K IOPS at 256 threads, demonstrating a significant performance advantage over scheme A.

[0075] Referring to Figure 4, the system's long-tail latency (i.e., the 99th percentile latency) directly determines the availability of the financial risk control system during peak business periods. Under the 3:7 read/write mixed load configured in this experiment, the P99 latency of comparison scheme B reached 28.5 milliseconds due to row-level lock waiting; the P99 latency of comparison scheme A was 12.3 milliseconds, mainly because the background cascading compaction process preempted disk read bandwidth. The P99 latency of the proposed solution was 1.4 milliseconds. This result confirms the effectiveness of the in-place invalidation logic of the physical tombstone bitmap and the dual-state concurrent retrieval funnel mechanism in the embodiments of the present invention. By replacing coarse-grained physical overwriting with fine-grained bitmap masking, the solution ensures that historical state updates and real-time data retrieval do not overlap at the physical media access level, effectively suppressing the response latency spikes caused by read/write switching on the underlying storage media.

[0076] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A method for storing and retrieving big data in high-concurrency scenarios, characterized in that it comprises the following steps: aligning, according to the hardware cache line size, the total length of the variable-length data concatenated with the generated metadata header, calculating the envelope length, and appending placeholder bytes to form a data packet envelope; appending the data packet envelope to the active data block, updating the multidimensional attribute bitmap set, and, when the physical length of the data reaches the capacity limit threshold, issuing a physical barrier flush command to convert the active data block into a closed data block; calculating a discrete bit interval based on the absolute physical offset of the data to be processed and the envelope length, performing atomic bit setting in the physical tombstone bitmap using a single instruction multiple data stream (SIMD) instruction, and performing atomic accumulation on the associated invalid byte accumulator; performing a linear scan on data blocks in the active state, generating asynchronous read requests for data blocks in the closed state based on the hit physical offset, merging the search results, and deduplicating them using the record primary key; calculating the physical fragmentation rate based on the invalid byte accumulator and, when the trigger threshold is exceeded, invoking the hardware instruction for calculating the Hamming weight to count the number of bits set in the physical tombstone bitmap so as to locate the target data block, then migrating the surviving data of the target data block and reclaiming the original physical space.

2. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that aligning, according to the hardware cache line size, the total length of the variable-length data concatenated with the generated metadata header, calculating the envelope length, and appending placeholder bytes to form the data packet envelope comprises: obtaining the original data length of the variable-length data, and generating the metadata header containing the original data length, logical write timestamp, data version number, and cyclic redundancy checksum; obtaining the envelope length by adding the original data length to the fixed length of the metadata header, dividing the sum by the hardware cache line size, rounding up, and then multiplying by the hardware cache line size; and, based on the envelope length, appending zero-filled placeholder bytes to the end of the variable-length data.
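The envelope length calculation of claim 2 can be illustrated with a minimal Python sketch. The 64-byte cache line, the 20-byte little-endian header layout, and the names `HEADER`, `envelope_length`, and `build_envelope` are assumptions made for illustration only; the claim fixes the header fields but not their widths or order. With these assumptions, a 100-byte payload plus the 20-byte header totals 120 bytes and is rounded up to a 128-byte envelope.

```python
import struct
import zlib

CACHE_LINE = 64  # assumed hardware cache line size in bytes

# Hypothetical header layout (not specified by the claims):
# original length (u32), logical write timestamp (u64),
# data version number (u32), CRC-32 of the payload (u32) = 20 bytes.
HEADER = struct.Struct("<IQII")


def envelope_length(raw_len: int, header_len: int = HEADER.size,
                    line: int = CACHE_LINE) -> int:
    """Round (raw_len + header_len) up to the next multiple of the cache line."""
    # -(-x // line) is integer ceiling division.
    return -(-(raw_len + header_len) // line) * line


def build_envelope(payload: bytes, timestamp: int, version: int) -> bytes:
    """Concatenate header, payload, and zero-filled placeholder bytes."""
    header = HEADER.pack(len(payload), timestamp, version, zlib.crc32(payload))
    total = envelope_length(len(payload))
    padding = bytes(total - HEADER.size - len(payload))  # zero-valued bytes
    return header + payload + padding
```

Because every envelope starts and ends on a cache line boundary, two writer threads appending different envelopes never touch the same cache line, which is the isolation property the description relies on.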

3. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that appending the data packet envelope to the active data block, updating the multidimensional attribute bitmap set, and issuing a physical barrier flush command when the physical length of the data reaches the capacity limit threshold to convert the active data block into a closed data block comprises: recording the absolute physical offset of the data packet envelope in the storage layer, extracting the target service attributes, and using the compare-and-swap instruction to set the bit position corresponding to the global identifier of the active data block to 1 in the multidimensional attribute bitmap set; monitoring the physical length of the cumulative written data in the active data block, and triggering an asynchronous write command when it crosses a boundary that is an integer multiple of the physical sector size; and, when the physical length of the cumulative written data reaches the capacity limit threshold, extracting the primary keys of all variable-length data contained in the active data block to generate a hash dictionary, appending the hash dictionary to the end of the active data block, issuing the physical barrier flush command, and, after the underlying physical medium returns a synchronization success confirmation message, converting the active data block into a closed data block.

4. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that calculating the discrete bit interval based on the absolute physical offset of the data to be processed and the envelope length comprises: dividing the absolute physical offset by the hardware cache line size and rounding down to obtain the start bit index; subtracting one from the sum of the absolute physical offset and the envelope length, dividing by the hardware cache line size, and rounding down to obtain the end bit index; and defining the discrete bit interval by the start bit index and the end bit index.
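The two index formulas of claim 4 reduce to a pair of integer divisions. The sketch below assumes a 64-byte cache line and uses the hypothetical name `tombstone_bit_interval`; the interval is returned as an inclusive pair. For example, an envelope of 128 bytes at absolute offset 128 covers bits 2 through 3.

```python
CACHE_LINE = 64  # assumed hardware cache line size in bytes


def tombstone_bit_interval(offset: int, envelope_len: int,
                           line: int = CACHE_LINE) -> tuple[int, int]:
    """Return the inclusive (start, end) bit indices covered by an envelope."""
    start = offset // line                      # floor(offset / line)
    end = (offset + envelope_len - 1) // line   # floor((offset + len - 1) / line)
    return start, end
```

Since envelopes are cache line-aligned, each bit in the tombstone bitmap belongs to exactly one envelope, so intervals computed for different envelopes never overlap.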

5. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that performing atomic bit setting in the physical tombstone bitmap using a single instruction multiple data stream instruction, and performing atomic accumulation on the associated invalid byte accumulator, comprises: invoking the Advanced Vector Extensions (AVX) instruction set as the single instruction multiple data stream instruction to set all bits of the discrete bit interval within the physical tombstone bitmap to 1 in parallel; configuring the invalid byte accumulator as an unsigned integer variable based on the spatial location of the data to be processed; and, based on the atomic fetch-and-add instruction, directly adding the envelope length value of the data to be processed to the invalid byte accumulator.
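The logical effect of claim 5 can be sketched in Python, with the important caveat that Python exposes neither SIMD instructions nor hardware atomics. In this illustration a single OR with a contiguous mask stands in for the parallel bit setting, and a `threading.Lock` stands in for the atomic fetch-and-add; the class name `TombstoneSegment` and the 64-byte cache line are assumptions.

```python
import threading

CACHE_LINE = 64  # assumed hardware cache line size in bytes


class TombstoneSegment:
    """Scalar stand-in for a physical tombstone bitmap and its accumulator."""

    def __init__(self) -> None:
        self.bitmap = 0          # bit i set means cache line slot i is invalid
        self.invalid_bytes = 0   # invalid byte accumulator (non-negative)
        self._lock = threading.Lock()  # stand-in for hardware atomic instructions

    def mark_dead(self, offset: int, envelope_len: int) -> None:
        start = offset // CACHE_LINE
        end = (offset + envelope_len - 1) // CACHE_LINE
        # Contiguous run of (end - start + 1) one-bits, shifted to the start index.
        mask = ((1 << (end - start + 1)) - 1) << start
        with self._lock:
            self.bitmap |= mask                 # set the whole interval at once
            self.invalid_bytes += envelope_len  # add the envelope length
```

This sketch does not guard against marking the same envelope twice, which would double-count its bytes in the accumulator.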

6. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that performing a linear scan on data blocks in the active state comprises: invoking the read memory barrier instruction to force a flush of the local cache, and performing a lock-free linear scan from the starting address of the active data block to the current write cursor; when parsing the metadata header of the data packet envelope, extracting the actual length and the cyclic redundancy checksum; and reading the variable-length data payload based on the actual length and recalculating the cyclic redundancy check value, wherein the data packet envelope is extracted as a real-time record residing in memory if and only if the recalculated cyclic redundancy check value is consistent with the cyclic redundancy checksum stored in the metadata header.
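A minimal Python sketch of the checksum-gated scan in claim 6 follows. The header layout, the 64-byte cache line, and the helper `pack_envelope` are illustrative assumptions; the read memory barrier has no Python equivalent and is omitted. Stopping at the first envelope that fails validation is also an assumption of this sketch, chosen because a length field in a torn header cannot be trusted to find the next envelope.

```python
import struct
import zlib

CACHE_LINE = 64                  # assumed hardware cache line size in bytes
HEADER = struct.Struct("<IQII")  # hypothetical: length, timestamp, version, crc32


def pack_envelope(payload: bytes, ts: int, ver: int) -> bytes:
    """Helper for the example: header + payload, zero-padded to a cache line."""
    body = HEADER.pack(len(payload), ts, ver, zlib.crc32(payload)) + payload
    return body + bytes(-len(body) % CACHE_LINE)


def scan_active_block(block, write_cursor: int) -> list:
    """Walk envelopes from offset 0 to the write cursor, keeping CRC-valid ones."""
    records = []
    pos = 0
    while pos + HEADER.size <= write_cursor:
        length, ts, ver, stored_crc = HEADER.unpack_from(block, pos)
        start = pos + HEADER.size
        end = start + length
        if end > write_cursor:
            break  # payload runs past the cursor: torn tail
        payload = bytes(block[start:end])
        if zlib.crc32(payload) != stored_crc:
            break  # checksum mismatch: treat the rest as incomplete (assumption)
        records.append((ts, ver, payload))
        # Advance by the cache line-aligned envelope length.
        pos += -(-(HEADER.size + length) // CACHE_LINE) * CACHE_LINE
    return records
```

For example, scanning a block holding two envelopes returns both payloads, while flipping a single payload byte in the second envelope causes only the first to be returned.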

7. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that generating asynchronous read requests for data blocks in the closed state based on the hit physical offset comprises: calculating the starting physical page index by dividing the hit physical offset by the physical page size and rounding down; calculating the ending physical page index by subtracting one from the sum of the hit physical offset and the payload length, dividing by the physical page size, and rounding down; arranging all calculated physical page indexes in ascending numerical order to form a page sequence; and traversing the page sequence to calculate the difference between adjacent physical page indexes, wherein, when the difference is less than or equal to the span threshold, the pages are merged into one continuous asynchronous read request, and, when the difference is greater than the span threshold, the page sequence is split into multiple independent asynchronous read requests.
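The page coalescing of claim 7 can be sketched as follows. The 4096-byte page size, the default span threshold of 1, and the name `coalesce_page_reads` are assumptions; each returned pair is an inclusive range of page indexes that would be issued as one asynchronous read request.

```python
def coalesce_page_reads(hits, page_size: int = 4096, span_threshold: int = 1):
    """hits: iterable of (physical_offset, payload_length) pairs.

    Returns a list of inclusive (first_page, last_page) ranges.
    """
    pages = set()
    for offset, length in hits:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))

    requests = []
    for page in sorted(pages):  # ascending page sequence
        if requests and page - requests[-1][1] <= span_threshold:
            requests[-1][1] = page          # close enough: extend the current request
        else:
            requests.append([page, page])   # gap too large: start a new request
    return [tuple(r) for r in requests]
```

With a span threshold greater than 1, a merged request also reads the unneeded pages inside the gap, trading some read bandwidth for fewer requests.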

8. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that merging the search results and deduplicating them using the record primary key comprises: fully merging the historical records returned by the asynchronous read requests with the real-time records, and extracting the primary key of each data record in the merged result; constructing a hash table using the record primary key as the mapping feature; when an entry with the same primary key is detected in a hash bucket, comparing the logical write timestamp, data version number, and state priority of the conflicting records; and retaining the data record that originates from the active data block and has the latest logical write timestamp.
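A possible reading of the conflict resolution in claim 8 is sketched below. The claim names the three compared fields but not their precedence; this sketch assumes the order logical write timestamp, then data version number, then state priority (active over closed). The `Record` type and `merge_and_dedup` are hypothetical names.

```python
from typing import NamedTuple


class Record(NamedTuple):
    key: str           # record primary key
    timestamp: int     # logical write timestamp
    version: int       # data version number
    from_active: bool  # state priority: True if read from the active data block
    payload: bytes


def merge_and_dedup(historical, realtime) -> dict:
    """Merge both result streams and keep one winning record per primary key."""

    def rank(r: Record):
        # Assumed precedence: timestamp, then version, then active-over-closed.
        return (r.timestamp, r.version, r.from_active)

    winners = {}
    for rec in list(historical) + list(realtime):
        current = winners.get(rec.key)
        if current is None or rank(rec) > rank(current):
            winners[rec.key] = rec
    return winners
```

A Python `dict` plays the role of the hash table here, so a lookup by primary key corresponds to probing the hash bucket for an existing entry.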

9. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that calculating the physical fragmentation rate based on the invalid byte accumulator and, when the trigger threshold is exceeded, invoking the hardware instruction for calculating the Hamming weight to count the number of bits set in the physical tombstone bitmap so as to locate the target data block comprises: obtaining the physical fragmentation rate by dividing the total number of invalid bytes recorded in real time in the invalid byte accumulator by the pre-allocated total physical capacity; when the physical fragmentation rate exceeds the trigger threshold and the asynchronous input/output queue depth is lower than the idle threshold, loading the corresponding physical tombstone bitmap into the memory layer; invoking the hardware instruction for calculating the Hamming weight to count the total number of invalid bits marked in the physical tombstone bitmap, and multiplying the total number of bits by the hardware cache line size to obtain the invalid byte size of each data block; and sorting the invalid byte sizes of the data blocks in descending order and locating the target data block in combination with the physical creation timestamp.
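The selection logic of claim 9 can be sketched as follows. The software `popcount` stands in for the hardware Hamming weight instruction; the 64-byte cache line, the function names, and the tie-breaking rule (older creation timestamp first) are assumptions, since the claim only says the creation timestamp is combined with the sorted sizes.

```python
CACHE_LINE = 64  # assumed hardware cache line size in bytes


def fragmentation_rate(invalid_bytes: int, total_capacity: int) -> float:
    """Invalid byte accumulator divided by the pre-allocated physical capacity."""
    return invalid_bytes / total_capacity


def popcount(bitmap: int) -> int:
    """Software stand-in for the hardware Hamming weight instruction."""
    return bin(bitmap).count("1")


def pick_target_block(blocks, invalid_bytes, total_capacity,
                      trigger_threshold, queue_depth, idle_threshold):
    """blocks: dict of block_id -> (tombstone_bitmap, creation_timestamp)."""
    if fragmentation_rate(invalid_bytes, total_capacity) <= trigger_threshold:
        return None  # not fragmented enough to justify compaction
    if queue_depth >= idle_threshold:
        return None  # I/O queue is busy; defer compaction
    # Largest invalid byte size first; ties go to the older block (assumption).
    ranked = sorted(
        blocks.items(),
        key=lambda kv: (-popcount(kv[1][0]) * CACHE_LINE, kv[1][1]),
    )
    return ranked[0][0] if ranked else None
```

Each set bit represents one cache line of invalid data, so the bit count multiplied by the cache line size gives the invalid byte size of a block without scanning its contents.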

10. The method for big data storage and retrieval in high-concurrency scenarios according to claim 1, characterized in that migrating the surviving data of the target data block and reclaiming the original physical space comprises: deriving, in reverse, the physical offsets of the surviving data from the unset bit indices in the physical tombstone bitmap of the target data block; reading the surviving data and re-appending it to the active data block; using the compare-and-swap instruction to set the bit position corresponding to the global identifier of the new active data block to 1 in the multidimensional attribute bitmap set; after confirming that the surviving data has been completely migrated, clearing the bit position corresponding to the global identifier of the old data block to zero using the compare-and-swap instruction; and issuing a space release command to the underlying file system to erase the physical pages occupied by the target data block, thereby completing the physical reclamation.
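The reverse derivation of surviving offsets in claim 10 can be sketched as a walk over the unset bits. The name `surviving_runs`, the `slots_written` parameter (the number of cache line slots actually written in the block), and the 64-byte cache line are assumptions. The sketch returns contiguous live byte ranges; a range may contain several envelopes, which would then be separated by parsing their metadata headers.

```python
CACHE_LINE = 64  # assumed hardware cache line size in bytes


def surviving_runs(tombstone_bitmap: int, slots_written: int,
                   line: int = CACHE_LINE) -> list:
    """Return (offset, length) byte ranges whose tombstone bits are unset."""
    runs = []
    run_start = None
    for i in range(slots_written):
        dead = (tombstone_bitmap >> i) & 1
        if not dead and run_start is None:
            run_start = i  # a live run begins at this slot
        elif dead and run_start is not None:
            runs.append((run_start * line, (i - run_start) * line))
            run_start = None
    if run_start is not None:  # the live run extends to the end of the block
        runs.append((run_start * line, (slots_written - run_start) * line))
    return runs
```

For a six-slot block whose bitmap has bits 1 and 2 set, the live ranges are 64 bytes at offset 0 and 192 bytes at offset 192.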