A method, device and medium for real-time storage of high-concurrency messages

By generating globally unique sequence identifiers and using a sharded storage mechanism in high-concurrency scenarios, the performance degradation and consistency issues of traditional databases under high concurrency are solved, achieving low-latency, high-throughput message storage and ensuring message order consistency and system stability.

CN121887865B · Active · Publication Date: 2026-06-30 · 北京啄木鸟云健康科技有限公司

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: 北京啄木鸟云健康科技有限公司
Filing Date: 2026-03-18
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In high-concurrency scenarios, when a large number of users send messages simultaneously, traditional databases suffer degraded write performance and struggle to guarantee the global order of messages. The caching layer is prone to crashing or losing messages when faced with instantaneous traffic spikes. Furthermore, in group chat scenarios, storage space and write load increase exponentially, resulting in insufficient system stability.

Method used

Messages are received, written to the first-level cache, and a globally unique sequence identifier is generated. The messages are then buffered to the second-level message queue in the order of the sequence identifiers. The messages are then stored in shards according to the receiver or group identifier. Combined with the batch write mechanism of the third-level persistent storage unit, the write pressure is distributed to multiple storage nodes.

Benefits of technology

It achieves low-latency, high-throughput, and strongly consistent message storage, ensuring that messages are strictly ordered, avoiding overloading of a single database, improving system stability and scalability, and supporting real-time interaction for a large number of users.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a high-concurrency real-time message storage method, device, and medium, relating to the field of message storage technology. The method includes: receiving messages sent by a client and writing them to a first-level cache; responding to the message being written to the first-level cache by returning a successful transmission response to the client; generating a globally unique sequence identifier for each message, used to determine the message's temporal order; determining the target storage shard corresponding to the message according to preset sharding rules; publishing messages carrying sequence identifiers to a second-level message queue bound to the target storage shard, the second-level message queue used to buffer messages according to the sequence identifier order; monitoring the second-level message queue and writing messages in batches to a third-level persistent storage unit corresponding to the target storage shard according to the sequence identifier order. This invention supports real-time storage of high-concurrency instant messages, ensuring strict message ordering, system stability and reliability, and efficient storage.

Description

Technical Field

[0001] This invention relates to the field of message storage technology, and in particular to a method, device and medium for real-time storage of high-concurrency messages.

Background Technology

[0002] In large-scale instant messaging systems, such as those used in collaborations within medical institutions or on internet platforms, reliable message storage and real-time processing are fundamental to ensuring communication quality. Current common technical solutions typically rely on traditional relational databases for direct message persistence.

[0003] The inventors of this application have discovered that in high-concurrency scenarios, a large number of users sending messages simultaneously can cause a sharp drop in database write performance, resulting in message processing delays and making it difficult to guarantee the global order consistency of messages.

[0004] To alleviate database pressure, existing technologies often introduce caching layers for buffering. However, simple caching mechanisms easily reach their capacity and throughput bottlenecks when faced with instantaneous traffic spikes, potentially leading to cache crashes or message loss, resulting in insufficient system stability. Especially in group chat scenarios, adopting a write diffusion model that stores a complete copy of messages for each member generates significant data redundancy, causing storage space and write load to increase exponentially, resulting in low efficiency. Furthermore, without an effective data distribution strategy, the read and write pressure of all hot sessions will be concentrated on a single database node, easily forming a performance bottleneck and causing a decline in the overall system service capacity.

Summary of the Invention

[0005] This invention provides a method, device, and medium for real-time storage of high-concurrency messages. The technical problem it aims to solve is: how to provide a technical solution that can support real-time storage of high-concurrency instant messages, ensuring that messages are strictly ordered, the system is stable and reliable, and storage is efficient.

[0006] In a first aspect, embodiments of the present invention provide a high-concurrency real-time message storage method, comprising:

[0007] Receive messages sent by the client and write the messages to the first-level cache;

[0008] In response to the message being written to the first-level cache, a success response is returned to the client.

[0009] Generate a globally unique sequence identifier for the message, the sequence identifier being used to determine the temporal order of the message;

[0010] The target storage shard corresponding to the message is determined according to the preset sharding rules;

[0011] The message carrying the sequence identifier is published to a secondary message queue bound to the target storage shard, the secondary message queue being used to buffer messages in the order of the sequence identifier;

[0012] Listen to the secondary message queue and write messages in batches to the tertiary persistent storage unit corresponding to the target storage shard according to the sequence identifier.

[0013] Optionally, generating a globally unique sequence identifier for the message includes:

[0014] Determine the session identifier corresponding to the message;

[0015] Query the sequence number corresponding to the session identifier;

[0016] Based on the distributed lock mechanism, the sequence number corresponding to the session identifier is atomically incremented to generate the sequence identifier of the message;

[0017] The mapping relationship between the sequence identifier and the session identifier is updated to a globally shared metadata store.

[0018] Optionally, determining the target storage shard corresponding to the message according to a preset sharding rule includes:

[0019] If the message is a one-on-one chat message, then the first hash value is calculated based on the recipient user identifier corresponding to the message, and the first hash value is taken modulo the preset total number of shards to obtain the first target storage shard;

[0020] If the message is a group message, then the second hash value is calculated based on the group identifier corresponding to the message, and the second hash value is taken modulo the preset total number of shards to obtain the second target storage shard.

[0021] Optionally, the method further includes:

[0022] Establish an index relationship between the group member user identifier of the group corresponding to the message and the second target storage shard, and store the index relationship in a separate routing table.

[0023] Optionally, the step of monitoring the secondary message queue and writing messages in batches to the tertiary persistent storage unit corresponding to the target storage shard according to the sequence identifier includes:

[0024] For each target storage shard, deploy at least one independent write consumer process to the corresponding secondary message queue.

[0025] The write consumer process performs the following loop operation:

[0026] A batch retrieval request is initiated to the secondary message queue to obtain a batch of messages sorted by sequence identifier;

[0027] If a message is retrieved, the batch size and waiting timeout for this batch write operation are dynamically calculated based on the current system load, and the retrieved messages are temporarily stored in memory after being sorted by the sequence identifier.

[0028] When the number of messages temporarily stored in memory reaches the batch size or the waiting time reaches the timeout period, the messages temporarily stored in memory are written as a transaction batch to the third-level persistent storage unit in the order of the sequence identifier.

[0029] If no messages are retrieved, the write consumer process registers an asynchronous push listener with the secondary message queue and enters a blocking waiting state.

[0030] When the asynchronous push listener is triggered, indicating that a new message has arrived in the secondary message queue, the step of initiating a batch retrieval request to the secondary message queue is re-executed.
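
The staging and flush conditions of the write consumer loop above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `BatchBuffer`, the `write_batch` callback (standing in for one transactional write to the third-level persistent storage unit) and the injectable `clock` are hypothetical, and the batch size and timeout are fixed here, whereas the method computes them dynamically from the current system load.

```python
import time
from typing import Callable, List, Tuple

Message = Tuple[int, str]  # (sequence identifier, payload)


class BatchBuffer:
    """Stages messages in sequence order and flushes on size or timeout."""

    def __init__(
        self,
        batch_size: int,
        timeout_s: float,
        write_batch: Callable[[List[Message]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self.write_batch = write_batch  # hypothetical: one transactional write
        self.clock = clock
        self._pending: List[Message] = []
        self._first_at = 0.0

    def add(self, messages: List[Message]) -> bool:
        """Stage retrieved messages; return True if a batch was written."""
        if not messages:
            return False
        if not self._pending:
            # The waiting time starts when the first message is staged.
            self._first_at = self.clock()
        self._pending.extend(messages)
        self._pending.sort(key=lambda m: m[0])  # order by sequence identifier
        return self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Write the staged batch if it is full or has waited long enough."""
        if not self._pending:
            return False
        full = len(self._pending) >= self.batch_size
        expired = self.clock() - self._first_at >= self.timeout_s
        if not (full or expired):
            return False
        batch, self._pending = self._pending, []
        self.write_batch(batch)
        return True
```

A consumer process would call `add` after each batch retrieval and `maybe_flush` periodically, so that a partially filled batch is still written once the timeout elapses.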

[0031] Optionally, the method further includes:

[0032] The processing latency and backlog of unprocessed messages for each of the secondary message queues are collected separately.

[0033] If the processing delay of the secondary message queue exceeds a preset delay threshold, and the backlog of unprocessed messages in the secondary message queue continues to grow, the number of write consumer process instances monitoring the secondary message queue will be increased.

[0034] If the processing latency of the secondary message queue is lower than the preset recovery threshold, and the backlog of unprocessed messages in the secondary message queue continues to decrease, the number of write consumer process instances listening to the secondary message queue will be reduced.
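
The two threshold rules above can be sketched as a single decision function. This is an illustrative sketch only: the function names are hypothetical, and "continues to grow" or "continues to decrease" is interpreted here, as an assumption, to mean that consecutive backlog samples are strictly increasing or strictly decreasing.

```python
from typing import Sequence


def _strictly_monotonic(samples: Sequence[int], increasing: bool) -> bool:
    """Return True if consecutive samples all move in the given direction."""
    pairs = list(zip(samples, samples[1:]))
    if not pairs:
        return False  # a single sample cannot show a trend
    return all((b > a) if increasing else (b < a) for a, b in pairs)


def scaling_decision(
    latency_ms: float,
    backlog_samples: Sequence[int],
    delay_threshold_ms: float,
    recovery_threshold_ms: float,
) -> str:
    """Decide whether to add, remove, or keep write consumer instances."""
    if latency_ms > delay_threshold_ms and _strictly_monotonic(backlog_samples, True):
        return "scale_up"
    if latency_ms < recovery_threshold_ms and _strictly_monotonic(backlog_samples, False):
        return "scale_down"
    return "hold"
```

Requiring both the latency condition and the backlog trend avoids reacting to a brief latency spike on a queue whose backlog is already draining.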

[0035] Optionally, increasing the number of write consumer process instances listening to the secondary message queue includes:

[0036] The processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue are input into a pre-trained first instance number prediction model. The first instance number prediction model outputs a first initial recommended instance number, wherein the first instance number prediction model limits the output of the first initial recommended instance number to be greater than the number of write consumer process instances currently listening to the secondary message queue.

[0037] Obtain the growth rate of the backlog of unprocessed messages in the secondary message queue within a preset statistical period;

[0038] If the growth rate exceeds a preset growth rate threshold, then a first elastic redundancy quantity positively correlated with the growth rate is added to the first initial recommended instance quantity to obtain the first target recommended instance quantity;

[0039] If the growth rate does not exceed the preset growth rate threshold, the first initial recommended instance number will be used as the first target recommended instance number.

[0040] Adjust the number of write consumer process instances listening to the secondary message queue to the number of the first target recommended instances.
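
As a hedged illustration of the scale-up steps above, the following sketch takes the output of the first instance number prediction model as a given input and applies the elastic redundancy rule. The function name and the linear coefficient `redundancy_per_unit_rate` are assumptions; the source only requires that the redundancy be positively correlated with the growth rate.

```python
import math


def scale_up_target(
    initial_recommended: int,
    current_instances: int,
    growth_rate: float,
    growth_rate_threshold: float,
    redundancy_per_unit_rate: float,
) -> int:
    """Compute the first target recommended instance count."""
    # The prediction model is constrained to recommend more than the
    # current count, so anything else indicates a caller error.
    if initial_recommended <= current_instances:
        raise ValueError("scale-up recommendation must exceed current count")
    if growth_rate > growth_rate_threshold:
        # Assumed linear form of a positively correlated redundancy.
        redundancy = math.ceil(redundancy_per_unit_rate * growth_rate)
        return initial_recommended + redundancy
    return initial_recommended
```

For example, with an initial recommendation of 6, a growth rate of 5.0 against a threshold of 2.0, and a coefficient of 0.5, the redundancy is ceil(2.5) = 3 and the target is 9 instances.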

[0041] Optionally, reducing the number of write consumer process instances listening to the secondary message queue includes:

[0042] The processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue are input into a pre-trained second instance number prediction model. The second instance number prediction model outputs a second initial recommended instance number, wherein the output second initial recommended instance number is limited to be less than the number of write consumer process instances currently listening to the secondary message queue.

[0043] Obtain the rate at which the backlog of unprocessed messages in the secondary message queue decreases within a preset statistical period;

[0044] If the absolute value of the reduction rate exceeds a preset absolute value threshold, then a second elastic redundancy quantity that is positively correlated with the absolute value of the reduction rate is subtracted from the second initial recommended instance quantity to obtain the second target recommended instance quantity.

[0045] If the absolute value of the reduction rate does not exceed the preset absolute value threshold, the second initial recommended instance number will be used as the second target recommended instance number.

[0046] Adjust the number of write consumer process instances listening to the secondary message queue to the second target recommended number of instances.
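
The scale-down steps can be sketched in the same way. The lower bound `min_instances` is an assumption added for this sketch so that subtracting the second elastic redundancy quantity never leaves a queue without a consumer; the source does not state such a bound. The function name and the linear coefficient are likewise hypothetical.

```python
import math


def scale_down_target(
    initial_recommended: int,
    current_instances: int,
    reduction_rate: float,
    abs_rate_threshold: float,
    redundancy_per_unit_rate: float,
    min_instances: int = 1,
) -> int:
    """Compute the second target recommended instance count."""
    # The prediction model is constrained to recommend fewer instances
    # than are currently listening to the queue.
    if initial_recommended >= current_instances:
        raise ValueError("scale-down recommendation must be below current count")
    magnitude = abs(reduction_rate)
    target = initial_recommended
    if magnitude > abs_rate_threshold:
        # Assumed linear form of a positively correlated redundancy.
        target -= math.ceil(redundancy_per_unit_rate * magnitude)
    # Assumption: always keep at least min_instances consumers running.
    return max(min_instances, target)
```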

[0047] In a second aspect, embodiments of the present invention also provide a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the above-described method.

[0048] In a third aspect, embodiments of the present invention also provide a computer-readable storage medium storing a computer program that, when executed by a processor, can implement the above-described method.

[0049] This invention provides a high-concurrency message real-time storage method, device, and medium. The method includes: receiving a message sent by a client and writing the message into a first-level cache; responding to the message being written into the first-level cache by returning a successful transmission response to the client; generating a globally unique sequence identifier for the message, the sequence identifier being used to determine the temporal order of the message; determining the target storage shard corresponding to the message according to a preset sharding rule; publishing the message carrying the sequence identifier to a second-level message queue bound to the target storage shard, the second-level message queue being used to buffer messages according to the order of the sequence identifier; monitoring the second-level message queue and writing messages in batches into a third-level persistent storage unit corresponding to the target storage shard according to the order of the sequence identifier. This invention reduces the perceived transmission latency by writing messages to a high-performance first-level cache first and immediately confirming them. By generating globally unique and ordered sequence identifiers for messages and combining this with a mechanism that routes messages from the same session to the same ordered buffer queue, it ensures that the storage order of messages is strictly consistent with the transmission order during massive asynchronous processing, thus solving the message out-of-order problem. Furthermore, storage sharding is performed based on the identifier information of the receiver or group, distributing the total write pressure across multiple independent database nodes and avoiding performance bottlenecks caused by the concentration of hot data in a single database. At the same time, an ordered message queue is introduced as a buffer layer to absorb instantaneous traffic surges, and a batch persistence mechanism significantly reduces the database's write pressure and connection overhead. This results in overall low-latency, high-throughput, strongly consistent, and horizontally scalable message storage capabilities, stably supporting a large number of simultaneously online users and real-time interaction with massive amounts of messages.

Description of the Drawings

[0050] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0051] Figure 1 is a flowchart of a high-concurrency real-time message storage method provided in an embodiment of the present invention;

[0052] Figure 2 is a schematic block diagram of a computer device provided in an embodiment of the present invention.

Detailed Implementation

[0053] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0054] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and / or collections thereof.

[0055] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0056] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0057] As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if [described condition or event] is detected" may be interpreted, depending on the context, as "once determined," "in response to determination," "once [described condition or event] is detected," or "in response to detection of [described condition or event]."

[0058] Referring to Figure 1, this invention provides a high-concurrency message real-time storage method, which includes the following steps:

[0059] S1, receive the message sent by the client and write the message into the first-level cache.

[0060] In practice, the system receives messages from clients and writes them to the first-level cache. Specifically, the message gateway service receives network requests from clients and parses out the message body and metadata. Then, the message gateway service calls the first-level cache client to write the complete message object to the first-level cache in key-value format. The key can be constructed as a temporary message identifier, and the value is the message content. The first-level cache can be implemented using a memory-based key-value storage system, such as a Redis cluster, which provides microsecond-level write performance.

[0061] S2, in response to the message being written to the first-level cache, a successful sending response is returned to the client.

[0062] In practice, once the message has been received and written to the first-level cache, a successful transmission response is returned to the client. As soon as the first-level cache client confirms that the write succeeded, the message gateway service immediately returns an application-layer confirmation response to the requesting client. This response only indicates that the message has been successfully received and has entered the system buffer, not that final persistence is complete, thus decoupling backend processing latency from user perception and achieving a millisecond-level transmission experience.
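
Steps S1 and S2 can be illustrated with a minimal sketch in which an in-memory dictionary stands in for the first-level cache. The function name, the `msg:tmp:` key prefix and the shape of the acknowledgement are assumptions made for illustration; a real deployment would use the client of the chosen cache system instead of a dictionary.

```python
import uuid
from typing import Dict, MutableMapping


def handle_client_message(
    cache: MutableMapping[str, dict], message: dict
) -> Dict[str, str]:
    """Write the message to the first-level cache, then acknowledge it."""
    # The key is a temporary message identifier (hypothetical format).
    temp_id = f"msg:tmp:{uuid.uuid4().hex}"
    # The value is the complete message object.
    cache[temp_id] = dict(message)
    # The acknowledgement means "accepted into the buffer", not "persisted".
    return {"status": "accepted", "temp_id": temp_id}
```

The acknowledgement is built only after the cache write returns, which mirrors the ordering required by step S2.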

[0063] S3, Generate a globally unique sequence identifier for the message, the sequence identifier being used to determine the temporal order of the message.

[0064] In practice, a globally unique sequence identifier is generated for each message. This sequence identifier is used to determine the message's temporal order. The generation of the sequence identifier is handled by a dedicated functional module in the system. This module can be a standalone timing service or integrated into the message processing pipeline as a sequence generation submodule. A monotonically increasing counter is maintained for each session. By obtaining the session identifier to which the message belongs and atomically incrementing the counter for that session, a globally unique and ordered sequence identifier (e.g., a 64-bit integer) is generated within that session. This sequence identifier is bound to the message and serves as its logical timestamp throughout the system.

[0065] In some preferred embodiments, generating a globally unique sequence identifier for the message includes: determining the session identifier corresponding to the message; querying the sequence number corresponding to the session identifier; performing an atomic increment operation on the sequence number corresponding to the session identifier based on a distributed lock mechanism to generate the sequence identifier of the message; and updating the mapping relationship between the sequence identifier and the session identifier to a globally shared metadata storage.

[0066] In practice, the session identifier is extracted from the message. In the message processing pipeline, when processing a request to generate a sequence identifier, the message object is first parsed. The session identifier is extracted from the message's metadata fields. This session identifier uniquely identifies a one-on-one chat or a group chat; for example, it can be a string concatenated from the sender's and receiver's identifiers according to rules, or a unique group ID assigned by the system.

[0067] Further, the sequence number corresponding to the session identifier is queried. Here, the sequence number refers to the baseline value from which sequence identifiers for messages are generated; it is pre-assigned and maintained for this session. Upon receiving a request containing the session identifier, the sequence generation service queries a globally shared metadata store to obtain the latest sequence number for the session. The metadata store can be a highly available key-value store that supports strong consistency, such as Etcd, ZooKeeper, or a relational database with transaction capabilities. The query operation uses the session identifier as the key and reads the corresponding value; this value is the most recently assigned sequence number.

[0068] Furthermore, based on the distributed lock mechanism, an atomic increment operation is performed on the sequence number corresponding to the session identifier to generate the message's sequence identifier. To ensure that operations on the sequence number of the same session do not conflict or duplicate in a distributed multi-instance environment, a distributed lock for the session identifier must be acquired before the increment operation. In the specific implementation, the distributed lock function built into the metadata storage system can be used, or a lock service can be built on top of it. After acquiring the distributed lock, the current sequence number is read again (to avoid modification during lock acquisition), its value is incremented by one, and a new sequence identifier is generated. Furthermore, this new sequence identifier is temporarily stored as the result of this request and is prepared to be updated back to the metadata storage. The entire read-increment-write process is completed under the protection of the lock, constituting an atomic operation.

[0069] Furthermore, the mapping between the sequence identifier and the session identifier is updated in a globally shared metadata store. After successfully generating a new sequence identifier, it needs to be written back to the metadata store as the new latest sequence number for the session; that is, the value stored with the session identifier as the key is updated with the new sequence identifier value. Only after this update operation is completed can the previously acquired distributed lock be released. This update action ensures that subsequent sequence generation requests for the same session increment from the latest base value, thereby maintaining the continuity and uniqueness of the sequence.

[0070] The sequence identifier generation method provided in this embodiment offers a reliable ordered identifier generation mechanism for distributed messaging systems by combining session identifiers, distributed locks, and central metadata storage. First, by using a globally shared metadata store as the single source of truth for the latest sequence number of each session, it ensures that the allocation base point for sequence numbers is consistent across the entire distributed cluster for any given session. This avoids the risk of identifier duplication caused by unsynchronized local caches on different nodes, guaranteeing uniqueness. Furthermore, the generation of sequence identifiers is based on atomic increment operations, ensuring that the identifier value of each new message is strictly larger than the identifier value of previous messages within the same session. This monotonically increasing characteristic provides a clear and unambiguous temporal relationship for messages, a fundamental prerequisite for subsequent ordered buffering and storage. Finally, the distributed lock mechanism resolves contention for the sequence number of the same session under high concurrency. When multiple threads or processes process messages of the same session simultaneously, the lock forces these operations to be serialized, thereby ensuring the atomicity of increment operations and the continuity of the sequence. This avoids skipped or duplicated sequence numbers caused by concurrent updates, allowing the scheme to work stably in horizontally scaled cluster environments.
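
The read, increment and write-back cycle described above can be sketched in a single process. In this sketch a `threading.Lock` per session stands in for the distributed lock and a dictionary stands in for the globally shared metadata store; both substitutions, and the class name `SessionSequencer`, are assumptions for illustration. Note that the counter is kept per session, so the identifiers produced here are unique and ordered within a session.

```python
import threading
from collections import defaultdict
from typing import Dict


class SessionSequencer:
    """Generates monotonically increasing sequence identifiers per session."""

    def __init__(self) -> None:
        # Stand-in for the metadata store: session_id -> latest sequence number.
        self._store: Dict[str, int] = {}
        # Stand-in for distributed locks: one lock per session identifier.
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, session_id: str) -> threading.Lock:
        # Guard lock creation so two threads never get different locks.
        with self._locks_guard:
            return self._locks[session_id]

    def next_sequence(self, session_id: str) -> int:
        with self._lock_for(session_id):
            # Read the latest value only after the lock is held.
            current = self._store.get(session_id, 0)
            new_value = current + 1
            # Write back before the lock is released.
            self._store[session_id] = new_value
            return new_value
```

Because the read and the write-back both happen while the session lock is held, concurrent callers for the same session receive contiguous values with no duplicates, while different sessions do not block each other.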

[0071] S4. Determine the target storage shard corresponding to the message according to the preset sharding rules.

[0072] In practice, the target storage shard corresponding to the message is determined according to preset sharding rules. The sharding routing module in the system performs the calculation based on the message type and receiver identification information. The sharding routing module is a functional logic unit that executes the preset sharding rules. Specific rules can be pre-configured. For example, the hash value of the receiver's user identifier or group identifier is calculated and then taken modulo the total number of shards to obtain a shard number. This shard number logically corresponds to a specific storage unit or database instance, thereby distributing the storage load of massive messages across multiple independent storage nodes.

[0073] In some preferred embodiments, determining the target storage shard corresponding to the message according to a preset sharding rule includes: if the message is a one-on-one chat message, calculating a first hash value based on the recipient user identifier corresponding to the message, and taking the first hash value modulo a preset total number of shards to obtain a first target storage shard; if the message is a group message, calculating a second hash value based on the group identifier corresponding to the message, and taking the second hash value modulo a preset total number of shards to obtain a second target storage shard.

[0074] In practice, based on different message types, differentiated identifiers are used as sharding keys for calculation, thereby achieving uniform data distribution and targeted routing. Specifically:

[0075] If the message is a one-on-one chat message, a first hash value is calculated based on the recipient's user identifier. This first hash value is then taken modulo the preset total number of shards to obtain the first target storage shard. The sharding routing module first determines the message type. When the type field in the message body or message header indicates that this is a one-on-one chat message, the module extracts the recipient's user identifier as the input key for the sharding calculation. A preset hash function, such as MurmurHash or FNV-1a, is applied to the recipient's user identifier string to obtain an integer first hash value. Subsequently, this first hash value is taken modulo the total number of logical shards configured in the system. The result is an integer between 0 and the total number of shards minus 1, which is used as the number of the first target storage shard. This shard number logically maps to a specific database instance or table.

[0076] Further, if the message is a group message, a second hash value is calculated based on the group identifier corresponding to the message. This second hash value is then taken modulo the preset total number of shards to obtain the second target storage shard. When the message type is a group message, the sharding routing module extracts the group identifier to which the message belongs as the input key for the sharding calculation. Using the same hash function as in the one-on-one chat scenario or a different one, the group identifier is hashed to obtain an integer second hash value. This second hash value is then taken modulo the preset total number of logical shards. The result is the number of the second target storage shard. This shard will be used to store group-related data, such as the complete message content of the group.

[0077] The sharding rules provided in this embodiment optimize data distribution and system performance by selecting sharding keys based on message type. Regarding load balancing, both one-on-one and group chat messages are allocated storage locations by hashing their key identifiers. Due to the uniformity of the hash function, a large number of users or groups can be mapped approximately evenly across the storage shards, thus distributing the overall read / write pressure across multiple backend database nodes. This avoids overloading a single database due to a few popular users or groups, improving the overall system throughput and horizontal scalability. In terms of storage optimization, using the group identifier instead of the recipient's user identifier as the sharding key for group messages has significant advantages. It routes all messages from the same group to the same fixed second target storage shard. This design allows the complete message content of the group to be stored centrally, which supports a hybrid storage mode of write-diffusion indexing and read aggregation. That is, only one complete copy of each message is stored in the group shard, while a lightweight index is stored in each member's personal shard, thereby saving overall storage space and reducing write amplification in group chat scenarios.
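
The sharding rule above can be sketched as follows, using a plain 32-bit FNV-1a hash. The message field names (`type`, `receiver_id`, `group_id`) and the type labels `"direct"` and `"group"` are hypothetical and chosen only for this illustration.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def target_shard(message: dict, total_shards: int) -> int:
    """Pick the sharding key by message type, then hash it modulo the shard count."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    if message["type"] == "group":
        key = message["group_id"]      # group messages shard by group identifier
    elif message["type"] == "direct":
        key = message["receiver_id"]   # one-on-one messages shard by recipient
    else:
        raise ValueError(f"unknown message type: {message['type']!r}")
    return fnv1a_32(key.encode("utf-8")) % total_shards
```

Because the group identifier is the only input for group messages, every message of a group lands on the same shard regardless of who sent it.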

[0078] In some preferred embodiments, the method further includes: establishing an index relationship between the group member user identifier of the group corresponding to the message and the second target storage shard, and storing the index relationship in a separate routing table.

[0079] In practice, a mapping relationship is constructed from group members to the main storage location of group messages to provide a routing basis for subsequent hybrid storage modes.

[0080] Specifically, an index relationship is established between the group member user identifiers of the group corresponding to the message and the second target storage shard. After processing a group message and determining its second target storage shard, the system needs to record the association between all members of the group and that shard. In the specific implementation, the group information service can be called to obtain a list of user identifiers of all current members corresponding to the group identifier. Then, an index record is created for each member user identifier in the list. The core content of this index record is to establish a pointing relationship between the member user identifier and the second target storage shard number. This means that by using any member identifier, it is possible to query which shard the main message of its group is stored on.

[0081] Furthermore, this index relationship is stored in a separate routing table. The generated index relationships need to be persistently stored for querying. These relationships are stored in a separate routing table or index store. This routing table can be a storage system separate from the business data, such as a separate database table, a high-performance key-value store, or a search engine index. Each index record can use the member user identifier as the primary key or unique key, and its value includes the corresponding group identifier and the second target storage shard number. As part of the system's metadata, this routing table needs to ensure high availability and query performance, because it will be frequently queried to locate the storage location of the group's main message when retrieving messages for group members.
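As a rough illustration of the routing table, the sketch below builds one index record per group member. It keeps the table in a plain dictionary and keys each record by the pair (member identifier, group identifier) so that a user who belongs to several groups has one record per group; this composite key, the function name, and the sample identifiers are assumptions, not details taken from the embodiment.

```python
def build_routing_records(group_id: str, member_ids: list, shard_no: int) -> dict:
    """Create one index record per member pointing at the group's main shard."""
    return {(user_id, group_id): shard_no for user_id in member_ids}


# Hypothetical in-memory stand-in for the separate routing table.
routing_table = {}
routing_table.update(build_routing_records("g1", ["alice", "bob"], 3))
routing_table.update(build_routing_records("g2", ["alice"], 7))
```

In a real deployment the dictionary would be replaced by the separate database table or key-value store mentioned above.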

[0082] This embodiment establishes and maintains independent routing tables from group members to storage shards, laying a crucial metadata foundation for the efficient storage and retrieval of large-scale group messages. First, it decouples users from the physical storage location of messages. In the traditional write-diffusion model, user messages are directly stored in their personal shards, with the storage and query paths consistent. In this solution, however, the user's group message index is stored in their personal shard, but the complete content is centrally stored in the group shard. The existence of the routing table allows the system to quickly locate the corresponding group shard by querying the user identifier when it needs to retrieve the complete group message, without traversing all shards or relying on other complex logic, achieving efficient addressing after decoupling. Furthermore, this mechanism directly supports the efficient operation of the hybrid storage model of write-diffusion indexing and read aggregation. When a user requests historical group messages, the system first reads a lightweight index from their personal shard, then uses the information in the index, combined with the routing table, to locate the group's main storage shard, and finally retrieves the required complete message content in batches from that shard. As a key addressing component, the routing table ensures that the read aggregation process can quickly and accurately find the target data source, thereby enabling the optimization strategy of replacing multiple distributed writes with a single centralized read. While ensuring read performance, it significantly reduces storage redundancy and write pressure in group chat scenarios.

[0083] It should be further explained that the lightweight index includes at least: a group identifier, a message sequence identifier, and the physical storage location identifier (such as an offset or row ID) of the message in the group's main storage shard. When a user requests historical group messages, the system performs the following read aggregation operation: first, based on the user identifier, it queries the lightweight index list of group messages in the storage shard to which the user belongs (determined according to the one-on-one chat message sharding rules); then, it queries the routing table based on the group identifier in the index to obtain the second target storage shard where the complete group message is located; next, it organizes the message location identifiers in the lightweight index list into a batch query request and sends it to the third-level persistent storage unit corresponding to the second target storage shard; finally, it sorts the complete message content returned from the third-level persistent storage unit by sequence identifier and returns it to the user.
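The read aggregation steps above can be sketched as follows. All storage is simulated with dictionaries; the names `personal_shards`, `routing_table`, `group_shards`, and `read_group_history`, as well as the sample data, are hypothetical and only serve to show the order of the lookups.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    group_id: str
    seq_id: int
    row_id: int  # physical location of the message in the group's main shard


# Hypothetical in-memory stand-ins for the three data sources.
personal_shards = {0: {"alice": [IndexEntry("g1", 2, 11), IndexEntry("g1", 1, 10)]}}
routing_table = {("alice", "g1"): 3}           # (member, group) -> group shard number
group_shards = {3: {10: "hello", 11: "world"}}  # shard number -> {row_id: content}


def read_group_history(user_id: str, user_shard_no: int) -> list:
    # Step 1: read the lightweight index from the user's personal shard.
    entries = personal_shards[user_shard_no].get(user_id, [])
    # Step 2: resolve each entry's group shard through the routing table,
    # grouping entries so that each shard receives a single batch query.
    batches = {}
    for entry in entries:
        shard_no = routing_table[(user_id, entry.group_id)]
        batches.setdefault(shard_no, []).append(entry)
    # Step 3: fetch the complete content per shard (simulated batch query).
    results = []
    for shard_no, batch in batches.items():
        rows = group_shards[shard_no]
        results.extend((entry.seq_id, rows[entry.row_id]) for entry in batch)
    # Step 4: sort by sequence identifier before returning to the user.
    results.sort(key=lambda pair: pair[0])
    return [content for _, content in results]
```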

[0084] S5, the message carrying the sequence identifier is published to the secondary message queue bound to the target storage shard, the secondary message queue being used to buffer messages in the order of the sequence identifier.

[0085] In practice, the message carrying the sequence identifier is published to the secondary message queue bound to the target storage shard. This secondary message queue buffers messages in the order of the sequence identifier. The message, along with its sequence identifier and target shard information, is encapsulated into a new internal event. The producer client publishes the event to the corresponding secondary message queue topic or partition based on the target storage shard number. The secondary message queue uses a system that guarantees message ordering within a partition, such as Apache Kafka. Messages from the same session, having the same shard number, are routed to the same queue partition, thus naturally maintaining their sending order.

[0086] S6, listen to the secondary message queue, and write messages in batches to the tertiary persistent storage unit corresponding to the target storage shard according to the sequence identifier.

[0087] In practice, the secondary message queue is monitored, and messages are written in batches to the tertiary persistent storage unit corresponding to the target storage shard according to the sequence identifier. One or more write consumer processes are started for each secondary message queue partition. These consumer processes sequentially pull messages from the queue. They sort and organize the pulled messages in memory according to the sequence identifier, accumulate them to a certain quantity or wait for a period of time, and then insert them in batches into the tertiary persistent storage unit bound to the queue, such as a specific database shard or table in a MySQL database or cloud-native database, in the form of a database transaction. This achieves a smooth transition from high-speed queues to persistent databases and ensures that the storage order of messages in the same session in the database is consistent with the sending order.
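A minimal sketch of the transactional batch insert is given below. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL or cloud-native database named above; the table layout and the names `open_store` and `write_batch` are assumptions for illustration.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table standing in for one database shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "seq_id INTEGER UNIQUE, session_id TEXT, body TEXT)"
    )
    return conn


def write_batch(conn: sqlite3.Connection, messages: list) -> None:
    """Sort by sequence identifier and insert the whole batch as one transaction."""
    ordered = sorted(messages, key=lambda m: m["seq_id"])
    rows = [(m["seq_id"], m["session_id"], m["body"]) for m in ordered]
    # The connection context manager commits on success and rolls back
    # if any statement raises, so the batch is all-or-nothing.
    with conn:
        conn.executemany(
            "INSERT INTO messages (seq_id, session_id, body) VALUES (?, ?, ?)",
            rows,
        )
```

Because the rows are sorted before insertion, the storage order within the batch matches the sequence identifier order even if the in-memory buffer was filled out of order.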

[0088] This invention effectively addresses message storage challenges in high-concurrency instant messaging scenarios by constructing a three-tier processing chain. First, by rapidly writing messages to a high-performance first-level cache and immediately responding to the client, user-perceived sending latency is minimized, significantly improving the interactive experience. Furthermore, by generating a globally unique sequence identifier for each message within a session, a clear temporal basis is provided for the messages. Combined with a mechanism that routes messages from the same session to the same ordered second-level message queue, it ensures that even if messages are processed asynchronously in a distributed system, their final persistence order strictly matches the sending order, resolving the message out-of-order problem. Furthermore, by using sharding rules based on receivers or groups, the total message writing pressure is evenly distributed across multiple independent third-level persistent storage units, avoiding hotspot performance bottlenecks caused by all traffic converging on a single database, enabling the system to horizontally scale to support massive numbers of users. Furthermore, the second-level message queue, acting as a buffer layer, effectively absorbs instantaneous write surges and significantly reduces the write frequency and connection pressure on the third-level persistent storage units through batch writing, improving the overall system throughput and stability.

[0089] In some preferred embodiments, the step of listening to the secondary message queue and writing messages in batches to the tertiary persistent storage unit corresponding to the target storage shard according to the sequence identifier includes: deploying at least one independent write consumer process for each secondary message queue corresponding to the target storage shard; the write consumer process performs the following loop operation: initiating a batch pull request to the secondary message queue to obtain a batch of messages sorted by sequence identifier; if messages are pulled, dynamically calculating the batch size and waiting timeout for this batch write operation based on the current system load, and temporarily storing the pulled messages in memory after sorting by sequence identifier; when the number of messages temporarily stored in memory reaches the batch size or the waiting time reaches the waiting timeout, writing the messages temporarily stored in memory as a transaction to the tertiary persistent storage unit in sequence identifier order; if no messages are pulled, the write consumer process registers an asynchronous push listener with the secondary message queue and enters a blocked waiting state; when the asynchronous push listener is triggered, indicating that new messages have arrived at the secondary message queue, the step of initiating a batch pull request to the secondary message queue is re-executed.

[0090] In practice, efficient and orderly persistence is achieved by running the write consumer process in a loop, combined with push-pull message retrieval and a dynamic batch processing strategy.

[0091] First, deploy at least one independent write consumer process for each target storage shard's corresponding secondary message queue. During system deployment, based on the size and performance requirements of the secondary message queues, configure one or more dedicated write consumer process instances for each logical queue or partition. These processes can exist as independent service processes, containers, or threads, specifically responsible for consuming data from a particular queue and writing it to the corresponding tertiary persistent storage unit. Multiple processes can collaborate to consume a queue as a consumer group to improve consumption capacity.

[0092] Furthermore, the consumer process executes the following loop operation. After the process starts, it enters a continuously running work loop, which contains logic such as message fetching, processing, and waiting.

[0093] A batch fetch request is initiated to the secondary message queue to retrieve a batch of messages sorted by sequence identifiers. In each iteration of the loop, the consumer process first proactively initiates a batch fetch request to the secondary message queue server it is listening to. The request can specify the maximum number of bytes or messages to fetch. Since the secondary message queue itself guarantees the order of messages within the partition, the fetched batch of messages is already sorted by sequence identifiers internally.

[0094] Furthermore, if messages are retrieved, the batch size and timeout for this batch write operation are dynamically calculated based on the current system load. The retrieved messages are then temporarily stored in memory, sorted by sequence identifier. When the retrieval request succeeds and returns messages, the consumer process does not immediately write to the database. Based on real-time system metrics such as database connection pool utilization and average system load, it first dynamically calculates a suitable number of messages for this batch write as the batch size, and a timeout period that bounds how long the process waits for more messages to fill the batch. Next, the retrieved messages are loaded into a buffer or queue structure in the process's memory. Although the messages are already ordered in the queue, re-sorting by sequence identifier provides a robustness guarantee. Then, the process begins to wait.

[0095] In some preferred embodiments, the batch size and waiting timeout time for this batch write operation are dynamically calculated based on the current system load, specifically including:

[0096] Step 1: Collect real-time metrics of the current system load.

[0097] Before each batch write operation, the write consumer process calls the system monitoring interface or reads data from the local collector to obtain real-time metrics reflecting the current system load status. These metrics include, but are not limited to: the ratio of currently active database connections in the third-level persistent storage unit to the total capacity of the connection pool, the CPU utilization of the database instance, the average load of the database server used for batch writes, and the memory utilization of the host machine where the write consumer process resides. These metrics are obtained in numerical form and used as input for subsequent calculations.

[0098] Step 2: Calculate the batch size adjustment factor based on the real-time indicators.

[0099] The write consumer process incorporates a pre-defined batch size calculation function. This function takes the real-time metrics obtained in the preceding step as input parameters. Internally, each metric has a pre-defined weighting coefficient and a reference threshold. The calculation process is as follows: the value of each real-time metric is compared with its corresponding reference threshold to obtain the relative stress value of that metric (e.g., the ratio of the real-time metric value to the reference threshold is used as the relative stress value; this invention is not specifically limited in this respect). Then, the relative stress values of the metrics are multiplied by their corresponding weighting coefficients and combined as a weighted average to obtain a comprehensive load score. Finally, a pre-defined base batch size and the comprehensive load score are substituted into a pre-defined negatively correlated mathematical formula to calculate a batch size adjustment factor. This mathematical formula is designed so that the batch size adjustment factor decreases when the comprehensive load score increases, and increases when the score decreases.

[0100] Step 3: Determine the batch size for this batch write operation.

[0101] The batch size adjustment factor calculated in step two is multiplied by a system-configured baseline batch size constant to obtain an initial calculated batch size. To ensure the feasibility and stability of the operation, this initial calculated batch size is limited to a closed interval formed by a preset minimum batch size and a preset maximum batch size. Specifically, the system presets a minimum batch size and a maximum batch size. The write consumer process compares and truncates the dynamically calculated initial calculated batch size with these two preset boundaries: if the initial calculated batch size is less than the minimum batch size, the final batch size used for batch write operations is determined to be the minimum batch size; if the initial calculated batch size is greater than the maximum batch size, it is determined to be the maximum batch size; otherwise, the value of the initial calculated batch size is directly adopted. Through this process, the final batch size is strictly limited to a closed interval formed by the minimum batch size and the maximum batch size.
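Steps one to three can be sketched as below. The embodiment leaves the negatively correlated formula open, so the expression `2 / (1 + score)` is only one possible choice, picked here because it yields a factor of exactly 1 when every metric sits at its reference threshold. The metric names, weights, thresholds, and default bounds are likewise assumed values.

```python
def load_score(metrics: dict, weights: dict, thresholds: dict) -> float:
    """Weighted average of relative stress values (metric / reference threshold)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[k] * metrics[k] / thresholds[k] for k in weights)
    return weighted / total_weight


def batch_size(metrics: dict, weights: dict, thresholds: dict,
               base: int = 500, minimum: int = 50, maximum: int = 2000) -> int:
    score = load_score(metrics, weights, thresholds)
    # Assumed negatively correlated formula: higher load gives a smaller factor.
    factor = 2.0 / (1.0 + score)
    preliminary = round(base * factor)
    # Truncate to the closed interval [minimum, maximum].
    return max(minimum, min(maximum, preliminary))


WEIGHTS = {"pool_usage": 0.5, "db_cpu": 0.5}      # assumed weighting coefficients
THRESHOLDS = {"pool_usage": 0.8, "db_cpu": 0.8}   # assumed reference thresholds
```

Under these assumptions, metrics equal to their thresholds give the base batch size, heavier load shrinks the batch, and extreme values are caught by the preset bounds.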

[0102] Step 4: Calculate the waiting timeout adjustment factor based on the real-time indicators.

[0103] The write consumer process also incorporates a built-in timeout calculation function. Like the batch size calculation function, it takes the real-time metrics obtained in step one as input parameters, but it may employ different weighting coefficients and reference thresholds than those used in step two. The goal of this function is to assess the system's tolerance for write operation latency. A comprehensive latency tolerance score is calculated using a similar weighted summation method. Subsequently, a preset base timeout and this comprehensive latency tolerance score are substituted into a positively correlated mathematical formula to calculate a waiting timeout adjustment factor. This mathematical formula is designed so that the adjustment factor decreases when the system's comprehensive latency tolerance score is low and increases when the score is high.

[0104] Step 5: Determine the timeout period for this batch write operation.

[0105] The waiting timeout adjustment factor calculated in step four is multiplied by a system-configured baseline timeout constant to obtain a preliminary calculated timeout. Similarly, to ensure controllable system behavior, this preliminary calculated timeout is limited to a closed interval formed by a preset minimum timeout and a maximum timeout. Specifically, the system presets a minimum timeout and a maximum timeout. The write consumer process compares and truncates the dynamically calculated preliminary timeout with these two preset boundaries: if the preliminary calculated timeout is less than the minimum timeout, the final waiting timeout used for batch write operations is determined as the minimum timeout; if the preliminary calculated timeout is greater than the maximum timeout, it is determined as the maximum timeout; otherwise, the value of the preliminary calculated timeout is directly adopted. Through this process, the final waiting timeout is strictly limited to a closed interval formed by the minimum timeout and the maximum timeout.
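Steps four and five follow the same pattern with a positively correlated formula. In the sketch below the adjustment factor is simply taken to be equal to the latency tolerance score; this linear mapping, the function name, and the default values in milliseconds are assumptions, since the embodiment only requires that the factor grow with the score.

```python
def wait_timeout_ms(tolerance_score: float, base_ms: float = 200.0,
                    min_ms: float = 20.0, max_ms: float = 1000.0) -> float:
    """Map a latency tolerance score to a bounded waiting timeout."""
    factor = tolerance_score              # assumed positively correlated (linear) formula
    preliminary = base_ms * factor
    # Truncate to the closed interval [min_ms, max_ms].
    return max(min_ms, min(max_ms, preliminary))
```

A score of 1.0 returns the base timeout, while very low or very high scores are truncated to the preset minimum and maximum.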

[0106] Through the above steps, the write consumer process can adaptively determine, based on the instantaneous load of the system, both the maximum number of messages to accumulate for each batch write operation and the maximum time to wait while accumulating them, thereby achieving a dynamic balance between system throughput and response latency.

[0107] Furthermore, when the number of messages temporarily stored in memory reaches the batch size or the waiting time reaches the timeout period, the messages temporarily stored in memory are written to the third-level persistent storage unit in batches as a transaction, in sequence identifier order. During the waiting period, new messages may be added to the staging area through subsequent pull operations. A batch write operation is triggered when either of the following two conditions is met: first, the accumulated number of messages in the staging area reaches the dynamically calculated batch size; second, the time elapsed since the wait began reaches the dynamically calculated timeout period. After triggering, the consumer process orders all messages in the staging area by sequence identifier and then inserts them into the third-level persistent storage unit in batches as a single transaction through a database session. The transaction ensures that this batch of writes either all succeeds or all fails, guaranteeing data consistency.
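The two trigger conditions can be illustrated with a small staging buffer. The class name `BatchBuffer` and its methods are hypothetical; the clock is injectable so the timeout path can be exercised without sleeping, and the wait is assumed to start when the first message enters an empty buffer.

```python
import time


class BatchBuffer:
    """In-memory staging area that flushes on size or on elapsed time."""

    def __init__(self, batch_size: int, timeout_s: float, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self._clock = clock
        self._items = []
        self._wait_started = None

    def add(self, message: dict) -> None:
        if not self._items:
            # The wait begins when the first message is staged.
            self._wait_started = self._clock()
        self._items.append(message)

    def should_flush(self) -> bool:
        if not self._items:
            return False
        size_reached = len(self._items) >= self.batch_size
        timed_out = self._clock() - self._wait_started >= self.timeout_s
        return size_reached or timed_out

    def drain(self) -> list:
        """Return the staged messages ordered by sequence identifier and reset."""
        batch = sorted(self._items, key=lambda m: m["seq_id"])
        self._items = []
        self._wait_started = None
        return batch
```

The list returned by `drain()` is what would be handed to a single database transaction.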

[0108] Furthermore, if no messages are retrieved, the consumer process registers an asynchronous push listener with the secondary message queue and enters a blocking wait state. If the initiated batch retrieval request returns null, it indicates that there are no new messages available for consumption in the current queue. To reduce the system and network overhead caused by invalid empty polling, the consumer process registers an asynchronous callback listener with the message queue server or subscribes to a notification channel. After registration, the process can suspend or enter a low-power blocking wait state to release computing resources.

[0109] Furthermore, when the asynchronous push listener is triggered, indicating that new messages have arrived in the secondary message queue, the step of initiating a batch pull request to the secondary message queue is re-executed. Once a new message is produced in this queue, the message queue server will actively notify or wake up the consumer processes in the waiting state through the previously registered listener channels. After the process is woken up, it immediately clears or unregisters the previous listener and jumps back to the beginning of the loop to execute the step of initiating a new batch pull request, thereby processing the newly arrived messages in a timely manner.
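The push-pull loop can be approximated in a single process with a condition variable, as sketched below. `NotifyingQueue` is a hypothetical stand-in for the secondary message queue, and the condition variable plays the role of the asynchronous push listener; a real message queue client would expose its own notification mechanism.

```python
import threading
from collections import deque


class NotifyingQueue:
    """Toy queue supporting batch pulls and wake-up notifications."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def publish(self, message) -> None:
        with self._cond:
            self._items.append(message)
            self._cond.notify_all()  # wake any consumer blocked in wait_for_messages

    def pull_batch(self, max_n: int) -> list:
        with self._cond:
            n = min(max_n, len(self._items))
            return [self._items.popleft() for _ in range(n)]

    def wait_for_messages(self, timeout=None) -> bool:
        with self._cond:
            return self._cond.wait_for(lambda: len(self._items) > 0, timeout)


def consume(queue: NotifyingQueue, handle_batch, stop: threading.Event,
            max_n: int = 100) -> None:
    while not stop.is_set():
        batch = queue.pull_batch(max_n)   # active pull while messages exist
        if batch:
            handle_batch(batch)
        else:
            # Nothing to consume: block until notified instead of polling.
            # The short timeout only lets the loop re-check the stop flag.
            queue.wait_for_messages(timeout=0.1)
```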

[0110] This embodiment achieves efficient and stable data synchronization from the secondary message queue to the tertiary persistent storage unit through fine-grained operations in the consumer process. Regarding consumption efficiency, the use of batch retrieval instead of single-item retrieval effectively reduces network round trips and server request processing overhead, significantly improving data throughput. Combined with dynamically calculated batch sizes, each database write operation can process as many messages as possible, merging numerous small insert operations into a few large batch operations, greatly alleviating the database's write pressure. Furthermore, in terms of system resource optimization, the push-pull mechanism balances real-time performance with resource conservation. Active retrieval when messages are available ensures low latency; when no messages are available, registering a push listener and entering a waiting state avoids unnecessary CPU and network bandwidth consumption from continuous empty polling, allowing the system to maintain low resource consumption even under low load. Furthermore, regarding data reliability, sorting temporarily stored messages by sequence identifier and writing them in batches as a single transaction offers multiple advantages. Sorting ensures the sequential nature of storage, while transactions ensure the atomicity of batch operations. This means that a batch of messages is either all successfully persisted or all are rolled back in case of failure, preventing data inconsistency caused by partial writes. The entire process forms an ordered, reliable, and adjustable data pipeline.

[0111] In some preferred embodiments, the method further includes: collecting the processing latency and unprocessed message backlog of each of the secondary message queues; if the processing latency of the secondary message queue exceeds a preset latency threshold and the unprocessed message backlog of the secondary message queue continues to increase, increasing the number of write consumer process instances monitoring the secondary message queue; if the processing latency of the secondary message queue is lower than a preset recovery threshold and the unprocessed message backlog of the secondary message queue continues to decrease, decreasing the number of write consumer process instances monitoring the secondary message queue.

[0112] In practice, key queue metrics are monitored and consumer resources are dynamically adjusted to cope with load fluctuations.

[0113] Specifically, the processing latency and backlog of unprocessed messages are collected for each secondary message queue. The system deploys a monitoring agent or utilizes the message queue system's own monitoring interface to periodically collect metrics for each secondary message queue. The collected processing latency can be defined as the waiting time of the oldest message in the queue, i.e., the difference between the current time and the timestamp of the message entering the queue. The backlog of unprocessed messages can be directly collected as the difference between the current total number of messages in the queue and the offset of messages awaiting consumption. These metrics are then sent to a time-series database or a monitoring center.

[0114] Furthermore, if the processing latency of the secondary message queue exceeds a preset latency threshold, and the backlog of unprocessed messages in the secondary message queue continues to grow, the number of write consumer process instances monitoring the secondary message queue is increased. The elastic scaling controller periodically analyzes the collected metrics. For each secondary message queue, the controller checks whether its processing latency exceeds a preset threshold, which defines the maximum acceptable latency level. Simultaneously, the controller analyzes the trend of the queue's unprocessed message backlog over a period of time to determine if it is continuously increasing. When both conditions are met, the controller determines that the consumer processing capacity corresponding to that queue is insufficient. Therefore, it triggers a scaling operation, increasing the number of write consumer process instances responsible for consuming that specific queue by calling the deployment platform API or script. After the new instances start, they are added to the existing consumer group to share the message consumption load of that queue.

[0115] Furthermore, if the processing latency of the secondary message queue is below a preset recovery threshold and the backlog of unprocessed messages in the secondary message queue continues to decrease, the number of write consumer process instances listening to the secondary message queue is reduced. Conversely, the elastic scaling controller checks whether the queue's processing latency has decreased to below a more lenient preset recovery threshold and whether its backlog of unprocessed messages is showing a continuous decreasing trend. When both conditions are met, it indicates that the current consumer processing capacity is excessive and resources are idle. The controller then triggers a scaling-down operation, selecting and safely stopping a portion of the write consumer process instances listening to the queue, reducing its overall number. The scaling-down process needs to ensure that messages being processed are completed and that the queue's consumption offset is properly transferred to avoid message loss.
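The scale-out and scale-in conditions can be summarized in a small decision function. "Continues to increase" and "continues to decrease" are interpreted here as strictly monotonic backlog samples over the observation window; that interpretation, the function names, and the returned labels are assumptions for illustration.

```python
def strictly_monotonic(samples: list, increasing: bool) -> bool:
    """True if every sample is larger (or smaller) than the one before it."""
    pairs = list(zip(samples, samples[1:]))
    if not pairs:
        return False
    if increasing:
        return all(later > earlier for earlier, later in pairs)
    return all(later < earlier for earlier, later in pairs)


def scaling_decision(latency_s: float, backlog_samples: list,
                     latency_threshold_s: float, recovery_threshold_s: float) -> str:
    """Return 'scale_out', 'scale_in', or 'hold' for one secondary message queue."""
    if latency_s > latency_threshold_s and strictly_monotonic(backlog_samples, True):
        return "scale_out"
    if latency_s < recovery_threshold_s and strictly_monotonic(backlog_samples, False):
        return "scale_in"
    return "hold"
```

Requiring both the latency condition and the backlog trend before acting helps avoid reacting to a single noisy sample.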

[0116] This embodiment introduces an elastic scaling mechanism based on key queue metrics, enabling the message storage system to adapt to load changes and achieve a dynamic balance between resource efficiency and system stability. Regarding automated resource adjustment, the system allocates computing resources based on actual load rather than a pre-configured fixed capacity. When a queue experiences increased processing latency and backlog due to a surge in message volume, the system automatically identifies the bottleneck and adds consumer instances to improve the concurrent processing capacity of that stage, thereby accelerating the digestion of backlogged messages and preventing further latency deterioration. Furthermore, when the load decreases, the system automatically reduces consumer instances, releasing unnecessary computing resources and lowering operating costs. This dynamic adjustment avoids excessive resource reservation and waste during low-load periods and prevents the need for manual emergency intervention due to resource shortages during high-load periods, improving operational efficiency. In terms of ensuring service quality, this mechanism uses processing latency and backlog trends as core control indicators, directly serving the system's performance goals. By controlling latency below a threshold, it ensures that the overall processing time from message entry into the queue to completion of persistence meets the requirements of the service level agreement, providing users with stable and predictable storage performance.

[0117] In some preferred embodiments, increasing the number of write consumer process instances monitoring the secondary message queue includes: inputting the processing latency of the secondary message queue, the backlog of unprocessed messages, and the current number of write consumer process instances monitoring the secondary message queue into a pre-trained first instance count prediction model; outputting a first initial recommended instance count from the first instance count prediction model, wherein the first initial recommended instance count output by the first instance count prediction model is limited to be greater than the current number of write consumer process instances monitoring the secondary message queue; obtaining the growth rate of the backlog of unprocessed messages in the secondary message queue within a preset statistical period; if the growth rate exceeds a preset growth rate threshold, adding a first elastic redundancy quantity positively correlated with the growth rate to the first initial recommended instance count to obtain a first target recommended instance count; if the growth rate does not exceed the preset growth rate threshold, using the first initial recommended instance count as the first target recommended instance count; and adjusting the number of write consumer process instances monitoring the secondary message queue to the first target recommended instance count.

[0118] In specific implementation, the processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue are input into a pre-trained first instance count prediction model. This model outputs a first initial recommended instance count, which is constrained to be greater than the number of write consumer process instances currently listening to the secondary message queue. The system maintains a dedicated pre-trained first instance count prediction model for scaling decisions. This model is trained on historical data, and its input features include the current processing latency of the target queue, the current backlog of unprocessed messages, and the current number of consumer instances. The first instance count prediction model is designed to output a theoretically sufficient number of consumer instances to meet performance requirements when sensing current load pressure; this is the first initial recommended instance count. The model is constrained in training and design to ensure that its recommended count is always greater than the current number of instances, thus ensuring that its recommendations always point towards scaling.

[0119] Furthermore, the growth rate of the unprocessed message backlog in the secondary message queue is obtained within a preset statistical period. The elastic scaling controller calculates the change in the unprocessed message backlog of this queue within the most recent preset statistical period. Specifically, the average growth rate is obtained by calculating the difference between the backlog at the end of the period and the beginning of the period, and then dividing it by the period duration. This rate reflects the severity of the load increase.

[0120] Furthermore, if the growth rate exceeds a preset growth rate threshold, a first elastic redundancy quantity positively correlated with the growth rate is added to the first initial recommended instance count, resulting in a first target recommended instance count. The controller compares the calculated growth rate with a preset threshold. If the growth rate exceeds this threshold, it indicates that the load is rapidly increasing, and expansion that only meets the current instantaneous demand may soon be insufficient. Therefore, an additional first elastic redundancy quantity needs to be added to the model-recommended first initial recommended instance count. This first elastic redundancy quantity is calculated based on the portion of the growth rate exceeding the threshold using a preset positive correlation function; the faster the growth, the greater the redundancy. The sum of these values yields the final first target recommended instance count, which is a more forward-looking expansion target.

[0121] Furthermore, if the growth rate does not exceed a preset growth rate threshold, the first initial recommended instance count will be used as the first target recommended instance count. If the calculated growth rate does not exceed the preset threshold, it indicates that the current load growth is slow or stable. In this case, it is considered that the first initial recommended instance count output by the first instance count prediction model can adequately meet the short-term demand in the future, and therefore it is directly determined as the final first target recommended instance count.
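The two branches above (growth rate above or not above the threshold) can be sketched as follows. The text only requires a "preset positive correlation function"; the linear rule used here, along with the names `scale_up_target` and `rate_per_extra_instance`, is an assumption chosen for illustration, not the function used by the invention.

```python
import math


def scale_up_target(initial_recommended: int, growth_rate: float,
                    growth_threshold: float,
                    rate_per_extra_instance: float = 10.0) -> int:
    """Return the first target recommended instance count.

    Assumption: redundancy grows linearly with the portion of the growth
    rate that exceeds the threshold (one extra instance per
    `rate_per_extra_instance` messages/second of excess, rounded up).
    """
    if growth_rate > growth_threshold:
        excess = growth_rate - growth_threshold
        redundancy = math.ceil(excess / rate_per_extra_instance)
        return initial_recommended + redundancy
    # Slow or stable growth: use the model's recommendation unchanged.
    return initial_recommended


print(scale_up_target(6, growth_rate=50.0, growth_threshold=20.0))  # 9
print(scale_up_target(6, growth_rate=15.0, growth_threshold=20.0))  # 6
```

Any monotonically increasing function of the excess growth rate would satisfy the "positively correlated" requirement; the linear form is merely the simplest to reason about.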

[0122] Furthermore, the number of write consumer process instances listening to the secondary message queue is adjusted to the first target recommended number of instances. The controller calculates the number of new instances needed, which is the difference between the first target recommended number of instances and the current number of instances. Then, it calls the infrastructure management interface to create and start the corresponding number of new write consumer process instances, enabling them to join the listening to the queue, ultimately increasing the total number of instances to the first target recommended number.
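As a rough sketch of this adjustment step, the controller computes the difference between the target and the current instance count and starts that many new consumers. The `start_instance` callback stands in for the unspecified infrastructure management interface and is purely hypothetical, as are the instance names.

```python
from typing import Callable, List


def scale_up(current_instances: List[str], target_count: int,
             start_instance: Callable[[], str]) -> List[str]:
    """Start new write consumers until `target_count` instances exist.

    `start_instance` is a hypothetical hook that creates one consumer
    bound to the queue and returns its identifier.
    """
    to_add = max(0, target_count - len(current_instances))
    started = [start_instance() for _ in range(to_add)]
    return current_instances + started


# Usage with a stub launcher that only generates names.
_counter = iter(range(1000))
instances = scale_up(["writer-0", "writer-1"], target_count=5,
                     start_instance=lambda: f"writer-new-{next(_counter)}")
print(len(instances))  # 5
```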

[0123] This embodiment achieves more intelligent and accurate scale-up decisions than simple threshold triggering by introducing a prediction model and combining it with dynamic adjustments based on the load growth rate. In terms of decision foresight, the pre-trained first instance count prediction model infers the number of instances required to meet performance goals from multi-dimensional features such as current processing latency, backlog, and existing resources, rather than making a binary judgment based solely on whether a single threshold has been exceeded. This model-based approach can detect resource insufficiency trends earlier and provide quantitative scale-up guidance. In addition, the model is constrained to output a recommended value greater than the current number of instances, which focuses the decision logic on solving capacity shortages and provides a clear scale-up baseline. In terms of resource buffering, the first elastic redundancy quantity, which is positively correlated with the growth rate, equips the system to handle sudden traffic spikes or accelerating growth. When a rapid increase in backlog is detected, the system not only alleviates the current bottleneck but also anticipates further growth in demand, scaling up in one step to a target slightly higher than the currently computed requirement. This is equivalent to preparing buffer resources in advance for the upcoming load peak, which avoids frequent, continuous small-step expansion operations during rapid growth, thereby reducing the processing latency fluctuations caused by expansion lag and improving the stability and smoothness of the system when facing a surge in load.

[0124] In some preferred embodiments, reducing the number of write consumer process instances monitoring the secondary message queue includes: inputting the processing latency of the secondary message queue, the backlog of unprocessed messages, and the current number of write consumer process instances monitoring the secondary message queue into a pre-trained second instance count prediction model; outputting a second initial recommended instance count from the second instance count prediction model, wherein the output second initial recommended instance count is limited to be less than the current number of write consumer process instances monitoring the secondary message queue; obtaining the rate of decrease of the backlog of unprocessed messages in the secondary message queue within a preset statistical period; if the absolute value of the rate of decrease exceeds a preset absolute value threshold, subtracting a second elastic redundancy quantity positively correlated with the absolute value of the rate of decrease from the second initial recommended instance count to obtain a second target recommended instance count; if the absolute value of the rate of decrease does not exceed the preset absolute value threshold, using the second initial recommended instance count as the second target recommended instance count; and adjusting the number of write consumer process instances monitoring the secondary message queue to the second target recommended instance count.

[0125] In practice, the processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue are input into a pre-trained second instance count prediction model. This model outputs a second initial recommended instance count, which is constrained to be less than the number of write consumer process instances currently listening to the secondary message queue. The system maintains this pre-trained second instance count prediction model specifically for scale-down decisions. Its input features are the same as those of the scale-up model: the current processing latency, the backlog, and the current number of consumer instances. Its design goal is to output the minimum number of instances required to maintain an acceptable performance level under the current load, i.e., the second initial recommended instance count. The output of the second instance count prediction model is constrained, either during training or by limiting its output, to always be less than the current number of instances, which ensures that its recommendations always point towards scaling down.

[0126] Furthermore, the rate of decrease in the backlog of unprocessed messages in the secondary message queue is obtained within a preset statistical period. The elastic scaling controller calculates the change in the backlog of unprocessed messages in this queue within the most recent preset statistical period. The average rate of change is obtained by subtracting the backlog at the beginning of the period from the backlog at the end of the period and then dividing the difference by the period duration. While the backlog is shrinking, this rate is negative, and its absolute value reflects the speed at which the load is decreasing.

[0127] Furthermore, if the absolute value of the rate of decrease exceeds a preset absolute value threshold, a second elastic redundancy quantity positively correlated with the absolute value of the rate of decrease is subtracted from the second initial recommended instance count to obtain the second target recommended instance count. The controller compares the calculated absolute value of the rate of decrease with a preset threshold. If the absolute value exceeds the threshold, it indicates that the backlog is being rapidly cleared and the load is decreasing rapidly. Scaling down always carries a risk of over-shrinking and a sudden shortage of processing capacity, but a rapid decrease suggests that the low load is likely to persist. Therefore, instead of directly using the model-recommended second initial recommended instance count, a second elastic redundancy quantity is subtracted from it. This second elastic redundancy quantity is calculated from the portion of the absolute value of the rate of decrease that exceeds the threshold, using a positive correlation function; the faster the decrease, the greater the amount subtracted. As a result, during a rapid load decrease the second target recommended instance count is less than the number directly recommended by the model, which allows a larger scale-down.

[0128] Furthermore, if the absolute value of the reduction rate does not exceed a preset absolute value threshold, the second initial recommended instance count is used as the second target recommended instance count. If the absolute value of the reduction rate does not exceed the preset threshold, it indicates that the backlog reduction rate is gradual or stable. In this case, the second initial recommended instance count output by the second instance count prediction model is used as the final second target recommended instance count, and a normal scaling-down is performed.
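The scale-down branches can be sketched in the same way as the scale-up case. As before, the linear correlation rule and the names `scale_down_target` and `rate_per_removed_instance` are illustrative assumptions. The floor of one instance is also an assumption; it is added here only because claim 1 requires at least one write consumer process per queue.

```python
import math


def scale_down_target(initial_recommended: int, backlog_start: int,
                      backlog_end: int, period_seconds: float,
                      abs_threshold: float,
                      rate_per_removed_instance: float = 10.0,
                      min_instances: int = 1) -> int:
    """Return the second target recommended instance count."""
    # Negative while the backlog is draining.
    rate = (backlog_end - backlog_start) / period_seconds
    magnitude = abs(rate)
    if magnitude > abs_threshold:
        # Assumed linear rule: remove one extra instance per
        # `rate_per_removed_instance` messages/second above the threshold.
        excess = magnitude - abs_threshold
        redundancy = math.ceil(excess / rate_per_removed_instance)
        return max(min_instances, initial_recommended - redundancy)
    # Gradual decrease: use the model's recommendation unchanged.
    return initial_recommended


print(scale_down_target(5, 4200, 1200, 60.0, abs_threshold=20.0))  # 2
print(scale_down_target(5, 1500, 1200, 60.0, abs_threshold=20.0))  # 5
```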

[0129] Furthermore, the number of write consumer process instances listening to the secondary message queue is adjusted to the second target recommended number of instances. The controller calculates the number of instances that need to be reduced, which is the difference between the current number of instances and the second target recommended number of instances. Then, according to the safe scaling-down strategy, a corresponding number of write consumer process instances are selected and stopped, ultimately reducing the total number of instances to the second target recommended number of instances.

[0130] This embodiment achieves more intelligent and robust scale-down decisions than simple threshold triggering by introducing a dedicated prediction model for scaling down and combining it with dynamic adjustments based on the load reduction rate. Regarding decision accuracy, the pre-trained second instance count prediction model infers the minimum resources required to maintain service from the current system state, avoiding the risk of blindly scaling down solely because latency is low and the backlog has shrunk. The model is constrained to output a recommended value less than the current instance count, which keeps its recommendations focused on releasing redundant resources, provides a clear scale-down baseline, and facilitates refined cost control. Regarding scale-down robustness, by analyzing the reduction rate and introducing the second elastic redundancy adjustment, the system responds differently to different load decline patterns. When the load declines gradually, the system performs a regular scale-down according to the model recommendation, safely reclaiming resources. When the load declines rapidly, the system recognizes this trend and scales down more aggressively by subtracting a redundancy quantity positively correlated with the decline rate. The rationale is that a rapid decline often indicates that the traffic peak has passed and that the system will enter and remain in a lower load state, at which point reserving too many resources is of little benefit. Through this dynamic adjustment, the system can release excess resources back to the resource pool more quickly without immediately triggering a performance rebound, thus improving the turnover efficiency of resource utilization.

[0131] It should be further explained that the first instance count prediction model is a machine learning model specifically designed for scale-up scenarios, such as a gradient boosting decision tree or a deep neural network. This first instance count prediction model takes the current processing latency of the target secondary message queue, the current backlog of unprocessed messages, and the number of write consumer process instances currently listening to that queue as input features. Its core function is to output a suggested total number of instances required to meet preset performance metrics, based on the load-resource relationship learned from historical system operation data; this is called the first initial recommended instance count. During the training phase, this first instance count prediction model is given a structural constraint through a specific loss function or sample construction method: for any input, its output first initial recommended instance count must be greater than the current instance count carried in the input features. This constraint ensures that the prediction logic of this first instance count prediction model is essentially aimed at identifying resource gaps and suggesting additional resources.

[0132] Furthermore, the second instance count prediction model is a machine learning model specifically designed for scale-down scenarios, such as a gradient boosting decision tree or a deep neural network. This second instance count prediction model receives exactly the same set of input features as the first instance count prediction model. Its core function is to learn, based on historical data, the minimum resource boundary required to maintain performance targets and to output a suggested minimum total number of instances, called the second initial recommended instance count. During the training phase, this second instance count prediction model is subject to a structural constraint: for any input, its output second initial recommended instance count must be less than the current instance count carried in the input features. This constraint ensures that the prediction logic of this second instance count prediction model is essentially geared towards identifying resource redundancy and suggesting resource release.

[0133] Specifically, the training method for the first instance count prediction model and/or the second instance count prediction model includes collecting monitoring data from multiple secondary message queues within a historical time period as training samples. The features of each sample include, but are not limited to: processing latency, backlog of unprocessed messages, current number of consumer instances, and message production rate. The label of each sample is the minimum or optimal number of consumer instances, determined manually or by automated rules, that meets performance requirements under the monitored data state. A gradient boosting decision tree algorithm is used for training, with the goal of minimizing the error between the predicted instance count and the labeled instance count. The constraint of "limiting the output to be greater than (or less than) the current instance count" is implemented by adding post-processing logic after the model inference stage: for the first instance count prediction model, if the raw output value of the model is not greater than the current instance count, it is adjusted to the current instance count plus one; for the second instance count prediction model, if the raw output value of the model is not less than the current instance count, it is adjusted to the current instance count minus one.
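The post-processing rule described above can be sketched as two small functions. The function names are illustrative, and rounding the raw model output to the nearest integer is an assumption; the text only specifies the "plus one" and "minus one" adjustments.

```python
def constrain_scale_up(raw_prediction: float, current: int) -> int:
    """Force the first model's output to exceed the current instance count."""
    predicted = round(raw_prediction)  # assumption: round to nearest integer
    return predicted if predicted > current else current + 1


def constrain_scale_down(raw_prediction: float, current: int) -> int:
    """Force the second model's output below the current instance count.

    Note: the text does not state a lower bound, so with current == 1 this
    returns 0; a deployment would likely need its own minimum.
    """
    predicted = round(raw_prediction)  # assumption: round to nearest integer
    return predicted if predicted < current else current - 1


print(constrain_scale_up(3.2, current=4))    # 5 (raw output too low)
print(constrain_scale_up(6.7, current=4))    # 7 (raw output kept)
print(constrain_scale_down(5.1, current=4))  # 3 (raw output too high)
print(constrain_scale_down(2.4, current=4))  # 2 (raw output kept)
```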

[0134] The two models mentioned above together constitute a complete resource decision-making system. The first instance count prediction model provides scale-up guidance while the load is increasing, and the second instance count prediction model provides scale-down guidance while the load is decreasing, thereby realizing intelligent two-way elastic scaling.

[0135] The inventive core of the aforementioned first and second instance count prediction models lies in their application to dynamically recommending the number of message queue consumer instances, supplemented by redundancy adjustments based on the backlog growth or decrease rate, thereby achieving intelligent elastic scaling. The models can be trained on historical monitoring data using mature machine learning algorithms in the field (such as gradient boosting decision trees), and this invention does not specifically limit how they are trained.

[0136] Referring to Figure 2, Figure 2 is a schematic block diagram of a computer device provided in an embodiment of this application. The computer device 500 can be a terminal or a server, wherein the server can be a standalone server or a server cluster composed of multiple servers.

[0137] The computer device 500 includes a processor 502, a memory, and a network interface 505 connected via a system bus 501. The memory may include a non-volatile storage medium 503 and internal memory 504.

[0138] The non-volatile storage medium 503 can store the operating system 5031 and the computer program 5032. When the computer program 5032 is executed, it enables the processor 502 to execute a high-concurrency message real-time storage method.

[0139] The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.

[0140] The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute a high-concurrency message real-time storage method.

[0141] The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the above structure is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device 500 to which the present application is applied. A specific computer device 500 may include more or fewer components than shown in the figures, or combine certain components, or have different component arrangements.

[0142] The processor 502 is used to run a computer program 5032 stored in a memory to implement the steps of a high-concurrency real-time message storage method provided in any of the above method embodiments.

[0143] It should be understood that in the embodiments of this application, the processor 502 may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0144] It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the embodiments of the above methods.

[0145] Therefore, the present invention also provides a storage medium. This storage medium can be a computer-readable storage medium. The storage medium stores a computer program. When executed by a processor, the computer program causes the processor to perform the steps of a high-concurrency message real-time storage method provided in any of the above method embodiments.

[0146] The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, external hard drive, read-only memory (ROM), magnetic disk, or optical disk, or any other physical storage medium capable of storing program code. The computer-readable storage medium can be non-volatile or volatile.

[0147] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0148] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division of each unit is merely a logical functional division, and there may be other division methods in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.

[0149] The steps in the methods of this invention can be reordered, merged, and omitted according to actual needs. The units in the apparatus of this invention can be merged, divided, and omitted according to actual needs. Furthermore, the functional units in the various embodiments of this invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0150] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.

[0151] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0152] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims and their equivalents, this invention is also intended to include them.

[0153] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present invention, and these modifications or substitutions should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A high-concurrency message real-time storage method, characterized in that it comprises:
receiving a message sent by a client and writing the message to a first-level cache;
in response to the message being written to the first-level cache, returning a success response to the client;
generating a globally unique sequence identifier for the message, the sequence identifier being used to determine the temporal order of the message;
determining the target storage shard corresponding to the message according to preset sharding rules;
publishing the message carrying the sequence identifier to a secondary message queue bound to the target storage shard, the secondary message queue being used to buffer messages in the order of the sequence identifiers;
listening to the secondary message queue and, according to the sequence identifier, writing messages in batches to the third-level persistent storage unit corresponding to the target storage shard, which comprises:
for each target storage shard, deploying at least one independent write consumer process for the corresponding secondary message queue, the write consumer process performing the following loop:
initiating a batch pull request to the secondary message queue to obtain a batch of messages sorted by sequence identifier;
if messages are retrieved, dynamically calculating the batch size and waiting timeout for this batch write operation based on the current system load, and temporarily storing the retrieved messages in memory after sorting them by sequence identifier; when the number of messages temporarily stored in memory reaches the batch size or the waiting time reaches the waiting timeout, writing the messages temporarily stored in memory to the third-level persistent storage unit as a single transactional batch in the order of the sequence identifiers;
if no messages are retrieved, registering, by the write consumer process, an asynchronous push listener with the secondary message queue and entering a blocking wait state; when the asynchronous push listener is triggered, indicating that a new message has arrived in the secondary message queue, re-executing the step of initiating a batch pull request to the secondary message queue.

2. The high-concurrency message real-time storage method according to claim 1, characterized in that the step of generating a globally unique sequence identifier for the message comprises:
determining the session identifier corresponding to the message;
querying the sequence number corresponding to the session identifier;
based on a distributed lock mechanism, atomically incrementing the sequence number corresponding to the session identifier to generate the sequence identifier of the message;
updating the mapping relationship between the sequence identifier and the session identifier to a globally shared metadata store.

3. The high-concurrency message real-time storage method according to claim 2, characterized in that the step of determining the target storage shard corresponding to the message according to the preset sharding rules comprises:
if the message is a one-to-one chat message, calculating a first hash value based on the recipient user identifier corresponding to the message, and taking the first hash value modulo the preset total number of shards to obtain a first target storage shard;
if the message is a group message, calculating a second hash value based on the group identifier corresponding to the message, and taking the second hash value modulo the preset total number of shards to obtain a second target storage shard.

4. The high-concurrency message real-time storage method according to claim 3, characterized in that the method further comprises:
establishing an index relationship between the group member user identifiers of the group corresponding to the message and the second target storage shard, and storing the index relationship in a separate routing table.

5. The high-concurrency message real-time storage method according to claim 4, characterized in that the method further comprises:
collecting the processing latency and the backlog of unprocessed messages for each of the secondary message queues separately;
if the processing latency of the secondary message queue exceeds a preset latency threshold, and the backlog of unprocessed messages in the secondary message queue continues to grow, increasing the number of write consumer process instances listening to the secondary message queue;
if the processing latency of the secondary message queue is lower than a preset recovery threshold, and the backlog of unprocessed messages in the secondary message queue continues to decrease, reducing the number of write consumer process instances listening to the secondary message queue.

6. The high-concurrency message real-time storage method according to claim 5, characterized in that increasing the number of write consumer process instances listening to the secondary message queue comprises:
inputting the processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue into a pre-trained first instance count prediction model;
outputting, by the first instance count prediction model, a first initial recommended instance count, wherein the first instance count prediction model limits the output first initial recommended instance count to be greater than the number of write consumer process instances currently listening to the secondary message queue;
obtaining the growth rate of the backlog of unprocessed messages in the secondary message queue within a preset statistical period;
if the growth rate exceeds a preset growth rate threshold, adding a first elastic redundancy quantity positively correlated with the growth rate to the first initial recommended instance count to obtain a first target recommended instance count;
if the growth rate does not exceed the preset growth rate threshold, using the first initial recommended instance count as the first target recommended instance count;
adjusting the number of write consumer process instances listening to the secondary message queue to the first target recommended instance count.

7. The high-concurrency message real-time storage method according to claim 5, characterized in that reducing the number of write consumer process instances listening to the secondary message queue comprises:
inputting the processing latency of the secondary message queue, the backlog of unprocessed messages, and the number of write consumer process instances currently listening to the secondary message queue into a pre-trained second instance count prediction model;
outputting, by the second instance count prediction model, a second initial recommended instance count, wherein the output second initial recommended instance count is limited to be less than the number of write consumer process instances currently listening to the secondary message queue;
obtaining the rate of decrease of the backlog of unprocessed messages in the secondary message queue within a preset statistical period;
if the absolute value of the rate of decrease exceeds a preset absolute value threshold, subtracting a second elastic redundancy quantity positively correlated with the absolute value of the rate of decrease from the second initial recommended instance count to obtain a second target recommended instance count;
if the absolute value of the rate of decrease does not exceed the preset absolute value threshold, using the second initial recommended instance count as the second target recommended instance count;
adjusting the number of write consumer process instances listening to the secondary message queue to the second target recommended instance count.

8. A computer device, characterized in that the computer device includes a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method as described in any one of claims 1-7.

9. A computer-readable storage medium, characterized in that the storage medium stores a computer program that, when executed by a processor, implements the method as described in any one of claims 1-7.