High-throughput real-time data aggregation system for massive measurement points of oil and gas fields
By employing consistent hashing sharding and batch write real-time aggregation technologies, the problems of write latency and slow query response in the processing of massive measurement point data in oil and gas fields have been solved, achieving high throughput, real-time aggregation, and fast querying, thereby improving the efficiency of oil and gas field production data processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- VICTORY SOFT CORP
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies suffer from low write throughput, high aggregation calculation latency, and slow query response speed in processing massive amounts of measurement data from oil and gas fields, making it difficult to meet the needs of real-time monitoring and fault early warning.
The method employs consistent hash sharding, batch write real-time aggregation, multi-granularity in-memory tables and primary key indexes. It achieves uniform data mapping and parallel processing through hash location calculation module and data sharding routing module, combined with real-time write aggregation module for batch retrieval and sorted writing, and establishes primary key index for fast querying.
It achieves high-throughput writing of massive amounts of oil and gas field measurement data, millisecond-level real-time aggregation, and fast query response, improving the timeliness of data processing and the operational flexibility of the system, and supporting smooth scaling up and down of the cluster.
Figure CN122309566A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of data processing technology, and in particular relates to a high-throughput real-time data aggregation system for massive measurement points in oil and gas fields.
Background Technology
[0002] Tens of thousands of sensors, remote terminal units, and distributed control systems are deployed during the exploration, development, and production of oil and gas fields. These devices continuously collect data from various measuring points, such as pressure, temperature, flow, and vibration. In order to monitor the production situation in real time and make optimization decisions, it is necessary to unify and aggregate these scattered heterogeneous data sources so that the upper-level application system can quickly obtain the statistical characteristics of specified measuring points within a specific time range. The data aggregation system is a key intermediate link connecting the field equipment layer and the data analysis layer, and its performance directly determines the response time of operations such as production scheduling, fault warning, and trend analysis.
[0003] In existing technologies, traditional data aggregation methods are mostly built on general-purpose relational databases. Their storage engines are optimized for transaction processing rather than high-frequency writes, resulting in severe write bottlenecks when processing massive amounts of measurement point data; they cannot support concurrent writes of millions of data points per second. At the same time, existing methods usually use offline batch processing for aggregation calculations. After the data is written, the statistical calculations are completed only when a periodic task is triggered, so aggregation results are delayed by minutes or even hours. This cannot meet the second-level response requirements of real-time monitoring scenarios. In addition, existing methods lack index optimization for time series features during query processing. Retrieving aggregated data over a long time range often requires scanning a large number of original records, and the query response time deteriorates sharply as data volume grows, making it difficult to support the real-time requirements of interactive analysis. Therefore, there is an urgent need for a high-throughput real-time data aggregation system for massive measurement points in oil and gas fields that solves the problems of low data write throughput, high aggregation calculation latency, and slow query response, and improves the processing efficiency of oil and gas field production data from collection to application.
Summary of the Invention
[0004] In view of the shortcomings of the prior art, the purpose of this invention is to provide a high-throughput real-time data aggregation system for massive oil and gas field measurement points. Through consistent hash sharding, batch writing for real-time aggregation, multi-granularity memory tables and primary key indexes, it can achieve high-throughput writing of massive oil and gas field measurement point data, millisecond-level real-time aggregation, fast query response, and support smooth cluster scaling.
[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows: A high-throughput real-time data aggregation system for massive measurement points in oil and gas fields, including: a measurement point data parsing module, used to parse oil and gas field measurement point data packets according to their protocol to obtain raw data points, where each raw data point includes a measurement point identifier, a data generation timestamp, and a measurement point value; a hash location calculation module, used to calculate the hash value of the measurement point identifier according to a preset consistent hash function and obtain the position information on the hash ring; a data sharding routing module, used to route the raw data point based on the position information on the hash ring, determining the queue identifier of the target data shard queue corresponding to that position information and sending the raw data point to the target data shard queue corresponding to the queue identifier; a real-time write aggregation module, used to write the raw data points of the target data shard queue into a preset distributed time series database in batches and, after the raw data points are written, to aggregate the raw data points in the distributed time series database in real time to obtain the first-level aggregation result data; an aggregation result storage and indexing module, used to store the first-level aggregation result data in the aggregation result storage area of the oil and gas field and to build a primary key index for the aggregation result storage area; and an aggregation query response module, used, when an aggregation query request from the oil and gas field is received, to query the aggregation result storage area based on the primary key index and return the retrieved target first-level aggregation result data to the sender of the aggregation query request.
[0006] Preferably, in the measurement point data parsing module, the process of obtaining the original data points is as follows: Capture data packets from measurement points sent by field equipment in oil and gas fields; Parse the header fields of the measurement point data packet to obtain the protocol type and device identifier of the measurement point data packet; Based on the protocol type, the corresponding protocol parser is called from the protocol library of the oil and gas field to decode the payload field of the measurement point data packet and obtain the measurement point identifier, data generation timestamp and measurement point value; Outlier removal and dimension normalization are performed on the measurement point values to obtain the original data points of the oil and gas field.
[0007] Preferably, in the hash position calculation module, the process of obtaining the position information on the hash ring is as follows: Extract the measurement point identifier from the original data points and obtain the byte sequence corresponding to the measurement point identifier; The byte sequence is input into a preset consistent hash function, and the byte sequence is processed according to the consistent hash function to generate a hash value; The hash value is projected onto a preset hash ring to determine the coordinates of the hash value's landing point on the hash ring; Based on the landing point coordinates, query the hash ring partition information pre-stored in the routing mapping table to obtain the queue identifier of the target data fragment queue corresponding to the landing point coordinates; The original data points are associated with the queue identifier to obtain the position information on the hash ring, and data points to be sent are generated.
[0008] Preferably, the routing mapping table is constructed as follows: Create data shard queues equal in number to the total number of data shards in the distributed time-series database; Divide the hash ring evenly into equal-sized contiguous segments, equal in number to the data shard queues, and assign a corresponding queue identifier to each segment; Store the mapping between the contiguous segments and their corresponding queue identifiers in the routing mapping table.
[0009] Preferably, in the data fragmentation routing module, the process of determining the queue identifier in the target data fragment queue corresponding to the location information and sending the original data point to the target data fragment queue corresponding to the queue identifier is as follows: Parse the queue identifier and the original data point from the data points to be sent; Locate the target data shard queue corresponding to the queue identifier; The original data points are appended to the tail of the target data shard queue to form the data points to be processed in the queue; Update the current length counter of the target data shard queue and generate queue status information.
[0010] Preferably, in the real-time write aggregation module, the process of obtaining the first-level aggregation result data is as follows: A preset number of raw data points are pulled from the target data shard queue to form a batch of data to be written; The batch is sorted in ascending order of data generation timestamp to generate an ordered data point sequence; The ordered data point sequence is written in batches into the data shard of the distributed time series database to generate the stored original data points; After the original data points have been written, an aggregation calculation task is triggered for the data shard; The aggregation calculation task reads from the data shard the stored original data points whose timestamps fall within the current aggregation window and generates a window data point set; The window data point set is grouped by measurement point identifier to generate measurement point group data; Based on a preset aggregation function, the measurement point values of the measurement point group data are aggregated in real time to obtain temporary aggregated values; The temporary aggregated value is merged with the intermediate aggregated result of the previous aggregation window for the same measurement point group to generate an updated intermediate aggregated result, which is output as the first-level aggregation result data.
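The batch-pull, sort, write, and per-window grouping steps described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the in-memory `deque` and `list` stand in for the data shard queue and the database shard, `BATCH_SIZE` and `WINDOW_MS` are hypothetical preset values, and the arithmetic mean is used as a stand-in aggregation function. The final merge with the previous window's intermediate result is left out because the text does not specify the merge rule.

```python
from collections import defaultdict, deque
from statistics import fmean

BATCH_SIZE = 3       # hypothetical preset number of points pulled per batch
WINDOW_MS = 60_000   # hypothetical aggregation window length (one minute)

# A raw data point is modelled as (point_id, timestamp_ms, value).


def pull_batch(queue: deque, batch_size: int = BATCH_SIZE) -> list:
    """Pull up to batch_size raw data points from the head of the shard queue."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch


def write_batch(shard_store: list, batch: list) -> None:
    """Sort the batch by data generation timestamp and append it to the shard."""
    shard_store.extend(sorted(batch, key=lambda point: point[1]))


def aggregate_window(shard_store: list, window_start_ms: int,
                     window_ms: int = WINDOW_MS) -> dict:
    """Group the points inside the window by point id and aggregate each group.

    The mean is only a placeholder for the preset aggregation function.
    """
    groups = defaultdict(list)
    for point_id, timestamp_ms, value in shard_store:
        if window_start_ms <= timestamp_ms < window_start_ms + window_ms:
            groups[point_id].append(value)
    return {point_id: fmean(values) for point_id, values in groups.items()}


if __name__ == "__main__":
    queue = deque([("P1", 30_000, 2.0), ("P1", 10_000, 4.0), ("P2", 20_000, 1.0)])
    store: list = []
    write_batch(store, pull_batch(queue))
    print(aggregate_window(store, window_start_ms=0))
```

In this sketch the aggregation is triggered explicitly after the write; in the described system the write itself triggers the aggregation task for the shard.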
[0011] Preferably, the temporary aggregation value is calculated by a formula (the formula and its symbols are not reproduced in this text) involving the following quantities: the temporary aggregation value; i, the index of the original data point; n, the number of original data points in the current aggregation window for the measurement point group; the measurement value of the i-th original data point within the current aggregation window; the arithmetic mean of all measurement values within the current aggregation window; and the preset volatility coefficient weighting factor.
[0012] Preferably, in the aggregation result storage and indexing module, the process of storing the first-level aggregation result data in the aggregation result storage area of the oil and gas field and constructing a primary key index for the aggregation result storage area is as follows: The first-level aggregation result data includes the measurement point identifier, aggregation window start time, aggregation window end time, and aggregation value; The first-level aggregation results data are sorted according to the measurement point identifier and the start time of the aggregation window to generate ordered aggregation records; The ordered aggregation records are appended to the data file in the aggregation result storage area to generate persistent aggregation data; In the index file of the aggregation result storage area, a primary key index entry is created for the aggregation record, consisting of the measurement point identifier, the start time of the aggregation window, and the end time of the aggregation window. The primary key index entry points to the storage offset of the aggregation record in the data file.
[0013] Preferably, in the aggregation query response module, the process of performing an aggregation query on the aggregation result storage area and returning the retrieved target first-level aggregation result data to the sender of the aggregation query request is as follows: When an aggregated query request from an oil and gas field is received, the target measurement point identifier, query start time, and query end time are parsed from the message body of the aggregated query request. Based on the target measurement point identifier, query start time, and query end time, search for matching primary key index entries in the index file of the aggregation result storage area to obtain the corresponding list of storage offsets. Based on the list of storage offsets, the corresponding aggregation result data is read from the data file in the aggregation result storage area to generate the query result dataset; The query result dataset is encapsulated into a response message in chronological order and returned to the sender of the aggregate query request.
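The storage and query steps above (sorted append to a data file, a primary key index pointing at storage offsets, and an offset-driven read) can be sketched as follows. This is an illustrative sketch under assumptions, not the patented implementation: the `AggregateStore` class, the JSON-lines record encoding, and the in-memory `io.BytesIO` buffer standing in for the on-disk data file are all hypothetical, and a query is assumed to match windows whose start time lies in the half-open range [query start, query end).

```python
import io
import json
from bisect import bisect_left, insort


class AggregateStore:
    """Append-only data file plus a sorted primary key index (hypothetical)."""

    def __init__(self) -> None:
        self.data_file = io.BytesIO()  # stand-in for the data file on disk
        # Index entries: (point_id, window_start, window_end, offset), kept sorted.
        self.index: list = []

    def append(self, records) -> None:
        """Sort records by (point_id, window_start) and append them."""
        for point_id, start, end, value in sorted(records, key=lambda r: (r[0], r[1])):
            self.data_file.seek(0, io.SEEK_END)
            offset = self.data_file.tell()
            line = json.dumps([point_id, start, end, value]).encode("utf-8") + b"\n"
            self.data_file.write(line)
            insort(self.index, (point_id, start, end, offset))

    def query(self, point_id: str, query_start: int, query_end: int) -> list:
        """Return records for point_id whose window start is in [query_start, query_end)."""
        low = bisect_left(self.index, (point_id, query_start))
        high = bisect_left(self.index, (point_id, query_end))
        results = []
        for _, _, _, offset in self.index[low:high]:
            self.data_file.seek(offset)
            results.append(tuple(json.loads(self.data_file.readline())))
        return results  # already chronological because the index is sorted


if __name__ == "__main__":
    store = AggregateStore()
    store.append([("P1", 60_000, 120_000, 5.0), ("P1", 0, 60_000, 3.0)])
    print(store.query("P1", 0, 120_000))
```

Because the index is ordered by measurement point identifier and then window start time, a range query touches only the matching index entries and reads only the records at their offsets, rather than scanning the data file.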
[0014] The present invention has the following beneficial effects: Through the coordinated operation of the hash location calculation module and the data sharding routing module, this invention uses a consistent hash function to map massive measurement point identifiers uniformly and route them to queues, so that data write pressure is decomposed horizontally and processed in parallel. This removes the bottleneck of traditional single-point writing and allows the system to scale its write throughput linearly by increasing the number of data shards. At the same time, the real-time write aggregation module employs batch retrieval, sorted writing, and write-triggered streaming aggregation, completing minute-level window statistics the instant the data is persisted. This compresses the output latency of aggregation results from minutes to milliseconds, significantly improves the timeliness of aggregated data, and provides second-level response data support for real-time monitoring and fault early warning in oil and gas fields.
[0015] This invention establishes a multi-level aggregation result data system and primary key index structure through the aggregation result storage and indexing module. This enables the synchronous generation and hierarchical storage of aggregation results at multiple time granularities, from minutes to hours to days. The sorted append writing method fully utilizes sequential disk write performance. The primary key index allows a query request to locate the exact storage position in the data file directly from the measurement point identifier and time range, avoiding the large input/output overhead of a full file scan. The aggregation query response module further improves query throughput through batch reading based on the storage offset list, ensuring millisecond-level query response in massive aggregation data scenarios. At the same time, the contiguous segment division of the hash ring and the dynamic maintenance of the routing mapping table ensure that only a small number of data mapping relationships need to be migrated when the cluster is expanded or contracted, greatly improving the system's operational flexibility and long-term stability under high load.
Attached Figure Description
[0016] Figure 1 is a system architecture diagram of the present invention; Figure 2 is a comparison chart of the aggregation curves of the real-time write aggregation module under different numbers of data points per window in an embodiment of the present invention; Figure 3 is a comparison chart of the temporary aggregated value, the original sampled value, and the arithmetic mean for the real-time write aggregation module in an embodiment of the present invention; Figure 4 is a comparison chart of the aggregation curves of the real-time write aggregation module under data of different volatility in an embodiment of the present invention; Figure 5 is a comparison chart of the aggregation curves of the real-time write aggregation module at different window granularities in an embodiment of the present invention.
Detailed Implementation
[0017] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
[0018] As shown in Figure 1, the high-throughput real-time data aggregation system 100 for massive oil and gas field measurement points can be set up in a cloud server. In terms of implementation, it can be deployed as one or more service devices, as an application installed in the cloud (such as a mobile service operator's server, server cluster, etc.), or as a website. According to the functions implemented, the system 100 may include a measurement point data parsing module 101, a hash location calculation module 102, a data sharding routing module 103, a real-time write aggregation module 104, an aggregation result storage and indexing module 105, and an aggregation query response module 106.
[0019] A module, also known as a unit, refers to a series of computer program segments that can be executed by the processor of an electronic device and perform a fixed function. These segments are stored in the memory of the electronic device. Each of the modules mentioned above can be implemented independently and can call other modules. In practical applications, these modules can be located in the same device or different devices, or they can be located in virtual devices, such as service instances in a cloud server.
[0020] The measurement point data parsing module 101 is used to parse the measurement point data packets of the oil and gas field according to the protocol to obtain the raw data points. The raw data points include the measurement point identifier, the data generation timestamp, and the measurement point value. The specific process is as follows: Capture data packets from measurement points sent by field equipment in oil and gas fields; Parse the header fields of the measurement point data packet to obtain the protocol type and device identifier of the measurement point data packet; Based on the protocol type, the corresponding protocol parser is called from the protocol library of the oil and gas field to decode the payload field of the measurement point data packet and obtain the measurement point identifier, data generation timestamp and measurement point value; Outlier removal and dimension normalization are performed on the measurement point values to obtain the original data points of the oil and gas field.
[0021] The measurement point data parsing module 101 continuously captures measurement point data packets sent by field devices in the oil and gas field through a network monitoring service deployed on the oil and gas field data acquisition front-end unit. These field devices include pressure sensors, temperature sensors, flow meters, and remote terminal units. The collected data is sent in the form of data packets through industrial Ethernet or serial communication links. The network monitoring service captures all data frames flowing through from the physical link layer based on raw sockets, and filters out specific data packets belonging to measurement point data transmission according to IP address and port number, thereby obtaining complete measurement point data packets.
[0022] A fixed-length header is extracted from the captured data packets from the measurement points. Based on the protocol specifications of commonly used industrial communication protocols in oil and gas fields, such as Modbus TCP, OPC UA, MQTT, and IEC 104, the protocol identifier byte, protocol version number, and device address field in the header are read sequentially. The protocol type of the data packet is determined by comparing it with the predefined protocol signature. At the same time, the field device number or IP address that sent the data packet is parsed from the header as the device identifier, thereby obtaining the protocol type and device identifier corresponding to each data packet from the measurement point.
[0023] According to the protocol type, the corresponding protocol parser is called from the protocol library of the oil and gas field to decode the payload field of the measurement point data packet, thereby obtaining the measurement point identifier, data generation timestamp, and measurement point value. The specific process is as follows: the protocol library pre-integrates parser components for different industrial protocols. Each parser contains the message format definition and field extraction rules of the corresponding protocol. The measurement point data parsing module 101 searches for and loads the corresponding protocol parser in the protocol library according to the protocol type obtained in the previous step. The parser then parses the payload field of the remaining part of the measurement point data packet byte by byte according to the protocol specification, extracting the measurement point identifier representing the measurement point number, the data generation timestamp representing the time of data generation, and the measurement point value representing the physical quantity value, thereby obtaining a set of unprocessed raw triples.
[0024] The decoded measurement point values are compared with the pre-configured reasonable range for that measurement point. If the value exceeds the upper limit or falls below the lower limit, it is judged as an outlier and removed, and will not proceed to the next processing step. For the retained measurement point values, according to the dimensional definition of the measurement point, the values of all measurement points are uniformly converted to the standard dimensions under the International System of Units (SI) by multiplying or dividing by the corresponding conversion coefficient. For example, pressure is uniformly converted to megapascals and temperature is uniformly converted to degrees Celsius. After the conversion is completed, the measurement point identifier, data generation timestamp, and processed measurement point values are combined to form the original oil and gas field data points that can be directly used by subsequent modules.
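The range check and unit conversion described above can be sketched in Python as follows. The `RawDataPoint` type, the `POINT_CONFIG` table, the point identifiers, and the limit values are hypothetical and chosen only for illustration; the sketch assumes each measurement point has a configured reasonable range in the device's native unit and a single multiplicative conversion coefficient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawDataPoint:
    point_id: str      # measurement point identifier
    timestamp_ms: int  # data generation timestamp
    value: float       # measurement point value in the standard unit


# Hypothetical per-point configuration:
# (lower limit, upper limit, conversion coefficient to the standard unit).
# Limits are expressed in the device's native unit.
POINT_CONFIG = {
    "WELL-01.PRESSURE": (0.0, 40_000.0, 0.001),  # kPa in, MPa out
    "WELL-01.TEMP": (-40.0, 400.0, 1.0),         # already in degrees Celsius
}


def normalize(point_id: str, timestamp_ms: int, raw_value: float) -> Optional[RawDataPoint]:
    """Drop out-of-range values and convert the rest to the standard unit.

    Returns None when the point is unknown or the value is an outlier,
    so the caller can skip it instead of passing it downstream.
    """
    config = POINT_CONFIG.get(point_id)
    if config is None:
        return None
    lower, upper, coefficient = config
    if raw_value < lower or raw_value > upper:
        return None  # outlier: removed, not forwarded to later modules
    return RawDataPoint(point_id, timestamp_ms, raw_value * coefficient)


if __name__ == "__main__":
    print(normalize("WELL-01.PRESSURE", 1_700_000_000_000, 12_500.0))
    print(normalize("WELL-01.PRESSURE", 1_700_000_000_000, 50_000.0))
```

Returning `None` for rejected values keeps the filtering decision in one place; a real deployment might instead log or count the rejected readings.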
[0025] The measurement point data parsing module 101 can convert heterogeneous measurement point data into standardized raw data points by real-time capture and accurate parsing of various protocol data packets in the oil and gas field. This effectively solves the problem of complex data source formats and inconsistent naming in oil and gas fields, providing a clean and consistent data foundation for subsequent high-throughput writing and real-time aggregation. At the same time, outlier removal and unit normalization significantly improve data quality and avoid aggregation errors caused by abnormal data or inconsistent units, thereby ensuring the accuracy and reliability of the entire system's processing results.
[0026] The hash location calculation module 102 is used to calculate the hash value of the measurement point identifier according to a preset consistent hash function, and obtain the position information on the hash ring. The process is as follows: Extract the measurement point identifier from the original data points and obtain the byte sequence corresponding to the measurement point identifier; The byte sequence is input into a preset consistent hash function, and the byte sequence is processed according to the consistent hash function to generate a hash value; The hash value is projected onto a preset hash ring to determine the coordinates of the hash value's landing point on the hash ring; Based on the landing point coordinates, query the hash ring partition information pre-stored in the routing mapping table to obtain the queue identifier of the target data shard queue corresponding to the landing point coordinates; The original data points are associated with the queue identifier to obtain the position information on the hash ring, and data points to be sent are generated.
[0027] The process of constructing the routing mapping table is as follows: Create data shard queues equal in number to the total number of data shards in the distributed time-series database; Divide the hash ring evenly into equal-sized contiguous segments, equal in number to the data shard queues, and assign a corresponding queue identifier to each segment; Store the mapping between the contiguous segments and their corresponding queue identifiers in the routing mapping table.
[0028] The measurement point identifier is extracted from the original data points. This identifier is a unique number string corresponding to each sensor or monitoring device in the oil and gas field. By reading the identifier field in the data structure of the original data point, the byte sequence of the measurement point identifier as stored in memory is obtained. This byte sequence is binary data in ASCII or Unicode encoding and serves as the input for the subsequent hash operation.
[0029] The byte sequence corresponding to the measurement point identifier is input into the preset consistent hash function processing unit. The consistent hash function is a mapping rule that maps input data of arbitrary length to a fixed-length output space. The hash location calculation module 102 calls the function to perform byte-by-byte cyclic shift and XOR operations on the byte sequence. Through multiple rounds of mixing, each bit of information in the byte sequence is diffused, and an unsigned integer hash value is finally generated. This hash value represents the mapping result of the measurement point identifier in the hash space.
[0030] The calculated hash value is projected onto a preset hash ring, which is a circular virtual space with its values covering all integers from zero to the maximum value. The hash position calculation module 102 maps the hash value to the total length of the hash ring to determine the specific angular position of the value on the circumference of the ring, thereby obtaining the landing point coordinates of the hash value on the hash ring. These landing point coordinates are a position mark between the start boundary and the end boundary of the hash ring.
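A minimal Python sketch of the hashing and projection steps follows. The text states only that the function mixes the bytes with cyclic shifts and XOR over multiple rounds, so the 32-bit width, the rotation amount, the seed constant, the round count, and the ring length `RING_SIZE` are all assumptions made for illustration; this is not the specific function used by the invention, and a deployed system would more likely use a well-tested hash function.

```python
RING_SIZE = 2 ** 32   # assumed total length of the hash ring
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Cyclically shift a 32-bit unsigned integer left by `shift` bits."""
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def hash_point_id(point_id: str, rounds: int = 3) -> int:
    """Mix the identifier's bytes with rotate-and-XOR over several rounds."""
    data = point_id.encode("utf-8")  # byte sequence of the identifier
    state = 0x9E3779B9               # arbitrary non-zero seed (assumption)
    for _ in range(rounds):
        for byte in data:
            state = rotl32(state, 5) ^ byte
    return state


def landing_point(point_id: str, ring_size: int = RING_SIZE) -> int:
    """Project the hash value onto the ring, giving a coordinate in [0, ring_size)."""
    return hash_point_id(point_id) % ring_size


if __name__ == "__main__":
    for pid in ("WELL-01.PRESSURE", "WELL-01.TEMP", "WELL-02.FLOW"):
        print(pid, landing_point(pid))
```

The same identifier always produces the same landing point, which is what allows all data from one measurement point to reach the same shard queue.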
[0031] Based on the landing point coordinates determined in the previous step, a search is performed in the pre-built and stored routing mapping table. The routing mapping table records the correspondence between each continuous segment on the hash ring and the data shard queue. The hash position calculation module 102 traverses the segment range records in the routing mapping table, finds the segment to which the landing point coordinates belong, and reads the queue identifier of the target data shard queue bound to it from the record corresponding to that segment. This queue identifier is a name string used to uniquely locate a physical queue in the message middleware.
[0032] The original data point is associated with the found queue identifier. This involves adding a field to the data structure of the original data point and writing the queue identifier into this field. This marks the target queue to which the original data point should be routed. After the association operation is completed, the original data point carrying the queue identifier is encapsulated into a new data unit called the data point to be sent. This data point to be sent contains all the information of the original data point and a clear routing destination identifier.
[0033] When the hash location calculation module 102 constructs the routing mapping table, it first obtains the total number of data shards currently configured in the distributed time-series database. This total is the number of shards preset by the database administrator based on the cluster size and data volume, and each shard corresponds to an independent storage unit in the database. Based on this total, the module creates exactly that number of data shard queues in the message middleware. Each queue corresponds to a data processing channel, which receives and temporarily stores the data to be written to the corresponding database shard.
[0034] The circular virtual space of the hash ring is evenly divided according to the total number of data shard queues. That is, the total length of the hash ring is divided by the number of queues to get the length of each segment. Thus, the entire hash ring is divided into continuous segments of equal length that are connected end to end. The number of segments is exactly the same as the number of data shard queues. Then, a unique queue identifier is assigned to each continuous segment. This queue identifier is the name of the data shard queue created earlier. Each segment is bound to a queue in a one-to-one correspondence.
[0035] The starting boundary value, ending boundary value, and queue identifier assigned to each segment are combined into a record. All such records are organized into a table-like data structure and stored in memory for fast lookup. This table is the routing mapping table. When the hash location calculation module 102 performs data point routing, it can quickly locate the corresponding queue identifier based on the landing point coordinates by querying this table.
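The construction and lookup of the routing mapping table can be sketched as below. The `Segment` and `RoutingTable` names, the `shard-queue-N` naming scheme, and the ring length are hypothetical. One deliberate substitution: the text describes traversing the segment records to find the one containing the landing point, whereas this sketch uses a binary search over the sorted start boundaries, which returns the same segment.

```python
from bisect import bisect_right
from typing import List, NamedTuple

RING_SIZE = 2 ** 32  # assumed total length of the hash ring


class Segment(NamedTuple):
    start: int     # starting boundary (inclusive)
    end: int       # ending boundary (exclusive)
    queue_id: str  # identifier of the bound data shard queue


class RoutingTable:
    """In-memory table mapping contiguous ring segments to queue identifiers."""

    def __init__(self, shard_count: int, ring_size: int = RING_SIZE) -> None:
        if shard_count <= 0:
            raise ValueError("shard_count must be positive")
        self.ring_size = ring_size
        # Equal-length segments joined end to end, one per data shard queue.
        self.segments: List[Segment] = [
            Segment(i * ring_size // shard_count,
                    (i + 1) * ring_size // shard_count,
                    f"shard-queue-{i}")
            for i in range(shard_count)
        ]
        self._starts = [segment.start for segment in self.segments]

    def lookup(self, landing_point: int) -> str:
        """Return the queue identifier of the segment containing landing_point."""
        if not 0 <= landing_point < self.ring_size:
            raise ValueError("landing point is outside the hash ring")
        index = bisect_right(self._starts, landing_point) - 1
        return self.segments[index].queue_id


if __name__ == "__main__":
    table = RoutingTable(shard_count=4, ring_size=100)
    print(table.segments)
    print(table.lookup(25))
```

With four shards on a ring of length 100, the segments are [0, 25), [25, 50), [50, 75), and [75, 100), so a landing point of 25 falls in the second segment.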
[0036] The hash location calculation module 102 uses a consistent hash function to uniformly map massive measurement point identifiers onto the hash ring. Combined with a dynamically scalable routing mapping table, it achieves a balanced distribution of measurement point data across different processing queues, fundamentally eliminating single-point write bottlenecks and providing the system with linearly scalable throughput. At the same time, the segmentation mechanism of the routing mapping table ensures that when the number of database shards is adjusted, only a small amount of data mapping relationships need to be migrated, greatly reducing the data redistribution overhead during cluster scaling and ensuring the system's stability and operational flexibility under high load.
[0037] The data fragmentation routing module 103 is used to perform data fragmentation routing on the measurement point identifier based on the position information on the hash ring, to determine the queue identifier in the target data fragmentation queue corresponding to the position information, and to send the original data point to the target data fragmentation queue corresponding to the queue identifier. The process is as follows: Parse the queue identifier and the original data point from the data points to be sent; Locate the target data shard queue corresponding to the queue identifier; The original data points are appended to the tail of the target data shard queue to form the data points to be processed in the queue; Update the current length counter of the target data shard queue and generate queue status information.
[0038] The data sharding routing module 103 parses the queue identifier and the original data point from the data point to be sent generated by the hash location calculation module 102. The data point to be sent is an encapsulated data structure containing two main parts: the queue identifier string written during the hash location calculation stage and the original measurement point data triplet. The data sharding routing module 103 extracts both parts by reading the corresponding fields of the data structure at their offsets, thereby obtaining the target queue name and the actual data content for subsequent routing.
[0039] Based on the parsed queue identifier, the target data shard queue is located in the message middleware cluster. A number of data shard queues equal to the number of shards in the distributed time-series database is created in the message middleware in advance, and each queue has a unique name identifier. The data sharding routing module 103 passes the queue identifier to the middleware's name resolution service through the queue addressing mechanism of the client library provided by the message middleware. This service returns a handle or reference to the physical queue, thereby identifying the target data shard queue to be written.
[0040] The parsed original data point is appended to the tail of the located target data shard queue. The data sharding routing module 103 sends the original data point as a message unit to the target queue through the producer interface of the message middleware. After receiving the message, the queue places it at the end of its storage structure, behind the data points that entered the queue earlier, so that the messages form a first-in-first-out sequence. Data points that have been written into the queue and are waiting for subsequent processing are called the data points to be processed in the queue.
[0041] After the append operation completes, the current length counter of the target data shard queue is updated. The message middleware maintains a counter for each data shard queue that records, in real time, the number of data points currently stored in the queue. After a successful write, the data sharding routing module 103 triggers the counter's increment operation, raising its value by one. At the same time, it obtains information such as the current queue length, the write timestamp, and the queue health status. This information is combined into queue status information for subsequent flow control and load monitoring.
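The four routing steps (parse, locate, append, update the counter) can be sketched as follows. This is an in-memory stand-in only: the `ShardQueue` class, the dictionary layout of the data point to be sent, and the health threshold are hypothetical and do not represent any particular message middleware.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ShardQueue:
    """In-memory stand-in for one data shard queue in the middleware."""
    name: str
    items: deque = field(default_factory=deque)
    length: int = 0  # current length counter


def route(queues: dict, to_send: dict) -> dict:
    """Append the original data point to its target queue; return status."""
    queue_id = to_send["queue_id"]   # written at the hash location stage
    point = to_send["point"]         # triplet: (point_id, timestamp, value)
    queue = queues[queue_id]         # locate the target queue by its name
    queue.items.append(point)        # append to the tail (FIFO order)
    queue.length += 1                # increment the length counter by one
    return {
        "queue_id": queue_id,
        "length": queue.length,
        "write_ts": time.time(),
        "healthy": queue.length < 10_000,  # assumed health threshold
    }
```

The returned dictionary plays the role of the queue status information that later flow control and load monitoring would consume.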
[0042] The data sharding routing module 103 ensures that each raw data point is accurately delivered to the target data sharding queue corresponding to its measurement point identifier hash value through precise queue identifier resolution and targeted writing mechanism. This achieves horizontal decomposition and parallel processing of data write pressure. At the same time, the method of appending to the tail of the queue ensures the orderliness of the data and avoids data contention and lock contention. The queue status information generated by updating the queue length counter provides the system with real-time load awareness, enabling the write traffic to be dynamically adjusted according to the queue status, further improving the stability and throughput efficiency of the entire data aggregation link.
[0043] The real-time write aggregation module 104 writes the original data points of the target data shard queue in batches to a preset distributed time-series database and, after each batch write, performs real-time aggregation calculations on the original data points in the database to obtain the first-level aggregation result data. The process is as follows: pull a preset number of original data points from the target data shard queue to form a batch of data to be written; sort the batch in ascending order of data generation timestamp to generate an ordered data point sequence; write the ordered data point sequence in a batch into the corresponding data shard of the distributed time-series database to generate the stored original data points; after the write completes, trigger an aggregation calculation task for the data shard; the aggregation calculation task reads from the data shard the stored original data points whose timestamps fall within the current aggregation window and generates a window data point set; group the window data point set by measurement point identifier to generate measurement point group data; apply a preset aggregation function to the measurement point values of each group in real time to obtain a temporary aggregated value; and merge the temporary aggregated value with the intermediate aggregation result of the previous aggregation window for that measurement point group to generate an updated intermediate aggregation result, which is output as the first-level aggregation result data.
[0044] The formula for calculating the temporary aggregation value is:

$$A = \bar{x} + \lambda \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $A$ is the temporary aggregation value, $i$ is the index of the original data point, $n$ is the number of original data points in the current aggregation window for this measurement point group, $x_i$ is the measurement value of the $i$-th original data point within the current aggregation window, $\bar{x}$ is the arithmetic mean of all measurement values within the current aggregation window, and $\lambda$ is the preset fluctuation coefficient weighting factor.
[0045] For the aggregation window, each time granularity is maintained in an independent region, giving an intermediate aggregation result memory area for that time granularity. Using the measurement point identifier as the key, the intermediate aggregation results of the measurement points at that time granularity are stored to construct an in-memory table of intermediate aggregation results for that granularity. When a temporary aggregate value is generated, it is written to the intermediate aggregation result memory table whose time granularity matches that of the current aggregation window. The aggregation results in the memory table with the coarser time granularity are output as the first-level aggregation result data.
[0046] The real-time write aggregation module 104 pulls a preset number of raw data points from the target data shard queue written by the data shard routing module 103. The pull operation is performed in batch reading mode through the consumer interface of the message middleware. Each time, a fixed number of data records are taken out from the head of the queue at once. These raw data points are removed from the queue temporary storage area and enter the processing flow. They are combined together to form a batch of data to be written. This batch of data contains multiple raw data points belonging to the same data shard, which prepares for subsequent batch writing.
[0047] The batch data to be written is sorted in ascending order according to the timestamp field generated within each original data point. The sorting operation is performed in memory. The order of data points within the batch is rearranged by comparing the size of the timestamp values, ensuring that earlier data points with smaller timestamps are placed first and more recent data points with larger timestamps are placed last. After sorting, the originally unordered batch data becomes an ordered sequence of data points arranged strictly according to time. This sequence ensures the temporal continuity of the data when it is written to the database.
[0048] The ordered data point sequence is written in batches to the data shards corresponding to the data shard queue in the distributed time series database. The write operation sends the entire sequence to the target data shard at once through the database's batch insert interface. After receiving this batch of data, the database appends it to the storage structure of the shard to complete persistent storage. At this point, these data points have been solidified in the database's disk file and become stored original data points that can be queried and calculated.
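The pull, sort, and batch-write steps can be sketched as below. A `deque` stands in for the data shard queue and a plain list for the database shard; both are assumptions for illustration, as is the `(point_id, timestamp, value)` tuple layout.

```python
from collections import deque


def pull_batch(queue: deque, batch_size: int) -> list:
    """Take up to batch_size points from the head of the queue."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch


def write_batch(shard: list, batch: list) -> list:
    """Sort the batch by timestamp and append it to the shard in one call."""
    ordered = sorted(batch, key=lambda p: p[1])  # p = (point_id, ts, value)
    shard.extend(ordered)  # stand-in for the database's batch insert
    return ordered
```

Sorting before the write means the shard receives each batch in ascending timestamp order, which is the property the description relies on for temporal continuity.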
[0049] After the original data points have been written, the real-time write aggregation module 104 triggers an aggregation calculation task for the data shard. This triggering action is automatically executed by the callback mechanism of the write operation. That is, whenever a batch of data is successfully written to the database, the system immediately starts a thread or process in the background that is specifically responsible for aggregation calculation. This task is decoupled from the write operation but is driven by the write event, ensuring that the aggregation calculation can be carried out in real time following the data writing.
[0050] The aggregation calculation task reads from the data shard all stored original data points whose timestamps fall within the current aggregation window. The aggregation window is a sliding window with a fixed time length preset by the system. The task calculates the start and end boundaries of the window from the current system time and the window length, and then sends a range query request to the database to obtain all data points whose timestamps fall within these boundaries. These data points form the window data point set, which contains all measurement point data received by the data shard within the current window.
[0051] The window data point set is grouped according to the measurement point identifier. That is, each stored original data point in the set is traversed, and data points belonging to the same measurement point are grouped together according to the value of the measurement point identifier field, forming multiple measurement point group data. Each group corresponds to a specific measurement point identifier. The group contains all measurement point values of that measurement point in the current aggregation window and the corresponding timestamp, providing a data foundation for independent aggregation calculation for each measurement point.
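Reading the window and grouping by measurement point can be sketched as follows. The half-open interval `[now - window_len, now)` is an assumption, since the description does not state whether the window boundaries are inclusive; the shard is again modelled as a list of `(point_id, timestamp, value)` tuples.

```python
from collections import defaultdict


def window_points(shard, now, window_len):
    """Return stored points whose timestamp lies in [now - window_len, now)."""
    start, end = now - window_len, now
    return [p for p in shard if start <= p[1] < end]


def group_by_point(points):
    """Group window data points by measurement point identifier."""
    groups = defaultdict(list)
    for point_id, ts, value in points:
        groups[point_id].append((ts, value))
    return dict(groups)
```

A real time-series database would serve the range query from its own time index; the list comprehension here only shows which points qualify.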
[0052] Based on preset aggregation functions, the measurement point values in the measurement point group data are aggregated and calculated in real time. The aggregation functions include basic statistical operations such as averaging, summing, finding the maximum value, and finding the minimum value. According to the aggregation type configured by the system for the measurement point, the real-time write aggregation module 104 performs the corresponding mathematical operation on all values in the group to obtain a temporary aggregate value that represents the statistical characteristics of the measurement point in the current window. This temporary aggregate value is an intermediate calculation result that reflects the data aggregation status of the measurement point in this window.
[0053] The calculated temporary aggregated value is merged with the intermediate aggregated result of the previous aggregation window of the grouped data of the measurement point stored in memory. An intermediate aggregated result record across windows is maintained in memory for each measurement point. For example, for the average value calculation, the intermediate result of the previous window may include the cumulative value of the sum and the data point count. The temporary aggregated value of the current window is added to these cumulative values and the count is updated to generate the updated intermediate aggregated result. When the end time of the current aggregation window arrives, this updated intermediate aggregated result is output as the complete first-level aggregated result data of the measurement point in this window for subsequent modules to store and query.
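For the average-value case mentioned above, the cross-window merge can be sketched as a running sum and count. The `Intermediate` class and its field names are hypothetical; the description only states that the previous window's intermediate result may hold a cumulative sum and a data point count.

```python
from dataclasses import dataclass


@dataclass
class Intermediate:
    """Cross-window running state for one measurement point (mean case)."""
    total: float = 0.0
    count: int = 0

    def merge(self, window_values):
        """Fold the current window's values into the running sum and count."""
        self.total += sum(window_values)
        self.count += len(window_values)
        return self

    @property
    def mean(self):
        """Average over everything merged so far, or None if empty."""
        return self.total / self.count if self.count else None
```

Keeping the sum and count instead of the mean itself lets results from successive windows be combined without re-reading the stored data points.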
[0054] When executing the aggregation window, the real-time writing aggregation module 104 maintains an independent intermediate aggregation result memory area for each time granularity in the aggregation window. These time granularities include different levels of fineness such as minute, hour, and day. Each time granularity corresponds to an independent memory area, which is used to temporarily store the intermediate aggregation results of all measurement points under that granularity, thereby obtaining the intermediate aggregation result memory area corresponding to each time granularity.
[0055] Using the measurement point identifier as the key, the intermediate aggregation results of each measurement point at each time granularity are stored in the intermediate aggregation result memory area, forming an intermediate aggregation result memory table for that time granularity. This memory table is a key-value pair structure, where the key is the measurement point identifier and the value is the cumulative statistical data of that measurement point at that time granularity. This structure enables hierarchical management and fast access to aggregation results at different time granularities.
[0056] When a temporary aggregate value is generated, it is updated to the intermediate aggregate result memory table of the corresponding time granularity according to the time granularity of the current aggregate window. For example, if the current window is a minute-level window, the calculation result is updated to the minute-level memory table. At the same time, the system will periodically summarize the aggregate results in the memory table of finer time granularity to the memory table of coarser time granularity. Finally, the aggregate result in the memory table of coarser time granularity is output as the first-level aggregate result data, thereby realizing the synchronous generation of aggregate results of multiple time granularities.
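The multi-granularity memory tables and the periodic roll-up from finer to coarser granularity can be sketched with one dictionary per granularity. The `[sum, count]` value layout and the choice to clear the finer table after each roll-up are assumptions made for illustration.

```python
def update(table: dict, point_id: str, value: float) -> None:
    """Fold one value into the table entry keyed by measurement point."""
    acc = table.setdefault(point_id, [0.0, 0])  # [sum, count]
    acc[0] += value
    acc[1] += 1


def roll_up(fine: dict, coarse: dict) -> None:
    """Summarise a finer table into a coarser one, then reset the finer."""
    for point_id, (total, count) in fine.items():
        acc = coarse.setdefault(point_id, [0.0, 0])
        acc[0] += total
        acc[1] += count
    fine.clear()  # assumed: the finer table starts a fresh period


# One independent in-memory table per time granularity.
tables = {"minute": {}, "hour": {}, "day": {}}
```

For example, a scheduler could call `roll_up(tables["minute"], tables["hour"])` at the end of each minute and `roll_up(tables["hour"], tables["day"])` at the end of each hour.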
[0057] The first major part of the temporary aggregate value calculation formula is the arithmetic mean of the measurement point values. This arithmetic mean comes from the values of all measurement points in the group data of the measurement point within the current aggregation window. When calculating the arithmetic mean, the measurement point values of each original data point in the group are first summed to obtain a sum. Then, this sum is divided by the number of original data points in the group. The quotient is the central tendency representative value of these measurement point values. This central tendency representative value reflects the general level of the measurement point within the current time window.
[0058] The second main part of the temporary aggregate value calculation formula is the standard deviation of the measurement point values. The standard deviation comes from the degree of difference between the value of each measurement point in the grouped data of the current measurement point within the current aggregation window and the arithmetic mean. To calculate the standard deviation, first calculate the difference between the value of each measurement point and the arithmetic mean, then square each difference to obtain a series of squared values, sum all the squared values to obtain the sum of squares, divide this sum of squares by the difference of the number of original data points minus one, and then take the square root of the quotient. The result is the standard deviation, which reflects the degree of fluctuation of the value of the measurement point within the current time window.
[0059] The fluctuation coefficient weighting factor in the temporary aggregation value calculation formula comes from the system's preset configuration parameters. This fluctuation coefficient weighting factor is a constant preset by the system administrator according to the operating conditions of different production areas in the oil and gas field. For production links with drastic fluctuations, such as the drilling process, a larger weighting factor can be set, while for relatively stable production links, such as stable pipelines, a smaller weighting factor can be set. The weighting factor is directly multiplied by the standard deviation during the calculation process to adjust the influence of the degree of fluctuation in the final aggregation result.
[0060] The index of the original data point in the temporary aggregation value calculation formula comes from the arrangement order of each original data point in the grouped data of the measurement point in the current aggregation window. This index starts from 1 and increments sequentially until the last original data point. It is used to access the measurement point value of each original data point one by one in the summation operation to ensure that each measurement point value is included in the calculation and is not missed or repeated.
[0061] The number of original data points in the temporary aggregate value calculation formula comes from the actual statistical results of the data points in the grouped data of the measurement point within the current aggregation window. This number is obtained by accumulating the counter when generating the grouped data of the measurement point. It represents how many valid original data points were generated for the measurement point within the time window. It is used as a divisor in the arithmetic mean calculation and is used to determine the average base of the sum of squares in the standard deviation calculation.
[0062] The specific process of the calculation formula is as follows: Obtain all original data points of the current measurement point within the current aggregation window from the measurement point group data; iterate through each original data point to extract its measurement point value, while simultaneously counting the number of original data points; sum all extracted measurement point values sequentially to obtain a cumulative sum; divide this cumulative sum by the number of original data points to obtain the arithmetic mean of the measurement point values; after calculating the arithmetic mean, iterate through all measurement point values again, subtract the calculated arithmetic mean from each measurement point value to obtain a series of differences; multiply each difference by itself once to obtain a square value; sum all square values together to obtain a sum of squares; divide this sum of squares by the difference between the number of original data points and one to obtain an intermediate value; perform a square root operation on this intermediate value to obtain the standard deviation of the measurement point value; read the fluctuation coefficient weighting factor corresponding to the measurement point from the system configuration; multiply the standard deviation by the fluctuation coefficient weighting factor to obtain a product; add this product to the previously calculated arithmetic mean; the final result is the temporary aggregated value representing the comprehensive characteristics of the current aggregation window for the measurement point.
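The calculation steps above can be sketched as a single function. The function and parameter names are hypothetical, and raising an error for fewer than two data points is an assumption: the description does not say how a group with a single data point is handled, although the divisor of n minus one is undefined in that case.

```python
import math


def temporary_aggregate(values, weight):
    """Arithmetic mean plus weight times the sample standard deviation."""
    n = len(values)
    if n < 2:
        # Assumed behaviour: the (n - 1) divisor needs at least two points.
        raise ValueError("at least two data points are required")
    mean = sum(values) / n                         # arithmetic mean
    sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared differences
    std = math.sqrt(sum_sq / (n - 1))              # standard deviation
    return mean + weight * std                     # mean plus weighted spread
```

For the values 2, 4, 6 the mean is 4 and the standard deviation is 2, so a weighting factor of 0.5 gives a temporary aggregate value of 5, while a factor of 0 returns the mean alone.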
[0063] Multiplying the fluctuation coefficient weighting factor by the standard deviation lets the degree of fluctuation moderate the aggregation result. When the standard deviation is large, the value of the measurement point is fluctuating sharply; after multiplication by the weighting factor, a large fluctuation compensation value is added to the arithmetic mean, so the aggregation result better reflects the influence of peaks and troughs. When the standard deviation is small, the value of the measurement point is stable; the added fluctuation compensation value is small, and the aggregation result is determined mainly by the arithmetic mean. This moderating mechanism allows the temporary aggregation value to reflect both the central tendency and the dispersion of the data.
[0064] The calculation process of the temporary aggregate value is closely linked to the generation process of the measurement point group data. The measurement point group data is the data source for the calculation. After the temporary aggregate value is generated, it will be used to merge with the intermediate aggregation result of the previous aggregation window. Therefore, the accuracy of the temporary aggregate value directly affects the quality of the final first-level aggregation result data. In the entire calculation process, all measurement point values involved in the calculation come from the same measurement point group within the same aggregation window, which ensures the relevance and consistency of the calculation results.
[0065] The real-time write aggregation module 104 triggers the aggregation calculation directly from the completion of each batch write, so that millisecond-level aggregation follows data persistence and the delay introduced by the traditional periodic batch processing mode is removed. At the same time, the independent maintenance of the multi-time-granularity memory tables enables the system to serve both real-time monitoring and historical trend analysis at different time granularities. The ordered writing and per-group calculation of window data points ensure that the aggregation result of each measurement point accurately reflects its true state, improving the timeliness and accuracy of the aggregated data and providing real-time and reliable data support for oil and gas field production scheduling and fault early warning.
[0066] The aggregation result storage and indexing module 105 stores the first-level aggregation result data in the aggregation result storage area of the oil and gas field and builds a primary key index for that storage area. The process is as follows: the first-level aggregation result data includes the measurement point identifier, the aggregation window start time, the aggregation window end time, and the aggregation value; the first-level aggregation result data is sorted by measurement point identifier and aggregation window start time to generate ordered aggregation records; the ordered aggregation records are appended to the data file in the aggregation result storage area to generate persistent aggregation data; and, in the index file of the aggregation result storage area, a primary key index entry consisting of the measurement point identifier, the aggregation window start time, and the aggregation window end time is created for each aggregation record. The primary key index entry points to the storage offset of the aggregation record in the data file.
[0067] The aggregation result storage and indexing module 105 receives first-level aggregation result data from the real-time write aggregation module 104. The first-level aggregation result data is a structured data record. Each record contains four core fields. The first field is the measurement point identifier, which is used to uniquely identify the oil and gas field measurement point to which the aggregation result belongs. The second field is the aggregation window start time, which indicates the start time of the time interval corresponding to the aggregation result. The third field is the aggregation window end time, which indicates the end time of the time interval corresponding to the aggregation result. The fourth field is the aggregation value, which represents the statistical calculation result of the measurement point within this time interval.
[0068] All received first-level aggregation results are sorted according to two fields: measurement point identifier and aggregation window start time. The sorting operation first uses the measurement point identifier as the primary sort key to arrange all aggregation records belonging to the same measurement point together. Then, within the same measurement point, the records are arranged in chronological order using the aggregation window start time as the secondary sort key. After sorting, the originally messy multiple aggregation records are transformed into an ordered sequence of aggregation records strictly ordered by measurement point and time. This sequence lays the foundation for efficient storage and retrieval in the future.
[0069] The sorted sequence of aggregate records is appended to the data file in the aggregation result storage area. The aggregation result storage area is a dedicated storage space pre-allocated on a distributed file system or local disk. The data file is written sequentially, with each aggregate record written to the end of the file. After each record is written, it will obtain a unique offset in the file to identify the starting storage position of the record in the file. These aggregate records that have been persisted to the disk file are called persistent aggregate data.
[0070] In the index file of the aggregation result storage area, a primary key index entry is created for each aggregation record written to the data file. The primary key index entry consists of three fields: measurement point identifier, aggregation window start time, and aggregation window end time. The combination of these three fields can uniquely identify an aggregation record. The index entry also contains the storage offset of the aggregation record in the data file. This offset directly points to the physical storage location of the record in the data file. All index entries are organized into an index file according to the order of the primary keys, realizing the ability to quickly locate data records by primary key.
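Sorted append writing with a primary key index can be sketched as below. The fixed 40-byte record layout (a 16-byte identifier, two 64-bit times, and a 64-bit float) is an assumption, and a dictionary stands in for the ordered index file; identifiers longer than 16 bytes would be truncated by this layout.

```python
import io
import struct

# Assumed layout: point_id (16 bytes), window_start, window_end, value.
RECORD = struct.Struct("<16sqqd")


def append_records(data_file, index: dict, records) -> None:
    """Append records in (point_id, window_start) order and index each one."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for point_id, start, end, value in ordered:
        offset = data_file.seek(0, io.SEEK_END)  # offset of the new record
        data_file.write(
            RECORD.pack(point_id.encode("utf-8"), start, end, value)
        )
        index[(point_id, start, end)] = offset   # primary key -> offset
```

With an `io.BytesIO()` buffer as `data_file`, three records yield offsets 0, 40, and 80, each recorded under its three-field primary key.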
[0071] The aggregation result storage and indexing module 105 achieves efficient persistent storage and fast retrieval of aggregation result data through sorted append writing and the primary key index construction mechanism. The sorting operation ensures that the aggregation records of the same measurement point are stored continuously, reducing random disk access. The sequential append writing fully utilizes the continuous write performance of the disk, greatly improving the write throughput. The establishment of the primary key index allows query requests to directly locate the precise location of the data file through the measurement point identifier and time range, avoiding the huge overhead caused by full file scanning, thereby providing the upper-layer application with millisecond-level aggregation data query response capability.
[0072] Upon receiving an aggregation query request from the oil and gas field, the aggregation query response module 106 performs an aggregation query on the aggregation result storage area based on the primary key index and returns the retrieved target first-level aggregation result data to the sender of the request. The process is as follows: when an aggregation query request is received, parse the target measurement point identifier, the query start time, and the query end time from the message body of the request; based on these three parameters, search the index file of the aggregation result storage area for matching primary key index entries to obtain the corresponding list of storage offsets; based on the list of storage offsets, read the corresponding aggregation result data from the data file in the aggregation result storage area to generate the query result dataset; and encapsulate the query result dataset in chronological order into a response message, which is returned to the sender of the aggregation query request.
[0073] The aggregation query response module 106 continuously listens for aggregation query requests sent from various application systems in the oil and gas field. These application systems include real-time monitoring dashboards, production scheduling platforms, fault diagnosis systems, and historical data analysis tools. When an aggregation query request is received, it first obtains the complete message body from the network packet transport layer of the request. Then, it parses the message body according to a predefined request message format. The message body uses a standardized key-value pair structure for encoding. During the parsing process, it reads the target measurement point identifier field in the message body to obtain the measurement point number to be queried, reads the query start time field to obtain the start time of the time range, and reads the query end time field to obtain the end time of the time range. Thus, it obtains the three core parameters of this query: the target measurement point identifier, the query start time, and the query end time.
[0074] Based on the three parameters obtained from parsing—the target measurement point identifier, the query start time, and the query end time—the system searches for all matching primary key index entries in the index file of the aggregation result storage area. The index file is organized using a B-tree or a similar data structure. Each index entry consists of four parts: the measurement point identifier, the aggregation window start time, the aggregation window end time, and the storage offset. The aggregation query response module 106 uses the target measurement point identifier as the first lookup key to quickly locate the index area corresponding to the measurement point.
[0075] Then, within this region, using the query start time and query end time as the range search conditions, all index entries whose aggregation window start time is greater than or equal to the query start time and whose aggregation window end time is less than or equal to the query end time are traversed. For each index entry that meets the time range conditions, its storage offset is extracted. All extracted storage offsets are combined in the order in which the index entries are found to form a storage offset list. Each offset in this list points to the physical storage location of an aggregation result record in the data file.
[0076] Based on the storage offset list obtained in the previous step, the corresponding aggregation result data is read from the data file in the aggregation result storage area. The data file is a binary file written in a sequential append manner. The length of each aggregation result record in the file is fixed or identified by the length field in the record header. The aggregation query response module 106 traverses each offset in the storage offset list, moves the file pointer to the starting position specified by the offset, and reads an entire aggregation result record from that position. The read record contains fields such as measurement point identifier, aggregation window start time, aggregation window end time, and aggregation value. All the read aggregation result records are temporarily stored in memory in their original order in the data file to form a query result dataset. This dataset contains all the target aggregation result data required for this query.
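The index lookup and offset-based read can be sketched as follows. A sorted list of primary keys with a parallel list of offsets stands in for the B-tree index file, and the fixed 40-byte record layout is an assumption; neither is prescribed by this description.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<16sqqd")  # assumed fixed-length record layout


def query(index_keys, offsets, data_file, point_id, q_start, q_end):
    """Return records of point_id whose window lies in [q_start, q_end].

    index_keys is a sorted list of (point_id, window_start, window_end);
    offsets[i] is the data-file offset of the record for index_keys[i].
    """
    # Jump to the first entry of this point with window_start >= q_start.
    lo = bisect.bisect_left(index_keys, (point_id, q_start))
    results = []
    for i in range(lo, len(index_keys)):
        pid, start, end = index_keys[i]
        if pid != point_id or start > q_end:
            break  # left the index area of this point or the time range
        if end <= q_end:
            data_file.seek(offsets[i])  # move to the record's offset
            raw_id, s, e, value = RECORD.unpack(data_file.read(RECORD.size))
            results.append((raw_id.rstrip(b"\0").decode("utf-8"), s, e, value))
    results.sort(key=lambda r: r[1])  # chronological order for the response
    return results
```

Only the matching records are read from the data file; records of other measurement points or outside the time range are never touched.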
[0077] The query result dataset is encapsulated into a response message in chronological order and returned to the sender of the aggregation query request. The encapsulation operation first sorts the records in the query result dataset in ascending order of the aggregation window start time field, so that earlier records precede later ones. The sorted records are then filled into the message body in the order required by the response message format specification. The message header of the response message is filled with information such as the status code of this query and the total number of records returned. After the encapsulation is completed, the entire response message is sent back, over the original network connection, to the application system that initiated the query request. This completes a full aggregation query response process.
[0078] The aggregation query response module 106 achieves millisecond-level retrieval of massive aggregation result data through the precise positioning mechanism of the primary key index, avoiding the heavy disk input/output overhead of traversing the entire data file. In addition, reading records in a batch driven by the storage offset list reduces the number of random disk accesses and further improves query throughput. Standardized request parsing and response encapsulation ensure that the system can connect seamlessly with the various upper-layer applications in oil and gas fields, providing stable and reliable data query services for business scenarios such as real-time monitoring, production scheduling, and fault diagnosis.
[0079] Figure 2 shows the impact of the number of data points in the window on the aggregation curve, with the horizontal axis representing time and the vertical axis representing the aggregation value. The figure contains four curves depicting the aggregation result trends when the number of data points (n) within the window is 5, 10, 20, and 40, respectively. When n is 5, the aggregation curve fluctuates most dramatically, with a trough of approximately 80 at 35 minutes and a peak of approximately 102 at 50 minutes. When n is 10, the fluctuation range narrows compared with n = 5, with an aggregation value of approximately 84 at 15 minutes and approximately 100 at 50 minutes. When n is 20, the aggregation curve flattens out, with the aggregation value stabilizing at around 95 at 30 and 35 minutes. When n is 40, the aggregation curve is the smoothest, with the smallest fluctuation range. This figure illustrates the impact of the number of data points within the aggregation window in the real-time write aggregation module 104 on the smoothness of the aggregation result, providing a basis for adaptively adjusting the aggregation window size based on data fluctuation characteristics.
[0080] Figure 3 compares the sampled value curve of a measurement point within the aggregation window with the temporary aggregation value and the arithmetic mean. The horizontal axis represents the sampling point number, and the vertical axis represents the numerical value. The three curves in the figure represent the original sampled point values, the temporary aggregation value obtained by the calculation formula, and the arithmetic mean of the sampled point values, respectively. The original sampled point value curve shows significant fluctuations, forming two consecutive peaks at sampling point numbers 14 and 15, with values reaching 48 and 47 respectively, before dropping sharply to 38 at sampling point number 16. The arithmetic mean curve remains constant at around 40.5 and shows no fluctuation. The temporary aggregation value curve, in which a fluctuation correction term is superimposed on the arithmetic mean, rises markedly at sampling point numbers 14 and 15, reaching approximately 44.8 and 46.5 respectively, before declining moderately to approximately 38.5 at sampling point number 16. This figure demonstrates the technical effect of the temporary aggregation value calculation formula in the real-time write aggregation module 104, which reflects both the central tendency and the dispersion of the data, and verifies the moderating effect of the fluctuation coefficient weighting factor on the aggregation results.
[0081] Figure 4 compares the aggregation curves for data with different fluctuation levels. The horizontal axis represents time, and the vertical axis represents the aggregation value. Three curves are plotted, representing the aggregation results for the low-noise group (Group 1 noise), medium-noise group (Group 2 noise), and high-noise group (Group 3 noise), respectively. The aggregation curve for the low-noise group is the most stable across all time periods, with the aggregation value consistently around 25.2 and fluctuations not exceeding 0.3. The aggregation curve for the medium-noise group initially overlaps with that of the low-noise group, but shows a significant upward trend after time point 52, with the aggregation value rising from 25.2 to 26.2. The aggregation curve for the high-noise group exhibits the most significant fluctuations, rising continuously after time point 14, reaching a peak of 27.8 at time point 26, and then slowly declining to 26.2 after time point 52. This figure demonstrates how the aggregation function in the real-time write aggregation module 104 processes data with different levels of fluctuation. Data with greater fluctuations receive larger fluctuation compensation values in the aggregation results, enabling the aggregation curve to reflect the dispersion characteristics of the original data.
[0082] Figure 5 compares the aggregation curves for different window granularities, with time on the horizontal axis and temperature on the vertical axis. Four curves are plotted, corresponding to the aggregation results for windows of 5 minutes, 10 minutes, 20 minutes, and 30 minutes, respectively. The aggregation curve with a 5-minute window exhibits the most dramatic fluctuations, peaking at 27.5 at 6 minutes and troughing at approximately 23.4 near 20 minutes, fully preserving the instantaneous changes in the original temperature data. The aggregation curve with a 10-minute window shows significantly reduced fluctuations, peaking at approximately 26.6 at 6 minutes and troughing at approximately 25 at 20 minutes, maintaining the same overall trend as the 5-minute window but with increased smoothness. The aggregation curve with a 20-minute window is smoother still, stabilizing around 26.5 between 8 and 24 minutes. The aggregation curve with a 30-minute window is the flattest, generally remaining within a narrow range of 25.5 to 25.8 after 12 minutes. The figure illustrates the effect of independently maintaining multi-time-granularity in-memory tables in the real-time write aggregation module 104. Different aggregation window granularities correspond to different levels of data smoothness, providing flexible options for the real-time and stability requirements of different scenarios in oil and gas field production monitoring.
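The idea of maintaining one independent in-memory table per window granularity can be sketched as follows. This is a simplified illustration only: it uses a plain arithmetic mean as the aggregation function rather than the temporary aggregation value formula of the invention, and the function name `aggregate_by_granularity` and the table layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_granularity(
    samples: Iterable[Tuple[int, float]],
    window_sizes: List[int],
) -> Dict[int, Dict[int, float]]:
    """Aggregate (timestamp, value) samples into windows of several sizes.

    Returns {window_size: {window_start: mean_value}}. Timestamps and
    window sizes are assumed to be non-negative integers in seconds.
    """
    # One independent table per granularity: window_start -> [sum, count].
    tables = {size: defaultdict(lambda: [0.0, 0]) for size in window_sizes}
    for ts, value in samples:
        for size, table in tables.items():
            start = ts - ts % size  # align timestamp to its window start
            acc = table[start]
            acc[0] += value
            acc[1] += 1
    return {
        size: {start: s / c for start, (s, c) in sorted(table.items())}
        for size, table in tables.items()
    }
```

Feeding the same samples to a 300-second and a 600-second table yields fewer, smoother values at the coarser granularity, which is the qualitative behavior described for Figure 5.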
[0083] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims
1. A high-throughput real-time data aggregation system for massive measurement points in oil and gas fields, characterized in that it comprises: a measurement point data parsing module, used to parse the measurement point data packets of the oil and gas field according to their protocol to obtain raw data points, the raw data points including a measurement point identifier, a data generation timestamp, and a measurement point value; a hash location calculation module, used to calculate the hash value of the measurement point identifier according to a preset consistent hash function and obtain location information on the hash ring; a data sharding routing module, used to perform data sharding routing on the measurement point identifier based on the location information on the hash ring, so as to determine the queue identifier of the target data shard queue corresponding to the location information, and to send the raw data point to the target data shard queue corresponding to the queue identifier; a real-time write aggregation module, used to write the raw data points of the target data shard queue into a preset distributed time series database in batches and, after the raw data points are written in batches, to aggregate the raw data points in the distributed time series database in real time to obtain first-level aggregation result data; an aggregation result storage and indexing module, used to store the first-level aggregation result data in the aggregation result storage area of the oil and gas field and to build a primary key index for the aggregation result storage area; and an aggregation query response module, used to perform an aggregation query on the aggregation result storage area based on the primary key index when an aggregation query request from the oil and gas field is received, and to return the retrieved target first-level aggregation result data to the sender of the aggregation query request.
2. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 1, characterized in that, in the measurement point data parsing module, the process of obtaining the raw data points is as follows: measurement point data packets sent by field equipment in the oil and gas field are captured; the header fields of the measurement point data packet are parsed to obtain the protocol type and device identifier of the measurement point data packet; based on the protocol type, the corresponding protocol parser is called from the protocol library of the oil and gas field to decode the payload field of the measurement point data packet and obtain the measurement point identifier, data generation timestamp, and measurement point value; and outlier removal and dimension normalization are performed on the measurement point values to obtain the raw data points of the oil and gas field.
3. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 1, characterized in that, in the hash location calculation module, the process of obtaining the location information on the hash ring is as follows: the measurement point identifier is extracted from the raw data point and the byte sequence corresponding to the measurement point identifier is obtained; the byte sequence is input into the preset consistent hash function and processed according to the consistent hash function to generate a hash value; the hash value is projected onto a preset hash ring to determine the landing point coordinates of the hash value on the hash ring; based on the landing point coordinates, the hash ring partition information pre-stored in the routing mapping table is queried to obtain the queue identifier of the target data shard queue corresponding to the landing point coordinates; and the raw data point is associated with the queue identifier to obtain the location information on the hash ring and to generate a data point to be sent.
4. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 3, characterized in that the routing mapping table is constructed as follows: data shard queues are created in a number equal to the total number of data shards in the distributed time series database; the hash ring is evenly divided into equal-sized contiguous segments, equal in number to the data shard queues, and a corresponding queue identifier is assigned to each contiguous segment; and the mapping relationship between the contiguous segments and their corresponding queue identifiers is stored in the routing mapping table.
5. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 3, characterized in that, in the data sharding routing module, the process of determining the queue identifier of the target data shard queue corresponding to the location information and sending the raw data point to the target data shard queue corresponding to the queue identifier is as follows: the queue identifier and the raw data point are parsed from the data point to be sent; the target data shard queue corresponding to the queue identifier is located; the raw data point is appended to the tail of the target data shard queue to form a data point to be processed in the queue; and the current length counter of the target data shard queue is updated and queue status information is generated.
6. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 1, characterized in that, in the real-time write aggregation module, the process of obtaining the first-level aggregation result data is as follows: a preset number of raw data points are pulled from the target data shard queue to form a batch of data to be written; the batch of data to be written is sorted in ascending order of data generation timestamp to generate an ordered data point sequence; the ordered data point sequence is written in batches into the data shard of the distributed time series database to generate stored raw data points; after the raw data points have been written, an aggregation calculation task is triggered for the data shard; the aggregation calculation task reads from the data shard the stored raw data points whose timestamps fall within the current aggregation window and generates a window data point set; the window data point set is grouped by measurement point identifier to generate measurement point group data; based on a preset aggregation function, the measurement point values of the measurement point group data are aggregated in real time to obtain a temporary aggregation value; and the temporary aggregation value is merged with the intermediate aggregation result of the previous aggregation window in the measurement point group data to generate an updated intermediate aggregation result, which is output as the first-level aggregation result data.
7. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 6, characterized in that the formula for calculating the temporary aggregation value is: [formula not reproduced in this text]; wherein the quantities appearing in the formula are: the temporary aggregation value; the index i of the raw data point; the number n of raw data points in the current aggregation window for the measurement point group; the measurement point value of the i-th raw data point within the current aggregation window; the arithmetic mean of all measurement point values within the current aggregation window; and the preset fluctuation coefficient weighting factor.
8. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 1, characterized in that, in the aggregation result storage and indexing module, the process of storing the first-level aggregation result data in the aggregation result storage area of the oil and gas field and building a primary key index for the aggregation result storage area is as follows: the first-level aggregation result data includes the measurement point identifier, aggregation window start time, aggregation window end time, and aggregation value; the first-level aggregation result data is sorted by measurement point identifier and aggregation window start time to generate ordered aggregation records; the ordered aggregation records are appended to the data file in the aggregation result storage area to generate persistent aggregation data; and, in the index file of the aggregation result storage area, a primary key index entry consisting of the measurement point identifier, the aggregation window start time, and the aggregation window end time is created for each aggregation record, the primary key index entry pointing to the storage offset of the aggregation record in the data file.
9. The high-throughput real-time data aggregation system for massive measurement points in oil and gas fields as described in claim 1, characterized in that, in the aggregation query response module, the process of performing an aggregation query on the aggregation result storage area and returning the retrieved target first-level aggregation result data to the sender of the aggregation query request is as follows: when an aggregation query request from the oil and gas field is received, the target measurement point identifier, query start time, and query end time are parsed from the message body of the aggregation query request; based on the target measurement point identifier, query start time, and query end time, matching primary key index entries are searched for in the index file of the aggregation result storage area to obtain the corresponding storage offset list; based on the storage offset list, the corresponding aggregation result data is read from the data file in the aggregation result storage area to generate the query result dataset; and the query result dataset is encapsulated into a response message in chronological order and returned to the sender of the aggregation query request.