A multi-source data time alignment method
By constructing a unified time-slotted time axis and a differentiated mapping strategy, the problems of data distortion and adaptability in the time alignment of multi-source data are solved, and the accurate alignment and synchronization of periodic and event-based data are achieved, which is suitable for multi-industry and multi-source data fusion scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- AIR FORCE UNIV PLA
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-19
Figure CN122241562A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of multi-source data processing technology, specifically relating to a multi-source data time alignment method, and more specifically to a time alignment method suitable for mixed scenarios of periodic data and event-based data.
Background Technology
[0002] In fields such as big data processing, IoT monitoring, industrial control, and intelligent transportation, it is often necessary to integrate data from multiple different data sources for comprehensive analysis and decision-making. This multi-source data typically includes two categories: periodic data and event-driven data. Periodic data is generated by data sources at fixed time intervals, with timestamps exhibiting a regular distribution; event-driven data is generated by specific events, with timestamps randomly distributed and without a fixed period.
[0003] Because data sources differ in their collection cycles and triggering mechanisms, timestamps from multiple data sources rarely synchronize naturally, which poses significant challenges to subsequent processing such as data fusion, comparative analysis, and model training. Existing multi-source data time alignment methods are mostly based on sampling or interpolation. Sampling methods resample the original data at a uniform frequency, which can easily lose data details or introduce redundant information. Interpolation methods estimate data at missing time points through mathematical models; this fills gaps but changes the true characteristics of the original data, and in scenarios where periodic and event-based data are mixed, interpolation accuracy is difficult to guarantee. Such methods therefore fail to meet application scenarios with high requirements for data authenticity and alignment accuracy.
Summary of the Invention
[0004] The purpose of this invention is to solve the problems of existing multi-source data time alignment methods, which rely on sampling or interpolation, are prone to distortion, and have poor adaptability. It proposes a method that achieves accurate time alignment without changing the core characteristics of the original data and that adapts to scenarios mixing periodic and event-based data.
[0005] To achieve the above objectives, the technical solution provided by this invention is:
[0006] A method for time alignment of multi-source data is provided, including the following steps:
[0007] Step 1, Timeline Construction: Extract timestamps from all data frames in the data sources to be aligned, and determine a globally unified start time and end time. Based on the start time and end time, create a continuous timeline and divide the timeline evenly into multiple continuous and non-overlapping time slots s with a preset time slot length L;
[0008] Step 2, Data Frame Time Slot Mapping: For each data source, iterate through the data frames and map each data frame to the corresponding time slot s according to its timestamp t;
[0009] Step 3, Time Slot Data Determination: For each time slot s in the mapping results of each data source, the final mapped data frame for that time slot (that is, its representative data frame) is determined using a preset differentiated mapping strategy based on the number of data frames mapped to that time slot;
[0010] Step 4, Alignment Sequence Generation: For each data source, output the representative data frames of each time slot in the order of all its time slots to form an aligned data sequence corresponding to that data source; wherein, all data sources are based on the same time slot division, and the aligned data sequences generated by them are time synchronized on the time slot index.
[0011] Furthermore, the differential mapping strategy in step 3 includes:
[0012] If multiple data frames are mapped to the same time slot s, then the data frame with the earliest timestamp, the data frame with the latest timestamp, or a random frame is selected from these data frames as the final mapped data frame for that time slot s.
[0013] If a single data frame is mapped to the same time slot s, then that data frame is directly used as the final mapped data frame for that time slot s;
[0014] If no data frame is mapped to time slot s, then the final mapped data frame of the previous time slot s-1 is copied to the current time slot s, or the final mapped data frame of the current time slot s is marked as null.
[0015] Furthermore, in step 1, the time slot length L is dynamically adjusted based on at least one of the following: data processing accuracy requirements, data source acquisition cycle, or system resource constraints.
[0016] Furthermore, in step 2, the mapping condition for mapping each data frame to its corresponding time slot s according to its timestamp t is as follows: when the time interval of time slot s is defined as a left-closed, right-open interval $[a_s, e_s)$, where $a_s$ is the start time and $e_s$ is the end time of time slot s, a data frame whose timestamp t satisfies $a_s \le t < e_s$ is mapped to time slot s.
[0017] Furthermore, multi-source data includes periodic data collected at fixed intervals and event-based data triggered randomly by events.
[0018] The advantages of this invention are:
[0019] 1. The multi-source data time alignment method provided by the present invention constructs a unified time-slotted time axis and maps the data frames of each data source to the corresponding time slot according to their original timestamps, thereby determining a unique representative data frame for each time slot, and finally generating an aligned data sequence for each data source that strictly corresponds to the unified time slot sequence. The core effects of this process are as follows: First, it completely avoids the resampling or numerical interpolation operations on the original data frames in traditional techniques, fundamentally ensuring the consistency of the core features of the output data with the original data, eliminating data distortion caused by estimation or reconstruction, and ensuring data authenticity. Second, this method naturally incorporates periodic data and event-based data into the same processing flow through a time-slotted framework, and uses unified mapping and determination logic to achieve compatibility and synchronization of these two heterogeneous data types, improving the method's versatility. Finally, since all data sources are processed based on the same set of time-slot division rules, the output data sequence achieves precise time-series alignment on the time-slot index, effectively solving the problem that multi-source data is difficult to directly compare and analyze due to different collection cycles and triggering mechanisms. Moreover, the entire method has clear logic and well-defined steps, requires no complex mathematical models, and is easy to implement and deploy.
[0020] 2. In this invention, the time slot length can be dynamically adjusted according to the actual processing accuracy, data source cycle or resource constraints, and the mapping and completion strategies can also be selected as needed, thereby effectively meeting the multi-source data fusion needs of different industries and different data characteristics.
Attached Figure Description
[0021] The above and/or other features and advantages of the present invention will become more readily understood from the following description with reference to the accompanying drawings, in which:
[0022] Figure 1 is a flowchart of a multi-source data time alignment method according to an embodiment of the present invention.
Detailed Implementation
[0023] The present invention will now be described in detail with reference to the accompanying drawings and exemplary embodiments thereof. It should be noted that the following detailed description of the present invention is for illustrative purposes only and is not intended to limit the scope of the invention.
[0024] To address the problems of existing multi-source data time alignment methods, which are prone to data distortion due to reliance on sampling or interpolation and struggle to effectively accommodate mixed periodic and event-based data scenarios, this invention provides a multi-source data time alignment method based on a unified time-slotted time axis. This method aims to map data frames from various data sources to a unified time-slot sequence according to their timestamps without altering the core characteristics of the original data. It employs flexible strategies to determine representative data for each time slot, ultimately generating a strictly time-synchronized alignment sequence for each data source. This provides a high-quality and reliable data foundation for the fusion analysis and collaborative processing of multi-source data.
[0025] The invention is now explained in detail with reference to the flowchart of the multi-source data time alignment method shown in Figure 1, using an industrial IoT scenario as an example. In this embodiment, there are three data sources to be aligned: data source A (e.g., a temperature sensor; periodic data with a collection period of 5 seconds and timestamps 0s, 5s, 10s…), data source B (e.g., a humidity sensor; periodic data with a collection period of 8 seconds and timestamps 0s, 8s, 16s…), and data source C (e.g., a device on/off event recorder; event-based data with random trigger times and timestamps 2s, 7s, 12s, 19s…). These data need to be aligned within a time range of 0-20 seconds for comprehensive operational condition analysis or fault warning.
[0026] Step S1: Timeline Construction
[0027] First, the system extracts the timestamp information carried by all data frames in data sources A, B, and C. By comparison, it finds the minimum (0 seconds in this example) and maximum (19 seconds in this example) values among these timestamps, thereby determining the starting point and the end point of the global timeline. Based on this, the system logically creates a continuous timeline covering the range from the starting point to the end point.
[0028] Subsequently, a time slot length L is preset based on the application's requirements for data processing accuracy, the data sources' acquisition cycles, and the system's computational resource constraints. In this embodiment, L is set to 2s. The time slot length L is a key parameter that can be freely set. For example, in industrial process control scenarios requiring precise analysis of high-frequency events, L can be set to a smaller value (e.g., on the order of milliseconds), while in environmental monitoring scenarios observing long-term trends, L can be set to a larger value (e.g., minutes or hours). This adjustability is one of the core mechanisms that lets the method adapt to the fusion needs of different industries and different data characteristics. Next, the system divides the timeline uniformly from beginning to end without overlap, obtaining a series of continuous time slots s (s1, s2, s3, ...), each representing a time interval of equal length L. For example, time slot s1 corresponds to [0s, 2s), time slot s2 corresponds to [2s, 4s), and so on, until the entire timeline range is covered. This step establishes a unified time scale that serves as a common time reference for mapping all data sources, which is a prerequisite for comparing the final results on the same time dimension.
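As a rough illustration of step S1, the following Python sketch builds the slot list for the example timestamps. The function name `build_time_slots` is hypothetical, and the choice to extend the last slot past the maximum timestamp (so that 19s still falls inside a left-closed, right-open slot) is an assumption inferred from the ten slots used in this embodiment rather than a rule stated in the text.

```python
import math

def build_time_slots(timestamps, slot_length):
    """Return a list of (start, end) half-open slots covering all timestamps.

    Assumption: the last slot may extend beyond the largest timestamp so that
    the largest timestamp still lies inside a [start, end) interval.
    """
    t_start = min(timestamps)
    t_end = max(timestamps)
    # Number of slots needed so that t_end falls inside the final slot.
    n_slots = math.floor((t_end - t_start) / slot_length) + 1
    return [
        (t_start + i * slot_length, t_start + (i + 1) * slot_length)
        for i in range(n_slots)
    ]

# Timestamps (in seconds) of the three example data sources.
source_a = [0, 5, 10, 15]
source_b = [0, 8, 16]
source_c = [2, 7, 12, 19]

slots = build_time_slots(source_a + source_b + source_c, slot_length=2)
print(len(slots))            # 10
print(slots[0], slots[-1])   # (0, 2) (18, 20)
```

With L = 2s the sketch yields ten slots, s1 = [0s, 2s) through s10 = [18s, 20s), which matches the slot numbering used in the rest of this embodiment.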
[0029] Step S2, Data Frame Time Slot Mapping
[0030] This step processes data sequentially on a per-data-source basis. The system first iterates through each data frame of data source A (timestamps are 0s, 5s, 10s, and 15s). For each data frame, the system reads its timestamp t and determines which time slot s it falls within, as defined in step S1.
[0031] To ensure that each timestamp is mapped to exactly one time slot, the time interval of each time slot must be clearly defined. In this embodiment, the time interval of time slot s is defined as a left-closed, right-open interval, meaning that the time range covered by time slot s is $[a_s, e_s)$, where $a_s$ is the time slot start time, $e_s$ is the time slot end time, and $e_s = a_s + L$. According to this definition, the mapping rule is: when the timestamp t of a data frame satisfies $a_s \le t < e_s$, the data frame is mapped to time slot s. Under this rule, a data frame with timestamp 0s falls into time slot s1 ([0s, 2s)), a data frame with timestamp 5s falls into time slot s3 ([4s, 6s)), and so on. Defining the interval as left-closed and right-open avoids the ambiguity of a data frame belonging to two adjacent time slots at once.
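Under the left-closed, right-open convention, the slot containing a timestamp can be computed in closed form rather than searched for. The block below is a sketch of that calculation; the symbol $T_0$ for the timeline start and the 1-based slot index $k$ are notation introduced here for illustration and do not appear in the original text.

```latex
k = \left\lfloor \frac{t - T_0}{L} \right\rfloor + 1, \qquad
a_k = T_0 + (k - 1)L, \qquad
e_k = a_k + L

% Example with T_0 = 0 s, L = 2 s, t = 5 s:
k = \left\lfloor \frac{5 - 0}{2} \right\rfloor + 1 = 3, \qquad
a_3 = 4\,\mathrm{s} \le 5\,\mathrm{s} < 6\,\mathrm{s} = e_3
```

The example reproduces the mapping of the 5s frame to time slot s3 described in this embodiment.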
[0032] It should be noted that those skilled in the art will understand that the definition of the time slot interval is not limited to the aforementioned left-closed, right-open form. Any explicit definition that provides a unique mapping determination can be adopted, requiring only corresponding adjustments to the mapping rules. For example, a left-open, right-closed interval $(a_s, e_s]$ can also be used. Alternatively, a fully closed interval $[a_s, e_s]$ can be used. In that case, adjacent time slots share boundary points, which must be handled by an additional rule, such as stipulating that "when t equals a boundary value, it is preferentially assigned to the left time slot" or "it is preferentially assigned to the right time slot," as long as the rule is applied consistently in the mapping process of all data sources.
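The effect of the interval convention on boundary timestamps can be seen in a short Python sketch. The function `slot_index` and its `closed` parameter are hypothetical names. Clamping a timestamp equal to the timeline start into slot 1 under the right-closed convention is an extra rule assumed here, since the text does not say how that case is handled.

```python
import math

def slot_index(t, t_start, slot_length, closed="left"):
    """Return the 1-based slot index for timestamp t.

    closed="left":  slots are [a_s, e_s); a boundary value goes to the right slot.
    closed="right": slots are (a_s, e_s]; a boundary value goes to the left slot.
    """
    offset = (t - t_start) / slot_length
    if closed == "left":
        return math.floor(offset) + 1
    if closed == "right":
        # Assumed extra rule: t == t_start has no slot on its left,
        # so it is clamped into slot 1.
        return max(math.ceil(offset), 1)
    raise ValueError("closed must be 'left' or 'right'")

# 8s lies on the boundary between s4 and s5 when L = 2s.
print(slot_index(8, 0, 2, closed="left"))    # 5
print(slot_index(8, 0, 2, closed="right"))   # 4
# 7s is not on a boundary, so both conventions agree.
print(slot_index(7, 0, 2, closed="left"))    # 4
print(slot_index(7, 0, 2, closed="right"))   # 4
```

Only boundary timestamps change slot between the two conventions, which is why the text requires one convention to be applied consistently to all data sources.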
[0033] After mapping data source A, the system processes data source B independently using exactly the same logic and time slot division. It iterates through its data frames (timestamps 0s, 8s, 16s), mapping timestamp 0s to time slot s1. Timestamp 8s satisfies 8s ≤ 8s < 10s and is therefore mapped to time slot s5 ([8s, 10s)), and timestamp 16s is mapped to time slot s9 ([16s, 18s)).
[0034] Subsequently, the system processes data source C. It iterates through its data frames (timestamps 2s, 7s, 12s, 19s), mapping timestamp 2s to time slot s2 ([2s, 4s)), timestamp 7s to time slot s4 ([6s, 8s)), timestamp 12s to time slot s7 ([12s, 14s)), and timestamp 19s to time slot s10 ([18s, 20s)).
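The per-source mapping of step S2 can be sketched in Python as a grouping of frames by slot index. The function name `map_frames_to_slots` and the string payloads are hypothetical placeholders. The sketch assumes left-closed, right-open slots, a timeline starting at 0s, and L = 2s, as in this embodiment.

```python
from collections import defaultdict

def map_frames_to_slots(frames, t_start, slot_length):
    """Group (timestamp, payload) frames by 1-based slot index.

    Frames are only classified by timestamp; payloads are left untouched.
    """
    slots = defaultdict(list)
    for timestamp, payload in frames:
        index = int((timestamp - t_start) // slot_length) + 1
        slots[index].append((timestamp, payload))
    return dict(slots)

# Hypothetical payloads standing in for real sensor readings and events.
sources = {
    "A": [(0, "temp"), (5, "temp"), (10, "temp"), (15, "temp")],
    "B": [(0, "hum"), (8, "hum"), (16, "hum")],
    "C": [(2, "on"), (7, "off"), (12, "on"), (19, "off")],
}

mapped = {
    name: map_frames_to_slots(frames, t_start=0, slot_length=2)
    for name, frames in sources.items()
}
for name, slots in mapped.items():
    print(name, sorted(slots))
# A [1, 3, 6, 8]
# B [1, 5, 9]
# C [2, 4, 7, 10]
```

The occupied slot indices agree with the mappings stated for data sources A, B, and C in this step.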
[0035] The core of this step lies in classifying data logically based solely on its original, authentic timestamps, without modifying the numerical content of the data frames themselves. This approach avoids resampling the original data to change its frequency and avoids generating estimated values through interpolation algorithms, so every mapped data frame keeps the initial characteristics and authenticity of its data source.
[0036] Step S3: Determining Time Slot Data
[0037] After all data frames from a single data source have been mapped, a final representative data frame needs to be determined for each time slot s of that data source. The system checks, one by one, the set of data frames mapped to each time slot s under that data source, and adopts a differentiated, configurable strategy depending on the state of that set. The ability to choose among these strategies is what makes the method flexible enough to fit different industry backgrounds, business logics, and data characteristics.
[0038] Scenario 1: Multiple data frames are mapped within a time slot. This may occur when the data source has a high acquisition frequency or when events are triggered intensively. In response, the system can select a strategy based on application logic. For example, it can select the frame with the earliest timestamp (reflecting the initial state of the interval, suitable for scenarios requiring tracing the start of an event or ensuring the order of data reception), the frame with the latest timestamp (reflecting the latest state of the interval, suitable for scenarios with high requirements for data "freshness"), or randomly select a frame (suitable for scenarios where data frames have equal value and only one representative sample is needed). In this embodiment, the strategy is set to select the frame with the latest timestamp.
[0039] Scenario 2: A single data frame is mapped to the time slot. This is the simplest case; the system directly designates this data frame as the representative data frame for that time slot. This ensures that all valid data points are faithfully preserved.
[0040] Scenario 3: No data frame is mapped to the time slot. This is common when periodic data has a long period or event-based data is sparse. The system employs a completion strategy, for example, copying the representative data frame determined in the previous time slot s-1 to the current time slot s to maintain data continuity, which is particularly suitable for scenarios with gradual numerical changes or continuous states; or explicitly marking the representative data frame of the current time slot s as null to indicate that there is no valid data at that moment, which is suitable for scenarios sensitive to data missing or requiring precise statistical analysis of valid samples. This embodiment uses the strategy of copying the previous frame.
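The three scenarios can be condensed into one selection function. The Python sketch below is only an illustration: the function name `choose_representative`, the option strings, and the convention of representing a null frame as `None` are all assumptions made here, not terms defined by the text.

```python
import random

def choose_representative(frames_in_slot, previous,
                          multi="latest", empty="copy_previous", rng=None):
    """Pick the representative (timestamp, payload) frame for one time slot.

    frames_in_slot: frames mapped to this slot (possibly empty).
    previous: representative frame of the previous slot, or None if there is none.
    """
    if len(frames_in_slot) > 1:                      # Scenario 1: multiple frames
        if multi == "earliest":
            return min(frames_in_slot, key=lambda f: f[0])
        if multi == "latest":
            return max(frames_in_slot, key=lambda f: f[0])
        if multi == "random":
            return (rng or random).choice(frames_in_slot)
        raise ValueError("unknown multi-frame strategy")
    if len(frames_in_slot) == 1:                     # Scenario 2: single frame
        return frames_in_slot[0]
    if empty == "copy_previous":                     # Scenario 3: no frame
        return previous                              # None when no previous slot exists
    if empty == "null":
        return None
    raise ValueError("unknown empty-slot strategy")

frames = [(2.1, "on"), (3.7, "off")]
print(choose_representative(frames, None))                     # (3.7, 'off')
print(choose_representative(frames, None, multi="earliest"))   # (2.1, 'on')
print(choose_representative([], (0, "temp")))                  # (0, 'temp')
print(choose_representative([], (0, "temp"), empty="null"))    # None
```

The defaults mirror the choices made in this embodiment: the latest frame when several frames share a slot, and a copy of the previous slot's frame when a slot is empty.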
[0041] The specific determination process under the above strategy will now be explained for each of the three data sources:
[0042] Determination process for data source A (period of 5 seconds):
[0043] Slot s1: a single data frame with timestamp 0s is mapped; it is directly taken as the representative data frame;
[0044] Slot s2: no data frame is mapped; the representative data frame of s1 (0s frame) is copied;
[0045] Slot s3: a single data frame with timestamp 5s is mapped; it is directly taken as the representative data frame;
[0046] Slot s4: no data frame is mapped; the representative data frame of s3 (5s frame) is copied;
[0047] Slot s5: no data frame is mapped; the representative data frame of s4 (5s frame) is copied;
[0048] Slot s6: a single data frame with timestamp 10s is mapped; it is directly taken as the representative data frame;
[0049] Slot s7: no data frame is mapped; the representative data frame of s6 (10s frame) is copied;
[0050] Slot s8: a single data frame with timestamp 15s is mapped; it is directly taken as the representative data frame;
[0051] Slot s9: no data frame is mapped; the representative data frame of s8 (15s frame) is copied;
[0052] Slot s10: no data frame is mapped; the representative data frame of s9 (15s frame) is copied.
[0053] Determination process for data source B (period of 8 seconds):
[0054] Slot s1: a single data frame with timestamp 0s is mapped; it is directly taken as the representative data frame;
[0055] Slot s2: no data frame is mapped; the representative data frame of s1 (0s frame) is copied;
[0056] Slot s3: no data frame is mapped; the representative data frame of s2 (0s frame) is copied;
[0057] Slot s4: no data frame is mapped; the representative data frame of s3 (0s frame) is copied;
[0058] Slot s5: a single data frame with timestamp 8s is mapped; it is directly taken as the representative data frame;
[0059] Slot s6: no data frame is mapped; the representative data frame of s5 (8s frame) is copied;
[0060] Slot s7: no data frame is mapped; the representative data frame of s6 (8s frame) is copied;
[0061] Slot s8: no data frame is mapped; the representative data frame of s7 (8s frame) is copied;
[0062] Slot s9: a single data frame with timestamp 16s is mapped; it is directly taken as the representative data frame;
[0063] Slot s10: no data frame is mapped; the representative data frame of s9 (16s frame) is copied.
[0064] Determination process for data source C (event-based data):
[0065] Slot s1: no data frame is mapped (because the first event occurs at 2s). Since there is no previous slot, this embodiment sets its representative data frame to null or uses another initialization strategy;
[0066] Slot s2: a single data frame with timestamp 2s is mapped; it is directly taken as the representative data frame;
[0067] Slot s3: no data frame is mapped; the representative data frame of s2 (2s frame) is copied;
[0068] Slot s4: a single data frame with timestamp 7s is mapped; it is directly taken as the representative data frame;
[0069] Slot s5: no data frame is mapped; the representative data frame of s4 (7s frame) is copied;
[0070] Slot s6: no data frame is mapped; the representative data frame of s5 (7s frame) is copied;
[0071] Slot s7: a single data frame with timestamp 12s is mapped; it is directly taken as the representative data frame;
[0072] Slot s8: no data frame is mapped; the representative data frame of s7 (12s frame) is copied;
[0073] Slot s9: no data frame is mapped; the representative data frame of s8 (12s frame) is copied;
[0074] Slot s10: a single data frame with timestamp 19s is mapped; it is directly taken as the representative data frame.
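The slot-by-slot determination above can be reproduced with a few lines of Python. This is a sketch under the strategies chosen in this embodiment (latest frame for crowded slots, copy of the previous frame for empty slots, null for an empty first slot). The function name `align` is hypothetical, and each frame is represented only by its timestamp to keep the output readable.

```python
def align(timestamps, t_start, slot_length, n_slots):
    """Return, per slot, the timestamp of the representative frame (None = null)."""
    latest = {}
    for t in timestamps:
        idx = int((t - t_start) // slot_length)      # 0-based slot index
        if idx not in latest or t > latest[idx]:
            latest[idx] = t                          # keep the latest frame in the slot
    aligned, previous = [], None
    for idx in range(n_slots):
        # Use the slot's own frame if present, otherwise copy the previous one.
        previous = latest.get(idx, previous)
        aligned.append(previous)
    return aligned

seq_a = align([0, 5, 10, 15], t_start=0, slot_length=2, n_slots=10)
seq_b = align([0, 8, 16], t_start=0, slot_length=2, n_slots=10)
seq_c = align([2, 7, 12, 19], t_start=0, slot_length=2, n_slots=10)

print(seq_a)  # [0, 0, 5, 5, 5, 10, 10, 15, 15, 15]
print(seq_b)  # [0, 0, 0, 0, 8, 8, 8, 8, 16, 16]
print(seq_c)  # [None, 2, 2, 7, 7, 7, 12, 12, 12, 19]
```

Position i in each list corresponds to time slot s(i+1), so the three printed lists match the determinations listed for data sources A, B, and C.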
[0075] As can be seen from the above process, regardless of whether a data source is periodic or event-based, and regardless of whether its data distribution is dense or sparse, this step generates a definite output for each time slot through a unified logical framework (counting the mapped data frames and applying a preset strategy). For time slots without data, copying the frame from the previous time slot keeps the output sequence continuous, which is important for subsequent time series analysis or model input, whereas marking the slot as null faithfully reflects that data is missing. This step organizes original data with different structures and inconsistent timing into a continuous, complete, and structurally unified time series for each data source, laying the foundation for the final cross-source synchronized alignment.
[0076] Step S4: Alignment sequence generation
[0077] After independently completing steps S2 and S3 for each data source, each data source obtains a set of "final mapped data frames" that correspond one-to-one with its own time slot sequence. For data source A, its output sequence is the representative frames corresponding to s1, s2, s3...; the same applies to data sources B and C.
[0078] Since all data sources use exactly the same start time, end time, and time slot length L from step S1 to divide the time slots, their output sequences have identical time slot indices and lengths. When analysis is needed, the representative data frames of different data sources under the same time slot index are simply extracted to construct a multi-source data snapshot for that time interval. For example, under the index of time slot s2, the snapshot collects information from data source A (data completed by the strategy), data source B (data completed by the strategy), and data source C (raw event data) for that time period.
[0079] At this point, the system has generated multiple strictly time-synchronized aligned data sequences. Three types of data, which previously could not be compared and analyzed directly because of their different acquisition periods (5 seconds vs. 8 seconds) and triggering mechanisms (periodic vs. random events), have been integrated into a unified and standardized time framework. This makes cross-source correlation analysis, fusion computation, or collaborative model training directly feasible. For example, the changing trends of temperature (data source A) and humidity (data source B) in the one or several time slots following an on/off event (data source C) can be analyzed. The entire method is straightforward, involving only timestamp comparison, logical judgment, and simple data referencing, without complex mathematical model calculations. It is computationally efficient and easy to implement on various edge computing devices or central servers.
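Because the aligned sequences share one slot index, building a snapshot reduces to reading the same position in each sequence. The Python sketch below assumes the aligned sequences from this embodiment, with each frame represented by its timestamp and null represented as `None`. The function name `snapshot` is hypothetical.

```python
# Representative frame timestamps per slot (s1..s10) for each data source,
# as determined in this embodiment; None marks a null frame.
aligned = {
    "A": [0, 0, 5, 5, 5, 10, 10, 15, 15, 15],
    "B": [0, 0, 0, 0, 8, 8, 8, 8, 16, 16],
    "C": [None, 2, 2, 7, 7, 7, 12, 12, 12, 19],
}

def snapshot(aligned_sequences, slot_number):
    """Collect the representative frame of every source for a 1-based slot number."""
    return {name: seq[slot_number - 1] for name, seq in aligned_sequences.items()}

print(snapshot(aligned, 2))   # {'A': 0, 'B': 0, 'C': 2}
print(snapshot(aligned, 10))  # {'A': 15, 'B': 16, 'C': 19}
```

For slot s2 the snapshot holds the copied 0s frames of sources A and B together with the raw 2s event of source C, as described in the text.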
[0080] In summary, this invention solves the time alignment problem of mixed periodic and event-based multi-source data by constructing a unified time slot axis, mapping based on the original timestamp, and a differentiated determination strategy oriented towards the actual data state. While ensuring data authenticity, it provides a high-quality and easy-to-use synchronous data foundation for upper-layer applications.
[0081] Finally, it should be noted that the features mentioned and / or shown in the above description of exemplary embodiments of the present invention can be combined in the same or similar manner with one or more other embodiments, combined with or substituted for corresponding features in other embodiments. These combined or substituted technical solutions should also be considered to be included within the scope of protection of the present invention.
Claims
1. A method for time alignment of multi-source data, characterized in that it includes the following steps: Step 1, Timeline Construction: Extract timestamps from all data frames in the data sources to be aligned, and determine a globally unified start time and end time. Based on the start time and the end time, create a continuous time axis and divide the time axis evenly into multiple continuous and non-overlapping time slots s with a preset time slot length L; Step 2, Data Frame Time Slot Mapping: For each data source, iterate through the data frames and map each data frame to the corresponding time slot s according to its timestamp t; Step 3, Time Slot Data Determination: For each time slot s in the mapping results of each data source, the final mapped data frame for that time slot is determined using a preset differentiated mapping strategy based on the number of data frames mapped to that time slot; Step 4, Alignment Sequence Generation: For each data source, output the representative data frames of each time slot in the order of all its time slots to form an aligned data sequence corresponding to that data source; wherein, all data sources are based on the same time slot division, and the aligned data sequences generated by them are time synchronized on the time slot index.
2. The multi-source data time alignment method according to claim 1, characterized in that the differential mapping strategy in step 3 includes: If multiple data frames are mapped to the same time slot s, then the data frame with the earliest timestamp, the data frame with the latest timestamp, or a random frame is selected from these data frames as the final mapped data frame for that time slot s; If a single data frame is mapped to the same time slot s, then that data frame is directly used as the final mapped data frame for that time slot s; If no data frame is mapped to time slot s, then the final mapped data frame of the previous time slot s-1 is copied to the current time slot s, or the final mapped data frame of the current time slot s is marked as null.
3. The multi-source data time alignment method according to claim 1 or 2, characterized in that, in step 1, the time slot length L is dynamically adjusted based on at least one of the following: data processing accuracy requirements, data source acquisition cycle, or system resource constraints.
4. The multi-source data time alignment method according to claim 1 or 2, characterized in that, in step 2, the mapping condition for mapping each data frame to the corresponding time slot s according to its timestamp t is as follows: when the time interval of time slot s is defined as a left-closed, right-open interval $[a_s, e_s)$, where $a_s$ is the start time and $e_s$ is the end time of time slot s, a data frame whose timestamp t satisfies $a_s \le t < e_s$ is mapped to time slot s.
5. The multi-source data time alignment method according to claim 1 or 2, characterized in that the multi-source data includes periodic data collected at fixed intervals and event-based data triggered randomly by events.