Infectious disease risk dynamic assessment system based on multi-source communication data fusion
The infectious disease risk dynamic assessment system, which integrates multi-source communication data, uses trajectory similarity judgment, communication access fusion, and behavioral linkage analysis to identify and eliminate false clusters, thus solving the problem of misjudgment of cluster behavior in infectious disease prevention and control and improving the reliability of the assessment system and the efficiency of prevention and control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DONGYING CENT FOR DISEASE CONTROL & PREVENTION
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for identifying crowd gathering behavior in infectious disease control suffer from limitations such as limited data sources, poor adaptability to environmental disturbances, and insufficient ability to filter abnormal behavior, leading to misjudgments and waste of control resources, which in turn affect control efficiency and social stability.
A dynamic risk assessment system for infectious diseases based on multi-source communication data fusion is adopted. Through trajectory similarity judgment, communication access fusion, behavioral linkage analysis and cluster credibility scoring modules, false clusters are identified and eliminated, thereby improving the credibility and stability of cluster judgment.
The system effectively identifies and eliminates false clustering trajectories, reduces the risk of misjudgment, improves the robustness, sensitivity, and accuracy of the infectious disease risk assessment system in support of public prevention and control decisions, and avoids misallocation of resources and erroneous regional lockdowns.
Figure CN122245836A_ABST
Description
Technical Field
[0001] This invention relates to the field of infectious disease prevention and control technology, and more specifically, to a dynamic risk assessment system for infectious diseases based on multi-source communication data fusion.
Background Technology
[0002] In infectious disease prevention and control and urban public safety management, accurately identifying crowd gathering behavior is a crucial step in assessing regional transmission risks and developing intervention measures. With the rapid proliferation of smart terminals and communication networks, crowd behavior perception mechanisms built upon data such as location information, communication access, and device interaction are gradually becoming an important foundation for dynamic monitoring of infectious diseases. However, existing technologies for crowd identification generally suffer from problems such as reliance on single data sources, poor adaptability to environmental disturbances, and insufficient ability to filter abnormal behavior.
[0003] In complex urban environments, communication devices may experience location drift, location jumps, or frequent access switching due to factors such as base station switching, satellite signal obstruction, or network congestion. Especially within the same time period, multiple devices may access the same hotspot or base station, and even if users have no actual spatial contact, they may be misjudged as "long-term gatherings at the same location" in the data. This type of "pseudo-gathering" phenomenon caused by data anomalies or structural interference has caused false alarms in numerous practical epidemic prevention and control efforts, directly leading to misallocation of epidemic prevention resources, incorrect regional lockdowns, and even public panic, seriously affecting prevention and control efficiency and social stability. Traditional systems often rely on the static overlap of terminal locations or the synchronization of access logs as the basis for judgment, lacking in-depth analysis of communication behavior, network structure, and individual trajectory history, making it difficult to analyze the authenticity of gatherings from multiple dimensions. Therefore, this invention proposes a dynamic infectious disease risk assessment system based on multi-source communication data fusion to solve the above problems.
Summary of the Invention
[0004] To achieve the above objectives, the present invention provides the following technical solution: A dynamic infectious disease risk assessment system based on multi-source communication data fusion includes:
The trajectory similarity judgment module receives continuous positioning data from multiple terminals, extracts the movement trajectory feature vector of each terminal within the target time period, performs dynamic correlation calculation with its historical trajectory, identifies concentrated behaviors with non-inertial jump characteristics, and eliminates false clustered trajectories.
The communication access fusion module extracts network access source information of multiple terminals within a target time period, constructs a regional hotspot access map, and generates a spatial access overlap index based on the common matching degree of the network access structure and the cross-terminal redundancy ratio, which represents the degree of risk of non-real physical aggregation caused by shared access resources.
The behavior linkage analysis module acquires short-range communication interaction data of the terminal within the target time period, including Bluetooth broadcast scanning, near-field communication logs and application layer adjacency behavior. It analyzes the time consistency of behavior triggering and interaction frequency between devices, and calculates the behavior interaction consistency index to evaluate the degree to which the aggregation behavior is supported by actual physical interaction.
The cluster credibility scoring module receives the spatial access overlap index and the behavioral interaction consistency index, inputs them into the fusion discrimination model, and outputs the cluster credibility score, which serves as the basis for whether to include the current clustered data in the infectious disease risk dynamic assessment model.
The risk intervention and correction module marks areas with cluster credibility scores below the credibility threshold as pseudo-cluster areas, removes relevant data for these areas from the data input of the preset risk dynamic assessment model, and records trajectory tracing information. It then inputs all infectious disease risk collection data after removing all pseudo-cluster areas into the preset risk dynamic assessment model and outputs the infectious disease transmission risk level of each target area in geographic space.
[0005] In a preferred embodiment, the terminal movement trajectory feature vector extraction method includes: The positioning points of multiple terminals within the target time period are organized in chronological order. Based on adjacent positioning points, the moving speed, direction angle, dwell time and position change of each terminal are calculated, and these parameters are arranged in time series to form a trajectory basic parameter sequence. Perform time sliding window analysis on the trajectory basic parameter sequence, extract the velocity change trend, directional continuity, dwell phase ratio and position fluctuation intensity within each window, and generate trajectory behavior segments that reflect the continuous motion rhythm and stability. All trajectory behavior segments are connected in chronological order, and a motion trajectory feature vector is formed through numerical normalization and time weight calibration. This vector can characterize the stability of the terminal's motion path, the continuity of its behavior rhythm, and the pattern of position change within the target time period.
[0006] In a preferred embodiment, the dynamic correlation calculation of the terminal trajectory refers to the following. The feature vector of the terminal's movement trajectory within the target time period is extracted and compared segment by segment with the feature vector formed by the same terminal within its historical normal behavior cycle. Trajectory consistency is judged jointly from the similarity of the direction sequences and the overlap of the dwell positions; a segment that meets the deviation condition is marked as a trajectory deviation segment. For each time period marked as a trajectory deviation segment, three indicators (the proportion of the segment within the stable speed range, the continuity of the dwell period, and the change range of the path smoothness) are compared with the set behavior reference threshold range. When any two of these indicators exceed the reference threshold range consecutively, the segment is marked as a significant deviation segment. Segments marked as significant deviation segments are removed directly from the movement trajectory sequence, and the remaining segments are used as reliable trajectories for input into the subsequent aggregation and recognition process.
[0007] In a preferred embodiment, satisfying the deviation condition means: The terminal's movement trajectory within the target time period is divided into multiple continuous movement direction segments. Each direction segment consists of two consecutive positioning points. The displacement direction of the segment is extracted and represented in the form of angles to form a complete direction sequence. At the same time, the direction sequence of the terminal within the historical normal behavior cycle is extracted as a reference direction sequence. For each pair of corresponding direction segments between the direction sequence within the target time period and the historical direction sequence, the angle difference is calculated, and the average and standard deviation of all angle differences are statistically analyzed to determine whether the overall direction change trend is consistent. When the average difference is lower than the first preset angle deviation threshold and the standard deviation is lower than the second preset stability threshold, the direction sequences are considered to be similar. When determining the overlap of stopping positions, the geographical location of each stationary stopping point in the current trajectory is compared with the distance of each stopping point in the historical trajectory. The number of stopping points falling within the spatial overlap range is counted, and the proportion of the current total number of stopping points is used as the overlap ratio. When the overlap ratio is higher than the preset position overlap ratio threshold, the stopping positions are considered to have spatial consistency. When both directional similarity and stopping position consistency are satisfied, the trajectory is determined to have behavioral consistency; otherwise, it is marked as a trajectory deviation segment.
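The deviation condition in [0007] can be illustrated with a minimal sketch. It assumes that direction sequences are lists of angles in degrees, that stopping points are `(x, y)` coordinates in metres in a local planar frame, and that the two sequences are already aligned pair by pair. The function names and all default threshold values are hypothetical; the text only states that such thresholds are preset.

```python
import math
from statistics import mean, pstdev

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def directions_similar(current, reference, max_mean=30.0, max_std=20.0) -> bool:
    """Similar when both the mean and the standard deviation of the
    pairwise angle differences are below their (assumed) thresholds."""
    diffs = [angle_diff(a, b) for a, b in zip(current, reference)]
    if not diffs:
        return False
    return mean(diffs) < max_mean and pstdev(diffs) < max_std

def stop_overlap_ratio(current_stops, historical_stops, radius_m=50.0) -> float:
    """Share of current stopping points lying within radius_m of any historical one."""
    if not current_stops:
        return 0.0
    hits = sum(
        1 for c in current_stops
        if any(math.dist(c, h) <= radius_m for h in historical_stops)
    )
    return hits / len(current_stops)

def is_deviation_segment(cur_dirs, ref_dirs, cur_stops, hist_stops,
                         min_overlap=0.6) -> bool:
    """A segment is consistent only if both tests pass; otherwise it deviates."""
    similar = directions_similar(cur_dirs, ref_dirs)
    consistent = stop_overlap_ratio(cur_stops, hist_stops) > min_overlap
    return not (similar and consistent)
```

Because `angle_diff` wraps around 360 degrees, headings of 350 and 10 degrees differ by 20 degrees rather than 340, which keeps the mean and standard deviation meaningful near north.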
[0008] In a preferred embodiment, the method for calculating the spatial access overlap index includes: Network access source information for multiple terminals within a target time period is extracted, including the base station number connected to each terminal, the start and end points of the access time period, and the number of accesses. Each terminal is designated as a first-class node, and each access source is designated as a second-class node. A regional hotspot access map is constructed based on the connection relationship between the terminal and the access source. In this map, the weight of each connection edge is calculated by the ratio of the access time length to the access frequency, which is used to characterize the access stability of the terminal to that access source. The degree of overlap of access paths between any two terminals is calculated, the set of repeated access sources in the connection between the two is counted, and the average weight of all edges in the set is calculated. If the average weight exceeds the stability threshold set by the system, and the proportion of the number of repeated access sources to the total number of access sources of the two terminals is higher than the access overlap ratio threshold, then it is determined that the terminal pair has a high access commonality relationship. The number of terminal pairs that satisfy the common relationship of high access in the entire map is counted, and the ratio of this number to the total number of all terminal pairs in the map is calculated as the spatial access overlap index.
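The index calculation in [0008] can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the input layout (`{terminal: {source: (total_seconds, access_count)}}`), the function name, and both threshold parameters are hypothetical, and the "total number of access sources of the two terminals" is interpreted here as the union of their access sources.

```python
from itertools import combinations

def spatial_access_overlap_index(edges, stability_threshold, overlap_threshold):
    """edges: {terminal: {access_source: (total_seconds, access_count)}}.

    Edge weight = access duration / access count, as described in the text.
    Returns the share of terminal pairs with a high access commonality relationship.
    """
    weights = {
        t: {s: secs / count for s, (secs, count) in sources.items() if count > 0}
        for t, sources in edges.items()
    }
    pairs = list(combinations(sorted(weights), 2))
    if not pairs:
        return 0.0
    high = 0
    for a, b in pairs:
        shared = weights[a].keys() & weights[b].keys()
        total = weights[a].keys() | weights[b].keys()  # assumed: union of sources
        if not shared:
            continue
        # Average weight over every edge that touches a shared access source.
        avg = sum(weights[a][s] + weights[b][s] for s in shared) / (2 * len(shared))
        if avg > stability_threshold and len(shared) / len(total) > overlap_threshold:
            high += 1
    return high / len(pairs)
```

For three terminals where only one pair shares a base station stably, the index is one third, since one of the three possible pairs qualifies.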
[0009] In a preferred embodiment, the method for calculating the behavioral interaction consistency index includes: Collect short-range communication interaction events of multiple terminals within a target time period, including Bluetooth identification events, near-field communication events, and application adjacency events. Represent each event as a triplet of event type, start time, and end time, and arrange them in chronological order to construct a sequence of terminal interaction behaviors. The interaction behavior sequences of any two terminals are matched and analyzed to determine whether there are events of the same type that partially overlap in time. If so, they are recorded as a synchronization event. The time overlap of the synchronization event is defined as the proportion of the actual overlap time of the two events to the shorter event time. The event intensity level is preset by the event type. Near field communication events are assigned the highest level, Bluetooth events are assigned the medium level, and application adjacency events are assigned the basic level. The synchronization event score is the product of the overlap and the event level. The scores of all synchronization events of the terminal pair within the target time period are summed to obtain the actual total synchronization score. At the same time, the maximum sum of scores that can be obtained in all communication events that have occurred in the terminal pair, assuming that each event is completely overlapping and reaches the upper limit of the event intensity level, is calculated as the reference total score of the terminal pair. The actual total synchronization score is divided by the reference total score to obtain the behavioral interaction consistency coefficient of the terminal pair. The behavioral interaction consistency coefficients of all terminal pairs within the same target area are averaged to generate the behavioral interaction consistency index for that area.
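A minimal sketch of the scoring in [0009] follows. The numeric levels (3, 2, 1) are assumptions; the text only fixes their order (near-field communication highest, Bluetooth medium, application adjacency basic). The reference total is interpreted here as the maximum level multiplied by the number of events recorded by the two terminals, which is one possible reading of the text, and the final clamp to 1.0 is a guard added for this sketch. Event tuples are `(type, start, end)` with times in seconds.

```python
from itertools import combinations

# Assumed numeric levels; only their ordering comes from the text.
LEVELS = {"nfc": 3.0, "bluetooth": 2.0, "app": 1.0}
MAX_LEVEL = max(LEVELS.values())

def pair_consistency(events_a, events_b) -> float:
    """Behavioral interaction consistency coefficient for one terminal pair."""
    actual = 0.0
    for kind_a, start_a, end_a in events_a:
        for kind_b, start_b, end_b in events_b:
            if kind_a != kind_b:
                continue
            overlap = min(end_a, end_b) - max(start_a, start_b)
            shorter = min(end_a - start_a, end_b - start_b)
            if overlap > 0 and shorter > 0:
                # Synchronization event score = overlap ratio * event level.
                actual += (overlap / shorter) * LEVELS[kind_a]
    # Assumed reference total: every event fully overlapping at the top level.
    reference = MAX_LEVEL * (len(events_a) + len(events_b))
    return min(1.0, actual / reference) if reference else 0.0

def area_consistency_index(events_by_terminal) -> float:
    """Mean pair coefficient over all terminal pairs in one target area."""
    pairs = list(combinations(sorted(events_by_terminal), 2))
    if not pairs:
        return 0.0
    total = sum(
        pair_consistency(events_by_terminal[a], events_by_terminal[b])
        for a, b in pairs
    )
    return total / len(pairs)
```

Two Bluetooth events that overlap for half of the shorter event score 0.5 × 2 = 1.0 against a reference total of 6, giving a coefficient of 1/6 under these assumptions.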
[0010] In a preferred embodiment, the method for calculating the cluster credibility score includes: The spatial access overlap index and the behavioral interaction consistency index of the target area are obtained. They represent, respectively, the degree of access resource sharing and the degree of actual physical interaction among terminals in the area, and together they form the joint input of the scoring model. The scoring model is based on a convolutional neural network architecture, which extracts features describing the coupling relationship between spatial aggregation features and behavioral aggregation features and maps them to a score. The model fuses the two input indicators to generate the cluster credibility score, which represents the credibility of aggregation behavior in the current region.
[0011] In a preferred embodiment, the pseudo-clustering region labeling method includes: Obtain the clustering credibility score of the current target area and retrieve the historical score records of the area that were judged as real clusters by the system in the past multiple time periods. Based on the stable distribution range of these score records, extract the reference quantile interval representing the real clustering characteristics. The reference quantile interval is determined by the continuous percentage interval in the middle stable segment of the score record, and the lower bound of the reference quantile interval is used as the dynamic credibility threshold of the area in the current time period. The current clustering credibility score is compared with the dynamic credibility threshold. If the current score is lower than the dynamic credibility threshold, the region is marked as a pseudo-clustering region. False clusters are removed from the data input of the dynamic assessment process of infectious disease risk to prevent low-reliability clustering behavior from interfering with the overall judgment results of the transmission risk model.
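The dynamic threshold in [0011] can be sketched with the standard library. The text does not say which percentage interval counts as the "middle stable segment"; the interquartile range (25th to 75th percentile) is assumed here, so the threshold is the first quartile of the historical real-cluster scores. The function names are hypothetical.

```python
from statistics import quantiles

def dynamic_threshold(history, n=4):
    """Lower bound of the assumed middle interval of historical real-cluster scores.

    With n=4 this is the first quartile, i.e. the lower edge of the
    25%-75% band; other values of n select a different band.
    """
    if len(history) < 2:
        raise ValueError("at least two historical scores are required")
    cuts = quantiles(history, n=n, method="inclusive")
    return cuts[0]

def is_pseudo_cluster(score, history) -> bool:
    """True when the current score falls below the area's dynamic threshold."""
    return score < dynamic_threshold(history)
```

With historical scores 0.6, 0.7, 0.8, 0.9, and 1.0, the threshold under this assumption is 0.7, so a current score of 0.65 would be marked as a pseudo-cluster and 0.75 would not.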
[0012] The technical effects and advantages of this invention are as follows: This invention, by setting up a trajectory similarity judgment module, extracts the movement trajectory feature vector of each terminal within a target time period based on continuous positioning data from multiple terminals, and performs dynamic correlation calculation by combining it with its historical trajectory. This effectively identifies clustered behaviors with non-inertial jump characteristics. Compared to traditional methods that rely solely on location point distribution to determine clustering, this invention can accurately eliminate false clustering trajectories caused by abnormal jumps, random fluctuations, or other non-actual behaviors, reducing the risk of misjudgment due to communication network drift, positioning errors, or short-term pseudo-stops. Therefore, it improves the reliability and stability of trajectory clustering judgment in large-scale crowd trajectory identification.
[0013] This invention, by introducing a communication access fusion module and a behavior linkage analysis module, can evaluate the true interaction relationship between terminals from two dimensions: network access structure and actual communication behavior. Specifically, the communication access fusion module generates a spatial access overlap index using the ratio of cross-terminal redundancy to access structure commonality, identifying non-real physical clustering patterns caused by shared access resources; the behavior linkage analysis module acquires short-range communication data and calculates a behavior interaction consistency index, measuring the probability of actual physical contact between devices. The combination of these two modules effectively avoids misidentification of clustering behavior in scenarios where terminals are spatially close but have no actual interaction, and enhances the system's ability to distinguish between "pseudo-clustering" and "true clustering" in complex, high-density areas.
[0014] This invention incorporates a risk intervention and correction module. Based on the dynamic judgment of cluster credibility scores, areas whose scores fall below the threshold are automatically marked as pseudo-cluster areas, and the data related to these areas are removed from the input of the preset dynamic infectious disease risk assessment model. The module also records trajectory tracing information and, after removing pseudo-cluster data, reorganizes the input data to generate infectious disease transmission risk levels in geographic space, achieving dynamic intervention in and correction of input data quality. Traditional risk assessment models lack a fault tolerance mechanism for data errors; by contrast, this invention continuously filters out low-credibility areas, avoiding the resource misallocation, risk misjudgment, and erroneous regional lockdowns caused by false clusters, and significantly improving the robustness, sensitivity, and accuracy of the dynamic risk assessment system in support of public health decision-making.
Attached Figure Description
[0015] To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the accompanying drawings. Figure 1 is a schematic diagram of the dynamic risk assessment system for infectious diseases based on multi-source communication data fusion in this invention.
Detailed Implementation
[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0017] With reference to Figure 1, the following embodiment is provided.
Example 1: A dynamic infectious disease risk assessment system based on multi-source communication data fusion, comprising:
The trajectory similarity judgment module receives continuous positioning data from multiple terminals, extracts the feature vector of each terminal's movement trajectory within a target time period, and performs dynamic correlation calculation with its historical trajectory to identify concentrated behaviors with non-inertial jump characteristics and eliminate false clustering trajectories. This module models and filters the actual movement behavior of individual terminals within a specific time period. Its significance lies in eliminating non-real clustering paths caused by abnormal positioning errors, network switching, or occasional interference, so that false location overlaps are not misjudged as high-risk population clusters, which helps to ensure the accuracy of subsequent analysis and the credibility of the data source.
[0018] The communication access fusion module extracts network access source information from multiple terminals within a target time period, constructs a regional hotspot access map, and generates a spatial access overlap index based on the common matching degree of network access structure and the cross-terminal redundancy ratio, representing the degree of non-real physical clustering risk caused by shared access resources. This module identifies the superficial clustering illusion caused by access to the same communication resources by analyzing the overlap of network infrastructure such as base stations and hotspots accessed by terminals. Its significance lies in eliminating the virtual "co-occurrence" caused by network coverage intersection at the communication level, thereby improving the authenticity of spatial clustering detection.
[0019] The behavioral linkage analysis module acquires short-range communication interaction data of terminals within a target time period, including Bluetooth broadcast scanning, near-field communication logs, and application-layer adjacency behavior. It analyzes the temporal consistency and interaction frequency of behavioral triggers between devices and calculates a behavioral interaction consistency index to assess the degree to which gathering behavior is supported by actual physical interaction. This module focuses on the micro-behavioral coordination between terminals, extracts the behavioral synchronicity of the gathered crowd from actual short-range interactions, and determines whether face-to-face contact actually exists. Its significance lies in making up for the shortcomings of traditional spatial overlap models in identifying real physical interactions and judging the transmission risk from the contact level.
[0020] The cluster credibility scoring module receives the spatial access overlap index and the behavioral interaction consistency index, inputs them into the fusion discrimination model, and outputs a cluster credibility score, which serves as the basis for whether to include the current cluster data in the infectious disease risk dynamic assessment model. This module inputs multi-dimensional spatial structure and behavioral characteristics into a unified scoring model and outputs a comprehensive score for judging the credibility of cluster events. Its significance lies in filtering cluster events based on credibility, including only high-credibility clusters with real contact risks in subsequent risk models, thus ensuring the accuracy of the assessment results.
[0021] The risk intervention correction module marks areas with cluster credibility scores below the credibility threshold as pseudo-clustering areas. Data related to these areas is removed from the pre-set risk dynamic assessment model's data input. Simultaneously, trajectory tracing information is recorded. All infectious disease risk data collected after removing all pseudo-clustering areas is input into the pre-set risk dynamic assessment model, outputting the infectious disease transmission risk level of each target area in geographic space. This module serves as the system's correction mechanism, performing marking and data removal operations on areas with low cluster credibility. Its significance lies in eliminating the interference of "noise areas" on the overall risk assessment model, supporting subsequent analysis and correction through tracing records, and ensuring that the final risk level output by the system has higher judgment validity and practical intervention value.
[0022] The method for extracting feature vectors of terminal movement trajectories includes: organizing the positioning points of multiple terminals within a target time period in chronological order; specifically, the system collects geolocation data generated by each terminal at a frequency of seconds within the target time period, including the timestamp, latitude and longitude coordinates, and signal strength of each positioning point. Taking each terminal as a unit, all its positioning points are arranged sequentially in chronological order. Based on the geographical distance and time interval between adjacent positioning points, the following parameters are calculated for each segment of movement: movement speed (i.e., the straight-line distance between two positioning points divided by the time difference), azimuth angle (i.e., the geographical azimuth angle from one positioning point to the next), dwell time (i.e., the duration of positional changes of adjacent positioning points within spatial tolerance), and positional change amplitude (i.e., the difference between the maximum and minimum values of positional changes between several consecutive points). These parameters are arranged in a time series to construct a sequence of basic trajectory parameters, which describes the original behavioral pattern of each segment during the continuous movement of the terminal.
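The per-segment parameters in [0022] can be sketched as below. The sketch assumes WGS-84 latitude and longitude in degrees and timestamps in seconds; the `Fix` type, the field names, the 10 m dwell tolerance, and the three-segment span used for the positional change amplitude are all hypothetical choices, since the text does not fix them. Signal strength is omitted.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0

@dataclass
class Fix:
    t: float    # timestamp, seconds
    lat: float  # latitude, degrees
    lon: float  # longitude, degrees

def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dl = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))

def bearing_deg(a: Fix, b: Fix) -> float:
    """Initial azimuth from a to b, in degrees clockwise from north (0..360)."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dl = math.radians(b.lon - a.lon)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def basic_parameters(fixes, dwell_tolerance_m=10.0, span=3):
    """Build the basic trajectory parameter sequence for one terminal."""
    fixes = sorted(fixes, key=lambda f: f.t)
    rows = []
    for a, b in zip(fixes, fixes[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate timestamps
        dist = haversine_m(a, b)
        rows.append({
            "t": a.t,
            "speed_mps": dist / dt,
            "bearing_deg": bearing_deg(a, b),
            # Time counts as dwell when the move stays within the tolerance.
            "dwell_s": dt if dist <= dwell_tolerance_m else 0.0,
            "displacement_m": dist,
        })
    # Amplitude: max minus min displacement over the last `span` segments.
    for i, row in enumerate(rows):
        recent = [r["displacement_m"] for r in rows[max(0, i - span + 1): i + 1]]
        row["amplitude_m"] = max(recent) - min(recent)
    return rows
```

A terminal moving 0.001 degrees of longitude along the equator in ten seconds yields a bearing of 90 degrees and a speed of roughly 11 m/s.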
[0023] A time-sliding window analysis is performed on the trajectory's basic parameter sequence. A fixed-length time-sliding window is set, for example, every ten minutes. Within each time window, the trend of movement speed (i.e., the difference between the maximum and minimum speed values within the window and its direction of change) is statistically analyzed, directional continuity (i.e., the standard deviation of the direction angle sequence) is calculated, the proportion of dwell time (i.e., the proportion of dwell time to the total window time) is measured, and the intensity of position fluctuation (i.e., the mean of the amplitude of all position changes within the same window) is calculated. Based on these statistical indicators, trajectory behavior segments that characterize the terminal's motion stability and behavioral rhythm within the window are generated, constituting the local motion characteristics of the terminal during that time period.
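The window statistics in [0023] can be sketched as follows. The sketch assumes non-overlapping ten-minute windows (the text gives "every ten minutes" only as an example) and input rows carrying hypothetical keys `t`, `speed`, `bearing`, `dwell`, and `amplitude`. The direction standard deviation is computed on raw angles and does not handle wrap-around at 360 degrees, which a production system would need to address.

```python
from statistics import mean, pstdev

def window_features(rows, window_s=600.0):
    """Summarise basic trajectory parameters per fixed-length time window."""
    if not rows:
        return []
    start = rows[0]["t"]
    buckets = {}
    for r in rows:
        buckets.setdefault(int((r["t"] - start) // window_s), []).append(r)
    features = []
    for idx in sorted(buckets):
        seg = buckets[idx]
        speeds = [r["speed"] for r in seg]
        spread = max(speeds) - min(speeds)
        # +1 when speed rose across the window, -1 when it fell, 0 otherwise.
        direction = (speeds[-1] > speeds[0]) - (speeds[-1] < speeds[0])
        features.append({
            "window": idx,
            "speed_trend": direction * spread,
            "direction_std": pstdev([r["bearing"] for r in seg]),
            "dwell_ratio": min(1.0, sum(r["dwell"] for r in seg) / window_s),
            "fluctuation": mean(r["amplitude"] for r in seg),
        })
    return features
```

Each returned dictionary corresponds to one trajectory behavior segment; windows with no positioning points are simply absent from the output.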
[0024] All trajectory behavior segments are connected chronologically and subjected to numerical normalization and time weight calibration to form a motion trajectory feature vector. The normalization process employs a segmented normalization strategy: each parameter within a behavior segment (velocity change trend, directional continuity, proportion of dwell time, and intensity of positional fluctuations) is scaled proportionally according to its minimum and maximum values over the entire time period of that terminal, so that parameter values lie between zero and one for easy comparison and fusion. Time weight calibration uses a time decay factor that assigns higher weights to behavior segments closer to the current time point and lower weights to historical segments further from the current time period. For example, the time period is divided into six adjacent windows, and the weights of the window vectors decrease from the most recent window in the ratio 1.0 : 0.8 : 0.6 : 0.4 : 0.2, and so on; the weighted window vectors are then combined in sequence into a complete trajectory vector, thereby enhancing the representation of the current behavior in the overall vector.
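The normalization and time weighting in [0024] can be sketched as below. The sketch assumes min-max scaling per parameter over all windows of one terminal and a linear decay of 0.2 per window counted back from the most recent one, matching the example weights 1.0, 0.8, 0.6, and so on. The `floor` parameter, which bounds the weight of the oldest windows, and the function names are hypothetical.

```python
def minmax(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def trajectory_vector(windows, keys, step=0.2, floor=0.0):
    """Concatenate normalized, time-weighted window features.

    windows: list of feature dicts ordered oldest to newest.
    keys: feature names to include, in a fixed order.
    """
    if not windows:
        return []
    normalized = {k: minmax([w[k] for w in windows]) for k in keys}
    n = len(windows)
    vector = []
    for i in range(n):
        age = n - 1 - i                        # 0 for the most recent window
        weight = max(floor, 1.0 - step * age)  # 1.0, 0.8, 0.6, ...
        vector.extend(weight * normalized[k][i] for k in keys)
    return vector
```

With the default `step`, the sixth window back receives weight 0.0, so a caller who wants every window to contribute would set a small positive `floor`.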
[0025] The resulting motion trajectory feature vector serves as a representation of the terminal's behavior within the target time period, exhibiting clear characteristics of motion path stability, behavioral rhythm continuity, and positional change patterns. This vector can be used not only for subsequent trajectory consistency comparisons but also as a key feature input variable in the pseudo-aggregation screening process, enhancing the system's comprehensive ability to judge the stability and anomalies of terminal behavior.
[0026] The dynamic correlation calculation of terminal trajectories refers to extracting the feature vector of the terminal's movement trajectory within a target time period and comparing it segment by segment with the feature vector of the terminal's movement trajectory formed within its historical normal behavior cycle. Specifically, the system matches all trajectory behavior segments generated within the target time period using a time sliding window method with historical normal behavior segments of the terminal from a continuous period of seven days or longer. To ensure comparison accuracy, priority is given to comparing segments appearing in the same time period or within the same daily behavior cycle, such as comparing the behavior segment from 8:00 AM to 10:00 AM each day with historical behavior segments from the same time period. Subsequently, the system jointly judges the consistency of the trajectory through directional sequence similarity and dwell position overlap. When the deviation condition is met, the corresponding segment is temporarily marked as a trajectory deviation segment.
[0027] Behavioral stability characteristics are analyzed for time periods marked as trajectory deviation segments. The system extracts three behavioral stability indicators from the basic trajectory parameters: the proportion of trajectory segments within a stable speed range, the continuity of dwell time, and path smoothness. These indicators are then compared with a set behavioral reference threshold range, which is derived from the historical normal behavior statistics of the terminal. For example, for a terminal in a stable movement state, the speed fluctuation ratio may lie between 0.2 and 0.4, the average continuous dwell duration between five and fifteen minutes, and the average path direction switching angle between twenty and forty degrees. When any two indicators in a trajectory deviation segment continuously exceed the reference threshold range, the segment is presumed to exhibit change characteristics significantly different from the normal movement pattern of the terminal.
[0028] Trajectory segments deemed to exhibit significant behavioral deviations are marked as significantly deviated segments. Specifically, if the system detects that a trajectory deviation segment meets the aforementioned multi-indicator threshold exceeding conditions within two or more consecutive time sliding windows, then the segment is upgraded to a significantly deviated segment. For example, if a terminal exhibits a speed fluctuation ratio higher than 0.5, a dwell period continuity of less than three minutes, frequent path direction changes, and an average angle change exceeding 60 degrees within a continuous 20-minute interval, then the trajectory for that time period is determined to be a significantly deviated segment. This behavioral pattern generally corresponds to complex signal drift, short-term virtual location jumps, or non-real location update events.
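The upgrade rule in [0028] can be sketched as a run-length check over windows. The reference ranges reuse the example values from [0027]; the dictionary keys, the function names, and the representation of each window as a dict of three indicator values are hypothetical.

```python
# Example reference ranges taken from the values quoted in [0027].
REFERENCE_RANGES = {
    "speed_fluctuation": (0.2, 0.4),
    "dwell_minutes": (5.0, 15.0),
    "turn_angle_deg": (20.0, 40.0),
}

def out_of_range_count(window, ranges=REFERENCE_RANGES) -> int:
    """Number of indicators falling outside their reference range."""
    return sum(1 for key, (lo, hi) in ranges.items() if not lo <= window[key] <= hi)

def significant_windows(windows, min_indicators=2, min_run=2):
    """Flag windows that belong to a run of at least `min_run` consecutive
    windows in which at least `min_indicators` indicators are out of range."""
    flags = [out_of_range_count(w) >= min_indicators for w in windows]
    significant = [False] * len(flags)
    run_start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    significant[j] = True
            run_start = None
    return significant
```

An isolated out-of-range window is left as an ordinary trajectory deviation segment; only runs of two or more consecutive windows are upgraded.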
[0029] Trajectory segments marked as significantly deviated segments are removed directly from the movement trajectory sequence, and the remaining trajectory segments are used as reliable trajectories in subsequent aggregation and identification processes. The removal strategy is a hard removal method: the entire trajectory feature vector corresponding to the time window of the deviated segment is deleted without interpolation or smoothing, which prevents pseudo-trajectories from interfering with subsequent aggregation and identification results. In this way, the system performs aggregation determination and risk assessment analysis solely on reliable behavioral trajectories formed by continuous real position changes. This reduces misjudgments caused by positioning drift, short-term signal interference, or non-real position jumps, and improves the accuracy and reliability of aggregation and identification decisions.
[0030] In a specific embodiment of the present invention, three core indicators for evaluating the stability of terminal trajectory behavior include: the proportion of stable speed intervals, the continuity of dwell periods, and path smoothness. Their specific calculation and definition are as follows: The proportion of stable speed intervals is defined as follows: After calculating the speed values between all consecutive positioning points in a trajectory behavior segment, the percentage of speed values falling within the "stable speed range" is counted. This stable speed range is set as upper and lower boundary values based on the speed statistics of the terminal's historical normal movement. For example, if there are thirty sets of continuous positioning point speed data in a certain behavior segment, and twenty sets of speed values are within the preset stable range of 0.5 m / s to 1.5 m / s, then the proportion of stable speed intervals is two-thirds. This indicator reflects whether the segment exhibits continuous and balanced movement behavior. If the proportion is low, it indicates that the segment may experience unstable factors such as sudden stops, accelerations, or drifts.
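The proportion of stable speed intervals can be illustrated with a minimal sketch. The function name and the inclusive treatment of the range boundaries are assumptions; the 0.5 m/s to 1.5 m/s range is the example value from paragraph [0030].

```python
def stable_speed_proportion(speeds, low=0.5, high=1.5):
    """Fraction of speed samples (m/s) that lie inside the stable range [low, high]."""
    if not speeds:
        return 0.0
    in_range = sum(1 for v in speeds if low <= v <= high)
    return in_range / len(speeds)


# Thirty speed samples, twenty of them inside the stable range.
speeds = [1.0] * 20 + [3.0] * 10
proportion = stable_speed_proportion(speeds)  # two-thirds
```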
[0031] Definition of dwell cycle continuity: Statistically measure the duration of consecutive stationary states within a trajectory behavior segment and observe whether the durations of multiple dwell cycles are substantially consistent. When the durations of multiple dwell cycles are close to each other and their distribution density is uniform, it is considered to have high dwell cycle continuity. Judgment criteria: Calculate the duration of all stationary segments and compare the difference between the longest and shortest dwell times. If the difference does not exceed a certain percentage threshold (e.g., 30%) of the longest value, it is considered to have good dwell cycle continuity. For example: If a segment contains three stationary states lasting five minutes, six minutes, and five and a half minutes respectively, then the segment can be judged to have high dwell cycle continuity.
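The judgment criterion for dwell cycle continuity can be sketched as below. The function name is hypothetical, and raising an error for an empty input is an assumption; the 30% threshold is the example value from paragraph [0031].

```python
def has_good_dwell_continuity(dwell_minutes, max_relative_spread=0.30):
    """True when (longest - shortest) dwell is within a share of the longest dwell."""
    if not dwell_minutes:
        raise ValueError("at least one dwell duration is required")
    longest = max(dwell_minutes)
    shortest = min(dwell_minutes)
    return (longest - shortest) <= max_relative_spread * longest
```

For the dwell times of five, six, and five and a half minutes, the spread is one minute against an allowance of 1.8 minutes, so the criterion is met.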
[0032] Path smoothness is a measurement index of continuous changes in the direction of motion within a trajectory segment. It is based on the angle of change in direction between three consecutive positioning points. Small angle changes indicate a smooth trajectory; large angle changes and frequent direction changes indicate sharp deviations and low smoothness. The judgment method involves the system calculating the angle formed by every three consecutive positioning points and statistically analyzing the range of these angle values. If most angle changes fluctuate within a range below 30 degrees, the path smoothness is considered good. Conversely, if multiple angle changes jump frequently, especially with sharp turns exceeding 60 degrees, the path smoothness is considered poor, potentially indicating false jumps or sharp turns.
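A minimal sketch of the path smoothness judgment follows. It assumes positioning points already projected to planar (x, y) coordinates, and it interprets "most angle changes" as more than half of them; both choices, and the function names, are assumptions rather than part of the described embodiment.

```python
import math


def turning_angles(points):
    """Absolute heading change in degrees at each interior point of a planar path."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        diff = math.degrees(heading_out - heading_in)
        diff = (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        angles.append(abs(diff))
    return angles


def is_smooth(points, small_turn_deg=30.0, min_share=0.5):
    """True when more than `min_share` of the turns are below `small_turn_deg`."""
    angles = turning_angles(points)
    if not angles:
        return True
    small = sum(1 for a in angles if a < small_turn_deg)
    return small / len(angles) > min_share
```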
[0033] When used together to determine trajectory deviation, the three indicators each reflect different aspects of the terminal's movement behavior: the proportion of stable speed intervals reflects whether the movement process is smooth; the continuity of dwell time reflects whether stationary behavior is regular; and the path smoothness reflects whether the route trend is natural and acceptable. By comparing the current segment with historical behavior reference threshold ranges using these indicators, abrupt trajectory changes or non-inertial behaviors can be identified relatively accurately, assisting the system in completing high-precision false clustering elimination.
[0034] Satisfying the deviation condition means determining trajectory anomalies by coupling directional behavior patterns with spatial dwelling behavior. Specifically, the terminal's movement trajectory within a target time period is divided into multiple continuous movement direction segments, each consisting of two consecutive positioning points. The displacement direction of each segment is extracted and expressed as an angle, forming a complete direction sequence. Simultaneously, the direction sequence within the terminal's historical normal behavior cycle is extracted as a reference direction sequence. For example, when the terminal generates sixty positioning points within a ten-minute time period, the number of direction segments is fifty-nine, and the direction of each segment is expressed as its offset angle from due north. The direction sequence within the historical normal behavior cycle can be derived from the user's normal travel path direction sequence for seven consecutive days at the same time period, reflecting their daily inertial walking route characteristics.
[0035] For each pair of corresponding direction segments between the direction sequence within the target time period and the historical direction sequence, the angle difference is calculated, and the average and standard deviation of all angle differences are statistically analyzed to determine whether the overall direction change trend is consistent. When the average difference is lower than a first preset angle deviation threshold and the standard deviation is lower than a second preset stability threshold, the direction sequences are considered similar. For example, if a terminal's average direction deviation angle is ten degrees and its direction change fluctuations are stable within this time period (i.e., the standard deviation does not exceed fifteen degrees), then its direction change trend is considered stable, belonging to a normal behavioral inertial trajectory.
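The direction sequence comparison can be sketched as follows. The function names are hypothetical, the two sequences are assumed to be already aligned pair by pair, and the population standard deviation is an assumed choice; the default thresholds of ten and fifteen degrees are the example values mentioned in this description.

```python
import statistics


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def directions_similar(current, reference, mean_threshold=10.0, std_threshold=15.0):
    """True when both the mean and the spread of the angle differences are low."""
    if not current or len(current) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = [angular_difference(a, b) for a, b in zip(current, reference)]
    mean_diff = statistics.fmean(diffs)
    std_diff = statistics.pstdev(diffs)
    return mean_diff < mean_threshold and std_diff < std_threshold
```

Wrapping the difference around 360 degrees keeps headings such as 350 and 10 degrees from being treated as far apart.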
[0036] When determining the overlap of stopping positions, the distance between each stationary stopping point in the current trajectory and each stopping point in the historical trajectory is calculated. The number of current stopping points falling within the spatial overlap range of a historical stopping point is counted, and the proportion of these points among all current stopping points is used as the overlap ratio. When the overlap ratio is higher than a preset position overlap ratio threshold, the stopping positions are considered to have spatial consistency. For example, when a terminal repeatedly remains stationary near a restaurant, office, or residence, these stationary points have corresponding positions in the historical stopping area, and more than 60% of the current stopping points fall within the same selected spatial radius as the historical stopping points, the stopping behavior can be considered consistent.
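A minimal sketch of the stopping position overlap ratio is given below. It assumes stopping points expressed as planar coordinates in meters, and the 100-meter radius and the function name are hypothetical; the description does not fix the radius.

```python
import math


def dwell_overlap_ratio(current_stops, historical_stops, radius_m=100.0):
    """Share of current stops lying within `radius_m` of any historical stop."""
    if not current_stops:
        return 0.0
    matched = sum(
        1
        for c in current_stops
        if any(math.dist(c, h) <= radius_m for h in historical_stops)
    )
    return matched / len(current_stops)
```

The resulting ratio would then be compared with the preset position overlap ratio threshold, for instance the 60% figure used in the example.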
[0037] When both directional similarity and stationary consistency are satisfied, the trajectory is deemed to have behavioral consistency; otherwise, it is marked as a trajectory deviation segment. In practical applications, when a terminal's directional behavior shows a significant deviation and its stationary position cannot align with existing historical behavior—for example, suddenly jumping from a fixed office area to a distant location without exhibiting stable stationary behavior—meaning the system cannot simultaneously satisfy both of the aforementioned consistency conditions, this behavioral segment is judged as a trajectory deviation segment and will be eliminated in subsequent trajectory reliability screening to reduce false aggregation interference caused by false positioning, signal errors, or simulated paths.
[0038] The calculation method for the spatial access overlap index includes the following steps to identify potential false clustering relationships among terminals within a target area due to shared access resources. Specifically, the steps are as follows: Extract network access source information for multiple terminals within a target time period, including the base station number connected to each terminal, the start and end points of the access time period, and the number of accesses. Each terminal is designated as a first-class node, and each access source as a second-class node. A regional hotspot access map is constructed based on the connection relationships between terminals and access sources. In this map, each connection edge represents the association between a terminal and an access source. The edge weight is calculated as the ratio of the total access time of the terminal to the access source to the access frequency, characterizing the stability of the terminal's access to that access source. For example, if a terminal's access time at a certain hotspot base station is three hours and the total number of accesses is six, then the edge weight is half an hour, indicating that the terminal's access to that access source exhibits a short-term, frequent characteristic.
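The construction of the regional hotspot access map can be sketched as follows. The record layout (terminal, access source, session duration in hours) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict


def build_access_graph(sessions):
    """Map each terminal to {access source: edge weight}.

    `sessions` is an iterable of (terminal_id, source_id, duration_hours),
    one entry per access. The edge weight is total access time divided by
    the number of accesses.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for terminal, source, hours in sessions:
        total[(terminal, source)] += hours
        count[(terminal, source)] += 1
    graph = defaultdict(dict)
    for (terminal, source), hours in total.items():
        graph[terminal][source] = hours / count[(terminal, source)]
    return dict(graph)
```

Six accesses totalling three hours at one base station yield an edge weight of half an hour, matching the example above.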
[0039] The system calculates the degree of overlap in the access paths between any two terminals, compiles the set of duplicate access sources that both terminals connect to, and calculates the average weight of all edges in this set. This average weight is used to determine whether the two terminals exhibit similar stability characteristics for the same access source. For example, if two terminals both access three identical base stations, and the edge weights are one hour, one and a half hours, and two hours respectively, then the average weight is one and a half hours. The system then compares this average value with a preset stability threshold. If the value exceeds the threshold and meets subsequent proportional conditions, the terminals exhibit high-stability overlap.
[0040] To determine the existence of common access relationships, if the average edge weight of the aforementioned duplicate access sources exceeds the system's set stability threshold, and the proportion of the number of duplicate access sources to the total number of access sources for both terminals is higher than the access overlap ratio threshold, then the terminal pair is determined to have a high common access relationship. For example, suppose two terminals access a total of six access sources, four of which are duplicate access sources, the average weight exceeds the one-hour stability threshold, and the system sets the access overlap ratio threshold to 60%. The overlap ratio is then about 67%, which meets the condition, and the pair is considered to have a high common access relationship. Such a relationship does not necessarily arise from physical contact, so it carries a risk of false aggregation.
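The common access judgment can be sketched as below. The function name is hypothetical, and two interpretations are assumed: the average is taken over the edges of both terminals to each duplicate access source, and the total number of access sources is the union of the two terminals' sources. The default thresholds are the example values of one hour and 60%.

```python
def has_common_access(weights_a, weights_b, stability_threshold=1.0, overlap_threshold=0.6):
    """Decide whether two terminals have a high common access relationship.

    `weights_a` and `weights_b` map access source id to edge weight in hours.
    """
    shared = weights_a.keys() & weights_b.keys()
    union = weights_a.keys() | weights_b.keys()
    if not shared:
        return False
    edge_weights = [weights_a[s] for s in shared] + [weights_b[s] for s in shared]
    mean_weight = sum(edge_weights) / len(edge_weights)
    overlap_ratio = len(shared) / len(union)
    return mean_weight > stability_threshold and overlap_ratio > overlap_threshold
```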
[0041] The spatial access overlap index is calculated by counting the number of terminal pairs that satisfy the common access relationship in the entire map and dividing this number by the total number of terminal pairs in the map. This index represents the proportion of terminal pairs within the target area that share access overlap characteristics and serves as an important indicator of the potential false clustering risk in that area. A higher spatial access overlap index indicates more severe false proximity behavior caused by terminals sharing access resources within the area. Based on this, the system can adjust its reliability assessment strategy for clustering behavior in that area to prevent false alarms caused by false clustering.
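The index itself reduces to a pair count, sketched here under the assumption that a pairwise predicate (hypothetically named `is_common_access`) is supplied by the preceding judgment step.

```python
from itertools import combinations


def spatial_access_overlap_index(terminals, is_common_access):
    """Share of all terminal pairs for which `is_common_access(a, b)` is true."""
    pairs = list(combinations(terminals, 2))
    if not pairs:
        return 0.0
    qualifying = sum(1 for a, b in pairs if is_common_access(a, b))
    return qualifying / len(pairs)
```

With four terminals there are six pairs; if three of them satisfy the common access relationship, the index is 0.5.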
[0042] The calculation method for the Behavioral Interaction Consistency Index includes the following steps to assess whether there are behavioral patterns supported by actual physical interaction between terminals within a target time period, thereby assisting in the identification of real clustering behaviors. Specifically, it involves collecting short-range communication interaction events from multiple terminals within the target time period, including Bluetooth identification events, near-field communication events, and application adjacency events. Each event is represented as a triplet of event type, start time, and end time, and arranged chronologically to construct a sequence of terminal interaction behaviors. For example, if a terminal has a near-field communication event with another device at 2 PM, with the start time at 2 PM and the end time at 2:05 PM, and the event type is near-field communication, then its triplet would be "near-field communication, 2 PM, 2:05 PM". Continuously collecting such events forms a time-ordered behavioral sequence, fully characterizing the spatiotemporal contact behavior of each terminal.
[0043] The interaction sequences of any two terminals are matched and analyzed to determine whether there is a partial temporal overlap between events of the same type. If so, it is recorded as a synchronization event. The temporal overlap of a synchronization event is defined as the proportion of the actual overlap duration of the two events to the duration of the shorter event. The event intensity level is preset by the event type: near-field communication events are assigned the highest level, Bluetooth events are assigned a medium level, and application adjacency events are assigned a basic level. The synchronization event score is the product of the temporal overlap and the event intensity level. For example, if the two terminals experience Bluetooth identification events lasting three minutes and two minutes respectively, with one and a half minutes of overlap, the temporal overlap is 75%. The Bluetooth event intensity level is two, so the synchronization event score is 75% multiplied by two, which is 1.5.
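A minimal sketch of the synchronization event score follows. Events are assumed to be triplets of (event type, start, end) with times given as minutes on a common clock; the type labels and the function name are hypothetical, while the intensity levels of three, two, and one follow the levels used in this description.

```python
# Intensity levels per event type; the string labels are hypothetical.
INTENSITY = {"nfc": 3, "bluetooth": 2, "app_adjacency": 1}


def sync_event_score(event_a, event_b):
    """Score for two events given as (event_type, start_min, end_min) triplets."""
    type_a, start_a, end_a = event_a
    type_b, start_b, end_b = event_b
    if type_a != type_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    shorter = min(end_a - start_a, end_b - start_b)
    if overlap <= 0 or shorter <= 0:
        return 0.0
    return (overlap / shorter) * INTENSITY[type_a]


# Bluetooth events of three and two minutes with 1.5 minutes in common.
score = sync_event_score(("bluetooth", 0.0, 3.0), ("bluetooth", 1.5, 3.5))  # 1.5
```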
[0044] The actual total synchronization score is calculated by summing the scores of all synchronization events for the terminal pair within the target time period. Simultaneously, the maximum score obtainable from all communication events occurring on the terminal pair, assuming complete overlap and reaching the maximum event intensity level, is calculated as the reference total score for the terminal pair. For example, if the terminal pair experiences five communication events—two near-field communications, two Bluetooth identification events, and one application adjacency event—the reference total score is the sum of the scores for these five events under ideal conditions, calculated and superimposed using 100% maximum event overlap and maximum intensity levels three, two, and one, respectively. The actual total synchronization score is then divided by the reference total score to obtain the behavioral interaction consistency coefficient for the terminal pair.
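The ratio of actual to reference score can be sketched as follows, assuming each event contributes its intensity level to the reference total under 100% overlap. The type labels and function names are hypothetical.

```python
# Intensity levels per event type; the string labels are hypothetical.
INTENSITY = {"nfc": 3, "bluetooth": 2, "app_adjacency": 1}


def reference_total(event_types):
    """Ideal total score: every event fully overlapped at its intensity level."""
    return sum(INTENSITY[t] for t in event_types)


def consistency_coefficient(sync_scores, event_types):
    """Actual total synchronization score divided by the reference total score."""
    ref = reference_total(event_types)
    if ref == 0:
        return 0.0
    return sum(sync_scores) / ref


events = ["nfc", "nfc", "bluetooth", "bluetooth", "app_adjacency"]
ref = reference_total(events)  # 3 + 3 + 2 + 2 + 1 = 11
```

Under these assumptions, the five events in the example give a reference total of 11, so a pair whose synchronization scores sum to 5.5 has a coefficient of 0.5.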
[0045] The behavioral interaction consistency coefficients of all terminal pairs within the same target area are averaged to generate the behavioral interaction consistency index for that area. This index reflects the overall degree of behavioral coordination among terminals within the target time period. A higher index value indicates a higher level of interactive synchronization among terminals within the area. It can serve as an important input feature indicator in the subsequent aggregation credibility scoring module to determine whether there is supporting evidence of actual physical aggregation, thereby improving the system's ability to identify real aggregation events.
[0046] The calculation method for cluster credibility scoring includes the following steps to determine the authenticity and credibility of clustering behavior within a target area, ensuring that the system can distinguish between real and pseudo-clustering scenarios, thereby improving the data input reliability of the infectious disease risk dynamic assessment model. The spatial access overlap index and behavioral interaction consistency index of the target area are obtained, representing the degree of shared access resources and actual physical interaction between terminals within the area, respectively. These two indicators are used as the joint input to the scoring model. Specifically, when a large number of terminals in a certain area share the same communication access resources and exhibit synchronous short-range interaction behavior, both the spatial access overlap index and the behavioral interaction consistency index will increase. For example, if multiple terminals in an office building continuously access the same base station and conduct short-range communication via Bluetooth, a high index value will be generated, providing the system with a data foundation containing potentially real clustering characteristics.
[0047] The two indicators mentioned above are input into the scoring model using a time-series window, enabling it to learn the true patterns of clustering behavior from continuous regional evolution. To ensure the stability of the feature input, the system aligns the spatial access overlap index and the behavioral interaction consistency index with a uniform time granularity, for example, using five minutes as a data unit to form a continuous feature sequence. In this way, the scoring model can not only identify instantaneous clustering phenomena but also capture the continuous trend of clustering behavior over time, which helps improve the reliability of real-world clustering identification and the ability to model continuous scenes.
[0048] The scoring model is based on a convolutional neural network (CNN) architecture. It extracts features from the coupling relationship between spatial aggregation features and behavioral aggregation features, fuses the two input metrics, and maps the result to an aggregation credibility score. The CNN model automatically identifies highly correlated patterns between the spatial access overlap index and the behavioral interaction consistency index by extracting local pattern features within a continuous time window. For example, when both metrics rise synchronously over time and remain at stable high values, the scoring model considers this a high-credibility feature of a genuine aggregation event, thus generating a higher aggregation credibility score.
[0049] The system outputs the credibility of clustering behavior within the current area based on a clustering credibility score, which is used to determine whether to include the current clustering data in the subsequent dynamic assessment model of infectious disease risk. For example, when the clustering credibility score of a certain area exceeds the credibility threshold set by the system, the system considers it as real clustering data and inputs it into the dynamic assessment model of infectious disease risk for further transmission risk analysis; conversely, when the score is low, it will be marked as pseudo-clustering and excluded to prevent misjudgments caused by signal interference or virtual behavior, and to ensure the scientific and accurate nature of risk decisions.
[0050] The credibility scoring module employs a convolutional neural network (CNN) model to achieve the fusion recognition and credibility scoring of complex correlation patterns between the spatial access overlap index and the behavioral interaction consistency index. This scoring model utilizes mature deep learning technology, which has been widely applied in various spatiotemporal behavior recognition fields and demonstrates good performance in feature extraction efficiency, data pattern recognition accuracy, and parameter generalization ability, making it feasible and stable as the scoring model for this invention. The CNN model uses the spatial access overlap index and the behavioral interaction consistency index as inputs to construct a two-dimensional time segment structure. In the model structure design, multiple convolutional units are used to perform sliding window processing on the input indicators in the time dimension, extracting the coupling change features between different indicators within each window. Specifically, the first convolutional unit of the model is used to identify the temporal stability characteristics of a single indicator, such as high access frequency or high interaction consistency segments within a continuous time period; the second convolutional unit uses a horizontal convolutional structure to fuse and analyze the temporal alignment characteristics between two input indicators, thereby identifying whether there is a synchronous change trend between space and behavior; the third convolutional unit further extracts cross-dimensional combination patterns to identify composite features such as the simultaneous occurrence of high access overlap and high interaction frequency.
[0051] After feature extraction, the model compresses redundant features using a pooling mechanism to retain significantly changing segments, and maps convolutional features to clustering credibility scores using a fully connected structure. The score represents the credibility of clustering behavior within the current target region and is used to determine whether this clustering behavior can be included as formal input in the dynamic assessment model for infectious disease risk.
[0052] To enhance the model's adaptability in dynamic environments, this invention trains the model using typical sample sets of historical real and pseudo-clustered regions. It employs trusted clustering labels provided by manual annotation or auxiliary detection systems as supervisory signals, and iterates through multiple rounds of training on the model parameters. After training, the model can comprehensively identify spatial and behavioral clustering patterns of each target region within any target time period in a real-world deployment environment, outputting a clustering trustworthiness score. In the application layer design, the convolutional neural network model, as the scoring core, can be embedded into the system's real-time processing flow, supporting rapid evaluation at a minute-level update frequency, particularly suitable for high-concurrency clustering identification tasks in high-density terminal areas. By introducing this existing model technology, this invention achieves automatic feature extraction and intelligent trustworthiness judgment of input indicators without increasing hardware computing costs, avoiding the problem of insufficient adaptability to scene changes in traditional rule-setting methods, and improving the overall accuracy of clustering discrimination in multi-source heterogeneous communication data environments. The convolutional neural network model is not proposed as an innovation in this invention, but rather as a structural optimization application of existing technology. Its introduction in the multi-source communication indicator fusion evaluation scenario reflects the full utilization of the capabilities of mature intelligent models and the reasonable integration at the system level.
[0053] The pseudo-clustering area labeling method includes the following steps to dynamically identify and eliminate clustering behavior areas lacking real physical interaction support, preventing false clustering characteristics from entering the dynamic assessment model of infectious disease risk, thereby ensuring the accuracy of the system's judgment logic and the effectiveness of resource allocation. The method involves obtaining the clustering credibility score of the current target area and retrieving historical score records of the area that were judged as genuine clusters by the system over multiple past time periods. Based on the stable distribution range of these score records, a reference quantile interval representing genuine clustering characteristics is extracted. This reference quantile interval is determined by a continuous percentage interval within the middle stable segment of the score records, and the lower bound of this reference quantile interval is used as the dynamic credibility threshold for the area in the current time period. During this process, the system uses the clustering credibility score of the target area as real-time input and simultaneously retrieves score data from, for example, when the area was identified as a genuine clustering scenario within the same time period over the past seven or thirty days. The system sorts these historical data by size and filters out continuous intervals within a consistently stable segment, such as the 30% to 70% score interval, using this interval as the credibility score band for genuine clustering identification. Based on this, the system uses the minimum score value within this interval as the dynamic credibility threshold. For example, if the historical true aggregation score of the region is continuously distributed between 0.6 and 0.9, the system will use 0.6 as the dynamic reliable threshold for the current time period. 
This mechanism enables the system to adaptively set the threshold based on the region's own aggregation behavior characteristics, rather than using a fixed static threshold, ensuring adaptability in different scenarios, user densities, and time periods.
[0054] The system compares the current cluster credibility score with the dynamic credibility threshold. If the current score is lower than the dynamic credibility threshold, the area is marked as a pseudo-cluster area. In implementation, the system obtains the current cluster credibility score. For example, if an office area has a score of 0.35 at 2 PM, and the lower bound of the long-term real cluster score range for that area during the same time period is 0.6, the system determines that the score is lower than the dynamic credibility threshold, meaning that the current clustering behavior in that area lacks sufficient real interaction support. For instance, the area might have a large number of devices accessing the same network resource, but actual personnel activity is sparse, or there might be a false clustering phenomenon due to shared hotspot networks, resulting in a significantly low cluster credibility score. In this scenario, the system automatically marks the area as a pseudo-cluster area to avoid misjudging it as a potential core point of infection transmission in subsequent risk analysis.
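The dynamic threshold and the pseudo-cluster comparison can be sketched together as below. The linear-interpolation quantile, the 30% lower bound taken from the example interval, and the function names are assumptions for illustration; the historical scores are invented sample values, not data from the embodiment.

```python
def quantile(values, q):
    """Quantile of `values` at fraction q (0 to 1) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no historical scores available")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def dynamic_threshold(historical_scores, lower_q=0.30):
    """Lower bound of the reference quantile interval of genuine-cluster scores."""
    return quantile(historical_scores, lower_q)


def is_pseudo_cluster(current_score, historical_scores, lower_q=0.30):
    """True when the current score falls below the dynamic credibility threshold."""
    return current_score < dynamic_threshold(historical_scores, lower_q)


# Hypothetical genuine-cluster scores for one area in the same time period.
history = [0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
```

With this sample history the threshold is approximately 0.6, so a current score of 0.35 would be marked as a pseudo-cluster area, as in the office example.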
[0055] To prevent low-reliability clustering behavior from interfering with the overall judgment of the transmission risk model, pseudo-clustering areas are removed from the data input of the dynamic assessment process for infectious disease risk. During the removal process, the system masks all input feature values corresponding to pseudo-clustering areas and stops performing transmission risk simulations for those areas. Simultaneously, it records the marking time and clustering characteristic patterns of the area for subsequent model learning and strategy updates. For example, when an area is marked as a pseudo-clustering area, its spatial access overlap index, behavioral interaction consistency index, and other feature data will no longer be included in the risk transmission calculation module, ensuring that the transmission risk model is evaluated only based on real clustering areas and reliable trajectory areas. The system also archives the pseudo-clustering records for that area to construct a pseudo-clustering sample set, enabling subsequent machine learning mechanisms to identify specific patterns, such as false high-density signal scenarios in areas covered by specific public wireless access points or temporary signal clustering phenomena outside electronic fences at tourist attractions, thereby improving the model's long-term robustness and self-learning ability.
[0056] Based on the above process, the system dynamically updates the regional clustering reliability evaluation mechanism in subsequent time periods, incorporating the pseudo-clustering labeling results into historical samples and behavioral reference libraries. This allows the system to more quickly identify potential pseudo-clustering areas and execute removal logic when similar clustering signal patterns appear under similar time periods and network environment conditions in the future. For example, if a shopping mall consistently experiences concentrated communication access but weak behavioral interaction during the midday period, the system includes this pattern in the pseudo-clustering feature template. When similar characteristics occur again at the same time the following day, the system can more quickly identify the area as a pseudo-cluster, thus avoiding mistakenly including it in the analysis of the real risk propagation chain. This dynamic adjustment mechanism not only improves the long-term stability and judgment accuracy of the system but also achieves self-calibration and self-evolution capabilities. This enables the invention to cope with the influence of multiple factors such as changes in the communication environment, changes in user behavior patterns, and changes in network architecture in long-term operating environments, maintaining a high-precision identification capability for pseudo-clustering behavior.
[0057] It should be noted that the "first preset angle deviation threshold" and "second preset stability threshold" in the trajectory similarity judgment module are based on the existing standards for judging directional consistency in the fields of trajectory analysis and traffic behavior recognition. Typically, an average angle difference of less than ten degrees is used as the criterion for directional similarity, as widely applied in mobile terminal trajectory compression and anomaly detection algorithms. The stability threshold is applied to the standard deviation of the directional difference, which reflects directional fluctuations. Referring to research on the assessment of continuous motion stability in behavioral science, a standard deviation of less than fifteen degrees is generally used as the stability threshold for behavioral consistency, in order to exclude large-scale turning anomalies occurring within a short period.
[0058] The behavioral reference threshold range in trajectory dynamic correlation calculation is derived from the user trajectory stability feature extraction methods in existing spatiotemporal behavior modeling research. The reference threshold settings for the proportion of speed stability intervals, the continuity of dwell period, and path smoothness are usually based on the upper and lower limits of 10 percent of the average trajectory of the same user under non-abnormal conditions over multiple historical days, and are used to identify trajectory segments that significantly deviate from daily behavior.
[0059] The stability threshold is used to determine whether a terminal's connection to a network source is stable. Originating from cellular network behavior analysis, it typically uses a connection frequency greater than five times and an average access duration exceeding 20% of the total access duration as the edge weight threshold for judging network access stability. The access overlap ratio threshold is used to determine the degree of overlap in the access paths of two terminals. Based on existing socially aware access graph construction methods, it is usually set to consider significant access commonality when more than 50% of the nodes in the access path are common nodes.
[0060] The "Event Intensity Level" and "Time Overlap Threshold" in the Behavior Linkage Analysis module (which, even where not formally named thresholds, effectively function as such) are set as follows. The Event Intensity Level is based on the representative levels of physical contact for Bluetooth, Near Field Communication (NFC), and Application Adjacency events in existing near-field interaction recognition models. NFC has the highest intensity level because it operates over the shortest physical distance, followed by Bluetooth; Application Adjacency, being a logical event, has the lowest intensity level. The Time Overlap Threshold references the event time intersection determination logic in existing synchronous behavior recognition models, typically using 50% of the shorter event duration as the minimum synchronization criterion.
[0061] The credibility threshold in the clustering credibility scoring module is dynamically set, not a static constant. Its setting follows the standard practice of risk level classification in existing statistics using the reference quantile method. Specifically, it retrieves the score distribution of the target region across multiple historical time periods where it was determined to be a true cluster, extracts the lower 30% to 50% range as the reference quantile interval, and uses the lower bound as the dynamic credibility threshold for the current time period, ensuring it aligns with the inherent clustering characteristics of the region. The setting methods for various thresholds are based on existing computational principles or empirical rules in several mature fields such as trajectory analysis, network behavior modeling, spatial interaction recognition, and statistical analysis, ensuring the threshold determination is scientifically sound and practically feasible.
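The quantile-based threshold described above can be sketched as follows. This is a minimal illustration, assuming linear interpolation between sorted scores and a lower bound at the 30% quantile; the function names and the sample scores are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def dynamic_credibility_threshold(true_cluster_scores, lower_q=0.30):
    """Lower bound of the reference quantile interval (assumed 30% to 50%)."""
    return quantile(true_cluster_scores, lower_q)

# Hypothetical scores from past periods judged to be true clusters.
history = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95, 0.97, 0.99]
threshold = dynamic_credibility_threshold(history)
print(threshold)
```

A current score below `threshold` would then mark the area as a pseudo-cluster area for that time period.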
[0062] A pre-defined dynamic risk assessment model is used to analyze the transmission risk of screened and confirmed real cluster areas and related credible user behavior data, and finally outputs the infectious disease transmission risk level of each target area in geographic space. This pre-defined dynamic risk assessment model belongs to a widely used existing technology type in the field of public health prevention and control, with mature data processing methods and risk level classification standards. This invention enhances the accuracy and practical significance of the model's output by filtering data inputs and dynamically identifying credible clusters. Specifically, it includes the following steps: After filtering pseudo-cluster areas, the system uses data from all remaining areas as credible inputs and sends them to the pre-defined dynamic risk assessment model. This model calculates risk based on existing infectious disease transmission theories and regional infection risk evolution patterns, typically using a multi-dimensional risk factor construction approach, including indicators such as real cluster size, credible trajectory density, local population density, historical infection rate, regional health resource allocation, and public contact intensity. The system uses each geographic spatial unit as the basic analysis unit, such as administrative streets, community areas, campus areas, or shopping center coverage boundaries as geographic spatial slices, giving the risk output high-resolution spatial characteristics. For example, when multiple real-world gatherings occur in a large commercial center area within a short period, accompanied by an increased historical risk of infection, the reliable input value for that area will significantly increase, indicating a higher risk of transmission. 
This step ensures that the present invention does not alter the original structure and calculation method of the preset model, but rather enhances the model's reliability and environmental adaptability in multi-source communication environments through precise screening and error removal compensation of input data.
[0063] A pre-defined dynamic risk assessment model models the regional propagation chain of the input credible aggregation data, integrating current credible aggregation behavior with historical propagation trajectory distribution characteristics to form a risk level trend judgment. In this process, the model introduces a time continuity parameter, weighting the analysis of actual aggregation intensity, biological contact potential, and historical cumulative probability of the propagation chain within different time periods. For example, if a region shows an increasing trend of actual aggregation events over five consecutive hours, accompanied by an increase in short-distance contact density, the propagation chain model for that region will automatically adjust the predicted propagation risk value for the future period. Combining the characteristics of regional personnel flow channels, such as subway station entrances or main passageways on university campuses, the model further determines whether the affected area has the potential for outward propagation or cross-regional influence, thereby constructing a risk diffusion trend map. This invention, through a pre-process pseudo-aggregation elimination mechanism, ensures that the modeling in this stage is entirely based on real behavior, thus avoiding the false propagation chain effect caused by misjudging the source of dense signals in traditional models, significantly improving the scientific rationality of risk prediction.
[0064] The system maps risk levels to each target area based on the intermediate output of a pre-set dynamic risk assessment model and performs spatial visualization and aggregation analysis. Risk levels can be categorized into three or more levels, such as low, medium, and high risk, based on the model's calculation results. For example, if a region maintains a high real clustering index for the past 24 hours, accompanied by overlapping historical case contact areas, the system marks the region as high-risk; conversely, if a city park experiences occasional small-scale real clustering without overlapping transmission chains, it is marked as low-risk. The output process is performed using geospatial grid slicing, for example, using a 500-meter square grid to form a refined risk heat map, enabling decision-making departments to quickly identify regional risk hotspots and formulate targeted prevention and control measures. This invention does not create new risk calculation logic in this step; instead, it combines the pre-set dynamic risk assessment model with the invention's reliable clustering screening capability to improve output accuracy and enhance its guidance significance for prevention and control.
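The grid slicing and level mapping described above might look like the following sketch. It assumes planar coordinates in meters, keeps the maximum score per cell, and uses cut-off values of 0.4 and 0.7; all of these choices and names are illustrative assumptions, not values from this application.

```python
def grid_cell(x_m, y_m, cell_size_m=500.0):
    """Map projected coordinates in meters to an integer (col, row) grid cell."""
    return int(x_m // cell_size_m), int(y_m // cell_size_m)

def risk_level(score, medium_cut=0.4, high_cut=0.7):
    """Map a model score in [0, 1] to a three-level label (assumed cut-offs)."""
    if score >= high_cut:
        return "high"
    if score >= medium_cut:
        return "medium"
    return "low"

def heat_map(samples):
    """samples: iterable of (x_m, y_m, score); keeps the max score per cell."""
    cells = {}
    for x, y, score in samples:
        cell = grid_cell(x, y)
        cells[cell] = max(cells.get(cell, 0.0), score)
    return {cell: risk_level(score) for cell, score in cells.items()}

print(heat_map([(100, 100, 0.2), (400, 450, 0.8), (600, 100, 0.5)]))
```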
[0065] The system updates risk output results in real time and provides policy-linked feedback, offering the infectious disease transmission risk level of each target area, geographically, to the city's epidemic prevention and control management center, medical resource allocation system, and public health monitoring platform. It also supports the generation of regional risk time evolution curves. The system outputs different response strategies based on the risk level of different areas; for example, it initiates temporary movement control for high-risk areas, strengthens public health patrols for medium-risk areas, and maintains routine early warning monitoring for low-risk areas. In this process, the invention establishes a soft-trigger mechanism: when abnormal changes occur in gathering behavior in a region, or the risk level increases due to updated real-time data, the system can automatically trigger an early warning procedure and recommend appropriate policies, such as increasing nucleic acid screening points, improving medical resource allocation, or strengthening campus entry testing. By combining a pre-set dynamic risk assessment model with real-time behavior screening, false cluster elimination, and spatial dynamic monitoring technologies, the invention enables the entire risk output system to possess high real-time performance, adaptability, and interpretability. This ensures that the public health system can accurately locate risk areas, avoid resource waste and misjudgments in prevention and control, and achieve more scientific and reasonable epidemic prevention and control management goals.
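The soft-trigger behavior described above can be illustrated with a short sketch. It assumes a warning is raised whenever an area's level rises relative to the previous update; the level ordering, the response table, and the function name are hypothetical.

```python
LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2}

# Response strategies taken from the examples in the paragraph above.
RESPONSES = {
    "high": "temporary movement control",
    "medium": "strengthened public health patrols",
    "low": "routine early warning monitoring",
}

def soft_trigger(previous_levels, current_levels):
    """Return (area, new_level, response) for every area whose level rose."""
    alerts = []
    for area, level in current_levels.items():
        prev = previous_levels.get(area, "low")  # unseen areas start as low
        if LEVEL_ORDER[level] > LEVEL_ORDER[prev]:
            alerts.append((area, level, RESPONSES[level]))
    return alerts

print(soft_trigger({"campus": "low", "mall": "medium"},
                   {"campus": "medium", "mall": "medium", "park": "low"}))
```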
[0066] The pre-defined dynamic risk assessment model can utilize existing, mature infectious disease transmission modeling algorithms, such as distributed spatiotemporal transmission models based on the susceptible-infected-removed (SIR) framework, regional transmission risk models based on cellular automata evolution, or urban transmission risk prediction models trained on graph neural networks. These models are widely used in public health risk assessment systems and possess good stability and scalability. Specifically, the pre-defined dynamic risk assessment model can be: an extended version of the classic SIR model, such as a dynamic transmission model incorporating spatial migration matrices and time-varying contact rate parameters; a regional transmission graph model driven by social contact graphs, combining population contact networks and cluster events to construct a dynamic transmission graph; a risk prediction model combining graph convolutional networks and temporal attention mechanisms, used to predict regional infection trends from multi-source time series; an urban grid evolution model based on cellular automata, simulating the spread of disease in spatial grids; or a regional health risk assessment module pre-deployed by a government health management platform, such as the "multi-factor spatial risk scoring engine" in the national infectious disease early warning system. This invention does not limit the specific implementation of the model; it only requires that the model accept spatially granular data as input and output risk levels in the geospatial dimension, and that it be a mature algorithm model already existing in the prior art.
In actual deployment, any of the above-mentioned types of models can be flexibly selected as the dynamic risk assessment engine according to the infrastructure, historical case records, and data interface support of different regions, and integrated with the trusted input mechanism of this invention to achieve risk level output with higher accuracy.
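To make one of the listed options concrete, the following is a minimal sketch of a compartmental (commonly written SIR) model extended with a spatial migration matrix and a contact rate supplied per time step. It is a generic textbook-style discretization under stated assumptions, not the model of this application: one Euler step per unit time, and a matrix `M` whose rows sum to 1, where `M[i][j]` is the share of region i's population that is in region j after the step.

```python
def sir_step(S, I, R, beta, gamma, M):
    """One discrete step: local infection and removal, then migration.

    S, I, R: per-region compartment sizes; beta: per-region contact rate for
    this step (may vary over time); gamma: removal rate; M: migration matrix.
    """
    n = len(S)
    S2, I2, R2 = [], [], []
    for k in range(n):
        total = S[k] + I[k] + R[k]
        new_inf = beta[k] * S[k] * I[k] / total if total > 0 else 0.0
        new_rem = gamma * I[k]
        S2.append(S[k] - new_inf)
        I2.append(I[k] + new_inf - new_rem)
        R2.append(R[k] + new_rem)

    def move(X):
        # Redistribute each compartment across regions according to M.
        return [sum(X[i] * M[i][j] for i in range(n)) for j in range(n)]

    return move(S2), move(I2), move(R2)

S, I, R = [990.0, 1000.0], [10.0, 0.0], [0.0, 0.0]
M = [[0.9, 0.1], [0.1, 0.9]]
S, I, R = sir_step(S, I, R, beta=[0.3, 0.3], gamma=0.1, M=M)
print(I)
```

In this sketch infections reach the second region only through migration, which is the kind of cross-regional effect the description attributes to the preset model.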
[0067] The above-mentioned models and function formulas are all computed on dimensionless numerical values. They are obtained by software simulation on a large amount of collected data so as to approximate the real situation as closely as possible. The preset parameters in the models and function formulas are set by those skilled in the art according to the actual situation.
[0068] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
[0069] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0070] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
[0071] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A dynamic risk assessment system for infectious diseases based on multi-source communication data fusion, characterized in that it comprises: a trajectory similarity judgment module, which receives continuous positioning data from multiple terminals, extracts the movement trajectory feature vector of each terminal within the target time period, performs dynamic correlation calculation with its historical trajectory, identifies concentrated behaviors with non-inertial jump characteristics, and eliminates false clustered trajectories; a communication access fusion module, which extracts network access source information of the multiple terminals within the target time period, constructs a regional hotspot access map, and generates a spatial access overlap index based on the common matching degree of the network access structure and the cross-terminal redundancy ratio, the index representing the degree of risk of non-real physical aggregation caused by shared access resources; a behavior linkage analysis module, which acquires short-range communication interaction data of the terminals within the target time period, including Bluetooth broadcast scanning, near-field communication logs, and application-layer adjacency behavior, analyzes the time consistency of behavior triggering and the interaction frequency between devices, and calculates a behavioral interaction consistency index to evaluate the degree to which the aggregation behavior is supported by actual physical interaction; a cluster credibility scoring module, which receives the spatial access overlap index and the behavioral interaction consistency index, inputs them into a fusion discrimination model, and outputs a cluster credibility score, which serves as the basis for deciding whether to include the current cluster data in the infectious disease risk dynamic assessment model; and a risk intervention and correction module, which marks areas with cluster credibility scores below the credibility threshold as pseudo-cluster areas, removes the data of these areas from the input of the preset risk dynamic assessment model, records trajectory tracing information, inputs the infectious disease risk collection data remaining after removal of all pseudo-cluster areas into the preset risk dynamic assessment model, and outputs the infectious disease transmission risk level of each target area in geographic space.
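The filtering performed by the risk intervention and correction module can be sketched as below. This is an illustrative sketch only; the record layout, the field names, and the shape of the tracing log are assumptions.

```python
def filter_pseudo_clusters(area_records, thresholds):
    """Split areas into credible model inputs and a tracing log of removed ones.

    area_records: {area_id: {"score": float, "data": object}}
    thresholds:   {area_id: float}, one credibility threshold per area
    """
    credible, trace_log = {}, []
    for area_id, record in area_records.items():
        limit = thresholds[area_id]
        if record["score"] < limit:
            # Pseudo-cluster area: keep a trace entry, drop it from the input.
            trace_log.append({"area": area_id, "score": record["score"], "threshold": limit})
        else:
            credible[area_id] = record["data"]
    return credible, trace_log

records = {"mall": {"score": 0.82, "data": [1, 2]}, "plaza": {"score": 0.41, "data": [3]}}
credible, trace = filter_pseudo_clusters(records, {"mall": 0.75, "plaza": 0.60})
print(credible, trace)
```

Only `credible` would then be passed to the preset risk dynamic assessment model.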
2. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 1, characterized in that the method for extracting the movement trajectory feature vector of a terminal comprises: organizing the positioning points of the multiple terminals within the target time period in chronological order, calculating the moving speed, direction angle, dwell time, and position change of each terminal from adjacent positioning points, and arranging these parameters in time order to form a trajectory basic parameter sequence; performing time sliding window analysis on the trajectory basic parameter sequence, extracting the speed change trend, directional continuity, dwell phase ratio, and position fluctuation intensity within each window, and generating trajectory behavior segments that reflect the rhythm and stability of continuous motion; and connecting all trajectory behavior segments in chronological order and forming the movement trajectory feature vector through numerical normalization and time weight calibration.
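The first step of claim 2, deriving basic parameters from adjacent positioning points, could be sketched as follows. The sketch assumes latitude and longitude input, uses the standard haversine distance and initial bearing formulas, and flags a dwell when speed is below a hypothetical 0.5 m/s; names and the dwell cut-off are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees within [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def basic_parameters(points, dwell_speed_mps=0.5):
    """points: list of (t_seconds, lat, lon) sorted by time."""
    sequence = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        dist = haversine_m(la0, lo0, la1, lo1)
        speed = dist / dt if dt > 0 else 0.0
        sequence.append({
            "dt_s": dt,
            "distance_m": dist,
            "speed_mps": speed,
            "bearing_deg": bearing_deg(la0, lo0, la1, lo1),
            "is_dwell": speed < dwell_speed_mps,
        })
    return sequence

params = basic_parameters([(0, 37.0, 118.0), (60, 37.001, 118.0), (120, 37.001, 118.0)])
print(params[0]["speed_mps"], params[1]["is_dwell"])
```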
3. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 2, characterized in that the dynamic correlation calculation of the terminal trajectory comprises: extracting the movement trajectory feature vector of the terminal within the target time period and comparing it segment by segment with the movement trajectory feature vector of the same terminal formed within the historical normal behavior cycle, wherein trajectory consistency is judged jointly by the similarity of the direction sequence and the overlap of the dwell positions, and a segment that meets the deviation condition is marked as a trajectory deviation segment; for time periods marked as trajectory deviation segments, comparing the proportion of the stable speed interval, the continuity of the dwell period, and the variation range of the path smoothness of the trajectory segment with the set behavior reference threshold range, and marking the trajectory segment as a significant deviation segment when any two of these indicators exceed the reference threshold range consecutively; and removing trajectory segments marked as significant deviation segments directly from the movement trajectory sequence, the remaining trajectory segments being used as reliable trajectories and input into the subsequent aggregation recognition process.
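The two-of-three rule in claim 3 can be sketched as follows. The claim does not define "consecutively" precisely; this sketch assumes an indicator counts as exceeding its range when it is out of range in at least two consecutive sliding windows. Indicator names, ranges, and window values are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def is_significant_deviation(windows, ranges, min_run=2, min_indicators=2):
    """windows: list of {indicator: value}; ranges: {indicator: (low, high)}."""
    exceeded = 0
    for name, (low, high) in ranges.items():
        flags = [not (low <= window[name] <= high) for window in windows]
        if longest_run(flags) >= min_run:
            exceeded += 1
    return exceeded >= min_indicators

ranges = {
    "stable_speed_ratio": (0.72, 0.88),
    "dwell_continuity": (0.54, 0.66),
    "path_smoothness": (0.63, 0.77),
}
windows = [
    {"stable_speed_ratio": 0.50, "dwell_continuity": 0.60, "path_smoothness": 0.90},
    {"stable_speed_ratio": 0.45, "dwell_continuity": 0.62, "path_smoothness": 0.95},
    {"stable_speed_ratio": 0.80, "dwell_continuity": 0.61, "path_smoothness": 0.70},
]
print(is_significant_deviation(windows, ranges))
```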
4. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 3, characterized in that determining whether the deviation condition is met comprises: dividing the terminal's movement trajectory within the target time period into multiple continuous movement direction segments, each formed by two consecutive positioning points, extracting the displacement direction of each segment as an angle to form a complete direction sequence, and extracting the direction sequence of the terminal within the historical normal behavior cycle as a reference direction sequence; calculating the angle difference for each pair of corresponding direction segments in the target-period direction sequence and the reference direction sequence, and computing the mean and standard deviation of all angle differences to determine whether the overall direction change trend is consistent, the direction sequences being considered similar when the mean difference is lower than a first preset angle deviation threshold and the standard deviation is lower than a second preset stability threshold; when determining the overlap of dwell positions, comparing the geographic location of each dwell point in the current trajectory with that of each dwell point in the historical trajectory by distance, counting the dwell points that fall within the spatial overlap range, and taking their proportion of the current total number of dwell points as the overlap ratio, the dwell positions being considered spatially consistent when the overlap ratio is higher than a preset position overlap ratio threshold; and determining that the trajectory has behavioral consistency when both direction similarity and dwell position consistency are satisfied, and otherwise marking it as a trajectory deviation segment.
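The deviation condition of claim 4 can be illustrated with the sketch below. It assumes direction sequences of equal length in degrees, dwell points given as planar (x, y) coordinates in meters, and illustrative thresholds (30 degrees, 20 degrees, 100 m, 0.6); none of these values come from this application.

```python
import math
from statistics import mean, pstdev

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def directions_similar(current, reference, mean_thr=30.0, std_thr=20.0):
    """Similar when both the mean and the spread of angle differences are small."""
    diffs = [angle_diff(a, b) for a, b in zip(current, reference)]
    return mean(diffs) < mean_thr and pstdev(diffs) < std_thr

def dwell_overlap_ratio(current_stops, historical_stops, radius_m=100.0):
    """Share of current dwell points lying within radius_m of a historical one."""
    if not current_stops:
        return 0.0
    hits = sum(
        1 for c in current_stops
        if any(math.dist(c, h) <= radius_m for h in historical_stops)
    )
    return hits / len(current_stops)

def is_deviation_segment(cur_dirs, ref_dirs, cur_stops, hist_stops, overlap_thr=0.6):
    consistent = (
        directions_similar(cur_dirs, ref_dirs)
        and dwell_overlap_ratio(cur_stops, hist_stops) > overlap_thr
    )
    return not consistent

print(is_deviation_segment([10, 350, 90], [350, 10, 100],
                           [(0, 0), (500, 500)], [(30, 40), (2000, 2000)]))
```

Wrapping the angle difference at 360 degrees avoids treating headings such as 10 and 350 degrees as far apart.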
5. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 4, characterized in that the method for calculating the spatial access overlap index comprises: extracting network access source information of the multiple terminals within the target time period, including the number of the base station to which each terminal connects, the start and end times of each access period, and the number of accesses; designating each terminal as a first-class node and each access source as a second-class node, and constructing the regional hotspot access map from the connection relationships between terminals and access sources, wherein the weight of each connection edge is calculated as the ratio of the access duration to the access frequency and characterizes the access stability of the terminal with respect to that access source; calculating the degree of access path overlap between any two terminals by determining the set of access sources shared by the two terminals and calculating the average weight of all edges in that set, and determining that the terminal pair has a high access commonality relationship if the average weight exceeds the stability threshold set by the system and the proportion of shared access sources among the total access sources of the two terminals is higher than the access overlap ratio threshold; and counting the terminal pairs in the entire map that satisfy the high access commonality relationship and taking the ratio of this count to the total number of terminal pairs in the map as the spatial access overlap index.
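A minimal sketch of the calculation in claim 5 follows. It assumes the bipartite access map is stored as a nested dictionary, that the "total access sources of the two terminals" means the union of their sources, and that the average is taken over both terminals' edges to each shared source; the data layout and threshold values are hypothetical.

```python
from itertools import combinations

def edge_weights(access_log):
    """access_log: {terminal: {source: (total_duration_s, access_count)}}.
    Edge weight = access duration / access frequency, as stated in the claim."""
    return {
        terminal: {src: dur / cnt for src, (dur, cnt) in sources.items() if cnt > 0}
        for terminal, sources in access_log.items()
    }

def spatial_access_overlap_index(access_log, stability_thr, overlap_thr):
    weights = edge_weights(access_log)
    pairs = list(combinations(sorted(weights), 2))
    if not pairs:
        return 0.0
    high = 0
    for a, b in pairs:
        shared = set(weights[a]) & set(weights[b])
        if not shared:
            continue
        union = set(weights[a]) | set(weights[b])
        avg = sum(weights[a][s] + weights[b][s] for s in shared) / (2 * len(shared))
        if avg > stability_thr and len(shared) / len(union) > overlap_thr:
            high += 1
    return high / len(pairs)

log = {
    "t1": {"bs1": (600, 2), "bs2": (300, 3)},
    "t2": {"bs1": (900, 3), "bs2": (200, 2)},
    "t3": {"bs3": (1000, 1)},
}
print(spatial_access_overlap_index(log, stability_thr=120, overlap_thr=0.5))
```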
6. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 5, characterized in that the method for calculating the behavioral interaction consistency index comprises: collecting short-range communication interaction events of the multiple terminals within the target time period, including Bluetooth identification events, near-field communication events, and application adjacency events, representing each event as a triplet of event type, start time, and end time, and arranging the events in chronological order to construct an interaction behavior sequence for each terminal; matching the interaction behavior sequences of any two terminals to determine whether they contain events of the same type that partially overlap in time, and recording each such case as a synchronization event, wherein the time overlap of a synchronization event is defined as the ratio of the actual overlap time of the two events to the duration of the shorter event; presetting the event intensity level by event type, with near-field communication events assigned the highest level, Bluetooth events the medium level, and application adjacency events the basic level, the score of a synchronization event being the product of its time overlap and its event intensity level; summing the scores of all synchronization events of the terminal pair within the target time period to obtain the actual total synchronization score, and calculating, as the reference total score of the terminal pair, the maximum sum of scores obtainable over all communication events that occurred for the terminal pair if every event overlapped completely and reached the upper limit of the event intensity level; dividing the actual total synchronization score by the reference total score to obtain the behavioral interaction consistency coefficient of the terminal pair; and averaging the behavioral interaction consistency coefficients of all terminal pairs within the same target area to generate the behavioral interaction consistency index for that area.
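The scoring in claim 6 can be sketched as below. Several points are assumptions made for illustration: the numeric levels 3, 2, and 1; the 50% minimum overlap taken from the description; matching each event of the terminal with more events to its best-scoring same-type counterpart; and a reference total equal to the highest level times that terminal's event count, which keeps the coefficient at or below 1.

```python
from itertools import combinations

LEVELS = {"nfc": 3, "bluetooth": 2, "app_adjacency": 1}  # hypothetical values
MAX_LEVEL = max(LEVELS.values())

def overlap_ratio(e1, e2):
    """Overlap time divided by the duration of the shorter event."""
    _, s1, t1 = e1
    _, s2, t2 = e2
    overlap = min(t1, t2) - max(s1, s2)
    shorter = min(t1 - s1, t2 - s2)
    return overlap / shorter if overlap > 0 and shorter > 0 else 0.0

def pair_coefficient(events_a, events_b, min_overlap=0.5):
    """events_*: lists of (event_type, start_time, end_time) triplets."""
    if len(events_a) < len(events_b):
        events_a, events_b = events_b, events_a
    if not events_a:
        return 0.0
    actual = 0.0
    for ea in events_a:
        scores = []
        for eb in events_b:
            ratio = overlap_ratio(ea, eb) if eb[0] == ea[0] else 0.0
            if ratio >= min_overlap:
                scores.append(ratio * LEVELS[ea[0]])
        actual += max(scores, default=0.0)
    return actual / (MAX_LEVEL * len(events_a))

def area_index(terminal_events):
    """Mean pair coefficient over all terminal pairs in one target area."""
    pairs = list(combinations(sorted(terminal_events), 2))
    if not pairs:
        return 0.0
    total = sum(pair_coefficient(terminal_events[a], terminal_events[b]) for a, b in pairs)
    return total / len(pairs)

a = [("nfc", 0, 10), ("bluetooth", 20, 40)]
b = [("nfc", 5, 15), ("bluetooth", 20, 30)]
print(pair_coefficient(a, b))
```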
7. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 6, characterized in that the method for calculating the cluster credibility score comprises: obtaining the spatial access overlap index and the behavioral interaction consistency index of the target area, which respectively represent the degree of access resource sharing and the degree of actual physical interaction among terminals in the area, and using the two indicators as the joint input of a scoring model; wherein the scoring model is based on a convolutional neural network architecture that extracts features from the coupling relationship between spatial aggregation features and behavioral aggregation features and maps them to a score, the convolutional neural network fusing the two input indicators to generate the cluster credibility score, which represents the credibility of the aggregation behavior in the current area.
8. The infectious disease risk dynamic assessment system based on multi-source communication data fusion according to claim 7, characterized in that the method for labeling pseudo-cluster areas comprises: obtaining the cluster credibility score of the current target area and retrieving the historical score records of the area for past time periods in which the system judged it to be a real cluster; extracting, from the stable distribution range of these score records, a reference quantile interval representing the real clustering characteristics, the reference quantile interval being determined by a continuous percentage interval in the middle stable segment of the score records, and using the lower bound of the reference quantile interval as the dynamic credibility threshold of the area for the current time period; and comparing the current cluster credibility score with the dynamic credibility threshold and marking the area as a pseudo-cluster area if the current score is lower than the dynamic credibility threshold.