A trajectory outlier detection method based on group division
By using a group-based partitioning method to divide the trajectory dataset using grid density and trajectory group rate, outliers in taxi trajectories are identified, solving the problem of discrepancies between road network environment and trajectory type in existing technologies, and achieving more efficient and accurate trajectory anomaly detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ANHUI NORMAL UNIV
- Filing Date
- 2023-11-10
- Publication Date
- 2026-06-19
Figure CN117609914B_ABST
Description
Technical Field
[0001] This invention relates to the field of trajectory mining, and in particular to a method for detecting outliers in trajectories based on group partitioning.
Background Technology
[0002] The development of vehicle communication and networks has generated a vast amount of motion trajectory data. Analyzing this data can aid in urban planning, congestion detection, fraud detection, traffic flow prediction, and travel time estimation. Against this backdrop, trajectory data mining and its applications have become an important research topic in urban transportation systems.
[0003] Trajectory data analysis includes trajectory clustering, trajectory classification, trajectory pattern mining, and trajectory outlier detection. Existing technologies have extensively studied trajectory outliers. For example, patent application number 201711099171.4 discloses a trajectory outlier detection method based on common segmented subsequences. This method constructs a trajectory direction code sequence based on the trajectory's directional features, obtaining the trajectory's segmented and piecewise sequences; calculates the CSS distance between trajectories; and detects outlier trajectory pieces and outliers based on preset outlier trajectory piece and outlier point measurement methods. The advantages of this invention are: it designs a trajectory direction code sequence, a trajectory piecewise feature sequence, and the distance between common segmented subsequences of trajectories, thus achieving outlier detection in trajectory pieces and outlier trajectories.
[0004] Trajectory outliers can be used in fields such as vehicle trajectory detection (e.g., taxi trajectory detection) to aid in urban planning, congestion detection, fraud detection, road traffic prediction, and travel time estimation. Trajectories that deviate from the norm spatially or geographically are considered outliers (i.e., outlier trajectories). Many scholars have researched trajectory outlier detection and achieved corresponding results, including clustering-based, distance-based, and grid-based methods. However, most existing methods have two limitations. First, they ignore the influence of the road network environment on trajectory outlier detection, while detour behavior often occurs in densely networked areas with more alternative routes. Second, they treat the trajectory dataset as a whole, ignoring deviations between different types of normal trajectories. To address these issues, this study proposes a group-partition-based trajectory outlier detection method for detecting outlier trajectories from trajectory datasets with the same origin (S) and destination (D).
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of the prior art and provide a trajectory outlier detection method based on group division. This method aims to solve the problem that existing taxi outlier detection methods cannot effectively identify outlier trajectories and to detect outlier trajectories from trajectory datasets with the same origin (S) and destination (D).
[0006] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0007] A trajectory outlier detection method based on group partitioning includes the following steps:
[0008] S1. Divide the urban vector area into multiple grids, assign different codes to each grid, and calculate the density of each grid based on the urban road network data;
[0009] S2. Based on the comparison between grid density and grid density threshold, the grid is divided into high-density and low-density grids. The trajectory dataset with fixed origin S and destination D is encoded to obtain the trajectory encoding sequence.
[0010] S3. Based on the number of low-density grids traversed by the trajectory, the trajectory dataset is divided into trajectory groups, and the trajectory group rate is calculated. Trajectory groups with a trajectory group rate less than the group rate threshold are outlier trajectory groups. Otherwise, the compatible trajectories within the group are calculated, and the trajectory group is divided into one or more trajectory clusters.
[0011] S4. Calculate the trajectory cluster rate and divide the trajectory group into a normal sub-trajectory set and a pseudo-anomaly trajectory set;
[0012] S5. For a single trajectory group, calculate the trajectory score within the pseudo-abnormal trajectory set based on the normal sub-trajectory, divide the pseudo-abnormal trajectory set into normal trajectories and outlier trajectories, and combine the outlier trajectories within all trajectory groups to form an outlier trajectory set.
[0013] The method for calculating the grid density in step S1 includes:
[0014] The ratio of the total length of roads within a grid to the grid area is used as the grid density.
[0015] In step S1, the urban vector region is divided into grids of fixed size.
[0016] The grid density threshold in step S2 is obtained using the natural breaks (natural discontinuity) classification method.
[0017] The method for calculating the trajectory group rate in step S3 is as follows:
[0018] The trajectory group rate is calculated using the following formula:
[0019] $$\text{trajectory group rate}(TG_i) = \frac{num(TG_i)}{num(TD)}$$
[0020] where $num(TG_i)$ is the number of trajectories within the trajectory group $TG_i$, and $num(TD)$ is the number of trajectories within the trajectory dataset TD.
[0021] In step S3, if the trajectory group rate is greater than or equal to the group rate threshold ζ, the compatible trajectories within the group are calculated and the trajectory group is divided into trajectory clusters. Compatibility is evaluated by a formula that computes a value ρ from two sequences:
[0022] the high-density grid code sequence traversed by trajectory $T_i$ and the high-density grid code sequence traversed by trajectory $T_j$. If ρ = 0, trajectories $T_i$ and $T_j$ are compatible trajectories and belong to the same trajectory cluster.
[0023] In step S4, the trajectory clustering rate determines whether a trajectory within a cluster is a normal sub-trajectory, and its calculation method is as follows:
[0024] Let TD be the trajectory dataset and $TZ_i$ be the i-th trajectory cluster. The trajectory cluster rate of $TZ_i$ is the ratio of the number of trajectories in $TZ_i$ to the number of trajectories in TD, calculated using the following formula:
[0025] $$\text{trajectory cluster rate}(TZ_i) = \frac{num(TZ_i)}{num(TD)}$$
[0026] where $num(TZ_i)$ is the number of trajectories within $TZ_i$, and $num(TD)$ is the number of trajectories within TD. Based on the trajectory cluster rate threshold ζ, the trajectory group is divided into normal sub-trajectories and pseudo-anomaly trajectories.
[0027] In step S5, the trajectory score is calculated for the trajectories within the pseudo-anomaly trajectory set, and outlier trajectories within the group are detected based on the trajectory score. The trajectory score calculation formula is as follows:
[0028]
[0029] where $TS_i$ is the encoded sequence of trajectory $T_i$ in the normal sub-trajectory set, and $TS_j$ is the encoded sequence of trajectory $T_j$ in the pseudo-anomaly trajectory set. The calculation result satisfies ...; the larger the value, the more likely trajectory $T_j$ is to be a normal trajectory.
[0030] A trajectory whose trajectory score is less than or equal to the score threshold is identified as an outlier trajectory.
[0031] After step S5, the pseudo-abnormal trajectory set is divided into a quasi-normal trajectory set and an abnormal trajectory set. The set formed by the quasi-normal trajectory set and the normal sub-trajectory set is the normal trajectory in the trajectory group, and the remaining trajectories in the group are outlier trajectories.
[0032] The advantages of this invention are as follows: First, it proposes a grid type definition method based on grid density calculation, dividing urban areas into high-density and low-density grids, as abnormal taxi behavior typically occurs in densely networked areas. Second, it proposes a trajectory outlier detection method based on group partitioning, obtaining a normal subset of trajectories within each trajectory group based on a high-density grid sequence. This method prevents the negative impact of excessive spatiotemporal deviations in normal trajectories on the outlier detection results, improving detection efficiency and accuracy. Experimental results show that the proposed method performs better in trajectory outlier detection, achieving better results and providing greater practical guidance, such as for identifying suspicious vehicle activities.
Attached Figure Description
[0033] The following is a brief explanation of the contents of each of the accompanying drawings and the markings in the drawings:
[0034] Figure 1 is a framework diagram for trajectory outlier detection provided in an embodiment of the present invention;
[0035] Figure 2 is a schematic diagram of urban gridding provided in an embodiment of the present invention;
[0036] Figure 3 (a) is a schematic diagram of a high-density grid provided in an embodiment of the present invention;
[0037] Figure 3 (b) is a schematic diagram of a low-density grid provided in an embodiment of the present invention;
[0038] Figure 4 is a schematic diagram of some trajectory points provided in an embodiment of the present invention;
[0039] Figure 5 is a schematic diagram of the F-measure, Accuracy, Precision, and Recall on the T-1 dataset provided in an embodiment of the present invention under different values of the threshold ζ and the score threshold;
[0040] Figure 6 (a) is a schematic diagram of the F-measure, Accuracy, Precision, and Recall on the T-2 dataset provided in an embodiment of the present invention at the threshold ζ2 and the selected score threshold;
[0041] Figure 6 (b) is a schematic diagram of the F-measure, Accuracy, Precision, and Recall on the T-3 dataset provided in an embodiment of the present invention at the threshold ζ2 and the selected score threshold;
[0042] Figure 7 is a schematic diagram showing the F-measure comparison results of the TODG algorithm, Two Phase algorithm, ATDC algorithm and iBAT algorithm provided in the embodiments of the present invention;
[0043] Figure 8 is a schematic diagram showing the Accuracy comparison results of the TODG algorithm, Two Phase algorithm, ATDC algorithm and iBAT algorithm provided in the embodiments of the present invention;
[0044] Figure 9 is a schematic diagram showing the Precision comparison results of the TODG algorithm, Two Phase algorithm, ATDC algorithm and iBAT algorithm provided in the embodiments of the present invention;
[0045] Figure 10 is a schematic diagram showing the Recall comparison results of the TODG algorithm, Two Phase algorithm, ATDC algorithm and iBAT algorithm provided in the embodiments of the present invention;
[0046] Figure 11 (a) is a schematic diagram of the detection results of the TODG algorithm provided in this embodiment of the invention on the T-1 dataset;
[0047] Figure 11 (b) is a schematic diagram of the detection results of the TODG algorithm provided in this embodiment of the invention on the T-2 dataset;
[0048] Figure 11 (c) is a schematic diagram of the detection results of the TODG algorithm provided in this embodiment of the invention on the T-3 dataset.
Detailed Implementation
[0049] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings and the description of the preferred embodiments.
[0050] This invention relates to the field of data mining technology and provides a method for detecting outliers in trajectories based on group partitioning. The method includes: dividing an urban area into grids of fixed size, assigning a different code to each grid; calculating grid density based on road network data and obtaining a grid density threshold using a natural discontinuity grading method; classifying the grids into high-density and low-density grids based on the density threshold; converting the trajectories into coded sequences; dividing the trajectory dataset into several groups based on the number of low-density grids traversed by the trajectories; for a single trajectory group, calculating compatible trajectories within the group based on the high-density grid sequence of the trajectory to obtain a normal subset of trajectories; detecting outliers within the group based on the normal subset of trajectories; and obtaining anomalous trajectories within all groups to form the final set of outlier trajectories. This invention considers the impact of the road network environment on outlier trajectory detection. Based on grid density, trajectory sequences, and similarity characteristics between trajectories, it detects outliers in the trajectory set, effectively improving detection efficiency and accuracy. It has better practical guiding significance, such as identifying suspicious vehicle activities, public transportation planning, and detecting taxi fraud, and can be applied to the detection of outliers in various trajectories.
[0051] A trajectory outlier detection method based on group partitioning, the method includes the following steps:
[0052] S1. Divide the urban vector area into grids of fixed size, assign a different code to each grid, and calculate the density of each grid based on urban road network data. Assigning a code means numbering the grids so that each grid corresponds to one grid number. When a trajectory passes through multiple grids in sequence, the numbers of the grids passed through, taken in order, form the trajectory code sequence.
[0053] S2. Based on the grid densities, obtain a grid density threshold using the natural discontinuity grading method, and classify the grids into high-density and low-density grids according to that threshold. Encode the trajectories with fixed origin S and destination D to obtain the trajectory encoding sequences. The high-density grids traversed by a trajectory form its high-density grid encoding sequence, which consists of the codes of those high-density grids in the order in which they were traversed.
[0054] S3. Based on the number of low-density grids traversed by the trajectory, the trajectory dataset is divided into trajectory groups, and the trajectory group rate is calculated. Trajectory groups with a trajectory group rate less than the group rate threshold are outlier trajectory groups. Otherwise, the compatible trajectories within the group are calculated, and the trajectory group is divided into one or more trajectory clusters.
[0055] S4. Calculate the trajectory cluster rate and divide the trajectory group into a normal sub-trajectory set and a pseudo-abnormal trajectory set. Each trajectory group has multiple trajectory clusters, and each trajectory cluster contains multiple trajectories. Based on the trajectory cluster rate, the trajectories in each trajectory cluster are divided into a normal set or an abnormal set. After the multiple trajectory clusters in a trajectory group are divided, the trajectories in the trajectory group are divided into a normal sub-trajectory set and a pseudo-abnormal trajectory set.
[0056] S5. For each trajectory group, calculate the trajectory score within the pseudo-abnormal trajectory set based on the normal sub-trajectory within each trajectory group. Divide the pseudo-abnormal trajectory set into normal trajectories and outlier trajectories. Perform trajectory scoring within each trajectory group to obtain the normal trajectories and outlier trajectories within each trajectory group. Combine the outlier trajectories within all trajectory groups to form the outlier trajectory set.
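The grid coding and trajectory encoding described in steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the row-major numbering, the planar coordinates, the collapsing of consecutive repeated codes, and counting distinct low-density grids are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]  # (x, y) in the same planar units as the grid


def grid_code(point: Point, origin: Point, cell_size: float, n_cols: int) -> int:
    """Return the row-major number of the fixed-size grid cell containing `point`.

    Assumption: cells are numbered row by row starting at `origin`.
    """
    col = int((point[0] - origin[0]) // cell_size)
    row = int((point[1] - origin[1]) // cell_size)
    return row * n_cols + col


def encode_trajectory(points: Sequence[Point], origin: Point,
                      cell_size: float, n_cols: int) -> List[int]:
    """Map trajectory points to grid codes in traversal order.

    Assumption: consecutive points in the same cell yield one code.
    """
    seq: List[int] = []
    for p in points:
        code = grid_code(p, origin, cell_size, n_cols)
        if not seq or seq[-1] != code:
            seq.append(code)
    return seq


def high_density_sequence(seq: Sequence[int], high_codes: Set[int]) -> List[int]:
    """Keep only the codes of high-density grids, preserving traversal order."""
    return [c for c in seq if c in high_codes]


def count_low_density(seq: Sequence[int], high_codes: Set[int]) -> int:
    """Count the distinct low-density grids traversed (assumed counting rule)."""
    return len({c for c in seq if c not in high_codes})


points = [(0.5, 0.5), (0.7, 0.6), (1.5, 0.5), (1.5, 1.5)]
seq = encode_trajectory(points, origin=(0.0, 0.0), cell_size=1.0, n_cols=3)
```

With sparsely sampled GPS points a trajectory may skip cells between samples; this sketch does not interpolate between points.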
[0057] Furthermore, the grid density in step S1 is obtained as follows:
[0058] Step 1: Divide the city vector area into a grid of fixed size;
[0059] Step 2: Calculate the density of each grid based on the road network data, and use the ratio of the total length of the roads within the grid to the grid area as the grid density.
[0060] The calculation is shown in formula (1):
[0061] $$\text{density}(G) = \frac{L_G}{S_G} = \frac{\sum_{i=1}^{k} L_i}{S_G} \tag{1}$$
[0062] where $L_G$ represents the total length of roads within the grid G, $S_G$ represents the area of the grid, $i = 1 \dots k$ indexes the roads within the grid, $L_i$ represents the length of the i-th road within the grid, and $k$ represents the number of roads within the grid.
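As a small worked illustration of formula (1), the following Python sketch divides the summed road length inside one grid by the grid area. The units (metres and square metres) and the function name are assumptions chosen for the example.

```python
from typing import Sequence


def grid_density(road_lengths_in_grid: Sequence[float], grid_area: float) -> float:
    """Total road length inside a grid divided by the grid area."""
    if grid_area <= 0:
        raise ValueError("grid_area must be positive")
    return sum(road_lengths_in_grid) / grid_area


# Three road pieces of 120 m, 80 m and 50 m inside a 500 m x 500 m grid.
density = grid_density([120.0, 80.0, 50.0], 500.0 * 500.0)
```

Here the total road length is 250 m and the area is 250,000 m², so the density is 0.001 m of road per m².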
[0063] Furthermore, the grid density threshold in step S2 is obtained using the natural breaks (natural discontinuity) grading method. Like clustering, this method maximizes the similarity within each class and the dissimilarity between classes. Ordinary clustering, however, does not consider the number and range of elements in each cluster, whereas the natural discontinuity grading method also keeps the density range and the number of elements of the different classes as similar as possible. The method is suited to classification: it generates the threshold automatically and classifies the grids according to that threshold. As it is existing technology, it is not described further here.
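One possible way to obtain a two-class threshold is sketched below. It assumes that the natural discontinuity grading method corresponds to the natural breaks (Jenks) criterion, which picks class boundaries that minimize the within-class sum of squared deviations. For two classes this reduces to trying every split of the sorted densities. The function names and the rule that a grid with density greater than or equal to the threshold is high-density are assumptions; the text only states that the density is compared with the threshold.

```python
from typing import Dict, Sequence, Set


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_class_natural_break(densities: Sequence[float]) -> float:
    """Return the smallest density of the upper class for the best 2-class split."""
    xs = sorted(densities)
    if len(xs) < 2:
        raise ValueError("need at least two density values")
    best_split, best_cost = 1, float("inf")
    for split in range(1, len(xs)):
        cost = _sse(xs[:split]) + _sse(xs[split:])
        if cost < best_cost:
            best_cost, best_split = cost, split
    return xs[best_split]


def high_density_codes(density_by_code: Dict[int, float]) -> Set[int]:
    """Codes of grids whose density is at or above the threshold (assumed rule)."""
    threshold = two_class_natural_break(list(density_by_code.values()))
    return {code for code, d in density_by_code.items() if d >= threshold}


densities = {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.9, 4: 1.0, 5: 0.95}
threshold = two_class_natural_break(list(densities.values()))
```

The exhaustive search is quadratic in the number of grids, which is adequate for a sketch; a dedicated natural breaks implementation would be preferable for a city-scale grid.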
[0064] Furthermore, the trajectory group rate in step S3 determines whether the trajectory group is further divided into trajectory clusters. The trajectory group rate is calculated as shown in formula (2):
[0065] $$\text{trajectory group rate}(TG_i) = \frac{num(TG_i)}{num(TD)} \tag{2}$$
[0066] where $TG_i$ represents the i-th trajectory group, $num(TG_i)$ is the number of trajectories within $TG_i$, and $num(TD)$ is the number of trajectories within the trajectory dataset TD. If the trajectory group rate is less than the group rate threshold ζ, all trajectories in the trajectory group are outliers. If the trajectory group rate is greater than or equal to ζ, the compatible trajectories within the group are calculated, and the trajectory group is divided into trajectory clusters. The formula for calculating compatible trajectories is shown in (3):
[0067]
[0068] where the two sequences in the formula are the high-density grid code sequence traversed by trajectory $T_i$ and the high-density grid code sequence traversed by trajectory $T_j$. If ρ = 0, trajectories $T_i$ and $T_j$ are compatible trajectories and are placed in the same trajectory cluster. Each trajectory cluster is then further divided into normal sub-trajectories and pseudo-anomaly trajectories based on the trajectory cluster rate.
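The grouping logic of step S3 can be sketched in Python as follows. The group rate follows the ratio described above. Formula (3) is not reproduced in this text, so the clustering step rests on a loudly labeled assumption: ρ = 0 is taken to mean that two trajectories have identical high-density grid code sequences. All function names are hypothetical, and the threshold in the toy example is far larger than the ζ values used in the experiments because the toy dataset has only four trajectories.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def group_by_low_density_count(encoded: Dict[str, Sequence[int]],
                               high_codes: Set[int]) -> Dict[int, List[str]]:
    """Group trajectory ids by the number of distinct low-density grids traversed."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for tid, seq in encoded.items():
        n_low = len({c for c in seq if c not in high_codes})
        groups[n_low].append(tid)
    return dict(groups)


def split_groups(groups: Dict[int, List[str]], n_total: int,
                 zeta: float) -> Tuple[Dict[int, List[str]], List[str]]:
    """Groups with rate < zeta are outlier groups; the rest are kept."""
    kept: Dict[int, List[str]] = {}
    outliers: List[str] = []
    for key, ids in groups.items():
        if len(ids) / n_total < zeta:  # group rate = num(TG_i) / num(TD)
            outliers.extend(ids)
        else:
            kept[key] = ids
    return kept, outliers


def clusters_in_group(ids: Sequence[str], encoded: Dict[str, Sequence[int]],
                      high_codes: Set[int]) -> List[List[str]]:
    """ASSUMPTION: compatible means identical high-density code sequences."""
    clusters: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for tid in ids:
        key = tuple(c for c in encoded[tid] if c in high_codes)
        clusters[key].append(tid)
    return list(clusters.values())


encoded = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [1, 9, 3], "d": [1, 8, 9, 7, 3]}
high = {1, 2, 3}
groups = group_by_low_density_count(encoded, high)
kept, outlier_ids = split_groups(groups, n_total=len(encoded), zeta=0.3)
```

In this toy run, trajectories "a" and "b" traverse no low-density grid and form a group with rate 0.5, while "c" and "d" each sit alone in a group with rate 0.25 and are flagged as outlier groups.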
[0069] Furthermore, the trajectory cluster rate in step S4 determines whether the trajectories within a cluster are normal sub-trajectories. Assume TD is the trajectory dataset and $TZ_i$ is the i-th trajectory cluster. The trajectory cluster rate of $TZ_i$ is the ratio of the number of trajectories in $TZ_i$ to the number of trajectories in TD, calculated as shown in formula (4):
[0070] $$\text{trajectory cluster rate}(TZ_i) = \frac{num(TZ_i)}{num(TD)} \tag{4}$$
[0071] where $num(TZ_i)$ is the number of trajectories within $TZ_i$, and $num(TD)$ is the number of trajectories within TD. Based on the trajectory cluster rate threshold ζ, the trajectories within each trajectory group are divided into a normal subset and a pseudo-anomaly subset. When the trajectory cluster rate of $TZ_i$ is greater than the trajectory cluster rate threshold ζ, the trajectories of that cluster are classified as normal sub-trajectories; otherwise, they are classified as pseudo-anomaly trajectories. All normal sub-trajectories within a trajectory group form the normal sub-trajectory set, and all pseudo-anomaly trajectories within a trajectory group form the pseudo-anomaly trajectory set.
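The split described for step S4 can be sketched as below: each cluster's rate is its size divided by the size of the whole dataset, and clusters whose rate exceeds ζ contribute normal sub-trajectories. The function name and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_clusters(clusters: Sequence[Sequence[str]], n_total: int,
                   zeta: float) -> Tuple[List[str], List[str]]:
    """Return (normal sub-trajectory ids, pseudo-anomaly trajectory ids)."""
    normal: List[str] = []
    pseudo: List[str] = []
    for cluster in clusters:
        rate = len(cluster) / n_total  # num(TZ_i) / num(TD)
        if rate > zeta:
            normal.extend(cluster)
        else:
            pseudo.extend(cluster)
    return normal, pseudo


# One group with two clusters, inside a dataset of 10 trajectories.
normal_ids, pseudo_ids = split_clusters([["a", "b", "c"], ["d"]], n_total=10, zeta=0.2)
```

The first cluster has rate 0.3, which is above 0.2, so its trajectories are normal sub-trajectories; the second has rate 0.1 and is passed on to the scoring step as pseudo-anomaly trajectories.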
[0072] Furthermore, in step S5, the trajectory score is calculated for the trajectories within the pseudo-anomaly trajectory set, and outlier trajectories within the group are detected based on the trajectory score. The trajectory score calculation is shown in formula (5):
[0073]
[0074] where $TS_i$ is the encoded sequence of trajectory $T_i$ in the normal sub-trajectory set, and $TS_j$ is the encoded sequence of trajectory $T_j$ in the pseudo-anomaly trajectory set. The encoded sequence includes both high-density grid encoded sequences and low-density grid encoded sequences. The calculation result satisfies ...; the larger the value, the more likely trajectory $T_j$ is to be a normal trajectory.
[0075] Furthermore, the outlier condition in step S5 is that the trajectory score is less than or equal to the score threshold. When the trajectory score is greater than the score threshold, the trajectory is considered a normal trajectory; otherwise, it is considered an outlier trajectory. The score threshold is obtained experimentally.
[0076] After step S5, the pseudo-abnormal trajectory set is divided into a quasi-normal trajectory set and an abnormal trajectory set. The set formed by the quasi-normal trajectory set and the normal sub-trajectory set is the normal trajectory in the trajectory group, and the remaining trajectories in the group are outlier trajectories.
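The thresholding rule of step S5 can be sketched in Python with the score function left pluggable, because formula (5) is not reproduced in this text. The Jaccard similarity used as the default below is purely a hypothetical stand-in, not the patented score, and taking the maximum over all normal sub-trajectories is likewise an assumption. Only the comparison against the threshold follows the description above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ScoreFn = Callable[[Sequence[int], Sequence[int]], float]


def jaccard_similarity(ts_i: Sequence[int], ts_j: Sequence[int]) -> float:
    """HYPOTHETICAL score: overlap of the grid codes of two encoded sequences."""
    a, b = set(ts_i), set(ts_j)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pseudo_anomalies(normal_seqs: Sequence[Sequence[int]],
                              pseudo_seqs: Dict[str, Sequence[int]],
                              threshold: float,
                              score_fn: ScoreFn = jaccard_similarity
                              ) -> Tuple[List[str], List[str]]:
    """Return (quasi-normal ids, outlier ids) for one trajectory group."""
    quasi_normal: List[str] = []
    outliers: List[str] = []
    for tid, ts_j in pseudo_seqs.items():
        # ASSUMPTION: a trajectory's score is its best match among normal ones.
        score = max((score_fn(ts_i, ts_j) for ts_i in normal_seqs), default=0.0)
        if score > threshold:
            quasi_normal.append(tid)   # score above threshold: normal
        else:
            outliers.append(tid)       # score <= threshold: outlier
    return quasi_normal, outliers


normal = [[1, 2, 3, 4]]
pseudo = {"p": [1, 2, 3, 5], "q": [1, 7, 8, 9]}
quasi_ids, outlier_ids = classify_pseudo_anomalies(normal, pseudo, threshold=0.5)
```

Any similarity that grows as a pseudo-anomaly trajectory resembles the normal sub-trajectories more closely could be passed as `score_fn` without changing the thresholding logic.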
[0077] Figure 1 shows a flowchart of a trajectory outlier detection method based on group partitioning provided by an embodiment of the present invention. The method includes the following steps:
[0078] S1. Divide the urban vector area into a grid of fixed size. Assign a different code to each grid. Calculate the density of each grid based on the urban road network data;
[0079] The city vector region is divided into a grid of fixed size, as shown in Figure 2, and the density of each grid is calculated according to formula (1).
[0080] S2. Based on the grid densities, the grid density threshold is obtained using the natural discontinuity classification method, and the grids are classified into high-density and low-density grids according to that threshold. The trajectories with fixed origin S and destination D are encoded to obtain the trajectory encoding sequences.
[0081] Figure 3 (a) is a schematic diagram of a high-density grid. Figure 3 (b) is a schematic diagram of a low-density grid.
[0082] S3. Based on the number of low-density grids traversed by the trajectory, the trajectory dataset is divided into trajectory groups, and the trajectory group rate is calculated. Trajectory groups with a trajectory group rate less than the group rate threshold are outlier trajectory groups. Otherwise, the compatible trajectories within the group are calculated, and the trajectory group is divided into one or more trajectory clusters.
[0083] The trajectory group rate determines whether the trajectory group is further divided into trajectory clusters, and the calculation is shown in formula (2):
[0084] $$\text{trajectory group rate}(TG_i) = \frac{num(TG_i)}{num(TD)} \tag{2}$$
[0085] where $num(TG_i)$ is the number of trajectories within the trajectory group $TG_i$, and $num(TD)$ is the number of trajectories within the trajectory dataset TD. If the trajectory group rate is less than the group rate threshold ζ, all trajectories in the trajectory group are outliers. If the trajectory group rate is greater than or equal to ζ, the compatible trajectories within the group are calculated, and the trajectory group is divided into trajectory clusters. The formula for calculating compatible trajectories is shown in (3):
[0086]
[0087] where the two sequences in the formula are the high-density grid code sequence traversed by trajectory $T_i$ and the high-density grid code sequence traversed by trajectory $T_j$. If ρ = 0, trajectories $T_i$ and $T_j$ are compatible trajectories and belong to the same trajectory cluster.
[0088] S4. Calculate the trajectory cluster rate and divide the trajectory group into a normal sub-trajectory set and a pseudo-anomaly trajectory set;
[0089] The trajectory cluster rate determines whether the trajectories within a cluster are normal sub-trajectories. Assume TD is the trajectory dataset and $TZ_i$ is the i-th trajectory cluster. The trajectory cluster rate of $TZ_i$ is the ratio of the number of trajectories in $TZ_i$ to the number of trajectories in TD, calculated as shown in formula (4):
[0090] $$\text{trajectory cluster rate}(TZ_i) = \frac{num(TZ_i)}{num(TD)} \tag{4}$$
[0091] where $num(TZ_i)$ is the number of trajectories within $TZ_i$, and $num(TD)$ is the number of trajectories within TD. Based on the trajectory cluster rate threshold ζ, the trajectory group is divided into normal sub-trajectories and pseudo-anomaly trajectories.
[0092] S5. For a single trajectory group, calculate the trajectory score within the pseudo-abnormal trajectory set based on the normal sub-trajectory, divide the pseudo-abnormal trajectory set into normal trajectories and outlier trajectories, and combine the outlier trajectories within all trajectory groups to form an outlier trajectory set.
[0093] In step S5, the trajectory score is calculated for the trajectories within the pseudo-anomaly trajectory set, and outlier trajectories within the group are detected based on the trajectory score, as shown in formula (5):
[0094]
[0095] where $TS_i$ is the encoded sequence of trajectory $T_i$ in the normal sub-trajectory set, and $TS_j$ is the encoded sequence of trajectory $T_j$ in the pseudo-anomaly trajectory set. The calculation result satisfies ...; the larger the value, the more likely trajectory $T_j$ is to be a normal trajectory.
[0096] The outlier condition in step S5 is that the trajectory score is less than or equal to the score threshold.
[0097] The trajectory outlier detection method provided by this invention has the following functions: First, it proposes a grid type definition method based on grid density calculation, dividing urban areas into high-density and low-density grids, as abnormal taxi behavior typically occurs in densely networked areas. Second, it proposes a trajectory outlier detection method based on group partitioning, obtaining a normal subset of trajectories within each trajectory group based on a high-density grid sequence. This method prevents the negative impact of excessive spatiotemporal deviations in normal trajectories on the outlier detection results, improving detection efficiency and accuracy. Experimental results show that the proposed method performs better in trajectory outlier detection, achieving better results and providing greater practical guidance, such as for identifying suspicious vehicle activities.
[0098] Most existing methods have two limitations. First, they ignore the influence of the road network environment on trajectory outlier detection, while detour behavior often occurs in areas with dense road networks where there are more alternative routes. Second, they treat the trajectory dataset as a whole, ignoring the deviations between different types of normal trajectories. To address these issues, this study proposes a group-based outlier detection method for detecting outliers from trajectory datasets with the same origin (S) and destination (D).
[0099] To illustrate its effectiveness, specific embodiments of the present invention are provided in which the proposed method is evaluated on three datasets. The data sources are a real taxi movement trajectory dataset and a road network dataset from San Francisco. The trajectory dataset contains the trajectories of 536 taxis in the San Francisco metropolitan area over 30 days, with an average sampling interval of 100 seconds. The GPS trajectory data record the location (latitude and longitude) of each taxi, along with the corresponding time and occupancy status. Some of the trajectory points and a schematic diagram of the San Francisco Bay Area are shown in Figure 4.
[0100] T-1 was used as the test set to obtain suitable values of the threshold ζ and the score threshold. The F-measure, Accuracy, Precision, and Recall values under different values of ζ (ζ1 = 0.006, ζ2 = 0.012, ζ3 = 0.018, ζ4 = 0.024) and different score thresholds are shown in Figure 5.
[0101] Figure 6 (a) shows the F-measure, Accuracy, Precision, and Recall on the T-2 dataset at threshold ζ2 and the selected score threshold. Figure 6 (b) shows the F-measure, Accuracy, Precision, and Recall on the T-3 dataset at the same thresholds. With these parameters, both datasets achieved good results in F-measure, Accuracy, Precision, and Recall, indicating that the vast majority of outlier trajectories can be detected at these thresholds. These settings were therefore chosen for the comparison with other methods.
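For reference, the four evaluation measures can be computed from a set of detected outliers and a set of labeled outliers as in the Python sketch below. The text does not define the measures, so the standard definitions are assumed, with F-measure taken as the harmonic mean of Precision and Recall (F1). The function name and the toy ids are hypothetical, and both input sets are assumed to be subsets of `all_ids`.

```python
from typing import Dict, Set


def detection_metrics(detected: Set[str], true_outliers: Set[str],
                      all_ids: Set[str]) -> Dict[str, float]:
    """Precision, Recall, F-measure (assumed F1) and Accuracy for outlier detection."""
    tp = len(detected & true_outliers)   # outliers correctly flagged
    fp = len(detected - true_outliers)   # normal trajectories flagged
    fn = len(true_outliers - detected)   # outliers missed
    tn = len(all_ids) - tp - fp - fn     # normal trajectories left alone
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(all_ids) if all_ids else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


all_ids = {f"t{i}" for i in range(10)}
metrics = detection_metrics(detected={"t0", "t1", "t5"},
                            true_outliers={"t0", "t1", "t2"},
                            all_ids=all_ids)
```

In this toy case two of the three flagged trajectories are true outliers and one true outlier is missed, so Precision and Recall are both 2/3 and Accuracy is 0.8.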
[0102] Figure 7 is a schematic diagram comparing the F-measure results of the TODG algorithm described in this invention with those of the Two Phase algorithm, ATDC algorithm, and iBAT algorithm. The comparison shows that the method described in this invention significantly outperforms the compared algorithms, yielding more accurate outlier trajectories and being more beneficial for applications such as traffic fraud detection.
[0103] Figure 8 is a schematic diagram comparing the Accuracy of the TODG algorithm described in this invention with that of the Two Phase algorithm, ATDC algorithm, and iBAT algorithm. The comparison shows that the method described in this invention significantly outperforms the compared algorithms, yielding more accurate outlier trajectories and being more beneficial for applications such as traffic fraud detection.
[0104] Figure 9 is a schematic diagram comparing the Precision of the TODG algorithm described in this invention with that of the Two Phase algorithm, ATDC algorithm, and iBAT algorithm. The comparison shows that although the Precision of the proposed method on the T-3 dataset is lower than that of the Two Phase method, its results on the T-1 and T-2 datasets are significantly better than those of the compared algorithms, yielding more accurate outlier trajectories and thus being more beneficial for applications such as traffic fraud detection.
[0105] Figure 10 is a schematic diagram comparing the Recall of the TODG algorithm described in this invention with that of the Two Phase algorithm, ATDC algorithm, and iBAT algorithm. The comparison shows that the method described in this invention significantly outperforms the compared algorithms, yielding more accurate outlier trajectories and being more beneficial for applications such as traffic fraud detection.
[0106] like Figure 7-10 As shown, although Two Phase has higher precision on the T-3 dataset, its F-measure is smaller due to its lower recall compared to TODG. iBAT's outlier detection results are worse than the other three methods because the iBAT algorithm detects outliers based on selected subsamples, and the detection results are affected by the selected subsamples. The Two Phase algorithm calculates trajectory point density and identifies individual outliers based on a trajectory point density threshold; however, the detection results are affected by the density values of individual trajectory points. The ATDC algorithm converts the trajectory's driving distance into multiple grids, and the detection results of peripheral trajectories are affected by the number of grids. Furthermore, Two Phase, ATDC, and iBAT ignore the objective factors that normal trajectories may contain multiple spatial types and that there may be deviations between normal trajectories. Therefore, the F-measure, accuracy, precision, and recall results of the other methods are all inferior to TODG. Therefore, our proposed method performs better than existing methods in vehicle suspicious activity identification, public transportation planning, and taxi fraud detection.
[0107] Figure 11 presents the outlier trajectories detected by the TODG algorithm described in this invention on the three datasets. The experimental results show that the proposed TODG algorithm performs well in trajectory outlier detection.
[0108] Obviously, the specific implementation of this invention is not limited to the above-described methods. Any non-substantial improvements made using the inventive concept and technical solution of this invention are within the protection scope of this invention.
Claims
1. A trajectory outlier detection method based on group division, characterized in that it includes the following steps:
S1. Divide the urban vector area into multiple grids, assign a different code to each grid, and calculate the density of each grid based on the urban road network data;
S2. Based on a comparison between the grid density and a grid density threshold, divide the grids into high-density grids and low-density grids, and encode the trajectory dataset with fixed origin S and destination D to obtain trajectory code sequences;
S3. Based on the number of low-density grids traversed by each trajectory, divide the trajectory dataset into trajectory groups and calculate the trajectory group rate; a trajectory group whose trajectory group rate is less than the group rate threshold is an outlier trajectory group; otherwise, calculate the compatible trajectories within the group and divide the trajectory group into one or more trajectory clusters;
S4. Calculate the trajectory cluster rate and divide the trajectory group into a normal sub-trajectory set and a pseudo-anomaly trajectory set;
S5. For a single trajectory group, calculate the trajectory scores within the pseudo-anomaly trajectory set based on the normal sub-trajectories, divide the pseudo-anomaly trajectory set into normal trajectories and outlier trajectories, and combine the outlier trajectories of all trajectory groups to form the outlier trajectory set.
The trajectory group rate in step S3 is calculated using the following formula:
$$\text{trajectory group rate of } TG_i = \frac{num(TG_i)}{num(TD)} \quad (2)$$
where $num(TG_i)$ is the number of trajectories in trajectory group $TG_i$ and $num(TD)$ is the number of trajectories in the trajectory dataset $TD$.
In step S3, if the trajectory group rate is greater than or equal to the group rate threshold, the compatible trajectories within the group are calculated and the trajectory group is divided into trajectory clusters. Compatible trajectories are calculated by formula (3), where the two sequences in formula (3) are the high-density grid code sequence traversed by trajectory $T_i$ and the high-density grid code sequence traversed by trajectory $T_j$; if the condition in formula (3) is satisfied, $T_i$ and $T_j$ are compatible trajectories and are grouped into the same trajectory cluster.
In step S4, the trajectory cluster rate determines whether the trajectories within a cluster are normal sub-trajectories, and it is calculated as follows: let $TD$ be the trajectory dataset and $TZ_i$ be the i-th trajectory cluster; the trajectory cluster rate of $TZ_i$ is the ratio of the number of trajectories in $TZ_i$ to the number of trajectories in $TD$:
$$\text{trajectory cluster rate of } TZ_i = \frac{num(TZ_i)}{num(TD)} \quad (4)$$
where $num(TZ_i)$ is the number of trajectories in $TZ_i$ and $num(TD)$ is the number of trajectories in $TD$. Based on the trajectory cluster rate threshold, the trajectory group is divided into a normal sub-trajectory set and a pseudo-anomaly trajectory set.
2. The trajectory outlier detection method based on group division according to claim 1, characterized in that: the grid density in step S1 is calculated as the ratio of the total length of roads within a grid to the area of that grid.
3. The trajectory outlier detection method based on group division according to claim 1, characterized in that: in step S1, the urban vector area is divided into grids of fixed size.
4. The trajectory outlier detection method based on group division according to claim 1, characterized in that: the grid density threshold in step S2 is obtained by the natural breaks classification method.
5. The trajectory outlier detection method based on group division according to claim 1, characterized in that: in step S5, trajectory scores are calculated within the pseudo-anomaly trajectory set, and the outlier trajectories in the group are detected based on the trajectory scores. The trajectory score is calculated by formula (5), where $TS_i$ is the code sequence of trajectory $T_i$ in the normal sub-trajectory set and $TS_j$ is the code sequence of trajectory $T_j$ in the pseudo-anomaly trajectory set; the calculated trajectory score lies between 0 and 1 inclusive.
6. The trajectory outlier detection method based on group division according to claim 5, characterized in that: if the trajectory score is less than or equal to the score threshold, the trajectory is determined to be an outlier trajectory.
7. The trajectory outlier detection method based on group division according to claim 1, characterized in that: after step S5, the pseudo-anomaly trajectory set is divided into a quasi-normal trajectory set and an abnormal trajectory set. The union of the quasi-normal trajectory set and the normal sub-trajectory set constitutes the normal trajectories of the trajectory group, and the remaining trajectories in the group are outlier trajectories.
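The ratio and threshold tests recited in the claims above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. It assumes that trajectories traversing the same number of distinct low-density grids form one trajectory group, and that the trajectory group rate is the share of the dataset's trajectories in the group, analogous to the trajectory cluster rate. All function names, grid codes, trajectory ids, and threshold values are hypothetical. Formulas (3) and (5) are not reproduced, so trajectory scores are treated as precomputed inputs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def grid_density(total_road_length: float, grid_area: float) -> float:
    """Claim 2: total road length inside a grid divided by the grid's area."""
    return total_road_length / grid_area


def group_by_low_density_count(
    trajectories: Dict[str, Sequence[str]], low_density: Set[str]
) -> Dict[int, List[str]]:
    """Step S3 (assumed reading): trajectories that traverse the same number
    of distinct low-density grids fall into the same trajectory group."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for tid, codes in trajectories.items():
        groups[len(set(codes) & low_density)].append(tid)
    return dict(groups)


def rate(subset_size: int, dataset_size: int) -> float:
    """Share of the dataset's trajectories that lie in a group or cluster."""
    return subset_size / dataset_size


def split_outlier_groups(
    groups: Dict[int, List[str]], dataset_size: int, group_rate_threshold: float
) -> Tuple[Dict[int, List[str]], Dict[int, List[str]]]:
    """Step S3: a group whose rate is below the threshold is an outlier group."""
    outlier: Dict[int, List[str]] = {}
    retained: Dict[int, List[str]] = {}
    for key, members in groups.items():
        below = rate(len(members), dataset_size) < group_rate_threshold
        (outlier if below else retained)[key] = members
    return outlier, retained


def classify_by_score(
    scores: Dict[str, float], score_threshold: float
) -> Tuple[List[str], List[str]]:
    """Claim 6: a score less than or equal to the threshold marks an outlier.
    Scores are taken as given; formula (5) is not reproduced here."""
    normal = [t for t, s in scores.items() if s > score_threshold]
    outliers = [t for t, s in scores.items() if s <= score_threshold]
    return normal, outliers


# Hypothetical data: grid codes, trajectory ids, and thresholds are made up.
trajectories = {
    "t1": ["g1", "g2", "g3"],
    "t2": ["g1", "g2", "g3"],
    "t3": ["g1", "g4", "g3"],
    "t4": ["g1", "x1", "x2", "g3"],
}
low_density = {"x1", "x2"}

groups = group_by_low_density_count(trajectories, low_density)
outlier_groups, retained_groups = split_outlier_groups(
    groups, dataset_size=len(trajectories), group_rate_threshold=0.3
)
print(outlier_groups)   # {2: ['t4']}
print(retained_groups)  # {0: ['t1', 't2', 't3']}
```

In this toy dataset the group containing only t4 has a rate of 0.25, below the hypothetical threshold of 0.3, so it is treated as an outlier trajectory group, while the remaining group would proceed to clustering in steps S3 to S5.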