Traffic flow prediction method and device, computer device and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2021-05-14
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional traffic flow prediction methods based on spatiotemporal grid data fail to capture sufficient spatiotemporal features, resulting in inaccurate traffic flow predictions.
By acquiring traffic videos from traffic detection points, the quantity features, flow features, and traffic image features of each type of traffic object are extracted and fused to generate a traffic feature vector. Based on these feature vectors, a traffic flow map is generated, and spatiotemporal features are extracted by combining graph convolution and temporal convolution to predict traffic flow.
It improves the accuracy of traffic flow forecasting, enabling more precise determination of future traffic flow at traffic checkpoints, saving costs, and can be used to optimize traffic network management.
Figure CN115346146B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a traffic flow prediction method, apparatus, computer equipment, and storage medium.
Background Technology
[0002] With social and economic development, the number of vehicles is increasing, and traffic problems are becoming more and more severe. If traffic flow at a certain time in the future can be predicted, then solutions can be designed in advance to address various traffic issues.
[0003] Traditional traffic flow prediction methods typically operate on spatiotemporal grid data. They use recurrent (cyclic) temporal networks to capture temporal information and thereby predict traffic flow patterns that have temporal relationships. However, because these methods primarily consider only the temporal dimension, they fail to adequately capture spatiotemporal features, which leads to inaccurate traffic flow predictions.
Summary of the Invention
[0004] Therefore, in view of the above technical problems, it is necessary to provide a traffic flow prediction method, apparatus, computer equipment, and storage medium that can improve the accuracy of traffic flow prediction.
[0005] A traffic flow prediction method, the method comprising:
[0006] Acquire traffic videos collected at at least one traffic detection point;
[0007] For each traffic video containing traffic images from multiple historical moments, determine the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics;
[0008] For each frame of traffic image, the quantity features, flow features and traffic image features corresponding to the corresponding traffic image are fused to obtain the traffic feature vector corresponding to the corresponding traffic image;
[0009] Based on the traffic feature vectors corresponding to the traffic images of the at least one traffic detection point at the same historical time, a traffic flow map for the corresponding historical time is generated.
[0010] Based on the traffic flow maps corresponding to each historical moment, the predicted traffic flow of the at least one traffic detection point is determined.
[0011] A traffic flow prediction device, the device comprising:
[0012] The acquisition module is used to acquire traffic videos collected at at least one traffic detection point.
[0013] The determining module is used to determine, for each traffic video containing traffic images from multiple historical moments, the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics of the traffic image.
[0014] The fusion module is used to fuse the quantity features, flow features and traffic image features corresponding to each traffic image for each frame of traffic image, respectively, to obtain the traffic feature vector corresponding to the corresponding traffic image;
[0015] The generation module is used to generate a traffic flow map for the corresponding historical time based on the traffic feature vectors corresponding to the traffic images of the at least one traffic detection point at the same historical time.
[0016] The determining module is also used to determine the predicted traffic flow of the at least one traffic detection point based on the traffic flow maps corresponding to each historical time.
[0017] In one embodiment, the determining module is further configured to extract image pyramid features from the traffic image; perform attention mechanism processing on the image pyramid features to obtain attention-enhanced image pyramid features; perform convolution operation on the attention-enhanced image pyramid features to generate candidate boxes containing each traffic individual included in the traffic image; and determine the quantity features corresponding to the corresponding category of traffic objects based on the candidate boxes containing all traffic individuals belonging to the same category of traffic objects.
[0018] In one embodiment, the determining module is further configured to, for each candidate box, identify the category of traffic object to which the traffic individual in the candidate box belongs, and determine the corresponding category confidence; filter out candidate boxes with a category confidence higher than a confidence threshold as target boxes; count the category of traffic object to which the traffic individual in each target box belongs, and determine the quantity feature of each type of traffic object included in the traffic image based on the target boxes corresponding to traffic individuals belonging to the same traffic object.
[0019] In one embodiment, the determining module is further configured to perform channel attention processing on each layer of features in the image pyramid features to obtain corresponding channel pyramid features; perform spatial attention processing on each layer of features in the image pyramid features to obtain corresponding spatial pyramid features; and superimpose the image pyramid features, the channel pyramid features, and the spatial pyramid features to obtain attention-enhanced image pyramid features.
[0020] In one embodiment, the determining module is further configured to perform target tracking on the traffic image to obtain the target movement trajectory point of each tracked traffic individual in the traffic image; add the target location features of the target movement trajectory points in the traffic image to the tracking queue, so that the previous target movement trajectory points corresponding to the previous traffic image in the tracking queue, together with the target movement trajectory points corresponding to the current traffic image, constitute the target movement trajectory of each traffic individual; and determine the traffic flow characteristics of the traffic image based on the number and direction of the target movement trajectories of each traffic individual in the traffic image.
[0021] In one embodiment, the determining module is further configured to perform multi-target tracking on the traffic image to obtain a first movement trajectory point for each tracked traffic individual in the traffic image; perform single-target tracking on the traffic image to obtain a second movement trajectory point for each tracked traffic individual in the traffic image; and use an adaptive decision model to determine a target movement trajectory point for each tracked traffic individual in the traffic image from the first movement trajectory point and the second movement trajectory point.
[0022] In one embodiment, the determining module is further configured to extract image pyramid features from the traffic image; the image pyramid features include at least two layers of features with progressively increasing scales; for each layer of features in the image pyramid features except for the feature with the largest scale, upsampling is performed to obtain the corresponding upsampled features, and the upsampled features are fused with the features of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each traffic individual tracked in the traffic image; the first movement trajectory point of each traffic individual is determined based on the first position feature.
[0023] In one embodiment, the determining module is further configured to acquire a target detection image obtained by target detection based on the traffic image; perform convolution processing on the traffic image and the target detection image respectively to extract the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image; process each of the first image features and each of the second image features using an attention mechanism to obtain attention-enhanced first image features and attention-enhanced second image features; perform cross-correlation processing on the attention-enhanced first image features and attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and attention-enhanced second image features; and obtain the second movement trajectory point of each traffic individual tracked in the traffic image based on each of the similarity features.
[0024] In one embodiment, the determining module is further configured to perform classification processing on each of the similarity features to obtain a classification confidence map; perform regression processing on each of the similarity features to obtain a location regression map; and determine a second location feature for each traffic individual based on the classification confidence map and the location regression map; and determine a second movement trajectory point for each traffic individual based on the second location feature.
[0025] In one embodiment, the apparatus further includes a training module for acquiring multiple training images, performing multi-target tracking on each training image to obtain a first training trajectory point, performing single-target tracking on each training image to obtain a second training trajectory point, and during the current training round, for each training image in the current round, selecting a target training trajectory point from the first and second training trajectory points through the adaptive decision model to be trained, and determining the overlap rate between the target training trajectory point and the actual training trajectory point of the training image; accumulating the overlap rates of all training images in the current round to obtain the cumulative overlap rate of the current round; optimizing the adaptive decision model to be trained by maximizing the cumulative overlap rate, and returning to execute the training process of the next round until a preset stopping condition is met to stop training, thereby obtaining a trained adaptive decision model.
[0026] In one embodiment, the determining module is further configured to overlay the traffic flow maps corresponding to each historical time point to obtain an overlaid traffic flow map; perform graph convolution and time-dimensional convolution on the overlaid traffic flow map to extract spatiotemporal features; input the spatiotemporal features into a fully connected layer, and output the predicted traffic flow of the at least one traffic detection point through the fully connected layer.
[0027] In one embodiment, the determining module is further configured to acquire at least two cycles; for each cycle, the traffic flow maps corresponding to each historical moment within the cycle are superimposed to obtain a cycle traffic flow map; and the cycle traffic flow maps corresponding to each cycle are superimposed to obtain a superimposed traffic flow map.
[0028] A computer device includes a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of the method described above.
[0029] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described above.
[0030] The aforementioned traffic flow prediction method, apparatus, computer equipment, and storage medium acquire traffic videos collected at at least one traffic detection point. For each traffic video containing traffic images from multiple historical moments, the method determines the quantity characteristics, flow characteristics, and traffic image characteristics of each type of traffic object included in the traffic image. For each frame of traffic image, the method fuses the quantity characteristics, flow characteristics, and traffic image characteristics corresponding to the corresponding traffic image to obtain a traffic feature vector corresponding to the corresponding traffic image. Based on the traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical moment, a traffic flow map for the corresponding historical moment is generated. It can be seen that the traffic flow map contains both spatial and temporal information of each traffic detection point. Therefore, based on the traffic flow maps corresponding to each historical moment, the correlation between the features at each historical moment in the temporal and spatial dimensions can be obtained, thereby more accurately determining the predicted traffic flow at at least one traffic detection point.
Attached Figure Description
[0031] Figure 1 is a diagram illustrating the application environment of a traffic flow prediction method in one embodiment;
[0032] Figure 2 is a flowchart illustrating a traffic flow prediction method in one embodiment;
[0033] Figure 3 is a framework diagram of a traffic flow prediction method in one embodiment;
[0034] Figure 4 is a flowchart of determining the quantity characteristics of each type of traffic object included in a traffic image in one embodiment;
[0035] Figure 5 is a schematic diagram of the target detection process in one embodiment;
[0036] Figure 6 is a schematic diagram of the attention mechanism in one embodiment;
[0037] Figure 7 is a flowchart of determining the traffic flow characteristics of a traffic image in one embodiment;
[0038] Figure 8 is a schematic diagram of the structure of a deep feature fusion network in one embodiment;
[0039] Figure 9 is a flowchart illustrating single-target tracking in one embodiment;
[0040] Figure 10 is a framework diagram of traffic flow prediction in one embodiment;
[0041] Figure 11 is a flowchart of a traffic flow prediction method in another embodiment;
[0042] Figure 12 is a structural block diagram of a traffic flow prediction device in one embodiment;
[0043] Figure 13 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0044] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0045] The traffic flow prediction method provided in this application can be applied to the application environment shown in Figure 1. Video acquisition devices 102 are installed at each traffic detection point, and these devices communicate with a computer device 104 via a network. The computer device 104 acquires traffic videos collected at at least one traffic detection point. For each traffic video containing traffic images from multiple historical moments, it determines the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics. For each frame of traffic image, it fuses the quantity characteristics, flow characteristics, and traffic image characteristics corresponding to that traffic image to obtain the traffic feature vector corresponding to that traffic image. Based on the traffic feature vectors corresponding to the traffic images of the at least one traffic detection point at the same historical time, it generates a traffic flow map for the corresponding historical time. Based on the traffic flow maps corresponding to each historical time, it determines the predicted traffic flow for the at least one traffic detection point.
[0046] The video acquisition device 102 includes a camera that can capture traffic video. Specifically, the video acquisition device can be a monitoring device, or, but is not limited to, various personal computers, laptops, smartphones, and tablets that include a camera. The computer device 104 can be a terminal or a server. The terminal can be, but is not limited to, various personal computers, laptops, smartphones, and tablets, while the server can be a standalone server or a server cluster consisting of multiple servers.
[0047] In some embodiments, when the computer device 104 is a server, the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
[0048] In one embodiment, as shown in Figure 2, a traffic flow prediction method is provided. Taking its application to the computer device in Figure 1 as an example, the method includes the following steps:
[0049] Step S202: Obtain traffic videos collected at at least one traffic detection point.
[0050] Traffic detection points are detection stations located on roads. They can be set up at intersections or in the middle of roads. Roads can be highways, urban roads, rural roads, etc. In future scenarios, roads could also be aerial, on water, or underwater, and are not limited to these.
[0051] Cameras are installed at the traffic detection points to capture traffic video and send it to the computer device. The computer device thus acquires the traffic video captured at at least one traffic detection point.
[0052] Traffic video is video collected at traffic detection points. In one embodiment, the traffic video may include various traffic objects, such as vehicles and pedestrians. Vehicles may specifically include cars, trucks, buses, and public transport; pedestrians may specifically include young people, the elderly, and children. In another embodiment, the traffic video may also include various traffic objects, such as warships, merchant ships, and yachts.
[0053] Step S204: For traffic images from multiple historical moments included in each traffic video, determine the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics.
[0054] A traffic image is a single frame within a traffic video. A historical moment is the moment at which a traffic image was captured. When capturing traffic video, the camera can also record the capture moment of each traffic image and arrange the images in chronological order to generate the traffic video for a given period of time.
[0055] Traffic objects refer to objects belonging to a certain category on a traffic road. Specifically, traffic objects can be vehicles, pedestrians, ships, or aircraft, etc. Quantity characteristics are features that carry quantity information, determined separately for each category of traffic object. For example, the quantity characteristic of vehicles may be 50 and the quantity characteristic of pedestrians may be 100.
[0056] Traffic image features refer to the features included in a traffic image. These features mainly include color features, texture features, shape features, and spatial relationship features.
[0057] Color features are a type of global feature that describes the surface properties of objects in an image or image region.
[0058] Texture features are also a type of global feature, describing the surface properties of objects corresponding to an image or image region. However, since texture is only a characteristic of an object's surface and cannot fully reflect the object's essential attributes, it is impossible to obtain high-level image content using texture features alone.
[0059] Shape features include contour features and region features. Contour features of an image mainly refer to the outer boundaries of objects, while region features relate to the entire shape region.
[0060] Spatial relationships refer to the spatial positions or relative directions between multiple targets segmented from an image. These relationships can be categorized into connection / adjacency relationships, overlap / intersection relationships, and containment / enclosure relationships.
[0061] Specifically, for the traffic image at each historical moment in a traffic video, the computer device identifies the category of each traffic individual in the traffic image and counts the individuals of each category to obtain the quantity characteristics of each category of traffic objects. The computer device performs target tracking on the traffic image to obtain the traffic flow characteristics, which include the traffic flow information of each traffic individual in the traffic image. The computer device also uses a convolutional neural network to perform convolution processing on the traffic image to extract the traffic image features.
[0062] In this context, "traffic individual" refers to an individual situated on a traffic path. A traffic individual can be a vehicle, a person, or a boat, etc. It can be understood that different traffic individuals can belong to the same category of traffic objects. For example, in a traffic image frame containing five pedestrians, all five pedestrians belong to the same category of traffic objects, but each pedestrian is considered an independent traffic individual. Traffic flow characteristics refer to the characteristics of the traffic flow represented by traffic individuals. Traffic flow characteristics can include the movement trajectory and direction of movement of traffic individuals. Traffic flow characteristics can also include the size and speed of traffic individuals.
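As an illustration of the counting described above, the following minimal Python sketch keeps only the detections whose category confidence exceeds a threshold and then counts the remaining traffic individuals per category. The `CandidateBox` record, the `count_by_category` helper, and the 0.5 threshold are hypothetical names and values chosen for the example; they are not taken from this application.

```python
from collections import Counter
from typing import NamedTuple


class CandidateBox(NamedTuple):
    """Hypothetical record for one detected traffic individual."""
    category: str      # predicted traffic object category, e.g. "pedestrian"
    confidence: float  # category confidence in [0, 1]


def count_by_category(boxes, confidence_threshold=0.5):
    """Keep boxes above the threshold and count them per category."""
    target_boxes = [b for b in boxes if b.confidence > confidence_threshold]
    return dict(Counter(b.category for b in target_boxes))


# Four detections in one traffic image; one vehicle detection is too uncertain.
boxes = [
    CandidateBox("vehicle", 0.92),
    CandidateBox("vehicle", 0.41),
    CandidateBox("pedestrian", 0.77),
    CandidateBox("pedestrian", 0.88),
]
quantity_features = count_by_category(boxes, confidence_threshold=0.5)
print(quantity_features)
```

Each retained box stands for one independent traffic individual, so two pedestrians detected in the same frame contribute a count of two to the single "pedestrian" category.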
[0063] Step S206: For each frame of traffic image, the quantity features, flow features and traffic image features corresponding to the corresponding traffic image are fused to obtain the traffic feature vector corresponding to the corresponding traffic image.
[0064] Traffic feature vectors are feature vectors obtained by fusing the quantity features, flow features, and traffic image features corresponding to a traffic image.
[0065] Specifically, for each frame of traffic image, the computer device uses the quantity feature, flow feature, and traffic image feature corresponding to the traffic image as elements of the traffic feature vector to be generated, thereby fusing them to obtain the traffic feature vector corresponding to the traffic image. Alternatively, the computer device can perform weighted superposition and fusion of the quantity feature, flow feature, and traffic image feature corresponding to the traffic image to obtain the traffic feature vector corresponding to the traffic image. Of course, the computer device can also use other fusion methods for processing, and this application embodiment does not limit this.
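The two fusion options mentioned above can be sketched as follows. This is a minimal illustration only: the function names and the weights are hypothetical, and the weighted variant assumes the three features have already been brought to the same length, which this application does not specify.

```python
def fuse_by_concatenation(quantity, flow, image):
    """Use the three features as consecutive elements of one vector."""
    return list(quantity) + list(flow) + list(image)


def fuse_by_weighted_sum(quantity, flow, image, weights=(0.5, 0.3, 0.2)):
    """Weighted element-wise superposition (assumes equal-length features)."""
    if not (len(quantity) == len(flow) == len(image)):
        raise ValueError("weighted fusion needs features of equal length")
    w_q, w_f, w_i = weights
    return [w_q * q + w_f * f + w_i * i
            for q, f, i in zip(quantity, flow, image)]


# Illustrative values for one traffic image.
quantity = [50.0, 100.0]   # e.g. vehicles, pedestrians
flow = [1.0, 0.0]          # e.g. an encoded flow direction
image = [0.2, 0.4]         # e.g. two pooled image features

fused_concat = fuse_by_concatenation(quantity, flow, image)
fused_weighted = fuse_by_weighted_sum(quantity, flow, image)
print(fused_concat)
print(fused_weighted)
```

Concatenation preserves every feature at the cost of a longer vector, whereas weighted superposition keeps the vector short but requires the weights to be chosen or learned.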
[0066] Furthermore, the computer equipment can also acquire other features from the traffic images, and fuse the quantity features, flow features, traffic image features, and other features corresponding to the traffic images to obtain a traffic feature vector corresponding to the corresponding traffic images. These other features may include at least whether a traffic accident has occurred, the type of traffic accident, and the road width.
[0067] Step S208: Based on the traffic feature vectors corresponding to traffic images of at least one traffic detection point at the same historical time, generate a traffic flow map for the corresponding historical time.
[0068] A traffic flow map is a graph that includes traffic flow information from the traffic detection points. The traffic feature vector of a traffic detection point at a historical time represents the traffic flow information at that point. The traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical time therefore represent the traffic flow information of those detection points at that historical time. A traffic flow map thus contains both temporal and spatial information about the detection points.
[0069] The computer device generates the traffic flow map for the corresponding historical time as a graph G=(V,E). Here, V is the set of all nodes in the graph, with each node representing one traffic detection point, and E is the set of edges connecting the nodes, representing the flow relationships between traffic detection points.
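A graph G=(V,E) of this kind can be represented, for example, by a node feature matrix plus an adjacency matrix. The sketch below is one possible representation, not the one prescribed by this application: the helper name, the detection point identifiers, and the choice of an undirected, unweighted adjacency matrix are all assumptions made for illustration.

```python
def build_traffic_flow_graph(node_features, edges):
    """Build (nodes, adjacency, features) for one historical time.

    node_features: dict mapping detection point id -> traffic feature vector
    edges: iterable of (point_a, point_b) pairs with a flow relationship
    """
    nodes = sorted(node_features)
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    adjacency = [[0] * size for _ in range(size)]
    for a, b in edges:
        # Assumption: flow relationships are treated as undirected.
        adjacency[index[a]][index[b]] = 1
        adjacency[index[b]][index[a]] = 1
    features = [list(node_features[name]) for name in nodes]
    return nodes, adjacency, features


# Three hypothetical detection points along one road: A - B - C.
node_features = {"A": [50.0, 100.0], "B": [30.0, 80.0], "C": [10.0, 20.0]}
nodes, adjacency, features = build_traffic_flow_graph(
    node_features, edges=[("A", "B"), ("B", "C")]
)
print(nodes)
print(adjacency)
```

Building one such graph per historical time yields the sequence of traffic flow maps used in the prediction step.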
[0070] Step S210: Determine the predicted traffic flow at at least one traffic detection point based on the traffic flow maps corresponding to each historical time.
[0071] Predicted traffic flow is the forecasted traffic volume. Predicted traffic flow can include information such as the quantity of traffic objects and the direction of traffic flow. The quantity of traffic objects is, for example, 200 vehicles and 399 pedestrians. The direction of traffic flow is, for example, the direction vehicles travel and the direction pedestrians walk.
[0072] In one implementation, the computer device determines the predicted traffic flow at at least one traffic detection point for a specified future time or a specified future time period based on the traffic flow maps corresponding to the historical moments. Both the specified time and the specified time period can be set as needed. For example, the specified time could be 12 noon tomorrow, and the specified time period could be the entire next Saturday.
[0073] In another implementation, the computer device determines the predicted traffic flow at at least one traffic detection point for a future time period based on the cycle of the traffic video. For example, if the cycle of the traffic video is one day, the computer device can determine the predicted traffic flow at at least one traffic detection point for the next day. Similarly, if the cycle of the traffic video is one hour, the computer device can determine the predicted traffic flow at at least one traffic detection point for the next hour.
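One embodiment described earlier determines the predicted traffic flow by overlaying the traffic flow maps, applying graph convolution and time-dimensional convolution to extract spatiotemporal features, and passing the result through a fully connected layer. The sketch below illustrates that sequence in plain Python under simplifying assumptions: each node carries a single scalar feature, graph convolution is a neighbour mean with a self-loop, and the temporal kernel and fully connected weights are fixed placeholders rather than trained parameters. All function names are hypothetical.

```python
def graph_conv(adjacency, x):
    """Average each node's value with its neighbours (self-loop included)."""
    size = len(adjacency)
    out = []
    for i in range(size):
        neighbours = [j for j in range(size) if adjacency[i][j] or i == j]
        out.append(sum(x[j] for j in neighbours) / len(neighbours))
    return out


def temporal_conv(series, kernel):
    """Valid 1-D convolution along the time axis, applied per node."""
    k = len(kernel)
    steps = len(series) - k + 1
    size = len(series[0])
    return [[sum(kernel[d] * series[t + d][i] for d in range(k))
             for i in range(size)]
            for t in range(steps)]


def fully_connected(features, weights, bias):
    """Single-output linear layer."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def predict(adjacency, history, kernel, fc_weights, fc_bias):
    """history[t][i] is the scalar feature of node i at historical time t."""
    spatial = [graph_conv(adjacency, x_t) for x_t in history]
    spatiotemporal = temporal_conv(spatial, kernel)
    return [fully_connected([step[i] for step in spatiotemporal],
                            fc_weights, fc_bias)
            for i in range(len(adjacency))]


adjacency = [[0, 1], [1, 0]]                  # two connected detection points
history = [[10, 20], [20, 30], [30, 40]]      # three overlaid historical times
prediction = predict(adjacency, history, kernel=[0.5, 0.5],
                     fc_weights=[0.5, 0.5], fc_bias=0.0)
print(prediction)
```

In a trained model the kernel and the fully connected weights would be learned from data; the fixed values here only make the data flow through the three stages easy to follow.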
[0074] In this embodiment, traffic videos collected at at least one traffic detection point are acquired. For each traffic video containing traffic images from multiple historical moments, the quantity characteristics, flow characteristics, and traffic image characteristics of each type of traffic object included in the traffic image are determined. For each frame of traffic image, the quantity characteristics, flow characteristics, and traffic image characteristics corresponding to the corresponding traffic image are fused to obtain a traffic feature vector corresponding to the corresponding traffic image. Based on the traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical moment, a traffic flow map for the corresponding historical moment is generated. It can be seen that the traffic flow map contains both spatial information and temporal information for each traffic detection point. Therefore, based on the traffic flow maps corresponding to each historical moment, the correlation between the features at each historical moment in the temporal dimension and the spatial dimension can be obtained, thereby more accurately determining the predicted traffic flow for at least one traffic detection point.
[0075] Furthermore, the aforementioned traffic flow prediction method can directly predict and analyze traffic flow at each traffic detection point through traffic video, without the need for other sensors to collect other data, thus saving costs.
[0076] Understandably, effectively predicting future traffic flow at traffic detection points allows corresponding adjustments to be made in advance, such as adjusting traffic signal timing and assigning staff to the detection points for coordination, thereby improving the operational efficiency of the traffic network.
[0077] Figure 3 is a framework diagram of a traffic flow prediction method in one embodiment. The computer device includes a target detection module 302, a target tracking module 304, and a traffic flow prediction module 306. The target detection module 302 determines the quantity characteristics of each type of traffic object in the traffic image, and the target tracking module 304 determines the flow characteristics of the traffic image. The quantity characteristics and flow characteristics are then sent to the traffic flow prediction module 306. The traffic flow prediction module 306 fuses the quantity characteristics, flow characteristics, and extracted traffic image features to obtain the traffic feature vector corresponding to the traffic image. Based on the traffic feature vectors corresponding to traffic images of at least one traffic detection point at the same historical time, a traffic flow map for the corresponding historical time is generated. Based on the traffic flow maps corresponding to each historical time, the predicted traffic flow for at least one traffic detection point is determined.
[0078] In one embodiment, as shown in Figure 4, determining the quantity characteristics of each type of traffic object included in the traffic image includes:
[0079] Step S402: Extract image pyramid features from traffic images.
[0080] Image pyramid features refer to a set of features extracted from an image, with each layer increasing in scale. The number of feature layers in an image pyramid feature can be set as needed. For example, an image pyramid feature can include 3 layers, 4 layers, etc.
[0081] Specifically, the computer equipment uses a pre-trained deep convolutional neural network to extract features from traffic images, extracting image pyramid features from the traffic images. The computer equipment can establish an image classification task, and pre-train the deep convolutional neural network in the image classification task to obtain a pre-trained deep convolutional neural network.
[0082] Step S404: Apply an attention mechanism to the image pyramid features to obtain attention-enhanced image pyramid features.
[0083] Attention mechanisms are special structures embedded in machine learning models to automatically learn and calculate the contribution of input data to output data. Attention mechanisms are a resource allocation scheme primarily used to address the information overload problem, allocating computational resources to more important tasks.
[0084] Attention mechanisms include at least channel attention mechanisms and spatial attention mechanisms. Channel attention mechanisms are attention mechanisms along the channel dimension, while spatial attention mechanisms are attention mechanisms along the spatial dimension.
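The difference between the two mechanisms can be illustrated with a small plain-Python sketch on a feature map indexed as [channel][row][column]. It is a simplification under stated assumptions: the weights are a sigmoid of a mean rather than the output of learned layers, and the function names are hypothetical. The final `superimpose` step follows the embodiment described earlier, in which the original, channel, and spatial features are superimposed.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmap):
    """Scale each channel by one weight derived from that channel's mean."""
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[weight * v for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each position by one weight derived from the mean over channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    weights = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
                for j in range(w)] for i in range(h)]
    return [[[weights[i][j] * fmap[k][i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def superimpose(fmap, channel_out, spatial_out):
    """Element-wise sum of the original and both attention outputs."""
    return [[[a + b + s for a, b, s in zip(r0, r1, r2)]
             for r0, r1, r2 in zip(c0, c1, c2)]
            for c0, c1, c2 in zip(fmap, channel_out, spatial_out)]


# One pyramid layer with 2 channels, 1 row, 2 columns.
fmap = [[[0.0, 0.0]], [[2.0, -2.0]]]
channel_out = channel_attention(fmap)
spatial_out = spatial_attention(fmap)
enhanced = superimpose(fmap, channel_out, spatial_out)
print(channel_out)
print(spatial_out)
```

Channel attention produces one weight per channel shared across all positions, while spatial attention produces one weight per position shared across all channels.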
[0085] Computer devices can use attention mechanisms to process image pyramid features, which can identify features of different importance from the image pyramid features, i.e., image pyramid features enhanced by attention.
[0086] Specifically, the computer device extracts image patch features twice for each scale of the image pyramid features. The image patch features extracted from each scale are then concatenated to obtain concatenated image patch features. These concatenated image patch features are then used as weights of the transposed convolution kernel and deconvolved with the obtained self-similarity feature map to obtain attention-enhanced image pyramid features.
[0087] Step S406: Perform a convolution operation on the attention-enhanced image pyramid features to generate candidate boxes for each traffic individual included in the traffic image.
[0088] Candidate boxes are the boxes containing individual traffic objects in a traffic image. Candidate boxes can be rectangular, circular, or various irregularly shaped boxes, etc., and there are no restrictions here.
[0089] Specifically, the computer device inputs the attention-enhanced image pyramid features into multiple convolutional layers and global pooling layers, performs convolution operations on the attention-enhanced image pyramid features, extracts the positional features of each traffic individual included in the traffic image, and generates candidate boxes at the positions represented by the positional features of each traffic individual.
[0090] The convolutional layer consists of several convolutional units, each with parameters optimized through backpropagation. Pooling layers, sandwiched between convolutional layers, compress the amount of data and parameters, reducing overfitting. Downsampling layers, also called pooling layers, operate similarly to convolutional layers, except that the convolutional kernels in downsampling only take the maximum or average value at corresponding positions (max pooling, average pooling), meaning the matrix operations differ, and they are not modified through backpropagation.
[0091] Furthermore, the computer equipment can also use bounding box regression to adjust the candidate boxes containing each traffic individual in the traffic image, obtaining the adjusted candidate boxes containing each traffic individual in the traffic image, and then determine the quantitative features corresponding to the corresponding category of traffic objects based on the adjusted candidate boxes containing each traffic individual in the traffic image.
[0092] Step S408: Based on the candidate boxes containing all traffic individuals belonging to the same type of traffic object, determine the quantitative features corresponding to the corresponding category of traffic object.
[0093] Computer equipment classifies traffic individuals in each candidate frame of a traffic image, determines the category of traffic individuals in each candidate frame, counts all candidate frames containing traffic individuals belonging to the same category of traffic objects, and determines the quantitative features corresponding to the traffic objects of the corresponding category.
[0094] For example, a traffic image contains 10 candidate boxes in total. The computer device classifies the traffic individuals in these 10 candidate boxes and determines that 3 candidate boxes contain traffic individuals belonging to the vehicle category and 7 contain traffic individuals belonging to the pedestrian category. The quantity feature for the vehicle category is therefore 3, and the quantity feature for the pedestrian category is 7.
[0095] In this embodiment, the computer device extracts image pyramid features from traffic images, performs attention mechanism processing on the image pyramid features to obtain attention-enhanced image pyramid features. These attention-enhanced image pyramid features can identify features of different importance levels, and then use convolution operations to extract features more accurately, generate candidate boxes for each traffic individual more accurately, and thus more accurately determine the quantity features corresponding to the corresponding category of traffic objects.
[0096] In one embodiment, the quantity features corresponding to the corresponding category of traffic objects are determined based on the candidate boxes containing all traffic individuals belonging to the same category of traffic objects. This includes: for each candidate box, identifying the category of traffic objects to which the traffic individuals in the candidate box belong, and determining the corresponding category confidence; selecting candidate boxes with category confidence higher than the confidence threshold as target boxes; counting the categories of traffic objects to which the traffic individuals in each target box belong, and determining the quantity features of each category of traffic objects included in the traffic image based on the target boxes corresponding to the traffic individuals belonging to the same category of traffic objects.
[0097] Category confidence is the degree of certainty that an identified category is correct. The higher the category confidence for a traffic individual, the more likely the identified category is accurate. The confidence threshold can be set as needed, for example 90% or 95%.
[0098] For each candidate bounding box, the computer device uses a classification network to classify the traffic individuals within the candidate bounding box, identifying the category of traffic object to which each traffic individual belongs and determining the corresponding category confidence. The classification network uses a cross-entropy loss function (softmax loss) to classify the traffic individuals within the candidate bounding box.
[0099] The computer device selects candidate boxes with a category confidence higher than the confidence threshold as target boxes, and discards candidate boxes with a category confidence lower than or equal to the confidence threshold.
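The filtering and counting steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the candidate list, the category names, and the function name `count_by_category` are all hypothetical, and the classifier that would produce the confidences is assumed to have run already.

```python
from collections import Counter

# Hypothetical candidate boxes as (category, category_confidence) pairs.
# In the described method these would come from the classification network.
candidates = [
    ("vehicle", 0.97),
    ("vehicle", 0.92),
    ("pedestrian", 0.99),
    ("pedestrian", 0.90),  # equal to the threshold, so it is discarded
    ("vehicle", 0.40),
]


def count_by_category(boxes, threshold=0.90):
    """Keep boxes whose confidence is strictly above the threshold
    (the target boxes), then count the target boxes per category."""
    target_boxes = [(cat, conf) for cat, conf in boxes if conf > threshold]
    return dict(Counter(cat for cat, _ in target_boxes))


quantity_features = count_by_category(candidates)
print(quantity_features)
```

The strict comparison mirrors the text: a box whose confidence equals the threshold is discarded rather than kept.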
[0100] In this embodiment, for each candidate box, the computer device identifies the category of traffic object to which the traffic individual in the candidate box belongs and determines the corresponding category confidence level; the candidate boxes with a category confidence level higher than the confidence level threshold are selected as target boxes, so that the quantity characteristics of each type of traffic object included in the traffic image can be more accurately determined based on the category of traffic object to which the traffic individual in the selected target box belongs.
[0101] Figure 5 is a schematic diagram of the target detection process in one embodiment. The computer device inputs a traffic image 502 into a convolutional neural network 504, which extracts image pyramid features 506. The computer device uses an attention mechanism 508 to process the image pyramid features 506, resulting in attention-enhanced image pyramid features 510. The computer device performs a convolution operation on the attention-enhanced image pyramid features 510 to generate candidate boxes for each traffic individual included in the traffic image 502. For each candidate box, the category of the traffic object to which the traffic individual belongs is identified, and the corresponding category confidence is determined. The candidate boxes are then filtered, with those having a category confidence higher than the confidence threshold selected as target boxes, resulting in a filtering result 512. The computer device counts the categories of traffic objects to which the traffic individuals in each target box belong and, based on the target boxes corresponding to traffic individuals belonging to the same category of traffic object, obtains a target detection result 514. The target detection result 514 represents the quantity feature of each type of traffic object included in the traffic image.
[0102] In one embodiment, the image pyramid features are processed by an attention mechanism to obtain attention-enhanced image pyramid features, including: processing each layer of features in the image pyramid features by a channel attention mechanism to obtain corresponding channel pyramid features; processing each layer of features in the image pyramid features by a spatial attention mechanism to obtain corresponding spatial pyramid features; and superimposing the image pyramid features, channel pyramid features, and spatial pyramid features to obtain attention-enhanced image pyramid features.
[0103] Channel pyramid features are pyramid features processed using a channel attention mechanism. Spatial pyramid features are pyramid features processed using a spatial attention mechanism.
[0104] The computer device processes each layer of the image pyramid feature using a channel attention mechanism to obtain the channel weights of the image pyramid feature in the channel dimension; these channel weights are then multiplied onto the image pyramid feature to obtain the corresponding channel pyramid feature.
[0105] The computer device calculates the channel weights of the image pyramid features in the channel dimension using the following formula:
[0106] $$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$
[0107] Here, the channel attention map $M_c(F)$ is the channel weight in the channel dimension, F is the input feature, MLP is a multilayer perceptron, AvgPool represents average pooling, MaxPool represents max pooling, and σ represents the sigmoid function.
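As a rough illustration of the channel weighting described here, the sketch below pools each channel by average and by maximum, passes both descriptors through a shared MLP, sums the results, and applies a sigmoid. The arrangement of these steps, the tiny MLP weights `w0` and `w1`, and all function names are assumptions made for the example; the source only names the components.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, w0, w1):
    """Shared two-layer perceptron with ReLU (weights are hypothetical)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(feature, w0, w1):
    """feature is a C x H x W nested list; returns one weight per channel."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(max(r) for r in ch) for ch in feature]
    a, m = mlp(avg, w0, w1), mlp(mx, w0, w1)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def apply_channel_weights(feature, weights):
    """Multiply every value in a channel by that channel's weight."""
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(feature, weights)]


# Two channels of size 2 x 2: one active, one empty.
feature = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w0 = [[0.5, 0.5]]       # C=2 -> hidden size 1 (illustrative values)
w1 = [[1.0], [-1.0]]    # hidden size 1 -> C=2 (illustrative values)

weights = channel_attention(feature, w0, w1)
channel_feature = apply_channel_weights(feature, weights)
print(weights)
```

With these illustrative weights the active channel receives a weight close to 1 and the empty channel a weight close to 0, which is the reweighting effect the paragraph describes.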
[0108] Similarly, the computer device processes each layer of features in the image pyramid feature using a spatial attention mechanism to obtain the spatial weights of the image pyramid features in the spatial dimension; these spatial weights are then multiplied onto the image pyramid features to obtain the corresponding spatial pyramid features.
[0109] Computer equipment uses the following formula to calculate the spatial weights of image pyramid features in the spatial dimension:
[0110] $$M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big)$$
[0111] Here, the spatial attention map $M_s(F)$ is the spatial weight in the spatial dimension, F is the input feature, $f^{7 \times 7}$ represents a convolution operation with a filter of size 7×7, AvgPool represents average pooling, MaxPool represents max pooling, and σ represents the sigmoid function.
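A minimal sketch of the spatial weighting, assuming the common arrangement in which the feature is pooled along the channel axis by average and by maximum, the two resulting maps are stacked, convolved with a 7×7 filter using zero padding, and passed through a sigmoid. The kernel values and the function name `spatial_attention` are hypothetical.

```python
import math


def spatial_attention(feature, kernel, bias=0.0):
    """feature: C x H x W nested list; kernel: 2 x k x k with k odd
    (7 in the text). Returns an H x W map of weights in (0, 1)."""
    c, h, w = len(feature), len(feature[0]), len(feature[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool along the channel axis to obtain two H x W maps.
    avg = [[sum(feature[ch][i][j] for ch in range(c)) / c
            for j in range(w)] for i in range(h)]
    mx = [[max(feature[ch][i][j] for ch in range(c))
           for j in range(w)] for i in range(h)]
    pooled = [avg, mx]
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))
        out.append(row)
    return out


# Two 3 x 3 channels with a single active location at the top-left corner.
feature = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[3.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
# Illustrative 7 x 7 kernel with only the centre tap set in both input maps.
kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
kernel[0][3][3] = 1.0
kernel[1][3][3] = 1.0

weights = spatial_attention(feature, kernel)
print(weights[0][0], weights[1][1])
```

The resulting H × W map would then be multiplied onto every channel of the input feature to obtain the spatial pyramid feature.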
[0112] Computer devices overlay image pyramid features, channel pyramid features, and spatial pyramid features to obtain attention-enhanced image pyramid features, which are also known as residual attention features.
[0113] In this embodiment, the computer device processes each layer of features in the image pyramid feature using channel attention and spatial attention mechanisms, respectively, to obtain channel pyramid features and spatial pyramid features. By superimposing the image pyramid features, channel pyramid features, and spatial pyramid features, residual attention enhancement can be performed on the image pyramid features across both the channel dimension and the spatial dimension, thereby extracting the image pyramid features more accurately.
[0114] Figure 6 is a schematic diagram of the attention mechanism in one embodiment. The computer device processes each layer of features 602 in the image pyramid feature using a channel attention mechanism to obtain the corresponding channel pyramid feature; it processes each layer of features 602 in the image pyramid feature using a spatial attention mechanism to obtain the corresponding spatial pyramid feature; and it superimposes the image pyramid feature, channel pyramid feature, and spatial pyramid feature to obtain the attention-enhanced image pyramid feature 604.
[0115] In one embodiment, as shown in Figure 7, determining the flow features of the traffic image includes:
[0116] Step S702: Perform target tracking on the traffic image to obtain the target movement trajectory points of each tracked traffic individual in the traffic image.
[0117] The target movement trajectory point is the location of a tracked individual traffic entity in a traffic image. The direction of movement of that individual traffic entity can be determined through its target movement trajectory. For example, if the target movement trajectory of an individual traffic entity is from intersection A to intersection B, then the direction of movement of that individual traffic entity could be the direction from intersection A to intersection B.
[0118] In one implementation, the computer device can perform single-target tracking on a traffic image to obtain the target trajectory point of each tracked individual in the traffic image. In another implementation, the computer device can perform multi-target tracking on a traffic image to obtain the target trajectory point of each tracked individual in the traffic image.
[0119] Step S704: Add the target location features of the target movement trajectory points in the traffic image to the tracking queue, so that the target movement trajectory points corresponding to the previous traffic images in the tracking queue, together with the target movement trajectory points corresponding to the current traffic image, constitute the target movement trajectory of each traffic individual.
[0120] Target location features are the characteristics of the location of a target's trajectory point. These features allow the determination of the target's location. The tracking queue stores the location features of the trajectory points of each individual traffic entity. The preceding traffic image refers to the traffic images preceding the current traffic image, arranged chronologically by acquisition time. The preceding target trajectory point is the location of the tracked individual traffic entity in the preceding traffic image.
[0121] The computer device adds the target location features of the target movement trajectory points in the traffic image to the tracking queue. By using the target location features of the target movement trajectory points of the traffic individual in the tracking queue, the target movement trajectory points can be determined. By using the preceding location features of each preceding target movement trajectory point corresponding to the traffic individual in the preceding traffic image, each preceding target movement trajectory point can be determined. Thus, by connecting the target movement trajectory points and each preceding target movement trajectory point, the target movement trajectory of the traffic individual can be obtained.
[0122] For example, the current traffic image is frame 5, which includes vehicle A. The tracking queue stores the preceding position features of vehicle A from frames 1 to 4. The computer device tracks the target trajectory point of vehicle A from frame 5 and adds the target position features of vehicle A's target trajectory point to the tracking queue. Then, using the preceding position features of vehicle A from frames 1 to 4 in the tracking queue, the trajectory points of vehicle A in frames 1 to 4 can be determined. Using the target position features of vehicle A's target trajectory point in frame 5, the trajectory point of vehicle A in frame 5 can be determined. Connecting the determined trajectory points yields the target trajectory of vehicle A.
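The vehicle A example can be sketched with a simple dictionary-based queue. The data layout (an ID mapped to a list of frame index and position pairs), the coordinates, and the helper names are assumptions for illustration; the source does not specify how the tracking queue is stored.

```python
from collections import defaultdict

# Hypothetical tracking queue: individual ID -> list of (frame_index, (x, y)).
tracking_queue = defaultdict(list)


def add_trajectory_point(queue, individual_id, frame_index, position):
    """Append the position feature of a trajectory point to the queue."""
    queue[individual_id].append((frame_index, position))


def trajectory(queue, individual_id):
    """Connect the stored points in chronological order."""
    return [pos for _, pos in sorted(queue[individual_id])]


# Preceding position features of vehicle A from frames 1 to 4.
for frame, pos in enumerate([(0, 0), (1, 0), (2, 1), (3, 1)], start=1):
    add_trajectory_point(tracking_queue, "vehicle_A", frame, pos)

# Target trajectory point tracked in the current frame (frame 5).
add_trajectory_point(tracking_queue, "vehicle_A", 5, (4, 2))

path = trajectory(tracking_queue, "vehicle_A")
print(path)
```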
[0123] After acquiring the target trajectory points of each traffic individual tracked in the traffic image, the computer equipment can update the tracking status of each traffic individual; the tracking status includes tracking and loss. Specifically, if a specified traffic individual exists in a previous traffic image and is not tracked in the current traffic image, the tracking status of the specified traffic individual is updated to loss; if a specified traffic individual exists in a previous traffic image and is tracked in the current traffic image, the tracking status of the specified traffic individual remains tracked.
[0124] If a traffic individual's tracking status is "lost," meaning it was not detected in the traffic image, the duration of the loss is recorded. If the duration exceeds a preset threshold, the traffic individual is deleted and no longer tracked. If the duration is less than or equal to the preset threshold, the duration of the loss continues to be recorded. If the tracking status is "tracking," the traffic individual continues to be tracked. When a new traffic individual is detected, its identifier is added to the tracking queue, and the new individual is tracked.
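The status update rules above can be sketched as a per-frame update. Several details are assumptions made for the example: loss duration is counted in frames, a lost individual that is detected again returns to the "tracking" status, and the names `Track` and `update_tracks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    status: str = "tracking"  # either "tracking" or "lost"
    lost_frames: int = 0      # loss duration, counted in frames (assumption)


def update_tracks(tracks, detected_ids, max_lost_frames=3):
    """Update all tracks in place for one frame of detections."""
    for track_id in list(tracks):
        track = tracks[track_id]
        if track_id in detected_ids:
            track.status, track.lost_frames = "tracking", 0
        else:
            track.status = "lost"
            track.lost_frames += 1
            # Delete the individual once the loss duration exceeds the threshold.
            if track.lost_frames > max_lost_frames:
                del tracks[track_id]
    # Newly detected individuals are added and tracked.
    for track_id in detected_ids:
        if track_id not in tracks:
            tracks[track_id] = Track()
    return tracks


tracks = {"A": Track(), "B": Track()}
update_tracks(tracks, {"A", "C"}, max_lost_frames=1)  # B is lost, C is new
status_after_first = {k: v.status for k, v in tracks.items()}
update_tracks(tracks, {"A", "C"}, max_lost_frames=1)  # B exceeds the threshold
print(status_after_first, sorted(tracks))
```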
[0125] Step S706: Determine the traffic flow characteristics of the traffic image based on the number and direction of the target movement trajectories of each traffic individual in the traffic image.
[0126] After acquiring the target movement trajectory of each traffic individual, the computer device can count the number of target movement trajectories in the traffic image and determine the trajectory direction of each traffic individual, that is, the direction in which the traffic individual is moving.
[0127] For each frame of traffic image, the computer device acquires the number and direction of the target movement trajectory of each traffic individual tracked in the traffic image, and uses the number and direction of the target movement trajectory of each traffic individual as an element in the traffic flow feature to be generated, thereby generating the traffic flow feature of the traffic image.
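One possible way to turn trajectories into a flow feature is sketched below. The source does not define the encoding, so the four coarse direction labels, the dictionary layout of the feature, and the sample trajectories are all assumptions for illustration.

```python
from collections import Counter


def trajectory_direction(points):
    """Coarse direction from the first to the last trajectory point.
    Image coordinates are assumed, so y grows downward."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return "stationary"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def flow_feature(trajectories):
    """Combine the number of trajectories with a per-direction count."""
    directions = Counter(trajectory_direction(p) for p in trajectories.values())
    return {"count": len(trajectories), "directions": dict(directions)}


# Hypothetical target movement trajectories for one traffic image.
trajectories = {
    "vehicle_A": [(0, 0), (5, 1)],
    "vehicle_B": [(9, 3), (2, 3)],
    "pedestrian_C": [(4, 0), (4, 6)],
}
feature = flow_feature(trajectories)
print(feature)
```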
[0128] In this embodiment, the computer device performs target tracking on the traffic image to obtain the target movement trajectory point of each tracked traffic individual in the traffic image; the target location features of the target movement trajectory points in the traffic image are added to the tracking queue, so that the target movement trajectory points corresponding to the previous traffic image in the tracking queue, together with the target movement trajectory points corresponding to the current traffic image, constitute the target movement trajectory of each traffic individual; based on the number and trajectory direction of the target movement trajectories of each traffic individual in the traffic image, the traffic flow characteristics of the traffic image are determined. The traffic flow characteristics of the traffic image include both the number of target movement trajectories of each traffic individual and the trajectory direction of the movement trajectory of each traffic individual, which can more accurately represent the traffic flow relationship of each traffic individual in the traffic image, thereby enabling more accurate prediction of traffic flow.
[0129] In one embodiment, target tracking of a traffic image to obtain the target trajectory point of each tracked traffic individual in the traffic image includes: performing multi-target tracking of the traffic image to obtain a first trajectory point of each tracked traffic individual in the traffic image; performing single-target tracking of the traffic image to obtain a second trajectory point of each tracked traffic individual in the traffic image; and using an adaptive decision model to determine the target trajectory point of each tracked traffic individual in the traffic image from the first trajectory point and the second trajectory point.
[0130] Multi-target tracking refers to tracking at least two targets simultaneously. Single-target tracking refers to tracking a single target. It can be understood that multi-target tracking of traffic images is faster, so the computer device can acquire the first trajectory point of each traffic individual more quickly, while single-target tracking of traffic images is more accurate, so the computer device can acquire the second trajectory point of each traffic individual more precisely.
[0131] The first trajectory point refers to the location of a traffic individual obtained by multi-target tracking of a traffic image. The second trajectory point refers to the location of a traffic individual obtained by single-target tracking of a traffic image.
[0132] Adaptive processing is the process of automatically adjusting the processing methods, processing order, processing parameters, boundary conditions or constraints based on the characteristics of the data being processed during the processing and analysis process, so as to adapt them to the statistical distribution characteristics and structural characteristics of the data being processed, in order to achieve the best processing results.
[0133] An adaptive decision model is a model that automatically determines the optimal target trajectory point from the first and second trajectory points.
[0134] In one embodiment, the computer device uses a trained adaptive decision model to determine the target trajectory point of each tracked traffic individual in the traffic image from a first trajectory point and a second trajectory point. The computer device uses machine learning techniques to train the adaptive decision model, allowing it to learn rules for selecting the optimal result.
[0135] Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. It specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; its applications span all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and instructional learning.
[0136] In one embodiment, the computer device can train an adaptive decision-making model using reinforcement learning. Reinforcement learning refers to the practice whereby if a certain behavioral policy of an agent results in a positive reward (reinforcement signal) from the environment, the agent's tendency to adopt that behavioral policy in the future will be strengthened. The agent's goal is to discover the optimal policy in each discrete state to maximize the expected discounted reward.
[0137] In this embodiment, the computer device performs multi-target tracking on traffic images, which can more quickly obtain the first trajectory point of each tracked traffic individual in the traffic image; performing single-target tracking on traffic images can more accurately obtain the second trajectory point of each tracked traffic individual in the traffic image; and by adopting an adaptive decision model, it can balance the high efficiency of multi-target tracking and the accuracy of single-target tracking, selecting the optimal target trajectory point from the more efficient first trajectory point and the more accurate second trajectory point, thereby improving the overall accuracy of traffic flow prediction. Furthermore, by using an adaptive decision model to fuse the two tracking algorithms, it can alleviate the matching error based on the re-identification algorithm in traditional schemes and avoid the drift problem based on the single-target tracking algorithm in traditional schemes, resulting in more accurate tracking results.
[0138] In one embodiment, multi-target tracking of a traffic image to obtain the first trajectory point of each tracked traffic individual in the traffic image includes: extracting image pyramid features from the traffic image; the image pyramid features include at least two layers of features with progressively increasing scales; upsampling each layer of features in the image pyramid features except for the largest scale feature to obtain the corresponding upsampled features, and fusing the upsampled features with the features of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each tracked traffic individual in the traffic image; and determining the first trajectory point of each traffic individual based on the first position feature.
[0139] Image pyramid features consist of at least two layers of features with progressively increasing scales. For example, an image pyramid feature may consist of three layers of features with progressively increasing scales of 1/4, 1/8, and 1/16.
[0140] Upsampling and downsampling both involve resampling a digital signal. The resampling rate is compared to the original sampling rate used to obtain the digital signal (e.g., sampled from an analog signal). If the resampling rate is higher than the original rate, it's called upsampling; if it's lower, it's called downsampling. Upsampling is essentially interpolation. Upsampling features are the features obtained through upsampling.
[0141] The first location feature is the characteristic of the location of the tracked traffic individual during multi-target tracking. The tracking queue is a queue used to store the location features of the tracked traffic individuals.
[0142] The computer device uses a deep feature fusion network to extract image pyramid features from traffic images. Figure 8 is a schematic diagram of the deep feature fusion network in one embodiment. The image pyramid feature includes three layers of features with progressively increasing scales of 1/4, 1/8, and 1/16. Each layer of features in the image pyramid feature except for the largest scale feature, i.e., the features at scales of 1/8 and 1/16, is upsampled to obtain the corresponding upsampled features. The upsampled features are then fused with the features at the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each tracked traffic individual in the traffic image.
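The upsample-and-fuse step can be illustrated with a heavily simplified sketch. It assumes single-channel maps, nearest-neighbour 2× upsampling, and fusion by element-wise addition; the actual deep feature fusion network in Figure 8 is not specified at this level of detail, and the function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W map (assumed method)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise addition, standing in for the skip-connection fusion."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_pyramid(levels):
    """levels are ordered from the 1/4 scale to the 1/16 scale.
    Each coarser map is upsampled and fused into the next finer one."""
    fused = levels[-1]
    for finer in reversed(levels[:-1]):
        fused = add_maps(finer, upsample2x(fused))
    return fused


# Toy pyramid for a 16 x 16 image: 4 x 4, 2 x 2, and 1 x 1 feature maps.
p4 = [[1.0] * 4 for _ in range(4)]
p8 = [[10.0] * 2 for _ in range(2)]
p16 = [[100.0]]

fused = fuse_pyramid([p4, p8, p16])
print(fused)
```

Every cell of the fused 1/4-scale map carries contributions from all three levels, which is the purpose of the skip connections.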
[0143] The computer device represents the traffic image as H_image × W_image, where H_image is the height of the traffic image and W_image is the width. The output feature map has the shape C × H × W, where H = H_image/4, W = W_image/4, and C is the number of channels. Next, two parallel heads are added to obtain predicted bounding boxes and tracking features (Re-ID features), respectively. Then, using an online association strategy, the cosine distance between the target detection features and the tracking features is calculated to obtain the feature association matrix between the current frame and the previous frames. The target detection features and tracking features are fused, and the direct distance between the Kalman filter and the target detection features is calculated. If this direct distance is greater than a preset threshold, the distance parameter in the feature association matrix is set to infinity; if the direct distance is less than or equal to the preset threshold, the direct distance is weighted with the tracking features to obtain a distance-weighted result. Finally, the distance-weighted result is processed using the Hungarian algorithm to obtain the matching result. If matching succeeds, the identifier (ID), location features, Kalman filter mean and variance, and confidence value of the tracked traffic individual are updated. If matching fails, the IoU (Intersection over Union) distance between the tracking features and the predicted bounding box is calculated, and the IoU distance is then matched using the Hungarian algorithm to obtain a new matching result. The new matching result is processed until the specified conditions are met, at which point the first movement trajectory point of the traffic individual is output. The specified conditions can be set as needed and are not limited here.
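The gating and matching part of this association strategy can be sketched as below. This is an illustration under stated assumptions, not the patented procedure: the motion distances are supplied directly rather than computed from a Kalman filter, the weighting factor `lam` and the gate value are hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment but is only practical for a handful of tracks).

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def associate(track_feats, det_feats, motion_dist, gate=5.0, lam=0.9):
    """Return {track_index: detection_index} for the cheapest assignment.
    Assumes there are at least as many detections as tracks."""
    n_t, n_d = len(track_feats), len(det_feats)
    cost = [[math.inf] * n_d for _ in range(n_t)]
    for i in range(n_t):
        for j in range(n_d):
            # Pairs beyond the gate keep an infinite cost.
            if motion_dist[i][j] <= gate:
                appearance = cosine_distance(track_feats[i], det_feats[j])
                cost[i][j] = lam * appearance + (1 - lam) * motion_dist[i][j]
    best, best_cost = {}, math.inf
    # Brute-force stand-in for the Hungarian algorithm (small inputs only).
    for perm in permutations(range(n_d), n_t):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best, best_cost = dict(enumerate(perm)), total
    return best


track_feats = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical Re-ID features
det_feats = [[0.1, 0.9], [0.9, 0.1]]
motion_dist = [[4.0, 1.0], [1.0, 4.0]]   # hypothetical motion distances

matches = associate(track_feats, det_feats, motion_dist)
print(matches)
```

When every pair falls outside the gate, the function returns an empty mapping, which corresponds to the unsuccessful match case that the text hands over to IoU-based matching.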
[0144] In this embodiment, the computer device extracts image pyramid features from the traffic image; for each layer of features in the image pyramid features except for the largest scale feature, upsampling is performed to obtain the corresponding upsampled features, and the upsampled features are fused with the features of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each traffic individual tracked in the traffic image. There are more skip connections between low-level aggregations, which can extract more accurate first position features, thereby obtaining more accurate first movement trajectory points of traffic individuals.
[0145] In one embodiment, single-target tracking of a traffic image to obtain the second trajectory point of each tracked traffic individual in the traffic image includes: acquiring a target detection image obtained by target detection based on the traffic image; performing convolution processing on the traffic image and the target detection image respectively to extract the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image; processing each first image feature and each second image feature separately using an attention mechanism to obtain attention-enhanced first image features and attention-enhanced second image features; performing cross-correlation processing on the attention-enhanced first image features and attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and attention-enhanced second image features; and obtaining the second trajectory point of each tracked traffic individual in the traffic image based on each similarity feature.
[0146] An object detection image is an image obtained by performing object detection on a traffic image.
[0147] The first image feature is the feature included in the traffic image. The second image feature is the feature included in the object detection image. The similarity feature is the feature that shows a correlation (similarity) between the attention-enhanced first image feature and the attention-enhanced second image feature.
[0148] In one implementation, the computer device can acquire a target detection image obtained through pre-detection. In another implementation, the computer device can perform target detection on traffic images in real time to obtain a target detection image.
[0149] The computer equipment uses a convolutional neural network to perform convolution processing on the traffic image and the target detection image respectively, extracting the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image. The convolutional neural network can be pre-trained.
[0150] The computer device performs cross-correlation processing on the attention-enhanced first image features and the attention-enhanced second image features to measure the similarity between them, thereby obtaining the similarity features.
[0151] The computer device uses an attention mechanism to process each first image feature and each second image feature separately, obtaining attention-enhanced first image features and attention-enhanced second image features, including: processing each first image feature using a channel attention mechanism to obtain the channel weights of each first image feature in the channel dimension; multiplying each channel weight by the corresponding first image feature to obtain each first channel image feature; processing each first image feature using a spatial attention mechanism to obtain the spatial weights of each first image feature in the spatial dimension; multiplying each spatial weight by the corresponding first image feature to obtain each first spatial image feature; and superimposing the corresponding first channel image features, first spatial image features, and first image features to obtain attention-enhanced first image features.
[0152] Each second image feature is processed using a channel attention mechanism to obtain the channel weights of each second image feature in the channel dimension; each channel weight is multiplied by the corresponding second image feature to obtain each second channel image feature; each second image feature is processed using a spatial attention mechanism to obtain the spatial weights of each second image feature in the spatial dimension; each spatial weight is multiplied by the corresponding second image feature to obtain each second spatial image feature; the corresponding second channel image feature, second spatial image feature, and second image feature are superimposed to obtain each second image feature enhanced by attention.
[0153] Specifically, the first channel image feature is obtained by multiplying the channel weights by the corresponding first image feature. The first spatial image feature is obtained by multiplying the spatial weights by the corresponding first image feature. The second channel image feature is obtained by multiplying the channel weights by the corresponding second image feature. The second spatial image feature is obtained by multiplying the spatial weights by the corresponding second image feature.
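The attention enhancement described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not state how the channel and spatial weights are computed, so the global-average-pooling and sigmoid steps below are assumptions, as are the names `sigmoid` and `attention_enhance`. Only the final superposition (channel feature + spatial feature + original feature) is taken directly from the description.

```python
import numpy as np

def sigmoid(v):
    """Squash values into (0, 1) so they can act as weights."""
    return 1.0 / (1.0 + np.exp(-v))

def attention_enhance(feature):
    """Return an attention-enhanced copy of `feature` with shape (C, H, W)."""
    # Channel weights: one scalar per channel.
    # Assumption: global average pooling followed by a sigmoid.
    channel_weights = sigmoid(feature.mean(axis=(1, 2)))        # shape (C,)
    channel_feature = channel_weights[:, None, None] * feature  # shape (C, H, W)

    # Spatial weights: one scalar per spatial location.
    # Assumption: mean over channels followed by a sigmoid.
    spatial_weights = sigmoid(feature.mean(axis=0))             # shape (H, W)
    spatial_feature = spatial_weights[None, :, :] * feature     # shape (C, H, W)

    # Superimpose channel feature, spatial feature, and the original feature.
    return channel_feature + spatial_feature + feature

enhanced = attention_enhance(np.ones((2, 3, 3)))
```

The same function would be applied separately to each first image feature and each second image feature.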
[0154] The computer device uses the following formula to perform cross-correlation processing on the attention-enhanced first image features and the attention-enhanced second image features:
[0155]
[0156] Here, the result of the formula is the similarity feature. The feature extraction network has a Siamese structure (SiamRPN), meaning that the network structure and parameters are the same when processing the traffic image and the object detection image respectively. The two branches are networks designed for different tasks (a classification network and a regression network): reg denotes the bounding box regression layer, and cls denotes the bounding box foreground/background classification layer. In each branch, the feature maps are fine-tuned through two convolutional layers. x is the traffic image and z is the object detection image.
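As a rough illustration of the cross-correlation step, the sketch below slides one feature map over another and records the response at each offset. It assumes, following common Siamese tracker practice, that the feature of the traffic image x acts as the search region and the feature of the object detection image z acts as the kernel; the function name `cross_correlate` and the plain summed inner product are assumptions, since the formula itself is not reproduced in this text.

```python
import numpy as np

def cross_correlate(search, template):
    """Slide `template` (C, h, w) over `search` (C, H, W).

    Returns a response map of shape (H - h + 1, W - w + 1); a higher value
    means the template matches the search feature better at that offset.
    """
    C, H, W = search.shape
    c, h, w = template.shape
    if C != c:
        raise ValueError("search and template need the same channel count")
    response = np.zeros((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            window = search[:, i:i + h, j:j + w]
            response[i, j] = np.sum(window * template)
    return response

# Toy example: a 2x2 patch of ones placed at row 1, column 2.
search = np.zeros((1, 4, 4))
search[0, 1:3, 2:4] = 1.0
template = np.ones((1, 2, 2))
response = cross_correlate(search, template)
peak = np.unravel_index(np.argmax(response), response.shape)
```

In the described method, the classification and regression branches would each consume such a response map.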
[0157] In this embodiment, the computer device additionally utilizes the target detection image obtained from the target detection, and can extract second image features from the target detection image. The first image features of the traffic image and the second image features of the target detection image can be cross-correlated to obtain the second movement trajectory point of each tracked individual in the traffic image more accurately. Furthermore, the computer device employs an attention mechanism to process each first image feature and each second image feature separately, resulting in attention-enhanced first image features and attention-enhanced second image features. This allows for targeted cross-correlation processing of the attention-enhanced first image features and attention-enhanced second image features, enhancing the representational capability of the tracking algorithm while also pruning and compressing the tracking algorithm.
[0158] In one embodiment, obtaining the second movement trajectory point of each tracked traffic individual in the traffic image based on each similarity feature includes: classifying each similarity feature to obtain a classification confidence map; performing regression processing on each similarity feature to obtain a location regression map; determining the second location feature of each traffic individual based on the classification confidence map and the location regression map; and determining the second movement trajectory point of each traffic individual based on the second location feature.
[0159] A classification confidence map is a set of classification confidence scores for each similarity feature. The higher the classification confidence score of a similarity feature, the more reliable the computer device's classification of that similarity feature.
[0160] A location regression map is a collection of multiple location regression values, which are numerical values obtained by regressing similarity features. Location regression values characterize the accuracy of detecting the location features of traffic individuals in an object detection image, corresponding to the similarity features. The higher the location regression value, the more accurate the detection of traffic individuals in the object detection image.
[0161] In one implementation, when obtaining a classification confidence score greater than a preset classification confidence threshold from the classification confidence map, the computer device determines the detection location feature of the traffic individual in the target detection image corresponding to that classification confidence score, obtains the location regression value corresponding to the detection location feature, and when the location regression value is greater than a preset regression threshold, uses the detection location feature as the second location feature of the traffic individual, thereby determining the second movement trajectory point of the traffic individual. In another implementation, the computer device can use the detection location feature corresponding to the highest location regression value as the second location feature of the traffic individual, thereby determining the second movement trajectory point of the traffic individual. In yet another implementation, the computer device can obtain the weighting factors of the classification confidence map and the location regression value respectively, multiply each classification confidence score and location regression value in the classification confidence map by the corresponding weighting factor to obtain a target evaluation score for the detection location feature, and use the detection location feature corresponding to the target evaluation score higher than the preset evaluation score as the second location feature of the traffic individual, thereby determining the second movement trajectory point of the traffic individual. The method by which the computer device determines the second movement trajectory point can be set as needed and is not limited here.
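Two of the selection strategies above can be sketched as follows. The function names, the default thresholds, and the flat list of candidate locations are hypothetical. For the weighted variant, the text only says that each score is multiplied by its weighting factor, so summing the two weighted terms into a single evaluation score is an assumption.

```python
import numpy as np

def select_by_thresholds(cls_conf, reg_values, locations,
                         cls_thresh=0.5, reg_thresh=0.5):
    """Keep locations whose classification confidence and regression value
    both exceed their preset thresholds."""
    keep = (np.asarray(cls_conf) > cls_thresh) & (np.asarray(reg_values) > reg_thresh)
    return [loc for loc, k in zip(locations, keep) if k]

def select_by_weighted_score(cls_conf, reg_values, locations,
                             w_cls=0.5, w_reg=0.5, score_thresh=0.6):
    """Keep locations whose weighted evaluation score exceeds a preset score.

    Assumption: the evaluation score is the sum of the two weighted terms.
    """
    scores = w_cls * np.asarray(cls_conf) + w_reg * np.asarray(reg_values)
    return [loc for loc, s in zip(locations, scores) if s > score_thresh]

cls_conf = [0.9, 0.8, 0.3]
reg_values = [0.7, 0.2, 0.95]
locations = [(10, 20), (30, 40), (50, 60)]

by_thresholds = select_by_thresholds(cls_conf, reg_values, locations)
by_score = select_by_weighted_score(cls_conf, reg_values, locations)
```

The two strategies can disagree: the third candidate fails the classification threshold but passes the weighted score because of its high regression value.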
[0162] In this embodiment, the computer device performs classification processing on each similarity feature to obtain a classification confidence map; performs regression processing on each similarity feature to obtain a location regression map; determines the second location feature of the traffic individual based on the classification confidence map and the location regression map; and accurately determines the second movement trajectory point of each traffic individual based on the second location feature.
[0163] Figure 9 is a schematic diagram of a single-target tracking process in one embodiment. A computer device acquires a target detection image 904 obtained by target detection on a traffic image; the target detection image includes the detection location features of each traffic individual. A convolutional neural network 906 is used to convolve the traffic image 902 to extract the first image features 910 of each traffic individual in the traffic image. A convolutional neural network 908 is used to convolve the target detection image 904 to extract the second image features 912 of each traffic individual in the target detection image. An attention mechanism 914 is used to process each of the first image features 910 to obtain attention-enhanced first image features. The same attention mechanism 916 is used to process each of the second image features 912 to obtain attention-enhanced second image features. Cross-correlation processing is performed on the attention-enhanced first image features and the attention-enhanced second image features to obtain the similarity features 918 between them. A classification network 920 classifies each similarity feature 918 to obtain a classification confidence map 924, and a regression network 922 regresses each similarity feature 918 to obtain a location regression value 926. Based on the classification confidence map and the location regression value, the second location feature of each traffic individual is determined; based on the second location feature, the second movement trajectory point of each traffic individual is determined.
[0164] In one embodiment, the training steps of the adaptive decision model include: acquiring multiple training images; performing multi-target tracking on each training image to obtain a first training trajectory point; performing single-target tracking on each training image to obtain a second training trajectory point; during the current training round, for each training image in the current round, selecting a target training trajectory point from the first and second training trajectory points using the adaptive decision model to be trained, and determining the overlap rate between the target training trajectory point and the real training trajectory point of the training image; accumulating the overlap rates of all training images in the current round to obtain the cumulative overlap rate of the current round; optimizing the adaptive decision model to be trained by maximizing the cumulative overlap rate, and returning to execute the training process of the next round until a preset stopping condition is met to stop training, thereby obtaining a trained adaptive decision model.
[0165] Training images are used to train the adaptive decision model. The first training trajectory point is the trajectory point obtained by multi-target tracking of the training image. The second training trajectory point is the trajectory point obtained by single-target tracking of the training image. The target training trajectory point is the trajectory point selected from the first and second training trajectory points. The real training trajectory point is the actual trajectory point in the training image. Real training trajectory points have high accuracy; they can be labeled by the user or by a trained trajectory point detection model, without limitation.
[0166] The overlap rate is the Intersection over Union (IoU), a metric used to measure the accuracy of object detection on a given dataset. IoU is a simple metric applicable to any task whose output yields a predicted bounding box. The cumulative overlap rate is the value obtained by accumulating all overlap rates over a single training round. A higher cumulative overlap rate indicates that the adaptive decision model is more accurate in selecting better trajectory points.
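For reference, the overlap rate and its per-round accumulation can be computed as in the sketch below. It assumes that each trajectory point is represented by an axis-aligned bounding box `(x1, y1, x2, y2)`; the text does not fix a representation, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the intersection are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cumulative_overlap(selected_boxes, real_boxes):
    """Sum the overlap rates of all training images in one round."""
    return sum(iou(s, r) for s, r in zip(selected_boxes, real_boxes))

partial = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
total = cumulative_overlap([(0, 0, 2, 2), (0, 0, 2, 2)],
                           [(0, 0, 2, 2), (1, 1, 3, 3)])
```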
[0167] The preset stopping condition can be set as needed. For example, the preset stopping condition could be that the training epochs of the adaptive decision model have reached a preset number, or that the cumulative overlap rate has reached a preset cumulative overlap rate, etc., and there are no restrictions here.
[0168] After obtaining the cumulative overlap rate in each round, the computer device can use a policy gradient descent algorithm to optimize the adaptive decision model so that the cumulative overlap rate is maximized, thereby obtaining the trained adaptive decision model. The policy gradient descent algorithm is an optimization algorithm that seeks a minimum by moving along the gradient descent direction; maximizing the cumulative overlap rate is equivalent to minimizing its negative.
[0169] In this embodiment, the computer device trains the adaptive decision model using training images and calculates the cumulative overlap rate of the adaptive decision model in each round. By maximizing the cumulative overlap rate to optimize the adaptive decision model to be trained, a more accurate adaptive decision model can be obtained. The trained adaptive decision model can more accurately determine the target movement trajectory point of each traffic individual tracked in the traffic image during the application (test) process.
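The training loop can be illustrated with a deliberately small toy model. Everything here is an assumption made for illustration: the adaptive decision model is reduced to a single scalar parameter `theta` that sets the probability of choosing the single-target trajectory point, the overlap rates of both candidates are given in advance, and the policy gradient is evaluated in closed form (the expectation over choices) rather than estimated from sampled choices. The sketch performs gradient descent on the negative expected cumulative overlap rate, which maximizes the cumulative overlap rate.

```python
import numpy as np

def train_selector(iou_first, iou_second, rounds=50, lr=0.5):
    """Train a one-parameter selection policy.

    iou_first / iou_second: overlap rate of the multi-target and
    single-target trajectory point for each training image.
    Returns the learned parameter and the expected cumulative overlap
    rate recorded at the start of every round.
    """
    iou_first = np.asarray(iou_first, dtype=float)
    iou_second = np.asarray(iou_second, dtype=float)
    theta = 0.0          # p_second = 0.5 initially
    history = []
    for _ in range(rounds):
        p_second = 1.0 / (1.0 + np.exp(-theta))
        expected = np.sum((1.0 - p_second) * iou_first + p_second * iou_second)
        history.append(float(expected))
        # Loss = -expected; d(loss)/d(theta) uses d(p)/d(theta) = p * (1 - p).
        grad_loss = -p_second * (1.0 - p_second) * np.sum(iou_second - iou_first)
        theta -= lr * grad_loss   # gradient descent step on the loss
    return theta, history

theta, history = train_selector([0.4, 0.5, 0.3], [0.8, 0.7, 0.9])
```

In this toy data the single-target tracker is better on every image, so the learned policy shifts toward it and the expected cumulative overlap rate rises from round to round. A stopping condition based on a target cumulative overlap rate could replace the fixed number of rounds.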
[0170] Figure 10 is a framework diagram for traffic flow prediction in one embodiment. For each traffic video containing traffic images from multiple historical moments, the computer device performs feature extraction on the traffic images to obtain image features; performs target detection on the traffic images to obtain quantity features of each type of traffic object; and performs target tracking on the traffic images to obtain traffic flow features. Specifically, target tracking on the traffic images to obtain traffic flow features includes: performing multi-target tracking on the traffic images to obtain the first trajectory point of each tracked traffic individual in the traffic images; performing single-target tracking on the traffic images to obtain the second trajectory point of each tracked traffic individual in the traffic images; using an adaptive decision model to determine the target trajectory point of each tracked traffic individual in the traffic images from the first and second trajectory points; adding the target position features of the target trajectory points in the traffic images to the tracking queue, so that the target trajectory points corresponding to the previous traffic images in the tracking queue, together with the target trajectory points corresponding to the current traffic image, constitute the target trajectory of each traffic individual; and determining the traffic flow features of the traffic images based on the number and direction of the target trajectories of the traffic individuals in the traffic images.
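The tracking queue and the derivation of flow features from trajectory count and direction can be sketched as below. The per-individual `deque`, the four coarse direction labels, and the rule of comparing the first and last queued points are illustrative assumptions; the text only states that the number and direction of the target trajectories determine the flow features.

```python
from collections import defaultdict, deque

def new_tracks(maxlen=30):
    """One bounded queue of (x, y) points per tracked traffic individual."""
    return defaultdict(lambda: deque(maxlen=maxlen))

def update_tracks(tracks, frame_points):
    """Append the target trajectory points of the current frame.

    frame_points maps an individual's id to its (x, y) position.
    """
    for track_id, point in frame_points.items():
        tracks[track_id].append(point)
    return tracks

def flow_features(tracks):
    """Count trajectories per coarse direction (image coordinates, y grows downward)."""
    counts = {"left": 0, "right": 0, "up": 0, "down": 0}
    for points in tracks.values():
        if len(points) < 2:
            continue  # a single point has no direction yet
        dx = points[-1][0] - points[0][0]
        dy = points[-1][1] - points[0][1]
        if abs(dx) >= abs(dy):
            counts["right" if dx >= 0 else "left"] += 1
        else:
            counts["down" if dy >= 0 else "up"] += 1
    return counts

tracks = new_tracks()
for frame in [{1: (0, 0), 2: (5, 5)}, {1: (3, 1), 2: (5, 1)}, {1: (6, 1), 3: (0, 0)}]:
    update_tracks(tracks, frame)
counts = flow_features(tracks)
```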
[0171] For each frame of traffic image, the computer device fuses the quantity features, flow features, and traffic image features corresponding to the traffic image to obtain a traffic feature vector corresponding to the traffic image. Based on the traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical time, a traffic flow map for the corresponding historical time is generated. The traffic flow map is processed using a channel attention mechanism to obtain a channel attention-enhanced traffic flow map. Based on the channel attention-enhanced traffic flow maps corresponding to each historical time, the predicted traffic flow for at least one traffic detection point is determined.
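A minimal sketch of the fusion and map-building steps follows. It assumes that fusion means concatenating the three feature groups, and that a traffic flow map can be represented as a node feature matrix (one row per traffic detection point) together with an adjacency matrix describing which detection points are connected. Both representations, and the names `fuse_features` and `build_flow_map`, are assumptions for illustration.

```python
import numpy as np

def fuse_features(quantity, flow, image_feat):
    """Fuse the three feature groups of one traffic image into one vector.

    Assumption: fusion is plain concatenation.
    """
    return np.concatenate([quantity, flow, image_feat])

def build_flow_map(feature_vectors, adjacency):
    """Build the traffic flow map for one historical moment.

    feature_vectors: one fused vector per traffic detection point.
    adjacency: (N, N) matrix, nonzero where two detection points are linked.
    """
    node_features = np.stack(feature_vectors)          # shape (N, F)
    adjacency = np.asarray(adjacency, dtype=float)     # shape (N, N)
    if adjacency.shape != (len(feature_vectors),) * 2:
        raise ValueError("adjacency must be N x N")
    return node_features, adjacency

quantity = np.array([3.0, 1.0])             # e.g. counts per object category
flow = np.array([2.0, 0.0, 1.0, 0.0])       # e.g. trajectories per direction
image_feat = np.array([0.1, 0.2, 0.3])      # e.g. pooled CNN features
vector = fuse_features(quantity, flow, image_feat)
node_features, adjacency = build_flow_map([vector, vector], [[0, 1], [1, 0]])
```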
[0172] In one embodiment, determining the predicted traffic flow for at least one traffic detection point based on the traffic flow maps corresponding to each historical time includes: superimposing the traffic flow maps corresponding to each historical time to obtain a superimposed traffic flow map; performing graph convolution and temporal convolution on the superimposed traffic flow map to extract spatiotemporal features; inputting the spatiotemporal features into a fully connected layer, and outputting the predicted traffic flow for at least one traffic detection point through the fully connected layer.
[0173] A superimposed (overlay) traffic flow map is a traffic flow map obtained by overlaying the traffic flow maps corresponding to different historical time points. Spatiotemporal features are features that have both a temporal and a spatial dimension. Graph convolution is a convolution operation that propagates, to the next layer, the weighted average of each node's features and the features of its neighboring nodes. Temporal convolution is a convolution operation along the time dimension.
[0174] Computer equipment can extract spatial features from overlay traffic flow maps by performing graph convolution on them; it can also extract temporal features by performing temporal convolution on them. The extracted spatial and temporal features are then fused together to obtain spatiotemporal features.
[0175] Each neuron in a fully connected layer is fully connected to all neurons in the layer preceding it. Fully connected layers can integrate class-discriminating local information from convolutional or pooling layers. By inputting spatiotemporal features into a fully connected layer, a computer device can determine the predicted traffic flow at at least one traffic detection point.
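The three operations can be sketched together as follows. The sketch is a simplification under stated assumptions: graph convolution uses a row-normalized adjacency matrix with self-loops (one common way to realize a weighted average over a node and its neighbors), temporal convolution is a valid 1-D convolution with a shared kernel, and the two are applied one after the other rather than extracted separately and fused. All function names, shapes, and weights are hypothetical.

```python
import numpy as np

def graph_conv(X, A, W):
    """X: (T, N, F) node features per time step; A: (N, N); W: (F, F_out)."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)    # rows sum to 1
    # Average each node with its neighbors, then apply the weight matrix.
    return np.einsum("ij,tjf,fo->tio", A_norm, X, W)

def temporal_conv(X, kernel):
    """Valid 1-D convolution along the time axis; returns (T - K + 1, N, F)."""
    K = len(kernel)
    steps = X.shape[0] - K + 1
    return np.stack([np.tensordot(kernel, X[t:t + K], axes=(0, 0))
                     for t in range(steps)])

def predict_flow(X, A, W_graph, kernel, W_fc, b_fc):
    """Graph convolution, temporal convolution, then a fully connected layer."""
    spatial = graph_conv(X, A, W_graph)
    spatiotemporal = temporal_conv(spatial, kernel)
    # Flatten time and feature axes so each detection point has one vector.
    per_node = spatiotemporal.transpose(1, 0, 2).reshape(spatiotemporal.shape[1], -1)
    return per_node @ W_fc + b_fc                         # one value per detection point

X = np.ones((4, 3, 2))                                    # 4 moments, 3 points, 2 features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
prediction = predict_flow(X, A, np.eye(2), np.array([0.5, 0.5]), np.ones(6), 0.0)
```

With all-ones inputs and identity weights, each stage preserves the value 1, so the fully connected layer simply sums six ones per detection point.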
[0176] Furthermore, after obtaining the overlay traffic flow map, the process also includes: the computer device performing channel attention mechanism processing on the overlay traffic flow map to obtain an overlay traffic flow map enhanced by channel attention; and performing graph convolution and temporal convolution on the overlay traffic flow map to extract spatiotemporal features, including: performing graph convolution and temporal convolution on the overlay traffic flow map enhanced by channel attention to extract spatiotemporal features.
[0177] The computer equipment uses a channel attention mechanism to process the overlay traffic flow map, which can enhance the representation ability of the traffic flow prediction network, while also pruning and compressing the traffic flow prediction network.
[0178] In this embodiment, the computer device overlays the traffic flow maps corresponding to each historical time point to obtain an overlay traffic flow map. The overlay traffic flow map includes information on each detection point in both the spatial and temporal dimensions. Therefore, graph convolution and temporal convolution can be performed on the overlay traffic flow map to capture the correlation and dependence between traffic flows at each traffic detection point. Then, the spatiotemporal features are input into the fully connected layer to accurately output the predicted traffic flow for at least one traffic detection point.
[0179] Figure 11 is a flowchart of a traffic flow prediction method in another embodiment. For each traffic video 1102, which includes traffic images from multiple historical moments, the computer device uses a convolutional neural network 1104 to perform convolution processing on the traffic images and extract traffic image features 1106. For each frame of traffic image, the quantity features 1108, flow features 1110, and traffic image features 1106 corresponding to that traffic image are fused to obtain the traffic feature vector 1112 of that traffic image.
[0180] The computer device overlays the traffic flow maps corresponding to each historical moment to obtain an overlay traffic flow map 1114; graph convolution and time-dimensional convolution are applied to the overlay traffic flow map 1114 to extract spatiotemporal features; the spatiotemporal features are input into a fully connected layer 1118, and the predicted traffic flow 1120 of at least one traffic detection point is output through the fully connected layer 1118.
[0181] In one embodiment, the traffic flow maps corresponding to each historical time point are superimposed to obtain a superimposed traffic flow map, including: obtaining at least two periods; for each period, superimposing the traffic flow maps corresponding to each historical time point within the period to obtain a periodic traffic flow map; and superimposing the periodic traffic flow maps corresponding to each period to obtain a superimposed traffic flow map.
[0182] A periodic traffic flow map is a traffic flow map obtained by overlaying the traffic flow maps corresponding to various historical moments within a period. The period can be set as needed. For example, the period can be one hour, one day, one week, one month, etc., and is not limited to these.
[0183] For each period, the computer device overlays the traffic flow maps corresponding to each historical moment within the period to obtain the periodic traffic flow map of that period. Then, by overlaying the periodic traffic flow maps of all periods, the superimposed traffic flow map is obtained. This superimposed traffic flow map captures the correlation between periods, so the spatiotemporal features extracted from it also reflect that correlation, which allows the predicted traffic flow at at least one traffic detection point to be determined more accurately.
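One way to picture the two-level overlay is as a reshape of a time-ordered stack of traffic flow maps. The sketch below assumes that overlaying means stacking along a new axis, that all periods contain the same number of historical moments, and that each map is an (N, F) node feature matrix; the function name and shapes are illustrative.

```python
import numpy as np

def overlay_by_period(flow_maps, period_len):
    """Group time-ordered traffic flow maps into periods.

    flow_maps: array of shape (T, N, F), one map per historical moment.
    Returns an array of shape (P, period_len, N, F): axis 1 holds the
    periodic traffic flow map of each period, axis 0 stacks the periods.
    """
    T, N, F = flow_maps.shape
    if T % period_len != 0:
        raise ValueError("T must be a multiple of period_len")
    n_periods = T // period_len
    if n_periods < 2:
        raise ValueError("at least two periods are required")
    return flow_maps.reshape(n_periods, period_len, N, F)

# Six moments, two detection points, one feature; three moments per period.
flow_maps = np.arange(12, dtype=float).reshape(6, 2, 1)
overlay = overlay_by_period(flow_maps, period_len=3)
```

Keeping the period axis separate lets later layers compare the same moment across periods, for example the same hour on consecutive days.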
[0184] In one embodiment, a traffic flow prediction method is provided, applied to a computer device, comprising the following steps:
[0185] Step 1: Obtain traffic videos collected at at least one traffic detection point.
[0186] Step 2: For each traffic video containing traffic images from multiple historical moments, extract image pyramid features from the traffic images.
[0187] Step 3: Process each layer of the image pyramid features using the channel attention mechanism to obtain the corresponding channel pyramid features; process each layer of the image pyramid features using the spatial attention mechanism to obtain the corresponding spatial pyramid features; and superimpose the image pyramid features, channel pyramid features, and spatial pyramid features to obtain the attention-enhanced image pyramid features.
[0188] Step 4: Perform a convolution operation on the attention-enhanced image pyramid features to generate candidate bounding boxes for each individual traffic entity included in the traffic image.
[0189] Step 5: For each candidate box, identify the category of traffic object to which the traffic individual in the candidate box belongs, and determine the corresponding category confidence; select candidate boxes with a category confidence higher than the confidence threshold as target boxes; count the category of traffic object to which the traffic individual in each target box belongs, and determine the quantity feature of each category of traffic object included in the traffic image based on the target boxes corresponding to traffic individuals belonging to the same traffic object.
[0190] Step 6: For traffic images from multiple historical moments included in each traffic video, extract image pyramid features from the traffic images; the image pyramid features include at least two layers of features with progressively increasing scales; for each layer of features in the image pyramid features except the largest-scale feature, upsample the feature to obtain the corresponding upsampled feature, and fuse the upsampled feature with the feature of the previous scale corresponding to that layer through skip connections to obtain the first position feature of each tracked traffic individual in the traffic image; determine the first movement trajectory point of each traffic individual based on the first position feature.
[0191] Step 7: For traffic images from multiple historical moments included in each traffic video, acquire target detection images obtained by target detection based on the traffic images; perform convolution processing on the traffic images and target detection images respectively to extract the first image features of each traffic individual in the traffic images and the second image features of each traffic individual in the target detection images; use an attention mechanism to process each first image feature and each second image feature respectively to obtain attention-enhanced first image features and attention-enhanced second image features; perform cross-correlation processing on the attention-enhanced first image features and attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and attention-enhanced second image features; perform classification processing on each similarity feature to obtain a classification confidence map; perform regression processing on each similarity feature to obtain a location regression value, and determine the second location feature of each traffic individual based on the classification confidence map and the location regression value; determine the second movement trajectory point of each traffic individual based on the second location feature.
[0192] Step 8: Using an adaptive decision model, determine the target trajectory point of each tracked traffic individual in the traffic image from the first and second trajectory points. The training steps of the adaptive decision model include: acquiring multiple training images; performing multi-target tracking on each training image to obtain a first training trajectory point; performing single-target tracking on each training image to obtain a second training trajectory point; during the current training round, for each training image in the current round, selecting a target training trajectory point from the first and second training trajectory points using the adaptive decision model to be trained, and determining the overlap rate between the target training trajectory point and the real training trajectory point of the training image; accumulating the overlap rates of all training images in the current round to obtain the cumulative overlap rate for the current round; optimizing the adaptive decision model to be trained by maximizing the cumulative overlap rate, and returning to execute the next round of training until a preset stopping condition is met, at which point training stops, resulting in a trained adaptive decision model.
[0193] Step 9: Add the target location features of the target movement trajectory points in the traffic image to the tracking queue, so that the target movement trajectory points corresponding to the previous traffic images in the tracking queue, together with the target movement trajectory points corresponding to the current traffic image, constitute the target movement trajectory of each traffic individual; determine the traffic flow characteristics of the traffic image based on the number and direction of the target movement trajectories of each traffic individual in the traffic image.
[0194] Step 10: For each traffic video containing traffic images from multiple historical moments, extract traffic image features from the traffic images.
[0195] Step 11: For each frame of traffic image, fuse the quantity features, flow features and traffic image features corresponding to the traffic image to obtain the traffic feature vector corresponding to the traffic image.
[0196] Step 12: Generate a traffic flow map for the corresponding historical time based on the traffic feature vector corresponding to the traffic image of at least one traffic detection point at the same historical time.
[0197] Step 13: Obtain at least two periods; for each period, overlay the traffic flow maps corresponding to each historical moment within the period to obtain a periodic traffic flow map; overlay the periodic traffic flow maps corresponding to each period to obtain a superimposed traffic flow map.
[0198] Step 14: Perform graph convolution and time-dimensional convolution on the overlay traffic flow map to extract spatiotemporal features; input the spatiotemporal features into the fully connected layer, and output the predicted traffic flow for at least one traffic detection point through the fully connected layer.
[0199] In this embodiment, traffic videos collected at at least one traffic detection point are acquired. For each traffic video containing traffic images from multiple historical moments, the quantity features, flow features, and traffic image features of each type of traffic object included in the traffic image are accurately extracted. For each frame of traffic image, the quantity features, flow features, and traffic image features corresponding to the corresponding traffic image are fused to obtain a traffic feature vector corresponding to the corresponding traffic image. Based on the traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical moment, a traffic flow map for the corresponding historical moment is generated. It can be seen that the traffic flow map contains both spatial information and temporal information for each traffic detection point. Therefore, based on the traffic flow maps corresponding to each historical moment, the correlation between the features at each historical moment in the temporal dimension and the spatial dimension can be obtained, thereby more accurately determining the predicted traffic flow for at least one traffic detection point.
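As an illustration of Step 5 above, the sketch below filters candidate boxes by category confidence and counts the remaining target boxes per category. The tuple layout `(category, confidence, box)`, the category names, and the threshold value are hypothetical.

```python
from collections import Counter

def quantity_features(candidates, categories, conf_thresh=0.5):
    """Count target boxes per traffic object category.

    candidates: iterable of (category, confidence, box) tuples.
    categories: fixed category order of the output feature.
    """
    # Candidate boxes above the confidence threshold become target boxes.
    counts = Counter(cat for cat, conf, _box in candidates if conf > conf_thresh)
    return [counts.get(cat, 0) for cat in categories]

candidates = [
    ("car", 0.9, (0, 0, 10, 10)),
    ("car", 0.4, (5, 5, 15, 15)),     # below the threshold, discarded
    ("bus", 0.8, (20, 0, 40, 12)),
    ("car", 0.7, (50, 8, 60, 18)),
]
quantities = quantity_features(candidates, ["car", "bus", "pedestrian"])
```

Fixing the category order keeps the quantity feature at the same length for every traffic image, which makes it straightforward to fuse with the flow and image features in Step 11.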
[0200] This application also provides an application scenario in which the above-described traffic flow prediction method is applied. Specifically, the traffic flow prediction method is applied in this scenario as follows:
[0201] Computer equipment acquires traffic videos collected by cameras installed at various traffic monitoring points. It can fully extract features from multiple historical traffic images collected at each monitoring point, as well as the correlation and dependence of these features in the temporal and spatial dimensions. This allows for accurate determination of the predicted traffic flow at at least one monitoring point. Effective traffic flow prediction improves the operational efficiency of transportation networks, reduces safety hazards, and facilitates rational travel planning for passengers, thus holding significant importance for the development of intelligent transportation systems.
[0202] This application also provides another application scenario in which the above-described traffic flow prediction method is applied. Specifically, the traffic flow prediction method is applied in this scenario as follows:
[0203] Computer equipment acquires traffic videos collected from various traffic detection points within a historical period. It can fully extract the features of each traffic image within the historical period, as well as the correlation and dependence of each feature in the time and spatial dimensions. This allows for the accurate determination of the predicted traffic flow of at least one traffic detection point in the future period corresponding to the historical period.
[0204] For example, if computer equipment acquires traffic videos collected from various traffic monitoring points on the previous day, it can accurately determine the predicted traffic flow for at least one traffic monitoring point on the next day.
[0205] For example, if computer equipment acquires traffic videos collected from various traffic monitoring points on the previous Friday, it can accurately determine the predicted traffic flow for at least one traffic monitoring point on the following Friday.
[0206] It should be understood that, although the steps in the flowcharts of Figure 2, Figure 4, Figure 5, Figure 7, Figure 9 and Figure 11 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise specified herein, there is no strict order in which these steps are executed, and they can be performed in other orders. Moreover, at least some of the steps in Figure 2, Figure 4, Figure 5, Figure 7, Figure 9 and Figure 11 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times. Their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
[0207] In one embodiment, as shown in Figure 12, a traffic flow prediction device is provided. This device can employ software modules, hardware modules, or a combination of both as part of a computer device. Specifically, the device includes: an acquisition module 1202, a determination module 1204, a fusion module 1206, and a generation module 1208, wherein:
[0208] The acquisition module 1202 is used to acquire traffic videos collected at at least one traffic detection point.
[0209] The determination module 1204 is used to determine the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics of the traffic image for each traffic video that includes multiple historical traffic images.
[0210] The fusion module 1206 is used to fuse, for each frame of traffic image, the quantity features, flow features and traffic image features corresponding to that traffic image, so as to obtain the traffic feature vector of that traffic image.
[0211] The generation module 1208 is used to generate a traffic flow map for the corresponding historical time based on the traffic feature vector corresponding to the traffic image of at least one traffic detection point at the same historical time.
[0212] The determination module 1204 is also used to determine the predicted traffic flow at at least one traffic detection point based on the traffic flow maps corresponding to each historical time.
[0213] The aforementioned traffic flow prediction device acquires traffic videos collected at at least one traffic detection point. For each traffic video containing traffic images from multiple historical moments, it determines the quantity characteristics, flow characteristics, and traffic image characteristics of each type of traffic object included in the traffic image. For each frame of traffic image, it fuses the quantity characteristics, flow characteristics, and traffic image characteristics corresponding to the corresponding traffic image to obtain a traffic feature vector corresponding to the corresponding traffic image. Based on the traffic feature vectors corresponding to the traffic images of at least one traffic detection point at the same historical moment, it generates a traffic flow map for the corresponding historical moment. It can be seen that the traffic flow map contains both spatial information and temporal information for each traffic detection point. Therefore, based on the traffic flow maps corresponding to each historical moment, the correlation between the features at each historical moment in the temporal dimension and the spatial dimension can be obtained, thereby more accurately determining the predicted traffic flow at at least one traffic detection point.
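By way of illustration only, the fusion and map-generation steps summarized above may be sketched as follows. The sketch assumes that fusion is plain concatenation and that a traffic flow map is a graph with one node per traffic detection point; the names `fuse_features`, `TrafficFlowMap`, `build_flow_map`, the point identifiers and the feature values are all hypothetical and do not come from this application.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

FeatureGroups = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def fuse_features(quantity: Sequence[float],
                  flow: Sequence[float],
                  image: Sequence[float]) -> List[float]:
    """Fuse the three feature groups by concatenation (an assumption)."""
    return [*quantity, *flow, *image]


@dataclass
class TrafficFlowMap:
    """Graph for one historical moment: one node per detection point."""
    moment: str
    node_features: Dict[str, List[float]]
    edges: List[Tuple[str, str]]


def build_flow_map(moment: str,
                   features_by_point: Dict[str, FeatureGroups],
                   edges: Sequence[Tuple[str, str]]) -> TrafficFlowMap:
    """Fuse the features of every detection point observed at the same moment."""
    nodes = {point: fuse_features(*groups)
             for point, groups in features_by_point.items()}
    return TrafficFlowMap(moment=moment, node_features=nodes, edges=list(edges))


# Hypothetical features for two detection points at the same historical moment.
flow_map = build_flow_map(
    "t0",
    {
        "point_a": ([3, 1], [2, 0], [0.5, 0.25]),
        "point_b": ([0, 4], [1, 1], [0.75, 0.125]),
    },
    edges=[("point_a", "point_b")],
)
```

The edges carry the spatial relationship between detection points, while one such map per historical moment carries the temporal relationship.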
[0214] In one embodiment, the determining module 1204 is further configured to extract image pyramid features from the traffic image; perform attention mechanism processing on the image pyramid features to obtain attention-enhanced image pyramid features; perform convolution operation on the attention-enhanced image pyramid features to generate candidate boxes containing each traffic individual included in the traffic image; and determine the quantity features corresponding to the corresponding category of traffic objects based on the candidate boxes containing all traffic individuals belonging to the same category of traffic objects.
[0215] In one embodiment, the determining module 1204 is further configured to, for each candidate box, identify the category of traffic object to which the traffic individual in the candidate box belongs, and determine the corresponding category confidence; filter out candidate boxes with category confidence higher than the confidence threshold as target boxes; count the category of traffic object to which the traffic individual in each target box belongs, and determine the quantity feature of each type of traffic object included in the traffic image based on the target boxes corresponding to traffic individuals belonging to the same traffic object.
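By way of illustration only, the confidence filtering and per-category counting described above may be sketched as follows. The `Candidate` structure, the category names and the threshold value are hypothetical; the detector that produces the candidate boxes is outside the scope of the sketch.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One candidate box with its predicted category and category confidence."""
    box: Tuple[float, float, float, float]
    category: str
    confidence: float


def count_traffic_objects(candidates: Iterable[Candidate],
                          threshold: float) -> Dict[str, int]:
    """Keep boxes whose confidence exceeds the threshold, then count per category."""
    targets = [c for c in candidates if c.confidence > threshold]
    return dict(Counter(c.category for c in targets))


candidates = [
    Candidate((0, 0, 10, 10), "car", 0.92),
    Candidate((5, 5, 20, 20), "car", 0.40),          # filtered out
    Candidate((30, 30, 50, 60), "pedestrian", 0.81),
    Candidate((60, 10, 90, 40), "car", 0.75),
]
counts = count_traffic_objects(candidates, threshold=0.5)
```

The resulting per-category counts play the role of the quantity feature of the traffic image.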
[0216] In one embodiment, the determining module 1204 is further configured to perform channel attention mechanism processing on each layer of features in the image pyramid features to obtain corresponding channel pyramid features; perform spatial attention mechanism processing on each layer of features in the image pyramid features to obtain corresponding spatial pyramid features; and superimpose the image pyramid features, channel pyramid features and spatial pyramid features to obtain attention-enhanced image pyramid features.
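By way of illustration only, the superposition of image pyramid features, channel pyramid features and spatial pyramid features may be sketched as follows. The sketch is a parameter-free stand-in: it assumes that each attention weight is a sigmoid of an average, and that superposition is an element-wise sum. A practical attention mechanism would normally use learned layers, which are not specified here; all function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def _sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def channel_attention(feat: FeatureMap) -> FeatureMap:
    """Scale each channel by a weight derived from its global average."""
    out = []
    for channel in feat:
        mean = sum(map(sum, channel)) / (len(channel) * len(channel[0]))
        weight = _sigmoid(mean)
        out.append([[weight * v for v in row] for row in channel])
    return out


def spatial_attention(feat: FeatureMap) -> FeatureMap:
    """Scale each location by a weight derived from its cross-channel average."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    weights = [[_sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
                for j in range(w)] for i in range(h)]
    return [[[weights[i][j] * feat[k][i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def attention_enhance(pyramid: List[FeatureMap]) -> List[FeatureMap]:
    """Sum original, channel-attended and spatially attended features per layer."""
    enhanced = []
    for feat in pyramid:
        ch, sp = channel_attention(feat), spatial_attention(feat)
        enhanced.append([[[x + a + b for x, a, b in zip(rx, ra, rb)]
                          for rx, ra, rb in zip(cx, ca, cb)]
                         for cx, ca, cb in zip(feat, ch, sp)])
    return enhanced


# Two pyramid layers of different scales, one channel each.
pyramid = [[[[0.0, 0.0], [0.0, 0.0]]], [[[1.0]]]]
enhanced = attention_enhance(pyramid)
```

Each layer keeps its own shape, so the enhanced pyramid can be passed to the same convolution stage as the original pyramid.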
[0217] In one embodiment, the determining module 1204 is further configured to perform target tracking on the traffic image to obtain the target movement trajectory point of each traffic individual tracked in the traffic image; add the target location features of the target movement trajectory points in the traffic image to the tracking queue, so that the previous target movement trajectory points corresponding to the previous traffic image in the tracking queue, together with the target movement trajectory points corresponding to the current traffic image, constitute the target movement trajectory of each traffic individual; and determine the traffic flow characteristics of the traffic image based on the number and direction of the target movement trajectories of each traffic individual in the traffic image.
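By way of illustration only, the tracking queue and the derivation of flow characteristics from the number and direction of trajectories may be sketched as follows. The sketch assumes that each trajectory point is an (x, y) pixel position with y growing downward, and that direction is reduced to four coarse classes; `TrackingQueue`, `trajectory_direction`, `flow_feature` and the queue length are hypothetical.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class TrackingQueue:
    """Keeps the most recent trajectory points of every tracked traffic individual."""

    def __init__(self, max_len: int = 30) -> None:
        self._tracks = defaultdict(lambda: deque(maxlen=max_len))

    def add(self, points: Dict[int, Point]) -> None:
        """Append the points of the current traffic image, keyed by individual id."""
        for track_id, point in points.items():
            self._tracks[track_id].append(point)

    def trajectories(self) -> Dict[int, List[Point]]:
        return {track_id: list(q) for track_id, q in self._tracks.items()}


def trajectory_direction(trajectory: List[Point]) -> str:
    """Coarse direction from the first to the last point (image coordinates)."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return "stationary"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def flow_feature(queue: TrackingQueue) -> Dict[str, object]:
    """Number of trajectories with at least two points, broken down by direction."""
    usable = [t for t in queue.trajectories().values() if len(t) >= 2]
    by_direction = Counter(trajectory_direction(t) for t in usable)
    return {"count": len(usable), "by_direction": dict(by_direction)}


queue = TrackingQueue()
queue.add({1: (0, 0), 2: (10, 10)})   # previous traffic image
queue.add({1: (5, 1), 2: (10, 4)})    # next traffic image
queue.add({1: (12, 2), 3: (7, 7)})    # current traffic image
feature = flow_feature(queue)
```

An individual seen in only one traffic image contributes no direction, which is why it is excluded from the count in this sketch.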
[0218] In one embodiment, the determining module 1204 is further configured to perform multi-target tracking on the traffic image to obtain the first action trajectory point of each tracked traffic individual in the traffic image; perform single-target tracking on the traffic image to obtain the second action trajectory point of each tracked traffic individual in the traffic image; and use an adaptive decision model to determine the target action trajectory point of each tracked traffic individual in the traffic image from the first action trajectory point and the second action trajectory point.
[0219] In one embodiment, the determining module 1204 is further configured to extract image pyramid features from the traffic image; the image pyramid features include at least two layers of features with progressively increasing scales; for each layer of features in the image pyramid features except for the feature with the largest scale, upsampling is performed to obtain the corresponding upsampled features, and the upsampled features are fused with the features of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each traffic individual tracked in the traffic image; the first movement trajectory point of each traffic individual is determined based on the first position feature.
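By way of illustration only, the upsampling and skip-connection fusion described above may be sketched as follows. The sketch assumes single-channel feature maps, nearest-neighbour upsampling by a factor of two, element-wise addition as the fusion operation, and layers ordered from the smallest scale to the largest; none of these choices is specified by this application, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(feat: Grid) -> Grid:
    """Nearest-neighbour upsampling: every value becomes a 2x2 block."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_skip(upsampled: Grid, skip: Grid) -> Grid:
    """Fuse the upsampled feature with the larger-scale feature by addition."""
    if len(upsampled) != len(skip) or len(upsampled[0]) != len(skip[0]):
        raise ValueError("feature maps must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(upsampled, skip)]


def fuse_pyramid(layers: List[Grid]) -> Grid:
    """Walk from the smallest scale to the largest, fusing at every step."""
    fused = layers[0]
    for larger in layers[1:]:
        fused = fuse_skip(upsample2x(fused), larger)
    return fused


layers = [
    [[1.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0] * 4 for _ in range(4)],
]
position_feature = fuse_pyramid(layers)
```

The output has the resolution of the largest-scale layer, from which a position for each tracked traffic individual could then be read.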
[0220] In one embodiment, the determining module 1204 is further configured to acquire a target detection image obtained by target detection based on a traffic image; perform convolution processing on the traffic image and the target detection image respectively to extract the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image; process each first image feature and each second image feature using an attention mechanism to obtain attention-enhanced first image features and attention-enhanced second image features; perform cross-correlation processing on the attention-enhanced first image features and attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and attention-enhanced second image features; and obtain the second movement trajectory point of each tracked traffic individual in the traffic image based on each similarity feature.
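By way of illustration only, the cross-correlation that yields the similarity features may be sketched as follows. The sketch assumes single-channel feature maps, treats the feature extracted from the target detection image as a template that is slid over the feature extracted from the traffic image, and omits the convolution and attention stages; the function names and values are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def cross_correlate(search: Grid, template: Grid) -> Grid:
    """Valid-mode 2D cross-correlation of the template over the search feature."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [[sum(search[i + u][j + v] * template[u][v]
                 for u in range(th) for v in range(tw))
             for j in range(sw - tw + 1)]
            for i in range(sh - th + 1)]


def peak(response: Grid) -> Tuple[int, int]:
    """Row and column of the strongest similarity response."""
    cells = ((i, j) for i in range(len(response)) for j in range(len(response[0])))
    return max(cells, key=lambda p: response[p[0]][p[1]])


# Feature from the traffic image (search) and from the target detection image (template).
search_feature = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
template_feature = [
    [1.0, 0.0],
    [0.0, 1.0],
]
similarity = cross_correlate(search_feature, template_feature)
best = peak(similarity)
```

The location of the strongest response indicates where the tracked individual most likely appears in the traffic image.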
[0221] In one embodiment, the determining module 1204 is further configured to classify each similarity feature to obtain a classification confidence map; perform regression processing on each similarity feature to obtain a location regression value; determine the second location feature of each traffic individual based on the classification confidence map and the location regression value; and determine the second movement trajectory point of each traffic individual based on the second location feature.
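By way of illustration only, combining the classification confidence map with the location regression output may be sketched as follows. The sketch assumes that the regression output holds one (dx, dy) offset per cell, expressed in cell units, and that a fixed stride maps cells back to pixels; `decode_location` and the stride value are hypothetical.

```python
from typing import List, Tuple

ConfidenceMap = List[List[float]]
OffsetMap = List[List[Tuple[float, float]]]


def decode_location(confidence: ConfidenceMap,
                    regression: OffsetMap,
                    stride: float) -> Tuple[Tuple[float, float], float]:
    """Pick the most confident cell and refine it with its regressed offset."""
    cells = ((i, j) for i in range(len(confidence)) for j in range(len(confidence[0])))
    row, col = max(cells, key=lambda p: confidence[p[0]][p[1]])
    dx, dy = regression[row][col]
    location = ((col + dx) * stride, (row + dy) * stride)
    return location, confidence[row][col]


confidence = [
    [0.1, 0.2],
    [0.9, 0.3],
]
regression = [
    [(0.0, 0.0), (0.0, 0.0)],
    [(0.5, -0.25), (0.0, 0.0)],
]
location, score = decode_location(confidence, regression, stride=8)
```

The decoded location plays the role of the second location feature, from which the second movement trajectory point is taken.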
[0222] In one embodiment, the apparatus further includes a training module, configured to: acquire multiple training images; perform multi-target tracking on each training image to obtain a first training trajectory point; perform single-target tracking on each training image to obtain a second training trajectory point; during the current training round, for each training image in the current round, select a target training trajectory point from the first and second training trajectory points using the adaptive decision model to be trained, and determine the overlap rate between the target training trajectory point and the real training trajectory point of the training image; accumulate the overlap rates of all training images in the current round to obtain the cumulative overlap rate of the current round; and optimize the adaptive decision model to be trained by maximizing the cumulative overlap rate, then proceed to the next training round, until a preset stopping condition is met and training stops, thereby obtaining the trained adaptive decision model.
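By way of illustration only, the overlap rate, its accumulation over one training round, and the selection of a model that maximizes the cumulative overlap rate may be sketched as follows. The sketch makes several assumptions that are not stated in this application: each trajectory point is represented by a bounding box, the overlap rate is intersection over union, the adaptive decision model is reduced to a single threshold on a hypothetical `mot_score`, and the optimization is a grid search standing in for whatever optimizer is actually used.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlap_rate(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cumulative_overlap(threshold: float, samples: List[Dict]) -> float:
    """Sum of overlap rates when the threshold model picks one tracker per image."""
    total = 0.0
    for s in samples:
        # Toy decision rule: trust multi-target tracking when its score is high enough.
        chosen = s["mot_box"] if s["mot_score"] >= threshold else s["sot_box"]
        total += overlap_rate(chosen, s["true_box"])
    return total


def train_threshold(samples: List[Dict], candidates: Sequence[float]) -> float:
    """Grid search for the threshold that maximizes the cumulative overlap rate."""
    return max(candidates, key=lambda t: cumulative_overlap(t, samples))


samples = [
    {"mot_score": 0.9, "mot_box": (0, 0, 10, 10),
     "sot_box": (5, 5, 15, 15), "true_box": (0, 0, 10, 10)},
    {"mot_score": 0.2, "mot_box": (20, 20, 30, 30),
     "sot_box": (0, 0, 10, 10), "true_box": (0, 0, 10, 10)},
]
best_threshold = train_threshold(samples, candidates=[0.0, 0.5, 1.0])
```

In this toy data the middle threshold wins because it selects the multi-target result for the first image and the single-target result for the second, which is the behaviour the cumulative overlap objective rewards.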
[0223] In one embodiment, the determining module 1204 is further configured to overlay the traffic flow maps corresponding to each historical time to obtain an overlay traffic flow map; perform graph convolution and time-dimensional convolution on the overlay traffic flow map to extract spatiotemporal features; input the spatiotemporal features into a fully connected layer, and output the predicted traffic flow of at least one traffic detection point through the fully connected layer.
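By way of illustration only, the sequence of graph convolution, time-dimensional convolution and fully connected layer may be sketched as follows. The sketch assumes one scalar feature per detection point, a row-normalized adjacency matrix with self-loops as the graph convolution, a valid-mode 1D convolution over time, and fixed weights in place of trained ones; all names and numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then normalize each row so that it sums to one."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]


def graph_conv(a_norm: Matrix, node_values: Sequence[float]) -> List[float]:
    """Spatial step: each node aggregates itself and its neighbours."""
    return [sum(w * x for w, x in zip(row, node_values)) for row in a_norm]


def temporal_conv(series: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Temporal step: valid-mode 1D convolution along the time axis."""
    k = len(kernel)
    return [sum(kernel[m] * series[t + m] for m in range(k))
            for t in range(len(series) - k + 1)]


def predict_flow(adj: Matrix, stacked: Matrix, kernel: Sequence[float],
                 fc_weights: Sequence[float], fc_bias: float = 0.0) -> List[float]:
    """stacked[t][n] is the feature of detection point n at historical moment t."""
    if len(fc_weights) != len(stacked) - len(kernel) + 1:
        raise ValueError("fc_weights must match the temporal output length")
    a_norm = normalize_adjacency(adj)
    spatial = [graph_conv(a_norm, values) for values in stacked]
    predictions = []
    for node in range(len(adj)):
        series = [spatial[t][node] for t in range(len(stacked))]
        features = temporal_conv(series, kernel)
        predictions.append(sum(w * f for w, f in zip(fc_weights, features)) + fc_bias)
    return predictions


adjacency = [[0.0, 1.0], [1.0, 0.0]]            # two connected detection points
stacked_maps = [[2.0, 4.0], [4.0, 6.0], [6.0, 8.0]]  # three historical moments
predicted = predict_flow(adjacency, stacked_maps,
                         kernel=[0.5, 0.5], fc_weights=[0.5, 0.5])
```

The graph step mixes information across detection points and the temporal step mixes it across historical moments, so the final layer sees features that depend on both dimensions.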
[0224] In one embodiment, the determining module 1204 is further configured to acquire at least two cycles; for each cycle, the traffic flow maps corresponding to each historical moment within the cycle are superimposed to obtain a cycle traffic flow map; and the cycle traffic flow maps corresponding to each cycle are superimposed to obtain a superimposed traffic flow map.
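By way of illustration only, the two-level superposition over cycles may be sketched as follows. The sketch assumes that superposition means stacking along a new axis, first over the historical moments within a cycle and then over the cycles; the cycle names, moment keys and map contents are hypothetical.

```python
from typing import Dict, List, Sequence


def superimpose_by_cycle(maps_by_moment: Dict[str, list],
                         cycles: Dict[str, Sequence[str]]) -> List[List[list]]:
    """Return maps indexed as [cycle][moment within cycle]."""
    # First level: stack the maps of the moments belonging to each cycle.
    cycle_maps = {name: [maps_by_moment[m] for m in moments]
                  for name, moments in cycles.items()}
    # Second level: stack the per-cycle maps in the order the cycles were given.
    return [cycle_maps[name] for name in cycles]


maps_by_moment = {"t1": [1, 2], "t2": [3, 4], "t3": [5, 6], "t4": [7, 8]}
cycles = {"cycle_1": ["t3", "t4"], "cycle_2": ["t1", "t2"]}
superimposed = superimpose_by_cycle(maps_by_moment, cycles)
```

Keeping the cycle axis separate from the moment axis lets later layers treat short-range and long-range history differently if desired.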
[0225] Specific limitations regarding the traffic flow prediction device can be found in the limitations of the traffic flow prediction method described above, and will not be repeated here. Each module in the aforementioned traffic flow prediction device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0226] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 13. The computer device includes a processor, memory, and a network interface connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores data such as traffic video, quantity characteristics, flow characteristics, and traffic image characteristics. The network interface is used for communication with external terminals via a network connection. When executed by the processor, the computer program implements a traffic flow prediction method.
[0227] Those skilled in the art will understand that the structure shown in Figure 13 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0228] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0229] In one embodiment, a computer-readable storage medium is provided storing a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0230] In one embodiment, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, causing the computer device to perform the steps in the above method embodiments.
[0231] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can include the processes of the embodiments of the methods described above. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical storage, etc. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), etc.
[0232] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0233] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.
Claims
1. A traffic flow prediction method, characterized in that the method includes: acquiring traffic videos collected at at least one traffic detection point; for each traffic video containing traffic images from multiple historical moments, determining the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics of the traffic image, wherein determining the flow characteristics of the traffic image includes: performing multi-target tracking on the traffic image to obtain a first action trajectory point for each tracked traffic individual in the traffic image; performing single-target tracking on the traffic image to obtain a second action trajectory point for each tracked traffic individual in the traffic image; using an adaptive decision model to determine the target action trajectory point for each tracked traffic individual in the traffic image from the first and second action trajectory points; adding the target location characteristics of the target action trajectory points in the traffic image to a tracking queue, such that the previous target action trajectory points corresponding to previous traffic images in the tracking queue, together with the target action trajectory points corresponding to the current traffic image, constitute the target action trajectory of each traffic individual; and determining the flow characteristics of the traffic image based on the quantity and trajectory direction of the target action trajectories of each traffic individual in the traffic image; for each frame of traffic image, fusing the quantity features, flow features and traffic image features corresponding to the corresponding traffic image to obtain the traffic feature vector corresponding to the corresponding traffic image; generating, based on the traffic feature vectors corresponding to the traffic images of the at least one traffic detection point at the same historical time, a traffic flow map for the corresponding historical time; and determining, based on the traffic flow maps corresponding to each historical moment, the predicted traffic flow of the at least one traffic detection point.
2. The method according to claim 1, characterized in that, The method for determining the quantity characteristics of each type of traffic object included in the traffic image includes: Image pyramid features are extracted from the traffic images; The image pyramid features are processed using an attention mechanism to obtain attention-enhanced image pyramid features; The attention-enhanced image pyramid features are convolved to generate candidate bounding boxes for each individual traffic element in the traffic image. Based on the candidate frames containing all traffic individuals belonging to the same traffic object category, determine the quantitative features corresponding to the corresponding traffic object category.
3. The method according to claim 2, characterized in that, The step of determining the quantitative features corresponding to the corresponding category of traffic objects based on the candidate boxes containing all traffic individuals belonging to the same category of traffic objects includes: For each candidate box, identify the category of the traffic object to which the traffic individual in the candidate box belongs, and determine the corresponding category confidence level; Candidate boxes with a category confidence score higher than the confidence threshold are selected as target boxes; The categories of traffic objects to which the traffic individuals in each of the target boxes belong are counted, and the quantity characteristics of each category of traffic objects included in the traffic image are determined based on the target boxes corresponding to traffic individuals belonging to the same traffic object.
4. The method according to claim 2, characterized in that, The process of applying an attention mechanism to the image pyramid features to obtain attention-enhanced image pyramid features includes: The channel attention mechanism is applied to each layer of the image pyramid features to obtain the corresponding channel pyramid features. The spatial attention mechanism is applied to each layer of the image pyramid features to obtain the corresponding spatial pyramid features. By superimposing the image pyramid features, the channel pyramid features, and the spatial pyramid features, attention-enhanced image pyramid features are obtained.
5. The method according to claim 1, characterized in that, The step of performing multi-target tracking on the traffic image to obtain the first movement trajectory point of each tracked traffic individual in the traffic image includes: Image pyramid features are extracted from the traffic image; the image pyramid features include at least two layers of features with progressively increasing scales. For each layer of features in the image pyramid feature except for the largest feature, upsampling is performed to obtain the corresponding upsampled feature. The upsampled feature is then fused with the feature of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position feature of each traffic individual tracked in the traffic image. The first movement trajectory point of each traffic individual is determined based on the first location feature.
6. The method according to claim 1, characterized in that the step of performing single-target tracking on the traffic image to obtain the second movement trajectory point of each tracked traffic individual in the traffic image includes: obtaining a target detection image obtained by target detection based on the traffic image; performing convolution processing on the traffic image and the target detection image respectively, to extract the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image; processing each of the first image features and each of the second image features using an attention mechanism to obtain attention-enhanced first image features and attention-enhanced second image features; performing cross-correlation processing on the attention-enhanced first image features and the attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and the attention-enhanced second image features; and obtaining, based on each of the similarity features, the second movement trajectory point of each tracked traffic individual in the traffic image.
7. The method according to claim 6, characterized in that, The step of obtaining the second movement trajectory point of each tracked traffic individual in the traffic image based on each of the similarity features includes: Each of the similarity features is classified separately to obtain a classification confidence map; Regression processing is performed on each of the aforementioned similarity features to obtain a location regression map; Based on the classification confidence map and the location regression map, the second location features of each traffic individual are determined; The second movement trajectory point of each traffic individual is determined based on the second location feature.
8. The method according to claim 1, characterized in that, The training steps for the adaptive decision-making model include: Multiple training images are acquired, and multi-target tracking is performed on each training image to obtain the first training trajectory point; single-target tracking is performed on each training image to obtain the second training trajectory point. During the current training round, for each training image in the current round, a target training trajectory point is selected from the first training trajectory point and the second training trajectory point through the adaptive decision model to be trained, and the overlap rate between the target training trajectory point and the real training trajectory point of the training image is determined. The cumulative overlap rate of the current round is obtained by summing the overlap rates of all training images in the current round. The adaptive decision model to be trained is optimized by maximizing the cumulative overlap rate, and the training process is repeated for the next round until a preset stopping condition is met, thus obtaining a trained adaptive decision model.
9. The method according to claim 1, characterized in that, The step of determining the predicted traffic flow at the at least one traffic detection point based on the traffic flow maps corresponding to each historical time includes: By overlaying the traffic flow maps corresponding to each historical moment, a superimposed traffic flow map is obtained. The superimposed traffic flow map is subjected to graph convolution and temporal convolution respectively to extract spatiotemporal features; The spatiotemporal features are input into a fully connected layer, and the predicted traffic flow of the at least one traffic detection point is output through the fully connected layer.
10. The method according to claim 9, characterized in that, The process of overlaying traffic flow maps corresponding to each historical time point to obtain an overlaid traffic flow map includes: Obtain at least two cycles; For each cycle, the traffic flow maps corresponding to each historical moment within the cycle are superimposed to obtain the cycle traffic flow map. The traffic flow maps corresponding to each cycle are superimposed to obtain the superimposed traffic flow map.
11. A traffic flow prediction device, characterized in that the device includes: an acquisition module, used to acquire traffic videos collected at at least one traffic detection point; a determination module, used to determine, for each traffic video containing traffic images from multiple historical moments, the quantity characteristics of each type of traffic object included in the traffic image, the flow characteristics of the traffic image, and the traffic image characteristics of the traffic image; wherein the determination module is further used to perform multi-target tracking on the traffic image to obtain a first action trajectory point for each tracked traffic individual in the traffic image; perform single-target tracking on the traffic image to obtain a second action trajectory point for each tracked traffic individual in the traffic image; use an adaptive decision model to determine a target action trajectory point for each tracked traffic individual in the traffic image from the first action trajectory point and the second action trajectory point; add the target position characteristics of the target action trajectory points in the traffic image to a tracking queue, so that the previous target action trajectory points corresponding to the previous traffic image in the tracking queue, together with the target action trajectory points corresponding to the current traffic image, constitute the target action trajectory of each traffic individual; and determine the flow characteristics of the traffic image based on the quantity and trajectory direction of the target action trajectories of each traffic individual in the traffic image; a fusion module, used to fuse, for each frame of traffic image, the quantity features, flow features and traffic image features corresponding to the corresponding traffic image, to obtain the traffic feature vector corresponding to the corresponding traffic image; and a generation module, used to generate a traffic flow map for the corresponding historical time based on the traffic feature vectors corresponding to the traffic images of the at least one traffic detection point at the same historical time; wherein the determination module is further used to determine the predicted traffic flow of the at least one traffic detection point based on the traffic flow maps corresponding to each historical time.
12. The traffic flow prediction device according to claim 11, characterized in that, The determining module is further configured to extract image pyramid features from the traffic image; perform attention mechanism processing on the image pyramid features to obtain attention-enhanced image pyramid features; perform convolution operation on the attention-enhanced image pyramid features to generate candidate boxes for each traffic individual included in the traffic image; and determine the quantity features corresponding to the corresponding category of traffic objects based on the candidate boxes of all traffic individuals belonging to the same category of traffic objects.
13. The traffic flow prediction device according to claim 12, characterized in that, The determining module is further configured to, for each candidate box, identify the category of the traffic object to which the traffic individual in the candidate box belongs, and determine the corresponding category confidence; and filter out candidate boxes with a category confidence higher than the confidence threshold as target boxes; The categories of traffic objects to which the traffic individuals in each of the target boxes belong are counted, and the quantity characteristics of each category of traffic objects included in the traffic image are determined based on the target boxes corresponding to traffic individuals belonging to the same traffic object.
14. The traffic flow prediction device according to claim 12, characterized in that, The determining module is also used to process each layer of features in the image pyramid features using a channel attention mechanism to obtain the corresponding channel pyramid features. The spatial attention mechanism is applied to each layer of the image pyramid features to obtain the corresponding spatial pyramid features. By superimposing the image pyramid features, the channel pyramid features, and the spatial pyramid features, attention-enhanced image pyramid features are obtained.
15. The traffic flow prediction device according to claim 11, characterized in that, The determining module is further configured to extract image pyramid features from the traffic image; the image pyramid features include at least two layers of features with progressively increasing scales; for each layer of features in the image pyramid features except for the feature with the largest scale, upsampling is performed to obtain the corresponding upsampled features, and the upsampled features are fused with the features of the previous scale corresponding to the corresponding layer through skip connections to obtain the first position features of each traffic individual tracked in the traffic image; The first movement trajectory point of each traffic individual is determined based on the first location feature.
16. The traffic flow prediction device according to claim 11, characterized in that the determining module is further configured to acquire a target detection image obtained by target detection based on the traffic image; perform convolution processing on the traffic image and the target detection image respectively to extract the first image features of each traffic individual in the traffic image and the second image features of each traffic individual in the target detection image; process each of the first image features and each of the second image features using an attention mechanism to obtain attention-enhanced first image features and attention-enhanced second image features; perform cross-correlation processing on the attention-enhanced first image features and attention-enhanced second image features to obtain similarity features between the attention-enhanced first image features and attention-enhanced second image features; and obtain, based on each of the similarity features, the second movement trajectory point of each tracked traffic individual in the traffic image.
17. The traffic flow prediction device according to claim 16, characterized in that, The determining module is further configured to perform classification processing on each of the similarity features to obtain a classification confidence map; perform regression processing on each of the similarity features to obtain a location regression map; and determine the second location feature of each traffic individual based on the classification confidence map and the location regression map. The second movement trajectory point of each traffic individual is determined based on the second location feature.
18. The traffic flow prediction device according to claim 11, characterized in that the device further includes a training module, which is used to acquire multiple training images, perform multi-target tracking on each training image to obtain a first training trajectory point, and perform single-target tracking on each training image to obtain a second training trajectory point; during the current training round, for each training image in the current round, select a target training trajectory point from the first training trajectory point and the second training trajectory point through the adaptive decision model to be trained, and determine the overlap rate between the target training trajectory point and the real training trajectory point of the training image; accumulate the overlap rates of all training images in the current round to obtain the cumulative overlap rate of the current round; and optimize the adaptive decision model to be trained by maximizing the cumulative overlap rate, then repeat the training process for the next round until a preset stopping condition is met, at which point training stops and the trained adaptive decision model is obtained.
19. The traffic flow prediction device according to claim 11, characterized in that, The determining module is further configured to overlay the traffic flow maps corresponding to each historical time point to obtain an overlay traffic flow map; perform graph convolution and time-dimensional convolution on the overlay traffic flow map to extract spatiotemporal features; input the spatiotemporal features into a fully connected layer, and output the predicted traffic flow of the at least one traffic detection point through the fully connected layer.
20. The traffic flow prediction device according to claim 19, characterized in that, The determining module is also used to obtain at least two cycles; for each cycle, the traffic flow maps corresponding to each historical moment in the cycle are superimposed to obtain a cycle traffic flow map; the cycle traffic flow maps corresponding to each cycle are superimposed to obtain a superimposed traffic flow map.
21. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 10.
22. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 10.
23. A computer program product comprising computer instructions, characterized in that, When the computer instructions are executed by the processor, they implement the steps of the method according to any one of claims 1 to 10.