Vehicle speed measurement method and system based on video target depth estimation

By combining target detection and depth estimation algorithms, and using a single camera to acquire the three-dimensional coordinates and displacement of a vehicle, the problem of insufficient accuracy in traditional vehicle speed measurement methods is solved, achieving accurate vehicle speed measurement and improved stability.

CN116071391B · Active · Publication Date: 2026-06-16 · CHINA TELECOM CLOUD TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA TELECOM CLOUD TECH CO LTD
Filing Date: 2022-11-14
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Traditional vehicle speed measurement methods rely on target detection, which yields coarse results and is prone to errors. Existing image and video-based methods lack sufficient accuracy.

Method used

By combining object detection and depth estimation algorithms, the system acquires depth information and vehicle position from images, uses a single camera to measure vehicle speed, and calculates the vehicle's displacement on the road surface to obtain accurate speed.

Benefits of technology

It achieves accurate vehicle speed measurement, reduces system complexity and cost, and improves operational stability.

✦ Generated by Eureka AI based on patent content.

Figure CN116071391B_ABST

Abstract

The application provides a vehicle speed measurement method and system based on video target depth estimation. Structure information of the road surface is obtained by estimating the depth information of the road surface, and position information of a vehicle target is obtained by target detection and target tracking through a deep learning algorithm. The running of the vehicle on the road surface is thereby simulated in real time, and the displacement of the vehicle per unit time is obtained. The result is not affected by the size of the vehicle, so a relatively accurate vehicle speed measurement result can be obtained.

Description

Technical Field

[0001] This invention belongs to the field of computer vision, and more specifically to the field of video surveillance technology.

Background Technology

[0002] Vehicle speed measurement is a crucial component of traffic management systems. Speeding is the most common modern traffic violation, and it accounts for the vast majority of traffic accidents. Traditional road vehicle speed measurement typically uses electromagnetic coils or ultrasonic devices, which require separate installation and incur significant operating and maintenance costs.

[0003] Current image and video-based methods for measuring vehicle speed generally use object detection as the basic algorithm. This involves calculating the displacement of targets between adjacent frames, comparing the results with the vehicle's dimensions to obtain the vehicle's travel distance, and thus measuring the speed of moving vehicles. However, this method, which uses object detection to obtain vehicle scale information, yields relatively coarse results and is prone to significant errors.

[0004] This invention proposes a vehicle speed measurement method and system based on video target depth estimation. By estimating the road surface depth information, the road surface structure information is obtained. The vehicle position information is obtained through target detection and target tracking, thereby simulating the vehicle's operation on the road surface in real time and obtaining the vehicle's displacement per unit time. The result is not affected by the size of the vehicle and can obtain relatively accurate vehicle speed measurement results.

Summary of the Invention

[0005] In view of the above problems, the present invention is proposed.

[0006] According to one aspect of the present invention, a vehicle speed measurement method based on video target depth estimation is proposed, the method comprising:

[0007] Step S1: Obtain continuous image frame information from the image video stream;

[0008] Step S2: Use YOLO or other object detection models to detect target objects in the image and obtain the location and category information of the target objects;

[0009] Step S3: Use SORT or DeepSort target tracking algorithms to match and track the detected targets, filter out low-probability detection boxes, and obtain valid tracking targets and their tracking IDs;

[0010] Step S4: Use a depth estimation algorithm to obtain pixel-by-pixel depth information in the image;

[0011] Step S5: Model the three-dimensional planar information of the road surface using the automatically acquired or manually set road surface area;

[0012] Step S6: Obtain the three-dimensional coordinate information of the vehicle based on its position in the image;

[0013] Step S7: Calculate the positional changes of the target vehicle in multiple consecutive frames of images to obtain the speed of the target vehicle.

[0014] This invention also proposes a vehicle speed measurement system based on video target depth estimation, the system comprising:

[0015] Vehicle target detection and tracking module: acquires continuous image frame information from the image video stream; uses YOLO or other target detection models to detect target objects in the image and obtains the position and category information of the target objects; uses SORT or DeepSort target tracking algorithms to match and track the detected targets, filters low-probability detection boxes, and obtains effective tracking targets and their tracking IDs;

[0016] Image depth estimation module: Uses depth estimation algorithms to obtain pixel-by-pixel depth information in the image;

[0017] Road surface modeling and vehicle position estimation module: Models the three-dimensional planar information of the road surface using automatically acquired or manually set road surface areas; obtains the three-dimensional coordinate information of the vehicle target based on its position in the image;

[0018] Vehicle speed calculation module: Calculates the positional changes of the target vehicle in multiple consecutive frames of images to obtain the speed of the target vehicle.

[0019] Compared with the prior art, this application has the following beneficial effects:

[0020] 1. This invention utilizes a depth estimation algorithm based on a single camera to estimate the spatial position information of each target vehicle in an image, thereby obtaining the motion of the target vehicle between frames and estimating the speed of the target vehicle;

[0021] 2. By combining target detection with depth estimation algorithms, the three-dimensional spatial information of each target can be effectively extracted. By using real-time target detection algorithms and real-time depth estimation algorithms, the performance is good, the cost is low, and the accuracy is reliable.

[0022] 3. This method has a simple structure and requires no other equipment besides a camera, which can reduce the complexity of the system and improve operational stability.

Attached Figure Description

[0023] The above and other objects, features, and advantages of the present invention will become more apparent from the more detailed description of the embodiments of the invention in conjunction with the accompanying drawings. The drawings are provided to further illustrate the embodiments of the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings, the same reference numerals generally represent the same parts or steps.

[0024] Figure 1 shows a schematic block diagram of a vehicle speed measurement method based on video target depth estimation according to an embodiment of the present invention.

Detailed Implementation

[0025] To make the objectives, technical solutions, and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are merely a part of the embodiments of the present invention, and not all of the embodiments of the present invention. It should be understood that the present invention is not limited to the exemplary embodiments described herein. Based on the embodiments of the present invention described herein, all other embodiments obtained by those skilled in the art without inventive effort should fall within the protection scope of the present invention.

[0026] Example 1:

[0027] To address the problems mentioned above, a vehicle speed measurement method and system based on video target depth estimation is proposed. Specifically, as shown in Figure 1, this invention proposes a vehicle speed measurement method based on video target depth estimation. Through target detection and tracking, the method acquires the position and category information of the target object. It then calculates a depth map, estimates the road surface depth information, models the road surface to obtain road structure information, estimates the vehicle's three-dimensional coordinates from the vehicle's position in the image, and finally calculates the vehicle speed to obtain a relatively accurate vehicle speed measurement result.

[0028] Specifically, according to an embodiment of the present invention, a vehicle speed measurement method based on video target depth estimation includes:

[0029] Step S1: Obtain continuous image frame information from the image video stream;

[0030] Step S2: Use YOLO or other object detection models to detect target objects in the image and obtain the location and category information of the target objects;

[0031] Step S3: Use SORT or DeepSort target tracking algorithms to match and track the detected targets, filter out low-probability detection boxes, and obtain valid tracking targets and their tracking IDs;

[0032] Step S4: Use a depth estimation algorithm to obtain pixel-by-pixel depth information in the image;

[0033] Step S5: Model the three-dimensional planar information of the road surface using the automatically acquired or manually set road surface area;

[0034] Step S6: Obtain the vehicle's three-dimensional coordinate information based on the vehicle target's position in the image;

[0035] Step S7: Calculate the positional changes of the target vehicle in multiple consecutive frames of images to obtain the speed of the target vehicle.

[0036] Specifically, the above method includes:

[0037] A frame of image is acquired from the video stream. Using general object detection technology, the position and category information of specific target objects such as vehicles in the image frame can be obtained. The position information includes the bounding box coordinates of each target in the image frame. Based on the target position and category information detected from consecutive image frames in the video stream, the target tracking algorithm is used to match the targets in different image frames and obtain multiple target trajectories. Each trajectory includes its position coordinates in each image frame, the target category, and the tracking ID that distinguishes the target trajectory from other trajectories.
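The trajectory record described above can be sketched as a small data structure. This is only an illustration: the names Trajectory and update_trajectories, and the (x_min, y_min, x_max, y_max) box layout, are assumptions that do not appear in the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed layout).
BBox = Tuple[float, float, float, float]


@dataclass
class Trajectory:
    """One tracked target: its ID, its category, and its box in each frame."""
    track_id: int
    category: str
    boxes: Dict[int, BBox] = field(default_factory=dict)  # frame index -> box

    def add(self, frame_idx: int, box: BBox) -> None:
        self.boxes[frame_idx] = box

    def frames(self) -> List[int]:
        """Frame indices in which this target was observed, in ascending order."""
        return sorted(self.boxes)


def update_trajectories(trajectories, frame_idx, tracked):
    """Merge one frame of tracker output into the trajectory table.

    tracked is a list of (track_id, category, box) tuples for this frame.
    """
    for track_id, category, box in tracked:
        traj = trajectories.setdefault(track_id, Trajectory(track_id, category))
        traj.add(frame_idx, box)
    return trajectories
```

Keying the table by tracking ID keeps each vehicle's per-frame positions together, which is what the later speed calculation needs.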

[0038] In this step, the object detection algorithm generally uses a general object detection algorithm based on deep learning. Pre-trained models can be used depending on the type of object in the scene. Depending on the type of object, video or image datasets can also be collected for training. Common general object detection models include YOLO series, Faster-RCNN and other network structures. The object detection algorithm is not the main technical point of this patent, so it will not be elaborated on here.

[0039] Target tracking algorithms typically use the overlap rate of bounding boxes as the matching metric. This metric measures the degree of overlap between targets appearing in adjacent frames. If two target boxes overlap significantly across two frames, they are more likely to be the same target. Algorithms like SORT employ this method. Alternatively, feature similarity between two targets can be used as a criterion, as in DeepSORT. By matching the same vehicle target across different image frames, the tracking algorithm makes vehicle speed measurement possible.
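The overlap criterion can be illustrated with a short sketch. It computes the overlap rate as intersection over union and pairs boxes greedily. This is a simplification for illustration only and is not the SORT algorithm itself, which also predicts box motion and solves an assignment problem. The function names and the 0.3 threshold are assumptions.

```python
def iou(a, b):
    """Overlap rate (intersection over union) of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(prev_boxes, curr_boxes, iou_threshold=0.3):
    """Assign current-frame boxes to existing track IDs by descending overlap.

    prev_boxes: dict mapping track_id -> box from the previous frame.
    curr_boxes: list of boxes detected in the current frame.
    Returns a dict mapping index into curr_boxes -> matched track_id.
    """
    candidates = sorted(
        ((iou(pb, cb), tid, ci)
         for tid, pb in prev_boxes.items()
         for ci, cb in enumerate(curr_boxes)),
        key=lambda item: item[0],
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for score, tid, ci in candidates:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little
        if tid in used_tracks or ci in matches:
            continue  # each track and each detection is used at most once
        matches[ci] = tid
        used_tracks.add(tid)
    return matches
```

Detections left unmatched would start new trajectories, and the threshold controls how much a box may move between frames while still counting as the same vehicle.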

[0040] Image depth estimation typically uses a monocular camera: taking a single frame as input, it estimates the distance of each pixel in the image from the camera. A typical depth estimation model includes an encoder module, a decoder module, and a pixel-wise depth regression module. The encoder extracts depth features from the input image, while the decoder, the reverse component of the encoder, continues to extract features and upsamples the feature map to the original image resolution. Typically, the decoder is much shallower (has far fewer layers) than the encoder. The pixel-wise depth regression module takes the image features produced by the decoder and applies a linear regression to each pixel to obtain that pixel's depth value.
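The final regression step can be sketched independently of any particular network. In the sketch below the decoder output is assumed to be an H x W x C feature array, and one shared weight vector and bias turn each pixel's feature vector into a depth value, which is equivalent to a 1x1 convolution. The function name and the array layout are assumptions, and the encoder and decoder themselves are not modelled.

```python
import numpy as np


def pixelwise_depth_regression(features, weights, bias):
    """Apply the same linear regression to every pixel's feature vector.

    features: array of shape (H, W, C), assumed to come from the decoder.
    weights:  array of shape (C,), shared by all pixels.
    bias:     scalar offset.
    Returns an (H, W) depth map.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if features.ndim != 3 or features.shape[2] != weights.shape[0]:
        raise ValueError("features must be (H, W, C) and weights must be (C,)")
    # Matrix product over the channel axis: one dot product per pixel.
    return features @ weights + bias
```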

[0041] After obtaining the depth information of each pixel in the image, the plane in which the lane is located can be obtained using the depth information. This application targets a fixed camera, so the area where the lane is located can be determined by predefined rules, and then the plane coordinates can be obtained by plane fitting using the depth values within the lane area. Alternatively, a deep learning-based plane detection algorithm can be used to obtain the area where the lane is located through segmentation. However, plane detection algorithms are not the focus of this application and will not be discussed in detail here.
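Plane fitting from the lane-area depth values can be sketched as follows. The sketch assumes a pinhole camera with known intrinsics fx, fy, cx, cy and assumes that the depth map stores distance along the optical axis. Neither assumption is stated in the patent, and the function names are illustrative.

```python
import numpy as np


def backproject(depth, mask, fx, fy, cx, cy):
    """Convert masked depth pixels into 3D camera-frame points (N x 3).

    depth: (H, W) depth map, assumed to hold distance along the optical axis.
    mask:  (H, W) boolean array marking the lane area.
    """
    v, u = np.nonzero(mask)  # row (v) and column (u) indices of lane pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def fit_plane(points):
    """Least-squares plane through 3D points.

    Returns (normal, d) such that normal . p + d = 0 for points p on the plane.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, which is the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -float(normal @ centroid)
    return normal, d
```

Because the camera is fixed, the fit only needs to be computed once (or refreshed occasionally) rather than for every frame.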

[0042] In addition, algorithms based on RANSAC (Random Sample Consensus) can be used to extract multiple planes from an image. In typical road and highway scenes, areas other than the road surface are not flat, so the planar regions in the image lie essentially in the same plane as the road. The RANSAC algorithm can therefore be used to obtain the dominant plane in the image, that is, the plane region where the road is located.
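A minimal RANSAC sketch for finding the dominant plane in a set of 3D points is given below. The iteration count, the inlier tolerance (in the same units as the points), and the function name are assumptions chosen for illustration.

```python
import numpy as np


def ransac_plane(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Find the plane supported by the most points.

    points: (N, 3) array of 3D points.
    Returns ((normal, d), inlier_mask) with normal . p + d = 0 on the plane,
    or (None, all-False mask) if no valid sample was drawn.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        # Three distinct points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # collinear sample, no unique plane
        normal = normal / norm
        d = -float(normal @ sample[0])
        # Points whose distance to the candidate plane is within tolerance.
        inliers = np.abs(points @ normal + d) < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
            best_model = (normal, d)
    return best_model, best_inliers
```

In practice the returned inliers could be refit with a least-squares plane to reduce the influence of depth noise.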

[0043] For specific use cases, the actual road surface area can be manually marked in the camera image beforehand to facilitate subsequent algorithm implementation.

[0044] After obtaining the planar coordinates of the road, the vehicle's depth information can be obtained based on the vehicle position information obtained in the detection and tracking steps (steps S2 and S3), thereby obtaining the vehicle's three-dimensional coordinates.
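One possible reading of this step is to intersect the camera ray through a point of the vehicle's bounding box with the fitted road plane. The sketch below takes that reading. The use of the bottom-centre of the box as the road contact point, the pinhole intrinsics fx, fy, cx, cy, and the function name are all assumptions that the patent does not specify.

```python
import numpy as np


def vehicle_ground_point(box, plane, fx, fy, cx, cy):
    """3D camera-frame point where a vehicle meets the road plane.

    box:   (x1, y1, x2, y2) bounding box in pixels.
    plane: (normal, d) with normal . p + d = 0 for points on the road.
    """
    x1, y1, x2, y2 = box
    u, v = 0.5 * (x1 + x2), y2  # bottom-centre pixel (assumed contact point)
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    normal, d = plane
    denom = float(np.asarray(normal, dtype=float) @ ray)
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is parallel to the road plane")
    t = -d / denom  # scale at which the ray reaches the plane
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return t * ray
```

For a road plane 1.5 units below a level camera, plane = (np.array([0.0, 1.0, 0.0]), -1.5), and a box whose bottom edge sits 150 pixels below the principal point with fx = fy = 1000, the returned point is 10 units ahead of the camera.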

[0045] By acquiring multiple consecutive frames from a video stream and repeating the above operation, the vehicle's three-dimensional coordinates within these frames can be obtained. By calculating the difference in vehicle position between adjacent frames and obtaining the corresponding time difference from the frame numbers, the vehicle's speed can be calculated. By changing the number of frames between the two compared frames, the average speed of the vehicle over shorter or longer time periods can be obtained.
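The speed calculation can be sketched as follows, assuming a constant frame rate fps and 3D positions expressed in metres. The function name and the interval parameter are illustrative.

```python
import numpy as np


def speed_mps(positions, fps, interval=1):
    """Speed of one vehicle from its per-frame 3D positions.

    positions: dict mapping frame index -> (x, y, z) in metres.
    fps:       frames per second of the video stream (assumed constant).
    interval:  number of frames between the two compared positions.
    Returns a dict mapping frame index -> speed in metres per second,
    averaged over the preceding interval frames.
    """
    speeds = {}
    for frame in sorted(positions):
        prev = frame - interval
        if prev not in positions:
            continue  # no observation interval frames earlier
        displacement = np.asarray(positions[frame], dtype=float) - np.asarray(
            positions[prev], dtype=float
        )
        elapsed = interval / fps  # time difference derived from frame numbers
        speeds[frame] = float(np.linalg.norm(displacement) / elapsed)
    return speeds
```

With interval=1 the result is a frame-to-frame estimate that reacts quickly but is sensitive to depth noise. A larger interval averages over a longer time span. Multiplying by 3.6 converts metres per second to kilometres per hour.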

[0046] This invention utilizes a depth estimation algorithm based on a single camera to estimate the spatial location information of each target in an image, thereby obtaining each target's motion between frames and estimating its speed. This method has a simple structure, requires no other equipment besides the camera, reduces system complexity, and improves operational stability.

[0047] Example 2:

[0048] This invention also proposes a vehicle speed measurement system based on video target depth estimation, the system comprising:

[0049] Vehicle target detection and tracking module: acquires continuous image frame information from the image video stream; uses YOLO or other target detection models to detect target objects in the image and obtains the position and category information of the target objects; uses SORT or DeepSort target tracking algorithms to match and track the detected targets, filters low-probability detection boxes, and obtains effective tracking targets and their tracking IDs;

[0050] Image depth estimation module: Uses depth estimation algorithms to obtain pixel-by-pixel depth information in the image;

[0051] Road surface modeling and vehicle position estimation module: Models the three-dimensional planar information of the road surface using automatically acquired or manually set road surface areas; obtains the three-dimensional coordinate information of the vehicle target based on its position in the image;

[0052] Vehicle speed calculation module: Calculates the positional changes of the target vehicle in multiple consecutive frames of images to obtain the speed of the target vehicle.
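The way the four modules fit together can be sketched as a small class that accepts each module as a callable. The class name, the callable signatures, and the choice to report one average speed per track are assumptions made for illustration. Real detection, tracking, and depth models would be plugged in where the callables are.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class SpeedMeasurementSystem:
    # Detection and tracking module: frame -> [(track_id, box), ...]
    detect_and_track: Callable[[Any], List[Tuple[int, tuple]]]
    # Image depth estimation module: frame -> depth map
    estimate_depth: Callable[[Any], Any]
    # Road modelling and position module: (box, depth map) -> 3D point
    locate_on_road: Callable[[tuple, Any], Point3D]
    fps: float

    def run(self, frames: Iterable[Any]) -> Dict[int, float]:
        """Return the average speed per track ID over the given frames."""
        samples: Dict[int, List[Tuple[int, Point3D]]] = {}
        for idx, frame in enumerate(frames):
            depth = self.estimate_depth(frame)
            for track_id, box in self.detect_and_track(frame):
                point = self.locate_on_road(box, depth)
                samples.setdefault(track_id, []).append((idx, point))
        # Speed calculation module: displacement between the first and last
        # observation divided by the elapsed time derived from frame numbers.
        speeds = {}
        for track_id, obs in samples.items():
            if len(obs) < 2:
                continue
            (f0, p0), (f1, p1) = obs[0], obs[-1]
            dist = sum((a - b) ** 2 for a, b in zip(p1, p0)) ** 0.5
            speeds[track_id] = dist / ((f1 - f0) / self.fps)
        return speeds
```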

[0053] Specifically, the target detection and tracking module includes:

[0054] A frame of image is acquired from the video stream. Using general object detection technology, the position and category information of specific target objects such as vehicles in the image frame can be obtained. The position information includes the bounding box coordinates of each target in the image frame. Based on the target position and category information detected from consecutive image frames in the video stream, the target tracking algorithm is used to match the targets in different image frames and obtain multiple target trajectories. Each trajectory includes its position coordinates in each image frame, the target category, and the tracking ID that distinguishes the target trajectory from other trajectories.

[0055] In this step, the object detection algorithm generally uses a general object detection algorithm based on deep learning. Pre-trained models can be used depending on the type of object in the scene. Depending on the type of object, video or image datasets can also be collected for training. Common general object detection models include YOLO series, Faster-RCNN and other network structures. The object detection algorithm is not the main technical point of this patent, so it will not be elaborated on here.

[0056] Target tracking algorithms typically use the overlap rate of bounding boxes as the matching metric. This metric measures the degree of overlap between targets appearing in adjacent frames. If two target boxes overlap significantly across two frames, they are more likely to be the same target. Algorithms like SORT employ this method. Alternatively, feature similarity between two targets can be used as a criterion, as in DeepSORT. By matching the same vehicle target across different image frames, the tracking algorithm makes vehicle speed measurement possible.

[0057] The image depth estimation module includes: image depth estimation typically uses a monocular camera, taking a single frame image as input, and estimates the distance of each pixel in the image from the camera. The depth estimation model usually includes an encoder module, a decoder module, and a pixel-wise depth regression module. The encoder extracts depth features from the input image, and the decoder, being the reverse component of the encoder, continues to extract features and upsamples the feature map to the original image resolution. Typically, the decoder is much shallower (has far fewer layers) than the encoder. The pixel-wise depth regression module takes the image features produced by the decoder and applies a linear regression to each pixel to obtain the depth value of that pixel in the image.

[0058] The road surface modeling and vehicle position estimation module includes: after obtaining the depth information of each pixel in the image, the depth information can be used to obtain the plane in which the lane is located. This application targets a fixed camera, so the area where the lane is located can be determined by predefined rules, and then the plane coordinates can be obtained by plane fitting using the depth values within the lane area. Alternatively, a deep learning-based plane detection algorithm can be used to obtain the area where the lane is located through segmentation. However, the plane detection algorithm is not a key technical feature of this application and will not be discussed in detail here.

[0059] In addition, algorithms based on RANSAC (Random Sample Consensus) can be used to extract multiple planes from an image. In typical road and highway scenes, areas other than the road surface are not flat, so the planar regions in the image lie essentially in the same plane as the road. The RANSAC algorithm can therefore be used to obtain the dominant plane in the image, that is, the plane region where the road is located.

[0060] For specific use cases, the actual road surface area can be manually marked in the camera image beforehand to facilitate subsequent algorithm implementation.

[0061] After obtaining the planar coordinates of the road, the vehicle's depth information can be obtained based on the vehicle position information obtained by the vehicle target detection and tracking module, thereby obtaining the vehicle's three-dimensional coordinates.

[0062] The vehicle speed calculation module includes: acquiring multiple consecutive frames of images from a video stream; repeating the above operation to obtain the vehicle's three-dimensional coordinates in the multiple consecutive frames; calculating the difference in vehicle position between adjacent frames; and obtaining the corresponding time difference from the frame numbers to calculate the vehicle's speed. By changing the number of frames between the two compared frames, the average speed of the vehicle over shorter or longer time periods can be obtained.

[0063] Although exemplary embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above exemplary embodiments are merely illustrative and are not intended to limit the scope of the invention thereto. Various changes and modifications can be made therein by those skilled in the art without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as claimed in the appended claims.

[0064] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.

[0065] Similarly, it should be understood that, in order to streamline the invention and aid in understanding one or more of the various aspects of the invention, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the invention. However, this method of the invention should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as reflected in the corresponding claims, its inventive point lies in solving the corresponding technical problem with features that are fewer than all the features of a single disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of the invention. Those skilled in the art will understand that, apart from the mutual exclusion of features, any combination can be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or elements of any method or apparatus so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature that serves the same, equivalent, or similar purpose.

[0066] Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features but not others included in other embodiments, combinations of features from different embodiments are intended to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.

[0067] It should be noted that the above embodiments are illustrative of the invention and not restrictive, and that those skilled in the art can devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of components or steps not listed in the claims. The word "a" or "an" preceding a component does not exclude the presence of a plurality of such components. The invention can be implemented by means of hardware comprising several different components and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words can be interpreted as names.

[0068] The above description is merely a specific embodiment of the present invention or an explanation of that embodiment. The scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this invention should be included within the scope of protection of this invention. The scope of protection of this invention should be determined by the scope of the claims.

Claims

1. A vehicle speed measurement method based on video target depth estimation, the method comprising: Step S1: Obtain continuous image frame information from the image video stream; Step S2: Use YOLO or other object detection models to detect target objects in the image and obtain the position and category information of the target objects; Step S3: Use SORT or DeepSort object tracking algorithms to match and track the detected targets, filter low-probability detection boxes, and obtain effective tracking targets and their tracking IDs; Step S4: Use depth estimation algorithms to obtain pixel-by-pixel depth information in the image; Step S5: Use automatically acquired or manually set road surface areas to model the three-dimensional planar information of the road surface; Step S6: Obtain the three-dimensional coordinate information of the vehicle based on the position of the vehicle target in the image; Step S7: Calculate the positional changes of the target vehicle in multiple consecutive frames of images to obtain the speed of the target vehicle; The methods specifically include: A frame of image is acquired from the video stream. A general object detection technique is used to obtain the position and category information of vehicles and specific target objects in the image frame. The position information includes the bounding box coordinates of each target in the image frame. Based on the target position and category information detected from consecutive image frames in the video stream, a target tracking algorithm is used to match targets in different image frames to obtain multiple target trajectories. Each trajectory includes its position coordinates in each image frame, the target category, and a tracking ID that distinguishes the target trajectory from other trajectories. 
The method specifically includes: image depth estimation using a monocular camera, taking a single frame image as input, estimating the distance of each pixel in the image from the camera; the depth estimation model includes an encoder module, a decoder module, and a pixel-wise depth regression module; the encoder extracts depth features from the input image, and the decoder is the reverse component of the encoder, extracting features and enlarging the feature map to the original image resolution; the depth of the decoder is smaller than that of the encoder; the pixel-wise depth regression module uses the image feature information obtained from the decoder to apply a linear regression algorithm to each pixel to obtain the depth value of the pixel in the image; The method specifically includes: using depth information to obtain the plane where the lane is located, the area where the lane is located is determined by predefined rules, and then using the depth values in the lane area to obtain the plane coordinates by plane fitting or using a deep learning-based plane detection algorithm to obtain the area where the lane is located by segmentation; after obtaining the plane coordinates of the road, the depth information of the vehicle is obtained based on the obtained vehicle position information to obtain the three-dimensional coordinates of the vehicle.

2. The vehicle speed measurement method based on video target depth estimation as described in claim 1, characterized in that: The method specifically includes: acquiring multiple consecutive frames of images from a video stream, repeating the process of acquiring multiple consecutive frames of images from the video stream, obtaining the three-dimensional coordinates of the vehicle in the multiple consecutive frames of images, calculating the difference in vehicle position between adjacent frames, obtaining the corresponding time difference through the frame number, and calculating the vehicle speed.

3. A vehicle speed measurement system based on video target depth estimation, characterized in that: The system includes: Vehicle target detection and tracking module: Acquires continuous image frame information from the image video stream; detects target objects in the image using YOLO or other target detection models, and obtains the position and category information of the target objects; uses SORT or DeepSort target tracking algorithms to match and track the detected targets, filters low-probability detection boxes, and obtains effective tracking targets and their tracking IDs; Image depth estimation module: Uses depth estimation algorithms to obtain pixel-by-pixel depth information in the image; Road surface modeling and vehicle position estimation module: Models the three-dimensional planar information of the road surface using automatically acquired or manually set road surface areas; obtains the three-dimensional coordinate information of the vehicle based on the position of the vehicle target in the image; Vehicle speed calculation module: Calculates the position change of the target vehicle in multiple consecutive frames of images, thereby obtaining the speed of the target vehicle; The vehicle target detection and tracking module includes: A frame of image is acquired from the video stream. A general object detection technique is used to obtain the position and category information of vehicles and specific target objects in the image frame. The position information includes the bounding box coordinates of each target in the image frame. Based on the target position and category information detected from consecutive image frames in the video stream, a target tracking algorithm is used to match targets in different image frames to obtain multiple target trajectories. Each trajectory includes its position coordinates in each image frame, the target category, and a tracking ID that distinguishes the target trajectory from other trajectories. 
The image depth estimation module includes: a monocular camera is used for image depth estimation, with a single frame image as input, to estimate the distance of each pixel in the image from the camera. The depth estimation model includes an encoder module, a decoder module, and a pixel-wise depth regression module. The encoder extracts depth features from the input image, and the decoder is the reverse component of the encoder, extracting features and upscaling the feature map to the original image resolution. The depth of the decoder is less than that of the encoder. The pixel-wise depth regression module uses the image feature information obtained from the decoder to apply a linear regression algorithm to each pixel to obtain the depth value of that pixel in the image. The road surface modeling and vehicle position estimation module includes: using depth information to obtain the plane where the lane is located, the area where the lane is located is determined by predefined rules, and then using the depth values in the lane area to obtain the plane coordinates by plane fitting or using a deep learning-based plane detection algorithm to obtain the area where the lane is located by segmentation; after obtaining the plane coordinates of the road, the depth information of the vehicle is obtained according to the obtained vehicle position information to obtain the three-dimensional coordinates of the vehicle.

4. The vehicle speed measurement system based on video target depth estimation as described in claim 3, characterized in that: The vehicle speed calculation module includes: acquiring multiple consecutive frames of images from a video stream, repeating the process of acquiring multiple consecutive frames of images from the video stream, obtaining the three-dimensional coordinates of the vehicle in the multiple consecutive frames of images, calculating the difference in vehicle position between adjacent frames, obtaining the corresponding time difference through the frame number, and calculating the vehicle speed.