A traffic accident scene detection method and device
By optimizing the boundary based on geodesic distance and local curvature perception, and correcting the adaptive scale mapping field, the geometric and scale distortion problems of non-rigid targets in traffic accident scene detection are solved, and accurate size measurement and detection results are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG ZHIJIANG INTELLIGENT TRANSPORTATION TECH CO LTD
- Filing Date
- 2026-03-12
- Publication Date
- 2026-06-23
Abstract
Description
Technical Field
[0001] This application relates to the field of intelligent transportation technology, and more specifically, to a method and apparatus for on-site detection of traffic accidents.
Background Technology
[0002] Traffic accident scene detection is a crucial step in determining accident liability and reconstructing the accident process. Traditional methods rely on manual measurement with tape measures, total stations, or 3D laser scanning, which suffer from low efficiency, susceptibility to subjective bias, and the potential to cause secondary traffic congestion. In recent years, drone-based visual inspection technology has gained widespread attention due to its flexibility and efficiency. This technology primarily uses cameras mounted on drones to capture images of the scene and leverages computer vision algorithms for target recognition and size measurement.
[0003] However, existing methods for detecting traffic accident scenes typically apply edge detection methods and contour fitting algorithms designed for rigid objects directly to non-rigid targets, resulting in severe geometric distortion. Furthermore, they neither perceive nor constrain the target's inherent geometric properties, so they cannot effectively distinguish true morphological features (such as the sharp corners of blood splatters) from segmentation artifacts. This leads to low accuracy in the detection results, which deviate from the objective authenticity required for physical evidence identification.
Summary of the Invention
[0004] In view of this, the purpose of this application is to provide a method and apparatus for detecting traffic accidents at the scene, so as to overcome at least one of the above-mentioned defects.
[0005] In a first aspect, embodiments of this application provide a method for detecting traffic accident scenes, including:
[0006] The system acquires color images of traffic accident scenes collected by drones, performs instance segmentation and semantic object detection on the color images, and obtains a mask set and a bounding box set. The mask set includes multiple masks without semantic labels, and the bounding box set includes multiple bounding boxes with semantic labels.
[0007] The matching degree between the mask and the bounding box is evaluated from multiple dimensions, and the semantic label with the highest matching degree is selected for each mask based on the evaluation results;
[0008] Based on the semantic label corresponding to each mask, non-rigid targets in the color image are identified. A boundary optimization strategy based on geodesic distance and local curvature perception is adopted to optimize the boundary of the mask corresponding to the non-rigid target and obtain the optimized geometric contour.
[0009] The basic optical scale of the image acquisition equipment is calibrated to determine the initial scale mapping.
[0010] Based on semantic label recognition of rigid reference objects in color images, the initial scale mapping of the target region where the rigid reference object is located is corrected by using the detection confidence of the rigid reference object to obtain the local scale mapping field.
[0011] By extending the local scale mapping field through linear interpolation, an adaptive scale mapping field is obtained. Based on the adaptive scale mapping field, the size parameters of each detection target are determined to obtain the detection results of traffic accident scenes.
[0012] In an optional implementation, a boundary optimization strategy based on geodesic distance and local curvature awareness is employed to optimize the boundary of the mask corresponding to the non-rigid target and obtain the optimized geometric contour. The steps include: constructing a geodesic distance field based on the geodesic distance between each pixel in the mask corresponding to the non-rigid target and the centroid of the mask; determining the geodesic gradient and local curvature at the boundary points in the mask corresponding to the non-rigid target, and constructing a weighting function based on the local curvature; constructing a boundary optimization energy function based on the geodesic gradient and the weighting function; and minimizing the boundary optimization energy function using the level set method to optimize the geometric contour of the non-rigid target.
[0013] In an optional implementation, the step of correcting the initial scale mapping of the target region where the rigid reference is located using the detection confidence of the rigid reference to obtain a local scale mapping field includes: determining a local correction factor based on the geometric dimensions of the rigid reference and the detection confidence; correcting the initial mapping using the local correction factor to determine a local scale mapping field, wherein the local scale mapping field is used to characterize the true ground length represented by a unit pixel at each pixel point in the target region.
[0014] In an optional implementation, the step of calibrating the basic optical scale of the image acquisition device and determining the initial scale mapping includes: determining the basic optical scale based on the internal parameters of the image acquisition device; determining the incident angle cosine correction term based on the UAV pose information and the line-of-sight vector; and determining the initial scale mapping using the basic optical scale and the incident angle cosine correction term.
[0015] In an optional implementation, the size parameters include perimeter and area, and the adaptive scale mapping field includes the scale mapping value of each pixel. The step of determining the size parameters of each detection target based on the adaptive scale mapping field to obtain the traffic accident scene detection results includes: for each detection target, accumulating the squares of the scale mapping values of each pixel within the mask corresponding to the detection target to determine the area of the detection target; and determining the perimeter of the detection target based on the sum of the scale mapping values of the boundary pixels within the mask corresponding to the detection target.
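As an illustrative, non-limiting sketch of the accumulation described above, the following Python snippet sums the squared scale mapping values inside a mask to obtain the area and sums the scale mapping values of the boundary pixels to obtain the perimeter. The function names, the 4-neighbour definition of a boundary pixel, and the array layout are assumptions made for illustration only; the pixel-count perimeter is an approximation.

```python
import numpy as np

def boundary_pixels(mask: np.ndarray) -> np.ndarray:
    """Mask pixels with at least one 4-neighbour outside the mask (assumed definition)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def measure_target(mask: np.ndarray, scale_field: np.ndarray):
    """Return (area, perimeter) in physical units.

    scale_field holds the ground length represented by one pixel at each
    position (for example millimetres per pixel), same shape as mask.
    """
    mask = np.asarray(mask, dtype=bool)
    # Each pixel covers scale**2 units of ground area.
    area = float(np.sum(scale_field[mask] ** 2))
    # Each boundary pixel contributes one pixel length of contour.
    perimeter = float(np.sum(scale_field[boundary_pixels(mask)]))
    return area, perimeter
```

With a uniform scale field the result reduces to the pixel count multiplied by a constant; a spatially varying field weights each pixel by its own local scale.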
[0016] In an optional implementation, the step of extending the local scale mapping field by linear interpolation to obtain an adaptive scale mapping field includes: using the scale mapping values of neighboring high-confidence regions, extending the scale of regions without rigid reference objects by linear interpolation to obtain an adaptive scale mapping field.
[0017] In an optional implementation, the method further includes introducing a terrain undulation compensation term in an adaptive scale mapping field, the terrain undulation compensation term being determined based on relative elevation and the flight altitude of the UAV.
[0018] In an optional implementation, the evaluation result includes a comprehensive matching score, which evaluates the degree of matching between the mask and the bounding box from multiple dimensions. The step of selecting the semantic label with the highest matching degree for each mask based on the evaluation result includes: for each mask, evaluating the degree of matching between the mask and the bounding box from three dimensions: spatial overlap, geometric center alignment, and contour similarity, and determining the comprehensive matching score between the mask and different bounding boxes; and determining the semantic label corresponding to the bounding box with the highest comprehensive matching score in the bounding box set as the semantic label with the highest matching degree with the mask.
[0019] In an optional implementation, the spatial overlap is determined based on the intersection-union ratio between the mask and the bounding box, the geometric center alignment is determined based on the centroid distance between the mask and the bounding box, and the contour similarity is determined based on the point set similarity between the mask boundary and the bounding box boundary.
[0020] In a second aspect, embodiments of this application also provide a traffic accident scene detection device, the device comprising:
[0021] The segmentation and detection module is used to acquire UAV pose information and color images of traffic accident scenes collected by UAV. It performs instance segmentation and semantic object detection on the color images to obtain a mask set and a bounding box set. The mask set includes multiple masks without semantic labels, and the bounding box set includes multiple bounding boxes with semantic labels.
[0022] The semantic mask alignment module is used to evaluate the degree of matching between the mask and the bounding box from multiple dimensions, and selects the semantic label with the highest matching degree for each mask based on the evaluation results;
[0023] The contour optimization module is used to identify non-rigid targets in the color image based on the semantic label corresponding to each mask. It adopts a boundary optimization strategy based on geodesic distance and local curvature perception to optimize the boundary of the mask corresponding to the non-rigid target and obtain the optimized geometric contour.
[0024] The initial mapping determination module is used to calibrate the basic optical scale of the image acquisition device and determine the initial scale mapping;
[0025] The local scale field determination module is used to identify rigid reference objects in color images based on semantic labels. It uses the detection confidence of rigid reference objects to correct the initial scale mapping of the target region where the rigid reference objects are located in the color image, so as to obtain the local scale mapping field.
[0026] The accident detection module is used to extend the local scale mapping field through linear interpolation to obtain an adaptive scale mapping field. Based on the adaptive scale mapping field, the size parameters of each detection target are determined to obtain the traffic accident scene detection results.
[0027] The embodiments of this application bring the following beneficial effects:
[0028] This application provides a traffic accident scene detection method and apparatus that can optimize the geometric contour of non-rigid targets through a boundary optimization strategy based on geodesic distance and local curvature perception, avoiding the problem of geometric distortion of non-rigid targets. At the same time, it uses the detection confidence of rigid reference objects to correct the initial scale mapping in order to construct an adaptive scale mapping field, overcoming the scale distortion problem caused by perspective distortion at the image edge in traditional global fixed scales. This achieves accurate size measurement of the detected targets in traffic accident scenes. Compared with existing traffic accident scene detection methods, it solves the problems of scale distortion and low accuracy of detection results.
[0029] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Brief Description of the Drawings
[0030] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0031] Figure 1 shows a flowchart of the traffic accident scene detection method provided in the embodiments of this application;
[0032] Figure 2 shows a flowchart of the mask and semantic label matching steps provided in an embodiment of this application;
[0033] Figure 3 shows a flowchart of the mask boundary optimization steps for non-rigid targets provided in an embodiment of this application;
[0034] Figure 4 shows a flowchart of the steps for determining the initial scale mapping provided in an embodiment of this application;
[0035] Figure 5 shows a flowchart of the steps for determining the local scale mapping field provided in an embodiment of this application;
[0036] Figure 6 shows a schematic diagram of the structure of the traffic accident scene detection device provided in an embodiment of this application;
[0037] Figure 7 shows a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0038] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. Based on the embodiments of this application, every other embodiment obtained by those skilled in the art without inventive effort falls within the scope of protection of this application.
[0039] The key terms involved in the embodiments of this application are defined as follows:
[0040] Rigid targets refer to objects with fixed geometric shapes and stable physical dimensions at the scene of a traffic accident. Their shape is not easily deformed under normal observation conditions. Rigid targets include, but are not limited to, vehicle bodies, license plates, wheels, and traffic signs.
[0041] Non-rigid targets: These are physical evidence targets at traffic accident scenes that are irregularly shaped, lack a fixed topological structure, and whose boundaries are easily affected by light, shadow, or texture noise. Their physical form does not have rigid constraints, including but not limited to: bloodstains, oil stains, glass fragments, tire friction marks, and scattered soil.
[0042] Geodesic distance: Within a specific constrained area (such as the target area defined by a mask), the shortest path length between two points along the boundary / inside of the constrained area. The path must be entirely within the constrained area and cannot cross the external background or obstacles.
[0043] Distance field: In discrete space (such as an image pixel grid) or continuous space, each location (such as a pixel or 3D point) is assigned a "distance value to a reference target". All locations and their corresponding distance values constitute a "numerical field". Simply put, it is "to attach a distance label to each point in space, and the label value is the specific distance from the point to the reference object".
[0044] Adaptive scale mapping field: a local scale factor function defined at every pixel of an image, in millimeters per pixel. It is used to convert geometric quantities (such as area and perimeter) in pixel coordinates into real physical dimensions. Its value varies with image position to compensate for perspective distortion and terrain undulation.
[0045] To facilitate understanding of this embodiment, the following description uses the application of the traffic accident scene detection method provided in this application to a terminal device as an example to illustrate the exemplary steps provided in this application embodiment. The terminal device can be an airborne edge device of a drone.
[0046] Please refer to Figure 1, which is a flowchart of a traffic accident scene detection method provided in an embodiment of this application. As shown in Figure 1, the method for detecting traffic accident scenes provided in the embodiments of this application includes:
[0047] Step S101: Obtain color images of the traffic accident scene collected by the drone, perform instance segmentation and semantic object detection on the color images respectively, and obtain a mask set and a bounding box set.
[0048] The drone hovers 20 to 50 meters above the traffic accident scene and captures a top-down color image of the scene from approximately directly above it. That is, the optical axis of the image acquisition device on the drone is basically perpendicular to the ground. For example, the angle between the camera optical axis and the vertical direction perpendicular to the ground is within a preset range (such as ±5°).
[0049] Color images can refer to RGB images. While acquiring color images, UAV pose information can also be obtained from the Inertial Measurement Unit (IMU) and the Global Navigation Satellite System (GNSS). The UAV pose information is used to verify whether the shooting angle of the image acquisition device meets the condition of being approximately perpendicular to the ground. When the condition of being approximately perpendicular to the ground is met, instance segmentation and semantic target detection can be performed on the color images respectively.
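As an illustrative, non-limiting sketch, the check that the shooting angle is approximately perpendicular to the ground may be written as follows. The snippet assumes that the pose information supplies the roll and pitch of the camera relative to a straight-down orientation; the function names and the 5° default are hypothetical and are taken only from the example range mentioned above.

```python
import math

def optical_axis_tilt_deg(roll_deg: float, pitch_deg: float) -> float:
    """Angle between the optical axis and the vertical, in degrees.

    Assumes roll and pitch are deviations from a straight-down camera pose;
    yaw rotates about the vertical and does not change the tilt.
    """
    roll = math.radians(roll_deg)
    pitch = math.radians(pitch_deg)
    # Vertical component of the rotated optical axis.
    cos_tilt = math.cos(roll) * math.cos(pitch)
    cos_tilt = max(-1.0, min(1.0, cos_tilt))  # guard against rounding
    return math.degrees(math.acos(cos_tilt))

def is_near_nadir(roll_deg: float, pitch_deg: float,
                  max_tilt_deg: float = 5.0) -> bool:
    """True when the image may be treated as a top-down view."""
    return optical_axis_tilt_deg(roll_deg, pitch_deg) <= max_tilt_deg
```

Images that fail this check could be skipped or re-acquired before instance segmentation and semantic target detection are run.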
[0050] In one embodiment, multiple color images can be obtained through an image acquisition device. Each color image can be input into a general instance segmentation model and an object detection model to obtain the mask set output by the general instance segmentation model and the bounding box set output by the object detection model, thereby obtaining the traffic accident scene detection result corresponding to the color image.
[0051] For example, the general instance segmentation model can be FastSAM or MobileSAM, which are suitable for extracting fine contours of diverse targets such as vehicles, debris, and human bodies at traffic accident scenes. The general instance segmentation model adopts a lightweight encoder-decoder network structure. The encoder is composed of multiple stacked depthwise separable convolutional layers and pointwise convolutional layers, which gradually reduce the spatial resolution while increasing the channel dimension to efficiently capture multi-scale contextual features. The decoder fuses upsampled features with the skip-connection features from the encoder to gradually restore spatial details, and finally outputs a set of binary masks without semantic labels (category labels).
[0052] Object detection models can refer to semantic object detection models, such as YOLOv8, RT-DETR, or GroundingDINO. These models possess strong semantic discriminative capabilities, accurately identifying key investigation objects such as cars, motorcycles, license plates, and bloodstains. However, their boundaries are typically axis-aligned rectangles, making it difficult to precisely fit the true contours of irregular objects. Object detection models are built upon convolutional neural networks, locating semantic objects in images through region proposals or end-to-end query mechanisms, and outputting a set of bounding boxes carrying semantic labels (category labels) and detection confidence scores.
[0053] The mask set includes multiple masks without semantic labels, and the bounding box set includes multiple bounding boxes with semantic labels. The boundary of each mask is more refined and clearer than the boundary of the bounding box.
[0054] Step S102: Evaluate the matching degree between the mask and the bounding box from multiple dimensions, and select the semantic label with the highest matching degree for each mask based on the evaluation results.
[0055] After determining the mask set and the bounding box set, the masks and bounding boxes need to be aligned so that the bounding box with the highest matching degree is selected for each mask.
[0056] In one embodiment, the semantic label with the highest matching degree for each mask can be selected by a comprehensive matching score. The evaluation result includes a comprehensive matching score, which is used to characterize the degree of matching between the mask and the bounding box.
[0057] The process of matching masks with semantic labels is described below with reference to Figure 2.
[0058] Figure 2 shows a flowchart of the mask and semantic label matching steps provided in the embodiments of this application. As shown in Figure 2, the steps for matching the mask with the semantic labels include:
[0059] Step S1021: For each mask, evaluate the degree of matching between the mask and the bounding box from three dimensions: spatial overlap, geometric center alignment, and contour similarity, and determine the comprehensive matching score between the mask and different bounding boxes.
[0060] The spatial overlap, geometric center alignment, and contour similarity between the mask and different bounding boxes are determined respectively. Then, a comprehensive matching score is determined based on the spatial overlap, geometric center alignment, and contour similarity.
[0061] Spatial overlap is determined based on the intersection-over-union ratio (IoU) between the mask and the bounding box. The IoU measures the overlap between the mask $M_i$ and the projected region of the bounding box $B_j$, and is calculated as:
[0062] $\mathrm{IoU}(M_i, B_j) = \dfrac{|M_i \cap \Omega(B_j)|}{|M_i \cup \Omega(B_j)|}$;
[0063] In the above formula, $\Omega(B_j)$ denotes the set of pixels obtained by converting the bounding box $B_j$ into a binary region (projected region), and $\mathrm{IoU}(M_i, B_j)$ denotes the IoU between mask $M_i$ and bounding box $B_j$. The higher the value, the more consistent the spatial relationship between mask $M_i$ and bounding box $B_j$. However, using the IoU alone to measure the degree of matching between the mask and the bounding box is easily affected by the dominance of large targets or the failure to detect small targets.
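As an illustrative, non-limiting sketch, the IoU between a binary mask and the projected region of a bounding box may be computed as follows. The box format `(x1, y1, x2, y2)` with half-open integer pixel ranges and the function names are assumptions made for illustration.

```python
import numpy as np

def box_to_region(box, shape) -> np.ndarray:
    """Convert a box (x1, y1, x2, y2) into a binary region of the given shape."""
    x1, y1, x2, y2 = box
    region = np.zeros(shape, dtype=bool)
    region[y1:y2, x1:x2] = True  # rows are y, columns are x
    return region

def mask_box_iou(mask: np.ndarray, box) -> float:
    """Intersection-over-union between a mask and the projected box region."""
    mask = np.asarray(mask, dtype=bool)
    region = box_to_region(box, mask.shape)
    inter = np.logical_and(mask, region).sum()
    union = np.logical_or(mask, region).sum()
    return float(inter) / float(union) if union else 0.0
```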
[0064] Geometric center alignment is determined based on the centroid distance between the mask and the bounding box, and is used to suppress erroneous matches with severely offset centroids. The normalized centroid distance is denoted $d_c(M_i, B_j)$ and is calculated as:
[0065] $d_c(M_i, B_j) = \dfrac{\lVert c_{M_i} - c_{B_j} \rVert_2}{\tfrac{1}{2}\sqrt{w_j^2 + h_j^2}}$;
[0066] In the above formula, $c_{M_i}$ and $c_{B_j}$ represent the centroid coordinates of mask $M_i$ and bounding box $B_j$, respectively; $w_j$ and $h_j$ represent the width and height of bounding box $B_j$; the denominator represents half the diagonal length of bounding box $B_j$ and is used for scale normalization, so that the distances for targets of different sizes are comparable. The smaller the centroid distance, the closer the mask is to the center of the bounding box, and the higher the degree of matching.
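As an illustrative, non-limiting sketch, the centroid distance normalized by half the bounding-box diagonal may be computed as follows. The box format `(x1, y1, x2, y2)` and the half-pixel offset used to place pixel centers are assumptions made for illustration.

```python
import numpy as np

def normalized_centroid_distance(mask: np.ndarray, box) -> float:
    """Centroid distance between mask and box, divided by half the box diagonal."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask is empty")
    # Pixel (r, c) is taken to cover [c, c+1) x [r, r+1), so its center is at +0.5.
    mask_centroid = np.array([xs.mean() + 0.5, ys.mean() + 0.5])
    x1, y1, x2, y2 = box
    box_centroid = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    half_diagonal = 0.5 * np.hypot(x2 - x1, y2 - y1)
    return float(np.linalg.norm(mask_centroid - box_centroid) / half_diagonal)
```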
[0067] Contour similarity is determined based on the point set similarity between the mask boundary and the bounding box boundary. Ideally, the Hausdorff distance between the mask boundary $\partial M_i$ and the bounding box boundary $\partial B_j$ could be used directly, but its computational complexity is high. Therefore, this embodiment approximates it with the more efficient Chamfer distance, calculated as follows:
[0068] $d_{CD}(\partial M_i, \partial B_j) = \dfrac{1}{|\partial M_i|}\sum_{p \in \partial M_i}\min_{q \in \partial B_j}\lVert p - q \rVert_2 + \dfrac{1}{|\partial B_j|}\sum_{q \in \partial B_j}\min_{p \in \partial M_i}\lVert p - q \rVert_2$;
[0069] In the above formula, $p$ represents an element of the mask boundary $\partial M_i$, that is, a sampling point on the boundary of the mask to be matched; $q$ represents an element of the bounding box boundary $\partial B_j$, that is, a sampling point on the boundary of the bounding box.
[0070] To eliminate the scale effect, the Chamfer distance can be normalized. The normalized Chamfer distance is $\tilde{d}_{CD} = d_{CD}(\partial M_i, \partial B_j) / D_j$, where $D_j$ represents the diagonal length of bounding box $B_j$.
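As an illustrative, non-limiting sketch, a symmetric Chamfer distance between sampled boundary points, normalized by the bounding-box diagonal, may be computed as follows. The symmetric averaged form, the number of samples per box side, and the function names are assumptions made for illustration; the brute-force pairwise distance matrix is adequate only for modest numbers of boundary points.

```python
import numpy as np

def box_boundary_points(box, n_per_side: int = 20) -> np.ndarray:
    """Sample points (x, y) evenly along the four sides of a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    t = np.linspace(0.0, 1.0, n_per_side, endpoint=False)
    top = np.stack([x1 + t * (x2 - x1), np.full_like(t, y1)], axis=1)
    right = np.stack([np.full_like(t, x2), y1 + t * (y2 - y1)], axis=1)
    bottom = np.stack([x2 - t * (x2 - x1), np.full_like(t, y2)], axis=1)
    left = np.stack([np.full_like(t, x1), y2 - t * (y2 - y1)], axis=1)
    return np.concatenate([top, right, bottom, left])

def chamfer_distance(p_pts: np.ndarray, q_pts: np.ndarray) -> float:
    """Mean nearest-neighbour distance from P to Q plus that from Q to P."""
    d = np.linalg.norm(p_pts[:, None, :] - q_pts[None, :, :], axis=2)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def normalized_chamfer(mask_boundary_pts: np.ndarray, box) -> float:
    """Chamfer distance between mask boundary and box boundary over the box diagonal."""
    x1, y1, x2, y2 = box
    diagonal = float(np.hypot(x2 - x1, y2 - y1))
    return chamfer_distance(mask_boundary_pts, box_boundary_points(box)) / diagonal
```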
[0071] After determining the spatial overlap, geometric center alignment, and contour similarity between the mask and the bounding box, the comprehensive matching score can be calculated using the following formula:
[0072] $S(M_i, B_j) = \alpha \cdot \mathrm{IoU}(M_i, B_j) + \beta \cdot S_{\mathrm{ctr}}(M_i, B_j) + \gamma \cdot S_{\mathrm{cnt}}(M_i, B_j)$;
[0073] In the above formula, $S(M_i, B_j)$ denotes the comprehensive matching score between mask $M_i$ and bounding box $B_j$; $\alpha$, $\beta$, and $\gamma$ denote adjustable weighting coefficients used to balance the importance of the constraints; $S_{\mathrm{ctr}}(M_i, B_j)$ denotes the geometric center alignment score, a scoring function constructed from the normalized centroid distance $d_c$ through exponential decay, $S_{\mathrm{ctr}} = \exp(-d_c)$, which effectively filters out false matches with severe center misalignment while tolerating reasonable offsets (such as slight offsets caused by occlusion); $S_{\mathrm{cnt}}(M_i, B_j)$ denotes the contour similarity score, a scoring function constructed from the normalized Chamfer distance $\tilde{d}_{CD}$, which gives higher scores to masks and bounding boxes with closer contours in order to distinguish targets with similar shapes but different categories (such as cars and SUVs), effectively improving alignment robustness.
[0074] Step S1022: Determine the semantic label corresponding to the bounding box with the highest comprehensive matching score in the bounding box set as the semantic label with the highest matching degree with the mask.
[0075] Determine the overall matching score between the mask and each bounding box in the bounding box set. Select the highest overall matching score from multiple overall matching scores and determine whether the overall matching score meets the minimum score requirement. If the overall matching score is greater than the set score threshold, it is determined that the minimum score requirement is met; if the overall matching score is less than or equal to the set score threshold, it is determined that the minimum score requirement is not met.
[0076] If the minimum score requirement is met, the bounding box corresponding to the highest comprehensive matching score is taken as the bounding box with the highest matching degree with the mask. Each bounding box carries a semantic label (category label), and the semantic label corresponding to the bounding box is the semantic label with the highest matching degree with the mask.
[0077] If the minimum score requirement is not met, the mask is marked as "no semantic label with the highest matching degree to the mask was identified" and is placed in the manual review queue.
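As an illustrative, non-limiting sketch of steps S1021 and S1022, the following snippet combines the three dimensions into one score, selects the best bounding box for a mask, and returns no label when the score threshold is not exceeded. The weighted-sum form, the exponential contour term, the weight values, and the threshold value are assumptions made for illustration and are not values stated in this application.

```python
import math
from typing import Iterable, Optional, Tuple

# (label, IoU, normalized centroid distance, normalized Chamfer distance)
Candidate = Tuple[str, float, float, float]

def matching_score(iou: float, centroid_dist: float, chamfer_norm: float,
                   alpha: float = 0.5, beta: float = 0.3,
                   gamma: float = 0.2) -> float:
    """Weighted combination of the three matching dimensions (assumed form)."""
    center_score = math.exp(-centroid_dist)   # exponential decay of centroid distance
    contour_score = math.exp(-chamfer_norm)   # assumed decreasing contour score
    return alpha * iou + beta * center_score + gamma * contour_score

def select_label(candidates: Iterable[Candidate],
                 score_threshold: float = 0.5) -> Optional[str]:
    """Return the label of the best-scoring box, or None for manual review."""
    best_label, best_score = None, -math.inf
    for label, iou, centroid_dist, chamfer_norm in candidates:
        score = matching_score(iou, centroid_dist, chamfer_norm)
        if score > best_score:
            best_label, best_score = label, score
    # The minimum score requirement is met only when strictly above the threshold.
    return best_label if best_score > score_threshold else None
```

A return value of `None` corresponds to the case in which the mask enters the manual review queue.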
[0078] Step S103: Based on the semantic label corresponding to each mask, identify non-rigid targets in the color image, and use a boundary optimization strategy based on geodesic distance and local curvature perception to optimize the boundary of the mask corresponding to the non-rigid target to obtain the optimized geometric contour.
[0079] Color images contain multiple detection targets, which can be categorized into rigid and non-rigid targets. Since semantic labels are category labels, the rigid and non-rigid targets in the color image can be identified based on the semantic label with the highest matching degree selected for each mask. Rigid targets have regular boundaries and high segmentation quality, so their size parameters can be directly calculated using the masks obtained from instance segmentation. Non-rigid targets (such as bloodstains and oil stains) require mask boundary optimization to calculate their size parameters using the optimized masks.
[0080] The boundary optimization strategy based on geodesic distance and local curvature perception optimizes the mask boundary using the geodesic distance inside the mask and the local curvature at the boundary, so as to actively preserve key geometric features of forensic significance, such as sharp corners, branches, and splash trails, while suppressing noise.
[0081] The mask boundary optimization process for non-rigid targets is described below with reference to Figure 3.
[0082] Figure 3 shows a flowchart of the mask boundary optimization steps for non-rigid targets provided in an embodiment of this application. As shown in Figure 3, the mask boundary optimization steps for non-rigid targets include:
[0083] Step S1031: Construct a geodesic distance field based on the geodesic distance between each pixel in the mask corresponding to the non-rigid target and the centroid of the mask.
[0084] Inside the mask $M$ corresponding to a non-rigid target, the geodesic distance from the centroid $c$ of the mask to any pixel $p$ inside the mask is denoted $D_g(p)$. Unlike Euclidean distance, geodesic distance forces the path to travel inside the mask and cannot cross the background area, thus reflecting the "intrinsic geometry" of the non-rigid target itself.
[0085] A distance field is constructed based on the defined geodesic distances. This distance field is called the geodesic distance field, and it can be solved using the Fast Marching Method (FMM), i.e., by solving the following Eikonal equation: $\lvert \nabla D_g(p) \rvert = 1$ for $p \in M$, with $D_g(c) = 0$.
[0086] The geodesic distance field obtained by solving is used to characterize the distance between each pixel and the centroid of the mask. The value is the smallest at the center of the mask and increases monotonically towards the mask boundary. It also exhibits abrupt gradient changes at narrow channels or sharp corners, naturally encoding the local geometric complexity.
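As an illustrative, non-limiting sketch, a geodesic distance field restricted to the mask may be approximated as follows. Instead of the Fast Marching Method, the snippet uses Dijkstra's shortest-path algorithm on the 8-connected pixel grid, which is a simpler substitute that approximates the same field; the function name and the requirement that the seed pixel lie inside the mask are assumptions. For non-convex masks whose centroid falls outside the mask, the seed could be taken as the mask pixel nearest to the centroid.

```python
import heapq
import math
import numpy as np

def geodesic_distance_field(mask: np.ndarray, seed) -> np.ndarray:
    """Shortest in-mask path length from seed (row, col) to every mask pixel.

    Background pixels keep the value inf because paths may not cross them.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    sr, sc = seed
    if not mask[sr, sc]:
        raise ValueError("seed must lie inside the mask")
    dist = np.full((h, w), np.inf)
    dist[sr, sc] = 0.0
    diag = math.sqrt(2.0)
    steps = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
             (-1, -1, diag), (-1, 1, diag), (1, -1, diag), (1, 1, diag)]
    heap = [(0.0, sr, sc)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc, cost in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and mask[nr, nc]:
                nd = d + cost
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist
```

On a U-shaped mask, the distance between the two tips follows the arms of the U rather than the straight line across the gap, which is the behavior that distinguishes geodesic from Euclidean distance.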
[0087] Step S1032: Determine the geodesic gradient and local curvature at the boundary points in the mask corresponding to the non-rigid target, and construct a weighting function based on the local curvature at the boundary points.
[0088] Boundary points can refer to every pixel-level boundary point on the mask contour, which can be obtained through edge detection or contour tracing. The geodesic gradient $\nabla D_g(p)$ is calculated at each boundary point $p$. The direction of the geodesic gradient is the direction of steepest ascent inside the mask, that is, the direction in which the geodesic distance increases fastest along the inside of the mask of the non-rigid target.
[0089] The geodesic gradient points in the tangent direction of the shortest intrinsic path from the boundary point back along the interior of the mask to the centroid of the mask. The magnitude of the geodesic gradient reflects the depth flow of the boundary point within the overall geometry of the non-rigid target. This geodesic gradient is used to drive boundary evolution; for example, in low curvature regions, contraction in the opposite direction of the gradient can smooth jagged edges; in high curvature regions, due to the suppression by the weighting function, it remains almost stationary, thus preserving sharp corners.
[0090] The weighting function is denoted $w(p)$. It is a function of the local curvature that takes values close to 1 where the curvature is small and approaches 0 where the curvature is large, where $\kappa(p)$ represents the local curvature (discrete curvature) at the boundary point $p$, and $\lambda$ represents the smoothing intensity coefficient, used to control the degree of preservation in high curvature areas. The value of $\lambda$ can be set dynamically according to the target type (e.g., bloodstains require higher fidelity, while oil stains can be moderately smoothed).
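[0091] The local curvature $\kappa(p_i)$ at a boundary point can be estimated using the three-point method. For example, let three consecutive points on the boundary be $p_{i-1}$, $p_i$, $p_{i+1}$; then:
[0092] $\kappa(p_i) = \dfrac{4\,A(p_{i-1}, p_i, p_{i+1})}{\lVert p_{i-1} - p_i \rVert \, \lVert p_i - p_{i+1} \rVert \, \lVert p_{i+1} - p_{i-1} \rVert}$;
[0093] In the above formula, $\kappa(p_i)$ represents the discrete curvature at boundary point $p_i$; $A(p_{i-1}, p_i, p_{i+1})$ represents the area of the triangle formed by the three points; $p_{i-1}$, $p_i$, $p_{i+1}$ represent the previous adjacent boundary point, the current target point, and the next adjacent boundary point, respectively.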
[0094] The calculation formula is based on the relationship between the area and side length of a triangle, and can stably estimate the degree of local curvature.
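The three-point estimate can be sketched as follows, assuming the formula is the standard Menger curvature (four times the triangle area divided by the product of the three side lengths). The function name `three_point_curvature` and the tuple representation of points are illustrative choices, not part of this application.

```python
import math


def three_point_curvature(p_prev, p_cur, p_next):
    """Discrete curvature at p_cur from its two boundary neighbours.

    Each point is an (x, y) tuple. Returns 0.0 for collinear or
    coincident points, where the triangle degenerates.
    """
    (ax, ay), (bx, by), (cx, cy) = p_prev, p_cur, p_next
    # Twice the signed triangle area via the 2D cross product.
    cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    area = abs(cross) / 2.0
    side_product = (math.dist(p_prev, p_cur)
                    * math.dist(p_cur, p_next)
                    * math.dist(p_next, p_prev))
    if side_product == 0.0:
        return 0.0
    return 4.0 * area / side_product


straight = three_point_curvature((0, 0), (1, 0), (2, 0))
on_unit_circle = three_point_curvature((1, 0), (0, 1), (-1, 0))
```

Three collinear points give zero curvature, while three points on a circle of radius 1 give a curvature of 1, the reciprocal of the radius.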
[0095] Step S1033: Construct the boundary optimization energy function based on the geodesic gradient and weight function.
[0096] To avoid feature loss caused by traditional smoothing methods in high-curvature regions (such as sharp angles formed by blood splatter), this application introduces a locally curvature-aware weighting function to dynamically adjust the smoothing intensity. To this end, a boundary optimization energy function can be constructed based on the geodesic gradient and the weighting function. The boundary optimization energy function is as follows:
[0097] ;
[0098] When the local curvature at a boundary point is large (such as at sharp corners), the weight approaches 0, the energy term contributes little, and the boundary hardly moves; when the local curvature is small (such as along a straight segment), the weight is close to 1 and the gradient-driven effect is significant, thus achieving effective noise reduction.
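The behaviour described above can be illustrated with a small sketch. The exact expression of the weighting function is not reproduced in this text, so the Gaussian-type decay `exp(-alpha * kappa**2)` used below is only an assumption that satisfies the stated limits (weight near 1 for small curvature, near 0 for large curvature). The names `curvature_weight` and `boundary_step`, and the explicit single-step update, are likewise hypothetical.

```python
import math


def curvature_weight(kappa, alpha):
    """Assumed weight: 1 at zero curvature, decaying towards 0 as |kappa| grows."""
    return math.exp(-alpha * kappa * kappa)


def boundary_step(point, gradient, kappa, alpha, step=1.0):
    """Move a boundary point against the geodesic gradient, scaled by the weight.

    point, gradient: (x, y) tuples. High-curvature points barely move.
    """
    w = curvature_weight(kappa, alpha)
    return (point[0] - step * w * gradient[0],
            point[1] - step * w * gradient[1])


flat_move = boundary_step((10.0, 10.0), (1.0, 0.0), kappa=0.0, alpha=2.0, step=0.5)
corner_move = boundary_step((10.0, 10.0), (1.0, 0.0), kappa=5.0, alpha=2.0, step=0.5)
```

With these inputs the point on a straight segment moves by the full half-pixel step, while the point at a sharp corner stays essentially where it was.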
[0099] Step S1034: Minimize the boundary optimization energy function using the level set method to optimize the geometric profile of the non-rigid target.
[0100] The boundary optimization energy function is minimized using the level set method to iteratively update the geometric contour of the non-rigid target. Let the signed distance function be $\phi(p)$. The signed distance function is the core optimization variable of the energy function, and the energy function guides its evolution; energy minimization couples the two, with the ultimate goal of making the zero level set of $\phi$ (where $\phi(p) = 0$ indicates that point $p$ is on the target boundary) precisely match the actual target contour. The sign of $\phi(p)$ indicates whether point $p$ is inside or outside the non-rigid target. The evolution equation is:
[0101] ;
[0102] When the rate of change of the signed distance function approaches 0, the boundary optimization energy function reaches its minimum value; at this point, the zero level set of the signed distance function is the optimized target boundary, which is consistent with the real target contour. The entire geometric contour optimization process handles topological changes (such as the closure of holes and the joining of broken segments) automatically, without manual intervention.
[0103] Step S104: Correct the basic optical scale of the image acquisition device and determine the initial scale mapping.
[0104] After completing the boundary optimization of non-rigid targets, a pixel-level adaptive scale mapping field can be constructed. The adaptive scale mapping field is used to establish an independent physical scale mapping for each pixel in the color image, thereby overcoming the serious distortion problem in the drone overhead shooting scene under the assumption of a globally fixed scale in traditional methods.
[0105] When establishing an adaptive scale mapping field, the initial scale mapping can be determined first.
[0106] The process of determining the initial scale mapping is described below with reference to Figure 4.
[0107] Figure 4 shows a flowchart of the steps for determining the initial scale mapping provided in an embodiment of this application. As shown in Figure 4, the steps for determining the initial scale mapping include:
[0108] Step S1041: Determine the basic optical scale based on the internal parameters of the image acquisition device.
[0109] The image acquisition device can be a camera, and the internal parameters are the camera's intrinsic parameters, including but not limited to the physical size s of a single pixel of the camera sensor and the focal length f of the lens.
[0110] The basic optical scale reflects the physical length corresponding to a unit pixel on horizontal ground directly below the optical axis, and is the theoretical starting point for scale modeling. The basic optical scale is denoted as $S_0 = \dfrac{s \cdot h}{f}$, where h represents the vertical flight altitude of the UAV relative to the ground and can be obtained from the UAV's pose information; s, h and f are all expressed in millimeters.
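A short numeric sketch may help here. It assumes the basic optical scale follows the usual ground sampling distance relation, i.e., pixel size times flight altitude divided by focal length, with all three quantities in millimeters. The function name and the sample camera values (2.4 µm pixels, 8 mm lens, 50 m altitude) are made up for illustration.

```python
def basic_optical_scale(pixel_size_mm, altitude_mm, focal_length_mm):
    """Ground length in millimeters covered by one pixel at nadir."""
    if focal_length_mm <= 0:
        raise ValueError("focal length must be positive")
    return pixel_size_mm * altitude_mm / focal_length_mm


# Hypothetical camera: 0.0024 mm pixels, 8 mm focal length, flying at 50 m.
scale_mm_per_px = basic_optical_scale(0.0024, 50_000.0, 8.0)
```

For these sample values one pixel at nadir covers roughly 15 mm of ground, so a 440 mm license plate would span about 29 pixels.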
[0111] Step S1042: Based on the UAV pose information and line-of-sight vector, determine the incident angle cosine correction term.
[0112] Since the image acquisition device on the drone may be acquiring RGB images at an angle, this will cause a projection scaling effect. To compensate for the projection scaling effect, an incident angle cosine correction term can be set.
[0113] The incident angle cosine correction term can be determined based on the line-of-sight vector and the ground normal vector. The line-of-sight vector is assumed to be in the camera coordinate system, with its direction from the camera's optical center to the pixel (x, y). However, since the ground normal vector is in the world coordinate system, if the drone is shooting from a perfectly vertical position, the camera coordinate system and the world coordinate system are approximately equal, and the line-of-sight vector can be directly used as the line-of-sight vector in the world coordinate system without rotation. However, if the drone is tilted in any way, the line-of-sight vector and the ground normal vector are not in the same coordinate system, rendering the dot product meaningless. Therefore, a rotation matrix can be determined using the drone's pose information to convert the line-of-sight vector in the camera coordinate system to the line-of-sight vector in the world coordinate system. Then, using the line-of-sight vector in the world coordinate system and the ground normal vector, the incident angle cosine correction term is determined. The formula for calculating the incident angle cosine correction term is as follows:
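[0114] $\cos\theta(x,y) = \dfrac{\mathbf{n} \cdot \mathbf{v}(x,y)}{\|\mathbf{n}\|\,\|\mathbf{v}(x,y)\|}$;
[0115] In the above formula, $\cos\theta(x,y)$ represents the incident angle cosine correction term corresponding to pixel (x, y) in the RGB image; $\theta(x,y)$ indicates the local incident angle; $\mathbf{n}$ represents the ground normal vector, which by default is the vertical unit vector (applicable to flat or gently sloping surfaces); $\mathbf{v}(x,y)$ represents the line-of-sight vector, i.e., the vector pointing from the camera's optical center to pixel (x, y), expressed in the world coordinate system.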
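The rotation and dot product described above can be sketched as follows. The sketch assumes a world frame with the z axis pointing up, a camera frame with the optical axis along +z, and a camera-to-world rotation matrix derived from the UAV pose. It also takes the absolute value of the dot product so that the result does not depend on whether the normal points up or down. These conventions and the names `incidence_cosine` and `NADIR_ROTATION` are assumptions made for the example.

```python
import math


def mat_vec(rotation, vector):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(rotation[i][j] * vector[j] for j in range(3))
                 for i in range(3))


def incidence_cosine(v_cam, cam_to_world, normal=(0.0, 0.0, 1.0)):
    """Cosine of the angle between the world-frame line of sight and the ground normal."""
    v_world = mat_vec(cam_to_world, v_cam)
    dot = sum(n * v for n, v in zip(normal, v_world))
    norm_n = math.sqrt(sum(n * n for n in normal))
    norm_v = math.sqrt(sum(v * v for v in v_world))
    return abs(dot) / (norm_n * norm_v)


# Camera looking straight down: a 180 degree rotation about the x axis.
NADIR_ROTATION = ((1.0, 0.0, 0.0),
                  (0.0, -1.0, 0.0),
                  (0.0, 0.0, -1.0))

center_cos = incidence_cosine((0.0, 0.0, 1.0), NADIR_ROTATION)
angle = math.radians(30.0)
oblique_cos = incidence_cosine((math.sin(angle), 0.0, math.cos(angle)),
                               NADIR_ROTATION)
```

For the ray along the optical axis of a nadir-pointing camera the cosine is 1, and for a ray 30 degrees off the axis it drops to cos 30°, about 0.866.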
[0116] Step S1043: Determine the initial scale mapping using the basic optical scale and the incident angle cosine correction term.
[0117] Because oblique incidence enlarges the effective projected area by a factor of $1/\cos\theta(x,y)$, the scale changes accordingly; therefore, the initial scale mapping can be expressed as $S_{init}(x,y) = \dfrac{S_0}{\cos\theta(x,y)}$, where $S_{init}(x,y)$ represents the initial scale mapping at pixel (x, y) and $S_0$ is the basic optical scale. The initial scale mapping reduces stretching distortion; for example, it effectively alleviates the stretching distortion at the four corners of an RGB image caused by large-angle observation, bringing the scale closer to the real ground coverage relationship.
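To make the corner effect concrete, the sketch below evaluates a per-pixel initial scale for a nadir-pointing pinhole camera. It assumes the initial scale is the basic optical scale divided by the incident angle cosine, and that for a nadir camera the cosine equals the focal length in pixels divided by the length of the pixel ray. The function name, the principal point, and the 1920x1080 image with a 3000-pixel focal length are hypothetical.

```python
import math


def initial_scale(x, y, cx, cy, focal_px, base_scale_mm):
    """Assumed initial scale (mm per pixel) at pixel (x, y) for a nadir camera.

    (cx, cy): principal point in pixels; focal_px: focal length in pixels.
    """
    ray_length = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + focal_px ** 2)
    cos_theta = focal_px / ray_length
    return base_scale_mm / cos_theta


center_scale = initial_scale(960, 540, 960, 540, 3000.0, 15.0)
corner_scale = initial_scale(0, 0, 960, 540, 3000.0, 15.0)
```

Under these assumptions the scale at the image centre equals the basic optical scale, while the corner pixel maps to a larger ground length, which is the stretching that a single global scale would ignore.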
[0118] Step S105: Based on semantic labels, identify rigid reference objects in the color image, and use the detection confidence of the rigid reference objects to correct the initial scale mapping of the target region where the rigid reference objects are located, so as to obtain the local scale mapping field.
[0119] To address local scale drift (such as height estimation errors and lens distortion residues), this application introduces a reference confidence weighted correction mechanism to correct scale drift.
[0120] Since each detected target in the color image has a semantic label with the highest matching degree, it is possible to determine whether the detected target is a rigid target or a non-rigid target based on the semantic label. For each rigid target, it is determined whether there is a reference object matching the rigid target in the preset reference object knowledge base. If there is a reference object matching the rigid target, then the rigid target is identified as a rigid reference object.
[0121] The preset reference object knowledge base records multiple reference objects. These reference objects are targets whose physical dimensions are highly standardized in reality and are not easily deformed by accidents. For example, for the vehicle body, due to the large differences in vehicle models and the easy deformation after a collision, it is not used as a reference object. For license plates, due to their hard material and uniform size, they can still be used approximately even if slightly bent, so they can be included in the preset reference object knowledge base.
[0122] For example, the preset reference object knowledge base records the following: the standard width of a small car license plate is 440mm, the standard diameter of a sedan wheel hub is 600 to 700mm, and the height of a traffic cone is approximately 750mm. During the detection process, if the category label of a rigid target (such as "license plate") exists in the preset reference object knowledge base, it is automatically identified as a rigid reference object.
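A minimal sketch of such a knowledge base lookup is shown below. The dictionary layout, the label strings, and the function name `is_rigid_reference` are hypothetical; only the dimensions (440 mm plate width, 600 to 700 mm wheel hub diameter, roughly 750 mm cone height) come from the example above.

```python
# Reference objects with their standardized dimension range in millimeters.
REFERENCE_KB = {
    "license_plate": {"dimension": "width", "range_mm": (440.0, 440.0)},
    "wheel_hub": {"dimension": "diameter", "range_mm": (600.0, 700.0)},
    "traffic_cone": {"dimension": "height", "range_mm": (750.0, 750.0)},
}

# Labels treated as rigid targets; a car body is rigid but not standardized.
RIGID_LABELS = {"license_plate", "wheel_hub", "traffic_cone", "car"}


def is_rigid_reference(label):
    """A target is a rigid reference only if it is rigid and listed in the knowledge base."""
    return label in RIGID_LABELS and label in REFERENCE_KB


labels = ["license_plate", "car", "bloodstain", "traffic_cone"]
references = [label for label in labels if is_rigid_reference(label)]
```

Here the car is rigid but has no standardized size and the bloodstain is non-rigid, so only the license plate and the traffic cone are kept as scale references.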
[0123] The process of determining the local scale mapping field is described below with reference to Figure 5.
[0124] Figure 5 shows a flowchart of the steps for determining the local scale mapping field provided in an embodiment of this application. As shown in Figure 5, the steps for determining the local scale mapping field include:
[0125] Step S1051: Determine the local correction factor based on the geometric dimensions of the rigid reference object and the detection confidence level.
[0126] The geometric dimensions of the rigid reference object include its pixel width in the color image, denoted as $w_{pix}$. In theory, the true scale of the region containing the rigid reference object should be $W_{real} / w_{pix}$, where $W_{real}$ is the geometric dimension of the rigid reference object recorded in the preset reference object knowledge base (e.g., the width of a standard license plate). The currently predicted value is the initial scale mapping $S_{init}(x,y)$. The ratio of the true scale to the predicted scale reflects the local scale bias, from which the local correction factor is derived. The formula for calculating the local correction factor is:
[0127] ;
[0128] In the above formula, the local correction factor is evaluated at pixel (x, y). The detection confidence of the rigid reference object is output by the target detection model and obtained through a Sigmoid mapping, which suppresses noise introduced by low-quality detection. The correction strength coefficient is used to prevent overcorrection caused by low-confidence detection; its value is usually within [0.5, 0.9], and the specific value can be determined through offline calibration or an online adaptive strategy.
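The sketch below shows one way the described ingredients could be combined. The exact formula is not reproduced in this text, so the blend `1 + beta * c * (ratio - 1)` is purely an assumption chosen so that a confidence of zero leaves the scale unchanged and a confident detection pulls the scale towards the reference-derived value. The function name and sample numbers are hypothetical.

```python
def local_correction_factor(real_size_mm, pixel_size_px, predicted_scale_mm,
                            confidence, strength=0.8):
    """Assumed confidence-weighted correction factor for one rigid reference.

    real_size_mm: standardized size from the knowledge base.
    pixel_size_px: measured extent of the reference in the image.
    predicted_scale_mm: initial scale mapping at the reference (mm per pixel).
    confidence: detection confidence in [0, 1].
    strength: correction strength coefficient, typically within [0.5, 0.9].
    """
    true_scale = real_size_mm / pixel_size_px
    ratio = true_scale / predicted_scale_mm
    return 1.0 + strength * confidence * (ratio - 1.0)


# A 440 mm plate spanning 40 px where the initial scale predicts 10 mm per pixel.
factor = local_correction_factor(440.0, 40.0, 10.0, confidence=0.9)
corrected_scale = factor * 10.0
```

In this example the reference implies 11 mm per pixel, and the corrected scale moves most of the way from 10 towards 11 without fully trusting a single detection.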
[0129] Step S1052: Correct the initial mapping using a local correction factor to determine the local scale mapping field.
[0130] The local scale mapping field characterizes the true ground length represented by a unit pixel at each pixel point in the target region. The local scale mapping field is denoted as $S_{local}(x,y)$ and is calculated as $S_{local}(x,y) = \gamma(x,y) \cdot S_{init}(x,y)$, where $\gamma(x,y)$ is the local correction factor and $S_{init}(x,y)$ is the initial scale mapping. The local scale mapping field includes the scale mapping value of each pixel in the color image.
[0131] In color images, most masks (such as non-rigid targets) do not contain any rigid references. Therefore, the region containing the mask lacks a direct local correction factor. To address this, several high-confidence correction points can be generated using rigid references. Then, interpolation is used to smoothly extend the local scale map field to the entire image, ensuring that each pixel (including the location of non-rigid targets) receives a reasonable scale value. In this case, each rigid reference acts as a correction sample point, and the scale map value at the center pixel of that correction sample point is used to correct the scale map values at other locations in the color image.
[0132] Step S106: The local scale mapping field is extended by linear interpolation to obtain an adaptive scale mapping field. The size parameters of each detection target are determined based on the adaptive scale mapping field to obtain the traffic accident scene detection results.
[0133] Color images may contain regions without rigid references. For these regions (such as grass, soil, or unidentified debris), the scale mapping values of neighboring high-confidence regions can be linearly interpolated to extend the scale to the regions lacking rigid references, thus obtaining an adaptive scale mapping field and ensuring its continuity. Here, "neighboring high-confidence regions" refers to a number of regions that are closest to the target area and whose detection confidence exceeds the preset confidence threshold.
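One simple form of linear interpolation between correction sample points is barycentric interpolation inside the triangle formed by three samples. The sketch below uses that scheme as an illustration; the choice of three samples, the function name, and the sample coordinates are assumptions, and this application does not specify which interpolation layout is used.

```python
def interpolate_scale(point, samples):
    """Linearly interpolate a scale value at `point` from three samples.

    point: (x, y) pixel position.
    samples: three (x, y, scale) tuples that are not collinear.
    """
    (x1, y1, s1), (x2, y2, s2), (x3, y3, s3) = samples
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("sample points are collinear")
    px, py = point
    # Barycentric weights of the query point with respect to the samples.
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * s1 + w2 * s2 + w3 * s3


# Three hypothetical reference centres with their corrected scales (mm per pixel).
SAMPLES = [(0, 0, 10.0), (100, 0, 12.0), (0, 100, 14.0)]
midpoint_scale = interpolate_scale((50, 0), SAMPLES)
```

The interpolated value reproduces each sample exactly at its own position and varies linearly in between, so the resulting field is continuous across the region without references.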
[0134] In one embodiment, the adaptive scale mapping field includes the scale mapping value of each pixel. The size parameters of each detected target can be determined based on the scale mapping values in the adaptive scale mapping field. The size parameters include, but are not limited to, perimeter and area.
[0135] Specifically, for each detected target, the area of the target is determined by summing the squares of the scale mapping values of each pixel within the mask corresponding to the target. For example, if there are 100 pixels in the mask, since each scale mapping value reflects the actual ground length at that pixel, the sum of the squares of these 100 scale mapping values can be used to determine the physical area of the detected target.
[0136] Furthermore, the perimeter of the detected target can be determined by summing the scale mapping values of the boundary pixels of the mask corresponding to the target. For example, if the mask has 200 boundary pixels, the sum of these 200 scale mapping values is used to determine the physical perimeter of the detected target.
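The two accumulations described above can be sketched as follows. The function name `measure_target`, the list-of-lists layout, and the definition of a boundary pixel as a mask pixel with at least one 4-connected neighbour outside the mask are assumptions made for the example.

```python
def measure_target(mask, scale):
    """Return (area, perimeter) of a mask using a per-pixel scale field.

    mask: list of rows with 0 or 1. scale: same shape, mm per pixel.
    Area is the sum of squared scale values over mask pixels (mm^2);
    perimeter is the sum of scale values over boundary pixels (mm).
    """
    rows, cols = len(mask), len(mask[0])

    def outside(r, c):
        # Out-of-image positions count as background.
        return not (0 <= r < rows and 0 <= c < cols) or not mask[r][c]

    area = 0.0
    perimeter = 0.0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            s = scale[r][c]
            area += s * s
            if any(outside(r + dr, c + dc)
                   for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))):
                perimeter += s
    return area, perimeter


# A 3x3 target in a 5x5 image with a uniform scale of 10 mm per pixel.
MASK = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
SCALE = [[10.0] * 5 for _ in range(5)]
area_mm2, perimeter_mm = measure_target(MASK, SCALE)
```

With a uniform scale the area reduces to the pixel count times the squared scale. With a spatially varying field, each pixel contributes according to its own local scale, which is the point of the adaptive mapping.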
[0137] In one embodiment, after determining the size parameters of each detection target, a structured survey report is automatically generated to serve as the result of the traffic accident scene detection.
[0138] The structured investigation report includes the category of the detected target (such as bloodstains, cars), physical dimensions (area, perimeter, length), image coordinates, detection confidence level, and visualization overlay. It can be directly connected to the cloud platform of the traffic management department, supporting remote review and evidence archiving.
[0139] In one embodiment, to further improve the accuracy of the adaptive scale mapping field, a terrain undulation compensation term can be introduced into the adaptive scale mapping field, wherein the terrain undulation compensation term is determined based on the relative elevation of the pixel point relative to the average ground and the flight altitude of the UAV.
[0140] Specifically, if the drone captures multiple frames of a traffic accident scene from multiple perspectives, the SfM (Structure from Motion) algorithm can be run to generate a sparse point cloud of the traffic accident scene, and the relative elevation of each pixel can be estimated based on the sparse point cloud; if only a single frame is captured of the traffic accident scene, a lightweight monocular depth network (such as MiDaS-Mobile) can be selected to output a depth map, and the relative elevation of each pixel can be estimated based on the depth map.
[0141] After determining the relative elevation of each pixel, a terrain undulation compensation term can be constructed and introduced into the adaptive scale mapping field. This term further improves scale accuracy in scenarios involving ramps, shoulders, or vehicle stacking. The adaptive scale mapping field after introducing the terrain undulation compensation term is as follows:
[0142] ;
[0143] In the above formula, the terrain undulation compensation term is computed from the relative elevation of the pixel with respect to the average ground level (positive values indicate a raised area, and negative values indicate a recessed area) and from the terrain sensitivity coefficient.
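As an illustration of how elevation could enter the scale, the sketch below uses the factor `1 - lam * dh / h`. This form is an assumption motivated by pinhole geometry (a raised point is closer to the camera, so one pixel covers less ground); the compensation term of this application is not reproduced in this text and may differ. The function names and sample values are hypothetical.

```python
def terrain_compensation(relative_elevation_mm, altitude_mm, sensitivity=1.0):
    """Assumed compensation factor from relative elevation and flight altitude."""
    if altitude_mm <= 0:
        raise ValueError("altitude must be positive")
    return 1.0 - sensitivity * relative_elevation_mm / altitude_mm


def compensated_scale(scale_mm, relative_elevation_mm, altitude_mm,
                      sensitivity=1.0):
    """Apply the assumed terrain compensation to one scale mapping value."""
    return scale_mm * terrain_compensation(relative_elevation_mm,
                                           altitude_mm, sensitivity)


# 15 mm per pixel at 50 m altitude; a point raised by 0.5 m and one sunk by 0.5 m.
raised = compensated_scale(15.0, 500.0, 50_000.0)
recessed = compensated_scale(15.0, -500.0, 50_000.0)
```

Under this assumed form, a raised area gets a slightly smaller scale and a recessed area a slightly larger one, and flat ground is left unchanged.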
[0144] The traffic accident scene detection method provided in this application can optimize the geometric contour of non-rigid targets by using a boundary optimization strategy based on geodesic distance and local curvature perception, thus avoiding the problem of geometric distortion of non-rigid targets. At the same time, it uses the detection confidence of rigid reference objects to correct the initial scale mapping in order to construct an adaptive scale mapping field. This overcomes the scale distortion problem caused by perspective distortion at the image edges in traditional global fixed scales, and realizes accurate size measurement of the detected targets in traffic accident scenes, solving the problems of scale distortion and low accuracy of detection results.
[0145] Based on the same inventive concept, this application also provides a traffic accident scene detection device corresponding to the traffic accident scene detection method. Since the principle of the device in this application is similar to the traffic accident scene detection method described above in this application, the implementation of the device can refer to the implementation of the method, and the repeated parts will not be described again.
[0146] Figure 6 is a schematic diagram of the structure of a traffic accident scene detection device provided in an embodiment of this application. As shown in Figure 6, the traffic accident scene detection device 200 includes:
[0147] The segmentation and detection module 201 is used to acquire UAV pose information and color images of traffic accident scenes collected by UAV, perform instance segmentation and semantic target detection on the color images respectively, and obtain a mask set and a bounding box set. The mask set includes multiple masks without semantic labels, and the bounding box set includes multiple bounding boxes with semantic labels.
[0148] The semantic mask alignment module 202 is used to evaluate the degree of matching between the mask and the bounding box from multiple dimensions, and select the semantic label with the highest matching degree for each mask based on the evaluation results;
[0149] The contour optimization module 203 is used to identify non-rigid targets in the color image based on the semantic label corresponding to each mask, and to perform boundary optimization on the mask corresponding to the non-rigid target using a boundary optimization strategy based on geodesic distance and local curvature perception to obtain the optimized geometric contour.
[0150] The initial mapping determination module 204 is used to correct the basic optical scale of the image acquisition device and determine the initial scale mapping;
[0151] The local mapping field determination module 205 is used to identify rigid reference objects in a color image based on semantic labels, and to correct the initial scale mapping of the target region where the rigid reference object is located in the color image by using the detection confidence of the rigid reference object, so as to obtain the local scale mapping field.
[0152] The accident detection module 206 is used to extend the local scale mapping field through linear interpolation to obtain an adaptive scale mapping field, and to determine the size parameters of each detection target based on the adaptive scale mapping field in order to obtain the traffic accident scene detection results.
[0153] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. As shown in Figure 7, the electronic device 300 includes a processor 310, a memory 320, and a bus 330.
[0154] The memory 320 stores machine-readable instructions executable by the processor 310. When the electronic device 300 is running, the processor 310 and the memory 320 communicate via the bus 330. When the machine-readable instructions are executed by the processor 310, the steps of the traffic accident scene detection method in the method embodiment shown in Figure 1 can be performed. For the specific implementation, refer to the method embodiment, which will not be repeated here.
[0155] This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, can perform the steps of the traffic accident scene detection method in the method embodiment shown in Figure 1. For the specific implementation, refer to the method embodiment, which will not be repeated here.
[0156] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0157] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the shown or discussed mutual couplings, direct couplings, or communication connections may be through some communication interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.
[0158] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0159] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0160] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0161] Finally, it should be noted that the above-described embodiments are merely specific implementations of this application, used to illustrate the technical solutions of this application, and not to limit them. The scope of protection of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify or easily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features, within the scope of the technology disclosed in this application. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for detecting traffic accident scenes, characterized in that the method comprises: acquiring a color image of a traffic accident scene captured by a drone, and performing instance segmentation and semantic object detection on the color image respectively to obtain a mask set and a bounding box set, wherein the mask set includes multiple masks without semantic labels and the bounding box set includes multiple bounding boxes with semantic labels; evaluating the matching degree between the mask and the bounding box from multiple dimensions, and selecting the semantic label with the highest matching degree for each mask based on the evaluation results; identifying non-rigid targets in the color image based on the semantic label corresponding to each mask, and optimizing the boundary of the mask corresponding to the non-rigid target using a boundary optimization strategy based on geodesic distance and local curvature perception to obtain the optimized geometric contour; correcting the basic optical scale of the image acquisition device to determine the initial scale mapping; identifying rigid reference objects in the color image based on the semantic labels, and correcting the initial scale mapping of the target region where the rigid reference objects are located using the detection confidence of the rigid reference objects to obtain a local scale mapping field; extending the local scale mapping field by linear interpolation to obtain an adaptive scale mapping field, and determining the size parameters of each detection target based on the adaptive scale mapping field to obtain the traffic accident scene detection results.
2. The method according to claim 1, characterized in that, The step of using a boundary optimization strategy based on geodesic distance and local curvature perception to optimize the boundary of the mask corresponding to the non-rigid target and obtain the optimized geometric contour includes: Based on the geodesic distance between each pixel in the mask corresponding to the non-rigid target and the centroid of the mask, a geodesic distance field is constructed. Determine the geodesic gradient and local curvature at the boundary points in the mask corresponding to the non-rigid target, and construct a weighting function based on the local curvature; Based on the geodesic gradient and the weighting function, a boundary optimization energy function is constructed; The boundary optimization energy function is minimized using the level set method to optimize the geometric profile of the non-rigid target.
3. The method according to claim 1, characterized in that the step of correcting the initial scale mapping of the target region where the rigid reference object is located using the detection confidence of the rigid reference object to obtain a local scale mapping field includes: determining a local correction factor based on the geometric dimensions of the rigid reference object and the detection confidence; correcting the initial scale mapping using the local correction factor to determine the local scale mapping field, which is used to characterize the true ground length represented by a unit pixel at each pixel point in the target region.
4. The method according to claim 1, characterized in that the step of correcting the basic optical scale of the image acquisition device and determining the initial scale mapping includes: determining the basic optical scale based on the internal parameters of the image acquisition device; determining the incident angle cosine correction term based on the UAV pose information and the line-of-sight vector; determining the initial scale mapping using the basic optical scale and the incident angle cosine correction term.
5. The method according to claim 1, characterized in that, The size parameters include perimeter and area, the adaptive scale mapping field includes the scale mapping value of each pixel, and the step of determining the size parameters of each detected target based on the adaptive scale mapping field to obtain the traffic accident scene detection results includes: For each detected target, the square of the scale mapping value of each pixel in the mask corresponding to the detected target is accumulated to determine the area of the detected target; The perimeter of the detected target is determined by summing the scale mapping values of the boundary pixels within the mask corresponding to the detected target.
6. The method according to claim 1, characterized in that, The step of extending the local scale mapping field through linear interpolation to obtain an adaptive scale mapping field includes: By using the scale mapping values of neighboring high-confidence regions, the scale of regions without rigid references is extended by linear interpolation to obtain an adaptive scale mapping field.
7. The method according to claim 6, characterized in that, The method further includes: A terrain undulation compensation term is introduced into the adaptive scale mapping field, which is determined based on the relative elevation and the flight altitude of the UAV.
8. The method according to claim 1, characterized in that, The evaluation result includes a comprehensive matching score. The step of evaluating the matching degree between the mask and the bounding box from multiple dimensions and selecting the semantic label with the highest matching degree for each mask based on the evaluation result includes: For each mask, the degree of matching between the mask and the bounding box is evaluated from three dimensions: spatial overlap, geometric center alignment, and contour similarity, and the comprehensive matching score between the mask and different bounding boxes is determined. The semantic label corresponding to the bounding box with the highest comprehensive matching score in the bounding box set is determined as the semantic label with the highest matching degree with the mask.
9. The method according to claim 8, characterized in that, The spatial overlap is determined based on the intersection-union ratio between the mask and the bounding box, the geometric center alignment is determined based on the centroid distance between the mask and the bounding box, and the contour similarity is determined based on the point set similarity between the mask boundary and the bounding box boundary.
10. A traffic accident scene detection device, characterized in that the device comprises: a segmentation and detection module, used to acquire color images of traffic accident scenes collected by drones, and to perform instance segmentation and semantic object detection on the color images respectively to obtain a mask set and a bounding box set, wherein the mask set includes multiple masks without semantic labels and the bounding box set includes multiple bounding boxes with semantic labels; a semantic mask alignment module, used to evaluate the degree of matching between the mask and the bounding box from multiple dimensions, and to select the semantic label with the highest matching degree for each mask based on the evaluation results; a contour optimization module, used to identify non-rigid targets in the color image based on the semantic label corresponding to each mask, and to perform boundary optimization on the mask corresponding to the non-rigid target using a boundary optimization strategy based on geodesic distance and local curvature perception to obtain the optimized geometric contour; an initial mapping determination module, used to correct the basic optical scale of the image acquisition device and determine the initial scale mapping; a local mapping field determination module, used to identify rigid reference objects in the color image based on the semantic labels, and to correct the initial scale mapping of the target region where the rigid reference object is located using the detection confidence of the rigid reference object, so as to obtain the local scale mapping field; an accident detection module, used to extend the local scale mapping field through linear interpolation to obtain an adaptive scale mapping field, and to determine the size parameters of each detection target based on the adaptive scale mapping field in order to obtain the traffic accident scene detection results.