Vehicle helmet detection method and apparatus, electronic device, and computer-readable storage medium
By extracting edge and background information from image samples and combining it with neural network training to optimize the model structure, the problems of slow detection speed and limited applicable scenarios in existing technologies have been solved, enabling real-time and accurate detection and early warning of the driver's helmet wearing status.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NEXTVPU (SHANGHAI) CO LTD
- Filing Date
- 2025-12-05
- Publication Date
- 2026-06-25
AI Technical Summary
Existing computer vision technologies are slow to detect whether drivers of two-wheeled vehicles and unicycles are wearing helmets, cannot achieve real-time detection, and have limited applicability, resulting in missed detections, false detections, and poor generalization ability.
By acquiring a vehicle helmet detection dataset, edge and background information of image samples are extracted, and a vehicle helmet detection model is established through training using a neural network model. Furthermore, the neural network structure is optimized by combining global average pooling, global max pooling, InnerCIoU loss function, and ReLU activation function to improve detection accuracy and speed.
It enables accurate helmet positioning in complex scenarios, reducing false positives and false negatives, and allows for real-time monitoring of helmet use under different conditions, improving the accuracy and speed of early warnings.
Description
Vehicle helmet detection method and apparatus, electronic device, and computer-readable storage medium
Technical Field
[0001] This disclosure relates to the field of computer vision technology, and in particular to methods and apparatus for establishing vehicle helmet detection models, vehicle helmet detection methods and apparatus, electronic devices, computer-readable storage media, and computer program products.
Background Technology
[0002] With increasingly severe road traffic congestion, vehicles such as two-wheeled vehicles and unicycles have become one of the main modes of transportation. However, the accident rate involving these vehicles has also risen accordingly. One of the primary causes of serious injuries and deaths in traffic accidents involving these vehicles is the failure of drivers to wear helmets.
[0003] The methods described in this section are not necessarily methods that had been previously conceived or adopted. Unless otherwise specified, no method described in this section should be assumed to be prior art simply because it is included in this section. Similarly, unless otherwise specified, the issues mentioned in this section should not be considered to be accepted in any prior art.
Summary of the Invention
[0004] According to one aspect of this disclosure, a method for establishing a vehicle helmet detection model is provided, comprising: acquiring a vehicle helmet detection dataset, the vehicle helmet detection dataset including multiple image samples of a vehicle during driving or stopping; for each of the multiple image samples: extracting edge information and background information of the image sample; and identifying one or more vehicle helmets in the image sample; and training a neural network model based on the edge information, background information, and the identified one or more vehicle helmets to establish a vehicle helmet detection model.
[0005] According to another aspect of this disclosure, a vehicle helmet detection method is provided, comprising: capturing one or more images of a vehicle while it is in motion or stationary via a sensor; performing object detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established according to the method described above for establishing a vehicle helmet detection model; and issuing a warning to the driver based on the determination that the driver of the vehicle is not wearing a vehicle helmet.
[0006] According to another aspect of this disclosure, an apparatus for establishing a vehicle helmet detection model is provided, comprising: a dataset acquisition unit configured to acquire a vehicle helmet detection dataset, the vehicle helmet detection dataset including multiple image samples of a vehicle during driving or stopping; an extraction unit configured to extract edge information and background information for each of the multiple image samples; a recognition unit configured to recognize one or more vehicle helmets in each of the multiple image samples; and a training unit configured to train a neural network model based on the edge information, background information, and the recognized one or more vehicle helmets to establish a vehicle helmet detection model.
[0007] According to another aspect of this disclosure, a vehicle helmet detection device is provided, comprising: a capture unit configured to capture one or more images of a vehicle during driving or stopping via a sensor; a target detection unit configured to perform target detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established according to the method for establishing the vehicle helmet detection model described above; and a warning unit configured to issue a warning to the driver based on the determination that the driver of the vehicle is not wearing a vehicle helmet.
[0008] According to another aspect of this disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method for establishing a vehicle helmet detection model or the vehicle helmet detection method described above.
[0009] According to another aspect of this disclosure, a computer-readable storage medium is provided having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for establishing a vehicle helmet detection model or the vehicle helmet detection method described above.
[0010] According to another aspect of this disclosure, a computer program product is provided, comprising a computer program, wherein when the computer program is executed by a processor, it implements the steps of the method for establishing a vehicle helmet detection model or the vehicle helmet detection method described above.
[0011] Further features and advantages of this disclosure will become clear from the exemplary embodiments described below in conjunction with the accompanying drawings.
Description of the Drawings
[0012] The accompanying drawings exemplify embodiments and form part of the specification, serving together with the textual description to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.
[0013] Figure 1 is a flowchart illustrating a method for establishing a vehicle helmet detection model according to an exemplary embodiment;
[0014] Figure 2 is a flowchart illustrating the global average pooling operation and the global max pooling operation performed on an image according to an exemplary embodiment;
[0015] Figure 3 is a flowchart illustrating a method for establishing a vehicle helmet detection model according to another exemplary embodiment;
[0016] Figure 4 is a flowchart illustrating a method for establishing a vehicle helmet detection model according to yet another exemplary embodiment;
[0017] Figure 5 is a flowchart illustrating a vehicle helmet detection method according to an exemplary embodiment;
[0018] Figure 6 shows a structural block diagram of an apparatus for establishing a vehicle helmet detection model according to an embodiment of the present disclosure;
[0019] Figure 7 shows a structural block diagram of a vehicle helmet detection device according to an embodiment of the present disclosure;
[0020] Figure 8 is a structural block diagram illustrating a computing device according to an exemplary embodiment of the present disclosure.
Detailed Implementation
[0021] In this disclosure, unless otherwise stated, the use of terms such as "first," "second," etc., to describe various elements is not intended to limit the positional, temporal, or importance relationships of these elements; such terms are merely used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of that element, while in other cases, based on the context, they may refer to different instances.
[0022] The terminology used in the description of the various examples described in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context explicitly indicates otherwise, an element may be one or more unless the number of elements is specifically limited. Furthermore, the term "and / or" as used in this disclosure covers any one of the listed items and all possible combinations thereof.
[0023] With increasingly severe road traffic congestion, vehicles such as two-wheeled vehicles and unicycles have become one of the main modes of transportation. However, the accident rate involving these vehicles has also risen accordingly. One of the main reasons for serious injuries and fatalities in these accidents is that drivers of these vehicles are not wearing helmets. Therefore, increasing the helmet-wearing rate among drivers of these vehicles is beneficial to reducing traffic accident injuries and fatalities.
[0024] Although regulations clearly stipulate that drivers of such vehicles must wear helmets while driving, relying solely on regulations and supervision is inefficient. Furthermore, the inventors discovered that existing computer vision algorithms for object detection are slow and unable to detect helmet use in real time, especially in heavy traffic, potentially leading to missed or false detections. Moreover, the inventors found that existing object detection algorithms have limited applicability; on one hand, they may not be compatible with different hardware devices, hindering large-scale application; on the other hand, they may not accurately detect helmet use in different regions and environments, exhibiting poor generalization ability.
[0025] To address the aforementioned technical problems, this disclosure provides a novel method and apparatus for establishing a vehicle helmet detection model, a vehicle helmet detection method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product. As will be clearly seen in the following detailed description, the method for establishing a vehicle helmet detection model according to embodiments of this disclosure can acquire a large vehicle helmet detection dataset, extract edge information and background information from each image sample in the dataset, identify the target "helmet," and train a neural network model based on the edge information, background information, and identification results, thereby establishing a vehicle helmet detection model. Since edge information can locate the contour boundary of the target object, and background information can determine the environmental range of the target object, simultaneously extracting both edge information and background information to train the neural network model can accurately determine the helmet's position in more complex scenes, reducing false detections and missed detections. It can also better infer the complete shape and position of the occluded part when the target object is partially occluded, thereby improving the accuracy of the established vehicle helmet detection model. Furthermore, the vehicle helmet detection method according to embodiments of this disclosure can capture one or more images of a vehicle while it is moving or stationary, and use the established vehicle helmet detection model to determine whether the driver is wearing a vehicle helmet, thereby issuing a warning based on the determination result. Because the established vehicle helmet detection model has high accuracy and fast processing speed, it can realize real-time detection of the driver's helmet wearing status and improve the accuracy of driver warnings.
[0026] Exemplary embodiments of the methods of this disclosure will now be described in further detail with reference to the accompanying drawings.
[0027] Figure 1 illustrates a flowchart of a method 100 for establishing a vehicle helmet detection model according to an exemplary embodiment of the present disclosure. As shown in Figure 1, the method 100 for establishing a vehicle helmet detection model may include: step S110, acquiring a vehicle helmet detection dataset, the vehicle helmet detection dataset including multiple image samples of a vehicle during driving or stopping; step S120, for each of the multiple image samples, extracting edge information and background information of the image sample; step S130, for each of the multiple image samples, identifying one or more vehicle helmets in the image sample; and step S140, training a neural network model based on the edge information, background information, and the identified one or more vehicle helmets to establish a vehicle helmet detection model.
[0028] Since edge information can locate the contour boundary of the target object, while background information can determine the environmental range of the target object, extracting both edge and background information simultaneously to train the neural network model can accurately determine the position of the helmet in more complex scenes, reduce false detections and false negatives, and better infer the complete shape and position of the occluded part when the target object is partially occluded, thereby improving the accuracy of the established vehicle helmet detection model.
[0029] According to some embodiments of this disclosure, the vehicle can be a two-wheeled vehicle or a unicycle. For example, the vehicle in this disclosure can be a two-wheeled vehicle such as a motorcycle, electric bicycle, two-wheeled self-balancing scooter, or electric scooter, or a unicycle such as a single-wheeled self-balancing scooter.
[0030] In step S110, the vehicle helmet detection dataset can be a set of image samples obtained in any suitable manner. In some examples, the vehicle helmet detection dataset can be obtained through web crawling. That is, a script can be used to retrieve a batch of image samples using keywords such as "wearing a helmet," "detecting two-wheeled vehicle helmets," and "two-wheeled vehicle riding," and the images that match the detection scenario can then be selected as the vehicle helmet detection dataset. In other examples, the vehicle helmet detection dataset can be obtained by photographing real roads. For example, images of vehicles on multiple real roads can be taken at different times, in different regions, and under different weather conditions, and the clearer images that match the detection scenario can be selected as the vehicle helmet detection dataset. Furthermore, images can be selected so that the number of targets of different categories in the vehicle helmet detection dataset is relatively balanced.
[0031] Since the acquired vehicle helmet detection dataset contains a large number of detection images from different angles, times, regions, and weather conditions, and preferably the number of targets of different categories is relatively balanced, the diversity and richness of the detection dataset can be ensured, which is conducive to improving the accuracy of the established vehicle helmet detection model and adapting it to diverse application scenarios.
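As an illustration of the balance criterion, the following minimal sketch counts targets per category and reports the ratio between the largest and smallest category. The category names "helmet" and "no_helmet" and the function name are hypothetical and do not come from this disclosure; the sketch assumes a non-empty label list.

```python
from collections import Counter

def class_balance(labels):
    """Return per-category counts and the largest-to-smallest count ratio."""
    counts = Counter(labels)
    # A ratio close to 1.0 means the categories are roughly balanced.
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Hypothetical annotation labels gathered from the selected images.
labels = ["helmet"] * 6 + ["no_helmet"] * 4
counts, ratio = class_balance(labels)
print(dict(counts), ratio)
```

A dataset curator could use such a ratio as a rough signal for when to collect more samples of the under-represented category.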
[0032] According to some embodiments of this disclosure, in step S120, extracting the edge information and background information of each image sample among the plurality of image samples may include: performing a global average pooling operation and a global max pooling operation on each image sample among the plurality of image samples to obtain the edge information and background information of the image sample.
[0033] In global average pooling, the entire feature map is taken as input, and a new feature vector is generated by averaging it across its spatial dimensions (height and width). Specifically, assuming the feature map is of size H×W×C, where H represents the height of the feature map, W represents the width of the feature map, and C represents the number of channels. For each channel c (c = 1, 2, ..., C), the average of all elements in that channel is calculated. After global average pooling, a feature vector of size 1×1×C is obtained. This feature vector compresses the original feature map in terms of height and width, retaining only the channel dimension information, and the value of each channel is the average of all elements in the corresponding channel of the original feature map.
[0034] By applying global average pooling, the number of connection weights in the fully connected layers of the neural network is reduced. This reduction in parameters decreases the complexity of the neural network and the risk of overfitting, and thus enables the neural network model to generalize better on unseen data.
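To make the parameter saving concrete, the sketch below counts the weights of a fully connected layer fed either by a flattened H×W×C feature map or by its globally pooled 1×1×C vector. The sizes (a 7×7×512 map and 2 outputs) and the function name are hypothetical values chosen only for illustration.

```python
def fc_weight_count(height, width, channels, num_outputs, use_gap):
    """Number of weights in a fully connected layer fed by an H x W x C feature map."""
    # With global average pooling the layer sees one value per channel;
    # without it, the layer sees every spatial position of every channel.
    inputs = channels if use_gap else height * width * channels
    return inputs * num_outputs

flattened = fc_weight_count(7, 7, 512, 2, use_gap=False)
pooled = fc_weight_count(7, 7, 512, 2, use_gap=True)
print(flattened, pooled, flattened // pooled)
```

In this example the pooled variant needs H×W = 49 times fewer weights, which is the reduction the paragraph above refers to.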
[0035] Similar to global average pooling, global max pooling takes the entire feature map as input and maximizes it in its spatial dimensions (height and width) to generate a new feature vector. Specifically, assuming the feature map is of size H×W×C, where H represents the height, W represents the width, and C represents the number of channels, for each channel c (c = 1, 2, ..., C), the maximum value of all elements in that channel is determined. After global max pooling, a feature vector of size 1×1×C is obtained. This feature vector compresses the original feature map in both height and width, retaining only the channel dimension information, and the value for each channel is the maximum value among all elements of the corresponding channel in the original feature map.
[0036] By applying global max pooling, the most salient features in each channel can be extracted. These maximum values may represent key information in the image; for example, in object detection tasks, the most representative features of an object, such as edges and textures, may appear in the feature map as maximum values. Global max pooling can therefore help the model quickly find this key information, improving the accuracy and efficiency of object detection.
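The two pooling operations described above can be sketched with NumPy as follows. This is a minimal illustration on a toy 2×2×2 feature map stored in H×W×C order, not the implementation used in the embodiments; the function names are assumptions.

```python
import numpy as np

def global_avg_pool(feature_map):
    """Average over height and width of an (H, W, C) array; result has shape (1, 1, C)."""
    return feature_map.mean(axis=(0, 1), keepdims=True)

def global_max_pool(feature_map):
    """Maximum over height and width of an (H, W, C) array; result has shape (1, 1, C)."""
    return feature_map.max(axis=(0, 1), keepdims=True)

# Toy feature map with H = 2, W = 2, C = 2.
# Channel 0 holds 1, 2, 3, 6 and channel 1 holds 10, 20, 30, 40.
fmap = np.array([[[1.0, 10.0], [2.0, 20.0]],
                 [[3.0, 30.0], [6.0, 40.0]]])

avg = global_avg_pool(fmap)  # per-channel means: 3.0 and 25.0
mx = global_max_pool(fmap)   # per-channel maxima: 6.0 and 40.0
print(avg.ravel(), mx.ravel())
```

Both results keep only the channel dimension, matching the 1×1×C feature vectors described in the text.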
[0037] Figure 2 illustrates a flowchart of performing global average pooling and global max pooling operations on an image according to an exemplary embodiment. As shown in Figure 2, by adding operations 210 (MaxPool2d) and 220 (AvgPool2d), edge and background information of image samples can be extracted more efficiently, thereby improving the expressive power of the neural network model.
[0038] In step S130, identifying one or more vehicle helmets in each of the multiple image samples may include obtaining the bounding box and position information of one or more vehicle helmets in the image sample.
[0039] In some examples, the location information of the bounding box for each vehicle helmet can be the center point of the bottom border of the bounding box. In other examples, the location information of the bounding box for each vehicle helmet can be the geometric center point of the bounding box. In still other examples, the location information of the bounding box for each vehicle helmet can be the center point of the top border of the bounding box. This location information can be, for example, the two-dimensional or three-dimensional coordinates of the center point.
[0040] It will be understood that the location information of the bounding box can refer to the coordinate information of any suitable point of the bounding box, and the scope of the subject matter claimed in this disclosure is not limited in this respect.
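The three reference points mentioned above can be computed from a two-dimensional bounding box as in the following sketch. It assumes a box given as (x_min, y_min, x_max, y_max) in image coordinates with the y axis pointing downward, so that the bottom border lies at y_max; the function name and the mode strings are hypothetical.

```python
def box_anchor(box, mode="center"):
    """Return a reference point of box = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    x_mid = (x_min + x_max) / 2
    if mode == "bottom":   # center point of the bottom border
        return (x_mid, y_max)
    if mode == "top":      # center point of the top border
        return (x_mid, y_min)
    if mode == "center":   # geometric center point
        return (x_mid, (y_min + y_max) / 2)
    raise ValueError(f"unknown mode: {mode}")

box = (10, 20, 30, 60)
print(box_anchor(box, "bottom"), box_anchor(box, "center"), box_anchor(box, "top"))
```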
[0041] Figure 3 shows a flowchart of a method for establishing a vehicle helmet detection model according to another exemplary embodiment. As shown in Figure 3, the method 300 for establishing a vehicle helmet detection model may include steps S310-S340, similar to steps S110-S140 in the method 100 for establishing a vehicle helmet detection model described with reference to Figure 1; and step S350, for one or more identified vehicle helmets, determining a loss function between the predicted bounding boxes and labeled bounding boxes of one or more vehicle helmets, wherein step S340, training a neural network model based on edge information, background information, and the one or more identified vehicle helmets to establish a vehicle helmet detection model, may include: training a neural network model based on edge information, background information, and a loss function to establish a vehicle helmet detection model.
[0042] The loss function can characterize the difference between the predicted bounding box and the labeled bounding box of each vehicle helmet. By calculating this difference and further training the neural network model based on it, the accuracy of the vehicle helmet detection model can be further improved.
[0043] According to some embodiments of this disclosure, the loss function can be the InnerCIoU loss function, which is mainly used to measure the difference between the predicted bounding box and the labeled (real) bounding box of the detected object. Specifically, IoU is the ratio of the intersection to the union of the predicted bounding box and the labeled (real) bounding box of the object, as shown in the following equation (1):
$$\mathrm{IoU} = \frac{\mathrm{Intersection}}{\mathrm{Union}} \tag{1}$$
[0044] Where Intersection represents the area of the intersection between the predicted bounding box and the labeled (true) bounding box of the object, and Union represents the area of the union between the predicted bounding box and the labeled (true) bounding box of the object. InnerCIoU, based on CIoU, further considers the distance between the predicted bounding box and the labeled (true) bounding box and the aspect ratio, and can be calculated, for example, by the following formula (2):
[0045] Where L represents the distance between the predicted bounding box and the labeled (real) bounding box (e.g., when the location information of the bounding box of each vehicle helmet is the geometric center point of the bounding box, L represents the distance between the geometric center point of the predicted bounding box and the geometric center point of the labeled bounding box), area represents the area of the smallest rectangle that can simultaneously contain both the predicted bounding box and the labeled (real) bounding box, α represents the weighting factor that balances different loss terms (such as the distance between center points and the aspect ratio difference) (e.g., it can take the value 0.5), and v represents the aspect ratio value of the labeled (real) bounding box.
[0046] Compared to common IoU loss functions, the InnerCIoU loss function considers more geometric factors, enabling neural network models to converge faster. This is because the InnerCIoU loss function considers not only the overlap area between the predicted bounding box and the labeled (ground-truth) bounding box, but also the distance between these two bounding boxes and the aspect ratio of the labeled (ground-truth) bounding box. Therefore, it can more effectively guide the predicted bounding box towards the labeled (ground-truth) bounding box during neural network model training. Furthermore, by balancing the weights of the distance between center points and the aspect ratio difference, the InnerCIoU loss function can also improve the accuracy of object detection and localization. For example, in the detection of small objects or objects with irregular shapes, the InnerCIoU loss function can better adjust the predicted bounding box to more accurately match the ground-truth object.
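The geometric terms discussed above can be illustrated with the sketch below, which computes the IoU of equation (1) and then the widely published CIoU value (IoU minus a normalized center-distance term and a weighted aspect-ratio term). This is an assumption made for illustration: it is the standard CIoU formulation, not necessarily formula (2) of this disclosure, and it does not implement the "Inner" auxiliary-box variant. Boxes are (x_min, y_min, x_max, y_max) tuples with positive width and height.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ciou(pred, gt):
    """Standard CIoU: IoU - center_distance^2 / enclosing_diagonal^2 - alpha * v."""
    value = iou(pred, gt)
    # Squared distance between the geometric centers of the two boxes.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest rectangle enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - value + v) if v > 0 else 0.0
    return value - rho2 / c2 - alpha * v

pred, gt = (0, 0, 2, 2), (1, 1, 3, 3)
print(iou(pred, gt), ciou(pred, gt), 1 - ciou(pred, gt))  # last value is the loss
```

The loss used for training is typically 1 minus this value, so a perfectly matching box yields a loss of 0 while distant or misshapen boxes are penalized even when their overlap is equal.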
[0047] It will be understood that the InnerCIoU loss function is described for illustrative purposes only, and the loss function may be any other suitable loss function.
[0048] It will also be understood that the aforementioned neural network model can be the YOLOv8 model or other models based on deep neural network algorithms to achieve object detection for each image.
[0049] According to some embodiments of this disclosure, the methods 100 and 300 for establishing a vehicle helmet detection model may further include: determining the ReLU function as the activation function of the neural network model; and wherein steps S140 and S340, training the neural network model based on edge information, background information, and one or more identified vehicle helmets to establish a vehicle helmet detection model may include: training the neural network model based on edge information, background information, and the activation function to establish a vehicle helmet detection model.
[0050] Since the ReLU function only needs to perform one comparison operation (see equation (3) below), that is, to determine whether the input is greater than 0, it is very fast in the forward propagation process of the neural network, and is especially suitable for processing large-scale data.
$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$
[0051] Furthermore, since the SiLU function includes exponential operations (see equation (4) below), and the edge-side neural network operation unit cannot perform exponential operations, data needs to be transmitted to the Arm or DSP end, and then retransmitted to the neural network unit after the exponential operation is completed. This significantly reduces the operating speed of the neural network model. Compared with the traditional SiLU function, the ReLU function involves only an operation to determine whether the input is greater than 0, so it can be performed on the edge-side neural network operation unit, thereby avoiding frequent data transmission and further improving the operating speed of the neural network model.
$$\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}} \tag{4}$$
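The difference in computational cost between the two activation functions can be seen in the following minimal sketch, which uses their standard definitions: ReLU needs a single comparison, whereas SiLU needs an exponential. The scalar Python functions are illustrative only and say nothing about how a particular edge-side operation unit implements them.

```python
import math

def relu(x):
    """ReLU: one comparison, no transcendental function."""
    return x if x > 0 else 0.0

def silu(x):
    """SiLU: x * sigmoid(x), which requires evaluating exp(-x)."""
    return x / (1.0 + math.exp(-x))

for value in (-2.0, 0.0, 1.0, 3.0):
    print(value, relu(value), silu(value))
```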
[0052] According to some embodiments of this disclosure, steps S140 and S340, training a neural network model based on edge information, background information, and one or more identified vehicle helmets to establish a vehicle helmet detection model, may include performing a redundancy removal operation on the neural network model.
[0053] As mentioned above, when using the ReLU activation function to train a neural network model, some neural network operation units may output 0, resulting in sparsity to some extent. Furthermore, for convolutional neural network models, the number of channels in the convolutional kernel is usually redundant. Since a regular convolutional layer is typically followed by a batch normalization layer, the batch normalization layer can be used to remove redundant channels, thereby reducing the number of computations and the complexity of the neural network model, speeding up computation, and allowing the neural network model to focus more on the key features in the data, thus reducing the fitting of irrelevant information.
[0054] Specifically, the batch normalization (BN) layer normalizes the input of each layer of the neural network model to a fixed interval, for example, by the following equation (5):
$$y = \gamma \cdot \frac{x - E(x)}{\sqrt{\mathrm{Var}(x) + \epsilon}} + \beta \tag{5}$$
[0055] Where x is the input to each layer of the neural network model, y is the normalized output, E(x) and $\sqrt{\mathrm{Var}(x) + \epsilon}$ are the mean and standard deviation of the mini-batch data (with $\epsilon$ a small constant added for numerical stability), and γ and β are learnable parameters.
[0056] During the training of the neural network model, sparsity training (i.e., driving some parameter γ values to 0) can be achieved by applying L1 regularization constraints to the parameter γ of the BN layer and adjusting the L1 regularization strength coefficient λ. Furthermore, when the data of a certain channel in the BN layer is 0, it can be proven that the corresponding channel of the convolutional layer preceding that BN layer is also 0 and is therefore a redundant channel. Accordingly, the redundancy removal operation can be implemented using the following equation (6):
$$\text{total\_loss} = \text{loss}(\text{pred}, \text{label}) + \lambda \sum_{\gamma} |\gamma| \tag{6}$$
[0057] Where total_loss represents the total loss, loss(pred, label) represents the difference between the predicted result and the labeled (true) result for the object, and $\lambda \sum_{\gamma} |\gamma|$ indicates that L1 regularization constraints are applied to the parameter γ of the BN layer.
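The sparsity training and channel selection described above can be sketched as follows. The sketch adds an L1 penalty on the BN scale parameters γ to a given detection loss and then marks channels whose |γ| falls below a threshold as redundant. The threshold value, the function names, and the sample numbers are hypothetical; this is an illustration of the idea rather than the procedure of the embodiments.

```python
import numpy as np

def total_loss(detection_loss, bn_gammas, lam):
    """Detection loss plus lam times the L1 norm of all BN scale parameters."""
    l1 = sum(np.abs(gamma).sum() for gamma in bn_gammas)
    return detection_loss + lam * l1

def keep_mask(gamma, threshold=1e-3):
    """True for channels to keep; channels with |gamma| <= threshold are treated as redundant."""
    return np.abs(gamma) > threshold

# One BN layer with four channels; two gammas have been driven to (almost) zero.
gammas = [np.array([0.9, 0.0, -0.4, 0.0005])]
loss = total_loss(1.25, gammas, lam=0.01)
mask = keep_mask(gammas[0])
print(loss, mask)
```

Channels where the mask is False could then be removed from both the BN layer and the convolutional layer that feeds it.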
[0058] It will be understood that after establishing the aforementioned vehicle helmet detection model, the trained model can be fitted and ultimately converted into an ONNX offline model to decouple it from the original deep learning framework. This allows it to run on various operating systems and hardware architectures without relying on the training framework. This significantly expands the model's application scope, facilitating the migration of the model from training environments such as data centers to actual production environments or terminal devices.
[0059] According to some embodiments of this disclosure, after obtaining the corresponding ONNX offline model, the DFL (Distribution Focal Loss) module can be pruned to avoid operations such as Reshape, Transpose, and Softmax, thereby simplifying the computation and further improving the model calculation speed.
[0060] Figure 4 shows a flowchart of a method 400 for establishing a vehicle helmet detection model according to yet another exemplary embodiment. As shown in Figure 4, the method 400 for establishing a vehicle helmet detection model may include steps S410-S440 similar to steps S110-S140 in the method 100 for establishing a vehicle helmet detection model described with reference to Figure 1, or steps S310-S340 in the method 300 for establishing a vehicle helmet detection model described with reference to Figure 3; and step S450, after establishing the vehicle helmet detection model, performing graph optimization operations on the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model.
[0061] Each time the vehicle helmet detection model is used to perform helmet detection, it accesses the data in memory three times sequentially: in the convolutional layer, the batch normalization layer, and the activation function. In some examples, the convolutional layer, batch normalization layer, and activation function structure may account for more than 80% of the entire structure of the vehicle helmet detection model. This means that the number of times the data in memory is accessed will increase significantly. By performing graph optimization operations on the convolutional layer, batch normalization layer, and activation function, the corresponding operators of the three can be coupled into a single operator, so that the same operation can be completed with only one access to the data in memory, thereby significantly improving the computational speed and the model's inference performance.
[0062] According to some embodiments of this disclosure, step S450, performing graph optimization operations on the convolutional layer, batch normalization layer, and activation function of the vehicle helmet detection model, may include: reparameterizing the vehicle helmet detection model to couple multiple operators of the corresponding convolutional layer, batch normalization layer, and activation function of the vehicle helmet detection model.
[0063] Typically, neural network models have multi-branch network structures in both the training and inference phases, with each branch network structure associated with a set of parameters. In the inference phase, it is desirable to obtain inference results quickly. Therefore, as mentioned above, graph optimization operations can be performed by coupling multiple sets of parameters from the multi-branch structure of the neural network model, such as convolutional layers, batch normalization layers, and activation functions, to achieve the operation of a single operator.
[0064] For example, the operation of a convolutional layer can be expressed by the following equation (7):
y = w * x (7)
[0065] where y represents the result of the convolution calculation, w represents the weight factor, and x represents the input data (e.g., an image).
[0066] The batch normalization layer and activation function can be implemented using equations (5) and (4) above, respectively.
[0067] In this case, the graph optimization operation performed on the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model can be achieved by the following equation (8):
[0068] By coupling multiple sets of parameters from the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model, a single-operator neural network model can be obtained, achieving equivalent computational results. Since executing a single operator operation only requires accessing memory data once, the computation speed is faster, allowing for quicker acquisition of the neural network model's inference results.
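As a minimal sketch of the coupling described above, the following Python example folds a batch normalization layer into the weights of a 1x1 convolution and applies ReLU in the same pass. It assumes the usual inference-time form of batch normalization, gamma * (y - mean) / sqrt(var + eps) + beta; the symbols gamma, beta, mean, var and eps, and all function names, are illustrative assumptions rather than the notation of this disclosure.

```python
import math
import random


def conv1x1(w, x, b=None):
    """1x1 convolution at one pixel: y[o] = sum_i w[o][i] * x[i] (+ b[o])."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return y if b is None else [yo + bo for yo, bo in zip(y, b)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, applied per output channel."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def relu(y):
    return [max(v, 0.0) for v in y]


def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN scale and shift into the convolution weights and a bias."""
    scales = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    w_fused = [[wi * sc for wi in row] for row, sc in zip(w, scales)]
    b_fused = [b - m * sc for b, m, sc in zip(beta, mean, scales)]
    return w_fused, b_fused


def fused_op(w_fused, b_fused, x):
    """Single coupled operator: convolution with folded BN, then ReLU."""
    return relu(conv1x1(w_fused, x, b_fused))


# Hypothetical parameters, used only to compare the two computation paths.
random.seed(0)
out_ch, in_ch = 4, 3
w = [[random.gauss(0, 1) for _ in range(in_ch)] for _ in range(out_ch)]
x = [random.gauss(0, 1) for _ in range(in_ch)]
gamma = [random.gauss(0, 1) for _ in range(out_ch)]
beta = [random.gauss(0, 1) for _ in range(out_ch)]
mean = [random.gauss(0, 1) for _ in range(out_ch)]
var = [random.uniform(0.5, 2.0) for _ in range(out_ch)]

reference = relu(batch_norm(conv1x1(w, x), gamma, beta, mean, var))
w_f, b_f = fuse_conv_bn(w, gamma, beta, mean, var)
max_diff = max(abs(a - b) for a, b in zip(reference, fused_op(w_f, b_f, x)))
```

The three-step path and the fused path agree up to floating-point rounding, which is the sense in which a single operator can achieve equivalent computational results.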
[0069] Figure 5 shows a flowchart of a vehicle helmet detection method 500 according to an exemplary embodiment. As shown in Figure 5, the vehicle helmet detection method 500 may include: step S510, capturing one or more images of the vehicle while it is moving or stationary via a sensor; step S520, performing target detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established according to the method for establishing a vehicle helmet detection model described above; and step S530, issuing a warning to the driver based on the determination that the driver of the vehicle is not wearing a vehicle helmet.
[0070] Because the established vehicle helmet detection model is based on a large number of image samples obtained at different times, in different regions, and under different weather conditions, it can monitor the wearing status of drivers' vehicle helmets under various conditions. Furthermore, since the established vehicle helmet detection model can achieve real-time and rapid monitoring of drivers' vehicle helmet wearing status, it can provide timely warnings to drivers, helping to improve riders' safety awareness and reduce the occurrence of traffic accidents.
[0071] In step S510, the sensor can be any suitable sensor capable of capturing images, such as a camera, video camera, or webcam. The sensor may, for example, use a fisheye lens to capture the driver's upper body. The scope of the subject matter claimed in this disclosure is not limited in this respect.
[0072] In step S510, the sensor can be located anywhere at the front of the vehicle, as long as it can capture the driver's information in real time. For example, it can be installed on the vehicle's dashboard or on the front glove box, and preferably on the vehicle's centerline.
[0073] In step S520, the vehicle helmet detection model established by the method 100, 300 or 400 described above can be used to perform target detection on each captured image, and whether the driver of the vehicle is wearing a vehicle helmet can be determined based on the detection results.
[0074] In some examples, whether a vehicle driver is wearing a helmet can be determined based on the positions of the detected vehicle helmet's bounding box and the detected driver's bounding box. For instance, it can be determined that the driver is wearing a helmet if the distance between the center point of the detected vehicle helmet's bounding box and the center point of the detected driver's bounding box is less than a preset threshold. As another example, it can be determined that the driver is wearing a helmet if the distance between the center point of the lower boundary of the detected vehicle helmet's bounding box and the center point of the lower boundary of the detected driver's bounding box is less than a preset threshold.
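The two distance-based rules above can be sketched as follows. This is an illustrative example only: the Box type, the pixel coordinates, the threshold value, and the choice to treat a missing helmet detection as "not wearing" are all assumptions, not details given in this disclosure.

```python
import math
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def center(box: Box):
    """Center point of the bounding box."""
    return ((box.x_min + box.x_max) / 2, (box.y_min + box.y_max) / 2)


def bottom_center(box: Box):
    """Center point of the lower boundary of the bounding box."""
    return ((box.x_min + box.x_max) / 2, box.y_max)


def is_wearing_helmet(helmet: Optional[Box], driver: Box,
                      threshold: float, anchor=center) -> bool:
    """True if the chosen anchor points are closer than the preset threshold."""
    if helmet is None:  # assumption: no detected helmet means not wearing one
        return False
    (hx, hy), (dx, dy) = anchor(helmet), anchor(driver)
    return math.hypot(hx - dx, hy - dy) < threshold


# Hypothetical detections, in pixels.
driver = Box(100, 50, 200, 250)
helmet_on_head = Box(120, 40, 180, 100)
helmet_elsewhere = Box(400, 300, 460, 360)

worn = is_wearing_helmet(helmet_on_head, driver, threshold=100.0)
not_worn = is_wearing_helmet(helmet_elsewhere, driver, threshold=100.0)
```

Passing anchor=bottom_center selects the second rule. A suitable threshold depends on image resolution and camera placement, so in practice it would likely be tuned for each rule separately.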
[0075] In step S530, issuing a warning to the driver based on the determination that the driver is not wearing a helmet may include any one or more of the following: highlighting the warning on the vehicle's display screen; issuing a warning to the driver using visual signals such as flashing lights; or issuing a warning to the driver using auditory signals such as an alarm. Furthermore, when the duration for which the driver has not been wearing a helmet exceeds a preset threshold, the warning may be escalated by increasing the volume and frequency of the warning buzzer, increasing the flashing frequency of the warning indicator light, or providing a more prominent highlight.
[0076] It will be understood that how to warn drivers is not limited to the methods described above, and the scope of the subject matter claimed in this disclosure is not limited in the above respects.
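One possible way to express the duration-based escalation of step S530 is sketched below. The WarningLevel fields, the two preset levels, and the 10-second threshold are hypothetical values chosen for illustration; this disclosure does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WarningLevel:
    volume: float              # buzzer volume, 0.0 to 1.0
    beeps_per_second: float    # buzzer frequency
    flashes_per_second: float  # indicator light flashing frequency


# Hypothetical presets: a base warning and an escalated one.
BASE = WarningLevel(volume=0.4, beeps_per_second=1.0, flashes_per_second=1.0)
ESCALATED = WarningLevel(volume=0.9, beeps_per_second=3.0, flashes_per_second=4.0)


def select_warning(seconds_without_helmet: float,
                   escalate_after_s: float = 10.0) -> Optional[WarningLevel]:
    """Return no warning, the base warning, or the escalated warning."""
    if seconds_without_helmet <= 0:
        return None  # helmet is being worn, nothing to signal
    if seconds_without_helmet > escalate_after_s:
        return ESCALATED
    return BASE
```

For example, select_warning(5.0) returns the base level, while select_warning(11.0) returns the escalated level with a louder, faster buzzer and a faster flashing light.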
[0077] According to some embodiments of this disclosure, the vehicle helmet detection method 500 may further include: preprocessing the one or more images to convert them into one or more images in a preset format.
[0078] One or more images captured in step S510 may not be directly applicable as input to the trained vehicle helmet detection model. Therefore, these images can be converted to a preset format (e.g., converted to YUV format by the Video Input module, or converted to a format suitable for the neural network model) so that they can be applied to the trained vehicle helmet detection model to achieve fast and accurate detection of vehicle helmets.
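As an illustration of converting captured images to a preset format, the sketch below converts RGB pixels to YUV using BT.601-style coefficients. This disclosure does not state which YUV variant or memory layout the Video Input module produces, so the coefficients, the function names, and the nested-list image representation are assumptions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (BT.601-style analog form, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


def convert_image(rgb_image):
    """Convert an image given as rows of (r, g, b) tuples."""
    return [[rgb_to_yuv(*pixel) for pixel in row] for row in rgb_image]


# Hypothetical 2x2 test image: white, black, red, blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
yuv_image = convert_image(image)
```

Gray pixels such as white and black map to chroma values of approximately zero, which gives a quick sanity check on the conversion.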
[0079] Figure 6 shows a structural block diagram of an apparatus 600 for establishing a vehicle helmet detection model according to an embodiment of the present disclosure. As shown in Figure 6, the apparatus 600 may include: a dataset acquisition unit 610 configured to acquire a vehicle helmet detection dataset, the vehicle helmet detection dataset including multiple image samples of a vehicle during driving or stopping; an extraction unit 620 configured to extract edge information and background information for each of the multiple image samples; a recognition unit 630 configured to recognize one or more vehicle helmets in each of the multiple image samples; and a training unit 640 configured to train a neural network model based on the edge information, background information, and the recognized one or more vehicle helmets to establish a vehicle helmet detection model.
[0080] Since edge information can locate the contour boundary of the target object, while background information can determine the environmental range of the target object, extracting both edge and background information simultaneously to train the neural network model can accurately determine the position of the helmet in more complex scenes, reduce false detections and false negatives, and better infer the complete shape and position of the occluded part when the target object is partially occluded, thereby improving the accuracy of the established vehicle helmet detection model.
[0081] According to some embodiments of the present disclosure, the extraction unit 620 may include a unit configured to perform a global average pooling operation and a global max pooling operation on each of a plurality of image samples to obtain edge information and background information of the image sample.
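The two pooling operations can be sketched on a small feature map as follows. Global average pooling summarizes the overall response of each channel, while global max pooling keeps only the strongest local response; this passage does not state which of the two supplies the edge information and which the background information, so the sketch makes no such assignment. The channel-first nested-list layout and the function names are assumptions.

```python
def global_avg_pool(feature_map):
    """Mean of every channel; feature_map is indexed [channel][row][col]."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def global_max_pool(feature_map):
    """Maximum of every channel; feature_map is indexed [channel][row][col]."""
    return [max(max(row) for row in channel) for channel in feature_map]


# Hypothetical 2-channel, 2x2 feature map: one sharp peak, one flat response.
fmap = [
    [[0.0, 0.0],
     [0.0, 8.0]],
    [[1.0, 1.0],
     [1.0, 1.0]],
]
avg_descriptor = global_avg_pool(fmap)
max_descriptor = global_max_pool(fmap)
```

In the first channel the single peak dominates the max descriptor but is diluted in the average, whereas the flat second channel yields the same value from both operations.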
[0082] According to some embodiments of this disclosure, the apparatus 600 may further include a loss function determination unit configured to determine a loss function between the predicted bounding boxes and labeled bounding boxes of one or more identified vehicle helmets; and the training unit 640 may include a unit configured to train a neural network model based on edge information, background information, and the loss function to establish a vehicle helmet detection model.
[0083] According to some embodiments of this disclosure, the loss function is the InnerCIoU loss function.
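For orientation, the following sketch computes an InnerCIoU-style loss for one predicted box and one labeled box. It assumes the commonly published formulation, in which the IoU term of the CIoU loss is replaced by the IoU of auxiliary boxes scaled about each box center; the formulation given elsewhere in this disclosure governs if it differs. The scaling ratio of 0.75, the (x_min, y_min, x_max, y_max) box layout, and all function names are assumptions.

```python
import math


def _iou(a, b, eps=1e-9):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + eps)


def _scaled(box, ratio):
    """Auxiliary box: same center, width and height multiplied by ratio."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def inner_ciou_loss(pred, gt, ratio=0.75, eps=1e-9):
    iou = _iou(pred, gt, eps)
    inner_iou = _iou(_scaled(pred, ratio), _scaled(gt, ratio), eps)

    # Squared center distance over squared diagonal of the enclosing box.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (pcx - gcx) ** 2 + (pcy - gcy) ** 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term of CIoU.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    ciou_loss = 1 - iou + rho2 / c2 + alpha * v
    # Swap the IoU term for the inner IoU of the auxiliary boxes.
    return ciou_loss + iou - inner_iou


# Hypothetical boxes in pixels.
loss_same = inner_ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
loss_shifted = inner_ciou_loss((2, 0, 12, 10), (0, 0, 10, 10))
loss_far = inner_ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

The loss is close to zero for identical boxes and grows as the predicted box moves away from the labeled box, including when the boxes no longer overlap.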
[0084] According to some embodiments of this disclosure, the apparatus 600 may further include an activation function determination unit configured to determine the ReLU function as the activation function of the neural network model; and the training unit 640 may include a unit configured to train the neural network model based on edge information, background information and activation function to establish a vehicle helmet detection model.
[0085] According to some embodiments of this disclosure, training unit 640 may include a redundancy removal unit configured to perform redundancy removal operations on a neural network model.
[0086] According to some embodiments of this disclosure, the apparatus 600 may further include a graph optimization unit configured to perform graph optimization operations on the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model after the model has been established.
[0087] According to some embodiments of this disclosure, the graph optimization unit may include a unit configured to perform model reparameterization on a vehicle helmet detection model to couple multiple operators of the corresponding convolutional layer, batch normalization layer, and activation function of the vehicle helmet detection model.
[0088] According to some embodiments of this disclosure, the vehicle is a two-wheeled vehicle or a unicycle.
[0089] It should be understood that the various units of the apparatus 600 for establishing a vehicle helmet detection model shown in FIG. 6 correspond to the various steps in methods 100, 300, and 400 for establishing a vehicle helmet detection model as described with reference to FIGS. 1, 3, and 4. Therefore, the operations, features, and advantages described above for methods 100, 300, and 400 also apply to the apparatus 600 for establishing a vehicle helmet detection model and its constituent units. For the sake of brevity, some operations, features, and advantages will not be repeated here.
[0090] Figure 7 shows a structural block diagram of a vehicle helmet detection device 700 according to an embodiment of the present disclosure. As shown in Figure 7, the vehicle helmet detection device 700 may include: a capture unit 710 configured to capture one or more images of a vehicle while it is in motion or stationary via a sensor; a target detection unit 720 configured to perform target detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established according to the method for establishing the vehicle helmet detection model described above; and a warning unit 730 configured to issue a warning to the driver based on the determination that the driver of the vehicle is not wearing a vehicle helmet.
[0091] Because the established vehicle helmet detection model is based on a large number of image samples obtained at different times, in different regions, and under different weather conditions, it can monitor the wearing status of drivers' vehicle helmets under various conditions. Furthermore, since the established vehicle helmet detection model can achieve real-time and rapid monitoring of drivers' vehicle helmet wearing status, it can provide timely warnings to drivers, helping to improve riders' safety awareness and reduce the occurrence of traffic accidents.
[0092] According to some embodiments of this disclosure, the vehicle helmet detection device 700 may further include a preprocessing unit configured to preprocess the one or more images to convert them into one or more images in a preset format.
[0093] It should be understood that the various units of the vehicle helmet detection device 700 shown in FIG. 7 correspond to the various steps in the vehicle helmet detection method 500 described with reference to FIG. 5. Therefore, the operations, features, and advantages described above for the vehicle helmet detection method 500 also apply to the vehicle helmet detection device 700 and its constituent units. For the sake of brevity, some operations, features, and advantages will not be repeated here.
[0094] Another aspect of this disclosure may include an electronic device that may include a memory, a processor, and a computer program stored in the memory, wherein the processor is configured to execute the computer program to implement the steps of the method for establishing a vehicle helmet detection model or the vehicle helmet detection method described above.
[0095] Another aspect of this disclosure may include a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for establishing a vehicle helmet detection model or the vehicle helmet detection method described above.
[0096] Another aspect of this disclosure may include a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for establishing a vehicle helmet detection model or a vehicle helmet detection method described above.
[0097] Referring to FIG. 8, a computing device 800 will now be described, which is an example of a hardware device that can be applied to various aspects of this disclosure. The computing device 800 can be any machine configured to perform processing and / or computation, and can be, but is not limited to, a workstation, server, desktop computer, laptop computer, tablet computer, personal digital assistant, smartphone, in-vehicle computer, access control system, time and attendance device, or any combination thereof. The aforementioned apparatus for establishing a vehicle helmet detection model and the vehicle helmet detection apparatus can be implemented wholly or at least partially by the computing device 800 or similar devices or systems. While the computing device 800 represents one example of several types of computing platforms, the computing device 800 may include more or fewer elements and / or different element arrangements than shown in FIG. 8, and does not limit the scope of the claimed subject matter in these respects.
[0098] In some embodiments, computing device 800 may include elements connected to or communicating with bus 802 (possibly via one or more interfaces). For example, computing device 800 may include bus 802, one or more processors 804, one or more input devices 806, and one or more output devices 808. The one or more processors 804 may be any type of processor and may include, but are not limited to, one or more general-purpose processors and / or one or more dedicated processors (e.g., special-purpose processing chips). Input devices 806 may be any type of device capable of inputting information to computing device 800 and may include, but are not limited to, a mouse, keyboard, touchscreen, microphone, and / or remote control. Output devices 808 may be any type of device capable of presenting information and may include, but are not limited to, a display, speaker, video / audio output terminal, vibrator, and / or printer. Computing device 800 may also include or be connected to a non-transitory storage device 810. The non-transitory storage device can be any storage device that is non-transitory and capable of storing data, and can include, but is not limited to, disk drives, optical storage devices, solid-state storage, floppy disks, flexible disks, hard disks, magnetic tapes or any other magnetic media, optical discs or any other optical media, ROM (read-only memory), RAM (random access memory), cache memory and / or any other memory chip or cartridge, and / or any other medium from which a computer can read data, instructions and / or code. The non-transitory storage device 810 may be detachable from an interface.
The non-transitory storage device 810 embodies one or more non-transitory computer-readable media storing a program including instructions that, when executed by one or more processors of the computing device 800, cause the computing device 800 to perform the methods 100, 300, 400 and their variations for establishing a vehicle helmet detection model, or the vehicle helmet detection method 500 and its variations. The computing device 800 may also include a communication device 812. The communication device 812 can be any type of device or system that enables communication with external devices and / or with a network, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication devices and / or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices and / or the like.
[0099] In some embodiments, the computing device 800 may also include a working memory 814, which may be any type of memory that can store programs (including instructions) and / or data useful for the operation of the processor 804, and may include, but is not limited to, random access memory and / or read-only memory devices.
[0100] The software elements (programs) may reside in the working memory 814, including but not limited to the operating system 816, one or more application programs 818, drivers, and / or other data and code. Instructions for performing the above methods and steps may be included in one or more application programs 818, and the electronic circuitry of the aforementioned apparatus for establishing a vehicle helmet detection model and the vehicle helmet detection device may be implemented by the processor 804 reading and executing the instructions of one or more application programs 818. The executable code or source code of the software element (program) instructions may be stored in a non-transitory computer-readable storage medium (e.g., the aforementioned storage device 810) and may be stored in the working memory 814 during execution (possibly compiled and / or installed). The executable code or source code of the software element (program) instructions may also be downloaded from a remote location.
[0101] It should also be understood that various modifications can be made depending on specific requirements. For example, custom hardware can also be used, and / or specific elements can be implemented using hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and apparatus can be implemented by programming hardware (e.g., programmable logic circuits including field-programmable gate arrays (FPGAs) and / or programmable logic arrays (PLAs)) using logic and algorithms according to this disclosure in assembly language or hardware programming languages (such as Verilog, VHDL, C++).
[0102] It should also be understood that the aforementioned methods can be implemented using a server-client model. For example, the client can use a camera to capture image data and send the image data to the server for further processing. The client can also perform a portion of the processing described above and send the resulting data to the server. The server can receive data from the client, execute the aforementioned methods or another portion thereof, and return the execution result to the client. The client can receive the execution result of the method from the server and, for example, present it to the user via an output device.
[0103] It should also be understood that the components of computing device 800 can be distributed across a network. For example, some processing can be performed using one processor, while other processing can be performed simultaneously by another processor located far away from that processor. Other components of computing device 800 can also be distributed similarly. In this way, computing device 800 can be understood as a distributed computing system that performs processing on multiple processors at multiple locations.
[0104] While embodiments or examples of this disclosure have been described with reference to the accompanying drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples, and the scope of the invention is not limited by these embodiments or examples, but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by their equivalents. Furthermore, the steps may be performed in a different order than that described in this disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as the technology evolves, many elements described herein can be replaced by equivalents that appear after this disclosure.
Claims
1. A method for establishing a vehicle helmet detection model, comprising: obtaining a vehicle helmet detection dataset, the vehicle helmet detection dataset including a plurality of image samples of a vehicle during driving or stopping; for each of the plurality of image samples: extracting edge information and background information from the image sample; and identifying one or more vehicle helmets in the image sample; and training a neural network model based on the edge information, the background information, and the identified one or more vehicle helmets to establish the vehicle helmet detection model.
2. The method according to claim 1, wherein, for each of the plurality of image samples, extracting the edge information and the background information of the image sample includes: performing a global average pooling operation and a global max pooling operation on the image sample to obtain the edge information and the background information of the image sample.
3. The method according to claim 1, further comprising: for the identified one or more vehicle helmets, determining a loss function between predicted bounding boxes and labeled bounding boxes of the one or more vehicle helmets; and wherein training the neural network model based on the edge information, the background information, and the identified one or more vehicle helmets to establish the vehicle helmet detection model includes: training the neural network model based on the edge information, the background information, and the loss function to establish the vehicle helmet detection model.
4. The method according to claim 3, wherein the loss function is the InnerCIoU loss function.
5. The method according to any one of claims 1-4, further comprising: determining the ReLU function as the activation function of the neural network model; and wherein training the neural network model based on the edge information, the background information, and the identified one or more vehicle helmets to establish the vehicle helmet detection model includes: training the neural network model based on the edge information, the background information, and the activation function to establish the vehicle helmet detection model.
6. The method according to any one of claims 1-5, wherein training the neural network model based on the edge information, the background information, and the identified one or more vehicle helmets to establish the vehicle helmet detection model includes: performing a redundancy removal operation on the neural network model.
7. The method according to any one of claims 1-6, further comprising: after establishing the vehicle helmet detection model, performing graph optimization operations on the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model.
8. The method according to claim 7, wherein performing the graph optimization operations on the convolutional layers, batch normalization layers, and activation functions of the vehicle helmet detection model includes: reparameterizing the vehicle helmet detection model to couple multiple operators of the corresponding convolutional layer, batch normalization layer, and activation function of the vehicle helmet detection model.
9. The method according to any one of claims 1 to 8, wherein the vehicle is a two-wheeled vehicle or a unicycle.
10. A vehicle helmet detection method, comprising: capturing, via a sensor, one or more images of a vehicle while it is moving or stationary; performing target detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established by the method according to any one of claims 1 to 7; and issuing a warning to the driver based on a determination that the driver of the vehicle is not wearing a vehicle helmet.
11. The method according to claim 10, further comprising: preprocessing the one or more images to convert them into one or more images in a preset format.
12. An apparatus for establishing a vehicle helmet detection model, comprising: a dataset acquisition unit configured to acquire a vehicle helmet detection dataset, the vehicle helmet detection dataset including a plurality of image samples of a vehicle during driving or stopping; an extraction unit configured to extract edge information and background information for each of the plurality of image samples; an identification unit configured to identify one or more vehicle helmets in each of the plurality of image samples; and a training unit configured to train a neural network model based on the edge information, the background information, and the identified one or more vehicle helmets to establish a vehicle helmet detection model.
13. A vehicle helmet detection device, comprising: a capture unit configured to capture, via a sensor, one or more images of a vehicle during driving or stopping; a target detection unit configured to perform target detection on each of the one or more images using a vehicle helmet detection model to determine whether the driver of the vehicle is wearing a vehicle helmet, wherein the vehicle helmet detection model is established by the method according to any one of claims 1 to 7; and a warning unit configured to issue a warning to the driver based on a determination that the driver of the vehicle is not wearing a vehicle helmet.
14. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory, wherein the processor is configured to execute the computer program to implement the steps of the method according to any one of claims 1 to 9 or 10 to 11.
15. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9 or 10 to 11.
16. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9 or 10 to 11.