A sleep state detection method based on improved YOLOv5

By improving the combination of the YOLOv5 model and the Deepsort network, and adopting an improved CBAM attention mechanism and a two-level detection design, the accuracy and robustness issues of sleep detection were solved, achieving real-time and efficient sleep detection and improving the accuracy and security of detection.

CN116597281B · Active · Publication Date: 2026-06-26 · NANJING TECH UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANJING TECH UNIV
Filing Date
2023-05-31
Publication Date
2026-06-26

Figure CN116597281B_ABST

Abstract

The application relates to a sleep state detection method based on improved YOLOv5. A suspected sleep detection model for detecting suspected sleeping persons is constructed on an improved YOLOv5 model structure combined with an improved CBAM attention mechanism. A two-stage detection design that applies feature matching and then detection box matching is used to track suspected sleeping persons across consecutive frames, and persons in a sleep state are finally detected accurately based on the continuity of the suspected sleeping persons. By improving the YOLOv5 model and the Deepsort network and applying them together, the design effectively improves the accuracy and robustness of sleep state detection, and it has wide application prospects in improving work safety and enterprise benefits.

Description

Technical Field

[0001] This invention relates to a sleep detection method based on an improved YOLOv5, belonging to the field of computer vision and deep learning technology.

Background Technology

[0002] In modern industrial production, many industries have high-intensity and high-risk working environments. Due to worker fatigue and distraction, workers may fall asleep on duty, leading to decreased production efficiency and safety accidents. Furthermore, worker sleep on duty is often the main culprit in causing safety accidents, resulting in serious losses to production. Currently, although some companies use cameras for monitoring, due to technical and human limitations, problems such as misjudgments, missed detections, and false alarms frequently occur. Therefore, using computer vision technology to detect worker sleep on duty is essential.

[0003] A review of research on worker sleep detection, both domestic and international, shows a substantial body of work. However, traditional methods still suffer from ambiguous results, high cost, and inaccurate detection. With the development of deep learning, computer vision-based detection methods can avoid these issues. Deep learning-based object detection has made considerable progress, and YOLOv5 (You Only Look Once version 5) is a highly effective object detection algorithm that offers high accuracy and real-time performance. However, because worker sleep detection scenarios are complex, the YOLOv5 algorithm alone is insufficient for accurate detection.

Summary of the Invention

[0004] The technical problem to be solved by this invention is to provide a sleep detection method based on an improved YOLOv5, which adopts a new strategy logic to improve the recognition accuracy and robustness, thereby achieving real-time and efficient sleep detection and avoiding false alarms.

[0005] To solve the above technical problem, the present invention adopts the following technical solution: a sleep detection method based on an improved YOLOv5, which executes the following steps A to E in real time on images captured in real time to detect sleeping persons in the captured images.

[0006] Step A. A suspected sleep detection model is pre-trained on the improved YOLOv5 model structure that incorporates the improved CBAM attention mechanism; the model takes an image as input and outputs the suspected sleep detection results for each preset sleep posture in the image. The model is applied to the current image to obtain its suspected sleep detection results, which include whether the image contains a suspected person matching a sleep posture and the suspected sleep detection box of each such person. Then proceed to Step B.

[0007] Step B. Determine whether there is a suspected person in the current image whose sleep detection results match the sleep posture. If yes, obtain the current suspected sleep detection boxes in the current image and proceed to step C; otherwise, the processing of the current image ends.

[0008] Step C. Determine whether the frame immediately preceding the current image contains a suspected sleep detection box. If yes, proceed to step D; otherwise, assign each suspected sleep detection box in the current image a new tracking mark that differs from all previously assigned tracking marks and from the marks of the other boxes. These constitute the tracking marks of the suspected sleep detection boxes in the current image. Then proceed to step E.

[0009] Step D. Starting from the tracking marks of the suspected sleep detection boxes in the previous frame, and following back through time the consecutive suspected sleep detection boxes associated with each tracking mark, apply the two-level detection design of feature matching followed by detection box matching to track each current suspected sleep detection box in the current image.

[0010] Each current suspected sleep detection box that is successfully tracked inherits the tracking mark it was matched to, which becomes the tracking mark of that box.

[0011] Each current suspected sleep detection box that is not successfully tracked is assigned a new tracking mark that differs from all previously assigned tracking marks and from the marks of the other boxes.

[0012] Then proceed to step E.

[0013] Step E. For each tracking mark present in the currently captured image, determine whether the number of suspected sleep detection boxes carrying that mark has reached a preset judgment threshold. If yes, the person with that tracking mark in the currently captured image is judged to be in a sleep state; otherwise, no judgment is made.

[0014] As a preferred technical solution of the present invention: an improved CBAM module is connected in series between the first Conv layer and the first C3 layer in the backbone network of the YOLOv5 model in the direction of data flow, to form an improved YOLOv5 model structure that combines the improved CBAM attention mechanism;

[0015] The improved CBAM module includes a global average pooling layer, a max pooling layer, an average pooling layer, a convolutional layer, a sigmoid activation function layer, a first matrix multiplication module, a second matrix multiplication module, and an ECA-Net layer that takes a feature map as input and outputs the corresponding channel-domain attention feature map. The input of the global average pooling layer constitutes the input of the improved CBAM module and receives the feature map $F$. The output of the global average pooling layer is connected to the input of the ECA-Net layer, which processes the received feature map to obtain and output the channel-domain attention feature map $M_c(F)$ corresponding to $F$. The output of the ECA-Net layer and the input of the improved CBAM module are both connected to the input of the first matrix multiplication module, which multiplies the channel-domain attention feature map $M_c(F)$ with the feature map $F$ to obtain a processing result. The output of the first matrix multiplication module passes in sequence through the max pooling layer, the average pooling layer, the convolutional layer, and the sigmoid activation function layer to obtain the spatial attention feature $M_s(F)$ corresponding to the feature map. At the same time, the output of the first matrix multiplication module and the output of the sigmoid activation function layer are both connected to the input of the second matrix multiplication module, which multiplies the spatial attention feature $M_s(F)$ with the processing result output by the first matrix multiplication module to obtain and output the updated feature map.

[0016] As a preferred technical solution of the present invention: the suspected sleep detection model is trained on a dataset consisting of a preset number of sample images, in each of which the local image regions showing a person in a sleep state are labeled with the corresponding preset sleep posture. The dataset is expanded and updated by applying the Mosaic data augmentation method. With the sample images as input and the detection boxes of the sleep-state local image regions in the sample images as output, the improved YOLOv5 network structure is trained to obtain the suspected sleep detection model.

[0017] As a preferred embodiment of the present invention, the following loss function is used in the process of training the improved Yolov5 network structure to obtain the suspected sleep detection model:

[0018]

[0019] Where $P$ is the center point of the detection box of the local image of a person in a sleep state matching a sleep posture in the sample image; $P^{gt}$ is the center point of the corresponding ground truth box; $D$ is the diagonal length of the smallest rectangular region that encloses both the ground truth box and the detection box; $IOU$ is the intersection-over-union of the detection box $A$ and the ground truth box $B$; $\alpha$ is a preset exponent, and $IOU^{\alpha}$ measures the degree of overlap between the ground truth box and the detection box; $w$ and $w^{gt}$ are the widths of the detection box and the ground truth box, respectively; $h$ and $h^{gt}$ are the heights of the detection box and the ground truth box, respectively; and $\gamma$ is the Euclidean distance between the center point of the detection box $A$ and the center point of the ground truth box $B$.

[0020] As a preferred technical solution of the present invention: the tracking and detection in step D proceeds in stages. In the first stage, the Kalman filter in the Deepsort network is first applied to obtain, for each suspected sleep detection box in the previous captured frame, its suspected sleep prediction box in the current image.

[0021] Next, a pre-trained feature extraction model for extracting preset features in the image is applied to extract preset feature vectors for each suspected sleep prediction box, and simultaneously extract preset feature vectors for each current suspected sleep detection box in the currently captured image.

[0022] Then, a distance of a preset type is computed between the feature vectors of each pair formed by a suspected sleep prediction box and a current suspected sleep detection box, and it is determined whether this distance is less than the preset similarity threshold. If it is, the corresponding suspected sleep prediction box and current suspected sleep detection box are considered initially tracked successfully; otherwise, they are considered not successfully tracked.

[0023] In the second stage, for the suspected sleep prediction box and the current suspected sleep detection box that have been initially tracked successfully, it is determined whether the distance between the center position of the suspected sleep prediction box and the center position of the current suspected sleep detection box is less than a preset distance threshold. If yes, it means that the intermediate tracking between the corresponding suspected sleep prediction box and the current suspected sleep detection box is successful; otherwise, it means that the tracking between the corresponding suspected sleep prediction box and the current suspected sleep detection box is unsuccessful.

[0024] In the third stage, for the suspected sleep prediction box and the current suspected sleep detection box that were successfully tracked in the intermediate stage, if the difference between the length of the suspected sleep prediction box and the length of the current suspected sleep detection box is less than a preset length difference threshold, and the difference between the width of the suspected sleep prediction box and the width of the current suspected sleep detection box is less than a preset width difference threshold, then it means that the corresponding suspected sleep prediction box and the current suspected sleep detection box were finally successfully tracked; otherwise, it means that the corresponding suspected sleep prediction box and the current suspected sleep detection box were not successfully tracked.

[0025] The sleep detection method based on the improved YOLOv5 described in this invention has the following technical advantages compared with the prior art:

[0026] (1) The present invention designs a sleep detection method based on an improved YOLOv5. Based on the improved YOLOv5 model structure combined with the improved CBAM attention mechanism, a suspected sleep detection model is constructed to detect suspected sleepers. Combined with the two-level detection design of feature matching and detection box matching, the continuous suspected sleepers are tracked and detected. Finally, based on the continuity of suspected sleepers, the accurate detection of sleepers is achieved. The design scheme improves the accuracy and robustness of sleep detection by improving the YOLOv5 model and Deepsort network and applying them together. It has broad application prospects in improving work safety and enterprise efficiency.

Attached Figure Description

[0027] Figure 1 is a flowchart of the sleep detection method based on the improved YOLOv5 designed in this invention;

[0028] Figure 2 is a schematic diagram of the improved CBAM module designed in this invention.

Detailed Implementation

[0029] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.

[0030] This invention presents a sleep detection method based on an improved YOLOv5. As shown in Figure 1, steps A to E are executed in real time on images captured in real time to detect sleeping persons in the captured images.

[0031] Step A. A suspected sleep detection model is pre-trained on the improved YOLOv5 model structure that incorporates the improved CBAM attention mechanism; the model takes an image as input and outputs the suspected sleep detection results for each preset sleep posture in the image. The model is applied to the current image to obtain its suspected sleep detection results, which include whether the image contains a suspected person matching a sleep posture and the suspected sleep detection box of each such person. Then proceed to Step B.

[0032] In practical applications, the suspected sleep detection model is trained on a dataset consisting of a preset number of sample images, in each of which the local image regions showing a person in a sleep state are labeled with the corresponding preset sleep posture. The dataset is expanded and updated using the Mosaic data augmentation method. With the sample images as input and the detection boxes of the sleep-state local image regions in the sample images as output, the improved YOLOv5 network structure is trained to obtain the suspected sleep detection model. The preset sleep postures include lying down, standing, and lying on a table.

[0033] In the Mosaic data augmentation method, up to four images are randomly cropped, scaled, and stitched together onto a single mosaic image, with the remaining portion filled with gray borders. However, excessive gray borders can cause the model to learn irrelevant feature information, reducing training efficiency. To minimize the amount of irrelevant feature information learned from the training data, this invention improves the Mosaic data augmentation module by stitching together up to nine images into a single mosaic image. This improved Mosaic data augmentation offers three advantages: first, it increases batch training size; second, it minimizes the area filled with gray borders, improving training efficiency; and third, because the scaled images are closer in size to small targets, it expands the small target dataset, enhancing model robustness and effectively improving the model's ability to detect small targets in a worker's work environment.
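The nine-image stitching described above can be illustrated with a minimal sketch. It assumes a fixed 3x3 grid with equal cells and omits the random cropping and scaling; the function names, the cell size, and the gray fill value 114 are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

GRAY = 114  # assumed fill value for unused mosaic cells


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, 3) image by index sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def mosaic9(images, boxes, cell=160):
    """Tile up to nine images into a 3x3 canvas and remap their boxes.

    images: list of (H, W, 3) uint8 arrays.
    boxes:  per-image lists of (x1, y1, x2, y2) in that image's pixel coords.
    Returns the canvas and all boxes in canvas coordinates.
    """
    canvas = np.full((3 * cell, 3 * cell, 3), GRAY, dtype=np.uint8)
    out_boxes = []
    for i, (img, img_boxes) in enumerate(zip(images[:9], boxes[:9])):
        r, c = divmod(i, 3)                      # grid row and column
        h, w = img.shape[:2]
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            resize_nearest(img, cell, cell)
        sx, sy = cell / w, cell / h              # per-axis scale factors
        for x1, y1, x2, y2 in img_boxes:
            out_boxes.append((x1 * sx + c * cell, y1 * sy + r * cell,
                              x2 * sx + c * cell, y2 * sy + r * cell))
    return canvas, out_boxes
```

With nine inputs every cell is filled, so no gray area remains; with fewer inputs the unused cells keep the gray fill, which is the region the text aims to minimize.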

[0034] The following loss function is used during the training of the improved Yolov5 network architecture to obtain a suspected sleep detection model:

[0035]

[0036] Where $P$ is the center point of the detection box of the local image of a person in a sleep state matching a sleep posture in the sample image; $P^{gt}$ is the center point of the corresponding ground truth box; $D$ is the diagonal length of the smallest rectangular region that encloses both the ground truth box and the detection box; $IOU$ is the intersection-over-union of the detection box $A$ and the ground truth box $B$; $\alpha$ is a preset exponent, and $IOU^{\alpha}$ measures the degree of overlap between the ground truth box and the detection box; $w$ and $w^{gt}$ are the widths of the detection box and the ground truth box, respectively; $h$ and $h^{gt}$ are the heights of the detection box and the ground truth box, respectively; and $\gamma$ is the Euclidean distance between the center point of the detection box $A$ and the center point of the ground truth box $B$.
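The geometric quantities defined above can be computed as in the following sketch. It only evaluates the individual terms (the overlap IOU, its power with exponent α, the center distance γ, and the enclosing diagonal D); how the loss function combines them is not shown, since the formula itself does not appear in this text. The function name and the (cx, cy, w, h) box layout are assumptions for illustration.

```python
import math


def box_terms(det, gt, alpha):
    """Compute the terms named in the text for one box pair.

    det, gt: boxes as (cx, cy, w, h); this layout is an assumption.
    alpha:   the preset exponent applied to the IoU.
    """
    def corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = corners(det)
    bx1, by1, bx2, by2 = corners(gt)

    # Intersection-over-union of detection box A and ground truth box B.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = det[2] * det[3] + gt[2] * gt[3] - inter
    iou = inter / union if union > 0 else 0.0

    # gamma: Euclidean distance between the two center points P and P^gt.
    gamma = math.hypot(det[0] - gt[0], det[1] - gt[1])

    # D: diagonal of the smallest rectangle enclosing both boxes.
    diag = math.hypot(max(ax2, bx2) - min(ax1, bx1),
                      max(ay2, by2) - min(ay1, by1))

    return {"iou": iou, "iou_alpha": iou ** alpha, "gamma": gamma, "D": diag}
```

For two 10x10 boxes whose centers are 5 pixels apart horizontally, the overlap is 50 of a union of 150, so the IoU is 1/3, γ is 5, and D is the diagonal of a 15x10 rectangle.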

[0037] In practical applications, the improved YOLOv5 model structure is formed by connecting the improved CBAM module in series, in the direction of data flow, between the first Conv layer and the first C3 layer in the backbone network of the YOLOv5 model, yielding an improved YOLOv5 model structure that incorporates the improved CBAM attention mechanism.

[0038] In CBAM, the channel attention part uses fully connected layers to compress the dimension of the channel features and then map them back to the original dimension. This dimensionality reduction may lose some channel feature information and limits the capture of dependencies among all channels, so complete feature information for complex postures cannot be obtained. To avoid this loss of channel information and to improve the detection accuracy of the posture recognition network more effectively, the design scheme of this invention improves the CBAM attention mechanism. The improved CBAM attention mechanism retains the spatial domain part of CBAM, discards its channel domain part, and uses ECA-Net (efficient channel attention) as the channel domain of the improved CBAM. After channel-wise global average pooling (GAP), the ECA-Net module aggregates channel features with a one-dimensional convolution that considers the interaction between each channel and its K neighboring channels. Here the kernel size K represents the range of local cross-channel interaction, that is, the number of neighboring channels that take part in computing the attention of one channel. To avoid adjusting K manually, K is determined adaptively, as given by formula (5).

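[0039] Here $C$ is the number of channels, with an initial value of 64; $|t|_{odd}$ denotes the odd number nearest to $t$; $\gamma$ is set to 2; and $b$ is set to 1.

[0040] $$K = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} \tag{5}$$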

[0041] By using the adaptive calculation method of Equation (5), we can avoid manually adjusting the parameters and directly calculate the most suitable K, thereby fully integrating the interaction information between channels and optimizing the detection performance of the network without the need for cross-validation.
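As a worked illustration of the adaptive choice of K with γ = 2 and b = 1, the sketch below assumes the usual ECA-Net rule K = |log2(C)/γ + b/γ| rounded to an odd number. The rounding convention (truncate, then move an even result up to the next odd number) follows common ECA implementations and is an assumption here, as is the function name.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D convolution kernel size for a given channel count."""
    t = int(abs((math.log2(channels) + b) / gamma))  # truncate toward zero
    return t if t % 2 else t + 1                     # force an odd size


# For the initial channel count C = 64: (6 + 1) / 2 = 3.5, giving K = 3.
for c in (64, 128, 256):
    print(c, eca_kernel_size(c))
```

Because K grows only with the logarithm of C, doubling the channel count changes the kernel size slowly, which is why no per-layer tuning is needed.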

[0042] Furthermore, in practical applications, as shown in Figure 2, the improved CBAM module includes a global average pooling layer, a max pooling layer, an average pooling layer, a convolutional layer, a sigmoid activation function layer, a first matrix multiplication module, a second matrix multiplication module, and an ECA-Net layer that takes a feature map as input and outputs the corresponding channel-domain attention feature map. The input of the global average pooling layer constitutes the input of the improved CBAM module and receives the feature map $F$. The output of the global average pooling layer is connected to the input of the ECA-Net layer, which processes the received feature map to obtain and output the channel-domain attention feature map $M_c(F)$ corresponding to $F$. The output of the ECA-Net layer and the input of the improved CBAM module are both connected to the input of the first matrix multiplication module, which multiplies the channel-domain attention feature map $M_c(F)$ with the feature map $F$ to obtain a processing result. The output of the first matrix multiplication module passes in sequence through the max pooling layer, the average pooling layer, the convolutional layer, and the sigmoid activation function layer to obtain the spatial attention feature $M_s(F)$ corresponding to the feature map. At the same time, the output of the first matrix multiplication module and the output of the sigmoid activation function layer are both connected to the input of the second matrix multiplication module, which multiplies the spatial attention feature $M_s(F)$ with the processing result output by the first matrix multiplication module to obtain and output the updated feature map.
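The data flow of the module described above can be sketched in NumPy as follows. Several points are assumptions made for illustration: the two multiplications are implemented as element-wise (broadcast) products as in standard CBAM, the max and average pooling of the spatial branch are taken along the channel axis and stacked before the convolution, and the kernels `w1d` and `w2d` are hypothetical stand-ins for learned weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def eca_channel_attention(F, w1d):
    """Channel branch. F: (C, H, W); w1d: odd-length 1-D kernel (assumed)."""
    gap = F.mean(axis=(1, 2))                    # global average pooling, (C,)
    k = len(w1d)
    padded = np.pad(gap, k // 2)                 # zero-pad along channel axis
    conv = np.array([padded[i:i + k] @ w1d for i in range(len(gap))])
    return sigmoid(conv)                         # M_c(F), shape (C,)


def spatial_attention(Fc, w2d):
    """Spatial branch. Fc: (C, H, W); w2d: (2, k, k) kernel (assumed)."""
    pooled = np.stack([Fc.max(axis=0), Fc.mean(axis=0)])   # (2, H, W)
    k = w2d.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    H, W = Fc.shape[1:]
    out = np.empty((H, W))
    for y in range(H):                           # plain 2-D convolution loop
        for x in range(W):
            out[y, x] = np.sum(padded[:, y:y + k, x:x + k] * w2d)
    return sigmoid(out)                          # M_s, shape (H, W)


def improved_cbam(F, w1d, w2d):
    """ECA channel attention followed by CBAM-style spatial attention."""
    Mc = eca_channel_attention(F, w1d)
    F1 = F * Mc[:, None, None]                   # first multiplication module
    Ms = spatial_attention(F1, w2d)
    return F1 * Ms[None, :, :]                   # second multiplication module
```

The output has the same shape as the input feature map, so the module can sit between two backbone layers without changing their interfaces.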

[0043] Step B. Determine whether there is a suspected person in the current image whose sleep detection results match the sleeping posture. If yes, obtain the current suspected sleep detection boxes in the current image and proceed to step C; otherwise, the processing of the current image ends.

[0044] Step C. Determine whether the frame immediately preceding the current image contains a suspected sleep detection box. If yes, proceed to step D; otherwise, assign each suspected sleep detection box in the current image a new tracking mark that differs from all previously assigned tracking marks and from the marks of the other boxes. These constitute the tracking marks of the suspected sleep detection boxes in the current image. Then proceed to step E.

[0045] Step D. Based on the tracking markers corresponding to each suspected sleep detection box in the previous frame image, and considering the historical time direction, track and detect each suspected sleep detection box in the current image according to the two-level detection design of feature matching and detection box matching.

[0046] Specifically, each current suspected sleep detection box that is successfully tracked inherits the tracking mark it was matched to, which becomes the tracking mark of that box; each current suspected sleep detection box that is not successfully tracked is assigned a new tracking mark that differs from all previously assigned tracking marks and from the marks of the other boxes. Then proceed to step E.
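The mark assignment rule of steps C and D can be sketched as below. The class and function names are hypothetical, and tracking marks are represented as increasing integers purely for illustration; the text only requires that new marks differ from every mark already issued.

```python
import itertools


class MarkAllocator:
    """Hands out tracking marks that are never reused (integers, assumed)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new(self):
        return next(self._counter)


def assign_marks(num_current_boxes, matches, allocator):
    """Return one tracking mark per current detection box.

    matches: dict mapping a current box index to the tracking mark it was
             successfully tracked to; pass an empty dict when the previous
             frame has no suspected sleep detection box (step C).
    """
    return [matches[i] if i in matches else allocator.new()
            for i in range(num_current_boxes)]


alloc = MarkAllocator()
first = assign_marks(2, {}, alloc)                # no previous frame: new marks
second = assign_marks(3, {0: 2, 2: 1}, alloc)     # box 1 was not tracked
print(first, second)
```

Keeping the allocator outside the per-frame function is what guarantees that a mark issued in an earlier frame is never handed to a different person later.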

[0047] In practical applications, step D above is executed in three stages. In the first stage, the Kalman filter in the Deepsort network is applied to obtain, for each suspected sleep detection box in the previous frame image, its suspected sleep prediction box in the current image.
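The prediction step of a constant-velocity Kalman filter, which maps a box in the previous frame to a prediction box in the current frame, can be sketched as follows. This is a simplified illustration rather than the Deepsort implementation: the state layout (cx, cy, w, h plus their velocities), the time step `dt`, and the process noise `q` are all assumptions.

```python
import numpy as np


def kalman_predict(mean, cov, dt=1.0, q=1e-2):
    """One constant-velocity prediction step.

    mean: state vector [cx, cy, w, h, v_cx, v_cy, v_w, v_h] (assumed layout).
    cov:  8x8 state covariance.
    q:    isotropic process noise level (hypothetical value).
    """
    n = 4
    transition = np.eye(2 * n)
    transition[:n, n:] = dt * np.eye(n)          # position += velocity * dt
    process_noise = q * np.eye(2 * n)
    new_mean = transition @ mean
    new_cov = transition @ cov @ transition.T + process_noise
    return new_mean, new_cov


prev = np.array([10.0, 20.0, 30.0, 40.0, 1.0, -2.0, 0.0, 0.0])
pred_mean, pred_cov = kalman_predict(prev, np.eye(8))
print(pred_mean[:4])  # predicted box in the current frame
```

The first four entries of the predicted mean give the prediction box that the later matching stages compare against the current detections.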

[0048] Next, a pre-trained feature extraction model for extracting preset features in the image is applied to extract preset feature vectors for each suspected sleep prediction box, and simultaneously extract preset feature vectors for each current suspected sleep detection box in the currently captured image.

[0049] Then, a distance of a preset type is computed between the feature vectors of each pair formed by a suspected sleep prediction box and a current suspected sleep detection box, and it is determined whether this distance is less than a preset similarity threshold. If it is, the corresponding suspected sleep prediction box and current suspected sleep detection box are considered initially tracked successfully; otherwise, they are considered not successfully tracked.

[0050] In the second stage, for the suspected sleep prediction box and the current suspected sleep detection box that have been initially tracked successfully, it is determined whether the distance between the center position of the suspected sleep prediction box and the center position of the current suspected sleep detection box is less than a preset distance threshold. If yes, it means that the intermediate tracking between the corresponding suspected sleep prediction box and the current suspected sleep detection box is successful; otherwise, it means that the tracking between the corresponding suspected sleep prediction box and the current suspected sleep detection box is unsuccessful.

[0051] In the third stage, for the suspected sleep prediction box and the current suspected sleep detection box that were successfully tracked in the intermediate stage, if the difference between the length of the suspected sleep prediction box and the length of the current suspected sleep detection box is less than a preset length difference threshold, and the difference between the width of the suspected sleep prediction box and the width of the current suspected sleep detection box is less than a preset width difference threshold, then it means that the corresponding suspected sleep prediction box and the current suspected sleep detection box were finally successfully tracked; otherwise, it means that the corresponding suspected sleep prediction box and the current suspected sleep detection box were not successfully tracked.
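The three stages above amount to a cascade of gates that a prediction box and a detection box must all pass. The sketch below assumes cosine distance as the "preset type" of feature distance and boxes given as (cx, cy, length, width); the function names and all threshold parameters are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def is_same_track(pred_box, det_box, pred_feat, det_feat,
                  sim_thr, dist_thr, len_thr, wid_thr):
    """Apply the three matching stages in order; any failure rejects the pair."""
    # Stage 1: appearance feature distance below the similarity threshold.
    if cosine_distance(pred_feat, det_feat) >= sim_thr:
        return False
    # Stage 2: distance between box centers below the distance threshold.
    center_dist = math.hypot(pred_box[0] - det_box[0], pred_box[1] - det_box[1])
    if center_dist >= dist_thr:
        return False
    # Stage 3: length and width differences both below their thresholds.
    return (abs(pred_box[2] - det_box[2]) < len_thr
            and abs(pred_box[3] - det_box[3]) < wid_thr)
```

Ordering the checks this way means the cheaper geometric tests only run for pairs that already look alike, and a pair counts as finally tracked only when all three conditions hold.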

[0052] Step E. For each tracking mark present in the currently captured image, determine whether the number of suspected sleep detection boxes carrying that mark has reached a preset judgment threshold. If yes, the person with that tracking mark in the currently captured image is judged to be in a sleep state; otherwise, no judgment is made.
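The threshold test of step E can be sketched as a per-mark counter. One assumption is made explicit here: a mark's count is reset when the mark is absent from a frame, so the count reflects consecutive detections. The class name and the threshold value are hypothetical.

```python
from collections import Counter


class SleepJudge:
    """Counts detections per tracking mark and flags marks at the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def update(self, marks_in_frame):
        """Feed the marks of one frame; return the marks judged asleep."""
        present = set(marks_in_frame)
        # Assumed continuity rule: forget marks missing from this frame.
        for mark in list(self.counts):
            if mark not in present:
                del self.counts[mark]
        for mark in present:
            self.counts[mark] += 1
        return {m for m in present if self.counts[m] >= self.threshold}


judge = SleepJudge(threshold=3)
for frame_marks in ([1, 2], [1], [1, 2]):
    asleep = judge.update(frame_marks)
print(asleep)  # mark 1 was seen in three consecutive frames
```

In this run mark 2 misses the second frame, so its count restarts and only mark 1 reaches the threshold.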

[0053] With all of the above technical solutions combined, TensorRT is then used to optimize and accelerate inference of the deep learning model while improving GPU utilization. TensorRT analyzes the structure of the YOLOv5 model and optimizes its computational flow, for example by removing redundant computations, fusing operations, converting precision, and optimizing memory allocation, thereby improving the model's inference speed and accuracy. The optimized model is then deployed on NVIDIA GPUs, making full use of the GPU's parallel computing capability to further increase inference speed and throughput. TensorRT also processes multiple images in a single batch, reducing the number of forward passes and further improving inference speed.

[0054] Finally, the YOLOv5 and Deepsort tracking code was integrated and improved, and a detection pipeline was developed. Experiments were conducted on different sleep monitoring datasets to evaluate the algorithm. The evaluation covered the following:
1. Dataset partitioning: The sleep monitoring dataset was divided into training, validation, and test sets according to a fixed ratio for model training, tuning, and testing.
2. Performance metrics: Multiple metrics were selected to evaluate the detection algorithm, such as mean average precision (mAP), precision, recall, and F1 score.
3. Experimental comparison: The patented method was compared with other sleep monitoring detection algorithms, including traditional feature extraction-based methods and deep learning-based methods. The same evaluation metrics and datasets were used to ensure a fair comparison.
4. Experimental result analysis: The detection accuracy and efficiency of the patented method under different sleep monitoring states, as well as its applicability and generalization ability on different datasets, were evaluated by analyzing the experimental results.
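Three of the metrics named above follow directly from detection counts. The sketch below uses the standard definitions; the function name and the example counts are illustrative and are not results from the patent. Mean average precision is not covered, since it additionally requires ranked detections.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Reporting precision and recall together matters for this task because false alarms and missed detections carry different costs in a monitoring setting.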

[0055] The aforementioned technical solution, based on an improved YOLOv5 sleep detection method, constructs a suspected sleep detection model using an improved YOLOv5 model structure incorporating an enhanced CBAM attention mechanism. It then employs a two-tier detection design—feature matching and bounding box matching—to track and detect continuously suspected sleepers. Finally, based on the continuity of suspected sleepers, it achieves accurate detection of individuals in a sleep-on-duty state. This design effectively improves the accuracy and robustness of sleep detection through improvements to the YOLOv5 model and the Deepsort network, demonstrating broad application prospects in enhancing workplace safety and improving corporate efficiency.

[0056] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.

Claims

1. A sleep state detection method based on an improved YOLOv5, characterized in that: based on images captured in real time, the following steps A to E are executed in real time to detect persons sleeping on duty in the captured images;

Step A. Based on an improved YOLOv5 model structure that incorporates an improved CBAM attention mechanism, a suspected sleep detection model is trained in advance, which takes an image as input and outputs the suspected sleep detection results for each preset sleep posture in the image. The model is applied to the current image to obtain its suspected sleep detection results, which include whether the image contains a suspected person matching a sleep posture and the suspected sleep detection box of each such person in the image. Then proceed to step B.

Step B. Determine from the suspected sleep detection results whether the current image contains a suspected person matching a sleep posture. If yes, obtain the current suspected sleep detection boxes in the current image and proceed to step C; otherwise, processing of the current image ends.

Step C. Determine whether, going back in time, the frame immediately preceding the current image contains suspected sleep detection boxes. If yes, proceed to step D; otherwise, assign to each current suspected sleep detection box in the current image a tracking mark that differs from all previously assigned tracking marks and from the other newly assigned marks, thereby forming the tracking mark corresponding to each current suspected sleep detection box, and then proceed to step E.

Step D. Based on the tracking marks corresponding to the suspected sleep detection boxes in the previous frame and, going back in time, on the consecutive suspected sleep detection boxes corresponding to each tracking mark, track each current suspected sleep detection box in the current image using a two-level detection design that applies feature matching and then detection box matching. Each current suspected sleep detection box that is successfully tracked is marked with the tracking mark it was matched to; each current suspected sleep detection box that is not successfully tracked is assigned a tracking mark that differs from all previously assigned tracking marks and from the other newly assigned marks. This forms the tracking mark corresponding to each current suspected sleep detection box. Then proceed to step E.

Step E. Based on the tracking marks corresponding to the current suspected sleep detection boxes in the currently captured image, determine for each tracking mark whether the number of suspected sleep detection boxes corresponding to it reaches a preset judgment threshold. If yes, the person corresponding to that tracking mark in the currently captured image is determined to be in a sleep-on-duty state; otherwise, no determination is made.

The improved CBAM module is connected in series, in the direction of data flow, between the first Conv layer and the first C3 layer in the backbone network of the YOLOv5 model, forming the improved YOLOv5 model structure that incorporates the improved CBAM attention mechanism.

The improved CBAM module includes a global average pooling layer, a max pooling layer, an average pooling layer, a convolutional layer, a sigmoid activation function layer, a first matrix multiplication module, a second matrix multiplication module, and an ECA-Net layer that takes a feature map as input and outputs the channel-domain attention feature map corresponding to that feature map. The input of the global average pooling layer constitutes the input of the improved CBAM module and receives the feature map. The output of the global average pooling layer is connected to the input of the ECA-Net layer, which processes the received feature map to obtain the corresponding channel-domain attention feature map. The output of the ECA-Net layer and the input of the improved CBAM module are both connected to the input of the first matrix multiplication module, which performs matrix multiplication of the channel-domain attention feature map and the feature map. The output of the first matrix multiplication module is connected in sequence to the max pooling layer, the average pooling layer, the convolutional layer, and the sigmoid activation function layer to obtain the spatial attention features corresponding to the feature map. At the same time, the output of the first matrix multiplication module and the output of the sigmoid activation function layer are both connected to the input of the second matrix multiplication module, which performs matrix multiplication of the received spatial attention features and the result of the first matrix multiplication module to obtain and output the updated feature map.
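The data flow of the improved CBAM module can be sketched in plain Python as follows. This is a simplified illustration under several assumptions that are not stated in the claim: the ECA-Net layer is modeled as a 1D convolution across neighbouring channels followed by a sigmoid, the max-pooled and average-pooled maps are both fed to a single 3x3 convolution (as in the standard CBAM spatial branch), the two "matrix multiplications" are treated as element-wise scaling, and all kernel weights passed in are hypothetical fixed values rather than trained parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_eca(fmap, kernel):
    """fmap: list of C channels, each an H x W list of rows. Returns C weights."""
    # Global average pooling: one scalar per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # ECA-style 1D convolution across neighbouring channels (zero padding).
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    return [sigmoid(sum(k * padded[c + i] for i, k in enumerate(kernel)))
            for c in range(len(pooled))]


def spatial_attention(fmap, w_max, w_avg):
    """Channel-wise max and mean maps, a 3x3 convolution over both, then sigmoid."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    mx = [[max(fmap[c][y][x] for c in range(C)) for x in range(W)] for y in range(H)]
    av = [[sum(fmap[c][y][x] for c in range(C)) / C for x in range(W)] for y in range(H)]
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:  # zero padding at the border
                        acc += w_max[dy + 1][dx + 1] * mx[yy][xx]
                        acc += w_avg[dy + 1][dx + 1] * av[yy][xx]
            out[y][x] = sigmoid(acc)
    return out


def improved_cbam(fmap, eca_kernel, w_max, w_avg):
    """Apply channel attention, then spatial attention; output has the input's shape."""
    ch_w = channel_attention_eca(fmap, eca_kernel)
    # First multiplication: scale each channel by its attention weight.
    refined = [[[v * ch_w[c] for v in row] for row in fmap[c]] for c in range(len(fmap))]
    sp = spatial_attention(refined, w_max, w_avg)
    # Second multiplication: scale each position by its spatial attention weight.
    return [[[refined[c][y][x] * sp[y][x] for x in range(len(sp[0]))]
             for y in range(len(sp))] for c in range(len(refined))]
```

Because both attention maps lie strictly between 0 and 1, the module rescales the feature map without changing its shape or the sign of any value.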

2. The sleep state detection method based on the improved YOLOv5 according to claim 1, characterized in that: the suspected sleep detection model is trained on a dataset consisting of a preset number of sample images, in each of which the local image regions of persons whose sleep state matches a preset sleep posture are annotated; the dataset is expanded and updated using the Mosaic data augmentation method; and, with the sample images as input and the detection boxes of the local image regions of persons in a sleep state as output, the improved YOLOv5 network structure is trained to obtain the suspected sleep detection model.
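The idea behind Mosaic augmentation, stitching four annotated samples into one training image, can be shown with a deliberately simplified sketch. It assumes four equally sized images tiled in a fixed 2x2 grid; the Mosaic used in practice also applies a random center, scaling, and cropping, which are omitted here. The function name and the `(x1, y1, x2, y2)` box format are choices made for this example.

```python
def mosaic_2x2(images, boxes_per_image):
    """Tile four equally sized images into one and shift their boxes accordingly.

    images: four H x W grids (lists of rows).
    boxes_per_image: four lists of (x1, y1, x2, y2) boxes in pixel coordinates.
    """
    h, w = len(images[0]), len(images[0][0])
    # (dx, dy) offsets: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    top = [images[0][y] + images[1][y] for y in range(h)]
    bottom = [images[2][y] + images[3][y] for y in range(h)]
    canvas = top + bottom
    boxes = [(x1 + dx, y1 + dy, x2 + dx, y2 + dy)
             for (dx, dy), image_boxes in zip(offsets, boxes_per_image)
             for (x1, y1, x2, y2) in image_boxes]
    return canvas, boxes
```

Each stitched image exposes the detector to several sleep postures and backgrounds at once, which is how the augmentation expands the dataset.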

3. The sleep state detection method based on the improved YOLOv5 according to claim 2, characterized in that the following loss function is used when training the improved YOLOv5 network structure to obtain the suspected sleep detection model: [formula missing in source]; where the quantities in the loss function are, in order: the position of the center point of the detection box of the local image region of a person whose sleep state matches a sleep posture in the sample image; the position of the center point of the ground-truth box of that local image region; the diagonal length of the smallest rectangular region that encloses both the ground-truth box and the detection box of that local image region; the intersection-over-union ratio between the detection box and the ground-truth box; a preset exponent; a term used to measure the degree of overlap between the ground-truth box and the detection box of the local image region; the width of the detection box; the width of the ground-truth box; h, the height of the detection box; the height of the ground-truth box; and the Euclidean distance between the center point of the detection box and the center point of the ground-truth box.
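The formula itself did not survive in this text. The quantities the claim lists (center points, enclosing-box diagonal, intersection-over-union ratio, a preset exponent, box widths and heights, and the center distance) are those of a CIoU-style loss raised to a power. Assuming that form, and using notation introduced here rather than taken from the claim (b and b^gt for the box centers, c for the diagonal, rho for the center distance, alpha for the preset exponent, and v and beta for the usual CIoU shape term and its weight), the loss would read:

```latex
\mathcal{L} = 1 - \mathrm{IoU}^{\alpha}
            + \frac{\rho^{2\alpha}\!\left(b,\, b^{gt}\right)}{c^{2\alpha}}
            + \left(\beta v\right)^{\alpha},
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\beta = \frac{v}{\left(1 - \mathrm{IoU}\right) + v}
```

With alpha equal to 1 this reduces to the ordinary CIoU loss. The reconstruction is a sketch under the stated assumption and should be checked against the formula in the original filing.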

4. The sleep state detection method based on the improved YOLOv5 according to claim 1, characterized in that, in step D, tracking detection proceeds as follows:

In the first stage, the Kalman filter in the Deepsort network is applied to obtain, for each suspected sleep detection box in the previous frame, the corresponding suspected sleep prediction box in the current image. A pre-trained feature extraction model for extracting preset features from images is then applied to extract a preset feature vector from each suspected sleep prediction box and, likewise, from each current suspected sleep detection box in the currently captured image. Next, for each pair of a suspected sleep prediction box and a current suspected sleep detection box, a distance of a preset type between their feature vectors is computed and compared with a preset similarity threshold. If the distance is less than the threshold, the pair is considered to have been initially tracked successfully; otherwise, tracking of the pair is unsuccessful.

In the second stage, for each pair of a suspected sleep prediction box and a current suspected sleep detection box that was initially tracked successfully, it is determined whether the distance between the center position of the suspected sleep prediction box and the center position of the current suspected sleep detection box is less than a preset distance threshold. If yes, intermediate tracking of the pair is successful; otherwise, tracking of the pair is unsuccessful.

In the third stage, for each pair that was successfully tracked in the intermediate stage, if the difference between the length of the suspected sleep prediction box and the length of the current suspected sleep detection box is less than a preset length difference threshold, and the difference between the width of the suspected sleep prediction box and the width of the current suspected sleep detection box is less than a preset width difference threshold, the pair is finally tracked successfully; otherwise, tracking of the pair is unsuccessful.
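The three stages of this claim can be sketched as a cascade of threshold checks on one prediction box and one detection box. The sketch assumes cosine distance as the "distance of a preset type", treats the box "length" as its height, and uses placeholder threshold values; none of these choices is specified in the claim. The Kalman prediction and the feature extraction are assumed to have been done beforehand, and the dictionary layout and function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_match(pred, det, feat_thr=0.3, center_thr=30.0, len_thr=20.0, wid_thr=20.0):
    """pred and det are dicts with keys 'feat', 'cx', 'cy', 'w', 'h'.

    All threshold defaults are placeholders chosen for this example.
    """
    # Stage 1: appearance feature distance must be below the similarity threshold.
    if cosine_distance(pred["feat"], det["feat"]) >= feat_thr:
        return False
    # Stage 2: the box centers must be closer than the distance threshold.
    if math.hypot(pred["cx"] - det["cx"], pred["cy"] - det["cy"]) >= center_thr:
        return False
    # Stage 3: length (taken as height here) and width differences must both be small.
    return abs(pred["h"] - det["h"]) < len_thr and abs(pred["w"] - det["w"]) < wid_thr
```

Ordering the checks this way means the cheaper geometric tests only run for pairs that already look alike, and a pair counts as tracked only if it passes all three stages.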