Fatigue driving detection method, electronic device, and storage medium

By acquiring ambient light data and vehicle bus data from the video stream in real time, dynamically adjusting the illumination compensation factor and threshold, extracting the mask of the region of interest, and using a large language model for semantic understanding, the problem of low accuracy and high false alarm rate in fatigue driving detection in existing technologies is solved, achieving higher detection accuracy and safety.

CN122244841A · Pending · Publication Date: 2026-06-19 · JIANGSU MANYUN LOGISTICS INFORMATION CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: JIANGSU MANYUN LOGISTICS INFORMATION CO LTD
Filing Date: 2026-04-20
Publication Date: 2026-06-19

Application Information

Patent Timeline

20 Apr 2026: Application
19 Jun 2026: Publication (CN122244841A)

IPC: G06V20/59; G06V20/40; G06V40/16; G06V10/25; G06V10/764; G06N5/04

AI Tagging

Application Domain

Character and pattern recognition; Inference methods


AI Technical Summary

Technical Problem

Existing fatigue driving detection methods rely on massive amounts of labeled data, making it difficult to capture the subtle characteristics of early fatigue and distinguish between similar actions. Furthermore, background subtraction methods are sensitive to lighting conditions and are prone to generating noise, resulting in low detection accuracy and a high false alarm rate.

Method used

By acquiring ambient light data and vehicle bus data from the video stream in real time, dynamically adjusting the illumination compensation factor and threshold, extracting the mask of the region of interest, and combining it with a large language model for semantic understanding, a fatigue driving detection result described in natural language is constructed.

Benefits of technology

The method improves the accuracy of fatigue driving detection, reduces the false alarm rate, and helps ensure driving safety.



Abstract

This application provides a fatigue driving detection method, electronic device, and storage medium, relating to the field of image processing technology. The method includes: real-time acquisition of a video stream containing multiple consecutive image frames, ambient light data corresponding to each image frame, and vehicle bus data; performing mask extraction on each image frame: extracting the region of interest; determining a first threshold and an illumination compensation factor based on the ambient light data of the current frame and the previous frame; obtaining an initial difference map by differencing the region of interest with the target background frame, and obtaining a final difference map by weighting it with the illumination compensation factor; performing binarization processing on the final difference map according to the first threshold to obtain a binarized foreground mask; obtaining a foreground mask sequence based on the binarized foreground mask within a preset time period; constructing a prompt word template based on the sequence, ambient light data, and vehicle bus data; and using a large language model to obtain fatigue driving detection results, thereby improving the accuracy of fatigue driving detection, reducing the false alarm rate, and ensuring driving safety.


Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a fatigue driving detection method, electronic device, and storage medium.

Background Technology

[0002] Fatigue driving is one of the main causes of serious traffic accidents. Currently, fatigue driving is mainly detected by recognizing the driver's facial features.

[0003] In related technologies, deep learning methods or background subtraction methods are typically used to identify the driver's facial features. Deep learning methods utilize convolutional neural networks such as YOLO and ResNet to directly classify the driver's facial images captured by the camera, detecting whether the driver's eyes are open or closed, and whether they are yawning, thereby determining whether the driver is fatigued. Background subtraction methods use algorithms such as ViBe, GMM, or SuBSENSE to extract moving foreground elements from the driver's images captured by the camera. Based on the moving foreground, indicators such as the percentage of eyelid closure over the pupil (PERCLOS) are calculated to determine whether the driver is fatigued.

[0004] However, deep learning methods rely on massive amounts of labeled data, have difficulty capturing the subtle features of early fatigue, and cannot distinguish between similar actions (such as opening the mouth while singing versus opening the mouth while yawning). Background subtraction methods, on the other hand, are sensitive to lighting conditions and prone to noise; they rely solely on pixel differences to determine whether a driver is fatigued and lack semantic understanding. Therefore, current fatigue detection methods are inaccurate, prone to false alarms, and compromise driving safety.

Summary of the Invention

[0005] This application provides a fatigue driving detection method, electronic device, and storage medium. It addresses three problems: deep learning methods rely on massive amounts of labeled data, have difficulty capturing the subtle features of early fatigue, and cannot distinguish between similar actions (such as opening the mouth while singing versus opening the mouth while yawning); background subtraction methods are sensitive to lighting and prone to noise; and judging fatigue from pixel differences alone lacks semantic understanding. The application thereby aims to improve the accuracy of fatigue driving detection, reduce the false alarm rate, and ensure driving safety.

[0006] In a first aspect, this application provides a fatigue driving detection method, including:
acquiring a video stream in real time, the video stream including multiple consecutive image frames, each of which includes the driver's face, and acquiring the ambient light data and vehicle bus data corresponding to each image frame;
upon acquiring each image frame, performing the following mask extraction operation on the image frame:
extracting the region of interest (ROI) from the image frame, the ROI including the eye region, mouth region, and head region;
determining the first threshold and the illumination compensation factor corresponding to the image frame based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame;
obtaining an initial difference map based on the region of interest in the image frame and the target background frame, the target background frame being updated based on the illumination state of the previous image frame;
weighting the initial difference map according to the illumination compensation factor to obtain the final difference map;
binarizing the final difference map based on the first threshold to obtain the binarized foreground mask corresponding to the image frame, the binarized foreground mask including the eye region mask, the mouth region mask, and the head region mask;
when the acquired video stream reaches the preset duration, obtaining a foreground mask sequence based on the binarized foreground mask corresponding to each image frame in the video stream;
constructing a prompt word template based on the foreground mask sequence and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period;
inputting the prompt word template into the preset large language model to obtain the fatigue driving detection results for the preset time period.
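The step that collects per-frame masks until the preset duration is reached can be sketched with a bounded buffer. This is a minimal illustration, not the applicant's implementation: the class name `MaskWindow`, the frame rate, and the duration are hypothetical, and the source does not say whether consecutive windows overlap, so the sketch assumes non-overlapping windows.

```python
from collections import deque


class MaskWindow:
    """Collects per-frame masks until one preset time period is covered."""

    def __init__(self, fps: int, duration_s: float):
        # Number of frames that make up one preset time period.
        self.size = int(fps * duration_s)
        self.buffer = deque(maxlen=self.size)

    def push(self, mask):
        """Add one mask; return the full sequence once the window is complete."""
        self.buffer.append(mask)
        if len(self.buffer) == self.size:
            sequence = list(self.buffer)
            self.buffer.clear()  # assumption: windows do not overlap
            return sequence
        return None


# Strings stand in for binarized foreground masks.
window = MaskWindow(fps=2, duration_s=1.5)  # 3 frames per window
results = [window.push(m) for m in ["m0", "m1", "m2", "m3"]]
```

Only the third call returns a sequence; the fourth mask starts the next window.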

[0007] In one possible design, the ambient light data includes illumination intensity, and determining the first threshold and the illumination compensation factor corresponding to the image frame based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame includes: determining the amount of illumination change based on the illumination intensity corresponding to the image frame and the illumination intensity corresponding to the previous image frame; determining the illumination state of the image frame based on the amount of illumination change and a second threshold, the illumination state being either a sudden illumination change state or a stable illumination state; and determining the first threshold and the illumination compensation factor corresponding to the image frame based on the illumination state of the image frame.

[0008] In one possible design, the second threshold is determined based on the illumination intensity of the previous image frame and satisfies Formula 1. Formula 1 is as follows: $T_2 = w_1 \cdot L_{t-1}$, where $T_2$ is the second threshold, $w_1$ is the first weight, and $L_{t-1}$ is the illumination intensity of the image frame preceding the image frame at time t.

[0009] In one possible design, determining the first threshold and the illumination compensation factor corresponding to the image frame based on the illumination state of the image frame includes: when the illumination state of the image frame is the sudden illumination change state, the first threshold is determined to satisfy Formula 2, and the illumination compensation factor is determined to satisfy Formula 3. Formula 2 is as follows: $T_1 = a \cdot T_0$, where $T_1$ is the first threshold, $a$ is the second weight with $a > 1$, and $T_0$ is the baseline threshold. Formula 3 defines the illumination compensation factor of the image frame at time t in terms of the illumination compensation factor of the image frame preceding the image frame at time t, e (the base of the natural logarithm), α (the third weight), and the difference between the illumination intensity of the image frame at time t and the illumination intensity of the preceding image frame. When the illumination state of the image frame is the stable illumination state, the first threshold is determined to be equal to the baseline threshold, and the illumination compensation factor is determined to be equal to 1.
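The selection of per-frame parameters described in the designs above can be sketched as follows. This is a hedged illustration under stated assumptions: the second threshold is taken as the first weight times the previous illumination intensity, the sudden-change first threshold as the second weight times the baseline threshold, and the change is compared by absolute value with a strict inequality. The expression for the compensation factor under a sudden change is not reproduced in this text, so the sketch takes it from a caller-supplied function; all names and numeric values are hypothetical.

```python
from typing import Callable, Tuple


def illumination_state(lux_now: float, lux_prev: float, w1: float) -> str:
    """Classify a frame as 'sudden' or 'stable' from the change in intensity."""
    second_threshold = w1 * lux_prev      # assumed form of the second threshold
    change = abs(lux_now - lux_prev)      # assumption: absolute change
    return "sudden" if change > second_threshold else "stable"


def frame_parameters(
    state: str,
    base_threshold: float,
    a: float,
    sudden_factor: Callable[[], float],
) -> Tuple[float, float]:
    """Return (first threshold, illumination compensation factor)."""
    if state == "sudden":
        # Raised threshold (a > 1); the factor comes from the caller because
        # its formula is not available here.
        return a * base_threshold, sudden_factor()
    # Stable illumination: baseline threshold and a neutral factor of 1.
    return base_threshold, 1.0


state = illumination_state(lux_now=320.0, lux_prev=200.0, w1=0.3)
params = frame_parameters(state, base_threshold=25.0, a=1.5,
                          sudden_factor=lambda: 1.2)
```

Passing the sudden-change factor as a function keeps the sketch honest about the missing formula while still showing how the two states branch.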

[0010] In one possible design, binarizing the final difference map based on the first threshold to obtain the binarized foreground mask corresponding to the image frame includes: for each pixel in the final difference map, if the difference value of the pixel is greater than the first threshold, the pixel is determined to be foreground and its value is set to 1; if the difference value of the pixel is less than or equal to the first threshold, the pixel is determined to be background and its value is set to 0. Once every pixel in the final difference map has been assigned a value, the binarized foreground mask corresponding to the image frame is obtained.
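The binarization rule above maps directly onto an element-wise comparison. The sketch below uses NumPy, which the source does not mention; the function name and sample values are illustrative only.

```python
import numpy as np


def binarize(final_diff: np.ndarray, first_threshold: float) -> np.ndarray:
    """Pixels strictly above the threshold become 1 (foreground), others 0."""
    return (final_diff > first_threshold).astype(np.uint8)


diff = np.array([[10.0, 30.0], [30.1, 55.0]])
mask = binarize(diff, first_threshold=30.0)
```

A pixel exactly equal to the threshold (30.0 here) is treated as background, matching the "less than or equal to" branch of the rule.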

[0011] In one possible design, the method also includes: when the illumination state of the image frame is the sudden illumination change state, the background update rate corresponding to the image frame is determined to be 0; when the illumination state of the image frame is the stable illumination state, the background update rate corresponding to the image frame is determined to be the default decay value. When the background update rate corresponding to the image frame is the default decay value, the target background frame is updated according to the region of interest in the image frame using Formula 4. Formula 4 is as follows: $B' = (1 - \beta) \cdot B + \beta \cdot R$, where $B'$ is the updated target background frame, $\beta$ is the background update rate corresponding to the image frame, $B$ is the target background frame, and $R$ is the region of interest in the image frame. When the background update rate corresponding to the image frame is 0, the target background frame is not updated.
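The gated background update can be sketched as a running average that is frozen during sudden illumination changes. This assumes the update blends the old background and the current region of interest with weights (1 - rate) and rate, which is consistent with a rate of 0 leaving the background unchanged; the default rate of 0.05 and the function name are hypothetical.

```python
import numpy as np


def update_background(background: np.ndarray, roi: np.ndarray,
                      state: str, default_rate: float = 0.05) -> np.ndarray:
    """Blend the ROI into the background unless illumination changed suddenly."""
    rate = 0.0 if state == "sudden" else default_rate
    if rate == 0.0:
        return background  # frozen: light and shadow must not leak into the model
    # Assumed running-average form of the update.
    return (1.0 - rate) * background + rate * roi


bg = np.full((2, 2), 100.0)
roi = np.full((2, 2), 120.0)
bg_stable = update_background(bg, roi, state="stable")
bg_sudden = update_background(bg, roi, state="sudden")
```

With these values the stable update moves each pixel from 100 to 101, while the sudden-change call returns the background untouched.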

[0012] In one possible design, constructing the prompt word template based on the foreground mask sequence and the ambient light data and vehicle bus data corresponding to each image frame in the video stream includes: calculating the eye closure duration and eye closure frequency within the preset time period based on the eye region mask of each binarized foreground mask in the foreground mask sequence; calculating the mouth opening amplitude and mouth opening frequency within the preset time period based on the mouth region mask of each binarized foreground mask in the foreground mask sequence; calculating the nodding frequency and nodding amplitude within the preset time period based on the head region mask of each binarized foreground mask in the foreground mask sequence; and constructing the prompt word template from a preset template, using the eye closure duration, eye closure frequency, mouth opening amplitude, mouth opening frequency, nodding frequency, nodding amplitude, and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period.
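A minimal sketch of this feature-to-prompt step follows. It assumes that each eye region mask has already been reduced to a per-frame closed/open flag by some rule the source does not specify, and the template wording, field names, and sample values are all hypothetical; the source only states that a preset template is used.

```python
def closure_stats(closed_flags, fps):
    """Total closed time in seconds and number of separate closures."""
    flags = [bool(f) for f in closed_flags]
    total_s = sum(flags) / fps
    # A closure starts wherever a closed frame follows an open one.
    episodes = sum(
        1 for prev, cur in zip([False] + flags, flags) if cur and not prev
    )
    return total_s, episodes


# Hypothetical preset template.
PROMPT_TEMPLATE = (
    "Window: {window_s} s. "
    "Eyes closed for {eye_closed_s:.1f} s across {eye_closures} closures. "
    "Mouth opened {mouth_opens} times (max amplitude {mouth_amp:.2f}). "
    "Head nodded {nods} times (max amplitude {nod_amp:.2f}). "
    "Ambient light: {lux:.0f} lux. Speed: {speed_kmh:.0f} km/h. "
    "Continuous driving: {driving_min:.0f} min. "
    "Is the driver fatigued? Give a confidence, a fatigue type, "
    "and an explanation."
)


def build_prompt(features: dict) -> str:
    """Fill the preset template with the computed features."""
    return PROMPT_TEMPLATE.format(**features)


closed_s, closures = closure_stats([0, 1, 1, 0, 1, 0], fps=2)
prompt = build_prompt({
    "window_s": 3, "eye_closed_s": closed_s, "eye_closures": closures,
    "mouth_opens": 1, "mouth_amp": 0.62, "nods": 0, "nod_amp": 0.0,
    "lux": 180, "speed_kmh": 82, "driving_min": 215,
})
```

Keeping the template as a single format string makes it easy to audit exactly what context the language model receives.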

[0013] In one possible design, fatigue driving detection results include fatigue confidence, fatigue type, and natural language interpretation.
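The three result fields can be held in a small record. The sketch below assumes, purely for illustration, that the large language model is instructed to reply in JSON with the keys `confidence`, `fatigue_type`, and `explanation`, and that confidence lies in [0, 1]; none of these details are stated in the source.

```python
import json
from dataclasses import dataclass


@dataclass
class FatigueResult:
    confidence: float    # fatigue confidence
    fatigue_type: str    # fatigue type
    explanation: str     # natural language interpretation


def parse_result(reply: str) -> FatigueResult:
    """Parse a JSON reply (assumed format) into a FatigueResult."""
    data = json.loads(reply)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return FatigueResult(confidence, str(data["fatigue_type"]),
                         str(data["explanation"]))


reply = ('{"confidence": 0.82, "fatigue_type": "drowsiness", '
         '"explanation": "Long eye closures with repeated nodding."}')
result = parse_result(reply)
```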

[0014] The method provided in the first aspect acquires a video stream in real time, the video stream including multiple consecutive image frames, and acquires the ambient light data and vehicle bus data corresponding to each image frame. This establishes a correspondence between the image frames and the ambient light data and vehicle bus data, providing a foundation for illumination adaptation and semantic reasoning. When each image frame is acquired, a mask extraction operation is performed on it. The mask extraction operation includes extracting the region of interest in the image frame, which focuses detection on key areas such as the eyes, mouth, and head and reduces interference from irrelevant background. A first threshold and an illumination compensation factor are determined for the image frame based on the ambient light data corresponding to the image frame and to the previous image frame, so that the parameters adapt dynamically to different illumination levels. An initial difference map is obtained from the region of interest in the image frame and the target background frame, and is then weighted by the illumination compensation factor to obtain a final difference map, so that the final difference map reflects the driver's true motion characteristics under different illumination conditions. The final difference map is binarized based on the first threshold to obtain the binarized foreground mask corresponding to the image frame, thereby accurately extracting the motion regions of the eyes, mouth, and head. When the acquired video stream reaches the preset duration, a foreground mask sequence is obtained from the binarized foreground masks corresponding to the image frames in the video stream. A prompt word template is constructed from the foreground mask sequence and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period, transforming the low-level visual features into a natural language description that the large language model can understand and reason over. The prompt word template is input into the preset large language model, whose common-sense reasoning and semantic understanding capabilities are used to perform contextual consistency verification and behavioral intent disambiguation on the eye, mouth, and head movement features. This yields fatigue driving detection results in natural language form for the preset time period, thereby improving the accuracy of fatigue driving detection, reducing the false alarm rate, and ensuring driving safety.

[0015] In a second aspect, this application provides a fatigue driving detection device, comprising modules for performing the fatigue driving detection method of the first aspect and of any possible design of the first aspect.

[0016] In a third aspect, this application provides an electronic device including a first processor which, when executing a computer-executable program or instructions in a memory, implements the fatigue driving detection method of the first aspect and of any possible design of the first aspect.

[0017] In a fourth aspect, this application provides an electronic device including at least one memory and at least one second processor. The memory stores a computer-executable program or instructions, and the second processor executes the computer-executable program or instructions to implement the fatigue driving detection method of the first aspect and of any possible design of the first aspect.

[0018] In a fifth aspect, this application provides a computer-readable storage medium storing a computer-executable program or instructions which, when executed by a processor, implement the fatigue driving detection method of the first aspect and of any possible design of the first aspect.

[0019] In a sixth aspect, this application provides a computer program product comprising execution instructions stored in a readable storage medium. At least one processor of an electronic device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the electronic device to implement the fatigue driving detection method of the first aspect and of any possible design of the first aspect.

[0020] The above description is merely an overview of the technical solutions of the embodiments of this application. To allow the technical means of the embodiments to be better understood and implemented in accordance with the specification, and to make the above and other objects, features, and advantages of the embodiments more apparent, specific implementations of this application are described below.

Brief Description of the Drawings

[0021] Figure 1 is a flowchart of a fatigue driving detection method provided in an embodiment of this application.

[0022] Figure 2 is a flowchart of a method for determining a first threshold and an illumination compensation factor provided in an embodiment of this application.

[0023] Figure 3 is a flowchart of a method for obtaining a binarized foreground mask provided in an embodiment of this application.

[0024] Figure 4 is a flowchart of a method for constructing a prompt word template provided in an embodiment of this application.

[0025] Figure 5 is a schematic diagram of a fatigue driving detection device provided in an embodiment of this application.

[0026] Figure 6 is a first schematic structural diagram of an electronic device provided in an embodiment of this application.

[0027] Figure 7 is a second schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0028] In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes a relationship between associated objects and covers three cases. For example, "A and/or B" can mean A alone, both A and B, or B alone, where A and B can each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b, or c" can mean: a alone, b alone, c alone, a and b, a and c, b and c, or a, b, and c together, where each of a, b, and c can be single or multiple. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0029] The terms “center,” “longitudinal,” “lateral,” “up,” “down,” “left,” “right,” “front,” and “rear,” etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are used only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this application.

[0030] The term "connected" should be interpreted broadly. For example, in circuit structures, "connected" can refer not only to a physical connection but also to an electrical or signal connection. The connection can be direct (a physical connection) or indirect via at least one intermediate component, as long as the circuit is connected, and it can also refer to the internal connection between two components. Similarly, a signal connection can refer to a connection via a circuit or via a medium such as radio waves. Those skilled in the art will understand the specific meaning of these terms in this application based on the specific circumstances.

[0031] For example, this application provides a fatigue driving detection method, electronic device, and storage medium. By performing a mask extraction operation on image frames including the driver's face, an illumination compensation factor is introduced during the mask extraction process to compensate for changes in driver facial features under abrupt illumination changes. The threshold used for binarization processing is dynamically adjusted based on changes in illumination, ensuring that the extracted binarized foreground mask accurately represents the driver's facial features. Furthermore, multiple binarized foreground masks within a preset time period form a foreground mask sequence, which accurately represents changes in the driver's facial features within the preset time period. A prompt word template is constructed based on the foreground mask sequence, and a large language model is used to obtain fatigue driving detection results in natural language form, thereby improving the accuracy of fatigue driving detection, reducing the false alarm rate, and ensuring driving safety.

[0032] The fatigue driving detection method provided in this application can be executed by an electronic device or by a fatigue driving detection device (hereinafter referred to as the detection device) in an electronic device.

[0033] The electronic device can be a server, desktop computer, mobile phone, tablet, laptop, wearable device, in-vehicle device, augmented reality (AR) / virtual reality (VR) device, etc.

[0034] The detection device can be implemented in software, hardware, or a combination of both. For example, the detection device can be an application (APP), a webpage, or a public account. For simplicity, the embodiments of this application are described with the detection device as the executing entity.

[0035] The fatigue driving detection method provided in the embodiments of this application is described below in conjunction with Figures 1 to 4.

[0036] Please refer to Figure 1, which is a flowchart of a fatigue driving detection method according to an embodiment of this application. As shown in Figure 1, the method includes:
S101. The detection device acquires a video stream in real time, the video stream including multiple consecutive image frames, and acquires the ambient light data and vehicle bus data corresponding to each image frame.

[0037] Each image frame includes the driver's face.

[0038] The detection device can acquire video streams through an onboard camera installed inside the vehicle. The onboard camera can capture the driver's face in real time, obtaining image frames. Multiple consecutive image frames form a video stream, which the onboard camera can directly send to the detection device.

[0039] In some examples, the video stream can be a visible light video stream, a near-infrared light video stream, or a hybrid of visible and near-infrared light video streams. The vehicle-mounted camera supports both visible light and near-infrared light modes, automatically switching modes based on the ambient light intensity to acquire the corresponding type of video stream. The visible light mode is suitable for daytime conditions, while the near-infrared light mode is suitable for nighttime and low-light environments, ensuring that the image frames acquired by the vehicle-mounted camera clearly depict the driver's face under different lighting conditions.

[0040] While acquiring image frames, the detection device also needs to acquire ambient light data and vehicle bus data corresponding to each image frame.

[0041] Ambient light data includes light intensity, color temperature, etc.

[0042] The vehicle bus data includes vehicle speed, steering wheel angle, and continuous driving duration.

[0043] Based on this, the detection device acquires image frames, as well as the ambient light data and vehicle bus data corresponding to the image frames. It associates the ambient light data and vehicle bus data with the image frames to facilitate mask extraction of the image frames based on dynamic changes in illumination. At the same time, it provides rich contextual information for the large language model, which is convenient for fatigue driving recognition.
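One way to represent this association is a per-frame record that bundles the image with its sensor readings. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and pairing each frame with the sample nearest in time is only one possible reading of "corresponding to each image frame", since the source does not describe the synchronization rule.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class AmbientLight:
    lux: float             # light intensity
    color_temp_k: float    # color temperature


@dataclass(frozen=True)
class BusData:
    speed_kmh: float
    steering_angle_deg: float
    driving_minutes: float  # continuous driving duration


@dataclass(frozen=True)
class FrameRecord:
    timestamp_s: float
    image: Any
    light: AmbientLight
    bus: BusData


def nearest(samples: List[Tuple[float, Any]], t: float) -> Any:
    """Pick the value of the (timestamp, value) sample closest in time to t."""
    return min(samples, key=lambda s: abs(s[0] - t))[1]


light_samples = [(0.00, AmbientLight(300.0, 5000.0)),
                 (0.10, AmbientLight(120.0, 4800.0))]
bus_samples = [(0.02, BusData(80.0, -1.5, 214.0))]

record = FrameRecord(
    timestamp_s=0.04,
    image="frame-0",  # placeholder for pixel data
    light=nearest(light_samples, 0.04),
    bus=nearest(bus_samples, 0.04),
)
```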

[0044] S102. When the detection device acquires an image frame, it performs a mask extraction operation on the image frame.

[0045] The mask extraction operation includes the following steps S1021 to S1025:
S1021. The detection device extracts the region of interest in the image frame.

[0046] The regions of interest include the eye area, mouth area, and head area. The eye area can be used to detect eye closure, eyelid tremors, etc. The mouth area can be used to detect yawning, speaking, and singing, etc. The head area is used to detect nodding, shaking, etc. By using these three areas, it is possible to determine whether the driver is fatigued.

[0047] In some examples, the detection device uses a lightweight facial landmark detection algorithm to extract regions of interest from image frames.

[0048] Specifically, the detection device uses algorithms to locate the eye contour, mouth contour, and head contour in the image frame. A contour refers to a line connecting feature points. For example, the eye contour is the line connecting feature points along the edges of the upper and lower eyelids. The detection device determines the eye region, mouth region, and head region based on the eye contour, mouth contour, and head contour, respectively. Taking the eye contour as an example, the detection device expands outwards from the eye contour as the center to obtain a region of a preset shape; this region is the eye region. The preset shape may be, for example, an ellipse or a rectangle.
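The outward expansion from a contour to a rectangular region can be sketched as a padded bounding box. This is an illustration only: the landmark coordinates are made up, the relative margin of 0.25 is a hypothetical parameter, and the source does not say how far the region extends beyond the contour.

```python
import numpy as np


def expand_to_rect(contour_xy: np.ndarray, margin: float, frame_shape):
    """Bounding rectangle of contour points, enlarged by a relative margin."""
    x_min, y_min = contour_xy.min(axis=0)
    x_max, y_max = contour_xy.max(axis=0)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    height, width = frame_shape[:2]
    # Clamp to the frame so the region never leaves the image.
    left = max(int(np.floor(x_min - pad_x)), 0)
    top = max(int(np.floor(y_min - pad_y)), 0)
    right = min(int(np.ceil(x_max + pad_x)), width - 1)
    bottom = min(int(np.ceil(y_max + pad_y)), height - 1)
    return left, top, right, bottom


# Four illustrative eyelid landmarks as (x, y) pixel coordinates.
eye_contour = np.array([[100, 60], [120, 52], [140, 60], [120, 66]])
box = expand_to_rect(eye_contour, margin=0.25, frame_shape=(480, 640))
```

The same helper could be applied to the mouth and head contours; an elliptical region would need a mask instead of a box.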

[0049] For example, the lightweight facial landmark detection algorithm can be the MobileFaceNet algorithm.

[0050] Based on this, the detection device can identify the regions in the image frame that are related to fatigue driving detection, thereby focusing on the region of interest during mask extraction, suppressing noise introduced by irrelevant regions, and reducing the amount of computation.

[0051] S1022. The detection device determines the first threshold and illumination compensation factor corresponding to the image frame based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame.

[0052] The first threshold is used for binarization.

[0053] The illumination compensation factor is used for weighting the difference map.

[0054] The detection device can calculate the change in ambient light data based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame. Based on the change, it can determine the illumination state of the image frame (e.g., sudden illumination change or stable illumination) and dynamically adjust the first threshold and illumination compensation factor based on the illumination state of the image frame.

[0055] For example, when a sudden change in illumination is detected in an image frame, the detection device can increase the first threshold and the illumination compensation factor. This allows it to resist noise caused by the illumination change during binarization and amplify the difference value when weighting the difference map based on the illumination compensation factor, thus adapting to illumination changes. When the illumination of the image frame is stable, the detection device can use the default first threshold for binarization and use normalized weights when weighting the difference map based on the illumination compensation factor, without changing the difference value.

[0056] Therefore, by dynamically adjusting the first threshold and illumination compensation factor using the ambient light data of the image frame and the previous image frame, the mask extraction process can adapt to both sudden changes in illumination and stable illumination, thereby improving the quality of mask extraction and ensuring the complete and accurate preservation of the driver's facial features.

[0057] S1023. The detection device obtains an initial difference map based on the region of interest in the image frame and the target background frame.

[0058] The target background frame refers to the scene model of the driver's seat when there is no movement. The target background frame is sourced from a background model library. This library contains multiple different background frames, each corresponding to an image at a different location. The target background frame includes the driver's seat with the stationary driver. By comparing the region of interest with the target background frame, it can be determined whether the driver is moving, such as blinking or opening their mouth.

[0059] The target background frame is updated based on the lighting state of the previous image frame.

[0060] Considering that the background frame is affected by different lighting conditions, the detection device can update the target background frame based on the lighting state of the previous image frame. For example, during a sudden change in lighting, the target background frame is not updated; instead, the same target background frame as the previous image frame is used, thus preventing the light and shadow generated during the sudden lighting change from being misjudged as moving targets. When the lighting is stable, the detection device can update the target background frame using the current image frame, allowing the target background frame to adapt to slow changes in the scene, such as gradual changes in lighting or minor adjustments in the driver's posture.

[0061] The detection device can calculate the difference between the pixel value of the region of interest in the image frame and the pixel value of the corresponding position in the target background frame by performing pixel-level subtraction operations, thereby obtaining an initial difference map.

[0062] Based on this, the detection device obtains an initial difference map, thereby obtaining information on changes in the driver's face, which is then used for mask extraction.
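
As a minimal sketch of the pixel-level subtraction described above, the following assumes grayscale frames stored as nested lists and takes the absolute value of each difference. The names `initial_difference_map`, `roi`, and `background` are illustrative and do not appear in this application; a practical implementation would likely use an array library.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def initial_difference_map(roi: Image, background: Image) -> Image:
    """Return the per-pixel absolute difference between ROI and background."""
    if len(roi) != len(background) or any(
        len(r) != len(b) for r, b in zip(roi, background)
    ):
        raise ValueError("ROI and background frame must have the same shape")
    # Assumption: the absolute value is taken so that brightening and
    # darkening pixels both count as change.
    return [
        [abs(p - q) for p, q in zip(roi_row, bg_row)]
        for roi_row, bg_row in zip(roi, background)
    ]


roi = [[10, 200], [30, 30]]
background = [[12, 40], [30, 25]]
diff = initial_difference_map(roi, background)
```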

[0063] S1024. The detection device performs weighted processing on the initial difference map according to the illumination compensation factor to obtain the final difference map.

[0064] The differences in pixel values in the initial difference map have different meanings under different lighting conditions. For example, when the lighting changes abruptly, the change in ambient brightness causes large spurious differences in the background area, while the differences of real moving targets, such as closed eyelids and open mouths, may be compressed or distorted by the overall brightness change. When the lighting is stable, by contrast, the initial difference map truly reflects the differences between the moving foreground and the background.

[0065] Therefore, the detection device can weight the initial difference map according to the illumination compensation factor to obtain the final difference map. The illumination compensation factor is determined based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame. When there is a sudden change in illumination, the detection device can increase the illumination compensation factor to amplify the initial difference map with weights, which can restore the difference amplitude of the real moving target and make it easier to segment during binarization. When the illumination is stable, the detection device can normalize the weights, that is, set the illumination compensation factor to 1, so as not to change the initial difference map and avoid introducing unnecessary amplification noise.

[0066] Based on this, the detection device performs weighted processing on the initial difference map according to the illumination compensation factor to obtain the final difference map, which can ensure that the difference map can reflect the driver's true motion characteristics under different illumination conditions.
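
The weighting step can be illustrated with a short sketch, assuming the weighting is a plain multiplication of every difference value by the illumination compensation factor. The function name `weight_difference_map` is hypothetical, and whether the result is clipped to a pixel range is not specified in this application, so no clipping is applied here.

```python
from typing import List

Image = List[List[float]]


def weight_difference_map(initial_diff: Image, compensation_factor: float) -> Image:
    """Scale every difference value by the illumination compensation factor."""
    if compensation_factor <= 0:
        raise ValueError("compensation factor must be positive")
    return [[compensation_factor * v for v in row] for row in initial_diff]


initial_diff = [[2.0, 160.0], [0.0, 5.0]]
stable = weight_difference_map(initial_diff, 1.0)  # factor 1: map unchanged
sudden = weight_difference_map(initial_diff, 1.5)  # factor > 1: differences amplified
```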

[0067] S1025. The detection device performs binarization processing on the final difference map according to the first threshold to obtain the binarized foreground mask corresponding to the image frame.

[0068] Specifically, for each pixel in the final difference map, the detection device can compare the value of the pixel with the first threshold to determine whether the pixel is foreground or background, thereby obtaining the binarized foreground mask corresponding to the image frame.

[0069] The binarized foreground mask includes an eye region mask, a mouth region mask, and a head region mask. The eye region mask is determined based on the corresponding eye region in the final difference map; the mouth region mask is determined based on the corresponding mouth region in the final difference map; and the head region mask is determined based on the corresponding head region in the final difference map.

[0070] Based on this, the detection device can accurately extract the motion foreground of the eyes, mouth, and head, so as to statistically analyze the motion characteristics of these three regions.
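
One way to picture the three region masks is to crop the full binarized mask with one bounding box per region. This is only a sketch: the bounding boxes, the `(top, left, bottom, right)` convention, and the name `split_region_masks` are assumptions, since this application does not state how the regions are delimited.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def split_region_masks(mask: Mask, boxes: Dict[str, Box]) -> Dict[str, Mask]:
    """Crop the full foreground mask into one sub-mask per named region."""
    regions = {}
    for name, (top, left, bottom, right) in boxes.items():
        regions[name] = [row[left:right] for row in mask[top:bottom]]
    return regions


full_mask = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
# Hypothetical boxes; in practice they would come from the ROI extraction step.
boxes = {"eye": (0, 0, 1, 4), "mouth": (2, 1, 4, 3), "head": (0, 0, 4, 4)}
regions = split_region_masks(full_mask, boxes)
```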

[0071] S103. When the acquired video stream meets the preset duration, the detection device obtains a foreground mask sequence based on the binarized foreground mask corresponding to each image frame in the video stream.

[0072] The preset duration can be 10 seconds or 30 seconds, etc. The preset duration can be set according to requirements. The preset duration can also be understood as the duration for fatigue driving judgment. For example, if the preset duration is 10 seconds, the detection device can perform a fatigue driving detection every 10 seconds of video stream.

[0073] The detection device determines the binarized foreground mask corresponding to each image frame obtained within a preset time period, and stores each binarized foreground mask in chronological order to obtain a foreground mask sequence. The foreground mask sequence can provide the continuous motion characteristics of the driver within the preset time period, so as to facilitate the detection of fatigue driving based on the continuous motion characteristics.
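
The accumulation of masks over the preset duration could be sketched as a small buffer that releases a chronologically ordered sequence once enough video has been collected. The class `MaskSequenceBuffer` and its timestamp-based trigger are illustrative assumptions rather than the structure used in this application.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


class MaskSequenceBuffer:
    """Collect per-frame masks until the preset duration has elapsed."""

    def __init__(self, preset_duration_s: float) -> None:
        if preset_duration_s <= 0:
            raise ValueError("preset duration must be positive")
        self.preset_duration_s = preset_duration_s
        self._items: List[Tuple[float, Mask]] = []

    def push(self, timestamp_s: float, mask: Mask) -> Optional[List[Mask]]:
        """Store one mask; return the full sequence when the window is complete."""
        self._items.append((timestamp_s, mask))
        window_start = self._items[0][0]
        if timestamp_s - window_start >= self.preset_duration_s:
            sequence = [m for _, m in self._items]  # already in arrival order
            self._items = []  # start a new detection window
            return sequence
        return None


buffer = MaskSequenceBuffer(preset_duration_s=1.0)
frame_mask = [[0, 1], [1, 0]]
first = buffer.push(0.0, frame_mask)
second = buffer.push(0.5, frame_mask)
third = buffer.push(1.0, frame_mask)
```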

[0074] S104. The detection device constructs a prompt word template based on the foreground mask sequence, ambient light data and vehicle bus data corresponding to each image frame acquired within a preset time period.

[0075] Considering that large language models cannot understand foreground mask sequences in numerical form, the detection device can statistically analyze motion features within a preset time period based on the foreground mask sequence. These features include blink count, longest single eye closure duration, mouth opening and closing frequency, yawning frequency, and nodding frequency. The detection device then fills a predefined text template with these motion statistical features, ambient light data for each image frame acquired within the preset time period, and vehicle bus data, thereby generating a prompt word template. This prompt word template is then used as input to the preset large language model, enabling it to understand the data and providing a basis for reasoning.

[0076] S105. The detection device inputs the prompt word template into the preset large language model to obtain the fatigue driving detection results within the preset time period.

[0077] The preset large language model is, for example, Qwen-1.8B-Int4 or Llama3-8B.

[0078] The results of fatigue driving detection can include fatigue confidence level, fatigue type, and natural language interpretation.

[0079] The fatigue confidence level refers to the probability, as predicted by the preset large language model, that the driver is fatigued. Fatigue types include, for example, microsleep, severe fatigue, and mild drowsiness.

[0080] For example, the fatigue driving detection result could be: Fatigue confidence level: 92%; Fatigue type: Severe fatigue; Natural language interpretation: The driver was detected to have prolonged eye closure (1.2 seconds) and low-frequency yawning late at night after driving continuously for 4.2 hours, which is consistent with the typical characteristics of physiological fatigue, and the vehicle speed was relatively high, posing an extremely high risk.
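
If the model returns its answer in the textual layout of the example above, the three fields could be recovered with a small parser. This is a sketch under that assumption only: this application does not fix an output format, and `FatigueResult` and `parse_result` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class FatigueResult:
    confidence: float      # probability in [0, 1]
    fatigue_type: str
    interpretation: str


# Assumed layout: "Fatigue confidence level: NN%; Fatigue type: ...; Natural language interpretation: ..."
_PATTERN = re.compile(
    r"Fatigue confidence level:\s*(?P<conf>\d+(?:\.\d+)?)%;\s*"
    r"Fatigue type:\s*(?P<type>[^;]+);\s*"
    r"Natural language interpretation:\s*(?P<text>.+)",
    re.DOTALL,
)


def parse_result(text: str) -> FatigueResult:
    """Parse a detection result; raise ValueError if the layout does not match."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("unrecognized detection result format")
    return FatigueResult(
        confidence=float(match["conf"]) / 100.0,
        fatigue_type=match["type"].strip(),
        interpretation=match["text"].strip(),
    )


sample = (
    "Fatigue confidence level: 92%; Fatigue type: Severe fatigue; "
    "Natural language interpretation: Prolonged eye closure (1.2 seconds) was detected."
)
result = parse_result(sample)
```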

[0081] Based on this, the detection device can utilize the common sense reasoning and semantic understanding capabilities of large language models to perform contextual consistency verification and behavioral intent disambiguation on eye, mouth, and head movement features, thereby improving the accuracy of fatigue driving detection and reducing the false alarm rate.

[0082] In this embodiment, a video stream comprising multiple consecutive image frames is acquired in real time, and the ambient light data and vehicle bus data corresponding to each image frame are acquired, thereby establishing a correspondence between the image frames and the ambient light data and vehicle bus data and providing a foundation for illumination adaptation and semantic reasoning. Each time an image frame is acquired, a mask extraction operation is performed on it. The mask extraction operation includes extracting the region of interest in the image frame, which focuses detection on key areas such as the eyes, mouth, and head and reduces interference from irrelevant backgrounds. Based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame, a first threshold and an illumination compensation factor are determined for the image frame, so that the parameters adapt dynamically to different illumination levels. An initial difference map is obtained based on the region of interest in the image frame and the target background frame. This initial difference map is then weighted according to the illumination compensation factor to obtain a final difference map, ensuring that the final difference map reflects the driver's true motion characteristics under different illumination conditions. Based on the first threshold, the final difference map is binarized to obtain the binarized foreground mask corresponding to the image frame, thereby accurately extracting the motion regions of the eyes, mouth, and head. When the acquired video stream meets the preset duration, a foreground mask sequence is obtained based on the binarized foreground mask corresponding to each image frame in the video stream. Based on the foreground mask sequence and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period, a prompt word template is constructed, which transforms the underlying visual features into natural language descriptions that the large language model can understand and reason over. The prompt word template is input into the preset large language model, thereby utilizing the common sense reasoning and semantic understanding capabilities of the large language model to perform contextual consistency verification and behavioral intent disambiguation on eye, mouth, and head movement features and to obtain fatigue driving detection results in natural language form within the preset time period, which improves the accuracy of fatigue driving detection, reduces the false alarm rate, and helps ensure driving safety.

[0083] Based on the above exemplary description, the ambient light data includes light intensity. The detection device can determine the first threshold and illumination compensation factor corresponding to the image frame using the method shown in Figure 2.

[0084] Referring to Figure 2, which is a flowchart of a method for determining a first threshold and an illumination compensation factor provided as an embodiment of this application, the method includes: S201. The detection device determines the amount of illumination change based on the light intensity corresponding to the image frame and the light intensity corresponding to the previous image frame.

[0085] The amount of illumination change is the difference between the illumination intensity of the image frame and the illumination intensity of the previous image frame. It reflects the degree of fluctuation in ambient light between adjacent image frames.

[0086] S202. The detection device determines the illumination state of the image frame based on the amount of illumination change and the second threshold.

[0087] The illumination state is either a sudden illumination change state or a stable illumination state.

[0088] Specifically, the detection device can determine whether the change in illumination is greater than a second threshold. If the change in illumination is greater than the second threshold, the detection device determines that the illumination state of the image frame is a sudden illumination change; if the change in illumination is less than or equal to the second threshold, the detection device determines that the illumination state of the image frame is a stable illumination state.

[0089] In some examples, the second threshold is determined based on the illumination intensity of the previous image frame, and the second threshold satisfies the following Formula 1: $T_2 = k \cdot I_{t-1}$, where $T_2$ is the second threshold, $k$ is the first weight, and $I_{t-1}$ is the illumination intensity of the image frame preceding the image frame at time t.

[0090] Here, k is, for example, 0.4.

[0091] Based on this, the detection device can determine the illumination state of an image frame through a dynamic threshold that adapts to illumination, thus adaptively linking the illumination state judgment to the current ambient light level. For example, in low-light environments, the second threshold is lowered accordingly, making it more sensitive to subtle changes in light; in high-light environments, the second threshold is raised accordingly, avoiding frequent triggering of abrupt change modes due to small fluctuations in natural light. Compared to a fixed threshold, the dynamic second threshold can balance the response sensitivity under different illumination conditions, reducing false positives and false negatives.
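
Steps S201 and S202 can be sketched as follows, assuming the second threshold is the product of the first weight k and the illumination intensity of the previous frame, and assuming the change amount is compared as an absolute value. The function name and the state labels `"sudden_change"` and `"stable"` are illustrative.

```python
def illumination_state(intensity_now: float, intensity_prev: float, k: float = 0.4) -> str:
    """Classify a frame as 'sudden_change' or 'stable' using a dynamic threshold."""
    # S201: amount of illumination change between adjacent frames
    # (assumption: magnitude only, so brightening and darkening are treated alike).
    change = abs(intensity_now - intensity_prev)
    # Assumed form of the second threshold: it scales with the previous intensity,
    # so it is lower in dim scenes and higher in bright scenes.
    second_threshold = k * intensity_prev
    # S202: strictly greater than the threshold means a sudden change.
    return "sudden_change" if change > second_threshold else "stable"


tunnel_entry = illumination_state(intensity_now=20.0, intensity_prev=90.0)
daylight_flicker = illumination_state(intensity_now=100.0, intensity_prev=90.0)
dim_cabin_change = illumination_state(intensity_now=20.0, intensity_prev=10.0)
```

With these inputs, a change of 10 lux is treated as stable at a previous intensity of 90 lux but as a sudden change at a previous intensity of 10 lux, which mirrors the sensitivity trade-off described above.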

[0092] S203. The detection device determines the first threshold and illumination compensation factor corresponding to the image frame based on the illumination state of the image frame.

[0093] When the illumination state of the image frame is a sudden illumination change state, the detection device can determine that the first threshold satisfies the following Formula 2: $T_1 = a \cdot T_0$, where $T_1$ is the first threshold, $a$ is the second weight with $a > 1$, and $T_0$ is the reference threshold.

[0094] Based on this, by increasing the first threshold, the sensitivity to pixel differences can be reduced when performing binarization based on the first threshold, and the false noise caused by sudden changes in illumination can be filtered out, which helps to improve the quality of mask extraction.

[0095] When the illumination state of the image frame is a sudden illumination change state, the detection device can determine that the illumination compensation factor satisfies Formula 3, whose terms are: the illumination compensation factor for the image frame at time t; the illumination compensation factor of the image frame preceding the image frame at time t; e, the base of the natural logarithm; α, the third weight; and the difference between the illumination intensity of the image frame at time t and the illumination intensity of the image frame preceding the image frame at time t.

[0096] As can be seen from Formula 3, the more drastic the change in illumination, the faster the illumination compensation factor increases. When the initial difference map is weighted based on the illumination compensation factor, the motion features in the initial difference map can be enhanced under the condition of sudden change in illumination. This makes it easier to segment the mask from the difference map during binarization, thereby further helping to improve the quality of mask extraction.

[0097] When the illumination state of the image frame is stable, the detection device can determine that the first threshold is equal to the reference threshold. Thus, when there is no significant change in ambient illumination, binarization can be achieved using the reference threshold.

[0098] When the illumination state of the image frame is stable, the detection device can determine that the illumination compensation factor is equal to 1. That is, no additional weighting is applied to the initial difference map, maintaining the sensitivity of conventional foreground extraction.

[0099] Based on this, the detection device can adaptively adjust the first threshold and the illumination compensation factor according to real-time illumination changes, thereby suppressing false foreground noise in scenarios with drastic illumination changes such as entering and exiting tunnels and oncoming headlights at night, while preserving the real micro-motion features of key areas such as eyes, mouth and head, so as to facilitate high-quality foreground mask extraction.
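
Step S203 might be sketched as below. The first threshold is taken as the second weight times the reference threshold during a sudden change. The exponential update of the compensation factor is purely an assumed form chosen to be consistent with the listed terms (previous factor, e, third weight α, intensity difference) and with the statement that a more drastic change makes the factor grow faster; it should not be read as Formula 3 itself. All names and default values are hypothetical.

```python
import math
from typing import Tuple


def adapt_parameters(
    state: str,
    reference_threshold: float,
    factor_prev: float,
    intensity_diff: float,
    a: float = 1.5,       # second weight, must exceed 1
    alpha: float = 0.01,  # third weight (illustrative value)
) -> Tuple[float, float]:
    """Return (first_threshold, compensation_factor) for the current frame."""
    if a <= 1:
        raise ValueError("the second weight a must be greater than 1")
    if state == "sudden_change":
        first_threshold = a * reference_threshold
        # Assumed update rule: grows faster for larger intensity differences.
        factor = factor_prev * math.exp(alpha * abs(intensity_diff))
    else:
        # Stable illumination: reference threshold and a neutral factor of 1.
        first_threshold = reference_threshold
        factor = 1.0
    return first_threshold, factor


stable_params = adapt_parameters("stable", 25.0, 1.3, 5.0)
sudden_params = adapt_parameters("sudden_change", 25.0, 1.0, -70.0, a=2.0)
```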

[0100] Based on the above exemplary description, the detection device can obtain the binarized foreground mask corresponding to the image frame using the method shown in Figure 3.

[0101] Referring to Figure 3, which is a flowchart of a method for obtaining a binarized foreground mask according to an embodiment of this application, the method includes, for each pixel in the final difference map, executing steps S301 to S303: S301. The detection device determines whether the difference value of the pixel is greater than the first threshold.

[0102] When the difference value of a pixel is greater than the first threshold, the detection device executes S302; when the difference value of a pixel is less than or equal to the first threshold, the detection device executes S303.

[0103] S302. The detection device determines that the pixel is the foreground and assigns the value of the pixel to 1.

[0104] When the difference value of a pixel is greater than the first threshold, it indicates that the pixel belongs to a moving target, such as the eyelid area of a closed eye, the mouth area of an open mouth, or the edge of a nodding head. The detection device can then identify the pixel as the foreground.

[0105] S303. The detection device determines that the pixel is the background and assigns the value of the pixel to 0.

[0106] When the difference value of a pixel is less than or equal to the first threshold, it indicates that the pixel belongs to a static background, such as unchanging facial skin or a seat, and the detection device can identify the pixel as the background.

[0107] Based on steps S301 to S303, the detection device assigns values to all pixels in the final difference map to obtain a binarized foreground mask corresponding to the image frame. In this mask, regions with a value of 1 represent detected moving foreground, and regions with a value of 0 represent background. Simultaneously, due to the adaptive adjustment of the first threshold and the illumination compensation factor, this mask effectively suppresses false noise during sudden changes in illumination while preserving the true micro-motion features of the eye, mouth, and head regions, thus improving the quality of mask extraction.
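
Steps S301 to S303 amount to a per-pixel comparison, which can be sketched as follows. The function name `binarize` and the nested-list representation are illustrative choices.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def binarize(final_diff: Image, first_threshold: float) -> Mask:
    """Assign 1 (foreground) where the difference exceeds the threshold, else 0."""
    return [
        # S301: compare; S302: foreground -> 1; S303: background -> 0.
        [1 if value > first_threshold else 0 for value in row]
        for row in final_diff
    ]


final_diff = [[2.0, 160.0], [25.0, 26.0]]
mask = binarize(final_diff, first_threshold=25.0)
```

A value exactly equal to the first threshold is treated as background, matching the "less than or equal to" branch of S303.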

[0108] Based on the above exemplary description, the detection device can also update the target background frame before each time an initial difference map is obtained using the target background frame.

[0109] Specifically, the detection device first determines the background update rate corresponding to the image frame. When the illumination state of the image frame is in a sudden illumination change state, the detection device determines the background update rate corresponding to the image frame to be 0, that is, the background frame is not updated; when the illumination state of the image frame is in a stable illumination state, the detection device determines the background update rate corresponding to the image frame to be a default decay value, indicating that the background frame is allowed to update slowly.

[0110] Furthermore, when the background update rate corresponding to the image frame is the default decay value, the detection device updates the target background frame according to the region of interest in the image frame using the following Formula 4: $B' = (1 - \beta) \cdot B + \beta \cdot R$, where $B'$ is the updated target background frame, $\beta$ is the background update rate corresponding to the image frame, $B$ is the target background frame, and $R$ is the region of interest in the image frame.

[0111] Based on this, the detection device updates the target background frame when the illumination is stable, so that the background frame can adapt to the slow changes in the environment and avoid errors in foreground mask extraction caused by an old background, thereby helping to improve the quality of mask extraction.

[0112] When the background update rate corresponding to the image frame is 0, the detection device does not update the target background frame. In this way, when there are sudden changes in illumination, background contamination caused by instantaneous changes in light and shadow can be avoided, thereby helping to improve the quality of mask extraction.
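
The background update policy can be sketched as a running average that is frozen during sudden illumination changes. The blend used here, new background = (1 − rate) × old background + rate × region of interest, is an assumed form consistent with a rate of 0 meaning "no update"; the default rate of 0.05 and the function name are hypothetical.

```python
from typing import List

Image = List[List[float]]


def update_background(
    background: Image, roi: Image, state: str, default_rate: float = 0.05
) -> Image:
    """Blend the ROI into the background when stable; keep it frozen otherwise."""
    rate = 0.0 if state == "sudden_change" else default_rate
    if rate == 0.0:
        return background  # sudden change: reuse the previous background frame
    return [
        [(1.0 - rate) * b + rate * r for b, r in zip(bg_row, roi_row)]
        for bg_row, roi_row in zip(background, roi)
    ]


background = [[100.0, 100.0]]
roi = [[200.0, 0.0]]
frozen = update_background(background, roi, "sudden_change")
blended = update_background(background, roi, "stable", default_rate=0.5)
```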

[0113] Based on the above exemplary description, the detection device can construct the prompt word template using the method shown in Figure 4.

[0114] Referring to Figure 4, which is a flowchart of a method for constructing a prompt word template according to an embodiment of this application, the method includes: S401. The detection device calculates the duration and frequency of eye closure within a preset time period based on the eye region mask of each binarized foreground mask in the foreground mask sequence.

[0115] The detection device can calculate the area of each eye region mask by using the eye region mask of each binarized foreground mask in the foreground mask sequence. When the area is less than a certain threshold, it is judged as closed eyes. By determining the duration of closed eyes and the number of times closed eyes occur per unit time, the duration of eye closure and the frequency of eye closure can be obtained.
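
A sketch of S401 under stated assumptions: the mask area is the count of foreground pixels, a frame counts as "closed" when that area is below a threshold, a closure event is a run of consecutive closed frames, and the frame rate is known. The names `eye_closure_stats`, `fps`, and `area_threshold` are illustrative.

```python
from typing import Dict, List

Mask = List[List[int]]


def mask_area(mask: Mask) -> int:
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def eye_closure_stats(eye_masks: List[Mask], fps: float, area_threshold: int) -> Dict[str, float]:
    """Count closure events and measure the longest closure in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    events = 0
    longest_run = 0
    run = 0
    for mask in eye_masks:
        if mask_area(mask) < area_threshold:  # frame judged as closed eyes
            run += 1
            if run == 1:
                events += 1  # a new closure event starts
            longest_run = max(longest_run, run)
        else:
            run = 0
    total_s = len(eye_masks) / fps
    return {
        "closure_count": events,
        "longest_closure_s": longest_run / fps,
        "closure_frequency_hz": events / total_s if total_s else 0.0,
    }


open_eye = [[1, 1], [1, 1]]
closed_eye = [[0, 0], [0, 0]]
sequence = [open_eye, closed_eye, closed_eye, open_eye, closed_eye] + [open_eye] * 5
eye_stats = eye_closure_stats(sequence, fps=10.0, area_threshold=2)
```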

[0116] S402. The detection device calculates the mouth opening amplitude and mouth opening frequency within a preset time period based on the mouth region mask of each binarized foreground mask in the foreground mask sequence.

[0117] The detection device can calculate the aspect ratio of each mouth region mask by using the mouth region mask of each binarized foreground mask in the foreground mask sequence. Based on the aspect ratio, the mouth opening amplitude can be directly obtained. Based on the change of aspect ratio over time, the mouth opening frequency can be obtained.
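
S402 could be sketched as follows, assuming the aspect ratio is the height divided by the width of the bounding box around the foreground pixels, and assuming an opening is counted each time the ratio rises across a threshold. The threshold `open_ratio` and all function names are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]


def aspect_ratio(mask: Mask) -> float:
    """Height / width of the bounding box around foreground pixels (0 if empty)."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for row in mask for j, value in enumerate(row) if value]
    if not rows:
        return 0.0
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return height / width


def mouth_stats(mouth_masks: List[Mask], fps: float, open_ratio: float = 0.5) -> Dict[str, float]:
    """Largest opening amplitude and number of openings per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    ratios = [aspect_ratio(m) for m in mouth_masks]
    # Count upward crossings of the threshold, starting from a closed mouth.
    openings = sum(
        1 for prev, cur in zip([0.0] + ratios, ratios) if prev < open_ratio <= cur
    )
    total_s = len(ratios) / fps
    return {
        "max_opening_amplitude": max(ratios, default=0.0),
        "opening_frequency_hz": openings / total_s if total_s else 0.0,
    }


closed_mouth = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]  # ratio 1/4
open_mouth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]    # ratio 3/2
mouth = mouth_stats(
    [closed_mouth, open_mouth, open_mouth, closed_mouth, open_mouth], fps=5.0
)
```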

[0118] S403. The detection device calculates the head nodding frequency and nodding amplitude within a preset time period based on the head region mask of each binarized foreground mask in the foreground mask sequence.

[0119] The detection device can calculate the head centroid displacement vector of each head region mask in the binary foreground mask sequence. The detection device tracks the geometric center of the head region mask and analyzes the displacement of the geometric center in the vertical direction. The nodding amplitude is the distance of this displacement, and the nodding frequency can be obtained by observing how the displacement changes over time.
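
For S403, a sketch that tracks the vertical centroid of the head mask is given below. It assumes image rows increase downward, measures displacement relative to the first frame, and counts a nod each time the downward displacement crosses a minimum drop. The parameter `min_drop_px` and the function names are illustrative assumptions.

```python
from typing import Dict, List, Optional

Mask = List[List[int]]


def centroid_row(mask: Mask) -> Optional[float]:
    """Vertical coordinate of the geometric center of the foreground pixels."""
    total = 0
    weighted = 0
    for i, row in enumerate(mask):
        count = sum(row)
        total += count
        weighted += i * count
    return weighted / total if total else None


def nod_stats(head_masks: List[Mask], fps: float, min_drop_px: float = 2.0) -> Dict[str, float]:
    """Nodding amplitude in pixels and nods per second from centroid motion."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    ys = [c for c in (centroid_row(m) for m in head_masks) if c is not None]
    if not ys:
        return {"nod_amplitude_px": 0.0, "nod_frequency_hz": 0.0}
    drops = [y - ys[0] for y in ys]  # positive means the head moved down
    nods = sum(
        1 for prev, cur in zip([0.0] + drops, drops) if prev < min_drop_px <= cur
    )
    total_s = len(head_masks) / fps
    return {
        "nod_amplitude_px": max(ys) - min(ys),
        "nod_frequency_hz": nods / total_s,
    }


def head_at(row: int, height: int = 6) -> Mask:
    """Toy head mask: a single foreground row at the given height."""
    return [[1, 1] if i == row else [0, 0] for i in range(height)]


head = nod_stats([head_at(r) for r in (1, 1, 4, 1, 4, 1)], fps=6.0)
```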

[0120] It should be noted that this application does not limit the execution order of S401, S402 and S403, and the detection device can execute them simultaneously or sequentially.

[0121] S404. The detection device constructs a prompt word template based on the duration of eye closure, frequency of eye closure, amplitude of mouth opening, frequency of mouth opening, frequency of head nodding, amplitude of head nodding, ambient light data and vehicle bus data corresponding to each image frame acquired within a preset time period, using a preset template.

[0122] The detection device fills the preset template with the duration of eye closure, frequency of eye closure, amplitude of mouth opening, frequency of mouth opening, frequency of head nodding, amplitude of head nodding, and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period, thereby obtaining the prompt word template.

[0123] The preset template is, for example: "Within time period [T], [N] eye closures were detected, with the longest duration being [t_max] seconds, and the closure trend being [slow / sudden]; [M] large openings and closings were detected in the mouth area, with a frequency of [f] Hz; the head's center of mass underwent a [D] pixel periodic displacement in the vertical direction. Current environment: Time, continuous driving [H] hours, vehicle speed [V] km / h, illumination [L] Lux." Based on this, the detection device can transform pixel-level foreground mask sequences into high-level semantic features and fuse them with ambient light data and vehicle bus data to generate structured natural language prompts, providing a clear and sufficient reasoning basis for the large language model and helping it to subsequently detect fatigue driving accurately.
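
Filling the preset template can be sketched with ordinary string formatting. The field names below (`period`, `closures`, `time_of_day`, and so on) are hypothetical stand-ins for the bracketed placeholders in the example template, and the sample values are invented for illustration only.

```python
PROMPT_TEMPLATE = (
    "Within time period {period}, {closures} eye closures were detected, "
    "with the longest duration being {t_max} seconds, and the closure trend "
    "being {trend}; {mouth_events} large openings and closings were detected "
    "in the mouth area, with a frequency of {mouth_hz} Hz; the head's center "
    "of mass underwent a {head_px} pixel periodic displacement in the vertical "
    "direction. Current environment: {time_of_day}, continuous driving "
    "{hours} hours, vehicle speed {speed} km/h, illumination {lux} Lux."
)


def build_prompt(features: dict) -> str:
    """Fill the template; raises KeyError if a required field is missing."""
    return PROMPT_TEMPLATE.format(**features)


prompt = build_prompt({
    "period": "10 s", "closures": 3, "t_max": 1.2, "trend": "slow",
    "mouth_events": 2, "mouth_hz": 0.2, "head_px": 14,
    "time_of_day": "late night", "hours": 4.2, "speed": 95, "lux": 12,
})
```

The resulting string is what would be passed to the preset large language model as input.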

[0124] For example, this application also provides a fatigue driving detection device.

[0125] Figure 5 is a schematic diagram of a fatigue driving detection device provided in one embodiment of this application. As shown in Figure 5, the device includes: an acquisition module 101, a mask extraction module 102, and an inference module 103.

[0126] The acquisition module 101 is used to acquire a video stream in real time, the video stream including multiple consecutive image frames, each image frame including the driver's face; and to acquire ambient light data and vehicle bus data corresponding to each image frame. The mask extraction module 102 is used to perform the following mask extraction operation on each acquired image frame: extract the region of interest (ROI) from the image frame, the ROI including the eye region, mouth region, and head region; based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame, determine the first threshold and illumination compensation factor corresponding to the image frame; obtain an initial difference map based on the region of interest in the image frame and the target background frame, the target background frame being updated based on the illumination state of the previous image frame; weight the initial difference map according to the illumination compensation factor to obtain the final difference map; and, based on the first threshold, binarize the final difference map to obtain the binarized foreground mask corresponding to the image frame, the binarized foreground mask including the eye region mask, the mouth region mask, and the head region mask. When the acquired video stream meets the preset duration, a foreground mask sequence is obtained based on the binarized foreground mask corresponding to each image frame in the video stream. The inference module 103 is used to construct a prompt word template based on the foreground mask sequence and the ambient light data and vehicle bus data corresponding to each image frame acquired within a preset time period; and to input the prompt word template into a preset large language model to obtain the fatigue driving detection results within the preset time period.

[0127] It should be noted that the fatigue driving detection device in this application embodiment can be used to execute the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, so it will not be repeated here.

[0128] In some examples, the ambient light data includes light intensity; the mask extraction module 102 is specifically used to: determine the amount of illumination change based on the light intensity corresponding to the image frame and the light intensity corresponding to the previous image frame; determine the illumination state of the image frame based on the amount of illumination change and the second threshold, the illumination state being either a sudden illumination change state or a stable illumination state; and, based on the illumination state of the image frame, determine the first threshold and illumination compensation factor corresponding to the image frame.

[0129] In some examples, the second threshold is determined based on the illumination intensity of the previous image frame, and the second threshold satisfies Formula 1. Formula 1 is as follows: $T_2 = k \cdot I_{t-1}$, where $T_2$ is the second threshold, $k$ is the first weight, and $I_{t-1}$ is the illumination intensity of the image frame preceding the image frame at time t.

[0130] In some examples, the mask extraction module 102 is specifically used to determine, when the illumination state of the image frame is a sudden illumination change state, that the first threshold satisfies Formula 2 and the illumination compensation factor satisfies Formula 3. Formula 2 is as follows: $T_1 = a \cdot T_0$, where $T_1$ is the first threshold, $a$ is the second weight with $a > 1$, and $T_0$ is the reference threshold. The terms of Formula 3 are: the illumination compensation factor for the image frame at time t; the illumination compensation factor of the image frame preceding the image frame at time t; e, the base of the natural logarithm; α, the third weight; and the difference between the illumination intensity of the image frame at time t and the illumination intensity of the image frame preceding the image frame at time t. When the illumination state of the image frame is a stable illumination state, the first threshold is determined to be equal to the reference threshold, and the illumination compensation factor is determined to be equal to 1.

[0131] In some examples, the mask extraction module 102 is specifically used to: when the difference value of a pixel in the final difference map is greater than the first threshold, determine that the pixel is foreground and set its value to 1; when the difference value of a pixel is less than or equal to the first threshold, determine that the pixel is background and set its value to 0; and assign values to all pixels in the final difference map in this way to obtain the binarized foreground mask corresponding to the image frame.

[0132] In some examples, the device also includes an update module. The update module is used to: determine that the background update rate corresponding to the image frame is 0 when the illumination state of the image frame is a sudden illumination change state; determine that the background update rate corresponding to the image frame is the default decay value when the illumination state of the image frame is a stable illumination state; when the background update rate corresponding to the image frame is the default decay value, update the target background frame according to the region of interest in the image frame using Formula 4; and, when the background update rate corresponding to the image frame is 0, leave the target background frame unchanged. Formula 4 is as follows: $B' = (1 - \beta) \cdot B + \beta \cdot R$, where $B'$ is the updated target background frame, $\beta$ is the background update rate corresponding to the image frame, $B$ is the target background frame, and $R$ is the region of interest in the image frame.

[0133] In some examples, the inference module 103 is specifically used to: calculate the duration and frequency of eye closure within a preset time period based on the eye region mask of each binarized foreground mask in the foreground mask sequence; calculate the mouth opening amplitude and mouth opening frequency within the preset time period based on the mouth region mask of each binarized foreground mask in the foreground mask sequence; calculate the nodding frequency and nodding amplitude within the preset time period based on the head region mask of each binarized foreground mask in the foreground mask sequence; and construct a prompt word template, using a preset template, based on the duration of eye closure, frequency of eye closure, amplitude of mouth opening, frequency of mouth opening, frequency of head nodding, amplitude of head nodding, and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period.

[0134] In some examples, fatigue driving detection results include fatigue confidence, fatigue type, and natural language interpretation.

[0135] Figure 6 is a first schematic structural diagram of an electronic device provided in an embodiment of this application. As shown in Figure 6, the electronic device may include a first processor 201, which, when executing a computer-executable program or instruction stored in a memory, implements the fatigue driving detection method shown in Figures 1 to 4 of the embodiments of this application.

[0136] The electronic device can be used to perform the various steps and / or processes corresponding to the electronic devices in the above method embodiments.

[0137] Figure 7 is a second schematic structural diagram of an electronic device provided in an embodiment of this application. As shown in Figure 7, the electronic device may include a second processor 301 and a memory 302. The memory 302 stores a computer program. When the second processor 301 executes the computer program, it implements the fatigue driving detection method shown in Figures 1 to 4 of the embodiments of this application.

[0138] The electronic device can be used to perform the various steps and / or processes corresponding to the electronic devices in the above method embodiments.

[0139] The electronic device of this application can be used to execute the technical solutions of the method embodiments described above. Its implementation principle and technical effects are similar. The operations implemented by each module can be further referred to the relevant descriptions of the method embodiments, which will not be repeated here. The modules here can also be replaced by components or circuits.

[0140] This application can divide electronic devices into functional modules based on the above method examples. For example, each function can be divided into its own functional module, or two or more functions can be integrated into one processing module. The integrated module can be implemented in hardware or as a software functional module. It should be noted that the module division in the embodiments of this application is illustrative and only represents one logical functional division. In actual implementation, there may be other division methods.

[0141] Another embodiment of this application provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the fatigue driving detection method shown in Figures 1 to 4 of the embodiments of this application.

[0142] This application also provides a program product including executable instructions stored in a computer-readable storage medium. At least one processor of an electronic device can read the executable instructions from the computer-readable storage medium, and the at least one processor executes the executable instructions to cause the electronic device to implement the fatigue driving detection method shown in Figures 1 to 4 of the embodiments of this application.

[0143] This application also provides a chip that is connected to a memory, or a chip that integrates a memory. When a software program stored in the memory is executed, the chip implements the fatigue driving detection method shown in Figures 1 to 4 of the embodiments of this application.

[0144] In the embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of modules or units is only a logical functional division, and in actual implementation there may be other division methods: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or modules may be electrical, mechanical, or in other forms.

[0145] Those skilled in the art will understand that although some embodiments herein include certain features included in other embodiments, combinations of features from different embodiments are intended to be within the scope of this application and form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.

[0146] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A method for detecting driver fatigue, characterized in that the method includes: acquiring a video stream in real time, the video stream comprising multiple consecutive image frames, each image frame including the driver's face; acquiring ambient light data and vehicle bus data corresponding to each image frame; upon acquiring each image frame, performing the following mask extraction operation on the image frame: extracting a region of interest from the image frame, the region of interest including an eye region, a mouth region, and a head region; determining a first threshold and an illumination compensation factor corresponding to the image frame based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame; obtaining an initial difference map based on the region of interest in the image frame and a target background frame, the target background frame being updated based on the illumination state of the previous image frame; weighting the initial difference map according to the illumination compensation factor to obtain a final difference map; and binarizing the final difference map based on the first threshold to obtain a binarized foreground mask corresponding to the image frame, the binarized foreground mask including an eye region mask, a mouth region mask, and a head region mask; when the acquired video stream reaches a preset duration, obtaining a foreground mask sequence based on the binarized foreground mask corresponding to each image frame in the video stream; constructing a prompt word template based on the foreground mask sequence and on the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period; and inputting the prompt word template into a preset large language model to obtain the fatigue driving detection results within the preset time period.

2. The method according to claim 1, characterized in that the ambient light data includes illumination intensity, and determining the first threshold and illumination compensation factor corresponding to the image frame based on the ambient light data corresponding to the image frame and the ambient light data corresponding to the previous image frame includes: determining an illumination change amount based on the illumination intensity corresponding to the image frame and the illumination intensity corresponding to the previous image frame; determining the illumination state of the image frame based on the illumination change amount and a second threshold, the illumination state being either an illumination abrupt change state or an illumination stable state; and determining the first threshold and illumination compensation factor corresponding to the image frame based on the illumination state of the image frame.

3. The method according to claim 2, characterized in that the second threshold is determined based on the illumination intensity corresponding to the previous image frame of the image frame, and the second threshold satisfies Formula 1; Formula 1 is as follows: [formula not reproduced in the source]; where the quantities appearing in Formula 1 are the second threshold, the first weight, and the illumination intensity of the image frame preceding the image frame at time t.

4. The method according to claim 2, characterized in that the step of determining the first threshold and illumination compensation factor corresponding to the image frame based on the illumination state of the image frame includes: when the illumination state of the image frame is the illumination abrupt change state, determining that the first threshold satisfies Formula 2 and that the illumination compensation factor satisfies Formula 3; Formula 2 is as follows: [formula not reproduced in the source]; where the quantities appearing in Formula 2 are the first threshold, the second weight a, where a > 1, and the reference threshold; Formula 3 is as follows: [formula not reproduced in the source]; where the quantities appearing in Formula 3 are the illumination compensation factor of the image frame at time t, the illumination compensation factor of the image frame preceding the image frame at time t, the base e of the natural logarithm, the third weight α, and the difference between the illumination intensity of the image frame at time t and the illumination intensity of the image frame preceding the image frame at time t; when the illumination state of the image frame is the illumination stable state, determining that the first threshold is equal to the reference threshold and that the illumination compensation factor is equal to 1.

5. The method according to claim 1, characterized in that the step of binarizing the final difference map according to the first threshold to obtain the binarized foreground mask corresponding to the image frame includes: for each pixel in the final difference map, when the difference value of the pixel is greater than the first threshold, determining the pixel to be foreground and setting the value of the pixel to 1; when the difference value of the pixel is less than or equal to the first threshold, determining the pixel to be background and setting the value of the pixel to 0; and assigning values in this way to all pixels of the final difference map to obtain the binarized foreground mask corresponding to the image frame.

6. The method according to claim 2, characterized in that the method further includes: when the illumination state of the image frame is the illumination abrupt change state, determining the background update rate corresponding to the image frame to be 0; when the illumination state of the image frame is the illumination stable state, determining the background update rate corresponding to the image frame to be a default decay value; when the background update rate corresponding to the image frame is the default decay value, updating the target background frame according to the region of interest in the image frame using Formula 4; Formula 4 is as follows: [formula not reproduced in the source]; where the quantities appearing in Formula 4 are the updated target background frame, the background update rate corresponding to the image frame, the target background frame, and the region of interest in the image frame; when the background update rate corresponding to the image frame is 0, the target background frame is not updated.

7. The method according to claim 1, characterized in that the step of constructing a prompt word template based on the foreground mask sequence and on the ambient light data and vehicle bus data corresponding to each image frame in the video stream includes: calculating the eye closure duration and eye closure frequency within the preset time period based on the eye region mask of each binarized foreground mask in the foreground mask sequence; calculating the mouth opening amplitude and mouth opening frequency within the preset time period based on the mouth region mask of each binarized foreground mask in the foreground mask sequence; calculating the nodding frequency and nodding amplitude within the preset time period based on the head region mask of each binarized foreground mask in the foreground mask sequence; and constructing the prompt word template using a preset template, based on the eye closure duration, eye closure frequency, mouth opening amplitude, mouth opening frequency, nodding frequency, nodding amplitude, and the ambient light data and vehicle bus data corresponding to each image frame acquired within the preset time period.

8. The method according to claim 1, characterized in that the fatigue driving detection results include a fatigue confidence level, a fatigue type, and a natural language interpretation.

9. An electronic device, characterized in that it includes a first processor; the first processor is configured to execute a computer-executable program or instructions in a memory, causing the electronic device to perform the fatigue driving detection method according to any one of claims 1-8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer-executable program or instructions, the computer-executable program or instructions being configured to perform the fatigue driving detection method according to any one of claims 1-8.
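By way of illustration only, and without limiting the claims, the per-frame mask extraction operation recited above might be sketched in Python as follows. Because Formulas 1 to 4 are not reproduced in this text, every formula in the sketch is an assumption: the second threshold is taken to be proportional to the previous illumination intensity, the raised first threshold is taken to be the second weight times the reference threshold, the illumination compensation factor in the abrupt change state is supplied by the caller as a plain parameter, and the background update is taken to be a standard running average. The function names and default values are hypothetical.

```python
import numpy as np


def illumination_state(lux_now, lux_prev, first_weight=0.2):
    """Classify the frame as 'abrupt' or 'stable'.

    Assumed form of the second threshold: first_weight * lux_prev.
    """
    second_threshold = first_weight * lux_prev
    return "abrupt" if abs(lux_now - lux_prev) > second_threshold else "stable"


def extract_mask(roi, background, state, reference_threshold=25.0,
                 second_weight=1.5, compensation=0.8, decay=0.05):
    """Return (binarized foreground mask, background for the next frame)."""
    roi = roi.astype(np.float32)
    diff = np.abs(roi - background)          # initial difference map
    if state == "abrupt":
        # Assumed: threshold raised by a weight greater than 1.
        threshold = second_weight * reference_threshold
        factor = compensation                # placeholder for the unreproduced Formula 3
        rate = 0.0                           # background is frozen
    else:
        threshold = reference_threshold
        factor = 1.0
        rate = decay                         # default decay value
    final_diff = factor * diff               # weighted (final) difference map
    mask = (final_diff > threshold).astype(np.uint8)  # 1 = foreground, 0 = background
    if rate > 0.0:
        # Assumed running-average update of the target background frame.
        background = (1.0 - rate) * background + rate * roi
    return mask, background
```

The background returned for one frame is the one used when the next frame is processed, which matches the statement that the target background frame is updated based on the illumination state of the previous image frame.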