system
A system that monitors driver eye movements and blinking frequency with AI analysis and provides immediate warnings to prevent drowsiness and inattention during long drives, enhancing safety and driving habits.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Current driving assistance technologies fail to adequately address drowsiness and inattention during long-distance driving, particularly in small and medium-sized enterprises, due to insufficient accuracy in detecting abnormal driver conditions and the economic burden of implementing safety measures.
A system that monitors driver blinking frequency and eye movements in real-time using AI analysis, provides immediate warnings through audible, visual, or physical feedback, and records driving patterns to suggest breaks, utilizing ambient sound data for enhanced accuracy.
The system effectively prevents accidents by quickly detecting abnormal driving conditions and providing tailored feedback, improving safety and driving habits through continuous monitoring and data analysis.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot performed by at least one processor, the method including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In the case of a driver performing a long-distance drive, it is common to continue driving for a long time. As a result, drowsiness and inattention are likely to occur during driving, and this frequently causes accidents. However, the currently available driving assistance technologies cannot sufficiently solve these problems, and there is a problem that it is difficult to introduce safety measures considering the economic burden, especially in small and medium-sized enterprises.
Means for Solving the Problems
[0005] This invention provides an image acquisition means that monitors the driver's blinking frequency and eye movements in real time, and analyzes the data obtained using AI to quickly determine abnormal conditions of the driver. Furthermore, when an abnormal condition is detected, a warning means is provided to promptly notify the driver, thereby realizing a system that supports safe driving. In addition, by collecting and analyzing ambient sound data to improve the accuracy of the judgment, and by transmitting the obtained data to a server to record driving patterns, the system efficiently manages the driver's safety. This makes it possible to suggest when the driver should take a break, and thus prevent accidents from occurring.
[0006] "Image acquisition means" refers to a device that includes a camera or sensor for detecting the driver's face and recording its movements and state.
[0007] "AI analysis means" refers to a device or system that uses artificial intelligence to analyze the driver's blinking frequency and eye movements based on the acquired data and to determine abnormal conditions.
[0008] "Warning means" refers to a device that provides audible, visual, or physical feedback to promptly inform the driver that an abnormal condition has been detected.
[0009] A "data transmission means" is a communication device that transmits collected data to an external server, enabling long-term recording and analysis of driving patterns.
[0010] "Voice analysis means" refers to a device or system that acquires ambient sounds during driving and provides that data to the AI analysis means to assist in predicting abnormal conditions.
Brief Description of the Drawings
[0011]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.
Modes for Carrying Out the Invention
[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.
[0013] First, the terms used in the following description will be explained.
[0014] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).
[0015] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as a work memory.
[0016] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0017] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0018] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0019] [First Embodiment]
[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0032] This invention relates to a system that ensures driver safety by installing a mobile device such as a smartphone in the driver's seat of a vehicle and continuously monitoring the driver's face using its front camera. This system has the function of monitoring the driver's blinking frequency and eye movements in real time and issuing a warning when an abnormality is detected. It also simultaneously records ambient sounds to improve the accuracy of abnormality detection.
[0033] First, the terminal uses facial recognition to detect the driver's face and acquires its position and movement information. This makes it possible to detect early signs that the driver is becoming drowsy or inattentive. The terminal then inputs the acquired data into the AI analysis means, which analyzes the driver's blinking frequency and eye movement patterns and compares them with those under normal conditions.
[0034] When the AI analysis detects an abnormality, the terminal uses the warning means to alert the driver. The alert can be delivered by voice, on-screen display, or vibration to draw the driver's attention. For example, if the driver's blinking frequency increases significantly while driving, the driver might receive a message such as, "Please be careful. We recommend taking a break."
[0035] Furthermore, the data collected by the terminal is periodically sent to the server. The server analyzes this data, recording and accumulating each driver's driving pattern. This allows for a long-term understanding of drivers' driving habits and enables the proposal of specific guidance to promote safe driving.
[0036] This system can prevent dangerous behaviors such as drowsy driving, significantly improving safety for both the driver and the user. Furthermore, it provides appropriate feedback tailored to each user's individual circumstances, contributing to improved driving habits.
[0037] The following describes the processing flow.
[0038] Step 1:
[0039] The terminal activates a camera installed at the driver's seat to detect the driver's face. Using a face detection algorithm, it recognizes the position of the driver's face and collects the basic data needed to track blinking frequency and eye movements in real time.
[0040] Step 2:
[0041] The terminal inputs the acquired facial movement data into the AI analysis means, which analyzes blinking frequency and eye movements. The AI compares this data with normal driving patterns to determine whether there are any abnormalities.
[0042] Step 3:
[0043] If an abnormality is detected through the AI analysis, the terminal immediately alerts the driver using the warning means. For example, it may provide a voice message such as, "Signs of drowsiness have been detected. Please take a break," an alert on the screen display, or vibration feedback.
[0044] Step 4:
[0045] The terminal periodically sends data to the server regarding abnormal occurrences and normal operation data. This data includes operating time, blinking frequency, and eye movement characteristics.
[0046] Step 5:
[0047] The server analyzes the received data and stores the driver's driving history and patterns. This stored data is used to provide personalized safe driving advice and improvement plans for each driver. The server provides this information to the user and notifies them of points to consider during their next drive.
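Steps 1 to 3 above can be illustrated with a minimal sketch. The specification does not state how blinks are counted or how "normal" is defined, so everything here is an assumption made for illustration: the per-frame eye-openness value on a 0.0 to 1.0 scale, the closed-eye threshold, the baseline of 15 blinks per minute, and the 50% tolerance. The class name BlinkMonitor is hypothetical, and the check flags deviation in either direction even though the example in the text mentions only an increase.

```python
from dataclasses import dataclass


@dataclass
class BlinkMonitor:
    """Counts blinks from per-frame eye-openness values (assumed 0.0-1.0 scale)."""

    closed_threshold: float = 0.2   # assumed: below this the eye counts as closed
    baseline_per_min: float = 15.0  # assumed normal blink rate for this driver
    tolerance: float = 0.5          # assumed: flag deviations beyond 50% of baseline

    def count_blinks(self, openness: list[float]) -> int:
        """Count open-to-closed transitions; each one is treated as a blink."""
        blinks = 0
        was_closed = False
        for value in openness:
            is_closed = value < self.closed_threshold
            if is_closed and not was_closed:
                blinks += 1
            was_closed = is_closed
        return blinks

    def blinks_per_minute(self, openness: list[float], fps: float) -> float:
        """Convert a blink count over a frame sequence into a per-minute rate."""
        duration_min = len(openness) / fps / 60.0
        if duration_min <= 0:
            return 0.0
        return self.count_blinks(openness) / duration_min

    def is_abnormal(self, rate_per_min: float) -> bool:
        """Compare the measured rate with the driver's baseline."""
        allowed = self.tolerance * self.baseline_per_min
        return abs(rate_per_min - self.baseline_per_min) > allowed
```

In a real terminal the openness values would come from the face detection step, and the baseline would presumably be learned per driver from the history stored on the server rather than fixed.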
[0048] (Example 1)
[0049] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0050] The present invention aims to provide a new system for promoting safe driving by preventing dangerous behaviors such as decreased driver attention and drowsy driving. Conventional systems have the problem of insufficient accuracy in detecting abnormal driver conditions, making it difficult to provide appropriate warnings and feedback. Therefore, it is necessary to quickly and accurately detect abnormalities during driving and provide feedback tailored to individual drivers.
[0051] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0052] In this invention, the server includes at least one image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition by comparing the acquired data with normal driving data; a warning means for notifying the driver of a warning via voice or display when an abnormality is detected by the AI analysis means; a data transmission means for transmitting data to a data storage device after notification by the warning means to record and analyze long-term driving patterns; and a means for creating prompt sentences using a generative AI model to customize the feedback provided to the driver. This enables rapid detection of abnormal driving conditions and the provision of appropriate warnings and feedback to the driver.
[0053] "Image acquisition means" refers to a combination of hardware and software used to detect the driver's face and analyze the frequency of blinking and eye movements.
[0054] "AI analysis means" refers to a system that uses artificial intelligence technology to automatically determine abnormal conditions in the driver based on acquired data compared with normal driving data.
[0055] "Warning means" refers to a device or software function that notifies the driver of an abnormality detected by AI analysis means, either through voice or display.
[0056] "Data transmission means" refers to communication technologies and protocols for transmitting data to a data storage device after notification by a warning means.
[0057] A "generative AI model" refers to an artificial intelligence model that has the ability to create prompts to customize the feedback given to the driver.
[0058] This invention is a system that utilizes a portable information terminal installed in the driver's seat to prevent dangerous behaviors such as decreased driver attention and drowsy driving. This system can detect the driver's face and analyze blinking frequency and eye movement in real time. Specifically, a smartphone or tablet is used as the hardware, and a general facial recognition API, which is the latest facial recognition technology, is used as the software.
[0059] The terminal uses facial recognition to instantly identify the driver's face from video data and continuously monitors its position and state. This facial recognition is implemented using machine learning libraries and platforms such as TensorFlow (registered trademark) and PyTorch. This allows drowsiness or distraction to be detected immediately, enabling the system to respond to changes in the driver's condition.
[0060] Furthermore, in addition to visual data during driving, ambient sounds are also collected, and voice analysis technology is used to improve the accuracy of detecting abnormal conditions. A voice recognition library is used to process this voice data.
[0061] The server receives data transmitted from terminals, securely stores it in a cloud environment, and analyzes driving patterns over a long period. By utilizing analysis tools provided by cloud service providers, it is possible to create appropriate feedback based on the driving data of individual drivers.
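As one illustration of how accumulated driving data could feed a break suggestion, the sketch below derives a recommended break time from past sessions. The specification does not describe the analysis, so the rule used here (average time to the first warning, minus a safety margin) is an assumption, as are the function name suggest_break_after, the 10-minute margin, and the 120-minute default.

```python
from statistics import mean


def suggest_break_after(
    minutes_to_first_warning: list[int],
    margin_minutes: int = 10,
    default_minutes: int = 120,
) -> int:
    """Suggest how many minutes into a drive the driver should take a break.

    minutes_to_first_warning holds, for each past session, the driving time
    that elapsed before the first warning was issued (assumed to be recorded
    on the server). With no history, an assumed default is returned.
    """
    if not minutes_to_first_warning:
        return default_minutes
    typical = round(mean(minutes_to_first_warning))
    # Recommend the break slightly before warnings have typically occurred,
    # but never earlier than the margin itself.
    return max(margin_minutes, typical - margin_minutes)
```

For a driver whose first warnings came after 90, 110, and 100 minutes, this sketch suggests a break at 90 minutes.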
[0062] When the AI analysis system detects an anomaly, the terminal immediately issues a warning to alert the driver. The warning is conveyed to the driver as an audio message, display, or vibration. For example, specific advice such as, "Your blinking frequency is higher than normal. We recommend you take a break," is provided.
[0063] Furthermore, by using a generative AI model, customized prompt messages are generated based on data for each driver, providing even more specific improvement suggestions. These prompt messages are created in the format of, "How should we prompt the driver if their attention is waning?" By using such prompts, users can understand their own driving habits and strive for safer driving.
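The prompt creation described above might look like the following sketch. Only the question "How should we prompt the driver if their attention is waning?" comes from the text. The function name build_feedback_prompt, the choice of input fields, and the surrounding template wording are hypothetical. The call to the generative AI model itself is omitted, because the specification does not name a model or API.

```python
def build_feedback_prompt(
    blink_rate: float, baseline_rate: float, minutes_driven: int
) -> str:
    """Assemble a prompt asking a generative AI model for driver-specific advice."""
    change_pct = (blink_rate - baseline_rate) / baseline_rate * 100.0
    direction = "higher" if change_pct >= 0 else "lower"
    return (
        "You are a driving-safety assistant.\n"
        f"The driver has been driving for {minutes_driven} minutes.\n"
        f"Blink frequency is {blink_rate:.1f} per minute, "
        f"{abs(change_pct):.0f}% {direction} than the driver's normal "
        f"{baseline_rate:.1f}.\n"
        "How should we prompt the driver if their attention is waning? "
        "Answer with one short, specific suggestion."
    )
```

Embedding the measured values in the prompt is what allows the generated feedback to differ from driver to driver.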
[0064] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0065] Step 1:
[0066] The device activates the front camera and detects the driver's face. The input is real-time video data, and the output is the location and feature information of the detected face. This prepares the face recognition algorithm to monitor the driver's blinking frequency and eye movements.
[0067] Step 2:
[0068] The terminal continuously monitors the driver's blinking frequency and eye movements based on detected facial information. The input is facial feature information, and the output is blinking frequency data and eye movement patterns. Using AI analysis, this data is analyzed in real time and processed by comparing it to normal patterns.
[0069] Step 3:
[0070] Based on the analysis results, the device notifies the driver of any abnormalities in blinking frequency or gaze patterns. The input is the analyzed driver status data, and the output is a warning message. Specific actions such as voice alerts, display notifications, and vibrations are selected to inform the driver of the abnormal condition.
[0071] Step 4:
[0072] After issuing an alert, the terminal sends the collected data to the server. The input is the full set of driving data, and the output is a data record in cloud storage. The data is transferred securely using the SSL/TLS protocol and prepared for analysis.
[0073] Step 5:
[0074] The server analyzes the received data and examines each driver's driving pattern. The input is driving data stored in cloud storage, and the output is feedback information for the driver. By using a generative AI model, prompt messages are created and the feedback provided to the driver is customized.
[0075] Step 6:
[0076] Based on the feedback provided, users implement improvements that contribute to safer driving. The input is feedback documents provided by the server, and the output is improved driving habits. Specifically, they follow suggestions such as "take sufficient rest on the next drive."
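The data record sent in Step 4 can be sketched as a small serializable structure. The field names below are hypothetical. The specification states only that the data includes driving time, blinking frequency, and eye movement characteristics, so gaze_off_road_ratio stands in for the latter as an assumption. The sketch stops at producing the bytes to send; the network transfer is not shown, although sending them over HTTPS would be one way to satisfy the SSL/TLS requirement.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DrivingRecord:
    """One record the terminal sends to the server (hypothetical schema)."""

    driver_id: str
    driving_minutes: int
    blinks_per_minute: float
    gaze_off_road_ratio: float  # assumed stand-in for eye movement characteristics
    warning_issued: bool


def to_payload(record: DrivingRecord) -> bytes:
    """Serialize a record to UTF-8 JSON, ready to be sent over a TLS connection."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")
```

Sorting the keys keeps the payload byte-for-byte stable for identical records, which makes server-side deduplication and testing simpler.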
[0077] (Application Example 1)
[0078] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0079] Even when autonomous vehicles become standard, ensuring safety remains a challenge when a human driver switches to manual operation in an unforeseen situation. In particular, it is essential to quickly detect and respond to dangers caused by driver fatigue or distraction.
[0080] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0081] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition based on the acquired data; and a warning means for notifying a warning when an abnormality is detected by the AI analysis means. This enables monitoring and real-time notification to allow the driver to safely perform manual operations in an autonomous vehicle.
[0082] "Image acquisition means" refers to a device or method for detecting the driver's face and analyzing the frequency of blinking and eye movements.
[0083] "AI analysis means" refers to a process that uses artificial intelligence to determine the driver's abnormal condition based on acquired data.
[0084] "Warning means" refers to a function that alerts the driver visually or audibly when an abnormality is detected by the AI analysis means.
[0085] "Data transmission means" refers to a method for recording driving patterns by sending data to a server after notification by a warning means.
[0086] "Real-time notification means" refers to a function that instantly displays information on the control panel of an autonomous vehicle.
[0087] "Voice analysis means" refers to a method of acquiring ambient sounds during driving and improving the accuracy of judging abnormal conditions based on that data.
[0088] This invention provides a system for enabling a driver to safely perform manual operations in an autonomous vehicle. To achieve this, the server uses image acquisition means to detect the driver's face. The image acquisition means uses a terminal equipped with a camera, which is fixed in an appropriate position within the vehicle. The terminal continuously captures the driver's face and analyzes the frequency of blinking and eye movements.
[0089] The analyzed image data is processed in real time by the AI analysis means. This AI analysis uses machine learning libraries such as TensorFlow and Keras. If an abnormal condition is detected as a result of the analysis, the warning means is activated and provides visual or audible alerts to the user (driver).
[0090] Furthermore, after a warning is issued, the terminal uses the data transmission means to send the acquired data to the server. This data records the driver's driving patterns and is used for future improvements. The server is equipped with the voice analysis means, which acquires ambient sounds during driving and improves the accuracy of the AI analysis.
[0091] As a specific example, if the system detects an increase in blinking frequency during long-distance driving, indicating driver fatigue, it can immediately display a visual and audible message such as "Caution: Driver fatigue is detected. A break is recommended," encouraging safe driving.
[0092] An example of a prompt message for a generative AI model might be: "I want to develop an app that detects and warns of blinking and gaze abnormalities for a real-time monitoring AI assistant for drivers. What kind of data and model should I use?"
[0093] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0094] Step 1:
[0095] The terminal detects the driver's face in real time. The input is video data from a camera, and a face detection algorithm is used to identify the face region. The output is the position information of the face. In this step, common computer vision techniques are used for face detection.
[0096] Step 2:
[0097] The device analyzes blink frequency and eye movement from the acquired facial region. The input is facial position information, and the output is blink frequency data and eye movement patterns. This data is analyzed by a deep learning model and compared to normal baseline data.
[0098] Step 3:
[0099] The terminal analyzes ambient sounds using acquired audio data. The input is audio data collected from a microphone, and the output is analysis data that detects noise and abnormal sounds. Audio analysis helps to understand changes in the driving environment and improve the accuracy of anomaly detection.
[0100] Step 4:
[0101] The AI analysis means uses the analysis data from Steps 2 and 3 to determine whether the driver is in an abnormal condition. The inputs are the blink and gaze data and the audio analysis data, and the output is whether or not an abnormality is present. A machine learning model is used for this determination, and abnormalities exceeding a set threshold are detected.
[0102] Step 5:
[0103] When an anomaly is detected in the user (driver), the terminal immediately issues a warning. The input is an anomaly signal from the AI analysis means, and the output is a visual display or audio alert. In this step, specific actions are taken to draw the driver's attention.
[0104] Step 6:
[0105] The server receives data sent from the terminal after a warning and records long-term driving patterns. Inputs are driving patterns and anomaly detection data, and output is a database of driving history. This data will be useful for improving driving habits in the future.
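The threshold decision in Step 4, which combines the blink and gaze analysis with the ambient sound analysis, can be sketched as a weighted score. The specification says only that a machine learning model is used and that abnormalities exceeding a set threshold are detected. The linear weighting below is a simple stand-in for that model, and the score scale (0.0 to 1.0), the audio weight of 0.3, and the threshold of 0.6 are all assumptions.

```python
def fuse_scores(
    blink_gaze_score: float, audio_score: float, audio_weight: float = 0.3
) -> float:
    """Blend two abnormality scores (each assumed to lie in 0.0-1.0)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0.0 and 1.0")
    return (1.0 - audio_weight) * blink_gaze_score + audio_weight * audio_score


def detect_abnormality(
    blink_gaze_score: float, audio_score: float, threshold: float = 0.6
) -> bool:
    """Report an abnormality when the fused score exceeds the threshold."""
    return fuse_scores(blink_gaze_score, audio_score) > threshold
```

With these assumed values, a blink and gaze score of 0.55 does not exceed the threshold on its own, but combined with an audio score of 0.9 the fused score does, which is the sense in which ambient sound can improve detection.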
[0106] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0107] This invention is a system that combines multiple sensors and AI technology using a portable terminal installed in the driver's seat to improve driver safety. This system comprehensively monitors the driver's condition by combining driver face detection and emotion recognition, and takes appropriate action when an abnormality is detected.
[0108] First, the device uses facial detection technology to capture the driver's face with a camera and collect facial expression data in real time. Next, an emotion engine analyzes this data to recognize the driver's emotional state. In this process, emotions such as joy, anger, surprise, and sadness are identified. The emotion engine can also determine the driver's level of stress and tension.
[0109] In addition, the device analyzes blink frequency and eye movements based on collected facial expression data to determine any discrepancies from normal driving patterns. If an abnormal condition is detected, for example, if the driver shows signs of stress, it will issue a warning to the driver using warning mechanisms. Alerts are provided through methods such as voice notifications, display notifications, and vibration feedback to encourage the driver to take a break.
[0110] Furthermore, the terminal collects ambient sound data while driving and uses voice analysis to enhance the accuracy of the emotion engine's judgments. The resulting facial recognition data, ambient sound data, and emotion data are transmitted to a server and recorded as a long-term history of driving conditions and emotional changes. The server analyzes the accumulated data to create specific advice for improving the driver's driving patterns and driving safely, and provides this advice to the user as needed.
[0111] As a concrete example, consider a situation where a driver is fatigued from driving for a long time, and their feelings of anger and frustration are increasing. In this case, the device detects this state with its emotion engine, issues a warning at the appropriate time to encourage the driver to take a break, and avoids potential dangers. Through this process, the invention accurately grasps the driver's state and contributes to accident prevention.
[0112] The following describes the processing flow.
[0113] Step 1:
[0114] The device activates a camera installed in the driver's seat to recognize the driver's face. Using a facial recognition algorithm, it detects the driver's face position and tracks blinking frequency and eye movements in real time.
[0115] Step 2:
[0116] The device inputs the acquired facial expression data into an emotion engine to analyze the driver's emotional state. Emotions such as joy, anger, surprise, and sadness are identified, and the degree of stress and tension is also evaluated.
[0117] Step 3:
[0118] The device integrates the results of emotion analysis with the results of blink frequency and eye movement analysis to determine whether the driver is exhibiting any abnormal behavior. For example, increased blinking, fixed gaze, and an increase in negative emotions are considered abnormal.
[0119] Step 4:
[0120] If an anomaly is detected, the device will use warning mechanisms to alert the driver. The driver will be alerted through voice alerts, on-screen messages, or device vibrations. Simultaneously, a message will be sent to the driver encouraging them to take a break.
[0121] Step 5:
[0122] The device periodically transmits facial recognition data, emotion data, and ambient sound data collected during driving to a server. The transmitted data is used to analyze long-term driving patterns and emotional changes.
[0123] Step 6:
[0124] The server analyzes the collected data and records the driver's driving history and changes in their emotional state. Based on these results, it creates specific safe driving advice and improvement plans for the driver and provides feedback to the user as needed.
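The integration in Step 3 and the warning in Step 4 could be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `DriverState` fields, the three threshold constants, and the function names are hypothetical, and the description gives no numeric thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds: the description does not give numeric values.
BLINK_INCREASE_RATIO = 1.5    # current blink rate relative to the driver's baseline
FIXED_GAZE_SECONDS = 4.0      # gaze held on one point for at least this long
NEGATIVE_EMOTION_SCORE = 0.6  # intensity of negative emotion in [0, 1]


@dataclass
class DriverState:
    blink_rate: float           # blinks per minute in the current window
    baseline_blink_rate: float  # blinks per minute under normal driving
    gaze_fixed_seconds: float   # how long the gaze has stayed fixed
    negative_emotion: float     # output of the emotion engine, in [0, 1]


def detect_abnormal(state: DriverState) -> List[str]:
    """Return the reasons the state is considered abnormal (empty if normal)."""
    reasons = []
    if (state.baseline_blink_rate > 0
            and state.blink_rate >= BLINK_INCREASE_RATIO * state.baseline_blink_rate):
        reasons.append("increased blinking")
    if state.gaze_fixed_seconds >= FIXED_GAZE_SECONDS:
        reasons.append("fixed gaze")
    if state.negative_emotion >= NEGATIVE_EMOTION_SCORE:
        reasons.append("negative emotion")
    return reasons


def build_alert(reasons: List[str]) -> Optional[dict]:
    """Build an alert for the voice, screen, and vibration channels, or None."""
    if not reasons:
        return None
    return {
        "channels": ["voice", "screen", "vibration"],
        "message": "Abnormal state detected (" + ", ".join(reasons)
                   + "). Please take a break.",
    }
```

Returning the list of reasons rather than a single flag lets the alert message name what was detected, which matches the description's aim of encouraging a break at the appropriate time.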
[0125] (Example 2)
[0126] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0127] Driver inattention and emotional fluctuations are often major contributing factors to traffic accidents. Therefore, it is necessary to improve safety by accurately monitoring the driver's condition in real time and providing appropriate warnings. However, conventional technology has the challenge of not being able to adequately recognize the driver's emotions or stress levels, making immediate response difficult.
[0128] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0129] In this invention, the server includes means for detecting the driver's face and collecting facial expression data, means for analyzing the collected facial expression data and recognizing the emotional state, means for analyzing the blinking frequency and eye movements and identifying differences from the normal driving pattern, means for acquiring and analyzing ambient sounds during driving, and means for notifying the driver of a warning when an abnormality is detected, transmitting data to the server, and recording the driving pattern. This enables comprehensive monitoring of the driver's condition and real-time measures to improve safety.
[0130] "Image acquisition means" refers to technology or equipment for detecting the driver's face and collecting facial expression data.
[0131] "AI analysis means" refers to a processing method that uses artificial intelligence technology to analyze collected facial expression data and recognize the driver's emotional state.
[0132] "Biometric signal analysis means" refers to a method or apparatus for analyzing blink frequency and eye movement to identify differences from normal driving patterns.
[0133] "Warning means" refers to a notification mechanism that alerts the driver when an abnormality is detected by the AI analysis means.
[0134] "Voice analysis means" refers to a technology that acquires audio from the driving environment and analyzes that data to enhance the accuracy of judging abnormal conditions.
[0135] "Data transmission means" refers to a communication means for sending the analyzed data to a server and recording the driving pattern.
[0136] This invention is a system for improving driver safety, and in particular aims to monitor the driver's condition in real time and provide appropriate alerts when abnormalities are detected. The system consists mainly of a terminal installed at the driver's seat and a server, and combines multiple sensors with AI technology.
[0137] The device is equipped with an image acquisition mechanism using a high-resolution camera. This camera recognizes the driver's face and collects facial expression data. The collected data is processed by an AI analysis mechanism within the device to identify the driver's emotional state. An emotion recognition algorithm can be used to identify the emotional state, recognizing basic emotions such as joy, anger, surprise, and sadness.
[0138] Furthermore, the terminal is equipped with biosignal analysis capabilities to detect blinking frequency and eye movements. Based on this information, if movements different from the normal driving pattern are detected, it is judged to be abnormal. In addition, the terminal is equipped with voice analysis capabilities to acquire ambient sounds inside the car using a microphone. After noise is removed from the acquired voice data, it is analyzed by AI analysis capabilities to complement the judgment of emotional state.
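One way to judge "movements different from the normal driving pattern" is to compare a current measurement with the spread of the driver's own baseline. The sketch below uses a z-score test, which is a swapped-in technique chosen for illustration; the description does not name a method, and the function name and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(baseline: Sequence[float], current: float,
                z_threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `z_threshold` standard deviations
    from the mean of `baseline` (for example, blink rates per minute
    recorded during normal driving). Requires at least two baseline values."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any different value is a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_threshold
```

For a baseline of `[14, 15, 16, 15, 14, 16]` blinks per minute, a reading of 16 is within the normal spread, while a reading of 25 is flagged.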
[0139] If the device detects an anomaly, it will alert the driver using a warning system. The alert will be delivered either by voice or displayed on the screen. It can also provide physical feedback by gently vibrating the seat to encourage the driver to take a break.
[0140] All analysis data is transmitted from the terminal to the server. The server records the history of each driver's driving patterns and emotional states. This information is securely stored using a cloud server and analyzed using AI analysis tools. The analysis results are provided to the user as personalized safe driving advice.
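The per-driver history described above could be recorded along the following lines. This is only a sketch: it uses a local SQLite database as a stand-in for the cloud server, and the table layout, column names, and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the history store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "driver_id TEXT NOT NULL, recorded_at TEXT NOT NULL, "
        "blink_rate REAL, emotion TEXT, stress REAL)"
    )
    return conn


def record(conn: sqlite3.Connection, driver_id: str, recorded_at: str,
           blink_rate: float, emotion: str, stress: float) -> None:
    """Append one analysis result for a driver (committed on success)."""
    with conn:
        conn.execute(
            "INSERT INTO history VALUES (?, ?, ?, ?, ?)",
            (driver_id, recorded_at, blink_rate, emotion, stress),
        )


def driver_history(conn: sqlite3.Connection, driver_id: str) -> List[Tuple]:
    """Return one driver's entries in chronological order."""
    cur = conn.execute(
        "SELECT recorded_at, blink_rate, emotion, stress FROM history "
        "WHERE driver_id = ? ORDER BY recorded_at",
        (driver_id,),
    )
    return cur.fetchall()
```

Storing timestamps as ISO 8601 text keeps chronological order identical to string order, so `ORDER BY recorded_at` is sufficient here.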
[0141] As a concrete example, consider a driver who is fatigued from driving for a long time and whose emotions are unstable. In this case, the device can detect the abnormality from the driver's facial expressions and voice data, and prompt them to take a break with voice notifications and vibration feedback. In this way, it is possible to avoid potential dangers and ensure the driver's safety.
[0142] An example of a prompt to input into the generative AI model is, "Design a system that analyzes the driver's facial expressions and ambient sound data, monitors their emotional state in real time, and improves safety." This allows the system to comprehensively understand the driver's state and provide support for safe driving.
[0143] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0144] Step 1:
[0145] The terminal uses a facial recognition camera to detect the driver's face and acquire facial expression data. The input data is the video feed from the camera, and the output is digital data containing the driver's face region and its facial features. In this step, image processing techniques are used to identify the face and extract the information.
[0146] Step 2:
[0147] The terminal inputs the acquired facial expression data into the AI analysis means to analyze the driver's emotional state. The input is the facial expression data obtained in Step 1, and the output is numerical data indicating the driver's emotion classification (e.g., joy, anger, or sadness) and its intensity. This step applies a pre-trained emotion recognition algorithm.
[0148] Step 3:
[0149] The terminal analyzes blink frequency and eye movements using the biosignal analysis means. The input is continuous video data from the camera, and the output is vector data representing the driver's blink frequency and gaze direction. This step applies continuous monitoring and motion detection algorithms to the video data.
[0150] Step 4:
[0151] The device acquires ambient sound using a sound collection device and analyzes it using the voice analysis means. The input data is the audio signal from the microphone, and the output is feature data describing the sound environment. Specifically, it performs audio filtering and frequency analysis, and applies noise cancellation to improve accuracy.
[0152] Step 5:
[0153] The terminal integrates the above analysis results and detects anomalies by comparing them with normal driving patterns. The inputs are the emotional state, the blink and gaze data, and the voice analysis results; the output is an alert message indicating whether an anomaly is present and, if so, its details. In this step, the AI judges anomalies by comparison with normal driving data.
[0154] Step 6:
[0155] If an anomaly is detected, the terminal activates the warning means, alerting the driver via voice notification, display, or seat vibration. The input is the result of the anomaly detection, and the output is feedback to the driver. Specifically, this step sends signals to the user interface.
[0156] Step 7:
[0157] The analyzed data is sent from the terminal to the server and recorded as a history of driving patterns and emotional states. The input is all the analyzed data, and the output is the historical data stored on the server. This step transmits the data using a secure communication protocol.
[0158] This process allows the entire system to comprehensively monitor the driver's condition and provide timely support to improve safety.
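The audio step of the flow above (filtering followed by frequency analysis) could look roughly like the following. This is an illustrative sketch only: a moving average stands in for noise removal, a naive discrete Fourier transform stands in for frequency analysis, and all function names are hypothetical. A real system would more likely use an FFT library and a proper noise-suppression method.

```python
import cmath
import math
from typing import List, Sequence


def moving_average(samples: Sequence[float], window: int = 3) -> List[float]:
    """Smooth the signal with a trailing moving average (simple noise reduction)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level, a rough measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC component,
    computed with a naive O(n^2) DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n
```

The resulting level and dominant frequency are examples of the "feature data describing the sound environment" that could be passed to the integration step alongside the emotion and gaze data.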
[0159] (Application Example 2)
[0160] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0161] There is a need for technology that can accurately detect changes in the driver's emotional state and fatigue while driving, and support safe driving. However, conventional technology has the challenge of being unable to make highly accurate judgments about abnormal conditions while taking into account the driver's emotions and ambient sounds.
[0162] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0163] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements, an emotion recognition means for determining the driver's emotional state in real time, and a voice analysis means for acquiring ambient sounds and improving the accuracy of determining abnormal conditions. This enables high-precision monitoring of the driver's various states and supports safe driving.
[0164] The "image acquisition means" is a device that detects the driver's face and analyzes the frequency of blinking and eye movements.
[0165] "AI analysis means" refers to a processing device that uses artificial intelligence to analyze the collected data and determine abnormal conditions in the driver.
[0166] "Emotion recognition means" refers to a system that determines the driver's emotional state in real time based on facial expression data.
[0167] "Warning means" refers to a device that notifies the driver when an abnormality is detected by the AI analysis means.
[0168] The "data transmission means" is a function that transmits driving data to an information processing device after notification and records the driving pattern.
[0169] "Voice analysis means" refers to a method for acquiring ambient sounds during driving and using them to improve the accuracy of judging abnormal conditions.
[0170] Modes for carrying out the invention
[0171] This invention realizes a system that monitors the driver's condition with high precision and supports safe driving. The server can detect the driver's face using a terminal installed in the driver's seat. Using the terminal's camera, it determines the position of the face in real time and analyzes the frequency of blinking and eye movements. The hardware used can be a smartphone with a built-in camera or a dedicated device. Libraries such as OpenCV are used for image processing.
[0172] Furthermore, by using an AI model as an emotion recognition tool, the driver's emotional state can be determined in real time from their facial expression data. Deep learning frameworks such as TensorFlow and PyTorch are used here.
[0173] When an abnormal condition is detected, the terminal notifies the driver of a warning through voice notification or display. Speech synthesis technology can be used for the warning, providing appropriate feedback in real time.
[0174] Subsequently, driving data is transmitted to a server via a data transmission device. This records driving patterns and allows for long-term data analysis. Based on the collected data, the server can generate personalized advice for the driver using an AI model.
[0175] As a concrete example, if the system detects signs of fatigue based on facial expression data while a driver is driving long distances, it will issue a voice warning and prompt the driver to take a break. In this way, it is possible to improve driving habits while ensuring the driver's safety.
[0176] An example of a prompt for the generative AI model would be: "Input the following facial expression data into the facial expression analysis model to identify the driver's emotional state. Input data: smile, blinking frequency, eye movements. Output: emotion label, stress level."
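A prompt of this form could be assembled from measured values as sketched below. The function name, the parameter names, and the way values are formatted into the text are assumptions made for illustration; the description specifies only the kinds of input and output.

```python
def build_emotion_prompt(smile: float, blink_rate: float, gaze: str) -> str:
    """Fill the example prompt with measured values.

    smile      -- hypothetical smile intensity in [0, 1]
    blink_rate -- blinks per minute
    gaze       -- short text description of eye movement, e.g. "fixed"
    """
    return (
        "Input the following facial expression data into the facial expression "
        "analysis model to identify the driver's emotional state. "
        f"Input data: smile={smile:.2f}, "
        f"blinking frequency={blink_rate:.1f}/min, "
        f"eye movements={gaze}. "
        "Output: emotion label, stress level."
    )
```

For example, `build_emotion_prompt(0.1, 24.0, "fixed")` yields a prompt whose input section reads `smile=0.10, blinking frequency=24.0/min, eye movements=fixed`.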
[0177] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0178] Step 1:
[0179] The device uses a camera to detect the driver's face in real time. The input is video from the camera, and the output is the position information of the face. The OpenCV library is used to analyze the video frame by frame to detect the face.
[0180] Step 2:
[0181] The device processes detected facial expressions using AI analysis tools to determine the emotional state. The input is facial feature point data, and the output is an emotion label and stress level. Facial expression analysis is performed using a pre-trained deep learning model with TensorFlow.
[0182] Step 3:
[0183] The device will issue a warning to the user if an abnormal emotional state is detected by AI analysis. Inputs are emotion labels and stress levels, and output is either a voice message or a display notification. Speech synthesis technology is used to generate and notify the warning in real time.
[0184] Step 4:
[0185] The terminal transmits driving data and emotional state data to a server, recording the history in a database. The input is a set of the driver's emotional state and driving data, and the output is the recorded data entry. A data transmission protocol is used to securely transfer the data to the server.
[0186] Step 5:
[0187] The server analyzes the collected data and generates advice for the driver using a generative AI model. The input is historical driving data and emotion labels, and the output is specific advice for improving driving. A Python script is used to perform data analysis and drive the AI model.
[0188] Step 6:
[0189] Users review the advice received from the server and use it to improve their safe driving practices. The input is advice messages from the server, and the output is the driver's improved behavior. Users review the displayed advice and apply it to their next driving session.
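Before the server asks the generative AI model for advice in Step 5, the recorded history could be condensed into a short summary. The following is a sketch under assumptions: each history entry is taken to be a dictionary with hypothetical `emotion` and `stress` keys, and the prompt wording is illustrative rather than taken from the description.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def summarize_history(entries: List[Dict]) -> Dict:
    """Condense recorded entries into the most frequent emotion label
    and the mean stress level."""
    if not entries:
        raise ValueError("no history entries to summarize")
    labels = Counter(e["emotion"] for e in entries)
    return {
        "sessions": len(entries),
        "dominant_emotion": labels.most_common(1)[0][0],
        "mean_stress": round(mean(e["stress"] for e in entries), 2),
    }


def advice_prompt(summary: Dict) -> str:
    """Turn the summary into a prompt requesting driving advice."""
    return (
        f"Over {summary['sessions']} recorded sessions the driver's most "
        f"frequent emotion was {summary['dominant_emotion']} with a mean "
        f"stress level of {summary['mean_stress']}. "
        "Suggest specific advice for improving driving safety."
    )
```

Summarizing first keeps the prompt short and avoids sending raw per-frame data to the model.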
[0190] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0191] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, and inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0192] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0193] [Second Embodiment]
[0194] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0195] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0196] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0197] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0198] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0199] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0200] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0201] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0202] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0203] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0204] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0205] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0206] This invention relates to a system that ensures driver safety by installing a mobile device such as a smartphone in the driver's seat of a vehicle and continuously monitoring the driver's face using its front camera. This system has the function of monitoring the driver's blinking frequency and eye movements in real time and issuing a warning when an abnormality is detected. It also simultaneously records ambient sounds to improve the accuracy of abnormality detection.
[0207] First, the device uses facial recognition to detect the driver's face and acquires its location and movement information. This makes it possible to detect early signs of the driver becoming drowsy or inattentive. The device then inputs the acquired data into an AI analysis system to analyze the driver's blinking frequency and eye movement patterns, and compares them to normal conditions.
[0208] When AI analysis detects an anomaly, the device uses warning mechanisms to alert the driver. This alert can be delivered via voice, on-screen display, or vibration to draw the driver's attention. For example, if the user's blinking frequency increases significantly while driving, they might receive a message such as, "Please be careful. We recommend taking a break."
[0209] Furthermore, the data collected by the terminal is periodically sent to the server. The server analyzes this data, recording and accumulating each driver's driving pattern. This allows for a long-term understanding of drivers' driving habits and enables the proposal of specific guidance to promote safe driving.
[0210] This system can prevent dangerous behaviors such as drowsy driving, significantly improving safety for both the driver and the user. Furthermore, it provides appropriate feedback tailored to each user's individual circumstances, contributing to improved driving habits.
[0211] The following describes the processing flow.
[0212] Step 1:
[0213] The device activates a camera installed in the driver's seat to detect the driver's face. Using a face detection algorithm, it recognizes the driver's face position and collects basic data for tracking blink frequency and eye movements in real time.
[0214] Step 2:
[0215] The device inputs the acquired facial movement data into an AI analysis system, which analyzes blinking frequency and eye movement. The AI compares this data to normal driving patterns to determine if there are any abnormalities.
[0216] Step 3:
[0217] If an anomaly is detected through AI analysis, the device will immediately alert the driver using various warning methods. For example, it may provide voice messages such as, "Signs of drowsiness have been detected. Please take a break," alerts on the screen display, or vibration feedback.
[0218] Step 4:
[0219] The terminal periodically sends the server data on detected abnormalities together with normal driving data. This data includes driving time, blinking frequency, and eye movement characteristics.
[0220] Step 5:
[0221] The server analyzes the received data and stores the driver's driving history and patterns. This stored data is used to provide personalized safe driving advice and improvement plans for each driver. The server provides this information to the user and notifies them of points to consider during their next drive.
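The data that the terminal sends in Step 4 could be packaged as a JSON document such as the one produced below. This is a sketch: the field names and the choice of JSON are assumptions, since the description lists only the kinds of data (driving time, blinking frequency, eye movement characteristics, and abnormal occurrences).

```python
import json
from typing import Dict, List


def build_payload(driver_id: str, driving_minutes: int, blink_rate: float,
                  gaze_features: Dict[str, float], anomalies: List[str]) -> str:
    """Serialize one periodic report from the terminal as a JSON string."""
    payload = {
        "driver_id": driver_id,
        "driving_minutes": driving_minutes,  # driving time so far
        "blink_rate": blink_rate,            # blinks per minute
        "gaze_features": gaze_features,      # eye movement characteristics
        "anomalies": anomalies,              # empty list during normal driving
    }
    # sort_keys gives a stable field order, which simplifies logging and tests.
    return json.dumps(payload, sort_keys=True)
```

The server can decode the report with `json.loads` and append it to the driver's history.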
[0222] (Example 1)
[0223] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0224] The present invention aims to provide a new system for promoting safe driving by preventing dangerous behaviors such as decreased driver attention and drowsy driving. Conventional systems have the problem of insufficient accuracy in detecting abnormal driver conditions, making it difficult to provide appropriate warnings and feedback. Therefore, it is necessary to quickly and accurately detect abnormalities during driving and provide feedback tailored to individual drivers.
[0225] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0226] In this invention, the server includes at least one image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition by comparing the acquired data with normal driving data; a warning means for notifying the driver of a warning via voice or display when an abnormality is detected by the AI analysis means; a data transmission means for transmitting data to a data storage device after notification by the warning means to record and analyze long-term driving patterns; and a means for creating prompt sentences using a generative AI model to customize the feedback provided to the driver. This enables rapid detection of abnormal driving conditions and the provision of appropriate warnings and feedback to the driver.
[0227] "Image acquisition means" refers to a combination of hardware and software used to detect the driver's face and analyze the frequency of blinking and eye movements.
[0228] "AI analysis means" refers to a system that uses artificial intelligence technology to automatically determine abnormal conditions in the driver based on acquired data compared with normal driving data.
[0229] "Warning means" refers to a device or software function that notifies the driver of an abnormality detected by AI analysis means, either through voice or display.
[0230] "Data transmission means" refers to communication technologies and protocols for transmitting data to a data storage device after notification by a warning means.
[0231] A "generative AI model" refers to an artificial intelligence model that has the ability to create prompts to customize the feedback given to the driver.
[0232] This invention is a system that utilizes a portable information terminal installed in the driver's seat to prevent dangerous behaviors such as decreased driver attention and drowsy driving. This system can detect the driver's face and analyze blinking frequency and eye movement in real time. Specifically, a smartphone or tablet is used as the hardware, and a general-purpose facial recognition API implementing current facial recognition technology is used as the software.
[0233] The device utilizes facial recognition to instantly identify the driver's face from video data and continuously monitors their location and status. This facial recognition is implemented using machine learning libraries and platforms, such as TensorFlow and PyTorch. This allows for immediate detection of drowsiness or distraction in the driver, enabling the system to respond to changes in the driver's condition.
[0234] Furthermore, in addition to visual data during driving, ambient sounds are also collected, and voice analysis technology is used to improve the accuracy of detecting abnormal conditions. A voice recognition library is used to process this voice data.
[0235] The server receives data transmitted from terminals, securely stores it in a cloud environment, and analyzes driving patterns over a long period. By utilizing analysis tools provided by cloud service providers, it is possible to create appropriate feedback based on the driving data of individual drivers.
[0236] When the AI analysis system detects an anomaly, the terminal immediately issues a warning to alert the driver. The warning is conveyed to the driver as an audio message, display, or vibration. For example, specific advice such as, "Your blinking frequency is higher than normal. We recommend you take a break," is provided.
[0237] Furthermore, by using a generative AI model, customized prompt messages are generated based on data for each driver, providing even more specific improvement suggestions. These prompt messages are created in the format of, "How should we prompt the driver if their attention is waning?" By using such prompts, users can understand their own driving habits and strive for safer driving.
[0238] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0239] Step 1:
[0240] The device activates the front camera and detects the driver's face. The input is real-time video data, and the output is the location and feature information of the detected face. This prepares the face recognition algorithm to monitor the driver's blinking frequency and eye movements.
[0241] Step 2:
[0242] The terminal continuously monitors the driver's blinking frequency and eye movements based on detected facial information. The input is facial feature information, and the output is blinking frequency data and eye movement patterns. Using AI analysis, this data is analyzed in real time and processed by comparing it to normal patterns.
[0243] Step 3:
[0244] Based on the analysis results, the device notifies the driver of any abnormalities in blinking frequency or gaze patterns. The input is the analyzed driver status data, and the output is a warning message. Specific actions such as voice alerts, display notifications, and vibrations are selected to inform the driver of the abnormal condition.
[0245] Step 4:
[0246] After issuing an alert, the terminal sends the collected data to the server. The input is all of the driving data, and the output is a data record on cloud storage. The data is securely transferred using the SSL/TLS protocol and prepared for data analysis.
[0247] Step 5:
[0248] The server analyzes the received data and examines each driver's driving pattern. The input is driving data stored in cloud storage, and the output is feedback information for the driver. By using a generative AI model, prompt messages are created and the feedback provided to the driver is customized.
[0249] Step 6:
[0250] Based on the feedback provided, users implement improvements that contribute to safer driving. The input is feedback documents provided by the server, and the output is improved driving habits. Specifically, they follow suggestions such as "take sufficient rest on the next drive."
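The blink-frequency monitoring in Step 2 could be sketched as follows, assuming the face analysis yields one eye-openness value per video frame. That per-frame openness signal, the 0.2 threshold, and the function names are hypothetical; the description does not state how blinks are extracted from the facial feature information.

```python
from typing import Sequence


def count_blinks(openness: Sequence[float], closed_below: float = 0.2) -> int:
    """Count blinks as transitions from closed (below the threshold)
    back to open. An eye still closed at the end is not counted."""
    blinks = 0
    closed = False
    for value in openness:
        if value < closed_below:
            closed = True
        elif closed:
            blinks += 1
            closed = False
    return blinks


def blinks_per_minute(openness: Sequence[float], fps: float,
                      closed_below: float = 0.2) -> float:
    """Convert the blink count over the clip into a per-minute rate."""
    if not openness:
        return 0.0
    return count_blinks(openness, closed_below) * fps * 60.0 / len(openness)
```

The per-minute rate is the quantity that would then be compared with the driver's normal pattern to decide whether to issue the warning in Step 3.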
[0251] (Application Example 1)
[0252] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0253] Even when autonomous driving is standard equipment, ensuring safety remains a challenge when a human driver must take over manual operation in an unforeseen situation. In particular, it is essential to quickly detect and respond to dangers caused by driver fatigue or distraction.
[0254] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0255] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition based on the acquired data; and a warning means for notifying a warning when an abnormality is detected by the AI analysis means. This enables monitoring and real-time notification to allow the driver to safely perform manual operations in an autonomous vehicle.
[0256] "Image acquisition means" refers to a device or method for detecting the driver's face and analyzing the frequency of blinking and eye movements.
[0257] "AI analysis means" refers to a process that uses artificial intelligence to determine the driver's abnormal condition based on acquired data.
[0258] "Warning means" refers to a function that alerts the driver visually or audibly when an abnormality is detected by the AI analysis means.
[0259] "Data transmission means" refers to a method for recording driving patterns by sending data to a server after notification by a warning means.
[0260] "Real-time notification means" refers to a function that instantly displays information on the control panel of an autonomous vehicle.
[0261] "Voice analysis means" refers to a method of acquiring ambient sounds during operation and improving the accuracy of judging abnormal conditions based on that data.
[0262] This invention provides a system for enabling a driver to safely perform manual operations in an autonomous vehicle. To achieve this, the server uses image acquisition means to detect the driver's face. The image acquisition means uses a terminal equipped with a camera, which is fixed in an appropriate position within the vehicle. The terminal continuously captures the driver's face and analyzes the frequency of blinking and eye movements.
[0263] The analyzed image data is processed in real time by the AI analysis means. This AI analysis utilizes machine learning libraries such as TensorFlow and Keras. If an abnormal condition is detected as a result of the analysis, the warning means is activated and alerts the user (driver) visually or audibly.
[0264] Furthermore, after a warning is issued, the terminal uses the data transmission means to send the acquired data to the server. This data records the driver's driving patterns and is used for future improvements. The server is also equipped with the voice analysis means, which acquires ambient sounds during driving and improves the accuracy of the AI analysis.
[0265] As a specific example, if the system detects an increase in blinking frequency during long-distance driving, indicating driver fatigue, it can immediately present a message such as "Caution: Driver fatigue is detected. A break is recommended," both on the display and by voice, encouraging safe driving.
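The fatigue check in this example can be sketched as follows. This is a minimal illustration only: the function names, the 60-second window, and the ratio threshold of 1.5 are assumptions made for the sketch and are not specified in this disclosure.

```python
FATIGUE_MESSAGE = "Caution: Driver fatigue is detected. A break is recommended."


def blink_rate_per_min(blink_timestamps_s, now_s, window_s=60.0):
    """Count blinks inside the trailing window and scale to blinks per minute."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    recent = [t for t in blink_timestamps_s if now_s - window_s < t <= now_s]
    return len(recent) * 60.0 / window_s


def fatigue_warning(rate_per_min, baseline_per_min, ratio_threshold=1.5):
    """Return the warning text when the rate reaches the assumed multiple
    of the driver's baseline, otherwise None."""
    if baseline_per_min <= 0:
        raise ValueError("baseline_per_min must be positive")
    if rate_per_min >= ratio_threshold * baseline_per_min:
        return FATIGUE_MESSAGE
    return None
```

In such a sketch the baseline would be measured per driver during normal driving, so that the same threshold ratio adapts to individual blinking habits.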
[0266] An example of a prompt message for a generative AI model might be: "I want to develop an app that detects and warns of blinking and gaze abnormalities for a real-time monitoring AI assistant for drivers. What kind of data and model should I use?"
[0267] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0268] Step 1:
[0269] The terminal detects the driver's face in real time. The input is video data from a camera, and a face detection algorithm is used to identify the face region. The output is the position information of the face. In this step, common computer vision techniques are used for face detection.
[0270] Step 2:
[0271] The device analyzes blink frequency and eye movement from the acquired facial region. The input is facial position information, and the output is blink frequency data and eye movement patterns. This data is analyzed by a deep learning model and compared to normal baseline data.
[0272] Step 3:
[0273] The terminal analyzes ambient sounds using acquired audio data. The input is audio data collected from a microphone, and the output is analysis data that detects noise and abnormal sounds. Audio analysis helps to understand changes in the driving environment and improve the accuracy of anomaly detection.
[0274] Step 4:
[0275] The AI analysis means uses the analysis data from Steps 2 and 3 to determine whether the driver is in an abnormal condition. The inputs are the blink and gaze data and the audio analysis data, and the output is whether or not an abnormality is present. A machine learning model is used for this determination, and an abnormality is reported when the deviation exceeds a set threshold.
[0276] Step 5:
[0277] When an anomaly is detected in the user (driver), the terminal immediately issues a warning. The input is an anomaly signal from the AI analysis means, and the output is a visual display or audio alert. In this step, specific actions are taken to draw the driver's attention.
[0278] Step 6:
[0279] The server receives data sent from the terminal after a warning and records long-term driving patterns. Inputs are driving patterns and anomaly detection data, and output is a database of driving history. This data will be useful for improving driving habits in the future.
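Steps 2 and 4 of the above flow can be sketched as follows. The disclosure uses a deep learning model and a machine learning model for these steps; the sketch below substitutes a simple threshold rule and a weighted sum purely for illustration. The eye-openness scale (0 for closed, 1 for open), the weights, the thresholds, and all function names are assumptions.

```python
def count_blinks(eye_openness, closed_threshold=0.2):
    """Count closed-to-open transitions in a per-frame eye-openness series."""
    blinks = 0
    closed = False
    for value in eye_openness:
        if value < closed_threshold:
            closed = True          # the eye is currently closed
        elif closed:
            blinks += 1            # the eye reopened, so one blink completed
            closed = False
    return blinks


def judge_abnormal(blink_dev, gaze_dev, audio_dev,
                   weights=(0.5, 0.3, 0.2), threshold=0.6):
    """Fuse deviation scores (each assumed to lie in [0, 1]) into one score
    and compare it with a set threshold. Returns (is_abnormal, score)."""
    score = (weights[0] * blink_dev
             + weights[1] * gaze_dev
             + weights[2] * audio_dev)
    return score > threshold, score
```

A blink that is still in progress at the end of the series is not counted, which avoids double counting when consecutive frame windows are processed.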
[0280] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0281] This invention is a system that combines multiple sensors and AI technology using a portable terminal installed in the driver's seat to improve driver safety. This system comprehensively monitors the driver's condition by combining driver face detection and emotion recognition, and takes appropriate action when an abnormality is detected.
[0282] First, the device uses facial detection technology to capture the driver's face with a camera and collect facial expression data in real time. Next, an emotion engine analyzes this data to recognize the driver's emotional state. In this process, emotions such as joy, anger, surprise, and sadness are identified. The emotion engine can also determine the driver's level of stress and tension.
[0283] In addition, the device analyzes blink frequency and eye movements based on collected facial expression data to determine any discrepancies from normal driving patterns. If an abnormal condition is detected, for example, if the driver shows signs of stress, it will issue a warning to the driver using warning mechanisms. Alerts are provided through methods such as voice notifications, display notifications, and vibration feedback to encourage the driver to take a break.
[0284] Furthermore, the terminal collects environmental sound data during driving and enhances the accuracy of the emotion engine's judgment through voice analysis means. The obtained facial recognition data, environmental sound data, and emotion data are transmitted to the server and recorded as a long-term driving situation and emotion change history. The server analyzes the accumulated data, creates specific advice for improving the driver's driving pattern and safe driving, and provides it to the user as appropriate.
[0285] As a specific example, consider a situation where a driver is fatigued due to long hours of driving and the emotions of anger and irritation are increasing. In this case, the terminal detects the state with the emotion engine, issues a warning at an appropriate timing to prompt the driver to take a break, and avoids potential dangers. Through such a process, the invention accurately grasps the driver's state and contributes to accident prevention.
[0286] The following describes the processing flow.
[0287] Step 1:
[0288] The terminal activates the camera installed in the driver's seat and recognizes the driver's face. Using a facial recognition algorithm, it detects the position of the driver's face and tracks the frequency of blinking and the movement of the line of sight in real time.
[0289] Step 2:
[0290] The terminal inputs the obtained facial expression data into the emotion engine and analyzes the driver's emotional state. Emotions such as joy, anger, surprise, and sadness are identified, and the degree of stress and tension is also evaluated.
[0291] Step 3:
[0292] Integrating the results of the emotion analysis and the analysis results of the frequency of blinking and the movement of the line of sight, the terminal determines whether there is an abnormal state of the driver. For example, if an increase in blinking, fixation of the line of sight, and an increase in negative emotions are observed, it is regarded as abnormal.
[0293] Step 4:
[0294] If an anomaly is detected, the device will use warning mechanisms to alert the driver. The driver will be alerted through voice alerts, on-screen messages, or device vibrations. Simultaneously, a message will be sent to the driver encouraging them to take a break.
[0295] Step 5:
[0296] The device periodically transmits facial recognition data, emotion data, and ambient sound data collected during driving to a server. The transmitted data is used to analyze long-term driving patterns and emotional changes.
[0297] Step 6:
[0298] The server analyzes the collected data and records the driver's driving history and changes in their emotional state. Based on these results, it creates specific safe driving advice and improvement plans for the driver and provides feedback to the user as needed.
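The integration rule of Step 3 (an increase in blinking, fixation of the line of sight, and an increase in negative emotions together being regarded as abnormal) can be sketched as follows. The data layout, the choice of "anger" and "sadness" as negative emotions, and the numeric ratios and margin are assumptions for illustration; the disclosure does not specify them.

```python
from dataclasses import dataclass

NEGATIVE_EMOTIONS = ("anger", "sadness")  # assumed grouping


@dataclass
class Observation:
    blink_rate_per_min: float
    gaze_dispersion_deg: float   # small dispersion is treated as a fixed gaze
    emotion_scores: dict         # emotion name -> score in [0, 1]


def negative_level(emotion_scores):
    """Sum the scores of the emotions treated as negative."""
    return sum(emotion_scores.get(name, 0.0) for name in NEGATIVE_EMOTIONS)


def is_abnormal_state(current, baseline,
                      blink_ratio=1.3, fixation_ratio=0.5, negative_margin=0.2):
    """Return True only when all three signs named in Step 3 are present."""
    more_blinking = (current.blink_rate_per_min
                     > blink_ratio * baseline.blink_rate_per_min)
    gaze_fixed = (current.gaze_dispersion_deg
                  < fixation_ratio * baseline.gaze_dispersion_deg)
    more_negative = (negative_level(current.emotion_scores)
                     > negative_level(baseline.emotion_scores) + negative_margin)
    return more_blinking and gaze_fixed and more_negative
```

Requiring all three signs keeps the sketch close to the example in Step 3; a deployed system might instead weight the signs or learn the rule from data.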
[0299] (Example 2)
[0300] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0301] Driver inattention and emotional fluctuations are often major contributing factors to traffic accidents. Therefore, it is necessary to improve safety by accurately monitoring the driver's condition in real time and providing appropriate warnings. However, conventional technology has the challenge of not being able to adequately recognize the driver's emotions or stress levels, making immediate response difficult.
[0302] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0303] In this invention, the server includes means for detecting the driver's face and collecting expression data, means for analyzing the collected expression data and recognizing the emotional state, means for analyzing the frequency of blinking and the movement of the line of sight to identify differences from the normal driving pattern, means for acquiring and analyzing the environmental sound during driving, and means for notifying the driver of a warning when an abnormality is detected, transmitting data to the server, and recording the driving pattern. Thereby, it becomes possible to comprehensively monitor the driver's state and take measures to improve safety in real time.
[0304] The "image acquisition means" is a technology or device for detecting the driver's face and collecting expression data.
[0305] The "AI analysis means" is a processing method using artificial intelligence technology for analyzing the collected expression data and recognizing the driver's emotional state.
[0306] The "biological signal analysis means" is a method or device for analyzing the frequency of blinking and the movement of the line of sight to identify differences from the normal driving pattern.
[0307] The "warning means" is a notification means for prompting the driver when an abnormality is detected by the AI analysis means.
[0308] The "voice analysis means" is a technology for acquiring sound in the driving environment and analyzing the data to complement the accuracy of judging an abnormal state.
[0309] The "data transmission means" is a communication means for transmitting the analyzed data to the server and recording the driving pattern.
[0310] This invention is a system for improving the safety of drivers, and particularly aims to monitor the driver's state in real time and provide appropriate alerts when an abnormality is detected. This system mainly consists of a terminal installed in the driver's seat and a server, and combines a plurality of sensors and AI technology.
[0311] The device is equipped with an image acquisition mechanism using a high-resolution camera. This camera recognizes the driver's face and collects facial expression data. The collected data is processed by an AI analysis mechanism within the device to identify the driver's emotional state. An emotion recognition algorithm can be used to identify the emotional state, recognizing basic emotions such as joy, anger, surprise, and sadness.
[0312] Furthermore, the terminal is equipped with biological signal analysis means to detect blinking frequency and eye movements. If movements that differ from the normal driving pattern are detected on the basis of this information, the state is judged to be abnormal. In addition, the terminal is equipped with voice analysis means that acquires ambient sounds inside the car using a microphone. After noise is removed from the acquired audio data, the data is analyzed by the AI analysis means to complement the judgment of the emotional state.
[0313] If the device detects an anomaly, it will alert the driver using a warning system. The alert will be delivered either by voice or displayed on the screen. It can also provide physical feedback by gently vibrating the seat to encourage the driver to take a break.
[0314] All analysis data is transmitted from the terminal to the server. The server records the history of each driver's driving patterns and emotional states. This information is securely stored using a cloud server and analyzed using AI analysis tools. The analysis results are provided to the user as personalized safe driving advice.
[0315] As a concrete example, consider a driver who is fatigued from driving for a long time and whose emotions are unstable. In this case, the device can detect the abnormality from the driver's facial expressions and voice data, and prompt them to take a break with voice notifications and vibration feedback. In this way, it is possible to avoid potential dangers and ensure the driver's safety.
[0316] An example of a prompt to input into the generative AI model is, "Design a system that analyzes the driver's facial expressions and ambient sound data, monitors their emotional state in real time, and improves safety." This allows the system to comprehensively understand the driver's state and provide support for safe driving.
[0317] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0318] Step 1:
[0319] The terminal uses a facial recognition camera to detect the driver's face and acquire facial expression data. The input data is the video feed from the camera, and the output is digital data containing the driver's face region and its facial features. In this step, image processing techniques are used to identify the face and extract the information.
[0320] Step 2:
[0321] The terminal inputs the acquired facial expression data into an AI analysis system to analyze the driver's emotional state. The input is the facial expression data obtained in step 1, and the output is numerical data indicating the driver's emotional classification (e.g., joy, anger, sadness, etc.) and its intensity. This process involves specific actions using a pre-trained emotion recognition algorithm.
[0322] Step 3:
[0323] The terminal analyzes blink frequency and eye movements using the biological signal analysis means. The input is continuous video data from the camera, and the output is vector data representing the driver's blink frequency and gaze direction. In this step, the continuous video data is monitored and motion detection algorithms are applied to it.
[0324] Step 4:
[0325] The terminal acquires ambient sound using a sound collection device and analyzes it using the voice analysis means. The input data is the audio signal from the microphone, and the output is characteristic data of the sound environment. Specifically, audio filtering and frequency analysis are performed, and noise cancellation is applied to improve accuracy.
[0326] Step 5:
[0327] The terminal integrates the above analysis results and detects anomalies by comparing them with driving patterns. Inputs include the aforementioned emotional state, blink and gaze data, and voice analysis results, while output is an alert message indicating the presence or absence of an anomaly and its details. This step includes specific actions taken to detect anomalies based on AI judgment, compared with normal driving data.
[0328] Step 6:
[0329] If an anomaly is detected, the terminal activates a warning mechanism, providing the driver with an alert via voice notification, display, or seat vibration. The input is the result of the anomaly detection, and the output is feedback to the driver. Specific actions in this step include sending signals to the user interface.
[0330] Step 7:
[0331] The analyzed data is sent from the terminal to the server and recorded as a history of driving patterns and emotional states. The input is all the analyzed data, and the output is the historical data stored on the server. This step involves the specific operation of data transmission using a secure communication protocol.
[0332] This process allows the entire system to comprehensively monitor the driver's condition and provide timely support to improve safety.
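The frequency analysis mentioned in Step 4 can be illustrated with the following sketch, which extracts two simple characteristics of the sound environment from one frame of microphone samples. It uses a naive discrete Fourier transform from the standard library for clarity; this is an illustrative substitute, not the method of this disclosure, and the function names and the choice of features (RMS level and dominant frequency) are assumptions. Filtering and noise cancellation are omitted.

```python
import cmath
import math


def rms_level(samples):
    """Root-mean-square level of one audio frame."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_frequency_hz(samples, sample_rate_hz):
    """Frequency of the strongest non-DC bin of a naive DFT (O(n^2))."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n
```

For real-time use on longer frames, a fast Fourier transform would replace the naive loop; the quadratic version is shown only because it is self-contained.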
[0333] (Application Example 2)
[0334] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0335] There is a need for technology that can accurately detect changes in the driver's emotional state and fatigue while driving, and support safe driving. However, conventional technology has the challenge of being unable to make highly accurate judgments about abnormal conditions while taking into account the driver's emotions and ambient sounds.
[0336] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0337] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements, an emotion recognition means for determining the driver's emotional state in real time, and a voice analysis means for acquiring ambient sounds and improving the accuracy of determining abnormal conditions. This enables high-precision monitoring of the driver's various states and supports safe driving.
[0338] The "image acquisition means" is a device that detects the driver's face and analyzes the frequency of blinking and eye movements.
[0339] The "AI analysis means" is a processing device that uses artificial intelligence to analyze collected data and determine abnormal conditions in the driver.
[0340] The "emotion recognition means" is a system that determines the driver's emotional state in real time based on facial expression data.
[0341] The "warning means" is a device that notifies the driver when an abnormality is detected by the AI analysis means.
[0342] The "data transmission means" is a function that transmits driving data to an information processing device after notification and records the driving pattern.
[0343] "Voice analysis means" refers to a method for acquiring ambient sounds during operation and using them to improve the accuracy of judging abnormal conditions.
[0344] Modes for carrying out the invention
[0345] This invention realizes a system that monitors the driver's condition with high precision and supports safe driving. The server can detect the driver's face using a terminal installed in the driver's seat. Using the terminal's camera, it determines the position of the face in real time and analyzes the frequency of blinking and eye movements. The hardware used can be a smartphone with a built-in camera or a dedicated device. Libraries such as OpenCV are used for image processing.
[0346] Furthermore, by using an AI model as an emotion recognition tool, the driver's emotional state can be determined in real time from their facial expression data. Deep learning frameworks such as TensorFlow and PyTorch are used here.
[0347] When an abnormal condition is detected, the terminal notifies the driver of a warning through voice notification or display. Speech synthesis technology can be used for the warning, providing appropriate feedback in real time.
[0348] Subsequently, driving data is transmitted to a server via a data transmission device. This records driving patterns and allows for long-term data analysis. Based on the collected data, the server can generate personalized advice for the driver using an AI model.
[0349] As a concrete example, if the system detects signs of fatigue based on facial expression data while a driver is driving long distances, it will issue a voice warning and prompt the driver to take a break. In this way, it is possible to improve driving habits while ensuring the driver's safety.
[0350] An example of a prompt message for a generative AI model would be: "Input the following facial expression data into the facial expression analysis model to identify the driver's emotional state. Input data: smile, blinking frequency, eye movements. Output: emotion label, stress level."
[0351] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0352] Step 1:
[0353] The device uses a camera to detect the driver's face in real time. The input is video from the camera, and the output is the position information of the face. The OpenCV library is used to analyze the video frame by frame to detect the face.
[0354] Step 2:
[0355] The device processes detected facial expressions using AI analysis tools to determine the emotional state. The input is facial feature point data, and the output is an emotion label and stress level. Facial expression analysis is performed using a pre-trained deep learning model with TensorFlow.
[0356] Step 3:
[0357] The device will issue a warning to the user if an abnormal emotional state is detected by AI analysis. Inputs are emotion labels and stress levels, and output is either a voice message or a display notification. Speech synthesis technology is used to generate and notify the warning in real time.
[0358] Step 4:
[0359] The terminal transmits driving data and emotional state data to a server, recording the history in a database. The input is a set of the driver's emotional state and driving data, and the output is the recorded data entry. A data transmission protocol is used to securely transfer the data to the server.
[0360] Step 5:
[0361] The server analyzes the collected data and generates advice for the driver using a generative AI model. The input is historical driving data and emotion labels, and the output is specific advice for improving driving. A Python script is used to perform data analysis and drive the AI model.
[0362] Step 6:
[0363] Users review the advice received from the server and use it to improve their safe driving practices. The input is advice messages from the server, and the output is the driver's improved behavior. Users review the displayed advice and apply it to their next driving session.
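Step 5 of the above flow mentions a Python script that analyzes the data and drives the generative AI model. A minimal sketch of the analysis and prompt-building part is shown below. The record fields (`emotion_label`, `stress_level`), the summary statistics, and the prompt wording are assumptions for illustration; the call to the generative AI model itself is deliberately left out because the disclosure does not specify its interface.

```python
from collections import Counter
from statistics import mean


def summarize_history(entries):
    """Reduce recorded entries to a few statistics for the prompt."""
    if not entries:
        raise ValueError("entries must not be empty")
    labels = Counter(e["emotion_label"] for e in entries)
    return {
        "sessions": len(entries),
        "mean_stress": mean(e["stress_level"] for e in entries),
        "most_common_emotion": labels.most_common(1)[0][0],
    }


def build_advice_prompt(summary):
    """Build the text prompt that would be passed to the generative AI model."""
    return (
        "You are a safe-driving coach. Based on the driving history below, "
        "give specific advice for improving driving.\n"
        f"Recorded sessions: {summary['sessions']}\n"
        f"Mean stress level: {summary['mean_stress']:.2f}\n"
        f"Most frequent emotion: {summary['most_common_emotion']}"
    )
```

Keeping the summary step separate from the prompt text makes it possible to change the wording of the prompt without touching the data analysis.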
[0364] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0365] The data generation model 58 is so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0366] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0367] [Third Embodiment]
[0368] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0369] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0370] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0371] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0372] The microphone 238 accepts instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice uttered by the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0373] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0374] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0375] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0376] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0377] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0378] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0379] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0380] This invention relates to a system that ensures driver safety by installing a mobile device such as a smartphone in the driver's seat of a vehicle and continuously monitoring the driver's face using its front camera. This system has the function of monitoring the driver's blinking frequency and eye movements in real time and issuing a warning when an abnormality is detected. It also simultaneously records ambient sounds to improve the accuracy of abnormality detection.
[0381] First, the device uses facial recognition to detect the driver's face and acquires its location and movement information. This makes it possible to detect early signs of the driver becoming drowsy or inattentive. The device then inputs the acquired data into an AI analysis system to analyze the driver's blinking frequency and eye movement patterns, and compares them to normal conditions.
[0382] When AI analysis detects an anomaly, the device uses warning mechanisms to alert the driver. This alert can be delivered via voice, on-screen display, or vibration to draw the driver's attention. For example, if the user's blinking frequency increases significantly while driving, they might receive a message such as, "Please be careful. We recommend taking a break."
[0383] Furthermore, the data collected by the terminal is periodically sent to the server. The server analyzes this data, recording and accumulating each driver's driving pattern. This allows for a long-term understanding of drivers' driving habits and enables the proposal of specific guidance to promote safe driving.
[0384] This system can prevent dangerous behaviors such as drowsy driving, significantly improving safety for both the driver and the user. Furthermore, it provides appropriate feedback tailored to each user's individual circumstances, contributing to improved driving habits.
[0385] The following describes the processing flow.
[0386] Step 1:
[0387] The device activates a camera installed in the driver's seat to detect the driver's face. Using a face detection algorithm, it recognizes the driver's face position and collects basic data for tracking blink frequency and eye movements in real time.
[0388] Step 2:
[0389] The device inputs the acquired facial movement data into an AI analysis system, which analyzes blinking frequency and eye movement. The AI compares this data to normal driving patterns to determine if there are any abnormalities.
[0390] Step 3:
[0391] If an anomaly is detected through AI analysis, the device will immediately alert the driver using various warning methods. For example, it may provide voice messages such as, "Signs of drowsiness have been detected. Please take a break," alerts on the screen display, or vibration feedback.
[0392] Step 4:
[0393] The terminal periodically sends the server data on abnormal occurrences together with data from normal driving. This data includes driving time, blinking frequency, and eye movement characteristics.
[0394] Step 5:
[0395] The server analyzes the received data and stores the driver's driving history and patterns. This stored data is used to provide personalized safe driving advice and improvement plans for each driver. The server provides this information to the user and notifies them of points to consider during their next drive.
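The server-side accumulation in Steps 4 and 5 can be sketched as follows. The report fields mirror the items named in Step 4 (driving time and blinking frequency, plus an anomaly flag), but the class names, field names, and the in-memory storage are assumptions for illustration; the disclosure stores this data in a database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    driver_id: str
    driving_minutes: float
    blink_rate_per_min: float
    anomaly_detected: bool


class DrivingHistory:
    """In-memory stand-in for the server's per-driver history store."""

    def __init__(self):
        self._reports = defaultdict(list)

    def record(self, report):
        """Append one periodic report to the driver's history."""
        self._reports[report.driver_id].append(report)

    def summary(self, driver_id):
        """Aggregate a driver's reports, or return None if none exist."""
        reports = self._reports.get(driver_id, [])
        if not reports:
            return None
        return {
            "total_minutes": sum(r.driving_minutes for r in reports),
            "mean_blink_rate": (sum(r.blink_rate_per_min for r in reports)
                                / len(reports)),
            "anomaly_count": sum(1 for r in reports if r.anomaly_detected),
        }
```

A summary of this kind is the sort of input from which personalized advice for the next drive could be generated.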
[0396] (Example 1)
[0397] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0398] The present invention aims to provide a new system for promoting safe driving by preventing dangerous behaviors such as decreased driver attention and drowsy driving. Conventional systems have the problem of insufficient accuracy in detecting abnormal driver conditions, making it difficult to provide appropriate warnings and feedback. Therefore, it is necessary to quickly and accurately detect abnormalities during driving and provide feedback tailored to individual drivers.
[0399] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0400] In this invention, the server includes at least one image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition by comparing the acquired data with normal driving data; a warning means for notifying the driver of a warning via voice or display when an abnormality is detected by the AI analysis means; a data transmission means for transmitting data to a data storage device after notification by the warning means to record and analyze long-term driving patterns; and a means for creating prompt sentences using a generative AI model to customize the feedback provided to the driver. This enables rapid detection of abnormal driving conditions and the provision of appropriate warnings and feedback to the driver.
[0401] "Image acquisition means" refers to a combination of hardware and software used to detect the driver's face and analyze the frequency of blinking and eye movements.
[0402] "AI analysis means" refers to a system that uses artificial intelligence technology to automatically determine abnormal conditions in the driver based on acquired data compared with normal driving data.
[0403] "Warning means" refers to a device or software function that notifies the driver of an abnormality detected by AI analysis means, either through voice or display.
[0404] "Data transmission means" refers to communication technologies and protocols for transmitting data to a data storage device after notification by a warning means.
[0405] A "generative AI model" refers to an artificial intelligence model that has the ability to create prompts to customize the feedback given to the driver.
[0406] This invention is a system that utilizes a portable information terminal installed in the driver's seat to prevent dangerous behaviors such as decreased driver attention and drowsy driving. This system can detect the driver's face and analyze blinking frequency and eye movement in real time. Specifically, a smartphone or tablet is used as the hardware, and a general facial recognition API, which is the latest facial recognition technology, is used as the software.
[0407] The device utilizes facial recognition to instantly identify the driver's face from video data and continuously monitors their location and status. This facial recognition is implemented using machine learning libraries and platforms, such as TensorFlow and PyTorch. This allows for immediate detection of drowsiness or distraction in the driver, enabling the system to respond to changes in the driver's condition.
[0408] Furthermore, in addition to visual data during driving, ambient sounds are also collected, and voice analysis technology is used to improve the accuracy of detecting abnormal conditions. A voice recognition library is used to process this voice data.
[0409] The server receives data transmitted from terminals, securely stores it in a cloud environment, and analyzes driving patterns over a long period. By utilizing analysis tools provided by cloud service providers, it is possible to create appropriate feedback based on the driving data of individual drivers.
[0410] When the AI analysis system detects an anomaly, the terminal immediately issues a warning to alert the driver. The warning is conveyed to the driver as an audio message, display, or vibration. For example, specific advice such as, "Your blinking frequency is higher than normal. We recommend you take a break," is provided.
[0411] Furthermore, by using a generative AI model, customized prompt messages are generated based on data for each driver, providing even more specific improvement suggestions. These prompt messages are created in the format of, "How should we prompt the driver if their attention is waning?" By using such prompts, users can understand their own driving habits and strive for safer driving.
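As a rough illustration of how a per-driver prompt might be assembled before it is passed to a generative AI model, the following sketch builds the prompt text only. The function name `build_feedback_prompt`, the field names, and the prompt layout are hypothetical; only the closing question is taken from the description above, and no particular model API is assumed.

```python
def build_feedback_prompt(driver_id, stats):
    """Assemble a plain-text prompt from per-driver measurements.

    driver_id and the keys of stats are illustrative placeholders.
    """
    lines = [
        "You are a driving-safety assistant.",
        f"Driver: {driver_id}",
        "Recent measurements:",
    ]
    # Sort the keys so the prompt is reproducible for the same input.
    for name, value in sorted(stats.items()):
        lines.append(f"- {name}: {value}")
    # Question format quoted from the description.
    lines.append("How should we prompt the driver if their attention is waning?")
    return "\n".join(lines)
```

The returned string would then be supplied to whichever generative AI model the system uses.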
[0412] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0413] Step 1:
[0414] The device activates the front camera and detects the driver's face. The input is real-time video data, and the output is the location and feature information of the detected face. This prepares the face recognition algorithm to monitor the driver's blinking frequency and eye movements.
[0415] Step 2:
[0416] The terminal continuously monitors the driver's blinking frequency and eye movements based on detected facial information. The input is facial feature information, and the output is blinking frequency data and eye movement patterns. Using AI analysis, this data is analyzed in real time and processed by comparing it to normal patterns.
[0417] Step 3:
[0418] Based on the analysis results, the device notifies the driver of any abnormalities in blinking frequency or gaze patterns. The input is the analyzed driver status data, and the output is a warning message. Specific actions such as voice alerts, display notifications, and vibrations are selected to inform the driver of the abnormal condition.
[0419] Step 4:
[0420] After issuing an alert, the terminal sends the collected data to the server. The input is the entire operational data, and the output is a data record on cloud storage. The data is securely transferred using the SSL/TLS protocol and prepared for data analysis.
[0421] Step 5:
[0422] The server analyzes the received data and examines each driver's driving pattern. The input is driving data stored in cloud storage, and the output is feedback information for the driver. By using a generative AI model, prompt messages are created and the feedback provided to the driver is customized.
[0423] Step 6:
[0424] Based on the feedback provided, users implement improvements that contribute to safer driving. The input is feedback documents provided by the server, and the output is improved driving habits. Specifically, they follow suggestions such as "take sufficient rest on the next drive."
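Step 2 turns facial feature information into blinking frequency data, but the disclosure does not state how. One widely used technique, shown here as a substitute and not as the method of this disclosure, is the eye aspect ratio (EAR) computed from six eye landmarks. The landmark ordering, the `closed_below` threshold of 0.2, and the `min_frames` value are assumptions for illustration.

```python
from math import dist


def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered p1..p6 around one eye
    (p1 and p4 are the corners; p2, p3 upper lid; p6, p5 lower lid)."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)
    horizontal = dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, closed_below=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with EAR below the threshold."""
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < closed_below:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    # A run that is still open at the end of the series also counts.
    if run >= min_frames:
        blinks += 1
    return blinks
```

Dividing the blink count by the observation time gives the blinking frequency that Step 2 outputs.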
[0425] (Application Example 1)
[0426] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0427] Even as autonomous vehicles become commonplace, ensuring safety remains a challenge when a human driver must switch to manual operation in an unforeseen situation. In particular, it is essential to quickly detect and respond to dangers caused by driver fatigue or distraction.
[0428] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0429] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition based on the acquired data; and a warning means for notifying a warning when an abnormality is detected by the AI analysis means. This enables monitoring and real-time notification to allow the driver to safely perform manual operations in an autonomous vehicle.
[0430] "Image acquisition means" refers to a device or method for detecting the driver's face and analyzing the frequency of blinking and eye movements.
[0431] "AI analysis means" refers to a process that uses artificial intelligence to determine the driver's abnormal condition based on acquired data.
[0432] "Warning means" refers to a function that alerts the driver visually or audibly when an abnormality is detected by the AI analysis means.
[0433] "Data transmission means" refers to a method for recording driving patterns by sending data to a server after notification by a warning means.
[0434] "Real-time notification means" refers to a function that instantly displays information on the control panel of an autonomous vehicle.
[0435] "Voice analysis means" refers to a method of acquiring ambient sounds during operation and improving the accuracy of judging abnormal conditions based on that data.
[0436] This invention provides a system for enabling a driver to safely perform manual operations in an autonomous vehicle. To achieve this, the server uses image acquisition means to detect the driver's face. The image acquisition means uses a terminal equipped with a camera, which is fixed in an appropriate position within the vehicle. The terminal continuously captures the driver's face and analyzes the frequency of blinking and eye movements.
[0437] The analyzed image data is processed in real time by AI analysis tools. This AI analysis utilizes machine learning libraries such as TensorFlow and Keras. If an abnormal condition is detected as a result of the analysis, a warning system is activated for the user (driver), providing visual or audible alerts.
[0438] Furthermore, after a warning is issued, the terminal uses a data transmission mechanism to send the acquired data to a server. This data records the driver's driving patterns and is used for future improvements. The server is equipped with a voice analysis mechanism, which acquires ambient sounds during driving and improves the accuracy of the AI analysis.
[0439] As a specific example, if the system detects an increase in blinking frequency during long-distance driving, indicating driver fatigue, it can immediately present a visual and audible message such as "Caution: Driver fatigue is detected. A break is recommended," encouraging safe driving.
[0440] An example of a prompt message for a generative AI model might be: "I want to develop an app that detects and warns of blinking and gaze abnormalities for a real-time monitoring AI assistant for drivers. What kind of data and model should I use?"
[0441] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0442] Step 1:
[0443] The terminal detects the driver's face in real time. The input is video data from a camera, and a face detection algorithm is used to identify the face region. The output is the position information of the face. In this step, common computer vision techniques are used for face detection.
[0444] Step 2:
[0445] The device analyzes blink frequency and eye movement from the acquired facial region. The input is facial position information, and the output is blink frequency data and eye movement patterns. This data is analyzed by a deep learning model and compared to normal baseline data.
[0446] Step 3:
[0447] The terminal analyzes ambient sounds using acquired audio data. The input is audio data collected from a microphone, and the output is analysis data that detects noise and abnormal sounds. Audio analysis helps to understand changes in the driving environment and improve the accuracy of anomaly detection.
[0448] Step 4:
[0449] The AI analysis tool uses the analysis data from steps 2 and 3 to determine the driver's abnormal condition. The inputs are blink and gaze data, and voice analysis data, and the output is whether or not an abnormality is present. A machine learning model is used for this determination, and abnormalities exceeding a set threshold are detected.
[0450] Step 5:
[0451] When an anomaly is detected in the user (driver), the terminal immediately issues a warning. The input is an anomaly signal from the AI analysis system, and the output is a visual display or audio alert. In this step, specific actions are taken to draw the driver's attention.
[0452] Step 6:
[0453] The server receives data sent from the terminal after a warning and records long-term driving patterns. Inputs are driving patterns and anomaly detection data, and output is a database of driving history. This data will be useful for improving driving habits in the future.
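The determination in Step 4, which combines the blink and gaze data with the audio analysis data and compares the result against a set threshold, can be sketched as follows. The disclosure states that a machine learning model is used; the fixed weighted sum below is a simplified stand-in, and the `Observation` type, the weights, and the threshold of 0.6 are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Each score is assumed to range from 0.0 (normal) to 1.0 (strongly abnormal).
    blink_score: float
    gaze_score: float
    audio_score: float


def is_driver_abnormal(obs, weights=(0.5, 0.3, 0.2), threshold=0.6):
    """Return (abnormal, combined_score) using a weighted sum of the three scores."""
    w_blink, w_gaze, w_audio = weights
    combined = (
        w_blink * obs.blink_score
        + w_gaze * obs.gaze_score
        + w_audio * obs.audio_score
    )
    return combined > threshold, combined
```

In a fuller implementation the weighted sum would be replaced by the output of a trained model, with the threshold applied to that output in the same way.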
[0454] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0455] This invention is a system that combines multiple sensors and AI technology using a portable terminal installed in the driver's seat to improve driver safety. This system comprehensively monitors the driver's condition by combining driver face detection and emotion recognition, and takes appropriate action when an abnormality is detected.
[0456] First, the device uses facial detection technology to capture the driver's face with a camera and collect facial expression data in real time. Next, an emotion engine analyzes this data to recognize the driver's emotional state. In this process, emotions such as joy, anger, surprise, and sadness are identified. The emotion engine can also determine the driver's level of stress and tension.
[0457] In addition, the device analyzes blink frequency and eye movements based on collected facial expression data to determine any discrepancies from normal driving patterns. If an abnormal condition is detected, for example, if the driver shows signs of stress, it will issue a warning to the driver using warning mechanisms. Alerts are provided through methods such as voice notifications, display notifications, and vibration feedback to encourage the driver to take a break.
[0458] Furthermore, the terminal collects ambient sound data while driving and uses voice analysis to enhance the accuracy of the emotion engine's judgments. The resulting facial recognition data, ambient sound data, and emotion data are transmitted to a server and recorded as a long-term history of driving conditions and emotional changes. The server analyzes the accumulated data to create specific advice for improving the driver's driving patterns and driving safely, and provides this advice to the user as needed.
[0459] As a concrete example, consider a situation where a driver is fatigued from driving for a long time, and their feelings of anger and frustration are increasing. In this case, the device detects this state with its emotion engine, issues a warning at the appropriate time to encourage the driver to take a break, and avoids potential dangers. Through this process, the invention accurately grasps the driver's state and contributes to accident prevention.
[0460] The following describes the processing flow.
[0461] Step 1:
[0462] The device activates a camera installed in the driver's seat to recognize the driver's face. Using a facial recognition algorithm, it detects the driver's face position and tracks blinking frequency and eye movements in real time.
[0463] Step 2:
[0464] The device inputs the acquired facial expression data into an emotion engine to analyze the driver's emotional state. Emotions such as joy, anger, surprise, and sadness are identified, and the degree of stress and tension is also evaluated.
[0465] Step 3:
[0466] The device integrates the results of emotion analysis with the results of blink frequency and eye movement analysis to determine whether the driver is exhibiting any abnormal behavior. For example, increased blinking, fixed gaze, and an increase in negative emotions are considered abnormal.
[0467] Step 4:
[0468] If an anomaly is detected, the device will use warning mechanisms to alert the driver. The driver will be alerted through voice alerts, on-screen messages, or device vibrations. Simultaneously, a message will be sent to the driver encouraging them to take a break.
[0469] Step 5:
[0470] The device periodically transmits facial recognition data, emotion data, and ambient sound data collected during driving to a server. The transmitted data is used to analyze long-term driving patterns and emotional changes.
[0471] Step 6:
[0472] The server analyzes the collected data and records the driver's driving history and changes in their emotional state. Based on these results, it creates specific safe driving advice and improvement plans for the driver and provides feedback to the user as needed.
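The integration in Step 3, where increased blinking, a fixed gaze, and an increase in negative emotions are treated as abnormal, can be sketched as a simple rule check. Treating anger and sadness as the negative emotions, and every limit value below, are assumptions made for illustration; the disclosure does not give concrete values.

```python
# Assumption: of the emotions named in the description, these two count as negative.
NEGATIVE_EMOTIONS = {"anger", "sadness"}


def detect_abnormal_signs(
    blink_rate,
    baseline_blink_rate,
    gaze_fixed_seconds,
    emotion_scores,
    blink_ratio_limit=1.5,
    gaze_limit_s=5.0,
    negative_limit=0.6,
):
    """Return the list of abnormal signs found; an empty list means none."""
    signs = []
    if baseline_blink_rate > 0 and blink_rate / baseline_blink_rate > blink_ratio_limit:
        signs.append("increased blinking")
    if gaze_fixed_seconds > gaze_limit_s:
        signs.append("fixed gaze")
    negative = sum(
        score for name, score in emotion_scores.items() if name in NEGATIVE_EMOTIONS
    )
    if negative > negative_limit:
        signs.append("negative emotions")
    return signs
```

A non-empty result would trigger the warning in Step 4, and the list itself can be included in the data sent to the server in Step 5.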
[0473] (Example 2)
[0474] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0475] Driver inattention and emotional fluctuations are often major contributing factors to traffic accidents. Therefore, it is necessary to improve safety by accurately monitoring the driver's condition in real time and providing appropriate warnings. However, conventional technology has the challenge of not being able to adequately recognize the driver's emotions or stress levels, making immediate response difficult.
[0476] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0477] In this invention, the server includes means for detecting the driver's face and collecting facial expression data, means for analyzing the collected facial expression data and recognizing the emotional state, means for analyzing the blinking frequency and eye movements and identifying differences from the normal driving pattern, means for acquiring and analyzing ambient sounds during driving, and means for notifying the driver of a warning when an abnormality is detected, transmitting data to the server, and recording the driving pattern. This enables comprehensive monitoring of the driver's condition and real-time measures to improve safety.
[0478] "Image acquisition means" refers to technology or equipment for detecting the driver's face and collecting facial expression data.
[0479] "AI analysis means" refers to a processing means that uses artificial intelligence technology to analyze collected facial expression data and recognize the driver's emotional state.
[0480] "Biometric signal analysis means" refers to a method or apparatus for analyzing blink frequency and eye movement to identify differences from normal operating patterns.
[0481] "Warning means" refers to a notification means that alerts the driver when an abnormality is detected by the AI analysis means.
[0482] "Voice analysis means" refers to a technology that acquires audio from the driving environment and analyzes that data to enhance the accuracy of judging abnormal conditions.
[0483] "Data transmission means" refers to a communication means for sending the analyzed data to a server and recording the driving pattern.
[0484] This invention is a system for improving driver safety, and in particular aims to monitor the driver's condition in real time and provide appropriate alerts when abnormalities are detected. The system mainly consists of a terminal and server installed in the driver's seat, and combines multiple sensors and AI technology.
[0485] The device is equipped with an image acquisition mechanism using a high-resolution camera. This camera recognizes the driver's face and collects facial expression data. The collected data is processed by an AI analysis mechanism within the device to identify the driver's emotional state. An emotion recognition algorithm can be used to identify the emotional state, recognizing basic emotions such as joy, anger, surprise, and sadness.
[0486] Furthermore, the terminal is equipped with biosignal analysis capabilities to detect blinking frequency and eye movements. Based on this information, if movements different from the normal driving pattern are detected, it is judged to be abnormal. In addition, the terminal is equipped with voice analysis capabilities to acquire ambient sounds inside the car using a microphone. After noise is removed from the acquired voice data, it is analyzed by AI analysis capabilities to complement the judgment of emotional state.
[0487] If the device detects an anomaly, it will alert the driver using a warning system. The alert will be delivered either by voice or displayed on the screen. It can also provide physical feedback by gently vibrating the seat to encourage the driver to take a break.
[0488] All analysis data is transmitted from the terminal to the server. The server records the history of each driver's driving patterns and emotional states. This information is securely stored using a cloud server and analyzed using AI analysis tools. The analysis results are provided to the user as personalized safe driving advice.
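The transmission of analysis data to the server can be sketched with the Python standard library. The sketch only prepares an HTTPS request and a TLS context without sending anything. The endpoint URL, the record fields, and the function name `build_upload_request` are hypothetical, and requiring TLS 1.2 or later is an assumption, not a requirement stated in the disclosure.

```python
import json
import ssl
import urllib.request


def build_upload_request(url, record):
    """Prepare a JSON POST request and a certificate-verifying TLS context."""
    if not url.startswith("https://"):
        raise ValueError("upload URL must use HTTPS")
    body = json.dumps(record).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The default context verifies the server certificate and host name.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return request, context
```

The terminal would then call `urllib.request.urlopen(request, context=context)` to perform the actual transfer.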
[0489] As a concrete example, consider a driver who is fatigued from driving for a long time and whose emotions are unstable. In this case, the device can detect the abnormality from the driver's facial expressions and voice data, and prompt them to take a break with voice notifications and vibration feedback. In this way, it is possible to avoid potential dangers and ensure the driver's safety.
[0490] An example of a prompt to input into the generative AI model is, "Design a system that analyzes the driver's facial expressions and ambient sound data, monitors their emotional state in real time, and improves safety." This allows the system to comprehensively understand the driver's state and provide support for safe driving.
[0491] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0492] Step 1:
[0493] The terminal uses a facial recognition camera to detect the driver's face and acquire facial expression data. The input data is the video feed from the camera, and the output is digital data containing the driver's face region and its facial features. In this step, image processing techniques are used to identify the face and extract the information.
[0494] Step 2:
[0495] The terminal inputs the acquired facial expression data into an AI analysis system to analyze the driver's emotional state. The input is the facial expression data obtained in step 1, and the output is numerical data indicating the driver's emotional classification (e.g., joy, anger, sadness, etc.) and its intensity. This process involves specific actions using a pre-trained emotion recognition algorithm.
[0496] Step 3:
[0497] The terminal analyzes blink frequency and eye movements using biosignal analysis tools. The input is continuous video data from a camera, and the output is vector data of the driver's blink frequency and gaze direction. In this step, the continuous video data is transformed so that monitoring and motion detection algorithms can be applied to it.
[0498] Step 4:
[0499] The device acquires ambient sound using a sound collection device and analyzes it using a speech analysis method. The input data is the audio signal from the microphone, and the output is characteristic data of the sound environment. Specifically, it performs speech filtering and frequency analysis, and applies noise cancellation to improve accuracy.
[0500] Step 5:
[0501] The terminal integrates the above analysis results and detects anomalies by comparing them with driving patterns. Inputs include the aforementioned emotional state, blink and gaze data, and voice analysis results, while output is an alert message indicating the presence or absence of an anomaly and its details. This step includes specific actions taken to detect anomalies based on AI judgment, compared with normal driving data.
[0502] Step 6:
[0503] If an anomaly is detected, the terminal activates a warning mechanism, providing the driver with an alert via voice notification, display, or seat vibration. The input is the result of the anomaly detection, and the output is feedback to the driver. Specific actions in this step include sending signals to the user interface.
[0504] Step 7:
[0505] The analyzed data is sent from the terminal to the server and recorded as a history of driving patterns and emotional states. The input is all the analyzed data, and the output is the historical data stored on the server. This step involves the specific operation of data transmission using a secure communication protocol.
[0506] This process allows the entire system to comprehensively monitor the driver's condition and provide timely support to improve safety.
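The filtering and frequency analysis of ambient sound in Step 4 can be illustrated with a small pure-Python sketch. A moving average stands in for the noise reduction, and a naive discrete Fourier transform finds the dominant frequency. Both are simplified substitutes chosen for illustration (a production system would more likely use an FFT library), and all function names are assumptions.

```python
import cmath
import math


def moving_average(samples, width=3):
    """Smooth the signal with a centered moving average (a simple noise-reduction stand-in)."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


def rms(samples):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def dominant_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC component, via a naive O(n^2) DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n
```

The level and dominant frequency are two examples of the "characteristic data of the sound environment" that Step 4 outputs and Step 5 consumes.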
[0507] (Application Example 2)
[0508] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0509] There is a need for technology that can accurately detect changes in the driver's emotional state and fatigue while driving, and support safe driving. However, conventional technology has the challenge of being unable to make highly accurate judgments about abnormal conditions while taking into account the driver's emotions and ambient sounds.
[0510] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0511] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements, an emotion recognition means for determining the driver's emotional state in real time, and a voice analysis means for acquiring ambient sounds and improving the accuracy of determining abnormal conditions. This enables high-precision monitoring of the driver's various states and supports safe driving.
[0512] The "image acquisition means" is a device that detects the driver's face and analyzes the frequency of blinking and eye movements.
[0513] "AI analysis means" refers to a processing means that uses artificial intelligence to analyze collected data and determine abnormal conditions in the driver.
[0514] "Emotion recognition means" refers to a means that determines the driver's emotional state in real time based on facial expression data.
[0515] "Warning means" refers to a means that notifies the driver of an abnormality when an abnormality is detected by the AI analysis means.
[0516] The "data transmission means" is a function that transmits driving data to an information processing device after notification and records the driving pattern.
[0517] "Voice analysis means" refers to a method for acquiring ambient sounds during operation and using them to improve the accuracy of judging abnormal conditions.
[0518] Modes for carrying out the invention
[0519] This invention realizes a system that monitors the driver's condition with high precision and supports safe driving. The server can detect the driver's face using a terminal installed in the driver's seat. Using the terminal's camera, it determines the position of the face in real time and analyzes the frequency of blinking and eye movements. The hardware used can be a smartphone with a built-in camera or a dedicated device. Libraries such as OpenCV are used for image processing.
[0520] Furthermore, by using an AI model as an emotion recognition tool, the driver's emotional state can be determined in real time from their facial expression data. Deep learning frameworks such as TensorFlow and PyTorch are used here.
[0521] When an abnormal condition is detected, the terminal notifies the driver of a warning through voice notification or display. Speech synthesis technology can be used for the warning, providing appropriate feedback in real time.
[0522] Subsequently, driving data is transmitted to a server via a data transmission device. This records driving patterns and allows for long-term data analysis. Based on the collected data, the server can generate personalized advice for the driver using an AI model.
[0523] As a concrete example, if the system detects signs of fatigue based on facial expression data while a driver is driving long distances, it will issue a voice warning and prompt the driver to take a break. In this way, it is possible to improve driving habits while ensuring the driver's safety.
[0524] An example of a prompt message for a generative AI model would be: "Input the following facial expression data into the facial expression analysis model to identify the driver's emotional state. Input data: smile, blinking frequency, eye movements. Output: emotion label, stress level."
[0525] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0526] Step 1:
[0527] The device uses a camera to detect the driver's face in real time. The input is video from the camera, and the output is the position information of the face. The OpenCV library is used to analyze the video frame by frame to detect the face.
[0528] Step 2:
[0529] The device processes detected facial expressions using AI analysis tools to determine the emotional state. The input is facial feature point data, and the output is an emotion label and stress level. Facial expression analysis is performed using a pre-trained deep learning model with TensorFlow.
[0530] Step 3:
[0531] The device will issue a warning to the user if an abnormal emotional state is detected by AI analysis. Inputs are emotion labels and stress levels, and output is either a voice message or a display notification. Speech synthesis technology is used to generate and notify the warning in real time.
[0532] Step 4:
[0533] The terminal transmits driving data and emotional state data to a server, recording the history in a database. The input is a set of the driver's emotional state and driving data, and the output is the recorded data entry. A data transmission protocol is used to securely transfer the data to the server.
[0534] Step 5:
[0535] The server analyzes the collected data and generates advice for the driver using a generative AI model. The input is historical driving data and emotion labels, and the output is specific advice for improving driving. A Python script is used to perform data analysis and drive the AI model.
[0536] Step 6:
[0537] Users review the advice received from the server and use it to improve their safe driving practices. The input is advice messages from the server, and the output is the driver's improved behavior. Users review the displayed advice and apply it to their next driving session.
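The recording of history in Step 4 and the per-driver analysis in Step 5 can be sketched with an in-memory SQLite database. The table layout, the column names, and the summary statistic are hypothetical; the disclosure does not specify a database schema, and the generative AI call that would turn the summary into advice is omitted.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the history database; the schema is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "driver_id TEXT, recorded_at TEXT, emotion_label TEXT, stress_level REAL)"
    )
    return conn


def record_entry(conn, driver_id, recorded_at, emotion_label, stress_level):
    """Store one emotional-state record; the transaction commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO history VALUES (?, ?, ?, ?)",
            (driver_id, recorded_at, emotion_label, stress_level),
        )


def summarize_driver(conn, driver_id):
    """Return the number of records and the mean stress level for one driver."""
    row = conn.execute(
        "SELECT COUNT(*), AVG(stress_level) FROM history WHERE driver_id = ?",
        (driver_id,),
    ).fetchone()
    return {"entries": row[0], "average_stress": row[1]}
```

The summary returned by `summarize_driver` is the kind of historical input that could be placed into a prompt when the server generates advice in Step 5.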
[0538] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0539] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, is input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0540] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0541] [Fourth Embodiment]
[0542] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0543] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0544] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0545] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0546] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts it into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0547] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0548] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0549] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0550] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0551] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0552] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0553] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0554] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0555] This invention relates to a system that ensures driver safety by installing a mobile device such as a smartphone in the driver's seat of a vehicle and continuously monitoring the driver's face using its front camera. This system has the function of monitoring the driver's blinking frequency and eye movements in real time and issuing a warning when an abnormality is detected. It also simultaneously records ambient sounds to improve the accuracy of abnormality detection.
[0556] First, the device uses facial recognition to detect the driver's face and acquires its location and movement information. This makes it possible to detect early signs of the driver becoming drowsy or inattentive. The device then inputs the acquired data into an AI analysis system to analyze the driver's blinking frequency and eye movement patterns, and compares them to normal conditions.
[0557] When AI analysis detects an anomaly, the device uses warning mechanisms to alert the driver. This alert can be delivered via voice, on-screen display, or vibration to draw the driver's attention. For example, if the user's blinking frequency increases significantly while driving, they might receive a message such as, "Please be careful. We recommend taking a break."
[0558] Furthermore, the data collected by the terminal is periodically sent to the server. The server analyzes this data, recording and accumulating each driver's driving pattern. This allows for a long-term understanding of drivers' driving habits and enables the proposal of specific guidance to promote safe driving.
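The periodic transmission described above could, for example, serialize each record as JSON before sending it. The sketch below is only an illustration under that assumption: the `DrivingRecord` class, its field names, and the choice of JSON are hypothetical, loosely modeled on the items named in Step 4 of the processing flow (operating time, blinking frequency, and eye movement characteristics).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DrivingRecord:
    """Hypothetical record sent from the terminal to the server."""
    driver_id: str
    operating_time_s: int        # operating time in seconds
    blinks_per_minute: float     # blinking frequency
    gaze_off_road_ratio: float   # one possible eye movement characteristic
    anomaly_detected: bool


def to_payload(record: DrivingRecord) -> str:
    """Serialize a record to a JSON string with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up sample values for illustration.
payload = to_payload(DrivingRecord("driver-001", 5400, 22.5, 0.12, True))
print(payload)
```

On the server side, `json.loads(payload)` recovers the same fields so that they can be accumulated per driver.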
[0559] This system can prevent dangerous behaviors such as drowsy driving, significantly improving safety for both the driver and the user. Furthermore, it provides appropriate feedback tailored to each user's individual circumstances, contributing to improved driving habits.
[0560] The following describes the processing flow.
[0561] Step 1:
[0562] The device activates a camera installed in the driver's seat to detect the driver's face. Using a face detection algorithm, it recognizes the driver's face position and collects basic data for tracking blink frequency and eye movements in real time.
[0563] Step 2:
[0564] The device inputs the acquired facial movement data into an AI analysis system, which analyzes blinking frequency and eye movement. The AI compares this data to normal driving patterns to determine if there are any abnormalities.
[0565] Step 3:
[0566] If an anomaly is detected through AI analysis, the device will immediately alert the driver using various warning methods. For example, it may provide voice messages such as, "Signs of drowsiness have been detected. Please take a break," alerts on the screen display, or vibration feedback.
[0567] Step 4:
[0568] The terminal periodically sends data to the server regarding abnormal occurrences and normal operation data. This data includes operating time, blinking frequency, and eye movement characteristics.
[0569] Step 5:
[0570] The server analyzes the received data and stores the driver's driving history and patterns. This stored data is used to provide personalized safe driving advice and improvement plans for each driver. The server provides this information to the user and notifies them of points to consider during their next drive.
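The comparison with normal driving patterns in Step 2 is not specified in detail. As one simple stand-in, the following Python sketch flags a blink rate that lies far from a driver's baseline, using a z-score. The function name, the limit of 2.0 standard deviations, and the sample values are assumptions for illustration only; the AI analysis described in this disclosure may work quite differently.

```python
from statistics import mean, stdev
from typing import Sequence


def is_blink_rate_abnormal(
    baseline: Sequence[float], current: float, z_limit: float = 2.0
) -> bool:
    """Return True when `current` lies more than `z_limit` sample standard
    deviations away from the mean of the baseline blink rates.

    `baseline` needs at least two values, as required by statistics.stdev.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > z_limit


# Made-up blink rates (blinks per minute) recorded during normal driving.
normal_rates = [15.0, 17.0, 16.0, 14.0, 18.0]
print(is_blink_rate_abnormal(normal_rates, 16.5))  # False
print(is_blink_rate_abnormal(normal_rates, 30.0))  # True
```

Because the check uses the absolute deviation, it reacts to both unusually high and unusually low blink rates.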
[0571] (Example 1)
[0572] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0573] The present invention aims to provide a new system for promoting safe driving by preventing dangerous behaviors such as decreased driver attention and drowsy driving. Conventional systems have the problem of insufficient accuracy in detecting abnormal driver conditions, making it difficult to provide appropriate warnings and feedback. Therefore, it is necessary to quickly and accurately detect abnormalities during driving and provide feedback tailored to individual drivers.
[0574] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0575] In this invention, the server includes at least one image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition by comparing the acquired data with normal driving data; a warning means for notifying the driver of a warning via voice or display when an abnormality is detected by the AI analysis means; a data transmission means for transmitting data to a data storage device after notification by the warning means to record and analyze long-term driving patterns; and a means for creating prompt sentences using a generative AI model to customize the feedback provided to the driver. This enables rapid detection of abnormal driving conditions and the provision of appropriate warnings and feedback to the driver.
[0576] "Image acquisition means" refers to a combination of hardware and software used to detect the driver's face and analyze the frequency of blinking and eye movements.
[0577] "AI analysis means" refers to a system that uses artificial intelligence technology to automatically determine abnormal conditions in the driver based on acquired data compared with normal driving data.
[0578] "Warning means" refers to a device or software function that notifies the driver of an abnormality detected by AI analysis means, either through voice or display.
[0579] "Data transmission means" refers to communication technologies and protocols for transmitting data to a data storage device after notification by a warning means.
[0580] A "generative AI model" refers to an artificial intelligence model that has the ability to create prompts to customize the feedback given to the driver.
[0581] This invention is a system that utilizes a portable information terminal installed in the driver's seat to prevent dangerous behaviors such as decreased driver attention and drowsy driving. This system can detect the driver's face and analyze blinking frequency and eye movement in real time. Specifically, a smartphone or tablet is used as the hardware, and a general-purpose facial recognition API based on current facial recognition technology is used as the software.
[0582] The device utilizes facial recognition to instantly identify the driver's face from video data and continuously monitors their location and status. This facial recognition is implemented using machine learning libraries and platforms, such as TensorFlow and PyTorch. This allows for immediate detection of drowsiness or distraction in the driver, enabling the system to respond to changes in the driver's condition.
[0583] Furthermore, in addition to visual data during driving, ambient sounds are also collected, and voice analysis technology is used to improve the accuracy of detecting abnormal conditions. A voice recognition library is used to process this voice data.
[0584] The server receives data transmitted from terminals, securely stores it in a cloud environment, and analyzes driving patterns over a long period. By utilizing analysis tools provided by cloud service providers, it is possible to create appropriate feedback based on the driving data of individual drivers.
[0585] When the AI analysis system detects an anomaly, the terminal immediately issues a warning to alert the driver. The warning is conveyed to the driver as an audio message, display, or vibration. For example, specific advice such as, "Your blinking frequency is higher than normal. We recommend you take a break," is provided.
[0586] Furthermore, by using a generative AI model, customized prompt messages are generated based on data for each driver, providing even more specific improvement suggestions. These prompt messages are created in the format of, "How should we prompt the driver if their attention is waning?" By using such prompts, users can understand their own driving habits and strive for safer driving.
[0587] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0588] Step 1:
[0589] The device activates the front camera and detects the driver's face. The input is real-time video data, and the output is the location and feature information of the detected face. This prepares the face recognition algorithm to monitor the driver's blinking frequency and eye movements.
[0590] Step 2:
[0591] The terminal continuously monitors the driver's blinking frequency and eye movements based on detected facial information. The input is facial feature information, and the output is blinking frequency data and eye movement patterns. Using AI analysis, this data is analyzed in real time and processed by comparing it to normal patterns.
[0592] Step 3:
[0593] Based on the analysis results, the device notifies the driver of any abnormalities in blinking frequency or gaze patterns. The input is the analyzed driver status data, and the output is a warning message. Specific actions such as voice alerts, display notifications, and vibrations are selected to inform the driver of the abnormal condition.
[0594] Step 4:
[0595] After issuing an alert, the terminal sends the collected data to the server. The input is the entire operational data, and the output is a data record on cloud storage. The data is securely transferred using the SSL / TLS protocol and prepared for data analysis.
[0596] Step 5:
[0597] The server analyzes the received data and examines each driver's driving pattern. The input is driving data stored in cloud storage, and the output is feedback information for the driver. By using a generative AI model, prompt messages are created and the feedback provided to the driver is customized.
[0598] Step 6:
[0599] Based on the feedback provided, users implement improvements that contribute to safer driving. The input is feedback documents provided by the server, and the output is improved driving habits. Specifically, they follow suggestions such as "take sufficient rest on the next drive."
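Step 2 above outputs blinking frequency data from facial feature information. One common way to obtain such data, shown here only as a sketch, is to track a per-frame eye-openness value and count each close-then-reopen transition as a blink. The openness scale, the two thresholds, and the function names are assumptions; the disclosure does not state how blinks are counted, and the per-frame values would come from whatever face landmark library the terminal uses.

```python
from typing import Sequence


def count_blinks(
    openness: Sequence[float], close_below: float = 0.2, open_above: float = 0.3
) -> int:
    """Count blinks in a per-frame eye-openness signal.

    A blink is counted when the value drops below `close_below` and later
    rises above `open_above`. Using two thresholds (hysteresis) avoids
    double-counting when the signal jitters around a single threshold.
    """
    blinks = 0
    eye_closed = False
    for value in openness:
        if not eye_closed and value < close_below:
            eye_closed = True
        elif eye_closed and value > open_above:
            eye_closed = False
            blinks += 1
    return blinks


def blinks_per_minute(openness: Sequence[float], fps: float) -> float:
    """Convert a blink count into a rate, given the camera frame rate."""
    duration_min = len(openness) / fps / 60.0
    return count_blinks(openness) / duration_min if duration_min > 0 else 0.0


# Made-up openness values for ten consecutive frames.
frames = [0.35, 0.34, 0.10, 0.05, 0.33, 0.36, 0.15, 0.25, 0.10, 0.40]
print(count_blinks(frames))  # 2
```

The resulting rate can then be compared against the driver's normal pattern as described in Step 2.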
[0600] (Application Example 1)
[0601] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0602] Even when autonomous vehicles are standard equipment, ensuring safety remains a challenge when a human driver switches to manual operation in an unforeseen situation. In particular, it is essential to quickly detect and respond to dangers caused by driver fatigue or distraction.
[0603] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0604] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements; an AI analysis means for determining the driver's abnormal condition based on the acquired data; and a warning means for notifying a warning when an abnormality is detected by the AI analysis means. This enables monitoring and real-time notification to allow the driver to safely perform manual operations in an autonomous vehicle.
[0605] "Image acquisition means" refers to a device or method for detecting the driver's face and analyzing the frequency of blinking and eye movements.
[0606] "AI analysis means" refers to a process that uses artificial intelligence to determine the driver's abnormal condition based on acquired data.
[0607] "Warning means" refers to a function that alerts the driver visually or audibly when an abnormality is detected by the AI analysis means.
[0608] "Data transmission means" refers to a method for recording driving patterns by sending data to a server after notification by a warning means.
[0609] "Real-time notification means" refers to a function that instantly displays information on the control panel of an autonomous vehicle.
[0610] "Voice analysis means" refers to a method of acquiring ambient sounds during operation and improving the accuracy of judging abnormal conditions based on that data.
[0611] This invention provides a system for enabling a driver to safely perform manual operations in an autonomous vehicle. To achieve this, the server uses image acquisition means to detect the driver's face. The image acquisition means uses a terminal equipped with a camera, which is fixed in an appropriate position within the vehicle. The terminal continuously captures the driver's face and analyzes the frequency of blinking and eye movements.
[0612] The analyzed image data is processed in real time by AI analysis tools. This AI analysis utilizes machine learning libraries such as TensorFlow and Keras. If an abnormal condition is detected as a result of the analysis, a warning system is activated for the user (driver), providing visual or audible alerts.
[0613] Furthermore, after a warning is issued, the terminal uses a data transmission mechanism to send the acquired data to a server. This data records the driver's driving patterns and is used for future improvements. The server is equipped with a voice analysis mechanism, which acquires ambient sounds during driving and improves the accuracy of the AI analysis.
[0614] As a specific example, if the system detects an increase in blinking frequency during long-distance driving, indicating driver fatigue, it can immediately display a visual and audible message such as "Caution: Driver fatigue is detected. A break is recommended," encouraging safe driving.
[0615] An example of a prompt for the generative AI model might be: "I want to develop an app that detects and warns of blinking and gaze abnormalities for a real-time monitoring AI assistant for drivers. What kind of data and model should I use?"
[0616] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0617] Step 1:
[0618] The terminal detects the driver's face in real time. The input is video data from a camera, and a face detection algorithm is used to identify the face region. The output is the position information of the face. In this step, common computer vision techniques are used for face detection.
[0619] Step 2:
[0620] The device analyzes blink frequency and eye movement from the acquired facial region. The input is facial position information, and the output is blink frequency data and eye movement patterns. This data is analyzed by a deep learning model and compared to normal baseline data.
[0621] Step 3:
[0622] The terminal analyzes ambient sounds using acquired audio data. The input is audio data collected from a microphone, and the output is analysis data that detects noise and abnormal sounds. Audio analysis helps to understand changes in the driving environment and improve the accuracy of anomaly detection.
[0623] Step 4:
[0624] The AI analysis tool uses the analysis data from steps 2 and 3 to determine the driver's abnormal condition. The inputs are blink and gaze data, and voice analysis data, and the output is whether or not an abnormality is present. A machine learning model is used for this determination, and abnormalities exceeding a set threshold are detected.
[0625] Step 5:
[0626] When an anomaly is detected in the user (driver), the terminal immediately issues a warning. The input is an anomaly signal from the AI analysis system, and the output is a visual display or audio alert. In this step, specific actions are taken to draw the driver's attention.
[0627] Step 6:
[0628] The server receives data sent from the terminal after a warning and records long-term driving patterns. Inputs are driving patterns and anomaly detection data, and output is a database of driving history. This data will be useful for improving driving habits in the future.
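Step 4 above combines the blink and gaze data with the voice analysis data and detects abnormalities that exceed a set threshold. The sketch below replaces the machine learning model mentioned in that step with a plain weighted sum, purely to show the shape of such a decision. The `Observation` fields, the 0.0 to 1.0 scale, the weights, and the threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    """Per-interval analysis results, each scaled from 0.0 (normal) to 1.0."""
    blink_deviation: float   # from Step 2
    gaze_deviation: float    # from Step 2
    audio_anomaly: float     # from Step 3


def abnormality_score(
    obs: Observation, weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)
) -> float:
    """Weighted sum standing in for the output of a trained model."""
    w_blink, w_gaze, w_audio = weights
    return (
        w_blink * obs.blink_deviation
        + w_gaze * obs.gaze_deviation
        + w_audio * obs.audio_anomaly
    )


def is_driver_abnormal(obs: Observation, threshold: float = 0.6) -> bool:
    """Report an abnormality when the combined score exceeds the threshold."""
    return abnormality_score(obs) > threshold


# Made-up sample observations.
print(is_driver_abnormal(Observation(0.9, 0.8, 0.5)))  # True
print(is_driver_abnormal(Observation(0.1, 0.2, 0.9)))  # False
```

In this sketch a loud or unusual sound environment alone does not trigger a warning, because the audio term carries the smallest weight; it only raises the score when the visual signs are also present.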
[0629] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0630] This invention is a system that combines multiple sensors and AI technology using a portable terminal installed in the driver's seat to improve driver safety. This system comprehensively monitors the driver's condition by combining driver face detection and emotion recognition, and takes appropriate action when an abnormality is detected.
[0631] First, the device uses facial detection technology to capture the driver's face with a camera and collect facial expression data in real time. Next, an emotion engine analyzes this data to recognize the driver's emotional state. In this process, emotions such as joy, anger, surprise, and sadness are identified. The emotion engine can also determine the driver's level of stress and tension.
[0632] In addition, the device analyzes blink frequency and eye movements based on collected facial expression data to determine any discrepancies from normal driving patterns. If an abnormal condition is detected, for example, if the driver shows signs of stress, it will issue a warning to the driver using warning mechanisms. Alerts are provided through methods such as voice notifications, display notifications, and vibration feedback to encourage the driver to take a break.
[0633] Furthermore, the terminal collects ambient sound data while driving and uses voice analysis to enhance the accuracy of the emotion engine's judgments. The resulting facial recognition data, ambient sound data, and emotion data are transmitted to a server and recorded as a long-term history of driving conditions and emotional changes. The server analyzes the accumulated data to create specific advice for improving the driver's driving patterns and driving safely, and provides this advice to the user as needed.
[0634] As a concrete example, consider a situation where a driver is fatigued from driving for a long time, and their feelings of anger and frustration are increasing. In this case, the device detects this state with its emotion engine, issues a warning at the appropriate time to encourage the driver to take a break, and avoids potential dangers. Through this process, the invention accurately grasps the driver's state and contributes to accident prevention.
[0635] The following describes the processing flow.
[0636] Step 1:
[0637] The device activates a camera installed in the driver's seat to recognize the driver's face. Using a facial recognition algorithm, it detects the driver's face position and tracks blinking frequency and eye movements in real time.
[0638] Step 2:
[0639] The device inputs the acquired facial expression data into an emotion engine to analyze the driver's emotional state. Emotions such as joy, anger, surprise, and sadness are identified, and the degree of stress and tension is also evaluated.
[0640] Step 3:
[0641] The device integrates the results of emotion analysis with the results of blink frequency and eye movement analysis to determine whether the driver is exhibiting any abnormal behavior. For example, increased blinking, fixed gaze, and an increase in negative emotions are considered abnormal.
[0642] Step 4:
[0643] If an anomaly is detected, the device will use warning mechanisms to alert the driver. The driver will be alerted through voice alerts, on-screen messages, or device vibrations. Simultaneously, a message will be sent to the driver encouraging them to take a break.
[0644] Step 5:
[0645] The device periodically transmits facial recognition data, emotion data, and ambient sound data collected during driving to a server. The transmitted data is used to analyze long-term driving patterns and emotional changes.
[0646] Step 6:
[0647] The server analyzes the collected data and records the driver's driving history and changes in their emotional state. Based on these results, it creates specific safe driving advice and improvement plans for the driver and provides feedback to the user as needed.
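Step 3 above names increased blinking, a fixed gaze, and an increase in negative emotions as signs of an abnormal state. The following sketch expresses that integration as three simple rules. Every numeric limit, the set of emotions treated as negative, and the function name are assumptions for illustration; in particular, the sketch checks the current share of negative emotion against a fixed limit as a simplification of "an increase in negative emotions".

```python
from typing import Dict, List

# Assumption: which emotion labels count as negative.
NEGATIVE_EMOTIONS = {"anger", "sadness"}


def abnormal_signs(
    blink_rate: float,
    baseline_blink_rate: float,
    gaze_fixed_seconds: float,
    emotion_scores: Dict[str, float],
) -> List[str]:
    """Return the list of abnormal signs found in one observation window."""
    signs = []
    if blink_rate > 1.5 * baseline_blink_rate:
        signs.append("increased blinking")
    if gaze_fixed_seconds > 5.0:
        signs.append("fixed gaze")
    negative = sum(
        score for name, score in emotion_scores.items() if name in NEGATIVE_EMOTIONS
    )
    if negative > 0.5:
        signs.append("negative emotion")
    return signs


# Made-up emotion engine output for a tired, irritated driver.
tired = {"joy": 0.1, "anger": 0.6, "surprise": 0.1, "sadness": 0.2}
print(abnormal_signs(30.0, 16.0, 6.0, tired))
```

A non-empty result would lead to Step 4, where the device issues the warning and encourages the driver to take a break.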
[0648] (Example 2)
[0649] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0650] Driver inattention and emotional fluctuations are often major contributing factors to traffic accidents. Therefore, it is necessary to improve safety by accurately monitoring the driver's condition in real time and providing appropriate warnings. However, conventional technology has the challenge of not being able to adequately recognize the driver's emotions or stress levels, making immediate response difficult.
[0651] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0652] In this invention, the server includes means for detecting the driver's face and collecting facial expression data, means for analyzing the collected facial expression data and recognizing the emotional state, means for analyzing the blinking frequency and eye movements and identifying differences from the normal driving pattern, means for acquiring and analyzing ambient sounds during driving, and means for notifying the driver of a warning when an abnormality is detected, transmitting data to the server, and recording the driving pattern. This enables comprehensive monitoring of the driver's condition and real-time measures to improve safety.
[0653] "Image acquisition means" refers to technology or equipment for detecting the driver's face and collecting facial expression data.
[0654] "AI analysis means" refers to a processing method that uses artificial intelligence technology to analyze collected facial expression data and recognize the driver's emotional state.
[0655] "Biometric signal analysis means" refers to a method or apparatus for analyzing blink frequency and eye movement to identify differences from normal operating patterns.
[0656] "Warning means" refers to a notification mechanism that alerts the driver when an abnormality is detected by the AI analysis means.
[0657] "Voice analysis means" refers to a technology that acquires audio from the driving environment and analyzes that data to enhance the accuracy of judging abnormal conditions.
[0658] "Data transmission means" refers to a communication means for sending the analyzed data to a server and recording the driving pattern.
[0659] This invention is a system for improving driver safety, and in particular aims to monitor the driver's condition in real time and provide appropriate alerts when abnormalities are detected. The system mainly consists of a terminal and server installed in the driver's seat, and combines multiple sensors and AI technology.
[0660] The device is equipped with an image acquisition mechanism using a high-resolution camera. This camera recognizes the driver's face and collects facial expression data. The collected data is processed by an AI analysis mechanism within the device to identify the driver's emotional state. An emotion recognition algorithm can be used to identify the emotional state, recognizing basic emotions such as joy, anger, surprise, and sadness.
[0661] Furthermore, the terminal is equipped with biosignal analysis capabilities to detect blinking frequency and eye movements. Based on this information, if movements different from the normal driving pattern are detected, it is judged to be abnormal. In addition, the terminal is equipped with voice analysis capabilities to acquire ambient sounds inside the car using a microphone. After noise is removed from the acquired voice data, it is analyzed by AI analysis capabilities to complement the judgment of emotional state.
[0662] If the device detects an anomaly, it will alert the driver using a warning system. The alert will be delivered either by voice or displayed on the screen. It can also provide physical feedback by gently vibrating the seat to encourage the driver to take a break.
[0663] All analysis data is transmitted from the terminal to the server. The server records the history of each driver's driving patterns and emotional states. This information is securely stored using a cloud server and analyzed using AI analysis tools. The analysis results are provided to the user as personalized safe driving advice.
[0664] As a concrete example, consider a driver who is fatigued from driving for a long time and whose emotions are unstable. In this case, the device can detect the abnormality from the driver's facial expressions and voice data, and prompt them to take a break with voice notifications and vibration feedback. In this way, it is possible to avoid potential dangers and ensure the driver's safety.
[0665] An example of a prompt to input into the generative AI model is, "Design a system that analyzes the driver's facial expressions and ambient sound data, monitors their emotional state in real time, and improves safety." This allows the system to comprehensively understand the driver's state and provide support for safe driving.
[0666] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0667] Step 1:
[0668] The terminal uses a facial recognition camera to detect the driver's face and acquire facial expression data. The input data is the video feed from the camera, and the output is digital data containing the driver's face region and its facial features. In this step, image processing techniques are used to identify the face and extract the information.
[0669] Step 2:
[0670] The terminal inputs the acquired facial expression data into an AI analysis system to analyze the driver's emotional state. The input is the facial expression data obtained in step 1, and the output is numerical data indicating the driver's emotional classification (e.g., joy, anger, sadness, etc.) and its intensity. This process involves specific actions using a pre-trained emotion recognition algorithm.
[0671] Step 3:
[0672] The terminal analyzes blink frequency and eye movements using biosignal analysis tools. The input is continuous video data from a camera, and the output is vector data of the driver's blink frequency and gaze direction. In this step, the continuous video data is monitored and motion detection algorithms are applied to it to derive these values.
[0673] Step 4:
[0674] The device acquires ambient sound using a sound collection device and analyzes it using a speech analysis method. The input data is the audio signal from the microphone, and the output is characteristic data of the sound environment. Specifically, it performs speech filtering and frequency analysis, and applies noise cancellation to improve accuracy.
[0675] Step 5:
[0676] The terminal integrates the above analysis results and detects anomalies by comparing them with normal driving patterns. The inputs are the aforementioned emotional state, blink and gaze data, and voice analysis results, and the output is an alert message indicating the presence or absence of an anomaly and its details. In this step, the AI compares the integrated results with normal driving data to judge whether an anomaly is present.
[0677] Step 6:
[0678] If an anomaly is detected, the terminal activates a warning mechanism, providing the driver with an alert via voice notification, display, or seat vibration. The input is the result of the anomaly detection, and the output is feedback to the driver. Specific actions in this step include sending signals to the user interface.
[0679] Step 7:
[0680] The analyzed data is sent from the terminal to the server and recorded as a history of driving patterns and emotional states. The input is all the analyzed data, and the output is the historical data stored on the server. This step involves the specific operation of data transmission using a secure communication protocol.
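Before transmission in Step 7, the analyzed data could be packaged as in the following sketch. Transport security (for example HTTPS) is assumed to be handled by the communication layer and is not shown; the record fields, the shared key, and the HMAC integrity tag are illustrative additions that the disclosure does not specify.

```python
import hashlib
import hmac
import json


def build_record(driver_id: str, recorded_at: str, analysis: dict, key: bytes) -> dict:
    """Serialize one analysis result and attach an HMAC-SHA256 tag."""
    body = json.dumps(
        {"driver_id": driver_id, "recorded_at": recorded_at, "analysis": analysis},
        sort_keys=True,
    )
    signature = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_record(record: dict, key: bytes) -> bool:
    """Server-side check that the body was not altered in transit."""
    expected = hmac.new(key, record["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


key = b"example-shared-key"  # hypothetical key, for illustration only
record = build_record("driver-001", "2024-12-10T09:30:00Z",
                      {"emotion": "tired", "blinks_per_min": 24.0}, key)
```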
[0681] This process allows the entire system to comprehensively monitor the driver's condition and provide timely support to improve safety.
[0682] (Application Example 2)
[0683] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0684] There is a need for technology that can accurately detect changes in the driver's emotional state and fatigue while driving, and support safe driving. However, conventional technology has the challenge of being unable to make highly accurate judgments about abnormal conditions while taking into account the driver's emotions and ambient sounds.
[0685] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0686] In this invention, the server includes an image acquisition means for detecting the driver's face and analyzing the frequency of blinking and eye movements, an emotion recognition means for determining the driver's emotional state in real time, and a voice analysis means for acquiring ambient sounds and improving the accuracy of determining abnormal conditions. This enables high-precision monitoring of the driver's various states and supports safe driving.
[0687] The "image acquisition means" is a device that detects the driver's face and analyzes the frequency of blinking and eye movements.
[0688] An "AI analysis device" is a processing device that uses artificial intelligence to analyze collected data and determine abnormal conditions in the driver.
[0689] An "emotion recognition system" is a system that determines the driver's emotional state in real time based on facial expression data.
[0690] A "warning device" is a device that notifies the driver of an abnormality when an abnormality is detected by an AI analysis device.
[0691] The "data transmission means" is a function that transmits driving data to an information processing device after notification and records the driving pattern.
[0692] "Voice analysis means" refers to a method for acquiring ambient sounds during operation and using them to improve the accuracy of judging abnormal conditions.
[0693] Modes for carrying out the invention
[0694] This invention realizes a system that monitors the driver's condition with high precision and supports safe driving. The server can detect the driver's face using a terminal installed in the driver's seat. Using the terminal's camera, it determines the position of the face in real time and analyzes the frequency of blinking and eye movements. The hardware used can be a smartphone with a built-in camera or a dedicated device. Libraries such as OpenCV are used for image processing.
[0695] Furthermore, by using an AI model as an emotion recognition tool, the driver's emotional state can be determined in real time from their facial expression data. Deep learning frameworks such as TensorFlow and PyTorch are used here.
[0696] When an abnormal condition is detected, the terminal notifies the driver of a warning through voice notification or display. Speech synthesis technology can be used for the warning, providing appropriate feedback in real time.
[0697] Subsequently, driving data is transmitted to a server via a data transmission device. This records driving patterns and allows for long-term data analysis. Based on the collected data, the server can generate personalized advice for the driver using an AI model.
[0698] As a concrete example, if the system detects signs of fatigue based on facial expression data while a driver is driving long distances, it will issue a voice warning and prompt the driver to take a break. In this way, it is possible to improve driving habits while ensuring the driver's safety.
[0699] An example of a prompt for the generative AI model would be: "Input the following facial expression data into the facial expression analysis model to identify the driver's emotional state. Input data: smile, blinking frequency, eye movements. Output: emotion label, stress level."
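A prompt of this form could be assembled from measured values as sketched below. The parameter names, units, and formatting are assumptions; only the surrounding wording is taken from the example prompt above.

```python
def build_emotion_prompt(smile: float, blinks_per_min: float, gaze: str) -> str:
    """Fill the example prompt with concrete measurements.

    `smile` is assumed to be a 0..1 score and `gaze` a short text
    description of the eye movements.
    """
    return (
        "Input the following facial expression data into the facial expression "
        "analysis model to identify the driver's emotional state.\n"
        f"Input data: smile={smile:.2f}, "
        f"blinking frequency={blinks_per_min:.1f}/min, "
        f"eye movements={gaze}.\n"
        "Output: emotion label, stress level."
    )


prompt = build_emotion_prompt(smile=0.10, blinks_per_min=22.0, gaze="frequent downward glances")
```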
[0700] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0701] Step 1:
[0702] The device uses a camera to detect the driver's face in real time. The input is video from the camera, and the output is the position information of the face. The OpenCV library is used to analyze the video frame by frame to detect the face.
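A minimal sketch of the frame-by-frame face detection in Step 1 is shown below. It assumes the opencv-python package and its bundled Haar cascade file; the disclosure names OpenCV but not a particular detector, so the cascade, its parameters, and the rule of keeping the largest face are assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels


def largest_face(boxes: List[Box]) -> Optional[Box]:
    """Pick the box with the largest area, assumed to be the driver."""
    return max(boxes, key=lambda b: b[2] * b[3], default=None)


def detect_face(frame) -> Optional[Box]:
    """Return the position of the driver's face in one BGR video frame."""
    import cv2  # imported lazily so largest_face works without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # In a real capture loop the classifier would be created once and reused.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    found = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_face([tuple(int(v) for v in box) for box in found])
```

Frames would typically come from cv2.VideoCapture, with detect_face called once per frame read.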
[0703] Step 2:
[0704] The device processes detected facial expressions using AI analysis tools to determine the emotional state. The input is facial feature point data, and the output is an emotion label and stress level. Facial expression analysis is performed using a pre-trained deep learning model with TensorFlow.
[0705] Step 3:
[0706] The device will issue a warning to the user if an abnormal emotional state is detected by AI analysis. Inputs are emotion labels and stress levels, and output is either a voice message or a display notification. Speech synthesis technology is used to generate and notify the warning in real time.
[0707] Step 4:
[0708] The terminal transmits driving data and emotional state data to a server, recording the history in a database. The input is a set of the driver's emotional state and driving data, and the output is the recorded data entry. A data transmission protocol is used to securely transfer the data to the server.
[0709] Step 5:
[0710] The server analyzes the collected data and generates advice for the driver using a generative AI model. The input is historical driving data and emotion labels, and the output is specific advice for improving driving. A Python script is used to perform data analysis and drive the AI model.
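The server-side analysis in Step 5 could be sketched as follows: the recorded history is condensed into a few statistics, which are then placed in a prompt for the generative AI model. The entry fields (emotion, stress), the summary statistics, and the prompt wording are assumptions, and the call to the model itself is omitted.

```python
from collections import Counter
from typing import Dict, List


def summarize_history(entries: List[Dict]) -> Dict:
    """Condense recorded entries into session count, most common emotion, and mean stress."""
    if not entries:
        return {"sessions": 0, "top_emotion": None, "mean_stress": 0.0}
    counts = Counter(entry["emotion"] for entry in entries)
    return {
        "sessions": len(entries),
        "top_emotion": counts.most_common(1)[0][0],
        "mean_stress": sum(entry["stress"] for entry in entries) / len(entries),
    }


def build_advice_prompt(summary: Dict) -> str:
    """Create the text that would be passed to the generative AI model."""
    return (
        f"Over {summary['sessions']} recorded sessions the driver's most common "
        f"emotion was '{summary['top_emotion']}' with a mean stress level of "
        f"{summary['mean_stress']:.2f}. Suggest specific advice for improving driving."
    )


history = [
    {"emotion": "tired", "stress": 0.8},
    {"emotion": "tired", "stress": 0.6},
    {"emotion": "calm", "stress": 0.1},
]
summary = summarize_history(history)
advice_prompt = build_advice_prompt(summary)
```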
[0711] Step 6:
[0712] Users review the advice received from the server and use it to improve their safe driving practices. The input is advice messages from the server, and the output is the driver's improved behavior. Users review the displayed advice and apply it to their next driving session.
[0713] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0714] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0715] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0716] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0717] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0718] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0719] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion is located, the more visible it becomes (that is, the more it is expressed in actions).
[0720] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0721] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0722] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
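The idea that emotions located close together on the map receive similar values can be illustrated with a small sketch that derives soft target values from map distance. The coordinates below are hypothetical placements (only "pleasure" on the upper side, "displeasure" on the lower side, and the 3 o'clock grouping follow the description), and the Gaussian falloff is an assumption, not the training procedure of the emotion identification model 59.

```python
import math
from typing import Dict, Tuple

# Hypothetical positions: (radius from center, clock position in hours).
EMOTION_POSITIONS: Dict[str, Tuple[float, float]] = {
    "reassured": (2.0, 3.0),
    "calm": (2.0, 3.2),
    "confident": (2.2, 2.8),
    "pleasure": (1.0, 12.0),
    "displeasure": (1.0, 6.0),
}


def to_xy(radius: float, clock_hour: float) -> Tuple[float, float]:
    """Convert a clock position to x/y, with 12 o'clock pointing up."""
    angle = math.radians(90.0 - 30.0 * clock_hour)
    return radius * math.cos(angle), radius * math.sin(angle)


def soft_targets(label: str, width: float = 1.0) -> Dict[str, float]:
    """Give each emotion a value that decays with map distance from `label`."""
    lx, ly = to_xy(*EMOTION_POSITIONS[label])
    targets = {}
    for name, position in EMOTION_POSITIONS.items():
        x, y = to_xy(*position)
        squared_distance = (x - lx) ** 2 + (y - ly) ** 2
        targets[name] = math.exp(-squared_distance / (2.0 * width ** 2))
    return targets


targets = soft_targets("calm")
```

With these placements, "reassured" and "confident" receive values close to that of "calm", while "displeasure" receives a much lower value.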
[0723] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0724] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0725] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0726] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0727] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0728] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0729] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0730] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0731] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0732] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0733] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0734] The following is further disclosed regarding the embodiments described above.
[0735] (Claim 1)
[0736] An image acquisition means that detects the driver's face and analyzes the frequency of blinking and eye movements,
[0737] An AI analysis method that determines the driver's abnormal condition based on acquired data,
[0738] A warning system that notifies the driver of an abnormality detected by an AI analysis system,
[0739] A data transmission means that sends data to a server after notification by a warning means and records the driving pattern,
[0740] A system that includes this.
[0741] (Claim 2)
[0742] The system according to claim 1, further comprising a voice analysis means for acquiring ambient sounds during operation and using the data to improve the accuracy of determining abnormal conditions.
[0743] (Claim 3)
[0744] The system according to claim 1, wherein the warning means further includes means for generating vibration as physical feedback.
[0745] "Example 1"
[0746] (Claim 1)
[0747] An image acquisition means that detects the driver's face and analyzes the frequency of blinking and eye movements,
[0748] An AI analysis method that determines the driver's abnormal condition by comparing the acquired data with normal driving data,
[0749] A warning means that, when an abnormality is detected by the AI analysis means, notifies the driver of the warning via voice or display,
[0750] A data transmission means for transmitting data to a data storage device after notification by a warning means, and for recording and analyzing long-term operating patterns,
[0751] A means of creating prompt sentences using a generative AI model to customize the feedback provided to the driver,
[0752] A system that includes this.
[0753] (Claim 2)
[0754] The system according to claim 1, further comprising a voice analysis means for acquiring ambient sounds during operation and using the data to improve the accuracy of determining abnormal conditions.
[0755] (Claim 3)
[0756] The system according to claim 1, wherein the warning means further includes means for generating vibration as physical feedback.
[0757] "Application Example 1"
[0758] (Claim 1)
[0759] An image acquisition means that detects the driver's face and analyzes the frequency of blinking and eye movements,
[0760] An AI analysis method that determines the driver's abnormal condition based on acquired data,
[0761] A warning system that notifies the driver of an abnormality detected by an AI analysis system,
[0762] A data transmission means that sends data to a server after notification by a warning means and records the driving pattern,
[0763] A real-time notification method displayed on the control panel of an autonomous vehicle,
[0764] A system that includes this.
[0765] (Claim 2)
[0766] The system according to claim 1, further comprising a voice analysis means for acquiring ambient sounds during operation and using the data to improve the accuracy of determining abnormal conditions.
[0767] (Claim 3)
[0768] The system according to claim 1, wherein the warning means further includes means for generating a display means as audio and visual feedback.
[0769] "Example 2 of combining an emotion engine"
[0770] (Claim 1)
[0771] An image acquisition means for detecting the driver's face and collecting facial expression data,
[0772] An AI analysis method that analyzes collected facial expression data to recognize emotional states,
[0773] A biosignal analysis means that analyzes blink frequency and eye movements and identifies differences from normal operating patterns,
[0774] A warning system that notifies the driver of an abnormality detected by an AI analysis system,
[0775] A voice analysis method that acquires and analyzes ambient sounds during driving,
[0776] A data transmission means that sends data to a server and records the driving pattern,
[0777] A system that includes this.
[0778] (Claim 2)
[0779] The system according to claim 1, wherein the warning means further includes means for notifying the driver of a warning by voice notification or display and encouraging them to take a break.
[0780] (Claim 3)
[0781] The system according to claim 1, further comprising means for complementing the accuracy of emotion analysis with the driving environment obtained by the voice analysis means.
[0782] "Application example 2 when combining with an emotional engine"
[0783] (Claim 1)
[0784] An image acquisition means that detects the driver's face and analyzes the frequency of blinking and eye movements,
[0785] An AI analysis method that determines the driver's abnormal condition based on acquired data,
[0786] An emotion recognition means that determines the driver's emotional state in real time,
[0787] A warning system that notifies the driver of an abnormality detected by an AI analysis system,
[0788] A data transmission means that transmits data to an information processing device after notification by a warning means and records the driving pattern,
[0789] A system that includes this.
[0790] (Claim 2)
[0791] The system according to claim 1, further comprising a voice analysis means for acquiring ambient sounds during operation and using the data to improve the accuracy of determining abnormal conditions.
[0792] (Claim 3)
[0793] The system according to claim 1, wherein the warning means further includes means for generating vibration as physical feedback.
[Explanation of Symbols]
[0794]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-type terminal
414 Robots
Claims
1. An image acquisition means that detects the driver's face and analyzes the frequency of blinking and eye movements,
An AI analysis method that determines the driver's abnormal condition based on acquired data,
A warning system that notifies the driver of an abnormality detected by an AI analysis system,
A data transmission means that sends data to a server after notification by a warning means and records the driving pattern,
A real-time notification method displayed on the control panel of an autonomous vehicle,
A system that includes this.
2. The system according to claim 1, further comprising a voice analysis means for acquiring ambient sounds during operation and using the data to improve the accuracy of determining abnormal conditions.
3. The system according to claim 1, wherein the warning means further includes means for generating a display means as audio and visual feedback.