system

The system addresses the challenge of real-time risk assessment and evidence recording by using AI to analyze object movement and automatically store video data, improving safety and providing reliable evidence.

JP2026104423A · Pending · Publication Date: 2026-06-25 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-13
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing technologies fail to effectively judge the risk of contact between moving objects in real time and provide immediate warnings, and they also lack reliable evidence recording in the event of accidents.

Method used

A system that uses an imaging device attached to a moving object to continuously acquire video data, analyze object movement and distance using AI, generate warnings for potential contact, and automatically record video for evidence storage.

Benefits of technology

Enhances safety by reducing accident risks through real-time risk assessment and secure evidence recording.



Abstract

[Problem] To provide a system. [Solution] A system including: means for continuously acquiring image information using a video acquisition device; means for analyzing the movement and spacing of objects based on the acquired image information; means for assessing, based on the analyzed information, the risk of a moving object coming into contact with an object and issuing a warning; means for automatically recording video when contact occurs and transferring it to an external information storage device; means for recognizing objects in the image analysis using a deep learning model; and means for transmitting the acquired video information to a remote location via a communication network.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] With the spread of personal mobile devices, the number of contact accidents between pedestrians and other vehicles has increased, and improving safety has become an important issue. In the conventional technology, it has been difficult to judge the risk of contact of a moving object in real time and send an immediate warning to the user. Also, obtaining and storing reliable evidence at the time of an accident has been an issue.

Means for Solving the Problems

[0005] This invention provides a system that continuously acquires video data using an imaging device attached to a moving object and determines the movement and distance of an object through AI-based image analysis. This system improves safety by generating a warning when a risk of contact is detected. It also has a function to automatically record video when contact occurs and save it to an external storage device, which can be used as evidence in the event of an accident.

[0006] A "mobile device" refers to a device that has a self-propelled function and assists an individual's movement, and is primarily electric.

[0007] An "imaging device" is a device used to continuously acquire video or image data using optical means.

[0008] "Image data" refers to digital data containing visual information acquired by an imaging device.

[0009] "Object" refers to any object or living being that is present around the moving object and may come into contact with it.

[0010] "AI-based image analysis" is a process that uses artificial intelligence technology to analyze video data and identify objects and predict their movement.

[0011] "Risk of contact" is a criterion that indicates the degree to which a moving object is likely to come into contact with objects in its surroundings.

[0012] A "warning" is an alert signal or message that prompts users to take safety measures when the risk of contact increases.

[0013] An "external storage device" is a recording medium or cloud service used to record and securely store video data from contact or accidents.

Brief Description of the Drawings

[0014] [Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.

[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.

[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.

[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.

[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.

[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.

[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.

[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.

[Figure 9] An emotion map to which a plurality of emotions are mapped.

[Figure 10] An emotion map to which a plurality of emotions are mapped.

[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.

[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.

[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.

[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, RAM (Random Access Memory) denoted by a reference numeral is memory in which information is temporarily stored and which the processor uses as work memory.

[0019] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tape.

[0020] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" may mean A alone, B alone, or a combination of A and B. The same interpretation applies in this specification when three or more items are linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention is a system for ensuring the safety of personal mobile devices, and is mainly composed of an imaging device attached to the mobile device and artificial intelligence technology. Specific embodiments of this system will be described below.

[0036] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control and AI functions and is ready to monitor safety in real time.

[0037] Next, the device uses its camera to continuously acquire video data of its surroundings. This captures objects in front of and around the moving object in the video, collecting information for safe operation. The acquired video data is processed by AI within the device to determine the movement and distance of the objects.

[0038] AI-powered image analysis uses deep learning models to accurately recognize pedestrians, vehicles, and other obstacles. Based on this information, the device evaluates the distance to each object and its movement in real time, and immediately generates an alert if it determines there is a risk of contact.

[0039] As a concrete example, suppose the device is operating in a busy area and suddenly detects a pedestrian crossing the road ahead. In this case, the AI calculates the risk of collision with the pedestrian and issues a warning to the user through an audio alert or an on-screen warning message. This allows the user to take quick and appropriate action.

[0040] Furthermore, if contact occurs, the terminal automatically records video of the scene and generates a video clip that includes information from a certain period prior. Once an internet connection becomes available, the recorded video is uploaded to an external storage device, i.e., a cloud service, and securely stored as evidence of the accident. The server strictly manages this data and can provide it to insurance companies or legal authorities as needed.
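The clip described above includes footage from a certain period before contact. One common way to make that possible is a fixed-size ring buffer of recent frames; the following is a minimal sketch assuming that approach, which the specification does not prescribe. The names `PreEventBuffer` and `build_clip` and the parameters `fps` and `pre_seconds` are hypothetical.

```python
from collections import deque


class PreEventBuffer:
    """Keeps only the most recent frames so a clip can start before contact."""

    def __init__(self, fps: int, pre_seconds: float) -> None:
        # Hypothetical parameters: frame rate and how much history to retain.
        self._frames = deque(maxlen=int(fps * pre_seconds))

    def push(self, frame) -> None:
        # Once the buffer is full, the oldest frame is discarded automatically.
        self._frames.append(frame)

    def snapshot(self) -> list:
        # Copy of the retained frames, oldest first.
        return list(self._frames)


def build_clip(buffer: PreEventBuffer, post_frames: list) -> list:
    # Clip = frames retained before contact + frames captured after it.
    return buffer.snapshot() + list(post_frames)
```

With this structure the terminal never needs to keep more than `fps * pre_seconds` frames in memory while no contact has occurred.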

[0041] Thus, the present invention is a system that embodies efforts to improve safety by reducing the risk of accidents during the operation of mobile objects and by quickly recording and saving the circumstances of accidents that occur.

[0042] The following describes the processing flow.

[0043] Step 1:

[0044] The user securely attaches their smartphone to the moving object and launches the dedicated application. The application automatically activates the camera and sensors and prepares for use.

[0045] Step 2:

[0046] The device uses the smartphone's camera to continuously acquire video data. This allows for the collection of information about the environment in front and around the device, which is then used for analysis.

[0047] Step 3:

[0048] The device applies an AI algorithm to each frame of the acquired video data to analyze the position, speed, and distance of objects. It utilizes a deep learning model to identify pedestrians, other vehicles, and stationary objects.

[0049] Step 4:

[0050] The device evaluates the risk of contact with the object in real time based on the analysis results. If the risk exceeds a certain threshold, it is determined that there is a risk of contact.

[0051] Step 5:

[0052] If the device detects a high risk of contact, it will immediately warn the user. The warning will be provided through audio alerts and on-screen displays to prompt the user to take immediate action.

[0053] Step 6:

[0054] The user receives an alert and takes safe driving actions such as changing lanes, slowing down, or stopping. This helps prevent accidents.

[0055] Step 7:

[0056] If contact occurs, the device immediately activates its recording function and records video before and after the contact. The recorded video is temporarily stored on the device.

[0057] Step 8:

[0058] If the device has an internet connection, it can upload recorded video data to an external storage device such as the cloud, ensuring secure storage.

[0059] Step 9:

[0060] The server will properly manage the video data stored in the cloud and establish a system to provide the data to insurance companies and legal authorities when necessary for accident investigation.
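Steps 4 and 5 compare a risk value against a threshold, but the specification does not define how the risk is computed. The sketch below assumes one simple possibility: a score derived from time-to-contact (distance divided by closing speed). The `TrackedObject` type, the 3-second horizon, and the 0.5 threshold are illustrative assumptions, not values from the specification.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    label: str                 # e.g. "pedestrian", "vehicle"
    distance_m: float          # estimated distance to the object
    closing_speed_mps: float   # positive when the gap is shrinking


def contact_risk(obj: TrackedObject, horizon_s: float = 3.0) -> float:
    """Map time-to-contact onto a 0..1 score (assumed scoring rule)."""
    if obj.closing_speed_mps <= 0:
        return 0.0  # the object is not approaching
    time_to_contact = obj.distance_m / obj.closing_speed_mps
    # Score is 1.0 at contact and falls to 0.0 at or beyond the horizon.
    return max(0.0, min(1.0, 1.0 - time_to_contact / horizon_s))


def should_warn(obj: TrackedObject, threshold: float = 0.5) -> bool:
    # Step 4: a risk above the threshold is treated as a risk of contact.
    return contact_risk(obj) > threshold
```

For example, a pedestrian 3 m away closing at 3 m/s has a time-to-contact of 1 s and would trigger a warning under these assumed values, while an object that is moving away never would.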

[0061] (Example 1)

[0062] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0063] Enhancing the safety of personal mobility devices has become a critical issue in recent years, given the increasing number of users. In particular, there is a need for systems that can appropriately detect risks and respond quickly to prevent accidental contact or collisions. Furthermore, in the event of a contact accident, it is essential to reliably record the circumstances to aid in subsequent responses and investigations. While these challenges can be addressed by enabling mobile devices to understand their surroundings and implement appropriate risk management, existing technologies have not adequately addressed them.

[0064] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0065] In this invention, the server includes means for continuously acquiring video information by a detection device attached to a mobile vehicle, means for recognizing an object using machine learning technology based on the acquired video information and evaluating its movement and estimated distance, and means for automatically recording video and saving it to a recording device located remotely when contact occurs. As a result, the mobile vehicle can monitor surrounding risks in real time, support safe operation, and in the event of an accident, record the situation to expedite subsequent responses.

[0066] A "detection device" is a device attached to a moving object that continuously acquires video information by continuously photographing the surrounding environment.

[0067] "Video information" refers to visual data about the surrounding environment acquired by a detection device.

[0068] "Machine learning techniques" are algorithms and methods that enable computer systems to identify patterns from data and make automatic decisions based on those patterns.

[0069] "Evaluation of movement and estimated distance" is a process that uses machine learning technology to analyze and evaluate the movement patterns and approach distances of objects from video information obtained by a detection device.

[0070] A "recording device" is a remote storage medium or system used to store data related to events that occur in a moving object.

[0071] A "cloud-based information storage service" is an online platform for storing and managing digital data that is located remotely and accessible via the internet.

[0072] A "mobile device" is any form of transport equipment designed to carry people or goods, including personal electric devices.

[0073] This invention is a system integrated into a personal mobile device and is designed to ensure user safety. The system mainly consists of a user-attached terminal (such as a smartphone), its built-in camera, and an application equipped with AI technology.

[0074] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control functions and AI, and is ready to monitor safety in real time.

[0075] Next, the device uses the smartphone's camera to continuously acquire video information of its surroundings. This video information is processed by a deep learning model using machine learning techniques to accurately recognize pedestrians, vehicles, and obstacles. The AI evaluates the movement and estimated distance of these objects and immediately generates an alert if danger is predicted. This alert is communicated to the user via voice and on-screen display.

[0076] If contact occurs, the server will save the video footage recorded by the terminal to a remote cloud-based information storage service. This ensures that evidence of the incident is securely stored and can be used later as reference if necessary.

[0077] To give a specific example, if the device detects a bicycle that suddenly appears in front of it in a busy area, and the risk of collision increases, the user will be given a voice alert saying, "Please slow down."

[0078] An example of a prompt to input into the generative AI model would be: "Please describe the operating process of the safety system for personal mobile devices. In particular, I would like to know more about the method used for object recognition by AI."

[0079] Through this invention, it becomes possible to support the safe operation of mobile vehicles and minimize the risk of accidents.

[0080] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0081] Step 1:

[0082] The user attaches their smartphone to a moving object and launches a dedicated application. This activates the camera and puts it into safety monitoring mode. The input is the status of the smartphone (camera activated), and the output is an instruction for the camera to start recording. In this process, the user physically secures the smartphone and manually launches the application.

[0083] Step 2:

[0084] The device uses the smartphone's camera to continuously acquire video information of its surroundings. The input is video information captured by the camera, and the output is video data generated in real time. This allows the device to continuously record visual information. In this operation, the camera captures the situation at a high frame rate and passes the video data to the application.

[0085] Step 3:

[0086] The AI installed in the device processes the acquired video data and uses a deep learning model to recognize objects. The input is the acquired video data, and the output is information about the recognized objects (pedestrians, vehicles, obstacles, etc.). In this operation, the AI uses machine learning algorithms to extract features from the video and perform identification.

[0087] Step 4:

[0088] The terminal evaluates the movement and estimated distance of recognized objects to determine the risk of contact. The input is object information, and the output is the result of the risk assessment. In this evaluation, the AI analyzes the movement of the object and calculates whether it deviates from a safe distance range.

[0089] Step 5:

[0090] The device generates an alert based on a risk assessment and notifies the user. The input is the result of the risk assessment, and the output is an alert message (audio or screen display). Specifically, the device emits a warning sound or visual instructions, allowing the user to take appropriate action based on them.

[0091] Step 6:

[0092] The server automatically saves video footage recorded by the terminal to a cloud-based recording device when contact occurs. The input is the video data of the incident to be saved, and the output is the video information securely stored in the cloud. This process involves uploading data externally using an internet connection.
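The upload in Step 6 depends on an internet connection, and the first embodiment notes that recorded video is uploaded once a connection becomes available. A store-and-forward queue is one way to handle that; the sketch below assumes this design. `UploadQueue` and the injected `upload` callable are hypothetical, and no particular cloud service API is implied.

```python
from collections import deque
from typing import Callable


class UploadQueue:
    """Holds recorded clips locally until they can be uploaded."""

    def __init__(self, upload: Callable[[str], bool]) -> None:
        # `upload` is a hypothetical uploader that returns True on success.
        self._upload = upload
        self._pending = deque()

    def add(self, clip_id: str) -> None:
        self._pending.append(clip_id)

    def pending(self) -> list:
        return list(self._pending)

    def flush(self, online: bool) -> list:
        """Upload queued clips in order; stop at the first failure."""
        sent = []
        while online and self._pending:
            clip_id = self._pending[0]
            if not self._upload(clip_id):
                break  # keep the clip queued and retry on a later flush
            self._pending.popleft()
            sent.append(clip_id)
        return sent
```

A clip is removed from the queue only after the uploader reports success, so a dropped connection does not lose evidence that is still stored on the terminal.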

[0093] (Application Example 1)

[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0095] In modern mobility technology, particularly autonomous driving systems, it is crucial to accurately perceive the surrounding environment in real time to ensure safety. However, existing systems often fail to adequately recognize the surrounding environment, increasing the risk of traffic accidents. Furthermore, there is a need for a system that can quickly and safely record and store the circumstances of an accident when it occurs.

[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0097] In this invention, the server includes means for continuously acquiring image information using a video acquisition device, means for analyzing the movement and distance of an object based on the acquired image information, and means for recognizing an object using a deep learning model for image analysis. This makes it possible for a moving object to grasp its surroundings in real time and accurately, thereby enhancing safety. Furthermore, by transmitting the acquired video information to a remote location via a communication network and automatically recording video at the time of contact, accidents can be recorded and saved quickly and reliably.

[0098] A "video acquisition device" is a device attached to a moving object to continuously acquire video information of the surroundings.

[0099] "Image information" refers to visual data acquired by video acquisition devices, which is used for the recognition and analysis of objects.

[0100] "Object" refers to dynamic or static elements such as objects or people that exist around a moving object.

[0101] A "mobile body" is a mechanism responsible for the movement of people or objects, and in this invention, it includes an automatic driving mechanism.

[0102] "Analysis" is the process of understanding and evaluating the content of acquired image information using a specific algorithm.

[0103] A "deep learning model" is a collection of algorithms built on AI technology, designed to recognize objects from image information with high accuracy.

[0104] An "external information storage device" is a device or system for storing data acquired from a mobile object, and in this invention, it refers to an information storage service located in a remote location.

[0105] A "communication network" is a system of telecommunications used to transfer information to a remote location, and includes the Internet.

[0106] "Real-time" refers to a time frame in which the situation in question can be processed and reacted to immediately.

[0107] The system for implementing this invention primarily involves the coordinated operation of a server and a terminal. The terminal is attached to a mobile device and uses a camera to acquire real-time images of the surroundings. The basic principle is to continuously acquire image information from the camera, which acts as an image acquisition device.

[0108] 1. Terminal processing:

[0109] The terminal is a smartphone or similar mobile device and analyzes the acquired image information using a deep learning model. The AI technologies used include frameworks such as TensorFlow® and PyTorch. This enables accurate recognition of objects such as pedestrians, vehicles, and bicycles, and assessment of the risk of contact with moving objects. When a risk is detected, the terminal issues a warning signal to the user.

[0110] 2. Server processing:

[0111] The server manages data transmitted from terminals in a cloud environment. Remote data storage services such as AWS® and Google® Cloud Storage are used as external data storage devices. The server receives acquired video information via the communication network and records the video if contact occurs. The stored data is kept secure and accessible for later reference.

[0112] 3. User roles:

[0113] The user operates the vehicle according to instructions from the terminal. When a warning is issued, the user receives audio and visual alerts and takes appropriate evasive action.

[0114] As a concrete example, at a busy intersection, the system detects at an early stage a cyclist crossing against the traffic signal and immediately warns the user so that a collision can be avoided. In this case, the generative AI model performs analysis using a prompt such as the following.

[0115] "To the AI model: Use this video data as input to identify pedestrians, cyclists, and vehicles, and calculate the risk of collision. Increase the frequency of alerts when pedestrians are close and notify the driver in real time."

[0116] In this way, the invention functions as a system that instantly grasps the surrounding situation and enhances safety in real time.

[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0118] Step 1:

[0119] The terminal uses a video acquisition device to continuously acquire image information from around a moving object. The input is video data obtained through the camera, and the output is raw image data.

[0120] Step 2:

[0121] The image data acquired by the device is analyzed using a deep learning model. In this process, the image data is input into an AI framework such as TensorFlow, and the model identifies objects (pedestrians, vehicles, etc.). The output is the identified objects, their location information, and predictions of their movement.

[0122] Step 3:

[0123] The device evaluates the risk of contact between the moving object and a detected object based on the analysis results. The input is data on the detected object and its movement, as identified by the deep learning model. The device uses this data to calculate the distance to the object and its speed and to determine the risk of contact. The output is the result of the risk evaluation.

[0124] Step 4:

[0125] The device issues a warning to the user based on a risk assessment. The input is the result of the risk assessment, and the output is an audio or visual alert. Specifically, the user is notified with a warning sound or on-screen display.

[0126] Step 5:

[0127] The server receives and stores image data of high-risk events sent from the terminal. The input is a video clip sent from the terminal. The server stores this in cloud storage and provides the data as needed upon request. The output is securely stored video data.

[0128] Step 6:

[0129] The user receives a warning from the device and takes appropriate evasive action. The input is the warning notification from the device. The user performs driving operations as needed to avoid the danger. The output is the evasive action taken.
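Step 3 above calculates the distance to an object and its speed, without stating how. As a rough sketch, a single camera can approximate distance from the height of a detected bounding box using the pinhole-camera relation (distance = focal length × real height / pixel height), and closing speed from the change in distance between frames. This method is an assumption for illustration, not something the specification describes; the function names and parameters are hypothetical.

```python
def estimate_distance_m(box_height_px: float,
                        real_height_m: float,
                        focal_length_px: float) -> float:
    """Pinhole approximation: distance = f * H / h (assumed method)."""
    if box_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_length_px * real_height_m / box_height_px


def closing_speed_mps(prev_distance_m: float,
                      curr_distance_m: float,
                      dt_s: float) -> float:
    """Positive when the object is getting closer between two frames."""
    if dt_s <= 0:
        raise ValueError("time step must be positive")
    return (prev_distance_m - curr_distance_m) / dt_s
```

Under these assumptions, a 1.7 m pedestrian whose bounding box is 170 px tall, seen through a camera with a 1000 px focal length, is estimated to be about 10 m away. The estimate depends on an assumed real-world height per object class, so it is only as good as that assumption.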

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0131] This invention aims to support the safe operation of personal mobility devices by constructing a warning system that takes into account the surrounding environment using an imaging device and the user's emotional state. Embodiments of this invention are described in detail below.

[0132] First, the user attaches their smartphone to a personal mobility device such as an electric scooter or bicycle and launches a dedicated application. This application incorporates camera control, AI image analysis capabilities, and an emotion engine.

[0133] The device continuously acquires surrounding video data using the smartphone's camera and uses an AI algorithm to identify objects in front of and around it. During this process, the distance and movement of the objects are evaluated in real time, and a warning is generated if a contact risk is detected.

[0134] Furthermore, the device captures the user's facial expressions and uses an emotion engine to analyze the user's emotional state in real time. This emotion analysis allows the device to appropriately adjust the intensity and type of warnings when the user is feeling anxious or worried. For example, it can attract the attention of a user who is highly tense by increasing visual information, while applying standard warning methods if the user is relaxed.

[0135] As a concrete example, if the device detects a vehicle approaching on a congested city road and simultaneously determines from the user's facial expression that they are feeling anxious, it will emit a stronger audio alert and display a clearer visual warning on the screen. As a result, the user can recognize the danger more quickly and take safer driving actions.

[0136] Furthermore, if contact occurs, the device automatically records the moment and can securely save the data using a cloud service. This data can be effectively used as evidence in the event of an accident to meet insurance procedures and legal requirements.

[0137] Thus, the present invention provides an embodiment of a system that comprehensively considers the movement environment of a moving object and the psychological state of the user, and contributes to improving safety.

[0138] The following describes the processing flow.

[0139] Step 1:

[0140] The user attaches a smartphone with the dedicated application installed to the personal mobility device and launches the application. The application activates the camera and sensors and completes the initial setup.

[0141] Step 2:

[0142] The device uses the smartphone's camera to acquire surrounding video data in real time and prepares for analysis frame by frame.

[0143] Step 3:

[0144] The device uses an AI algorithm to identify objects such as pedestrians, other vehicles, and stationary objects from the acquired video data, and calculates the distance and relative speed to each object.
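
As a non-limiting illustration, the distance and relative speed calculation could be sketched as follows. The disclosure does not state how distance is obtained from video, so the pinhole-camera estimate from the apparent height of a detected object is an assumed technique, and the names `estimate_distance_m` and `relative_speed_mps` and their parameters are hypothetical.

```python
def estimate_distance_m(focal_length_px: float, real_height_m: float,
                        bbox_height_px: float) -> float:
    """Pinhole-camera estimate: distance = focal length * real height / image height.

    Assumes the object's real-world height is roughly known for its class.
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px


def relative_speed_mps(prev_distance_m: float, curr_distance_m: float,
                       dt_s: float) -> float:
    """Closing speed between two frames; positive when the object is approaching."""
    if dt_s <= 0:
        raise ValueError("time step must be positive")
    return (prev_distance_m - curr_distance_m) / dt_s
```

Under these assumptions, a 1.7 m tall pedestrian that appears 170 pixels tall through a lens with a 1000-pixel focal length is estimated to be about 10 m away.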

[0145] Step 4:

[0146] The device simultaneously captures images of the user's face and analyzes their emotional state using an emotion engine. It identifies the user's facial expressions and evaluates their emotional state, such as their level of tension or relaxation.

[0147] Step 5:

[0148] The device comprehensively evaluates the risk of contact by combining data on the object's movement and distance, as well as the user's emotional state. If there is a risk of contact, it generates a warning, dynamically adjusting the warning method and its intensity according to the user's emotional state.
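
As a non-limiting illustration, the adjustment of the warning according to the user's emotional state could be sketched as follows. The class `WarningPlan`, the function `plan_warning`, the emotion labels, and the volume values are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WarningPlan:
    volume: float     # 0.0 (silent) to 1.0 (maximum)
    prominent: bool   # whether to use the more prominent on-screen display


def plan_warning(contact_risk: bool, emotion: str) -> Optional[WarningPlan]:
    """Return no warning without a contact risk; otherwise scale it by emotion."""
    if not contact_risk:
        return None
    if emotion in ("anxious", "tense"):
        # Enhanced warning: louder audio and a more prominent display.
        return WarningPlan(volume=1.0, prominent=True)
    # Standard warning for a relaxed (or unrecognized) emotional state.
    return WarningPlan(volume=0.6, prominent=False)
```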

[0149] Step 6:

[0150] The user receives the warning when it is issued. The warning is usually presented as a visual or audible alert, but if the user is particularly anxious, measures such as increasing the audio volume or making the on-screen warning more prominent are taken.

[0151] Step 7:

[0152] If the device detects that contact has occurred, it immediately records the event and saves the data. The recording covers the period before and after the contact and is saved as an appropriate clip.
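
As a non-limiting illustration, recording the period before and after the contact could be sketched with a fixed-size buffer of recent frames. The class `ClipRecorder` and its parameters `pre_frames` and `post_frames` are hypothetical names, and frames are represented as opaque `bytes` for simplicity.

```python
from collections import deque
from typing import Deque, List, Optional


class ClipRecorder:
    """Keeps the most recent frames so a clip can include footage from before contact."""

    def __init__(self, pre_frames: int, post_frames: int) -> None:
        self._pre: Deque[bytes] = deque(maxlen=pre_frames)  # oldest frames drop out
        self._post_frames = post_frames
        self._clip: Optional[List[bytes]] = None
        self._remaining = 0

    def on_contact(self) -> None:
        """Freeze the buffered pre-contact frames and start collecting post-contact frames."""
        if self._clip is None:
            self._clip = list(self._pre)
            self._remaining = self._post_frames

    def push(self, frame: bytes) -> Optional[List[bytes]]:
        """Feed one frame; returns the finished clip once enough post-contact frames arrive."""
        if self._clip is None:
            self._pre.append(frame)
            return None
        self._clip.append(frame)
        self._remaining -= 1
        if self._remaining <= 0:
            clip, self._clip = self._clip, None
            self._pre.clear()
            return clip
        return None
```

Buffering continuously, rather than starting the camera at the moment of contact, is what allows the saved clip to cover the seconds leading up to the event.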

[0153] Step 8:

[0154] The device processes the recorded data and uploads it to a cloud service once an internet connection is available. This process ensures the secure storage of the data and prepares it for future use.
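
As a non-limiting illustration, deferring the upload until an internet connection is available could be sketched as a pending queue. The class `UploadQueue` is hypothetical, and the connectivity check and the upload call are passed in as callables because the disclosure does not name a specific cloud service or API.

```python
from collections import deque
from typing import Callable, Deque, List


class UploadQueue:
    """Holds recorded clips locally and uploads them once a connection is available."""

    def __init__(self, is_online: Callable[[], bool],
                 upload: Callable[[str], bool]) -> None:
        self._is_online = is_online   # assumed connectivity probe
        self._upload = upload         # assumed uploader; returns True on success
        self._pending: Deque[str] = deque()

    def enqueue(self, clip_path: str) -> None:
        self._pending.append(clip_path)

    def flush(self) -> List[str]:
        """Upload pending clips in order while online; stop at the first failure."""
        uploaded: List[str] = []
        while self._pending and self._is_online():
            clip = self._pending[0]
            if not self._upload(clip):
                break  # keep the clip queued and retry on the next flush
            self._pending.popleft()
            uploaded.append(clip)
        return uploaded

    def pending(self) -> List[str]:
        return list(self._pending)
```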

[0155] Step 9:

[0156] The server organizes and stores uploaded data, and is prepared to smoothly provide it to insurance companies and legal authorities as needed. This process streamlines accident processing and reduces the burden on users.
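
As a non-limiting illustration, the server-side organization of uploaded clips could be sketched as an index keyed by clip and user. Storing a SHA-256 digest so that a clip can later be checked for alteration is an assumption made for this sketch and is not stated in this disclosure; the names `ClipRecord` and `ClipIndex` are hypothetical, and an in-memory dictionary stands in for cloud storage.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClipRecord:
    clip_id: str
    user_id: str
    recorded_at: str  # ISO 8601 timestamp supplied by the terminal
    sha256: str       # digest of the stored bytes


class ClipIndex:
    def __init__(self) -> None:
        self._records: Dict[str, ClipRecord] = {}
        self._blobs: Dict[str, bytes] = {}  # stand-in for cloud storage

    def store(self, clip_id: str, user_id: str, recorded_at: str,
              data: bytes) -> ClipRecord:
        digest = hashlib.sha256(data).hexdigest()
        record = ClipRecord(clip_id, user_id, recorded_at, digest)
        self._records[clip_id] = record
        self._blobs[clip_id] = data
        return record

    def for_user(self, user_id: str) -> List[ClipRecord]:
        """Clips for one user, oldest first, e.g. to hand over on request."""
        matches = (r for r in self._records.values() if r.user_id == user_id)
        return sorted(matches, key=lambda r: r.recorded_at)

    def verify(self, clip_id: str) -> bool:
        """True if the stored bytes still match the digest taken at upload."""
        stored = hashlib.sha256(self._blobs[clip_id]).hexdigest()
        return stored == self._records[clip_id].sha256
```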

[0157] (Example 2)

[0158] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0159] Ensuring safety while driving personal vehicles is a critical issue. However, conventional technology has not provided safe driving support through warning systems that take into account the driver's emotional state. As a result, there have been cases where the driver could not be adequately warned of the risk of collision, and safety could not be improved.

[0160] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0161] In this invention, the server includes means for continuously acquiring surrounding information using an imaging device attached to a mobile body, means for analyzing the characteristics of an object based on the acquired information, and means for analyzing the user's facial expressions to evaluate their emotional state and adjust warnings accordingly. This enables appropriate warnings that take into account the driver's psychological state, thereby improving the safety of personal mobile devices.

[0162] "Mobile body" refers to a personal transport device, and specifically includes electrically powered devices.

[0163] An "imaging device" refers to a device that includes hardware and software for continuously acquiring visual information from the surroundings.

[0164] "Surrounding information" refers to all data about the environment acquired by the imaging device, including video and other sensor data.

[0165] "Means for analyzing the characteristics of an object" refers to a method that uses AI technology to analyze the movement and distance of an object based on acquired surrounding information.

[0166] "Means for analyzing the user's facial expressions" refers to a system that uses an emotion engine to evaluate the user's emotional state based on video data acquired by an imaging device.

[0167] "Means of adjusting warnings" refers to technology that has the function of appropriately changing the intensity and form of warnings according to the user's emotional state.

[0168] The embodiments for carrying out the present invention will now be described. This invention provides a system for a personal mobile device that acquires information about its surroundings and supports safe driving.

[0169] The user attaches their smartphone to a motorized personal mobility device and launches a dedicated application. This application incorporates camera control, AI image analysis, and an emotion analysis engine. This enables the smartphone to monitor the status of the mobile device.

[0170] The device, specifically a smartphone, continuously acquires images of its surroundings using its built-in camera. These images are analyzed by AI algorithms, allowing for real-time identification of object characteristics and movements. Distance sensors and GPS functionality may also be used in the AI analysis.

[0171] The device uses an AI model to assess the distance to an object and the risk of contact. If a high level of danger is detected, the device generates a warning and communicates it to the user via voice and visual alerts. This warning level is adjusted based on the user's real-time emotional state. The device uses its camera to capture the user's face and performs emotion analysis from their facial expressions. The emotion analysis engine detects the user's anxiety, impatience, and other emotions, and generates a warning appropriate to the situation.

[0172] For example, if the device detects that the user is feeling anxious, it may sound a louder-than-usual alarm or display a visual warning with a more prominent color scheme. This allows the user to quickly understand the risk and take safe action.

[0173] Furthermore, recording automatically begins in the event of contact or an accident, and the collected data is saved via cloud storage. This feature plays a crucial role in collecting evidence after an accident and in insurance procedures.

[0174] An example of a prompt input to the generative AI model is, "Please explain what kind of warning would be effective when you are feeling anxious. Please describe the specific situation and the countermeasures." This system makes it possible to drive vehicles safely while taking into account the user's psychological state.

[0175] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0176] Step 1:

[0177] The user installs a dedicated application on their smartphone and attaches it to the electric personal mobility device. Launching the application makes all system functions available. The input is the user attaching the smartphone and launching the app, while the output is the completion of the app's launch and the start of environmental monitoring. Specifically, the user taps the app and operates it through the interface.

[0178] Step 2:

[0179] The device continuously captures ambient video data using the smartphone's camera. The input is the video data acquired by the camera, and the output is the result of image analysis by an AI model. Specifically, the device sends live video from the camera frame by frame to the AI system for real-time processing.
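
As a non-limiting illustration, sending frames one by one to the AI system could be sketched as a generator pipeline. The function `analyze_stream` and the optional `min_interval_s` setting, which skips frames that arrive too soon after the last analyzed frame so that analysis keeps up with live video, are assumptions introduced for this sketch.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def analyze_stream(frames: Iterable[Tuple[float, Frame]],
                   analyze: Callable[[Frame], Result],
                   min_interval_s: float = 0.0) -> Iterator[Tuple[float, Result]]:
    """Yield (timestamp, result) for each analyzed frame.

    Frames arriving sooner than min_interval_s after the last analyzed frame
    are skipped; with the default of 0.0 every frame is analyzed.
    """
    last: Optional[float] = None
    for timestamp, frame in frames:
        if last is not None and timestamp - last < min_interval_s:
            continue
        last = timestamp
        yield timestamp, analyze(frame)
```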

[0180] Step 3:

[0181] The device uses an AI model to identify objects from acquired video footage and analyzes their movement patterns and distances. The input is the image analysis data obtained in step 2, and the output is the movement history and distance information of the detected objects. Specifically, the device executes an AI algorithm to identify objects based on a certain threshold.
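
As a non-limiting illustration, identifying objects against a threshold and keeping their movement history could be sketched as follows. Interpreting the threshold as a minimum detection confidence is an assumption, as are the names `Detection` and `MovementHistory`, the default value 0.5, and the presence of a per-object `track_id`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    track_id: int      # assumed stable identifier for the same object across frames
    label: str         # e.g. "pedestrian" or "vehicle"
    confidence: float  # model score in [0, 1]
    distance_m: float


class MovementHistory:
    def __init__(self, min_confidence: float = 0.5) -> None:
        self.min_confidence = min_confidence
        # track_id -> list of (time_s, distance_m) samples
        self.history: DefaultDict[int, List[Tuple[float, float]]] = defaultdict(list)

    def update(self, time_s: float, detections: Iterable[Detection]) -> List[Detection]:
        """Keep detections at or above the threshold and append them to the history."""
        kept = [d for d in detections if d.confidence >= self.min_confidence]
        for d in kept:
            self.history[d.track_id].append((time_s, d.distance_m))
        return kept
```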

[0182] Step 4:

[0183] The terminal assesses the risk of contact and generates a warning. The input is the analysis result from step 3, and the output is the trigger for the warning (e.g., an audio warning or a visual alert). Specifically, when an object comes within a certain distance, the terminal immediately activates an alarm and displays a warning message on the screen.

[0184] Step 5:

[0185] The device uses a camera to capture the user's facial expressions and performs real-time emotion analysis using an emotion engine. The input is the user's facial image, and the output is the analyzed emotional state. Specifically, the device analyzes the user's facial features to identify emotions such as anxiety or worry.

[0186] Step 6:

[0187] The device customizes warnings according to the user's emotional state. The input is the result of the emotion analysis in step 5, and the output is the adjusted warning content. Specifically, when the emotional state is tense, the device selects an enhanced warning method, increasing the volume and visual display.

[0188] Step 7:

[0189] The device automatically records video and saves it to the cloud in the event of contact or an accident. The input is video data from the moment the accident occurs, and the output is the recorded footage securely stored in the cloud. Specifically, the device calls a cloud API to initiate the process of backing up the recorded data.

[0190] (Application Example 2)

[0191] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal."

[0192] In autonomous vehicles, if occupants experience feelings of anxiety or impatience during operation, conventional warning systems lack sufficient psychological support, limiting their ability to improve safety. Therefore, there is a need for a system that analyzes the occupants' emotional state in real time and dynamically adjusts appropriate warnings and information provision to achieve both safety and comfort.

[0193] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0194] In this invention, the server includes means for continuously acquiring video data using an imaging device attached to a moving object, means for analyzing the movement and distance of an object based on the acquired video data, and means for analyzing the user's emotional state and adjusting the generated warning according to that state. This makes it possible to provide appropriate warnings and information that correspond to the emotional state of the occupants.

[0195] A "moving object" is a machine or device capable of changing its position, and in this invention, it specifically refers to an autonomous vehicle.

[0196] An "imaging device" is a device that acquires surrounding visual information as digital data, and includes devices such as cameras.

[0197] "Video data" refers to visual information acquired by an imaging device, represented in digital format, and in a format that allows for analysis and storage.

[0198] "Objects" refer to people and objects present around a moving vehicle that should be recognized and detected for safe driving.

[0199] "Emotional state" refers to the mental state of the occupants and can be expressed as anxiety, impatience, relaxation, etc.

[0200] A "warning" is a visual or auditory presentation of information intended to convey the presence of a potential hazard and to attract the attention of the occupants.

[0201] "External storage device" refers to a storage means installed separately from a mobile device for storing data, and includes remote data storage services such as the cloud.

[0202] The system that realizes this invention is operated by a server mounted on a mobile vehicle. The server uses a camera that functions as an imaging device to continuously acquire video data of the surroundings. The video data is analyzed using AI image analysis software to identify the position and movement of objects, and based on this information, the mobile vehicle analyzes whether there is a risk of collision.

[0203] This system incorporates an emotion analysis tool to assess the occupants' emotional state in real time. The server captures the occupants' facial expressions, and the emotion analysis engine analyzes their psychological state, such as anxiety or impatience. The system dynamically adjusts the content and method of warnings according to the occupants' emotional state. For example, if an occupant is feeling anxious, the intensity of visual warnings and audio alerts may be increased, and information may be provided to reassure them.

[0204] Furthermore, in the event of a collision, the server immediately saves video data of that moment to an external storage device. This saving function utilizes cloud services to store the data remotely, allowing it to be used as evidence in the event of an accident.

[0205] For example, if an autonomous vehicle driving through a busy area analyzes the occupants' emotions when the distance to the vehicle in front decreases and determines that they are feeling anxious, the system may switch to a mode that provides calming music or helpful information.

[0206] In this way, the server aims to improve the safety of the vehicle and the comfort of the occupants by providing advanced safe driving support tailored to the occupants' psychological state.

[0207] An example of a prompt message is: "Use AI to perform emotion analysis and environmental awareness within the autonomous vehicle, understand the occupants' emotional state, and suggest ways to reduce anxiety."

[0208] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0209] Step 1:

[0210] The server continuously acquires video data from the surrounding environment through an imaging device mounted on a mobile vehicle. At this stage, raw data from the camera is input and output as video data. This video data is used in subsequent analysis processing.

[0211] Step 2:

[0212] The server inputs the acquired video data into AI image analysis software to analyze the movement and distance of objects. Here, data processing is performed to recognize objects from the image data and calculate their position and velocity. This outputs data used for collision risk assessment.

[0213] Step 3:

[0214] The server uses the analysis results to assess the likelihood of a collision. In this step, it determines whether a collision risk is present based on information about the movement and distance of the objects, and outputs the result.
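
As a non-limiting illustration, determining the presence or absence of a collision risk from position and velocity could be sketched as a closest-approach prediction. The constant-relative-velocity assumption, the function names `closest_approach` and `collision_risk`, and the default safety radius and time horizon are all assumptions introduced for this sketch.

```python
import math
from typing import Tuple


def closest_approach(rel_pos: Tuple[float, float],
                     rel_vel: Tuple[float, float]) -> Tuple[float, float]:
    """Return (time_s, distance_m) of closest approach for constant relative velocity.

    rel_pos and rel_vel are the object's position (m) and velocity (m/s)
    relative to the vehicle. Time is clamped to 0 when the object is already
    moving away or is stationary relative to the vehicle.
    """
    px, py = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0
    else:
        t = max(0.0, -(px * vx + py * vy) / speed_sq)
    return t, math.hypot(px + vx * t, py + vy * t)


def collision_risk(rel_pos: Tuple[float, float], rel_vel: Tuple[float, float],
                   safety_radius_m: float = 1.5, horizon_s: float = 5.0) -> bool:
    """True if the object is predicted to pass within the safety radius soon."""
    t, d = closest_approach(rel_pos, rel_vel)
    return t <= horizon_s and d <= safety_radius_m
```

Unlike a distance-only check, this sketch does not flag an object that is close but moving away, or one that will pass well to the side.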

[0215] Step 4:

[0216] The server captures images of the occupants' faces and inputs them into an emotion analysis engine to analyze their emotional state. The input data consists of facial expression images, which are used to evaluate the occupants' emotional state, such as their level of anxiety or relaxation, and the result is output as emotion evaluation data.

[0217] Step 5:

[0218] The server generates warnings and adjusts their intensity and type based on collision risk and emotion assessment data. Specifically, if occupants are feeling anxious, it generates strong visual and auditory warnings. As a result, appropriate warning signals are output to the user.

[0219] Step 6:

[0220] The server detects contact when it occurs, immediately records video footage of that moment, and saves it to external storage. Here, collision detection triggers the transfer of video data to a cloud service. This process ensures that accident records are securely stored.

[0221] Step 7:

[0222] The occupants receive the relaxing music and information provided by the system, which reduces their anxiety. In this step, the content provided by the server reduces the psychological burden on the occupants.

[0223] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0224] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
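
As a non-limiting illustration, the pairing of a prompt with inference data could be sketched as a simple request structure. The names `InferenceInput`, `InferenceRequest`, and `run_inference` are hypothetical, and the model is represented by an arbitrary callable rather than any particular vendor's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ALLOWED_KINDS = {"audio", "text", "image"}


@dataclass
class InferenceInput:
    kind: str    # "audio", "text", or "image"
    data: bytes  # raw payload for that modality


@dataclass
class InferenceRequest:
    prompt: str  # instructions for the model
    inputs: List[InferenceInput] = field(default_factory=list)


def run_inference(request: InferenceRequest,
                  model: Callable[[InferenceRequest], str]) -> str:
    """Validate the inference data kinds, then pass the request to the model."""
    for item in request.inputs:
        if item.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported inference data kind: {item.kind}")
    return model(request)
```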

[0225] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0226] [Second Embodiment]

[0227] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0228] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0229] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0230] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0231] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0232] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0233] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0234] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0235] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0236] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0237] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0238] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal."

[0239] This invention is a system for ensuring the safety of personal mobile devices, and is mainly composed of an imaging device attached to the mobile device and artificial intelligence technology. Specific embodiments of this system will be described below.

[0240] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control and AI functions and is ready to monitor safety in real time.

[0241] Next, the device uses its camera to continuously acquire video data of its surroundings. This captures objects in front of and around the moving object in the video, collecting information for safe operation. The acquired video data is processed by AI within the device to determine the movement and distance of the objects.

[0242] AI-powered image analysis uses deep learning models to accurately recognize pedestrians, vehicles, and other obstacles. Based on this information, the device evaluates the distance to each object and its movement in real time, and immediately generates an alert if it determines there is a risk of contact.

[0243] As a concrete example, suppose a device is operating in a busy area and suddenly detects a pedestrian crossing the road from the front. In this case, the AI calculates the risk of collision with the pedestrian and issues a warning to the user through an audio alert or a warning message on the screen. This allows the user to take quick and appropriate action.

[0244] Furthermore, if contact occurs, the terminal automatically records video of the scene and generates a video clip that includes information from a certain period prior. Once an internet connection becomes available, the recorded video is uploaded to an external storage device, i.e., a cloud service, and securely stored as evidence of the accident. The server strictly manages this data and can provide it to insurance companies or legal authorities as needed.

[0245] Thus, the present invention is a system that embodies efforts to improve safety by reducing the risk of accidents during the operation of mobile objects and by quickly recording and saving the circumstances of accidents that occur.

[0246] The following describes the processing flow.

[0247] Step 1:

[0248] The user securely attaches their smartphone to the moving object and launches the dedicated application. The application automatically activates the camera and sensors and prepares for use.

[0249] Step 2:

[0250] The device uses the smartphone's camera to continuously acquire video data. This allows for the collection of information about the environment in front and around the device, which is then used for analysis.

[0251] Step 3:

[0252] The device applies an AI algorithm to each frame of the acquired video data to analyze the position, speed, and distance of objects. It utilizes a deep learning model to identify pedestrians, other vehicles, and stationary objects.

[0253] Step 4:

[0254] The device evaluates the risk of contact with the object in real time based on the analysis results. If the risk exceeds a certain threshold, it is determined that there is a risk of contact.

[0255] Step 5:

[0256] If the device detects a high risk of contact, it will immediately warn the user. The warning will be provided through audio alerts and on-screen displays to prompt the user to take immediate action.

[0257] Step 6:

[0258] The user receives an alert and takes safe driving actions such as changing lanes, slowing down, or stopping. This helps prevent accidents.

[0259] Step 7:

[0260] If contact occurs, the device immediately activates its recording function and records video before and after the contact. The recorded video is temporarily stored on the device.

[0261] Step 8:

[0262] If the device has an internet connection, it can upload recorded video data to an external storage device such as the cloud, ensuring secure storage.

[0263] Step 9:

[0264] The server will properly manage the video data stored in the cloud and establish a system to provide the data to insurance companies and legal authorities when necessary for accident investigation.

[0265] (Example 1)

[0266] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0267] Enhancing the safety of personal mobility devices has become a critical issue in recent years, given the increasing number of users. In particular, there is a need for systems that can appropriately detect risks and respond quickly to prevent accidental contact or collisions. Furthermore, in the event of a contact accident, it is essential to reliably record the circumstances to aid in subsequent responses and investigations. While these challenges can be addressed by enabling mobile devices to understand their surroundings and implement appropriate risk management, existing technologies have not adequately addressed them.

[0268] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0269] In this invention, the server includes means for continuously acquiring video information by a detection device attached to a mobile vehicle, means for recognizing an object using machine learning technology based on the acquired video information and evaluating its movement and estimated distance, and means for automatically recording video and saving it to a recording device located remotely when contact occurs. As a result, the mobile vehicle can monitor surrounding risks in real time, support safe operation, and in the event of an accident, record the situation to expedite subsequent responses.

[0270] A "detection device" is a device attached to a moving object that continuously acquires video information by continuously photographing the surrounding environment.

[0271] "Video information" refers to information about the surrounding environment, acquired as visual data by a detection device.

[0272] "Machine learning techniques" are algorithms and methods that enable computer systems to identify patterns from data and make automatic decisions based on those patterns.

[0273] "Evaluation of movement and estimated distance" is a process that uses machine learning technology to analyze and evaluate the movement patterns and approach distances of objects from video information obtained by a detection device.

[0274] A "recording device" is a remote storage medium or system used to store data related to events that occur in a moving object.

[0275] A "cloud-based information storage service" is an online platform for storing and managing digital data that is located remotely and accessible via the internet.

[0276] A "mobile device" is any form of transport equipment designed to carry people or goods, including personal electric devices.

[0277] This invention is a system integrated into a personal mobile device and is designed to ensure user safety. The system mainly consists of a user-attached terminal (such as a smartphone), its built-in camera, and an application equipped with AI technology.

[0278] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control functions and AI, and is ready to monitor safety in real time.

[0279] Next, the device uses the smartphone's camera to continuously acquire video information of its surroundings. This video information is processed by a deep learning model using machine learning techniques to accurately recognize pedestrians, vehicles, and obstacles. The AI evaluates the movement and estimated distance of these objects and immediately generates an alert if danger is predicted. This alert is communicated to the user via voice and on-screen display.

[0280] If contact occurs, the server will save the video footage recorded by the terminal to a remote cloud-based information storage service. This ensures that evidence of the incident is securely stored and can be used later as reference if necessary.

[0281] To give a specific example, if the device detects a bicycle that suddenly appears in front of it in a busy area, and the risk of collision increases, the user will be given a voice alert saying, "Please slow down."

[0282] An example of a prompt input to the generative AI model would be: "Please describe the operating process of the safety system for personal mobile devices. In particular, I would like to know more about the method used for object recognition by AI."

[0283] Through this invention, it becomes possible to support the safe operation of mobile vehicles and minimize the risk of accidents.

[0284] The flow of the specific process in Example 1 will be described using FIG. 11.

[0285] Step 1:

[0286] The user attaches the smartphone to the moving body and launches the dedicated application. As a result, the camera starts operating and enters the safety monitoring mode. The input is the state of the smartphone (activation of the built-in camera), and the output is an instruction for the camera to start shooting. In this process, the user physically fixes the smartphone and manually starts the application.

[0287] Step 2:

[0288] The terminal continuously acquires surrounding video information using the camera of the smartphone. The input is the video information captured by the camera, and the output is the video data generated in real time. As a result, the terminal executes the operation of continuously recording visual information. In this operation, the camera captures the situation at a high frame rate and passes the video data into the application.

[0289] Step 3:

[0290] The AI installed in the terminal processes the acquired video data and recognizes objects using the deep learning model. The input is the acquired video data, and the output is the recognized object information (pedestrians, vehicles, obstacles, etc.). In this operation, the AI makes full use of the machine learning algorithm to extract features from the video and perform identification.

[0291] Step 4:

[0292] The terminal evaluates the movement and estimated distance of the recognized objects and determines the contact risk. The input is the object information, and the output is the result of the risk determination. In this evaluation, the AI analyzes the dynamic state of the objects and calculates whether they deviate from the safe distance range.

[0293] Step 5:

[0294] The device generates an alert based on a risk assessment and notifies the user. The input is the result of the risk assessment, and the output is an alert message (audio or screen display). Specifically, the device emits a warning sound or displays visual instructions, allowing the user to take appropriate action based on them.

[0295] Step 6:

[0296] The server automatically saves video footage recorded by the terminal to a cloud-based recording device when contact occurs. The input is the video data of the incident to be saved, and the output is the video information securely stored in the cloud. This process involves uploading data externally using an internet connection.
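
Steps 3 and 4 refer to an "estimated distance" without stating how it is obtained. One common way to estimate distance from a single camera, shown here only as a minimal sketch and not as the method of this invention, is the pinhole-camera approximation, in which distance is proportional to an object's assumed real height divided by its height in pixels. The names Detection, ASSUMED_HEIGHTS_M, estimate_distance_m, and is_outside_safe_distance, the per-class heights, the focal length, and the 5 m margin are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical typical real-world heights (metres) for each recognized class.
ASSUMED_HEIGHTS_M = {"pedestrian": 1.7, "bicycle": 1.1, "vehicle": 1.5}


@dataclass
class Detection:
    label: str            # class name produced by the recognizer in Step 3
    box_height_px: float  # bounding-box height in pixels


def estimate_distance_m(det: Detection, focal_length_px: float) -> float:
    """Pinhole approximation: distance = focal length * real height / pixel height."""
    real_height_m = ASSUMED_HEIGHTS_M[det.label]
    return focal_length_px * real_height_m / det.box_height_px


def is_outside_safe_distance(distance_m: float, safe_distance_m: float = 5.0) -> bool:
    """True while the object is still beyond the (assumed) safety margin."""
    return distance_m >= safe_distance_m


if __name__ == "__main__":
    det = Detection(label="pedestrian", box_height_px=170.0)
    distance = estimate_distance_m(det, focal_length_px=1000.0)
    print(round(distance, 2), is_outside_safe_distance(distance))
```

The estimate degrades when an object's true height differs from the assumed value or when the bounding box is cut off, so a real system would likely combine it with other cues.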

[0297] (Application Example 1)

[0298] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0299] In modern mobility technology, particularly autonomous driving systems, it is crucial to accurately perceive the surrounding environment in real time to ensure safety. However, existing systems often fail to adequately recognize the surrounding environment, increasing the risk of traffic accidents. Furthermore, there is a need for a system that can quickly and safely record and store the circumstances of an accident when it occurs.

[0300] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0301] In this invention, the server includes means for continuously acquiring image information by an image acquisition device, means for analyzing the movement and distance of an object based on the acquired image information, and means for recognizing the object by using a deep learning model for image analysis. Thereby, the moving body can accurately grasp the surrounding situation in real time and enhance safety. Also, by transmitting the acquired video information to a remote location via a communication network and automatically recording the video at the time of contact, the accident can be recorded and stored quickly and reliably.

[0302] The "image acquisition device" is a device mounted on a moving body for continuously acquiring surrounding video information.

[0303] The "image information" refers to visual data acquired by an image acquisition device and is data used for object recognition and analysis.

[0304] The "object" refers to dynamic or static elements such as objects and people existing around the moving body.

[0305] The "moving body" is a mechanism responsible for the movement of people and goods and includes an automatic driving mechanism in this invention.

[0306] "Analysis" is a process of understanding and evaluating the content of the acquired image information by using a specific algorithm.

[0307] The "deep learning model" is a collection of algorithms constructed based on AI technology and is a mechanism for accurately recognizing an object from image information.

[0308] The "external information storage device" is a device or system for storing data acquired from the moving body and refers to an information storage service existing in a remote location in this invention.

[0309] The "communication network" is a telecommunication mechanism for transferring information to a remote location and includes the Internet.

[0310] "Real-time" refers to a time frame in which the situation in question can be processed and reacted to immediately.

[0311] The system for implementing this invention primarily involves the coordinated operation of a server and a terminal. The terminal is attached to a mobile device and uses a camera to acquire real-time images of the surroundings. The basic principle is to continuously acquire image information from the camera, which acts as an image acquisition device.

[0312] 1. Terminal processing:

[0313] The device functions as a smartphone or similar mobile device and analyzes acquired image information using a deep learning model. The AI technologies used include frameworks such as TensorFlow and PyTorch. This enables accurate recognition of objects such as pedestrians, vehicles, and bicycles, and assessment of the risk of contact with moving objects. If a risk is detected, the device issues a warning signal to the user.

[0314] 2. Server processing:

[0315] The server manages data transmitted from terminals in a cloud environment. Remote data storage services such as AWS and Google Cloud Storage are used as external data storage devices. The server receives acquired video information via the communication network and records the video if contact occurs. The stored data is kept secure and accessible for later reference.

[0316] 3. User roles:

[0317] The user operates the vehicle according to instructions from the terminal. When a warning is issued, the user receives audio and visual alerts and takes appropriate evasive action.

[0318] As a concrete example, at a busy intersection, the system detects at an early stage a cyclist who is crossing against the traffic signal and immediately warns the user so that a collision can be avoided. In this case, the generative AI model performs analysis using a prompt such as the following.

[0319] "To the AI model: Use this video data as input to identify pedestrians, cyclists, and vehicles, and calculate the risk of collision. Increase the frequency of alerts when pedestrians are close and notify the driver in real time."

[0320] In this way, the invention functions as a system that instantly grasps the surrounding situation and enhances safety in real time.

[0321] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0322] Step 1:

[0323] The terminal uses a video acquisition device to continuously acquire image information from around a moving object. The input is video data obtained through the camera, and the output is raw image data.

[0324] Step 2:

[0325] The image data acquired by the device is analyzed using a deep learning model. In this process, the image data is input into an AI framework such as TensorFlow, and the model identifies objects (pedestrians, vehicles, etc.). The output is the identified objects, their location information, and predictions of their movement.

[0326] Step 3:

[0327] The device evaluates the risk of contact between a moving object and an object based on the analysis results. The input is data on the object and its movement, identified by a deep learning model. The device uses this data to calculate the distance and speed to the object and determine the risk of contact. The output is the result of the risk evaluation.

[0328] Step 4:

[0329] The device issues a warning to the user based on a risk assessment. The input is the result of the risk assessment, and the output is an audio or visual alert. Specifically, the user is notified with a warning sound or on-screen display.

[0330] Step 5:

[0331] The server receives and stores image data of high-risk events sent from the terminal. The input is a video clip sent from the terminal. The server stores this in cloud storage and provides the data as needed upon request. The output is securely stored video data.

[0332] Step 6:

[0333] The user receives a warning from the device and takes appropriate evasive action. The input is the warning notification from the device. The user performs driving operations as needed to avoid the danger. The output is the evasive action taken.
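
Step 3 determines the contact risk from the distance and speed of the object but gives no formula. A frequently used quantity for this purpose is the time to contact, that is, distance divided by closing speed. The following is a minimal sketch under that assumption; the function names and the 1.5 s and 3.0 s thresholds are hypothetical and are not taken from this specification.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def time_to_contact_s(distance_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact if the closing speed stays constant.

    A non-positive closing speed means the object is not approaching,
    so the time to contact is treated as infinite.
    """
    if closing_speed_mps <= 0:
        return float("inf")
    return distance_m / closing_speed_mps


def classify_risk(distance_m: float, closing_speed_mps: float,
                  high_s: float = 1.5, medium_s: float = 3.0) -> Risk:
    """Map the time to contact onto a coarse risk level (thresholds are assumed)."""
    ttc = time_to_contact_s(distance_m, closing_speed_mps)
    if ttc <= high_s:
        return Risk.HIGH
    if ttc <= medium_s:
        return Risk.MEDIUM
    return Risk.LOW


if __name__ == "__main__":
    # A cyclist 6 m ahead closing at 5 m/s leaves 1.2 s to react.
    print(classify_risk(6.0, 5.0))
```

The resulting level could then drive the audio or visual alert of Step 4, for example by warning only at MEDIUM or above.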

[0334] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0335] This invention aims to support the safe operation of personal mobility devices by constructing a warning system that takes into account the surrounding environment using an imaging device and the user's emotional state. Embodiments of this invention are described in detail below.

[0336] First, the user attaches their smartphone to a personal mobility device such as an electric scooter or bicycle and launches a dedicated application. This application incorporates camera control, AI image analysis capabilities, and an emotion engine.

[0337] The device continuously acquires surrounding video data using the smartphone's camera and uses an AI algorithm to identify objects in front of and around it. During this process, the distance and movement of the objects are evaluated in real time, and a warning is generated if a contact risk is detected.

[0338] Furthermore, the device captures the user's facial expressions and uses an emotion engine to analyze the user's emotional state in real time. This emotion analysis allows the device to appropriately adjust the intensity and type of warnings when the user is feeling anxious or worried. For example, it can attract the attention of a user who is highly tense by increasing visual information, while applying standard warning methods if the user is relaxed.

[0339] As a concrete example, if the device detects a vehicle approaching on a congested city road and simultaneously determines from the user's facial expression that they are feeling anxious, it will issue a stronger audio alert and display a clearer visual warning on the screen. As a result, the user can recognize the danger more quickly and take safer driving actions.

[0340] Furthermore, if contact occurs, the device automatically records the moment and can securely save the data using a cloud service. This data can be effectively used as evidence in the event of an accident to meet insurance procedures and legal requirements.

[0341] Thus, the present invention provides an embodiment of a system that comprehensively considers the movement environment of a moving object and the psychological state of the user, and contributes to improving safety.

[0342] The following describes the processing flow.

[0343] Step 1:

[0344] The user attaches a smartphone with the dedicated application installed to the personal mobile device and launches the application. The application activates the camera and sensors and completes the initial setup.

[0345] Step 2:

[0346] The device uses the smartphone's camera to acquire surrounding video data in real time and prepares for analysis frame by frame.

[0347] Step 3:

[0348] The device uses an AI algorithm to identify objects such as pedestrians, other vehicles, and stationary objects from the acquired video data, and calculates the distance and relative speed to each object.

[0349] Step 4:

[0350] The device simultaneously captures images of the user's face and analyzes their emotional state using an emotion engine. It identifies the user's facial expressions and evaluates their emotional state, such as their level of tension or relaxation.

[0351] Step 5:

[0352] The device comprehensively evaluates the risk of contact by combining data on the object's movement and distance, as well as the user's emotional state. If there is a risk of contact, it generates a warning, dynamically adjusting the warning method and its intensity according to the user's emotional state.

[0353] Step 6:

[0354] The user receives the warning when it occurs. This is usually presented as a visual or audible alert, but if the user is particularly anxious, measures such as raising the audio volume or making the on-screen warning more prominent are taken.

[0355] Step 7:

[0356] If the device detects that contact has occurred, it immediately records the event and saves the data. The recording covers the period before and after the contact and is saved as an appropriate clip.

[0357] Step 8:

[0358] The device processes the recorded data and uploads it to a cloud service once an internet connection is available. This process ensures the secure storage of the data and prepares it for future use.

[0359] Step 9:

[0360] The server organizes and stores uploaded data, and is prepared to smoothly provide it to insurance companies and legal authorities as needed. This process streamlines accident processing and reduces the burden on users.
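
Step 7 requires a clip that covers the period before and after the contact, which implies that recent frames are kept in memory before any contact is known. One possible way to do this, given purely as a hedged sketch, is a fixed-size rolling buffer that is frozen when contact is detected and then extended with a number of post-contact frames. The class name EventClipRecorder, its methods, and the frame counts are hypothetical, and at least one post-contact frame is assumed.

```python
from collections import deque


class EventClipRecorder:
    """Keeps the last `pre_frames` frames and, on contact, appends `post_frames` more."""

    def __init__(self, pre_frames: int, post_frames: int):
        self._pre = deque(maxlen=pre_frames)  # rolling window of recent frames
        self._post_frames = post_frames       # assumed to be at least 1
        self._remaining = 0                   # post-contact frames still to collect
        self._clip = None                     # clip under construction, if any

    def on_contact(self) -> None:
        """Freeze the pre-contact window and start collecting post-contact frames."""
        if self._clip is None:
            self._clip = list(self._pre)
            self._remaining = self._post_frames

    def push(self, frame):
        """Feed one frame; returns the finished clip exactly once, otherwise None."""
        if self._clip is None:
            self._pre.append(frame)
            return None
        self._clip.append(frame)
        self._remaining -= 1
        if self._remaining <= 0:
            clip, self._clip = self._clip, None
            self._pre.clear()
            return clip
        return None


if __name__ == "__main__":
    recorder = EventClipRecorder(pre_frames=3, post_frames=2)
    for frame_id in range(5):
        recorder.push(frame_id)
    recorder.on_contact()
    recorder.push(5)
    print(recorder.push(6))
```

The finished clip is what would be handed to the upload in Step 8 once a connection is available. Integers stand in for video frames here.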

[0361] (Example 2)

[0362] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0363] Ensuring safety while driving personal vehicles is a critical issue. However, conventional technology has not provided safe-driving support through warning systems that take the driver's emotional state into account. As a result, there have been cases where the driver could not be adequately warned of a collision risk, and safety could not be improved.

[0364] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0365] In this invention, the server includes means for continuously acquiring surrounding information using an imaging device attached to a mobile body, means for analyzing the characteristics of an object based on the acquired information, and means for analyzing the user's facial expressions to evaluate their emotional state and adjust warnings accordingly. This enables appropriate warnings that take into account the driver's psychological state, thereby improving the safety of personal mobile devices.

[0366] "Mobile devices" refer to personal transport devices, and specifically include electric devices.

[0367] An "imaging device" refers to a device that includes hardware and software for continuously acquiring visual information from the surroundings.

[0368] "Surrounding information" refers to all data about the environment acquired by the imaging device, including video and other sensor data.

[0369] "Means for analyzing the characteristics of an object" refers to a method that uses AI technology to analyze the movement and distance of an object based on acquired surrounding information.

[0370] "Means for analyzing the user's facial expressions" refers to a system that uses an emotion engine to evaluate the user's emotional state based on video data acquired by an imaging device.

[0371] "Means of adjusting warnings" refers to technology that has the function of appropriately changing the intensity and form of warnings according to the user's emotional state.

[0372] The embodiments for carrying out the present invention will now be described. This invention provides a system for a personal mobile device that acquires information about its surroundings and supports safe driving.

[0373] The user attaches their smartphone to a motorized personal mobility device and launches a dedicated application. This application incorporates camera control, AI image analysis, and an emotion analysis engine. This enables the smartphone to monitor the status of the mobile device.

[0374] The device, specifically a smartphone, continuously acquires images of its surroundings using its built-in camera. These images are analyzed by AI algorithms, allowing for real-time identification of object characteristics and movements. Distance sensors and GPS functionality may also be used in the AI analysis.

[0375] The device uses an AI model to assess the distance to an object and the risk of contact. If a high level of danger is detected, the device generates a warning and communicates it to the user via voice and visual alerts. This warning level is adjusted based on the user's real-time emotional state. The device uses its camera to capture the user's face and performs emotion analysis from their facial expressions. The emotion analysis engine detects the user's anxiety, impatience, and other emotions, and generates a warning appropriate to the situation.

[0376] For example, if the device detects that the user is feeling anxious, it may sound a louder-than-usual alarm or display a visual warning with a more prominent color scheme. This allows the user to quickly understand the risk and take safe action.

[0377] Furthermore, recording automatically begins in the event of contact or an accident, and the collected data is saved via cloud storage. This feature plays a crucial role in collecting evidence after an accident and in insurance procedures.

[0378] An example of a prompt to input into the generative AI model is: "Please explain what kind of warning would be effective when you are feeling anxious. Please describe the specific situation and the countermeasures." This system makes it possible to drive vehicles safely while taking into account the user's psychological state.

[0379] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0380] Step 1:

[0381] The user installs a dedicated application on their smartphone and attaches it to the electric personal mobility device. Launching the application makes all system functions available. The input is the user attaching the smartphone and launching the app, while the output is the completion of the app's launch and the start of environmental monitoring. Specifically, the user taps the app and operates it through the interface.

[0382] Step 2:

[0383] The device continuously captures ambient video data using the smartphone's camera. The input is the video data acquired by the camera, and the output is the result of image analysis by an AI model. Specifically, the device sends live video from the camera frame by frame to the AI system for real-time processing.

[0384] Step 3:

[0385] The device uses an AI model to identify objects from acquired video footage and analyzes their movement patterns and distances. The input is the image analysis data obtained in step 2, and the output is the movement history and distance information of the detected objects. Specifically, the device executes an AI algorithm to identify objects based on a certain threshold.

[0386] Step 4:

[0387] The terminal assesses the risk of contact and generates a warning. The input is the analysis result from step 3, and the output is the trigger for the warning (e.g., an audio warning or a visual alert). Specifically, when an object comes within a certain distance, the terminal immediately activates an alarm and displays a warning message on the screen.

[0388] Step 5:

[0389] The device uses a camera to capture the user's facial expressions and performs real-time emotion analysis using an emotion engine. The input is the user's facial image, and the output is the analyzed emotional state. Specifically, the device analyzes the user's facial features to identify emotions such as anxiety or worry.

[0390] Step 6:

[0391] The device customizes warnings according to the user's emotional state. The input is the result of the emotion analysis in step 5, and the output is the adjusted warning content. Specifically, when the emotional state is tense, the device selects an enhanced warning method, increasing the volume and visual display.

[0392] Step 7:

[0393] The device automatically records video and saves it to the cloud in the event of contact or an accident. The input is video data from the moment the accident occurs, and the output is the recorded footage securely stored in the cloud. Specifically, the device calls a cloud API to initiate the process of backing up the recorded data.
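
Step 6 adjusts the warning according to the emotional state estimated in Step 5, raising the volume and strengthening the visual display when the user is tense. The mapping itself is not specified, so the following is only an illustrative sketch. The Alert fields, the set of tense states, and the numeric adjustments are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Alert:
    message: str
    volume_pct: int        # playback volume, 0 to 100
    color: str             # on-screen warning color
    flashing: bool = False


# Hypothetical emotion labels treated as "tense" by this sketch.
TENSE_STATES = {"anxious", "worried", "impatient"}


def adjust_alert(base: Alert, emotion: str) -> Alert:
    """Return an enhanced alert for tense users and the standard alert otherwise."""
    if emotion in TENSE_STATES:
        return replace(
            base,
            volume_pct=min(100, base.volume_pct + 30),  # louder, capped at 100
            color="red",                                # more prominent color
            flashing=True,
        )
    return base


if __name__ == "__main__":
    standard = Alert("Object ahead. Please slow down.", volume_pct=60, color="yellow")
    print(adjust_alert(standard, "anxious"))
    print(adjust_alert(standard, "relaxed"))
```

Keeping the standard alert immutable and deriving the enhanced one from it makes it straightforward to fall back to the standard method when the user is relaxed.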

[0394] (Application Example 2)

[0395] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".

[0396] In autonomous vehicles, if occupants experience feelings of anxiety or impatience during operation, conventional warning systems lack sufficient psychological support, limiting their ability to improve safety. Therefore, there is a need for a system that analyzes the occupants' emotional state in real time and dynamically adjusts appropriate warnings and information provision to achieve both safety and comfort.

[0397] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0398] In this invention, the server includes means for continuously acquiring video data using an imaging device attached to a moving object, means for analyzing the movement and distance of an object based on the acquired video data, and means for analyzing the user's emotional state and adjusting the generated warning according to that state. This makes it possible to provide appropriate warnings and information that correspond to the emotional state of the occupants.

[0399] A "mobile object" is a machine or device capable of changing its position, and in this invention, it specifically refers to an autonomous vehicle.

[0400] An "imaging device" is a device that acquires surrounding visual information as digital data, and includes devices such as cameras.

[0401] "Video data" refers to visual information acquired by an imaging device, represented in digital format, and in a format that allows for analysis and storage.

[0402] "Objects" refer to people and objects present around a moving vehicle that should be recognized and detected for safe driving.

[0403] "Emotional state" refers to the mental state of the occupants and can be expressed as anxiety, impatience, relaxation, etc.

[0404] A "warning" is a visual or auditory presentation of information intended to convey the presence of a potential hazard and to attract the attention of the occupants.

[0405] "External storage device" refers to a storage means installed separately from a mobile device for storing data, and includes remote data storage services such as the cloud.

[0406] The system that realizes this invention is operated by a server mounted on a mobile vehicle. The server uses a camera that functions as an imaging device to continuously acquire video data of the surroundings. The video data is analyzed using AI image analysis software to identify the position and movement of objects, and based on this information, the mobile vehicle analyzes whether there is a risk of collision.

[0407] This system incorporates an emotion analysis tool to assess the occupants' emotional state in real time. The server captures the occupants' facial expressions, and the emotion analysis engine analyzes their psychological state, such as anxiety or impatience. The system dynamically adjusts the content and method of warnings according to the occupants' emotional state. For example, if an occupant is feeling anxious, the intensity of visual warnings and audio alerts may be increased, and information may be provided to reassure them.

[0408] Furthermore, in the event of a collision, the server immediately saves video data of that moment to an external storage device. This saving function utilizes cloud services to store the data remotely, allowing it to be used as evidence in the event of an accident.

[0409] For example, if an autonomous vehicle driving through a busy area analyzes the occupants' emotions when the distance to the vehicle in front decreases and determines that they are feeling anxious, the system may switch to a mode that provides calming music or helpful information.

[0410] In this way, the server aims to improve the safety of the vehicle and the comfort of the occupants by providing advanced safe driving support tailored to the occupants' psychological state.

[0411] An example of a prompt message is: "Use AI to perform emotion analysis and environmental awareness within the autonomous vehicle, understand the occupants' emotional state, and suggest ways to reduce anxiety."

[0412] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0413] Step 1:

[0414] The server continuously acquires video data from the surrounding environment through an imaging device mounted on a mobile vehicle. At this stage, raw data from the camera is input and output as video data. This video data is used in subsequent analysis processing.

[0415] Step 2:

[0416] The server inputs the acquired video data into AI image analysis software to analyze the movement and distance of objects. Here, data processing is performed to recognize objects from the image data and calculate their position and velocity. This outputs data used for collision risk assessment.

[0417] Step 3:

[0418] The server uses the analysis results to assess the likelihood of a collision. In this step, it determines whether a collision risk exists based on information about the movement and distance of the objects, and outputs the result.

[0419] Step 4:

[0420] The server captures images of the occupants' faces and inputs them into an emotion analysis engine to analyze their emotional state. The input data consists of facial expression images. The engine uses this data to evaluate the emotional state, such as the level of anxiety or relaxation, and outputs the result as emotion evaluation data.

[0421] Step 5:

[0422] The server generates warnings and adjusts their intensity and type based on collision risk and emotion assessment data. Specifically, if occupants are feeling anxious, it generates strong visual and auditory warnings. As a result, appropriate warning signals are output to the user.

[0423] Step 6:

[0424] The server detects contact when it occurs, immediately records video footage of that moment, and saves it to external storage. Here, collision detection triggers the transfer of video data to a cloud service. This process ensures that accident records are securely stored.

[0425] Step 7:

[0426] The occupants receive the relaxing music and information provided by the system, which reduces their anxiety. In this process, the content provided by the server reduces the psychological burden on the occupants.
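
Step 2 calculates the position and velocity of each recognized object, but the calculation is not described. Assuming that the analysis yields object positions relative to the vehicle at successive timestamps, velocity can be approximated by a finite difference, and the component directed toward the vehicle gives a closing speed usable in Step 3. The sketch below illustrates this under those assumptions. The function names and the 2D coordinate convention (vehicle at the origin, metres) are hypothetical.

```python
import math


def estimate_velocity(p0, p1, t0: float, t1: float):
    """Finite-difference velocity (m/s) between two positions given in metres."""
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)


def closing_speed(position, velocity) -> float:
    """Speed at which the object approaches the origin; positive means approaching."""
    distance = math.hypot(position[0], position[1])
    if distance == 0:
        return 0.0
    # Radial component of the velocity; its negative is the rate of approach.
    radial = (position[0] * velocity[0] + position[1] * velocity[1]) / distance
    return -radial


if __name__ == "__main__":
    # An object directly ahead moves from 10 m to 8 m away in one second.
    v = estimate_velocity((0.0, 10.0), (0.0, 8.0), t0=0.0, t1=1.0)
    print(v, closing_speed((0.0, 8.0), v))
```

Frame-to-frame differences are noisy in practice, so a real system would probably smooth positions over several frames before differencing.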

[0427] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0428] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0429] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0430] [Third Embodiment]

[0431] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0432] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0433] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0434] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0435] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0436] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0437] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0438] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0439] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0440] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0441] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0442] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0443] This invention is a system for ensuring the safety of personal mobile devices, and is mainly composed of an imaging device attached to the mobile device and artificial intelligence technology. Specific embodiments of this system will be described below.

[0444] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control and AI functions and is ready to monitor safety in real time.

[0445] Next, the device uses its camera to continuously acquire video data of its surroundings. This captures objects in front of and around the moving object in the video, collecting information for safe operation. The acquired video data is processed by AI within the device to determine the movement and distance of the objects.

[0446] AI-powered image analysis uses deep learning models to accurately recognize pedestrians, vehicles, and other obstacles. Based on this information, the device evaluates the distance to each object and its movement in real time, and immediately generates an alert if it determines there is a risk of contact.

[0447] As a concrete example, suppose a device is operating in a busy area and suddenly detects a pedestrian crossing the road from the front. In this case, the AI calculates the risk of collision with the pedestrian and issues a warning to the user through an audio alert or a warning message on the screen. This allows the user to take quick and appropriate action.

[0448] Furthermore, if contact occurs, the terminal automatically records video of the scene and generates a video clip that includes information from a certain period prior. Once an internet connection becomes available, the recorded video is uploaded to an external storage device, i.e., a cloud service, and securely stored as evidence of the accident. The server strictly manages this data and can provide it to insurance companies or legal authorities as needed.

[0449] Thus, the present invention is a system that embodies efforts to improve safety by reducing the risk of accidents during the operation of mobile objects and by quickly recording and saving the circumstances of accidents that occur.
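The clip "that includes information from a certain period prior" to contact could be obtained by keeping recent frames in a fixed-size ring buffer. The following is a minimal sketch under that assumption; the class name `PreEventRecorder`, the frame rate, and the pre/post durations are hypothetical and do not appear in the specification.

```python
from collections import deque


class PreEventRecorder:
    """Keeps the most recent frames so a clip can include footage from before contact."""

    def __init__(self, fps: int = 30, pre_seconds: float = 5.0, post_seconds: float = 5.0):
        # Ring buffer: the oldest frame is dropped automatically once maxlen is reached.
        self._pre = deque(maxlen=int(fps * pre_seconds))
        self._post_needed = int(fps * post_seconds)
        self._clip = None          # frames of the clip being assembled, if any
        self._post_remaining = 0   # frames still to collect after contact

    def add_frame(self, frame):
        """Feed one frame; returns a finished clip (list of frames) or None."""
        self._pre.append(frame)
        if self._clip is None:
            return None
        self._clip.append(frame)
        self._post_remaining -= 1
        if self._post_remaining <= 0:
            clip, self._clip = self._clip, None
            return clip
        return None

    def on_contact(self):
        """Called when contact is detected: start a clip from the buffered frames."""
        if self._clip is None:
            self._clip = list(self._pre)
            self._post_remaining = self._post_needed
```

With `fps=2`, `pre_seconds=1.0`, and `post_seconds=1.0`, a contact signalled after frame 4 yields a clip made of frames 3 to 6: two buffered frames and two frames captured afterwards.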

[0450] The following describes the processing flow.

[0451] Step 1:

[0452] The user securely attaches their smartphone to the moving object and launches the dedicated application. The application automatically activates the camera and sensors and prepares for use.

[0453] Step 2:

[0454] The device uses the smartphone's camera to continuously acquire video data. This allows for the collection of information about the environment in front and around the device, which is then used for analysis.

[0455] Step 3:

[0456] The device applies an AI algorithm to each frame of the acquired video data to analyze the position, speed, and distance of objects. It utilizes a deep learning model to identify pedestrians, other vehicles, and stationary objects.

[0457] Step 4:

[0458] The device evaluates the risk of contact with the object in real time based on the analysis results. If the risk exceeds a certain threshold, it is determined that there is a risk of contact.

[0459] Step 5:

[0460] If the device detects a high risk of contact, it will immediately warn the user. The warning will be provided through audio alerts and on-screen displays to prompt the user to take immediate action.

[0461] Step 6:

[0462] The user receives an alert and takes safe driving actions such as changing lanes, slowing down, or stopping. This helps prevent accidents.

[0463] Step 7:

[0464] If contact occurs, the device immediately activates its recording function and records video before and after the contact. The recorded video is temporarily stored on the device.

[0465] Step 8:

[0466] If the device has an internet connection, it can upload recorded video data to an external storage device such as the cloud, ensuring secure storage.

[0467] Step 9:

[0468] The server will properly manage the video data stored in the cloud and establish a system to provide the data to insurance companies and legal authorities when necessary for accident investigation.
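Steps 4 and 5 above compare a risk value with a threshold but do not say how the value is computed. One common choice, used here purely as an assumption, is to derive it from time-to-contact (distance divided by closing speed). The names `TrackedObject`, `contact_risk`, and `should_warn`, the 4-second horizon, and the 0.5 threshold are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrackedObject:
    label: str                # e.g. "pedestrian", "vehicle"
    distance_m: float         # estimated distance to the object
    closing_speed_mps: float  # positive when the gap is shrinking


def contact_risk(obj: TrackedObject, horizon_s: float = 4.0) -> float:
    """Return a risk score in [0, 1] based on time-to-contact (assumed scoring rule)."""
    if obj.closing_speed_mps <= 0:
        return 0.0  # the object is holding its distance or moving away
    time_to_contact = obj.distance_m / obj.closing_speed_mps
    # Risk rises linearly as time-to-contact falls below the horizon.
    return max(0.0, min(1.0, 1.0 - time_to_contact / horizon_s))


def should_warn(obj: TrackedObject, threshold: float = 0.5) -> bool:
    """Step 4: contact is considered likely when the risk exceeds the threshold."""
    return contact_risk(obj) > threshold
```

Under these assumed values, a pedestrian 4 m away closing at 4 m/s has a time-to-contact of 1 s and a risk of 0.75, which triggers the warning; a vehicle 40 m away closing at 2 m/s does not.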

[0469] (Example 1)

[0470] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0471] Enhancing the safety of personal mobility devices has become a critical issue in recent years, given the increasing number of users. In particular, there is a need for systems that can appropriately detect risks and respond quickly to prevent accidental contact or collisions. Furthermore, in the event of a contact accident, it is essential to reliably record the circumstances to aid in subsequent responses and investigations. While these challenges can be addressed by enabling mobile devices to understand their surroundings and implement appropriate risk management, existing technologies have not adequately addressed them.

[0472] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0473] In this invention, the server includes means for continuously acquiring video information by a detection device attached to a mobile vehicle, means for recognizing an object using machine learning technology based on the acquired video information and evaluating its movement and estimated distance, and means for automatically recording video and saving it to a recording device located remotely when contact occurs. As a result, the mobile vehicle can monitor surrounding risks in real time, support safe operation, and in the event of an accident, record the situation to expedite subsequent responses.

[0474] A "detection device" is a device attached to a moving object that continuously acquires video information by continuously photographing the surrounding environment.

[0475] "Video information" refers to information about the surrounding environment as visual data acquired by a detection device.

[0476] "Machine learning techniques" are algorithms and methods that enable computer systems to identify patterns from data and make automatic decisions based on those patterns.

[0477] "Evaluation of movement and estimated distance" is a process that uses machine learning technology to analyze and evaluate the movement patterns and approach distances of objects from video information obtained by a detection device.

[0478] A "recording device" is a remote storage medium or system used to store data related to events that occur in a moving object.

[0479] A "cloud-based information storage service" is an online platform for storing and managing digital data that is located remotely and accessible via the internet.

[0480] A "mobile device" is any form of transport equipment designed to carry people or goods, including personal electric devices.

[0481] This invention is a system integrated into a personal mobile device and is designed to ensure user safety. The system mainly consists of a user-attached terminal (such as a smartphone), its built-in camera, and an application equipped with AI technology.

[0482] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control functions and AI, and is ready to monitor safety in real time.

[0483] Next, the device uses the smartphone's camera to continuously acquire video information of its surroundings. This video information is processed by a deep learning model using machine learning techniques to accurately recognize pedestrians, vehicles, and obstacles. The AI evaluates the movement and estimated distance of these objects and immediately generates an alert if danger is predicted. This alert is communicated to the user via voice and on-screen display.

[0484] If contact occurs, the server will save the video footage recorded by the terminal to a remote cloud-based information storage service. This ensures that evidence of the incident is securely stored and can be used later as reference if necessary.

[0485] To give a specific example, if the device detects a bicycle that suddenly appears in front of it in a busy area, and the risk of collision increases, the user will be given a voice alert saying, "Please slow down."

[0486] An example of a prompt to input into the generative AI model would be: "Please describe the operating process of the safety system for personal mobile devices. In particular, I would like to know more about the method used for object recognition by AI."

[0487] Through this invention, it becomes possible to support the safe operation of mobile vehicles and minimize the risk of accidents.

[0488] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0489] Step 1:

[0490] The user attaches their smartphone to a moving object and launches a dedicated application. This activates the camera and puts it into safety monitoring mode. The input is the status of the smartphone (camera activated), and the output is an instruction for the camera to start recording. In this process, the user physically secures the smartphone and manually launches the application.

[0491] Step 2:

[0492] The device uses the smartphone's camera to continuously acquire video information of its surroundings. The input is video information captured by the camera, and the output is video data generated in real time. This allows the device to continuously record visual information. In this operation, the camera captures the situation at a high frame rate and passes the video data to the application.

[0493] Step 3:

[0494] The AI installed in the device processes the acquired video data and uses a deep learning model to recognize objects. The input is the acquired video data, and the output is information about the recognized objects (pedestrians, vehicles, obstacles, etc.). In this operation, the AI uses machine learning algorithms to extract features from the video and perform identification.

[0495] Step 4:

[0496] The terminal evaluates the movement and estimated distance of recognized objects to determine the risk of contact. The input is object information, and the output is the result of the risk assessment. In this evaluation, the AI analyzes the movement of the object and calculates whether it deviates from a safe distance range.

[0497] Step 5:

[0498] The device generates an alert based on a risk assessment and notifies the user. The input is the result of the risk assessment, and the output is an alert message (audio or screen display). Specifically, the device emits a warning sound or visual instructions, allowing the user to take appropriate action based on them.

[0499] Step 6:

[0500] The server automatically saves video footage recorded by the terminal to a cloud-based recording device when contact occurs. The input is the video data of the incident to be saved, and the output is the video information securely stored in the cloud. This process involves uploading data externally using an internet connection.
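Step 4 relies on an "estimated distance" to each recognized object without stating how a single smartphone camera would produce it. One simple possibility, offered only as an assumption, is the pinhole-camera relation between an object's real height and the pixel height of its bounding box. The function name, the height table, and the focal length used below are illustrative values, not taken from the specification.

```python
# Typical real-world heights in metres (illustrative values, not from the source).
ASSUMED_HEIGHT_M = {"pedestrian": 1.7, "bicycle": 1.1, "vehicle": 1.5}


def estimate_distance_m(label: str, bbox_height_px: float, focal_length_px: float) -> float:
    """Pinhole-camera estimate: distance = focal_length * real_height / pixel_height."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    real_height_m = ASSUMED_HEIGHT_M[label]
    return focal_length_px * real_height_m / bbox_height_px
```

For example, with an assumed focal length of 1000 px, a pedestrian whose bounding box is 170 px tall is estimated to be about 10 m away. The estimate degrades when the object is partly occluded or its true height differs from the assumed one, which is one reason paragraph-level descriptions speak of an estimated rather than a measured distance.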

[0501] (Application Example 1)

[0502] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0503] In modern mobility technology, particularly autonomous driving systems, it is crucial to accurately perceive the surrounding environment in real time to ensure safety. However, existing systems often fail to adequately recognize the surrounding environment, increasing the risk of traffic accidents. Furthermore, there is a need for a system that can quickly and safely record and store the circumstances of an accident when it occurs.

[0504] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0505] In this invention, the server includes means for continuously acquiring image information using a video acquisition device, means for analyzing the movement and distance of an object based on the acquired image information, and means for recognizing an object using a deep learning model for image analysis. This makes it possible for a moving object to grasp its surroundings in real time and accurately, thereby enhancing safety. Furthermore, by transmitting the acquired video information to a remote location via a communication network and automatically recording video at the time of contact, accidents can be recorded and saved quickly and reliably.

[0506] A "video acquisition device" is a device attached to a moving object to continuously acquire video information of the surroundings.

[0507] "Image information" refers to visual data acquired by video acquisition devices, which is used for the recognition and analysis of objects.

[0508] "Object" refers to dynamic or static elements such as objects or people that exist around a moving object.

[0509] A "mobile body" is a mechanism responsible for the movement of people or objects, and in this invention, it includes an automatic driving mechanism.

[0510] "Analysis" is the process of understanding and evaluating the content of acquired image information using a specific algorithm.

[0511] A "deep learning model" is a collection of algorithms built on AI technology, designed to recognize objects from image information with high accuracy.

[0512] An "external information storage device" is a device or system for storing data acquired from a mobile object, and in this invention, it refers to an information storage service located in a remote location.

[0513] A "communication network" is a system of telecommunications used to transfer information to a remote location, and includes the Internet.

[0514] "Real-time" refers to a time frame in which the situation in question can be processed and reacted to immediately.

[0515] The system for implementing this invention primarily involves the coordinated operation of a server and a terminal. The terminal is attached to a mobile device and uses a camera to acquire real-time images of the surroundings. The basic principle is to continuously acquire image information from the camera, which acts as an image acquisition device.

[0516] 1. Terminal processing:

[0517] The terminal, a smartphone or similar mobile device, analyzes acquired image information using a deep learning model. The AI technologies used include frameworks such as TensorFlow and PyTorch. This enables accurate recognition of objects such as pedestrians, vehicles, and bicycles, and assessment of the risk that the moving object will come into contact with them. If a risk is detected, the terminal issues a warning signal to the user.

[0518] 2. Server processing:

[0519] The server manages data transmitted from terminals in a cloud environment. Remote data storage services such as AWS and Google Cloud Storage are used as external data storage devices. The server receives acquired video information via the communication network and records the video if contact occurs. The stored data is kept secure and accessible for later reference.

[0520] 3. User roles:

[0521] The user operates the vehicle according to instructions from the terminal. When a warning is issued, the user receives audio and visual alerts and takes appropriate evasive action.
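For the server processing in item 2, the text says stored data is "kept secure and accessible for later reference" but does not describe a mechanism. A minimal sketch of one option follows, assuming the server keeps a SHA-256 digest alongside each clip so that later alteration can be detected. The names `EvidenceRecord`, `make_record`, and `verify` are hypothetical, and the actual upload to a storage service is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    clip_id: str
    sha256: str       # hex digest of the clip contents
    received_at: str  # ISO 8601 timestamp, UTC


def make_record(clip_id: str, clip_bytes: bytes) -> EvidenceRecord:
    """Create the metadata the server would store next to the uploaded clip."""
    digest = hashlib.sha256(clip_bytes).hexdigest()
    return EvidenceRecord(clip_id, digest, datetime.now(timezone.utc).isoformat())


def verify(record: EvidenceRecord, clip_bytes: bytes) -> bool:
    """Check that a clip retrieved later still matches the stored digest."""
    return hashlib.sha256(clip_bytes).hexdigest() == record.sha256
```

A record created with `make_record("clip-001", data)` passes `verify` only while `data` is byte-for-byte unchanged, which is the property an insurer or authority would typically want from stored footage.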

[0522] As a concrete example, at a busy intersection, the system detects at an early stage a cyclist crossing against the traffic signal and immediately warns the user so that a collision can be avoided. In this case, the generative AI model performs analysis using prompt statements like the following.

[0523] "To the AI model: Use this video data as input to identify pedestrians, cyclists, and vehicles, and calculate the risk of collision. Increase the frequency of alerts when pedestrians are close and notify the driver in real time."

[0524] In this way, the invention functions as a system that instantly grasps the surrounding situation and enhances safety in real time.

[0525] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0526] Step 1:

[0527] The terminal uses a video acquisition device to continuously acquire image information from around a moving object. The input is video data obtained through the camera, and the output is raw image data.

[0528] Step 2:

[0529] The image data acquired by the device is analyzed using a deep learning model. In this process, the image data is input into an AI framework such as TensorFlow, and the model identifies objects (pedestrians, vehicles, etc.). The output is the identified objects, their location information, and predictions of their movement.

[0530] Step 3:

[0531] The device evaluates the risk of contact between a moving object and an object based on the analysis results. The input is data on the object and its movement, identified by a deep learning model. The device uses this data to calculate the distance to the object and its relative speed, and determines the risk of contact. The output is the result of the risk evaluation.

[0532] Step 4:

[0533] The device issues a warning to the user based on a risk assessment. The input is the result of the risk assessment, and the output is an audio or visual alert. Specifically, the user is notified with a warning sound or on-screen display.

[0534] Step 5:

[0535] The server receives and stores image data of high-risk events sent from the terminal. The input is a video clip sent from the terminal. The server stores this in cloud storage and provides the data as needed upon request. The output is securely stored video data.

[0536] Step 6:

[0537] The user receives a warning from the device and takes appropriate evasive action. The input is the warning notification from the device. The user performs driving operations as needed to avoid the danger. The output is the evasive action taken.
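The movement prediction output in Step 2 and the risk evaluation in Step 3 could, for instance, be built on a constant-velocity extrapolation of each object's position. The sketch below assumes positions are already expressed in metres relative to the moving object; the function names, the 1.5 m safety radius, and the sampling step are hypothetical choices, not values from the specification.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in metres, relative to the moving object


def predict_position(prev: Point, curr: Point, dt_s: float, lookahead_s: float) -> Point:
    """Extrapolate an object's position, assuming it keeps its current velocity."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    vx = (curr[0] - prev[0]) / dt_s
    vy = (curr[1] - prev[1]) / dt_s
    return (curr[0] + vx * lookahead_s, curr[1] + vy * lookahead_s)


def enters_safety_zone(prev: Point, curr: Point, dt_s: float,
                       lookahead_s: float = 2.0, radius_m: float = 1.5,
                       step_s: float = 0.1) -> bool:
    """Sample the predicted path and report whether it comes within radius_m."""
    steps = int(lookahead_s / step_s)
    for k in range(steps + 1):
        x, y = predict_position(prev, curr, dt_s, k * step_s)
        if math.hypot(x, y) <= radius_m:
            return True
    return False
```

A cyclist observed at (10, 0) and then at (9, 0) half a second later is approaching at 2 m/s. With the default 2-second lookahead the predicted path stops at 5 m and no contact is flagged, whereas a 4-second lookahead brings the path inside the 1.5 m radius.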

[0538] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0539] This invention aims to support the safe operation of personal mobility devices by constructing a warning system that takes into account the surrounding environment using an imaging device and the user's emotional state. Embodiments of this invention are described in detail below.

[0540] First, the user attaches their smartphone to a personal mobility device such as an electric scooter or bicycle and launches a dedicated application. This application incorporates camera control, AI image analysis capabilities, and an emotion engine.

[0541] The device continuously acquires surrounding video data using the smartphone's camera and uses an AI algorithm to identify objects in front of and around it. During this process, the distance and movement of the objects are evaluated in real time, and a warning is generated if a contact risk is detected.

[0542] Furthermore, the device captures the user's facial expressions and uses an emotion engine to analyze the user's emotional state in real time. This emotion analysis allows the device to appropriately adjust the intensity and type of warnings when the user is feeling anxious or worried. For example, it can attract the attention of a user who is highly tense by increasing visual information, while applying standard warning methods if the user is relaxed.

[0543] As a concrete example, if the device detects a vehicle approaching on a congested city road and simultaneously determines from the user's facial expression that they are feeling anxious, it will emit a stronger audio alert and display a clearer visual warning on the screen. As a result, the user can recognize the danger more quickly and take safer driving actions.

[0544] Furthermore, if contact occurs, the device automatically records the moment and can securely save the data using a cloud service. This data can be effectively used as evidence in the event of an accident to meet insurance procedures and legal requirements.

[0545] Thus, the present invention provides an embodiment of a system that comprehensively considers the movement environment of a moving object and the psychological state of the user, and contributes to improving safety.

[0546] The following describes the processing flow.

[0547] Step 1:

[0548] The user attaches a smartphone with the dedicated application installed to the personal mobile device and launches the application. The application activates the camera and sensors and completes the initial setup.

[0549] Step 2:

[0550] The device uses the smartphone's camera to acquire surrounding video data in real time and prepares for analysis frame by frame.

[0551] Step 3:

[0552] The device uses an AI algorithm to identify objects such as pedestrians, other vehicles, and stationary objects from the acquired video data, and calculates the distance and relative speed to each object.

[0553] Step 4:

[0554] The device simultaneously captures images of the user's face and analyzes their emotional state using an emotion engine. It identifies the user's facial expressions and evaluates their emotional state, such as their level of tension or relaxation.

[0555] Step 5:

[0556] The device comprehensively evaluates the risk of contact by combining data on the object's movement and distance, as well as the user's emotional state. If there is a risk of contact, it generates a warning, dynamically adjusting the warning method and its intensity according to the user's emotional state.

[0557] Step 6:

[0558] The user receives the warning when it occurs. This is usually presented as a visual or audible alert, but if the user is particularly anxious, measures such as increasing the audio volume or making the on-screen warning more prominent will be taken.

[0559] Step 7:

[0560] If the device detects that contact has occurred, it immediately records the event and saves the data. The recording covers the period before and after the contact and is saved as an appropriate clip.

[0561] Step 8:

[0562] The device processes the recorded data and uploads it to a cloud service once an internet connection is available. This process ensures the secure storage of the data and prepares it for future use.

[0563] Step 9:

[0564] The server organizes and stores uploaded data, and is prepared to smoothly provide it to insurance companies and legal authorities as needed. This process streamlines accident processing and reduces the burden on users.
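Steps 5 and 6 above adjust the warning according to the user's emotional state. A minimal sketch of such an adjustment follows, assuming the emotion engine returns a simple label; the `Alert` type, the label set, the volume levels, and the message text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    volume: float      # 0.0 (silent) to 1.0 (maximum)
    visual_level: int  # 1 = standard banner, 2 = enhanced high-contrast display
    message: str


# Emotion labels treated as "tense" (assumed outputs of the emotion engine).
TENSE_STATES = {"anxious", "tense", "impatient"}


def build_alert(risk: float, emotion: str, threshold: float = 0.5) -> Optional[Alert]:
    """Return an Alert when risk exceeds the threshold, scaled by emotional state."""
    if risk <= threshold:
        return None  # no contact risk, so no warning is generated
    if emotion in TENSE_STATES:
        # Enhanced warning: louder audio and a more prominent display.
        return Alert(volume=1.0, visual_level=2, message="Obstacle ahead. Slow down.")
    # Standard warning for a relaxed or neutral user.
    return Alert(volume=0.6, visual_level=1, message="Obstacle ahead. Slow down.")
```

The emotional state here changes only how the warning is delivered, not whether it is issued; the specification also mentions combining emotion with object data in the risk evaluation itself, which this sketch does not attempt.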

[0565] (Example 2)

[0566] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0567] Ensuring safety while driving personal vehicles is a critical issue. However, conventional technology has not provided safe-driving support through warning systems that take into account the driver's emotional state. As a result, there have been cases where the driver could not be adequately warned of the risk of collision, and safety could not be improved.

[0568] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0569] In this invention, the server includes means for continuously acquiring surrounding information using an imaging device attached to a mobile body, means for analyzing the characteristics of an object based on the acquired information, and means for analyzing the user's facial expressions to evaluate their emotional state and adjust warnings accordingly. This enables appropriate warnings that take into account the driver's psychological state, thereby improving the safety of personal mobile devices.

[0570] "Mobile devices" refer to personal transport devices, and specifically include electric devices.

[0571] An "imaging device" refers to a device that includes hardware and software for continuously acquiring visual information from the surroundings.

[0572] "Surrounding information" refers to all data about the environment acquired by the imaging device, including video and other sensor data.

[0573] "Means for analyzing the characteristics of an object" refers to a method that uses AI technology to analyze the movement and distance of an object based on acquired surrounding information.

[0574] "Means for analyzing the user's facial expressions" refers to a system that uses an emotion engine to evaluate the user's emotional state based on video data acquired by an imaging device.

[0575] "Means of adjusting warnings" refers to technology that has the function of appropriately changing the intensity and form of warnings according to the user's emotional state.

[0576] The embodiments for carrying out the present invention will now be described. This invention provides a system for a personal mobile device that acquires information about its surroundings and supports safe driving.

[0577] The user attaches their smartphone to a motorized personal mobility device and launches a dedicated application. This application incorporates camera control, AI image analysis, and an emotion analysis engine. This enables the smartphone to monitor the status of the mobile device.

[0578] The device, specifically a smartphone, continuously acquires images of its surroundings using its built-in camera. These images are analyzed by AI algorithms, allowing for real-time identification of object characteristics and movements. Distance sensors and GPS functionality may also be used in the AI analysis.

[0579] The device uses an AI model to assess the distance to an object and the risk of contact. If a high level of danger is detected, the device generates a warning and communicates it to the user via voice and visual alerts. This warning level is adjusted based on the user's real-time emotional state. The device uses its camera to capture the user's face and performs emotion analysis from their facial expressions. The emotion analysis engine detects the user's anxiety, impatience, and other emotions, and generates a warning appropriate to the situation.

[0580] For example, if the device detects that the user is feeling anxious, it may sound a louder-than-usual alarm or display a visual warning with a more prominent color scheme. This allows the user to quickly understand the risk and take safe action.

[0581] Furthermore, recording automatically begins in the event of contact or an accident, and the collected data is saved via cloud storage. This feature plays a crucial role in collecting evidence after an accident and in insurance procedures.

[0582] An example of a prompt for the generative AI model is, "Please explain what kind of warning would be effective when you are feeling anxious. Please describe the specific situation and the countermeasures." This system makes it possible to drive vehicles safely while taking into account the user's psychological state.

[0583] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0584] Step 1:

[0585] The user installs a dedicated application on their smartphone and attaches it to the electric personal mobility device. Launching the application makes all system functions available. The input is the user attaching the smartphone and launching the app, while the output is the completion of the app's launch and the start of environmental monitoring. Specifically, the user taps the app and operates it through the interface.

[0586] Step 2:

[0587] The device continuously captures ambient video data using the smartphone's camera. The input is the video data acquired by the camera, and the output is the result of image analysis by an AI model. Specifically, the device sends live video from the camera frame by frame to the AI system for real-time processing.

[0588] Step 3:

[0589] The device uses an AI model to identify objects from acquired video footage and analyzes their movement patterns and distances. The input is the image analysis data obtained in step 2, and the output is the movement history and distance information of the detected objects. Specifically, the device executes an AI algorithm to identify objects based on a certain threshold.

[0590] Step 4:

[0591] The terminal assesses the risk of contact and generates a warning. The input is the analysis result from step 3, and the output is the trigger for the warning (e.g., an audio warning or a visual alert). Specifically, when an object comes within a certain distance, the terminal immediately activates an alarm and displays a warning message on the screen.

[0592] Step 5:

[0593] The device uses a camera to capture the user's facial expressions and performs real-time emotion analysis using an emotion engine. The input is the user's facial image, and the output is the analyzed emotional state. Specifically, the device analyzes the user's facial features to identify emotions such as anxiety or worry.

[0594] Step 6:

[0595] The device customizes warnings according to the user's emotional state. The input is the result of the emotion analysis in step 5, and the output is the adjusted warning content. Specifically, when the emotional state is tense, the device selects an enhanced warning method, increasing the volume and visual display.

[0596] Step 7:

[0597] The device automatically records video and saves it to the cloud in the event of contact or an accident. The input is video data from the moment the accident occurs, and the output is the recorded footage securely stored in the cloud. Specifically, the device calls a cloud API to initiate the process of backing up the recorded data.
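Step 7 calls a cloud API to back up the recording, and earlier passages note that the upload happens once an internet connection is available. One way to reconcile the two is a local queue that is flushed when connectivity returns. The sketch below assumes this design; `UploadQueue` is a hypothetical name, and the `upload` and `is_online` callables stand in for a real cloud client and connectivity check, neither of which is specified in the source.

```python
from collections import deque
from typing import Callable, Deque


class UploadQueue:
    """Holds recorded clips locally until they can be sent to cloud storage."""

    def __init__(self, upload: Callable[[str], bool], is_online: Callable[[], bool]):
        self._upload = upload        # hypothetical wrapper around a cloud API call
        self._is_online = is_online  # hypothetical connectivity check
        self._pending: Deque[str] = deque()

    def enqueue(self, clip_path: str) -> None:
        """Register a recorded clip for later upload."""
        self._pending.append(clip_path)

    def flush(self) -> int:
        """Upload pending clips in order while online; return how many succeeded."""
        sent = 0
        while self._pending and self._is_online():
            if not self._upload(self._pending[0]):
                break  # keep the clip queued and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of clips still waiting to be uploaded."""
        return len(self._pending)
```

A clip is removed from the queue only after its upload reports success, so a dropped connection mid-transfer leaves the evidence on the device for the next attempt.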

[0598] (Application Example 2)

[0599] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0600] In autonomous vehicles, if occupants experience feelings of anxiety or impatience during operation, conventional warning systems lack sufficient psychological support, limiting their ability to improve safety. Therefore, there is a need for a system that analyzes the occupants' emotional state in real time and dynamically adjusts appropriate warnings and information provision to achieve both safety and comfort.

[0601] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0602] In this invention, the server includes means for continuously acquiring video data using an imaging device attached to a moving object, means for analyzing the movement and distance of an object based on the acquired video data, and means for analyzing the user's emotional state and adjusting the generated warning according to that state. This makes it possible to provide appropriate warnings and information that correspond to the emotional state of the occupants.

[0603] A "mobile object" is a machine or device capable of changing its position, and in this invention, it specifically refers to an autonomous vehicle.

[0604] An "imaging device" is a device that acquires surrounding visual information as digital data, and includes devices such as cameras.

[0605] "Video data" refers to visual information acquired by an imaging device, represented in digital format, and in a format that allows for analysis and storage.

[0606] "Objects" refer to people and objects present around a moving vehicle that should be recognized and detected for safe driving.

[0607] "Emotional state" refers to the mental state of the occupants and can be expressed as anxiety, impatience, relaxation, etc.

[0608] A "warning" is a visual or auditory presentation of information intended to convey the presence of a potential hazard and to attract the attention of the occupants.

[0609] "External storage device" refers to a storage means installed separately from a mobile device for storing data, and includes remote data storage services such as the cloud.

[0610] The system that realizes this invention is operated by a server mounted on a mobile vehicle. The server uses a camera that functions as an imaging device to continuously acquire video data of the surroundings. The video data is analyzed using AI image analysis software to identify the position and movement of objects, and based on this information, the server determines whether the mobile vehicle is at risk of collision.

[0611] This system incorporates an emotion analysis tool to assess the occupants' emotional state in real time. The server captures the occupants' facial expressions, and the emotion analysis engine analyzes their psychological state, such as anxiety or impatience. The system dynamically adjusts the content and method of warnings according to the occupants' emotional state. For example, if an occupant is feeling anxious, the intensity of visual warnings and audio alerts may be increased, and information may be provided to reassure them.

[0612] Furthermore, in the event of a collision, the server immediately saves video data of that moment to an external storage device. This saving function utilizes cloud services to store the data remotely, allowing it to be used as evidence in the event of an accident.

[0613] For example, when an autonomous vehicle is driving through a busy area and the distance to the vehicle in front decreases, the system analyzes the occupants' emotions and, if it determines that they are feeling anxious, may switch to a mode that provides calming music or helpful information.

[0614] In this way, the server aims to improve the safety of the vehicle and the comfort of the occupants by providing advanced safe driving support tailored to the occupants' psychological state.

[0615] An example of a prompt message is: "Use AI to perform emotion analysis and environmental awareness within the autonomous vehicle, understand the occupants' emotional state, and suggest ways to reduce anxiety."

[0616] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0617] Step 1:

[0618] The server continuously acquires video data of the surrounding environment through an imaging device mounted on a mobile vehicle. The input is raw data from the camera, and the output is video data. This video data is used in subsequent analysis processing.

[0619] Step 2:

[0620] The server inputs the acquired video data into AI image analysis software to analyze the movement and distance of objects. Here, data processing is performed to recognize objects from the image data and calculate their position and velocity. This outputs data used for collision risk assessment.

[0621] Step 3:

[0622] The server uses the analysis results to assess the likelihood of a collision. In this step, it determines the presence or absence of a collision risk based on information about the movement and distance of the objects, and outputs the result.
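
As a rough illustration of Step 3, the sketch below derives a collision-risk flag from an object's distance and closing speed using a time-to-collision check. The disclosure does not specify the calculation; the function name, the parameters `ttc_threshold_s` and `min_distance_m`, and their default values are assumptions made for this example.

```python
def collision_risk(distance_m: float, closing_speed_mps: float,
                   ttc_threshold_s: float = 2.0,
                   min_distance_m: float = 1.0) -> bool:
    """Return True when a collision risk is judged to be present.

    distance_m: current distance to the object, in metres.
    closing_speed_mps: rate at which the distance shrinks
        (positive means the object is approaching).
    The thresholds are hypothetical values, not taken from the disclosure.
    """
    if distance_m <= min_distance_m:
        return True  # already too close, regardless of speed
    if closing_speed_mps <= 0:
        return False  # object is keeping its distance or moving away
    time_to_collision_s = distance_m / closing_speed_mps
    return time_to_collision_s < ttc_threshold_s
```

With these defaults, an object 10 m away closing at 10 m/s is flagged (time to collision of 1 s), while the same object closing at 2 m/s is not.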

[0623] Step 4:

[0624] The server captures images of the occupants' faces and inputs them into an emotion analysis engine to analyze their emotional state. The input is facial expression images; the engine evaluates the emotional state, such as the level of anxiety or relaxation, and outputs it as emotion evaluation data.

[0625] Step 5:

[0626] The server generates warnings and adjusts their intensity and type based on collision risk and emotion assessment data. Specifically, if occupants are feeling anxious, it generates strong visual and auditory warnings. As a result, appropriate warning signals are output to the user.
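
The selection logic in Step 5 could be sketched as follows. This is only an illustration: the `WarningSignal` type, the emotion labels, and every numeric level are hypothetical, since the disclosure states only that anxious occupants receive stronger visual and auditory warnings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WarningSignal:
    volume: float        # 0.0 (silent) to 1.0 (maximum)
    visual_scale: float  # relative prominence of the on-screen alert
    message: str


def select_warning(collision_risk: bool, emotion: str) -> Optional[WarningSignal]:
    """Pick a warning from the collision-risk flag and an emotion label.

    The labels ("anxious", "relaxed", ...) and the levels below are
    assumptions for illustration only.
    """
    if not collision_risk:
        return None  # no risk, so no warning is generated
    if emotion == "anxious":
        # Stronger audio and visual output for an anxious occupant.
        return WarningSignal(volume=1.0, visual_scale=1.5,
                             message="Collision risk ahead")
    # Standard warning for any other emotional state.
    return WarningSignal(volume=0.6, visual_scale=1.0,
                         message="Collision risk ahead")
```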

[0627] Step 6:

[0628] The server detects contact when it occurs, immediately records video footage of that moment, and saves it to external storage. Here, collision detection triggers the transfer of video data to a cloud service. This process ensures that accident records are securely stored.

[0629] Step 7:

[0630] Users receive relaxing music and information provided by the system, which reduces anxiety. In this process, the content provided by the server reduces the psychological burden on the occupants.

[0631] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0632] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0633] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0634] [Fourth Embodiment]

[0635] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0636] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0637] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0638] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0639] The microphone 238 receives instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0640] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0641] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0642] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0643] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0644] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0645] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0646] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0647] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0648] This invention is a system for ensuring the safety of personal mobile devices, and is mainly composed of an imaging device attached to the mobile device and artificial intelligence technology. Specific embodiments of this system will be described below.

[0649] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera and AI control functions and is ready to monitor safety in real time.

[0650] Next, the device uses its camera to continuously acquire video data of its surroundings. This captures objects in front of and around the moving object in the video, collecting information for safe operation. The acquired video data is processed by AI within the device to determine the movement and distance of the objects.

[0651] AI-powered image analysis uses deep learning models to accurately recognize pedestrians, vehicles, and other obstacles. Based on this information, the device evaluates the distance to each object and its movement in real time, and immediately generates an alert if it determines there is a risk of contact.

[0652] As a concrete example, suppose a device is operating in a busy area and suddenly detects a pedestrian crossing the road ahead. In this case, the AI calculates the risk of collision with the pedestrian and issues a warning to the user through an audio alert or a warning message on the screen. This allows the user to take quick and appropriate action.

[0653] Furthermore, if contact occurs, the terminal automatically records video of the scene and generates a video clip that includes information from a certain period prior. Once an internet connection becomes available, the recorded video is uploaded to an external storage device, i.e., a cloud service, and securely stored as evidence of the accident. The server strictly manages this data and can provide it to insurance companies or legal authorities as needed.
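
A clip that "includes information from a certain period prior" to contact implies that recent frames are buffered before any contact is detected. One common way to do this is a fixed-size ring buffer, sketched below. The class name `PreEventRecorder`, its methods, and the frame counts are hypothetical; the disclosure does not describe the buffering mechanism.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class PreEventRecorder:
    """Keeps the most recent frames so a clip can include pre-contact footage."""

    def __init__(self, pre_frames: int, post_frames: int) -> None:
        # Rolling window of the latest frames; old frames drop off automatically.
        self._pre: Deque[Any] = deque(maxlen=pre_frames)
        self._post_frames = post_frames
        self._clip: Optional[List[Any]] = None  # clip being assembled, if any
        self._remaining = 0

    def on_contact(self) -> None:
        """Start a clip seeded with the buffered pre-contact frames."""
        if self._clip is None:
            self._clip = list(self._pre)
            self._remaining = self._post_frames

    def add_frame(self, frame: Any) -> Optional[List[Any]]:
        """Feed one frame; returns the finished clip once it is complete."""
        if self._clip is None:
            self._pre.append(frame)
            return None
        self._clip.append(frame)
        self._remaining -= 1
        if self._remaining <= 0:
            clip, self._clip = self._clip, None
            self._pre.clear()
            return clip
        return None
```

The returned clip can then be held on the terminal until an internet connection is available for upload.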

[0654] Thus, the present invention is a system that embodies efforts to improve safety by reducing the risk of accidents during the operation of mobile objects and by quickly recording and saving the circumstances of accidents that occur.

[0655] The following describes the processing flow.

[0656] Step 1:

[0657] The user securely attaches their smartphone to the moving object and launches the dedicated application. The application automatically activates the camera and sensors and prepares for use.

[0658] Step 2:

[0659] The device uses the smartphone's camera to continuously acquire video data. This allows for the collection of information about the environment in front and around the device, which is then used for analysis.

[0660] Step 3:

[0661] The device applies an AI algorithm to each frame of the acquired video data to analyze the position, speed, and distance of objects. It utilizes a deep learning model to identify pedestrians, other vehicles, and stationary objects.

[0662] Step 4:

[0663] The device evaluates the risk of contact with the object in real time based on the analysis results. If the risk exceeds a certain threshold, it is determined that there is a risk of contact.

[0664] Step 5:

[0665] If the device detects a high risk of contact, it will immediately warn the user. The warning will be provided through audio alerts and on-screen displays to prompt the user to take immediate action.

[0666] Step 6:

[0667] The user receives an alert and takes safe driving actions such as changing lanes, slowing down, or stopping. This helps prevent accidents.

[0668] Step 7:

[0669] If contact occurs, the device immediately activates its recording function and records video before and after the contact. The recorded video is temporarily stored on the device.

[0670] Step 8:

[0671] If the device has an internet connection, it can upload recorded video data to an external storage device such as the cloud, ensuring secure storage.
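
Steps 7 and 8 together describe storing a clip locally and uploading it once a connection exists. A minimal sketch of that deferred upload is shown below. The `is_online` and `upload` callables are hypothetical hooks that the application would supply; no particular cloud API is assumed.

```python
from collections import deque
from typing import Callable, Deque


class DeferredUploader:
    """Holds recorded clips locally and uploads them once a connection exists."""

    def __init__(self, is_online: Callable[[], bool],
                 upload: Callable[[bytes], bool]) -> None:
        # Both callables are hypothetical application-supplied hooks;
        # `upload` is expected to return True on success.
        self._is_online = is_online
        self._upload = upload
        self._pending: Deque[bytes] = deque()

    def enqueue(self, clip: bytes) -> None:
        """Store a clip temporarily until it can be uploaded."""
        self._pending.append(clip)

    def flush(self) -> int:
        """Upload pending clips in order; returns how many were uploaded."""
        sent = 0
        while self._pending and self._is_online():
            if not self._upload(self._pending[0]):
                break  # keep the clip for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending_count(self) -> int:
        return len(self._pending)
```

A clip is removed from the local queue only after the upload reports success, so a dropped connection does not lose evidence.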

[0672] Step 9:

[0673] The server will properly manage the video data stored in the cloud and establish a system to provide the data to insurance companies and legal authorities when necessary for accident investigation.

[0674] (Example 1)

[0675] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0676] Enhancing the safety of personal mobility devices has become a critical issue in recent years, given the increasing number of users. In particular, there is a need for systems that can appropriately detect risks and respond quickly to prevent accidental contact or collisions. Furthermore, in the event of a contact accident, it is essential to reliably record the circumstances to aid in subsequent responses and investigations. While these challenges can be addressed by enabling mobile devices to understand their surroundings and implement appropriate risk management, existing technologies have not adequately addressed them.

[0677] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0678] In this invention, the server includes means for continuously acquiring video information by a detection device attached to a mobile vehicle, means for recognizing an object using machine learning technology based on the acquired video information and evaluating its movement and estimated distance, and means for automatically recording video and saving it to a recording device located remotely when contact occurs. As a result, the mobile vehicle can monitor surrounding risks in real time, support safe operation, and in the event of an accident, record the situation to expedite subsequent responses.

[0679] A "detection device" is a device attached to a moving object that continuously acquires video information by continuously photographing the surrounding environment.

[0680] "Video information" refers to information about the surrounding environment as visual data acquired by a detection device.

[0681] "Machine learning techniques" are algorithms and methods that enable computer systems to identify patterns from data and make automatic decisions based on those patterns.

[0682] "Evaluation of movement and estimated distance" is a process that uses machine learning technology to analyze and evaluate the movement patterns and approach distances of objects from video information obtained by a detection device.

[0683] A "recording device" is a remote storage medium or system used to store data related to events that occur in a moving object.

[0684] A "cloud-based information storage service" is an online platform for storing and managing digital data that is located remotely and accessible via the internet.

[0685] A "mobile device" is any form of transport equipment designed to carry people or goods, including personal electric devices.

[0686] This invention is a system integrated into a personal mobile device and is designed to ensure user safety. The system mainly consists of a user-attached terminal (such as a smartphone), its built-in camera, and an application equipped with AI technology.

[0687] First, the user attaches their smartphone to the mobile device and launches a dedicated application. This application integrates camera control functions and AI, and is ready to monitor safety in real time.

[0688] Next, the device uses the smartphone's camera to continuously acquire video information of its surroundings. This video information is processed by a deep learning model using machine learning techniques to accurately recognize pedestrians, vehicles, and obstacles. The AI evaluates the movement and estimated distance of these objects and immediately generates an alert if danger is predicted. This alert is communicated to the user via voice and on-screen display.

[0689] If contact occurs, the server will save the video footage recorded by the terminal to a remote cloud-based information storage service. This ensures that evidence of the incident is securely stored and can be used later as reference if necessary.

[0690] To give a specific example, if the device detects a bicycle that suddenly appears in front of it in a busy area, and the risk of collision increases, the user will be given a voice alert saying, "Please slow down."

[0691] An example of a prompt to input into the generative AI model would be: "Please describe the operating process of the safety system for personal mobile devices. In particular, I would like to know more about the method used for object recognition by AI."

[0692] Through this invention, it becomes possible to support the safe operation of mobile vehicles and minimize the risk of accidents.

[0693] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0694] Step 1:

[0695] The user attaches their smartphone to a moving object and launches a dedicated application. This activates the camera and puts it into safety monitoring mode. The input is the status of the smartphone (camera activated), and the output is an instruction for the camera to start recording. In this process, the user physically secures the smartphone and manually launches the application.

[0696] Step 2:

[0697] The device uses the smartphone's camera to continuously acquire video information of its surroundings. The input is video information captured by the camera, and the output is video data generated in real time. This allows the device to continuously record visual information. In this operation, the camera captures the situation at a high frame rate and passes the video data to the application.

[0698] Step 3:

[0699] The AI installed in the device processes the acquired video data and uses a deep learning model to recognize objects. The input is the acquired video data, and the output is information about the recognized objects (pedestrians, vehicles, obstacles, etc.). In this operation, the AI uses machine learning algorithms to extract features from the video and perform identification.

[0700] Step 4:

[0701] The terminal evaluates the movement and estimated distance of recognized objects to determine the risk of contact. The input is object information, and the output is the result of the risk assessment. In this evaluation, the AI analyzes the movement of the object and calculates whether it deviates from a safe distance range.
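
The disclosure does not define the "safe distance range". One plausible reading, shown here purely as an assumption, is the standard stopping-distance relation: the distance travelled during the rider's reaction time plus the braking distance, $d_{safe} = v\,t_{react} + v^2 / (2a)$. The default reaction time and deceleration below are illustrative values only.

```python
def safe_distance_m(speed_mps: float, reaction_time_s: float = 1.0,
                    deceleration_mps2: float = 3.0) -> float:
    """Stopping distance: reaction distance plus braking distance.

    The reaction time and deceleration defaults are assumptions,
    not values from the disclosure.
    """
    if speed_mps < 0 or deceleration_mps2 <= 0:
        raise ValueError("speed must be >= 0 and deceleration must be > 0")
    reaction_distance = speed_mps * reaction_time_s
    braking_distance = speed_mps ** 2 / (2.0 * deceleration_mps2)
    return reaction_distance + braking_distance


def violates_safe_distance(distance_m: float, speed_mps: float) -> bool:
    """True when the object is closer than the stopping distance at this speed."""
    return distance_m < safe_distance_m(speed_mps)
```

For example, at 6 m/s with the defaults above the safe distance is 6 m of reaction distance plus 6 m of braking distance, or 12 m in total.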

[0702] Step 5:

[0703] The device generates an alert based on a risk assessment and notifies the user. The input is the result of the risk assessment, and the output is an alert message (audio or screen display). Specifically, the device emits a warning sound or visual instructions, allowing the user to take appropriate action based on them.

[0704] Step 6:

[0705] The server automatically saves video footage recorded by the terminal to a cloud-based recording device when contact occurs. The input is the video data of the incident to be saved, and the output is the video information securely stored in the cloud. This process involves uploading data externally using an internet connection.

[0706] (Application Example 1)

[0707] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0708] In modern mobility technology, particularly autonomous driving systems, it is crucial to accurately perceive the surrounding environment in real time to ensure safety. However, existing systems often fail to adequately recognize the surrounding environment, increasing the risk of traffic accidents. Furthermore, there is a need for a system that can quickly and safely record and store the circumstances of an accident when it occurs.

[0709] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0710] In this invention, the server includes means for continuously acquiring image information using a video acquisition device, means for analyzing the movement and distance of an object based on the acquired image information, and means for recognizing an object using a deep learning model for image analysis. This makes it possible for a moving object to grasp its surroundings in real time and accurately, thereby enhancing safety. Furthermore, by transmitting the acquired video information to a remote location via a communication network and automatically recording video at the time of contact, accidents can be recorded and saved quickly and reliably.

[0711] A "video acquisition device" is a device attached to a moving object to continuously acquire video information of the surroundings.

[0712] "Image information" refers to visual data acquired by video acquisition devices, which is used for the recognition and analysis of objects.

[0713] "Object" refers to dynamic or static elements such as objects or people that exist around a moving object.

[0714] A "mobile body" is a mechanism responsible for the movement of people or objects, and in this invention, it includes an automatic driving mechanism.

[0715] "Analysis" is the process of understanding and evaluating the content of acquired image information using a specific algorithm.

[0716] A "deep learning model" is a collection of algorithms built on AI technology, designed to recognize objects from image information with high accuracy.

[0717] An "external information storage device" is a device or system for storing data acquired from a mobile object, and in this invention, it refers to an information storage service located in a remote location.

[0718] A "communication network" is a system of telecommunications used to transfer information to a remote location, and includes the Internet.

[0719] "Real-time" refers to a time frame in which the situation in question can be processed and reacted to immediately.

[0720] The system for implementing this invention primarily involves the coordinated operation of a server and a terminal. The terminal is attached to a mobile device and uses a camera to acquire real-time images of the surroundings. The basic principle is to continuously acquire image information from the camera, which acts as an image acquisition device.

[0721] 1. Terminal processing:

[0722] The terminal is a smartphone or similar mobile device and analyzes acquired image information using a deep learning model. The AI technologies used include frameworks such as TensorFlow and PyTorch. This enables accurate recognition of objects such as pedestrians, vehicles, and bicycles, and assessment of the risk of contact with moving objects. If a risk is detected, the device issues a warning signal to the user.

[0723] 2. Server processing:

[0724] The server manages data transmitted from terminals in a cloud environment. Remote data storage services such as AWS and Google Cloud Storage are used as external data storage devices. The server receives acquired video information via the communication network and records the video if contact occurs. The stored data is kept secure and accessible for later reference.

[0725] 3. User roles:

[0726] The user operates the vehicle according to instructions from the terminal. When a warning is issued, the user receives audio and visual alerts and takes appropriate evasive action.

[0727] As a concrete example, at a busy intersection, the system detects at an early stage a cyclist crossing in disregard of the traffic signals and immediately warns the user to avoid a collision. In this case, the generative AI model performs analysis using prompt statements like the following.

[0728] "To the AI model: Use this video data as input to identify pedestrians, cyclists, and vehicles, and calculate the risk of collision. Increase the frequency of alerts when pedestrians are close and notify the driver in real time."

[0729] In this way, the invention functions as a system that instantly grasps the surrounding situation and enhances safety in real time.

[0730] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0731] Step 1:

[0732] The terminal uses a video acquisition device to continuously acquire image information from around a moving object. The input is video data obtained through the camera, and the output is raw image data.

[0733] Step 2:

[0734] The image data acquired by the device is analyzed using a deep learning model. In this process, the image data is input into an AI framework such as TensorFlow, and the model identifies objects (pedestrians, vehicles, etc.). The output is the identified objects, their location information, and predictions of their movement.

[0735] Step 3:

[0736] The device evaluates the risk of contact between a moving object and an object based on the analysis results. The input is data on the object and its movement, identified by a deep learning model. The device uses this data to calculate the distance and speed to the object and determine the risk of contact. The output is the result of the risk evaluation.

[0737] Step 4:

[0738] The device issues a warning to the user based on a risk assessment. The input is the result of the risk assessment, and the output is an audio or visual alert. Specifically, the user is notified with a warning sound or on-screen display.

[0739] Step 5:

[0740] The server receives and stores image data of high-risk events sent from the terminal. The input is a video clip sent from the terminal. The server stores this in cloud storage and provides the data as needed upon request. The output is securely stored video data.

[0741] Step 6:

[0742] The user receives a warning from the device and takes appropriate evasive action. The input is the warning notification from the device. The user performs driving operations as needed to avoid the danger. The output is the evasive action taken.

[0743] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0744] This invention aims to support the safe operation of personal mobility devices by constructing a warning system that takes into account the surrounding environment using an imaging device and the user's emotional state. Embodiments of this invention are described in detail below.

[0745] First, the user attaches their smartphone to a personal mobility device such as an electric scooter or bicycle and launches a dedicated application. This application incorporates camera control, AI image analysis capabilities, and an emotion engine.

[0746] The device continuously acquires surrounding video data using the smartphone's camera and uses an AI algorithm to identify objects in front of and around it. During this process, the distance and movement of the objects are evaluated in real time, and a warning is generated if a contact risk is detected.

[0747] Furthermore, the device captures the user's facial expressions and uses an emotion engine to analyze the user's emotional state in real time. This emotion analysis allows the device to appropriately adjust the intensity and type of warnings when the user is feeling anxious or worried. For example, it can attract the attention of a user who is highly tense by increasing visual information, while applying standard warning methods if the user is relaxed.

[0748] As a concrete example, if the device detects a vehicle approaching on a congested city road and simultaneously determines from the user's facial expression that they are feeling anxious, it will issue a stronger audio alert and display a clearer visual warning on the screen. As a result, the user can recognize the danger more quickly and take safer driving actions.

[0749] Furthermore, if contact occurs, the device automatically records the moment and can securely save the data using a cloud service. This data can be effectively used as evidence in the event of an accident to meet insurance procedures and legal requirements.

[0750] Thus, the present invention provides an embodiment of a system that comprehensively considers the movement environment of a moving object and the psychological state of the user, and contributes to improving safety.

[0751] The following describes the processing flow.

[0752] Step 1:

[0753] The user attaches a smartphone with the dedicated application installed to the personal mobility device and launches the application. The application activates the camera and sensors and completes the initial setup.

[0754] Step 2:

[0755] The device uses the smartphone's camera to acquire surrounding video data in real time and prepares for analysis frame by frame.

[0756] Step 3:

[0757] The device uses an AI algorithm to identify objects such as pedestrians, other vehicles, and stationary objects from the acquired video data, and calculates the distance and relative speed to each object.

[0758] Step 4:

[0759] The device simultaneously captures images of the user's face and analyzes their emotional state using an emotion engine. It identifies the user's facial expressions and evaluates their emotional state, such as their level of tension or relaxation.

[0760] Step 5:

[0761] The device comprehensively evaluates the risk of contact by combining data on the object's movement and distance, as well as the user's emotional state. If there is a risk of contact, it generates a warning, dynamically adjusting the warning method and its intensity according to the user's emotional state.

[0762] Step 6:

[0763] The user receives the warning when it is issued. It is usually presented as a visual or audible alert, but if the user is particularly anxious, the device strengthens it, for example by raising the audio volume or making the on-screen warning more prominent.

[0764] Step 7:

[0765] If the device detects that contact has occurred, it immediately records the event and saves the data. The recording covers the period before and after the contact and is saved as an appropriate clip.

[0766] Step 8:

[0767] The device processes the recorded data and uploads it to a cloud service once an internet connection is available. This process ensures the secure storage of the data and prepares it for future use.

[0768] Step 9:

[0769] The server organizes and stores uploaded data, and is prepared to smoothly provide it to insurance companies and legal authorities as needed. This process streamlines accident processing and reduces the burden on users.
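As an illustration of Step 7, a clip covering the period before and after contact can be obtained by keeping recent frames in a rolling buffer. The following Python sketch is not taken from this disclosure: the class name `ContactClipRecorder`, the parameters `pre_frames` and `post_frames`, and the treatment of frames as arbitrary objects are assumptions made for illustration only.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class ContactClipRecorder:
    """Buffers recent frames and assembles a clip around a contact event."""

    def __init__(self, pre_frames: int, post_frames: int) -> None:
        if pre_frames < 1 or post_frames < 1:
            raise ValueError("pre_frames and post_frames must be at least 1")
        # Rolling buffer: old frames drop out automatically once it is full.
        self._recent: Deque[Any] = deque(maxlen=pre_frames)
        self._post_frames = post_frames
        self._clip: Optional[List[Any]] = None  # clip under assembly, if any
        self._remaining = 0  # post-contact frames still to collect

    def push(self, frame: Any, contact: bool = False) -> Optional[List[Any]]:
        """Feed one frame; return a finished clip, or None if none is ready."""
        if self._clip is None:
            self._recent.append(frame)
            if contact:
                # Start the clip from the buffered history, which ends with
                # the contact frame itself.
                self._clip = list(self._recent)
                self._remaining = self._post_frames
                self._recent.clear()
            return None
        # A clip is being completed: append post-contact frames.
        self._clip.append(frame)
        self._remaining -= 1
        if self._remaining == 0:
            finished, self._clip = self._clip, None
            return finished
        return None
```

With `pre_frames=3` and `post_frames=2`, a contact flagged on frame 5 of a stream numbered from 0 yields a clip of frames 3 to 7. Contact flags raised while a clip is still being completed are ignored in this sketch.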

[0770] (Example 2)

[0771] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0772] Ensuring safety while driving personal vehicles is a critical issue. However, conventional technology has not provided safe-driving support through warning systems that take the driver's emotional state into account. As a result, there have been cases where the driver could not be adequately warned of a collision risk, and safety could not be improved.

[0773] The identification process performed by the identification processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0774] In this invention, the server includes means for continuously acquiring surrounding information using an imaging device attached to a mobile body, means for analyzing the characteristics of an object based on the acquired information, and means for analyzing the user's facial expressions to evaluate their emotional state and adjust warnings accordingly. This enables appropriate warnings that take into account the driver's psychological state, thereby improving the safety of personal mobility devices.

[0775] "Mobile body" refers to a personal transport device, and specifically includes electrically powered devices.

[0776] An "imaging device" refers to a device that includes hardware and software for continuously acquiring visual information from the surroundings.

[0777] "Surrounding information" refers to all data about the environment acquired by the imaging device, including video and other sensor data.

[0778] "Means for analyzing the characteristics of an object" refers to a method that uses AI technology to analyze the movement and distance of an object based on acquired surrounding information.

[0779] "Means for analyzing the user's facial expressions" refers to a system that uses an emotion engine to evaluate the user's emotional state based on video data acquired by an imaging device.

[0780] "Means of adjusting warnings" refers to technology that has the function of appropriately changing the intensity and form of warnings according to the user's emotional state.

[0781] The embodiments for carrying out the present invention will now be described. This invention provides a system for a personal mobility device that acquires information about its surroundings and supports safe driving.

[0782] The user attaches their smartphone to a motorized personal mobility device and launches a dedicated application. This application incorporates camera control, AI image analysis, and an emotion analysis engine. This enables the smartphone to monitor the status of the mobility device.

[0783] The device, specifically a smartphone, continuously acquires images of its surroundings using its built-in camera. These images are analyzed by AI algorithms, allowing for real-time identification of object characteristics and movements. Distance sensors and GPS functionality may also be used in the AI analysis.

[0784] The device uses an AI model to assess the distance to an object and the risk of contact. If a high level of danger is detected, the device generates a warning and communicates it to the user via voice and visual alerts. This warning level is adjusted based on the user's real-time emotional state. The device uses its camera to capture the user's face and performs emotion analysis from their facial expressions. The emotion analysis engine detects the user's anxiety, impatience, and other emotions, and generates a warning appropriate to the situation.

[0785] For example, if the device detects that the user is feeling anxious, it may sound a louder-than-usual alarm or display a visual warning with a more prominent color scheme. This allows the user to quickly understand the risk and take safe action.
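The adjustment described above can be sketched as a small mapping from an estimated emotion to alert parameters. This Python sketch is illustrative only: the `Alert` type, the emotion labels, the 30-point volume increase, and the color names are assumptions and are not specified in this disclosure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Alert:
    volume_pct: int   # speaker volume, 0 to 100
    color: str        # on-screen highlight color
    message: str


# Emotions treated as "tense" in this sketch (an assumption).
TENSE_EMOTIONS = {"anxious", "impatient"}


def adjust_alert(base: Alert, emotion: str) -> Alert:
    """Return a stronger alert for tense users, the base alert otherwise."""
    if emotion in TENSE_EMOTIONS:
        # Raise the volume (capped at 100) and switch to a more prominent color.
        return replace(base, volume_pct=min(100, base.volume_pct + 30), color="red")
    return base


standard = Alert(volume_pct=60, color="yellow", message="Object approaching")
```

Here `adjust_alert(standard, "anxious")` returns an alert at volume 90 in red, while `adjust_alert(standard, "relaxed")` returns the standard alert unchanged.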

[0786] Furthermore, recording automatically begins in the event of contact or an accident, and the collected data is saved via cloud storage. This feature plays a crucial role in collecting evidence after an accident and in insurance procedures.

[0787] An example of a prompt given to the generative AI model is, "Please explain what kind of warning would be effective when you are feeling anxious. Please describe the specific situation and the countermeasures." This system makes it possible to drive vehicles safely while taking into account the user's psychological state.

[0788] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0789] Step 1:

[0790] The user installs a dedicated application on their smartphone and attaches it to the electric personal mobility device. Launching the application makes all system functions available. The input is the user attaching the smartphone and launching the app, while the output is the completion of the app's launch and the start of environmental monitoring. Specifically, the user taps the app and operates it through the interface.

[0791] Step 2:

[0792] The device continuously captures ambient video data using the smartphone's camera. The input is the video data acquired by the camera, and the output is the result of image analysis by an AI model. Specifically, the device sends live video from the camera frame by frame to the AI system for real-time processing.

[0793] Step 3:

[0794] The device uses an AI model to identify objects from acquired video footage and analyzes their movement patterns and distances. The input is the image analysis data obtained in step 2, and the output is the movement history and distance information of the detected objects. Specifically, the device executes an AI algorithm to identify objects based on a certain threshold.

[0795] Step 4:

[0796] The terminal assesses the risk of contact and generates a warning. The input is the analysis result from step 3, and the output is the trigger for the warning (e.g., an audio warning or a visual alert). Specifically, when an object comes within a certain distance, the terminal immediately activates an alarm and displays a warning message on the screen.

[0797] Step 5:

[0798] The device uses a camera to capture the user's facial expressions and performs real-time emotion analysis using an emotion engine. The input is the user's facial image, and the output is the analyzed emotional state. Specifically, the device analyzes the user's facial features to identify emotions such as anxiety or worry.

[0799] Step 6:

[0800] The device customizes warnings according to the user's emotional state. The input is the result of the emotion analysis in step 5, and the output is the adjusted warning content. Specifically, when the emotional state is tense, the device selects an enhanced warning method, increasing the volume and visual display.

[0801] Step 7:

[0802] The device automatically records video and saves it to the cloud in the event of contact or an accident. The input is video data from the moment the accident occurs, and the output is the recorded footage securely stored in the cloud. Specifically, the device calls a cloud API to initiate the process of backing up the recorded data.
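Steps 3 and 4 above decide whether to warn from an object's distance and movement. One common way to combine the two is a distance threshold together with a time-to-contact estimate; this is offered as a sketch under stated assumptions, not as the method of this disclosure. The names `TrackedObject`, `time_to_contact`, and `should_warn`, and the default thresholds of 2.0 m and 1.5 s, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObject:
    distance_m: float         # current distance to the object, in meters
    closing_speed_mps: float  # positive when the gap is shrinking


def time_to_contact(obj: TrackedObject) -> float:
    """Seconds until contact at the current closing speed (inf if not closing)."""
    if obj.closing_speed_mps <= 0:
        return math.inf
    return obj.distance_m / obj.closing_speed_mps


def should_warn(obj: TrackedObject,
                min_distance_m: float = 2.0,
                min_ttc_s: float = 1.5) -> bool:
    """Warn when the object is already close or would be reached soon."""
    return obj.distance_m <= min_distance_m or time_to_contact(obj) <= min_ttc_s
```

An object 10 m away closing at 10 m/s triggers a warning (time to contact of 1 s), whereas the same object closing at 2 m/s does not. A stationary object 1.5 m away triggers a warning on distance alone.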

[0803] (Application Example 2)

[0804] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0805] In autonomous vehicles, if occupants experience feelings of anxiety or impatience during operation, conventional warning systems lack sufficient psychological support, limiting their ability to improve safety. Therefore, there is a need for a system that analyzes the occupants' emotional state in real time and dynamically adjusts appropriate warnings and information provision to achieve both safety and comfort.

[0806] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0807] In this invention, the server includes means for continuously acquiring video data using an imaging device attached to a moving object, means for analyzing the movement and distance of an object based on the acquired video data, and means for analyzing the user's emotional state and adjusting the generated warning according to that state. This makes it possible to provide appropriate warnings and information that correspond to the emotional state of the occupants.

[0808] A "mobile object" is a machine or device capable of changing its position, and in this invention, it specifically refers to an autonomous vehicle.

[0809] An "imaging device" is a device that acquires surrounding visual information as digital data, and includes devices such as cameras.

[0810] "Video data" refers to visual information acquired by an imaging device, represented in digital format, and in a format that allows for analysis and storage.

[0811] "Objects" refer to people and objects present around a moving vehicle that should be recognized and detected for safe driving.

[0812] "Emotional state" refers to the mental state of the occupants and can be expressed as anxiety, impatience, relaxation, etc.

[0813] A "warning" is a visual or auditory presentation of information intended to convey the presence of a potential hazard and to attract the attention of the occupants.

[0814] "External storage device" refers to a storage means installed separately from the mobile object for storing data, and includes remote data storage services such as the cloud.

[0815] The system that realizes this invention is operated by a server mounted on a mobile vehicle. The server uses a camera that functions as an imaging device to continuously acquire video data of the surroundings. The video data is analyzed using AI image analysis software to identify the position and movement of objects, and based on this information, the mobile vehicle analyzes whether there is a risk of collision.

[0816] This system incorporates an emotion analysis tool to assess the occupants' emotional state in real time. The server captures the occupants' facial expressions, and the emotion analysis engine analyzes their psychological state, such as anxiety or impatience. The system dynamically adjusts the content and method of warnings according to the occupants' emotional state. For example, if an occupant is feeling anxious, the intensity of visual warnings and audio alerts may be increased, and information may be provided to reassure them.

[0817] Furthermore, in the event of a collision, the server immediately saves video data of that moment to an external storage device. This saving function utilizes cloud services to store the data remotely, allowing it to be used as evidence in the event of an accident.
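Saving to a remote service from a moving vehicle has to tolerate temporary loss of connectivity. One possible arrangement is a store-and-forward queue that keeps clips locally until an upload succeeds. The Python sketch below is an assumption for illustration: `UploadQueue` and the `send` callable are hypothetical and do not correspond to any particular cloud API.

```python
from collections import deque
from typing import Callable, Deque


class UploadQueue:
    """Holds recorded clips until they have been uploaded successfully."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        # `send` is a hypothetical uploader that returns True on success.
        self._send = send
        self._pending: Deque[bytes] = deque()

    def add(self, clip: bytes) -> None:
        """Queue a clip for upload."""
        self._pending.append(clip)

    def flush(self) -> int:
        """Upload pending clips in order; stop at the first failure.

        Returns the number of clips uploaded by this call.
        """
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # keep the clip queued and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Calling `flush()` periodically, or whenever the network becomes reachable, uploads clips in the order they were recorded. A clip leaves the queue only after `send` reports success.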

[0818] For example, if an autonomous vehicle driving through a busy area analyzes the occupants' emotions when the distance to the vehicle in front decreases and determines that they are feeling anxious, the system may switch to a mode that provides calming music or helpful information.

[0819] In this way, the server aims to improve the safety of the vehicle and the comfort of the occupants by providing advanced safe driving support tailored to the occupants' psychological state.

[0820] An example of a prompt message is: "Use AI to perform emotion analysis and environmental awareness within the autonomous vehicle, understand the occupants' emotional state, and suggest ways to reduce anxiety."

[0821] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0822] Step 1:

[0823] The server continuously acquires video data from the surrounding environment through an imaging device mounted on a mobile vehicle. At this stage, raw data from the camera is input and output as video data. This video data is used in subsequent analysis processing.

[0824] Step 2:

[0825] The server inputs the acquired video data into AI image analysis software to analyze the movement and distance of objects. Here, data processing is performed to recognize objects from the image data and calculate their position and velocity. This outputs data used for collision risk assessment.

[0826] Step 3:

[0827] The server uses the analysis results to assess the likelihood of a collision. In this step, it determines the presence or absence of a collision risk based on information about the movement and distance of the objects, and outputs the result.

[0828] Step 4:

[0829] The server captures images of the occupants' faces and inputs them into an emotion analysis engine to analyze their emotional state. The input data consists of facial expression images; from this data the engine evaluates the emotional state, such as the level of anxiety or relaxation, and outputs it as emotion evaluation data.

[0830] Step 5:

[0831] The server generates warnings and adjusts their intensity and type based on collision risk and emotion assessment data. Specifically, if occupants are feeling anxious, it generates strong visual and auditory warnings. As a result, appropriate warning signals are output to the user.

[0832] Step 6:

[0833] The server detects contact when it occurs, immediately records video footage of that moment, and saves it to external storage. Here, collision detection triggers the transfer of video data to a cloud service. This process ensures that accident records are securely stored.

[0834] Step 7:

[0835] The occupants receive the relaxing music and information provided by the system, which reduces their anxiety. In this step, the content provided by the server eases the psychological burden on the occupants.
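Step 2 above calculates the position and velocity of recognized objects. Assuming that object positions have already been converted to metric coordinates centered on the vehicle (a conversion this sketch does not cover), velocity can be approximated by a finite difference across frames. The function names and the coordinate convention are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in meters, vehicle at the origin


def estimate_velocity(positions: List[Point], frame_interval_s: float) -> Point:
    """Average velocity over a track, from its first and last positions."""
    if len(positions) < 2:
        raise ValueError("at least two positions are required")
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    elapsed = frame_interval_s * (len(positions) - 1)
    return ((x1 - x0) / elapsed, (y1 - y0) / elapsed)


def closing_speed(position: Point, velocity: Point) -> float:
    """Rate at which the distance to the vehicle shrinks (positive = approaching)."""
    x, y = position
    vx, vy = velocity
    distance = math.hypot(x, y)
    if distance == 0:
        return 0.0
    # Project the velocity onto the line from the vehicle to the object.
    return -(x * vx + y * vy) / distance
```

For a track of `(0, 10)`, `(0, 9)`, `(0, 8)` sampled every 0.5 s, the estimated velocity is `(0.0, -2.0)` m/s and the closing speed at the last position is 2.0 m/s. These values could then feed the collision risk assessment of Step 3.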

[0836] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0837] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0838] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0839] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the identification processing unit 290 may perform identification processing using the robot's emotion.

[0840] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0841] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0842] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the Emotion Map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0843] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
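The relationship between balances and pleasure or discomfort can be illustrated with a simple score. This disclosure does not give a formula, so the following Python sketch is purely an assumption: each balance is compared with an ideal value, the deviation is scaled by a tolerance, and the results are averaged into a value between -1 (discomfort) and +1 (pleasure). The quantity names `tilt_deg` and `battery_pct` are hypothetical.

```python
from typing import Dict


def pleasure_score(readings: Dict[str, float],
                   ideals: Dict[str, float],
                   tolerances: Dict[str, float]) -> float:
    """Score in [-1, 1]: +1 when every balance is ideal, -1 when all are far off."""
    scores = []
    for name, ideal in ideals.items():
        # Deviation from the ideal, as a fraction of the tolerated range.
        deviation = abs(readings[name] - ideal) / tolerances[name]
        scores.append(1.0 - 2.0 * min(deviation, 1.0))
    return sum(scores) / len(scores)


# Hypothetical balances for a personal mobility device.
IDEALS = {"tilt_deg": 0.0, "battery_pct": 100.0}
TOLERANCES = {"tilt_deg": 30.0, "battery_pct": 100.0}
```

With these example values, an upright device with a full battery scores 1.0, a device tilted 15 degrees with a full battery scores 0.5, and a device tilted 30 degrees with an empty battery scores -1.0.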

[0844] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0845] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
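One way to obtain training targets in which nearby emotions have similar values is to let the target value decay with distance on the map. The Python sketch below illustrates that idea only. The map coordinates, the Gaussian decay, and the function names are assumptions: they are not taken from Figure 9 or Figure 10, and the neural network itself is omitted.

```python
import math
from typing import Dict, Tuple

# Hypothetical 2-D positions on an emotion map (illustrative, not Figure 9).
EMOTION_POSITIONS: Dict[str, Tuple[float, float]] = {
    "reassured": (1.0, 0.2),
    "calm": (1.0, 0.0),
    "confident": (0.9, 0.3),
    "anxious": (1.0, -0.8),
}


def soft_targets(label: str,
                 positions: Dict[str, Tuple[float, float]],
                 width: float = 0.5) -> Dict[str, float]:
    """Target emotion values that fall off with map distance from `label`."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * width ** 2))
        for name, (x, y) in positions.items()
    }


def dominant_emotion(values: Dict[str, float]) -> str:
    """Pick the emotion with the highest value as the determined emotion."""
    return max(values, key=values.get)
```

For the label "calm", the targets for "reassured" and "confident" come out close to 1, while "anxious" receives a much lower value. This mirrors the statement that neighboring emotions have similar emotion values.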

[0846] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0847] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0848] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0849] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0850] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0851] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0852] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0853] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0854] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0855] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0856] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0857] The following is further disclosed regarding the embodiments described above.

[0858] (Claim 1)

[0859] A means for continuously acquiring video data using an imaging device attached to a moving object,

[0860] A means for analyzing the movement and distance of an object based on acquired video data,

[0861] A means for determining the risk of a moving object coming into contact with an object based on the analysis results and generating a warning,

[0862] A means for automatically recording video and saving it to an external storage device when contact occurs,

[0863] A system that includes this.

[0864] (Claim 2)

[0865] The system according to claim 1, comprising at least an electric personal mobility device as a mobile unit.

[0866] (Claim 3)

[0867] The system according to claim 1, which uses a data storage service located in a remote location as an external storage device.

[0868] "Example 1"

[0869] (Claim 1)

[0870] A means for continuously acquiring video information using a detection device attached to a moving object,

[0871] A means for recognizing objects using machine learning technology based on acquired video information, and for evaluating their movement and estimated distance,

[0872] A means for determining, based on the evaluation results, the risk of a moving object coming into contact with the target object, and for generating and issuing a warning in real time,

[0873] A means to automatically record video when contact occurs and save it to a recording device located remotely,

[0874] A system that includes this.

[0875] (Claim 2)

[0876] The system according to claim 1, comprising at least an electric personal mobility device as a mobile unit.

[0877] (Claim 3)

[0878] The system according to claim 1, which uses a cloud-based information storage service as a recording device.

[0879] "Application Example 1"

[0880] (Claim 1)

[0881] A means for continuously acquiring image information using a video acquisition device,

[0882] A means for analyzing the movement and spacing of objects based on acquired image information,

[0883] A means for assessing, based on the analyzed information, the risk of a moving object coming into contact with the target and for issuing a warning,

[0884] A means for automatically recording video when contact occurs and transferring it to an external information storage device,

[0885] For image analysis, a deep learning model is used to recognize the object,

[0886] A means for transmitting acquired video information to a remote location via a communication network,

[0887] A system that includes this.

[0888] (Claim 2)

[0889] The system according to claim 1, comprising at least an autonomous driving mobility mechanism.

[0890] (Claim 3)

[0891] The system according to claim 1, which uses a remote information storage service as an external information storage device.

[0892] "Example 2 of combining an emotion engine"

[0893] (Claim 1)

[0894] A means of continuously acquiring surrounding information using an imaging device attached to a moving object,

[0895] A means for analyzing the characteristics of an object based on acquired surrounding information,

[0896] A means for determining the risk of a moving object coming into contact with an object based on the analysis results and generating a warning,

[0897] A means of evaluating the emotional state by analyzing the user's facial expressions and adjusting the intensity of the warning,

[0898] A means to automatically record and store in a remote location when contact occurs,

[0899] A system that includes this.

[0900] (Claim 2)

[0901] The system according to claim 1, comprising at least an electrically powered personal mobile device.

[0902] (Claim 3)

[0903] The system according to claim 1, which is capable of acquiring data for analyzing psychological states.
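The adjustment of warning intensity according to the user's emotional state can be sketched as follows. This is only an illustration under stated assumptions: the emotion labels, the gain table, and the policy itself (a stronger warning for a drowsy or distracted user, a gentler one for a startled user) are hypothetical, and the scores are assumed to come from a separate facial-expression analysis step that is not shown. The claims state only that the intensity is adjusted, not in which direction.

```python
from typing import Dict


def adjust_warning_intensity(base_intensity: float,
                             emotion_scores: Dict[str, float]) -> float:
    """Scale a base warning intensity (0.0 to 1.0) by an assumed emotion policy."""
    # Assumed policy, not taken from the claims: the dominant emotion
    # selects a gain that strengthens or softens the warning.
    gains = {"drowsy": 1.5, "distracted": 1.3, "calm": 1.0, "startled": 0.7}
    gain = 1.0
    if emotion_scores:
        dominant = max(emotion_scores, key=emotion_scores.get)
        gain = gains.get(dominant, 1.0)  # unknown labels leave the intensity unchanged
    # Clamp the result to the valid intensity range.
    return min(max(base_intensity * gain, 0.0), 1.0)
```

Clamping keeps the output in a fixed range so that the warning device can map it directly to, for example, a volume or vibration level.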

[0904] "Application Example 2 combined with an emotion engine"

[0905] (Claim 1)

[0906] A means for continuously acquiring video data using an imaging device attached to a moving object,

[0907] A means for analyzing the movement and distance of an object based on acquired video data,

[0908] A means for determining the risk of a moving object coming into contact with an object based on the analysis results and generating a warning,

[0909] A means for automatically recording video and saving it to an external storage device when contact occurs,

[0910] A means for analyzing the user's emotional state and adjusting the generated warning according to that state,

[0911] A system comprising the above means.

[0912] (Claim 2)

[0913] The system according to claim 1, comprising at least an autonomous vehicle as a mobile entity.

[0914] (Claim 3)

[0915] The system according to claim 1, which uses a data storage service located in a remote location as an external storage device.

[Explanation of Symbols]

[0916] 10, 210, 310, 410: Data Processing Systems; 12: Data Processing Devices; 14: Smart Devices; 214: Smart Glasses; 314: Headset-type terminal; 414: Robots

Claims

1. A system comprising: a means for continuously acquiring image information using an image acquisition device; a means for analyzing the movement and spacing of objects based on the acquired image information; a means for assessing, based on the analyzed information, the risk of a moving object coming into contact with a target object and for issuing a warning; a means for automatically recording video when contact occurs and transferring it to an external information storage device; a means for recognizing the object using a deep learning model for the image analysis; and a means for transmitting the acquired video information to a remote location via a communication network.

2. The system according to claim 1, comprising at least an autonomous driving mobility mechanism.

3. The system according to claim 1, which uses a remote information storage service as an external information storage device.
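The automatic recording and transfer recited in claim 1 can be illustrated with a minimal sketch. It assumes that frames are continuously held in a fixed-size ring buffer so that the clip can include footage from shortly before the contact; this pre-contact buffering is an assumption and is not stated in the claims. The `EventRecorder` class and the `uploader` callable, which stands in for the transfer to the external or remote information storage device, are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class EventRecorder:
    """Keeps the most recent frames and hands a clip to an uploader on contact."""

    def __init__(self, uploader: Callable[[List[bytes]], None],
                 pre_frames: int = 150, post_frames: int = 150) -> None:
        self._buffer: Deque[bytes] = deque(maxlen=pre_frames)  # ring buffer
        self._uploader = uploader        # hypothetical transfer to external storage
        self._post_frames = post_frames  # frames to keep after the contact
        self._clip: Optional[List[bytes]] = None  # clip in progress, if any
        self._remaining = 0

    def add_frame(self, frame: bytes, contact: bool = False) -> None:
        """Store one frame; start a clip when contact is reported."""
        if self._clip is None:
            self._buffer.append(frame)
            if not contact:
                return
            # Contact detected: start the clip from the buffered frames.
            self._clip = list(self._buffer)
            self._buffer.clear()
            self._remaining = self._post_frames
        else:
            self._clip.append(frame)
            self._remaining -= 1
        if self._remaining <= 0:
            # Clip complete: transfer it and return to idle buffering.
            self._uploader(self._clip)
            self._clip = None


# Usage: collect uploaded clips in a list instead of sending them anywhere.
uploaded: List[List[bytes]] = []
recorder = EventRecorder(uploaded.append, pre_frames=3, post_frames=2)
for i in range(5):
    recorder.add_frame(bytes([i]))
recorder.add_frame(bytes([5]), contact=True)
recorder.add_frame(bytes([6]))
recorder.add_frame(bytes([7]))
```

In a real system the uploader would send the clip over the communication network to the remote information storage service of claim 3; here it only appends to a list so that the sketch runs without any network access.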