system
An AI-driven monitoring system effectively detects and deters abnormal behaviors and suspicious individuals by generating real-time warnings, addressing the limitations of conventional systems in crime prevention.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-10
- Publication Date
- 2026-06-22
AI Technical Summary
Conventional monitoring systems are inadequate in real-time detection of abnormal behaviors and suspicious individuals, lack effective deterrence mechanisms, and fail to provide immediate notifications, making them insufficient for preventing crimes.
An AI-based system that analyzes video data from surveillance devices to detect abnormal behaviors and suspicious individuals, generates warning messages, and outputs them through communication means to management terminals for immediate action.
Enables rapid detection and deterrence of potential threats by generating situation-appropriate warnings, enhancing crime prevention and user safety.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] The current monitoring system only records abnormal behaviors and is not sufficient to deter crimes on the spot. Also, it is difficult to identify suspicious persons or those with a criminal record in real time and take immediate appropriate actions. Furthermore, there is a lack of effective notification means for intruders, making it insufficient to prevent crimes.
Means for Solving the Problems
[0005] This invention provides an artificial intelligence means for analyzing video data acquired from a monitoring device and detecting abnormal behavior and suspicious individuals in real time. It includes a generation means that generates warning messages corresponding to the abnormality using generation technology. Furthermore, by constructing a system that includes a communication means for outputting the generated warning messages and immediately notifying a management terminal of the abnormal behavior detection results, effective crime deterrence against monitored targets is achieved.
[0006] A "monitoring device" (surveillance device) is a device installed to photograph a specific area and collect video data.
[0007] "Video data" refers to digital data that includes visual information acquired by the monitoring device.
[0008] "Artificial intelligence means" refers to a computer program or system equipped with the function of analyzing video data to detect abnormal behavior or suspicious individuals.
[0009] The "generation means" refers to a system that has the function of automatically creating warning messages based on abnormal behavior detected by artificial intelligence means.
[0010] A "warning message" is an audio or text notification created by the generation means to alert the public to abnormal behavior or the presence of a suspicious person.
[0011] "Output means" refers to a device that provides audio or display to notify an external party of the generated warning message.
[0012] "Communication means" refers to a system with network functionality for transmitting abnormal behavior detection results and suspicious person information to other devices or management terminals.
[0013] A "management terminal" is a device used to monitor the operation of the entire system and to receive information when an anomaly occurs.
[0014] A "suspicious person" refers to an individual who poses a potential threat and is detected by the system based on predefined criteria.
Brief Description of the Drawings
[0015]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0019] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0020] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.
[0021] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).
[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] To implement the present invention, a system is constructed that includes a monitoring device, a server, a terminal, and a user as its main components. The specific functions of each component and their embodiments are described below.
[0037] Server
[0038] The server receives video data transmitted in real time from the monitoring device and analyzes this data using an AI model. First, the video data is broken down into frames, and each frame is preprocessed. This preprocessing includes noise reduction and resolution adjustment. Next, the preprocessed frames are analyzed by artificial intelligence for the presence or absence of abnormal behavior, and suspicious individuals are identified by comparing them with a registered database.
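The description does not specify how the comparison with the registered database is carried out. As one possible illustration only, the following minimal sketch assumes that an upstream model has already reduced each detected person to a numeric feature vector, and matches that vector against registered vectors by cosine similarity. The names `REGISTERED` and `match_person`, the sample vectors, and the 0.9 threshold are hypothetical and do not come from the description.

```python
import math
from typing import Optional

# Hypothetical registered database: person ID -> feature vector.
REGISTERED = {
    "person-001": [0.9, 0.1, 0.0],
    "person-002": [0.0, 1.0, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_person(features, database=REGISTERED, threshold=0.9) -> Optional[str]:
    """Return the ID of the best match at or above the threshold, else None."""
    best_id, best_score = None, threshold
    for person_id, reference in database.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id
```

A result of `None` would mean that the person is not in the registered database; how the threshold is chosen in practice depends on the recognition model and is outside the scope of this sketch.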
[0039] Generation technology
[0040] When abnormal behavior or suspicious individuals are detected, the server automatically generates appropriate warning messages using generation technology. Depending on the type of abnormal behavior, it adjusts the content and format of the message to create warning messages or interactive communications.
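The adjustment of message content by abnormality type could be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the generative model is represented by a caller-supplied function `generate` (hypothetical), the type keys `"helmet"` and `"intrusion"` are hypothetical, and the fixed fallback wording is an added safeguard that the description does not mention. The two message texts are taken from the examples in this description.

```python
from typing import Callable, Optional

# Fallback wording per abnormality type. The message texts appear as examples
# in the description; the type keys themselves are hypothetical.
FALLBACK_MESSAGES = {
    "helmet": "Excuse me, could you please remove your helmet?",
    "intrusion": "Is anyone there? If you are, please leave immediately.",
}
DEFAULT_MESSAGE = "Abnormal behavior has been detected."


def build_prompt(abnormality: str, location: str) -> str:
    """Compose the instruction text handed to the generative model."""
    return (
        f"Abnormality type: {abnormality}. Location: {location}. "
        "Write one short, polite warning message for the person involved."
    )


def generate_warning(
    abnormality: str,
    location: str,
    generate: Optional[Callable[[str], str]] = None,
) -> str:
    """Use the model if one is supplied and returns text; otherwise fall back."""
    if generate is not None:
        try:
            text = generate(build_prompt(abnormality, location)).strip()
            if text:
                return text
        except Exception:
            pass  # any model failure falls through to the fixed wording
    return FALLBACK_MESSAGES.get(abnormality, DEFAULT_MESSAGE)
```

Passing the model in as a plain callable keeps the sketch independent of any particular generative AI service.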
[0041] Terminal
[0042] The terminal receives warning messages and anomaly detection notifications from the server and displays them to the administrator. Received warning messages are output in real time as audio or text to alert the administrator and those nearby. Furthermore, the terminal notifies the police and other relevant agencies as needed.
[0043] User
[0044] The user assesses the situation based on the information provided by the terminal and responds as needed. If an anomaly is detected, the user checks the situation on site and takes action, such as contacting the police. The user can also customize the system settings to suit their individual needs, including, for example, the type and frequency of notifications and the content of the warning messages generated.
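The user-configurable settings mentioned above (type and frequency of notifications, content of warning messages) might be represented as in the following sketch. All field names, default values, and the minimum-interval rule used to limit notification frequency are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationSettings:
    # Which event types the user wants to be notified about (hypothetical keys).
    enabled_types: set = field(
        default_factory=lambda: {"abnormal_behavior", "suspicious_person"}
    )
    # Minimum number of seconds between two notifications of the same type.
    min_interval_s: float = 60.0
    # Free-form hint that a message generator could use; not interpreted here.
    message_style: str = "polite"
    # Timestamp of the last notification sent, per event type.
    _last_sent: dict = field(default_factory=dict, repr=False)

    def should_notify(self, event_type: str, now: float) -> bool:
        """Apply the type filter and the frequency limit; record accepted events."""
        if event_type not in self.enabled_types:
            return False
        last = self._last_sent.get(event_type)
        if last is not None and now - last < self.min_interval_s:
            return False
        self._last_sent[event_type] = now
        return True
```

Here `now` is passed in explicitly (for example from `time.monotonic()`) so that the frequency limit can be checked without depending on the wall clock.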
[0045] Specific example
[0046] Examples in unmanned stores
[0047] An unmanned store has installed monitoring equipment, which is constantly monitored by a server. If a customer enters the store wearing a helmet, the server immediately detects this and generates a warning message using generation technology: "Excuse me, could you please remove your helmet?" This message is played through a speaker on the terminal, prompting the customer to remove their helmet.
[0048] Examples in private residences
[0049] When a surveillance device installed in the parking lot of a private residence detects an intruder, the server immediately analyzes the information and generates sounds and voices to make it appear as if the resident is home. These generated voices are played from a terminal, exerting a psychological deterrent effect on the intruder and preventing unauthorized entry.
[0050] Thus, the present invention provides a system that can detect anomalies in real time and immediately suggest a response by making full use of advanced AI-based analysis and generation technologies.
[0051] The following describes the processing flow.
[0052] Step 1:
[0053] The server receives video data in real time from the monitoring device. The video data in each frame is preprocessed to a state suitable for analysis by removing noise and adjusting the resolution.
[0054] Step 2:
[0055] The server inputs pre-processed video frames into an AI model and performs analysis to detect abnormal behavior. The AI model evaluates the abnormality of specific behavioral patterns and records the data if it determines that a behavior is abnormal.
[0056] Step 3:
[0057] Based on the analysis results from the previous step, the server uses a facial recognition algorithm to identify individuals in the video and compares them with a database to determine if they are suspicious. If a suspicious person is identified, the server reports this information to the management system.
[0058] Step 4:
[0059] When abnormal behavior or suspicious activity is detected, the server uses generation technology to generate a situation-appropriate warning message or dialogue message. This is done using generative AI to create messages in natural language.
[0060] Step 5:
[0061] The server sends the generated warning message to the terminal. At the same time, it notifies the terminal of any detected abnormal behavior or suspicious individuals, preparing for necessary countermeasures to be taken.
[0062] Step 6:
[0063] The terminal outputs the warning message received from the server, alerting administrators and those nearby through audio or screen display. If direct action is required, it also notifies the appropriate authorities.
[0064] Step 7:
[0065] Users review the information provided by the terminal and decide on a course of action based on the actual situation. This may include going to the scene or contacting the police as needed. To maximize the system's effectiveness, users regularly optimize settings and parameters.
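The server-side portion of the flow above (Steps 1 to 5) can be summarized in a short sketch. Each stage is passed in as a function, because the description does not fix a particular model or transport; the names `Detection`, `Notification`, and `run_pipeline` are hypothetical. Steps 6 and 7 take place on the terminal and with the user, so they are not modeled here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class Detection:
    frame_index: int
    abnormal: bool
    suspicious_person_id: Optional[str] = None


@dataclass
class Notification:
    frame_index: int
    message: str


def run_pipeline(
    frames: Iterable[Any],
    preprocess: Callable[[Any], Any],
    analyze: Callable[[Any], bool],
    identify: Callable[[Any], Optional[str]],
    generate: Callable[[Detection], str],
    send: Callable[[Notification], None],
) -> List[Notification]:
    """Run Steps 1 to 5 on each frame and return the notifications that were sent."""
    sent = []
    for index, frame in enumerate(frames):
        clean = preprocess(frame)        # Step 1: noise removal, resolution adjustment
        abnormal = analyze(clean)        # Step 2: abnormal-behavior analysis
        person_id = identify(clean)      # Step 3: comparison with the database
        if not abnormal and person_id is None:
            continue                     # nothing to report for this frame
        detection = Detection(index, abnormal, person_id)
        notification = Notification(index, generate(detection))  # Step 4
        send(notification)               # Step 5: transmit to the terminal
        sent.append(notification)
    return sent
```

For example, calling `run_pipeline` with simple stand-in functions for each stage makes it possible to check the control flow before any real model is connected.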
[0066] (Example 1)
[0067] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0068] In modern society, the use of monitoring devices is increasing, but conventional systems have limitations in terms of human monitoring and real-time response. Therefore, there is a need for rapid and automatic detection and response to abnormal behavior. This invention aims to solve these problems by effectively detecting anomalies and providing appropriate warnings.
[0069] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0070] In this invention, the server includes an information processing means that receives time-series data from a monitoring device and analyzes the data to determine anomalies; a generation means that generates notification information based on the anomalies determined by the information processing means; and a display means that outputs the notification information. This enables rapid detection of anomalies and the transmission of appropriate information.
[0071] A "monitoring device" is an electronic device used to detect and record environmental conditions in real time.
[0072] "Time-series data" refers to a series of data acquired based on specific time intervals, allowing for analysis along a time axis.
[0073] "Information processing means" refers to a method or apparatus for analyzing received data and performing calculations and judgments to determine whether or not there are any abnormalities.
[0074] "Generation means" refers to a method or apparatus for generating notification information based on data obtained from information processing means.
[0075] "Notification information" refers to information containing messages about abnormalities or events requiring attention, and is generated to encourage prompt action.
[0076] "Display means" refers to a device or method for outputting generated notification information visually or audibly.
[0077] "Transmission means" refers to a method or device equipped with communication functions for transmitting warning or abnormal information to a specific terminal or device.
[0078] An "operating terminal" is an electronic device used to receive notifications and information transmitted by the monitoring system, and it provides an interface with the user.
[0079] A "feature" is a unique attribute or element used to identify a particular object or person.
[0080] The present invention is implemented by a system including a monitoring device, a server, a terminal, and a user. This system can automatically detect and notify of anomalies based on time-series data.
[0081] Server
[0082] The server first receives time-series data from the monitoring device. A common protocol is used for this reception, and programming languages such as Python and JavaScript are often used for processing. The server preprocesses the data using libraries such as OpenCV and NumPy so that anomalies can be identified. At this stage, machine learning frameworks such as TensorFlow and PyTorch are used to analyze abnormal behavior.
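To make the preprocessing concrete, the following sketch shows noise reduction by a separable Gaussian blur followed by a simple resolution reduction, on a grayscale frame represented as a list of rows. It is written in plain Python so that it is self-contained; in practice, library functions such as OpenCV's `cv2.GaussianBlur` and `cv2.resize` would normally be used instead. The kernel size, sigma, border handling (edge replication), and subsampling factor are assumptions of this sketch.

```python
import math


def gaussian_kernel(size: int = 5, sigma: float = 1.0):
    """Normalized 1-D Gaussian kernel of odd length `size`."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    out = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - half, 0), width - 1)  # edge replication
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, size: int = 5, sigma: float = 1.0):
    """Separable Gaussian blur: filter along rows, then along columns."""
    kernel = gaussian_kernel(size, sigma)
    return _transpose(_blur_rows(_transpose(_blur_rows(image, kernel)), kernel))


def downscale(image, factor: int = 2):
    """Resolution adjustment by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]
```

Blurring before subsampling, as in `downscale(gaussian_blur(frame))`, reduces aliasing as well as noise, which is one reason the two operations are commonly paired.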
[0083] Terminal
[0084] The terminal receives notification information sent from the server. The received information is output as audio using Text-to-Speech (TTS) technology. The terminal also presents the notification visually, in a form that is easy for the user to understand, using JavaScript or React. The terminal reports to relevant organizations as needed.
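The format in which notification information travels from the server to the terminal is not specified. One simple possibility, shown here purely as an assumption, is a small JSON document; the field names `event_type`, `message`, `detected_at`, and `camera_id` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NotificationInfo:
    event_type: str
    message: str
    detected_at: str  # ISO 8601 timestamp, kept as text for simplicity
    camera_id: str


def encode(info: NotificationInfo) -> bytes:
    """Serialize notification information for transmission to the terminal."""
    return json.dumps(asdict(info), ensure_ascii=False).encode("utf-8")


def decode(payload: bytes) -> NotificationInfo:
    """Rebuild notification information on the terminal side."""
    data = json.loads(payload.decode("utf-8"))
    return NotificationInfo(**data)
```

On the terminal, the decoded `message` field would then be passed to the TTS engine and to the on-screen display.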
[0085] User
[0086] Users can check the situation on-site based on notification information provided by the terminal and take appropriate action. For example, after receiving a warning, a user can immediately go to the site and take specific measures. Users can also configure the system's operation according to their individual needs. This includes features such as adjusting the type and frequency of notifications.
[0087] Specific example
[0088] Examples in unmanned stores
[0089] A monitoring device installed in an unmanned store captures a customer entering the store wearing a helmet. The server detects this and generates a warning message, "Please remove your helmet," using a "generative AI model." The terminal plays this message through its speaker, prompting the customer to take action.
[0090] Examples in private residences
[0091] If a suspicious person is detected in a private residence, the server generates sounds and voices to simulate the presence of a resident. These generated voices are played from a terminal, creating psychological pressure on the intruder and preventing them from entering.
[0092] Example of a prompt
[0093] "If there is a suspicious person in the parking lot, please generate audio that makes it sound as if a resident is at home."
[0094] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0095] Step 1:
[0096] The server receives time-series data from the monitoring device. It also receives real-time video data transmitted from the monitoring device as input. This data is transferred using the Real-Time Streaming Protocol (RTSP). The received video data is then prepared to be broken down into individual frames.
[0097] Step 2:
[0098] The server preprocesses the received video data. It receives the video frames decomposed in step 1 as input. During this process, it uses the "OpenCV" library to perform noise reduction and resolution adjustment. Specifically, it uses Gaussian blur to reduce noise and improve analysis accuracy. The output is a preprocessed, clean video frame.
[0099] Step 3:
[0100] The server analyzes abnormal behavior using preprocessed data. It receives the preprocessed frames obtained in step 2 as input. Using machine learning libraries such as "TensorFlow," an AI model identifies abnormal behavior. The analysis yields an anomaly detection result as output. This result includes the behavioral characteristics of the suspicious person and a determination of whether that behavior is abnormal or not.
[0101] Step 4:
[0102] The server generates notification information based on the detection results. The anomaly detection results obtained in step 3 are used as input. Utilizing the "Generative AI Model," the AI generates a situation-appropriate warning message based on the prompt text. For example, if suspicious activity is detected, a message such as "A suspicious person has been detected. Please take action." is generated. The generated warning message is obtained as output.
[0103] Step 5:
[0104] The terminal receives and displays the generated warning message. It receives the warning message sent from the server in step 4 as input. The terminal uses Text-to-Speech (TTS) technology to output the message as audio. It also displays a text notification on the screen. The warning content is communicated to the user both audibly and visually as output.
[0105] Step 6:
[0106] The user makes decisions based on notifications from their device. The warning message received in step 5 is used as input. The user takes appropriate action depending on the situation, such as rushing to the scene or notifying the relevant authorities. The output is that safety is ensured and actions are taken to resolve the problem.
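Step 3 above outputs an anomaly detection result containing behavioral characteristics and an abnormal/normal determination. Because the input is time-series data, one way to form that determination is to require several high-scoring frames within a recent window rather than reacting to a single frame. The sketch below illustrates this idea; the windowing rule, the threshold values, and the names `AnomalyResult` and `SlidingAnomalyDetector` are assumptions and are not stated in the description. The per-frame score is assumed to come from the AI model.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class AnomalyResult:
    abnormal: bool              # determination for the current frame
    characteristics: List[str]  # behavioral characteristics reported by the model
    score: float                # raw per-frame anomaly score


class SlidingAnomalyDetector:
    """Flag an anomaly when enough recent frames score at or above a threshold."""

    def __init__(self, threshold: float = 0.8, window: int = 5, min_hits: int = 3):
        self.threshold = threshold
        self.min_hits = min_hits
        self._recent = deque(maxlen=window)  # True/False per recent frame

    def update(self, score: float, characteristics: List[str]) -> AnomalyResult:
        """Record one frame's score and return the current determination."""
        self._recent.append(score >= self.threshold)
        abnormal = sum(self._recent) >= self.min_hits
        return AnomalyResult(abnormal, characteristics, score)
```

The resulting `AnomalyResult` is the kind of object that Step 4 could take as input when composing the prompt for the generative AI model.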
[0107] (Application Example 1)
[0108] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0109] Conventional security systems have struggled to effectively detect abnormal behavior and suspicious individuals, and to respond quickly. In particular, there is a need for highly effective deterrents to prevent intruders from entering homes. Furthermore, if the types of alarms and voice messages are not appropriately generated, the deterrent effect against suspicious individuals is diminished.
[0110] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0111] In this invention, the server includes data processing means for receiving video information from a monitoring system and analyzing the video information to detect abnormal behavior; information generation means for generating a warning message based on the abnormal behavior detected by the data processing means; and sound generation means for converting the generated warning message into an audio signal and outputting the audio signal to a public address system in real time. This makes it possible to effectively deter suspicious individuals within a residence and to respond quickly.
[0112] A "monitoring system" refers to any device that acquires video information and detects suspicious individuals or abnormal behavior.
[0113] "Video information" refers to visual data acquired by the monitoring system, and specifically to digital video data that is subject to analysis.
[0114] "Data processing means" refers to software or hardware used to analyze acquired video information and detect abnormal behavior or suspicious individuals.
[0115] "Information generation means" refers to processes and technologies for creating appropriate warning messages based on detected abnormal behavior.
[0116] "Signal output means" refers to a device or means that has the function of outputting the generated warning message as an audio signal or visual signal to communicate it to the outside.
[0117] "Information and communication means" refers to communication technology used to transmit data via a network to management information terminals and other devices for reporting status.
[0118] "Sound generation means" refers to the technology and equipment used to generate a message created by an information generation means as an audio signal and output it to a public address system.
[0119] A "public address system" refers to a device used to amplify generated audio signals in real time and broadcast the sound within a designated space.
[0120] In order to implement this invention, it is necessary to construct a network system that includes a monitoring system, a data processing server, an information generation device, an acoustic public address system, and a management information terminal.
[0121] The server receives video information transmitted in real time from the monitoring system. The server program uses Python, TensorFlow, and OpenCV to decompose the video information frame by frame and perform preprocessing such as noise reduction and resolution adjustment. Next, the data processing means analyzes these frames to detect abnormal behavior and suspicious individuals. An AI model is used for this analysis, identifying anomalies more quickly and accurately than conventional security systems. The generation technology uses a generative AI model (e.g., OpenAI®'s GPT) to generate warning messages corresponding to abnormal behavior.
[0122] The terminal receives the generated warning message and converts it into an audio signal via the sound generation means. This audio signal is transmitted to the public address system via a Bluetooth or Wi-Fi connection and played back in real time. This can exert a psychological deterrent effect on suspicious individuals.
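The terminal-side handling described above (text to audio signal, then delivery over Bluetooth or Wi-Fi) might be organized as in the following sketch. The TTS engine and the transports are represented by caller-supplied functions, since the description names no specific library; trying the transports in a fixed order and falling back when one fails is an assumption of this sketch, as are all names used.

```python
from typing import Callable, Dict, Sequence

Synthesizer = Callable[[str], bytes]  # text -> audio signal
Transport = Callable[[bytes], None]   # sends audio to the public address system


def play_warning(
    message: str,
    synthesize: Synthesizer,
    transports: Dict[str, Transport],
    order: Sequence[str] = ("bluetooth", "wifi"),
) -> str:
    """Synthesize the message and send it over the first transport that succeeds.

    Returns the name of the transport used; raises ConnectionError if none worked.
    """
    audio = synthesize(message)
    errors = []
    for name in order:
        send = transports.get(name)
        if send is None:
            continue  # this transport is not configured on the terminal
        try:
            send(audio)
            return name
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise ConnectionError("no transport delivered the audio: " + "; ".join(errors))
```

Returning the transport name lets the management terminal record which connection was actually used for playback.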
[0123] Users can review the information they receive through the management terminal and adjust the settings of the public address system if necessary. Throughout this entire process, the system provides quick and effective security measures within the residence.
[0124] As a concrete example, consider a scenario where an intruder enters a residential yard. The AI model immediately detects the intruder and generates a warning message: "Is anyone there? If you are, please leave immediately." This message is promptly played through the public address system, deterring further intrusion.
[0125] Examples of prompts to send to a generative AI model:
[0126] "A suspicious person has been detected. Please generate a warning message that says, 'Is anyone there? If you are, please leave immediately.'"
[0127] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0128] Step 1:
[0129] The server receives video information from the monitoring system in real time. The input is streamed video data from the surveillance camera. The server breaks down this data into frames and performs preprocessing, such as noise reduction and resolution adjustment, using OpenCV. The output is preprocessed, clear video frame data.
[0130] Step 2:
[0131] The server inputs pre-processed video frames into an AI model to analyze for abnormal behavior and the presence of suspicious individuals. The input is pre-processed video frame data. The AI model uses TensorFlow to analyze abnormal behavior and determine the presence of suspicious individuals. This process outputs a judgment as an analysis result, indicating whether or not abnormal behavior was detected.
[0132] Step 3:
[0133] The server uses a generative AI model to generate a warning message based on the abnormal behavior detected by the AI model. The input is the analysis result from the AI model. The generative AI model (e.g., OpenAI's GPT) generates an appropriate warning message based on the prompt text. The output is the generated warning message in text format.
[0134] Step 4:
[0135] The terminal receives the warning message from the server, converts it into speech, and sends it to the public address system. The input is the text data of the generated warning message. The terminal generates the speech signal using Text-to-Speech (TTS) technology. The generated speech signal is output to the public address system via Bluetooth or Wi-Fi.
[0136] Step 5:
[0137] The user checks for suspicious person notifications on the management information terminal and monitors the operation status of the public address system. Inputs include warning notification information and audio output status from the terminal. The user adjusts system settings and takes additional measures as needed. Outputs are information about adjustments and countermeasures taken by the user.
[0138] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0139] This invention combines a system that analyzes video data from a monitoring device to detect abnormal behavior with an emotion engine that recognizes user emotions. This system uses a server, terminal, and emotion engine to comprehensively analyze abnormality detection and the user's emotional state, thereby enabling the generation and notification of more accurate warning messages.
[0140] Server
[0141] The server receives video data acquired from monitoring devices in real time and analyzes abnormal behavior using an AI model. Based on the analysis results, if abnormal behavior is detected, it generates a warning message using generation technology. The generated message is adjusted as appropriate depending on the type of abnormal behavior. In addition, the server evaluates the user's emotional state via an emotion engine and adjusts the warning message based on changes in emotion.
[0142] Emotion engine
[0143] The emotion engine analyzes the user's voice and behavioral data to determine their emotional state in real time. This information is sent to the server and used to generate warning messages, ensuring that the most effective messages are delivered to the user.
[0144] Terminal
[0145] The terminal receives warning messages and emotion information sent from the server and presents them to the user. For example, if the user is in a stressful situation, it can output a more empathetic message. Furthermore, based on information from the emotion engine, the terminal provides support tailored to the user's emotions.
[0146] Specific example
[0147] Application examples in unmanned retail stores
[0148] In an unmanned store, if a monitoring device detects abnormal behavior in the store, the server uses generation technology to create an appropriate warning message. For example, if the emotion engine detects that a customer is agitated, the terminal will display a message to the customer saying, "We will assist you shortly, please relax and wait."
[0149] Examples of applications in private residences
[0150] In a private residence, if a surveillance device detects an intruder, the server immediately generates a warning message. Furthermore, if the emotion engine detects that the resident is anxious, the terminal plays a reassuring message such as, "We have contacted the police. Please wait until safety is confirmed," to put the resident at ease.
[0151] This system provides sophisticated interactions that take user emotions into consideration, enabling more effective detection and response to anomalies.
[0152] The following describes the processing flow.
[0153] Step 1:
[0154] The server receives video data from the monitoring device in real time, breaks down the data into frames, and performs preprocessing. This includes noise reduction and resolution adjustment, and converting the data into a format suitable for the AI model.
[0155] Step 2:
[0156] The server analyzes pre-processed frames using an AI model to detect abnormal behavior. It evaluates patterns of abnormal behavior, and if a relevant behavior is detected, it identifies the data and records it in the management system.
[0157] Step 3:
[0158] The server uses an emotion engine to analyze the user's emotional state from their voice and video. The emotion engine identifies emotions such as stress, anxiety, and joy, and generates an emotion rating based on that.
[0159] Step 4:
[0160] Based on the anomaly detection results and the emotion evaluation, the server uses generation technology to customize the warning message. For example, if the user is highly anxious, it generates a reassuring message.
[0161] Step 5:
[0162] The server sends the generated warning message and emotional state information to the terminal. This provides real-time warnings while also offering notifications that are sensitive to the user's emotions.
[0163] Step 6:
[0164] The terminal uses the message received from the server to output the warning as audio or text. If the user is emotionally unstable, a calmer-toned message is played to help stabilize the situation.
[0165] Step 7:
[0166] The user reviews the information on the terminal and takes appropriate action if necessary. If the anomaly is serious, the user takes measures such as contacting the police, and optimizes the system through the terminal settings.
[0167] (Example 2)
[0168] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0169] Conventional monitoring systems often fail to provide appropriate messages tailored to the user's emotions and circumstances when detecting abnormal behavior and issuing warnings, resulting in limited effectiveness of the warnings. In particular, in situations where users experience anxiety or stress, uniform warning messages fail to provide sufficient reassurance, making it difficult to effectively ensure user safety.
[0170] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0171] In this invention, the server includes an analysis means for receiving video information from a monitoring device and analyzing the video information to detect abnormal behavior; a generation means for generating warning information based on the abnormal behavior detected by the analysis means; and an adjustment means for adjusting the warning information based on the user's emotions. This makes it possible to generate and present an optimal warning message according to the type of abnormal behavior and the user's emotional state.
[0172] A "monitoring device" is a device that acquires video information from the environment and transmits it to a server.
[0173] "Video information" refers to visual data acquired by the monitoring device and is used to analyze abnormal behavior.
[0174] "Analysis means" refers to algorithms and technologies used to process received video information and identify abnormal behavior.
[0175] A "generation means" refers to a process or system that creates appropriate warning information based on the results of abnormal behavior obtained through analysis means.
[0176] "Adjustment means" refers to a system that adjusts the generated warning information according to the user's emotional state and presents it in the most suitable format.
[0177] "Display means" refers to devices or methods for presenting adjusted warning information to users.
[0178] "Communication means" refers to technologies and systems for transmitting detected abnormal behavior and related information to information terminals.
[0179] An "information terminal" is a device used to receive information about abnormal behavior via communication means.
[0180] This invention relates to a system that analyzes video information acquired from surveillance equipment to detect abnormal behavior and generates and presents warning messages that take emotional information into consideration. This system is primarily implemented using a server, terminals, and an emotion engine.
[0181] The server receives video information in real time from the monitoring equipment. The video information is transmitted to the server using a streaming protocol. The server analyzes this video information using a deep learning framework (e.g., TensorFlow or PyTorch) to detect objects and recognize movements, thereby identifying abnormal behavior. Based on these results, the server uses a generative AI model to generate appropriate warning messages based on the prompt text. For example, a warning message such as "Intruder is currently entering" might be created based on this prompt.
[0182] The emotion engine analyzes the user's voice data and behavioral information to evaluate their emotional state. This evaluation is performed using voice recognition software and an emotion analysis API. This determines the user's emotional state and provides this information to the server. The server then uses this emotion information to adjust warning messages to the most appropriate form.
[0183] The terminal receives the adjusted warning messages sent from the server and presents them to the user through the screen and audio. For example, if the user is feeling anxious, a reassuring message such as "Please wait until safety is confirmed" is played via audio output. The terminal can also use information from the emotion engine to provide necessary support.
[0184] As a specific example, if a monitoring device in an unmanned store detects a particular abnormal behavior, the server generates a warning message with the prompt, "The monitoring camera has detected a confused customer. Please generate a relaxing message." In this way, the present invention enables the detection of abnormal behavior as well as the provision of effective warning messages tailored to the user's emotions.
[0185] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0186] Step 1:
[0187] The server receives video information from the monitoring equipment in real time. This input is transmitted encrypted as a data stream. The server decodes this data using a streaming protocol and stores it as video frames to prepare for analysis.
[0188] Step 2:
[0189] The server analyzes the received video frames using a deep learning framework (e.g., TensorFlow or PyTorch). The input data is provided to a model that detects objects and recognizes actions frame by frame. If the AI model detects abnormal behavior, it outputs the result as an alert and generates data including the type and location of the abnormal behavior.
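The alert data described in this step, which includes the type and location of the abnormal behavior, could be represented as a small record that is serialized for the following step. The sketch below is illustrative only: the `Alert` class, its field names, and the use of JSON are assumptions that are not specified in the disclosure.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    frame_index: int
    behavior_type: str                 # e.g. "intrusion"
    bbox: Tuple[int, int, int, int]    # x, y, width, height in pixels
    confidence: float                  # model score in [0, 1]


def to_json(alert: Alert) -> str:
    """Serialize an alert so it can be passed to the message-generation step."""
    return json.dumps(asdict(alert), sort_keys=True)


alert = Alert(frame_index=120, behavior_type="intrusion",
              bbox=(40, 60, 80, 160), confidence=0.91)
payload = to_json(alert)
```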
[0190] Step 3:
[0191] The server utilizes a generative AI model to generate warning messages based on detected abnormal behavior. In this step, the AI model receives a prompt as input and generates an appropriate message in response to that prompt. For example, a warning message such as "Intruder is currently entering the premises" might be created.
[0192] Step 4:
[0193] The emotion engine collects user voice and behavioral data and evaluates their emotional state in real time. Voice data is converted to text using speech recognition software and input into the emotion analysis API. This outputs the user's emotional parameters, which are then sent to the server.
[0194] Step 5:
[0195] The server optimizes warning messages using emotion parameters sent from the emotion engine. The input for this optimization is the original warning message and the user's emotion parameters. Based on this data, the server adjusts the tone and content of the warning message and outputs the adjusted message.
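One possible form of this adjustment is a rule that softens the message when the emotion parameters indicate distress. The following is a minimal sketch, assuming that the emotion parameters arrive as a dictionary of scores between 0 and 1 with keys such as `anxiety` and `stress`. The keys, the threshold, and the added phrases are hypothetical choices for illustration and are not the adjustment method of the disclosure.

```python
from typing import Dict

CALM_PREFIX = "Please stay calm. "                          # hypothetical softening phrase
REASSURING_SUFFIX = " Please wait until safety is confirmed."


def adjust_warning(message: str, emotion: Dict[str, float],
                   threshold: float = 0.6) -> str:
    """Soften the warning when anxiety or stress reaches the threshold."""
    distress = max(emotion.get("anxiety", 0.0), emotion.get("stress", 0.0))
    if distress < threshold:
        return message  # user seems calm: deliver the warning unchanged
    softened = message.rstrip("!.") + "."  # replace emphatic punctuation
    return CALM_PREFIX + softened + REASSURING_SUFFIX


adjusted = adjust_warning("An intruder has been detected!", {"anxiety": 0.8})
unchanged = adjust_warning("An intruder has been detected!", {"anxiety": 0.1})
```

A generative model could perform the same adjustment from a prompt. The rule-based form is shown here only because its input and output are easy to inspect.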
[0196] Step 6:
[0197] The terminal receives the adjusted warning message from the server and presents it to the user. The input to this process is the adjusted warning message, which the terminal displays on the screen or outputs as audio, depending on the presentation method. Specifically, depending on the user's status, a message such as "Please wait until safety is confirmed" may be displayed.
[0198] Step 7:
[0199] The terminal provides necessary support based on information from the emotion engine. The input for this support is the user's emotional state. The terminal performs actions such as automatically launching a mental health support application to provide additional support as needed by the user.
[0200] (Application Example 2)
[0201] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0202] In abnormal behavior detection systems using monitoring devices, conventional methods have the problem of not being able to issue warning messages that take into account the user's emotions. Therefore, warning messages are not optimized according to the user's emotional state, resulting in a lack of reassurance and effective interaction. Furthermore, improving the accuracy of generating appropriate messages according to the type of abnormal behavior is also necessary.
[0203] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0204] In this invention, the server includes a computation means for receiving video signals from a monitoring device and analyzing the video signals to detect abnormal behavior; a generation means for generating warning information based on the abnormal behavior detected by the computation means; and an emotion recognition means for recognizing the user's emotional state in real time. As a result, warning messages are adjusted based on the user's emotion information, enabling more effective and reassuring interactions.
[0205] A "monitoring device" is a device that collects video signals and transmits them to the computation means for analysis.
[0206] "Video signal" refers to a series of video data captured by a monitoring device, and is used for analyzing abnormal behavior.
[0207] "Computation means" refers to hardware or software that analyzes video signals and performs processing to detect abnormal behavior.
[0208] "Abnormal behavior" refers to actions that deviate from normal behavioral patterns and are judged by the system as actions that require vigilance or attention.
[0209] "Warning information" refers to notifications, cautionary messages, or signals generated when abnormal behavior is detected.
[0210] "Generation means" refers to a function that executes a process to create warning information based on the detection results of abnormal behavior.
[0211] "Emotion recognition means" refers to methods for analyzing and acquiring a user's emotional state in real time.
[0212] "Communication means" refers to a network or interface for transmitting detection results and user emotion information to an external control device or similar.
[0213] "Adjustment means" refers to a function that adjusts warning information to the most suitable content based on the acquired emotion information.
[0214] This invention relates to a system in which a server receives video signals from a monitoring device and analyzes abnormal behavior using computational means. The server analyzes the video signals in real time using image processing libraries such as OpenCV, and when it detects abnormal behavior, it generates warning information using generative AI technology. This generated warning information is adjusted to take into account the user's emotional state.
[0215] An emotion recognition system is used to recognize the user's emotional state. This system uses a library such as the EmotionRecognition library to estimate emotions in real time from the user's voice and behavioral data. The emotion information is sent to the server and used when creating prompts for the generative AI model.
[0216] As a concrete example, suppose the system is implemented in an apartment management setting, and a suspicious person is detected by the monitoring device. If the system detects that a resident is feeling anxious, the server generates a message saying, "The area is safe, please rest assured." This message is individually tailored to alleviate the resident's anxiety.
[0217] Examples of prompts for a generative AI model include:
[0218] "An intruder has been detected. Please generate a message to reassure the residents and alleviate their anxiety."
[0219] In this way, the system can provide effective warning information tailored to the user's situation and emotional state, creating a reassuring environment.
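The way emotion information might feed into the prompt can be sketched as a lookup that appends an emotion-specific instruction to the detected event. This is an illustration under assumptions: the emotion labels, the `EMOTION_INSTRUCTIONS` table, and the instruction used for a calm user are hypothetical. Only the instruction for an anxious user is taken from the example prompt above.

```python
EMOTION_INSTRUCTIONS = {
    # Hypothetical mapping from an estimated emotion label to a prompt instruction.
    "anxious": ("Please generate a message to reassure the residents "
                "and alleviate their anxiety."),
    "calm": "Please generate a concise message that states the facts.",
}


def compose_prompt(event: str, emotion: str) -> str:
    """Join the detected event with an instruction chosen by emotion label."""
    # Unknown labels fall back to the neutral "calm" instruction.
    instruction = EMOTION_INSTRUCTIONS.get(emotion, EMOTION_INSTRUCTIONS["calm"])
    return f"{event} {instruction}"


prompt = compose_prompt("An intruder has been detected.", "anxious")
```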
[0220] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0221] Step 1:
[0222] The server receives video signals from the monitoring device. The input is a video signal, and the output is analyzable image data. Image processing libraries such as OpenCV are used to decompose the video signal into frames, preparing it for analysis.
[0223] Step 2:
[0224] The server uses computational tools to analyze video frames to determine the presence or absence of abnormal behavior. The input is frame data, and the output is the detection result of abnormal behavior. An AI model is used to compare and analyze the abnormal behavior with normal behavior patterns to determine if abnormal behavior exists.
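The idea of comparing observed behavior with normal behavior patterns can be illustrated with a simple statistical baseline. The sketch below is not the AI model of the disclosure. It swaps in a z-score test on a single numeric feature, and the feature (dwell time in seconds), the sample values, and the threshold of 3.0 are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(value: float, normal_samples: Sequence[float],
                z_threshold: float = 3.0) -> bool:
    """Flag a measurement that lies far from the normal baseline.

    Requires at least two baseline samples (needed by stdev).
    """
    mu = mean(normal_samples)
    sigma = stdev(normal_samples)
    if sigma == 0:
        return value != mu  # a constant baseline: any deviation is abnormal
    return abs(value - mu) / sigma > z_threshold


# Hypothetical feature: seconds a person stays in front of one shelf.
normal_dwell = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]

typical = is_abnormal(14.0, normal_dwell)    # close to the baseline mean
outlier = is_abnormal(90.0, normal_dwell)    # far outside the baseline
```

A learned model would replace the single feature with a richer representation, but the decision has the same shape: a distance from normal patterns compared against a threshold.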
[0225] Step 3:
[0226] If abnormal behavior is detected, the server generates warning information using a generative AI model. The input is the result of the detected abnormal behavior, and the output is the generated warning message. A prompt suited to the nature of the abnormality is created, and the generative AI model constructs the warning message from it using natural language processing techniques.
[0227] Step 4:
[0228] The server analyzes the user's emotional state in real time using emotion recognition technology. Input is user voice and behavioral data, and output is estimated emotion data. The EmotionRecognition library is used to identify emotions from factors such as voice tone and facial expressions.
[0229] Step 5:
[0230] The server adjusts warning messages based on the estimated emotion data. The input is the warning message and the emotion data, and the output is the adjusted, personalized message. By changing the tone and content of the message, for example, to softer language, the server reduces user anxiety.
[0231] Step 6:
[0232] The terminal presents the adjusted warning message to the user. The input is the final message sent to the terminal, and the output is the information provided to the user. The terminal communicates the message to the user and provides reassurance through screen display and audio output.
[0233] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0234] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0235] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0236] [Second Embodiment]
[0237] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0238] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0239] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0240] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0241] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0242] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0243] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0244] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0245] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0246] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0247] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0248] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0249] To implement the present invention, a system is constructed that includes a monitoring device, a server, a terminal, and a user as its main components. The specific functions of each component and their embodiments are described below.
[0250] Server
[0251] The server receives video data transmitted in real time from the monitoring device and analyzes this data using an AI model. First, the video data is broken down into frames, and each frame is preprocessed. This preprocessing includes noise reduction and resolution adjustment. Next, the preprocessed frames are analyzed by artificial intelligence for the presence or absence of abnormal behavior, and suspicious individuals are identified by comparing them with a registered database.
[0252] Generation technology
[0253] When abnormal behavior or suspicious individuals are detected, the server automatically generates appropriate warning messages using generation technology. Depending on the type of abnormal behavior, it adjusts the content and format of the message to create warning messages or interactive communications.
[0254] Terminal
[0255] The terminal receives warning messages and anomaly detection notifications from the server and displays them to the administrator. Received warning messages are output in real time as audio or text to alert the administrator and those nearby. Furthermore, the terminal notifies the police and other relevant agencies as needed.
[0256] User
[0257] The user assesses the situation based on the information provided by the terminal and responds as needed. If an anomaly is detected, the user checks the situation on site and takes action, such as contacting the police. The user can also customize the system settings to suit individual needs, including, for example, the type and frequency of notifications and the content of the generated warning messages.
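User-configurable notification type and frequency could be realized, for example, as a filter applied before each notification is sent. The sketch below is a minimal illustration. The class names `NotificationSettings` and `Notifier`, the event type labels, and the default interval of 60 seconds are assumptions that do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NotificationSettings:
    """Hypothetical per-user settings; field names are illustrative."""
    enabled_types: Set[str] = field(default_factory=lambda: {"intrusion"})
    min_interval_s: float = 60.0  # at most one notification per type per interval


@dataclass
class Notifier:
    settings: NotificationSettings
    _last_sent: Dict[str, float] = field(default_factory=dict)

    def should_notify(self, event_type: str, now_s: float) -> bool:
        """Apply the user's type filter and frequency limit."""
        if event_type not in self.settings.enabled_types:
            return False  # the user has not enabled this notification type
        last = self._last_sent.get(event_type)
        if last is not None and now_s - last < self.settings.min_interval_s:
            return False  # too soon after the previous notification
        self._last_sent[event_type] = now_s
        return True


notifier = Notifier(NotificationSettings())
first = notifier.should_notify("intrusion", now_s=0.0)     # allowed
repeat = notifier.should_notify("intrusion", now_s=30.0)   # suppressed: within 60 s
later = notifier.should_notify("intrusion", now_s=61.0)    # allowed again
other = notifier.should_notify("helmet", now_s=100.0)      # type not enabled
```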
[0258] Specific example
[0259] Examples in unmanned stores
[0260] An unmanned store has installed monitoring equipment, which is constantly monitored by a server. If a customer enters the store wearing a helmet, the server immediately detects this and generates a warning message using generation technology: "Excuse me, could you please remove your helmet?" This message is played through a speaker on the terminal, prompting the customer to remove their helmet.
[0261] Examples in private residences
[0262] When a surveillance device installed in the parking lot of a private residence detects an intruder, the server immediately analyzes the information and generates sounds and voices to make it appear as if the resident is home. These generated voices are played from a terminal, exerting a psychological deterrent effect on the intruder and preventing unauthorized entry.
[0263] Thus, the present invention provides a system that can detect anomalies in real time and immediately suggest a response by making full use of advanced AI-based analysis and generation technologies.
[0264] The following describes the processing flow.
[0265] Step 1:
[0266] The server receives video data in real time from the monitoring device. The video data in each frame is preprocessed to a state suitable for analysis by removing noise and adjusting the resolution.
[0267] Step 2:
[0268] The server inputs pre-processed video frames into an AI model and performs analysis to detect abnormal behavior. The AI model evaluates the abnormality of specific behavioral patterns and records the data if it determines that a behavior is abnormal.
[0269] Step 3:
[0270] Based on the analysis results from the previous step, the server uses a facial recognition algorithm to identify individuals in the video and compares them with a database to determine if they are suspicious. If a suspicious person is identified, the server reports this information to the management system.
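The comparison against a registered database can be illustrated as a nearest-match search over feature vectors. This sketch assumes that a separate face recognition model has already converted each face into a numeric vector, and it uses cosine similarity with a fixed threshold. The vector values, the identifiers, and the threshold of 0.9 are hypothetical, and the disclosure does not specify the matching method.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_registered(embedding: Sequence[float],
                     database: Dict[str, Sequence[float]],
                     threshold: float = 0.9) -> Optional[str]:
    """Return the ID of the best match at or above the threshold, else None."""
    best_id, best_score = None, threshold
    for person_id, reference in database.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


# Toy 3-dimensional vectors; real face embeddings have far more dimensions.
registered = {"watchlist-001": [1.0, 0.0, 0.0], "watchlist-002": [0.0, 1.0, 0.0]}

hit = match_registered([0.99, 0.05, 0.0], registered)
miss = match_registered([0.6, 0.6, 0.5], registered)
```

The threshold trades false alarms against missed matches, so in practice it would be tuned on representative data rather than fixed in advance.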
[0271] Step 4:
[0272] When abnormal behavior or a suspicious individual is detected, the server uses generation technology to generate a situation-appropriate warning message or dialogue message. Generative AI is used to create the message in natural language.
[0273] Step 5:
[0274] The server sends the generated warning message to the terminal. At the same time, it notifies the terminal of any detected abnormal behavior or suspicious individuals, preparing for necessary countermeasures to be taken.
[0275] Step 6:
[0276] The terminal outputs the warning message received from the server, alerting administrators and those nearby through audio or screen display. If direct action is required, it also notifies the appropriate authorities.
[0277] Step 7:
[0278] The user reviews the information provided by the terminal and decides on a course of action based on the actual situation. This may include going to the scene or contacting the police as needed. To maximize the system's effectiveness, the user regularly optimizes settings and parameters.
[0279] (Example 1)
[0280] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0281] In modern society, while the scenarios where surveillance devices are used are increasing, the conventional systems have limitations in human surveillance and real-time response. Therefore, there is a demand for rapid and automatic detection and response to abnormal behaviors. The purpose of the present invention is to solve these problems by effectively detecting abnormalities and giving appropriate warnings.
[0282] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0283] In this invention, the server includes: information processing means for receiving time-series data from a surveillance device and analyzing the data to discriminate abnormalities; generation means for generating notification information based on the abnormalities discriminated by the information processing means; and display means for outputting the notification information. This allows abnormalities to be detected rapidly and appropriate information to be transmitted.
[0284] A "surveillance device" is an electronic device for detecting and recording the situation in an environment in real time.
[0285] "Time-series data" is a series of data acquired based on a specific time interval and can be analyzed along the time axis.
[0286] "Information processing means" is a method or device for analyzing the received data and performing calculations and judgments to determine the presence or absence of abnormalities.
[0287] "Generation means" is a method or device for generating notification information based on the data obtained from the information processing means.
[0288] "Notification information" is information including messages about abnormalities or events that require attention and is generated to prompt a rapid response.
[0289] "Display means" is a device or method for outputting the generated notification information visually or aurally.
[0290] "Transmission means" refers to a method or device equipped with communication functions for transmitting warning or abnormal information to a specific terminal or device.
[0291] An "operating terminal" is an electronic device used to receive notifications and information transmitted by the monitoring system, and it provides an interface with the user.
[0292] A "feature" is a unique attribute or element used to identify a particular object or person.
[0293] The present invention is implemented by a system including a monitoring device, a server, a terminal, and a user. This system can automatically detect and notify of anomalies based on time-series data.
[0294] Server
[0295] The server first receives time-series data from the monitoring device. A common protocol is used for this reception, and programming languages such as Python and JavaScript are often used for processing. The server preprocesses the data using libraries such as OpenCV and NumPy to identify anomalies. At this stage, machine learning frameworks such as TensorFlow and PyTorch are used to analyze abnormal behavior.
[0296] Terminal
[0297] The terminal receives the notification information sent from the server. The received information is output as audio using Text-to-Speech (TTS) technology. The terminal also visualizes the notification in an easy-to-understand way for the user using JavaScript or React, and reports to relevant organizations as needed.
[0298] User
[0299] The user checks the on-site situation based on the notification information provided by the terminal and takes appropriate actions. For example, after confirming a warning, the user can immediately go to the site and take specific measures. The user can also set the operation of the system according to individual needs. This includes functions such as adjusting the type and frequency of notifications.
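As one possible sketch of the user-adjustable notification type and frequency described above, the following code filters events by enabled type and by a minimum interval between notifications of the same type. The class names, field names, and event type labels are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationSettings:
    """Per-user preferences; all field names are hypothetical."""
    enabled_types: set = field(default_factory=lambda: {"intrusion", "helmet"})
    min_interval_s: float = 60.0  # minimum gap between notifications of one type


@dataclass
class NotificationFilter:
    """Decides whether an event should reach the user under the given settings."""
    settings: NotificationSettings
    _last_sent: dict = field(default_factory=dict)

    def should_notify(self, event_type: str, timestamp_s: float) -> bool:
        # Drop event types the user has switched off.
        if event_type not in self.settings.enabled_types:
            return False
        # Drop events that arrive too soon after the last delivered one.
        last = self._last_sent.get(event_type)
        if last is not None and timestamp_s - last < self.settings.min_interval_s:
            return False
        self._last_sent[event_type] = timestamp_s
        return True


flt = NotificationFilter(
    NotificationSettings(enabled_types={"intrusion"}, min_interval_s=30.0)
)
decisions = [
    flt.should_notify("intrusion", 0.0),   # first event: delivered
    flt.should_notify("intrusion", 10.0),  # within 30 s of the last one: suppressed
    flt.should_notify("loitering", 12.0),  # type not enabled: suppressed
    flt.should_notify("intrusion", 45.0),  # 45 s after the last delivery: delivered
]
```

Keeping the settings separate from the filter state means the user can change preferences without losing the record of when each notification type was last delivered.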
[0300] Specific examples
[0301] Example in an unmanned vending store
[0302] The monitoring device installed in the unmanned vending store captures the situation where a customer enters the store wearing a helmet. The server detects this and uses the "generative AI model" to generate a warning message saying "Customer, please remove your helmet." The terminal plays this message through the speaker to prompt the customer to act.
[0303] Example in a private residence
[0304] When a suspicious person is detected in a private residence, the server generates ambient sounds and voices that pretend someone is at home. This generated audio is streamed from the terminal to exert psychological pressure on the suspicious person and prevent intrusion.
[0305] Examples of prompt texts
[0306] "If there is a suspicious person in the parking lot, please generate audio that makes it seem like the resident is at home."
[0307] The flow of the specific process in Example 1 will be described using FIG. 11.
[0308] Step 1:
[0309] The server receives time-series data from the monitoring device. As input, it receives real-time video data transmitted from the monitoring device. This data is transferred using the "Real-Time Streaming Protocol (RTSP)". The received video data is prepared to be decomposed into frames.
[0310] Step 2:
[0311] The server preprocesses the received video data. It receives the video frames decomposed in step 1 as input. During this process, it uses the "OpenCV" library to perform noise reduction and resolution adjustment. Specifically, it uses Gaussian blur to reduce noise and improve analysis accuracy. The output is a preprocessed, clean video frame.
[0312] Step 3:
[0313] The server analyzes abnormal behavior using preprocessed data. It receives the preprocessed frames obtained in step 2 as input. Using machine learning libraries such as "TensorFlow," an AI model identifies abnormal behavior. The analysis yields an anomaly detection result as output. This result includes the behavioral characteristics of the suspicious person and a determination of whether that behavior is abnormal or not.
[0314] Step 4:
[0315] The server generates notification information based on the detection results. The anomaly detection results obtained in step 3 are used as input. Utilizing the "Generative AI Model," the AI generates a situation-appropriate warning message based on the prompt text. For example, if suspicious activity is detected, a message such as "A suspicious person has been detected. Please take action." is generated. The generated warning message is obtained as output.
[0316] Step 5:
[0317] The terminal receives and displays the generated warning message. It receives the warning message sent from the server in step 4 as input. The terminal uses Text-to-Speech (TTS) technology to output the message as audio. It also displays a text notification on the screen. The warning content is communicated to the user both audibly and visually as output.
[0318] Step 6:
[0319] The user makes decisions based on notifications from their device. The warning message received in step 5 is used as input. The user takes appropriate action depending on the situation, such as rushing to the scene or notifying the relevant authorities. The output is that safety is ensured and actions are taken to resolve the problem.
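Steps 3 to 5 above can be sketched as follows. This is a minimal illustration under stated assumptions: the DetectionResult fields, the prompt wording, and the function names are hypothetical, and generate_warning is a stand-in that returns the example message from step 4 rather than calling an actual generative AI model.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    """Output of step 3; the field names are illustrative assumptions."""
    location: str
    behavior: str
    is_abnormal: bool


def build_prompt(result: DetectionResult) -> str:
    """Turn a detection result into prompt text for the generative AI model."""
    return (
        f"Abnormal behavior detected at {result.location}: {result.behavior}. "
        "Please generate a short warning message appropriate to the situation."
    )


def generate_warning(prompt: str) -> str:
    """Stand-in for the generative AI model; ignores the prompt and returns
    the example message given in step 4."""
    return "A suspicious person has been detected. Please take action."


def notify_terminal(message: str) -> dict:
    """Step 5: the terminal shows the text and would hand it to a TTS engine."""
    return {"display_text": message, "speak_text": message}


def handle_detection(result: DetectionResult):
    """Run steps 4 and 5 only when step 3 judged the behavior abnormal."""
    if not result.is_abnormal:
        return None
    return notify_terminal(generate_warning(build_prompt(result)))


event = DetectionResult(
    location="parking lot",
    behavior="person loitering near the entrance",
    is_abnormal=True,
)
notification = handle_detection(event)
```

Separating prompt construction from the model call keeps the detection result and the generated wording independent, so the stand-in can later be replaced by a real model without changing the other steps.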
[0320] (Application Example 1)
[0321] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0322] Conventional security systems have struggled to effectively detect abnormal behavior and suspicious individuals, and to respond quickly. In particular, there is a need for highly effective deterrents to prevent intruders from entering homes. Furthermore, if the types of alarms and voice messages are not appropriately generated, the deterrent effect against suspicious individuals is diminished.
[0323] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0324] In this invention, the server includes data processing means for receiving video information from a monitoring system and analyzing the video information to detect abnormal behavior; information generation means for generating a warning message based on the abnormal behavior detected by the data processing means; and sound generation means for converting the generated warning message into an audio signal and outputting the audio signal to a public address system in real time. This makes it possible to effectively deter suspicious individuals within a residence and to respond quickly.
[0325] A "monitoring system" refers to any device that acquires video information and detects suspicious individuals or abnormal behavior.
[0326] "Video information" refers to visual data acquired by the monitoring system, and specifically to digital video data that is subject to analysis.
[0327] "Data processing means" refers to software or hardware used to analyze acquired video information and detect abnormal behavior or suspicious individuals.
[0328] "Information generation means" refers to processes and technologies for creating appropriate warning messages based on detected abnormal behavior.
[0329] "Signal output means" refers to a device or means that has the function of outputting the generated warning message as an audio signal or visual signal to communicate it to the outside.
[0330] "Information and communication means" refers to communication technology used to transmit data via a network to management information terminals and other devices for reporting status.
[0331] "Sound generation means" refers to the technology and equipment used to generate a message created by an information generation means as an audio signal and output it to a public address system.
[0332] A "public address system" refers to a device that amplifies generated audio signals in real time and broadcasts the sound within a designated space.
[0333] In order to implement this invention, it is necessary to construct a network system that includes a monitoring system, a data processing server, an information generation device, an acoustic public address system, and a management information terminal.
[0334] The server receives video information transmitted in real time from the monitoring system. The program uses Python, TensorFlow, and OpenCV to decompose the video information frame by frame and perform preprocessing such as noise reduction and resolution adjustment. Next, data processing tools analyze these frames to detect abnormal behavior and suspicious individuals. An AI model is used for this analysis, identifying anomalies more quickly and accurately than conventional security systems. The generation technology utilizes a generative AI model (e.g., OpenAI's GPT) to generate warning messages corresponding to abnormal behavior.
[0335] The terminal receives the generated warning message and converts it into an audio signal via the sound generation means. This audio signal is transmitted to the public address system via a Bluetooth or Wi-Fi connection and played back in real time. This can exert a psychological deterrent effect on suspicious individuals.
[0336] Users can review the information they receive through the management terminal and adjust the settings of the sound system if necessary. Throughout this entire process, the system provides quick and effective security measures within the residence.
[0337] As a concrete example, consider a scenario where an intruder enters a residential yard. The AI model detects the intruder and a warning message is generated: "Is anyone there? If you are, please leave immediately." This message is promptly played through the public address system to deter further intrusion.
[0338] Examples of prompts to send to a generative AI model:
[0339] "A suspicious person has been detected. Please generate a warning message that says, 'Is anyone there? If you are, please leave immediately.'"
[0340] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0341] Step 1:
[0342] The server receives video information from the surveillance system in real time. The input is streamed video data from the surveillance camera. The server breaks down this data into frames and performs preprocessing, such as noise reduction and resolution adjustment, using OpenCV. The output is preprocessed, clear video frame data.
[0343] Step 2:
[0344] The server inputs pre-processed video frames into an AI model to analyze for abnormal behavior and the presence of suspicious individuals. The input is pre-processed video frame data. The AI model uses TensorFlow to analyze abnormal behavior and determine the presence of suspicious individuals. This process outputs a judgment as an analysis result, indicating whether or not abnormal behavior was detected.
[0345] Step 3:
[0346] The server uses a generative AI model to generate a warning message based on the abnormal behavior detected by the AI model. The input is the analysis result from the AI model. The generative AI model (e.g., OpenAI's GPT) generates an appropriate warning message based on the prompt text. The output is the generated warning message in text format.
[0347] Step 4:
[0348] The terminal receives a warning message from the server, converts it into speech, and sends it to the sound amplifier. The input is the text data of the generated warning message. The terminal generates the speech signal using Text-to-Speech (TTS) technology. The generated speech signal is output to the sound amplifier via Bluetooth or Wi-Fi.
[0349] Step 5:
[0350] The user checks for suspicious person notifications on the management information terminal and monitors the operation status of the sound amplification system. Inputs include warning notification information and audio output status from the terminal. The user adjusts system settings and takes additional measures as needed. Outputs are information about adjustments and countermeasures taken by the user.
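The audio path of step 4 above, in which the terminal converts the warning text to speech and outputs it via Bluetooth or Wi-Fi, might be sketched as follows. The Amplifier class, the link names, and the order in which links are tried are assumptions for illustration, and synthesize_speech is a placeholder that returns encoded text instead of real audio samples.

```python
from dataclasses import dataclass, field


@dataclass
class Amplifier:
    """Hypothetical sound amplifier reachable over one or more links."""
    available_links: tuple
    played: list = field(default_factory=list)

    def play(self, link: str, audio: bytes) -> None:
        # Reject links that are not currently connected.
        if link not in self.available_links:
            raise ConnectionError(f"{link} link is not available")
        self.played.append((link, audio))


def synthesize_speech(text: str) -> bytes:
    """Placeholder for a TTS engine: returns UTF-8 bytes, not audio samples."""
    return text.encode("utf-8")


def send_to_amplifier(amp: Amplifier, text: str,
                      preferred=("bluetooth", "wifi")) -> str:
    """Try each link in order and return the one that carried the audio."""
    audio = synthesize_speech(text)
    for link in preferred:
        try:
            amp.play(link, audio)
            return link
        except ConnectionError:
            continue  # fall through to the next candidate link
    raise ConnectionError("no link to the amplifier is available")


amp = Amplifier(available_links=("wifi",))
used_link = send_to_amplifier(
    amp, "Is anyone there? If you are, please leave immediately."
)
```

Trying a second link when the first is unavailable is a design choice of this sketch and is not stated in the description, which only names Bluetooth and Wi-Fi as possible connections.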
[0351] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0352] This invention combines a system that analyzes video data from a monitoring device to detect abnormal behavior with an emotion engine that recognizes user emotions. This system uses a server, terminal, and emotion engine to comprehensively analyze abnormality detection and the user's emotional state, thereby enabling the generation and notification of more accurate warning messages.
[0353] Server
[0354] The server receives video data acquired from monitoring devices in real time and analyzes abnormal behavior using an AI model. Based on the analysis results, if abnormal behavior is detected, it generates a warning message using generation technology. The generated message is adjusted as appropriate depending on the type of abnormal behavior. In addition, the server evaluates the user's emotional state via an emotion engine and adjusts the warning message based on changes in emotion.
[0355] Emotion engine
[0356] The emotion engine analyzes the user's voice and behavioral data to determine their emotional state in real time. This information is sent to the server and used to generate warning messages, ensuring that the most effective messages are delivered to the user.
[0357] Terminal
[0358] The terminal receives warning messages and emotional information sent from the server and presents them to the user. For example, if the user is in a stressful situation, it can output a more empathetic message. Furthermore, based on information from the emotion engine, the terminal provides support tailored to the user's emotions.
[0359] Specific example
[0360] Application examples in unmanned retail stores
[0361] In an unmanned store, if a monitoring device detects abnormal behavior in the store, the server uses generation technology to create an appropriate warning message. For example, if the emotion engine detects that a customer is agitated, the terminal will display a message to the customer saying, "We will assist you shortly, please relax and wait."
[0362] Examples of applications in private residences
[0363] In a private residence, if a surveillance device detects an intruder, the server immediately generates a warning message. Furthermore, if the emotion engine detects the resident's anxiety, the terminal plays a reassuring message such as, "We have contacted the police. Please wait until safety is confirmed," to put the resident at ease.
[0364] This system provides sophisticated interactions that take user emotions into consideration, enabling more effective detection and response to anomalies.
[0365] The following describes the processing flow.
[0366] Step 1:
[0367] The server receives video data from the monitoring device in real time, breaks down the data into frames, and performs preprocessing. This includes noise reduction and resolution adjustment, and converting the data into a format suitable for the AI model.
[0368] Step 2:
[0369] The server analyzes pre-processed frames using an AI model to detect abnormal behavior. It evaluates patterns of abnormal behavior, and if a relevant behavior is detected, it identifies the data and records it in the management system.
[0370] Step 3:
[0371] The server uses an emotion engine to analyze the user's emotional state from their voice and video. The emotion engine identifies emotions such as stress, anxiety, and joy, and generates an emotion rating based on that.
[0372] Step 4:
[0373] Based on the anomaly detection results and the emotion evaluation, the server uses generation technology to customize warning messages. For example, if the user is highly anxious, it generates a more reassuring message.
[0374] Step 5:
[0375] The server sends the generated warning message and emotional state information to the terminal. This provides real-time warnings while also offering notifications that are sensitive to the user's emotions.
[0376] Step 6:
[0377] The terminal uses the message received from the server to output the warning as audio or text. If the user is emotionally unstable, a message in a calmer tone is played to help stabilize the situation.
[0378] Step 7:
[0379] The user reviews the information on the terminal and takes appropriate action if necessary. If the anomaly is serious, the user takes measures such as contacting the police, and can tune the system through the terminal settings.
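Steps 3 and 4 of the above flow, in which an emotion rating is combined with the anomaly detection result to customize the warning, can be illustrated by the following sketch. The rating format (a mapping from emotion name to a score between 0 and 1), the threshold, the tone table, and the prompt wording are all assumptions introduced for illustration.

```python
# Hypothetical mapping from a dominant emotion to the tone requested in the prompt.
TONE_BY_EMOTION = {
    "anxiety": "reassuring",
    "stress": "calm",
    "joy": "neutral",
}


def dominant_emotion(rating: dict, threshold: float = 0.5):
    """Return the strongest emotion if its score reaches the threshold, else None."""
    if not rating:
        return None
    name, score = max(rating.items(), key=lambda item: item[1])
    return name if score >= threshold else None


def build_emotion_aware_prompt(anomaly: str, rating: dict) -> str:
    """Combine the anomaly description and the emotion rating into one prompt."""
    emotion = dominant_emotion(rating)
    tone = TONE_BY_EMOTION.get(emotion, "neutral")
    prompt = f"{anomaly} Please generate a warning message in a {tone} tone."
    if emotion is not None:
        prompt += f" The user appears to be experiencing {emotion}."
    return prompt


rating = {"stress": 0.3, "anxiety": 0.8, "joy": 0.1}
prompt = build_emotion_aware_prompt("An intruder has been detected.", rating)
```

When no emotion reaches the threshold, the sketch falls back to a neutral tone, so the warning is still generated even if the emotion engine has no confident estimate.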
[0380] (Example 2)
[0381] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0382] Conventional monitoring systems often fail to provide appropriate messages tailored to the user's emotions and circumstances when detecting abnormal behavior and issuing warnings, resulting in limited effectiveness of the warnings. In particular, in situations where users experience anxiety or stress, uniform warning messages fail to provide sufficient reassurance, making it difficult to effectively ensure user safety.
[0383] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0384] In this invention, the server includes an analysis means for receiving video information from a monitoring device and analyzing the video information to detect abnormal behavior; a generation means for generating warning information based on the abnormal behavior detected by the analysis means; and an adjustment means for adjusting the warning information based on the user's emotions. This makes it possible to generate and present an optimal warning message according to the type of abnormal behavior and the user's emotional state.
[0385] A "monitoring device" is a device that acquires video information from the environment and transmits it to a server.
[0386] "Video information" refers to visual data acquired by the monitoring device and is used to analyze abnormal behavior.
[0387] "Analysis means" refers to algorithms and technologies used to process received video information and identify abnormal behavior.
[0388] A "generation means" refers to a process or system that creates appropriate warning information based on the results of abnormal behavior obtained through analysis means.
[0389] "Adjustment means" refers to a system that adjusts the generated warning information according to the user's emotional state and presents it in the most suitable format.
[0390] "Display means" refers to devices or methods for presenting adjusted warning information to users.
[0391] "Communication means" refers to technologies and systems for transmitting detected abnormal behavior and related information to information terminals.
[0392] An "information terminal" is a device used to receive information about abnormal behavior via communication means.
[0393] This invention relates to a system that analyzes video information acquired from surveillance equipment to detect abnormal behavior and generates and presents warning messages that take emotional information into consideration. This system is primarily implemented using a server, terminals, and an emotion engine.
[0394] The server receives video information in real time from the monitoring equipment. The video information is transmitted to the server using a streaming protocol. The server analyzes this video information using a deep learning framework (e.g., TensorFlow or PyTorch) to detect objects and recognize movements, thereby identifying abnormal behavior. Based on these results, the server uses a generative AI model to generate appropriate warning messages based on the prompt text. For example, a warning message such as "Intruder is currently entering" might be created based on this prompt.
[0395] The emotion engine analyzes the user's voice data and behavioral information to evaluate their emotional state. This evaluation is performed using voice recognition software and an emotion analysis API. This determines the user's emotional state and provides this information to the server. The server then uses this emotion information to adjust warning messages to the most appropriate form.
[0396] The terminal receives the adjusted warning messages sent from the server and presents them to the user through the screen and audio. For example, if the user is feeling anxious, a reassuring message such as "Please wait until safety is confirmed" is played via audio output. The terminal can also use information from the emotion engine to provide necessary support.
[0397] As a specific example, if a monitoring device in an unmanned store detects a particular abnormal behavior, the server generates a warning message with the prompt, "The monitoring camera has detected a confused customer. Please generate a relaxing message." In this way, the present invention enables the detection of abnormal behavior as well as the provision of effective warning messages tailored to the user's emotions.
[0398] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0399] Step 1:
[0400] The server receives video information from the monitoring equipment in real time. This input is transmitted encrypted as a data stream. The server decodes this data using a streaming protocol and stores it as video frames to prepare for analysis.
[0401] Step 2:
[0402] The server analyzes the received video frames using a deep learning framework (e.g., TensorFlow or PyTorch). The input data is provided to a model that detects objects and recognizes actions frame by frame. If the AI model detects abnormal behavior, it outputs the result as an alert and generates data including the type and location of the abnormal behavior.
[0403] Step 3:
[0404] The server utilizes a generative AI model to generate warning messages based on detected abnormal behavior. In this step, the AI model receives a prompt as input and generates an appropriate message in response to that prompt. For example, a warning message such as "Intruder is currently entering the premises" might be created.
[0405] Step 4:
[0406] The emotion engine collects user voice and behavioral data and evaluates their emotional state in real time. Voice data is converted to text using speech recognition software and input into the emotion analysis API. This outputs the user's emotional parameters, which are then sent to the server.
[0407] Step 5:
[0408] The server optimizes warning messages using emotion parameters sent from the emotion engine. The input for this optimization is the original warning message and the user's emotion parameters. Based on this data, the server adjusts the tone and content of the warning message and outputs the adjusted message.
[0409] Step 6:
[0410] The terminal receives the adjusted warning message from the server and presents it to the user. The input to this process is the adjusted warning message, which the terminal displays on the screen or outputs as audio, depending on the output method. Specifically, depending on the user's status, a message such as "Please wait until safety is confirmed" may be displayed.
[0411] Step 7:
[0412] The terminal provides necessary support based on information from the emotion engine. The input for this support is the user's emotional state. The terminal performs actions such as automatically launching a mental health support application to provide additional support as needed by the user.
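Steps 4, 5, and 7 above can be sketched as follows. In place of speech recognition software and an emotion analysis API, this illustration scores an already-transcribed utterance against a tiny word list; the lexicon, the scoring rule, the thresholds, and the function names are all hypothetical and serve only to show how emotion parameters could drive the adjustment and the support decision.

```python
import re

# Tiny illustrative lexicon; a real emotion analysis API would replace this.
LEXICON = {
    "anxiety": {"scared", "afraid", "worried", "help"},
    "stress": {"hurry", "late", "overwhelmed"},
}


def emotion_parameters(transcript: str) -> dict:
    """Score each emotion as the fraction of transcript words in its word list."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return {name: 0.0 for name in LEXICON}
    return {
        name: sum(word in vocab for word in words) / len(words)
        for name, vocab in LEXICON.items()
    }


def adjust_message(message: str, params: dict, threshold: float = 0.2) -> str:
    """Step 5: prepend a reassuring sentence when the anxiety score is high."""
    if params.get("anxiety", 0.0) >= threshold:
        return "Please wait until safety is confirmed. " + message
    return message


def needs_support(params: dict, threshold: float = 0.4) -> bool:
    """Step 7: decide whether the terminal should offer additional support."""
    return max(params.values(), default=0.0) >= threshold


params = emotion_parameters("I am scared, please help")
adjusted = adjust_message("Intruder is currently entering the premises.", params)
```

Here two of the five transcript words match the anxiety list, so the reassuring sentence is prepended; a calm utterance would leave the original warning unchanged.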
[0413] (Application Example 2)
[0414] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart glasses 214 as the "terminal".
[0415] In abnormal behavior detection systems using monitoring devices, conventional methods have the problem of not being able to issue warning messages that take into account the user's emotions. Therefore, warning messages are not optimized according to the user's emotional state, resulting in a lack of reassurance and effective interaction. Furthermore, improving the accuracy of generating appropriate messages according to the type of abnormal behavior is also necessary.
[0416] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0417] In this invention, the server includes a computing means for receiving video signals from a monitoring device and analyzing the video signals to detect abnormal behavior; a generating means for generating warning information based on the abnormal behavior detected by the computing means; and an emotion recognition means for recognizing the user's emotional state in real time. As a result, warning messages are adjusted based on the user's emotional information, enabling more effective and reassuring interactions.
[0418] A "monitoring device" is a device that collects video signals and transmits them to a computing device for analysis.
[0419] "Video signal" refers to a series of video data captured by a monitoring device, and is used for analyzing abnormal behavior.
[0420] "Computing means" refers to hardware or software that analyzes video signals and performs processing to detect abnormal behavior.
[0421] "Abnormal behavior" refers to actions that deviate from normal behavioral patterns and are judged by the system as actions that require vigilance or attention.
[0422] "Warning information" refers to notifications, cautionary messages, or signals generated when abnormal behavior is detected.
[0423] "Generation means" refers to a function that executes a process to create warning information based on the detection results of abnormal behavior.
[0424] "Emotion recognition means" refers to methods for analyzing and acquiring a user's emotional state in real time.
[0425] "Communication means" refers to a network or interface for transmitting detection results and user emotion information to an external control device or similar.
[0426] The "adjustment mechanism" is a function that modifies warning information to the most optimal content based on acquired emotional information.
[0427] This invention relates to a system in which a server receives video signals from a monitoring device and analyzes abnormal behavior using computational means. The server analyzes the video signals in real time using image processing libraries such as OpenCV, and when it detects abnormal behavior, it generates warning information using generative AI technology. This generated warning information is adjusted to take into account the user's emotional state.
[0428] An emotion recognition system is used to recognize the user's emotional state. This system uses libraries such as the EmotionRecognition library to estimate emotions in real time from the user's voice and behavioral data. The emotion information is sent to the server and used when creating prompts for the generative AI model.
[0429] As a concrete example, suppose the system is implemented in an apartment management setting, and a suspicious person is detected by the monitoring device. If the system detects that a resident is feeling anxious, the server generates a message saying, "The area is safe, please rest assured." This message is individually tailored to alleviate the resident's anxiety.
[0430] Examples of prompts for a generative AI model include:
[0431] "An intruder has been detected. Please generate a message to reassure the residents and alleviate their anxiety."
[0432] In this way, the system can provide effective warning information tailored to the user's situation and emotional state, creating a reassuring environment.
[0433] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0434] Step 1:
[0435] The server receives video signals from the monitoring device. The input is a video signal, and the output is analyzable image data. Image processing libraries such as OpenCV are used to decompose the video signal into frames, preparing it for analysis.
[0436] Step 2:
[0437] The server uses computational tools to analyze video frames to determine the presence or absence of abnormal behavior. The input is frame data, and the output is the detection result of abnormal behavior. An AI model is used to compare and analyze the abnormal behavior with normal behavior patterns to determine if abnormal behavior exists.
[0438] Step 3:
[0439] If abnormal behavior is detected, the server generates warning information using a generative AI model. The input is the result of the detected abnormal behavior, and the output is the generated warning message. The generative AI model creates a valid prompt sentence according to the nature of the abnormality and constructs the warning message using natural language processing techniques.
[0440] Step 4:
[0441] The server analyzes the user's emotional state in real time using emotion recognition technology. Input is user voice and behavioral data, and output is estimated emotion data. The EmotionRecognition library is used to identify emotions from factors such as voice tone and facial expressions.
[0442] Step 5:
[0443] The server adjusts the warning message based on the estimated emotion data. The input is the warning message and the emotion data, and the output is the adjusted, personalized message. By changing the tone and content of the message, for example to softer language, the server reduces user anxiety.
[0444] Step 6:
[0445] The terminal presents the adjusted warning message to the user. The input is the final message sent to the terminal, and the output is the information provided to the user. The terminal communicates the message to the user through screen display and audio output, providing reassurance.
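The comparison with normal behavior patterns described in step 2 above could take many forms. The following sketch uses one simple statistical form, a z-score test against a baseline learned from normal observations, in place of the AI model; this substitution, the dwell-time feature, the sample values, and the threshold of 3 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def fit_baseline(normal_samples):
    """Summarize normal behavior by the mean and sample standard deviation."""
    return mean(normal_samples), stdev(normal_samples)


def is_abnormal(value, baseline, z_threshold=3.0):
    """Flag a value whose distance from the mean exceeds z_threshold deviations."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # no spread in the baseline: any difference is abnormal
    return abs(value - mu) / sigma > z_threshold


# Hypothetical feature: seconds a person lingers in front of a door.
normal_dwell_times = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
baseline = fit_baseline(normal_dwell_times)
flags = [is_abnormal(t, baseline) for t in (5.5, 7.0, 40.0)]
```

With this baseline, dwell times of 5.5 and 7.0 seconds stay within three standard deviations of the mean, while 40.0 seconds is flagged as abnormal and would trigger step 3.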
[0446] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0447] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, a prompt containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference result in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0448] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0449] [Third Embodiment]
[0450] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0451] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0452] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0453] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0454] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0455] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0456] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0457] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0458] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0459] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0460] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0461] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0462] To implement the present invention, a system is constructed that includes a monitoring device, a server, a terminal, and a user as its main components. The specific functions of each component and their embodiments are described below.
[0463] Server
[0464] The server receives video data transmitted in real time from the monitoring device and analyzes this data using an AI model. First, the video data is broken down into frames, and each frame is preprocessed. This preprocessing includes noise reduction and resolution adjustment. Next, the preprocessed frames are analyzed by artificial intelligence for the presence or absence of abnormal behavior, and suspicious individuals are identified by comparing them with a registered database.
[0465] Generation technology
[0466] When abnormal behavior or suspicious individuals are detected, the server automatically generates appropriate warning messages using generation technology. Depending on the type of abnormal behavior, it adjusts the content and format of the message to create warning messages or interactive communications.
[0467] Terminal
[0468] The terminal receives warning messages and anomaly detection notifications from the server and displays them to the administrator. Received warning messages are output in real time as audio or text to alert the administrator and those nearby. Furthermore, the terminal notifies the police and other relevant agencies as needed.
[0469] User
[0470] The user assesses the situation based on the information provided by the device and responds as needed. If an anomaly is detected, the user checks the situation on site and takes action, such as contacting the police. The user can also customize the system settings to suit their individual needs, including, for example, the type and frequency of notifications and the content of the warning messages generated.
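The user-side customization described above (type and frequency of notifications) could be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `NotificationSettings`, `NotificationGate`, `enabled_types`, and `min_interval_s` are hypothetical, and the event-type labels are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationSettings:
    """User-adjustable settings (all field names are hypothetical)."""
    enabled_types: set = field(default_factory=lambda: {"intruder", "helmet"})
    min_interval_s: float = 30.0  # minimum gap between repeats of one type


class NotificationGate:
    """Decides whether a detected event should be forwarded to the user."""

    def __init__(self, settings: NotificationSettings):
        self.settings = settings
        self._last_sent = {}  # event type -> timestamp of last notification

    def should_notify(self, event_type: str, now: float) -> bool:
        if event_type not in self.settings.enabled_types:
            return False  # the user switched this notification type off
        last = self._last_sent.get(event_type)
        if last is not None and now - last < self.settings.min_interval_s:
            return False  # too soon after the previous notification
        self._last_sent[event_type] = now
        return True
```

Passing the timestamp in as `now` rather than reading a clock inside the method keeps the frequency rule easy to check in isolation.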
[0471] Specific example
[0472] Examples in unmanned stores
[0473] An unmanned store has installed monitoring equipment, which is constantly monitored by a server. If a customer enters the store wearing a helmet, the server immediately detects this and generates a warning message using generation technology: "Excuse me, could you please remove your helmet?" This message is played through a speaker on the terminal, prompting the customer to remove their helmet.
[0474] Examples in private residences
[0475] When a surveillance device installed in the parking lot of a private residence detects an intruder, the server immediately analyzes the information and generates sounds and voices to make it appear as if the resident is home. These generated voices are played from a terminal, exerting a psychological deterrent effect on the intruder and preventing unauthorized entry.
[0476] Thus, the present invention provides a system that can detect anomalies in real time and immediately suggest a response by making full use of advanced AI-based analysis and generation technologies.
[0477] The following describes the processing flow.
[0478] Step 1:
[0479] The server receives video data in real time from the monitoring device. The video data in each frame is preprocessed to a state suitable for analysis by removing noise and adjusting the resolution.
[0480] Step 2:
[0481] The server inputs pre-processed video frames into an AI model and performs analysis to detect abnormal behavior. The AI model evaluates the abnormality of specific behavioral patterns and records the data if it determines that a behavior is abnormal.
[0482] Step 3:
[0483] Based on the analysis results from the previous step, the server uses a facial recognition algorithm to identify individuals in the video and compares them with a database to determine if they are suspicious. If a suspicious person is identified, the server reports this information to the management system.
[0484] Step 4:
[0485] When abnormal behavior or suspicious activity is detected, the server uses generation technology to generate a situation-appropriate warning message or dialogue message. This is done using generation AI to create messages in natural language.
[0486] Step 5:
[0487] The server sends the generated warning message to the terminal. At the same time, it notifies the terminal of any detected abnormal behavior or suspicious individuals, preparing for necessary countermeasures to be taken.
[0488] Step 6:
[0489] The terminal outputs the warning message received from the server, alerting administrators and those nearby through audio or screen display. If direct action is required, it also notifies the appropriate authorities.
[0490] Step 7:
[0491] Users review the information provided by their devices and decide on a course of action based on the actual situation. This may include going to the scene or contacting the police as needed. To maximize the system's effectiveness, users regularly optimize settings and parameters.
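The database comparison in Step 3 is not specified in detail in this disclosure. One common way to realize such a comparison, shown here purely as an assumption, is to compare a feature vector extracted from the video against registered feature vectors using cosine similarity. The function names, the registry layout, and the 0.9 threshold below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def find_registered_match(features, registry, threshold=0.9):
    """Return (person_id, score) for the best match at or above threshold.

    `registry` maps a person id to a stored feature vector (hypothetical
    layout). Returns None when no registered entry is similar enough.
    """
    best_id, best_score = None, -1.0
    for person_id, stored in registry.items():
        score = cosine_similarity(features, stored)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None
```

A match would then be reported to the management system as described in Step 3. The threshold trades missed matches against false alarms and would need tuning on real data.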
[0492] (Example 1)
[0493] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0494] In modern society, the use of monitoring devices is increasing, but conventional systems have limitations in terms of human monitoring and real-time response. Therefore, there is a need for rapid and automatic detection and response to abnormal behavior. This invention aims to solve these problems by effectively detecting anomalies and providing appropriate warnings.
[0495] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0496] In this invention, the server includes an information processing means that receives time-series data from a monitoring device and analyzes the data to determine anomalies; a generation means that generates notification information based on the anomalies determined by the information processing means; and a display means that outputs the notification information. This enables rapid detection of anomalies and the transmission of appropriate information.
[0497] A "monitoring device" is an electronic device used to detect and record environmental conditions in real time.
[0498] "Time-series data" refers to a series of data acquired based on specific time intervals, allowing for analysis along a time axis.
[0499] "Information processing means" refers to a method or apparatus for analyzing received data and performing calculations and judgments to determine whether or not there are any abnormalities.
[0500] "Generation means" refers to a method or apparatus for generating notification information based on data obtained from information processing means.
[0501] "Notification information" refers to information containing messages about abnormalities or events requiring attention, and is generated to encourage prompt action.
[0502] "Display means" refers to a device or method for outputting generated notification information visually or audibly.
[0503] "Transmission means" refers to a method or device equipped with communication functions for transmitting warning or abnormal information to a specific terminal or device.
[0504] An "operating terminal" is an electronic device used to receive notifications and information transmitted by the monitoring system, and it provides an interface with the user.
[0505] A "feature" is a unique attribute or element used to identify a particular object or person.
[0506] The present invention is implemented by a system including a monitoring device, a server, a terminal, and a user. This system can automatically detect and notify of anomalies based on time-series data.
[0507] Server
[0508] The server first receives time-series data from the monitoring device. A common protocol is used for this reception, and programming languages such as Python and JavaScript are often used for processing. The server preprocesses the data using libraries such as OpenCV and NumPy to identify anomalies. At this stage, machine learning frameworks such as TensorFlow and PyTorch are used to analyze abnormal behavior.
[0509] Terminal
[0510] The terminal receives notification information sent from the server. The received information is output as audio using Text-to-Speech (TTS) technology. The terminal also presents the notification visually in an easy-to-understand form using JavaScript or React. The terminal reports to relevant organizations as needed.
[0511] User
[0512] Users can check the situation on-site based on notification information provided by their devices and take appropriate action. For example, after receiving a warning, a user can immediately go to the site and take specific measures. Users can also configure the system's operation according to their individual needs. This includes features such as adjusting the type and frequency of notifications.
[0513] Specific example
[0514] Examples in unmanned stores
[0515] A monitoring device installed in an unmanned store captures a customer entering the store wearing a helmet. The server detects this and generates a warning message, "Please remove your helmet," using a "generative AI model." The terminal plays this message through its speaker, prompting the customer to take action.
[0516] Examples in private residences
[0517] If a suspicious person is detected in a private residence, the server generates sounds and voices to simulate the presence of a resident. These generated voices are played from a terminal, creating psychological pressure on the intruder and preventing them from entering.
[0518] Example of a prompt
[0519] "If there is a suspicious person in the parking lot, please generate audio that makes it sound as if a resident is at home."
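A prompt such as the one above could be assembled from a detection result by filling in a template. The sketch below assumes a simple mapping from event type to template text. The `suspicious_person` template reproduces the prompt given above, while the `helmet` template, the event-type keys, and the function name `build_prompt` are hypothetical.

```python
PROMPT_TEMPLATES = {
    # Wording taken from the example prompt in this description.
    "suspicious_person": (
        "If there is a suspicious person in the {location}, please generate "
        "audio that makes it sound as if a resident is at home."
    ),
    # Hypothetical wording, added only for illustration.
    "helmet": (
        "A customer has entered the {location} wearing a helmet. Please "
        "generate a polite message asking them to remove it."
    ),
}


def build_prompt(event_type: str, location: str) -> str:
    """Fill the template registered for `event_type` with the location."""
    template = PROMPT_TEMPLATES.get(event_type)
    if template is None:
        raise KeyError(f"no prompt template for event type: {event_type}")
    return template.format(location=location)
```

Keeping the templates in one table makes it straightforward for an administrator to review or edit the wording without touching the detection logic.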
[0520] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0521] Step 1:
[0522] The server receives time-series data from the monitoring device. In this example, the input is the real-time video data transmitted from the monitoring device. This data is transferred using the Real Time Streaming Protocol (RTSP). The received video data is then prepared to be broken down into individual frames.
[0523] Step 2:
[0524] The server preprocesses the received video data. It receives the video frames decomposed in step 1 as input. During this process, it uses the "OpenCV" library to perform noise reduction and resolution adjustment. Specifically, it uses Gaussian blur to reduce noise and improve analysis accuracy. The output is a preprocessed, clean video frame.
[0525] Step 3:
[0526] The server analyzes abnormal behavior using preprocessed data. It receives the preprocessed frames obtained in step 2 as input. Using machine learning libraries such as "TensorFlow," an AI model identifies abnormal behavior. The analysis yields an anomaly detection result as output. This result includes the behavioral characteristics of the suspicious person and a determination of whether that behavior is abnormal or not.
[0527] Step 4:
[0528] The server generates notification information based on the detection results. The anomaly detection results obtained in step 3 are used as input. Using a generative AI model, the server generates a situation-appropriate warning message based on the prompt text. For example, if suspicious activity is detected, a message such as "A suspicious person has been detected. Please take action." is generated. The generated warning message is obtained as output.
[0529] Step 5:
[0530] The terminal receives and displays the generated warning message. It receives the warning message sent from the server in step 4 as input. The terminal uses Text-to-Speech (TTS) technology to output the message as audio. It also displays a text notification on the screen. The warning content is communicated to the user both audibly and visually as output.
[0531] Step 6:
[0532] The user makes decisions based on notifications from their device. The warning message received in step 5 is used as input. The user takes appropriate action depending on the situation, such as rushing to the scene or notifying the relevant authorities. The output is that safety is ensured and actions are taken to resolve the problem.
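The preprocessing in Step 2 names OpenCV and Gaussian blur. To show what those two operations do without depending on any external library, the following is a dependency-free sketch of a separable Gaussian blur and a simple resolution reduction on a grayscale frame stored as a list of rows. It is a stand-in for illustration only: the function names, the default radius and sigma, and the edge handling (repeating border pixels) are assumptions, and a practical system would use an optimized library routine instead.

```python
import math


def gaussian_kernel(radius: int, sigma: float):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels at borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)  # clamp to edge
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable Gaussian blur: blur the rows, then blur the columns."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = _blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = _blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


def downscale(image, factor: int):
    """Resolution adjustment by keeping every `factor`-th pixel."""
    return [row[::factor] for row in image[::factor]]
```

Blurring before downscaling spreads isolated noise pixels over their neighbors, which is why the two operations are usually applied in that order.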
[0533] (Application Example 1)
[0534] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0535] Conventional security systems have struggled to effectively detect abnormal behavior and suspicious individuals, and to respond quickly. In particular, there is a need for highly effective deterrents to prevent intruders from entering homes. Furthermore, if the types of alarms and voice messages are not appropriately generated, the deterrent effect against suspicious individuals is diminished.
[0536] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0537] In this invention, the server includes data processing means for receiving video information from a monitoring system and analyzing the video information to detect abnormal behavior; information generation means for generating a warning message based on the abnormal behavior detected by the data processing means; and sound generation means for converting the generated warning message into an audio signal and outputting the audio signal to a public address system in real time. This makes it possible to effectively deter suspicious individuals within a residence and to respond quickly.
[0538] A "monitoring system" refers to any device that acquires video information and detects suspicious individuals or abnormal behavior.
[0539] "Video information" refers to visual data acquired by the monitoring system, and specifically to digital video data that is subject to analysis.
[0540] "Data processing means" refers to software or hardware used to analyze acquired video information and detect abnormal behavior or suspicious individuals.
[0541] "Information generation means" refers to processes and technologies for creating appropriate warning messages based on detected abnormal behavior.
[0542] "Signal output means" refers to a device or means that has the function of outputting the generated warning message as an audio signal or visual signal to communicate it to the outside.
[0543] "Information and communication means" refers to communication technology used to transmit data via a network to management information terminals and other devices for reporting status.
[0544] "Sound generation means" refers to the technology and equipment used to generate a message created by an information generation means as an audio signal and output it to a public address system.
[0545] A "public address system" refers to a device used to amplify generated audio signals in real time and broadcast the sound within a designated space.
[0546] In order to implement this invention, it is necessary to construct a network system that includes a monitoring system, a data processing server, an information generation device, an acoustic public address system, and a management information terminal.
[0547] The server receives video information transmitted in real time from the monitoring system. The program uses Python, TensorFlow, and OpenCV to decompose the video information frame by frame and perform preprocessing such as noise reduction and resolution adjustment. Next, data processing tools analyze these frames to detect abnormal behavior and suspicious individuals. An AI model is used for this analysis, identifying anomalies more quickly and accurately than conventional security systems. The generation technology utilizes a generative AI model (e.g., OpenAI's GPT) to generate warning messages corresponding to abnormal behavior.
[0548] The terminal receives the generated warning message and generates an audio signal via a sound generation device. This audio signal is transmitted to the public address system via a Bluetooth or Wi-Fi connection and played back in real time. This can exert a psychological deterrent effect on suspicious individuals.
[0549] Users can review the information they receive through the management terminal and adjust the settings of the sound system if necessary. Throughout this entire process, the system provides quick and effective security measures within the residence.
[0550] As a concrete example, consider a scenario where an intruder enters a residential yard. The AI model detects the intruder, and a warning message is generated: "Is anyone there? If you are, please leave immediately." This message is promptly played through the public address system to deter further intrusion.
[0551] Examples of prompts to send to a generative AI model:
[0552] "A suspicious person has been detected. Please generate a warning message that says, 'Is anyone there? If you are, please leave immediately.'"
[0553] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0554] Step 1:
[0555] The server receives video information from the surveillance system in real time. The input is streamed video data from the surveillance camera. The server breaks down this data into frames and performs preprocessing, such as noise reduction and resolution adjustment, using OpenCV. The output is preprocessed, clear video frame data.
[0556] Step 2:
[0557] The server inputs pre-processed video frames into an AI model to analyze for abnormal behavior and the presence of suspicious individuals. The input is pre-processed video frame data. The AI model uses TensorFlow to analyze abnormal behavior and determine the presence of suspicious individuals. This process outputs a judgment as an analysis result, indicating whether or not abnormal behavior was detected.
[0558] Step 3:
[0559] The server uses a generative AI model to generate a warning message based on the abnormal behavior detected by the AI model. The input is the analysis result from the AI model. The generative AI model (e.g., OpenAI's GPT) generates an appropriate warning message based on the prompt text. The output is the generated warning message in text format.
[0560] Step 4:
[0561] The terminal receives the warning message from the server, converts it into speech, and sends it to the public address system. The input is the text data of the generated warning message. The terminal generates the speech signal using Text-to-Speech (TTS) technology. The generated speech signal is output to the public address system via Bluetooth or Wi-Fi.
[0562] Step 5:
[0563] The user checks for suspicious person notifications on the management information terminal and monitors the operation status of the public address system. Inputs include warning notification information and audio output status from the terminal. The user adjusts system settings and takes additional measures as needed. Outputs are information about adjustments and countermeasures taken by the user.
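The flow of Steps 1 to 4 above (detect, generate a message, synthesize speech, output to the public address system) can be outlined as a single function that wires the stages together. In this sketch every stage is passed in as a callable, so no particular detection model, generative AI service, or TTS engine is assumed. The names `Detection`, `handle_frame`, and `min_confidence` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    kind: str          # e.g. "intruder"
    confidence: float  # model score in [0, 1]


def handle_frame(
    frame,
    detect: Callable[[object], Optional[Detection]],
    generate_message: Callable[[Detection], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    min_confidence: float = 0.5,
) -> Optional[str]:
    """Run one frame through detect -> generate -> synthesize -> play.

    Returns the warning text that was played, or None if nothing was
    detected with sufficient confidence.
    """
    detection = detect(frame)
    if detection is None or detection.confidence < min_confidence:
        return None  # nothing to announce for this frame
    message = generate_message(detection)
    audio = synthesize(message)
    play(audio)
    return message
```

Because the stages are injected, each one can be replaced by a stub when checking the control flow, and the real model or TTS engine can be swapped without changing this function.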
[0564] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0565] This invention combines a system that analyzes video data from a monitoring device to detect abnormal behavior with an emotion engine that recognizes user emotions. This system uses a server, terminal, and emotion engine to comprehensively analyze abnormality detection and the user's emotional state, thereby enabling the generation and notification of more accurate warning messages.
[0566] Server
[0567] The server receives video data acquired from monitoring devices in real time and analyzes abnormal behavior using an AI model. Based on the analysis results, if abnormal behavior is detected, it generates a warning message using generation technology. The generated message is adjusted as appropriate depending on the type of abnormal behavior. In addition, the server evaluates the user's emotional state via an emotion engine and adjusts the warning message based on changes in emotion.
[0568] Emotion engine
[0569] The emotion engine analyzes the user's voice and behavioral data to determine their emotional state in real time. This information is sent to the server and used to generate warning messages, ensuring that the most effective messages are delivered to the user.
[0570] Terminal
[0571] The terminal receives warning messages and emotional information sent from the server and presents them to the user. For example, if the user is in a stressful situation, it can output a more empathetic message. Furthermore, based on information from the emotion engine, the terminal provides support tailored to the user's emotions.
[0572] Specific example
[0573] Application examples in unmanned retail stores
[0574] In an unmanned store, if a monitoring device detects abnormal behavior in the store, the server uses generation technology to create an appropriate warning message. For example, if the emotion engine detects that a customer is agitated, the terminal will display a message to the customer saying, "We will assist you shortly, please relax and wait."
[0575] Examples of applications in private residences
[0576] In a private residence, if a surveillance device detects an intruder, the server immediately generates a warning message. Furthermore, if the emotion engine detects the resident's anxiety, the terminal plays a reassuring message such as, "We have contacted the police. Please wait until safety is confirmed," to put the resident at ease.
[0577] This system provides sophisticated interactions that take user emotions into consideration, enabling more effective detection and response to anomalies.
[0578] The following describes the processing flow.
[0579] Step 1:
[0580] The server receives video data from the monitoring device in real time, breaks down the data into frames, and performs preprocessing. This includes noise reduction and resolution adjustment, and converting the data into a format suitable for the AI model.
[0581] Step 2:
[0582] The server analyzes pre-processed frames using an AI model to detect abnormal behavior. It evaluates patterns of abnormal behavior, and if a relevant behavior is detected, it identifies the data and records it in the management system.
[0583] Step 3:
[0584] The server uses an emotion engine to analyze the user's emotional state from their voice and video. The emotion engine identifies emotions such as stress, anxiety, and joy, and generates an emotion rating based on that.
[0585] Step 4:
[0586] Based on the anomaly detection results and the emotion evaluation, the server uses generation technology to customize warning messages. For example, if the user is highly anxious, it adjusts the message to be more reassuring.
[0587] Step 5:
[0588] The server sends the generated warning message and emotional state information to the terminal. This provides real-time warnings while also offering notifications that are sensitive to the user's emotions.
[0589] Step 6:
[0590] The terminal uses the messages received from the server to output warnings as audio or text. If the user is emotionally unstable, a calmer-toned message is played to help stabilize the situation.
[0591] Step 7:
[0592] Users will review information from their devices and take appropriate action if necessary. If the anomaly is serious, they will take measures such as contacting the police and optimize the system through device settings.
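The adjustment in Step 4 can be illustrated with a small rule-based sketch. The disclosure describes using generation technology for this adjustment, so the fixed phrases below are only a simplified stand-in, and the emotion labels, the 0-to-1 intensity scale, the 0.6 threshold, and the name `adjust_warning` are assumptions.

```python
# Reassuring prefixes per emotion label (hypothetical wording).
REASSURANCE = {
    "anxiety": "Please stay calm. ",
    "stress": "We understand this is stressful. ",
}


def adjust_warning(message: str, emotion: str, intensity: float,
                   threshold: float = 0.6) -> str:
    """Prefix a reassuring phrase when the rated emotion is strong enough.

    `intensity` is the emotion rating on an assumed scale of 0.0 to 1.0.
    Emotions without a registered phrase (for example "joy") leave the
    message unchanged.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    prefix = REASSURANCE.get(emotion, "")
    if prefix and intensity >= threshold:
        return prefix + message
    return message
```

In a fuller system the emotion label and intensity would more likely be passed into the prompt of the generative model, with a rule like this serving as a fallback when the model is unavailable.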
[0593] (Example 2)
[0594] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0595] Conventional monitoring systems often fail to provide appropriate messages tailored to the user's emotions and circumstances when detecting abnormal behavior and issuing warnings, resulting in limited effectiveness of the warnings. In particular, in situations where users experience anxiety or stress, uniform warning messages fail to provide sufficient reassurance, making it difficult to effectively ensure user safety.
[0596] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0597] In this invention, the server includes an analysis means for receiving video information from a monitoring device and analyzing the video information to detect abnormal behavior; a generation means for generating warning information based on the abnormal behavior detected by the analysis means; and an adjustment means for adjusting the warning information based on the user's emotions. This makes it possible to generate and present an optimal warning message according to the type of abnormal behavior and the user's emotional state.
[0598] A "monitoring device" is a device that acquires video information from the environment and transmits it to a server.
[0599] "Video information" refers to visual data acquired by the monitoring device and is used to analyze abnormal behavior.
[0600] "Analysis means" refers to algorithms and technologies used to process received video information and identify abnormal behavior.
[0601] A "generation means" refers to a process or system that creates appropriate warning information based on the results of abnormal behavior obtained through analysis means.
[0602] An "adjustment means" is a system that adjusts the generated warning information according to the user's emotional state and presents it in the most suitable format.
[0603] "Display means" refers to devices or methods for presenting adjusted warning information to users.
[0604] "Communication means" refers to technologies and systems for transmitting detected abnormal behavior and related information to information terminals.
[0605] An "information terminal" is a device used to receive information about abnormal behavior via communication means.
[0606] This invention relates to a system that analyzes video information acquired from surveillance equipment to detect abnormal behavior and generates and presents warning messages that take emotional information into consideration. This system is primarily implemented using a server, terminals, and an emotion engine.
[0607] The server receives video information in real time from the monitoring equipment. The video information is transmitted to the server using a streaming protocol. The server analyzes this video information using a deep learning framework (e.g., TensorFlow or PyTorch) to detect objects and recognize movements, thereby identifying abnormal behavior. Based on these results, the server uses a generative AI model to generate appropriate warning messages based on the prompt text. For example, a warning message such as "Intruder is currently entering" might be created based on this prompt.
[0608] The emotion engine analyzes the user's voice data and behavioral information to evaluate their emotional state. This evaluation is performed using voice recognition software and an emotion analysis API. This determines the user's emotional state and provides this information to the server. The server then uses this emotion information to adjust warning messages to the most appropriate form.
[0609] The terminal receives the adjusted warning messages sent from the server and presents them to the user through the screen and audio. For example, if the user is feeling anxious, a reassuring message such as "Please wait until safety is confirmed" is played via audio output. The terminal can also use information from the emotion engine to provide necessary support.
[0610] As a specific example, if a monitoring device in an unmanned store detects a particular abnormal behavior, the server generates a warning message with the prompt, "The monitoring camera has detected a confused customer. Please generate a relaxing message." In this way, the present invention enables the detection of abnormal behavior as well as the provision of effective warning messages tailored to the user's emotions.
[0611] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0612] Step 1:
[0613] The server receives video information from the monitoring equipment in real time. This input is transmitted encrypted as a data stream. The server decodes this data using a streaming protocol and stores it as video frames to prepare for analysis.
[0614] Step 2:
[0615] The server analyzes the received video frames using a deep learning framework (e.g., TensorFlow or PyTorch). The input data is provided to a model that detects objects and recognizes actions frame by frame. If the AI model detects abnormal behavior, it outputs the result as an alert and generates data including the type and location of the abnormal behavior.
[0616] Step 3:
[0617] The server utilizes a generative AI model to generate warning messages based on detected abnormal behavior. In this step, the AI model receives a prompt as input and generates an appropriate message in response to that prompt. For example, a warning message such as "Intruder is currently entering the premises" might be created.
[0618] Step 4:
[0619] The emotion engine collects user voice and behavioral data and evaluates their emotional state in real time. Voice data is converted to text using speech recognition software and input into the emotion analysis API. This outputs the user's emotional parameters, which are then sent to the server.
[0620] Step 5:
[0621] The server optimizes warning messages using emotion parameters sent from the emotion engine. The input for this optimization is the original warning message and the user's emotion parameters. Based on this data, the server adjusts the tone and content of the warning message and outputs the adjusted message.
[0622] Step 6:
[0623] The terminal receives the adjusted warning message from the server and presents it to the user. The input to this process is the adjusted warning message, which the terminal displays on the screen or outputs as audio, depending on the output method. Specifically, depending on the user's status, a message such as "Please wait until safety is confirmed" may be displayed.
[0624] Step 7:
[0625] The device provides necessary support based on information from the emotion engine. The input for this support is the user's emotional state. The device performs actions such as automatically launching a mental health support application to provide additional support as needed by the user.
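As an illustration of Step 5, the adjustment of a warning message by emotion parameters could be sketched as follows. This is a minimal rule-based stand-in, not the method of the disclosure, which leaves the adjustment logic open and may delegate it to a generative AI model. The `EmotionParams` fields, the 0.0 to 1.0 value range, the threshold, and the wording of the prefixes are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionParams:
    """Hypothetical emotion parameters, each assumed to lie in 0.0..1.0."""
    anxiety: float
    anger: float = 0.0


def adjust_warning(message: str, emotion: EmotionParams, threshold: float = 0.6) -> str:
    """Return the warning with a tone chosen from the emotion parameters."""
    if emotion.anxiety >= threshold:
        # Anxious user: lead with reassurance and keep the factual warning.
        return f"Please stay calm. {message} Please wait until safety is confirmed."
    if emotion.anger >= threshold:
        # Irritated user: acknowledge the concern before the warning.
        return f"We understand your concern. {message}"
    # No strong emotion detected: deliver the original warning unchanged.
    return message


if __name__ == "__main__":
    base = "An intruder is currently entering the premises."
    print(adjust_warning(base, EmotionParams(anxiety=0.8)))
```

Keeping the original warning text inside the adjusted message means the factual content is never lost when the tone changes.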
[0626] (Application Example 2)
[0627] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0628] In abnormal behavior detection systems using monitoring devices, conventional methods have the problem of not being able to issue warning messages that take into account the user's emotions. Therefore, warning messages are not optimized according to the user's emotional state, resulting in a lack of reassurance and effective interaction. Furthermore, improving the accuracy of generating appropriate messages according to the type of abnormal behavior is also necessary.
[0629] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0630] In this invention, the server includes a computing means for receiving video signals from a monitoring device and analyzing the video signals to detect abnormal behavior; a generating means for generating warning information based on the abnormal behavior detected by the computing means; and an emotion recognition means for recognizing the user's emotional state in real time. As a result, warning messages are adjusted based on the user's emotional information, enabling more effective and reassuring interactions.
[0631] A "monitoring device" is a device that collects video signals and transmits them to a computing device for analysis.
[0632] "Video signal" refers to a series of video data captured by a monitoring device, and is used for analyzing abnormal behavior.
[0633] "Computing means" refers to hardware or software that analyzes video signals and performs processing to detect abnormal behavior.
[0634] "Abnormal behavior" refers to actions that deviate from normal behavioral patterns and are judged by the system as actions that require vigilance or attention.
[0635] "Warning information" refers to notifications, cautionary messages, or signals generated when abnormal behavior is detected.
[0636] "Generation means" refers to a function that executes a process to create warning information based on the detection results of abnormal behavior.
[0637] "Emotion recognition means" refers to methods for analyzing and acquiring a user's emotional state in real time.
[0638] "Communication means" refers to a network or interface for transmitting detection results and user emotion information to an external control device or similar.
[0639] The "adjustment mechanism" is a function that modifies warning information to the optimal content based on the acquired emotional information.
[0640] This invention relates to a system in which a server receives video signals from a monitoring device and analyzes abnormal behavior using computational means. The server analyzes the video signals in real time using image processing libraries such as OpenCV, and when it detects abnormal behavior, it generates warning information using generative AI technology. This generated warning information is adjusted to take into account the user's emotional state.
[0641] To recognize the user's emotional state, an emotion recognition system is used. This system uses libraries such as the EmotionRecognition library to estimate emotions in real time from the user's voice and behavioral data. The emotion information is sent to the server and used when creating prompts for the generative AI model.
[0642] As a concrete example, suppose the system is implemented in an apartment management setting, and a suspicious person is detected by the monitoring device. If the system detects that a resident is feeling anxious, the server generates a message saying, "The area is safe, please rest assured." This message is individually tailored to alleviate the resident's anxiety.
[0643] Examples of prompts for a generative AI model include:
[0644] "An intruder has been detected. Please generate a message to reassure the residents and alleviate their anxiety."
[0645] In this way, the system can provide effective warning information tailored to the user's situation and emotional state, creating a reassuring environment.
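One way to assemble such a prompt from a detected event and an estimated emotion is sketched below. The emotion labels, the mapping table, and the function name `build_prompt` are hypothetical; only the two instruction sentences are taken from the prompt examples in this description.

```python
from typing import Optional

# Hypothetical mapping from an estimated emotion label to an instruction
# for the generative AI model. The labels are assumptions for this sketch.
EMOTION_INSTRUCTIONS = {
    "anxious": "Please generate a message to reassure the residents and alleviate their anxiety.",
    "confused": "Please generate a relaxing message.",
}
DEFAULT_INSTRUCTION = "Please generate a warning message appropriate to the situation."


def build_prompt(event_description: str, emotion: Optional[str] = None) -> str:
    """Combine the detection result with an emotion-dependent instruction."""
    instruction = EMOTION_INSTRUCTIONS.get(emotion or "", DEFAULT_INSTRUCTION)
    return f"{event_description} {instruction}"


if __name__ == "__main__":
    print(build_prompt("An intruder has been detected.", "anxious"))
```

Falling back to a default instruction keeps the prompt usable when no emotion estimate is available.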
[0646] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0647] Step 1:
[0648] The server receives video signals from the monitoring device. The input is a video signal, and the output is analyzable image data. Image processing libraries such as OpenCV are used to decompose the video signal into frames, preparing it for analysis.
[0649] Step 2:
[0650] The server uses computational tools to analyze video frames to determine the presence or absence of abnormal behavior. The input is frame data, and the output is the detection result of abnormal behavior. An AI model is used to compare and analyze the abnormal behavior with normal behavior patterns to determine if abnormal behavior exists.
[0651] Step 3:
[0652] If abnormal behavior is detected, the server generates warning information using a generative AI model. The input is the result of the detected abnormal behavior, and the output is the generated warning message. The generative AI model creates a valid prompt sentence according to the nature of the abnormality and constructs the warning message using natural language processing techniques.
[0653] Step 4:
[0654] The server analyzes the user's emotional state in real time using emotion recognition technology. Input is user voice and behavioral data, and output is estimated emotion data. The EmotionRecognition library is used to identify emotions from factors such as voice tone and facial expressions.
[0655] Step 5:
[0656] The server adjusts warning messages based on the estimated emotion data. The input is the warning message and the emotion data, and the output is the adjusted, personalized message. By changing the tone and content of the message, for example, to softer language, the server reduces user anxiety.
[0657] Step 6:
[0658] The device presents the adjusted warning message to the user. The input is the final message sent to the device, and the output is the information provided to the user. The device communicates the message to the user and provides reassurance through screen display and audio output.
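The comparison with normal behavior patterns in Step 2 can be illustrated with a simple statistical check. The sketch below flags an observation whose z-score against previously recorded normal samples exceeds a threshold. It is a deliberately simple stand-in for the AI model, and the scalar feature (for example, a per-window motion score) and the threshold of 3.0 are assumptions, not details from the disclosure.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(normal_samples: Sequence[float], observed: float,
                z_threshold: float = 3.0) -> bool:
    """Flag `observed` if it deviates strongly from the normal samples.

    `normal_samples` needs at least two values so that a sample
    standard deviation can be computed.
    """
    mu = mean(normal_samples)
    sigma = stdev(normal_samples)
    if sigma == 0:
        # All normal samples are identical: any difference is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


if __name__ == "__main__":
    normal_motion = [1.0, 1.2, 0.9, 1.1, 1.0]  # hypothetical motion scores
    print(is_abnormal(normal_motion, 1.1))  # close to the normal pattern
    print(is_abnormal(normal_motion, 3.0))  # far from the normal pattern
```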
[0659] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0660] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0661] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0662] [Fourth Embodiment]
[0663] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0664] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0665] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0666] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0667] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0668] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0669] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0670] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0671] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0672] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0673] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0674] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0675] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0676] To implement the present invention, a system is constructed that includes a monitoring device, a server, a terminal, and a user as its main components. The specific functions of each component and their embodiments are described below.
[0677] Server
[0678] The server receives video data transmitted in real time from the monitoring device and analyzes this data using an AI model. First, the video data is broken down into frames, and each frame is preprocessed. This preprocessing includes noise reduction and resolution adjustment. Next, the preprocessed frames are analyzed by artificial intelligence for the presence or absence of abnormal behavior, and suspicious individuals are identified by comparing them with a registered database.
[0679] Generation technology
[0680] When abnormal behavior or suspicious individuals are detected, the server automatically generates appropriate warning messages using generation technology. Depending on the type of abnormal behavior, it adjusts the content and format of the message to create warning messages or interactive communications.
[0681] Terminal
[0682] The terminal receives warning messages and anomaly detection notifications from the server and displays them to the administrator. Received warning messages are output in real time as audio or text to alert the administrator and those nearby. Furthermore, the terminal notifies the police and other relevant agencies as needed.
[0683] User
[0684] The user assesses the situation based on the information provided by the device and responds as needed. If an anomaly is detected, the user checks the situation on site and takes action, such as contacting the police. The user can also customize the system settings to suit their individual needs, including, for example, the type and frequency of notifications and the content of the warning messages generated.
[0685] Specific example
[0686] Examples in unmanned stores
[0687] An unmanned store has installed monitoring equipment, which is constantly monitored by a server. If a customer enters the store wearing a helmet, the server immediately detects this and generates a warning message using generation technology: "Excuse me, could you please remove your helmet?" This message is played through a speaker on the terminal, prompting the customer to remove their helmet.
[0688] Examples in private residences
[0689] When a surveillance device installed in the parking lot of a private residence detects an intruder, the server immediately analyzes the information and generates sounds and voices to make it appear as if the resident is home. These generated voices are played from a terminal, exerting a psychological deterrent effect on the intruder and preventing unauthorized entry.
[0690] Thus, the present invention provides a system that can detect anomalies in real time and immediately suggest a response by making full use of advanced AI-based analysis and generation technologies.
[0691] The following describes the processing flow.
[0692] Step 1:
[0693] The server receives video data in real time from the monitoring device. The video data in each frame is preprocessed to a state suitable for analysis by removing noise and adjusting the resolution.
[0694] Step 2:
[0695] The server inputs pre-processed video frames into an AI model and performs analysis to detect abnormal behavior. The AI model evaluates the abnormality of specific behavioral patterns and records the data if it determines that a behavior is abnormal.
[0696] Step 3:
[0697] Based on the analysis results from the previous step, the server uses a facial recognition algorithm to identify individuals in the video and compares them with a database to determine if they are suspicious. If a suspicious person is identified, the server reports this information to the management system.
[0698] Step 4:
[0699] When abnormal behavior or a suspicious individual is detected, the server uses generation technology to generate a situation-appropriate warning message or dialogue message. This is done using generative AI to create messages in natural language.
[0700] Step 5:
[0701] The server sends the generated warning message to the terminal. At the same time, it notifies the terminal of any detected abnormal behavior or suspicious individuals, preparing for necessary countermeasures to be taken.
[0702] Step 6:
[0703] The terminal outputs the warning message received from the server, alerting administrators and those nearby through audio or screen display. If direct action is required, it notifies the appropriate authorities.
[0704] Step 7:
[0705] Users review the information provided by their devices and decide on a course of action based on the actual situation. This may include going to the scene or contacting the police as needed. To maximize the system's effectiveness, users regularly optimize settings and parameters.
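The database comparison in Step 3 is commonly done by comparing feature vectors. The sketch below matches a face embedding against registered embeddings by cosine similarity. The disclosure does not specify the facial recognition algorithm, so the embedding representation, the identifiers such as `watchlist-001`, and the 0.9 threshold are assumptions for illustration only.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_registered_person(
    embedding: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (person_id, score) of the best match at or above threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for person_id, reference in database.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return (best_id, best_score) if best_id is not None else None


if __name__ == "__main__":
    registered = {
        "watchlist-001": [1.0, 0.0, 0.0],  # hypothetical registered embeddings
        "watchlist-002": [0.0, 1.0, 0.0],
    }
    print(match_registered_person([0.9, 0.1, 0.0], registered))
```

Returning `None` when no score reaches the threshold lets the caller treat an unmatched face as not registered rather than forcing a nearest match.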
[0706] (Example 1)
[0707] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0708] In modern society, the use of monitoring devices is increasing, but conventional systems have limitations in terms of human monitoring and real-time response. Therefore, there is a need for rapid and automatic detection and response to abnormal behavior. This invention aims to solve these problems by effectively detecting anomalies and providing appropriate warnings.
[0709] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0710] In this invention, the server includes an information processing means that receives time-series data from a monitoring device and analyzes the data to determine anomalies; a generation means that generates notification information based on the anomalies determined by the information processing means; and a display means that outputs the notification information. This enables rapid detection of anomalies and the transmission of appropriate information.
[0711] A "monitoring device" is an electronic device used to detect and record environmental conditions in real time.
[0712] "Time-series data" refers to a series of data acquired based on specific time intervals, allowing for analysis along a time axis.
[0713] "Information processing means" refers to a method or apparatus for analyzing received data and performing calculations and judgments to determine whether or not there are any abnormalities.
[0714] "Generation means" refers to a method or apparatus for generating notification information based on data obtained from information processing means.
[0715] "Notification information" refers to information containing messages about abnormalities or events requiring attention, and is generated to encourage prompt action.
[0716] "Display means" refers to a device or method for outputting generated notification information visually or audibly.
[0717] "Transmission means" refers to a method or device equipped with communication functions for transmitting warning or abnormal information to a specific terminal or device.
[0718] An "operating terminal" is an electronic device used to receive notifications and information transmitted by the monitoring system, and it provides an interface with the user.
[0719] A "feature" is a unique attribute or element used to identify a particular object or person.
[0720] The present invention is implemented by a system including a monitoring device, a server, a terminal, and a user. This system can automatically detect and notify of anomalies based on time-series data.
[0721] Server
[0722] The server first receives time-series data from the monitoring device. A common protocol is used for this reception, and programming languages such as Python and JavaScript are often used for processing. The server preprocesses the data using libraries such as OpenCV and NumPy to identify anomalies. At this stage, machine learning frameworks such as TensorFlow and PyTorch are used to analyze abnormal behavior.
[0723] Terminal
[0724] The device receives notification information sent from the server. The received information is output as audio using Text-to-Speech (TTS) technology. The device also visualizes the notification in an easy-to-understand way for the user using JavaScript or React. The device reports to relevant organizations as needed.
[0725] User
[0726] Users can check the situation on-site based on notification information provided by their devices and take appropriate action. For example, after receiving a warning, a user can immediately go to the site and take specific measures. Users can also configure the system's operation according to their individual needs. This includes features such as adjusting the type and frequency of notifications.
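User-configurable notification type and frequency could be represented as in the following sketch. The class name, the event type strings, and the minimum-interval policy are hypothetical; the description only states that the type and frequency of notifications are adjustable.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NotificationSettings:
    """Hypothetical per-user settings: which events notify, and how often."""
    enabled_types: Set[str]
    min_interval_s: float
    _last_sent: Dict[str, float] = field(default_factory=dict)

    def should_notify(self, event_type: str, now_s: float) -> bool:
        """Return True if a notification of this type may be sent now."""
        if event_type not in self.enabled_types:
            return False  # the user has disabled this notification type
        last = self._last_sent.get(event_type)
        if last is not None and now_s - last < self.min_interval_s:
            return False  # too soon after the previous notification
        self._last_sent[event_type] = now_s
        return True


if __name__ == "__main__":
    settings = NotificationSettings(enabled_types={"intruder"}, min_interval_s=60.0)
    for t in (0.0, 30.0, 61.0):
        print(t, settings.should_notify("intruder", t))
```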
[0727] Specific example
[0728] Examples in unmanned stores
[0729] A monitoring device installed in an unmanned store captures a customer entering the store wearing a helmet. The server detects this and generates a warning message, "Please remove your helmet," using a "generative AI model." The terminal plays this message through its speaker, prompting the customer to take action.
[0730] Examples in private residences
[0731] If a suspicious person is detected in a private residence, the server generates sounds and voices to simulate the presence of a resident. These generated voices are played from a terminal, creating psychological pressure on the intruder and preventing them from entering.
[0732] Example of a prompt
[0733] "If there is a suspicious person in the parking lot, please generate audio that makes it sound as if a resident is at home."
[0734] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0735] Step 1:
[0736] The server receives time-series data from the monitoring device. It also receives real-time video data transmitted from the monitoring device as input. This data is transferred using the Real-Time Streaming Protocol (RTSP). The received video data is then prepared to be broken down into individual frames.
[0737] Step 2:
[0738] The server preprocesses the received video data. It receives the video frames decomposed in step 1 as input. During this process, it uses the "OpenCV" library to perform noise reduction and resolution adjustment. Specifically, it uses Gaussian blur to reduce noise and improve analysis accuracy. The output is a preprocessed, clean video frame.
[0739] Step 3:
[0740] The server analyzes abnormal behavior using preprocessed data. It receives the preprocessed frames obtained in step 2 as input. Using machine learning libraries such as "TensorFlow," an AI model identifies abnormal behavior. The analysis yields an anomaly detection result as output. This result includes the behavioral characteristics of the suspicious person and a determination of whether that behavior is abnormal or not.
[0741] Step 4:
[0742] The server generates notification information based on the detection results. The anomaly detection results obtained in step 3 are used as input. Utilizing the "Generative AI Model," the AI generates a situation-appropriate warning message based on the prompt text. For example, if suspicious activity is detected, a message such as "A suspicious person has been detected. Please take action." is generated. The generated warning message is obtained as output.
[0743] Step 5:
[0744] The terminal receives and displays the generated warning message. It receives the warning message sent from the server in step 4 as input. The terminal uses Text-to-Speech (TTS) technology to output the message as audio. It also displays a text notification on the screen. The warning content is communicated to the user both audibly and visually as output.
[0745] Step 6:
[0746] The user makes decisions based on notifications from their device. The warning message received in step 5 is used as input. The user takes appropriate action depending on the situation, such as rushing to the scene or notifying the relevant authorities. The output is that safety is ensured and actions are taken to resolve the problem.
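The Gaussian blur used for noise reduction in Step 2 can be illustrated without any external library. The sketch below applies a separable Gaussian kernel to a grayscale frame held as a list of rows. In practice OpenCV's `cv2.GaussianBlur` would perform this step; the pure-Python version here is a simplified stand-in, and its edge handling (replicating border pixels), kernel radius, and sigma are assumptions chosen for the example.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(image: Frame, kernel: List[float]) -> Frame:
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out: Frame = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)  # clamp at border
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def gaussian_blur(image: Frame, radius: int = 1, sigma: float = 1.0) -> Frame:
    """Separable blur: filter the rows, then the columns via a transpose."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = _blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = _blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]


if __name__ == "__main__":
    noisy = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    for row in gaussian_blur(noisy):
        print([round(v, 3) for v in row])
```

Because the kernel is normalized, a uniform region keeps its intensity while an isolated noisy pixel is spread over its neighbors.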
[0747] (Application Example 1)
[0748] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0749] Conventional security systems have struggled to effectively detect abnormal behavior and suspicious individuals, and to respond quickly. In particular, there is a need for highly effective deterrents to prevent intruders from entering homes. Furthermore, if the types of alarms and voice messages are not appropriately generated, the deterrent effect against suspicious individuals is diminished.
[0750] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0751] In this invention, the server includes data processing means for receiving video information from a monitoring system and analyzing the video information to detect abnormal behavior; information generation means for generating a warning message based on the abnormal behavior detected by the data processing means; and sound generation means for converting the generated warning message into an audio signal and outputting the audio signal to a public address system in real time. This makes it possible to effectively deter suspicious individuals within a residence and to respond quickly.
[0752] A "monitoring system" refers to any device that acquires video information and detects suspicious individuals or abnormal behavior.
[0753] "Video information" refers to visual data acquired by the monitoring system, and specifically to digital video data that is subject to analysis.
[0754] "Data processing means" refers to software or hardware used to analyze acquired video information and detect abnormal behavior or suspicious individuals.
[0755] "Information generation means" refers to processes and technologies for creating appropriate warning messages based on detected abnormal behavior.
[0756] "Signal output means" refers to a device or means that has the function of outputting the generated warning message as an audio signal or visual signal to communicate it to the outside.
[0757] "Information and communication means" refers to communication technology used to transmit data via a network to management information terminals and other devices for reporting status.
[0758] "Sound generation means" refers to the technology and equipment used to generate a message created by an information generation means as an audio signal and output it to a public address system.
[0759] A "public address system" refers to a device used to amplify the generated audio signals in real time and to project the sound within a designated space.
[0760] In order to implement this invention, it is necessary to construct a network system that includes a monitoring system, a data processing server, an information generation device, an acoustic public address system, and a management information terminal.
[0761] The server receives video information transmitted in real time from the monitoring system. The program uses Python, TensorFlow, and OpenCV to decompose the video information frame by frame and perform preprocessing such as noise reduction and resolution adjustment. Next, data processing tools analyze these frames to detect abnormal behavior and suspicious individuals. An AI model is used for this analysis, identifying anomalies more quickly and accurately than conventional security systems. The generation technology utilizes a generative AI model (e.g., OpenAI's GPT) to generate warning messages corresponding to abnormal behavior.
[0762] The terminal receives the generated warning message and generates an audio signal via the sound generation means. This audio signal is transmitted to the public address system via a Bluetooth or Wi-Fi connection and played back in real time. This can exert a psychological deterrent effect on suspicious individuals.
[0763] Users can review the information they receive through the management terminal and adjust the settings of the sound system if necessary. Throughout this entire process, the system provides quick and effective security measures within the residence.
[0764] As a concrete example, consider a scenario where an intruder enters a residential yard. The AI model instantly detects the intruder and generates a warning message: "Is anyone there? If you are, please leave immediately." This message is quickly played through the public address system, effectively deterring further intrusion.
[0765] Examples of prompts to send to a generative AI model:
[0766] "A suspicious person has been detected. Please generate a warning message that says, 'Is anyone there? If you are, please leave immediately.'"
[0767] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0768] Step 1:
[0769] The server receives video information from the surveillance system in real time. The input is streamed video data from the surveillance camera. The server breaks down this data into frames and performs preprocessing, such as noise reduction and resolution adjustment, using OpenCV. The output is preprocessed, clear video frame data.
[0770] Step 2:
[0771] The server inputs pre-processed video frames into an AI model to analyze for abnormal behavior and the presence of suspicious individuals. The input is pre-processed video frame data. The AI model uses TensorFlow to analyze abnormal behavior and determine the presence of suspicious individuals. This process outputs a judgment as an analysis result, indicating whether or not abnormal behavior was detected.
[0772] Step 3:
[0773] The server uses a generative AI model to generate a warning message based on the abnormal behavior detected by the AI model. The input is the analysis result from the AI model. The generative AI model (e.g., OpenAI's GPT) generates an appropriate warning message based on the prompt text. The output is the generated warning message in text format.
[0774] Step 4:
[0775] The terminal receives a warning message from the server, converts it into speech, and sends it to the sound amplifier. The input is the text data of the generated warning message. The terminal generates the speech signal using Text-to-Speech (TTS) technology. The generated speech signal is output to the sound amplifier via Bluetooth or Wi-Fi.
[0776] Step 5:
[0777] The user checks for suspicious person notifications on the management information terminal and monitors the operation status of the sound amplification system. Inputs include warning notification information and audio output status from the terminal. The user adjusts system settings and takes additional measures as needed. Outputs are information about adjustments and countermeasures taken by the user.
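The steps above can be read as one processing pass per preprocessed frame. The following is a minimal sketch under that reading, not the disclosed implementation. The function handle_frame and the four injected callables (detect, generate, speak, notify) are hypothetical stand-ins for the AI model, the generative AI model, the TTS output on the terminal, and the notification to the management information terminal.

```python
from typing import Any, Callable, Optional


def handle_frame(
    frame: Any,
    detect: Callable[[Any], Optional[str]],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
    notify: Callable[[str, str], None],
) -> Optional[str]:
    """Run Steps 2 to 5 for one frame; return the warning message, if any."""
    behavior = detect(frame)        # Step 2: None means nothing abnormal
    if behavior is None:
        return None
    message = generate(behavior)    # Step 3: warning text
    speak(message)                  # Step 4: audio output via the terminal
    notify(behavior, message)       # Step 5: management information terminal
    return message


# Usage with trivial stand-ins for each stage.
spoken, notified = [], []
result = handle_frame(
    frame="frame-0",
    detect=lambda f: "intrusion",
    generate=lambda b: f"Warning: {b} detected. Please leave immediately.",
    speak=spoken.append,
    notify=lambda b, m: notified.append((b, m)),
)
```

Passing the stages in as callables keeps the flow independent of any particular model or speech library.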
[0778] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0779] This invention combines a system that analyzes video data from a monitoring device to detect abnormal behavior with an emotion engine that recognizes user emotions. This system uses a server, terminal, and emotion engine to comprehensively analyze abnormality detection and the user's emotional state, thereby enabling the generation and notification of more accurate warning messages.
[0780] Server
[0781] The server receives video data acquired from monitoring devices in real time and analyzes abnormal behavior using an AI model. If the analysis detects abnormal behavior, the server generates a warning message using generative AI and adjusts the message according to the type of abnormal behavior. In addition, the server evaluates the user's emotional state via the emotion engine and adjusts the warning message in response to changes in emotion.
[0782] Emotion engine
[0783] The emotion engine analyzes the user's voice and behavioral data to determine the user's emotional state in real time. This information is sent to the server and used to generate warning messages, so that the message delivered to the user suits that state.
[0784] Terminal
[0785] The terminal receives the warning messages and emotion information sent from the server and presents them to the user. For example, if the user is in a stressful situation, the terminal can output a more empathetic message. Furthermore, based on information from the emotion engine, the terminal provides support tailored to the user's emotions.
[0786] Specific example
[0787] Application examples in unmanned retail stores
[0788] In an unmanned store, if a monitoring device detects abnormal behavior in the store, the server uses generative AI to create an appropriate warning message. For example, if the emotion engine detects that a customer is agitated, the terminal displays a message to the customer such as, "We will assist you shortly. Please relax and wait."
[0789] Examples of applications in private residences
[0790] In a private residence, if a monitoring device detects an intruder, the server immediately generates a warning message. Furthermore, if the emotion engine detects that the resident is anxious, the terminal plays a reassuring message such as, "We have contacted the police. Please wait until safety is confirmed," to put the resident at ease.
[0791] This system provides sophisticated interactions that take user emotions into consideration, enabling more effective detection and response to anomalies.
[0792] The following describes the processing flow.
[0793] Step 1:
[0794] The server receives video data from the monitoring device in real time, breaks down the data into frames, and performs preprocessing. This includes noise reduction and resolution adjustment, and converting the data into a format suitable for the AI model.
[0795] Step 2:
[0796] The server analyzes pre-processed frames using an AI model to detect abnormal behavior. It evaluates patterns of abnormal behavior, and if a relevant behavior is detected, it identifies the data and records it in the management system.
[0797] Step 3:
[0798] The server uses an emotion engine to analyze the user's emotional state from their voice and video. The emotion engine identifies emotions such as stress, anxiety, and joy, and generates an emotion rating based on that.
[0799] Step 4:
[0800] Based on the anomaly detection result and the emotion rating, the server uses generative AI to customize the warning message. For example, if the user is highly anxious, the server generates a reassuring message.
[0801] Step 5:
[0802] The server sends the generated warning message and emotional state information to the terminal. This provides real-time warnings while also offering notifications that are sensitive to the user's emotions.
[0803] Step 6:
[0804] The terminal outputs the warning as audio or text using the message received from the server. If the user is emotionally unstable, a calmer-toned message is played to help stabilize the situation.
[0805] Step 7:
[0806] The user reviews the information on the terminal and takes appropriate action as necessary. If the anomaly is serious, the user takes measures such as contacting the police, and tunes the system through the terminal settings.
[0807] (Example 2)
[0808] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0809] Conventional monitoring systems often fail to provide appropriate messages tailored to the user's emotions and circumstances when detecting abnormal behavior and issuing warnings, resulting in limited effectiveness of the warnings. In particular, in situations where users experience anxiety or stress, uniform warning messages fail to provide sufficient reassurance, making it difficult to effectively ensure user safety.
[0810] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0811] In this invention, the server includes an analysis means for receiving video information from a monitoring device and analyzing the video information to detect abnormal behavior; a generation means for generating warning information based on the abnormal behavior detected by the analysis means; and an adjustment means for adjusting the warning information based on the user's emotions. This makes it possible to generate and present an optimal warning message according to the type of abnormal behavior and the user's emotional state.
[0812] A "monitoring device" is a device that acquires video information from the environment and transmits it to a server.
[0813] "Video information" refers to visual data acquired by the monitoring device and is used to analyze abnormal behavior.
[0814] "Analysis means" refers to algorithms and technologies used to process received video information and identify abnormal behavior.
[0815] A "generation means" refers to a process or system that creates appropriate warning information based on the results of abnormal behavior obtained through analysis means.
[0816] An "adjustment means" is a system that adjusts the generated warning information according to the user's emotional state and presents it in the most suitable format.
[0817] "Display means" refers to devices or methods for presenting adjusted warning information to users.
[0818] "Communication means" refers to technologies and systems for transmitting detected abnormal behavior and related information to information terminals.
[0819] An "information terminal" is a device used to receive information about abnormal behavior via communication means.
[0820] This invention relates to a system that analyzes video information acquired from surveillance equipment to detect abnormal behavior and generates and presents warning messages that take emotional information into consideration. This system is primarily implemented using a server, terminals, and an emotion engine.
[0821] The server receives video information in real time from the monitoring equipment. The video information is transmitted to the server using a streaming protocol. The server analyzes this video information using a deep learning framework (e.g., TensorFlow or PyTorch) to detect objects and recognize movements, thereby identifying abnormal behavior. Based on these results, the server uses a generative AI model to generate appropriate warning messages based on the prompt text. For example, a warning message such as "Intruder is currently entering" might be created based on this prompt.
[0822] The emotion engine analyzes the user's voice data and behavioral information to evaluate their emotional state. This evaluation is performed using voice recognition software and an emotion analysis API. This determines the user's emotional state and provides this information to the server. The server then uses this emotion information to adjust warning messages to the most appropriate form.
[0823] The terminal receives the adjusted warning messages sent from the server and presents them to the user through the screen and audio. For example, if the user is feeling anxious, a reassuring message such as "Please wait until safety is confirmed" is played via audio output. The terminal can also use information from the emotion engine to provide necessary support.
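The adjustment described above, in which emotion information is used to bring a warning message into a more suitable form, can be illustrated with a small rule-based sketch. This is a simplified stand-in and not the disclosed adjustment means: the parameter names "anxiety" and "stress", the 0 to 1 scale, the threshold of 0.6, and the function name adjust_warning are all assumptions made for illustration.

```python
from typing import Dict


def adjust_warning(message: str, emotion: Dict[str, float],
                   threshold: float = 0.6) -> str:
    """Adjust the tone of a warning using emotion parameters in [0, 1]."""
    anxiety = emotion.get("anxiety", 0.0)
    stress = emotion.get("stress", 0.0)
    if max(anxiety, stress) >= threshold:
        # Softer framing plus reassurance for an anxious or stressed user.
        return f"Please stay calm. {message} Please wait until safety is confirmed."
    return message


calm = adjust_warning("An intruder has been detected.", {"anxiety": 0.1})
anxious = adjust_warning("An intruder has been detected.", {"anxiety": 0.9})
```

A generative AI model could perform the same adjustment when the emotion parameters are included in its prompt; the fixed rule here only makes the input and output of the step concrete.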
[0824] As a specific example, if a monitoring device in an unmanned store detects a particular abnormal behavior, the server generates a warning message with the prompt, "The monitoring camera has detected a confused customer. Please generate a relaxing message." In this way, the present invention enables the detection of abnormal behavior as well as the provision of effective warning messages tailored to the user's emotions.
[0825] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0826] Step 1:
[0827] The server receives video information from the monitoring equipment in real time. This input is transmitted encrypted as a data stream. The server decodes this data using a streaming protocol and stores it as video frames to prepare for analysis.
[0828] Step 2:
[0829] The server analyzes the received video frames using a deep learning framework (e.g., TensorFlow or PyTorch). The input data is provided to a model that detects objects and recognizes actions frame by frame. If the AI model detects abnormal behavior, it outputs the result as an alert and generates data including the type and location of the abnormal behavior.
[0830] Step 3:
[0831] The server utilizes a generative AI model to generate warning messages based on detected abnormal behavior. In this step, the AI model receives a prompt as input and generates an appropriate message in response to that prompt. For example, a warning message such as "Intruder is currently entering the premises" might be created.
[0832] Step 4:
[0833] The emotion engine collects user voice and behavioral data and evaluates their emotional state in real time. Voice data is converted to text using speech recognition software and input into the emotion analysis API. This outputs the user's emotional parameters, which are then sent to the server.
[0834] Step 5:
[0835] The server optimizes warning messages using emotion parameters sent from the emotion engine. The input for this optimization is the original warning message and the user's emotion parameters. Based on this data, the server adjusts the tone and content of the warning message and outputs the adjusted message.
[0836] Step 6:
[0837] The terminal receives the adjusted warning message from the server and presents it to the user. The input to this process is the adjusted warning message, which the terminal displays on the screen or outputs as audio, depending on the output method. Specifically, depending on the user's state, a message such as "Please wait until safety is confirmed" may be presented.
[0838] Step 7:
[0839] The terminal provides necessary support based on information from the emotion engine. The input for this support is the user's emotional state. As needed by the user, the terminal performs actions such as automatically launching a mental health support application.
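Step 2 above outputs data including the type and location of the abnormal behavior. One possible shape for that data is sketched below. The class name AbnormalBehaviorAlert, the representation of location as a pixel bounding box, and the frame_index field are assumptions for illustration; the disclosure does not specify a format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class AbnormalBehaviorAlert:
    behavior_type: str                   # e.g. "intrusion" (label set is assumed)
    location: Tuple[int, int, int, int]  # bounding box: x, y, width, height in pixels
    frame_index: int                     # frame in which the behavior was detected

    def to_json(self) -> str:
        """Serialize the alert for hand-off to the message generation step."""
        return json.dumps(asdict(self), sort_keys=True)


alert = AbnormalBehaviorAlert("intrusion", (120, 80, 64, 128), frame_index=42)
payload = alert.to_json()
```

A serialized alert of this kind could serve as the input from which the prompt in Step 3 is built.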
[0840] (Application Example 2)
[0841] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0842] In abnormal behavior detection systems using monitoring devices, conventional methods have the problem of not being able to issue warning messages that take into account the user's emotions. Therefore, warning messages are not optimized according to the user's emotional state, resulting in a lack of reassurance and effective interaction. Furthermore, improving the accuracy of generating appropriate messages according to the type of abnormal behavior is also necessary.
[0843] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0844] In this invention, the server includes a computing means for receiving video signals from a monitoring device and analyzing the video signals to detect abnormal behavior; a generating means for generating warning information based on the abnormal behavior detected by the computing means; and an emotion recognition means for recognizing the user's emotional state in real time. As a result, warning messages are adjusted based on the user's emotional information, enabling more effective and reassuring interactions.
[0845] A "surveillance device" is a device that collects video signals and transmits those videos to a computing device for analysis.
[0846] "Video signal" refers to a series of video data captured by a monitoring device, and is used for analyzing abnormal behavior.
[0847] "Computation means" refers to hardware or software that analyzes video signals and performs processing to detect abnormal behavior.
[0848] "Abnormal behavior" refers to actions that deviate from normal behavioral patterns and are judged by the system as actions that require vigilance or attention.
[0849] "Warning information" refers to notifications, cautionary messages, or signals generated when abnormal behavior is detected.
[0850] "Generation means" refers to a function that executes a process to create warning information based on the detection results of abnormal behavior.
[0851] "Emotion recognition means" refers to methods for analyzing and acquiring a user's emotional state in real time.
[0852] "Communication means" refers to a network or interface for transmitting detection results and user emotion information to an external control device or similar.
[0853] "Adjustment means" refers to a function that modifies warning information to the most suitable content based on the acquired emotion information.
[0854] This invention relates to a system in which a server receives video signals from a monitoring device and analyzes abnormal behavior using computational means. The server analyzes the video signals in real time using image processing libraries such as OpenCV, and when it detects abnormal behavior, it generates warning information using generative AI technology. This generated warning information is adjusted to take into account the user's emotional state.
[0855] To recognize the user's emotional state, an emotion recognition system is used. This system uses libraries such as the EmotionRecognition library to estimate emotions in real time from the user's voice and behavioral data. The emotion information is sent to the server and used when creating the prompt text for the generative AI model.
[0856] As a concrete example, suppose the system is implemented in an apartment management setting, and a suspicious person is detected by the monitoring device. If the system detects that a resident is feeling anxious, the server generates a message saying, "The area is safe, please rest assured." This message is individually tailored to alleviate the resident's anxiety.
[0857] Examples of prompts for a generative AI model include:
[0858] "An intruder has been detected. Please generate a message to reassure the residents and alleviate their anxiety."
[0859] In this way, the system can provide effective warning information tailored to the user's situation and emotional state, creating a reassuring environment.
[0860] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0861] Step 1:
[0862] The server receives video signals from the monitoring device. The input is a video signal, and the output is analyzable image data. Image processing libraries such as OpenCV are used to decompose the video signal into frames, preparing it for analysis.
[0863] Step 2:
[0864] The server uses computational tools to analyze video frames to determine the presence or absence of abnormal behavior. The input is frame data, and the output is the detection result of abnormal behavior. An AI model is used to compare and analyze the abnormal behavior with normal behavior patterns to determine if abnormal behavior exists.
[0865] Step 3:
[0866] If abnormal behavior is detected, the server generates warning information using a generative AI model. The input is the result of the detected abnormal behavior, and the output is the generated warning message. The generative AI model creates a valid prompt sentence according to the nature of the abnormality and constructs the warning message using natural language processing techniques.
[0867] Step 4:
[0868] The server analyzes the user's emotional state in real time using emotion recognition technology. Input is user voice and behavioral data, and output is estimated emotion data. The EmotionRecognition library is used to identify emotions from factors such as voice tone and facial expressions.
[0869] Step 5:
[0870] The server adjusts the warning message based on the estimated emotion data. The input is the warning message and the emotion data, and the output is the adjusted, personalized message. By changing the tone and content of the message, for example to softer language, the server reduces user anxiety.
[0871] Step 6:
[0872] The terminal presents the adjusted warning message to the user. The input is the final message sent to the terminal, and the output is the information provided to the user. The terminal communicates the message to the user and provides reassurance through screen display and audio output.
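Step 2 above determines abnormal behavior by comparison with normal behavior patterns. As a simplified statistical stand-in for the AI model, and not the disclosed method, the sketch below flags an observation that lies far from a baseline of normal samples. The feature (seconds a person lingers near an entrance), the threshold of three standard deviations, and the function name is_abnormal are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(value: float, normal_samples: Sequence[float],
                z_threshold: float = 3.0) -> bool:
    """Flag a value more than z_threshold standard deviations from normal."""
    mu = mean(normal_samples)
    sigma = stdev(normal_samples)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical baseline: dwell time in seconds observed during normal operation.
normal_dwell = [4.0, 5.0, 6.0, 5.5, 4.5, 5.0]
typical = is_abnormal(5.2, normal_dwell)
loitering = is_abnormal(60.0, normal_dwell)
```

A learned model would replace the single feature and fixed threshold, but the input (frame-derived data) and output (a detection result) stay the same.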
[0873] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0874] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0875] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0876] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0877] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. In the upper and lower directions of the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. Also, the upper side of the concentric circles is where "pleasant" emotions are located, and the lower side is where "unpleasant" emotions are located. In this way, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0878] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal sensation, which gives a calm impression.
[0879] The inner part of the emotion map 400 represents inner states, while the outer part represents actions. Therefore, the further toward the outside of the emotion map 400, the more visible (expressed in actions) the emotions become.
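The layout of the emotion map 400 described above (reaction on the left, situation on the right, pleasant at the top, unpleasant at the bottom, inner states near the center and actions toward the outside) can be expressed as a coordinate lookup. The sketch below is illustrative only. The coordinate convention (origin at the center, x to the right, y upward), the normalization by max_radius, the "neutral" and "center line" labels, and the function name describe_position are assumptions not found in this disclosure.

```python
import math
from typing import Dict


def describe_position(x: float, y: float, max_radius: float = 1.0) -> Dict[str, object]:
    """Describe a point on the map; origin at center, x rightward, y upward."""
    radius = math.hypot(x, y)
    # Clock position: 12 o'clock is straight up, 3 o'clock is to the right.
    angle_from_top = math.degrees(math.atan2(x, y)) % 360
    hour = round(angle_from_top / 30) % 12 or 12
    return {
        "side": "situation" if x > 0 else "reaction" if x < 0 else "center line",
        "valence": "pleasant" if y > 0 else "unpleasant" if y < 0 else "neutral",
        "outwardness": min(radius / max_radius, 1.0),  # 0 = inner, 1 = expressed in action
        "clock": hour,
    }


p = describe_position(0.8, 0.0)
```

Here the point lies on the situation side at the 3 o'clock position, matching the region discussed above.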
[0880] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
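The idea that pleasure and discomfort follow from how far balances such as posture and battery level deviate from the ideal can be illustrated numerically. The following is a minimal sketch under assumed quantities: the balance names, ideal values, tolerances, the [-1, 1] output range, and the function name comfort_score are all hypothetical.

```python
from typing import Dict


def comfort_score(state: Dict[str, float], ideal: Dict[str, float],
                  tolerance: Dict[str, float]) -> float:
    """Return a value in [-1, 1]: positive near the ideal, negative far from it."""
    scores = []
    for name, target in ideal.items():
        deviation = abs(state[name] - target) / tolerance[name]
        scores.append(1.0 - min(deviation, 2.0))  # 1 at the ideal, -1 at 2+ tolerances
    return sum(scores) / len(scores)


ideal = {"tilt_deg": 0.0, "battery": 1.0}
tolerance = {"tilt_deg": 10.0, "battery": 0.5}
upright_charged = comfort_score({"tilt_deg": 0.0, "battery": 1.0}, ideal, tolerance)
tilted_drained = comfort_score({"tilt_deg": 30.0, "battery": 0.0}, ideal, tolerance)
```

A positive score corresponds to the "pleasant" direction and a negative score to the "unpleasant" direction.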
[0881] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0882] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
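Once the neural network has produced an emotion value for each emotion on the map, one simple way to determine the user's emotion is to take the emotion with the largest value. This selection rule is an assumption of the sketch below, as are the example values and the function name dominant_emotion; the disclosure states only that the user's emotion is determined from the emotion values.

```python
from typing import Dict


def dominant_emotion(emotion_values: Dict[str, float]) -> str:
    """Return the emotion with the largest emotion value."""
    return max(emotion_values, key=emotion_values.get)


# Hypothetical network output; emotions that are close on the map have similar values.
values = {"reassured": 0.82, "calm": 0.79, "confident": 0.76, "anxious": 0.05}
user_emotion = dominant_emotion(values)
```

Because neighbouring emotions receive similar values, a downstream step could also use the top few emotions rather than a single label.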
[0883] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0884] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0885] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.
[0886] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0887] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0888] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0889] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0890] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0891] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0892] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0893] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0894] The following is further disclosed regarding the embodiments described above.
[0895] (Claim 1)
[0896] An artificial intelligence means that receives video data from a monitoring device, analyzes the video data to detect abnormal behavior,
[0897] A generation means that generates a warning message based on abnormal behavior detected by the artificial intelligence means,
[0898] An output means for outputting the warning message,
[0899] A communication means for notifying a management terminal of the results of detecting the abnormal behavior,
[0900] A system that includes this.
[0901] (Claim 2)
[0902] The system according to claim 1, characterized in that the generation means generates a dialogue-style message according to the type of abnormal behavior.
[0903] (Claim 3)
[0904] The system according to claim 1, characterized in that when the communication means recognizes the face of a suspicious person that has been registered in advance, it notifies the management terminal of information about the suspicious person.
[0905] "Example 1"
[0906] (Claim 1)
[0907] Information processing means that receives time-series data from a monitoring device, analyzes the data, and determines anomalies,
[0908] A generation means that generates notification information based on an anomaly identified by the information processing means,
[0909] A display means for outputting the notification information,
[0910] A transmission means for transmitting the result of the abnormality detection to the operating terminal,
[0911] A system that includes this.
[0912] (Claim 2)
[0913] The system according to claim 1, characterized in that the generation means generates a dialogue-style notification according to the type of anomaly.
[0914] (Claim 3)
[0915] The system according to claim 1, characterized in that when the transmission means recognizes the characteristics of a suspicious person that have been registered in advance, it transmits information about the suspicious person to the operating terminal.
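Example 1 generalizes the input to time-series data but does not state how anomalies are determined. As one possible illustration only, the sketch below flags samples whose z-score exceeds a threshold. The z-score rule, the function name `find_anomalies`, and the threshold value are assumptions introduced here, not the method of this disclosure.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(values: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Return indices of samples whose absolute z-score exceeds `threshold`.

    Illustrative stand-in for the information processing means; the
    disclosure does not specify the actual anomaly criterion.
    """
    if len(values) < 2:
        return []  # sample standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

For a series such as `[10, 11, 9, 10, 10, 11, 9, 10, 50]`, only the last sample lies far enough from the mean to be flagged, so a notification would be generated for index 8.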
[0916] "Application Example 1"
[0917] (Claim 1)
[0918] A data processing means that receives video information from a monitoring system and analyzes the video information to detect abnormal behavior,
[0919] Information generation means that generates a warning message based on abnormal behavior detected by the data processing means,
[0920] A signal output means for outputting the warning message,
[0921] Information communication means for notifying a management information terminal of the results of detecting the abnormal behavior,
[0922] A sound generation means that generates an audio signal using the warning message and outputs it to a public address system in real time,
[0923] A system comprising the above means.
[0924] (Claim 2)
[0925] The system according to claim 1, characterized in that the information generation means generates a dialogue-style message according to the type of abnormal behavior and exerts a psychological inhibitory effect through the generated audio signal.
[0926] (Claim 3)
[0927] The system according to claim 1, characterized in that when the information communication means recognizes the characteristics of a registered suspicious person, it notifies the management information terminal of the suspicious person's information, and also transmits a warning message to a home information processing device to alert people within the residence.
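The combination of a dialogue-style message selected by behavior type and an audio signal sent to a public address system, as recited in Application Example 1, might be sketched as follows. The template texts, the behavior labels, the class `PublicAddressSystem`, and the function names are hypothetical. In particular, `placeholder_synthesize` is not a speech synthesizer: it only returns the UTF-8 bytes of the message so that the data flow can be shown without assuming any particular text-to-speech library.

```python
from typing import Callable, Dict, List

# Hypothetical dialogue-style templates keyed by type of abnormal behavior.
DIALOGUE_TEMPLATES: Dict[str, str] = {
    "trespassing": "You have entered a restricted area. Please leave now.",
    "loitering": "May I help you? This area is under surveillance.",
}
DEFAULT_TEMPLATE = "Attention: unusual activity has been detected."


def generate_dialogue_message(behavior_type: str) -> str:
    """Pick a message for the behavior type, falling back to a generic one."""
    return DIALOGUE_TEMPLATES.get(behavior_type, DEFAULT_TEMPLATE)


def placeholder_synthesize(text: str) -> bytes:
    """Stand-in for the sound generation means (NOT real speech synthesis)."""
    return text.encode("utf-8")


class PublicAddressSystem:
    """Records the audio it is asked to play, in place of real loudspeakers."""

    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, audio: bytes) -> None:
        self.played.append(audio)


def announce(
    behavior_type: str,
    pa: PublicAddressSystem,
    synthesize: Callable[[str], bytes] = placeholder_synthesize,
) -> str:
    """Generate the message, convert it to audio, and send it to the PA system."""
    message = generate_dialogue_message(behavior_type)
    pa.play(synthesize(message))
    return message
```

Passing the synthesizer as a parameter keeps the sketch independent of any specific speech engine; a real deployment would supply an actual text-to-speech function in its place.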
[0928] "Example 2 of combining an emotion engine"
[0929] (Claim 1)
[0930] An analysis means that receives video information from a monitoring device and analyzes the video information to detect abnormal behavior,
[0931] A generation means that generates warning information based on abnormal behavior detected by the analysis means,
[0932] A means for adjusting the warning information based on the user's emotions,
[0933] A display means for presenting the adjusted warning information,
[0934] A communication means for notifying an information terminal of the results of detecting the abnormal behavior,
[0935] A system comprising the above means.
[0936] (Claim 2)
[0937] The system according to claim 1, characterized in that the generation means generates conversational messages according to the type of abnormal behavior and the emotional state of the user.
[0938] (Claim 3)
[0939] The system according to claim 1, characterized in that when the communication means recognizes the face of a suspicious person that has been registered in advance, it notifies an information terminal of the information of the suspicious person.
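The adjustment of warning information based on the user's emotions, recited in Example 2, could take many forms. The following sketch assumes one simple form: a prefix chosen from a lookup table keyed by an emotion label. The labels, the prefix texts, and the function name `adjust_warning` are assumptions for illustration and are not taken from this disclosure.

```python
from typing import Dict

# Hypothetical mapping from a recognized user emotion to a softening prefix.
EMOTION_PREFIXES: Dict[str, str] = {
    "anxious": "Please stay calm. ",
    "angry": "We understand your concern. ",
    "calm": "",
}


def adjust_warning(warning: str, user_emotion: str) -> str:
    """Return the warning adjusted for the user's emotional state.

    Unknown emotion labels leave the warning unchanged.
    """
    return EMOTION_PREFIXES.get(user_emotion, "") + warning
```

Under these assumptions, `adjust_warning("A suspicious person was detected at the entrance.", "anxious")` yields the same warning preceded by a reassurance, while a calm or unrecognized state leaves the text as generated.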
[0940] "Application example 2 when combining with an emotional engine"
[0941] (Claim 1)
[0942] A computing means that receives a video signal from a monitoring device and analyzes the video signal to detect abnormal behavior,
[0943] A generation means that generates warning information based on abnormal behavior detected by the computing means,
[0944] An output means for outputting the warning information,
[0945] A communication means for notifying a control device of the detection result of the abnormal behavior,
[0946] An emotion recognition means that recognizes the user's emotional state in real time,
[0947] An adjustment means for adjusting the warning information based on emotional information obtained by the emotion recognition means,
[0948] A system comprising the above means.
[0949] (Claim 2)
[0950] The system according to claim 1, characterized in that the generation means generates dialogue-based information according to the type of abnormal behavior and adjusts the content of the information according to the emotional information.
[0951] (Claim 3)
[0952] The system according to claim 1, characterized in that when the communication means recognizes the facial information of a suspicious individual that has been registered in advance, it notifies the control device of the information of the individual.
[Explanation of Symbols]
[0953]
10, 210, 310, 410 Data Processing Systems
12 Data Processing Devices
14 Smart Devices
214 Smart Glasses
314 Headset-Type Terminal
414 Robots
Claims
1. A system comprising:
a data processing means that receives video information from a monitoring system and analyzes the video information to detect abnormal behavior;
an information generation means that generates a warning message based on the abnormal behavior detected by the data processing means;
a signal output means for outputting the warning message;
an information communication means for notifying a management information terminal of the results of detecting the abnormal behavior; and
a sound generation means that generates an audio signal using the warning message and outputs it to a public address system in real time.
2. The system according to claim 1, characterized in that the information generation means generates a dialogue-style message according to the type of abnormal behavior and exerts a psychological inhibitory effect through the generated audio signal.
3. The system according to claim 1, characterized in that when the information communication means recognizes the characteristics of a registered suspicious person, it notifies the management information terminal of the information of the suspicious person, and also transmits a warning message to a home information processing device to notify people within the residence.