system

A system using a wearable video acquisition device for real-time analysis and notification addresses the challenge of safely reporting illegal acts, ensuring efficient and safe public order maintenance.

JP2026100537A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19

Application Information

Patent Timeline

09 Dec 2024: Application
19 Jun 2026: Publication (JP2026100537A)

IPC: H04N7/18; G07C9/29; G08B25/08; G08B25/04; G08B29/18; G08B23/00; G08B29/10; G07C11/00; G08B21/02; G08B25/10; G08B25/06; G08B29/16; G08B26/00; G08B25/14

AI Tagging

Application Domain

Checking apparatus; Closed circuit television systems

Technology Topics

Engineering; Data mining

AI Technical Summary

Technical Problem

There is a need for a safer and less conspicuous method to report illegal or nuisance acts, as conventional handheld devices for evidence capture can be risky and inefficient, and there are limited means to promptly alert legal authorities.

Method used

A system using a video acquisition device worn by the user to analyze video in real-time for illegal activities, providing immediate user notifications and automatic alerts to public authorities if necessary, ensuring safety and efficient public order maintenance.

Benefits of technology

Enables efficient and safe reporting of illegal activities by analyzing video in real-time, providing user notifications and automatic alerts, thereby maintaining public order while ensuring user safety.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure 2026100537000001_ABST

Patent Text Reader

Abstract

[Problem] To provide a system. [Solution] A system including: means for acquiring video; analysis means for analyzing the acquired video; determination means for determining illegality based on the analysis results; notification means for notifying the user based on the determination result; and reporting means for reporting to public institutions based on the notification.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Recently, in areas where public security has deteriorated, there are limited means to report illegal or nuisance acts safely and effectively, even when one witnesses them. In particular, capturing evidence with a conventional handheld device is conspicuous and may lead to trouble. Therefore, there is a need for a method that can record noteworthy acts more safely and less conspicuously and promptly urge legal authorities to respond.

Means for Solving the Problems

[0005] This invention provides a system that uses a device equipped with a video acquisition means worn by the user to analyze the acquired video in real time and determine whether an act is illegal. Based on the analysis results, the system warns the user and, if necessary, reports to public authorities through a reporting means. This makes it possible to maintain public order in the area safely and efficiently.

[0006] A "video acquisition device" is a device used to collect information that a user is visually viewing as digital data.

[0007] "Analysis means" refers to a function that uses image processing and artificial intelligence technology to interpret the content of the acquired video data and analyze it according to specific evaluation criteria.

[0008] A "determination means" is a device or program that evaluates illegality and derives a result by applying rules and conditions set by the system to the analysis results.

[0009] "Notification means" refers to a method or device for communicating the decision result to the user, providing information through visual display or audio output.

[0010] A "reporting means" is a device or function that automatically or manually notifies a public authority based on analytical information when specific conditions are met.

Brief Description of the Drawings

[0011]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map on which multiple emotions are mapped.
[Figure 10] An emotion map on which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which combines an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0012] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0013] First, the terminology used in the following description is explained.

[0014] In the following embodiments, the labeled processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0015] In the following embodiments, the labeled RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0016] In the following embodiments, the labeled storage is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0017] In the following embodiments, the labeled communication I/F (Interface) is an interface including a communication processor, an antenna, and the like. The communication I/F manages communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

[0018] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0019] [First Embodiment]

[0020] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0021] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0022] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0023] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0024] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0025] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0026] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0027] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0028] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0029] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0030] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0031] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0032] This invention takes effect through a video acquisition device used by the user and an analysis program linked to it. The embodiments thereof are described below.

[0033] The user first puts on a smart glasses-type device. This device has a built-in camera that can capture video in the user's field of view in real time. After capturing the video, the device compresses the data and sends it to the user's smartphone, which then sends the data to a server.

[0034] The server is responsible for analyzing the received video data. This involves using highly accurate artificial intelligence algorithms to recognize movement and objects within the video and determine whether illegal or disruptive activities are occurring. This decision-making process utilizes advanced pattern recognition technology based on pre-trained datasets.

[0035] Based on the analysis results, the server immediately sends feedback to the user's smartphone. The notification includes details of the detected incident and recommended actions. For example, if an assault is detected, a notification will appear stating, "Your safety may be at risk, so please maintain a safe distance."

[0036] Furthermore, if the server determines through analysis that serious illegal activity has been detected, it will automatically prepare a report to the public authorities. This report will include the type of incident identified, relevant location information, and, if necessary, a portion of the video footage. Users can review the details of this report and, if necessary, manually approve it.

[0037] Thus, the system of the present invention can efficiently and safely support the maintenance of local public order through the cooperation between the user's terminal and the server. For example, if the server determines that someone is illegally creating graffiti in a public place while the user is patrolling a certain area, the user will be immediately notified, and the police will be automatically notified, thereby prompting a swift response. This process makes it possible to contribute to maintaining local order while ensuring the user's safety.

[0038] The following describes the processing flow.

[0039] Step 1:

[0040] When the user activates the system, the device begins acquiring real-time video through the camera on the smart glasses being worn. This collects information within the user's field of vision as digital data.

[0041] Step 2:

[0042] The device transfers the acquired video data to the smartphone via Bluetooth. At this time, the data is appropriately compressed to improve the efficiency of inter-device communication.

[0043] Step 3:

[0044] The server acquires video data received from the user's smartphone. The data is then passed to an analysis module in preparation for immediate analysis.

[0045] Step 4:

[0046] The server begins analyzing the received video using an artificial intelligence algorithm. Here, a trained model is used to recognize objects and movements in the video and evaluate their legality in light of laws and regulations.

[0047] Step 5:

[0048] The server uses the analysis results to determine whether illegal or disruptive behavior has been detected. If the observed values exceed a threshold, the process proceeds to the next step.

[0049] Step 6:

[0050] The server generates feedback to provide to the user based on its assessment. For example, it might create a notification message such as, "Suspicious behavior detected. Please be careful."

[0051] Step 7:

[0052] The terminal notifies the user of feedback sent from the server. Notifications are made via voice and visual interfaces, allowing the user to understand the situation.

[0053] Step 8:

[0054] The server prepares to report to public authorities such as the police if serious illegal activity is detected. The report will include details of the scene, location information, and details of the illegal activity.

[0055] Step 9:

[0056] The user reviews the report content on the terminal and, where manual approval is available, approves the report. This step ensures reliable communication.

[0057] Step 10:

[0058] The reported information is securely transmitted from the server to the appropriate public authorities. This allows for the rapid initiation of necessary on-site responses. The process is then complete.
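
The decision and reporting part of this flow (Steps 5 to 10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the names `Analysis`, `Report`, `build_feedback`, `prepare_report`, and `submit` are hypothetical, and the two numeric thresholds are invented for the example, since the text only says that "a threshold" is used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the source only states that "a threshold" is used.
NOTIFY_THRESHOLD = 0.5
REPORT_THRESHOLD = 0.9


@dataclass
class Analysis:
    """Stand-in for the server's analysis result (Step 4)."""
    label: str      # e.g. "graffiti" or "assault"
    score: float    # assumed model confidence in [0, 1]
    location: str   # location information supplied by the terminal


@dataclass
class Report:
    """Draft report for a public authority (Step 8)."""
    incident_type: str
    location: str
    approved: bool = False


def build_feedback(analysis: Analysis) -> Optional[str]:
    """Steps 5-6: return a user notification if the score exceeds the threshold."""
    if analysis.score <= NOTIFY_THRESHOLD:
        return None
    return f"Suspicious behavior detected ({analysis.label}). Please be careful."


def prepare_report(analysis: Analysis) -> Optional[Report]:
    """Step 8: draft a report only for detections treated as serious."""
    if analysis.score < REPORT_THRESHOLD:
        return None
    return Report(incident_type=analysis.label, location=analysis.location)


def submit(report: Report, user_approves: bool) -> bool:
    """Steps 9-10: record the user's decision; only approved reports would be sent."""
    report.approved = user_approves
    return report.approved  # a real system would transmit the report here
```

Keeping the user notification and the report draft as separate functions mirrors the text, where every detection above the threshold produces feedback but only serious cases lead to a report that the user can review.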

[0059] (Example 1)

[0060] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0061] In modern society, maintaining public safety requires means to quickly and efficiently detect and respond to illegal and disruptive behavior. However, current technologies have not adequately captured and analyzed real-time video, resulting in delays, especially with massive amounts of data, making rapid response difficult. Furthermore, there has been a lack of coordination between appropriate notifications and automated reporting, even in cases where users themselves may be harmed. It is necessary to solve these problems and support the maintenance of local public safety more efficiently.

[0062] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0063] In this invention, the server includes video acquisition means, data processing means for compressing and transmitting the acquired video, and communication means for transmitting the compressed video data over a network. This enables real-time information analysis, prompt notification to users, and rapid notification to public authorities as needed.

[0064] "Image acquisition means" refers to a device that has the function of acquiring images that come into the user's field of view in real time.

[0065] "Data processing means" refers to technology that enables efficient data communication by compressing acquired video data.

[0066] "Communication means" refers to a system equipped with the function of transmitting compressed video data to other devices via a network.

[0067] "Analysis means" refers to a technology that uses highly accurate artificial intelligence to analyze behavioral patterns in the process of analyzing received video data.

[0068] A "judgment tool" is a system that has the function of determining whether or not an action in a video is illegal based on the analysis results.

[0069] A "notification method" is a system that provides warnings and information to users based on the results of a decision.

[0070] "Reporting methods" refer to technologies that include the process of preparing and transmitting reports to public institutions as needed.

[0071] A "generative AI model" is a form of artificial intelligence technology that uses pre-trained datasets to perform sophisticated behavioral pattern analysis.

[0072] This invention is a system that achieves its effectiveness by using a video acquisition device worn by the user and an analysis program that works in conjunction with it. Specifically, the user wears smart glasses-type terminals. These terminals have high-performance cameras built in and can acquire images that come into the user's field of view in real time. This image acquisition allows the user to constantly be aware of their surroundings.

[0073] The acquired video data is first compressed on the device. An efficient codec is used for this compression, minimizing the amount of data transmitted. The compressed data is then sent to the server via the user's smartphone. A secure communication protocol is used for transmission, guaranteeing data confidentiality.

[0074] The server analyzes the received video data using a generative AI model. This AI model utilizes advanced pattern recognition technology to recognize movement and objects within the video in real time. As a result, it can quickly determine whether illegal or disruptive activities are occurring.

[0075] The analysis results are immediately fed back to the user and sent directly as a notification to the smart glasses. The notification includes details of the recognized event and recommended next actions. Based on this information, the user can respond quickly and safely.

[0076] For example, if suspicious behavior is captured while a user is walking around town, the server can analyze the movement, determine whether it is illegal, and send the user a notification saying, "There is a suspicious person. Please keep your distance." Furthermore, if it is determined that reporting is necessary, the server can automatically prepare the necessary information for reporting to public authorities.

[0077] Examples of prompts include, "Please explain how a system for detecting suspicious behavior in public places works," and "Please explain a method for performing real-time video analysis using smart glasses."

[0078] Thus, the system of the present invention enables efficient and safe support for maintaining the safety of local communities through the cooperation between a video acquisition device used by the user and a server.

[0079] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0080] Step 1:

[0081] The device acquires video footage that is within the user's field of view.

[0082] The input is real-time video from the user's field of view, and the output is digitized video data. A camera built into the device operates continuously, capturing real-time video along the user's line of sight. This ensures that the user's surroundings are faithfully recorded.

[0083] Step 2:

[0084] The device compresses the acquired video data.

[0085] The input is the raw video data obtained in Step 1, and the output is compressed video data. An efficient codec is used for the compression process to reduce the amount of data, thereby improving communication efficiency. This makes it possible to minimize transmission delay while maintaining video quality.

[0086] Step 3:

[0087] The device sends compressed video data to the smartphone.

[0088] The input is the compressed video data obtained in step 2, and the output is the data transferred to the smartphone. The terminal uses wireless communication such as Bluetooth or Wi-Fi to quickly transmit the data to the smartphone. This process allows the user's smartphone to obtain the data necessary for the next processing step.

[0089] Step 4:

[0090] The smartphone sends the received data to the server.

[0091] The input is compressed video data stored on a smartphone, and the output is data sent to a server. The smartphone uses a secure communication protocol (e.g., HTTPS) to ensure the confidentiality and integrity of the data before sending it to the server. This prepares the server for analysis.

[0092] Step 5:

[0093] The server analyzes the received video data.

[0094] The input is compressed video data stored on the server, and the output is the analysis results. The server utilizes a generative AI model and advanced pattern recognition to identify dynamic elements and objects within the video and determine whether or not illegal activity is occurring. This analysis process is performed in real time, resulting in highly accurate results.

[0095] Step 6:

[0096] The server sends a notification to the user based on the analysis results.

[0097] The input is the analysis results obtained in step 5, and the output is notification information for the user. The server sends the dangers and recommended actions identified in the analysis to the user's smartphone, displaying them as notifications on the device. This allows the user to immediately understand the situation and take appropriate action.

[0098] Step 7:

[0099] The server prepares to report if serious illegal activity is detected.

[0100] The input is the result of the illegality analysis in Step 5, and the output is the information prepared for reporting. The server aggregates the details of the incident and location information, automatically creates the content of the report to the public authorities, and supports immediate response. Users can manually review and approve the report as needed. This function enables a quick and effective response to danger.
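
The compress-and-forward part of this flow (Steps 2 to 5) can be illustrated with a short sketch. It makes several assumptions that are not in the application: `zlib` is used only as a stand-in for the unspecified "efficient codec" (a real system would use a video codec), the JSON body layout and the function names `compress_frame`, `build_upload`, and `read_upload` are hypothetical, and the actual HTTPS transfer is omitted.

```python
import base64
import json
import zlib


def compress_frame(raw: bytes) -> bytes:
    """Step 2: compress raw data (zlib is a stand-in for a video codec)."""
    return zlib.compress(raw, level=6)


def build_upload(frame_id: int, compressed: bytes) -> str:
    """Steps 3-4: wrap the compressed bytes in a JSON body for an HTTPS POST."""
    return json.dumps({
        "frame_id": frame_id,
        "encoding": "zlib+base64",
        "data": base64.b64encode(compressed).decode("ascii"),
    })


def read_upload(body: str) -> bytes:
    """Step 5 (server side): recover the raw bytes before analysis."""
    message = json.loads(body)
    return zlib.decompress(base64.b64decode(message["data"]))
```

The round trip `read_upload(build_upload(n, compress_frame(raw)))` returns the original bytes, which is the property the server relies on before handing the data to the analysis step.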

[0101] (Application Example 1)

[0102] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0103] In recent years, as societal demands for safety have increased, there is a need for early detection and rapid response to illegal and disruptive behavior in public places. However, conventional technologies make immediate response on-site difficult, and there are problems with providing appropriate instructions and reports in emergencies. In such a situation, it is difficult to maintain local security while ensuring the safety of users.

[0104] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0105] In this invention, the server includes a video acquisition means, an analysis means for analyzing the acquired video, a determination means for determining illegality based on the analysis results, an instruction generation means for generating instructions for ensuring safety using the user's location information, and an emergency notification means for automatically notifying pre-set contacts in response to an emergency situation. This enables immediate notification of abnormal situations, provides instructions for the user to act safely, and automatically makes emergency notifications as needed.

[0106] A "video acquisition device" is a device that captures the user's surroundings in real time.

[0107] "Analysis means" refers to a function for processing acquired video data and analyzing its content.

[0108] A "judgment tool" is a system that evaluates the legality of actions and situations within a video based on the analysis results.

[0109] "Notification means" refers to a method for informing the user of the analysis and judgment results.

[0110] "Reporting methods" refer to methods for providing information to public institutions as needed.

[0111] The "instruction generation means" is a function that creates instructions to guide the user to take safe actions based on their location information and circumstances.

[0112] An "emergency notification method" is a system that automatically transmits the situation to pre-set contacts in the event of an emergency.

[0113] This invention provides a system implemented by a smart glasses-type terminal worn by the user and a server for analyzing video data acquired from that terminal. This system has the following specific configuration.

[0114] By wearing smart glasses, users can acquire real-time video of their surroundings. These smart glasses are equipped with a built-in camera that captures the user's field of view as video data. The video data is compressed within the smart glasses and then transmitted to the user's smartphone via communication functions. In this process, the smart glasses use common communication technologies such as Bluetooth and Wi-Fi.

[0115] The smartphone sends the received video data to a cloud server. A 4G or 5G network is used for data communication. The server analyzes the received video data using libraries such as TensorFlow (registered trademark) and OpenCV. The server uses a pre-trained AI model to identify and analyze movement and objects within the video.

[0116] Based on the analyzed data, the server determines whether illegal or disruptive activity has occurred. If necessary, it immediately notifies the user's smartphone and provides safety instructions. For example, if the server detects violent activity, advice for ensuring the user's safety will be displayed on the smartphone. Furthermore, if the situation is deemed serious, an automatic notification will be sent to registered emergency contacts.

[0117] As a concrete example, when security personnel are patrolling a shopping mall, the system can detect suspicious customer behavior and send a warning. In this case, an example of a prompt given to the generative AI model would be, "Please tell me how to detect suspicious behavior in the shopping mall, alert staff, and report to the management center if necessary." This enables effective security management.
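
As a rough illustration of how such a prompt might be assembled before it is passed to the AI model, the sketch below fills a template with the patrol location. The function name `build_prompt` and its `place` and `observation` parameters are hypothetical; the application gives only the fixed example sentence and does not describe how prompts are constructed.

```python
def build_prompt(place: str, observation: str = "") -> str:
    """Build a prompt in the style of the shopping-mall example (hypothetical helper)."""
    prompt = (
        f"Please tell me how to detect suspicious behavior in the {place}, "
        "alert staff, and report to the management center if necessary."
    )
    if observation:
        # Optional context from the analysis step; an assumption, not in the source.
        prompt += f" Observed behavior: {observation}"
    return prompt
```

Calling `build_prompt("shopping mall")` reproduces the example sentence quoted above.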

[0118] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0119] Step 1:

[0120] When a user wears smart glasses, the device uses its built-in camera to capture images in the user's field of view in real time. The input is the real-world scenery within the user's field of view, and the output is digital image data. This digital image data is compressed to enable efficient transmission.

[0121] Step 2:

[0122] The compressed video data is transmitted from the terminal to the user's smartphone via Bluetooth or Wi-Fi. The input is the compressed video data, and the output is the compressed data received on the smartphone. The smartphone then prepares the received data to be sent to the server.

[0123] Step 3:

[0124] The smartphone sends compressed video data to a cloud server using a 4G or 5G network. The input is the compressed data on the smartphone, and the output is the compressed data stored on the server. The server receives this data and begins analysis.

[0125] Step 4:

[0126] The server uses libraries such as OpenCV and TensorFlow to analyze the received video data. The input is compressed video data, and the output is the analysis result. The analysis includes a process of identifying abnormal behavior and characteristic movements in the video using a generative AI model.

[0127] Step 5:

[0128] The server determines whether illegal or disruptive behavior is occurring based on the analysis results. The input is the analysis results, and the output is the judgment result. Here, it compares the results with pre-established behavioral patterns and identifies any anomalies.

[0129] Step 6:

[0130] If an anomaly is detected, the server immediately sends a notification to the smartphone. The input is the judgment result, and the output is the content of the notification on the smartphone. This notification includes recommended actions for the user in response to the detected anomaly.

[0131] Step 7:

[0132] The server automatically notifies pre-configured emergency contacts if a serious anomaly is detected. The input is the result of the serious anomaly detection, and the output is the content of the notification sent to the emergency contacts. This includes location information and details about the situation.

[0133] This series of steps allows users to act safely in real time and respond quickly when necessary.
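Steps 5 to 7 above can be illustrated with a minimal sketch, given here only as one possible reading of the flow. The threshold values, the class names `AnalysisResult` and `Dispatch`, and the function `judge_and_notify` are hypothetical and do not appear in this disclosure; the analysis result is assumed to arrive as a label with a confidence score.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical thresholds; the disclosure gives no numeric values.
NOTIFY_THRESHOLD = 0.5    # at or above this, the user is notified (Step 6)
SERIOUS_THRESHOLD = 0.8   # at or above this, emergency contacts are notified (Step 7)


@dataclass
class AnalysisResult:
    label: str      # e.g. "violence", produced by the analysis in Step 4
    score: float    # assumed confidence in [0, 1]
    location: str   # location information attached to the footage


@dataclass
class Dispatch:
    user_message: Optional[str] = None
    emergency_messages: List[str] = field(default_factory=list)


def judge_and_notify(result: AnalysisResult, emergency_contacts: List[str]) -> Dispatch:
    """Judge the analysis result (Step 5) and build notifications (Steps 6 and 7)."""
    dispatch = Dispatch()
    if result.score < NOTIFY_THRESHOLD:
        return dispatch  # no anomaly: nothing is sent
    dispatch.user_message = (
        f"Possible {result.label} detected nearby. Please keep a safe distance."
    )
    if result.score >= SERIOUS_THRESHOLD:
        dispatch.emergency_messages = [
            f"To {contact}: serious {result.label} reported at {result.location}."
            for contact in emergency_contacts
        ]
    return dispatch
```

In this sketch a serious anomaly always produces the user notification as well, which matches the order of Steps 6 and 7; actual delivery to the smartphone or to the contacts is left out.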

[0134] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0135] This invention provides a system for detecting illegal activities occurring around a user, which achieves more accurate analysis and notification by also considering the user's emotional state. This system is composed of a combination of video acquisition means, analysis means, judgment means, notification means, reporting means, and emotion engine.

[0136] The user wears smart glasses that collect video footage of their surroundings in real time. The device compresses the video data and sends it to a server via a smartphone. On the server, video analysis tools use artificial intelligence to recognize patterns of illegal activity in the video and make a determination of illegality based on that recognition.

[0137] In addition to this process, the device is equipped with an emotion engine that recognizes the user's emotions. The emotion engine evaluates the user's current emotional state based on their facial expressions and voice data, and if a specific emotion such as tension or fear is detected, it provides that information to the analysis system. This allows the analysis system to make a judgment of illegality while also considering the user's emotional state, enabling more contextually appropriate notifications.

[0138] Based on the analysis of illegal activity and feedback from the emotion engine, the server sends a notification to the user's smartphone. This notification includes appropriate action guidelines tailored to the detected illegal activity and the user's own situation. For example, if it is determined that the user is in a state of panic, a notification such as "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location" will be sent.

[0139] Furthermore, the server prepares to report to public authorities if serious illegal activity is detected or if there are concerns about user safety. Detailed situation reports, including information from the emotion engine, are added to the report to enable a swift and appropriate response.

[0140] For example, if the emotion engine determines that a user has witnessed a suspicious person in a crowd and is experiencing heightened anxiety, the server will prioritize the user's safety and immediately contact the police. This process ensures public safety in situations requiring a swift response while keeping the user safe. By combining emotion recognition with conventional illegal activity detection, this system takes the user's emotional state into consideration and can provide a more flexible and safer environment.

[0141] The following describes the processing flow.

[0142] Step 1:

[0143] When the user activates the smart glasses, the device acquires real-time video through its camera. This video includes information about the surrounding environment, and the user's entire field of vision is collected as digital data.

[0144] Step 2:

[0145] The terminal compresses the video data and transfers it to the user's smartphone via Bluetooth. This prepares the terminal for efficient data transmission to the server.

[0146] Step 3:

[0147] The server receives video data from the smartphone and prepares it to be passed to the analysis module. Maintaining data quality is the focus at this stage.

[0148] Step 4:

[0149] The server uses an artificial intelligence model as a means of video analysis to recognize patterns that may indicate illegal activity from the received video. This process utilizes motion detection and object recognition technologies.

[0150] Step 5:

[0151] The device uses an emotion engine to evaluate the user's emotional state based on facial expression and voice data. For example, it detects changes in facial expression and quantifies anxiety and stress levels.

[0152] Step 6:

[0153] The analysis method integrates the results of illegal activity detection on the server with user emotion data obtained from the emotion engine. This allows for analysis that is more tailored to the user's current situation.

[0154] Step 7:

[0155] Based on the analysis results, the server generates a notification for the user. The notification includes instructions that take into account the identified risks and the user's emotional state.

[0156] Step 8:

[0157] The device sends the generated notification to the user. The notification is presented to the user visually or audibly and includes specific instructions for action.

[0158] Step 9:

[0159] The server automatically initiates reporting procedures if it determines that serious illegal activity has occurred or that user safety is being threatened. Reports include location information and emotion engine data.

[0160] Step 10:

[0161] The user takes appropriate action based on the information received, and can also verify that information. This ensures the user's own safety while enabling the rapid dissemination of information to public authorities.
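The integration in Step 6 and the notification generation in Step 7 can be sketched as follows. This is only an illustrative assumption: the disclosure does not specify how the detection result and the emotion data are combined, so the weighted blend, the weight of 0.3, the thresholds, and the names `EmotionState`, `integrate`, and `build_notification` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmotionState:
    anxiety: float  # assumed to be quantified in [0, 1] by the emotion engine (Step 5)
    stress: float   # assumed to be quantified in [0, 1] by the emotion engine (Step 5)


def integrate(detection_score: float, emotion: EmotionState,
              emotion_weight: float = 0.3) -> float:
    """Blend the detection score with the strongest emotion signal (Step 6).

    The linear blend and its weight are assumptions made for illustration.
    """
    emotion_score = max(emotion.anxiety, emotion.stress)
    combined = (1.0 - emotion_weight) * detection_score + emotion_weight * emotion_score
    return min(1.0, max(0.0, combined))


def build_notification(risk: float, emotion: EmotionState) -> Optional[str]:
    """Generate a notification that reflects risk and emotional state (Step 7)."""
    if risk < 0.5:  # hypothetical notification threshold
        return None
    if max(emotion.anxiety, emotion.stress) >= 0.7:  # hypothetical "panic" level
        return ("You are currently in a dangerous situation. Please remain calm, "
                "assess the situation, and move to a designated safe location.")
    return "Suspicious activity detected nearby. Please stay alert."
```

With a detection score of 0.6 and an anxiety level of 0.9, this blend gives a risk of about 0.69, so the calming message quoted in paragraph [0138] would be selected.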

[0162] (Example 2)

[0163] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0164] Conventional illegal activity detection systems judge the abnormality of behavior without considering the user's emotional state, making it difficult to accurately grasp the degree of danger the user faces. Furthermore, because there are insufficient means to adequately protect the user's psychological safety in the detection of illegal activity, effective notification and reporting in situations where a swift and appropriate response is required is difficult.

[0165] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0166] In this invention, the server includes means for acquiring video data, means for evaluating and integrating user emotional data into analysis, and means for issuing warnings based on the judgment results and reporting to public authorities as necessary. This enables more accurate risk assessment that takes the user's emotional state into account, as well as appropriate notifications and prompt reporting based on those assessments.

[0167] "Video data" refers to information acquired in digital format for recording or transmitting visual information.

[0168] A "communication network" is a network infrastructure used to transmit digital data from one point to another.

[0169] "Artificial intelligence" is a computer system that learns from large amounts of data and enables pattern recognition and prediction.

[0170] "Emotional data" refers to information about a user's psychological state, obtained from their facial expressions, voice, and other sources.

[0171] A "public institution" is an administrative agency or similar organization established for the public good based on laws and regulations.

[0172] "Notification" is the act of formally conveying information to those who have the right to receive it.

[0173] "Analysis" is the process of examining data in detail and extracting meaningful information from it.

[0174] "Abnormal" refers to a situation that deviates from the normal state or expected conditions.

[0175] "Reporting" is the act of systematically organizing facts and circumstances and making them public.

[0176] This invention is a system that uses advanced analytical techniques to ensure user safety and accurate detection of illegal activities. The user wears smart glasses as a means of acquiring video data, thereby collecting surrounding video data in real time. The terminal compresses this video data and transmits it to a server via a communication network. For data compression, video compression libraries such as FFmpeg are utilized.
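As an illustration of the compression step, the following sketch builds an FFmpeg command line. The disclosure names FFmpeg but gives no codec or quality settings, so the H.264 encoder (`libx264`), the `veryfast` preset, the CRF value of 28, and the function name `build_ffmpeg_command` are assumptions. The audio track is kept, since the emotion engine described above also uses voice data.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, crf: int = 28) -> List[str]:
    """Return an FFmpeg argument list that re-encodes src into a smaller dst.

    A higher CRF means stronger compression and lower quality; 0 to 51 is the
    range accepted by 8-bit libx264.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51 for libx264")
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it exists
        "-i", src,              # input video captured by the terminal
        "-c:v", "libx264",      # assumed video codec
        "-preset", "veryfast",  # favor encoding speed for near real-time use
        "-crf", str(crf),       # constant rate factor (quality/size trade-off)
        "-c:a", "aac",          # keep audio, encoded as AAC
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a terminal where FFmpeg is installed; the sketch itself only constructs the arguments and does not execute anything.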

[0177] The server uses artificial intelligence technology to analyze video data. Specifically, it utilizes libraries such as TensorFlow (a machine learning framework) and OpenCV (a computer vision library) to analyze behavioral patterns in real time using trained models. This allows for the immediate recognition of illegal or abnormal behavior, and enables appropriate responses based on the results.

[0178] Furthermore, the device is equipped with an emotion engine to evaluate the user's emotions. This engine analyzes the user's facial expressions and voice data to identify their current emotional state. Specific methods used include Facial Emotion Recognition models. The server integrates this emotion data with the analysis results, enabling it to make decisions that take the user's mental state into account.

[0179] Furthermore, if the server detects a danger, it will immediately notify the user and provide appropriate instructions. In cases where the user is in a state of panic, the notification will use specific examples such as, "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location."

[0180] The server automatically reports any serious illegal activity to public authorities. The reports include data obtained from the emotion engine to support a swift and appropriate response. This allows us to ensure user safety while contributing to broader social safety.

[0181] An example of a prompt regarding the use of a generative AI model is: "Explain how the system detects illegal activity while considering the user's emotional state. Also, provide examples of situations where emotion recognition is useful." This prompt is used to deepen the understanding of how the generative AI model works and its applications.

[0182] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0183] Step 1:

[0184] The user uses smart glasses to collect real-time video data of their surroundings. This video is acquired from a high-resolution camera sensor. The input data consists of the collected video frames, which are then sent to the next processing step.

[0185] Step 2:

[0186] The device compresses the acquired video data using a video compression library such as FFmpeg. The compressed data is then sent to the server via the smartphone. In this process, the input is uncompressed video data, and the output is a compressed video file.

[0187] Step 3:

[0188] The server applies artificial intelligence models using tools like TensorFlow and OpenCV to analyze the received video data. It analyzes behavioral patterns in the video and recognizes signs of illegality. The input is compressed video data, and the output is a judgment regarding the abnormality of the behavior.

[0189] Step 4:

[0190] The device collects the user's facial expressions and voice data, and analyzes their emotional state using an emotion engine. Here, the emotional state is classified into categories such as "tension" and "anxiety." The input is the user's facial expression and voice data, and the output is the emotional evaluation result.

[0191] Step 5:

[0192] The server integrates the analysis results and feedback from the emotion engine to make an overall judgment. Specifically, it performs an anomaly severity assessment that takes into account the user's emotional state. The input is the behavioral judgment result and the emotion evaluation result, and the output is the integrated risk assessment result.

[0193] Step 6:

[0194] Based on its assessment, the server sends a notification to the user's smartphone, providing action guidelines if necessary. For example, a notification might appear stating, "You are currently in a dangerous situation. Please evacuate to a safe location." The input is the integrated risk assessment result, and the output is the action guidelines provided to the user.

[0195] Step 7:

[0196] In the event of an emergency, the server reports detailed information, including emotional data, to public authorities. It is designed to enable a rapid response through the reporting system. Input is the assessment of a significant risk, and output is the content of the report to public authorities.
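The inputs and outputs of Steps 4, 5, and 7 can be sketched as simple data structures. This is a hedged illustration only: the escalation rule, the level threshold of 0.7, the JSON layout of the report, and all class and function names are hypothetical, and only the emotion categories "tension" and "anxiety" are taken from Step 4.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    NONE = 0
    WARNING = 1    # notify the user only (Step 6)
    EMERGENCY = 2  # also report to public authorities (Step 7)


@dataclass
class BehaviorJudgment:   # output of Step 3
    abnormal: bool
    description: str


@dataclass
class EmotionResult:      # output of Step 4
    category: str         # e.g. "tension" or "anxiety"
    level: float          # assumed intensity in [0, 1]


def assess(behavior: BehaviorJudgment, emotion: EmotionResult) -> Severity:
    """Step 5: severity assessment that takes the emotional state into account."""
    if not behavior.abnormal:
        return Severity.NONE
    # Hypothetical rule: a strong negative emotion escalates the severity.
    if emotion.category in {"tension", "anxiety"} and emotion.level >= 0.7:
        return Severity.EMERGENCY
    return Severity.WARNING


def build_report(behavior: BehaviorJudgment, emotion: EmotionResult,
                 severity: Severity) -> Optional[str]:
    """Step 7: serialize a report including emotion data, for emergencies only."""
    if severity is not Severity.EMERGENCY:
        return None
    return json.dumps(
        {"severity": severity.name,
         "behavior": asdict(behavior),
         "emotion": asdict(emotion)},
        sort_keys=True,
    )
```

Keeping the severity as an ordered enumeration lets the same assessment result drive both the user notification and the decision to report.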

[0197] (Application Example 2)

[0198] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0199] Conventional illegal activity detection systems analyze only video data and do not consider the user's emotional state, which can lead to incorrect judgments or inappropriate notifications depending on the situation. Furthermore, prioritizing user safety requires a flexible approach that takes their emotional state into account. This system aims to solve these problems.

[0200] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0201] In this invention, the server includes a video acquisition device, an analysis device, a judgment device, a notification device that notifies the user of appropriate warnings, a reporting device that reports to public authorities, and an emotion analysis device that analyzes the user's emotional state. This enables more accurate detection and notification of illegal activities that take into account the user's emotional state.

[0202] A "video acquisition device" is a device used to collect video footage of the user's surroundings in real time.

[0203] An "analysis device" is a device used to analyze acquired video footage and evaluate the behavioral patterns and illegality contained within it.

[0204] A "judgment device" is a device that makes a final determination of illegality based on analysis results and the user's emotional state.

[0205] A "notification device" is a device that notifies the user of appropriate warnings or instructions based on the judged results and the user's emotions.

[0206] A "reporting device" is a device that, depending on the situation, reports to public authorities to encourage a swift response.

[0207] An "emotion analysis device" is a device that analyzes the user's emotional state from their facial expressions and voice data, and evaluates states such as tension and fear.

[0208] A "communication device" is a device used to compress acquired video information and transmit it to a remote server via a network.

[0209] "Machine learning technology" is a technique that automatically learns patterns and rules based on large amounts of data and uses them for analysis and decision-making.

[0210] A "pre-trained model" is a model that has been trained in advance using data, and is used to analyze behavioral patterns based on video data and emotional states.

[0211] The system for realizing this invention mainly consists of a terminal, a server, and a user. The user first wears an image acquisition device such as smart glasses to collect images of the surroundings in real time. The collected images are subjected to initial compression processing within the terminal and then transmitted to a remote server via a communication device.

[0212] The server performs video analysis and emotion analysis. The analysis device uses a pre-trained model based on machine learning techniques to analyze specific behavioral patterns and facial expressions from the acquired video data. This makes it possible to predict illegal activities and evaluate the user's emotional state. For example, Python's OpenCV and TensorFlow are used for the analysis.

[0213] Next, the server, through the judgment device, determines what information should be sent to the user based on the analysis results and emotion data. If action is necessary, the notification device sends a warning to the user's terminal. The notification includes appropriate instructions tailored to the situation, prioritizing the user's safety.

[0214] For example, if a user spots a suspicious person in a crowded place and the emotion analysis device detects a high level of anxiety, the server will take this into consideration and send a warning such as, "Please be careful. Contact public authorities if necessary." In this way, a safer environment is provided for the user.

[0215] Furthermore, the reporting device prepares reports to public authorities as needed, and assists in prompt intervention by providing detailed situation reports, including the user's emotional state.

[0216] The generative AI model might use prompts like the following: "Detect patterns of suspicious individuals through video analysis and generate a notification based on the user's current emotional state."
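One way such a prompt could be assembled from the analysis results is sketched below. The instruction sentence is the example prompt quoted above; the function name `build_prompt`, its parameters, and the layout of the appended context lines are hypothetical, since the disclosure does not describe how the prompt is combined with the analysis data.

```python
from typing import List

INSTRUCTION = (
    "Detect patterns of suspicious individuals through video analysis and "
    "generate a notification based on the user's current emotional state."
)


def build_prompt(detected_patterns: List[str], emotion: str, level: float) -> str:
    """Combine the example instruction with analysis results as plain text."""
    patterns = ", ".join(detected_patterns) if detected_patterns else "none"
    return (
        f"{INSTRUCTION}\n"
        f"Detected patterns: {patterns}\n"
        f"User emotional state: {emotion} (level {level:.2f})"
    )
```

The resulting string would then be given to the generative AI model together with the inference data; that call is omitted here because the disclosure does not specify a model interface.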

[0217] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0218] Step 1:

[0219] The user wears smart glasses to collect images of their surroundings. The collected video data is compressed and sent from the device to a server. This data processing enables the transfer of video information. The input is real-world video, and the output is compressed video data.

[0220] Step 2:

[0221] The server inputs the received video data into an analysis device, which uses machine learning techniques to detect specific behavioral patterns. This data processing extracts suspicious behavior from the video. The input is compressed video data, and the output is analyzed behavioral pattern information.

[0222] Step 3:

[0223] The server uses the emotion analysis device to assess the user's emotional state. It analyzes facial expressions and voice data to identify the current emotional state, including levels of tension and fear. The input is the user's facial expressions and voice data, and the output is the emotional state assessment result.

[0224] Step 4:

[0225] Based on the analysis results and emotional state, the server determines the illegality of the actions through the judgment device. This processing determines whether reporting or warning is necessary. The input is behavioral pattern information and emotional evaluation results, and the output is the judgment result.

[0226] Step 5:

[0227] To ensure user safety, the server uses a notification device to send appropriate warnings to the terminal. Situation-specific instructions are provided to enable users to respond quickly. The input is the judgment result, and the output is the notification message to the user.

[0228] Step 6:

[0229] If necessary, the server prepares a report to public authorities using the reporting device, attaching a detailed situation report including the user's emotional state. This enables a swift response. The input is the judgment result and emotional state, and the output is the report message.

[0230] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0231] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0232] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0233] [Second Embodiment]

[0234] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0235] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0236] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0237] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0238] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0239] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0240] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0241] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0242] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0243] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0244] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0245] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0246] This invention takes effect through a video acquisition device worn by the user and an analysis program that works with it. An embodiment is described below.

[0247] The user first puts on a smart glasses-type device. This device has a built-in camera that can capture video in the user's field of view in real time. After capturing the video, the device compresses the data and sends it to the user's smartphone, which then sends the data to a server.

[0248] The server is responsible for analyzing the received video data. This involves using highly accurate artificial intelligence algorithms to recognize movement and objects within the video and determine whether illegal or disruptive activities are occurring. This decision-making process utilizes advanced pattern recognition technology based on pre-trained datasets.

[0249] Based on the analysis results, the server immediately sends feedback to the user's smartphone. The notification includes details of the detected incident and recommended actions. For example, if an assault is detected, a notification will appear stating, "Your safety may be at risk, so please maintain a safe distance."

[0250] Furthermore, if the server determines through analysis that serious illegal activity has been detected, it will automatically prepare a report to the public authorities. This report will include the type of incident identified, relevant location information, and, if necessary, a portion of the video footage. Users can review the details of this report and, if necessary, manually approve it.

[0251] Thus, the system of the present invention can efficiently and safely support the maintenance of local public order through the cooperation between the user's terminal and the server. For example, if the server determines that someone is illegally creating graffiti in a public place while the user is patrolling a certain area, the user will be immediately notified, and the police will be automatically notified, thereby prompting a swift response. This process makes it possible to contribute to maintaining local order while ensuring the user's safety.

[0252] The following describes the processing flow.

[0253] Step 1:

[0254] When the user activates the system, the device begins acquiring real-time video through the camera on the smart glasses being worn. This collects information within the user's field of vision as digital data.

[0255] Step 2:

[0256] The device transfers the acquired video data to the smartphone via Bluetooth. At this time, the data is appropriately compressed to improve the efficiency of inter-device communication.

[0257] Step 3:

[0258] The server acquires video data received from the user's smartphone. The data is then passed to an analysis module in preparation for immediate analysis.

[0259] Step 4:

[0260] The server begins analyzing the received video using an artificial intelligence algorithm. Here, a trained model is used to recognize objects and movements in the video and evaluate their legality in light of laws and regulations.

[0261] Step 5:

[0262] The server uses the analysis results to determine whether illegal or disruptive behavior has been detected. If the observed values exceed a threshold, the process proceeds to the next step.

[0263] Step 6:

[0264] The server generates feedback to provide to the user based on its assessment. For example, it might create a notification message such as, "Suspicious behavior detected. Please be careful."

[0265] Step 7:

[0266] The terminal notifies the user of feedback sent from the server. Notifications are made via voice and visual interfaces, allowing the user to understand the situation.

[0267] Step 8:

[0268] The server prepares to report to public authorities such as the police if serious illegal activity is detected. The report will include details of the scene, location information, and details of the illegal activity.

[0269] Step 9:

[0270] The user reviews the report content via the terminal and, where manual approval is enabled, approves the report. This process ensures that only confirmed reports are transmitted.

[0271] Step 10:

[0272] The reported information is securely transmitted from the server to the appropriate public authorities. This allows for the rapid initiation of necessary on-site responses. The process is then complete.
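Steps 5 and 8 to 10 describe a threshold judgment followed by report preparation, user approval, and transmission. The following sketch models that sequence as a small state machine. It is an assumption-laden illustration: the threshold of 0.8, the status names, and the functions `prepare_report`, `review`, and `transmit` are hypothetical, and the transport to the public authority is represented by a caller-supplied function.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class ReportStatus(Enum):
    DRAFT = auto()
    APPROVED = auto()
    REJECTED = auto()
    SENT = auto()


@dataclass
class Report:
    incident_type: str
    location: str
    details: str
    status: ReportStatus = ReportStatus.DRAFT


def prepare_report(score: float, incident_type: str, location: str,
                   details: str, threshold: float = 0.8) -> Optional[Report]:
    """Steps 5 and 8: draft a report only when the score exceeds the threshold."""
    if score <= threshold:
        return None
    return Report(incident_type, location, details)


def review(report: Report, approved: bool) -> Report:
    """Step 9: record the user's manual approval or rejection of a draft."""
    if report.status is not ReportStatus.DRAFT:
        raise ValueError("only a draft report can be reviewed")
    report.status = ReportStatus.APPROVED if approved else ReportStatus.REJECTED
    return report


def transmit(report: Report, send: Callable[[Report], None]) -> bool:
    """Step 10: send an approved report; unapproved reports are never sent."""
    if report.status is not ReportStatus.APPROVED:
        return False
    send(report)
    report.status = ReportStatus.SENT
    return True
```

For example, a graffiti detection with a score of 0.9 would yield a draft report that `transmit` refuses to send until `review(report, True)` has been called.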

[0273] (Example 1)

[0274] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0275] In modern society, maintaining public safety requires means to quickly and efficiently detect and respond to illegal and disruptive behavior. However, current technologies have not adequately captured and analyzed real-time video, resulting in delays, especially with massive amounts of data, making rapid response difficult. Furthermore, there has been a lack of coordination between appropriate notifications and automated reporting, even in cases where users themselves may be harmed. It is necessary to solve these problems and support the maintenance of local public safety more efficiently.

[0276] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0277] In this invention, the server includes an image acquisition means, a data processing means for compressing and transmitting the acquired image, and a communication means for transmitting the compressed image data via a network. As a result, real-time information analysis becomes possible, enabling quick notification to the user and, if necessary, quick reporting to public institutions.

[0278] The "image acquisition means" is a device having a function of acquiring in real time an image that enters the user's field of vision.

[0279] The "data processing means" refers to a technology that enables efficient data communication by compressing the acquired image.

[0280] The "communication means" is a system equipped with a function for transmitting the compressed image data to other devices via a network.

[0281] The "analysis means" is a technology that analyzes behavior patterns using high-precision artificial intelligence in the process of analyzing the received image data.

[0282] The "judgment means" is a system having a function of determining whether or not an act in the image has illegality based on the analysis result.

[0283] The "notification means" is a system that provides warnings and information to the user in accordance with the judgment result.

[0284] The "reporting means" is a technology including a process for preparing and transmitting the reporting content to public institutions as necessary.

[0285] The "generative AI model" is a form of artificial intelligence technology that analyzes complex behavior patterns using a model trained on a dataset.

[0286] The present invention is a system that exhibits its effects by using a video acquisition device worn by a user and an analysis program that cooperates with it. Specifically, the user wears a smart-glasses-type terminal. This terminal has a high-performance camera built in and can acquire video that enters the user's field of view in real time. By acquiring this video, the user can always grasp the surrounding situation.

[0287] The acquired video data is first compressed in the terminal. An efficient codec is used for this data compression to minimize the required communication volume. The compressed data is then transmitted to the server via the user's smartphone. A secure communication protocol is used for the transmission to ensure the confidentiality of the data.
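As a non-limiting illustration of the secure transmission described above, the following Python sketch builds (but does not send) an upload request for one compressed video chunk. It assumes HTTPS as the secure protocol, which the step-by-step flow names as one example. The endpoint URL, the function name `build_upload_request`, and the `X-Chunk-SHA256` header are hypothetical and do not appear in the source.

```python
import hashlib
import urllib.request

# Hypothetical endpoint; the source does not name a URL.
UPLOAD_URL = "https://example.com/api/video-chunks"


def build_upload_request(chunk: bytes, url: str = UPLOAD_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying one compressed video chunk."""
    if not url.startswith("https://"):
        # Refuse plain HTTP so the chunk is only ever sent over TLS.
        raise ValueError("upload URL must use HTTPS")
    headers = {
        "Content-Type": "application/octet-stream",
        # Assumed header: a SHA-256 digest lets the server check integrity.
        "X-Chunk-SHA256": hashlib.sha256(chunk).hexdigest(),
    }
    return urllib.request.Request(url, data=chunk, headers=headers, method="POST")


request = build_upload_request(b"\x00\x01compressed-bytes")
# Sending would be: urllib.request.urlopen(request, timeout=10)
```

Rejecting non-HTTPS URLs before the request is built is one simple way to keep confidentiality from depending on the caller's configuration.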

[0288] The server analyzes the received video data using a generative AI model. This AI model can recognize movements and objects in the video in real time by means of advanced pattern recognition technology. As a result, it is possible to quickly determine whether illegal or nuisance acts are occurring.

[0289] The analysis result is immediately fed back to the user and sent as a direct notification to the smart glasses. The notification content describes the details of the recognized event and the recommended next action. Based on this information, the user can respond quickly and safely.

[0290] As a specific example, when the user encounters suspicious behavior while walking in the street, the server can analyze the movement, make a determination of illegality, and send a notification to the user saying, "There is a suspicious person. Keep your distance." Also, when it is determined that reporting is necessary, the server automatically prepares the content of the report to the public institution.

[0291] Examples of prompt sentences include, "Please explain the mechanism of a system for detecting suspicious behavior in public places." and, "Please explain the method of performing real-time video analysis using smart glasses."

[0292] Thus, the system of the present invention enables efficient and safe support for maintaining the safety of local communities through the cooperation between a video acquisition device used by the user and a server.

[0293] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0294] Step 1:

[0295] The device acquires video footage that is within the user's field of view.

[0296] The input is real-time video from the user's field of view, and the output is digitized video data. A camera built into the device operates continuously, capturing real-time video along the user's line of sight. This ensures that the user's surroundings are faithfully recorded.

[0297] Step 2:

[0298] The device compresses the acquired video data.

[0299] The input is the raw video data obtained in Step 1, and the output is compressed video data. An efficient codec is used for the compression process to reduce the amount of data, thereby improving communication efficiency. This makes it possible to minimize transmission delay while maintaining video quality.

[0300] Step 3:

[0301] The device sends compressed video data to the smartphone.

[0302] The input is the compressed video data obtained in step 2, and the output is the data transferred to the smartphone. The terminal uses wireless communication such as Bluetooth or Wi-Fi to quickly transmit the data to the smartphone. This process allows the user's smartphone to obtain the data necessary for the next processing step.

[0303] Step 4:

[0304] The smartphone sends the received data to the server.

[0305] The input is the compressed video data stored in the smartphone, and the output is the data sent to the server. The smartphone uses a secure communication protocol (e.g., HTTPS) to ensure the confidentiality and integrity of the data and then sends it to the server. This enables the server to be prepared for analysis.

[0306] Step 5:

[0307] The server analyzes the received video data.

[0308] The input is the compressed video data stored in the server, and the output is the analysis result. The server utilizes the generative AI model to identify dynamic elements and objects within the video through advanced pattern recognition and determine the presence of illegal acts. This analysis process is performed in real time and yields highly accurate results.

[0309] Step 6:

[0310] The server sends a notification to the user based on the analysis result.

[0311] The input is the analysis result obtained in Step 5, and the output is the notification information for the user. The server sends the risks and recommended actions recognized through the analysis to the user's smartphone and displays them as notifications on the terminal. This allows the user to immediately understand the situation and take appropriate actions.

[0312] Step 7:

[0313] If a serious illegal act is determined, the server prepares for reporting.

[0314] The input is the result of the illegality analysis in Step 5, and the output is the information prepared for reporting. The server aggregates the details of the incident and location information, automatically creates the content of the report to the public authorities, and supports immediate response. Users can manually review and approve the report as needed. This function enables a quick and effective response to danger.
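The report preparation in Step 7, together with the optional manual approval, can be sketched as follows. This is a minimal illustration only. The class `Report`, the function `prepare_report`, and all field names are hypothetical, and the source does not specify how a report is represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    incident_details: str      # details of the incident from the analysis
    latitude: float            # location information
    longitude: float
    requires_approval: bool    # True when the user reviews reports manually
    approved: bool = False

    def ready_to_send(self) -> bool:
        # Send only if no approval is required or the user has approved.
        return (not self.requires_approval) or self.approved


def prepare_report(serious: bool, details: str, lat: float, lon: float,
                   manual_review: bool) -> Optional[Report]:
    """Return a draft report for a serious incident, or None otherwise."""
    if not serious:
        return None
    return Report(details, lat, lon, requires_approval=manual_review)


draft = prepare_report(True, "assault observed", 35.68, 139.76, manual_review=True)
```

Keeping the approval state on the report itself means the transmission step only needs to check `ready_to_send()` before contacting the public authorities.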

[0315] (Application Example 1)

[0316] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0317] In recent years, as societal demands for safety have increased, there is a need for early detection and rapid response to illegal and disruptive behavior in public places. However, conventional technologies make immediate response on-site difficult, and there are problems with providing appropriate instructions and reports in emergencies. In such a situation, it is difficult to maintain local security while ensuring the safety of users.

[0318] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0319] In this invention, the server includes a video acquisition means, an analysis means for analyzing the acquired video, a determination means for determining illegality based on the analysis results, an instruction generation means for generating instructions for ensuring safety using the user's location information, and an emergency notification means for automatically notifying pre-set contacts in response to an emergency situation. This enables immediate notification of abnormal situations, provides instructions for the user to act safely, and automatically makes emergency notifications as needed.

[0320] The "video acquisition means" is a device that captures the user's surroundings in real time.

[0321] "Analysis means" refers to a function for processing acquired video data and analyzing its content.

[0322] The "determination means" is a system that evaluates the legality of actions and situations within a video based on the analysis results.

[0323] "Notification means" refers to a method for informing the user of the analysis and judgment results.

[0324] "Reporting means" refers to a method for providing information to public institutions as needed.

[0325] The "instruction generation means" is a function that creates instructions to guide the user to take safe actions based on their location information and circumstances.

[0326] The "emergency notification means" is a system that automatically transmits the situation to pre-set contacts in the event of an emergency.

[0327] This invention provides a system implemented by a smart glasses-type terminal worn by the user and a server for analyzing video data acquired from that terminal. This system has the following specific configuration.

[0328] By wearing smart glasses, users can acquire real-time video of their surroundings. These smart glasses are equipped with a built-in camera that captures the user's field of view as video data. The video data is compressed within the smart glasses and then transmitted to the user's smartphone via communication functions. In this process, the smart glasses use common communication technologies such as Bluetooth and Wi-Fi.

[0329] The smartphone sends the received video data to a cloud server. A 4G or 5G network is used for data communication. The server analyzes the received video data using libraries such as TensorFlow and OpenCV. The server uses a pre-trained AI model to identify and analyze movement and objects within the video.

[0330] Based on the analyzed data, the server determines whether illegal or disruptive activity has occurred. If necessary, it immediately notifies the user's smartphone and provides safety instructions. For example, if the server detects violent activity, advice for ensuring the user's safety will be displayed on the smartphone. Furthermore, if the situation is deemed serious, an automatic notification will be sent to registered emergency contacts.

[0331] As a concrete example, if security personnel are patrolling a shopping mall, the system can detect suspicious customer behavior and send a warning. In this case, an example of a prompt given to the generative AI model would be, "Please tell me how to detect suspicious behavior in the shopping mall, alert staff, and report to the management center if necessary." This enables effective security management.

[0332] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0333] Step 1:

[0334] When a user wears smart glasses, the device uses its built-in camera to capture images in the user's field of view in real time. The input is the real-world scenery within the user's field of view, and the output is digital image data. This digital image data is compressed to enable efficient transmission.

[0335] Step 2:

[0336] The compressed video data is transmitted from the terminal to the user's smartphone via Bluetooth or Wi-Fi. The input is the compressed video data, and the output is the compressed data received on the smartphone. The smartphone then prepares the received data to be sent to the server.

[0337] Step 3:

[0338] The smartphone sends compressed video data to a cloud server using a 4G or 5G network. The input is the compressed data on the smartphone, and the output is the compressed data stored on the server. The server receives this data and begins analysis.

[0339] Step 4:

[0340] The server uses libraries such as OpenCV and TensorFlow to analyze the received video data. The input is compressed video data, and the output is the analysis result. The analysis includes a process of identifying abnormal behavior and characteristic movements in the video using a generative AI model.

[0341] Step 5:

[0342] The server determines whether illegal or disruptive behavior is occurring based on the analysis results. The input is the analysis results, and the output is the judgment result. Here, it compares the results with pre-established behavioral patterns and identifies any anomalies.

[0343] Step 6:

[0344] If an anomaly is detected, the server immediately sends a notification to the smartphone. The input is the judgment result, and the output is the content of the notification on the smartphone. This notification includes recommended actions for the user in response to the detected anomaly.

[0345] Step 7:

[0346] The server automatically notifies pre-configured emergency contacts if a serious anomaly is detected. The input is the result of the serious anomaly detection, and the output is the content of the notification sent to the emergency contacts. This includes location information and details about the situation.

[0347] This series of steps allows users to act safely in real time and respond quickly when necessary.
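Steps 5 to 7 above (comparison with pre-established behavioral patterns, user notification, and automatic notification of emergency contacts) might be sketched as follows. The pattern table, the labels, the message wording, and the names `Outcome` and `judge_and_notify` are all assumptions made for illustration. The source does not define them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical pattern table: label -> (is_serious, recommended action).
KNOWN_PATTERNS = {
    "violence": (True, "Move away and stay in a well-lit area."),
    "pickpocketing": (False, "Keep your belongings close."),
}


@dataclass
class Outcome:
    user_notification: Optional[str] = None
    emergency_messages: List[Tuple[str, str]] = field(default_factory=list)


def judge_and_notify(detected_labels, location, emergency_contacts):
    """Compare analysis labels with known patterns and build notifications."""
    outcome = Outcome()
    for label in detected_labels:
        if label not in KNOWN_PATTERNS:
            continue  # not an anomaly we know about
        serious, action = KNOWN_PATTERNS[label]
        # Step 6: notification with a recommended action for the user.
        outcome.user_notification = f"Detected: {label}. {action}"
        if serious:
            # Step 7: include location and situation for emergency contacts.
            outcome.emergency_messages = [
                (contact, f"Emergency: {label} near {location}")
                for contact in emergency_contacts
            ]
            break  # a serious anomaly takes priority over later labels
    return outcome


result = judge_and_notify(["walking", "violence"], "35.68,139.76", ["contact-a"])
```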

[0348] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0349] This invention provides a system for detecting illegal activities occurring around a user, which achieves more accurate analysis and notification by also considering the user's emotional state. This system is composed of a combination of video acquisition means, analysis means, judgment means, notification means, reporting means, and emotion engine.

[0350] The user wears smart glasses that collect video footage of their surroundings in real time. The device compresses the video data and sends it to a server via a smartphone. On the server, video analysis tools use artificial intelligence to recognize patterns of illegal activity in the video and make a determination of illegality based on that recognition.

[0351] In addition to this process, the device is equipped with an emotion engine that recognizes the user's emotions. The emotion engine evaluates the user's current emotional state based on their facial expressions and voice data, and if a specific emotion such as tension or fear is detected, it provides that information to the analysis system. This allows the analysis system to make a judgment of illegality while also considering the user's emotional state, enabling more contextually appropriate notifications.

[0352] Based on the analysis of illegal activity and feedback from the emotion engine, the server sends a notification to the user's smartphone. This notification includes appropriate action guidelines tailored to the detected illegal activity and the user's own situation. For example, if it is determined that the user is in a state of panic, a notification such as "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location" will be sent.
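One possible way to select the notification wording from the detected activity and the user's emotional state is sketched below. The function name `compose_notification` and the non-panic wording are assumptions. Only the panic message and the emotion labels "tension", "fear", and "panic" come from the text.

```python
def compose_notification(risk_label: str, emotion: str) -> str:
    """Combine the detected risk with wording suited to the user's emotion."""
    if emotion == "panic":
        # Wording taken from the example given in the text.
        return ("You are currently in a dangerous situation. Please remain calm, "
                "assess the situation, and move to a designated safe location.")
    base = f"Detected: {risk_label}."
    if emotion in ("tension", "fear"):
        # Assumed wording: shorter, calming guidance for a stressed user.
        return base + " Stay calm and keep your distance."
    # Assumed default wording when no specific emotion is detected.
    return base + " Please stay aware of your surroundings."


message = compose_notification("suspicious person", "fear")
```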

[0353] Furthermore, the server prepares to report to public authorities if serious illegal activity is detected or if there are concerns about user safety. Detailed situation reports, including information from the emotion engine, are added to the report to enable a swift and appropriate response.

[0354] For example, if the emotion engine determines that a user has witnessed a suspicious person in a crowd and is experiencing heightened anxiety, the server will prioritize the user's safety and immediately contact the police. This process ensures public safety in situations requiring a swift response while maintaining the user's security. By combining emotion recognition with conventional illegal activity detection, this system takes the user's emotional state into consideration and can provide a more flexible and safer environment.

[0355] The following describes the processing flow.

[0356] Step 1:

[0357] When the user activates the smart glasses, the device acquires real-time video through its camera. This video includes information about the surrounding environment, and the user's entire field of vision is collected as digital data.

[0358] Step 2:

[0359] The terminal compresses the video data and transfers it to the user's smartphone via Bluetooth. This prepares the terminal for efficient data transmission to the server.

[0360] Step 3:

[0361] The server receives video data from the smartphone and prepares it to be passed to the analysis module. Maintaining data quality is the focus at this stage.

[0362] Step 4:

[0363] The server uses an artificial intelligence model as a means of video analysis to recognize patterns that may indicate illegal activity from the received video. This process utilizes motion detection and object recognition technologies.

[0364] Step 5:

[0365] The device uses an emotion engine to evaluate the user's emotional state based on facial expression and voice data. For example, it detects changes in facial expression and quantifies anxiety and stress levels.

[0366] Step 6:

[0367] The analysis means integrates the results of illegal activity detection on the server with user emotion data obtained from the emotion engine. This allows for analysis that is more tailored to the user's current situation.

[0368] Step 7:

[0369] Based on the analysis results, the server generates a notification for the user. The notification includes instructions that take into account the identified risks and the user's emotional state.

[0370] Step 8:

[0371] The device sends the generated notification to the user. The notification is presented to the user visually or audibly and includes specific instructions for action.

[0372] Step 9:

[0373] The server automatically initiates reporting procedures if it determines that serious illegal activity has occurred or that user safety is being threatened. Reports include location information and emotion engine data.

[0374] Step 10:

[0375] Users are required to take appropriate action based on the information they receive. They can also verify the information they have received. This ensures the user's own safety while enabling the rapid dissemination of information to public authorities.
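Step 4 above mentions motion detection without specifying a method. As one simple possibility, frame differencing can be sketched in plain Python as follows. Frames are assumed to be lists of rows of grayscale values (0 to 255), and both thresholds are arbitrary illustrative values. In practice an image-processing library such as OpenCV would likely perform the same comparison on arrays.

```python
def motion_score(prev_frame, curr_frame, pixel_threshold=25):
    """Fraction of pixels whose grayscale value changed by more than the threshold."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def has_motion(prev_frame, curr_frame, area_threshold=0.1):
    """Report motion when more than area_threshold of the pixels changed."""
    return motion_score(prev_frame, curr_frame) > area_threshold


still = [[10, 10], [10, 10]]
moved = [[10, 200], [10, 10]]  # one of four pixels changed strongly
```

A score like this only flags that something moved. Deciding whether the movement indicates illegal activity is left to the recognition model.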

[0376] (Example 2)

[0377] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0378] Conventional illegal activity detection systems judge the abnormality of behavior without considering the user's emotional state, making it difficult to accurately grasp the degree of danger the user faces. Furthermore, because there are insufficient means to adequately protect the user's psychological safety in the detection of illegal activity, effective notification and reporting in situations where a swift and appropriate response is required is difficult.

[0379] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0380] In this invention, the server includes means for acquiring video data, means for evaluating and integrating user emotional data into analysis, and means for issuing warnings based on the judgment results and reporting to public authorities as necessary. This enables more accurate risk assessment that takes the user's emotional state into account, as well as appropriate notifications and prompt reporting based on those assessments.

[0381] "Video data" refers to information acquired in digital format for recording or transmitting visual information.

[0382] A "communication network" is a network infrastructure used to transmit digital data from one point to another.

[0383] "Artificial intelligence" is a computer system that learns from large amounts of data and performs pattern recognition and prediction.

[0384] "Emotional data" refers to information about a user's psychological state, obtained from their facial expressions, voice, and other sources.

[0385] A "public institution" is an administrative agency or similar organization established for the public good based on laws and regulations.

[0386] "Notification" is the act of formally conveying information to those who have the right to receive it.

[0387] "Analysis" is the process of examining data in detail and extracting meaningful information from it.

[0388] "Abnormal" refers to a situation that deviates from the normal state or expected conditions.

[0389] "Reporting" is the act of systematically organizing facts and circumstances and making them public.

[0390] This invention is a system that uses advanced analytical techniques to ensure user safety and accurate detection of illegal activities. The user wears smart glasses as a means of acquiring video data, thereby collecting surrounding video data in real time. The terminal efficiently compresses this video data and transmits it to a server via a communication network. For data compression, video compression libraries such as FFmpeg are utilized.
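As a hedged illustration of compression with FFmpeg, the following Python sketch assembles an `ffmpeg` command line that re-encodes a clip to H.264. The source does not specify a codec or settings, so the file names, the choice of `libx264`, the preset, and the default CRF of 28 are assumptions. The command is only built here, not executed.

```python
def build_ffmpeg_command(src, dst, crf=28):
    """Return an argument list that re-encodes src to H.264 at the given quality."""
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51 for 8-bit libx264")
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input file
        "-c:v", "libx264",       # H.264 video encoder (assumed codec)
        "-preset", "veryfast",   # favor encoding speed over compression ratio
        "-crf", str(crf),        # higher value: smaller file, lower quality
        dst,
    ]


command = build_ffmpeg_command("raw.mp4", "small.mp4")
# Running it would be: subprocess.run(command, check=True)
```

A fast preset with a moderately high CRF trades some image quality for lower latency and bandwidth, which matches the stated goal of efficient transmission.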

[0391] The server uses artificial intelligence technology to analyze video data. Specifically, it utilizes libraries such as TensorFlow (a machine learning framework) and OpenCV (a computer vision library) to analyze behavioral patterns in real time using trained models. This allows for the immediate recognition of illegal or abnormal behavior, and enables appropriate responses based on the results.

[0392] Furthermore, the device is equipped with an emotion engine to evaluate the user's emotions. This engine analyzes the user's facial expressions and voice data to identify their current emotional state. Specific methods used include Facial Emotion Recognition models. The server integrates this emotion data with the analysis results, enabling it to make decisions that take the user's mental state into account.

[0393] Furthermore, if the server detects a danger, it will immediately notify the user and provide appropriate instructions. In cases where the user is in a state of panic, the notification will use specific examples such as, "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location."

[0394] The server automatically reports any serious illegal activity to public authorities. The reports include data obtained from the emotion engine to support a swift and appropriate response. This makes it possible to ensure user safety while contributing to broader social safety.

[0395] An example of a prompt regarding the use of a generative AI model is: "Explain how the system detects illegal activity while considering the user's emotional state. Also, provide examples of situations where emotion recognition is useful." This prompt is used to deepen the understanding of how the generative AI model works and its applications.

[0396] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0397] Step 1:

[0398] The user uses smart glasses to collect real-time video data of their surroundings. This video is acquired from a high-resolution camera sensor. The input data consists of the collected video frames, which are then sent to the next processing step.

[0399] Step 2:

[0400] The device compresses the received video data using a video compression library such as FFmpeg. The compressed data is then sent to the server via the smartphone. In this process, the input is uncompressed video data, and the output is a compressed video file.

[0401] Step 3:

[0402] The server applies artificial intelligence models using tools like TensorFlow and OpenCV to analyze the received video data. It analyzes behavioral patterns in the video and recognizes signs of illegality. The input is compressed video data, and the output is a judgment regarding the abnormality of the behavior.

[0403] Step 4:

[0404] The device collects the user's facial expressions and voice data, and analyzes their emotional state using an emotion engine. Here, the emotional state is classified into categories such as "tension" and "anxiety." The input is the user's facial expression and voice data, and the output is the emotional evaluation result.

[0405] Step 5:

[0406] The server integrates the analysis results and feedback from the emotion engine to make an overall judgment. Specifically, it performs an anomaly severity assessment that takes into account the user's emotional state. The input is the behavioral judgment result and the emotion evaluation result, and the output is the integrated risk assessment result.

[0407] Step 6:

[0408] Based on its assessment, the server sends a notification to the user's smartphone, providing action guidelines if necessary. For example, a notification might appear stating, "You are currently in a dangerous situation. Please evacuate to a safe location." The input is the integrated risk assessment result, and the output is the action guidelines provided to the user.

[0409] Step 7:

[0410] In the event of an emergency, the server reports detailed information, including emotional data, to public authorities. It is designed to enable a rapid response through the reporting system. Input is the assessment of a significant risk, and output is the content of the report to public authorities.
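The integrated risk assessment of Step 5 is not given as a formula in the source. One minimal way to combine the behavioral judgment with the emotion evaluation is a weighted blend, sketched below. The scores in [0, 1], the weight of 0.3, the thresholds, and the names `integrated_risk` and `risk_level` are all assumptions.

```python
def integrated_risk(behavior_score, emotion_scores, emotion_weight=0.3):
    """Blend a behavior anomaly score with the strongest emotion score."""
    for value in (behavior_score, *emotion_scores.values()):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be in [0, 1]")
    # Use the strongest of the reported emotions, e.g. "tension" or "anxiety".
    emotion_level = max(emotion_scores.values(), default=0.0)
    return (1 - emotion_weight) * behavior_score + emotion_weight * emotion_level


def risk_level(score):
    """Map the blended score to an action (thresholds are illustrative)."""
    if score >= 0.7:
        return "report"   # Step 7: report to public authorities
    if score >= 0.4:
        return "notify"   # Step 6: send action guidelines to the user
    return "none"


score = integrated_risk(0.8, {"tension": 0.9, "anxiety": 0.5})
```

With these assumed values, strong behavioral evidence combined with high tension crosses the reporting threshold, while the same emotion with weak behavioral evidence does not.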

[0411] (Application Example 2)

[0412] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0413] Conventional illegal activity detection systems analyze only video data and do not consider the user's emotional state, which can lead to incorrect judgments or inappropriate notifications depending on the situation. Furthermore, prioritizing user safety requires a flexible approach that takes their emotional state into account. This system aims to solve these problems.

[0414] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0415] In this invention, the server includes a video acquisition device, an analysis device, a decision-making device, a notification device that notifies the user of appropriate warnings, a notification device that notifies public authorities, and an emotion analysis device that analyzes the user's emotional state. This enables more accurate detection and notification of illegal activities that take into account the user's emotional state.

[0416] A "video acquisition device" is a device used to collect video footage of the user's surroundings in real time.

[0417] An "analysis device" is a device used to analyze acquired video footage and evaluate the behavioral patterns and illegality contained within it.

[0418] A "judgment device" is a device that makes a final determination of illegality based on analysis results and the user's emotional state.

[0419] A "notification device" (for the user) is a device that notifies the user of appropriate warnings or instructions based on the judged results and the user's emotions.

[0420] A "notification device" (for public authorities) is a device that, depending on the situation, notifies public authorities to encourage a swift response.

[0421] An "emotion analysis device" is a device that analyzes the user's emotional state from their facial expressions and voice data, and evaluates states such as tension and fear.

[0422] A "communication device" is a device used to compress acquired video information and transmit it to a remote server via a network.

[0423] "Machine learning technology" is a technique that automatically learns patterns and rules based on large amounts of data and uses them for analysis and decision-making.

[0424] A "pre-trained model" is a model that has been trained in advance using data, and is used to analyze behavioral patterns based on video data and emotional states.

[0425] The system for realizing this invention mainly consists of a terminal, a server, and a user. The user first wears an image acquisition device such as smart glasses to collect images of the surroundings in real time. The collected images are subjected to initial compression processing within the terminal and then transmitted to a remote server via a communication device.

[0426] The server performs video analysis and emotion analysis. The analysis device uses a pre-trained model based on machine learning techniques to analyze specific behavioral patterns and facial expressions from the acquired video data. This makes it possible to predict illegal activities and evaluate the user's emotional state. For example, Python's OpenCV and TensorFlow are used for the analysis.

[0427] Next, the server, through the decision-making device, determines what information should be sent to the user based on the analysis results and emotion data. If action is necessary, the notification device sends a warning to the user's terminal. The notification includes appropriate instructions tailored to the situation, prioritizing the user's safety.

[0428] For example, if a user spots a suspicious person in a crowded place and the emotion analysis device detects a high level of anxiety, the server will take this into consideration and send a warning such as, "Please be careful. Contact public authorities if necessary." In this way, a safer environment is provided for the user.

[0429] Furthermore, the notification device for public authorities prepares notifications to public authorities as needed, and assists in prompt intervention by providing detailed situation reports, including the user's emotional state.

[0430] The generative AI model might use prompts like the following: "Detect patterns of suspicious individuals through video analysis and generate a notification based on the user's current emotional state."

[0431] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0432] Step 1:

[0433] The user wears smart glasses to collect images of their surroundings. The collected video data is compressed and sent from the device to a server. This data processing enables the transfer of video information. The input is real-world video, and the output is compressed video data.

[0434] Step 2:

[0435] The server inputs the received video data into an analysis device, which uses machine learning techniques to detect specific behavioral patterns. This data processing extracts suspicious behavior from the video. The input is compressed video data, and the output is analyzed behavioral pattern information.

[0436] Step 3:

[0437] The server uses an emotion analyzer to assess the user's emotional state. It analyzes facial expressions and voice data to identify the current emotional state. This calculation reveals levels of tension and fear. The input is the user's facial expressions and voice data, and the output is the emotional state assessment result.

[0438] Step 4:

[0439] Based on the analysis results and emotional state, the server determines the illegality of the actions through a decision-making device. This data calculation determines whether reporting or warning is necessary. The input is behavioral pattern information and emotional evaluation results, and the output is the decision result.

[0440] Step 5:

[0441] To ensure user safety, the server uses a notification device to send appropriate warnings to the terminal. Situation-specific instructions are provided to enable users to respond quickly. The input is the judgment result, and the output is the notification message to the user.

[0442] Step 6:

[0443] If necessary, the server prepares a notification to public authorities using the notification device, attaching a detailed situation report including the user's emotional state. This enables a swift response. The input is the judgment result and emotional state, and the output is the notification message.
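The way the devices in Steps 2 to 6 hand data to one another can be sketched as a small pipeline with pluggable functions. This is an architectural illustration only. The class `Pipeline`, the decision labels "none", "warn", and "report", and the stub functions are hypothetical stand-ins for the analysis device, the emotion analysis device, and the decision-making device.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    analyze: Callable[[bytes], List[str]]           # Step 2: behavior patterns
    assess_emotion: Callable[[bytes, bytes], str]   # Step 3: emotional state
    decide: Callable[[List[str], str], str]         # Step 4: "none", "warn", "report"

    def run(self, video: bytes, face: bytes, voice: bytes) -> dict:
        patterns = self.analyze(video)
        emotion = self.assess_emotion(face, voice)
        decision = self.decide(patterns, emotion)
        result = {"decision": decision, "warning": None, "report": None}
        if decision in ("warn", "report"):
            # Step 5: warning sent to the user's terminal.
            result["warning"] = f"Caution: {', '.join(patterns)}"
        if decision == "report":
            # Step 6: situation report including the emotional state.
            result["report"] = {"patterns": patterns, "emotion": emotion}
        return result


# Stub devices standing in for trained models.
demo = Pipeline(
    analyze=lambda video: ["suspicious loitering"],
    assess_emotion=lambda face, voice: "fear",
    decide=lambda patterns, emotion: "report" if patterns and emotion == "fear" else "warn",
)
outcome = demo.run(b"video", b"face", b"voice")
```

Passing each device in as a function keeps the flow testable with stubs and lets a trained model replace any stage without changing the orchestration.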

[0444] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0445] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
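As a rough illustration of the input described above (a prompt plus inference data of several modalities), the following Python sketch bundles both into one request object. The class names, the `run` helper, and `echo_model` are hypothetical and exist only for this example. `echo_model` is a stand-in that performs no inference and does not represent the interface of any actual generative AI service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal

Modality = Literal["audio", "text", "image"]


@dataclass
class InferenceItem:
    modality: Modality
    payload: bytes  # raw bytes; text would be UTF-8 encoded


@dataclass
class InferenceRequest:
    prompt: str  # the instruction given to the model
    items: List[InferenceItem] = field(default_factory=list)


def run(model: Callable[[InferenceRequest], str], request: InferenceRequest) -> str:
    """Pass the prompt and inference data to a model and return its text output."""
    if not request.prompt.strip():
        raise ValueError("a prompt containing an instruction is required")
    return model(request)


def echo_model(request: InferenceRequest) -> str:
    """Hypothetical stand-in for data generation model 58; it performs no inference."""
    kinds = ", ".join(item.modality for item in request.items)
    return f"summary requested for: {kinds}"
```

Keeping the prompt separate from the inference data mirrors the description, in which the prompt carries the instruction and the data carries what is to be analyzed, classified, predicted, or summarized.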

[0446] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0447] [Third Embodiment]

[0448] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0449] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0450] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0451] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0452] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0453] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0454] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0455] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0456] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0457] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0458] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0459] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0460] This invention achieves its effect through video acquisition equipment used by the user and an analysis program linked to that equipment. The embodiments thereof are described below.

[0461] The user first puts on a smart glasses-type device. This device has a built-in camera that can capture video in the user's field of view in real time. After capturing the video, the device compresses the data and sends it to the user's smartphone, which then sends the data to a server.

[0462] The server is responsible for analyzing the received video data. This involves using highly accurate artificial intelligence algorithms to recognize movement and objects within the video and determine whether illegal or disruptive activities are occurring. This decision-making process utilizes advanced pattern recognition technology based on pre-trained datasets.

[0463] Based on the analysis results, the server immediately sends feedback to the user's smartphone. The notification includes details of the detected incident and recommended actions. For example, if an assault is detected, a notification will appear stating, "Your safety may be at risk, so please maintain a safe distance."

[0464] Furthermore, if the server determines through analysis that serious illegal activity has been detected, it will automatically prepare a report to the public authorities. This report will include the type of incident identified, relevant location information, and, if necessary, a portion of the video footage. Users can review the details of this report and, if necessary, manually approve it.

[0465] Thus, the system of the present invention can efficiently and safely support the maintenance of local public order through the cooperation between the user's terminal and the server. For example, if the server determines that someone is illegally creating graffiti in a public place while the user is patrolling a certain area, the user will be immediately notified, and the police will be automatically notified, thereby prompting a swift response. This process makes it possible to contribute to maintaining local order while ensuring the user's safety.

[0466] The following describes the processing flow.

[0467] Step 1:

[0468] When the user activates the system, the device begins acquiring real-time video through the camera on the smart glasses being worn. This collects information within the user's field of vision as digital data.

[0469] Step 2:

[0470] The device transfers the acquired video data to the smartphone via Bluetooth. At this time, the data is appropriately compressed to improve the efficiency of inter-device communication.

[0471] Step 3:

[0472] The server acquires video data received from the user's smartphone. The data is then passed to an analysis module in preparation for immediate analysis.

[0473] Step 4:

[0474] The server begins analyzing the received video using an artificial intelligence algorithm. Here, a trained model is used to recognize objects and movements in the video and evaluate their legality in light of laws and regulations.

[0475] Step 5:

[0476] The server uses the analysis results to determine whether illegal or disruptive behavior has been detected. If the observed values exceed a threshold, the process proceeds to the next step.

[0477] Step 6:

[0478] The server generates feedback to provide to the user based on its assessment. For example, it might create a notification message such as, "Suspicious behavior detected. Please be careful."

[0479] Step 7:

[0480] The terminal notifies the user of feedback sent from the server. Notifications are made via voice and visual interfaces, allowing the user to understand the situation.

[0481] Step 8:

[0482] The server prepares to report to public authorities such as the police if serious illegal activity is detected. The report will include details of the scene, location information, and details of the illegal activity.

[0483] Step 9:

[0484] The user reviews the report content on the terminal and, where manual approval is available, approves the report. This process ensures reliable communication.

[0485] Step 10:

[0486] The reported information is securely transmitted from the server to the appropriate public authorities. This allows for the rapid initiation of necessary on-site responses. The process is then complete.
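The branching in Steps 5 to 10 can be sketched as follows. This is a minimal illustration under stated assumptions: the numeric thresholds, the `Report` fields, and the treatment of manual approval as mandatory before transmission are choices made for this example, whereas the text only says that a threshold is used and that the user can approve the report manually.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    incident_type: str
    location: str
    approved: bool = False


DETECTION_THRESHOLD = 0.6  # assumed; Step 5 only says "a threshold"
SERIOUS_THRESHOLD = 0.9    # assumed cut-off for "serious" illegal activity


def handle_detection(score: float, incident_type: str, location: str,
                     user_approves: bool) -> List[str]:
    """Return the ordered actions taken for one analysis result (Steps 5-10)."""
    actions: List[str] = []
    if score <= DETECTION_THRESHOLD:          # Step 5: threshold not exceeded
        return actions
    actions.append("notify_user")             # Steps 6-7: feedback to the terminal
    if score < SERIOUS_THRESHOLD:
        return actions
    report = Report(incident_type, location)  # Step 8: prepare the report
    actions.append("prepare_report")
    report.approved = user_approves           # Step 9: manual review by the user
    if report.approved:
        actions.append("transmit_report")     # Step 10: send to public authorities
    return actions
```

For example, `handle_detection(0.95, "assault", "station", True)` yields all three actions, while the same call with `user_approves=False` stops after the report has been prepared.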

[0487] (Example 1)

[0488] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0489] In modern society, maintaining public safety requires means to quickly and efficiently detect and respond to illegal and disruptive behavior. However, current technologies have not adequately captured and analyzed real-time video, resulting in delays, especially with massive amounts of data, making rapid response difficult. Furthermore, there has been a lack of coordination between appropriate notifications and automated reporting, even in cases where users themselves may be harmed. It is necessary to solve these problems and support the maintenance of local public safety more efficiently.

[0490] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0491] In this invention, the server includes video acquisition means, data processing means for compressing and transmitting the acquired video, and communication means for transmitting the compressed video data over a network. This enables real-time information analysis, prompt notification to users, and rapid notification to public authorities as needed.

[0492] "Video acquisition means" refers to a device that has the function of acquiring, in real time, the video that comes into the user's field of view.

[0493] "Data processing means" refers to technology that enables efficient data communication by compressing acquired video data.

[0494] "Communication means" refers to a system equipped with the function of transmitting compressed video data to other devices via a network.

[0495] "Analysis means" refers to a technology that uses highly accurate artificial intelligence to analyze behavioral patterns in the process of analyzing received video data.

[0496] "Judgment means" refers to a system that has the function of determining whether or not an action in a video is illegal based on the analysis results.

[0497] "Notification means" refers to a system that provides warnings and information to users based on the judgment results.

[0498] "Reporting means" refers to technology that includes the process of preparing and transmitting reports to public institutions as needed.

[0499] A "generative AI model" is a form of artificial intelligence technology that uses pre-trained datasets to perform sophisticated behavioral pattern analysis.

[0500] This invention is a system that achieves its effectiveness by using a video acquisition device worn by the user and an analysis program that works in conjunction with it. Specifically, the user wears smart glasses-type terminals. These terminals have high-performance cameras built in and can acquire images that come into the user's field of view in real time. This image acquisition allows the user to constantly be aware of their surroundings.

[0501] The acquired video data is first compressed on the device. An efficient codec is used for this compression, minimizing the amount of data transmitted. The compressed data is then sent to the server via the user's smartphone. A secure communication protocol is used for transmission, guaranteeing data confidentiality.
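The compress-then-send path described above can be sketched with the Python standard library. This is an illustration under assumptions, not the disclosed implementation: `zlib` is a lossless general-purpose compressor used here only as a stand-in for a video codec, `UPLOAD_URL` is a hypothetical endpoint, and the request is built but never sent.

```python
import urllib.request
import zlib

UPLOAD_URL = "https://example.com/upload"  # hypothetical endpoint


def compress_chunk(raw: bytes) -> bytes:
    """Stand-in for a video codec: lossless zlib compression of one chunk."""
    return zlib.compress(raw, level=6)


def build_upload(chunk: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying one compressed chunk."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=chunk,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


raw = bytes(64 * 64)  # a dummy all-black 64x64 grayscale frame
request = build_upload(compress_chunk(raw))
```

A real deployment would use a video codec rather than `zlib`, and the confidentiality mentioned in the text would come from the TLS layer underneath the `https` scheme rather than from anything in this snippet.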

[0502] The server analyzes the received video data using a generative AI model. This AI model utilizes advanced pattern recognition technology to recognize movement and objects within the video in real time. As a result, it can quickly determine whether illegal or disruptive activities are occurring.

[0503] The analysis results are immediately fed back to the user and sent directly as a notification to the smart glasses. The notification includes details of the recognized event and recommended next actions. Based on this information, the user can respond quickly and safely.

[0504] For example, if suspicious behavior comes into view while the user is walking around town, the server can analyze the movement, determine whether it is illegal, and send a notification to the user saying, "There is a suspicious person. Please keep your distance." Furthermore, if it is determined that reporting is necessary, the server can automatically prepare the necessary information for reporting to public authorities.

[0505] Examples of prompts include, "Please explain how a system for detecting suspicious behavior in public places works," and "Please explain a method for performing real-time video analysis using smart glasses."

[0506] Thus, the system of the present invention enables efficient and safe support for maintaining the safety of local communities through the cooperation between a video acquisition device used by the user and a server.

[0507] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0508] Step 1:

[0509] The device acquires video footage that is within the user's field of view.

[0510] The input is real-time video from the user's field of view, and the output is digitized video data. A camera built into the device operates continuously, capturing real-time video along the user's line of sight. This ensures that the user's surroundings are faithfully recorded.

[0511] Step 2:

[0512] The device compresses the acquired video data.

[0513] The input is the raw video data obtained in Step 1, and the output is compressed video data. An efficient codec is used for the compression process to reduce the amount of data, thereby improving communication efficiency. This makes it possible to minimize transmission delay while maintaining video quality.

[0514] Step 3:

[0515] The device sends compressed video data to the smartphone.

[0516] The input is the compressed video data obtained in step 2, and the output is the data transferred to the smartphone. The terminal uses wireless communication such as Bluetooth or Wi-Fi to quickly transmit the data to the smartphone. This process allows the user's smartphone to obtain the data necessary for the next processing step.

[0517] Step 4:

[0518] The smartphone sends the received data to the server.

[0519] The input is compressed video data stored on a smartphone, and the output is data sent to a server. The smartphone uses a secure communication protocol (e.g., HTTPS) to ensure the confidentiality and integrity of the data before sending it to the server. This prepares the server for analysis.

[0520] Step 5:

[0521] The server analyzes the received video data.

[0522] The input is compressed video data stored on the server, and the output is the analysis results. The server utilizes a generative AI model and advanced pattern recognition to identify dynamic elements and objects within the video and determine whether or not illegal activity is occurring. This analysis process is performed in real time, resulting in highly accurate results.

[0523] Step 6:

[0524] The server sends a notification to the user based on the analysis results.

[0525] The input is the analysis results obtained in step 5, and the output is notification information for the user. The server sends the dangers and recommended actions identified in the analysis to the user's smartphone, displaying them as notifications on the device. This allows the user to immediately understand the situation and take appropriate action.

[0526] Step 7:

[0527] The server prepares to report if serious illegal activity is detected.

[0528] The input is the result of the illegality analysis in Step 5, and the output is the information prepared for reporting. The server aggregates the details of the incident and location information, automatically creates the content of the report to the public authorities, and supports immediate response. Users can manually review and approve the report as needed. This function enables a quick and effective response to danger.
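Because each step above is described as an input and an output, the flow can be sketched as a chain of stages in which every stage consumes the previous stage's output. The stage functions below are hypothetical placeholders: `acquire` returns dummy bytes, `compress` uses `zlib` as a stand-in for a codec, `relay` models the terminal-to-smartphone-to-server hops as a pass-through, and `analyze` returns a fixed result without any inference.

```python
import zlib
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], data: Any) -> Tuple[Any, List[str]]:
    """Feed each stage's output into the next and record the execution order."""
    trace: List[str] = []
    for name, stage in stages:
        data = stage(data)
        trace.append(name)
    return data, trace


def acquire(_: Any) -> bytes:       # Step 1: dummy frame data
    return b"frame" * 100


def compress(raw: bytes) -> bytes:  # Step 2: zlib as a codec stand-in
    return zlib.compress(raw)


def relay(chunk: bytes) -> bytes:   # Steps 3-4: pass-through for the two hops
    return chunk


def analyze(chunk: bytes) -> dict:  # Step 5: fixed placeholder result
    return {"illegal": False, "serious": False, "bytes_received": len(chunk)}


STAGES: List[Stage] = [
    ("acquire", acquire), ("compress", compress),
    ("relay", relay), ("analyze", analyze),
]
```

Calling `run_pipeline(STAGES, None)` returns the analysis result together with the list of stage names in the order they ran. Steps 6 and 7 would branch on that result.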

[0529] (Application Example 1)

[0530] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0531] In recent years, as societal demands for safety have increased, there is a need for early detection and rapid response to illegal and disruptive behavior in public places. However, conventional technologies make immediate response on-site difficult, and there are problems with providing appropriate instructions and reports in emergencies. In such a situation, it is difficult to maintain local security while ensuring the safety of users.

[0532] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0533] In this invention, the server includes a video acquisition means, an analysis means for analyzing the acquired video, a determination means for determining illegality based on the analysis results, an instruction generation means for generating instructions for ensuring safety using the user's location information, and an emergency notification means for automatically notifying pre-set contacts in response to an emergency situation. This enables immediate notification of abnormal situations, provides instructions for the user to act safely, and automatically makes emergency notifications as needed.

[0534] "Video acquisition means" refers to a device that captures the user's surroundings in real time.

[0535] "Analysis means" refers to a function for processing acquired video data and analyzing its content.

[0536] "Determination means" refers to a system that evaluates the legality of actions and situations within a video based on the analysis results.

[0537] "Notification means" refers to a method for informing the user of the analysis and judgment results.

[0538] "Reporting means" refers to a method for providing information to public institutions as needed.

[0539] "Instruction generation means" refers to a function that creates instructions to guide the user to take safe actions based on their location information and circumstances.

[0540] "Emergency notification means" refers to a system that automatically transmits the situation to pre-set contacts in the event of an emergency.

[0541] This invention provides a system implemented by a smart glasses-type terminal worn by the user and a server for analyzing video data acquired from that terminal. This system has the following specific configuration.

[0542] By wearing smart glasses, users can acquire real-time video of their surroundings. These smart glasses are equipped with a built-in camera that captures the user's field of view as video data. The video data is compressed within the smart glasses and then transmitted to the user's smartphone via communication functions. In this process, the smart glasses use common communication technologies such as Bluetooth and Wi-Fi.

[0543] The smartphone sends the received video data to a cloud server. A 4G or 5G network is used for data communication. The server analyzes the received video data using libraries such as TensorFlow and OpenCV. The server uses a pre-trained AI model to identify and analyze movement and objects within the video.
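The disclosure names TensorFlow and OpenCV but does not specify the analysis algorithm. As a purely illustrative stand-in for the movement-identification part, the sketch below applies simple frame differencing in plain Python to two small grayscale frames. The function names and the values of `pixel_delta` and `min_fraction` are assumptions, and this technique is a substitute chosen for the example, not the pre-trained model described in the text.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows, values 0-255


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than pixel_delta."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def has_motion(prev: Frame, curr: Frame, min_fraction: float = 0.05) -> bool:
    """Flag motion when enough pixels changed between consecutive frames."""
    return changed_fraction(prev, curr) >= min_fraction
```

Frame differencing only reports that something moved. Recognizing what moved, or whether the movement is unlawful, would still require the trained model that the text refers to.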

[0544] Based on the analyzed data, the server determines whether illegal or disruptive activity has occurred. If necessary, it immediately notifies the user's smartphone and provides safety instructions. For example, if the server detects violent activity, advice for ensuring the user's safety will be displayed on the smartphone. Furthermore, if the situation is deemed serious, an automatic notification will be sent to registered emergency contacts.

[0545] As a concrete example, if security personnel are patrolling a shopping mall, the system can detect suspicious customer behavior and send a warning. In this case, an example of a prompt given to the generative AI model would be, "Please tell me how to detect suspicious behavior in the shopping mall, alert staff, and report to the management center if necessary." This enables effective security management.

[0546] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0547] Step 1:

[0548] When a user wears smart glasses, the device uses its built-in camera to capture images in the user's field of view in real time. The input is the real-world scenery within the user's field of view, and the output is digital image data. This digital image data is compressed to enable efficient transmission.

[0549] Step 2:

[0550] The compressed video data is transmitted from the terminal to the user's smartphone via Bluetooth or Wi-Fi. The input is the compressed video data, and the output is the compressed data received on the smartphone. The smartphone then prepares the received data to be sent to the server.

[0551] Step 3:

[0552] The smartphone sends compressed video data to a cloud server using a 4G or 5G network. The input is the compressed data on the smartphone, and the output is the compressed data stored on the server. The server receives this data and begins analysis.

[0553] Step 4:

[0554] The server uses libraries such as OpenCV and TensorFlow to analyze the received video data. The input is compressed video data, and the output is the analysis result. The analysis includes a process of identifying abnormal behavior and characteristic movements in the video using a generative AI model.

[0555] Step 5:

[0556] The server determines whether illegal or disruptive behavior is occurring based on the analysis results. The input is the analysis results, and the output is the judgment result. Here, it compares the results with pre-established behavioral patterns and identifies any anomalies.

[0557] Step 6:

[0558] If an anomaly is detected, the server immediately sends a notification to the smartphone. The input is the judgment result, and the output is the content of the notification on the smartphone. This notification includes recommended actions for the user in response to the detected anomaly.

[0559] Step 7:

[0560] The server automatically notifies pre-configured emergency contacts if a serious anomaly is detected. The input is the result of the serious anomaly detection, and the output is the content of the notification sent to the emergency contacts. This includes location information and details about the situation.

[0561] This series of steps allows users to act safely in real time and respond quickly when necessary.
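Steps 5 to 7 above, namely comparison with pre-established behavioral patterns, notification of the user, and automatic notification of emergency contacts, can be sketched as a table lookup. The contents of `PATTERNS`, the severity labels, the message wording, and the contact names are hypothetical and serve only to illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical pattern table: label -> (severity, recommended action)
PATTERNS: Dict[str, Tuple[str, str]] = {
    "shoplifting": ("minor", "Alert nearby staff."),
    "violence": ("serious", "Keep a safe distance and move to a safe place."),
}


@dataclass
class Notification:
    recipient: str
    message: str


def build_notifications(label: str, location: str,
                        emergency_contacts: List[str]) -> List[Notification]:
    """Steps 5-7: match a detected label and build the resulting notifications."""
    if label not in PATTERNS:  # Step 5: no known anomaly
        return []
    severity, advice = PATTERNS[label]
    out = [Notification("user", f"{label} detected. {advice}")]  # Step 6
    if severity == "serious":                                     # Step 7
        out += [Notification(contact, f"{label} detected at {location}.")
                for contact in emergency_contacts]
    return out
```

In this sketch the user is always notified first, and emergency contacts receive a message containing the location only when the matched pattern is marked serious.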

[0562] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0563] This invention provides a system for detecting illegal activities occurring around a user, which achieves more accurate analysis and notification by also considering the user's emotional state. This system is composed of a combination of video acquisition means, analysis means, judgment means, notification means, reporting means, and emotion engine.

[0564] The user wears smart glasses that collect video footage of their surroundings in real time. The device compresses the video data and sends it to a server via a smartphone. On the server, video analysis tools use artificial intelligence to recognize patterns of illegal activity in the video and make a determination of illegality based on that recognition.

[0565] In addition to this process, the device is equipped with an emotion engine that recognizes the user's emotions. The emotion engine evaluates the user's current emotional state based on their facial expressions and voice data, and if a specific emotion such as tension or fear is detected, it provides that information to the analysis system. This allows the analysis system to make a judgment of illegality while also considering the user's emotional state, enabling more contextually appropriate notifications.

[0566] Based on the analysis of illegal activity and feedback from the emotion engine, the server sends a notification to the user's smartphone. This notification includes appropriate action guidelines tailored to the detected illegal activity and the user's own situation. For example, if it is determined that the user is in a state of panic, a notification such as "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location" will be sent.
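A minimal sketch of how the notification wording might be tailored to the user's emotional state follows. The function name, the 0.0 to 1.0 fear score, and the `panic_level` cut-off are assumptions. The panic message reuses the example wording from the paragraph above, while the fallback wording is invented for this illustration.

```python
def compose_notification(activity: str, fear: float, panic_level: float = 0.7) -> str:
    """Pick wording for the user based on the detected activity and fear score."""
    if fear >= panic_level:
        # Wording taken from the example message in the description.
        return ("You are currently in a dangerous situation. Please remain calm, "
                "assess the situation, and move to a designated safe location.")
    # Hypothetical default wording for a user who is not in a state of panic.
    return f"{activity.capitalize()} detected nearby. Please be careful."
```

For instance, `compose_notification("assault", 0.9)` selects the calming message, whereas a fear score of 0.2 yields the shorter default warning.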

[0567] Furthermore, the server prepares to report to public authorities if serious illegal activity is detected or if there are concerns about user safety. Detailed situation reports, including information from the emotion engine, are added to the report to enable a swift and appropriate response.

[0568] For example, if the emotion engine determines that a user has witnessed a suspicious person in a crowd and is experiencing heightened anxiety, the server will prioritize the user's safety and immediately contact the police. This process ensures public safety in situations requiring a swift response while maintaining the user's security. By combining emotion recognition with ordinary illegal activity detection, this system takes the user's emotional state into consideration and can provide a more flexible and safer environment.

[0569] The following describes the processing flow.

[0570] Step 1:

[0571] When the user activates the smart glasses, the device acquires real-time video through its camera. This video includes information about the surrounding environment, and the user's entire field of vision is collected as digital data.

[0572] Step 2:

[0573] The terminal compresses the video data and transfers it to the user's smartphone via Bluetooth. This prepares the terminal for efficient data transmission to the server.

[0574] Step 3:

[0575] The server receives video data from the smartphone and prepares it to be passed to the analysis module. Maintaining data quality is the focus at this stage.

[0576] Step 4:

[0577] The server uses an artificial intelligence model as a means of video analysis to recognize patterns that may indicate illegal activity from the received video. This process utilizes motion detection and object recognition technologies.

[0578] Step 5:

[0579] The device uses an emotion engine to evaluate the user's emotional state based on facial expression and voice data. For example, it detects changes in facial expression and quantifies anxiety and stress levels.

[0580] Step 6:

[0581] The analysis means integrates the results of illegal activity detection on the server with user emotion data obtained from the emotion engine. This allows for analysis that is more tailored to the user's current situation.

[0582] Step 7:

[0583] Based on the analysis results, the server generates a notification for the user. The notification includes instructions that take into account the identified risks and the user's emotional state.

[0584] Step 8:

[0585] The device sends the generated notification to the user. The notification is presented to the user visually or audibly and includes specific instructions for action.

[0586] Step 9:

[0587] The server automatically initiates reporting procedures if it determines that serious illegal activity has occurred or that user safety is being threatened. Reports include location information and emotion engine data.

[0588] Step 10:

[0589] Users are required to take appropriate action based on the information they receive. They can also verify the information they have received. This ensures the user's own safety while enabling the rapid dissemination of information to public authorities.

[0590] (Example 2)

[0591] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0592] Conventional illegal activity detection systems judge the abnormality of behavior without considering the user's emotional state, making it difficult to accurately grasp the degree of danger the user faces. Furthermore, because there are insufficient means to adequately protect the user's psychological safety in the detection of illegal activity, effective notification and reporting in situations where a swift and appropriate response is required is difficult.

[0593] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0594] In this invention, the server includes means for acquiring video data, means for evaluating and integrating user emotional data into analysis, and means for issuing warnings based on the judgment results and reporting to public authorities as necessary. This enables more accurate risk assessment that takes the user's emotional state into account, as well as appropriate notifications and prompt reporting based on those assessments.

[0595] "Video data" refers to information acquired in digital format for recording or transmitting visual information.

[0596] A "communication network" is a network infrastructure used to transmit digital data from one point to another.

[0597] "Artificial intelligence" is a computer system that learns from large amounts of data and performs pattern recognition and prediction.

[0598] "Emotional data" refers to information about a user's psychological state, obtained from their facial expressions, voice, and other sources.

[0599] A "public institution" is an administrative agency or similar organization established for the public good based on laws and regulations.

[0600] "Notification" is the act of formally conveying information to those who have the right to receive it.

[0601] "Analysis" is the process of examining data in detail and extracting meaningful information from it.

[0602] "Abnormal" refers to a situation that deviates from the normal state or expected conditions.

[0603] "Reporting" is the act of systematically organizing facts and circumstances and making them public.

[0604] This invention is a system that uses advanced analytical techniques to ensure user safety and accurate detection of illegal activities. The user wears smart glasses as a means of acquiring video data, thereby collecting surrounding video data in real time. The terminal compresses this video data and transmits it to a server via a communication network. For data compression, a video compression library such as FFmpeg is used.
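The compression step can be illustrated with a minimal sketch that only builds an FFmpeg command line. The codec (libx264), the preset, and the CRF value are assumptions chosen for illustration; the source names FFmpeg but does not specify any encoding settings. The function name `build_ffmpeg_command` is hypothetical.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, crf: int = 28) -> List[str]:
    """Return an FFmpeg argument list that re-encodes src to H.264.

    Assumed settings: libx264 encoder, 'veryfast' preset, CRF-based quality.
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51 for 8-bit libx264")
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it exists
        "-i", src,              # input file
        "-c:v", "libx264",      # H.264 video encoder (assumed codec)
        "-preset", "veryfast",  # favor encoding speed on a small terminal
        "-crf", str(crf),       # higher CRF gives a smaller file, lower quality
        dst,
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a terminal where FFmpeg is installed; the sketch itself does not execute anything.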

[0605] The server uses artificial intelligence technology to analyze video data. Specifically, it uses libraries such as TensorFlow (a machine learning framework) and OpenCV (a computer vision library) to analyze behavioral patterns in real time with trained models. This allows illegal or abnormal behavior to be recognized immediately and enables an appropriate response based on the results.

[0606] Furthermore, the device is equipped with an emotion engine to evaluate the user's emotions. This engine analyzes the user's facial expressions and voice data to identify their current emotional state. Specific methods used include Facial Emotion Recognition models. The server integrates this emotion data with the analysis results, enabling it to make decisions that take the user's mental state into account.
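One way the emotion engine might combine facial-expression and voice results is a simple weighted average of per-category scores. The following is a sketch under that assumption; the source does not state how the two inputs are combined, and the function name, the weight, and the category labels are hypothetical.

```python
from typing import Dict, Tuple


def fuse_emotion_scores(
    face: Dict[str, float],
    voice: Dict[str, float],
    face_weight: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Blend per-category scores from two modalities and return the top label.

    A category missing from one modality is treated as a score of 0.0.
    """
    labels = set(face) | set(voice)
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    top = max(sorted(fused), key=fused.get)
    return top, fused
```

In practice the per-category scores would come from the facial emotion recognition model and a voice model; here they are plain dictionaries so that the blending logic is visible on its own.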

[0607] Furthermore, if the server detects a danger, it immediately notifies the user and provides appropriate instructions. For example, when the user is in a state of panic, the notification reads, "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location."
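The emotion-dependent choice of message can be sketched as follows. The panic message is the one quoted above, and the default message is the one quoted in paragraph [0622]. The label "panic" and the function name `select_warning` are assumptions made for illustration.

```python
from typing import Optional

PANIC_MESSAGE = (
    "You are currently in a dangerous situation. Please remain calm, "
    "assess the situation, and move to a designated safe location."
)
DEFAULT_MESSAGE = (
    "You are currently in a dangerous situation. "
    "Please evacuate to a safe location."
)


def select_warning(danger_detected: bool, emotion: str) -> Optional[str]:
    """Return the warning text for the user, or None when no danger is detected."""
    if not danger_detected:
        return None
    # The string "panic" is an assumed label produced by the emotion engine.
    return PANIC_MESSAGE if emotion == "panic" else DEFAULT_MESSAGE
```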

[0608] The server automatically reports any serious illegal activity to public authorities. The reports include data obtained from the emotion engine to support a swift and appropriate response. This allows us to ensure user safety while contributing to broader social safety.

[0609] An example of a prompt regarding the use of a generative AI model is: "Explain how the system detects illegal activity while considering the user's emotional state. Also, provide examples of situations where emotion recognition is useful." This prompt is used to deepen the understanding of how the generative AI model works and its applications.

[0610] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0611] Step 1:

[0612] The user uses smart glasses to collect real-time video data of their surroundings. This video is acquired from a high-resolution camera sensor. The input data consists of the collected video frames, which are then sent to the next processing step.

[0613] Step 2:

[0614] The device compresses the received video data using a video compression library such as FFmpeg. The compressed data is then sent to the server via the smartphone. In this process, the input is uncompressed video data, and the output is a compressed video file.

[0615] Step 3:

[0616] The server applies artificial intelligence models using tools like TensorFlow and OpenCV to analyze the received video data. It analyzes behavioral patterns in the video and recognizes signs of illegality. The input is compressed video data, and the output is a judgment regarding the abnormality of the behavior.

[0617] Step 4:

[0618] The device collects the user's facial expressions and voice data, and analyzes their emotional state using an emotion engine. Here, the emotional state is classified into categories such as "tension" and "anxiety." The input is the user's facial expression and voice data, and the output is the emotional evaluation result.

[0619] Step 5:

[0620] The server integrates the analysis results and feedback from the emotion engine to make an overall judgment. Specifically, it performs an anomaly severity assessment that takes into account the user's emotional state. The input is the behavioral judgment result and the emotion evaluation result, and the output is the integrated risk assessment result.

[0621] Step 6:

[0622] Based on its assessment, the server sends a notification to the user's smartphone, providing action guidelines if necessary. For example, a notification might appear stating, "You are currently in a dangerous situation. Please evacuate to a safe location." The input is the integrated risk assessment result, and the output is the action guidelines provided to the user.

[0623] Step 7:

[0624] In the event of an emergency, the server reports detailed information, including emotional data, to public authorities. It is designed to enable a rapid response through the reporting system. Input is the assessment of a significant risk, and output is the content of the report to public authorities.
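Steps 5 through 7 can be sketched as a small scoring function. The weighted sum, the weight, and both thresholds are assumptions; the source states only that the behavioral judgment and the emotion evaluation are integrated into a risk assessment that drives notification and reporting.

```python
from dataclasses import dataclass

EMOTION_WEIGHT = 0.3    # assumed share of the score contributed by emotion
WARN_THRESHOLD = 0.5    # assumed level at which the user is notified (Step 6)
REPORT_THRESHOLD = 0.8  # assumed level at which authorities are informed (Step 7)


@dataclass(frozen=True)
class RiskAssessment:
    score: float
    notify_user: bool
    report_to_authorities: bool


def integrate_risk(behavior_score: float, emotion_score: float) -> RiskAssessment:
    """Combine a behavior score and an emotion score, both in [0, 1] (Step 5)."""
    for name, value in (("behavior_score", behavior_score),
                        ("emotion_score", emotion_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    score = (1.0 - EMOTION_WEIGHT) * behavior_score + EMOTION_WEIGHT * emotion_score
    return RiskAssessment(
        score=score,
        notify_user=score >= WARN_THRESHOLD,
        report_to_authorities=score >= REPORT_THRESHOLD,
    )
```

With these assumed values, a behavior score of 0.6 and an emotion score of 0.5 would lead to a user notification but not to a report.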

[0625] (Application Example 2)

[0626] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0627] Conventional illegal activity detection systems analyze only video data and do not consider the user's emotional state, which can lead to incorrect judgments or inappropriate notifications depending on the situation. Furthermore, prioritizing user safety requires a flexible approach that takes their emotional state into account. This system aims to solve these problems.

[0628] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0629] In this invention, the server includes a video acquisition device, an analysis device, a decision-making device, a notification device that notifies the user of appropriate warnings, a notification device that notifies public authorities, and an emotion analysis device that analyzes the user's emotional state. This enables more accurate detection and notification of illegal activities that take into account the user's emotional state.

[0630] A "video acquisition device" is a device used to collect video footage of the user's surroundings in real time.

[0631] An "analysis device" is a device used to analyze acquired video footage and evaluate the behavioral patterns and illegality contained within it.

[0632] A "judgment device" is a device that makes a final determination of illegality based on analysis results and the user's emotional state.

[0633] A "notification device" is a device that notifies the user of appropriate warnings or instructions based on the judged results and the user's emotions.

[0634] The other "notification device," directed at public authorities, is a device that, depending on the situation, notifies those authorities to encourage a swift response.

[0635] An "emotion analysis device" is a device that analyzes the user's emotional state from their facial expressions and voice data, and evaluates states such as tension and fear.

[0636] A "communication device" is a device used to compress acquired video information and transmit it to a remote server via a network.

[0637] "Machine learning technology" is a technique that automatically learns patterns and rules based on large amounts of data and uses them for analysis and decision-making.

[0638] A "pre-trained model" is a model that has been trained in advance using data, and is used to analyze behavioral patterns based on video data and emotional states.

[0639] The system for realizing this invention mainly consists of a terminal, a server, and a user. The user first wears a video acquisition device such as smart glasses to collect video of the surroundings in real time. The collected video is subjected to initial compression processing within the terminal and then transmitted to a remote server via a communication device.

[0640] The server performs video analysis and emotion analysis. The analysis device uses a pre-trained model based on machine learning techniques to analyze specific behavioral patterns and facial expressions from the acquired video data. This makes it possible to predict illegal activities and evaluate the user's emotional state. For example, Python's OpenCV and TensorFlow are used for the analysis.

[0641] Next, the server, through the decision-making device, determines what information should be sent to the user based on the analysis results and emotion data. If action is necessary, the notification device sends a warning to the user's terminal. The notification includes appropriate instructions tailored to the situation, prioritizing the user's safety.

[0642] For example, if a user spots a suspicious person in a crowded place and the emotion analysis device detects a high level of anxiety, the server will take this into consideration and send a warning such as, "Please be careful. Contact public authorities if necessary." In this way, a safer environment is provided for the user.

[0643] Furthermore, the notification system prepares notifications to public authorities as needed, and assists in prompt intervention by providing detailed situation reports, including the user's emotional state.

[0644] The generative AI model might use prompts like the following: "Detect patterns of suspicious individuals through video analysis and generate a notification based on the user's current emotional state."
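As an illustration, the example prompt could be combined with the current analysis context before being passed to the model. The following is a minimal sketch; the function name `build_prompt` and the two context fields appended to the prompt are assumptions, since the source gives only the instruction text itself.

```python
def build_prompt(detected_pattern: str, emotion: str) -> str:
    """Fill the example prompt with the current analysis context."""
    if not detected_pattern.strip() or not emotion.strip():
        raise ValueError("detected_pattern and emotion must be non-empty")
    return (
        "Detect patterns of suspicious individuals through video analysis and "
        "generate a notification based on the user's current emotional state.\n"
        # The two context lines below are an assumed format, not from the source.
        f"Detected pattern: {detected_pattern}\n"
        f"User emotional state: {emotion}"
    )
```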

[0645] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0646] Step 1:

[0647] The user wears smart glasses to collect images of their surroundings. The collected video data is compressed and sent from the device to a server. This data processing enables the transfer of video information. The input is real-world video, and the output is compressed video data.

[0648] Step 2:

[0649] The server inputs the received video data into an analysis device, which uses machine learning techniques to detect specific behavioral patterns. This data processing extracts suspicious behavior from the video. The input is compressed video data, and the output is analyzed behavioral pattern information.

[0650] Step 3:

[0651] The server uses the emotion analysis device to assess the user's emotional state. It analyzes facial expressions and voice data to identify the current emotional state. This calculation reveals levels of tension and fear. The input is the user's facial expressions and voice data, and the output is the emotional state assessment result.

[0652] Step 4:

[0653] Based on the analysis results and emotional state, the server determines the illegality of the actions through a decision-making device. This data calculation determines whether reporting or warning is necessary. The input is behavioral pattern information and emotional evaluation results, and the output is the decision result.

[0654] Step 5:

[0655] To ensure user safety, the server uses a notification device to send appropriate warnings to the terminal. Situation-specific instructions are provided to enable users to respond quickly. The input is the judgment result, and the output is the notification message to the user.

[0656] Step 6:

[0657] If necessary, the server prepares a notification to public authorities using the notification device, attaching a detailed situation report including the user's emotional state. This enables a swift response. The input is the judgment result and emotional state, and the output is the notification message.

[0658] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0659] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0660] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0661] [Fourth Embodiment]

[0662] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0663] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0664] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0665] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0666] The microphone 238 receives instructions from the user 20 by picking up the user's voice. It captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0667] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0668] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0669] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0670] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0671] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0672] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0673] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0674] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0675] This invention is realized by a video acquisition device used by the user and an analysis program that works in conjunction with it. An embodiment is described below.

[0676] The user first puts on a smart glasses-type device. This device has a built-in camera that can capture video in the user's field of view in real time. After capturing the video, the device compresses the data and sends it to the user's smartphone, which then sends the data to a server.

[0677] The server is responsible for analyzing the received video data. This involves using highly accurate artificial intelligence algorithms to recognize movement and objects within the video and determine whether illegal or disruptive activities are occurring. This decision-making process utilizes advanced pattern recognition technology based on pre-trained datasets.

[0678] Based on the analysis results, the server immediately sends feedback to the user's smartphone. The notification includes details of the detected incident and recommended actions. For example, if an assault is detected, a notification will appear stating, "Your safety may be at risk, so please maintain a safe distance."

[0679] Furthermore, if the server determines through analysis that serious illegal activity has been detected, it will automatically prepare a report to the public authorities. This report will include the type of incident identified, relevant location information, and, if necessary, a portion of the video footage. Users can review the details of this report and, if necessary, manually approve it.
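The report described above can be sketched as a small data structure. The field names, the JSON encoding, and the rule that serialization is blocked until the user approves are all assumptions; the source says only that the report contains the incident type, location information, and optionally part of the footage, and that the user can review and approve it.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ReportDraft:
    incident_type: str
    latitude: float
    longitude: float
    video_clip_ref: Optional[str] = None  # optional excerpt of the footage
    approved_by_user: bool = False

    def approve(self) -> None:
        """Record that the user has reviewed and approved the draft."""
        self.approved_by_user = True

    def to_json(self) -> str:
        """Serialize the report; assumed policy: only approved drafts are sent."""
        if not self.approved_by_user:
            raise PermissionError("report has not been approved by the user")
        return json.dumps(asdict(self), sort_keys=True)
```

A deployment that sends reports without manual approval would simply drop the check in `to_json`.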

[0680] Thus, the system of the present invention can efficiently and safely support the maintenance of local public order through the cooperation between the user's terminal and the server. For example, if the server determines that someone is illegally creating graffiti in a public place while the user is patrolling a certain area, the user will be immediately notified, and the police will be automatically notified, thereby prompting a swift response. This process makes it possible to contribute to maintaining local order while ensuring the user's safety.

[0681] The following describes the processing flow.

[0682] Step 1:

[0683] When the user activates the system, the device begins acquiring real-time video through the camera on the smart glasses being worn. This collects information within the user's field of vision as digital data.

[0684] Step 2:

[0685] The device transfers the acquired video data to the smartphone via Bluetooth. At this time, the data is appropriately compressed to improve the efficiency of inter-device communication.

[0686] Step 3:

[0687] The server acquires video data received from the user's smartphone. The data is then passed to an analysis module in preparation for immediate analysis.

[0688] Step 4:

[0689] The server begins analyzing the received video using an artificial intelligence algorithm. Here, a trained model is used to recognize objects and movements in the video and evaluate their legality in light of laws and regulations.

[0690] Step 5:

[0691] The server uses the analysis results to determine whether illegal or disruptive behavior has been detected. If the observed values exceed a threshold, the process proceeds to the next step.

[0692] Step 6:

[0693] The server generates feedback to provide to the user based on its assessment. For example, it might create a notification message such as, "Suspicious behavior detected. Please be careful."

[0694] Step 7:

[0695] The terminal notifies the user of feedback sent from the server. Notifications are made via voice and visual interfaces, allowing the user to understand the situation.

[0696] Step 8:

[0697] The server prepares to report to public authorities such as the police if serious illegal activity is detected. The report will include details of the scene, location information, and details of the illegal activity.

[0698] Step 9:

[0699] The user reviews the report content via their device and, where manual approval is available, approves the report. This process ensures reliable communication.

[0700] Step 10:

[0701] The reported information is securely transmitted from the server to the appropriate public authorities. This allows for the rapid initiation of necessary on-site responses. The process is then complete.
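The branching in Steps 5 through 10 can be sketched as a function that returns the stages reached for one analysis result. Both threshold values and the stage names are assumptions; the source states only that processing continues when the observed value exceeds a threshold and that a report is prepared when serious illegal activity is detected.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    NO_ACTION = "no_action"
    USER_NOTIFIED = "user_notified"
    REPORT_PENDING_APPROVAL = "report_pending_approval"
    REPORT_SENT = "report_sent"


DETECTION_THRESHOLD = 0.6  # assumed value for Step 5
SERIOUS_THRESHOLD = 0.9    # assumed value for Step 8


def run_flow(score: float, user_approves: bool) -> List[Stage]:
    """Return the stages reached for one analysis score (Steps 5 to 10)."""
    if score <= DETECTION_THRESHOLD:  # Step 5: threshold not exceeded
        return [Stage.NO_ACTION]
    stages = [Stage.USER_NOTIFIED]  # Steps 6 and 7
    if score >= SERIOUS_THRESHOLD:
        stages.append(Stage.REPORT_PENDING_APPROVAL)  # Steps 8 and 9
        if user_approves:
            stages.append(Stage.REPORT_SENT)  # Step 10
    return stages
```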

[0702] (Example 1)

[0703] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0704] In modern society, maintaining public safety requires means to quickly and efficiently detect and respond to illegal and disruptive behavior. However, current technologies have not adequately captured and analyzed real-time video, resulting in delays, especially with massive amounts of data, making rapid response difficult. Furthermore, there has been a lack of coordination between appropriate notifications and automated reporting, even in cases where users themselves may be harmed. It is necessary to solve these problems and support the maintenance of local public safety more efficiently.

[0705] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0706] In this invention, the server includes video acquisition means, data processing means for compressing and transmitting the acquired video, and communication means for transmitting the compressed video data over a network. This enables real-time information analysis, prompt notification to users, and rapid notification to public authorities as needed.

[0707] "Video acquisition means" refers to a device that has the function of acquiring, in real time, video that comes into the user's field of view.

[0708] "Data processing means" refers to technology that enables efficient data communication by compressing acquired video data.

[0709] "Communication means" refers to a system equipped with the function of transmitting compressed video data to other devices via a network.

[0710] "Analysis means" refers to a technology that uses highly accurate artificial intelligence to analyze behavioral patterns in the process of analyzing received video data.

[0711] "Judgment means" refers to a system that has the function of determining whether or not an action in a video is illegal based on the analysis results.

[0712] "Notification means" refers to a system that provides warnings and information to users based on the judgment results.

[0713] "Reporting means" refers to technology that includes the process of preparing and transmitting reports to public institutions as needed.

[0714] A "generative AI model" is a form of artificial intelligence technology that uses pre-trained datasets to perform sophisticated behavioral pattern analysis.

[0715] This invention is a system that achieves its effectiveness by using a video acquisition device worn by the user and an analysis program that works in conjunction with it. Specifically, the user wears smart glasses-type terminals. These terminals have high-performance cameras built in and can acquire images that come into the user's field of view in real time. This image acquisition allows the user to constantly be aware of their surroundings.

[0716] The acquired video data is first compressed on the device. An efficient codec is used for this compression, minimizing the amount of data transmitted. The compressed data is then sent to the server via the user's smartphone. A secure communication protocol is used for transmission, guaranteeing data confidentiality.
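The secure transmission step can be sketched by preparing an HTTPS request without sending it. HTTPS is the protocol given as an example in Step 4 of this Example; the SHA-256 digest header, its name `X-Content-SHA256`, and the function name are hypothetical additions that illustrate one way the server could check integrity.

```python
import hashlib
import urllib.request


def build_upload_request(url: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTPS POST carrying compressed video bytes."""
    if not url.lower().startswith("https://"):
        raise ValueError("refusing to upload over a non-HTTPS URL")
    digest = hashlib.sha256(payload).hexdigest()
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "X-Content-SHA256": digest,  # hypothetical header name
        },
    )
```

The prepared request could then be sent with `urllib.request.urlopen(req, timeout=10)`; the endpoint URL is left to the caller because the source does not define one.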

[0717] The server analyzes the received video data using a generative AI model. This AI model utilizes advanced pattern recognition technology to recognize movement and objects within the video in real time. As a result, it can quickly determine whether illegal or disruptive activities are occurring.

[0718] The analysis results are immediately fed back to the user and sent directly as a notification to the smart glasses. The notification includes details of the recognized event and recommended next actions. Based on this information, the user can respond quickly and safely.

[0719] For example, if a user is walking around town and detects suspicious behavior, the server can analyze the movement, determine if it is illegal, and send a notification to the user saying, "There is a suspicious person. Please keep your distance." Furthermore, if it is determined that reporting is necessary, the server can automatically prepare the necessary information for reporting to public authorities.

[0720] Examples of prompts include, "Please explain how a system for detecting suspicious behavior in public places works," and "Please explain a method for performing real-time video analysis using smart glasses."

[0721] Thus, the system of the present invention enables efficient and safe support for maintaining the safety of local communities through the cooperation between a video acquisition device used by the user and a server.

[0722] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0723] Step 1:

[0724] The device acquires video footage that is within the user's field of view.

[0725] The input is real-time video from the user's field of view, and the output is digitized video data. A camera built into the device operates continuously, capturing real-time video along the user's line of sight. This ensures that the user's surroundings are faithfully recorded.

[0726] Step 2:

[0727] The device compresses the acquired video data.

[0728] The input is the raw video data obtained in Step 1, and the output is compressed video data. An efficient codec is used for the compression process to reduce the amount of data, thereby improving communication efficiency. This makes it possible to minimize transmission delay while maintaining video quality.

[0729] Step 3:

[0730] The device sends compressed video data to the smartphone.

[0731] The input is the compressed video data obtained in step 2, and the output is the data transferred to the smartphone. The terminal uses wireless communication such as Bluetooth or Wi-Fi to quickly transmit the data to the smartphone. This process allows the user's smartphone to obtain the data necessary for the next processing step.

[0732] Step 4:

[0733] The smartphone sends the received data to the server.

[0734] The input is compressed video data stored on a smartphone, and the output is data sent to a server. The smartphone uses a secure communication protocol (e.g., HTTPS) to ensure the confidentiality and integrity of the data before sending it to the server. This prepares the server for analysis.

[0735] Step 5:

[0736] The server analyzes the received video data.

[0737] The input is compressed video data stored on the server, and the output is the analysis results. The server utilizes a generative AI model and advanced pattern recognition to identify moving elements and objects within the video and determine whether or not illegal activity is occurring. This analysis is performed in real time and is intended to yield highly accurate results.

[0738] Step 6:

[0739] The server sends a notification to the user based on the analysis results.

[0740] The input is the analysis results obtained in step 5, and the output is notification information for the user. The server sends the dangers and recommended actions identified in the analysis to the user's smartphone, displaying them as notifications on the device. This allows the user to immediately understand the situation and take appropriate action.

[0741] Step 7:

[0742] The server prepares to report if serious illegal activity is detected.

[0743] The input is the result of the illegality analysis in Step 5, and the output is the information prepared for reporting. The server aggregates the details of the incident and location information, automatically creates the content of the report to the public authorities, and supports immediate response. Users can manually review and approve the report as needed. This function enables a quick and effective response to danger.

[0744] (Application Example 1)

[0745] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0746] In recent years, as societal demands for safety have increased, there is a need for early detection and rapid response to illegal and disruptive behavior in public places. However, conventional technologies make immediate response on-site difficult, and there are problems with providing appropriate instructions and reports in emergencies. In such a situation, it is difficult to maintain local security while ensuring the safety of users.

[0747] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0748] In this invention, the server includes a video acquisition means, an analysis means for analyzing the acquired video, a determination means for determining illegality based on the analysis results, an instruction generation means for generating instructions for ensuring safety using the user's location information, and an emergency notification means for automatically notifying pre-set contacts in response to an emergency situation. This enables immediate notification of abnormal situations, provides instructions for the user to act safely, and automatically makes emergency notifications as needed.

[0749] "Video acquisition means" refers to a device that captures the user's surroundings in real time.

[0750] "Analysis means" refers to a function for processing acquired video data and analyzing its content.

[0751] "Determination means" refers to a system that evaluates the legality of actions and situations within a video based on the analysis results.

[0752] "Notification means" refers to a method for informing the user of the analysis and judgment results.

[0753] "Reporting methods" refer to methods for providing information to public institutions as needed.

[0754] The "instruction generation means" is a function that creates instructions to guide the user to take safe actions based on their location information and circumstances.

[0755] An "emergency notification method" is a system that automatically transmits the situation to pre-set contacts in the event of an emergency.

[0756] This invention provides a system implemented by a smart glasses-type terminal worn by the user and a server for analyzing video data acquired from that terminal. This system has the following specific configuration.

[0757] By wearing smart glasses, users can acquire real-time video of their surroundings. These smart glasses are equipped with a built-in camera that captures the user's field of view as video data. The video data is compressed within the smart glasses and then transmitted to the user's smartphone via communication functions. In this process, the smart glasses use common communication technologies such as Bluetooth and Wi-Fi.

[0758] The smartphone sends the received video data to a cloud server. A 4G or 5G network is used for data communication. The server analyzes the received video data using libraries such as TensorFlow and OpenCV. The server uses a pre-trained AI model to identify and analyze movement and objects within the video.

[0759] Based on the analyzed data, the server determines whether illegal or disruptive activity has occurred. If necessary, it immediately notifies the user's smartphone and provides safety instructions. For example, if the server detects violent activity, advice for ensuring the user's safety will be displayed on the smartphone. Furthermore, if the situation is deemed serious, an automatic notification will be sent to registered emergency contacts.

[0760] As a concrete example, if security personnel are patrolling a shopping mall, the system can detect suspicious customer behavior and send a warning. In this case, an example of a prompt given to the generative AI model would be, "Please tell me how to detect suspicious behavior in the shopping mall, alert staff, and report to the management center if necessary." This enables effective security management.

[0761] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0762] Step 1:

[0763] When a user wears smart glasses, the device uses its built-in camera to capture images in the user's field of view in real time. The input is the real-world scenery within the user's field of view, and the output is digital image data. This digital image data is compressed to enable efficient transmission.

[0764] Step 2:

[0765] The compressed video data is transmitted from the terminal to the user's smartphone via Bluetooth or Wi-Fi. The input is the compressed video data, and the output is the compressed data received on the smartphone. The smartphone then prepares the received data to be sent to the server.

[0766] Step 3:

[0767] The smartphone sends compressed video data to a cloud server using a 4G or 5G network. The input is the compressed data on the smartphone, and the output is the compressed data stored on the server. The server receives this data and begins analysis.

[0768] Step 4:

[0769] The server uses libraries such as OpenCV and TensorFlow to analyze the received video data. The input is compressed video data, and the output is the analysis result. The analysis includes a process of identifying abnormal behavior and characteristic movements in the video using a generative AI model.

[0770] Step 5:

[0771] The server determines whether illegal or disruptive behavior is occurring based on the analysis results. The input is the analysis results, and the output is the judgment result. Here, it compares the results with pre-established behavioral patterns and identifies any anomalies.

[0772] Step 6:

[0773] If an anomaly is detected, the server immediately sends a notification to the smartphone. The input is the judgment result, and the output is the content of the notification on the smartphone. This notification includes recommended actions for the user in response to the detected anomaly.

[0774] Step 7:

[0775] The server automatically notifies pre-configured emergency contacts if a serious anomaly is detected. The input is the result of the serious anomaly detection, and the output is the content of the notification sent to the emergency contacts. This includes location information and details about the situation.

[0776] This series of steps allows users to act safely in real time and respond quickly when necessary.
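
The flow of Steps 1 to 7 can be sketched end to end as follows. This is a simplified illustration, not the implementation of the disclosure: `zlib` stands in for a real video codec, the AI model is replaced by a placeholder that reads a text label, and the names `Outbox`, `PATTERNS`, and `run_pipeline` as well as the severity labels are assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Collects messages instead of actually sending them (illustration only)."""
    user_notifications: list = field(default_factory=list)
    emergency_notifications: list = field(default_factory=list)


def compress(frame: bytes) -> bytes:
    # Steps 1-3: stand-in for video compression before transmission.
    return zlib.compress(frame)


def analyze(compressed: bytes) -> dict:
    # Step 4: placeholder for the AI model; the "frame" here is just a text label.
    label = zlib.decompress(compressed).decode("utf-8")
    return {"behavior": label}


# Step 5: hypothetical pre-established behavior patterns and their severity.
PATTERNS = {"violence": "serious", "shoplifting": "minor"}


def judge(analysis: dict) -> str:
    return PATTERNS.get(analysis["behavior"], "none")


def run_pipeline(frame: bytes, location: str, outbox: Outbox) -> str:
    severity = judge(analyze(compress(frame)))
    if severity != "none":
        # Step 6: notify the user's smartphone with a recommended action.
        outbox.user_notifications.append(
            f"Anomaly detected ({severity}). Move to a safe place.")
    if severity == "serious":
        # Step 7: notify pre-configured emergency contacts with the location.
        outbox.emergency_notifications.append(
            f"Serious anomaly near {location}.")
    return severity
```

The two-level severity is one possible way to separate the ordinary notification of Step 6 from the emergency notification of Step 7.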

[0777] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0778] This invention provides a system for detecting illegal activities occurring around a user, which achieves more accurate analysis and notification by also considering the user's emotional state. This system is composed of a combination of video acquisition means, analysis means, judgment means, notification means, reporting means, and emotion engine.

[0779] The user wears smart glasses that collect video footage of their surroundings in real time. The device compresses the video data and sends it to a server via a smartphone. On the server, video analysis tools use artificial intelligence to recognize patterns of illegal activity in the video and make a determination of illegality based on that recognition.

[0780] In addition to this process, the device is equipped with an emotion engine that recognizes the user's emotions. The emotion engine evaluates the user's current emotional state based on their facial expressions and voice data, and if a specific emotion such as tension or fear is detected, it provides that information to the analysis system. This allows the analysis system to make a judgment of illegality while also considering the user's emotional state, enabling more contextually appropriate notifications.

[0781] Based on the analysis of illegal activity and feedback from the emotion engine, the server sends a notification to the user's smartphone. This notification includes appropriate action guidelines tailored to the detected illegal activity and the user's own situation. For example, if it is determined that the user is in a state of panic, a notification such as "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location" will be sent.

[0782] Furthermore, the server prepares to report to public authorities if serious illegal activity is detected or if there are concerns about user safety. Detailed situation reports, including information from the emotion engine, are added to the report to enable a swift and appropriate response.

[0783] For example, if the emotion engine determines that a user has witnessed a suspicious person in a crowd and is experiencing heightened anxiety, the server will prioritize the user's safety and immediately contact the police. This process ensures public safety in situations requiring a swift response while maintaining the user's security. By taking the user's emotional state into consideration, this system, which combines emotion recognition with normal illegal activity detection, can provide a more flexible and safer environment.

[0784] The following describes the processing flow.

[0785] Step 1:

[0786] When the user activates the smart glasses, the device acquires real-time video through its camera. This video includes information about the surrounding environment, and the user's entire field of vision is collected as digital data.

[0787] Step 2:

[0788] The terminal compresses the video data and transfers it to the user's smartphone via Bluetooth. This prepares the terminal for efficient data transmission to the server.

[0789] Step 3:

[0790] The server receives video data from the smartphone and prepares it to be passed to the analysis module. Maintaining data quality is the focus at this stage.

[0791] Step 4:

[0792] The server uses an artificial intelligence model as a means of video analysis to recognize patterns that may indicate illegal activity from the received video. This process utilizes motion detection and object recognition technologies.

[0793] Step 5:

[0794] The device uses an emotion engine to evaluate the user's emotional state based on facial expression and voice data. For example, it detects changes in facial expression and quantifies anxiety and stress levels.

[0795] Step 6:

[0796] The analysis means integrates the results of illegal activity detection on the server with user emotion data obtained from the emotion engine. This allows for analysis that is more tailored to the user's current situation.

[0797] Step 7:

[0798] Based on the analysis results, the server generates a notification for the user. The notification includes instructions that take into account the identified risks and the user's emotional state.

[0799] Step 8:

[0800] The device sends the generated notification to the user. The notification is presented to the user visually or audibly and includes specific instructions for action.

[0801] Step 9:

[0802] The server automatically initiates reporting procedures if it determines that serious illegal activity has occurred or that user safety is being threatened. Reports include location information and emotion engine data.

[0803] Step 10:

[0804] Users are required to take appropriate action based on the information they receive. They can also verify the information they have received. This ensures the user's own safety while enabling the rapid dissemination of information to public authorities.
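
As a rough sketch of how the notification in Step 7 could take the emotion engine output into account, the following Python selects a message according to quantified anxiety and stress levels. The 0.0 to 1.0 scale, the `PANIC_THRESHOLD` value, and the wording of the non-panic message are assumptions for illustration; only the panic message is taken from the example given earlier in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmotionState:
    anxiety: float  # assumed scale: 0.0 (calm) to 1.0 (extreme)
    stress: float   # same assumed scale


PANIC_THRESHOLD = 0.8  # hypothetical cut-off for treating the user as panicking


def build_notification(illegal_activity_detected: bool,
                       emotion: EmotionState) -> Optional[str]:
    """Return a message for the user, or None when nothing was detected."""
    if not illegal_activity_detected:
        return None
    if max(emotion.anxiety, emotion.stress) >= PANIC_THRESHOLD:
        # Calming wording for a user who appears to be in a state of panic.
        return ("You are currently in a dangerous situation. Please remain calm, "
                "assess the situation, and move to a designated safe location.")
    # Hypothetical wording for a user who is not in a state of panic.
    return "Suspicious activity detected nearby. Please stay alert."
```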

[0805] (Example 2)

[0806] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0807] Conventional illegal activity detection systems judge the abnormality of behavior without considering the user's emotional state, making it difficult to accurately grasp the degree of danger the user faces. Furthermore, because there are insufficient means to adequately protect the user's psychological safety in the detection of illegal activity, effective notification and reporting in situations where a swift and appropriate response is required is difficult.

[0808] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0809] In this invention, the server includes means for acquiring video data, means for evaluating and integrating user emotional data into analysis, and means for issuing warnings based on the judgment results and reporting to public authorities as necessary. This enables more accurate risk assessment that takes the user's emotional state into account, as well as appropriate notifications and prompt reporting based on those assessments.

[0810] "Video data" refers to information acquired in digital format for recording or transmitting visual information.

[0811] A "communication network" is a network infrastructure used to transmit digital data from one point to another.

[0812] "Artificial intelligence" is a computer system that uses large amounts of data to learn and enables pattern recognition and prediction.

[0813] "Emotional data" refers to information about a user's psychological state, obtained from their facial expressions, voice, and other sources.

[0814] A "public institution" is an administrative agency or similar organization established for the public good based on laws and regulations.

[0815] "Notification" is the act of formally conveying information to those who have the right to receive it.

[0816] "Analysis" is the process of examining data in detail and extracting meaningful information from it.

[0817] "Abnormal" refers to a situation that deviates from the normal state or expected conditions.

[0818] "Reporting" is the act of systematically organizing facts and circumstances and making them public.

[0819] This invention is a system that uses advanced analytical techniques to ensure user safety and accurate detection of illegal activities. The user wears smart glasses as a means of acquiring video data, thereby collecting surrounding video data in real time. The terminal efficiently compresses this video data and transmits it to a server via a communication network. For data compression, video compression libraries such as FFmpeg are utilized.

[0820] The server uses artificial intelligence technology to analyze video data. Specifically, it uses libraries such as TensorFlow (a machine learning framework) and OpenCV (a computer vision library) to analyze behavioral patterns in real time with trained models. This allows illegal or abnormal behavior to be recognized immediately and enables appropriate responses based on the results.

[0821] Furthermore, the device is equipped with an emotion engine to evaluate the user's emotions. This engine analyzes the user's facial expressions and voice data to identify their current emotional state. Specific methods used include Facial Emotion Recognition models. The server integrates this emotion data with the analysis results, enabling it to make decisions that take the user's mental state into account.

[0822] Furthermore, if the server detects a danger, it will immediately notify the user and provide appropriate instructions. In cases where the user is in a state of panic, the notification will use specific examples such as, "You are currently in a dangerous situation. Please remain calm, assess the situation, and move to a designated safe location."

[0823] The server automatically reports any serious illegal activity to public authorities. The reports include data obtained from the emotion engine to support a swift and appropriate response. This allows us to ensure user safety while contributing to broader social safety.

[0824] An example of a prompt regarding the use of a generative AI model is: "Explain how the system detects illegal activity while considering the user's emotional state. Also, provide examples of situations where emotion recognition is useful." This prompt is used to deepen the understanding of how the generative AI model works and its applications.

[0825] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0826] Step 1:

[0827] The user uses smart glasses to collect real-time video data of their surroundings. This video is acquired from a high-resolution camera sensor. The input data consists of the collected video frames, which are then sent to the next processing step.

[0828] Step 2:

[0829] The device compresses the received video data using a video compression library such as FFmpeg. The compressed data is then sent to the server via the smartphone. In this process, the input is uncompressed video data, and the output is a compressed video file.

[0830] Step 3:

[0831] The server applies artificial intelligence models using tools like TensorFlow and OpenCV to analyze the received video data. It analyzes behavioral patterns in the video and recognizes signs of illegality. The input is compressed video data, and the output is a judgment regarding the abnormality of the behavior.

[0832] Step 4:

[0833] The device collects the user's facial expressions and voice data, and analyzes their emotional state using an emotion engine. Here, the emotional state is classified into categories such as "tension" and "anxiety." The input is the user's facial expression and voice data, and the output is the emotional evaluation result.

[0834] Step 5:

[0835] The server integrates the analysis results and feedback from the emotion engine to make an overall judgment. Specifically, it performs an anomaly severity assessment that takes into account the user's emotional state. The input is the behavioral judgment result and the emotion evaluation result, and the output is the integrated risk assessment result.

[0836] Step 6:

[0837] Based on its assessment, the server sends a notification to the user's smartphone, providing action guidelines if necessary. For example, a notification might appear stating, "You are currently in a dangerous situation. Please evacuate to a safe location." The input is the integrated risk assessment result, and the output is the action guidelines provided to the user.

[0838] Step 7:

[0839] In the event of an emergency, the server reports detailed information, including emotional data, to public authorities. It is designed to enable a rapid response through the reporting system. Input is the assessment of a significant risk, and output is the content of the report to public authorities.
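
The integrated risk assessment of Step 5 and the branching in Steps 6 and 7 could, for example, be realized as a weighted blend of a behavior anomaly score and an emotion score. The following is a minimal sketch under that assumption; the disclosure does not specify a formula, and the weight, the thresholds, and the function names `integrated_risk` and `decide` are hypothetical.

```python
def integrated_risk(behavior_score: float, emotion_scores: dict,
                    emotion_weight: float = 0.3) -> float:
    """Blend a behavior anomaly score with the strongest negative emotion.

    All scores are assumed to lie between 0.0 and 1.0.
    """
    if not 0.0 <= emotion_weight <= 1.0:
        raise ValueError("emotion_weight must be between 0 and 1")
    # Use the categories named in Step 4 ("tension" and "anxiety").
    emotion_score = max(emotion_scores.get("tension", 0.0),
                        emotion_scores.get("anxiety", 0.0))
    return (1.0 - emotion_weight) * behavior_score + emotion_weight * emotion_score


def decide(risk: float, notify_at: float = 0.5, report_at: float = 0.8) -> str:
    # Step 7 (report to authorities) applies only to the highest risk band.
    if risk >= report_at:
        return "report"
    # Step 6 (notify the user) applies to the middle band.
    if risk >= notify_at:
        return "notify"
    return "none"
```

With these assumed values, a behavior score of 0.6 alone stays below the notification threshold, while the same score combined with an anxiety level of 0.9 crosses it, which illustrates how the user's emotional state can change the outcome.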

[0840] (Application Example 2)

[0841] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0842] Conventional illegal activity detection systems analyze only video data and do not consider the user's emotional state, which can lead to incorrect judgments or inappropriate notifications depending on the situation. Furthermore, prioritizing user safety requires a flexible approach that takes their emotional state into account. This system aims to solve these problems.

[0843] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0844] In this invention, the server includes a video acquisition device, an analysis device, a decision-making device, a notification device that notifies the user of appropriate warnings, a reporting device that notifies public authorities, and an emotion analysis device that analyzes the user's emotional state. This enables more accurate detection and notification of illegal activities, taking the user's emotional state into account.

[0845] A "video acquisition device" is a device used to collect video footage of the user's surroundings in real time.

[0846] An "analysis device" is a device used to analyze acquired video footage and evaluate the behavioral patterns and illegality contained within it.

[0847] A "decision-making device" is a device that makes a final determination of illegality based on analysis results and the user's emotional state.

[0848] A "notification device" is a device that notifies the user of appropriate warnings or instructions based on the judged results and the user's emotions.

[0849] A "reporting device" is a device that, depending on the situation, notifies public authorities to encourage a swift response.

[0850] An "emotion analysis device" is a device that analyzes the user's emotional state from their facial expressions and voice data, and evaluates states such as tension and fear.

[0851] A "communication device" is a device used to compress acquired video information and transmit it to a remote server via a network.

[0852] "Machine learning technology" is a technique that automatically learns patterns and rules based on large amounts of data and uses them for analysis and decision-making.

[0853] A "pre-trained model" is a model that has been trained in advance using data, and is used to analyze behavioral patterns based on video data and emotional states.

[0854] The system for realizing this invention mainly consists of a terminal, a server, and a user. The user first wears an image acquisition device such as smart glasses to collect images of the surroundings in real time. The collected images are subjected to initial compression processing within the terminal and then transmitted to a remote server via a communication device.

[0855] The server performs video analysis and emotion analysis. The analysis device uses a pre-trained model based on machine learning techniques to analyze specific behavioral patterns and facial expressions from the acquired video data. This makes it possible to predict illegal activities and evaluate the user's emotional state. For example, Python's OpenCV and TensorFlow are used for the analysis.

[0856] Next, the server, through the decision-making device, determines what information should be sent to the user based on the analysis results and emotion data. If action is necessary, the notification device sends a warning to the user's terminal. The notification includes appropriate instructions tailored to the situation, prioritizing the user's safety.

[0857] For example, if a user spots a suspicious person in a crowded place and the emotion analysis device detects a high level of anxiety, the server will take this into consideration and send a warning such as, "Please be careful. Contact public authorities if necessary." In this way, a safer environment is provided for the user.

[0858] Furthermore, the reporting device prepares reports to public authorities as needed, and supports prompt intervention by providing detailed situation reports that include the user's emotional state.

[0859] The generative AI model might use prompts like the following: "Detect patterns of suspicious individuals through video analysis and generate a notification based on the user's current emotional state."
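
One possible way to supply such a prompt with the current analysis results is to append them to the instruction text. The sketch below assumes this approach; the two appended fields and the names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and are not specified in the disclosure.

```python
# The instruction sentence is the example prompt from the text; the two
# trailing fields are hypothetical additions carrying the analysis results.
PROMPT_TEMPLATE = (
    "Detect patterns of suspicious individuals through video analysis and "
    "generate a notification based on the user's current emotional state.\n"
    "Detected pattern: {pattern}\n"
    "User emotional state: {emotion}"
)


def build_prompt(pattern: str, emotion: str) -> str:
    """Fill the template with a detected pattern and an emotion label."""
    return PROMPT_TEMPLATE.format(pattern=pattern.strip(), emotion=emotion.strip())
```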

[0860] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0861] Step 1:

[0862] The user wears smart glasses to collect images of their surroundings. The collected video data is compressed and sent from the device to a server. This data processing enables the transfer of video information. The input is real-world video, and the output is compressed video data.

[0863] Step 2:

[0864] The server inputs the received video data into an analysis device, which uses machine learning techniques to detect specific behavioral patterns. This data processing extracts suspicious behavior from the video. The input is compressed video data, and the output is analyzed behavioral pattern information.

[0865] Step 3:

[0866] The server uses an emotion analyzer to assess the user's emotional state. It analyzes facial expressions and voice data to identify the current emotional state. This calculation reveals levels of tension and fear. The input is the user's facial expressions and voice data, and the output is the emotional state assessment result.

[0867] Step 4:

[0868] Based on the analysis results and emotional state, the server determines the illegality of the actions through a decision-making device. This data calculation determines whether reporting or warning is necessary. The input is behavioral pattern information and emotional evaluation results, and the output is the decision result.

[0869] Step 5:

[0870] To ensure user safety, the server uses a notification device to send appropriate warnings to the terminal. Situation-specific instructions are provided to enable users to respond quickly. The input is the judgment result, and the output is the notification message to the user.

[0871] Step 6:

[0872] If necessary, the server prepares a report to public authorities using the reporting device, attaching a detailed situation report including the user's emotional state. This enables a swift response. The input is the judgment result and emotional state, and the output is the report message.
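
The message prepared for public authorities in Step 6 could be assembled, for instance, as a JSON document that carries the judgment result together with the user's emotional state. The following sketch assumes that format; the key names and the `report_required` flag are hypothetical.

```python
import json
from typing import Optional


def build_authority_report(judgment: dict, emotion: dict) -> Optional[str]:
    """Return a JSON report, or None when the judgment does not call for one."""
    if not judgment.get("report_required", False):
        return None
    payload = {
        "behavior": judgment["behavior"],
        # Emotional state, e.g. assumed levels of tension and fear.
        "emotional_state": emotion,
    }
    # sort_keys gives a stable field order, which simplifies logging and testing.
    return json.dumps(payload, sort_keys=True)
```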

[0873] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0874] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
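
The input and output relationship described for the data generation model 58 can be expressed as a small interface. The sketch below is only an illustration: the names `InferenceData`, `DataGenerationModel`, `ByteCountStubModel`, and `run_inference` are hypothetical, and the stub does not call any real generative AI service.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class InferenceData:
    kind: str       # "audio", "text", or "image", as listed in the text
    payload: bytes


class DataGenerationModel(Protocol):
    """Anything that accepts a prompt plus inference data and returns text."""
    def infer(self, prompt: str, data: InferenceData) -> str: ...


class ByteCountStubModel:
    """Stand-in for a real generative model; it only reports the input size."""
    def infer(self, prompt: str, data: InferenceData) -> str:
        return f"[{data.kind}] {prompt}: {len(data.payload)} bytes received"


def run_inference(model: DataGenerationModel, prompt: str,
                  data: InferenceData) -> str:
    # Reject data kinds outside the three named in the description.
    if data.kind not in {"audio", "text", "image"}:
        raise ValueError(f"unsupported inference data kind: {data.kind}")
    return model.infer(prompt, data)
```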

[0875] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0876] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0877] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0878] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0879] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further outward an emotion lies on the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).
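
The geometry of the emotion map 400 described above can be modeled with polar coordinates given as a radius and a clock position. The following is a sketch under assumptions: the coordinates assigned to individual emotions are invented for illustration (the actual layout is the one in Figure 9), and the names `MapPoint`, `region`, `valence`, and `nearest` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPoint:
    radius: float      # 0 = center (inner), larger = more outwardly expressed
    clock_hour: float  # 12 = top, 3 = right, 6 = bottom, 9 = left

    def xy(self) -> tuple:
        # Convert a clock position to Cartesian: 12 o'clock is +y, 3 o'clock is +x.
        angle = math.radians(90.0 - self.clock_hour * 30.0)
        return (self.radius * math.cos(angle), self.radius * math.sin(angle))


def region(p: MapPoint) -> str:
    # Right half: situational judgment. Left half: reactions in the brain.
    x, _ = p.xy()
    if abs(x) < 1e-9:
        return "both"
    return "situation" if x > 0 else "reaction"


def valence(p: MapPoint) -> str:
    # Upper side: pleasure. Lower side: displeasure.
    _, y = p.xy()
    if abs(y) < 1e-9:
        return "neutral"
    return "pleasure" if y > 0 else "displeasure"


# Hypothetical positions, chosen only to make the example concrete.
EMOTIONS = {
    "reassured": MapPoint(2.0, 2.5),
    "anxious": MapPoint(2.0, 3.5),
    "joy": MapPoint(1.0, 12.0),
}


def nearest(name: str, emotions: dict = EMOTIONS) -> str:
    """Return the emotion mapped closest to the named one."""
    origin = emotions[name].xy()
    others = (n for n in emotions if n != name)
    return min(others, key=lambda n: math.dist(origin, emotions[n].xy()))
```

Because emotions that are likely to occur together are mapped close together, a nearest-neighbor lookup of this kind gives a simple way to find related emotions.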

[0880] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0881] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0882] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
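
The two ideas in this paragraph, selecting the user's emotion from per-emotion values and training so that nearby emotions receive similar values, can be sketched as follows. This is not the training procedure of the disclosure: the Gaussian weighting, the `sigma` parameter, and the coordinates in `POSITIONS` are assumptions chosen for illustration.

```python
import math

# Hypothetical 2-D coordinates on the emotion map; the real layout is in the figures.
POSITIONS = {
    "reassured": (2.0, 0.2),
    "calm": (2.0, 0.0),
    "confident": (2.1, 0.3),
    "fear": (-1.5, -1.5),
}


def soft_targets(true_emotion: str, sigma: float = 0.5) -> dict:
    """Build training targets in which emotions near the true one score high."""
    cx, cy = POSITIONS[true_emotion]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }


def determine_emotion(emotion_values: dict) -> str:
    """Pick the emotion with the largest value output by the network."""
    return max(emotion_values, key=emotion_values.get)
```

With targets of this shape, "reassured" and "confident" receive values close to that of "calm", while a distant emotion such as "fear" receives a value near zero.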

[0883] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0884] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0885] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0886] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0887] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0888] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform the specific processing. Each of these processors has built-in or connected memory and performs the specific processing by using that memory.

[0889] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0890] Examples of configurations using a single processor include the following. First, one or more CPUs and software may be combined to form a single processor, and this processor functions as the hardware resource that performs the specific process. Second, a processor may be used that realizes the functions of an entire system, including multiple hardware resources that perform the specific process, on a single IC chip, as exemplified by an SoC (System-on-a-Chip). In this way, the specific process is realized using one or more of the above types of processors as hardware resources.

[0891] More specifically, an electrical circuit that combines circuit elements such as semiconductor devices can be used as the hardware structure of these various processors. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps may be added, or the processing order may be rearranged, as long as the result does not deviate from the gist of this disclosure.

[0892] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as the result does not deviate from the essence of the technical aspects of this disclosure. Furthermore, to avoid confusion and facilitate understanding, explanations of common technical knowledge that require no special explanation to enable implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0893] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0894] The following is further disclosed regarding the embodiments described above.

[0895] (Claim 1)

[0896] Means of acquiring video,

[0897] An analysis means for analyzing the acquired video,

[0898] A means of determining illegality based on the analysis results,

[0899] A notification mechanism that alerts the user based on the judgment result,

[0900] A means of reporting to public institutions based on the notification,

[0901] A system comprising the above means.

[0902] (Claim 2)

[0903] The system according to claim 1, comprising communication means for compressing acquired video data and transmitting it over a network.

[0904] (Claim 3)

[0905] The system according to claim 1, wherein the analysis means uses artificial intelligence to analyze behavioral patterns using a trained model.
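The chain of means listed above (acquire, analyze, judge, notify, report) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the implementation of this disclosure: the function names, the `Judgment` type, and the watchlist-based rule are all hypothetical, and the analysis step simply normalizes labels that are assumed to have been extracted from the video by a trained model.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Judgment:
    illegal: bool
    reason: str


def analyze(frames: List[str]) -> List[str]:
    # Hypothetical analysis: each "frame" is an already-extracted behavior label.
    return [label.lower() for label in frames]


def judge(labels: List[str], watchlist: FrozenSet[str]) -> Judgment:
    # Flag the clip when any observed label is on the watchlist.
    hits = sorted(set(labels) & watchlist)
    if hits:
        return Judgment(True, ", ".join(hits))
    return Judgment(False, "")


def run_pipeline(
    frames: List[str],
    watchlist: FrozenSet[str],
    notify: Callable[[str], None],
    report: Callable[[str], None],
) -> Judgment:
    judgment = judge(analyze(frames), watchlist)
    if judgment.illegal:
        # The user is alerted first; the report to a public institution follows.
        notify(f"Possible illegal act: {judgment.reason}")
        report(judgment.reason)
    return judgment
```

Passing `notify` and `report` as callables keeps the sketch independent of any particular alerting or reporting channel.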

[0906] "Example 1"

[0907] (Claim 1)

[0908] Means of acquiring video,

[0909] A data processing means for compressing and transmitting acquired video,

[0910] A communication means for transmitting the compressed video data over a network,

[0911] An analysis means that uses artificial intelligence to analyze behavioral patterns in the received video,

[0912] A means of determining illegality based on the analysis results,

[0913] A notification mechanism that alerts the user based on the judgment result,

[0914] A means of reporting to public institutions based on the notification,

[0915] A system comprising the above means.

[0916] (Claim 2)

[0917] The system according to claim 1, comprising a process including manual approval by the user during the preparation stage of a notification.

[0918] (Claim 3)

[0919] The system according to claim 1, wherein the analysis means uses a highly accurate generative AI model and efficiently analyzes behavioral patterns using a trained dataset.
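The manual approval step mentioned above can be illustrated with a short gate placed in front of the report. This is a sketch under stated assumptions: `DraftReport`, `prepare_report`, and `submit_if_approved` are hypothetical names, and `ask_user` stands in for whatever confirmation prompt a device would show.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftReport:
    summary: str
    approved: bool = False


def prepare_report(summary: str) -> DraftReport:
    # The draft is created unapproved; nothing is sent at this stage.
    return DraftReport(summary=summary)


def submit_if_approved(
    draft: DraftReport,
    ask_user: Callable[[str], bool],
    send: Callable[[str], None],
) -> bool:
    # The report leaves the device only after the user explicitly approves it.
    draft.approved = ask_user(draft.summary)
    if draft.approved:
        send(draft.summary)
    return draft.approved
```

Keeping the approval decision in a single function makes it easy to see that no code path sends a report without the user's consent.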

[0920] "Application Example 1"

[0921] (Claim 1)

[0922] Means of acquiring video,

[0923] An analysis means for analyzing the acquired video,

[0924] A means of determining illegality based on the analysis results,

[0925] A notification mechanism that alerts the user based on the judgment result,

[0926] A means of reporting to public institutions based on the notification,

[0927] An instruction generation means that generates instructions for ensuring safety using the user's location information,

[0928] An emergency notification means that automatically notifies pre-set contacts depending on the emergency situation,

[0929] A system comprising the above means.

[0930] (Claim 2)

[0931] The system according to claim 1, comprising communication means for compressing acquired video data and transmitting it over a network.

[0932] (Claim 3)

[0933] The system according to claim 1, wherein the analysis means uses artificial intelligence to analyze behavioral patterns using a trained model.
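The location-based instruction generation and the automatic notification of pre-set contacts described above could look roughly like the following. Everything here is assumed for illustration: the urgency levels, the contact table, the safe-place list, and the function names are hypothetical, and the distance is a crude planar approximation on latitude and longitude rather than a proper geodesic calculation.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)

# Hypothetical mapping from urgency level to pre-set contacts.
CONTACTS_BY_LEVEL: Dict[str, List[str]] = {
    "low": [],
    "medium": ["family"],
    "high": ["family", "emergency-services"],
}


def contacts_for(level: str) -> List[str]:
    # Unknown levels notify nobody rather than raising an error.
    return list(CONTACTS_BY_LEVEL.get(level, []))


def nearest_safe_place(location: Coordinate, safe_places: Dict[str, Coordinate]) -> str:
    # Pick the registered place with the smallest planar distance to the user.
    return min(
        safe_places,
        key=lambda name: math.hypot(
            safe_places[name][0] - location[0],
            safe_places[name][1] - location[1],
        ),
    )


def generate_instruction(location: Coordinate, safe_places: Dict[str, Coordinate]) -> str:
    return f"Move to {nearest_safe_place(location, safe_places)}."
```

A real system would likely replace the planar distance with a routing or geodesic calculation, which this sketch does not attempt.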

[0934] "Example 2 of combining an emotion engine"

[0935] (Claim 1)

[0936] Means of acquiring video data,

[0937] A means of compressing acquired video data and transmitting it via a communication network,

[0938] A means of analyzing transmitted video data using artificial intelligence to recognize abnormal behavior,

[0939] A means of acquiring and integrating emotional data into analysis to evaluate the user's state,

[0940] A means for determining abnormalities based on analysis results and emotional data,

[0941] A means of notifying the user of a warning based on the judgment result and supporting the user in taking appropriate action,

[0942] Means of reporting detailed information to public authorities as needed,

[0943] A system comprising the above means.

[0944] (Claim 2)

[0945] The system according to claim 1, comprising an emotion recognition engine that acquires user emotion data and integrates the emotion data into analysis.

[0946] (Claim 3)

[0947] The system according to claim 1, which, when an abnormality is judged to be serious, promptly reports to public institutions, including emotional data in the content of the report.
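One way to picture the integration of emotional data into the abnormality judgment is a weighted score with two thresholds. This is a sketch under stated assumptions: the weighting scheme, the threshold values, and the names `Assessment`, `assess`, and `build_report` are hypothetical, and the disclosure does not state how the two signals are actually combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    abnormal: bool
    serious: bool
    score: float


def assess(
    behavior_score: float,
    distress_score: float,
    weight: float = 0.3,
    abnormal_at: float = 0.5,
    serious_at: float = 0.8,
) -> Assessment:
    # Blend the video-analysis score with the user's emotional distress score.
    score = (1 - weight) * behavior_score + weight * distress_score
    return Assessment(score >= abnormal_at, score >= serious_at, score)


def build_report(assessment: Assessment, emotion_label: str) -> Optional[dict]:
    # Only serious cases produce a report, and that report carries the emotion data.
    if not assessment.serious:
        return None
    return {"score": round(assessment.score, 2), "emotion": emotion_label}
```

Returning `None` for non-serious cases keeps the reporting decision separate from the user-facing warning, which could still be issued for any abnormal result.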

[0948] "Application Example 2 of combining an emotion engine"

[0949] (Claim 1)

[0950] Video acquisition device,

[0951] An analysis device for analyzing the acquired video,

[0952] A judgment device that determines illegality based on analysis results and the user's emotional state,

[0953] A notification device that notifies the user of appropriate warnings based on the judgment results and sentiment analysis,

[0954] A reporting device that reports to public authorities based on the notification and the emotional state,

[0955] An emotion analysis device that analyzes the emotional state of the user; a system comprising the above devices.

[0956] (Claim 2)

[0957] The system according to claim 1, comprising a communication device for compressing acquired video information and transmitting it over a network.

[0958] (Claim 3)

[0959] The system according to claim 1, wherein the analysis device uses machine learning technology to analyze behavioral patterns using a trained model.

[Explanation of symbols]

[0960]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a means of acquiring video; an analysis means for analyzing the acquired video; a means of determining illegality based on the analysis results; a notification means that alerts the user based on the judgment result; and a means of reporting to public institutions based on the notification.

2. The system according to claim 1, comprising communication means for compressing acquired video data and transmitting it over a network.

3. The system according to claim 1, wherein the analysis means uses artificial intelligence to analyze behavioral patterns using a trained model.
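The compress-then-transmit step recited in claim 2 can be illustrated with a lossless round trip. This is only a sketch: `zlib` is used here as a stand-in for a real video codec, which the claims do not specify, and `pack_for_transmission` and `unpack_on_receipt` are hypothetical names.

```python
import zlib


def pack_for_transmission(raw: bytes, level: int = 6) -> bytes:
    # Sender side: shrink the payload before it goes onto the network.
    return zlib.compress(raw, level)


def unpack_on_receipt(payload: bytes) -> bytes:
    # Receiver side: restore the original bytes before analysis.
    return zlib.decompress(payload)
```

An actual system would more plausibly use a lossy video codec, trading some fidelity for a much smaller bitstream; the round trip above only shows where compression sits relative to transmission and analysis.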