System

The system addresses the challenge of recording and analyzing daily life data by using wearable devices and smart home sensors to automatically generate engaging videos, enhancing user engagement with their life logs.

JP2026100615A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing systems require significant effort and time to record and analyze daily life data, and there is a lack of automated systems that can efficiently generate engaging content using AI technology.

Method used

A system comprising wearable devices, smart home sensors, and a server that collects and processes user activity data to automatically extract important events and generate videos, utilizing multimodal AI for analysis and video editing.

Benefits of technology

Reduces the user's record-keeping burden by efficiently generating professional-quality videos that reflect important life moments, allowing users to easily relive and engage with their experiences.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026100615000001_ABST

Abstract

[Problem] To provide a system. [Solution] A system that includes a device for acquiring user activity data, a device for processing the acquired activity data, a device for analyzing the activity data and extracting important events, a device for automatically generating videos based on the extracted important events, and a device for providing the generated video to the user.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including the steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of the chatbot's character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, users wish to record their life logs in detail and to reflect on their lives based on them. However, recording daily life and analyzing the resulting data takes a great deal of time and effort. In addition, it is not easy to create attractive content from such data, and an automated system that makes full use of AI technology is needed.

Means for Solving the Problems

[0005] This invention reduces the user's record-keeping burden by providing a device for acquiring user activity data and a device for processing and analyzing that data to automatically extract important events. Furthermore, by including a device for automatically generating videos based on the extracted important events, it enables users to efficiently reflect on their lives and provides them with engaging life log content.

[0006] "User activity data" refers to an individual's biological and behavioral information obtained through wearable devices and smart home sensors.

[0007] "Acquisition equipment" refers to a system of hardware and software designed to collect user activity data in real time.

[0008] "Processing equipment" refers to a computer system or software used to organize, classify, and preprocess collected activity data.

[0009] A "device for analysis and extraction of important events" refers to a system that uses statistical methods and artificial intelligence technology to identify noteworthy events from activity data.

[0010] "A device for automatically generating videos" refers to a computer program or system that visualizes important events extracted from user activities and creates videos based on those events.

[0011] "Device for providing" refers to the interface or software used to notify the user of the generated video and provide it in a playable format.

Brief Explanation of the Drawings

[0012] [Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map in which multiple emotions are mapped.
[Figure 10] An emotion map in which multiple emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2, which combines an emotion engine.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2, which combines an emotion engine.

Modes for Carrying Out the Invention

[0013] Hereinafter, an example of an embodiment of the system relating to the technology of this disclosure will be described with reference to the attached drawings.

[0014] First, the terms used in the following description will be explained.

[0015] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0016] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as a work memory.

[0017] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0018] In the following embodiments, a communication I/F (interface) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0019] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0020] [First Embodiment]

[0021] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0022] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0023] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0024] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0025] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0026] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0027] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0028] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0029] As shown in Figure 2, in the data processing device 12, the processor 28 performs specific processing. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes it on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0030] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0031] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used by the data processing system 10 together with the specific processing program 56. The processor 46 reads the reception output program 60 from the storage 50 and executes it on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0032] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0033] This invention is an advanced system for efficiently recording a user's life log and capturing important moments as video. This system consists of multiple devices and software, providing a mechanism for comprehensive data collection regarding the user's activities and for analyzing that data.

[0034] First, the wearable device, acting as the terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0035] Next, the server receives the large dataset sent from the terminal and securely stores it in a database. At this stage, the data is filtered and organized to focus on useful information. The server's built-in multimodal AI engine analyzes the collected data and recognizes unusual behavior and new patterns. For example, if a user runs at an unusual speed or new activity is recorded in a particular location, these are identified as important events.
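The "new activity at a particular location" criterion can be illustrated with a simple novelty check. Coordinates are quantized into grid cells, and the first visit to any cell absent from the user's history is flagged. This is a minimal sketch, not the method of the disclosure. The grid size `CELL_DEG` and the `Visit` record are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical grid size in degrees (roughly 100 m of latitude); an assumption, not from the source.
CELL_DEG = 0.001

@dataclass(frozen=True)
class Visit:
    timestamp: int  # seconds since epoch
    lat: float
    lon: float

def cell_of(lat: float, lon: float, cell_deg: float = CELL_DEG) -> tuple[int, int]:
    """Quantize a coordinate to a grid cell so that nearby points share a key."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def new_location_events(history: list[Visit], today: list[Visit]) -> list[Visit]:
    """Return the first visit today to each grid cell never seen in the history."""
    known = {cell_of(v.lat, v.lon) for v in history}
    events = []
    for v in sorted(today, key=lambda v: v.timestamp):
        cell = cell_of(v.lat, v.lon)
        if cell not in known:
            events.append(v)
            known.add(cell)  # report each new place only once
    return events
```

Each flagged `Visit` would then become a candidate "important event" for the video generation step.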

[0036] The server then uses its automated editing function to generate a video based on the recognized key events. The editing algorithm assembles still images, video clips, and audio data into a single story, using video transitions and effects to maintain professional quality.

[0037] Finally, the generated video is sent to the user in a secure format. The user can view it through a dedicated application and relive past experiences. For example, the user's activities on a day when they try a new dish are recorded, and the process and results are edited into a video, allowing the user to relive the cooking process and see their family enjoying the dish at a later date.

[0038] The above outlines the steps required to implement this invention, demonstrating a new form of daily record-keeping that utilizes the user's life log.

[0039] The following describes the processing flow.

[0040] Step 1:

[0041] The device collects the user's biometric information, voice data, and environmental information using wearable devices and smart home sensors. This includes heart rate, steps taken, applications being used, and voice commands. The collected data is temporarily stored on the device.

[0042] Step 2:

[0043] The device uploads collected data to the server at regular intervals. During this process, the data is encrypted to prevent the identification of individuals and transmitted securely.
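One way to keep uploads from identifying the user is to replace the raw user ID with a keyed hash before the buffered data is batched. The sketch below assumes a hypothetical shared key and JSON payloads. It swaps in HMAC-SHA256 pseudonymization for the "prevent identification" part. Encryption in transit would be handled separately, for example by TLS.

```python
import hashlib
import hmac
import json

# Hypothetical per-deployment secret; in practice it would be provisioned securely, not hard-coded.
PSEUDONYM_KEY = b"example-secret-key"

def pseudonymize(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a user ID with a keyed hash so the raw ID never appears in an upload."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def build_upload_batches(user_id: str, samples: list[dict], batch_size: int) -> list[bytes]:
    """Split buffered samples into JSON payloads tagged with a pseudonymous ID."""
    pid = pseudonymize(user_id)
    batches = []
    for start in range(0, len(samples), batch_size):
        payload = {"pid": pid, "samples": samples[start:start + batch_size]}
        batches.append(json.dumps(payload, sort_keys=True).encode("utf-8"))
    return batches
```

The same key maps a user to the same pseudonym on every upload. The server can therefore still link a user's data over time without learning who the user is.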

[0044] Step 3:

[0045] The server takes in the received data and first performs noise removal and format normalization. This preprocessing ensures that the data necessary for analysis is prepared in a clean state.
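Step 3's noise removal and format normalization could look like the following minimal sketch. It assumes a sliding median filter for spike removal and min-max scaling for normalization. Neither technique is named in the source; both are common choices swapped in for illustration.

```python
from statistics import median

def median_filter(values: list[float], window: int = 3) -> list[float]:
    """Suppress isolated spikes by replacing each value with the median of its neighborhood.

    The window is truncated at the edges of the series.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale to [0, 1] so that modalities with different units become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applied to the heart-rate series `[70, 72, 180, 71, 73]`, the filter removes the one-sample spike at 180 and yields `[71, 72, 72, 73, 72]`.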

[0046] Step 4:

[0047] A multimodal AI within the server analyzes pre-processed data and integrates information from each modality. It detects characteristic events by comparing them against existing data patterns to extract important events from the user's daily life.
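One simple reading of "integrates information from each modality" and "comparing against existing data patterns" is late fusion. Each modality's deviation from the user's own history is scored, and the scores are then combined. The weights, modality names, and threshold below are hypothetical. A trained multimodal model, as the step describes, would replace this hand-written rule.

```python
from statistics import mean, pstdev

def zscore(value: float, baseline: list[float]) -> float:
    """Number of standard deviations `value` lies from the user's historical baseline."""
    mu, sigma = mean(baseline), pstdev(baseline)
    return 0.0 if sigma == 0 else (value - mu) / sigma

def fused_event_score(current: dict[str, float],
                      baselines: dict[str, list[float]],
                      weights: dict[str, float]) -> float:
    """Weighted sum of absolute per-modality deviations (a simple late-fusion rule)."""
    return sum(w * abs(zscore(current[m], baselines[m])) for m, w in weights.items())

def is_important(score: float, threshold: float = 2.0) -> bool:
    """Hypothetical cutoff above which a moment is treated as an important event."""
    return score >= threshold
```

Because the baselines are per user, a heart rate that is ordinary for one person can still be flagged as unusual for another.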

[0048] Step 5:

[0049] The server generates a video based on the extracted key events. Narration generated using natural language processing is synchronized with the video, and background music and transitions are appropriately added to create a video with a narrative.
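The sketch below illustrates synchronizing narration with the video. It emits one SRT subtitle cue per clip, so each narration sentence spans exactly its clip. It assumes that clips play back to back without overlapping transitions and that one narration sentence has already been generated per event. The `Clip` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    event_id: str
    duration_s: float
    narration: str  # hypothetical: one generated narration sentence per clip

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def narration_srt(clips: list[Clip]) -> str:
    """Emit one subtitle cue per clip so each narration line starts and ends with its footage."""
    cues, t = [], 0.0
    for i, clip in enumerate(clips, start=1):
        start, end = t, t + clip.duration_s
        cues.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{clip.narration}\n")
        t = end
    return "\n".join(cues)
```

The same cue times could drive a text-to-speech track instead of captions. Timing then stays consistent between spoken narration and on-screen text.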

[0050] Step 6:

[0051] The server uploads the generated video file to the user's dedicated secure portal and sends a notification to the user. At this time, access permissions for the video are set, making it viewable only by the user.

[0052] Step 7:

[0053] Users can watch videos through a dedicated application. Furthermore, they can provide ratings and feedback on the videos, and the results can be used for future improvements and AI training.

[0054] (Example 1)

[0055] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0056] In modern society, the amount of data individuals generate in their daily lives is increasing, and there is a need to utilize this data effectively. However, extracting meaningful information from this vast amount of data and recording important personal moments requires advanced technology. The challenge lies in creating a system that allows users to easily record their daily activities, reflect on valuable life events, and generate professional-quality video.

[0057] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0058] In this invention, the system includes:
- a wearable device that continuously acquires the user's biometric and location information;
- a sensor device that complementarily acquires information about the user's environment;
- a data processing device that securely stores the acquired data in a database and performs filtering and indexing;
- an analysis device that analyzes the data using a multimodal AI engine and recognizes abnormal behavior and new activity patterns;
- a video generation device that automatically edits videos based on important events and constructs them into a story;
- a distribution device that provides the videos to the user securely.

This allows important events to be extracted automatically from the user's life log, making it easy for the user to look back on important past moments and enjoy them visually.

[0059] "User biometric information" refers to data that indicates an individual's physical condition and activity, such as heart rate and step count.

[0060] "Location information" refers to geographical data used to identify a user's current location and travel route.

[0061] A "wearable device" is a device that a user wears on their body, enabling them to continuously acquire biometric and location information.

[0062] A "sensor device" is a detector or measuring instrument used to acquire complementary information about the user's surrounding environment.

[0063] A "database" is an information storage system built to systematically store, manage, and utilize large amounts of acquired data.

[0064] "Filtering" is the process of extracting important data by removing unnecessary or incorrect information from acquired data.

[0065] "Indexing" is the process of organizing data by adding an index, making it easy to search and access.
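Filtering and indexing ([0064], [0065]) can be sketched as follows. The sketch assumes records are dicts with `ts`, `kind`, and `value` keys, and that a heart rate outside a hypothetical 30 to 220 bpm range counts as incorrect information. The index maps a (UTC date, data kind) pair to record positions, so a day's data of one kind can be retrieved without scanning everything.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical plausibility range for heart rate, in beats per minute.
HR_RANGE = (30, 220)

def filter_records(records: list[dict]) -> list[dict]:
    """Drop heart-rate records that lack a value or fall outside the plausible range."""
    kept = []
    for r in records:
        if r.get("kind") == "heart_rate":
            v = r.get("value")
            if v is None or not (HR_RANGE[0] <= v <= HR_RANGE[1]):
                continue
        kept.append(r)
    return kept

def build_index(records: list[dict]) -> dict[tuple[str, str], list[int]]:
    """Map (UTC date, kind) to the positions of matching records in `records`."""
    index: dict[tuple[str, str], list[int]] = defaultdict(list)
    for pos, r in enumerate(records):
        day = datetime.fromtimestamp(r["ts"], tz=timezone.utc).date().isoformat()
        index[(day, r["kind"])].append(pos)
    return dict(index)
```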

[0066] A "multimodal AI engine" is an artificial intelligence technology that integrates and analyzes multiple types of data to recognize abnormal behavior and new activity patterns.

[0067] An "analysis device" is a device that processes data to extract specific information and obtain meaningful results.

[0068] A "video generation device" is a device that extracts important events based on acquired data and automatically constructs a video with a narrative structure.

[0069] A "distribution device" is a communication device or system that provides generated videos to users safely and reliably.

[0070] This invention is an advanced system for efficiently recording a user's life log and generating important moments as videos. The system consists of multiple devices and software, and provides a mechanism for comprehensive data collection and analysis of user activities.

[0071] First, a wearable device, acting as a terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. Commercially available smartwatches and fitness trackers can be used for this process. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0072] Next, the server receives large datasets sent from the terminals and stores them securely in a database using advanced security protocols and cloud storage systems. The data is then filtered and organized to focus on critical information. The server incorporates a multimodal AI engine that analyzes the collected data in real time and recognizes anomalous behavior and new patterns; AI frameworks and machine learning algorithms are used for this analysis. For example, if the user runs at an unusual speed or new activity is recorded at a specific location, these are identified as important events.

[0073] Subsequently, the server uses video editing software to generate a video based on the recognized key events. The editing algorithm uses video transitions and effects to assemble still images, video footage, and audio data into a single story. This automated editing process can use existing video editing tools or custom-developed software.

[0074] Finally, the generated video is sent to the user in a secure format. The user can view the video through a dedicated application and relive their past experiences. For example, the user's activities on a day when they try a new dish can be recorded, and the process and results can be edited into a video, allowing them to relive the cooking process and see their family enjoying the dish at a later date.

[0075] An example of a prompt for a generative AI model would be the instruction, "Generate relevant videos based on records of a user performing a new activity at a specific location." This would allow the system to naturally generate valuable video recordings from the user's life log.

[0076] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0077] Step 1:

[0078] The device continuously acquires the user's biometric information (heart rate, steps) and location information using a wearable device. The input is real-time data from the device, which is recorded as a digital signal. The output is a dataset of biometric and location information organized in chronological order. Specifically, the device periodically activates sensors and stores the acquired data in temporary memory.
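The periodic sampling into temporary memory in Step 1 can be modeled as a bounded buffer. This is a minimal sketch with a hypothetical `Sample` record. A `deque` with `maxlen` discards the oldest samples if the device cannot upload in time. `drain` returns the chronologically ordered dataset described above.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    ts: float        # seconds since epoch
    heart_rate: int  # beats per minute
    steps: int       # cumulative step count
    lat: float
    lon: float

class SampleBuffer:
    """Bounded temporary store: once full, each new sample evicts the oldest one."""

    def __init__(self, capacity: int) -> None:
        self._buf: deque[Sample] = deque(maxlen=capacity)

    def add(self, sample: Sample) -> None:
        self._buf.append(sample)

    def drain(self) -> list[Sample]:
        """Return samples in chronological order and clear the buffer (e.g., before upload)."""
        out = sorted(self._buf, key=lambda s: s.ts)
        self._buf.clear()
        return out
```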

[0079] Step 2:

[0080] The device uses smart home sensors to acquire voice commands and acoustic data as environmental information. Inputs are ambient sounds and acoustic signals, which are analyzed using voice recognition technology. Outputs are event triggers and behavior estimation data based on the acoustic environment. Specifically, the sensor captures sound, performs noise filtering, and then organizes the results as digital data.
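A simple way to derive event triggers from the acoustic signal is to compute per-frame RMS energy and flag frames well above the background level. The sketch assumes samples normalized to [-1, 1], a noise floor estimated elsewhere, and a hypothetical ratio of 4. It stands in for, and is far simpler than, the voice recognition the step describes.

```python
import math

def frame_rms(signal: list[float], frame_len: int) -> list[float]:
    """Root-mean-square energy of consecutive, non-overlapping, complete frames."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in f) / frame_len) for f in frames]

def acoustic_triggers(signal: list[float], frame_len: int,
                      noise_floor: float, ratio: float = 4.0) -> list[int]:
    """Indices of frames whose energy exceeds `ratio` times the estimated noise floor."""
    return [i for i, e in enumerate(frame_rms(signal, frame_len)) if e > ratio * noise_floor]
```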

[0081] Step 3:

[0082] The terminal organizes acquired biometric, location, and environmental information as packet data and sends it to the server. The input is an integrated dataset from the sensors, which is encoded and transmitted via a communication module. The output is packet data delivered to the server through a secure communication channel. Specifically, the data packets are encrypted using the SSL/TLS protocol.
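The packetization and transport security in Step 3 might be sketched with the Python standard library. Records are serialized as compact JSON and compressed with `zlib`. `ssl.create_default_context()` yields a client context that verifies certificates and host names. The packet layout (`seq`, `records`) is hypothetical, and setting TLS 1.2 as the minimum is a design choice rather than something the source specifies.

```python
import json
import ssl
import zlib

def encode_packet(seq: int, records: list[dict]) -> bytes:
    """Serialize and compress one batch; `seq` lets the server detect gaps or duplicates."""
    body = json.dumps({"seq": seq, "records": records}, separators=(",", ":")).encode("utf-8")
    return zlib.compress(body)

def decode_packet(packet: bytes) -> dict:
    """Inverse of encode_packet, as the server would apply it after TLS decryption."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))

def client_tls_context() -> ssl.SSLContext:
    """Certificate-verifying client context that refuses protocol versions older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

In use, a connected TCP socket would be wrapped with `client_tls_context().wrap_socket(sock, server_hostname=host)` before sending packets. A real protocol would also need framing, such as a length prefix, to delimit packets on the stream.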

[0083] Step 4:

[0084] The server decrypts the data received from the terminal and stores it in a database. The input is the encrypted data packets, which are decrypted and classified as biometric or environmental information. The output is organized database entries. Specifically, the server filters the data and imputes missing values.
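The imputation of missing values in Step 4 could be as simple as linear interpolation between the nearest known readings. This is a minimal sketch, assuming missing values arrive as `None` in an evenly sampled series. The source does not say which imputation method is used.

```python
from typing import Optional

def impute_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps by linear interpolation between the nearest known neighbors.

    Leading or trailing gaps stay None because there is no anchor on one side.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (values[b] - values[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = values[a] + step * (i - a)
    return out
```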

[0085] Step 5:

[0086] The server analyzes the data in the database using a multimodal AI engine. The input is all data accumulated so far, to which the AI applies anomaly detection algorithms to recognize important events. The output is a list of events that indicate abnormal behavior or new activity patterns. Specifically, a machine learning model is applied to each data point to extract characteristic patterns.
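For the anomaly detection in Step 5, one lightweight option is a robust z-score based on the median and the median absolute deviation (MAD). It is used here in place of the unspecified machine learning model. The 0.6745 constant and the 3.5 cutoff follow the common Iglewicz and Hoaglin convention. Applying the score to per-run speeds, such as the "unusual speed" example, is an illustrative assumption.

```python
from statistics import median

def modified_z_scores(values: list[float]) -> list[float]:
    """Robust z-scores: 0.6745 * (x - median) / MAD, insensitive to the outliers themselves."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]

def anomalous_indices(values: list[float], threshold: float = 3.5) -> list[int]:
    """Positions whose robust z-score magnitude exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]
```

With running speeds of `[10.0, 10.5, 9.8, 10.2, 10.1, 18.0]` km/h, only the last run (index 5) is flagged. A plain mean and standard deviation would be inflated by that same outlier.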

[0087] Step 6:

[0088] The server activates a video generator based on recognized events. The input consists of an event list and associated image and audio data, which are used by video editing software to automatically construct a narrative video. The output is the completed video file. Specifically, the software places video clips on a timeline and combines transitions and effects.
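Placing clips on a timeline with transitions, as in Step 6, comes down to computing start and end times in which each clip overlaps its predecessor by the crossfade length. The sketch below is hypothetical. It assumes a single video track with a uniform crossfade.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    clip_id: str
    start_s: float
    end_s: float

def build_timeline(durations: list[tuple[str, float]], crossfade_s: float) -> list[Placement]:
    """Place clips in order, overlapping neighbors by `crossfade_s` to allow a dissolve."""
    placements, cursor = [], 0.0
    for clip_id, dur in durations:
        if dur <= crossfade_s:
            raise ValueError(f"clip {clip_id} is not longer than the transition")
        placements.append(Placement(clip_id, cursor, cursor + dur))
        cursor += dur - crossfade_s  # the next clip starts before this one ends
    return placements

def total_duration(timeline: list[Placement]) -> float:
    """End time of the last clip, i.e. the length of the finished video."""
    return max(p.end_s for p in timeline) if timeline else 0.0
```

Take three clips of 5, 4, and 6 seconds with a 1-second crossfade. They start at 0, 4, and 7 seconds, and the video lasts 13 seconds: the sum of the durations minus one crossfade per transition.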

[0089] Step 7:

[0090] The user receives and views the video provided by the server using a dedicated application. The input is a video file from the server, which the application decodes and plays. The output is the viewing experience itself, in which the user relives the recorded moments. Specifically, the user receives a notification, launches the application, and controls video playback.

[0091] (Application Example 1)

[0092] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0093] In modern cities, there is a need to efficiently collect large amounts of data on residents' activities and utilize it for urban management. However, there is a lack of mechanisms to record the actions of individual residents while maintaining their privacy and to extract meaningful information by linking it to the overall picture of the city. In particular, information processing technologies that can contribute to efficient transportation management and rapid response in emergencies based on behavioral trends are needed.

[0094] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0095] In this invention, the server includes means for acquiring user behavior information, means for processing the acquired behavior information, and means for analyzing the behavior information and extracting important events. This makes it possible to understand behavioral trends across the entire city, enabling efficient urban management and rapid response.

[0096] "User" refers to an individual or group that uses the system and is the entity that provides behavioral information.

[0097] "Behavioral information" refers to information that includes data on users' movement patterns and daily activities.

[0098] "Means" refers to the device or method used to achieve a specific function.

[0099] "Visual content" refers to media that includes visually recognizable information such as generated videos and images.

[0100] "Efficient urban management" refers to the optimal operation of urban transportation, public services, and infrastructure management.

[0101] The server is the heart of this system and is responsible for processing behavioral information sent by users. This behavioral information is collected through devices such as wearable devices and smartphones, and includes data such as the user's movement patterns and heart rate. These devices may include smartwatches like the Apple Watch and smartphones.

[0102] The server securely stores the received behavioral information in a database such as AWS® RDS. Multimodal AI built with TensorFlow® is used for data processing and computation, enabling analysis of behavioral trends across the entire city. AI algorithms detect abnormal behavior and new patterns, enabling traffic congestion prediction and event impact analysis.

[0103] Users receive the results, generated as visual content, through a smartphone application built with React Native. The app presents information that contributes to the efficient operation of the city and aims to improve the daily lives of residents.

[0104] As a concrete example, on the day a new art museum opens, it is conceivable that residents' museum visit history and public transportation usage information could be recorded and analyzed to propose transportation improvement measures for future events.

[0105] Examples of prompt statements to input into a generative AI model include the following:

[0106] "Analyze the facilities users visited on October 15, 2023, and their subsequent travel patterns, and create a report on the impact of that specific event on the city."

[0107] In this way, a system is realized that contributes to the efficient management of cities and the improvement of the quality of life for residents.

[0108] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0109] Step 1:

[0110] The device continuously acquires user behavior information. Specifically, it uses smartwatches and smartphones to record heart rate, location information, steps taken, and other data. This input data is sent to the cloud for subsequent analysis.

[0111] Step 2:

[0112] The server receives behavioral information sent from the terminal and stores it in a database such as AWS RDS. The input data includes time-series information and location data, and saving this data prepares it for later use in various analyses.
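Storing time-series behavior data for later analysis (Step 2) might use a schema like the one below. Here an in-process SQLite database stands in for a managed service such as AWS RDS. The table and column names are hypothetical. The composite index on (user, timestamp) supports the time-range queries that later analyses would issue.

```python
import sqlite3

# Hypothetical schema; a managed database such as AWS RDS would hold an equivalent table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS behavior (
    user_pid   TEXT    NOT NULL,  -- pseudonymous user ID
    ts         INTEGER NOT NULL,  -- seconds since epoch (UTC)
    lat        REAL    NOT NULL,
    lon        REAL    NOT NULL,
    heart_rate INTEGER,           -- may be NULL if the sensor was unavailable
    steps      INTEGER
);
CREATE INDEX IF NOT EXISTS idx_behavior_user_ts ON behavior (user_pid, ts);
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def insert_rows(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert rows in one transaction; the context manager commits or rolls back."""
    with conn:
        conn.executemany("INSERT INTO behavior VALUES (?, ?, ?, ?, ?, ?)", rows)

def rows_between(conn: sqlite3.Connection, user_pid: str, t0: int, t1: int) -> list[tuple]:
    """Chronological (ts, lat, lon) rows for one user in the half-open range [t0, t1)."""
    cur = conn.execute(
        "SELECT ts, lat, lon FROM behavior WHERE user_pid = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (user_pid, t0, t1),
    )
    return cur.fetchall()
```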

[0113] Step 3:

[0114] The server analyzes behavioral information stored in the database. Here, a generative AI model using TensorFlow is employed to detect abnormal behavior and new patterns from the data. Stored behavioral information is used as input, and the output provides anomaly detection results and behavioral patterns.

[0115] Step 4:

[0116] The server generates visual content based on key insights extracted from the analyzed data. At this time, it summarizes the anomaly detection results and creates a visual design to present them clearly to the user through a React Native application interface.
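The city-wide trend summary in Step 4 could start from hourly visit counts per area. Hours in which demand exceeds capacity are flagged, as a visualization might highlight for the art museum example in [0104]. The area IDs, capacities, and UTC hour bucketing below are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

def hourly_counts(visits: list[tuple[str, int]]) -> Counter:
    """Count visits per (area, UTC hour of day) from (area_id, unix_ts) pairs."""
    return Counter((area, datetime.fromtimestamp(ts, tz=timezone.utc).hour) for area, ts in visits)

def peak_hours(counts: Counter, capacity: dict[str, int]) -> list[tuple[str, int, int]]:
    """(area, hour, count) triples where visits exceed the area's capacity, busiest first.

    Areas without a configured capacity are never flagged.
    """
    over = [(area, hour, n) for (area, hour), n in counts.items()
            if n > capacity.get(area, float("inf"))]
    return sorted(over, key=lambda t: (-t[2], t[0], t[1]))
```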

[0117] Step 5:

[0118] Users receive generated visual content through a smartphone application. The application visualizes behavioral trends across the city and presents suggestions for efficient urban management. This gives users a means to optimize their activities within the city.

[0119] Through the above processing steps, the system analyzes user data and provides information useful for urban management.

[0120] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0121] This invention is a system that meticulously records a user's life log and generates videos based on it, and further includes a function to recognize the user's emotions and optimize the video content. This system consists of multiple devices and software.

[0122] First, the wearable device and camera used as the terminal record the user's physical activity and facial expressions 24 hours a day. This includes not only heart rate, steps taken, and travel route, but also emotion-related data such as facial expressions and tone of voice. The terminal temporarily stores the emotion data along with the activity data and sends it to the server at regular intervals.

[0123] Next, the server stores the received data in a database. This data is analyzed by a multimodal AI that includes an emotion engine. The emotion engine recognizes the user's emotional state in real time through facial expression analysis and voice analysis. This allows for the detection of various emotional states such as sadness, happiness, and surprise.

[0124] After the emotion and activity data are analyzed, the server generates a video based on the extracted key events. In this video generation process, the user's emotional states are used in the editing. For example, moments when the user is emotionally aroused are particularly emphasized. Narration generated using natural language processing and the selected music are also adjusted according to the emotional state.

[0125] Users can view the generated videos through a dedicated application. The application also provides a function for inputting feedback after viewing the videos, and this information is used to improve the system. In this way, the present invention provides a means of recording and reflecting on a rich, personalized life log that integrates emotions and activities. Specifically, footage of a user experiencing a sense of accomplishment at a sports event is recorded and edited so that the video conveys the joy of that moment. This allows users to vividly relive such moving moments.

[0126] The following describes the processing flow.

[0127] Step 1:

[0128] The device collects the user's biometric information through a wearable device. Simultaneously, it uses a camera and microphone to acquire the user's facial expressions and voice data. This allows for real-time recording of the user's activities and emotions.

[0129] Step 2:

[0130] The device temporarily stores collected biometric information and emotion-related data. At regular intervals, this data is encrypted and transferred to a server. This transfer is designed to protect user privacy.

[0131] Step 3:

[0132] The server stores the received data in a database. Here, the data is first preprocessed to remove noise and standardize its format. This prepares the data for analysis.
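
The preprocessing in Step 3 is not specified further. As one hypothetical reading, the following Python sketch removes isolated spikes with a sliding median filter and rescales a series to the range [0, 1]. The function names, the window size, and the choice of min-max scaling are assumptions for illustration only.

```python
from statistics import median


def median_filter(values: list[float], window: int = 3) -> list[float]:
    """Suppress isolated spikes by replacing each sample with the median of its
    neighbourhood. Near the edges the window is truncated."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(float(median(values[lo:hi])))
    return smoothed


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to [0, 1] so that series with different units share one format.
    A constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical heart-rate samples containing one sensor glitch (150 bpm).
heart_rate = [70, 71, 150, 72, 73]
clean = median_filter(heart_rate)  # [70.5, 71.0, 72.0, 73.0, 72.5]
scaled = min_max_normalize(clean)
```

A median filter is chosen here because, unlike a moving average, it discards a single outlier instead of smearing it into neighbouring samples.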

[0133] Step 4:

[0134] The emotion engine on the server analyzes the user's emotional state from facial and voice data. The emotion engine detects and quantifies emotions from subtle changes in facial expression and voice tone. This information is stored together with activity data.
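
Paragraph [0134] leaves open how facial and voice cues are combined. A common approach, assumed here and not taken from the disclosure, is late fusion: each modality yields a probability per emotion, and the engine takes a weighted average. The emotion labels, the default facial weight of 0.6, and the input dictionaries are hypothetical.

```python
EMOTIONS = ("joy", "sadness", "surprise", "neutral")  # hypothetical label set


def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.6) -> dict[str, float]:
    """Late fusion: weighted average of per-modality emotion probabilities,
    renormalised so that the result sums to 1."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be in [0, 1]")
    fused = {
        e: face_weight * face.get(e, 0.0) + (1.0 - face_weight) * voice.get(e, 0.0)
        for e in EMOTIONS
    }
    total = sum(fused.values())
    if total == 0.0:
        # No evidence from either modality: fall back to a uniform distribution.
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: p / total for e, p in fused.items()}


def dominant_emotion(probs: dict[str, float]) -> tuple[str, float]:
    """Return the most probable emotion and its score, used as a quantified intensity."""
    label = max(probs, key=probs.get)
    return label, probs[label]


face_scores = {"joy": 0.8, "neutral": 0.2}    # e.g. from a facial-expression model
voice_scores = {"joy": 0.4, "surprise": 0.6}  # e.g. from a voice-tone model
label, score = dominant_emotion(fuse_emotions(face_scores, voice_scores, face_weight=0.5))
```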

[0135] Step 5:

[0136] The server's multimodal AI integrates biometric and emotional data to extract key events. This process selects characteristic scenes, such as movements that deviate from everyday behavior or heightened emotions.
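
Step 5 ([0136]) selects scenes that deviate from everyday behavior or show heightened emotion. One simple way to express this, assumed here for illustration, is to standardize heart rate and emotion intensity within the day (z-scores) and flag samples whose weighted sum exceeds a threshold. The `Sample` fields, the equal weighting, and the threshold of 1.0 are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    t: int                    # seconds since the start of recording (hypothetical)
    heart_rate: float         # beats per minute from the wearable device
    emotion_intensity: float  # 0..1, assumed output of the emotion engine


def zscores(values: list[float]) -> list[float]:
    """Standardise values; a constant series yields all zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def extract_key_events(samples: list[Sample], hr_weight: float = 0.5,
                       threshold: float = 1.0) -> list[tuple[int, float]]:
    """Return (t, score) for samples whose combined deviation exceeds the threshold."""
    hr_z = zscores([s.heart_rate for s in samples])
    emo_z = zscores([s.emotion_intensity for s in samples])
    events = []
    for s, h, e in zip(samples, hr_z, emo_z):
        score = hr_weight * h + (1.0 - hr_weight) * e
        if score > threshold:
            events.append((s.t, score))
    return events


day = [
    Sample(0, 60, 0.1),
    Sample(1, 62, 0.1),
    Sample(2, 61, 0.2),
    Sample(3, 100, 0.9),  # e.g. crossing the finish line of a race
]
print([t for t, _ in extract_key_events(day)])  # [3]
```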

[0137] Step 6:

[0138] The server automatically generates videos based on the extracted information. Visual effects and music are optimized according to the emotional state. Narration is also generated using natural language processing technology and incorporated into the video.
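
Step 6 ([0138]) states that visual effects and music are optimized according to the emotional state but does not define the mapping. The sketch below assumes a fixed lookup table from emotion label to a music mood and transition, and marks high-intensity moments for slow-motion emphasis. The table contents, the 0.7 threshold, and the 0.5 playback rate are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float      # seconds into the source footage
    end: float
    emotion: str      # label from the emotion engine
    intensity: float  # 0..1


@dataclass
class Cut:
    start: float
    end: float
    music_mood: str
    transition: str
    playback_rate: float  # values below 1.0 mean slow motion for emphasis


# Hypothetical emotion-to-style table.
STYLE_BY_EMOTION = {
    "joy": ("upbeat", "quick_cut"),
    "surprise": ("energetic", "whip_pan"),
    "sadness": ("calm_piano", "slow_fade"),
}
DEFAULT_STYLE = ("ambient", "crossfade")


def plan_edit(events: list[Event], emphasis_threshold: float = 0.7) -> list[Cut]:
    """Turn extracted events into a chronological edit plan with emotion-dependent styling."""
    cuts = []
    for ev in sorted(events, key=lambda e: e.start):
        mood, transition = STYLE_BY_EMOTION.get(ev.emotion, DEFAULT_STYLE)
        rate = 0.5 if ev.intensity >= emphasis_threshold else 1.0
        cuts.append(Cut(ev.start, ev.end, mood, transition, rate))
    return cuts


plan = plan_edit([
    Event(120.0, 128.0, "joy", 0.9),
    Event(30.0, 35.0, "neutral", 0.2),
])
```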

[0139] Step 7:

[0140] Users receive notifications from the server and watch videos generated through a dedicated application. After watching the videos, users can input feedback into the application, and this feedback is used to improve the AI.

[0141] (Example 2)

[0142] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0143] Conventional life-logging systems can record user activity data and extract important events, but they have limitations in generating personalized content that reflects the user's emotional state. Furthermore, there is a need for a system that can automatically generate videos based on user emotions and activities, and perform real-time emotion recognition and editing during that process. This invention aims to solve these problems.

[0144] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0145] In this invention, the server includes means for acquiring user activity data and emotional data, means for processing the acquired activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, means for automatically generating videos according to the user's emotional state, means for adjusting music and narration appropriate to the emotion based on the generated video, and means for providing the generated video to the user and obtaining user feedback. This enables the personalized automatic generation of videos that reflect the user's emotional state, allowing the user to have a richer experience.

[0146] A "user" is the individual who utilizes the system and is the subject of collection of their activity data and emotional data.

[0147] "Activity data" refers to information related to a user's physical activity, including data such as heart rate, steps taken, and travel route.

[0148] "Emotional data" refers to information that reflects the user's emotional state, and is based on data such as facial expressions and tone of voice.

[0149] "Key events extracted" refer to particularly noteworthy occurrences or moments within the activity and emotional data.

[0150] "Means for automatically generating videos" refers to a process or device that automatically generates videos tailored to the user using acquired activity data and emotion data.

[0151] An "emotion analysis engine" is software or a system that analyzes a user's facial expressions and voice to recognize the user's emotional state in real time.

[0152] "Personalized content" refers to information or media that is tailored or generated based on the individual user's preferences and characteristics.

[0153] This invention is a system that meticulously records a user's life log and generates videos that take the user's emotions into consideration. This system consists of terminals such as wearable devices and cameras, and a server.

[0154] The device continuously records the user's physical activity and emotional data 24 hours a day. Specifically, the wearable device measures heart rate and steps using heart rate sensors and accelerometers, and tracks movement routes using GPS. It also uses a camera and microphone to analyze the user's facial expressions and voice tone to acquire emotional data. This allows it to capture emotional states such as joy, sadness, and surprise in real time.

[0155] The device sends collected activity and emotion data to the server at regular intervals. The server stores this data in a database and performs emotion analysis using deep learning frameworks such as TensorFlow or PyTorch. The emotion analysis engine recognizes the user's emotional state in real time from the collected facial and voice data and extracts important events by combining them with activity data.

[0156] Based on the extracted key events, the server automatically generates a video using video editing software similar to Adobe Premiere Pro. This video reflects the user's emotional state, and the narration and music are selected to match those emotions. For example, in a scene where the user experiences a sense of accomplishment, that moment is emphasized and joyful music is selected.

[0157] Users can view the generated videos through a dedicated application. This application also includes a feature that allows users to easily input feedback after viewing, and the system is improved based on this feedback. For example, a prompt message input to the generative AI model might be, "Please edit the video to highlight the emotional peaks of the past 24 hours."

[0158] In this way, the present invention provides a rich means of life logging and reflection that integrates the user's emotions and activities.

[0159] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0160] Step 1:

[0161] The device collects data on the user's physical activity and emotions. Specifically, the wearable device uses sensors to record heart rate, steps, and movement routes, and uses a camera and microphone to record facial expressions and voice tone. The input consists of the user's real-time physical information and emotional expressions, which are temporarily stored as time-series data within the device.

[0162] Step 2:

[0163] The device transmits collected data to the server at regular intervals. Specifically, it securely transfers the data stored on the device to the server in encrypted form via Wi-Fi or a mobile network. The input consists of activity data and emotion data from the device, and the output is a dataset stored on the server.

[0164] Step 3:

[0165] The server stores the received data in a database and analyzes it using an emotion analysis engine. Specifically, it uses deep learning frameworks such as TensorFlow and PyTorch to extract the user's emotional state from facial and voice data. The input is activity data and emotion data sent to the server, and the output is data labeled with the user's emotional state.

[0166] Step 4:

[0167] The server extracts important events using the results of the emotion analysis and generates a video based on those events. Specifically, it identifies moments of emotional peaks in the user's daily activities and automatically creates a video from them using video editing software similar to Adobe Premiere Pro. The input is event data from the emotion analysis, and the output is an edited video file.
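
To make "moments of emotional peaks" concrete, the following sketch finds local maxima in a per-second emotion-intensity series, keeps only those at or above a threshold, and discards weaker peaks that fall too close to a stronger one. The threshold and minimum gap are hypothetical parameters; the disclosure does not specify a peak criterion.

```python
def emotional_peaks(intensity: list[float], threshold: float = 0.7,
                    min_gap: int = 3) -> list[int]:
    """Indices of local maxima >= threshold that lie at least `min_gap` samples apart.

    When two peaks are closer than `min_gap`, the stronger one is kept.
    """
    n = len(intensity)
    candidates = [
        i for i in range(n)
        if intensity[i] >= threshold
        and (i == 0 or intensity[i] >= intensity[i - 1])
        and (i == n - 1 or intensity[i] > intensity[i + 1])
    ]
    kept: list[int] = []
    # Visit the strongest candidates first so they win any spacing conflict.
    for i in sorted(candidates, key=lambda k: intensity[k], reverse=True):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)


# Hypothetical per-second emotion intensity from the emotion analysis engine.
series = [0.1, 0.8, 0.2, 0.75, 0.1, 0.1, 0.1, 0.9, 0.3]
peaks = emotional_peaks(series)  # index 3 is dropped: too close to the stronger peak at 1
```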

[0168] Step 5:

[0169] Users view videos generated through a dedicated application. The application provides users with video playback functionality and also allows them to input feedback such as their impressions and suggestions for improvement after viewing. The input is the generated video, and the output is the user's feedback data.

[0170] Step 6:

[0171] The server collects user feedback and uses it to improve the system. Specifically, it analyzes user viewing history and feedback content, and uses this data to improve the accuracy of emotional expression and editing in future video production. The input is user feedback, and the output is data that serves as an indicator for system improvement.
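
Step 6 ([0171]) does not define the improvement indicator. As one assumed example, feedback ratings could be grouped by the dominant emotion of each video so that emotion categories whose edits are rated poorly are flagged for retraining. The record fields (`emotion`, `rating`) and the target of 3.5 on a 1 to 5 scale are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def improvement_indicators(feedback: list[dict], target: float = 3.5) -> dict[str, dict]:
    """Aggregate 1-5 ratings per dominant emotion and flag categories below the target."""
    ratings_by_emotion: dict[str, list[int]] = defaultdict(list)
    for fb in feedback:
        ratings_by_emotion[fb["emotion"]].append(fb["rating"])
    indicators = {}
    for emotion, ratings in ratings_by_emotion.items():
        avg = mean(ratings)
        indicators[emotion] = {
            "mean_rating": avg,
            "count": len(ratings),
            "needs_work": avg < target,
        }
    return indicators


report = improvement_indicators([
    {"video_id": "v1", "emotion": "joy", "rating": 5},
    {"video_id": "v2", "emotion": "joy", "rating": 4},
    {"video_id": "v3", "emotion": "sadness", "rating": 2},
    {"video_id": "v4", "emotion": "sadness", "rating": 3},
])
```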

[0172] (Application Example 2)

[0173] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0174] In recent years, systems utilizing personal life log data have attracted attention, but they are not sufficiently optimized for video editing that reflects emotional states. Furthermore, there is a lack of effective means to highlight particularly moving moments in users' daily lives. Therefore, there is a need for a system that enables the creation of rich videos that incorporate users' emotions.

[0175] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0176] In this invention, the server includes means for processing user activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, and means for automatically generating videos based on the extracted important events and emotional data and performing emotionally optimized editing. This allows users to richly reminisce about their life logs through videos, and to efficiently record and view particularly emotionally significant moments.

[0177] "User activity data" refers to information about an individual's physical activity, location, and daily behavior.

[0178] "Emotional data" refers to information about an individual's emotional state based on their facial expressions, tone of voice, and other physiological indicators.

[0179] "Processing means" refers to a system that converts or organizes acquired data into an appropriate format before analysis or editing.

[0180] "Means of analysis and extraction of important events" refers to a system that evaluates collected data and selects events that are significant to the user.

[0181] "A means of automatically generating videos and performing emotionally optimized editing" refers to a system that generates and edits videos that take into account an individual's emotional state, based on important events and emotional data.

[0182] "Means of delivery" refers to the interface or mechanism that allows users to access and use the generated content.

[0183] This invention is a system that automatically generates videos based on user activity and emotional data. A wearable device worn by the user and installed cameras continuously record the user's physical activity and emotional data 24 hours a day. This includes heart rate, movement data, facial expressions, and voice tone. This data is temporarily stored on the device and then periodically transmitted to a server.

[0184] The server operates in a high-performance cloud computing environment and stores incoming data in a database. This data is analyzed by an emotion engine utilizing multimodal AI. The emotion engine analyzes the data using specific algorithms to recognize the user's emotional state. Based on this analysis, important events are extracted from the data and a video that richly expresses emotions is generated. In the editing process, a generative AI model is used to incorporate narration and music that match the emotional state into the video.

[0185] Users view the generated videos through the interface of their smartphones or consumer robots. After watching the videos, users can provide feedback and contribute to further improvements to the system. This feedback is also part of the data analysis process and will be reflected in the next video generation.

[0186] As a concrete example, during a summer family trip, the surprise and joy of a child riding a roller coaster for the first time could be recorded, allowing the family to vividly relive that moment. An example of a prompt instructing a generative AI model could take the form, "Edit the highlights of last week's family trip into an emotionally rich video."

[0187] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0188] Step 1:

[0189] The device collects user activity and emotional data using wearable devices and cameras. Inputs include heart rate, location data, facial expressions, and voice tone. This data is temporarily stored in internal memory to complete the first stage of data collection.

[0190] Step 2:

[0191] The device transfers collected activity and emotion data to the server at regular intervals. It uses data stored in the device's memory as input and outputs it to a cloud service via the internet. This allows the data to be aggregated on the server in real time.

[0192] Step 3:

[0193] The server stores the received data in a database. The input is the transferred user activity and emotion data, and the output consists of multiple organized data segments stored in the database. At this stage, the stored data is ready for later analysis.

[0194] Step 4:

[0195] The server analyzes data using multi-modal AI and recognizes emotional states with an emotion engine. Input consists of activity and emotion data from a database, and the algorithm analyzes this data to output emotional states (e.g., joy, surprise, sadness). Based on this, the server selects important events.

[0196] Step 5:

[0197] The server automatically generates videos using a generative AI model based on emotional data. The input consists of selected key events and emotional states, and the output is emotionally optimized video content. The generated videos automatically incorporate narration and music based on emotions.
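
Paragraph [0186] gives a sample instruction for the generative AI model. The sketch below shows one hypothetical way the server might assemble such a prompt from the selected key events and their emotional states. The event fields and wording are assumptions, and the call to an actual model is deliberately omitted because it depends on the provider's API.

```python
def build_video_prompt(events: list[dict], period: str = "last week's family trip") -> str:
    """Compose a text prompt for a generative video model from key events.

    Each event is assumed to carry 'start' and 'end' (zero-padded HH:MM strings),
    'label', 'emotion' and 'intensity' (0..1); these field names are hypothetical.
    """
    lines = [
        f"Edit the highlights of {period} into an emotionally rich video.",
        "Use these key events in chronological order, and match narration and music to each emotion:",
    ]
    # Zero-padded HH:MM strings sort chronologically as plain strings.
    for ev in sorted(events, key=lambda e: e["start"]):
        lines.append(
            f"- {ev['start']} to {ev['end']}: {ev['label']} "
            f"(emotion: {ev['emotion']}, intensity: {ev['intensity']:.2f})"
        )
    return "\n".join(lines)


prompt = build_video_prompt([
    {"start": "14:05", "end": "14:07", "label": "first roller coaster ride",
     "emotion": "surprise", "intensity": 0.92},
    {"start": "12:30", "end": "12:40", "label": "lunch at the park",
     "emotion": "joy", "intensity": 0.55},
])
```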

[0198] Step 6:

[0199] Users view videos generated via smartphones or consumer robots. The input is video data provided by a server, and users view the video by displaying it on their device screen. After viewing the video, users can also provide feedback, which will be used to improve future video generation.

[0200] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0201] The data generation model 58 is so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input into the data generation model 58 together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0202] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0203] [Second Embodiment]

[0204] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0205] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0206] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0207] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0208] The microphone 238 receives voice instructions from the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0209] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0210] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0211] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0212] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0213] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0214] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0215] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0216] This invention is an advanced system for efficiently recording a user's life log and capturing important moments as video. This system consists of multiple devices and software, providing a mechanism for comprehensive data collection regarding the user's activities and for analyzing that data.

[0217] First, the wearable device, acting as the terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0218] Next, the server receives the large dataset sent from the terminal and securely stores it in a database. At this stage, the data is filtered and organized to focus on useful information. The server's built-in multimodal AI engine analyzes the collected data and recognizes unusual behavior and new patterns. For example, if a user runs at an unusual speed or new activity is recorded in a particular location, these are identified as important events.

[0219] The server then uses its automated editing function to generate a video based on the recognized key events. The editing algorithm used here applies video transitions and effects to assemble still images, video footage, and audio data into a single story while maintaining professional quality.

[0220] Finally, the generated video is sent to the user in a secure format. The user can view it through a dedicated application and relive past experiences. For example, the user's activities on a day when they try a new dish are recorded, and the process and results are edited into a video, allowing the user to relive the cooking process and see their family enjoying the dish at a later date.

[0221] The above outlines the steps required to implement this invention, demonstrating a new form of daily record-keeping that utilizes the user's life log.

[0222] The following describes the processing flow.

[0223] Step 1:

[0224] The device collects the user's biometric information, voice data, and environmental information using wearable devices and smart home sensors. This includes heart rate, steps taken, applications being used, and voice commands. The collected data is temporarily stored on the device.

[0225] Step 2:

[0226] The device uploads collected data to the server at regular intervals. During this process, the data is encrypted to prevent the identification of individuals and transmitted securely.

[0227] Step 3:

[0228] The server takes in the received data and first performs noise removal and format normalization. This preprocessing ensures that the data necessary for analysis is prepared in a clean state.

[0229] Step 4:

[0230] A multimodal AI within the server analyzes the preprocessed data and integrates information from each modality. It compares the data against existing data patterns to detect characteristic events and thereby extracts important events from the user's daily life.

[0231] Step 5:

[0232] The server generates a video based on the extracted key events. Narration generated using natural language processing is synchronized with the video, and background music and transitions are appropriately added to create a video with a narrative.

[0233] Step 6:

[0234] The server uploads the generated video file to the user's dedicated secure portal and sends a notification to the user. At this time, access permissions for the video are set, making it viewable only by the user.
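
Step 6 ([0234]) restricts viewing to the user but does not say how. One common technique, assumed here rather than taken from the disclosure, is an expiring signed link: the server signs the user ID, video ID, and expiry time with a secret key using HMAC-SHA256 and checks the signature on each request. All names below are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_access(secret: bytes, user_id: str, video_id: str, expires_at: int) -> str:
    """Return a hex HMAC-SHA256 signature binding user, video and expiry time."""
    message = f"{user_id}:{video_id}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def can_view(secret: bytes, user_id: str, video_id: str, expires_at: int,
             token: str, now: Optional[int] = None) -> bool:
    """Allow access only for the signed user and only before the expiry time."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = sign_access(secret, user_id, video_id, expires_at)
    # Constant-time comparison avoids leaking the signature through timing.
    return hmac.compare_digest(expected, token)


SECRET = b"server-side-secret"  # placeholder; real keys belong in a secret store
token = sign_access(SECRET, "user-123", "video-42", expires_at=1_700_000_000)
```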

[0235] Step 7:

[0236] Users can watch videos through a dedicated application. Furthermore, they can provide ratings and feedback on the videos, and the results can be used for future improvements and AI training.

[0237] (Example 1)

[0238] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0239] In modern society, the amount of data individuals generate in their daily lives is increasing, and there is a need to utilize this data effectively. However, extracting meaningful information from this vast amount of data and recording important personal moments requires advanced technology. The challenge lies in creating a system that allows users to easily record their daily activities, reflect on valuable life events, and generate professional-quality video.

[0240] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0241] In this invention, the server includes a wearable device that continuously acquires the user's biometric and location information, a sensor device that complementarily acquires the user's environmental information, a data processing device for securely storing the acquired data in a database and performing filtering and indexing, an analysis device that analyzes the data using a multimodal AI engine and recognizes abnormal behavior and new activity patterns, a video generation device that automatically edits moving images based on important events and constructs them into a story, and a distribution device for providing videos to the user in a secure manner. This allows important events to be automatically extracted from the user's life log, making it easy for the user to look back on important moments from the past and enjoy them visually.

[0242] "User biometric information" refers to data that indicates an individual's physical condition and activity, such as heart rate and step count.

[0243] "Location information" refers to geographical data used to identify a user's current location and travel route.

[0244] A "wearable device" is a device that a user wears on their body, enabling them to continuously acquire biometric and location information.

[0245] A "sensor device" is a detector or measuring instrument used to acquire complementary information about the user's surrounding environment.

[0246] A "database" is an information storage system built to systematically store, manage, and utilize large amounts of acquired data.

[0247] "Filtering" is the process of extracting important data by removing unnecessary or incorrect information from acquired data.

[0248] "Indexing" is the process of organizing data by adding an index, making it easy to search and access.

[0249] A "multimodal AI engine" is an artificial intelligence technology that integrates and analyzes multiple types of data to recognize abnormal behavior and new activity patterns.

[0250] An "analysis device" is a device that processes data to extract specific information and obtain meaningful results.

[0251] A "video generation device" is a device that extracts important events based on acquired data and automatically constructs a video with a narrative structure.

[0252] A "distribution device" is a communication device or system that provides generated videos to users safely and reliably.

[0253] This invention is an advanced system for efficiently recording a user's life log and generating important moments as videos. The system consists of multiple devices and software, and provides a mechanism for comprehensive data collection and analysis of user activities.

[0254] First, a wearable device, acting as a terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. Commercially available smartwatches and fitness trackers can be used for this process. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0255] Next, the server receives large datasets sent from the terminals and securely stores them in a database. Advanced security protocols and cloud storage systems are used to filter and organize this data, focusing on critical information. The server incorporates a multimodal AI engine that analyzes the collected data in real time, recognizing anomalous behavior and new patterns. AI frameworks and machine learning algorithms are used for data analysis. For example, if a user runs at an unusual speed or new activity is recorded in a specific location, these are identified as important events.

[0256] Subsequently, the server uses video editing software to generate a video based on the recognized key events. The editing algorithm utilizes video transitions and effects to construct still images, video footage, and audio data into a single story. This automated editing process uses existing video editing tools or custom-developed software.

[0257] Finally, the generated video is sent to the user in a secure format. The user can view the video through a dedicated application and relive their past experiences. For example, the user's activities on a day when they try a new dish can be recorded, and the process and results can be edited into a video, allowing them to relive the cooking process and see their family enjoying the dish at a later date.

[0258] An example of a prompt for a generative AI model would be the instruction, "Generate relevant videos based on records of a user performing a new activity at a specific location." This would allow the system to naturally generate valuable video recordings from the user's life log.

[0259] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0260] Step 1:

[0261] The device continuously acquires the user's biometric information (heart rate, steps) and location information using a wearable device. The input is real-time data from the device, which is recorded as a digital signal. The output is a dataset of biometric and location information organized in chronological order. Specifically, the device periodically activates sensors and stores the acquired data in temporary memory.
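
Step 1 ([0261]) describes periodic sensor reads that are kept in temporary memory and output in chronological order. A bounded buffer is one plausible structure for that memory; the sketch assumes a fixed capacity in which the oldest readings are dropped when the buffer is full. The `Reading` fields and the capacity are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    t: float           # Unix timestamp of the sample
    heart_rate: float  # bpm
    steps: int         # cumulative step count
    lat: float         # GPS latitude
    lon: float         # GPS longitude


class SensorBuffer:
    """Fixed-size temporary store; when full, the oldest reading is discarded."""

    def __init__(self, capacity: int) -> None:
        self._buf = deque(maxlen=capacity)

    def record(self, reading: Reading) -> None:
        self._buf.append(reading)

    def drain(self) -> list[Reading]:
        """Return the buffered readings in chronological order and empty the buffer."""
        batch = sorted(self._buf, key=lambda r: r.t)
        self._buf.clear()
        return batch


buf = SensorBuffer(capacity=2)
for t in (1.0, 2.0, 3.0):
    buf.record(Reading(t, 70.0, int(t) * 10, 35.68, 139.76))
batch = buf.drain()  # the reading at t=1.0 was evicted when the buffer filled
```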

[0262] Step 2:

[0263] The device uses smart home sensors to acquire voice commands and acoustic data as environmental information. Inputs are ambient sounds and acoustic signals, which are analyzed using voice recognition technology. Outputs are event triggers and behavior estimation data based on the acoustic environment. Specifically, the sensor captures sound, performs noise filtering, and then organizes the results as digital data.
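
Step 2 ([0263]) mentions noise filtering and event triggers derived from the acoustic environment. As a deliberately simple stand-in for the voice recognition stage, which is not reproduced here, the sketch computes the RMS energy of fixed-size frames, estimates the background level as the median frame energy, and triggers on frames that are several times louder. The frame size and the factor of 4 are assumptions.

```python
import math
from statistics import median


def frame_rms(samples: list[float], frame_size: int) -> list[float]:
    """Root-mean-square energy of consecutive frames (the last frame may be shorter)."""
    energies = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def acoustic_triggers(samples: list[float], frame_size: int = 4,
                      factor: float = 4.0) -> list[int]:
    """Indices of frames whose energy exceeds `factor` times the background level."""
    energies = frame_rms(samples, frame_size)
    background = max(median(energies), 1e-9)  # guard against digital silence
    return [i for i, e in enumerate(energies) if e > factor * background]


quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]  # e.g. a doorbell or a spoken command
signal = quiet * 2 + loud + quiet * 2
```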

[0264] Step 3:

[0265] The terminal organizes the acquired biometric, location, and environmental information into packet data and sends it to the server. The input is an integrated dataset from the sensors, which is encoded and transmitted via a communication module. The output is packet data delivered to the server through a secure communication channel. Specifically, the data packets are encrypted using the SSL/TLS protocol.
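Step 3 states only that packets are encrypted with SSL/TLS. The following Python standard-library sketch shows one possible form. Readings are serialized to compact JSON, compressed, and sent with a 4-byte length prefix over a certificate-verified TLS connection. The packet format, the length prefix, and the `build_packet`/`send_packet` names are assumptions and are not part of the disclosure.

```python
import json
import socket
import ssl
import zlib
from typing import Any, Dict, List


def build_packet(device_id: str, records: List[Dict[str, Any]]) -> bytes:
    """Serialize readings as compact JSON and compress them (hypothetical format)."""
    body = json.dumps({"device": device_id, "records": records},
                      separators=(",", ":"), sort_keys=True)
    return zlib.compress(body.encode("utf-8"))


def parse_packet(packet: bytes) -> Dict[str, Any]:
    """Server-side inverse of build_packet."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))


def send_packet(host: str, port: int, packet: bytes) -> None:
    """Send one length-prefixed packet over a certificate-verified TLS channel."""
    # create_default_context() verifies the server certificate and hostname
    # and negotiates TLS; legacy SSL protocol versions are not used.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(len(packet).to_bytes(4, "big") + packet)
```

The length prefix lets the server know where each packet ends on the stream before decrypting and parsing it.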

[0266] Step 4:

[0267] The server decrypts data received from the terminal and stores it in a database. The input is encrypted data packets, which are decrypted and identified as biometric or environmental information. The output is organized database entries. Specifically, the server-side layer filters the data and imputes missing values.
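One simple approach to the filtering and imputation in Step 4 is to discard physiologically implausible readings and then fill the gaps by linear interpolation. This minimal sketch assumes a heart-rate series that uses `None` for missing values, with an illustrative plausibility range of 30 to 220 bpm. The disclosure does not say which filter or imputation method the server uses.

```python
from typing import List, Optional, Sequence


def filter_range(values: Sequence[Optional[float]], lo: float = 30.0,
                 hi: float = 220.0) -> List[Optional[float]]:
    """Replace readings outside [lo, hi] with None (treated as missing)."""
    return [v if v is not None and lo <= v <= hi else None for v in values]


def impute_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation between the nearest known
    neighbours; leading and trailing gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    if not known:
        return out  # nothing to interpolate from
    for i in range(known[0]):
        out[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        out[i] = values[known[-1]]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = values[a] + (i - a) * (values[b] - values[a]) / (b - a)
    return out
```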

[0268] Step 5:

[0269] The server analyzes data in the database using a multimodal AI engine. The input is the entire dataset accumulated in the past, and the AI applies anomaly detection algorithms to recognize important events. The output is a list of events that indicate abnormal behavior or new activity patterns. Specifically, a machine learning model is applied to each data point to extract characteristic patterns.
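The anomaly detection in Step 5 is not specified further. As a stand-in, a z-score test against the user's own history flags values that deviate strongly from the past mean. This is a deliberately simple substitute for the machine learning model mentioned in the text, and the threshold of 3 standard deviations is an assumption.

```python
import statistics
from typing import List, Sequence


def zscore_anomalies(history: Sequence[float], today: Sequence[float],
                     threshold: float = 3.0) -> List[int]:
    """Indices in `today` whose value lies more than `threshold` population
    standard deviations from the mean of `history`."""
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        # Flat history: any change at all counts as unusual.
        return [i for i, x in enumerate(today) if x != mean]
    return [i for i, x in enumerate(today) if abs(x - mean) / sd > threshold]
```

Applied to running pace, for example, the same test would flag a run at an unusual speed as a candidate important event.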

[0270] Step 6:

[0271] The server activates a video generator based on recognized events. The input consists of an event list and associated image and audio data, which are used by video editing software to automatically construct a narrative video. The output is the completed video file. Specifically, the software places video clips on a timeline and combines transitions and effects.
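The timeline placement in Step 6 can be illustrated by computing where each clip starts and ends when consecutive clips overlap by a fixed crossfade. This minimal sketch assumes a hypothetical `Clip` type and a 0.5-second crossfade. The actual rendering would be done by the video editing software the text refers to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    source: str      # path or ID of an image, video, or audio clip (hypothetical)
    duration: float  # seconds


def layout_timeline(clips: List[Clip],
                    crossfade: float = 0.5) -> List[Tuple[str, float, float]]:
    """Place clips back to back, overlapping each pair of neighbours by
    `crossfade` seconds (capped at the shorter clip) so a transition can be
    rendered over the overlap. Returns (source, start, end) in seconds."""
    timeline = []
    t = 0.0
    for i, clip in enumerate(clips):
        if i > 0:
            t -= min(crossfade, clip.duration, clips[i - 1].duration)
        timeline.append((clip.source, t, t + clip.duration))
        t += clip.duration
    return timeline
```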

[0272] Step 7:

[0273] The user receives and views videos provided by the server using a dedicated application. The input is a video file from the server, which is decoded and played by the application. The output is the user's meaningful experience of watching the video. Specifically, the user receives a notification, launches the application, and controls the video playback.

[0274] (Application Example 1)

[0275] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0276] In modern cities, there is a need to efficiently collect large amounts of data on residents' activities and utilize it for urban management. However, there is a lack of mechanisms to record the actions of individual residents while maintaining their privacy and to extract meaningful information by linking it to the overall picture of the city. In particular, information processing technologies that can contribute to efficient transportation management and rapid response in emergencies based on behavioral trends are needed.

[0277] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0278] In this invention, the server includes means for acquiring user behavior information, means for processing the acquired behavior information, and means for analyzing the behavior information and extracting important events. This enables understanding behavioral trends across the entire city, allowing for efficient urban management and rapid response.

[0279] "User" refers to an individual or group that uses the system and is the entity that provides behavioral information.

[0280] "Behavioral information" refers to information that includes data on users' movement patterns and daily activities.

[0281] "Means" refers to the device or method used to achieve a specific function.

[0282] "Visual content" refers to media that includes visually recognizable information such as generated videos and images.

[0283] "Efficient urban management" refers to the optimal operation of urban transportation, public services, and infrastructure management.

[0284] The server is the center of this system and is responsible for processing the behavioral information sent by users. The behavioral information is collected through terminals such as wearable devices and smartphones and includes data such as the user's movement patterns and heart rate. Examples of these terminals include smartwatches such as the Apple Watch, and smartphones.

[0285] The server securely stores the received behavioral information in a database such as AWS RDS. Data processing and computation use multimodal AI built with TensorFlow, and on this basis the server analyzes behavioral trends across the entire city. The AI algorithm detects abnormal behaviors and new patterns, enabling prediction of traffic congestion and analysis of the impact of events.
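City-level trend analysis could start from something as simple as counting distinct users per area per hour and flagging buckets that exceed an area's capacity. This sketch assumes that visits arrive as `(user_id, area_id, timestamp)` tuples and that a per-area capacity table exists. Both are hypothetical. The disclosure itself uses TensorFlow-based multimodal AI rather than this counting rule.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def hourly_counts(visits: Iterable[Tuple[str, str, datetime]]) -> Counter:
    """Count distinct users per (area_id, hour) bucket."""
    seen = set()
    counts: Counter = Counter()
    for user_id, area_id, ts in visits:
        bucket = (area_id, ts.replace(minute=0, second=0, microsecond=0))
        if (user_id, bucket) not in seen:  # count each user once per bucket
            seen.add((user_id, bucket))
            counts[bucket] += 1
    return counts


def congested(counts: Counter,
              capacity: Dict[str, int]) -> List[Tuple[str, datetime]]:
    """Buckets whose distinct-user count exceeds the area's capacity;
    areas without a capacity entry are never flagged."""
    return sorted(b for b, n in counts.items()
                  if n > capacity.get(b[0], float("inf")))
```

In the museum-opening scenario, the flagged hours would indicate where transport capacity fell short and could inform measures for future events.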

[0286] Users receive the results generated as visual content through a smartphone application built with React Native. The application presents information that contributes to the efficient operation of the city and aims to improve the daily lives of residents.

[0287] As a specific example, on the day a new museum opens, residents' visits to the museum and their use of public transportation are recorded and analyzed, and traffic improvement measures for future events are proposed.

[0288] Examples of prompt sentences input into the generative AI model include the following.

[0289] "Analyze the facilities visited by the user on October 15, 2023 and their subsequent movement patterns, and create a report on the impact of a specific event on the city."

[0290] In this way, a system is realized that contributes to the efficient operation of the city and the improvement of the quality of residents' daily lives.

[0291] The flow of specific processing in Application Example 1 will be described using Figure 12.

[0292] Step 1:

[0293] The device continuously acquires user behavior information. Specifically, it uses smartwatches and smartphones to record heart rate, location information, steps taken, and other data. This input data is sent to the cloud for subsequent analysis.

[0294] Step 2:

[0295] The server receives behavioral information sent from the terminal and stores it in a database such as AWS RDS. The input data includes time-series information and location data, and saving this data prepares it for later use in various analyses.
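As an illustration of how Step 2 stores time-series and location data, the sketch below uses the standard-library `sqlite3` module as a local stand-in for a managed database such as AWS RDS. The table name, the columns, and the `(user_id, ts)` primary key are assumptions.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS behavior (
    user_id    TEXT NOT NULL,
    ts         TEXT NOT NULL,  -- ISO-8601 timestamp, one fixed format and timezone
    lat        REAL,
    lon        REAL,
    heart_rate REAL,
    steps      INTEGER,
    PRIMARY KEY (user_id, ts)
);
"""

Row = Tuple[str, str, Optional[float], Optional[float], Optional[float], Optional[int]]


def store(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Create the table if needed and upsert rows; a resent reading with the
    same (user_id, ts) replaces the earlier copy."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO behavior VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def location_series(conn: sqlite3.Connection,
                    user_id: str) -> List[Tuple[str, float, float]]:
    """Time-ordered (ts, lat, lon) for one user; ISO-8601 strings in a single
    format sort chronologically, so ORDER BY ts is sufficient."""
    return conn.execute(
        "SELECT ts, lat, lon FROM behavior WHERE user_id = ? ORDER BY ts",
        (user_id,)).fetchall()
```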

[0296] Step 3:

[0297] The server analyzes behavioral information stored in the database. Here, a generative AI model using TensorFlow is employed to detect abnormal behavior and new patterns from the data. Stored behavioral information is used as input, and the output provides anomaly detection results and behavioral patterns.

[0298] Step 4:

[0299] The server generates visual content based on key insights extracted from the analyzed data. At this time, it summarizes the anomaly detection results and creates a visual design to present them clearly to the user through a React Native application interface.

[0300] Step 5:

[0301] Users receive generated visual content through a smartphone application. The application visualizes behavioral trends across the city and presents suggestions for efficient urban management. This gives users a means to optimize their activities within the city.

[0302] Through the above processing steps, the system analyzes the user's data and provides information useful for urban management.

[0303] Furthermore, an emotion engine for estimating the user's emotions may be combined. That is, the specific processing unit 290 may estimate the user's emotions using the emotion identification model 59 and perform specific processing using the user's emotions.

[0304] The present invention is a system that records the user's life log in detail and generates a video based on it, and further has a function of recognizing the user's emotions and optimizing the video content. This system is composed of a plurality of devices and software.

[0305] First, wearable devices and cameras used as terminals record the user's physical activities and expressions for 24 hours. This includes not only heart rate, number of steps, and movement routes, but also data related to emotions such as facial expressions and voice tones. The terminal temporarily stores the emotion data together with the activity data and transmits it to the server at regular intervals.

[0306] Next, the server stores the received data in a database. This data is analyzed by a multimodal AI including an emotion engine. The emotion engine recognizes the user's emotional state in real time through facial expression analysis and voice analysis. As a result, various emotional states such as sad, happy, and surprised are detected.

[0307] After the emotion data and activity data are analyzed, the server generates a video based on the extracted important events. In this video generation process, the emotional state is utilized for editing. For example, moments when the user gets emotionally excited are edited in a particularly emphasized form. The narration generated using natural language processing and the selected music are also adjusted according to the emotional state.
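One way to emphasize moments when the user is emotionally excited is to rank segments by an emotion-intensity score and slow down the strongest ones. This minimal sketch assumes that each segment already carries an intensity in [0, 1] and that emphasis means half-speed playback. Both choices are illustrative.

```python
from typing import List, Tuple


def plan_emphasis(segments: List[Tuple[str, float]], top_k: int = 2,
                  slow: float = 0.5) -> List[Tuple[str, float]]:
    """segments: (segment_id, emotion_intensity in [0, 1]) in chronological order.
    Returns (segment_id, playback_speed), keeping the original order: the
    top_k most intense segments play at `slow` speed, the rest at 1.0."""
    ranked = sorted(range(len(segments)), key=lambda i: segments[i][1],
                    reverse=True)
    emphasised = set(ranked[:top_k])
    return [(seg_id, slow if i in emphasised else 1.0)
            for i, (seg_id, _) in enumerate(segments)]
```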

[0308] Users can view the generated videos through a dedicated application. The application also provides a function for inputting feedback after viewing the videos, and this information is used to improve the system. In this way, the present invention provides a means of recording and reflecting on a rich, personalized life log that integrates emotions and activities. For example, footage of a user experiencing a sense of accomplishment at a sports event is recorded and edited to emphasize the joy of that moment, allowing the user to vividly relive it.

[0309] The following describes the processing flow.

[0310] Step 1:

[0311] The device collects the user's biometric information through a wearable device. Simultaneously, it uses a camera and microphone to acquire the user's facial expressions and voice data. This allows for real-time recording of the user's activities and emotions.

[0312] Step 2:

[0313] The device temporarily stores collected biometric information and emotion-related data. At regular intervals, this data is encrypted and transferred to a server. This transfer is designed to protect user privacy.

[0314] Step 3:

[0315] The server stores the received data in a database. Here, the data is first preprocessed to remove noise and standardize its format. This prepares the data for analysis.

[0316] Step 4:

[0317] The emotion engine on the server analyzes the user's emotional state from facial and voice data. The emotion engine detects and quantifies emotions from subtle changes in facial expression and voice tone. This information is stored together with activity data.
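The text does not say how facial and vocal cues are combined. A common option is weighted late fusion of per-modality probability scores. The sketch assumes a fixed four-label emotion set and a weight of 0.6 on the facial channel.

```python
from typing import Dict, Tuple

# Assumed label set; the disclosure mentions joy, sadness, and surprise.
EMOTIONS = ("joy", "sadness", "surprise", "neutral")


def fuse(face: Dict[str, float], voice: Dict[str, float],
         w_face: float = 0.6) -> Tuple[str, float]:
    """Weighted late fusion of facial and vocal emotion scores.
    Missing labels count as 0; the fused scores are renormalised to sum to 1.
    Returns (dominant_emotion, fused_score)."""
    fused = {e: w_face * face.get(e, 0.0) + (1 - w_face) * voice.get(e, 0.0)
             for e in EMOTIONS}
    total = sum(fused.values())
    if total > 0:
        fused = {e: v / total for e, v in fused.items()}
    label = max(fused, key=fused.get)
    return label, fused[label]
```

The fused score can serve as the quantified emotion that is stored alongside the activity data.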

[0318] Step 5:

[0319] The server's multimodal AI integrates biometric and emotional data to extract key events. This process selects characteristic scenes, such as movements that deviate from everyday behavior or heightened emotions.
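A simple way to express the selection of scenes that deviate from everyday behavior or show heightened emotion is to flag fixed-length windows in which either signal crosses a threshold. Consecutive flagged windows are then merged into events. The 60-second window and both thresholds are hypothetical, and the heart-rate deviation is assumed to be given as a z-score.

```python
from typing import List, Sequence, Tuple


def key_events(hr_z: Sequence[float], emotion: Sequence[float],
               window_s: float = 60.0, z_thr: float = 2.0,
               e_thr: float = 0.8) -> List[Tuple[float, float]]:
    """hr_z: heart-rate z-score per window; emotion: emotion intensity in
    [0, 1] per window. A window is flagged if either signal is unusual;
    runs of consecutive flagged windows become one (start_s, end_s) event."""
    flagged = [abs(z) > z_thr or e > e_thr for z, e in zip(hr_z, emotion)]
    events = []
    start = None
    for i, is_flagged in enumerate(flagged + [False]):  # sentinel closes a final run
        if is_flagged and start is None:
            start = i
        elif not is_flagged and start is not None:
            events.append((start * window_s, i * window_s))
            start = None
    return events
```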

[0320] Step 6:

[0321] The server automatically generates videos based on the extracted information. Visual effects and music are optimized according to the emotional state. Narration is also generated using natural language processing technology and incorporated into the video.

[0322] Step 7:

[0323] Users receive notifications from the server and watch videos generated through a dedicated application. After watching the videos, users can input feedback into the application, and this feedback is used to improve the AI.

[0324] (Example 2)

[0325] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0326] Conventional life-logging systems can record user activity data and extract important events, but they have limitations in generating personalized content that reflects the user's emotional state. Furthermore, there is a need for a system that can automatically generate videos based on user emotions and activities, and perform real-time emotion recognition and editing during that process. This invention aims to solve these problems.

[0327] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0328] In this invention, the server includes means for acquiring user activity data and emotional data, means for processing the acquired activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, means for automatically generating videos according to the user's emotional state, means for adjusting music and narration appropriate to the emotion based on the generated video, and means for providing the generated video to the user and obtaining user feedback. This enables the personalized automatic generation of videos that reflect the user's emotional state, allowing the user to have a richer experience.

[0329] A "user" is the individual who utilizes the system and is the subject of collection of their activity data and emotional data.

[0330] "Activity data" refers to information related to a user's physical activity, including data such as heart rate, steps taken, and travel route.

[0331] "Emotional data" refers to information that reflects the user's emotional state, and is based on data such as facial expressions and tone of voice.

[0332] "Key events extracted" refer to particularly noteworthy occurrences or moments within the activity and emotional data.

[0333] "Means for automatically generating videos" refers to a process or device that automatically generates videos tailored to the user using acquired activity data and emotion data.

[0334] An "emotion analysis engine" is software or a system that analyzes a user's facial expressions and voice to recognize the user's emotional state in real time.

[0335] "Personalized content" refers to information or media that is tailored or generated based on the individual user's preferences and characteristics.

[0336] This invention is a system that meticulously records a user's life log and generates videos that take the user's emotions into consideration. This system consists of terminals such as wearable devices and cameras, and a server.

[0337] The device continuously records the user's physical activity and emotional data 24 hours a day. Specifically, the wearable device measures heart rate and steps using heart rate sensors and accelerometers, and tracks movement routes using GPS. It also uses a camera and microphone to analyze the user's facial expressions and voice tone to acquire emotional data. This allows it to capture emotional states such as joy, sadness, and surprise in real time.

[0338] The device sends collected activity and emotion data to the server at regular intervals. The server stores this data in a database and performs emotion analysis using deep learning frameworks such as TensorFlow or PyTorch. The emotion analysis engine recognizes the user's emotional state in real time from the collected facial and voice data and extracts important events by combining them with activity data.

[0339] Based on the extracted key events, the server automatically generates a video using video editing software similar to Adobe Premiere Pro. This video reflects the user's emotional state, and the narration and music are selected to match those emotions. For example, in a scene where the user experiences a sense of accomplishment, that moment is emphasized, and music based on joy is selected.
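Matching music to the detected emotion can be sketched as a lookup from emotion label to candidate tracks. The sketch avoids repeating a track within one video and falls back to a neutral track when no match remains. The library contents and track IDs are placeholders and are not part of the disclosure.

```python
from typing import Dict, List, Set

# Placeholder library: emotion label -> candidate track IDs (hypothetical).
LIBRARY: Dict[str, List[str]] = {
    "joy": ["upbeat_01", "upbeat_02"],
    "sadness": ["piano_slow_01"],
    "surprise": ["strings_rise_01"],
}


def pick_track(emotion: str, used: Set[str],
               fallback: str = "ambient_01") -> str:
    """Return the first track for `emotion` not yet used in this video,
    recording it in `used`; otherwise return the neutral fallback."""
    for track in LIBRARY.get(emotion, []):
        if track not in used:
            used.add(track)
            return track
    return fallback
```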

[0340] Users can view the generated videos through a dedicated application. This application also includes a feature that allows users to easily input feedback after viewing, and the system is improved based on this feedback. An example of a prompt input to the generative AI model is, "Please edit the video to highlight the emotional peaks of the past 24 hours."

[0341] In this way, the present invention provides a rich means of life logging and reflection that integrates the user's emotions and activities.

[0342] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0343] Step 1:

[0344] The device collects data on the user's physical activity and emotions. Specifically, the wearable device uses sensors to record heart rate, steps, and movement routes, and uses a camera and microphone to record facial expressions and voice tone. The input consists of the user's real-time physical information and emotional expressions, which are temporarily stored as time-series data within the device.

[0345] Step 2:

[0346] The device transmits the collected data to the server at regular intervals. Specifically, it securely transfers the data stored on the device to the server in encrypted form via Wi-Fi or a mobile network. The input consists of activity data and emotion data from the device, and the output is a dataset stored on the server.

[0347] Step 3:

[0348] The server stores the received data in a database and analyzes it using an emotion analysis engine. Specifically, it uses deep learning frameworks such as TensorFlow and PyTorch to extract the user's emotional state from facial and voice data. The input is activity data and emotion data sent to the server, and the output is data labeled with the user's emotional state.

[0349] Step 4:

[0350] The server extracts important events using the results of the emotion analysis and generates a video based on those events. Specifically, it identifies moments of emotional peaks in the user's daily activities and automatically creates a video from them using video editing software similar to Adobe Premiere Pro. The input is event data from the emotion analysis, and the output is an edited video file.

[0351] Step 5:

[0352] Users view videos generated through a dedicated application. The application provides users with video playback functionality and also allows them to input feedback such as their impressions and suggestions for improvement after viewing. The input is the generated video, and the output is the user's feedback data.

[0353] Step 6:

[0354] The server collects user feedback and uses it to improve the system. Specifically, it analyzes user viewing history and feedback content, and uses this data to improve the accuracy of emotional expression and editing in future video production. The input is user feedback, and the output is data that serves as an indicator for system improvement.
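Step 6 leaves open how feedback changes future editing. One lightweight possibility is to keep a per-emotion preference score that is updated as an exponential moving average of the user's 1-5 ratings. This is an assumption, not the disclosed method. A higher score could then mean that scenes with that emotion receive more emphasis.

```python
from typing import Dict


def update_preference(pref: Dict[str, float], emotion: str, rating: int,
                      alpha: float = 0.2) -> Dict[str, float]:
    """Return a new preference table in which the score for `emotion` moves
    toward the normalised rating (1..5 mapped to 0..1) by factor `alpha`.
    Unseen emotions start from a neutral 0.5."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    target = (rating - 1) / 4
    updated = dict(pref)
    updated[emotion] = (1 - alpha) * pref.get(emotion, 0.5) + alpha * target
    return updated
```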

[0355] (Application Example 2)

[0356] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0357] In recent years, systems utilizing personal life log data have attracted attention, but they are not sufficiently optimized for video editing that reflects emotional states. Furthermore, there is a lack of effective means to highlight particularly moving moments in users' daily lives. Therefore, there is a need for a system that enables the creation of rich videos that incorporate users' emotions.

[0358] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0359] In this invention, the server includes means for processing user activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, and means for automatically generating videos based on the extracted important events and emotional data and performing emotionally optimized editing. This allows users to richly reminisce about their life logs through videos, and to efficiently record and view particularly emotionally significant moments.

[0360] "User activity data" refers to information about an individual's physical activity, location, and daily behavior.

[0361] "Emotional data" refers to information about an individual's emotional state based on their facial expressions, tone of voice, and other physiological indicators.

[0362] "Processing means" refers to a system that converts or organizes acquired data into an appropriate format before analysis or editing.

[0363] "Means of analysis and extraction of important events" refers to a system that evaluates collected data and selects events that are significant to the user.

[0364] "A means of automatically generating videos and performing emotionally optimized editing" refers to a system that generates and edits videos that take into account an individual's emotional state, based on important events and emotional data.

[0365] "Means of delivery" refers to the interface or mechanism that allows users to access and use the generated content.

[0366] This invention is a system that automatically generates videos based on user activity and emotional data. A wearable device worn by the user and installed cameras continuously record the user's physical activity and emotional data 24 hours a day. This includes heart rate, movement data, facial expressions, and voice tone. This data is temporarily stored on the device and then periodically transmitted to a server.

[0367] The server operates on a high-performance cloud computing environment and stores incoming data in a database. This data is analyzed by an emotion engine utilizing multi-modal AI. The emotion engine analyzes the data using specific algorithms to recognize the user's emotional state. This extracts important events from the data and generates a video that expresses emotions richly. In the editing process, a generative AI model is used to incorporate narration and music that match the emotional state into the video.

[0368] Users view the generated videos through the interface of their smartphones or consumer robots. After watching the videos, users can provide feedback and contribute to further improvements to the system. This feedback is also part of the data analysis process and will be reflected in the next video generation.

[0369] As a concrete example, the surprise and joy of a child riding a roller coaster for the first time during a summer family trip could be recorded, allowing the family to vividly relive that moment. An example of a prompt instructing a generative AI model could be, "Edit the highlights of last week's family trip into an emotionally rich video."

[0370] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0371] Step 1:

[0372] The device collects user activity and emotional data using wearable devices and cameras. Inputs include heart rate, location data, facial expressions, and voice tone. This data is temporarily stored in internal memory to complete the first stage of data collection.

[0373] Step 2:

[0374] The device transfers collected activity and emotion data to the server at regular intervals. It uses data stored in the device's memory as input and outputs it to a cloud service via the internet. This allows the data to be aggregated on the server in real time.

[0375] Step 3:

[0376] The server stores the received data in a database. The input is the transferred user activity and emotion data, and the output consists of multiple organized data segments stored in the database. At this stage, the stored data is ready for later analysis.

[0377] Step 4:

[0378] The server analyzes data using multi-modal AI and recognizes emotional states with an emotion engine. Input consists of activity and emotion data from a database, and the algorithm analyzes this data to output emotional states (e.g., joy, surprise, sadness). Based on this, the server selects important events.

[0379] Step 5:

[0380] The server automatically generates videos using a generative AI model based on emotional data. The input consists of selected key events and emotional states, and the output is emotionally optimized video content. The generated videos automatically incorporate narration and music based on emotions.

[0381] Step 6:

[0382] Users view videos generated via smartphones or consumer robots. The input is video data provided by a server, and users view the video by displaying it on their device screen. After viewing the video, users can also provide feedback, which will be used to improve future video generation.

[0383] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0384] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0385] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0386] [Third Embodiment]

[0387] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0388] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0389] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0390] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0391] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0392] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0393] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0394] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0395] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0396] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0397] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0398] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0399] This invention is an advanced system for efficiently recording a user's life log and capturing important moments as video. This system consists of multiple devices and software, providing a mechanism for comprehensive data collection regarding the user's activities and for analyzing that data.

[0400] First, the wearable device, acting as the terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0401] Next, the server receives the large dataset sent from the terminal and securely stores it in a database. At this stage, the data is filtered and organized to focus on useful information. The server's built-in multimodal AI engine analyzes the collected data and recognizes unusual behavior and new patterns. For example, if a user runs at an unusual speed or new activity is recorded in a particular location, these are identified as important events.

[0402] The server then uses its automated editing function to generate a video based on the recognized key events. The editing algorithm used here maintains professional quality by applying video transitions and effects to assemble still images, videos, and audio data into a single story.

[0403] Finally, the generated video is sent to the user in a secure format. The user can view it through a dedicated application and relive past experiences. For example, the user's activities on a day when they try a new dish are recorded, and the process and results are edited into a video, allowing the user to relive the cooking process and see their family enjoying the dish at a later date.

[0404] The above outlines the steps required to implement this invention, demonstrating a new form of daily record-keeping that utilizes the user's life log.

[0405] The following describes the processing flow.

[0406] Step 1:

[0407] The device collects the user's biometric information, voice data, and environmental information using wearable devices and smart home sensors. This includes heart rate, steps taken, applications being used, and voice commands. The collected data is temporarily stored on the device.

[0408] Step 2:

[0409] The device uploads collected data to the server at regular intervals. During this process, the data is encrypted to prevent the identification of individuals and transmitted securely.

[0410] Step 3:

[0411] The server takes in the received data and first performs noise removal and format normalization. This preprocessing ensures that the data necessary for analysis is prepared in a clean state.
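The preprocessing in Step 3 can be illustrated with a minimal sketch. It assumes two things the text does not specify: that "format normalization" means renaming device-specific field names to one common schema (the aliases below are hypothetical), and that "noise removal" is done with a simple sliding-window median filter, which is one common choice for suppressing isolated sensor spikes.

```python
from statistics import median

# Hypothetical field aliases: different devices may report the same quantity under different keys.
FIELD_ALIASES = {"hr": "heart_rate", "bpm": "heart_rate", "steps_count": "steps"}


def normalize_format(record: dict) -> dict:
    """Rename device-specific keys to one common schema; unknown keys pass through unchanged."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def median_filter(values: list[float], window: int = 3) -> list[float]:
    """Suppress isolated spikes by replacing each sample with the median of its neighbourhood.

    `window` should be odd; the neighbourhood is truncated at the ends of the series.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed
```

For example, `median_filter([70, 71, 180, 72, 73])` returns `[70.5, 71, 72, 73, 72.5]`, so the single 180 bpm spike no longer appears in the series passed on to analysis.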

[0412] Step 4:

[0413] A multimodal AI within the server analyzes the preprocessed data and integrates information from each modality. It detects characteristic events by comparing the data against existing data patterns, thereby extracting important events from the user's daily life.
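One simple way to "compare against existing data patterns" is to keep a per-modality baseline from the user's history and flag today's value when it deviates strongly from it. The sketch below assumes a z-score test with a threshold of 3 standard deviations; both the statistic and the threshold are illustrative choices, not the method disclosed in the source, which only refers to a multimodal AI.

```python
from statistics import mean, stdev


def is_unusual(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """True if `value` lies more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # too little history to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # any change from a perfectly constant baseline is unusual
    return abs(value - mu) / sigma > threshold


def extract_events(history: dict[str, list[float]], today: dict[str, float],
                   threshold: float = 3.0) -> list[str]:
    """Return the modalities whose latest value deviates from that modality's own baseline."""
    return [m for m, v in today.items()
            if m in history and is_unusual(history[m], v, threshold)]
```

With a running-speed history around 10 km/h, a reading of 14 km/h is flagged as an important event, while an ordinary heart-rate reading is not.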

[0414] Step 5:

[0415] The server generates a video based on the extracted key events. Narration generated using natural language processing is synchronized with the video, and background music and transitions are appropriately added to create a video with a narrative.

[0416] Step 6:

[0417] The server uploads the generated video file to the user's dedicated secure portal and sends a notification to the user. At this time, access permissions for the video are set, making it viewable only by the user.

[0418] Step 7:

[0419] Users can watch videos through a dedicated application. Furthermore, they can provide ratings and feedback on the videos, and the results can be used for future improvements and AI training.

[0420] (Example 1)

[0421] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0422] In modern society, the amount of data individuals generate in their daily lives is increasing, and there is a need to utilize this data effectively. However, extracting meaningful information from this vast amount of data and recording important personal moments requires advanced technology. The challenge lies in creating a system that allows users to easily record their daily activities, reflect on valuable life events, and generate professional-quality video.

[0423] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0424] In this invention, the system includes a wearable device that continuously acquires the user's biometric and location information, a sensor device that complementarily acquires the user's environmental information, a data processing device for securely storing the acquired data in a database and performing filtering and indexing, an analysis device that analyzes the data using a multimodal AI engine and recognizes abnormal behavior and new activity patterns, a video generation device that automatically edits video based on important events and constructs it into a story, and a distribution device for providing videos to the user securely. This allows important events to be automatically extracted from the user's life log, making it easy for the user to look back on important moments from the past and enjoy them visually.

[0425] "User biometric information" refers to data that indicates an individual's physical condition and activity, such as heart rate and step count.

[0426] "Location information" refers to geographical data used to identify a user's current location and travel route.

[0427] A "wearable device" is a device that a user wears on their body, enabling them to continuously acquire biometric and location information.

[0428] A "sensor device" is a detector or measuring instrument used to acquire complementary information about the user's surrounding environment.

[0429] A "database" is an information storage system built to systematically store, manage, and utilize large amounts of acquired data.

[0430] "Filtering" is the process of extracting important data by removing unnecessary or incorrect information from acquired data.

[0431] "Indexing" is the process of organizing data by adding an index, making it easy to search and access.

[0432] A "multimodal AI engine" is an artificial intelligence technology that integrates and analyzes multiple types of data to recognize abnormal behavior and new activity patterns.

[0433] An "analysis device" is a device that processes data to extract specific information and obtain meaningful results.

[0434] A "video generation device" is a device that extracts important events based on acquired data and automatically constructs a video with a narrative structure.

[0435] A "distribution device" is a communication device or system that provides generated videos to users safely and reliably.

[0436] This invention is an advanced system for efficiently recording a user's life log and generating important moments as videos. The system consists of multiple devices and software, and provides a mechanism for comprehensive data collection and analysis of user activities.

[0437] First, a wearable device, acting as a terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. Commercially available smartwatches and fitness trackers can be used for this process. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0438] Next, the server receives large datasets sent from the terminals and securely stores them in a database. Advanced security protocols and cloud storage systems are used to filter and organize this data, focusing on critical information. The server incorporates a multimodal AI engine that analyzes the collected data in real time, recognizing anomalous behavior and new patterns. AI frameworks and machine learning algorithms are used for data analysis. For example, if the user runs at an unusual speed or a new activity is recorded at a specific location, these are identified as important events.
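The second example in [0438], "a new activity is recorded at a specific location," can be sketched as novelty detection over (location, activity) pairs. This assumes that locations and activities have already been resolved to discrete labels upstream; the labels and helper name are hypothetical.

```python
def find_new_activities(past: list[tuple[str, str]],
                        today: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (location, activity) pairs seen today that never occurred in the user's history."""
    seen = set(past)
    new_pairs = []
    for pair in today:
        if pair not in seen:
            new_pairs.append(pair)
            seen.add(pair)  # report each new pair only once even if it repeats today
    return new_pairs
```

A set lookup keeps the check constant-time per record, which matters when the history spans months of life-log data.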

[0439] Subsequently, the server uses video editing software to generate a video based on the recognized key events. The editing algorithm utilizes video transitions and effects to construct still images, video footage, and audio data into a single story. This automated editing process uses existing video editing tools or custom-developed software.

[0440] Finally, the generated video is sent to the user in a secure format. The user can view the video through a dedicated application and relive their past experiences. For example, the user's activities on a day when they try a new dish can be recorded, and the process and results can be edited into a video, allowing them to relive the cooking process and see their family enjoying the dish at a later date.

[0441] An example of a prompt for a generative AI model would be the instruction, "Generate relevant videos based on records of a user performing a new activity at a specific location." This would allow the system to naturally generate valuable video recordings from the user's life log.
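A prompt such as the one in [0441] would normally be combined with the relevant life-log records before it is sent to a model. The sketch below only assembles the text; the `Event` fields and the record layout are hypothetical, and no particular model API is assumed.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: str
    location: str
    activity: str


# Instruction taken verbatim from the example prompt in [0441].
INSTRUCTION = ("Generate relevant videos based on records of a user performing "
               "a new activity at a specific location.")


def build_prompt(events: list[Event]) -> str:
    """Place the instruction first, followed by one line per extracted event."""
    lines = [INSTRUCTION, "", "Records:"]
    for e in events:
        lines.append(f"- {e.timestamp}: {e.activity} at {e.location}")
    return "\n".join(lines)
```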

[0442] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0443] Step 1:

[0444] The device continuously acquires the user's biometric information (heart rate, steps) and location information using a wearable device. The input is real-time data from the device, which is recorded as a digital signal. The output is a dataset of biometric and location information organized in chronological order. Specifically, the device periodically activates sensors and stores the acquired data in temporary memory.

[0445] Step 2:

[0446] The device uses smart home sensors to acquire voice commands and acoustic data as environmental information. Inputs are ambient sounds and acoustic signals, which are analyzed using voice recognition technology. Outputs are event triggers and behavior estimation data based on the acoustic environment. Specifically, the sensor captures sound, performs noise filtering, and then organizes the results as digital data.

[0447] Step 3:

[0448] The terminal organizes acquired biometric, location, and environmental information as packet data and sends it to the server. The input is an integrated dataset from the sensors, which is encoded and transmitted via a communication module. The output is packet data delivered to the server through a secure communication channel. Specifically, the data packets are encrypted using the SSL/TLS protocol.
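A terminal-side sketch of Step 3 using Python's standard `ssl` module is shown below. The length-prefixed JSON framing and the function names are assumptions made for illustration; the source specifies only that packets are sent over SSL/TLS. `ssl.create_default_context()` enables certificate and hostname verification by default.

```python
import json
import socket
import ssl


def encode_packet(device_id: str, samples: list[dict]) -> bytes:
    """Serialize one batch of sensor samples as length-prefixed JSON (hypothetical wire format)."""
    body = json.dumps({"device_id": device_id, "samples": samples}).encode("utf-8")
    return len(body).to_bytes(4, "big") + body


def send_packet(host: str, port: int, packet: bytes) -> None:
    """Send the packet over a TLS-protected TCP connection, verifying the server certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(packet)
```

The 4-byte length prefix lets the server know where one packet ends on a stream connection; TLS itself provides confidentiality and integrity but no message boundaries.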

[0449] Step 4:

[0450] The server decrypts data received from the terminal and stores it in a database. The input is encrypted data packets, which are decrypted and identified as biometric or environmental information. The output is organized database entries. Specifically, a server-side processing layer filters the data and imputes missing values.
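Filtering and imputation in Step 4 might look like the following minimal sketch. It assumes that implausible readings are those outside a fixed physiological range (the bounds are supplied by the caller) and that gaps are filled by linear interpolation between the nearest valid neighbours; both are illustrative choices.

```python
from typing import Optional


def clean_series(values: list[Optional[float]], lo: float, hi: float) -> list[Optional[float]]:
    """Drop out-of-range readings, then fill gaps by linear interpolation."""
    # Filtering: treat missing or out-of-range readings as gaps.
    filtered = [v if v is not None and lo <= v <= hi else None for v in values]
    known = [i for i, v in enumerate(filtered) if v is not None]
    result = list(filtered)
    for i, v in enumerate(filtered):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            frac = (i - left) / (right - left)
            result[i] = filtered[left] + frac * (filtered[right] - filtered[left])
        elif left is not None:
            result[i] = filtered[left]   # trailing gap: carry the last valid value forward
        elif right is not None:
            result[i] = filtered[right]  # leading gap: carry the first valid value backward
    return result  # an all-invalid series stays all None
```

For a heart-rate series `[60, None, 64, 300, 66]` with bounds 30 to 220, the 300 bpm reading is discarded and the result is `[60, 62.0, 64, 65.0, 66]`.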

[0451] Step 5:

[0452] The server analyzes data in the database using a multimodal AI engine. The input is the entire dataset accumulated in the past, and the AI applies anomaly detection algorithms to recognize important events. The output is a list of events that indicate abnormal behavior or new activity patterns. Specifically, a machine learning model is applied to each data point to extract characteristic patterns.

[0453] Step 6:

[0454] The server activates a video generator based on recognized events. The input consists of an event list and associated image and audio data, which are used by video editing software to automatically construct a narrative video. The output is the completed video file. Specifically, the software places video clips on a timeline and combines transitions and effects.
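Placing clips on a timeline with transitions, as described in Step 6, reduces to computing each clip's start and end time. The sketch assumes every transition is a crossfade of fixed length, during which consecutive clips overlap; real editing software offers many other transition types, and the class names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    duration: float  # seconds


@dataclass
class Placement:
    name: str
    start: float
    end: float


def build_timeline(clips: list[Clip], transition: float = 1.0) -> list[Placement]:
    """Lay clips end to end, overlapping consecutive clips by `transition` seconds for a crossfade."""
    timeline = []
    cursor = 0.0
    for clip in clips:
        start = cursor
        end = start + clip.duration
        timeline.append(Placement(clip.name, start, end))
        # The next clip begins inside this clip's tail; never overlap by more than the clip itself.
        cursor = end - min(transition, clip.duration)
    return timeline
```

Three clips of 5 s, 4 s, and 3 s with 1 s crossfades occupy 0 to 5 s, 4 to 8 s, and 7 to 10 s, so the finished video is 10 s long rather than 12 s.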

[0455] Step 7:

[0456] The user receives and views videos provided by the server using a dedicated application. The input is a video file from the server, which is decoded and played by the application. The output is the user's meaningful experience of watching the video. Specifically, the user receives a notification, launches the application, and controls the video playback.

[0457] (Application Example 1)

[0458] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0459] In modern cities, there is a need to efficiently collect large amounts of data on residents' activities and utilize it for urban management. However, there is a lack of mechanisms to record the actions of individual residents while maintaining their privacy and to extract meaningful information by linking it to the overall picture of the city. In particular, information processing technologies that can contribute to efficient transportation management and rapid response in emergencies based on behavioral trends are needed.

[0460] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0461] In this invention, the server includes means for acquiring user behavior information, means for processing the acquired behavior information, and means for analyzing the behavior information and extracting important events. This enables understanding behavioral trends across the entire city, allowing for efficient urban management and rapid response.

[0462] "User" refers to an individual or group that uses the system and is the entity that provides behavioral information.

[0463] "Behavioral information" refers to information that includes data on users' movement patterns and daily activities.

[0464] "Means" refers to the device or method used to achieve a specific function.

[0465] "Visual content" refers to media that includes visually recognizable information such as generated videos and images.

[0466] "Efficient urban management" refers to the optimal operation of urban transportation, public services, and infrastructure management.

[0467] The server is the heart of this system and is responsible for processing behavioral information sent by users. This behavioral information is collected through devices such as wearable devices and smartphones, and includes data such as the user's movement patterns and heart rate. These devices may include smartwatches like the Apple Watch and smartphones.

[0468] The server securely stores the received behavioral information in a database such as AWS RDS. Multimodal AI using TensorFlow is utilized for data processing and calculations, enabling analysis to understand behavioral trends across the entire city. Abnormal behavior and new patterns are detected by AI algorithms, making it possible to predict traffic congestion and analyze the impact of events.

[0469] Users receive the results, generated as visual content, through a smartphone application built with React Native. The app presents information that contributes to the efficient operation of the city and aims to improve the daily lives of residents.

[0470] As a concrete example, on the day a new art museum opens, it is conceivable that residents' museum visit history and public transportation usage information could be recorded and analyzed to propose transportation improvement measures for future events.

[0471] Examples of prompt statements to input into a generative AI model include the following:

[0472] "Analyze the facilities users visited on October 15, 2023, and their subsequent travel patterns, and create a report on the impact of that specific event on the city."

[0473] In this way, a system is realized that contributes to the efficient management of cities and the improvement of the quality of life for residents.

[0474] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0475] Step 1:

[0476] The device continuously acquires user behavior information. Specifically, it uses smartwatches and smartphones to record heart rate, location information, steps taken, and other data. This input data is sent to the cloud for subsequent analysis.

[0477] Step 2:

[0478] The server receives behavioral information sent from the terminal and stores it in a database such as AWS RDS. The input data includes time-series information and location data, and saving this data prepares it for later use in various analyses.

[0479] Step 3:

[0480] The server analyzes behavioral information stored in the database. Here, a generative AI model using TensorFlow is employed to detect abnormal behavior and new patterns from the data. Stored behavioral information is used as input, and the output provides anomaly detection results and behavioral patterns.
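For the city-scale use in Application Example 1, one simple form of "abnormal behavior" detection is comparing today's visit counts per facility and hour with a historical baseline, which could indicate congestion such as that around the museum opening in [0470]. The sketch assumes visits have already been anonymized into (facility, hour) pairs and uses a hypothetical 2x ratio as the congestion threshold; the source itself only refers to a TensorFlow-based model.

```python
from collections import Counter


def hourly_counts(visits: list[tuple[str, int]]) -> Counter:
    """Count anonymized visits per (facility, hour) pair."""
    return Counter(visits)


def congestion_alerts(today: Counter, baseline: dict[tuple[str, int], float],
                      ratio: float = 2.0) -> list[tuple[str, int]]:
    """Return (facility, hour) slots whose count is at least `ratio` times the baseline mean."""
    alerts = []
    for slot, count in today.items():
        expected = baseline.get(slot, 0.0)
        if expected > 0 and count >= ratio * expected:  # slots without a baseline are skipped
            alerts.append(slot)
    return sorted(alerts)
```

Aggregating before analysis means only counts, not individual trajectories, reach this step, which is consistent with the privacy concern raised in [0459].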

[0481] Step 4:

[0482] The server generates visual content based on key insights extracted from the analyzed data. At this time, it summarizes the anomaly detection results and creates a visual design to present them clearly to the user through a React Native application interface.

[0483] Step 5:

[0484] Users receive generated visual content through a smartphone application. The application visualizes behavioral trends across the city and presents suggestions for efficient urban management. This gives users a means to optimize their activities within the city.

[0485] Through the above processing steps, the system analyzes user data and provides information useful for urban management.

[0486] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0487] This invention is a system that meticulously records a user's life log and generates videos based on it, and further includes a function to recognize the user's emotions and optimize the video content. This system consists of multiple devices and software.

[0488] First, the wearable device and camera used as the terminal record the user's physical activity and facial expressions 24 hours a day. This includes not only heart rate, steps taken, and travel route, but also emotion-related data such as facial expressions and tone of voice. The terminal temporarily stores the emotion data along with the activity data and sends it to the server at regular intervals.

[0489] Next, the server stores the received data in a database. This data is analyzed by a multimodal AI that includes an emotion engine. The emotion engine recognizes the user's emotional state in real time through facial expression analysis and voice analysis. This allows for the detection of various emotional states such as sadness, happiness, and surprise.

[0490] After emotional and activity data is analyzed, the server generates a video based on the key events extracted. In this video generation process, emotional states are used in the editing. For example, moments when the user is emotionally aroused are particularly emphasized. Narration generated using natural language processing and selected music are also adjusted according to the emotional state.

[0491] Users can view the generated videos through a dedicated application. The application also provides a function for entering feedback after viewing, and this information is used to improve the system. In this way, the present invention provides a means of recording and reflecting on a rich, personalized life log that integrates emotions and activities. For example, a user's sense of accomplishment at a sports event is recorded, and the video is edited to accentuate the joy of that moment, allowing the user to vividly relive it.

[0492] The following describes the processing flow.

[0493] Step 1:

[0494] The device collects the user's biometric information through a wearable device. Simultaneously, it uses a camera and microphone to acquire the user's facial expressions and voice data. This allows for real-time recording of the user's activities and emotions.

[0495] Step 2:

[0496] The device temporarily stores collected biometric information and emotion-related data. At regular intervals, this data is encrypted and transferred to a server. This transfer is designed to protect user privacy.

[0497] Step 3:

[0498] The server stores the received data in a database. Here, the data is first preprocessed to remove noise and standardize its format. This prepares the data for analysis.

[0499] Step 4:

[0500] The emotion engine on the server analyzes the user's emotional state from facial and voice data. The emotion engine detects and quantifies emotions from subtle changes in facial expression and voice tone. This information is stored together with activity data.
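Combining facial and voice analysis into one emotional state can be sketched as late fusion: each modality yields a probability per emotion, and the engine takes a weighted average. The emotion set and the 0.6 weight on facial expression are assumptions for illustration; the source does not state how the two modalities are combined.

```python
EMOTIONS = ("joy", "sadness", "surprise", "neutral")


def fuse_emotions(face: dict[str, float], voice: dict[str, float],
                  face_weight: float = 0.6) -> tuple[str, float]:
    """Weighted average of per-modality probabilities; returns (dominant emotion, fused score)."""
    fused = {e: face_weight * face.get(e, 0.0) + (1 - face_weight) * voice.get(e, 0.0)
             for e in EMOTIONS}
    label = max(fused, key=fused.get)
    return label, fused[label]
```

With facial scores `{"joy": 0.7, "neutral": 0.3}` and voice scores `{"joy": 0.4, "surprise": 0.5, "neutral": 0.1}`, the fused result is joy with a score of about 0.58, even though the voice alone leaned towards surprise.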

[0501] Step 5:

[0502] The server's multimodal AI integrates biometric and emotional data to extract key events. This process selects characteristic scenes, such as movements that deviate from everyday behavior or heightened emotions.
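Step 5 selects scenes that deviate from everyday behavior or show heightened emotion. A minimal ranking sketch follows, assuming each candidate scene already carries a behavioral deviation (for example, an absolute z-score) and an emotion intensity in [0, 1]; the equal weighting and the 3-sigma scaling are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    label: str
    activity_deviation: float  # e.g. |z-score| of movement versus the user's usual behaviour
    emotion_intensity: float   # e.g. fused emotion score in [0, 1]


def select_key_scenes(scenes: list[Scene], k: int = 3, w_activity: float = 0.5) -> list[Scene]:
    """Rank scenes by a weighted mix of behavioural deviation and emotion, and keep the top k."""
    def score(s: Scene) -> float:
        # Scale deviation so that 3 sigma maps to 1.0 and cap it, keeping both terms in [0, 1].
        deviation = min(s.activity_deviation / 3.0, 1.0)
        return w_activity * deviation + (1 - w_activity) * s.emotion_intensity
    return sorted(scenes, key=score, reverse=True)[:k]
```

Normalizing both terms to the same range keeps one modality from dominating simply because its raw values are larger.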

[0503] Step 6:

[0504] The server automatically generates videos based on the extracted information. Visual effects and music are optimized according to the emotional state. Narration is also generated using natural language processing technology and incorporated into the video.

[0505] Step 7:

[0506] Users receive notifications from the server and watch videos generated through a dedicated application. After watching the videos, users can input feedback into the application, and this feedback is used to improve the AI.

[0507] (Example 2)

[0508] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0509] Conventional life-logging systems can record user activity data and extract important events, but they have limitations in generating personalized content that reflects the user's emotional state. Furthermore, there is a need for a system that can automatically generate videos based on user emotions and activities, and perform real-time emotion recognition and editing during that process. This invention aims to solve these problems.

[0510] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0511] In this invention, the server includes means for acquiring user activity data and emotional data, means for processing the acquired activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, means for automatically generating videos according to the user's emotional state, means for adjusting music and narration appropriate to the emotion based on the generated video, and means for providing the generated video to the user and obtaining user feedback. This enables the personalized automatic generation of videos that reflect the user's emotional state, allowing the user to have a richer experience.

[0512] A "user" is the individual who utilizes the system and is the subject of collection of their activity data and emotional data.

[0513] "Activity data" refers to information related to a user's physical activity, including data such as heart rate, steps taken, and travel route.

[0514] "Emotional data" refers to information that reflects the user's emotional state, and is based on data such as facial expressions and tone of voice.

[0515] "Key events extracted" refer to particularly noteworthy occurrences or moments within the activity and emotional data.

[0516] "Means for automatically generating videos" refers to a process or device that automatically generates videos tailored to the user using acquired activity data and emotion data.

[0517] An "emotion analysis engine" is software or a system that analyzes a user's facial expressions and voice to recognize the user's emotional state in real time.

[0518] "Personalized content" refers to information or media that is tailored or generated based on the individual user's preferences and characteristics.

[0519] This invention is a system that meticulously records a user's life log and generates videos that take the user's emotions into consideration. This system consists of terminals such as wearable devices and cameras, and a server.

[0520] The device continuously records the user's physical activity and emotional data 24 hours a day. Specifically, the wearable device measures heart rate and steps using heart rate sensors and accelerometers, and tracks movement routes using GPS. It also uses a camera and microphone to analyze the user's facial expressions and voice tone to acquire emotional data. This allows it to capture emotional states such as joy, sadness, and surprise in real time.

[0521] The device sends collected activity and emotion data to the server at regular intervals. The server stores this data in a database and performs emotion analysis using deep learning frameworks such as TensorFlow or PyTorch. The emotion analysis engine recognizes the user's emotional state in real time from the collected facial and voice data and extracts important events by combining them with activity data.

[0522] Based on the extracted key events, the server automatically generates a video using video editing software similar to Adobe Premiere Pro. This video reflects the user's emotional state, and the narration and music are selected to match those emotions. For example, in a scene where the user experiences a sense of accomplishment, that moment is emphasized and music conveying joy is selected.
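Matching music and narration to an emotion can be sketched as a lookup from the recognized emotion to editing parameters. Every entry in the table below (moods, tempos, narration tones) and the 0.7 emphasis threshold are hypothetical values chosen for illustration; the source states only that music and narration follow the emotion and that strong moments are emphasized.

```python
# Hypothetical style table mapping a recognized emotion to editing parameters.
STYLE_BY_EMOTION = {
    "joy": {"music_mood": "upbeat", "tempo_bpm": 128, "narration_tone": "cheerful"},
    "sadness": {"music_mood": "melancholic", "tempo_bpm": 70, "narration_tone": "gentle"},
    "surprise": {"music_mood": "dramatic", "tempo_bpm": 110, "narration_tone": "excited"},
}
DEFAULT_STYLE = {"music_mood": "ambient", "tempo_bpm": 90, "narration_tone": "neutral"}


def style_for(emotion: str, intensity: float) -> dict:
    """Return a copy of the style for `emotion`, marking strongly felt moments for emphasis."""
    style = dict(STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE))  # copy so the table is not mutated
    style["emphasis"] = intensity >= 0.7
    return style
```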

[0523] Users can view the generated videos through a dedicated application. This application also includes a feature that allows users to easily input feedback after viewing, and the system is improved based on this feedback. For example, a prompt to be input to the generative AI model might be, "Please edit the video to highlight the emotional peaks of the past 24 hours."

[0524] In this way, the present invention provides a rich means of life logging and reflection that integrates the user's emotions and activities.

[0525] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0526] Step 1:

[0527] The device collects data on the user's physical activity and emotions. Specifically, the wearable device uses sensors to record heart rate, steps, and movement routes, and uses a camera and microphone to record facial expressions and voice tone. The input consists of the user's real-time physical information and emotional expressions, which are temporarily stored as time-series data within the device.

[0528] Step 2:

[0529] The device transmits collected data to the server at regular intervals. Specifically, it securely transfers data stored on the device to the server in an encrypted format via Wi-Fi or a mobile network. The input consists of activity data and emotion data from the device, and the output is a dataset stored on the server.

[0530] Step 3:

[0531] The server stores the received data in a database and analyzes it using an emotion analysis engine. Specifically, it uses deep learning frameworks such as TensorFlow and PyTorch to extract the user's emotional state from facial and voice data. The input is activity data and emotion data sent to the server, and the output is data labeled with the user's emotional state.

[0532] Step 4:

[0533] The server extracts important events using the results of the emotion analysis and generates a video based on them. Specifically, it identifies moments of peak emotion in the user's daily activities and automatically creates a video from them using video editing software similar to Adobe Premiere Pro. The input is event data from the emotion analysis, and the output is an edited video file.
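Identifying "moments of peak emotion" can be sketched as finding local maxima in a time series of emotion intensity. The sketch assumes intensities are sampled at regular intervals, treats a sample as a peak if it exceeds a threshold and its neighbours, and merges peaks closer than `min_gap` samples by keeping the stronger one; the threshold and gap are illustrative parameters.

```python
def emotional_peaks(intensity: list[float], threshold: float = 0.7, min_gap: int = 3) -> list[int]:
    """Return indices of local maxima at or above `threshold`, at least `min_gap` samples apart."""
    peaks: list[int] = []
    for i in range(1, len(intensity) - 1):
        v = intensity[i]
        if v >= threshold and v > intensity[i - 1] and v >= intensity[i + 1]:
            if peaks and i - peaks[-1] < min_gap:
                if v > intensity[peaks[-1]]:
                    peaks[-1] = i  # two peaks too close together: keep the stronger one
            else:
                peaks.append(i)
    return peaks
```

Each returned index can then be mapped back to a timestamp and used as the centre of a highlight clip.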

[0534] Step 5:

[0535] Users view videos generated through a dedicated application. The application provides users with video playback functionality and also allows them to input feedback such as their impressions and suggestions for improvement after viewing. The input is the generated video, and the output is the user's feedback data.

[0536] Step 6:

[0537] The server collects user feedback and uses it to improve the system. Specifically, it analyzes user viewing history and feedback content, and uses this data to improve the accuracy of emotional expression and editing in future video production. The input is user feedback, and the output is data that serves as an indicator for system improvement.
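One lightweight way to turn feedback into an "indicator for system improvement" is to maintain a running preference score per editing style, updated with an exponential moving average of user ratings. The 1 to 5 star scale, the neutral prior of 0.5, the smoothing factor, and the style names are all assumptions; the source does not describe the update rule.

```python
def update_preferences(prefs: dict[str, float], style: str, rating: int,
                       alpha: float = 0.2) -> dict[str, float]:
    """Blend a 1-5 star rating (rescaled to 0-1) into the running score for an editing style."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    target = (rating - 1) / 4
    updated = dict(prefs)  # return a new dict rather than mutating the caller's
    previous = updated.get(style, 0.5)  # neutral prior for styles with no feedback yet
    updated[style] = (1 - alpha) * previous + alpha * target
    return updated
```

A small `alpha` makes the score change gradually, so a single unusual rating does not immediately reverse how future videos are edited.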

[0538] (Application Example 2)

[0539] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0540] In recent years, systems utilizing personal life log data have attracted attention, but they are not sufficiently optimized for video editing that reflects emotional states. Furthermore, there is a lack of effective means to highlight particularly moving moments in users' daily lives. Therefore, there is a need for a system that enables the creation of rich videos that incorporate users' emotions.

[0541] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0542] In this invention, the server includes means for processing user activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, and means for automatically generating videos based on the extracted important events and emotional data and performing emotionally optimized editing. This allows users to richly reminisce about their life logs through videos, and to efficiently record and view particularly emotionally significant moments.

[0543] "User activity data" refers to information about an individual's physical activity, location, and daily behavior.

[0544] "Emotional data" refers to information about an individual's emotional state based on their facial expressions, tone of voice, and other physiological indicators.

[0545] "Processing means" refers to a system that converts or organizes acquired data into an appropriate format before analysis or editing.

[0546] "Means of analysis and extraction of important events" refers to a system that evaluates collected data and selects events that are significant to the user.

[0547] "A means of automatically generating videos and performing emotionally optimized editing" refers to a system that generates and edits videos that take into account an individual's emotional state, based on important events and emotional data.

[0548] "Means of delivery" refers to the interface or mechanism that allows users to access and use the generated content.

[0549] This invention is a system that automatically generates videos based on user activity and emotional data. A wearable device worn by the user and installed cameras continuously record the user's physical activity and emotional data 24 hours a day. This includes heart rate, movement data, facial expressions, and voice tone. This data is temporarily stored on the device and then periodically transmitted to a server.

[0550] The server operates on a high-performance cloud computing environment and stores incoming data in a database. This data is analyzed by an emotion engine utilizing multi-modal AI. The emotion engine analyzes the data using specific algorithms to recognize the user's emotional state. This extracts important events from the data and generates a video that expresses emotions richly. In the editing process, a generative AI model is used to incorporate narration and music that match the emotional state into the video.

[0551] Users view the generated videos through the interface of their smartphones or consumer robots. After watching the videos, users can provide feedback and contribute to further improvements to the system. This feedback is also part of the data analysis process and will be reflected in the next video generation.

[0552] As a concrete example, during a summer family trip, the surprise and joy of a child riding a roller coaster for the first time could be recorded, allowing the family to vividly relive that moment. An example of a prompt instructing a generative AI model could be: "Edit the highlights of last week's family trip into an emotionally rich video."

[0553] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0554] Step 1:

[0555] The device collects user activity and emotional data using wearable devices and cameras. Inputs include heart rate, location data, facial expressions, and voice tone. This data is temporarily stored in internal memory to complete the first stage of data collection.

[0556] Step 2:

[0557] The device transfers collected activity and emotion data to the server at regular intervals. It uses data stored in the device's memory as input and outputs it to a cloud service via the internet. This allows the data to be aggregated on the server in real time.
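Temporary on-device storage followed by transfer "at regular intervals" can be sketched as a buffer that hands off a batch once a time interval has elapsed. The class name and interval are hypothetical, and the clock is injectable so the behavior can be checked without waiting; `time.monotonic` is used by default because it is unaffected by wall-clock adjustments.

```python
import time
from typing import Callable, Optional


class UploadBuffer:
    """Keeps samples in memory and releases them as one batch once `interval_s` seconds have passed."""

    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.samples: list[dict] = []
        self.last_flush = clock()

    def add(self, sample: dict) -> Optional[list[dict]]:
        """Store a sample; return the pending batch if the upload interval has elapsed, else None."""
        self.samples.append(sample)
        if self.clock() - self.last_flush >= self.interval_s:
            batch, self.samples = self.samples, []
            self.last_flush = self.clock()
            return batch  # the caller uploads this batch to the server
        return None
```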

[0558] Step 3:

[0559] The server stores the received data in a database. The input is the transferred user activity and emotion data, and the output consists of multiple organized data segments stored in the database. At this stage, the stored data is ready for later analysis.

[0560] Step 4:

[0561] The server analyzes data using multi-modal AI and recognizes emotional states with an emotion engine. Input consists of activity and emotion data from a database, and the algorithm analyzes this data to output emotional states (e.g., joy, surprise, sadness). Based on this, the server selects important events.

[0562] Step 5:

[0563] The server automatically generates videos using a generative AI model based on emotional data. The input consists of selected key events and emotional states, and the output is emotionally optimized video content. The generated videos automatically incorporate narration and music based on emotions.

[0564] Step 6:

[0565] Users view the generated videos on smartphones or consumer robots. The input is video data provided by the server, which is displayed on the screen of the user's device. After viewing a video, users can also provide feedback, which is used to improve future video generation.

[0566] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.

[0567] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0568] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0569] [Fourth Embodiment]

[0570] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0571] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0572] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0573] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0574] The microphone 238 captures voice signals from the user 20, including instructions given by the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0575] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0576] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0577] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0578] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0579] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0580] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0581] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0582] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0583] This invention is an advanced system for efficiently recording a user's life log and capturing important moments as video. This system consists of multiple devices and software, providing a mechanism for comprehensive data collection regarding the user's activities and for analyzing that data.

[0584] First, the wearable device, acting as the terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0585] Next, the server receives the large dataset sent from the terminal and securely stores it in a database. At this stage, the data is filtered and organized to focus on useful information. The server's built-in multimodal AI engine analyzes the collected data and recognizes unusual behavior and new patterns. For example, if a user runs at an unusual speed or new activity is recorded in a particular location, these are identified as important events.

[0586] The server then uses its automated editing function to generate a video based on the recognized key events. The editing algorithm combines still images, video, and audio data into a single story using video transitions and effects, maintaining professional quality.

[0587] Finally, the generated video is sent to the user in a secure format. The user can view it through a dedicated application and relive past experiences. For example, the user's activities on a day when they try a new dish are recorded, and the process and results are edited into a video, allowing the user to relive the cooking process and see their family enjoying the dish at a later date.

[0588] The above outlines the steps required to implement this invention, demonstrating a new form of daily record-keeping that utilizes the user's life log.

[0589] The following describes the processing flow.

[0590] Step 1:

[0591] The device collects the user's biometric information, voice data, and environmental information using wearable devices and smart home sensors. This includes heart rate, steps taken, applications being used, and voice commands. The collected data is temporarily stored on the device.

[0592] Step 2:

[0593] The device uploads collected data to the server at regular intervals. During this process, the data is encrypted to prevent the identification of individuals and transmitted securely.
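Step 2 states that the data is encrypted so that individuals cannot be identified. One common way to reduce identifiability, shown here as an assumption rather than the disclosed method, is to replace the raw user ID with a keyed pseudonym (HMAC-SHA256) before the buffered records are bundled for a periodic upload; the payload itself would then be encrypted in transit, for example over TLS. The function names and payload fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a raw user ID with a keyed HMAC-SHA256 pseudonym.

    Without the secret key, candidate IDs cannot simply be hashed and
    matched against the pseudonym.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def build_upload_batch(user_id: str, records: List[Dict[str, Any]],
                       secret_key: bytes) -> bytes:
    """Bundle buffered records into one JSON payload for a periodic upload."""
    payload = {
        "subject": pseudonymize(user_id, secret_key),  # no raw ID leaves the device
        "count": len(records),
        "records": records,
    }
    # sort_keys makes the serialized bytes deterministic for a given input.
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```

The secret key would stay on the device or in a key store; rotating it breaks linkability between upload periods, at the cost of the server no longer being able to join a user's data across rotations.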

[0594] Step 3:

[0595] The server takes in the received data and first performs noise removal and format normalization. This preprocessing ensures that the data necessary for analysis is prepared in a clean state.
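The preprocessing in Step 3 can be sketched with two small functions: a sliding median filter for noise removal and a field mapper for format normalization. The window size, the raw field names (`ts_ms`, `hr`, `step_count`), and the target schema are illustrative assumptions; the disclosure does not specify a filter or schema.

```python
from statistics import median
from typing import Any, Dict, List

def median_filter(values: List[float], window: int = 3) -> List[float]:
    """Suppress isolated spikes by replacing each value with its local median."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))  # edges use a truncated window
    return out

def normalize_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map device-specific field names and units onto one (assumed) schema."""
    ts_ms = raw.get("ts_ms")
    return {
        # Some devices report milliseconds, others seconds.
        "timestamp": ts_ms / 1000.0 if ts_ms is not None else raw["timestamp"],
        "heart_rate": float(raw.get("hr", raw.get("heart_rate"))),
        "steps": int(raw.get("step_count", raw.get("steps", 0))),
    }
```

A median filter is used here because it removes single-sample spikes (e.g. a momentary sensor glitch) without smearing them into neighboring values the way a moving average would.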

[0596] Step 4:

[0597] A multimodal AI within the server analyzes the preprocessed data and integrates information from each modality. It detects characteristic events by comparing the data against existing data patterns, thereby extracting important events from the user's daily life.
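As one simple stand-in for comparing new data against existing patterns, the sketch below scores each of today's features against the user's own history with a z-score and flags strong deviations. This is an assumed, per-feature technique for illustration; the multimodal AI described here would integrate the modalities rather than score them independently, and the threshold of 2 standard deviations is arbitrary.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

def zscores(today: Dict[str, float], history: List[Dict[str, float]]) -> Dict[str, float]:
    """Score each of today's features against the user's historical baseline."""
    scores: Dict[str, float] = {}
    for key, value in today.items():
        past = [day[key] for day in history if key in day]
        if len(past) < 2:
            continue  # too little history to form a baseline
        mu, sd = mean(past), pstdev(past)
        if sd == 0:
            # A perfectly stable history: any change at all is maximally unusual.
            scores[key] = 0.0 if value == mu else math.copysign(math.inf, value - mu)
        else:
            scores[key] = (value - mu) / sd
    return scores

def characteristic_features(today: Dict[str, float], history: List[Dict[str, float]],
                            threshold: float = 2.0) -> List[str]:
    """Names of features deviating by more than `threshold` standard deviations."""
    return sorted(k for k, z in zscores(today, history).items() if abs(z) > threshold)
```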

[0598] Step 5:

[0599] The server generates a video based on the extracted key events. Narration generated using natural language processing is synchronized with the video, and background music and transitions are appropriately added to create a video with a narrative.

[0600] Step 6:

[0601] The server uploads the generated video file to the user's dedicated secure portal and sends a notification to the user. At this time, access permissions for the video are set, making it viewable only by the user.

[0602] Step 7:

[0603] Users can watch videos through a dedicated application. Furthermore, they can provide ratings and feedback on the videos, and the results can be used for future improvements and AI training.

[0604] (Example 1)

[0605] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0606] In modern society, the amount of data individuals generate in their daily lives is increasing, and there is a need to utilize this data effectively. However, extracting meaningful information from this vast amount of data and recording important personal moments requires advanced technology. The challenge lies in creating a system that allows users to easily record their daily activities, reflect on valuable life events, and generate professional-quality video.

[0607] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0608] In this invention, the server includes a wearable device that continuously acquires the user's biometric and location information, a sensor device that complementarily acquires the user's environmental information, a data processing device for securely storing the acquired data in a database and performing filtering and indexing, an analysis device that analyzes the data using a multimodal AI engine and recognizes abnormal behavior and new activity patterns, a video generation device that automatically edits moving images based on important events and constructs them into a story, and a distribution device for providing videos to the user in a secure manner. This allows important events to be automatically extracted from the user's life log, making it easy for the user to look back on important moments from the past and enjoy them visually.

[0609] "User biometric information" refers to data that indicates an individual's physical condition and activity, such as heart rate and step count.

[0610] "Location information" refers to geographical data used to identify a user's current location and travel route.

[0611] A "wearable device" is a device that a user wears on their body, enabling them to continuously acquire biometric and location information.

[0612] A "sensor device" is a detector or measuring instrument used to acquire complementary information about the user's surrounding environment.

[0613] A "database" is an information storage system built to systematically store, manage, and utilize large amounts of acquired data.

[0614] "Filtering" is the process of extracting important data by removing unnecessary or incorrect information from acquired data.

[0615] "Indexing" is the process of organizing data by adding an index, making it easy to search and access.

[0616] A "multimodal AI engine" is an artificial intelligence technology that integrates and analyzes multiple types of data to recognize abnormal behavior and new activity patterns.

[0617] An "analysis device" is a device that processes data to extract specific information and obtain meaningful results.

[0618] A "video generation device" is a device that extracts important events based on acquired data and automatically constructs a video with a narrative structure.

[0619] A "distribution device" is a communication device or system that provides generated videos to users safely and reliably.

[0620] This invention is an advanced system for efficiently recording a user's life log and generating important moments as videos. The system consists of multiple devices and software, and provides a mechanism for comprehensive data collection and analysis of user activities.

[0621] First, a wearable device, acting as a terminal, continuously acquires the user's biometric and location information. This device can record data such as heart rate, steps taken, and travel routes throughout the day. Commercially available smartwatches and fitness trackers can be used for this process. In addition, smart home sensors complementarily acquire information about the user's environment and analyze voice commands and the surrounding acoustic environment. This allows for a detailed record of the user's daily activities and events.

[0622] Next, the server receives large datasets sent from the terminals and securely stores them in a database. The data is protected using advanced security protocols and cloud storage systems, and is filtered and organized to focus on critical information. The server incorporates a multimodal AI engine that analyzes the collected data in real time, recognizing anomalous behavior and new patterns. AI frameworks and machine learning algorithms are used for data analysis. For example, if a user runs at an unusual speed or new activity is recorded in a specific location, these are identified as important events.

[0623] Subsequently, the server uses video editing software to generate a video based on the recognized key events. The editing algorithm utilizes video transitions and effects to construct still images, video footage, and audio data into a single story. This automated editing process uses existing video editing tools or custom-developed software.

[0624] Finally, the generated video is sent to the user in a secure format. The user can view the video through a dedicated application and relive their past experiences. For example, the user's activities on a day when they try a new dish can be recorded, and the process and results can be edited into a video, allowing them to relive the cooking process and see their family enjoying the dish at a later date.

[0625] An example of a prompt for a generative AI model would be the instruction, "Generate relevant videos based on records of a user performing a new activity at a specific location." This would allow the system to naturally generate valuable video recordings from the user's life log.

[0626] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0627] Step 1:

[0628] The device continuously acquires the user's biometric information (heart rate, steps) and location information using a wearable device. The input is real-time data from the device, which is recorded as a digital signal. The output is a dataset of biometric and location information organized in chronological order. Specifically, the device periodically activates sensors and stores the acquired data in temporary memory.
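The periodic acquisition in Step 1 can be sketched as a sampler that activates each sensor once per tick and appends the timestamped reading to a bounded in-memory buffer, from which a chronologically ordered dataset is read out. This is a minimal sketch: the sensor reader callables, the buffer capacity, and the `Sample` fields are hypothetical stand-ins, not part of the disclosure.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple

@dataclass(frozen=True)
class Sample:
    timestamp: float                 # seconds (assumed clock source)
    heart_rate: float                # beats per minute
    steps: int                       # cumulative step count
    location: Tuple[float, float]    # (latitude, longitude) in degrees

class TemporaryBuffer:
    """Bounded in-memory store; the earliest-inserted samples are discarded when full."""

    def __init__(self, capacity: int) -> None:
        self._samples: Deque[Sample] = deque(maxlen=capacity)

    def append(self, sample: Sample) -> None:
        self._samples.append(sample)

    def chronological(self) -> List[Sample]:
        # Sort by timestamp in case readings arrive out of order.
        return sorted(self._samples, key=lambda s: s.timestamp)

def sample_once(clock: Callable[[], float],
                read_heart_rate: Callable[[], float],
                read_steps: Callable[[], int],
                read_location: Callable[[], Tuple[float, float]]) -> Sample:
    """Activate each (hypothetical) sensor reader once and bundle the readings."""
    return Sample(clock(), read_heart_rate(), read_steps(), read_location())
```

On a device, `sample_once` would be called from a periodic timer with `time.time` as the clock; the bounded buffer keeps memory use fixed if uploads are delayed.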

[0629] Step 2:

[0630] The device uses smart home sensors to acquire voice commands and acoustic data as environmental information. Inputs are ambient sounds and acoustic signals, which are analyzed using voice recognition technology. Outputs are event triggers and behavior estimation data based on the acoustic environment. Specifically, the sensor captures sound, performs noise filtering, and then organizes the results as digital data.

[0631] Step 3:

[0632] The terminal organizes acquired biometric, location, and environmental information as packet data and sends it to the server. The input is an integrated dataset from the sensors, which is encoded and transmitted via a communication module. The output is packet data delivered to the server through a secure communication channel. Specifically, the data packets are encrypted using the SSL/TLS protocol.
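A minimal sketch of Step 3, assuming a length-prefixed JSON packet format (the header layout and packet type codes are hypothetical) and using Python's standard `ssl` module for the SSL/TLS channel. `ssl.create_default_context()` enables certificate and hostname verification by default; setting the minimum version to TLS 1.2 is an assumed policy choice.

```python
import json
import ssl
import struct
from typing import Any, Dict, Tuple

# Hypothetical header: payload length (uint32) and packet type (uint16), network byte order.
HEADER = struct.Struct("!IH")

def encode_packet(packet_type: int, body: Dict[str, Any]) -> bytes:
    """Serialize one sensor bundle as a length-prefixed JSON packet."""
    payload = json.dumps(body, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(len(payload), packet_type) + payload

def decode_packet(data: bytes) -> Tuple[int, Dict[str, Any]]:
    """Inverse of encode_packet; rejects truncated input."""
    length, packet_type = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return packet_type, json.loads(payload)

def client_tls_context() -> ssl.SSLContext:
    """TLS client context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy SSL and early TLS
    return ctx
```

To send a packet, the terminal would open a socket with `socket.create_connection((host, 443))`, wrap it with `ctx.wrap_socket(sock, server_hostname=host)`, and call `sendall(packet)`; the host name is left to the deployment.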

[0633] Step 4:

[0634] The server decrypts data received from the terminal and stores it in a database. The input is encrypted data packets, which are decrypted and identified as biometric or environmental information. The output is organized database entries. Specifically, the server-side layer filters the data and imputes missing values.
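The missing-value imputation in Step 4 could, for example, use linear interpolation over a time-ordered series. This sketch assumes evenly spaced samples with gaps marked as `None`; the disclosure does not name an imputation method. The neighbor search is linear per gap, which is adequate for short buffers.

```python
from typing import List, Optional

def impute_linear(values: List[Optional[float]]) -> List[float]:
    """Fill missing readings (None) by linear interpolation between known neighbors.

    Leading and trailing gaps take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(float(v))
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:          # gap at the start of the series
            out.append(float(values[nxt]))
        elif nxt is None:         # gap at the end of the series
            out.append(float(values[prev]))
        else:                     # interior gap: interpolate by position
            frac = (i - prev) / (nxt - prev)
            out.append(values[prev] + frac * (values[nxt] - values[prev]))
    return out
```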

[0635] Step 5:

[0636] The server analyzes data in the database using a multimodal AI engine. The input is the entire dataset accumulated in the past, and the AI applies anomaly detection algorithms to recognize important events. The output is a list of events that indicate abnormal behavior or new activity patterns. Specifically, a machine learning model is applied to each data point to extract characteristic patterns.
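To make the examples given in [0585] concrete (a user running at an unusual speed, or new activity recorded at a specific location), the sketch below applies two hand-written rules to a GPS track: a speed check using the haversine distance, and a check for grid cells the user has never visited. These rules are an assumed, simplified stand-in for the machine learning model in Step 5; the cell size and speed limit are illustrative.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float, float]  # (timestamp_s, latitude_deg, longitude_deg)
Cell = Tuple[int, int]

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters, using a mean Earth radius of 6,371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cell(lat: float, lon: float, size_deg: float = 0.001) -> Cell:
    """Coarse grid cell (about 110 m north-south) used to recognize new places."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def detect_events(track: List[Point], known_cells: Set[Cell],
                  speed_limit_mps: float) -> List[Tuple[str, float]]:
    """Flag unusually fast movement and first visits to unseen grid cells."""
    events: List[Tuple[str, float]] = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0 and haversine_m(la0, lo0, la1, lo1) / dt > speed_limit_mps:
            events.append(("unusual_speed", t1))
    seen = set(known_cells)  # copy so the caller's set is not modified
    for t, la, lo in track:
        c = cell(la, lo)
        if c not in seen:
            events.append(("new_location", t))
            seen.add(c)  # report each new cell only once
    return events
```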

[0637] Step 6:

[0638] The server activates a video generator based on recognized events. The input consists of an event list and associated image and audio data, which are used by video editing software to automatically construct a narrative video. The output is the completed video file. Specifically, the software places video clips on a timeline and combines transitions and effects.
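The timeline placement in Step 6 can be illustrated with a small data structure: clips are laid end to end, each later clip starts slightly before the previous one ends, and the overlap is labeled with a transition. The `Clip` and `TimelineItem` types, the 0.5 s overlap, and the transition names are hypothetical; an actual system would hand such a timeline to video editing software for rendering.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Clip:
    source: str       # identifier of an image, video, or audio asset (hypothetical)
    duration: float   # seconds

@dataclass(frozen=True)
class TimelineItem:
    source: str
    start: float        # seconds from the beginning of the video
    end: float
    transition_in: str  # how this clip enters ("cut" for the first clip)

def build_timeline(clips: List[Clip], transition: str = "crossfade",
                   overlap: float = 0.5) -> List[TimelineItem]:
    """Lay clips end to end, overlapping neighbors by `overlap` seconds
    so that the named transition can blend them."""
    if any(c.duration <= overlap for c in clips):
        raise ValueError("every clip must be longer than the transition overlap")
    items: List[TimelineItem] = []
    cursor = 0.0
    for i, clip in enumerate(clips):
        start = 0.0 if i == 0 else cursor - overlap
        items.append(TimelineItem(clip.source, start, start + clip.duration,
                                  "cut" if i == 0 else transition))
        cursor = start + clip.duration
    return items
```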

[0639] Step 7:

[0640] The user receives and views videos provided by the server using a dedicated application. The input is a video file from the server, which is decoded and played by the application. The output is the user's meaningful experience of watching the video. Specifically, the user receives a notification, launches the application, and controls the video playback.

[0641] (Application Example 1)

[0642] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0643] In modern cities, there is a need to efficiently collect large amounts of data on residents' activities and utilize it for urban management. However, there is a lack of mechanisms to record the actions of individual residents while maintaining their privacy and to extract meaningful information by linking it to the overall picture of the city. In particular, information processing technologies that can contribute to efficient transportation management and rapid response in emergencies based on behavioral trends are needed.

[0644] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0645] In this invention, the server includes means for acquiring user behavior information, means for processing the acquired behavior information, and means for analyzing the behavior information and extracting important events. This enables understanding behavioral trends across the entire city, allowing for efficient urban management and rapid response.

[0646] "User" refers to an individual or group that uses the system and is the entity that provides behavioral information.

[0647] "Behavioral information" refers to information that includes data on users' movement patterns and daily activities.

[0648] "Means" refers to the device or method used to achieve a specific function.

[0649] "Visual content" refers to media that includes visually recognizable information such as generated videos and images.

[0650] "Efficient urban management" refers to the optimal operation of urban transportation, public services, and infrastructure management.

[0651] The server is the heart of this system and is responsible for processing behavioral information sent by users. This behavioral information is collected through devices such as wearable devices and smartphones, and includes data such as the user's movement patterns and heart rate. These devices may include smartwatches like the Apple Watch and smartphones.

[0652] The server securely stores the received behavioral information in a database such as AWS RDS. Multimodal AI using TensorFlow is utilized for data processing and calculations, enabling analysis to understand behavioral trends across the entire city. Abnormal behavior and new patterns are detected by AI algorithms, making it possible to predict traffic congestion and analyze the impact of events.

[0653] Users receive the results, generated as visual content, through a smartphone application built with React Native. The app presents information that contributes to the efficient operation of the city and aims to improve the daily lives of residents.

[0654] As a concrete example, on the day a new art museum opens, it is conceivable that residents' museum visit history and public transportation usage information could be recorded and analyzed to propose transportation improvement measures for future events.

[0655] Examples of prompt statements to input into a generative AI model include the following:

[0656] "Analyze the facilities users visited on October 15, 2023, and their subsequent travel patterns, and create a report on the impact of that specific event on the city."

[0657] In this way, a system is realized that contributes to the efficient management of cities and the improvement of the quality of life for residents.

[0658] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0659] Step 1:

[0660] The device continuously acquires user behavior information. Specifically, it uses smartwatches and smartphones to record heart rate, location information, steps taken, and other data. This input data is sent to the cloud for subsequent analysis.

[0661] Step 2:

[0662] The server receives behavioral information sent from the terminal and stores it in a database such as AWS RDS. The input data includes time-series information and location data, and saving this data prepares it for later use in various analyses.

[0663] Step 3:

[0664] The server analyzes behavioral information stored in the database. Here, a generative AI model using TensorFlow is employed to detect abnormal behavior and new patterns from the data. Stored behavioral information is used as input, and the output provides anomaly detection results and behavioral patterns.
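As a simplified illustration of detecting a city-level pattern such as the art museum opening in [0654], the sketch counts distinct pseudonymous visitors per facility per day and flags facilities whose count far exceeds their baseline. This swaps a plain statistical rule in for the TensorFlow model named in the text; the data layout and the ratio of 2 are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Visit = Tuple[str, str, str]  # (pseudonymous user ID, facility ID, date "YYYY-MM-DD")

def daily_unique_visitors(visits: Iterable[Visit]) -> Dict[Tuple[str, str], int]:
    """Count distinct users per (facility, date); only counts are returned."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for user, facility, date in visits:
        seen[(facility, date)].add(user)
    return {key: len(users) for key, users in seen.items()}

def surge_facilities(counts: Dict[Tuple[str, str], int], date: str,
                     baseline_dates: List[str],
                     ratio: float = 2.0) -> List[Tuple[str, int, float]]:
    """Facilities whose visitor count on `date` exceeds `ratio` times their baseline mean."""
    flagged = []
    for facility in sorted({f for f, _ in counts}):
        base = [counts.get((facility, d), 0) for d in baseline_dates]
        mean_base = sum(base) / len(base) if base else 0.0
        today = counts.get((facility, date), 0)
        # Floor the baseline at 1 so facilities with no history need more than `ratio` visitors.
        if today > ratio * max(mean_base, 1.0):
            flagged.append((facility, today, mean_base))
    return flagged
```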

[0665] Step 4:

[0666] The server generates visual content based on key insights extracted from the analyzed data. At this time, it summarizes the anomaly detection results and creates a visual design to present them clearly to the user through a React Native application interface.

[0667] Step 5:

[0668] Users receive generated visual content through a smartphone application. The application visualizes behavioral trends across the city and presents suggestions for efficient urban management. This gives users a means to optimize their activities within the city.

[0669] Through the above processing steps, the system analyzes user data and provides information useful for urban management.

[0670] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0671] This invention is a system that meticulously records a user's life log and generates videos based on it, and further includes a function to recognize the user's emotions and optimize the video content. This system consists of multiple devices and software.

[0672] First, the wearable device and camera used as the terminal record the user's physical activity and facial expressions 24 hours a day. This includes not only heart rate, steps taken, and travel route, but also emotion-related data such as facial expressions and tone of voice. The terminal temporarily stores the emotion data along with the activity data and sends it to the server at regular intervals.

[0673] Next, the server stores the received data in a database. This data is analyzed by a multimodal AI that includes an emotion engine. The emotion engine recognizes the user's emotional state in real time through facial expression analysis and voice analysis. This allows for the detection of various emotional states such as sadness, happiness, and surprise.

[0674] After emotional and activity data is analyzed, the server generates a video based on the key events extracted. In this video generation process, emotional states are used in the editing. For example, moments when the user is emotionally aroused are particularly emphasized. Narration generated using natural language processing and selected music are also adjusted according to the emotional state.

[0675] Users can view the generated videos through a dedicated application. The application also provides a function for inputting feedback after viewing the videos, and this information is used to improve the system. In this way, the present invention provides a means of recording and reflecting on a rich, personalized life log that integrates emotions and activities. Specifically, a video of a user experiencing a sense of accomplishment at a sports event is recorded and edited to emphasize the joy of that moment. Users can thus vividly relive such moving moments.

[0676] The following describes the processing flow.

[0677] Step 1:

[0678] The device collects the user's biometric information through a wearable device. Simultaneously, it uses a camera and microphone to acquire the user's facial expressions and voice data. This allows for real-time recording of the user's activities and emotions.

[0679] Step 2:

[0680] The device temporarily stores collected biometric information and emotion-related data. At regular intervals, this data is encrypted and transferred to a server. This transfer is designed to protect user privacy.

[0681] Step 3:

[0682] The server stores the received data in a database. Here, the data is first preprocessed to remove noise and standardize its format. This prepares the data for analysis.

[0683] Step 4:

[0684] The emotion engine on the server analyzes the user's emotional state from facial and voice data. The emotion engine detects and quantifies emotions from subtle changes in facial expression and voice tone. This information is stored together with activity data.
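One way to combine facial and voice analysis, shown here as an assumption, is weighted late fusion: each modality yields a probability per emotion, the distributions are averaged with a weight, and the dominant emotion's probability serves as a quantified intensity. The emotion set and the 0.6 facial weight are illustrative; the disclosure does not specify how the emotion engine combines modalities.

```python
from typing import Dict, Tuple

EMOTIONS = ("joy", "surprise", "sadness", "neutral")

def fuse(face: Dict[str, float], voice: Dict[str, float],
         face_weight: float = 0.6) -> Dict[str, float]:
    """Weighted late fusion of per-modality emotion probabilities."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    fused = {e: face_weight * face.get(e, 0.0) + (1.0 - face_weight) * voice.get(e, 0.0)
             for e in EMOTIONS}
    total = sum(fused.values())
    if total == 0:
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}  # no evidence: uniform
    return {e: v / total for e, v in fused.items()}  # renormalize to sum to 1

def dominant(fused: Dict[str, float]) -> Tuple[str, float]:
    """Most probable emotion and its probability (used as a quantified intensity)."""
    label = max(fused, key=fused.get)
    return label, fused[label]
```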

[0685] Step 5:

[0686] The server's multimodal AI integrates biometric and emotional data to extract key events. This process selects characteristic scenes, such as movements that deviate from everyday behavior or heightened emotions.
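Step 5's selection of characteristic scenes can be sketched as a weighted score that combines how far a segment's activity deviates from the user's everyday baseline with the intensity of its dominant non-neutral emotion. The `Segment` fields, the saturation at 3 standard deviations, the weight `alpha`, and the threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Segment:
    start: float        # seconds
    end: float
    activity_z: float   # deviation from the user's everyday baseline (e.g. a z-score)
    emotion: str        # dominant emotion label from the emotion engine
    intensity: float    # intensity of that emotion, 0..1

def select_key_events(segments: List[Segment], alpha: float = 0.5,
                      threshold: float = 0.6, top_k: int = 5) -> List[Segment]:
    """Score = alpha * normalized activity deviation + (1 - alpha) * emotion intensity."""
    def score(s: Segment) -> float:
        activity = min(abs(s.activity_z) / 3.0, 1.0)            # saturate at 3 sigma
        emotional = 0.0 if s.emotion == "neutral" else s.intensity
        return alpha * activity + (1 - alpha) * emotional

    ranked = sorted((s for s in segments if score(s) >= threshold), key=score, reverse=True)
    return sorted(ranked[:top_k], key=lambda s: s.start)  # restore story order
```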

[0687] Step 6:

[0688] The server automatically generates videos based on the extracted information. Visual effects and music are optimized according to the emotional state. Narration is also generated using natural language processing technology and incorporated into the video.
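The emotion-dependent choice of effects, music, and narration in Step 6 can be sketched as a lookup table plus a prompt template for a generative language model. Every style value, the 0.5 intensity cutoff, and the prompt wording are hypothetical; a real system might instead let a generative model choose these.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class EditStyle:
    music_mood: str   # tag used to look up a track in a music library (hypothetical)
    tempo_bpm: int
    transition: str
    color_grade: str

STYLE_BY_EMOTION: Dict[str, EditStyle] = {
    "joy": EditStyle("uplifting", 120, "crossfade", "warm"),
    "surprise": EditStyle("playful", 130, "zoom", "vivid"),
    "sadness": EditStyle("gentle_piano", 70, "slow_fade", "muted"),
}
DEFAULT_STYLE = EditStyle("ambient", 90, "cut", "natural")

def style_for(emotion: str, intensity: float) -> EditStyle:
    """Pick an edit style; weak or unmapped emotions fall back to a neutral style."""
    if intensity < 0.5:
        return DEFAULT_STYLE
    return STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)

def narration_prompt(event: str, emotion: str) -> str:
    """Hypothetical instruction text passed to a generative language model."""
    return (f"Write one or two sentences of warm narration for a video scene "
            f"showing '{event}', conveying a feeling of {emotion}.")
```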

[0689] Step 7:

[0690] Users receive notifications from the server and watch videos generated through a dedicated application. After watching the videos, users can input feedback into the application, and this feedback is used to improve the AI.

[0691] (Example 2)

[0692] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0693] Conventional life-logging systems can record user activity data and extract important events, but they have limitations in generating personalized content that reflects the user's emotional state. Furthermore, there is a need for a system that can automatically generate videos based on user emotions and activities, and perform real-time emotion recognition and editing during that process. This invention aims to solve these problems.

[0694] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0695] In this invention, the server includes means for acquiring user activity data and emotional data, means for processing the acquired activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, means for automatically generating videos according to the user's emotional state, means for adjusting music and narration appropriate to the emotion based on the generated video, and means for providing the generated video to the user and obtaining user feedback. This enables the personalized automatic generation of videos that reflect the user's emotional state, allowing the user to have a richer experience.

[0696] A "user" is the individual who utilizes the system and is the subject of collection of their activity data and emotional data.

[0697] "Activity data" refers to information related to a user's physical activity, including data such as heart rate, steps taken, and travel route.

[0698] "Emotional data" refers to information that reflects the user's emotional state, and is based on data such as facial expressions and tone of voice.

[0699] "Extracted key events" refers to particularly noteworthy occurrences or moments within the activity and emotional data.

[0700] "Means for automatically generating videos" refers to a process or device that automatically generates videos tailored to the user using acquired activity data and emotion data.

[0701] An "emotion analysis engine" is software or a system that analyzes a user's facial expressions and voice to recognize the user's emotional state in real time.

[0702] "Personalized content" refers to information or media that is tailored or generated based on the individual user's preferences and characteristics.

[0703] This invention is a system that meticulously records a user's life log and generates videos that take the user's emotions into consideration. This system consists of terminals such as wearable devices and cameras, and a server.

[0704] The device continuously records the user's physical activity and emotional data 24 hours a day. Specifically, the wearable device measures heart rate and steps using heart rate sensors and accelerometers, and tracks movement routes using GPS. It also uses a camera and microphone to analyze the user's facial expressions and voice tone to acquire emotional data. This allows it to capture emotional states such as joy, sadness, and surprise in real time.

[0705] The device sends collected activity and emotion data to the server at regular intervals. The server stores this data in a database and performs emotion analysis using deep learning frameworks such as TensorFlow or PyTorch. The emotion analysis engine recognizes the user's emotional state in real time from the collected facial and voice data and extracts important events by combining them with activity data.

[0706] Based on the extracted key events, the server automatically generates a video using video editing software similar to Adobe Premiere Pro. This video reflects the user's emotional state, and its narration and music are selected to match those emotions. For example, a scene in which the user experiences a sense of accomplishment is emphasized, and music conveying joy is selected.
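As an illustration only, the emotion-matched selection in [0706] could be sketched in Python as a table lookup, assuming each key event already carries an emotion label from the emotion analysis engine. The `MUSIC_BY_EMOTION` table, its contents, the `KeyEvent` type, and the `plan_soundtrack` function are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass

# Hypothetical mapping from a recognized emotion to (music mood, narration tone).
MUSIC_BY_EMOTION = {
    "joy": ("upbeat", "warm"),
    "surprise": ("energetic", "excited"),
    "sadness": ("slow piano", "gentle"),
}
# Fallback used when an emotion has no entry in the table.
DEFAULT_STYLE = ("ambient", "neutral")


@dataclass
class KeyEvent:
    start_s: float  # clip start, seconds from the beginning of the recording
    end_s: float    # clip end, seconds
    emotion: str    # label assumed to come from the emotion analysis engine


def plan_soundtrack(events):
    """Return one (start, end, music mood, narration tone) entry per key event."""
    plan = []
    for ev in events:
        music, tone = MUSIC_BY_EMOTION.get(ev.emotion, DEFAULT_STYLE)
        plan.append((ev.start_s, ev.end_s, music, tone))
    return plan
```

A real editor would hand this plan to its rendering stage; keeping the mapping as data makes it easy to adjust from user feedback.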

[0707] Users can view the generated videos through a dedicated application. The application also lets users easily enter feedback after viewing, and the system is improved based on this feedback. For example, a prompt given to the generative AI model might be: "Please edit the video to highlight the emotional peaks of the past 24 hours."
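A prompt like the one in [0707] could, for example, be assembled from the extracted events before it is sent to the generative AI model. The sketch below assumes each event is a hypothetical `(timestamp, emotion, intensity)` tuple with intensity in [0, 1]; the function name and the listing format are illustrative assumptions, and no specific model API is implied.

```python
def build_edit_prompt(hours, events):
    """Compose an editing instruction for a generative AI model (hypothetical format).

    `events` is a list of (timestamp string, emotion, intensity in [0, 1]) tuples.
    """
    # List the strongest emotional moments first so the model sees them early.
    ranked = sorted(events, key=lambda e: e[2], reverse=True)
    lines = [f"Please edit the video to highlight the emotional peaks of the past {hours} hours."]
    lines.append("Emotional peaks, strongest first:")
    for ts, emotion, intensity in ranked:
        lines.append(f"- {ts}: {emotion} (intensity {intensity:.2f})")
    return "\n".join(lines)
```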

[0708] In this way, the present invention provides a rich means of life logging and reflection that integrates the user's emotions and activities.

[0709] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0710] Step 1:

[0711] The device collects data on the user's physical activity and emotions. Specifically, the wearable device uses sensors to record heart rate, steps, and movement routes, and uses a camera and microphone to record facial expressions and voice tone. The input consists of the user's real-time physical information and emotional expressions, which are temporarily stored as time-series data within the device.
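The temporary in-device storage of Step 1 could be sketched as a bounded time-series buffer. The field names, the `DeviceBuffer` class, and the policy of dropping the oldest samples when the buffer is full are assumptions made for illustration; the disclosure only states that the data is stored temporarily as time-series data.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    t: float                      # Unix timestamp in seconds
    heart_rate_bpm: int
    steps: int                    # cumulative step count
    lat: float
    lon: float
    face_emotion: Optional[str]   # None when no face is visible to the camera
    voice_emotion: Optional[str]  # None when the user is not speaking


class DeviceBuffer:
    """Bounded in-device store of samples; the oldest sample is dropped when full."""

    def __init__(self, capacity: int):
        self._samples = deque(maxlen=capacity)

    def record(self, sample: Sample) -> None:
        self._samples.append(sample)

    def drain(self) -> list:
        """Return all buffered samples in time order and clear the buffer."""
        out = sorted(self._samples, key=lambda s: s.t)
        self._samples.clear()
        return out

    def __len__(self) -> int:
        return len(self._samples)
```

`drain()` would be called at each transmission interval of Step 2.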

[0712] Step 2:

[0713] The device transmits the collected data to the server at regular intervals. Specifically, it securely transfers the data stored on the device to the server in encrypted form via Wi-Fi or a mobile network. The input is the activity data and emotion data from the device, and the output is a dataset stored on the server.
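One possible packaging of a Step 2 upload is sketched below: the batch is serialized to JSON, compressed, and accompanied by a SHA-256 digest so the server can detect corrupted uploads. The encryption required by the text is assumed to be provided by the transport layer (for example TLS) and is not shown; the payload layout and function names are hypothetical.

```python
import gzip
import hashlib
import json


def pack_batch(device_id, samples):
    """Serialize and compress a batch; return (payload bytes, SHA-256 hex digest)."""
    body = json.dumps({"device_id": device_id, "samples": samples},
                      separators=(",", ":"), sort_keys=True).encode("utf-8")
    payload = gzip.compress(body)
    # The digest lets the server detect a corrupted or truncated upload.
    return payload, hashlib.sha256(payload).hexdigest()


def unpack_batch(payload, digest):
    """Server side: verify the digest, then decompress and parse the batch."""
    if hashlib.sha256(payload).hexdigest() != digest:
        raise ValueError("batch digest mismatch")
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```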

[0714] Step 3:

[0715] The server stores the received data in a database and analyzes it using an emotion analysis engine. Specifically, it uses deep learning frameworks such as TensorFlow and PyTorch to extract the user's emotional state from facial and voice data. The input is activity data and emotion data sent to the server, and the output is data labeled with the user's emotional state.
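The disclosure does not specify how facial and voice results are combined into one label. A common option, shown here purely as an assumption, is late fusion: a weighted average of per-modality emotion probabilities followed by an argmax. The emotion set, the 0.6 face weight, and the function name are hypothetical.

```python
EMOTIONS = ("joy", "sadness", "surprise", "neutral")  # hypothetical label set


def fuse_emotions(face_probs, voice_probs, face_weight=0.6):
    """Late fusion of per-modality probabilities, then argmax.

    Either argument may be None when that modality is unavailable.
    Returns (label, fused probability dict).
    """
    if face_probs is None and voice_probs is None:
        return "unknown", {}
    # Fall back to the single available modality.
    if face_probs is None:
        face_weight = 0.0
    elif voice_probs is None:
        face_weight = 1.0
    fused = {}
    for e in EMOTIONS:
        f = face_probs.get(e, 0.0) if face_probs else 0.0
        v = voice_probs.get(e, 0.0) if voice_probs else 0.0
        fused[e] = face_weight * f + (1.0 - face_weight) * v
    label = max(fused, key=fused.get)
    return label, fused
```

The resulting label would be attached to the corresponding time segment as the "data labeled with the user's emotional state."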

[0716] Step 4:

[0717] The server extracts important events using the results of the emotion analysis and generates a video based on them. Specifically, it identifies moments of emotional peaks in the user's daily activities and automatically creates a video from them using video editing software similar to Adobe Premiere Pro. The input is event data from the emotion analysis, and the output is an edited video file.
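Identifying "moments of emotional peaks" could be approximated, for example, by finding local maxima in an emotion-intensity time series and suppressing peaks that are too close to a stronger one. The threshold, the minimum gap, and the greedy suppression rule are illustrative assumptions, not values from the disclosure.

```python
def find_emotional_peaks(times, intensity, threshold=0.7, min_gap_s=60.0):
    """Return timestamps of local maxima in an emotion-intensity series.

    A sample is a peak if it is >= threshold and >= both neighbours.
    Peaks closer than min_gap_s to an already kept, stronger peak are dropped.
    """
    candidates = []
    for i in range(len(intensity)):
        left = intensity[i - 1] if i > 0 else float("-inf")
        right = intensity[i + 1] if i + 1 < len(intensity) else float("-inf")
        if intensity[i] >= threshold and intensity[i] >= left and intensity[i] >= right:
            candidates.append((intensity[i], times[i]))
    kept = []
    # Greedy non-maximum suppression: consider the strongest peaks first.
    for value, t in sorted(candidates, reverse=True):
        if all(abs(t - k) >= min_gap_s for k in kept):
            kept.append(t)
    return sorted(kept)
```

Each returned timestamp could then seed a clip window for the video editing stage.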

[0718] Step 5:

[0719] Users view videos generated through a dedicated application. The application provides users with video playback functionality and also allows them to input feedback such as their impressions and suggestions for improvement after viewing. The input is the generated video, and the output is the user's feedback data.

[0720] Step 6:

[0721] The server collects user feedback and uses it to improve the system. Specifically, it analyzes user viewing history and feedback content, and uses this data to improve the accuracy of emotional expression and editing in future video production. The input is user feedback, and the output is data that serves as an indicator for system improvement.
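How feedback "serves as an indicator for system improvement" is not specified. As one hypothetical approach, the server could keep a running score per editing style and update it with each user rating using an exponential moving average; the style names, the smoothing factor `alpha`, and the [0, 1] rating scale are assumptions.

```python
class EditPreferences:
    """Per-style scores updated from user ratings via an exponential moving average."""

    def __init__(self, styles, alpha=0.2, prior=0.5):
        self.alpha = alpha                       # weight given to the newest rating
        self.scores = {s: prior for s in styles}

    def update(self, style, rating):
        """Fold in a rating in [0, 1] for a video edited in `style`."""
        if not 0.0 <= rating <= 1.0:
            raise ValueError("rating must be in [0, 1]")
        old = self.scores[style]
        self.scores[style] = (1 - self.alpha) * old + self.alpha * rating

    def best_style(self):
        """Style currently preferred for the next video."""
        return max(self.scores, key=self.scores.get)
```

A moving average favours recent feedback, which suits preferences that drift over time.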

[0722] (Application Example 2)

[0723] Next, Application Example 2 will be explained. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0724] In recent years, systems utilizing personal life log data have attracted attention, but they are not sufficiently optimized for video editing that reflects emotional states. Furthermore, there is a lack of effective means to highlight particularly moving moments in users' daily lives. Therefore, there is a need for a system that enables the creation of rich videos that incorporate users' emotions.

[0725] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0726] In this invention, the server includes means for processing user activity data and emotional data, means for analyzing the activity data and emotional data and extracting important events, and means for automatically generating videos based on the extracted important events and emotional data and performing emotionally optimized editing. This allows users to richly reminisce about their life logs through videos, and to efficiently record and view particularly emotionally significant moments.

[0727] "User activity data" refers to information about an individual's physical activity, location, and daily behavior.

[0728] "Emotional data" refers to information about an individual's emotional state based on their facial expressions, tone of voice, and other physiological indicators.

[0729] "Processing means" refers to a system that converts or organizes acquired data into an appropriate format before analysis or editing.

[0730] "Means of analysis and extraction of important events" refers to a system that evaluates collected data and selects events that are significant to the user.

[0731] "A means of automatically generating videos and performing emotionally optimized editing" refers to a system that generates and edits videos that take into account an individual's emotional state, based on important events and emotional data.

[0732] "Means of delivery" refers to the interface or mechanism that allows users to access and use the generated content.

[0733] This invention is a system that automatically generates videos based on user activity and emotional data. A wearable device worn by the user and installed cameras continuously record the user's physical activity and emotional data 24 hours a day. This includes heart rate, movement data, facial expressions, and voice tone. This data is temporarily stored on the device and then periodically transmitted to a server.

[0734] The server operates on a high-performance cloud computing environment and stores incoming data in a database. This data is analyzed by an emotion engine utilizing multi-modal AI. The emotion engine analyzes the data using specific algorithms to recognize the user's emotional state. This extracts important events from the data and generates a video that expresses emotions richly. In the editing process, a generative AI model is used to incorporate narration and music that match the emotional state into the video.

[0735] Users view the generated videos through the interface of their smartphones or consumer robots. After watching the videos, users can provide feedback and contribute to further improvements to the system. This feedback is also part of the data analysis process and will be reflected in the next video generation.

[0736] As a concrete example, during a summer family trip, the surprise and joy of a child riding a roller coaster for the first time could be recorded, allowing the family to vividly relive that moment. Similarly, a prompt instructing a generative AI model could take the form: "Edit the highlight of last week's family trip into an emotionally rich video."

[0737] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0738] Step 1:

[0739] The device collects user activity and emotional data using wearable devices and cameras. Inputs include heart rate, location data, facial expressions, and voice tone. This data is temporarily stored in internal memory to complete the first stage of data collection.

[0740] Step 2:

[0741] The device transfers collected activity and emotion data to the server at regular intervals. It uses data stored in the device's memory as input and outputs it to a cloud service via the internet. This allows the data to be aggregated on the server in real time.

[0742] Step 3:

[0743] The server stores the received data in a database. The input is the transferred user activity and emotion data, and the output is multiple organized data segments stored in the database. At this stage, the stored data is ready for later analysis.
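Organizing the incoming data into segments, as in Step 3, might look like the following sketch using Python's built-in `sqlite3`. The table schema, the fixed 60-second segment length, and the function names are assumptions; a production system would more likely use a cloud database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical segment table for life-log data."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS segments (
               user_id    TEXT NOT NULL,
               start_ts   REAL NOT NULL,  -- segment start, Unix seconds
               end_ts     REAL NOT NULL,
               heart_rate REAL,           -- mean bpm over the segment
               emotion    TEXT            -- filled in later by the emotion engine
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_seg ON segments (user_id, start_ts)")
    return conn


def store_segments(conn, user_id, samples, segment_s=60.0):
    """Group (timestamp, heart_rate) samples into fixed-length segments and insert them."""
    buckets = {}
    for ts, hr in samples:
        buckets.setdefault(int(ts // segment_s), []).append(hr)
    rows = [(user_id, k * segment_s, (k + 1) * segment_s, sum(v) / len(v), None)
            for k, v in sorted(buckets.items())]
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```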

[0744] Step 4:

[0745] The server analyzes data using multi-modal AI and recognizes emotional states with an emotion engine. Input consists of activity and emotion data from a database, and the algorithm analyzes this data to output emotional states (e.g., joy, surprise, sadness). Based on this, the server selects important events.
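The criterion for selecting important events in Step 4 is not given. One plausible scoring, stated here as an assumption, weights emotion intensity together with heart-rate elevation above a resting rate, skips neutral segments, and keeps the top k in chronological order. All weights, the resting rate, and the 60 bpm normalization are hypothetical.

```python
def select_key_events(segments, k=3, w_emotion=0.7, w_arousal=0.3, resting_hr=65.0):
    """Rank segments and return the top-k in chronological order.

    Each segment is a dict with 'start', 'emotion', 'intensity' (0-1) and 'heart_rate'.
    Score = w_emotion * intensity + w_arousal * normalised heart-rate elevation.
    """
    scored = []
    for seg in segments:
        if seg["emotion"] == "neutral":
            continue  # neutral moments are not treated as candidates
        # Elevation above the resting rate, capped at 1.0 for +60 bpm or more.
        arousal = min(max(seg["heart_rate"] - resting_hr, 0.0) / 60.0, 1.0)
        score = w_emotion * seg["intensity"] + w_arousal * arousal
        scored.append((score, seg))
    top = sorted(scored, key=lambda x: x[0], reverse=True)[:k]
    return [seg for _, seg in sorted(top, key=lambda x: x[1]["start"])]
```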

[0746] Step 5:

[0747] The server automatically generates videos using a generative AI model based on emotional data. The input consists of selected key events and emotional states, and the output is emotionally optimized video content. The generated videos automatically incorporate narration and music based on emotions.

[0748] Step 6:

[0749] Users view the generated videos on smartphones or consumer robots. The input is video data provided by the server, which is displayed on the device screen. After viewing, users can also provide feedback, which is used to improve future video generation.

[0750] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0751] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions is input to the data generation model 58 together with inference data, such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input data according to the instructions in the prompt and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0752] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0753] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0754] Figure 9 shows an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center, the more primitive it is. Farther out from the center are emotions representing states and actions that arise from mental states. Here, "emotion" is a concept that includes both feelings and mental states. On the left side of the concentric circles are emotions generally generated by reactions occurring in the brain. On the right side are emotions generally induced by situational judgment. Above and below are emotions generally generated both by reactions in the brain and by situational judgment. In addition, "pleasure" is located on the upper side of the concentric circles and "displeasure" on the lower side. Thus, in the emotion map 400, emotions are mapped according to the structure by which they arise, and emotions that are likely to occur simultaneously are mapped close together.
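The geometry described for the emotion map 400 (rings from the center outward, pleasure above, displeasure below, reaction on the left, situation on the right) can be represented with polar coordinates, so that "mapped close together" becomes a distance. The ring and angle values below are hypothetical placements chosen for illustration; Figure 9's actual coordinates are not given in the text.

```python
import math

# Hypothetical placements (ring, angle in degrees). Ring 1 is nearest the centre
# (most primitive); 90 deg is the top (pleasure side), 270 deg the bottom
# (displeasure side), 180 deg the left (reaction) and 0 deg the right (situation).
EMOTION_MAP = {
    "pleasure": (1, 90.0),
    "joy": (2, 100.0),
    "desire": (2, 160.0),
    "reassured": (2, 20.0),
    "calm": (2, 0.0),
    "displeasure": (1, 270.0),
    "anxiety": (2, 330.0),
}


def to_xy(emotion):
    """Convert an emotion's (ring, angle) placement to Cartesian coordinates."""
    ring, angle = EMOTION_MAP[emotion]
    rad = math.radians(angle)
    return ring * math.cos(rad), ring * math.sin(rad)


def map_distance(a, b):
    """Euclidean distance between two emotions; small means likely to co-occur."""
    (xa, ya), (xb, yb) = to_xy(a), to_xy(b)
    return math.hypot(xa - xb, ya - yb)


def nearest(emotion, n=2):
    """The n emotions placed closest to `emotion` on the map."""
    others = [e for e in EMOTION_MAP if e != emotion]
    return sorted(others, key=lambda e: map_distance(emotion, e))[:n]
```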

[0755] These emotions are distributed around the 3 o'clock position of the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, which gives a calm impression.

[0756] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the farther an emotion lies toward the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).

[0757] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, the result is displeasure; when they approach the ideal, the result is pleasure. Similarly, emotions can be created for robots, cars, motorcycles, and the like based on various balances, such as posture and battery level: deviation from the ideal results in displeasure, and approaching the ideal results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, doctoral dissertation, Tokushima University: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half contains emotions belonging to a region called "situation," where situational awareness is dominant.
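The balance principle in [0757] can be illustrated numerically for a robot: each balance (here battery level and body tilt) is compared with its ideal, and the normalized deviations are averaged into a pleasure/displeasure value in [-1, 1]. The linear scoring, the tolerance values, and the function name are assumptions for illustration only.

```python
def comfort_score(state, ideal, tolerance):
    """Map deviations from ideal balances to a value in [-1, 1].

    +1 means every balance sits at its ideal (pleasure); -1 means every balance is
    at or beyond its tolerance limit (displeasure).
    """
    total = 0.0
    for key, target in ideal.items():
        # Normalised deviation: 0 at the ideal, 1 at (or past) the tolerance limit.
        dev = min(abs(state[key] - target) / tolerance[key], 1.0)
        total += 1.0 - 2.0 * dev
    return total / len(ideal)
```

For example, with an ideal battery level of 100% (tolerance 80 points) and an ideal tilt of 0 degrees (tolerance 30 degrees), a robot at 60% battery standing upright scores 0.5.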

[0758] The emotion map defines two emotions that promote learning. One is the emotion around the midpoint between the negative emotions "repentance" and "reflection" on the situation side; that is, when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side; that is, when the robot has positive feelings such as "I want more" or "I want to know more."

[0759] The emotion identification model 59 inputs the user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each of which is a combination of user input and the emotion values representing each emotion shown in the emotion map 400. Furthermore, the neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example in which multiple emotions, such as "reassured," "calm," and "confident," have similar emotion values.
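A minimal sketch of the two ideas in [0759], under stated assumptions: the network's raw outputs are turned into emotion values with a softmax and the largest value gives the user's emotion, and a smoothness penalty over neighbouring emotions is one way (assumed here, not confirmed by the text) to train nearby emotions toward similar values. The `NEIGHBOURS` pairs and the softmax output layer are hypothetical.

```python
import math

# Hypothetical neighbour pairs read off an emotion map: emotions placed close together.
NEIGHBOURS = [("reassured", "calm"), ("calm", "confident"), ("joy", "pleasure")]


def softmax(logits):
    """Convert raw outputs to values that are positive and sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def determine_emotion(logits):
    """Turn raw network outputs into emotion values and pick the strongest one."""
    values = softmax(logits)
    return max(values, key=values.get), values


def smoothness_penalty(values, pairs=NEIGHBOURS):
    """Sum of squared differences between neighbouring emotions' values.

    Adding lambda * penalty to the training loss pushes emotions that are close on
    the map toward similar values. Pairs with a missing emotion are skipped.
    """
    return sum((values[a] - values[b]) ** 2 for a, b in pairs if a in values and b in values)
```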

[0760] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0761] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0762] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12, and the processor 28 executes the specific processing according to the specific processing program 56.

[0763] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0764] Furthermore, it is not necessary to store the entire specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or in the storage 32; only a portion of the specific processing program 56 may be stored there.

[0765] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0766] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0767] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0768] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0769] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0770] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0771] The following is further disclosed regarding the embodiments described above.

[0772] (Claim 1)

[0773] A device for acquiring user activity data,

[0774] A device for processing the acquired activity data,

[0775] A device for analyzing activity data and extracting important events,

[0776] A device for automatically generating videos based on extracted important events,

[0777] A device for providing the generated video to the user,

[0778] A system comprising the above devices.

[0779] (Claim 2)

[0780] The system according to claim 1, comprising a sensor for continuously recording user activity data over a 24-hour period.

[0781] (Claim 3)

[0782] The system according to claim 1, further comprising a function for adding natural language explanations to extracted important events.

[0783] "Example 1"

[0784] (Claim 1)

[0785] A wearable device that continuously acquires the user's biometric and location information,

[0786] A sensor device that supplementarily acquires user environmental information,

[0787] A data processing device for securely storing acquired data in a database and performing filtering and indexing,

[0788] An analytical device that uses a multimodal AI engine to analyze data and recognize abnormal behavior and new activity patterns,

[0789] A video generation device that automatically edits video footage based on important events and structures it into a story,

[0790] A distribution device for providing videos to users in a secure manner,

[0791] A system comprising the above devices.

[0792] (Claim 2)

[0793] The system according to claim 1, comprising a sensor for continuously acquiring user activity data.

[0794] (Claim 3)

[0795] The system according to claim 1, further comprising a function for adding natural language explanations to the generated video.

[0796] "Application Example 1"

[0797] (Claim 1)

[0798] Means for obtaining user behavior information,

[0799] Means for processing acquired behavioral information,

[0800] A means of analyzing behavioral information and extracting important events,

[0801] A means for automatically generating visual content based on extracted important events,

[0802] Means for providing the generated visual content to the user,

[0803] Means for understanding the behavioral trends of the city as a whole in order to support efficient city management,

[0804] A system comprising the above means.

[0805] (Claim 2)

[0806] The system according to claim 1, comprising a detector for continuously recording user behavior information over a 24-hour period.

[0807] (Claim 3)

[0808] The system according to claim 1, further comprising a function for adding natural language explanations to the extracted important events.

[0809] "Example 2 when combining with an emotion engine"

[0810] (Claim 1)

[0811] Means for acquiring user activity data and emotion data,

[0812] Means for processing acquired activity data and emotion data,

[0813] A means of analyzing activity data and emotional data to extract important events,

[0814] A means of automatically generating videos according to the user's emotional state,

[0815] A means of adjusting music and narration to suit the emotions based on the generated video,

[0816] A means of providing the generated video to the user and obtaining user feedback,

[0817] A system comprising the above means.

[0818] (Claim 2)

[0819] The system according to claim 1, comprising an emotion analysis engine that recognizes the user's emotional state in real time.

[0820] (Claim 3)

[0821] The system according to claim 1, further comprising a function that uses activity data and emotion data to edit videos by emphasizing the peak of the user's emotions.

[0822] "Application Example 2 when combining with an emotion engine"

[0823] (Claim 1)

[0824] Means of obtaining user activity data,

[0825] Means for processing acquired activity data and emotion data,

[0826] A means of analyzing activity data and emotional data to extract important events,

[0827] A means of automatically generating videos based on extracted key events and emotional data, and performing emotionally optimized editing,

[0828] A means of providing the generated video to the user,

[0829] A system comprising the above means.

[0830] (Claim 2)

[0831] The system according to claim 1, comprising sensors for continuously recording user activity data and emotional data over a 24-hour period.

[0832] (Claim 3)

[0833] The system according to claim 1, further comprising a function for adding natural language explanations to extracted important events and selecting music or narration according to the emotional state.

[Explanation of symbols]

[0834] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising:
a device for acquiring user activity data,
a device for processing the acquired activity data,
a device for analyzing activity data and extracting important events,
a device for automatically generating videos based on extracted important events, and
a device for providing the generated video to the user.

2. The system according to claim 1, comprising a sensor for continuously recording user activity data over a 24-hour period.

3. The system according to claim 1, further comprising a function for adding natural language explanations to extracted important events.