system
A system for efficiently organizing and reliving special moments using data collection, analysis, and augmented/virtual reality technologies addresses the challenge of managing diverse digital data, enriching personal memories.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
The management and reliving of individual special moments in daily life is challenging due to the inefficiency in organizing diverse digital data and insufficient means for recreating past experiences, leading to a lack of rich utilization of personal memories.
A system equipped with data collection, analysis, tagging, and generation capabilities, utilizing machine learning and augmented/virtual reality to organize and recreate special moments through personalized visual information.
Enables efficient management and reliving of special moments by automatically organizing and presenting personalized visual information using augmented or virtual reality, enhancing the user's digital experience.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In modern times, the spread of digital devices generates a wide variety of media data in an individual's daily life. However, such data is often managed separately, which makes it difficult to organize, search, and relive efficiently, so problems arise in managing an individual's special moments. In addition, the means for reproducing past experiences in real or virtual space are insufficient, so an individual's memories cannot be richly utilized.
Means for Solving the Problems
[0005] This invention provides a device equipped with means for automatically collecting user data from multiple data sources. This device analyzes the collected data and identifies its content and emotions using machine learning algorithms. It also has a function to automatically assign information tags related to the data based on the analysis results. Furthermore, it has means for recognizing specific events from this data and organizing related data, enabling the generation of personalized visual information for the user. By displaying this visual information using augmented reality or virtual reality technology, the system provides the ability to recreate past experiences. This makes it possible to efficiently manage and re-experience individual special moments.
[0006] "Device means" refers to physical or software components for collecting and processing user data.
[0007] "Analysis means" refers to a technical system for analyzing collected data and identifying its content and emotions.
[0008] "Tagging method" refers to a process or technology that assigns information related to data based on analysis results.
[0009] "Event recognition means" refers to a method for associating multiple data points with each other, recognizing them as a specific event, and organizing them.
[0010] "Generation method" refers to the technology that creates visual information based on analyzed and tagged information.
[0011] "Display means" refers to technologies and devices that display generated visual information in a real or virtual environment.
[0012] "Visual information" refers to digital content created to visually recreate a user's memories or specific events.
[0013] "Augmented reality technology" refers to the technology that overlays virtual information onto real-world space.
[0014] "Virtual reality technology" refers to technology that provides users with a completely artificial environment generated by a computer, thereby promoting an immersive experience.
Brief Description of the Drawings
[0015]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, which incorporates an emotion engine.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, which incorporates an emotion engine.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a processor with a reference number (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0019] In the following embodiments, a RAM (Random Access Memory) with a reference number is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0020] In the following embodiments, a storage with a reference number is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.
[0021] In the following embodiments, a communication interface (I/F) with a reference number is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] This invention relates to a comprehensive system for efficiently organizing and reliving special moments in a user's daily life. In its embodiments, the system is mainly organized around three elements: a server, a terminal, and a user, each playing a different role.
[0037] The terminal collects various data, such as photos, videos, voice memos, GPS information, social media posts, and calendar events, through applications installed on the user's smartphone or other mobile device. This data collection is performed automatically with the user's permission. The user's location information and the date and time of capture can be used to supplement the data with background and scene information.
[0038] The server receives data sent from the terminal. After receiving the data, the server uses machine learning algorithms to analyze it and identify its content and associated emotions. Based on the analysis results, it assigns information tags related to each data item and performs cross-media tagging. This makes it possible to intuitively organize data based on information such as location, people, event names, and emotions.
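As a minimal sketch of how the tagging described above might be represented, the following assumes a hypothetical `MediaItem` record and a tag index keyed by (category, value). The class, field names, and sample values are illustrative assumptions and do not appear in the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """One collected data item (hypothetical structure, not from the disclosure)."""
    item_id: str
    media_type: str  # e.g. "photo", "video", "voice_memo"
    tags: dict = field(default_factory=dict)  # tag category -> value


def build_tag_index(items):
    """Map each (category, value) tag to the ids of all items carrying it.

    The index ignores media type, so one lookup returns photos, videos,
    and voice memos together. This is one possible reading of
    "cross-media tagging".
    """
    index = defaultdict(list)
    for item in items:
        for category, value in item.tags.items():
            index[(category, value)].append(item.item_id)
    return dict(index)


items = [
    MediaItem("p1", "photo", {"location": "Kyoto", "emotion": "joy"}),
    MediaItem("v1", "video", {"location": "Kyoto", "event": "summer trip"}),
    MediaItem("m1", "voice_memo", {"emotion": "joy"}),
]
index = build_tag_index(items)
print(index[("location", "Kyoto")])  # ['p1', 'v1']
print(index[("emotion", "joy")])     # ['p1', 'm1']
```

Keying the index by tag rather than by media type keeps a lookup such as "everything tagged Kyoto" to a single dictionary access.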
[0039] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data. For example, it automatically detects the user's birthday or travel events and groups together photos and videos associated with them.
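The event grouping described above could be sketched as follows, assuming that each calendar event carries a start and end date and that an item belongs to the first event whose date range covers its capture date. The `CalendarEvent` type and the matching rule are assumptions made for illustration; the disclosure does not specify how events are matched.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CalendarEvent:
    """A calendar entry with an inclusive date range (hypothetical structure)."""
    name: str
    start: date
    end: date


def group_by_event(captures, events):
    """Group capture ids under the first calendar event whose dates cover them.

    `captures` is a list of (item_id, capture_date) pairs. Items that match
    no event are collected under the key None.
    """
    groups = {}
    for item_id, captured_on in captures:
        match = next(
            (e.name for e in events if e.start <= captured_on <= e.end), None
        )
        groups.setdefault(match, []).append(item_id)
    return groups


events = [
    CalendarEvent("birthday", date(2024, 5, 3), date(2024, 5, 3)),
    CalendarEvent("summer trip", date(2024, 8, 10), date(2024, 8, 14)),
]
captures = [
    ("p1", date(2024, 5, 3)),
    ("v1", date(2024, 8, 12)),
    ("p2", date(2024, 8, 14)),
    ("p3", date(2024, 9, 1)),
]
grouped = group_by_event(captures, events)
print(grouped)
```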
[0040] Users can use this organized data to generate personalized visual information. Specifically, they interact with an AI assistant to determine settings (e.g., theme and video length) for creating a particular story. Based on this information, the server generates personalized videos or slideshows.
[0041] Finally, the terminal is equipped with technology to display the generated visual information. This technology uses augmented reality (AR) or virtual reality (VR) to allow users to re-experience past memories in real or virtual space. This makes it possible to recreate special moments regardless of the physical environment.
[0042] As described above, this system aims to enrich users' digital lives by automatically organizing their memories and reconstructing special moments in a unique way.
[0043] The following describes the processing flow.
[0044] Step 1:
[0045] The terminal obtains the necessary permissions from the user and automatically collects data such as photos, videos, and voice memos from the smartphone. Since GPS information is also collected, the user's location information can also be obtained.
[0046] Step 2:
[0047] The terminal sends the collected data to the server using a secure protocol. Data is compressed and encrypted during transmission to ensure its security.
[0048] Step 3:
[0049] The server stores the data received from the terminal and analyzes its content using machine learning algorithms. This analysis includes image recognition, speech recognition, and text analysis, and is used to identify people, places, and objects within the data.
[0050] Step 4:
[0051] Based on the analysis results, the server automatically assigns relevant information tags to the data. These tags include location, person, event name, and sentiment, and are useful for subsequent data retrieval and organization.
[0052] Step 5:
[0053] The server uses calendar information to recognize events and groups data related to specific events (such as trips or birthdays). This makes it possible to manage related memories together.
[0054] Step 6:
[0055] The user accesses the generated data and interacts with an AI assistant to configure settings for creating a personalized video story. Once the settings are complete, the server generates the video or slideshow based on them.
[0056] Step 7:
[0057] The terminal displays generated visual information to the user using augmented reality or virtual reality technology. Users can use AR devices or VR goggles to re-experience special moments in real or virtual space.
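Step 2 of the flow above compresses the collected data before transmission. The sketch below shows one way that packaging might look, using only the Python standard library. The function names and the payload layout are hypothetical. Encryption is deliberately not shown: it is assumed here to be provided by the transport (for example TLS), which the disclosure does not specify.

```python
import hashlib
import json
import zlib


def pack_for_upload(records):
    """Serialize records to JSON, compress them, and attach a SHA-256 digest."""
    raw = json.dumps(records, sort_keys=True).encode("utf-8")
    return {"digest": hashlib.sha256(raw).hexdigest(), "body": zlib.compress(raw)}


def unpack_upload(package):
    """Decompress a package and verify its digest before decoding."""
    raw = zlib.decompress(package["body"])
    if hashlib.sha256(raw).hexdigest() != package["digest"]:
        raise ValueError("payload digest mismatch")
    return json.loads(raw.decode("utf-8"))


# Hypothetical metadata record; real media files would be sent as binary data.
records = [{"id": "p1", "type": "photo", "lat": 35.0, "lon": 135.7}]
package = pack_for_upload(records)
restored = unpack_upload(package)
print(restored == records)
```

The digest only detects accidental corruption. It is not a substitute for the encryption that the step calls for.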
[0058] (Example 1)
[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0060] In modern society, user information is generated in vast quantities from a wide range of sources, making it difficult to properly organize this information and quickly re-examine necessary information. In particular, organizing information based on emotions and systematically understanding chronologically related events are challenges.
[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0062] In this invention, the server includes information device means for automatically collecting user information from multiple information sources, data analysis means for analyzing the collected user information and identifying its content and associated emotions, and identifier assignment means for automatically assigning identifiers related to the information based on the analysis results. This enables efficient organization of user information and allows for the re-experience of information based on emotions and related events.
[0063] "Information device means" refers to a device or system for automatically collecting user information from multiple information sources.
[0064] "Data analysis means" refers to means for analyzing collected user information and identifying its content and associated sentiments.
[0065] "Identifier assignment means" refers to a means for automatically assigning identifiers related to information based on the results of data analysis.
[0066] An "activity recognition means" is a means for systematizing multiple pieces of user information related to a specific activity.
[0067] "Information generation means" refers to means for generating customized visual information from systematized information.
[0068] "Presentation means" refers to a means of presenting generated visual information to a user in real space or virtual space.
[0069] A "learning algorithm" is an algorithm used in data analysis to identify the sentiment of users.
[0070] "Augmented reality technology" is a technology that overlays digital information onto the real world environment.
[0071] "Virtual reality technology" is a technology that presents users with a computer-generated virtual environment, providing a sense of immersion.
[0072] This invention is a comprehensive system for organizing and reliving special moments in the user's daily life. This system mainly consists of three components: a server, a terminal, and a user, each playing a different role.
[0073] The terminal collects information through applications installed on the user's smartphone or mobile device. Specifically, it automatically acquires images, videos, voice memos, and location information using the camera, microphone, and GPS functions. This collection process is initiated automatically or manually based on the user's settings. Social media posts and calendar events are also acquired as target data. The information is collected with the user's permission and transmitted to the server via a secure channel.
[0074] The server receives the collected information and performs data analysis. Machine learning algorithms are used, with a particular "generative AI model" playing a key role in identifying the sentiment associated with the data's content. The analyzed data is automatically assigned relevant identifiers and organized based on activity. Through these processes, the server organizes information based on specific events, such as "birthdays" or "trips," which are important to the user.
[0075] Users can generate customized stories by interacting with the AI assistant. During the interaction, users communicate their wishes using prompts. For example, they might tell the AI a specific request such as, "Make a 3-minute video about my summer trip memories." Based on these prompts, the server generates personalized visual information using relevant data.
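In the disclosure, interpreting such a prompt is the job of the AI assistant and the generative AI model. Purely as an illustration of the settings that need to be extracted, the sketch below uses a rule-based stand-in that pulls a target length and a theme out of the example request. The function name, the output keys, and the regular expressions are assumptions and handle only this one phrasing.

```python
import re


def parse_story_request(prompt):
    """Extract a target length in seconds and a theme from a request string.

    Handles only the pattern "<N>-minute video about|of <theme>"; a field
    that cannot be found is returned as None.
    """
    length = re.search(r"(\d+)[- ]minute", prompt)
    theme = re.search(
        r"video (?:about|of) (?:my )?(.+?)(?: memories)?\.?$", prompt
    )
    return {
        "length_seconds": int(length.group(1)) * 60 if length else None,
        "theme": theme.group(1) if theme else None,
    }


settings = parse_story_request(
    "Make a 3-minute video about my summer trip memories."
)
print(settings)  # {'length_seconds': 180, 'theme': 'summer trip'}
```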
[0076] The terminal is equipped with the functionality to present generated visual information to the user. By using augmented reality (AR) and virtual reality (VR) technologies, users can re-experience special moments from their past in virtual or real-world spaces. This experience has a visual impact on the user and provides a new digital life that is not dependent on the physical environment.
[0077] This invention is expected to enhance the user experience when reliving special moments, as users will be able to easily access information in a highly organized and tagged state.
[0078] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0079] Step 1:
[0080] The terminal collects information from the user's daily life. Inputs include the camera, microphone, GPS, user social media posts, and calendar entries. This allows it to acquire images, videos, voice memos, and location information, which are then transmitted to the server via a secure communication method.
[0081] Step 2:
[0082] The server receives data sent from the terminal. It processes all collected data from the terminal as input. To perform data analysis, it uses a generative AI model to assign meaning to the items in the data and perform sentiment analysis. The output is a dataset with assigned identifiers.
[0083] Step 3:
[0084] The server assigns identifiers to the analyzed data. It uses analysis results provided by a generative AI model as input. It tags the data based on event names and sentiment, and structures the data through cross-media tagging. The output is a set of tagged and organized data groups.
[0085] Step 4:
[0086] The server groups specific event data based on organized data. It uses data groups with assigned identifiers as input. It automatically extracts and groups data for related events (e.g., birthdays and trips) by referencing the user's calendar information, etc. The output is a dataset grouped by event.
[0087] Step 5:
[0088] The user enters a prompt via the AI assistant. Specifically, they provide text instructions such as, "Make a 3-minute video of my summer trip memories." The server then receives the necessary details for generation and begins processing.
[0089] Step 6:
[0090] The server generates customized visual information based on user prompts. It uses prompts and grouped event data as input. Utilizing a generative AI model, it creates videos and slideshows in the user's desired format. The output is the completed visual content.
[0091] Step 7:
[0092] The terminal presents the generated video or slideshow to the user. It receives completed content from the server as input. The output is visual information presented using AR or VR technology, allowing users to interactively re-experience special moments from the past in real or virtual space.
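Step 6 above produces a video or slideshow of the length the user asked for. As a rough sketch of one sub-problem, the function below spreads a target duration evenly over the items of an event group. The even split, the minimum display time, and the function name are assumptions. In the disclosure this work is delegated to a generative AI model.

```python
def build_timeline(item_ids, total_seconds, min_seconds=2.0):
    """Spread a target duration evenly over items, dropping extras if needed.

    If every item cannot get at least `min_seconds`, only the first items
    that fit are kept. Returns (item_id, start, end) tuples in seconds.
    """
    if not item_ids or total_seconds <= 0:
        return []
    max_items = max(1, int(total_seconds // min_seconds))
    chosen = item_ids[:max_items]
    per_item = total_seconds / len(chosen)
    return [
        (item_id, i * per_item, (i + 1) * per_item)
        for i, item_id in enumerate(chosen)
    ]


# A 3-minute slideshow over four items gives each item 45 seconds.
timeline = build_timeline(["p1", "v1", "p2", "p3"], total_seconds=180)
for entry in timeline:
    print(entry)
```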
[0093] (Application Example 1)
[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0095] The present invention aims to provide a method for efficiently organizing special moments in a user's daily life and effectively reliving them. In particular, it aims to realize a system that allows users to re-experience past experiences while interacting with a household robot. This will emotionally enrich special experiences in the user's digital life.
[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0097] In this invention, the server includes a device means for automatically acquiring user information from multiple information sources, an analysis device means for analyzing the acquired user information and recognizing its content, and a tagging means for automatically assigning attribute tags related to the information based on the analysis results. This makes it possible for users to relive special moments from the past through a home robot.
[0098] "Device means" refers to a component that has the function of automatically acquiring user information from multiple information sources.
[0099] An "analysis device means" is a device that has the function of analyzing acquired user information and recognizing its content.
[0100] A "tagging method" is a mechanism that automatically assigns attribute tags related to information based on the analysis results.
[0101] An "event recognition means" is a mechanism for organizing and consolidating multiple user information related to a specific event.
[0102] A "generation device means" is a device for generating individualized visual information from standardized information.
[0103] "Presentation device means" refers to a device equipped with technology for presenting generated visual information in the real world or a virtual world.
[0104] "Means of recreating personalized narratives" refers to methods that allow users to re-experience past memories while interacting with home robots.
[0105] The system implementing this invention can efficiently organize special moments in a user's daily life and allow them to re-experience them through a home robot. The system mainly consists of a terminal, a server, and a user.
[0106] Users collect various types of information through their smartphones and other devices, including photos, videos, voice memos, location data, social media posts, and calendar events related to their daily activities. This data is automatically sent to the server after obtaining the user's permission.
[0107] The server analyzes received user data using machine learning algorithms implemented with tools such as Python, TensorFlow®, and PyTorch. This analysis identifies the sentiment associated with the data's content, and information tags are assigned based on this. This tagging process intuitively organizes the information and links it to specific events.
[0108] In particular, home robots are designed to allow users to relive past memories in an interactive format. The robots receive voice input from the user and generate appropriate responses. By utilizing AR technology to provide visual information, users can experience special moments from the past in the real world.
[0109] For example, if a user asks a home robot to "show me photos from last year's trip," the system identifies the relevant photos using tags and displays them through an AR display. This allows the user to vividly recreate actual landscapes and past experiences.
[0110] An example of a prompt to input into a generative AI model is: "The user wants memories of a trip from last year. Please find relevant content based on specific dates and sentiment tags."
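A prompt of this kind could be assembled from structured fields before it is passed to the model. The helper below is a hypothetical sketch: the function name, its parameters, and the exact wording of the generated lines are assumptions, and no particular model API is implied.

```python
def build_retrieval_prompt(request_summary, date_range, sentiment_tags):
    """Assemble a text prompt from a request summary and retrieval constraints."""
    start, end = date_range
    lines = [
        request_summary,
        f"Please find relevant content dated between {start} and {end}.",
    ]
    if sentiment_tags:
        lines.append(
            "Prefer items tagged with: " + ", ".join(sentiment_tags) + "."
        )
    return "\n".join(lines)


prompt = build_retrieval_prompt(
    "The user wants memories of a trip from last year.",
    ("2023-01-01", "2023-12-31"),  # illustrative dates
    ["joy", "excitement"],         # illustrative sentiment tags
)
print(prompt)
```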
[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0112] Step 1:
[0113] The terminal collects information from the user's daily life, including photos, videos, voice memos, location data, social media posts, and calendar events. This data is collected automatically with the user's permission. The input data is diverse and broadly covers the user's daily life. Structured raw data is generated as output.
[0114] Step 2:
[0115] The terminal sends the collected data to the server. After the server receives the data, it analyzes it using machine learning algorithms (implemented with frameworks such as TensorFlow or PyTorch). The server receives the raw data sent from the terminal as input and generates results that identify the content and sentiment as output.
[0116] Step 3:
[0117] The server automatically assigns information tags based on the analysis results. The input is already analyzed data, and the output is organized data with attributes related to the dataset (e.g., location, person, emotion, etc.) attached.
[0118] Step 4:
[0119] The server groups organized data related to a specific event. This includes organizing information based on the user's calendar information and related events. The input is organized data with tags, and the output is a dataset that aggregates all data related to a single event.
[0120] Step 5:
[0121] When a user wishes to relive past memories through a home robot, the server uses a generated dataset to provide a personalized narrative. Specifically, the AI analyzes the user's voice input, retrieves relevant information, and generates and presents visual information on an AR device or robot display. The input is the user's request (as a prompt), and the output is an interactive visual experience.
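For a request such as "show me photos from last year's trip", the retrieval part of Step 5 could be sketched as a filter over tagged items. The item layout, the `find_items` helper, and the sample data are hypothetical. Turning the user's spoken request into these filter values is assumed to happen elsewhere, for example in the generative AI model.

```python
from datetime import date


def find_items(items, media_type=None, year=None, required_tags=None):
    """Return ids of items matching every filter that was supplied."""
    required_tags = required_tags or {}
    results = []
    for item in items:
        if media_type and item["media_type"] != media_type:
            continue
        if year and item["captured_on"].year != year:
            continue
        if any(item["tags"].get(k) != v for k, v in required_tags.items()):
            continue
        results.append(item["id"])
    return results


items = [
    {"id": "p1", "media_type": "photo", "captured_on": date(2023, 8, 11),
     "tags": {"event": "trip", "emotion": "joy"}},
    {"id": "v1", "media_type": "video", "captured_on": date(2023, 8, 12),
     "tags": {"event": "trip"}},
    {"id": "p2", "media_type": "photo", "captured_on": date(2024, 5, 3),
     "tags": {"event": "birthday"}},
]
today = date(2024, 6, 1)  # fixed date so the example is reproducible
matches = find_items(
    items,
    media_type="photo",
    year=today.year - 1,  # "last year"
    required_tags={"event": "trip"},
)
print(matches)  # ['p1']
```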
[0122] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0123] This invention relates to a comprehensive system for efficiently organizing and reliving special moments, including emotions, in a user's daily life. The system mainly consists of a server, a terminal, an emotion engine, and a user, each playing a different role.
[0124] The terminal collects various types of data, such as photos, videos, and voice memos, through applications installed on the user's smartphone or other mobile devices. This collection is done automatically with the user's permission, and the terminal also uses location services to supplement the data with the user's actions and the background of the scene.
[0125] The server receives data sent from the terminal and analyzes it using machine learning algorithms and an emotion engine. The emotion engine determines the user's emotional state, particularly through the analysis of voice and images, and reflects this in the analysis results. As a result, information tags based on the analysis results are automatically assigned to each data item, and cross-media tagging is performed. This process allows the data to be organized by information such as location, person, event name, and emotion.
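The disclosure does not say how the emotion engine combines voice and image analysis. One simple possibility, shown here only as an assumption, is to blend per-emotion scores from the two sources with a weight and use the highest-scoring label as the emotion tag. The score dictionaries stand in for the output of the emotion identification model 59, whose real interface is not described.

```python
def dominant_emotion(voice_scores, image_scores, voice_weight=0.5):
    """Blend per-emotion scores from two modalities and return the top label.

    Scores are dicts of emotion label -> value in [0, 1]; a label missing
    from one modality counts as 0 there.
    """
    labels = set(voice_scores) | set(image_scores)
    if not labels:
        return None
    blended = {
        label: voice_weight * voice_scores.get(label, 0.0)
        + (1 - voice_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    # Sort labels so ties resolve deterministically (alphabetical order).
    return max(sorted(blended), key=blended.get)


voice = {"happiness": 0.6, "surprise": 0.3}  # hypothetical model output
image = {"happiness": 0.8, "sadness": 0.1}   # hypothetical model output
tag = dominant_emotion(voice, image)
print(tag)  # happiness
```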
[0126] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data accordingly. For example, it can detect the user's birthday or travel events and group related photos and videos together.
[0127] Users utilize this organized data to generate personalized visual information via an AI assistant. Specifically, users specify the settings necessary for story creation (theme, video length, emotional expression, etc.) through dialogue. The server then generates personalized videos and slideshows that take emotional information into account, based on these settings.
[0128] The terminal has the ability to deliver this generated visual information through augmented reality (AR) or virtual reality (VR) technology. This allows users to re-experience special moments from the past in real or virtual space. For example, a user can use an AR device to relive memorable moments from their travels.
[0129] This system allows users to efficiently manage special moments, including emotions, within digital data and relive them as needed. This enables users to make more meaningful use of their individual memories.
[0130] The following describes the processing flow.
[0131] Step 1:
[0132] The terminal runs applications with the user's permission and automatically collects user data such as photos, videos, voice memos, and GPS data from the smartphone. Data collection is performed periodically in the background.
[0133] Step 2:
[0134] The terminal transmits the collected data to the server using a secure protocol. During this process, the data is compressed and encrypted to ensure information security.
[0135] Step 3:
[0136] The server analyzes the received data. Image recognition technology detects objects and faces in photos and videos and identifies their content. For voice memos, speech recognition technology is used to convert the audio into text.
[0137] Step 4:
[0138] An emotion engine operates on the server, analyzing audio and images from the analyzed data to determine the user's emotions. This identifies emotional states such as happiness, surprise, and sadness.
[0139] Step 5:
[0140] The server automatically assigns informational tags to the data based on the analysis results. These tags include information such as location, people, event names, and user sentiment, and are used later to search and organize the data.
[0141] Step 6:
[0142] The server performs event recognition and identifies specific events based on calendar information. It organizes the collected data in association with events, grouping data by event such as travel or birthdays.
[0143] Step 7:
[0144] The user uses an AI assistant to configure settings for generating a personalized story. The user specifies the theme, video length, emotional expression, and other details, and this information is sent to the server.
[0145] Step 8:
[0146] The server generates personalized visual information (e.g., videos, slideshows) that reflects the user's preferences, emotions, and event information. The analyzed emotion information is also used in the generation process.
[0147] Step 9:
[0148] The terminal provides the user with the generated visual information using augmented reality (AR) or virtual reality (VR) technology. The user can use an AR device or VR goggles to re-experience special moments in real or virtual space.
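The information tags assigned in Step 5 and their later use for searching can be pictured with the following Python sketch. The record layout `TaggedItem` and the function `search` are assumptions made for illustration; the source names only the tag categories (location, people, event name, emotion).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaggedItem:
    """One piece of collected data with the tag fields named in Step 5."""
    path: str
    media_type: str  # hypothetical values: "photo", "video", "voice_memo"
    location: Optional[str] = None
    people: List[str] = field(default_factory=list)
    event_name: Optional[str] = None
    emotion: Optional[str] = None


def search(
    items: List[TaggedItem],
    location: Optional[str] = None,
    person: Optional[str] = None,
    event_name: Optional[str] = None,
    emotion: Optional[str] = None,
) -> List[TaggedItem]:
    """Return the items that match every criterion that was supplied."""
    results = []
    for item in items:
        if location is not None and item.location != location:
            continue
        if person is not None and person not in item.people:
            continue
        if event_name is not None and item.event_name != event_name:
            continue
        if emotion is not None and item.emotion != emotion:
            continue
        results.append(item)
    return results
```

Because the same tag fields apply to photos, videos, and voice memos alike, a single query such as `search(items, emotion="happiness")` can return results across media types.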
[0149] (Example 2)
[0150] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0151] It is difficult to efficiently collect the diverse information that individuals generate in their daily lives, analyze its content, and grasp its characteristics, including emotional states. Furthermore, effectively classifying data related to specific events and integrating it so that it can be re-experienced as personalized visual information has been difficult to achieve with conventional methods.
[0152] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0153] In this invention, the server includes terminal means for automatically acquiring personal information, analysis means for analyzing the acquired personal information and evaluating emotional states, tagging means for identifying content based on the analysis results and associating information tags, classification means for classifying and organizing personal information related to important activities, creation means for creating personalized visual elements using the organized information, and reproduction means for reproducing the created visual elements in the real world or virtual space. This makes it possible to efficiently analyze and organize diverse personal data and provide a system that allows users to re-experience special moments.
[0154] "Personal information" refers to digital data related to a user, such as photos, videos, voice memos, and location data.
[0155] "Terminal means" refers to a computer or device used to automatically acquire and collect user information.
[0156] "Analysis means" refers to algorithms and systems that use the acquired personal information to evaluate emotional states and analyze their characteristics using machine learning or generative AI models.
[0157] "Tagging means" refers to a process of identifying information related to user data based on the analysis results and assigning corresponding labels or tags.
[0158] "Classification means" refers to a method or system for organizing and grouping personal information related to a specific event or important activity.
[0159] "Creation means" refers to a method for generating visual elements tailored to individual users based on the organized information.
[0160] "Reproduction means" refers to systems and technologies for presenting the generated visual elements in real-world environments or virtual spaces.
[0161] This invention provides a system for efficiently collecting and analyzing personal information and generating and reproducing personalized visual information based on the results. The components of the system are described in detail below.
[0162] Users use their devices to automatically collect personal information such as photos, videos, and voice memos generated in their daily lives. With the user's permission, the device securely collects this data and uses location services to add background data such as location information. Mobile devices such as smartphones and tablets are primarily used for this purpose.
[0163] The collected data is sent to a server. The server is a high-performance computer system that runs machine learning algorithms and generative AI models for sentiment analysis. Examples of software used include machine learning libraries such as TensorFlow and PyTorch. This allows the server to analyze the data and extract the user's emotional state and other important information.
[0164] Once the analysis is complete, the server automatically assigns information tags to the data. This organizes and groups data related to specific events (e.g., birthdays or trips).
[0165] Through interaction with the AI assistant, users can specify themes such as "happy memories" or "special moments" and request the server to generate visual information. The AI assistant then generates the most suitable videos or slideshows based on the user's preferences. The generated visual information is presented via the user's device using augmented reality (AR) or virtual reality (VR) technology. In particular, using AR glasses or VR headsets makes it possible to visually re-experience past memories.
[0166] As a concrete example, if a user wants to relive memories of a past trip, they might enter the prompt, "Please create a 5-minute video using photos and videos from my trip to Hawaii that will bring back those memories." Based on this prompt, the system extracts and analyzes the necessary data and generates the desired video. This allows the user to relive that trip as if it happened yesterday.
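How the system fits the selected material into the requested length is not described. As a rough sketch only, the following Python code assumes a greedy plan: clips tagged with the requested event are taken in order, still photos are given a fixed on-screen time, and the last clip is trimmed to fit. The names `Clip`, `plan_video`, and `PHOTO_SECONDS` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    path: str
    event_name: str
    duration_s: float  # native length for videos; 0 for still photos


PHOTO_SECONDS = 4.0  # hypothetical on-screen time per photo


def plan_video(
    clips: List[Clip], event_name: str, target_s: float
) -> List[Tuple[str, float]]:
    """Greedily fill the target length with clips from one event, in order."""
    plan: List[Tuple[str, float]] = []
    remaining = target_s
    for clip in clips:
        if clip.event_name != event_name or remaining <= 0:
            continue
        wanted = clip.duration_s if clip.duration_s > 0 else PHOTO_SECONDS
        used = min(wanted, remaining)  # trim the clip that overshoots
        plan.append((clip.path, used))
        remaining -= used
    return plan
```

For the 5-minute request in the example, `target_s` would be 300. A real generator would presumably also weigh emotion tags when choosing clips, which this sketch omits.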
[0167] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0168] Step 1:
[0169] Input: Photos, videos, and voice memos from the user's daily life.
[0170] Operation: The terminal automatically collects this data through applications on the user's smartphone or tablet. With the user's permission, location information is also obtained, and supplementary information is gathered to tag the data.
[0171] Output: The collected data, along with the associated location data, is prepared to be sent to the server.
[0172] Step 2:
[0173] Input: Personal information data transmitted from the terminal.
[0174] Operation: The server receives this data and analyzes its contents using machine learning algorithms and generative AI models. This analysis includes tone analysis from audio data and facial recognition in images to assess the user's emotional state.
[0175] Output: The analyzed data is tagged with emotional states and event information and registered in a database.
[0176] Step 3:
[0177] Input: Analyzed and tagged user data.
[0178] Operation: Based on the analysis information, the server classifies and organizes data related to specific events or important activities. For example, this includes the process of grouping data related to past trips or birthday events.
[0179] Output: A well-organized dataset is generated for each event.
[0180] Step 4:
[0181] Input: A well-organized dataset.
[0182] Operation: The user enters prompt text into the AI assistant, specifying the theme and content for visual information generation. Based on these instructions, the server uses image and video editing software to generate personalized videos and slideshows.
[0183] Output: Generated visual information file (video or slideshow).
[0184] Step 5:
[0185] Input: The generated visual information file.
[0186] Operation: The terminal reproduces the visual information in the real world or virtual space through AR glasses or a VR headset. This allows users to relive special moments from the past in an immersive way.
[0187] Output: Presentation of interactive visual information that allows the user to relive the moment.
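The Input/Operation/Output structure of Steps 1 to 5 maps naturally onto a chain of functions in which each output feeds the next input. The following Python sketch shows only that data flow; every function body is a placeholder (no real model, editing software, or AR/VR runtime is called), and all names and record fields are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def collect(raw: List[Record]) -> List[Record]:
    """Step 1: attach location data to each collected item (placeholder)."""
    return [{**r, "location": r.get("location", "unknown")} for r in raw]


def analyze(records: List[Record]) -> List[Record]:
    """Step 2: tag an emotional state. A real system would call a model here."""
    return [{**r, "emotion": r.get("emotion_hint", "neutral")} for r in records]


def classify(records: List[Record]) -> Dict[str, List[Record]]:
    """Step 3: group the tagged records into one dataset per event."""
    datasets: Dict[str, List[Record]] = {}
    for r in records:
        datasets.setdefault(str(r.get("event", "unsorted")), []).append(r)
    return datasets


def generate(datasets: Dict[str, List[Record]], event: str) -> Record:
    """Step 4: describe the visual information file for one event."""
    items = datasets.get(event, [])
    return {"event": event, "format": "slideshow",
            "items": [r["path"] for r in items]}


def present(visual: Record) -> str:
    """Step 5: stand-in for AR/VR playback; returns a summary string."""
    return f"Presenting {len(visual['items'])} item(s) for '{visual['event']}'"


raw_data = [
    {"path": "a.jpg", "event": "trip"},
    {"path": "b.mp4", "event": "trip", "emotion_hint": "happiness"},
    {"path": "c.jpg", "event": "birthday"},
]
summary = present(generate(classify(analyze(collect(raw_data))), "trip"))
```

Keeping each step as a separate function with an explicit input and output mirrors the step descriptions and makes each stage replaceable on its own.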
[0188] (Application Example 2)
[0189] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".
[0190] As personal digital information increases daily, there is a need for systems that can efficiently manage this information and easily allow users to relive special moments. However, conventional systems are insufficient in providing detailed organization that takes emotions into account and in offering visual re-experiences. Therefore, the challenge is to provide users with a more personalized experience.
[0191] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0192] In this invention, the server includes terminal means for automatically acquiring personal information from multiple sources; analysis means for analyzing the acquired personal information and identifying its content; tagging means for automatically assigning identification tags related to the information based on the analysis results; event recognition means for aggregating multiple pieces of personal information related to a specific event; generation means for generating personalized visual information from the aggregated information; and presentation means for presenting the generated visual information in physical or virtual space. This enables users to efficiently manage special moments, including emotions, and re-experience them in physical or virtual space.
[0193] "Terminal means" refers to a device that has the function of automatically acquiring personal information from multiple sources.
[0194] "Analysis means" refers to a process or device for analyzing acquired personal information and identifying its contents.
[0195] "Tagging means" refers to a system that automatically assigns identification tags related to information based on the analysis results.
[0196] "Event recognition means" refers to a system that has the function of aggregating multiple pieces of personal information related to a specific event.
[0197] "Generation means" refers to a process or apparatus that generates personalized visual information from the aggregated information.
[0198] "Presentation means" refers to a device or technology for presenting generated visual information in physical or virtual space.
[0199] The system implementing this invention mainly consists of a server, a terminal, and a user. The server is built on the cloud, and the terminal functions as a mobile device such as a smartphone or tablet. The user interfaces with the system through these devices.
[0200] The terminal collects various personal information, such as location data, photos, videos, and voice memos, during the user's activities. This data collection is done with the user's permission, and smartphone hardware such as the GPS receiver, camera, and microphone is used to obtain the information.
[0201] The server receives data sent from the terminal and analyzes it using machine learning algorithms. During the analysis process, it recognizes emotions within personal information, and identification tags are automatically assigned to the analysis results. Specifically, it uses machine learning frameworks such as TensorFlow and PyTorch to perform calculations to identify emotions from image and audio data.
[0202] The analyzed data is further organized by the event recognition means, and data related to specific events is grouped together. For example, photos and videos related to specific events such as trips or festivals are automatically grouped.
[0203] Personalized visual information is generated from this organized data by the generation means. The user interacts with the generative AI model, using prompts to set the story's theme and emotional expression.
[0204] As the presentation means, the generated visual information is presented to the user through augmented reality (AR) or virtual reality (VR). This allows the user to re-experience past memories in physical or virtual space. ARKit or ARCore on the smartphone is used for presentation.
[0205] As a concrete example, consider generating an AR experience that lets a user relive the emotions of a family visit to a park, using photos and videos from that time. An example of the prompt text in this case is as follows:
[0206] "I'd like to use AR to relive memories of our family visit to XX Park last spring. I have photos and videos. Please recreate them while emphasizing the emotions."
[0207] In this way, a system is provided that allows users to relive special moments more deeply and emotionally.
[0208] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0209] Step 1:
[0210] The terminal, which is the user's smartphone or tablet, collects personal information such as location data, photos, videos, and voice memos. This collection is performed using the GPS receiver, camera, and microphone, and the data is stored directly on the terminal. The input is various sensor data collected in real time, and the output is the collected raw data.
[0211] Step 2:
[0212] The terminal sends the collected data to the server. The server preprocesses the received data and converts it into a parseable format. Specifically, audio data is converted to text, and image data has its resolution adjusted. The input is the raw data sent from the terminal, and the output is the data converted into a parseable format.
[0213] Step 3:
[0214] The server uses machine learning algorithms to analyze the user's emotions from the preprocessed data. Natural language processing techniques are applied to emotion analysis of audio data, and image recognition techniques are applied to emotion analysis of image data. The input is the preprocessed data, and the output is the emotion information assigned to each data item.
[0215] Step 4:
[0216] The server automatically assigns identification tags related to the information based on the analysis results. For example, tags such as emotion information, location, and event name are added to the data. The input is data with emotion information, and the output is tagged data.
[0217] Step 5:
[0218] The server uses the event recognition means to organize the tagged data into related events. This allows the data to be classified based on specific contexts, such as travel or festivals. The input is tagged data, and the output is a dataset associated with events.
[0219] Step 6:
[0220] The user inputs prompt text to the generative AI model to configure the settings for generating visual information. This prompt specifies a particular theme or emotional expression. The input is the user's prompt text, and the output is the generation setting parameters.
[0221] Step 7:
[0222] The server takes the generation setting parameters into account and generates personalized visual information from the event-related data. This visual information includes content for AR and VR. The input is the event-associated dataset and the generation setting parameters, and the output is the generated visual content.
[0223] Step 8:
[0224] The terminal presents the generated visual information to the user using ARKit or ARCore. This allows the user to re-experience past events in AR/VR based on the specified emotions or themes. The input is the generated visual content, and the output is the user's visual re-experience.
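Step 2 above states that image data has its resolution adjusted during preprocessing, without giving the rule used. One common approach, shown here purely as an assumption, is to cap the longer side at a fixed number of pixels while preserving the aspect ratio. The function name `fit_resolution` and the limit `MAX_SIDE` are hypothetical.

```python
from typing import Tuple

MAX_SIDE = 1024  # hypothetical limit on the longer image side, in pixels


def fit_resolution(
    width: int, height: int, max_side: int = MAX_SIDE
) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longer = max(width, height)
    if longer <= max_side:
        return width, height  # never upscale small images
    scale = max_side / longer
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4032 x 3024 photo would be reduced to 1024 x 768, while an 800 x 600 image would pass through unchanged.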
[0225] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0226] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives a prompt containing instructions together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0227] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0228] [Second Embodiment]
[0229] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0230] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0231] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0232] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0233] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0234] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0235] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0236] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0237] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0238] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0239] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0240] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0241] This invention relates to a comprehensive system for efficiently organizing and reliving special moments in a user's daily life. In its embodiments, the system is mainly organized around three components: a server, a terminal, and a user, each playing a different role.
[0242] The terminal collects various data, such as photos, videos, voice memos, GPS information, social media posts, and calendar events, through applications installed on the user's smartphone or other mobile device. This data collection is performed automatically with the user's permission. The user's location information and the date and time of capture can be used to supplement the data with background and scene information.
[0243] The server receives the data sent from the terminal. After receiving the data, the server uses machine learning algorithms to analyze it and identify its content and associated emotions. Based on the analysis results, it assigns information tags related to each data item and performs cross-media tagging. This makes it possible to intuitively organize data based on information such as location, people, event names, and emotions.
[0244] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data. For example, it automatically detects the user's birthday or travel events and groups together photos and videos associated with them.
[0245] Users can use this organized data to generate personalized visual information. Specifically, they interact with an AI assistant to determine settings (e.g., theme and video length) for creating a particular story. Based on this information, the server generates personalized videos or slideshows.
[0246] Finally, the terminal is equipped with technology to display the generated visual information. This technology uses augmented reality (AR) or virtual reality (VR) to allow users to re-experience past memories in real or virtual space. This makes it possible to recreate special moments regardless of the physical environment.
[0247] As described above, this system aims to enrich users' digital lives by automatically organizing their memories and reconstructing special moments in a unique way.
[0248] The following describes the processing flow.
[0249] Step 1:
[0250] The terminal obtains the necessary permissions from the user and automatically collects data such as photos, videos, and voice memos from the smartphone. Since GPS information is also collected, the user's location information can also be obtained.
[0251] Step 2:
[0252] The terminal sends the collected data to the server using a secure protocol. Data is compressed and encrypted during transmission to ensure its security.
[0253] Step 3:
[0254] The server stores the data received from the terminal and analyzes its content using machine learning algorithms. This analysis includes image recognition, speech recognition, and text analysis, and is used to identify people, places, and objects within the data.
[0255] Step 4:
[0256] Based on the analysis results, the server automatically assigns relevant information tags to the data. These tags include location, person, event name, and sentiment, and are useful for subsequent data retrieval and organization.
[0257] Step 5:
[0258] The server uses calendar information to recognize events and groups data related to specific events (such as trips or birthdays). This makes it possible to manage related memories together.
[0259] Step 6:
[0260] The user accesses the organized data and interacts with an AI assistant to configure settings for creating a personalized video story. Once the settings are complete, the server generates the video or slideshow based on them.
[0261] Step 7:
[0262] The terminal displays the generated visual information to the user using augmented reality or virtual reality technology. Users can use AR devices or VR goggles to re-experience special moments in real or virtual space.
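Step 2 above says that data is compressed and encrypted for transmission but names no algorithm or protocol. The following Python sketch covers the compression half only, using the standard `zlib` module, and adds a SHA-256 digest so that the receiver can detect corruption. Encryption is assumed to be left to the secure transport (for example TLS) and is not implemented here; the digest is an integrity check, not a security measure. The payload layout and the names `pack_payload` and `unpack_payload` are hypothetical.

```python
import hashlib
import json
import zlib
from typing import Dict, Tuple


def pack_payload(metadata: Dict[str, str], media: bytes) -> bytes:
    """Compress metadata and media into one payload prefixed by a digest."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # Layout before compression: 4-byte header length, header, media bytes.
    body = len(header).to_bytes(4, "big") + header + media
    compressed = zlib.compress(body, 6)
    digest = hashlib.sha256(compressed).digest()  # 32 bytes
    return digest + compressed


def unpack_payload(payload: bytes) -> Tuple[Dict[str, str], bytes]:
    """Verify the digest, decompress, and split metadata from media."""
    digest, compressed = payload[:32], payload[32:]
    if hashlib.sha256(compressed).digest() != digest:
        raise ValueError("payload corrupted in transit")
    body = zlib.decompress(compressed)
    header_len = int.from_bytes(body[:4], "big")
    metadata = json.loads(body[4:4 + header_len].decode("utf-8"))
    return metadata, body[4 + header_len:]
```

The terminal would call `pack_payload` before sending and the server would call `unpack_payload` on receipt. Already-compressed media such as JPEG or MP4 gains little from this step, so a real system might compress only metadata and text.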
[0263] (Example 1)
[0264] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0265] In modern society, user information is generated in vast quantities from a wide range of sources, making it difficult to properly organize this information and quickly re-examine necessary information. In particular, organizing information based on emotions and systematically understanding chronologically related events are challenges.
[0266] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0267] In this invention, the server includes information device means for automatically collecting user information from multiple information sources, data analysis means for analyzing the collected user information and identifying its content and associated emotions, and identifier assignment means for automatically assigning identifiers related to the information based on the analysis results. This enables efficient organization of user information and allows for the re-experience of information based on emotions and related events.
[0268] "Information device means" refers to a device or system for automatically collecting user information from multiple information sources.
[0269] "Data analysis means" refers to means for analyzing collected user information and identifying its content and associated sentiments.
[0270] "Identifier assignment means" refers to a means for automatically assigning identifiers related to information based on the results of data analysis.
[0271] An "activity recognition means" is a means for systematizing multiple pieces of user information related to a specific activity.
[0272] "Information generation means" refers to means for generating customized visual information from systematized information.
[0273] "Presentation means" refers to a means of presenting generated visual information to a user in real space or virtual space.
[0274] A "learning algorithm" is an algorithm used in data analysis to identify the sentiment of users.
[0275] "Augmented reality technology" is a technology that overlays digital information onto the real-world environment.
[0276] "Virtual reality technology" is a technology that presents users with a computer-generated virtual environment, providing a sense of immersion.
[0277] This invention is a comprehensive system for organizing and reliving special moments in the user's daily life. This system mainly consists of three components: a server, a terminal, and a user, each playing a different role.
[0278] The device collects information through applications installed on the user's smartphone or mobile device. Specifically, it automatically acquires images, videos, voice memos, and location information using the camera, microphone, and GPS functions. This collection process is initiated automatically or manually based on the user's settings. Social media posts and calendar events are also acquired as target data. The information collected at this stage is collected with the user's permission and transmitted to the server via a secure channel.
[0279] The server receives the collected information and performs data analysis. Here, machine learning algorithms are used, and in particular, the "generative AI model" plays a role in identifying emotions related to the content of the data. Also, relevant identifiers are automatically assigned to the analyzed data, and systematic categorization based on activities is carried out. Through these processes, the server organizes information based on specific events, such as important events for users like "birthday" or "travel".
[0280] Users can generate customized stories by interacting with the AI assistant. During the interaction, users convey their wishes using prompt sentences. For example, a specific request such as "Create a 3-minute video of the memories of a summer trip" is communicated to the AI. Based on this prompt sentence, the server generates personalized visual information from the relevant data.
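In the system described, interpreting the prompt sentence is presumably left to the AI assistant or generative AI model. Purely to illustrate what the resulting settings might look like, the following Python sketch extracts a length and an output format from a prompt with simple pattern matching. The `GenerationSettings` fields and the `parse_prompt` function are hypothetical and far cruder than a language model.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationSettings:
    output_format: str             # "video" or "slideshow"
    length_minutes: Optional[int]  # None when the prompt gives no length
    theme: str                     # raw prompt text, left for the model


def parse_prompt(prompt: str) -> GenerationSettings:
    """Pull a length and an output format out of a prompt sentence."""
    match = re.search(r"(\d+)\s*-?\s*minute", prompt, flags=re.IGNORECASE)
    length = int(match.group(1)) if match else None
    output_format = "slideshow" if "slideshow" in prompt.lower() else "video"
    return GenerationSettings(output_format, length, theme=prompt.strip())


settings = parse_prompt(
    "Create a 3-minute video of the memories of a summer trip"
)
```

Here `settings.length_minutes` is 3 and `settings.output_format` is "video"; the theme is kept as the full prompt text rather than being interpreted.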
[0281] The terminal has a function to present the generated visual information to the user. By using augmented reality (AR) technology or virtual reality (VR) technology, users can relive their past special moments in a virtual space or the real space. This experience gives users a visual impact and provides a new digital life that is independent of the physical environment.
[0282] With this invention, it is expected that users can access information in a highly organized and tagged state easily, and the user experience when reliving special moments will be improved.
[0283] The flow of the specific processing in Example 1 will be described using Figure 11.
[0284] Step 1:
[0285] The terminal collects information from the user's daily life. As inputs, it uses the camera, microphone, and GPS, as well as the user's SNS posts and calendar entries. It thereby obtains images, videos, voice memos, and location information, and transmits them to the server via secure communication means.
[0286] Step 2:
[0287] The server receives the data sent from the terminal. As input, it handles all the collected data from the terminal. To perform data analysis, it uses a generated AI model to perform semantic annotation and sentiment analysis on the items included in the data. As output, it obtains a dataset with identifiers assigned.
[0288] Step 3:
[0289] The server assigns identifiers to the analyzed data. As input, it uses the analysis results provided by the generated AI model. It adds tags based on event names and sentiment to the data and structures the data through cross-media tagging. As output, it generates a tagged and organized data group.
[0290] Step 4:
[0291] The server groups specific event data based on the organized data. As input, it uses the data group with identifiers assigned. Referring to the user's calendar information and the like, it automatically extracts and gathers the data of related events (such as birthdays and trips). As output, it obtains a dataset grouped by event.
[0292] Step 5:
[0293] The user inputs a prompt sentence via the AI assistant. Specifically, the user provides a text instruction such as "Create a 3-minute video of the memories of the summer trip". The server thereby receives the detailed settings required for generation and starts processing.
[0294] Step 6:
[0295] The server generates customized visual information based on user prompts. It uses prompts and grouped event data as input. Utilizing a generative AI model, it creates videos and slideshows in the user's desired format. The output is the completed visual content.
[0296] Step 7:
[0297] The terminal presents the generated video or slideshow to the user. As input, it receives the completed content from the server. As output, it provides visual information that utilizes AR or VR technology, allowing users to interactively re-experience special moments from the past in real or virtual space.
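As an illustration of Step 4 only, the grouping of tagged data by calendar event can be sketched as follows. This is a minimal sketch and not the disclosed implementation: the `MediaItem` and `CalendarEvent` structures, their field names, and the rule of matching an item to an event by capture date are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MediaItem:
    """One collected item with its tags (hypothetical structure)."""
    item_id: str
    taken_on: date
    tags: tuple = ()


@dataclass
class CalendarEvent:
    """A calendar entry covering an inclusive date range (hypothetical structure)."""
    name: str
    start: date
    end: date


def group_by_event(items, events):
    """Assign each item to the first event whose date range contains its capture date."""
    groups = {event.name: [] for event in events}
    for item in items:
        for event in events:
            if event.start <= item.taken_on <= event.end:
                groups[event.name].append(item.item_id)
                break  # in this sketch an item belongs to at most one event
    return groups


items = [
    MediaItem("photo-001", date(2024, 8, 3), ("beach", "joy")),
    MediaItem("video-002", date(2024, 8, 4), ("beach",)),
    MediaItem("memo-003", date(2024, 10, 12), ("birthday", "joy")),
    MediaItem("photo-004", date(2024, 11, 1), ("office",)),
]
events = [
    CalendarEvent("summer trip", date(2024, 8, 1), date(2024, 8, 7)),
    CalendarEvent("birthday", date(2024, 10, 12), date(2024, 10, 12)),
]
grouped = group_by_event(items, events)
```

Items that fall outside every calendar entry (here `photo-004`) are simply left ungrouped. A real system would presumably also use tags and location, as the description suggests.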
[0298] (Application Example 1)
[0299] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0300] The present invention aims to provide a method for efficiently organizing special moments in a user's daily life and effectively reliving them. In particular, it aims to realize a system that allows users to re-experience past experiences while interacting with a home robot. This emotionally enriches special experiences in the user's digital life.
[0301] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0302] In this invention, the server includes a device means for automatically acquiring user information from multiple information sources, an analysis device means for analyzing the acquired user information and recognizing its content, and a tagging means for automatically assigning attribute tags related to the information based on the analysis results. This makes it possible for users to relive special moments from the past through a home robot.
[0303] "Device means" is a component having a function for automatically acquiring user information from a plurality of information sources.
[0304] "Analysis device means" is a device having a function for analyzing the acquired user information and recognizing its content.
[0305] "Tagging means" is a mechanism for automatically assigning attribute tags related to information based on the analysis result.
[0306] "Event recognition means" is a mechanism for organizing a plurality of user information related to a specific event.
[0307] "Generation device means" is a device for generating individualized visual information from the organized information.
[0308] "Presentation device means" is a device equipped with a technology for presenting the generated visual information in the real world or a virtual world.
[0309] "Means for reproducing an individualized story" is a method for allowing a user to re-experience past memories while interacting with a home appliance robot.
[0310] The system for implementing the present invention efficiently organizes special moments in a user's daily life and allows the user to re-experience them via a home robot. The system mainly consists of a terminal, a server, and a user.
[0311] The user collects various information such as photos, videos, voice memos, location information, SNS posts, and calendar events in daily activities through a terminal such as a smartphone. These data are automatically transmitted to the server after obtaining the user's permission.
[0312] The server analyzes the received user data using machine learning algorithms implemented with tools such as Python, TensorFlow, and PyTorch. This analysis identifies the content of the data and the sentiment associated with it, and information tags are assigned based on the results. This tagging organizes the information intuitively and links it to specific events.
[0313] In particular, home robots are designed to allow users to relive past memories in an interactive format. The robots receive voice input from the user and generate appropriate responses. By utilizing AR technology to provide visual information, users can experience special moments from the past in the real world.
[0314] For example, if a user asks a home robot to "show me photos from last year's trip," the system identifies the relevant photos using tags and displays them through an AR display. This allows the user to vividly recreate actual landscapes and past experiences.
[0315] An example of a prompt to input into a generative AI model is: "The user wants memories of a trip from last year. Please find relevant content based on specific dates and sentiment tags."
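The tag-based lookup behind a request such as "show me photos from last year's trip" can be sketched as below. This is only an illustration under stated assumptions: the record layout, the tag vocabulary, and the `find_items` helper are hypothetical, and the request is assumed to have already been resolved into a media kind, a tag, and a year.

```python
from datetime import date

# Hypothetical tagged records, standing in for the output of the tagging means.
LIBRARY = [
    {"id": "p1", "kind": "photo", "date": date(2023, 7, 14), "tags": {"trip", "joy"}},
    {"id": "p2", "kind": "photo", "date": date(2023, 7, 15), "tags": {"trip"}},
    {"id": "v1", "kind": "video", "date": date(2023, 7, 15), "tags": {"trip", "joy"}},
    {"id": "p3", "kind": "photo", "date": date(2024, 2, 2), "tags": {"birthday"}},
]


def find_items(library, kind, tag, year):
    """Return the ids of records that match the media kind, the tag, and the year."""
    return [
        record["id"]
        for record in library
        if record["kind"] == kind
        and tag in record["tags"]
        and record["date"].year == year
    ]


# "Last year" is resolved against a fixed reference date so the example is reproducible.
today = date(2024, 6, 1)
last_year_trip_photos = find_items(LIBRARY, "photo", "trip", today.year - 1)
```

Adding a sentiment tag such as `"joy"` as a further filter would narrow the result in the same way.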
[0316] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0317] Step 1:
[0318] The terminal collects information from the user's daily life, including photos, videos, voice memos, location data, social media posts, and calendar events. This data is collected automatically with the user's permission. The input data is diverse and broadly covers the user's daily life. Structured raw data is generated as output.
[0319] Step 2:
[0320] The terminal sends the collected data to the server. After the server receives the data, it analyzes it using machine learning algorithms (implemented, for example, with frameworks such as TensorFlow or PyTorch). The server receives the raw data sent from the terminal as input and generates results that identify the content and sentiment as output.
[0321] Step 3:
[0322] The server automatically assigns information tags based on the analysis results. The input is already analyzed data, and the output is organized data with attributes related to the dataset (e.g., location, person, emotion, etc.) attached.
[0323] Step 4:
[0324] The server groups organized data related to a specific event. This includes organizing information based on the user's calendar information and related events. The input is organized data with tags, and the output generates a dataset that aggregates all data related to a single event.
[0325] Step 5:
[0326] When a user wishes to relive past memories through a home robot, the server uses a generated dataset to provide a personalized narrative. Specifically, the AI analyzes the user's voice input, retrieves relevant information, and generates and presents visual information on an AR device or robot display. The input is the user's request (as a prompt), and the output is an interactive visual experience.
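One way to picture the attribute tags of Step 3 (location, person, emotion) being shared across photos, videos, and voice memos is an inverted index from tag to item. The sketch below assumes a simple dictionary record per item. The field names and the `build_tag_index` helper are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

TAG_FIELDS = ("location", "person", "emotion")  # attribute kinds named in Step 3


def build_tag_index(records):
    """Map each (attribute, value) pair to the ids of all items carrying it."""
    index = defaultdict(set)
    for record in records:
        for field_name in TAG_FIELDS:
            value = record.get(field_name)
            if value:
                index[(field_name, value)].add(record["id"])
    return index


records = [
    {"id": "photo-1", "kind": "photo", "location": "Kyoto", "person": "Aya", "emotion": "joy"},
    {"id": "video-1", "kind": "video", "location": "Kyoto", "emotion": "joy"},
    {"id": "memo-1", "kind": "voice memo", "location": "Tokyo", "person": "Aya", "emotion": "calm"},
]
index = build_tag_index(records)

# Items of any media type that share both a location tag and an emotion tag.
kyoto_joy = index[("location", "Kyoto")] & index[("emotion", "joy")]
```

Because the index ignores media type, a single query returns the photo and the video together, which is the practical effect of cross-media tagging.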
[0327] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0328] This invention relates to a comprehensive system for efficiently organizing and reliving special moments, including emotions, in a user's daily life. The system mainly consists of a server, a terminal, an emotion engine, and a user, each playing a different role.
[0329] The terminal collects various types of data, such as photos, videos, and voice memos, through applications installed on the user's smartphone or other mobile devices. This collection is done automatically with the user's permission, and the terminal also uses location services to supplement the data with the user's actions and the background of the scene.
[0330] The server receives data sent from the terminal and analyzes it using machine learning algorithms and an emotion engine. The emotion engine determines the user's emotional state, particularly through the analysis of voice and images, and reflects this in the analysis results. As a result, information tags based on the analysis results are automatically assigned to each data item, and cross-media tagging is performed. This process allows the data to be organized by information such as location, person, event name, and emotion.
[0331] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data accordingly. For example, it can detect the user's birthday or travel events and group related photos and videos together.
[0332] Users utilize this organized data to generate personalized visual information via an AI assistant. Specifically, users specify the settings necessary for story creation (theme, video length, emotional expression, etc.) through dialogue. The server then generates personalized videos and slideshows that take emotional information into account, based on these settings.
[0333] The terminal has the ability to deliver this generated visual information through augmented reality (AR) or virtual reality (VR) technology. This allows users to re-experience special moments from the past in real or virtual space. For example, a user can use an AR device to relive memorable moments from their travels.
[0334] This system allows users to efficiently manage special moments, including emotions, within digital data and relive them as needed. This enables users to make more meaningful use of their individual memories.
[0335] The following describes the processing flow.
[0336] Step 1:
[0337] The terminal operates applications with the user's permission and automatically collects user data such as photos, videos, voice memos, and GPS data from the smartphone. Data collection is performed periodically in the background.
[0338] Step 2:
[0339] The terminal transmits the collected data to the server using a secure protocol. During this process, the data is compressed and encrypted to ensure information security.
[0340] Step 3:
[0341] The server analyzes the received data. Image recognition technology detects objects and faces in photos and videos and identifies their content. For voice memos, speech recognition technology is used to convert the audio into text.
[0342] Step 4:
[0343] An emotion engine operates on the server, analyzing audio and images from the analyzed data to determine the user's emotions. This identifies emotional states such as happiness, surprise, and sadness.
[0344] Step 5:
[0345] The server automatically assigns informational tags to the data based on the analysis results. These tags include information such as location, people, event names, and user sentiment, and are used later to search and organize the data.
[0346] Step 6:
[0347] The server performs event recognition and identifies specific events based on calendar information. It organizes the collected data in association with events, grouping data by event such as travel or birthdays.
[0348] Step 7:
[0349] The user uses an AI assistant to configure settings for generating a personalized story. The user specifies the theme, video length, emotional expression, and other details, and this information is sent to the server.
[0350] Step 8:
[0351] The server generates personalized visual information (e.g., videos, slideshows) that reflects user preferences and emotions and event information. Analyzed emotion information is also used in the generation process.
[0352] Step 9:
[0353] The terminal provides the user with the generated visual information using augmented reality (AR) or virtual reality (VR) technology. The user can use an AR device or VR goggles to re-experience special moments in real or virtual space.
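The compression part of Step 2 can be illustrated with the Python standard library alone. The sketch below is an assumption-laden example and not the disclosed protocol: the envelope layout and the `pack` and `unpack` helpers are hypothetical, and encryption is deliberately not shown, on the assumption that it would be delegated to a vetted mechanism such as TLS rather than written by hand.

```python
import hashlib
import json
import zlib


def pack(records):
    """Serialize and compress records, attaching a SHA-256 digest of the raw bytes."""
    raw = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "body": zlib.compress(raw, 9),  # 9 = highest compression level
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def unpack(envelope):
    """Decompress an envelope and verify its digest before decoding."""
    raw = zlib.decompress(envelope["body"])
    if hashlib.sha256(raw).hexdigest() != envelope["sha256"]:
        raise ValueError("payload digest mismatch")
    return json.loads(raw.decode("utf-8"))


# Hypothetical metadata records for 50 collected photos.
sample = [
    {"id": f"photo-{i:03d}", "kind": "photo", "lat": 35.0, "lon": 139.0}
    for i in range(50)
]
envelope = pack(sample)
restored = unpack(envelope)
```

The digest only detects accidental corruption. It does not provide confidentiality or authentication, which is why the encryption requirement in Step 2 still has to be met separately.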
[0354] (Example 2)
[0355] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0356] It is difficult to efficiently collect the diverse information that individuals generate in their daily lives, analyze its content, and grasp its characteristics, including emotional states. Furthermore, effectively classifying data related to specific events and integrating it so that it can be re-experienced as personalized visual information has been difficult to achieve with conventional methods.
[0357] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0358] In this invention, the server includes terminal means for automatically acquiring personal information, analysis means for analyzing the acquired personal information and evaluating emotional states, tagging means for identifying content based on the analysis results and associating information tags, classification means for classifying and organizing personal information related to important activities, creation means for creating personalized visual elements using the organized information, and reproduction means for reproducing the created visual elements in the real world or virtual space. This makes it possible to efficiently analyze and organize diverse personal data and provide a system that allows users to re-experience special moments.
[0359] "Personal information" refers to digital data related to a user, such as photos, videos, voice memos, and location data.
[0360] "Terminal means" refers to a computer or device used to automatically acquire and collect user information.
[0361] "Analysis means" refers to algorithms and systems that evaluate emotional states from the acquired personal information and analyze its characteristics using machine learning or generative AI models.
[0362] "Tagging means" refers to the process of identifying the content of user data based on the analysis results and assigning corresponding labels or tags.
[0363] "Classification means" refers to a method or system for organizing and grouping personal information related to a specific event or important activity.
[0364] "Creation means" refers to a method for generating visual elements tailored to individual users based on the organized information.
[0365] "Reproduction means" refers to systems and technologies for presenting the created visual elements in real-world environments or virtual spaces.
[0366] This invention provides a system for efficiently collecting and analyzing personal information and generating and reproducing personalized visual information based on the results. The components of the system are described in detail below.
[0367] Users use their devices to automatically collect personal information such as photos, videos, and voice memos generated in their daily lives. With the user's permission, the device securely collects this data and uses location services to add background data such as location information. Mobile devices such as smartphones and tablets are primarily used for this purpose.
[0368] The collected data is sent to a server. The server is a high-performance computer system that runs machine learning algorithms and generative AI models for sentiment analysis. Examples of software used include machine learning libraries such as TensorFlow and PyTorch. This allows the server to analyze the data and extract the user's emotional state and other important information.
[0369] Once the analysis is complete, the server automatically assigns information tags to the data. This organizes and groups data related to specific events (e.g., birthdays or trips).
[0370] Through interaction with the AI assistant, users can specify themes such as "happy memories" or "special moments" and request the server to generate visual information. The AI assistant then generates the most suitable videos or slideshows based on the user's preferences. The generated visual information is presented via the user's device using augmented reality (AR) or virtual reality (VR) technology. In particular, using AR glasses or VR headsets makes it possible to visually re-experience past memories.
[0371] As a concrete example, if a user wants to relive memories of a past trip, they might enter the prompt, "Please create a 5-minute video using photos and videos from my trip to Hawaii that will bring back those memories." Based on this prompt, the system extracts and analyzes the necessary data and generates the desired video. This allows the user to relive that trip as if it happened yesterday.
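To make the idea of "extracting the necessary settings from the prompt" concrete, the following sketch pulls a video length and the requested media kinds out of the prompt text with simple rules. This is a deliberately crude stand-in: in the described system a generative AI model would interpret the prompt, and the `parse_request` helper and its output keys are hypothetical.

```python
import re

MEDIA_KINDS = ("photos", "videos", "voice memos")


def parse_request(prompt):
    """Extract a requested length in minutes and the media kinds named in a prompt."""
    # Accepts "5-minute", "5 minute", and "3 - minute".
    match = re.search(r"(\d+)\s*-?\s*minute", prompt, flags=re.IGNORECASE)
    minutes = int(match.group(1)) if match else None
    lowered = prompt.lower()
    media = [kind for kind in MEDIA_KINDS if kind in lowered]
    return {"minutes": minutes, "media": media}


settings = parse_request(
    "Please create a 5-minute video using photos and videos from my trip "
    "to Hawaii that will bring back those memories."
)
```

Here `settings` holds a length of 5 minutes and the media kinds `photos` and `videos`. Anything the rules do not recognize, such as the destination, is ignored, which is exactly the kind of gap a language model would be expected to fill.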
[0372] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0373] Step 1:
[0374] Input: Photos, videos, and voice memos from the user's daily life.
[0375] Operation: The device automatically collects this data through applications on the user's smartphone or tablet. With the user's permission, location information is also obtained, and supplementary information is gathered to tag the data.
[0376] Output: The collected data, along with the associated location data, is prepared to be sent to the server.
[0377] Step 2:
[0378] Input: Personal information data transmitted from the device.
[0379] Operation: The server receives this data and analyzes its contents using machine learning algorithms and generative AI models. This analysis includes tone analysis from audio data and facial recognition in images to assess the user's emotional state.
[0380] Output: The analyzed data is tagged with emotional states and event information and registered in a database.
[0381] Step 3:
[0382] Input: Analyzed and tagged user data.
[0383] Operation: Based on the analysis information, the server classifies and organizes data related to specific events or important activities. For example, this includes the process of grouping data related to past trips or birthday events.
[0384] Output: A well-organized dataset is generated for each event.
[0385] Step 4:
[0386] Input: A well-organized dataset.
[0387] Operation: The user enters prompt text into the AI assistant, specifying the theme and content for visual information generation. Based on these instructions, the server uses image and video editing software to generate personalized videos and slideshows.
[0388] Output: Generated visual information file (video or slideshow).
[0389] Step 5:
[0390] Input: The generated visual information file.
[0391] Operation: The device reproduces visual information in the real world or virtual space through AR glasses or VR headsets. This allows users to relive special moments from the past in an immersive way.
[0392] Output: Presentation of interactive visual information that allows the user to relive the past experience.
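Step 2 combines tone analysis of audio with facial analysis of images to assess the emotional state. One simple way to combine two such signals is a weighted average of per-emotion scores, sketched below. The score dictionaries, the 0.4 audio weight, and the `fuse_emotions` helper are assumptions for illustration. The disclosure does not specify how the two signals are combined.

```python
def fuse_emotions(audio_scores, face_scores, audio_weight=0.4):
    """Blend two per-emotion score dictionaries and return (top label, fused scores)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    labels = set(audio_scores) | set(face_scores)
    fused = {
        label: audio_weight * audio_scores.get(label, 0.0)
        + (1.0 - audio_weight) * face_scores.get(label, 0.0)
        for label in labels
    }
    # Sorting first makes ties resolve alphabetically, so the result is deterministic.
    best = max(sorted(fused), key=fused.get)
    return best, fused


# Hypothetical outputs of an audio tone model and a facial expression model.
audio = {"happiness": 0.6, "surprise": 0.3, "sadness": 0.1}
face = {"happiness": 0.8, "surprise": 0.1, "sadness": 0.1}
label, scores = fuse_emotions(audio, face)
```

The resulting label could then be stored as the emotion tag described in the Step 2 output.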
[0393] (Application Example 2)
[0394] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0395] As personal digital information increases daily, there is a need for systems that can efficiently manage this information and easily allow users to relive special moments. However, conventional systems are insufficient in providing detailed organization that takes emotions into account and in offering visual re-experiences. Therefore, the challenge is to provide users with a more personalized experience.
[0396] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0397] In this invention, the server includes terminal means for automatically acquiring personal information from multiple sources; analysis means for analyzing the acquired personal information and identifying its content; tagging means for automatically assigning identification tags related to the information based on the analysis results; event recognition means for aggregating multiple pieces of personal information related to a specific event; generation means for generating personalized visual information from the aggregated information; and presentation means for presenting the generated visual information in physical or virtual space. This enables users to efficiently manage special moments, including emotions, and re-experience them in physical or virtual space.
[0398] "Terminal means" refers to a device that has the function of automatically acquiring personal information from multiple sources.
[0399] "Analysis means" refers to a process or device for analyzing acquired personal information and identifying its contents.
[0400] "Tagging means" refers to a mechanism that automatically assigns identification tags related to information based on the analysis results.
[0401] "Event recognition means" refers to a system that has the function of aggregating multiple pieces of personal information related to a specific event.
[0402] "Generation means" refers to a process or apparatus that generates personalized visual information from the aggregated information.
[0403] "Presentation means" refers to a device or technology for presenting generated visual information in physical or virtual space.
[0404] The system implementing this invention mainly consists of a server, a terminal, and a user. The server is built on the cloud, and the terminal functions as a mobile device such as a smartphone or tablet. The user interfaces with the system through these devices.
[0405] The terminal collects various personal information, such as location data, photos, videos, and voice memos, during the user's activities. This data collection is done with the user's permission, and smartphone hardware such as the GPS, camera, and microphone is used to obtain the information.
[0406] The server receives data sent from the terminal and analyzes it using machine learning algorithms. During the analysis process, it recognizes emotions within the personal information, and identification tags are automatically assigned based on the analysis results. Specifically, it uses machine learning frameworks such as TensorFlow and PyTorch to perform calculations that identify emotions from image and audio data.
[0407] The analyzed data is further organized by the event recognition means, and data related to specific events is grouped together. For example, photos and videos related to specific events such as trips or festivals are automatically grouped.
[0408] Personalized visual information is generated from this organized data by the generation means. The user interacts with the generative AI model, using prompts to set the story's theme and emotional expression.
[0409] As the presentation means, the generated visual information is presented to the user through augmented reality (AR) or virtual reality (VR). This allows the user to re-experience past memories in both physical and virtual spaces. ARKit or ARCore on the smartphone is used for presentation.
[0410] As a concrete example, consider generating an AR experience that lets a user relive the emotions of a family visit to a park, using photos and videos from that time. An example of the prompt text in this case is as follows:
[0411] "I'd like to use AR to relive memories of our family visit to XX Park last spring. I have photos and videos. Please recreate them while emphasizing the emotions."
[0412] In this way, a system is provided that allows users to relive special moments more deeply and emotionally.
[0413] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0414] Step 1:
[0415] The device uses the user's smartphone or tablet to collect personal information such as location data, photos, videos, and voice memos. This collection is performed using GPS, camera, and microphone, and the data is stored directly on the device. The input is various sensor data collected in real time, and the output is the collected raw data.
[0416] Step 2:
[0417] The terminal sends the collected data to the server. The server preprocesses the received data and converts it into an analyzable format. Specifically, audio data is converted to text, and image data has its resolution adjusted. The input is the raw data sent from the terminal, and the output is the data converted into an analyzable format.
[0418] Step 3:
[0419] The server uses machine learning algorithms to analyze user emotions from the data in analyzable format. Natural language processing techniques are applied to emotion analysis of the text converted from audio data, and image recognition techniques are applied to emotion analysis of image data. The input is the preprocessed data, and the output is the emotion information assigned to each data item.
[0420] Step 4:
[0421] The server automatically assigns identification tags related to the information based on the analysis results. For example, tags such as sentiment information, location, and event name are added to the data. The input is data with sentiment information, and the output is tagged data.
[0422] Step 5:
[0423] The server uses the event recognition means to organize the tagged data into related events. This allows the data to be classified based on specific contexts, such as travel or festivals. The input is tagged data, and the output is a dataset associated with events.
[0424] Step 6:
[0425] The user inputs prompt text to the generative AI model to configure the settings for generating visual information. This prompt specifies a particular theme or emotional expression. The input is the user's prompt text, and the output is the generation setting parameters.
[0426] Step 7:
[0427] The server generates personalized visual information from the event-related data in accordance with the generation setting parameters. This visual information includes content for AR and VR. The input is the event-organized dataset and the generation setting parameters, and the output is the generated visual content.
[0428] Step 8:
[0429] The terminal presents the generated visual information to the user using ARKit or ARCore. This allows the user to re-experience past events in AR/VR based on the specified emotions or themes. The input is the generated visual content, and the output is the user's visual re-experience.
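The resolution adjustment mentioned in the preprocessing of Step 2 can be illustrated by computing a target size that keeps the aspect ratio while capping the longer side. This sketch only computes dimensions and does not touch pixel data. The 1024-pixel cap and the `fit_within` helper are assumptions chosen for the example, not values from the disclosure.

```python
def fit_within(width, height, max_side=1024):
    """Return (width, height) scaled down so the longer side is at most max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # small images are left unchanged, never upscaled
    # Scale both sides by the same factor to preserve the aspect ratio.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


landscape = fit_within(4032, 3024)  # typical 4:3 smartphone photo
portrait = fit_within(3024, 4032)
small = fit_within(800, 600)
```

The returned size would then be passed to whatever image library performs the actual resampling before emotion analysis.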
[0430] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0431] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AIs such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing an instruction is input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instruction indicated by the prompt, and outputs the inference results in a data format such as audio data or text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0432] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0433] [Third Embodiment]
[0434] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0435] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0436] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0437] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0438] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0439] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0440] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0441] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0442] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0443] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0444] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0445] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0446] This invention relates to a comprehensive system for efficiently organizing and reliving special moments in a user's daily life. In its embodiments, the system is mainly organized around three themes: a server, a terminal, and a user, each playing a different role.
[0447] The terminal collects various data, such as photos, videos, voice memos, GPS information, social media posts, and calendar events, through applications installed on the user's smartphone or other mobile device. This data collection is performed automatically with the user's permission. The user's location information and the date and time of capture can be used to supplement the data with background and scene information.
[0448] The server receives data sent from the terminal. After receiving the data, the server uses machine learning algorithms to analyze it and identify its content and associated emotions. Based on the analysis results, the server assigns related information tags to each data item and performs cross-media tagging. This makes it possible to intuitively organize data based on information such as location, people, event names, and emotions.
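The tagging described above can be pictured as an index from tags to data items of any media type. The following is a minimal sketch only: the `MediaItem` class, the `build_tag_index` function, and the sample tag values are hypothetical names introduced for illustration and do not appear in the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    # Hypothetical record for one collected data item.
    item_id: str
    media_type: str                           # e.g. "photo", "video", "voice_memo"
    tags: dict = field(default_factory=dict)  # tag kind -> set of tag values


def build_tag_index(items):
    """Map each (tag kind, value) pair to the IDs of all items carrying it."""
    index = defaultdict(set)
    for item in items:
        for kind, values in item.tags.items():
            for value in values:
                index[(kind, value)].add(item.item_id)
    return index


items = [
    MediaItem("p1", "photo", {"location": {"Kyoto"}, "emotion": {"joy"}}),
    MediaItem("v1", "video", {"location": {"Kyoto"}, "person": {"Aki"}}),
    MediaItem("m1", "voice_memo", {"emotion": {"joy"}}),
]
index = build_tag_index(items)
kyoto_items = sorted(index[("location", "Kyoto")])  # a photo and a video share one tag
joy_items = sorted(index[("emotion", "joy")])       # a photo and a voice memo share one tag
```

Because the index is keyed by tag rather than by media type, a single lookup returns photos, videos, and voice memos together, which is one way to read "cross-media" organization.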
[0449] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data. For example, it automatically detects the user's birthday or travel events and groups together photos and videos associated with them.
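As an illustration of calendar-based grouping, the sketch below assigns each data item to the calendar event whose date range covers its capture time. It is a simplified stand-in, not the disclosed implementation: the function name `group_by_event`, the tuple layouts, and the sample events are assumptions, and the use of historical data mentioned above is not modeled.

```python
from datetime import date, datetime


def group_by_event(items, events):
    """Assign each item to the first calendar event whose date range covers it.

    items:  list of (item_id, captured_at) pairs, captured_at being a datetime
    events: list of (event_name, start_date, end_date) tuples, end inclusive
    """
    groups = {name: [] for name, _, _ in events}
    ungrouped = []
    for item_id, captured_at in items:
        day = captured_at.date()
        for name, start, end in events:
            if start <= day <= end:
                groups[name].append(item_id)
                break
        else:
            # No calendar event covers this capture date.
            ungrouped.append(item_id)
    return groups, ungrouped


events = [
    ("Birthday", date(2024, 5, 3), date(2024, 5, 3)),
    ("Okinawa trip", date(2024, 8, 10), date(2024, 8, 14)),
]
items = [
    ("photo_001", datetime(2024, 5, 3, 18, 30)),
    ("video_002", datetime(2024, 8, 12, 9, 0)),
    ("memo_003", datetime(2024, 9, 1, 12, 0)),
]
groups, ungrouped = group_by_event(items, events)
```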
[0450] Users can use this organized data to generate personalized visual information. Specifically, they interact with an AI assistant to determine settings (e.g., theme and video length) for creating a particular story. Based on this information, the server generates personalized videos or slideshows.
[0451] Finally, the terminal is equipped with technology to display the generated visual information. This technology uses augmented reality (AR) or virtual reality (VR) to allow users to re-experience past memories in real or virtual space. This makes it possible to recreate special moments regardless of the physical environment.
[0452] As described above, this system aims to enrich users' digital lives by automatically organizing their memories and reconstructing special moments in a unique way.
[0453] The following describes the processing flow.
[0454] Step 1:
[0455] The terminal obtains the necessary permissions from the user and automatically collects data such as photos, videos, and voice memos from the smartphone. GPS information is also collected, so the user's location information can be obtained as well.
[0456] Step 2:
[0457] The terminal sends the collected data to the server using a secure protocol. The data is compressed and encrypted for transmission to ensure its security.
[0458] Step 3:
[0459] The server stores the data received from the terminal and analyzes its content using machine learning algorithms. This analysis includes image recognition, speech recognition, and text analysis, and is used to identify people, places, and objects within the data.
[0460] Step 4:
[0461] Based on the analysis results, the server automatically assigns relevant information tags to the data. These tags include location, person, event name, and sentiment, and are useful for subsequent data retrieval and organization.
[0462] Step 5:
[0463] The server uses calendar information to recognize events and groups data related to specific events (such as trips or birthdays). This makes it possible to manage related memories together.
[0464] Step 6:
[0465] The user accesses the organized data and interacts with an AI assistant to configure settings for creating a personalized video story. Once the settings are complete, the server generates the video or slideshow based on them.
[0466] Step 7:
[0467] The terminal displays the generated visual information to the user using augmented reality or virtual reality technology. Users can use AR devices or VR goggles to re-experience special moments in real or virtual space.
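The compression part of Step 2 might be sketched as follows. This example covers only compression and an integrity check. Encryption is assumed here to be provided by the secure transport (for example TLS) and is not shown, and the SHA-256 digest is not a substitute for it. The function names and the record layout are hypothetical.

```python
import hashlib
import json
import zlib


def pack_payload(records):
    """Serialize records to JSON, compress them, and compute a SHA-256 digest."""
    raw = json.dumps(records, sort_keys=True).encode("utf-8")
    compressed = zlib.compress(raw, level=9)
    digest = hashlib.sha256(compressed).hexdigest()
    return compressed, digest


def unpack_payload(compressed, digest):
    """Verify the digest, then decompress and parse the records."""
    if hashlib.sha256(compressed).hexdigest() != digest:
        raise ValueError("payload digest mismatch")
    return json.loads(zlib.decompress(compressed).decode("utf-8"))


# Hypothetical metadata record for one collected photo.
records = [{"id": "photo_001", "lat": 35.68, "lon": 139.76, "kind": "photo"}]
blob, digest = pack_payload(records)
restored = unpack_payload(blob, digest)
```

Compressing before encrypting is the usual order, since well-encrypted data is effectively incompressible.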
[0468] (Example 1)
[0469] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0470] In modern society, user information is generated in vast quantities from a wide range of sources, making it difficult to properly organize this information and quickly re-examine necessary information. In particular, organizing information based on emotions and systematically understanding chronologically related events are challenges.
[0471] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0472] In this invention, the server includes information device means for automatically collecting user information from multiple information sources, data analysis means for analyzing the collected user information and identifying its content and associated emotions, and identifier assignment means for automatically assigning identifiers related to the information based on the analysis results. This enables efficient organization of user information and allows for the re-experience of information based on emotions and related events.
[0473] "Information device means" refers to a device or system for automatically collecting user information from multiple information sources.
[0474] "Data analysis means" refers to means for analyzing collected user information and identifying its content and associated sentiments.
[0475] "Identifier assignment means" refers to a means for automatically assigning identifiers related to information based on the results of data analysis.
[0476] An "activity recognition means" is a means for systematizing multiple pieces of user information related to a specific activity.
[0477] "Information generation means" refers to means for generating customized visual information from systematized information.
[0478] "Presentation means" refers to a means of presenting generated visual information to a user in real space or virtual space.
[0479] A "learning algorithm" is an algorithm used in data analysis to identify the sentiment of users.
[0480] "Augmented reality technology" is a technology that overlays digital information onto the real-world environment.
[0481] "Virtual reality technology" is a technology that presents users with a computer-generated virtual environment, providing a sense of immersion.
[0482] This invention is a comprehensive system for organizing and reliving special moments in the user's daily life. The system mainly consists of three components: a server, a terminal, and a user, each playing a different role.
[0483] The terminal collects information through applications installed on the user's smartphone or mobile device. Specifically, it automatically acquires images, videos, voice memos, and location information using the camera, microphone, and GPS functions. This collection process is initiated automatically or manually based on the user's settings. Social media posts and calendar events are also acquired as target data. All of this information is collected with the user's permission and transmitted to the server via a secure channel.
[0484] The server receives the collected information and performs data analysis. Machine learning algorithms are used, with a particular "generative AI model" playing a key role in identifying the sentiment associated with the data's content. The analyzed data is automatically assigned relevant identifiers and organized based on activity. Through these processes, the server organizes information based on specific events, such as "birthdays" or "trips," which are important to the user.
[0485] Users can generate customized stories by interacting with the AI assistant. During the interaction, users communicate their wishes using prompts. For example, they might tell the AI a specific request such as, "Make a 3-minute video about my summer trip memories." Based on these prompts, the server generates personalized visual information using relevant data.
[0486] The terminal is equipped with the functionality to present the generated visual information to the user. By using augmented reality (AR) and virtual reality (VR) technologies, users can re-experience special moments from their past in virtual or real-world spaces. This experience has a visual impact on the user and provides a new digital life that is not dependent on the physical environment.
[0487] This invention is expected to enhance the user experience when reliving special moments, as users will be able to easily access information in a highly organized and tagged state.
[0488] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0489] Step 1:
[0490] The terminal collects information from the user's daily life. Inputs include the camera, microphone, GPS, the user's social media posts, and calendar entries. This allows it to acquire images, videos, voice memos, and location information, which are then transmitted to the server via a secure communication method.
[0491] Step 2:
[0492] The server receives data sent from the terminal. It processes all collected data from the terminal as input. To perform data analysis, it uses a generative AI model to assign meaning to the items in the data and perform sentiment analysis. The output is a dataset with assigned identifiers.
[0493] Step 3:
[0494] The server assigns identifiers to the analyzed data. It uses the analysis results provided by the generative AI model as input. It tags the data based on event names and sentiment, and structures the data through cross-media tagging. The output is tagged and organized data groups.
[0495] Step 4:
[0496] The server groups specific event data based on organized data. It uses data groups with assigned identifiers as input. It automatically extracts and groups data for related events (e.g., birthdays and trips) by referencing the user's calendar information, etc. The output is a dataset grouped by event.
[0497] Step 5:
[0498] The user enters a prompt via the AI assistant. Specifically, they provide text instructions such as, "Make a 3-minute video of my summer trip memories." The server then receives the necessary details for generation and begins processing.
[0499] Step 6:
[0500] The server generates customized visual information based on user prompts. It uses prompts and grouped event data as input. Utilizing a generative AI model, it creates videos and slideshows in the user's desired format. The output is the completed visual content.
[0501] Step 7:
[0502] The terminal presents the generated video or slideshow to the user. It receives the completed content from the server as input. The output is visual information presented using AR or VR technology, allowing users to interactively re-experience special moments from the past in real or virtual space.
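To make the relationship between the Step 5 prompt and the Step 6 generation settings concrete, the sketch below pulls the requested length and output format out of a prompt with simple rules. The disclosure assigns this interpretation to a generative AI model. The regular expression, the `parse_story_request` function, and the returned keys are assumptions used purely as a rule-based stand-in.

```python
import re

FORMATS = ("slideshow", "video")  # the two output formats named in the text


def parse_story_request(prompt, default_format="video"):
    """Extract the requested length (in seconds) and output format from a prompt."""
    text = prompt.lower()
    # Matches forms such as "3-minute", "3 minutes", or "90 second".
    match = re.search(r"(\d+)\s*-?\s*(minute|second)", text)
    seconds = None
    if match:
        amount = int(match.group(1))
        seconds = amount * 60 if match.group(2) == "minute" else amount
    fmt = next((f for f in FORMATS if f in text), default_format)
    return {"length_seconds": seconds, "format": fmt}


settings = parse_story_request("Make a 3-minute video about my summer trip memories.")
```

A real assistant would also need to resolve the theme ("summer trip") against the event groups from Step 4, which this sketch leaves out.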
[0503] (Application Example 1)
[0504] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0505] The present invention aims to provide a method for efficiently organizing special moments in a user's daily life and effectively reliving them. In particular, it aims to realize a system that allows users to re-experience past experiences while interacting with a home robot. This will emotionally enrich special experiences in the user's digital life.
[0506] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0507] In this invention, the server includes a device means for automatically acquiring user information from multiple information sources, an analysis device means for analyzing the acquired user information and recognizing its content, and a tagging means for automatically assigning attribute tags related to the information based on the analysis results. This makes it possible for users to relive special moments from the past through a home robot.
[0508] "Device means" refers to a component that has the function of automatically acquiring user information from multiple information sources.
[0509] An "analysis device means" is a device that has the function of analyzing acquired user information and recognizing its content.
[0509] A "tagging means" is a mechanism that automatically assigns attribute tags related to information based on the analysis results.
[0511] An "event recognition means" is a mechanism for organizing and consolidating multiple user information related to a specific event.
[0512] A "generation device means" is a device for generating individualized visual information from standardized information.
[0513] "Presentation device means" refers to a device equipped with technology for presenting generated visual information in the real world or a virtual world.
[0514] "Means of recreating personalized narratives" refers to means that allow users to re-experience past memories while interacting with a home robot.
[0515] The system implementing this invention can efficiently organize special moments in a user's daily life and allow them to re-experience them through a home robot. The system mainly consists of a terminal, a server, and a user.
[0516] Users collect various types of information through their smartphones and other devices, including photos, videos, voice memos, location data, social media posts, and calendar events related to their daily activities. This data is automatically sent to the server after obtaining the user's permission.
[0517] The server analyzes received user data using machine learning algorithms implemented with tools such as Python, TensorFlow, and PyTorch. This analysis identifies the content of the data and its associated sentiment, and information tags are assigned based on the results. This tagging intuitively organizes the information and links it to specific events.
[0518] In particular, home robots are designed to allow users to relive past memories in an interactive format. The robots receive voice input from the user and generate appropriate responses. By utilizing AR technology to provide visual information, users can experience special moments from the past in the real world.
[0519] For example, if a user asks a home robot to "show me photos from last year's trip," the system identifies the relevant photos using tags and displays them through an AR display. This allows the user to vividly recreate actual landscapes and past experiences.
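The tag-based lookup in this example can be sketched as a filter over tagged items. The catalog layout, the `find_items` function, and the fixed reference date are hypothetical, and resolving the spoken phrase "last year's trip" into a media type, tag, and year is assumed to have been done beforehand.

```python
from datetime import date


def find_items(catalog, media_type, event_tag, year):
    """Return IDs of items matching a media type, an event tag, and a capture year."""
    return [
        item["id"]
        for item in catalog
        if item["media_type"] == media_type
        and event_tag in item["tags"]
        and item["captured_on"].year == year
    ]


catalog = [
    {"id": "p1", "media_type": "photo", "tags": {"trip", "joy"}, "captured_on": date(2023, 8, 11)},
    {"id": "p2", "media_type": "photo", "tags": {"birthday"}, "captured_on": date(2023, 5, 3)},
    {"id": "v1", "media_type": "video", "tags": {"trip"}, "captured_on": date(2023, 8, 12)},
    {"id": "p3", "media_type": "photo", "tags": {"trip"}, "captured_on": date(2024, 1, 2)},
]
today = date(2024, 6, 1)  # fixed reference date so the example is reproducible
last_year_trip_photos = find_items(catalog, "photo", "trip", today.year - 1)
```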
[0520] An example of a prompt to input into a generative AI model is: "The user wants memories of a trip from last year. Please find relevant content based on specific dates and sentiment tags."
[0521] The flow of the specific processing in Application Example 1 will be explained using Figure 12.
[0522] Step 1:
[0523] The terminal collects information from the user's daily life, including photos, videos, voice memos, location data, social media posts, and calendar events. This data is collected automatically with the user's permission. The input data is diverse and broadly covers the user's daily life. Structured raw data is generated as output.
[0524] Step 2:
[0525] The terminal sends the collected data to the server. After the server receives the data, it analyzes it using machine learning algorithms (implemented with frameworks such as TensorFlow or PyTorch). The server receives the raw data sent from the terminal as input and generates results that identify the content and sentiment as output.
[0526] Step 3:
[0527] The server automatically assigns information tags based on the analysis results. The input is already analyzed data, and the output is organized data with attributes related to the dataset (e.g., location, person, emotion, etc.) attached.
[0528] Step 4:
[0529] The server groups organized data related to a specific event. This includes organizing information based on the user's calendar information and related events. The input is organized data with tags, and the output is a dataset that aggregates all data related to a single event.
[0530] Step 5:
[0531] When a user wishes to relive past memories through a home robot, the server uses a generated dataset to provide a personalized narrative. Specifically, the AI analyzes the user's voice input, retrieves relevant information, and generates and presents visual information on an AR device or robot display. The input is the user's request (as a prompt), and the output is an interactive visual experience.
[0532] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0533] This invention relates to a comprehensive system for efficiently organizing and reliving special moments, including emotions, in a user's daily life. The system mainly consists of a server, a terminal, an emotion engine, and a user, each playing a different role.
[0534] The terminal collects various types of data, such as photos, videos, and voice memos, through applications installed on the user's smartphone or other mobile devices. This collection is done automatically with the user's permission, and the terminal also uses location services to supplement the data with information on the user's actions and the background of the scene.
[0535] The server receives data sent from the terminal and analyzes it using machine learning algorithms and an emotion engine. The emotion engine determines the user's emotional state, particularly through the analysis of voice and images, and reflects this in the analysis results. As a result, information tags based on the analysis results are automatically assigned to each data item, and cross-media tagging is performed. This process allows the data to be organized by information such as location, person, event name, and emotion.
[0536] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data accordingly. For example, it can detect the user's birthday or travel events and group related photos and videos together.
[0537] Users utilize this organized data to generate personalized visual information via an AI assistant. Specifically, users specify the settings necessary for story creation (theme, video length, emotional expression, etc.) through dialogue. The server then generates personalized videos and slideshows that take emotional information into account, based on these settings.
[0538] The terminal has the ability to deliver this generated visual information through augmented reality (AR) or virtual reality (VR) technology. This allows users to re-experience special moments from the past in real or virtual space. For example, a user can use an AR device to relive memorable moments from their travels.
[0539] This system allows users to efficiently manage special moments, including emotions, within digital data and relive them as needed. This enables users to make more meaningful use of their individual memories.
[0540] The following describes the processing flow.
[0541] Step 1:
[0542] The terminal runs applications with the user's permission and automatically collects user data such as photos, videos, voice memos, and GPS data from the smartphone. Data collection is performed periodically in the background.
[0543] Step 2:
[0544] The terminal transmits the collected data to the server using a secure protocol. During this process, the data is compressed and encrypted to ensure information security.
[0545] Step 3:
[0546] The server analyzes the received data. Image recognition technology detects objects and faces in photos and videos and identifies their content. For voice memos, speech recognition technology is used to convert the audio into text.
[0547] Step 4:
[0548] An emotion engine operates on the server, analyzing audio and images from the analyzed data to determine the user's emotions. This identifies emotional states such as happiness, surprise, and sadness.
[0549] Step 5:
[0550] The server automatically assigns informational tags to the data based on the analysis results. These tags include information such as location, people, event names, and user sentiment, and are used later to search and organize the data.
[0551] Step 6:
[0552] The server performs event recognition and identifies specific events based on calendar information. It organizes the collected data in association with events, grouping data by event such as travel or birthdays.
[0553] Step 7:
[0554] The user uses an AI assistant to configure settings for generating a personalized story. The user specifies the theme, video length, emotional expression, and other details, and this information is sent to the server.
[0555] Step 8:
[0556] The server generates personalized visual information (e.g., videos, slideshows) that reflects the user's preferences, emotions, and event information. The emotion information obtained in the analysis is also used in the generation process.
[0557] Step 9:
[0558] The terminal provides the user with the generated visual information using augmented reality (AR) or virtual reality (VR) technology. The user can use an AR device or VR goggles to re-experience special moments in real or virtual space.
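Step 4 combines evidence from audio and images to determine an emotion. One simple way to do this, shown here only as an assumed illustration, is a weighted average of per-emotion scores from each modality. The scores are taken to come from upstream models (such as the emotion identification model 59). The `fuse_emotions` function, the weighting scheme, and the sample values are not from the disclosure.

```python
def fuse_emotions(voice_scores, image_scores, voice_weight=0.5):
    """Combine per-emotion scores from two modalities and return the top label.

    Each argument maps an emotion label to a score in [0, 1]; a label missing
    from one modality is treated as 0 for that modality.
    """
    labels = set(voice_scores) | set(image_scores)
    fused = {
        label: voice_weight * voice_scores.get(label, 0.0)
        + (1.0 - voice_weight) * image_scores.get(label, 0.0)
        for label in labels
    }
    # Iterating over sorted labels makes ties resolve alphabetically.
    top = max(sorted(fused), key=fused.get)
    return top, fused


# Hypothetical model outputs for one voice memo and one photo of the same moment.
voice = {"happiness": 0.6, "surprise": 0.3, "sadness": 0.1}
image = {"happiness": 0.8, "surprise": 0.1, "sadness": 0.1}
label, fused = fuse_emotions(voice, image)
```

The resulting label is what would be attached as the emotion tag in Step 5. The full score dictionary is kept so that later steps can still use the weaker signals.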
[0559] (Example 2)
[0560] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0561] It is difficult to efficiently collect the diverse information that individuals generate in their daily lives, analyze its content, and grasp its characteristics, including emotional states. Furthermore, effectively classifying data related to specific events and integrating it so that it can be re-experienced as personalized visual information has been difficult to achieve with conventional methods.
[0562] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0563] In this invention, the server includes terminal means for automatically acquiring personal information, analysis means for analyzing the acquired personal information and evaluating emotional states, tagging means for identifying content based on the analysis results and associating information tags, classification means for classifying and organizing personal information related to important activities, creation means for creating personalized visual elements using the organized information, and reproduction means for reproducing the created visual elements in the real world or virtual space. This makes it possible to efficiently analyze and organize diverse personal data and provide a system that allows users to re-experience special moments.
[0564] "Personal information" refers to digital data related to a user, such as photos, videos, voice memos, and location data.
[0565] "Terminal means" refers to a computer or device used to automatically acquire and collect user information.
[0566] "Analysis means" refers to algorithms and systems that use machine learning or generative AI models to evaluate emotional states from acquired personal information and analyze their characteristics.
[0567] "Tagging means" refers to the process of identifying information related to user data based on analysis results and assigning corresponding labels or tags.
[0568] "Classification means" refers to a method or system for organizing and grouping personal information related to a specific event or important activity.
[0569] "Creation means" refers to a method for generating visual elements tailored to individual users based on organized information.
[0570] "Reproduction means" refers to systems and technologies for presenting generated visual elements in real-world environments or virtual spaces.
[0571] This invention provides a system for efficiently collecting and analyzing personal information and generating and reproducing personalized visual information based on the results. The components of the system are described in detail below.
[0572] Users use their devices to automatically collect personal information such as photos, videos, and voice memos generated in their daily lives. With the user's permission, the device securely collects this data and uses location services to add background data such as location information. Mobile devices such as smartphones and tablets are primarily used for this purpose.
[0573] The collected data is sent to a server. The server is a high-performance computer system that runs machine learning algorithms and generative AI models for sentiment analysis. Examples of software used include machine learning libraries such as TensorFlow and PyTorch. This allows the server to analyze the data and extract the user's emotional state and other important information.
[0574] Once the analysis is complete, the server automatically assigns information tags to the data. This organizes and groups data related to specific events (e.g., birthdays or trips).
[0575] Through interaction with the AI assistant, users can specify themes such as "happy memories" or "special moments" and request the server to generate visual information. The AI assistant then generates the most suitable videos or slideshows based on the user's preferences. The generated visual information is presented via the user's device using augmented reality (AR) or virtual reality (VR) technology. In particular, using AR glasses or VR headsets makes it possible to visually re-experience past memories.
[0576] As a concrete example, if a user wants to relive memories of a past trip, they might enter the prompt, "Please create a 5-minute video using photos and videos from my trip to Hawaii that will bring back those memories." Based on this prompt, the system extracts and analyzes the necessary data and generates the desired video. This allows the user to relive that trip as if it happened yesterday.
[0577] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0578] Step 1:
[0579] Input: Photos, videos, and voice memos from the user's daily life.
[0580] Operation: The terminal automatically collects this data through applications on the user's smartphone or tablet. With the user's permission, location information is also obtained, and supplementary information is gathered to tag the data.
[0581] Output: The collected data, along with the associated location data, is prepared to be sent to the server.
[0582] Step 2:
[0583] Input: Personal information data transmitted from the terminal.
[0584] Operation: The server receives this data and analyzes its contents using machine learning algorithms and generative AI models. This analysis includes tone analysis from audio data and facial recognition in images to assess the user's emotional state.
[0585] Output: The analyzed data is tagged with emotional states and event information and registered in a database.
[0586] Step 3:
[0587] Input: Analyzed and tagged user data.
[0588] Operation: Based on the analysis information, the server classifies and organizes data related to specific events or important activities. For example, this includes the process of grouping data related to past trips or birthday events.
[0589] Output: A well-organized dataset is generated for each event.
[0590] Step 4:
[0591] Input: A well-organized dataset.
[0592] Operation: The user enters prompt text into the AI assistant, specifying the theme and content for visual information generation. Based on these instructions, the server uses image and video editing software to generate personalized videos and slideshows.
[0593] Output: Generated visual information file (video or slideshow).
[0594] Step 5:
[0595] Input: The generated visual information file.
[0596] Operation: The terminal reproduces the visual information in the real world or virtual space through AR glasses or VR headsets. This allows users to relive special moments from the past in an immersive way.
[0597] Output: Presentation of interactive visual information that allows the user to relive the moment.
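Part of the generation in Step 4 is fitting the selected items into the requested running time, such as the 5-minute video in the Hawaii example. The sketch below shows one assumed way to plan this by dividing the time evenly. The `plan_slideshow` function, the minimum display time, and the file naming are hypothetical, and actual rendering by image and video editing software is out of scope.

```python
def plan_slideshow(item_ids, total_seconds, min_seconds=3.0):
    """Split a target running time evenly across items.

    If there are too many items to give each one min_seconds, only the first
    items that fit are kept (the caller is assumed to pass them in priority order).
    """
    if not item_ids or total_seconds <= 0:
        return []
    max_items = max(1, int(total_seconds // min_seconds))
    chosen = item_ids[:max_items]
    per_item = total_seconds / len(chosen)
    return [(item_id, per_item) for item_id in chosen]


# 40 hypothetical trip photos fitted into a 5-minute (300-second) slideshow.
plan = plan_slideshow([f"photo_{i:03d}" for i in range(1, 41)], total_seconds=300)
```

An even split is the simplest policy. A system that uses the emotion tags from Step 2 could instead give more screen time to items with stronger emotion scores.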
[0598] (Application Example 2)
[0599] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0600] As personal digital information increases daily, there is a need for systems that can efficiently manage this information and easily allow users to relive special moments. However, conventional systems are insufficient in providing detailed organization that takes emotions into account and in offering visual re-experiences. Therefore, the challenge is to provide users with a more personalized experience.
[0601] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0602] In this invention, the server includes terminal means for automatically acquiring personal information from multiple sources; analysis means for analyzing the acquired personal information and identifying its content; tagging means for automatically assigning identification tags related to the information based on the analysis results; event recognition means for aggregating multiple pieces of personal information related to a specific event; generation means for generating personalized visual information from the aggregated information; and presentation means for presenting the generated visual information in physical or virtual space. This enables users to efficiently manage special moments, including emotions, and re-experience them in physical or virtual space.
[0603] A "terminal means" is a device that has the function of automatically acquiring personal information from multiple sources.
[0604] "Analysis means" refers to a process or device for analyzing acquired personal information and identifying its contents.
[0605] A "tagging means" is a system that automatically assigns identification tags related to information based on the analysis results.
[0606] An "event recognition means" is a system that has the function of aggregating multiple pieces of personal information related to a specific event.
[0607] "Generating means" refers to a process or apparatus that generates personalized visual information from compiled information.
[0608] "Presentation means" refers to a device or technology for presenting generated visual information in physical or virtual space.
[0609] The system implementing this invention mainly consists of a server, a terminal, and a user. The server is built on the cloud, and the terminal functions as a mobile device such as a smartphone or tablet. The user interfaces with the system through these devices.
[0610] The terminal collects various personal information, such as location data, photos, videos, and voice memos, during the user's activities. This data collection is done with the user's permission, and smartphone hardware such as the GPS, camera, and microphone is used to obtain the information.
[0611] The server receives data sent from the terminal and analyzes it using machine learning algorithms. During the analysis process, it recognizes emotions within personal information, and identification tags are automatically assigned to the analysis results. Specifically, it uses machine learning frameworks such as TensorFlow and PyTorch to perform calculations to identify emotions from image and audio data.
[0612] The analyzed data is further organized by event recognition mechanisms, and data related to specific events is grouped together. For example, photos and videos related to specific events such as trips or festivals are automatically grouped.
[0613] Personalized visual information is generated from this organized data through a generation method. The user interacts with the generating AI model, prompting it to set the story's theme and emotional expression.
[0614] As a means of presentation, the generated visual information is presented to the user through augmented reality (AR) or virtual reality (VR). This allows the user to re-experience past memories in both physical and virtual spaces. Smartphone ARKit or ARCore are used for presentation.
[0615] As a concrete example, the system generates an AR experience that lets users relive the emotions of a visit to a park with their family, using photos and videos from that time. An example of the prompt text in this case is as follows:
[0616] "I'd like to use AR to relive memories of our family visit to XX Park last spring. I have photos and videos. Please recreate them while emphasizing the emotions."
[0617] In this way, a system is provided that allows users to relive special moments more deeply and emotionally.
[0618] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0619] Step 1:
[0620] The device uses the user's smartphone or tablet to collect personal information such as location data, photos, videos, and voice memos. This collection is performed using GPS, camera, and microphone, and the data is stored directly on the device. The input is various sensor data collected in real time, and the output is the collected raw data.
[0621] Step 2:
[0622] The terminal sends the collected data to the server. The server preprocesses the received data and converts it into a parseable format. Specifically, audio data is converted to text, and image data has its resolution adjusted. The input is the raw data sent from the terminal, and the output is the data converted into a parseable format.
[0623] Step 3:
[0624] The server uses machine learning algorithms to analyze user emotions from analyzable data. Natural language processing techniques are applied to emotion analysis of audio data, and image recognition techniques are applied to emotion analysis of image data. The input is pre-processed data, and the output is the emotion information assigned to each data point.
[0625] Step 4:
[0626] The server automatically assigns identification tags related to the information based on the analysis results. For example, tags such as sentiment information, location, and event name are added to the data. The input is data with sentiment information, and the output is tagged data.
[0627] Step 5:
[0628] The server uses event recognition mechanisms to organize the tagged data into related events. This allows the data to be classified based on specific contexts, such as travel or festivals. The input is tagged data, and the output is a dataset associated with events.
[0629] Step 6:
[0630] The user inputs prompt text to the generating AI model to configure the settings for generating visual information. This prompt specifies a particular theme or emotional expression. The input is the user's prompt text, and the output is the generation settings parameters.
[0631] Step 7:
[0632] The server considers generation configuration parameters and generates personalized visual information from event-related data. This visual information includes content for AR and VR. The input is a well-organized dataset and generation configuration parameters, and the output is the generated visual content.
[0633] Step 8:
[0634] The device presents the generated visual information to the user using ARKit or ARCore. This allows the user to re-experience past events in AR / VR based on specified emotions or themes. The input is the generated visual content, and the output is the user's visual re-experience.
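Steps 3 to 5 above (emotion analysis, tag assignment, and event grouping) can be illustrated with the following minimal Python sketch. It is not the implementation of this disclosure: the `MediaItem` fields, the function names, and the sample data are hypothetical, and `estimate_emotion` is a keyword lookup that merely stands in for the machine learning model described in Step 3.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """One collected item (Step 1); all field names are hypothetical."""
    item_id: str
    media_type: str   # "photo", "video", or "voice_memo"
    place: str        # place name derived from GPS data
    event_hint: str   # event name assumed to be available for the item
    text: str = ""    # transcript or caption produced by preprocessing (Step 2)
    tags: dict = field(default_factory=dict)


def estimate_emotion(item: MediaItem) -> str:
    """Stand-in for the Step 3 model: a keyword lookup, not real inference."""
    lowered = item.text.lower()
    if any(word in lowered for word in ("fun", "laugh", "happy")):
        return "joy"
    if any(word in lowered for word in ("miss", "sad")):
        return "sadness"
    return "neutral"


def assign_tags(item: MediaItem) -> MediaItem:
    """Step 4: attach emotion, location, and event-name tags."""
    item.tags = {
        "emotion": estimate_emotion(item),
        "location": item.place,
        "event": item.event_hint,
    }
    return item


def group_by_event(items):
    """Step 5: collect the ids of tagged items under their event tag."""
    groups = defaultdict(list)
    for item in items:
        groups[item.tags["event"]].append(item.item_id)
    return dict(groups)


items = [
    MediaItem("p1", "photo", "XX Park", "family park visit", "Everyone is laughing"),
    MediaItem("v1", "video", "XX Park", "family park visit", "so much fun today"),
    MediaItem("m1", "voice_memo", "Station", "commute", "running late"),
]
tagged = [assign_tags(item) for item in items]
events = group_by_event(tagged)
```

In this sketch the event name is supplied with each item; in the described system it would instead come from the event recognition mechanism.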
[0635] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0636] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0637] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.
[0638] [Fourth Embodiment]
[0639] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0640] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0641] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0642] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0643] The microphone 238 receives instructions and other input from the user 20 by picking up the user's voice. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0644] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0645] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0646] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0647] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0648] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0649] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0650] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0651] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0652] This invention relates to a comprehensive system for efficiently organizing and reliving special moments in a user's daily life. In its embodiments, the system mainly consists of three components: a server, a terminal, and a user, each playing a different role.
[0653] The device collects various data, such as photos, videos, voice memos, GPS information, social media posts, and calendar events, through applications installed on the user's smartphone or other mobile device. This data collection is performed automatically with the user's permission. The user's location information and the date and time of capture can be used to supplement the data with background and scene information.
[0654] The server receives data sent from the terminal. After receiving the data, the server uses machine learning algorithms to analyze it and identify its content and associated emotions. Based on the analysis results, it assigns information tags related to each data item and performs cross-media tagging. This makes it possible to intuitively organize data based on information such as location, characters, event names, and emotions.
[0655] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data. For example, it automatically detects the user's birthday or travel events and groups together photos and videos associated with them.
[0656] Users can use this organized data to generate personalized visual information. Specifically, they interact with an AI assistant to determine settings (e.g., theme and video length) for creating a particular story. Based on this information, the server generates personalized videos or slideshows.
[0657] Finally, the device is equipped with technology to display the generated visual information. This technology uses augmented reality (AR) or virtual reality (VR) to allow users to re-experience past memories in real or virtual space. This makes it possible to recreate special moments regardless of the physical environment.
[0658] As described above, this system aims to enrich users' digital lives by automatically organizing their memories and reconstructing special moments in a unique way.
[0659] The following describes the processing flow.
[0660] Step 1:
[0661] The device obtains the necessary permissions from the user and automatically collects data such as photos, videos, and voice memos from the smartphone. Since GPS information is also collected, the user's location information can also be obtained.
[0662] Step 2:
[0663] The device sends the collected data to the server using a secure protocol. Data is compressed and encrypted during transmission to ensure its security.
[0664] Step 3:
[0665] The server stores the data received from the terminal and analyzes its content using machine learning algorithms. This analysis includes image recognition, speech recognition, and text analysis, and is used to identify people, places, and objects within the data.
[0666] Step 4:
[0667] Based on the analysis results, the server automatically assigns relevant information tags to the data. These tags include location, person, event name, and sentiment, and are useful for subsequent data retrieval and organization.
[0668] Step 5:
[0669] The server uses calendar information to recognize events and groups data related to specific events (such as trips or birthdays). This makes it possible to manage related memories together.
[0670] Step 6:
[0671] The user accesses the generated data and interacts with an AI assistant to configure settings for creating a personalized video story. Once the settings are complete, the server generates the video or slideshow based on them.
[0672] Step 7:
[0673] The device displays generated visual information to the user using augmented reality or virtual reality technology. Users can use AR devices or VR goggles to re-experience special moments in real or virtual space.
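The calendar-based grouping in Step 5 of this flow can be sketched as follows. This is an illustrative example under stated assumptions, not the method of this disclosure: the `CalendarEvent` and `CollectedItem` types, the `group_by_calendar` function, and the rule "an item belongs to the first calendar event whose time span contains its capture time" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CalendarEvent:
    """A calendar entry; hypothetical structure."""
    name: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class CollectedItem:
    """A photo, video, or voice memo with its capture time; hypothetical."""
    item_id: str
    captured_at: datetime


def group_by_calendar(items, calendar):
    """Assign each item to the first event whose span contains its capture time."""
    groups = {event.name: [] for event in calendar}
    groups["unassigned"] = []
    for item in items:
        for event in calendar:
            if event.start <= item.captured_at <= event.end:
                groups[event.name].append(item.item_id)
                break
        else:
            # No calendar entry covers this capture time.
            groups["unassigned"].append(item.item_id)
    return groups


calendar = [
    CalendarEvent("trip", datetime(2024, 8, 1), datetime(2024, 8, 3, 23, 59)),
    CalendarEvent("birthday", datetime(2024, 9, 10), datetime(2024, 9, 10, 23, 59)),
]
items = [
    CollectedItem("a", datetime(2024, 8, 2, 10, 0)),
    CollectedItem("b", datetime(2024, 9, 10, 19, 0)),
    CollectedItem("c", datetime(2024, 10, 1, 8, 0)),
]
groups = group_by_calendar(items, calendar)
```

Items that fall outside every calendar entry are kept under an "unassigned" key so that they remain available for other recognition methods, such as the historical data mentioned in the description.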
[0674] (Example 1)
[0675] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0676] In modern society, user information is generated in vast quantities from a wide range of sources, making it difficult to properly organize this information and quickly re-examine necessary information. In particular, organizing information based on emotions and systematically understanding chronologically related events are challenges.
[0677] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0678] In this invention, the server includes information device means for automatically collecting user information from multiple information sources, data analysis means for analyzing the collected user information and identifying its content and associated emotions, and identifier assignment means for automatically assigning identifiers related to the information based on the analysis results. This enables efficient organization of user information and allows for the re-experience of information based on emotions and related events.
[0679] "Information device means" refers to a device or system for automatically collecting user information from multiple information sources.
[0680] "Data analysis means" refers to means for analyzing collected user information and identifying its content and associated sentiments.
[0681] "Identifier assignment means" refers to a means for automatically assigning identifiers related to information based on the results of data analysis.
[0682] An "activity recognition means" is a means for systematizing multiple pieces of user information related to a specific activity.
[0683] "Information generation means" refers to means for generating customized visual information from systematized information.
[0684] "Presentation means" refers to a means of presenting generated visual information to a user in real space or virtual space.
[0685] A "learning algorithm" is an algorithm used in data analysis to identify the sentiment of users.
[0686] "Augmented reality technology" is a technology that overlays digital information onto the real-world environment.
[0687] "Virtual reality technology" is a technology that presents users with a computer-generated virtual environment, providing a sense of immersion.
[0688] This invention is a comprehensive system for organizing and reliving special moments in the user's daily life. This system mainly consists of three components: a server, a terminal, and a user, each playing a different role.
[0689] The device collects information through applications installed on the user's smartphone or mobile device. Specifically, it automatically acquires images, videos, voice memos, and location information using the camera, microphone, and GPS functions. This collection process is initiated automatically or manually based on the user's settings. Social media posts and calendar events are also acquired as target data. The information collected at this stage is collected with the user's permission and transmitted to the server via a secure channel.
[0690] The server receives the collected information and performs data analysis. Machine learning algorithms are used, with a particular "generative AI model" playing a key role in identifying the sentiment associated with the data's content. The analyzed data is automatically assigned relevant identifiers and organized based on activity. Through these processes, the server organizes information based on specific events, such as "birthdays" or "trips," which are important to the user.
[0691] Users can generate customized stories by interacting with the AI assistant. During the interaction, users communicate their wishes using prompts. For example, they might tell the AI a specific request such as, "Make a 3-minute video about my summer trip memories." Based on these prompts, the server generates personalized visual information using relevant data.
[0692] The device is equipped with the functionality to present generated visual information to the user. By using augmented reality (AR) and virtual reality (VR) technologies, users can re-experience special moments from their past in virtual or real-world spaces. This experience has a visual impact on the user and provides a new digital life that is not dependent on the physical environment.
[0693] This invention is expected to enhance the user experience when reliving special moments, as users will be able to easily access information in a highly organized and tagged state.
[0694] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0695] Step 1:
[0696] The device collects information from the user's daily life. Inputs include the camera, microphone, GPS, user social media posts, and calendar entries. This allows it to acquire images, videos, voice memos, and location information, which are then transmitted to a server via a secure communication method.
[0697] Step 2:
[0698] The server receives data sent from the terminal. It processes all collected data from the terminal as input. To perform data analysis, it uses a generative AI model to assign meaning to the items in the data and perform sentiment analysis. The output is a dataset with assigned identifiers.
[0699] Step 3:
[0700] The server assigns identifiers to the analyzed data. It uses analysis results provided by a generative AI model as input. It tags the data based on event names and sentiment, and structures the data through cross-media tagging. The output is a set of tagged and organized data groups.
[0701] Step 4:
[0702] The server groups specific event data based on organized data. It uses data groups with assigned identifiers as input. It automatically extracts and groups data for related events (e.g., birthdays and trips) by referencing the user's calendar information, etc. The output is a dataset grouped by event.
[0703] Step 5:
[0704] The user enters a prompt via the AI assistant. Specifically, they provide text instructions such as, "Make a 3-minute video of my summer trip memories." The server then receives the necessary details for generation and begins processing.
[0705] Step 6:
[0706] The server generates customized visual information based on user prompts. It uses prompts and grouped event data as input. Utilizing a generative AI model, it creates videos and slideshows in the user's desired format. The output is the completed visual content.
[0707] Step 7:
[0708] The device presents the generated video or slideshow to the user. It receives the completed content from the server as input. The output is visual information presented using AR or VR technology, which allows users to interactively re-experience special moments from the past in real or virtual space.
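The cross-media tagging of Step 3 and the assembly of generation input in Steps 5 and 6 can be illustrated with the following Python sketch. It rests on assumptions that are not part of this disclosure: the item dictionaries, the function names, and the regular expression used to read a length such as "3-minute" are hypothetical, and the event name is passed in by the caller rather than inferred by a generative AI model.

```python
import re
from collections import defaultdict


def build_tag_index(items):
    """Map each (tag name, tag value) pair to item ids, regardless of media type."""
    index = defaultdict(set)
    for item in items:
        for name, value in item["tags"].items():
            index[(name, value)].add(item["id"])
    return index


def parse_length_seconds(prompt, default=60):
    """Read a length such as '3-minute' from the prompt; otherwise use the default."""
    match = re.search(r"(\d+)[- ]minute", prompt)
    return int(match.group(1)) * 60 if match else default


def build_generation_request(prompt, event_name, index):
    """Combine the user's prompt with the items grouped under one event."""
    return {
        "prompt": prompt,
        "length_seconds": parse_length_seconds(prompt),
        "item_ids": sorted(index.get(("event", event_name), set())),
    }


items = [
    {"id": "p1", "media_type": "photo", "tags": {"event": "summer trip", "emotion": "joy"}},
    {"id": "v1", "media_type": "video", "tags": {"event": "summer trip", "emotion": "joy"}},
    {"id": "m1", "media_type": "voice_memo", "tags": {"event": "birthday", "emotion": "joy"}},
]
index = build_tag_index(items)
request = build_generation_request(
    "Make a 3-minute video of my summer trip memories.", "summer trip", index
)
```

Because the index is keyed by tag rather than by media type, a single lookup returns photos, videos, and voice memos together, which is the sense in which the tagging is treated as cross-media in this sketch.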
[0709] (Application Example 1)
[0710] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0711] The present invention aims to provide a method for efficiently organizing special moments in a user's daily life and effectively reliving them. In particular, it aims to realize a system that allows users to re-experience past experiences while interacting with a household robot. This will emotionally enrich special experiences in the user's digital life.
[0712] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0713] In this invention, the server includes a device means for automatically acquiring user information from multiple information sources, an analysis device means for analyzing the acquired user information and recognizing its content, and a tagging means for automatically assigning attribute tags related to the information based on the analysis results. This makes it possible for users to relive special moments from the past through a home robot.
[0714] "Device means" refers to a component that has the function of automatically acquiring user information from multiple information sources.
[0715] An "analysis device means" is a device that has the function of analyzing acquired user information and recognizing its content.
[0716] A "tagging means" is a mechanism that automatically assigns attribute tags related to information based on the analysis results.
[0717] An "event recognition means" is a mechanism for organizing and consolidating multiple user information related to a specific event.
[0718] A "generation device means" is a device for generating individualized visual information from standardized information.
[0719] "Presentation device means" refers to a device equipped with technology for presenting generated visual information in the real world or a virtual world.
[0720] "Means of recreating personalized narratives" refers to methods that allow users to re-experience past memories while interacting with home appliance robots.
[0721] The system implementing this invention can efficiently organize special moments in a user's daily life and allow them to re-experience them through a home robot. The system mainly consists of a terminal, a server, and a user.
[0722] Users collect various types of information through their smartphones and other devices, including photos, videos, voice memos, location data, social media posts, and calendar events related to their daily activities. This data is automatically sent to the server after obtaining the user's permission.
[0723] The server analyzes received user data using machine learning algorithms implemented with tools such as Python, TensorFlow, and PyTorch. This analysis identifies the sentiment associated with the data's content, and information tags are assigned based on this. This tagging intuitively organizes the information and links it to specific events.
[0724] In particular, home robots are designed to allow users to relive past memories in an interactive format. The robots receive voice input from the user and generate appropriate responses. By utilizing AR technology to provide visual information, users can experience special moments from the past in the real world.
[0725] For example, if a user asks a home robot to "show me photos from last year's trip," the system identifies the relevant photos using tags and displays them through an AR display. This allows the user to vividly recreate actual landscapes and past experiences.
[0726] An example of a prompt to input into a generative AI model is: "The user wants memories of a trip from last year. Please find relevant content based on specific dates and sentiment tags."
[0727] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0728] Step 1:
[0729] The device collects information from the user's daily life, including photos, videos, voice memos, location data, social media posts, and calendar events. This data is collected automatically with the user's permission. The input data is diverse and broadly covers the user's daily life. Structured raw data is generated as output.
[0730] Step 2:
[0731] The device sends the collected data to the server. After the server receives the data, it analyzes it using machine learning algorithms (implemented with frameworks such as TensorFlow or PyTorch). The server receives the raw data sent from the device as input and generates results that identify the content and sentiment as output.
[0732] Step 3:
[0733] The server automatically assigns information tags based on the analysis results. The input is already analyzed data, and the output is organized data with attributes related to the dataset (e.g., location, person, emotion, etc.) attached.
[0734] Step 4:
[0735] The server groups organized data related to a specific event. This includes organizing information based on the user's calendar information and related events. The input is organized data with tags, and the output is a dataset that aggregates all data related to a single event.
[0736] Step 5:
[0737] When a user wishes to relive past memories through a home robot, the server uses a generated dataset to provide a personalized narrative. Specifically, the AI analyzes the user's voice input, retrieves relevant information, and generates and presents visual information on an AR device or robot display. The input is the user's request (as a prompt), and the output is an interactive visual experience.
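The tag-based retrieval behind a request such as "show me photos from last year's trip" can be sketched as follows. This is a minimal example under stated assumptions: the item layout and the `find_items` and `last_year` helpers are hypothetical, and the structured query (media type, year, tag) is written by hand here, whereas in the described system it would be derived from the user's voice input by the generative AI model.

```python
from datetime import date

# Hypothetical tagged items as they might exist after Steps 3 and 4.
ITEMS = [
    {"id": "p1", "media_type": "photo", "captured_on": date(2023, 8, 2), "tags": {"trip", "joy"}},
    {"id": "p2", "media_type": "photo", "captured_on": date(2024, 1, 5), "tags": {"birthday"}},
    {"id": "v1", "media_type": "video", "captured_on": date(2023, 8, 3), "tags": {"trip"}},
    {"id": "p3", "media_type": "photo", "captured_on": date(2022, 8, 1), "tags": {"trip"}},
]


def last_year(today):
    """Return the calendar year before the given date."""
    return today.year - 1


def find_items(items, media_type, year, required_tag):
    """Return ids of items matching the media type, capture year, and tag."""
    return [
        item["id"]
        for item in items
        if item["media_type"] == media_type
        and item["captured_on"].year == year
        and required_tag in item["tags"]
    ]


# A fixed reference date keeps the example reproducible.
today = date(2024, 6, 1)
matches = find_items(ITEMS, "photo", last_year(today), "trip")
```

The returned ids would then be handed to the presentation side (an AR display or the robot's display) for output.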
[0738] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0739] This invention relates to a comprehensive system for efficiently organizing and reliving special moments, including emotions, in a user's daily life. The system mainly consists of a server, a terminal, an emotion engine, and a user, each playing a different role.
[0740] The device collects various types of data, such as photos, videos, and voice memos, through applications installed on the user's smartphone or other mobile devices. This collection is done automatically with the user's permission, and the device also uses location services to supplement the user's actions and the background of the scene.
[0741] The server receives data sent from the terminal and analyzes it using machine learning algorithms and an emotion engine. The emotion engine determines the user's emotional state, particularly through the analysis of voice and images, and reflects this in the analysis results. As a result, information tags based on the analysis results are automatically assigned to each data, and cross-media tagging is performed. This process allows the data to be organized by information such as location, person, event name, and emotion.
[0742] Furthermore, the server recognizes specific events based on the user's calendar information and historical data, and groups related data accordingly. For example, it can detect the user's birthday or travel events and group related photos and videos together.
[0743] Users utilize this organized data to generate personalized visual information via an AI assistant. Specifically, users specify the settings necessary for story creation (theme, video length, emotional expression, etc.) through dialogue. The server then generates personalized videos and slideshows that take emotional information into account, based on these settings.
[0744] The device has the ability to deliver this generated visual information through augmented reality (AR) or virtual reality (VR) technology. This allows users to re-experience special moments from the past in real or virtual space. For example, a user can use an AR device to relive memorable moments from their travels.
[0745] This system allows users to efficiently manage special moments, including emotions, within digital data and relive them as needed. This enables users to make more meaningful use of their individual memories.
[0746] The following describes the processing flow.
[0747] Step 1:
[0748] The device operates applications with the user's permission and automatically collects user data such as photos, videos, voice memos, and GPS data from the smartphone. Data collection is performed periodically in the background.
[0749] Step 2:
[0750] The device transmits the collected data to the server using a secure protocol. During this process, the data is compressed and encrypted to ensure information security.
[0751] Step 3:
[0752] The server analyzes the received data. Image recognition technology detects objects and faces in photos and videos and identifies their content. For voice memos, speech recognition technology is used to convert the audio into text.
[0753] Step 4:
[0754] An emotion engine operates on the server, analyzing audio and images from the analyzed data to determine the user's emotions. This identifies emotional states such as happiness, surprise, and sadness.
[0755] Step 5:
[0756] The server automatically assigns informational tags to the data based on the analysis results. These tags include information such as location, people, event names, and user sentiment, and are used later to search and organize the data.
[0757] Step 6:
[0758] The server performs event recognition and identifies specific events based on calendar information. It organizes the collected data in association with events, grouping data by event such as travel or birthdays.
[0759] Step 7:
[0760] The user uses an AI assistant to configure settings for generating a personalized story. The user specifies the theme, video length, emotional expression, and other details, and this information is sent to the server.
[0761] Step 8:
[0762] The server generates personalized visual information (e.g., videos, slideshows) that reflects user preferences and emotions and event information. Analyzed emotion information is also used in the generation process.
[0763] Step 9:
[0764] The device provides the user with generated visual information using augmented reality (AR) or virtual reality (VR) technology. The user can use an AR device or VR goggles to re-experience special moments in real or virtual space.
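Step 4 of this flow determines the user's emotion from both audio and images. One possible way to combine the two is a weighted average of per-modality scores, sketched below. This fusion rule is an assumption introduced for illustration and is not stated in this disclosure; the function name, the weight parameter, and the score values are hypothetical, and the scores themselves would come from the emotion engine.

```python
# Emotion labels taken from the examples given in Step 4.
EMOTIONS = ("happiness", "surprise", "sadness")


def fuse_emotions(voice_scores, image_scores, voice_weight=0.5):
    """Weighted average of per-modality scores; returns (label, fused scores)."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    fused = {
        emotion: voice_weight * voice_scores.get(emotion, 0.0)
        + (1.0 - voice_weight) * image_scores.get(emotion, 0.0)
        for emotion in EMOTIONS
    }
    # The label with the highest fused score becomes the emotion tag.
    label = max(fused, key=fused.get)
    return label, fused


# Hypothetical scores for one voice memo and one photo from the same scene.
voice = {"happiness": 0.6, "surprise": 0.3, "sadness": 0.1}
image = {"happiness": 0.2, "surprise": 0.7, "sadness": 0.1}
label, fused = fuse_emotions(voice, image)
```

The resulting label could then be attached as the emotion tag in Step 5 and passed to the generation in Step 8.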
[0765] (Example 2)
[0766] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0767] It is difficult to efficiently collect the diverse information that individuals generate in their daily lives, analyze its content, and grasp its characteristics, including emotional states. Furthermore, effectively classifying data related to specific events and integrating it so that it can be re-experienced as personalized visual information has been difficult to achieve with conventional methods.
[0768] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0769] In this invention, the server includes terminal means for automatically acquiring personal information, analysis means for analyzing the acquired personal information and evaluating emotional states, tagging means for identifying content based on the analysis results and associating information tags, classification means for classifying and organizing personal information related to important activities, creation means for creating personalized visual elements using the organized information, and reproduction means for reproducing the created visual elements in the real world or virtual space. This makes it possible to efficiently analyze and organize diverse personal data and provide a system that allows users to re-experience special moments.
[0770] "Personal information" refers to digital data related to a user, such as photos, videos, voice memos, and location data.
[0771] "Terminal means" refers to a computer or device used to automatically acquire and collect user information.
[0772] "Analysis means" refers to algorithms and systems that evaluate emotional states from the acquired personal information and analyze its characteristics using machine learning or generative AI models.
[0773] "Tagging means" refers to the process of identifying content in user data based on the analysis results and assigning corresponding labels or tags.
[0774] "Classification means" refers to a method or system for organizing and grouping personal information related to a specific event or important activity.
[0775] "Creation means" refers to a method for generating visual elements tailored to individual users based on the organized information.
[0776] "Reproduction means" refers to systems and technologies for presenting the generated visual elements in real-world environments or virtual spaces.
[0777] This invention provides a system for efficiently collecting and analyzing personal information and generating and reproducing personalized visual information based on the results. The components of the system are described in detail below.
[0778] Users use their devices to automatically collect personal information such as photos, videos, and voice memos generated in their daily lives. With the user's permission, the device securely collects this data and uses location services to add background data such as location information. Mobile devices such as smartphones and tablets are primarily used for this purpose.
[0779] The collected data is sent to a server. The server is a high-performance computer system that runs machine learning algorithms and generative AI models for sentiment analysis. Examples of software used include machine learning libraries such as TensorFlow and PyTorch. This allows the server to analyze the data and extract the user's emotional state and other important information.
[0780] Once the analysis is complete, the server automatically assigns information tags to the data. This organizes and groups data related to specific events (e.g., birthdays or trips).
[0781] Through interaction with the AI assistant, users can specify themes such as "happy memories" or "special moments" and request the server to generate visual information. The AI assistant then generates the most suitable videos or slideshows based on the user's preferences. The generated visual information is presented via the user's device using augmented reality (AR) or virtual reality (VR) technology. In particular, using AR glasses or VR headsets makes it possible to visually re-experience past memories.
[0782] As a concrete example, if a user wants to relive memories of a past trip, they might enter the prompt, "Please create a 5-minute video using photos and videos from my trip to Hawaii that will bring back those memories." Based on this prompt, the system extracts and analyzes the necessary data and generates the desired video. This allows the user to relive that trip as if it happened yesterday.
[0783] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0784] Step 1:
[0785] Input: Photos, videos, and voice memos from the user's daily life.
[0786] Operation: The device automatically collects this data through applications on the user's smartphone or tablet. With the user's permission, location information is also obtained, and supplementary information is gathered to tag the data.
[0787] Output: The collected data, along with the associated location data, is prepared to be sent to the server.
[0788] Step 2:
[0789] Input: Personal information data transmitted from the device.
[0790] Operation: The server receives this data and analyzes its contents using machine learning algorithms and generative AI models. This analysis includes tone analysis from audio data and facial recognition in images to assess the user's emotional state.
[0791] Output: The analyzed data is tagged with emotional states and event information and registered in a database.
[0792] Step 3:
[0793] Input: Analyzed and tagged user data.
[0794] Operation: Based on the analysis information, the server classifies and organizes data related to specific events or important activities. For example, this includes the process of grouping data related to past trips or birthday events.
[0795] Output: A well-organized dataset is generated for each event.
[0796] Step 4:
[0797] Input: A well-organized dataset.
[0798] Operation: The user enters prompt text into the AI assistant, specifying the theme and content for visual information generation. Based on these instructions, the server uses image and video editing software to generate personalized videos and slideshows.
[0799] Output: Generated visual information file (video or slideshow).
[0800] Step 5:
[0801] Input: The generated visual information file.
[0802] Operation: The device reproduces visual information in the real world or virtual space through AR glasses or VR headsets. This allows users to relive special moments from the past in an immersive way.
[0803] Output: Presentation of interactive visual information that allows the user to relive the moment.
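As a rough illustration of Steps 1 to 5 above, the following Python sketch passes a few collected items through analysis, classification, and generation. The names MediaItem, analyze, classify, and generate are hypothetical and do not appear in the disclosure. The emotion rule is only a placeholder for the machine learning or generative AI model, and the output is a slideshow manifest rather than a rendered video.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MediaItem:
    path: str                       # hypothetical file path on the terminal
    kind: str                       # "photo", "video", or "voice_memo"
    location: Optional[str] = None  # added in Step 1 when the user permits it
    tags: dict = field(default_factory=dict)


def analyze(item):
    """Step 2 stand-in: a real server would run a trained model here."""
    # Placeholder rule so the sketch runs without any model (assumption).
    item.tags["emotion"] = "joy" if "party" in item.path else "calm"
    item.tags["event"] = item.location or "unknown"
    return item


def classify(items):
    """Step 3: group tagged items into one dataset per event tag."""
    datasets = defaultdict(list)
    for item in items:
        datasets[item.tags["event"]].append(item)
    return dict(datasets)


def generate(dataset, theme):
    """Step 4 stand-in: build a slideshow manifest from the visual items."""
    slides = [i.path for i in dataset if i.kind != "voice_memo"]
    return {"theme": theme, "slides": slides}


if __name__ == "__main__":
    collected = [
        MediaItem("hawaii/beach.jpg", "photo", "Hawaii"),
        MediaItem("hawaii/party.mp4", "video", "Hawaii"),
    ]
    events = classify([analyze(i) for i in collected])
    print(generate(events["Hawaii"], "happy memories"))
```

Step 5 (presentation through AR glasses or a VR headset) is omitted because it depends on the display platform.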
[0804] (Application Example 2)
[0805] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0806] As personal digital information increases daily, there is a need for systems that can efficiently manage this information and easily allow users to relive special moments. However, conventional systems are insufficient in providing detailed organization that takes emotions into account and in offering visual re-experiences. Therefore, the challenge is to provide users with a more personalized experience.
[0807] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0808] In this invention, the server includes terminal means for automatically acquiring personal information from multiple sources; analysis means for analyzing the acquired personal information and identifying its content; tagging means for automatically assigning identification tags related to the information based on the analysis results; event recognition means for aggregating multiple pieces of personal information related to a specific event; generation means for generating personalized visual information from the aggregated information; and presentation means for presenting the generated visual information in physical or virtual space. This enables users to efficiently manage special moments, including emotions, and re-experience them in physical or virtual space.
[0809] "Terminal means" refers to a device that has the function of automatically acquiring personal information from multiple sources.
[0810] "Analysis means" refers to a process or device for analyzing acquired personal information and identifying its contents.
[0811] "Tagging means" refers to a mechanism that automatically assigns identification tags related to the information based on the analysis results.
[0812] "Event recognition means" refers to a mechanism that has the function of aggregating multiple pieces of personal information related to a specific event.
[0813] "Generation means" refers to a process or apparatus that generates personalized visual information from the aggregated information.
[0814] "Presentation means" refers to a device or technology for presenting generated visual information in physical or virtual space.
[0815] The system implementing this invention mainly consists of a server, a terminal, and a user. The server is built on the cloud, and the terminal functions as a mobile device such as a smartphone or tablet. The user interfaces with the system through these devices.
[0816] The terminal collects various personal information, such as location data, photos, videos, and voice memos, during the user's activities. This data collection is done with the user's permission, and smartphone hardware such as the GPS receiver, camera, and microphone is used to obtain the information.
[0817] The server receives data sent from the terminal and analyzes it using machine learning algorithms. During the analysis process, it recognizes emotions within personal information, and identification tags are automatically assigned to the analysis results. Specifically, it uses machine learning frameworks such as TensorFlow and PyTorch to perform calculations to identify emotions from image and audio data.
[0818] The analyzed data is further organized by event recognition mechanisms, and data related to specific events is grouped together. For example, photos and videos related to specific events such as trips or festivals are automatically grouped.
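The disclosure does not state which criterion the event recognition means uses to decide that items belong to the same event. One simple possibility, shown here purely as an assumption, is to start a new group whenever the gap between consecutive capture times exceeds a threshold. The function name group_by_time_gap, the "taken_at" key, and the six-hour default are all hypothetical.

```python
from datetime import datetime, timedelta


def group_by_time_gap(items, max_gap=timedelta(hours=6)):
    """Split items into groups wherever consecutive capture times differ by more than max_gap."""
    groups = []
    for item in sorted(items, key=lambda it: it["taken_at"]):
        # Compare with the most recent item of the current group.
        if groups and item["taken_at"] - groups[-1][-1]["taken_at"] <= max_gap:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


if __name__ == "__main__":
    photos = [
        {"path": "a.jpg", "taken_at": datetime(2024, 5, 1, 10)},
        {"path": "b.jpg", "taken_at": datetime(2024, 5, 1, 12)},
        {"path": "c.jpg", "taken_at": datetime(2024, 5, 3, 9)},
    ]
    for group in group_by_time_gap(photos):
        print([p["path"] for p in group])
```

In practice the location and emotion tags assigned earlier could be combined with the time gap, for example by also splitting a group when the location tag changes.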
[0819] Personalized visual information is generated from this organized data by the generation means. The user interacts with the generative AI model through prompts to set the story's theme and emotional expression.
[0820] As the presentation means, the generated visual information is presented to the user through augmented reality (AR) or virtual reality (VR). This allows the user to re-experience past memories in physical or virtual space. ARKit or ARCore on the smartphone is used for presentation.
[0821] As a concrete example, consider generating an AR experience that lets a user relive the emotions of a family visit to a park, using photos and videos from that time. An example of the prompt text in this case is as follows:
[0822] "I'd like to use AR to relive memories of our family visit to XX Park last spring. I have photos and videos. Please recreate them while emphasizing the emotions."
[0823] In this way, a system is provided that allows users to relive special moments more deeply and emotionally.
[0824] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0825] Step 1:
[0826] The terminal, such as the user's smartphone or tablet, collects personal information such as location data, photos, videos, and voice memos. Collection uses the GPS receiver, camera, and microphone, and the data is stored directly on the terminal. The input is various sensor data collected in real time, and the output is the collected raw data.
[0827] Step 2:
[0828] The terminal sends the collected data to the server. The server preprocesses the received data and converts it into an analyzable format. Specifically, audio data is converted to text, and image data has its resolution adjusted. The input is the raw data sent from the terminal, and the output is the data converted into an analyzable format.
[0829] Step 3:
[0830] The server uses machine learning algorithms to analyze user emotions from analyzable data. Natural language processing techniques are applied to emotion analysis of audio data, and image recognition techniques are applied to emotion analysis of image data. The input is pre-processed data, and the output is the emotion information assigned to each data point.
[0831] Step 4:
[0832] The server automatically assigns identification tags related to the information based on the analysis results. For example, tags such as emotion information, location, and event name are added to the data. The input is data with emotion information, and the output is tagged data.
[0833] Step 5:
[0834] The server uses event recognition mechanisms to organize the tagged data into related events. This allows the data to be classified based on specific contexts, such as travel or festivals. The input is tagged data, and the output is a dataset associated with events.
[0835] Step 6:
[0836] The user inputs prompt text to the generative AI model to configure the settings for generating visual information. This prompt specifies a particular theme or emotional expression. The input is the user's prompt text, and the output is the generation setting parameters.
[0837] Step 7:
[0838] The server takes the generation setting parameters into account and generates personalized visual information from the event-related data. This visual information includes content for AR and VR. The input is the event-related dataset and the generation setting parameters, and the output is the generated visual content.
[0839] Step 8:
[0840] The terminal presents the generated visual information to the user using ARKit or ARCore. This allows the user to re-experience past events in AR or VR based on the specified emotions or themes. The input is the generated visual content, and the output is the user's visual re-experience.
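Steps 6 and 7 above turn a prompt into generation setting parameters and then apply them to an event dataset. The following Python sketch shows one way this hand-off could look. It is an assumption for illustration only: GenerationParams, TaggedItem, parse_prompt, and select_items are hypothetical names, the keyword lookup stands in for the generative AI model, and the score field is an assumed confidence value for the emotion tag.

```python
from dataclasses import dataclass


@dataclass
class GenerationParams:
    theme: str
    emotion: str
    max_items: int = 10


@dataclass
class TaggedItem:
    path: str
    event: str
    emotion: str
    score: float  # assumed confidence of the emotion tag, 0.0 to 1.0


def parse_prompt(prompt, known_emotions=("joy", "calm", "nostalgia")):
    """Step 6 stand-in: keyword lookup in place of a generative AI model."""
    lowered = prompt.lower()
    # Fall back to the first known emotion when the prompt names none.
    emotion = next((e for e in known_emotions if e in lowered), known_emotions[0])
    return GenerationParams(theme=prompt.strip(), emotion=emotion)


def select_items(dataset, params):
    """Step 7, first half: keep the items that best match the requested emotion."""
    matching = [i for i in dataset if i.emotion == params.emotion]
    matching.sort(key=lambda i: i.score, reverse=True)
    return matching[: params.max_items]


if __name__ == "__main__":
    params = parse_prompt("Relive our park visit with a calm mood")
    dataset = [
        TaggedItem("a.jpg", "park", "calm", 0.4),
        TaggedItem("b.jpg", "park", "joy", 0.9),
        TaggedItem("c.mp4", "park", "calm", 0.8),
    ]
    print([i.path for i in select_items(dataset, params)])
```

The selected items would then be passed to whatever renderer produces the AR or VR content, which is outside the scope of this sketch.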
[0841] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating the user's input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0842] The data generation model 58 is a so-called generative AI (Artificial Intelligence) model. Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0843] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0844] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0845] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0846] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0847] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.
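The layout described for the emotion map 400 (reaction on the left, situational judgment on the right, pleasure at the top, displeasure at the bottom, inner thoughts near the center, actions toward the outside) can be expressed as a small lookup on map coordinates. The Python sketch below is only an illustration under stated assumptions: the coordinate system (origin at the center, +x toward 3 o'clock, +y toward 12 o'clock), the unit radius, the halfway boundary between "inner" and "action", and the function name describe_position are not taken from the disclosure.

```python
import math


def describe_position(x, y, outer_radius=1.0):
    """Describe a point on a circular stand-in for the emotion map.

    Assumed coordinates: origin at the map center, +x to the right
    (3 o'clock), +y upward (12 o'clock).
    """
    radius = math.hypot(x, y)
    if radius > outer_radius:
        raise ValueError("point lies outside the map")
    return {
        # Right half: situational judgment. Left half: reaction in the brain.
        "origin": "situation" if x > 0 else "reaction" if x < 0 else "both",
        # Upper side: pleasure. Lower side: displeasure.
        "valence": "pleasure" if y > 0 else "displeasure" if y < 0 else "neutral",
        # Assumed boundary: the outer half of the radius counts as expressed in action.
        "expression": "action" if radius > outer_radius / 2 else "inner",
    }


if __name__ == "__main__":
    print(describe_position(0.9, 0.0))
    print(describe_position(-0.2, 0.1))
```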
[0848] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
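The idea that pleasure rises as balances such as posture and battery level approach their ideal, and discomfort rises as they deviate, can be illustrated numerically. The Python sketch below is a hypothetical formulation and not the method of the disclosure: the function name pleasure_score, the per-balance tolerance, the cap at two tolerances, and the example readings are all assumptions.

```python
def pleasure_score(readings, ideals, tolerances):
    """Return a value in [-1, 1]: 1 at the ideal balance, -1 far from it."""
    deviations = []
    for name, ideal in ideals.items():
        # Distance from the ideal in units of tolerance, capped at 2.
        deviation = abs(readings[name] - ideal) / tolerances[name]
        deviations.append(min(deviation, 2.0))
    mean_deviation = sum(deviations) / len(deviations)
    return 1.0 - mean_deviation


if __name__ == "__main__":
    # Hypothetical balances for a robot: battery level (0 to 1) and tilt in degrees.
    ideals = {"battery": 0.8, "tilt_deg": 0.0}
    tolerances = {"battery": 0.4, "tilt_deg": 10.0}
    print(pleasure_score({"battery": 0.75, "tilt_deg": 1.0}, ideals, tolerances))
    print(pleasure_score({"battery": 0.1, "tilt_deg": 25.0}, ideals, tolerances))
```

A positive score would correspond to the upper (pleasure) side of the emotion map and a negative score to the lower (displeasure) side.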
[0849] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0850] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
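The disclosure states that the neural network is trained so that emotions located close together have similar values, but does not say how this is achieved. One assumed approach is to build training targets that decay with distance on the map from the labeled emotion. In the Python sketch below, the coordinates in EMOTION_POSITIONS, the Gaussian kernel, the width sigma, and the function name soft_targets are all hypothetical.

```python
import math

# Hypothetical map coordinates; the actual layout of the emotion map is not given here.
EMOTION_POSITIONS = {
    "reassured": (0.30, 0.20),
    "calm": (0.35, 0.15),
    "confident": (0.40, 0.25),
    "anxious": (0.30, -0.60),
}


def soft_targets(label, positions=EMOTION_POSITIONS, sigma=0.2):
    """Build target emotion values that decay with map distance from the labeled emotion."""
    cx, cy = positions[label]
    return {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in positions.items()
    }


if __name__ == "__main__":
    for name, value in soft_targets("calm").items():
        print(f"{name}: {value:.3f}")
```

With targets of this kind, a sample labeled "calm" also pulls the values for "reassured" and "confident" upward, which is consistent with the neighboring emotions having similar values in Figure 10.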
[0851] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0852] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0853] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0854] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0855] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0856] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0857] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0858] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0859] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0860] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0861] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted as being incorporated by reference.
[0862] The following is further disclosed regarding the embodiments described above.
[0863] (Claim 1)
[0864] A device means for automatically collecting user data from multiple data sources,
[0865] An analysis means for analyzing collected user data and identifying its contents,
[0866] A tagging means that automatically assigns information tags related to the data based on the analysis results,
[0867] An event recognition means for organizing multiple pieces of user data related to a specific event,
[0868] A generation means for generating personalized visual information from the organized data,
[0869] A display means for displaying the generated visual information in real space or virtual space,
[0870] A system comprising all of the above means.
[0871] (Claim 2)
[0872] The system according to claim 1, wherein the analysis means has a function to determine the sentiment of user data using a machine learning algorithm.
[0873] (Claim 3)
[0874] The system according to claim 1, wherein the display means has the function of providing visual information by applying augmented reality technology or virtual reality technology.
[0875] "Example 1"
[0876] (Claim 1)
[0877] Information device means for automatically collecting user information from multiple information sources,
[0878] A data analysis means for analyzing collected user information and identifying its content and associated emotions,
[0879] An identifier assignment means that automatically assigns identifiers related to the information based on the analysis results,
[0880] An activity recognition means that systematizes multiple pieces of user information related to a specific activity,
[0881] Information generation means for generating customized visual information from the systematized information,
[0882] A presentation means for presenting the generated visual information in real space or virtual space,
[0883] An apparatus comprising all of the above means.
[0884] (Claim 2)
[0885] The apparatus according to claim 1, wherein the data analysis means has a function of identifying emotions in the user information using a learning algorithm.
[0886] (Claim 3)
[0887] The apparatus according to claim 1, wherein the presentation means has the function of providing visual information by applying augmented reality technology or virtual reality technology.
[0888] "Application Example 1"
[0889] (Claim 1)
[0890] A device means for automatically acquiring user information from multiple information sources,
[0891] An analysis device means for analyzing acquired user information and recognizing its content,
[0892] A tagging means that automatically assigns attribute tags related to the information based on the analysis results,
[0893] An event recognition means for organizing multiple pieces of user information related to a specific event,
[0894] A generation device means for generating individualized visual information from the organized information,
[0895] A presentation device means for presenting the generated visual information in the real world or a virtual world,
[0896] A means of recreating personalized narratives that allows users to relive past memories while interacting with home appliance robots,
[0897] A system comprising all of the above means.
[0898] (Claim 2)
[0899] The system according to claim 1, wherein the analysis device means has a function of determining the sentiment of the user information using a machine learning algorithm.
[0900] (Claim 3)
[0901] The system according to claim 1, wherein the presentation device means has the function of providing visual information by applying augmented reality technology or virtual reality technology.
[0902] "Example 2 of combining an emotion engine"
[0903] (Claim 1)
[0904] A terminal means for automatically acquiring personal information,
[0905] An analysis means for analyzing the acquired personal information and evaluating emotional states,
[0906] A tagging means that identifies content based on the analysis results and associates information tags,
[0907] A classification means for classifying and organizing personal information related to important activities,
[0908] A creation means for creating individualized visual elements using the organized information,
[0909] A reproduction means for reproducing the created visual elements in the real world or virtual space,
[0910] A system comprising all of the above means.
[0911] (Claim 2)
[0912] The system according to claim 1, wherein the analysis means has the function of identifying the user's emotions using a generative AI model.
[0913] (Claim 3)
[0914] The system according to claim 1, wherein the reproduction means has the function of presenting visual elements by applying augmented reality technology or virtual reality technology.
[0915] "Application example 2 when combining with an emotional engine"
[0916] (Claim 1)
[0917] A terminal means for automatically acquiring personal information from multiple sources,
[0918] An analysis means for analyzing the acquired personal information and identifying its contents,
[0919] A tagging means that automatically assigns identification tags related to the information based on the analysis results,
[0920] An event recognition means that aggregates multiple pieces of personal information related to a specific event,
[0921] A generation means for generating personalized visual information from the aggregated information,
[0922] A presentation means for presenting the generated visual information in physical or virtual space,
[0923] An apparatus comprising all of the above means.
[0924] (Claim 2)
[0925] The apparatus according to claim 1, wherein the analysis means has a function of determining the sentiment of the personal information using a machine learning method.
[0926] (Claim 3)
[0927] The apparatus according to claim 1, wherein the presentation means has the function of providing visual information by applying augmented reality technology or virtual reality technology.
Explanation of Symbols
[0928]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot
Claims
1. A system comprising:
a device means for automatically acquiring user information from multiple information sources;
an analysis device means for analyzing the acquired user information and recognizing its content;
a tagging means for automatically assigning attribute tags related to the information based on the analysis results;
an event recognition means for organizing multiple pieces of user information related to a specific event;
a generation device means for generating individualized visual information from the organized information;
a presentation device means for presenting the generated visual information in the real world or a virtual world; and
a means of recreating personalized narratives that allows users to relive past memories while interacting with home appliance robots.
2. The system according to claim 1, wherein the analysis device means has a function of determining the emotions of the user information using a machine learning algorithm.
3. The system according to claim 1, wherein the presentation device means has the function of providing visual information by applying augmented reality technology or virtual reality technology.