system

The system addresses the challenge of dynamically providing personalized information by collecting and processing user data to anonymize and filter unnecessary information, ensuring real-time relevance and privacy, thereby enhancing user experience.

JP2026105314A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Conventional information providing devices fail to dynamically and personally tailor information to users' continuous interests and memories, and they do not adequately balance data privacy protection with real-time performance.

Method used

A system that collects visual and auditory data from users, preprocesses it to anonymize and filter unnecessary information, analyzes user interests, and provides personalized information in real-time through a display device, integrating with other databases to meet diverse user needs.

Benefits of technology

Enables continuous and personalized information delivery that respects user privacy while providing relevant information in real-time, enhancing user experience.

✦ Generated by Eureka AI based on patent content.


Abstract

A system is provided. [Solution] The system includes: visual device means for collecting environmental data and audio information from the user; information processing device means for pre-processing the collected data to anonymize it and reduce its size; analytical device means for analyzing the pre-processed data to estimate individual interests; video output device means for displaying information related to the user; information integration system means that combines the analyzed results with information from an external database; and system means that works in conjunction with a sensor network that provides real-time information on the usage status of public facilities.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Conventional information providing devices mainly collect and provide limited information only when the user touches the device at a specific time, making it difficult to complement the user's continuous interests and memories. In addition, information presentation that balances data privacy protection with real-time performance has been insufficient. The present invention aims to solve these problems and realize dynamic and personalized information provision based on data obtained through the user's daily activities.

Means for Solving the Problems

[0005] The system according to the present invention includes a device for collecting visual and auditory data from a user. This device temporarily preprocesses the collected data, anonymizing and filtering it. It also has a system for analyzing the preprocessed data to estimate the user's interests and provides information through a display device that presents information relevant to the user. Furthermore, by integrating the analysis results with information from other databases, it is possible to present personalized information to the user in real time. In addition, it meets the diverse needs of users by having a function to accumulate the user's past behavior and generate memory supplementation information, and by providing recommendations based on the user's lifestyle in cooperation with other devices.

[0006] "User" refers to an individual who provides and receives video and audio information.

[0007] "Visual data" refers to information from images and videos that come into the user's field of vision.

[0008] "Audio data" refers to the information of sounds and conversations that users hear.

[0009] "Device" refers to a tool that is physically or electronically configured to perform a specific function.

[0010] "Preprocessing" refers to the initial data transformation and organization performed after data collection.

[0011] "Anonymization" refers to a technique that processes data in a way that makes it impossible to identify specific individuals.

[0012] "Filtering" refers to the process of removing unnecessary information from a dataset.

[0013] "Analysis" refers to the process of extracting useful information and patterns from data.

[0014] "Inferring interests" refers to estimating a user's preferences and interests based on their lifestyle and behavior.

[0015] The "display device" refers to a device for providing visual information to the user.

[0016] "Integrate" means to combine different data or information into one.

[0017] "Memory supplement information" refers to additional information for supplementing or expanding the user's past actions and experiences.

[0018] "Recommend" means to propose information for appropriate choices or actions based on the user's preferences and actions.

[0019] "Real-time" means that data collection, processing, and information provision are performed immediately.

Brief Description of Drawings

[0020]
[Figure 1] It is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] It is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] It is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] It is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] It is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] It is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] It is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] It is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] Shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] Shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 1.
[Figure 12] It is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] It is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when an emotion engine is combined.
[Figure 14] It is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Modes for Carrying Out the Invention

[0021] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0022] First, the terms used in the following description will be explained.

[0023] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as a "processor") may be one arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0024] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory that temporarily stores information and is used as work memory by the processor.

[0025] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0026] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0027] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."

[0028] [First Embodiment]

[0029] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0030] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0031] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0032] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0033] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0034] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0035] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0036] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0037] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0038] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0039] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0040] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0041] The present invention provides personalized information services by utilizing visual and auditory data obtained through the user's daily life. This system consists of smart glasses, which are a device worn by the user, and a server that processes the data. The functions of each component are described below.

[0042] Data acquisition and preprocessing:

[0043] As users wear smart glasses and go about their daily lives, the device continuously collects visual and audio data using its camera and microphone. The device processes this information in real time, filtering out unnecessary data and anonymizing it. This reduces the amount of data collected while protecting the user's privacy.

[0044] Data transfer and analysis:

[0045] Pre-processed data is transmitted from the terminal to the server via a secure communication channel. The server uses machine learning algorithms to analyze the data and estimate the user's interests and behavioral patterns. Specifically, it uses image recognition technology to identify specific objects and brands from visual data, and keywords extracted from audio data to understand the user's areas of interest.

[0046] Information generation and provision:

[0047] Based on the analysis results, the server generates information relevant to the user. This information includes news, product information, and event announcements that are likely to be of interest to the user. The generated information is sent to the terminal and displayed as an overlay in the user's field of view. This allows the user to obtain the information they need in real time.

[0048] Memory supplementation and recommendations:

[0049] The server accumulates and analyzes users' past behavioral data to provide information that complements the user's memory and offers helpful recommendations for the future. For example, it can compile information about places users frequently visit and products they browse, and notify them when they visit those places again.

[0050] In this way, the integrated functioning of users, terminals, and servers enables an innovative system that allows for continuous and personalized information delivery. For example, when a user is attending an exhibition, the terminal can display information on booths and products that might interest them in real time, supporting efficient browsing.

[0051] The following describes the processing flow.

[0052] Step 1:

[0053] The device continuously collects visual and audio data through the smart glasses worn by the user. The camera captures objects and text within the user's field of view, and the microphone records ambient sounds. This data is collected in real time and prepared for initial filtering.

[0054] Step 2:

[0055] The device performs noise reduction and anonymization for privacy protection on the collected raw data. Specifically, it uses facial recognition technology to blur personally identifiable faces and removes unwanted noise from audio data. Unnecessary information from visual data is also filtered out.

[0056] Step 3:

[0057] The terminal efficiently compresses the pre-processed data and sends it to the server using a secure communication protocol. This step involves encryption to ensure the security of the communication.

[0058] Step 4:

[0059] The server inputs the received data into a machine learning model to analyze the user's interests and behavioral tendencies. It uses image recognition to identify specific objects and brands from visual data, and extracts important keywords from audio data. This analysis is then used to identify the user's interests.

[0060] Step 5:

[0061] The server generates information that is likely to be of interest to the user based on the analysis results. Here, it selects relevant news articles, new product information, event announcements, etc., while taking into account the user's past data and areas of interest.

[0062] Step 6:

[0063] The device receives information sent from the server and overlays it on the smart glasses' display. This information is displayed naturally along the user's line of sight, and the user can hide the information if needed.

[0064] Step 7:

[0065] The server accumulates data on user behavior and interests over the long term, generating recommendation information to supplement memories and predict future behavior. This information is presented to the user as needed to support their daily decision-making.
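As an illustration of the compression in Step 3 above, the following is a minimal sketch, assuming the preprocessed data can be represented as a JSON-serializable record. The function names `pack_payload` and `unpack_payload` and the record fields are hypothetical and do not appear in the disclosure; zlib is only one possible compression scheme, and the encryption mentioned in Step 3 is not covered here.

```python
import json
import zlib


def pack_payload(record: dict) -> bytes:
    """Serialize a preprocessed record to JSON and compress it with zlib."""
    raw = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=9)


def unpack_payload(blob: bytes) -> dict:
    """Server-side inverse: decompress and parse the JSON record."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical preprocessed record; field names are illustrative only.
record = {
    "keywords": ["exhibition", "camera"] * 20,
    "objects": ["booth", "poster"] * 20,
}
blob = pack_payload(record)
```

In this sketch the terminal would call `pack_payload` before transmission and the server would call `unpack_payload` on receipt.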

[0066] (Example 1)

[0067] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0068] In recent years, there has been a growing demand for personalized information services that enrich and streamline users' life experiences. However, existing technologies struggle to collect information from diverse data sources in real time and process and analyze it while protecting user privacy. This presents a challenge in accurately understanding users' interests and behavioral patterns and providing appropriate information.

[0069] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0070] In this invention, the server includes means for collecting visual and auditory information obtained from the user's daily life, means for processing the collected information in real time to anonymize it and remove unnecessary information, and means for transmitting the pre-processed information to a central processing unit using a secure communication path. This makes it possible to provide personalized information services to the user in real time while ensuring the user's privacy.

[0071] "Visual information" refers to image and video data acquired through cameras and optical sensors.

[0072] "Auditory information" refers to sound data acquired through microphones or sound sensors.

[0073] "Anonymization" refers to the process of removing or concealing information that could identify an individual from individual data.

[0074] "Filtering" is the process of removing unnecessary or noisy information from collected data.

[0075] A "communication path" is a physical or virtual route used to move information from one point to another.

[0076] A "central processing unit" refers to the main computing system used for data analysis and execution of instructions.

[0077] A "machine learning algorithm" is a method that learns patterns through data analysis and makes predictions and decisions when applied to future data.

[0078] "Behavioral history" refers to a record of a user's past actions and choices.

[0079] "Recommendation" refers to suggestions aimed at presenting appropriate information and options based on a user's past behavior and interests.

[0080] This invention is a personalized information delivery system designed to improve the user's life experience. The system consists of a terminal, such as smart glasses, worn by the user, and a server that handles data processing.

[0081] The device is equipped with a camera and microphone, and acquires visual and auditory information from the user's daily life. This makes it possible to collect diverse data about the user's surrounding environment. The collected data is processed in real time within the device, and filtering and anonymization are applied as needed. At this stage, unnecessary data and information related to the user's privacy are removed.

[0082] Data processed in real time is transmitted to the server via a secure communication channel. The server uses various machine learning algorithms, including generative AI models, to analyze the received data. This allows for the estimation of user interests and behavioral patterns. For example, image recognition technology is used to recognize specific objects from visual information, while natural language processing is used to extract keywords from audio information.

[0083] Based on user profiles obtained through data analysis, the server generates personalized information for each user. This information includes news, product information, and event announcements. The generated information is sent to the terminal and displayed as an overlay in the user's field of view, allowing the user to obtain the necessary information in real time.

[0084] Furthermore, the server accumulates the user's past behavioral history and provides recommendation information that can be used as a reference for the future. For example, based on information about the user's favorite restaurants and frequently purchased products, it can suggest new services and products that are highly relevant.

[0085] For example, when a user is visiting a museum and viewing a particular painting, providing real-time information about the history and artist associated with that painting can support a deeper understanding and experience.

[0086] An example of a prompt for a generative AI model might be: "When a user is visiting a museum and viewing a particular painting, please provide real-time information about the history and artist associated with that painting."

[0087] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0088] Step 1: Data Collection

[0089] When a user puts on smart glasses, the device uses a camera to acquire visual information and a microphone to collect audio information. Specifically, the device acquires images and videos of the surrounding environment, as well as ambient sounds and conversations. At this stage, the input is the user's visual and auditory environment, and the output is raw image and audio files.

[0090] Step 2: Data Preprocessing

[0091] The device processes the collected visual and audio information in real time. Specifically, it uses facial recognition technology to blur the faces of people in the video and speech recognition to remove specific information (e.g., personal names). The input is the raw data obtained in step 1, and the output is anonymized data from which this personal information has been removed.
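The removal of personal names in Step 2 could be sketched as follows, assuming speech recognition has already produced a text transcript and that a list of names to hide is available. Both the `redact_names` function and the name list are assumptions made for illustration; the disclosure does not specify how names are detected.

```python
import re


def redact_names(transcript: str, known_names: list[str]) -> str:
    """Replace each listed name in the transcript with a fixed placeholder."""
    if not known_names:
        return transcript
    # Longest names first so "Taro Yamada" is matched before "Taro".
    ordered = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub("[NAME]", transcript)


text = "Taro Yamada said Hanako will join the exhibition tomorrow."
clean = redact_names(text, ["Taro Yamada", "Hanako", "Taro"])
# clean == "[NAME] said [NAME] will join the exhibition tomorrow."
```

A fixed list is the simplest stand-in; a practical system would more likely rely on named-entity recognition to find names it has not seen before.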

[0092] Step 3: Data Transfer

[0093] The terminal sends pre-processed data to the server via a secure communication path. Specifically, it transfers data using encryption protocols such as SSL/TLS. The input is anonymized data, and the output is anonymized data that the server can access.
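One way the terminal side of such a secure path might be configured is sketched below with Python's standard `ssl` module. The helper name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions for illustration, not requirements stated in the disclosure.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks."""
    context = ssl.create_default_context()  # verifies certificates by default
    # Refuse legacy protocol versions; TLS 1.2 as the floor is an assumption.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

The resulting context would then be used to wrap the terminal's socket, for example with `context.wrap_socket(sock, server_hostname=...)`, before any anonymized data is sent.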

[0094] Step 4: Data Analysis

[0095] The server executes machine learning algorithms to analyze the received data. The input consists of anonymized visual and audio data. Image recognition and natural language processing are used to identify specific objects and brands from the visual data and extract keywords from the audio data. The output is profile information about the user's interests and behavioral patterns.
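The keyword extraction part of Step 4 can be illustrated with a deliberately simple stand-in: counting word frequencies in a transcript. This is a sketch under the assumption that audio has already been transcribed to text. The disclosure refers to machine learning and natural language processing, which this frequency count does not implement, and `extract_keywords` and `STOP_WORDS` are hypothetical names.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "at", "for", "i", "this"}


def extract_keywords(transcript: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stop-word tokens in the transcript."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


transcript = "The camera at this booth is a new camera, and the lens for the camera is new."
keywords = extract_keywords(transcript, top_n=2)
# keywords == ["camera", "new"]
```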

[0096] Step 5: Information Generation

[0097] The server generates customized information for the user based on the analysis results. The input is the user's profile information, and the output includes personalized news, product information, and event announcements. This information is prioritized based on the user's interests.
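The prioritization described in Step 5 might look like the following sketch, which assumes the user profile is a mapping from interest tags to weights and that each candidate item carries a set of tags. The `Item` class, the `prioritize` function, and the scoring rule (sum of matching weights) are illustrative assumptions; the disclosure does not define a scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    tags: frozenset[str]


def prioritize(items: list[Item], interests: dict[str, float]) -> list[Item]:
    """Sort items by the summed interest weight of their tags, highest first."""
    def score(item: Item) -> float:
        return sum(interests.get(tag, 0.0) for tag in item.tags)
    # sorted() is stable, so items with equal scores keep their input order.
    return sorted(items, key=score, reverse=True)


interests = {"camera": 0.9, "music": 0.4}
items = [
    Item("Local concert tonight", frozenset({"music", "event"})),
    Item("New mirrorless camera released", frozenset({"camera", "product"})),
    Item("Road maintenance notice", frozenset({"traffic"})),
]
ranked = prioritize(items, interests)
```

With these example weights, the camera item ranks first, the concert second, and the unrelated notice last.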

[0098] Step 6: Information Provision

[0099] The server sends the generated information to the device, which then overlays the information on the smart glasses' display. Specifically, it displays details of events of interest or sale information directly within the user's field of view. The input is customized information from the server, and the output is information provided visually to the user.

[0100] Step 7: Memory Supplementation and Recommendation

[0101] The server accumulates past user behavior data and generates recommendations for future behavior through analysis. The input is the user's past behavior history, and the output is information to supplement that memory and recommendations for future behavior.
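A minimal sketch of the memory supplementation idea in Step 7 follows, assuming behavior history is reduced to a count of visits per place. The `VisitMemory` class and its repeat-visit threshold are hypothetical and chosen only to show how accumulated history could trigger a reminder.

```python
from collections import Counter
from typing import Optional


class VisitMemory:
    """Counts visits per place and produces a reminder on repeat visits."""

    def __init__(self, min_visits: int = 2) -> None:
        self.min_visits = min_visits  # threshold is an illustrative assumption
        self.visits = Counter()

    def record_visit(self, place: str) -> Optional[str]:
        """Record a visit; return a reminder once the place is a repeat."""
        self.visits[place] += 1
        count = self.visits[place]
        if count >= self.min_visits:
            return f"You have visited {place} {count} times."
        return None


memory = VisitMemory()
first = memory.record_visit("City Library")   # None: first visit
second = memory.record_visit("City Library")  # reminder text on the repeat visit
```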

[0102] (Application Example 1)

[0103] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0104] In modern urban life, there is a demand for efficient and personalized information. However, conventional information systems struggle to provide optimal information in real time based on users' interests and behavior. Therefore, there is a lack of systems that can effectively utilize information on public facility usage and events.

[0105] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0106] In this invention, the server includes a visual device means for collecting environmental data and audio information from the user, an information processing device means for pre-processing the collected data to anonymize and reduce its data size, and a system means that cooperates with a sensor network that provides real-time information on the usage status of public facilities. This enables users to receive personalized urban information in real time.

[0107] A "visual device for collecting environmental and audio information from users" is a terminal device worn by a user that records visual and audio information about the surroundings, and is a device that plays a role in data collection.

[0108] An "information processing device that pre-processes collected data to anonymize and reduce its size" is a device that processes data obtained through a visual device in real time, optimizing data capacity while protecting privacy.

[0109] An "analytical device that analyzes pre-processed data to estimate individual interests" is a device that analyzes users' interests and behavioral patterns from data and performs processing to meet specific needs.

[0110] A "video output device for displaying user-related information" is a display device that intuitively presents analyzed information to the user and transmits information through visual means.

[0111] An "information integration system that combines analyzed results with information from external databases" is an integrated system that combines information from existing databases to provide users with more detailed information.

[0112] A "system that works in conjunction with a sensor network to provide real-time information on the usage of public facilities" is a network system that monitors the movement of equipment and people within public facilities in real time and provides users with information on the current status of the facilities based on that information.

[0113] The system for realizing this invention consists of a visual device worn by the user, a server that processes information, and a sensor network that manages facility information. In this system, the user's visual device collects ambient environmental data and audio information. This collected data is immediately anonymized and efficiently subjected to data reduction processing.

[0114] The server analyzes collected data via an information processing device and uses machine learning algorithms to estimate user interests and behavioral patterns. This allows it to extract information on public facilities and events that users are likely to be interested in. This analysis runs on a cloud platform, specifically Microsoft Azure®.

[0115] Furthermore, the server works in conjunction with a sensor network to collect real-time information on the usage of public facilities and provides users with relevant information based on that data. For example, information on available seats at nearby libraries and details of ongoing events can be overlaid on the display of the user's visual device. This allows citizens to utilize urban resources more efficiently.

[0116] As a concrete example, when a citizen is walking through a major intersection, their visual device displays a notification stating, "A free music concert at the community hall starts at 6 PM." This notification is provided based on real-time information obtained from the sensor network in the area and the user's past interest data. Another example of a prompt is, "Based on my current location, please display information on the availability of nearby public facilities and events on my smart glasses," which requests that this information be provided to the user.

[0117] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0118] Step 1:

[0119] The user wears a visual device and goes about their daily life. The device acquires ambient environmental data and audio information through a camera and microphone. The input at this stage is visual and audio information of the environment, and the output is environmental data in digital format. The device uses a camera integrated with sensors to collect visual and audio data, and acquires this data as a primary record.

[0120] Step 2:

[0121] The device immediately preprocesses the collected visual and audio data. The input is the raw data collected in step 1. During processing, user privacy is protected by filtering the data to retain the important information and anonymizing it. The output is compressed and anonymized data. In this process, a dedicated processor on the device optimizes the amount of data using a data reduction algorithm.

[0122] Step 3:

[0123] The server receives pre-processed data sent from the terminal. The input is compressed and anonymized data, and a machine learning model is used to analyze the user's interests and behavioral patterns. The output is user-specific interest data. Specifically, a generative AI model on the server recognizes specific objects and events from visual data and explores the user's interests from audio data.

[0124] Step 4:

[0125] The server works in conjunction with the sensor network to acquire real-time usage information and event information for public facilities. The input is sensor data from the facilities, and the output is recommendation information based on user interests. In this step, the server aggregates cloud-based sensor data and analyzes usage patterns.

[0126] Step 5:

[0127] The server integrates analyzed user interest information with sensor network data and sends specific event information and recommendations to the user's visual device. The input is the integrated information, and the output is the visual information displayed to the user. The user's visual device uses overlay displays to present this information in real time, delivering it to the user in a visually easy-to-understand format.

[0128] Step 6:

[0129] Users make decisions in urban life based on information displayed on visual devices. The input is the information displayed by the visual device, and the output is the user's action choices. This process enables users to make convenient choices, such as using public facilities or participating in events.
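The integration performed in Steps 4 and 5 above, where user interest data is combined with facility usage data, can be sketched as a simple scoring rule. The scoring formula (interest multiplied by the fraction of capacity still free), the dictionary layout, and the function name `rank_recommendations` are assumptions made for illustration; the source does not specify how the two inputs are weighted.

```python
def rank_recommendations(interest_scores, facilities, top_n=2):
    """Rank facilities by (interest in topic) x (fraction of capacity still free)."""
    ranked = []
    for name, info in facilities.items():
        interest = interest_scores.get(info["topic"], 0.0)
        availability = 1.0 - info["occupied"] / info["capacity"]
        ranked.append((interest * availability, name))
    ranked.sort(reverse=True)
    # Drop facilities with a zero score (no interest or no free capacity).
    return [name for score, name in ranked[:top_n] if score > 0]


interest_scores = {"music": 0.9, "reading": 0.6}
facilities = {
    "community hall": {"topic": "music", "occupied": 50, "capacity": 100},
    "library": {"topic": "reading", "occupied": 10, "capacity": 100},
    "pool": {"topic": "swimming", "occupied": 0, "capacity": 50},
}
recommended = rank_recommendations(interest_scores, facilities)
print(recommended)
```

Under these assumed numbers the nearly empty library outranks the half-full community hall even though the user's interest in music is higher, which shows how real-time usage data can change the recommendation.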

[0130] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0131] This invention is an information system centered around smart glasses used by users on a daily basis. It recognizes the user's emotions based on visual and auditory data and utilizes this to provide personalized information. In addition to a terminal (smart glasses) and a server, this system includes an emotion engine for emotion recognition.

[0132] Data collection and emotion recognition:

[0133] By wearing smart glasses, users collect visual and auditory data in various everyday situations. The device uses a camera to capture the user's facial expressions and surroundings as visual data, and a microphone to record the user's voice and surrounding sounds. The emotion engine analyzes this visual and auditory data in real time to estimate the user's emotional state.

[0134] Data processing and analysis:

[0135] The collected data undergoes noise reduction and necessary anonymization processing on the terminal before being transmitted to the server via secure communication. On the server, machine learning models analyze the data to identify user interests and behavioral tendencies, and process the information while considering the emotional state provided by the emotion engine.

[0136] Information generation and presentation:

[0137] The user's emotional information, obtained through the emotion engine, is integrated into the information generation process on the server. The server dynamically generates information tailored to the user's interests and emotional state, determining personalized content. For example, if the user is feeling stressed, information related to relaxation will be prioritized and presented.
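The prioritization described in paragraph [0137] can be illustrated with a short sketch. The emotion-to-category table, the tuple layout of the candidate items, and the function name `prioritize` are hypothetical; the source states only that relaxation-related information is prioritized when the user is stressed.

```python
def prioritize(items, emotion):
    """Order candidate items so those suited to the emotion come first.

    `items` is a list of (title, category, relevance) tuples; the
    emotion-to-category table below is an illustrative assumption.
    """
    preferred = {"stressed": "relaxation", "calm": "trends", "anxious": "reviews"}
    boost_category = preferred.get(emotion)
    # The boosted category sorts first; ties are broken by descending relevance.
    return sorted(items, key=lambda it: (it[1] != boost_category, -it[2]))


candidates = [
    ("Flash sale downtown", "trends", 0.8),
    ("Five-minute breathing exercise", "relaxation", 0.5),
    ("Quiet park nearby", "relaxation", 0.7),
]
ordered = [title for title, _, _ in prioritize(candidates, "stressed")]
print(ordered)
```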

[0138] Display and Feedback:

[0139] The device overlays processed information onto the smart glasses' display. This display adjusts appropriately according to the user's gaze and facial expressions, optimizing the amount and type of information provided. By continuously monitoring the user's emotions, the system strives to provide the most relevant information at all times to meet the user's needs.

[0140] As a concrete example of the present invention, if a user is shopping in a mall and has a calm expression, the system will present the latest trend information. On the other hand, if the user is feeling anxious when deciding to make a purchase, the system will quickly display additional product reviews and price information based on feedback from the emotion engine to help them make a purchase decision.

[0141] The following describes the processing flow.

[0142] Step 1:

[0143] The device collects visual and auditory data using its camera and microphone while the user is wearing the smart glasses. The camera captures the user's facial expressions and gaze direction, while the microphone picks up voice and ambient sounds. This data is prepared as input for the emotion engine.

[0144] Step 2:

[0145] The emotion engine analyzes visual and auditory data supplied from the device to estimate the user's current emotional state in real time. It uses facial recognition technology to analyze facial features and extracts emotional changes from voice tone and speaking style.

[0146] Step 3:

[0147] The device transmits the emotion estimation results from the emotion engine to the server via a secure channel. During this process, the data is anonymized and compressed to ensure efficient and secure communication.

[0148] Step 4:

[0149] The server analyzes the received emotion data by integrating it with previously accumulated data on user interests and behavioral tendencies. Based on the results of this analysis, it generates information content that corresponds to the user's current needs and interests.

[0150] Step 5:

[0151] The server sends information to the terminal that takes the user's emotional state into consideration. This information includes relaxation content to reduce stress and engaging entertainment information tailored to the user's emotions.

[0152] Step 6:

[0153] The device overlays information onto the smart glasses' display. The displayed information is presented in an optimal format, taking into account the user's gaze and posture, and the content is adjusted according to their emotions.

[0154] Step 7:

[0155] If the user dismisses the displayed information or the user's emotions change, the device immediately feeds this change back to the emotion engine and the server. This feedback is used to optimize future information presentation.
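The feedback in Step 7 can be sketched as a small update rule. The source does not say how feedback is used to optimize later presentations, so the exponential moving average below, the default weight of 0.5, and the name `update_weights` are all assumptions chosen for illustration.

```python
def update_weights(weights, category, dismissed, rate=0.2):
    """Move a category weight toward 0 when dismissed, toward 1 when kept."""
    target = 0.0 if dismissed else 1.0
    updated = dict(weights)                 # leave the caller's dict untouched
    current = updated.get(category, 0.5)    # unseen categories start neutral
    updated[category] = current + rate * (target - current)
    return updated


weights = {"entertainment": 0.5, "relaxation": 0.5}
weights = update_weights(weights, "entertainment", dismissed=True)
weights = update_weights(weights, "relaxation", dismissed=False)
print(weights)
```

A category that the user repeatedly dismisses drifts toward zero and would be shown less often, while a category the user keeps on screen drifts toward one.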

[0156] (Example 2)

[0157] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0158] Traditional systems have a problem in that they do not adequately provide information tailored to the user's emotional state or specific situation. Therefore, it is difficult to present the content users want in a timely and appropriate manner. As a result, information does not align with user needs, leading to decreased satisfaction. Furthermore, from a privacy protection perspective, the anonymization and secure handling of collected data are of paramount importance.

[0159] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0160] In this invention, the server includes means for processing visual and auditory data collected from the user and analyzing emotions using an emotion recognition engine and a machine learning model; means for generating personalized information based on the user's emotional state using a generative AI model; and means for presenting dynamically adjusted information to the user using a display device. This makes it possible to provide personalized information to the user in real time and improve user satisfaction.

[0161] A "terminal" is a device worn by a user to collect visual and auditory data.

[0162] "Noise reduction" is a process that removes unwanted sounds and images from collected data to clarify important information.

[0163] "Anonymization" is a process that removes or transforms personally identifiable information to protect data privacy.

[0164] A "server" is a central computing system that receives data sent from terminals and performs analysis and information generation.

[0165] An "emotion recognition engine" is software or a system that estimates a user's emotional state based on visual and auditory data.

[0166] A "machine learning model" is an algorithm or system used to analyze collected data and estimate user interests and behaviors.

[0167] A "generative AI model" is a model that applies artificial intelligence technology to generate appropriate information based on the user's emotional state and interests.

[0168] A "display device" is a screen or display used to visually present generated information to a user.

[0169] "Overlay display" is a technique that displays information superimposed on the user's field of view, and is used to improve the user experience.

[0170] This invention is an information system built around smart glasses worn by the user. The system primarily consists of smart glasses, a server, and an emotion recognition engine.

[0171] By wearing smart glasses, users collect visual and auditory data. Specifically, the smart glasses utilize built-in cameras and microphones to collect the user's facial expressions and surrounding sounds, processing the data in real time. The collected data undergoes noise reduction and anonymization processing within the device. This processing improves data quality while protecting user privacy.

[0172] Pre-processed data is transmitted to the server via a secure communication protocol. The server uses an emotion recognition engine to analyze the data and estimate the user's emotional state. Based on this analysis, the server uses a generative AI model to generate information that matches the user's interests and emotions. For example, if the user is feeling stressed, the server will prioritize generating relaxing content.

[0173] The generated information is transferred from the server to the smart glasses' display device. The smart glasses overlay the information and dynamically adjust the displayed information according to the user's gaze and facial expressions. This enables the provision of personalized information that is tailored to the user's situation and emotions.

[0174] For example, if a user is calm and relaxed in a cafe, the server can generate and display information about nearby events and recommended reading lists. Furthermore, if a user feels anxious while shopping, the system can support their purchase decision by quickly providing additional product reviews and pricing information based on their emotions.

[0175] An example of a prompt from a generative AI model is, "How should information about relaxation be presented when a user is feeling stressed?"
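A pre-configured prompt of the kind quoted in paragraph [0175] could be assembled from the estimated emotion as in the following sketch. The template text, the `INSTRUCTIONS` table, and the `build_prompt` helper are hypothetical; only the stressed-user question is taken from the text, and no particular generative AI service or API is assumed.

```python
PROMPT_TEMPLATE = (
    "The user currently appears to be {emotion}. "
    "Their recent interests are: {interests}. "
    "{instruction}"
)

# Hypothetical mapping from an estimated emotion to a pre-configured instruction.
INSTRUCTIONS = {
    "stressed": "How should information about relaxation be presented?",
    "anxious": "Summarize product reviews and price information briefly.",
}
DEFAULT_INSTRUCTION = "Suggest nearby events and a recommended reading list."


def build_prompt(emotion, interests):
    """Fill the template with the emotion, sorted interests, and instruction."""
    instruction = INSTRUCTIONS.get(emotion, DEFAULT_INSTRUCTION)
    return PROMPT_TEMPLATE.format(
        emotion=emotion,
        interests=", ".join(sorted(interests)),
        instruction=instruction,
    )


print(build_prompt("stressed", {"music", "cafes"}))
```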

[0176] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0177] Step 1:

[0178] When a user puts on smart glasses, the device collects the user's visual and audio data. Specifically, it uses a camera to capture the surroundings and the user's facial expressions, and a microphone to record audio. The inputs include real-time acquired visual and audio data. Using this data, the device performs noise reduction processing to output clean data that eliminates unnecessary information. This processing removes background sounds and deletes meaningless pixels from the video.

[0179] Step 2:

[0180] The device anonymizes the data preprocessed in Step 1. The input consists of de-noised visual and audio data. The device hides or transforms specific parts of the information to prevent personal identification, outputting a new, privacy-protected dataset. This process employs techniques such as applying filters to generalize facial features and removing personal names from audio data.

[0181] Step 3:

[0182] The terminal sends anonymized data to the server via a secure protocol (e.g., HTTPS). The input for transmission is the anonymized data obtained in step 2. The output is in the form of data communication packets received by the server. This transmission and reception process is encrypted to prevent data leakage or interception.

[0183] Step 4:

[0184] The server analyzes the received data. The input data consists of anonymized visual and audio data sent from the terminal. The server uses an emotion recognition engine to estimate the user's emotional state. This analysis outputs information about the user's emotions. In this analysis step, image recognition algorithms and natural language processing techniques are applied to identify emotions from the user's facial expressions and tone of voice.

[0185] Step 5:

[0186] The server utilizes the analyzed emotional information and generates user-specific information using a generative AI model. The input data is the emotional information from step 4. The output is personalized information tailored to the user's emotional state. For example, if the user is anxious, content related to relaxation will be generated. In this step, pre-configured prompts are used to instruct the generative AI model to generate information.

[0187] Step 6:

[0188] Information generated from the server is sent to the terminal, which then overlays that information onto the smart glasses' display. The input sent to the terminal is the generated personalized information, and the output is the information displayed within the user's field of vision. In this overlay display, the display position and content are dynamically adjusted according to the user's gaze and facial expressions, providing a comfortable user experience.
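Step 2 above mentions removing personal names from audio data. Assuming the audio has already been transcribed to text and that a list of names to suppress is available, the removal could look like the following sketch. The `redact_names` helper, the placeholder string, and the idea of a known-names list are assumptions; the source does not describe how names are detected.

```python
import re


def redact_names(transcript, known_names, placeholder="[NAME]"):
    """Replace each known personal name with a placeholder (case-insensitive)."""
    if not known_names:
        return transcript
    # Longest names first so "Taro Yamada" wins over "Taro".
    alternatives = sorted((re.escape(n) for n in known_names), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(placeholder, transcript)


text = "Taro Yamada said he would meet Hanako at the cafe."
redacted = redact_names(text, ["Taro", "Taro Yamada", "Hanako"])
print(redacted)
```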

[0189] (Application Example 2)

[0190] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0191] Modern consumers are required to select the most suitable products and services amidst information overload, and in particular, in the shopping experience at physical stores, it is crucial to present information that takes into account individual needs and emotional states. However, current technology makes it difficult to accurately grasp a user's emotional state and present the most appropriate information in real time. Therefore, there is an urgent need to develop a system that effectively provides personalized information by utilizing user emotional information.

[0192] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0193] In this invention, the server includes information processing means for collecting visual and auditory information from the user, information processing means for pre-processing, anonymizing, and filtering the collected information, and information analysis system means for analyzing the pre-processed information to estimate the user's emotional state. This makes it possible to provide personalized information about products and services that is tailored to the user's emotional state.

[0194] An "information processing device for collecting visual and auditory information from a user" refers to a device that acquires audiovisual information such as a user's facial expressions and voice, and has the function of collecting information in everyday situations.

[0195] An "information processing device that pre-processes, anonymizes, and filters collected information" is a device that removes personally identifiable elements from acquired information and converts it into an appropriate format, playing a role in extracting only useful information while protecting data privacy.

[0196] An "information analysis system that analyzes pre-processed information to estimate a user's emotional state" is a system that accurately estimates a user's emotions based on pre-processed information, and has the function of analyzing emotional states using specific algorithms or machine learning models.

[0197] A "display device for presenting relevant product or service information based on the user's emotional state" is a device that visually displays information about products and services to the user in a manner adapted to the analyzed emotional state, enabling real-time information provision.

[0198] An "information integration system that combines analyzed emotional states and estimation results with supplementary information obtained from other information storage devices" is a system that integrates the results of emotional analysis with data obtained from external information sources and processes them comprehensively, possessing the function of integrating the information necessary to provide users with the most optimal information.

[0199] In embodiments of the present invention, the information processing system is primarily composed of smart glasses worn by the user. The smart glasses are equipped with a camera to acquire visual information and a microphone to acquire audio information, and function as an "information processing device for collecting visual and audio information from the user." The device constantly acquires the user's visual and audio data, and the collected information is first pre-processed, including noise reduction. At this time, anonymization and filtering are performed to remove personally identifiable elements, thereby protecting privacy.

[0200] Pre-processed data is sent to a cloud server, which uses a machine learning model to estimate the user's emotional state. Specifically, the server analyzes the acquired data using an "information analysis system that analyzes pre-processed information to estimate the user's emotional state," and generates personalized information based on the results.

[0201] Product and service information tailored to the user's emotional state is overlaid on the smart glasses' display. This display device operates as a "display device for presenting relevant product or service information based on the user's emotional state," dynamically responding to the user's gaze and facial expressions to present information.

[0202] Furthermore, the server integrates the emotional state and estimation results using an "information integration system that combines the analyzed emotional state and estimation results with supplementary information obtained from other information storage devices" to optimize information delivery. This includes commercial information such as new products and promotions. In this way, users can receive information in a manner that harmonizes with their own emotions, making the in-store shopping experience more personalized and enhanced.

[0203] For example, when a user is inspecting a sofa in a store, if the emotion engine determines that the user is "interested," customer reviews and promotional information for that sofa are displayed on the smart glasses. In addition, prompts such as "Please tell us what other users think of this product" and "Please suggest recommended product combinations" are given to the generative AI model to generate suggestions that improve the user experience.

[0204] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0205] Step 1:

[0206] The device collects visual and auditory information from the user. Using a camera and microphone, it captures the user's facial expressions and surrounding sounds, recording them as digital data. The input is the user's visual and auditory information, and the output is the digital data derived from this information.

[0207] Step 2:

[0208] The collected data is pre-processed on the terminal. Data accuracy and privacy are ensured by noise reduction and anonymization of personally identifiable elements. The input is the digital data obtained in step 1, and the output is filtered and anonymized clean data. This data processing is performed by a signal processing algorithm.

[0209] Step 3:

[0210] The terminal securely sends filtered data to the server. This data is encrypted and transferred to the server. The input is the clean data from step 2, and the output is the data after it has been transferred to the server.

[0211] Step 4:

[0212] The server analyzes the received data and uses a generative AI model to estimate the user's emotional state. The input is the data sent in step 3, and the output is information indicating the user's emotional state. This data processing is performed by an emotion analysis algorithm.

[0213] Step 5:

[0214] The server prepares data to generate and display information about relevant products and services based on the user's emotional state. The input is the emotional state information from step 4, and the output is information relevant to the user. For information generation, a generative AI model is used together with prompts such as "Please tell me what other users think of this product" and "Please suggest recommended product combinations."

[0215] Step 6:

[0216] The device overlays information received from the server onto the smart glasses. The display position and content of the information are dynamically adjusted based on the user's gaze and actions. The input is the information prepared in step 5, and the output is what is displayed on the smart glasses. This operation is performed using eye-tracking and display technologies.
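The gaze-dependent positioning in Step 6 can be illustrated with simple geometry. The source states only that the display position is adjusted dynamically based on the user's gaze, so the placement rule below (beside the gaze point, flipped and clamped at the display edges), the pixel coordinate system, and the name `place_overlay` are assumptions.

```python
def place_overlay(gaze_x, gaze_y, box_w, box_h, screen_w, screen_h, offset=40):
    """Return the top-left corner of an overlay box placed beside the gaze point.

    The box is shifted right of the gaze by `offset` pixels so it does not
    cover what the user is looking at, then clamped to the display bounds.
    """
    x = gaze_x + offset
    if x + box_w > screen_w:          # no room on the right: flip to the left
        x = gaze_x - offset - box_w
    y = gaze_y - box_h // 2           # vertically centered on the gaze
    x = max(0, min(x, screen_w - box_w))
    y = max(0, min(y, screen_h - box_h))
    return x, y


center_case = place_overlay(100, 300, 200, 80, 1280, 720)
edge_case = place_overlay(1200, 10, 200, 80, 1280, 720)
print(center_case, edge_case)
```

In the second call the gaze is near the top-right corner, so the box flips to the left of the gaze point and is clamped to the top edge of the display.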

[0217] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0218] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0219] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0220] [Second Embodiment]

[0221] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0222] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0223] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0224] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0225] The microphone 238 accepts instructions from the user 20 by receiving the voice uttered by the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0226] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0227] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
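Paragraph [0227] does not say how the exchange is secured. As one illustrative possibility, the sketch below compresses a payload and attaches an HMAC tag so the receiver can detect tampering. It provides integrity checking only, not confidentiality; in practice, encryption would typically come from the transport layer (paragraph [0182] names HTTPS as an example). The pre-shared key, the message layout, and the names `pack` and `unpack` are hypothetical.

```python
import hashlib
import hmac
import json
import zlib

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical pre-shared key


def pack(payload: dict) -> bytes:
    """Terminal side: serialize, compress, and prepend a 32-byte HMAC tag."""
    body = zlib.compress(json.dumps(payload, sort_keys=True).encode("utf-8"))
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).digest()
    return tag + body


def unpack(message: bytes) -> dict:
    """Server side: verify the tag before decompressing the body."""
    tag, body = message[:32], message[32:]
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message failed integrity check")
    return json.loads(zlib.decompress(body).decode("utf-8"))


message = pack({"emotion": "calm", "keywords": ["cafe", "books"]})
print(unpack(message))
```

Verifying the tag before decompression means a modified message is rejected without its contents being processed.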

[0228] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0229] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0230] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0231] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0232] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0233] The present invention provides personalized information services by utilizing visual and auditory data obtained through the user's daily life. This system consists of smart glasses, which are a device worn by the user, and a server that processes the data. The functions of each component are described below.

[0234] Data acquisition and preprocessing:

[0235] As users wear smart glasses and go about their daily lives, the device continuously collects visual and audio data using its camera and microphone. The device processes this information in real time, filtering out unnecessary data and anonymizing it. This reduces the amount of data collected while protecting the user's privacy.

[0236] Data transfer and analysis:

[0237] Pre-processed data is transmitted from the terminal to the server via a secure communication channel. The server uses machine learning algorithms to analyze the data and estimate the user's interests and behavioral patterns. Specifically, it uses image recognition technology to identify specific objects and brands from visual data, and keywords extracted from audio data to understand the user's areas of interest.
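The keyword extraction mentioned in paragraph [0237] can be sketched with a frequency count, assuming the audio has already been converted to a text transcript. The stopword list, the tokenization rule, and the name `top_keywords` are illustrative assumptions; the source does not specify the extraction method.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "to", "and", "of", "i", "is", "in", "for", "that", "about"}


def top_keywords(transcript, n=3):
    """Return the n most frequent non-stopword tokens in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


speech = "I love jazz. The jazz festival is in the park, and the park has a jazz stage."
keywords = top_keywords(speech, n=2)
print(keywords)
```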

[0238] Information generation and provision:

[0239] Based on the analysis results, the server generates information relevant to the user. This information includes news, product information, and event announcements that are likely to be of interest to the user. The generated information is sent to the terminal and displayed as an overlay in the user's field of view. This allows the user to obtain the information they need in real time.

[0240] Memory supplementation and recommendations:

[0241] The server accumulates and analyzes users' past behavioral data to provide information that complements the user's memory and offers helpful recommendations for the future. For example, it can compile information about places users frequently visit and products they browse, and notify them when they visit those places again.
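The revisit notification in paragraph [0241] can be pictured as a per-place visit counter. The `VisitMemory` class, the threshold of two visits, and the message wording are hypothetical and stand in for whatever accumulation and analysis the server actually performs.

```python
from collections import Counter


class VisitMemory:
    """Counts visits per place and reminds the user on a return visit."""

    def __init__(self, min_visits=2):
        self.visits = Counter()
        self.min_visits = min_visits

    def record(self, place):
        """Record a visit; return a reminder once the place is a frequent one."""
        self.visits[place] += 1
        count = self.visits[place]
        if count >= self.min_visits:
            return f"You have visited {place} {count} times."
        return None


memory = VisitMemory()
first = memory.record("bookstore")
second = memory.record("bookstore")
print(first, second)
```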

[0242] In this way, the integrated functioning of users, terminals, and servers enables an innovative system that allows for continuous and personalized information delivery. For example, when a user is attending an exhibition, the terminal can display information on booths and products that might interest them in real time, supporting efficient browsing.

[0243] The following describes the processing flow.

[0244] Step 1:

[0245] The device continuously collects visual and audio data through the smart glasses worn by the user. The camera captures objects and text within the user's field of view, and the microphone records ambient sounds. This data is collected in real time and prepared for initial filtering.

[0246] Step 2:

[0247] The device performs noise reduction and anonymization for privacy protection on the collected raw data. Specifically, it uses facial recognition technology to blur personally identifiable faces and removes unwanted noise from audio data. Unnecessary information from visual data is also filtered out.

[0248] Step 3:

[0249] The terminal efficiently compresses the pre-processed data and sends it to the server using a secure communication protocol. This step involves encryption to ensure the security of the communication.

[0250] Step 4:

[0251] The server inputs the received data into a machine learning model to analyze the user's interests and behavioral tendencies. It uses image recognition to identify specific objects and brands from visual data, and extracts important keywords from audio data. This analysis is then used to identify the user's interests.

[0252] Step 5:

[0253] The server generates information that is likely to be of interest to the user based on the analysis results. Here, it selects relevant news articles, new product information, event announcements, etc., while taking into account the user's past data and areas of interest.

[0254] Step 6:

[0255] The device receives information sent from the server and overlays it on the smart glasses' display. This information is displayed naturally along the user's line of sight, and the user can hide the information if needed.

[0256] Step 7:

[0257] The server accumulates data on user behavior and interests over the long term, generating recommendation information to supplement memories and predict future behavior. This information is presented to the user as needed to support their daily decision-making.
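The face blurring in Step 2 above can be illustrated on a tiny grayscale image. The sketch assumes that a separate face detector, not shown here, has already supplied the bounding box; it replaces the region with its mean value, which is a crude form of pixelation rather than any specific blurring method named in the source. The function name `blur_region` and the list-of-rows image format are assumptions.

```python
def blur_region(image, top, left, height, width):
    """Replace a rectangular region of a grayscale image with its mean value.

    `image` is a list of rows of integer pixel values; the region is assumed
    to come from a separate face detector, which is not shown here.
    """
    rows = range(top, top + height)
    cols = range(left, left + width)
    mean = sum(image[r][c] for r in rows for c in cols) // (height * width)
    blurred = [row[:] for row in image]   # copy so the input stays untouched
    for r in rows:
        for c in cols:
            blurred[r][c] = mean
    return blurred


frame = [
    [10, 10, 10, 10],
    [10, 50, 90, 10],
    [10, 70, 30, 10],
    [10, 10, 10, 10],
]
anonymized = blur_region(frame, top=1, left=1, height=2, width=2)
print(anonymized)
```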

[0258] (Example 1)

[0259] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0260] In recent years, there has been a growing demand for personalized information services that enrich and streamline users' life experiences. However, existing technologies struggle to collect information from diverse data sources in real time and process and analyze it while protecting user privacy. This presents a challenge in accurately understanding users' interests and behavioral patterns and providing appropriate information.

[0261] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0262] In this invention, the server includes means for collecting visual and auditory information obtained from the user's daily life, means for processing the collected information in real time to anonymize it and remove unnecessary information, and means for transmitting the pre-processed information to a central processing unit using a secure communication path. This makes it possible to provide personalized information services to the user in real time while ensuring the user's privacy.

[0263] "Visual information" refers to image and video data acquired through cameras and optical sensors.

[0264] "Auditory information" refers to sound data acquired through microphones or sound sensors.

[0265] "Anonymization" refers to the process of removing or concealing information that could identify an individual from individual data.

[0266] "Filtering" is the process of removing unnecessary or noisy information from collected data.

[0267] A "communication path" is a physical or virtual route used to move information from one point to another.

[0268] A "central processing unit" refers to the main computing system used for data analysis and execution of instructions.

[0269] A "machine learning algorithm" is a method that learns patterns by analyzing data and makes predictions and decisions about data it encounters later.

[0270] "Behavioral history" refers to a record of a user's past actions and choices.

[0271] "Recommendation" refers to suggestions aimed at presenting appropriate information and options based on a user's past behavior and interests.

[0272] This invention is a personalized information delivery system designed to improve the user's life experience. The system consists of a terminal, such as smart glasses, worn by the user, and a server that handles data processing.

[0273] The device is equipped with a camera and microphone, and acquires visual and auditory information from the user's daily life. This makes it possible to collect diverse data about the user's surrounding environment. The collected data is processed in real time within the device, and filtering and anonymization are applied as needed. At this stage, unnecessary data and information related to the user's privacy are removed.

[0274] Data processed in real time is transmitted to the server via a secure communication channel. The server uses various machine learning algorithms, including generative AI models, to analyze the received data. This allows for the estimation of user interests and behavioral patterns. For example, image recognition technology is used to recognize specific objects from visual information, while natural language processing is used to extract keywords from audio information.

[0275] Based on user profiles obtained through data analysis, the server generates personalized information for each user. This information includes news, product information, and event announcements. The generated information is sent to the terminal and displayed as an overlay in the user's field of view, allowing the user to obtain the necessary information in real time.

[0276] Furthermore, the server accumulates the user's past behavioral history and provides recommendation information that can be used as a reference for the future. For example, based on information about the user's favorite restaurants and frequently purchased products, it can suggest new services and products that are highly relevant.

[0277] As a specific example, when a user is visiting a museum and looking at a particular painting, it is possible to support a deeper understanding and experience by providing real-time information about the history and artist related to that painting.

[0278] An example prompt for the generative AI model is: "When the user is visiting a museum and looking at a particular painting, please provide real-time information about the history and artist related to that painting."

[0279] The flow of the specific process in Example 1 will be described using FIG. 11.

[0280] Step 1: Data collection

[0281] When the user wears smart glasses, the terminal uses the camera to acquire visual information and collects audio information through the microphone. Specifically, the terminal acquires images and videos of the surrounding environment, as well as surrounding sounds and conversations. The input at this stage is the user's visual and audio environment, and the output is image files and audio files as raw data.

[0282] Step 2: Data preprocessing

[0283] The terminal processes the collected visual and audio information in real time. Specifically, face recognition technology is used to blur the faces of people in the video, and speech recognition is used to omit specific information (such as personal names). The input is the raw data obtained in Step 1, and the output is anonymized data from which this personal information has been removed.
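
The name-omission part of this step can be illustrated with a minimal sketch. It assumes that speech recognition has already produced a text transcript and that a list of personal names to omit is available; the function name `redact_names`, the name list, and the `[NAME]` mask are hypothetical and do not appear in the specification.

```python
import re

# Hypothetical list of names detected upstream; how names are detected
# is not specified in the source text.
KNOWN_NAMES = ["Tanaka", "Suzuki"]


def redact_names(transcript: str, names: list[str], mask: str = "[NAME]") -> str:
    """Replace each whole-word occurrence of a listed name with a mask."""
    if not names:
        return transcript
    # Escape each name so it is matched literally, and use word boundaries
    # so that longer words containing a name are left untouched.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(mask, transcript)


print(redact_names("Tanaka said Suzuki will join at noon.", KNOWN_NAMES))
```

A transcript-level mask like this covers only the text; the corresponding audio segments would also have to be muted or removed, which is outside this sketch.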

[0284] Step 3: Data transfer

[0285] The terminal transmits the preprocessed data to the server via a secure communication path. Specifically, an encryption protocol such as SSL/TLS is used to transfer the data. The input is the anonymized data, and the output is the same anonymized data, now accessible to the server.
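
As one possible illustration of the secure communication path, the sketch below configures a client-side TLS context with Python's standard `ssl` module. The minimum version of TLS 1.2 and the function name `make_client_context` are assumptions; the specification names only "SSL/TLS" as an example protocol.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections by default.
    context = ssl.create_default_context()
    # Assumption: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

The terminal would pass such a context to whatever client library it uses to open the connection to the server.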

[0286] Step 4: Data Analysis

[0287] The server executes a machine learning algorithm to analyze the received data. The input is anonymized visual and audio data. Through image recognition and natural language processing, specific objects and brands are identified from the visual data, and keywords are extracted from the audio data. The output is profile information regarding the user's interests and behavior patterns.
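
The keyword-extraction part of this step can be sketched with a simple frequency count over a transcript. This is a deliberately simplified stand-in for the natural language processing mentioned above, not the method of the specification; the stop-word list and the function name `extract_keywords` are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "to", "and", "of", "is", "in", "i", "at", "for"}


def extract_keywords(transcript: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stop-words in the transcript."""
    words = re.findall(r"[a-z]+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


text = "The coffee at the coffee shop is great coffee and the shop is new"
print(extract_keywords(text, top_n=2))
```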

[0288] Step 5: Information Generation

[0289] Based on the analysis results, the server generates customized information to be provided to the user. The input is the user's profile information, and the output is personalized news, product information, event guides, etc. These pieces of information are prioritized based on the user's interests.
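
Prioritization by interest could, for example, be a sort by a score derived from the profile. The sketch below assumes that each candidate item carries topic tags and that the profile maps topics to weights; these data shapes and the function name `prioritize` are assumptions made for illustration.

```python
def prioritize(items: list[dict], interests: dict[str, float]) -> list[dict]:
    """Sort items by the summed interest weight of their tags, highest first."""

    def score(item: dict) -> float:
        # Tags absent from the profile contribute nothing to the score.
        return sum(interests.get(tag, 0.0) for tag in item["tags"])

    return sorted(items, key=score, reverse=True)


items = [
    {"title": "A", "tags": ["sports"]},
    {"title": "B", "tags": ["art", "music"]},
    {"title": "C", "tags": ["cooking"]},
]
profile = {"art": 0.8, "music": 0.5, "sports": 0.3}
print([item["title"] for item in prioritize(items, profile)])
```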

[0290] Step 6: Information Provision

[0291] The server transmits the generated information to the terminal, and the terminal overlays and displays the information on the display of the smart glasses. Specifically, details of interesting events and sales product information are directly displayed within the user's field of vision. The input is the customized information from the server, and the output is the information visually provided to the user.

[0292] Step 7: Memory Completion and Recommendation

[0293] The server accumulates the user's past behavior data and generates recommendations regarding future behavior through analysis. The input is the user's past behavior history, and the output is information for memory completion and recommendations for future actions.
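
One simple way to derive recommendations from behavior history is to find the category the user visits most often and suggest unvisited candidates from it. The sketch below assumes that history entries and candidates are (name, category) pairs; this representation and the function name `recommend` are hypothetical, and the specification does not fix a particular recommendation method.

```python
from collections import Counter


def recommend(history: list[tuple[str, str]],
              candidates: list[tuple[str, str]]) -> list[str]:
    """Suggest unvisited candidates from the user's most frequent category."""
    if not history:
        return []
    top_category = Counter(cat for _, cat in history).most_common(1)[0][0]
    visited = {name for name, _ in history}
    return [name for name, cat in candidates
            if cat == top_category and name not in visited]


history = [("Cafe A", "cafe"), ("Cafe B", "cafe"), ("Book Store", "books")]
candidates = [("Cafe A", "cafe"), ("Cafe C", "cafe"), ("Record Shop", "music")]
print(recommend(history, candidates))
```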

[0294] (Application Example 1)

[0295] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0296] In modern urban life, there is a demand for efficient and personalized information. However, conventional information systems struggle to provide optimal information in real time based on users' interests and behavior. As a result, systems that can effectively utilize information on public facility usage and events are lacking.

[0297] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0298] In this invention, the server includes a visual device means for collecting environmental data and audio information from the user, an information processing device means for pre-processing the collected data to anonymize and reduce its data size, and a system means that cooperates with a sensor network that provides real-time information on the usage status of public facilities. This enables users to receive personalized urban information in real time.

[0299] A "visual device for collecting environmental and audio information from users" is a terminal device worn by the user that records visual and audio information about the surroundings and is responsible for data collection.

[0300] An "information processing device that pre-processes collected data to anonymize and reduce its size" is a device that processes data obtained through a visual device in real time, optimizing data capacity while protecting privacy.

[0301] An "analytical device that analyzes pre-processed data to estimate individual interests" is a device that analyzes users' interests and behavioral patterns from data and performs processing to meet specific needs.

[0302] A "video output device for displaying user-related information" is a display device that intuitively presents analyzed information to the user and transmits information through visual means.

[0303] An "information integration system that combines the analyzed results with information in an external database" is an integration system for providing users with more detailed information by combining information from existing databases.

[0304] A "system that cooperates with a sensor network that provides real-time usage status of public facilities" is a network system that monitors the facilities and the movement of people in public facilities in real time and provides users with the current status of the facilities based on that information.

[0305] The system for realizing this invention is composed of a visual device worn by a user, a server that processes information, and a sensor network that manages facility information. In this system, the user's visual device collects ambient environmental data and voice information. Then, the collected data is immediately anonymized and efficiently subjected to data reduction processing.

[0306] The server analyzes the collected data via an information processing device and estimates the user's interests and behavior patterns using a machine learning algorithm. Thereby, public facilities and event information that the user is likely to be interested in are extracted. For this analysis, a cloud platform is used, and as a specific example, data analysis is performed using Microsoft Azure.

[0307] In addition, the server cooperates with the sensor network, collects the real-time usage status of public facilities, and provides information related to the user based on that. For example, on the display of the visual device that the user is viewing, the available seat information of a nearby library and the details of ongoing events are overlaid and displayed. Thereby, citizens can utilize urban resources more efficiently.

[0308] As a concrete example, when a citizen is walking through a major intersection, their visual device displays a notification stating, "A free music concert at the community hall starts at 6 PM." This notification is provided based on real-time information obtained from the sensor network in the area and the user's past interest data. An example of a prompt is, "Based on my current location, please display information on the availability of nearby public facilities and events on my smart glasses," which requests that this information be provided.

[0309] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0310] Step 1:

[0311] The user wears a visual device and goes about their daily life. The device acquires ambient environmental data and audio information through a camera and microphone. The input at this stage is visual and audio information of the environment, and the output is environmental data in digital format. The device uses a camera integrated with sensors to collect visual and audio data, and acquires this data as a primary record.

[0312] Step 2:

[0313] The device immediately preprocesses the collected visual and audio data. The input is the raw data collected in step 1, and user privacy is protected during processing by filtering the data down to the important information and anonymizing it. The output is compressed and anonymized data. In this process, a dedicated processor on the device optimizes the amount of data using a data reduction algorithm.
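
The specification does not name the data reduction algorithm. As a minimal sketch, assuming lossless compression of an already anonymized record is acceptable, the standard `zlib` module can be used as follows; the record layout and the function names `pack` and `unpack` are hypothetical.

```python
import json
import zlib


def pack(record: dict) -> bytes:
    """Serialize an anonymized record to JSON and compress it losslessly."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return zlib.compress(raw, level=9)


def unpack(payload: bytes) -> dict:
    """Inverse of pack(): decompress and parse the record."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical anonymized record with repetitive content.
record = {"labels": ["tree"] * 200, "keywords": ["park", "concert"]}
payload = pack(record)
print(len(json.dumps(record).encode("utf-8")), "->", len(payload), "bytes")
```

Lossy reduction such as downscaling images or lowering the frame rate would shrink the data further, at the cost of detail available to the later analysis.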

[0314] Step 3:

[0315] The server receives pre-processed data sent from the terminal. The input is compressed and anonymized data, and a machine learning model is used to analyze the user's interests and behavioral patterns. The output is user-specific interest data. Specifically, a generative AI model on the server recognizes specific objects and events from visual data and explores the user's interests from audio data.

[0316] Step 4:

[0317] The server works in conjunction with the sensor network to acquire real-time usage information and event information for public facilities. The input is sensor data from the facilities, and the output is recommendation information based on user interests. In this step, the server aggregates cloud-based sensor data and analyzes usage patterns.

[0318] Step 5:

[0319] The server integrates analyzed user interest information with sensor network data and sends specific event information and recommendations to the user's visual device. The input is the integrated information, and the output is the visual information displayed to the user. The user's visual device uses overlay displays to present this information in real time, delivering it to the user in a visually easy-to-understand format.
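
The integration of interest information with sensor network data can be sketched as a filter over facility status records. The `FacilityStatus` fields, the category labels, and the function name `build_notifications` are assumptions made for illustration; the specification does not define the format of the sensor data.

```python
from dataclasses import dataclass


@dataclass
class FacilityStatus:
    """Hypothetical real-time status reported by the sensor network."""
    name: str
    category: str
    free_seats: int


def build_notifications(statuses: list[FacilityStatus],
                        interests: set[str]) -> list[str]:
    """Keep facilities that match the user's interests and have free seats."""
    return [
        f"{s.name}: {s.free_seats} seats available"
        for s in statuses
        if s.category in interests and s.free_seats > 0
    ]


statuses = [
    FacilityStatus("City Library", "library", 12),
    FacilityStatus("Community Hall", "music", 0),
    FacilityStatus("Gym", "sports", 5),
]
print(build_notifications(statuses, {"library", "music"}))
```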

[0320] Step 6:

[0321] Users make decisions in urban life based on information displayed on visual devices. The input is the information displayed by the visual device, and the output is the user's action choices. This process enables users to make convenient choices, such as using public facilities or participating in events.

[0322] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0323] This invention is an information system centered around smart glasses used by users on a daily basis. It recognizes the user's emotions based on visual and auditory data and utilizes this to provide personalized information. In addition to a terminal (smart glasses) and a server, this system includes an emotion engine for emotion recognition.

[0324] Data collection and emotion recognition:

[0325] By wearing smart glasses, users collect visual and auditory data in various everyday situations. The device uses a camera to capture the user's facial expressions and surroundings as visual data, and a microphone to record the user's voice and surrounding sounds. The emotion engine analyzes this visual and auditory data in real time to estimate the user's emotional state.

[0326] Data processing and analysis:

[0327] The collected data undergoes noise reduction and necessary anonymization processing on the terminal before being transmitted to the server via secure communication. On the server, machine learning models analyze the data to identify user interests and behavioral tendencies, and process the information while considering the emotional state provided by the emotion engine.

[0328] Information generation and presentation:

[0329] The user's emotional information, obtained through the emotion engine, is integrated into the information generation process on the server. The server dynamically generates information tailored to the user's interests and emotional state, determining personalized content. For example, if the user is feeling stressed, information related to relaxation will be prioritized and presented.

[0330] Display and Feedback:

[0331] The device overlays processed information onto the smart glasses' display. This display adjusts appropriately according to the user's gaze and facial expressions, optimizing the amount and type of information provided. By continuously monitoring the user's emotions, the system strives to provide the most relevant information at all times to meet the user's needs.

[0332] As a concrete example of the present invention, if a user is shopping in a mall and has a calm expression, the system will present the latest trend information. On the other hand, if the user is feeling anxious when deciding to make a purchase, the system will quickly display additional product reviews and price information based on feedback from the emotion engine to help them make a purchase decision.
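
The emotion-dependent selection in these examples can be summarized as a lookup from the estimated emotion to the types of content to present. The emotion labels follow the examples in the text, while the content-type identifiers, the fallback value, and the function name `select_content` are hypothetical.

```python
# Mapping based on the examples in the text: calm -> trend information,
# anxious -> product reviews and price information, stressed -> relaxation.
EMOTION_TO_CONTENT = {
    "calm": ["trend_info"],
    "anxious": ["product_reviews", "price_info"],
    "stressed": ["relaxation"],
}


def select_content(emotion: str) -> list[str]:
    """Return the content types to prioritize for the estimated emotion."""
    # Assumption: fall back to general information for unlisted emotions.
    return EMOTION_TO_CONTENT.get(emotion, ["general_info"])


print(select_content("anxious"))
```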

[0333] The following describes the processing flow.

[0334] Step 1:

[0335] The device collects visual and auditory data using its camera and microphone while the user is wearing the smart glasses. The camera captures the user's facial expressions and gaze direction, while the microphone picks up voice and ambient sounds. This data is prepared as input for the emotion engine.

[0336] Step 2:

[0337] The emotion engine analyzes visual and auditory data supplied from the device to estimate the user's current emotional state in real time. It uses facial recognition technology to analyze facial features and extracts emotional changes from voice tone and speaking style.
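
How the facial and vocal cues are combined is not described. One common approach, shown here only as an assumption, is a weighted average of per-emotion scores from each modality; the score dictionaries, the weight of 0.6, and the function name `fuse_emotions` are hypothetical.

```python
def fuse_emotions(face_scores: dict[str, float],
                  voice_scores: dict[str, float],
                  face_weight: float = 0.6) -> str:
    """Combine per-emotion scores from face and voice; return the top label."""
    # Sort the labels so that the result is deterministic.
    labels = sorted(set(face_scores) | set(voice_scores))
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    return max(fused, key=fused.get)


face = {"calm": 0.7, "anxious": 0.3}
voice = {"calm": 0.1, "anxious": 0.9}
print(fuse_emotions(face, voice))
```

The per-modality scores would come from the facial-expression and voice-tone analyses described above, which this sketch does not implement.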

[0338] Step 3:

[0339] The device transmits the emotion estimation results from the emotion engine to the server via a secure channel. During this process, the data is anonymized and compressed to ensure efficient and secure communication.

[0340] Step 4:

[0341] The server analyzes the received emotion data by integrating it with previously accumulated data on user interests and behavioral tendencies. Based on the results of this analysis, it generates information content that corresponds to the user's current needs and interests.

[0342] Step 5:

[0343] The server sends information to the terminal that takes the user's emotional state into consideration. This information includes relaxation content to reduce stress and engaging entertainment information tailored to the user's emotions.

[0344] Step 6:

[0345] The device overlays information onto the smart glasses' display. The displayed information is presented in an optimal format, taking into account the user's gaze and posture, and the content is adjusted according to their emotions.

[0346] Step 7:

[0347] If a user clears the displayed information or their emotions change, the device immediately feeds this change back to the emotion engine and server. This feedback is used to optimize future information presentations.
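
The feedback loop can be sketched as a small update to per-category weights whenever the user dismisses or keeps a displayed item. The update rule (moving the weight a fixed fraction toward 0 or 1), the default weight of 0.5, and the function name `apply_feedback` are assumptions; the specification states only that feedback is used to optimize future presentations.

```python
def apply_feedback(weights: dict[str, float], category: str,
                   dismissed: bool, rate: float = 0.2) -> dict[str, float]:
    """Move a category weight toward 0 on dismissal, toward 1 otherwise."""
    target = 0.0 if dismissed else 1.0
    current = weights.get(category, 0.5)
    updated = dict(weights)  # leave the caller's dictionary unchanged
    updated[category] = current + rate * (target - current)
    return updated


weights = {"news": 0.5}
weights = apply_feedback(weights, "news", dismissed=True)
print(weights)
```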

[0348] (Example 2)

[0349] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0350] Traditional systems have a problem in that they do not adequately provide information tailored to the user's emotional state or specific situation. Therefore, it is difficult to present the content users want in a timely and appropriate manner. As a result, information does not align with user needs, leading to decreased satisfaction. Furthermore, from a privacy protection perspective, the anonymization and secure handling of collected data are of paramount importance.

[0351] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0352] In this invention, the server includes means for processing visual and auditory data collected from the user and analyzing emotions using an emotion recognition engine and a machine learning model; means for generating personalized information based on the user's emotional state using a generative AI model; and means for presenting dynamically adjusted information to the user using a display device. This makes it possible to provide personalized information to the user in real time and improve user satisfaction.

[0353] A "terminal" is a device worn by a user to collect visual and auditory data.

[0354] "Noise reduction" is a process that removes unwanted sounds and images from collected data to clarify important information.

[0355] "Anonymization" is a process that removes or transforms personally identifiable information to protect data privacy.

[0356] A "server" is a central computing system that receives data sent from terminals and performs analysis and information generation.

[0357] An "emotion recognition engine" is software or a system that estimates a user's emotional state based on visual and auditory data.

[0358] A "machine learning model" is an algorithm or system used to analyze collected data and estimate user interests and behaviors.

[0359] A "generative AI model" is a model that applies artificial intelligence technology to generate appropriate information based on the user's emotional state and interests.

[0360] A "display device" is a screen or display used to visually present generated information to a user.

[0361] "Overlay display" is a technique that displays information superimposed on the user's field of view, and is used to improve the user experience.

[0362] This invention is an information system built around smart glasses worn by the user. The system primarily consists of smart glasses, a server, and an emotion recognition engine.

[0363] By wearing smart glasses, users collect visual and auditory data. Specifically, the smart glasses utilize built-in cameras and microphones to collect the user's facial expressions and surrounding sounds, processing the data in real time. The collected data undergoes noise reduction and anonymization processing within the device. This processing improves data quality while protecting user privacy.

[0364] Pre-processed data is transmitted to the server via a secure communication protocol. The server uses an emotion recognition engine to analyze the data and estimate the user's emotional state. Based on this analysis, the server uses a generative AI model to generate information that matches the user's interests and emotions. For example, if the user is feeling stressed, the server will prioritize generating relaxing content.

[0365] The generated information is transferred from the server to the smart glasses' display device. The smart glasses overlay the information and dynamically adjust the displayed information according to the user's gaze and facial expressions. This enables the provision of personalized information that is tailored to the user's situation and emotions.

[0366] For example, if a user is calm and relaxed in a cafe, the server can generate and display information about nearby events and recommended reading lists. Furthermore, if a user feels anxious while shopping, the system can support their purchase decision by quickly providing additional product reviews and pricing information based on their emotions.

[0367] An example of a prompt for the generative AI model is, "How should information about relaxation be presented when a user is feeling stressed?"

[0368] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0369] Step 1:

[0370] When a user puts on smart glasses, the device collects the user's visual and audio data. Specifically, it uses a camera to capture the surroundings and the user's facial expressions, and a microphone to record audio. The inputs include real-time acquired visual and audio data. Using this data, the device performs noise reduction processing to output clean data that eliminates unnecessary information. This processing removes background sounds and deletes meaningless pixels from the video.
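
The removal of background sounds could take many forms. As one minimal sketch, a noise gate silences audio samples whose amplitude falls below a threshold; the threshold value, the sample format (floats between -1 and 1), and the function name `noise_gate` are assumptions, and the specification does not state which noise reduction method is used.

```python
def noise_gate(samples: list[float], threshold: float) -> list[float]:
    """Silence samples whose absolute amplitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


audio = [0.01, -0.5, 0.02, 0.3, -0.03]
print(noise_gate(audio, threshold=0.05))
```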

[0371] Step 2:

[0372] The device anonymizes the data preprocessed in Step 1. The input consists of de-noised visual and audio data. The device hides or transforms specific parts of the information to prevent personal identification, outputting a new, privacy-protected dataset. This process employs techniques such as applying filters to generalize facial features and removing personal names from audio data.
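
A filter that generalizes facial features can be illustrated by pixelation, which replaces each small tile in a face region with its average value. The sketch assumes a grayscale image stored as a list of rows and a face bounding box supplied by a separate face detector; both the representation and the function name `pixelate_region` are hypothetical.

```python
def pixelate_region(image: list[list[int]],
                    box: tuple[int, int, int, int],
                    block: int = 2) -> list[list[int]]:
    """Replace each block x block tile inside the box with its mean value.

    box is (top, left, bottom, right) with exclusive bottom and right.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy so the input stays unchanged
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
for row in pixelate_region(image, box=(0, 0, 2, 2)):
    print(row)
```

Larger block sizes remove more facial detail; the appropriate size relative to the face region is a design choice not covered by the specification.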

[0373] Step 3:

[0374] The terminal sends anonymized data to the server via a secure protocol (e.g., HTTPS). The input for transmission is the anonymized data obtained in step 2. The output is in the form of data communication packets received by the server. This transmission and reception process is encrypted to prevent data leakage or interception.

[0375] Step 4:

[0376] The server analyzes the received data. The input data consists of anonymized visual and audio data sent from the terminal. The server uses an emotion recognition engine to estimate the user's emotional state. This analysis outputs information about the user's emotions. In this analysis step, image recognition algorithms and natural language processing techniques are applied to identify emotions from the user's facial expressions and tone of voice.

[0377] Step 5:

[0378] The server utilizes the analyzed emotional information and generates user-specific information using a generative AI model. The input data is the emotional information from step 4. The output is personalized information tailored to the user's emotional state. For example, if the user is anxious, content related to relaxation will be generated. In this step, pre-configured prompts are used to instruct the generative AI model to generate information.
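
The use of pre-configured prompts can be sketched as a set of templates keyed by emotion and filled in with the user's interests. The template wording, the default template, and the function name `build_prompt` are hypothetical; the call to the generative AI model itself is omitted because the specification does not define its interface.

```python
# Hypothetical pre-configured templates, one per emotional state.
PROMPT_TEMPLATES = {
    "stressed": "The user is feeling stressed. "
                "Suggest relaxation content related to: {interests}.",
    "anxious": "The user is feeling anxious. "
               "Suggest calming content related to: {interests}.",
}
DEFAULT_TEMPLATE = "Suggest content related to: {interests}."


def build_prompt(emotion: str, interests: list[str]) -> str:
    """Fill the template for the given emotion with the user's interests."""
    template = PROMPT_TEMPLATES.get(emotion, DEFAULT_TEMPLATE)
    return template.format(interests=", ".join(interests))


print(build_prompt("stressed", ["music", "tea"]))
```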

[0379] Step 6:

[0380] Information generated from the server is sent to the terminal, which then overlays that information onto the smart glasses' display. The input sent to the terminal is the generated personalized information, and the output is the information displayed within the user's field of vision. In this overlay display, the display position and content are dynamically adjusted according to the user's gaze and facial expressions, providing a comfortable user experience.
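
The gaze-dependent adjustment of the display position can be sketched as placing the overlay at a fixed offset from the gaze point and clamping it so that it stays inside the display. The pixel coordinates, the default offset of 40 pixels, and the function name `overlay_position` are assumptions made for illustration.

```python
def overlay_position(gaze: tuple[int, int],
                     display_size: tuple[int, int],
                     overlay_size: tuple[int, int],
                     offset: tuple[int, int] = (40, 0)) -> tuple[int, int]:
    """Place the overlay beside the gaze point, clamped inside the display."""
    gx, gy = gaze
    dw, dh = display_size
    ow, oh = overlay_size
    # Shift away from the gaze point so the overlay does not cover it,
    # then clamp the top-left corner to keep the overlay fully visible.
    x = min(max(gx + offset[0], 0), dw - ow)
    y = min(max(gy + offset[1], 0), dh - oh)
    return (x, y)


print(overlay_position((100, 100), (640, 360), (200, 80)))
print(overlay_position((600, 350), (640, 360), (200, 80)))
```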

[0381] (Application Example 2)

[0382] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0383] Modern consumers are required to select the most suitable products and services amidst information overload, and in particular, in the shopping experience at physical stores, it is crucial to present information that takes into account individual needs and emotional states. However, current technology makes it difficult to accurately grasp a user's emotional state and present the most appropriate information in real time. Therefore, there is an urgent need to develop a system that effectively provides personalized information by utilizing user emotional information.

[0384] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0385] In this invention, the server includes information processing means for collecting visual and auditory information from the user, information processing means for pre-processing, anonymizing, and filtering the collected information, and information analysis system means for analyzing the pre-processed information to estimate the user's emotional state. This makes it possible to provide personalized information about products and services tailored to the user's emotional state.

[0386] An "information processing device for collecting visual and auditory information from a user" refers to a device that acquires audiovisual information such as a user's facial expressions and voice, and has the function of collecting information in everyday situations.

[0387] An "information processing device that pre-processes, anonymizes, and filters collected information" is a device that removes personally identifiable elements from acquired information and converts it into an appropriate format, playing a role in extracting only useful information while protecting data privacy.

[0388] An "information analysis system that analyzes pre-processed information to estimate a user's emotional state" is a system that accurately estimates a user's emotions based on pre-processed information, and has the function of analyzing emotional states using specific algorithms or machine learning models.

[0389] A "display device for presenting relevant product or service information based on the user's emotional state" is a device that visually displays information about products and services to the user in a manner adapted to the analyzed emotional state, enabling real-time information provision.

[0390] An "information integration system that combines analyzed emotional states and estimation results with supplementary information obtained from other information storage devices" is a system that integrates the results of emotional analysis with data obtained from external information sources and processes them comprehensively, possessing the function of integrating the information necessary to provide users with the most optimal information.

[0391] In embodiments of the present invention, the information processing system is primarily composed of smart glasses worn by the user. The smart glasses are equipped with a camera to acquire visual information and a microphone to acquire audio information, and function as an "information processing device for collecting visual and audio information from the user." The device constantly acquires the user's visual and audio data, and the collected information is first pre-processed, including noise reduction. At this time, anonymization and filtering are performed to remove personally identifiable elements, thereby protecting privacy.

[0392] Pre-processed data is sent to a cloud server, which uses a machine learning model to estimate the user's emotional state. Specifically, the server analyzes the acquired data using an "information analysis system that analyzes pre-processed information to estimate the user's emotional state," and generates personalized information based on the results.

[0393] Product and service information tailored to the user's emotional state is overlaid on the smart glasses' display. This display device operates as a "display device for presenting relevant product or service information based on the user's emotional state," dynamically responding to the user's gaze and facial expressions to present information.

[0394] Furthermore, the server integrates the emotional state and estimation results using an "information integration system that combines the analyzed emotional state and estimation results with supplementary information obtained from other information storage devices" to optimize information delivery. This includes commercial information such as new products and promotions. In this way, users can receive information in a manner that harmonizes with their own emotions, making the in-store shopping experience more personalized and enhanced.

[0395] For example, when a user is inspecting a sofa in a store, if the emotion engine determines that the user is "interested," customer reviews and promotional information for that sofa will be displayed on the smart glasses. In addition, prompts such as "Please tell us what other users think of this product" and "Please suggest recommended product combinations" are given to the generative AI model to generate suggestions that improve the user experience.

[0396] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0397] Step 1:

[0398] The device collects visual and auditory information from the user. Using a camera and microphone, it captures the user's facial expressions and surrounding sounds, recording them as digital data. The input is the user's visual and auditory information, and the output is the digital data derived from this information.

[0399] Step 2:

[0400] The collected data is pre-processed on the terminal. Data accuracy and privacy are ensured by noise reduction and anonymization of personally identifiable elements. The input is the digital data obtained in step 1, and the output is filtered and anonymized clean data. This data processing is performed by a signal processing algorithm.

[0401] Step 3:

[0402] The terminal securely sends filtered data to the server. This data is encrypted and transferred to the server. The input is the clean data from step 2, and the output is the data after it has been transferred to the server.

[0403] Step 4:

[0404] The server analyzes the received data and uses a generative AI model to estimate the user's emotional state. The input is the data sent in step 3, and the output is information indicating the user's emotional state. This data processing is performed by an emotion analysis algorithm.

[0405] Step 5:

[0406] The server generates information about relevant products and services based on the user's emotional state and prepares it for display. The input is the emotional state information from step 4, and the output is information relevant to the user. For information generation, a generative AI model is used together with prompts such as "Please tell me what other users think of this product" and "Please suggest recommended product combinations."
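As a rough illustration of how the prompts in step 5 might be assembled, the sketch below prefixes each instruction with the estimated emotional state and the product being viewed. The two instruction strings come from the text above; the context sentence, the function name `build_prompts`, and its parameters are hypothetical.

```python
# Instruction strings quoted from the description of step 5.
PRODUCT_PROMPTS = (
    "Please tell me what other users think of this product",
    "Please suggest recommended product combinations",
)


def build_prompts(emotion, product_name):
    """Prefix each instruction with context about the user's state and the product."""
    # The wording of this context sentence is an assumption for illustration.
    context = f"The user appears {emotion} while looking at: {product_name}."
    return [f"{context}\n{instruction}" for instruction in PRODUCT_PROMPTS]
```

For example, `build_prompts("interested", "sofa")` yields two prompts that could be passed to whichever generative AI model the server uses.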

[0407] Step 6:

[0408] The device overlays information received from the server onto the smart glasses. The display position and content of the information are dynamically adjusted based on the user's gaze and actions. The input is the information prepared in step 5, and the output is what is displayed on the smart glasses. This operation is performed using eye-tracking and display technologies.
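One way to adjust the display position from the user's gaze, as described in step 6, is sketched below. It assumes the eye tracker reports a gaze point in display pixel coordinates. The `Rect` type, the 40-pixel offset, and the flip-then-clamp rule are illustrative assumptions rather than the method of this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def place_overlay(gaze_x, gaze_y, panel_w, panel_h, display_w, display_h, offset=40):
    """Place an information panel beside the gaze point, kept inside the display."""
    # Prefer the right side of the gaze point so the panel does not cover
    # the object the user is looking at.
    x = gaze_x + offset
    if x + panel_w > display_w:
        # Not enough room on the right: flip the panel to the left side.
        x = gaze_x - offset - panel_w
    # Clamp both coordinates so the panel never leaves the display area.
    x = max(0, min(x, display_w - panel_w))
    y = max(0, min(gaze_y - panel_h // 2, display_h - panel_h))
    return Rect(x, y, panel_w, panel_h)
```

With a 640x360 display, a gaze point at (100, 100) places a 200x80 panel at (140, 60), while a gaze point near the right edge flips the panel to the left.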

[0409] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0410] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0411] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0412] [Third Embodiment]

[0413] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0414] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0415] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0416] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0417] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0418] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0419] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0420] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0421] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0422] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0423] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0424] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0425] The present invention provides personalized information services by utilizing visual and auditory data obtained through the user's daily life. This system consists of smart glasses, which are a device worn by the user, and a server that processes the data. The functions of each component are described below.

[0426] Data acquisition and preprocessing:

[0427] As users wear smart glasses and go about their daily lives, the device continuously collects visual and audio data using its camera and microphone. The device processes this information in real time, filtering out unnecessary data and anonymizing it. This reduces the amount of data collected while protecting the user's privacy.

[0428] Data transfer and analysis:

[0429] Pre-processed data is transmitted from the terminal to the server via a secure communication channel. The server uses machine learning algorithms to analyze the data and estimate the user's interests and behavioral patterns. Specifically, it uses image recognition technology to identify specific objects and brands from visual data, and keywords extracted from audio data to understand the user's areas of interest.

[0430] Information generation and provision:

[0431] Based on the analysis results, the server generates information relevant to the user. This information includes news, product information, and event announcements that are likely to be of interest to the user. The generated information is sent to the terminal and displayed as an overlay in the user's field of view. This allows the user to obtain the information they need in real time.

[0432] Memory supplementation and recommendations:

[0433] The server accumulates and analyzes users' past behavioral data to provide information that complements the user's memory and offers helpful recommendations for the future. For example, it can compile information about places users frequently visit and products they browse, and notify them when they visit those places again.
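The memory-supplementation behavior described above can be sketched as a small in-memory store that counts visits per place and produces a reminder once a place has been visited often enough. The class name `VisitMemory`, the `min_visits` threshold, and the reminder wording are hypothetical; a real server would presumably persist this history in the database.

```python
from collections import Counter


class VisitMemory:
    """Accumulate visits and viewed items per place, and remind the user on return."""

    def __init__(self, min_visits=3):
        self.min_visits = min_visits   # visits needed before a place counts as frequent
        self.visits = Counter()        # place -> number of recorded visits
        self.seen_items = {}           # place -> items viewed there, in first-seen order

    def record(self, place, item=None):
        """Record one visit and, optionally, an item the user looked at."""
        self.visits[place] += 1
        if item is not None:
            items = self.seen_items.setdefault(place, [])
            if item not in items:
                items.append(item)

    def reminder(self, place):
        """Return a reminder for a frequently visited place, or None otherwise."""
        count = self.visits[place]
        if count < self.min_visits:
            return None
        items = ", ".join(self.seen_items.get(place, [])) or "nothing recorded"
        return f"You have visited {place} {count} times. Previously viewed: {items}"
```

Calling `record` on each detected visit and `reminder` when the user arrives again gives the notify-on-return behavior in its simplest form.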

[0434] In this way, the integrated functioning of users, terminals, and servers enables an innovative system that allows for continuous and personalized information delivery. For example, when a user is attending an exhibition, the terminal can display information on booths and products that might interest them in real time, supporting efficient browsing.

[0435] The following describes the processing flow.

[0436] Step 1:

[0437] The device continuously collects visual and audio data through the smart glasses worn by the user. The camera captures objects and text within the user's field of view, and the microphone records ambient sounds. This data is collected in real time and prepared for initial filtering.

[0438] Step 2:

[0439] The device performs noise reduction and anonymization for privacy protection on the collected raw data. Specifically, it uses facial recognition technology to blur personally identifiable faces and removes unwanted noise from audio data. Unnecessary information from visual data is also filtered out.

[0440] Step 3:

[0441] The terminal efficiently compresses the pre-processed data and sends it to the server using a secure communication protocol. This step involves encryption to ensure the security of the communication.
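The compression part of step 3 might look like the following sketch, which serializes a pre-processed record to JSON and compresses it with zlib from the Python standard library. The record layout and the function names `pack` and `unpack` are assumptions; encryption is left to the transport layer and is not shown here.

```python
import json
import zlib


def pack(record):
    """Serialize a record to compact JSON and compress it for transmission."""
    raw = json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, level=9)  # level 9 favors size over speed


def unpack(blob):
    """Inverse of pack(): decompress and parse the JSON payload on the server."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Repetitive payloads such as lists of detected labels compress well, but very small records can grow slightly because of the zlib header, so a terminal may choose to batch records before packing.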

[0442] Step 4:

[0443] The server inputs the received data into a machine learning model to analyze the user's interests and behavioral tendencies. It uses image recognition to identify specific objects and brands from visual data, and extracts important keywords from audio data. This analysis is then used to identify the user's interests.

[0444] Step 5:

[0445] The server generates information that is likely to be of interest to the user based on the analysis results. Here, it selects relevant news articles, new product information, event announcements, etc., while taking into account the user's past data and areas of interest.

[0446] Step 6:

[0447] The device receives information sent from the server and overlays it on the smart glasses' display. This information is displayed naturally along the user's line of sight, and the user can hide the information if needed.

[0448] Step 7:

[0449] The server accumulates data on user behavior and interests over the long term, generating recommendation information to supplement memories and predict future behavior. This information is presented to the user as needed to support their daily decision-making.

[0450] (Example 1)

[0451] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0452] In recent years, there has been a growing demand for personalized information services that enrich and streamline users' life experiences. However, existing technologies struggle to collect information from diverse data sources in real time and process and analyze it while protecting user privacy. This presents a challenge in accurately understanding users' interests and behavioral patterns and providing appropriate information.

[0453] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0454] In this invention, the server includes means for collecting visual and auditory information obtained from the user's daily life, means for processing the collected information in real time to anonymize it and remove unnecessary information, and means for transmitting the pre-processed information to a central processing unit using a secure communication path. This makes it possible to provide personalized information services to the user in real time while ensuring the user's privacy.

[0455] "Visual information" refers to image and video data acquired through cameras and optical sensors.

[0456] "Auditory information" refers to sound data acquired through microphones or sound sensors.

[0457] "Anonymization" refers to the process of removing or concealing information that could identify an individual from individual data.

[0458] "Filtering" is the process of removing unnecessary or noisy information from collected data.

[0459] A "communication path" is a physical or virtual route used to move information from one point to another.

[0460] A "central processing unit" refers to the main computing system used for data analysis and execution of instructions.

[0461] A "machine learning algorithm" is a method that learns patterns by analyzing data and then makes predictions and decisions about new data.

[0462] "Behavioral history" refers to a record of a user's past actions and choices.

[0463] "Recommendation" refers to suggestions aimed at presenting appropriate information and options based on a user's past behavior and interests.

[0464] This invention is a personalized information delivery system designed to improve the user's life experience. The system consists of a terminal, such as smart glasses, worn by the user, and a server that handles data processing.

[0465] The device is equipped with a camera and microphone, and acquires visual and auditory information from the user's daily life. This makes it possible to collect diverse data about the user's surrounding environment. The collected data is processed in real time within the device, and filtering and anonymization are applied as needed. At this stage, unnecessary data and information related to the user's privacy are removed.

[0466] Data processed in real time is transmitted to the server via a secure communication channel. The server uses various machine learning algorithms, including generative AI models, to analyze the received data. This allows for the estimation of user interests and behavioral patterns. For example, image recognition technology is used to recognize specific objects from visual information, while natural language processing is used to extract keywords from audio information.

[0467] Based on user profiles obtained through data analysis, the server generates personalized information for each user. This information includes news, product information, and event announcements. The generated information is sent to the terminal and displayed as an overlay in the user's field of view, allowing the user to obtain the necessary information in real time.

[0468] Furthermore, the server accumulates the user's past behavioral history and provides recommendation information that can be used as a reference for the future. For example, based on information about the user's favorite restaurants and frequently purchased products, it can suggest new services and products that are highly relevant.

[0469] For example, when a user is visiting a museum and viewing a particular painting, providing real-time information about the history and artist associated with that painting can support a deeper understanding and experience.

[0470] An example of a prompt for a generative AI model might be: "When a user is visiting a museum and viewing a particular painting, please provide real-time information about the history and artist associated with that painting."

[0471] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0472] Step 1: Data Collection

[0473] When a user puts on smart glasses, the device uses a camera to acquire visual information and a microphone to collect audio information. Specifically, the device acquires images and videos of the surrounding environment, as well as ambient sounds and conversations. At this stage, the input is the user's visual and auditory environment, and the output is raw image and audio files.

[0474] Step 2: Data Preprocessing

[0475] The device processes the collected visual and audio information in real time. Specifically, it uses facial recognition technology to blur the faces of people in the video and speech recognition to remove specific information (e.g., personal names). The input is the raw data obtained in step 1, and the output is anonymized data from which this personal information has been removed.
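The removal of personal names from recognized speech can be sketched as follows. The sketch assumes the speech recognizer has already produced a text transcript and that a list of names to mask is available; in practice a named-entity recognizer would likely supply that list. The function name and the `[NAME]` mask are hypothetical.

```python
import re


def redact_names(transcript, names, mask="[NAME]"):
    """Replace each listed personal name in a transcript with a mask token."""
    if not names:
        return transcript
    # Longest names first, so "Tanaka Taro" is masked before "Tanaka" alone.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(mask, transcript)
```

For instance, `redact_names("Hi Tanaka, did Sato call?", ["Tanaka", "Sato"])` masks both names while leaving the rest of the sentence intact.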

[0476] Step 3: Data Transfer

[0477] The terminal sends pre-processed data to the server via a secure communication path. Specifically, it transfers data using encryption protocols such as SSL/TLS. The input is anonymized data, and the output is anonymized data that the server can access.
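A minimal sketch of a TLS-protected transfer from the terminal, using Python's standard `ssl` and `socket` modules, is shown below. The host name, port, and the choice of TLS 1.2 as the minimum version are assumptions for illustration; the disclosure only states that an encryption protocol such as SSL/TLS is used.

```python
import socket
import ssl


def make_client_context():
    """Build a TLS client context that verifies the server certificate."""
    context = ssl.create_default_context()  # enables hostname and certificate checks
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy SSL and early TLS
    return context


def send_payload(host, port, payload, timeout=5.0):
    """Open a TLS connection and send one payload (host and port are hypothetical)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(payload)
            return tls_sock.version()  # negotiated protocol, e.g. "TLSv1.3"
```

Keeping certificate verification enabled matters here, since the terminal is sending data derived from the user's surroundings over a public network.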

[0478] Step 4: Data Analysis

[0479] The server executes machine learning algorithms to analyze the received data. The input consists of anonymized visual and audio data. Image recognition and natural language processing are used to identify specific objects and brands from the visual data and extract keywords from the audio data. The output is profile information about the user's interests and behavioral patterns.
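The profile-building part of step 4 can be approximated with simple counting, as in the sketch below. It assumes that an image recognizer has already produced object labels and that a speech recognizer has produced a transcript; the stop-word list, the frequency-based keyword extraction, and the function names are illustrative stand-ins for the machine learning algorithms mentioned above.

```python
import re
from collections import Counter

# A tiny stop-word list for illustration only.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "i",
     "this", "that", "for", "on", "was", "are"}
)


def extract_keywords(transcript, top_n=3):
    """Return the most frequent non-stop-words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def build_profile(object_labels, transcript):
    """Combine recognized object labels and spoken keywords into interest counts."""
    profile = Counter(label.lower() for label in object_labels)
    profile.update(extract_keywords(transcript))  # each keyword adds one count
    return profile
```

The resulting counter is a crude interest profile: topics that are both seen and talked about accumulate the highest counts.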

[0480] Step 5: Information Generation

[0481] The server generates customized information for the user based on the analysis results. The input is the user's profile information, and the output includes personalized news, product information, and event announcements. This information is prioritized based on the user's interests.
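Prioritizing information by the user's interests could be done with a simple tag-overlap score, sketched below. The item layout (`title`, `tags`), the additive scoring, and the `limit` parameter are assumptions; the disclosure does not specify how the priority is computed.

```python
def rank_items(items, profile, limit=3):
    """Order candidate items by how strongly their tags match the interest profile."""

    def score(item):
        # Sum the profile weight of every tag; unknown tags contribute nothing.
        return sum(profile.get(tag, 0.0) for tag in item["tags"])

    scored = [(score(item), item) for item in items]
    # Drop items with no overlap, then sort by descending score (stable for ties).
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [item["title"] for _, item in relevant[:limit]]
```

Items with no matching tag are dropped rather than shown last, which keeps the overlay from filling up with unrelated content.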

[0482] Step 6: Information Provision

[0483] The server sends the generated information to the device, which then overlays the information on the smart glasses' display. Specifically, it displays details of events of interest or sale information directly within the user's field of view. The input is customized information from the server, and the output is information provided visually to the user.

[0484] Step 7: Memory Supplementation and Recommendation

[0485] The server accumulates past user behavior data and generates recommendations for future behavior through analysis. The input is the user's past behavior history, and the output is information to supplement that memory and recommendations for future behavior.

[0486] (Application Example 1)

[0487] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0488] In modern urban life, there is a demand for efficient and personalized information. However, conventional information systems struggle to provide optimal information in real time based on users' interests and behavior. Therefore, there is a lack of systems that can effectively utilize information on public facility usage and events.

[0489] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0490] In this invention, the server includes a visual device means for collecting environmental data and audio information from the user, an information processing device means for pre-processing the collected data to anonymize and reduce its data size, and a system means that cooperates with a sensor network that provides real-time information on the usage status of public facilities. This enables users to receive personalized urban information in real time.

[0491] A "visual device for collecting environmental and audio information from users" is a terminal device worn by a user that records visual and audio information about the surroundings, and is a device that plays a role in data collection.

[0492] An "information processing device that pre-processes collected data to anonymize and reduce its size" is a device that processes data obtained through a visual device in real time, optimizing data capacity while protecting privacy.

[0493] An "analytical device that analyzes pre-processed data to estimate individual interests" is a device that analyzes users' interests and behavioral patterns from data and performs processing to meet specific needs.

[0494] A "video output device for displaying user-related information" is a display device that intuitively presents analyzed information to the user and transmits information through visual means.

[0495] An "information integration system that combines analyzed results with information from external databases" is an integrated system that combines information from existing databases to provide users with more detailed information.

[0496] A "system that works in conjunction with a sensor network to provide real-time information on the usage of public facilities" is a network system that monitors the movement of equipment and people within public facilities in real time and provides users with information on the current status of the facilities based on that information.

[0497] The system for realizing this invention consists of a visual device worn by the user, a server that processes information, and a sensor network that manages facility information. In this system, the user's visual device collects ambient environmental data and audio information. This collected data is immediately anonymized and efficiently subjected to data reduction processing.

[0498] The server analyzes the collected data via an information processing device and uses machine learning algorithms to estimate user interests and behavioral patterns. This allows it to extract information on public facilities and events that are likely to interest the user. A cloud platform is used for this analysis; specifically, Microsoft Azure is used for data analysis.

[0499] Furthermore, the server works in conjunction with a sensor network to collect real-time information on the usage of public facilities and provides users with relevant information based on that data. For example, information on available seats at nearby libraries and details of ongoing events can be overlaid on the display of the user's visual device. This allows citizens to utilize urban resources more efficiently.

[0500] As a concrete example, when a citizen is walking through a major intersection, their visual device displays a notification stating, "A free music concert at the community hall starts at 6 PM." This notification is based on real-time information obtained from the sensor network in the area and on the user's past interest data. An example of a prompt is "Based on my current location, please display information on the availability of nearby public facilities and events on my smart glasses," which requests that this information be provided.

[0501] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0502] Step 1:

[0503] The user wears a visual device and goes about their daily life. The device acquires ambient environmental data and audio information through a camera and microphone. The input at this stage is visual and audio information of the environment, and the output is environmental data in digital format. The device uses a camera integrated with sensors to collect visual and audio data, and acquires this data as a primary record.

[0504] Step 2:

[0505] The device immediately preprocesses the collected visual and audio data. The input is the raw data collected in step 1. During processing, user privacy is protected by filtering the data to retain only the important information and anonymizing it. The output is compressed and anonymized data. In this process, a dedicated processor on the device reduces the amount of data using a data reduction algorithm.

[0506] Step 3:

[0507] The server receives the pre-processed data sent from the terminal. The input is compressed and anonymized data, which a machine learning model analyzes to estimate the user's interests and behavioral patterns. The output is user-specific interest data. Specifically, a generative AI model on the server recognizes specific objects and events from visual data and infers the user's interests from audio data.

[0508] Step 4:

[0509] The server works in conjunction with the sensor network to acquire real-time usage information and event information for public facilities. The input is sensor data from the facilities, and the output is recommendation information based on user interests. In this step, the server aggregates cloud-based sensor data and analyzes usage patterns.

[0510] Step 5:

[0511] The server integrates analyzed user interest information with sensor network data and sends specific event information and recommendations to the user's visual device. The input is the integrated information, and the output is the visual information displayed to the user. The user's visual device uses overlay displays to present this information in real time, delivering it to the user in a visually easy-to-understand format.
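The integration of user interests with sensor network data can be sketched as a filter over facility status records. The `FacilityStatus` fields, the 1000 m distance limit, and the notification wording are hypothetical; an actual sensor network would define its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilityStatus:
    name: str
    category: str     # e.g. "library" or "concert"
    free_seats: int   # current availability reported by the sensor network
    distance_m: int   # distance from the user's current location, in meters


def recommend_facilities(statuses, interests, max_distance_m=1000):
    """Keep nearby facilities that have free seats and match the user's interests."""
    hits = [
        s for s in statuses
        if s.free_seats > 0
        and s.category in interests
        and s.distance_m <= max_distance_m
    ]
    hits.sort(key=lambda s: s.distance_m)  # nearest first
    return [f"{s.name}: {s.free_seats} seats free, {s.distance_m} m away" for s in hits]
```

Each returned string is short enough to be shown as a single overlay line on the visual device.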

[0512] Step 6:

[0513] Users make decisions in urban life based on information displayed on visual devices. The input is the information displayed by the visual device, and the output is the user's action choices. This process enables users to make convenient choices, such as using public facilities or participating in events.

[0514] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0515] This invention is an information system centered around smart glasses used by users on a daily basis. It recognizes the user's emotions based on visual and auditory data and utilizes this to provide personalized information. In addition to a terminal (smart glasses) and a server, this system includes an emotion engine for emotion recognition.

[0516] Data collection and emotion recognition:

[0517] By wearing smart glasses, users collect visual and auditory data in various everyday situations. The device uses a camera to capture the user's facial expressions and surroundings as visual data, and a microphone to record the voices of the user and of people nearby. The emotion engine analyzes this visual and auditory data in real time to estimate the user's emotional state.

[0518] Data processing and analysis:

[0519] The collected data undergoes noise reduction and necessary anonymization processing on the terminal before being transmitted to the server via secure communication. On the server, machine learning models analyze the data to identify user interests and behavioral tendencies, and process the information while considering the emotional state provided by the emotion engine.

[0520] Information generation and presentation:

[0521] The user's emotional information, obtained through the emotion engine, is integrated into the information generation process on the server. The server dynamically generates information tailored to the user's interests and emotional state, determining personalized content. For example, if the user is feeling stressed, information related to relaxation will be prioritized and presented.

[0522] Display and Feedback:

[0523] The device overlays processed information onto the smart glasses' display. This display adjusts appropriately according to the user's gaze and facial expressions, optimizing the amount and type of information provided. By continuously monitoring the user's emotions, the system strives to provide the most relevant information at all times to meet the user's needs.

[0524] As a concrete example of the present invention, if a user is shopping in a mall and has a calm expression, the system will present the latest trend information. On the other hand, if the user is feeling anxious when deciding to make a purchase, the system will quickly display additional product reviews and price information based on feedback from the emotion engine to help them make a purchase decision.
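The emotion-dependent choice of content in the examples above (trend information when calm, reviews and prices when anxious, relaxation content when stressed) can be expressed as a small lookup table. The table keys, the `kind` field, and the function name are illustrative assumptions.

```python
# Preferred content kinds per emotional state, taken from the examples in the text.
EMOTION_POLICY = {
    "calm": ("trend",),
    "anxious": ("review", "price"),
    "stressed": ("relaxation",),
}


def prioritize(emotion, candidates):
    """Move the content kinds preferred for this emotion to the front."""
    preferred = EMOTION_POLICY.get(emotion, ())

    def rank(candidate):
        kind = candidate["kind"]
        # Preferred kinds keep their order in the policy; others go after them.
        return preferred.index(kind) if kind in preferred else len(preferred)

    return sorted(candidates, key=rank)  # stable: ties keep their input order
```

An unknown emotion leaves the candidate order unchanged, which is a conservative default when the emotion engine is unsure.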

[0525] The following describes the processing flow.

[0526] Step 1:

[0527] The device collects visual and auditory data using its camera and microphone while the user is wearing the smart glasses. The camera captures the user's facial expressions and gaze direction, while the microphone picks up voice and ambient sounds. This data is prepared as input for the emotion engine.

[0528] Step 2:

[0529] The emotion engine analyzes visual and auditory data supplied from the device to estimate the user's current emotional state in real time. It uses facial recognition technology to analyze facial features and extracts emotional changes from voice tone and speaking style.
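One common way to combine a facial estimate with a voice estimate is late fusion, sketched below as a weighted average of per-emotion scores. This assumes separate face and voice analyzers each output a score per emotion label; the weighting scheme and the default `face_weight` of 0.6 are assumptions and are not stated in this disclosure.

```python
def fuse_emotions(face_scores, voice_scores, face_weight=0.6):
    """Combine per-emotion scores from face and voice by weighted average."""
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    # Iterate labels in sorted order so that ties resolve deterministically.
    best = max(sorted(fused), key=fused.get)
    return best, fused
```

If the face suggests "calm" but the voice strongly suggests "stressed", an equal weighting lets the voice evidence decide the result.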

[0530] Step 3:

[0531] The device transmits the emotion estimation results from the emotion engine to the server via a secure channel. During this process, the data is anonymized and compressed to ensure efficient and secure communication.

[0532] Step 4:

[0533] The server analyzes the received emotion data by integrating it with previously accumulated data on user interests and behavioral tendencies. Based on the results of this analysis, it generates information content that corresponds to the user's current needs and interests.

[0534] Step 5:

[0535] The server sends information to the terminal that takes the user's emotional state into consideration. This information includes relaxation content to reduce stress and engaging entertainment information tailored to the user's emotions.

[0536] Step 6:

[0537] The device overlays information onto the smart glasses' display. The displayed information is presented in an optimal format, taking into account the user's gaze and posture, and the content is adjusted according to their emotions.

[0538] Step 7:

[0539] If a user clears the displayed information or their emotions change, the device immediately feeds this change back to the emotion engine and server. This feedback is used to optimize future information presentations.
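The feedback loop of step 7 can be sketched as a per-category weight that moves toward 0 when the user clears an item and toward 1 when the user keeps it. The exponential update rule, the neutral starting weight of 0.5, and the class name are assumptions chosen for illustration.

```python
class FeedbackWeights:
    """Track how welcome each content category is, based on user dismissals."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.weights = {}  # category -> weight in [0, 1]; unseen categories start at 0.5

    def update(self, category, dismissed):
        """Move the weight toward 0 on dismissal and toward 1 otherwise."""
        target = 0.0 if dismissed else 1.0
        current = self.weights.get(category, 0.5)
        updated = current + self.learning_rate * (target - current)
        self.weights[category] = updated
        return updated
```

The server could multiply these weights into its content ranking so that repeatedly dismissed categories gradually appear less often.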

[0540] (Example 2)

[0541] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0542] Traditional systems have a problem in that they do not adequately provide information tailored to the user's emotional state or specific situation. Therefore, it is difficult to present the content users want in a timely and appropriate manner. As a result, information does not align with user needs, leading to decreased satisfaction. Furthermore, from a privacy protection perspective, the anonymization and secure handling of collected data are of paramount importance.

[0543] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0544] In this invention, the server includes means for processing visual and auditory data collected from the user and analyzing emotions using an emotion recognition engine and a machine learning model; means for generating personalized information based on the user's emotional state using a generative AI model; and means for presenting dynamically adjusted information to the user using a display device. This makes it possible to provide personalized information to the user in real time and improve user satisfaction.

[0545] A "terminal" is a device worn by a user to collect visual and auditory data.

[0546] "Noise reduction" is a process that removes unwanted sounds and images from collected data to clarify important information.

[0547] "Anonymization" is a process that removes or transforms personally identifiable information to protect data privacy.

[0548] A "server" is a central computing system that receives data sent from terminals and performs analysis and information generation.

[0549] An "emotion recognition engine" is software or a system that estimates a user's emotional state based on visual and auditory data.

[0550] A "machine learning model" is an algorithm or system used to analyze collected data and estimate user interests and behaviors.

[0551] A "generative AI model" is a model that applies artificial intelligence technology to generate appropriate information based on the user's emotional state and interests.

[0552] A "display device" is a screen or display used to visually present generated information to a user.

[0553] "Overlay display" is a technique that displays information superimposed on the user's field of view, and is used to improve the user experience.

[0554] This invention is an information system built around smart glasses worn by the user. The system primarily consists of smart glasses, a server, and an emotion recognition engine.

[0555] By wearing smart glasses, users collect visual and auditory data. Specifically, the smart glasses utilize built-in cameras and microphones to collect the user's facial expressions and surrounding sounds, processing the data in real time. The collected data undergoes noise reduction and anonymization processing within the device. This processing improves data quality while protecting user privacy.
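The on-device anonymization described above can be illustrated for the audio side with a minimal sketch. It assumes that speech recognition has already produced a text transcript and that a list of names to mask is available; the function name `redact_names`, the mask token, and the name list are hypothetical and are not specified in this disclosure.

```python
import re


def redact_names(transcript: str, known_names: list[str], mask: str = "[NAME]") -> str:
    """Replace each known personal name in a transcript with a mask token."""
    if not known_names:
        return transcript
    # Try longer names first so a full name is masked before a shorter part of it.
    ordered = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(mask, transcript)


# Hypothetical transcript and name list, for illustration only.
print(redact_names("Hello Tanaka, did Suzuki call?", ["Tanaka", "Suzuki"]))
# prints: Hello [NAME], did [NAME] call?
```

A practical system would more likely detect names with a named-entity recognizer than with a fixed list; the fixed list is used here only to keep the sketch self-contained.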

[0556] Pre-processed data is transmitted to the server via a secure communication protocol. The server uses an emotion recognition engine to analyze the data and estimate the user's emotional state. Based on this analysis, the server uses a generative AI model to generate information that matches the user's interests and emotions. For example, if the user is feeling stressed, the server will prioritize generating relaxing content.

[0557] The generated information is transferred from the server to the smart glasses' display device. The smart glasses overlay the information and dynamically adjust the displayed information according to the user's gaze and facial expressions. This enables the provision of personalized information that is tailored to the user's situation and emotions.

[0558] For example, if a user is calm and relaxed in a cafe, the server can generate and display information about nearby events and recommended reading lists. Furthermore, if a user feels anxious while shopping, the system can support their purchase decision by quickly providing additional product reviews and pricing information based on their emotions.

[0559] An example of a prompt for a generative AI model is, "How should information about relaxation be presented when a user is feeling stressed?"
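One way the server might choose such a prompt from the estimated emotional state is sketched below. Only the "stressed" prompt is taken from the text above; the emotion labels, the fallback template, and the function name `build_prompt` are assumptions made for illustration.

```python
# Pre-configured prompts keyed by estimated emotion (labels are hypothetical).
PROMPT_TEMPLATES = {
    "stressed": (
        "How should information about relaxation be presented "
        "when a user is feeling stressed?"
    ),
}

# Fallback used when no pre-configured prompt exists (an assumption).
DEFAULT_TEMPLATE = "What information should be presented when a user is feeling {emotion}?"


def build_prompt(emotion: str) -> str:
    """Return the pre-configured prompt for an emotion, or a generic fallback."""
    template = PROMPT_TEMPLATES.get(emotion)
    if template is not None:
        return template
    return DEFAULT_TEMPLATE.format(emotion=emotion)


print(build_prompt("stressed"))
print(build_prompt("calm"))
```

Keeping the prompts in a lookup table lets them be pre-configured and reviewed independently of the code that calls the generative AI model.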

[0560] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0561] Step 1:

[0562] When a user puts on smart glasses, the device collects the user's visual and audio data. Specifically, it uses a camera to capture the surroundings and the user's facial expressions, and a microphone to record audio. The inputs include real-time acquired visual and audio data. Using this data, the device performs noise reduction processing to output clean data that eliminates unnecessary information. This processing removes background sounds and deletes meaningless pixels from the video.

[0563] Step 2:

[0564] The device anonymizes the data preprocessed in Step 1. The input consists of de-noised visual and audio data. The device hides or transforms specific parts of the information to prevent personal identification, outputting a new, privacy-protected dataset. This process employs techniques such as applying filters to generalize facial features and removing personal names from audio data.

[0565] Step 3:

[0566] The terminal sends the anonymized data to the server via a secure protocol (e.g., HTTPS). The input for transmission is the anonymized data obtained in step 2. The output is in the form of data communication packets received by the server. This transmission and reception process is encrypted to prevent data leakage or interception.

[0567] Step 4:

[0568] The server analyzes the received data. The input data consists of anonymized visual and audio data sent from the terminal. The server uses an emotion recognition engine to estimate the user's emotional state. This analysis outputs information about the user's emotions. In this analysis step, image recognition algorithms and natural language processing techniques are applied to identify emotions from the user's facial expressions and tone of voice.

[0569] Step 5:

[0570] The server utilizes the analyzed emotional information and generates user-specific information using a generative AI model. The input data is the emotional information from step 4. The output is personalized information tailored to the user's emotional state. For example, if the user is anxious, content related to relaxation will be generated. In this step, pre-configured prompts are used to instruct the generative AI model to generate information.

[0571] Step 6:

[0572] Information generated from the server is sent to the terminal, which then overlays that information onto the smart glasses' display. The input sent to the terminal is the generated personalized information, and the output is the information displayed within the user's field of vision. In this overlay display, the display position and content are dynamically adjusted according to the user's gaze and facial expressions, providing a comfortable user experience.

[0573] (Application Example 2)

[0574] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0575] Modern consumers must select the most suitable products and services amid information overload. In the shopping experience at physical stores in particular, it is crucial to present information that takes individual needs and emotional states into account. However, current technology makes it difficult to accurately grasp a user's emotional state and present the most appropriate information in real time. Therefore, there is an urgent need to develop a system that effectively provides personalized information by utilizing user emotional information.

[0576] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0577] In this invention, the server includes information processing means for collecting visual and auditory information from the user, information processing means for pre-processing, anonymizing, and filtering the collected information, and information analysis system means for analyzing the pre-processed information to estimate the user's emotional state. This makes it possible to provide personalized information about products and services that are tailored to the user's emotional state.

[0578] An "information processing device for collecting visual and auditory information from a user" refers to a device that acquires audiovisual information such as a user's facial expressions and voice, and has the function of collecting information in everyday situations.

[0579] An "information processing device that pre-processes, anonymizes, and filters collected information" is a device that removes personally identifiable elements from acquired information and converts it into an appropriate format, playing a role in extracting only useful information while protecting data privacy.

[0580] An "information analysis system that analyzes pre-processed information to estimate a user's emotional state" is a system that accurately estimates a user's emotions based on pre-processed information, and has the function of analyzing emotional states using specific algorithms or machine learning models.

[0581] A "display device for presenting relevant product or service information based on the user's emotional state" is a device that visually displays information about products and services to the user in a manner adapted to the analyzed emotional state, enabling real-time information provision.

[0582] An "information integration system that combines analyzed emotional states and estimation results with supplementary information obtained from other information storage devices" is a system that integrates the results of emotional analysis with data obtained from external information sources and processes them comprehensively, with the function of assembling the information needed to provide users with optimal information.

[0583] In embodiments of the present invention, the information processing system is primarily composed of smart glasses worn by the user. The smart glasses are equipped with a camera to acquire visual information and a microphone to acquire audio information, and function as an "information processing device for collecting visual and audio information from the user." The device constantly acquires the user's visual and audio data, and the collected information is first pre-processed, including noise reduction. At this time, anonymization and filtering are performed to remove personally identifiable elements, thereby protecting privacy.

[0584] Pre-processed data is sent to a cloud server, which uses a machine learning model to estimate the user's emotional state. Specifically, the server analyzes the acquired data using an "information analysis system that analyzes pre-processed information to estimate the user's emotional state," and generates personalized information based on the results.

[0585] Product and service information tailored to the user's emotional state is overlaid on the smart glasses' display. This display device operates as a "display device for presenting relevant product or service information based on the user's emotional state," dynamically responding to the user's gaze and facial expressions to present information.

[0586] Furthermore, the server integrates the emotional state and estimation results using an "information integration system that combines the analyzed emotional state and estimation results with supplementary information obtained from other information storage devices" to optimize information delivery. This includes commercial information such as new products and promotions. In this way, users can receive information in a manner that harmonizes with their own emotions, making the in-store shopping experience more personalized and enhanced.

[0587] For example, when a user is inspecting a sofa in a store, if the emotion engine determines that the user is "interested," customer reviews and promotional information for that sofa will be displayed on the smart glasses. In addition, prompts such as "Please tell us what other users think of this product" and "Please suggest recommended product combinations" are given to the generative AI model to generate suggestions that improve the user experience.
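The sofa example can be sketched as a small rule that combines the estimated emotion with product data held in an external store. The `ProductInfo` structure, the emotion label strings, and the item limit are hypothetical; the disclosure does not specify how reviews and promotions are stored or selected.

```python
from dataclasses import dataclass, field


@dataclass
class ProductInfo:
    """Product data assumed to come from an external information store."""
    name: str
    reviews: list[str] = field(default_factory=list)
    promotions: list[str] = field(default_factory=list)


def select_overlay_items(emotion: str, product: ProductInfo, max_items: int = 3) -> list[str]:
    """Return overlay lines for a product only when the user appears interested."""
    if emotion != "interested":
        return []
    items = [f"Review: {review}" for review in product.reviews]
    items += [f"Promotion: {promo}" for promo in product.promotions]
    # Limit the number of lines so the overlay does not clutter the field of view.
    return items[:max_items]


# Hypothetical product data, for illustration only.
sofa = ProductInfo("sofa", ["Very comfortable"], ["10% off this week"])
print(select_overlay_items("interested", sofa))
print(select_overlay_items("neutral", sofa))
```

Returning an empty list for other emotions keeps the display clear unless the emotion engine reports interest, which mirrors the condition stated in the example.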

[0588] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0589] Step 1:

[0590] The device collects visual and auditory information from the user. Using a camera and microphone, it captures the user's facial expressions and surrounding sounds, recording them as digital data. The input is the user's visual and auditory information, and the output is the digital data derived from this information.

[0591] Step 2:

[0592] The collected data is pre-processed on the terminal. Data accuracy and privacy are ensured by noise reduction and anonymization of personally identifiable elements. The input is the digital data obtained in step 1, and the output is filtered and anonymized clean data. This data processing is performed by a signal processing algorithm.

[0593] Step 3:

[0594] The terminal securely sends filtered data to the server. This data is encrypted and transferred to the server. The input is the clean data from step 2, and the output is the data after it has been transferred to the server.

[0595] Step 4:

[0596] The server analyzes the received data and uses a generative AI model to estimate the user's emotional state. The input is the data sent in step 3, and the output is information indicating the user's emotional state. This data processing is performed by an emotion analysis algorithm.

[0597] Step 5:

[0598] The server prepares data to generate and display information about relevant products and services based on the user's emotional state. The input is the emotional state information from step 4, and the output is information relevant to the user. For information generation, a generative AI model and prompts such as "Please tell me what other users think of this product" and "Please suggest recommended product combinations" are used.

[0599] Step 6:

[0600] The device overlays information received from the server onto the smart glasses. The display position and content of the information are dynamically adjusted based on the user's gaze and actions. The input is the information prepared in step 5, and the output is what is displayed on the smart glasses. This operation is performed using eye-tracking and display technologies.
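The gaze-dependent positioning in Step 6 could be realized in many ways. The following is a minimal sketch that assumes the eye tracker reports a gaze point in display pixel coordinates and that the information panel is no larger than the display; the function name `place_overlay`, the offset value, and the flip-to-the-left rule are illustrative assumptions.

```python
def place_overlay(
    gaze_x: int,
    gaze_y: int,
    panel_w: int,
    panel_h: int,
    display_w: int,
    display_h: int,
    offset: int = 40,
) -> tuple[int, int]:
    """Return the top-left corner of an information panel placed beside the gaze point.

    The panel is placed to the right of the gaze point so it does not cover what
    the user is looking at, flipped to the left if it would leave the display,
    and finally clamped so it stays fully inside the display.
    """
    x = gaze_x + offset
    if x + panel_w > display_w:
        x = gaze_x - offset - panel_w  # not enough room on the right: flip left
    x = max(0, min(x, display_w - panel_w))
    # Center the panel vertically on the gaze point, then clamp.
    y = max(0, min(gaze_y - panel_h // 2, display_h - panel_h))
    return x, y


print(place_overlay(100, 100, 200, 100, 1280, 720))  # prints: (140, 50)
print(place_overlay(1200, 10, 200, 100, 1280, 720))  # prints: (960, 0)
```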

[0601] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0602] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0603] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0604] [Fourth Embodiment]

[0605] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0606] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0607] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0608] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0609] The microphone 238 receives instructions from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0610] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0611] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0612] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0613] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0614] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0615] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0616] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0617] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0618] The present invention provides personalized information services by utilizing visual and auditory data obtained through the user's daily life. This system consists of smart glasses, which are a device worn by the user, and a server that processes the data. The functions of each component are described below.

[0619] Data acquisition and preprocessing:

[0620] As users wear smart glasses and go about their daily lives, the device continuously collects visual and audio data using its camera and microphone. The device processes this information in real time, filtering out unnecessary data and anonymizing it. This reduces the amount of data collected while protecting the user's privacy.

[0621] Data transfer and analysis:

[0622] Pre-processed data is transmitted from the terminal to the server via a secure communication channel. The server uses machine learning algorithms to analyze the data and estimate the user's interests and behavioral patterns. Specifically, it uses image recognition technology to identify specific objects and brands from visual data, and keywords extracted from audio data to understand the user's areas of interest.

[0623] Information generation and provision:

[0624] Based on the analysis results, the server generates information relevant to the user. This information includes news, product information, and event announcements that are likely to be of interest to the user. The generated information is sent to the terminal and displayed as an overlay in the user's field of view. This allows the user to obtain the information they need in real time.

[0625] Memory supplementation and recommendations:

[0626] The server accumulates and analyzes users' past behavioral data to provide information that complements the user's memory and offers helpful recommendations for the future. For example, it can compile information about places users frequently visit and products they browse, and notify them when they visit those places again.
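The memory-supplementation behavior described above can be illustrated with a minimal sketch that counts visits per place and returns a reminder on a later visit. The class name `VisitMemory`, the visit threshold, and the reminder wording are hypothetical and are not part of this disclosure.

```python
from collections import Counter
from typing import Optional


class VisitMemory:
    """Accumulates visits and browsed products per place (illustrative only)."""

    def __init__(self, min_prior_visits: int = 2) -> None:
        self.min_prior_visits = min_prior_visits
        self.visits: Counter = Counter()
        self.browsed: dict[str, list[str]] = {}

    def record_visit(self, place: str, products: tuple[str, ...] = ()) -> Optional[str]:
        """Record a visit; return a reminder if the place was visited often before."""
        prior = self.visits[place]
        # Products seen on earlier visits, de-duplicated in first-seen order.
        previously_browsed = list(dict.fromkeys(self.browsed.get(place, [])))
        self.visits[place] += 1
        self.browsed.setdefault(place, []).extend(products)
        if prior >= self.min_prior_visits and previously_browsed:
            return (
                f"You have visited {place} {prior} time(s) before. "
                f"Previously browsed: {', '.join(previously_browsed)}"
            )
        return None


memory = VisitMemory(min_prior_visits=2)
memory.record_visit("Bookstore", ("novel",))
memory.record_visit("Bookstore", ("atlas",))
print(memory.record_visit("Bookstore"))
# prints: You have visited Bookstore 2 time(s) before. Previously browsed: novel, atlas
```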

[0627] In this way, the integrated functioning of users, terminals, and servers enables an innovative system that allows for continuous and personalized information delivery. For example, when a user is attending an exhibition, the terminal can display information on booths and products that might interest them in real time, supporting efficient browsing.

[0628] The following describes the processing flow.

[0629] Step 1:

[0630] The device continuously collects visual and audio data through the smart glasses worn by the user. The camera captures objects and text within the user's field of view, and the microphone records ambient sounds. This data is collected in real time and prepared for initial filtering.

[0631] Step 2:

[0632] The device performs noise reduction and anonymization for privacy protection on the collected raw data. Specifically, it uses facial recognition technology to blur personally identifiable faces and removes unwanted noise from audio data. Unnecessary information from visual data is also filtered out.

[0633] Step 3:

[0634] The terminal efficiently compresses the pre-processed data and sends it to the server using a secure communication protocol. This step involves encryption to ensure the security of the communication.

[0635] Step 4:

[0636] The server inputs the received data into a machine learning model to analyze the user's interests and behavioral tendencies. It uses image recognition to identify specific objects and brands from visual data, and extracts important keywords from audio data. This analysis is then used to identify the user's interests.

[0637] Step 5:

[0638] The server generates information that is likely to be of interest to the user based on the analysis results. Here, it selects relevant news articles, new product information, event announcements, etc., while taking into account the user's past data and areas of interest.

[0639] Step 6:

[0640] The device receives information sent from the server and overlays it on the smart glasses' display. This information is displayed naturally along the user's line of sight, and the user can hide the information if needed.

[0641] Step 7:

[0642] The server accumulates data on user behavior and interests over the long term, generating recommendation information to supplement memories and predict future behavior. This information is presented to the user as needed to support their daily decision-making.
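As an illustration of the compression in Step 3 of the flow above, the following sketch serializes a pre-processed record to JSON and compresses it with zlib before transmission. The record layout and the function names are assumptions; encryption is assumed to be provided by the secure communication protocol and is not shown.

```python
import json
import zlib


def pack_payload(record: dict) -> bytes:
    """Serialize a pre-processed record to compact JSON and compress it."""
    raw = json.dumps(record, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)  # 9 = highest compression level


def unpack_payload(blob: bytes) -> dict:
    """Server-side inverse of pack_payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical anonymized record, for illustration only.
record = {"keywords": ["coffee"] * 50, "objects": ["cup", "table"]}
packed = pack_payload(record)
assert unpack_payload(packed) == record
print(len(json.dumps(record)), "bytes before,", len(packed), "bytes after compression")
```

Image and audio streams would normally use media-specific codecs rather than general-purpose compression; zlib is used here only because it is available in the Python standard library.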

[0643] (Example 1)

[0644] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0645] In recent years, there has been a growing demand for personalized information services that enrich and streamline users' life experiences. However, existing technologies struggle to collect information from diverse data sources in real time and process and analyze it while protecting user privacy. This presents a challenge in accurately understanding users' interests and behavioral patterns and providing appropriate information.

[0646] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0647] In this invention, the server includes means for collecting visual and auditory information obtained from the user's daily life, means for processing the collected information in real time to anonymize it and remove unnecessary information, and means for transmitting the pre-processed information to a central processing unit using a secure communication path. This makes it possible to provide personalized information services to the user in real time while ensuring the user's privacy.

[0648] "Visual information" refers to image and video data acquired through cameras and optical sensors.

[0649] "Auditory information" refers to sound data acquired through microphones or sound sensors.

[0650] "Anonymization" refers to the process of removing or concealing, from personal data, information that could identify an individual.

[0651] "Filtering" is the process of removing unnecessary or noisy information from collected data.

[0652] A "communication path" is a physical or virtual route used to move information from one point to another.

[0653] A "central processing unit" refers to the main computing system used for data analysis and execution of instructions.

[0654] A "machine learning algorithm" is a method that learns patterns through data analysis and makes predictions and decisions about new data.

[0655] "Behavioral history" refers to a record of a user's past actions and choices.

[0656] "Recommendation" refers to suggestions aimed at presenting appropriate information and options based on a user's past behavior and interests.

[0657] This invention is a personalized information delivery system designed to improve the user's life experience. The system consists of a terminal, such as smart glasses, worn by the user, and a server that handles data processing.

[0658] The device is equipped with a camera and microphone, and acquires visual and auditory information from the user's daily life. This makes it possible to collect diverse data about the user's surrounding environment. The collected data is processed in real time within the device, and filtering and anonymization are applied as needed. At this stage, unnecessary data and information related to the user's privacy are removed.

[0659] Data processed in real time is transmitted to the server via a secure communication channel. The server uses various machine learning algorithms, including generative AI models, to analyze the received data. This allows for the estimation of user interests and behavioral patterns. For example, image recognition technology is used to recognize specific objects from visual information, while natural language processing is used to extract keywords from audio information.
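The keyword extraction mentioned above can be illustrated with a simple frequency count. This sketch assumes that the audio has already been converted to English text by speech recognition; the stopword list and the function name `extract_keywords` are illustrative, and a deployed system would likely use a full natural language processing pipeline instead.

```python
import re
from collections import Counter

# A deliberately small stopword list, for illustration only.
STOPWORDS = {"the", "and", "this", "that", "is", "are", "a", "an", "of", "to", "i", "it"}


def extract_keywords(transcript: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword terms in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


text = "I love coffee. This coffee shop sells great coffee and good tea, tea is nice."
print(extract_keywords(text, top_n=2))  # prints: ['coffee', 'tea']
```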

[0660] Based on user profiles obtained through data analysis, the server generates personalized information for each user. This information includes news, product information, and event announcements. The generated information is sent to the terminal and displayed as an overlay in the user's field of view, allowing the user to obtain the necessary information in real time.

[0661] Furthermore, the server accumulates the user's past behavioral history and provides recommendation information that can be used as a reference for the future. For example, based on information about the user's favorite restaurants and frequently purchased products, it can suggest new services and products that are highly relevant.

[0662] For example, when a user is visiting a museum and viewing a particular painting, providing real-time information about the history and artist associated with that painting can support a deeper understanding and experience.

[0663] An example of a prompt for a generative AI model might be: "When a user is visiting a museum and viewing a particular painting, please provide real-time information about the history and artist associated with that painting."

[0664] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0665] Step 1: Data Collection

[0666] When a user puts on smart glasses, the device uses a camera to acquire visual information and a microphone to collect audio information. Specifically, the device acquires images and videos of the surrounding environment, as well as ambient sounds and conversations. At this stage, the input is the user's visual and auditory environment, and the output is raw image and audio files.

[0667] Step 2: Data Preprocessing

[0668] The device processes the collected visual and audio information in real time. Specifically, it uses facial recognition technology to blur the faces of people in the video and speech recognition to remove specific information (e.g., personal names). The input is the raw data obtained in step 1, and the output is anonymized data from which this personal information has been removed.

[0669] Step 3: Data Transfer

[0670] The terminal sends pre-processed data to the server via a secure communication path. Specifically, it transfers data using encryption protocols such as SSL / TLS. The input is anonymized data, and the output is anonymized data that the server can access.

[0671] Step 4: Data Analysis

[0672] The server executes machine learning algorithms to analyze the received data. The input consists of anonymized visual and audio data. Image recognition and natural language processing are used to identify specific objects and brands from the visual data and extract keywords from the audio data. The output is profile information about the user's interests and behavioral patterns.

[0673] Step 5: Information Generation

[0674] The server generates customized information for the user based on the analysis results. The input is the user's profile information, and the output includes personalized news, product information, and event announcements. This information is prioritized based on the user's interests.

[0675] Step 6: Information Provision

[0676] The server sends the generated information to the device, which then overlays the information on the smart glasses' display. Specifically, it displays details of events of interest or sale information directly within the user's field of view. The input is customized information from the server, and the output is information provided visually to the user.

[0677] Step 7: Memory Supplementation and Recommendation

[0678] The server accumulates past user behavior data and generates recommendations for future behavior through analysis. The input is the user's past behavior history, and the output is information to supplement that memory and recommendations for future behavior.
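The prioritization in Step 5 of the flow above can be sketched as a scoring step over candidate items. The sketch assumes that the user profile is a mapping from interest tags to weights and that each candidate item carries tags; these structures and the function name `prioritize` are assumptions, since the disclosure does not define the profile format.

```python
def prioritize(items: list[dict], interest_weights: dict[str, float], limit: int = 3) -> list[str]:
    """Rank candidate items by the summed weight of their matching interest tags."""

    def score(item: dict) -> float:
        return sum(interest_weights.get(tag, 0.0) for tag in item["tags"])

    ranked = sorted(items, key=score, reverse=True)
    # Drop items that match none of the user's interests.
    return [item["title"] for item in ranked if score(item) > 0][:limit]


# Hypothetical profile and candidates, for illustration only.
profile = {"art": 0.9, "coffee": 0.4}
candidates = [
    {"title": "Coffee sale", "tags": ["coffee"]},
    {"title": "Car show", "tags": ["cars"]},
    {"title": "Museum night", "tags": ["art", "event"]},
]
print(prioritize(candidates, profile))  # prints: ['Museum night', 'Coffee sale']
```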

[0679] (Application Example 1)

[0680] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0681] In modern urban life, there is a demand for efficient and personalized information. However, conventional information systems struggle to provide optimal information in real time based on users' interests and behavior. As a result, there is a lack of systems that can effectively utilize information on public facility usage and events.

[0682] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0683] In this invention, the server includes a visual device means for collecting environmental data and audio information from the user, an information processing device means for pre-processing the collected data to anonymize and reduce its data size, and a system means that cooperates with a sensor network that provides real-time information on the usage status of public facilities. This enables users to receive personalized urban information in real time.

[0684] A "visual device for collecting environmental and audio information from users" is a terminal device worn by a user that records visual and audio information about the surroundings, and is a device that plays a role in data collection.

[0685] An "information processing device that pre-processes collected data to anonymize and reduce its size" is a device that processes data obtained through a visual device in real time, optimizing data capacity while protecting privacy.

[0686] An "analytical device that analyzes pre-processed data to estimate individual interests" is a device that analyzes users' interests and behavioral patterns from data and performs processing to meet specific needs.

[0687] A "video output device for displaying user-related information" is a display device that intuitively presents analyzed information to the user and transmits information through visual means.

[0688] An "information integration system that combines analyzed results with information from external databases" is an integrated system that combines information from existing databases to provide users with more detailed information.

[0689] A "system that works in conjunction with a sensor network to provide real-time information on the usage of public facilities" is a network system that monitors the movement of equipment and people within public facilities in real time and provides users with information on the current status of the facilities based on that information.

[0690] The system for realizing this invention consists of a visual device worn by the user, a server that processes information, and a sensor network that manages facility information. In this system, the user's visual device collects ambient environmental data and audio information. This collected data is immediately anonymized and efficiently subjected to data reduction processing.

[0691] The server analyzes the collected data via an information processing device and uses machine learning algorithms to estimate user interests and behavioral patterns. This allows it to extract information on public facilities and events that are likely to interest the user. A cloud platform is used for this analysis; specifically, Microsoft Azure is used for data analysis.

[0692] Furthermore, the server works in conjunction with a sensor network to collect real-time information on the usage of public facilities and provides users with relevant information based on that data. For example, information on available seats at nearby libraries and details of ongoing events can be overlaid on the display of the user's visual device. This allows citizens to utilize urban resources more efficiently.
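
The seat-availability example above can be sketched as a filter over sensor-network readings. The `FacilityStatus` record, the 1 km default radius, and the function names are assumptions for illustration; the specification does not define the sensor data format.

```python
import math
from dataclasses import dataclass

@dataclass
class FacilityStatus:
    """Hypothetical sensor-network reading for one public facility."""
    name: str
    lat: float
    lon: float
    free_seats: int

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_available(user_lat: float, user_lon: float,
                     facilities: list[FacilityStatus],
                     radius_km: float = 1.0) -> list[FacilityStatus]:
    """Facilities within the radius that currently report at least one free seat."""
    hits = [
        f for f in facilities
        if f.free_seats > 0
        and haversine_km(user_lat, user_lon, f.lat, f.lon) <= radius_km
    ]
    # Closest facilities first.
    return sorted(hits, key=lambda f: haversine_km(user_lat, user_lon, f.lat, f.lon))
```

The returned list is what would be overlaid on the user's display, for example as "seats available at the nearby library."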

[0693] As a concrete example, when a citizen is walking through a major intersection, their visual device displays a notification stating, "A free music concert at the community hall starts at 6 PM." This notification is provided based on real-time information obtained from the sensor network in the area and the user's past interest data. Another example is the prompt message, "Based on my current location, please display information on the availability of nearby public facilities and events on my smart glasses," which requests that this information be provided to the user.

[0694] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0695] Step 1:

[0696] The user wears a visual device and goes about their daily life. The device acquires ambient environmental data and audio information through a camera and microphone. The input at this stage is visual and audio information of the environment, and the output is environmental data in digital format. The device uses a camera integrated with sensors to collect visual and audio data, and acquires this data as a primary record.

[0697] Step 2:

[0698] The device immediately preprocesses the collected visual and audio data. The input is the raw data collected in step 1. During processing, user privacy is protected by filtering the data down to the important information and anonymizing it. The output is compressed and anonymized data. In this process, a dedicated processor on the device optimizes the amount of data using a data reduction algorithm.
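
The data reduction in this step is not specified in detail. As one hedged illustration, the sketch below thins a byte stream by keeping every n-th sample and then applies lossless compression with the standard `zlib` module. Both the decimation strategy and the function name `reduce_and_compress` are assumptions, not the method of the specification.

```python
import zlib

def reduce_and_compress(samples: bytes, keep_every: int = 2) -> bytes:
    """Decimate a raw byte stream, then compress it losslessly.

    Hypothetical stand-in for the unspecified "data reduction algorithm".
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    reduced = samples[::keep_every]  # naive decimation (assumed strategy)
    return zlib.compress(reduced, level=6)
```

The server side would recover the decimated stream with `zlib.decompress`; the samples dropped by decimation are not recoverable, which is the trade-off of this kind of reduction.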

[0699] Step 3:

[0700] The server receives pre-processed data sent from the terminal. The input is compressed and anonymized data, and a machine learning model is used to analyze the user's interests and behavioral patterns. The output is user-specific interest data. Specifically, a generative AI model on the server recognizes specific objects and events from visual data and explores the user's interests from audio data.

[0701] Step 4:

[0702] The server works in conjunction with the sensor network to acquire real-time usage information and event information for public facilities. The input is sensor data from the facilities, and the output is recommendation information based on user interests. In this step, the server aggregates cloud-based sensor data and analyzes usage patterns.

[0703] Step 5:

[0704] The server integrates analyzed user interest information with sensor network data and sends specific event information and recommendations to the user's visual device. The input is the integrated information, and the output is the visual information displayed to the user. The user's visual device uses overlay displays to present this information in real time, delivering it to the user in a visually easy-to-understand format.

[0705] Step 6:

[0706] Users make decisions in urban life based on information displayed on visual devices. The input is the information displayed by the visual device, and the output is the user's action choices. This process enables users to make convenient choices, such as using public facilities or participating in events.

[0707] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0708] This invention is an information system centered around smart glasses used by users on a daily basis. It recognizes the user's emotions based on visual and auditory data and utilizes this to provide personalized information. In addition to a terminal (smart glasses) and a server, this system includes an emotion engine for emotion recognition.

[0709] Data collection and emotion recognition:

[0710] By wearing smart glasses, users collect visual and auditory data in various everyday situations. The device uses a camera to capture the user's facial expressions and surroundings as visual data, and a microphone to record the user's and their surroundings' voices. The emotion engine analyzes this visual and auditory data in real time to estimate the user's emotional state.

[0711] Data processing and analysis:

[0712] The collected data undergoes noise reduction and necessary anonymization processing on the terminal before being transmitted to the server via secure communication. On the server, machine learning models analyze the data to identify user interests and behavioral tendencies, and process the information while considering the emotional state provided by the emotion engine.

[0713] Information generation and presentation:

[0714] The user's emotional information, obtained through the emotion engine, is integrated into the information generation process on the server. The server dynamically generates information tailored to the user's interests and emotional state, determining personalized content. For example, if the user is feeling stressed, information related to relaxation will be prioritized and presented.
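
The emotion-dependent prioritization described above can be sketched as a per-emotion boost applied to content categories. The boost table, the tuple layout of the candidates, and the function name `rank_by_emotion` are hypothetical; only the "stressed leads to relaxation content" rule comes from the text.

```python
# Hypothetical boost table: emotion -> {content category -> multiplier}.
EMOTION_BOOSTS = {
    "stressed": {"relaxation": 2.0},
}

def rank_by_emotion(candidates: list[tuple[str, str, float]], emotion: str) -> list[str]:
    """Rank (title, category, base_score) candidates for the given emotion.

    Categories without a boost for this emotion keep their base score.
    """
    boosts = EMOTION_BOOSTS.get(emotion, {})
    scored = [
        (base * boosts.get(category, 1.0), title)
        for title, category, base in candidates
    ]
    # Highest boosted score first.
    return [title for _, title in sorted(scored, key=lambda s: s[0], reverse=True)]
```

When the emotion engine reports "stressed," a relaxation item with a lower base score can still be presented ahead of other content.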

[0715] Display and Feedback:

[0716] The device overlays processed information onto the smart glasses' display. This display adjusts appropriately according to the user's gaze and facial expressions, optimizing the amount and type of information provided. By continuously monitoring the user's emotions, the system strives to provide the most relevant information at all times to meet the user's needs.

[0717] As a concrete example of the present invention, if a user is shopping in a mall and has a calm expression, the system will present the latest trend information. On the other hand, if the user is feeling anxious when deciding to make a purchase, the system will quickly display additional product reviews and price information based on feedback from the emotion engine to help them make a purchase decision.

[0718] The following describes the processing flow.

[0719] Step 1:

[0720] The device collects visual and auditory data using its camera and microphone while the user is wearing the smart glasses. The camera captures the user's facial expressions and gaze direction, while the microphone picks up voice and ambient sounds. This data is prepared as input for the emotion engine.

[0721] Step 2:

[0722] The emotion engine analyzes visual and auditory data supplied from the device to estimate the user's current emotional state in real time. It uses facial recognition technology to analyze facial features and extracts emotional changes from voice tone and speaking style.

[0723] Step 3:

[0724] The device transmits the emotion estimation results from the emotion engine to the server via a secure channel. During this process, the data is anonymized and compressed to ensure efficient and secure communication.

[0725] Step 4:

[0726] The server analyzes the received emotion data by integrating it with previously accumulated data on user interests and behavioral tendencies. Based on the results of this analysis, it generates information content that corresponds to the user's current needs and interests.

[0727] Step 5:

[0728] The server sends information to the terminal that takes the user's emotional state into consideration. This information includes relaxation content to reduce stress and engaging entertainment information tailored to the user's emotions.

[0729] Step 6:

[0730] The device overlays information onto the smart glasses' display. The displayed information is presented in an optimal format, taking into account the user's gaze and posture, and the content is adjusted according to their emotions.

[0731] Step 7:

[0732] If the user dismisses the displayed information or the user's emotions change, the device immediately feeds this change back to the emotion engine and server. This feedback is used to optimize future information presentations.
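
How the feedback is used is left open in the text. One simple possibility, shown purely as an assumption, is to scale a per-category weight down when the user removes an item and up when the user keeps it. The update rule, the default rate of 0.2, and the function name `apply_feedback` are all hypothetical.

```python
def apply_feedback(weights: dict[str, float], category: str,
                   dismissed: bool, rate: float = 0.2) -> dict[str, float]:
    """Return a new weight table adjusted by one feedback event.

    Assumed rule: multiply by (1 - rate) on dismissal, (1 + rate) otherwise.
    The input table is not modified.
    """
    updated = dict(weights)
    current = updated.get(category, 1.0)  # unseen categories start at 1.0
    factor = (1.0 - rate) if dismissed else (1.0 + rate)
    updated[category] = round(current * factor, 6)
    return updated
```

The adjusted weights would then influence which categories the server favors in later presentations.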

[0733] (Example 2)

[0734] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0735] Traditional systems have a problem in that they do not adequately provide information tailored to the user's emotional state or specific situation. Therefore, it is difficult to present the content users want in a timely and appropriate manner. As a result, information does not align with user needs, leading to decreased satisfaction. Furthermore, from a privacy protection perspective, the anonymization and secure handling of collected data are of paramount importance.

[0736] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0737] In this invention, the server includes means for processing visual and auditory data collected from the user and analyzing emotions using an emotion recognition engine and a machine learning model; means for generating personalized information based on the user's emotional state using a generative AI model; and means for presenting dynamically adjusted information to the user using a display device. This makes it possible to provide personalized information to the user in real time and improve user satisfaction.

[0738] A "terminal" is a device worn by a user to collect visual and auditory data.

[0739] "Noise reduction" is a process that removes unwanted sounds and images from collected data to clarify important information.

[0740] "Anonymization" is a process that removes or transforms personally identifiable information to protect data privacy.

[0741] A "server" is a central computing system that receives data sent from terminals and performs analysis and information generation.

[0742] An "emotion recognition engine" is software or a system that estimates a user's emotional state based on visual and auditory data.

[0743] A "machine learning model" is an algorithm or system used to analyze collected data and estimate user interests and behaviors.

[0744] A "generative AI model" is a model that applies artificial intelligence technology to generate appropriate information based on the user's emotional state and interests.

[0745] A "display device" is a screen or display used to visually present generated information to a user.

[0746] "Overlay display" is a technique that displays information superimposed on the user's field of view, and is used to improve the user experience.

[0747] This invention is an information system built around smart glasses worn by the user. The system primarily consists of smart glasses, a server, and an emotion recognition engine.

[0748] By wearing smart glasses, users collect visual and auditory data. Specifically, the smart glasses utilize built-in cameras and microphones to collect the user's facial expressions and surrounding sounds, processing the data in real time. The collected data undergoes noise reduction and anonymization processing within the device. This processing improves data quality while protecting user privacy.

[0749] Pre-processed data is transmitted to the server via a secure communication protocol. The server uses an emotion recognition engine to analyze the data and estimate the user's emotional state. Based on this analysis, the server uses a generative AI model to generate information that matches the user's interests and emotions. For example, if the user is feeling stressed, the server will prioritize generating relaxing content.

[0750] The generated information is transferred from the server to the smart glasses' display device. The smart glasses overlay the information and dynamically adjust the displayed information according to the user's gaze and facial expressions. This enables the provision of personalized information that is tailored to the user's situation and emotions.

[0751] For example, if a user is calm and relaxed in a cafe, the server can generate and display information about nearby events and recommended reading lists. Furthermore, if a user feels anxious while shopping, the system can support their purchase decision by quickly providing additional product reviews and pricing information based on their emotions.

[0752] An example of a prompt given to the generative AI model is, "How should information about relaxation be presented when a user is feeling stressed?"

[0753] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0754] Step 1:

[0755] When a user puts on smart glasses, the device collects the user's visual and audio data. Specifically, it uses a camera to capture the surroundings and the user's facial expressions, and a microphone to record audio. The inputs include real-time acquired visual and audio data. Using this data, the device performs noise reduction processing to output clean data that eliminates unnecessary information. This processing removes background sounds and deletes meaningless pixels from the video.

[0756] Step 2:

[0757] The device anonymizes the data preprocessed in Step 1. The input consists of de-noised visual and audio data. The device hides or transforms specific parts of the information to prevent personal identification, outputting a new, privacy-protected dataset. This process employs techniques such as applying filters to generalize facial features and removing personal names from audio data.

[0758] Step 3:

[0759] The terminal sends the anonymized data to the server via a secure protocol (e.g., HTTPS). The input for transmission is the anonymized data obtained in step 2. The output is in the form of data communication packets received by the server. This transmission and reception process is encrypted to prevent data leakage or interception.

[0760] Step 4:

[0761] The server analyzes the received data. The input data consists of anonymized visual and audio data sent from the terminal. The server uses an emotion recognition engine to estimate the user's emotional state. This analysis outputs information about the user's emotions. In this analysis step, image recognition algorithms and natural language processing techniques are applied to identify emotions from the user's facial expressions and tone of voice.

[0762] Step 5:

[0763] The server utilizes the analyzed emotional information and generates user-specific information using a generative AI model. The input data is the emotional information from step 4. The output is personalized information tailored to the user's emotional state. For example, if the user is anxious, content related to relaxation will be generated. In this step, pre-configured prompts are used to instruct the generative AI model to generate information.
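
The use of pre-configured prompts in this step might look like the following sketch, which selects a template by emotion and appends the estimated interests. The template table, the fallback wording, and the function name `build_prompt` are assumptions; only the "stressed" prompt text is taken from the specification.

```python
# Pre-configured prompts keyed by emotion. Only the "stressed" entry is
# quoted from the specification; the fallback template is hypothetical.
PROMPT_TEMPLATES = {
    "stressed": "How should information about relaxation be presented "
                "when a user is feeling stressed?",
}
DEFAULT_TEMPLATE = "How should information be presented when a user is feeling {emotion}?"

def build_prompt(emotion: str, interests: list[str]) -> str:
    """Assemble the prompt text sent to the generative AI model."""
    prompt = PROMPT_TEMPLATES.get(emotion, DEFAULT_TEMPLATE.format(emotion=emotion))
    if interests:
        prompt += " The user's estimated interests are: " + ", ".join(interests) + "."
    return prompt
```

The resulting string would be passed to whichever generative AI model the server uses; the call to the model itself is omitted here because the specification does not fix an interface.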

[0764] Step 6:

[0765] Information generated by the server is sent to the terminal, which then overlays that information onto the smart glasses' display. The input sent to the terminal is the generated personalized information, and the output is the information displayed within the user's field of vision. In this overlay display, the display position and content are dynamically adjusted according to the user's gaze and facial expressions, providing a comfortable user experience.
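
The gaze-dependent positioning can be illustrated with a small placement routine that keeps the overlay beside the gaze point and inside the display. The pixel coordinate system, the 40-pixel offset, and the function name `place_overlay` are assumptions; the specification does not describe the placement rule.

```python
def place_overlay(gaze: tuple[int, int], display: tuple[int, int],
                  box: tuple[int, int], offset: int = 40) -> tuple[int, int]:
    """Return the top-left corner for an overlay box near the gaze point.

    Assumed rule: place the box to the right of the gaze point, fall back to
    the left if it would overflow, then clamp to the display bounds.
    """
    gx, gy = gaze
    dw, dh = display
    bw, bh = box
    x = gx + offset
    if x + bw > dw:
        x = gx - offset - bw  # not enough room on the right
    y = gy - bh // 2          # vertically centred on the gaze point
    x = max(0, min(x, dw - bw))
    y = max(0, min(y, dh - bh))
    return x, y
```

Placing the box beside the gaze point rather than on it is one way to avoid covering what the user is currently looking at.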

[0766] (Application Example 2)

[0767] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0768] Modern consumers are required to select the most suitable products and services amidst information overload, and in particular, in the shopping experience at physical stores, it is crucial to present information that takes into account individual needs and emotional states. However, current technology makes it difficult to accurately grasp a user's emotional state and present the most appropriate information in real time. Therefore, there is an urgent need to develop a system that effectively provides personalized information by utilizing user emotional information.

[0769] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0770] In this invention, the server includes information processing means for collecting visual and auditory information from the user, information processing means for pre-processing, anonymizing, and sorting the collected information, and information analysis system means for analyzing the pre-processed information to estimate the user's emotional state. This makes it possible to provide personalized information about products and services that are tailored to the user's emotional state.

[0771] An "information processing device for collecting visual and auditory information from a user" refers to a device that acquires audiovisual information such as a user's facial expressions and voice, and has the function of collecting information in everyday situations.

[0772] An "information processing device that pre-processes, anonymizes, and filters collected information" is a device that removes personally identifiable elements from acquired information and converts it into an appropriate format, playing a role in extracting only useful information while protecting data privacy.

[0773] An "information analysis system that analyzes pre-processed information to estimate a user's emotional state" is a system that accurately estimates a user's emotions based on pre-processed information, and has the function of analyzing emotional states using specific algorithms or machine learning models.

[0774] A "display device for presenting relevant product or service information based on the user's emotional state" is a device that visually displays information about products and services to the user in a manner adapted to the analyzed emotional state, enabling real-time information provision.

[0775] An "information integration system that combines analyzed emotional states and estimation results with supplementary information obtained from other information storage devices" is a system that integrates the results of emotional analysis with data obtained from external information sources and processes them comprehensively, possessing the function of integrating the information necessary to provide users with the most optimal information.

[0776] In embodiments of the present invention, the information processing system is primarily composed of smart glasses worn by the user. The smart glasses are equipped with a camera to acquire visual information and a microphone to acquire audio information, and function as an "information processing device for collecting visual and audio information from the user." The device constantly acquires the user's visual and audio data, and the collected information is first pre-processed, including noise reduction. At this time, anonymization and sorting are performed to remove personally identifiable elements, thereby protecting privacy.

[0777] Pre-processed data is sent to a cloud server, which uses a machine learning model to estimate the user's emotional state. Specifically, the server analyzes the acquired data using an "information analysis system that analyzes pre-processed information to estimate the user's emotional state," and generates personalized information based on the results.

[0778] Product and service information tailored to the user's emotional state is overlaid on the smart glasses' display. This display device operates as a "display device for presenting relevant product or service information based on the user's emotional state," dynamically responding to the user's gaze and facial expressions to present information.

[0779] Furthermore, the server integrates the emotional state and estimation results using an "information integration system that combines the analyzed emotional state and estimation results with supplementary information obtained from other information storage devices" to optimize information delivery. This includes commercial information such as new products and promotions. In this way, users can receive information in a manner that harmonizes with their own emotions, making the in-store shopping experience more personalized and enhanced.

[0780] For example, when a user is inspecting a sofa in a store, if the emotion engine determines that the user is "interested," customer reviews and promotional information for that sofa will be displayed on the smart glasses. In addition, prompts such as "Please tell us what other users think of this product" and "Please suggest recommended product combinations" are given to the generative AI model to generate suggestions for improving the user experience.

[0781] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0782] Step 1:

[0783] The device collects visual and auditory information from the user. Using a camera and microphone, it captures the user's facial expressions and surrounding sounds, recording them as digital data. The input is the user's visual and auditory information, and the output is the digital data derived from this information.

[0784] Step 2:

[0785] The collected data is pre-processed on the terminal. Data accuracy and privacy are ensured by noise reduction and anonymization of personally identifiable elements. The input is the digital data obtained in step 1, and the output is filtered and anonymized clean data. This data processing is performed by a signal processing algorithm.

[0786] Step 3:

[0787] The terminal securely sends filtered data to the server. This data is encrypted and transferred to the server. The input is the clean data from step 2, and the output is the data after it has been transferred to the server.

[0788] Step 4:

[0789] The server analyzes the received data and uses a generative AI model to estimate the user's emotional state. The input is the data sent in step 3, and the output is information indicating the user's emotional state. This data processing is performed by an emotion analysis algorithm.

[0790] Step 5:

[0791] The server prepares data to generate and display information about relevant products and services based on the user's emotional state. The input is the emotional state information from step 4, and the output is information relevant to the user. For information generation, a generative AI model is used together with prompt statements such as "Please tell me what other users think of this product" and "Please suggest recommended product combinations."

[0792] Step 6:

[0793] The device overlays information received from the server onto the smart glasses. The display position and content of the information are dynamically adjusted based on the user's gaze and actions. The input is the information prepared in step 5, and the output is what is displayed on the smart glasses. This operation is performed using eye-tracking and display technologies.

[0794] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0795] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0796] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0797] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is the specific mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0798] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
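
The geometry described for the emotion map 400 can be pictured with polar coordinates. In the sketch below, each emotion has a radius (distance from the center) and an angle, with 0 degrees at the 3 o'clock position on the situation side, 90 degrees at the top (pleasure), and 180 degrees on the reaction side. The specific coordinates assigned to each emotion are invented for illustration and do not come from Figure 9.

```python
import math

# Hypothetical placements as (radius, angle in degrees); not taken from Figure 9.
EMOTION_MAP = {
    "reassured": (1.0, 10.0),
    "calm": (1.0, 20.0),
    "anxious": (1.0, 350.0),
    "desire": (2.0, 170.0),
}

def to_xy(radius: float, angle_deg: float) -> tuple[float, float]:
    """Convert a polar map position to Cartesian coordinates."""
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)

def map_distance(a: str, b: str) -> float:
    """Euclidean distance between two emotions on the map."""
    return math.dist(to_xy(*EMOTION_MAP[a]), to_xy(*EMOTION_MAP[b]))

def side(emotion: str) -> str:
    """Right half is the situation side, left half the reaction side."""
    x, _ = to_xy(*EMOTION_MAP[emotion])
    return "situation" if x >= 0 else "reaction"
```

Under this representation, "emotions that are likely to occur simultaneously are mapped close together" corresponds to a small value of `map_distance`.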

[0799] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0800] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible (expressed in actions) it becomes.

[0801] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0802] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0803] The emotion identification model 59 feeds user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained on multiple training data sets, each of which is a combination of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
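One way to obtain training targets in which neighboring emotions have similar values is to smooth a label over the map with a distance-based kernel. The sketch below assumes hypothetical 2-D coordinates for four emotions and a Gaussian kernel with width `sigma`; none of these details come from the disclosure, and the neural network itself is not implemented here. It shows only how such target values could be built and how the strongest emotion could be selected from a set of emotion values.

```python
import math

# Hypothetical 2-D positions on the emotion map (not taken from the source).
EMOTION_POSITIONS = {
    "reassured": (1.0, 0.0),
    "calm": (1.0, 0.2),
    "confident": (1.0, -0.2),
    "anxious": (-1.0, 0.0),
}


def soft_targets(label, positions=EMOTION_POSITIONS, sigma=0.5):
    """Spread a single label over nearby emotions with a Gaussian kernel,
    so that emotions close together on the map receive similar values."""
    cx, cy = positions[label]
    weights = {
        name: math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in positions.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def identify_emotion(emotion_values):
    """Pick the emotion with the largest emotion value."""
    return max(emotion_values, key=emotion_values.get)


targets = soft_targets("reassured")
dominant = identify_emotion(targets)
```

With these assumed coordinates, "calm" and "confident" receive values close to that of "reassured", while "anxious" on the opposite side of the map receives a value near zero.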

[0804] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0805] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific process may be distributed across multiple computers, including computer 22. For example, a data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0806] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0807] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0808] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0809] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0810] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0811] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0812] More specifically, the hardware structure of these various processors can be an electrical circuit that combines circuit elements such as semiconductor devices. The specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps may be deleted, new steps added, or the processing order rearranged, as long as this does not depart from the gist of the disclosure.

[0813] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not depart from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0814] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0815] The following is further disclosed regarding the embodiments described above.

[0816] (Claim 1)

[0817] A device for collecting visual and auditory data from a user,

[0818] A device and means for preprocessing the collected data to anonymize and filter it,

[0819] A system means for analyzing pre-processed data to estimate user interests,

[0820] A display device for presenting information related to the user,

[0821] A system means for integrating the analyzed results with information from other databases,

[0822] A system comprising the above.

[0823] (Claim 2)

[0824] A system according to claim 1, which accumulates the user's past actions and generates memory supplementation information.

[0825] (Claim 3)

[0826] A system that provides recommendations based on the user's lifestyle in cooperation with other devices, according to claim 1.
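The flow recited in the claims above (collect, anonymize and filter, estimate interests, integrate with another database) can be sketched as follows. This is a minimal illustration under stated assumptions: the input is taken to be already-transcribed text, anonymization is reduced to masking e-mail addresses and phone-like numbers with regular expressions, interest estimation is a simple keyword count, and the "database" is an in-memory dictionary. All function names, patterns, topics, and sample data are hypothetical and not part of the disclosure.

```python
import re
from collections import Counter

# Illustrative patterns only; real anonymization would need far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{2,4}-\d{2,4}-\d{4}\b")


def preprocess(utterances):
    """Mask personal identifiers and drop utterances too short to be useful."""
    cleaned = []
    for text in utterances:
        text = EMAIL.sub("[EMAIL]", text)
        text = PHONE.sub("[PHONE]", text)
        if len(text.split()) >= 3:
            cleaned.append(text)
    return cleaned


def estimate_interests(utterances, topics):
    """Rank topics by how many utterances mention one of their keywords."""
    counts = Counter()
    for text in utterances:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for topic, keywords in topics.items():
            if words & keywords:
                counts[topic] += 1
    return [topic for topic, _ in counts.most_common()]


def integrate(interests, database):
    """Look up information for each estimated interest in another database."""
    return [item for topic in interests for item in database.get(topic, [])]


utterances = [
    "Call me at 03-1234-5678 about the jazz concert",
    "uh huh",
    "Mail taro@example.com the ramen shop address",
    "I loved that jazz album",
]
topics = {"music": {"jazz", "concert", "album"}, "food": {"ramen", "sushi"}}
database = {
    "music": ["Jazz night at the city hall"],
    "food": ["New ramen shop nearby"],
}

cleaned = preprocess(utterances)
interests = estimate_interests(cleaned, topics)
to_display = integrate(interests, database)
```

The resulting `to_display` list stands in for what the display device would present; the display step itself is outside the scope of this sketch.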

[0827] "Example 1"

[0828] (Claim 1)

[0829] A device and means for collecting visual and auditory information obtained from the user's daily life,

[0830] A device and means for processing collected information in real time, anonymizing it, and removing unnecessary information,

[0831] Device means for transmitting pre-processed information to a central processing unit using a secure communication path,

[0832] A central processing unit analyzes information using machine learning algorithms to estimate user interests and behavioral patterns;

[0833] A visual output device means that presents personalized information relevant to the user,

[0834] A system that accumulates user behavior history based on analysis results and generates recommendations that will be useful in the future,

[0835] A system comprising the above.

[0836] (Claim 2)

[0837] A system according to claim 1, which stores information on a user's past activities and provides memory supplementation information and future-oriented recommendations.

[0838] (Claim 3)

[0839] A system that collaborates with other devices and services to provide personalized suggestions based on the user's lifestyle, as described in claim 1.

[0840] "Application Example 1"

[0841] (Claim 1)

[0842] A visual device means for collecting environmental data and audio information from the user,

[0843] Information processing device means for pre-processing collected data to anonymize and reduce data size,

[0844] An analytical device means for analyzing pre-processed data to estimate individual interests,

[0845] A video output device means for displaying information related to the user,

[0846] An information integration system means that combines the analyzed results with information from an external database,

[0847] A system that works in conjunction with a sensor network that provides real-time information on the usage status of public facilities,

[0848] A system comprising the above.

[0849] (Claim 2)

[0850] A memory management device according to claim 1, which accumulates the user's past actions and generates memory supplementation information based on that action history.

[0851] (Claim 3)

[0852] A recommendation system that provides event information based on the user's lifestyle in cooperation with other information gathering devices, according to claim 1.
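As an illustration of how real-time facility usage from a sensor network might be combined with estimated interests, the sketch below keeps the latest reading per facility and suggests the least crowded facilities that match the user's interests. The `FacilityReading` record, its fields, the `max_ratio` threshold, and the sample data are all assumptions for illustration; the disclosure does not define the sensor data format or the selection rule.

```python
from dataclasses import dataclass


@dataclass
class FacilityReading:
    """One hypothetical sensor-network report for a public facility."""
    facility: str
    category: str
    occupancy: int
    capacity: int


def latest_readings(readings):
    """Keep only the most recent reading per facility.
    Readings are assumed to arrive in chronological order."""
    latest = {}
    for reading in readings:
        latest[reading.facility] = reading
    return latest


def suggest_facilities(readings, interests, max_ratio=0.8):
    """Return facilities matching the user's interests, least crowded first."""
    candidates = [
        r for r in latest_readings(readings).values()
        if r.category in interests and r.occupancy / r.capacity <= max_ratio
    ]
    candidates.sort(key=lambda r: r.occupancy / r.capacity)
    return [r.facility for r in candidates]


readings = [
    FacilityReading("Central Library", "library", 90, 100),
    FacilityReading("City Gym", "gym", 20, 100),
    FacilityReading("Central Library", "library", 40, 100),
    FacilityReading("Riverside Library", "library", 10, 50),
]
suggestions = suggest_facilities(readings, {"library"})
```

Here the older, more crowded reading for the Central Library is superseded by the newer one, so both libraries qualify and the less crowded one is listed first.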

[0853] "Example 2 of combining an emotion engine"

[0854] (Claim 1)

[0855] A means of using a terminal to collect visual and auditory data from users,

[0856] A means for applying noise reduction and anonymization processing to the collected data,

[0857] A means of sending pre-processed data to a server and analyzing it using an emotion recognition engine and a machine learning model,

[0858] A means for generating dynamic information and determining information based on the user's emotional state, using a generative AI model based on the analysis results,

[0859] A means of using a display device that adjusts the overlay display according to the user's gaze and facial expressions,

[0860] A system comprising the above.

[0861] (Claim 2)

[0862] The system according to claim 1, which has means for accumulating a user's past emotional data and generating memory supplementation information.

[0863] (Claim 3)

[0864] The system according to claim 1, which provides a means for providing recommendations based on the user's lifestyle and emotions in conjunction with emotion recognition.

[0865] "Application example 2 when combining with an emotional engine"

[0866] (Claim 1)

[0867] Information processing device means for collecting visual and auditory information from a user,

[0868] Information processing device means for pre-processing, anonymizing, and sorting collected information,

[0869] An information analysis system means that analyzes pre-processed information to estimate the user's emotional state,

[0870] A display device for presenting relevant product or service information based on the user's emotional state,

[0871] An information integration system means that integrates the analyzed emotional state and estimation results with supplementary information obtained from other information storage devices,

[0872] A system comprising the above.

[0873] (Claim 2)

[0874] A data storage system according to claim 1, which accumulates information on a user's past behavior and emotional state to generate memory supplementation information.

[0875] (Claim 3)

[0876] The information provision system according to claim 1, which provides recommendation information based on the user's lifestyle and emotional state in cooperation with other information processing devices.

[Explanation of Symbols]

[0877]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: a visual device means for collecting environmental data and audio information from the user; an information processing device means for pre-processing the collected data to anonymize it and reduce its size; an analytical device means for analyzing the pre-processed data to estimate individual interests; a video output device means for displaying information related to the user; an information integration system means that combines the analyzed results with information from an external database; and a system that works in conjunction with a sensor network that provides real-time information on the usage status of public facilities.

2. A memory management device according to claim 1, which accumulates the user's past actions and generates memory supplementation information based on that action history.

3. The system according to claim 1, which provides event information based on the user's lifestyle in cooperation with other information gathering devices.