System

A system collects and summarizes user-specified data into personalized video content, using machine learning and an emotion engine to address the challenge of information overload and enhance user engagement.

JP2026105479A · Pending · Publication Date: 2026-06-26 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-16
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

The rapid increase in information makes it difficult for individuals to efficiently obtain information that matches their interests, which wastes time and resources. Existing systems also fail to provide personalized, emotionally responsive content delivery.

Method used

A system that collects user-specified data of interest, summarizes it, and provides it in video format, utilizing machine learning to personalize and adapt based on user feedback, incorporating an emotion engine to adjust content delivery.

Benefits of technology

Enables efficient, personalized, and emotionally responsive information delivery that meets user needs by continuously improving content relevance and engagement.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide a system. 【Solution】 A system including: means for receiving data related to a field of interest specified by a user; means for collecting relevant information from a digital network based on the received field of interest; means for automatically summarizing the collected information; means for converting the summarized information into a video format with audio within a specified time; means for providing the generated video and audio content to the user's information processing device; means for collecting the user's opinions and performing machine learning processing to improve the adaptive performance of the system; and means for summarizing information in different categories and optimizing the display content based on the user's selection.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, the volume of information continues to grow explosively, making it difficult for individuals to efficiently obtain the information they need. As a result, the risk of missing important information increases, and time and resources are wasted on unnecessary information. In particular, for people leading busy daily lives, quickly obtaining information that matches their interests and concerns is a major challenge.

Means for Solving the Problems

[0005] To solve this problem, the present invention proposes a system that collects data corresponding to user-specified areas of interest, summarizes it, and provides it in video format. This system includes means for receiving user-specified information, means for collecting related information, means for summarizing information, means for generating videos, means for providing videos, and means for learning to improve the system using user feedback, thereby improving the efficiency and accuracy of user information acquisition. In particular, by utilizing machine learning models, it is possible to continuously increase the degree of personalization based on feedback and provide the user with the most relevant information.

[0006] A "user" refers to someone who uses a system to obtain information tailored to their personal interests.

[0007] "Areas of interest" refers to specific information categories that users specify based on their own interests.

[0008] "Data collection" refers to the process of obtaining information related to a specified area of interest from the internet.

[0009] "Summarization" refers to the process of shortening collected information and extracting its essence.

[0010] "Video format" refers to a digital format designed to convey information visually and audibly.

[0011] A "learning algorithm" refers to a set of computational methods that utilize user feedback to continuously improve the predictive accuracy and adaptability of a system.

[0012] "Feedback" refers to the opinions and evaluations that users give to the content and services provided by a system.

[0013] "Personalization" refers to the process of optimizing content and services according to the preferences and needs of individual users.

Brief Description of the Drawings

[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Embodiments for Carrying Out the Invention

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which the processor uses as work memory.

[0019] In the following embodiments, storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, parameters, and the like. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication interface (I/F) denoted by a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with the specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] The embodiments for carrying out this invention will be described in detail. This system has a configuration for efficiently providing the latest information based on the user's selected area of interest.

[0036] First, the user uses their device to select their areas of interest. This information is sent to the server and stored in the database as a user profile. This reduces the effort required to select areas of interest again in the future.

[0037] The server collects news and information related to the user's specified areas of interest from the internet, based on their profile. This process utilizes methods to obtain the latest information through multiple news sites and open APIs. The key is ensuring the reliability and real-time nature of the information.

[0038] The collected information is analyzed on a server and summarized using natural language processing technology. During summarization, the most important elements are extracted to match the user's desired video length. In the text summarization stage, the goal is to eliminate redundancy and focus on the essential information the user truly needs.

[0039] Next, the server converts the summarized information into a video format. This process integrates text and related images and video clips in a slideshow format, and adds narration. This makes the information easier for the user to receive visually. In addition, each segment of the video uses visual effects to highlight key phrases and important points.

[0040] The generated video is delivered to the device, and the user watches it through the device. The video played on the device is optimized to minimize buffering, providing a smooth viewing experience.

[0041] After watching a video, users can provide feedback on it. This feedback is sent from the device to the server and used to improve the quality of future information delivery. Based on the user's feedback and viewing history, the server runs machine learning algorithms to further refine personalized information delivery for each user.

[0042] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental news and summarize it, focusing on important topics such as CO2 emission reduction. The generated video will combine relevant images and narration to explain the current state of global warming to the user in detail. In this way, the present invention realizes the efficient provision of information that meets user needs in an age of information overload.

[0043] The following describes the processing flow.

[0044] Step 1:

[0045] The user selects genres of interest using their device. The device sends this information to the server, where it is stored in the database as the user's profile data.
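Step 1 can be illustrated with a minimal sketch of the profile store. The source only says the selection is stored in the database as profile data; the table name, column names, and the use of SQLite here are assumptions made for illustration.

```python
import json
import sqlite3


def open_profile_db(path=":memory:"):
    """Open (or create) the profile database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "user_id TEXT PRIMARY KEY, interests TEXT NOT NULL)"
    )
    return conn


def save_profile(conn, user_id, interests):
    """Insert or overwrite the user's selected areas of interest."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (user_id, interests) VALUES (?, ?)",
        (user_id, json.dumps(sorted(set(interests)))),
    )
    conn.commit()


def load_profile(conn, user_id):
    """Return the stored interests, or an empty list for an unknown user."""
    row = conn.execute(
        "SELECT interests FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Because the profile is keyed by user, a later selection simply overwrites the earlier one, which matches the stated goal of not asking the user to reselect areas of interest each time.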

[0046] Step 2:

[0047] The server collects relevant news and information from the internet based on the user's profile. It consults multiple reliable sources and selects the most relevant information.

[0048] Step 3:

[0049] The server summarizes the collected information. Using natural language processing techniques, it extracts key points and removes redundant parts. As a result, a summary that fits the video time frame set by the user is obtained.
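One way to make a summary fit the user's video time frame is to convert the duration into a word budget. The sketch below assumes a narration rate of 150 words per minute and assumes the sentences have already been ranked by importance; neither detail comes from the source.

```python
import re

WORDS_PER_MINUTE = 150  # assumed narration rate, not stated in the source


def word_budget(video_seconds, words_per_minute=WORDS_PER_MINUTE):
    """Number of narrated words that fit in the requested video length."""
    return int(video_seconds * words_per_minute / 60)


def fit_to_duration(ranked_sentences, video_seconds):
    """Keep sentences, in priority order, while they fit the word budget.

    ranked_sentences: sentences ordered from most to least important.
    A sentence that would exceed the budget is skipped, not truncated.
    """
    budget = word_budget(video_seconds)
    kept, used = [], 0
    for sentence in ranked_sentences:
        n = len(re.findall(r"\w+", sentence))
        if used + n <= budget:
            kept.append(sentence)
            used += n
    return kept
```

Skipping rather than truncating keeps every narrated sentence complete, at the cost of sometimes leaving a little of the time budget unused.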

[0050] Step 4:

[0051] The server generates videos based on summarized information. It incorporates images and video clips related to the text and adds narration using speech synthesis technology. This allows information to be conveyed through both visual and auditory means.

[0052] Step 5:

[0053] The device receives the generated video and provides it to the user. The video is optimized for smooth playback on the device.

[0054] Step 6:

[0055] Users watch videos and provide feedback on the quality of the content. The device sends this feedback to the server, where it is used to improve future content.

[0056] Step 7:

[0057] The server receives user feedback and uses machine learning algorithms to improve the accuracy of the information provided. This optimizes future news selection so that it better matches each user's interests.
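The source does not name a specific learning algorithm for Step 7. As a minimal sketch, the update below nudges a per-topic preference weight toward each new rating (an exponential moving average); the 1 to 5 rating scale, the neutral starting weight, and the learning rate are all assumptions.

```python
def update_preferences(weights, topic, rating, learning_rate=0.2):
    """Move the weight of `topic` toward the rating (1..5 scaled to 0..1).

    weights: dict mapping topic -> preference weight in [0, 1].
    Unknown topics start at a neutral 0.5. Returns a new dict.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    target = (rating - 1) / 4
    current = weights.get(topic, 0.5)
    updated = dict(weights)
    updated[topic] = current + learning_rate * (target - current)
    return updated


def rank_topics(weights):
    """Topics ordered from most to least preferred."""
    return sorted(weights, key=lambda t: weights[t], reverse=True)
```

The ranked topics could then steer which collected items are summarized first in the next video.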

[0058] (Example 1)

[0059] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0060] In modern society, information is vast and diverse, making it difficult for users to quickly and efficiently obtain the latest information that perfectly matches their interests. Furthermore, providing relevant information in a visually easy-to-understand format and personalizing that information delivery to each user are also challenges. There is also room for improvement in maintaining the speed and quality of information transmission.

[0061] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0062] In this invention, the server includes means for acquiring information on a user-specified area of interest, means for machine-summarizing the acquired data, and means for converting the summarized data into a video format within a specified period. This makes it possible to quickly gather, summarize, and provide relevant information tailored to the user's interests in a visually easy-to-understand format.

[0063] A "user" is the entity that utilizes a service or system, and is the recipient of information.

[0064] "Areas of interest" refers to the areas of information that a user is particularly interested in and wants to learn more about.

[0065] "Information" includes data, news, and knowledge related to the user's areas of interest.

[0066] A "server" is a computer system used for collecting, processing, storing, and providing information.

[0067] "Summarization" is the process of making acquired information concise and extracting its core essence.

[0068] "Video format" refers to media formats such as videos and slideshows that can convey information visually.

[0069] "Device" refers to an electronic device used by a user to receive and utilize information.

[0070] A "learning method" is a means by which a system analyzes user opinions and behaviors to improve the accuracy and suitability of the information it provides.

[0071] "Textual information" refers to information provided in text format.

[0072] "Voice information" refers to information provided in audio format.

[0073] "Visual material" refers to elements of media that are communicated visually, such as images and video clips.

[0074] In order to implement this invention, it is necessary to build a system in which the user, server, and terminal elements work together.

[0075] Users select their areas of interest using a device. The device has a dedicated application or web interface installed, through which users input their areas of interest and send them to the server. This process allows users to easily specify the information they are interested in.

[0076] The server collects relevant information via the internet based on the user's areas of interest. The server utilizes scraping tools implemented in programming languages such as Python, as well as open APIs. The collected information is automatically summarized using a natural language processing generative AI model. During summarization, redundant parts are removed, and the core information is extracted.

[0077] The server then converts the summarized information into a video format. This video generation uses video editing software such as Adobe Premiere Pro or Final Cut Pro. The text, related images, and video clips are integrated, and voice narration generated using a generative AI model is added to create a complete visual collection.

[0078] The generated video is sent from the server to the user's terminal, where the user uses it to view the information. The terminal plays the video smoothly using a high-speed data transfer protocol, minimizing buffering and providing a comfortable viewing experience.

[0079] After viewing, users can provide feedback through their device. This feedback is sent to the server and stored in a database. The server uses this feedback to run machine learning algorithms and learn to improve the quality and accuracy of the information provided in the future.

[0080] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental data and summarize key topics related to global warming. The resulting video will combine relevant visual elements with narration to visually convey the latest environmental issues to the user.

[0081] An example of a prompt message might be, "Please summarize the latest news on environmental issues and provide it in a visually easy-to-understand format."
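A prompt of this shape could be produced from the user's area of interest with a small template function. The function name and the optional duration hint are hypothetical; only the base sentence follows the example above.

```python
def build_summary_prompt(area_of_interest, video_seconds=None):
    """Fill the example prompt with the user's area of interest.

    video_seconds is an optional, hypothetical hint for the narration length.
    """
    area = area_of_interest.strip()
    if not area:
        raise ValueError("area_of_interest must not be empty")
    prompt = (
        f"Please summarize the latest news on {area} "
        "and provide it in a visually easy-to-understand format."
    )
    if video_seconds is not None:
        prompt += f" Keep the narration within about {video_seconds} seconds."
    return prompt
```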

[0082] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0083] Step 1:

[0084] The user selects their area of interest using a terminal. As input, the user enters their area of interest (e.g., "environmental issues") into the terminal interface. This information is sent from the terminal to the server. The terminal accurately receives the user's selection and provides it to the server as area of interest information. This prepares the server to provide the user with the most relevant information.

[0085] Step 2:

[0086] The server collects relevant information from the internet based on the user's areas of interest. It receives area of interest information as input and uses scraping tools and open APIs to collect data. The output is a collection of various pieces of information that match the user's areas of interest. The server verifies the reliability and relevance of the collected data and constructs the necessary datasets for the next step.
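The verification part of Step 2 might look like the following sketch, which runs on items that have already been fetched. The source does not say how reliability or relevance is checked; the allow-list of trusted sources, the keyword-overlap score, the threshold, and the `Article` type are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    body: str
    source: str


def _tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(article, area_of_interest):
    """Fraction of the interest keywords that appear in the article."""
    wanted = _tokens(area_of_interest)
    if not wanted:
        return 0.0
    found = _tokens(article.title + " " + article.body)
    return len(wanted & found) / len(wanted)


def build_dataset(articles, area_of_interest, trusted_sources, min_relevance=0.5):
    """Keep trusted, relevant articles and drop repeated titles."""
    seen_titles = set()
    dataset = []
    for article in articles:
        key = article.title.strip().lower()
        if article.source not in trusted_sources or key in seen_titles:
            continue
        if relevance(article, area_of_interest) >= min_relevance:
            seen_titles.add(key)
            dataset.append(article)
    return dataset
```

A production system would likely replace the keyword overlap with a stronger relevance model, but the filter-then-deduplicate structure would stay the same.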

[0087] Step 3:

[0088] The server summarizes the collected information using natural language processing techniques. The input is the data obtained through information gathering. A generative AI model is used to eliminate redundancy and extract key points, generating a concise summary. The output is the summarized text data. Through this summary, the server prepares to provide the user with the essential information they need.

[0089] Step 4:

[0090] The server converts summarized information into a video format. Using summarized text as input, it collects relevant images and video clips to create a slideshow. Video editing software is used to add voice narration generated by a generative AI model and integrate the visual elements. The output of this process is visually easy-to-understand video content.
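Before any rendering, the slideshow in Step 4 needs a plan: which text goes on which slide and for how long. The sketch below makes one slide per sentence and shares the total duration in proportion to word count. This allocation rule and the `Slide` type are assumptions; the source does not describe how slide timing is chosen.

```python
import re
from dataclasses import dataclass


@dataclass
class Slide:
    text: str
    seconds: float


def plan_slides(summary, total_seconds):
    """Split the summary into one slide per sentence and share out the time."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    if not sentences:
        return []
    # Longer sentences take longer to narrate, so weight time by word count.
    words = [max(1, len(s.split())) for s in sentences]
    total_words = sum(words)
    return [
        Slide(text=s, seconds=total_seconds * w / total_words)
        for s, w in zip(sentences, words)
    ]
```

The resulting plan could then be handed to whichever rendering and speech synthesis tools the server uses.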

[0091] Step 5:

[0092] The generated video is sent from the server to the terminal. The server takes the video file to be sent as input and transmits it to the terminal using an efficient data transmission protocol. The output is a video file viewable on the terminal. After receiving the video, the terminal displays it in an environment optimized for smooth playback.

[0093] Step 6:

[0094] Users watch videos through their devices and provide feedback. As input, they enter their impressions and evaluations based on the videos they've watched into the device interface. This feedback is sent from the device to the server and stored in a database. The output is user feedback data, which the server uses to personalize and improve the accuracy of future information provision.
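The feedback exchanged in Step 6 can be sketched as a small validated record that the terminal serializes and the server parses. The field names, the 1 to 5 rating scale, and the JSON encoding are assumptions; the source only says impressions and evaluations are sent and stored.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Feedback:
    user_id: str
    video_id: str
    rating: int          # 1 (poor) to 5 (excellent); the scale is an assumption
    comment: str = ""

    def __post_init__(self):
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


def encode_feedback(feedback):
    """Serialize feedback on the terminal before sending it to the server."""
    return json.dumps(asdict(feedback), sort_keys=True)


def decode_feedback(payload):
    """Parse and validate a feedback payload on the server."""
    return Feedback(**json.loads(payload))
```

Validating in the constructor means malformed ratings are rejected on both the terminal and the server side.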

[0095] (Application Example 1)

[0096] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0097] In modern society, information overload makes it difficult for users to efficiently obtain the information they need. Furthermore, in the field of smart cities, there is a lack of means to quickly and easily access the latest information. In addition, personalized information tailored to the diverse interests of each user is insufficient, highlighting the need for improved user experience.

[0098] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0099] In this invention, the server includes means for receiving data related to user-specified areas of interest, means for collecting relevant information from a digital network, means for automatically summarizing the collected information, and means for integrating information from different categories and optimizing the displayed content based on the user's selection. This allows the user to receive the latest information based on their interests in a convenient, visualized format, enabling them to acquire information more accurately and quickly.

[0100] "Data related to user-specified areas of interest" refers to a collection of information related to specific areas or themes that individual users are interested in.

[0101] "Means of collecting relevant information from digital networks" refers to devices or software that have the function of finding and collecting data related to a specified area of interest through electronic data communication networks such as the internet.

[0102] A "means for automatically summarizing collected information" is a system that analyzes collected data, extracts important elements from it, and transforms them into a short, concise form.

[0103] "A means of aggregating information from different categories and optimizing displayed content based on user selection" refers to a technology that organizes and adjusts diverse information according to each individual's interests and purposes, and provides it in the most visually appealing and easy-to-understand format.
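A simple reading of this means is to group collected items by category and present only the categories the user selected, in the user's order. The sketch below follows that reading; the function name, the per-category limit, and the (category, headline) input shape are hypothetical.

```python
from collections import defaultdict


def arrange_by_category(items, selected_categories, per_category=3):
    """Group (category, headline) pairs and order them by the user's selection.

    Categories the user did not select are dropped. Within a category the
    original order is kept and at most `per_category` headlines are shown.
    """
    grouped = defaultdict(list)
    for category, headline in items:
        grouped[category].append(headline)
    return [
        (category, grouped[category][:per_category])
        for category in selected_categories
        if grouped.get(category)
    ]
```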

[0104] "Visualized content" refers to a form of information transmission that visually represents information that exists as text or data using images, videos, graphs, etc.

[0105] The system for implementing this invention primarily consists of a user terminal, a server, and digital communication between them. The user specifies their areas of interest using an information processing device such as a smartphone, tablet, or computer. This information is then transmitted to the server via the internet. Based on the received information, the server collects relevant data from the digital network. This process utilizes news APIs and web scraping techniques.

[0106] The server summarizes the collected data using natural language processing techniques. This process utilizes libraries such as Python's NLTK to eliminate redundancy and extract key elements. Furthermore, this summarized information is visualized using video editing libraries such as OpenCV, combining text, audio, and visual elements for presentation to the user. The server delivers the generated content to the user's device, enabling a visually engaging experience where the user can select and enjoy the most relevant information from a large amount of information.

[0107] For example, if a user expresses interest in "sustainable energy," the server will collect, summarize, and provide a visualized analysis of relevant cutting-edge technologies and projects. During this process, the user can receive this information via their device while commuting or taking a break. The following are examples of prompts for the generative AI model:

[0108] "Please provide a summary of the latest energy technologies in sustainable smart cities. Specifically, include trends in energy efficiency, renewable energy projects, and urban planning."

[0109] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0110] Step 1:

[0111] The user selects areas of interest on their device. The user chooses categories of interest on their smartphone or computer screen. This action generates data on the selected areas of interest, which is then sent to the server as input data.

[0112] Step 2:

[0113] The server collects relevant information from the digital network. Based on the user's areas of interest data, it uses news APIs and web scraping techniques to retrieve highly relevant information from the internet. The main input used in this process is the user's areas of interest, and the output is the retrieved raw information data.

[0114] Step 3:

[0115] The server summarizes the collected information. Natural language processing is performed on the acquired data using Python's NLTK library, extracting important elements and shortening the information. The input is the data acquired in step 2, and the output is the summarized text data.
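The source names NLTK but not a summarization method. As a minimal sketch of one common approach, the code below scores each sentence by the average frequency of its non-stopword terms and keeps the top sentences in their original order. It uses only the standard library in place of NLTK so that it runs without downloading tokenizer data; the stopword list and the scoring rule are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "that", "it"}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased words of a text with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Averaging the term frequencies, rather than summing them, avoids favoring long sentences simply because they contain more words.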

[0116] Step 4:

[0117] The server converts the summarized information into a video format with audio. Using video editing libraries such as OpenCV, it visualizes the summarized text data by combining audio, video clips, and images. The output of this process is video data with audio as visual content.

[0118] Step 5:

[0119] The server generates video data and provides it to the user's device. The generated video content is delivered to the user's smartphone or computer via the internet and becomes playable on the device. The input is video data with audio, and the output is content that the user can view.

[0120] Step 6:

[0121] Users view content and provide feedback. Users use a feedback function to provide ratings and opinions on the content they view on their devices, and this information is sent to the server. The input is user feedback data, and the output is data that will contribute to improving the accuracy of future information.

[0122] Step 7:

[0123] The server improves the accuracy of information provided based on feedback. The collected feedback data is analyzed using machine learning algorithms, and this analysis is reflected in future information provision, thereby improving the personalization of information for users. This step generates output data related to information improvements.

[0124] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0125] The embodiments for carrying out this invention will be described in detail. The present invention provides a system that acquires information based on user-specified areas of interest, further recognizes the user's emotions using an emotion engine, and adjusts content delivery accordingly.

[0126] First, the user selects genres of interest through their device. This information is sent to the server and recorded in the database as the user's profile. Based on this profile, the server then collects relevant news and information from the internet.

[0127] The collected information is summarized on the server, and key points are extracted using natural language processing technology. A video is then generated based on this summary, and delivered to the device as optimized content to maintain user interest. The video integrates relevant images and video clips, and the text is narrated using speech synthesis.

[0128] In addition, the system of the present invention is equipped with an emotion engine, which can recognize the user's emotional state in real time. The emotion engine analyzes the user's facial expressions and tone of voice through the camera and microphone to understand the user's emotional state. For example, by sensing changes in facial expressions and voice when the user finds something interesting, the appeal of the content can be evaluated in real time.

[0129] Furthermore, the server uses this emotional data to adjust the tone and pace of the content it delivers according to the user's emotional state. For example, if it determines that the user is excited, it can add more dynamic visuals. The user's emotional data is also collected as feedback and used to improve the personalization algorithm. This makes future information delivery even more accurate and improves user satisfaction.

[0130] For example, if a user is interested in "technology" and is looking for the latest relevant technical information, the server will collect the latest news on AI technology, summarize it, and provide it to the user's device in video format. If the emotion engine detects the user smiling while they are watching, the server can use that information to incorporate similar themes and tones into the next content, thereby increasing user engagement.

[0131] Thus, the present invention realizes efficient and personalized information delivery that responds to the user's interests and emotions, and effectively solves the problems faced by conventional systems.

[0132] The following describes the processing flow.

[0133] Step 1:

[0134] The user uses their device to select areas of interest. The device sends this information to the server, updating the user's profile.

[0135] Step 2:

[0136] The server collects relevant information from reliable news sites and databases based on the user's areas of interest. It retrieves information from multiple data sources and filters out duplicate and irrelevant information.

[0137] Step 3:

[0138] The server analyzes the collected information and extracts key points using natural language processing techniques. The extracted information is then summarized to match the specified video length.

[0139] Step 4:

[0140] The server uses a video generation engine based on the summarized information to create a video that includes visual elements and audio narration. Relevant images and video clips are inserted into the video, and the narration text is subjected to speech synthesis processing.

[0141] Step 5:

[0142] The device receives the generated video and provides it to the user. The device supports video streaming playback, enabling a high-quality viewing experience.

[0143] Step 6:

[0144] While the user watches a video, an emotion engine monitors their emotional state through the camera and microphone. Emotional data is extracted from the user's facial expressions, tone of voice, and other factors.

[0145] Step 7:

[0146] The server analyzes the emotional data obtained from the emotion engine and adjusts the content accordingly. If the user is surprised or excited, the server immediately switches to a more dynamic presentation.

[0147] Step 8:

[0148] After the user finishes watching, a feedback screen is displayed, giving them the opportunity to input their satisfaction level and suggestions for improvement. The device then sends this feedback to the server.

[0149] Step 9:

[0150] The server updates its machine learning algorithms based on the feedback and emotion data to improve the accuracy of the content it delivers next time. This further personalizes the user experience.
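
As a minimal sketch of how feedback and emotion data might jointly update a per-topic preference, the following Python function blends the two signals into a topic weight using an exponential moving average. The function name, the 0-to-1 scaling of all values, and the `rate` and `emotion_share` parameters are hypothetical choices for illustration; the actual learning algorithm is not limited to this form.

```python
def update_weight(current, satisfaction, emotion_score, rate=0.2, emotion_share=0.5):
    """Blend a new observation into a topic weight (exponential moving average).

    All inputs are assumed to lie in [0, 1]:
      current        - the topic weight before this viewing session
      satisfaction   - explicit feedback entered on the feedback screen
      emotion_score  - how positive the emotion engine judged the reaction
    `rate` controls how quickly old preferences are forgotten, and
    `emotion_share` is the fraction of the observation taken from emotion data.
    """
    observation = (1 - emotion_share) * satisfaction + emotion_share * emotion_score
    return (1 - rate) * current + rate * observation
```

With a small `rate`, a single unusual session shifts the weight only slightly, while repeated positive reactions raise it steadily.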

[0151] (Example 2)

[0152] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0153] In today's world, the sheer volume of information makes it difficult for users to efficiently access content that interests them. Furthermore, there is a lack of content tailored to users' emotions and preferences, resulting in a decline in the quality of the user experience.

[0154] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0155] In this invention, the server includes means for acquiring information from a communication network based on the user's areas of interest, means for summarizing the acquired information using natural language processing technology, means for converting the summarized information into audio and visual formats, and means for recognizing the user's emotional state and adjusting the content accordingly. This enables the efficient and personalized delivery of content that responds to the user's interests and emotions.

[0156] "User-specified areas of interest" refers to the categories or themes of information that users have selected based on their own interests and preferences.

[0157] "Acquiring relevant information from a communication network" refers to the process of collecting data related to user interests via networks such as the internet.

[0158] "Summarizing using natural language processing techniques" refers to using machine learning and artificial intelligence technologies to extract important information and points from text data and summarize them concisely.

[0159] "Converting to audio and visual formats" refers to processing summarized information into viewable content using speech synthesis and video editing technologies.

[0160] "Providing to user devices" means sending the generated content to the user's terminal and providing it in a viewable state.

[0161] "Recognizing the user's emotional state and adjusting content accordingly" refers to analyzing user facial expressions and voice data collected through sensors such as cameras and microphones, and then changing the content and presentation in real time based on the results.

[0162] "Algorithms for improving personalization performance" refer to computational methods that analyze user feedback and emotion data to provide content optimized for individual users.

[0163] This invention is a system for providing personalized information based on a user's areas of interest and emotional state. The user uses a terminal to select categories of interest, and this information is sent to a server. The server collects relevant information via a communication network. The collected data is summarized using natural language processing techniques. Examples of techniques used in this process include machine learning models and generative AI models, including advanced technologies such as "BERT" and "GPT."

[0164] The summary information is converted into audio and visual content on the server. This conversion uses video editing software, speech synthesis technology, and a generative AI model, and the resulting content is delivered to the device in a viewable format. In this process, relevant images and video clips are also integrated, and the content is designed to maintain user interest.

[0165] Furthermore, this system can recognize the user's emotional state in real time. It captures the user's facial expressions and voice tone through the camera and microphone on the device, and an emotion recognition engine on the server analyzes this data. Based on this analysis, the server adjusts the tone and pace of the content to improve the user experience.

[0166] For example, if a user is interested in "technology" and is looking for the latest information, the server collects data on the latest technological innovations, summarizes it, and delivers it to the device in video format. In this process, the emotion engine analyzes whether the user is enjoying it and reflects that data in the next content displayed.

[0167] An example of a prompt for a generative AI model is, "A method for summarizing topics related to the latest AI technologies and adjusting the content to suit the user's preferences based on sentiment analysis." This configuration is a feature of the present invention that enhances the user experience.

[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0169] Step 1:

[0170] The user selects areas of interest through their device. The user's chosen interest categories are sent to the server as input. This could involve the user selecting from checkboxes or dropdown menus on the interface. The output is the user's interest data received by the server.

[0171] Step 2:

[0172] The server retrieves relevant information from the network based on the received interest data. The input is the user's interest data, and based on this, the server creates requests and accesses online information sources. Specifically, this includes actions such as the server searching news feeds and databases. The output is a collection of the retrieved relevant information.

[0173] Step 3:

[0174] The server summarizes the acquired information using natural language processing techniques. The input is the information collected in step 2. Specifically, a generative AI model is used to extract key points from the information and create a summary. The output is the summarized text data.

[0175] Step 4:

[0176] The server converts the summary data into audio and visual formats. The input is the summary text created in step 3, and content is created using speech synthesis and video editing software based on this text. Specific operations include inputting the summary text into the speech synthesis engine and selecting video footage. The output is viewable audio and video data.

[0177] Step 5:

[0178] The device captures the user's emotions using its camera and microphone. The input is the user's facial expressions and voice tone in real time. Specifically, this involves the device's sensors recording these and sending the data to a server. The output is the user's emotion data.

[0179] Step 6:

[0180] The server analyzes emotional data and adjusts the tone and pace of the content. The input is the emotional data obtained in step 5. Based on the data, the server optimizes the presentation of the content. Specifically, this could involve adding additional effects to the content or changing the speed of the narration. The output is the adjusted content.
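
The adjustment in step 6 could, for example, be sketched as a lookup from an estimated emotion label to presentation changes. In the following Python fragment, only the association between an excited user and more dynamic visuals comes from the description; the "bored" and "confused" labels, the numeric speed factors, and the effect names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Presentation:
    narration_speed: float = 1.0  # 1.0 means normal narration speed
    effects: tuple = ()           # names of visual effects applied to the video


# Hypothetical rules; the labels other than "excited" and all numbers are assumptions.
ADJUSTMENTS = {
    "excited": {"speed_factor": 1.1, "add_effect": "dynamic_transitions"},
    "bored": {"speed_factor": 1.2, "add_effect": "highlight_key_points"},
    "confused": {"speed_factor": 0.85, "add_effect": None},
}


def adjust(presentation, emotion):
    """Return a new Presentation adjusted for the given emotion label.

    Unknown labels leave the presentation unchanged.
    """
    rule = ADJUSTMENTS.get(emotion)
    if rule is None:
        return presentation
    effects = presentation.effects
    if rule["add_effect"] and rule["add_effect"] not in effects:
        effects = effects + (rule["add_effect"],)
    speed = round(presentation.narration_speed * rule["speed_factor"], 2)
    return replace(presentation, narration_speed=speed, effects=effects)
```

Keeping the rules in a table separate from the function makes it straightforward to replace the fixed values with ones learned from user response data.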

[0181] (Application Example 2)

[0182] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0183] Conventional information delivery systems have difficulty providing content flexibly based on users' interests and emotions, and have failed to increase user engagement. Furthermore, they lacked mechanisms for continuously improving the accuracy of information. Therefore, there is a need for a system that provides highly relevant information tailored to user interests, enables real-time content adjustment based on emotions, and further improves accuracy over time.

[0184] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0185] In this invention, the server includes means for acquiring information related to user-specified interests, means for automatically summarizing the acquired information based on importance, means for converting the summarized content into visual and auditory display formats, means for analyzing the user's emotional state and dynamically adjusting the display content based on the analysis results, and means for collecting user response data and applying a learning model to improve the system's personalization performance. This enables the provision of personalized content based on the user's interests and emotions.

[0186] A "user" is an individual or group that uses a system to obtain information and interact with it.

[0187] "Interest" refers to a user's concern or preference for a particular field or topic.

[0188] "Information" is a general term for content on the internet, including data, news, articles, videos, and audio.

[0189] "To acquire" refers to the act of gathering or collecting necessary information.

[0190] "Importance" is a criterion for evaluating the value and priority of information.

[0191] "Summarizing" refers to the process of extracting the key points from information and putting them into a concise summary.

[0192] "Visual and auditory display formats" refer to methods of conveying information to users visually and aurally by combining images and sounds.

[0193] "Emotional state" refers to psychological or emotional changes that can be interpreted from a user's facial expressions and tone of voice.

[0194] "To analyze" means to examine data and find meaning or patterns in it.

[0195] "Dynamic adjustment" refers to changing and adapting the content and structure of information in real time.

[0196] "Response data" refers to data obtained from user reactions to the system, such as their actions and facial expressions.

[0197] "Personalization capability" refers to the ability of a system to provide services that meet the different needs and preferences of each user.

[0198] A "learning model" is a mathematical framework that uses algorithms to learn patterns based on past data and predict future behaviors and patterns.

[0199] The system for implementing this invention is designed to provide personalized content based on the user's interests and emotions. The server receives data on areas of interest transmitted from the user via a terminal. This data is stored in a database and used as a profile. Next, the server automatically collects information related to these areas of interest from the internet. The collected information is summarized using natural language processing techniques based on its importance.
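
As one possible illustration of storing areas of interest as a profile, the following Python sketch uses the standard `sqlite3` module. The table name, column names, and the choice to serialize interests as JSON are assumptions made for this sketch and are not specified in the description.

```python
import json
import sqlite3


def open_profile_store(path=":memory:"):
    """Open (or create) a small profile database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "user_id TEXT PRIMARY KEY, interests TEXT NOT NULL)"
    )
    return conn


def save_interests(conn, user_id, interests):
    """Store the user's areas of interest, replacing any earlier selection."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (user_id, interests) VALUES (?, ?)",
        (user_id, json.dumps(sorted(set(interests)))),
    )
    conn.commit()


def load_interests(conn, user_id):
    """Return the stored areas of interest, or an empty list for unknown users."""
    row = conn.execute(
        "SELECT interests FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

The stored list can then be read back whenever the server begins a new collection cycle for that user.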

[0200] The summarized information is then converted into visual and auditory display formats. Here, speech synthesis technology is used to create narration. Video generation software is used to create the video, integrating appropriate visual elements. For example, if the user is interested in "travel," travel guides and the latest tourist spot information are provided as videos.

[0201] Furthermore, the server analyzes the user's emotional state in real time using the camera and microphone built into the device. Here, a machine learning model is used to determine the user's emotions from their facial expressions and tone of voice. Based on this emotion analysis, the server dynamically adjusts the speed and style of the content to optimize the user experience. Specifically, if a smile is detected in the user, similar themes and tones will be emphasized in the next content.
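
The description does not specify how facial-expression and voice signals are combined. As one hypothetical approach, the per-emotion scores from each modality could be merged by a weighted average, as in the following Python sketch. The function name, the score dictionaries, and the default `face_weight` are all assumptions for illustration.

```python
def fuse_emotions(face, voice, face_weight=0.6):
    """Combine per-emotion scores from two modalities by weighted average.

    `face` and `voice` map emotion labels to scores in [0, 1]; a label missing
    from one modality is treated as 0. Returns the top label and all combined
    scores. The weighting scheme is an assumption, not taken from the source.
    """
    labels = sorted(set(face) | set(voice))
    combined = {
        label: face_weight * face.get(label, 0.0)
        + (1 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    top = max(labels, key=combined.get) if labels else None
    return top, combined
```

The top label could then drive the dynamic adjustment of speed and style, while the full score dictionary could be retained as response data.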

[0202] In addition, user response data is collected by the server. This data is reflected in a learning model and used to improve the accuracy of content delivery. The more repeatedly a user accesses the content, the more accurate the personalization becomes.

[0203] (Example of a prompt message)

[0204] Analyze the user's emotions from their facial expressions and voice, and suggest travel-related content that will be more engaging for them. The target user is a Japanese woman in her 30s, so emphasize plans that appear appealing.

[0205] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0206] Step 1:

[0207] The server receives data on areas of interest that the user has submitted through their device. This data is stored in a database as input, forming a user profile. This profile serves as the basis for subsequent information gathering and content delivery.

[0208] Step 2:

[0209] The server collects information related to the user's areas of interest from the internet based on their user profile. This collection step involves obtaining data from web crawling and news APIs. The input is information about the user's areas of interest, and the output is a collection of related information.

[0210] Step 3:

[0211] The server summarizes the collected information using natural language processing (NLP) techniques. The input is the collection of information obtained in step 2, and importance analysis is performed to generate summary information as output. This summary forms the basis for content generation.

[0212] Step 4:

[0213] The server generates visual and auditory display formats based on the summary information. This process utilizes video generation software and speech synthesis technology. The input is summary information, which is converted into video clips and audio narration, and the output is multimedia content.

[0214] Step 5:

[0215] The device uses its camera and microphone to collect data on the user's emotional state in real time. The collected data is sent to the server as input. The server uses a machine learning model to analyze emotions from facial expressions and voice tone, and outputs the emotion analysis results.

[0216] Step 6:

[0217] The server uses the emotion analysis results to dynamically adjust the speed and style of the content. The input is the emotion analysis results, and the content parameters are modified to enhance the user experience. The output is content that is appropriate to the user's state.

[0218] Step 7:

[0219] The server collects user response data and updates the learning model to improve the system's personalization capabilities. The input is user feedback, and this data is fed into the machine learning algorithm, resulting in improved adaptive performance as output.

[0220] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0221] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
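
As a minimal sketch of the input described above, the following Python function bundles a prompt with whichever kinds of inference data are available. The dictionary layout and field names are hypothetical and do not correspond to the interface of any particular generative AI service.

```python
def build_request(instruction, text=None, audio=None, image=None):
    """Bundle a prompt with the inference data supplied by the caller.

    The field names ("prompt", "inference_data", "text", "audio", "image")
    are assumptions made for this sketch.
    """
    inference_data = {}
    if text is not None:
        inference_data["text"] = text
    if audio is not None:
        inference_data["audio"] = audio
    if image is not None:
        inference_data["image"] = image
    if not inference_data:
        raise ValueError("at least one kind of inference data is required")
    return {"prompt": instruction, "inference_data": inference_data}
```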

[0222] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0223] [Second Embodiment]

[0224] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0225] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0226] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0227] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0228] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0229] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0230] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0231] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0232] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0233] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0234] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0235] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0236] The embodiments for carrying out this invention will be described in detail. This system has a configuration for efficiently providing the latest information based on the user's selected area of interest.

[0237] First, the user uses their device to select their areas of interest. This information is sent to the server and stored in the database as a user profile. This reduces the effort required to select areas of interest again in the future.

[0238] The server collects news and information related to the user's specified areas of interest from the internet, based on their profile. This process utilizes methods to obtain the latest information through multiple news sites and open APIs. The key is ensuring the reliability and real-time nature of the information.
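
One way the reliability and real-time nature of collected items might be enforced is to keep only items from an allowlist of sources that were published recently. The following Python sketch assumes each item is a dictionary with `url` and `published` keys, and the 24-hour default window is likewise an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


def filter_items(items, trusted_hosts, now, max_age=timedelta(hours=24)):
    """Keep items that come from a trusted host and are recent enough.

    Each item is assumed to be a dict with a "url" string and a timezone-aware
    "published" datetime. `trusted_hosts` is a set of host names.
    """
    kept = []
    for item in items:
        host = urlparse(item["url"]).hostname
        is_fresh = now - item["published"] <= max_age
        if host in trusted_hosts and is_fresh:
            kept.append(item)
    return kept
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the filter deterministic and easy to check.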

[0239] The collected information is analyzed on a server and summarized using natural language processing technology. During summarization, the most important elements are extracted to match the user's desired video length. In the text summarization stage, the goal is to eliminate redundancy and focus on the essential information the user truly needs.
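
As an illustrative sketch of summarizing to fit a desired video length, the following Python code converts the video duration into a word budget and selects the highest-scoring sentences that fit. The narration rate of 150 words per minute, the regular-expression tokenization, and the word-frequency scoring are assumptions made for this sketch; a library tokenizer or a generative AI model could take their place.

```python
import re
from collections import Counter

WORDS_PER_MINUTE = 150  # assumed narration rate, not taken from the description


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, video_seconds, words_per_minute=WORDS_PER_MINUTE):
    """Pick the highest-scoring sentences that fit the narration word budget.

    Sentences are scored by the average document frequency of their words,
    and the chosen sentences are returned in their original order.
    """
    budget = int(video_seconds * words_per_minute / 60)
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= budget:
            chosen.append(i)
            used += length
    return " ".join(sentences[i] for i in sorted(chosen))
```

Because the budget is derived from the duration, shortening the requested video automatically yields a shorter summary.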

[0240] Next, the server converts the summarized information into a video format. This process integrates text and related images and video clips in a slideshow format, and adds narration. This makes the information easier for the user to receive visually. In addition, each segment of the video uses visual effects to highlight key phrases and important points.
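
The slideshow structure described above could be planned, for example, as a list of segments, each carrying its text, a display time, and the key phrases to highlight. In the following Python sketch, the proportional allocation of time by word count and the dictionary field names are assumptions for illustration; the rendering of images, effects, and narration is outside the scope of the sketch.

```python
def build_slides(sentences, key_phrases, total_seconds):
    """Plan one slide per summary sentence.

    Display time is shared in proportion to each sentence's word count, and a
    key phrase is marked for highlighting when it appears in the sentence
    (case-insensitive match).
    """
    total_words = sum(len(s.split()) for s in sentences)
    slides = []
    for sentence in sentences:
        share = len(sentence.split()) / total_words if total_words else 0.0
        highlights = [p for p in key_phrases if p.lower() in sentence.lower()]
        slides.append(
            {
                "text": sentence,
                "seconds": round(total_seconds * share, 1),
                "highlights": highlights,
            }
        )
    return slides
```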

[0241] The generated video is delivered to the device, and the user watches it through the device. The video played on the device is optimized to minimize buffering, providing a smooth viewing experience.

[0242] After watching a video, users can provide feedback on it. This feedback is sent from the device to the server and used to improve the quality of future information delivery. Based on the user's feedback and viewing history, the server runs machine learning algorithms to further refine personalized information delivery for each user.

[0243] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental news and summarize it, focusing on important topics such as CO2 emission reduction. The generated video will combine relevant images and narration to explain the current state of global warming to the user in detail. In this way, the present invention realizes the efficient provision of information that meets user needs in an age of information overload.

[0244] The following describes the processing flow.

[0245] Step 1:

[0246] The user selects genres of interest using their device. The device sends this information to the server, where it is stored in the database as the user's profile data.

[0247] Step 2:

[0248] The server collects relevant news and information from the internet based on the user's profile. It consults multiple reliable sources and selects the most relevant information.

[0249] Step 3:

[0250] The server summarizes the collected information. Using natural language processing techniques, it extracts key points and removes redundant parts. As a result, a summary that fits the video time frame set by the user is obtained.

[0251] Step 4:

[0252] The server generates videos based on summarized information. It incorporates images and video clips related to the text and adds narration using speech synthesis technology. This allows information to be conveyed through both visual and auditory means.

[0253] Step 5:

[0254] The device receives the generated video and provides it to the user. The video is optimized for smooth playback on the device.

[0255] Step 6:

[0256] Users watch videos and provide feedback on the quality of the content. The device sends this feedback to the server, where it is used to improve future content.

[0257] Step 7:

[0258] The server receives user feedback and uses machine learning algorithms to improve the accuracy of the information provided. This optimizes future news selection so that it better matches each user's interests.

[0259] (Example 1)

[0260] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0261] In modern society, information is vast and diverse, making it difficult for users to quickly and efficiently obtain the latest information that perfectly matches their interests. Furthermore, providing relevant information in a visually easy-to-understand format and personalizing that information delivery to each user are also challenges. There is also room for improvement in maintaining the speed and quality of information transmission.

[0262] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0263] In this invention, the server includes means for acquiring information on a user-specified area of interest, means for machine-summarizing the acquired data, and means for converting the summarized data into a video format within a specified period. This makes it possible to quickly gather, summarize, and provide relevant information tailored to the user's interests in a visually easy-to-understand format.

[0264] A "user" is the entity that utilizes a service or system, and is the recipient of information.

[0265] "Areas of interest" refers to the areas of information that a user is particularly interested in and wants to learn more about.

[0266] "Information" includes data, news, and knowledge related to the user's areas of interest.

[0267] A "server" is a computer system used for collecting, processing, storing, and providing information.

[0268] "Summarization" is the process of making acquired information concise and extracting its core essence.

[0269] "Video format" refers to media formats such as videos and slideshows that can convey information visually.

[0270] "Device" refers to an electronic device used by a user to receive and utilize information.

[0271] A "learning method" is a means by which a system analyzes user opinions and behaviors to improve the accuracy and suitability of the information it provides.

[0272] "Textual information" refers to information provided in text format.

[0273] "Voice information" refers to information provided in audio format.

[0274] "Visual material" refers to elements of media that are communicated visually, such as images and video clips.

[0275] In order to implement this invention, it is necessary to build a system in which the user, server, and terminal elements work together.

[0276] Users select their areas of interest using a device. The device has a dedicated application or web interface installed, through which users input their areas of interest and send them to the server. This process allows users to easily specify the information they are interested in.

[0277] The server collects relevant information via the internet based on the user's areas of interest. The server utilizes scraping tools implemented in programming languages such as Python, as well as open APIs. The collected information is automatically summarized using a generative AI model for natural language processing. During summarization, redundant parts are removed, and the core information is extracted.
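
As a minimal sketch of turning areas of interest into a request to an open API, the following Python function builds a query URL with the standard library. The endpoint `https://api.example.com/v1/articles` and the parameter names `q`, `pageSize`, and `language` are hypothetical placeholders and do not refer to any actual service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration.
API_BASE = "https://api.example.com/v1/articles"


def build_query_url(interests, page_size=20, language="en"):
    """Build a search URL that matches any of the user's areas of interest.

    Each interest is quoted so that multi-word topics are treated as phrases,
    and the phrases are joined with OR. The parameter names are assumptions.
    """
    query = " OR ".join(f'"{topic}"' for topic in interests)
    params = {"q": query, "pageSize": page_size, "language": language}
    return f"{API_BASE}?{urlencode(params)}"
```

The resulting URL could then be fetched by whatever HTTP client the server uses, and the response passed to the summarization stage.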

[0278] The server then converts the summarized information into a video format. This video generation uses video editing software such as Adobe Premiere Pro or Final Cut Pro. The text, related images, and video clips are integrated, and voice narration generated using a generative AI model is added to produce the finished video.

[0279] The generated video is sent from the server to the user's terminal, where the user uses it to view the information. The terminal plays the video smoothly using a high-speed data transfer protocol, minimizing buffering and providing a comfortable viewing experience.

[0280] After viewing, users can provide feedback through their device. This feedback is sent to the server and stored in a database. The server uses this feedback to run machine learning algorithms and learn to improve the quality and accuracy of the information provided in the future.
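
The feedback storage described above could be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` module; the table layout, the 1 to 5 rating scale, and the function names `open_feedback_db`, `store_feedback`, and `average_rating` are assumptions made for illustration and are not specified in this disclosure.

```python
import sqlite3


def open_feedback_db(path=":memory:"):
    """Create the feedback table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "user_id TEXT NOT NULL, video_id TEXT NOT NULL, "
        "rating INTEGER NOT NULL, comment TEXT)"
    )
    return conn


def store_feedback(conn, user_id, video_id, rating, comment=""):
    """Persist one piece of viewer feedback received from the device."""
    if not 1 <= rating <= 5:  # assumed rating scale
        raise ValueError("rating must be between 1 and 5")
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?, ?)",
        (user_id, video_id, rating, comment),
    )
    conn.commit()


def average_rating(conn, user_id):
    """Return the user's mean rating, or None if no feedback exists yet."""
    row = conn.execute(
        "SELECT AVG(rating) FROM feedback WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

An aggregate such as the mean rating is one simple signal that a later learning step could consume; the disclosure itself does not fix which signals are used.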

[0281] As a specific example, when a user is interested in "environmental issues", the server collects the latest environmental-related data and focuses on summarizing important topics related to global warming. The generated video is composed of relevant visual materials and narration, visually conveying the latest environmental situation to the user.

[0282] Examples of prompt sentences include "Please summarize the news on the latest environmental issues and provide it in a visually easy-to-understand format."

[0283] The flow of the specific process in Example 1 will be described using Figure 11.

[0284] Step 1:

[0285] The user uses the terminal to select the area of interest. As input, the user's area of interest (e.g., "environmental issues") is entered into the terminal interface. This information is sent from the terminal to the server. The terminal accurately receives the user's selection and provides it to the server as area-of-interest information. Thereby, the server prepares to provide the optimal information to the user.

[0286] Step 2:

[0287] Based on the area of interest received from the user, the server collects relevant information on the Internet. As input, it receives the area-of-interest information and uses scraping tools or open APIs to collect data based on it. The output is a collection of various information that matches the user's area of interest. The server checks the reliability and relevance of the collected data and constructs the necessary dataset for the next step.

[0288] Step 3:

[0289] The server summarizes the collected information using natural language processing techniques. The input is the data obtained through information gathering. A generative AI model is used to eliminate redundancy and extract key points, generating a concise summary. The output is the summarized text data. Through this summary, the server prepares to provide the user with the essential information they need.

[0290] Step 4:

[0291] The server converts the summarized information into a video format. Using the summarized text as input, it collects relevant images and video clips to create a slideshow. Video editing software is used to add voice narration generated by a generative AI model and integrate the visual elements. The output of this process is video content that is visually easy to understand.

[0292] Step 5:

[0293] The generated video is sent from the server to the terminal. The server takes the video file to be sent as input and transmits it to the terminal using an efficient data transmission protocol. The output is a video file viewable on the terminal. After receiving the video, the terminal displays it in an environment optimized for smooth playback.

[0294] Step 6:

[0295] Users watch videos through their devices and provide feedback. As input, they enter their impressions and evaluations based on the videos they've watched into the device interface. This feedback is sent from the device to the server and stored in a database. The output is user feedback data, which the server uses to personalize and improve the accuracy of future information provision.
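
The server-side portion of Steps 1 to 6 above can be sketched as a chain of stages. This is only an outline under stated assumptions: the stage functions are passed in as parameters, and `demo_collect`, `demo_summarize`, and `demo_render` are hypothetical stand-ins that replace the scraping, generative AI summarization, and video editing described in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineResult:
    """Data produced on the server for one user request."""
    interest: str
    documents: List[str] = field(default_factory=list)
    summary: str = ""
    video_path: str = ""


def run_pipeline(interest, collect, summarize, render_video):
    """Run server-side Steps 2 to 4 for the interest received in Step 1."""
    result = PipelineResult(interest=interest)
    result.documents = collect(interest)               # Step 2: collect
    if not result.documents:
        return result                                  # nothing to summarize
    result.summary = summarize(result.documents)       # Step 3: summarize
    result.video_path = render_video(result.summary)   # Step 4: build video
    return result


# Hypothetical stand-in stages so the sketch runs without network access.
def demo_collect(interest):
    return [f"Report about {interest}.", f"Update about {interest}."]


def demo_summarize(documents):
    return documents[0]


def demo_render(summary):
    return f"video_{len(summary)}.mp4"
```

Delivery to the terminal (Step 5) and feedback collection (Step 6) would follow the returned result and are left out of this sketch.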

[0296] (Application Example 1)

[0297] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0298] In modern society, information overload makes it difficult for users to efficiently obtain the information they need. Furthermore, in the field of smart cities, there is a lack of means to quickly and easily access the latest information. In addition, personalized information tailored to the diverse interests of each user is insufficient, highlighting the need for improved user experience.

[0299] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0300] In this invention, the server includes means for receiving data related to user-specified areas of interest, means for collecting relevant information from a digital network, means for automatically summarizing the collected information, and means for integrating information from different categories and optimizing the displayed content based on the user's selection. This allows the user to receive the latest information based on their interests in a convenient, visualized format, enabling them to acquire information more accurately and quickly.

[0301] "Data related to user-specified areas of interest" refers to a collection of information related to specific areas or themes that individual users are interested in.

[0302] "Means of collecting relevant information from digital networks" refers to devices or software that have the function of finding and collecting data related to a specified area of interest through electronic data communication networks such as the internet.

[0303] A "means for automatically summarizing collected information" is a system that analyzes collected data, extracts important elements from it, and transforms them into a short, concise form.

[0304] The means of "summarizing information in different categories and optimizing display content based on user selection" is a technology for organizing and adjusting diverse information according to each individual's interests and purposes, and providing it in the most visually appealing and understandable form.

[0305] "Visualized content" is an information transmission form that visually represents information existing as text or data using images, videos, graphs, etc.

[0306] The system for implementing this invention is mainly composed of a user terminal, a server, and digital communication between them. The user uses an information processing device such as a smartphone, tablet, or computer to specify the field of their interest. Then, this information is transmitted to the server through the Internet. The server collects relevant data from the digital network based on the received information. This process is carried out by utilizing news APIs and web scraping technologies.

[0307] The server summarizes the collected data using natural language processing technology. In this process, libraries such as Python's NLTK are used to eliminate redundant information and extract important elements. The summarized information is then visualized using a video editing library such as OpenCV and presented to the user as a combination of text information, audio information, and visual materials. The server provides the generated content to the user's terminal, so that the user can visually take in the most relevant information, chosen according to their selection, out of an overload of information.
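
As an illustration of eliminating redundancy and extracting important elements, the following is a minimal frequency-based extractive summarizer. It is a standard-library stand-in, not the NLTK-based processing named above; the stopword list, the scoring rule, and the function names are assumptions chosen to keep the sketch self-contained.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a complete list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "that", "this", "with", "as", "by", "it"}


def split_sentences(text):
    """Split text at sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence):
    """Lower-cased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Drop repeated sentences, then keep the highest-scoring ones in order."""
    unique, seen = [], set()
    for sentence in split_sentences(text):
        key = sentence.lower()
        if key not in seen:          # redundancy removal
            seen.add(key)
            unique.append(sentence)
    freq = Counter(w for s in unique for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(unique)), key=lambda i: score(unique[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(unique[i] for i in keep)
```

For example, given four sentences of which one is an exact repeat and one is off-topic, `summarize(text, max_sentences=2)` keeps the two sentences that share the most frequent content words.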

[0308] As a specific example, when the user shows interest in "sustainable energy", the server collects information on the latest related technologies and projects, summarizes it, and provides a visualized analysis. The user can receive this information on the device while traveling or during breaks. An example of a prompt for the generative AI model is as follows.

[0309] "Please provide a summary of the latest energy technologies in sustainable smart cities. Specifically, include trends in energy efficiency, renewable energy projects, and urban planning."
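
A prompt of the above form could be assembled from the user's topic and a list of focus points, for instance as below. The helper `build_summary_prompt` and its parameters are hypothetical and shown only to make the structure of the prompt explicit.

```python
def build_summary_prompt(topic, focus_points):
    """Compose a summary request for a generative AI model."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    prompt = f"Please provide a summary of {topic}."
    points = [p.strip() for p in focus_points if p.strip()]
    if not points:
        return prompt
    if len(points) == 1:
        listed = points[0]
    elif len(points) == 2:
        listed = " and ".join(points)
    else:
        listed = ", ".join(points[:-1]) + ", and " + points[-1]
    return f"{prompt} Specifically, include {listed}."
```

Calling it with the topic "the latest energy technologies in sustainable smart cities" and the three focus points from the example reproduces the prompt text quoted above.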

[0310] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0311] Step 1:

[0312] The user selects areas of interest on their device. The user chooses categories of interest on their smartphone or computer screen. This action generates data on the selected areas of interest, which is then sent to the server as input data.

[0313] Step 2:

[0314] The server collects relevant information from the digital network. Based on the user's areas of interest data, it uses news APIs and web scraping techniques to retrieve highly relevant information from the internet. The main input used in this process is the user's areas of interest, and the output is the retrieved raw information data.

[0315] Step 3:

[0316] The server summarizes the collected information. Natural language processing is performed on the acquired data using Python's NLTK library, extracting important elements and shortening the information. The input is the data acquired in step 2, and the output is the summarized text data.

[0317] Step 4:

[0318] The server converts the summarized information into a video format with audio. Using video editing libraries such as OpenCV, it visualizes the summarized text data by combining audio, video clips, and images. The output of this process is video data with audio as visual content.

[0319] Step 5:

[0320] The server provides the generated video data to the user's device. The generated video content is delivered to the user's smartphone or computer via the internet and becomes playable on the device. The input is video data with audio, and the output is content that the user can view.

[0321] Step 6:

[0322] Users view content and provide feedback. Users use a feedback function to provide ratings and opinions on the content they view on their devices, and this information is sent to the server. The input is user feedback data, and the output is data that will contribute to improving the accuracy of future information.

[0323] Step 7:

[0324] The server improves the accuracy of information provided based on feedback. The collected feedback data is analyzed using machine learning algorithms, and this analysis is reflected in future information provision, thereby improving the personalization of information for users. This step generates output data related to information improvements.
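
The text does not name a particular machine learning algorithm for Step 7. As one simple possibility, per-category preference weights can be nudged toward each new rating with an exponential moving average. The 1 to 5 rating scale, the default weight of 0.5, the learning rate, and the function names below are all assumptions.

```python
def update_preferences(weights, category, rating, learning_rate=0.2, max_rating=5):
    """Return new weights after moving one category toward the latest rating."""
    if not 1 <= rating <= max_rating:
        raise ValueError("rating out of range")
    target = (rating - 1) / (max_rating - 1)   # map 1..max_rating to 0.0..1.0
    current = weights.get(category, 0.5)       # unseen categories start neutral
    updated = dict(weights)                    # leave the caller's dict untouched
    updated[category] = current + learning_rate * (target - current)
    return updated


def rank_categories(weights):
    """Categories ordered from most to least preferred."""
    return sorted(weights, key=weights.get, reverse=True)
```

A higher learning rate makes the profile react faster to recent feedback at the cost of stability; the ranking can then steer which categories are collected first in Step 2.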

[0325] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0326] The embodiments for carrying out this invention will be described in detail. The present invention provides a system that acquires information based on user-specified areas of interest, further recognizes the user's emotions using an emotion engine, and adjusts content delivery accordingly.

[0327] First, the user selects genres of interest through their device. This information is sent to the server and recorded in the database as the user's profile. Based on this profile, the server then collects relevant news and information from the internet.

[0328] The collected information is summarized on the server, and key points are extracted using natural language processing technology. A video is then generated based on this summary, and delivered to the device as optimized content to maintain user interest. The video integrates relevant images and video clips, and the text is narrated using speech synthesis.

[0329] In addition, the system of the present invention is equipped with an emotion engine, which can recognize the user's emotional state in real time. The emotion engine analyzes the user's facial expressions and tone of voice through the camera and microphone to understand the user's emotional state. For example, by sensing changes in facial expressions and voice when the user finds something interesting, the appeal of the content can be evaluated in real time.

[0330] Furthermore, the server uses this emotional data to adjust the tone and pace of the content it delivers according to the user's emotional state. For example, if it determines that the user is excited, it can add more dynamic visuals. The user's emotional data is also collected as feedback and used to improve the personalization algorithm. This makes future information delivery even more accurate and improves user satisfaction.
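
The adjustment of tone and pace described above might be expressed as a lookup from an estimated emotion to presentation parameters. The emotion labels, parameter names, numeric values, and confidence threshold in this sketch are illustrative assumptions; only the "excited leads to more dynamic visuals" case comes from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Presentation:
    narration_rate: float = 1.0      # 1.0 = normal speaking speed
    visual_style: str = "standard"


# Hypothetical adjustment table keyed by estimated emotion.
ADJUSTMENTS = {
    "excited": {"narration_rate": 1.15, "visual_style": "dynamic"},
    "confused": {"narration_rate": 0.9, "visual_style": "simplified"},
}


def adjust_presentation(base, emotion, confidence, threshold=0.6):
    """Apply an adjustment only when the emotion estimate is confident enough."""
    change = ADJUSTMENTS.get(emotion)
    if change is None or confidence < threshold:
        return base                  # keep the current presentation
    return replace(base, **change)
```

Gating on confidence is a design choice to avoid changing the presentation on uncertain estimates.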

[0331] For example, if a user is interested in "technology" and is looking for the latest relevant technical information, the server will collect the latest news on AI technology, summarize it, and provide it to the user's device in video format. If the emotion engine detects the user smiling while they are watching, the server can use that information to incorporate similar themes and tones into the next content, thereby increasing user engagement.

[0332] Thus, the present invention realizes efficient and personalized information delivery that responds to the user's interests and emotions, and effectively solves the problems faced by conventional systems.

[0333] The following describes the processing flow.

[0334] Step 1:

[0335] The user uses their device to select areas of interest. The device sends this information to the server, updating the user's profile.

[0336] Step 2:

[0337] The server collects relevant information from reliable news sites and databases based on the user's areas of interest. It retrieves information from multiple data sources and filters out duplicate and irrelevant information.

[0338] Step 3:

[0339] The server analyzes the collected information and extracts key points using natural language processing techniques. The extracted information is then summarized to match the specified video length.

[0340] Step 4:

[0341] The server uses a video generation engine based on the summarized information to create a video that includes visual elements and audio narration. Relevant images and video clips are inserted into the video, and the narration text is subjected to speech synthesis processing.

[0342] Step 5:

[0343] The device receives the generated video and provides it to the user. The device supports video streaming playback, enabling a high-quality viewing experience.

[0344] Step 6:

[0345] While the user watches a video, an emotion engine monitors their emotional state through the camera and microphone. Emotional data is extracted from the user's facial expressions, tone of voice, and other factors.

[0346] Step 7:

[0347] The server analyzes the emotional data obtained from the emotion engine and adjusts the content accordingly. If the user is surprised or excited, it immediately switches to a more dynamic presentation.

[0348] Step 8:

[0349] After the user finishes watching, a feedback screen is displayed, giving them the opportunity to input their satisfaction level and suggestions for improvement. The device then sends this feedback to the server.

[0350] Step 9:

[0351] The server updates its machine learning algorithms based on the feedback and emotion data to improve the accuracy of the content it delivers next time. This further personalizes the user experience.
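
Step 3 above states that the summary is matched to the specified video length. One way to do this, sketched here, is to convert the duration into a word budget for the narration and keep whole sentences until the budget is used up. The speaking rate of 150 words per minute and the function name `fit_to_duration` are assumptions.

```python
import re


def fit_to_duration(summary, duration_seconds, words_per_minute=150):
    """Keep leading sentences whose total word count fits the narration time."""
    budget = int(duration_seconds * words_per_minute / 60)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    kept, used = [], 0
    for sentence in sentences:
        count = len(sentence.split())
        if used + count > budget:
            break                    # the next sentence would overrun the video
        kept.append(sentence)
        used += count
    return " ".join(kept)
```

Cutting at sentence boundaries keeps the narration text well-formed for the speech synthesis in Step 4.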

[0352] (Example 2)

[0353] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0354] In today's world, the sheer volume of information makes it difficult for users to efficiently access content that interests them. Furthermore, there is a lack of content tailored to users' emotions and preferences, resulting in a decline in the quality of the user experience.

[0355] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0356] In this invention, the server includes means for acquiring information from a communication network based on the user's areas of interest, means for summarizing the acquired information using natural language processing technology, means for converting the summarized information into audio and visual formats, and means for recognizing the user's emotional state and adjusting the content accordingly. This enables the efficient and personalized delivery of content that responds to the user's interests and emotions.

[0357] "User-specified areas of interest" refers to the categories or themes of information that users have selected based on their own interests and preferences.

[0358] "Acquiring relevant information from a communication network" refers to the process of collecting data related to user interests via networks such as the internet.

[0359] "Summarizing using natural language processing techniques" refers to using machine learning and artificial intelligence technologies to extract important information and points from text data and summarize them concisely.

[0360] "Converting to audio and visual formats" refers to processing summarized information into viewable content using speech synthesis and video editing technologies.

[0361] "Providing to user devices" means sending the generated content to the user's terminal and providing it in a viewable state.

[0362] "Recognizing the user's emotional state and adjusting content accordingly" refers to analyzing user facial expressions and voice data collected through sensors such as cameras and microphones, and then changing the content and presentation in real time based on the results.

[0363] "Algorithms for improving personalization performance" refer to computational methods that analyze user feedback and emotion data to provide content optimized for individual users.

[0364] This invention is a system for providing personalized information based on a user's areas of interest and emotional state. The user uses a terminal to select categories of interest, and this information is sent to a server. The server collects relevant information via a communication network. The collected data is summarized using natural language processing techniques. Examples of techniques used in this process include machine learning models and generative AI models, including advanced technologies such as "BERT" and "GPT."

[0365] The server converts the summarized information into audio and visual content using video editing software, speech synthesis technology, and a generative AI model, and then delivers it to the device in a viewable format. In this process, relevant images and video clips are also integrated so that the content holds the user's interest.

[0366] Furthermore, this system can recognize the user's emotional state in real time. It captures the user's facial expressions and voice tone through the camera and microphone on the device, and an emotion recognition engine on the server analyzes this data. Based on this analysis, the server adjusts the tone and pace of the content to improve the user experience.

[0367] For example, if a user is interested in "technology" and is looking for the latest information, the server collects data on the latest technological innovations, summarizes it, and delivers it to the device in video format. In this process, the emotion engine analyzes whether the user is enjoying it and reflects that data in the next content displayed.

[0368] An example of a prompt for a generative AI model is, "A method for summarizing topics related to the latest AI technologies and adjusting the content to suit the user's preferences based on sentiment analysis." This configuration is a feature of the present invention that enhances the user experience.

[0369] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0370] Step 1:

[0371] The user selects areas of interest through their device. The user's chosen interest categories are sent to the server as input. This could involve the user selecting from checkboxes or dropdown menus on the interface. The output is the user's interest data received by the server.

[0372] Step 2:

[0373] The server retrieves relevant information from the network based on the received interest data. The input is the user's interest data, and based on this, the server creates requests and accesses online information sources. Specifically, this includes actions such as the server searching news feeds and databases. The output is a collection of the retrieved relevant information.

[0374] Step 3:

[0375] The server summarizes the acquired information using natural language processing techniques. The input is the information collected in step 2. Specifically, a generative AI model is used to extract key points from the information and create a summary. The output is the summarized text data.

[0376] Step 4:

[0377] The server converts the summary data into audio and visual formats. The input is the summary text created in step 3, and content is created using speech synthesis and video editing software based on this text. Specific operations include inputting the summary text into the speech synthesis engine and selecting video footage. The output is viewable audio and video data.

[0378] Step 5:

[0379] The device captures the user's emotions using its camera and microphone. The input is the user's facial expressions and voice tone in real time. Specifically, this involves the device's sensors recording these and sending the data to a server. The output is the user's emotion data.

[0380] Step 6:

[0381] The server analyzes emotional data and adjusts the tone and pace of the content. The input is the emotional data obtained in step 5. Based on the data, the server optimizes the presentation of the content. Specifically, this could involve adding additional effects to the content or changing the speed of the narration. The output is the adjusted content.
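
Because Steps 5 and 6 operate on a real-time stream, a single misread frame could cause the content to flicker between styles. The sketch below smooths per-frame emotion labels over a sliding window and changes the reported emotion only when one label holds a strict majority. The window size, the labels, and the class name `EmotionSmoother` are assumptions not found in the text.

```python
from collections import Counter, deque


class EmotionSmoother:
    """Report a stable emotion label from noisy per-frame estimates."""

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._recent = deque(maxlen=window)   # most recent labels only
        self._stable = None

    def update(self, label):
        """Add one estimate and return the current stable label."""
        self._recent.append(label)
        top, count = Counter(self._recent).most_common(1)[0]
        if count * 2 > len(self._recent):     # strict majority in the window
            self._stable = top
        return self._stable
```

The stable label, rather than each raw estimate, would then drive the tone and pace adjustment in Step 6.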

[0382] (Application Example 2)

[0383] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0384] Conventional information delivery systems have difficulty providing content flexibly based on users' interests and emotions, and have failed to increase user engagement. Furthermore, they lacked mechanisms for continuously improving the accuracy of information. Therefore, there is a need for a system that provides highly relevant information tailored to user interests, enables real-time content adjustment based on emotions, and further improves accuracy over time.

[0385] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0386] In this invention, the server includes means for acquiring information related to user-specified interests, means for automatically summarizing the acquired information based on importance, means for converting the summarized content into visual and auditory display formats, means for analyzing the user's emotional state and dynamically adjusting the display content based on the analysis results, and means for collecting user response data and applying a learning model to improve the system's personalization performance. This enables the provision of personalized content based on the user's interests and emotions.

[0387] A "user" is an individual or group that uses a system to obtain information and interact with it.

[0388] "Interest" refers to a user's concern or preference for a particular field or topic.

[0389] "Information" is a general term for content on the internet, including data, news, articles, videos, and audio.

[0390] "To acquire" refers to the act of gathering or collecting necessary information.

[0391] "Importance" is a criterion for evaluating the value and priority of information.

[0392] "Summarizing" refers to the process of extracting the key points from information and putting them into a concise summary.

[0393] "Visual and auditory display formats" refer to methods of conveying information to users visually and aurally by combining images and sounds.

[0394] "Emotional state" refers to psychological or emotional changes that can be interpreted from a user's facial expressions and tone of voice.

[0395] "To analyze" means to examine data and find meaning or patterns in it.

[0396] "Dynamic adjustment" refers to changing and adapting the content and structure of information in real time.

[0397] "Response data" refers to data obtained from user reactions to the system, such as their actions and facial expressions.

[0398] "Personalization capability" refers to the ability of a system to provide services that meet the different needs and preferences of each user.

[0399] A "learning model" is a mathematical framework that uses algorithms to learn patterns based on past data and predict future behaviors and patterns.

[0400] The system for implementing this invention is designed to provide personalized content based on the user's interests and emotions. The server receives data on areas of interest transmitted from the user via a terminal. This data is stored in a database and used as a profile. Next, the server automatically collects information related to these areas of interest from the internet. The collected information is summarized using natural language processing techniques based on its importance.

[0401] The summarized information is then converted into visual and auditory display formats. Here, speech synthesis technology is used to create narration. Video generation software is used to create the video, integrating appropriate visual elements. For example, if the user is interested in "travel," travel guides and the latest tourist spot information are provided as videos.

[0402] Furthermore, the server analyzes the user's emotional state in real time using the camera and microphone built into the device. Here, a machine learning model is used to determine the user's emotions from their facial expressions and tone of voice. Based on this emotion analysis, the server dynamically adjusts the speed and style of the content to optimize the user experience. Specifically, if a smile is detected in the user, similar themes and tones will be emphasized in the next content.

[0403] In addition, user response data is collected by the server. This data is reflected in a learning model and used to improve the accuracy of content delivery. The more repeatedly a user accesses the content, the more accurate the personalization becomes.

[0404] (Example of a prompt message)

[0405] Analyze the user's emotions from their facial expressions and voice, and suggest travel-related content that will be more engaging for them. The target user is a Japanese woman in her 30s, so emphasize plans that appear appealing.

[0406] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0407] Step 1:

[0408] The server receives data on areas of interest that the user has submitted through their device. This data is stored in a database as input, forming a user profile. This profile serves as the basis for subsequent information gathering and content delivery.

[0409] Step 2:

[0410] The server collects information related to the user's areas of interest from the internet based on their user profile. This collection step involves obtaining data from web crawling and news APIs. The input is information about the user's areas of interest, and the output is a collection of related information.

[0411] Step 3:

[0412] The server summarizes the collected information using natural language processing (NLP) techniques. The input is the collection of information obtained in step 2, and importance analysis is performed to generate summary information as output. This summary forms the basis for content generation.

[0413] Step 4:

[0414] The server generates visual and auditory display formats based on the summary information. This process utilizes video generation software and speech synthesis technology. The input is summary information, which is converted into video clips and audio narration, and the output is multimedia content.

[0415] Step 5:

[0416] The device uses its camera and microphone to capture data on the user's emotional state in real time. The captured data is sent to the server as input. The server uses a machine learning model to analyze emotions from facial expressions and voice tone, and outputs the emotion analysis results.

[0417] Step 6:

[0418] The server uses the emotion analysis results to dynamically adjust the speed and style of the content. The input is the emotion analysis results, and the content parameters are modified to enhance the user experience. The output is content suited to the user's state.

[0419] Step 7:

[0420] The server collects user response data and updates the learning model to improve the system's personalization capabilities. The input is user feedback, and this data is fed into the machine learning algorithm, resulting in improved adaptive performance as output.
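
The importance analysis in Step 3 is not defined further in the text. As a rough illustration, each collected item can be scored by the fraction of the user's interest terms it mentions, and only the top-scoring items passed on to summarization. The scoring rule and the names `importance` and `select_important` are assumptions.

```python
import re


def importance(item_text, interest_terms):
    """Fraction of interest terms that appear in the item (0.0 to 1.0)."""
    terms = {t.lower() for t in interest_terms}
    if not terms:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", item_text.lower()))
    return len(tokens & terms) / len(terms)


def select_important(items, interest_terms, top_k=3):
    """Return up to top_k items with a nonzero score, best first."""
    scored = [(importance(text, interest_terms), index, text)
              for index, text in enumerate(items)]
    kept = [entry for entry in scored if entry[0] > 0.0]
    kept.sort(key=lambda entry: (-entry[0], entry[1]))  # ties keep input order
    return [text for _, _, text in kept[:top_k]]
```

A production system would likely use a learned relevance model instead; the keyword overlap here only shows where such a score fits in the flow.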

[0421] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0422] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0423] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0424] [Third Embodiment]

[0425] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0426] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0427] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0428] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0429] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0430] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0431] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0432] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0433] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0434] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0435] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0436] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0437] The embodiments for carrying out this invention will be described in detail. This system has a configuration for efficiently providing the latest information based on the user's selected area of interest.

[0438] First, the user uses their device to select their areas of interest. This information is sent to the server and stored in the database as a user profile. This reduces the effort required to select areas of interest again in the future.
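As an illustration of storing the selected areas of interest as a user profile, the following is a minimal sketch that assumes an SQLite database and a single `profiles` table. The table layout and the function names (`open_profile_db`, `save_interests`, `load_interests`) are hypothetical and are not taken from this disclosure.

```python
import json
import sqlite3


def open_profile_db(path=":memory:"):
    """Open (or create) a profile database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "user_id TEXT PRIMARY KEY, interests TEXT NOT NULL)"
    )
    return conn


def save_interests(conn, user_id, interests):
    """Store the user's areas of interest, replacing any earlier selection."""
    payload = json.dumps(sorted(set(interests)))  # de-duplicate and normalize order
    conn.execute(
        "INSERT OR REPLACE INTO profiles (user_id, interests) VALUES (?, ?)",
        (user_id, payload),
    )
    conn.commit()


def load_interests(conn, user_id):
    """Return the stored areas of interest, or an empty list for unknown users."""
    row = conn.execute(
        "SELECT interests FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Because the profile is persisted on the server side, a later session can call `load_interests` instead of asking the user to select the areas again.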

[0439] The server collects news and information related to the user's specified areas of interest from the internet, based on their profile. This process utilizes methods to obtain the latest information through multiple news sites and open APIs. The key is ensuring the reliability and real-time nature of the information.

[0440] The collected information is analyzed on a server and summarized using natural language processing technology. During summarization, the most important elements are extracted to match the user's desired video length. In the text summarization stage, the goal is to eliminate redundancy and focus on the essential information the user truly needs.
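One possible way to fit a summary to the user's desired video length is sketched below. It is a simple frequency-based extractive method using only the Python standard library, standing in for the unspecified natural language processing technology. The narration speed of 150 words per minute and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

WORDS_PER_MINUTE = 150  # assumed narration speed, not specified in this disclosure


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_to_length(text, video_seconds, wpm=WORDS_PER_MINUTE):
    """Pick the highest-scoring sentences that fit the narration word budget."""
    sentences = split_sentences(text)
    budget = int(video_seconds * wpm / 60)
    # Word frequencies over the whole text; very short words are ignored.
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3)

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= budget:
            chosen.append(i)
            used += n
    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Sentences that repeat the dominant vocabulary of the collected text score higher, so off-topic or redundant sentences are the first to be dropped when the budget is tight.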

[0441] Next, the server converts the summarized information into a video format. This process integrates text and related images and video clips in a slideshow format, and adds narration. This makes the information easier for the user to receive visually. In addition, each segment of the video uses visual effects to highlight key phrases and important points.

[0442] The generated video is delivered to the device, and the user watches it through the device. The video played on the device is optimized to minimize buffering, providing a smooth viewing experience.

[0443] After watching a video, users can provide feedback on it. This feedback is sent from the device to the server and used to improve the quality of future information delivery. Based on the user's feedback and viewing history, the server runs machine learning algorithms to further refine personalized information delivery for each user.

[0444] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental news and summarize it, focusing on important topics such as CO2 emission reduction. The generated video will combine relevant images and narration to explain the current state of global warming to the user in detail. In this way, the present invention realizes the efficient provision of information that meets user needs in an age of information overload.

[0445] The following describes the processing flow.

[0446] Step 1:

[0447] The user selects genres of interest using their device. The device sends this information to the server, where it is stored in the database as the user's profile data.

[0448] Step 2:

[0449] The server collects relevant news and information from the internet based on the user's profile. It consults multiple reliable sources and selects the most relevant information.

[0450] Step 3:

[0451] The server summarizes the collected information. Using natural language processing techniques, it extracts key points and removes redundant parts. As a result, a summary that fits the video time frame set by the user is obtained.

[0452] Step 4:

[0453] The server generates videos based on summarized information. It incorporates images and video clips related to the text and adds narration using speech synthesis technology. This allows information to be conveyed through both visual and auditory means.

[0454] Step 5:

[0455] The device receives the generated video and provides it to the user. The video is optimized for smooth playback on the device.

[0456] Step 6:

[0457] Users watch videos and provide feedback on the quality of the content. The device sends this feedback to the server, which is then used to improve future content.

[0458] Step 7:

[0459] The server receives user feedback and uses machine learning algorithms to improve the accuracy of the information provided. This optimizes future news selection so that it better matches the user's interests.
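The feedback loop in Steps 6 and 7 can be illustrated with a deliberately simple update rule. The sketch below assumes a 1-to-5 rating and keeps one weight per topic, moving it toward the rating with an exponential moving average. This is a stand-in for the unspecified machine learning algorithm, and the names and the learning rate are hypothetical.

```python
def update_topic_weights(weights, topic, rating, learning_rate=0.2):
    """Return new weights with `topic` moved toward the rating (1-5 scale assumed)."""
    target = (rating - 1) / 4            # map 1..5 onto 0..1
    current = weights.get(topic, 0.5)    # unseen topics start at a neutral 0.5
    updated = dict(weights)              # do not mutate the caller's dictionary
    updated[topic] = current + learning_rate * (target - current)
    return updated


def rank_topics(weights):
    """Order topics from most to least preferred."""
    return sorted(weights, key=weights.get, reverse=True)
```

With this rule, repeated high ratings for a topic gradually raise its weight, so `rank_topics` places that topic earlier when the next video is assembled.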

[0460] (Example 1)

[0461] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0462] In modern society, information is vast and diverse, making it difficult for users to quickly and efficiently obtain the latest information that perfectly matches their interests. Furthermore, providing relevant information in a visually easy-to-understand format and personalizing that information delivery to each user are also challenges. There is also room for improvement in maintaining the speed and quality of information transmission.

[0463] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0464] In this invention, the server includes means for acquiring information on a user-specified area of interest, means for machine-summarizing the acquired data, and means for converting the summarized data into a video format within a specified period. This makes it possible to quickly gather, summarize, and provide relevant information tailored to the user's interests in a visually easy-to-understand format.

[0465] A "user" is the entity that utilizes a service or system, and is the recipient of information.

[0466] "Areas of interest" refers to the areas of information that a user is particularly interested in and wants to learn more about.

[0467] "Information" includes data, news, and knowledge related to the user's areas of interest.

[0468] A "server" is a computer system used for collecting, processing, storing, and providing information.

[0469] "Summarization" is the process of making acquired information concise and extracting its core essence.

[0470] "Video format" refers to media formats such as videos and slideshows that can convey information visually.

[0471] "Device" refers to an electronic device used by a user to receive and utilize information.

[0472] A "learning method" is a means by which a system analyzes user opinions and behaviors to improve the accuracy and suitability of the information it provides.

[0473] "Textual information" refers to information provided in text format.

[0474] "Voice information" refers to information provided in audio format.

[0475] "Visual material" refers to elements of media that are communicated visually, such as images and video clips.

[0476] In order to implement this invention, it is necessary to build a system in which the user, server, and terminal elements work together.

[0477] Users select their areas of interest using a device. The device has a dedicated application or web interface installed, through which users input their areas of interest and send them to the server. This process allows users to easily specify the information they are interested in.

[0478] The server collects relevant information via the internet based on the user's areas of interest. The server utilizes scraping tools implemented in programming languages such as Python, as well as open APIs. The collected information is automatically summarized using a natural language processing generative AI model. During summarization, redundant parts are removed, and the core information is extracted.
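As a rough illustration of a Python scraping tool, the following sketch extracts headlines from an HTML page that has already been downloaded and keeps those that mention the area of interest. It uses the standard library `html.parser` module. The assumption that headlines appear in `h1` and `h2` elements, and the names `HeadlineParser` and `extract_headlines`, are made for illustration only.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of <h1> and <h2> elements (assumed to hold headlines)."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._in_heading = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_heading = False

    def handle_data(self, data):
        # Nested inline tags such as <em> still contribute their text.
        if self._in_heading:
            self.headlines[-1] += data


def extract_headlines(html, keyword):
    """Return headlines from `html` that contain `keyword` (case-insensitive)."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines if keyword.lower() in h.lower()]
```

Fetching the page itself is left out of the sketch; any HTTP client could supply the `html` string.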

[0479] The server then converts the summarized information into a video format. This video generation uses video editing software such as Adobe Premiere Pro or Final Cut Pro. The text, related images, and video clips are integrated, and voice narration generated using a generative AI model is added to produce the finished video content.

[0480] The generated video is sent from the server to the user's terminal, where the user uses it to view the information. The terminal plays the video smoothly using a high-speed data transfer protocol, minimizing buffering and providing a comfortable viewing experience.

[0481] After viewing, users can provide feedback through their device. This feedback is sent to the server and stored in a database. The server uses this feedback to run machine learning algorithms and learn to improve the quality and accuracy of the information provided in the future.

[0482] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental data and summarize key topics related to global warming. The resulting video will combine relevant visual elements with narration to visually convey the latest environmental issues to the user.

[0483] An example of a prompt message might be, "Please summarize the latest news on environmental issues and provide it in a visually easy-to-understand format."
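A prompt of this kind could be assembled from the user's area of interest with a small helper such as the one sketched below. The function name `build_summary_prompt` and the optional sentence about narration length are assumptions added for illustration; only the base wording comes from the example above.

```python
def build_summary_prompt(area_of_interest, video_seconds=None):
    """Build a summarization prompt for the given area of interest."""
    prompt = (
        f"Please summarize the latest news on {area_of_interest} "
        "and provide it in a visually easy-to-understand format."
    )
    if video_seconds is not None:
        # Optional length hint; this sentence is an assumption, not from the source.
        prompt += f" Keep the narration within about {video_seconds} seconds."
    return prompt
```

Calling `build_summary_prompt("environmental issues")` reproduces the example prompt, and passing `video_seconds` appends the assumed length hint.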

[0484] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0485] Step 1:

[0486] The user selects their area of interest using a terminal. As input, the user enters their area of interest (e.g., "environmental issues") into the terminal interface. This information is sent from the terminal to the server. The terminal accurately receives the user's selection and provides it to the server as area of interest information. This prepares the server to provide the user with the most relevant information.

[0487] Step 2:

[0488] The server collects relevant information from the internet based on the user's areas of interest. It receives area of interest information as input and uses scraping tools and open APIs to collect data. The output is a collection of various pieces of information that match the user's areas of interest. The server verifies the reliability and relevance of the collected data and constructs the necessary datasets for the next step.

[0489] Step 3:

[0490] The server summarizes the collected information using natural language processing techniques. The input is the data obtained through information gathering. A generative AI model is used to eliminate redundancy and extract key points, generating a concise summary. The output is the summarized text data. Through this summary, the server prepares to provide the user with the essential information they need.

[0491] Step 4:

[0492] The server converts the summarized information into a video format. Using the summarized text as input, it collects relevant images and video clips to create a slideshow. Video editing software is used to add voice narration generated by a generative AI model and integrate the visual elements. The output of this process is visually easy-to-understand video content.

[0493] Step 5:

[0494] The generated video is sent from the server to the terminal. The server takes the video file to be sent as input and transmits it to the terminal using an efficient data transmission protocol. The output is a video file viewable on the terminal. After receiving the video, the terminal displays it in an environment optimized for smooth playback.

[0495] Step 6:

[0496] Users watch videos through their devices and provide feedback. As input, they enter their impressions and evaluations based on the videos they've watched into the device interface. This feedback is sent from the device to the server and stored in a database. The output is user feedback data, which the server uses to personalize and improve the accuracy of future information provision.
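The slideshow construction in Step 4 of the flow above can be sketched as a planning step that assigns one slide per summary sentence and estimates how long each slide should stay on screen while it is narrated. The narration speed, the minimum slide duration, and the `Slide` and `plan_slideshow` names are assumptions for illustration; rendering and speech synthesis are outside the sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Slide:
    text: str       # sentence shown and narrated on this slide
    image: str      # path or identifier of the accompanying image
    seconds: float  # estimated display time


def plan_slideshow(summary, images, wpm=150, min_seconds=2.0):
    """Create one slide per sentence, timed by an assumed narration speed."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    slides = []
    for i, sentence in enumerate(sentences):
        seconds = max(min_seconds, len(sentence.split()) * 60 / wpm)
        # Cycle through the available images; use an empty string if none exist.
        image = images[i % len(images)] if images else ""
        slides.append(Slide(sentence, image, round(seconds, 1)))
    return slides
```

Summing `seconds` over the returned slides gives an estimate of the total video length, which can be checked against the length the user requested.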

[0497] (Application Example 1)

[0498] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0499] In modern society, information overload makes it difficult for users to efficiently obtain the information they need. Furthermore, in the field of smart cities, there is a lack of means to quickly and easily access the latest information. In addition, personalized information tailored to the diverse interests of each user is insufficient, highlighting the need for improved user experience.

[0500] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0501] In this invention, the server includes means for receiving data related to user-specified areas of interest, means for collecting relevant information from a digital network, means for automatically summarizing the collected information, and means for integrating information from different categories and optimizing the displayed content based on the user's selection. This allows the user to receive the latest information based on their interests in a convenient, visualized format, enabling them to acquire information more accurately and quickly.

[0502] "Data related to user-specified areas of interest" refers to a collection of information related to specific areas or themes that individual users are interested in.

[0503] "Means of collecting relevant information from digital networks" refers to devices or software that have the function of finding and collecting data related to a specified area of interest through electronic data communication networks such as the internet.

[0504] A "means for automatically summarizing collected information" is a system that analyzes collected data, extracts important elements from it, and transforms them into a short, concise form.

[0505] "A means of aggregating information from different categories and optimizing displayed content based on user selection" refers to a technology that organizes and adjusts diverse information according to each individual's interests and purposes, and provides it in the most visually appealing and easy-to-understand format.

[0506] "Visualized content" refers to a form of information transmission that visually represents information that exists as text or data using images, videos, graphs, etc.
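The means of aggregating information from different categories and optimizing the displayed content based on user selection might be approximated as follows. The sketch assumes each collected item is a dictionary with hypothetical `category` and `score` keys, where `score` is a relevance value computed elsewhere.

```python
from collections import defaultdict


def organize_by_category(items, selected_categories):
    """Keep items in the user's selected categories, in the user's order.

    Within each category, items are sorted by descending relevance score.
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[item["category"]].append(item)

    ordered = []
    for category in selected_categories:
        ordered.extend(
            sorted(grouped.get(category, []), key=lambda it: it["score"], reverse=True)
        )
    return ordered
```

Items whose category the user did not select are omitted, which is one simple reading of "optimizing displayed content based on user selection".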

[0507] The system for implementing this invention primarily consists of a user terminal, a server, and digital communication between them. The user specifies their areas of interest using an information processing device such as a smartphone, tablet, or computer. This information is then transmitted to the server via the internet. Based on the received information, the server collects relevant data from the digital network. This process utilizes news APIs and web scraping techniques.

[0508] The server summarizes the collected data using natural language processing techniques. This process utilizes libraries such as Python's NLTK to eliminate redundancy and extract key elements. Furthermore, this summarized information is visualized using video editing libraries such as OpenCV, combining text, audio, and visual elements for presentation to the user. The server delivers the generated content to the user's device, enabling a visually engaging experience where the user can select and enjoy the most relevant information from a large amount of information.

[0509] For example, if a user expresses interest in "sustainable energy," the server will collect, summarize, and provide a visualized analysis of relevant cutting-edge technologies and projects. During this process, the user can receive this information via their device while commuting or taking a break. The following are examples of prompts for the generative AI model:

[0510] "Please provide a summary of the latest energy technologies in sustainable smart cities. Specifically, include trends in energy efficiency, renewable energy projects, and urban planning."

[0511] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0512] Step 1:

[0513] The user selects areas of interest on their device. The user chooses categories of interest on their smartphone or computer screen. This action generates data on the selected areas of interest, which is then sent to the server as input data.

[0514] Step 2:

[0515] The server collects relevant information from the digital network. Based on the user's areas of interest data, it uses news APIs and web scraping techniques to retrieve highly relevant information from the internet. The main input used in this process is the user's areas of interest, and the output is the retrieved raw information data.

[0516] Step 3:

[0517] The server summarizes the collected information. Natural language processing is performed on the acquired data using Python's NLTK library to extract important elements and shorten the information. The input is the data acquired in Step 2, and the output is the summarized text data.

[0518] Step 4:

[0519] The server converts the summarized information into a video format with audio. Using video editing libraries such as OpenCV, it visualizes the summarized text data by combining audio, video clips, and images. The output of this process is video data with audio as visual content.

[0520] Step 5:

[0521] The server provides the generated video data to the user's device. The generated video content is delivered to the user's smartphone or computer via the internet and becomes playable on the device. The input is video data with audio, and the output is content that the user can view.

[0522] Step 6:

[0523] Users view content and provide feedback. Users use a feedback function to provide ratings and opinions on the content they view on their devices, and this information is sent to the server. The input is user feedback data, and the output is data that will contribute to improving the accuracy of future information.

[0524] Step 7:

[0525] The server improves the accuracy of information provided based on feedback. The collected feedback data is analyzed using machine learning algorithms, and this analysis is reflected in future information provision, thereby improving the personalization of information for users. This step generates output data related to information improvements.

[0526] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0527] The embodiments for carrying out this invention will be described in detail. The present invention provides a system that acquires information based on user-specified areas of interest, further recognizes the user's emotions using an emotion engine, and adjusts content delivery accordingly.

[0528] First, the user selects genres of interest through their device. This information is sent to the server and recorded in the database as the user's profile. Based on this profile, the server then collects relevant news and information from the internet.

[0529] The collected information is summarized on the server, and key points are extracted using natural language processing technology. A video is then generated based on this summary, and delivered to the device as optimized content to maintain user interest. The video integrates relevant images and video clips, and the text is narrated using speech synthesis.

[0530] In addition, the system of the present invention is equipped with an emotion engine, which can recognize the user's emotional state in real time. The emotion engine analyzes the user's facial expressions and tone of voice through the camera and microphone to understand the user's emotional state. For example, by sensing changes in facial expressions and voice when the user finds something interesting, the appeal of the content can be evaluated in real time.

[0531] Furthermore, the server uses this emotional data to adjust the tone and pace of the content it delivers according to the user's emotional state. For example, if it determines that the user is excited, it can add more dynamic visuals. The user's emotional data is also collected as feedback and used to improve the personalization algorithm. This makes future information delivery even more accurate and improves user satisfaction.
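How an estimated emotion might drive the tone and pace of delivery can be sketched as a lookup from emotion label to presentation settings. Only the "excited leads to more dynamic visuals" case comes from the description above; the other labels, the numeric narration rates, the confidence threshold, and all names are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Presentation:
    narration_rate: float = 1.0     # 1.0 = normal narration speed
    visual_style: str = "standard"


# Mapping from emotion label to settings; values other than the
# "excited" -> "dynamic" pairing are illustrative assumptions.
ADJUSTMENTS = {
    "excited": {"narration_rate": 1.1, "visual_style": "dynamic"},
    "bored": {"narration_rate": 1.2, "visual_style": "dynamic"},
    "confused": {"narration_rate": 0.85, "visual_style": "simple"},
}


def adjust_presentation(current, emotion, confidence, threshold=0.6):
    """Return adjusted settings, or `current` if the estimate is unreliable."""
    if confidence < threshold or emotion not in ADJUSTMENTS:
        return current
    return replace(current, **ADJUSTMENTS[emotion])
```

Ignoring low-confidence estimates keeps the presentation from changing on every momentary change in facial expression or voice.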

[0532] For example, if a user is interested in "technology" and is looking for the latest relevant technical information, the server will collect the latest news on AI technology, summarize it, and provide it to the user's device in video format. If the emotion engine detects the user smiling while they are watching, the server can use that information to incorporate similar themes and tones into the next content, thereby increasing user engagement.

[0533] Thus, the present invention realizes efficient and personalized information delivery that responds to the user's interests and emotions, and effectively solves the problems faced by conventional systems.

[0534] The following describes the processing flow.

[0535] Step 1:

[0536] The user uses their device to select areas of interest. The device sends this information to the server, updating the user's profile.

[0537] Step 2:

[0538] The server collects relevant information from reliable news sites and databases based on the user's areas of interest. It retrieves information from multiple data sources and filters out duplicate and irrelevant information.

[0539] Step 3:

[0540] The server analyzes the collected information and extracts key points using natural language processing techniques. The extracted information is then summarized to match the specified video length.

[0541] Step 4:

[0542] The server uses a video generation engine based on the summarized information to create a video that includes visual elements and audio narration. Relevant images and video clips are inserted into the video, and the narration text is subjected to speech synthesis processing.

[0543] Step 5:

[0544] The device receives the generated video and provides it to the user. The device supports video streaming playback, enabling a high-quality viewing experience.

[0545] Step 6:

[0546] While the user watches a video, an emotion engine monitors their emotional state through the camera and microphone. Emotional data is extracted from the user's facial expressions, tone of voice, and other factors.

[0547] Step 7:

[0548] The server analyzes the emotional data obtained from the emotion engine and adjusts the content accordingly. If the user is surprised or excited, it immediately switches to a more dynamic presentation.

[0549] Step 8:

[0550] After the user finishes watching, a feedback screen is displayed, giving them the opportunity to input their satisfaction level and suggestions for improvement. The device then sends this feedback to the server.

[0551] Step 9:

[0552] The server updates its machine learning algorithms based on feedback and sentiment data to improve the accuracy of the content it delivers next time. This further personalizes the user experience.
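Step 9 of the flow above combines explicit feedback with emotion data. One simple way to merge the two into a single training signal is sketched below. The 1-to-5 satisfaction scale, the set of emotions treated as positive, and the 0.4 weighting are assumptions, and the resulting score would feed whatever learning algorithm the server actually uses.

```python
POSITIVE_EMOTIONS = {"happy", "excited", "surprised"}  # assumed label set


def engagement_score(satisfaction, emotion_samples, emotion_weight=0.4):
    """Blend a 1-5 satisfaction rating with emotion samples into a 0-1 score.

    `emotion_samples` is a list of (label, confidence) pairs recorded
    while the user watched the video.
    """
    explicit = (satisfaction - 1) / 4
    total = sum(conf for _, conf in emotion_samples)
    if total == 0:
        # No usable emotion data: fall back to the explicit rating alone.
        return explicit
    implicit = sum(
        conf for label, conf in emotion_samples if label in POSITIVE_EMOTIONS
    ) / total
    return (1 - emotion_weight) * explicit + emotion_weight * implicit
```

Weighting the explicit rating more heavily reflects the assumption that a stated opinion is more reliable than an inferred emotional state.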

[0553] (Example 2)

[0554] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0555] In today's world, the sheer volume of information makes it difficult for users to efficiently access content that interests them. Furthermore, there is a lack of content tailored to users' emotions and preferences, resulting in a decline in the quality of the user experience.

[0556] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0557] In this invention, the server includes means for acquiring information from a communication network based on the user's areas of interest, means for summarizing the acquired information using natural language processing technology, means for converting the summarized information into audio and visual formats, and means for recognizing the user's emotional state and adjusting the content accordingly. This enables the efficient and personalized delivery of content that responds to the user's interests and emotions.

[0558] "User-specified areas of interest" refers to the categories or themes of information that users have selected based on their own interests and preferences.

[0559] "Acquiring relevant information from a communication network" refers to the process of collecting data related to user interests via networks such as the internet.

[0560] "Summarizing using natural language processing techniques" refers to using machine learning and artificial intelligence technologies to extract important information and points from text data and summarize them concisely.

[0561] "Converting to audio and visual formats" refers to processing summarized information into viewable content using speech synthesis and video editing technologies.

[0562] "Providing to user devices" means sending the generated content to the user's terminal and providing it in a viewable state.

[0563] "Recognizing the user's emotional state and adjusting content accordingly" refers to analyzing user facial expressions and voice data collected through sensors such as cameras and microphones, and then changing the content and presentation in real time based on the results.

[0564] "Algorithms for improving personalization performance" refer to computational methods that analyze user feedback and sentiment data to provide content optimized for individual users.

[0565] This invention is a system for providing personalized information based on a user's areas of interest and emotional state. The user uses a terminal to select categories of interest, and this information is sent to a server. The server collects relevant information via a communication network. The collected data is summarized using natural language processing techniques. Examples of techniques used in this process include machine learning models and generative AI models, including advanced technologies such as "BERT" and "GPT."

[0566] The summarized information is converted into audio and visual content on the server. The conversion uses video editing software and speech synthesis technology together with a generative AI model, and the resulting content is delivered to the terminal in a viewable format. In this process, relevant images and video clips are also integrated, and the content is designed to maintain user interest.

[0567] Furthermore, this system can recognize the user's emotional state in real time. It captures the user's facial expressions and voice tone through the camera and microphone on the device, and an emotion recognition engine on the server analyzes this data. Based on this analysis, the server adjusts the tone and pace of the content to improve the user experience.

[0568] For example, if a user is interested in "technology" and is looking for the latest information, the server collects data on the latest technological innovations, summarizes it, and delivers it to the device in video format. In this process, the emotion engine analyzes whether the user is enjoying it and reflects that data in the next content displayed.

[0569] An example of a prompt for a generative AI model is, "A method for summarizing topics related to the latest AI technologies and adjusting the content to suit the user's preferences based on sentiment analysis." This configuration is a feature of the present invention that enhances the user experience.

[0570] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0571] Step 1:

[0572] The user selects areas of interest through their device. The user's chosen interest categories are sent to the server as input. This could involve the user selecting from checkboxes or dropdown menus on the interface. The output is the user's interest data received by the server.

[0573] Step 2:

[0574] The server retrieves relevant information from the network based on the received interest data. The input is the user's interest data, and based on this, the server creates requests and accesses online information sources. Specifically, this includes actions such as the server searching news feeds and databases. The output is a collection of the retrieved relevant information.

[0575] Step 3:

[0576] The server summarizes the acquired information using natural language processing techniques. The input is the information collected in step 2. Specifically, a generative AI model is used to extract key points from the information and create a summary. The output is the summarized text data.

[0577] Step 4:

[0578] The server converts the summary data into audio and visual formats. The input is the summary text created in step 3, and content is created using speech synthesis and video editing software based on this text. Specific operations include inputting the summary text into the speech synthesis engine and selecting video footage. The output is viewable audio and video data.

[0579] Step 5:

[0580] The device captures the user's emotions using its camera and microphone. The input is the user's facial expressions and voice tone in real time. Specifically, this involves the device's sensors recording these and sending the data to a server. The output is the user's emotion data.

[0581] Step 6:

[0582] The server analyzes emotional data and adjusts the tone and pace of the content. The input is the emotional data obtained in step 5. Based on the data, the server optimizes the presentation of the content. Specifically, this could involve adding additional effects to the content or changing the speed of the narration. The output is the adjusted content.
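Step 2 of the flow above states that the server creates requests from the received interest data. A minimal sketch of that request construction is shown below. The endpoint `https://news.example.com/api/search` and its `q`, `limit`, and `sort` parameters are hypothetical placeholders, since no specific news source or API is named here.

```python
from urllib.parse import urlencode

NEWS_ENDPOINT = "https://news.example.com/api/search"  # hypothetical endpoint


def build_search_urls(interests, max_results=10):
    """Build one search URL per area of interest (query parameters are assumed)."""
    urls = []
    for topic in interests:
        query = urlencode({"q": topic, "limit": max_results, "sort": "date"})
        urls.append(f"{NEWS_ENDPOINT}?{query}")
    return urls
```

`urlencode` takes care of escaping spaces and other special characters in the topic, so free-text interests such as "environmental issues" produce valid URLs.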

[0583] (Application Example 2)

[0584] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0585] Conventional information delivery systems have difficulty providing content flexibly based on users' interests and emotions, and have failed to increase user engagement. Furthermore, they lacked mechanisms for continuously improving the accuracy of information. Therefore, there is a need for a system that provides highly relevant information tailored to user interests, enables real-time content adjustment based on emotions, and further improves accuracy over time.

[0586] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0587] In this invention, the server includes means for acquiring information related to user-specified interests, means for automatically summarizing the acquired information based on importance, means for converting the summarized content into visual and auditory display formats, means for analyzing the user's emotional state and dynamically adjusting the display content based on the analysis results, and means for collecting user response data and applying a learning model to improve the system's personalization performance. This enables the provision of personalized content based on the user's interests and emotions.

[0588] A "user" is an individual or group that uses a system to obtain information and interact with it.

[0589] "Interest" refers to a user's concern or preference for a particular field or topic.

[0590] "Information" is a general term for content on the internet, including data, news, articles, videos, and audio.

[0591] "To acquire" refers to the act of gathering or collecting necessary information.

[0592] "Importance" is a criterion for evaluating the value and priority of information.

[0593] "Summarizing" refers to the process of extracting the key points from information and putting them into a concise summary.

[0594] "Visual and auditory display formats" refer to methods of conveying information to users visually and aurally by combining images and sounds.

[0595] "Emotional state" refers to psychological or emotional changes that can be interpreted from a user's facial expressions and tone of voice.

[0596] "To analyze" means to examine data and find meaning or patterns in it.

[0597] "Dynamic adjustment" refers to changing and adapting the content and structure of information in real time.

[0598] "Response data" refers to data obtained from user reactions to the system, such as their actions and facial expressions.

[0599] "Personalization capability" refers to the ability of a system to provide services that meet the different needs and preferences of each user.

[0600] A "learning model" is a mathematical framework that uses algorithms to learn patterns based on past data and predict future behaviors and patterns.

[0601] The system for implementing this invention is designed to provide personalized content based on the user's interests and emotions. The server receives data on areas of interest transmitted from the user via a terminal. This data is stored in a database and used as a profile. Next, the server automatically collects information related to these areas of interest from the internet. The collected information is summarized using natural language processing techniques based on its importance.

[0602] The summarized information is then converted into visual and auditory display formats. Here, speech synthesis technology is used to create narration. Video generation software is used to create the video, integrating appropriate visual elements. For example, if the user is interested in "travel," travel guides and the latest tourist spot information are provided as videos.

[0603] Furthermore, the server analyzes the user's emotional state in real time using the camera and microphone built into the device. Here, a machine learning model is used to determine the user's emotions from their facial expressions and tone of voice. Based on this emotion analysis, the server dynamically adjusts the speed and style of the content to optimize the user experience. Specifically, if a smile is detected in the user, similar themes and tones will be emphasized in the next content.
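One way to combine facial-expression and voice-tone signals into a single emotion estimate is a weighted average of per-modality scores. The sketch below assumes that two upstream models (not shown) already return a score per emotion label; the function name `fuse_emotions` and the default weight of 0.6 are hypothetical.

```python
def fuse_emotions(
    face: dict[str, float],
    voice: dict[str, float],
    face_weight: float = 0.6,
) -> tuple[str, dict[str, float]]:
    """Combine per-modality emotion scores and return (top label, fused scores)."""
    labels = sorted(set(face) | set(voice))  # sorted for deterministic ties
    if not labels:
        raise ValueError("at least one emotion score is required")
    fused = {
        label: face_weight * face.get(label, 0.0)
        + (1.0 - face_weight) * voice.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total > 0:
        # Normalize so the fused scores sum to 1.
        fused = {label: score / total for label, score in fused.items()}
    top = max(labels, key=lambda label: fused[label])
    return top, fused
```

For example, `fuse_emotions({"joy": 0.8, "neutral": 0.2}, {"joy": 0.5, "neutral": 0.5})` selects "joy", which the server could then use when choosing the theme and tone of the next content.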

[0604] In addition, user response data is collected by the server. This data is reflected in a learning model and used to improve the accuracy of content delivery. The more repeatedly a user accesses the content, the more accurate the personalization becomes.

[0605] (Example of a prompt message)

[0606] Analyze the user's emotions from their facial expressions and voice, and suggest travel-related content that will be more engaging for them. The target user is a Japanese woman in her 30s, so emphasize plans that appear appealing.

[0607] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0608] Step 1:

[0609] The server receives data on areas of interest that the user has submitted through their device. This data is stored in a database as input, forming a user profile. This profile serves as the basis for subsequent information gathering and content delivery.

[0610] Step 2:

[0611] The server collects information related to the user's areas of interest from the internet based on their user profile. This collection step involves obtaining data from web crawling and news APIs. The input is information about the user's areas of interest, and the output is a collection of related information.

[0612] Step 3:

[0613] The server summarizes the collected information using natural language processing (NLP) techniques. The input is the collection of information obtained in step 2, and importance analysis is performed to generate summary information as output. This summary forms the basis for content generation.

[0614] Step 4:

[0615] The server generates visual and auditory display formats based on the summary information. This process utilizes video generation software and speech synthesis technology. The input is summary information, which is converted into video clips and audio narration, and the output is multimedia content.

[0616] Step 5:

[0617] The device uses its camera and microphone to capture the user's facial expressions and voice tone in real time. The captured data is sent to the server as input. The server uses a machine learning model to analyze emotions from the facial expressions and voice tone, and outputs the emotion analysis results.

[0618] Step 6:

[0619] The server uses the emotion analysis results to dynamically adjust the speed and style of the content. The input is the emotion analysis results from step 5, and the content parameters are modified to enhance the user experience. The output is content that is appropriate to the user's state.

[0620] Step 7:

[0621] The server collects user response data and updates the learning model to improve the system's personalization capabilities. The input is user feedback, and this data is fed into the machine learning algorithm, resulting in improved adaptive performance as output.
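The learning update in step 7 is not tied to a particular algorithm in this disclosure. As a minimal sketch, assuming each response is reduced to a score between -1 (negative) and 1 (positive), per-topic preference weights could be updated with an exponential moving average. The function name and the learning rate are hypothetical.

```python
def update_topic_weights(
    weights: dict[str, float],
    topic: str,
    response: float,
    learning_rate: float = 0.2,
) -> dict[str, float]:
    """Return a copy of the weights with one topic moved toward the response."""
    if not -1.0 <= response <= 1.0:
        raise ValueError("response must be between -1 and 1")
    updated = dict(weights)
    previous = updated.get(topic, 0.0)  # unseen topics start at 0
    updated[topic] = (1.0 - learning_rate) * previous + learning_rate * response
    return updated
```

Because each call moves the weight only part of the way toward the latest response, repeated viewing sessions gradually sharpen the profile, which is consistent with personalization improving the more often the user accesses the content.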

[0622] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0623] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0624] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0625] [Fourth Embodiment]

[0626] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0627] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0628] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0629] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0630] The microphone 238 receives instructions and other input from the user 20 by picking up the voice of the user 20. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0631] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0632] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0633] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0634] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0635] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0636] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0637] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0638] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0639] The embodiments for carrying out this invention will be described in detail. This system has a configuration for efficiently providing the latest information based on the user's selected area of interest.

[0640] First, the user uses their device to select their areas of interest. This information is sent to the server and stored in the database as a user profile. This reduces the effort required to select areas of interest again in the future.

[0641] The server collects news and information related to the user's specified areas of interest from the internet, based on their profile. This process obtains the latest information from multiple news sites and open APIs, with emphasis on the reliability and timeliness of the information.

[0642] The collected information is analyzed on a server and summarized using natural language processing technology. During summarization, the most important elements are extracted to match the user's desired video length. In the text summarization stage, the goal is to eliminate redundancy and focus on the essential information the user truly needs.
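Fitting a summary to the user's desired video length can be sketched as extractive summarization under a word budget. The sketch below is an assumption-laden illustration: it uses simple word-frequency scoring instead of the generative AI model mentioned elsewhere in this disclosure, and the narration speed of 150 words per minute and the stopword list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "that", "this", "with", "by"}
)


def _content_words(text: str) -> list[str]:
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def summarize_to_length(text: str, video_seconds: float, words_per_minute: int = 150) -> str:
    """Keep the highest-scoring sentences that fit the narration word budget."""
    budget = int(video_seconds * words_per_minute / 60)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        words = _content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= budget:
            chosen.append(i)
            used += length
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Restoring the original sentence order after selection keeps the narration coherent even though sentences were chosen by score.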

[0643] Next, the server converts the summarized information into a video format. This process integrates text and related images and video clips in a slideshow format, and adds narration. This makes the information easier for the user to receive visually. In addition, each segment of the video uses visual effects to highlight key phrases and important points.

[0644] The generated video is delivered to the device, and the user watches it through the device. The video played on the device is optimized to minimize buffering, providing a smooth viewing experience.

[0645] After watching a video, users can provide feedback on it. This feedback is sent from the device to the server and used to improve the quality of future information delivery. Based on the user's feedback and viewing history, the server runs machine learning algorithms to further refine personalized information delivery for each user.

[0646] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental news and summarize it, focusing on important topics such as CO2 emission reduction. The generated video will combine relevant images and narration to explain the current state of global warming to the user in detail. In this way, the present invention realizes the efficient provision of information that meets user needs in an age of information overload.

[0647] The following describes the processing flow.

[0648] Step 1:

[0649] The user selects genres of interest using their device. The device sends this information to the server, where it is stored in the database as the user's profile data.

[0650] Step 2:

[0651] The server collects relevant news and information from the internet based on the user's profile. It consults multiple reliable sources and selects the most relevant information.

[0652] Step 3:

[0653] The server summarizes the collected information. Using natural language processing techniques, it extracts key points and removes redundant parts. As a result, a summary that fits the video time frame set by the user is obtained.

[0654] Step 4:

[0655] The server generates videos based on summarized information. It incorporates images and video clips related to the text and adds narration using speech synthesis technology. This allows information to be conveyed through both visual and auditory means.

[0656] Step 5:

[0657] The device receives the generated video and provides it to the user. The video is optimized for smooth playback on the device.

[0658] Step 6:

[0659] Users watch videos and provide feedback on the quality of the content. The device sends this feedback to the server, where it is used to improve future content.

[0660] Step 7:

[0661] The server receives user feedback and uses machine learning algorithms to improve the accuracy of the information provided. This optimizes future news selection for the user so that it better matches their interests.
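The selection in step 2, where the server consults reliable sources and keeps the most relevant items, can be sketched as a filter followed by a ranking. The `Article` type, the keyword-count relevance measure, and the notion of a trusted-source set are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical container for one collected item."""
    title: str
    body: str
    source: str


def select_relevant(
    articles: list[Article],
    interests: list[str],
    trusted_sources: set[str],
    top_k: int = 3,
) -> list[Article]:
    """Keep trusted articles that mention an interest, most relevant first."""
    keywords = {k.lower() for k in interests}

    def relevance(article: Article) -> int:
        # Count keyword occurrences in the title and body.
        text = f"{article.title} {article.body}".lower()
        return sum(text.count(k) for k in keywords)

    candidates = [
        a for a in articles if a.source in trusted_sources and relevance(a) > 0
    ]
    return sorted(candidates, key=relevance, reverse=True)[:top_k]
```

A production system would likely replace the keyword count with a learned relevance score, which is where the feedback from step 7 could be applied.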

[0662] (Example 1)

[0663] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0664] In modern society, information is vast and diverse, making it difficult for users to quickly and efficiently obtain the latest information that perfectly matches their interests. Furthermore, providing relevant information in a visually easy-to-understand format and personalizing that information delivery to each user are also challenges. There is also room for improvement in maintaining the speed and quality of information transmission.

[0665] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0666] In this invention, the server includes means for acquiring information on a user-specified area of interest, means for machine-summarizing the acquired data, and means for converting the summarized data into a video format within a specified period. This makes it possible to quickly gather, summarize, and provide relevant information tailored to the user's interests in a visually easy-to-understand format.

[0667] A "user" is the entity that utilizes a service or system, and is the recipient of information.

[0668] "Areas of interest" refers to the areas of information that a user is particularly interested in and wants to learn more about.

[0669] "Information" includes data, news, and knowledge related to the user's areas of interest.

[0670] A "server" is a computer system used for collecting, processing, storing, and providing information.

[0671] "Summarization" is the process of making acquired information concise and extracting its core essence.

[0672] "Video format" refers to media formats such as videos and slideshows that can convey information visually.

[0673] "Device" refers to an electronic device used by a user to receive and utilize information.

[0674] A "learning method" is a means by which a system analyzes user opinions and behaviors to improve the accuracy and suitability of the information it provides.

[0675] "Textual information" refers to information provided in text format.

[0676] "Voice information" refers to information provided in audio format.

[0677] "Visual material" refers to elements of media that are communicated visually, such as images and video clips.

[0678] In order to implement this invention, it is necessary to build a system in which the user, server, and terminal elements work together.

[0679] Users select their areas of interest using a device. The device has a dedicated application or web interface installed, through which users input their areas of interest and send them to the server. This process allows users to easily specify the information they are interested in.

[0680] The server collects relevant information via the internet based on the user's areas of interest. The server utilizes scraping tools implemented in programming languages such as Python, as well as open APIs. The collected information is automatically summarized using a natural language processing generative AI model. During summarization, redundant parts are removed, and the core information is extracted.
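As a minimal sketch of a scraping tool written in Python, the following uses only the standard library `html.parser` module to pull headline text out of an already downloaded page. The assumption that headlines appear in `<h2>` elements is hypothetical and would differ by site; fetching the page and calling an open API are omitted.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text inside <h2> elements (assumed headline markup)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_headline = False
        self.headlines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_headline = True
            self.headlines.append("")  # start a new headline buffer

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            # Append text, including text nested in inline tags such as <em>.
            self.headlines[-1] += data


def extract_headlines(html: str) -> list[str]:
    """Return the non-empty headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines if h.strip()]
```

The extracted headlines, together with article bodies obtained the same way, would form the input to the summarization stage.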

[0681] The server then converts the summarized information into a video format. This video generation uses video editing software such as Adobe Premiere Pro or Final Cut Pro. The text, related images, and video clips are integrated, and voice narration generated using a generative AI model is added to create a complete video.

[0682] The generated video is sent from the server to the user's terminal, where the user views it. The terminal plays the video smoothly using a high-speed data transfer protocol, minimizing buffering and providing a comfortable viewing experience.

[0683] After viewing, users can provide feedback through their device. This feedback is sent to the server and stored in a database. The server uses this feedback to run machine learning algorithms and learn to improve the quality and accuracy of the information provided in the future.

[0684] For example, if a user expresses interest in "environmental issues," the server will collect the latest environmental data and summarize key topics related to global warming. The resulting video will combine relevant visual elements with narration to visually convey the latest environmental issues to the user.

[0685] An example of a prompt message might be, "Please summarize the latest news on environmental issues and provide it in a visually easy-to-understand format."

[0686] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0687] Step 1:

[0688] The user selects their area of interest using a terminal. As input, the user enters their area of interest (e.g., "environmental issues") into the terminal interface. This information is sent from the terminal to the server. The terminal accurately receives the user's selection and provides it to the server as area of interest information. This prepares the server to provide the user with the most relevant information.

[0689] Step 2:

[0690] The server collects relevant information from the internet based on the user's areas of interest. It receives area of interest information as input and uses scraping tools and open APIs to collect data. The output is a collection of various pieces of information that match the user's areas of interest. The server verifies the reliability and relevance of the collected data and constructs the necessary datasets for the next step.

[0691] Step 3:

[0692] The server summarizes the collected information using natural language processing techniques. The input is the data obtained through information gathering. A generative AI model is used to eliminate redundancy and extract key points, generating a concise summary. The output is the summarized text data. Through this summary, the server prepares to provide the user with the essential information they need.

[0693] Step 4:

[0694] The server converts the summarized information into a video format. Using the summarized text as input, it collects relevant images and video clips to create a slideshow. Video editing software is used to add voice narration generated by a generative AI model and integrate the visual elements. The output of this process is visually easy-to-understand video content.

[0695] Step 5:

[0696] The generated video is sent from the server to the terminal. The server takes the video file to be sent as input and transmits it to the terminal using an efficient data transmission protocol. The output is a video file viewable on the terminal. After receiving the video, the terminal displays it in an environment optimized for smooth playback.

[0697] Step 6:

[0698] Users watch videos through their devices and provide feedback. As input, they enter their impressions and evaluations based on the videos they've watched into the device interface. This feedback is sent from the device to the server and stored in a database. The output is user feedback data, which the server uses to personalize and improve the accuracy of future information provision.
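Storing the feedback from step 6 in a database can be sketched with Python's standard `sqlite3` module. The table layout, the 1 to 5 rating scale, and the function names are assumptions; this disclosure does not specify the database schema.

```python
import sqlite3


def open_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a feedback database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               video_id TEXT NOT NULL,
               rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
               comment TEXT
           )"""
    )
    return conn


def store_feedback(conn, user_id: str, video_id: str, rating: int, comment: str = "") -> None:
    """Insert one feedback record; the transaction commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO feedback (user_id, video_id, rating, comment) VALUES (?, ?, ?, ?)",
            (user_id, video_id, rating, comment),
        )


def average_rating(conn, video_id: str):
    """Return the mean rating for a video, or None if it has no feedback."""
    row = conn.execute(
        "SELECT AVG(rating) FROM feedback WHERE video_id = ?", (video_id,)
    ).fetchone()
    return row[0]
```

Aggregates such as the average rating per video are one simple signal the server could feed into later personalization.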

[0699] (Application Example 1)

[0700] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0701] In modern society, information overload makes it difficult for users to efficiently obtain the information they need. Furthermore, in the field of smart cities, there is a lack of means to quickly and easily access the latest information. In addition, personalized information tailored to the diverse interests of each user is insufficient, highlighting the need for improved user experience.

[0702] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0703] In this invention, the server includes means for receiving data related to user-specified areas of interest, means for collecting relevant information from a digital network, means for automatically summarizing the collected information, and means for integrating information from different categories and optimizing the displayed content based on the user's selection. This allows the user to receive the latest information based on their interests in a convenient, visualized format, enabling them to acquire information more accurately and quickly.

[0704] "Data related to user-specified areas of interest" refers to a collection of information related to specific areas or themes that individual users are interested in.

[0705] "Means of collecting relevant information from digital networks" refers to devices or software that have the function of finding and collecting data related to a specified area of interest through electronic data communication networks such as the internet.

[0706] A "means for automatically summarizing collected information" is a system that analyzes collected data, extracts important elements from it, and transforms them into a short, concise form.

[0707] "A means of aggregating information from different categories and optimizing displayed content based on user selection" refers to a technology that organizes and adjusts diverse information according to each individual's interests and purposes, and provides it in the most visually appealing and easy-to-understand format.

[0708] "Visualized content" refers to a form of information transmission that visually represents information that exists as text or data using images, videos, graphs, etc.
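The means for integrating information from different categories and optimizing the displayed content based on the user's selection can be sketched as grouping summaries by category and ordering the groups by the user's selection. The function name and the (category, summary) pair representation are assumptions for illustration.

```python
from collections import defaultdict


def arrange_by_selection(
    summaries: list[tuple[str, str]],
    selected: list[str],
) -> list[tuple[str, list[str]]]:
    """Group (category, summary) pairs and keep only selected categories, in order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, text in summaries:
        grouped[category].append(text)
    # Preserve the user's selection order; skip categories with no content.
    return [(category, grouped[category]) for category in selected if category in grouped]
```

Categories the user did not select are dropped, and selected categories with no collected content are skipped rather than shown empty.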

[0709] The system for implementing this invention primarily consists of a user terminal, a server, and digital communication between them. The user specifies their areas of interest using an information processing device such as a smartphone, tablet, or computer. This information is then transmitted to the server via the internet. Based on the received information, the server collects relevant data from the digital network. This process utilizes news APIs and web scraping techniques.

[0710] The server summarizes the collected data using natural language processing techniques. This process utilizes libraries such as Python's NLTK to eliminate redundancy and extract key elements. Furthermore, this summarized information is visualized using video editing libraries such as OpenCV, combining text, audio, and visual elements for presentation to the user. The server delivers the generated content to the user's device, enabling a visually engaging experience where the user can select and enjoy the most relevant information from a large amount of information.

[0711] For example, if a user expresses interest in "sustainable energy," the server will collect, summarize, and provide a visualized analysis of relevant cutting-edge technologies and projects. During this process, the user can receive this information via their device while commuting or taking a break. The following are examples of prompts for the generative AI model:

[0712] "Please provide a summary of the latest energy technologies in sustainable smart cities. Specifically, include trends in energy efficiency, renewable energy projects, and urban planning."
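A prompt of this form could be assembled from the user's area of interest and a list of aspects to cover. The sketch below is only an illustration; the function names and the fixed sentence template are assumptions.

```python
def join_with_and(items: list[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    if not items:
        raise ValueError("at least one item is required")
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def build_summary_prompt(topic: str, aspects: list[str]) -> str:
    """Build a summary request for a generative AI model (hypothetical template)."""
    return (
        f"Please provide a summary of {topic}. "
        f"Specifically, include {join_with_and(aspects)}."
    )
```

Calling `build_summary_prompt("the latest energy technologies in sustainable smart cities", ["trends in energy efficiency", "renewable energy projects", "urban planning"])` reproduces the example prompt above.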

[0713] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0714] Step 1:

[0715] The user selects areas of interest on their device. The user chooses categories of interest on their smartphone or computer screen. This action generates data on the selected areas of interest, which is then sent to the server as input data.

[0716] Step 2:

[0717] The server collects relevant information from the digital network. Based on the user's areas of interest data, it uses news APIs and web scraping techniques to retrieve highly relevant information from the internet. The main input used in this process is the user's areas of interest, and the output is the retrieved raw information data.

[0718] Step 3:

[0719] The server summarizes the collected information. Natural language processing is performed on the acquired data using Python's NLTK library, extracting important elements and shortening the information. The input is the data acquired in step 2, and the output is the summarized text data.

[0720] Step 4:

[0721] The server converts the summarized information into a video format with audio. Using video editing libraries such as OpenCV, it visualizes the summarized text data by combining audio, video clips, and images. The output of this process is video data with audio as visual content.

[0722] Step 5:

[0723] The server generates video data and provides it to the user's device. The generated video content is delivered to the user's smartphone or computer via the internet and becomes playable on the device. The input is video data with audio, and the output is content that the user can view.

[0724] Step 6:

[0725] Users view content and provide feedback. Users use a feedback function to provide ratings and opinions on the content they view on their devices, and this information is sent to the server. The input is user feedback data, and the output is data that will contribute to improving the accuracy of future information.

[0726] Step 7:

[0727] The server improves the accuracy of information provided based on feedback. The collected feedback data is analyzed using machine learning algorithms, and this analysis is reflected in future information provision, thereby improving the personalization of information for users. This step generates output data related to information improvements.

[0728] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0729] The embodiments for carrying out this invention will be described in detail. The present invention provides a system that acquires information based on user-specified areas of interest, further recognizes the user's emotions using an emotion engine, and adjusts content delivery accordingly.

[0730] First, the user selects genres of interest through their device. This information is sent to the server and recorded in the database as the user's profile. Based on this profile, the server then collects relevant news and information from the internet.

[0731] The collected information is summarized on the server, and key points are extracted using natural language processing technology. A video is then generated based on this summary, and delivered to the device as optimized content to maintain user interest. The video integrates relevant images and video clips, and the text is narrated using speech synthesis.

[0732] In addition, the system of the present invention is equipped with an emotion engine, which can recognize the user's emotional state in real time. The emotion engine analyzes the user's facial expressions and tone of voice through the camera and microphone to understand the user's emotional state. For example, by sensing changes in facial expressions and voice when the user finds something interesting, the appeal of the content can be evaluated in real time.

[0733] Furthermore, the server uses this emotional data to adjust the tone and pace of the content it delivers according to the user's emotional state. For example, if it determines that the user is excited, it can add more dynamic visuals. The user's emotional data is also collected as feedback and used to improve the personalization algorithm. This makes future information delivery even more accurate and improves user satisfaction.
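
The adjustment of tone and pace can be sketched as a lookup from an estimated emotion to presentation parameters. Only the "excited leads to more dynamic visuals" case comes from the description above; the other table entry, the numeric rates, the confidence threshold, and all names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    narration_rate: float  # 1.0 = normal speaking speed
    visual_style: str      # e.g. "standard", "dynamic", "calm"


DEFAULT = Presentation(1.0, "standard")

# Hypothetical table; only the "excited" -> "dynamic" pairing follows the text.
ADJUSTMENTS = {
    "excited": Presentation(1.15, "dynamic"),
    "confused": Presentation(0.85, "calm"),
}


def adjust_presentation(emotion, confidence, threshold=0.6):
    """Pick presentation parameters, ignoring low-confidence estimates."""
    if confidence < threshold:
        return DEFAULT
    return ADJUSTMENTS.get(emotion, DEFAULT)
```

Falling back to the default when the estimate is uncertain avoids changing the content on every noisy reading from the camera or microphone.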

[0734] For example, if a user is interested in "technology" and is looking for the latest relevant technical information, the server will collect the latest news on AI technology, summarize it, and provide it to the user's device in video format. If the emotion engine detects the user smiling while they are watching, the server can use that information to incorporate similar themes and tones into the next content, thereby increasing user engagement.

[0735] Thus, the present invention realizes efficient and personalized information delivery that responds to the user's interests and emotions, and effectively solves the problems faced by conventional systems.

[0736] The following describes the processing flow.

[0737] Step 1:

[0738] The user uses their device to select areas of interest. The device sends this information to the server, updating the user's profile.

[0739] Step 2:

[0740] The server collects relevant information from reliable news sites and databases based on the user's areas of interest. It retrieves information from multiple data sources and filters out duplicate and irrelevant information.
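
The filtering of duplicate and irrelevant information could look like the following sketch. It assumes each collected item is a dict with `title` and `body` keys, treats items with the same normalized title as duplicates, and treats an item as relevant when it mentions at least one interest keyword; these rules and the function names are assumptions, not details from the disclosure.

```python
import re


def normalize(text):
    """Lowercase and collapse every non-alphanumeric run into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def filter_articles(articles, interests):
    """Drop duplicate titles and articles mentioning no interest keyword."""
    keywords = [normalize(k) for k in interests]
    seen = set()
    kept = []
    for article in articles:
        key = normalize(article["title"])
        # Pad with spaces so keywords match whole words only.
        text = " " + normalize(article["title"] + " " + article["body"]) + " "
        relevant = any(f" {k} " in text for k in keywords)
        if key in seen or not relevant:
            continue
        seen.add(key)
        kept.append(article)
    return kept
```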

[0741] Step 3:

[0742] The server analyzes the collected information and extracts key points using natural language processing techniques. The extracted information is then summarized to match the specified video length.
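
Matching the summary to the specified video length can be approximated by converting the duration into a narration word budget. The sketch below assumes a speaking rate of 150 words per minute and that sentences arrive already ordered by importance; both are illustrative assumptions, as are the function names.

```python
def word_budget(video_seconds, words_per_minute=150):
    """Number of narrated words that fit in the given duration."""
    return int(video_seconds * words_per_minute / 60)


def fit_to_length(sentences, video_seconds, words_per_minute=150):
    """Keep sentences in order until the narration word budget is used up."""
    budget = word_budget(video_seconds, words_per_minute)
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > budget:
            break
        kept.append(sentence)
        used += n
    return kept
```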

[0743] Step 4:

[0744] The server uses a video generation engine based on the summarized information to create a video that includes visual elements and audio narration. Relevant images and video clips are inserted into the video, and the narration text is subjected to speech synthesis processing.

[0745] Step 5:

[0746] The device receives the generated video and provides it to the user. The device supports video streaming playback, enabling a high-quality viewing experience.

[0747] Step 6:

[0748] While the user watches a video, an emotion engine monitors their emotional state through the camera and microphone. Emotional data is extracted from the user's facial expressions, tone of voice, and other factors.

[0749] Step 7:

[0750] The server analyzes the emotional data obtained from the emotion engine and adjusts the content accordingly. If the user is surprised or excited, the server immediately switches to more dynamic content.

[0751] Step 8:

[0752] After the user finishes watching, a feedback screen is displayed, giving them the opportunity to input their satisfaction level and suggestions for improvement. The device then sends this feedback to the server.

[0753] Step 9:

[0754] The server updates its machine learning algorithms based on the feedback and emotion data to improve the accuracy of the content it delivers next time. This further personalizes the user experience.

[0755] (Example 2)

[0756] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0757] In today's world, the sheer volume of information makes it difficult for users to efficiently access content that interests them. Furthermore, there is a lack of content tailored to users' emotions and preferences, resulting in a decline in the quality of the user experience.

[0758] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0759] In this invention, the server includes means for acquiring information from a communication network based on the user's areas of interest, means for summarizing the acquired information using natural language processing technology, means for converting the summarized information into audio and visual formats, and means for recognizing the user's emotional state and adjusting the content accordingly. This enables the efficient and personalized delivery of content that responds to the user's interests and emotions.

[0760] "User-specified areas of interest" refers to the categories or themes of information that users have selected based on their own interests and preferences.

[0761] "Acquiring relevant information from a communication network" refers to the process of collecting data related to user interests via networks such as the internet.

[0762] "Summarizing using natural language processing techniques" refers to using machine learning and artificial intelligence technologies to extract important information and points from text data and summarize them concisely.

[0763] "Converting to audio and visual formats" refers to processing summarized information into viewable content using speech synthesis and video editing technologies.

[0764] "Providing to user devices" means sending the generated content to the user's terminal and providing it in a viewable state.

[0765] "Recognizing the user's emotional state and adjusting content accordingly" refers to analyzing user facial expressions and voice data collected through sensors such as cameras and microphones, and then changing the content and presentation in real time based on the results.

[0766] "Algorithms for improving personalization performance" refer to computational methods that analyze user feedback and emotion data to provide content optimized for individual users.

[0767] This invention is a system for providing personalized information based on a user's areas of interest and emotional state. The user uses a terminal to select categories of interest, and this information is sent to a server. The server collects relevant information via a communication network. The collected data is summarized using natural language processing techniques. Examples of techniques used in this process include machine learning models and generative AI models, including advanced technologies such as "BERT" and "GPT."

[0768] The summary information is converted into audio and visual content on the server. This conversion uses video editing software, speech synthesis technology, and a generative AI model, and the resulting content is delivered to the device in a viewable format. Relevant images and video clips are also integrated so that the content maintains user interest.

[0769] Furthermore, this system can recognize the user's emotional state in real time. It captures the user's facial expressions and voice tone through the camera and microphone on the device, and an emotion recognition engine on the server analyzes this data. Based on this analysis, the server adjusts the tone and pace of the content to improve the user experience.

[0770] For example, if a user is interested in "technology" and is looking for the latest information, the server collects data on the latest technological innovations, summarizes it, and delivers it to the device in video format. In this process, the emotion engine analyzes whether the user is enjoying it and reflects that data in the next content displayed.

[0771] An example of a prompt for a generative AI model is, "A method for summarizing topics related to the latest AI technologies and adjusting the content to suit the user's preferences based on sentiment analysis." This configuration is a feature of the present invention that enhances the user experience.
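
A prompt of this kind could be assembled from the user's interest categories and the latest emotion estimate, as in the sketch below. The wording of the prompt, the `max_seconds` parameter, and the function name `build_prompt` are hypothetical; the call to the generative AI model itself is omitted because the disclosure does not specify an interface.

```python
def build_prompt(topics, emotion=None, max_seconds=60):
    """Compose a plain-text instruction for a generative AI model."""
    lines = [
        "Summarize the latest news on the following topics: "
        + ", ".join(topics) + ".",
        f"The summary will be narrated in a video of at most {max_seconds} seconds.",
    ]
    if emotion:
        # Emotion-aware instruction is added only when an estimate is available.
        lines.append(
            f"The viewer currently appears {emotion}; adjust tone and pace accordingly."
        )
    return "\n".join(lines)
```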

[0772] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0773] Step 1:

[0774] The user selects areas of interest through their device. The user's chosen interest categories are sent to the server as input. This could involve the user selecting from checkboxes or dropdown menus on the interface. The output is the user's interest data received by the server.

[0775] Step 2:

[0776] The server retrieves relevant information from the network based on the received interest data. The input is the user's interest data, and based on this, the server creates requests and accesses online information sources. Specifically, this includes actions such as the server searching news feeds and databases. The output is a collection of the retrieved relevant information.
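
Creating requests from the interest data might be sketched as follows. The endpoint `https://news.example.com/search` and the query parameters `q` and `limit` are placeholders invented for this example; a real news feed or database would define its own interface.

```python
from urllib.parse import urlencode


def build_search_urls(interests, base_url="https://news.example.com/search",
                      per_topic=5):
    """Build one search URL per interest category (hypothetical endpoint)."""
    return [
        f"{base_url}?{urlencode({'q': topic, 'limit': per_topic})}"
        for topic in interests
    ]
```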

[0777] Step 3:

[0778] The server summarizes the acquired information using natural language processing techniques. The input is the information collected in step 2. Specifically, a generative AI model is used to extract key points from the information and create a summary. The output is the summarized text data.

[0779] Step 4:

[0780] The server converts the summary data into audio and visual formats. The input is the summary text created in step 3, and content is created using speech synthesis and video editing software based on this text. Specific operations include inputting the summary text into the speech synthesis engine and selecting video footage. The output is viewable audio and video data.
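
Before the summary text is passed to a speech synthesis engine and paired with footage, it can be useful to lay it out on a timeline. The sketch below assigns each sentence a start and end time from its word count, assuming a constant speaking rate of 150 words per minute; the rate and the function name `build_timeline` are assumptions for illustration.

```python
def build_timeline(sentences, words_per_minute=150):
    """Assign each sentence a start/end time (seconds) from its word count."""
    seconds_per_word = 60 / words_per_minute
    timeline, clock = [], 0.0
    for sentence in sentences:
        duration = len(sentence.split()) * seconds_per_word
        timeline.append({
            "text": sentence,
            "start": round(clock, 2),
            "end": round(clock + duration, 2),
        })
        clock += duration
    return timeline
```

Each timeline entry could then be matched with an image or video clip covering the same interval.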

[0781] Step 5:

[0782] The device captures the user's emotions using its camera and microphone. The input is the user's facial expressions and voice tone in real time. Specifically, this involves the device's sensors recording these and sending the data to a server. The output is the user's emotion data.

[0783] Step 6:

[0784] The server analyzes emotional data and adjusts the tone and pace of the content. The input is the emotional data obtained in step 5. Based on the data, the server optimizes the presentation of the content. Specifically, this could involve adding additional effects to the content or changing the speed of the narration. The output is the adjusted content.

[0785] (Application Example 2)

[0786] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0787] Conventional information delivery systems have difficulty providing content flexibly based on users' interests and emotions, and have failed to increase user engagement. Furthermore, they lacked mechanisms for continuously improving the accuracy of information. Therefore, there is a need for a system that provides highly relevant information tailored to user interests, enables real-time content adjustment based on emotions, and further improves accuracy over time.

[0788] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0789] In this invention, the server includes means for acquiring information related to user-specified interests, means for automatically summarizing the acquired information based on importance, means for converting the summarized content into visual and auditory display formats, means for analyzing the user's emotional state and dynamically adjusting the display content based on the analysis results, and means for collecting user response data and applying a learning model to improve the system's personalization performance. This enables the provision of personalized content based on the user's interests and emotions.

[0790] A "user" is an individual or group that uses a system to obtain information and interact with it.

[0791] "Interest" refers to a user's concern or preference for a particular field or topic.

[0792] "Information" is a general term for content on the internet, including data, news, articles, videos, and audio.

[0793] "To acquire" refers to the act of gathering or collecting necessary information.

[0794] "Importance" is a criterion for evaluating the value and priority of information.

[0795] "Summarizing" refers to the process of extracting the key points from information and putting them into a concise summary.

[0796] "Visual and auditory display formats" refer to methods of conveying information to users visually and aurally by combining images and sounds.

[0797] "Emotional state" refers to psychological or emotional changes that can be interpreted from a user's facial expressions and tone of voice.

[0798] "To analyze" means to examine data and find meaning or patterns in it.

[0799] "Dynamic adjustment" refers to changing and adapting the content and structure of information in real time.

[0800] "Response data" refers to data obtained from user reactions to the system, such as their actions and facial expressions.

[0801] "Personalization capability" refers to the ability of a system to provide services that meet the different needs and preferences of each user.

[0802] A "learning model" is a mathematical framework that uses algorithms to learn patterns based on past data and predict future behaviors and patterns.

[0803] The system for implementing this invention is designed to provide personalized content based on the user's interests and emotions. The server receives data on areas of interest transmitted from the user via a terminal. This data is stored in a database and used as a profile. Next, the server automatically collects information related to these areas of interest from the internet. The collected information is summarized using natural language processing techniques based on its importance.

[0804] The summarized information is then converted into visual and auditory display formats. Here, speech synthesis technology is used to create narration. Video generation software is used to create the video, integrating appropriate visual elements. For example, if the user is interested in "travel," travel guides and the latest tourist spot information are provided as videos.

[0805] Furthermore, the server analyzes the user's emotional state in real time using the camera and microphone built into the device. Here, a machine learning model is used to determine the user's emotions from their facial expressions and tone of voice. Based on this emotion analysis, the server dynamically adjusts the speed and style of the content to optimize the user experience. Specifically, if a smile is detected in the user, similar themes and tones will be emphasized in the next content.

[0806] In addition, user response data is collected by the server. This data is reflected in a learning model and used to improve the accuracy of content delivery. The more repeatedly a user accesses the content, the more accurate the personalization becomes.

[0807] (Example of a prompt message)

[0808] Analyze the user's emotions from their facial expressions and voice, and suggest travel-related content that will be more engaging for them. The target user is a Japanese woman in her 30s, so emphasize plans that appear appealing.

[0809] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0810] Step 1:

[0811] The server receives data on areas of interest that the user has submitted through their device. This data is stored in a database as input, forming a user profile. This profile serves as the basis for subsequent information gathering and content delivery.
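
Storing the received interest data as a user profile can be sketched with Python's built-in `sqlite3` module. The table name `profiles`, the JSON encoding of the interest list, and the function names are assumptions for this example; the disclosure does not name a database or schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the profile database; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "user_id TEXT PRIMARY KEY, interests TEXT NOT NULL)"
    )
    return conn


def save_interests(conn, user_id, interests):
    """Insert or overwrite the user's interests (deduplicated and sorted)."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (user_id, interests) VALUES (?, ?)",
        (user_id, json.dumps(sorted(set(interests)))),
    )
    conn.commit()


def load_interests(conn, user_id):
    """Return the stored interest list, or an empty list for unknown users."""
    row = conn.execute(
        "SELECT interests FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```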

[0812] Step 2:

[0813] The server collects information related to the user's areas of interest from the internet based on the user profile. This collection step obtains data through web crawling and news APIs. The input is information about the user's areas of interest, and the output is a collection of related information.

[0814] Step 3:

[0815] The server summarizes the collected information using natural language processing (NLP) techniques. The input is the collection of information obtained in step 2, and importance analysis is performed to generate summary information as output. This summary forms the basis for content generation.

[0816] Step 4:

[0817] The server generates visual and auditory display formats based on the summary information. This process utilizes video generation software and speech synthesis technology. The input is summary information, which is converted into video clips and audio narration, and the output is multimedia content.

[0818] Step 5:

[0819] The device uses its camera and microphone to capture the user's emotional state in real time. The captured data is sent to the server as input. The server uses a machine learning model to analyze emotions from facial expressions and voice tone, and outputs the emotion analysis results.

[0820] Step 6:

[0821] The server uses the emotion analysis results to dynamically adjust the speed and style of the content. The input is the emotion analysis results, and the content parameters are modified to enhance the user experience. The output is content that is appropriate to the user's state.

[0822] Step 7:

[0823] The server collects user response data and updates the learning model to improve the system's personalization capabilities. The input is user feedback, and this data is fed into the machine learning algorithm, resulting in improved adaptive performance as output.
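
One simple way to update a learning model from response data is a single stochastic gradient step of logistic regression per viewing. This is a stand-in chosen for illustration, not the model described in the disclosure; the feature names, the binary `liked` label, and the learning rate are all assumptions.

```python
import math


def predict(weights, features):
    """Estimated probability that the user responds positively."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def sgd_update(weights, features, liked, learning_rate=0.5):
    """One logistic-regression gradient step; `liked` is 0 or 1.

    Returns a new weight dict; the input dict is left unchanged.
    """
    error = liked - predict(weights, features)
    updated = dict(weights)
    for name, value in features.items():
        updated[name] = updated.get(name, 0.0) + learning_rate * error * value
    return updated
```

Repeating `sgd_update` after each viewing gradually shifts the predicted response toward what the user actually liked, which matches the idea that personalization improves with repeated use.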

[0824] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0825] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0826] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0827] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0828] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0829] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0830] The inside of the emotion map 400 represents what is in the mind, while the outside represents actions. Therefore, the further out an emotion lies on the emotion map 400, the more visible it becomes (the more it manifests in actions).

[0831] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
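
The layout of the emotion map described above can be made concrete with a small sketch that classifies a point on the map. It assumes the map is a unit disc with the center at (0, 0), x increasing to the right and y increasing upward; the `inner_radius` boundary between mental states and actions is a hypothetical value, since the disclosure gives no numeric coordinates.

```python
import math


def describe_position(x, y, inner_radius=0.5):
    """Classify a point on a unit-disc emotion map centered at (0, 0)."""
    radius = math.hypot(x, y)
    return {
        # Upper side: pleasure; lower side: displeasure.
        "valence": "pleasure" if y > 0 else "displeasure" if y < 0 else "neutral",
        # Left half: "response" region; right half: "situation" region.
        "region": "situation" if x > 0 else "response" if x < 0 else "both",
        # Inner rings: mental states; outer rings: states that show in actions.
        "layer": "mental state" if radius <= inner_radius else "action",
    }
```

Under these assumptions, a point at the 3 o'clock position such as `(0.9, 0.0)` falls in the "situation" region on the outer, action-related part of the map.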

[0832] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0833] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
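
The two ideas in this paragraph, determining the emotion from emotion values and encouraging nearby emotions to have similar values, can be sketched as follows. Choosing the emotion with the largest value and using a squared-difference penalty over neighboring pairs are assumptions for illustration; the disclosure does not state the decision rule or the training objective.

```python
def dominant_emotion(values):
    """Return the emotion label with the largest emotion value."""
    return max(values, key=values.get)


def neighbor_penalty(values, neighbor_pairs):
    """Sum of squared differences between values of neighboring emotions.

    Adding this term to a training loss would push emotions that sit close
    together on the map toward similar values (assumed formulation).
    """
    return sum((values[a] - values[b]) ** 2 for a, b in neighbor_pairs)
```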

[0834] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0835] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0836] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0837] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0838] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0839] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0840] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0841] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0842] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0843] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0844] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0845] The following is further disclosed regarding the embodiments described above.

[0846] (Claim 1)

[0847] A means of receiving data related to the user's specified areas of interest,

[0848] A means of collecting relevant information from the internet based on the areas of interest received,

[0849] A means of automatically summarizing the collected information,

[0850] A means of converting summarized information into a video format within a specified time,

[0851] A means of providing the generated video to the user's terminal,

[0852] A means of collecting user feedback and implementing a learning algorithm to improve the system's personalization performance,

[0853] A system comprising the above means.

[0854] (Claim 2)

[0855] The system according to claim 1, comprising a machine learning model for improving the accuracy of information provided in subsequent instances based on user feedback.

[0856] (Claim 3)

[0857] The system according to claim 1, comprising means for displaying a combination of text information, audio information, and visual materials in a generated video.

[0858] "Example 1"

[0859] (Claim 1)

[0860] A means of obtaining information about the user's specified area of interest,

[0861] A means of obtaining relevant data from the global information network based on the acquired area of interest,

[0862] A means of summarizing the acquired data using a machine,

[0863] A means of converting summarized data into a video format within a specified period,

[0864] Means for providing the generated video to the user device,

[0865] A means of collecting user feedback and implementing learning methods to improve the system's personalization capabilities,

[0866] A means for integrating and displaying text information, audio information, and visual materials in the generated video,

[0867] A system comprising the means recited above.

[0868] (Claim 2)

[0869] The system according to claim 1, comprising a computer learning model for improving the accuracy of information provided in subsequent instances based on user feedback.

[0870] (Claim 3)

[0871] The system according to claim 1, comprising means for transmitting the generated video at high speed and minimizing buffer time.
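
One common way to keep buffer time short is to deliver the video in small segments so that playback can begin once the first segment arrives. The sketch below illustrates that idea only; the text does not state which transmission technique is used, and the function names, segment sizes, and bandwidth figure are assumptions.

```python
def iter_segments(data: bytes, segment_size: int):
    """Yield consecutive fixed-size segments of an encoded video payload."""
    for start in range(0, len(data), segment_size):
        yield data[start:start + segment_size]


def startup_delay_seconds(first_segment_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time until the first segment has fully arrived and playback can start."""
    return first_segment_bytes / bandwidth_bytes_per_s


segments = list(iter_segments(b"abcdefgh", 3))
# Smaller first segments reduce the wait before playback starts.
delay_large = startup_delay_seconds(1_000_000, 500_000)
delay_small = startup_delay_seconds(250_000, 500_000)
```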

[0872] "Application Example 1"

[0873] (Claim 1)

[0874] A means of receiving data related to the user's specified areas of interest,

[0875] A means of collecting relevant information from digital networks based on the areas of interest received,

[0876] A means of automatically summarizing the collected information,

[0877] A means of converting summarized information into a video format with audio within a specified time,

[0878] A means for providing the generated video and audio content to the user's information processing device,

[0879] A means of collecting user feedback and performing machine learning processing to improve the system's adaptive performance,

[0880] A means of aggregating information from different categories and optimizing the displayed content based on user selection,

[0881] A system comprising the means recited above.

[0882] (Claim 2)

[0883] The system according to claim 1, comprising a machine learning module for improving the relevance of information provided in subsequent instances based on user feedback.

[0884] (Claim 3)

[0885] The system according to claim 1, comprising means for displaying a combination of text data, audio data, and visual materials in the generated video.
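
The category aggregation and selection-based display optimization in this claim set could be sketched as follows. This is an illustration under assumptions, not the disclosed method: summaries are grouped by a category label, and categories are ordered by how often the user has selected them. The function names and the tie-breaking rule (alphabetical) are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_category(items):
    """Group (category, summary) pairs into a category -> summaries mapping."""
    grouped = defaultdict(list)
    for category, summary in items:
        grouped[category].append(summary)
    return dict(grouped)


def order_categories(grouped, selections):
    """Order categories by past user selections, most selected first."""
    counts = Counter(selections)
    # Ties (including never-selected categories) fall back to alphabetical order.
    return sorted(grouped, key=lambda c: (-counts[c], c))


items = [("tech", "Chip news."), ("sports", "Match result."), ("tech", "AI update.")]
grouped = group_by_category(items)
display_order = order_categories(grouped, ["sports", "sports", "tech"])
```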

[0886] "Example 2: Combination with an emotion engine"

[0887] (Claim 1)

[0888] A means of receiving information about areas of interest specified by the user,

[0889] A means of obtaining relevant information from a communication network based on the received area of interest,

[0890] A means of summarizing acquired information using natural language processing technology,

[0891] Means for converting summarized information into audio and visual formats within a specified time,

[0892] Means for providing generated audio and visual content to user devices,

[0893] A means of recognizing the user's emotional state and adjusting the tone and pace of the content provided based on that data,

[0894] A means of collecting user response data and implementing algorithms to improve the personalization performance of the system,

[0895] A system comprising the means recited above.

[0896] (Claim 2)

[0897] The system according to claim 1, comprising an analytical model for improving the accuracy of information provided in subsequent instances based on the user's emotional state.

[0898] (Claim 3)

[0899] The system according to claim 1, comprising means for integrating and providing audio information, text information, and visual materials in the generated viewing content.
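
The adjustment of tone and pace according to the recognized emotional state could look like the sketch below. It assumes that an upstream recognizer returns an emotion label with a confidence score. The labels, presets, speech rates, and confidence threshold are illustrative assumptions and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    tone: str
    words_per_minute: int


DEFAULT = Delivery("neutral", 150)

# Hypothetical presets: slow down for a stressed user, speed up for a bored one.
PRESETS = {
    "stressed": Delivery("calm", 120),
    "bored": Delivery("energetic", 170),
    "happy": Delivery("upbeat", 160),
}


def choose_delivery(emotion: str, confidence: float, threshold: float = 0.6) -> Delivery:
    """Pick tone and pace; fall back to the default when recognition is uncertain."""
    if confidence < threshold:
        return DEFAULT
    return PRESETS.get(emotion, DEFAULT)


def narration_seconds(text: str, delivery: Delivery) -> float:
    """Estimated narration time for the text at the chosen pace."""
    return len(text.split()) * 60 / delivery.words_per_minute


calm = choose_delivery("stressed", confidence=0.9)
duration = narration_seconds("one two three four five", calm)
```

Because a slower pace lengthens the narration, the estimated duration can be checked against the specified time before the audio and visual content is generated.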

[0900] "Application Example 2: Combination with an emotion engine"

[0901] (Claim 1)

[0902] A means of obtaining information related to the user's specified interests,

[0903] A means of automatically summarizing acquired information based on its importance,

[0904] Means for converting summarized content into visual and auditory display formats,

[0905] A means of analyzing the user's emotional state and dynamically adjusting the displayed content based on the analysis results,

[0906] Means for providing generated visual and auditory information to terminal devices,

[0907] A means for collecting user response data and applying a learning model to improve the system's personalization performance,

[0908] A system comprising the means recited above.

[0909] (Claim 2)

[0910] The system according to claim 1, comprising a computational model for enhancing the accuracy of information subsequently provided based on user response data.

[0911] (Claim 3)

[0912] The system according to claim 1, comprising means for integrating and displaying linguistic information, audio output, and video material with generated visual information.

[Explanation of symbols]

[0913]
10, 210, 310, 410: Data processing systems
12: Data processing devices
14: Smart devices
214: Smart glasses
314: Headset-type terminal
414: Robots

Claims

1. A system comprising: a means of receiving data related to the user's specified areas of interest; a means of collecting relevant information from digital networks based on the received areas of interest; a means of automatically summarizing the collected information; a means of converting the summarized information into a video format with audio within a specified time; a means for providing the generated video and audio content to the user's information processing device; a means of collecting user feedback and performing machine learning processing to improve the system's adaptive performance; and a means of aggregating information from different categories and optimizing the displayed content based on user selection.

2. The system according to claim 1, comprising a machine learning module for improving the relevance of information provided in subsequent instances based on user feedback.

3. The system according to claim 1, comprising means for displaying a combination of text data, audio data, and visual materials in the generated video.