system
The system addresses the challenge of efficiently accessing and purchasing books by providing audio summaries and personalized recommendations based on user input and emotional analysis, enhancing information acquisition and purchase efficiency.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SOFTBANK GROUP CORP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
AI Technical Summary
Users struggle to access and understand information resources, particularly books, quickly and efficiently. Time is limited, some users have visual constraints, and existing means for acquiring and selecting information are insufficient.
A system that receives user input, identifies relevant publications using a generative model, provides audio summaries, and allows direct purchase through online sales platforms, incorporating emotion analysis for personalized recommendations.
Enables efficient information acquisition and purchase of relevant publications, tailored to users' emotional states, facilitating quick and informed decisions.
Abstract
Description
Technical Field
[0001] The technology of the present disclosure relates to a system.
Background Art
[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance as a response to the user utterance.
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] In modern society, users can access many information sources but have limited time to deepen their understanding of specific publications. As a result, many people have difficulty choosing books and cannot obtain sufficient information before purchasing them. In addition, users with visual or time constraints lack means to efficiently access and understand books.
Means for Solving the Problems
[0005] This invention employs means for receiving search information from users and identifying relevant publications based on that information, and means for using a generative model to generate summaries of the identified publications. Furthermore, by providing the generated summaries as audio data, it enables efficient information acquisition, including for users with visual or time constraints. In addition, by providing a mechanism that allows users to directly purchase publications through summaries and share evaluation information through integration with online sales platforms, it enables users to quickly select and understand appropriate publications.
[0006] A "user" is an individual or organization that uses an online service to search for information on publications and receive summaries.
[0007] "Target information" refers to information necessary to identify a publication, such as the book's title, author, and genre, which is entered by the user.
[0008] A "publication" is a publicly available written or digital information medium, such as a book, magazine, or ebook.
[0009] "Identifying information" means selecting relevant publications from a database based on the target information received from the user.
[0010] A "generative model" is an algorithm or system that uses natural language processing techniques to automatically generate summaries of publications.
[0011] "Audio data" refers to digital data obtained by converting a generated summary into an audio format.
[0012] An "online sales platform" is an e-commerce service that allows users to purchase books via the internet.
[0013] A "link" is a hypertext link or URL that allows a user to access an online sales platform from the summary and purchase the publication.
[0014] "Evaluation information" refers to feedback that includes users' impressions and opinions on publications, and is information that is shared with other users.
Brief Description of the Drawings
[0015]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map where multiple emotions are mapped.
[Figure 10] This shows an emotion map where multiple emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Example 2, when an emotion engine is combined.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2, when an emotion engine is combined.
Embodiments for Carrying Out the Invention
[0016] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.
[0017] First, the terms used in the following description will be explained.
[0018] In the following embodiments, a numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.
[0019] In the following embodiments, a numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.
[0020] In the following embodiments, a numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, etc.
[0021] In the following embodiments, a numbered communication interface (I/F) is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark).
[0022] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when expressing three or more things linked by "and/or."
[0023] [First Embodiment]
[0024] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.
[0025] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.
[0026] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0027] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.
[0028] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.
[0029] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.
[0030] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.
[0031] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.
[0032] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.
[0033] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0034] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0035] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0036] The embodiments for carrying out the present invention are described in detail below.
[0037] This system uses a terminal equipped with an interface for users to input target information such as book titles, author names, and genres. The terminal receives the user's input and sends a search request to the server based on it. The server analyzes the received information and identifies relevant publications from its database.
[0038] The server utilizes a generative model employing natural language processing techniques to automatically generate summaries of identified publications. These summaries are also provided as audio data, making them accessible to users both visually and aurally. The terminal then plays this audio data, enabling users to receive the necessary information auditorily.
[0039] Furthermore, the server provides links to relevant online sales platforms based on the generated summaries, allowing users to purchase publications directly. These links enable users to easily proceed with the purchase process after reviewing the summaries.
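A minimal sketch of how a purchase link could be attached to a generated summary. The storefront URL `https://bookstore.example/buy`, the `isbn` and `ref` query parameters, and the function name `build_purchase_link` are hypothetical illustrations; the disclosure does not specify a link format or a particular sales platform.

```python
from urllib.parse import urlencode

# Hypothetical storefront endpoint; the disclosure names no specific platform.
BASE_URL = "https://bookstore.example/buy"


def build_purchase_link(isbn: str, source: str = "audio-summary") -> str:
    """Return a URL that opens the purchase page for one publication."""
    # urlencode escapes the values so they are safe inside a query string.
    query = urlencode({"isbn": isbn, "ref": source})
    return f"{BASE_URL}?{query}"


# The link travels with the summary so the terminal can show both together.
summary_entry = {
    "title": "Example Fantasy Novel",
    "summary": "A short generated summary would go here.",
    "link": build_purchase_link("9780000000002"),
}
```

Keeping the link inside the same record as the summary means the terminal needs no second lookup when the user decides to buy.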
[0040] Furthermore, users can submit their feedback and ratings of publications through an interface. The device sends this feedback to a server, which stores it in a database and shares it with other users.
[0041] As a concrete example, if a user is searching for "fantasy novels," they would type "fantasy" into their device. The device sends this information to a server, which generates a list of relevant fantasy novels and summaries. The user selects a work that interests them and receives information directly through auditory means by playing the summary. If the user wishes to purchase the work after listening to the summary, they can access the sales platform via the provided link and easily complete the purchase. Afterwards, the user can enter a review of the work they have read and share it with other users.
[0042] This allows users to select the book that best suits them, even within a limited time, and gain a deeper understanding.
[0043] The following describes the processing flow.
[0044] Step 1:
[0045] The user opens the terminal interface and enters relevant information such as the book title, author name, and genre into the search bar. Once the input is complete, they press the search button.
[0046] Step 2:
[0047] The terminal receives the user's search request and sends that information to the server as an HTTP request. The request includes the information entered by the user and asks the server to process it.
[0048] Step 3:
[0049] The server parses the received request and generates a database query to search for information on relevant publications. In this process, it obtains information on books that match the user's request as structured data.
[0050] Step 4:
[0051] Based on the book information obtained as search results, the server invokes a generative model using natural language processing techniques to generate summaries of each identified publication.
[0052] Step 5:
[0053] The server then initiates the process of converting the generated summary into audio data. This audio data is then adjusted so that the user can receive the summary audibly.
[0054] Step 6:
[0055] The terminal provides the user with summary information and audio data received from the server. The user can select a summary of interest from the list of book information displayed on the terminal's interface and listen to the audio summary by pressing the play button.
[0056] Step 7:
[0057] After reviewing the summary, a user who wishes to purchase the publication accesses the online sales platform directly via a link displayed on the terminal. The link leads to the purchase page for the selected publication.
[0058] Step 8:
[0059] After purchase, users can enter their feedback and ratings for the publication via their device. The entered rating information is then sent from the device to the server.
[0060] Step 9:
[0061] The server stores the received evaluation information in a database and updates the data so that it can be referenced in future searches and by other users. This information can also be used to make suggestions and recommendations to other users.
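The nine steps above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the disclosed implementation: the class and function names are hypothetical, the generative model is replaced by a trivial first-sentence extractor, and text-to-speech is replaced by a stub that returns UTF-8 bytes so the example runs without external services.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Publication:
    title: str
    author: str
    genre: str
    text: str


@dataclass
class Server:
    catalog: list[Publication]
    summarize: Callable[[str], str]      # stand-in for the generative model
    synthesize: Callable[[str], bytes]   # stand-in for text-to-speech
    reviews: dict[str, list[str]] = field(default_factory=dict)

    def search(self, query: str) -> list[Publication]:
        # Steps 2-3: match the target information against title, author, or genre.
        q = query.lower()
        return [p for p in self.catalog
                if q in p.title.lower() or q in p.author.lower() or q in p.genre.lower()]

    def handle_search(self, query: str) -> list[dict]:
        # Steps 4-6: summarize each hit and convert the summary to audio.
        results = []
        for pub in self.search(query):
            summary = self.summarize(pub.text)
            results.append({"title": pub.title, "summary": summary,
                            "audio": self.synthesize(summary)})
        return results

    def add_review(self, title: str, review: str) -> None:
        # Steps 8-9: store evaluation information for later sharing.
        self.reviews.setdefault(title, []).append(review)


# Placeholder implementations (assumptions, not real models).
def first_sentence(text: str) -> str:
    return text.split(".")[0].strip() + "."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # a real TTS engine would return encoded audio


server = Server(
    catalog=[
        Publication("Dragon Road", "A. Writer", "fantasy",
                    "A farm girl finds a dragon egg. Trouble follows."),
        Publication("Tax Basics", "B. Clerk", "finance",
                    "Taxes explained simply. With examples."),
    ],
    summarize=first_sentence,
    synthesize=fake_tts,
)
results = server.handle_search("fantasy")
server.add_review("Dragon Road", "Fun and fast-paced.")
```

Passing the summarizer and synthesizer in as callables keeps the search and review logic testable independently of whichever model or speech engine is eventually used.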
[0062] (Example 1)
[0063] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0064] A challenge lies in the lack of appropriate methods for quickly and efficiently acquiring and understanding information resources. In particular, a consistent system is needed that allows users to obtain summaries of information resources not only visually but also aurally, and to easily purchase related information resources and share evaluation information.
[0065] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0066] In this invention, the server includes a device for receiving higher-level category information from a user and identifying information on related information resources based on that information; a device for using a generative model to generate an overview of the identified information resources; and a device for providing the generated overview as acoustic data. This enables users to efficiently select and deeply understand information resources, as well as to share purchase and evaluation information.
[0067] "Higher-level category information" refers to general identifying information related to a specific subject, and serves as the basis for users to search for information resources.
[0068] "Information resources" refer to a collection of information provided to users, and include a variety of media such as books, documents, and articles.
[0069] "Device" refers to a mechanical or electronic component designed to perform a specific function, including hardware and software for performing a specific process.
[0070] A "generative model" is an algorithm or system designed to produce output based on input data, and in particular uses natural language processing techniques to generate summaries of information resources.
[0071] "Audio data" refers to digital audio data used to transmit information as sound, providing users with information through means other than sight.
[0072] An "e-commerce platform" refers to an online marketplace used by users to purchase products and services online, enabling the trading of a wide variety of goods and services.
[0073] This system consists of user terminals and a server that processes data. Users can input specific higher-level category information through the terminal's interface. The input information is then transmitted from the terminal to the server. The server utilizes a large database to quickly identify information resources. The server is equipped with database management software for executing SQL queries.
[0074] The server uses the received information to identify relevant information resources and automatically generates summaries of those resources using a generative AI model. The generative AI model incorporates algorithms that utilize natural language processing techniques. This technology allows for the extraction of key points from large amounts of information and the presentation of summaries in an easily understandable format for the user.
[0075] The generated summary is converted into audio data and provided to the user via the device. Software for generating speech from text is used for this conversion. This allows the user to receive information not only visually but also aurally.
[0076] The server also has the functionality to generate connection destinations that facilitate direct connections to e-commerce platforms from summaries. This allows users to quickly purchase relevant information resources via their devices. Furthermore, users can input their impressions and feedback via their devices after reading and send them to the server, and this information is shared with other users.
[0077] As a concrete example, if a user is searching for "historical novels," they would type "historical novels" into their device. The device would send this information to the server, which would then compile information on relevant historical novels and generate a summary. The summary would then be made available for audible review, and relevant purchase links would be provided. An example of a prompt would be, "Please recommend some books on historical novels." This system allows users to efficiently identify information resources even with limited information and make quick purchases.
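As a rough illustration of how the example prompt could be assembled from the user's higher-level category information, consider the sketch below. The function name `build_prompt`, the `max_books` parameter, and the exact wording beyond the quoted example are assumptions made for illustration.

```python
def build_prompt(category: str, max_books: int = 3) -> str:
    """Compose a recommendation prompt from higher-level category information."""
    category = category.strip()
    if not category:
        # An empty category would produce a meaningless prompt, so reject it early.
        raise ValueError("category must not be empty")
    return (
        f"Please recommend up to {max_books} books on {category}. "
        "For each book, give the title, the author, and a two-sentence summary."
    )


prompt = build_prompt("historical novels")
```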
[0078] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0079] Step 1:
[0080] The user uses their device to input higher-level category information to search for a specific information resource. This action generates a search query as text data, which is then provided to the device as input data. The device receives this input and prepares to send it to the server.
[0081] Step 2:
[0082] The terminal sends the input data obtained from the user to the server. The server takes the received text data as input and begins data analysis to search for relevant information resources. This analysis generates SQL queries based on the input categories, which are used to retrieve information resources from the data store.
[0083] Step 3:
[0084] The server uses SQL queries to search for information resources within the data store. Based on the search results, it outputs a list of candidate information resources. This list is used as input for subsequent processing. The server prepares the candidate list to be provided as input data to the generative AI model.
[0085] Step 4:
[0086] The server uses a generative AI model to generate a summary of the candidate list provided as input data. In this process, natural language processing is used to extract the key points of the information resources and create a concise summary. The generated summary is output as text data.
[0087] Step 5:
[0088] The server initiates a process to convert the generated summary into audio data. This process uses text-to-speech (TTS) software, enabling the information to be presented to the user audibly. The server then prepares to send the audio data to the terminal.
[0089] Step 6:
[0090] The device plays audio data received from the server, allowing users to listen to summaries in audio format. After obtaining information through audio, users can purchase information resources of interest via links to e-commerce platforms provided by the server.
[0091] Step 7:
[0092] After using the information resources, the user inputs feedback information through the terminal's interface. The terminal then prepares to send this feedback data to the server. Once this information is sent, the server stores the feedback information in its data store and outputs it in a format that can be shared with other users.
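The SQL-based retrieval in Steps 2 and 3 might look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table name `resources`, its columns, and the sample rows are hypothetical; the disclosure states only that SQL queries are generated from the input category.

```python
import sqlite3

# In-memory data store with a hypothetical schema for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources ("
    "id INTEGER PRIMARY KEY, title TEXT, category TEXT, rating REAL)"
)
conn.executemany(
    "INSERT INTO resources (title, category, rating) VALUES (?, ?, ?)",
    [
        ("The Iron Court", "historical novels", 4.5),
        ("River of Kings", "historical novels", 4.1),
        ("Star Drift", "science fiction", 4.8),
    ],
)


def find_candidates(conn: sqlite3.Connection, category: str, limit: int = 5) -> list[dict]:
    """Return candidate resources for a category, best rated first."""
    # Placeholders (?) keep user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT title, rating FROM resources "
        "WHERE category = ? ORDER BY rating DESC LIMIT ?",
        (category, limit),
    ).fetchall()
    return [{"title": title, "rating": rating} for title, rating in rows]


candidates = find_candidates(conn, "historical novels")
```

Because the category comes directly from user input, binding it as a parameter rather than formatting it into the query string avoids SQL injection.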
[0093] (Application Example 1)
[0094] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."
[0095] In today's information-saturated world, it's difficult for users to efficiently find content that interests them. Furthermore, they need to understand the content within a limited timeframe and smoothly complete the entire purchase process. However, existing systems require considerable effort and time to retrieve text information and complete the purchase process, placing a burden on users. This challenge needs to be addressed.
[0096] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0097] In this invention, the server includes means for receiving target information from a user and processing it to identify information about relevant content; means for generating a summary of the identified content using a generative model; and means for providing and streaming the generated summary as audio data. This allows users to efficiently obtain information about content of interest via audio and to proceed smoothly with subsequent purchasing procedures.
[0098] "User-provided target information" refers to information about books or content that the user is searching for, including input such as genre and title.
[0099] "Related content information" refers to detailed data on books and content that match the user's interests and search queries, identified based on the target information provided by the user.
[0100] A "generative model" is an algorithm that utilizes artificial intelligence technology to automatically create a summary of content from input data.
[0101] "Means of providing and streaming as audio data" refers to the technology and processes for converting the generated summary into an audio format so that users can listen to it in real time.
[0102] An "online sales system" refers to a platform that allows users to purchase content via the internet, enabling them to directly complete the purchase process.
[0103] "Rating information" refers to data in which users record their impressions and ratings of content; it is shared with and accessible to other users.
[0104] The system that implements this invention consists of a user, a server, and a terminal. Each component and its processing are described in detail below.
[0105] The server is primarily responsible for processing and generating data, and has a search function to identify relevant content based on information provided by the user. To achieve this, the server utilizes a generative AI model with natural language processing technology to extract relevant content from the database based on the genre and title specified by the user. In this process, the generative model uses an algorithm that analyzes text data and generates a summary.
[0106] The generated content summary is processed in real time as audio data and transmitted to the user's device via streaming technology. This process utilizes a speech synthesis system to generate the audio data. The speech synthesis system converts text into audio data, allowing the user to listen instead of reading.
[0107] If a user is interested in the content they have listened to, they can easily purchase it through the provided link to the online sales system. Through this integration, the terminal provides a seamless experience for the user. After completing the purchase process, the user can also enter and share their evaluation of the content. The terminal sends this evaluation information to the server, which stores it in a database. The evaluation information becomes accessible to other users.
[0108] For example, if a user wants to research "near-future technology and science fiction," they enter the relevant information on their smartphone. They then listen to smoothly streaming audio, review the content, and if interested, easily purchase it through the platform.
[0109] An example of a prompt message is: "Please provide a program that generates a list of the latest books on near-future technology and science fiction, and then streams an audio summary of the first book."
[0110] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0111] Step 1:
[0112] The user enters relevant information, such as genre and title, into the device. Once the user has finished entering the information, it is sent to the server by the device. The entered information is then used for data processing in the next step.
[0113] Step 2:
[0114] The server searches the database based on the received target information and identifies relevant content. In this process, a generative AI model analyzes the input data using natural language processing and extracts relevant content. The extracted information is output as basic data for summary generation.
[0115] Step 3:
[0116] The server uses a generative AI model to create a summary of the identified content. This process uses data calculations to extract the key parts of the content as a summary and converts them into text format. The generated summary becomes the input for speech conversion in the next step.
[0117] Step 4:
[0118] The server converts the generated summary into audio data using speech synthesis. In the speech conversion process, the text data is converted into digital audio data using a synthesis algorithm, and this audio data is streamed to the terminal. The output audio data is available for the user to listen to.
[0119] Step 5:
[0120] When a user listens to the provided audio data and wishes to purchase content they are interested in, the device displays a link to an online sales system received from the server. This link is used as input for the purchase process. Using the link, the user can easily purchase the content.
[0121] Step 6:
[0122] After a user uses content, they enter their evaluation information into their device. The device then sends this evaluation information to the server. The server stores the received information in a database, making it accessible to other users. This evaluation data will be used in the future to select recommended content, among other things.
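The streaming in Step 4 can be pictured as splitting the synthesized audio into fixed-size chunks that are sent to the terminal one at a time. The sketch below assumes the audio is already available as a byte string; the function name `stream_chunks` and the 4096-byte chunk size are illustrative choices, and the disclosure does not specify a streaming protocol.

```python
from typing import Iterator


def stream_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks of audio so playback can start before all data arrives."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        # Slicing past the end is safe; the final chunk is simply shorter.
        yield audio[start:start + chunk_size]


# Stand-in for synthesized speech: 10,000 bytes of silence.
audio = bytes(10_000)
chunks = list(stream_chunks(audio))
```

A generator fits here because the server never needs to hold more than one chunk in flight, and the terminal can begin playback as soon as the first chunk arrives.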
[0123] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0124] The embodiments for carrying out the present invention are described in detail below.
[0125] This system uses a terminal with an initial interface where the user inputs target information such as the book title, author name, and genre. Once the user has finished inputting, the terminal sends that information to the server as an HTTP request. The server analyzes this information and identifies the relevant publications from its database.
[0126] In particular, this invention incorporates an emotion engine that also collects additional information, including the user's emotions. The emotion engine has the function of analyzing the user's emotional state from the user's input, voice tone, text patterns, and so on.
[0127] The server utilizes generative models based on natural language processing technology to automatically create summaries of identified publications. These summaries are tailored to the user's emotional state, allowing for emotionally richer expression. The summaries are also converted into audio data, making them accessible to the user aurally.
[0128] Users can not only grasp the book's overview by playing this audio summary through their device, but also receive information on recommended books based on the emotion engine's analysis. This feature makes it easy for users to find books that match their current emotions.
[0129] Furthermore, if a user is interested after reviewing the summary, they can purchase the publication directly through a link provided by the server. This link facilitates access to the appropriate online sales platform for the user. In addition, users can enter their impressions and ratings after reading through the terminal interface, and this information is stored in a database by the server and shared with other users.
[0130] For example, if a user is feeling stressed and wants to find recommended books on relaxation, the user's input and emotional tone are analyzed by the emotion engine. If the analysis indicates a desire to "calm down," the system prioritizes summarizing and recommending relaxation books and content optimized for that emotion.
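As a non-limiting illustration, the prioritization described in this example could be sketched as follows. The `Book` class, its `mood_tags` field, and the `prioritize` function are hypothetical names introduced only for this sketch; the disclosure does not specify how books are associated with emotions.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    genre: str
    # Hypothetical tags describing the moods a book is suited to.
    mood_tags: set[str] = field(default_factory=set)


def prioritize(books: list[Book], desired_mood: str) -> list[Book]:
    """Return the books with those matching the desired mood first.

    sorted() is stable, so books keep their original relative order
    inside the matching group and inside the non-matching group.
    """
    return sorted(books, key=lambda b: desired_mood not in b.mood_tags)


catalog = [
    Book("Deadline Thriller", "thriller", {"excited"}),
    Book("Breathing Basics", "relaxation", {"calm"}),
    Book("Quiet Evenings", "relaxation", {"calm", "sleepy"}),
]
ranked = prioritize(catalog, "calm")
print([b.title for b in ranked])
```

In this sketch the non-matching books are kept at the end of the list rather than dropped, so the user can still reach them.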
[0131] This system allows users to receive more personalized information and suggestions, enabling them to make high-quality choices and purchases even within a limited timeframe.
[0132] The following describes the processing flow.
[0133] Step 1:
[0134] The user opens the device's interface, enters the book title, author name, or genre information into the search bar, and also describes their emotional state by typing or voice. The device collects this information.
[0135] Step 2:
[0136] The device compiles the collected information into an HTTP request and sends it to the server. The request includes book information and user sentiment information.
[0137] Step 3:
[0138] The server receives the request, then generates and executes a query that retrieves information about relevant publications from the database, taking the user's sentiment into account.
[0139] Step 4:
[0140] The server generates summaries using a generative model that employs natural language processing techniques, based on the publication information in the search results. The generated summaries are then adjusted by an emotion engine to match the user's emotions.
[0141] Step 5:
[0142] The server converts the adjusted summary into audio data and provides it to the terminal in a streamable format.
[0143] Step 6:
[0144] The device displays the received summaries and plays the received audio data on the user interface, allowing users to receive the necessary information both visually and audibly.
[0145] Step 7:
[0146] Users can review the played summary, and if interested, click the link on their device to access the online sales platform directly and purchase the book. This link will take them to the purchase page for the book.
[0147] Step 8:
[0148] After purchase, users enter their evaluation information about the publication from their device and submit their feedback. The device then sends this information to the server.
[0149] Step 9:
[0150] The server stores the received evaluation information in a database and updates the information so that other users can access it. This facilitates information sharing within the user community.
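Steps 8 and 9 above could, for example, be sketched as follows using an in-memory SQLite database. The table layout, the 1 to 5 rating scale, the function names, and the placeholder ISBN are assumptions made for this sketch and are not taken from the disclosure.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical evaluations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evaluations ("
        "isbn TEXT NOT NULL, "
        "user_id TEXT NOT NULL, "
        "rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5), "
        "comment TEXT)"
    )
    return conn


def store_evaluation(conn, isbn, user_id, rating, comment):
    """Step 9, first half: persist one evaluation received from a terminal."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO evaluations VALUES (?, ?, ?, ?)",
            (isbn, user_id, rating, comment),
        )


def shared_view(conn, isbn):
    """Step 9, second half: build the view that other users can access."""
    count, average = conn.execute(
        "SELECT COUNT(*), AVG(rating) FROM evaluations WHERE isbn = ?",
        (isbn,),
    ).fetchone()
    comments = [
        row[0]
        for row in conn.execute(
            "SELECT comment FROM evaluations WHERE isbn = ? ORDER BY rowid",
            (isbn,),
        )
    ]
    return {"isbn": isbn, "count": count, "average": average, "comments": comments}


conn = open_store()
store_evaluation(conn, "ISBN-0001", "user-a", 5, "Very calming.")
store_evaluation(conn, "ISBN-0001", "user-b", 4, "Good before bed.")
view = shared_view(conn, "ISBN-0001")
print(view)
```

The `CHECK` constraint rejects out-of-range ratings at the database level, so malformed feedback is not shared with other users.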
[0151] (Example 2)
[0152] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0153] In modern information retrieval systems, users need to select appropriate content from a large amount of information, and this process is cumbersome. Furthermore, it is difficult to present information flexibly according to the user's emotional state, making it challenging to provide information that meets individual needs.
[0154] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0155] In this invention, the server includes means for analyzing the user's emotional state and identifying information in relevant publications; means for using a generative model to generate a summary tailored to the user's emotional state; and means for providing the generated summary in audio format. This enables the provision of information customized based on the user's emotions.
[0156] A "user" is an individual or group that uses a system or service.
[0157] "Target information" refers to information about the data that the user wishes to search for or retrieve, and includes specific book titles, author names, genres, etc.
[0158] "Emotional state" refers to the psychological or emotional state a user experiences when entering information or performing a search, and is part of the data that the system analyzes.
[0159] A "generative model" is an algorithm or machine learning program that uses natural language processing techniques to automatically create summaries of publications.
[0160] "Audio format" refers to the conversion of generated text data into audio data that users can access aurally.
[0161] A "link" is a means of connection that allows a user to directly access another online platform or webpage by clicking on it.
[0162] "Rating information" refers to data that expresses users' impressions and opinions about publications, and functions as feedback that is shared with other users.
[0163] An "information storage device" is an electronic device or system for storing and managing information, such as a database or cloud storage.
[0164] "Natural language processing technology" refers to the technology that enables computers to understand, analyze, and generate human language, and is used for sentiment analysis of text and automatic summarization.
[0165] This invention begins with a user inputting target information such as book title, author name, and genre using an information terminal. The terminal formats the input information as structured data and sends it to the server via an HTTP request. The server plays a central role in data processing and utilizes speech analysis systems and natural language processing technologies to analyze the received information.
[0166] The server analyzes the user's emotional state using an emotion engine, along with the information provided by the user. This analysis evaluates voice tone and text patterns to identify the user's psychological state. Specifically, it uses the Google (registered trademark) speech recognition API to convert voice data into text, and NLTK, a natural language processing library, to analyze the resulting text.
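The text-pattern part of this analysis could, as one simplified possibility, be sketched as follows. This sketch deliberately does not call the speech recognition API or NLTK named above; it substitutes a small hand-written keyword table so that it runs without external dependencies. The table contents, the labels, and the `estimate_emotion` function are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword table: label -> words that suggest that label.
EMOTION_KEYWORDS = {
    "calm": {"calm", "relax", "relaxation", "stress", "stressed", "unwind"},
    "excited": {"thrill", "thrilling", "adventure", "exciting"},
    "sad": {"sad", "lonely", "grief"},
}


def estimate_emotion(text: str, default: str = "neutral") -> str:
    """Return the label whose keywords occur most often in the text.

    Falls back to `default` when no keyword occurs at all.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        {
            label: sum(1 for word in words if word in keywords)
            for label, keywords in EMOTION_KEYWORDS.items()
        }
    )
    label, hits = counts.most_common(1)[0]
    return label if hits > 0 else default


print(estimate_emotion("I'm looking for a book to relieve stress. I want to calm down."))
```

A production emotion engine would presumably combine such text features with voice tone, which this sketch does not attempt.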
[0167] The server, having received the analysis results, identifies relevant publications from its database and automatically generates summaries using a generative AI model. This process utilizes large-scale generative AI models, such as the GPT (Generative Pre-trained Transformer) series. The summaries are adjusted according to the user's emotional state, and the generated summaries are converted into audio format. Text-to-speech technologies such as Amazon Polly and Google Text-to-Speech are used for speech synthesis.
[0168] The user receives an audio summary provided by their device. By listening to this audio summary, the user can easily grasp the book's overview. In addition, the server also provides information on recommended books optimized for the user's emotions.
[0169] Interested users can purchase related publications directly online via links provided by the server. These links facilitate access to appropriate online sales platforms. User feedback and ratings after purchase are entered via their devices and stored in a database by the server. This information is shared with other users and helps improve the system's recommendation accuracy.
[0170] For example, if a user enters the prompt, "I'm looking for a book to relieve stress. I want to calm down," the emotion engine analyzes that psychological state as "I want to calm down." Based on this result, the server generates summaries of relaxation-related books and provides them as audio. By listening to this, the user can easily find a book that matches their mood.
[0171] As described above, the present invention provides a system that responds to the individual needs of users and enables the provision of information that is tailored to their emotions.
[0172] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0173] Step 1:
[0174] The user uses the terminal to input relevant information such as book title, author name, and genre. The entered information is stored on the terminal as text data. Once the user has entered all the information, the terminal converts this information into structured data in JSON format and prepares it for transmission to the server.
[0175] Step 2:
[0176] The device sends structured data to the server as an HTTP request. The sent request includes book information entered by the user and prompts for sentiment estimation. The server receives the request and validates its contents for data analysis. The server performs this process using backend technologies such as Python or Node.js.
[0177] Step 3:
[0178] The server searches the database based on the received data. Data queries use SQL or NoSQL to identify relevant publication information. This search process extracts records that match the keywords entered by the user. Search results include information such as book title, author name, and ISBN.
[0179] Step 4:
[0180] The server activates an emotion engine and analyzes the prompts and voice data sent by the user. Here, the speech recognition system converts the data into text, and then a natural language processing algorithm is applied. As a result of the analysis, the user's emotional state is identified, and this information, along with search results from the database, is used for the next processing step.
[0181] Step 5:
[0182] The server generates summaries of publications using a generative AI model. The AI model automatically generates summaries that take into account the acquired publication information and the user's emotional state. This generative model uses natural language processing techniques, and the generated summaries are emotionally optimized.
[0183] Step 6:
[0184] The server converts the generated summary into audio data. Text-to-speech technology is used for this conversion. During this synthesis process, the generated summary is prepared to be delivered to the user as natural-sounding speech.
[0185] Step 7:
[0186] The server sends audio data to the terminal, allowing the user to play an audio summary. Through this audio, the user can listen to and understand the book's overview. Audio playback allows the user to acquire information without relying on visual cues.
[0187] Step 8:
[0188] After listening to an audio summary, users enter their impressions and book ratings into their device. The device sends this information to the server, which stores the received ratings in a database. The ratings are thereby shared with other users and contribute to improving the system's recommendation accuracy.
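Steps 1 to 3 of this flow (JSON structuring on the terminal, validation on the server, and the database search) could be sketched as follows. The JSON field names, the `books` table layout, the sample rows, and the placeholder ISBN strings are assumptions made for this sketch; the disclosure states only that JSON, HTTP, and SQL or NoSQL are used.

```python
import json
import sqlite3

SEARCH_COLUMNS = ("title", "author", "genre")


def build_request_body(title="", author="", genre="", emotion_prompt=""):
    """Terminal side (Step 1): pack the user's input into a JSON string."""
    return json.dumps(
        {"title": title, "author": author, "genre": genre,
         "emotion_prompt": emotion_prompt}
    )


def validate(body: str) -> dict:
    """Server side (Step 2): parse the body and require a search field."""
    data = json.loads(body)
    if not isinstance(data, dict) or not any(data.get(c) for c in SEARCH_COLUMNS):
        raise ValueError("at least one of title, author, genre is required")
    return data


def search(conn: sqlite3.Connection, data: dict) -> list:
    """Server side (Step 3): keyword match using a parameterized query."""
    clauses, params = [], []
    for column in SEARCH_COLUMNS:  # column names come from a fixed tuple
        if data.get(column):
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{data[column]}%")
    sql = ("SELECT title, author, isbn FROM books WHERE "
           + " AND ".join(clauses) + " ORDER BY title")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, genre TEXT, isbn TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("Quiet Evenings", "A. Author", "relaxation", "ISBN-0001"),
        ("Deadline Thriller", "B. Writer", "thriller", "ISBN-0002"),
    ],
)
body = build_request_body(genre="relaxation", emotion_prompt="I want to calm down")
rows = search(conn, validate(body))
print(rows)
```

User-supplied values are passed only as query parameters, never interpolated into the SQL text, which avoids injection through the search fields.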
[0189] (Application Example 2)
[0190] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".
[0191] Modern users lead busy lives and often find it difficult to find the right publications quickly. Furthermore, there is a lack of easy ways to find content that resonates with their current emotions. Additionally, there is a lack of systems that provide emotionally relevant summaries and support quick purchase decisions. Technologies that effectively address these challenges are needed.
[0192] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0193] In this invention, the server includes means for receiving subject information and emotional state from the user and identifying relevant publications based thereon; means for using a generative model that generates summaries of publications tailored to the user's emotions; and means for providing the summaries as emotionally expressed audio data. This enables the user to quickly identify content that matches their current emotions and smoothly proceed through the purchase process.
[0194] "User-generated information" refers to data entered by users regarding books, authors, genres, etc.
[0195] "Emotional state" refers to information that indicates the user's current emotions and is obtained by analyzing their voice tone and text patterns.
[0196] "Relevant publications" refers to books and literary works that are identified as suitable based on the user's input and emotional state.
[0197] A "generative model" is an algorithm or program that uses natural language processing techniques to generate summaries of identified publications.
[0198] "Emotionally rich expression as audio data" refers to the process of converting the generated summary into audio and providing expression that includes tone and nuances that correspond to the user's emotions.
[0199] "Links to purchase publications via online purchasing platforms" refers to means of connecting to e-commerce sites where publications can be purchased, which are directly accessible from the summary.
[0200] "Rating information" refers to data that users record within the system, sharing their impressions and ratings of publications with other users.
[0201] The system implementing this invention identifies relevant publications based on user input information and emotions, provides them as audio summaries, and enables immediate purchase. The hardware and software configuration of this system is described below.
[0202] The terminal uses a smartphone or tablet as its user interface. An emotion analysis API for analyzing emotional states and a data input module for receiving user input run on this terminal. For example, a general natural language processing API (e.g., OpenAI's emotion analysis tool) can be used as the emotion analysis API.
[0203] The server plays a central role in managing data processing and generative models. It receives information from users as HTTP requests and searches for relevant publications in a database (e.g., an SQL-based book database). It uses natural language processing techniques to generate summaries using generative AI models (e.g., GPT models).
[0204] The generated summary is converted into emotionally resonant audio data using a speech synthesis API (e.g., Google Cloud Text-to-Speech). This audio data is sent to the user's device, allowing them to listen to it in audio format.
[0205] Furthermore, as a purchase function, the server integrates with online sales platforms and generates appropriate purchase links according to user requests. By using electronic payment APIs (e.g., Stripe), a seamless purchase process can be achieved.
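The generation of a purchase link could, for instance, be sketched as follows. The base URL `https://store.example.com/books`, the query parameter names, and the `build_purchase_link` function are hypothetical; no particular sales platform or payment API is called in this sketch, and the ISBN shown is a placeholder.

```python
from urllib.parse import urlencode


def build_purchase_link(isbn: str,
                        base_url: str = "https://store.example.com/books") -> str:
    """Build a link to a hypothetical sales platform page for one book.

    Only the shape of the ISBN is checked (13 digits after removing
    hyphens); the check digit is not verified in this sketch.
    """
    digits = isbn.replace("-", "")
    if not (len(digits) == 13 and digits.isdigit()):
        raise ValueError("expected a 13-digit ISBN")
    # urlencode escapes the values so the link is always a valid URL.
    query = urlencode({"isbn": digits, "ref": "audio-summary"})
    return f"{base_url}?{query}"


link = build_purchase_link("978-0-00-000000-0")
print(link)
```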
[0206] For example, if a user is feeling the emotion of "wanting to calm down," the emotion analysis API analyzes the emotions related to "relaxation" based on the information received from the device. The server identifies the relevant relaxation book and creates a summary using a generative AI model. Then, using a speech synthesis API, this summary can be converted into a pleasant tone of voice and delivered to the user.
[0207] An example of a prompt for a generative AI model might be: "Current emotion: I want to calm down. Please summarize books on relaxation. Please use a warm tone that matches my emotion."
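A prompt of this form could be assembled from the identified emotion and the requested topic, for example as in the following sketch. The function name `build_summary_prompt` and its parameters are assumptions introduced here; only the wording of the resulting prompt is taken from the example above.

```python
def build_summary_prompt(emotion: str, topic: str, tone: str = "warm") -> str:
    """Fill the example prompt wording with an emotion, a topic, and a tone."""
    return (
        f"Current emotion: {emotion}. "
        f"Please summarize books on {topic}. "
        f"Please use a {tone} tone that matches my emotion."
    )


prompt = build_summary_prompt("I want to calm down", "relaxation")
print(prompt)
```

Keeping the tone as a separate parameter would allow the server to choose it from the emotion analysis result rather than fixing it in the template.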
[0208] In this way, a system is provided that allows users to instantly select publications that match their emotions, listen to summaries, and purchase them in a smooth and seamless manner.
[0209] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0210] Step 1:
[0211] The device receives target information and voice tone from the user as input. The target information includes book title, author name, genre, etc. Once the user has finished inputting, the device calls an emotion analysis API to analyze the voice tone and identify the emotional state.
[0212] Step 2:
[0213] The server uses the target information and identified emotional state received from the terminal as input to query the database. It constructs a query and retrieves data to identify related publications. The retrieved data is output as information on multiple related books.
[0214] Step 3:
[0215] The server takes relevant book information as input and uses a generative AI model to generate summaries of each book. It creates prompts, passes them to the model, and obtains as output summaries produced with natural language processing techniques. These summaries are adapted to the user's emotions.
[0216] Step 4:
[0217] The server passes the generated summary as input to the speech synthesis API. The speech synthesis API converts the summary into audio data and outputs it as audio that matches the emotional state. The audio data is sent to the terminal.
[0218] Step 5:
[0219] The device plays audio data received from the server and provides the user with an audio summary. The user listens to the audio and, if interested, proceeds toward purchase. Once the audio playback is complete, the device waits for the user's selection.
[0220] Step 6:
[0221] The server generates links to online purchase platforms for books the user has shown interest in. It takes the summary produced by the generative AI model and the link information as input and sends them to the user's device. The user can then purchase the book directly by clicking this link.
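The server-side portion of Steps 2 to 6 above could be organized, for example, as in the following sketch. The database search, generative AI model, speech synthesis API, and link generation are passed in as callables and replaced here by stand-in functions; the `Delivery` class, the `run_flow` function, the stand-ins, and the example URL are all hypothetical and do not call any real service.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    title: str
    summary: str
    audio: bytes
    purchase_link: str


def run_flow(target, emotion, search, summarize, synthesize, make_link):
    """Chain the server-side steps for every related book that is found."""
    deliveries = []
    for book in search(target, emotion):            # Step 2: query the database
        summary = summarize(book, emotion)          # Step 3: generative AI model
        audio = synthesize(summary, emotion)        # Step 4: speech synthesis
        link = make_link(book)                      # Step 6: purchase link
        deliveries.append(Delivery(book["title"], summary, audio, link))
    return deliveries


# Stand-ins for the external components (assumptions for this sketch).
def fake_search(target, emotion):
    return [{"title": "Quiet Evenings", "isbn": "ISBN-0001"}]


def fake_summarize(book, emotion):
    return f"{book['title']}: a gentle read for someone who feels '{emotion}'."


def fake_synthesize(text, emotion):
    return text.encode("utf-8")  # real audio data would be returned here


def fake_link(book):
    return f"https://store.example.com/books/{book['isbn']}"


result = run_flow({"genre": "relaxation"}, "I want to calm down",
                  fake_search, fake_summarize, fake_synthesize, fake_link)
print(result[0].purchase_link)
```

Passing the components in as arguments keeps the flow testable without network access and lets each external API be swapped independently.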
[0222] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0223] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0224] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.
[0225] [Second Embodiment]
[0226] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.
[0227] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.
[0228] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0229] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.
[0230] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0231] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0232] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0233] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0234] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0235] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0236] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0237] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0238] The embodiments for carrying out the present invention are described in detail below.
[0239] This system uses a terminal equipped with an interface for users to input target information such as book titles, author names, and genres. The terminal receives the user's input and sends a search request to the server based on it. The server analyzes the received information and identifies relevant publications from its database.
[0240] The server utilizes a generative model employing natural language processing techniques to automatically generate summaries of identified publications. These summaries are also provided as audio data, making them accessible to users both visually and aurally. The terminal then plays this audio data, enabling users to receive the necessary information by ear.
[0241] Furthermore, the server provides links to relevant online sales platforms based on the generated summaries, allowing users to purchase publications directly. These links enable users to easily proceed with the purchase process after reviewing the summaries.
[0242] Furthermore, users can submit their feedback and ratings of publications through an interface. The device sends this feedback to a server, which stores it in a database and shares it with other users.
[0243] As a concrete example, if a user is searching for "fantasy novels," they would type "fantasy" into their device. The device sends this information to a server, which generates a list of relevant fantasy novels and summaries. The user selects a work that interests them and receives information directly through auditory means by playing the summary. If the user wishes to purchase the work after listening to the summary, they can access the sales platform via the provided link and easily complete the purchase. Afterwards, the user can enter a review of the work they have read and share it with other users.
[0244] This allows users to select the book that best suits them, even within a limited time, and gain a deeper understanding.
[0245] The following describes the processing flow.
[0246] Step 1:
[0247] The user opens the terminal interface and enters relevant information such as the book title, author name, and genre into the search bar. Once the input is complete, they press the search button.
[0248] Step 2:
[0249] The terminal receives the user's search request and sends that information to the server as an HTTP request. The request includes the information entered by the user and asks the server to process it.
[0250] Step 3:
[0251] The server parses the received request and generates a database query to search for information on relevant publications. In this process, it obtains information on books that match the user's request as structured data.
[0252] Step 4:
[0253] Based on the book information obtained as search results, the server invokes a generative model using natural language processing techniques to generate summaries of each identified publication.
[0254] Step 5:
[0255] The server then initiates the process of converting the generated summary into audio data. This audio data is then adjusted so that the user can receive the summary audibly.
[0256] Step 6:
[0257] The terminal provides the user with summary information and audio data received from the server. The user can select a summary of interest from the list of book information displayed on the terminal's interface and listen to the audio summary by pressing the play button.
[0258] Step 7:
[0259] After the user reviews the summary, if they wish to purchase, they will access the online sales platform directly via a link on their device. This link will lead to the purchase page for the selected publication.
[0260] Step 8:
[0261] After purchase, users can enter their feedback and ratings for the publication via their device. The entered rating information is then sent from the device to the server.
[0262] Step 9:
[0263] The server stores the received evaluation information in a database and updates the data so that it can be referenced in future searches and by other users. This information can also be used to make suggestions and recommendations to other users.
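One simple way in which the stored evaluation information of Step 9 could feed suggestions to other users is sketched below. The minimum number of ratings, the ranking by average rating, and the `recommend` function are assumptions made for illustration; the disclosure does not specify a recommendation method.

```python
from collections import defaultdict


def recommend(evaluations, min_ratings=2, top_n=3):
    """Rank titles by average rating.

    `evaluations` is an iterable of (title, rating) pairs. Titles with
    fewer than `min_ratings` ratings are skipped so that a single
    enthusiastic review does not dominate the ranking.
    """
    ratings_by_title = defaultdict(list)
    for title, rating in evaluations:
        ratings_by_title[title].append(rating)
    averages = {
        title: sum(ratings) / len(ratings)
        for title, ratings in ratings_by_title.items()
        if len(ratings) >= min_ratings
    }
    # Highest average first; ties are broken alphabetically by title.
    return sorted(averages, key=lambda t: (-averages[t], t))[:top_n]


stored = [("A", 5), ("A", 4), ("B", 3), ("B", 3), ("C", 5)]
print(recommend(stored))
```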
[0264] (Example 1)
[0265] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0266] A challenge lies in the lack of appropriate methods for quickly and efficiently acquiring and understanding information resources. In particular, a consistent system is needed that allows users to obtain summaries of information resources not only visually but also aurally, and to easily purchase related information resources and share evaluation information.
[0267] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0268] In this invention, the server includes a device for receiving higher-level category information from a user and identifying information on related information resources based on that information; a device for using a generative model to generate an overview of the identified information resources; and a device for providing the generated overview as acoustic data. This enables users to efficiently select and deeply understand information resources, as well as to share purchase and evaluation information.
[0269] "Higher-level category information" refers to general identifying information related to a specific subject, and serves as the basis for users to search for information resources.
[0270] "Information resources" refer to a collection of information provided to users, and include a variety of media such as books, documents, and articles.
[0271] "Device" refers to a mechanical or electronic component designed to perform a specific function, including hardware and software for performing a specific process.
[0272] A "generative model" is an algorithm or system designed to produce output based on input data, and in particular uses natural language processing techniques to generate summaries of information resources.
[0273] "Audio data" refers to digital audio data used to transmit information as sound, providing users with information through means other than sight.
[0274] An "e-commerce platform" refers to an online marketplace used by users to purchase products and services online, enabling the trading of a wide variety of goods and services.
[0275] This system consists of user terminals and a server that processes data. Users can input specific higher-level category information through the terminal's interface. The input information is then transmitted from the terminal to the server. The server utilizes a large database to quickly identify information resources. The server is equipped with database management software for executing SQL queries.
[0276] The server uses the received information to identify relevant information resources and automatically generates summaries of those resources using a generative AI model. The generative AI model incorporates algorithms that utilize natural language processing techniques. This technology allows for the extraction of key points from large amounts of information and the presentation of summaries in an easily understandable format for the user.
[0277] The generated summary is converted into audio data and provided to the user via the device. Software for generating speech from text is used for this conversion. This allows the user to receive information not only visually but also aurally.
[0278] The server also has the functionality to generate connection destinations that facilitate direct connections to e-commerce platforms from summaries. This allows users to quickly purchase relevant information resources via their devices. Furthermore, users can input their impressions and feedback via their devices after reading and send them to the server, and this information is shared with other users.
[0279] As a concrete example, if a user is searching for "historical novels," they would type "historical novels" into their device. The device would send this information to the server, which would then compile information on relevant historical novels and generate a summary. The summary would then be made available for audible review, and relevant purchase links would be provided. An example of a prompt would be, "Please recommend some books on historical novels." This system allows users to efficiently identify information resources even with limited information and make quick purchases.
[0280] The flow of the specific process in Example 1 will be described using Figure 11.
[0281] Step 1:
[0282] The user uses their terminal to input the higher-level category information for searching for specific information resources. This operation generates a search query as text data, which is provided to the terminal as input data. The terminal receives this input and prepares to send it to the server.
[0283] Step 2:
[0284] The terminal sends the input data obtained from the user to the server. The server takes the received text data as input and starts data analysis for searching for relevant information resources. Through this analysis, an SQL query based on the input category is generated so that information resources can be obtained from the data store.
[0285] Step 3:
[0286] The server uses the SQL query to search for information resources in the data store. Based on the obtained search results, a candidate list of relevant information resources is output. This list is used as input for subsequent processing. The server prepares to provide the candidate list to the generative AI model as input data.
[0287] Step 4:
[0288] The server uses the generative AI model to generate a summary of the candidate list provided as input data. At this time, natural language processing is used to extract the main points of the information resources and create a concise summary. The generated summary is output as text data.
[0289] Step 5:
[0290] The server initiates a process to convert the generated summary into audio data. This process uses text-to-speech (TTS) software, enabling the information to be presented to the user audibly. The server then prepares to send the audio data to the terminal.
[0291] Step 6:
[0292] The device plays audio data received from the server, allowing users to listen to summaries in audio format. After obtaining information through audio, users can purchase information resources of interest via links to e-commerce platforms provided by the server.
[0293] Step 7:
[0294] After using the information resources, the user inputs feedback information through the terminal's interface. The terminal then prepares to send this feedback data to the server. Once this information is sent, the server stores the feedback information in its data store and outputs it in a format that can be shared with other users.
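As a non-limiting illustration, Steps 2 and 3 above (generating an SQL query from the input category and outputting a candidate list) may be sketched as follows. The table name `resources`, its columns, and the sample records are hypothetical and are not taken from the embodiment; an in-memory SQLite database stands in for the data store.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory data store holding a few hypothetical records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resources (title TEXT, category TEXT, overview TEXT)"
    )
    conn.executemany(
        "INSERT INTO resources VALUES (?, ?, ?)",
        [
            ("Book A", "historical novels", "A story set in a past era."),
            ("Book B", "historical novels", "A chronicle of a fictional clan."),
            ("Book C", "science fiction", "A near-future adventure."),
        ],
    )
    return conn


def find_candidates(conn: sqlite3.Connection, category: str) -> list[dict]:
    """Return the candidate list for the category entered by the user."""
    # Parameterized query: the user input is bound as a value and is
    # never spliced into the SQL text itself.
    rows = conn.execute(
        "SELECT title, overview FROM resources WHERE category = ? ORDER BY title",
        (category,),
    ).fetchall()
    return [{"title": title, "overview": overview} for title, overview in rows]


conn = build_store()
candidates = find_candidates(conn, "historical novels")
```

The resulting `candidates` list corresponds to the input that the server would hand to the generative AI model in Step 4.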
[0295] (Application Example 1)
[0296] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0297] In today's information-saturated world, it's difficult for users to efficiently find content that interests them. Furthermore, they need to understand the content within a limited timeframe and smoothly complete the entire purchase process. However, existing systems require considerable effort and time to retrieve text information and complete the purchase process, placing a burden on users. This challenge needs to be addressed.
[0298] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0299] In this invention, the server includes means for receiving target information from a user and processing it to identify information about relevant content; means for generating a summary of the identified content using a generative model; and means for providing and streaming the generated summary as audio data. This allows users to efficiently obtain information about content of interest via audio and to proceed smoothly with subsequent purchasing procedures.
[0300] "User-provided target information" refers to information about books or content that the user is searching for, including input such as genre and title.
[0301] "Related content information" refers to detailed data on books and content that match the user's interests and search queries, identified based on the target information provided by the user.
[0302] A "generative model" is an algorithm that utilizes artificial intelligence technology to automatically create a summary of content from input data.
[0303] "Means of providing and streaming as audio data" refers to the technology and processes for converting the generated summary into an audio format so that users can listen to it in real time.
[0304] An "online sales system" refers to a platform that allows users to purchase content via the internet, enabling them to directly complete the purchase process.
[0305] "Rating information" refers to data in which users record their impressions and ratings of content; it is shared with and accessible to other users.
[0306] The system that implements this invention consists of a user, a server, and a terminal. Each component and its processing are described in detail below.
[0307] The server is mainly responsible for data processing and generation, and has a search function to identify relevant content based on the information provided by the user. To achieve this, the server utilizes a generative AI model using natural language processing technology to extract corresponding content from the database based on the genre or title specified by the user. In this process, the generative model uses an algorithm to analyze text data and generate a summary.
[0308] The summary of the generated content is processed in real-time as audio data and transmitted to the user's terminal via streaming technology. In this process, a speech synthesis system for generating audio data is utilized. The speech synthesis system converts text into audio data, enabling the user to listen with their ears instead of reading with their eyes.
[0309] If the user is interested in the content they have viewed, they can easily purchase the content through the provided link to the online sales system. This integration provides a seamless experience for the user on the terminal. After the purchase procedure is completed, the user can input and share evaluation information about the content. The terminal transmits this evaluation information to the server, and the server saves it in the database. The evaluation information can be viewed by other users.
[0310] As a specific example, if a user wants to search for "near-future technology and science fiction", the user inputs the corresponding information from their smartphone. Then, they can listen to the smoothly streamed audio, check the content, and if they are interested, they can easily purchase it through the platform.
[0311] An example of a prompt sentence is "Please provide a program that generates a list of the latest books on near-future technology and science fiction, obtains a summary of the first book, and streams it as audio."
[0312] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0313] Step 1:
[0314] The user enters relevant information, such as genre and title, into the device. Once the user has finished entering the information, it is sent to the server by the device. The entered information is then used for data processing in the next step.
[0315] Step 2:
[0316] The server searches the database based on the received target information and identifies relevant content. In this process, a generative AI model analyzes the input data using natural language processing and extracts relevant content. The extracted information is output as basic data for summary generation.
[0317] Step 3:
[0318] The server uses a generative AI model to create a summary of the identified content. This process uses data calculations to extract the key parts of the content as a summary and converts them into text format. The generated summary becomes the input for speech conversion in the next step.
[0319] Step 4:
[0320] The server converts the generated summary into audio data using speech synthesis. In the speech conversion process, the text data is converted into digital audio data using a synthesis algorithm, and this audio data is streamed to the terminal. The output audio data is available for the user to listen to.
[0321] Step 5:
[0322] When a user listens to the provided audio data and wishes to purchase content they are interested in, the device displays a link to an online sales system received from the server. This link is used as input for the purchase process. Using the link, the user can easily purchase the content.
[0323] Step 6:
[0324] After a user uses content, they enter their evaluation information into their device. The device then sends this evaluation information to the server. The server stores the received information in a database, making it accessible to other users. This evaluation data will be used in the future to select recommended content, among other things.
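As a non-limiting illustration, the streaming of audio data in Step 4 above may be sketched as delivery in fixed-size chunks. The function `synthesize_placeholder` is a hypothetical stand-in for a speech synthesis system and returns plain bytes rather than real audio; the chunk size is likewise an assumption.

```python
from typing import Iterator


def synthesize_placeholder(text: str) -> bytes:
    """Hypothetical stand-in for speech synthesis: returns bytes, not real audio."""
    return text.encode("utf-8")


def stream_chunks(audio: bytes, chunk_size: int = 8) -> Iterator[bytes]:
    """Yield the audio data piece by piece so that playback can start early."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


audio = synthesize_placeholder("A short summary of the first book.")
# The terminal side reassembles (or plays) the chunks in arrival order.
received = b"".join(stream_chunks(audio, chunk_size=8))
```

Delivering chunks through a generator lets the terminal begin playback before the whole summary has been transferred, which is the property the streaming step relies on.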
[0325] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.
[0326] The embodiments for carrying out the present invention are described in detail below.
[0327] This system uses a terminal with an initial interface where the user inputs target information such as the book title, author name, and genre. Once the user has finished inputting, the terminal sends that information to the server as an HTTP request. The server analyzes this information and identifies the relevant publications from its database.
[0328] In particular, this invention incorporates an emotion engine that also collects additional information, including the user's emotions. The emotion engine has the function of analyzing the user's emotional state from the user's input, voice tone, text patterns, and so on.
[0329] The server utilizes generative models based on natural language processing technology to automatically create summaries of identified publications. These summaries are tailored to the user's emotional state, allowing for emotionally richer expression. The summaries are also converted into audio data, making them accessible to the user aurally.
[0330] Users can not only grasp the book's overview by playing this audio summary through their device, but also receive information on recommended books based on the emotion engine's analysis. This feature makes it easy for users to find books that match their current emotions.
[0331] Furthermore, if a user is interested after reviewing the summary, they can purchase the publication directly through a link provided by the server. This link facilitates access to the appropriate online sales platform for the user. In addition, users can enter their impressions and ratings after reading through the terminal interface, and this information is stored in a database by the server and shared with other users.
[0332] For example, if a user is feeling stressed and wants to find recommended books on relaxation, the user's input and emotional tone are analyzed by the emotion engine. If the analysis indicates a desire to "calm down," the system prioritizes summarizing and recommending relaxation books and content optimized for that emotion.
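As a non-limiting illustration, the prioritization described above may be sketched as a stable sort that moves publications tagged with the estimated emotion to the front of the candidate list. The `moods` tag on each record and the sample titles are hypothetical; the embodiment does not specify how publications are associated with emotions.

```python
def prioritize(candidates: list[dict], desired_mood: str) -> list[dict]:
    """Place candidates tagged with the desired mood first, keeping original order otherwise."""
    # The key is False (sorts first) for a match and True for a non-match;
    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=lambda c: desired_mood not in c["moods"])


books = [
    {"title": "Thriller X", "moods": {"excite"}},
    {"title": "Relaxation Y", "moods": {"calm down"}},
    {"title": "Essay Z", "moods": {"calm down", "reflect"}},
]
ranked = prioritize(books, "calm down")
```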
[0333] This system allows users to receive more personalized information and suggestions, enabling them to make high-quality choices and purchases even within a limited timeframe.
[0334] The following describes the processing flow.
[0335] Step 1:
[0336] The user opens the device's interface, enters the book title, author name, or genre information into the search bar, and also describes their emotional state by typing or voice. The device collects this information.
[0337] Step 2:
[0338] The device compiles the collected information into an HTTP request and sends it to the server. The request includes book information and user sentiment information.
[0339] Step 3:
[0340] The server receives the request and, taking the user's sentiment into account, generates and executes a query to retrieve information about relevant publications from the database.
[0341] Step 4:
[0342] The server generates summaries using a generative model that employs natural language processing techniques, based on the publication information in the search results. The generated summaries are then adjusted by an emotion engine to match the user's emotions.
[0343] Step 5:
[0344] The server converts the adjusted summary into audio data and provides it to the terminal in a streamable format.
[0345] Step 6:
[0346] The device displays and plays received summaries and audio data on the user interface, allowing users to receive the necessary information visually and audibly.
[0347] Step 7:
[0348] Users can review the played summary, and if interested, click the link on their device to access the online sales platform directly and purchase the book. This link will take them to the purchase page for the book.
[0349] Step 8:
[0350] After purchase, users enter their evaluation information about the publication from their device and submit their feedback. The device then sends this information to the server.
[0351] Step 9:
[0352] The server stores the received evaluation information in a database and updates the information so that other users can access it. This facilitates information sharing within the user community.
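As a non-limiting illustration, Steps 8 and 9 above (storing evaluation information and making it accessible to other users) may be sketched as follows. The table layout, the identifiers such as `pub-001`, and the 1-to-5 star scale are assumptions introduced for this sketch only.

```python
import sqlite3


def open_rating_store() -> sqlite3.Connection:
    """Create an in-memory database standing in for the server's database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings "
        "(publication_id TEXT, user_id TEXT, stars INTEGER, comment TEXT)"
    )
    return conn


def store_rating(conn, publication_id: str, user_id: str, stars: int, comment: str) -> None:
    """Validate and store one piece of evaluation information."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO ratings VALUES (?, ?, ?, ?)",
            (publication_id, user_id, stars, comment),
        )


def shared_ratings(conn, publication_id: str) -> list[dict]:
    """Return all ratings for a publication in the order they were stored."""
    rows = conn.execute(
        "SELECT user_id, stars, comment FROM ratings "
        "WHERE publication_id = ? ORDER BY rowid",
        (publication_id,),
    ).fetchall()
    return [{"user_id": u, "stars": s, "comment": c} for u, s, c in rows]


store = open_rating_store()
store_rating(store, "pub-001", "user-a", 5, "Calming and easy to follow.")
store_rating(store, "pub-001", "user-b", 3, "Helpful overview.")
visible = shared_ratings(store, "pub-001")
```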
[0353] (Example 2)
[0354] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".
[0355] In modern information retrieval systems, users need to select appropriate content from a large amount of information, and this process is cumbersome. Furthermore, it is difficult to present information flexibly according to the user's emotional state, making it challenging to provide information that meets individual needs.
[0356] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0357] In this invention, the server includes means for analyzing the user's emotional state and identifying information in relevant publications; means for using a generative model to generate a summary tailored to the user's emotional state; and means for providing the generated summary in audio format. This enables the provision of information customized based on the user's emotions.
[0358] A "user" is an individual or group that uses a system or service.
[0359] "Target information" refers to information about the data that the user wishes to search for or retrieve, and includes specific book titles, author names, genres, etc.
[0360] "Emotional state" refers to the psychological or emotional state a user experiences when entering information or performing a search, and is part of the data that the system analyzes.
[0361] A "generative model" is an algorithm or machine learning program that uses natural language processing techniques to automatically create summaries of publications.
[0362] "Audio format" refers to the conversion of generated text data into audio data that users can access aurally.
[0363] A "link" is a means of connection that allows a user to directly access another online platform or webpage by clicking on it.
[0364] "Rating information" refers to data that expresses users' impressions and opinions about publications, and functions as feedback that is shared with other users.
[0365] An "information storage device" is an electronic device or system for storing and managing information, such as a database or cloud storage.
[0366] "Natural language processing technology" refers to the technology that enables computers to understand, analyze, and generate human language, and is used for sentiment analysis of text and automatic summarization.
[0367] This invention begins with a user inputting target information such as book title, author name, and genre using an information terminal. The terminal formats the input information as structured data and sends it to the server via an HTTP request. The server plays a central role in data processing and utilizes speech analysis systems and natural language processing technologies to analyze the received information.
[0368] The server analyzes the user's emotional state using an emotion engine, along with the information provided by the user. This analysis employs methods to evaluate voice tone and text patterns, thereby identifying the user's psychological state. Specifically, it uses Google's speech recognition API to convert voice data into text, and NLTK, a natural language processing library suitable for text analysis.
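As a non-limiting illustration of the text-pattern part of this analysis, the following sketch estimates an emotional state by counting keyword matches. It deliberately uses only the standard library and is a simplified stand-in, not the NLTK-based or speech-based analysis named above; the keyword lists and the labels are hypothetical.

```python
import re

# Hypothetical keyword lists; a real emotion engine would use a trained model.
EMOTION_KEYWORDS = {
    "calm down": {"stress", "stressed", "calm", "relax", "relaxation"},
    "cheer up": {"sad", "lonely", "gloomy"},
}


def estimate_emotion(text: str) -> str:
    """Return the label whose keywords appear most often, or "neutral" if none match."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {
        label: len(words & keywords)
        for label, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


state = estimate_emotion(
    "I'm looking for a book to relieve stress. I want to calm down."
)
```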
[0369] The server, having received the analysis results, identifies relevant publications from its database and automatically generates summaries using a generative AI model. This process utilizes large-scale generative AI models, such as the GPT (Generative Pre-trained Transformer) series. The summaries are adjusted according to the user's emotional state, and the generated summaries are converted into audio format. Text-to-speech technologies such as Amazon Polly and Google Text-to-Speech are used for speech synthesis.
[0370] The user receives an audio summary provided by their device. By listening to this audio summary, the user can easily grasp the book's overview. In addition, the server also provides information on recommended books optimized for the user's emotions.
[0371] Interested users can purchase related publications directly online via links provided by the server. These links facilitate access to appropriate online sales platforms. User feedback and ratings after purchase are entered via their devices and stored in a database by the server. This information is shared with other users and helps improve the system's recommendation accuracy.
[0372] For example, if a user enters the prompt, "I'm looking for a book to relieve stress. I want to calm down," the emotion engine analyzes that psychological state as "I want to calm down." Based on this result, the server generates summaries of relaxation-related books and provides them as audio. By listening to this, the user can easily find a book that matches their mood.
[0373] As described above, the present invention provides a system that responds to the individual needs of users and enables the provision of information that is tailored to their emotions.
[0374] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0375] Step 1:
[0376] The user uses the terminal to input relevant information such as book title, author name, and genre. The entered information is stored on the terminal as text data. Once the user has entered all the information, the terminal converts this information into structured data in JSON format and prepares it for transmission to the server.
[0377] Step 2:
[0378] The device sends structured data to the server as an HTTP request. The sent request includes book information entered by the user and prompts for sentiment estimation. The server receives the request and validates its contents for data analysis. The server performs this process using backend technologies such as Python or Node.js.
[0379] Step 3:
[0380] The server searches the database based on the received data. Data queries use SQL or NoSQL to identify relevant publication information. This search process extracts records that match the keywords entered by the user. Search results include information such as book title, author name, and ISBN.
[0381] Step 4:
[0382] The server activates an emotion engine and analyzes the prompts and voice data sent by the user. Here, the speech recognition system converts the data into text, and then a natural language processing algorithm is applied. As a result of the analysis, the user's emotional state is identified, and this information, along with search results from the database, is used for the next processing step.
[0383] Step 5:
[0384] The server generates summaries of publications using a generative AI model. The AI model automatically generates summaries that take into account the acquired publication information and the user's emotional state. This generative model uses natural language processing techniques, and the generated summaries are emotionally optimized.
[0385] Step 6:
[0386] The server converts the generated summary into audio data. Text-to-speech technology is used for this conversion. During this synthesis process, the generated summary is prepared to be delivered to the user as natural-sounding speech.
[0387] Step 7:
[0388] The server sends audio data to the terminal, allowing the user to play an audio summary. Through this audio, the user can listen to and understand the book's overview. Audio playback allows the user to acquire information without relying on visual cues.
[0389] Step 8:
[0390] Users input their impressions and book ratings after listening to an audio summary into their device. The device sends this information to a server, which stores the received ratings in a database. This allows users to share their ratings with other users and contribute to improving the system's recommendation accuracy.
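As a non-limiting illustration, Steps 1 and 2 above (converting the input into structured JSON data on the terminal and validating it on the server) may be sketched as follows. The field names, including `emotion_prompt`, and the validation rules are assumptions; the embodiment specifies only that the data is JSON and is validated.

```python
import json

# Hypothetical field names for the structured data.
REQUIRED_FIELDS = ("title", "author", "genre", "emotion_prompt")


def build_request_body(title: str, author: str, genre: str, emotion_prompt: str) -> str:
    """Terminal side: serialize the user's input as a JSON request body."""
    return json.dumps(
        {
            "title": title,
            "author": author,
            "genre": genre,
            "emotion_prompt": emotion_prompt,
        },
        ensure_ascii=False,
    )


def validate_request_body(body: str) -> dict:
    """Server side: parse the body and check that it is usable for a search."""
    data = json.loads(body)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not any(data[field] for field in ("title", "author", "genre")):
        raise ValueError("at least one search field must be non-empty")
    return data


body = build_request_body("", "", "relaxation", "I want to calm down")
request = validate_request_body(body)
```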
[0391] (Application Example 2)
[0392] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."
[0393] Modern users lead busy lives and often find it difficult to find the right publications quickly. Furthermore, there is a lack of easy ways to find content that resonates with their current emotions. Additionally, there is a lack of systems that provide emotionally relevant summaries and support quick purchase decisions. Technologies that effectively address these challenges are needed.
[0394] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0395] In this invention, the server includes means for receiving subject information and emotional state from the user and identifying relevant publications based thereon; means for using a generative model that generates summaries of publications tailored to the user's emotions; and means for providing the summaries as emotionally expressed audio data. This enables the user to quickly identify content that matches their current emotions and smoothly proceed through the purchase process.
[0396] "User-generated information" refers to data entered by users regarding books, authors, genres, etc.
[0397] "Emotional state" refers to information that indicates the user's current emotions and is obtained by analyzing their voice tone and text patterns.
[0398] "Relevant publications" refers to books and literary works that are identified as suitable based on the user's input and emotional state.
[0399] A "generative model" is an algorithm or program that uses natural language processing techniques to generate summaries of identified publications.
[0400] "Emotionally rich expression as audio data" refers to the process of converting the generated summary into audio and providing expression that includes tone and nuances that correspond to the user's emotions.
[0401] "Links to purchase publications via online purchasing platforms" refers to means of connecting to e-commerce sites where publications can be purchased, which are directly accessible from the summary.
[0402] "Rating information" refers to data that users record within the system, sharing their impressions and ratings of publications with other users.
[0403] The system implementing this invention identifies relevant publications based on user input information and emotions, provides them as audio summaries, and enables immediate purchase. The hardware and software configuration of this system is described below.
[0404] The terminal uses a smartphone or tablet as the user interface. On this terminal, an emotion analysis API for analyzing emotional states and a data input module for receiving user input run. For example, a general natural language processing API (e.g., OpenAI's emotion analysis tool) can be used for the emotion analysis API.
[0405] The server plays a central role in managing data processing and generative models. It receives information from users as HTTP requests and searches for relevant publications in a database (e.g., an SQL-based book database). It uses natural language processing techniques to generate summaries using generative AI models (e.g., GPT models).
[0406] The generated summary is converted into emotionally resonant audio data using a speech synthesis API (e.g., Google Cloud Text-to-Speech). This audio data is sent to the user's device, allowing them to listen to it in audio format.
[0407] Furthermore, as a purchase function, the server integrates with online sales platforms and generates appropriate purchase links according to user requests. By using electronic payment APIs (e.g., Stripe), a seamless purchase process can be achieved.
[0408] For example, if a user is feeling the emotion of "wanting to calm down," the emotion analysis API analyzes the emotions related to "relaxation" based on the information received from the device. The server identifies the relevant relaxation book and creates a summary using a generative AI model. Then, using a speech synthesis API, this summary can be converted into a pleasant tone of voice and delivered to the user.
[0409] An example of a prompt for a generative AI model might be: "Current emotion: I want to calm down. Please summarize books on relaxation. Please use a warm tone that matches my emotion."
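As a non-limiting illustration, a prompt of the form shown above may be assembled from the identified emotion and the topic of interest. The function name and its parameters are hypothetical; only the wording of the prompt follows the example above.

```python
def build_summary_prompt(emotion: str, topic: str, tone: str = "warm") -> str:
    """Assemble a prompt for the generative AI model from emotion, topic, and tone."""
    return (
        f"Current emotion: {emotion}. "
        f"Please summarize books on {topic}. "
        f"Please use a {tone} tone that matches my emotion."
    )


prompt = build_summary_prompt("I want to calm down", "relaxation")
```

Keeping the emotion, topic, and tone as separate parameters allows the same template to be reused when the emotion analysis returns a different result.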
[0410] In this way, a system is provided that allows users to instantly select publications that match their emotions, listen to summaries, and purchase them in a smooth and seamless manner.
[0411] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0412] Step 1:
[0413] The device receives information and voice tone from the user as input. This information includes book title, author name, genre, etc. Once the user has finished inputting, the device calls an emotion analysis API to analyze the voice tone and identify the emotional state.
[0414] Step 2:
[0415] The server uses the target information and identified emotional state received from the terminal as input to query the database. It constructs a query and retrieves data to identify related publications. The retrieved data is output as information on multiple related books.
[0416] Step 3:
[0417] The server takes relevant book information as input and uses a generative AI model to generate summaries of each book. By creating prompts and passing them to the model, the server obtains as output summaries produced using natural language processing techniques. These summaries are adapted to the user's emotions.
[0418] Step 4:
[0419] The server passes the generated summary as input to the speech synthesis API. The speech synthesis API converts the summary into audio data and outputs it as audio that matches the emotional state. The audio data is sent to the terminal.
[0420] Step 5:
[0421] The device plays audio data received from the server and provides the user with an audio summary. The user listens to the audio and proceeds if they are interested. Once the audio playback is complete, the device waits for the user's selection.
[0422] Step 6:
[0423] The server generates links to online purchase platforms for books the user has shown interest in. It takes the summary produced by the generative AI model and the link information as input and sends them to the user's device. The user can then purchase the book directly by clicking this link.
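As a non-limiting illustration, the generation of a purchase link in Step 6 may be sketched as follows. The base URL `https://store.example.com/buy`, the query parameter names, and the placeholder identifier are all hypothetical and do not correspond to any actual sales platform.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; a real deployment would use the sales platform's own URL scheme.
STORE_BASE_URL = "https://store.example.com/buy"


def build_purchase_link(isbn: str, source: str = "audio_summary") -> str:
    """Build a link to the purchase page, URL-encoding every parameter value."""
    query = urlencode({"isbn": isbn, "ref": source})
    return f"{STORE_BASE_URL}?{query}"


# "0000000000000" is a placeholder, not a real ISBN.
link = build_purchase_link("0000000000000")

# Parsing the link back confirms that the parameters survive encoding.
params = parse_qs(urlsplit(link).query)
```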
[0424] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0425] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions are input to the data generation model 58, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0426] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.
[0427] [Third Embodiment]
[0428] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.
[0429] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.
[0430] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).
[0431] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.
[0432] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.
[0433] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0434] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0435] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0436] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0437] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0438] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0439] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".
[0440] The embodiments for carrying out the present invention are described in detail below.
[0441] This system uses a terminal equipped with an interface for users to input target information such as book titles, author names, and genres. The terminal receives the user's input and sends a search request to the server based on it. The server analyzes the received information and identifies relevant publications from its database.
[0442] The server utilizes a generative model employing natural language processing techniques to automatically generate summaries of identified publications. These summaries are also provided as audio data, making them accessible to users both visually and aurally. The terminal then plays this audio data, enabling users to receive the necessary information auditorily.
[0443] Furthermore, the server provides links to relevant online sales platforms based on the generated summaries, allowing users to purchase publications directly. These links enable users to easily proceed with the purchase process after reviewing the summaries.
[0444] Furthermore, users can submit their feedback and ratings of publications through the interface. The terminal sends this feedback to the server, which stores it in a database and shares it with other users.
[0445] As a concrete example, a user searching for "fantasy novels" types "fantasy" into the terminal. The terminal sends this information to the server, which generates a list of relevant fantasy novels together with their summaries. The user selects a work of interest and plays its summary to receive the information by ear. If the user wishes to purchase the work after listening to the summary, they can access the sales platform via the provided link and complete the purchase. Afterwards, the user can enter a review of the work they have read and share it with other users.
[0446] This allows users to select the book that best suits them, even within a limited time, and gain a deeper understanding.
[0447] The following describes the processing flow.
[0448] Step 1:
[0449] The user opens the terminal interface and enters relevant information such as the book title, author name, and genre into the search bar. Once the input is complete, they press the search button.
[0450] Step 2:
[0451] The terminal receives the user's search request and sends that information to the server as an HTTP request. The request includes the information entered by the user and asks the server to process it.
[0452] Step 3:
[0453] The server parses the received request and generates a database query to search for information on relevant publications. In this process, it obtains information on books that match the user's request as structured data.
[0454] Step 4:
[0455] Based on the book information obtained as search results, the server invokes a generative model using natural language processing techniques to generate summaries of each identified publication.
[0456] Step 5:
[0457] The server then initiates the process of converting the generated summary into audio data. This audio data is then adjusted so that the user can receive the summary audibly.
[0458] Step 6:
[0459] The terminal provides the user with summary information and audio data received from the server. The user can select a summary of interest from the list of book information displayed on the terminal's interface and listen to the audio summary by pressing the play button.
[0460] Step 7:
[0461] If the user wishes to purchase the publication after reviewing the summary, they can access the online sales platform directly via a link displayed on the terminal. This link leads to the purchase page for the selected publication.
[0462] Step 8:
[0463] After purchase, users can enter their feedback and ratings for the publication via their device. The entered rating information is then sent from the device to the server.
[0464] Step 9:
[0465] The server stores the received evaluation information in a database and updates the data so that it can be referenced in future searches and by other users. This information can also be used to make suggestions and recommendations to other users.
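Steps 3 to 7 above can be sketched as a single server-side function. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the `Book` and `SummaryResult` types, the `run_pipeline` function, and the `store.example.com` URL are hypothetical, and the generative model and speech synthesis are passed in as plain callables so that simple stand-ins can be used.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import quote


@dataclass
class Book:
    title: str
    author: str
    genre: str
    description: str


@dataclass
class SummaryResult:
    title: str
    summary_text: str
    audio: bytes
    purchase_url: str


def run_pipeline(
    query: str,
    catalog: List[Book],
    summarize: Callable[[Book], str],
    synthesize: Callable[[str], bytes],
    store_base_url: str = "https://store.example.com/books",  # hypothetical sales platform
) -> List[SummaryResult]:
    """Search the catalog, summarize each match, convert to audio, attach a link."""
    needle = query.lower()
    matches = [
        b for b in catalog
        if needle in b.title.lower()
        or needle in b.author.lower()
        or needle in b.genre.lower()
    ]
    results = []
    for book in matches:
        text = summarize(book)  # stands in for the generative model (Step 4)
        results.append(
            SummaryResult(
                title=book.title,
                summary_text=text,
                audio=synthesize(text),  # stands in for speech synthesis (Step 5)
                purchase_url=f"{store_base_url}?title={quote(book.title)}",
            )
        )
    return results


# Stand-ins used only for illustration: neither is a real model or TTS engine.
def first_sentence(book: Book) -> str:
    return book.description.split(".")[0].strip() + "."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # placeholder bytes, not playable audio


CATALOG = [
    Book("The Glass Tower", "A. Writer", "fantasy",
         "A young mage climbs a tower. Trouble follows."),
    Book("Orbit Zero", "B. Author", "science fiction",
         "A crew drifts past Saturn. Supplies run low."),
]

results = run_pipeline("fantasy", CATALOG, first_sentence, fake_tts)
```

Passing the summarizer and synthesizer as arguments keeps the flow testable without committing to any particular model or speech service.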
[0466] (Example 1)
[0467] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0468] A challenge lies in the lack of appropriate methods for quickly and efficiently acquiring and understanding information resources. In particular, a consistent system is needed that allows users to obtain summaries of information resources not only visually but also aurally, and to easily purchase related information resources and share evaluation information.
[0469] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0470] In this invention, the server includes a device for receiving higher-level category information from a user and identifying information on related information resources based on that information; a device for using a generative model to generate an overview of the identified information resources; and a device for providing the generated overview as audio data. This enables users to efficiently select and deeply understand information resources, as well as to share purchase and evaluation information.
[0471] "Higher-level category information" refers to general identifying information related to a specific subject, and serves as the basis for users to search for information resources.
[0472] "Information resources" refer to a collection of information provided to users, and include a variety of media such as books, documents, and articles.
[0473] "Device" refers to a mechanical or electronic component designed to perform a specific function, including hardware and software for performing a specific process.
[0474] A "generative model" is an algorithm or system designed to produce output based on input data, and in particular uses natural language processing techniques to generate summaries of information resources.
[0475] "Audio data" refers to digital audio data used to transmit information as sound, providing users with information through means other than sight.
[0476] An "e-commerce platform" refers to an online marketplace used by users to purchase products and services online, enabling the trading of a wide variety of goods and services.
[0477] This system consists of user terminals and a server that processes data. Users can input specific higher-level category information through the terminal's interface. The input information is then transmitted from the terminal to the server. The server utilizes a large database to quickly identify information resources. The server is equipped with database management software for executing SQL queries.
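The SQL lookup mentioned above might look like the following sketch. It assumes a hypothetical `resources` table with `title`, `author`, and `category` columns, held in an in-memory SQLite database purely for illustration; the disclosure does not specify a schema or a database product.

```python
import sqlite3


def find_resources(conn: sqlite3.Connection, category: str, limit: int = 5):
    """Return resources in the given category, using a parameterized query."""
    rows = conn.execute(
        "SELECT title, author FROM resources "
        "WHERE category = ? ORDER BY title LIMIT ?",
        (category, limit),
    ).fetchall()
    return [{"title": title, "author": author} for title, author in rows]


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (title TEXT, author TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [
        ("The Last Legion", "C. Author", "historical novels"),
        ("Amber Road", "D. Writer", "historical novels"),
        ("Orbit Zero", "B. Author", "science fiction"),
    ],
)

historical = find_resources(conn, "historical novels")
```

Binding the category with `?` placeholders rather than string formatting keeps user input out of the SQL text itself.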
[0478] The server uses the received information to identify relevant information resources and automatically generates summaries of those resources using a generative AI model. The generative AI model incorporates algorithms that utilize natural language processing techniques. This technology allows for the extraction of key points from large amounts of information and the presentation of summaries in an easily understandable format for the user.
[0479] The generated summary is converted into audio data and provided to the user via the device. Software for generating speech from text is used for this conversion. This allows the user to receive information not only visually but also aurally.
[0480] The server also has the functionality to generate connection destinations that facilitate direct connections to e-commerce platforms from summaries. This allows users to quickly purchase relevant information resources via their devices. Furthermore, users can input their impressions and feedback via their devices after reading and send them to the server, and this information is shared with other users.
[0481] As a concrete example, if a user is searching for "historical novels," they would type "historical novels" into their device. The device would send this information to the server, which would then compile information on relevant historical novels and generate a summary. The summary would then be made available for audible review, and relevant purchase links would be provided. An example of a prompt would be, "Please recommend some books on historical novels." This system allows users to efficiently identify information resources even with limited information and make quick purchases.
[0482] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0483] Step 1:
[0484] The user uses their device to input higher-level category information to search for a specific information resource. This action generates a search query as text data, which is then provided to the device as input data. The device receives this input and prepares to send it to the server.
[0485] Step 2:
[0486] The terminal sends the input data obtained from the user to the server. The server receives the text data and begins analyzing it to search for relevant information resources. In this analysis, the server generates SQL queries based on the input category information; these queries are used to retrieve information resources from the data store.
[0487] Step 3:
[0488] The server uses SQL queries to search for information resources within the data store. Based on the search results, it outputs a list of candidate information resources. This list is used as input for subsequent processing. The server prepares the candidate list to be provided as input data to the generative AI model.
[0489] Step 4:
[0490] The server uses a generative AI model to generate a summary of the candidate list provided as input data. In this process, natural language processing is used to extract the key points of the information resources and create a concise summary. The generated summary is output as text data.
[0491] Step 5:
[0492] The server initiates a process to convert the generated summary into audio data. This process uses text-to-speech (TTS) software, enabling the information to be presented to the user audibly. The server then prepares to send the audio data to the terminal.
[0493] Step 6:
[0494] The device plays audio data received from the server, allowing users to listen to summaries in audio format. After obtaining information through audio, users can purchase information resources of interest via links to e-commerce platforms provided by the server.
[0495] Step 7:
[0496] After using the information resources, the user inputs feedback information through the terminal's interface. The terminal then prepares to send this feedback data to the server. Once this information is sent, the server stores the feedback information in its data store and outputs it in a format that can be shared with other users.
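The feedback handling in Step 7 can be illustrated with a small in-memory store. This is a sketch under assumptions not stated in the disclosure: the `FeedbackStore` class, the 1-to-5 rating scale, and the shape of the shared view are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


class FeedbackStore:
    """Keeps feedback per information resource and exposes a shareable view."""

    def __init__(self):
        self._by_resource = defaultdict(list)

    def add(self, resource_id: str, user_id: str, rating: int, comment: str = ""):
        # Assumed scale: integer ratings from 1 (worst) to 5 (best).
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._by_resource[resource_id].append(
            {"user_id": user_id, "rating": rating, "comment": comment}
        )

    def shared_view(self, resource_id: str) -> dict:
        """Aggregate view for other users; user identifiers are left out."""
        entries = self._by_resource.get(resource_id, [])
        if not entries:
            return {"count": 0, "average": None, "comments": []}
        return {
            "count": len(entries),
            "average": round(mean(e["rating"] for e in entries), 2),
            "comments": [e["comment"] for e in entries if e["comment"]],
        }


store = FeedbackStore()
store.add("book-1", "user-a", 4, "Clear and engaging.")
store.add("book-1", "user-b", 5)
view = store.shared_view("book-1")
```

The shared view returns only aggregates and comments, which is one possible way to share evaluations without exposing who wrote them.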
[0497] (Application Example 1)
[0498] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0499] In today's information-saturated world, it's difficult for users to efficiently find content that interests them. Furthermore, they need to understand the content within a limited timeframe and smoothly complete the entire purchase process. However, existing systems require considerable effort and time to retrieve text information and complete the purchase process, placing a burden on users. This challenge needs to be addressed.
[0500] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0501] In this invention, the server includes means for receiving target information from a user and processing it to identify information about relevant content; means for generating a summary of the identified content using a generative model; and means for providing and streaming the generated summary as audio data. This allows users to efficiently obtain information about content of interest via audio and to proceed smoothly with subsequent purchasing procedures.
[0502] "User-provided target information" refers to information about books or content that the user is searching for, including input such as genre and title.
[0503] "Related content information" refers to detailed data on books and content that match the user's interests and search queries, identified based on the target information provided by the user.
[0504] A "generative model" is an algorithm that utilizes artificial intelligence technology to automatically create a summary of content from input data.
[0505] "Means of providing and streaming as audio data" refers to the technology and processes for converting the generated summary into an audio format so that users can listen to it in real time.
[0506] An "online sales system" refers to a platform that allows users to purchase content via the internet, enabling them to directly complete the purchase process.
[0507] "Rating information" refers to data in which users record their impressions and ratings of content, and which is shared with and accessible to other users.
[0508] The system that implements this invention consists of a user, a server, and a terminal. Each component and its processing are described in detail below.
[0509] The server is primarily responsible for processing and generating data, and has a search function to identify relevant content based on information provided by the user. To achieve this, the server utilizes a generative AI model with natural language processing technology to extract relevant content from the database based on the genre and title specified by the user. In this process, the generative model uses an algorithm that analyzes text data and generates a summary.
[0510] The generated content summary is processed in real time as audio data and transmitted to the user's device via streaming technology. This process utilizes a speech synthesis system to generate the audio data. The speech synthesis system converts text into audio data, allowing the user to listen instead of reading.
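One way to approximate the real-time streaming described above is to synthesize the summary sentence by sentence and yield the audio in fixed-size chunks, so that playback can begin before the whole summary has been converted. The sketch below is illustrative only: `stream_summary_audio` and its `synthesize` argument are hypothetical names, and the stand-in synthesizer used in the example simply encodes text as bytes.

```python
import re
from typing import Callable, Iterator


def stream_summary_audio(
    summary: str,
    synthesize: Callable[[str], bytes],
    chunk_size: int = 4096,
) -> Iterator[bytes]:
    """Yield audio chunks sentence by sentence instead of one large payload."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    for sentence in sentences:
        audio = synthesize(sentence)  # stands in for the speech synthesis system
        for offset in range(0, len(audio), chunk_size):
            yield audio[offset:offset + chunk_size]


# Stand-in synthesizer: returns the UTF-8 bytes of the text, not real audio.
chunks = list(
    stream_summary_audio(
        "First point. Second point!",
        lambda s: s.encode("utf-8"),
        chunk_size=8,
    )
)
```

Because the function is a generator, a server framework could forward each chunk to the terminal as soon as it is produced.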
[0511] If a user is interested in the content they have listened to, they can easily purchase it through the provided link to the online sales system. This integration gives the user a seamless experience on the device. After completing the purchase process, the user can also enter and share their evaluation of the content. The device sends this evaluation information to the server, which stores it in a database. The evaluation information becomes accessible to other users.
[0512] For example, if a user wants to research "near-future technology and science fiction," they enter the relevant information on their smartphone. They then listen to smoothly streaming audio, review the content, and if interested, easily purchase it through the platform.
[0513] An example of a prompt message is: "Please provide a program that generates a list of the latest books on near-future technology and science fiction, and then streams an audio summary of the first book."
[0514] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0515] Step 1:
[0516] The user enters relevant information, such as genre and title, into the device. Once the user has finished entering the information, it is sent to the server by the device. The entered information is then used for data processing in the next step.
[0517] Step 2:
[0518] The server searches the database based on the received target information and identifies relevant content. In this process, a generative AI model analyzes the input data using natural language processing and extracts relevant content. The extracted information is output as basic data for summary generation.
[0519] Step 3:
[0520] The server uses a generative AI model to create a summary of the identified content. This process uses data calculations to extract the key parts of the content as a summary and converts them into text format. The generated summary becomes the input for speech conversion in the next step.
[0521] Step 4:
[0522] The server converts the generated summary into audio data using speech synthesis. In the speech conversion process, the text data is converted into digital audio data using a synthesis algorithm, and this audio data is streamed to the terminal. The output audio data is available for the user to listen to.
[0523] Step 5:
[0524] When a user listens to the provided audio data and wishes to purchase content they are interested in, the device displays a link to an online sales system received from the server. This link is used as input for the purchase process. Using the link, the user can easily purchase the content.
[0525] Step 6:
[0526] After a user uses content, they enter their evaluation information into their device. The device then sends this evaluation information to the server. The server stores the received information in a database, making it accessible to other users. This evaluation data will be used in the future to select recommended content, among other things.
[0527] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.
[0528] The embodiments for carrying out the present invention are described in detail below.
[0529] This system uses a terminal with an initial interface where the user inputs target information such as the book title, author name, and genre. Once the user has finished inputting, the terminal sends that information to the server as an HTTP request. The server analyzes this information and identifies the relevant publications from its database.
[0530] In particular, this invention incorporates an emotion engine that also collects additional information, including the user's emotions. The emotion engine has the function of analyzing the user's emotional state from the user's input, voice tone, text patterns, and so on.
[0531] The server utilizes generative models based on natural language processing technology to automatically create summaries of identified publications. These summaries are tailored to the user's emotional state, allowing for emotionally richer expression. The summaries are also converted into audio data, making them accessible to the user aurally.
[0532] Users can not only grasp the book's overview by playing this audio summary through their device, but also receive information on recommended books based on the emotion engine's analysis. This feature makes it easy for users to find books that match their current emotions.
[0533] Furthermore, if a user is interested after reviewing the summary, they can purchase the publication directly through a link provided by the server. This link facilitates access to the appropriate online sales platform for the user. In addition, users can enter their impressions and ratings after reading through the terminal interface, and this information is stored in a database by the server and shared with other users.
[0534] For example, if a user is feeling stressed and wants to find recommended books on relaxation, the user's input and emotional tone are analyzed by the emotion engine. If the analysis indicates a desire to "calm down," the system prioritizes summarizing and recommending relaxation books and content optimized for that emotion.
[0535] This system allows users to receive more personalized information and suggestions, enabling them to make high-quality choices and purchases even within a limited timeframe.
[0536] The following describes the processing flow.
[0537] Step 1:
[0538] The user opens the device's interface, enters the book title, author name, or genre information into the search bar, and also describes their emotional state by typing or voice. The device collects this information.
[0539] Step 2:
[0540] The device compiles the collected information into an HTTP request and sends it to the server. The request includes the book information and the user's emotion information.
[0541] Step 3:
[0542] The server receives the request, then generates and executes a query that retrieves information about relevant publications from the database, taking the user's emotional state into account.
[0543] Step 4:
[0544] The server generates summaries using a generative model that employs natural language processing techniques, based on the publication information in the search results. The generated summaries are then adjusted by an emotion engine to match the user's emotions.
[0545] Step 5:
[0546] The server converts the adjusted summary into audio data and provides it to the terminal in a streamable format.
[0547] Step 6:
[0548] The device displays and plays received summaries and audio data on the user interface, allowing users to receive the necessary information visually and audibly.
[0549] Step 7:
[0550] Users can review the played summary, and if interested, click the link on their device to access the online sales platform directly and purchase the book. This link will take them to the purchase page for the book.
[0551] Step 8:
[0552] After purchase, users enter their evaluation information about the publication from their device and submit their feedback. The device then sends this information to the server.
[0553] Step 9:
[0554] The server stores the received evaluation information in a database and updates the information so that other users can access it. This facilitates information sharing within the user community.
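The emotion-aware prioritization in this flow (for example, favoring relaxation books for a stressed user) can be sketched as a simple re-ranking of the search results. The mapping from emotion labels to book tags below is entirely hypothetical, as are the `rank_by_emotion` name and the `tags` field; the disclosure does not define how emotions are matched to publications.

```python
# Hypothetical mapping from an estimated emotion to preferred book tags.
EMOTION_TO_TAGS = {
    "stressed": {"relaxation", "mindfulness"},
    "sad": {"uplifting", "humor"},
    "curious": {"science", "history"},
}


def rank_by_emotion(books, emotion):
    """Order books by how many of their tags match the emotion, then by title."""
    preferred = EMOTION_TO_TAGS.get(emotion, set())
    return sorted(
        books,
        key=lambda b: (-len(preferred & set(b["tags"])), b["title"]),
    )


books = [
    {"title": "Deep Space", "tags": ["science"]},
    {"title": "Quiet Mind", "tags": ["relaxation", "mindfulness"]},
    {"title": "Easy Breathing", "tags": ["relaxation"]},
]

ranked = [b["title"] for b in rank_by_emotion(books, "stressed")]
```

An unrecognized emotion label falls back to plain alphabetical order, so the search still returns every candidate.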
[0555] (Example 2)
[0556] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0557] In modern information retrieval systems, users need to select appropriate content from a large amount of information, and this process is cumbersome. Furthermore, it is difficult to present information flexibly according to the user's emotional state, making it challenging to provide information that meets individual needs.
[0558] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0559] In this invention, the server includes means for analyzing the user's emotional state and identifying information in relevant publications; means for using a generative model to generate a summary tailored to the user's emotional state; and means for providing the generated summary in audio format. This enables the provision of information customized based on the user's emotions.
[0560] A "user" is an individual or group that uses a system or service.
[0561] "Target information" refers to information about the data that the user wishes to search for or retrieve, and includes specific book titles, author names, genres, etc.
[0562] "Emotional state" refers to the psychological or emotional state a user experiences when entering information or performing a search, and is part of the data that the system analyzes.
[0563] A "generative model" is an algorithm or machine learning program that uses natural language processing techniques to automatically create summaries of publications.
[0564] "Audio format" refers to the conversion of generated text data into audio data that users can access aurally.
[0565] A "link" is a means of connection that allows a user to directly access another online platform or webpage by clicking on it.
[0566] "Rating information" refers to data that expresses users' impressions and opinions about publications, and functions as feedback that is shared with other users.
[0567] An "information storage device" is an electronic device or system for storing and managing information, such as a database or cloud storage.
[0568] "Natural language processing technology" refers to the technology that enables computers to understand, analyze, and generate human language, and is used for sentiment analysis of text and automatic summarization.
[0569] This invention begins with a user inputting target information such as book title, author name, and genre using an information terminal. The terminal formats the input information as structured data and sends it to the server via an HTTP request. The server plays a central role in data processing and utilizes speech analysis systems and natural language processing technologies to analyze the received information.
[0570] The server analyzes the user's emotional state using an emotion engine, along with the information provided by the user. This analysis employs methods to evaluate voice tone and text patterns, thereby identifying the user's psychological state. Specifically, it uses Google's speech recognition API to convert voice data into text, and NLTK, a natural language processing library suitable for text analysis.
[0571] The server, having received the analysis results, identifies relevant publications from its database and automatically generates summaries using a generative AI model. This process utilizes large-scale generative AI models, such as the GPT (Generative Pre-trained Transformer) series. The summaries are adjusted according to the user's emotional state, and the generated summaries are converted into audio format. Text-to-speech technologies such as Amazon Polly and Google Text-to-Speech are used for speech synthesis.
[0572] The user receives an audio summary provided by their device. By listening to this audio summary, the user can easily grasp the book's overview. In addition, the server also provides information on recommended books optimized for the user's emotions.
[0573] Interested users can purchase related publications directly online via links provided by the server. These links facilitate access to appropriate online sales platforms. User feedback and ratings after purchase are entered via their devices and stored in a database by the server. This information is shared with other users and helps improve the system's recommendation accuracy.
[0574] For example, if a user enters the prompt, "I'm looking for a book to relieve stress. I want to calm down," the emotion engine analyzes that psychological state as "I want to calm down." Based on this result, the server generates summaries of relaxation-related books and provides them as audio. By listening to this, the user can easily find a book that matches their mood.
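To make the prompt example above concrete, the following sketch maps a text prompt to a coarse emotion label with a keyword lookup. This is a deliberately simple stand-in, not the emotion engine of the disclosure: it does not use the speech recognition API or NLTK named earlier, and the labels, keyword lists, and `estimate_emotion` function are all hypothetical.

```python
import re

# Hypothetical labels and keywords, chosen only for illustration.
EMOTION_KEYWORDS = {
    "calm_down": ("stress", "stressed", "calm", "relax", "anxious"),
    "cheer_up": ("sad", "lonely", "gloomy", "cheer"),
    "get_excited": ("bored", "thrill", "adventure", "exciting"),
}


def estimate_emotion(prompt: str) -> str:
    """Return the label whose keywords occur most often, or 'neutral'."""
    words = re.findall(r"[a-z']+", prompt.lower())
    scores = {
        label: sum(words.count(keyword) for keyword in keywords)
        for label, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


label = estimate_emotion(
    "I'm looking for a book to relieve stress. I want to calm down."
)
```

A production system would presumably replace the keyword table with a trained classifier or the emotion identification model, but the interface (text in, label out) could stay the same.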
[0575] As described above, the present invention provides a system that responds to the individual needs of users and enables the provision of information that is tailored to their emotions.
[0576] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0577] Step 1:
[0578] The user uses the terminal to input relevant information such as book title, author name, and genre. The entered information is stored on the terminal as text data. Once the user has entered all the information, the terminal converts this information into structured data in JSON format and prepares it for transmission to the server.
[0579] Step 2:
[0580] The device sends structured data to the server as an HTTP request. The sent request includes book information entered by the user and prompts for sentiment estimation. The server receives the request and validates its contents for data analysis. The server performs this process using backend technologies such as Python or Node.js.
[0581] Step 3:
[0582] The server searches the database based on the received data. Data queries use SQL or NoSQL to identify relevant publication information. This search process extracts records that match the keywords entered by the user. Search results include information such as book title, author name, and ISBN.
[0583] Step 4:
[0584] The server activates an emotion engine and analyzes the prompts and voice data sent by the user. Here, the speech recognition system converts the data into text, and then a natural language processing algorithm is applied. As a result of the analysis, the user's emotional state is identified, and this information, along with search results from the database, is used for the next processing step.
[0585] Step 5:
[0586] The server generates summaries of publications using a generative AI model. The AI model automatically generates summaries that take into account the acquired publication information and the user's emotional state. This generative model uses natural language processing techniques, and the generated summaries are emotionally optimized.
[0587] Step 6:
[0588] The server converts the generated summary into audio data. Text-to-speech technology is used for this conversion. During this synthesis process, the generated summary is prepared to be delivered to the user as natural-sounding speech.
[0589] Step 7:
[0590] The server sends audio data to the terminal, allowing the user to play an audio summary. Through this audio, the user can listen to and understand the book's overview. Audio playback allows the user to acquire information without relying on visual cues.
[0591] Step 8:
[0592] After listening to an audio summary, users enter their impressions and book ratings into their device. The device sends this information to the server, which stores the received ratings in a database. This allows users to share their ratings with other users and contributes to improving the system's recommendation accuracy.
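Steps 1 and 2 of this flow (building JSON-formatted structured data on the terminal and validating it on the server) can be sketched as follows. The field names `book`, `title`, `author`, `genre`, and `emotion_prompt`, and both function names, are assumptions made for this example; the disclosure only states that the data is structured as JSON and validated.

```python
import json

BOOK_FIELDS = ("title", "author", "genre")


def build_request_body(title="", author="", genre="", emotion_prompt=""):
    """Terminal side: serialize the user's input as a JSON string."""
    return json.dumps(
        {
            "book": {"title": title, "author": author, "genre": genre},
            "emotion_prompt": emotion_prompt,
        }
    )


def validate_request_body(body: str) -> dict:
    """Server side: parse the body and check that it is usable for a search."""
    data = json.loads(body)
    if not isinstance(data, dict) or not isinstance(data.get("book"), dict):
        raise ValueError("request must contain a 'book' object")
    book = {f: str(data["book"].get(f, "")).strip() for f in BOOK_FIELDS}
    if not any(book.values()):
        raise ValueError("at least one of title, author, or genre is required")
    return {
        "book": book,
        "emotion_prompt": str(data.get("emotion_prompt", "")).strip(),
    }


body = build_request_body(genre="relaxation", emotion_prompt="I want to calm down.")
parsed = validate_request_body(body)
```

The resulting string would be sent as the body of the HTTP request; the transport itself is omitted here.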
[0593] (Application Example 2)
[0594] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."
[0595] Modern users lead busy lives and often find it difficult to find the right publications quickly. Furthermore, there is a lack of easy ways to find content that resonates with their current emotions. Additionally, there is a lack of systems that provide emotionally relevant summaries and support quick purchase decisions. Technologies that effectively address these challenges are needed.
[0596] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0597] In this invention, the server includes means for receiving subject information and emotional state from the user and identifying relevant publications based thereon; means for using a generative model that generates summaries of publications tailored to the user's emotions; and means for providing the summaries as emotionally rich audio data. This enables the user to quickly identify content that matches their current emotions and smoothly proceed through the purchase process.
[0598] "User-generated information" refers to data entered by users regarding books, authors, genres, etc.
[0599] "Emotional state" refers to information that indicates the user's current emotions and is obtained by analyzing their voice tone and text patterns.
[0600] "Relevant publications" refers to books and literary works that are identified as suitable based on the user's input and emotional state.
[0601] A "generative model" is an algorithm or program that uses natural language processing techniques to generate summaries of identified publications.
[0602] "Emotionally rich expression as audio data" refers to the process of converting the generated summary into audio and providing expression that includes tone and nuances that correspond to the user's emotions.
[0603] "Links to purchase publications via online purchasing platforms" refers to means of connecting to e-commerce sites where publications can be purchased, which are directly accessible from the summary.
[0604] "Rating information" refers to data that users record within the system, sharing their impressions and ratings of publications with other users.
[0605] The system implementing this invention identifies relevant publications based on user input information and emotions, provides them as audio summaries, and enables immediate purchase. The hardware and software configuration of this system is described below.
[0606] The terminal is a smartphone or tablet that serves as the user interface. An emotion analysis API for analyzing emotional states and a data input module for receiving user input run on this terminal. For example, a general natural language processing API (e.g., OpenAI's emotion analysis tool) can be used as the emotion analysis API.
[0607] The server plays a central role in managing data processing and generative models. It receives information from users as HTTP requests and searches for relevant publications in a database (e.g., an SQL-based book database). It uses natural language processing techniques to generate summaries using generative AI models (e.g., GPT models).
[0608] The generated summary is converted into emotionally resonant audio data using a speech synthesis API (e.g., Google Cloud Text-to-Speech). This audio data is sent to the user's device, allowing them to listen to it in audio format.
[0609] Furthermore, as a purchase function, the server integrates with online sales platforms and generates appropriate purchase links according to user requests. By using electronic payment APIs (e.g., Stripe), a seamless purchase process can be achieved.
[0610] For example, if a user wants to calm down, the emotion analysis API identifies an emotion related to "relaxation" based on the information received from the device. The server identifies relevant books on relaxation and creates a summary using a generative AI model. Then, using a speech synthesis API, this summary is converted into audio with a soothing tone of voice and delivered to the user.
[0611] An example of a prompt for a generative AI model might be: "Current emotion: I want to calm down. Please summarize books on relaxation. Please use a warm tone that matches my emotion."
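As a minimal sketch of how a prompt of this form could be assembled from the identified emotion and the subject, the following assumes a hypothetical helper named build_summary_prompt; the function name and its parameters are illustrative and do not appear in the disclosure.

```python
def build_summary_prompt(emotion: str, topic: str, tone: str = "warm") -> str:
    """Compose a prompt in the style of the example prompt in the text.

    `emotion`, `topic`, and `tone` are hypothetical parameter names chosen
    for illustration only.
    """
    return (
        f"Current emotion: {emotion}. "
        f"Please summarize books on {topic}. "
        f"Please use a {tone} tone that matches my emotion."
    )


prompt = build_summary_prompt("I want to calm down", "relaxation")
print(prompt)
```

Keeping the emotion and the tone as separate parameters would allow the same template to be reused when the emotion analysis yields a different result.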
[0612] In this way, a system is provided that allows users to instantly select publications that match their emotions, listen to summaries, and purchase them in a smooth and seamless manner.
[0613] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0614] Step 1:
[0615] The device receives target information and voice tone from the user as input. The target information includes the book title, author name, genre, etc. Once the user has finished inputting, the device calls an emotion analysis API to analyze the voice tone and identify the emotional state.
[0616] Step 2:
[0617] The server uses the target information and identified emotional state received from the terminal as input to query the database. It constructs a query and retrieves data to identify related publications. The retrieved data is output as information on multiple related books.
[0618] Step 3:
[0619] The server takes relevant book information as input and uses a generative AI model to generate summaries of each book. By creating prompts and passing them to the model, it obtains summaries that utilize natural language processing techniques as output. These summaries are adapted to the user's emotions.
[0620] Step 4:
[0621] The server passes the generated summary as input to the speech synthesis API. The speech synthesis API converts the summary into audio data and outputs it as audio that matches the emotional state. The audio data is sent to the terminal.
[0622] Step 5:
[0623] The device plays audio data received from the server and provides the user with an audio summary. The user listens to the audio and proceeds if they are interested. Once the audio playback is complete, the device waits for the user's selection.
[0624] Step 6:
[0625] The server generates links to online purchase platforms for books the user has shown interest in. It takes the summary produced by the generative AI model and the link information as input and sends them to the user's device. The user can then purchase the book directly by clicking this link.
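The flow of Steps 1 to 6 above can be sketched as a single pipeline. This is only an illustration under stated assumptions: the emotion analysis API, database search, generative AI model, speech synthesis API, and link generation are passed in as hypothetical callables and replaced by stubs in the usage example, since the disclosure does not fix their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recommendation:
    title: str
    summary: str
    audio: bytes
    purchase_url: str


def run_pipeline(
    target: str,
    voice_sample: bytes,
    analyze_emotion: Callable[[bytes], str],
    find_books: Callable[[str, str], List[str]],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
    make_link: Callable[[str], str],
) -> List[Recommendation]:
    """Chain the steps; every callable is a hypothetical stand-in."""
    emotion = analyze_emotion(voice_sample)        # Step 1: identify emotional state
    titles = find_books(target, emotion)           # Step 2: query the database
    results = []
    for title in titles:
        summary = summarize(title, emotion)        # Step 3: emotion-adapted summary
        audio = synthesize(summary, emotion)       # Step 4: text to audio data
        link = make_link(title)                    # Step 6: purchase link
        results.append(Recommendation(title, summary, audio, link))
    return results                                 # Step 5: terminal plays `audio`


# Usage with stub implementations (all values are made up for illustration).
demo = run_pipeline(
    "relaxation",
    b"\x00",
    analyze_emotion=lambda voice: "calm",
    find_books=lambda target, emotion: [f"{target} guide"],
    summarize=lambda title, emotion: f"A {emotion} summary of {title}",
    synthesize=lambda text, emotion: text.encode("utf-8"),
    make_link=lambda title: "https://bookstore.example/item?q=" + title.replace(" ", "+"),
)
```

Passing the external services in as parameters keeps the sketch independent of any particular emotion analysis, generative AI, or speech synthesis product.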
[0626] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.
[0627] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0628] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset-type terminal 314.
[0629] [Fourth Embodiment]
[0630] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.
[0631] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.
[0632] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).
[0633] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.
[0634] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.
[0635] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).
[0636] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.
[0637] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.
[0638] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.
[0639] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.
[0640] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.
[0641] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.
[0642] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0643] The embodiments for carrying out the present invention are described in detail below.
[0644] This system uses a terminal equipped with an interface for users to input target information such as book titles, author names, and genres. The terminal receives the user's input and sends a search request to the server based on it. The server analyzes the received information and identifies relevant publications from its database.
[0645] The server utilizes a generative model employing natural language processing techniques to automatically generate summaries of identified publications. These summaries are also provided as audio data, making them accessible to users both visually and aurally. The terminal then plays this audio data, enabling users to receive the necessary information auditorily.
[0646] Furthermore, the server provides links to relevant online sales platforms based on the generated summaries, allowing users to purchase publications directly. These links enable users to easily proceed with the purchase process after reviewing the summaries.
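As one possible illustration of how the server might build such a link, the sketch below encodes a book identifier into a URL with the Python standard library. The domain bookstore.example, the path, and the query parameter names isbn and ref are assumptions made for illustration; the disclosure does not specify the link format of any sales platform.

```python
from urllib.parse import urlencode


def build_purchase_link(isbn: str, base_url: str = "https://bookstore.example/buy") -> str:
    """Return a purchase URL for a publication.

    The base URL and the parameter names are hypothetical placeholders.
    """
    query = urlencode({"isbn": isbn, "ref": "audio-summary"})
    return f"{base_url}?{query}"


link = build_purchase_link("9780000000002")
print(link)
```

Using urlencode rather than string concatenation ensures that identifiers containing spaces or reserved characters are escaped correctly.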
[0647] Furthermore, users can submit their feedback and ratings of publications through an interface. The device sends this feedback to a server, which stores it in a database and shares it with other users.
[0648] As a concrete example, if a user is searching for "fantasy novels," they would type "fantasy" into their device. The device sends this information to a server, which generates a list of relevant fantasy novels and summaries. The user selects a work that interests them and receives information directly through auditory means by playing the summary. If the user wishes to purchase the work after listening to the summary, they can access the sales platform via the provided link and easily complete the purchase. Afterwards, the user can enter a review of the work they have read and share it with other users.
[0649] This allows users to select the book that best suits them, even within a limited time, and gain a deeper understanding.
[0650] The following describes the processing flow.
[0651] Step 1:
[0652] The user opens the terminal interface and enters relevant information such as the book title, author name, and genre into the search bar. Once the input is complete, they press the search button.
[0653] Step 2:
[0654] The terminal receives the user's search request and sends that information to the server as an HTTP request. The request includes the information entered by the user and asks the server to process it.
[0655] Step 3:
[0656] The server parses the received request and generates a database query to search for information on relevant publications. In this process, it obtains information on books that match the user's request as structured data.
[0657] Step 4:
[0658] Based on the book information obtained as search results, the server invokes a generative model using natural language processing techniques to generate summaries of each identified publication.
[0659] Step 5:
[0660] The server then initiates the process of converting the generated summary into audio data. This audio data is then adjusted so that the user can receive the summary audibly.
[0661] Step 6:
[0662] The terminal provides the user with summary information and audio data received from the server. The user can select a summary of interest from the list of book information displayed on the terminal's interface and listen to the audio summary by pressing the play button.
[0663] Step 7:
[0664] After the user reviews the summary, if they wish to purchase, they will access the online sales platform directly via a link on their device. This link will lead to the purchase page for the selected publication.
[0665] Step 8:
[0666] After purchase, users can enter their feedback and ratings for the publication via their device. The entered rating information is then sent from the device to the server.
[0667] Step 9:
[0668] The server stores the received evaluation information in a database and updates the data so that it can be referenced in future searches and by other users. This information can also be used to make suggestions and recommendations to other users.
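Step 3 of the flow above, in which the server turns the request into a database query, could be sketched as follows. The sketch assumes an SQLite database with a hypothetical table named books having title, author, and genre columns; neither the schema nor the function name search_books comes from the disclosure.

```python
import sqlite3
from typing import Dict, List, Optional


def search_books(
    conn: sqlite3.Connection,
    title: Optional[str] = None,
    author: Optional[str] = None,
    genre: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Search the hypothetical `books` table by any combination of fields."""
    clauses, params = [], []
    # Column names are fixed literals; only the values come from the user
    # and they are always bound as parameters, never interpolated.
    for column, value in (("title", title), ("author", author), ("genre", genre)):
        if value:
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value}%")
    where = " AND ".join(clauses) if clauses else "1=1"
    sql = f"SELECT title, author, genre FROM books WHERE {where} ORDER BY title"
    rows = conn.execute(sql, params).fetchall()
    return [dict(zip(("title", "author", "genre"), row)) for row in rows]


# Usage with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [("Dragon Road", "A. Writer", "fantasy"), ("Quiet Mind", "B. Author", "relaxation")],
)
hits = search_books(conn, genre="fantasy")
```

The result is returned as a list of dictionaries, which corresponds to the structured data mentioned in Step 3 and can be passed to the summary generation of Step 4.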
[0669] (Example 1)
[0670] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0671] A challenge lies in the lack of appropriate methods for quickly and efficiently acquiring and understanding information resources. In particular, a consistent system is needed that allows users to obtain summaries of information resources not only visually but also aurally, and to easily purchase related information resources and share evaluation information.
[0672] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.
[0673] In this invention, the server includes a device for receiving higher-level category information from a user and identifying information on related information resources based on that information; a device for using a generative model to generate an overview of the identified information resources; and a device for providing the generated overview as audio data. This enables users to efficiently select and deeply understand information resources, as well as to purchase them and share evaluation information.
[0674] "Higher-level category information" refers to general identifying information related to a specific subject, and serves as the basis for users to search for information resources.
[0675] "Information resources" refer to a collection of information provided to users, and include a variety of media such as books, documents, and articles.
[0676] "Device" refers to a mechanical or electronic component designed to perform a specific function, including hardware and software for performing a specific process.
[0677] A "generative model" is an algorithm or system designed to produce output based on input data, and in particular uses natural language processing techniques to generate summaries of information resources.
[0678] "Audio data" refers to digital audio data used to transmit information as sound, providing users with information through means other than sight.
[0679] An "e-commerce platform" refers to an online marketplace used by users to purchase products and services online, enabling the trading of a wide variety of goods and services.
[0680] This system consists of user terminals and a server that processes data. Users can input specific higher-level category information through the terminal's interface. The input information is then transmitted from the terminal to the server. The server utilizes a large database to quickly identify information resources. The server is equipped with database management software for executing SQL queries.
[0681] The server uses the received information to identify relevant information resources and automatically generates summaries of those resources using a generative AI model. The generative AI model incorporates algorithms that utilize natural language processing techniques. This technology allows for the extraction of key points from large amounts of information and the presentation of summaries in an easily understandable format for the user.
[0682] The generated summary is converted into audio data and provided to the user via the device. Software for generating speech from text is used for this conversion. This allows the user to receive information not only visually but also aurally.
[0683] The server also has the functionality to generate connection destinations that facilitate direct connections to e-commerce platforms from summaries. This allows users to quickly purchase relevant information resources via their devices. Furthermore, users can input their impressions and feedback via their devices after reading and send them to the server, and this information is shared with other users.
[0684] As a concrete example, if a user is searching for "historical novels," they would type "historical novels" into their device. The device would send this information to the server, which would then compile information on relevant historical novels and generate a summary. The summary would then be made available for audible review, and relevant purchase links would be provided. An example of a prompt would be, "Please recommend some books on historical novels." This system allows users to efficiently identify information resources even with limited information and make quick purchases.
[0685] The flow of the specific processing in Example 1 will be explained using Figure 11.
[0686] Step 1:
[0687] The user uses their device to input higher-level category information to search for a specific information resource. This action generates a search query as text data, which is then provided to the device as input data. The device receives this input and prepares to send it to the server.
[0688] Step 2:
[0689] The terminal sends the input data obtained from the user to the server. The server takes the received text data as input and begins data analysis to search for relevant information resources. In this analysis, the server generates SQL queries, based on the input categories, for retrieving information resources from the data store.
[0690] Step 3:
[0691] The server uses SQL queries to search for information resources within the data store. Based on the search results, it outputs a list of candidate information resources. This list is used as input for subsequent processing. The server prepares the candidate list to be provided as input data to the generative AI model.
[0692] Step 4:
[0693] The server uses a generative AI model to generate a summary of the candidate list provided as input data. In this process, natural language processing is used to extract the key points of the information resources and create a concise summary. The generated summary is output as text data.
[0694] Step 5:
[0695] The server initiates a process to convert the generated summary into audio data. This process uses text-to-speech (TTS) software, enabling the information to be presented to the user audibly. The server then prepares to send the audio data to the terminal.
[0696] Step 6:
[0697] The device plays audio data received from the server, allowing users to listen to summaries in audio format. After obtaining information through audio, users can purchase information resources of interest via links to e-commerce platforms provided by the server.
[0698] Step 7:
[0699] After using the information resources, the user inputs feedback information through the terminal's interface. The terminal then prepares to send this feedback data to the server. Once this information is sent, the server stores the feedback information in its data store and outputs it in a format that can be shared with other users.
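Step 7 above, in which feedback is stored and then output in a shareable format, could be sketched as follows. The table name feedback, its columns, and the 1-to-5 rating scale are assumptions made for illustration and are not specified in the disclosure.

```python
import sqlite3
from typing import Any, Dict


def init_store(conn: sqlite3.Connection) -> None:
    """Create the hypothetical feedback table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback "
        "(resource_id TEXT, user_id TEXT, rating INTEGER, comment TEXT)"
    )


def save_feedback(conn, resource_id: str, user_id: str, rating: int, comment: str) -> None:
    """Store one piece of feedback; the 1-5 scale is an assumption."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?, ?)",
        (resource_id, user_id, rating, comment),
    )
    conn.commit()


def shared_view(conn, resource_id: str) -> Dict[str, Any]:
    """Aggregate feedback into a form that can be shown to other users."""
    count, average = conn.execute(
        "SELECT COUNT(*), AVG(rating) FROM feedback WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()
    comments = [
        row[0]
        for row in conn.execute(
            "SELECT comment FROM feedback WHERE resource_id = ? ORDER BY rowid",
            (resource_id,),
        )
    ]
    return {"count": count, "average": average, "comments": comments}


# Usage with made-up feedback.
store = sqlite3.connect(":memory:")
init_store(store)
save_feedback(store, "book-1", "user-a", 4, "Clear overview.")
save_feedback(store, "book-1", "user-b", 5, "Helpful summary.")
view = shared_view(store, "book-1")
```

The aggregated view exposes only the count, the average, and the comments, so user identifiers need not be shared with other users.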
[0700] (Application Example 1)
[0701] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0702] In today's information-saturated world, it's difficult for users to efficiently find content that interests them. Furthermore, they need to understand the content within a limited timeframe and smoothly complete the entire purchase process. However, existing systems require considerable effort and time to retrieve text information and complete the purchase process, placing a burden on users. This challenge needs to be addressed.
[0703] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.
[0704] In this invention, the server includes means for receiving target information from a user and processing it to identify information about relevant content; means for generating a summary of the identified content using a generative model; and means for providing and streaming the generated summary as audio data. This allows users to efficiently obtain information about content of interest via audio and to proceed smoothly with subsequent purchasing procedures.
[0705] "User-provided target information" refers to information about books or content that the user is searching for, including input such as genre and title.
[0706] "Related content information" refers to detailed data on books and content that match the user's interests and search queries, identified based on the target information provided by the user.
[0707] A "generative model" is an algorithm that utilizes artificial intelligence technology to automatically create a summary of content from input data.
[0708] "Means of providing and streaming as audio data" refers to the technology and processes for converting the generated summary into an audio format so that users can listen to it in real time.
[0709] An "online sales system" refers to a platform that allows users to purchase content via the internet, enabling them to directly complete the purchase process.
[0710] "Rating information" refers to data in which users record their impressions and ratings of content, and which is shared with and accessible to other users.
[0711] The system that implements this invention consists of a user, a server, and a terminal. Each component and its processing are described in detail below.
[0712] The server is primarily responsible for processing and generating data, and has a search function to identify relevant content based on information provided by the user. To achieve this, the server utilizes a generative AI model with natural language processing technology to extract relevant content from the database based on the genre and title specified by the user. In this process, the generative model uses an algorithm that analyzes text data and generates a summary.
[0713] The generated content summary is processed in real time as audio data and transmitted to the user's device via streaming technology. This process utilizes a speech synthesis system to generate the audio data. The speech synthesis system converts text into audio data, allowing the user to listen instead of reading.
[0714] If a user is interested in the content they have listened to, they can easily purchase it through the provided link to the online sales system. Through this integration, the device provides the user with a seamless experience. After completing the purchase process, the user can also enter and share their evaluation of the content. The device sends this evaluation information to the server, which stores it in a database. The evaluation information becomes accessible to other users.
[0715] For example, if a user wants to research "near-future technology and science fiction," they enter the relevant information on their smartphone. They then listen to smoothly streaming audio, review the content, and if interested, easily purchase it through the platform.
[0716] An example of a prompt message is: "Please provide a program that generates a list of the latest books on near-future technology and science fiction, and then streams an audio summary of the first book."
[0717] The flow of a specific process in Application Example 1 will be explained using Figure 12.
[0718] Step 1:
[0719] The user enters relevant information, such as genre and title, into the device. Once the user has finished entering the information, it is sent to the server by the device. The entered information is then used for data processing in the next step.
[0720] Step 2:
[0721] The server searches the database based on the received target information and identifies relevant content. In this process, a generative AI model analyzes the input data using natural language processing and extracts relevant content. The extracted information is output as basic data for summary generation.
[0722] Step 3:
[0723] The server uses a generative AI model to create a summary of the identified content. This process uses data calculations to extract the key parts of the content as a summary and converts them into text format. The generated summary becomes the input for speech conversion in the next step.
[0724] Step 4:
[0725] The server converts the generated summary into audio data using speech synthesis. In the speech conversion process, the text data is converted into digital audio data using a synthesis algorithm, and this audio data is streamed to the terminal. The output audio data is available for the user to listen to.
[0726] Step 5:
[0727] When a user listens to the provided audio data and wishes to purchase content they are interested in, the device displays a link to an online sales system received from the server. This link is used as input for the purchase process. Using the link, the user can easily purchase the content.
[0728] Step 6:
[0729] After a user uses content, they enter their evaluation information into their device. The device then sends this evaluation information to the server. The server stores the received information in a database, making it accessible to other users. This evaluation data will be used in the future to select recommended content, among other things.
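The streaming of audio data to the terminal in Step 4 above could be sketched as a generator that yields the synthesized audio in fixed-size chunks. This is a minimal sketch under the assumption that the audio is already available as a byte string; the function name stream_audio and the default chunk size are illustrative, and the disclosure does not specify a streaming protocol.

```python
from typing import Iterator


def stream_audio(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio data in chunks so that playback can start before the
    whole payload has been transferred. The chunk size is an assumption."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# Usage with placeholder bytes standing in for synthesized speech.
chunks = list(stream_audio(b"x" * 10, chunk_size=4))
```

A generator of this kind can be handed to a web framework's streaming response, which is one way the terminal could begin playback in real time.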
[0730] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.
[0731] The embodiments for carrying out the present invention are described in detail below.
[0732] This system uses a terminal with an initial interface where the user inputs target information such as the book title, author name, and genre. Once the user has finished inputting, the terminal sends that information to the server as an HTTP request. The server analyzes this information and identifies the relevant publications from its database.
[0733] In particular, this invention incorporates an emotion engine that also collects additional information, including the user's emotions. The emotion engine has the function of analyzing the user's emotional state from the user's input, voice tone, text patterns, and so on.
[0734] The server utilizes generative models based on natural language processing technology to automatically create summaries of identified publications. These summaries are tailored to the user's emotional state, allowing for emotionally richer expression. The summaries are also converted into audio data, making them accessible to the user aurally.
[0735] Users can not only grasp the book's overview by playing this audio summary through their device, but also receive information on recommended books based on the emotion engine's analysis. This feature makes it easy for users to find books that match their current emotions.
[0736] Furthermore, if a user is interested after reviewing the summary, they can purchase the publication directly through a link provided by the server. This link facilitates access to the appropriate online sales platform for the user. In addition, users can enter their impressions and ratings after reading through the terminal interface, and this information is stored in a database by the server and shared with other users.
[0737] For example, if a user is feeling stressed and wants to find recommended books on relaxation, the user's input and emotional tone are analyzed by the emotion engine. If the analysis indicates a desire to "calm down," the system prioritizes summarizing and recommending relaxation books and content optimized for that emotion.
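The prioritization described in this example could be sketched as a simple re-ranking of candidate books by the estimated emotion. The mapping from emotions to genres below is entirely hypothetical and stands in for whatever the emotion engine would output; the disclosure does not define such a table.

```python
from typing import Dict, List

# Hypothetical mapping from an estimated emotion to preferred genres.
EMOTION_GENRES: Dict[str, List[str]] = {
    "calm down": ["relaxation", "mindfulness"],
    "excited": ["adventure", "science fiction"],
}


def prioritize(books: List[Dict[str, str]], emotion: str) -> List[Dict[str, str]]:
    """Order books so that genres matching the emotion come first.

    Books in non-preferred genres keep their original relative order,
    because Python's sort is stable.
    """
    preferred = EMOTION_GENRES.get(emotion, [])

    def rank(book: Dict[str, str]) -> int:
        genre = book["genre"]
        return preferred.index(genre) if genre in preferred else len(preferred)

    return sorted(books, key=rank)


# Usage with made-up candidates.
candidates = [
    {"title": "Space War", "genre": "science fiction"},
    {"title": "Quiet Mind", "genre": "relaxation"},
    {"title": "Breathe", "genre": "mindfulness"},
]
ordered = [book["title"] for book in prioritize(candidates, "calm down")]
```

If the emotion is not present in the mapping, the candidates are returned in their original order, so the search result is never discarded.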
[0738] This system allows users to receive more personalized information and suggestions, enabling them to make high-quality choices and purchases even within a limited timeframe.
[0739] The following describes the processing flow.
[0740] Step 1:
[0741] The user opens the device's interface, enters the book title, author name, or genre information into the search bar, and also describes their emotional state by typing or voice. The device collects this information.
[0742] Step 2:
[0743] The device compiles the collected information into an HTTP request and sends it to the server. The request includes book information and user sentiment information.
[0744] Step 3:
[0745] The server receives the request, generates and executes a query to retrieve information about relevant publications from the database, taking into account the user's sentiment.
[0746] Step 4:
[0747] The server generates summaries using a generative model that employs natural language processing techniques, based on the publication information in the search results. The generated summaries are then adjusted by an emotion engine to match the user's emotions.
[0748] Step 5:
[0749] The server converts the adjusted summary into audio data and provides it to the terminal in a streamable format.
[0750] Step 6:
[0751] The device displays and plays received summaries and audio data on the user interface, allowing users to receive the necessary information visually and audibly.
[0752] Step 7:
[0753] Users can review the played summary, and if interested, click the link on their device to access the online sales platform directly and purchase the book. This link will take them to the purchase page for the book.
[0754] Step 8:
[0755] After purchase, users enter their evaluation information about the publication from their device and submit their feedback. The device then sends this information to the server.
[0756] Step 9:
[0757] The server stores the received evaluation information in a database and updates the information so that other users can access it. This facilitates information sharing within the user community.
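The storage and sharing of evaluation information in Steps 8 and 9 could, for example, be sketched with an SQL database as below. The table layout, the 1-to-5 score range, and the function names are assumptions made for illustration only.

```python
import sqlite3


def create_store():
    """Create an in-memory database with a hypothetical ratings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings ("
        "isbn TEXT NOT NULL, user_id TEXT NOT NULL, "
        "score INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5), comment TEXT)"
    )
    return conn


def save_rating(conn, isbn, user_id, score, comment=""):
    """Store one user's evaluation; the transaction is committed on success."""
    with conn:
        conn.execute(
            "INSERT INTO ratings VALUES (?, ?, ?, ?)",
            (isbn, user_id, score, comment),
        )


def shared_summary(conn, isbn):
    """Aggregate the evaluations so that other users can view them."""
    count, average = conn.execute(
        "SELECT COUNT(*), AVG(score) FROM ratings WHERE isbn = ?", (isbn,)
    ).fetchone()
    return {"count": count, "average": average}
```

For example, after two users submit scores of 4 and 5 for the same publication, `shared_summary` reports a count of 2 and an average of 4.5.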
[0758] (Example 2)
[0759] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0760] In modern information retrieval systems, users need to select appropriate content from a large amount of information, and this process is cumbersome. Furthermore, it is difficult to present information flexibly according to the user's emotional state, making it challenging to provide information that meets individual needs.
[0761] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.
[0762] In this invention, the server includes means for analyzing the user's emotional state and identifying information in relevant publications; means for using a generative model to generate a summary tailored to the user's emotional state; and means for providing the generated summary in audio format. This enables the provision of information customized based on the user's emotions.
[0763] A "user" is an individual or group that uses a system or service.
[0764] "Target information" refers to information about the data that the user wishes to search for or retrieve, and includes specific book titles, author names, genres, etc.
[0765] "Emotional state" refers to the psychological or emotional state a user experiences when entering information or performing a search, and is part of the data that the system analyzes.
[0766] A "generative model" is an algorithm or machine learning program that uses natural language processing techniques to automatically create summaries of publications.
[0767] "Audio format" refers to the conversion of generated text data into audio data that users can access aurally.
[0768] A "link" is a means of connection that allows a user to directly access another online platform or webpage by clicking on it.
[0769] "Rating information" refers to data that expresses users' impressions and opinions about publications, and functions as feedback that is shared with other users.
[0770] An "information storage device" is an electronic device or system for storing and managing information, such as a database or cloud storage.
[0771] "Natural language processing technology" refers to the technology that enables computers to understand, analyze, and generate human language, and is used for sentiment analysis of text and automatic summarization.
[0772] This invention begins with a user inputting target information such as book title, author name, and genre using an information terminal. The terminal formats the input information as structured data and sends it to the server via an HTTP request. The server plays a central role in data processing and utilizes speech analysis systems and natural language processing technologies to analyze the received information.
[0773] The server analyzes the user's emotional state using an emotion engine, along with the information provided by the user. This analysis evaluates voice tone and text patterns to identify the user's psychological state. Specifically, it uses Google's speech recognition API to convert voice data into text, and NLTK, a natural language processing library, to analyze the resulting text.
[0774] The server, having received the analysis results, identifies relevant publications from its database and automatically generates summaries using a generative AI model. This process utilizes large-scale generative AI models, such as the GPT (Generative Pre-trained Transformer) series. The summaries are adjusted according to the user's emotional state, and the generated summaries are converted into audio format. Text-to-speech technologies such as Amazon Polly and Google Text-to-Speech are used for speech synthesis.
[0775] The user receives an audio summary provided by their device. By listening to this audio summary, the user can easily grasp the book's overview. In addition, the server also provides information on recommended books optimized for the user's emotions.
[0776] Interested users can purchase related publications directly online via links provided by the server. These links facilitate access to appropriate online sales platforms. User feedback and ratings after purchase are entered via their devices and stored in a database by the server. This information is shared with other users and helps improve the system's recommendation accuracy.
[0777] For example, if a user enters the prompt, "I'm looking for a book to relieve stress. I want to calm down," the emotion engine analyzes that psychological state as "I want to calm down." Based on this result, the server generates summaries of relaxation-related books and provides them as audio. By listening to this, the user can easily find a book that matches their mood.
[0778] As described above, the present invention provides a system that responds to the individual needs of users and enables the provision of information that is tailored to their emotions.
[0779] The flow of the specific processing in Example 2 will be explained using Figure 13.
[0780] Step 1:
[0781] The user uses the terminal to input relevant information such as book title, author name, and genre. The entered information is stored on the terminal as text data. Once the user has entered all the information, the terminal converts this information into structured data in JSON format and prepares it for transmission to the server.
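The conversion into structured JSON data described in Step 1 could look like the following minimal sketch. The field names are hypothetical; the disclosure states only that the title, author name, and genre are converted into JSON.

```python
import json


def build_request_body(title="", author="", genre=""):
    """Serialize the entered book information as a JSON string.

    Empty fields are omitted so the server only sees what the user entered.
    """
    fields = {"title": title, "author": author, "genre": genre}
    payload = {key: value.strip() for key, value in fields.items() if value.strip()}
    return json.dumps(payload, ensure_ascii=False)


body = build_request_body(title=" Walden ", genre="essay")
print(body)
```

Setting `ensure_ascii=False` keeps non-ASCII titles and author names readable in the serialized text.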
[0782] Step 2:
[0783] The device sends structured data to the server as an HTTP request. The sent request includes book information entered by the user and prompts for sentiment estimation. The server receives the request and validates its contents for data analysis. The server performs this process using backend technologies such as Python or Node.js.
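The validation of the received request in Step 2 could, as one example in Python, be sketched as follows. The set of allowed fields and the rule that at least one item of book information is required are assumptions for illustration and are not specified in this disclosure.

```python
import json

# Hypothetical field names accepted by the server.
ALLOWED_FIELDS = {"title", "author", "genre", "prompt"}
BOOK_FIELDS = ("title", "author", "genre")


def validate_request(raw_body: str) -> dict:
    """Parse the request body and reject malformed or incomplete input."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    unknown = set(data) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    has_book_info = any(
        isinstance(data.get(key), str) and data[key].strip() for key in BOOK_FIELDS
    )
    if not has_book_info:
        raise ValueError("at least one of title, author, or genre is required")
    return data
```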
[0784] Step 3:
[0785] The server searches the database based on the received data. Data queries use SQL or NoSQL to identify relevant publication information. This search process extracts records that match the keywords entered by the user. Search results include information such as book title, author name, and ISBN.
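The keyword search in Step 3 could be sketched with a parameterized SQL query as shown below. The `books` table, its columns, and the sample rows (including the placeholder ISBN values) are hypothetical.

```python
import sqlite3


def search_books(conn, keyword):
    """Return records whose title, author, or genre contains the keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title, author, isbn FROM books "
        "WHERE title LIKE ? OR author LIKE ? OR genre LIKE ? "
        "ORDER BY title",
        (pattern, pattern, pattern),
    ).fetchall()
    return [{"title": t, "author": a, "isbn": i} for t, a, i in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, genre TEXT, isbn TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        # Placeholder records; the ISBN values are not real identifiers.
        ("Calm Mind", "A. Author", "relaxation", "ISBN-PLACEHOLDER-1"),
        ("Deep Work Notes", "B. Writer", "business", "ISBN-PLACEHOLDER-2"),
        ("Relaxing Evenings", "C. Poet", "essay", "ISBN-PLACEHOLDER-3"),
    ],
)
results = search_books(conn, "relax")
print([r["title"] for r in results])
```

Passing the keyword as a bound parameter rather than concatenating it into the SQL text avoids SQL injection from user input.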
[0786] Step 4:
[0787] The server activates an emotion engine and analyzes the prompts and voice data sent by the user. Here, the speech recognition system converts the data into text, and then a natural language processing algorithm is applied. As a result of the analysis, the user's emotional state is identified, and this information, along with search results from the database, is used for the next processing step.
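As a simplified stand-in for the emotion engine in Step 4, the following sketch estimates an emotional state from text by counting cue words. This keyword approach is substituted here purely for illustration and is not the natural language processing algorithm of this disclosure; the lexicon and emotion labels are hypothetical, and speech recognition is assumed to have already produced the text.

```python
import re
from collections import Counter

# Hypothetical cue words for each emotional state.
EMOTION_LEXICON = {
    "calm down": {"stress", "stressed", "calm", "relax", "tired", "anxious"},
    "get motivated": {"motivated", "energy", "goal", "challenge"},
}


def estimate_emotion(text, default="neutral"):
    """Pick the emotion whose cue words appear most often in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = Counter()
    for emotion, cues in EMOTION_LEXICON.items():
        scores[emotion] = sum(1 for word in words if word in cues)
    emotion, hits = scores.most_common(1)[0]
    return emotion if hits > 0 else default


prompt = "I'm looking for a book to relieve stress. I want to calm down."
print(estimate_emotion(prompt))
```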
[0788] Step 5:
[0789] The server generates summaries of publications using a generative AI model. The AI model automatically generates summaries that take into account the acquired publication information and the user's emotional state. This generative model uses natural language processing techniques, and the generated summaries are emotionally optimized.
[0790] Step 6:
[0791] The server converts the generated summary into audio data. Text-to-speech technology is used for this conversion. During this synthesis process, the generated summary is prepared to be delivered to the user as natural-sounding speech.
[0792] Step 7:
[0793] The server sends audio data to the terminal, allowing the user to play an audio summary. Through this audio, the user can listen to and understand the book's overview. Audio playback allows the user to acquire information without relying on visual cues.
[0794] Step 8:
[0795] Users input their impressions and book ratings after listening to an audio summary into their device. The device sends this information to a server, which stores the received ratings in a database. This allows users to share their ratings with other users and contribute to improving the system's recommendation accuracy.
[0796] (Application Example 2)
[0797] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".
[0798] Modern users lead busy lives and often find it difficult to find the right publications quickly. Furthermore, there is a lack of easy ways to find content that resonates with their current emotions. Additionally, there is a lack of systems that provide emotionally relevant summaries and support quick purchase decisions. Technologies that effectively address these challenges are needed.
[0799] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.
[0800] In this invention, the server includes means for receiving subject information and emotional state from the user and identifying relevant publications based thereon; means for using a generative model that generates summaries of publications tailored to the user's emotions; and means for providing the summaries as emotionally expressed audio data. This enables the user to quickly identify content that matches their current emotions and smoothly proceed through the purchase process.
[0801] "User-generated information" refers to data entered by users regarding books, authors, genres, etc.
[0802] "Emotional state" refers to information that indicates the user's current emotions and is obtained by analyzing their voice tone and text patterns.
[0803] "Relevant publications" refers to books and literary works that are identified as suitable based on the user's input and emotional state.
[0804] A "generative model" is an algorithm or program that uses natural language processing techniques to generate summaries of identified publications.
[0805] "Emotionally rich expression as audio data" refers to the process of converting the generated summary into audio and providing expression that includes tone and nuances that correspond to the user's emotions.
[0806] "Links to purchase publications via online purchasing platforms" refers to means of connecting to e-commerce sites where publications can be purchased, which are directly accessible from the summary.
[0807] "Rating information" refers to data that users record within the system, sharing their impressions and ratings of publications with other users.
[0808] The system implementing this invention identifies relevant publications based on user input information and emotions, provides them as audio summaries, and enables immediate purchase. The hardware and software configuration of this system is described below.
[0809] A smartphone or tablet serves as the terminal and provides the user interface. An emotion analysis API for analyzing emotional states and a data input module for receiving user input run on this terminal. For example, a general natural language processing API (e.g., OpenAI's emotion analysis tool) can be used as the emotion analysis API.
[0810] The server plays a central role in managing data processing and generative models. It receives information from users as HTTP requests and searches for relevant publications in a database (e.g., an SQL-based book database). It uses natural language processing techniques to generate summaries using generative AI models (e.g., GPT models).
[0811] The generated summary is converted into emotionally resonant audio data using a speech synthesis API (e.g., Google Cloud Text-to-Speech). This audio data is sent to the user's device, allowing them to listen to it in audio format.
[0812] Furthermore, as a purchase function, the server integrates with online sales platforms and generates appropriate purchase links according to user requests. By using electronic payment APIs (e.g., Stripe), a seamless purchase process can be achieved.
[0813] For example, if a user is feeling the emotion of "wanting to calm down," the emotion analysis API analyzes the emotions related to "relaxation" based on the information received from the device. The server identifies the relevant relaxation book and creates a summary using a generative AI model. Then, using a speech synthesis API, this summary can be converted into a pleasant tone of voice and delivered to the user.
[0814] An example of a prompt for a generative AI model might be: "Current emotion: I want to calm down. Please summarize books on relaxation. Please use a warm tone that matches my emotion."
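A prompt of this form could be assembled programmatically, for example as in the minimal sketch below. The function name and its parameters are hypothetical; only the wording of the resulting prompt follows the example above.

```python
def build_summary_prompt(emotion, topic, tone="warm"):
    """Assemble a prompt for the generative AI model from the analysis results."""
    return (
        f"Current emotion: {emotion}. "
        f"Please summarize books on {topic}. "
        f"Please use a {tone} tone that matches my emotion."
    )


print(build_summary_prompt("I want to calm down", "relaxation"))
```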
[0815] In this way, a system is provided that allows users to instantly select publications that match their emotions, listen to summaries, and purchase them in a smooth and seamless manner.
[0816] The flow of a specific process in Application Example 2 will be explained using Figure 14.
[0817] Step 1:
[0818] The device receives information and voice tone from the user as input. This includes book title, author name, genre, etc. Once the user has finished inputting, it calls an emotion analysis API to analyze the voice tone and identify the emotional state.
[0819] Step 2:
[0820] The server uses the target information and identified emotional state received from the terminal as input to query the database. It constructs a query and retrieves data to identify related publications. The retrieved data is output as information on multiple related books.
[0821] Step 3:
[0822] The server takes relevant book information as input and uses a generative AI model to generate summaries of each book. By creating prompts and passing them to the model, it obtains summaries that utilize natural language processing techniques as output. These summaries are adapted to the user's emotions.
[0823] Step 4:
[0824] The server passes the generated summary as input to the speech synthesis API. The speech synthesis API converts the summary into audio data and outputs it as audio that matches the emotional state. The audio data is sent to the terminal.
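One possible way to request audio that matches the emotional state is to wrap the summary in SSML (Speech Synthesis Markup Language) before passing it to a speech synthesis API that accepts SSML. The following sketch assumes such an API; the mapping from emotion to prosody settings is hypothetical, and the call to the speech synthesis API itself is omitted.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from emotional state to SSML prosody settings.
PROSODY_BY_EMOTION = {
    "calm down": {"rate": "slow", "pitch": "low"},
    "get motivated": {"rate": "fast", "pitch": "high"},
}
DEFAULT_PROSODY = {"rate": "medium", "pitch": "medium"}


def to_ssml(summary, emotion):
    """Wrap the summary in SSML whose prosody reflects the emotional state."""
    prosody = PROSODY_BY_EMOTION.get(emotion, DEFAULT_PROSODY)
    return (
        f'<speak><prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">'
        f"{escape(summary)}</prosody></speak>"
    )


print(to_ssml("Rest & breathe", "calm down"))
```

The summary text is escaped so that characters such as `&` or `<` in generated text do not break the markup.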
[0825] Step 5:
[0826] The device plays audio data received from the server and provides the user with an audio summary. The user listens to the audio and proceeds if they are interested. Once the audio playback is complete, the device waits for the user's selection.
[0827] Step 6:
[0828] The server generates links to online purchase platforms for books the user has shown interest in. It takes the summary produced by the generative AI model and the link information as input and sends them to the user's device. The user can then purchase the book directly by clicking this link.
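The generation of a purchase link in Step 6 could be sketched as follows. The base URL, the query parameter names, and the ISBN value are placeholders; an actual online sales platform would define its own URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical base URL of an online sales platform.
STORE_BASE_URL = "https://bookstore.example/purchase"


def build_purchase_link(isbn, source="audio_summary"):
    """Build a link to the purchase page, with the query string URL-encoded."""
    query = urlencode({"isbn": isbn, "src": source})
    return f"{STORE_BASE_URL}?{query}"


print(build_purchase_link("PLACEHOLDER-ISBN"))
```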
[0829] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing unit 12. In the data processing unit 12, the specific processing unit 290 acquires the audio data.
[0830] The data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
[0831] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.
[0832] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.
[0833] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.
[0834] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.
[0835] The inside of the emotion map 400 represents what is in the mind, while the outside represents behavior. Therefore, the further out an emotion lies on the emotion map 400, the more visible it becomes (the more it manifests in behavior).
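The layout of the emotion map 400 described above could be modeled, for illustration, by a ring index and a clock position. In the sketch below, the `MapPosition` class and the `describe` function are hypothetical; the sketch only encodes the stated layout (right side: situation, left side: response, upper side: pleasure, lower side: displeasure) and does not reproduce the actual placement of any emotion in Figure 9.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    ring: int     # 1 = innermost ring; larger values lie further out
    clock: float  # clock position: 12 = top, 3 = right, 6 = bottom, 9 = left

    def xy(self):
        """Convert to Cartesian coordinates with 12 o'clock on the +y axis."""
        angle = math.radians(90 - self.clock * 30)
        return (self.ring * math.cos(angle), self.ring * math.sin(angle))


def describe(pos, eps=1e-9):
    """Return (side, valence) for a position on the map."""
    x, y = pos.xy()
    side = "situation" if x > eps else "response" if x < -eps else "both"
    valence = "pleasure" if y > eps else "displeasure" if y < -eps else "neutral"
    return side, valence


print(describe(MapPosition(ring=2, clock=3)))
```

A larger `ring` value corresponds to an emotion that is more visible in behavior, in line with the inside and outside distinction described above.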
[0836] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
[0837] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."
[0838] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
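Once emotion values have been obtained from the neural network, the determination of the user's emotion could, as one example, select the emotion with the highest value. The sketch below assumes the network output is available as a dictionary; the numeric values are hypothetical and the neural network itself is not shown.

```python
def determine_emotion(emotion_values):
    """Return the emotion with the highest emotion value."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


def spread(emotion_values, emotions):
    """Return the gap between the largest and smallest value in a group."""
    selected = [emotion_values[name] for name in emotions]
    return max(selected) - min(selected)


# Hypothetical output of the pre-trained neural network.
values = {"reassured": 0.82, "calm": 0.85, "confident": 0.80, "anxious": 0.05}
print(determine_emotion(values))
print(spread(values, ["reassured", "calm", "confident"]))
```

The `spread` helper gives a simple check that emotions located close together on the map, such as "reassured," "calm," and "confident," received similar values.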
[0839] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.
[0840] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.
[0841] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.
[0842] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.
[0843] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.
[0844] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.
[0845] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.
[0846] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.
[0847] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.
[0848] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.
[0849] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.
[0850] The following is further disclosed regarding the embodiments described above.
[0851] (Claim 1)
[0852] A means for receiving target information from a user and processing it to identify information on related publications based on said information,
[0853] Means for using a generative model to generate a summary of the identified publication,
[0854] A means of providing the generated summary as audio data,
[0855] A means of generating a link to purchase the publication directly from the summary via an online sales platform,
[0856] A means for users to input and share evaluation information about publications,
[0857] A system comprising the above means.
[0858] (Claim 2)
[0859] The system according to claim 1, further comprising means for retrieving information on relevant publications from a database based on a user's search query.
[0860] (Claim 3)
[0861] The system according to claim 1, wherein the generative model automatically generates summaries of publications using natural language processing techniques.
[0862] "Example 1"
[0863] (Claim 1)
[0864] A device for receiving higher-level category information from a user and identifying information on related information resources based on said information,
[0865] A device that uses a generative model to generate an overview of the identified information resource,
[0866] A device that provides the generated summary as acoustic data,
[0867] A device that generates a connection destination for purchasing information resources directly from the overview via an e-commerce platform,
[0868] A device that allows users to input and share their impressions of information resources,
[0869] A system comprising the above devices.
[0870] (Claim 2)
[0871] The system according to claim 1, further comprising a device for retrieving information on relevant information resources from a data store based on a user's research query.
[0872] (Claim 3)
[0873] The system according to claim 1, wherein the generative model automatically generates an overview of an information resource using natural language processing technology.
[0874] "Application Example 1"
[0875] (Claim 1)
[0876] A means for receiving target information from a user and processing it to identify related content information based on said information,
[0877] Means for using a generative model to generate a summary of the identified content,
[0878] A means of providing the generated summary as audio data and streaming it,
[0879] A means of generating a link to purchase content directly from the summary via an online sales system,
[0880] A means for users to input and share their evaluation information about content,
[0881] A system comprising the above means.
[0882] (Claim 2)
[0883] The system according to claim 1, further comprising means for retrieving information on relevant content from a database based on a user's search query.
[0884] (Claim 3)
[0885] The system according to claim 1, wherein the generative model automatically generates a summary of content using natural language processing techniques.
[0886] "Example 2 of combining an emotion engine"
[0887] (Claim 1)
[0888] A means for analyzing target information and emotional state from a user, and for processing information related to publications based on said information and analysis results,
[0889] Means for using a generative model to generate a summary of the identified publication in accordance with the user's emotional state,
[0890] A means of providing the generated summary in audio format,
[0891] A means of generating a link to purchase the publication directly from the summary via a communication method,
[0892] A means by which users can input, save, and share evaluation information about publications,
[0893] A system comprising the above means.
[0894] (Claim 2)
[0895] The system according to claim 1, further comprising means for retrieving information on relevant publications from an information storage device based on the user's search queries and sentiment state.
[0896] (Claim 3)
[0897] The system according to claim 1, wherein a generative model uses natural language processing techniques to automatically generate sentiment-appropriate summaries of publications.
[0898] "Application example 2 when combining with an emotional engine"
[0899] (Claim 1)
[0900] A means for receiving target information and emotional states associated with said information from a user, and for processing to identify information on related publications based on said information,
[0901] Means for using a generative model to generate a summary of the identified publication in accordance with the user's emotional state,
[0902] A means of providing the generated summary as audio data with emotionally rich expression,
[0903] A means of generating a link to purchase the publication directly from the summary via an online purchasing platform,
[0904] A means for users to input and share their evaluation information about publications with other users,
[0905] A system comprising the above means.
[0906] (Claim 2)
[0907] The system according to claim 1, further comprising means for retrieving information on relevant publications from a database based on the user's search intent.
[0908] (Claim 3)
[0909] The system according to claim 1, wherein the generative model uses natural language processing techniques to automatically generate summaries of publications that are adapted to the emotional state of the user.
[Explanation of symbols]
[0910] 10, 210, 310, 410: Data processing system; 12: Data processing device; 14: Smart device; 214: Smart glasses; 314: Headset-type terminal; 414: Robot
Claims
1. A system comprising: means for receiving target information from a user and performing processing to identify information on related content based on said information; means for generating a summary of the identified content using a generative model; means for providing the generated summary as audio data and streaming it; means for generating a link for purchasing the content directly from the summary via an online sales system; and means for allowing users to input and share evaluation information about the content.
2. The system according to claim 1, further comprising means for obtaining information on relevant content from a database based on a user's search query.
3. The system according to claim 1, wherein the generative model automatically generates a summary of content using natural language processing technology.