system

The system addresses the inefficiencies of conventional book information access by generating audio summaries and providing purchase guidance, enhancing user experience for time-constrained and visually impaired individuals.

JP2026100612A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-09
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Individuals with time constraints, particularly business people and visually impaired individuals, face challenges in efficiently accessing and selecting relevant book information due to conventional character-based methods, which are time-consuming and lack direct purchase guidance.

Method used

A system that extracts relevant book information from a database using book search queries on a user terminal, generates a text summary using a generative model, converts it into audio format, and provides book review information along with links to online bookstores to streamline the reading and purchasing process.

Benefits of technology

Enables users to efficiently understand book content, make informed purchasing decisions, and complete transactions quickly through voice-based information delivery, improving user satisfaction and convenience.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026100612000001_ABST

Abstract

To provide a system. [Solution] A system including: means for receiving query data about books entered from a user terminal; means for extracting relevant book information from a database based on the query data; means for generating a text summary using a generative model based on the extracted book information; means for converting the generated text summary into an audio format; and means for transmitting the summary data converted into the audio format to the user terminal.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance that responds to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In modern society, it is difficult for individuals with time constraints to efficiently understand the content of books. For business people and visually impaired people in particular, access to information is limited by conventional character-based methods of acquiring information. In addition, selecting an appropriate book from among a large number of books takes time and effort. Furthermore, there is no direct path that leads the user from book information to a purchase.

Means for Solving the Problems

[0005] This invention provides a system that extracts relevant book information from a database using book search queries on a user terminal and generates a text summary using a generative model. It also converts the generated summary into an audio format, enabling users to easily access the information. Furthermore, by combining this with means of presenting relevant book review information and links to online bookstores, the system streamlines the user's reading selection and purchase process, meeting diverse needs.

[0006] A "user terminal" is a device used by a user to input and receive information.

[0007] "Query data" refers to information that users input through their devices to search for information related to books.

[0008] A "database" is a recording medium that stores information about books and allows for searching and retrieval as needed.

[0009] A "generative model" is an artificial intelligence model used to analyze the content of a book and generate a text summary.

[0010] A "text summary" is information that describes the main points of a book in a shortened form.

[0011] "Audio format" refers to a media format that converts text content into audio.

[0012] "Review information" refers to information posted by users regarding their evaluations and impressions of books.

[0013] An "online bookstore" is an e-commerce site where books can be purchased via the internet.

Brief Description of the Drawings

[0014]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0015] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0016] First, the terms used in the following description will be explained.

[0017] In the following embodiments, a processor denoted by a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. The processor may also be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), and an APU (Accelerated Processing Unit).

[0018] In the following embodiments, a RAM (Random Access Memory) denoted by a reference numeral is a memory in which information is temporarily stored and which is used as a work memory by the processor.

[0019] In the following embodiments, a storage denoted by a reference numeral is one or more non-volatile storage devices that store various programs, various parameters, and the like. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0020] In the following embodiments, a communication I/F (Interface) denoted by a reference numeral is an interface including a communication processor, an antenna, and the like. The communication I/F controls communication between multiple computers. Examples of communication standards applied to the communication I/F include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0021] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0022] [First Embodiment]

[0023] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0024] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0025] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0026] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0027] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0028] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0029] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0030] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0031] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0032] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0033] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0034] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0035] This invention relates to a system for simplifying the entire process from efficiently obtaining book information using a terminal to making a purchase. The server, which is the core of the system, functions as follows:

[0036] When a user enters a query about a book using a terminal, the terminal sends that data to the server. The server searches its internal database based on the received query and extracts the relevant book information. Using the extracted information, the server applies a generative model to generate a summary of the book's content. This summary is in text format and is then converted into audio format.

[0037] Once an audio summary is prepared, the server sends it to the user's device. The user can then play this audio summary through their device during their daily life, enabling them to efficiently understand the book's content. The server also retrieves review information related to the book to support the user's purchase decision. In addition, the server provides links to online bookstores, guiding users to quickly purchase books they like.

[0038] As a concrete example, consider a case where a user searches for "history textbooks." The user enters "history textbooks" into their device, and the server extracts relevant books from its database. Based on the extracted information, a generative AI creates a summary and converts it into audio format. The user can listen to the audio summary on their device, check reviews, and proceed with the purchase via a link to an online bookstore. In this way, the present invention provides users with an easy and quick way to obtain information and make a purchase.

[0039] The following describes the processing flow.

[0040] Step 1:

[0041] The user uses their device to enter a query about a book and clicks the search button. The device then sends this query data to the server.

[0042] Step 2:

[0043] Based on the received query data, the server searches its internal database and extracts the relevant book information. The server then prepares the extracted results for the next processing step.

[0044] Step 3:

[0045] The server passes the acquired book information to a generative model to create a summary of the content. The generative model extracts the key points of the book and generates the summary in text format.

[0046] Step 4:

[0047] The server converts the generated text summary into audio format. A dedicated speech synthesis engine is used to generate the audio data.

[0048] Step 5:

[0049] The server generates audio summary data and sends it to the user's device. The device receives the audio data and prepares for playback.

[0050] Step 6:

[0051] Users can play an audio summary on their device to efficiently understand the book's content. They can then use this summary to consider purchasing the book.

[0052] Step 7:

[0053] The server retrieves relevant book review information from the database and sends it to the terminal. This allows the user to see ratings from other users.

[0054] Step 8:

[0055] The server provides the device with a link to an online bookstore. When the user clicks the link, the device opens the page of the specified online bookstore and makes the book available for purchase.
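
As a non-limiting illustration, Steps 1 to 8 above can be sketched in Python as follows. Everything in this sketch is an assumption made for illustration: the `Book` record, the in-memory `DATABASE`, the `bookstore.example` URLs, and the function names do not appear in the disclosure. The `summarize` and `synthesize` functions are placeholders standing in for the generative model and the speech synthesis engine; they do not perform real summarization or speech synthesis.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    description: str
    reviews: list[str] = field(default_factory=list)
    store_url: str = ""


# Hypothetical in-memory stand-in for the server's database.
DATABASE = [
    Book(
        title="A Short History of Rome",
        description="Covers the Republic. Covers the Empire.",
        reviews=["Clear and concise."],
        store_url="https://bookstore.example/rome",
    ),
    Book(
        title="World History Textbook",
        description="A survey of world history for students. Includes maps.",
        reviews=["Good for exams."],
        store_url="https://bookstore.example/world-history",
    ),
]


def search_books(query: str) -> list[Book]:
    """Step 2: return books whose title or description contains every query word."""
    words = query.lower().split()
    return [
        b for b in DATABASE
        if all(w in (b.title + " " + b.description).lower() for w in words)
    ]


def summarize(book: Book) -> str:
    """Step 3: placeholder for the generative model (keeps the first sentence only)."""
    first_sentence = book.description.split(". ")[0].rstrip(".")
    return f"{book.title}: {first_sentence}."


def synthesize(text: str) -> bytes:
    """Step 4: placeholder for the speech synthesis engine (UTF-8 bytes, not real audio)."""
    return text.encode("utf-8")


def handle_query(query: str) -> list[dict]:
    """Steps 2 to 5, 7 and 8: build the response the server sends to the terminal."""
    response = []
    for book in search_books(query):
        summary = summarize(book)
        response.append({
            "title": book.title,
            "summary_text": summary,
            "summary_audio": synthesize(summary),
            "reviews": book.reviews,        # Step 7: review information
            "store_url": book.store_url,    # Step 8: link to an online bookstore
        })
    return response
```

Calling `handle_query("history textbook")` on this sample data returns a single entry for "World History Textbook". In a real system, the substring match would be replaced by a proper database query and the two placeholders by calls to an actual generative model and speech synthesis engine.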

[0056] (Example 1)

[0057] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0058] In today's information-saturated society, there is a need to streamline the process by which users can quickly and efficiently acquire the document information they require, understand its content, and make purchasing decisions. However, conventional information retrieval systems have struggled to integrate information extraction, summarization, evaluation, and purchase guidance, often resulting in decreased user satisfaction.

[0059] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0060] In this invention, the server includes means for receiving query data regarding documents input from a user device, means for extracting relevant document information from a storage device based on the query data, and means for generating a text summary using a generation algorithm based on the extracted document information. This enables the user to quickly obtain the information they need, efficiently understand its contents, and make better purchasing decisions based on the evaluation information.

[0061] A "user device" refers to an information terminal, such as a computer or mobile device, that is operated by the user and includes means for inputting and receiving information.

[0062] "Query data" refers to information such as queries and keywords that users enter when searching for specific document information.

[0063] "Document information" refers to information about specific text, books, or digital content that is searched based on a user's query.

[0064] A "storage device" refers to a database or hardware or software system for storing information where document information and related data are pre-stored.

[0065] A "generation algorithm" refers to the computational procedures and processes used to perform text summarization based on extracted document information.

[0066] A "text summary" refers to a concise explanation that condenses the main points and content of a document.

[0067] "Audio format" refers to a form of information that has been converted from text-based information into audio data, enabling users to receive information in audio format.

[0068] An "electronic marketplace" refers to a platform or website where users can purchase documents and goods online.

[0069] This invention is a system for users to efficiently acquire document information and make purchasing decisions based on that information. The system mainly consists of three elements: a server, a terminal, and the user.

[0070] The server has a communication interface for receiving query data from user terminals. Based on user input, the server searches a database in its storage device and extracts relevant document information. The server then applies a generative AI model to the extracted document information to create a text summary of the document content. This generative AI model utilizes natural language processing techniques and has the ability to efficiently summarize the key points of a document.

[0071] After a text summary is generated, the server uses text-to-speech software to convert it into audio format. This audio data is sent to the user's terminal, allowing the user to easily access the information in everyday situations. The server also retrieves review information for related documents and sends it to the user's terminal. This allows users to make more informed decisions by referring to feedback from other buyers.

[0072] Furthermore, the server generates links to electronic marketplaces, providing users with a smooth path to purchase relevant documents. These links appear on the user's device, allowing them to instantly purchase the necessary documents.

[0073] As a concrete example, consider a case where a user searches for information about "books on ancient Roman history." The user enters a query on their device, the server generates a summary based on the relevant information, and sends the audio data to the device. The user listens to the summary in audio format, views reviews as needed, and clicks on the provided links to go to the purchase page.

[0074] As an example of a prompt, you might input something like, "I'd like a summary of a book on ancient Roman history." This allows the system to respond quickly and provide information that meets the user's needs.

[0075] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0076] Step 1:

[0077] The user opens the search interface on their device and enters keywords related to books or documents. The entered keywords are sent to the server as query data. This sends the server a query for the information the user is looking for, and processing begins.

[0078] Step 2:

[0079] The server searches its database in storage based on the query data it receives. Using a database search algorithm, the server extracts matching document information. The input is keywords sent by the user, and the output is a set of corresponding document information. This process efficiently collects the information the user is looking for.

[0080] Step 3:

[0081] The server inputs the extracted document information into a generative AI model to generate a text summary. In this step, the generative AI model performs natural language processing to generate a concise overview of the key points. The generated text summary is output. This reduces the amount of information and makes it easier for the user to understand.

[0082] Step 4:

[0083] The server uses text-to-speech software to convert the generated text summary into audio format. The text data is converted into an audio file and output. Users can then access the information in audio format. This process is a means of utilizing the summarized information over a long period of time.

[0084] Step 5:

[0085] The server sends audio files and associated review information to the user's terminal. The input is the generated audio files and review information retrieved from the database, and the output is the complete dataset delivered to the user's terminal. This allows the user to acquire and review the information visually or aurally.

[0086] Step 6:

[0087] Users play audio summaries on their devices and review the provided review information. By manipulating the outputted dataset, users can easily make purchasing decisions. Specific actions include playing audio and loading reviews using the device's audio player.

[0088] Step 7:

[0089] The server generates a link to the relevant online marketplace and sends it to the user's terminal. This link guides the user to purchase the document immediately. When the user clicks the link, the online transaction begins and the purchase is completed.
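
As a non-limiting illustration of Steps 5 to 7 above, the following Python sketch shows one way the server might package the audio file, the review information, and the marketplace link for delivery to the terminal. The disclosure does not specify a message format, so the JSON layout, the base64 encoding of the audio, the `marketplace.example` URL, and all function names here are assumptions. Sending the link in the same message as the audio is also a simplification; the text describes it as a separate step.

```python
import base64
import json
from urllib.parse import urlencode

# Hypothetical marketplace base URL; the disclosure names no specific marketplace.
MARKETPLACE_URL = "https://marketplace.example/item"


def build_purchase_link(document_id: str) -> str:
    """Step 7: build a link to the marketplace page for one document."""
    return f"{MARKETPLACE_URL}?{urlencode({'id': document_id})}"


def build_payload(document_id: str, audio: bytes, reviews: list[str]) -> str:
    """Steps 5 and 7 (server side): bundle audio, reviews and link into one JSON message."""
    return json.dumps({
        "document_id": document_id,
        # Raw bytes cannot be placed in JSON directly, so the audio is base64-encoded.
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "reviews": reviews,
        "purchase_link": build_purchase_link(document_id),
    })


def read_payload(message: str) -> tuple[bytes, list[str], str]:
    """Step 6 (terminal side): recover the audio bytes, the reviews and the link."""
    data = json.loads(message)
    return (
        base64.b64decode(data["audio_base64"]),
        data["reviews"],
        data["purchase_link"],
    )
```

On the terminal, the recovered bytes would be handed to an audio player and the link rendered as a button or spoken prompt. Base64 increases the message size by roughly one third, so a production system might instead send a URL to the audio file.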

[0090] (Application Example 1)

[0091] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0092] Existing book purchasing processes require users to spend a significant amount of time and effort obtaining book information. The process of reviewing book content, reading reviews, and completing the purchase is cumbersome, hindering a smooth buying experience. In particular, there are no systems that complete these processes solely through voice, and the user interface is not intuitive, compromising convenience. Therefore, there is a need for a system that provides information directly to users via voice, enabling quick and smooth purchases.

[0093] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0094] In this invention, the server includes means for receiving query information about books entered from a user's computing terminal, means for extracting the corresponding book information from an information storage unit based on the query information, means for generating a text summary using a generative model based on the extracted book information, means for converting the generated text summary into an audio format, means for transmitting the summary information converted into an audio format to the user's computing terminal, and means for the user to listen to the audio summary, select purchase information, and quickly complete the transaction through electronic payment. This enables the user to efficiently understand the contents of the book and quickly complete the purchase procedure.

[0095] A "user computing terminal" is an electronic device used by users to input information and process received data.

[0096] "Query information" refers to the search criteria and question data entered by the user to retrieve specific book information.

[0097] An "information storage unit" refers to a database or storage system used to store book information and related data.

[0098] A "generative model" is an artificial intelligence algorithm used to summarize or transform information from acquired data.

[0099] A "textual summary" is a text-based summary of book information.

[0100] "Audio format" refers to audio data used to output text data as auditory information.

[0101] An "audio summary" is information that has been condensed from the content of a book and then converted into audio data.

[0102] "Purchase information" refers to data related to product selection and payment used by users when purchasing books.

[0103] "Electronic payment" refers to electronic payment methods used for making payments online.

[0104] The system for carrying out this invention includes a server and a user computing terminal. The server receives query information about books from the user computing terminal, searches an information storage unit, and extracts the relevant book information. Based on this information, the server generates a text summary using a generative AI model, and further converts the summary into an audio format.

[0105] The user's computing terminal receives the converted audio summary and provides an environment where users can listen to that information in their daily lives. After listening to the audio summary, users can select purchase information and complete the transaction instantly through electronic payment.

[0106] Specific hardware includes smartphones and cloud servers, while software uses natural language processing libraries (e.g., NLTK, spaCy), speech synthesis APIs (e.g., Google® Text-to-Speech), and payment APIs (e.g., Stripe).

[0107] For example, when a user searches for the latest bestselling novel, an audio summary such as "This novel is a thrilling story set in a futuristic city" is sent to the device. If the user chooses to purchase, electronic payment is completed instantly using fingerprint or facial recognition.

[0108] Examples of prompt statements are as follows:

[0109] "Generate an audio summary of a bestselling novel and comment on whether you should buy it."

[0110] In this way, a system is created that allows users to efficiently understand the content of books and quickly complete the purchase process.

[0111] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0112] Step 1:

[0113] The user enters query information about books on their device. This query information is prepared as text data and sent to the server. For example, the search phrase "latest best-selling novels" might be entered.

[0114] Step 2:

[0115] The server uses the received query information to search the information storage unit. A database search is performed, extracting the relevant book information and related data. The input is the query information, and the output is a set of book information.

[0116] Step 3:

[0117] The server inputs the extracted book information into an AI model to generate a textual summary. At this stage, a concise text summary is created. The input is book information, and the output is the summarized text.

[0118] Step 4:

[0119] The server converts the generated summary text into audio format. It uses a speech synthesis API to convert the text data into audio data. At this stage, the generated audio summary is obtained. The input is a text summary, and the output is audio data.

[0120] Step 5:

[0121] The server sends the summarized data, converted into audio format, to the user's device. The user listens to the audio summary through their device and efficiently understands the information. They also use an audio playback application to verify the summary.

[0122] Step 6:

[0123] The user listens to an audio summary and confirms their purchase intention as needed. They select purchase information on the terminal and prepare to make an electronic payment using fingerprint or facial recognition. The input is purchase intention information, and the output is payment preparation information.

[0124] Step 7:

[0125] Based on the user's selection, the server executes the electronic payment process. The transaction is completed using the payment API, and confirmation information regarding the purchase is sent to the terminal. Inputs are purchase information and authentication information, and output is transaction completion information.
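
As a non-limiting illustration of Steps 6 and 7 above, the following Python sketch models the stated inputs (purchase information and authentication information) and output (transaction completion information) of the payment step. The class names, field names, and the `charge` function are hypothetical. In particular, `charge` is only a stand-in for a payment API call and does not reflect the interface of any real payment provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseInfo:
    book_id: str
    price_yen: int


@dataclass(frozen=True)
class AuthInfo:
    user_id: str
    biometric_verified: bool  # result of the fingerprint or face check on the terminal


@dataclass(frozen=True)
class TransactionResult:
    completed: bool
    message: str


def charge(user_id: str, amount_yen: int) -> bool:
    """Hypothetical stand-in for a payment API call; accepts any positive amount."""
    return amount_yen > 0


def execute_payment(purchase: PurchaseInfo, auth: AuthInfo) -> TransactionResult:
    """Step 7: reject unauthenticated requests, otherwise charge and confirm."""
    if not auth.biometric_verified:
        return TransactionResult(False, "authentication failed")
    if not charge(auth.user_id, purchase.price_yen):
        return TransactionResult(False, "payment declined")
    return TransactionResult(True, f"purchased {purchase.book_id}")
```

Checking authentication before calling the payment function keeps an unverified request from ever reaching the payment provider. The returned `TransactionResult` corresponds to the confirmation information that the server sends back to the terminal.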

[0126] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0127] This invention relates to a system that provides optimal book summary information while taking into account the user's emotional state. This system incorporates an emotion engine to improve the user experience and functions as follows:

[0128] When a user uses a device and enters a query about a book, the device sends that data to the server. In addition, the device recognizes the user's emotions by sending the user's voice and input data to an emotion engine in real time. The emotion engine analyzes the tone of voice, word choice, input speed, etc., to estimate the user's current emotional state.

[0129] The server searches the database based on the recognized emotion information and the query, and extracts relevant book information. This extracted information is passed to a generative model, which creates a summary. The generative model adjusts the style and level of detail of the summary according to the user's emotional state.

[0130] Subsequently, during the process of converting the text summary into speech, adjustments are made to take the user's emotions into consideration. The speech synthesis engine generates speech that matches the emotion, such as a friendly or calm tone.

[0131] For example, if a user wants to obtain information quickly amidst a busy daily life, the emotion engine senses their stress and anxiety. Based on this information, the server creates a summary tailored to the busy user and delivers it quickly. Furthermore, the summary is played in a calming voice to help the user relax.

[0132] The server provides information including relevant book reviews and also presents links to online bookstores to encourage user purchases. In this way, the present invention aims to improve the user experience by providing flexible responses that take into account the user's emotions.
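
As a non-limiting illustration, the following Python sketch shows a toy rule-based estimator that uses two of the signals mentioned above, word choice and input speed. It is not the emotion identification model 59, whose internals the disclosure does not describe. The word list, the typing-speed threshold, and the two labels "stressed" and "calm" are all assumptions chosen for illustration, and voice tone is omitted because it would require audio analysis.

```python
# Hypothetical vocabulary suggesting that the user is in a hurry.
STRESS_WORDS = {"quickly", "urgent", "hurry", "asap", "now"}

# Illustrative threshold: typing faster than this is treated as a sign of haste.
FAST_TYPING_CHARS_PER_SECOND = 6.0


def estimate_emotion(query_text: str, chars_per_second: float) -> str:
    """Return "stressed" or "calm" from word choice and input speed."""
    cleaned = query_text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    score = 0
    if words & STRESS_WORDS:
        score += 1  # word choice suggests urgency
    if chars_per_second > FAST_TYPING_CHARS_PER_SECOND:
        score += 1  # input speed suggests urgency
    return "stressed" if score >= 1 else "calm"
```

With these assumed rules, `estimate_emotion("I need a summary quickly", 3.0)` yields "stressed" because of the word "quickly", while `estimate_emotion("books on gardening", 2.0)` yields "calm". A learned model would replace both rules in practice.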

[0133] The following describes the processing flow.

[0134] Step 1:

[0135] The user enters a query about a book on the device, by voice or by typed input, and the device sends it to the server. In parallel, the device provides the user's real-time emotion data to the emotion engine.

[0136] Step 2:

[0137] As the server receives query data, the emotion engine analyzes the user's voice tone and input data to estimate the user's emotional state. The emotion engine then sends these analysis results to the server.

[0138] Step 3:

[0139] The server searches the database based on the received query data and sentiment information, and extracts the relevant book information. The server then prepares to pass the extracted book information to the generative model.

[0140] Step 4:

[0141] The server passes book information to a generative model, which generates a summary. The generative model adjusts the summary based on the user's sentiment information, extracts the necessary information, and creates a text summary.

[0142] Step 5:

[0143] The server activates a speech synthesis engine to convert the generated text summary into speech format. The speech synthesis engine takes sentiment data into account and generates an appropriate speech tone.

[0144] Step 6:

[0145] The server sends the summary, converted into audio format, to the user's device. The device plays the received audio summary, allowing the user to listen to it.

[0146] Step 7:

[0147] The user reviews the provided audio and text summaries, receives suggestions suited to their emotional state, and decides on the next course of action. The user can then read reviews or follow the links to online bookstores to purchase books.

[0148] Step 8:

[0149] The server retrieves review information for relevant books from a database and sends it to the user's device. It also provides links to online bookstores, creating a quick and easy path for users to make purchases.
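The processing flow above can be sketched as a single server-side function. This is a minimal illustration under stated assumptions, not the implementation described in this specification: the in-memory BOOKS dictionary, the threshold rule in estimate_emotion, and the placeholder functions generate_summary and synthesize are all hypothetical stand-ins for the database, the emotion engine, the generative model, and the speech synthesis engine.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryResponse:
    """What the server returns to the device in Steps 6 and 8."""
    text_summary: str
    audio: bytes
    voice_style: str
    reviews: list[str] = field(default_factory=list)
    store_links: list[str] = field(default_factory=list)


# Hypothetical in-memory stand-in for the book database.
BOOKS = {
    "roman history": {
        "title": "A Short History of Rome",
        "description": "Covers the Republic, the Empire, and the fall of the West.",
        "reviews": ["Clear and well paced."],
        "store_link": "https://bookstore.example/items/1",
    }
}


def estimate_emotion(voice_tone: float, typing_speed_cps: float) -> str:
    """Step 2 placeholder: a toy threshold rule, not a real emotion engine."""
    return "stressed" if voice_tone > 0.7 or typing_speed_cps > 6.0 else "calm"


def generate_summary(book: dict, emotion: str) -> str:
    """Step 4 placeholder: a real system would call a generative model here."""
    if emotion == "stressed":
        # Shorter output for a stressed user: keep only the first clause.
        return f"{book['title']}: {book['description'].split(',')[0]}."
    return f"{book['title']}: {book['description']}"


def synthesize(text: str, voice_style: str) -> bytes:
    """Step 5 placeholder: returns tagged bytes instead of real audio."""
    return f"[{voice_style}] {text}".encode("utf-8")


def handle_query(query: str, voice_tone: float, typing_speed_cps: float) -> SummaryResponse:
    emotion = estimate_emotion(voice_tone, typing_speed_cps)  # Step 2
    book = BOOKS[query.lower()]                               # Step 3
    text = generate_summary(book, emotion)                    # Step 4
    style = "calm" if emotion == "stressed" else "friendly"
    audio = synthesize(text, style)                           # Step 5
    # Steps 6 and 8: audio summary plus reviews and bookstore links.
    return SummaryResponse(text, audio, style, book["reviews"], [book["store_link"]])
```

Calling handle_query("Roman History", voice_tone=0.9, typing_speed_cps=3.0) takes the "stressed" branch, so the sketch returns the shortened summary together with the calm voice style.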

[0150] (Example 2)

[0151] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0152] Conventional information presentation systems struggled to provide information while considering the user's emotional state, and were unable to flexibly respond to the user's desired style and level of detail. Furthermore, the presentation style of extracted information was uniform, resulting in a poor user experience. Additionally, guidance to e-commerce platforms was sometimes not smooth, failing to fully stimulate user purchasing intent.

[0153] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0154] In this invention, the server includes means for estimating the user's emotional state using an emotion analysis device based on information input from the user terminal, means for extracting relevant book information from information sources based on the user's emotional state and query data, and means for converting the generated text summary into audio data and making adjustments appropriate to the user's emotional state. This enables the user to receive appropriate information according to their emotional state, providing a better user experience. Furthermore, it enables smooth guidance to related e-commerce platforms, thereby increasing the user's purchasing intent.

[0155] A "user terminal" is a device used by a user to input information and receive results, and includes electronic devices such as computers, smartphones, and tablets.

[0156] An "emotion analysis device" refers to a system or software that analyzes a user's voice tone, input speed, and selected words to estimate the user's emotional state at that time.

[0157] "Query data" refers to data that users input to seek specific information, and typically includes questions or requests presented in text format.

[0158] "Information sources" refer to various databases and recording media, which are foundational resources for providing information based on search queries.

[0159] A "generative model" refers to an algorithm or system that uses natural language processing to create a text summary based on specific input data.

[0160] "Audio data" refers to audio information recorded in digital format, and is used to present information and provide guidance to users.

[0161] An "e-commerce platform" refers to an online system or website that enables the provision of product information and sales procedures in a digital environment.

[0162] This invention is a system that provides optimal book summary information while taking into account the user's emotional state. The embodiments of this system are described in detail below.

[0163] The user uses a terminal to enter queries about books. The terminal not only sends this entered query data to a server but also collects data such as the user's voice and typing speed. This data is sent to an emotion analysis device, which can estimate the user's emotional state based on their voice tone, typing speed, and selected words.
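As one possible illustration of the data collection described above, the following sketch derives a typing speed from keystroke timestamps and bundles it with the query for the emotion analysis device. The function names, the payload fields, and the voice_clip_id reference are hypothetical; the specification does not define a concrete data format.

```python
import json


def typing_speed_cps(key_times: list[float]) -> float:
    """Average characters per second from keystroke timestamps (seconds, ascending)."""
    if len(key_times) < 2:
        return 0.0
    elapsed = key_times[-1] - key_times[0]
    # Intervals between keystrokes, not the number of keystrokes, set the rate.
    return (len(key_times) - 1) / elapsed if elapsed > 0 else 0.0


def build_emotion_payload(query: str, key_times: list[float], voice_clip_id: str) -> str:
    """Bundle what the terminal sends to the emotion analysis device."""
    return json.dumps({
        "query": query,
        "typing_speed_cps": round(typing_speed_cps(key_times), 2),
        "voice_clip_id": voice_clip_id,  # hypothetical reference to the recorded audio
    })
```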

[0164] The server uses the user's emotional state and query data obtained from the emotion analysis device to extract relevant book information from a source. This source is, for example, a relational database, and efficient data management is performed. The server then uses a generative model (for example, an AI model that performs natural language processing) to summarize the extracted book information. The generative model adjusts the style and level of detail of the text summary according to the user's emotional state. As a concrete example of a prompt, instructions are given to the generative AI model in the form of, "The user is busy and would like a concise summary."
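The relational-database lookup mentioned above might look like the following sketch, which uses SQLite from the Python standard library. The table name books, its columns, and the sample rows are assumptions made for illustration; the specification states only that book information is retrieved from a relational database.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical books table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)")
    conn.executemany(
        "INSERT INTO books (title, summary) VALUES (?, ?)",
        [
            ("Ancient Roman History", "An overview of the Republic and the Empire."),
            ("Cooking Basics", "Simple recipes for beginners."),
            ("World History Textbook", "A survey written for students."),
        ],
    )
    return conn


def find_books(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, summary) rows whose title or summary contains the query."""
    # Bound parameters keep user input out of the SQL text.
    # LIKE wildcards inside the query itself are not escaped in this sketch.
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT title, summary FROM books "
        "WHERE title LIKE ? OR summary LIKE ? "
        "ORDER BY title LIMIT ?",
        (pattern, pattern, limit),
    )
    return rows.fetchall()
```

The emotional state is not used in the query here. One possible design, not stated in the specification, would be to lower the limit for a user who appears to be in a hurry.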

[0165] The generated summary is converted into audio data, and then speech synthesis is performed to suit the user's emotional state. Dedicated software is used for speech synthesis, and this audio data is sent to the terminal. During speech synthesis, adjustments are made to the voice, such as "reading in a relaxed tone."
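One way to express a voice adjustment such as "reading in a relaxed tone" is to map the emotional state to prosody settings and wrap the summary in SSML. The sketch below is an assumption-laden illustration: the specification does not name SSML, the numeric values and state names are hypothetical, and support for the prosody attributes varies between speech synthesis engines.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class VoiceSettings:
    rate: float             # 1.0 means the engine's default speaking rate
    pitch_semitones: float  # relative pitch shift


# Hypothetical mapping from emotional state to voice settings.
VOICE_BY_EMOTION = {
    "stressed": VoiceSettings(rate=0.9, pitch_semitones=-1.0),  # slower, lower: relaxed tone
    "busy": VoiceSettings(rate=1.15, pitch_semitones=0.0),      # slightly faster delivery
}
NEUTRAL = VoiceSettings(rate=1.0, pitch_semitones=0.0)


def voice_for(emotional_state: str) -> VoiceSettings:
    """Fall back to neutral settings for states the table does not cover."""
    return VOICE_BY_EMOTION.get(emotional_state, NEUTRAL)


def to_ssml(text: str, voice: VoiceSettings) -> str:
    """Wrap the summary text in an SSML prosody element."""
    rate = f"{round(voice.rate * 100)}%"
    pitch = f"{voice.pitch_semitones:+.1f}st"
    # escape() protects characters such as & and < inside the summary text.
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody></speak>'
```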

[0166] Users ultimately receive the summary on their device in both audio and text formats. The server also provides related book reviews and links to e-commerce platforms. For example, when a user is looking for a book of interest and the system detects that the user is feeling stressed, the system is expected to summarize the book's content in an easy-to-read, reassuring style and to provide immediately accessible links.

[0167] This enables flexible information delivery that takes user emotions into consideration, as well as a comfortable user experience.

[0168] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0169] Step 1:

[0170] The user uses a terminal to enter a query about books. The input data includes text information about specific book titles and topics based on the user's interests. This input becomes the query data that forms the basis of the subsequent information retrieval process. Specifically, the user enters text into the terminal's input field and presses the "Search" button.

[0171] Step 2:

[0172] The terminal simultaneously records the user's voice data and input speed and transmits them to the emotion analysis device. The input includes raw data such as the user's voice tone and input speed. The emotion analysis device uses this data to estimate the emotional state and outputs the result. Specifically, the terminal collects voice data through the microphone and records the input speed in real time as a log.

[0173] Step 3:

[0174] The server receives the query data and the emotional state data sent from the terminal and extracts relevant book information from the information source. The input includes query data in string format and the emotional state expressed as a number or a category. The server issues search queries to the information source and retrieves information about the relevant books as output. Specifically, it issues SQL queries to retrieve book titles and summaries from the relational database.

[0175] Step 4:

[0176] The server uses a generative AI model to generate a summary text based on the acquired book information. The input includes book information extracted from the database and the user's emotional state. The generative AI model receives a prompt instructing it to "generate a summary according to the user's request," and outputs the summary text. Specifically, the process involves the AI model sequentially summarizing the text according to the amount of text being processed.

[0177] Step 5:

[0178] The server passes the generated summary text to the speech conversion engine to generate audio data. The input includes the summary text generated by the generative AI model and the user's emotional state. The speech conversion engine outputs audio data in a tone appropriate to the emotional state. Specifically, the speech synthesis software adjusts the voice tone based on the emotional state and generates a digital audio file.

[0179] Step 6:

[0180] The terminal receives audio transmitted from the server and plays it for the user. The input includes audio data sent from the server. Specifically, the user clicks the audio play button, and the audio is output through the terminal's speaker.

[0181] Step 7:

[0182] The server also provides the user's device with relevant book review information and links to e-commerce platforms. Input includes review information and purchase link data extracted from the source. The user reviews this information on the device and navigates to the purchase page by clicking the links. Specifically, the purchase link is displayed on the screen, and the user performs a mouse click or tap.
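Step 4 above mentions summarizing sequentially according to the amount of text. One common reading of that phrase, offered here as an assumption rather than as the method of this specification, is to split long book information into chunks, summarize each chunk, and then summarize the combined partial summaries. In the sketch below, the summarize argument stands in for a call to the generative AI model, and first_sentence is a toy replacement used only to make the example runnable.

```python
from typing import Callable


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word  # a single over-long word still becomes its own chunk
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_sequentially(text: str, summarize: Callable[[str], str], max_chars: int = 200) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_chars)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return summarize(" ".join(partials))


def first_sentence(text: str) -> str:
    """Toy stand-in for the generative model: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."
```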

[0183] (Application Example 2)

[0184] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0185] In today's information society, users face the challenge of quickly and efficiently obtaining relevant information from a vast amount of data. Furthermore, a lack of information provision that considers the user's emotional state degrades the quality of the user experience. To address this issue, optimal information provision and audio output tailored to the user's emotional state are required.

[0186] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0187] In this invention, the server includes means for receiving query data about books entered from a user terminal, means for extracting the relevant book information from a data storage area based on the query data, and means for analyzing the user's emotional state and adjusting the style and level of detail of the summary based on the analysis results. This makes it possible to provide optimal information according to the user's emotional state.

[0188] A "user terminal" is a device used by users to input query data and receive information.

[0189] "Query data" refers to data entered to search for information about a book.

[0190] A "data storage area" is a place where book information and evaluation information are stored and can be retrieved as needed.

[0191] A "generative model" is an algorithm or program used to generate a text summary based on extracted book information.

[0192] "Emotional state" refers to the psychological or emotional condition inferred from the user's voice and behavior.

[0193] "Audio format" refers to a format in which text data is converted into audio data using speech synthesis technology.

[0194] An "online store" is a sales platform where products can be purchased online.

[0195] In order to implement this invention, it is necessary to construct a system that utilizes a user terminal, a server, a data storage area, an emotion analysis engine, a generative model, and a speech synthesis engine.

[0196] First, the user terminal receives query data about books entered by the user. The terminal has a built-in emotion analysis engine that estimates the emotional state from the user's voice and input actions, taking into account factors such as voice tone and speed and the words chosen. Common API services (for example, natural language processing APIs and speech analysis APIs) are used for the emotion analysis.
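To make the combination of these signals concrete, the following sketch scores three features (voice tone, speaking speed, and word choice) and labels the user "stressed" when at least two of them point that way. The thresholds, the word list, and the two-label output are hypothetical; an actual emotion analysis engine would rely on the API services mentioned above or on a trained model.

```python
from dataclasses import dataclass

# Hypothetical lexicon of words that suggest time pressure.
STRESS_WORDS = {"urgent", "quickly", "hurry", "deadline", "asap"}


@dataclass
class Signals:
    pitch_variance: float   # normalized to 0..1 by an upstream speech analysis step
    speech_rate_wps: float  # words per second
    text: str               # the words the user chose


def estimate_emotional_state(signals: Signals) -> str:
    """Count how many of the three signals suggest stress."""
    words = [w.strip(".,!?").lower() for w in signals.text.split()]
    score = 0
    if signals.pitch_variance > 0.6:
        score += 1
    if signals.speech_rate_wps > 3.0:
        score += 1
    if any(w in STRESS_WORDS for w in words):
        score += 1
    return "stressed" if score >= 2 else "neutral"
```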

[0197] Next, the server extracts relevant book information from the data storage area based on the received query data and sentiment analysis results. This book information is summarized by a generative AI model, and its style and level of detail are adjusted according to the user's emotional state. The generative AI model used is designed to respond to prompts that match the user's intent and emotions.
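A prompt that reflects the user's emotional state could be assembled as in the sketch below. The wording for the "busy" case is taken from the prompt example given earlier in this document; the other hints, the state names, and the prompt layout are assumptions made for illustration.

```python
# Style instructions keyed by emotional state. Only the "busy" wording
# comes from this document; the others are hypothetical.
STYLE_HINTS = {
    "busy": "The user is busy and would like a concise summary.",
    "stressed": "The user seems stressed; keep the summary short and reassuring.",
    "relaxed": "The user has time; a more detailed summary is welcome.",
}
DEFAULT_HINT = "Summarize the book in a neutral tone."


def build_summary_prompt(title: str, book_information: str, emotional_state: str) -> str:
    """Combine a style instruction with the extracted book information."""
    hint = STYLE_HINTS.get(emotional_state, DEFAULT_HINT)
    return (
        f"{hint}\n"
        f"Book title: {title}\n"
        f"Book information: {book_information}\n"
        "Summary:"
    )
```

Keeping the style instruction separate from the book information makes it possible to change the tone for a given emotional state without touching the retrieval step.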

[0198] The generated summary is sent to a speech synthesis engine and converted into an audio format with a narration style tailored to the user's emotions. Existing speech synthesis technologies are used for the speech synthesis.

[0199] Finally, the server sends the generated summary to the user's terminal, also providing links to related book reviews and e-book stores. This allows users to receive emotionally personalized information and immediately purchase related products.

[0200] As a concrete example, in an implementation where the user is experiencing stress, the system can provide a quick and concise summary, delivering the information in a calming voice to reduce stress. An example of a prompt to the generative AI model would be the command, "Given the user's anxiety, generate a summary of relaxing products and add sentimental summaries of customer reviews."

[0201] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0202] Step 1:

[0203] The user terminal waits for the user to input query data about books and receives the input. It passes the input query data, together with the user's voice and behavior data, to the emotion analysis engine. The emotion analysis engine analyzes the voice tone and input speed to estimate the user's emotional state. The output of this step is the query data and the estimated emotional state, which serve as input to the next step.

[0204] Step 2:

[0205] The server receives query data and emotional state sent from the terminal. Based on the query data, it searches for and extracts the relevant book information from the data storage area. As a result, the book information is extracted and sent to the next step.

[0206] Step 3:

[0207] The server runs a generative AI model on the extracted book information. The model adjusts the style and level of detail of the summary according to the user's emotional state to generate an appropriate text summary, using emotion-responsive prompts. The output of this step is the summary text data.

[0208] Step 4:

[0209] The generated text summary is sent to the speech synthesis engine. The speech synthesis engine receives the summary and converts it into speech in a voice style that matches the user's emotions. The output from the speech synthesis engine is the audio data to be provided to the user.

[0210] Step 5:

[0211] The server transmits the generated audio data, related book reviews, and links to online stores to the user's terminal. The user's terminal then provides the received information to the user through screen display and audio playback. The user can receive customized information and consider their options.
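The data sent to the terminal in Step 5 could be packaged as a single JSON message, as in the following sketch. The field names and the use of Base64 for the audio bytes are assumptions; the specification does not define a transmission format.

```python
import base64
import json


def build_response(audio: bytes, audio_format: str, reviews: list[str], store_links: list[str]) -> str:
    """Package audio, reviews, and store links into one JSON message."""
    payload = {
        # JSON cannot carry raw bytes, so the audio is Base64-encoded.
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "audio_format": audio_format,
        "reviews": reviews,
        "store_links": store_links,
    }
    return json.dumps(payload, ensure_ascii=False)


def read_audio(message: str) -> bytes:
    """Terminal side: recover the audio bytes for playback."""
    return base64.b64decode(json.loads(message)["audio_base64"])
```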

[0212] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0213] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives, as input, prompts containing instructions and inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0214] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0215] [Second Embodiment]

[0216] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0217] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0218] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0219] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0220] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0221] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0222] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0223] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0224] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0225] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0226] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0227] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0228] This invention relates to a system for simplifying the entire process from efficiently obtaining book information using a terminal to making a purchase. The server, which is the core of the system, functions as follows:

[0229] When a user enters a query about a book using a terminal, the terminal sends that data to the server. The server searches its internal database based on the received query and extracts the relevant book information. Using the extracted information, the server applies a generative model to generate a summary of the book's content. This summary is in text format and is then converted into audio format.

[0230] Once an audio summary is prepared, the server sends it to the user's device. The user can then play this audio summary through their device during their daily life, enabling them to efficiently understand the book's content. The server also retrieves review information related to the book to support the user's purchase decision. In addition, the server provides links to online bookstores, guiding users to quickly purchase books they like.

[0231] As a concrete example, consider a case where a user searches for "history textbooks." The user enters "history textbooks" into their device, and the server extracts relevant books from its database. Based on the extracted information, a generative AI creates a summary and converts it into audio format. The user can listen to the audio summary on their device, check reviews, and proceed with the purchase via a link to an online bookstore. In this way, the present invention provides users with an easy and quick way to obtain information and make a purchase.

[0232] The following describes the processing flow.

[0233] Step 1:

[0234] The user uses their device to enter a query about a book and clicks the search button. The device then sends this query data to the server.

[0235] Step 2:

[0236] The server searches its internal database based on the received query data and extracts the relevant book information. The server then prepares the extracted results for the next processing step.

[0237] Step 3:

[0238] The server passes the acquired book information to a generative model to create a summary of the content. The generative model extracts the key points of the book and generates the summary in text format.

[0239] Step 4:

[0240] The server converts the generated text summary into audio format. A dedicated speech synthesis engine is used for this purpose to generate the audio data.

[0241] Step 5:

[0242] The server generates audio summary data and sends it to the user's device. The device receives the audio data and prepares for playback.

[0243] Step 6:

[0244] Users can play an audio summary on their device to efficiently understand the book's content. They can then use this summary to consider purchasing the book.

[0245] Step 7:

[0246] The server retrieves relevant book review information from the database and sends it to the terminal. This allows the user to see ratings from other users.

[0247] Step 8:

[0248] The server provides the device with a link to an online bookstore. When the user clicks the link, the device opens the page of the specified online bookstore and makes the book available for purchase.

[0249] (Example 1)

[0250] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0251] In today's information-saturated society, there is a need to streamline the process by which users can quickly and efficiently acquire the document information they require, understand its content, and make purchasing decisions. However, conventional information retrieval systems have struggled to integrate information extraction, summarization, evaluation, and purchase guidance, often resulting in decreased user satisfaction.

[0252] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0253] In this invention, the server includes means for receiving query data regarding documents input from a user device, means for extracting relevant document information from a storage device based on the query data, and means for generating a text summary using a generation algorithm based on the extracted document information. This enables the user to quickly obtain the information they need, efficiently understand its contents, and make better purchasing decisions based on the evaluation information.

[0254] "User equipment" refers to information terminals such as computers and mobile devices operated by the user, and includes means for inputting and receiving information.

[0255] "Query data" refers to information such as queries and keywords that users enter when searching for specific document information.

[0256] "Document information" refers to information about specific text, books, or digital content that is searched based on a user's query.

[0257] A "storage device" refers to a database or hardware or software system for storing information where document information and related data are pre-stored.

[0258] A "generation algorithm" refers to the computational procedures and processes used to perform text summarization based on extracted document information.

[0259] A "text summary" refers to a concise explanation that condenses the main points and content of a document.

[0260] "Audio format" refers to a form of information that has been converted from text-based information into audio data, enabling users to receive information in audio format.

[0261] An "electronic marketplace" refers to a platform or website where users can purchase documents and goods online.

[0262] This invention is a system for users to efficiently acquire document information and make purchasing decisions based on that information. The system mainly consists of three elements: a server, a terminal, and the user.

[0263] The server has a communication interface for receiving query data from user terminals. Based on user input, the server searches a database in its storage device and extracts relevant document information. The server then applies a generative AI model to the extracted document information to create a text summary of the document content. This generative AI model utilizes natural language processing techniques and has the ability to efficiently summarize the key points of a document.
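The database search described above is not specified in detail. As one simple possibility, the sketch below ranks documents by how many query keywords appear in their title or description. The document structure and the scoring rule are assumptions; a production system might instead use full-text indexing or embedding-based retrieval.

```python
def search_documents(query: str, documents: list[dict], top_k: int = 3) -> list[str]:
    """Return titles of the documents sharing the most keywords with the query."""
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        words = set(f"{doc['title']} {doc['description']}".lower().split())
        score = len(terms & words)  # number of distinct query terms found
        if score:
            scored.append((score, doc["title"]))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:top_k]]


# Hypothetical sample data for illustration only.
DOCUMENTS = [
    {"title": "Ancient Rome", "description": "A history of the Roman Republic"},
    {"title": "Modern Europe", "description": "European history since 1945"},
    {"title": "Gardening", "description": "Growing vegetables at home"},
]
```

With this sample data, the query "books on ancient Roman history" ranks "Ancient Rome" first because three of its words match, followed by "Modern Europe" with one match.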

[0264] After a text summary is generated, the server uses text-to-speech software to convert it into audio format. This audio data is sent to the user's terminal, allowing the user to easily access the information in everyday situations. The server also retrieves review information for related documents and sends it to the user's terminal. This allows users to make more informed decisions by referring to feedback from other buyers.

[0265] Furthermore, the server generates links to online e-marketplaces, providing users with a smooth path to purchase relevant documents. These links appear on the user's device, allowing them to instantly purchase the necessary documents.

[0266] As a concrete example, consider a case where a user searches for information about "books on ancient Roman history." The user enters a query on their device, the server generates a summary based on the relevant information, and sends the audio data to the device. The user listens to the summary in audio format, views reviews as needed, and clicks on the provided links to go to the purchase page.

[0267] As an example of a prompt, you might input something like, "I'd like a summary of a book on ancient Roman history." This allows the system to respond quickly and provide information that meets the user's needs.

[0268] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0269] Step 1:

[0270] The user opens the search interface on their device and enters keywords related to books or documents. The entered keywords are sent to the server as query data. This sends the server a query for the information the user is looking for, and processing begins.

[0271] Step 2:

[0272] The server searches its database in storage based on the query data it receives. Using a database search algorithm, the server extracts matching document information. The input is keywords sent by the user, and the output is a set of corresponding document information. This process efficiently collects the information the user is looking for.

[0273] Step 3:

[0274] The server inputs the extracted document information into a generative AI model to generate a text summary. In this step, the generative AI model performs natural language processing to generate a concise overview of the key points. The generated text summary is output. This reduces the amount of information and makes it easier for the user to understand.

[0275] Step 4:

[0276] The server uses text-to-speech software to convert the generated text summary into audio format. The text data is converted into an audio file and output, so that users can access the information in audio format. This process is a means of utilizing the summarized information over a long period of time.

[0277] Step 5:

[0278] The server sends audio files and associated review information to the user's terminal. The input is the generated audio files and review information retrieved from the database, and the output is the complete dataset delivered to the user's terminal. This allows the user to acquire and review the information visually or aurally.

[0279] Step 6:

[0280] Users play audio summaries on their devices and review the provided review information. By manipulating the outputted dataset, users can easily make purchasing decisions. Specific actions include playing audio and loading reviews using the device's audio player.

[0281] Step 7:

[0282] The server generates a link to the relevant online marketplace and sends it to the user's terminal. This link guides the user to purchase the document immediately. When the user clicks the link, the online transaction begins and the purchase is completed.

[0283] (Application Example 1)

[0284] Next, Application Example 1 will be described. In the following description, the data processing device 12 is referred to as a "server", and the smart glasses 214 are referred to as a "terminal".

[0285] In the existing book purchase process, users need to spend a lot of time and effort to obtain book information. The process of checking the content of the book, referring to reviews, and purchasing procedures is cumbersome, which hinders a smooth purchasing experience. In particular, there is no system that can complete these only with voice, and due to the non-intuitive user interface, the convenience is impaired. Therefore, there is a need for a system that can directly provide information to users in voice and enable quick and smooth purchases.

[0286] The specific processing by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0287] In this invention, the server includes means for receiving query information regarding a book input from a user computing terminal, means for extracting corresponding book information from an information storage unit based on the query information, means for generating a textual summary using a generative model based on the extracted book information, means for converting the generated textual summary into an audio format, means for transmitting the summary information in audio format to the user computing terminal, and means for completing a transaction quickly through electronic payment after the user listens to the audio summary and selects purchase information. As a result, users can efficiently understand the book content and quickly complete the purchase procedure.

[0288] A "user computing terminal" is an electronic device for a user to input information and process the received data.

[0289] "Query information" is search conditions or question data input by a user to obtain specific book information.

[0290] An "information storage unit" refers to a database or storage system used to store book information and related data.

[0291] A "generative model" is an artificial intelligence algorithm used to summarize or transform information from acquired data.

[0292] A "textual summary" is a text-based summary of book information.

[0293] "Audio format" refers to audio data used to output text data as auditory information.

[0294] An "audio summary" is information that has been condensed from the content of a book and then converted into audio data.

[0295] "Purchase information" refers to data related to product selection and payment used by users when purchasing books.

[0296] "Electronic payment" refers to electronic payment methods used for making payments online.

[0297] The system for carrying out this invention includes a server and a user computing terminal. The server receives query information about books from the user computing terminal, searches an information storage unit, and extracts the relevant book information. Based on this information, the server generates a text summary using a generative AI model, and further converts the summary into an audio format.

[0298] The user's computing terminal receives the converted audio summary and provides an environment where users can listen to that information in their daily lives. After listening to the audio summary, users can select purchase information and complete the transaction immediately through electronic payment.

[0299] Specific hardware includes smartphones and cloud servers, and software uses natural language processing libraries (e.g., NLTK, spaCy), text-to-speech APIs (e.g., Google Text-to-Speech), and payment APIs (e.g., Stripe).

[0300] For example, when a user searches for the latest best-selling novel, an audio summary such as "This novel is a thrilling story set in a future city" is sent to the terminal. When the user selects to purchase, electronic payment is completed instantaneously through fingerprint or face authentication.

[0301] Examples of prompt sentences are as follows:

[0302] "Generate an audio summary of the best-selling novel and comment on whether it should be purchased."
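
One way such a prompt sentence could be assembled from extracted book information is sketched below. The function name `build_summary_prompt`, its parameters, and the sentence limit are hypothetical; only the instruction wording is taken from the example above.

```python
def build_summary_prompt(title: str, description: str, max_sentences: int = 3) -> str:
    """Compose a prompt for the generative model from one book record."""
    return (
        "Generate an audio summary of the best-selling novel and comment on "
        "whether it should be purchased.\n"
        f"Title: {title}\n"
        f"Book information: {description}\n"
        # The length limit is an illustrative assumption, not part of the specification.
        f"Keep the summary within {max_sentences} sentences."
    )


prompt = build_summary_prompt("Neon Horizon", "A thrilling story set in a future city.")
```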

[0303] In this way, a system is realized that enables users to efficiently understand the content of books and quickly complete the purchase procedure.

[0304] The flow of the specific process in Application Example 1 will be described using FIG. 12.

[0305] Step 1:

[0306] The user inputs query information about a book on the terminal. This query information is prepared as text data and sent to the server. For example, a search phrase such as "the latest best-selling novel" is input.

[0307] Step 2:

[0308] The server searches the information storage unit using the received query information. Here, a database search is performed to extract the corresponding book information and related data. The input is the query information, and the output is a set of book information.

[0309] Step 3:

[0310] The server inputs the extracted book information into an AI model to generate a textual summary. At this stage, a concise text summary is created. The input is book information, and the output is the summarized text.

[0311] Step 4:

[0312] The server converts the generated summary text into audio format. It uses a speech synthesis API to convert the text data into audio data. At this stage, the generated audio summary is obtained. The input is a text summary, and the output is audio data.

[0313] Step 5:

[0314] The server sends the summarized data, converted into audio format, to the user's device. The user listens to the audio summary through their device and efficiently understands the information. They also use an audio playback application to verify the summary.

[0315] Step 6:

[0316] The user listens to an audio summary and confirms their purchase intention as needed. They select purchase information on the terminal and prepare to make an electronic payment using fingerprint or facial recognition. The input is purchase intention information, and the output is payment preparation information.

[0317] Step 7:

[0318] Based on the user's selection, the server executes the electronic payment process. The transaction is completed using the payment API, and confirmation information regarding the purchase is sent to the terminal. Inputs are purchase information and authentication information, and output is transaction completion information.
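
Steps 1 to 7 above can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the in-memory `CATALOG`, the `Book` fields, and the functions `summarize`, `synthesize`, and `pay` are hypothetical stand-ins for the information storage unit, the generative model, the speech synthesis API, and the payment API, none of which are specified at this level of detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    isbn: str
    title: str
    description: str
    price_yen: int


# Stand-in for the information storage unit (assumption: a small in-memory list).
CATALOG = [
    Book("9780000000001", "Neon Horizon", "A thrilling story set in a future city. It follows a courier.", 1800),
    Book("9780000000002", "World History Primer", "An introductory history textbook. It covers major eras.", 2400),
]


def search_books(query: str) -> list[Book]:
    """Step 2: return books whose title or description contains any query term."""
    terms = query.lower().split()
    return [b for b in CATALOG
            if any(t in f"{b.title} {b.description}".lower() for t in terms)]


def summarize(book: Book) -> str:
    """Step 3: stand-in for the generative model (keeps the first sentence only)."""
    first_sentence = book.description.split(".")[0].strip()
    return f"{book.title}: {first_sentence}."


def synthesize(text: str) -> bytes:
    """Step 4: stand-in for a speech synthesis API (returns bytes, not real audio)."""
    return text.encode("utf-8")


def handle_query(query: str) -> Optional[tuple[Book, bytes]]:
    """Steps 2 to 5: search, summarize, convert, and return data for the terminal."""
    books = search_books(query)
    if not books:
        return None
    book = books[0]
    return book, synthesize(summarize(book))


def pay(book: Book, authenticated: bool) -> dict:
    """Steps 6 and 7: stand-in for the payment API, gated on biometric authentication."""
    if not authenticated:
        return {"status": "rejected", "isbn": book.isbn}
    return {"status": "completed", "isbn": book.isbn, "amount_yen": book.price_yen}


result = handle_query("future city")
```

In a real deployment each stand-in would be replaced by the corresponding external service, and the authentication flag would come from the terminal's fingerprint or face recognition result.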

[0319] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0320] This invention relates to a system that provides optimal book summary information while taking into account the user's emotional state. This system incorporates an emotion engine to improve the user experience and functions as follows:

[0321] When a user uses a device and enters a query about a book, the device sends that data to the server. In addition, the device recognizes the user's emotions by sending the user's voice and input data to an emotion engine in real time. The emotion engine analyzes the tone of voice, word choice, input speed, etc., to estimate the user's current emotional state.
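
A toy version of such an emotion engine is sketched below. It is a rule-based illustration only: the word list, the typing-speed threshold, and the two labels are assumptions made here, tone-of-voice analysis is omitted, and the emotion identification model 59 is presumably a learned model rather than a set of rules.

```python
# Hypothetical cue words suggesting urgency (assumption, not from the specification).
STRESS_WORDS = {"quick", "quickly", "urgent", "hurry", "asap", "now"}


def estimate_emotion(text: str, chars_per_second: float) -> str:
    """Estimate a coarse emotional state from word choice and input speed."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = 0
    if words & STRESS_WORDS:      # word choice
        score += 1
    if chars_per_second > 6.0:    # input speed (threshold is an arbitrary assumption)
        score += 1
    if text.count("!") >= 2:      # emphatic punctuation
        score += 1
    return "stressed" if score >= 2 else "calm"


hurried = estimate_emotion("I need a quick summary now!!", chars_per_second=8.0)
relaxed = estimate_emotion("Tell me about history books", chars_per_second=3.0)
```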

[0322] The server searches the database based on recognized sentiment information and queries, and extracts relevant book information. This extracted information is passed to a generative model, which creates a summary. The generative model adjusts the style and level of detail of the summary according to the user's sentiment.

[0323] Subsequently, during the process of converting the text summary into speech, adjustments are made to take the user's emotions into consideration. The speech synthesis engine generates speech that matches the emotion, such as a friendly or calm tone.

[0324] For example, if a user wants to obtain information quickly amidst a busy daily life, the emotion engine senses their stress and anxiety. Based on this information, the server creates a summary tailored to the busy user and delivers it quickly. Furthermore, the summary is played in a calming voice to help the user relax.

[0325] The server provides information including relevant book reviews and also presents links to online bookstores to encourage user purchases. In this way, the present invention aims to improve the user experience by providing flexible responses that take into account the user's emotions.

[0326] The following describes the processing flow.

[0327] Step 1:

[0328] The user enters a book query on the device by voice or text input, and the device sends it to the server. In parallel, the device provides the user's real-time emotion data to the emotion engine.

[0329] Step 2:

[0330] As the server receives query data, the emotion engine analyzes the user's voice tone and input data to estimate the user's emotional state. The emotion engine then sends these analysis results to the server.

[0331] Step 3:

[0332] The server searches the database based on the received query data and emotion information and extracts the relevant book information. The server then prepares to pass the extracted book information to the generative model.

[0333] Step 4:

[0334] The server passes book information to a generative model, which generates a summary. The generative model adjusts the summary based on the user's sentiment information, extracts the necessary information, and creates a text summary.

[0335] Step 5:

[0336] The server activates a speech synthesis engine to convert the generated text summary into speech format. The speech synthesis engine takes sentiment data into account and generates an appropriate speech tone.

[0337] Step 6:

[0338] The server sends the summary, converted into audio format, to the user's device. The device plays the received audio summary, allowing the user to listen to it.

[0339] Step 7:

[0340] Users review the provided audio and text summaries, together with suggestions suited to their emotional state, and decide on their next course of action. They can then read reviews or follow a link to an online bookstore to purchase a book.

[0341] Step 8:

[0342] The server retrieves review information for relevant books from a database and sends it to the user's device. It also provides links to online bookstores, creating a quick and easy path for users to make purchases.

[0343] (Example 2)

[0344] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0345] Conventional information presentation systems struggled to provide information while considering the user's emotional state, and were unable to flexibly respond to the user's desired style and level of detail. Furthermore, the presentation style of extracted information was uniform, resulting in a poor user experience. Additionally, guidance to e-commerce platforms was sometimes not smooth, failing to fully stimulate user purchasing intent.

[0346] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0347] In this invention, the server includes means for estimating the user's emotional state using an emotion analysis device based on information input from the user terminal, means for extracting relevant book information from information sources based on the user's emotional state and query data, and means for converting the generated text summary into audio data and making adjustments appropriate to the user's emotional state. This enables the user to receive appropriate information according to their emotional state, providing a better user experience. Furthermore, it enables smooth guidance to related e-commerce platforms, thereby increasing the user's purchasing intent.

[0348] A "user terminal" is a device used by a user to input information and receive results, and includes electronic devices such as computers, smartphones, and tablets.

[0349] An "emotion analysis device" refers to a system or software that analyzes a user's voice tone, input speed, and selected words to estimate the user's emotional state at that time.

[0350] "Query data" refers to data that users input to seek specific information, and typically includes questions or requests presented in text format.

[0351] "Information sources" refer to various databases and recording media, which are foundational resources for providing information based on search queries.

[0352] A "generative model" refers to an algorithm or system that uses natural language processing to create a text summary based on specific input data.

[0353] "Audio data" refers to audio information recorded in digital format, and is used to present information and provide guidance to users.

[0354] An "e-commerce platform" refers to an online system or website that enables the provision of product information and sales procedures in a digital environment.

[0355] This invention is a system that provides optimal book summary information while taking into account the user's emotional state. The embodiments of this system are described in detail below.

[0356] The user uses a terminal to enter queries about books. The terminal not only sends this entered query data to a server but also collects data such as the user's voice and typing speed. This data is sent to an emotion analysis device, which can estimate the user's emotional state based on their voice tone, typing speed, and selected words.

[0357] The server uses the user's emotional state and query data obtained from the emotion analysis device to extract relevant book information from a source. This source is, for example, a relational database, and efficient data management is performed. The server then uses a generative model (for example, an AI model that performs natural language processing) to summarize the extracted book information. The generative model adjusts the style and level of detail of the text summary according to the user's emotional state. As a concrete example of a prompt, instructions are given to the generative AI model in the form of, "The user is busy and would like a concise summary."
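
The way the emotional state could steer the prompt is sketched below. Only the "busy" instruction is taken from the text above; the other entries, the state labels, and the function name `build_prompt` are hypothetical.

```python
# Style instruction per emotional state. Only the "busy" wording appears in the
# specification; the remaining entries are illustrative assumptions.
STYLE_BY_EMOTION = {
    "busy": "The user is busy and would like a concise summary.",
    "stressed": "The user seems stressed; use a reassuring, easy-to-read style.",
    "relaxed": "The user has time; a more detailed summary is acceptable.",
}
DEFAULT_STYLE = "Write a neutral summary of moderate length."


def build_prompt(book_info: str, emotional_state: str) -> str:
    """Combine the style instruction for the emotional state with the book information."""
    style = STYLE_BY_EMOTION.get(emotional_state, DEFAULT_STYLE)
    return f"{style}\nSummarize the following book information:\n{book_info}"


prompt = build_prompt("Neon Horizon: a thriller set in a future city.", "busy")
```

Unknown states fall back to a neutral instruction, so a failure of the emotion analysis device would not block summary generation.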

[0358] The generated summary is converted into audio data, and then speech synthesis is performed to suit the user's emotional state. Dedicated software is used for speech synthesis, and this audio data is sent to the terminal. During speech synthesis, adjustments are made to the voice, such as "reading in a relaxed tone."
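
An adjustment such as "reading in a relaxed tone" could be expressed in SSML (Speech Synthesis Markup Language), whose `prosody` element accepts `rate` and `pitch` values such as `slow` and `low`. The sketch below assumes that the speech synthesis software accepts SSML input, which the specification does not state; the mapping table and the function name `to_ssml` are likewise hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from emotional state to SSML prosody settings.
PROSODY_BY_EMOTION = {
    "stressed": {"rate": "slow", "pitch": "low"},   # relaxed, calming delivery
    "calm": {"rate": "medium", "pitch": "medium"},
}


def to_ssml(text: str, emotional_state: str) -> str:
    """Wrap the summary text in SSML prosody markup matching the emotional state."""
    p = PROSODY_BY_EMOTION.get(emotional_state, PROSODY_BY_EMOTION["calm"])
    # escape() protects characters such as "&" and "<" inside the XML body.
    return (
        f'<speak><prosody rate="{p["rate"]}" pitch="{p["pitch"]}">'
        f"{escape(text)}</prosody></speak>"
    )


ssml = to_ssml("Tom & Jerry: a short summary.", "stressed")
```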

[0359] Users ultimately receive the summary on their device in both audio and text formats. The server also provides related book reviews and links to e-commerce platforms. For example, when a user is looking for a book of interest and the system detects that the user is feeling stressed, it summarizes the book's content in an easy-to-read, reassuring style and provides immediately accessible links.

[0360] This enables flexible information delivery that takes user emotions into consideration, as well as a comfortable user experience.

[0361] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0362] Step 1:

[0363] The user uses a terminal to enter a query about books. The input data includes text information about specific book titles and topics based on the user's interests. This input becomes the query data that forms the basis of the subsequent information retrieval process. Specifically, the user enters text into the terminal's input field and presses the "Search" button.

[0364] Step 2:

[0365] The terminal simultaneously records the user's voice data and input speed and transmits them to the emotion analysis device. The input includes raw data such as the user's voice tone and input speed. The emotion analysis device uses this data to estimate the emotional state and outputs the result. Specifically, the terminal collects voice data through the microphone and records the input speed in real time as a log.

[0366] Step 3:

[0367] The server receives query data and sentiment data sent from the terminal and extracts relevant book information from the information source. Input includes query data in string format and sentiment status expressed as numbers or categories. It issues search queries to the information source and retrieves information about the relevant books as output. Specifically, it issues SQL queries to retrieve book titles and summaries from the relational database.
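
The SQL retrieval in this step might look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table name `books`, its columns, and the sample rows are assumptions for illustration; the specification only states that titles and summaries are retrieved from a relational database.

```python
import sqlite3

# In-memory database standing in for the information source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO books (title, summary) VALUES (?, ?)",
    [
        ("World History Primer", "An overview of world history."),
        ("Neon Horizon", "A thriller set in a future city."),
    ],
)


def find_books(conn: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Return (title, summary) rows whose title or summary contains the query."""
    pattern = f"%{query}%"
    # Parameter binding (?) keeps user input out of the SQL text itself.
    return conn.execute(
        "SELECT title, summary FROM books "
        "WHERE title LIKE ? OR summary LIKE ? ORDER BY title LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


rows = find_books(conn, "history")
```

The emotional state received alongside the query is not used in this lookup; it could, for example, influence ranking, but the specification does not say how.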

[0368] Step 4:

[0369] The server uses a generative AI model to generate a summary text based on the acquired book information. The input includes book information extracted from the database and the user's emotional state. The generative AI model receives a prompt instructing it to "generate a summary according to the user's request," and outputs the summary text. Specifically, the process involves the AI model sequentially summarizing the text according to the amount of text being processed.
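
One possible reading of "sequentially summarizing the text according to the amount of text" is chunked summarization: long input is split into pieces, each piece is summarized, and the partial summaries are summarized again. The sketch below illustrates that reading only; it is an assumption about the intended behavior, and `first_words` is a trivial stand-in for the generative AI model.

```python
from typing import Callable


def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_sequentially(text: str, summarize: Callable[[str], str],
                           max_words: int = 50) -> str:
    """Summarize each chunk in turn, then summarize the combined partial summaries."""
    partial = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]
    return summarize(" ".join(partial))


def first_words(chunk: str, n: int = 2) -> str:
    """Stand-in for the generative model: keep the first n words of the chunk."""
    return " ".join(chunk.split()[:n])


text = " ".join(f"w{i}" for i in range(12))
chunks = split_into_chunks(text, 5)
summary = summarize_sequentially(text, first_words, max_words=5)
```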

[0370] Step 5:

[0371] The server passes the generated summary text to the speech conversion engine to generate audio data. The input includes the summary text generated by the generation AI model and the user's emotional state. The speech conversion engine outputs audio data in a tone appropriate to the emotional state. Specifically, the speech synthesis software adjusts the voice tone based on the emotional request and generates a digital audio file.

[0372] Step 6:

[0373] The device receives audio transmitted from the server and plays it for the user. The input includes audio data sent from the server. Specifically, the user clicks the audio play button, and the audio is output through the device's speaker.

[0374] Step 7:

[0375] The server also provides the user's device with relevant book review information and links to e-commerce platforms. Input includes review information and purchase link data extracted from the source. The user reviews this information on the device and navigates to the purchase page by clicking the links. Specifically, the purchase link is displayed on the screen, and the user performs a mouse click or tap.

[0376] (Application Example 2)

[0377] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0378] In today's information society, users face the challenge of quickly and efficiently obtaining relevant information from a vast amount of data. Furthermore, a lack of information provision that considers the user's emotional state degrades the quality of the user experience. To address this issue, optimal information provision and audio output tailored to the user's emotional state are required.

[0379] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0380] In this invention, the server includes means for receiving query data about books entered from a user terminal, means for extracting the relevant book information from a data storage area based on the query data, and means for analyzing the user's emotional state and adjusting the style and level of detail of the summary based on the analysis results. This makes it possible to provide optimal information according to the user's emotional state.

[0381] A "user terminal" is a device used by users to input query data and receive information.

[0382] "Query data" refers to data entered to search for information about a book.

[0383] A "data storage area" is a place where book information and evaluation information are stored and can be retrieved as needed.

[0384] A "generative model" is an algorithm or program used to generate a text summary based on extracted book information.

[0385] "Emotional state" refers to the psychological or emotional condition inferred from the user's voice and behavior.

[0386] "Audio format" refers to a format in which text data is converted into audio data using speech synthesis technology.

[0387] An "online store" is a sales platform where products can be purchased online.

[0388] In order to implement this invention, it is necessary to construct a system that utilizes a user terminal, a server, a data storage area, an emotion analysis engine, a generative model, and a speech synthesis engine.

[0389] First, the user terminal receives query data about books entered by the user. The terminal has a built-in emotion analysis engine that analyzes the emotional state from the user's voice and input actions, taking into account factors such as voice tone and speed and the words chosen. Common API services (for example, natural language processing APIs and speech analysis APIs) are used for the emotion analysis.

[0390] Next, the server extracts relevant book information from the data storage area based on the received query data and sentiment analysis results. This book information is summarized by a generative AI model, and its style and level of detail are adjusted according to the user's emotional state. The generative AI model used is designed to respond to prompts that match the user's intent and emotions.

[0391] The generated summary is sent to a speech synthesis engine and converted into an audio format with a narration style tailored to the user's emotions. Existing speech synthesis technologies are used for the speech synthesis.

[0392] Finally, the server sends the generated summary to the user's terminal, together with related book reviews and links to online stores. This allows users to receive emotionally personalized information and immediately purchase related products.

[0393] As a concrete example, in an implementation where the user is experiencing stress, the system can provide a quick and concise summary, delivering the information in a calming voice to reduce stress. An example of a prompt to the generative AI model would be the command, "Given the user's anxiety, generate a summary of relaxing products and add sentimental summaries of customer reviews."

[0394] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0395] Step 1:

[0396] The user terminal waits for the user to input query data about books and receives the input. It passes the query data, together with the user's voice and behavior data, to the emotion analysis engine. The emotion analysis engine analyzes the voice tone and input speed to estimate the user's emotional state. The query data and the estimated emotional state are output as the input to the next step.

[0397] Step 2:

[0398] The server receives query data and emotional state sent from the terminal. Based on the query data, it searches for and extracts the relevant book information from the data storage area. As a result, the book information is extracted and sent to the next step.

[0399] Step 3:

[0400] The server runs a generative AI model based on the extracted book information. It adjusts the style and level of detail of the summary according to the user's emotional state to generate an appropriate text summary. In this process, the generative AI model uses emotion-responsive prompts. The output is the summary text data.

[0401] Step 4:

[0402] The generated text summary is sent to the speech synthesis engine. The speech synthesis engine receives the summary and converts it into speech in a voice style that matches the user's emotions. The output from the speech synthesis engine is the audio data to be provided to the user.

[0403] Step 5:

[0404] The server transmits the generated audio data, related book reviews, and links to online stores to the user's terminal. The user's terminal then provides the received information to the user through screen display and audio playback. The user can receive customized information and consider their options.
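
The data sent to the terminal in Step 5 could be packaged as in the following sketch. The JSON layout, the field names, and the use of Base64 for the audio bytes are assumptions made for illustration; the specification does not define a transmission format.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class SummaryResponse:
    audio_base64: str        # audio data, Base64-encoded so it can travel inside JSON
    reviews: list[str]       # related book reviews
    store_links: list[str]   # links to online stores


def build_response(audio: bytes, reviews: list[str], store_links: list[str]) -> str:
    """Serialize audio, reviews, and store links into one JSON message."""
    payload = SummaryResponse(
        audio_base64=base64.b64encode(audio).decode("ascii"),
        reviews=list(reviews),
        store_links=list(store_links),
    )
    return json.dumps(asdict(payload))


message = build_response(b"\x00\x01fake-audio", ["Clear and engaging."],
                         ["https://bookstore.example/buy?isbn=9780000000001"])
decoded = json.loads(message)
```

On the terminal side, decoding `audio_base64` restores the audio bytes for playback, while the reviews and links are shown on the screen.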

[0405] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0406] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0407] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0408] [Third Embodiment]

[0409] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0410] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0411] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0412] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0413] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0414] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0415] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0416] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0417] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0418] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0419] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0420] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0421] This invention relates to a system for simplifying the entire process from efficiently obtaining book information using a terminal to making a purchase. The server, which is the core of the system, functions as follows:

[0422] When a user enters a query about a book using a terminal, the terminal sends that data to the server. The server searches its internal database based on the received query and extracts the relevant book information. Using the extracted information, the server applies a generative model to generate a summary of the book's content. This summary is in text format and is then converted into audio format.

[0423] Once an audio summary is prepared, the server sends it to the user's device. The user can then play this audio summary through their device during their daily life, enabling them to efficiently understand the book's content. The server also retrieves review information related to the book to support the user's purchase decision. In addition, the server provides links to online bookstores, guiding users to quickly purchase books they like.

[0424] As a concrete example, consider a case where a user searches for "history textbooks." The user enters "history textbooks" into their device, and the server extracts relevant books from its database. Based on the extracted information, a generative AI creates a summary and converts it into audio format. The user can listen to the audio summary on their device, check reviews, and proceed with the purchase via a link to an online bookstore. In this way, the present invention provides users with an easy and quick way to obtain information and make a purchase.

[0425] The following describes the processing flow.

[0426] Step 1:

[0427] The user uses their device to enter a query about a book and clicks the search button. The device then sends this query data to the server.

[0428] Step 2:

[0429] The server searches its internal database based on the received query data and extracts the relevant book information. The server then prepares the extracted results for the next processing step.

[0430] Step 3:

[0431] The server passes the acquired book information to a generative model to create a summary of the content. The generative model extracts the key points of the book and generates the summary in text format.

[0432] Step 4:

[0433] The server converts the generated text summary into audio format, using a dedicated speech synthesis engine to generate the audio data.

[0434] Step 5:

[0435] The server generates audio summary data and sends it to the user's device. The device receives the audio data and prepares for playback.

[0436] Step 6:

[0437] Users can play an audio summary on their device to efficiently understand the book's content. They can then use this summary to consider purchasing the book.

[0438] Step 7:

[0439] The server retrieves relevant book review information from the database and sends it to the terminal. This allows the user to see ratings from other users.

[0440] Step 8:

[0441] The server provides the device with a link to an online bookstore. When the user clicks the link, the device opens the page of the specified online bookstore and makes the book available for purchase.
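As an illustration only, Steps 2 through 8 above can be sketched as a single server-side function. This is a minimal sketch and not the implementation of the invention: the `Book` and `SummaryResponse` types, the keyword-matching search, the `bookstore.example` URL, and the stand-in functions `first_sentence` and `fake_tts` are all assumptions introduced for the example. The generative model and the speech synthesis engine are passed in as plain callables so that any real engine could be substituted.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import quote_plus


@dataclass
class Book:
    title: str
    description: str
    reviews: list[str] = field(default_factory=list)


@dataclass
class SummaryResponse:
    title: str
    audio: bytes          # audio summary (Steps 4 and 5)
    reviews: list[str]    # review information (Step 7)
    purchase_url: str     # online bookstore link (Step 8)


def search_books(query: str, catalog: list[Book]) -> list[Book]:
    """Step 2: naive case-insensitive keyword match (assumed search method)."""
    words = query.lower().split()
    return [
        b for b in catalog
        if any(w in (b.title + " " + b.description).lower() for w in words)
    ]


def handle_query(
    query: str,
    catalog: list[Book],
    summarize: Callable[[str], str],     # Step 3: generative model, injected
    synthesize: Callable[[str], bytes],  # Step 4: speech synthesis, injected
) -> list[SummaryResponse]:
    responses = []
    for book in search_books(query, catalog):
        text = summarize(book.description)
        responses.append(
            SummaryResponse(
                title=book.title,
                audio=synthesize(text),
                reviews=book.reviews,
                # Hypothetical bookstore URL, for illustration only.
                purchase_url="https://bookstore.example/search?q="
                + quote_plus(book.title),
            )
        )
    return responses


# Stand-ins so the sketch runs without a model or a speech engine.
def first_sentence(text: str) -> str:
    return text.split(". ")[0].rstrip(".") + "."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


catalog = [
    Book("A Short History of Rome",
         "Covers the Republic and the Empire. Includes maps.",
         ["Clear and concise."]),
    Book("Cooking Basics", "Simple recipes for beginners."),
]
results = handle_query("history textbooks", catalog, first_sentence, fake_tts)
```

Injecting `summarize` and `synthesize` keeps the flow itself independent of any particular generative model or speech engine.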

[0442] (Example 1)

[0443] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0444] In today's information-saturated society, there is a need to streamline the process by which users can quickly and efficiently acquire the document information they require, understand its content, and make purchasing decisions. However, conventional information retrieval systems have struggled to integrate information extraction, summarization, evaluation, and purchase guidance, often resulting in decreased user satisfaction.

[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0446] In this invention, the server includes means for receiving query data regarding documents input from a user device, means for extracting relevant document information from a storage device based on the query data, and means for generating a text summary using a generation algorithm based on the extracted document information. This enables the user to quickly obtain the information they need, efficiently understand its contents, and make better purchasing decisions based on the evaluation information.

[0447] "User equipment" refers to information terminals such as computers and mobile devices operated by the user, and includes means for inputting and receiving information.

[0448] "Query data" refers to information such as queries and keywords that users enter when searching for specific document information.

[0449] "Document information" refers to information about specific text, books, or digital content that is searched based on a user's query.

[0450] A "storage device" refers to a database or hardware or software system for storing information where document information and related data are pre-stored.

[0451] A "generation algorithm" refers to the computational procedures and processes used to perform text summarization based on extracted document information.

[0452] A "text summary" refers to a concise explanation that condenses the main points and content of a document.

[0453] "Audio format" refers to a form of information that has been converted from text-based information into audio data, enabling users to receive information in audio format.

[0454] An "electronic marketplace" refers to a platform or website where users can purchase documents and goods online.

[0455] This invention is a system for users to efficiently acquire document information and make purchasing decisions based on that information. The system mainly consists of three elements: a server, a terminal, and the user.

[0456] The server has a communication interface for receiving query data from user terminals. Based on user input, the server searches a database in its storage device and extracts relevant document information. The server then applies a generative AI model to the extracted document information to create a text summary of the document content. This generative AI model utilizes natural language processing techniques and has the ability to efficiently summarize the key points of a document.

[0457] After a text summary is generated, the server uses text-to-speech software to convert it into audio format. This audio data is sent to the user's terminal, allowing the user to easily access the information in everyday situations. The server also retrieves review information for related documents and sends it to the user's terminal. This allows users to make more informed decisions by referring to feedback from other buyers.

[0458] Furthermore, the server generates links to online e-marketplaces, providing users with a smooth path to purchase relevant documents. These links appear on the user's device, allowing them to instantly purchase the necessary documents.
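One possible way to generate such a link is sketched below. The base URL `https://marketplace.example/item`, the query parameter names, and the placeholder ISBN are hypothetical and are not taken from any actual electronic marketplace. The sketch only shows that the document title should be URL-encoded before it is embedded in the link.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical marketplace endpoint, used only for illustration.
MARKETPLACE_BASE = "https://marketplace.example/item"


def build_purchase_link(isbn: str, title: str, source: str = "audio-summary") -> str:
    """Return a link to the purchase page with safely encoded parameters."""
    params = {"isbn": isbn, "title": title, "ref": source}
    return f"{MARKETPLACE_BASE}?{urlencode(params)}"


# Placeholder ISBN; not a real book identifier.
link = build_purchase_link("9780000000002", "Ancient Rome: A History")

# The terminal (or the marketplace) can recover the original values.
decoded = parse_qs(urlsplit(link).query)
```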

[0459] As a concrete example, consider a case where a user searches for information about "books on ancient Roman history." The user enters a query on their device, the server generates a summary based on the relevant information, and sends the audio data to the device. The user listens to the summary in audio format, views reviews as needed, and clicks on the provided links to go to the purchase page.

[0460] As an example of a prompt, you might input something like, "I'd like a summary of a book on ancient Roman history." This allows the system to respond quickly and provide information that meets the user's needs.

[0461] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0462] Step 1:

[0463] The user opens the search interface on their device and enters keywords related to books or documents. The entered keywords are sent to the server as query data. This sends the server a query for the information the user is looking for, and processing begins.

[0464] Step 2:

[0465] The server searches its database in storage based on the query data it receives. Using a database search algorithm, the server extracts matching document information. The input is keywords sent by the user, and the output is a set of corresponding document information. This process efficiently collects the information the user is looking for.

[0466] Step 3:

[0467] The server inputs the extracted document information into a generative AI model to generate a text summary. In this step, the generative AI model performs natural language processing to generate a concise overview of the key points. The generated text summary is output. This reduces the amount of information and makes it easier for the user to understand.

[0468] Step 4:

[0469] The server uses text-to-speech software to convert the generated text summary into audio format. The text data is converted into an audio file and output, so that users can access the information in audio format. This process also allows the summarized information to be used over a long period of time.

[0470] Step 5:

[0471] The server sends audio files and associated review information to the user's terminal. The input is the generated audio files and review information retrieved from the database, and the output is the complete dataset delivered to the user's terminal. This allows the user to acquire and review the information visually or aurally.

[0472] Step 6:

[0473] Users play audio summaries on their devices and review the provided review information. By manipulating the outputted dataset, users can easily make purchasing decisions. Specific actions include playing audio and loading reviews using the device's audio player.

[0474] Step 7:

[0475] The server generates a link to the relevant online marketplace and sends it to the user's terminal. This link guides the user to purchase the document immediately. When the user clicks the link, the online transaction begins and the purchase is completed.
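The dataset described in Step 5, in which the audio file and the review information are delivered together, could be packaged for example as a single JSON message. The following is a minimal sketch under that assumption. The field names (`audio`, `mime`, `data`, `reviews`) and the use of Base64 are choices made for this example and are not specified in the description.

```python
import base64
import json


def build_payload(audio: bytes, reviews: list[dict], audio_mime: str = "audio/wav") -> str:
    """Server side: bundle the audio summary and the reviews into one message."""
    return json.dumps(
        {
            "audio": {
                "mime": audio_mime,
                # JSON cannot carry raw bytes, so the audio is Base64-encoded.
                "data": base64.b64encode(audio).decode("ascii"),
            },
            "reviews": reviews,
        }
    )


def read_payload(message: str) -> tuple[bytes, list[dict]]:
    """Terminal side: recover the audio bytes and the review information."""
    body = json.loads(message)
    return base64.b64decode(body["audio"]["data"]), body["reviews"]


# Placeholder bytes stand in for a real audio file.
message = build_payload(b"RIFF-placeholder", [{"rating": 4, "text": "Readable overview."}])
audio, reviews = read_payload(message)
```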

[0476] (Application Example 1)

[0477] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0478] Existing book purchasing processes require users to spend a significant amount of time and effort obtaining book information. The process of reviewing book content, reading reviews, and completing the purchase is cumbersome, hindering a smooth buying experience. In particular, there are no systems that complete these processes solely through voice, and the user interface is not intuitive, compromising convenience. Therefore, there is a need for a system that provides information directly to users via voice, enabling quick and smooth purchases.

[0479] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0480] In this invention, the server includes means for receiving query information about books entered from a user's computing terminal, means for extracting the corresponding book information from an information storage unit based on the query information, means for generating a text summary using a generative model based on the extracted book information, means for converting the generated text summary into an audio format, means for transmitting the summary information converted into an audio format to the user's computing terminal, and means for the user to listen to the audio summary, select purchase information, and quickly complete the transaction through electronic payment. This enables the user to efficiently understand the contents of the book and quickly complete the purchase procedure.

[0481] A "user computing terminal" is an electronic device used by users to input information and process received data.

[0482] "Query information" refers to the search criteria and question data entered by the user to retrieve specific book information.

[0483] An "information storage unit" refers to a database or storage system used to store book information and related data.

[0484] A "generative model" is an artificial intelligence algorithm used to summarize or transform information from acquired data.

[0485] A "textual summary" is a text-based summary of book information.

[0486] "Audio format" refers to audio data used to output text data as auditory information.

[0487] An "audio summary" is information that has been condensed from the content of a book and then converted into audio data.

[0488] "Purchase information" refers to data related to product selection and payment used by users when purchasing books.

[0489] "Electronic payment" refers to electronic payment methods used for making payments online.

[0490] The system for carrying out this invention includes a server and a user computing terminal. The server receives query information about books from the user computing terminal, searches an information storage unit, and extracts the relevant book information. Based on this information, the server generates a text summary using a generative AI model, and further converts the summary into an audio format.
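The conversion into an audio format can be kept independent of any particular speech synthesis product by placing it behind a small interface. The sketch below assumes such an interface (`SpeechSynthesizer`) and uses a placeholder engine (`SilentSynthesizer`) that produces silence instead of speech, so that the example runs with the Python standard library alone. Both names and the timing parameters are assumptions for illustration. A real engine would replace the placeholder while returning audio data in the same way.

```python
import io
import wave
from typing import Protocol


class SpeechSynthesizer(Protocol):
    """Assumed interface: text in, encoded audio bytes out."""

    def synthesize(self, text: str) -> bytes: ...


class SilentSynthesizer:
    """Placeholder engine: emits silence whose length scales with the text."""

    def __init__(self, sample_rate: int = 16000, ms_per_word: int = 400):
        self.sample_rate = sample_rate
        self.ms_per_word = ms_per_word

    def synthesize(self, text: str) -> bytes:
        n_frames = len(text.split()) * self.ms_per_word * self.sample_rate // 1000
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * n_frames)
        return buffer.getvalue()


def to_audio(summary: str, engine: SpeechSynthesizer) -> bytes:
    """Convert a text summary into audio data using the given engine."""
    return engine.synthesize(summary)


audio = to_audio("A thrilling story set in a futuristic city", SilentSynthesizer())
```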

[0491] The user's computing terminal receives the converted audio summary and provides an environment where users can listen to that information in their daily lives. After listening to the audio summary, users can select purchase information and complete the transaction immediately through electronic payment.

[0492] Specific hardware includes smartphones and cloud servers, while software uses natural language processing libraries (e.g., NLTK, spaCy), speech synthesis APIs (e.g., Google Text-to-Speech), and payment APIs (e.g., Stripe).

[0493] For example, when a user searches for the latest bestselling novel, an audio summary such as "This novel is a thrilling story set in a futuristic city" is sent to the device. If the user chooses to purchase, electronic payment is completed instantly using fingerprint or facial recognition.

[0494] Examples of prompt statements are as follows:

[0495] "Generate an audio summary of a bestselling novel and comment on whether you should buy it."

[0496] In this way, a system is created that allows users to efficiently understand the content of books and quickly complete the purchase process.

[0497] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0498] Step 1:

[0499] The user enters query information about books on their device. This query information is prepared as text data and sent to the server. For example, the search phrase "latest best-selling novels" might be entered.

[0500] Step 2:

[0501] The server uses the received query information to search the information storage. A database search is performed here, and the relevant book information and related data are extracted. The input is the query information, and the output is a set of book information.

[0502] Step 3:

[0503] The server inputs the extracted book information into an AI model to generate a textual summary. At this stage, a concise text summary is created. The input is book information, and the output is the summarized text.

[0504] Step 4:

[0505] The server converts the generated summary text into audio format. It uses a speech synthesis API to convert the text data into audio data. At this stage, the generated audio summary is obtained. The input is a text summary, and the output is audio data.

[0506] Step 5:

[0507] The server sends the summarized data, converted into audio format, to the user's device. The user listens to the audio summary through their device and efficiently understands the information. They also use an audio playback application to verify the summary.

[0508] Step 6:

[0509] The user listens to an audio summary and confirms their purchase intention as needed. They select purchase information on the terminal and prepare to make an electronic payment using fingerprint or facial recognition. The input is purchase intention information, and the output is payment preparation information.

[0510] Step 7:

[0511] Based on the user's selection, the server executes the electronic payment process. The transaction is completed using the payment API, and confirmation information regarding the purchase is sent to the terminal. Inputs are purchase information and authentication information, and output is transaction completion information.
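The inputs and output of Step 7 can be illustrated as follows. This is a sketch only: `PaymentGateway` is a hypothetical interface, `FakeGateway` is a stand-in that accepts a single hard-coded token, and none of the names correspond to an actual payment API. In practice the authentication information would come from the terminal's fingerprint or facial recognition, and the charge would be delegated to the payment API.

```python
import uuid
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PurchaseInfo:
    book_id: str
    price_yen: int


@dataclass(frozen=True)
class TransactionResult:
    transaction_id: str
    book_id: str
    status: str  # "completed" or "rejected"


class PaymentGateway(Protocol):
    """Hypothetical interface wrapping whichever payment API is used."""

    def charge(self, amount_yen: int, auth_token: str) -> bool: ...


class FakeGateway:
    """Stand-in gateway for the example; accepts one hard-coded token."""

    def charge(self, amount_yen: int, auth_token: str) -> bool:
        return auth_token == "biometric-ok" and amount_yen > 0


def complete_purchase(
    purchase: PurchaseInfo, auth_token: str, gateway: PaymentGateway
) -> TransactionResult:
    """Inputs: purchase and authentication information. Output: completion information."""
    ok = gateway.charge(purchase.price_yen, auth_token)
    return TransactionResult(
        transaction_id=uuid.uuid4().hex,
        book_id=purchase.book_id,
        status="completed" if ok else "rejected",
    )


accepted = complete_purchase(PurchaseInfo("book-001", 1800), "biometric-ok", FakeGateway())
refused = complete_purchase(PurchaseInfo("book-001", 1800), "unknown", FakeGateway())
```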

[0512] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0513] This invention relates to a system that provides optimal book summary information while taking into account the user's emotional state. This system incorporates an emotion engine to improve the user experience and functions as follows:

[0514] When a user uses a device and enters a query about a book, the device sends that data to the server. In addition, the device recognizes the user's emotions by sending the user's voice and input data to an emotion engine in real time. The emotion engine analyzes the tone of voice, word choice, input speed, etc., to estimate the user's current emotional state.
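The description does not specify how the emotion engine combines these signals. Purely as an illustration, a rule-based estimate could look like the sketch below. The word list, the thresholds (6 characters per second, 220 Hz), and the two labels `stressed` and `calm` are arbitrary assumptions for this example. An actual emotion engine would more likely rely on a trained model.

```python
from dataclasses import dataclass

# Illustrative vocabulary only; not taken from the description.
STRESS_WORDS = {"quickly", "urgent", "hurry", "asap", "now"}


@dataclass
class InputSignals:
    text: str                # word choice
    chars_per_second: float  # input speed
    voice_pitch_hz: float    # rough proxy for tone of voice


def estimate_emotion(signals: InputSignals) -> str:
    """Count how many signals suggest stress; two or more yields 'stressed'."""
    score = 0
    words = {w.strip(".,!?").lower() for w in signals.text.split()}
    if words & STRESS_WORDS:
        score += 1
    if signals.chars_per_second > 6.0:   # assumed threshold
        score += 1
    if signals.voice_pitch_hz > 220.0:   # assumed threshold
        score += 1
    return "stressed" if score >= 2 else "calm"


hurried = estimate_emotion(InputSignals("I need a summary quickly!", 7.5, 180.0))
relaxed = estimate_emotion(InputSignals("books on ancient Roman history", 3.0, 180.0))
```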

[0515] The server searches the database based on recognized sentiment information and queries, and extracts relevant book information. This extracted information is passed to a generative model, which creates a summary. The generative model adjusts the style and level of detail of the summary according to the user's sentiment.

[0516] Subsequently, during the process of converting the text summary into speech, adjustments are made to take the user's emotions into consideration. The speech synthesis engine generates speech that matches the emotion, such as a friendly or calm tone.
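Many speech synthesis engines accept SSML (Speech Synthesis Markup Language), in which the `prosody` element controls speaking rate and pitch. Assuming the engine in use accepts SSML, the emotion-dependent adjustment could be expressed as in the sketch below. The mapping from emotion labels to prosody values is a hypothetical example, and some engines additionally require version and namespace attributes on the `speak` element.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from the estimated emotion to SSML prosody settings.
PROSODY_BY_EMOTION = {
    "stressed": {"rate": "slow", "pitch": "low"},    # calming delivery
    "calm": {"rate": "medium", "pitch": "medium"},   # neutral delivery
}


def to_ssml(summary: str, emotion: str) -> str:
    """Wrap the text summary in SSML whose prosody depends on the emotion."""
    p = PROSODY_BY_EMOTION.get(emotion, PROSODY_BY_EMOTION["calm"])
    return (
        "<speak>"
        f'<prosody rate="{p["rate"]}" pitch="{p["pitch"]}">'
        f"{escape(summary)}"   # escape &, < and > so the markup stays valid
        "</prosody></speak>"
    )


ssml = to_ssml("War & peace in ancient Rome", "stressed")
```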

[0517] For example, if a user wants to obtain information quickly amidst a busy daily life, the emotion engine senses their stress and anxiety. Based on this information, the server creates a summary tailored to the busy user and delivers it quickly. Furthermore, the summary is played in a calming voice to help the user relax.

[0518] The server provides information including relevant book reviews and also presents links to online bookstores to encourage user purchases. In this way, the present invention aims to improve the user experience by providing flexible responses that take into account the user's emotions.

[0519] The following describes the processing flow.

[0520] Step 1:

[0521] The user enters a query about a book on the device by voice or text input, and the device sends the query to the server. In parallel, the device provides the user's real-time emotion data to the emotion engine.

[0522] Step 2:

[0523] As the server receives query data, the emotion engine analyzes the user's voice tone and input data to estimate the user's emotional state. The emotion engine then sends these analysis results to the server.

[0524] Step 3:

[0525] The server searches the database based on the received query data and emotion information and extracts the relevant book information. The server then prepares to pass the extracted book information to the generative model.

[0526] Step 4:

[0527] The server passes book information to a generative model, which generates a summary. The generative model adjusts the summary based on the user's sentiment information, extracts the necessary information, and creates a text summary.

[0528] Step 5:

[0529] The server activates a speech synthesis engine to convert the generated text summary into speech format. The speech synthesis engine takes sentiment data into account and generates an appropriate speech tone.

[0530] Step 6:

[0531] The server sends the summary, converted into audio format, to the user's device. The device plays the received audio summary, allowing the user to listen to it.

[0532] Step 7:

[0533] Users review the provided audio and text summaries and receive suggestions suited to their emotional state to decide on their next course of action. They can then read reviews or follow the links to online bookstores to purchase books.

[0534] Step 8:

[0535] The server retrieves review information for relevant books from a database and sends it to the user's device. It also provides links to online bookstores, creating a quick and easy path for users to make purchases.

[0536] (Example 2)

[0537] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0538] Conventional information presentation systems struggled to provide information while considering the user's emotional state, and were unable to flexibly respond to the user's desired style and level of detail. Furthermore, the presentation style of extracted information was uniform, resulting in a poor user experience. Additionally, guidance to e-commerce platforms was sometimes not smooth, failing to fully stimulate user purchasing intent.

[0539] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0540] In this invention, the server includes means for estimating the user's emotional state using an emotion analysis device based on information input from the user terminal, means for extracting relevant book information from information sources based on the user's emotional state and query data, and means for converting the generated text summary into audio data and making adjustments appropriate to the user's emotional state. This enables the user to receive appropriate information according to their emotional state, providing a better user experience. Furthermore, it enables smooth guidance to related e-commerce platforms, thereby increasing the user's purchasing intent.

[0541] A "user terminal" is a device used by a user to input information and receive results, and includes electronic devices such as computers, smartphones, and tablets.

[0542] An "emotion analysis device" refers to a system or software that analyzes a user's voice tone, input speed, and selected words to estimate the user's emotional state at that time.

[0543] "Query data" refers to data that users input to seek specific information, and typically includes questions or requests presented in text format.

[0544] "Information sources" refer to various databases and recording media, which are foundational resources for providing information based on search queries.

[0545] A "generative model" refers to an algorithm or system that uses natural language processing to create a text summary based on specific input data.

[0546] "Audio data" refers to audio information recorded in digital format, and is used to present information and provide guidance to users.

[0547] An "e-commerce platform" refers to an online system or website that enables the provision of product information and sales procedures in a digital environment.

[0548] This invention is a system that provides optimal book summary information while taking into account the user's emotional state. The embodiments of this system are described in detail below.

[0549] The user uses a terminal to enter queries about books. The terminal not only sends this entered query data to a server but also collects data such as the user's voice and typing speed. This data is sent to an emotion analysis device, which can estimate the user's emotional state based on their voice tone, typing speed, and selected words.

[0550] The server uses the user's emotional state and query data obtained from the emotion analysis device to extract relevant book information from a source. This source is, for example, a relational database, and efficient data management is performed. The server then uses a generative model (for example, an AI model that performs natural language processing) to summarize the extracted book information. The generative model adjusts the style and level of detail of the text summary according to the user's emotional state. As a concrete example of a prompt, instructions are given to the generative AI model in the form of, "The user is busy and would like a concise summary."
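The way the emotional state is reflected in the prompt could be sketched as follows. Only the instruction "The user is busy and would like a concise summary." comes from the description. The other style sentences, the emotion labels, and the prompt layout are assumptions added for illustration.

```python
# Default and "relaxed" instructions are assumed wording, not from the description.
DEFAULT_STYLE = "The user would like a balanced summary."
STYLE_BY_EMOTION = {
    "busy": "The user is busy and would like a concise summary.",
    "relaxed": "The user has time and would like a detailed summary.",
}


def build_prompt(title: str, description: str, emotion: str) -> str:
    """Compose the instruction passed to the generative model."""
    style = STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)
    return "\n".join(
        [
            style,
            "Summarize the following book for audio playback.",
            f"Title: {title}",
            f"Description: {description}",
        ]
    )


prompt = build_prompt("Ancient Rome", "An overview of Roman history.", "busy")
```

Keeping the style instruction on its own line lets the same book information be reused with different emotional states.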

[0551] The generated summary is converted into audio data, and then speech synthesis is performed to suit the user's emotional state. Dedicated software is used for speech synthesis, and this audio data is sent to the terminal. During speech synthesis, adjustments are made to the voice, such as "reading in a relaxed tone."

[0552] Users can ultimately receive the summary in both audio and text formats from their device. The server also provides related book reviews and links to e-commerce platforms. For example, when a user is looking for a book of interest and the system detects that the user is feeling stressed, the system summarizes the book's content in an easy-to-read and reassuring style and provides immediately accessible links.

[0553] This enables flexible information delivery that takes user emotions into consideration, as well as a comfortable user experience.

[0554] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0555] Step 1:

[0556] The user uses a terminal to enter a query about books. The input data includes text information about specific book titles and topics based on the user's interests. This input becomes the query data that forms the basis of the subsequent information retrieval process. Specifically, the user enters text into the terminal's input field and presses the "Search" button.

[0557] Step 2:

[0558] The terminal simultaneously records the user's voice data and input speed and transmits them to the emotion analysis device. The input includes raw data such as the user's voice tone and input speed. The emotion analysis device uses this data to estimate the emotional state and outputs the result. Specifically, the terminal collects voice data through the microphone and records the input speed in real time as a log.

[0559] Step 3:

[0560] The server receives the query data and emotion data sent from the terminal and extracts relevant book information from the information source. The input includes query data in string format and the emotional state expressed as numbers or categories. The server issues search queries to the information source and retrieves information about the relevant books as output. Specifically, it issues SQL queries to retrieve book titles and summaries from the relational database.

[0561] Step 4:

[0562] The server uses a generative AI model to generate a summary text based on the acquired book information. The input includes book information extracted from the database and the user's emotional state. The generative AI model receives a prompt instructing it to "generate a summary according to the user's request," and outputs the summary text. Specifically, the process involves the AI model sequentially summarizing the text according to the amount of text being processed.

[0563] Step 5:

[0564] The server passes the generated summary text to the speech conversion engine to generate audio data. The input includes the summary text generated by the generation AI model and the user's emotional state. The speech conversion engine outputs audio data in a tone appropriate to the emotional state. Specifically, the speech synthesis software adjusts the voice tone based on the emotional request and generates a digital audio file.

[0565] Step 6:

[0566] The device receives audio transmitted from the server and plays it for the user. The input includes audio data sent from the server. Specifically, the user clicks the audio play button, and the audio is output through the device's speaker.

[0567] Step 7:

[0568] The server also provides the user's device with relevant book review information and links to e-commerce platforms. Input includes review information and purchase link data extracted from the source. The user reviews this information on the device and navigates to the purchase page by clicking the links. Specifically, the purchase link is displayed on the screen, and the user performs a mouse click or tap.
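The SQL retrieval mentioned in Step 3 can be illustrated with an in-memory SQLite database. The table name `books`, its columns, and the sample rows are assumptions made for this sketch. The keyword is passed as a bound parameter rather than concatenated into the SQL text, which avoids SQL injection. Wildcard characters (`%`, `_`) inside the keyword are not escaped in this simplified example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one table holding titles and summaries.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO books (title, summary) VALUES (?, ?)",
    [
        ("Ancient Rome", "An overview of Roman history."),
        ("Modern Physics", "An introduction to relativity."),
    ],
)


def find_books(db: sqlite3.Connection, keyword: str) -> list[tuple[str, str]]:
    """Return (title, summary) rows whose title or summary contains the keyword."""
    pattern = f"%{keyword}%"
    cursor = db.execute(
        "SELECT title, summary FROM books "
        "WHERE title LIKE ? OR summary LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return cursor.fetchall()


# SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = find_books(conn, "roman")
```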

[0569] (Application Example 2)

[0570] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0571] In today's information society, users face the challenge of quickly and efficiently obtaining relevant information from a vast amount of data. Furthermore, a lack of information provision that considers the user's emotional state degrades the quality of the user experience. To address this issue, optimal information provision and audio output tailored to the user's emotional state are required.

[0572] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0573] In this invention, the server includes means for receiving query data about books entered from a user terminal, means for extracting the relevant book information from a data storage area based on the query data, and means for analyzing the user's emotional state and adjusting the style and level of detail of the summary based on the analysis results. This makes it possible to provide optimal information according to the user's emotional state.

[0574] A "user terminal" is a device used by users to input query data and receive information.

[0575] "Query data" refers to data entered to search for information about a book.

[0576] A "data storage area" is a place where book information and evaluation information are stored and can be retrieved as needed.

[0577] A "generative model" is an algorithm or program used to generate a text summary based on extracted book information.

[0578] "Emotional state" refers to the psychological or emotional condition inferred from the user's voice and behavior.

[0579] "Audio format" refers to a format in which text data is converted into audio data using speech synthesis technology.

[0580] An "online store" is a sales platform where products can be purchased online.

[0581] In order to implement this invention, it is necessary to construct a system that utilizes a user terminal, a server, a data storage area, an emotion analysis engine, a generative model, and a speech synthesis engine.

[0582] First, the user terminal receives query data about books entered by the user. The terminal has a built-in sentiment analysis engine that analyzes the emotional state from the user's voice and input actions. This takes into account factors such as voice tone and speed, and the words chosen. Common API services (for example, natural language processing APIs and speech analysis APIs) are used for sentiment analysis.

[0583] Next, the server extracts relevant book information from the data storage area based on the received query data and sentiment analysis results. This book information is summarized by a generative AI model, and its style and level of detail are adjusted according to the user's emotional state. The generative AI model used is designed to respond to prompts that match the user's intent and emotions.

[0584] The generated summary is sent to a speech synthesis engine and converted into an audio format with a narration style tailored to the user's emotions. Existing speech synthesis technologies are used for the speech synthesis.

[0585] Finally, the server sends the generated summary to the user's terminal, also providing links to related book reviews and e-book stores. This allows users to receive emotionally personalized information and immediately purchase related products.

[0586] As a concrete example, in an implementation where the user is experiencing stress, the system can provide a quick and concise summary, delivering the information in a calming voice to reduce stress. An example of a prompt to the generative AI model would be the command, "Given the user's anxiety, generate a summary of relaxing products and add sentimental summaries of customer reviews."

[0587] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0588] Step 1:

[0589] The user terminal waits for the user to input query data about books and receives the input. It takes in the input query data together with the user's voice and behavior data and sends them to the sentiment analysis engine. The sentiment analysis engine analyzes the voice tone and input speed to estimate the user's emotional state. The query data and the estimated emotional state are output as the input data for the next step.

[0590] Step 2:

[0591] The server receives query data and emotional state sent from the terminal. Based on the query data, it searches for and extracts the relevant book information from the data storage area. As a result, the book information is extracted and sent to the next step.

[0592] Step 3:

[0593] The server runs a generative AI model based on the extracted book information. It adjusts the style and level of detail of the summary according to the user's emotional state to generate an appropriate text summary. In this process, the generative AI model utilizes emotion-responsive prompts. The output of this step is the summary text data.

[0594] Step 4:

[0595] The generated text summary is sent to the speech synthesis engine. The speech synthesis engine receives the summary and converts it into speech in a voice style that matches the user's emotions. The output from the speech synthesis engine is the audio data to be provided to the user.

[0596] Step 5:

[0597] The server transmits the generated audio data, related book reviews, and links to online stores to the user's terminal. The user's terminal then provides the received information to the user through screen display and audio playback. The user can receive customized information and consider their options.

[0598] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0599] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
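As a rough illustration of the input and output relationship described above, the following sketch bundles a prompt with typed inference data and passes it to a model callable. The names `InferenceItem`, `ModelRequest`, `run_inference`, and `echo_model` are hypothetical and do not appear in this disclosure. The model here is a stand-in for illustration, not ChatGPT or Gemini.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal

# The three kinds of inference data named in the text.
Modality = Literal["audio", "text", "image"]


@dataclass
class InferenceItem:
    """One piece of inference data (hypothetical container)."""
    modality: Modality
    payload: bytes


@dataclass
class ModelRequest:
    """A prompt containing instructions plus the data to infer from."""
    prompt: str
    items: List[InferenceItem] = field(default_factory=list)


def run_inference(model: Callable[[ModelRequest], str], request: ModelRequest) -> str:
    """Pass the request to the model and return its text output."""
    if not request.prompt.strip():
        raise ValueError("prompt must not be empty")
    return model(request)


def echo_model(request: ModelRequest) -> str:
    """Stand-in model: reports which instruction it received for which inputs."""
    kinds = ", ".join(item.modality for item in request.items)
    return f"[{request.prompt}] applied to: {kinds}"


if __name__ == "__main__":
    req = ModelRequest("Summarize", [InferenceItem("text", b"book description")])
    print(run_inference(echo_model, req))
```

Keeping the model behind a plain callable means any generative AI service could be substituted without changing the calling code.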

[0600] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0601] [Fourth Embodiment]

[0602] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0603] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0604] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0605] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0606] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0607] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0608] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0609] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0610] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0611] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0612] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0613] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0614] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0615] This invention relates to a system for simplifying the entire process from efficiently obtaining book information using a terminal to making a purchase. The server, which is the core of the system, functions as follows:

[0616] When a user enters a query about a book using a terminal, the terminal sends that data to the server. The server searches its internal database based on the received query and extracts the relevant book information. Using the extracted information, the server applies a generative model to generate a summary of the book's content. This summary is in text format and is then converted into audio format.

[0617] Once an audio summary is prepared, the server sends it to the user's device. The user can then play this audio summary through their device during their daily life, enabling them to efficiently understand the book's content. The server also retrieves review information related to the book to support the user's purchase decision. In addition, the server provides links to online bookstores, guiding users to quickly purchase books they like.

[0618] As a concrete example, consider a case where a user searches for "history textbooks." The user enters "history textbooks" into their device, and the server extracts relevant books from its database. Based on the extracted information, a generative AI creates a summary and converts it into audio format. The user can listen to the audio summary on their device, check reviews, and proceed with the purchase via a link to an online bookstore. In this way, the present invention provides users with an easy and quick way to obtain information and make a purchase.

[0619] The following describes the processing flow.

[0620] Step 1:

[0621] The user uses their device to enter a query about a book and clicks the search button. The device then sends this query data to the server.

[0622] Step 2:

[0623] Based on the received query data, the server searches its internal database and extracts the relevant book information. The server then prepares the extracted results for the next processing step.

[0624] Step 3:

[0625] The server passes the acquired book information to a generative model to create a summary of the content. The generative model extracts the key points of the book and generates the summary in text format.

[0626] Step 4:

[0627] The server converts the generated text summary into audio format. A dedicated speech synthesis engine is used for this purpose to generate the audio data.

[0628] Step 5:

[0629] The server generates audio summary data and sends it to the user's device. The device receives the audio data and prepares for playback.

[0630] Step 6:

[0631] Users can play an audio summary on their device to efficiently understand the book's content. They can then use this summary to consider purchasing the book.

[0632] Step 7:

[0633] The server retrieves relevant book review information from the database and sends it to the terminal. This allows the user to see ratings from other users.

[0634] Step 8:

[0635] The server provides the device with a link to an online bookstore. When the user clicks the link, the device opens the page of the specified online bookstore and makes the book available for purchase.
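The flow of Steps 1 to 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the in-memory `catalog` stands in for the server's internal database, and `first_sentence` and `fake_tts` are hypothetical stand-ins for the generative model and the speech synthesis engine. The URL uses a placeholder domain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Book:
    title: str
    description: str
    reviews: List[str]
    store_url: str


@dataclass
class Response:
    """What the server returns to the terminal for one book."""
    title: str
    summary_text: str
    audio: bytes
    reviews: List[str]
    store_url: str


def handle_query(
    query: str,
    catalog: Dict[str, Book],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[Response]:
    # Step 2: search the database (here, a case-insensitive substring match on the key).
    hits = [book for key, book in catalog.items() if query.lower() in key.lower()]
    responses = []
    for book in hits:
        summary = summarize(book.description)   # Step 3: text summary
        audio = synthesize(summary)             # Step 4: audio conversion
        # Steps 5, 7, 8: audio, reviews, and store link go back to the terminal.
        responses.append(Response(book.title, summary, audio, book.reviews, book.store_url))
    return responses


def first_sentence(text: str) -> str:
    """Stand-in for the generative model: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."


def fake_tts(text: str) -> bytes:
    """Stand-in for the speech synthesis engine: returns UTF-8 bytes, not real audio."""
    return text.encode("utf-8")


if __name__ == "__main__":
    catalog = {
        "history textbooks": Book(
            "World History",
            "A survey of world history. Covers antiquity to today.",
            ["Clear and thorough."],
            "https://bookstore.example/world-history",
        )
    }
    for r in handle_query("history", catalog, first_sentence, fake_tts):
        print(r.title, "|", r.summary_text, "|", r.store_url)
```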

[0636] (Example 1)

[0637] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0638] In today's information-saturated society, there is a need to streamline the process by which users can quickly and efficiently acquire the document information they require, understand its content, and make purchasing decisions. However, conventional information retrieval systems have struggled to integrate information extraction, summarization, evaluation, and purchase guidance, often resulting in decreased user satisfaction.

[0639] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0640] In this invention, the server includes means for receiving query data regarding documents input from a user device, means for extracting relevant document information from a storage device based on the query data, and means for generating a text summary using a generation algorithm based on the extracted document information. This enables the user to quickly obtain the information they need, efficiently understand its contents, and make better purchasing decisions based on the evaluation information.

[0641] "User device" refers to an information terminal, such as a computer or mobile device, operated by the user, and includes means for inputting and receiving information.

[0642] "Query data" refers to information such as queries and keywords that users enter when searching for specific document information.

[0643] "Document information" refers to information about specific text, books, or digital content that is searched based on a user's query.

[0644] A "storage device" refers to a database or hardware or software system for storing information where document information and related data are pre-stored.

[0645] A "generation algorithm" refers to the computational procedures and processes used to perform text summarization based on extracted document information.

[0646] A "text summary" refers to a concise explanation that condenses the main points and content of a document.

[0647] "Audio format" refers to a form of information that has been converted from text-based information into audio data, enabling users to receive information in audio format.

[0648] An "electronic marketplace" refers to a platform or website where users can purchase documents and goods online.

[0649] This invention is a system for users to efficiently acquire document information and make purchasing decisions based on that information. The system mainly consists of three elements: a server, a terminal, and the user.

[0650] The server has a communication interface for receiving query data from user terminals. Based on user input, the server searches a database in its storage device and extracts relevant document information. The server then applies a generative AI model to the extracted document information to create a text summary of the document content. This generative AI model utilizes natural language processing techniques and has the ability to efficiently summarize the key points of a document.

[0651] After a text summary is generated, the server uses text-to-speech software to convert it into audio format. This audio data is sent to the user's terminal, allowing the user to easily access the information in everyday situations. The server also retrieves review information for related documents and sends it to the user's terminal. This allows users to make more informed decisions by referring to feedback from other buyers.

[0652] Furthermore, the server generates links to online e-marketplaces, providing users with a smooth path to purchase relevant documents. These links appear on the user's device, allowing them to instantly purchase the necessary documents.
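Link generation of this kind might look like the following sketch. The base URL `https://marketplace.example/item`, the query parameter names `id` and `ref`, and the function name `build_purchase_link` are all assumptions made for illustration. An actual marketplace would define its own URL scheme.

```python
from urllib.parse import urlencode

# Hypothetical marketplace endpoint (placeholder domain).
MARKETPLACE_BASE = "https://marketplace.example/item"


def build_purchase_link(document_id: str, source: str = "audio-summary") -> str:
    """Build a purchase link for one document.

    urlencode percent-encodes the values, so identifiers containing
    spaces or symbols still yield a valid URL.
    """
    if not document_id:
        raise ValueError("document_id is required")
    query = urlencode({"id": document_id, "ref": source})
    return f"{MARKETPLACE_BASE}?{query}"


if __name__ == "__main__":
    print(build_purchase_link("roman history 01"))
```

Encoding the identifier on the server side keeps the terminal simple: it only needs to display the link and open it when the user selects it.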

[0653] As a concrete example, consider a case where a user searches for information about "books on ancient Roman history." The user enters a query on their device, the server generates a summary based on the relevant information, and sends the audio data to the device. The user listens to the summary in audio format, views reviews as needed, and clicks on the provided links to go to the purchase page.

[0654] As an example of a prompt, the user might input something like, "I'd like a summary of a book on ancient Roman history." This allows the system to respond quickly and provide information that meets the user's needs.

[0655] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0656] Step 1:

[0657] The user opens the search interface on their device and enters keywords related to books or documents. The entered keywords are sent to the server as query data. This sends the server a query for the information the user is looking for, and processing begins.

[0658] Step 2:

[0659] The server searches its database in storage based on the query data it receives. Using a database search algorithm, the server extracts matching document information. The input is keywords sent by the user, and the output is a set of corresponding document information. This process efficiently collects the information the user is looking for.

[0660] Step 3:

[0661] The server inputs the extracted document information into a generative AI model to generate a text summary. In this step, the generative AI model performs natural language processing to generate a concise overview of the key points. The generated text summary is output. This reduces the amount of information and makes it easier for the user to understand.

[0662] Step 4:

[0663] The server uses text-to-speech software to convert the generated text summary into audio format. The text data is converted into an audio file and output. Users can then access the information in audio format. This process is a means of utilizing the summarized information over a long period of time.

[0664] Step 5:

[0665] The server sends audio files and associated review information to the user's terminal. The input is the generated audio files and review information retrieved from the database, and the output is the complete dataset delivered to the user's terminal. This allows the user to acquire and review the information visually or aurally.

[0666] Step 6:

[0667] Users play audio summaries on their devices and review the provided review information. By manipulating the outputted dataset, users can easily make purchasing decisions. Specific actions include playing audio and loading reviews using the device's audio player.

[0668] Step 7:

[0669] The server generates a link to the relevant online marketplace and sends it to the user's terminal. This link guides the user to purchase the document immediately. When the user clicks the link, the online transaction begins and the purchase is completed.
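The database search in Step 2 and the review retrieval in Step 5 could be sketched as follows, assuming the storage device holds a relational database. The table names `documents` and `reviews`, their columns, and the sample rows are hypothetical. SQLite is used only because it ships with Python, and this disclosure does not name a particular database engine.

```python
import sqlite3
from typing import List, Tuple


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE reviews (document_id INTEGER, rating INTEGER, comment TEXT);
        INSERT INTO documents VALUES
            (1, 'The Roman Republic', 'History of ancient Rome from its founding.');
        INSERT INTO documents VALUES
            (2, 'Modern Cooking', 'Recipes for weeknight dinners.');
        INSERT INTO reviews VALUES (1, 5, 'Engaging and well sourced.');
        INSERT INTO reviews VALUES (1, 4, 'Dense but rewarding.');
        """
    )
    return conn


def search_documents(conn: sqlite3.Connection, keywords: List[str]) -> List[Tuple[int, str]]:
    """Return (id, title) for documents matching every keyword.

    Keywords are bound as parameters rather than pasted into the SQL text.
    LIKE wildcards inside a keyword (% and _) are not escaped in this sketch.
    """
    if not keywords:
        return []
    clause = " AND ".join("(title LIKE ? OR body LIKE ?)" for _ in keywords)
    params: List[str] = []
    for kw in keywords:
        pattern = f"%{kw}%"
        params += [pattern, pattern]
    sql = f"SELECT id, title FROM documents WHERE {clause} ORDER BY id"
    return conn.execute(sql, params).fetchall()


def fetch_reviews(conn: sqlite3.Connection, document_id: int) -> List[Tuple[int, str]]:
    """Return (rating, comment) pairs for one document, best rating first."""
    sql = "SELECT rating, comment FROM reviews WHERE document_id = ? ORDER BY rating DESC"
    return conn.execute(sql, (document_id,)).fetchall()


if __name__ == "__main__":
    conn = create_demo_db()
    for doc_id, title in search_documents(conn, ["ancient", "rome"]):
        print(title, fetch_reviews(conn, doc_id))
```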

[0670] (Application Example 1)

[0671] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0672] Existing book purchasing processes require users to spend a significant amount of time and effort obtaining book information. The process of reviewing book content, reading reviews, and completing the purchase is cumbersome, hindering a smooth buying experience. In particular, there are no systems that complete these processes solely through voice, and the user interface is not intuitive, compromising convenience. Therefore, there is a need for a system that provides information directly to users via voice, enabling quick and smooth purchases.

[0673] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0674] In this invention, the server includes means for receiving query information about books entered from a user's computing terminal, means for extracting the corresponding book information from an information storage unit based on the query information, means for generating a text summary using a generative model based on the extracted book information, means for converting the generated text summary into an audio format, means for transmitting the summary information converted into an audio format to the user's computing terminal, and means for the user to listen to the audio summary, select purchase information, and quickly complete the transaction through electronic payment. This enables the user to efficiently understand the contents of the book and quickly complete the purchase procedure.

[0675] A "user computing terminal" is an electronic device used by users to input information and process received data.

[0676] "Query information" refers to the search criteria and question data entered by the user to retrieve specific book information.

[0677] An "information storage unit" refers to a database or storage system used to store book information and related data.

[0678] A "generative model" is an artificial intelligence algorithm used to summarize or transform information from acquired data.

[0679] A "textual summary" is a text-based summary of book information.

[0680] "Audio format" refers to audio data used to output text data as auditory information.

[0681] An "audio summary" is information that has been condensed from the content of a book and then converted into audio data.

[0682] "Purchase information" refers to data related to product selection and payment used by users when purchasing books.

[0683] "Electronic payment" refers to electronic payment methods used for making payments online.

[0684] The system for carrying out this invention includes a server and a user computing terminal. The server receives query information about books from the user computing terminal, searches an information storage unit, and extracts the relevant book information. Based on this information, the server generates a text summary using a generative AI model, and further converts the summary into an audio format.

[0685] The user's computing terminal receives the converted audio summary and provides an environment where users can listen to that information in their daily lives. After listening to the audio summary, users can select purchase information and complete the transaction immediately through electronic payment.

[0686] Specific hardware includes smartphones and cloud servers, while software uses natural language processing libraries (e.g., NLTK, spaCy), speech synthesis APIs (e.g., Google Text-to-Speech), and payment APIs (e.g., Stripe).

[0687] For example, when a user searches for the latest bestselling novel, an audio summary such as "This novel is a thrilling story set in a futuristic city" is sent to the device. If the user chooses to purchase, electronic payment is completed instantly using fingerprint or facial recognition.

[0688] Examples of prompt statements are as follows:

[0689] "Generate an audio summary of a bestselling novel and comment on whether you should buy it."

[0690] In this way, a system is created that allows users to efficiently understand the content of books and quickly complete the purchase process.

[0691] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0692] Step 1:

[0693] The user enters query information about books on their device. This query information is prepared as text data and sent to the server. For example, the search phrase "latest best-selling novels" might be entered.

[0694] Step 2:

[0695] The server uses the received query information to search the information storage. A database search is performed here, and the relevant book information and related data are extracted. The input is the query information, and the output is a set of book information.

[0696] Step 3:

[0697] The server inputs the extracted book information into an AI model to generate a textual summary. At this stage, a concise text summary is created. The input is book information, and the output is the summarized text.

[0698] Step 4:

[0699] The server converts the generated summary text into audio format. It uses a speech synthesis API to convert the text data into audio data. At this stage, the generated audio summary is obtained. The input is a text summary, and the output is audio data.

[0700] Step 5:

[0701] The server sends the summarized data, converted into audio format, to the user's device. The user listens to the audio summary through their device and efficiently understands the information. They also use an audio playback application to verify the summary.

[0702] Step 6:

[0703] The user listens to an audio summary and confirms their purchase intention as needed. They select purchase information on the terminal and prepare to make an electronic payment using fingerprint or facial recognition. The input is purchase intention information, and the output is payment preparation information.

[0704] Step 7:

[0705] Based on the user's selection, the server executes the electronic payment process. The transaction is completed using the payment API, and confirmation information regarding the purchase is sent to the terminal. Inputs are purchase information and authentication information, and output is transaction completion information.
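Steps 6 and 7 can be illustrated with the following sketch, in which purchase information and authentication information go in and transaction completion information comes out. The `PaymentGateway` interface and the `FakeGateway` class are hypothetical and do not represent the API of Stripe or any other real payment service. Biometric verification is assumed to have already taken place on the terminal.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class PurchaseInfo:
    book_id: str
    price_yen: int


@dataclass(frozen=True)
class AuthResult:
    user_id: str
    verified: bool  # outcome of fingerprint or facial recognition on the terminal


@dataclass(frozen=True)
class TransactionResult:
    status: str                    # "completed" or "rejected"
    transaction_id: Optional[str]  # set only when the charge succeeded


class PaymentGateway(Protocol):
    """Hypothetical interface standing in for a payment API."""
    def charge(self, user_id: str, amount_yen: int) -> str: ...


class FakeGateway:
    """In-memory stand-in that records charges instead of moving money."""
    def __init__(self) -> None:
        self.charges: List[Tuple[str, int]] = []

    def charge(self, user_id: str, amount_yen: int) -> str:
        self.charges.append((user_id, amount_yen))
        return f"txn-{len(self.charges)}"


def complete_purchase(
    purchase: PurchaseInfo, auth: AuthResult, gateway: PaymentGateway
) -> TransactionResult:
    """Charge only when the user is verified and the price is positive."""
    if not auth.verified or purchase.price_yen <= 0:
        return TransactionResult("rejected", None)
    txn_id = gateway.charge(auth.user_id, purchase.price_yen)
    return TransactionResult("completed", txn_id)


if __name__ == "__main__":
    gateway = FakeGateway()
    print(complete_purchase(PurchaseInfo("novel-1", 1800), AuthResult("u1", True), gateway))
    print(complete_purchase(PurchaseInfo("novel-1", 1800), AuthResult("u2", False), gateway))
```

Checking authentication before calling the gateway ensures that an unverified request never reaches the payment service.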

[0706] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0707] This invention relates to a system that provides optimal book summary information while taking into account the user's emotional state. This system incorporates an emotion engine to improve the user experience and functions as follows:

[0708] When a user uses a device and enters a query about a book, the device sends that data to the server. In addition, the device recognizes the user's emotions by sending the user's voice and input data to an emotion engine in real time. The emotion engine analyzes the tone of voice, word choice, input speed, etc., to estimate the user's current emotional state.
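One very simple way to combine these signals is a rule-based score, sketched below. This is an assumption for illustration only: this disclosure does not specify how the emotion engine works, and a trained model such as the emotion identification model 59 would likely replace these rules. The thresholds, the word list, and the labels `stressed`, `neutral`, and `calm` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputSignals:
    chars_per_second: float  # input speed measured on the terminal
    pitch_variance: float    # tone-of-voice feature, assumed normalized to 0..1
    text: str                # the words the user typed or spoke


# Hypothetical vocabulary hinting that the user is in a hurry.
STRESS_WORDS = {"quickly", "urgent", "hurry", "asap", "now"}


def estimate_emotion(signals: InputSignals) -> str:
    """Count how many of the three signals suggest stress and map the count to a label."""
    score = 0
    if signals.chars_per_second > 6.0:   # hypothetical threshold for fast input
        score += 1
    if signals.pitch_variance > 0.6:     # hypothetical threshold for agitated tone
        score += 1
    words = {w.strip(".,!?").lower() for w in signals.text.split()}
    if words & STRESS_WORDS:             # word choice
        score += 1
    if score >= 2:
        return "stressed"
    if score == 1:
        return "neutral"
    return "calm"


if __name__ == "__main__":
    print(estimate_emotion(InputSignals(8.0, 0.2, "I need a history book quickly")))
    print(estimate_emotion(InputSignals(2.0, 0.1, "Recommend a novel")))
```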

[0709] The server searches the database based on the recognized emotion information and the query, and extracts relevant book information. This extracted information is passed to a generative model, which creates a summary. The generative model adjusts the style and level of detail of the summary according to the user's emotional state.

[0710] Subsequently, during the process of converting the text summary into speech, adjustments are made to take the user's emotions into consideration. The speech synthesis engine generates speech that matches the emotion, such as a friendly or calm tone.

[0711] For example, if a user wants to obtain information quickly amidst a busy daily life, the emotion engine senses their stress and anxiety. Based on this information, the server creates a summary tailored to the busy user and delivers it quickly. Furthermore, the summary is played in a calming voice to help the user relax.

[0712] The server provides information including relevant book reviews and also presents links to online bookstores to encourage user purchases. In this way, the present invention aims to improve the user experience by providing flexible responses that take into account the user's emotions.

[0713] The following describes the processing flow.

[0714] Step 1:

[0715] The user enters a book query on the device by voice or text input, and the device sends it to the server. In parallel, the device provides the user's voice and input data to the emotion engine in real time.

[0716] Step 2:

[0717] As the server receives query data, the emotion engine analyzes the user's voice tone and input data to estimate the user's emotional state. The emotion engine then sends these analysis results to the server.

[0718] Step 3:

[0719] Based on the received query data and emotion information, the server searches the database and extracts the relevant book information. The server then prepares to pass the extracted book information to the generative model.

[0720] Step 4:

[0721] The server passes the book information to a generative model, which generates a summary. The generative model adjusts the summary based on the user's emotion information, extracts the necessary information, and creates a text summary.

[0722] Step 5:

[0723] The server activates a speech synthesis engine to convert the generated text summary into audio format. The speech synthesis engine takes the emotion data into account and generates speech in an appropriate tone.

[0724] Step 6:

[0725] The server sends the summary, converted into audio format, to the user's device. The device plays the received audio summary, allowing the user to listen to it.

[0726] Step 7:

[0727] Users review the provided audio and text summaries and receive suggestions suited to their emotional state to decide on their next course of action. They can then read reviews or follow links to online bookstores to purchase books.

[0728] Step 8:

[0729] The server retrieves review information for relevant books from a database and sends it to the user's device. It also provides links to online bookstores, creating a quick and easy path for users to make purchases.

[0730] (Example 2)

[0731] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0732] Conventional information presentation systems struggled to provide information while considering the user's emotional state, and were unable to flexibly respond to the user's desired style and level of detail. Furthermore, the presentation style of extracted information was uniform, resulting in a poor user experience. Additionally, guidance to e-commerce platforms was sometimes not smooth, failing to fully stimulate user purchasing intent.

[0733] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0734] In this invention, the server includes means for estimating the user's emotional state using an emotion analysis device based on information input from the user terminal, means for extracting relevant book information from information sources based on the user's emotional state and query data, and means for converting the generated text summary into audio data and making adjustments appropriate to the user's emotional state. This enables the user to receive appropriate information according to their emotional state, providing a better user experience. Furthermore, it enables smooth guidance to related e-commerce platforms, thereby increasing the user's purchasing intent.

[0735] A "user terminal" is a device used by a user to input information and receive results, and includes electronic devices such as computers, smartphones, and tablets.

[0736] An "emotion analysis device" refers to a system or software that analyzes a user's voice tone, input speed, and selected words to estimate the user's emotional state at that time.

[0737] "Query data" refers to data that users input to seek specific information, and typically includes questions or requests presented in text format.

[0738] "Information sources" refer to various databases and recording media, which are foundational resources for providing information based on search queries.

[0739] A "generative model" refers to an algorithm or system that uses natural language processing to create a text summary based on specific input data.

[0740] "Audio data" refers to audio information recorded in digital format, and is used to present information and provide guidance to users.

[0741] An "e-commerce platform" refers to an online system or website that enables the provision of product information and sales procedures in a digital environment.

[0742] This invention is a system that provides optimal book summary information while taking into account the user's emotional state. The embodiments of this system are described in detail below.

[0743] The user uses a terminal to enter queries about books. The terminal not only sends this entered query data to a server but also collects data such as the user's voice and typing speed. This data is sent to an emotion analysis device, which can estimate the user's emotional state based on their voice tone, typing speed, and selected words.

[0744] The server uses the user's emotional state and query data obtained from the emotion analysis device to extract relevant book information from a source. This source is, for example, a relational database, and efficient data management is performed. The server then uses a generative model (for example, an AI model that performs natural language processing) to summarize the extracted book information. The generative model adjusts the style and level of detail of the text summary according to the user's emotional state. As a concrete example of a prompt, instructions are given to the generative AI model in the form of, "The user is busy and would like a concise summary."
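Prompt construction of this kind might be sketched as follows. Only the instruction "The user is busy and would like a concise summary." comes from the text above. The emotion labels, the other instruction sentences, and the function name `build_summary_prompt` are hypothetical.

```python
# Maps an estimated emotional state to a style instruction for the generative model.
# Only the "stressed" sentence is quoted from the description; the rest are assumptions.
STYLE_BY_EMOTION = {
    "stressed": "The user is busy and would like a concise summary.",
    "calm": "The user has time and would like a detailed summary.",
}
DEFAULT_STYLE = "Provide a summary of moderate length."


def build_summary_prompt(emotion: str, title: str, description: str) -> str:
    """Combine the style instruction with the extracted book information."""
    style = STYLE_BY_EMOTION.get(emotion, DEFAULT_STYLE)
    return (
        f"{style}\n"
        "Summarize the following book.\n"
        f"Title: {title}\n"
        f"Description: {description}"
    )


if __name__ == "__main__":
    print(build_summary_prompt("stressed", "The Roman Republic", "History of ancient Rome."))
```

Falling back to a default instruction keeps the flow working when the emotion analysis device returns a state that the table does not cover.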

[0745] The generated summary is converted into audio data, and then speech synthesis is performed to suit the user's emotional state. Dedicated software is used for speech synthesis, and this audio data is sent to the terminal. During speech synthesis, adjustments are made to the voice, such as "reading in a relaxed tone."
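If the speech synthesis software accepts SSML (Speech Synthesis Markup Language), which is an assumption not stated in this disclosure, the voice adjustment could be expressed with the `prosody` element as sketched below. The emotion labels and the choice of rate and pitch values per emotion are hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from emotional state to SSML prosody settings.
# "slow" and "low" are meant to approximate a relaxed reading tone.
PROSODY_BY_EMOTION = {
    "stressed": {"rate": "slow", "pitch": "low"},
    "calm": {"rate": "medium", "pitch": "medium"},
}
DEFAULT_PROSODY = {"rate": "medium", "pitch": "medium"}


def to_ssml(summary: str, emotion: str) -> str:
    """Wrap the summary text in SSML with prosody chosen for the emotional state."""
    prosody = PROSODY_BY_EMOTION.get(emotion, DEFAULT_PROSODY)
    # escape() protects characters such as & and < that would break the markup.
    return (
        f'<speak><prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">'
        f"{escape(summary)}</prosody></speak>"
    )


if __name__ == "__main__":
    print(to_ssml("Rome & its rise", "stressed"))
```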

[0746] Users can ultimately receive the summary in both audio and text formats from their device. The server also provides related book reviews and links to e-commerce platforms. For example, when a user is looking for a book of interest and the system detects that the user is feeling stressed, the system summarizes the book's content in an easy-to-read, reassuring style and provides immediately accessible links.

[0747] This enables flexible information delivery that takes user emotions into consideration, as well as a comfortable user experience.

[0748] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0749] Step 1:

[0750] The user uses a terminal to enter a query about books. The input data includes text information about specific book titles and topics based on the user's interests. This input becomes the query data that forms the basis of the subsequent information retrieval process. Specifically, the user enters text into the terminal's input field and presses the "Search" button.

[0751] Step 2:

[0752] The terminal simultaneously records the user's voice data and input speed and transmits them to the emotion analysis device. The input includes raw data such as the user's voice tone and input speed. The emotion analysis device uses this data to estimate the emotional state and outputs the result. Specifically, the terminal collects voice data through the microphone and records the input speed in real time as a log.

[0753] Step 3:

[0754] The server receives query data and sentiment data sent from the terminal and extracts relevant book information from the information source. Input includes query data in string format and sentiment status expressed as numbers or categories. It issues search queries to the information source and retrieves information about the relevant books as output. Specifically, it issues SQL queries to retrieve book titles and summaries from the relational database.
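The SQL retrieval in this step might look like the following sketch, which uses Python's built-in `sqlite3` module. The `books` table, its `title` and `summary` columns, the sample rows, and the `find_books` function are all assumptions made for illustration; the specification does not define a schema.

```python
import sqlite3


def find_books(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Return (title, summary) rows whose title or summary contains the query."""
    pattern = f"%{query}%"
    cursor = conn.execute(
        # Placeholders keep user input out of the SQL text itself.
        "SELECT title, summary FROM books "
        "WHERE title LIKE ? OR summary LIKE ? "
        "ORDER BY title LIMIT ?",
        (pattern, pattern, limit),
    )
    return cursor.fetchall()


# Build a small in-memory database with made-up rows to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Focus Basics", "An introduction to focused work."),
        ("Garden Notes", "Seasonal advice for small gardens."),
    ],
)
rows = find_books(conn, "focus")
```

Parameterized queries are used here because the query string originates from user input on the terminal.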

[0755] Step 4:

[0756] The server uses a generative AI model to generate a summary text based on the acquired book information. The input includes book information extracted from the database and the user's emotional state. The generative AI model receives a prompt instructing it to "generate a summary according to the user's request," and outputs the summary text. Specifically, the process involves the AI model sequentially summarizing the text according to the amount of text being processed.
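One possible reading of "sequentially summarizing the text according to the amount of text" is to split long input into chunks, summarize each chunk in order, and then summarize the combined partial summaries. The sketch below follows that reading. The chunking rule, the `max_chars` parameter, and both function names are assumptions, and `summarize` stands in for a call to whatever generative model is used.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text on whitespace into chunks of at most max_chars where possible.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_sequentially(
    text: str, summarize: Callable[[str], str], max_chars: int = 1000
) -> str:
    """Summarize each chunk in order, then summarize the joined partial summaries."""
    chunks = split_into_chunks(text, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial))
```

Short inputs pass through the model once, while longer inputs cost one call per chunk plus a final combining call.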

[0757] Step 5:

[0758] The server passes the generated summary text to the speech conversion engine to generate audio data. The input includes the summary text generated by the generative AI model and the user's emotional state. The speech conversion engine outputs audio data in a tone appropriate to the emotional state. Specifically, the speech synthesis software adjusts the voice tone based on the emotional state and generates a digital audio file.
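The tone adjustment in this step could be expressed as a lookup from the estimated emotional state to synthesis parameters. The sketch below is illustrative only: the `VoiceSettings` fields, the preset values, and the emotion labels are assumptions and do not correspond to any particular speech synthesis product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    rate: float   # relative speaking rate; 1.0 is taken as the engine default
    pitch: float  # relative pitch; 1.0 is taken as the engine default
    style: str    # free-form style hint for the engine


# Hypothetical presets; the values are placeholders, not tuned recommendations.
PRESETS = {
    "stressed": VoiceSettings(rate=0.9, pitch=0.95, style="relaxed"),
    "busy": VoiceSettings(rate=1.15, pitch=1.0, style="brisk"),
}
DEFAULT = VoiceSettings(rate=1.0, pitch=1.0, style="neutral")


def voice_settings_for(emotion: str) -> VoiceSettings:
    """Pick synthesis settings for an emotion label, defaulting to neutral."""
    return PRESETS.get(emotion, DEFAULT)
```

The returned settings would then be translated into whatever controls the chosen speech synthesis software actually exposes.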

[0759] Step 6:

[0760] The device receives audio transmitted from the server and plays it for the user. The input includes audio data sent from the server. Specifically, the user clicks the audio play button, and the audio is output through the device's speaker.

[0761] Step 7:

[0762] The server also provides the user's device with relevant book review information and links to e-commerce platforms. Input includes review information and purchase link data extracted from the source. The user reviews this information on the device and navigates to the purchase page by clicking the links. Specifically, the purchase link is displayed on the screen, and the user performs a mouse click or tap.

[0763] (Application Example 2)

[0764] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0765] In today's information society, users face the challenge of quickly and efficiently obtaining relevant information from a vast amount of data. Furthermore, a lack of information provision that considers the user's emotional state degrades the quality of the user experience. To address this issue, optimal information provision and audio output tailored to the user's emotional state are required.

[0766] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0767] In this invention, the server includes means for receiving query data about books entered from a user terminal, means for extracting the relevant book information from a data storage area based on the query data, and means for analyzing the user's emotional state and adjusting the style and level of detail of the summary based on the analysis results. This makes it possible to provide optimal information according to the user's emotional state.

[0768] A "user terminal" is a device used by users to input query data and receive information.

[0769] "Query data" refers to data entered to search for information about a book.

[0770] A "data storage area" is a place where book information and evaluation information are stored and can be retrieved as needed.

[0771] A "generative model" is an algorithm or program used to generate a text summary based on extracted book information.

[0772] "Emotional state" refers to the psychological or emotional condition inferred from the user's voice and behavior.

[0773] "Audio format" refers to a format in which text data is converted into audio data using speech synthesis technology.

[0774] An "online store" is a sales platform where products can be purchased online.

[0775] In order to implement this invention, it is necessary to construct a system that utilizes a user terminal, a server, a data storage area, an emotion analysis engine, a generative model, and a speech synthesis engine.

[0776] First, the user terminal receives query data about books entered by the user. The terminal has a built-in sentiment analysis engine that analyzes the emotional state from the user's voice and input actions. This takes into account factors such as voice tone and speed, and the words chosen. Common API services (for example, natural language processing APIs and speech analysis APIs) are used for sentiment analysis.

[0777] Next, the server extracts relevant book information from the data storage area based on the received query data and sentiment analysis results. This book information is summarized by a generative AI model, and its style and level of detail are adjusted according to the user's emotional state. The generative AI model used is designed to respond to prompts that match the user's intent and emotions.

[0778] The generated summary is sent to a speech synthesis engine and converted into an audio format with a narration style tailored to the user's emotions. Existing speech synthesis technologies are used for the speech synthesis.

[0779] Finally, the server sends the generated summary to the user's terminal, also providing links to related book reviews and e-book stores. This allows users to receive emotionally personalized information and immediately purchase related products.

[0780] As a concrete example, in an implementation where the user is experiencing stress, the system can provide a quick and concise summary, delivering the information in a calming voice to reduce stress. An example of a prompt to the generative AI model would be the command, "Given the user's anxiety, generate a summary of relaxing products and add sentimental summaries of customer reviews."

[0781] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0782] Step 1:

[0783] The user terminal waits for the user to input query data about books and receives the input. It passes the query data, together with the user's voice and behavior data, to the sentiment analysis engine. The sentiment analysis engine analyzes the voice tone and input speed to estimate the user's emotional state. The query data and the estimated emotional state are output as the input to the next step.

[0784] Step 2:

[0785] The server receives query data and emotional state sent from the terminal. Based on the query data, it searches for and extracts the relevant book information from the data storage area. As a result, the book information is extracted and sent to the next step.

[0786] Step 3:

[0787] The server runs a generative AI model on the extracted book information. The model adjusts the style and level of detail of the summary according to the user's emotional state to generate an appropriate text summary, using emotion-responsive prompts. The summary text data is obtained as the output of this step.

[0788] Step 4:

[0789] The generated text summary is sent to the speech synthesis engine. The speech synthesis engine receives the summary and converts it into speech in a voice style that matches the user's emotions. The output from the speech synthesis engine is the audio data to be provided to the user.

[0790] Step 5:

[0791] The server transmits the generated audio data, related book reviews, and links to online stores to the user's terminal. The user's terminal then provides the received information to the user through screen display and audio playback. The user can receive customized information and consider their options.
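The server-side portion of the flow above (Steps 2 to 5) can be sketched as a single function that chains search, summarization, and speech synthesis. All names here (`Delivery`, `handle_query`, and the stand-in functions) are hypothetical, and the stand-ins only imitate the real components so that the control flow can be followed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Delivery:
    audio: bytes
    summary: str
    reviews: List[str]
    store_links: List[str]


def handle_query(
    query: str,
    emotion: str,
    search: Callable[[str], Dict],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> Delivery:
    """Chain the server-side steps for one query."""
    book = search(query)                                # Step 2: extract book information
    summary = summarize(book["description"], emotion)   # Step 3: emotion-aware summary
    audio = synthesize(summary, emotion)                # Step 4: emotion-aware speech
    # Step 5: bundle audio, reviews, and store links for the terminal.
    return Delivery(audio, summary, book["reviews"], book["store_links"])


# Stand-ins for the data storage area, generative model, and speech engine.
def fake_search(query: str) -> Dict:
    return {
        "description": f"A book about {query}.",
        "reviews": ["Helpful."],
        "store_links": ["https://store.example/book/1"],
    }


def fake_summarize(text: str, emotion: str) -> str:
    return f"[{emotion}] {text}"


def fake_synthesize(text: str, emotion: str) -> bytes:
    return text.encode("utf-8")


result = handle_query("gardening", "stressed", fake_search, fake_summarize, fake_synthesize)
```

Passing the components in as functions keeps the sketch independent of any specific database, model, or speech engine.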

[0792] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0793] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0794] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0795] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping, namely an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0796] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0797] These emotions are distributed at the 3 o'clock position on the Emotion Map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the Emotion Map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0798] The inside of the Emotion Map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion lies to the outside of the Emotion Map 400, the more visible (expressed in actions) it becomes.

[0799] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https: / / ci.nii.ac.jp / naid / 500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
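The layout of the emotion map described above (left half "response", right half "situation", "pleasure" above, "displeasure" below, inner thoughts near the center and actions toward the outside) can be modeled with simple plane coordinates. The coordinate convention and the `describe_position` function below are assumptions for illustration; the specification does not assign numeric coordinates to emotions.

```python
import math
from typing import Dict, Union


def describe_position(x: float, y: float) -> Dict[str, Union[str, float]]:
    """Describe a point on a hypothetical emotion map centered at the origin.

    x > 0 is the right ("situation") half and x < 0 the left ("response") half.
    y > 0 is the "pleasure" side and y < 0 the "displeasure" side.
    A larger radius means the emotion is expressed more outwardly.
    """
    if x > 0:
        region = "situation"
    elif x < 0:
        region = "response"
    else:
        region = "boundary"
    if y > 0:
        valence = "pleasure"
    elif y < 0:
        valence = "displeasure"
    else:
        valence = "neutral"
    return {"region": region, "valence": valence, "radius": math.hypot(x, y)}
```

Under this convention, the 3 o'clock position corresponds to a point such as `(1.0, 0.0)`, which falls in the "situation" half midway between pleasure and displeasure.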

[0800] The emotion map defines two emotions that promote learning. One is the emotion around the middle of the negative "repentance" and "reflection" on the situation side. In other words, it is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the emotion around the positive "desire" on the reaction side. In other words, it is when the robot has positive feelings such as "I want more" or "I want to know more."

[0801] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
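The specification does not state how a single emotion is determined from the emotion values. One simple possibility, shown here purely as an assumption, is to select the emotion with the highest value. The `dominant_emotion` function and the sample values are hypothetical.

```python
from typing import Dict


def dominant_emotion(emotion_values: Dict[str, float]) -> str:
    """Return the emotion label with the highest value."""
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")
    return max(emotion_values, key=emotion_values.get)


# Made-up values in which neighboring emotions score similarly.
values = {"reassured": 0.80, "calm": 0.70, "confident": 0.75, "anxious": 0.10}
selected = dominant_emotion(values)
```

Because neighboring emotions are trained to have similar values, a real system might prefer to consider several top-scoring emotions rather than only one.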

[0802] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0803] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[0804] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[0805] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0806] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0807] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0808] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0809] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0810] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0811] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0812] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0813] The following is further disclosed regarding the embodiments described above.

[0814] (Claim 1)

[0815] A means for receiving query data about books entered from a user terminal,

[0816] A means of extracting relevant book information from a database based on query data,

[0817] A means of generating a text summary using a generative model based on extracted book information,

[0818] A means for converting the generated text summary into an audio format,

[0819] A means for transmitting summarized data converted into audio format to the user's terminal,

[0820] A system that includes this.

[0821] (Claim 2)

[0822] The system according to claim 1, further comprising means for obtaining book review information related to the search results from a database and transmitting it to a user terminal.

[0823] (Claim 3)

[0824] The system according to claim 1, further comprising means for presenting a link to a relevant online bookstore on the user's device based on the generated summary and review information, thereby enabling the user to purchase the book in question.

[0825] "Example 1"

[0826] (Claim 1)

[0827] A means for receiving inquiry data regarding documents entered from a user device,

[0828] A means for extracting relevant document information from a storage device based on query data,

[0829] A means for generating a text summary using a generation algorithm based on extracted document information,

[0830] A means of converting the generated text summary into audio format,

[0831] A means for transmitting summary data converted into audio format to the user's device,

[0832] A system that includes this.

[0833] (Claim 2)

[0834] The system according to claim 1, further comprising means for obtaining evaluation information of documents related to the search results from a storage device and transmitting it to a user device.

[0835] (Claim 3)

[0836] The system according to claim 1, further comprising means for presenting a link to a relevant electronic marketplace on the user's device based on the generated summary and evaluation information, thereby enabling the user to purchase the relevant document.

[0837] "Application Example 1"

[0838] (Claim 1)

[0839] A means for receiving query information about books entered from a user computing terminal,

[0840] A means for extracting relevant book information from the information storage unit based on query information,

[0841] A means of generating a textual summary using a generative model based on extracted book information,

[0842] A means for converting the generated text summary into an audio format,

[0843] A means for transmitting summarized information converted into audio format to a user's computing terminal,

[0844] A means for users to listen to an audio summary, select purchase information, and quickly complete the transaction through electronic payment,

[0845] A system that includes this.

[0846] (Claim 2)

[0847] The system according to claim 1, further comprising means for obtaining book review information related to the search results from an information storage unit and transmitting it to a user computing terminal.

[0848] (Claim 3)

[0849] The system according to claim 1, further comprising means for presenting a connection to a relevant online retailer to the user's computing terminal based on the generated summary and review information, thereby enabling the user to purchase the book in question.

[0850] "Example 2 of combining an emotion engine"

[0851] (Claim 1)

[0852] A means for estimating the user's emotional state using an emotion analysis device based on information input from the user's terminal,

[0853] A means for extracting relevant book information from sources based on the user's emotional state and query data,

[0854] A means for generating text summaries with styles and levels of detail that match the user's emotions, using a generative model based on extracted book information,

[0855] A means of converting the generated text summary into audio data and making adjustments to suit the user's emotional state,

[0856] A means for transmitting the converted audio data to the user's terminal,

[0857] A system that includes this.

[0858] (Claim 2)

[0859] The system according to claim 1, further comprising means for obtaining evaluation information of books related to the search results from a source and transmitting it to a user terminal.

[0860] (Claim 3)

[0861] The system according to claim 1, further comprising means for presenting a link to a relevant e-commerce platform on the user's terminal based on the generated summary and evaluation information, thereby enabling the user to purchase the book in question.

[0862] "Application example 2 when combining with an emotional engine"

[0863] (Claim 1)

[0864] A means for receiving query data about books entered from a user terminal,

[0865] A means for extracting relevant book information from the data storage area based on query data,

[0866] A means of generating a text summary using a generative model based on extracted book information,

[0867] A means of analyzing the user's emotional state and adjusting the style and level of detail of the summary based on the analysis results,

[0868] A means for converting the generated text summary into an audio format,

[0869] A means for transmitting summarized data converted into audio format to the user's terminal,

[0870] A system that includes this.

[0871] (Claim 2)

[0872] The system according to claim 1, further comprising means for obtaining evaluation information of books related to the search results from a data storage area and transmitting it to a user terminal.

[0873] (Claim 3)

[0874] The system according to claim 1, further comprising means for presenting a link to a relevant e-store on the user's terminal based on the generated summary and evaluation information, thereby enabling the user to purchase the book in question.

[Explanation of symbols]

[0875] 10, 210, 310, 410: Data Processing Systems; 12: Data Processing Devices; 14: Smart Devices; 214: Smart Glasses; 314: Headset-type terminal; 414: Robots

Claims

1. A means for receiving query data about books entered from a user terminal, A means of extracting relevant book information from a database based on query data, A means of generating a text summary using a generative model based on extracted book information, A means for converting the generated text summary into an audio format, A means for transmitting summarized data converted into audio format to the user's terminal, A system that includes this.

2. The system according to claim 1, further comprising means for obtaining book review information related to the search results from a database and transmitting it to the user terminal.

3. The system according to claim 1, further comprising means for presenting a link to a relevant online bookstore on the user's terminal based on the generated summary and review information, thereby enabling the user to purchase the book in question.