system

A system simplifies medical appointment reservations by accepting voice or text input, converting voice to text, analyzing the user's intent, and presenting reservation options. It reduces both the difficulties users face and the labor required of medical institutions, and it supports multiple languages and dialects as well as operator assistance.

JP2026103408A · Pending · Publication Date: 2026-06-24 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-12
Publication Date
2026-06-24


Abstract

[Problem] To provide a system. [Solution] The system includes: an input device that receives voice data or string data; a conversion device that converts the input voice data into string data; a natural language processing device that analyzes the converted string data to identify the user's intent; a search device that accesses the information system of a medical institution to check available appointment time slots and locations; a user interface that presents the search results and prompts a selection; a notification device that finalizes the appointment based on the selected option and notifies the user; and a voice output device that presents the generated options by voice.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] There is a need to solve the difficulties faced by the elderly and those who are not good at using digital devices in the reservation process of medical institutions. Specifically, in the conventional complex and time-consuming reservation procedures, these users have a great deal of trouble, and in some cases, they may give up the reservation itself. Also, on the medical institution side, a great deal of labor and time are allocated to the reservation work, and a reduction in the work burden is desired.

Means for Solving the Problems

[0005] This invention provides a system that quickly receives voice or text input from users, converts the voice data into text, and analyzes the user's intent using natural language processing technology. Furthermore, this system connects to a database of medical institutions, efficiently searches for available dates, times, and facilities, and presents the search results to the user. The user selects their preferred option from the presented options, and the reservation is confirmed based on that selection. In addition, after the reservation is officially confirmed, the system has a function to notify the user and a function to connect with an operator if the user encounters difficulties, utilizing multilingual and dialect-compatible speech recognition technology. This creates an environment where all users, including the elderly and users unfamiliar with digital devices, can easily complete reservations.

[0006] "User" refers to an individual or group that uses this system to make reservations at medical institutions.

[0007] "Voice input" refers to the process of receiving the user's voice as a digital signal through a microphone or similar device.

[0008] "Text input" refers to the process by which a user enters characters using a keyboard or software.

[0009] A "speech recognition engine" refers to software or hardware that analyzes received audio data and converts it into corresponding text data.

[0010] "Text data" refers to information expressed in the form of characters or sentences, converted by a speech recognition engine.

[0011] "Natural language processing technology" refers to the technology used by computers to analyze, understand, and generate human language.

[0012] "Medical institutions" refer to facilities that provide health management services, such as hospitals, clinics, and medical offices.

[0013] A "database" refers to a system used to manage appointment status and other information at medical institutions.

[0014] The "reservation candidate list" refers to a list of available dates, times, and facilities that users can select for a reservation.

[0015] "Notification" refers to an action or information that informs the user of details such as reservation confirmations or reminders.

[0016] An "operator" refers to a person assigned to directly assist users as needed.

[0017] "Multilingual" refers to a feature that supports two or more different languages.

[0018] A "dialect" refers to a variation of language used in a particular region.

Brief Description of the Drawings

[0019]
[Figure 1] This is a conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] This is a conceptual diagram showing an example of the essential functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] This is a conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] This is a conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] This is a conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] This is a conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] This is a conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] This is a conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] This shows an emotion map to which a plurality of emotions are mapped.
[Figure 10] This shows an emotion map to which a plurality of emotions are mapped.
[Figure 11] This is a sequence diagram showing the processing flow of the data processing system in Embodiment 1.
[Figure 12] This is a sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] This is a sequence diagram showing the processing flow of the data processing system in Embodiment 2 when an emotion engine is combined.
[Figure 14] This is a sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Mode for Carrying Out the Invention

[0020] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0021] First, the terms used in the following description will be explained.

[0022] In the following embodiments, a processor denoted by a reference sign (hereinafter simply referred to as "processor") may be one arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be one type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0023] In the following embodiments, a RAM (Random Access Memory) denoted by a reference sign is a memory that temporarily stores information and is used as work memory by the processor.

[0024] In the following embodiments, a storage denoted by a reference sign is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0025] In the following embodiments, a communication interface (I/F) denoted by a reference sign is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0026] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0027] [First Embodiment]

[0028] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0029] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0030] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0031] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0032] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0033] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0034] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0035] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0036] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0037] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0038] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0039] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0040] This invention provides a system that allows elderly people and those who are not comfortable using digital devices to easily make reservations at medical institutions. The system primarily operates based on the user's voice or text input, with the server and terminal working together to process the information.

[0041] First, the device uses a microphone and recording function to receive voice input from the user. If the user says, "I would like to make an appointment with the dermatologist next Friday," that voice is captured as a digital signal. Text input is also possible, and the user can enter "I would like to make an appointment with the dermatologist next Friday" using the keyboard or by touching the screen.

[0042] In the case of voice input, the terminal uses a speech recognition engine to convert the voice signal into text data. The converted text data is then sent to the server to process the reservation.

[0043] The server analyzes the received text data using natural language processing technology to clarify the user's intent. Specifically, it extracts keywords such as "medical department" and "date and time" from the text to identify the information necessary for making a reservation.
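
The keyword extraction described above can be illustrated with a minimal rule-based sketch. It is not the disclosed natural language processing technology: the vocabulary in DEPARTMENTS, the regular expressions, and the names ReservationIntent and extract_intent are all hypothetical and chosen only to show how a department and a date expression might be pulled out of the text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: surface forms mapped to a canonical department name.
DEPARTMENTS = {
    "dermatologist": "dermatology",
    "dermatology": "dermatology",
    "internal medicine": "internal medicine",
}

# Hypothetical date and time-of-day expressions recognized by this sketch.
DATE_PATTERN = re.compile(
    r"\b(today|tomorrow|(?:next |this )?(?:mon|tues|wednes|thurs|fri|satur|sun)day)\b"
)
PERIOD_PATTERN = re.compile(r"\b(morning|afternoon|evening)\b")


@dataclass
class ReservationIntent:
    department: Optional[str]   # canonical department, or None if not found
    date_phrase: Optional[str]  # raw date expression, e.g. "next friday"
    period: Optional[str]       # "morning", "afternoon", "evening", or None


def extract_intent(text: str) -> ReservationIntent:
    """Extract the department and date keywords from one user utterance."""
    lowered = text.lower()
    department = next(
        (canon for surface, canon in DEPARTMENTS.items() if surface in lowered),
        None,
    )
    date_match = DATE_PATTERN.search(lowered)
    period_match = PERIOD_PATTERN.search(lowered)
    return ReservationIntent(
        department=department,
        date_phrase=date_match.group(1) if date_match else None,
        period=period_match.group(1) if period_match else None,
    )
```

A field left as None signals that the utterance did not contain that piece of information, so the system could ask a follow-up question instead of guessing.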

[0044] The server then queries the medical institution's database to check for available appointment times and corresponding departments. The confirmed appointment options are reconfigured into a selectable format for the user and sent to the terminal.
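
As a rough illustration of this lookup, the sketch below queries an in-memory SQLite table and numbers the open slots so that the user can pick one. The table layout (slots with facility, department, slot_date, start_time, booked), the sample rows, and the function names are assumptions made for the example; the actual schema of a medical institution's database is not specified in this document.

```python
import sqlite3
from typing import Dict, List


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical slot table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots (facility TEXT, department TEXT, "
        "slot_date TEXT, start_time TEXT, booked INTEGER)"
    )
    conn.executemany(
        "INSERT INTO slots VALUES (?, ?, ?, ?, ?)",
        [
            ("Clinic A", "dermatology", "2025-01-17", "10:00", 0),
            ("Clinic A", "dermatology", "2025-01-17", "11:00", 1),
            ("Clinic B", "dermatology", "2025-01-17", "14:00", 0),
            ("Clinic A", "internal medicine", "2025-01-17", "15:00", 0),
        ],
    )
    return conn


def find_open_slots(conn: sqlite3.Connection, department: str, date: str) -> List[Dict]:
    """Return unbooked slots for a department and date as numbered options."""
    rows = conn.execute(
        "SELECT facility, start_time FROM slots "
        "WHERE department = ? AND slot_date = ? AND booked = 0 "
        "ORDER BY start_time, facility",
        (department, date),  # parameters are bound, never formatted into the SQL
    ).fetchall()
    return [
        {"option": number, "facility": facility, "date": date, "time": start_time}
        for number, (facility, start_time) in enumerate(rows, start=1)
    ]
```

The option number gives the terminal a short handle for each candidate, which is convenient when the choice is made by voice.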

[0045] Next, the device presents these options to the user and prompts them to select a specific date, time, and medical department. For example, it might ask, "We have an appointment available for dermatology on Friday at 10am. Is this alright?"

[0046] After the user makes a selection, the device sends that information back to the server to confirm the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user that the reservation is complete. A reminder is also sent as the reservation date approaches.
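
The confirmation message and reminder timing could be computed along the following lines. The lead times in REMINDER_OFFSETS (one day and two hours before the appointment) and the message wording are assumptions for illustration; the document does not state when reminders are sent.

```python
from datetime import datetime, timedelta
from typing import List

# Hypothetical lead times: remind one day before and two hours before.
REMINDER_OFFSETS = (timedelta(days=1), timedelta(hours=2))


def confirmation_message(department: str, appointment: datetime) -> str:
    """Build the text shown or spoken to the user once the booking is confirmed."""
    return (
        f"Your {department} appointment is confirmed for "
        f"{appointment:%Y-%m-%d} at {appointment:%H:%M}."
    )


def reminder_times(appointment: datetime, now: datetime) -> List[datetime]:
    """Return the reminder times that are still in the future, earliest first."""
    candidates = [appointment - offset for offset in REMINDER_OFFSETS]
    return sorted(moment for moment in candidates if moment > now)
```

Filtering against the current time avoids scheduling a reminder that would already be overdue when the reservation is made at short notice.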

[0047] This system also includes a feature that allows users to connect with an operator with a single button press if they encounter difficulties during the booking process. This enables quick responses even when direct support is needed. Furthermore, the voice recognition function supports multiple languages and is designed to be easy to use for users living in rural areas, as it flexibly adapts to Japanese dialects and other regional variations.

[0048] The following describes the processing flow.

[0049] Step 1:

[0050] Users enter their desired appointment details through their device's voice or text input function. For example, they might say, "I'd like to book an internal medicine appointment for tomorrow afternoon," or type similar information as text.

[0051] Step 2:

[0052] The terminal converts voice input into text data using a speech recognition engine. If text is entered, it is processed as text data. The converted or entered text is then sent to the server for further analysis.

[0053] Step 3:

[0054] The server feeds the received text data to a natural language processing engine, which analyzes the user's intent from the text. In this process, it extracts important keywords such as the desired medical department and preferred date and time.

[0055] Step 4:

[0056] Based on the extracted information, the server sends a query to the healthcare facility database to retrieve available dates, times, and a list of facilities. The obtained data is then converted into a user-friendly format and sent to the terminal.

[0057] Step 5:

[0058] The terminal presents the user with available reservation options received from the server. For example, it might display a message on the screen or announce a voice message saying, "You have appointments available in the internal medicine department tomorrow at 3pm and 4pm. Which would you prefer?"

[0059] Step 6:

[0060] The user selects their desired reservation from the presented options and confirms it. The selected information is sent to the server via the terminal.

[0061] Step 7:

[0062] The server sends the reservation information selected by the user as a confirmation request to the medical institution's reservation system, formally confirming the reservation. It generates a confirmation message and sends it back to the terminal to notify the user.

[0063] Step 8:

[0064] The terminal receives a confirmation message from the server and notifies the user, either visually or audibly, that the reservation is complete. It also generates a reminder and notifies the user as the reservation date and time approaches.
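
The messages exchanged between the terminal and the server in Steps 1 to 8 can be sketched as a small set of data types with JSON encoding. The class names, field names, and the envelope format below are hypothetical; the document does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReservationRequest:  # terminal -> server (Step 2)
    user_id: str
    text: str


@dataclass
class Candidate:  # one entry of the option list, server -> terminal (Step 4)
    option: int
    facility: str
    department: str
    start: str  # ISO 8601 date and time


@dataclass
class Selection:  # terminal -> server (Step 6)
    user_id: str
    option: int


@dataclass
class Confirmation:  # server -> terminal (Step 7)
    user_id: str
    candidate: Candidate
    message: str


_TYPES = {cls.__name__: cls for cls in (ReservationRequest, Candidate, Selection, Confirmation)}


def encode(message) -> str:
    """Serialize a message as JSON with a type tag so the receiver can dispatch on it."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


def decode(payload: str):
    """Rebuild the message object from its JSON form."""
    envelope = json.loads(payload)
    cls = _TYPES[envelope["type"]]
    body = dict(envelope["body"])
    if cls is Confirmation:
        # asdict() flattened the nested candidate into a dict; restore it.
        body["candidate"] = Candidate(**body["candidate"])
    return cls(**body)
```

Tagging each payload with its type keeps the terminal and server loosely coupled: either side can add a new message kind without changing how existing ones are parsed.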

[0065] (Example 1)

[0066] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0067] In modern society, the numerous steps and technical complexities involved in making appointments at medical institutions pose significant obstacles for elderly individuals and those unfamiliar with digital devices. There is a need to provide such users with an intuitive and easy-to-use appointment system. Furthermore, the realization of technology capable of accurate speech recognition and intent analysis, while considering linguistic diversity and dialects, is essential. Additionally, it is necessary to provide a means for users to quickly receive assistance if they encounter difficulties during operation.

[0068] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0069] This invention includes a server that receives information input from the user, a technology that converts the received voice data into text information, and a technology that analyzes the converted text information using natural language processing to extract the user's purpose. This allows users to easily make reservations at medical institutions and achieves highly accurate reservation processing through voice recognition adapted to various languages and dialects. Furthermore, if users encounter difficulties during operation, they can immediately receive support from the operator, ensuring a safe and secure user experience.

[0070] A "user" refers to a person who uses the system to make an appointment at a medical institution.

[0071] "Information input" refers to data that users provide to the system through voice or text.

[0072] "Device" refers to a hardware or software component that receives data from a user.

[0073] "Audio data" refers to a signal in which the words spoken by a user are saved in a digital format.

[0074] "Textual information" refers to digital information obtained by converting audio data into text format.

[0075] "Technology" refers to the methods and means that a system uses to achieve a specific function.

[0076] "Natural language processing" refers to the process of analyzing data using language processing techniques to understand the user's intent from textual information.

[0077] "Purpose" refers to the ultimate intention or goal of a user when operating a system.

[0078] A "medical facility" refers to an organization or place that provides medical treatment or medical care.

[0079] An "information collection" refers to the data stored in databases and servers used by a system.

[0080] "Time slot" refers to a specific time frame during which reservations are possible.

[0081] An "organization" refers to a facility or organization that provides a specific service.

[0082] "Providing" refers to the act of a system presenting information or options to a user.

[0083] "Prompting" refers to the guidance a system provides to make it easier for users to take the next action.

[0084] "Final decision" refers to the official confirmation of a reservation based on the user's selection.

[0085] "Notification" refers to the act of informing users about the status of their reservation or any necessary confirmations.

[0086] "Establishing communication" means creating a situation where a user can contact an operator when they need support.

[0087] "Diversity" describes a wide range of things that have different types or forms.

[0088] "Language" refers to a systematic collection of sounds and characters that humans use to communicate.

[0089] A "dialect" refers to a variant of a language that exhibits different characteristics depending on the region and culture, even within the same language.

[0090] This invention provides a system that allows users to easily and intuitively make appointments at medical institutions. The system functions primarily through the interaction of a server, terminals, and users.

[0091] First, when a user wishes to make an appointment at a medical institution, they use an application on a device such as a smartphone or tablet. This application can accept input via voice or text. For example, a user might say, "I would like to make an appointment with the dermatologist next Friday." This input is done using the microphone built into the smartphone.

[0092] Next, the device converts the audio data into text. This conversion uses speech recognition software such as Google® Speech-to-Text API. The converted text data is then sent from the device to the server via the internet.

[0093] The server analyzes the received text data using natural language processing techniques. Libraries such as spaCy and Transformers are used for this analysis. As a result of the analysis, the user's intent is clarified, and the information necessary for making a reservation is extracted.
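
One part of clarifying the intent is turning a relative expression such as "next Friday" into a calendar date that can be used in a database query. The sketch below uses only the standard library rather than spaCy or Transformers. It assumes, as a simplification, that a weekday name (with or without "next") means the first such day strictly after today; real usage of "next Friday" is ambiguous, and the function name resolve_date is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_date(phrase: str, today: date) -> Optional[date]:
    """Map a relative date expression to a calendar date, or None if unknown."""
    words = phrase.lower().split()
    if words == ["today"]:
        return today
    if words == ["tomorrow"]:
        return today + timedelta(days=1)
    if words and words[-1] in WEEKDAYS:
        target = WEEKDAYS.index(words[-1])  # Monday is 0, matching date.weekday()
        # Number of days (1 to 7) until the next occurrence strictly after today.
        days_ahead = (target - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=days_ahead)
    return None
```

Passing the current date in as a parameter, rather than reading the clock inside the function, keeps the behavior reproducible and easy to check.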

[0094] Next, the server queries a database containing information about medical facilities to determine available time slots and institutions. For example, AWS® RDS can be used to efficiently query the database.

[0095] The terminal provides the user with reservation options sent from the server. These options are displayed clearly in either a calendar or list format. The user then selects the most suitable reservation option from the options.
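
A list-format presentation of the options might be produced as follows. This is only a sketch: the input shape (pairs of facility name and ISO 8601 start time) and the layout of the output are assumptions, not the application's actual screen design.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple


def format_candidates(candidates: Iterable[Tuple[str, str]]) -> str:
    """Group (facility, ISO start time) pairs by day and number them in time order."""
    by_day = defaultdict(list)
    for facility, start in candidates:
        moment = datetime.fromisoformat(start)
        by_day[moment.date()].append((moment.time(), facility))

    lines = []
    number = 1
    for day in sorted(by_day):
        lines.append(day.isoformat())  # one heading line per calendar day
        for slot_time, facility in sorted(by_day[day]):
            lines.append(f"  {number}. {slot_time:%H:%M}  {facility}")
            number += 1
    return "\n".join(lines)
```

Numbering runs across days, so each option has a unique number the user can tap or say aloud.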

[0096] Finally, the device sends the selected information back to the server to finalize the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user. Reminders can also be sent as needed, notifying the user as the reservation date approaches.
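
When the reservation is finalized, two users may try to take the same slot at almost the same time. One common way to guard against double booking, shown here as an assumption rather than as the disclosed method, is a conditional update that succeeds only if the slot is still free. The table layout and function names are hypothetical.

```python
import sqlite3


def create_demo_slots() -> sqlite3.Connection:
    """Create a hypothetical slot table containing one free slot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE slots (id INTEGER PRIMARY KEY, start TEXT, booked_by TEXT)")
    conn.execute("INSERT INTO slots (id, start, booked_by) VALUES (1, '2025-01-17T10:00', NULL)")
    conn.commit()
    return conn


def confirm_slot(conn: sqlite3.Connection, slot_id: int, user_id: str) -> bool:
    """Book the slot for the user; return False if someone else already holds it."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE slots SET booked_by = ? WHERE id = ? AND booked_by IS NULL",
            (user_id, slot_id),
        )
        # Exactly one row changes only when the slot existed and was still free.
        return cursor.rowcount == 1
```

If confirm_slot returns False, the server could re-run the availability search and present fresh options instead of a confirmation message.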

[0097] As a concrete example, here is an example of a prompt statement to be input to a generative AI model:

[0098] Please input "I would like to make an appointment with a dermatologist next Friday" using voice input, send this information to the AI model for analysis, and suggest available medical institutions.

[0099] This system allows users to easily make appointments at medical facilities and features voice recognition adapted to various languages and dialects. Furthermore, it enables rapid communication with operators in emergencies, allowing users to complete the booking process with peace of mind.

[0100] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0101] Step 1:

[0102] Users request appointments using voice or text. Specifically, they launch an application on their smartphone and input a request, such as "I would like to book an appointment with the dermatologist next Friday," using voice or text. This input constitutes the basic information input into the system.

[0103] Step 2:

[0104] The device receives the user's voice input and converts it into text. It uses the Google Speech-to-Text API to analyze the voice data and generate text. While speech is being received, it transcribes the speech as accurately as possible and prepares the resulting text data. This converted information is then passed to the next process as intermediate data.

[0105] Step 3:

[0106] The terminal sends text information to the server. Specifically, it sends an HTTP request over the internet to prepare for further processing on the server. The input here is the text information generated in step 2, and the output is the completion of the transmission to the server.

[0107] Step 4:

[0108] The server analyzes the received text information using natural language processing technology. Libraries such as spaCy and Transformers are used for analysis, extracting keywords such as "medical department" and "date and time." It receives text information as input and outputs the analysis data, identifying the information necessary for making a reservation.

[0109] Step 5:

[0110] The server executes queries against a database of medical facility information. It searches SQL and NoSQL databases for medical departments and dates that match the user's needs, and collects available reservation options. The input is analytical data, and the output is a list of options presented to the user.

[0111] Step 6:

[0112] The terminal receives reservation candidates sent from the server and presents them to the user. Specifically, it displays them in a calendar or list format to make them easily understandable visually to the user. At this point, the input is a list of reservation candidates, and the output is the provision of visual information to the user.

[0113] Step 7:

[0114] The user selects their preferred booking option from the presented choices. The user's selected booking information is then sent as input to the next step.

[0115] Step 8:

[0116] The terminal sends the user's selections back to the server to finalize the reservation. The input here is the user's selection information, and the output is a reservation confirmation command.

[0117] Step 9:

[0118] The server confirms the reservation and generates a confirmation message. It sends the confirmation information back to the terminal, notifying the user. The output provides the user with information on the reservation status and how to manage their future schedule.

[0119] Through the steps outlined above, users can intuitively and smoothly complete their medical appointment bookings.
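
The HTTP transmission in Step 3 can be sketched with the Python standard library. The endpoint URL, the JSON field names, and the function names are hypothetical; the document only states that the text is sent as an HTTP request over the internet.

```python
import json
import urllib.request

# Hypothetical endpoint; the real server address is not given in this document.
SERVER_URL = "https://reservation.example.com/api/requests"


def build_request(user_id: str, text: str) -> urllib.request.Request:
    """Package the recognized text as a JSON POST request for the server."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode a JSON reply (requires a reachable server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect without network access.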

[0120] (Application Example 1)

[0121] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0122] The goal is to eliminate the complexities of the procedures and communication barriers faced by elderly users and those unfamiliar with digital tools when making appointments at medical institutions. In particular, it is necessary to provide a more intuitive and smoother booking process by utilizing voice input.

[0123] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0124] In this invention, the server includes an input device that receives voice data or string data, a conversion device that converts the input voice data into string data, and a natural language processing device that analyzes the converted string data and identifies the intent. This makes it possible for users to easily communicate their reservation requests through voice input and for the system to automatically present appropriate reservation information based on that.

[0125] "Audio data" refers to data that captures the voice spoken by the user as a digital signal.

[0126] "String data" refers to text-formatted data obtained by converting audio data.

[0127] An "input device" is a device used to receive audio data or text data.

[0128] A "conversion device" is a device that has the function of converting audio data into text data.

[0129] A "natural language processing device" is a device that analyzes text data and performs processing to identify the user's intent.

[0130] "Medical institution" is a general term for organizations and facilities that provide medical care and treatment.

[0131] An "information system" is a computer system used to manage and manipulate data and information.

[0132] A "search device" is a device used to access an information system and find available time slots and locations for reservations.

[0133] A "user interface" is an interface used for exchanging information between a system and a user.

[0134] A "notification device" is a device used to transmit information or messages to a user.

[0135] A "voice output device" is a device that presents generated options or information to the user as voice.

[0136] A "support device" is a device that has the function of enabling communication with an operator when a user has difficulty making a reservation selection.

[0137] This invention is a system that enables elderly people and users unfamiliar with using digital devices to smoothly make appointments at medical institutions using voice input. The system converts voice data into string data, analyzes the user's requests using natural language processing, and presents appropriate appointment options.

[0138] The terminal is equipped with an input device that receives audio data and captures the acquired audio in a digital format. A speech recognition engine then converts the audio data into string data, and the string data is sent to the server. The "Google Speech Recognition API" can be used as a specific example of a speech recognition engine.
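
Capturing the voice in a digital format can be illustrated by quantizing samples to 16-bit PCM and wrapping them in a WAV container, a form that speech recognition services commonly accept. The 16 kHz mono format and the synthetic tone standing in for microphone input are assumptions; the document does not specify an audio format.

```python
import io
import math
import struct
import wave
from typing import List, Sequence

SAMPLE_RATE = 16000  # Hz; assumed value, not taken from the document


def quantize(samples: Sequence[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM, clipping overshoot."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def to_wav_bytes(samples: Sequence[float]) -> bytes:
    """Wrap mono PCM samples in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(quantize(samples))
    return buffer.getvalue()


def test_tone(n_samples: int = 1600, freq: float = 440.0) -> List[float]:
    """Generate a sine tone as a stand-in for microphone input."""
    return [0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n_samples)]
```

On a real terminal the samples would come from the microphone driver rather than from test_tone.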

[0139] After receiving the string data, the server uses a natural language processing unit to analyze the data and extract information about the medical institution and date / time the user wishes to book. The search device on the server accesses the medical institution's information system to obtain information about available time slots and medical departments. Specific examples of natural language processing technologies used here include "spaCy" or "NLTK".

[0140] Users can receive search results presented through a user interface, and be prompted to make a selection. Furthermore, it is possible to provide information without relying on visual cues by using an audio output device to present options.

[0141] Once the user completes their selection, the device will notify them of the appointment confirmation via a notification system and set reminders as needed to ensure they don't forget their appointment at the medical institution.

[0142] For example, if a user voice-inputs, "I want to book an appointment with a dermatologist next Friday," the system will search for suitable dates and times and potential medical facilities, present the user with the best option, and confirm the appointment for the selected time slot.

[0143] An example prompt for the generative AI model is as follows: "Please propose a system that allows elderly people to easily book appointments at medical facilities via their smartphones. Include features that use voice input to automatically retrieve and suggest available medical departments to the user."

[0144] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0145] Step 1:

[0146] The device receives audio data. It captures what the user says via the microphone and converts the analog audio into a digital audio signal. The input is the user's voice, and the output is digital audio data. Specifically, the interface between the microphone and the speech recognition engine is used.

[0147] Step 2:

[0148] The device converts digital audio data into text data. It uses a speech recognition engine to analyze the audio signal and convert it into text format. The input is digital audio data, and the output is text data. Specifically, it calls speech recognition services such as the "Google Speech Recognition API."

[0149] Step 3:

[0150] The server analyzes string data to identify the user's intent. It utilizes a natural language processing system to extract necessary keywords and phrases from the text. The input is string data, and the output is reservation intent information (e.g., date and time, medical specialty). Specifically, the natural language processing engine "spaCy" or "NLTK" is used.

[0151] Step 4:

[0152] The server accesses the medical institution's information system to search for available appointment times and departments. It queries the database to retrieve appointment options that match the user's intent. The input is information about the appointment intent, and the output is a list of available appointment options. Specifically, it executes SQL queries to retrieve the necessary data.

[0153] Step 5:

[0154] The terminal presents the user with reservation options. The user interface visually or audibly displays the options and prompts the user to make a selection. Input is a list of available reservation options, and output is the user's selection. Specific operations include screen display and speech synthesis technology.

[0155] Step 6:

[0156] The user selects a reservation option on the terminal, and the terminal sends this information to the server. This transmits reservation information based on the user's selection to the server. The input is the user's selection, and the output is the selected reservation information. The terminal's selection interface is used for this operation.

[0157] Step 7:

[0158] The server confirms the reservation and notifies the terminal. It registers the selected information in the reservation system and generates a confirmation message. The input is the selected reservation information, and the output is the reservation confirmation message. Specifically, database updates and the notification system are involved.

[0159] Step 8:

[0160] The device confirms the reservation with the user and sets a reminder. It notifies the user that the reservation is complete and sets up a reminder to notify them as the reservation date approaches. The input is a reservation confirmation message, and the output is a reminder message. Specifically, the device's notification function and the calendar API are used.
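The availability search described in Step 4 above can be sketched with Python's standard `sqlite3` module. The `slots` table, its columns, the sample rows, and the function names are hypothetical; the actual schema of a medical institution's information system is not specified in this disclosure.

```python
import sqlite3
from typing import List, Tuple


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical `slots` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots ("
        "facility TEXT, department TEXT, slot_date TEXT, "
        "slot_time TEXT, booked INTEGER)"
    )
    conn.executemany(
        "INSERT INTO slots VALUES (?, ?, ?, ?, ?)",
        [
            ("Clinic B", "dermatology", "2024-12-13", "14:30", 0),
            ("Clinic A", "dermatology", "2024-12-13", "10:00", 0),
            ("Clinic A", "dermatology", "2024-12-13", "11:00", 1),  # already booked
            ("Clinic C", "dentistry", "2024-12-13", "09:00", 0),
        ],
    )
    return conn


def find_available_slots(
    conn: sqlite3.Connection, department: str, slot_date: str
) -> List[Tuple[str, str]]:
    """Return (facility, time) pairs that are still open, earliest first."""
    # Parameterized query: user-derived values are never spliced into the SQL text.
    cursor = conn.execute(
        "SELECT facility, slot_time FROM slots "
        "WHERE department = ? AND slot_date = ? AND booked = 0 "
        "ORDER BY slot_time",
        (department, slot_date),
    )
    return cursor.fetchall()
```

Parameter binding is used because the department and date originate from user speech or text and should not be trusted as SQL fragments.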

[0161] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0162] This invention is a system that takes into account the user's emotions when making a reservation at a medical institution, providing a more comfortable and smoother experience. The system comprises a terminal for receiving voice or text input from the user, a server for analyzing the data and processing the reservation, and an emotion engine for recognizing the user's emotions.

[0163] First, the user communicates their appointment request to the system using the device's voice or text input function. For example, they might say or type, "I'd like to make a dental appointment for next Wednesday."

[0164] The terminal converts voice input into text using a speech recognition engine and then sends it to the server. The server analyzes this text data using natural language processing technology to identify the user's desired medical department and date / time. Simultaneously, an emotion engine analyzes the user's emotions from the tone of voice and selected words in the input data. For example, it might determine that the user is experiencing stress.

[0165] Based on the information gathered and the results of sentiment analysis, the server queries a database of healthcare facilities to generate the most suitable booking options. For example, if the system analyzes that the user is stressed, it will prioritize showing time slots where appointments can be booked quickly.
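One possible way to realize this prioritization is sketched below. The `Slot` structure, the `match_score` field, the emotion labels, and the ranking rule are assumptions for illustration; the disclosure does not fix a particular ranking method.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Slot:
    facility: str
    start: datetime
    match_score: float  # hypothetical 0..1 fit to the user's stated preferences


# Hypothetical labels for which the earliest slot is shown first.
URGENT_EMOTIONS = {"stress", "anxiety"}


def rank_slots(slots: Iterable[Slot], emotion: str) -> List[Slot]:
    """Order candidate slots, favoring the earliest ones for a stressed user."""
    if emotion in URGENT_EMOTIONS:
        return sorted(slots, key=lambda s: s.start)
    # Otherwise favor the best preference match, breaking ties by start time.
    return sorted(slots, key=lambda s: (-s.match_score, s.start))
```

For a user estimated to be stressed, the slot with the earliest start time is listed first regardless of its preference score.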

[0166] Next, the generated reservation options are sent to the terminal and presented to the user. If the user makes a selection, that selection is sent to the server and the formal reservation process is completed. Simultaneously with the reservation confirmation, the server generates a confirmation notification and sends it to the terminal to inform the user. Furthermore, it also has a function that suggests connecting the user to an operator depending on the user's emotional state. For example, if frustration or anxiety is detected, it automatically guides the user to human support.

[0167] Finally, as the reservation date approaches, the system generates a reminder and notifies the user via their device. This ensures that the user does not forget the reservation. Combining these functions with an emotion engine yields a more user-friendly reservation system.

[0168] The following describes the processing flow.

[0169] Step 1:

[0170] Users enter their appointment requests using the device's voice or text input function. For example, they might say, "I'd like to make an appointment with the orthopedics department next Monday," or type the same thing as text.

[0171] Step 2:

[0172] The terminal uses a speech recognition engine to convert the input speech data into text data. The converted text data or text input data is then sent directly to the server.

[0173] Step 3:

[0174] The server analyzes the received text data using a natural language processing engine and extracts important keywords such as the medical department and date / time from the user's reservation request.

[0175] Step 4:

[0176] The server uses an emotion engine to analyze the user's emotions from the content of text data or, if it's audio, from the tone of voice. For example, it can determine if the user's voice contains feelings of anxiety or unease.

[0177] Step 5:

[0178] Based on the extracted information and sentiment analysis results, the server consults a database of healthcare facilities to generate the most suitable booking options for available dates, times, and facilities. For example, if the user is feeling anxious, it prioritizes selecting earlier time slots.

[0179] Step 6:

[0180] The terminal presents the user with reservation options sent from the server. The information is displayed on the screen or spoken aloud, for example, "You have an appointment available at the orthopedics department on Monday at 10:00 AM. Do you want to confirm the reservation?"

[0181] Step 7:

[0182] The user selects their preferred booking option from the presented choices and confirms it. The selected information is sent to the server via the terminal.

[0183] Step 8:

[0184] Based on the user's selection information, the server sends a reservation confirmation request to the medical institution's reservation system, formally confirming the reservation. After confirmation, a detailed confirmation message is generated and sent to the user's terminal to inform them.

[0185] Step 9:

[0186] The device receives a confirmation message from the server and informs the user that the reservation is complete. It also follows up with the user by sending reminder notifications as the reservation date approaches, if necessary.
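The reminder notifications in Step 9 could, for example, be scheduled as in the following sketch. The default offsets (one day and two hours before the appointment) and the function name are assumptions; the disclosure only states that reminders are sent as the reservation date approaches.

```python
from datetime import datetime, timedelta
from typing import List, Sequence

# Hypothetical default lead times for reminders.
DEFAULT_OFFSETS = (timedelta(days=1), timedelta(hours=2))


def reminder_times(
    appointment: datetime,
    now: datetime,
    offsets: Sequence[timedelta] = DEFAULT_OFFSETS,
) -> List[datetime]:
    """Return the reminder times that are still in the future, earliest first."""
    candidates = (appointment - offset for offset in offsets)
    return sorted(t for t in candidates if t > now)
```

Reminders whose scheduled time has already passed when the reservation is confirmed are dropped rather than sent late.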

[0187] (Example 2)

[0188] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0189] Modern medical appointment systems often fail to consider user emotions, leading to stress and anxiety for users as they proceed with booking appointments. Furthermore, users who experience difficulties in selecting appointments often lack access to adequate support.

[0190] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0191] In this invention, the server includes means for receiving voice or text input from the user, means for analyzing the user's emotional state, and means for generating optimal reservation options based on the user's emotional state. This enables a comfortable and smooth reservation experience that takes the user's emotions into consideration.

[0192] A "user" refers to an individual who uses the system to make an appointment at a medical institution.

[0193] "Voice data" refers to information that records the voice spoken by a user in digital format.

[0194] "Text data" refers to written information obtained either by converting audio data or directly from the user's text input.

[0195] "Natural language processing technology" refers to the technology that enables devices to understand and analyze human language.

[0196] "Emotional state" refers to the emotional state a user exhibits while using the system.

[0197] A "medical institution database" refers to a collection of information that includes available appointment times and facility information for medical institutions.

[0198] "Reservation options" refer to the multiple available date, time, and facility choices presented to the user.

[0199] "Operator" refers to a person or role that provides support to users through a system.

[0200] "Speech recognition technology" refers to the technology that converts speech data into text data.

[0201] This invention is a system designed to provide a comfortable and smooth experience for users when making appointments at medical institutions, taking into account their emotional state. The following describes embodiments for carrying out the invention.

[0202] Users enter their reservation preferences using voice or text via a terminal. When using voice input, the terminal employs speech recognition technology to convert the voice data into text. For example, typical speech recognition software processes this input.

[0203] The device converts the received audio into text and then sends it to the server as digital data. During this process, natural language processing (NLP) technology is used to analyze the text and extract the user's intent. For example, the user's utterance might be transcribed as "I would like to make an internal medicine appointment for next Tuesday." At this stage, both the natural language processing engine and the sentiment analysis engine operate: an NLP library may be used to analyze the intent, and the sentiment analysis engine to estimate the user's emotions.

[0204] The server identifies the user's desired medical department and appointment time based on the text analyzed by natural language processing. Simultaneously, it analyzes the user's emotional state through sentiment analysis. The sentiment analysis assesses whether the user is experiencing stress or anxiety, and its results are used to generate the most suitable appointment options.

[0205] The server accesses a database of medical institutions based on the analysis results and emotional state information, searching for available time slots and facility information. From this data, it presents the most suitable booking options according to the user's emotional state. For example, if it determines that the user is feeling anxious, it generates an option to prioritize immediately available appointments.

[0206] For example, if a user enters text saying, "I want to make a dermatology appointment next Friday, but I've been busy lately and I'm worried about my health," the system uses an emotion engine to determine the user's concerns and prioritizes presenting time slots with high appointment availability.
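As a rough sketch of how concern might be detected in the text of the above example, the following code counts cue words from a small lexicon. The lexicon, the emotion labels, and the function name are hypothetical, and this word-counting approach is only a simplified stand-in for the emotion engine, not the emotion identification model 59 itself.

```python
import re
from typing import Dict

# Hypothetical cue-word lexicon mapping words to emotion labels.
EMOTION_LEXICON = {
    "worried": "anxiety",
    "nervous": "anxiety",
    "anxious": "anxiety",
    "stressed": "stress",
    "busy": "stress",
    "frustrated": "frustration",
}


def count_emotion_cues(text: str) -> Dict[str, int]:
    """Count lexicon cue words per emotion label in the given text."""
    counts: Dict[str, int] = {}
    for word in re.findall(r"[a-z']+", text.lower()):
        label = EMOTION_LEXICON.get(word)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

For the sentence in the example, the words "busy" and "worried" each match one cue, so one count is recorded for "stress" and one for "anxiety".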

[0207] An example of a prompt for a generative AI model is, "What suggestions can be made if a user wants to make an appointment at a medical institution but may be experiencing stress?"

[0208] This enables reservation suggestions that take user emotions into consideration, resulting in a more comfortable service for users.

[0209] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0210] Step 1:

[0211] The user enters their request for a medical appointment via voice or text. This user input becomes the initial data. For example, they might enter a request such as, "I would like to make an appointment with an internal medicine specialist next Tuesday."

[0212] Step 2:

[0213] The device uses speech recognition technology to convert speech data into text data. This process converts the input from speech to text format. The converted text is then sent to the server as digital data.

[0214] Step 3:

[0215] The server analyzes the received text data using natural language processing technology to extract the user's intent. In this step, a generative AI model is used to analyze the text and identify information such as the medical department and appointment date / time the user desires. For example, keywords such as "internal medicine" and "next Tuesday" may be extracted.

[0216] Step 4:

[0217] The server uses an emotion analysis engine to analyze the user's emotional state. The input data used is the text obtained in the previous step, from which emotional features are extracted. For example, it can determine whether the user is experiencing stress.

[0218] Step 5:

[0219] The server accesses a database of medical institutions based on the analysis results and emotional information, searching for available time slots and facility information. This process generates booking options that are suitable for the user's intentions and emotions. As a result, the booking options deemed most suitable for the user are output.

[0220] Step 6:

[0221] The terminal presents the user with reservation options sent from the server. The output data consists of reservation date and time and facility information, which is presented to the user visually or audibly.

[0222] Step 7:

[0223] The user selects their desired reservation from the presented options. This selection registers the user's decision as input data.

[0224] Step 8:

[0225] The server confirms the reservation based on the user's selection and notifies them of this information. The confirmation notification is sent to the terminal, informing the user that the reservation is complete. This officially establishes the reservation with the medical institution.

[0226] (Application Example 2)

[0227] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0228] Traditional reservation systems performed reservation procedures mechanically without considering the user's subjective emotional state, so the reservation process did not proceed smoothly when the user was feeling anxious or stressed. Furthermore, there were few options for receiving appropriate support, resulting in a lower-quality user experience.

[0229] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0230] In this invention, the server includes means for receiving voice or text input from the user and converting the voice data into text data, means for analyzing the converted text data using natural language processing technology to extract the user's intent, and means for analyzing the user's emotions from the user's input data. This makes it possible to present the optimal reservation option according to the user's emotional state and to quickly suggest human support as needed.

[0231] A "user" refers to an individual or their representative who makes a reservation using the system and provides input information via voice or text.

[0232] "Voice data" refers to information provided by the user through voice, and is the raw data that the system preprocesses ahead of natural language analysis.

[0233] "Text data" refers to audio data represented as characters, and is the basic information that the system analyzes using natural language processing technology.

[0234] "Natural language processing technology" refers to a group of technologies that analyze text data to understand its intent and content, enabling machines to produce meaningful responses and actions.

[0235] A "means of analyzing emotions" refers to a module within a system that extracts emotional nuances from user-inputted voice or text and evaluates the user's emotional state based on that.

[0236] "Optimal booking options" are suggestions generated to present available dates, times, and facility choices based on the user's intentions and emotional state.

[0237] "Support" refers to the connection with an operator and additional guidance provided based on emotional analysis when a user has difficulty making a reservation.

[0238] This invention provides a method for realizing a system that provides an optimal booking experience based on the user's emotional state.

[0239] The server plays a central role in this system. The server receives voice input from the user and converts it into text data using a speech recognition engine (e.g., Google Speech-to-Text API). This text data is then analyzed using natural language processing techniques (e.g., Python's NLTK or spaCy) to extract the user's intent. Furthermore, sentiment analysis tools such as AWS Comprehend and IBM Watson® Tone Analyzer are used to analyze the user's emotions and evaluate their emotional state.
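The server-side flow described above can be sketched as a small pipeline in which the speech recognition, intent extraction, and emotion analysis components are passed in as functions. All names below are hypothetical, and the three `fake_*` functions are placeholders for illustration only; they do not call or imitate the actual interfaces of any of the services named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class AnalysisResult:
    text: str
    intent: Dict[str, str]
    emotion: str


def analyze_input(
    user_input: Union[bytes, str],
    transcribe: Callable[[bytes], str],
    extract_intent: Callable[[str], Dict[str, str]],
    classify_emotion: Callable[[str], str],
) -> AnalysisResult:
    """Run speech recognition (for audio only), then intent and emotion analysis."""
    text = transcribe(user_input) if isinstance(user_input, bytes) else user_input
    return AnalysisResult(text, extract_intent(text), classify_emotion(text))


# Placeholder components standing in for real engines (assumptions).
def fake_transcribe(audio: bytes) -> str:
    return "I want to book a day care service this weekend, but I'm a little nervous"


def fake_extract_intent(text: str) -> Dict[str, str]:
    intent: Dict[str, str] = {}
    if "day care" in text.lower():
        intent["service"] = "day care"
    if "this weekend" in text.lower():
        intent["when"] = "this weekend"
    return intent


def fake_classify_emotion(text: str) -> str:
    return "anxiety" if "nervous" in text.lower() else "neutral"
```

Passing the components as arguments keeps the flow independent of any particular engine, so a different recognizer or analyzer could be substituted without changing `analyze_input`.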

[0240] A terminal is a device that allows users to input information through an interface and receive results. The terminal sends voice or text input to the server and presents the user with reservation options and confirmation notifications sent from the server.

[0241] When a user enters information about the service they intend to book, the data is analyzed, and the most suitable booking options are generated based on the user's emotional state. For example, if a user enters, "I want to book a day care service for my mother this weekend, but I'm a little nervous," the system will present available booking options at facilities that will make the user feel at ease. Furthermore, if the emotional analysis indicates that the user is having difficulty making a booking choice, a connection with an operator will be immediately suggested.

[0242] For example, users can maximize the system's functionality by using prompts such as, "I'd like to book a day care service for the weekend, but I'm a little worried. Could you tell me what times are available?" This allows users to easily select the best booking option and receive the necessary support.

[0243] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0244] Step 1:

[0245] The terminal accepts voice or text input from the user. The user speaks or texts about the desired reservation details and date / time. This input forms the basis of the reservation process.

[0246] Step 2:

[0247] The server uses a speech recognition engine to convert audio data sent from the terminal into text data. When voice input is received, the speech recognition engine processes the data and converts it into text information, preparing it for natural language processing.

[0248] Step 3:

[0249] The server analyzes the converted text data using natural language processing technology. During the analysis, it extracts the user's desired service details and date / time. This clarifies the specific intent necessary for making a reservation.

[0250] Step 4:

[0251] The server uses an emotion analysis engine to evaluate the user's emotional state from text data. Based on the input words and their tone, it performs data processing to identify emotions and numerically interprets feelings such as stress and anxiety.

[0252] Step 5:

[0253] The server searches a database of facilities for the best booking option based on the results of natural language processing and sentiment analysis. It queries facility availability and service details, and generates options that are suitable for the user's preferences and emotional state.

[0254] Step 6:

[0255] The terminal presents the user with reservation options sent from the server. The user can review the presented options and choose the one that best suits their needs. The user's selection at this step becomes the final reservation.

[0256] Step 7:

[0257] The server confirms the reservation selected by the user and sends a reservation confirmation notification to the device. This officially records the final reservation, and the user retains the confirmation information.

[0258] Step 8:

[0259] The server suggests connecting the user with an operator as needed. If the sentiment analysis determines that the user is having difficulty making a reservation, the server immediately begins guiding the user to an operator for assistance.
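Assuming that the emotion analysis in Step 4 yields a numerical score between 0 and 1 for each emotion, the decision in Step 8 could be sketched as a simple threshold check. The score format, the threshold value of 0.7, the watched emotion labels, and the function name are all assumptions for illustration.

```python
from typing import Mapping, Sequence

# Hypothetical emotion labels that trigger an operator suggestion.
WATCHED_EMOTIONS = ("stress", "anxiety", "frustration")


def should_suggest_operator(
    scores: Mapping[str, float],
    threshold: float = 0.7,
    watched: Sequence[str] = WATCHED_EMOTIONS,
) -> bool:
    """Return True when any watched emotion score reaches the threshold."""
    return any(scores.get(label, 0.0) >= threshold for label in watched)
```

Emotions that are missing from the score mapping are treated as 0.0, so an empty result never triggers the suggestion.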

[0260] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0261] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.
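The pairing of an instruction prompt with text inference data can be illustrated as follows. The layout of the combined text, the field labels, and the function name are hypothetical; how the input is actually formatted for the data generation model 58 is not specified in this disclosure, and no model is called here.

```python
from typing import Optional


def build_model_input(
    instruction: str, user_text: str, emotion: Optional[str] = None
) -> str:
    """Combine an instruction and text inference data into a single input text."""
    parts = [f"Instruction: {instruction}", f"User input: {user_text}"]
    if emotion:
        # Optional context from emotion estimation (assumed label format).
        parts.append(f"Estimated emotion: {emotion}")
    return "\n".join(parts)
```

The resulting text would then be passed to whichever generative AI model is used, which performs the inference according to the instruction.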

[0262] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0263] [Second Embodiment]

[0264] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0265] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0266] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0267] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0268] The microphone 238 accepts instructions from the user 20 by receiving the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to instructions from the processor 46.

[0269] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0270] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0271] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0272] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0273] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0274] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0275] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0276] This invention provides a system that allows elderly people and those who are not comfortable using digital devices to easily make appointments at medical institutions. The system primarily operates based on the user's voice or text input, with the server and terminal working together to process the information.

[0277] First, the device uses a microphone and recording function to receive voice input from the user. If the user says, "I would like to make an appointment with the dermatologist next Friday," that voice is captured as a digital signal. Text input is also possible, and the user can enter "I would like to make an appointment with the dermatologist next Friday" using the keyboard or by touching the screen.

[0278] In the case of voice input, the terminal uses a voice recognition engine to convert the voice signal into text data. The converted text data is sent to the server to proceed with the reservation process.

[0279] The server analyzes the received text data using natural language processing technology to clarify the user's intention. Specifically, keywords such as "department" and "date and time" are extracted from the text to identify the information required for the reservation.

[0280] After that, the server executes a query against the database of the medical institution to check if there are available time slots and corresponding departments for reservation. The confirmed reservation candidates are reconfigured in a selectable format for the user and sent to the terminal.

[0281] Next, the terminal presents these options to the user and prompts the user to select a specific date and time and department. For example, it requests a selection in the form of "A reservation for the dermatology department is available at 10:00 on Friday. Is this okay with you?"

[0282] After the user makes a selection, the terminal sends the information back to the server to complete the reservation procedure. When the reservation is confirmed, the server generates a confirmation message and sends it to the terminal to notify the user that the reservation has been completed. In addition, the user is notified again as a reminder when the reservation date approaches.

[0283] This system also has a function for connecting to an operator with a single button when the user has difficulty during the reservation process. This enables a quick response even when direct support is required. Furthermore, the voice recognition function supports multiple languages and can flexibly adapt to Japanese dialects and the like, making the system easy to use for users living in regional areas.

[0284] The following explains the processing flow.

[0285] Step 1:

[0286] The user inputs the details of the desired medical institution reservation through the voice input function or text input function of the terminal. For example, the user says "I want to make an appointment with the internal medicine department tomorrow afternoon" or inputs the same content as text.

[0287] Step 2:

[0288] The terminal converts the voice input data into text data using a voice recognition engine. Text input is processed directly as text data. The converted or input text is sent to the server for analysis.

[0289] Step 3:

[0290] The server passes the received text data to a natural language processing engine to analyze the user's intention from the text. In this process, important keywords such as the required medical department and desired date and time are extracted.

[0291] Step 4:

[0292] The server sends a query to the database of the medical institution based on the extracted information to obtain a list of available dates and facilities. The obtained data is converted into a format that is easy for the user to select and sent to the terminal.

[0293] Step 5:

[0294] The terminal presents the available reservation options received from the server to the user. For example, it displays on the screen or announces by voice: "Appointments with the internal medicine department are available at 3 pm and 4 pm tomorrow. Which one would you like?"

[0295] Step 6:

[0296] The user selects the desired reservation from the presented options and performs an operation to express a confirmation intention. The selected information is sent to the server through the terminal.

[0297] Step 7:

[0298] The server sends the reservation information selected by the user as a confirmation request to the medical institution's reservation system, formally confirming the reservation. The server then generates a confirmation message and sends it back to the terminal to notify the user.

[0299] Step 8:

[0300] The terminal receives a confirmation message from the server and notifies the user, either visually or audibly, that the reservation is complete. It also generates a reminder and notifies the user as the reservation date and time approaches.
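
The branch in Step 2 above, where voice input passes through a recognition engine while text input is used as is, can be sketched as follows. This is a minimal illustration only: `UserInput`, `to_text`, and `fake_recognizer` are hypothetical names that do not appear in the disclosure, and the recognizer is a placeholder rather than a real voice recognition engine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserInput:
    """One request from the terminal: either recorded audio or typed text."""
    audio: Optional[bytes] = None
    text: Optional[str] = None


def to_text(user_input: UserInput, recognize: Callable[[bytes], str]) -> str:
    """Return the request as text, as in Step 2.

    `recognize` is a hypothetical stand-in for the voice recognition engine.
    """
    if user_input.text is not None:
        # Text input is processed directly as text data.
        return user_input.text.strip()
    if user_input.audio is not None:
        # Voice input is converted to text by the recognition engine.
        return recognize(user_input.audio).strip()
    raise ValueError("request contains neither audio nor text")


def fake_recognizer(audio: bytes) -> str:
    # Placeholder for illustration only: treats the audio bytes as UTF-8 text.
    return audio.decode("utf-8")
```

Passing the recognizer as a parameter keeps the sketch independent of any particular speech recognition service.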

[0301] (Example 1)

[0302] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0303] In modern society, the numerous steps and technical complexities involved in making appointments at medical institutions pose significant obstacles for elderly individuals and those unfamiliar with digital devices. There is a need to provide such users with an intuitive and easy-to-use appointment system. Furthermore, the realization of technology capable of accurate speech recognition and intent analysis, while considering linguistic diversity and dialects, is essential. Additionally, it is necessary to provide a means for users to quickly receive assistance if they encounter difficulties during operation.

[0304] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0305] In this invention, the server includes a device that receives information input from a user, a technology that converts the received voice data into character information, and a technology that analyzes the converted character information by natural language analysis to extract the user's purpose. As a result, the user can easily make a reservation at a medical institution, and high-precision reservation processing is realized through voice recognition adapted to various languages and dialects. In addition, if the user has difficulty during operation, they can immediately receive support from an operator, so they can use the system with confidence.

[0306] The "user" refers to a person who makes a reservation at a medical institution using the system.

[0307] The "information input" refers to data provided by the user to the system through voice or text.

[0308] The "device" refers to a hardware or software component that receives data from the user.

[0309] The "voice data" refers to a signal that stores the words spoken by the user in a digital format.

[0310] The "character information" refers to digital information obtained by converting voice data into text format.

[0311] The "technology" refers to the methods and means used by the system to achieve a specific function.

[0312] The "natural language analysis" refers to the process of analyzing data using language processing technology to understand the user's intention from the character information.

[0313] The "purpose" refers to the ultimate intention or goal when the user operates the system.

[0314] The "medical facility" refers to an organization or place that provides medical treatment and care.

[0315] An "information collection" refers to the data stored in databases and servers used by a system.

[0316] "Time slot" refers to a specific time frame during which reservations are possible.

[0317] An "organization" refers to a facility or organization that provides a specific service.

[0318] "Providing" refers to the act of a system presenting information or options to a user.

[0319] "Promotion" refers to the guidance a system provides to make it easier for users to take the next action.

[0320] "Final decision" refers to the official confirmation of a reservation based on the user's selection.

[0321] "Notification" refers to the act of informing users about the status of their reservation or any necessary confirmations.

[0322] "Establishing communication" means creating a situation where a user can contact an operator when they need support.

[0323] "Diversity" describes a wide range of things that have different types or forms.

[0324] "Language" refers to a systematic collection of sounds and characters that humans use to communicate.

[0325] A "dialect" refers to a variant of a language that exhibits different characteristics depending on the region and culture, even within the same language.

[0326] This invention provides a system that allows users to easily and intuitively make appointments at medical institutions. The system functions primarily through the interaction of a server, terminals, and users.

[0327] First, when a user wishes to make an appointment at a medical institution, they use an application on a device such as a smartphone or tablet. This application can accept input via voice or text. For example, a user might say, "I would like to make an appointment with the dermatologist next Friday." This input is done using the microphone built into the smartphone.

[0328] Next, the device converts the audio data into text. This conversion uses speech recognition software such as the Google Speech-to-Text API. The converted text data is then sent from the device to the server via the internet.

[0329] The server analyzes the received text data using natural language processing techniques. Libraries such as spaCy and Transformers are used for this analysis. As a result of the analysis, the user's intent is clarified, and the information necessary for making a reservation is extracted.
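
As a rough illustration of the kind of information extracted here, the following sketch pulls a medical department and a date out of a request using only the Python standard library. It is a simplified stand-in for libraries such as spaCy or Transformers, not a use of them. The keyword table, the `Intent` type, and the reading of "next <weekday>" as the first such day strictly after today are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical keyword table; a real deployment would cover far more terms.
DEPARTMENTS = {
    "dermatolog": "dermatology",
    "internal medicine": "internal medicine",
    "orthoped": "orthopedics",
    "dentist": "dentistry",
    "dental": "dentistry",
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


@dataclass
class Intent:
    department: Optional[str]
    day: Optional[date]


def resolve_day(text: str, today: date) -> Optional[date]:
    """Resolve "tomorrow" or "next <weekday>" relative to `today`."""
    lowered = text.lower()
    if "tomorrow" in lowered:
        return today + timedelta(days=1)
    match = re.search(r"\bnext (" + "|".join(WEEKDAYS) + r")\b", lowered)
    if match:
        target = WEEKDAYS.index(match.group(1))
        # Assumption: the first matching weekday strictly after today.
        delta = (target - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=delta)
    return None


def extract_intent(text: str, today: date) -> Intent:
    """Extract the department keyword and the requested day, if present."""
    lowered = text.lower()
    department = next(
        (name for key, name in DEPARTMENTS.items() if key in lowered), None
    )
    return Intent(department=department, day=resolve_day(text, today))
```

Fields that cannot be determined are left as `None`, so a later step can ask the user for the missing information.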

[0330] Next, the server queries a database containing information about medical facilities to determine available time slots and institutions. For example, AWS RDS can be used to efficiently query the database.
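
The availability query can be sketched with a parameterized SQL statement. The example below uses an in-memory SQLite database purely as a local stand-in for a managed database such as AWS RDS. The `slots` table, its columns, and the sample rows are invented for illustration and are not taken from the disclosure.

```python
import sqlite3
from typing import List, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up appointment slots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots ("
        " id INTEGER PRIMARY KEY, facility TEXT, department TEXT,"
        " starts_at TEXT, booked INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO slots (facility, department, starts_at, booked)"
        " VALUES (?, ?, ?, ?)",
        [
            ("Clinic A", "dermatology", "2024-12-13T10:00", 0),
            ("Clinic A", "dermatology", "2024-12-13T15:00", 1),
            ("Clinic B", "dermatology", "2024-12-13T16:00", 0),
            ("Clinic B", "internal medicine", "2024-12-13T15:00", 0),
        ],
    )
    conn.commit()
    return conn


def find_open_slots(conn: sqlite3.Connection, department: str,
                    day: str) -> List[Tuple[str, str]]:
    """Return (facility, starts_at) for unbooked slots on `day` (YYYY-MM-DD)."""
    rows = conn.execute(
        "SELECT facility, starts_at FROM slots"
        " WHERE department = ? AND booked = 0"
        " AND substr(starts_at, 1, 10) = ?"
        " ORDER BY starts_at",
        (department, day),
    )
    return rows.fetchall()
```

Using placeholders rather than string concatenation keeps text derived from the user's request out of the SQL statement itself.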

[0331] The terminal provides the user with reservation options sent from the server. These options are displayed clearly in either a calendar or list format. The user then selects the most suitable reservation option from the options.

[0332] Finally, the device sends the selected information back to the server to finalize the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user. Reminders can also be sent as needed, notifying the user as the reservation date approaches.
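
One possible way to decide when reminders are sent is sketched below. The disclosure states only that the user is notified as the reservation date approaches, so the lead times (one day and two hours before the appointment) and the function name `reminder_times` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed lead times; the source does not specify when reminders are sent.
LEAD_TIMES = (timedelta(days=1), timedelta(hours=2))


def reminder_times(appointment: datetime, now: datetime) -> List[datetime]:
    """Return the reminder times that are still in the future, earliest first."""
    candidates = [appointment - lead for lead in LEAD_TIMES]
    # Reminders whose time has already passed are skipped.
    return sorted(t for t in candidates if t > now)
```

For a reservation made only a few hours ahead, this yields just the two-hour reminder, and none at all once the appointment is less than two hours away.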

[0333] The following is a concrete example of a prompt to be input to a generative AI model:

[0334] Please input "I would like to make an appointment with a dermatologist next Friday" using voice input, send this information to the AI model for analysis, and suggest available medical institutions.

[0335] This system allows users to easily make appointments at medical facilities and features voice recognition adapted to various languages and dialects. Furthermore, it enables rapid communication with operators in emergencies, allowing users to complete the booking process with peace of mind.

[0336] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0337] Step 1:

[0338] Users request appointments using voice or text. Specifically, they launch an application on their smartphone and input a request, such as "I would like to book an appointment with the dermatologist next Friday," using voice or text. This input constitutes the basic information input into the system.

[0339] Step 2:

[0340] The device receives the user's voice input and converts it into text. It uses the Google Speech-to-Text API to analyze the voice data and generate text. During this process, it transcribes the speech as accurately as possible for as long as speech is present, and prepares the resulting text data. This converted information is then passed to the next process as intermediate data.

[0341] Step 3:

[0342] The terminal sends text information to the server. Specifically, it sends an HTTP request over the internet to prepare for further processing on the server. The input here is the text information generated in step 2, and the output is the completion of the transmission to the server.

[0343] Step 4:

[0344] The server analyzes the received text information using natural language processing technology. Libraries such as spaCy and Transformers are used for analysis, extracting keywords such as "medical department" and "date and time." It receives text information as input and outputs the analysis data, identifying the information necessary for making a reservation.

[0345] Step 5:

[0346] The server executes queries against a database of medical facility information. It searches an SQL or NoSQL database for medical departments and dates that match the user's needs, and collects the available reservation options. The input is the analysis data, and the output is a list of options to be presented to the user.

[0347] Step 6:

[0348] The terminal receives reservation candidates sent from the server and presents them to the user. Specifically, it displays them in a calendar or list format to make them easily understandable visually to the user. At this point, the input is a list of reservation candidates, and the output is the provision of visual information to the user.

[0349] Step 7:

[0350] The user selects their preferred booking option from the presented choices. The user's selected booking information is then sent as input to the next step.

[0351] Step 8:

[0352] The terminal sends the user's selections back to the server to finalize the reservation. The input here is the user's selection information, and the output is a reservation confirmation command.

[0353] Step 9:

[0354] The server confirms the reservation and generates a confirmation message. It sends the confirmation information back to the terminal, notifying the user. The output provides the user with information on the reservation status and how to manage their future schedule.

[0355] Through the steps outlined above, users can intuitively and smoothly complete their medical appointment bookings.
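
The HTTP transmission in Step 3 of this flow might look like the following sketch, which builds a JSON POST request with the standard library. The endpoint URL, the payload field names, and the `user_id` parameter are hypothetical. The source specifies only that the text is sent to the server as an HTTP request over the internet.

```python
import json
import urllib.request

# Hypothetical endpoint; the source names neither a URL nor a payload schema.
SERVER_URL = "https://reservation.example/api/requests"


def build_request(text: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the recognized text."""
    body = json.dumps({"user_id": user_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(request, timeout=10)`. That call is left out of the sketch because the endpoint above does not exist.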

[0356] (Application Example 1)

[0357] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0358] The goal is to eliminate the complexities of the procedures and communication barriers faced by elderly users and those unfamiliar with digital tools when making appointments at medical institutions. In particular, it is necessary to provide a more intuitive and smoother booking process by utilizing voice input.

[0359] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0360] In this invention, the server includes an input device that receives voice data or string data, a conversion device that converts the input voice data into string data, and a natural language processing device that analyzes the converted string data and identifies the intent. This makes it possible for users to easily communicate their reservation requests through voice input and for the system to automatically present appropriate reservation information based on that.

[0361] "Audio data" refers to data that captures the voice spoken by the user as a digital signal.

[0362] "String data" refers to text-formatted data obtained by converting audio data.

[0363] An "input device" is a device used to receive audio data or text data.

[0364] A "conversion device" is a device that has the function of converting audio data into text data.

[0365] A "natural language processing device" is a device that analyzes text data and performs processing to identify the user's intent.

[0366] "Medical institution" is a general term for organizations and facilities that provide medical care and treatment.

[0367] An "information system" is a computer system used to manage and manipulate data and information.

[0368] A "search device" is a device used to access an information system and find available time slots and locations for reservations.

[0369] A "user interface" is an interface used for exchanging information between a system and a user.

[0370] A "notification device" is a device used to transmit information or messages to a user.

[0371] A "voice output device" is a device that presents generated options or information to the user as voice.

[0372] A "support device" is a device that has the function of enabling communication with an operator when a user has difficulty making a reservation selection.

[0373] This invention is a system that enables elderly people and users unfamiliar with using digital devices to smoothly make appointments at medical institutions using voice input. The system converts voice data into string data, analyzes the user's requests using natural language processing, and presents appropriate appointment options.

[0374] The terminal is equipped with an input device that receives audio data, and uses speech recognition technology to convert the acquired audio into string data. The resulting string data is sent to the server. In this operation, the "Google Speech Recognition API" can be used as a specific example of a speech recognition engine.

[0375] After receiving the string data, the server uses a natural language processing unit to analyze the data and extract information about the medical institution and date / time the user wishes to book. The search device on the server accesses the medical institution's information system to obtain information about available time slots and medical departments. Specific examples of natural language processing technologies used here include "spaCy" or "NLTK".

[0376] The search results are presented to the user through a user interface, and the user is prompted to make a selection. Furthermore, by using a voice output device to present the options, information can be provided without relying on visual cues.

[0377] Once the user completes their selection, the terminal notifies them of the appointment confirmation via a notification device and sets reminders as needed so that they do not forget their appointment at the medical institution.

[0378] For example, if a user uses voice input to say, "I want to book an appointment with a dermatologist next Friday," the system will search for suitable dates and times and potential medical facilities, present the user with the best option, and confirm the appointment for the selected time slot.

[0379] An example prompt for the generative AI model is as follows: "Please propose a system that allows elderly people to easily book appointments at medical facilities via their smartphones. Include a function that uses voice input to automatically retrieve and suggest available medical departments to the user."

[0380] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0381] Step 1:

[0382] The device receives audio data. It captures what the user says via the microphone and converts the analog audio into a digital audio signal. The input is the user's voice, and the output is digital audio data. Specifically, the interface between the microphone and the speech recognition engine is used.

[0383] Step 2:

[0384] The device converts digital audio data into text data. It uses a speech recognition engine to analyze the audio signal and convert it into text format. The input is digital audio data, and the output is text data. Specifically, it calls speech recognition services such as the "Google Speech Recognition API."

[0385] Step 3:

[0386] The server analyzes string data to identify the user's intent. It utilizes a natural language processing system to extract necessary keywords and phrases from the text. The input is string data, and the output is reservation intent information (e.g., date and time, medical specialty). Specifically, the natural language processing engine "spaCy" or "NLTK" is used.

[0387] Step 4:

[0388] The server accesses the medical institution's information system to search for available appointment times and departments. It queries the database to retrieve appointment options that match the user's intent. The input is information about the appointment intent, and the output is a list of available appointment options. Specifically, it executes SQL queries to retrieve the necessary data.

[0389] Step 5:

[0390] The terminal presents the user with reservation options. The user interface presents the options visually or audibly and prompts the user to make a selection. The input is a list of available reservation options, and the output is the user's selection. Specifically, screen display and speech synthesis technology are used.

[0391] Step 6:

[0392] The user selects a reservation option on the terminal, and the terminal sends this information to the server. This transmits reservation information based on the user's selection to the server. The input is the user's selection, and the output is the selected reservation information. The terminal's selection interface is used for this operation.

[0393] Step 7:

[0394] The server confirms the reservation and notifies the terminal. It registers the selected information in the reservation system and generates a confirmation message. The input is the selected reservation information, and the output is the reservation confirmation message. Specifically, database updates and the notification system are involved.

[0395] Step 8:

[0396] The device confirms the reservation with the user and sets a reminder. It notifies the user that the reservation is complete and sets up a reminder to notify them as the reservation date approaches. The input is a reservation confirmation message, and the output is a reminder message. Specifically, the device's notification function and the calendar API are used.
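
The database update in Step 7 can be sketched as a conditional update that succeeds only if the selected slot is still free, so that two users cannot be confirmed for the same slot. SQLite is used here as a local stand-in for the medical institution's reservation system. The table layout, sample rows, and message wording are assumptions made for illustration.

```python
import sqlite3


def open_demo_reservation_db() -> sqlite3.Connection:
    """Create an in-memory table with two open slots (made-up sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slots"
        " (id INTEGER PRIMARY KEY, starts_at TEXT, booked_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO slots (id, starts_at) VALUES (?, ?)",
        [(1, "2024-12-13T15:00"), (2, "2024-12-13T16:00")],
    )
    conn.commit()
    return conn


def confirm_reservation(conn: sqlite3.Connection, slot_id: int,
                        user_id: str) -> str:
    """Register the selected slot and return a confirmation message."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            # The WHERE clause makes the update a no-op if the slot is taken.
            "UPDATE slots SET booked_by = ?"
            " WHERE id = ? AND booked_by IS NULL",
            (user_id, slot_id),
        )
    if cursor.rowcount == 1:
        return f"Reservation confirmed for slot {slot_id}."
    return f"Slot {slot_id} is not available. Please choose another option."
```

Checking `rowcount` after the update, instead of reading the row first and writing afterwards, avoids a window in which another request could claim the same slot.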

[0397] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0398] This invention is a system that takes into account the user's emotions when making a reservation at a medical institution, providing a more comfortable and smoother experience. The system comprises a terminal for receiving voice or text input from the user, a server for analyzing the data and processing the reservation, and an emotion engine for recognizing the user's emotions.

[0399] First, the user communicates their appointment request to the system using the device's voice or text input function. For example, they might say or type, "I'd like to make a dental appointment for next Wednesday."

[0400] The terminal converts voice input into text using a speech recognition engine and then sends it to the server. The server analyzes this text data using natural language processing technology to identify the user's desired medical department and date / time. Simultaneously, an emotion engine analyzes the user's emotions from the tone of voice and selected words in the input data. For example, it might determine that the user is experiencing stress.

[0401] Based on the information gathered and the results of sentiment analysis, the server queries a database of healthcare facilities to generate the most suitable booking options. For example, if the system analyzes that the user is stressed, it will prioritize showing time slots where appointments can be booked quickly.
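
The prioritization described here can be illustrated with a simple ordering rule. In the sketch below, a stressed user sees the soonest slots first, while other users see the slots closest to the requested time first. The boolean `stressed` flag and the fallback ordering are assumptions. The source states only that quickly bookable time slots are prioritized for stressed users.

```python
from datetime import datetime
from typing import List


def rank_slots(slots: List[datetime], preferred: datetime,
               stressed: bool) -> List[datetime]:
    """Order candidate slots according to the user's estimated state."""
    if stressed:
        # Stressed users are shown the soonest appointments first.
        return sorted(slots)
    # Otherwise, show the slots closest to the requested time first,
    # breaking ties in favor of the earlier slot.
    return sorted(slots, key=lambda s: (abs(s - preferred), s))
```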

[0402] Next, the generated reservation options are sent to the terminal and presented to the user. If the user makes a selection, that selection is sent to the server and the formal reservation process is completed. Simultaneously with the reservation confirmation, the server generates a confirmation notification and sends it to the terminal to inform the user. Furthermore, it also has a function that suggests connecting the user to an operator depending on the user's emotional state. For example, if frustration or anxiety is detected, it automatically guides the user to human support.
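
The decision to suggest an operator could be as simple as a threshold on the emotion scores, as sketched below. The score range of 0 to 1, the threshold of 0.6, and the label names are assumptions. The source names frustration and anxiety as triggers but does not define a numeric scale.

```python
from typing import Dict

# Assumed labels and threshold, chosen only for illustration.
ESCALATION_EMOTIONS = ("frustration", "anxiety")
THRESHOLD = 0.6


def should_suggest_operator(scores: Dict[str, float]) -> bool:
    """Return True if any escalation emotion meets or exceeds the threshold."""
    return any(
        scores.get(name, 0.0) >= THRESHOLD for name in ESCALATION_EMOTIONS
    )
```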

[0403] Finally, as the reservation date approaches, the system generates a reminder and notifies the user via their device. This ensures that the user does not forget their reservation. Combining these functions with an emotion engine yields a more user-friendly reservation system.

[0404] The following describes the processing flow.

[0405] Step 1:

[0406] Users enter their appointment requests using the device's voice or text input function. For example, they might say, "I'd like to make an appointment with the orthopedics department next Monday," or type the same thing as text.

[0407] Step 2:

[0408] The terminal uses a speech recognition engine to convert the input speech data into text data. The converted text data or text input data is then sent directly to the server.

[0409] Step 3:

[0410] The server analyzes the received text data using a natural language processing engine and extracts important keywords such as the medical department and date / time from the user's reservation request.

[0411] Step 4:

[0412] The server uses an emotion engine to analyze the user's emotions from the content of text data or, if it's audio, from the tone of voice. For example, it can determine if the user's voice contains feelings of anxiety or unease.

[0413] Step 5:

[0414] Based on the extracted information and sentiment analysis results, the server consults a database of healthcare facilities to generate the most suitable booking options for available dates, times, and facilities. For example, if the user is feeling anxious, it prioritizes selecting earlier time slots.

[0415] Step 6:

[0416] The terminal presents the user with reservation options sent from the server. The information is displayed on the screen or spoken aloud, for example, "You have an appointment available at the orthopedics department on Monday at 10:00 AM. Do you want to confirm the reservation?"

[0417] Step 7:

[0418] The user selects their preferred booking option from the presented choices and confirms it. The selected information is sent to the server via the terminal.

[0419] Step 8:

[0420] Based on the user's selection information, the server sends a reservation confirmation request to the medical institution's reservation system, formally confirming the reservation. After confirmation, a detailed confirmation message is generated and sent to the user's terminal to inform them.

[0421] Step 9:

[0422] The device receives a confirmation message from the server and informs the user that the reservation is complete. It also follows up with the user by sending reminder notifications as the reservation date approaches, if necessary.
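
For the text branch of Step 4 in this flow, a toy emotion estimate can be sketched with a small cue-word lexicon. This is not how a production emotion engine would work, since such an engine would more likely be a trained model and could also use the tone of voice. The lexicon, the labels, and the scoring rule are all invented for this example.

```python
import re
from typing import Dict

# Toy lexicon invented for illustration; not taken from the disclosure.
LEXICON = {
    "anxiety": {"worried", "anxious", "nervous", "afraid", "uneasy"},
    "frustration": {"frustrated", "annoyed", "confusing", "stuck"},
}


def estimate_emotions(text: str) -> Dict[str, float]:
    """Score each emotion from 0.0 to 1.0 based on cue words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for emotion, cues in LEXICON.items():
        hits = sum(1 for word in words if word in cues)
        # Saturating score: one cue word gives 0.5, two or more give 1.0.
        scores[emotion] = min(1.0, hits * 0.5)
    return scores
```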

[0423] (Example 2)

[0424] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0425] Modern medical appointment systems often fail to consider user emotions, leading to stress and anxiety for users as they proceed with booking appointments. Furthermore, users who experience difficulties in selecting appointments often lack access to adequate support.

[0426] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0427] In this invention, the server includes means for receiving voice or text input from the user, means for analyzing the user's emotional state, and means for generating optimal reservation options based on the user's emotional state. This enables a comfortable and smooth reservation experience that takes the user's emotions into consideration.

[0428] A "user" refers to an individual who uses the system to make an appointment at a medical institution.

[0429] "Voice data" refers to information that records the voice spoken by a user in digital format.

[0430] "Text data" refers to written information obtained by converting voice data or entered directly by the user as text.

[0431] "Natural language processing technology" refers to the technology that enables devices to understand and analyze human language.

[0432] "Emotional state" refers to the feelings, such as stress or anxiety, that a user exhibits while using the system.

[0433] A "medical institution database" refers to a collection of information that includes available appointment times and facility information for medical institutions.

[0434] "Reservation options" refer to the multiple available date, time, and facility choices presented to the user.

[0435] "Operator" refers to a person or role that provides support to users through a system.

[0436] "Speech recognition technology" refers to the technology that converts speech data into text data.

[0437] This invention is a system designed to provide a comfortable and smooth experience for users when making appointments at medical institutions, taking into account their emotional state. The following describes embodiments for carrying out the invention.

[0438] Users enter their reservation preferences using voice or text via a terminal. When using voice input, the terminal employs speech recognition technology to convert the voice data into text. For example, typical speech recognition software processes this input.

[0439] The device converts the received audio into text and then sends it to the server as digital data. During this process, natural language processing (NLP) technology is used to analyze the text and extract the user's intent. Specifically, an utterance such as "I would like to make an internal medicine appointment for next Tuesday" might be interpreted as a request for an internal medicine appointment next Tuesday. At this stage, both the natural language processing engine and the sentiment analysis engine operate. For example, an NLP library might be used for intent analysis, and the sentiment analysis engine for estimating the user's emotions.

[0440] The server identifies the user's desired medical department and appointment time based on the text analyzed by natural language processing. Simultaneously, it analyzes the user's emotional state through sentiment analysis. The sentiment analysis assesses whether the user is experiencing stress or anxiety, and its results are used to generate the most suitable appointment options.

[0441] The server accesses a database of medical institutions based on the analysis results and emotional state information, searching for available time slots and facility information. From this data, it presents the most suitable booking options according to the user's emotional state. For example, if it determines that the user is feeling anxious, it generates an option to prioritize immediately available appointments.

[0442] For example, if a user enters text saying, "I want to make a dermatology appointment next Friday, but I've been busy lately and I'm worried about my health," the system uses an emotion engine to determine the user's concerns and prioritizes presenting time slots with high appointment availability.

[0443] An example of a prompt for a generative AI model is, "What suggestions can be made if a user wants to make an appointment at a medical institution but may be experiencing stress?"

[0444] This enables reservation suggestions that take user emotions into consideration, resulting in a more comfortable service for users.

[0445] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0446] Step 1:

[0447] The user enters their request for a medical appointment via voice or text. This user input becomes the initial data. For example, they might enter a request such as, "I would like to make an appointment with an internal medicine specialist next Tuesday."

[0448] Step 2:

[0449] The device uses speech recognition technology to convert speech data into text data. This process converts the input from speech to text format. The converted text is then sent to the server as digital data.

[0450] Step 3:

[0451] The server analyzes the received text data using natural language processing technology to extract the user's intent. In this step, a generative AI model is used to analyze the text and identify information such as the medical department and appointment date / time the user desires. For example, keywords such as "internal medicine" and "next Tuesday" may be extracted.

[0452] Step 4:

[0453] The server uses an emotion analysis engine to analyze the user's emotional state. The input data used is the text obtained in the previous step, from which emotional features are extracted. For example, it can determine whether the user is experiencing stress.

[0454] Step 5:

[0455] The server accesses a database of medical institutions based on the analysis results and emotional information, searching for available time slots and facility information. This process generates booking options that are suitable for the user's intentions and emotions. As a result, the booking options deemed most suitable for the user are output.

[0456] Step 6:

[0457] The terminal presents the user with reservation options sent from the server. The output data consists of reservation date and time and facility information, which is presented to the user visually or audibly.

[0458] Step 7:

[0459] The user selects their desired reservation from the presented options. This selection registers the user's decision as input data.

[0460] Step 8:

[0461] The server confirms the reservation based on the user's selection and notifies them of this information. The confirmation notification is sent to the terminal, informing the user that the reservation is complete. This officially establishes the reservation with the medical institution.
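
Where Step 3 of this flow uses a generative AI model for extraction, the server needs to build a prompt and validate the model's reply before using it. The sketch below shows one way to do both. The prompt wording, the JSON keys, and the function names are hypothetical. The call to the model itself is omitted because the source does not specify a model or an API.

```python
import json
from typing import Dict, Optional

EMPTY = {"department": None, "date": None}


def build_extraction_prompt(user_text: str) -> str:
    """Compose an instruction asking the model for a small JSON object."""
    return (
        "Extract the medical department and the desired date from the "
        "request below. Reply with JSON only, using the keys "
        '"department" and "date" (use null for anything not stated).\n'
        f"Request: {user_text}"
    )


def parse_model_reply(reply: str) -> Dict[str, Optional[str]]:
    """Parse the model's reply, falling back to empty fields if malformed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return dict(EMPTY)
    if not isinstance(data, dict):
        return dict(EMPTY)
    result = {}
    for key in ("department", "date"):
        value = data.get(key)
        # Keep only non-empty strings; anything else is treated as missing.
        result[key] = value if isinstance(value, str) and value.strip() else None
    return result
```

Treating a malformed reply as "nothing extracted" lets the server ask the user to restate the request rather than act on unreliable output.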

[0462] (Application Example 2)

[0463] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0464] Traditional reservation systems perform reservation procedures mechanically without considering the user's subjective emotional state, which leads to problems such as the reservation process not proceeding smoothly when the user is feeling anxious or stressed. Furthermore, options for receiving appropriate support are lacking, resulting in a lower-quality user experience.

[0465] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0466] In this invention, the server includes means for receiving voice or text input from the user and converting the voice data into text data, means for analyzing the converted text data using natural language processing technology to extract the user's intent, and means for analyzing the user's emotions from the user's input data. This makes it possible to present the optimal reservation option according to the user's emotional state and to quickly suggest human support as needed.

[0467] A "user" refers to an individual or their representative who makes a reservation using the system and provides input information via voice or text.

[0468] "Voice data" refers to information provided by the user through voice, and is the raw data that the system preprocesses before natural language analysis.

[0469] "Text data" refers to voice data represented as characters, and is the foundational information that the system analyzes using natural language processing technology.

[0470] "Natural language processing technology" refers to a group of technologies that analyze text data to understand its intent and content, enabling machines to produce meaningful responses and actions.

[0471] A "means of analyzing emotions" refers to a module within a system that extracts emotional nuances from user-inputted voice or text and evaluates the user's emotional state based on that.

[0472] "Optimal booking options" are suggestions generated to present available dates, times, and facility choices based on the user's intentions and emotional state.

[0473] "Support" refers to the connection with an operator and additional guidance provided based on emotional analysis when a user has difficulty making a reservation.

[0474] This invention provides a method for realizing a system that provides an optimal booking experience based on the user's emotional state.

[0475] The server plays a central role in this system. The server receives voice input from the user and uses a speech recognition engine (e.g., Google Speech-to-Text API) to convert it into text data. This text data is then analyzed using natural language processing techniques (e.g., Python's NLTK or spaCy) to extract the user's intent. Furthermore, sentiment analysis tools such as Amazon Comprehend and IBM Watson Tone Analyzer are used to analyze the user's emotions and evaluate their emotional state.

[0476] A terminal is a device that allows users to input information through an interface and receive results. The terminal sends voice or text input to the server and presents the user with reservation options and confirmation notifications sent from the server.

[0477] When a user enters information about the service they intend to book, the data is analyzed, and the most suitable booking options are generated based on the user's emotional state. For example, if a user enters, "I want to book a day care service for my mother this weekend, but I'm a little nervous," the system will present available booking options at facilities that will make the user feel at ease. Furthermore, if the emotional analysis indicates that the user is having difficulty making a booking choice, a connection with an operator will be immediately suggested.

[0478] For example, users can maximize the system's functionality by using prompts such as, "I'd like to book a day care service for the weekend, but I'm a little worried. Could you tell me what times are available?" This allows users to easily select the best booking option and receive the necessary support.

[0479] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0480] Step 1:

[0481] The terminal accepts voice or text input from the user. The user speaks or texts about the desired reservation details and date / time. This input forms the basis of the reservation process.

[0482] Step 2:

[0483] The server uses a speech recognition engine to convert voice data sent from the terminal into text data. When voice input is received, the speech recognition engine processes the data and converts it into text information, preparing it for natural language processing.

[0484] Step 3:

[0485] The server analyzes the text data converted using natural language processing technology. During the analysis, it extracts the user's desired service details and date / time. This clarifies the specific intent necessary for making a reservation.

[0486] Step 4:

[0487] The server uses an emotion analysis engine to evaluate the user's emotional state from text data. Based on the input words and their tone, it performs data processing to identify emotions and numerically interprets feelings such as stress and anxiety.

[0488] Step 5:

[0489] The server searches a database of facilities for the best booking option based on the results of natural language processing and sentiment analysis. It queries facility availability and service details, and generates options that are suitable for the user's preferences and emotional state.

[0490] Step 6:

[0491] The terminal presents the user with reservation options sent from the server. The user can review the presented options and choose the one that best suits their needs. The user's selection at this step becomes the final reservation.

[0492] Step 7:

[0493] The server confirms the reservation selected by the user and sends a reservation confirmation notification to the terminal. This officially records the final reservation, and the user retains the confirmation information.

[0494] Step 8:

[0495] The server suggests connecting the user with an operator as needed. If the sentiment analysis determines that the user is having difficulty making a reservation, the server immediately begins guiding the user to an operator for assistance.
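As a non-limiting illustration, the emotion evaluation of Step 4 and the operator suggestion of Step 8 can be sketched as follows. The word list, the weights, the threshold value, and the function names are all assumptions introduced for this sketch; an actual system could instead call a sentiment analysis service.

```python
import re

# Hypothetical lexicon: word -> stress/anxiety weight (illustrative values only).
STRESS_LEXICON = {
    "nervous": 0.6,
    "worried": 0.6,
    "anxious": 0.8,
    "confused": 0.5,
    "stressed": 0.8,
    "difficult": 0.4,
}

# Assumed cut-off above which the server suggests an operator.
ESCALATION_THRESHOLD = 0.7


def stress_score(text: str) -> float:
    """Return a score in [0, 1]: the sum of matched word weights, capped at 1."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(STRESS_LEXICON.get(word, 0.0) for word in words)
    return min(total, 1.0)


def should_suggest_operator(text: str) -> bool:
    """Decide whether to propose a connection to a human operator."""
    return stress_score(text) >= ESCALATION_THRESHOLD
```

With these assumed values, a single mild expression such as "a little nervous" stays below the threshold, while several stress-related words in one input exceed it and trigger the operator suggestion.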

[0496] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0497] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0498] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0499] [Third Embodiment]

[0500] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0501] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0502] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0503] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0504] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0505] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0506] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0507] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0508] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0509] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0510] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0511] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0512] This invention provides a system that allows elderly people and those who are not comfortable using digital devices to easily make appointments at medical institutions. The system primarily operates based on the user's voice or text input, with the server and terminal working together to process the information.

[0513] First, the device uses a microphone and recording function to receive voice input from the user. If the user says, "I would like to make an appointment with the dermatologist next Friday," that voice is captured as a digital signal. Text input is also possible, and the user can enter "I would like to make an appointment with the dermatologist next Friday" using the keyboard or by touching the screen.

[0514] In the case of voice input, the terminal uses a speech recognition engine to convert the voice signal into text data. The converted text data is then sent to the server to process the reservation.

[0515] The server analyzes the received text data using natural language processing technology to clarify the user's intent. Specifically, it extracts keywords such as "medical department" and "date and time" from the text to identify the information necessary for making a reservation.
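As a non-limiting illustration, the keyword extraction described above can be sketched with simple pattern matching. The vocabulary table, the function name, and the interpretation of "next <weekday>" as the first such day strictly after today are assumptions made for this sketch; a natural language processing engine would handle far more varied phrasing.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabulary mapping spoken terms to medical departments.
DEPARTMENTS = {
    "dermatologist": "dermatology",
    "dermatology": "dermatology",
    "internal medicine": "internal medicine",
    "dental": "dentistry",
    "dentist": "dentistry",
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def extract_request(text: str, today: date) -> dict:
    """Extract the medical department and a resolved date from a request."""
    lowered = text.lower()
    department = next(
        (name for term, name in DEPARTMENTS.items() if term in lowered), None
    )

    when = None
    if "tomorrow" in lowered:
        when = today + timedelta(days=1)
    else:
        match = re.search(r"next (" + "|".join(WEEKDAYS) + r")", lowered)
        if match:
            target = WEEKDAYS.index(match.group(1))
            # Assumption: the first matching weekday strictly after today.
            days_ahead = (target - today.weekday() - 1) % 7 + 1
            when = today + timedelta(days=days_ahead)
    return {"department": department, "date": when}
```

Passing the current date as an argument, rather than reading the clock inside the function, keeps the date resolution reproducible.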

[0516] The server then queries the medical institution's database to check for available appointment times and corresponding departments. The confirmed appointment options are reconfigured into a selectable format for the user and sent to the terminal.
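As a non-limiting illustration, the availability query can be sketched with an in-memory SQLite database. The table name, column names, and sample rows are hypothetical; the actual schema of a medical institution's database is not specified in this disclosure.

```python
import sqlite3

# Hypothetical schema: one row per bookable time slot at a facility.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE slots (
           id INTEGER PRIMARY KEY,
           facility TEXT, department TEXT,
           slot_date TEXT, slot_time TEXT,
           booked INTEGER DEFAULT 0)"""
)
conn.executemany(
    "INSERT INTO slots (facility, department, slot_date, slot_time, booked) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Clinic A", "dermatology", "2024-12-13", "10:00", 0),
        ("Clinic A", "dermatology", "2024-12-13", "11:00", 1),
        ("Clinic B", "dermatology", "2024-12-13", "14:00", 0),
        ("Clinic B", "internal medicine", "2024-12-13", "15:00", 0),
    ],
)


def find_available(conn, department, slot_date):
    """Return unbooked slots for a department and date, earliest first."""
    rows = conn.execute(
        "SELECT id, facility, slot_time FROM slots "
        "WHERE department = ? AND slot_date = ? AND booked = 0 "
        "ORDER BY slot_time",
        (department, slot_date),
    )
    return [{"id": r[0], "facility": r[1], "time": r[2]} for r in rows]


options = find_available(conn, "dermatology", "2024-12-13")
```

The returned list of dictionaries corresponds to the "selectable format" that is sent to the terminal; the parameterized query keeps user-derived values out of the SQL text.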

[0517] Next, the device presents these options to the user and prompts them to select a specific date, time, and medical department. For example, it might ask, "We have an appointment available for dermatology on Friday at 10am. Is this alright?"

[0518] After the user makes a selection, the device sends that information back to the server to confirm the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user that the reservation is complete. A reminder is also sent as the reservation date approaches.

[0519] This system also includes a feature that allows users to connect with an operator with a single button press if they encounter difficulties during the booking process. This enables quick responses even when direct support is needed. Furthermore, the voice recognition function supports multiple languages and is designed to be easy to use for users living in rural areas, as it flexibly adapts to Japanese dialects and other regional variations.

[0520] The following describes the processing flow.

[0521] Step 1:

[0522] Users enter their desired appointment details through their device's voice or text input function. For example, they might say, "I'd like to book an internal medicine appointment for tomorrow afternoon," or type similar information as text.

[0523] Step 2:

[0524] The terminal converts voice input into text data using a speech recognition engine. If text is entered, it is processed as text data. The converted or entered text is then sent to the server for further analysis.

[0525] Step 3:

[0526] The server feeds the received text data to a natural language processing engine, which analyzes the user's intent from the text. In this process, it extracts important keywords such as the desired medical department and preferred date and time.

[0527] Step 4:

[0528] Based on the extracted information, the server sends a query to the healthcare facility database to retrieve available dates, times, and a list of facilities. The obtained data is then converted into a user-friendly format and sent to the terminal.

[0529] Step 5:

[0530] The terminal presents the user with available reservation options received from the server. For example, it might display a message on the screen or announce a voice message saying, "You have appointments available in the internal medicine department tomorrow at 3pm and 4pm. Which would you prefer?"

[0531] Step 6:

[0532] The user selects their desired reservation from the presented options and confirms it. The selected information is sent to the server via the terminal.

[0533] Step 7:

[0534] The server sends the reservation information selected by the user as a confirmation request to the medical institution's reservation system, formally confirming the reservation. It generates a confirmation message and sends it back to the terminal to notify the user.

[0535] Step 8:

[0536] The terminal receives a confirmation message from the server and notifies the user, either visually or audibly, that the reservation is complete. It also generates a reminder and notifies the user as the reservation date and time approaches.
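As a non-limiting illustration, the reminder generation in Step 8 can be sketched as follows. The lead times of one day and two hours are assumptions; this disclosure only states that the user is notified as the reservation date and time approaches.

```python
from datetime import datetime, timedelta

# Assumed lead times before the appointment at which reminders fire.
REMINDER_OFFSETS = [timedelta(days=1), timedelta(hours=2)]


def reminder_times(appointment: datetime, now: datetime) -> list[datetime]:
    """Return reminder timestamps that are still in the future, earliest first."""
    candidates = [appointment - offset for offset in REMINDER_OFFSETS]
    return sorted(t for t in candidates if t > now)
```

Reminders whose time has already passed, for example when a reservation is made on the same day, are dropped instead of being sent late.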

[0537] (Example 1)

[0538] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0539] In modern society, the numerous steps and technical complexities involved in making appointments at medical institutions pose significant obstacles for elderly individuals and those unfamiliar with digital devices. There is a need to provide such users with an intuitive and easy-to-use appointment system. Furthermore, the realization of technology capable of accurate speech recognition and intent analysis, while considering linguistic diversity and dialects, is essential. Additionally, it is necessary to provide a means for users to quickly receive assistance if they encounter difficulties during operation.

[0540] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0541] This invention includes a server that receives information input from the user, a technology that converts the received voice data into text information, and a technology that analyzes the converted text information using natural language processing to extract the user's purpose. This allows users to easily make reservations at medical institutions and achieves highly accurate reservation processing through voice recognition adapted to various languages and dialects. Furthermore, if users encounter difficulties during operation, they can immediately receive support from the operator, ensuring a safe and secure user experience.

[0542] A "user" refers to a person who uses the system to make an appointment at a medical institution.

[0543] "Information input" refers to data that users provide to the system through voice or text.

[0544] "Device" refers to a hardware or software component that receives data from a user.

[0545] "Audio data" refers to a signal in which the words spoken by a user are saved in a digital format.

[0546] "Textual information" refers to digital information obtained by converting audio data into text format.

[0547] "Technology" refers to the methods and means that a system uses to achieve a specific function.

[0548] "Natural language processing" refers to the process of analyzing data using language processing techniques to understand the user's intent from textual information.

[0549] "Purpose" refers to the ultimate intention or goal of a user when operating a system.

[0550] A "medical facility" refers to an organization or place that provides medical treatment or medical care.

[0551] An "information collection" refers to the data stored in databases and servers used by a system.

[0552] "Time slot" refers to a specific time frame during which reservations are possible.

[0553] An "organization" refers to a facility or organization that provides a specific service.

[0554] "Providing" refers to the act of a system presenting information or options to a user.

[0555] "Prompting" refers to the guidance a system provides to make it easier for users to take the next action.

[0556] "Final decision" refers to the official confirmation of a reservation based on the user's selection.

[0557] "Notification" refers to the act of informing users about the status of their reservation or any necessary confirmations.

[0558] "Establishing communication" means creating a situation where a user can contact an operator when they need support.

[0559] "Diversity" describes a wide range of things that have different types or forms.

[0560] "Language" refers to a systematic collection of sounds and characters that humans use to communicate.

[0561] A "dialect" refers to a variant of a language that exhibits different characteristics depending on the region and culture, even within the same language.

[0562] This invention provides a system that allows users to easily and intuitively make appointments at medical institutions. The system functions primarily through the interaction of a server, terminals, and users.

[0563] First, when a user wishes to make an appointment at a medical institution, they use an application on a device such as a smartphone or tablet. This application can accept input via voice or text. For example, a user might say, "I would like to make an appointment with the dermatologist next Friday." This input is done using the microphone built into the smartphone.

[0564] Next, the device converts the audio data into text. This conversion uses speech recognition software such as the Google Speech-to-Text API. The converted text data is then sent from the device to the server via the internet.
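As a non-limiting illustration, the transmission of the converted text to the server can be sketched as a JSON body in an HTTP POST request. The URL, the field names, and the language tag are placeholders assumed for this sketch, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Placeholder endpoint; the actual server address is not given in this disclosure.
SERVER_URL = "https://reservation.example.com/api/requests"


def build_request(text: str, language: str = "ja-JP") -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the recognized text."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "I would like to make an appointment with the dermatologist next Friday"
)
# urllib.request.urlopen(req) would transmit it; omitted because the URL is a placeholder.
```

Sending over HTTPS is one way to satisfy the requirement that information be exchanged between the terminal and the server in a secure manner.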

[0565] The server analyzes the received text data using natural language processing techniques. Libraries such as spaCy and Transformers are used for this analysis. As a result of the analysis, the user's intent is clarified, and the information necessary for making a reservation is extracted.

[0566] Next, the server queries a database containing information about medical facilities to determine available time slots and institutions. For example, AWS RDS can be used to efficiently query the database.

[0567] The terminal provides the user with reservation options sent from the server. These options are displayed clearly in either a calendar or list format. The user then selects the most suitable reservation option from the options.
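As a non-limiting illustration, the list-format presentation of reservation options can be sketched as follows. The dictionary keys ("date", "time", "facility") and the function name are assumptions; the same grouped data could equally feed a calendar view or a speech synthesis engine.

```python
from collections import defaultdict


def format_options(options: list[dict]) -> str:
    """Render reservation candidates as a numbered list grouped by date."""
    by_date = defaultdict(list)
    for opt in sorted(options, key=lambda o: (o["date"], o["time"])):
        by_date[opt["date"]].append(opt)

    lines, number = [], 1
    for day, day_options in by_date.items():
        lines.append(day)
        for opt in day_options:
            lines.append(f"  {number}. {opt['time']} {opt['facility']}")
            number += 1
    return "\n".join(lines)
```

Numbering the options continuously across dates lets the user select by speaking or tapping a single number.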

[0568] Finally, the device sends the selected information back to the server to finalize the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user. Reminders can also be sent as needed, notifying the user as the reservation date approaches.

[0569] As a concrete example, here is an example of a prompt statement to be input to a generative AI model:

[0570] Please input "I would like to make an appointment with a dermatologist next Friday" using voice input, send this information to the AI model for analysis, and suggest available medical institutions.

[0571] This system allows users to easily make appointments at medical facilities and features voice recognition adapted to various languages and dialects. Furthermore, it enables rapid communication with operators in emergencies, allowing users to complete the booking process with peace of mind.

[0572] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0573] Step 1:

[0574] Users request appointments using voice or text. Specifically, they launch an application on their smartphone and input a request, such as "I would like to book an appointment with the dermatologist next Friday," using voice or text. This input constitutes the basic information input into the system.

[0575] Step 2:

[0576] The device receives user voice input and converts it into text. It uses the Google Speech-to-Text API to analyze the voice data and generate text. While voice input continues, it produces the best available transcription and prepares the resulting text data. This converted information is then passed to the next process as intermediate data.

[0577] Step 3:

[0578] The terminal sends text information to the server. Specifically, it sends an HTTP request over the internet to prepare for further processing on the server. The input here is the text information generated in step 2, and the output is the completion of the transmission to the server.

[0579] Step 4:

[0580] The server analyzes the received text information using natural language processing technology. Libraries such as spaCy and Transformers are used for analysis, extracting keywords such as "medical department" and "date and time." It receives text information as input and outputs the analysis data, identifying the information necessary for making a reservation.

[0581] Step 5:

[0582] The server executes queries against a database of medical facility information. It searches SQL and NoSQL databases for medical departments and dates that match the user's needs, and collects available reservation options. The input is analytical data, and the output is a list of options presented to the user.

[0583] Step 6:

[0584] The terminal receives reservation candidates sent from the server and presents them to the user. Specifically, it displays them in a calendar or list format to make them easily understandable visually to the user. At this point, the input is a list of reservation candidates, and the output is the provision of visual information to the user.

[0585] Step 7:

[0586] The user selects their preferred booking option from the presented choices. The user's selected booking information is then sent as input to the next step.

[0587] Step 8:

[0588] The terminal sends the user's selections back to the server to finalize the reservation. The input here is the user's selection information, and the output is a reservation confirmation command.

[0589] Step 9:

[0590] The server confirms the reservation and generates a confirmation message. It sends the confirmation information back to the terminal, notifying the user. The output provides the user with information on the reservation status and how to manage their future schedule.

[0591] Through the steps outlined above, users can intuitively and smoothly complete their medical appointment bookings.

[0592] (Application Example 1)

[0593] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0594] The goal is to eliminate the complexities of the procedures and communication barriers faced by elderly users and those unfamiliar with digital tools when making appointments at medical institutions. In particular, it is necessary to provide a more intuitive and smoother booking process by utilizing voice input.

[0595] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0596] In this invention, the server includes an input device that receives voice data or string data, a conversion device that converts the input voice data into string data, and a natural language processing device that analyzes the converted string data and identifies the intent. This makes it possible for users to easily communicate their reservation requests through voice input and for the system to automatically present appropriate reservation information based on that.

[0597] "Audio data" refers to data that captures the voice spoken by the user as a digital signal.

[0598] "String data" refers to text-formatted data obtained by converting audio data.

[0599] An "input device" is a device used to receive audio data or text data.

[0600] A "conversion device" is a device that has the function of converting audio data into text data.

[0601] A "natural language processing device" is a device that analyzes text data and performs processing to identify the user's intent.

[0602] "Medical institution" is a general term for organizations and facilities that provide medical care and treatment.

[0603] An "information system" is a computer system used to manage and manipulate data and information.

[0604] A "search device" is a device used to access an information system and find available time slots and locations for reservations.

[0605] A "user interface" is an interface used for exchanging information between a system and a user.

[0606] A "notification device" is a device used to transmit information or messages to a user.

[0607] A "voice output device" is a device that presents generated options or information to the user as voice.

[0608] A "support device" is a device that has the function of enabling communication with an operator when a user has difficulty making a reservation selection.

[0609] This invention is a system that enables elderly people and users unfamiliar with using digital devices to smoothly make appointments at medical institutions using voice input. The system converts voice data into string data, analyzes the user's requests using natural language processing, and presents appropriate appointment options.

[0610] The terminal is equipped with an input device to receive audio data. It digitizes the acquired audio and uses speech recognition technology to convert it into string data, which is then sent to the server. In this operation, the "Google Speech Recognition API" can be used as a specific example of a speech recognition engine.

[0611] After receiving the string data, the server uses a natural language processing unit to analyze the data and extract information about the medical institution and date / time the user wishes to book. The search device on the server accesses the medical institution's information system to obtain information about available time slots and medical departments. Specific examples of natural language processing technologies used here include "spaCy" or "NLTK".

[0612] Users can receive search results presented through a user interface, and be prompted to make a selection. Furthermore, it is possible to provide information without relying on visual cues by using an audio output device to present options.

[0613] Once the user completes their selection, the device will notify them of the appointment confirmation via a notification device and set reminders as needed to ensure they don't forget their appointment at the medical institution.

[0614] For example, if a user uses voice input to say, "I want to book an appointment with a dermatologist next Friday," the system will search for suitable dates and times and potential medical facilities, present the user with the best option, and confirm the appointment for the selected time slot.

[0615] An example prompt for the generative AI model is as follows: "Please propose a system that allows elderly people to easily book appointments at medical facilities via their smartphones. Include a function that uses voice input to automatically retrieve and suggest available medical departments to the user."

[0616] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0617] Step 1:

[0618] The device receives audio data. It captures what the user says via the microphone and converts the analog audio into a digital audio signal. The input is the user's voice, and the output is digital audio data. Specifically, the interface between the microphone and the speech recognition engine is used.

[0619] Step 2:

[0620] The device converts digital audio data into text data. It uses a speech recognition engine to analyze the audio signal and convert it into text format. The input is digital audio data, and the output is text data. Specifically, it calls speech recognition services such as the "Google Speech Recognition API."

[0621] Step 3:

[0622] The server analyzes string data to identify the user's intent. It utilizes a natural language processing system to extract necessary keywords and phrases from the text. The input is string data, and the output is reservation intent information (e.g., date and time, medical specialty). Specifically, the natural language processing engine "spaCy" or "NLTK" is used.

[0623] Step 4:

[0624] The server accesses the medical institution's information system to search for available appointment times and departments. It queries the database to retrieve appointment options that match the user's intent. The input is information about the appointment intent, and the output is a list of available appointment options. Specifically, it executes SQL queries to retrieve the necessary data.

[0625] Step 5:

[0626] The terminal presents the user with reservation options. The user interface presents the options visually or audibly and prompts the user to make a selection. The input is a list of available reservation options, and the output is the user's selection. Specific operations include screen display and speech synthesis technology.

[0627] Step 6:

[0628] The user selects a reservation option on the terminal, and the terminal sends this information to the server. This transmits reservation information based on the user's selection to the server. The input is the user's selection, and the output is the selected reservation information. The terminal's selection interface is used for this operation.

[0629] Step 7:

[0630] The server confirms the reservation and notifies the terminal. It registers the selected information in the reservation system and generates a confirmation message. The input is the selected reservation information, and the output is the reservation confirmation message. Specifically, database updates and the notification system are involved.

[0631] Step 8:

[0632] The device confirms the reservation with the user and sets a reminder. It notifies the user that the reservation is complete and sets up a reminder to notify them as the reservation date approaches. The input is a reservation confirmation message, and the output is a reminder message. Specifically, the device's notification function and the calendar API are used.
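As a non-limiting illustration, the reservation confirmation in Step 7 can be sketched as a conditional database update that succeeds only while the selected slot is still free. The table layout, function name, and message wording are hypothetical and stand in for the reservation system of the medical institution.

```python
import sqlite3


def confirm_reservation(conn: sqlite3.Connection, slot_id: int) -> str:
    """Mark a slot as booked only if it is still free, then build a message."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE slots SET booked = 1 WHERE id = ? AND booked = 0",
            (slot_id,),
        )
    if cur.rowcount == 0:
        # Another user took the slot first, or the slot does not exist.
        return "That time slot is no longer available. Please choose another."
    row = conn.execute(
        "SELECT department, slot_date, slot_time FROM slots WHERE id = ?",
        (slot_id,),
    ).fetchone()
    return f"Your {row[0]} appointment on {row[1]} at {row[2]} is confirmed."


# Hypothetical sample data for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slots (id INTEGER PRIMARY KEY, department TEXT, "
    "slot_date TEXT, slot_time TEXT, booked INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO slots VALUES (1, 'dermatology', '2024-12-13', '10:00', 0)"
)
```

Checking the number of updated rows, instead of reading the slot first and writing afterwards, avoids a double booking when two users select the same slot at nearly the same time.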

[0633] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0634] This invention is a system that takes into account the user's emotions when making a reservation at a medical institution, providing a more comfortable and smoother experience. The system comprises a terminal for receiving voice or text input from the user, a server for analyzing the data and processing the reservation, and an emotion engine for recognizing the user's emotions.

[0635] First, the user communicates their appointment request to the system using the device's voice or text input function. For example, they might say or type, "I'd like to make a dental appointment for next Wednesday."

[0636] The terminal converts voice input into text using a speech recognition engine and then sends it to the server. The server analyzes this text data using natural language processing technology to identify the user's desired medical department and date / time. Simultaneously, an emotion engine analyzes the user's emotions from the tone of voice and selected words in the input data. For example, it might determine that the user is experiencing stress.

[0637] Based on the information gathered and the results of sentiment analysis, the server queries a database of healthcare facilities to generate the most suitable booking options. For example, if the system analyzes that the user is stressed, it will prioritize showing time slots where appointments can be booked quickly.
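
One possible way to express this prioritization is sketched below. The emotion labels, the function name `rank_options`, and the fallback ordering used for non-stressed users (closest to the preferred time first) are assumptions for illustration and are not specified in the description.

```python
from datetime import datetime

# Hypothetical labels that an emotion engine might return.
URGENT_EMOTIONS = {"stress", "anxiety"}


def rank_options(slots, preferred, emotion):
    """Order candidate slots (datetime objects) for presentation."""
    if emotion in URGENT_EMOTIONS:
        # Stressed or anxious users see the earliest available slots first.
        return sorted(slots)
    # Assumed default: slots closest to the preferred time come first.
    return sorted(slots, key=lambda slot: abs(slot - preferred))


preferred = datetime(2025, 1, 15, 15, 0)
slots = [
    datetime(2025, 1, 15, 16, 0),
    datetime(2025, 1, 14, 9, 0),
    datetime(2025, 1, 15, 10, 0),
]
urgent_order = rank_options(slots, preferred, "stress")
default_order = rank_options(slots, preferred, "neutral")
```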

[0638] Next, the generated reservation options are sent to the terminal and presented to the user. If the user makes a selection, that selection is sent to the server and the formal reservation process is completed. Simultaneously with the reservation confirmation, the server generates a confirmation notification and sends it to the terminal to inform the user. Furthermore, it also has a function that suggests connecting the user to an operator depending on the user's emotional state. For example, if frustration or anxiety is detected, it automatically guides the user to human support.

[0639] Finally, as the reservation date approaches, the system generates a reminder and notifies the user via their device, which helps prevent the user from forgetting the appointment. Combining these functions with the emotion engine yields a more user-friendly reservation system.

[0640] The following describes the processing flow.

[0641] Step 1:

[0642] Users enter their appointment requests using the device's voice or text input function. For example, they might say, "I'd like to make an appointment with the orthopedics department next Monday," or type the same thing as text.

[0643] Step 2:

[0644] The terminal uses a speech recognition engine to convert the input speech data into text data. The converted text data or text input data is then sent directly to the server.

[0645] Step 3:

[0646] The server analyzes the received text data using a natural language processing engine and extracts important keywords such as the medical department and date / time from the user's reservation request.

[0647] Step 4:

[0648] The server uses an emotion engine to analyze the user's emotions from the content of text data or, if it's audio, from the tone of voice. For example, it can determine if the user's voice contains feelings of anxiety or unease.

[0649] Step 5:

[0650] Based on the extracted information and sentiment analysis results, the server consults a database of healthcare facilities to generate the most suitable booking options for available dates, times, and facilities. For example, if the user is feeling anxious, it prioritizes selecting earlier time slots.

[0651] Step 6:

[0652] The terminal presents the user with reservation options sent from the server. The information is displayed on the screen or spoken aloud, for example, "You have an appointment available at the orthopedics department on Monday at 10:00 AM. Do you want to confirm the reservation?"

[0653] Step 7:

[0654] The user selects their preferred booking option from the presented choices and confirms it. The selected information is sent to the server via the terminal.

[0655] Step 8:

[0656] Based on the user's selection information, the server sends a reservation confirmation request to the medical institution's reservation system, formally confirming the reservation. After confirmation, a detailed confirmation message is generated and sent to the user's terminal to inform them.

[0657] Step 9:

[0658] The device receives a confirmation message from the server and informs the user that the reservation is complete. It also follows up with the user by sending reminder notifications as the reservation date approaches, if necessary.
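
The reminder behavior in Step 9 could be sketched as follows. The reminder offsets (one day and two hours before the appointment) and the function name `reminder_times` are assumptions; the description says only that reminders are sent as the reservation date approaches.

```python
from datetime import datetime, timedelta

# Assumed lead times; the description does not specify them.
DEFAULT_OFFSETS = (timedelta(days=1), timedelta(hours=2))


def reminder_times(appointment, now, offsets=DEFAULT_OFFSETS):
    """Return the reminder times that are still in the future, soonest first."""
    candidates = [appointment - offset for offset in offsets]
    return sorted(t for t in candidates if t > now)


appointment = datetime(2025, 1, 20, 10, 0)
early = reminder_times(appointment, now=datetime(2025, 1, 10, 9, 0))
late = reminder_times(appointment, now=datetime(2025, 1, 19, 12, 0))
```

In the second call the one-day reminder has already passed, so only the two-hour reminder remains.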

[0659] (Example 2)

[0660] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0661] Modern medical appointment systems often fail to consider user emotions, leading to stress and anxiety for users as they proceed with booking appointments. Furthermore, users who experience difficulties in selecting appointments often lack access to adequate support.

[0662] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0663] In this invention, the server includes means for receiving voice or text input from the user, means for analyzing the user's emotional state, and means for generating optimal reservation options based on the user's emotional state. This enables a comfortable and smooth reservation experience that takes the user's emotions into consideration.

[0664] A "user" refers to an individual who uses the system to make an appointment at a medical institution.

[0665] "Voice data" refers to information that records the voice spoken by a user in digital format.

[0666] "Text data" refers to written information obtained by converting voice data or from the user's text input.

[0667] "Natural language processing technology" refers to the technology that enables devices to understand and analyze human language.

[0668] "Emotional state" refers to the feelings, such as stress or anxiety, that a user exhibits while using the system.

[0669] A "medical institution database" refers to a collection of information that includes available appointment times and facility information for medical institutions.

[0670] "Reservation options" refer to the multiple available date, time, and facility choices presented to the user.

[0671] "Operator" refers to a person or role that provides support to users through a system.

[0672] "Speech recognition technology" refers to the technology that converts speech data into text data.

[0673] This invention is a system designed to provide a comfortable and smooth experience for users when making appointments at medical institutions, taking into account their emotional state. The following describes embodiments for carrying out the invention.

[0674] Users enter their reservation preferences using voice or text via a terminal. When using voice input, the terminal employs speech recognition technology to convert the voice data into text. For example, typical speech recognition software processes this input.

[0675] The device converts the received audio into text and then sends it to the server as digital data. Natural language processing (NLP) technology is then used to analyze the text and extract the user's intent. For example, the user's utterance might be transcribed as "I would like to make an internal medicine appointment for next Tuesday." At this stage, both the natural language processing engine and the sentiment analysis engine operate: for example, an NLP library extracts the intent, and the sentiment analysis engine estimates the user's emotional state.

[0676] The server identifies the user's desired medical department and appointment time from the text analyzed by natural language processing. Simultaneously, it analyzes the user's emotional state through sentiment analysis. The sentiment analysis results indicate whether the user is experiencing stress or anxiety and are used to generate the most suitable appointment options.

[0677] The server accesses a database of medical institutions based on the analysis results and emotional state information, searching for available time slots and facility information. From this data, it presents the most suitable booking options according to the user's emotional state. For example, if it determines that the user is feeling anxious, it generates options that prioritize immediately available appointments.

[0678] For example, if a user enters text saying, "I want to make a dermatology appointment next Friday, but I've been busy lately and I'm worried about my health," the system uses an emotion engine to determine the user's concerns and prioritizes presenting time slots with high appointment availability.
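
To make the idea concrete, the following is a deliberately simple stand-in for an emotion engine that counts cue words in the input text. The cue-word table and the function name `estimate_emotions` are hypothetical; an actual emotion engine would more likely be a trained model, and nothing here is taken from the disclosed implementation.

```python
import re

# Hypothetical cue words per emotion, for illustration only.
EMOTION_CUES = {
    "anxiety": {"worried", "nervous", "anxious", "afraid"},
    "stress": {"busy", "stressed", "overwhelmed", "tired"},
}


def estimate_emotions(text):
    """Count cue-word hits per emotion and return only the nonzero counts."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for emotion, cues in EMOTION_CUES.items():
        hits = sum(1 for word in words if word in cues)
        if hits:
            scores[emotion] = hits
    return scores


result = estimate_emotions(
    "I want to make a dermatology appointment next Friday, but I've been "
    "busy lately and I'm worried about my health"
)
```

For the sample sentence, both "busy" and "worried" are matched, so the sketch reports one hit each for stress and anxiety.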

[0679] An example of a prompt for a generative AI model is, "What suggestions can be made if a user wants to make an appointment at a medical institution but may be experiencing stress?"

[0680] This enables reservation suggestions that take user emotions into consideration, resulting in a more comfortable service for users.
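
A prompt of this kind could be assembled from the extracted intent and the estimated emotion, as in the sketch below. The template wording and the function name `build_prompt` are assumptions, and the call to the generative AI model itself is intentionally omitted.

```python
def build_prompt(department, date_text, emotion=None):
    """Compose an instruction prompt from the extracted intent and emotion."""
    lines = [
        "A user wants to make an appointment at a medical institution.",
        f"Department: {department}",
        f"Requested date: {date_text}",
    ]
    if emotion:
        # Only mention the emotion when the emotion engine reported one.
        lines.append(f"The user may be experiencing {emotion}.")
    lines.append("What reservation suggestions can be made?")
    return "\n".join(lines)


prompt = build_prompt("dermatology", "next Friday", emotion="stress")
```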

[0681] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0682] Step 1:

[0683] The user enters their request for a medical appointment via voice or text. This user input becomes the initial data. For example, they might enter a request such as, "I would like to make an appointment with an internal medicine specialist next Tuesday."

[0684] Step 2:

[0685] The device uses speech recognition technology to convert speech data into text data. This process converts the input from speech to text format. The converted text is then sent to the server as digital data.

[0686] Step 3:

[0687] The server analyzes the received text data using natural language processing technology to extract the user's intent. In this step, a generative AI model is used to analyze the text and identify information such as the medical department and appointment date / time the user desires. For example, keywords such as "internal medicine" and "next Tuesday" may be extracted.

[0688] Step 4:

[0689] The server uses an emotion analysis engine to analyze the user's emotional state. The input data used is the text obtained in the previous step, from which emotional features are extracted. For example, it can determine whether the user is experiencing stress.

[0690] Step 5:

[0691] The server accesses a database of medical institutions based on the analysis results and emotional information, searching for available time slots and facility information. This process generates booking options that are suitable for the user's intentions and emotions. As a result, the booking options deemed most suitable for the user are output.

[0692] Step 6:

[0693] The terminal presents the user with reservation options sent from the server. The output data consists of reservation date and time and facility information, which is presented to the user visually or audibly.

[0694] Step 7:

[0695] The user selects their desired reservation from the presented options. This selection registers the user's decision as input data.

[0696] Step 8:

[0697] The server confirms the reservation based on the user's selection and notifies them of this information. The confirmation notification is sent to the terminal, informing the user that the reservation is complete. This officially establishes the reservation with the medical institution.

[0698] (Application Example 2)

[0699] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0700] Traditional reservation systems perform reservation procedures mechanically without considering the user's subjective emotional state, which leads to problems such as the reservation process not proceeding smoothly when the user is feeling anxious or stressed. Furthermore, there is a lack of options for receiving appropriate support, resulting in a lower quality of user experience.

[0701] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0702] In this invention, the server includes means for receiving voice or text input from the user and converting the voice data into text data, means for analyzing the converted text data using natural language processing technology to extract the user's intent, and means for analyzing the user's emotions from the user's input data. This makes it possible to present the optimal reservation option according to the user's emotional state and to quickly suggest human support as needed.

[0703] A "user" refers to an individual or their representative who makes a reservation using the system and provides input information via voice or text.

[0704] "Voice data" refers to information provided by the user through voice, and is the raw data that the system preprocesses before natural language analysis.

[0705] "Text data" refers to audio data represented as characters, and it is the basic information that the system analyzes using natural language processing technology.

[0706] "Natural language processing technology" refers to a group of technologies that analyze text data to understand its intent and content, enabling machines to produce meaningful responses and actions.

[0707] A "means of analyzing emotions" refers to a module within a system that extracts emotional nuances from user-inputted voice or text and evaluates the user's emotional state based on that.

[0708] "Optimal booking options" are suggestions generated to present available dates, times, and facility choices based on the user's intentions and emotional state.

[0709] "Support" refers to the connection with an operator and additional guidance provided based on emotional analysis when a user has difficulty making a reservation.

[0710] This invention provides a method for realizing a system that provides an optimal booking experience based on the user's emotional state.

[0711] The server plays a central role in this system. The server uses a speech recognition engine (e.g., Google Speech-to-Text API) to receive voice input from the user and convert it into text data. This text data is then analyzed using natural language processing techniques (e.g., Python's NLTK or spaCy) to extract the user's intent. Furthermore, sentiment analysis tools such as Amazon Comprehend and IBM Watson Tone Analyzer are used to analyze the user's emotions and evaluate their emotional state.
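
The division of roles described here can be sketched as a pipeline whose three engines are injected as plain callables, so that any speech recognition, natural language processing, or sentiment analysis service could sit behind each one. The class names and the stub engines below are hypothetical placeholders and do not call any of the services named above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AnalysisResult:
    text: str
    intent: dict
    emotion: str


@dataclass
class ServerPipeline:
    # Each engine is injected, so the vendor behind it can be swapped freely.
    transcribe: Callable[[bytes], str]
    extract_intent: Callable[[str], dict]
    analyze_emotion: Callable[[str], str]

    def handle(self, audio: Optional[bytes] = None,
               text: Optional[str] = None) -> AnalysisResult:
        if text is None:
            if audio is None:
                raise ValueError("either audio or text is required")
            text = self.transcribe(audio)  # voice input is converted first
        return AnalysisResult(
            text, self.extract_intent(text), self.analyze_emotion(text)
        )


# Stub engines standing in for real services (placeholders only).
pipeline = ServerPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    extract_intent=lambda t: {"service": "day care"} if "day care" in t else {},
    analyze_emotion=lambda t: "anxiety" if "nervous" in t else "neutral",
)

result = pipeline.handle(
    audio=b"I want to book a day care service, but I'm a little nervous"
)
```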

[0712] A terminal is a device that allows users to input information through an interface and receive results. The terminal sends voice or text input to the server and presents the user with reservation options and confirmation notifications sent from the server.

[0713] When a user enters information about the service they intend to book, the data is analyzed, and the most suitable booking options are generated based on the user's emotional state. For example, if a user enters, "I want to book a day care service for my mother this weekend, but I'm a little nervous," the system will present available booking options at facilities that will make the user feel at ease. Furthermore, if the emotional analysis indicates that the user is having difficulty making a booking choice, a connection with an operator will be immediately suggested.

[0714] For example, users can maximize the system's functionality by using prompts such as, "I'd like to book a day care service for the weekend, but I'm a little worried. Could you tell me what times are available?" This allows users to easily select the best booking option and receive the necessary support.

[0715] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0716] Step 1:

[0717] The terminal accepts voice or text input from the user. The user speaks or texts about the desired reservation details and date / time. This input forms the basis of the reservation process.

[0718] Step 2:

[0719] The server uses a speech recognition engine to convert voice data sent from the terminal into text data. When voice input is received, the speech recognition engine processes the data and converts it into text information, preparing it for natural language processing.

[0720] Step 3:

[0721] The server analyzes the text data converted using natural language processing technology. During the analysis, it extracts the user's desired service details and date / time. This clarifies the specific intent necessary for making a reservation.

[0722] Step 4:

[0723] The server uses an emotion analysis engine to evaluate the user's emotional state from text data. Based on the input words and their tone, it performs data processing to identify emotions and numerically interprets feelings such as stress and anxiety.

[0724] Step 5:

[0725] The server searches a database of facilities for the best booking option based on the results of natural language processing and sentiment analysis. It queries facility availability and service details, and generates options that are suitable for the user's preferences and emotional state.

[0726] Step 6:

[0727] The terminal presents the user with reservation options sent from the server. The user can review the presented options and choose the one that best suits their needs. The user's selection at this step becomes the final reservation.

[0728] Step 7:

[0729] The server confirms the reservation selected by the user and sends a reservation confirmation notification to the device. This officially records the final reservation, and the user retains the confirmation information.

[0730] Step 8:

[0731] The server suggests connecting the user with an operator as needed. If the sentiment analysis determines that the user is having difficulty making a reservation, the server immediately begins guiding the user to an operator for assistance.
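
Because Step 4 interprets feelings such as stress and anxiety numerically, the decision in Step 8 could be reduced to a simple threshold rule, as in the sketch below. The 0.0 to 1.0 score range, the threshold value, and the use of a failed-attempt counter as a second signal of difficulty are all assumptions made for illustration.

```python
ESCALATION_THRESHOLD = 0.7  # hypothetical cutoff on a 0.0-1.0 distress score


def should_offer_operator(scores, failed_attempts=0, max_attempts=2):
    """Decide whether to suggest connecting the user with an operator."""
    # Use the stronger of the two distress-related scores.
    distress = max(scores.get("stress", 0.0), scores.get("anxiety", 0.0))
    return distress >= ESCALATION_THRESHOLD or failed_attempts >= max_attempts


offer_for_anxious_user = should_offer_operator({"anxiety": 0.8})
offer_for_calm_user = should_offer_operator({"stress": 0.3})
```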

[0732] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0733] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0734] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0735] [Fourth Embodiment]

[0736] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0737] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0738] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0739] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0740] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0741] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0742] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0743] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0744] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0745] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0746] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0747] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0748] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0749] This invention provides a system that allows elderly people and those who are not comfortable using digital devices to easily make appointments at medical institutions. The system primarily operates based on the user's voice or text input, with the server and terminal working together to process the information.

[0750] First, the device uses a microphone and recording function to receive voice input from the user. If the user says, "I would like to make an appointment with the dermatologist next Friday," that voice is captured as a digital signal. Text input is also possible, and the user can enter "I would like to make an appointment with the dermatologist next Friday" using the keyboard or by touching the screen.

[0751] In the case of voice input, the terminal uses a speech recognition engine to convert the voice signal into text data. The converted text data is then sent to the server to process the reservation.

[0752] The server analyzes the received text data using natural language processing technology to clarify the user's intent. Specifically, it extracts keywords such as "medical department" and "date and time" from the text to identify the information necessary for making a reservation.
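
A minimal keyword-based sketch of this extraction is shown below, using only the Python standard library rather than a full natural language processing engine. The department keyword table, the function name `extract_intent`, and the reading of "next Friday" as the first Friday strictly after the current date are assumptions for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical keyword-to-department table.
DEPARTMENTS = {
    "dermatologist": "dermatology",
    "dermatology": "dermatology",
    "internal medicine": "internal medicine",
    "orthopedics": "orthopedics",
    "dental": "dentistry",
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def extract_intent(text, today):
    """Return department and date, with None for anything not found."""
    lowered = text.lower()
    department = next(
        (dept for key, dept in DEPARTMENTS.items() if key in lowered), None
    )
    target = None
    match = re.search(r"next (" + "|".join(WEEKDAYS) + r")", lowered)
    if match:
        # Assumed reading: the first such weekday strictly after today.
        wanted = WEEKDAYS.index(match.group(1))
        days_ahead = (wanted - today.weekday() - 1) % 7 + 1
        target = today + timedelta(days=days_ahead)
    elif "tomorrow" in lowered:
        target = today + timedelta(days=1)
    return {"department": department, "date": target}


intent = extract_intent(
    "I would like to make an appointment with the dermatologist next Friday",
    today=date(2025, 1, 13),  # a Monday
)
```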

[0753] The server then queries the medical institution's database to check for available appointment times and corresponding departments. The confirmed appointment options are reconfigured into a selectable format for the user and sent to the terminal.

[0754] Next, the device presents these options to the user and prompts them to select a specific date, time, and medical department. For example, it might ask, "We have an appointment available for dermatology on Friday at 10am. Is this alright?"
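
The wording of such a prompt could be generated from the list of available times, for example as sketched below. The function name `format_options` and the message templates are assumptions modeled on the sample sentence above; the resulting string could be shown on screen or passed to speech synthesis.

```python
def format_options(department, day_label, times):
    """Build the message that presents the available times to the user."""
    if not times:
        return f"No {department} appointments are available {day_label}."
    if len(times) == 1:
        return (
            f"We have an appointment available for {department} "
            f"{day_label} at {times[0]}. Is this alright?"
        )
    # Join several times as "A, B and C".
    listed = ", ".join(times[:-1]) + " and " + times[-1]
    return (
        f"We have appointments available for {department} "
        f"{day_label} at {listed}. Which would you prefer?"
    )


single = format_options("dermatology", "on Friday", ["10am"])
several = format_options("internal medicine", "tomorrow", ["3pm", "4pm"])
```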

[0755] After the user makes a selection, the device sends that information back to the server to confirm the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user that the reservation is complete. A reminder is also sent as the reservation date approaches.

[0756] This system also includes a feature that allows users to connect with an operator with a single button press if they encounter difficulties during the booking process. This enables quick responses even when direct support is needed. Furthermore, the voice recognition function supports multiple languages and is designed to be easy to use for users living in rural areas, as it flexibly adapts to Japanese dialects and other regional variations.

[0757] The following describes the processing flow.

[0758] Step 1:

[0759] Users enter their desired appointment details through their device's voice or text input function. For example, they might say, "I'd like to book an internal medicine appointment for tomorrow afternoon," or type similar information as text.

[0760] Step 2:

[0761] The terminal converts voice input into text data using a speech recognition engine. If text is entered, it is processed as text data. The converted or entered text is then sent to the server for further analysis.

[0762] Step 3:

[0763] The server feeds the received text data to a natural language processing engine, which analyzes the user's intent from the text. In this process, it extracts important keywords such as the desired medical department and preferred date and time.

[0764] Step 4:

[0765] Based on the extracted information, the server sends a query to the healthcare facility database to retrieve available dates, times, and a list of facilities. The obtained data is then converted into a user-friendly format and sent to the terminal.

[0766] Step 5:

[0767] The terminal presents the user with available reservation options received from the server. For example, it might display a message on the screen or play a voice message saying, "You have appointments available in the internal medicine department tomorrow at 3pm and 4pm. Which would you prefer?"

[0768] Step 6:

[0769] The user selects their desired reservation from the presented options and confirms it. The selected information is sent to the server via the terminal.

[0770] Step 7:

[0771] The server sends the reservation information selected by the user as a confirmation request to the medical institution's reservation system, formally confirming the reservation. It generates a confirmation message and sends it back to the terminal to notify the user.

[0772] Step 8:

[0773] The terminal receives a confirmation message from the server and notifies the user, either visually or audibly, that the reservation is complete. It also generates a reminder and notifies the user as the reservation date and time approaches.
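
The confirmation in Step 7 could be sketched as follows, with availability re-checked at confirmation time so that the same slot cannot be booked twice. The class names, the in-memory storage, and the message wording are hypothetical; an actual system would send the confirmation request to the medical institution's reservation system instead.

```python
class SlotUnavailableError(Exception):
    """Raised when the selected slot has already been taken."""


class ReservationBook:
    def __init__(self, open_slots):
        self._open = set(open_slots)  # (department, date, time) tuples
        self._confirmed = {}          # user_id -> list of booked slots

    def confirm(self, user_id, slot):
        # Another user may have booked the slot after the options were
        # presented, so availability is checked again here.
        if slot not in self._open:
            raise SlotUnavailableError(slot)
        self._open.remove(slot)
        self._confirmed.setdefault(user_id, []).append(slot)
        department, day, time = slot
        return f"Your {department} appointment on {day} at {time} is confirmed."


slot = ("internal medicine", "2025-01-14", "15:00")
book = ReservationBook([slot])
message = book.confirm("user-1", slot)
```

A second attempt to confirm the same slot raises `SlotUnavailableError`, which the server could handle by presenting fresh options.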

[0774] (Example 1)

[0775] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0776] In modern society, the numerous steps and technical complexities involved in making appointments at medical institutions pose significant obstacles for elderly individuals and those unfamiliar with digital devices. There is a need to provide such users with an intuitive and easy-to-use appointment system. Furthermore, the realization of technology capable of accurate speech recognition and intent analysis, while considering linguistic diversity and dialects, is essential. Additionally, it is necessary to provide a means for users to quickly receive assistance if they encounter difficulties during operation.

[0777] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0778] This invention includes a server that receives information input from the user, a technology that converts the received voice data into text information, and a technology that analyzes the converted text information using natural language processing to extract the user's purpose. This allows users to easily make reservations at medical institutions and achieves highly accurate reservation processing through voice recognition adapted to various languages and dialects. Furthermore, if users encounter difficulties during operation, they can immediately receive support from an operator, ensuring a safe and secure user experience.

[0779] A "user" refers to a person who uses the system to make an appointment at a medical institution.

[0780] "Information input" refers to data that users provide to the system through voice or text.

[0781] "Device" refers to a hardware or software component that receives data from a user.

[0782] "Audio data" refers to a signal in which the words spoken by a user are saved in a digital format.

[0783] "Textual information" refers to digital information obtained by converting audio data into text format.

[0784] "Technology" refers to the methods and means that a system uses to achieve a specific function.

[0785] "Natural language processing" refers to the process of analyzing data using language processing techniques to understand the user's intent from textual information.

[0786] "Purpose" refers to the ultimate intention or goal of a user when operating a system.

[0787] A "medical facility" refers to an organization or place that provides medical treatment or medical care.

[0788] An "information collection" refers to the data stored in databases and servers used by a system.

[0789] "Time slot" refers to a specific time frame during which reservations are possible.

[0790] An "organization" refers to a facility or organization that provides a specific service.

[0791] "Providing" refers to the act of a system presenting information or options to a user.

[0792] "Promotion" refers to the guidance a system provides to make it easier for users to take the next action.

[0793] "Final decision" refers to the official confirmation of a reservation based on the user's selection.

[0794] "Notification" refers to the act of informing users about the status of their reservation or any necessary confirmations.

[0795] "Establishing communication" means creating a situation where a user can contact an operator when they need support.

[0796] "Diversity" describes a wide range of things that have different types or forms.

[0797] "Language" refers to a systematic collection of sounds and characters that humans use to communicate.

[0798] A "dialect" refers to a variant of a language that exhibits different characteristics depending on the region and culture, even within the same language.

[0799] This invention provides a system that allows users to easily and intuitively make appointments at medical institutions. The system functions primarily through the interaction of a server, terminals, and users.

[0800] First, when a user wishes to make an appointment at a medical institution, they use an application on a device such as a smartphone or tablet. This application accepts input via voice or text. For example, a user might say, "I would like to make an appointment with the dermatologist next Friday." Voice input is captured by the microphone built into the smartphone.

[0801] Next, the device converts the audio data into text. This conversion uses speech recognition software such as the Google Speech-to-Text API. The converted text data is then sent from the device to the server via the internet.

[0802] The server analyzes the received text data using natural language processing techniques. Libraries such as spaCy and Transformers are used for this analysis. As a result of the analysis, the user's intent is clarified, and the information necessary for making a reservation is extracted.
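As a non-limiting illustration, the extraction of the medical department and the desired date from the text can be sketched as follows. This is a minimal rule-based stand-in for the spaCy or Transformers analysis named above, not the analysis itself. The names `DEPARTMENTS`, `resolve_next_weekday`, and `extract_intent` are hypothetical, and the sketch assumes that "next Friday" means the first Friday strictly after the current date.

```python
import re
from datetime import date, timedelta

# Hypothetical keyword table; a deployed system would rely on a trained NLP model.
DEPARTMENTS = {
    "dermatolog": "dermatology",
    "dental": "dentistry",
    "orthoped": "orthopedics",
    "internal medicine": "internal medicine",
}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_next_weekday(weekday: str, today: date) -> date:
    """Return the first date strictly after `today` that falls on `weekday`."""
    target = WEEKDAYS.index(weekday)
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always in 1..7
    return today + timedelta(days=days_ahead)


def extract_intent(text: str, today: date) -> dict:
    """Extract the department and the desired date from a reservation request."""
    lowered = text.lower()
    department = next((name for key, name in DEPARTMENTS.items() if key in lowered), None)
    match = re.search(r"next (" + "|".join(WEEKDAYS) + r")", lowered)
    desired = resolve_next_weekday(match.group(1), today) if match else None
    return {"intent": "book_appointment", "department": department, "date": desired}
```

Fields that cannot be extracted are returned as `None`, so the server can ask the user a follow-up question instead of guessing.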

[0803] Next, the server queries a database containing information about medical facilities to determine available time slots and institutions. For example, AWS RDS can be used to efficiently query the database.
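The query for available time slots might look like the following sketch. It uses an in-memory SQLite database from the Python standard library purely as a stand-in for a managed database such as AWS RDS. The table name `slots`, its columns, and the function `find_available_slots` are assumptions made for illustration and are not taken from the disclosure.

```python
import sqlite3


def find_available_slots(conn, department, day):
    """Return (institution, start_time) pairs that are still open, earliest first."""
    rows = conn.execute(
        "SELECT institution, start_time FROM slots "
        "WHERE department = ? AND day = ? AND booked = 0 "
        "ORDER BY start_time",
        (department, day),  # parameters are bound, never interpolated into the SQL
    )
    return rows.fetchall()


# Hypothetical sample data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slots ("
    "institution TEXT, department TEXT, day TEXT, start_time TEXT, booked INTEGER)"
)
conn.executemany(
    "INSERT INTO slots VALUES (?, ?, ?, ?, ?)",
    [
        ("Clinic A", "dermatology", "2024-12-13", "10:00", 0),
        ("Clinic A", "dermatology", "2024-12-13", "09:00", 1),
        ("Clinic B", "dermatology", "2024-12-13", "09:30", 0),
        ("Clinic B", "dentistry", "2024-12-13", "11:00", 0),
    ],
)
available = find_available_slots(conn, "dermatology", "2024-12-13")
```

Binding the department and date as parameters keeps text derived from user speech out of the SQL statement itself.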

[0804] The terminal provides the user with reservation options sent from the server. These options are displayed clearly in either a calendar or list format. The user then selects the most suitable reservation option from the options.

[0805] Finally, the device sends the selected information back to the server to finalize the reservation. Once the reservation is confirmed, the server generates a confirmation message and sends it to the device to notify the user. Reminders can also be sent as needed, notifying the user as the reservation date approaches.
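One possible way to derive reminder times from a confirmed reservation is sketched below. The offsets of one day and two hours before the appointment, and the function name `reminder_times`, are illustrative assumptions. The disclosure states only that reminders are sent as the reservation date approaches.

```python
from datetime import datetime, timedelta


def reminder_times(appointment, now, offsets=(timedelta(days=1), timedelta(hours=2))):
    """Return the reminder datetimes that are still in the future, soonest first."""
    candidates = (appointment - offset for offset in offsets)
    return sorted(t for t in candidates if t > now)
```

Reminder times that have already passed at the moment of confirmation are dropped, so a reservation made on short notice does not trigger a stale notification.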

[0806] The following is a concrete example of a prompt to be input to a generative AI model:

[0807] Please input "I would like to make an appointment with a dermatologist next Friday" using voice input, send this information to the AI model for analysis, and suggest available medical institutions.

[0808] This system allows users to easily make appointments at medical facilities and features voice recognition adapted to various languages and dialects. Furthermore, it enables rapid communication with operators in emergencies, allowing users to complete the booking process with peace of mind.

[0809] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0810] Step 1:

[0811] Users request appointments using voice or text. Specifically, they launch an application on their smartphone and input a request, such as "I would like to book an appointment with the dermatologist next Friday," using voice or text. This input constitutes the basic information input into the system.

[0812] Step 2:

[0813] The device receives user voice input and converts it into text. It uses the Google Speech-to-Text API to analyze the voice data and generate text. Conversion continues for as long as speech is detected, producing the best available transcription. The resulting text data is then passed to the next process as intermediate data.

[0814] Step 3:

[0815] The terminal sends text information to the server. Specifically, it sends an HTTP request over the internet to prepare for further processing on the server. The input here is the text information generated in step 2, and the output is the completion of the transmission to the server.

[0816] Step 4:

[0817] The server analyzes the received text information using natural language processing technology. Libraries such as spaCy and Transformers are used for analysis, extracting keywords such as "medical department" and "date and time." It receives text information as input and outputs the analysis data, identifying the information necessary for making a reservation.

[0818] Step 5:

[0819] The server executes queries against a database of medical facility information. It searches SQL and NoSQL databases for medical departments and dates that match the user's needs, and collects available reservation options. The input is analytical data, and the output is a list of options presented to the user.

[0820] Step 6:

[0821] The terminal receives reservation candidates sent from the server and presents them to the user. Specifically, it displays them in a calendar or list format to make them easily understandable visually to the user. At this point, the input is a list of reservation candidates, and the output is the provision of visual information to the user.

[0822] Step 7:

[0823] The user selects their preferred booking option from the presented choices. The user's selected booking information is then sent as input to the next step.

[0824] Step 8:

[0825] The terminal sends the user's selections back to the server to finalize the reservation. The input here is the user's selection information, and the output is a reservation confirmation command.

[0826] Step 9:

[0827] The server confirms the reservation and generates a confirmation message. It sends the confirmation information back to the terminal, notifying the user. The output provides the user with information on the reservation status and how to manage their future schedule.

[0828] Through the steps outlined above, users can intuitively and smoothly complete their medical appointment bookings.
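The exchange between the terminal and the server in Step 3 can be illustrated by the following sketch of a request body. The JSON field names (`user_id`, `text`, `input_mode`) and the helper functions are hypothetical. The disclosure specifies only that the text information is sent in an HTTP request.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReservationRequest:
    user_id: str
    text: str        # text typed by the user or produced by speech recognition
    input_mode: str  # "voice" or "text"


def encode_request(request: ReservationRequest) -> bytes:
    """Serialize the request as a UTF-8 JSON body (terminal side)."""
    return json.dumps(asdict(request), ensure_ascii=False).encode("utf-8")


def decode_request(body: bytes) -> ReservationRequest:
    """Parse and validate the JSON body (server side)."""
    data = json.loads(body.decode("utf-8"))
    if data.get("input_mode") not in ("voice", "text"):
        raise ValueError("input_mode must be 'voice' or 'text'")
    return ReservationRequest(**data)
```

Keeping `ensure_ascii=False` leaves non-ASCII characters readable in the body, which may matter when requests are made in languages or dialects other than English.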

[0829] (Application Example 1)

[0830] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0831] The goal is to eliminate the complexities of the procedures and communication barriers faced by elderly users and those unfamiliar with digital tools when making appointments at medical institutions. In particular, it is necessary to provide a more intuitive and smoother booking process by utilizing voice input.

[0832] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0833] In this invention, the server includes an input device that receives voice data or string data, a conversion device that converts the input voice data into string data, and a natural language processing device that analyzes the converted string data and identifies the intent. This makes it possible for users to easily communicate their reservation requests through voice input and for the system to automatically present appropriate reservation information based on that.

[0834] "Audio data" refers to data that captures the voice spoken by the user as a digital signal.

[0835] "String data" refers to text-formatted data obtained by converting audio data.

[0836] An "input device" is a device used to receive audio data or text data.

[0837] A "conversion device" is a device that has the function of converting audio data into text data.

[0838] A "natural language processing device" is a device that analyzes text data and performs processing to identify the user's intent.

[0839] "Medical institution" is a general term for organizations and facilities that provide medical care and treatment.

[0840] An "information system" is a computer system used to manage and manipulate data and information.

[0841] A "search device" is a device used to access an information system and find available time slots and locations for reservations.

[0842] A "user interface" is an interface used for exchanging information between a system and a user.

[0843] A "notification device" is a device used to transmit information or messages to a user.

[0844] A "voice output device" is a device that presents generated options or information to the user as voice.

[0845] A "support device" is a device that has the function of enabling communication with an operator when a user has difficulty making a reservation selection.

[0846] This invention is a system that enables elderly people and users unfamiliar with using digital devices to smoothly make appointments at medical institutions using voice input. The system converts voice data into string data, analyzes the user's requests using natural language processing, and presents appropriate appointment options.

[0847] The terminal is equipped with an input device that receives audio data. It digitizes the acquired audio and uses speech recognition technology to convert it into string data, which is then sent to the server. In this operation, the "Google Speech Recognition API" can be used as a specific example of a speech recognition engine.

[0848] After receiving the string data, the server uses a natural language processing unit to analyze the data and extract information about the medical institution and date / time the user wishes to book. The search device on the server accesses the medical institution's information system to obtain information about available time slots and medical departments. Specific examples of natural language processing technologies used here include "spaCy" or "NLTK".

[0849] Users can receive search results presented through a user interface, and be prompted to make a selection. Furthermore, it is possible to provide information without relying on visual cues by using an audio output device to present options.

[0850] Once the user completes their selection, the device will notify them of the appointment confirmation via a notification device and set reminders as needed to ensure they don't forget their appointment at the medical institution.

[0851] For example, if a user uses voice input to say, "I want to book an appointment with a dermatologist next Friday," the system will search for suitable dates and times and potential medical facilities, present the user with the best option, and confirm the appointment for the selected time slot.

[0852] An example prompt for the generative AI model is as follows: "Please propose a system that allows elderly people to easily book appointments at medical facilities via their smartphones. Include a function that uses voice input to automatically retrieve and suggest available medical departments to the user."

[0853] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0854] Step 1:

[0855] The device receives audio data. It captures what the user says via the microphone and converts the analog audio into a digital audio signal. The input is the user's voice, and the output is digital audio data. Specifically, the interface between the microphone and the speech recognition engine is used.

[0856] Step 2:

[0857] The device converts digital audio data into text data. It uses a speech recognition engine to analyze the audio signal and convert it into text format. The input is digital audio data, and the output is text data. Specifically, it calls speech recognition services such as the "Google Speech Recognition API."

[0858] Step 3:

[0859] The server analyzes string data to identify the user's intent. It utilizes a natural language processing system to extract necessary keywords and phrases from the text. The input is string data, and the output is reservation intent information (e.g., date and time, medical specialty). Specifically, the natural language processing engine "spaCy" or "NLTK" is used.

[0860] Step 4:

[0861] The server accesses the medical institution's information system to search for available appointment times and departments. It queries the database to retrieve appointment options that match the user's intent. The input is information about the appointment intent, and the output is a list of available appointment options. Specifically, it executes SQL queries to retrieve the necessary data.

[0862] Step 5:

[0863] The terminal presents the user with reservation options. The user interface visually or audibly displays the options and prompts the user to make a selection. Input is a list of available reservation options, and output is the user's selection. Specific operations include screen display and speech synthesis technology.

[0864] Step 6:

[0865] The user selects a reservation option on the terminal, and the terminal sends this information to the server. This transmits reservation information based on the user's selection to the server. The input is the user's selection, and the output is the selected reservation information. The terminal's selection interface is used for this operation.

[0866] Step 7:

[0867] The server confirms the reservation and notifies the terminal. It registers the selected information in the reservation system and generates a confirmation message. The input is the selected reservation information, and the output is the reservation confirmation message. Specifically, database updates and the notification system are involved.

[0868] Step 8:

[0869] The device confirms the reservation with the user and sets a reminder. It notifies the user that the reservation is complete and sets up a reminder to notify them as the reservation date approaches. The input is a reservation confirmation message, and the output is a reminder message. Specifically, the device's notification function and the calendar API are used.
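The audible presentation and selection in Steps 5 and 6 can be sketched as follows. The code only builds the sentence that would be handed to a speech synthesizer and interprets the user's transcribed reply. It does not perform speech synthesis or recognition. The option fields and the names `options_to_speech` and `parse_selection` are assumptions for illustration.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def options_to_speech(options):
    """Build one sentence per option, followed by a prompt for the user's choice."""
    lines = [
        f"Option {i}: {o['institution']}, {o['department']}, {o['day']} at {o['time']}."
        for i, o in enumerate(options, start=1)
    ]
    lines.append("Please say the number of the option you want.")
    return " ".join(lines)


def parse_selection(reply, options):
    """Map a reply such as 'option 2' or 'two, please' to an option, or None."""
    for token in re.findall(r"[a-z]+|\d+", reply.lower()):
        index = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        if index is not None and 1 <= index <= len(options):
            return options[index - 1]
    return None  # the caller can repeat the options or offer operator support
```

Returning `None` for an unrecognized reply gives the terminal a clear signal to ask again rather than confirm a reservation the user did not choose.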

[0870] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0871] This invention is a system that takes into account the user's emotions when making a reservation at a medical institution, providing a more comfortable and smoother experience. The system comprises a terminal for receiving voice or text input from the user, a server for analyzing the data and processing the reservation, and an emotion engine for recognizing the user's emotions.

[0872] First, the user communicates their appointment request to the system using the device's voice or text input function. For example, they might say or type, "I'd like to make a dental appointment for next Wednesday."

[0873] The terminal converts voice input into text using a speech recognition engine and then sends it to the server. The server analyzes this text data using natural language processing technology to identify the user's desired medical department and date / time. Simultaneously, an emotion engine analyzes the user's emotions from the tone of voice and selected words in the input data. For example, it might determine that the user is experiencing stress.

[0874] Based on the information gathered and the results of sentiment analysis, the server queries a database of healthcare facilities to generate the most suitable booking options. For example, if the system analyzes that the user is stressed, it will prioritize showing time slots where appointments can be booked quickly.
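The prioritization described here can be sketched as a simple ordering rule. The boolean `stressed` flag and the function name `rank_options` are hypothetical. How the emotion engine reaches its conclusion is outside the scope of this sketch.

```python
from datetime import datetime


def rank_options(options, requested, stressed):
    """Order slots: earliest first when the user is stressed, else nearest to the request."""
    if stressed:
        return sorted(options, key=lambda o: o["start"])
    return sorted(options, key=lambda o: abs(o["start"] - requested))
```

For a user judged to be stressed, the earliest bookable slot comes first even when it is further from the requested date. Otherwise the slot closest to the requested date and time leads the list.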

[0875] Next, the generated reservation options are sent to the terminal and presented to the user. If the user makes a selection, that selection is sent to the server and the formal reservation process is completed. Simultaneously with the reservation confirmation, the server generates a confirmation notification and sends it to the terminal to inform the user. Furthermore, it also has a function that suggests connecting the user to an operator depending on the user's emotional state. For example, if frustration or anxiety is detected, it automatically guides the user to human support.

[0876] Finally, as the reservation date approaches, the system generates a reminder and notifies the user via their device. This helps ensure that the user does not forget the reservation. Combining these functions with an emotion engine yields a more user-friendly reservation system.

[0877] The following describes the processing flow.

[0878] Step 1:

[0879] Users enter their appointment requests using the device's voice or text input function. For example, they might say, "I'd like to make an appointment with the orthopedics department next Monday," or type the same thing as text.

[0880] Step 2:

[0881] The terminal uses a speech recognition engine to convert the input speech data into text data. The converted text data or text input data is then sent directly to the server.

[0882] Step 3:

[0883] The server analyzes the received text data using a natural language processing engine and extracts important keywords such as the medical department and date / time from the user's reservation request.

[0884] Step 4:

[0885] The server uses an emotion engine to analyze the user's emotions from the content of text data or, if it's audio, from the tone of voice. For example, it can determine if the user's voice contains feelings of anxiety or unease.

[0886] Step 5:

[0887] Based on the extracted information and sentiment analysis results, the server consults a database of healthcare facilities to generate the most suitable booking options for available dates, times, and facilities. For example, if the user is feeling anxious, it prioritizes selecting earlier time slots.

[0888] Step 6:

[0889] The terminal presents the user with reservation options sent from the server. The information is displayed on the screen or spoken aloud, for example, "An appointment is available at the orthopedics department on Monday at 10:00 AM. Do you want to confirm the reservation?"

[0890] Step 7:

[0891] The user selects their preferred booking option from the presented choices and confirms it. The selected information is sent to the server via the terminal.

[0892] Step 8:

[0893] Based on the user's selection information, the server sends a reservation confirmation request to the medical institution's reservation system, formally confirming the reservation. After confirmation, a detailed confirmation message is generated and sent to the user's terminal to inform them.

[0894] Step 9:

[0895] The device receives a confirmation message from the server and informs the user that the reservation is complete. It also follows up with the user by sending reminder notifications as the reservation date approaches, if necessary.
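The confirmation in Step 8 can be illustrated with the sketch below, in which a slot is marked as booked only if it is still free. An in-memory SQLite table stands in for the medical institution's reservation system. The schema, the function name `confirm_reservation`, and the message wording are assumptions for illustration.

```python
import sqlite3


def confirm_reservation(conn, slot_id, user_id):
    """Book the slot only if nobody holds it yet, then build the notification."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE slots SET booked_by = ? WHERE id = ? AND booked_by IS NULL",
            (user_id, slot_id),
        )
    if cursor.rowcount != 1:  # slot missing or already taken by another user
        return {"status": "unavailable", "message": "That time slot is no longer available."}
    department, start_time = conn.execute(
        "SELECT department, start_time FROM slots WHERE id = ?", (slot_id,)
    ).fetchone()
    return {
        "status": "confirmed",
        "message": f"Your {department} appointment at {start_time} is confirmed.",
    }


# Hypothetical sample data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slots (id INTEGER PRIMARY KEY, department TEXT, start_time TEXT, booked_by TEXT)"
)
conn.execute("INSERT INTO slots VALUES (1, 'orthopedics', '2024-12-16 10:00', NULL)")
first = confirm_reservation(conn, 1, "user-1")
second = confirm_reservation(conn, 1, "user-2")
```

Because the availability check and the update are a single statement, two users who select the same slot cannot both receive a confirmation.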

[0896] (Example 2)

[0897] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0898] Modern medical appointment systems often fail to consider user emotions, leading to stress and anxiety for users as they proceed with booking appointments. Furthermore, users who experience difficulties in selecting appointments often lack access to adequate support.

[0899] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0900] In this invention, the server includes means for receiving voice or text input from the user, means for analyzing the user's emotional state, and means for generating optimal reservation options based on the user's emotional state. This enables a comfortable and smooth reservation experience that takes the user's emotions into consideration.

[0901] A "user" refers to an individual who uses the system to make an appointment at a medical institution.

[0902] "Voice data" refers to information that records the voice spoken by a user in digital format.

[0903] "Text data" refers to written information obtained through audio data or user text input.

[0904] "Natural language processing technology" refers to the technology that enables devices to understand and analyze human language.

[0905] "Emotional state" refers to the feelings, such as stress or anxiety, that a user exhibits while using the system.

[0906] A "medical institution database" refers to a collection of information that includes available appointment times and facility information for medical institutions.

[0907] "Reservation options" refer to the multiple available date, time, and facility choices presented to the user.

[0908] "Operator" refers to a person or role that provides support to users through a system.

[0909] "Speech recognition technology" refers to the technology that converts speech data into text data.

[0910] This invention is a system designed to provide a comfortable and smooth experience for users when making appointments at medical institutions, taking into account their emotional state. The following describes embodiments for carrying out the invention.

[0911] Users enter their reservation preferences using voice or text via a terminal. When using voice input, the terminal employs speech recognition technology to convert the voice data into text. For example, typical speech recognition software processes this input.

[0912] The device converts the received audio into text and then sends it to the server as digital data. Natural language processing (NLP) technology is then used to analyze the text and extract the user's intent. For example, the utterance might be transcribed as "I would like to make an internal medicine appointment for next Tuesday." At this stage, both the natural language processing engine and the sentiment analysis engine operate: for example, an NLP library analyzes the request, and the sentiment analysis engine evaluates the user's emotions.

[0913] The server identifies the user's desired medical department and appointment time from the text analyzed by natural language processing. Simultaneously, it analyzes the user's emotional state through sentiment analysis. The sentiment analysis results indicate whether the user is experiencing stress or anxiety and are used to generate the most suitable appointment options.

[0914] The server accesses a database of medical institutions based on the analysis results and emotional state information, searching for available time slots and facility information. From this data, it presents the most suitable booking options according to the user's emotional state. For example, if it determines that the user is feeling anxious, it generates an option to prioritize immediately available appointments.

[0915] For example, if a user enters text saying, "I want to make a dermatology appointment next Friday, but I've been busy lately and I'm worried about my health," the system uses an emotion engine to determine the user's concerns and prioritizes presenting time slots with high appointment availability.

[0916] An example of a prompt for a generative AI model is, "What suggestions can be made if a user wants to make an appointment at a medical institution but may be experiencing stress?"

[0917] This enables reservation suggestions that take user emotions into consideration, resulting in a more comfortable service for users.

[0918] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0919] Step 1:

[0920] The user enters their request for a medical appointment via voice or text. This user input becomes the initial data. For example, they might enter a request such as, "I would like to make an appointment with an internal medicine specialist next Tuesday."

[0921] Step 2:

[0922] The device uses speech recognition technology to convert speech data into text data. This process converts the input from speech to text format. The converted text is then sent to the server as digital data.

[0923] Step 3:

[0924] The server analyzes the received text data using natural language processing technology to extract the user's intent. In this step, a generative AI model is used to analyze the text and identify information such as the medical department and appointment date / time the user desires. For example, keywords such as "internal medicine" and "next Tuesday" may be extracted.

[0925] Step 4:

[0926] The server uses an emotion analysis engine to analyze the user's emotional state. The input data used is the text obtained in the previous step, from which emotional features are extracted. For example, it can determine whether the user is experiencing stress.

[0927] Step 5:

[0928] The server accesses a database of medical institutions based on the analysis results and emotional information, searching for available time slots and facility information. This process generates booking options that are suitable for the user's intentions and emotions. As a result, the booking options deemed most suitable for the user are output.

[0929] Step 6:

[0930] The terminal presents the user with reservation options sent from the server. The output data consists of reservation date and time and facility information, which is presented to the user visually or audibly.

[0931] Step 7:

[0932] The user selects their desired reservation from the presented options. This selection registers the user's decision as input data.

[0933] Step 8:

[0934] The server confirms the reservation based on the user's selection and notifies them of this information. The confirmation notification is sent to the terminal, informing the user that the reservation is complete. This officially establishes the reservation with the medical institution.
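The emotional analysis in Step 4 can be illustrated by the following lexicon-based sketch. It is a deliberately simple stand-in for an emotion analysis engine. The word list, the weights, and the function name `stress_score` are hypothetical, and a practical engine would also have to handle negation, context, and tone of voice.

```python
import re

# Hypothetical weights; a real engine would be trained on labeled data.
STRESS_TERMS = {
    "busy": 0.25,
    "worried": 0.5,
    "nervous": 0.5,
    "anxious": 0.75,
    "stressed": 0.75,
}


def stress_score(text):
    """Return a score in [0.0, 1.0]; higher values suggest more stress or anxiety."""
    words = re.findall(r"[a-z']+", text.lower())
    return min(1.0, sum(STRESS_TERMS.get(word, 0.0) for word in words))
```

The score is capped at 1.0 so that downstream logic can compare it against a fixed threshold regardless of the length of the input.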

[0935] (Application Example 2)

[0936] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0937] Traditional reservation systems perform reservation procedures mechanically without considering the user's subjective emotional state, which led to problems such as the reservation process not proceeding smoothly when the user was feeling anxious or stressed. Furthermore, there was a lack of options for receiving appropriate support, resulting in a lower quality of user experience.

[0938] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0939] In this invention, the server includes means for receiving voice or text input from the user and converting the voice data into text data, means for analyzing the converted text data using natural language processing technology to extract the user's intent, and means for analyzing the user's emotions from the user's input data. This makes it possible to present the optimal reservation option according to the user's emotional state and to quickly suggest human support as needed.

[0940] A "user" refers to an individual or their representative who makes a reservation using the system and provides input information via voice or text.

[0941] "Voice data" refers to information provided by the user through voice, and is the raw data that the system preprocesses before natural language analysis.

[0942] "Text data" refers to audio data represented as characters, and is the basic information that the system analyzes using natural language processing technology.

[0943] "Natural language processing technology" refers to a group of technologies that analyze text data to understand its intent and content, enabling machines to produce meaningful responses and actions.

[0944] A "means of analyzing emotions" refers to a module within a system that extracts emotional nuances from user-inputted voice or text and evaluates the user's emotional state based on that.

[0945] "Optimal booking options" are suggestions generated to present available dates, times, and facility choices based on the user's intentions and emotional state.

[0946] "Support" refers to the connection with an operator and additional guidance provided based on emotional analysis when a user has difficulty making a reservation.

[0947] This invention provides a method for realizing a system that provides an optimal booking experience based on the user's emotional state.

[0948] The server plays a central role in this system. The server uses a speech recognition engine (e.g., Google Speech-to-Text API) to receive voice input from the user and convert it into text data. This text data is then analyzed using natural language processing techniques (e.g., Python's NLTK or spaCy) to extract the user's intent. Furthermore, sentiment analysis tools such as AWS Comprehend and IBM Watson Tone Analyzer are used to analyze the user's emotions and evaluate their emotional state.

[0949] A terminal is a device that allows users to input information through an interface and receive results. The terminal sends voice or text input to the server and presents the user with reservation options and confirmation notifications sent from the server.

[0950] When a user enters information about the service they intend to book, the data is analyzed, and the most suitable booking options are generated based on the user's emotional state. For example, if a user enters, "I want to book a day care service for my mother this weekend, but I'm a little nervous," the system will present available booking options at facilities that will make the user feel at ease. Furthermore, if the emotional analysis indicates that the user is having difficulty making a booking choice, a connection with an operator will be immediately suggested.

[0951] For example, users can maximize the system's functionality by using prompts such as, "I'd like to book a day care service for the weekend, but I'm a little worried. Could you tell me what times are available?" This allows users to easily select the best booking option and receive the necessary support.

[0952] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0953] Step 1:

[0954] The terminal accepts voice or text input from the user. The user speaks or texts about the desired reservation details and date / time. This input forms the basis of the reservation process.

[0955] Step 2:

[0956] The server uses a speech recognition engine to convert voice data sent from the terminal into text data. When voice input is received, the speech recognition engine processes the data and converts it into text information, preparing it for natural language processing.

[0957] Step 3:

[0958] The server analyzes the text data converted using natural language processing technology. During the analysis, it extracts the user's desired service details and date / time. This clarifies the specific intent necessary for making a reservation.

[0959] Step 4:

[0960] The server uses an emotion analysis engine to evaluate the user's emotional state from text data. Based on the input words and their tone, it performs data processing to identify emotions and numerically interprets feelings such as stress and anxiety.

[0961] Step 5:

[0962] The server searches a database of facilities for the best booking option based on the results of natural language processing and sentiment analysis. It queries facility availability and service details, and generates options that are suitable for the user's preferences and emotional state.

[0963] Step 6:

[0964] The terminal presents the user with reservation options sent from the server. The user can review the presented options and choose the one that best suits their needs. The user's selection at this step becomes the final reservation.

[0965] Step 7:

[0966] The server confirms the reservation selected by the user and sends a reservation confirmation notification to the terminal. This officially records the final reservation, and the user retains the confirmation information.

[0967] Step 8:

[0968] The server suggests connecting the user with an operator as needed. If the emotion analysis determines that the user is having difficulty making a reservation, the server immediately begins guiding the user to an operator for assistance.
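
The flow of Steps 3 to 8 can be sketched as follows for input that has already been converted to text. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the keyword lists, the `Slot` record, its `calm_rating` field, the anxiety threshold of 0.5, and the in-memory list of slots are all hypothetical stand-ins for the natural language processing, the emotion analysis engine, and the facility database, none of which the text specifies in detail.

```python
from dataclasses import dataclass

# Hypothetical vocabulary; a real emotion analysis engine would be a trained model.
ANXIETY_WORDS = {"nervous", "worried", "anxious", "afraid"}
SERVICES = ("day care", "checkup", "vaccination")
DAYS = ("weekend", "monday", "tuesday", "wednesday", "thursday", "friday")


@dataclass(frozen=True)
class Slot:
    facility: str
    service: str
    day: str
    calm_rating: float  # hypothetical 0..1 score of how reassuring a facility is


def extract_intent(text: str) -> dict:
    """Step 3: pull the desired service and day out of the text (keyword match)."""
    lowered = text.lower()
    service = next((s for s in SERVICES if s in lowered), None)
    day = next((d for d in DAYS if d in lowered), None)
    return {"service": service, "day": day}


def anxiety_score(text: str) -> float:
    """Step 4: express anxiety as a number in [0, 1] from the words used."""
    words = {w.strip(".,!?\"'") for w in text.lower().split()}
    hits = len(words & ANXIETY_WORDS)
    return min(1.0, hits / 2)


def search_options(intent: dict, anxiety: float, slots: list[Slot]) -> list[Slot]:
    """Step 5: filter by intent, then rank reassuring facilities first if anxious."""
    matches = [s for s in slots
               if s.service == intent["service"] and s.day == intent["day"]]
    if anxiety > 0:
        matches.sort(key=lambda s: s.calm_rating, reverse=True)
    return matches


def handle_request(text: str, slots: list[Slot], threshold: float = 0.5) -> dict:
    """Steps 3 to 8 for text input (speech recognition in Step 2 is assumed done)."""
    intent = extract_intent(text)
    anxiety = anxiety_score(text)
    options = search_options(intent, anxiety, slots)
    return {
        "intent": intent,
        "options": options,
        # Step 8: suggest an operator when anxiety is high or nothing was found.
        "suggest_operator": anxiety >= threshold or not options,
    }
```

Keeping the operator suggestion as a flag in the returned result, rather than a separate call, reflects the description that the suggestion is made as needed alongside the presented options; where exactly the decision is taken is a design choice of this sketch.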

[0969] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0970] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0971] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0972] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is the specific mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0973] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0974] These emotions are distributed at the 3 o'clock position on the emotion map 400 and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0975] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion lies, the more visible it becomes (that is, the more it is expressed in actions).
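
The layout of the emotion map 400 described above can be sketched as a polar coordinate structure. This is only an illustrative reading of the description, not the data structure used in this disclosure: the `MapPosition` record, the `describe` function, the maximum radius of 5.0, and the convention of measuring angles counter-clockwise from the 3 o'clock direction are all assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    radius: float     # 0 = center (primitive); larger = closer to visible action
    angle_deg: float  # counter-clockwise from the 3 o'clock direction


def describe(pos: MapPosition, max_radius: float = 5.0) -> dict:
    """Translate a position on the map into the qualities named in the text."""
    x = pos.radius * math.cos(math.radians(pos.angle_deg))
    y = pos.radius * math.sin(math.radians(pos.angle_deg))
    eps = 1e-9  # treat tiny floating-point residue as zero
    return {
        # Right half: situational judgment. Left half: reactions in the brain.
        # Directly above or below the center: both.
        "origin": "situation" if x > eps else "reaction" if x < -eps else "both",
        # Upper side: pleasure. Lower side: displeasure.
        "valence": "pleasure" if y > eps else "displeasure" if y < -eps else "neutral",
        # Inside: inner thoughts. Outside: expressed in actions.
        "expressed": pos.radius / max_radius,
    }
```

For example, `describe(MapPosition(2.0, 0.0))` corresponds to a point at the 3 o'clock position, which this sketch reports as situation-driven and neither pleasant nor unpleasant.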

[0976] Here, human emotions are based on various balances, such as posture and blood sugar level. When these balances deviate from the ideal, the result is discomfort, and when they approach the ideal, the result is pleasure. Similarly, for robots, cars, motorcycles, and the like, emotions can be created based on various balances, such as posture and battery level, with deviation from the ideal producing discomfort and approach to the ideal producing pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.
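
The idea that pleasure and discomfort follow from how far balances are from their ideal values can be illustrated with a small calculation. The text gives no formula, so the linear scoring below, the function name `balance_valence`, and the per-balance tolerance values are assumptions made purely for illustration.

```python
def balance_valence(readings: dict[str, float],
                    ideals: dict[str, float],
                    tolerances: dict[str, float]) -> float:
    """Return a value in [-1, 1].

    +1 means every balance is at its ideal (pleasure); -1 means every balance
    deviates from its ideal by at least its tolerance (discomfort).
    """
    scores = []
    for name, value in readings.items():
        # Deviation from the ideal, expressed as a fraction of the tolerance.
        deviation = abs(value - ideals[name]) / tolerances[name]
        scores.append(1.0 - 2.0 * min(1.0, deviation))
    return sum(scores) / len(scores)
```

For a robot, `readings` might hold a battery level and a tilt angle; averaging the per-balance scores is one simple way to combine them, and a weighted combination would work equally well.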

[0977] The emotion map defines two emotions that promote learning. One is a negative emotion around the midpoint between "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is a positive emotion around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."

[0978] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
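
One way to obtain training targets in which emotions located close together have similar values is to let the target value fall off with distance on the map from the labeled emotion. The sketch below illustrates that idea only; the text does not state how the training data are built, and the coordinates in `POSITIONS`, the Gaussian falloff, and the `sigma` parameter are all hypothetical.

```python
import math

# Hypothetical 2-D coordinates on the emotion map; the actual layout is in Figure 9.
POSITIONS = {
    "reassured": (2.0, 0.5),
    "calm": (2.2, 0.3),
    "confident": (2.4, 0.8),
    "anxious": (2.0, -2.0),
}


def soft_targets(label: str, sigma: float = 0.5) -> dict[str, float]:
    """Target emotion values that decrease with map distance from the label."""
    lx, ly = POSITIONS[label]
    return {
        name: math.exp(-((x - lx) ** 2 + (y - ly) ** 2) / (2 * sigma ** 2))
        for name, (x, y) in POSITIONS.items()
    }
```

With these assumed coordinates, a sample labeled "reassured" yields high target values for "calm" and "confident" and a value near zero for "anxious", mirroring the example of Figure 10 in which neighboring emotions share similar emotion values.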

[0979] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0980] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.
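
The option of placing the data generation model in an external device can be sketched by hiding the model behind a common interface, so that the specific processing does not depend on where generation happens. The class and function names below (`DataGenerationModel`, `LocalEchoModel`, `ExternalModelStub`, `run_specific_processing`) and the fixed prompt string are hypothetical, and the `send` callable merely stands in for whatever transport reaches the external device.

```python
from typing import Callable, Protocol


class DataGenerationModel(Protocol):
    """Anything that turns a prompt plus inference data into generated data."""

    def generate(self, prompt: str, inference_data: str) -> str: ...


class LocalEchoModel:
    """Stand-in for a model running inside the data processing device."""

    def generate(self, prompt: str, inference_data: str) -> str:
        return f"[local] {prompt}: {inference_data}"


class ExternalModelStub:
    """Stand-in for a model hosted on an external device.

    `send` is any callable that delivers the request and returns the reply.
    """

    def __init__(self, send: Callable[[dict], str]) -> None:
        self._send = send

    def generate(self, prompt: str, inference_data: str) -> str:
        return self._send({"prompt": prompt, "data": inference_data})


def run_specific_processing(model: DataGenerationModel, user_text: str) -> str:
    """The caller is unaware of whether the model is local or external."""
    return model.generate("Summarize the reservation request", user_text)
```

Swapping `LocalEchoModel()` for `ExternalModelStub(send)` changes where the data are generated without touching `run_specific_processing`.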

[0981] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0982] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0983] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0984] The following types of processors can be used as hardware resources to perform the specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing the specific processing by executing software, that is, a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform the specific processing. All of these processors have built-in or connected memory, and all of them perform the specific processing by using the memory.

[0985] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0986] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0987] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0988] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of those aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements may be added, or elements may be replaced in the descriptions and illustrations presented above, as long as doing so does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0989] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0990] The following is further disclosed regarding the embodiments described above.

[0991] (Claim 1)

[0992] A means of receiving voice or text input from the user,

[0993] A means of converting received audio data into text data,

[0994] A means of analyzing the converted text data using natural language processing technology to extract the user's intent,

[0995] A means of connecting to a database of medical institutions to search for available dates, times, and facilities,

[0996] A means of presenting search results to the user and prompting them to make a choice,

[0997] A means of confirming and notifying the user of the reservation based on their selection,

[0998] A system that includes this.

[0999] (Claim 2)

[1000] The system according to claim 1, further comprising means for enabling a user to connect with an operator if the user has difficulty selecting a reservation.

[1001] (Claim 3)

[1002] The system according to claim 1, comprising speech recognition technology that supports multiple languages and dialects.

[1003] "Example 1"

[1004] (Claim 1)

[1005] A device that accepts information input from users,

[1006] Technology to convert received audio data into text information,

[1007] Technology that analyzes the converted text information using natural language processing to extract the user's purpose,

[1008] A device that connects to a database of medical facility information to search for available time slots and institutions for reservations,

[1009] Technologies that provide search results to users and facilitate their selection,

[1010] Technology that makes final reservation decisions and provides notifications based on user selection,

[1011] A system that includes this.

[1012] (Claim 2)

[1013] The system according to claim 1, comprising technology that enables the user to establish communication with the operator if the user finds the selection process difficult.

[1014] (Claim 3)

[1015] The system according to claim 1, comprising speech recognition technology that adapts to a variety of languages and dialects.

[1016] "Application Example 1"

[1017] (Claim 1)

[1018] An input device that receives audio data or text data,

[1019] A conversion device that converts input audio data into string data,

[1020] A natural language processing device that analyzes converted string data and identifies intent,

[1021] A search device that accesses the information system of a medical institution to find available time slots and locations for appointments,

[1022] A user interface that presents search results and guides the user to make a selection,

[1023] A notification device that confirms and notifies the reservation based on the selected option,

[1024] A voice output device that presents the generated options aloud,

[1025] A system that includes this.

[1026] (Claim 2)

[1027] The system according to claim 1, further comprising a support device that enables communication with an operator when a user experiences difficulty in making a reservation selection.

[1028] (Claim 3)

[1029] The system according to claim 1, comprising speech recognition technology that supports multiple languages and region-specific language varieties.

[1030] "Example 2 of combining an emotion engine"

[1031] (Claim 1)

[1032] A means of receiving voice or text input from the user,

[1033] A means of converting received audio data into text data,

[1034] A means of analyzing the converted text data using natural language processing technology to extract the user's intent,

[1035] A means of analyzing the user's emotional state,

[1036] A means of connecting to a database of medical institutions to search for available dates, times, and facilities,

[1037] A means for generating the optimal reservation option based on the user's emotional state,

[1038] A means of presenting the generated options to the user and prompting them to make a selection,

[1039] A means of confirming and notifying the user of the reservation based on their selection,

[1040] A system that includes this.

[1041] (Claim 2)

[1042] The system according to claim 1, further comprising means for enabling a user to connect with an operator if the user has difficulty selecting a reservation.

[1043] (Claim 3)

[1044] The system according to claim 1, comprising speech recognition technology that supports multiple languages and dialects.

[1045] "Application example 2 when combining with an emotional engine"

[1046] (Claim 1)

[1047] A means of receiving voice or text input from the user,

[1048] A means of converting received audio data into text data,

[1049] A means of analyzing the converted text data using natural language processing technology to extract the user's intent,

[1050] A means of analyzing emotions from user input data,

[1051] A means of connecting to the facility's database and searching for the optimal reservation option based on the results of the emotion analysis,

[1052] A means of presenting search results to the user and providing options that match the user's emotional state,

[1053] A means of confirming and notifying the user of the reservation based on their selection,

[1054] A system that includes this.

[1055] (Claim 2)

[1056] The system according to claim 1, further comprising means for connecting a user with an operator based on sentiment analysis if the user has difficulty making a reservation selection.

[1057] (Claim 3)

[1058] The system according to claim 1, comprising speech recognition technology that supports multiple languages and dialects, and further comprising means for improving the accuracy of emotion recognition.

[Explanation of symbols]

[1059]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A system comprising: an input device that receives audio data or text data; a conversion device that converts the input audio data into string data; a natural language processing device that analyzes the converted string data and identifies intent; a search device that accesses the information system of a medical institution to find available time slots and locations for appointments; a user interface that presents the search results and guides the user to make a selection; a notification device that confirms the reservation based on the selected option and notifies the user of it; and a voice output device that presents the generated options aloud.

2. The system according to claim 1, further comprising a support device that enables communication with an operator when a user experiences difficulty in making a reservation selection.

3. The system according to claim 1, comprising speech recognition technology that supports multiple languages and region-specific language varieties.