Method, information processing device, and non-transitory computer-readable medium
The method enhances dialogue systems by registering keywords and commands to accurately execute functions based on natural language inputs, improving the likelihood of desired function execution and response accuracy.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- TOYOTA JIDOSHA KK
- Filing Date
- 2025-12-02
- Publication Date
- 2026-06-11
Figure JP2025042067_11062026_PF_FP_ABST
Abstract
Description
Method, Information Processing Apparatus, and Non-Transitory Computer-Readable Medium 【0001】 The present disclosure relates to a method, an information processing apparatus, and a non-transitory computer-readable medium, and particularly to a method for executing functions in an interactive system and to a corresponding information processing apparatus and non-transitory computer-readable medium. Cross-Reference to Related Applications 【0002】 This application claims priority to Japanese Patent Application No. 2024-210737, filed in Japan on December 3, 2024, the entire disclosure of which is incorporated herein by reference. 【0003】 Conventionally, a technique is known for using a large language model to generate an interactive bot specialized for a given domain from documents in that domain. For example, Patent Document 1 discloses a technique that uses a language model to generate query data that can be answered with a given document. 【0004】 Patent Document 1: Japanese Unexamined Patent Application Publication No. 2023-76413 【0005】 The technology described in Patent Document 1 primarily focuses on using a large language model to generate queries, or anticipated questions, for documents belonging to a specific domain, and does not disclose structural analysis of user questions. Conventionally, as in Patent Document 1, dialogue systems have employed a method in which the input of a specific phrase (question) triggers the execution of the function corresponding to that phrase. However, users speaking in natural language paraphrase the same request in many different ways. The dialogue system therefore had to register a large number of phrases ("I want to read the instruction manual," "Show me the manual," "Show me the catalog," etc.) for each function, which was cumbersome. Dialogue systems have also employed a method in which the input of a keyword triggers the execution of the function corresponding to that keyword.
However, if multiple homophones are registered and each is associated with a different function, a function the user did not intend may be executed. In other words, there was room for improvement in technology relating to methods of executing functions in dialogue systems. 【0006】 In light of these circumstances, an object of the present disclosure is to improve technology relating to methods of executing functions in a dialogue system. 【0007】 (1) A method according to one embodiment of the present disclosure is a method executed by an information processing device, the method including: registering one or more keywords and one or more commands for each of a plurality of functions; receiving input information including text in natural language; when a keyword registered for one of the plurality of functions and a command registered for that function are detected from the text, executing the function; and outputting response information including result information obtained as a result of executing the function. 【0008】 (2) A method according to one embodiment of the present disclosure is the method according to (1), further including: generating a prompt based on the input information and the result information; and inputting the prompt into a large language model to obtain a draft response, wherein the response information includes the draft response. 【0009】 (3) A method according to one embodiment of the present disclosure is the method according to (1) or (2), wherein the plurality of functions include at least one of a skill-based function and a non-skill-based function. 【0010】 (4) A method according to one embodiment of the present disclosure is the method according to any one of (1) to (3), wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search.
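The core of method (1) can be pictured as a registry in which each function carries its own keywords and commands; a function runs only when a keyword and a command registered for that same function both appear in the text. The following is a minimal Python sketch under stated assumptions, not the disclosed implementation: the names `RegisteredFunction`, `detect`, and `execute_if_matched`, the placeholder handlers, and the case-insensitive substring matching on English text are all hypothetical (Japanese input, for example, would need language-appropriate segmentation).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RegisteredFunction:
    """Hypothetical registry entry: one function plus its keywords and commands."""
    name: str
    keywords: list[str]
    commands: list[str]
    handler: Callable[[str], str]  # placeholder for the real search/display logic
    skill_based: bool = True       # aspect (3) distinguishes skill-based functions


def detect(text: str, fn: RegisteredFunction) -> bool:
    """True only if a keyword AND a command registered for `fn` both occur in `text`."""
    lowered = text.lower()  # naive normalization (assumption)
    has_keyword = any(k.lower() in lowered for k in fn.keywords)
    has_command = any(c.lower() in lowered for c in fn.commands)
    return has_keyword and has_command


def execute_if_matched(text: str, registry: list[RegisteredFunction]) -> Optional[dict]:
    """Execute the first function whose keyword and command are both detected."""
    for fn in registry:
        if detect(text, fn):
            return {"function": fn.name, "result": fn.handler(text)}
    return None  # a keyword alone, or a command alone, triggers nothing


registry = [
    RegisteredFunction(
        name="instruction_manual_display",
        keywords=["instruction manual", "manual"],
        commands=["show me", "i want to read"],
        handler=lambda text: "manual: <placeholder>",
    ),
    RegisteredFunction(
        name="delivery_date_search",
        keywords=["delivery date"],
        commands=["teach me", "i want to know"],
        handler=lambda text: "delivery date: <placeholder>",
    ),
]
```

With this registry, "Show me the manual, please." triggers `instruction_manual_display`, whereas "The manual is on the shelf." returns `None` because no registered command appears. Stopping at the first match is a simplifying choice made here; the source does not say how multiple matches are handled.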
【0011】 (5) A method according to one embodiment of the present disclosure is the method according to any one of (1) to (4), further including: registering one or more second keywords for each of the plurality of functions; and executing a function when a keyword registered for one of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected from the text. 【0012】 (6) A method according to one embodiment of the present disclosure is the method according to any one of (1) to (5), wherein the second keyword includes at least one of vehicle type information and region information. 【0013】 (7) A method according to one embodiment of the present disclosure is the method according to any one of (1) to (6), wherein the text includes the content of a dialogue between multiple speakers. 【0014】 (8) An information processing device according to one embodiment of the present disclosure includes a processor that performs processing including: registering one or more keywords and one or more commands for each of a plurality of functions; receiving input information including text in natural language; when a keyword registered for one of the plurality of functions and a command registered for that function are detected in the text, executing the function; and outputting response information including result information obtained as a result of executing the function. 【0015】 (9) An information processing device according to one embodiment of the present disclosure is the information processing device according to (8), wherein the processor performs processing further including: generating a prompt based on the input information and the result information; and inputting the prompt into a large language model to obtain a draft response, wherein the response information includes the draft response.
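Aspects (2) and (9) add a generation step: the input information and the result information are combined into a prompt, and the large language model's output becomes the draft response included in the response information. The Python sketch below is hedged: the prompt template wording, the dictionary shape of the response information, and the `llm` callable (a stand-in for whatever interface the first server 30 exposes) are assumptions, and no particular LLM API is implied.

```python
from typing import Callable


def build_prompt(input_text: str, result_information: str) -> str:
    """Combine the user's input and the executed function's result into one prompt.

    The template wording is illustrative only.
    """
    return (
        "Answer the user's question using the result information below.\n"
        f"User input: {input_text}\n"
        f"Result information: {result_information}\n"
        "Draft response:"
    )


def make_response_information(
    input_text: str,
    result_information: str,
    llm: Callable[[str], str],
) -> dict:
    """Return response information carrying both the result and the LLM draft."""
    prompt = build_prompt(input_text, result_information)
    draft_response = llm(prompt)  # hypothetical call into the model on the first server 30
    return {
        "result_information": result_information,
        "draft_response": draft_response,
    }


def stub_llm(prompt: str) -> str:
    """Stub model so the data flow can be exercised without network access."""
    return "stub draft"
```

Passing the model in as a plain callable keeps the sketch independent of any vendor SDK; a real deployment would replace `stub_llm` with a client that sends the prompt to the first server 30.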
【0016】 (10) An information processing device according to one embodiment of the present disclosure is the information processing device according to (8) or (9), wherein the plurality of functions include at least one of a skill-based function and a non-skill-based function. 【0017】 (11) An information processing device according to one embodiment of the present disclosure is the information processing device according to any one of (8) to (10), wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search. 【0018】 (12) An information processing device according to one embodiment of the present disclosure is the information processing device according to any one of (8) to (11), wherein the processor performs processing further including: registering one or more second keywords for each of the plurality of functions; and executing a function when a keyword registered for one of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected from the text. 【0019】 (13) An information processing device according to one embodiment of the present disclosure is the information processing device according to any one of (8) to (12), wherein the second keyword includes at least one of vehicle type information and region information. 【0020】 (14) An information processing device according to one embodiment of the present disclosure is the information processing device according to any one of (8) to (13), wherein the text includes the content of a dialogue between multiple speakers.
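Aspects (12) to (14) tighten the trigger condition: a function runs only when a keyword, a second keyword (such as a vehicle type or region), and a command registered for that function all appear, and the text may be a dialogue between several speakers. A minimal Python sketch follows. It assumes that all speakers' turns are concatenated before matching, so that a keyword from one speaker and a command from another can jointly trigger a function; the source does not specify this. The entry name and the vehicle-type string "Model A" are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionEntry:
    """Hypothetical registry entry with a second-keyword condition."""
    name: str
    keywords: list[str]
    second_keywords: list[str]  # e.g. vehicle types or regions
    commands: list[str]


def contains_any(text: str, terms: list[str]) -> bool:
    """Case-insensitive substring test (a simplifying assumption)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def match_dialogue(turns: list[tuple[str, str]], entries: list[FunctionEntry]) -> list[str]:
    """Return names of functions whose three conditions are all met in the dialogue."""
    # Concatenating every speaker's utterances is an assumption, not the disclosed method.
    text = " ".join(utterance for _speaker, utterance in turns)
    return [
        entry.name
        for entry in entries
        if contains_any(text, entry.keywords)
        and contains_any(text, entry.second_keywords)
        and contains_any(text, entry.commands)
    ]


entries = [
    FunctionEntry(
        name="used_car_market_price_search",
        keywords=["appraisal market price"],
        second_keywords=["Model A"],  # hypothetical vehicle type
        commands=["teach me", "i want to know"],
    ),
]

dialogue = [
    ("staff", "Shall we check the appraisal market price?"),
    ("customer", "Yes, I want to know it for my Model A."),
]
```

Here `match_dialogue(dialogue, entries)` returns `["used_car_market_price_search"]`; dropping the customer's turn removes both the command and the vehicle type, so the list comes back empty.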
【0021】 (15) A non-transitory computer-readable medium according to one embodiment of the present disclosure stores a program that, when executed by an information processing device, causes the information processing device to perform processing including: registering one or more keywords and one or more commands for each of a plurality of functions; receiving input information including text in natural language; when a keyword registered for any of the plurality of functions and a command registered for that function are detected from the text, executing the function; and outputting response information including result information obtained as a result of executing the function. 【0022】 (16) A non-transitory computer-readable medium according to one embodiment of the present disclosure is the non-transitory computer-readable medium according to (15), wherein the processing further includes: generating a prompt based on the input information and the result information; and inputting the prompt into a large language model to obtain a draft response, wherein the response information includes the draft response. 【0023】 (17) A non-transitory computer-readable medium according to one embodiment of the present disclosure is the non-transitory computer-readable medium according to (15) or (16), wherein the plurality of functions include at least one of a skill-based function and a non-skill-based function. 【0024】 (18) A non-transitory computer-readable medium according to one embodiment of the present disclosure is the non-transitory computer-readable medium according to any one of (15) to (17), wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search.
【0025】 (19) A non-transitory computer-readable medium according to one embodiment of the present disclosure is the non-transitory computer-readable medium according to any one of (15) to (18), wherein the processing further includes: registering one or more second keywords for each of the plurality of functions; and executing a function when a keyword registered for one of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected from the text. 【0026】 (20) A non-transitory computer-readable medium according to one embodiment of the present disclosure is the non-transitory computer-readable medium according to any one of (15) to (19), wherein the second keyword includes at least one of vehicle type information and region information. 【0027】 The method, information processing apparatus, and non-transitory computer-readable medium according to one embodiment of the present disclosure improve technology relating to methods of executing functions in an interactive system. 【0028】 FIG. 1 is a block diagram showing the schematic configuration of a system according to one embodiment of the present disclosure. FIG. 2 is a block diagram showing the schematic configuration of a terminal device. FIG. 3 is a block diagram showing the schematic configuration of an information processing device. FIG. 4 is a block diagram showing the schematic configurations of a first server and a second server. FIG. 5 is a flowchart showing a first example of operation of the information processing device. FIG. 6 is a flowchart showing a second example of operation of the information processing device. FIG. 7 is a flowchart showing a third example of operation of the information processing device. FIG. 8 is a diagram showing an example of a user interface output by a terminal device. 【0029】 The technology according to the embodiments of the present disclosure will be described below with reference to the drawings.
【0030】 In each figure, identical or corresponding parts are denoted by the same reference numerals. In the description of this embodiment, the description of identical or corresponding parts will be omitted or simplified as appropriate. 【0031】 (Outline of Embodiment) The outline and configuration of the system 1 according to this embodiment will be described with reference to Figure 1. 【0032】The system 1 according to this embodiment comprises a terminal device 10, an information processing device 20, a first server 30, and a second server 40. The terminal device 10, the information processing device 20, the first server 30, and the second server 40 are all connected to a network 50, including, for example, a mobile communication network and the Internet. 【0033】 The terminal device 10 is any device used by staff of, for example, an automobile company (including a company that manufactures or sells automobiles), an automobile-related service company, or an automobile dealership. For example, a general-purpose electronic device such as a smartphone, PC, or tablet terminal, or a dedicated electronic device, can be used as the terminal device 10. Although Figure 1 shows an example in which System 1 is equipped with one terminal device 10, it is not limited to this. System 1 may be equipped with two or more terminal devices 10. 【0034】 The information processing device 20 is, for example, a server device installed in a data center or the like. For example, the information processing device 20 is a server belonging to a cloud computing system or other computing system. The information processing device 20 can communicate with terminal devices 10, etc., via the network 50. Although Figure 1 shows an example in which system 1 is equipped with one information processing device 20, it is not limited to this. System 1 may be equipped with two or more information processing devices 20. 【0035】The first server 30 is a server device installed, for example, in a data center. 
The first server 30 is a server belonging to a cloud computing system or other computing system. The first server 30 is equipped with a large language model (LLM). In this embodiment, the large language model may include any dialogue system, such as a chatbot. When a prompt, which is an instruction from the user, is input, the large language model outputs text corresponding to that prompt. The first server 30 can communicate with the information processing device 20, etc., via the network 50. For example, the information processing device 20 inputs a prompt to the large language model of the first server 30 and obtains the output corresponding to the prompt. Although Figure 1 shows an example in which the system 1 includes one first server 30, the configuration is not limited to this; the system 1 may include two or more first servers 30. 【0036】 The second server 40 is a server device installed, for example, in a data center. The second server 40 is a server belonging to a cloud computing system or other computing system. In this embodiment, the second server 40 is equipped with a database or the like used when executing non-skill-based functions. For example, the second server 40 is equipped with a database that aggregates information such as vehicle catalogs and instruction manuals. The second server 40 can communicate with the information processing device 20, etc., via the network 50. Although Figure 1 shows an example in which the system 1 includes one second server 40, the configuration is not limited to this; the system 1 may include two or more second servers 40. 【0037】 First, an overview of this embodiment will be described. The information processing device 20 registers one or more keywords and one or more commands for each of the plurality of functions. The information processing device 20 then receives input information including text in natural language.
When the information processing device 20 detects, in the text, a keyword and a command that have been registered for any one of the plurality of functions, it executes that function. The information processing device 20 then outputs response information including the information obtained as a result of executing that function (hereinafter also referred to as "result information"). 【0038】 Here, a "function" refers to processing performed by the control unit 21 of the information processing device 20 in response to a question or request contained in the natural language text received as input information. Keywords are words such as "instruction manual," "manual," "fuel efficiency," "charging time," "appraisal market price," "catalog," "test drive," and "delivery date," and are used by the control unit 21 to identify a function. Commands include words such as "show me," "I want to read it," "shall we take a look," "teach me," and "I want to know," and may include words indicating actions, questions, requests, and the like. The control unit 21 uses commands to decide whether to execute the function identified by the keyword. When a keyword is a homophone, the control unit 21 also uses commands to identify the function. 【0039】 The control unit 21 of the information processing device 20 can use the second server 40 when executing a function, and can use the first server 30 when generating response information.
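Paragraph 【0038】 states that commands also resolve keywords that are homophones. One way to picture this is a two-stage filter: collect every function whose keyword appears, then keep only those whose own registered command also appears. In the Python sketch below, the shared keyword "manual" stands in for a homophone (for example, identical text produced by speech recognition), and `manual_transmission_search` is a hypothetical second function invented purely for illustration; neither the data nor the function names come from the source. Registered terms are assumed to be stored in lowercase.

```python
# Hypothetical registry: the same surface keyword is registered for two functions.
REGISTRY: dict[str, dict[str, list[str]]] = {
    "instruction_manual_display": {
        "keywords": ["manual"],
        "commands": ["show me", "i want to read"],
    },
    "manual_transmission_search": {  # hypothetical, for illustration only
        "keywords": ["manual"],
        "commands": ["does it have", "is there"],
    },
}


def resolve(text: str, registry: dict[str, dict[str, list[str]]]) -> list[str]:
    """Disambiguate a shared keyword by the command registered for each function."""
    lowered = text.lower()
    # Stage 1: a keyword match alone may yield several candidate functions.
    candidates = [
        name for name, spec in registry.items()
        if any(keyword in lowered for keyword in spec["keywords"])
    ]
    # Stage 2: keep only candidates whose own registered command is also present.
    return [
        name for name in candidates
        if any(command in lowered for command in registry[name]["commands"])
    ]
```

With this registry, "Show me the manual for this car." resolves only to `instruction_manual_display`, "Does it have a manual option?" resolves only to `manual_transmission_search`, and the bare keyword "manual" resolves to nothing.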
Therefore, by identifying and executing a function based on a combination of keywords and commands registered for a certain function, the probability of the function desired by the user being executed is increased, and the technology related to the function execution method in the dialogue system is improved. 【0041】 Next, we will describe the various components of System 1. 【0042】 (Configuration of the Terminal Device) As shown in FIG. 2, the terminal device 10 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, and an output unit 15. 【0043】 The control unit 11 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU (central processing unit) or a GPU (graphics processing unit), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA (field-programmable gate array) or an ASIC (application specific integrated circuit). The control unit 11 executes processes related to the operation of the terminal device 10 while controlling each part of the terminal device 10. 【0044】 The storage unit 12 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, a RAM (random access memory) or a ROM (read only memory). The RAM is, for example, a SRAM (static random access memory) or a DRAM (dynamic random access memory). The ROM is, for example, an EEPROM (electrically erasable programmable read only memory). The storage unit 12 functions as, for example, a main memory device, an auxiliary storage device, or a cache memory. The storage unit 12 stores data used for the operation of the terminal device 10 and data obtained by the operation of the terminal device 10. 【0045】The communication unit 13 includes at least one external communication interface. 
The communication interface may be either a wired or wireless communication interface. In the case of wired communication, the communication interface may be, for example, a LAN (Local Area Network) interface or a USB (Universal Serial Bus) interface. In the case of wireless communication, the communication interface may be, for example, an interface compatible with mobile communication standards such as LTE (Long Term Evolution), 4G (4th generation), or 5G (5th generation), or an interface compatible with short-range wireless communication such as WiFi (registered trademark) or Bluetooth (registered trademark). The communication unit 13 receives data used for the operation of the terminal device 10 and transmits data obtained by the operation of the terminal device 10. 【0046】 The input unit 14 includes at least one input interface. The input interface may be, for example, a physical key, a capacitive key, a pointing device, or a touchscreen integrated with a display. Alternatively, the input interface may be, for example, a microphone that accepts voice input or a camera that accepts gesture input. The input unit 14 accepts operations for inputting data used for the operation of the terminal device 10. Instead of being provided in the terminal device 10, the input unit 14 may be connected to the terminal device 10 as an external input device. Any connection method, such as USB (Universal Serial Bus), HDMI (High-Definition Multimedia Interface), or Bluetooth (registered trademark), can be used. 【0047】 The output unit 15 includes at least one output interface. The output interface is, for example, a display that outputs information visually or a speaker that outputs information audibly. The display is, for example, an LCD (liquid crystal display) or an organic EL (electroluminescence) display. The output unit 15 outputs data obtained by the operation of the terminal device 10, for example by displaying it.
Instead of being provided in the terminal device 10, the output unit 15 may be connected to the terminal device 10 as an external output device. Any connection method, such as USB, HDMI (registered trademark), or Bluetooth (registered trademark), can be used. 【0048】 The functions of the terminal device 10 are realized by executing a program according to this embodiment, stored on a non-transitory computer-readable medium, on a computer corresponding to the terminal device 10. That is, the functions of the terminal device 10 are realized by software. The program causes the computer to execute the operations of the terminal device 10, thereby making the computer function as the terminal device 10. That is, the computer functions as the terminal device 10 by executing the operations of the terminal device 10 in accordance with the program. 【0049】 Some or all of the functions of the terminal device 10 may be realized by a dedicated circuit corresponding to the control unit 11. That is, some or all of the functions of the terminal device 10 may be realized by hardware. 【0050】 (Configuration of the Information Processing Device) As shown in FIG. 3, the information processing device 20 includes a control unit 21, a storage unit 22, and a communication unit 23. 【0051】 The control unit 21 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU (central processing unit) or GPU (graphics processing unit), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). The control unit 21 controls each part of the information processing device 20 and executes processes related to the operation of the information processing device 20.
【0052】 The storage unit 22 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these. The semiconductor memory is, for example, RAM (random access memory) or ROM (read-only memory). The RAM is, for example, SRAM (static random access memory) or DRAM (dynamic random access memory). The ROM is, for example, EEPROM (electrically erasable programmable read-only memory). The storage unit 22 functions, for example, as a main memory, auxiliary memory, or cache memory. The storage unit 22 stores data used for the operation of the information processing device 20 and data obtained by the operation of the information processing device 20. 【0053】 The communication unit 23 includes at least one external communication interface. The communication interface may be either a wired or wireless communication interface. In the case of wired communication, the communication interface may be, for example, a LAN (Local Area Network) interface or a USB (Universal Serial Bus) interface. In the case of wireless communication, the communication interface may be, for example, an interface compatible with mobile communication standards such as LTE (Long Term Evolution), 4G (4th generation), or 5G (5th generation), or an interface compatible with short-range wireless communication such as WiFi (registered trademark) or Bluetooth (registered trademark). The communication unit 23 receives data used for the operation of the information processing device 20 and transmits data obtained by the operation of the information processing device 20. 【0054】 The functions of the information processing device 20 are realized by executing a program according to this embodiment, stored on a non-transitory computer-readable medium, on a processor corresponding to the control unit 21. In other words, the functions of the information processing device 20 are realized by software.
The program causes the computer to function as the information processing device 20 by having the computer execute the operations of the information processing device 20. That is, the computer functions as the information processing device 20 by executing the operations of the information processing device 20 in accordance with the program. 【0055】 In this embodiment, the computer temporarily stores in its main memory a program recorded, for example, on a portable recording medium or transferred from a server. The computer then reads the program stored in the main memory with its processor and executes processing according to the read program with the processor. The computer may also read the program directly from the portable recording medium and execute processing according to the program. The computer may also sequentially execute processing according to the received program each time it receives the program from an external server. Processing may also be executed by a so-called ASP (application service provider) type service, which realizes functions only through execution instructions and result acquisition, without transferring the program from an external server to the computer. The program includes information that is used for processing by an electronic computer and is equivalent to a program. For example, data that is not a direct instruction to a computer but has the property of defining the computer's processing falls under "something equivalent to a program." 【0056】 In this embodiment, the program can be recorded on a computer-readable recording medium.
The computer-readable recording medium on which the program is recorded includes a non-transitory recording medium and is, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. The program is distributed, for example, by selling, transferring, or lending portable recording media such as DVDs (digital versatile discs) or CD-ROMs (compact disc read-only memory) on which the program is recorded. Alternatively, the program may be distributed by storing it in the storage of an external server and transferring it from the external server to other computers. Furthermore, the program may be provided as a program product. 【0057】 Some or all of the functions of the information processing device 20 may be implemented by a dedicated circuit corresponding to the control unit 21. In other words, some or all of the functions of the information processing device 20 may be implemented by hardware. 【0058】 (Configuration of the First Server) As shown in Figure 4, the first server 30 comprises a control unit 31, a storage unit 32, and a communication unit 33. 【0059】 The control unit 31 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU (central processing unit) or GPU (graphics processing unit), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit). The control unit 31 controls each part of the first server 30 and executes processes related to the operation of the first server 30. 【0060】 The storage unit 32 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
The semiconductor memory is, for example, RAM (random access memory) or ROM (read-only memory). The RAM is, for example, SRAM (static random access memory) or DRAM (dynamic random access memory). The ROM is, for example, EEPROM (electrically erasable programmable read-only memory). The storage unit 32 functions, for example, as a main memory, auxiliary memory, or cache memory. The storage unit 32 stores data used for the operation of the first server 30 and data obtained by the operation of the first server 30. 【0061】 For example, the storage unit 32 stores a large language model. The large language model may be capable of realizing a retrieval-augmented generation (RAG) function that generates answers by referring to one or more databases. Alternatively, for example, the large language model may be stored in GPU memory, such as the VRAM of the GPU included in the control unit 31. Examples of GPU memory include random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), and other forms of memory known in the art. In examples where the GPU is configured as part of another processor, such as a host processor, the GPU memory can be accessed by components other than the GPU. 【0062】 The communication unit 33 includes at least one external communication interface. The communication interface may be either a wired or wireless communication interface. In the case of wired communication, the communication interface may be, for example, a LAN (Local Area Network) interface or a USB (Universal Serial Bus) interface. In the case of wireless communication, the communication interface may be, for example, an interface compatible with mobile communication standards such as LTE (Long Term Evolution), 4G (4th generation), or 5G (5th generation), or an interface compatible with short-range wireless communication such as WiFi (registered trademark) or Bluetooth (registered trademark).
The communication unit 33 receives data used for the operation of the first server 30 and transmits data obtained by the operation of the first server 30. 【0063】 The functions of the first server 30 are realized by executing a program according to this embodiment, stored on a non-transitory computer-readable medium, on a processor corresponding to the control unit 31. In other words, the functions of the first server 30 are realized by software. The program causes the computer to function as the first server 30 by having the computer execute the operations of the first server 30. That is, the computer functions as the first server 30 by executing the operations of the first server 30 in accordance with the program. 【0064】 Some or all of the functions of the first server 30 may be implemented by a dedicated circuit corresponding to the control unit 31. In other words, some or all of the functions of the first server 30 may be implemented by hardware. 【0065】 (Configuration of the Second Server) Similarly, as shown in Figure 4, the second server 40 comprises a control unit 41, a storage unit 42, and a communication unit 43. The second server 40 is, for example, a business system or business support system of an automobile company, an automobile-related service company, an automobile dealership, or the like. 【0066】 The control unit 41 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a CPU or GPU, or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, an FPGA or ASIC. The control unit 41 controls each part of the second server 40 and executes processes related to the operation of the second server 40. 【0067】 The storage unit 42 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these.
The semiconductor memory is, for example, RAM or ROM. The RAM is, for example, SRAM or DRAM. The ROM is, for example, EEPROM. The storage unit 42 functions, for example, as a main memory, auxiliary memory, or cache memory. The storage unit 42 stores data used for the operation of the second server 40 and data obtained by the operation of the second server 40. For example, the storage unit 42 stores information such as vehicle catalogs and instruction manuals. The storage unit 42 may classify or tag this information by vehicle type when storing it. The storage unit 42 may also store information on the company's own vehicle types separately from information on other companies' vehicle types. This information may also be stored in the storage unit 22 of the information processing device 20. 【0068】 The communication unit 43 includes at least one external communication interface. The communication interface may be either a wired or a wireless communication interface. In the case of wired communication, the communication interface may be, for example, a LAN interface or a USB interface. In the case of wireless communication, the communication interface may be, for example, an interface compatible with mobile communication standards such as LTE, 4G, or 5G, or an interface compatible with short-range wireless communication such as Wi-Fi (registered trademark) or Bluetooth (registered trademark). The communication unit 43 receives data used for the operation of the second server 40 and transmits data obtained by the operation of the second server 40. 【0069】 The functions of the second server 40 are realized by a processor corresponding to the control unit 41 executing a program according to this embodiment stored on a non-transitory computer-readable medium. In other words, the functions of the second server 40 are realized by software.
A program stored on the non-transitory computer-readable medium causes a computer to function as the second server 40 by having it execute the operations of the second server 40. 【0070】 Some or all of the functions of the second server 40 may be implemented by a dedicated circuit corresponding to the control unit 41. In other words, some or all of the functions of the second server 40 may be implemented by hardware. 【0071】 (Operation Example 1 of the Information Processing Device) Operation Example 1 of the information processing device 20 according to this embodiment will be described with reference to Figure 5. Figure 5 is a flowchart showing an example of a method executed by the information processing device 20 according to this embodiment. 【0072】 S100: The control unit 21 of the information processing device 20 registers one or more keywords and one or more commands for each of the multiple functions. Any method can be used to register the keywords and commands. For example, the keywords and commands may be stored in advance in the storage unit 22 of the information processing device 20. Alternatively, the information processing device 20 may acquire candidate keywords and commands via the input interface of the terminal device 10. In this case, the control unit 21 may select the necessary keywords and commands from the acquired candidates and store them in the storage unit 22. 【0073】 S200: The control unit 21 receives, via the input interface of the terminal device 10, input information including natural-language text. The natural-language text is provided as at least one of speech information and text information.
If the input information contains text information, the control unit 21 may apply natural language processing or the like to that text and use it for subsequent processing. If the input information contains speech information, the control unit 21 transcribes the speech into text information using speech recognition processing, and may then apply natural language processing or the like to the transcribed text and use it for subsequent processing. 【0074】 S300: When the control unit 21 detects, in the text, a keyword registered to one of the multiple functions and a command registered to that same function, it executes that function. Specifically, when the control unit 21 detects a keyword associated with one of the multiple functions in the text, it identifies that function as the function to be executed. When it further detects a command associated with that function in the text, it executes the function. By executing the function, the control unit 21 obtains result information corresponding to the input information. 【0075】 For example, if the control unit 21 detects in the text the keyword "instruction manual," registered to the instruction manual display function, it identifies the instruction manual display function. If it further detects the command "show me," registered to that function, it executes the instruction manual display function. Likewise, if the control unit 21 detects the keyword "manual," also registered to the instruction manual display function, it identifies that function, and if it further detects the command "I want to read," registered to that function, it executes the instruction manual display function. However, if the control unit 21 detects only "manual" in the text and does not detect a command such as "I want to read," it does not execute the instruction manual display function.
A homonym such as "manual" may be registered to the manual transmission vehicle catalog list display function as well as to the instruction manual display function. In this case, when the control unit 21 detects "manual," it also identifies the manual transmission vehicle catalog list display function. If the control unit 21 then detects in the text the command "tell me what gear it is," registered to the manual transmission vehicle catalog list display function, it executes that function. 【0076】 S400: The control unit 21 outputs response information including the result information obtained by executing the function. In S300, the control unit 21 executes a function and acquires result information. The result information is, for example, information related to the function executed by the control unit 21. The control unit 21 outputs response information including the acquired result information via the output interface of the terminal device 10. If the executed function is the instruction manual display function, the response information may be, for example, "The instruction manual will be displayed." 【0077】 As described above, in Operation Example 1 of this embodiment, one or more keywords and one or more commands are registered for each of the multiple functions, and when the control unit 21 detects, in the natural-language text it receives as input, a keyword and a command registered for any function, it executes that function. Identifying and executing a function from the combination of a keyword and a command increases the probability that the function the user wants is executed correctly. The function execution method of the dialogue system is therefore improved in that accurate output results become easier to obtain.
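Steps S100 to S400 of Operation Example 1 can be illustrated with the minimal Python sketch below. It is not the disclosed implementation: it assumes a hypothetical in-memory registry in place of the storage unit 22, hypothetical function identifiers, and plain case-insensitive substring matching in place of the natural language processing mentioned above. The response wording for the manual transmission function is also an assumption. The sketch shows that a keyword alone, even a homonym such as "manual" that identifies two functions, executes nothing until a command registered to one of those functions also appears.

```python
from typing import Optional

# Hypothetical registrations (S100). "manual" is registered to two functions,
# mirroring the homonym example above.
REGISTRY = {
    "instruction_manual_display": {
        "keywords": {"instruction manual", "manual"},
        "commands": {"show me", "i want to read"},
        "response": "The instruction manual will be displayed.",
    },
    "mt_catalog_list_display": {
        "keywords": {"manual"},
        "commands": {"tell me what gear it is"},
        # Assumed wording; the description gives no response for this function.
        "response": "The manual transmission vehicle catalog list will be displayed.",
    },
}


def identify(text: str) -> list[str]:
    """Functions identified by a registered keyword (several if a keyword is shared)."""
    t = text.lower()
    return [name for name, e in REGISTRY.items() if any(k in t for k in e["keywords"])]


def select_function(text: str) -> Optional[str]:
    """S300: execute an identified function only if its own command also appears."""
    t = text.lower()
    for name in identify(text):
        if any(c in t for c in REGISTRY[name]["commands"]):
            return name
    return None  # keyword without a matching command: nothing is executed


def respond(text: str) -> Optional[str]:
    """S400: response information for the executed function, if any."""
    name = select_function(text)
    return REGISTRY[name]["response"] if name is not None else None
```

For example, `select_function("manual")` returns `None`, whereas `select_function("I want to read the manual")` returns `"instruction_manual_display"` because the command disambiguates between the two functions that share the keyword.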
【0078】 (Operation Example 2 of the Information Processing Device) Operation Example 2 of the information processing device 20 according to this embodiment will be described with reference to Figure 6. Figure 6 is a flowchart showing an example of a method executed by the information processing device 20 according to this embodiment. Operations identical to those in Figure 5 are given the same reference signs. 【0079】 S100: The control unit 21 of the information processing device 20 registers one or more keywords and one or more commands for each of the multiple functions. Any method can be used to register the keywords and commands. For example, the keywords and commands may be stored in advance in the storage unit 22 of the information processing device 20. Alternatively, the information processing device 20 may acquire candidate keywords and commands via the input interface of the terminal device 10. In this case, the control unit 21 may select the necessary keywords and commands from the acquired candidates and store them in the storage unit 22. 【0080】 S200: The control unit 21 receives, via the input interface of the terminal device 10, input information including natural-language text. The natural-language text is provided as at least one of speech information and text information. If the input information contains text information, the control unit 21 may apply natural language processing or the like to that text and use it for subsequent processing. If the input information contains speech information, the control unit 21 transcribes the speech into text information using speech recognition processing, and may then apply natural language processing or the like to the transcribed text and use it for subsequent processing.
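The S200 handling of text and speech input can be sketched as follows. This is an illustration only: it assumes a hypothetical `InputInformation` record and a caller-supplied `transcribe` callable standing in for whatever speech recognition processing is used, and the lower-casing and whitespace cleanup are merely placeholders for the "natural language processing or the like" that the description leaves open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InputInformation:
    """Input from terminal device 10: typed text and/or recorded speech (hypothetical shape)."""
    text: Optional[str] = None
    speech: Optional[bytes] = None


def to_processing_text(info: InputInformation, transcribe: Callable[[bytes], str]) -> str:
    """S200: return text ready for keyword and command matching.

    `transcribe` is a placeholder for the speech recognition engine.
    """
    if info.text is not None:
        raw = info.text
    elif info.speech is not None:
        raw = transcribe(info.speech)
    else:
        raise ValueError("input information contains neither text nor speech")
    # Light normalisation standing in for natural language processing.
    return " ".join(raw.lower().split())
```

Passing the recognizer in as a callable keeps the sketch independent of any particular speech API and makes it easy to substitute a stub when testing.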
【0081】 S300: When the control unit 21 detects, in the text, a keyword registered to one of the multiple functions and a command registered to that same function, it executes that function. Specifically, when the control unit 21 detects a keyword associated with one of the multiple functions in the text, it identifies that function as the function to be executed, and when it further detects a command associated with that function, it executes the function. By executing the function, the control unit 21 obtains result information corresponding to the input information. The result information is, for example, information from the RAG database that the large-scale language model should reference. 【0082】 S310: The control unit 21 generates a prompt based on the input information and the result information. For example, the control unit 21 generates a prompt that combines the input information and the result information. 【0083】 S320: The control unit 21 inputs the prompt into the large-scale language model to obtain a draft response. Specifically, the control unit 21 transmits the prompt generated in S310 to the first server 30 via the communication unit 23 and obtains a draft response from the first server 30 via the communication unit 23. 【0084】 S410: The control unit 21 outputs response information including the draft response. The control unit 21 generates response information including the draft response obtained in S320. The response information may be identical to the draft response. 【0085】 As described above, in Operation Example 2 of this embodiment, the control unit 21 generates a prompt based on the input information and the result information, and inputs the prompt into a large-scale language model to obtain a draft response. As a result, the control unit 21 can answer the user's question with greater accuracy.
Therefore, the probability that the function desired by the user is executed increases, and the function execution method of the dialogue system is improved in that it outputs more accurate response information. 【0086】 Here, the multiple functions may include at least one of skill-based functions and non-skill-based functions. Skill-based functions are functions realized by activating a dedicated program (skill). When the control unit 21 executes a skill-based function, it activates the dedicated program (skill). The dedicated program may be installed on the terminal device 10, in which case the control unit 21 may execute the dedicated program (skill) installed on the terminal device 10. 【0087】 Skill-based functions may include, for example, at least one of the following: product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search. Product specification search is a function implemented by an application that searches for vehicles meeting specified conditions. Its search targets may include information published on the Internet. The control unit 21 can execute the product specification search function when the text includes a phrase such as "Tell me about cars with a total height of 1.5m or less." In this case, the keyword registered to the product specification search function is "total height," and the command is "tell me." 【0088】 The used car market price search is a function implemented by a cross-search used car application that searches used cars across various dealerships. The control unit 21 can execute the used car market price search function when the text includes a phrase such as "Tell me the market price for a Prius."
In this case, the keyword registered to the used car market price search function is "market price," and the command is "tell me." 【0089】 The catalog list display function is implemented by an application that displays catalog information in a list. The control unit 21 can execute the catalog list display function when the text contains a phrase such as "Show me the Prius catalog." In this case, the keyword registered to the catalog list display function is "catalog," and the command is "show me." 【0090】 The instruction manual display function is implemented by an application that displays the instruction manual. The control unit 21 can execute the instruction manual display function when the text includes a phrase such as "Show me the Prius instruction manual." In this case, the keyword registered to the instruction manual display function is "instruction manual" or "manual," and the command is "show me." 【0091】 The test drive vehicle search function is implemented by an application that searches for dealerships, dates, and the like at which a desired vehicle can be test driven. The control unit 21 can execute the test drive vehicle search function when the text includes a phrase such as "Tell me which dealerships offer test drives of the Prius." In this case, the keyword registered to the test drive vehicle search function is "test drive," and the command is "tell me." 【0092】 The delivery date search is a function implemented by an application that searches for delivery dates. The control unit 21 can execute the delivery date search function when the text contains a phrase such as "Tell me the delivery date for the Prius." In this case, the keyword registered to the delivery date search function is "delivery date," and the command is "tell me." 【0093】 Skill-based functions may also include any other functions realized by activating a skill.
If the function executed by the control unit 21 is a skill-based function, the response information may be a comment indicating that the skill will be activated. 【0094】 Non-skill-based functions are any functions other than skill-based functions. Non-skill-based functions may include, for example, at least one of catalog-based RAG, web search-based RAG, and intranet search. 【0095】 Catalog-based RAG is a RAG function that causes the large-scale language model to refer to a catalog database stored on the second server 40. The control unit 21 can execute the catalog-based RAG function when the text contains a phrase such as "Tell me the fuel efficiency of a Prius." In this case, the keyword registered to the catalog-based RAG function is "fuel efficiency," and the command is "tell me." 【0096】 Web search-based RAG is a RAG function that causes the large-scale language model to refer to a database of web search result information stored on the second server 40. The control unit 21 may execute the web search-based RAG function when the text contains a phrase such as "Tell me about driving spots in Ariake" or "Tell me the fuel efficiency of the N-Box." In this case, the keyword registered to the web search-based RAG function is "fuel efficiency," and the command is "tell me." Because the same keyword is registered to both RAG functions, the control unit 21 may decide whether to execute catalog-based RAG or web search-based RAG by determining whether the car mentioned in the text is sold by the company itself or by another company. 【0097】 The intranet search function searches for information that is confidential or shared only within the group of companies. The control unit 21 can execute the intranet search function if the text contains a phrase such as "Tell me how to cancel an order." In this case, the keyword registered to the intranet search function is "how to cancel an order," and the command is "tell me."
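The keyword and command pairs listed above for the skill-based and non-skill-based functions can be collected into a small dispatch table, as in the Python sketch below. The function identifiers, the first-match policy, literal substring matching, and the `OWN_MODELS` set used to choose between catalog-based and web search-based RAG are all assumptions made for illustration; the description states only that the choice depends on whether the vehicle is the company's own.

```python
from typing import Optional

# Keyword and command pairs taken from the examples above; identifiers are hypothetical.
SKILL_FUNCTIONS = {
    "product_specification_search": ({"total height"}, {"tell me"}),
    "used_car_market_price_search": ({"market price"}, {"tell me"}),
    "catalog_list_display": ({"catalog"}, {"show me"}),
    "instruction_manual_display": ({"instruction manual", "manual"}, {"show me"}),
    "test_drive_vehicle_search": ({"test drive"}, {"tell me"}),
    "delivery_date_search": ({"delivery date"}, {"tell me"}),
}
NON_SKILL_FUNCTIONS = {
    # "fuel efficiency" is shared by catalog-based and web search-based RAG.
    "fuel_efficiency_rag": ({"fuel efficiency"}, {"tell me"}),
    "intranet_search": ({"how to cancel an order"}, {"tell me"}),
}
# Assumed set of the company's own models; any other vehicle counts as another company's.
OWN_MODELS = {"prius"}


def dispatch(text: str) -> Optional[tuple[str, str]]:
    """Return (kind, function) for the first entry whose keyword and command both match."""
    t = text.lower()
    for kind, table in (("skill", SKILL_FUNCTIONS), ("non-skill", NON_SKILL_FUNCTIONS)):
        for name, (keywords, commands) in table.items():
            if any(k in t for k in keywords) and any(c in t for c in commands):
                if name == "fuel_efficiency_rag":
                    # Own-company vehicle: catalog database; otherwise web search results.
                    own = any(model in t for model in OWN_MODELS)
                    return kind, "catalog_based_rag" if own else "web_search_based_rag"
                return kind, name
    return None
```

With these assumed tables, `dispatch("Tell me the fuel efficiency of the N-Box.")` returns `("non-skill", "web_search_based_rag")`, because "N-Box" is not in `OWN_MODELS`, while the same question about a Prius is routed to catalog-based RAG.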
【0098】 By using RAG, such as catalog-based RAG or web search-based RAG, the large-scale language model can reduce erroneous answers (hallucinations). If the function executed by the control unit 21 is a non-skill-based function, the response information may be a comment that gives a substantive answer to the user's question. 【0099】 (Operation Example 3 of the Information Processing Device) Operation Example 3 of the information processing device 20 according to this embodiment will be described with reference to Figure 7. Figure 7 is a flowchart showing an example of a method executed by the information processing device 20 according to this embodiment. Operations identical to those in Figures 5 and 6 are given the same reference signs. 【0100】 S101: The control unit 21 of the information processing device 20 registers one or more keywords, one or more second keywords, and one or more commands for each of the multiple functions. Any method can be used to register them. For example, the keywords, second keywords, and commands may be stored in advance in the storage unit 22 of the information processing device 20. Alternatively, the information processing device 20 may acquire candidate keywords, second keywords, and commands via the input interface of the terminal device 10. In this case, the control unit 21 may select the necessary candidates and store them in the storage unit 22 as keywords, second keywords, and commands. 【0101】 Here, a second keyword is a word used to further narrow down, among the functions identified by the keyword, the function that the control unit 21 should execute. Examples of second keywords include "Prius" and "Ariake." Second keywords are useful when the same keyword is registered to multiple functions.
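The registration in S101 and the three-way match performed in S301 can be sketched as follows. The registry entries are hypothetical, including a second instruction manual display function for a Corolla added only to create the ambiguity that a second keyword resolves, and literal substring matching is a simplification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entry:
    """A function registered with keywords, second keywords and commands (S101)."""
    name: str
    keywords: frozenset
    second_keywords: frozenset
    commands: frozenset


def register(name: str, keywords: Iterable[str], second_keywords: Iterable[str],
             commands: Iterable[str]) -> Entry:
    """Store every term in lower case so that matching is case-insensitive."""
    def norm(terms: Iterable[str]) -> frozenset:
        return frozenset(term.lower() for term in terms)
    return Entry(name, norm(keywords), norm(second_keywords), norm(commands))


# Hypothetical registrations: one keyword shared by two vehicles' manuals.
REGISTRY = [
    register("instruction_manual_display/prius", ["instruction manual"], ["Prius"], ["show me"]),
    register("instruction_manual_display/corolla", ["instruction manual"], ["Corolla"], ["show me"]),
    register("web_search_based_rag/ariake", ["driving spots"], ["Ariake"], ["tell me"]),
]


def contains_any(text: str, terms: frozenset) -> bool:
    return any(term in text for term in terms)


def select(text: str) -> Optional[str]:
    """S301: keyword and second keyword identify a function; its command triggers it."""
    t = text.lower()
    for e in REGISTRY:
        if (contains_any(t, e.keywords) and contains_any(t, e.second_keywords)
                and contains_any(t, e.commands)):
            return e.name
    return None
```

With only "instruction manual" and "show me" in the text, no entry's second keyword is present, so `select` returns `None` until a word such as "Prius" resolves the ambiguity. Literal matching would also fail to link "charge" in the Figure 8 example to the registered keyword "charging time", so a practical system would likely rely on the natural language processing mentioned in S200.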
【0102】 S200: The control unit 21 receives, via the input interface of the terminal device 10, input information including natural-language text. The natural-language text is provided as at least one of speech information and text information. If the input information contains text information, the control unit 21 may apply natural language processing or the like to that text and use it for subsequent processing. If the input information contains speech information, the control unit 21 transcribes the speech into text information using speech recognition processing, and may then apply natural language processing or the like to the transcribed text and use it for subsequent processing. 【0103】 S301: When the control unit 21 detects, in the text, a keyword registered to one of the multiple functions, a second keyword registered to that function, and a command registered to that function, it executes that function. Specifically, when the control unit 21 detects in the text a keyword associated with one of the multiple functions and a second keyword associated with that function, it identifies that function as the function to be executed, and when it further detects a command associated with that function, it executes the function. By executing the function, the control unit 21 obtains result information corresponding to the input information. 【0104】 For example, if the control unit 21 detects in the text the keyword "instruction manual," registered to the instruction manual display function, it identifies the instruction manual display function. However, if an instruction manual display function exists for each of several vehicles and the keyword "instruction manual" is registered to all of them, the control unit 21 cannot determine from that keyword alone which vehicle's instruction manual display function to execute.
In this case, if the control unit 21 detects the second keyword "Prius" in the text, it identifies the instruction manual display function for the Prius. If it then detects the command "show me," registered to that function, it executes the instruction manual display function for the Prius. 【0105】 S400: The control unit 21 outputs response information including the result information obtained by executing the function. In S301, the control unit 21 executes a function and acquires result information. The result information is information related to the function executed by the control unit 21, or information such as the contents of the RAG database that the large-scale language model should reference. The control unit 21 outputs response information including the acquired result information via the output interface of the terminal device 10. 【0106】 Thus, in Operation Example 3 of this embodiment, identifying and executing a function from the combination of a keyword, a second keyword, and a command further increases the probability that the function desired by the user is executed. The function execution method of the dialogue system is therefore improved in that accurate output results become easier to obtain. 【0107】 Here, the second keyword may include at least one of information on a vehicle type and information on a region. An example of vehicle type information is "Prius." As an example involving region information, if the natural-language sentence is "Tell me about driving spots in Ariake," then "driving spots" is the keyword and "Ariake" is the second keyword. 【0108】 Although Operation Examples 1 through 3 have been described as independent operation examples, these operation examples or their variations may be combined.
For example, the control unit 21 may use a large-scale language model in Operation Example 1 or Operation Example 3. In this case, after executing a function in S300 or S301, the control unit 21 may generate a prompt based on the input information and the result information, input the prompt into the large-scale language model to obtain a draft response, and output response information including the draft response. 【0109】 (Example of User Interface) An example of a user interface displayed by the terminal device 10 is described below. The control unit 21 of the information processing device 20 receives input information, including natural-language text, through the user interface shown in Figure 8. 【0110】 On the user interface screen shown in Figure 8, the natural-language sentence that the control unit 21 receives as input is "How long does it take to charge the new Prius using a 100V power supply?" The control unit 21 identifies and executes the catalog-based RAG function for the new Prius, for which the keyword "charging time," the vehicle-model second keyword "new Prius," and the command "how long does it take" are registered. The control unit 21 then outputs response information including the result information. Specifically, the control unit 21 generates a prompt based on the result information, namely the database information of the catalog-based RAG for the new Prius, and on the input information containing the natural-language sentence, and inputs the prompt into the large-scale language model to obtain a draft response. Using this draft response, the control unit 21 outputs "Approximately 8 hours" as the response information, which is an appropriate answer for charging with a 100V power supply. 【0111】 Furthermore, the natural-language text may include the content of a dialogue between multiple speakers.
That is, the control unit 21 may extract keywords, second keywords, and commands in light of the context of text containing a dialogue between multiple speakers. 【0112】 Furthermore, if the natural-language text does not contain any registered keyword, second keyword, or command, the control unit 21 may generate a prompt based on the text without executing any of the functions. In this case, the control unit 21 may input the generated prompt into the large-scale language model to obtain a draft response and display it as response information. 【0113】 While this disclosure has been described based on the drawings and embodiments, it should be noted that those skilled in the art can easily make various modifications and alterations based on this disclosure, and that such modifications and alterations fall within the scope of this disclosure. For example, the functions included in each component or step (indicated by S100, etc.) can be rearranged in any logically consistent manner, and multiple components or steps can be combined into one or divided. 【0114】 Reference signs: 1 System; 10 Terminal device; 11 Control unit; 12 Storage unit; 13 Communication unit; 14 Input unit; 15 Output unit; 20 Information processing device; 21 Control unit; 22 Storage unit; 23 Communication unit; 30 First server; 31 Control unit; 32 Storage unit; 33 Communication unit; 40 Second server; 41 Control unit; 42 Storage unit; 43 Communication unit; 50 Network
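Combining the large-scale language model with keyword and command matching, as contemplated in 【0108】, together with the fallback in 【0112】, might look like the following Python sketch. `execute` and `llm` are caller-supplied placeholders for running a registered function and for the request sent to the first server 30. The prompt template, the single registry entry, and the joining of dialogue turns into one string are assumptions, not the disclosed method.

```python
from typing import Callable, Optional

# Hypothetical registry: function name -> (keywords, commands).
REGISTRY = {"instruction_manual_display": ({"instruction manual", "manual"}, {"show me"})}


def build_prompt(dialogue: list[str], result_info: Optional[str]) -> str:
    """Combine the input text (possibly several speakers' turns) with any result information."""
    lines = ["Conversation:", *dialogue]
    if result_info is not None:
        lines += ["Reference information:", result_info]
    lines.append("Answer the last question, using the reference information if present.")
    return "\n".join(lines)


def answer(dialogue: list[str],
           execute: Callable[[str], str],
           llm: Callable[[str], str]) -> str:
    """Execute a matched function if any, then obtain a draft response from the LLM.

    When no registered keyword and command are found, no function runs and the
    prompt is built from the text alone (the fallback of paragraph 0112).
    """
    # Joining the turns lets a keyword and a command come from different speakers.
    text = " ".join(dialogue).lower()
    result_info = None
    for name, (keywords, commands) in REGISTRY.items():
        if any(k in text for k in keywords) and any(c in text for c in commands):
            result_info = execute(name)
            break
    return llm(build_prompt(dialogue, result_info))
```

Because both the function runner and the model are injected, the same control flow serves Operation Example 2 (with result information in the prompt) and the no-match fallback (without it).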
Claims
1. A method to be executed by an information processing device, comprising: registering one or more keywords and one or more commands for each of a plurality of functions; receiving input information including text in natural language; when a keyword and a command registered for any of the plurality of functions are detected in the text, executing the function; and outputting response information including result information obtained as a result of executing the function.
2. A method according to claim 1, further comprising: generating a prompt based on the input information and the result information; inputting the prompt into a large-scale language model to obtain a draft response; wherein the response information includes the draft response.
3. The method according to claim 2, wherein the plurality of functions include at least one of skill-based functions and non-skill-based functions.
4. A method according to claim 3, wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search.
5. A method according to claim 4, further comprising: registering one or more second keywords for each of the plurality of functions; and executing the function when a keyword registered for any of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected in the text.
6. The method according to claim 5, wherein the second keyword includes at least one of information relating to a vehicle type or a region.
7. The method according to claim 6, wherein the text includes the content of a dialogue between multiple speakers.
8. An information processing device comprising a processor that performs processing including registering one or more keywords and one or more commands for each of multiple functions, receiving input information including text in natural language, executing a function when a keyword and a command registered in any of the multiple functions are detected in the text, and outputting response information including result information obtained as a result of executing the function.
9. An information processing device according to claim 8, wherein the processing further includes: generating a prompt based on the input information and the result information; and inputting the prompt into a large-scale language model to obtain a draft response, and wherein the response information includes the draft response.
10. An information processing device according to claim 9, wherein the plurality of functions include at least one of skill-based functions and non-skill-based functions.
11. An information processing device according to claim 10, wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search.
12. An information processing device according to claim 11, wherein the processing further includes: registering one or more second keywords for each of the plurality of functions; and executing a function when a keyword registered for any of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected in the text.
13. An information processing device according to claim 12, wherein the second keyword includes at least one of vehicle type information or region information.
14. An information processing device according to claim 13, wherein the text includes the content of a dialogue between multiple speakers.
15. A non-transitory computer-readable medium storing a program that, when executed by an information processing device, causes the information processing device to perform processing including: registering one or more keywords and one or more commands for each of a plurality of functions; receiving input information including text in natural language; when a keyword and a command registered for any of the plurality of functions are detected in the text, executing the function; and outputting response information including result information obtained as a result of executing the function.
16. A non-transitory computer-readable medium according to claim 15, wherein the processing further includes: generating a prompt based on the input information and the result information; and inputting the prompt into a large-scale language model to obtain a draft response, and wherein the response information includes the draft response.
17. A non-transitory computer-readable medium according to claim 16, wherein the plurality of functions include at least one of skill-based functions and non-skill-based functions.
18. A non-transitory computer-readable medium according to claim 17, wherein the skill-based function includes at least one of product specification search, used car market price search, catalog list display, instruction manual display, test drive vehicle search, and delivery date search.
19. A non-transitory computer-readable medium according to claim 18, wherein the processing further includes: registering one or more second keywords for each of the plurality of functions; and executing a function when a keyword registered for any of the plurality of functions, a second keyword registered for that function, and a command registered for that function are detected in the text.
20. A non-transitory computer-readable medium according to claim 19, wherein the second keyword includes at least one of vehicle type information or region information.