Information processing device, information processing method, and information processing program
The information processing device and method enhance conversation support by using tag information to identify user interests beyond text-based proper nouns, enabling more relevant and accurate information presentation.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- PIONEER IP
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-26
Description
Technical Field
[0001] This application belongs to the technical field of information processing apparatuses, information processing methods, and information processing programs. More specifically, it belongs to the technical field of information processing apparatuses and information processing methods that perform processing based on inputs such as utterances from users, and programs for such information processing apparatuses.
Background Art
[0002] In recent years, research and development have been conducted on conversation support systems that support user conversations. As a document showing prior art corresponding to such a current situation, for example, the following Patent Document 1 can be cited.
[0003] Patent Document 1 discloses a conversation support system that performs conventional morphological analysis on a speaker's utterance and records the proper nouns contained in the uttered sentence as that speaker's interest words. When a user, as a speaker, asks the conversation support system for a topic in common with another user whom the user is scheduled to meet, the system determines whether the two users share any interest words. If they do, the system searches an external information providing site based on the shared interest words and provides the user with a topic to use in conversation when meeting the other user.
Prior Art Documents
Patent Documents
[0004]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0005] However, in the prior art disclosed in Patent Document 1, the above-mentioned interest words, which are proper nouns, are extracted from words contained in the text as a user's utterance. Therefore, it was not possible to identify words that were not directly contained in the text, such as the user's utterance, as interest words. Consequently, there is room for improvement in terms of providing the user with topics and information using matters related to the utterance other than the text itself.
[0006] Therefore, this application was made in view of the above-mentioned requests for improvement, and one example of the problem is to provide an information processing device, an information processing method, and a program for the information processing device that enable the presentation of information to a user in accordance with the user's preferences.
Means for Solving the Problem
[0007] To solve the above problems, the invention described in claim 1 includes: a first acquisition means for acquiring first tag information which includes at least a first tag that indicates either the relationship between components constituting the input from the user, a pre-set attribute to which the content of the input belongs, or a summary of the content, and is attached to the input; a second acquisition means for acquiring second tag information which includes at least a second tag that indicates either the relationship between components constituting the obtainable information, a pre-set attribute to which the content of the obtainable information belongs, or a summary of the content, and is attached to the obtainable information; a generation means for generating presentation information to be presented to the user based on each of the acquired first tag information and second tag information; and a presentation means for presenting the generated presentation information to the user.
[0008] To solve the above problems, the invention described in claim 7 is an information processing method to be performed in an information processing apparatus comprising a first acquisition means, a second acquisition means, a generation means, and a presentation means, the method comprising: a first acquisition step of acquiring first tag information by the first acquisition means, which includes at least a first tag that indicates either the relationship between components constituting an input from a user, a pre-set attribute to which the content of the input belongs, or a summary of the content, and is attached to the input; a second acquisition step of acquiring second tag information by the second acquisition means, which includes at least a second tag that indicates either the relationship between components constituting the obtainable information, a pre-set attribute to which the content of the obtainable information belongs, or a summary of the content, and is attached to the obtainable information; a generation step of generating presentation information to be presented to the user by the generation means based on each of the acquired first tag information and second tag information; and a presentation step of presenting the generated presentation information to the user by the presentation means.
[0009] To solve the above problems, the invention described in claim 8 makes a computer included in an information processing device function as: a first acquisition means for acquiring first tag information which includes at least a first tag that indicates either the relationship between components constituting the input from the user, a pre-set attribute to which the content of the input belongs, or a summary of the content, and is attached to the input; a second acquisition means for acquiring second tag information which includes at least a second tag that indicates either the relationship between components constituting the obtainable information, a pre-set attribute to which the content of the obtainable information belongs, or a summary of the content, and is attached to the obtainable information; a generation means for generating presentation information to be presented to the user based on each of the acquired first tag information and second tag information; and a presentation means for presenting the generated presentation information to the user.
Brief Description of the Drawings
[0010]
[Figure 1] A block diagram showing the outline configuration of the information processing device of the embodiment.
[Figure 2] Diagrams showing the general configuration of the conversation support system of the first embodiment, including the terminal device of the first embodiment; (a) is the block diagram, and (b) is a diagram illustrating the recording method of tag information, etc., of the first embodiment.
[Figure 3] A flowchart showing the information processing in the first embodiment.
[Figure 4] Diagram (I) illustrating prompts and their responses in the information processing of the first embodiment; (a) to (d) are diagrams showing examples (i) to (iv), respectively.
[Figure 5] Diagram (II) illustrating a prompt and its response in the information processing of the first embodiment.
[Figure 6] A flowchart showing the information processing of the first modified example.
[Figure 7] Diagrams illustrating prompts and their responses in the information processing of the first modified example; (a) and (b) are diagrams showing examples (i) and (ii), respectively.
[Figure 8] A flowchart showing the information processing of the second modified example.
[Figure 9] Diagrams illustrating prompts and their responses in the information processing of the second modified example; (a) and (b) are diagrams showing examples (i) and (ii), respectively.
[Figure 10] A flowchart showing the information processing in the second embodiment.
[Figure 11] Diagram (I) illustrating prompts and their responses in the information processing of the second embodiment; (a) and (b) are diagrams showing examples (i) and (ii), respectively.
[Figure 12] Diagram (II) illustrating a prompt and its response in the information processing of the second embodiment.
[Figure 13] A flowchart showing the information processing in the third embodiment.
[Figure 14] Diagrams illustrating prompts and their responses in the information processing of the third embodiment; (a) and (b) are diagrams showing examples (i) and (ii), respectively.
BEST MODE FOR CARRYING OUT THE INVENTION
[0011] Next, the mode for carrying out the present application will be described with reference to FIG. 1. Note that FIG. 1 is a block diagram showing the schematic configuration of the information processing apparatus of the embodiment.
[0012] As shown in FIG. 1, the information processing apparatus S of the embodiment includes a first acquisition unit 3, a second acquisition unit 3A, a generation unit 6, and a presentation unit 6A.
[0013] In this configuration, the first acquisition unit 3 acquires first tag information that is attached to an input from the user and that includes at least a first tag indicating any one of the relationship between the components constituting the input, a preset attribute to which the content of the input belongs, or a summary of the content. Further, the second acquisition unit 3A acquires second tag information that is attached to acquirable information, which can be acquired from the outside separately from the above input, and that includes at least a second tag indicating any one of the relationship between the components constituting the acquirable information, a preset attribute to which the content of the acquirable information belongs, or a summary of the content.
[0014] Then, the generation unit 6 generates presentation information to be presented to the user based on each of the acquired first tag information and second tag information. The presentation unit 6A then presents the generated presentation information to the user.
[0015] As described above, according to the operation of the information processing apparatus S of the embodiment, since the presentation information addressed to the user is generated based on the first tag information and the second tag information, the presentation information conforming to the user's preference can be presented to the user.
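The relationship between the four units can be illustrated with a minimal Python sketch. The specification does not state at this point how the generation unit 6 combines the two kinds of tag information, so the tag-overlap ranking below is purely an assumption for illustration, and the names TagInfo and generate_presentation, as well as the sample contents, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TagInfo:
    """Tags attached to one piece of content (a user input or acquirable information)."""
    content: str
    tags: frozenset[str] = frozenset()


def generate_presentation(first: TagInfo, candidates: list[TagInfo]) -> list[str]:
    """Return acquirable items that share tags with the user's input, most shared first.

    Ranking by tag overlap is an assumed rule, not one taken from the specification.
    """
    scored = [(len(first.tags & c.tags), c.content) for c in candidates]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [content for score, content in ranked if score > 0]


# Hypothetical first tag information (from a user input) and second tag information.
user_input = TagInfo("an utterance about perfume", frozenset({"perfume", "interest", "hobby"}))
acquirable = [
    TagInfo("article on fragrance shops", frozenset({"perfume", "hobby"})),
    TagInfo("traffic report", frozenset({"traffic"})),
]
presentation = generate_presentation(user_input, acquirable)
```

In this sketch only the article sharing the tags [perfume] and [hobby] would be selected for presentation; the traffic report shares no tag with the input and is dropped.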
EXAMPLE
[0016] Next, several specific embodiments corresponding to the above-described embodiment will be explained with reference to the drawings. Each embodiment described below is applied to a terminal device that can be connected, via a network such as the Internet, to a conversation support system that can utilize a large-scale language model (LLM: Large Language Model) on a so-called cloud. The terminal device is used by a user (applicant) who uses the language model system. Specifically, the terminal device is implemented as, for example, a smartphone carried by the user or a personal computer used by the user.
(I) First Embodiment
[0017] First, the first embodiment corresponding to the above embodiment will be described using Figures 2 to 5. Figure 2 is a block diagram showing the general configuration of the conversation support system of the first embodiment, including the terminal device of the first embodiment; Figure 3 is a flowchart showing the information processing of the first embodiment; and Figures 4 and 5 are diagrams illustrating the prompts and responses in the information processing, respectively. In Figure 2, each component of the first embodiment that corresponds to a component of the information processing device S of the embodiment shown in Figure 1 is given the same component number as that component.
(1) Configuration and Operation of the Conversation Support System in the First Embodiment
[0018] As shown in Figure 2(a), the conversation support system SS of the first embodiment consists of a terminal device T of the first embodiment used by the user, a language model system 100 that can utilize the large-scale language model MD on, for example, a so-called cloud, and the network NW that connects the terminal device T and the language model system 100. The terminal device T consists of an interface 3, a processing unit 10 consisting of a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), etc., an operation unit 11 consisting of operation buttons or a touch panel, etc., a microphone 12, a display 13 consisting of a liquid crystal display, etc., and a speaker 14.
[0019] The processing unit 10 is composed of a speech acquisition unit 1, a prompt generation unit 2, a recording unit 5, and an output generation unit 6. In this case, the speech acquisition unit 1, the prompt generation unit 2, the recording unit 5, and the output generation unit 6 may be realized by hardware logic circuits such as the CPU that constitute the processing unit 10, or they may be realized in software by the CPU executing a program corresponding to the flowchart (see Figure 3) showing the information processing of the first embodiment, which will be described later. In this case, the program may be one that is pre-recorded in the terminal device T, or one that is acquired each time via the network NW. The interface 3 corresponds to an example of the first acquisition means 3 and an example of the second acquisition means 3A of the embodiment, as well as an example of the "content information acquisition means," an example of the "first transmission means," an example of the "first reception means," an example of the "acquirable information acquisition means," and an example of the "second transmission means" of the present application. The output generation unit 6 corresponds to an example of the generation means 6 and an example of the presentation means 6A of the embodiment, as well as an example of the "first instruction information generation means" and an example of the "second instruction information generation means" of the present application. Furthermore, as shown by the dashed lines in Figure 2, the interface 3 and the output generation unit 6 constitute an example of the information processing device S of this embodiment.
[0020] In the above configuration, under the control of the processing unit 10, interface 3 controls the exchange of data and information between the terminal device T and the language model system 100 via the network NW. Meanwhile, under the control of the processing unit 10, microphone 12 collects the sound of speech uttered by the user using the terminal device T and outputs the collected sound as an audio signal to the speech acquisition unit 1 of the processing unit 10. The speech acquisition unit 1 then transcribes the content of the speech based on the audio signal and outputs the transcribed speech to the prompt generation unit 2 and the recording unit 5. At this time, in response to the acquisition of the speech, the speech acquisition unit 1 acquires date and time information indicating the date and time of acquisition, location information indicating the geographical location of acquisition, and user identification information to identify the user who made the speech from others. Here, the date and time information is acquired, for example, from a timer (not shown) provided in the processing unit 10, and the location information is acquired, for example, from a location sensor (not shown) provided in the terminal device T (for example, a location sensor included in GNSS (Global Navigation Satellite System) or a standalone location sensor). Furthermore, the user identification information mentioned above can be, for example, the information entered by the user when they start using terminal device T and recorded in the recording unit 5.
[0021] The prompt generation unit 2 then generates, based on the transcribed utterance obtained from the speech acquisition unit 1, a prompt of the first embodiment for obtaining tag information of the first embodiment from the language model system 100. In addition to tags such as nouns obtained from the transcribed utterance by the same morphological analysis method as in the past, this tag information includes:
(i) tags indicating the context of the content of the utterance,
(ii) a tag indicating the category (attribute) to which the utterance belongs, and
(iii) tags indicating a summary of the utterance.
The prompt of the first embodiment, which will be described in detail later, is generated as a prompt that allows the above tag information to be obtained from the language model system 100 (more specifically, from the large-scale language model MD contained therein), and is output to the language model system 100 via interface 3 and network NW. In this case, the above context corresponds to an example of the "relationship between constituent elements" of the present invention.
[0022] Next, the language model system 100, to which the prompt of the first embodiment was output, outputs response information including tag information of the first embodiment corresponding to that prompt. This response information is acquired by the processing unit 10 via the network NW and interface 3. Subsequently, the recording unit 5 of the processing unit 10 associates the utterance output from the speech acquisition unit 1, the tag information included in the acquired response information, the date and time information, the location information, and the user identification information, etc., with one another and records them in a non-volatile manner. As a result, as illustrated in Figure 2(b), the user identification information ID, the utterance V, the date and time information TM, the location information PL, and the tag information TG are associated with each other and recorded in a non-volatile manner in the recording unit 5 as speech-related tag information 50 for each user and each utterance V.
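The association recorded in the recording unit 5 can be pictured as one record per utterance. The following Python sketch is only an illustration: the field names, the use of a latitude/longitude pair for the location information, and the in-memory dictionary standing in for non-volatile storage are all assumptions, not details given in the specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class UtteranceTagRecord:
    """One entry of the tag information 50 (field names are hypothetical)."""
    user_id: str                    # user identification information ID
    utterance: str                  # transcribed utterance V
    acquired_at: datetime           # date and time information TM
    location: tuple[float, float]   # location information; (latitude, longitude) is assumed
    tags: list[str]                 # tag information TG returned by the language model


class RecordingUnit:
    """In-memory stand-in for the recording unit 5; a real device would persist the data."""

    def __init__(self) -> None:
        self._records: dict[str, list[UtteranceTagRecord]] = {}

    def record(self, rec: UtteranceTagRecord) -> None:
        # Records are grouped per user and kept per utterance.
        self._records.setdefault(rec.user_id, []).append(rec)

    def find_by_tag(self, user_id: str, tag: str) -> list[UtteranceTagRecord]:
        # Look up a user's past utterances that carry a given tag.
        return [r for r in self._records.get(user_id, []) if tag in r.tags]


unit = RecordingUnit()
rec = UtteranceTagRecord("user-1", "sample utterance", datetime(2024, 12, 16, 9, 0),
                         (35.0, 139.0), ["perfume", "hobby"])
unit.record(rec)
```

Keeping the tags alongside the user, time, and place is what later allows related past utterances to be looked up by tag, for example with unit.find_by_tag("user-1", "hobby").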
[0023] Next, the output generation unit 6 of the processing unit 10 generates information to be presented (broadcast) to the user via the speaker 14 based on the speech-related tag information 50, which contains the tag information TG and the like in relation to each other, and outputs it to the speaker 14. As a result, the speaker 14, under the control of the processing unit 10, broadcasts a sound corresponding to the information output from the output generation unit 6. If the information is to be presented to the user visually, the output generation unit 6 outputs the information to the display 13. As a result, the display 13, under the control of the processing unit 10, displays an image or the like corresponding to the information output from the output generation unit 6. Furthermore, when an operation necessary to perform the operation of the terminal device T, including the information processing of the first embodiment, is performed in the operation unit 11, the operation unit 11 generates an operation signal corresponding to the operation and outputs it to the processing unit 10. As a result, the processing unit 10 comprehensively controls each of the above functions of the terminal device T, including the information processing of the first embodiment, based on the operation signal.
(2) Information Processing in the First Embodiment
[0024] Next, the information processing of the first embodiment, which is executed mainly by the processing unit 10 of the terminal device T having the configuration described above, will be specifically explained with reference to Figures 2 to 5.
[0025] The information processing in the first embodiment begins, for example, when a power switch (not shown) of the terminal device T is turned on. As shown in Figure 3, once the information processing in the first embodiment begins, the processing unit 10 first monitors whether the sound collection result corresponding to the sound of the user's speech has been acquired as an audio signal (step S1). If the audio signal is not acquired in the monitoring in step S1 (step S1: NO), the processing unit 10 proceeds to step S7, which will be described later.
[0026] On the other hand, if the above audio signal is acquired during monitoring in step S1 (step S1: YES), the speech acquisition unit 1 of the processing unit 10 then converts the content of the speech into text based on the audio signal using a conventional method, and outputs the transcribed speech to the prompt generation unit 2 and the recording unit 5 (step S2). In addition, the speech acquisition unit 1 acquires the above date and time information TM, the above location information PS, and the above user identification information ID, etc., in response to the acquisition of the speech.
[0027] Next, the prompt generation unit 2 generates the prompt of the first embodiment based on the transcribed utterance (step S3). At this time, if the user utters an utterance V1 having the content illustrated in the left of Figure 4(a), the prompt generation unit 2 generates the prompt PT1 of the first embodiment (hereinafter referred to as "prompt PT1," etc.) which includes the utterance V1 and an instruction statement OR1 that requests tag information including a plurality of tags (see (i) above) indicating the context of the content of the utterance V1 from the language model system 100. Subsequently, the prompt generation unit 2 outputs the generated prompt PT1, along with terminal identification information for identifying terminal device T from other terminal devices, to the language model system 100 via interface 3 and network NW (step S4).
[0028] Next, the processing unit 10 determines whether or not response information containing desired tag information corresponding to the prompt PT1 has been received from the language model system 100 (step S5). If the determination in step S5 is that the response information has not been received (step S5: NO), the processing unit 10 waits until the response information is received. On the other hand, if the determination in step S5 is that the response information has been received (step S5: YES), the recording unit 5 of the processing unit 10 records the tag information contained in the received response information as speech-related tag information 50, associating it with the original utterance V1, the corresponding date and time information TM and location information PS, and the user identification information ID, etc. (step S6). After that, the processing unit 10 determines whether or not to terminate the information processing of the first embodiment for reasons such as an operation to terminate the information processing of the first embodiment being performed in the operation unit 11 (step S7). If the determination in step S7 is to terminate the information processing of the first embodiment (step S7: YES), the processing unit 10 terminates the information processing of the first embodiment. On the other hand, if the determination in step S7 is to continue the information processing of the first embodiment (step S7: NO), the processing unit 10 returns to step S1 and continues the series of information processing described above.
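Steps S1 to S7 can be summarized in a short Python sketch. It is a simplification under stated assumptions: the functions build_prompt and run_tagging are hypothetical, the language model system 100 is replaced by a caller-supplied function, the wait in step S5 is collapsed into a blocking call, and the end of the input sequence stands in for the termination decision of step S7. The prompt layout and the sample texts are also invented for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def build_prompt(utterance: str, instruction: str) -> str:
    """Combine an instruction statement and an utterance into one prompt (layout is assumed)."""
    return f"{instruction}\n\nUtterance: {utterance}"


def run_tagging(
    utterances: Iterable[Optional[str]],
    ask_model: Callable[[str], list[str]],
    instruction: str,
) -> list[tuple[str, list[str]]]:
    """Process transcribed utterances one by one and collect (utterance, tags) pairs."""
    recorded: list[tuple[str, list[str]]] = []
    for text in utterances:                        # one pass of the S1 monitoring loop
        if text is None:                           # S1: NO, nothing acquired, go to S7
            continue
        prompt = build_prompt(text, instruction)   # S3: generate the prompt
        tags = ask_model(prompt)                   # S4 and S5: send and wait for the response
        recorded.append((text, tags))              # S6: record tags with the utterance
    return recorded                                # input exhausted: treated as S7: YES


def fake_model(prompt: str) -> list[str]:
    # Stand-in for the language model system 100; returns fixed tags for the demo.
    return ["perfume", "interest"] if "perfume" in prompt else []


result = run_tagging([None, "I am curious about perfume"], fake_model,
                     "List tags that indicate the context of the utterance.")
```

Passing the model call in as a parameter keeps the sketch runnable without a network connection; in the described system the prompt would instead travel through interface 3 and network NW.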
[0029] Here, the tag information that is received as part of the above response information (see step S5: YES) and recorded in association with the utterance V1, etc. (see step S6 above) is, for example, the tag information TG1 shown on the right of Figure 4(a), which was included in the response information RS1 corresponding to the prompt PT1 shown on the left of Figure 4(a), and is recorded in the recording unit 5. At this time, when compared with the content of the original utterance V1, the tags [perfume] and [interest] included in the tag information TG1 are tags assigned from the content of the utterance V1 by the above morphological analysis method, and the other tags included in the tag information TG1 are tags assigned to indicate the context of the content of the utterance V1.
[0030] In addition to those illustrated in Figure 4(a), several other examples of prompt and response information exchanged between the language model system 100 and the information processing system in the first embodiment are also possible, as shown below.
(a) Example 1
As a first example corresponding to the first embodiment, the prompt generation unit 2 can generate a prompt PT2 that specifies the format of the tags in the tag information returned in the response information from the language model system 100, as in the instruction statement OR2 included in the prompt PT2 of the first embodiment illustrated on the left of Figure 4(b), and send it to the language model system 100. In this case, the tag information TG2 included in the response information RS2 contains multiple tags in the format specified by the instruction statement OR2, as illustrated on the right of Figure 4(b), and is received and recorded in the recording unit 5. With such a prompt PT2, the output format of the tag information can be standardized.
(b) Example 2
[0031] Next, as a second example corresponding to the first embodiment, when a user utters an utterance V2 having the content illustrated on the left of Figure 4(c), the prompt generation unit 2 can generate a prompt PT3 that includes the utterance V2 and an instruction statement OR3 requesting, from the language model system 100, tag information including multiple tags (see (ii) above) indicating the category (attribute) to which the utterance V2 belongs, and send it to the language model system 100. In this case, the tag information TG3 included in the response information RS3 includes the multiple tags specified by the instruction statement OR3, as illustrated on the right of Figure 4(c), and is received and recorded in the recording unit 5. At this time, when compared with the content of the original utterance V2, the tags [weekend] and [going out] included in the tag information TG3 are tags assigned from the content of the utterance V2 by the morphological analysis method described above, and the other tags included in the tag information TG3 are tags assigned to indicate the category to which the content of the utterance V2 belongs.
(c) Example 3
[0032] Next, as a third example corresponding to the first embodiment, when a user utters an utterance V3 having the content illustrated on the left of Figure 4(d), the prompt generation unit 2 can generate a prompt PT4 that includes the utterance V3 and an instruction statement OR3 similar to that of the second example above, requesting tag information including multiple tags (see (ii) above) indicating the category (attribute) to which the utterance V3 belongs, and transmit it to the language model system 100. In this case, the tag information TG4 included in the response information RS4 includes the multiple tags specified by the instruction statement OR3, as illustrated on the right of Figure 4(d), and is received and recorded in the recording unit 5. When compared with the content of the original utterance V3, the tags [this week], [busy], [assignment], [class], and [preparation] included in the tag information TG4 are tags assigned from the content of the utterance V3 by the morphological analysis method described above, while the remaining tag [deadline] is a tag assigned to indicate the category to which the content of the utterance V3 belongs.
(d) Example 4
[0033] Next, as a fourth example corresponding to the first embodiment, when a user utters an utterance V1 having the content illustrated on the left of Figure 5, the prompt generation unit 2 can generate a prompt PT5 that includes the utterance V1 and an instruction statement OR4 requesting, from the language model system 100, tag information including a plurality of tags indicating the context of the content of the utterance V1 (see (i) above) and a tag indicating the target of the utterance V1 (i.e., to whom the utterance is directed), and send it to the language model system 100. In this case, the tag information TG5 included in the response information RS5 includes the plurality of tags specified by the instruction statement OR4, as illustrated on the right of Figure 5, and is received and recorded in the recording unit 5. At this time, when the content of the original utterance V1 and the content of the instruction statement OR4 are considered, the tags [perfume] and [interest] included in the tag information TG5 are tags assigned from the content of the utterance V1 by the morphological analysis method described above, and the remaining tags [lack of knowledge] and [hobby] are tags assigned to indicate the context of the content of the utterance V1. Furthermore, the tag [unspecified] is assigned to indicate the target of the utterance. It is desirable that the response information RS5 be recorded in the recording unit 5 with an attribute such as a "Disclosure Scope" tag.
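A standardized output format makes the returned tag information easy to parse on the terminal side. The Python sketch below assumes, purely for illustration, that the specified format is a sequence of tags in square brackets, matching the way tags such as [perfume] are written in this description; the actual format requested by the instruction statement OR2 is shown only in Figure 4(b), and the function name parse_tags is hypothetical.

```python
from __future__ import annotations

import re

# Matches one bracketed tag such as "[perfume]"; nested brackets are not supported.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(response_text: str) -> list[str]:
    """Extract bracketed tags from a response, preserving order and dropping duplicates."""
    tags: list[str] = []
    for raw in _TAG_PATTERN.findall(response_text):
        tag = raw.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


parsed = parse_tags("[perfume] [interest] [lack of knowledge] [hobby] [perfume]")
```

If the response arrived in free-form text instead, this kind of parsing would be unreliable, which is one practical reason to fix the format in the prompt.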
[0034] Here, regarding the target of the utterance in the fourth example above, utterances that the user makes for themselves, such as so-called "reminders" or "schedules," are tagged with [myself]. This makes it possible to configure the system so that, for example, when one of the user's own utterances is to be quoted in an utterance addressed to someone other than the user, the quoted utterance is chosen from recorded utterances that are not tagged with [myself], or from utterances tagged with [unspecified].
[0035] Furthermore, regarding the target of the utterance in the fourth example above, utterances directed at specific individuals other than the user, or utterances (conversations) within a group to which the user belongs, can be recorded in the recording unit 5 with tags that limit the scope of disclosure to only those specific individuals, the group itself, or those included in that group. Specifically, such tags can be assigned and recorded, for example, for the person or group identified in the utterance.
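The quoting rule described above amounts to a filter on the target tag. The following Python sketch shows one possible reading; the dictionary keys, the function name quotable_for_others, and the sample utterances are hypothetical, and the tag values are written in lowercase as an assumption.

```python
from __future__ import annotations


def quotable_for_others(records: list[dict[str, str]]) -> list[str]:
    """Return utterances that may be quoted to someone other than the user.

    Utterances whose target tag is "myself" (reminders, schedules) are excluded.
    """
    return [r["utterance"] for r in records if r.get("target") != "myself"]


# Hypothetical recorded utterances with their target tags.
history = [
    {"utterance": "Remember to buy milk tomorrow", "target": "myself"},
    {"utterance": "I have become interested in perfume", "target": "unspecified"},
]
quotes = quotable_for_others(history)
```

A stricter variant could accept only utterances tagged [unspecified]; the description allows either configuration.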
[0036] When using the above-mentioned tag [Disclosure Scope], identifying specific individuals or groups within the disclosure scope can be done, for example, in a face-to-face conversation, by using the voiceprint of the participant as a clue, or by identifying the person being spoken to from the specific name included in the participant's utterance. In this case, if a nickname is used, the person can be identified by having the other party verbally confirm the name registered as their own. In contrast, in the case of so-called online conversations, the person can be identified using the identification information (ID) used in that online conversation, and in the case of voice-only conversations (calls), the voiceprint can be recorded and used for identification. Using this configuration, it is also possible to associate tag information recorded during face-to-face conversations with tag information recorded during online conversations.
[0037] Furthermore, in the fourth example above, if a group conversation is taking place, it is necessary to identify the participants within that group. In this case, voiceprint authentication can be used to recognize the number of participants. On the other hand, if a camera capable of capturing images of each participant can be used, more accurate authentication of the speaker can be achieved by matching the authentication results obtained using, for example, artificial intelligence lip-reading technology based on the image captured by the camera with the content of the utterances associated with each participant's voiceprint. Moreover, by using such an authentication method, it becomes possible to associate the tag information assigned to online conversations with that of face-to-face conversations. In the case of group conversations, it is preferable to set the disclosure scope indicated by the tags separately for the group itself and for each of the individuals participating in that group.
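Setting the disclosure scope separately for a group and for individuals can be modeled as a small access check. The Python sketch below is an assumption-laden illustration: the class DisclosureScope, the function may_disclose, the membership table, and all sample names are hypothetical, and the identification of speakers by voiceprint or camera is outside its scope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DisclosureScope:
    """Who may see a recorded utterance: named persons and/or whole groups."""
    persons: frozenset[str] = frozenset()
    groups: frozenset[str] = frozenset()


def may_disclose(scope: DisclosureScope, requester: str,
                 memberships: dict[str, set[str]]) -> bool:
    """Return True if the requester is named directly or belongs to an allowed group."""
    if requester in scope.persons:
        return True
    return any(requester in memberships.get(group, set()) for group in scope.groups)


# Hypothetical group membership and scope for one group conversation.
memberships = {"study-group": {"alice", "bob"}}
scope = DisclosureScope(persons=frozenset({"carol"}), groups=frozenset({"study-group"}))
allowed = [name for name in ("alice", "carol", "dave") if may_disclose(scope, name, memberships)]
```

Here alice is admitted through group membership and carol through an individual entry, while dave is refused because he matches neither.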
[0038] As explained above, according to the information processing in the conversation support system SS including the terminal device T of the first embodiment, tag information TG1, etc., including tags generated by the large-scale language model MD, is recorded in association with the content of the user's utterance (see Figure 3). For example, when using the tag information TG1, etc., to identify past related utterances by that user or to identify relationships with other people, the accuracy of such identification can be improved.
[0039] Furthermore, since user identification information IDs for identifying users are also recorded in association (see Figure 2(b)), the accuracy of identifying past related utterances in relation to the user can be further improved by using this tag information TG1, etc.
[0040] Furthermore, according to the fourth example above, tag information TG5, which further includes tags indicating the scope of disclosure of the utterance, is recorded in association with the data, thus further improving the accuracy of identifying past related utterances using the tag information TG5.
[0041] Furthermore, according to the fourth example above, utterances from multiple parties, including the user of terminal device T, are acquired, and tag information, which includes a tag that limits the scope of disclosure of the content of each utterance to only those parties, is recorded in association with each utterance (see Figure 5). This improves the accuracy of identifying past related utterances among multiple parties using the tag information.
[0042] Furthermore, according to the fourth example above, a group is formed by multiple persons, including the user of terminal device T. Each of these persons and / or the group is identified, and tag information, which further includes a tag that limits the scope of disclosure of the content of each utterance to within that group, is recorded in association with each utterance, etc. Therefore, the accuracy of identifying past related utterances within the group using this tag information can be improved.
[0043] Furthermore, since the utterance V1 etc. is a user utterance consisting of a sentence, and the content of that utterance is transcribed into text (see Figure 3, step S2), tag information TG1 etc., which includes tags generated by the large-scale language model MD, is recorded in association with the content of the user's utterance. For example, when using the tag information TG1 etc. to identify past related utterances by that user or to identify their relationships with other people, the accuracy of such identification can be improved.
[0044] [Modifications] (a) First modification Next, a first modified example of the first embodiment will be described using Figures 6 and 7. Figure 6 is a flowchart showing the information processing of the first modified example, and Figure 7 is a diagram illustrating the prompt and its response information in the information processing of the first modified example.
[0045] In the first embodiment described above, the tag information TG1 etc. attached to the utterance V1 etc. is associated with the user identification information ID etc. and recorded in the recording unit 5 as utterance-related tag information 50. In contrast, in the first modified example described below, along with the tag information TG1 etc., the response sentence spoken as response information to the utterance V1 etc. is also acquired from the language model system 100. The hardware configuration of the conversation support system in the first modified example is basically the same as that of the conversation support system SS in the first embodiment. Therefore, in the following description, components that are the same as those in the conversation support system SS in the first embodiment will be given the same component numbers and detailed explanations will be omitted. Also, among the information processing of the first modified example executed in the conversation support system of the first modified example, processing that is the same as the information processing of the first embodiment (see Figure 3) will be given the same step numbers and detailed explanations will be omitted.
[0046] In other words, as shown in Figure 6, the information processing of the first modified example begins with the execution of steps S1 and S2, which are the same as those in the first embodiment. Next, when the user utters an utterance V4, for example, as illustrated in Figure 7(a) left, the prompt generation unit 2 of the first modified example generates a prompt PT6 (step S10). The prompt PT6 includes the utterance V4 and an instruction statement OR5 that requests from the language model system 100 both tag information including a plurality of tags (see (ii) above) indicating the category (attribute) to which the utterance V4 belongs and a response sentence for a concise response (response utterance) to the utterance V4. Subsequently, the prompt PT6 is transmitted to the language model system 100 by steps S4 and S5, which are the same as those in the information processing of the first embodiment, and response information RS6 (see Figure 7(a) right) is obtained. In this case, the tag information TG6 included in the received response information RS6 contains the plurality of tags specified by the instruction statement OR5, as illustrated in Figure 7(a) right. When compared with the content of the original utterance V4, the tags [Cooking] and [Repertoire] included in the tag information TG6 are tags assigned from the content of utterance V4 using the morphological analysis method described above, while the other tags included in the tag information TG6 are tags assigned to indicate the category to which the content of utterance V4 belongs.
[0047] In addition to the above, the received response information RS6 also includes the response statement RB1 specified by the instruction statement OR5. At this time, the content of the response statement RB1 is a response statement that includes suggestions, etc., corresponding to the content of the utterance V4. Subsequently, the tag information TG6 and the response statement RB1 are associated with the original utterance V4 and user identification information ID, etc., and recorded in the recording unit 5 as utterance-related tag information 50 (step S11). Then, based on the response statement RB1 thus recorded in the recording unit 5, the output generation unit 6 of the processing unit 10 generates information to be presented (broadcast) to the user via the speaker 14 and outputs it to the speaker 14. As a result, the speaker 14, under the control of the processing unit 10, broadcasts the response statement RB1 corresponding to the information output from the output generation unit 6 (step S12). The response statement RB1 itself may also be configured to be output to the display 13. After that, the processing unit 10 proceeds to the processing of step S7, which is the same as the information processing of the first embodiment.
[0048] Furthermore, as another example of the prompt and response sentences of the first modified example, if the user utters an utterance V5 having the content illustrated in Figure 7(b) left, a prompt PT7 is generated and sent to the language model system 100. The prompt PT7 includes the utterance V5 and the instruction statement OR5, which requests from the language model system 100 both tag information including a plurality of tags (see (ii) above) indicating the category (attribute) to which the utterance V5 belongs and a concise response (response utterance) to the utterance V5. Response information RS7 (see Figure 7(b) right) to the prompt PT7 is then obtained (see steps S10, S4, and S5 in Figure 6). In this case, the tag information TG7 included in the received response information RS7 contains the plurality of tags specified by the instruction statement OR5, as illustrated in Figure 7(b) right. When compared with the content of the original utterance V5, the tags [perfume], [new purchase], and [boredom] included in the tag information TG7 are tags assigned from the content of utterance V5 using the morphological analysis method described above, while the other tags included in the tag information TG7 are tags assigned to indicate the category to which the content of utterance V5 belongs.
[0049] In addition, the response information RS7 also includes the response statement RB2 specified by the instruction statement OR5. At this time, the content of the response statement RB2 is a response statement that includes suggestions corresponding to the content of the utterance V5. Subsequently, the tag information TG7 and the response statement RB2 are associated with the original utterance V5 and user identification information ID, etc., and recorded in the recording unit 5 as utterance-related tag information 50 (see step S11 in Figure 6), and the response statement RB2 is then presented to the user via the speaker 14 or display 13 (see step S12 in Figure 6).
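The single-prompt exchange of steps S10 to S11 can be sketched as follows. This is a hedged illustration only: the source does not specify the format of the response information, so a JSON reply is assumed here, and the function names `build_combined_prompt` and `parse_combined_response` as well as the sample output are hypothetical.

```python
import json


def build_combined_prompt(utterance: str) -> str:
    """Build one prompt asking for both category tags and a concise reply."""
    instruction = (
        "Return JSON with two keys: 'tags', a list of tags for the category "
        "the utterance belongs to, and 'response', a concise reply to it."
    )
    return f"Utterance: {utterance}\nInstruction: {instruction}"


def parse_combined_response(raw: str) -> tuple[list[str], str]:
    """Split the model output into tag information and the response statement."""
    data = json.loads(raw)
    return list(data["tags"]), str(data["response"])


# Hypothetical model output, written by hand for illustration only.
sample_raw = '{"tags": ["Cooking", "Repertoire"], "response": "How about trying a new recipe?"}'
tags, reply = parse_combined_response(sample_raw)
```

Both parsed values would then be recorded together with the original utterance and the user identification information, and the reply passed on for output.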
[0050] Here, variations on how the instruction OR5 in prompt PT6 or prompt PT7 of the first modified example may be configured will be described. First, in a configuration that requests tag information and a response sentence in a single prompt, as instruction OR5 does in the first modified example, the processing load on the entire conversation support system SS or the terminal device T can be reduced.
[0051] Another example of this is to separate the prompt requesting a response sentence and the prompt requesting tag information, and configure the system to generate and send both simultaneously to the language model system 100. In this case, the response speed when obtaining response information from the language model system 100 will be faster, and the recording of tag information to the recording unit 5 can also be performed in real time.
[0052] Another example is to configure the system to generate and send prompts requesting response sentences in real time, while generating and sending prompts requesting tag information at a later time. In this case, the overall processing load of the conversation support system SS is reduced, response sentences are acquired and recorded in real time, and tag information can be recorded later.
[0053] As another example, if the terminal device T can determine in advance whether or not a response sentence can be obtained from the language model system 100, the system may be configured to generate and send a prompt for tag information and a prompt for response sentence simultaneously or in a time-separated manner if a response sentence can be obtained, and to generate and send only a prompt for tag information if a response sentence cannot be obtained. In this case, prompt generation can be made more efficient.
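The configuration in paragraph [0051], in which the response prompt and the tag prompt are separated and sent at the same time, can be sketched as below. This is a minimal sketch, not the claimed implementation: `query_model` is a hypothetical stand-in for the request to the language model system 100, and a thread pool is just one assumed way to issue both requests concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for sending a prompt to the language model system."""
    return f"reply to: {prompt}"


def request_both(utterance: str) -> tuple[str, str]:
    """Send the response prompt and the tag prompt concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        response_future = pool.submit(query_model, f"Respond concisely: {utterance}")
        tag_future = pool.submit(query_model, f"List tags for: {utterance}")
        # The response can be presented as soon as it arrives; tags are recorded after.
        return response_future.result(), tag_future.result()


response, tag_reply = request_both("I bought a new perfume.")
```

The time-separated configuration of paragraph [0052] would instead submit only the response prompt immediately and defer the tag prompt to a later point.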
[0054] As explained above, according to the information processing of the first modified example, prompts PT6, etc. are generated to cause the large-scale language model MD to further generate response statements RB1, etc. Since the response information corresponding to prompts PT6, etc. includes response statements RB1, etc., the response statements RB1, etc. corresponding to tag information TG6, etc. can be acquired along with tag information TG6, etc. and recorded in the recording unit 5.
[0055] Furthermore, if the generation of tag information TG6 and response statements RB1 can be performed by a single prompt PT6, the overall processing load on the conversation support system SS can be reduced.
[0056] Furthermore, even when a prompt requesting tag information TG6, etc. and a prompt requesting response statement RB1, etc. are generated separately and sent to the language model system 100 at different times, the overall processing load on the conversation support system SS can be reduced.
[0057] Furthermore, if the prompt requesting response statement RB1, etc. is sent before the prompt requesting tag information TG6, etc., the response information, including response statement RB1, can be received quickly while reducing the processing load on the entire conversation support system SS.
[0058] (b) Second modification Next, a second modified example, which is another variation of the first embodiment, will be described using Figures 8 and 9. Figure 8 is a flowchart showing the information processing of the second modified example, and Figure 9 is a diagram illustrating the prompt and its response information in the information processing of the second modified example.
[0059] In the first embodiment described above, the tag information TG1 etc. attached to the utterance V1 etc. is associated with user identification information ID etc. and recorded in the recording unit 5. In contrast, in the second modified example described below, when a new utterance is acquired, past utterances of other users that have tags common to the tag information attached to the utterance are searched, and a response sentence for the new utterance is acquired from the language model system 100 based on the searched past utterances. The hardware configuration of the conversation support system in the second modified example is basically the same as that of the conversation support system SS in the first embodiment. Therefore, in the following description, components that are the same as those in the conversation support system SS in the first embodiment will be given the same component numbers and detailed explanations will be omitted. Also, among the information processing of the second modified example executed in the conversation support system of the second modified example, processes that are the same as the information processing of the first embodiment (see Figure 3) will be given the same step numbers and detailed explanations will be omitted.
[0060] In other words, as shown in Figure 8, the information processing in the second modified example begins when a new utterance is made by the user (step S1: YES). First, steps S1 to S6, which are the same as in the first embodiment, are performed, and tag information is recorded in association with the utterance (step S6). Next, the processing unit 10 of the second modified example searches the recording unit 5 for past utterances that were made by people other than the user who made the utterance acquired in step S1, that share as many tags as possible with the tag information recorded in step S6, and whose disclosure scope is, for example, "unspecified" (step S15). At this time, if the utterance for which new tag information was recorded in the processing up to step S6 concerns the near term (for example, the day before or after today), such as an utterance containing "today" or "tomorrow," the processing unit 10 may, for example, limit the search range in step S15 to the week before and after the utterance in step S1, or search for recent utterances that have one or more common tags. Furthermore, the user may be allowed to change this search range.
[0061] Next, the prompt generation unit 2 of the second modified example generates a prompt (step S16) requesting the language model system 100 to provide a concise response (response utterance) to the user who made the utterance, based on the content of the utterance retrieved in step S15. More specifically, for example, if an utterance V6 having the content illustrated in Figure 9(a) left is retrieved as a past utterance (see step S15), and a new utterance V7 is acquired (see step S1), the prompt generation unit 2 of the second modified example generates a prompt PT8 that includes utterances V6 and V7, and an instruction statement OR6 requesting the language model system 100 to provide a response (response utterance) based on utterances V6 and V7. Subsequently, the prompt PT8 is transmitted to the language model system 100 by steps S4 and S5, similar to the information processing in the first embodiment, and response information RS8 (see Figure 9(a) right) is acquired. In this case, the received response information RS8 contains a response statement RB3 for the user who made the new utterance V7 (in the case of Figure 9(a), "BB"), as specified by the instruction statement OR6. At this time, the content of the response statement RB3 corresponds to the new utterance V7 and includes suggestions based on the content of the previous utterance V6. Subsequently, the output generation unit 6 of the processing unit 10 presents the response statement RB3 to the user who made the new utterance V7 via the speaker 14 or display 13 (step S17). The processing unit 10 then proceeds to the processing of step S7, which is the same as the information processing in the first embodiment.
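The search of step S15 can be sketched as follows. This is a minimal sketch under assumptions: the record type `Recorded`, the function `search_related`, and the representation of the disclosure scope as a plain string are all hypothetical, and the one-week window is applied unconditionally here for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Recorded:
    """Hypothetical recorded utterance with its tags and disclosure scope."""
    user_id: str
    text: str
    tags: frozenset[str]
    spoken_at: datetime
    scope: str = "unspecified"


def search_related(new: Recorded, store: list[Recorded], window_days: int = 7) -> list[Recorded]:
    """Find other users' open utterances sharing tags, most shared tags first."""
    window = timedelta(days=window_days)
    hits = [
        r for r in store
        if r.user_id != new.user_id                      # other people only
        and r.scope == "unspecified"                     # open disclosure scope
        and abs(r.spoken_at - new.spoken_at) <= window   # within the search range
        and r.tags & new.tags                            # at least one common tag
    ]
    return sorted(hits, key=lambda r: len(r.tags & new.tags), reverse=True)
```

Making `window_days` a parameter reflects the statement that the user may be allowed to change the search range.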
[0062] Furthermore, as another example of the prompt and response sentences in the second modified example, if a user utters an utterance V8 having the content illustrated in Figure 9(b) left, and another user utters another utterance V9, a prompt PT9 is generated and sent to the language model system 100, which includes utterances V8 and V9, and an instruction OR7 requesting the language model system 100 to provide a response sentence to each user based on a common tag assigned to each of utterances V8 and V9. Response information RS9 for the prompt PT9 (see Figure 9(b) right) is then obtained (see steps S15, S16, S4, and S5 in Figure 8). In this case, the order of utterances V8 and V9 is irrelevant. The response information RS9 received in this case includes the response sentence RB4 specified by the instruction OR7. At this time, the content of response statement RB4 corresponds to the content of utterance V8 and utterance V9, respectively, and is a response statement that is presented to both the user who uttered utterance V8 and the user who uttered utterance V9. Subsequently, response statement RB4 is presented to each user via speaker 14 or display 13 (see step S17 in Figure 8).
[0063] As explained above, according to the information processing of the second modified example, in addition to the effects of the information processing of the first embodiment, it becomes possible to present multiple users with response sentences that connect each user based on the content of their respective utterances.
[0064] (II) Second Embodiment Next, a second embodiment, which is another embodiment, will be described using Figures 10 to 12. Figure 10 is a flowchart showing the information processing of the second embodiment, and Figures 11 and 12 are diagrams illustrating the prompt and response information in the information processing of the second embodiment, respectively.
[0065] In the first embodiment described above, the tag information TG1 etc. attached to the utterance V1 etc. is associated with the user identification information ID etc. and recorded in the recording unit 5. In contrast, in the second embodiment described below, in addition to the tag information TG1 etc. attached to the utterance V1 etc., tag information attached to article information such as newspapers and magazines that can be obtained online via the network NW, and review information corresponding to reviews of facilities such as commercial facilities and events held there (including event information indicating the event itself) is obtained from the language model system 100. In this case, the above-mentioned article information and review information are examples of the information that can be obtained in the embodiment. Here, "review" generally refers to text or information that evaluates various things such as products and services, movies and books, music, etc., and conveys their content and value. Then, based on the tag information corresponding to the above-mentioned article information and review information, and the tag information TG1 etc. attached to the utterance V1 etc., a response sentence to be presented to the user who uttered the utterance V1 etc. is obtained from the language model system 100. In this case, the response sentence will be a response sentence related to the above-mentioned article information and review information. Note that the subject of the review in the above-mentioned review information is not limited to the above-mentioned facilities and events, but may also be products, movies, books, music, real estate properties, smartphone or personal computer applications, or companies etc.
[0066] Furthermore, the hardware configuration of the conversation support system in the second embodiment is basically the same as that of the conversation support system SS in the first embodiment. Therefore, in the following description, components that are the same as those in the conversation support system SS in the first embodiment will be given the same component numbers, and detailed explanations will be omitted. Also, among the information processing of the second embodiment executed in the conversation support system of the second embodiment, processing that is the same as the information processing of the first embodiment (see Figure 3) will be given the same step numbers, and detailed explanations will be omitted.
[0067] In other words, as shown in Figure 10, the information processing in the second embodiment begins with the execution of steps S1 to S6, which are the same as in the first embodiment. Tag information is thereby assigned by the language model system 100 to the user's utterance acquired in step S1 and recorded in the recording unit 5. Here, the tag information that is received (see step S5: YES) and recorded in association with the utterance acquired in step S1 (indicated as utterance V10 in Figure 11(a) left) (see step S6 above) is contained in the response information RS10 corresponding to the prompt PT10 exemplified in Figure 11(a) left, which includes the instruction statement OR1 as in the first embodiment. For example, response information RS10 including tag information TG8 exemplified in Figure 11(a) right is recorded in the recording unit 5. In this case, when compared with the content of the original utterance V10, the tags [ABC], [Now], and [Perfume] included in the tag information TG8 are tags assigned from the content of utterance V10 by the morphological analysis method described above, while the other tags included in the tag information TG8 are tags assigned to indicate the context of the content of utterance V10.
[0068] Next, the processing unit 10 of the second embodiment searches, using the same method as before, for article information and review information that contains as many as possible of the nouns serving as tags in the tag information TG8 recorded in the recording unit 5 in step S6. The search covers article information and review information available via the network NW as well as that already recorded in the terminal device T (step S20). At this time, the processing unit 10 may retrieve all article information and review information in which the number of such nouns (nouns as tags) is greater than or equal to a preset threshold, or it may retrieve all article information and review information that contains such a noun multiple times. Note that the article information and review information retrieved in step S20 will already be in text form at that point.
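The threshold-based search of step S20 can be sketched as follows. This is an illustration only: the source does not say how noun occurrences are detected, so a case-insensitive substring match is assumed, and the function names and the default threshold are hypothetical.

```python
def count_tag_nouns(text: str, tag_nouns: set[str]) -> int:
    """Count how many distinct tag nouns appear in the text (assumed substring match)."""
    lowered = text.lower()
    return sum(1 for noun in tag_nouns if noun.lower() in lowered)


def search_articles(articles: dict[str, str], tag_nouns: set[str], threshold: int = 2) -> list[str]:
    """Return IDs of articles containing at least `threshold` tag nouns, best first."""
    scored = [(count_tag_nouns(body, tag_nouns), key) for key, body in articles.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [key for score, key in ranked if score >= threshold]
```

A real system would more likely match on the output of morphological analysis than on raw substrings, which can produce false hits for short nouns.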
[0069] Next, the prompt generation unit 2 of the second embodiment generates a prompt based on the retrieved article information and review information (step S21). At this time, if, for example, article information A1 having the content illustrated in Figure 11(b) left is retrieved, the prompt generation unit 2 generates a prompt PT11 of the second embodiment that includes the article information A1 and an instruction statement OR8 requesting tag information including a plurality of tags (see (i) above) indicating the context of the content of the article information A1 from the language model system 100. At this time, the prompt generation unit 2 may include an instruction statement in the instruction statement OR8 requesting tag information including a plurality of tags (see (ii) above) indicating the category (attribute) to which the article information A1 belongs from the language model system 100. After that, the prompt generation unit 2 outputs the generated prompt PT11 together with the terminal identification information to the language model system 100 via the interface 3 and the network NW (step S22).
[0070] Next, the processing unit 10 determines whether or not it has received response information containing the desired tag information from the language model system 100 corresponding to the prompt PT11 (step S23). If the determination in step S23 is that no response information has been received (step S23: NO), the processing unit 10 waits until the response information is received. On the other hand, if the determination in step S23 is that the response information has been received (step S23: YES), the recording unit 5 of the processing unit 10 records the tag information contained in the received response information in association with the original article information A1, the corresponding date and time information TM and location information PS, and the user identification information ID, etc. (step S24).
[0071] Here, what is received as part of the above response information (see step S23: YES) and recorded in association with article information A1 (see step S24 above) is, for example, the tag information TG9 shown on the right of Figure 11(b) and the summary sentence AB1 summarizing article information A1. Both were included in the response information RS11 corresponding to the prompt PT11 shown on the left of Figure 11(b), and both are recorded in the recording unit 5. At this time, when compared with the content of the original article information A1 and its summary sentence AB1, the tags [perfume], [○○Beauty] and [fragrance festival] included in the tag information TG9 are tags assigned by the above morphological analysis method from the content of article information A1 or the content of summary sentence AB1, and the other tags included in the tag information TG9 are tags assigned to indicate the context of the content of article information A1 or the content of summary sentence AB1.
[0072] Subsequently, the processing unit 10 compares the tags included in the tag information recorded in step S6 (tags assigned to the user's utterance) with the tags included in the tag information recorded in step S24 (tags assigned to article information A1), and determines that article information and review information with multiple matching tags will be presented to the user whose utterance was acquired in step S1 (step S25). At this time, the processing unit 10 determines that article information and review information with more matching tags will be ranked higher in the presentation order to the user, and further identifies the target of the determined article information or review information. If there are multiple articles or review information with overlapping targets, it is preferable that the processing unit 10 determines that the article information or review information with the most matching tags among the article information or review information for the same target will be presented to the user.
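The selection and ordering in step S25 can be sketched as follows. This is a hedged sketch, not the claimed implementation: the `Item` type, the `target` field, and the minimum of two matching tags (one reading of "multiple matching tags") are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical article or review with the target it is about and its tags."""
    title: str
    target: str
    tags: frozenset[str]


def select_items(utterance_tags: set[str], items: list[Item], min_matches: int = 2) -> list[Item]:
    """Keep the best-matching item per target, ordered by number of matching tags."""
    best_per_target: dict[str, tuple[int, Item]] = {}
    for item in items:
        matches = len(item.tags & utterance_tags)
        if matches < min_matches:
            continue
        current = best_per_target.get(item.target)
        if current is None or matches > current[0]:
            best_per_target[item.target] = (matches, item)
    ranked = sorted(best_per_target.values(), key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked]
```

Keeping only one item per target avoids presenting several articles or reviews about the same facility or event.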
[0073] As a result, the output generation unit 6 of the processing unit 10 presents the article information or review information determined in step S25 to the user whose utterance was acquired in step S1 via the speaker 14 or display 13 (step S26). After that, the processing unit 10 proceeds to the processing in step S7, which is the same as the information processing in the first embodiment.
[0074] In addition to those illustrated in Figure 11, several other examples of prompt and response information exchanged between the language model system 100 and the information processing system in the second embodiment are also possible, as shown below. (a) Example 1 As a first example corresponding to the second embodiment, information processing of the second embodiment targeting utterances from multiple users can be cited. In this first example of information processing, the processing unit 10 first identifies tags common to both users (hereinafter referred to as "common utterance tags") from the context-based tags of the utterances of the first user recorded in the recording unit 5 and the context-based tags of the utterances of the second user, which have also been recorded. The processing unit 10 then searches for article information or review information containing the same noun as a common utterance tag (see step S20 in Figure 10) and assigns tag information to the retrieved article information or review information (or the subject of the review) (see steps S21 to S24 in Figure 10). The processing unit 10 then compares the common utterance tags with the tag information recorded in step S24, and determines that the article information and review information for which matching tags exist is to be presented to each user (see step S25 in Figure 10). The system may then be configured to present the determined article information or review information to each user (see step S26 in Figure 10). In this case, the article information or review information may be presented on each of the terminal devices T of the multiple users, or it may be presented on the terminal device T of only one user.
[0075] More specifically, for example, as shown in the instruction OR9 included in the first example prompt PT12 illustrated on the left of Figure 12, when a first user (Mr. AA in Figure 12) and a second user (Mr. BB in Figure 12) are identified, each having tags including a common utterance tag attached to their utterances, the prompt generation unit 2 generates a prompt PT12 and transmits it to the language model system 100. The prompt PT12 requests a response that informs the first user of the existence of the second user, who shares common hobbies or preferences (i.e., has the common utterance tag attached), and that introduces review information (see step S25 in Figure 10) about events common to both of them. In this case, the received response information RS12 contains the review information PR of the event specified by the instruction OR9, as illustrated on the right of Figure 12. Subsequently, the output generation unit 6 presents the review information PR to the first user via the speaker 14 or display 13 of that user's terminal device T (see step S26 in Figure 10). Furthermore, review information PR may be displayed on the terminal devices T of all users, not just one user.
[0076] (b) Other examples Next, as another example corresponding to the second embodiment, for example, with respect to article information on a network NW, article information that has been pre-tagged based on context and summary from the provider of the article information may be used and recorded in the terminal device T. In this case, a database for such recording can be used, and the system can be configured to search within that database as appropriate.
[0077] Furthermore, regarding the review information of the second embodiment, it is also possible to record review information previously spoken by other users (for example, "I went to XX. It was fun.") within the terminal device T and utilize this as the review information of the second embodiment. In this case, priority may be given to utilizing review information from the user of the terminal device T itself.
[0078] Furthermore, when searching based on tag information assigned to the user's utterances on terminal device T, the article information may contain both event information and other information (for example, general news, blogs, or reviews). In this case, a so-called weighting process may be performed to make event information related to events more likely to be displayed. In this case, since the above event information is often related to the duration of the event, it is conceivable to configure the system to increase the weighting so that events with shorter durations are more likely to be displayed.
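The weighting described in paragraph [0078] can be sketched as follows. The source gives no formula, so everything numeric here is an assumption chosen purely for illustration: the function name `display_score`, the base boost of 1.5 for event information, and the inverse-duration factor.

```python
from typing import Optional


def display_score(matching_tags: int, is_event: bool, duration_days: Optional[int] = None) -> float:
    """Score an item for display; event information is boosted, short events more so."""
    score = float(matching_tags)
    if is_event:
        score *= 1.5  # assumed base boost for event information
        if duration_days is not None and duration_days > 0:
            # Assumed form: the shorter the event, the larger the extra weight.
            score *= 1.0 + 1.0 / duration_days
    return score
```

With these assumed constants, a one-day event outranks a ten-day event with the same number of matching tags, and both outrank a general news item.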
[0079] Furthermore, the system may be configured to present users with information posted on a so-called SNS (Social Networking Service) instead of article information or review information as in the second embodiment. In this case, it is preferable to determine the posted information to be presented to the user by a two-step process, for example, as described in (a) and (b) below. (a) Search for posted information that has tags (which may be assigned by the users themselves) corresponding to nouns included in the user's utterances. (b) If the search finds posted information with the relevant tags, the content of that posted information is tagged according to its context, and posted information in which multiple tags match those of the utterance is extracted and presented to the user.
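The two-step selection of posted information can be sketched as follows. The function name select_posts, the post dictionary layout, and the tag_context callable (a stand-in for whatever component assigns context tags, such as the language model system 100) are all hypothetical, as is the default of two matches for "multiple tags".

```python
from typing import Callable, Dict, Iterable, List


def select_posts(
    utterance_nouns: Iterable[str],
    utterance_tags: Iterable[str],
    posts: List[Dict],
    tag_context: Callable[[str], Iterable[str]],
    min_matches: int = 2,
) -> List[Dict]:
    """Two-step filter over posts shaped like {"text": str, "tags": [str, ...]}."""
    nouns = set(utterance_nouns)
    wanted = set(utterance_tags)

    # Step (a): keep posts whose own tags correspond to a noun in the utterance.
    candidates = [p for p in posts if nouns & set(p["tags"])]

    # Step (b): tag each candidate by context and keep those where
    # multiple context tags match the tags of the utterance.
    selected = []
    for post in candidates:
        context_tags = set(tag_context(post["text"]))
        if len(context_tags & wanted) >= min_matches:
            selected.append(post)
    return selected
```

Running step (a) first keeps the more expensive context tagging in step (b) limited to posts that already share a noun with the utterance.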
[0080] As explained above, according to the information processing of the second embodiment, summary texts AB1 and review information PR are presented to the user based on tag information corresponding to the user (see steps S20 to S26 in Figure 10), so that summary texts AB1 etc. that are in line with the user's preferences can be presented to the user.
[0081] Furthermore, when other tags corresponding to common utterance tags found in the tags associated with each utterance of multiple users are searched, and article information or review information to which those other tags are attached is presented, article information or review information that reflects the common interests of the multiple users can be presented to each of those users.
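As a rough illustration of identifying common utterance tags across multiple users, the following sketch intersects each user's tag set and then selects items carrying a matching tag. The function names and data layout are hypothetical, and "corresponding" tags are simplified here to exact string equality, which the specification does not require.

```python
from typing import Dict, Iterable, List, Set


def common_tags(tags_by_user: Dict[str, Iterable[str]]) -> Set[str]:
    """Tags attached to the utterances of every listed user."""
    tag_sets = [set(tags) for tags in tags_by_user.values()]
    return set.intersection(*tag_sets) if tag_sets else set()


def items_for_common_tags(
    tags_by_user: Dict[str, Iterable[str]], items: List[Dict]
) -> List[Dict]:
    """Article or review items (shaped like {"title": str, "tags": [...]})
    that carry at least one tag shared by all users."""
    shared = common_tags(tags_by_user)
    return [item for item in items if shared & set(item["tags"])]
```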
[0082] Furthermore, since at least one of the article information or review information corresponding to the tags attached to the user's utterance is presented, a wide range of article information and other content tailored to the user's preferences can be presented to that user.
[0083] Furthermore, a prompt PT10 is generated and sent to cause the large-scale language model MD to generate tag information TG8 attached to the user's utterance. In response to the prompt PT10, article information or reviews containing the tags included in the tag information TG8 generated by the large-scale language model MD are retrieved. A prompt PT11 is generated and sent based on the article information or review information, and in response to the prompt PT11, tag information TG9 generated by the large-scale language model MD is retrieved. Thus, article information or review information that is more in line with the user's preferences can be presented to the user.
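The two-stage flow (prompt PT10 yielding tag information TG8, retrieval, then prompt PT11 yielding tag information TG9) might be sketched as below. The callables ask_model and search_articles are hypothetical stand-ins for the language model system 100 and the article search, and the prompt wording is invented for illustration rather than taken from the figures.

```python
from typing import Callable, List, Tuple


def tag_and_retrieve(
    utterance: str,
    ask_model: Callable[[str], List[str]],
    search_articles: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    # Stage 1 (corresponds to PT10 -> TG8): tags for the user's utterance.
    utterance_tags = ask_model(f"Assign tags to this utterance: {utterance}")

    # Retrieval: article or review information containing those tags.
    articles = search_articles(utterance_tags)

    # Stage 2 (corresponds to PT11 -> TG9): tags for each retrieved article.
    tagged_articles = []
    for article in articles:
        article_tags = ask_model(f"Summarize and tag this article: {article}")
        tagged_articles.append((article, article_tags))
    return utterance_tags, tagged_articles
```

Passing the model and the search as parameters keeps the sketch independent of any particular language model service.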
[0084] Furthermore, since the system acquires article information or review information that includes the tags that are nouns among the tags contained in the acquired tag information TG8, it is possible to present article information or review information that is in line with the user's preferences while reducing the processing load in the conversation support system of the second embodiment.
[0085] Furthermore, by assigning tags to user utterances that consist of sentences and generating article or review information tailored to the user based on those tags, it is possible to present the user with article or review information that is relevant to the content of their utterance.
[0086] (III) Third Embodiment Next, a third embodiment, which is yet another example corresponding to the embodiment, will be described with reference to Figures 13 and 14. Figure 13 is a flowchart showing the information processing of the third embodiment, and Figure 14 is a diagram illustrating the prompts and response information in that information processing.
[0087] In the first embodiment described above, the tag information TG1 etc. attached to the utterance V1 etc. is associated with user identification information ID etc. and recorded in the recording unit 5. In contrast, in the third embodiment described below, in addition to the tag information TG1 etc. attached to the utterance V1 etc., tag information attached to bookmark information used in apps used by the user or websites on a network NW such as the internet is obtained from the language model system 100. Here, "bookmark information" generally refers to information that indicates the location on the network NW of information that the user wishes to view later or frequently because it matches the user's preferences, and is often in the form of text information. Based on the tag information corresponding to the above bookmark information and the tag information TG1 etc. attached to the utterance V1 etc., the bookmark information itself or related information related to the bookmark information is obtained from the language model system 100 to be presented to the user who uttered the utterance V1 etc.
[0088] Furthermore, the hardware configuration of the conversation support system in the third embodiment is basically the same as that of the conversation support system SS in the first embodiment. Therefore, in the following description, components that are the same as those in the conversation support system SS in the first embodiment will be given the same component numbers, and detailed explanations will be omitted. Also, among the information processing of the third embodiment executed in the conversation support system of the third embodiment, processing that is the same as the information processing of the first embodiment (see Figure 3) will be given the same step numbers, and detailed explanations will be omitted.
[0089] In other words, as shown in Figure 13, the information processing in the third embodiment first performs the same steps S1 to S6 as in the first embodiment, whereby tag information is assigned by the language model system 100 to the user's utterance acquired in step S1 and recorded in the recording unit 5. Here, the tag information that is received (see step S5: YES) and recorded in association with the utterance acquired in step S1 (indicated as utterance V10 on the left of Figure 14(a)) (see step S6 above) is recorded in the recording unit 5 as response information RS10 corresponding to the prompt PT10 exemplified on the left of Figure 14(a), which includes the instruction statement OR1, as in the first embodiment. For example, response information RS10 including the tag information TG8 exemplified on the right of Figure 14(a) is recorded in the recording unit 5. At this time, the relationship between the tags included in the tag information TG8 and the content of the original utterance V10 is the same as in the second embodiment exemplified in Figure 11(a).
[0090] Meanwhile, in parallel with the processing in steps S1 to S6 above, the processing unit 10 of the third embodiment acquires the bookmark information via the network NW, for example, periodically (step S30). Next, the prompt generation unit 2 of the third embodiment generates a prompt of the third embodiment based on the acquired bookmark information, for example, in accordance with the timing when the bookmark information was acquired (step S31). At this time, if the bookmark information BK having the content illustrated in the left of Figure 14(b) is acquired including character information, the prompt generation unit 2 generates a prompt PT13 that includes the bookmark information BK and an instruction statement OR10 requesting tag information from the language model system 100, which includes a summary of the content of the bookmark information BK and a plurality of tags (see (i) above) indicating the context of the summary. Subsequently, the prompt generation unit 2 outputs the generated prompt PT13 together with the terminal identification information to the language model system 100 via the interface 3 and the network NW (step S32).
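A prompt such as PT13 combines the bookmark text with an instruction requesting a summary and context tags, and is sent together with terminal identification information. The sketch below shows one way to assemble this; the instruction wording, the max_tags parameter, and the request dictionary keys are assumptions for illustration, not the format used in Figure 14(b).

```python
from typing import Dict


def build_bookmark_prompt(bookmark_text: str, max_tags: int = 5) -> str:
    """Combine a hypothetical instruction statement with the bookmark content."""
    instruction = (
        "Summarize the following bookmark content, then list up to "
        f"{max_tags} tags that indicate the context of the summary."
    )
    return f"{instruction}\n\nBookmark:\n{bookmark_text}"


def build_request(terminal_id: str, bookmark_text: str) -> Dict[str, str]:
    """Bundle the prompt with terminal identification information (step S32)."""
    return {
        "terminal_id": terminal_id,
        "prompt": build_bookmark_prompt(bookmark_text),
    }
```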
[0091] Next, the processing unit 10 determines whether or not it has received response information containing the desired tag information from the language model system 100 corresponding to the prompt PT13 (step S33). If the determination in step S33 is that no response information has been received (step S33: NO), the processing unit 10 waits until the response information is received. On the other hand, if the determination in step S33 is that response information has been received (step S33: YES), the recording unit 5 of the processing unit 10 records the tag information contained in the received response information in association with the original bookmark information BK, the corresponding date and time information TM and location information PS and user identification information ID, etc. (step S34).
[0092] Here, the tag information that is received as part of the above response information (see step S33:YES) and recorded in association with the bookmark information BK (see step S34 above) is, for example, the tag information TG10 and summary sentence AB2 that summarizes the bookmark information BK, as exemplified on the right of Figure 14(b), which were included in the response information RS13 corresponding to the prompt PT13 exemplified on the left of Figure 14(b). At this time, when compared with the content of the original bookmark information BK and its summary sentence AB2, the tags [perfume] and [fashion] included in the tag information TG10 are tags assigned by the above morphological analysis method from the content of the bookmark information BK or the content of the summary sentence AB2, and the other tags included in the tag information TG10 are tags assigned to indicate the context of the content of the bookmark information BK or the content of the summary sentence AB2.
[0093] Subsequently, the processing unit 10 compares the tags included in the tag information recorded in step S6 (tags assigned to the user's utterance) with the tags included in the tag information recorded in step S34 (tags assigned to the bookmark information BK), and determines that the bookmark information BK with multiple matching tags will be presented to the user whose utterance was acquired in step S1 (step S35). At this time, the processing unit 10 determines that the bookmark information BK with more matching tags will be ranked higher in the presentation order to the user, and further identifies the information to be acquired from the network NW regarding the information contained in the determined bookmark information BK. At this time, if there are multiple bookmark information BKs with overlapping information, it is preferable for the processing unit 10 to determine that the bookmark information BK with the most matching tags among the bookmark information BKs with the same information will be presented to the user.
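The selection in step S35 can be sketched as a tag-overlap ranking with de-duplication. The bookmark dictionary layout, the use of a "content" field to detect bookmarks with the same information, and the default of two matches for "multiple matching tags" are assumptions made for this illustration.

```python
from typing import Dict, Iterable, List, Tuple


def rank_bookmarks(
    utterance_tags: Iterable[str], bookmarks: List[Dict], min_matches: int = 2
) -> List[Dict]:
    """Return bookmarks to present, ordered by number of matching tags (most first).

    Each bookmark is shaped like {"url": str, "content": str, "tags": [str, ...]}.
    """
    wanted = set(utterance_tags)
    best_per_content: Dict[str, Tuple[int, Dict]] = {}

    for bookmark in bookmarks:
        matches = len(wanted & set(bookmark["tags"]))
        if matches < min_matches:
            continue  # not enough matching tags to present
        key = bookmark["content"]  # assumed marker of overlapping information
        current = best_per_content.get(key)
        # Among bookmarks with the same information, keep the best match only.
        if current is None or matches > current[0]:
            best_per_content[key] = (matches, bookmark)

    ranked = sorted(best_per_content.values(), key=lambda pair: pair[0], reverse=True)
    return [bookmark for _, bookmark in ranked]
```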
[0094] As a result, the output generation unit 6 of the processing unit 10 presents the bookmark information BK determined in step S35 to the user whose utterance was acquired in step S1 via the speaker 14 or display 13 (step S36). After that, the processing unit 10 proceeds to the processing in step S7, which is the same as the information processing in the first embodiment.
[0095] In addition to the example shown in Figure 13, other possible information processing methods for the third embodiment include those shown below. (a) Example 1 As a first example corresponding to the third embodiment, the system may be configured to determine whether or not to present bookmark information BK to a user based on the frequency of the user's access to each information source on the network NW. In this case, for example, if the access frequency is above a preset threshold, the bookmark information BK determined in step S35 is presented immediately, while if the access frequency is below the threshold, the system may be configured to temporarily suspend the presentation of the bookmark information BK even if it has been determined in step S35. (b) Example 2 As a second example corresponding to the third embodiment, the system can be configured to present search results for related information concerning the bookmark information BK instead of the bookmark information BK itself.
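The access-frequency gating of Example 1 can be sketched as a simple split between bookmarks presented immediately and bookmarks whose presentation is suspended. The "source" field, the access_counts mapping, and the treatment of a frequency exactly equal to the threshold (deferred here) are assumptions; the text does not specify the boundary case.

```python
from typing import Dict, List, Tuple


def split_by_access_frequency(
    bookmarks: List[Dict], access_counts: Dict[str, int], threshold: int
) -> Tuple[List[Dict], List[Dict]]:
    """Split bookmarks (shaped like {"url": str, "source": str}) into
    (present_now, deferred) by how often the user accesses each source."""
    present_now, deferred = [], []
    for bookmark in bookmarks:
        frequency = access_counts.get(bookmark["source"], 0)
        if frequency > threshold:
            present_now.append(bookmark)   # frequently used source: present immediately
        else:
            deferred.append(bookmark)      # presentation temporarily suspended
    return present_now, deferred
```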
[0096] As explained above, according to the information processing of the third embodiment, the bookmark information used by the user is presented in accordance with the content of the user's utterance, thus increasing the convenience of the bookmark information BK for the user.
[0097] Furthermore, if tag information TG10 is acquired at the same time as bookmark information BK is acquired, tag information TG10 can be quickly added to bookmark information BK.
[0098] Furthermore, if bookmark information is presented based on the user's access history (e.g., frequency) to each information source, the convenience of the bookmark information for that user will be further enhanced.
[0099] Furthermore, if the bookmark information BK is presented to the user immediately when the frequency of the user's access to the above information sources exceeds a predetermined threshold frequency, it becomes possible to present the bookmark information BK to the user in a manner that matches their usage.
[0100] Furthermore, since the bookmark information BK used by the user is presented in accordance with the content of the user's speech, the convenience of the bookmark information BK for that user is further enhanced.
[0101] In the embodiments described above, tags were assigned to utterances as the input from the user. However, the system may also be configured to assign tags to images taken by the user, text such as notes written by the user, or text messages exchanged between users as the input. In this case, the characters or text messages input via the operation unit 11, etc., are input to the utterance acquisition unit 1 as character information, eliminating the need for the text conversion process by the utterance acquisition unit 1 described above (see step S2 in Figures 3, 6, 8, 10, and 13). This configuration allows character information to be acquired from both voice input and text message input, enabling the use of the conversation support system SS in a wide range of scenarios and situations, such as when the user only uses text messages or when text messages and voice are used in combination. Furthermore, since tag information can be collected more broadly than when only voice input is used, the accuracy of the various searches or identifications can be improved.
[0102] Furthermore, in each of the embodiments described above, the processing unit 10 targeted all tag information attached to utterances input by the user for searching. Instead, however, the number of times each piece of tag information is attached may be accumulated for each user, and the tag information to be targeted for searching may be selected based on the number or frequency of such attachments. Specifically, tag information whose number of attachments is greater than or equal to a predetermined number (for example, a cumulative number of attachments of 10 or more), or whose attachment frequency is greater than or equal to a predetermined number within a predetermined period (for example, three or more attachments per week), may be targeted for searching. With this configuration, the accuracy of the various searches or identifications can be further improved because the search targets are tag information that appears (is uttered) many times and/or frequently among the user's utterances.
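The count-based and frequency-based selection of search targets can be sketched as follows, using the example values from the text (10 or more cumulative attachments, or three or more attachments per week). The function name, the representation of attachments as (tag, timestamp) pairs, and the use of a rolling seven-day window are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple


def searchable_tags(
    tag_events: Iterable[Tuple[str, datetime]],
    now: datetime,
    min_total: int = 10,
    min_recent: int = 3,
    window: timedelta = timedelta(days=7),
) -> Set[str]:
    """Select tags attached at least min_total times overall, or at least
    min_recent times within the most recent window."""
    events = list(tag_events)
    total = Counter(tag for tag, _ in events)
    recent = Counter(tag for tag, when in events if now - when <= window)
    return {
        tag
        for tag in total
        if total[tag] >= min_total or recent[tag] >= min_recent
    }
```

In practice the events would be accumulated per user, for example keyed by the user identification information ID.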
[0103] Furthermore, it is possible to record programs corresponding to the flowcharts shown in Figures 3, 6, 8, 10, and 13 onto a recording medium such as an optical disc or hard disk, or to obtain them via a network such as the Internet, and then read and execute them on a general-purpose microcomputer, thereby making the microcomputer function as the processing unit 10 in each embodiment.
[Explanation of Symbols]
[0104]
3 First acquisition means (interface)
3A Second acquisition means
6 Generation means (output generation unit)
6A Presentation means
10 Processing unit
100 Language model system
S Information processing device
T Terminal device
SS Conversation support system
MD Large-scale language model
V1, V2, V3, V4, V5, V6, V7, V8, V9, V10 Utterance
OR1, OR2, OR3, OR4, OR5, OR6, OR7, OR8, OR9, OR10 Instruction statement
PT1, PT2, PT3, PT4, PT5, PT6, PT7, PT8, PT9, PT10, PT11, PT12, PT13 Prompt
RS1, RS2, RS3, RS4, RS5, RS6, RS7, RS8, RS9, RS10, RS11, RS12, RS13 Response information
TG1, TG2, TG3, TG4, TG5, TG6, TG7, TG8, TG9, TG10 Tag information
RB1, RB2, RB3, RB4 Response message
AB1, AB2 Summary text
A1 Article information
BK Bookmark information
Claims
1. An information processing device characterized by comprising: a first acquisition means for acquiring first tag information that includes at least a first tag which is attached to an input from a user and which indicates any one of the relationship between the constituent elements of the input, a pre-set attribute to which the content of the input belongs, or a summary of the content; a second acquisition means for acquiring second tag information that includes at least a second tag which is attached to acquirable information that can be obtained from an external source in addition to the input and which indicates any one of the relationship between the constituent elements of the acquirable information, a pre-set attribute to which the content of the acquirable information belongs, or a summary of the content; a generation means for generating, based on the acquired first tag information and second tag information, presentation information to be presented to the user; and a presentation means for presenting the generated presentation information to the user.
2. The information processing device according to claim 1, wherein the first acquisition means acquires the first tag information corresponding to the input of each of a plurality of users, the information processing device further comprising: an identification means for identifying a common first tag that is common among the first tags included in the respective acquired first tag information; and a search means for searching for the second tag information that includes a corresponding second tag, which is the second tag corresponding to the identified common first tag, wherein the generation means generates, as the presentation information, the acquirable information to which the corresponding second tag included in the retrieved second tag information is attached, and the presentation means presents the generated presentation information to each of the users.
3. The information processing device according to claim 1 or claim 2, characterized in that the acquirable information is at least one of (i) article information or (ii) review information concerning a subject to be presented to the user, which can be obtained from an external source in addition to the input.
4. The information processing device according to claim 1, further comprising: a content information acquisition means for acquiring content information that indicates the content of the input; a first instruction information generation means for generating, based on the acquired content information, first instruction information for causing a large-scale language model to generate the first tag information, which includes at least the first tag; a first transmission means for transmitting the generated first instruction information to a device capable of using the large-scale language model; a first receiving means for receiving, from the device, the first tag information generated by the large-scale language model in response to the transmitted first instruction information; an acquirable information acquisition means for acquiring the acquirable information that includes the first tag included in the received first tag information; a second instruction information generation means for generating, based on the acquired acquirable information, second instruction information for causing the large-scale language model to generate the second tag information, which includes at least the second tag; and a second transmission means for transmitting the generated second instruction information to the device, wherein the second acquisition means receives and acquires, from the device, the second tag information generated by the large-scale language model in response to the transmitted second instruction information.
5. The information processing device according to claim 4, characterized in that the acquirable information acquisition means acquires the acquirable information that includes the first tag which is a noun and which is included in the received first tag information.
6. The information processing device according to claim 1, characterized in that the input is a sentence uttered by the user, and the content of the input is the content of the utterance.
7. An information processing method performed in an information processing device comprising a first acquisition means, a second acquisition means, a generation means, and a presentation means, the method characterized by including: a first acquisition step in which the first acquisition means acquires first tag information that includes at least a first tag which is attached to an input from a user and which indicates any one of the relationship between the constituent elements of the input, a pre-set attribute to which the content of the input belongs, or a summary of the content; a second acquisition step in which the second acquisition means acquires second tag information that includes at least a second tag which is attached to acquirable information that can be obtained from an external source in addition to the input and which indicates any one of the relationship between the constituent elements of the acquirable information, a pre-set attribute to which the content of the acquirable information belongs, or a summary of the content; a generation step in which the generation means generates, based on the acquired first tag information and second tag information, presentation information to be presented to the user; and a presentation step in which the presentation means presents the generated presentation information to the user.
8. An information processing program characterized by causing a computer included in an information processing device to function as: a first acquisition means for acquiring first tag information that includes at least a first tag which is attached to an input from a user and which indicates any one of the relationship between the constituent elements of the input, a pre-set attribute to which the content of the input belongs, or a summary of the content; a second acquisition means for acquiring second tag information that includes at least a second tag which is attached to acquirable information that can be obtained from an external source in addition to the input and which indicates any one of the relationship between the constituent elements of the acquirable information, a pre-set attribute to which the content of the acquirable information belongs, or a summary of the content; a generation means for generating, based on the acquired first tag information and second tag information, presentation information to be presented to the user; and a presentation means for presenting the generated presentation information to the user.