Information processing device, information processing method, and information processing program
The information processing system addresses the limitations of conventional AI chatbots by enabling interactive character-driven conversations and advertisements, enhancing user engagement and monetization through personalized voice-over services and targeted ads.
Patent Information
- Authority / Receiving Office: JP · JP
- Patent Type: Applications
- Current Assignee / Owner: LY CORP
- Filing Date: 2024-12-20
- Publication Date: 2026-07-02
Description
Technical Field
[0001] The present invention relates to an information processing apparatus, an information processing method, and an information processing program.
Background Art
[0002] In an AI (artificial intelligence) chatbot service in which AI answers user questions, a technique for utilizing chatbot dialogue information performed on a user terminal and chat dialogue information between a plurality of user terminals is disclosed (see Patent Document 1).
Prior Art Documents
Patent Documents
[0003]
Patent Document 1
Summary of the Invention
Problems to be Solved by the Invention
[0004] However, the above conventional technology only obtains dialogue information in a chat executed on a user terminal, performs machine learning using the obtained dialogue information as teacher data to generate a learned model that inputs questions and outputs answers, generates an answer based on the generated learned model for an instruction sentence from a client terminal, and transmits the generated answer to the client terminal. Therefore, in the above conventional technology, there is room for improvement in terms of dialogue (conversation) with an imaginary (or real) character using AI and advertisement distribution by the character.
[0005] The present application has been made in view of the above, and an object thereof is to provide a technology for expressing or reproducing a character using AI and providing information that effectively utilizes the character.
Means for Solving the Problems
[0006] The information processing device according to the present invention is characterized by comprising: a provisioning unit that uses AI to provide a chat room in which multiple characters participate; a search unit that performs a search based on a search query entered by a user in the chat room; a setting unit that sets, from among the multiple characters, a character to provide the search results based on the search query; and a voice control unit that, in the chat room, causes the characters other than the selected character to speak content other than the search results, each in its own voice and tone, and causes the selected character to speak content related to the search results in that character's voice and tone.
Effects of the Invention
[0007] According to one embodiment of the invention, it is possible to provide a technology that uses AI to represent or reproduce a character and provides information that effectively utilizes the character.
Brief Description of the Drawings
[0008]
[Figure 1] Figure 1 is an explanatory diagram showing an overview of the information processing system according to the embodiment.
[Figure 2] Figure 2 is an explanatory diagram showing an overview of the push notification advertisement according to the embodiment.
[Figure 3] Figure 3 is an explanatory diagram illustrating the outline of a character group talk-style advertisement according to the embodiment.
[Figure 4] Figure 4 is an explanatory diagram showing an overview of the OA-compatible voice reading function in the search flow according to the embodiment.
[Figure 5] Figure 5 shows an example of the configuration of a terminal device according to this embodiment.
[Figure 6] Figure 6 shows an example of the configuration of a server device according to the embodiment.
[Figure 7] Figure 7 is a flowchart showing the first processing procedure according to the embodiment.
[Figure 8] Figure 8 is a flowchart showing the second processing procedure according to the embodiment.
[Figure 9] Figure 9 is a flowchart showing the third processing procedure according to the embodiment.
[Figure 10] Figure 10 is a flowchart showing the fourth processing step according to the embodiment.
[Figure 11] Figure 11 shows an example of a hardware configuration.
Modes for Carrying Out the Invention
[0009] The following describes in detail, with reference to the drawings, embodiments for implementing the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments"). Note that these embodiments do not limit the information processing device, information processing method, and information processing program according to the present application. Furthermore, the same parts are denoted by the same reference numerals in the following embodiments, and redundant descriptions are omitted.
[0010] [1. Overview of the Information Processing System] First, with reference to Figure 1, an overview of the information processing system according to the embodiment will be described. Figure 1 is an explanatory diagram showing an overview of the information processing system according to the embodiment. As shown in Figure 1, the information processing system 1 according to the embodiment includes a terminal device 10 and a server device 100. The terminal device 10 and the server device 100 are connected to each other via a network N, either by wired or wireless means, enabling communication between them. This allows the terminal device 10 to cooperate with the server device 100. The network N is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet.
[0011] Terminal device 10 is an information processing device used by user U. For example, terminal device 10 may be a smart device such as a smartphone or tablet, or a PC (Personal Computer) such as a desktop or notebook (laptop). Alternatively, terminal device 10 may be a mobile phone such as a feature phone, a PDA (Personal Digital Assistant), a game console or AV equipment with communication functions, an information appliance or digital appliance, a car navigation system, a wearable device such as a smartwatch, head-mounted display (HMD), or smart glasses, or an IoT (Internet of Things) compatible house, building, car, home appliance, or electronic device.
[0012] In this embodiment, the terminal device 10 is a smart device such as a smartphone or tablet used by user U, and is a mobile terminal device that can communicate with any server device via wireless communication networks such as LTE (Long Term Evolution), 4G (4th Generation), 5G (5th Generation), Bluetooth (registered trademark), or wireless LAN. The terminal device 10 also has a screen such as a liquid crystal display with touch panel functionality, and accepts various operations on displayed data such as content from user U using a finger or stylus, such as tapping, sliding, and scrolling. Operations performed on the area of the screen where content is displayed may also be considered as operations on the content. Furthermore, the terminal device 10 may be an information processing device such as a desktop PC or notebook PC, not just a smart device.
[0013] The server device 100 is, for example, a computer such as a PC or blade server, or a mainframe or workstation. The server device 100 may also be implemented through cloud computing.
[0014] In this embodiment, the server device 100 is an information processing device that cooperates with the terminal device 10 of each user U and provides the terminal device 10 of each user U with API (Application Programming Interface) services for various applications (hereinafter referred to as apps) and with various data. It is realized by a computer, a cloud system, or the like.
[0015] The server device 100 may also be an information processing device that provides some online service for the terminal device 10 of each user U. For example, as an online service, the server device 100 may provide services such as Internet connection, search, advertisement distribution, chat, dialogue by voice, image, or video, SNS (Social Networking Service), e-commerce (EC: Electronic Commerce), electronic payment, online games, online banking, online trading, accommodation and ticket reservation, video and music distribution, news, maps, route search, route guidance, route information, operation information, and weather forecasts. In practice, the server device 100 may cooperate with various servers that provide the above-mentioned online services and mediate the online services, or may itself be responsible for the processing of the online services.
[0016] The server device 100 can acquire user information regarding the user U. For example, the server device 100 acquires information (attribute information) regarding attributes of the user U such as the gender, age, and residential area of the user U as user information. In addition, the server device 100 can acquire information regarding attributes such as the demographics (demographic attributes), psychographics (psychological attributes), geographics (geographical attributes), and behavior (behavioral attributes) of the user U. Further, the server device 100 may acquire, as user information, the marketing segment or persona to which the user U belongs. Then, the server device 100 stores and manages information (attribute information) regarding the attributes of the user U together with the identification information (such as user ID) indicating the user U.
[0017] In addition, the server device 100 acquires various types of history information (log data) indicating the actions of the user U from the terminal device 10 of the user U or from various servers or the like based on the user ID or the like. For example, the server device 100 acquires a location history, which is a history of the location and time of the user U, from the terminal device 10. In addition, the server device 100 acquires a search history, which is a history of search queries input by the user U, from a search server (search engine). In addition, the server device 100 acquires a browsing history, which is a history of the content browsed by the user U, from a content server. In addition, the server device 100 acquires a purchase history (settlement history), which is a history of product purchases and settlement processes of the user U, from an e-commerce server or a settlement processing server. In addition, the server device 100 may acquire a listing history or a sales history, which is a history of the user U's listings on a marketplace, from an e-commerce server or a settlement processing server. In addition, the server device 100 acquires a posting history, which is a history of the user U's posts, from a posting server or an SNS server that provides a word-of-mouth posting service. Note that the above various servers or the like may be the server device 100 itself. That is, the server device 100 may function as the above various servers or the like.
[0018] Furthermore, the number of devices included in the information processing system 1 shown in Figure 1 is not limited to those illustrated. For example, in Figure 1, only one terminal device 10 is shown for the sake of illustration, but this is merely an example and not limiting; there may be two or more.
[0019] [2. Using characters in chat and talk apps] In recent years, the market for "VTubers," streamers who use characters (avatars) to post videos or broadcast live, has been expanding. More than 60% of the revenue of VTuber management companies comes from non-streaming revenue, mainly from content and merchandise sales based on the character's appeal. Therefore, it can be assumed that many users are willing to spend on content that leaves a tangible trace or can be enjoyed personally, rather than only on donations to streamers.
[0020] As mentioned above, the emerging VTuber market is thriving. Similarly, in the anime market, a culture in which Japan takes pride, dedicated fans can be expected to spend on content beyond the anime itself, and overseas fans can be expected to do so as well. In other words, new markets can be expected to open up by utilizing characters.
[0021] [2-1. Official character / person accounts with voice-over functionality] In this embodiment, in a chat / talk application such as a messaging app, a virtual space (talk room) is created where the user can have a one-on-one conversation with an official account (OA). When the user selects a character icon and asks a question in the talk room, an AI (Artificial Intelligence) such as a GPT (Generative Pre-trained Transformer) generates a response, which is then read aloud in the character's voice.
[0022] For example, as shown in Figure 1, the server device 100 sets the character that will send messages in the messaging application (step S1). For example, if user U adds a character's official account (OA) as a friend, the server device 100 sets that character as the character that will send messages. Alternatively, the server device 100 accepts settings regarding the conditions for the character that will send messages and selects a character that meets those conditions. At this time, the server device 100 may present multiple existing characters that meet the conditions as candidates to the user U who has entered the character conditions, and allow them to select one. Alternatively, the server device 100 may use AI to automatically generate an original character that meets those conditions.
[0023] Furthermore, the server device 100 may receive advertising information and the setting of a character that reads out the advertising information as a message from the advertiser (which may also be an advertising agency) that has submitted a bid for the advertisement. In this case, the server device 100 may set the character set by the advertiser as the character that sends messages to user U, or if user U adds the character's official account (OA) as a friend, the server device 100 may set that character as the character that sends messages.
[0024] Furthermore, the server device 100 may charge per character when setting up a character. The charges may vary for each character.
[0025] If no character is set, the server device 100 may set a predetermined default character. The default character may be an anonymous character or a fictional character.
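The character-setting step (step S1) and its fallback to a default character can be sketched as follows. This is a minimal illustration only: the `Character` type, the `select_character` function, and in particular the priority order among the alternatives are assumptions, since the description lists the options without ranking them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Character:
    name: str
    genre: str  # e.g. "anime", "vtuber", "person" (hypothetical labels)


# Hypothetical default character used when nothing is set ([0025]).
DEFAULT_CHARACTER = Character(name="default", genre="anonymous")


def select_character(
    friend_oa: Optional[Character] = None,
    advertiser_choice: Optional[Character] = None,
    candidates: Optional[List[Character]] = None,
    wanted_genre: Optional[str] = None,
) -> Character:
    """Pick the character that sends messages (step S1).

    The priority order is an assumption made for this sketch.
    """
    if friend_oa is not None:  # user added the character's OA as a friend
        return friend_oa
    if advertiser_choice is not None:  # character set by the advertiser
        return advertiser_choice
    for candidate in candidates or []:  # first existing character meeting the condition
        if candidate.genre == wanted_genre:
            return candidate
    return DEFAULT_CHARACTER  # nothing set: fall back to the default
```

Generating an original character with AI, which the description also allows, is left out of this sketch.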
[0026] Next, the server device 100 performs a search based on the search query entered by user U (step S2). The search query is not limited to keywords; it can be a question, a sentence, an image, or audio. In other words, any input from user U is acceptable, regardless of data type or content. For example, the search query could be a message entered by user U. Furthermore, the AI and the search engine may work together to perform the search.
[0027] Next, the server device 100 passes the AI a prompt (instruction text) requesting the generation of search results corresponding to the search query and a message explaining the search results. The server device 100 then uses the AI to generate a message explaining the search results in an expression that matches the personality of the configured character (step S3). For example, the server device 100 may generate a message in which the configured character makes comments, advice, additional information, or related anecdotes about the search results.
[0028] In this case, the server device 100 may generate a message that explains advertising information related to the search results, as a message that explains the search results.
[0029] Furthermore, if no character is set, the server device 100 does not need to generate a message explaining the search results in response to the search query. Alternatively, the server device 100 may generate a message using a predefined phrase (or template) to explain the search results.
[0030] Next, the server device 100 passes the search results (or a message describing the search results) and a prompt requesting the generation of a script for voice reading to the AI, and uses the AI to generate a script that matches the personality of the configured character (step S4).
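Steps S3 and S4 amount to two prompt-driven generation calls: one for the displayed message and one for the read-aloud script. The sketch below shows one way this could look. The prompt wording and all function names are assumptions, and the AI is represented by any callable that maps a prompt string to a response string rather than by a specific model API.

```python
from typing import Callable, List, Tuple


def build_message_prompt(character: str, query: str, results: List[str]) -> str:
    """Prompt for step S3: explain the search results in the character's style."""
    lines = [
        f"You are the character '{character}'.",
        f"The user asked: {query}",
        "Search results:",
    ]
    lines += [f"- {result}" for result in results]
    lines.append("Explain these results in this character's own style.")
    return "\n".join(lines)


def build_script_prompt(character: str, message: str) -> str:
    """Prompt for step S4: turn the message into a read-aloud script."""
    return (
        f"Rewrite the following message as a read-aloud script "
        f"in the tone of '{character}':\n{message}"
    )


def generate_message_and_script(
    llm: Callable[[str], str], character: str, query: str, results: List[str]
) -> Tuple[str, str]:
    """Run S3 then S4; `llm` stands in for the AI (hypothetical interface)."""
    message = llm(build_message_prompt(character, query, results))
    script = llm(build_script_prompt(character, message))
    return message, script
```

Keeping the message and the script as separate outputs matches the description's point that the displayed text and the spoken audio do not have to be identical.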
[0031] Next, the server device 100 generates audio data in which the script is read aloud in the voice and tone of the designated character (step S5). For example, the server device 100 may request the person in charge of the character's voice (the person themselves, a voice actor, etc.) to read the script aloud and receive audio data of the script being read aloud from that person. In practice, the server device 100 may not request the person in charge directly, but rather through the agency to which the person in charge belongs, or a company that holds the rights to the character. Alternatively, the server device 100 may automatically generate audio data of the script being read aloud using an AI that has learned the character's voice.
[0032] Furthermore, the content of the message explaining the search results does not necessarily have to match the content of the script (and the audio data of the script being read aloud). In other words, the server device 100 does not have to read the message explaining the search results exactly as it is in the character's voice. For example, the server device 100 may generate audio data that gives a sense of live performance or the feeling that the character is speaking extemporaneously (e.g., misreading the message, going off on a tangent, inserting jokes, etc.).
[0033] Furthermore, the server device 100 may generate a script that explains advertising information related to the search results, and generate audio data in which the script is read aloud in the voice and tone of a designated character.
[0034] Furthermore, if no character is set, the server device 100 does not need to generate a script and audio data. Alternatively, the server device 100 may generate a script using a predefined phrase and generate audio data of the script being read aloud using a normal voice (default voice, machine voice, etc.).
[0035] Next, the server device 100 displays the search results along with a message explaining the search results in the messaging application (step S6). For example, the server device 100 displays a speech bubble from a character icon in the messaging application, and displays the search results and a message from the character explaining the search results within the speech bubble.
[0036] Furthermore, if no character is set, the server device 100 may display only the search results and not display a message explaining the search results according to the search query. Alternatively, the server device 100 may display a message using a pre-defined format.
[0037] Next, the server device 100 plays the audio data in which the script described above is read aloud in the voice and tone of the designated character, thereby explaining the search results in the character's voice and tone (step S7). For example, the server device 100 displays a message explaining the search results and simultaneously plays the audio data. Alternatively, after displaying a message explaining the search results, the server device 100 plays the audio data when user U presses (clicks or taps) the character's icon (or speech bubble). Note that the content of the displayed message and the content of the audio explaining the search results do not necessarily have to match.
[0038] Furthermore, the server device 100 may charge a fee each time it plays audio data (when it explains the search results in the character's voice and tone). In other words, the server device 100 may charge a fee each time it explains the search results in the character's voice and tone. The fee may vary depending on the character. If the content of the search result explanation is advertising information, the advertiser may be charged.
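The per-playback charging described above can be sketched as a small ledger. The fee table, the identifiers, and the rule that non-advertising playback is billed to the user are assumptions for illustration; the description only states that a fee may be charged per playback, may vary by character, and may be charged to the advertiser when the content is advertising information.

```python
from collections import defaultdict
from typing import Optional, Tuple

# Hypothetical per-character fees in yen.
PLAYBACK_FEE_YEN = {"character_a": 10, "character_b": 30}
DEFAULT_FEE_YEN = 5  # hypothetical fee for characters not in the table


class PlaybackLedger:
    """Accumulates charges each time character audio is played."""

    def __init__(self) -> None:
        self.charges = defaultdict(int)  # payer id -> total yen

    def record_playback(
        self, character: str, user_id: str, advertiser_id: Optional[str] = None
    ) -> Tuple[str, int]:
        fee = PLAYBACK_FEE_YEN.get(character, DEFAULT_FEE_YEN)
        # Assumption: ad playback is billed to the advertiser, anything else to the user.
        payer = advertiser_id if advertiser_id is not None else user_id
        self.charges[payer] += fee
        return payer, fee
```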
[0039] In this embodiment, the following effects are expected.
• Acquiring core fans of the content domestically (appealing to core fans of the character or the work in which it appears).
• Differentiation and monetization: AI-generated text responses alone are free, and voice-over responses using character voices are paid (premium).
• Use as an AI-powered Q&A tool that guides tourists from overseas around Japan.
• Attracting attention from analysts as a new form (usage pattern) of AI utilization in Japan.
[0040] The voice reading in a character's voice is performed by an AI that has learned the character's voice; in other words, "automated voice × AI-generated response = voice reading in a character's voice." Examples of characters include VTubers, voice actors / anime characters, idols, comedians, actors, athletes, celebrities, and other real people. The characters are not limited to existing ones; original characters are also acceptable. Because the voice is automated and the response is AI-generated, responses can be generated and read aloud in foreign languages as well as Japanese.
[0041] For example, when a user performs a keyword search for recommended restaurants in a certain area, and there are "no bids" (default) for that keyword (meaning no ads are registered for that keyword), the server device 100 provides search results along with a standard phrase corresponding to the keyword, such as "This area is recommended." For text-to-speech, a normal voice (default voice, machine voice, etc.) is used. Alternatively, only text responses may be provided, without any voice descriptions or readings of each restaurant.
[0042] Furthermore, if there is a bid (i.e., an ad has been registered for the keyword), the server device 100 outputs the ad text of the bidding store and provides the search results below it. In other words, the server device 100 provides the keyword search results and the ad of the bidding store (an ad related to the keyword or search result). At this time, the server device 100 reads aloud the store description along with the campaign in a normal voice. The search results may also be read aloud in a normal voice. The server device 100 may also repeatedly read the ad aloud in a normal voice while the user is viewing the search results. For example, the server device 100 may read the ad aloud in a normal voice at regular intervals in place of background music.
[0043] Furthermore, in the case of "Bidding enabled × Custom settings (character customization)," the server device 100 inputs a prompt (instruction text) such as "Please explain the advertisement as this character," along with the advertisement text of the bidding store, to an AI such as GPT. The AI generates text as if the character were explaining the store; the server device 100 provides the generated text first and then provides the search results below it. At this time, the server device 100 returns a speech synthesis script that reflects the reading style appropriate for the character (character-specific tone, phrasing, sentence endings, etc.) to the user's terminal device 10, and has the character's voice read aloud the store explanation along with the campaign based on the speech synthesis script. The server device 100 may also have the character's voice repeatedly read the advertisement while the user is viewing the search results. The text display of the generated text may also be changed to a style appropriate for the character (character-specific tone, phrasing, sentence endings, etc.). The search results may also be changed to a display format and expression appropriate for the character. The server device 100 may also charge the advertiser each time the character reads the text aloud.
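The three cases in [0041] to [0043] (no bid, bid with normal voice, bid with character customization) can be summarized in a single dispatch function. This is a sketch under stated assumptions: the `compose_response` name, the returned dictionary layout, and the `llm` callable are hypothetical, while the standard phrase and the instruction text are taken from the description.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional

STANDARD_PHRASE = "This area is recommended."  # standard phrase from [0041]


class Voice(Enum):
    NORMAL = "normal"        # default / machine voice
    CHARACTER = "character"  # voice of the configured character


def compose_response(
    results: List[str],
    ad_text: Optional[str] = None,
    character: Optional[str] = None,
    llm: Optional[Callable[[str], str]] = None,
) -> Dict[str, object]:
    """Return the lead text shown above the results and the voice to use."""
    if ad_text is None:
        # No bid: standard phrase in a normal voice.
        return {"lead": STANDARD_PHRASE, "voice": Voice.NORMAL, "results": results}
    if character is None or llm is None:
        # Bid without character customization: the ad text in a normal voice.
        return {"lead": ad_text, "voice": Voice.NORMAL, "results": results}
    # Bid with character customization: ask the AI to explain the ad in character.
    prompt = f"Please explain the advertisement as this character ({character}):\n{ad_text}"
    return {"lead": llm(prompt), "voice": Voice.CHARACTER, "results": results}
```

In every case the lead text comes first and the search results follow, which mirrors the ordering stated in the description.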
[0044] Furthermore, when the server device 100 utilizes the voices of existing characters (including their use as training data for training models / AI), it ensures that appropriate compensation is paid to the rights holders (companies, agencies, voice actors, etc.) related to the characters or their voices. The amount of compensation (a percentage of sales, a unit price per reading, etc.) may be determined through discussions or consultations with the rights holders.
[0045] • Ad settings: The server device 100 allows advertisers (or advertising agencies) to set the character, target audience (target users), region, etc., as advertising settings. For example, the server device 100 allows advertisers to set the character genre (anime character, VTuber, person, etc.), the type of character (human, non-human, etc.), and the age range of the character (under-student, student, etc.).
[0046] Furthermore, the server device 100 allows advertisers to set the attributes and segments of the target users (age, gender, etc.) and the target region for advertising (areas close to the store's location, trade area, etc.). Additionally, the server device 100 allows advertisers to set the origin and place of residence of the character. For example, an advertiser can specify a local character, or a character tied to a specific location that fans visit on so-called "pilgrimages."
[0047] Furthermore, the server device 100 may allow advertisers to directly nominate characters. For example, the server device 100 may employ characters whose official accounts (OAs) have been added as friends by advertisers.
[0048] The server device 100 may select (or present as a candidate) an existing character according to the advertising settings, or it may generate an original character using AI.
[0049] • Bidding settings: Furthermore, the server device 100 accepts a bid price for each time an advertisement is read aloud (displayed), has the stores that bid at the same time compete, and identifies the store that submits the highest bid as the advertiser.
[0050] Due to confidentiality issues regarding communications, logs of conversations cannot be used. Therefore, an auction using CTR (Click Through Rate) is not possible, and the competition will be based on the order of bids.
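Because click-through data is unavailable, the auction reduces to ranking by bid price alone. A minimal sketch follows; the function names are hypothetical, and the tie-breaking rule (the earlier entry wins) is an assumption, since the description does not say how equal bids are resolved.

```python
from typing import Dict, List, Optional


def rank_bids(bids: Dict[str, int]) -> List[str]:
    """Order stores by bid price per read-aloud, highest first.

    sorted() is stable, so stores with equal bids keep their insertion
    order (treated here as submission order; an assumption).
    """
    return sorted(bids, key=lambda store: bids[store], reverse=True)


def pick_winner(bids: Dict[str, int]) -> Optional[str]:
    """Return the store with the highest bid, or None if there are no bids."""
    ranking = rank_bids(bids)
    return ranking[0] if ranking else None
```

A `None` result corresponds to the "no bids" default case, in which a standard phrase is used instead of an advertisement.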
[0051] • Setting items: Furthermore, the server device 100 allows advertisers to set the following items: store name (e.g., Toriou Keisuke Akihabara Store), pronunciation of the store name (e.g., Toriou Keisuke Akihabara Ten), campaigns (free seasoned egg, double stamps, one free extra serving of noodles, etc.), store URL, address / location (used for matching with the user's location or search terms), and business days / hours (used for matching with the user's search time). The server device 100 may also limit the number of characters allowed for campaign input, for example to a maximum of 7 characters.
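The setting items and the campaign length limit could be modeled as a validated record. This is a sketch only: the `StoreAdSettings` class, its field names, the hour-based representation of business hours, and the sample URL and address are assumptions. The 7-character limit is the example value given in the description.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_CAMPAIGN_CHARS = 7  # example limit from the description


@dataclass
class StoreAdSettings:
    store_name: str
    store_name_reading: str  # pronunciation used for read-aloud
    campaign: str
    url: str
    address: str
    business_hours: Tuple[int, int]  # (opening hour, closing hour), same day only

    def __post_init__(self) -> None:
        if len(self.campaign) > MAX_CAMPAIGN_CHARS:
            raise ValueError(
                f"campaign must be at most {MAX_CAMPAIGN_CHARS} characters"
            )

    def is_open_at(self, hour: int) -> bool:
        """Match the user's search time against business hours."""
        opening, closing = self.business_hours
        return opening <= hour < closing


settings = StoreAdSettings(
    store_name="Toriou Keisuke Akihabara Store",
    store_name_reading="Toriou Keisuke Akihabara Ten",
    campaign="2x pts",                # 6 characters, within the limit
    url="https://example.com/store",  # placeholder URL
    address="Akihabara",              # placeholder address
    business_hours=(11, 22),          # hypothetical hours
)
```

Business days and hours that run past midnight are not handled in this sketch.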
[0052] • CP: Furthermore, the server device 100 engages the actual person behind the character (the voice actor) for each official account (OA) of a character, has that person read a script for voice synthesis, and transacts the resulting voice data. At this time, the server device 100 pays the CP (Contents Provider), who is the rights holder of the character, a reward or consideration according to one of the following Proposals A to C. The CP is the owner (rights holder) of the character whose OA is registered as a friend.
[0053] (Proposal A) Purchase of audio data: The server device 100 purchases audio data at a tiered price (price ladder) based on popularity.
[0054] (Proposal B) Revenue sharing for text-to-speech ads: The server device 100 shares 20% of the revenue generated from reading aloud (with or without sound) on the OA with the CP (revenue share: R / S).
[0055] (Proposal C) Performance-based compensation based on the number of friends: The server device 100 pays a predetermined amount (for example, 50 yen per person) for each friend (each person who has added the OA as a friend). Note that only first-time friend registrations are counted.
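The arithmetic behind Proposals A to C can be sketched as three small functions. The 20% share and the 50 yen per friend are the figures given in the description. The price ladder tiers, the use of friend count as the popularity measure, and all function names are hypothetical values chosen only for illustration.

```python
from typing import Iterable

# Hypothetical price ladder for Proposal A: (minimum friend count, price in yen).
PRICE_LADDER = [(100_000, 500_000), (10_000, 100_000), (0, 20_000)]


def purchase_price_yen(friend_count: int) -> int:
    """Proposal A: tiered purchase price based on popularity."""
    if friend_count < 0:
        raise ValueError("friend_count must be non-negative")
    for minimum, price in PRICE_LADDER:  # tiers are listed highest first
        if friend_count >= minimum:
            return price
    raise AssertionError("unreachable: the last tier starts at 0")


def revenue_share_yen(read_aloud_revenue_yen: int, share_percent: int = 20) -> int:
    """Proposal B: share of read-aloud revenue paid to the CP."""
    return read_aloud_revenue_yen * share_percent // 100


def friend_bonus_yen(friend_events: Iterable[str], per_friend_yen: int = 50) -> int:
    """Proposal C: fixed amount per friend, counting each user once.

    Using a set drops repeat registrations by the same user, so only
    first-time registrations are paid.
    """
    return len(set(friend_events)) * per_friend_yen
```

For example, 10,000 yen of read-aloud revenue yields 2,000 yen for the CP under Proposal B, and three registration events from two distinct users yield 100 yen under Proposal C.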
[0056] • User: Basically, users are not charged, so that they can freely ask the questions (= searches) that serve as the starting point for advertisements. The server device 100 provides a function that reads aloud only the first sentence in the basic plan (default), and may also provide "full text reading" and "voice-enabled push notifications" as paid plans. The learning content and dialogue responses are the same in both the basic and paid plans.
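The difference between the basic plan and the paid plan can be sketched as a function that selects how much of the script is read aloud. The function name and the sentence-splitting rule are assumptions; a simple search for the first sentence-ending mark is used here, which would cut early on abbreviations such as "Dr." and is not a full sentence segmenter.

```python
import re

# Sentence-ending marks for English and Japanese text.
SENTENCE_END = re.compile(r"[.!?。！？]")


def text_to_read(script: str, paid_plan: bool) -> str:
    """Return the full script for paid plans, else only the first sentence."""
    if paid_plan:
        return script
    match = SENTENCE_END.search(script)
    if match is None:
        return script  # no sentence boundary found: read everything
    return script[: match.end()]
```

The generated script itself is the same for both plans, consistent with the statement that the dialogue responses do not differ; only the portion that is voiced changes.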
[0057] Furthermore, the server device 100 may allow users, rather than advertisers, to specify the character. For example, the server device 100 may have a character that the user has added as a friend on the official account (OA) read out the advertisement.
[0058] Furthermore, the server device 100 may also perform matching between users or characters and advertisements. For example, the server device 100 may list advertisements based on user × search target (query spoken by the user) × character.
[0059] Furthermore, advertisers may specify the user, search target, and character when submitting their ads. For example, advertisers can input what the character should say when submitting their ads.
[0060] • Another perspective: The server device 100 optimizes the search result message according to the bidding status. For example, the server device 100 searches for keywords entered by the user in the AI chat. Next, the server device 100 generates a message explaining the search results based on the keyword search results and the ad submission information pre-configured for the search type. If there are ads corresponding to the keywords or search results, the server device 100 generates an ad message. Finally, the server device 100 provides the search results and message to the user. The search results and message may be read aloud in a normal voice or a character's voice. The server device 100 may also modify the message portion according to the configured items.
[0061] Furthermore, the server device 100 will add (display) "PR," "<Advertisement>," or "*This is a commercial." to part of the display for advertisements. For example, if the content of a message spoken by a character is related to an advertisement, the server device 100 will display information indicating that it is an advertisement (such as "PR") in part of the message. The server device 100 may also include information indicating that it is an advertisement in part of the voice spoken by the character. For example, the server device 100 may notify the listener that the spoken message is an advertisement, either in the character's voice or in normal voice.
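Marking advertising content in both the displayed message and the spoken script can be sketched as follows. The label strings come from the description. The function names, the placement of the label at the start of the message, and the wording of the spoken notice are assumptions.

```python
# Label strings listed in the description.
AD_LABELS = ("PR", "<Advertisement>", "*This is a commercial.")


def label_message(message: str, is_ad: bool, label: str = AD_LABELS[0]) -> str:
    """Prefix a displayed message with an ad label when it is advertising."""
    if not is_ad:
        return message
    return f"[{label}] {message}"


def label_script(
    script: str, is_ad: bool, notice: str = "This is an advertisement."
) -> str:
    """Prefix a read-aloud script with a spoken ad notice (hypothetical wording)."""
    return f"{notice} {script}" if is_ad else script
```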
[0062] Furthermore, the server device 100 performs character-based matching. For example, the server device 100 matches characters with submitted information (advertisements) and users.
[0063] Furthermore, if the search results include a facility for which an advertisement has been submitted, the server device 100 does not include that facility in the search results but provides it in a separate message. The server device 100 also uses AI to generate messages that reflect the characteristic speaking style (tone of voice, phrasing, timing of speech, sentence endings, etc.) of a specified character.
[0064] [2-2. Push notification advertisements] In this embodiment, the character sends an advertisement in the form of a message via push notification to a user U who has added the character's official account (OA) as a friend. Figure 2 is an explanatory diagram showing an overview of the push notification advertisement according to this embodiment.
[0065] For example, as shown in Figure 2, the server device 100 receives from the advertiser the advertising settings, along with the advertising information (read-aloud information) and the settings for a character that reads the advertising information as a read-aloud voice (step S11). The advertising settings include information about the location and / or time (time period) in which the advertising information will be presented. The location and / or time in which the advertising information will be presented may be a location and / or time related to the content of the advertising information (for example, around the store's location, during the store's business hours, etc.), may be random, or may be a location and / or time related to the character (for example, the broadcast time of the work in which the character appears, the region in which the work is set, etc.).
[0066] Next, the server device 100 accepts a friend request from user U to the character's official account (OA) (step S12). At this time, the server device 100 may also receive information from user U indicating that user U has added the character's official account (OA) as a friend. Alternatively, the server device 100 may accept a setting from user U to allow push notifications and voice readings of advertising information by the character set by the advertiser.
[0067] Next, the server device 100 passes a prompt to the AI requesting the generation of a message regarding advertising information, and the AI uses this prompt to generate a message regarding advertising information in an expression that matches the personality of the configured character (step S13).
[0068] Next, the server device 100 passes advertising information (or a message related to advertising information) and a prompt requesting the generation of a script for voice reading to the AI, and the AI uses this prompt to generate a script that matches the personality of the configured character (step S14).
[0069] Next, the server device 100 generates audio data in which the script is read aloud in the voice and tone of the designated character (step S15).
[0070] Next, the server device 100 sends a message regarding the advertisement information to the user U's terminal device 10 via push notification based on the advertisement settings, and displays it on the screen (step S16). For example, the server device 100 sends a message regarding the advertisement information to the user U's terminal device 10 via push notification at the location and / or time (time period) where the advertisement information is presented, and displays it on the screen. The screen on which the message is displayed may be the screen of a messenger application, or it may be a modal / pop-up / dialog, etc. Alternatively, the server device 100 may, in a messaging application, display a speech bubble from a character icon, and display the advertisement information and a message from the character regarding the advertisement information within the speech bubble.
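The location and / or time condition used in step S16 might be checked as in the following sketch. The `AdSettings` fields, the circular delivery area, the haversine distance, and the assumption that the time window does not cross midnight are all illustrative choices that do not appear in the description.

```python
from dataclasses import dataclass
from datetime import time
from math import asin, cos, radians, sin, sqrt
from typing import Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class AdSettings:
    """Hypothetical container for the advertiser's location / time settings."""
    start: Optional[time] = None       # start of the presentation time period
    end: Optional[time] = None         # end of the presentation time period
    center: Optional[LatLon] = None    # e.g. the store's location
    radius_km: Optional[float] = None  # assumed circular delivery area


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def should_deliver(settings: AdSettings, now: time, position: LatLon) -> bool:
    """Return True if every configured condition (time and / or place) holds."""
    if settings.start is not None and settings.end is not None:
        # Assumes the window does not cross midnight.
        if not (settings.start <= now <= settings.end):
            return False
    if settings.center is not None and settings.radius_km is not None:
        if haversine_km(settings.center, position) > settings.radius_km:
            return False
    return True
```

A setting with no conditions always delivers, which corresponds to the case in which neither a location nor a time period has been specified.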
[0071] Next, the server device 100 uses audio data in which the above script is read aloud in the voice and tone of the designated character to notify the user of advertising information in the character's voice and tone (step S17). For example, the server device 100 displays a message about advertising information and simultaneously plays the audio data to notify the user of advertising information. Alternatively, after displaying a message about advertising information, the server device 100 plays the audio data to notify the user of advertising information when the user U presses the displayed message (in the case of a messenger app, an icon or speech bubble). Note that the content of the displayed message and the content of the audio notifying the user of advertising information do not necessarily have to match.
[0072] For example, when an official account (OA) is created, push notifications can be enabled or disabled. If push notifications are enabled, users can receive push notification advertisements from companies, etc. If the audio data has been purchased outright, the server device 100 can also have the original CP (content provider) run an advertisement featuring its own character (that is, request that the character read the advertisement aloud).
[0073] Since there are use cases such as VOD (Video On Demand) and anime-related information distribution, apparel collaborations, and restaurant collaborations, a single character can be contracted for a variety of categories. In this case, the server device 100 will perform checks on the relevance of the character to the category during the review process. As for the billing method, for example, it could be the number of friends (= number of distributions) of the character × unit price (20 yen, etc.).
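The example billing method above is a simple product, sketched below. The function name `push_ad_fee` is hypothetical, and the default unit price of 20 yen is only the example value given in the description.

```python
def push_ad_fee(friend_count: int, unit_price_yen: int = 20) -> int:
    """Fee = number of friends of the character (= number of distributions) x unit price."""
    if friend_count < 0 or unit_price_yen < 0:
        raise ValueError("friend_count and unit_price_yen must be non-negative")
    return friend_count * unit_price_yen
```

Under these assumptions, a character with 10,000 friends and a unit price of 20 yen would result in a fee of 200,000 yen per distribution.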
[0074] [2-2-1. Organic Push Notifications] The server device 100 has a character spontaneously speak to the user via push notification, asking questions such as, "What did you have for lunch today?" The goal is to give the user (OA's friend) a sense of immersion in everyday conversation. The character will then naturally notify the user of advertising information within the flow of this everyday conversation. Basically, there is no charge (user payment) for the character speaking to the user, but if an advertisement is displayed and notified (read aloud) as part of the conversation, charges will be incurred as usual.
[0075] Furthermore, the server device 100 may also allow the ringtone for push notifications to be customized to match the character (customizable settings). For example, a line spoken by the character (a catchphrase, a famous line, etc.) or music unique to the character (onomatopoeia, background music, etc.) may be used as the ringtone for the character. The audio used for the push notification ringtone may be recorded separately when the recording is purchased.
[0076] • Another perspective The server device 100 provides advertisements with specified voice prompts. For example, the server device 100 receives advertisement information and a specified voice prompt from the advertiser. The server device 100 then sends the advertisement information and the voice prompt information as a message via the advertiser's messenger account, and outputs the audio of the voice prompt reading the information to the user's terminal device 10. The user's terminal device 10 reads the voice prompt information aloud as a push notification. At this time, the user's terminal device 10 reads the information aloud only if voice prompting is permitted. Alternatively, the user's terminal device 10 may read the information aloud via messenger. The server device 100 also generates messages in the distinctive speaking style of the specified character.
[0077] [2-3. Character Group Talk Format Advertisement] In this embodiment, in a chat / talk application such as a messaging application, user U gathers characters who have added the official account (OA) as a friend to form a group including user U. User U and each character belonging to the group exchange messages in a conversational format, and a character designated to handle advertising notifies the user of the advertisement in their own voice and tone. The characters belonging to the group may be arbitrarily selected by user U. In other words, user U may arbitrarily set the group and its members. Figure 3 is an explanatory diagram showing an overview of the character group talk format advertisement according to this embodiment.
[0078] For example, as shown in Figure 3, the server device 100 forms a group of multiple characters (step S21). For example, the server device 100 gathers characters whose official accounts (OAs) user U has added as friends and forms a group that includes user U.
[0079] Next, the server device 100 uses AI to provide a chat room in which multiple characters can participate, on a group basis (step S22).
[0080] Next, the server device 100 uses AI to facilitate conversations between participants in the chat room through message exchanges (step S23). For example, the server device 100 uses AI to facilitate conversations between characters and / or between characters and user U through message exchanges in the chat room.
[0081] Next, the server device 100 receives from user U the setting of a character that will formally or finally answer user U's questions (step S24). For example, the server device 100 receives from user U the setting of a character that will speak advertising information. In practice, the character that will answer may be randomly selected from among several characters.
[0082] Next, the server device 100 receives the advertising information settings from the advertiser who submitted the bid (step S25). At this time, the server device 100 may also receive the settings for a character that will read out the advertising information from the advertiser.
[0083] Furthermore, the server device 100 may accept from user U the setting of priority levels for each of multiple characters as respondents, and select a character to be the respondent each time according to the suitability (match degree) between the advertising information and each character, and the priority level of each character. For example, the server device 100 may exclude characters that do not match the advertising information (match degree is below a threshold) even if they have a high priority, and select a character with a high priority among the characters that do match the advertising information. Alternatively, the suitability (match degree) between the advertising information and each character may be used as the priority level for each character as a respondent. For example, the server device 100 may select the character with the highest match degree to the advertising information as the respondent for each piece of advertising information.
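The two selection policies described in this paragraph can be sketched as follows. The `Candidate` fields, the 0 to 1 scale for the match degree, the threshold value of 0.5, and the tie-breaking rule are assumptions for illustration; the description does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    priority: int        # set by user U; higher means preferred
    match_degree: float  # suitability between the ad and the character (assumed 0..1)


def select_respondent(cands: List[Candidate], threshold: float = 0.5) -> Optional[Candidate]:
    """Exclude characters below the match threshold, then pick the highest priority.

    Ties in priority are broken by match degree (an assumption).
    """
    eligible = [c for c in cands if c.match_degree >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.priority, c.match_degree))


def select_by_match(cands: List[Candidate]) -> Optional[Candidate]:
    """Alternative policy: use the match degree itself as the priority."""
    return max(cands, key=lambda c: c.match_degree) if cands else None
```

With characters A (priority 3, match 0.2), B (priority 2, match 0.8), and C (priority 1, match 0.9), the first policy selects B because A is excluded despite its high priority, while the second policy selects C.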
[0084] Next, the server device 100 performs a search in the chat room based on the search query entered by user U (step S26). For example, the server device 100, in cooperation with the AI and search engine, receives a question from user U, extracts a search query from that question, and performs a search based on the search query.
[0085] Next, the server device 100 displays messages other than advertising information related to the search results as statements made by characters other than the selected character (respondent) in the chat room, and also displays messages related to advertising information related to the search results as statements made by the selected character (respondent) (step S27). The generation of character messages is as described above.
[0086] Next, in the chat room, the server device 100 has messages other than advertising information related to the search results read aloud (spoken) in the voice and tone of one of the multiple characters other than the selected character, and has messages related to advertising information related to the search results read aloud in the voice and tone of the selected character (step S28). The generation of the character's voice data is as described above.
[0087] For example, the server device 100 may automatically play audio data and perform voice reading at the same time as displaying the message. Alternatively, the server device 100 may perform voice reading when the user U presses (clicks or taps) the character icon (or speech bubble) after displaying the message. Note that the content of the displayed message and the content of the voice reading do not necessarily have to match.
[0088] At this time, the server device 100 instructs the AI to generate messages and audio data (or scripts) in the conversation, preventing the selected character and other characters from making negative statements about the advertising information (such as "you shouldn't go" or "it's no good"). It is also possible to generate messages from other characters that serve as a "prelude" for the selected character to make a message about the advertising information.
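One way to express the instruction given to the AI in this paragraph is as a prompt assembled from rules, as in the sketch below. The function name, parameter names, and the wording of every rule are hypothetical; the description does not disclose the actual prompt text.

```python
def build_group_talk_prompt(character: str, speaking_style: str,
                            ad_text: str, is_respondent: bool) -> str:
    """Assemble an illustrative instruction prompt for one character in the group."""
    rules = [
        f"Speak as {character} in this style: {speaking_style}.",
        # Applies to the selected character and to the other characters alike.
        "Do not make negative statements about the advertised item.",
    ]
    if is_respondent:
        rules.append(f"Introduce the following advertisement naturally: {ad_text}")
    else:
        # Other characters may provide a lead-in for the respondent's message.
        rules.append("Do not present the advertisement yourself; "
                     "you may set up the respondent's message.")
    return "\n".join(rules)
```

The resulting string would be passed to whichever text generation model is in use; that call is deliberately left out because the description does not name a model or an API.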
[0089] In steps S27 and S28, the server device 100 may display or speak a message regarding advertising information related to the search results only if the character set by the advertiser matches the character set by user U.
[0090] Furthermore, in steps S27 and S28, if no advertising information has been set by the advertiser, or if there is no advertising information related to the search results (i.e., no relevant advertisements), the server device 100 may display or announce a message related to the search results in the usual way, rather than displaying advertising information related to the search results.
[0091] In this embodiment, for a group, a model is prepared that has learned the individuality and characteristics of each character, or a model that has learned the interactions between the characters, and multiple characters speak to each other in an environment where they can converse like in an open-source environment. For example, in a group that includes characters A, B, and C that the user has added as friends, if character B is set as the character that answers the user's questions (the respondent), the conversation will proceed as follows.
[0092] User: Tell me about ramen in Akihabara.
A (background voice 1): Uh, I'm from Hachioji so I don't know.
B (respondent): What kind do you like?
User: Miso with thick noodles.
B (respondent): Then maybe around here?
A (background voice 1): Looks delicious!
C (background voice 2): I've eaten here before, it's delicious!
[0093] In addition to answering user questions, the system can also send push notifications from characters to proactively deliver advertising information. If a user does not respond to a push notification from a character (leaving it unread) and a certain amount of time (e.g., 2 hours) has passed since the notification was received, the character that sent the push notification or another character will call out to the user with a "Hey!" (a final reminder).
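The reminder condition described here can be sketched as a single predicate. The function name, the `already_reminded` flag (which reflects a reading of "final reminder" as a one-time reminder), and the use of naive `datetime` values are assumptions; the 2-hour delay is the example value from the description.

```python
from datetime import datetime, timedelta

REMINDER_DELAY = timedelta(hours=2)  # example value given in the description


def needs_reminder(sent_at: datetime, read: bool, now: datetime,
                   already_reminded: bool = False,
                   delay: timedelta = REMINDER_DELAY) -> bool:
    """Return True if an unread push notification is old enough for a reminder."""
    return (not read) and (not already_reminded) and (now - sent_at >= delay)
```

Whether the reminder is sent by the original character or by another character is a separate decision that this sketch does not cover.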
[0094] Furthermore, for example, during the time slot when an anime or drama is airing, users can ask the characters appearing in that anime or drama, "Are you watching?", enabling them to converse and comment on the characters in the anime or drama they are watching in real time.
[0095] Conversely, push notifications can also be sent from the characters to the users. During the time (time slot) when the anime or drama is airing, it's possible for the characters appearing in the anime or drama to send push notifications to the users with messages or voice messages such as, "It's on now, are you watching?"
[0096] The script (content) of the character's commentary may be generated using AI based on momentum data such as real-time searches.
[0097] • Another perspective The server device 100 provides character group advertisements. For example, in a chat room (talk room) with an AI that provides answers as multiple characters, the server device 100 may select a character from among multiple characters to provide search results based on the search query entered by the user. In practice, the server device 100 may allow the user to specify / select the character that will provide the search results. The server device 100 also provides answers related to the search results (the search results themselves, refined answers, etc.) from the selected character, along with answers other than the search results in the voices of the other characters. In other words, the selected character is the "respondent," and the other characters are the "background voices."
[0098] Furthermore, the server device 100 provides the search results to the AI of the specified / selected character, causing it to generate an answer that includes the search results. At this time, the server device 100 generates the answer while reflecting the affinity between the character and the search query.
[0099] Furthermore, the server device 100 does not provide search results to the AI of other characters, and causes them to generate answers that do not include search results (answers unrelated to search results). In this case, if the generated answer happens to include search results, the server device 100 discards that answer and causes the AI to generate a new answer. Alternatively, a prompt may be generated and given to the AI to exclude search results.
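The discard-and-regenerate behavior for the other characters can be sketched as a bounded retry loop. The substring check, the `max_attempts` limit, and the injected `generate` callable are assumptions for illustration; the description does not state how inclusion of search results is detected or how many retries are allowed.

```python
from typing import Callable, Iterable, Optional


def contains_search_result(answer: str, result_terms: Iterable[str]) -> bool:
    """Naive check: does the answer mention any term taken from the search results?"""
    lowered = answer.lower()
    return any(term.lower() in lowered for term in result_terms)


def generate_background_answer(generate: Callable[[], str],
                               result_terms: Iterable[str],
                               max_attempts: int = 3) -> Optional[str]:
    """Ask the AI for an answer, discarding any that happens to include search results."""
    terms = list(result_terms)  # allow repeated checks even if an iterator was passed
    for _ in range(max_attempts):
        answer = generate()
        if not contains_search_result(answer, terms):
            return answer
    return None  # no acceptable answer within the attempt limit
```

Here `generate` stands in for a call to the AI of a character that was not given the search results. Returning `None` after repeated failures is a design choice of this sketch; a caller could instead fall back to the prompt-based exclusion mentioned in the paragraph above.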
[0100] Furthermore, if a search query has a corresponding advertisement specification, the server device 100 narrows down the corresponding characters.
[0101] [2-4. Voice reading for OA support in search navigation] When user U is a search user rather than an OA user, the server device 100 displays a still image of a character as an overlay on a portion of the screen (search screen) of the user's terminal device 10, such as the lower right corner. Figure 4 is an explanatory diagram showing an overview of OA-compatible voice reading for a search user according to this embodiment.
[0102] For example, as shown in Figure 4, the server device 100 performs a search in the search engine based on the search query entered by user U (step S31).
[0103] Next, the server device 100 has the answer generation AI read the results of the search performed by user U in the search engine, and has it generate text related to the search results as an answer (step S32). Note that the answer is not limited to text and may also include images.
[0104] Next, the server device 100 has the character conversion AI read the text related to the search results to generate a script for the character to read aloud (step S33). In practice, the server device 100 may have the character conversion AI read the search results to generate a script for the character to read aloud. This character may be a randomly selected character, a character related to (highly relevant to) the search results, a character set by an advertiser who submitted an advertisement related to the search results, a character set by user U, or a character from a work or other project that user U is interested in based on their browsing history.
[0105] Next, the server device 100 generates audio data in which the script is read aloud in the character's voice and tone (step S34).
[0106] Next, the server device 100 displays the character image and the messenger app icon along with the text related to the search results in the search results display area (step S35). At this time, the server device 100 may also display a play button if there is audio data.
[0107] Next, the server device 100 uses audio data to have the script read aloud in the character's voice and tone (step S36). At this time, the server device 100 may also play the audio data and explain the search results in the character's voice and tone when the user U presses the character's image (or play button). Alternatively, the server device 100 displays the text related to the search results and simultaneously plays the audio data and explains the search results in the character's voice and tone.
[0108] Next, when the user U presses the icon, the server device 100 displays a QR code (registered trademark) for adding the character as a friend in the messenger app and guides the user to the messenger app (step S37).
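The flow of steps S31 to S35 can be summarized as a pipeline of injected components, sketched below. The `NavigationResult` type and all parameter names are hypothetical, and the four callables stand in for the search engine, the answer generation AI, the character conversion AI, and the speech synthesis stage, none of whose interfaces are specified in the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NavigationResult:
    answer_text: str        # text related to the search results (S32)
    script: str             # script for the character to read aloud (S33)
    audio: Optional[bytes]  # audio data in the character's voice (S34)
    show_play_button: bool  # shown only if audio data exists (S35)


def run_search_navigation(query: str,
                          search: Callable[[str], List[str]],
                          generate_answer: Callable[[List[str]], str],
                          convert_to_character: Callable[[str], str],
                          synthesize: Callable[[str], Optional[bytes]]) -> NavigationResult:
    """Run the search, answer generation, character conversion, and synthesis in order."""
    results = search(query)                 # S31
    answer = generate_answer(results)       # S32
    script = convert_to_character(answer)   # S33
    audio = synthesize(script)              # S34
    return NavigationResult(answer, script, audio, show_play_button=audio is not None)
```

Playback (S36) and the guidance to the messenger app (S37) are triggered by user operations on the terminal device and are therefore outside this sketch.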
[0109] Furthermore, if user U has already registered a character as a friend, the server device 100 may display a badge on the icon indicating that the character has been registered as a friend (or change the icon), and display special information (text, images, etc.) that is only provided to registered friends.
[0110] Furthermore, if user U has already registered a character as a friend, the server device 100 may generate a script with special content that is only provided if the character is registered as a friend. Also, if user U has already registered a character as a friend, the server device 100 may generate audio data in which the script is read aloud in a special voice and tone that is only provided if the character is registered as a friend.
[0111] Furthermore, the server device 100 may also configure the search settings of the search engine to allow user U to specify the character who will read the script aloud.
[0112] In this embodiment, the server device 100 reads aloud AI-generated suggestions in the character's voice in response to the user pressing the "read aloud button" (play button). At this time, the server device 100 reads aloud a character script generated by a model that has learned the character's characteristics (character-specific tone of voice, phrasing, sentence endings, etc.).
[0113] The server device 100 generates search results in response to user searches (questions, inquiries) using a combination of a search engine and an answer generation AI. For example, the server device 100 generates a script by having a character conversion AI read the answer generated by the answer generation AI to the user's question. Then, when the audio playback button is pressed, the server device 100 has the registered synthesized voice read the script aloud. The server device 100 uses audio data purchased in advance for the official account (OA) to read the script aloud.
[0114] In the PC version, the user specifies, in the search settings, the character that will read aloud. If there is an AI response that can be read aloud, the server device 100 displays a play button in the search results. The server device 100 may also display a QR code (registered trademark) for registering as an OA friend when the user presses the app icon. Note that the QR code (registered trademark) is just one example. In practice, it is not limited to two-dimensional codes such as the QR code (registered trademark). The server device 100 may also display a special effect if the user has already registered as an OA friend.
[0115] • Another perspective The server device 100 provides voice reading support for official accounts (OAs) in the search process. For example, the server device 100 generates text for search results corresponding to search queries in a search domain, based on character information specified in the search and the corresponding advertisement.
[0116] The server device 100 may also be configured to perform voice reading using the character set in the advertisement for the official account (OA).
[0117] Furthermore, the server device 100 may provide the voice reading service only to users who have registered the character OA as a friend.
[0118] Furthermore, the server device 100 may perform special functions if the user has registered the character's OA as a friend. If the user has not registered the character's OA as a friend, the server device 100 may display a friend registration button when performing voice reading.
[0119] [3. Example of terminal device configuration] Next, the configuration of the terminal device 10 will be described using Figure 5. Figure 5 is a diagram showing an example of the configuration of the terminal device 10 according to this embodiment. As shown in Figure 5, the terminal device 10 comprises a communication unit 11, a display unit 12, an input unit 13, a positioning unit 14, a sensor unit 20, a control unit 30 (controller), and a storage unit 40.
[0120] (Communication unit 11) The communication unit 11 is connected to the network N by wire or wireless connection and transmits and receives information to and from the server device 100 via the network N. For example, the communication unit 11 is implemented by a NIC (Network Interface Card), an antenna, and a communication circuit according to the communication method.
[0121] (Display unit 12) The display unit 12 is a display device that displays various information such as location information. For example, the display unit 12 may be a liquid crystal display (LCD) or an organic electro-luminescent display (OLED). The display unit 12 may also be a touch panel display, but is not limited to this.
[0122] (Input unit 13) The input unit 13 is an input device that receives various operations from the user U. For example, the input unit 13 has buttons for inputting characters, numbers, etc. The input unit 13 may also be an input / output port (I / O port) or a USB (Universal Serial Bus) port. If the display unit 12 is a touch panel display, a part of the display unit 12 functions as the input unit 13. The input unit 13 may also be a microphone that receives voice input from the user U. The microphone may be wireless.
[0123] (Positioning unit 14) The positioning unit 14 receives signals (radio waves) transmitted from GPS (Global Positioning System) satellites and, based on the received signals, acquires position information (e.g., latitude and longitude) indicating the current position of the terminal device 10. In other words, the positioning unit 14 determines the position of the terminal device 10. Note that GPS is just one example of a GNSS (Global Navigation Satellite System).
[0124] Furthermore, the positioning unit 14 can determine its position using various methods other than GPS. For example, the positioning unit 14 may use various communication functions of the terminal device 10 to determine its position as an auxiliary positioning means for position correction, etc., as described below.
[0125] (Wi-Fi positioning) For example, the positioning unit 14 determines the location of the terminal device 10 by utilizing the Wi-Fi® communication function of the terminal device 10 and the communication network provided by each telecommunications company. Specifically, the positioning unit 14 determines the location of the terminal device 10 by performing Wi-Fi communication, etc., and determining the distance to nearby base stations and access points.
[0126] (Beacon positioning) Furthermore, the positioning unit 14 may determine the location using the Bluetooth® function of the terminal device 10. For example, the positioning unit 14 determines the location of the terminal device 10 by connecting to a beacon transmitter via the Bluetooth® function.
[0127] (Geomagnetic positioning) Furthermore, the positioning unit 14 determines the position of the terminal device 10 based on the geomagnetic pattern of the structure, which has been measured in advance, and the geomagnetic sensor provided by the terminal device 10.
[0128] (RFID positioning) Furthermore, if, for example, the terminal device 10 is equipped with an RFID (Radio Frequency Identification) tag function equivalent to that of a contactless IC card used at a train station ticket gate or in a store, or if it is equipped with a function to read RFID tags, the location where it was used will be recorded along with the information on the payment or other transactions made by the terminal device 10. The positioning unit 14 may determine the location of the terminal device 10 by acquiring such information. Alternatively, the location may be determined by an optical sensor or infrared sensor equipped in the terminal device 10.
[0129] The positioning unit 14 may, if necessary, determine the position of the terminal device 10 using one or a combination of the positioning means described above.
[0130] (Sensor unit 20) The sensor unit 20 includes various sensors mounted on or connected to the terminal device 10. The connection can be wired or wireless. For example, the sensors may be detection devices other than the terminal device 10, such as wearable devices or wireless devices. In the example shown in Figure 5, the sensor unit 20 includes an acceleration sensor 21, a gyro sensor 22, a barometric pressure sensor 23, a temperature sensor 24, a sound sensor 25, a light sensor 26, a magnetic sensor 27, and an image sensor (camera) 28.
[0131] The sensors 21-28 described above are merely examples and not limiting. In other words, the sensor unit 20 may be configured to include some of the sensors 21-28, or it may include other sensors such as humidity sensors in addition to or instead of the sensors 21-28.
[0132] The acceleration sensor 21 is, for example, a 3-axis acceleration sensor and detects the physical movement of the terminal device 10, such as its direction of movement, velocity, and acceleration. The gyro sensor 22 detects the physical movement of the terminal device 10, such as its tilt in the three axes, based on its angular velocity. The barometric pressure sensor 23 detects the atmospheric pressure around the terminal device 10, for example.
[0133] Since the terminal device 10 is equipped with the acceleration sensor 21, gyro sensor 22, barometric pressure sensor 23, etc., it becomes possible to determine the position of the terminal device 10 using technologies such as pedestrian dead-reckoning (PDR) that utilize these sensors 21 to 23. This makes it possible to obtain indoor location information that is difficult to obtain with positioning systems such as GPS.
[0134] For example, a pedometer using the acceleration sensor 21 can calculate the number of steps, walking speed, and distance walked. Additionally, the gyro sensor 22 can be used to determine the user U's direction of movement, gaze direction, and body tilt. Furthermore, the barometric pressure detected by the barometric pressure sensor 23 can be used to determine the altitude and floor number of the user U's terminal device 10.
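As an illustration of how barometric pressure can be related to altitude and floor number, the sketch below uses the widely used international barometric formula for the standard atmosphere. The choice of that formula, the sea-level reference of 1013.25 hPa, the 3 m floor height, and the use of a ground-level reference pressure are assumptions; the description does not state how the altitude or floor number is computed.

```python
SEA_LEVEL_HPA = 1013.25  # standard-atmosphere sea-level pressure


def altitude_m(pressure_hpa: float, sea_level_hpa: float = SEA_LEVEL_HPA) -> float:
    """Approximate altitude from pressure using the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


def estimate_floor(pressure_hpa: float, ground_pressure_hpa: float,
                   floor_height_m: float = 3.0) -> int:
    """Estimate the floor number (ground floor = 1) relative to a ground-level reading.

    Using the difference of two altitudes reduces the effect of weather-related
    pressure changes; the 3 m floor height is an assumed typical value.
    """
    height = altitude_m(pressure_hpa) - altitude_m(ground_pressure_hpa)
    return round(height / floor_height_m) + 1
```

Near sea level, a pressure drop of roughly 1.2 hPa corresponds to a rise of about 10 m, which this sketch would interpret as three floors above the ground floor.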
[0135] The temperature sensor 24 detects, for example, the ambient temperature around the terminal device 10. The sound sensor 25 detects, for example, the ambient sound around the terminal device 10. The light sensor 26 detects the ambient illumination around the terminal device 10. The magnetic sensor 27 detects, for example, the Earth's magnetic field around the terminal device 10. The image sensor 28 captures an image of the area around the terminal device 10.
[0136] The aforementioned barometric pressure sensor 23, temperature sensor 24, sound sensor 25, light sensor 26, and image sensor 28 can detect the surrounding environment and conditions of the terminal device 10 by detecting atmospheric pressure, temperature, sound, and illuminance, respectively, and by capturing images of the surroundings. Furthermore, it becomes possible to improve the accuracy of the location information of the terminal device 10 based on the surrounding environment and conditions.
[0137] (Control unit 30) The control unit 30 includes, for example, a microcomputer having a CPU (Central Processing Unit) or MPU (Micro Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), input / output ports, and various circuits. Alternatively, the control unit 30 may be composed of hardware such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array). The control unit 30 includes a transmission unit 31, a reception unit 32, and a processing unit 33.
[0138] (Transmission unit 31) The transmission unit 31 can transmit various information, such as information input by the user U using the input unit 13, various information detected by sensors 21-28 mounted on or connected to the terminal device 10, and location information of the terminal device 10 determined by the positioning unit 14, to the server device 100 via the communication unit 11.
[0139] (Reception unit 32) The reception unit 32 can receive various information provided by the server device 100, as well as requests for various information from the server device 100, via the communication unit 11.
[0140] (Processing unit 33) The processing unit 33 controls the entire terminal device 10, including the display unit 12. For example, the processing unit 33 can output and display various information transmitted by the transmission unit 31 and various information received from the server device 100 by the reception unit 32 to the display unit 12. The processing unit 33 may also cooperate with the server device 100 via the communication unit 11 to perform controls such as displaying messages that reflect the character's personality and characteristics, or performing voice readings using the character's voice.
[0141] (Storage unit 40) The storage unit 40 is implemented by, for example, semiconductor memory elements such as RAM (Random Access Memory) and flash memory, or by storage devices such as HDD (Hard Disk Drive), SSD (Solid State Drive), and optical discs. Various programs and various data are stored in this storage unit 40.
[0142] [4. Example of Server Device Configuration] Next, the configuration of the server device 100 according to the embodiment will be described using Figure 6. Figure 6 is a diagram showing an example of the configuration of the server device 100 according to the embodiment. As shown in Figure 6, the server device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.
[0143] (Communication unit 110) The communication unit 110 is implemented, for example, by a NIC (Network Interface Card). The communication unit 110 is connected to the network N by wire or wireless connection.
[0144] (Storage unit 120) The storage unit 120 is implemented by, for example, semiconductor memory elements such as RAM (Random Access Memory) and flash memory, or by storage devices such as HDDs, SSDs, and optical discs. For example, the storage unit 120 stores information transmitted and received via the communication unit 110. The storage unit 120 may also store user U's attribute information and history information (log data) along with identification information (user ID, etc.) indicating user U.
[0145] (Control unit 130) The control unit 130 is a controller. It is realized, for example, by a CPU (Central Processing Unit), MPU (Micro Processing Unit), GPU (Graphics Processing Unit), ASIC (Application Specific Integrated Circuit), or FPGA (Field Programmable Gate Array) executing various programs (corresponding to an example of an information processing program) stored in the internal storage device of the server device 100, using a storage area such as RAM as a working area. In the example shown in Figure 6, the control unit 130 has an acquisition unit 131, a setting unit 132, a provision unit 133, a search unit 134, a generation unit 135, a display control unit 136, and a voice control unit 137.
[0146] (Acquisition unit 131) The acquisition unit 131 acquires the search query entered by the user U. For example, when the user U enters a search query into a search engine or the like and performs a keyword search, the acquisition unit 131 acquires the search query via the communication unit 110. In other words, the acquisition unit 131 acquires the keyword entered by the user U into the search box of a search engine, website, or app via the communication unit 110. Note that the search query is not limited to keywords; it may also be a question, a sentence, an image, or audio.
[0147] Furthermore, the acquisition unit 131 acquires user information about user U via the communication unit 110. For example, the acquisition unit 131 acquires identification information (such as user ID), location information, and attribute information of user U from user U's terminal device 10. The acquisition unit 131 may also acquire identification information and attribute information of user U when user U is registered. The acquisition unit 131 then stores the user information in the storage unit 120.
[0148] Furthermore, the acquisition unit 131 acquires various historical information (log data) indicating the user U's actions via the communication unit 110. For example, the acquisition unit 131 acquires various historical information indicating the user U's actions from the user U's terminal device 10, or from various servers based on the user ID, etc. The acquisition unit 131 then stores the various historical information in the storage unit 120.
[0149] (Setting unit 132) The setting unit 132 sets the character that will deliver the message. At this time, the setting unit 132 may also accept settings regarding the conditions for the character that delivers the message. For example, the setting unit 132 accepts settings regarding the conditions for the character that delivers the message and selects a character that meets those conditions.
[0150] The setting unit 132 receives the settings for advertising information from the advertiser who has submitted a bid. For example, the setting unit 132 receives from the advertiser the settings for the advertising information and for the read-aloud voice. At this time, the setting unit 132 may also receive from the advertiser, as the read-aloud voice, the setting of the character that will read the information aloud. In other words, the setting unit 132 may receive, from the advertiser who has submitted a bid, the settings for the advertising information and for the character that will read the advertising information aloud.
[0151] Furthermore, the setting unit 132 accepts a setting from the user U regarding whether or not to allow the read-aloud information to be read aloud by voice. At this time, the setting unit 132 may also accept from the user U, as the read-aloud voice, the setting of the character that will read the information aloud.
[0152] Furthermore, the setting unit 132 sets which of the multiple characters will provide search results based on the search query.
[0153] Furthermore, the setting unit 132 allows user U to specify, in the search settings of the search engine, the character that will read the script aloud.
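The condition-based character selection performed by the setting unit 132 can be illustrated with a minimal sketch. The `Character` class, the trait tags, and the `select_character` function are hypothetical names introduced only for illustration; the embodiment does not specify how conditions are represented or matched.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Character:
    """Hypothetical character record; the trait tags are an assumption."""
    name: str
    traits: FrozenSet[str]


def select_character(characters: Iterable[Character],
                     conditions: Iterable[str]) -> Optional[Character]:
    """Return the first character that satisfies every condition, else None."""
    required = frozenset(conditions)
    for character in characters:
        # A character qualifies when its traits cover all required conditions.
        if required <= character.traits:
            return character
    return None


candidates = [
    Character("A", frozenset({"cheerful", "casual"})),
    Character("B", frozenset({"calm", "polite"})),
]
chosen = select_character(candidates, ["calm"])
```

Returning `None` when no character matches leaves room for the case in paragraph [0157], where a character meeting the conditions may instead be generated automatically.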
[0154] (Provision unit 133) The provision unit 133 uses AI to provide a chat room in which multiple characters participate. For example, the provision unit 133 provides a chat room in which characters that user U has added as friends participate. The provision unit 133 may also include the display control unit 136 and the voice control unit 137, which will be described later. For example, the provision unit 133 may consist of the display control unit 136 and the voice control unit 137.
[0155] (Search unit 134) The search unit 134 performs a search based on a search query entered by user U. For example, the search unit 134 performs a search based on a search query entered by user U in a chat room or search engine. Alternatively, the search unit 134 performs a search based on a question from user U. For example, the search unit 134 may work in conjunction with an AI, submitting a question from user U to the AI and receiving search results as an answer from the AI.
[0156] (Generation unit 135) The generation unit 135 uses AI to generate messages that explain the search results corresponding to the search query. For example, the generation unit 135 uses AI to generate messages in an expression that matches the personality of the set character. In this case, the generation unit 135 uses AI to generate messages that explain advertising information related to the search query or search results in an expression that matches the personality of the set character.
[0157] Furthermore, the generation unit 135 may use AI to automatically generate characters that meet the conditions for the character that will deliver the message.
[0158] Furthermore, the generation unit 135 uses AI to generate audio data in which the displayed message is read aloud in the voice and tone of a designated character. For example, the generation unit 135 uses AI to generate a script that matches the personality of a designated character, requests the voice actor for the character to read the script aloud, and receives audio data of the voice actor reading the script. Alternatively, the generation unit 135 uses AI to generate a script that matches the personality of a designated character, and uses AI that has learned the character's voice to automatically generate audio data of the script being read aloud.
[0159] Furthermore, the generation unit 135 uses AI to generate read-aloud information in an expression that matches the personality of the character set as the read-aloud voice. In addition, the generation unit 135 uses AI to generate audio data in which the read-aloud information is read aloud in the voice and tone of the character set as the read-aloud voice.
[0160] In the chat room, the generation unit 135 provides the search results to the AI and generates audio data in which the AI speaks (reads aloud) the content (message) related to the search results in the voice and tone of the set character. Furthermore, the generation unit 135 generates audio data in which another character speaks content other than the search results in the voice and tone of that other character.
[0161] Although not shown in the diagram, the generation unit 135 functionally comprises a text generation unit 135a, a script generation unit 135b, and a voice generation unit 135c.
[0162] (Text generation unit 135a) The text generation unit 135a feeds the results of the search that user U performed with the search engine into the answer generation AI and generates text related to the search results as an answer. Furthermore, if user U has already registered a character as a friend, the text generation unit 135a may generate special text that is provided only when the character is registered as a friend.
[0163] (Script generation unit 135b) The script generation unit 135b has the character conversion AI read the text related to the search results and generates a script for the character to read aloud. If user U has already registered the character as a friend, the script generation unit 135b may generate a special script that is only provided if the user has registered the character as a friend.
[0164] (Voice generation unit 135c) The voice generation unit 135c generates audio data in which the script is read aloud in the character's voice and tone. If user U has already registered the character as a friend, the voice generation unit 135c may also generate audio data in which the script is read aloud in a special voice and tone that is only provided when the character is registered as a friend.
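The three-stage flow from search results to text, script, and audio data can be sketched as follows. This is only an illustration under stated assumptions: the `GenerationPipeline` class and its three callables are hypothetical stand-ins for the answer generation AI, the character conversion AI, and a voice synthesizer, none of whose interfaces are specified in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class GenerationPipeline:
    """Chains stand-ins for units 135a, 135b, and 135c (all assumptions)."""
    answer_ai: Callable[[str], str]                # search results -> text
    character_ai: Callable[[str, str, bool], str]  # text -> character script
    voice_ai: Callable[[str, str, bool], bytes]    # script -> audio data

    def run(self, search_results: str, character: str,
            is_friend: bool = False) -> Tuple[str, str, bytes]:
        text = self.answer_ai(search_results)                   # unit 135a
        script = self.character_ai(text, character, is_friend)  # unit 135b
        audio = self.voice_ai(script, character, is_friend)     # unit 135c
        return text, script, audio


# Trivial stubs so that the sketch runs without any external model.
pipeline = GenerationPipeline(
    answer_ai=lambda results: f"Summary: {results}",
    character_ai=lambda text, ch, friend: (
        f"[{ch}{' (friend)' if friend else ''}] {text}"),
    voice_ai=lambda script, ch, friend: script.encode("utf-8"),
)
text, script, audio = pipeline.run("three cafes nearby", "A", is_friend=True)
```

Passing the friend-registration flag to the later stages mirrors the description that special scripts and special voices may be provided only when the character is registered as a friend.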
[0165] (Display control unit 136) The display control unit 136 displays the search results along with a message explaining the search results. At this time, the display control unit 136 may also display a message explaining the advertising information in an expression that matches the personality of the set character, along with the search results and advertising information.
[0166] Furthermore, the display control unit 136 may cause the terminal device 10 of a user U who has added the advertiser's account as a friend in the messenger app to display read-aloud information based on the advertising information as a message. In this case, the display control unit 136 may output the read-aloud information to the user U's terminal device 10 and display it in an expression that matches the personality of the set character.
[0167] Furthermore, the display control unit 136 may, via push notification, display the read-aloud information based on the advertising information as a message on the user U's terminal device 10, in a form in which the character set as the read-aloud voice speaks to the user U.
[0168] Alternatively, the display control unit 136 may exchange messages with user U in a conversational format as the character set as the read-aloud voice and, in the course of the conversation, cause user U's terminal device 10 to display the read-aloud information based on the advertising information as a message.
[0169] Furthermore, the display control unit 136 displays content other than the search results as statements made by characters other than the selected character in the chat room, and also displays content related to the search results as statements made by the selected character.
[0170] Furthermore, the display control unit 136 displays the character's image and the messenger app icon along with the text related to the search results in the search results display area. When the user U presses the icon, the display control unit 136 displays a QR code for adding the character as a friend in the messenger app, guiding the user to the messenger app. The display control unit 136 may also display a play button if audio data is available.
[0171] Furthermore, the display control unit 136 may, if user U has already registered a character as a friend, display a badge on the icon indicating that the character has been registered as a friend, and display special information that is only available when the character has been registered as a friend.
[0172] (Voice control unit 137) The voice control unit 137 uses voice data to have the script read aloud in the character's voice and tone. For example, the voice control unit 137 reads a message aloud in the voice of a set character. In this case, the voice control unit 137 reads the displayed message aloud in the voice and tone of the set character.
[0173] In other words, the voice control unit 137 outputs, to the user U's terminal device 10, audio data in which the read-aloud information is read aloud in the set read-aloud voice, and causes the read-aloud information to be read aloud by voice. At this time, the voice control unit 137 may output the audio data to the user U's terminal device 10 and cause the read-aloud information to be read aloud only if the user U has given permission for it to be read aloud. For example, the voice control unit 137 may determine that the user U has given this permission if the user U has added, as a friend, the official account (OA) of the character that the advertiser set to perform the voice reading.
[0174] In this case, the voice control unit 137 may read the information aloud in the voice of the character set as the read-aloud voice. For example, the voice control unit 137 outputs the voice data to the user U's terminal device 10 and reads the information aloud in the voice and tone of the set character.
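The permission check that gates voice output can be sketched as below. The function names and the representation of friend registrations as a set of account identifiers are assumptions made for illustration; the embodiment only states that permission may be given by a user setting or inferred from friend registration of the character's official account.

```python
from typing import Optional, Set


def may_read_aloud(explicit_permission: bool,
                   friend_accounts: Set[str],
                   character_account: str) -> bool:
    """True if user U allowed read-aloud explicitly or friended the OA."""
    return explicit_permission or character_account in friend_accounts


def deliver_audio(audio: bytes, permitted: bool) -> Optional[bytes]:
    """Return the audio to output to the terminal, or None if not permitted."""
    return audio if permitted else None


permitted = may_read_aloud(False, {"oa_character_a"}, "oa_character_a")
payload = deliver_audio(b"audio-bytes", permitted)
```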
[0175] Furthermore, in the chat room, the voice control unit 137 causes the characters other than the selected character, among the multiple characters, to speak content other than the search results in the voice and tone of those other characters, and causes the selected character to speak content related to the search results in the voice and tone of the selected character.
[0176] At this time, the voice control unit 137 causes other characters in the chat room to speak information other than advertising information related to the search results, and also causes the selected character to speak advertising information related to the search results.
[0177] For example, in a chat room, the voice control unit 137, within the flow of conversation between multiple characters, causes other characters to speak about things other than the search results, while causing the selected character to speak about things related to the search results.
[0178] Furthermore, the voice control unit 137 causes the selected character to speak about the search results in the chat room, and also causes other characters to respond to the selected character's statements with their own opinions or additional information.
[0179] For example, in the chat room, the voice control unit 137 uses the generated voice data to have content related to the search results spoken in the voice and tone of the set character. The voice control unit 137 also uses the generated voice data to have content other than the search results spoken in the voice and tone of the other characters.
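The division of roles in the chat room, in which only the selected character speaks about the search results while the others speak about other things, can be sketched as follows. The function `assign_utterances` and the use of character names as voice identifiers are hypothetical simplifications, not part of the embodiment.

```python
from typing import Dict, List, Tuple


def assign_utterances(participants: List[str], selected: str,
                      result_message: str,
                      small_talk: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (speaker, voice, content) tuples for one round of conversation.

    The selected character speaks the search-result content; every other
    participant speaks its own unrelated line. Each line uses the voice of
    the character who says it (voice IDs here are simply the names).
    """
    utterances = []
    for name in participants:
        if name == selected:
            utterances.append((name, name, result_message))
        elif name in small_talk:
            utterances.append((name, name, small_talk[name]))
    return utterances


conversation = assign_utterances(
    ["A", "B", "C"], "B", "Here are the results.",
    {"A": "Nice weather today.", "C": "I am hungry."},
)
```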
[0180] Furthermore, if a character image is displayed in the search results display area, the voice control unit 137 will use the voice data to read aloud the script in the character's voice and tone when the user U presses the character image.
[0181] Furthermore, if a play button is displayed in the search results display area, the voice control unit 137 will use the audio data to read the script aloud in the character's voice and tone when the user U presses the play button.
[0182] [5. Processing Procedure] Next, the processing procedure by the server device 100 according to the embodiment will be described using Figures 7 to 10. Figure 7 is a flowchart of the first processing procedure according to the embodiment. Figure 8 is a flowchart of the second processing procedure according to the embodiment. Figure 9 is a flowchart of the third processing procedure according to the embodiment. Figure 10 is a flowchart of the fourth processing procedure according to the embodiment. Note that the processing procedures shown below are repeatedly executed by the control unit 130 of the server device 100.
[0183] [5-1. First Processing Procedure] For example, as shown in Figure 7, the setting unit 132 of the server device 100 sets the character that will deliver the message, in accordance with a setting from the user U (step S101).
[0184] Next, the setting unit 132 of the server device 100 receives the setting of advertising information from the advertiser who submitted the bid (step S102).
[0185] Next, the search unit 134 of the server device 100 performs a search based on the search query entered by user U (step S103).
[0186] Next, the generation unit 135 of the server device 100 uses AI to generate a message that explains advertising information related to the search query or search result, in an expression that matches the personality of the set character, as a search result corresponding to the search query (step S104).
[0187] Next, the generation unit 135 of the server device 100 uses AI to generate audio data in which a message explaining advertising information related to the search query or search results is read aloud in the voice and tone of a set character (step S105).
[0188] Next, the display control unit 136 of the server device 100 displays a message explaining the advertising information in an expression that matches the personality of the set character, along with the search results and advertising information (step S106).
[0189] Next, the voice control unit 137 of the server device 100 uses the voice data to read a message explaining the advertising information in the voice and tone of a set character (step S107).
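The order of steps S103 to S107 can be sketched as a single function, assuming the character (step S101) and the advertising information (step S102) have already been set. The function `first_procedure` and the three injected callables are hypothetical; they merely stand in for the search unit 134 and the AI-based generation performed by the generation unit 135.

```python
from typing import Callable, Dict, List


def first_procedure(query: str, character: str, ad: str,
                    search: Callable[[str], List[str]],
                    generate_message: Callable[[str, str], str],
                    synthesize: Callable[[str, str], bytes]) -> Dict[str, object]:
    """Run S103 to S107 for a character (S101) and an ad (S102) already set."""
    results = search(query)                     # S103: perform the search
    message = generate_message(ad, character)   # S104: message in character
    audio = synthesize(message, character)      # S105: audio in character voice
    return {
        # S106: results, ad, and message are displayed together
        "display": {"results": results, "ad": ad, "message": message},
        # S107: audio data used to read the message aloud
        "audio": audio,
    }


outcome = first_procedure(
    "coffee", "A", "Cafe X: 10% off",
    search=lambda q: [f"result for {q}"],
    generate_message=lambda ad, ch: f"{ch} says: check out {ad}!",
    synthesize=lambda msg, ch: msg.encode("utf-8"),
)
```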
[0190] [5-2. Second Processing Procedure] For example, as shown in Figure 8, the setting unit 132 of the server device 100 receives, from the advertiser, settings for the advertising information to be read aloud and for the character that will read it aloud as the read-aloud voice (step S201).
[0191] Next, the setting unit 132 of the server device 100 receives a setting from user U that allows the read-aloud information to be read aloud by voice (step S202).
[0192] Next, the generation unit 135 of the server device 100 uses AI to generate read-aloud information based on the advertising information in an expression that matches the personality of the character set as the read-aloud voice (step S203).
[0193] Next, the generation unit 135 of the server device 100 uses AI to generate audio data in which the read-aloud information based on the advertising information is read aloud in the voice and tone of the character set as the read-aloud voice (step S204).
[0194] Next, the display control unit 136 of the server device 100 displays, via push notification, the read-aloud information based on the advertising information as a message on the user U's terminal device 10, in a form in which the set character speaks to the user U (step S205).
[0195] Next, the voice control unit 137 of the server device 100 outputs voice data to the user U's terminal device 10, causing the read-aloud information based on the advertisement information to be read aloud in the voice and tone of the set character (step S206).
[0196] [5-3. Third Processing Procedure] For example, as shown in Figure 9, the provision unit 133 of the server device 100 uses AI to provide a chat room in which multiple characters that user U has registered as friends participate (step S301).
[0197] Next, the search unit 134 of the server device 100 performs a search based on the search query entered by user U in the chat room (step S302).
[0198] Next, the setting unit 132 of the server device 100 receives the setting of advertising information from the advertiser who submitted the bid (step S303).
[0199] Next, the setting unit 132 of the server device 100 sets a character from among the multiple characters mentioned above that will respond with advertising information related to the search results based on the search query (step S304).
[0200] Note that the processes in steps S303 and S304 may be executed before the processes in steps S301 and S302. In other words, the settings may be performed in advance.
[0201] Next, the generation unit 135 of the server device 100 provides advertising information related to the search results to the AI, generating a message containing content related to the advertising information related to the search results as a statement from a set character, and further generating a message containing content other than advertising information related to the search results as a statement from another character (step S305).
[0202] Next, the generation unit 135 of the server device 100 provides advertising information related to the search results to the AI in the chat room, generating audio data in which the AI speaks about the advertising information related to the search results in the voice and tone of a set character, and further generating audio data in which the AI speaks about content other than advertising information related to the search results in the voice and tone of another character (step S306).
[0203] Next, the display control unit 136 of the server device 100 displays messages in the chat room that are not advertising information related to the search results, as statements made by characters other than the selected character, and also displays messages that are advertising information related to the search results as statements made by the selected character (step S307).
[0204] Next, in the chat room, the voice control unit 137 of the server device 100 uses the generated voice data to have the characters other than the selected character, among the multiple characters, speak content other than the advertising information related to the search results in the voice and tone of those other characters, and to have the selected character speak content related to the advertising information related to the search results in the voice and tone of the selected character (step S308).
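Two constraints in the third processing procedure, namely that the chat room consists of characters registered as friends (step S301) and that the responding character is chosen from among those participants (step S304), can be sketched as below. The function names are hypothetical, and returning `None` for a character that is not a participant is an assumption; the embodiment does not describe that case.

```python
from typing import List, Optional, Set


def open_chat_room(all_characters: List[str], friends: Set[str]) -> List[str]:
    """S301: only characters that user U registered as friends participate."""
    return [c for c in all_characters if c in friends]


def set_responder(participants: List[str], requested: str) -> Optional[str]:
    """S304: the responder must be one of the participating characters."""
    return requested if requested in participants else None


room = open_chat_room(["A", "B", "C"], {"A", "C"})
responder = set_responder(room, "C")
```

Because `set_responder` depends only on the participant list, it can be called before or after the search, which is consistent with the note that steps S303 and S304 may be performed in advance.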
[0205] [5-4. Fourth Processing Procedure] For example, as shown in Figure 10, the search unit 134 of the server device 100 performs a search based on the search query entered by the user U in the search engine (step S401).
[0206] Next, the text generation unit 135a (generation unit 135) of the server device 100 feeds the results of the search that user U performed with the search engine into the answer generation AI and generates text related to the search results as an answer (step S402).
[0207] Next, the script generation unit 135b (generation unit 135) of the server device 100 has the character conversion AI read the text related to the search results and generates a script for the character to read aloud (step S403).
[0208] Next, the voice generation unit 135c (generation unit 135) of the server device 100 generates audio data in which the script is read aloud in the character's voice and tone (step S404).
[0209] Next, the display control unit 136 of the server device 100 displays, in the search result display area, a character image, an audio playback button, and a messenger app icon, along with text related to the search results (step S405).
[0210] Next, when the user U presses the icon, the display control unit 136 of the server device 100 displays a QR code for adding the character as a friend in the messenger app and guides the user to the messenger app (step S406).
[0211] At this time, if user U has already registered the character as a friend, the display control unit 136 of the server device 100 displays a badge on the icon indicating that the character has been registered as a friend, and displays special information that is only provided when the character has been registered as a friend (step S407).
[0212] Next, if a character image or a play button is displayed in the search results display area, the voice control unit 137 of the server device 100 will use audio data to read the script aloud in the character's voice and tone when the user U presses the character image or play button (step S408).
[0213] At this time, if user U has already registered the character as a friend, the voice control unit 137 of the server device 100 uses audio data in which a special script is read aloud in a special voice and tone, both provided only when the character is registered as a friend, to have that script read aloud in the character's voice and tone (step S409).
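The elements shown in the search results display area (steps S405 and S407) can be sketched as a simple view description. The function `build_result_view` and the dictionary keys are hypothetical; the embodiment does not define a data format for the display.

```python
from typing import Dict, Optional


def build_result_view(text: str, has_audio: bool, is_friend: bool,
                      special_info: Optional[str] = None) -> Dict[str, object]:
    """Describe what the search results display area shows (S405, S407)."""
    view: Dict[str, object] = {
        "text": text,                # text related to the search results
        "character_image": True,     # always shown with the text
        "messenger_icon": True,      # pressing it leads to the QR code (S406)
        "play_button": has_audio,    # shown only if audio data is available
        "friend_badge": is_friend,   # badge on the icon for registered friends
    }
    # Special information is provided only to users who registered the friend.
    if is_friend and special_info is not None:
        view["special_info"] = special_info
    return view


guest_view = build_result_view("Three cafes nearby.", True, False, "Coupon")
friend_view = build_result_view("Three cafes nearby.", True, True, "Coupon")
```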
[0214] [6. Modifications] The terminal device 10 and server device 100 described above may be implemented in various other forms besides those of the embodiment described above. Therefore, the following describes modifications of the embodiment.
[0215] In the above embodiment, some or all of the processing performed by the server device 100 may actually be performed by the terminal device 10 (or an application running on the terminal device 10). For example, the terminal device 10 may perform all processing in a standalone manner. In this case, the terminal device 10 is assumed to have the same functions as the server device 100 in the above embodiment. Furthermore, in the above embodiment, since the terminal device 10 is in cooperation with the server device 100, from the perspective of the user U, it appears as if the processing of the server device 100 is also being performed by the terminal device 10. In other words, from another perspective, it can be said that the terminal device 10 is equipped with the server device 100.
[0216] Furthermore, in the above embodiment, the server device 100 may be a container engine that constructs an execution environment (container) for virtually running applications on a host computer. For example, the server device 100 may construct a container for executing various functions on the user U's terminal device 10.
[0217] Furthermore, in the above embodiment, the server device 100 may change the character that reads out the advertisements depending on the location and time (time zone) of the user U. For example, the server device 100 may be able to set a character that reads out the advertisements for each location and / or time (time zone).
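Setting a read-aloud character for each location and time zone can be sketched as a rule lookup. The `Rule` class, the hour-based time ranges, and the default character are assumptions introduced for illustration; the embodiment does not specify how locations or time zones are expressed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule mapping a location and hour range to a character."""
    character: str
    location: Optional[str]  # None matches any location
    start_hour: int          # inclusive, 0 to 23
    end_hour: int            # exclusive, 1 to 24


def character_for(location: str, hour: int,
                  rules: List[Rule], default: str) -> str:
    """Return the character of the first matching rule, else the default."""
    for rule in rules:
        location_ok = rule.location is None or rule.location == location
        if location_ok and rule.start_hour <= hour < rule.end_hour:
            return rule.character
    return default


rules = [
    Rule("A", "Tokyo", 6, 12),   # mornings in Tokyo
    Rule("B", None, 18, 24),     # evenings anywhere
]
morning_reader = character_for("Tokyo", 9, rules, default="C")
```

This sketch does not handle ranges that wrap past midnight; such a range would need to be split into two rules.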
[0218] Furthermore, in the above embodiment, the server device 100 may use AI to generate video data that combines not only audio data in which the character reads messages or scripts in the character's voice and tone, but also video footage of the character (animation, 3D model, etc.). In this case, instead of displaying a message as spoken by the character, the server device 100 may display a link to the character's video data (or a play button for the video data).
[0219] [7. Effects] The information processing device (server device 100) according to the present invention is characterized by comprising: a provision unit 133 that uses AI to provide a chat room in which multiple characters participate; a search unit 134 that performs a search based on a search query entered by a user U in the chat room; a setting unit 132 that sets which of the multiple characters will respond with search results based on the search query; and a voice control unit 137 that, in the chat room, causes the characters other than the selected character to speak content other than the search results in the voice and tone of those other characters, and also causes the selected character to speak content related to the search results in the voice and tone of the selected character.
[0220] This allows for a simulated chat room environment where a user is conversing within a group with multiple characters. The system can then have the responding character read aloud the search results in their own voice, as if they were actually speaking, while the other characters engage in casual conversation.
[0221] Furthermore, the information processing device according to the present invention further includes a display control unit 136 that, in a chat room, displays content other than the search results as statements made by characters other than the selected character, and displays content related to the search results as statements made by the selected character.
[0222] This allows for a simulated chat room environment where a user is conversing within a group with multiple characters. The content related to the search results can be displayed naturally as a response from the character answering the question, interspersed with casual conversation among the other characters.
[0223] The voice control unit 137, in a chat room, causes the selected character to speak about the search results while having other characters speak about things other than the search results within the flow of conversation between multiple characters.
[0224] This allows the character providing the answer to naturally read aloud the search results within the flow of conversation between the user and multiple characters in a chat room, while also incorporating casual conversation among the other characters.
[0225] The voice control unit 137, in the chat room, causes the selected character to speak about the search results, and also causes other characters to respond to the selected character's statements with their own opinions or additional information.
[0226] This makes it possible to represent the reactions of other characters to the statement (answer) that the answering character makes in response to the user's question.
[0227] The setting unit 132 receives advertising information settings from advertisers who have submitted bids. The voice control unit 137 causes other characters in the chat room to speak information other than advertising information related to the search results, and also causes the selected character to speak advertising information related to the search results.
[0228] This allows for natural voice-over of advertising information from advertisers within the flow of conversations in a chat room between a user and multiple characters in a group setting.
[0229] The provision unit 133 provides a chat room in which the characters that user U has added as friends participate.
[0230] This makes it possible to create a chat room consisting only of characters that user U has added as friends.
[0231] Furthermore, the information processing device according to the present invention further includes a generation unit 135 that, by providing the search results to the AI, generates audio data in which the AI speaks about the search results in the voice and tone of the set character. The voice control unit 137 uses the audio data in the chat room to have the content related to the search results spoken in the voice and tone of the set character.
[0232] This makes it possible to have the search results read aloud in the character's voice, as if the character were actually speaking.
[0233] The generation unit 135 further generates audio data in which other characters speak content other than the search results in the voice and tone of those other characters. The voice control unit 137 uses this audio data in the chat room to have the other characters speak content other than the search results in their own voice and tone.
[0234] This makes it possible to have casual conversation other than the search results read aloud in the other character's voice, as if the other character were actually speaking.
[0235] Through any or a combination of the above-described processes, the information processing device according to the present application can provide technology that uses AI to represent or reproduce characters and provides information that effectively utilizes characters.
[0236] [8. Hardware Configuration] Furthermore, the terminal device 10 and server device 100 according to the above-described embodiment are realized by a computer 1000 having a configuration such as that shown in Figure 11. The following explanation will use the server device 100 as an example. Figure 11 is a diagram showing an example of the hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic unit 1030, a primary storage device 1040, a secondary storage device 1050, an output interface 1060, an input interface 1070, and a network interface 1080 are connected by a bus 1090.
[0237] The arithmetic unit 1030 operates based on programs stored in the primary storage device 1040 and the secondary storage device 1050, as well as programs read from the input device 1020, and executes various processes. The arithmetic unit 1030 can be implemented using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).
[0238] The primary storage device 1040 is a memory device, such as RAM (Random Access Memory), that temporarily stores data used by the arithmetic unit 1030 for various calculations. The secondary storage device 1050 is a storage device where data used by the arithmetic unit 1030 for various calculations and various databases are registered, and can be implemented using ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), flash memory, etc. The secondary storage device 1050 may be internal storage or external storage. The secondary storage device 1050 may also be a removable storage medium such as USB (Universal Serial Bus) memory or SD (Secure Digital) memory card. The secondary storage device 1050 may also be cloud storage (online storage), NAS (Network Attached Storage), file server, etc.
[0239] The output interface 1060 is an interface for transmitting information to output devices 1010 such as displays, projectors, and printers, and is implemented using connectors conforming to standards such as USB, DVI (Digital Visual Interface), and HDMI (High-Definition Multimedia Interface). The input interface 1070 is an interface for receiving information from various input devices 1020 such as mice, keyboards, keypads, buttons, and scanners, and is implemented using, for example, USB.
[0240] Furthermore, the output interface 1060 and input interface 1070 may be wirelessly connected to the output device 1010 and input device 1020, respectively. In other words, the output device 1010 and input device 1020 may be wireless devices.
[0241] Furthermore, the output device 1010 and the input device 1020 may be integrated as a touch panel. In this case, the output interface 1060 and the input interface 1070 may also be integrated as a single input/output interface.
[0242] The input device 1020 may also be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), DVD (Digital Versatile Disc), or PD (Phase Change Rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.
[0243] The network interface 1080 receives data from other devices via the network N and passes it to the arithmetic unit 1030, and also transmits data generated by the arithmetic unit 1030 to other devices via the network N.
[0244] The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output interface 1060 and the input interface 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.
[0245] For example, when the computer 1000 functions as the server device 100, the arithmetic unit 1030 of the computer 1000 realizes the functions of the control unit 130 by executing a program loaded onto the primary storage device 1040. Alternatively, the arithmetic unit 1030 may load a program obtained from another device via the network interface 1080 onto the primary storage device 1040 and execute the loaded program. Furthermore, the arithmetic unit 1030 may cooperate with other devices via the network interface 1080 and call and use functions, data, and the like of programs running on those devices.
[0246] [9. Other] Although embodiments of the present invention have been described above, the present invention is not limited by the content of these embodiments. The aforementioned components include those that can be easily conceived by those skilled in the art, those that are substantially the same, and those that fall within the so-called range of equivalents. Moreover, the aforementioned components can be combined as appropriate. Furthermore, various omissions, substitutions, or modifications of the components can be made without departing from the gist of the embodiments described above.
[0247] Furthermore, among the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each figure is not limited to the information shown.
[0248] Furthermore, the components of each illustrated device are functionally conceptual and do not necessarily need to be physically configured as shown. In other words, the specific forms of distribution and integration of each device are not limited to those shown, and all or part of them can be functionally or physically distributed and integrated in any unit according to various loads and usage conditions.
[0249] For example, the server device 100 described above may be implemented using multiple server computers, and the configuration can be flexibly changed, such as by calling external platforms via APIs (Application Programming Interfaces) or network computing depending on the function.
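As an illustration of the flexibility described in paragraph [0249], the following minimal Python sketch shows one way a function of the server device 100 could be handled either inside the same server or by an external platform reached through an API, without changing the code that uses it. All names (SearchBackend, LocalSearch, RemoteSearch, ControlUnit, call_api) are hypothetical and are not taken from the embodiments; the transport function is injected by the caller, so no real endpoint or library is assumed.

```python
from typing import Callable, Protocol


class SearchBackend(Protocol):
    """Hypothetical interface shared by local and remote implementations."""

    def search(self, query: str) -> list[str]: ...


class LocalSearch:
    """Search handled inside the same server process (illustrative only)."""

    def __init__(self, index: dict[str, list[str]]):
        self._index = index

    def search(self, query: str) -> list[str]:
        # Unknown queries simply return no results.
        return self._index.get(query, [])


class RemoteSearch:
    """Search delegated to an external platform through an API call.

    `call_api` is a hypothetical transport function supplied by the caller.
    """

    def __init__(self, call_api: Callable[[str], list[str]]):
        self._call_api = call_api

    def search(self, query: str) -> list[str]:
        return self._call_api(query)


class ControlUnit:
    """Depends only on the interface, so the backend can be swapped freely."""

    def __init__(self, backend: SearchBackend):
        self._backend = backend

    def handle_query(self, query: str) -> list[str]:
        return self._backend.search(query)
```

In this sketch, moving the search function to another server computer or to an external platform amounts to constructing ControlUnit with a different backend; the rest of the processing is unaffected.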
[0250] Furthermore, the embodiments and modifications described above can be combined as appropriate, provided that the processing content is not inconsistent.
[0251] Furthermore, the terms "section," "module," and "unit" used above can be replaced with "means," "circuit," or the like. For example, the acquisition unit can be replaced with an acquisition means or an acquisition circuit.
[Explanation of Symbols]
[0252]
1 Information processing system
10 Terminal device
100 Server device
110 Communication unit
120 Storage unit
130 Control unit
131 Acquisition unit
132 Setting unit
133 Providing unit
134 Search unit
135 Generation unit
135a Sentence generation unit
135b Script generation unit
135c Voice generation unit
136 Display control unit
137 Voice control unit
Claims
1. An information processing device comprising: a providing unit that uses AI to provide a chat room in which a plurality of characters participate; a search unit that performs a search based on a search query entered by a user in the chat room; a setting unit that selects, from among the plurality of characters, the character that is to provide search results based on the search query; and a voice control unit that, in the chat room, causes each other character among the plurality of characters, other than the selected character, to speak content other than the search results in the voice and tone of that other character, and causes the selected character to speak content related to the search results.
2. The information processing device according to claim 1, further comprising a display control unit that, in the chat room, displays content other than the search results as statements made by the characters other than the selected character, and displays content related to the search results as statements made by the selected character.
3. The information processing device according to claim 1, wherein the voice control unit, in the chat room, causes the selected character to speak about the search results while causing the other characters to speak about matters other than the search results, within the flow of a conversation among the plurality of characters.
4. The information processing device according to claim 1, wherein the voice control unit, in the chat room, causes the selected character to speak about the search results and causes the other characters to respond to the selected character's statements with opinions or other information.
5. The information processing device according to claim 1, wherein the setting unit accepts a setting of advertising information from an advertiser who has submitted a bid, and the voice control unit, in the chat room, causes the other characters to speak information other than the advertising information related to the search results and causes the selected character to speak the advertising information related to the search results.
6. The information processing device according to claim 1, wherein the providing unit provides a chat room in which characters that the user has added as friends participate.
7. The information processing device according to claim 1, further comprising a generation unit that provides the search results to an AI and generates voice data in which the AI speaks about the search results in the voice and tone of the selected character, wherein the voice control unit uses the voice data to cause the selected character, in the chat room, to speak about the search results in the voice and tone of the selected character.
8. The information processing device according to claim 7, wherein the generation unit further generates voice data in which the other character speaks content other than the search results in the voice and tone of the other character, and the voice control unit uses that voice data to cause the other character, in the chat room, to speak content other than the search results in the voice and tone of the other character.
9. An information processing method performed by an information processing device, the method comprising: a providing step of using AI to provide a chat room in which a plurality of characters participate; a search step of performing a search based on a search query entered by a user in the chat room; a setting step of selecting, from among the plurality of characters, the character that is to provide search results based on the search query; and a voice control step of, in the chat room, causing each other character among the plurality of characters, other than the selected character, to speak content other than the search results in the voice and tone of that other character, and causing the selected character to speak content related to the search results.
10. An information processing program that causes a computer to execute: a providing procedure of using AI to provide a chat room in which a plurality of characters participate; a search procedure of performing a search based on a search query entered by a user in the chat room; a setting procedure of selecting, from among the plurality of characters, the character that is to provide search results based on the search query; and a voice control procedure of, in the chat room, causing each other character among the plurality of characters, other than the selected character, to speak content other than the search results in the voice and tone of that other character, and causing the selected character to speak content related to the search results.
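The flow recited in claims 1, 9, and 10 can be illustrated with a minimal Python sketch. Every name in it (Character, Utterance, run_search, select_presenter, voice_control) is hypothetical and does not appear in the specification. The search is a stub, the selection rule (the first character in the room) is an arbitrary assumption, and voice synthesis is represented only by a voice label attached to each utterance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    voice: str  # hypothetical label standing in for the character's voice and tone


@dataclass(frozen=True)
class Utterance:
    speaker: str
    voice: str
    text: str
    is_search_result: bool


def run_search(query: str) -> list[str]:
    """Stub search step: returns a placeholder result (assumption, not a real search)."""
    return [f"result for '{query}'"]


def select_presenter(characters: list[Character]) -> Character:
    """Stub setting step: picks the first character (arbitrary assumption)."""
    if not characters:
        raise ValueError("the chat room needs at least one character")
    return characters[0]


def voice_control(
    characters: list[Character],
    presenter: Character,
    results: list[str],
    other_content: dict[str, str],
) -> list[Utterance]:
    """Sketch of the voice control step.

    The selected character speaks the search results; every other character
    speaks other content. Each utterance carries the speaker's own voice label.
    """
    utterances = []
    for ch in characters:
        if ch == presenter:
            text = "; ".join(results)
            utterances.append(Utterance(ch.name, ch.voice, text, True))
        else:
            text = other_content.get(ch.name, "I see.")
            utterances.append(Utterance(ch.name, ch.voice, text, False))
    return utterances
```

For example, with a room of two characters A and B, calling select_presenter and then voice_control yields one utterance by A that carries the search results and one utterance by B that carries only the other content supplied for B.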