Method, apparatus, device and product for interface interaction
By presenting text content and media content during conversations between a user and an intelligent system, the problem of limited interaction methods in existing technologies is addressed, thereby improving the efficiency of multimodal information presentation and interaction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING ZITIAO NETWORK TECH CO LTD
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, the interaction methods between users and intelligent systems are limited, making it difficult to effectively improve the efficiency of information acquisition and interaction through multimodal content.
The disclosure provides a user interface interaction method that, during a user's conversation with an intelligent system, presents text content and media content together with matching audio content, thereby enabling multimodal information presentation and interaction.
It improves the efficiency of users obtaining information and enhances the interactive experience and efficiency between users and intelligent systems.
Smart Images

Figure CN122285162A_ABST
Abstract
Description
Technical Field
[0001] The examples herein generally relate to the field of computer science, and in particular to methods, apparatuses, devices, and products for user interface interaction.
Background Technology
[0002] With the development of computer technology and continuous breakthroughs in the field of artificial intelligence, various models have emerged. Users can communicate continuously and fluently with models through conversational text, completing various tasks such as information consultation, logical thinking, and content interpretation without using complex commands.
Summary of the Invention
[0003] In a first aspect, a method for interface interaction is provided. The method includes: presenting a first interface indicating an ongoing call between a user and an intelligent system; and playing first audio content and presenting text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
[0004] In a second aspect, an apparatus for interface interaction is provided. The apparatus includes: a presentation module configured to present a first interface indicating an ongoing media call between a user and an intelligent system; and a playback module configured to play first audio content of the call and present text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
[0005] In a third aspect, an electronic device is provided. The device includes at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor. When executed by the at least one processor, the instructions cause the device to perform the method of the first aspect.
[0006] In a fourth aspect, a computer program product is provided, which is tangibly stored in a computer storage medium and includes computer-executable instructions that, when executed by a device, cause the device to perform the method of the first aspect.
[0007] In this way, the present disclosure can provide users with multimodal content during conversations with intelligent systems, thereby effectively improving the efficiency of users obtaining information and enhancing the interaction efficiency between users and intelligent systems.
[0008] It should be understood that the content described in this section is not intended to identify key or important features of the examples herein, nor is it intended to limit the scope of the solution. Other features will become readily apparent from the following description.
Attached Figure Description
[0009] The above and other features, advantages, and aspects of the various examples herein will become more apparent when taken in conjunction with the accompanying drawings and the following detailed description. In the accompanying drawings, the same or similar reference numerals denote the same or similar elements, wherein: Figure 1 shows a schematic diagram of an example environment; Figures 2A to 2G show example interfaces in some scenarios; Figures 3A to 3F show example interfaces in some scenarios; Figure 4 shows an example interface in some scenarios; Figure 5 shows a flowchart of an example process of interface interaction in some scenarios; Figure 6 shows a schematic block diagram of an example apparatus for interface interaction in some scenarios; and Figure 7 shows a block diagram of an electronic device capable of implementing multiple examples.
Detailed Implementation
[0010] The examples in this document will now be described in more detail with reference to the accompanying drawings. While some examples are shown in the drawings, it should be understood that the solutions can be implemented in various forms and should not be construed as limited to the examples presented herein. Rather, these examples are provided for a more thorough and complete understanding of the solutions. It should be understood that the drawings and examples in this document are for illustrative purposes only and are not intended to limit the scope of protection of the solutions.
[0011] It should be noted that the headings of any section / subsection provided herein are not restrictive. Various examples are described throughout this document, and examples of any type may be included under any section / subsection. Furthermore, examples described in any section / subsection may be combined in any way with any other examples described in the same section / subsection and / or different sections / subsections.
[0012] In the description of the examples in this document, the term "including" and similar terms should be understood as open inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "an example" or "the example" should be understood as "at least one example". The term "some examples" should be understood as "at least some examples". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
[0013] The examples in this document may involve user data, data acquisition, and / or use. All of these aspects comply with relevant laws, regulations, and rules. In the examples, all data collection, acquisition, processing, manipulation, forwarding, and use are conducted with the user's knowledge and confirmation. Accordingly, when implementing each example, the type, scope of use, and usage scenarios of any data or information that may be involved should be communicated to the user and their authorization obtained through appropriate means, in accordance with relevant laws and regulations. The specific methods of notification and / or authorization can vary depending on the actual situation and application scenario; the scope of the solution is not limited in this regard.
[0014] In this specification and the example solutions, any processing of personal information will be conducted only on a lawful basis (such as obtaining the consent of the data subject or being necessary for the performance of a contract) and will only be carried out within the scope stipulated or agreed upon. A user's refusal to allow processing of personal information beyond what is necessary for basic functions will not affect the user's use of those basic functions.
[0015] As mentioned above, with the development of computer technology and continuous breakthroughs in the field of artificial intelligence, various models have emerged. Users can communicate continuously and fluently with models through conversational text, completing various tasks such as information consultation, logical thinking, and content interpretation without using complex commands.
[0016] A user interface interaction scheme is proposed. The scheme includes: presenting a first interface indicating a call between the user and an intelligent system; playing first audio content of the call and presenting text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
[0017] In this way, the present disclosure can provide users with multimodal content during conversations with intelligent systems, thereby effectively improving the efficiency of users obtaining information and enhancing the interaction efficiency between users and intelligent systems.
[0018] The following describes various examples of this scheme in further detail with reference to the accompanying drawings.
Example Environment
[0019] Figure 1 shows a schematic diagram of example environment 100. As shown in Figure 1, example environment 100 may include electronic device 110.
[0020] In this example environment 100, electronic device 110 can run an application 120 that supports user interface interaction. Application 120 can be any suitable type of application for user interface interaction, including but not limited to: search applications, content sharing applications, applications for interaction with smart systems, or other suitable applications. User 140 can interact with application 120 via electronic device 110 and / or its attached devices.
[0021] In environment 100 of Figure 1, if application 120 is active, electronic device 110 can use application 120 to present interface 150 for supporting interface interaction.
[0022] In some cases, electronic device 110 communicates with server 130 to provide services to application 120. Electronic device 110 can be any type of mobile terminal, fixed terminal, or portable terminal, including mobile phones, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, media computers, multimedia tablets, handheld computers, portable gaming terminals, VR / AR devices, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio / video players, digital cameras / camcorders, positioning devices, television receivers, radio receivers, e-book devices, gaming devices, or any combination of the foregoing, including accessories and peripherals of these devices or any combination thereof. In some cases, electronic device 110 can also support any type of user-facing interface (such as "wearable" circuitry).
[0023] Server 130 can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks, and big data and artificial intelligence platforms. Server 130 may include, for example, computing systems / servers such as mainframes, edge computing nodes, computing devices in a cloud environment, etc. Server 130 can provide backend services for application 120, which supports user interface interaction, in electronic device 110.
[0024] A communication connection can be established between server 130 and electronic device 110. This communication connection can be established via wired or wireless means. The communication connection can include, but is not limited to, Bluetooth, mobile network, Universal Serial Bus (USB), and Wireless Fidelity (WiFi) connections. In some cases, server 130 and electronic device 110 can exchange signaling information through their communication connection.
[0025] It should be understood that the structure and function of the various elements in environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of the scheme.
[0026] The following description of the example will continue with reference to the accompanying drawings.
Example Interaction
[0027] Figures 2A to 2G show example interfaces 200A to 200G, illustrating interface interactions in various scenarios. Interfaces 200A to 200G can, for example, be provided by electronic device 110 shown in Figure 1.
[0028] In some cases, electronic device 110 presents a first interface. The first interface indicates an ongoing call between the user and the intelligent system. For example, as shown in Figure 2A, electronic device 110 can present a first interface, such as interface 200A. Interface 200A can indicate that a call between the user and the intelligent system is in progress. Interface 200A can include a set of controls, such as controls 201-1 to 201-4. This set of controls can be used to adjust the call status, for example by hanging up or muting the call. The call can be, for example, a video call or a voice call. In some scenarios, presentation of the first interface is triggered from a conversation interface associated with the intelligent system. As an example, electronic device 110 can present a conversation interface associated with the intelligent system. The conversation interface can be used for interaction between the user and the intelligent system. For example, the user can send a message to the intelligent system through the conversation interface and then view the intelligent system's response to the message through the conversation interface. The conversation interface can include a call control. When the call control is triggered (e.g., by clicking or double-clicking), electronic device 110 can initiate a call between the user and the intelligent system and present the first interface (e.g., interface 200A).
[0029] As used herein, an intelligent system refers to a system capable of autonomous control based on machine learning models. An intelligent system is, for example, a virtual object or physical entity capable of making decisions and autonomously executing actions based on machine learning models to achieve preset goals or complete preset tasks. An intelligent system can be an automated program that understands user intent and can utilize models or invoke tools to complete various types of tasks. In some contexts, examples of intelligent systems may include, but are not limited to: agents, bots, chatbots, digital avatars, intelligent customer service, digital assistants, etc. Alternatively, an intelligent system can also be an intelligent role implemented based on machine learning models. An intelligent system can process user requests based on generative models (e.g., language models, multimodal models) to perform specified types of tasks. In some cases, an intelligent system may also be associated with a virtual account, which may have a corresponding avatar or nickname.
[0030] In some cases, electronic device 110 plays first audio content of the call and presents text content and media content on a first interface. At least a portion of the first audio content matches the text content. The first audio content includes descriptive content about the media content.
[0031] As an example, as shown in Figure 2B, during a conversation between a user and an intelligent system, electronic device 110 can present text content and one or more items of media content on a first interface (e.g., interface 200B). The text content can correspond to a text modality, such as plain text. For example, the text content could be text 202-1 and text 202-2. The media content can correspond to multiple content modalities, such as an image modality, a video modality, a chart modality, etc. For example, the media content could include image 203 and table 204. The first audio content, the text content, and the media content can be generated by the intelligent system. In some scenarios, electronic device 110 can present the text content and the one or more items of media content on interface 200B in a mixed (interleaved) layout.
[0032] When text and media content are presented, electronic device 110 can play first audio content. The first audio content can originate from an intelligent system, meaning it can be generated and played by the intelligent system. The first audio content can correspond to at least a portion of the text content generated in interface 200B; that is, the first audio content can be a playback of at least a portion of the text content in interface 200B. For example, the text content may include links. When the intelligent system generates audio content based on the text content, a "not read" marker can be added to the portion corresponding to the link, thus excluding the content corresponding to the link from the audio content. The first audio content can also include descriptive content about the media content; that is, the first audio content includes audio segments describing the media content. For example, if the media content is a photo of attraction A, then the first audio content includes audio segments introducing attraction A.
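The exclusion of links from the spoken content described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the text content is split into segments, a hypothetical `read_aloud` flag stands in for the "not read" marker, and links are detected with a simple regular expression. None of these names or the detection rule come from the described scheme.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical link pattern; the source does not say how links are detected.
LINK_PATTERN = re.compile(r"https?://\S+")


@dataclass
class Segment:
    text: str
    read_aloud: bool  # False plays the role of the "not read" marker


def mark_segments(text_content: str) -> List[Segment]:
    """Split text into segments and flag link segments as not to be read."""
    segments: List[Segment] = []
    pos = 0
    for match in LINK_PATTERN.finditer(text_content):
        if match.start() > pos:
            segments.append(Segment(text_content[pos:match.start()], True))
        segments.append(Segment(match.group(), False))
        pos = match.end()
    if pos < len(text_content):
        segments.append(Segment(text_content[pos:], True))
    return segments


def speech_script(segments: List[Segment]) -> str:
    """Join only the segments that should be spoken, normalizing whitespace."""
    spoken = "".join(s.text for s in segments if s.read_aloud)
    return " ".join(spoken.split())
```

With this sketch, the displayed text keeps the link while the script handed to speech synthesis omits it.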
[0033] In some scenarios, electronic device 110 can proactively play audio content and present text and media content through an intelligent system. For example, after interface 200B is presented, the intelligent system can generate explanatory content associated with the call function. Such explanatory content may include first text content and first media content. After the explanatory content is generated, electronic device 110 can play audio content corresponding to the first text content through the intelligent system and present the first text content and first media content on interface 200B.
[0034] In other scenarios, electronic device 110 can also passively play audio content and present text and media content through an intelligent system. For example, after receiving a user's corresponding operation, the intelligent system can then play audio content and present text and media content.
[0035] As an example, electronic device 110 can receive a first operation via a call. Such a first operation could be, for example, a voice input operation. After receiving the first operation, electronic device 110 can provide the operation content corresponding to the first operation (e.g., audio input by the user) to the intelligent system. The intelligent system can generate text content, media content, and the first audio content based on the user-input audio. Furthermore, electronic device 110 can play the audio content and present the text content and media content on interface 200B. For example, the user-input audio could indicate an inquiry about a historical story. The intelligent system can then generate text content (e.g., a description of the historical story), media content (e.g., images associated with the historical story), and the first audio content (an explanation of the historical story) associated with the historical story based on this inquiry. Furthermore, electronic device 110 can play the first audio content explaining the historical story and present the text description and media content (e.g., images, tables, etc.) associated with the historical story on interface 200B. In some scenarios, such a first operation can also be a text input operation, an image input operation, etc. That is, the electronic device 110 can use the intelligent system to play audio content corresponding to text input operations or image input operations for the user, and present the corresponding text content and media content.
[0036] In some cases, the first audio content includes at least a first audio segment and a second audio segment. The first audio segment corresponds to text content. The second audio segment corresponds to descriptive content. As an example, when generating the first audio content, the intelligent system can generate multiple audio segments, each corresponding to different content. For instance, if the first audio content is an explanation of a historical event, the intelligent system can generate not only a first audio segment about the text content (e.g., a strategy for explaining a historical event) but also a second audio segment. The second audio segment can correspond to descriptive content (e.g., descriptive text). The descriptive text can be a description of the media content, while the second audio segment can be an explanation of the media content (e.g., a chart).
[0037] In some scenarios, when playing the first audio content, the electronic device 110 can play a first audio segment corresponding to the text content and a second audio segment corresponding to the descriptive content in a preset order. Taking the first audio content as an explanation of a historical event as an example, the electronic device 110 can first play the first audio segment about the strategy for explaining the historical event. Furthermore, after the first audio segment has finished playing, the electronic device 110 can play the second audio segment explaining the charts.
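Playing the two kinds of audio segment in a preset order can be sketched as below. The segment kinds, the `PRESET_ORDER` table, and the function name are illustrative assumptions; the description only states that the segments are played in a preset order, with the text segment first in the example given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSegment:
    kind: str    # "text" (matches the text content) or "description" (describes media)
    script: str  # what the segment says


# Assumed preset order: the text segment first, then the media description.
PRESET_ORDER = {"text": 0, "description": 1}


def playback_queue(segments: List[AudioSegment]) -> List[AudioSegment]:
    """Return segments in the preset order; sorted() is stable, so
    segments of the same kind keep their original relative order."""
    return sorted(segments, key=lambda s: PRESET_ORDER[s.kind])
```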
[0038] In some cases, electronic device 110 can receive a first instruction. The first instruction is related to media content. Furthermore, electronic device 110 can display a second interface and maintain communication with the intelligent system. The second interface is used to display media content or related information about the media content.
[0039] As an example, as shown in Figure 2B, electronic device 110 can present a first interface (e.g., interface 200B). Interface 200B includes media content. This media content may include, for example, image 203. Electronic device 110 can receive a first instruction while playing audio content. To accommodate the interaction habits of different users, the first instruction can be a voice instruction, an action instruction, or any other suitable type of instruction from the user regarding the media content. Such voice instructions may include, but are not limited to, voice input by the user in real time or voice preset by the system. Such action instructions may include, but are not limited to, click operations, long-press operations, swipe operations, hover operations, and gesture operations. Taking a click operation on the media content as an example, after receiving the first instruction, as shown in Figure 2C, electronic device 110 can present a second interface and maintain the call with the intelligent system. This second interface can be, for example, interface 200C. Interface 200C can be, for example, a media content viewing interface. Electronic device 110 can present image 203 in interface 200C to help the user better view the media content. Electronic device 110 can also present associated information related to the media content in interface 200C. Such associated information can, for example, include elements associated with the media content. Such elements can, for example, be used to annotate at least a portion of the media content.
[0040] Alternatively, the electronic device 110 may also display interactive controls associated with the image 203 on the interface 200C. These interactive controls may include, for example, controls 210 and 215. Control 210 can be used to store the image 203, and control 215 can be used to share the image 203. During the storage and sharing of the image 203, the electronic device 110 can also maintain the playback of audio content and the communication between the user and the intelligent system.
[0041] In some scenarios, media content can be associated with product objects. For example, as shown in Figure 2D, electronic device 110 can present media content in interface 200D. The media content may include content component 217. Content component 217 may be, for example, a product card for a product object, such as a sofa product card. Upon receiving a first instruction for content component 217 (e.g., a click on content component 217), electronic device 110 can present a second interface. As shown in Figure 2E, the second interface can be interface 200E. Interface 200E can be a viewing interface for the product object, which can also be called a product details page. The interface content of interface 200E includes product information associated with the product object, such as product image 217-1, the product name, and the product description. Throughout the process of presenting the viewing interface for the product object, electronic device 110 can also maintain the call between the user and the intelligent system.
[0042] In other scenarios, media content can also be associated with specific services. These services can be any suitable lifestyle service, such as transportation or food ordering. Taking food ordering as an example, as shown in Figure 2F, electronic device 110 can display media content on interface 200F. Such media content may include content component 218. Content component 218 may be presented in a card format, such as a service card for ordering food. Upon receiving a first instruction for content component 218 (e.g., a click on content component 218), as shown in Figure 2G, electronic device 110 can present a second interface (e.g., interface 200G). Interface 200G can be a viewing interface for the ordering service, which may be referred to as a service details page. Interface 200G includes information related to the ordering service, such as an introduction to the ordering service, descriptions of the food associated with the ordering service, and introductions to the merchants associated with the ordering service. For example, electronic device 110 can display information related to the ordering service through panel 220. Throughout the presentation of interface 200G, electronic device 110 can also maintain the call between the user and the intelligent system.
[0043] In this way, users can freely interact with media content while talking to the intelligent system: a second interface corresponding to the media content is displayed while the call is maintained, thereby effectively improving the user's interactive experience.
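One way to picture "presenting a second interface while maintaining the call" is a session object whose call state is independent of the interface currently shown. The sketch below is an assumption-laden illustration: the `CallSession` class, the interface stack, and the identifier strings are hypothetical and are not taken from the described scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallSession:
    active: bool = True  # whether the call with the intelligent system is ongoing
    interface_stack: List[str] = field(
        default_factory=lambda: ["first_interface"]
    )

    def open_media(self, media_id: str) -> str:
        """Handle a first instruction on media content: show a second
        interface on top of the first one without touching the call state."""
        if not self.active:
            raise RuntimeError("call has ended")
        self.interface_stack.append(f"second_interface:{media_id}")
        return self.interface_stack[-1]

    def back(self) -> str:
        """Return to the previous interface; the call is still not affected."""
        if len(self.interface_stack) > 1:
            self.interface_stack.pop()
        return self.interface_stack[-1]

    def hang_up(self) -> None:
        """Only an explicit hang-up ends the call."""
        self.active = False
```

Keeping navigation and call state separate is the design point here: opening an image viewer, a product details page, or a service details page changes only the interface stack.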
[0044] In some cases, after the second interface is presented, electronic device 110 can also stop playing the first audio content and play second audio content. The second audio content is associated with the interface content of the second interface. As an example, as shown in Figure 2C, the interface content of the second interface (e.g., interface 200C) includes image 203, which can be a photograph of attraction A. After interface 200C is presented, electronic device 110 can proactively stop playing the first audio content and play second audio content that introduces attraction A. In some scenarios, the interface content of the second interface can also include information related to image 203. Such information could be, for example, an interface element used to identify a target item in image 203. The second audio content can include a description of the interface element and the target item. Such second audio content could be, for example, "Look at the stone marked in red in the picture; this stone is a feature of attraction A."
[0045] In other scenarios, the second interface can be a product viewing interface (e.g., interface 200E). Once the second interface (e.g., interface 200E) is presented, the electronic device 110 can actively play second audio content. Such second audio content could be something like, "I noticed you've opened the product details page; would you like me to introduce the product to you?"
[0046] Alternatively, the second interface can also be a viewing interface for a specific service (e.g., interface 200G). Once the second interface (e.g., interface 200G) is displayed, the electronic device 110 can also actively play second audio content associated with the interface content in interface 200G. Such second audio content could be something like, "Would you like me to introduce the merchant's store environment and food specialties?"
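The behavior in the scenarios above, where presenting a second interface stops the first audio content and starts second audio content tied to that interface, can be sketched as follows. The prompts are quoted from the examples above; the interface kinds used as dictionary keys, the `AudioPlayer` class, and the log format are hypothetical.

```python
from typing import Dict, List, Optional

# Example prompts from the scenarios above, keyed by a hypothetical interface kind.
SECOND_AUDIO: Dict[str, str] = {
    "product": "I noticed you've opened the product details page; "
               "would you like me to introduce the product to you?",
    "service": "Would you like me to introduce the merchant's store "
               "environment and food specialties?",
}


class AudioPlayer:
    """Plays one piece of audio at a time and records what happened."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.log: List[str] = []

    def play(self, content: str) -> None:
        # Starting new audio first stops whatever is currently playing.
        if self.current is not None:
            self.log.append(f"stop:{self.current}")
        self.current = content
        self.log.append(f"play:{content}")


def on_second_interface_presented(player: AudioPlayer, kind: str) -> None:
    """Switch to the second audio content if one is defined for this interface."""
    content = SECOND_AUDIO.get(kind)
    if content is not None:
        player.play(content)
```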
[0047] In some cases, the electronic device 110 may receive a second instruction while presenting the second interface. Furthermore, the electronic device 110 may play second audio content.
[0048] As an example, as shown in Figure 2C, electronic device 110 can receive a second instruction while the second interface is presented. Similar to the first instruction, the second instruction can also be a voice instruction or a gesture instruction. Such a second instruction could be, for example, a query regarding interface content (e.g., image 203) in the second interface. For example, electronic device 110 can receive a first voice input from the user and use it as the second instruction. This voice input could be, for example, "Introduce this image." After receiving the second instruction, electronic device 110 provides the second instruction to the intelligent system. The intelligent system can generate second audio content based on the second instruction and the interface content. For example, when the first voice input is "Introduce this image," the intelligent system can generate second audio content based on the first voice input and image 203. The second audio content can be an introduction to image 203. After the second audio content is generated, electronic device 110 can play the second audio content.
[0049] In this way, after the second instruction is received, the present disclosure can provide the user with second audio content generated based on the second instruction and the interface content, helping the user better obtain information related to the interface content in the second interface and thus improving the efficiency with which the user obtains information.
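The flow in which the second instruction and the interface content are both provided to the intelligent system can be sketched as below. Everything here is a stand-in: `InterfaceContent`, the prompt layout, and `echo_system` (a placeholder for the intelligent system) are assumptions made for illustration, and a real system would return synthesized audio rather than a string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InterfaceContent:
    kind: str        # e.g. "image", "product", "service"
    identifier: str  # e.g. "image 203"


def handle_second_instruction(
    instruction: str,
    content: InterfaceContent,
    generate: Callable[[str], str],
) -> str:
    """Combine the instruction with the current interface content and
    ask the (stand-in) intelligent system for the second audio content."""
    request = (
        f"Instruction: {instruction}\n"
        f"Interface content: {content.kind} ({content.identifier})"
    )
    return generate(request)


def echo_system(request: str) -> str:
    # Placeholder for the intelligent system; it only echoes the instruction line.
    return f"[audio] response to: {request.splitlines()[0]}"
```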
[0050] In some cases, electronic device 110 may play second audio content in response to receiving a second instruction. The second audio content is related to product information.
[0051] As an example, the second interface can be a product viewing interface, such as interface 200E. Interface 200E includes product information. Electronic device 110 can receive a second instruction. Taking a voice instruction as an example, such a second instruction can be an inquiry about product information. For example, electronic device 110 can receive a second voice input from the user and use it as the second instruction. Here, the second voice input could be, for example, "Introduce this product to me." After receiving the second voice input, the intelligent system can generate second audio content based on the second voice input and the product information. After the second audio content is generated, electronic device 110 can play the second audio content to provide the user with an introduction to the product.
[0052] In some scenarios, the second interface can be a viewing interface for a specific service, such as interface 200G. Taking a food ordering service as an example, interface 200G includes service information related to the ordering service. The service information may include merchant details, package details, etc. Electronic device 110 can receive a second instruction. Taking a voice instruction as an example, such a second instruction could be an inquiry about a package. For example, the second instruction could be "Give me an introduction to this package." After receiving the second instruction, electronic device 110 can provide it to the intelligent system. The intelligent system can generate second audio content based on the second instruction and the interface content about package details in interface 200G. Such second audio content could be, for example, an introduction to the package. After the second audio content is generated, electronic device 110 can play the second audio content to provide the user with an introduction to the package.
[0053] In some cases, electronic device 110 can receive interactive operations on the media content. Furthermore, electronic device 110 can stop playing the first audio content and play a third audio content. The third audio content is associated with the operational state of the media content.
[0054] As an example, electronic device 110 can receive an interactive operation from the user. An interactive operation is a user-initiated action on the media content that changes its state or generates new content, allowing the user to process, adjust, or expand the media content. Such an operation can be, for example, a selection, editing, or content addition operation. After an interactive operation is received, electronic device 110 can adjust the operation state of the media content. The operation state represents the interactive mode or behavioral stage of the media content during user interaction. It can indicate the types of user operations currently supported by the media content and how the media content responds to them. Operation states can include, but are not limited to: a selection state (the media content is selected, marking it as the target of subsequent operations), an editing state (the media content can be modified), an addition state (new content or child elements can be added), and a drag-and-drop state (the media content can be moved to adjust its layout). When the operation state of the media content corresponds to a preset state, electronic device 110 can stop playing the first audio content and play the third audio content.
[0055] For example, when the media content is switched to edit mode, the electronic device 110 can stop playing the first audio content and play the third audio content. This third audio content could, for example, indicate recommended editing operations for the media content.
[0056] In this way, when an interactive operation is received for media content, the electronic device 110 can play third audio content associated with the operation status of the media content, thereby providing users with richer ways to interact with the intelligent system and providing users with richer information.
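The state handling described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the names `OperationState`, `OPERATION_TO_STATE`, and `CallPlayer`, and the choice of the editing state as the only preset state, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class OperationState(Enum):
    IDLE = auto()
    SELECTED = auto()
    EDITING = auto()
    ADDING = auto()
    DRAGGING = auto()


# Hypothetical mapping from interactive operations to operation states.
OPERATION_TO_STATE = {
    "select": OperationState.SELECTED,
    "edit": OperationState.EDITING,
    "add": OperationState.ADDING,
    "drag": OperationState.DRAGGING,
}


@dataclass
class CallPlayer:
    """Tracks which audio content is playing and the media content's state."""
    preset_states: frozenset
    now_playing: str = "first"
    state: OperationState = OperationState.IDLE
    log: list = field(default_factory=list)

    def on_interaction(self, operation: str) -> None:
        # Adjust the operation state of the media content.
        self.state = OPERATION_TO_STATE[operation]
        # Only a preset state interrupts the first audio content.
        if self.state in self.preset_states and self.now_playing == "first":
            self.log.append("stop first")
            self.now_playing = "third"
            self.log.append(f"play third ({self.state.name.lower()})")


player = CallPlayer(preset_states=frozenset({OperationState.EDITING}))
player.on_interaction("drag")   # not a preset state: first audio keeps playing
player.on_interaction("edit")   # preset state: switch to third audio
print(player.now_playing, player.log)
```

Keeping the preset states in a set makes it possible to decide per scenario which operation states interrupt the ongoing narration.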
[0057] In some cases, the interactive operation indicates the addition of a second element related to the media content, and the third audio content is related to the second element. As an example, electronic device 110 can receive an interactive operation from the user, such as a content addition operation. A content addition operation can be used to add a second element related to the media content. The second element can be any appropriate content to be added and can belong to any of multiple modalities, such as text, image, or video. After the content addition operation is received, electronic device 110 presents the added second element on the first interface. After the second element is added, electronic device 110 can use the intelligent system to generate third audio content associated with the second element. Such third audio content can be, for example, an introduction to the second element. For example, if the second element is a photo of attraction B, the third audio content can be an introduction to attraction B.
[0058] For example, the content addition operation could add a product card for product B. After the operation adding the product card for product B is received, the electronic device 110 can use the intelligent system to generate third audio content associated with product B. This third audio content could be an introduction to product B.
[0059] In this way, after a user adds a second element to the media content, a third audio content related to the second element can be provided to the user in a timely manner, thereby effectively improving the efficiency of the user's information acquisition.
[0060] In some cases, the interactive operation indicates the selection of a third element of the media content, and the third audio content is related to the third element. For example, as shown in Figure 2B, interface 200B includes media content. The media content may include a third element. The third element can be any constituent unit of the media content, such as a specific object identified, manipulated, or referenced within the media content. The third element can also be the media content itself, such as a complete image or video. For example, the third element can be image 203, or a part of image 203 (such as the sun or the trees in image 203). Taking image 203 as the third element as an example, electronic device 110 can receive an interactive operation on image 203, such as a selection operation. After receiving the selection operation on image 203, electronic device 110 can adjust the operation state of image 203, for example by setting image 203 to a selected state. Furthermore, electronic device 110 can play third audio content, which can be an introduction to image 203.
[0061] In this way, after the third element of the media content is selected, the electronic device 110 can play the third audio content associated with the third element, thereby effectively improving the efficiency of the user in obtaining information associated with the third element.
[0062] In some cases, electronic device 110 can receive a third instruction. Furthermore, electronic device 110 can play a fourth and a fifth audio content. The fourth audio content includes a transitional prompt preceding the fifth audio content. The fifth audio content includes a response to the third instruction.
[0063] As an example, electronic device 110 can receive a third instruction. Similar to the first instruction, the third instruction can also be a voice instruction or a gesture instruction. The third instruction can be used to trigger the intelligent system to generate fourth and fifth audio content. Taking a voice instruction as an example, such a third instruction could be, for instance, "Tell me another historical story." After receiving the third instruction, electronic device 110 can play the fourth and fifth audio content. The fourth audio content can be a transitional prompt before the fifth audio content is generated. This transitional prompt can represent temporary feedback information output to the user by the intelligent system before the fifth audio content is generated. The transitional prompt can be used to indicate to the user that the current instruction has been received and is being processed. Examples include "Wait a moment," "Let me find the image," and "Generation will be complete soon." After the fifth audio content is generated, electronic device 110 can play the fifth audio content. Such fifth audio content could be, for example, a response to the third instruction, such as an explanation of another historical story.
[0064] Alternatively, during the generation of the fifth audio content, the electronic device 110 may also display text prompts. The text prompts may include descriptive text indicating the processing method and status of the third instruction. For example, the descriptive text may be "Planning the tool to use," "Searching," etc.
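The ordering of the fourth and fifth audio content can be illustrated with a short asynchronous sketch. The function names and the simulated delay below are assumptions for illustration only; `generate_response` stands in for the intelligent system and does not reflect any actual interface.

```python
import asyncio


async def generate_response(instruction: str) -> str:
    # Stand-in for the intelligent system; the delay simulates generation time.
    await asyncio.sleep(0.05)
    return f"Response to: {instruction}"


async def handle_instruction(instruction: str, played: list) -> None:
    # Start generating the fifth audio content immediately.
    task = asyncio.create_task(generate_response(instruction))
    # Play the transitional prompt (fourth audio content) while waiting.
    played.append(("fourth", "Wait a moment"))
    fifth = await task
    played.append(("fifth", fifth))


played = []
asyncio.run(handle_instruction("Tell me another historical story", played))
print(played)
```

Starting the generation task before playing the transitional prompt lets the prompt cover the waiting time instead of adding to it.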
[0065] Figures 3A to 3F show example interfaces 300A to 300F, illustrating interactive interfaces for various scenarios. Interfaces 300A to 300F demonstrate different scenarios involving changes in media content. Interfaces 300A to 300F can, for example, be provided by the electronic device 110 shown in Figure 1.
[0066] In some cases, the electronic device 110 can present changes in the media content during the playback of the first audio content. These changes can represent perceptible visual differences in the media content that occur during playback. The changes are used to indicate updates to the media content's status. These changes may include, but are not limited to: changes in display style, addition of identifier elements, dynamic updates of content, and adjustments to layout.
[0067] As an example, as shown in Figure 3A, electronic device 110 can present a first interface indicating an ongoing call between the user and the intelligent system. The first interface can be interface 300A. Interface 300A includes text content and media content. The media content may include, for example, image 305 and table 310. During the playback of the first audio content, electronic device 110 can adjust the style of the media content or present interface elements associated with the media content. For example, electronic device 110 can adjust the display size of image 305 or present, in the interactive interface, interface elements indicating any appropriate object in image 305.
[0068] In some cases, changes in media content are related to the first audio content. For example, during the playback of the first audio content, the electronic device 110 may adjust the style of the media content or present interface elements associated with the media content based on the playback progress of the first audio content. For instance, when the first audio content reaches the third part corresponding to image 305, the electronic device 110 may zoom in on image 305 to help the user better focus on the media content.
[0069] In some cases, electronic device 110 may display a prompt message on a first interface in response to playing a first portion of the first audio content. The prompt message is related to the descriptive content. As an example, when playing the first portion of the first audio content, electronic device 110 may display the prompt message at the relevant location within the media content. The first portion may be a specific segment of the first audio content, which may correspond to the media content or an appropriate component of the media content. For example, it may be an explanation of the media content within the first audio content. The prompt message here may represent supplementary information used to annotate, explain, or supplement the media content. The prompt message may be determined based on the descriptive content.
[0070] In some scenarios, the prompt message is used to annotate the media content. For example, as shown in Figure 3B, the prompt message can be an appropriate interface element, such as element 315. Element 315 can be used to annotate the media content or at least some of the objects within the media content. The first portion can, for example, correspond to the content indicated by element 315. For example, image 305 includes the sun. When playback of the first audio content reaches the first portion, which introduces the sun, the electronic device 110 can display element 315 at the position of the sun in image 305 to indicate that the first audio content is currently introducing the sun.
[0071] In other scenarios, the prompt message may also include a supplementary description related to the media content. For example, as shown in Figure 3C, the prompt message may include a text element, such as element 320. Element 320 may indicate supplementary information about at least a portion of the media content. For example, when playback of the first audio content reaches the first portion, which introduces the sun, the electronic device 110 may display element 320 on the interface 300C. Element 320 could, for example, provide supplementary information about the sun.
[0072] In this way, when the first portion of the first audio content is played, a prompt message can be displayed on the first interface to help users better understand the media content, thereby improving the efficiency of users in obtaining information.
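One way to tie a prompt message to a portion of the first audio content is a timeline lookup keyed on playback position. The following is a sketch under stated assumptions: the `Portion` structure, the time ranges, and the labels are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Portion:
    start: float      # seconds into the first audio content
    end: float
    prompt: str       # prompt message to show while this portion plays
    target: str       # part of the media content it refers to


# Hypothetical timeline; times and labels are illustrative only.
TIMELINE = [
    Portion(0.0, 4.0, "marker on the sun", "sun"),
    Portion(4.0, 9.5, "marker on the trees", "trees"),
]


def active_prompt(position: float) -> Optional[Portion]:
    """Return the portion whose [start, end) interval contains position."""
    for portion in TIMELINE:
        if portion.start <= position < portion.end:
            return portion
    return None


for t in (1.0, 5.0, 12.0):
    hit = active_prompt(t)
    print(t, hit.prompt if hit else "no prompt")
```

Using half-open intervals means a position exactly on a boundary belongs to the later portion, so two prompts are never active at once.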
[0073] In some cases, the text content includes descriptive text about a fourth element of the media content, and this descriptive text at least indicates the element style or element position of the fourth element. For example, as shown in Figure 3B, the media content may include image 305. Electronic device 110 may present a fourth element in image 305. The fourth element may represent an auxiliary element in the media content used to identify, locate, or highlight a specific portion of the content. The fourth element itself is not the identified target content, but rather a visual, interactive, or semantic marker pointing to the target content, which may be used to indicate content association or serve as an interactive entry point. For example, the fourth element here may be element 320. The text content corresponding to the first audio content may include descriptive text corresponding to element 320. The descriptive text may indicate the element style or element position of element 320. Such an element style may be, for example, a dashed box style, and the element position may be the upper half of the image. Such descriptive text may be, for example, "as shown by the dashed box in the upper half of the image."
[0074] In some cases, the media content includes a first element presented in a first style. The electronic device 110 may, in response to playing a second portion of the first audio content, display the first element of the media content in a second style. The first style differs from the second style. The first element is related to the second portion.
[0075] As an example, as shown in Figure 3C, the media content may include a first element presented in a first style. The first element can be any constituent unit of the media content, such as a specific object identified, manipulated, or referenced within the media content. The first element can also be the media content itself, such as a complete image or video. The second portion can be a specific segment of the first audio content that corresponds to the first element in the media content, such as an explanation of the first element. Taking image 305 as the first element as an example, the electronic device 110 can present image 305 in the interface 300C at a first size. When playback of the first audio content reaches the second portion, which explains image 305, the electronic device 110 can adjust the display style of image 305 to indicate that image 305 is currently being explained. For example, as shown in Figure 3D, while image 305 is being explained, the electronic device 110 can adjust the display size of image 305 in the interface 300D from the first size to a second size. The first size can be smaller than the second size. Alternatively, while image 305 is being explained, the electronic device 110 can also adjust the border style of image 305 (e.g., highlight its border), its display position, etc.
[0076] In some scenarios, taking table 310 as an example, when playback of the first audio content reaches the second portion, which explains table 310, the electronic device 110 can adjust the display style of table 310 to indicate that table 310 is currently being explained. For example, as shown in Figure 3E, while table 310 is being explained, electronic device 110 can change the border of table 310 from the first style to the second style to highlight the table currently being explained.
[0077] In other scenarios, as shown in Figure 3F, the first element can be a constituent unit of the media content. Taking cell 325 in table 310 as the first element as an example, when playback of the first audio content reaches the portion narrating cell 325, the electronic device 110 can adjust not only the border style of table 310 but also the display style of cell 325. For example, the electronic device 110 can change the background color of cell 325 from a first color to a second color to indicate that cell 325 is currently being narrated.
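The style switching for a narrated element and its container can be sketched as a walk up an element tree. The element names and the `PARENT` mapping below are hypothetical labels chosen for illustration; the text does not specify how elements are organized internally.

```python
from typing import Dict, Optional

# Hypothetical element tree: child -> parent (None for top-level elements).
PARENT: Dict[str, Optional[str]] = {
    "image_305": None,
    "table_310": None,
    "cell_325": "table_310",
}


def styles_for(narrated: Optional[str]) -> Dict[str, str]:
    """Map each element to "first" or "second" style for the narrated element."""
    styles = {name: "first" for name in PARENT}
    # Walk up from the narrated element so its containers are highlighted too.
    current = narrated
    while current is not None:
        styles[current] = "second"
        current = PARENT[current]
    return styles


print(styles_for("cell_325"))
print(styles_for(None))
```

Recomputing the full style map for each narrated element also restores the first style on elements that are no longer being explained.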
[0078] Figure 4 shows an example interface 400 according to some scenarios. Interface 400 illustrates example scenarios in which media content and text content are presented in different areas. Interface 400 can, for example, be provided by the electronic device 110 shown in Figure 1.
[0079] In some cases, the electronic device 110 can display media content and text content in different areas of the first interface. Specifically, the electronic device 110 can display media content in a preset area of the first interface, while text content is displayed outside the preset area. As an example, as shown in Figure 4, the electronic device 110 can display media content in a preset area (e.g., area 405) of a first interface (e.g., interface 400), and display text content outside area 405. For example, the electronic device 110 can display text content in area 410 of interface 400. Area 405 and area 410 need not overlap. Area 405 can also be referred to as a whiteboard area.
[0080] In some cases, the electronic device 110 can move the text content within the first interface while keeping the media content displayed in the preset area. For example, as shown in Figure 4, electronic device 110 can display media content in area 405 and text content in area 410. As the text content is gradually generated, if area 410 cannot fully display it, electronic device 110 can automatically move the text content so that the user can view the most recently generated text. While the text content is being moved, electronic device 110 can keep the media content displayed in area 405. That is, the media content does not need to move with the text content.
[0081] In some scenarios, the electronic device 110 can also receive swipe gestures from the user on the text content. Such swipe gestures can correspond to a preset direction (e.g., up or down). Upon receiving a swipe gesture, the electronic device 110 can move the text content based on the preset direction corresponding to the swipe gesture. During the movement of the text content, the electronic device 110 can maintain the media content displayed in area 405.
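The text movement described above can be modeled as a scroll offset that belongs only to the text area. This is a minimal sketch with assumed names and pixel values (`TextArea`, `viewport_height`, the `media_area` dictionary); it is not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class TextArea:
    """Scrollable text region (like area 410); the media area is separate."""
    viewport_height: int
    content_height: int = 0
    offset: int = 0  # distance scrolled from the top of the text

    def max_offset(self) -> int:
        return max(0, self.content_height - self.viewport_height)

    def append_text(self, added_height: int) -> None:
        # Newly generated text grows the content; follow it to the bottom.
        self.content_height += added_height
        self.offset = self.max_offset()

    def swipe(self, delta: int) -> None:
        # Positive delta scrolls toward newer text; clamp to the valid range.
        self.offset = min(self.max_offset(), max(0, self.offset + delta))


media_area = {"name": "area 405", "y": 0, "height": 300}  # never modified
text = TextArea(viewport_height=200)
text.append_text(150)   # fits: no scrolling needed
text.append_text(150)   # overflows: scroll to show the latest text
text.swipe(-500)        # user swipes back toward the start; clamped at 0
print(text.offset, media_area)
```

Because the offset is stored on the text area alone, neither automatic scrolling nor a swipe gesture touches the position of the media area.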
[0082] In other scenarios, during the playback of audio content, electronic device 110 can switch the media content presented in area 405 based on the playback progress of the audio content.
[0083] In this way, the present disclosure can provide users with multimodal content during conversations with intelligent systems, thereby effectively improving the efficiency of users obtaining information and enhancing the interaction efficiency between users and intelligent systems.
Example process
[0084] Figure 5 shows a flowchart of an example process 500 for interface interaction according to some scenarios. Process 500 can be implemented at electronic device 110. Process 500 is described below with reference to Figure 1.
[0085] As shown in Figure 5, at block 510, electronic device 110 presents a first interface that indicates the user's ongoing call with the intelligent system.
[0086] At block 520, electronic device 110 plays first audio content of the call and presents text content and media content on the first interface. At least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
[0087] In some cases, process 500 also includes: presenting changes in media content during the playback of the first audio content.
[0088] In some cases, changes in media content are related to the first audio content.
[0089] This approach helps users better focus on media content and improve the efficiency of information acquisition.
[0090] In some cases, presenting changes in the media content includes: in response to the playback of a first portion of the first audio content, presenting a prompt message on the first interface, wherein the prompt message is related to the descriptive content.
[0091] In this way, when the first portion of the first audio content is played, a prompt message can be displayed on the first interface to help users better understand the media content, thereby improving the efficiency of users in obtaining information.
[0092] In some cases, presenting the prompt message on the first interface includes: presenting, on the first interface, a prompt message used to annotate the media content; or presenting, on the first interface, a prompt message that includes a supplementary description related to the media content.
[0093] In some cases, the media content includes a first element presented in a first style, and presenting changes in the media content includes: in response to the playback of a second portion of the first audio content, displaying the first element of the media content in a second style, the first style being different from the second style, and the first element being related to the second portion.
[0094] In some cases, process 500 may also include: receiving a first instruction related to media content; and presenting a second interface and maintaining communication with the intelligent system, the second interface being used to present the media content or related information about the media content.
[0095] In this way, users can freely interact with media content while talking to the intelligent system, displaying a second interface corresponding to the media content and maintaining the call, thereby effectively improving the user's interactive experience.
[0096] In some cases, process 500 may also include: stopping the playback of the first audio content and playing the second audio content, which is associated with the interface content of the second interface.
[0097] In some cases, playing the second audio content includes: receiving a second instruction during the presentation of the second interface; and playing the second audio content.
[0098] In this way, after the second instruction is received, the present disclosure can provide the user with second audio content generated based on the second instruction and the interface content, thereby helping the user to better obtain information associated with the second interface and thus improving the efficiency of the user's information acquisition.
[0099] In some cases, the media content is associated with a product object, the interface content includes product information of the product object, and playing the second audio content of the call includes: in response to receiving a second instruction, playing the second audio content, which is related to the product information.
[0100] In some cases, process 500 may also include: receiving interactive operations on the media content; and stopping the playback of the first audio content and playing a third audio content associated with the operational state of the media content.
[0101] In this way, when an interactive operation is received for media content, the electronic device 110 can play third audio content associated with the operation status of the media content, thereby providing users with richer ways to interact with the intelligent system and providing users with richer information.
[0102] In some cases, the interactive operation indicates the addition of a second element related to the media content, and the third audio content is related to the second element.
[0103] In this way, after a user adds a second element associated with the media content, a third audio content associated with the second element can be provided to the user in a timely manner, thereby effectively improving the efficiency of the user in obtaining information.
[0104] In some cases, the interactive operation indicates the selection of a third element of the media content, and the third audio content is related to the third element.
[0105] In this way, after the third element of the media content is selected, the electronic device 110 can play the third audio content associated with the third element, thereby effectively improving the efficiency of the user in obtaining information associated with the third element.
[0106] In some cases, presenting media content on the first interface includes presenting the media content in a preset area of the first interface, and presenting text content outside the preset area.
[0107] In some cases, process 500 also includes: moving text content within a first interface and maintaining media content within a preset area.
[0108] In this way, the media content can remain displayed in the preset area when the text content is moved, thus allowing users to view the media content in the preset area at any time, thereby improving the user's interactive experience.
[0109] In some cases, the first audio content includes at least a first audio segment and a second audio segment, where the first audio segment corresponds to the text content and the second audio segment corresponds to the descriptive content.
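The two-segment structure of the first audio content can be represented with a small data structure. The `AudioSegment` type, the helper functions, and the sample sentences below are assumptions for illustration; in particular, exact string equality is only one possible reading of "matches".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioSegment:
    kind: str        # "text" or "description"
    script: str      # words to be spoken for this segment


def build_first_audio(text_content: str, description: str) -> List[AudioSegment]:
    """First segment mirrors the on-screen text; second describes the media."""
    return [
        AudioSegment("text", text_content),
        AudioSegment("description", description),
    ]


def matches_text(segments: List[AudioSegment], text_content: str) -> bool:
    # "At least a portion matches": some segment's script equals the text.
    return any(s.kind == "text" and s.script == text_content for s in segments)


segments = build_first_audio(
    "Attraction A is open all year.",             # hypothetical text content
    "The image shows the main gate at sunrise.",  # hypothetical description
)
print(matches_text(segments, "Attraction A is open all year."))
```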
[0110] In some cases, the text content includes descriptive text about a fourth element of the media content, and the descriptive text at least indicates the element style or element position of the fourth element.
[0111] In this way, the present disclosure can provide users with richer information about media content, thereby improving the efficiency of users in obtaining information.
[0112] In some cases, process 500 may also include: receiving a third instruction; and playing a fourth audio content and a fifth audio content, the fourth audio content including a transitional prompt preceding the fifth audio content, and the fifth audio content including a response to the third instruction.
Example devices and equipment
[0113] A corresponding apparatus for implementing the above methods or processes is also provided. Figure 6 shows a schematic structural block diagram of an example device 600 for interface interaction according to some scenarios. Device 600 can be implemented as or included in electronic device 110. The various modules / components in device 600 can be implemented by hardware, software, firmware, or any combination thereof.
[0114] As shown in Figure 6, the device 600 includes: a presentation module 610 configured to present a first interface indicating a user's ongoing call with an intelligent system; and a playback module 620 configured to play first audio content of the call and present text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
[0115] In some cases, device 600 also includes a change presentation module configured to present changes in media content during the playback of the first audio content.
[0116] In some cases, changes in media content are related to the first audio content.
[0117] In some cases, the change presentation module is also configured to: in response to the playback of the first portion of the first audio content, present a prompt message on the first interface, the prompt message being related to the descriptive content.
[0118] In some cases, the change presentation module is also configured to: present a prompt message on the first interface, the prompt message being used to annotate the media content; or present a prompt message on the first interface, the prompt message including supplementary descriptions associated with the media content.
[0119] In some cases, the media content includes a first element presented in a first style, and the change presentation module is also configured to: in response to the playback of a second portion of the first audio content, display the first element of the media content in a second style, the first style being different from the second style, and the first element being related to the second portion.
[0120] In some cases, device 600 also includes an interface presentation module configured to receive a first instruction related to media content; and to present a second interface and maintain communication with the intelligent system, the second interface being used to present the media content or related information about the media content.
[0121] In some cases, device 600 also includes a first stop module configured to stop playing the first audio content and play second audio content associated with the interface content of the second interface.
[0122] In some cases, the first stop module is also configured to: receive a second instruction during the presentation of the second interface; and play second audio content.
[0123] In some cases, the media content is associated with a product object, the interface content includes the product information of the product object, and the first stop module is also configured to: in response to receiving a second instruction, play second audio content, which is related to the product information.
[0124] In some cases, device 600 also includes a second stop module configured to receive interactive operations on media content; and to stop playing the first audio content and play a third audio content associated with the operational state of the media content.
[0125] In some cases, the interactive operation indicates the addition of a second element related to the media content, and the third audio content is related to the second element.
[0126] In some cases, the interactive operation indicates the selection of a third element of the media content, and the third audio content is related to the third element.
[0127] In some cases, the presentation module 610 is also configured to present media content in a preset area of the first interface, and to present text content outside the preset area.
[0128] In some cases, device 600 also includes a movement module configured to move text content within a first interface and maintain media content within a preset area.
[0129] In some cases, the first audio content includes at least a first audio segment and a second audio segment, where the first audio segment corresponds to the text content and the second audio segment corresponds to the descriptive content.
[0130] In some cases, the text content includes descriptive text about a fourth element of the media content, and the descriptive text at least indicates the element style or element position of the fourth element.
[0131] In some cases, device 600 also includes a receiving module configured to receive a third instruction; and to play a fourth audio content and a fifth audio content, the fourth audio content including a transitional prompt preceding the fifth audio content, and the fifth audio content including a response to the third instruction.
[0132] The modules included in device 600 can be implemented in various ways, including software, hardware, firmware, or any combination thereof. In some cases, one or more modules can be implemented using software and / or firmware, such as machine-executable instructions stored on a storage medium. In addition to or as an alternative to machine-executable instructions, some or all of the units in device 600 can be implemented at least partially by one or more hardware logic components. By way of example, and not limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard parts (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
[0133] Figure 7 A block diagram of an electronic device 700 in which one or more examples may be implemented is shown. It should be understood that... Figure 7 The electronic device 700 shown is merely exemplary and should not be construed as limiting the functionality and scope of the examples described herein. Figure 7 The illustrated electronic device 700 can be used to implement the electronic device 110 discussed above.
[0134] like Figure 7 As shown, electronic device 700 is in the form of a general-purpose electronic device. Components of electronic device 700 may include, but are not limited to, one or more processing units or processors 710, memory 720, storage devices 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760. Processor 710 may be a physical or virtual processor and is capable of performing various processes according to programs stored in memory 720. In a multiprocessor system, multiple processors execute computer-executable instructions in parallel to improve the parallel processing capability of electronic device 700.
[0135] Electronic device 700 typically includes multiple computer storage media. Such media can be any available media accessible to electronic device 700, including but not limited to volatile and non-volatile media, and removable and non-removable media. Memory 720 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 730 can be removable or non-removable media and can include machine-readable media, such as flash drives, disks, or any other media that can be used to store information and/or data and can be accessed within electronic device 700.
[0136] Electronic device 700 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in Figure 7, a disk drive for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk can be provided. In these cases, each drive can be connected to a bus (not shown) via one or more data media interfaces. Memory 720 may include computer program product 725 having one or more program modules configured to perform various methods or actions of various examples.
[0137] The communication unit 740 enables communication with other electronic devices via a communication medium. Additionally, the functionality of the components of the electronic device 700 can be implemented using a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, the electronic device 700 can operate in a networked environment using logical connections to one or more other servers, networked personal computers, or another network node.
[0138] Input device 750 can be one or more input devices, such as a mouse, keyboard, trackball, etc. Output device 760 can be one or more output devices, such as a monitor, speaker, printer, etc. As needed, electronic device 700 can also communicate via communication unit 740 with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with electronic device 700, or with any device (e.g., a network card, a modem, etc.) that enables electronic device 700 to communicate with one or more other electronic devices. Such communication can be performed via an input/output (I/O) interface (not shown).
[0139] A computer-readable storage medium is provided that stores computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the methods described above. A computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions that, when executed by a processor, implement the methods described above.
[0140] Various aspects are described herein with reference to flowcharts and/or block diagrams of the methods, apparatuses, devices, and computer program products. It should be understood that each block of the flowcharts and/or block diagrams, as well as combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
[0141] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions/actions specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium and cause a computer, programmable data processing apparatus, and/or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
[0142] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of a flowchart and/or block diagram.
[0143] The flowcharts and block diagrams in the accompanying figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various examples. In this respect, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.
[0144] Various examples have been described above. The foregoing descriptions are exemplary and not exhaustive, nor are they limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles of the implementations, their practical applications, or their improvements over technologies in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.
Claims
1. A method for interface interaction, comprising: presenting a first interface that indicates an ongoing call between a user and an intelligent system; and playing first audio content of the call and presenting text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
2. The method according to claim 1, further comprising: presenting a change in the media content during playback of the first audio content.
3. The method of claim 2, wherein the change in the media content is related to the first audio content.
4. The method according to claim 3, wherein presenting the change in the media content comprises: in response to playing a first portion of the first audio content, presenting prompt information on the first interface, the prompt information being related to the descriptive content.
5. The method according to claim 4, wherein presenting the prompt information on the first interface comprises: presenting the prompt information on the first interface, the prompt information being used to annotate the media content; or presenting the prompt information on the first interface, the prompt information including a supplementary description associated with the media content.
6. The method of claim 3, wherein the media content includes a first element presented in a first style, and presenting the change in the media content comprises: in response to playing a second portion of the first audio content, presenting the first element of the media content in a second style, the second style being different from the first style, and the first element being related to the second portion.
7. The method according to claim 1, further comprising: receiving a first instruction related to the media content; and presenting a second interface while maintaining the call with the intelligent system, the second interface being used to present the media content or related information of the media content.
8. The method according to claim 7, further comprising: stopping playback of the first audio content and playing second audio content, the second audio content being associated with interface content of the second interface.
9. The method of claim 8, wherein playing the second audio content comprises: receiving a second instruction during presentation of the second interface; and playing the second audio content.
10. The method of claim 9, wherein the media content is associated with a product object, the interface content includes product information of the product object, and playing the second audio content comprises: in response to receiving the second instruction, playing the second audio content, the second audio content being related to the product information.
11. The method of claim 1, further comprising: receiving an interactive operation on the media content; and stopping playback of the first audio content and playing third audio content, the third audio content being associated with an operation state of the media content.
12. The method of claim 11, wherein the interactive operation indicates adding a second element related to the media content, and the third audio content is related to the second element.
13. The method of claim 11, wherein the interactive operation indicates selecting a third element of the media content, and the third audio content is associated with the third element.
14. The method according to claim 1, wherein presenting the text content and the media content on the first interface comprises: presenting the media content in a preset area of the first interface, and presenting the text content outside the preset area.
15. The method of claim 14, further comprising: moving the text content within the first interface while keeping the media content presented in the preset area.
16. The method of claim 1, wherein the first audio content includes at least a first audio segment and a second audio segment, the first audio segment corresponding to the text content and the second audio segment corresponding to the descriptive content.
17. The method of claim 1, further comprising: receiving a third instruction; and playing fourth audio content and fifth audio content, the fourth audio content including a transitional prompt preceding the fifth audio content, and the fifth audio content including a response to the third instruction.
18. A device for user interface interaction, comprising: a presentation module configured to present a first interface that indicates an ongoing call between a user and an intelligent system; and a playback module configured to play first audio content of the call and present text content and media content on the first interface, wherein at least a portion of the first audio content matches the text content, and the first audio content includes descriptive content about the media content.
19. An electronic device comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions causing the electronic device to perform the method according to any one of claims 1 to 17 when executed by the at least one processor.
20. A computer program product tangibly stored in a computer storage medium and comprising computer-executable instructions that, when executed by a device, cause the device to perform the method according to any one of claims 1 to 17.