Application interaction method, apparatus, device, and storage medium
By receiving multimodal information, identifying user intent, and matching it to the target business module, the system dynamically displays functional components, solving the problem of cumbersome operations in terminal applications and improving application usability and user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING XIAOMI MOBILE SOFTWARE CO LTD
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-16
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computer technology, and in particular to an application interaction method, apparatus, device, and storage medium.
Background Technology
[0002] With the development of computer technology, applications on terminals have become increasingly diverse, greatly facilitating users' daily lives and work. Correspondingly, users have higher and higher demands for the functionality of these applications, and their reliance on them is becoming stronger, expecting applications to provide more efficient, convenient, intelligent, and personalized services.
[0003] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0004] The purpose of this disclosure is to provide an application interaction method, apparatus, device, and storage medium.
[0005] According to a first aspect of the present disclosure, an application interaction method is provided, comprising: receiving first multimodal information input to an application, determining the intent of the first multimodal information; and, in response to the intent matching a target business module in the application, displaying a functional component associated with the target business module in an application page of the application.
[0006] In some implementations, displaying functional components associated with the target business module on the application page of the application includes: determining data to be displayed for at least one function in the target business module; and displaying at least one functional component containing the corresponding data to be displayed on the application page.
[0007] In some implementations, determining the data to be displayed for at least one function in the target business module includes: acquiring pre-stored business data for at least one function in the target business module; and determining the data to be displayed for the corresponding function based on the pre-stored business data.
[0008] In some implementations, determining the data to be displayed for at least one function in the target business module includes: in response to the existence of an execution requirement in the intent, determining a target function that matches the execution requirement among the at least one function; executing the target function to obtain the data to be displayed for the target function.
[0009] In some implementations, executing the target function and obtaining the data to be displayed for the target function includes: acquiring input parameters required to execute the target function; determining parameter values corresponding to each input parameter; constructing an instruction to execute the target function based on the parameter values; and executing the instruction to run the target function and obtain the data to be displayed for the target function.
[0010] In some implementations, determining the parameter values corresponding to each input parameter includes: parsing the parameter values corresponding to the input parameters from the first multimodal information; and/or receiving second multimodal information through a functional component corresponding to the target function, and parsing the second multimodal information to obtain the parameter values corresponding to the input parameters; and/or determining a pre-processing function whose output result is the input parameters, executing the pre-processing function to obtain a pre-processing result, and determining the parameter values corresponding to the input parameters based on the pre-processing result.
[0011] In some implementations, the data to be displayed indicates a display mode, which is used to determine the display size of the functional component.
[0012] In some implementations, the first multimodal information includes at least one of the following: voice modal information, image modal information, text modal information, gesture modal information, and facial expression modal information.
[0013] In some implementations, determining the intent of the first multimodal information includes: determining the modality type involved in the first multimodal information; invoking an understanding model corresponding to the modality type to process the first multimodal information and obtain the intent of the first multimodal information.
[0014] In some implementations, the application interaction method further includes: determining that the intent matches the target business module in response to the intent matching the business description keywords of the target business module.
[0015] According to a second aspect of the present disclosure, an application interaction device is provided, comprising: an intent determination unit, configured to receive first multimodal information input to an application and determine the intent of the first multimodal information; and a display unit, configured to, in response to a match between the intent and a target business module in the application, display functional components associated with the target business module in the application's application page.
[0016] In some embodiments, the application interaction device further includes a data determination unit, which is used to determine the data to be displayed for at least one function in the target business module; the display unit is also used to display at least one functional component containing the corresponding data to be displayed on the application page.
[0017] In some implementations, the data determining unit determines the data to be displayed for at least one function in the target business module, including: acquiring pre-stored business data for at least one function in the target business module; and determining the data to be displayed for the corresponding function based on the pre-stored business data.
[0018] In some implementations, the data determination unit determines the data to be displayed for at least one function in the target business module, including: in response to the existence of an execution requirement in the intent, determining a target function that matches the execution requirement among the at least one function; executing the target function to obtain the data to be displayed for the target function.
[0019] In some implementations, the data determination unit performs the target function by: acquiring input parameters required to perform the target function; determining parameter values corresponding to each input parameter; constructing an instruction to perform the target function based on the parameter values; and executing the instruction to run the target function.
[0020] In some implementations, the data determination unit determines the parameter values corresponding to each input parameter by: parsing the parameter values corresponding to the input parameters from the first multimodal information; and/or receiving second multimodal information through a functional component corresponding to the target function, and parsing the second multimodal information to obtain the parameter values corresponding to the input parameters; and/or determining a pre-processing function whose output result is the input parameters, executing the pre-processing function to obtain a pre-processing result, and determining the parameter values corresponding to the input parameters based on the pre-processing result.
[0021] In some implementations, the data to be displayed indicates a display mode, which is used to determine the display size of the functional component.
[0022] In some implementations, the first multimodal information includes at least one of the following: voice modal information, image modal information, text modal information, gesture modal information, and facial expression modal information.
[0023] In some implementations, the intent determination unit determines the intent of the first multimodal information by: determining the modality type involved in the first multimodal information; and calling an understanding model corresponding to the modality type to process the first multimodal information to obtain the intent of the first multimodal information.
[0024] In some implementations, the display unit is further configured to: determine that the intent matches the target business module in response to the intent matching the business description keywords of the target business module.
[0025] According to a third aspect of the present disclosure, an electronic device is provided, characterized in that it includes: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described application interaction method.
[0026] According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, wherein when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute an application interaction method, the method comprising: receiving first multimodal information input to an application; determining an intent of the first multimodal information; and, in response to a match between the intent and a target business module in the application, displaying a functional component associated with the target business module in an application page of the application.
[0027] According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the above-described application interaction method.
[0028] The technical solutions provided by the embodiments of this disclosure may include the following beneficial effects:
[0029] This disclosure enables the instant connection between user needs and application functions through intelligent intent recognition, business module matching, and dynamic presentation of functional components. It eliminates the tedious operation of users actively searching for various functional components on the page, allowing users to directly access the required functions through multimodal input, greatly simplifying the operation process and improving the usability and user experience of the application.
[0030] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.
Attached Figure Description
[0031] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0032] Figure 1 is a flowchart illustrating an application interaction method according to some embodiments of the present disclosure.
[0033] Figure 2 is a flowchart illustrating yet another application interaction method according to some embodiments of the present disclosure.
[0034] Figure 3 is a flowchart illustrating yet another application interaction method according to some embodiments of the present disclosure.
[0035] Figure 4 is a flowchart illustrating yet another application interaction method according to some embodiments of the present disclosure.
[0036] Figure 5 is a flowchart illustrating yet another application interaction method according to some embodiments of the present disclosure.
[0037] Figure 6 is a flowchart illustrating an application interaction method for performing a target function to obtain data to be displayed, according to some embodiments of the present disclosure.
[0038] Figure 7 is a schematic diagram illustrating an application interaction method according to some embodiments of the present disclosure.
[0039] Figure 8 is a block diagram illustrating an application interaction device according to some embodiments of the present disclosure.
[0040] Figure 9 is a block diagram illustrating an apparatus for application interaction according to some embodiments of the present disclosure.
Detailed Implementation
[0041] Exemplary embodiments of this disclosure will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. Various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to those orders set forth herein, but can be changed as will become apparent upon understanding this disclosure, except for operations that must be performed in a particular order. Furthermore, for clarity and brevity, descriptions of features known in the art may be omitted.
[0042] The embodiments described below, which are examples of some of the embodiments of this disclosure, do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.
[0043] The specific implementation methods of the embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.
[0044] Figure 1 is a flowchart illustrating an application interaction method according to some embodiments of the present disclosure. As shown in Figure 1, the application interaction method can be applied to electronic devices, including but not limited to terminal devices such as smartphones, smart tablets, wearable devices, desktop computers, laptops, and smart speakers, as well as server-side devices such as local servers and cloud servers, which can be deployed on a single computer or on a computer cluster consisting of multiple computers.
[0045] The application interaction method shown in Figure 1 may include the following steps.
[0046] In step S110, first multimodal information input to the application is received, and the intent of the first multimodal information is determined.
[0047] In this embodiment of the disclosure, the application can receive first multimodal information input by the user through a user interface. The first multimodal information may include one or more modalities, such as voice, text, images, gestures, and other forms of input information. The electronic device containing the application can accurately capture and parse this multimodal information, providing a basis for subsequent processing.
[0048] Technologies such as Natural Language Processing (NLP), speech recognition, and image recognition can be used to analyze and process the collected first multimodal information in order to determine the user's intent.
[0049] In this embodiment of the disclosure, the parsed intent may include identifying the operation the user wants to perform, the information they want to query, or the event they wish to achieve. The intent can be a specific command, a query request, an emotional expression, or any other purpose the user wants to achieve through the application. Accurate intent recognition is a prerequisite for subsequently providing personalized services and displaying related functional components, ensuring that the application can respond to user needs quickly and accurately.
[0050] In some embodiments of this disclosure, the first multimodal information includes at least one of the following: voice modal information, image modal information, text modal information, gesture modal information, and facial expression modal information.
[0051] In this embodiment, voice modal information can be voice input by the user through a microphone; when the user is on the move or cannot conveniently operate the device by hand, voice input provides a convenient interaction method. Image modal information can be images or videos captured by a camera, or pre-prepared images or videos transmitted through an image input interface. Text modal information can be text transmitted through a text input interface, such as a paragraph typed on a keyboard. Gesture modal information can be gesture operations performed on a touchscreen, or images or videos captured by a camera that contain the user's body movements (such as sign language); gesture input lets users convey information through hand or body movements. Facial expression modal information can be images or videos captured by a camera that contain the user's facial expressions; in fields such as sentiment analysis and human-computer interaction, facial expressions can convey rich emotional information, and processing facial expression modal information helps to understand the user's emotional state, thereby providing more personalized services.
[0052] In an exemplary embodiment, the first multimodal information input by the user may contain only one of the above modal information, or it may contain multiple modal information simultaneously.
[0053] For example, a user can input "How's the weather today?" as the first multimodal information, from which the intent can be determined; the intent may include "weather check" or "checking today's date." As another example, a user can input the text "Book me a flight to Beijing" together with image information (such as a scanned ID card for identity verification), from which the intent can be determined as "booking a flight" or "destination is Beijing." As yet another example, a user can input sign language gestures as the first multimodal information, and the intent can be determined using gesture recognition tools.
[0054] Through the embodiments of this disclosure, users can input multiple modal information into the application to express their needs, thereby satisfying users' interaction with the application in various scenarios, improving the flexibility and convenience of using the application, and providing users with a smoother and more personalized user experience.
[0055] In step S120, in response to the intent matching the target business module in the application, the functional components associated with the target business module are displayed on the application page of the application.
[0056] In this embodiment of the disclosure, the application may maintain one or more business modules, each providing at least one function within its corresponding business domain. Each business module may be configured with business description information to reflect its service scope, function name, etc. If an intent matches the business description information of a business module, that business module can be identified as a target business module. The number of matched target business modules can be one or more.
[0057] For example, if the intent is "weather query", the target business module could be a weather query module; if the intent is "search for cups", the target business module could be a product browsing and search module; if the intent is "buy cups", the target business module could include a product browsing and search module and a checkout module; if the intent is "book a flight" or "destination is Beijing", the target business module could be a travel booking module.
[0058] In an exemplary embodiment, the various functions can be pre-decomposed based on the services provided by the application, and the services provided by the application can be divided into at least one business module. These functions are then categorized into different business modules based on the business scope applied to each function, the dependencies between functions, and so on.
[0059] For example, the business modules of a shopping app may include: a product browsing and search module, a product details and review module, a checkout module, an order management and after-sales module, and a user center module. Taking the checkout module as an example, its functions may include shopping cart management, product checkout process, and discount calculation; taking the user center module as an example, its functions may include personal information management, points and membership system, favorites, and browsing history.
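As a non-limiting illustration, the division of an application into business modules and functions might be represented as follows. This is a minimal sketch only: the class names `BusinessModule` and `Function`, the helper `build_shopping_modules`, and the listed description keywords are hypothetical and are not specified by this disclosure. Only the module and function names are taken from the shopping app example above.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """One function offered inside a business module (hypothetical shape)."""
    name: str


@dataclass
class BusinessModule:
    """A business module with its description keywords and its functions."""
    name: str
    description_keywords: list[str]
    functions: list[Function] = field(default_factory=list)


def build_shopping_modules() -> list[BusinessModule]:
    """Build two of the shopping-app modules from the example above.

    The keyword lists are illustrative assumptions, not part of the disclosure.
    """
    checkout = BusinessModule(
        name="checkout",
        description_keywords=["buy", "checkout", "cart", "discount"],
        functions=[
            Function("shopping_cart_management"),
            Function("product_checkout"),
            Function("discount_calculation"),
        ],
    )
    user_center = BusinessModule(
        name="user_center",
        description_keywords=["profile", "points", "membership", "favorites"],
        functions=[
            Function("personal_information_management"),
            Function("points_and_membership"),
            Function("favorites"),
            Function("browsing_history"),
        ],
    )
    return [checkout, user_center]
```

Keeping the description keywords as data on the module, rather than in matching code, is one way the service scope of a module could be adjusted without touching the matching logic.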
[0060] In this embodiment, the functional component corresponds to a function, and includes a UI (User Interface) design. The UI design allows the functional component to be displayed on the application page, ensuring that users can intuitively and conveniently perform the required operations. A functional component may contain UI elements such as text boxes, buttons, images, and charts. For example, a search function may have a corresponding search bar UI, a product details display function may have a details page display interface, a shopping cart function may have a shopping cart interface, an order management function may have an order list page, and so on.
[0061] In this embodiment of the disclosure, after the target business module is determined, the application can directly display the functional components associated with the target business module on its application page. The number of functional components associated with the target business module can be one or more, and these components can be displayed in combination on the application's application page.
[0062] The functional components can display information corresponding to the above intents, and can also provide interactive controls (such as input trigger buttons, input boxes, drop-down menus, etc.). After the functional components are displayed, users can perform further interactive operations based on the functional components (such as clicking, swiping, inputting, etc.) to execute corresponding business logic or update the interface state according to the user's operation.
[0063] Through the embodiments of this disclosure, users can directly access the functions they are interested in or need by simply using multimodal input, thereby improving the usability of the application and the user experience.
[0064] In an exemplary embodiment, after displaying the functional components associated with the target business module, feedback, such as auditory or tactile feedback, can be provided to the user to indicate that the required functional components have been loaded.
[0065] As can be seen from the above steps, the application interaction method provided in this disclosure can accurately capture diverse user input information (i.e., first multimodal information), quickly parse user intent, and intelligently match it to the target business module, subsequently dynamically displaying the relevant functional components of the target business module. It is evident that this method, through intelligent intent recognition, business module matching, and dynamic presentation of functional components, achieves instant connection between user needs and application functions, eliminating the tedious operation of users actively searching for various functional components on the page. Users can directly access the required functions simply through multimodal input, greatly simplifying the operation process and thus improving application usability and user experience.
[0066] In some embodiments of this disclosure, the application interaction method further includes: determining that the intent matches the target business module in response to the intent matching the business description keywords of the target business module.
[0067] In this embodiment of the disclosure, the business description information of the business module may include business description keywords. These keywords are used to describe the main functions or business scope of the business module in text, so as to semantically match with the user's intent. If the intent matches the business description keywords of one or more business modules, it is considered that the intent can be processed by one or more business modules, that is, one or more business modules can be identified as target business modules.
[0068] In an exemplary embodiment, the intent and business description keywords can be converted into vectors, and their similarity can be measured using methods such as cosine similarity or Euclidean distance. If the similarity exceeds a threshold, they are considered a match. Alternatively, predefined word classification rules can be used to determine whether the intent and business description keywords belong to the same category; if they do, they are considered a match. Alternatively, a classifier or regression model can be trained using a pre-labeled dataset of intents and business descriptions to determine whether the intent and business description keywords match.
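The vector-based variant described above could, for instance, be sketched as follows. This is an illustration under stated assumptions: a bag-of-words count vector stands in for whatever vectorization an implementation actually uses, and the function names, the sample keywords, and the threshold value of 0.3 are hypothetical.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """Stand-in for an embedding: lowercase bag-of-words counts (assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_modules(
    intent: str, module_keywords: dict[str, str], threshold: float = 0.3
) -> list[str]:
    """Return the modules whose keyword similarity exceeds the threshold.

    The threshold default is an arbitrary illustrative value.
    """
    intent_vec = to_vector(intent)
    return [
        name
        for name, keywords in module_keywords.items()
        if cosine_similarity(intent_vec, to_vector(keywords)) > threshold
    ]


# Hypothetical business description keywords for two modules.
MODULE_KEYWORDS = {
    "weather": "weather query forecast",
    "travel": "book flight ticket travel",
}
```

With these sample keywords, `match_modules("weather query", MODULE_KEYWORDS)` selects only the weather module, and an intent with no overlapping terms yields an empty list, which corresponds to the case where no business module matches.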
[0069] In an exemplary embodiment, multiple methods can be combined to determine whether the intent matches the business description keywords, thereby improving the accuracy and robustness of the matching.
[0070] Furthermore, if the intent does not match the business description keywords of each business module, it means that the application cannot meet the needs expressed by the first multimodal information. In this case, an error message can be displayed to the user or the user can be prompted to re-enter the information. Alternatively, the user can be shown all the business modules that the application can provide, so that the user can quickly understand the scope of the current application's business services.
[0071] This disclosure introduces a matching step between business description keywords and intents, enabling the application to more accurately identify user intents and associate them with the correct business modules, quickly displaying the required functional components and thus improving interaction efficiency and user satisfaction. Furthermore, when the service scope of a business module changes, the service scope can be adjusted simply by modifying the business description keywords, without requiring extensive code modifications, thereby improving the maintainability of the application.
[0072] In some embodiments of this disclosure, determining the intent of the first multimodal information includes: determining the modality type involved in the first multimodal information; invoking an understanding model corresponding to the modality type to process the first multimodal information and obtain the intent of the first multimodal information.
[0073] In this embodiment of the disclosure, the modality of the user input information can be identified by analyzing the format and characteristics of the input data (i.e., the first multimodal information). For example, it can be determined whether it is a speech modality by detecting whether there is an audio signal, or whether it is an image modality by detecting image data.
[0074] Once the modality type is determined, the corresponding understanding model can be invoked to process the input information and obtain one or more hypotheses or predictions about the user's intent. These intents can be represented in text form or in a predefined data structure (such as an intent tree or intent vector).
[0075] For example, for speech modal information, a speech recognition model can be used to convert it into text, and then a natural language processing model can be used to parse and understand the text content; for image modal information, an image recognition model can be used to identify objects or scenes in an image, or to identify text, actions, etc. in an image; for text modal information, a natural language processing model can be used to parse and understand the text content.
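One possible way to dispatch the input to an understanding model by modality type is sketched below. The payload format, the function names, and the keyword rules are all hypothetical placeholders; real speech, image, and natural language models are replaced here by trivial stubs so that the dispatch structure itself can be shown.

```python
from typing import Callable


def detect_modality(payload: dict) -> str:
    """Determine the modality type from the field the input carries.

    Assumes a payload with exactly one of the keys below (illustrative format).
    """
    for modality in ("voice", "image", "text"):
        if modality in payload:
            return modality
    raise ValueError("unsupported modality")


def understand_text(text: str) -> str:
    """Placeholder for a natural language model: simple keyword rules."""
    lowered = text.lower()
    if "weather" in lowered:
        return "weather query"
    if "flight" in lowered:
        return "book a flight"
    return "unknown"


def understand_voice(transcript: str) -> str:
    """Placeholder for speech recognition followed by text understanding.

    The 'audio' is assumed to be already transcribed in this sketch.
    """
    return understand_text(transcript)


def understand_image(labels: list[str]) -> str:
    """Placeholder for image recognition: recognized labels are parsed as text."""
    return understand_text(" ".join(labels))


# Maps each modality type to its understanding model (hypothetical registry).
UNDERSTANDING_MODELS: dict[str, Callable] = {
    "voice": understand_voice,
    "image": understand_image,
    "text": understand_text,
}


def determine_intent(payload: dict) -> str:
    """Identify the modality, then invoke the corresponding model."""
    modality = detect_modality(payload)
    return UNDERSTANDING_MODELS[modality](payload[modality])
```

The sketch handles a single modality per input; an input that carries several modalities at once would need the per-modality results to be combined, which is not shown here.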
[0076] Through the embodiments of this disclosure, the type of multimodal information can be identified first, and then the corresponding understanding model can be called to process this information, thereby more accurately understanding the user's intent and providing the user with more accurate services.
[0077] Figure 2 is a flowchart illustrating yet another application interaction method according to some embodiments of this disclosure. As shown in Figure 2, in some embodiments of this disclosure, the application interaction method may include the following steps.
[0078] Step S210: Receive first multimodal information input to the application and determine the modality type involved in the first multimodal information.
[0079] Step S220: Invoke the understanding model corresponding to the modality type to process the first multimodal information and obtain the intent of the first multimodal information.
[0080] Step S230: In response to the intent matching the target business module in the application, display the functional components associated with the target business module in the application page of the application.
[0081] The specific implementation methods for each of the above steps have been described in detail in the embodiments of the method, and will not be elaborated here.
[0082] In some embodiments of this disclosure, displaying functional components associated with the target business module on the application page of the application may include: determining data to be displayed for at least one function in the target business module; and displaying at least one functional component containing the corresponding data to be displayed on the application page.
[0083] In this embodiment of the disclosure, it can be first determined which functions in the target business module need to be displayed to the user, and then functional components corresponding to these functions can be created and displayed on the application page to display this data.
[0084] In an exemplary embodiment, at least one function in the target business module can be pre-defined as a frequently displayed function, and then the functional components of the frequently displayed function can be displayed. Alternatively, a certain number of functions to be displayed can be selected according to preset rules, and the functional components of these functions can be displayed.
[0085] For example, based on historical user data of the application, the frequency or duration of user usage of all functions in the target business module can be determined. These functions can then be sorted according to frequency or duration, and a certain number of functions at the top of the sorted list can be selected for display. Alternatively, the relevance between each function in the target business module and the parsed intent can be determined, and the functions can be sorted according to relevance, with a certain number of functions at the top of the sorted list selected for display. Another approach is to identify the target function as indicated in the intent, and then select the target function and any functions with business connections to it (such as dependencies) as functions to be displayed.
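Two of the selection rules above, ranking by usage frequency and ranking by relevance to the intent, might be sketched as follows. The function names, the sample usage counts, and the use of token overlap as a relevance measure are assumptions made for illustration only.

```python
def select_by_usage(usage_counts: dict[str, int], limit: int) -> list[str]:
    """Sort functions by historical usage count and keep the top `limit`."""
    ranked = sorted(usage_counts, key=lambda name: usage_counts[name], reverse=True)
    return ranked[:limit]


def select_by_relevance(
    intent: str, function_descriptions: dict[str, str], limit: int
) -> list[str]:
    """Sort functions by relevance to the intent and keep the top `limit`.

    Relevance is approximated by the number of shared lowercase tokens,
    which is an illustrative stand-in for a real relevance measure.
    Functions with no shared token are dropped.
    """
    intent_tokens = set(intent.lower().split())

    def overlap(name: str) -> int:
        return len(intent_tokens & set(function_descriptions[name].lower().split()))

    ranked = sorted(function_descriptions, key=overlap, reverse=True)
    return [name for name in ranked[:limit] if overlap(name) > 0]


# Hypothetical historical data and descriptions for the checkout module.
USAGE = {"shopping_cart_management": 12, "product_checkout": 30, "discount_calculation": 5}
DESCRIPTIONS = {
    "shopping_cart_management": "manage shopping cart items",
    "product_checkout": "product checkout payment",
    "discount_calculation": "apply discount coupon",
}
```

For example, `select_by_usage(USAGE, 2)` keeps the two most used functions, while `select_by_relevance("apply discount at checkout", DESCRIPTIONS, 2)` ranks the discount function first because it shares the most terms with the intent.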
[0086] As can be seen, when different intents are identified, different functional components can be dynamically combined and presented on the application page, so that after each input of first multimodal information the user can directly access the functions that meet the user's needs.
[0087] In this embodiment of the disclosure, the data to be displayed can be obtained from the application's local database, a remote server, user input, or data output from other applications. Specifically, the data to be displayed for each function can be determined based on factors such as user intent, business logic between functions, and data availability or real-time requirements. For example, if the user's intent is to "display a product search page," the product search page can be displayed in the default manner; if the user's intent is to "search for cups," the "search for cups" function can be executed first, followed by displaying a product search page containing cups.
[0088] The embodiments disclosed herein specify how to determine and display the functional components associated with the target business module together with their data, providing a more dynamic, real-time, and user-friendly application interface.
[0089] Figure 3 is a flowchart illustrating yet another application interaction method according to some embodiments of this disclosure. As shown in Figure 3, in some embodiments of this disclosure, the application interaction method may include the following steps.
[0090] Step S310: Receive first multimodal information input to the application and determine the intent of the first multimodal information.
[0091] Step S320: In response to the intent matching a target business module in the application, determine the data to be displayed for at least one function in the target business module.
[0092] Step S330: Display at least one functional component containing the corresponding data to be displayed on the application page.
[0093] The specific implementation methods for each of the above steps have been described in detail in the embodiments of the method, and will not be elaborated here.
[0094] In some embodiments of this disclosure, the data to be displayed indicates a display mode, which is used to determine the display size of the functional component.
[0095] In this embodiment of the disclosure, multiple display modes can be pre-designed, such as a display mode for detailed display, a display mode for small window display, a display mode for thumbnail display, etc. Different display modes of different functional components can have different display sizes.
[0096] In an exemplary embodiment, the type of display mode can be determined based on the correlation between the corresponding function and the intent. For example, for a function that the intent points to, the corresponding functional component can use a detailed display mode; for a frequently displayed function that the intent does not point to, the corresponding functional component can use a minimized display mode.
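One way to picture the relationship between intent, display mode, and display size is the sketch below. It is illustrative only: the `DisplayMode` names, the grid-cell sizes, and the mapping of the "minimized" case to a thumbnail are assumptions, since the disclosure gives no concrete values.

```python
from enum import Enum


class DisplayMode(Enum):
    DETAILED = "detailed"
    SMALL_WINDOW = "small_window"
    THUMBNAIL = "thumbnail"


# Hypothetical sizes in grid cells (columns, rows); not specified in the disclosure.
DISPLAY_SIZE = {
    DisplayMode.DETAILED: (4, 4),
    DisplayMode.SMALL_WINDOW: (2, 2),
    DisplayMode.THUMBNAIL: (1, 1),
}


def choose_display_mode(function_name, intent_targets):
    """Functions the intent points to get the detailed mode; others are minimized."""
    if function_name in intent_targets:
        return DisplayMode.DETAILED
    return DisplayMode.THUMBNAIL  # assumption: "minimized" is rendered as a thumbnail


def build_display_data(function_name, intent_targets):
    """Attach the display mode (and the size derived from it) to the data to be displayed."""
    mode = choose_display_mode(function_name, intent_targets)
    return {
        "function": function_name,
        "display_mode": mode.value,
        "size": DISPLAY_SIZE[mode],
    }


print(build_display_data("search", {"search"}))
print(build_display_data("cart", {"search"}))
```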
[0097] Through the embodiments of this disclosure, functional components can have different display modes, enabling flexible control over the display method of functional components, thereby providing users with a more personalized interface experience.
[0098] In some embodiments of this disclosure, determining the data to be displayed for at least one function in the target business module includes: acquiring pre-stored business data for at least one function in the target business module; and determining the data to be displayed for the corresponding function based on the pre-stored business data.
[0099] In this embodiment, the pre-stored business data can be data stored in advance within the target business module for corresponding functions, used to support business logic or function implementation. The pre-stored business data can be stored in a terminal device or a cloud server. After obtaining the pre-stored business data, it can be filtered and extracted according to business logic or the parsed intent, ensuring that the extracted data meets display requirements.
[0100] For example, weather forecast information obtained periodically from the meteorological center can be used as pre-stored business data for the weather query function. In this way, when users want to use the application's weather query function, they can directly obtain the pre-stored weather forecast information for display.
[0101] For example, if a user adds items to their shopping cart in their past operations, these added items can be stored as pre-stored business data for the shopping cart function. When the user activates the shopping cart function, these added items can be retrieved and displayed.
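The shopping cart example can be sketched as a lookup followed by an optional intent-based filter. The in-memory `PRE_STORED` dictionary, its record fields, and the `data_to_display` helper are hypothetical stand-ins for whatever terminal-side or cloud-side storage an implementation would actually use.

```python
# Hypothetical pre-stored business data, keyed by function name.
PRE_STORED = {
    "cart": [
        {"item": "ceramic cup", "category": "cups", "qty": 2},
        {"item": "wool hat", "category": "hats", "qty": 1},
    ],
}


def data_to_display(function_name, intent_filter=None, store=PRE_STORED):
    """Fetch pre-stored records for a function and optionally filter them by intent."""
    records = store.get(function_name, [])
    if intent_filter is None:
        return list(records)  # no filter: show everything that was stored
    return [r for r in records if r["category"] == intent_filter]


print(data_to_display("cart"))
print(data_to_display("cart", intent_filter="hats"))
```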
[0102] This disclosure provides a method for determining the data to be displayed to achieve accurate display and effective delivery of the information required by the user.
[0103] Figure 4 is a flowchart illustrating yet another application interaction method according to some embodiments of this disclosure. As shown in Figure 4, in some embodiments of this disclosure, the application interaction method may include the following steps.
[0104] Step S410: Receive first multimodal information input to the application and determine the intent of the first multimodal information.
[0105] Step S420: In response to the intent matching a target business module in the application, obtain pre-stored business data for at least one function in the target business module.
[0106] Step S430: Determine the data to be displayed for the corresponding function based on the pre-stored business data.
[0107] Step S440: Display at least one functional component containing the corresponding data to be displayed on the application page.
[0108] The specific implementation methods for each of the above steps have been described in detail in the embodiments of the method, and will not be elaborated here.
[0109] In some embodiments of this disclosure, determining the data to be displayed for at least one function in the target business module includes: in response to the existence of an execution requirement in the intent, determining a target function that matches the execution requirement among the at least one function; executing the target function to obtain the data to be displayed for the target function.
[0110] In this embodiment of the disclosure, the execution requirement in the intent can be a function that the user expresses they wish to perform, or a specific goal or requirement that the user expresses they wish to achieve through a certain action or operation. For example, if a user says "I want to buy a hat," the corresponding intent is not limited to displaying a search page or a checkout page, but can include the following execution requirements: searching for hat-type products, displaying search results, allowing the user to select products and add them to their shopping cart, processing payment, etc.
[0111] Since the target business module matches the aforementioned intent, if there is an execution requirement in the intent, then there is at least one function in the target business module that can match that execution requirement.
[0112] In an exemplary embodiment, function description information can be pre-configured for the functions in the business module. This description information may include the function's name, executable operations, specific services provided, and the corresponding sub-business type. Execution requirements can be compared with the function description information of each function, and the function corresponding to the function description information that successfully matches the execution requirements can be identified as the target function.
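A minimal sketch of comparing an execution requirement against function description information is shown below. The description schema (`operations`, `services`) and the keyword-overlap rule are assumptions chosen for brevity; the disclosure does not prescribe a particular matching algorithm.

```python
# Hypothetical function description information for a ticketing business module.
FUNCTION_DESCRIPTIONS = {
    "select_ticket": {"operations": {"select", "choose", "find"}, "services": {"ticket"}},
    "purchase_ticket": {"operations": {"purchase", "buy", "pay"}, "services": {"ticket"}},
    "view_orders": {"operations": {"view", "list"}, "services": {"order"}},
}


def match_target_functions(requirement_words, descriptions=FUNCTION_DESCRIPTIONS):
    """Return functions whose description shares both an operation and a service
    with the words of the execution requirement."""
    words = set(requirement_words)
    matches = []
    for name, desc in descriptions.items():
        if words & desc["operations"] and words & desc["services"]:
            matches.append(name)
    return matches


print(match_target_functions(["buy", "ticket"]))
print(match_target_functions(["refund"]))
```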
[0113] Once the target function is identified, it can be invoked for execution. The execution process may include calling functions, methods, or APIs, passing necessary parameters, or passing context information.
[0114] After the target function is executed, the data to be displayed can be determined based on the results. The data to be displayed can be used to generate content on the user interface, such as updating text fields, rendering images, and drawing charts.
[0115] For example, if a user says "Help me book a ticket for Ski Resort A next Monday", then after recognizing that the user's intent is to book tickets, the application can further recognize that the intent includes execution requirements for the "select ticket" function and the "purchase ticket" function. The "select ticket" function can then be executed directly to determine a ski resort ticket with the time "next Monday" and the location "Ski Resort A", after which the "purchase ticket" function can be executed to settle payment for that ticket on the user's behalf.
[0116] Through the embodiments of this disclosure, when a user's intent includes an execution requirement, this method can directly execute the target function that matches the execution requirement, and display the result obtained after the target function is executed as the data to be displayed, so that the user can directly see the result after the function is executed, further simplifying the user's operation process and improving the user experience and interaction efficiency.
[0117] Figure 5 is a flowchart illustrating yet another application interaction method according to some embodiments of this disclosure. As shown in Figure 5, in some embodiments of this disclosure, the application interaction method may include the following steps.
[0118] Step S510: Receive first multimodal information input to the application and determine the intent of the first multimodal information.
[0119] Step S520: In response to the intent matching the target business module in the application and an execution requirement existing in the intent, determine, among at least one function of the target business module, the target function that matches the execution requirement.
[0120] Step S530: Execute the target function to obtain the data to be displayed for the target function.
[0121] Step S540: Display at least one functional component containing the corresponding data to be displayed on the application page.
[0122] The specific implementation methods for each of the above steps have been described in detail in the embodiments of the method, and will not be elaborated here.
[0123] Figure 6 is a flowchart illustrating the execution of a target function to obtain data to be displayed in an application interaction method according to some embodiments of this disclosure. As shown in Figure 6, in some embodiments of this disclosure, executing the target function and obtaining the data to be displayed for the target function may include the following steps.
[0124] Step S610: Obtain the input parameters required to perform the target function.
[0125] In this embodiment of the disclosure, input parameter requirements, output results, and other content can be pre-configured for the function. The input parameter requirements can indicate the necessary information (i.e., input parameters) required for the function to run and the format of the information.
[0126] Step S620: Determine the parameter values corresponding to each input parameter, and construct an instruction to execute the target function based on the parameter values.
[0127] In this embodiment of the disclosure, for each input parameter, its corresponding parameter value can be clearly defined. The parameter value can come from user input, preset configuration, output results of other functions, etc. An instruction for executing the target function can be constructed based on the determined input parameters and their parameter values. This instruction can be a function call, system command, API request, etc., and its specific form can be determined according to the implementation method and operating environment of the target function.
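Steps S610 and S620 can be sketched as validating parameter values against pre-configured input parameter requirements and packaging them into an instruction. The `ParamSpec` type, the `WEATHER_QUERY_SPEC` list, and the dictionary-shaped instruction are hypothetical; a real instruction could equally be a function call, system command, or API request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    name: str
    type_: type
    required: bool = True


# Hypothetical input parameter requirements for a weather query function.
WEATHER_QUERY_SPEC = [
    ParamSpec("time", str),
    ParamSpec("location", str),
    ParamSpec("units", str, required=False),
]


def build_instruction(function_name, spec, values):
    """Check values against the input parameter requirements and build an instruction."""
    missing = [p.name for p in spec if p.required and p.name not in values]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    args = {}
    for p in spec:
        if p.name in values:
            if not isinstance(values[p.name], p.type_):
                raise TypeError(f"{p.name} must be {p.type_.__name__}")
            args[p.name] = values[p.name]
    return {"function": function_name, "args": args}


instruction = build_instruction(
    "weather_query", WEATHER_QUERY_SPEC, {"time": "tomorrow", "location": "Shanghai"}
)
print(instruction)
```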
[0128] In some embodiments of this disclosure, the step of determining the parameter value corresponding to each input parameter may include: parsing the parameter value corresponding to the input parameter from the first multimodal information; and/or receiving second multimodal information through a functional component corresponding to the target function, and parsing the second multimodal information to obtain the parameter value corresponding to the input parameter; and/or determining a pre-processing function whose output result is the input parameter, executing the pre-processing function to obtain a pre-processing result, and determining the parameter value corresponding to the input parameter based on the pre-processing result.
[0129] In this embodiment of the disclosure, the parameter value corresponding to the input parameter can be determined in different ways under different circumstances.
[0130] In an exemplary embodiment, if the intent parsed from the first multimodal information contains information corresponding to the input parameters, the parameter values corresponding to the input parameters can be extracted from it. Specifically, the relevant information in the intent can be converted into parameter values for constructing instructions according to pre-configured input parameter requirements. For example, if a user inputs "I want to know the weather in Shanghai tomorrow" via voice, the parameter values "tomorrow" and "Shanghai" can be parsed from this voice input, corresponding to the time parameter and the location parameter, respectively.
[0131] In an exemplary embodiment, if a user indicates that a target function needs to be performed but has not provided complete input parameter information in the first multimodal information, the user can first be shown the initial functional components of the target function. These initial functional components may include a parameter input module to receive the second multimodal information and determine the parameter values not provided in the first multimodal information. Similar to the first multimodal information, the second multimodal information may also include information in one or more modalities, such as voice, text, images, gestures, and other forms of information.
[0132] In an exemplary embodiment, if the target function depends on the output of other functions, the prerequisite functions upon which the target function depends (the pre-processing functions described above) can be executed first to obtain the parameter values. For example, in a data analysis process, the data preprocessing function is a prerequisite function for the data analysis function, and the input parameter values for the data analysis function need to be obtained by executing the data preprocessing function. This can be achieved by first displaying the functional components of the prerequisite function, receiving user interaction with the prerequisite function, executing the prerequisite function, and obtaining the prerequisite results generated after its execution.
[0133] In this embodiment of the disclosure, these methods for determining parameter values can be used individually or in combination. For example, a portion of the parameter values can first be parsed from the first multimodal information, a pre-processing function can then be executed to obtain further parameter values, and finally any parameter values that are still missing can be received through the functional component of the target function. This design enables the system to acquire and process information more comprehensively, thereby providing more accurate and intelligent services.
[0134] Through the embodiments disclosed herein, multiple parameter acquisition mechanisms can be integrated to flexibly determine the parameter values corresponding to each input parameter, thereby ensuring the correct execution of the target function. This flexibility enables the system to handle more complex and varied user needs and scenarios, improving the system's adaptability and robustness.
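The combined use of the three parameter sources can be sketched as a simple fallback chain. The function name `resolve_parameters`, the fixed priority order, and the callables standing in for pre-processing functions and for the parameter input module are all assumptions made for illustration.

```python
def resolve_parameters(required, parsed_from_first_input, preprocessing_functions, ask_user):
    """Resolve each required input parameter from the first available source.

    Assumed priority: (1) values parsed from the first multimodal information,
    (2) the output of a pre-processing function, (3) second multimodal
    information collected through the functional component (`ask_user`).
    """
    values, sources = {}, {}
    for name in required:
        if name in parsed_from_first_input:
            values[name] = parsed_from_first_input[name]
            sources[name] = "first_input"
        elif name in preprocessing_functions:
            values[name] = preprocessing_functions[name]()
            sources[name] = "pre_processing"
        else:
            values[name] = ask_user(name)
            sources[name] = "second_input"
    return values, sources


values, sources = resolve_parameters(
    required=["time", "resort_id", "ticket_count"],
    parsed_from_first_input={"time": "next Monday"},
    preprocessing_functions={"resort_id": lambda: "A-001"},  # hypothetical lookup result
    ask_user=lambda name: 2,  # stand-in for the parameter input module
)
print(values)
print(sources)
```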
[0135] Step S630: Execute the instruction to run the target function and obtain the data to be displayed for the target function.
[0136] In this embodiment of the disclosure, the constructed instructions can be submitted to the corresponding execution environment (such as program runtime, operating system, remote server, etc.) to run the target function. After the target function is executed, some output data can be generated. The output data may include calculation results, status information, log records, etc. The part that needs to be displayed to the user can be determined from the output data as the data to be displayed by the target function.
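Step S630 can be sketched as dispatching the instruction to a registered callable and keeping only the displayable part of its output. The registry, the `query_weather` stand-in, and the `DISPLAY_KEYS` rule for separating display data from status and log output are hypothetical.

```python
def query_weather(time, location):
    """Stand-in target function; a real one might call a remote server."""
    return {
        "result": {"forecast": f"sunny in {location} {time}"},
        "status": "ok",
        "log": ["cache miss"],
    }


# Hypothetical execution environment: function names mapped to callables.
REGISTRY = {"weather_query": query_weather}

# Assumption: only these parts of the output are shown to the user.
DISPLAY_KEYS = ("result",)


def run_instruction(instruction, registry=REGISTRY):
    """Run the target function and extract the data to be displayed from its output."""
    func = registry[instruction["function"]]
    output = func(**instruction["args"])
    return {key: output[key] for key in DISPLAY_KEYS if key in output}


display_data = run_instruction(
    {"function": "weather_query", "args": {"time": "tomorrow", "location": "Shanghai"}}
)
print(display_data)
```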
[0137] Through the embodiments disclosed herein, the target function can be effectively performed to obtain data for display. This process ensures the accuracy and validity of the data, while also improving system reliability and user satisfaction.
[0138] Figure 7 is a schematic diagram illustrating an application interaction method according to some embodiments of this disclosure. As shown in Figure 7, the application in this method may include a multimodal input unit 701, an intent understanding unit 702, a function scheduling unit 703, and an interface display unit 704.
[0139] The user can input first multimodal information into the multimodal input unit 701.
[0140] The multimodal input unit 701 can transmit the first multimodal information to the intent understanding unit 702. The intent understanding unit 702 can understand the first multimodal information to obtain the intent and generate instructions for scheduling functions based on the intent.
[0141] The function scheduling unit 703 can receive instructions from the intent understanding unit 702, perform function scheduling according to the instructions, and determine the functions to be displayed and the corresponding data to be displayed. Referring to Figure 7, the functions to be displayed in this embodiment include function 1, function 2, and function 3.
[0142] The interface display unit 704 can display the functional components of the function to be displayed on the application page of the application according to the data to be displayed. That is, it can directly present the functional components corresponding to function 1, function 2 and function 3 to the user.
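The flow through units 701 to 704 can be sketched as four small classes chained together. The class names mirror the units in Figure 7, but every method name, the keyword rule standing in for intent understanding, and the "shopping" module are assumptions for illustration only.

```python
class MultimodalInputUnit:  # unit 701
    def receive(self, text):
        return {"modality": "text", "payload": text}


class IntentUnderstandingUnit:  # unit 702
    def understand(self, info):
        # A trivial keyword rule stands in for a real understanding model.
        payload = info["payload"].lower()
        module = "shopping" if ("buy" in payload or "search" in payload) else None
        return {"module": module, "query": payload}


class FunctionSchedulingUnit:  # unit 703
    MODULE_FUNCTIONS = {"shopping": ["function 1", "function 2", "function 3"]}

    def schedule(self, instruction):
        names = self.MODULE_FUNCTIONS.get(instruction["module"], [])
        return [{"function": n, "data": instruction["query"]} for n in names]


class InterfaceDisplayUnit:  # unit 704
    def render(self, display_items):
        return [f"[{item['function']}] {item['data']}" for item in display_items]


def handle(text):
    """Pass user input through units 701 -> 702 -> 703 -> 704."""
    info = MultimodalInputUnit().receive(text)
    instruction = IntentUnderstandingUnit().understand(info)
    items = FunctionSchedulingUnit().schedule(instruction)
    return InterfaceDisplayUnit().render(items)


print(handle("I want to buy a hat"))
```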
[0143] It should be noted that the above figures are merely illustrative representations of the processes included in methods according to some embodiments of this disclosure, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Furthermore, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.
[0144] The following are embodiments of the apparatus disclosed herein, which can be used to execute embodiments of the method disclosed herein. For details not disclosed in the apparatus embodiments of this disclosure, please refer to the embodiments of the method disclosed herein.
[0145] Figure 8 is a block diagram illustrating an application interaction device according to some embodiments of the present disclosure. Referring to Figure 8, the device includes: an intent determination unit 801, a display unit 802, and a data determination unit 803.
[0146] The intent determination unit 801 is used to receive first multimodal information input to the application and determine the intent of the first multimodal information; the display unit 802 is used to display the functional components associated with the target business module in the application page of the application in response to the intent matching the target business module in the application.
[0147] In some embodiments of this disclosure, the data determination unit 803 is used to determine the data to be displayed for at least one function in the target business module; the display unit 802 is also used to display at least one functional component containing the corresponding data to be displayed on the application page.
[0148] In some embodiments of this disclosure, the data determination unit 803 determines the data to be displayed for at least one function in the target business module, including: acquiring pre-stored business data for at least one function in the target business module; and determining the data to be displayed for the corresponding function based on the pre-stored business data.
[0149] In some embodiments of this disclosure, the data determination unit 803 determines the data to be displayed for at least one function in the target business module, including: in response to the existence of an execution requirement in the intent, determining, among the at least one function, the target function that matches the execution requirement; and executing the target function to obtain the data to be displayed for the target function.
[0150] In some embodiments of this disclosure, the data determination unit 803 performs the target function by: acquiring input parameters required to perform the target function; determining parameter values corresponding to each input parameter; constructing an instruction to perform the target function based on the parameter values; and executing the instruction to run the target function.
[0151] In some embodiments of this disclosure, the data determination unit 803 determines the parameter values corresponding to each input parameter, including: parsing the parameter values corresponding to the input parameters from the first multimodal information; and/or receiving second multimodal information through a functional component corresponding to the target function, and parsing the second multimodal information to obtain the parameter values corresponding to the input parameters; and/or determining a pre-processing function whose output result is the input parameter, executing the pre-processing function to obtain a pre-processing result, and determining the parameter values corresponding to the input parameters based on the pre-processing result.
[0152] In some embodiments of this disclosure, the data to be displayed indicates a display mode, which is used to determine the display size of the functional component.
[0153] In some embodiments of this disclosure, the first multimodal information includes at least one of the following: voice modal information, image modal information, text modal information, gesture modal information, and facial expression modal information.
[0154] In some embodiments of this disclosure, the intent determination unit 801 determines the intent of the first multimodal information by: determining the modality type involved in the first multimodal information; and calling an understanding model corresponding to the modality type to process the first multimodal information to obtain the intent of the first multimodal information.
[0155] In some embodiments of this disclosure, the display unit 802 is further configured to: determine that the intent matches the target business module in response to the intent matching the business description keywords of the target business module.
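Dispatching to an understanding model by modality type and then matching the intent against business description keywords can be sketched as below. The two toy "models", the keyword table, and the whitespace tokenization are assumptions; in particular, the voice handler presumes that speech has already been transcribed.

```python
def understand_text(payload):
    return payload.lower()


def understand_voice(payload):
    # Assumption: speech recognition already produced a transcript.
    return payload["transcript"].lower()


# Hypothetical mapping from modality type to its understanding model.
UNDERSTANDING_MODELS = {"text": understand_text, "voice": understand_voice}

# Hypothetical business description keywords per business module.
BUSINESS_KEYWORDS = {
    "ticketing": {"ticket", "book"},
    "weather": {"weather", "forecast"},
}


def determine_intent(modality, payload):
    """Call the understanding model that corresponds to the modality type."""
    return UNDERSTANDING_MODELS[modality](payload)


def match_business_module(intent_text):
    """Return the first module whose keywords overlap the intent, else None."""
    words = set(intent_text.split())
    for module, keywords in BUSINESS_KEYWORDS.items():
        if words & keywords:
            return module
    return None


intent = determine_intent("voice", {"transcript": "Book a ticket for Monday"})
print(match_business_module(intent))
```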
[0156] Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operation has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
[0157] Figure 9 is a block diagram illustrating an apparatus 900 for application interaction according to some embodiments of the present disclosure. For example, apparatus 900 may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet device, medical device, fitness equipment, personal digital assistant, etc.
[0158] Referring to Figure 9, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power supply component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
[0159] Processing component 902 typically controls the overall operation of device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 902 may include one or more processors 920 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 902 may include one or more modules to facilitate interaction between processing component 902 and other components. For example, processing component 902 may include a multimedia module to facilitate interaction between multimedia component 908 and processing component 902.
[0160] Memory 904 is configured to store various types of data to support the operation of device 900. Examples of this data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, etc. Memory 904 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0161] The power supply component 906 provides power to the various components of the device 900. The power supply component 906 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power to the device 900.
[0162] Multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of the touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 908 includes a front-facing camera and / or a rear-facing camera. When the device 900 is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and / or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
[0163] Audio component 910 is configured to output and / or input audio signals. For example, audio component 910 includes a microphone (MIC) configured to receive external audio signals when device 900 is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 904 or transmitted via communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
[0164] I / O interface 912 provides an interface between processing component 902 and peripheral interface modules, such as keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.
[0165] Sensor component 914 includes one or more sensors for providing status assessments of various aspects of device 900. For example, sensor component 914 may detect the on/off state of device 900, the relative positioning of components such as the display and keypad of device 900, changes in position of device 900 or a component of device 900, the presence or absence of user contact with device 900, orientation or acceleration/deceleration of device 900, and temperature changes of device 900. Sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 914 may also include an accelerometer, gyroscope, magnetometer, pressure sensor, or temperature sensor.
[0166] Communication component 916 is configured to facilitate wired or wireless communication between device 900 and other devices. Device 900 can access wireless networks based on communication standards, such as WiFi, 3G, 4G, 5G, other communication standards, or combinations thereof. In some embodiments of this disclosure, communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In some embodiments of this disclosure, communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
[0167] In some embodiments of this disclosure, the apparatus 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the methods described above.
[0168] In some embodiments of this disclosure, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 904 including instructions that can be executed by a processor 920 of device 900 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.
[0169] A non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute an application interaction method, the method comprising: receiving first multimodal information input to an application; determining an intent of the first multimodal information; and, in response to a match between the intent and a target business module in the application, displaying functional components associated with the target business module in an application page of the application.
[0170] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.
[0171] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.
Claims
1. An application interaction method, characterized in that, include: Receive first multimodal information input to the application and determine the intent of the first multimodal information; In response to the intent matching a target business module in the application, the functional components associated with the target business module are displayed on the application page of the application.
2. The method according to claim 1, characterized in that, The application page of the application displays the functional components associated with the target business module, including: Identify the data to be displayed for at least one function in the target business module; The application page displays at least one functional component containing the corresponding data to be displayed.
3. The method according to claim 2, characterized in that determining the data to be displayed for the at least one function in the target business module comprises: obtaining pre-stored business data of the at least one function in the target business module; and determining the data to be displayed for the corresponding function based on the pre-stored business data.
4. The method according to claim 2, characterized in that determining the data to be displayed for the at least one function in the target business module comprises: in response to an execution requirement being present in the intent, determining, among the at least one function, a target function that matches the execution requirement; and executing the target function to obtain the data to be displayed for the target function.
5. The method according to claim 4, characterized in that executing the target function to obtain the data to be displayed for the target function comprises: obtaining input parameters required to execute the target function; determining a parameter value corresponding to each input parameter, and constructing, based on the parameter values, an instruction for executing the target function; and executing the instruction to run the target function and obtain the data to be displayed for the target function.
6. The method according to claim 5, characterized in that determining the parameter value corresponding to each input parameter comprises: parsing the parameter value corresponding to the input parameter from the first multimodal information; and/or receiving second multimodal information through the functional component corresponding to the target function, and parsing the second multimodal information to obtain the parameter value corresponding to the input parameter; and/or determining a pre-processing function whose output result is the input parameter, executing the pre-processing function to obtain a pre-processing result, and determining the parameter value corresponding to the input parameter based on the pre-processing result.
7. The method according to claim 2, characterized in that the data to be displayed indicates a display mode, the display mode being used to determine a display size of the functional component.
8. The method according to claim 1, characterized in that the first multimodal information comprises at least one of the following: speech modal information, image modal information, text modal information, gesture modal information, and facial expression modal information.
9. The method according to claim 1 or 8, characterized in that determining the intent of the first multimodal information comprises: determining a modality type involved in the first multimodal information; and invoking an understanding model corresponding to the modality type to process the first multimodal information to obtain the intent of the first multimodal information.
10. The method according to claim 1, characterized in that the method further comprises: in response to the intent matching a business description keyword of the target business module, determining that the intent matches the target business module.
11. An application interaction apparatus, characterized by comprising: an intent determination unit configured to receive first multimodal information input to an application and determine an intent of the first multimodal information; and a display unit configured to, in response to the intent matching a target business module in the application, display a functional component associated with the target business module in an application page of the application.
12. An electronic device, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method according to any one of claims 1-10.
13. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to execute an application interaction method, the method comprising: receiving first multimodal information input to an application, and determining an intent of the first multimodal information; and in response to the intent matching a target business module in the application, displaying a functional component associated with the target business module in an application page of the application.
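By way of non-limiting illustration only, and forming no part of the claims, the parameter handling recited in claims 4 to 6 (obtaining the input parameters of a target function, determining their values, constructing an instruction, and executing it) can be sketched in Python as follows. The sketch rests on several assumptions that the disclosure does not state: the first and second multimodal information are taken to be already parsed into dictionaries, the three value sources are tried in a fixed priority order, and the "instruction" is modelled as a simple pair of a callable and its keyword arguments. The names resolve_parameters and execute_target_function are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


def resolve_parameters(
    required: List[str],
    first_info: Dict[str, Any],
    second_info: Optional[Dict[str, Any]] = None,
    preprocessors: Optional[Dict[str, Callable[[], Any]]] = None,
) -> Dict[str, Any]:
    """Determine a value for each required input parameter.

    Assumed priority: first multimodal information, then second multimodal
    information, then a pre-processing function whose output is the parameter.
    """
    second_info = second_info or {}
    preprocessors = preprocessors or {}
    values: Dict[str, Any] = {}
    for name in required:
        if name in first_info:
            values[name] = first_info[name]
        elif name in second_info:
            values[name] = second_info[name]
        elif name in preprocessors:
            values[name] = preprocessors[name]()  # run the pre-processing function
        else:
            raise KeyError(f"no value available for input parameter {name!r}")
    return values


def execute_target_function(
    target: Callable[..., Any],
    required: List[str],
    first_info: Dict[str, Any],
    second_info: Optional[Dict[str, Any]] = None,
    preprocessors: Optional[Dict[str, Callable[[], Any]]] = None,
) -> Any:
    """Construct an instruction from the resolved values and execute it."""
    params = resolve_parameters(required, first_info, second_info, preprocessors)
    instruction = (target, params)  # hypothetical instruction format
    func, kwargs = instruction
    return func(**kwargs)  # the result serves as the data to be displayed
```

In this sketch a parameter that cannot be resolved from any source raises KeyError; an application could instead use that condition to prompt the user for second multimodal information through the functional component.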