Auxiliary operation method, electronic device, storage medium, and product

By acquiring and processing images of the software's user interface, and determining and outputting recommended operations and locations, the problem of high user operation complexity is solved, and operational efficiency is improved.

WO2026123342A1 · PCT designated stage · Publication Date: 2026-06-18 · BEIJING ZITIAO NETWORK TECH CO LTD +1


Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date: 2024-12-13
Publication Date: 2026-06-18

Application Information

Patent Timeline

13 Dec 2024: Application
18 Jun 2026: Publication (WO2026123342A1)

IPC: G06F9/451

AI Tagging

Application Domain

Execution for user interfaces


AI Technical Summary

⚠Technical Problem

As computer software becomes more powerful, its operation becomes more complex, which raises learning costs and lowers operational efficiency for users.

⚗Method used

By acquiring images of the software's user interface, image processing technology is used to determine recommended operations and operation locations, which are then output to the user, providing operation suggestions.

🎯Benefits of technology

It lowers the barrier to entry for software use and improves user efficiency.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure relates to an auxiliary operation method, an electronic device, a storage medium, and a product. The auxiliary operation method comprises: after authorization from a user, acquiring an image of an operation interface of software; on the basis of a processing result of the image, determining a recommended operation for the operation interface; on the basis of the image, determining an operation position of the recommended operation on the operation interface; and outputting to the user the recommended operation and the operation position.


Description

Auxiliary operating methods, electronic devices, storage media and products

Technical Field

[0001] This disclosure relates to the field of computer technology, and in particular to an auxiliary operation method, electronic device, storage medium, and product.

Background Technology

[0002] With the development of computer technology, computer software has become increasingly sophisticated, enabling users to complete a wide range of tasks efficiently. However, more powerful software functions often lead to more complex operations, and the cost of learning to operate the software keeps rising. As a result, many users spend a lot of time studying how to use the software, and their operational efficiency is relatively low.

Summary of the Invention

[0003] According to some embodiments of this disclosure, an auxiliary operation method is provided, including: acquiring an image of the software's operation interface with user authorization; determining a recommended operation for the operation interface based on the image processing result; determining the operation position of the recommended operation on the operation interface based on the image; and outputting the recommended operation and operation position to the user.

[0004] According to some embodiments of the present disclosure, an electronic device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to perform an auxiliary operation method of any embodiment of the present disclosure based on instructions stored in the memory.

[0005] According to some embodiments of the present disclosure, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, performs the auxiliary operation method of any embodiment described in the present disclosure.

[0006] According to some embodiments of the present disclosure, a computer program product is provided, including instructions that, when executed by a processor, cause the processor to perform an auxiliary operation method according to any embodiment of the present disclosure.

[0007] According to some embodiments of the present disclosure, a computer program is provided, comprising: instructions that, when executed by a processor, cause the processor to perform an auxiliary operation method according to any embodiment of the present disclosure.

[0008] Other features, aspects, and advantages of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Attached Figure Description

[0009] Embodiments of this disclosure are described below with reference to the accompanying drawings. It should be understood that the drawings described below are merely illustrative of some embodiments of this disclosure and are not intended to limit the scope of this disclosure. In the drawings:

[0010] Figure 1 shows a flowchart of an auxiliary operation method according to some embodiments of the present disclosure.

[0011] Figure 2 shows a flowchart illustrating a method for determining recommended operations according to some embodiments of the present disclosure.

[0012] Figure 3 shows a flowchart illustrating a method for determining recommended operations according to other embodiments of the present disclosure.

[0013] Figure 4 shows a flowchart illustrating the output method of the operation position according to some embodiments of the present disclosure.

[0014] Figure 5 shows a flowchart illustrating the output method of the operation position according to other embodiments of the present disclosure.

[0015] Figure 6 shows a schematic diagram of a user screen according to some embodiments of the present disclosure.

[0016] Figure 7 shows a schematic diagram of the structure of an auxiliary operating device according to some embodiments of the present disclosure.

[0017] Figure 8 shows a block diagram of an electronic device according to some embodiments of the present disclosure.

[0018] Figure 9 shows a block diagram of an electronic device according to some other embodiments of the present disclosure.

[0019] It should be understood that, for ease of description, the dimensions of the various parts shown in the accompanying drawings are not necessarily drawn to actual scale. The same or similar reference numerals are used in the various drawings to denote the same or similar parts. Therefore, once an item is defined in one drawing, it may not be discussed further in subsequent drawings.

Detailed Implementation

[0020] The technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. It should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein.

[0021] It should be understood that the various steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect. Unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of components and steps set forth in these embodiments should be interpreted as merely exemplary and do not limit the scope of this disclosure.

[0022] As used in this disclosure, the term "comprising" and its variations are open-ended terms that include at least the following elements / features but do not exclude other elements / features, i.e., "including but not limited to". The term "based on" means "at least partially based on".

[0023] It should be noted that the concepts of "first," "second," etc., used in this disclosure are used only to distinguish different devices, modules, or units, and are not intended to define the order of functions performed by these devices, modules, or units or their interdependencies. Unless otherwise specified, the concepts of "first," "second," etc., are not intended to imply that the objects described herein must be in a given temporal, spatial, rank, or any other given order.

[0024] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0025] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0026] The user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data shall comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals shall be provided for users to choose to authorize or refuse.

[0027] The embodiments of this disclosure are described in detail below with reference to the accompanying drawings; however, this disclosure is not limited to these specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. Furthermore, in one or more embodiments, specific features, structures, or characteristics can be combined in any suitable manner that will be apparent to those skilled in the art from this disclosure.

[0028] It should be understood that this disclosure does not limit how the image to be applied / processed is obtained. In some embodiments of this disclosure, it can be obtained from a storage device, such as internal memory or external storage device. In other embodiments of this disclosure, a camera component can be invoked to capture an image. It should be noted that the acquired image can be a captured image or a frame from a captured video, and is not particularly limited to these.

[0029] In the context of this disclosure, "image" can refer to any of a variety of images, such as color images, grayscale images, etc. The type of image is not specifically limited in this specification. Furthermore, an image can be any suitable image, such as a raw image obtained by a camera device, or an image on which specific processing has already been performed, such as preliminary filtering, dealiasing, color adjustment, contrast adjustment, normalization, etc. The preprocessing may also include other types of preprocessing operations known in the art, which will not be described in detail here.

[0030] The embodiments of this disclosure provide an auxiliary operation method. With user authorization, an image of the software's operation interface is acquired, and recommended operations and operation positions are determined from that image and output to the user, thereby assisting the user in operating the software efficiently.

[0031] Figure 1 shows a flowchart illustrating an auxiliary operation method according to some embodiments of the present disclosure. As shown in Figure 1, the auxiliary operation method of this embodiment includes steps S102 to S108. This embodiment can be executed by the electronic device on which the software operated by the user is installed, for example, by a module in the electronic device or by another piece of software.

[0032] In step S102, with user authorization, an image of the software's user interface is acquired.

[0033] The software can be any type of software running on an electronic device. For example, it can be office software, such as document processing software, spreadsheet software, or presentation software. It can also be multimedia content creation software, such as image editing software, graphic design software, audio production software, video production software, or special effects software. The above description only exemplifies some of the software to which this disclosure applies, not all of it.

[0034] The image of the user interface can be obtained by taking a screenshot during the user's operation of the software, or by the user actively providing the image of the user interface. The obtained image may only include the software's user interface. Alternatively, the obtained image may include other content besides the software's user interface; for example, a screenshot of the electronic device's screen can be taken directly with the user's authorization, and the screenshot may include the software's user interface. Of course, in some embodiments, after obtaining the screenshot, object recognition can be further performed on the captured image to identify the area of the software's user interface, and the captured image can be further processed to retain only the user interface.
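The cropping step mentioned above can be sketched as follows. This is only a minimal illustration: the screenshot is modeled as a nested list of pixel values, and the interface region is assumed to have already been found by some object recognition step that is not shown. The function name `crop_to_interface` and the `(left, top, width, height)` region format are assumptions made for this example, not terms from the disclosure.

```python
from typing import List, Sequence, Tuple

# (left, top, width, height) of the interface inside the screenshot, in pixels.
# Assumed to come from a separate recognition step that is not shown here.
Region = Tuple[int, int, int, int]


def crop_to_interface(screenshot: Sequence[Sequence[int]],
                      region: Region) -> List[List[int]]:
    """Keep only the pixels of the screenshot that fall inside the region."""
    left, top, width, height = region
    rows = len(screenshot)
    cols = len(screenshot[0]) if rows else 0
    # Clamp the region to the screenshot bounds so slicing never goes negative.
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(cols, left + width), min(rows, top + height)
    return [list(row[x0:x1]) for row in screenshot[y0:y1]]
```

For a real screenshot, an image library would typically replace the nested-list slicing, but the clamping logic would stay the same.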

[0035] In step S104, based on the processing result of the image, the recommended operation for the operation interface is determined.

[0036] Image processing can utilize traditional image processing models or algorithms to identify, classify, and detect objects within images based on their features. Alternatively, artificial intelligence (AI)-based image processing models or algorithms can be used to further understand the image beyond the initial recognition, classification, and object detection results. The processing results can include the status or descriptive information of the user interface. Therefore, based on the processing results, the desired user action can be determined, and recommended actions can be provided.

[0037] Recommended actions can take many forms. For example, they can be descriptions of specific actions, such as triggering a control, selecting an object, or inputting information. Alternatively, a recommended action can be a general description of one or more specific actions; for instance, a recommended action could be "sorting," but performing the sorting requires several specific actions (or steps), such as selecting the object to be sorted, triggering the sorting control, and then selecting the sorting method. In short, the content of recommended actions can be defined according to actual needs.

[0038] In step S106, the operation position of the recommended operation on the operation interface is determined based on the image.

[0039] For example, based on the recommended actions, the actions to be performed can be determined, and then the elements used to perform these actions can be identified from the operation interface. The positions of these elements can then be defined as the operation positions.

[0040] In step S108, recommended operations and operation locations are output to the user.

[0041] Recommended actions and locations can be output using one or more methods, such as text, voice, images, video, and icons. The output methods for recommended actions and locations can be the same or different. If output through display, at least one of the recommended actions or locations can be directly displayed on top of the user interface or on other interfaces, such as the interface of an auxiliary operation assistant software. This disclosure does not impose any limitations on this. By outputting recommended actions and their locations to the user, the user can clearly understand what to do and how to do it.

[0042] The above embodiments, by acquiring and processing images of the software's user interface, can determine recommended operations and operation locations for the user based on the images. This allows for the provision of operation suggestions during software use, and by outputting the operation locations, it also provides highly feasible operational guidance. Therefore, the above embodiments can lower the barrier to entry for using the software and improve the efficiency of user operations.
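Steps S102 to S108 can be sketched as a small pipeline. This is a hedged outline rather than the disclosed implementation: the capture, recommendation, and localization steps are passed in as callables because the disclosure leaves their implementation open, and the names `assist`, `Suggestion`, and `Box` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), assumed format


@dataclass
class Suggestion:
    operation: str  # recommended operation determined in S104
    position: Box   # operation position determined in S106


def assist(authorized: bool,
           capture: Callable[[], object],
           recommend: Callable[[object], str],
           locate: Callable[[object, str], Box]) -> Optional[Suggestion]:
    """Run S102 to S108 once; return None if the user has not authorized."""
    if not authorized:
        return None                         # S102 only runs with authorization
    image = capture()                       # S102: acquire the interface image
    operation = recommend(image)            # S104: determine recommended operation
    position = locate(image, operation)     # S106: determine operation position
    return Suggestion(operation, position)  # S108: value to output to the user
```

Keeping the three stages as injected callables mirrors the text: a traditional image processing model or an AI-based one can be swapped in without changing the overall flow.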

[0043] In embodiments of this disclosure, operation suggestions may be proactively provided to the user by the auxiliary operation assistant, or they may be proactively requested by the user. These two methods are described below by way of example.

[0044] In scenarios where user operation suggestions are proactively provided, suggestions can be offered by analyzing the user's operational intent. Figure 2 illustrates a flowchart of a method for determining recommended operations according to some embodiments of this disclosure. As shown in Figure 2, the determination method in this embodiment includes steps S202 to S204.

[0045] In step S202, the user's operational intent is determined based on the image processing results.

[0046] A user's operational intent reflects the operation that the user wants or needs to perform.

[0047] For example, a user's operational intent can be determined based on information about one or more objects in the user interface, such as the object's content, type, and state. The information of each object can be matched with a pre-set information state table to determine the operational intent. The information state table can include the correspondence between the information of each object and multiple operational intents. Alternatively, the information of each object can be input into a machine learning model, which performs semantic understanding on this information and outputs the operational intent. Of course, this model can be pre-trained using training data, which includes collected samples and their labeling information. The samples reflect the information of objects in historical user interfaces, and the labeling information represents the actual operations performed by the sample user on that historical user interface. Those skilled in the art can also use other methods to determine the user's operational intent, which will not be elaborated upon here.
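The table-matching variant described above could look roughly like the following. The table contents, the `(type, state)` key format, and the function name `match_intent` are all hypothetical; the disclosure only states that object information is matched against a pre-set information state table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical information state table:
# (object type, object state) -> operational intent.
INFO_STATE_TABLE: Dict[Tuple[str, str], str] = {
    ("column", "selected"): "sort",
    ("cell", "selected_empty"): "enter_formula",
    ("footer", "editing"): "add_page_number",
}


def match_intent(obj_type: str, obj_state: str) -> Optional[str]:
    """Look up the operational intent for one recognized object, if any."""
    return INFO_STATE_TABLE.get((obj_type, obj_state))
```

A machine learning model, as the paragraph also mentions, would replace the dictionary lookup with a prediction while keeping the same input and output.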

[0048] In some embodiments, based on the image processing results, at least one of the interface elements, user input content, and operation objects in the operation interface is determined; based on at least one of the interface elements, user input content, and operation objects, the user's operation intention is determined.

[0049] Interface elements in a user interface refer to inherent elements of the software, such as panels, controls, and borders. In some embodiments, interface elements can be identified based on the software type. For example, similarity recognition or matching can be performed between objects in an image and objects in pre-acquired images belonging to the software type to determine interface elements. The user interface allows determination of currently available operations, such as operations that can be performed by triggering controls within the current interface. Of course, controls corresponding to some operations may not be directly displayed in the current interface, but can be indirectly triggered by engaging controls within the current interface; for example, some controls that need to be triggered may be hidden in submenus of the menu displayed in the current interface.

[0050] User input refers to content generated through user input operations or further generated based on existing user input, including various types such as text, images, videos, and icons. Generally, in a user interface, content other than interface elements and screen indicators (such as mouse pointers) used to represent user interaction constitutes user input. User input reflects objects that can be further processed.

[0051] An operation object refers to an object in an image that the user is interacting with. This could be an object indicated by a screen indicator, or an object in a state of being manipulated (such as an object that is selected and highlighted or has a changed color). Operation objects can be interface elements or user input content. They more directly reflect the operation the user is currently performing or wants to perform.

[0052] The aforementioned object types can be used individually or in combination to determine the user's operational intent. For example, when only interface elements are considered, operation suggestions can be given based on commonly used functions within the interface; when only user input content is considered, optimization suggestions for the current content can be determined, such as sorting, summing, or adjusting the display method; when only the operation object is considered, the user's operational intent can be assumed to be to continue operating on that object or to perform operations associated with that object. Combining these object types allows for a more accurate determination of the user's intent.

[0053] For example, the object to be processed in the user interface can be identified, thereby determining the user's intent. In some embodiments, the object to be processed in the user interface is determined based on at least one of the interface elements, the user input content, and the operation object; the user's intent is then determined based on the object to be processed. The object to be processed refers to the object that the user wants to further manipulate. For example, in spreadsheet software, if the user selects a column, the object to be processed is the data in that column, and the user's intent might be to sort the data based on the values in that column; if the user selects a cell in a blank column, the object to be processed is that cell, and the user's intent might be to input a formula to further calculate the data in the row or column containing that cell; if the table has many rows or columns and the user's cursor is on the scroll bar, the object to be processed is the entire table, and the user's intent might be to quickly view the entire table, in which case operations such as freezing rows or columns or jumping to the end of the table can be recommended. In document software, if the user opens footer editing, the object to be processed is the footer, and the user's intent might be to add page numbers. In this way, the user's intent can be predicted accurately.
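The spreadsheet examples above can be written as a few rules. This is a minimal sketch under stated assumptions: the `SheetState` fields are assumed to have been extracted from the image by an earlier recognition step, and the row threshold of 200 for a "large" table is an arbitrary illustrative value, not a figure from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SheetState:
    # Hypothetical fields assumed to be recognized from the interface image.
    selected_column: Optional[str] = None  # e.g. "B" when a column is selected
    selected_cell_empty: bool = False      # a cell in a blank column is selected
    cursor_on_scroll_bar: bool = False
    row_count: int = 0


LARGE_TABLE_ROWS = 200  # assumed threshold for "many rows"


def infer_intent(state: SheetState) -> Tuple[str, str]:
    """Return (object to be processed, likely operational intent)."""
    if state.selected_column is not None:
        return (f"column {state.selected_column}", "sort by this column")
    if state.selected_cell_empty:
        return ("selected cell", "enter a formula")
    if state.cursor_on_scroll_bar and state.row_count >= LARGE_TABLE_ROWS:
        return ("entire table", "view the table quickly")
    return ("unknown", "none")
```

The rules are checked in a fixed order here for simplicity; a real system would likely weigh several signals together, as paragraph [0052] suggests.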

[0054] For example, suggestions for display optimization can be provided based on the current display method of the input content. In some embodiments, the display method of the user input content is determined; in response to a mismatch between the display method of the user input content and the user interface, the operation intention is determined to adjust the display method. The display method may include the size of the content, background, color, and the data type of the content, such as text, tables, graphs, etc. Whether it matches the user interface can be determined based on the fill degree of the user input content in the user interface or elements in the user interface, such as whether the user input content is too large or too small; it can also be determined based on the type of user interface, such as prioritizing the display of content in the form of charts, flowcharts, etc., for presentation software. Thus, it can efficiently assist users in optimizing the display effect and improve user operation efficiency and experience.
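One way to read the "fill degree" check is as an area ratio between the content and the area available to it. The sketch below makes that reading explicit. The thresholds `low=0.2` and `high=0.9`, the function names, and the returned intent strings are assumptions chosen for illustration only.

```python
from typing import Optional


def fill_ratio(content_w: float, content_h: float,
               container_w: float, container_h: float) -> float:
    """Fraction of the container area covered by the content's bounding box."""
    if container_w <= 0 or container_h <= 0:
        raise ValueError("container dimensions must be positive")
    return (content_w * content_h) / (container_w * container_h)


def display_intent(ratio: float, low: float = 0.2,
                   high: float = 0.9) -> Optional[str]:
    """Map a fill ratio to an 'adjust display' intent, or None if it fits."""
    if ratio < low:
        return "enlarge content"  # content is too small for the interface
    if ratio > high:
        return "shrink content"   # content is too large for the interface
    return None
```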

[0055] In determining the user's intent, the results of semantic understanding obtained from the image can also be referenced, which allows the intent to be determined more accurately. In some embodiments, the user's intent is determined based on the semantic understanding results of the image, wherein the semantic understanding results include at least one of descriptive information about the user interface and descriptive information about the user input content in the user interface. The descriptive information about the user interface can reflect the current state of the user interface, for example, that the user is using a certain function of the software or that the amount of data in the user interface is small. The descriptive information about the user input content can be the semantic understanding results of that content, which can reflect its semantic type (such as academic or business) or its topic. The semantic understanding results of the user input content can be combined with the foregoing embodiments to determine the user intent. For example, for presentation software, if the content is legal text, the user is more likely to intend to display it in text format; if the content is business data, the user is more likely to intend to display it in chart format.

[0056] In step S204, operations that match the user's intent are identified as recommended operations. For example, operations used to execute or achieve the intent are identified as recommended operations. Thus, the recommended operations can satisfy the user's intent.

[0057] To further improve operational efficiency, shortcuts for operations can be recommended to the user. In some embodiments, in response to the operation intent including performing a first operation, and the first operation being an operation that the user has already performed, a shortcut for the first operation is determined as the recommended operation. For example, with user authorization, operations performed by the user in other images can be acquired and recorded, and a shortcut is recommended when it is found that the user may need to perform an operation multiple times. Of course, the embodiments of this disclosure are not limited to this. Shortcuts can also be recommended for operations that meet other conditions, or for every operation, as needed. This further helps the user save operation time, improves operational convenience, and thus enhances operational efficiency.
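The repeated-operation condition can be sketched with a simple counter over the recorded operations. The shortcut table, the `min_repeats` threshold, and the function name are illustrative assumptions; actual key bindings depend on the software and platform.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

# Illustrative shortcut table; real bindings vary by software and platform.
SHORTCUTS: Dict[str, str] = {"copy": "Ctrl+C", "paste": "Ctrl+V"}


def recommend_shortcut(intended_op: str, history: Sequence[str],
                       min_repeats: int = 2) -> Optional[str]:
    """Suggest a shortcut when the intended operation was already performed
    at least `min_repeats` times in the recorded history."""
    if Counter(history)[intended_op] >= min_repeats:
        return SHORTCUTS.get(intended_op)
    return None
```

The history is assumed to contain only operations recorded with the user's authorization, as the paragraph above requires.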

[0058] The above embodiments predict the user's operational intent based on the images of the user interface and recommend operations that match that intent, thereby enabling timely identification of the user's potential needs during software use. Therefore, this disclosure can assist user operations and improve user efficiency.

[0059] When a user provides instructions, suggestions can be provided based on those instructions. Figure 3 shows a flowchart illustrating a method for determining recommended actions according to other embodiments of this disclosure. As shown in Figure 3, the determination method of this embodiment includes steps S302 to S308.

[0060] In step S302, the user's instruction is obtained. The instruction can be given in various ways, such as by voice, by text, or by triggering a control. It can be a question, such as "I want to sort these data and remove duplicate data, how do I do that?".

[0061] In step S304, the instruction is semantically understood. This semantic understanding process can be implemented using a natural language processing model.

[0062] In step S306, based on the results of semantic understanding and the processing results of the image, the object in the software associated with the instruction is determined.

[0063] For example, if a user instructs to perform a sorting operation, it is necessary to specify the object to be sorted, the control to perform the sorting, etc., as objects associated with the instruction.

[0064] In step S308, a recommended operation for the user interface is determined based on the object associated with the instruction.

[0065] After identifying the objects associated with the instruction, the operations that need to be performed on each object can be determined. Taking sorting as an example again, a selection operation needs to be performed on the objects being sorted, a trigger operation needs to be performed on the control that performs the sorting, and so on.
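Steps S302 to S308 can be outlined as follows. Note the substitution: simple keyword matching stands in for the natural language processing model mentioned in step S304, purely to keep the sketch self-contained. The action names, the step table, and the object labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: understood action -> [(associated object, operation)].
ACTION_STEPS: Dict[str, List[Tuple[str, str]]] = {
    "sort": [("data to sort", "select"), ("sort control", "trigger")],
    "remove_duplicates": [("data to deduplicate", "select"),
                          ("remove-duplicates control", "trigger")],
}
# Keyword matching is a stand-in for a real language model (assumption).
KEYWORDS: Dict[str, str] = {"sort": "sort", "duplicate": "remove_duplicates"}


def understand(instruction: str) -> List[str]:
    """S304 stand-in: extract the requested actions from the instruction."""
    text = instruction.lower()
    return [action for keyword, action in KEYWORDS.items() if keyword in text]


def plan(instruction: str, visible_objects: Set[str]) -> List[Tuple[str, str]]:
    """S306 and S308: keep the steps whose objects were found in the image."""
    steps: List[Tuple[str, str]] = []
    for action in understand(instruction):
        for obj, operation in ACTION_STEPS[action]:
            if obj in visible_objects:
                steps.append((obj, operation))
    return steps
```

Here `visible_objects` plays the role of the image processing result: only objects actually recognized in the interface produce recommended operations.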

[0066] The above embodiments can recommend operations based on user instructions. Thus, after determining the recommended operation, the operation location can also be output to the user. Therefore, when the user clearly understands the operation requirement but is unsure of the operation method, this disclosure can assist the user in completing the desired operation, improving the user's operational efficiency.

[0067] Considering that different types of software may offer slightly different operations, the process of determining recommended operations for users can be streamlined by filtering through the operations available to that type of software, thus providing more reasonable operation suggestions. For example, this involves identifying the type of software; determining the set of operations corresponding to that type of software; and then, based on the image processing results, determining recommended operations for the user interface from this set of operations. This narrows down the scope when determining recommended operations, improving the efficiency of assisted operations and ensuring the feasibility of the recommendations.
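Narrowing candidates by software type can be sketched as a set lookup. The operation sets and the type names below are made up for illustration; the disclosure does not specify them.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical operation sets per software type.
OPERATIONS_BY_TYPE: Dict[str, Set[str]] = {
    "spreadsheet": {"sort", "sum", "freeze_panes"},
    "document": {"add_page_number", "insert_table"},
}


def filter_candidates(software_type: str,
                      candidates: Sequence[str]) -> List[str]:
    """Keep only candidate operations that this type of software offers."""
    allowed = OPERATIONS_BY_TYPE.get(software_type, set())
    return [op for op in candidates if op in allowed]
```

An unknown software type yields an empty list here; a real system might instead fall back to an unfiltered recommendation.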

[0068] When outputting the operation location to the user, the location can either be marked on the screen or simply described. These two methods are described below by way of example.

[0069] Figure 4 shows a schematic flowchart of an output method for an operating position according to some embodiments of the present disclosure. As shown in Figure 4, the output method of this embodiment includes steps S402 to S406.

[0070] In step S402, the element corresponding to the recommended operation is determined in the image, for example, the element that needs to be triggered or selected when the recommended operation is performed.

[0071] In step S404, the third position information of the element corresponding to the recommended operation on the screen is determined based on the first position information of the element in the image corresponding to the recommended operation and the second position information of the operation interface on the screen.

[0072] For example, an image coordinate system can be established for the image and a screen coordinate system for the electronic device's screen. The first position information of an element in the image can be understood as the element's coordinates in the image coordinate system. The second position information of the operation interface on the screen can be understood as the relative positional relationship between a marker point on the operation interface (e.g., a corner point or the center point) and a marker point on the screen. From these, the mapping relationship between the image coordinate system and the screen coordinate system can be determined, and the element's coordinates in the image coordinate system can be mapped to coordinates in the screen coordinate system, yielding the element's third position information on the screen.
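The coordinate mapping in step S404 can be illustrated with a translation plus an optional scale. The sketch assumes that the image contains only the operation interface, that both coordinate systems have their origin at the top-left corner, and that the marker point is the interface's top-left corner. The `scale` parameter (screen pixels per image pixel) is an added assumption to cover resized screenshots.

```python
from typing import Tuple

Point = Tuple[float, float]


def image_to_screen(first_pos: Point, second_pos: Point,
                    scale: Tuple[float, float] = (1.0, 1.0)) -> Point:
    """Map an element's image coordinates to screen coordinates.

    first_pos:  element coordinates in the image coordinate system
    second_pos: screen coordinates of the interface's top-left corner
    scale:      screen pixels per image pixel along x and y (assumed)
    Returns the third position information: the element's screen coordinates.
    """
    x, y = first_pos
    origin_x, origin_y = second_pos
    scale_x, scale_y = scale
    return (origin_x + x * scale_x, origin_y + y * scale_y)
```

For instance, an element at (100, 40) in the image, with the interface's corner at (300, 200) on the screen and no scaling, maps to (400, 240).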

[0073] In step S406, an identifier is displayed on the screen based on the third location information.

[0074] Because the software that assists with operations may differ from the software operated by the user, the assisting software may be unable to display an identifier directly within the user-operated software. With the proper permissions, a new layer with a transparent background can be created on the screen and placed on top of the main layer, and the identifier can be displayed within this layer. This allows specific locations within the user interface to be marked.

[0075] The identifier can be displayed in any way, such as an arrow, a bubble window, a highlighted area, etc., and those skilled in the art can choose as needed.
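As one way to picture the identifier styles listed in [0075], the sketch below computes the geometry that would be drawn on the transparent layer of [0074] for an element whose screen rectangle is already known. It is an illustrative assumption only: the type names, the margin, the arrow length, and the bubble height are hypothetical, and the actual drawing of the overlay is platform-specific and not shown.

```python
from dataclasses import dataclass
from enum import Enum


class IdentifierStyle(Enum):
    ARROW = "arrow"
    BUBBLE = "bubble"
    HIGHLIGHT = "highlight"


@dataclass(frozen=True)
class OverlayShape:
    """Geometry of one identifier to be drawn on the transparent overlay layer."""
    style: IdentifierStyle
    x: float
    y: float
    w: float
    h: float
    text: str = ""


def build_identifier(style: IdentifierStyle, x: float, y: float, w: float, h: float,
                     text: str = "", margin: float = 4.0,
                     arrow_len: float = 40.0, bubble_h: float = 32.0) -> OverlayShape:
    """Compute the overlay shape for an element located at (x, y, w, h) on screen."""
    if style is IdentifierStyle.HIGHLIGHT:
        # Frame slightly larger than the element so the border does not cover it.
        return OverlayShape(style, x - margin, y - margin, w + 2 * margin, h + 2 * margin)
    if style is IdentifierStyle.ARROW:
        # (x, y) is the arrow tail; the arrow runs down arrow_len pixels
        # and ends at the top-center of the element.
        return OverlayShape(style, x + w / 2, y - arrow_len, 0.0, arrow_len)
    if style is IdentifierStyle.BUBBLE:
        # Bubble window placed just below the element, carrying the hint text.
        return OverlayShape(style, x, y + h + margin, w, bubble_h, text)
    raise ValueError(f"unsupported identifier style: {style!r}")
```

Keeping the geometry separate from the drawing code means the same third position information can feed any of the identifier styles.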

[0076] Through the above embodiments, the operation location corresponding to the operation suggestion can be accurately indicated on the screen, allowing the user to quickly and clearly understand the operation method. Therefore, the efficiency of assisting user operation is improved.

[0077] Figure 5 shows a flowchart illustrating an output method for an operating position according to other embodiments of the present disclosure. As shown in Figure 5, the output method of this embodiment includes steps S502 to S504.

[0078] In step S502, the element corresponding to the recommended operation in the image is determined as the target element.

[0079] In step S504, descriptive information about the target element is output to the user via voice or text.

[0080] For example, if the target element is column B in a table, the description would be "Column B"; or if the target element is a sorting control, the description would be "The sorting control in the XX panel has a funnel icon".
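The two descriptions in [0080] can be produced from a small amount of element metadata. The following sketch is an assumption made for illustration: the `TargetElement` fields and the `describe` function are hypothetical, and the disclosure does not specify how the descriptive information is generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TargetElement:
    kind: str                    # e.g. "column" or "control"
    name: str                    # e.g. "B" or "sorting"
    panel: Optional[str] = None  # panel that contains the element, if known
    icon: Optional[str] = None   # visual cue that helps the user find the element


def describe(element: TargetElement) -> str:
    """Build a short text description that can be shown or read aloud."""
    if element.kind == "column":
        return f"Column {element.name}"
    text = f"The {element.name} {element.kind}"
    if element.panel:
        text += f" in the {element.panel} panel"
    if element.icon:
        text += f" has a {element.icon} icon"
    return text
```

The returned string can be displayed as text or passed to a speech synthesizer for voice output.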

[0081] This method reduces interference with the user interface display. For easily discoverable and prominent target elements, this method can also convey the location of the operation to the user, thus improving the efficiency of assisted operations.

[0082] The above two output methods for operation positions can be used in combination to provide operation guidance to users from multiple perspectives.

[0083] Some embodiments of this disclosure can be implemented using auxiliary operation software that includes an intelligent agent. During user operation, the intelligent agent can provide operation suggestions to the user. If the intelligent agent serves as the auxiliary operation assistant, at least one of the recommended operation and the operation position can be displayed in the interface used to interact with the intelligent agent. If the output is via audio, at least one of the recommended operation and the operation position can be delivered in the intelligent agent's voice, either as a voice message or as the agent's speech during a voice call between the user and the intelligent agent. The following describes, by way of example, embodiments of intelligent agent-assisted user operation.

[0084] Figure 6 illustrates a user screen schematic according to some embodiments of the present disclosure. As shown in Figure 6, the screen 6 of the user's electronic device displays an interface 61 of a spreadsheet software and an interaction window 62 of an intelligent agent. The image on screen 6 is sent to the intelligent agent so that the intelligent agent can process the image and provide operation suggestions to the user.

[0085] In some embodiments, with user authorization, screen images are periodically acquired or acquired in response to user instructions. These screen images include images of the software's user interface. For example, a user can actively capture an image of the current screen 6 or an image of the software's user interface 61, and then send it to the agent through the interaction window 62 so that the agent can provide operational suggestions. Alternatively, the auxiliary operating software can periodically acquire screen images.
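The two acquisition modes in [0085] (on user instruction, or periodically) can be sketched as below. This is a hedged illustration: `ScreenAcquirer` is a hypothetical name, and the `capture` and `authorized` callables are placeholders for a platform screenshot function and a permission check, neither of which is specified in the disclosure.

```python
import time
from typing import Callable, Iterator, Optional


class ScreenAcquirer:
    """Acquires screen images only while the user's authorization holds."""

    def __init__(self, capture: Callable[[], bytes], authorized: Callable[[], bool]):
        self._capture = capture        # hypothetical screenshot function
        self._authorized = authorized  # hypothetical authorization check

    def acquire_once(self) -> Optional[bytes]:
        """Acquire one image in response to a user instruction."""
        if not self._authorized():
            return None
        return self._capture()

    def acquire_periodically(self, interval_s: float, max_frames: int,
                             sleep: Callable[[float], None] = time.sleep
                             ) -> Iterator[bytes]:
        """Yield up to max_frames images, re-checking authorization each time."""
        for i in range(max_frames):
            if not self._authorized():
                return  # stop as soon as authorization is missing or withdrawn
            yield self._capture()
            if i < max_frames - 1:
                sleep(interval_s)
```

Re-checking authorization before every capture is a design choice of this sketch, so that a user who withdraws permission stops the periodic acquisition immediately.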

[0086] The interaction window 62 may include an identifier 621 for the intelligent agent, such as a name, avatar, etc., so that the user can clearly identify the current interactive object. In addition, the interaction window 62 may also include multiple controls to select the interaction method between the user and the intelligent agent, such as a control 622 for sending voice messages, a control 623 for sending text messages, a control 624 for voice calls, etc.

[0087] The interactive window 62 may also optionally include an operation suggestion display area 625 for displaying operation suggestions in a visual manner, such as text, images, links, and videos.

[0088] In some embodiments, in response to a recommended operation comprising multiple steps, each step, or the operation position corresponding to each step, is output simultaneously or sequentially. For example, in Figure 6, the operation suggestion includes three steps; all steps are output at once in the operation suggestion display area 625, and the operation position corresponding to each step is marked on the screen in sequence. In the example of Figure 6, the identifier 63 of the first step is currently displayed. In response to the user selecting a step in the operation suggestion display area 625, an identifier of the operation position corresponding to the selected step can be displayed on the screen. These operation steps and this way of outputting their operation positions are only exemplary; those skilled in the art can adjust them as needed, and this disclosure does not elaborate further.
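The behavior described in [0088], where all step texts are output at once while positions are marked one step at a time, can be sketched as follows. The `Step` and `StepGuide` names are hypothetical, and step positions are assumed to be screen coordinates that were already computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    text: str
    position: Tuple[float, float]  # screen coordinates of the operation position


class StepGuide:
    """Outputs all step texts at once and marks positions one step at a time."""

    def __init__(self, steps: List[Step]):
        if not steps:
            raise ValueError("a recommended operation needs at least one step")
        self._steps = list(steps)
        self._current = 0

    def all_step_texts(self) -> List[str]:
        """Simultaneous output, e.g. for a suggestion display area."""
        return [f"{i}. {s.text}" for i, s in enumerate(self._steps, start=1)]

    def current_marker(self) -> Tuple[float, float]:
        """Position at which the identifier is currently displayed."""
        return self._steps[self._current].position

    def advance(self) -> bool:
        """Sequential output: move to the next step; False if none remains."""
        if self._current + 1 >= len(self._steps):
            return False
        self._current += 1
        return True

    def select(self, index: int) -> Tuple[float, float]:
        """The user selected a step (0-based); mark that step's position."""
        if not 0 <= index < len(self._steps):
            raise IndexError("no such step")
        self._current = index
        return self.current_marker()
```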

[0089] In some embodiments, the recommended operation is executed automatically in response to the user's confirmation. For example, when an auxiliary operation assistant has obtained the necessary permissions, it can automatically execute the recommended operation within the software being used by the user to further improve the user's operational efficiency. The user can confirm the recommended operation through various methods such as text, voice, or confirmation controls. Since the auxiliary operation assistant has already determined the recommended operation and its location, it can also execute the operation accurately.
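The gating logic in [0089] (execute only with permission and only after the user confirms) can be sketched as below. The confirmation vocabulary and the function names are assumptions for illustration; in practice the confirmation could equally come from a voice command or a confirmation control.

```python
from typing import Callable

# Hypothetical set of text replies treated as confirmation.
CONFIRM_WORDS = {"yes", "ok", "confirm"}


def is_confirmation(user_reply: str) -> bool:
    """Treat a short affirmative text reply as confirmation."""
    return user_reply.strip().lower() in CONFIRM_WORDS


def maybe_execute(user_reply: str, has_permission: bool,
                  execute: Callable[[], None]) -> bool:
    """Run the recommended operation only with permission and user confirmation.

    Returns True if the operation was executed, False otherwise.
    """
    if not has_permission or not is_confirmation(user_reply):
        return False
    execute()
    return True
```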

[0090] The embodiments of the auxiliary operation method of this disclosure have been described above. The apparatus for performing the methods of each embodiment is further described below.

[0091] Figure 7 shows a schematic diagram of the structure of an auxiliary operation device according to some embodiments of the present disclosure. As shown in Figure 7, the auxiliary operation device 7 of this embodiment includes: an acquisition module 701, configured to acquire an image of the software's operation interface with user authorization; a determination module 702, configured to determine a recommended operation for the operation interface based on the image processing result; and to determine the operation position of the recommended operation on the operation interface based on the image; and an output module 703, configured to output the recommended operation and operation position to the user.
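The division of labor among the acquisition module 701, the determination module 702, and the output module 703 can be pictured with the following sketch. It is only an assumed arrangement: each module is modeled as a plain callable, and the `Recommendation` and `AuxiliaryOperationDevice` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Recommendation:
    operation: str                 # recommended operation
    position: Tuple[float, float]  # operation position on the operation interface


class AuxiliaryOperationDevice:
    """Wires together acquisition, determination, and output."""

    def __init__(self,
                 acquire: Callable[[], Optional[bytes]],
                 determine: Callable[[bytes], Optional[Recommendation]],
                 output: Callable[[Recommendation], None]):
        self._acquire = acquire      # stands in for acquisition module 701
        self._determine = determine  # stands in for determination module 702
        self._output = output        # stands in for output module 703

    def run_once(self) -> Optional[Recommendation]:
        image = self._acquire()
        if image is None:
            # No image, for example because the user has not authorized capture.
            return None
        recommendation = self._determine(image)
        if recommendation is not None:
            self._output(recommendation)
        return recommendation
```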

[0092] In some embodiments, the determining module 702 is further configured to: determine the user's operation intention based on the processing result of the image; and determine the operation that matches the operation intention as the recommended operation.

[0093] In some embodiments, the determining module 702 is further configured to: determine at least one of interface elements, user input content, and operation object in the operation interface based on the processing result of the image; and determine the user's operation intention based on at least one of the interface elements, user input content, and operation object.

[0094] In some embodiments, the determining module 702 is further configured to: determine the object to be processed in the operation interface based on at least one of interface elements, user input content, and operation object; and determine the user's operation intention based on the object to be processed in the operation interface.

[0095] In some embodiments, the determining module 702 is further configured to: determine the display mode of the user input content; and, in response to a mismatch between the display mode of the user input content and the operation interface, determine the operation intent as adjusting the display mode.
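One possible reading of the mismatch in [0095] is that the user's input is wider than the cell or field that displays it, so part of it is clipped. The sketch below encodes only that reading, which is an assumption; the disclosure does not limit the mismatch to this case, and the names `InputContent` and `infer_display_intent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputContent:
    text_width_px: float  # rendered width of the user's input, as detected in the image
    cell_width_px: float  # width of the cell or field that contains the input


def infer_display_intent(content: InputContent) -> Optional[str]:
    """Return an operation intent if the content does not fit its container."""
    if content.text_width_px > content.cell_width_px:
        # Assumed mismatch: the content is clipped by its container.
        return "adjust_display_mode"
    return None
```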

[0096] In some embodiments, the determining module 702 is further configured to: determine the user's operation intention based on the semantic understanding result of the image, wherein the semantic understanding result includes at least one of the descriptive information of the operation interface and the descriptive information of the user input content in the operation interface.

[0097] In some embodiments, the determining module 702 is further configured to: determine a shortcut operation of the first operation as a recommended operation in response to an operation intent including performing a first operation and an operation that has already been performed.

[0098] In some embodiments, the auxiliary operation device 7 further includes a semantic understanding module 704, configured to: acquire a user's instruction and perform semantic understanding on the instruction; the determination module 702 is further configured to: determine an object in the software associated with the instruction based on the result of semantic understanding and the result of image processing; and determine a recommended operation for the operation interface based on the object associated with the instruction.

[0099] In some embodiments, the determining module 702 is further configured to: determine the type of software; determine the set of operations corresponding to the type of software; and determine recommended operations for the user interface from the set of operations based on the processing results of the image.
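The narrowing step in [0099] (software type, then operation set, then recommended operations) can be sketched as a lookup followed by a filter. The operation sets and the keyword matching below are illustrative assumptions that stand in for the image processing result; they are not taken from the disclosure.

```python
from typing import Dict, List, Set

# Hypothetical operation sets keyed by software type.
OPERATION_SETS: Dict[str, List[str]] = {
    "spreadsheet": ["sort column", "filter rows", "sum range", "widen column"],
    "document": ["apply heading style", "insert table of contents"],
}


def recommend_operations(software_type: str, image_keywords: Set[str]) -> List[str]:
    """Pick candidate operations for the software type that match the image.

    image_keywords stands in for the processing result of the image,
    e.g. words describing the elements detected in the operation interface.
    """
    candidates = OPERATION_SETS.get(software_type, [])
    return [op for op in candidates if image_keywords & set(op.split())]
```

Restricting the search to the operation set of the detected software type keeps the candidate list small before the image-based selection is applied.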

[0100] In some embodiments, the determining module 702 is further configured to: determine, in the image, an element corresponding to the recommended operation; determine third position information of the element on the screen based on first position information of the element in the image and second position information of the operation interface on the screen; and display an identifier on the screen based on the third position information.

[0101] In some embodiments, the output module 703 is further configured to: determine, in the image, an element corresponding to the recommended operation as a target element; and output descriptive information about the target element to the user via voice or text.

[0102] In some embodiments, the output module 703 is further configured to output each step, or the operation position corresponding to each step, simultaneously or sequentially in response to the recommended operation comprising multiple steps.

[0103] In some embodiments, the auxiliary operation device 7 further includes an automatic execution module 705, configured to automatically execute the recommended operation in response to the user's confirmation of the recommended operation.

[0104] In some embodiments, the acquisition module 701 is further configured to: periodically acquire screen images when authorized by the user, or acquire screen images in response to the user's instruction, wherein the screen images include images of the software's user interface.

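[0105] Figure 8 shows a block diagram of an electronic device according to some embodiments of the present disclosure. As shown in Figure 8, the electronic device 8 of this embodiment includes a memory 81 and a processor 82.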

[0106] Memory 81 is used to store one or more computer-readable instructions. Memory 81 may include any combination of various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory, including but not limited to random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read-only memory (ROM), and flash memory. Memory 81 may, for example, store operating systems, application programs, boot loaders, databases, other programs, and various data.

[0107] Processor 82 is configured to execute computer-readable instructions to implement the methods described in any of the foregoing embodiments. Specific implementations of each step of the method can be found in the above embodiments; repeated details will not be elaborated upon here.

[0108] The processor 82 can be configured to perform the steps of the foregoing embodiments. The processor 82 can be embodied in various processing devices, such as a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The central processing unit (CPU) can be an x86 or ARM architecture, etc.

[0109] The processor 82 and the memory 81 can communicate with each other directly or indirectly. For example, the processor 82 and the memory 81 can communicate via a network. The network can include wireless networks, wired networks, and / or any combination of wireless and wired networks. The processor 82 and the memory 81 can also communicate with each other via a system bus, which is not limited in this disclosure.

[0110] It should be noted that the components of the electronic device 8 shown in Figure 8 are exemplary and not limiting. The electronic device 8 may have other components depending on the specific application requirements. The processor 82 can control other components in the electronic device 8 to perform the desired functions.

[0111] Electronic device 8 can be implemented by software, firmware and / or hardware, and can be integrated into a device with the relevant application installed.

[0112] Figure 9 shows a block diagram of an electronic device according to some other embodiments of the present disclosure.

[0113] The electronic device 9 shown in Figure 9 can be a computer system with a dedicated hardware structure, capable of performing corresponding functions when relevant applications are installed.

[0114] Electronic devices include, but are not limited to, mobile terminals such as smartphones, laptops, personal digital assistants (PDAs), tablet computers, portable multimedia players (PMPs), in-vehicle terminals (such as in-vehicle navigation terminals), and wearable devices, as well as fixed terminals such as digital televisions and desktop computers.

[0115] As shown in Figure 9, the Central Processing Unit (CPU) 91 executes various processes based on programs stored in Read-Only Memory (ROM) 92 or programs loaded from Storage Section 98 into Random Access Memory (RAM) 93. RAM 93 stores data required as needed when the CPU 91 executes various processes. The CPU is merely exemplary and can also be other types of processors, such as the various processors described above. ROM 92, RAM 93, and Storage Section 98 can be various forms of computer-readable storage media. It should be noted that although ROM 92, RAM 93, and Storage Section 98 are shown separately in Figure 9, one or more of them can be combined or located in the same or different memories or storage modules.

[0116] CPU 91, ROM 92 and RAM 93 are interconnected via bus 94. Input / output interface 95 is also connected to bus 94.

[0117] The following components are connected to the input / output interface 95: input section 96, such as a touchscreen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output section 97, including displays such as cathode ray tube (CRT), liquid crystal display (LCD), speakers, vibrators, etc.; storage section 98, including hard disk, magnetic tape, etc.; and communication section 99, including network interface cards such as LAN cards, modems, etc. The communication section 99 allows communication processing to be performed via a network such as the Internet. It is readily understood that although some parts of the electronic device 9 shown in Figure 9 communicate via bus 94, they can also communicate via a network or other means, wherein the network can include wireless networks, wired networks, and / or any combination of wireless and wired networks.

[0118] As needed, drive 910 is also connected to input / output interface 95. Removable media 911, such as disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on drive 910 as needed, so that computer programs read from them can be installed into storage section 98 as needed.

[0119] When the above series of processes are implemented through software, the program constituting the software can be installed from a network such as the Internet or a storage medium such as a removable medium 911.

[0120] According to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, some embodiments of this disclosure include a computer program product that, when run on a computer, causes the computer to perform the methods described in any of the foregoing embodiments. The computer program product includes computer instructions carried on a computer-readable medium, containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer instructions can be downloaded and installed from a network via communication section 99, or installed from storage section 98, or installed from ROM 92. When the computer program is executed by CPU 91, the methods of embodiments of this disclosure are performed.

[0121] It should be noted that, in the context of this disclosure, a computer-readable medium can be a tangible medium that may contain or store programs for use by or in conjunction with an instruction execution system, apparatus, or device.

[0122] A computer-readable medium may be a computer-readable storage medium, a computer-readable signal medium, or any combination thereof.

[0123] Computer-readable storage media include, but are not limited to, systems, apparatuses, or devices that are electrical, magnetic, optical, electromagnetic, infrared, or semiconductor, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. Computer instructions are stored on the computer-readable storage medium that, when executed by a processor, implement the methods described in any of the foregoing embodiments.

[0124] Computer-readable signal media may include data signals propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media may also be any computer-readable medium other than computer-readable storage media, capable of sending, propagating, or transmitting programs for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.

[0125] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device.

[0126] In some embodiments, a computer program is also provided, comprising: instructions that, when executed by a processor, cause the processor to perform the methods described in any of the foregoing embodiments. For example, the instructions may be embodied in computer program code.

[0127] In embodiments of this disclosure, computer program code for performing the operations of this disclosure can be written in one or more programming languages or a combination thereof. These programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network (including a local area network (LAN) or a wide area network (WAN)), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0128] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0129] The functions described above can be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

[0130] While specific embodiments of this disclosure have been described in detail by way of example, those skilled in the art should understand that the examples are for illustrative purposes only and not intended to limit the scope of this disclosure. Those skilled in the art should understand that modifications can be made to the above embodiments without departing from the scope and spirit of this disclosure. The scope of this disclosure is defined by the appended claims.

Claims

1. An auxiliary operation method, comprising: acquiring, with user authorization, an image of a user interface of software; determining a recommended operation for the user interface based on a processing result of the image; determining, based on the image, an operation location of the recommended operation on the user interface; and outputting the recommended operation and the operation location to the user.

2. The auxiliary operation method according to claim 1, wherein determining the recommended operation for the user interface based on the processing result of the image comprises: determining the user's operation intent based on the processing result of the image; and determining an operation that matches the operation intent as the recommended operation.

3. The auxiliary operation method according to claim 2, wherein determining the user's operation intent based on the processing result of the image comprises: determining, based on the processing result of the image, at least one of interface elements, user input content, and an operation object in the operation interface; and determining the user's operation intent based on the at least one of the interface elements, the user input content, and the operation object.

4. The auxiliary operation method according to claim 3, wherein determining the user's operation intent based on the at least one of the interface elements, the user input content, and the operation object comprises: determining an object to be processed in the operation interface based on the at least one of the interface elements, the user input content, and the operation object; and determining the user's operation intent based on the object to be processed in the operation interface.

5. The auxiliary operation method according to claim 3 or 4, wherein determining the user's operation intent based on the at least one of the interface elements, the user input content, and the operation object comprises: determining a display mode of the user input content; and in response to a mismatch between the display mode of the user input content and the operation interface, determining the operation intent to be adjusting the display mode.

6. The auxiliary operation method according to any one of claims 2 to 5, wherein determining the user's operation intent based on at least one of the interface elements, the user input content, and the operation object comprises: determining the user's operation intent based on a semantic understanding result of the image, wherein the semantic understanding result includes at least one of descriptive information of the operation interface and descriptive information of the user input content in the operation interface.

7. The auxiliary operation method according to any one of claims 2 to 6, wherein determining the operation that matches the operation intent as the recommended operation comprises: in response to the operation intent including performing a first operation, and an operation that has already been performed, determining a shortcut operation for the first operation as the recommended operation.

8. The auxiliary operation method according to any one of claims 1 to 7, further comprising: acquiring an instruction of the user and performing semantic understanding on the instruction; wherein determining the recommended operation for the user interface based on the processing result of the image comprises: determining an object in the software associated with the instruction based on a result of the semantic understanding and the processing result of the image; and determining the recommended operation for the user interface based on the object associated with the instruction.

9. The auxiliary operation method according to any one of claims 1 to 8, wherein determining the recommended operation for the user interface based on the processing result of the image comprises: determining a type of the software; determining a set of operations corresponding to the type of the software; and determining, based on the processing result of the image, the recommended operation for the user interface from the set of operations.

10. The auxiliary operation method according to any one of claims 1 to 9, wherein determining the location of the recommended operation on the user interface based on the image comprises: determining, in the image, an element corresponding to the recommended operation; determining third position information of the element on the screen based on first position information of the element in the image and second position information of the operation interface on the screen; and displaying an identifier on the screen based on the third position information.

11. The auxiliary operation method according to any one of claims 1 to 10, wherein determining the location of the recommended operation on the user interface based on the image comprises: determining, in the image, the element corresponding to the recommended operation as a target element; and outputting descriptive information about the target element to the user via voice or text.

12. The auxiliary operation method according to any one of claims 1 to 11, wherein outputting the recommended operation and the operation location to the user comprises: in response to the recommended operation comprising multiple steps, outputting each step, or the operation position corresponding to each step, simultaneously or sequentially.

13. The auxiliary operation method according to any one of claims 1 to 12, further comprising: in response to the user's confirmation of the recommended operation, automatically executing the recommended operation.

14. The auxiliary operation method according to any one of claims 1 to 13, wherein acquiring, with user authorization, the image of the software's user interface comprises: with the user's authorization, acquiring screen images periodically or in response to the user's instruction, wherein the screen images include the image of the software's user interface.

15. An electronic device, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the auxiliary operation method according to any one of claims 1 to 14.

16. A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the auxiliary operation method according to any one of claims 1 to 14.

17. A computer program product which, when run on a computer, causes the computer to perform the auxiliary operation method according to any one of claims 1 to 14.

18. A computer program, comprising instructions which, when executed by a processor, cause the processor to perform the auxiliary operation method according to any one of claims 1 to 14.