Information processing method and apparatus, smart glasses, computer device, and medium

By combining smart glasses and smartwatches with voice and motion recognition technology, a multi-dimensional communication dialog box is formed, which solves the problems of omission and misunderstanding in dialogue in noisy environments, and achieves more accurate information recognition and personalized services.

WO2026129443A1 · PCT designated stage · Publication Date: 2026-06-25 · HANGZHOU QIUGUOJIHUA TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HANGZHOU QIUGUOJIHUA TECHNOLOGY CO LTD
Filing Date
2025-01-08
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

In noisy environments, the accuracy of voice information collection and recognition by smart glasses and smartwatches decreases, leading to omissions or misunderstandings in conversations and an inability to effectively recognize the voice and body language of users and other parties.

Method used

The system receives voice information from the first user through smart glasses and voice information from the second user transmitted through a smartwatch. It also combines this with image sensor data to obtain the second user's action information. The system performs comprehensive recognition and processing to form a multi-dimensional communication dialog box.

Benefits of technology

It improves the accuracy and comprehensiveness of dialogue content recognition, enhances user understanding and experience of dialogue, supports cross-language communication and personalized prompting strategies, and improves user interaction experience and satisfaction.

Abstract

The present application relates to the field of data processing, and specifically relates to an information processing method and apparatus, smart glasses, a computer device, and a medium. A first user wears the smart glasses and a smartwatch. The method comprises: receiving first voice information sent by the first user; receiving second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user; acquiring action information of the second user by means of an image sensor; and performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on a display interface of the smart glasses. By means of the method, the conversation content of the users is recognized accurately and effectively, so that the users can accurately know what was said in the conversation.

Description

An information processing method, apparatus, smart glasses, computer equipment, and medium

[0001] Cross-reference to related applications

[0002] This application claims priority to Chinese Patent Application No. 2024118865940, filed on December 20, 2024, entitled "An Information Processing Method, Apparatus, Smart Glasses, Computer Equipment and Medium", the entire contents of which are incorporated herein by reference.

Technical Field

[0003] This application relates to the field of data processing, and more specifically, to an information processing method, apparatus, smart glasses, computer equipment, and medium.

Background Art

[0004] Smart glasses, as wearable devices that integrate the functions of traditional glasses and modern smart devices, have made significant technological progress in recent years. Equipped with advanced technologies such as miniature displays, cameras, and sensors, they offer users a variety of functions including information display, augmented reality (AR), photography, video recording, and navigation. Meanwhile, smartwatches, with their low power consumption and portability, provide accurate data and guidance for sports and health monitoring. With the rapid development of the Internet of Things (IoT) and wireless communication technologies, data interaction and collaborative work between smart glasses and smartwatches have become possible. This collaborative work not only enhances the user experience but also plays a crucial role in multi-user communication scenarios.

[0005] Typically, when a user wears smart glasses and a smartwatch, the smartwatch will collect the user's or the other party's voice information through its built-in microphone and other voice acquisition devices. After recognizing the voice information, the smartwatch will send the recognized text content to the smart glasses for display to the user, in order to assist the user in conversation.

[0006] However, research has found that the transmission and collection of voice information are affected by environmental factors. If a user is in a noisy environment, the smartwatch cannot accurately collect the voice information of the conversation, which lowers the accuracy of the recognized text and, in turn, the effectiveness and accuracy of the conversation content the user receives through the smart glasses. In addition, people often communicate not only through language but also through non-verbal expressions such as body language and facial expressions. If only the voice information of the user or the other party is collected and displayed, viewpoints expressed through body language may be missed, leading to omissions or misunderstandings of the conversation content and reducing the effectiveness and accuracy of conversation content recognition.

Summary of the Invention

[0007] In view of this, the purpose of this application is to provide an information processing method, device, smart glasses, computer equipment and medium to accurately and effectively identify the content of user dialogue, so that the user can accurately know the content of the dialogue.

[0008] In a first aspect, embodiments of this application provide an information processing method applied to smart glasses, wherein a first user wears the smart glasses and a smartwatch, and the method includes:

[0009] Receiving first voice information sent by the first user;

[0010] Receiving second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user;

[0011] Acquiring action information of the second user through an image sensor; and

[0012] Performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses.

[0013] Optionally, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes:

[0014] The semantics corresponding to the first voice information, the second voice information, and the action information are arranged in chronological and semantic order to form the communication dialog box.

[0015] Optionally, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes:

[0016] Determining whether the time interval of the second voice information overlaps with the time interval of the action information; and

[0017] In response to the time intervals of the second voice information and the action information not overlapping, determining the semantic information corresponding to the action information as the content in the communication dialog box.

[0018] Optionally, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes:

[0019] In response to the time intervals of the second voice information and the action information overlapping, determining an approximate value between the second voice information and the action information; and

[0020] In response to the approximate value being greater than a preset value, determining the semantic content corresponding to the second voice information as the content in the communication dialog box.

[0021] Optionally, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes:

[0022] In response to the time intervals of the second voice information and the action information overlapping, determining an approximate value between the second voice information and the action information;

[0023] In response to the approximate value being less than a preset value, determining first semantic content corresponding to the second voice information and determining second semantic content corresponding to the action information; and

[0024] Merging the first semantic content and the second semantic content, and determining the merged semantic information as the content of the communication dialog box.

[0025] Optionally, after receiving the first voice information sent by the first user, the method further includes:

[0026] The first voice information is translated and sent to the smartwatch so that the smartwatch can display or play the translated first voice information.

[0027] Optionally, the method further includes: determining the intent of the first user in the communication dialog box, and determining a strategy to prompt the first user based on the intent of the first user.

[0028] Optionally, determining a strategy for prompting the first user based on the first user's intent includes:

[0029] Determining whether the strategy is related to the smartwatch; and

[0030] If the strategy is related to the smartwatch, sending the strategy to the smartwatch.
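The routing decision in [0029] and [0030] can be illustrated with a minimal Python sketch. The text does not say what makes a strategy "related to the smartwatch", so this sketch assumes each strategy carries a hypothetical `devices` set naming its target devices. `PromptStrategy`, `route_strategy`, and the two callbacks are illustrative names only, and showing glasses-targeted strategies locally is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

@dataclass
class PromptStrategy:
    kind: str      # hypothetical label, e.g. "haptic_reminder"
    message: str   # text of the prompt for the first user
    # Hypothetical set of target devices; "watch" marks a watch-related strategy.
    devices: Set[str] = field(default_factory=lambda: {"glasses"})

def route_strategy(strategy: PromptStrategy,
                   send_to_watch: Callable[[PromptStrategy], None],
                   show_on_glasses: Callable[[PromptStrategy], None]) -> bool:
    """Send watch-related strategies to the watch; return True if one was sent."""
    related_to_watch = "watch" in strategy.devices
    if related_to_watch:
        send_to_watch(strategy)  # stands in for the glasses-to-watch link
    if "glasses" in strategy.devices:
        # Assumption: strategies targeting the glasses are shown locally.
        show_on_glasses(strategy)
    return related_to_watch

# Usage: a vibration reminder is routed to the watch only.
sent, shown = [], []
reminder = PromptStrategy("haptic_reminder", "Your turn to reply", {"watch"})
route_strategy(reminder, sent.append, shown.append)
```

Passing the transport in as callbacks keeps the decision logic independent of whichever wireless protocol links the two devices.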

[0031] In a second aspect, embodiments of this application provide an information processing device applied to smart glasses, wherein a first user wears the smart glasses and a smartwatch, and the device includes:

[0032] The first voice information receiving module is configured to receive the first voice information sent by the first user.

[0033] The second voice information receiving module is configured to receive second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user;

[0034] The action information acquisition module is configured to acquire the action information of the second user through an image sensor;

[0035] The communication dialog box display module is configured to recognize and process the first voice information, the second voice information, and the action information, and form a communication dialog box on the display interface of the smart glasses.

[0036] In a third aspect, embodiments of this application provide smart glasses which, when running, perform the steps of the information processing method described in any of the optional embodiments of the first aspect above.

[0037] In a fourth aspect, embodiments of this application provide a computer device, including: a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device is running, the processor communicates with the memory via the bus. When the machine-readable instructions are executed by the processor, they perform the steps of the information processing method described in any of the optional embodiments of the first aspect above.

[0038] In a fifth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the information processing method described in any of the optional embodiments of the first aspect.

[0039] The technical solution provided in this application includes, but is not limited to, the following beneficial effects:

[0040] In this application, when a first user wears smart glasses and a smartwatch, the smart glasses receive first voice information sent by the first user and second voice information sent by the smartwatch, the latter having been input to the smartwatch by the second user, so that the voice information of both parties in the dialogue is collected in real time and effectively. An image sensor then acquires the second user's action information, capturing information such as body movements in real time and providing multi-dimensional, multi-form data support for subsequent recognition of the dialogue content. Finally, the first voice information, the second voice information, and the action information are processed to form a communication dialog box on the display interface of the smart glasses. Because the dialog box is formed by considering both voice and action information, information is recognized more comprehensively and the recognition results are more accurate and effective, so that the user's dialogue content is recognized accurately and the user can accurately understand it.

[0041] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Brief Description of the Drawings

[0042] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0043] Figure 1 shows a flowchart of an information processing method provided in an embodiment of this application;

[0044] Figure 2 shows a flowchart of a method for forming a communication dialog box according to an embodiment of this application;

[0045] Figure 3 shows a flowchart of a specific method for forming a communication dialog box according to an embodiment of this application;

[0046] Figure 4 shows a flowchart of a second specific method for forming a communication dialog box provided in an embodiment of this application;

[0047] Figure 5 shows a flowchart of a user prompt strategy determination method provided in an embodiment of this application;

[0048] Figure 6 shows a flowchart of a translation and real-time response method based on smart glasses and a smartwatch provided in an embodiment of this application;

[0049] Figure 7 shows a schematic diagram of the structure of an information processing device provided in an embodiment of this application;

[0050] Figure 8 shows a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Description of Embodiments

[0051] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0052] To facilitate understanding of this application, the embodiments of this application are described in detail below with reference to Figure 1, which shows a flowchart of an information processing method provided in an embodiment of this application.

[0053] Referring to Figure 1, which shows a flowchart of an information processing method provided in an embodiment of this application, the method is applied to smart glasses. A first user wears the smart glasses and a smartwatch. The method includes steps S101 to S104:

[0054] S101: Receive the first voice information sent by the first user.

[0055] Specifically, the first user sends voice information through the smart glasses' built-in microphone or other voice input devices.

[0056] S102: Receive second voice information sent by the smartwatch, wherein the voice information is input to the smartwatch by the second user.

[0057] Specifically, the second user inputs voice information through the microphone of the smartwatch, and this information is then sent to the smart glasses.

[0058] S103: Obtain the action information of the second user through the image sensor.

[0059] Specifically, the image sensor (such as a camera) of the smart glasses captures the second user's actions, such as gestures and facial expressions.

[0060] S104: The first voice information, the second voice information, and the action information are recognized and processed to form a communication dialog box on the display interface of the smart glasses.

[0061] Specifically, the processor of the smart glasses performs speech recognition on the received first and second voice information and, at the same time, performs action recognition on the action information acquired by the image sensor. The recognition results are integrated into a dialog box containing text, voice, and action information, which is then displayed on the interface of the smart glasses.

[0062] Through steps S101 to S104, when the first user wears the smart glasses and the smartwatch, the smart glasses receive the first voice information sent by the first user and the second voice information sent by the smartwatch, which the second user input to the smartwatch, so that the voice information of both parties in the dialogue is collected in real time and effectively. The image sensor then acquires the second user's action information, collecting body movements and other information from the dialogue participants in real time and providing multi-dimensional, multi-form data support for subsequent recognition of the dialogue content. Finally, the first voice information, the second voice information, and the action information are processed to form a communication dialog box on the display interface of the smart glasses. Because both voice and action information are taken into account, information is recognized more comprehensively and the recognition results are more accurate and effective, so that the user's dialogue content is recognized accurately and the user can accurately understand it.

[0063] In an optional implementation, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes:

[0064] The semantics corresponding to the first voice information, the second voice information, and the action information are arranged in chronological and semantic order to form the communication dialog box.

[0065] Specifically, to form the communication dialog box on the smart glasses from the first voice information, the second voice information, and the action information, this information needs to be arranged in chronological and semantic order. For voice information, a speech recognition algorithm converts the speech into text. For action information, an action recognition algorithm converts the image data corresponding to the action information into an understandable action description (such as "nodding" or "waving"). The voice text and action descriptions are then arranged from first to last according to their semantic order to form the communication dialog box.

[0066] In some preferred embodiments, action descriptions can be displayed in the communication dialog box as emoticons or emojis; this application does not impose any particular limitation on this. When the voice and action information are arranged according to the semantic order of the voice text and the semantics of the action descriptions, their basic order is first determined from the timestamp of each message, which gives a preliminary timeline for the subsequent semantic adjustment. Semantic analysis is then performed on the recognized voice text and action descriptions to identify key elements of the dialogue, such as questions, answers, instructions, and responses. Based on the results of this analysis, the preliminary timeline is adjusted so that the information follows the natural order of the conversation and the dialogue reads fluently and logically.

[0067] Based on chronological and semantic order, the recognized speech text and action icons are arranged in the dialog box according to the natural flow of the conversation. When new information is added, the dialog box should automatically scroll or adjust its layout to accommodate it.
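The two-pass ordering described above (a timestamp timeline, then a semantic adjustment) can be sketched in Python. This is a minimal sketch under stated assumptions: the semantic analysis is not specified, so each entry is assumed to carry a hypothetical `reply_to` field naming the entry it answers, and the only adjustment made is that a reply never appears before what it answers (for example, a nod that was timestamped slightly before the question it responds to). `DialogEntry` and `arrange_dialog` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class DialogEntry:
    entry_id: str
    speaker: str                    # "first_user" or "second_user"
    kind: str                       # "voice" or "action"
    start: float                    # timestamp in seconds
    text: str                       # recognized text or action description
    reply_to: Optional[str] = None  # hypothetical semantic-analysis result

def _place(entry, ordered, placed, pending):
    """Append an entry, then any replies that were waiting for it."""
    ordered.append(entry)
    placed.add(entry.entry_id)
    for reply in pending.pop(entry.entry_id, []):
        _place(reply, ordered, placed, pending)

def arrange_dialog(entries: List[DialogEntry]) -> List[DialogEntry]:
    # Pass 1: preliminary timeline from timestamps (stable sort).
    timeline = sorted(entries, key=lambda e: e.start)
    known = {e.entry_id for e in timeline}
    ordered: List[DialogEntry] = []
    placed: Set[str] = set()
    pending: Dict[str, List[DialogEntry]] = {}
    # Pass 2: semantic adjustment; a reply is held back until its target is placed.
    for e in timeline:
        if e.reply_to in known and e.reply_to not in placed:
            pending.setdefault(e.reply_to, []).append(e)
        else:
            _place(e, ordered, placed, pending)
    # Entries still waiting (e.g. cyclic reply links) keep timestamp order.
    ordered.extend(e for e in timeline if e.entry_id not in placed)
    return ordered

entries = [
    DialogEntry("q1", "first_user", "voice", 2.3, "Shall we meet at three?"),
    DialogEntry("a1", "second_user", "action", 2.1, "nodding", reply_to="q1"),
    DialogEntry("g1", "second_user", "voice", 0.5, "Hello!"),
]
print([e.text for e in arrange_dialog(entries)])
# ['Hello!', 'Shall we meet at three?', 'nodding']
```

The timestamp pass alone would show the nod before the question; the adjustment moves it after, which matches the natural flow of the conversation.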

[0068] In the above steps, the semantics corresponding to the first voice information, the second voice information, and the action information are arranged in chronological and semantic order to form a communication dialog box, which helps to enhance the coherence of the dialogue, improve the user experience, and promote effective communication.

[0069] In an optional implementation, referring to Figure 2, which shows a flowchart of a method for forming a communication dialog box according to an embodiment of this application, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes steps S201 to S202:

[0070] S201: Determine whether the time interval of the second voice information overlaps with the time interval of the action information.

[0071] Specifically, during a dialogue, the parties may use body language and speech together, or they may use only body language or head movements while remaining silent. To handle these scenarios differently, when the first voice information, the second voice information, and the action information are processed to form the communication dialog box on the smart glasses, the temporal relationship between these pieces of information must be determined; this relationship decides whether and how each piece is included in the dialogue content, so that a suitable display strategy is applied to each scenario.

[0072] For each piece of information (the first voice information, the second voice information, and the action information), its start time and end time are obtained, and these define its time interval. It is then determined whether the time interval of the second voice information overlaps with, that is, intersects, the time interval of the action information.
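The overlap test in S201 reduces to a standard interval-intersection check, sketched here in Python. The sketch assumes closed intervals in seconds, so two intervals that merely touch at an endpoint count as overlapping; whether the method treats that boundary case this way is not stated, and `TimeInterval` and `intervals_overlap` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeInterval:
    start: float  # start time in seconds
    end: float    # end time in seconds

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end time must not precede start time")

def intervals_overlap(a: TimeInterval, b: TimeInterval) -> bool:
    """True if the closed intervals [a.start, a.end] and [b.start, b.end] intersect."""
    return a.start <= b.end and b.start <= a.end

speech = TimeInterval(1.0, 3.5)   # second voice information
gesture = TimeInterval(3.0, 4.0)  # action information
print(intervals_overlap(speech, gesture))  # True
```

Switching to half-open intervals would only require replacing `<=` with `<` in both comparisons.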

[0073] S202: In response to the time intervals of the second voice information and the action information not overlapping, determine the semantic information corresponding to the action information as the content in the communication dialog box.

[0074] Specifically, if the time intervals of the second voice information and the action information do not overlap, that is, the speech and the action do not occur simultaneously, the semantic information corresponding to the action information is used on its own as content in the communication dialog box.

[0075] In some specific embodiments of this application, the action corresponding to the action information is recognized using computer vision techniques, such as 3D convolutional neural networks (3D CNNs) or skeleton-based action recognition models such as spatial-temporal graph convolutional networks (ST-GCN), and the semantic information corresponding to the action information is then determined from the recognized action.

[0076] This application captures and processes voice and motion information in real time through steps S201 to S202. Based on the overlap of the time intervals of voice and motion information, the content displayed on the smart glasses' display interface is adjusted accordingly, which can improve the efficiency and accuracy of communication and enhance the user experience.

[0077] In an optional implementation, referring to Figure 3, which shows a flowchart of a specific method for forming a communication dialog box provided by an embodiment of this application, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes steps S301 to S302:

[0078] S301: In response to the overlap of the time intervals of the second voice information and the action information, determine an approximate value for the second voice information and the action information.

[0079] Specifically, when the time intervals of the second voice information and the action information overlap, that is, when the second user is speaking and performing an action at the same time, the semantic information corresponding to the action information is recognized using computer vision techniques, and the semantic information corresponding to the second voice information is determined using a speech recognition algorithm. The approximate value between the semantic information of the second voice information and that of the action information is then calculated with a pre-configured approximation algorithm. This approximate value reflects how strongly the two are correlated.

[0080] In some specific implementations, the approximate value is computed with a semantic-understanding-based method; for example, it can be calculated based on a semantic dictionary or on semantic expressions of the two pieces of content.
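The pre-configured approximation algorithm is not specified, so the following Python sketch swaps in one simple dictionary-based option: each word is mapped to a concept through a small semantic dictionary, and the approximate value is the Jaccard similarity of the two concept sets. The dictionary entries and the names `SEMANTIC_DICT`, `to_concepts`, and `approximate_value` are hypothetical; a deployed system would more likely use a full lexical resource or learned embeddings.

```python
import re
from typing import Set

# Hypothetical miniature semantic dictionary mapping words to concept labels.
SEMANTIC_DICT = {
    "yes": "AFFIRM", "ok": "AFFIRM", "sure": "AFFIRM", "nodding": "AFFIRM",
    "no": "NEGATE", "nope": "NEGATE", "shaking": "NEGATE",
    "hello": "GREET", "hi": "GREET", "waving": "GREET",
}

def to_concepts(text: str) -> Set[str]:
    """Map each word to its dictionary concept; unknown words stay as themselves."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SEMANTIC_DICT.get(word, word) for word in words}

def approximate_value(voice_semantics: str, action_semantics: str) -> float:
    """Jaccard similarity of the two concept sets, in [0, 1]."""
    a, b = to_concepts(voice_semantics), to_concepts(action_semantics)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(approximate_value("Sure!", "nodding"))       # 1.0
print(approximate_value("Hello there", "waving"))  # 0.5
print(approximate_value("No", "nodding"))          # 0.0
```

A spoken "sure" accompanied by a nod scores high (the two sources agree), whereas "no" with a nod scores zero, which is the case where separate treatment of the two sources becomes relevant.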

[0081] S302: In response to the approximate value being greater than a preset value, the semantic content corresponding to the second voice information is determined as the content in the communication dialog box.

[0082] Specifically, if this approximate value is greater than the preset value, it indicates that there is a strong correlation between the second voice information and the action information. In this case, the semantic content corresponding to the second voice information is determined as the content in the communication dialog box.

[0083] This application determines the content in the communication dialog box by simultaneously considering the overlap of the time intervals of the second voice information and the action information in steps S301 to S302 and calculating the approximate value between them. This can improve the accuracy of communication understanding, enhance the user experience, and help promote barrier-free communication.

[0084] In an optional implementation, referring to Figure 4, which shows a flowchart of a second specific method for forming a communication dialog box provided in an embodiment of this application, performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on the display interface of the smart glasses includes steps S401 to S403:

[0085] S401: In response to the overlap of the time intervals of the second voice information and the action information, determine an approximate value for the second voice information and the action information.

[0086] Specifically, when the time intervals of the second voice information and the action information overlap, that is, when the second user is speaking and performing an action at the same time, the semantic information corresponding to the action information is recognized using computer vision techniques, and the semantic information corresponding to the second voice information is determined using a speech recognition algorithm. The approximate value between the semantic information of the second voice information and that of the action information is then calculated with a pre-configured approximation algorithm. This approximate value reflects how strongly the two are correlated.

[0087] In some specific implementations, the approximate value is computed with a semantic-understanding-based method; for example, it can be calculated based on a semantic dictionary or on semantic expressions of the two pieces of content.

[0088] S402: In response to the approximate value being less than a preset value, determine the first semantic content corresponding to the second voice information, and determine the second semantic content corresponding to the action information.

[0089] Specifically, if the approximate value is less than the preset value, the correlation or synchronicity between the second voice information and the action information is weak, or the two convey independent information. In this case, the first semantic content corresponding to the second voice information and the second semantic content corresponding to the action information are determined separately.

[0090] S403: Merge the first semantic content and the second semantic content, and determine the merged semantic information as the content of the communication dialog box.

[0091] Specifically, the first and second semantic content are fused together to combine voice and action information at the semantic level. Combining these two information sources, the fused semantic information will more accurately reflect the user's communication intent and more comprehensively express the user's intent or communication content.

[0092] Fusing the first semantic content and the second semantic content includes: extracting key features from the first semantic content and from the second semantic content, where the key features include words, phrases, grammatical structures, action types, and action parameters; and fusing the extracted features with a feature fusion algorithm (such as weighted averaging, max pooling, or an attention mechanism) to obtain the fused semantic information.
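The three fusion options named above can be sketched in plain Python, assuming the key features of each semantic content have already been encoded as equal-length numeric vectors (the encoding itself is not specified). The default `voice_weight` of 0.6 and the per-source `scores` are hypothetical parameters, and `attention_fusion` is a simplified softmax weighting over the sources rather than a full query-key attention layer.

```python
import math
from typing import List, Sequence

def _check_same_dim(*feats: Sequence[float]) -> None:
    if len({len(f) for f in feats}) != 1:
        raise ValueError("all feature vectors must have the same dimension")

def weighted_average_fusion(voice_feat: Sequence[float], action_feat: Sequence[float],
                            voice_weight: float = 0.6) -> List[float]:
    """Element-wise convex combination of the voice and action features."""
    _check_same_dim(voice_feat, action_feat)
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must lie in [0, 1]")
    action_weight = 1.0 - voice_weight
    return [voice_weight * v + action_weight * a for v, a in zip(voice_feat, action_feat)]

def max_pooling_fusion(voice_feat: Sequence[float], action_feat: Sequence[float]) -> List[float]:
    """Keep the stronger activation in each feature dimension."""
    _check_same_dim(voice_feat, action_feat)
    return [max(v, a) for v, a in zip(voice_feat, action_feat)]

def attention_fusion(feats: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Weight each source by softmax(scores), then sum the weighted vectors."""
    if not feats or len(feats) != len(scores):
        raise ValueError("need one score per non-empty list of feature vectors")
    _check_same_dim(*feats)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    return [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(len(feats[0]))]
```

Weighted averaging fixes the balance between voice and action in advance, whereas the softmax variant lets a per-utterance score (for example, a recognition confidence) shift that balance.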

[0093] This application, through steps S401 to S403, simultaneously considers the overlap of time intervals of the second voice information and action information and their approximate values, and determines and fuses semantic content accordingly. This enables a more accurate understanding of the user's communication intent and improves the performance of the smart glasses system in terms of communication comprehension, user experience, and diversity of communication methods.
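Taken together, S201 to S202, S301 to S302, and S401 to S403 form one decision procedure for the second user's voice and action, sketched below in Python. The sketch uses a closed-interval overlap test, and the `approx` and `merge` callables stand in for the unspecified approximation and fusion algorithms. The default `threshold` of 0.5 is an arbitrary placeholder for the preset value. The method does not state what happens when the approximate value equals the preset value; this sketch assumes that case follows the "less than" branch. `Segment` and `dialog_content` are hypothetical names, and in the non-overlapping case the function returns only the content derived from the action, as S202 describes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Segment:
    start: float    # seconds
    end: float      # seconds
    semantics: str  # recognized semantic content

def dialog_content(voice: Segment, action: Segment,
                   approx: Callable[[str, str], float],
                   threshold: float = 0.5,
                   merge: Callable[[str, str], str] = lambda v, a: f"{v} [{a}]") -> str:
    overlap = voice.start <= action.end and action.start <= voice.end
    if not overlap:
        # S202: speech and action are not simultaneous; the action stands alone.
        return action.semantics
    value = approx(voice.semantics, action.semantics)  # S301 / S401
    if value > threshold:
        # S302: strongly correlated, so the speech already carries the meaning.
        return voice.semantics
    # S402-S403: weakly correlated, so both contents are merged.
    return merge(voice.semantics, action.semantics)

voice = Segment(0.0, 1.0, "yes")
print(dialog_content(voice, Segment(0.5, 1.5, "nodding"), approx=lambda v, a: 0.1))
# yes [nodding]
```

The default `merge` simply appends the action in brackets; a real system would substitute the feature-level fusion described in [0092].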

[0094] In an optional implementation, after receiving the first voice information sent by the first user, the method further includes:

[0095] The first voice information is translated and sent to the smartwatch so that the smartwatch can display or play the translated first voice information.

[0096] Specifically, the first voice information is recognized as text, and the recognized text is translated from its original language into a target language that the smartwatch user can understand. Machine translation technology, such as a deep-learning-based neural machine translation model, can be used for this step. The translated text is then sent to the smartwatch. Upon receiving it, the smartwatch can either display the text on its screen or play the translated content as audio via TTS (text-to-speech) technology, depending on its settings or user preferences.
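The glasses-to-watch hand-off described above can be sketched in Python. The translation engine and the wireless link are not specified, so they are passed in as hypothetical callables (`translate`, `send`). `WatchMessage`, `glasses_send_translation`, and `watch_handle` are illustrative names, and the `"display"` and `"tts"` preference strings are assumptions. The toy dictionary translator in the usage lines is for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WatchMessage:
    source_lang: str
    target_lang: str
    text: str  # translated text

def glasses_send_translation(recognized_text: str, source_lang: str, target_lang: str,
                             translate: Callable[[str, str, str], str],
                             send: Callable[[WatchMessage], None]) -> WatchMessage:
    """Translate already-recognized speech text and push it to the watch."""
    translated = translate(recognized_text, source_lang, target_lang)
    message = WatchMessage(source_lang, target_lang, translated)
    send(message)  # stands in for the wireless link to the smartwatch
    return message

def watch_handle(message: WatchMessage, preference: str,
                 display: Callable[[str], None], speak: Callable[[str], None]) -> None:
    """Watch side: read the text aloud via a TTS engine, or show it on screen."""
    if preference == "tts":
        speak(message.text)
    else:
        display(message.text)

# Usage with a toy dictionary "translator" (illustrative only).
outbox, shown, spoken = [], [], []
toy_translate = lambda text, src, dst: {"Bonjour": "Hello"}.get(text, text)
msg = glasses_send_translation("Bonjour", "fr", "en", toy_translate, outbox.append)
watch_handle(msg, "display", shown.append, spoken.append)
```

Keeping the display-or-speak choice on the watch side mirrors the description, where the watch decides based on its own settings or user preferences.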

[0097] The above steps translate the first voice information and send it to the smartwatch for display or playback, providing users with instant translation and facilitating cross-language communication. Furthermore, because the smartwatch is a personal device that receives and displays translated information immediately, it offers users a convenient way to access information: users do not need to frequently check their phones or wait for translation results, which improves the efficiency and convenience of communication.

[0098] In an optional implementation, the method further includes:

[0099] Determine the intent of the first user in the communication dialog box, and determine a strategy to prompt the first user based on the first user's intent.

[0100] Specifically, determining the intent of the first user in the communication dialog box and determining a strategy to prompt the first user based on the first user's intent includes the following steps:

[0101] Step 1: Extract all text, voice, and action semantic information from the communication dialog box. Step 2: Clean the extracted data, removing irrelevant information such as redundant words and characters. Step 3: Use natural language understanding techniques, such as intent classification models, to identify the user's intent. Step 4: Based on the identified intent, analyze the user's needs, expectations, and potential points of confusion. Step 5: Based on the analysis of the user's intent, develop one or more prompting strategies. These strategies include providing additional information, clarifying questions, guiding the user to the next step, and recommending relevant options.
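Steps 2, 3, and 5 can be sketched as a small pipeline. This is an illustrative sketch only: it swaps the intent classification model for hypothetical keyword rules, treats Step 1 as already done (the input list holds the extracted text, voice, and action semantics), and omits the needs analysis of Step 4. The filler words, intent names, and strategy labels are all assumptions.

```python
import re
from typing import Dict, List

# Step 2: hypothetical filler words treated as redundant
FILLERS = {"um", "uh", "er", "hmm"}

# Step 3: hypothetical keyword rules standing in for a trained intent classifier
INTENT_RULES: Dict[str, List[str]] = {
    "book_hotel": ["book", "hotel"],
    "change_password": ["change", "password"],
    "find_class": ["class", "child"],
}

# Step 5: hypothetical mapping from intent to one of the four strategy types
STRATEGY_FOR_INTENT: Dict[str, str] = {
    "book_hotel": "provide_additional_information",
    "change_password": "guide_next_step",
    "find_class": "recommend_options",
}


def clean(items: List[str]) -> List[str]:
    """Step 2: lower-case, tokenize, and drop filler words."""
    tokens: List[str] = []
    for item in items:
        tokens.extend(t for t in re.findall(r"[a-z']+", item.lower()) if t not in FILLERS)
    return tokens


def classify_intent(tokens: List[str]) -> str:
    """Step 3: return the first intent whose keywords all appear, else 'unknown'."""
    present = set(tokens)
    for intent, keywords in INTENT_RULES.items():
        if all(k in present for k in keywords):
            return intent
    return "unknown"


def choose_strategy(intent: str) -> str:
    """Step 5: unrecognized intents fall back to asking a clarifying question."""
    return STRATEGY_FOR_INTENT.get(intent, "clarify_question")
```

Falling back to a clarifying question for unrecognized intents mirrors the idea that ambiguous requests should be resolved before any further guidance is offered.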

[0102] To better illustrate the above steps, here are some examples of possible prompting strategies, which can be flexibly adjusted according to different user intentions and scenarios:

[0103] Example 1: Providing Additional Information Strategy. Used when the user's request lacks the details needed to act on it. User: "I want to book a hotel." Prompt Strategy: "To recommend a suitable hotel, when would you like to check in, and what is your budget range?"

[0104] Example 2: Clarifying the Question Strategy. When a user's question is ambiguous or unclear. User: "How do I use this product?" Prompt Strategy: "Do you mean the installation method or the usage method? If it's the usage method, please tell me which specific part of the operation you want to know."

[0105] Example 3: User Guidance Strategy. When a user needs to perform a series of operations but doesn't know how to do it. User: "I want to change my password." Prompt Strategy: "To change your password, please click the 'Account Settings' button, then select the 'Change Password' option, and follow the prompts to enter your new password."

[0106] Example 4: Recommended Options Strategy. Used when the user faces multiple options and does not know which to choose. User: "I want to find an extracurricular class suitable for my child." Prompt Strategy: "Based on your needs, we recommend the following extracurricular classes: painting, dance, and science experiments. You can choose according to your child's interests and strengths."

[0107] The above-mentioned prompting strategies can help users complete their tasks more efficiently and optimize the user experience. In practical applications, these strategies can be flexibly adjusted and optimized according to specific user needs and scenarios.

[0108] This application, through the aforementioned steps, accurately identifies user intent and formulates corresponding prompting strategies, enabling the provision of more personalized and considerate services to users, thus helping to improve user satisfaction and loyalty. Simultaneously, the prompting strategies guide users to perform correct operations, reducing communication barriers caused by insufficient information or misunderstandings, and helping to accelerate communication and improve the interactive experience.

[0109] In an optional implementation, referring to Figure 5, Figure 5 shows a flowchart of a user prompting strategy determination method provided by an embodiment of this application, wherein determining the strategy for prompting the first user based on the first user's intent includes steps S501 to S502:

[0110] S501: Determine whether the strategy is related to the smartwatch.

[0111] Specifically, after formulating the prompting strategy, first check whether the strategy contains information related to the smartwatch's functions, settings, applications, or interactions, and confirm whether the smartwatch has the necessary functions and technical capabilities to display or execute the strategy.

[0112] S502: If the strategy is related to the smartwatch, send the strategy to the smartwatch.

[0113] Specifically, if the strategy is related to the smartwatch and the smartwatch is able to receive and execute it, the strategy is sent to the smartwatch. Upon receiving the strategy, the smartwatch displays it on its screen or notifies the user in other ways (such as through voice prompts).
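Steps S501 and S502 can be sketched as a relevance check followed by a conditional send. This is a sketch under stated assumptions: the capability names, the `Strategy.needs` field, and the `send_to_watch` callback are hypothetical, standing in for whatever capability report and transport the devices actually use.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass
class Strategy:
    message: str
    # Watch features the strategy needs; an empty set means it is not watch-related
    needs: FrozenSet[str] = frozenset()


# Hypothetical capability set reported by the paired smartwatch
WATCH_CAPABILITIES: FrozenSet[str] = frozenset({"display", "vibration", "voice_prompt"})


def is_watch_related(strategy: Strategy, capabilities: FrozenSet[str]) -> bool:
    """S501: the strategy involves the watch and the watch supports every needed feature."""
    return bool(strategy.needs) and strategy.needs <= capabilities


def route_strategy(strategy: Strategy, capabilities: FrozenSet[str],
                   send_to_watch: Callable[[Strategy], None]) -> bool:
    """S502: forward watch-related strategies; return True if one was sent."""
    if is_watch_related(strategy, capabilities):
        send_to_watch(strategy)
        return True
    return False
```

Checking both conditions in S501 means a strategy that needs an unsupported feature (for example a camera) is never sent, which matches the stated goal of avoiding wasted transmissions.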

[0114] For example, in navigation scenarios, users can be alerted not only through information displayed on the smart glasses but also through vibrations from the smartwatch. In a noisy, crowded area, displaying directions on the smart glasses is unsuitable because it would obstruct the user's view, and sound alerts are also inappropriate because an alert loud enough to be heard over the noise could damage the user's hearing. In this situation, the navigation strategy is therefore sent to the smartwatch, which vibrates at key navigation points to provide the necessary navigation guidance.
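The channel choice in the navigation example can be written as a short rule. This is a hypothetical sketch: the 70 dB threshold, the crowding flag, and the preference order (glasses display, then audio, then watch vibration) are assumptions made for illustration; the example itself only establishes that a noisy, crowded setting leads to watch vibration.

```python
def choose_navigation_cue(noise_db: float, crowded: bool, safe_audio_db: float = 70.0) -> str:
    """Pick a channel for a navigation cue following the reasoning of paragraph [0114].

    The decibel threshold and the preference order are hypothetical assumptions.
    """
    # Outside a crowded area, directions on the glasses do not obstruct the view
    if not crowded:
        return "glasses_display"
    # Below the threshold, a sound cue can be heard without dangerous volume
    if noise_db < safe_audio_db:
        return "audio"
    # Noisy and crowded: vibrate the smartwatch at key navigation points
    return "watch_vibration"
```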

[0115] This application determines the relevance of the strategy to the smartwatch through steps S501-S502, ensuring that only strategies matching the smartwatch's functions are sent. This avoids unnecessary resource waste and improves resource utilization efficiency. Furthermore, smartwatch notifications are typically more subtle and intuitive, unlike traditional mobile phone notifications that disturb the user. This helps reduce user distraction, improves focus and work efficiency, and is applicable to more scenarios.

[0116] To better illustrate the information processing method provided in this application, this application also provides a method for translation and real-time response using smart glasses and a smartwatch, based on the information processing method. Referring to Figure 6, which shows a flowchart of a translation and real-time response method based on smart glasses and a smartwatch provided in an embodiment of this application: when the wearer simultaneously wears smart glasses and a smartwatch, the wearer's voice/gesture information is received through the glasses or the smartwatch, and the voice information of the other party in the conversation is received through the smartwatch. The smart glasses capture the other party's actions and expressions through an image sensor and form a dialog box on the display interface, which shows the translated voice information of the other party together with action information that carries semantic meaning. The wearer's translated voice/gesture information is then presented through the screen or speaker of the smartwatch. Alternatively, suggestions are given based on the semantic information in the conversation, the other party's actions and expressions, and the wearer's intent; in response to the wearer executing a suggestion, a reminder strategy is determined based on the wearer's current state, and when the reminder strategy requires the participation of the smartwatch, the reminder strategy is sent to the smartwatch.

[0117] The information processing method provided in this application, by combining the functions of smart glasses and smartwatches, optimizes multimodal interaction experience, information display and dialogue management, cross-device collaboration, personalized prompting strategies, and multilingual communication, providing users with a richer, more convenient, and safer interactive experience. It can not only accurately and effectively identify the content of user dialogue, enabling users to accurately understand the content of dialogue, but also improve the user's experience and satisfaction.

[0118] Based on the same concept, this application also provides an information processing device. Figure 7 is a schematic structural diagram of an information processing device provided in an embodiment of this application. The device is applied to smart glasses, and a first user wears the smart glasses and a smartwatch. The device includes:

[0119] The first voice information receiving module 701 is configured to receive the first voice information sent by the first user.

[0120] The second voice information receiving module 702 is configured to receive second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user.

[0121] The motion information acquisition module 703 is configured to acquire the action information of the second user through an image sensor.

[0122] The communication dialog box display module 704 is configured to recognize and process the first voice information, the second voice information, and the action information, and form a communication dialog box on the display interface of the smart glasses.
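One way the communication dialog box display module 704 might assemble its entries is to arrange recognized semantics in chronological order, as claim 2 describes. This sketch covers only the chronological part; the "semantic order" mentioned in claim 2 is not specified in the source and is not modeled. The `Entry` fields and the "(gesture)" tag are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    timestamp: float  # seconds since the conversation started
    speaker: str      # e.g. "Me" (first user) or "Them" (second user)
    source: str       # "voice" or "action"
    text: str         # recognized semantics


def build_dialog_box(entries: List[Entry]) -> List[str]:
    """Order entries by time and render one dialog-box line per entry.

    sorted() is stable, so entries with equal timestamps keep their input order.
    """
    lines = []
    for e in sorted(entries, key=lambda e: e.timestamp):
        tag = " (gesture)" if e.source == "action" else ""
        lines.append(f"{e.speaker}: {e.text}{tag}")
    return lines
```

Tagging action-derived lines lets the wearer tell at a glance which parts of the other party's message came from body language rather than speech.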

[0123] Based on the same concept, this application also provides smart glasses that, when running, execute the steps of the information processing method described in any one of the above embodiments.

[0124] Based on the same concept, this application also provides a computer device. Figure 8 is a schematic structural diagram of a computer device provided in an embodiment of this application.

[0125] The computer device 800 includes a processor 801, a memory 802, and a bus 803. The memory 802 stores machine-readable instructions that can be executed by the processor 801. When the computer device 800 is running, the processor 801 communicates with the memory 802 through the bus 803. The machine-readable instructions are executed by the processor 801 to perform the steps of the information processing method shown in the above embodiment.

[0126] Based on the same concept, this application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the information processing method described in any of the above embodiments.

[0127] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

[0128] The computer program product for information processing provided in this application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the preceding method embodiments. For specific implementation details, please refer to the method embodiments, which will not be repeated here.

[0129] The information processing apparatus provided in this embodiment may be specific hardware on a device, or software or firmware installed on the device. The system provided in this embodiment has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, for any part not mentioned in the system embodiment, reference may be made to the corresponding content in the foregoing method embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

[0130] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the displayed or discussed mutual couplings, direct couplings, or communication connections may be through some communication interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.

[0131] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0132] In addition, the functional units in the embodiments provided in this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0133] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0134] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures. In addition, the terms "first", "second", "third", etc. are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0135] Finally, it should be noted that the above-described embodiments are merely specific implementations of this application, used to illustrate rather than limit its technical solutions, and the protection scope of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in this application, still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be covered within the protection scope of this application. Therefore, the protection scope of this application shall be determined by the protection scope of the claims.

Claims

1. An information processing method applied to smart glasses, wherein a first user wears the smart glasses and a smartwatch, and the method includes: receiving first voice information sent by the first user; receiving second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user; obtaining action information of the second user through an image sensor; and performing recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on a display interface of the smart glasses.

2. The method of claim 1, wherein performing recognition processing on the first voice information, the second voice information, and the action information to form the communication dialog box on the display interface of the smart glasses includes: arranging the semantics corresponding to the first voice information, the second voice information, and the action information in chronological and semantic order to form the communication dialog box.

3. The method of claim 1, wherein performing recognition processing on the first voice information, the second voice information, and the action information to form the communication dialog box on the display interface of the smart glasses includes: determining whether the time interval of the second voice information overlaps the time interval of the action information; and if the time intervals of the second voice information and the action information do not overlap, determining the semantic information corresponding to the action information as content in the communication dialog box.

4. The method of claim 3, wherein performing recognition processing on the first voice information, the second voice information, and the action information to form the communication dialog box on the display interface of the smart glasses includes: in response to the time intervals of the second voice information and the action information overlapping, determining an approximate value between the second voice information and the action information; and in response to the approximate value being greater than a preset value, determining the semantic content corresponding to the second voice information as content in the communication dialog box.

5. The method of claim 3, wherein performing recognition processing on the first voice information, the second voice information, and the action information to form the communication dialog box on the display interface of the smart glasses includes: in response to the time intervals of the second voice information and the action information overlapping, determining an approximate value between the second voice information and the action information; in response to the approximate value being less than a preset value, determining first semantic content corresponding to the second voice information and second semantic content corresponding to the action information; and fusing the first semantic content and the second semantic content, and determining the fused semantic information as content of the communication dialog box.

6. The method of claim 1, wherein after receiving the first voice information sent by the first user, the method further includes: translating the first voice information and sending the translated first voice information to the smartwatch, so that the smartwatch displays or plays the translated first voice information.

7. The method of claim 1, wherein the method further includes: determining an intent of the first user in the communication dialog box, and determining, based on the intent of the first user, a strategy for prompting the first user.

8. The method of claim 7, wherein determining the strategy for prompting the first user based on the intent of the first user includes: determining whether the strategy is related to the smartwatch; and if the strategy is related to the smartwatch, sending the strategy to the smartwatch.

9. An information processing apparatus applied to smart glasses, wherein a first user wears the smart glasses and a smartwatch, and the apparatus includes: a first voice information receiving module, configured to receive first voice information sent by the first user; a second voice information receiving module, configured to receive second voice information sent by the smartwatch, wherein the second voice information is input to the smartwatch by a second user; a motion information acquisition module, configured to acquire action information of the second user through an image sensor; and a communication dialog box display module, configured to perform recognition processing on the first voice information, the second voice information, and the action information to form a communication dialog box on a display interface of the smart glasses.

10. Smart glasses, wherein the smart glasses, when in operation, perform the steps of the information processing method according to any one of claims 1 to 7.

11. A computer device, including a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device is running, the processor communicates with the memory via the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the information processing method according to any one of claims 1 to 7.

12. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the information processing method according to any one of claims 1 to 7.