Method, apparatus, device and storage medium for virtual object control

By breaking down interactive actions into facial and sustained action phases, and combining them with body movements, the problem of unnatural digital human interaction was solved, resulting in a more realistic human-computer interaction experience.

CN116048258B · Active · Publication Date: 2026-06-12 · BEIJING ZITIAO NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING ZITIAO NETWORK TECH CO LTD
Filing Date
2022-12-30
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing digital human interaction methods lack naturalness and fail to provide an experience similar to that of real human interaction.

Method used

By splitting the interactive action into two stages, facial movements are first executed based on the statement, and then the follow-up action is executed based on emotional information, combined with body movements to enhance the realism of the interaction.

Benefits of technology

It enhances the realism and naturalness of virtual object interaction, improving the fun and satisfaction of the user experience.



Abstract

According to embodiments of the present disclosure, methods, apparatuses, devices and storage media for virtual object control are provided. The method includes obtaining a sentence to be output by a virtual object; determining emotion information associated with the sentence; controlling, based on the sentence and the emotion information, the virtual avatar to perform a first interaction action in a first time period, the first interaction action being determined based on a first facial motion corresponding to the sentence and a second facial motion corresponding to the emotion information; and controlling the virtual avatar to perform a second interaction action in a second time period after the first time period, the second interaction action including the second facial motion corresponding to the emotion information. The sense of reality of interaction with the virtual object can thereby be improved, so that the virtual object (e.g., a digital human) appears more human-like.

Description

Technical Field

[0001] The exemplary embodiments disclosed herein generally relate to the field of computers, and more particularly to methods, apparatus, devices, and computer-readable storage media for controlling virtual objects.

Background Technology

[0002] With the development of computer technology, various digital humans are now able to assist people in all aspects of daily life. For example, voice assistants can provide users with voice or text-based services based on artificial intelligence. In recent years, some digital humans with visual appearances have been able to provide not only text or voice feedback but also visual feedback, which greatly enhances the realism of interacting with digital humans.

Summary of the Invention

[0003] In a first aspect of this disclosure, a method for controlling a virtual object is provided. The method includes: acquiring a statement to be output by a virtual object; determining emotional information associated with the statement; controlling a virtual avatar to perform a first interactive action within a first time period based on the statement and the emotional information, the first interactive action being determined based on a first facial movement corresponding to the statement and a second facial movement corresponding to the emotional information; and controlling the virtual avatar to perform a second interactive action within a second time period following the first time period, the second interactive action including a second facial movement corresponding to the emotional information.

[0004] In a second aspect of this disclosure, an apparatus for controlling a virtual object is provided. The apparatus includes: an acquisition module configured to acquire a statement to be output by a virtual object; a determination module configured to determine emotional information associated with the statement; a first interaction action execution module configured to control the virtual avatar to perform a first interaction action within a first time period based on the statement and the emotional information, the first interaction action being determined based on a first facial movement corresponding to the statement and a second facial movement corresponding to the emotional information; and a second interaction action execution module configured to control the virtual avatar to perform a second interaction action within a second time period following the first time period, the second interaction action including a second facial movement corresponding to the emotional information.

[0005] In a third aspect of this disclosure, an electronic device is provided. The device includes at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. When executed by the at least one processing unit, the instructions cause the device to perform the method of the first aspect.

[0006] In a fourth aspect of this disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program that can be executed by a processor to implement the method of the first aspect.

[0007] It should be understood that the content described in this summary section is not intended to limit the key or essential features of the embodiments of this disclosure, nor is it intended to restrict the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.

Brief Description of the Drawings

[0008] The above and other features, advantages, and aspects of the embodiments of this disclosure will become more apparent from the accompanying drawings and the following detailed description. In the drawings, the same or similar reference numerals denote the same or similar elements, wherein:

[0009] Figure 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;

[0010] Figure 2 shows a flowchart of a process for controlling a virtual object according to some embodiments of the present disclosure;

[0011] Figure 3 shows a block diagram of an apparatus for controlling virtual objects according to some embodiments of the present disclosure; and

[0012] Figure 4 shows a block diagram of a device capable of implementing several embodiments of the present disclosure.

Detailed Description

[0013] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0014] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.

[0015] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.

[0016] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0017] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.

[0018] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0019] It should be noted that the headings of any section / subsection provided herein are not limiting. Various embodiments are described throughout this document, and embodiments of any type may be included under any section / subsection. Furthermore, embodiments described in any section / subsection may be combined in any way with any other embodiments described in the same section / subsection and / or different sections / subsections.

[0020] In the description of embodiments of this disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "at least partially based on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.

[0021] As used herein, the term "model" refers to a system that learns the relationship between inputs and outputs from training data, enabling it to generate corresponding outputs for a given input after training. Model generation can be based on machine learning techniques. Deep learning is a machine learning algorithm that uses multiple layers of processing units to process inputs and provide corresponding outputs. Herein, a "model" may also be referred to as a "machine learning model," a "machine learning network," or simply a "network," and these terms are used interchangeably. A model can also include different types of processing units or networks.

[0022] As used herein, a “unit,” “operation unit,” or “subunit” can consist of any suitable machine learning model or network. As used herein, a set of elements or similar expressions can include one or more such elements. For example, “a set of convolutional units” can include one or more convolutional units.

[0023] As briefly mentioned earlier, for various types of digital humans, verbal feedback (e.g., text-based or voice-based) is generally the most common form of feedback. In recent years, some digital humans with visual avatars have also gradually used visual feedback to enhance the realism of user interactions with them.

[0024] However, conventional digital humans can only perform some basic interactive actions, which makes interacting with digital humans seem unnatural and makes it difficult for users to have an experience similar to interacting with real people.

[0025] Embodiments of this disclosure propose a scheme for controlling virtual objects. According to various embodiments of this disclosure, a statement to be output by a virtual object is obtained. Emotional information associated with the statement is determined. Based on the statement and the emotional information, the virtual avatar is controlled to perform a first interactive action during a first time period, the first interactive action being determined based on a first facial movement corresponding to the statement and a second facial movement corresponding to the emotional information. The virtual avatar is controlled to perform a second interactive action during a second time period following the first time period, the second interactive action including a second facial movement corresponding to the emotional information.

[0026] Therefore, by splitting the interactive action into two stages, the embodiments of this disclosure can improve the realism of interacting with virtual objects, making virtual objects (e.g., digital humans) more human-like.

[0027] Example embodiments of this disclosure are described below with reference to the accompanying drawings.

[0028] Example Environment

[0029] Figure 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. Environment 100 may include electronic device 110. As shown in Figure 1, electronic device 110 can display virtual object 120.

[0030] In some embodiments, the virtual object 120 may be, for example, a visual representation of a digital human (also known as a digital AI). Exemplarily, some digital humans may have their own unique virtual object 120. In some embodiments, such a virtual object 120 may also be constructed or generated based on user configuration; for example, a user can customize the virtual object 120 of a digital human by adjusting its face, limbs, clothing, etc.

[0031] In some embodiments, such a virtual object 120 can interact appropriately with the user of the electronic device 110 in various ways. For example, the virtual object 120 can provide text feedback, voice feedback, visual feedback, vibration feedback, etc., through appropriate output devices of the electronic device 110. Such output devices may include, but are not limited to, screens, speakers, vibration motors, etc.

[0032] In some embodiments, the virtual object 120 can also provide corresponding interactive feedback based on the received user interaction. For example, a user can input text content, voice content, or other appropriate types of content into the electronic device 110, and accordingly, the virtual object 120 can provide corresponding interactive feedback based on the received user input.

[0033] It should be understood that, although electronic device 110 is shown in Figure 1 as a mobile terminal device, it can also take other suitable forms, such as a desktop computer, laptop computer, tablet computer, wearable device, projection device, virtual reality device, augmented reality device, mixed reality device, or large advertising screen device. In some example scenarios, electronic device 110 can also be a server device or other suitable cloud computing device, which can utilize a separate display device to present the virtual avatar 120.

[0034] Furthermore, although virtual object 120 is shown in Figure 1 as being presented on the screen of electronic device 110, depending on the specific form of electronic device 110, it may also be presented in other suitable ways, such as projection-based display or display on smart wearable devices. This disclosure is not intended to limit the specific form of electronic device 110 or the specific presentation method of virtual object 120.

[0035] As shown in Figure 1, the electronic device 110 can acquire a statement 130 to be output by the virtual object 120. For example, such a statement 130 may be determined based on received user interactions with the virtual object 120.

[0036] For example, a user can input the voice message "How are you feeling today?" into electronic device 110. Accordingly, electronic device 110 and / or other suitable computing devices can determine, based on the statement corresponding to the voice message and, for example, using natural language processing (NLP) technology, a corresponding feedback statement as statement 130, such as "The weather is nice today! I'm very happy!"

[0037] Furthermore, the electronic device 110 can cause the virtual object 120 to perform an interactive action corresponding to the statement 130. In some embodiments, such an interactive action may include facial movements 140 of the virtual object, such as smiling.

[0038] In some embodiments, if the virtual object 120 also includes limbs, the electronic device 110 may also cause the virtual object 120 to perform a limb action 150 corresponding to the statement 130, such as raising an arm.

[0039] Specific details regarding interactive actions and physical gestures will be described in detail below. It should be understood that the structure and function of environment 100 are described for illustrative purposes only and do not imply any limitation on the scope of this disclosure.

[0040] Example process

[0041] Figure 2 shows a flowchart of a process 200 for virtual object control according to some embodiments of the present disclosure. Process 200 can be implemented at electronic device 110. Process 200 is described below with reference to Figure 1.

[0042] At block 210, electronic device 110 obtains statement 130 to be output by virtual object 120.

[0043] As discussed with reference to Figure 1, the statement 130 may be determined, for example, based on the received user interaction with the virtual object 120. It should be understood that a statement is intended to represent language-level content and is not limited to its specific form of presentation; it may be presented or output, for example, through text or speech.

[0044] Taking Figure 1 as an example, a user can input voice content and / or text content into electronic device 110, with the corresponding input statement being, for example, "How are you feeling today?" Accordingly, electronic device 110 and / or other suitable computing devices can determine, based on the input statement and, for example, using natural language processing (NLP) technology, a corresponding feedback statement as statement 130, such as "The weather is nice today! I'm very happy!"

[0045] In some embodiments, such statement 130 may be generated by electronic device 110, for example. Alternatively, such statement 130 may also be generated by a remote computing device relative to electronic device 110 and sent to electronic device 110.

[0046] At block 220, electronic device 110 determines the emotional information associated with statement 130.

[0047] In some embodiments, the electronic device 110 may further generate or acquire emotional information corresponding to the statement 130. Such emotional information may, for example, indicate an emotional label corresponding to the statement 130.

[0048] In some embodiments, electronic device 110 and / or other suitable computing device may use an emotion classification model to process statement 130 to determine the emotion label corresponding to statement 130. For example, the emotion label corresponding to statement 130 may be determined as "happy".

[0049] In some embodiments, when determining the emotion label of statement 130, electronic device 110 and / or other suitable computing device may also consider other suitable factors. Such factors may include, for example, the personality settings of the digital human corresponding to virtual avatar 120, the historical interactions of virtual avatar 120 within a predetermined time period, etc. Such features may be used as feature inputs to a classification model to comprehensively determine the emotional information corresponding to statement 130.

[0050] At block 230, electronic device 110 controls virtual avatar 120 to perform a first interactive action within a first time period based on statement 130 and emotional information, wherein the first interactive action is determined based on a first facial action corresponding to the statement and a second facial action corresponding to the emotional information.

[0051] Specifically, the electronic device 110 can determine, based on the statement 130 and, for example, using a preset facial motion model, the mouth movements corresponding to the statement 130, so that the mouth movements are synchronized with the text and / or voice content output by the electronic device 110.

[0052] For example, electronic device 110 can use a text-to-speech model to convert statement 130 into speech content that matches the voice of a digital human, and at the same time determine the mouth movements synchronized with the speech content.

[0053] Furthermore, the electronic device 110 can also determine a corresponding second facial movement based on emotional information. Such a second facial movement may include, for example, an expression.

[0054] For example, if the emotional information corresponding to statement 130 is determined to be "happy", the electronic device 110 can make the face of the virtual object 120 show a grinning expression.

[0055] Therefore, the electronic device 110 can fuse such mouth movements and facial expressions to determine the interactive actions performed by the virtual object 120 within a first time period. It should be understood that the electronic device 110 can fuse these two facial movements through appropriate processes such as action combination, and this disclosure is not intended to limit the specific fusion process.
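By way of a non-limiting illustration, the fusion of mouth movements and a facial expression could be sketched as follows. The disclosure does not specify the fusion process; this sketch assumes that each facial movement is represented as a dictionary of named blend weights in the range 0 to 1, and that a conflict is resolved by keeping the larger weight. The function name `fuse_facial_motions` and all weight names are hypothetical.

```python
from typing import Dict


def fuse_facial_motions(mouth: Dict[str, float],
                        expression: Dict[str, float]) -> Dict[str, float]:
    """Combine two sets of blend weights into one facial frame.

    Assumption (not from the disclosure): where both movements drive the
    same weight, the larger value is kept, and results are clamped to [0, 1].
    """
    fused = dict(expression)
    for name, weight in mouth.items():
        fused[name] = max(fused.get(name, 0.0), weight)
    return {name: min(1.0, max(0.0, w)) for name, w in fused.items()}


# Hypothetical weights for one frame of speech and a "happy" expression.
mouth_frame = {"jaw_open": 0.6, "mouth_smile": 0.1}
happy_expression = {"mouth_smile": 0.8, "cheek_raise": 0.5}
fused_frame = fuse_facial_motions(mouth_frame, happy_expression)
```

Other combination rules, such as weighted averaging, would fit the description equally well.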

[0056] In some embodiments, the duration of such an action may be determined, for example, based on the length of statement 130 or the length of the speech content derived from statement 130. For example, if the electronic device 110 provides speech output and the speech content corresponding to statement 130 is, for example, 3 seconds, then the electronic device 110 may cause the virtual object 120 to perform an interactive action determined based on mouth movements and facial expressions within 3 seconds.

[0057] At block 240, electronic device 110 controls virtual avatar 120 to perform a second interactive action during a second time period following the first time period. The second interactive action includes a second facial movement corresponding to emotional information.

[0058] In some embodiments, to avoid the virtual avatar's emotion switching being too abrupt, the electronic device 110 may also cause the second interactive action to be performed continuously during the second time period. For example, the virtual object 120 may maintain a "grinning" facial expression for a predetermined time period after completing the voice output, thereby avoiding overly abrupt expression switching.

[0059] In some embodiments, the duration of the second time period during which the second interactive action is performed can be determined based on the confidence level of the emotional information. Specifically, the electronic device 110 determines the confidence level (also referred to as the first confidence level) associated with the emotional information.

[0060] As discussed above, electronic device 110 and / or other suitable computing devices can determine emotion labels based on an emotion classification model. It should be understood that such a classification model can also simultaneously output a confidence score corresponding to the emotion label, such confidence score being, for example, a normalized value to characterize the credibility of the corresponding emotion label.
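As a hedged illustration of how a classification model might yield a normalized confidence score alongside its label, the sketch below applies a softmax to raw per-label scores. The disclosure does not state how the score is normalized; softmax is an assumption here, and the function name `label_with_confidence` and the raw scores are hypothetical.

```python
import math
from typing import Dict, Tuple


def label_with_confidence(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the top emotion label and its softmax-normalized confidence."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total


# Hypothetical raw scores produced by an emotion classification model.
label, confidence = label_with_confidence(
    {"happy": 2.0, "helpless": 0.5, "confused": 0.1}
)
```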

[0061] Furthermore, the electronic device 110 can determine the length of the second time period based on the confidence level, such that the length of the time period is proportional to the confidence level. In some embodiments, the electronic device 110 can, for example, set a predetermined time length (also referred to as a first predetermined time length) and determine the length of the second time period based on the product of the predetermined time length and the confidence level.

[0062] In some embodiments, to enhance the realism of the interaction, the length of the second time period may, for example, take into account the length of the first time period. Specifically, the electronic device 110 determines the first time length for outputting the statement, and, based on the confidence level and the first time length, determines the second time length of the second time period, such that the second time length is proportional to the confidence level.

[0063] For example, continuing the example above, if the length of the voice content corresponding to statement 130 is 3 seconds and the confidence level of the emotion label "happy" is 80%, then electronic device 110 can determine the length of the second time period as 2.4 seconds based on the product of the length of 3 seconds and the confidence level of 80%.
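The calculation in this example can be sketched as below. The function name `second_period_length` is hypothetical; the base length may be either the first predetermined time length or the length of the first time period, as described above.

```python
def second_period_length(base_seconds: float, confidence: float) -> float:
    """Length of the second time period: base length times confidence."""
    if base_seconds < 0:
        raise ValueError("base_seconds must be non-negative")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a normalized value in [0, 1]")
    return base_seconds * confidence


# 3 seconds of speech with 80% confidence in the "happy" label.
hold_seconds = second_period_length(3.0, 0.8)
print(f"{hold_seconds:.1f} s")
```

Because the result is a floating-point product, it is formatted to one decimal place for display rather than compared for exact equality.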

[0064] Based on this approach, the embodiments of this disclosure can not only improve the naturalness of the facial expression transitions of the virtual object 120, but also improve the matching degree with the sentences, thereby presenting a more human-like interactive effect.

[0065] In some embodiments, the electronic device 110 may also enable the virtual avatar 120 to present body movements 150 at least partially in parallel with the first interactive action and / or the second interactive action.

[0066] Specifically, electronic device 110 determines, for example based on the textual and / or semantic information of statement 130, a first limb movement that matches that information.

[0067] Taking text information as an example, electronic device 110 may predefine a set of sentence fragments and associate each sentence fragment with a corresponding body movement. Electronic device 110 then detects, in statement 130, a target sentence fragment that matches one of the predefined sentence fragments.

[0068] Taking Figure 1 as an example, electronic device 110 may detect that the fragment of statement 130 matching the set of predefined sentence fragments is "weather", and its corresponding body action is, for example, "raising the right arm and pointing the finger upward".

[0069] Thus, the electronic device 110 can further determine the first limb action that matches the target sentence fragment from a set of preset limb actions, such as "raise the right arm and point the finger upward".
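A minimal sketch of this fragment lookup follows. It assumes simple case-insensitive substring matching and, when several fragments match, picks the one that occurs earliest in the statement; neither choice is specified in the disclosure. The table name `FRAGMENT_TO_LIMB_ACTION`, the function name `match_limb_action`, and the "goodbye" entry are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Predefined sentence fragments and their associated limb actions.
FRAGMENT_TO_LIMB_ACTION: Dict[str, str] = {
    "weather": "raise the right arm and point the finger upward",
    "goodbye": "wave the right hand",  # hypothetical entry for illustration
}


def match_limb_action(statement: str) -> Optional[Tuple[str, str]]:
    """Return (target fragment, limb action), or None if nothing matches."""
    lowered = statement.lower()
    hits = [(lowered.find(fragment), fragment)
            for fragment in FRAGMENT_TO_LIMB_ACTION
            if fragment in lowered]
    if not hits:
        return None
    _, fragment = min(hits)  # earliest occurrence in the statement
    return fragment, FRAGMENT_TO_LIMB_ACTION[fragment]


match = match_limb_action("The weather is nice today! I'm very happy!")
```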

[0070] In some embodiments, the electronic device 110 may also construct an association between the semantic information of a statement and a preset limb action, and determine the corresponding first limb action based on the semantic information of the statement 130.

[0071] In some embodiments, the execution of the first limb action is at least partially parallel to the output of the target statement fragment. For example, the electronic device 110 may execute the first limb action during the time period in which the virtual object 120 outputs an interactive action corresponding to "weather".

[0072] Therefore, this kind of physical interaction is more in line with the human interaction experience and can further enhance the realism of virtual object interaction.

[0073] In some embodiments, the electronic device 110 may also determine a second limb movement that matches the emotional information. For example, the emotional information determined by the electronic device 110 may include one or more emotion tags.

[0074] In some embodiments, the electronic device 110 may also construct a set of emotion labels corresponding to body movements. For example, the emotion label "happy" may be associated with both facial expressions and body movements; the emotion label "helpless" may be associated with only facial expressions; and the emotion label "confused" may be associated with only body movements.

[0075] In some embodiments, the emotion classification model may be a multi-label classification model that can output multiple emotion labels as emotion information. For example, the electronic device 110 may determine that the emotion label corresponding to the statement 130 includes only "happy", or may include both "helpless" and "confused".

[0076] In some embodiments, the electronic device 110 can determine an emotion tag associated with a body movement from emotion tags. For ease of description, the emotion tag used to determine the second facial movement discussed above is referred to as a first emotion tag, and the emotion tag used to determine the second body movement is also referred to as a second emotion tag, such that the first emotion tag and the second emotion tag are the same, partially the same, or completely different.

[0077] Furthermore, the electronic device 110 can determine a second body movement that matches the second emotion label. Continuing with the emotion label "happy," the electronic device 110 can, for example, determine that the preset body movement corresponding to "happy" is "left hand clenched into a fist."
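The association between emotion labels, facial expressions, and body movements described above could be represented, as one possible sketch, by a lookup table. The entries for "happy" follow the examples in this description; the expression and movement names for "helpless" and "confused" are placeholders, and `EMOTION_TABLE` and `select_labels` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# label -> (facial expression or None, body movement or None)
EMOTION_TABLE: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "happy": ("grin", "left hand clenched into a fist"),
    "helpless": ("helpless expression", None),  # placeholder expression name
    "confused": (None, "confused gesture"),     # placeholder movement name
}


def select_labels(labels: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Pick the first emotion label (face) and second emotion label (body).

    Assumption: the first listed label that has the required association wins.
    """
    no_entry = (None, None)
    first = next(
        (l for l in labels if EMOTION_TABLE.get(l, no_entry)[0] is not None),
        None,
    )
    second = next(
        (l for l in labels if EMOTION_TABLE.get(l, no_entry)[1] is not None),
        None,
    )
    return first, second


same = select_labels(["happy"])
different = select_labels(["helpless", "confused"])
```

With a single "happy" label the first and second emotion labels coincide, whereas the "helpless" and "confused" pair yields two different labels, matching the cases described above.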

[0078] Accordingly, the electronic device 110 can cause the virtual object 120 to synchronously perform a determined second limb action, and this second limb action can be continuously performed during a third time period following the first time period. Such a third time period is determined independently, and it may be the same as or different from the second time period.

[0079] Specifically, the electronic device 110 can also determine a confidence level (also called a second confidence level) associated with the second emotion label. For scenarios where the first and second emotion labels are the same, this second confidence level is equivalent to the first confidence level discussed above. For scenarios where the first and second emotion labels are different, this second confidence level can, for example, be determined independently by an emotion classification model.

[0080] Accordingly, the electronic device 110 determines the duration of the second limb action based on this confidence level. Such duration may include, for example, the first duration discussed above and a third duration of a third time period, wherein the third duration is proportional to the second confidence level.

[0081] In some embodiments, the electronic device 110 may, for example, set a predetermined time length (also referred to as a second predetermined time length) and determine the length of a third time period based on the product of the predetermined time length and a second confidence level. In some embodiments, the second predetermined time length may be the same as or different from the first predetermined time length discussed above, so that the duration of the second limb movement can be independent of the duration of the second facial movement.

[0082] In some embodiments, to enhance the realism of the interaction, the length of the third time period may, for example, take into account the length of the first time period. Specifically, the electronic device 110 may determine the third time period based on the second confidence level and the first time period, such that the third time period is proportional to the second confidence level.

[0083] For example, continuing the example above, if the length of the speech content corresponding to statement 130 is 3 seconds, the determined second emotion label is "confused", and its corresponding second confidence level is 70%, then electronic device 110 can determine the length of the third time period as 2.1 seconds based on the product of the length of 3 seconds and the second confidence level of 70%.
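To illustrate how the time periods discussed above relate to one another, the sketch below lays them out on a single timeline measured from the start of speech output. It assumes both the second and third time periods are computed from the first time length, and that the second limb action runs through the first time period and the third time period. The `Timeline` class and `build_timeline` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    speech_end: float  # end of the first time period (seconds)
    face_end: float    # end of the second time period (expression held)
    limb_end: float    # end of the third time period (limb action held)


def build_timeline(speech_seconds: float,
                   first_confidence: float,
                   second_confidence: float) -> Timeline:
    """Place the second and third periods directly after the speech."""
    return Timeline(
        speech_end=speech_seconds,
        face_end=speech_seconds + speech_seconds * first_confidence,
        limb_end=speech_seconds + speech_seconds * second_confidence,
    )


# 3 s of speech, first confidence 80%, second confidence 70%.
timeline = build_timeline(3.0, 0.8, 0.7)
```

Under these assumptions the facial expression is held until about 5.4 seconds and the limb action until about 5.1 seconds, so the two end independently of each other.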

[0084] Based on this approach, the embodiments of this disclosure can enrich the interactive experience of virtual objects, thereby improving the realism of virtual object interaction and enhancing the fun of user interaction.

[0085] Example devices and equipment

[0086] Figure 3 shows a schematic structural block diagram of a device 300 for controlling virtual objects according to certain embodiments of the present disclosure. The device 300 may be implemented as or included in the electronic device 110. Various modules / components in the device 300 may be implemented by hardware, software, firmware, or any combination thereof.

[0087] As shown in the figure, device 300 includes an acquisition module 310 configured to acquire a statement to be output by a virtual object. Device 300 also includes a determination module 320 configured to determine emotional information associated with the statement. Device 300 further includes a first interactive action execution module 330 configured to control the virtual avatar to perform a first interactive action within a first time period based on the statement and emotional information. The first interactive action is determined based on a first facial movement corresponding to the statement and a second facial movement corresponding to the emotional information. Device 300 also includes a second interactive action execution module 340 configured to control the virtual avatar to perform a second interactive action within a second time period after the first time period. The second interactive action includes a second facial movement corresponding to the emotional information.

[0088] In some embodiments, the first facial movement includes at least mouth movements corresponding to a statement, and the second facial movement includes preset facial expressions corresponding to emotional information.

[0089] In some embodiments, the device 300 further includes: a confidence determination module configured to determine a first confidence level associated with emotional information; and a time length determination module configured to determine the time length of a second time period based on the first confidence level.

[0090] In some embodiments, the time length determination module is configured to: determine a first time length for outputting statements; and determine a second time length for a second time period based on a first confidence level and the first time length, such that the second time length is proportional to the first confidence level.
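The two-phase timing described for device 300 can be illustrated with a minimal sketch. The names `Phase` and `plan_phases`, the motion strings, and the sample values are hypothetical. The sketch assumes that the second time length equals the first time length multiplied by the first confidence level, which is one possible way to make it proportional to the first confidence level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    start: float           # seconds from the start of statement output
    end: float             # seconds from the start of statement output
    facial_motions: tuple  # facial movements active during the phase


def plan_phases(first_length: float, first_confidence: float, emotion: str) -> list:
    """Build the two facial-action phases for one statement.

    Assumption: second_length = first_length * first_confidence.
    """
    if first_length < 0 or not 0.0 <= first_confidence <= 1.0:
        raise ValueError("invalid time length or confidence level")
    second_length = first_length * first_confidence
    expression = f"expression:{emotion}"
    # First time period: mouth movement for the statement plus the expression.
    first = Phase("first interactive action", 0.0, first_length,
                  ("mouth movement", expression))
    # Second time period: only the expression is sustained after speech ends.
    second = Phase("second interactive action", first_length,
                   first_length + second_length, (expression,))
    return [first, second]


# Hypothetical input: 4 seconds of speech, emotion "happy" at 50% confidence.
for phase in plan_phases(4.0, 0.5, "happy"):
    print(phase.name, phase.start, phase.end, phase.facial_motions)
```

In this sketch a lower confidence level shortens only the sustained expression after speech ends; the first phase always spans the full output of the statement.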

[0091] In some embodiments, the device 300 further includes a presentation module configured to control the virtual avatar to present limb movements at least partially in parallel with a first interactive action and / or a second interactive action.

[0092] In some embodiments, the device 300 further includes a first limb movement determination module, configured to determine a first limb movement that matches the semantic information based on the textual information and / or semantic information of the statement.

[0093] In some embodiments, the first limb action determination module is configured to: detect a target statement fragment that matches a set of preset statement fragments from the statement; and determine a first limb action that matches the target statement fragment from a set of preset limb actions, wherein the execution of the first limb action is at least partially parallel to the output of the target statement fragment.
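The fragment matching performed by the first limb action determination module can be sketched as a lookup against a preset table. The table contents, the action names, and the function `match_limb_actions` are hypothetical, and case-insensitive substring search is an assumption; the disclosure does not specify the matching technique.

```python
from typing import Mapping

# Hypothetical preset table: statement fragment -> preset limb action.
PRESET_LIMB_ACTIONS = {
    "hello": "wave",
    "thank you": "bow",
    "no problem": "thumbs_up",
}


def match_limb_actions(statement: str,
                       presets: Mapping[str, str] = PRESET_LIMB_ACTIONS) -> list:
    """Return (char_offset, fragment, limb_action) for every preset fragment found.

    Assumption: matching is a case-insensitive substring search.
    """
    lowered = statement.lower()
    matches = []
    for fragment, action in presets.items():
        start = lowered.find(fragment)
        while start != -1:
            matches.append((start, fragment, action))
            # Continue searching after the current occurrence.
            start = lowered.find(fragment, start + len(fragment))
    return sorted(matches)


print(match_limb_actions("Hello, and thank you for waiting."))
# prints [(0, 'hello', 'wave'), (11, 'thank you', 'bow')]
```

The character offset of each match could then be mapped to a time offset in the speech output so that the limb action runs at least partially in parallel with the matched fragment; that mapping is left out of this sketch.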

[0094] In some embodiments, the device 300 further includes a second limb movement determination module configured to determine, based on a second emotion tag indicated by the emotion information, a second limb movement that matches the emotion information, wherein the second emotion tag may be the same as or different from the first emotion tag on which the second facial movement is based.

[0095] In some embodiments, the second limb movement determination module is further configured to: determine a second confidence level associated with the second emotion label; and, based on the second confidence level, determine the time length corresponding to the second limb movement, such that the second limb movement is continuously executed in a third time period after the first time period, wherein the third time length of the third time period is proportional to the second confidence level.

[0096] In some embodiments, the acquisition module 310 is configured to determine the statement to be output by the virtual object based on the user's interaction with the virtual object.

[0097] Figure 4 shows a block diagram of an electronic device 400 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the electronic device 400 shown in Figure 4 is merely exemplary and should not be construed as limiting the functionality and scope of the embodiments described herein. The electronic device 400 shown in Figure 4 can be used to implement the electronic device 110 of Figure 1.

[0098] As shown in Figure 4, electronic device 400 is in the form of a general-purpose electronic device. Components of electronic device 400 may include, but are not limited to, one or more processors or processing units 410, memory 420, storage device 430, one or more communication units 440, one or more input devices 450, and one or more output devices 460. Processing unit 410 may be a physical or virtual processor and is capable of performing various processes according to programs stored in memory 420. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capability of electronic device 400.

[0099] Electronic device 400 typically includes multiple computer storage media. Such media can be any available media accessible to electronic device 400, including but not limited to volatile and non-volatile media, and removable and non-removable media. Memory 420 can be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 430 can be a removable or non-removable medium and can include machine-readable media, such as flash drives, disks, or any other media that can be used to store information and / or data (e.g., training data for training) and can be accessed within electronic device 400.

[0100] Electronic device 400 may further include additional removable / non-removable, volatile / non-volatile storage media. Although not shown in Figure 4, disk drives for reading from or writing to removable, non-volatile disks (e.g., "floppy disks") and optical disk drives for reading from or writing to removable, non-volatile optical disks can be provided. In these cases, each drive can be connected to a bus (not shown) via one or more data media interfaces. Memory 420 may include computer program product 425 having one or more program modules configured to perform various methods or actions of various embodiments of this disclosure.

[0101] Communication unit 440 enables communication with other electronic devices via a communication medium. Additionally, the functionality of components of electronic device 400 can be implemented using a single computing cluster or multiple computing machines capable of communicating via communication connections. Therefore, electronic device 400 can operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.

[0102] Input device 450 can be one or more input devices, such as a mouse, keyboard, trackball, etc. Output device 460 can be one or more output devices, such as a monitor, speaker, printer, etc. As needed, electronic device 400 can also communicate via communication unit 440 with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with electronic device 400, or with any device (e.g., a network card, a modem, etc.) that enables electronic device 400 to communicate with one or more other electronic devices. Such communication can be performed via an input / output (I / O) interface (not shown).

[0103] According to an exemplary implementation of this disclosure, a computer-readable storage medium is provided that stores computer-executable instructions thereon, wherein the computer-executable instructions are executed by a processor to implement the methods described above. According to an exemplary implementation of this disclosure, a computer program product is also provided, which is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, which are executed by a processor to implement the methods described above.

[0104] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatuses, devices, and computer program products implemented according to this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0105] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processing unit of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner. Thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0106] Computer-readable program instructions can be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable data processing apparatus, or other device implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0107] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0108] Various implementations of this disclosure have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed implementations. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described implementations. The terminology used herein is chosen to best explain the principles, practical applications, or improvements to technology in the market, or to enable others skilled in the art to understand the various implementations disclosed herein.

Claims

1. A method for controlling virtual objects, comprising: obtaining a statement to be output by a virtual object; determining emotional information associated with the statement; controlling, based on the statement and the emotional information, the virtual avatar to perform a first interactive action within a first time period, the first interactive action being determined based on a first facial action corresponding to the statement and a second facial action corresponding to the emotional information; and controlling the virtual avatar to perform a second interactive action within a second time period after the first time period, the second interactive action including the second facial action corresponding to the emotional information; wherein the method further comprises: determining a first confidence level associated with the emotional information, the first confidence level indicating the credibility of a first emotion label indicated by the emotional information; and determining the time length of the second time period based on the first confidence level; and wherein determining the time length of the second time period comprises: determining a first time length for outputting the statement; and determining a second time length of the second time period based on the first confidence level and the first time length, such that the second time length is proportional to the first confidence level.

2. The method according to claim 1, wherein the first facial action includes at least a mouth movement corresponding to the statement, and the second facial action includes a preset expression movement corresponding to the emotional information.

3. The method according to claim 1, further comprising: controlling the virtual avatar to present limb actions at least partially in parallel with the first interactive action and / or the second interactive action.

4. The method according to claim 3, further comprising: determining, based on textual information and / or semantic information of the statement, a first limb action matching the semantic information.

5. The method of claim 4, wherein determining the first limb action matching the semantic information comprises: detecting, from the statement, a target statement fragment that matches a set of preset statement fragments; and determining, from a set of preset limb actions, the first limb action that matches the target statement fragment, wherein execution of the first limb action is at least partially parallel to output of the target statement fragment.

6. The method of claim 3, wherein the second facial action is determined based on the first emotion label indicated by the emotional information, the method further comprising: determining, based on a second emotion label indicated by the emotional information, a second limb action matching the emotional information, wherein the first emotion label is the same as or different from the second emotion label.

7. The method of claim 6, further comprising: determining a second confidence level associated with the second emotion label, the second confidence level indicating the degree of confidence of the second emotion label; and determining, based on the second confidence level and a predetermined time length, a time length corresponding to the second limb action, such that the second limb action is continuously performed in a third time period after the first time period, wherein a third time length of the third time period is proportional to the second confidence level.

8. The method according to claim 1, wherein obtaining the statement to be output by the virtual object includes: determining, based on the user's interaction with the virtual object, the statement to be output by the virtual object.

9. An apparatus for controlling virtual objects, comprising: an acquisition module configured to obtain a statement to be output by a virtual object; a determination module configured to determine emotional information associated with the statement; a first interactive action execution module configured to control, based on the statement and the emotional information, the virtual avatar to perform a first interactive action within a first time period, the first interactive action being determined based on a first facial action corresponding to the statement and a second facial action corresponding to the emotional information; and a second interactive action execution module configured to control the virtual avatar to perform a second interactive action within a second time period after the first time period, the second interactive action including the second facial action corresponding to the emotional information; wherein the apparatus further comprises a confidence determination module configured to: determine a first confidence level associated with the emotional information, the first confidence level indicating the credibility of a first emotion label indicated by the emotional information; and determine the time length of the second time period based on the first confidence level; and wherein the confidence determination module is further configured to: determine a first time length for outputting the statement; and determine, based on the first confidence level and the first time length, a second time length of the second time period, such that the second time length is proportional to the first confidence level.

10. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the electronic device to perform the method according to any one of claims 1 to 8.

11. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to implement the method according to any one of claims 1 to 8.