Call method, related apparatus, and communication system

By sending voice emoticons between calling devices and setting sound effects and display content, the problem of monotonous communication methods during calls is solved, enhancing the fun and user experience of calls.

WO2026130066A1 · PCT designated stage · Publication Date: 2026-06-25 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-11-27
Publication Date: 2026-06-25

Application Information

Patent Timeline

27 Nov 2025: Application
25 Jun 2026: Publication (WO2026130066A1)

IPC: H04M3/42

AI Tagging

Application Domain

Special service for subscribers

Technology Topics

Communications system · Human–computer interaction

AI Technical Summary

Technical Problem

In existing technologies, users lack diverse communication methods and fun during calls, resulting in an unsatisfactory call experience.

Method used

By sending voice emoticons between calling devices and supporting personalized settings for their sound effects and displayed content, combined with virtual background sounds and avatars, the fun and interactivity of calls are enhanced.

Benefits of technology

It provides diverse communication methods to enhance users' expression and interaction experience during calls, thereby increasing the fun of the call and user satisfaction.

✦ Generated by Eureka AI based on patent content.

Abstract

The present application provides a call method, a related apparatus, and a communication system. A call connection is established between a first device and a second device. The first device can mix a call background sound configured by a user into the call audio sent to the second device and, in response to a user operation adjusting the playback sound effect of the background sound, adjust the sound effect of the call background sound in the call audio. The first device may also send a voice emoticon to the second device. Upon receiving the voice emoticon, the second device may play the audio corresponding to the voice emoticon, and / or display the content corresponding to the voice emoticon. Moreover, in response to a user operation, the first device can adjust the playback sound effect and / or display style of the voice emoticon. With this method, the user's call scenario can be hidden, the user's call privacy can be protected, and the enjoyment of the call can be improved, thereby enhancing the user's call experience.

Description

Communication methods, related devices and communication systems

[0001] This application claims priority to Chinese Patent Application No. 202411858437.9, filed on December 16, 2024, entitled "Method of Communication, Related Devices and Communication System", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of terminal technology, and in particular to call methods, related devices and communication systems.

Background Technology

[0003] As electronic devices develop, they play an increasingly important role in people's lives. The calling function provided by electronic devices facilitates communication between people. How to improve the user's calling experience is a pressing issue that needs to be addressed.

Summary of the Invention

[0004] This application provides a call method, related apparatus, and communication system. After a call connection is established between the first device and the second device, the first device can send voice emoticons to the second device. Upon receiving a voice emoticon, the second device can play the corresponding audio and / or display the corresponding content. The first device can also provide settings for customizing the sound effects of the voice emoticons to meet the personalized needs of users. This method can enhance the fun of calls and improve the user's call experience.

[0005] In a first aspect, this application provides a call method applied to a first device. The first device and a second device establish a call connection; the first device sends a first audio to the second device; the first device receives a second audio from the second device and plays the second audio; the first device receives a first operation for setting the playback sound effect of a voice emoticon to a first sound effect, and a second operation for sending a first voice emoticon; the first device mixes the first audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a third audio, and sends the third audio to the second device.

[0006] To mix the first audio and the audio corresponding to the first voice emoticon according to the first sound effect, the first device may determine the audio parameters (such as gain, frequency, and phase) corresponding to the first sound effect, and then mix the two according to those parameters.
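As an illustration of mixing according to a sound-effect parameter, the following is a minimal sketch that assumes 16-bit PCM samples held in plain Python lists and a sound effect reduced to a single gain value. The function name `mix_with_gain` and the clipping behavior are assumptions for illustration; the application does not specify a mixing algorithm.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_with_gain(call_audio: Sequence[int],
                  emoticon_audio: Sequence[int],
                  emoticon_gain: float) -> List[int]:
    """Mix emoticon samples into call samples (hypothetical helper).

    The emoticon is scaled by `emoticon_gain`, added sample by sample,
    and the result is clipped to the 16-bit range. The output has the
    same length as `call_audio`.
    """
    mixed = []
    for i, sample in enumerate(call_audio):
        # Past the end of the emoticon, only the call audio remains.
        extra = emoticon_audio[i] * emoticon_gain if i < len(emoticon_audio) else 0.0
        value = int(round(sample + extra))
        mixed.append(max(INT16_MIN, min(INT16_MAX, value)))
    return mixed


third_audio = mix_with_gain([100, 200, 300], [1000, 1000], 0.5)
```

In this sketch `third_audio` plays the role of the third audio sent to the second device; frequency and phase adjustments, which the text also mentions, are left out.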

[0007] For the first voice emoticon, refer to voice emoticon 721B shown in Figure 7B of this application. For the second operation for sending the first voice emoticon, refer to the operation on the sending control 732 shown in Figure 7B.

[0008] When the second device receives the first audio from the first device, it can play the first audio. When the second device receives the third audio from the first device, it can play the third audio. Since the third audio includes the audio corresponding to the first voice emoticon, the user of the second device can hear the audio corresponding to the first voice emoticon sent by the other end of the call.

[0009] As can be seen, the above methods can provide users with diversified communication options during calls. In addition to voice communication, users can also express their thoughts, feelings, and attitudes by sending voice emoticons. Furthermore, users can adjust the sound effects of the voice emoticons. This increases the fun of the call and enhances the user's call experience.

[0010] In conjunction with the first aspect, in some embodiments, the first device receives an operation for sending a second voice emoticon; the first device mixes the first audio and the audio corresponding to the second voice emoticon according to the first sound effect to generate a fourth audio, and sends the fourth audio to the second device.

[0011] For the second voice emoticon, refer to voice emoticon 824 shown in Figure 8A of this application. For the operation for sending the second voice emoticon, refer to the operation on the sending control 832 shown in Figure 8A.

[0012] As can be seen, without the user changing the playback sound effects of the voice emoticons, the first device can send one or more voice emoticons according to the sound effects most recently set by the user. This simplifies the process of sending voice emoticons, eliminating the need for users to set sound effects before each message.

[0013] In conjunction with the first aspect, in some embodiments, the first device receives an operation to set the playback sound effect of the voice emoticon to a second sound effect, and an operation to send a third voice emoticon; the first device mixes the first audio and the audio corresponding to the third voice emoticon according to the second sound effect to generate a fifth audio, and sends the fifth audio to the second device.

[0014] As can be seen, users can adjust the playback sound effects of voice emoticons at any time. The first device can send voice emoticons based on the user's adjusted playback sound effects. This can meet users' diverse needs for playback sound effects when sending voice emoticons.
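The behavior described above, where the most recently set sound effect applies to every subsequent voice emoticon until the user changes it, can be sketched as a small piece of sender-side state. The class and field names below (`SoundEffect`, `EmoticonSender`, `current_effect`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SoundEffect:
    name: str    # hypothetical label, e.g. "first" or "second"
    gain: float  # hypothetical single parameter of the effect


class EmoticonSender:
    """Keeps the sound effect most recently set by the user."""

    def __init__(self, default_effect: SoundEffect) -> None:
        self.current_effect = default_effect
        self.sent: List[Tuple[str, str]] = []

    def set_sound_effect(self, effect: SoundEffect) -> None:
        self.current_effect = effect

    def send(self, emoticon_id: str) -> Tuple[str, str]:
        # Every send reuses whatever effect is currently stored.
        record = (emoticon_id, self.current_effect.name)
        self.sent.append(record)
        return record


sender = EmoticonSender(SoundEffect("first", 0.5))
sender.send("emoticon_1")            # uses the first sound effect
sender.send("emoticon_2")            # still the first sound effect
sender.set_sound_effect(SoundEffect("second", 1.0))
sender.send("emoticon_3")            # now the second sound effect
```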

[0015] In conjunction with the first aspect, in some embodiments, the first sound effect includes one or more of the following: the volume of the voice emoticon is a first volume, and the volume change mode of the voice emoticon is a first change mode.

[0016] For the volume change modes of voice emoticons, refer to the one or more volume change mode options shown in Figure 7B of this application.
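The text does not list the volume change modes themselves. Purely as an assumption, the sketch below treats a volume change mode as a per-sample gain envelope with three hypothetical modes ("constant", "fade_in", "fade_out"), combined with a target volume between 0 and 1.

```python
from typing import List


def volume_envelope(num_samples: int, volume: float, mode: str) -> List[float]:
    """Return one gain value per sample for a hypothetical change mode."""
    if num_samples <= 0:
        return []
    if mode == "constant" or num_samples == 1:
        return [volume] * num_samples
    # Linear ramp from 0.0 to 1.0 across the emoticon's duration.
    ramp = [i / (num_samples - 1) for i in range(num_samples)]
    if mode == "fade_in":
        return [volume * r for r in ramp]
    if mode == "fade_out":
        return [volume * (1.0 - r) for r in ramp]
    raise ValueError(f"unknown volume change mode: {mode}")


def apply_envelope(samples: List[int], envelope: List[float]) -> List[int]:
    """Scale each sample by its envelope gain."""
    return [int(round(s * g)) for s, g in zip(samples, envelope)]


faded = apply_envelope([1000, 1000, 1000], volume_envelope(3, 1.0, "fade_out"))
```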

[0017] In conjunction with the first aspect, in some embodiments, after the first device receives the second operation for sending the first voice emoticon, the first device mixes the second audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a sixth audio; the first device plays the sixth audio.

[0018] As can be seen, after receiving the second operation to send the first voice emoticon, the first device can play the audio corresponding to the first voice emoticon. In this way, the user of the first device can hear the audio corresponding to the voice emoticon they sent, making it easier for the user to understand the playback effect of the voice emoticon they sent.

[0019] In conjunction with the first aspect, in some embodiments, when the playback sound effect of the voice emoticon is the first sound effect, the first device receives an operation for previewing a fourth voice emoticon; the first device mixes the second audio and the audio corresponding to the fourth voice emoticon according to the first sound effect to generate a seventh audio; the first device plays the seventh audio.

[0020] As can be seen, users can preview voice emoticons before sending them. This allows users to listen to the sound effects of the voice emoticons and adjust them accordingly.

[0021] In conjunction with the first aspect, in some embodiments, the call connection between the first device and the second device is a video call connection, wherein the first device sends a first video to the second device; the first device receives a second video from the second device and plays the second video; in response to a second operation for sending a first voice emoticon, the first device performs image fusion on the first video and the display content corresponding to the first voice emoticon to generate a third video; and the first device sends the third video to the second device.

[0022] As can be seen, a voice emoticon corresponds not only to audio but also to display content. In video call scenarios, the first device can also fuse the display content corresponding to the voice emoticon into the video frames of the call, so that the other end of the call can see the display content of the voice emoticon. This can increase the fun of the call and improve the user's call experience.
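One simple way to picture image fusion is alpha-blending the emoticon's display content onto a video frame. The sketch below assumes grayscale frames stored as nested lists of integers and a hypothetical `overlay` function; the application does not state which fusion technique is used.

```python
from typing import List

Frame = List[List[int]]


def overlay(frame: Frame, sticker: Frame, top: int, left: int,
            alpha: float = 1.0) -> Frame:
    """Blend `sticker` onto a copy of `frame` at (top, left).

    alpha=1.0 fully replaces the covered pixels; smaller values keep
    part of the underlying video visible. Pixels that fall outside the
    frame are ignored.
    """
    fused = [row[:] for row in frame]  # leave the input frame untouched
    for r, sticker_row in enumerate(sticker):
        for c, pixel in enumerate(sticker_row):
            y, x = top + r, left + c
            if 0 <= y < len(fused) and 0 <= x < len(fused[y]):
                fused[y][x] = round(alpha * pixel + (1.0 - alpha) * fused[y][x])
    return fused


first_video_frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
emoticon_content = [[200]]
third_video_frame = overlay(first_video_frame, emoticon_content, 1, 1, alpha=0.5)
```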

[0023] In conjunction with the first aspect, in some embodiments, the first device performs image fusion on the second video and the display content corresponding to the first voice emoticon to generate a fourth video; the first device plays the fourth video.

[0024] As can be seen, users of the first device can see the displayed content corresponding to the voice emoticons they sent. This makes it easy for users to understand the display effect of their sent voice emoticons.

[0025] In conjunction with the first aspect, in some embodiments, the display content corresponding to the first voice emoticon includes text and / or images. For example, the display content corresponding to the first voice emoticon can be a piece of text; see voice emoticons 822 to 827 shown in Figure 8A of this application. As another example, the display content corresponding to the first voice emoticon can be one or more images; see the voice emoticons in the emoticon display area 721 shown in Figure 7B of this application.

[0026] In conjunction with the first aspect, in some embodiments, before the first device receives the second operation for sending the first voice emoticon, the first device receives an operation for setting the display style of the voice emoticon to a first style; the first device performs image fusion on the first video and the display content corresponding to the first voice emoticon according to the first style to generate the third video, in which the display content corresponding to the first voice emoticon is displayed according to the first style.

[0027] The display style of voice emoticons can include, but is not limited to: display size, display color, display font, etc.

[0028] As can be seen, in addition to setting the sound effects for voice emoticons, users can also customize their display style. Users can select the voice emoticons they want to send and adjust their sound effects and / or display style. This better meets users' personalized needs for sending voice emoticons and enhances the fun of calls.

[0029] In conjunction with the first aspect, in some embodiments, the first device receives an operation to enable the first function and an operation to send a fifth voice emoticon; the first device acquires a first user avatar, which is the avatar of a user logged in on the calling application used to establish a call connection in the first device; the first device performs image fusion on the first user avatar and the display content corresponding to the fifth voice emoticon to generate first display content; the first device performs image fusion on the first video and the first display content to generate a fifth video; and the first device sends the fifth video to the second device.

[0030] The first function can be the function of merging the display content corresponding to the voice emoticon with the user's avatar. For the first user avatar, refer to the contact avatar 621A shown in Figure 6B of this application. For the first display content generated by image fusion of the first user avatar and the display content corresponding to the fifth voice emoticon, refer to the display content 781 shown in Figure 7G of this application.

[0031] As can be seen, when sending a voice emoticon, users can choose to merge the display content of the voice emoticon with their personal avatar in the calling application before sending it to the other end of the call. Users can thus show the corresponding expression to the other end of the call using their own avatar. This can increase the fun of the call and improve the user's calling experience.

[0032] In some embodiments, the first device may also receive one or more types of data input by the user, such as audio, video, text, and symbols, and generate voice emoticons based on that data. Generating a voice emoticon may include generating the audio corresponding to the voice emoticon and / or the display content corresponding to the voice emoticon.

[0033] This allows users to generate their own preferred voice emoticons and use them during calls, which improves the user experience when using voice emoticons.

[0034] In some embodiments, the voice emoticons (e.g., first voice emoticon, second voice emoticon, etc.) sent by the first device to the second device may be voice emoticons stored by the first device, or voice emoticons obtained from an online search in response to a voice emoticon search operation.

[0035] In conjunction with the first aspect, in some embodiments, before the first device and the second device establish a call connection, the first device receives an operation to enable a first background sound; after the first device and the second device establish a call connection, the first device mixes the audio corresponding to the first background sound with the audio collected by the first device to generate a first audio.

[0036] As can be seen, the user of the first device can set a background sound before the call is connected. Once the call is connected, the first device can generate the uplink call audio based on the user-set background sound and the captured audio. This allows a call scenario to be simulated as soon as the call is connected, which helps protect the user's call privacy.

[0037] In conjunction with the first aspect, in some embodiments, before the first device and the second device establish a call connection, the first device receives an operation to set the background sound playback effect to a third sound effect; the first device mixes the audio corresponding to the first background sound and the audio acquired by the first device according to the third sound effect.

[0038] The third sound effect includes one or more of the following: the background sound volume is the second volume, the background sound volume change mode is the second change mode, the background sound switching mode is the first switching mode, and the background sound switching duration is the first duration.

[0039] As can be seen, users can customize the sound effect of the background sound during calls. This better meets users' needs for background sounds and improves their call experience.
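The background sound switching mode and switching duration are not detailed in this passage. As one possible reading, the sketch below assumes a linear crossfade as the switching mode, with the switching duration expressed as a number of samples. The function name `crossfade` and this interpretation are assumptions, not the application's definition.

```python
from typing import List, Sequence


def crossfade(old_sound: Sequence[float], new_sound: Sequence[float],
              fade_samples: int) -> List[float]:
    """Switch from `old_sound` to `new_sound` with a linear crossfade.

    The last `fade_samples` samples of the old sound are blended with
    the first `fade_samples` samples of the new sound; the rest of the
    new sound follows unchanged.
    """
    n = max(0, min(fade_samples, len(old_sound), len(new_sound)))
    tail = old_sound[len(old_sound) - n:]
    blended = []
    for i in range(n):
        weight = (i + 1) / (n + 1)  # share of the new sound, rising over time
        blended.append((1.0 - weight) * tail[i] + weight * new_sound[i])
    return blended + [float(s) for s in new_sound[n:]]


switched = crossfade([10, 10, 10, 10], [0, 0, 0, 0], fade_samples=3)
```

A `fade_samples` of 0 in this sketch corresponds to an abrupt switch with no transition.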

[0040] In conjunction with the first aspect, in some embodiments, after the first device mixes the audio corresponding to the first background sound and the audio acquired by the first device according to the third sound effect, the first device receives an operation to set the playback sound effect of the background sound to the fourth sound effect; the first device mixes the audio corresponding to the first background sound and the audio acquired by the first device according to the fourth sound effect to generate an eighth audio; the first device sends the eighth audio to the second device.

[0041] In conjunction with the first aspect, in some embodiments, the call connection between the first device and the second device is a video call connection. When the first background sound is enabled, the first device replaces the background in the image captured by the first device with the first background corresponding to the first background sound, generating a sixth video; the first device sends the sixth video to the second device; the first device receives the second video from the second device and plays the second video.

[0042] As can be seen, the virtual background and the background audio during a video call can be associated with the same scene. The virtual background and background audio can work together to create a more realistic virtual video call scenario, thereby better protecting the user's privacy during video calls.
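To illustrate a background sound and a virtual background being tied to the same scene, the sketch below uses a hypothetical scene table (`SCENES`) and assumes a person mask is already available from some segmentation step that is outside the scope of this example. Frames are small nested lists of integers.

```python
from typing import Dict, List

Frame = List[List[int]]
Mask = List[List[bool]]

# Hypothetical scene table: one entry pairs a background sound identifier
# with the virtual background image for the same scene.
SCENES: Dict[str, dict] = {
    "cafe": {"background_sound": "cafe_ambience",
             "background_image": [[7, 7], [7, 7]]},
}


def replace_background(frame: Frame, person_mask: Mask, background: Frame) -> Frame:
    """Keep pixels where the mask marks the person; use the background elsewhere."""
    return [
        [pixel if is_person else bg
         for pixel, is_person, bg in zip(frame_row, mask_row, bg_row)]
        for frame_row, mask_row, bg_row in zip(frame, person_mask, background)
    ]


scene = SCENES["cafe"]
captured = [[1, 2], [3, 4]]
mask = [[True, False], [False, True]]
sixth_video_frame = replace_background(captured, mask, scene["background_image"])
```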

[0043] In a second aspect, this application provides a call method applied to a communication system including a first device and a second device. The first device and the second device establish a call connection; the first device sends a first audio to the second device, and the second device sends a second audio to the first device; the first device plays the second audio, and the second device plays the first audio; the first device receives a first operation for setting the playback sound effect of a voice emoticon to a first sound effect, and a second operation for sending the first voice emoticon; the first device sends first sound effect information corresponding to the first sound effect and a first emoticon identifier corresponding to the first voice emoticon to the second device; the second device obtains the audio corresponding to the first voice emoticon according to the first emoticon identifier, and mixes the first audio and the audio corresponding to the first voice emoticon according to the first sound effect information to generate a third audio; the second device plays the third audio.

[0044] As can be seen, the above methods can provide users with diversified communication options during calls. In addition to voice communication, users can also express their thoughts, feelings, and attitudes by sending voice emoticons. Furthermore, users can adjust the sound effects of the voice emoticons. This increases the fun of the call and enhances the user's call experience.
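In the second aspect, the first device sends only sound effect information and an emoticon identifier, and the second device looks up the audio and performs the mixing. A minimal sketch of that exchange follows, assuming a JSON message and a local emoticon library on the receiving side. The message fields, the `EMOTICON_LIBRARY` table, and both function names are hypothetical; the application does not define a message format.

```python
import json
from typing import Dict, List

# Hypothetical emoticon audio stored on the second device, keyed by identifier.
EMOTICON_LIBRARY: Dict[str, List[int]] = {"laugh": [1000, -1000, 1000]}


def build_emoticon_message(emoticon_id: str, gain: float) -> str:
    """First device: package the identifier and sound effect information."""
    return json.dumps({
        "type": "voice_emoticon",
        "emoticon_id": emoticon_id,
        "sound_effect": {"gain": gain},
    })


def handle_emoticon_message(message: str, first_audio: List[int]) -> List[int]:
    """Second device: resolve the identifier and mix locally."""
    payload = json.loads(message)
    emoticon_audio = EMOTICON_LIBRARY[payload["emoticon_id"]]
    gain = payload["sound_effect"]["gain"]
    return [
        sample + round(gain * emoticon_audio[i]) if i < len(emoticon_audio) else sample
        for i, sample in enumerate(first_audio)
    ]


message = build_emoticon_message("laugh", 0.5)
third_audio = handle_emoticon_message(message, [10, 20, 30, 40])
```

Compared with the first aspect, only a short message crosses the link in this sketch, at the cost of requiring the second device to hold (or be able to fetch) the emoticon audio.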

[0045] In conjunction with the second aspect, in some embodiments, the first device receives an operation for sending a second voice emoticon; the first device sends a second emoticon identifier corresponding to the second voice emoticon to the second device; the second device obtains the audio corresponding to the second voice emoticon based on the second emoticon identifier, and mixes the first audio and the audio corresponding to the second voice emoticon based on the first sound effect information to generate a fourth audio; the second device plays the fourth audio.

[0046] As can be seen, without the user changing the playback sound effects of the voice emoticons, the first device does not need to update the sound effects used by the second device to play the voice emoticons. The second device can play the voice emoticons sent by the first device using the same sound effects as the most recent voice emoticon sent by the first device. This simplifies the process of sending voice emoticons, eliminating the need for users to set sound effects before each message.
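The receiver-side counterpart of this behavior can be sketched as a cache of the most recently received sound effect information: a message that carries only an emoticon identifier reuses the cached value. The class name `EmoticonReceiver` and the optional `sound_effect` argument are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


class EmoticonReceiver:
    """Second-device state: remembers the latest sound effect information."""

    def __init__(self, default_gain: float = 1.0) -> None:
        self.last_effect: Dict[str, float] = {"gain": default_gain}

    def on_message(self, emoticon_id: str,
                   sound_effect: Optional[Dict[str, float]] = None
                   ) -> Tuple[str, float]:
        # Update the cache only when the sender included new effect info.
        if sound_effect is not None:
            self.last_effect = dict(sound_effect)
        return emoticon_id, self.last_effect["gain"]


receiver = EmoticonReceiver()
first = receiver.on_message("emoticon_1", {"gain": 0.5})   # effect info included
second = receiver.on_message("emoticon_2")                 # identifier only
third = receiver.on_message("emoticon_3", {"gain": 0.9})   # effect changed
```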

[0047] In conjunction with the second aspect, in some embodiments, the first device receives an operation to set the playback sound effect of the voice emoticon to a second sound effect, and an operation to send a third voice emoticon; the first device sends the second sound effect information corresponding to the second sound effect and the third emoticon identifier corresponding to the third voice emoticon to the second device; the second device obtains the audio corresponding to the third voice emoticon according to the third emoticon identifier, and mixes the first audio and the audio corresponding to the third voice emoticon according to the second sound effect information to generate a fifth audio; the second device plays the fifth audio.

[0048] As can be seen, users can adjust the playback sound effects of voice emoticons at any time. The first device can send the adjusted sound effects to the second device, so that the second device can play the corresponding audio of the voice emoticon from the first device according to the adjusted sound effects. This can meet the diverse needs of users for playback sound effects when sending voice emoticons.

[0049] In conjunction with the second aspect, in some embodiments, the first sound effect includes one or more of the following: the volume of the voice emoticon is a first volume, and the volume change mode of the voice emoticon is a first change mode.

[0050] In conjunction with the second aspect, in some embodiments, after the first device receives the second operation for sending the first voice emoticon, the first device mixes the second audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a sixth audio; the first device plays the sixth audio.

[0051] As can be seen, after receiving the second operation to send the first voice emoticon, the first device can play the audio corresponding to the first voice emoticon. In this way, the user of the first device can hear the audio corresponding to the voice emoticon they sent, making it easier for the user to understand the playback effect of the voice emoticon they sent.

[0052] In conjunction with the second aspect, in some embodiments, after the first device sends the first sound effect information corresponding to the first sound effect and the first emoticon identifier corresponding to the first voice emoticon to the second device, the second device obtains the display content corresponding to the first voice emoticon based on the first emoticon identifier; the second device displays the display content corresponding to the first voice emoticon.

[0053] As can be seen, a voice emoticon not only corresponds to audio but also to displayed content. During a call, the second device can display the corresponding content based on the emoticon's identifier from the first device. The user on the second device can not only hear the audio of the emoticon sent by the other end of the call but also see the displayed content. This increases the fun of the call and enhances the user's call experience.

[0054] In conjunction with the second aspect, in some embodiments, the call connection between the first device and the second device is a video call connection, wherein the first device sends a first video to the second device, and the second device sends a second video to the first device; the first device plays the second video, and the second device plays the first video; the second device performs image fusion on the first video and the display content corresponding to the first voice emoticon to generate a third video; and the second device plays the third video.

[0055] As can be seen, in video call scenarios, the second device can also fuse the display content corresponding to the voice emoticons from the first device into the video frames of the call, so that the user of the second device can see the display content of the voice emoticons. This can increase the fun of the call and improve the user's call experience.

[0056] In conjunction with the second aspect, in some embodiments, after the first device receives the second operation for sending the first voice emoticon, the first device displays the display content corresponding to the first voice emoticon.

[0057] As can be seen, users of the first device can see the displayed content corresponding to the voice emoticons they sent. This makes it easy for users to understand the display effect of their sent voice emoticons.

[0058] In conjunction with the second aspect, in some embodiments, the first device receives an operation to enable a first function and an operation to send a fifth voice emoticon; the first device sends a first message to the second device to indicate that the first function is enabled, and a fifth emoticon identifier corresponding to the fifth voice emoticon; according to the first message, the second device obtains a first user avatar, which is the avatar of a user logged in on the calling application used to establish a call connection in the first device; the second device performs image fusion on the first user avatar and the display content corresponding to the fifth voice emoticon to generate second display content; the second device displays the second display content.

[0059] As can be seen, when sending a voice emoticon, users can choose to have the display content of the voice emoticon merged with their personal avatar in the calling application and shown to the other end of the call. Users can thus show the corresponding expression to the other end of the call using their own avatar. This can increase the fun of the call and improve the user's calling experience.

[0060] In conjunction with the second aspect, in some embodiments, after the first device and the second device establish a call connection, the first device receives an operation to enable a first background sound; the first device sends a first background sound identifier corresponding to the first background sound to the second device; the second device obtains the audio corresponding to the first background sound according to the first background sound identifier, and mixes the audio corresponding to the first background sound with the first audio to generate a ninth audio; the second device plays the ninth audio.

[0061] As can be seen, users can also set a background sound during a call. This simulates a call scenario and helps protect the user's call privacy.

[0062] In conjunction with the second aspect, in some embodiments, before the first device receives the operation to enable the first background sound, the first device receives an operation to set the playback sound effect of the background sound to a third sound effect; the first device sends the third sound effect information corresponding to the third sound effect to the second device; the second device mixes the audio corresponding to the first background sound and the first audio according to the third sound effect information.

[0063] As can be seen, users can customize the sound effect of the background sound during calls. This better meets users' needs for background sounds and improves their call experience.

[0064] In conjunction with the second aspect, in some embodiments, the third sound effect includes one or more of the following: the volume of the background sound is a second volume, the volume change mode of the background sound is a second change mode, the switching mode of the background sound is a first switching mode, and the switching duration of the background sound is a first duration.

[0065] In conjunction with the second aspect, in some embodiments, the call connection between the first device and the second device is a video call connection. The first device sends a first video to the second device, and the second device sends a second video to the first device. The second device obtains the first background corresponding to the first background sound based on the first background sound identifier, replaces the background of the image in the first video with the first background, and generates a sixth video. The second device plays the sixth video, and the first device plays the second video.

[0066] As can be seen, the virtual background and the background audio during a video call can be associated with the same scene. The virtual background and background audio can work together to create a more realistic virtual video call scenario, thereby better protecting the user's privacy during video calls.

[0067] In a third aspect, this application provides an electronic device. The electronic device may include a memory and a processor. The memory may be used to store a computer program. The processor may be used to invoke the computer program to execute any of the possible implementation methods described in the first aspect.

[0068] In a fourth aspect, this application provides a computer-readable storage medium storing instructions that, when executed by a processor, can implement any of the possible implementation methods described in the first aspect.

[0069] In a fifth aspect, this application provides a computer program product that may contain computer instructions that, when executed on a processor, can implement any of the possible implementation methods described in the first aspect.

[0070] In a sixth aspect, this application provides a chip for use in an electronic device, the chip including one or more processors for invoking computer instructions to cause the electronic device to perform any of the possible implementation methods in the first aspect.

[0071] It is understood that the electronic device provided in the third aspect, the computer-readable storage medium provided in the fourth aspect, the computer program product provided in the fifth aspect, and the chip provided in the sixth aspect are all used to execute the methods provided in the embodiments of this application. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods, which are not repeated here.

Attached Figure Description

[0072] Figure 1 is an architecture diagram of a communication system 10 provided in an embodiment of this application;

[0073] Figure 2A is a schematic diagram of the hardware structure of an electronic device 100 provided in an embodiment of this application;

[0074] Figure 2B is a schematic diagram of the software structure of an electronic device 100 provided in an embodiment of this application;

[0075] Figure 3 is a schematic diagram of the architecture of another communication system provided in an embodiment of this application;

[0076] Figure 4 is a schematic diagram of the architecture of another communication system provided in an embodiment of this application;

[0077] Figures 5A to 5E are schematic diagrams of some call scenarios provided in the embodiments of this application;

[0078] Figures 6A and 6B are schematic diagrams of other call scenarios provided in the embodiments of this application;

[0079] Figures 7A to 7G are schematic diagrams of other call scenarios provided in the embodiments of this application;

[0080] Figures 8A to 8D are schematic diagrams of other call scenarios provided in the embodiments of this application;

[0081] Figures 9A to 9E are schematic diagrams of other call scenarios provided in the embodiments of this application;

[0082] Figures 10A to 10F are schematic diagrams of some scenarios for setting background sounds / voice emoticons during calls, provided in the embodiments of this application;

[0083] Figure 11 is a flowchart of a call method provided in an embodiment of this application;

[0084] Figure 12 is a flowchart of another call method provided in an embodiment of this application;

[0085] Figure 13 is a flowchart of another call method provided in an embodiment of this application.

Detailed Implementation

[0086] The technical solutions of the embodiments of this application are described below with reference to the accompanying drawings. In the description of the embodiments of this application, the terminology used in the following embodiments is for the purpose of describing specific embodiments only and is not intended to limit the application. As used in the specification and appended claims of this application, the singular expressions "a," "an," "the," "said," and "this" are intended to also include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of this application, "at least one" and "one or more" refer to one, two, or more than two. The term "and / or" is used to describe the relationship between related objects, indicating that three relationships can exist; for example, A and / or B can represent: A alone, A and B simultaneously, or B alone, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship.

[0087] References to "one embodiment" or "some embodiments" in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized. The term "connection" includes direct connections and indirect connections, unless otherwise stated. "First" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated.

[0088] In the embodiments of this application, the words "exemplarily" or "for example" are used to indicate examples, illustrations, or explanations. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design solutions. Specifically, the use of the words "exemplarily" or "for example" is intended to present the relevant concepts in a specific manner.

[0089] This application provides a call method. A first device can provide the function of setting a background sound during a call. After the first device and the second device establish a call connection, the first device can mix the audio captured through the microphone with the user-set background sound and send the mixed audio to the second device. This can hide the user's call context and protect the user's call privacy.
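The mixing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: audio is assumed to be 16-bit PCM represented as lists of integers, the background sound is assumed to loop for the duration of the captured audio, and the function name and the `background_gain` parameter are hypothetical.

```python
from itertools import cycle, islice

INT16_MIN, INT16_MAX = -32768, 32767

def mix_with_background(mic, background, background_gain=0.5):
    """Mix 16-bit PCM mic samples with a looped background sound, clipping to int16."""
    if not background:
        return list(mic)  # nothing to mix in
    looped = islice(cycle(background), len(mic))  # repeat the background to cover the mic audio
    mixed = []
    for mic_sample, bg_sample in zip(mic, looped):
        total = mic_sample + int(bg_sample * background_gain)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))  # avoid integer overflow on send
    return mixed

mic = [1000, -2000, 0, 32000, 500]   # audio captured through the microphone
rain = [400, 8000]                   # short user-set background sound, looped
mixed = mix_with_background(mic, rain)
```

In this sketch the fourth sample would exceed the 16-bit range after mixing and is therefore clipped to the maximum value.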

[0090] In some embodiments, the first device can provide the function of sending voice emoticons to the other end of a call. When a call connection is established between the first device and the second device, upon receiving an operation to send a voice emoticon, the first device can mix the audio corresponding to the voice emoticon with audio captured through a microphone, and send the mixed audio to the second device. The first device can also send the display content corresponding to the voice emoticon to the second device. The second device can play the audio from the first device and display the display content corresponding to the voice emoticon. In this way, the user of the second device can hear both the voice from the other end of the call and the audio corresponding to the voice emoticon sent by the other end, and see the display content corresponding to the voice emoticon sent by the other end of the call.
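The two outputs described above (mixed audio, plus display content sent alongside it) can be sketched as follows. The `VoiceEmoticon` class, the message fields, and the function names are hypothetical and serve only to illustrate the data flow; this application does not prescribe a message format.

```python
from dataclasses import dataclass

@dataclass
class VoiceEmoticon:
    emoticon_id: str
    audio: list            # PCM samples of the emoticon's sound
    display_content: str   # e.g. a reference to a static or animated image

def send_voice_emoticon(mic_audio, emoticon):
    """First device: return (mixed_audio, display_message) to send to the second device."""
    mixed = list(mic_audio)
    for i, sample in enumerate(emoticon.audio[:len(mixed)]):
        mixed[i] = max(-32768, min(32767, mixed[i] + sample))
    display_message = {
        "type": "voice_emoticon",
        "id": emoticon.emoticon_id,
        "display": emoticon.display_content,
    }
    return mixed, display_message

def receive(mixed_audio, display_message):
    """Second device: play the audio and show the display content (stubbed as a summary)."""
    return {"played_samples": len(mixed_audio), "shown": display_message["display"]}

laugh = VoiceEmoticon("laugh", [100, 100], "laugh.gif")
audio, message = send_voice_emoticon([10, 20, 30], laugh)
result = receive(audio, message)
```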

[0091] Voice emoticons can include one or more pre-made data such as audio, images, videos, text, symbols, etc. Voice emoticons can serve as a form of communication during calls. In a call, users can send voice emoticons to the other party to express their thoughts, emotions, and attitudes. Using voice emoticons can also liven up the call atmosphere. Voice emoticons can have corresponding audio and / or displayed content. For example, the displayed content corresponding to a voice emoticon may include an image, which can be static or animated. Electronic devices can download voice emoticons online. Optionally, electronic devices can also generate voice emoticons based on user input (such as audio, images, text, etc.). Voice emoticons can also be called spoken emoticons, call emoticons, etc.

[0092] As can be seen, the above methods can provide users with diversified communication options during calls. In addition to voice communication, users can also express their thoughts, feelings, and attitudes by sending voice emoticons. This increases the enjoyment of the call and enhances the user's call experience.

[0093] Figure 1 illustrates an exemplary architecture diagram of the communication system 10 provided in an embodiment of this application.

[0094] As shown in Figure 1, the communication system 10 may include electronic device 100 and electronic device 101. Electronic device 100 and electronic device 101 can establish a call connection. Specifically, electronic device 100 can make a call to electronic device 101, meaning electronic device 100 can be the caller and electronic device 101 can be the called party. Alternatively, electronic device 101 can make a call to electronic device 100, meaning electronic device 101 can be the caller and electronic device 100 can be the called party.

[0095] In some embodiments, the call connection between electronic device 100 and electronic device 101 can be a voice call connection or a video call connection.

[0096] In some embodiments, the call connection between electronic device 100 and electronic device 101 can be a call connection based on the Global System for Mobile Communications (GSM). For example, both electronic device 100 and electronic device 101 include a subscriber identity module (SIM) card. Electronic device 100 can dial the phone number corresponding to the SIM card in electronic device 101 to initiate a call request. Upon receiving a call request from electronic device 100, electronic device 101 can display an incoming call notification to prompt the user to answer the call. Upon receiving the operation to answer the call, electronic device 101 can complete the call connection with electronic device 100. Alternatively, the call connection between electronic device 100 and electronic device 101 can be a call connection based on the Voice over Internet Protocol (VoIP). The embodiments of this application do not limit the implementation method of the above-described call connection.
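The call setup sequence described above, seen from the called party, can be sketched as a small state machine. The state and event names are hypothetical, and the "reject" and "hang_up" transitions are added only to make the sketch complete; they are not described in this paragraph.

```python
from enum import Enum, auto

class CallState(Enum):
    IDLE = auto()
    RINGING = auto()      # the called party displays the incoming call notification
    CONNECTED = auto()
    ENDED = auto()

# Allowed transitions for the called party; event names are assumptions.
TRANSITIONS = {
    (CallState.IDLE, "call_request"): CallState.RINGING,
    (CallState.RINGING, "answer"): CallState.CONNECTED,
    (CallState.RINGING, "reject"): CallState.ENDED,
    (CallState.CONNECTED, "hang_up"): CallState.ENDED,
}

def next_state(state, event):
    """Return the new state, or raise ValueError if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} not allowed in {state.name}") from None

state = CallState.IDLE
for event in ("call_request", "answer"):
    state = next_state(state, event)
```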

[0097] In some embodiments, the communication system 10 may include more electronic devices than just electronic devices 100 and 101. For example, electronic devices 100 and 101 may enable multi-party calls with other electronic devices.

[0098] The structure of the electronic device 100 involved in the embodiments of this application is described below.

[0099] Figure 2A illustrates a schematic diagram of the hardware structure of the electronic device 100.

[0100] As shown in Figure 2A, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a SIM card interface 195, etc.

[0101] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than illustrated, or combine some components, or split some components, or have different component arrangements. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

[0102] Processor 110 may include one or more processing units, such as: application processor (AP), modem processor, graphics processing unit (GPU), image signal processor (ISP), controller, memory, video codec, digital signal processor (DSP), baseband processor, and / or neural network processing unit (NPU), etc. Different processing units may be independent devices or integrated into one or more processors.

[0103] The controller can be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction opcode and timing signals to complete the control of fetching and executing instructions.

[0104] The processor 110 may also include a memory for storing instructions and data. In some examples, the memory in the processor 110 is a cache memory. This memory can store instructions or data that the processor 110 has just used or is recurring. If the processor 110 needs to use the instruction or data again, it can retrieve it directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.

[0105] In this application, a computer program may be stored in the memory to enable a controller or processor to implement the call method of this application through an interface or protocol. Exemplarily, the computer program stored in the memory may be used for: establishing a call connection with other electronic devices; providing a function for setting a background sound during a call; mixing the background sound selected by the user with audio acquired through a microphone and sending it to the other end of the call during a call; providing a function for sending voice emoticons; mixing the audio corresponding to the voice emoticon selected by the user with the audio acquired through a microphone and sending it to the other end of the call during a call, and sending the display content corresponding to the voice emoticon to the other end of the call; receiving and displaying the display content corresponding to the voice emoticon sent by the other end of the call; generating background sound and voice emoticons based on multimedia information input by the user (e.g., images, text, video, audio, etc.).

[0106] The USB interface 130 is an interface that complies with the USB standard, and may specifically be a Mini USB interface, Micro USB interface, USB Type-C interface, etc. The USB interface 130 can be used to connect a charger to charge electronic device 100, and can also be used for data transfer between electronic device 100 and peripheral devices. It can also be used to connect headphones for audio playback.

[0107] The charging management module 140 receives charging input from a charger, which can be a wireless charger or a wired charger. While charging the battery 142, the charging management module 140 can also supply power to the electronic device via the power management module 141.

[0108] The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and / or the charging management module 140 to power the processor 110, internal memory 121, external memory, display 194, camera 193, and wireless communication module 160, etc.

[0109] The wireless communication function of electronic device 100 can be realized through antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, modem processor and baseband processor, etc.

[0110] Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, the antennas can be used in conjunction with tuning switches.

[0111] The mobile communication module 150 can provide solutions for wireless communication, including 2G / 3G / 4G / 5G, applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves via antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves before transmitting them to a modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via antenna 1.

[0112] The wireless communication module 160 can provide solutions for wireless communication applications on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via antenna 2, performs frequency modulation and filtering of the electromagnetic wave signals, and sends the processed signal to processor 110. The wireless communication module 160 can also receive signals to be transmitted from processor 110, perform frequency modulation and amplification, and convert them into electromagnetic waves for radiation via antenna 2.

[0113] In some embodiments, antenna 1 of electronic device 100 is coupled to mobile communication module 150, and antenna 2 is coupled to wireless communication module 160, enabling electronic device 100 to communicate with networks and other devices via wireless communication technology. The wireless communication technology may include GSM, General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and / or IR technologies, etc.

[0114] Electronic device 100 implements display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and for graphics rendering.

[0115] The display screen 194 is used to display images, videos, etc. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than 1.

[0116] Electronic device 100 can perform shooting functions through ISP, camera 193, video codec, GPU, display 194 and application processor.

[0117] The ISP is used to process data fed back from the camera 193. For example, when taking a picture, the shutter is opened, and light is transmitted through the lens to the camera's photosensitive element. The light signal is converted into an electrical signal, and the camera's photosensitive element transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye.

[0118] Camera 193 is used to capture still images or videos. In some embodiments, electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.

[0119] Digital signal processors (DSPs) are used to process digital signals. Besides digital image signals, they can also process other digital signals. For example, when electronic device 100 selects a frequency, the DSP can perform Fourier transforms on the frequency energy.

[0120] An NPU (Neural Processing Unit) is a computational processor for neural networks (NNs). By borrowing the structure of biological neural networks, such as the transmission patterns between neurons in the human brain, it can rapidly process input information and continuously learn on its own. NPUs enable intelligent cognitive applications in electronic devices, such as image recognition, facial recognition, speech recognition, and text understanding.

[0121] The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to perform data storage functions. For example, music, video, and other files can be saved on the external memory card.

[0122] Internal memory 121 can be used to store computer executable program code, which includes instructions. Processor 110 executes various functional applications and data processing of electronic device 100 by running the instructions stored in internal memory 121. Internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback, image playback, etc.), etc. The data storage area may store data created during the use of electronic device 100 (such as audio data, phonebook, etc.). Furthermore, internal memory 121 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.

[0123] Electronic device 100 can implement audio functions, such as music playback and recording, through audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, and application processor.

[0124] The audio module 170 is used to convert digital audio information into analog audio signals for output, and also to convert analog audio input into digital audio signals. The audio module 170 can also be used for encoding and decoding audio signals. In some embodiments, the audio module 170 may be located in the processor 110, or some functional modules of the audio module 170 may be located in the processor 110.

[0125] The speaker 170A, also known as a "loudspeaker," is used to convert audio electrical signals into sound signals. The user can listen to music or hands-free calls on the electronic device 100 through the speaker 170A.

[0126] The receiver 170B, also known as the "earpiece," is used to convert audio electrical signals into sound signals. When the electronic device 100 receives a telephone call or voice message, the receiver 170B can be brought close to the ear to hear the voice.

[0127] Microphone 170C, also known as a "mic" or "sound transducer," is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can speak with their mouth close to microphone 170C, inputting the sound signal into microphone 170C.

[0128] The headphone jack 170D is used to connect wired headphones.

[0129] In this application, the speaker 170A, receiver 170B, and microphone 170C are all optional.

[0130] The sensor module 180 may include pressure sensors, gyroscope sensors, barometric pressure sensors, magnetic sensors, accelerometers, gravity sensors, distance sensors, proximity sensors, fingerprint sensors, temperature sensors, touch sensors, ambient light sensors, bone conduction sensors, angle sensors, etc.

[0131] Buttons 190 include a power button, volume buttons, etc. Motor 191 can generate vibration feedback. Indicator 192 can be an indicator light, used to indicate charging status, battery level changes, and also to indicate messages, missed calls, notifications, etc.

[0132] The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to make contact with and detach from the electronic device 100. The electronic device 100 can support one or N SIM card interfaces, where N is a positive integer greater than 1. The electronic device 100 interacts with the network through the SIM card to achieve functions such as calls and data communication. In some examples, the electronic device 100 uses an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be removed from it.

[0133] Electronic device 100 can be a mobile phone, tablet computer, laptop computer, smartwatch, television, or other electronic device capable of providing call functionality. This application does not limit the type of electronic device 100.

[0134] The software system of electronic device 100 can adopt a layered architecture, event-driven architecture, microkernel architecture, microservice architecture, or cloud architecture. The embodiments of this application take a system with a layered architecture as an example to illustrate the software structure of electronic device 100.

[0135] Figure 2B illustrates an exemplary schematic diagram of the software structure of the electronic device 100.

[0136] A layered architecture divides software into several layers, each with a clear role and function. Layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, from top to bottom: the application layer, the application framework layer, the system library, and the kernel layer.

[0137] The application layer can include a series of application packages.

[0138] As shown in Figure 2B, the application package may include applications such as camera, gallery, calendar, map, navigation, WLAN, Bluetooth, music, SMS, call application, and call management application.

[0139] The call application can be used to provide the functions of making and receiving calls.

[0140] The call management application provides functions for managing call background sounds and voice emoticons, as well as setting call background sounds and sending / receiving voice emoticons during calls on the electronic device 100 using the call application. The call background sound can be mixed with the audio captured by the microphone of the electronic device 100 during a call, effectively hiding the user's current context and protecting their call privacy. For example, the types of call background sounds can include, but are not limited to: rain sounds, singing sounds, traffic sounds, keyboard sounds, etc. Call background sounds can also be called virtual background sounds or virtual call backgrounds. Managing call background sounds can include managing call background sounds stored locally on the electronic device 100 (e.g., downloading and storing call background sounds, deleting locally stored call background sounds, etc.), generating call background sounds based on user input, setting a default call background sound, managing the playback sound effects of call background sounds, and managing the switching methods of call background sounds, etc. Managing voice emoticons can include managing voice emoticons stored locally by the electronic device 100 (e.g., downloading and storing voice emoticons, deleting locally stored voice emoticons), generating voice emoticons based on user input, managing the playback sound effects of voice emoticons, managing the display style of voice emoticons, and so on.
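The management of locally stored call background sounds described above (storing, deleting, and setting a default) can be sketched as follows. The class `BackgroundSoundLibrary` and its methods are hypothetical; the sketch keeps everything in memory and omits downloading, generation from user input, sound effects, and switching methods.

```python
class BackgroundSoundLibrary:
    """In-memory stand-in for call background sounds stored locally on the device."""

    def __init__(self):
        self._sounds = {}     # name -> audio data
        self.default = None   # name of the default call background sound, if any

    def store(self, name, audio):
        self._sounds[name] = audio

    def delete(self, name):
        self._sounds.pop(name, None)
        if self.default == name:
            self.default = None   # a deleted sound can no longer be the default

    def set_default(self, name):
        if name not in self._sounds:
            raise KeyError(name)
        self.default = name

    def names(self):
        return sorted(self._sounds)

library = BackgroundSoundLibrary()
library.store("rain", b"\x00\x01")
library.store("traffic", b"\x02\x03")
library.set_default("rain")
library.delete("rain")   # also clears the default in this sketch
```

A voice emoticon library could follow the same pattern, with display content stored next to the audio.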

[0141] In some embodiments, the call management application can be an application independent of the call application. The call management application can provide interfaces for invoking background sounds and / or voice emoticons. Thus, during a call, the call application can use these interfaces to let the user select the desired background sound and send voice emoticons.

[0142] Alternatively, the call management application can be a sub-application integrated into the call application. The call application can use the call management application to provide users with features such as setting background sounds and sending / receiving voice emoticons.

[0143] The application framework layer provides APIs and a programming framework for applications within the application layer. The application framework layer includes some predefined functions.

[0144] As shown in Figure 2B, the application framework layer may include a window manager, content provider, view system, phone manager, resource manager, notification manager, activity manager, mixing module, noise reduction module, image fusion module, audio codec module, video codec module, voice emoticon / call background sound management module, audio fitting control module, etc.

[0145] The window manager is used to manage windowed applications. It can retrieve screen size, determine the presence of a status bar, lock the screen, and capture screenshots, among other things.

[0146] Content providers store and retrieve data, making that data accessible to applications. This data may include videos, images, audio, made and received phone calls, browsing history and bookmarks, phone books, etc.

[0147] A view system includes visual controls, such as controls for displaying text and controls for displaying images. View systems can be used to build applications. A display interface can consist of one or more views. For example, a display interface including a text notification icon could include views for displaying text and views for displaying images.

[0148] The phone manager is used to provide communication functions for electronic device 100. For example, it manages call status (including connection and disconnection).

[0149] The resource manager provides applications with various resources, such as localized strings, icons, images, layout files, video files, and more.

[0150] The notification manager allows applications to display notifications in the status bar (such as the pull-down notification bar). It can be used to convey informational messages and can disappear automatically after a short pause without user interaction. For example, the notification manager can be used to notify users of download completion or message alerts. The notification manager can also display notifications as icons or scrolling text in the top status bar, such as notifications from background applications, or as dialog boxes on the screen. Examples include displaying text messages in the status bar, emitting sounds, vibrating electronic devices, and flashing indicator lights.

[0151] The Activity Manager is responsible for managing activities, including starting, switching, and scheduling components in the system, as well as managing and scheduling applications. The Activity Manager can be called by upper-level applications to open the corresponding activities.

[0152] The mixing module can be used to mix multiple audio streams.

[0153] The noise reduction module can be used to reduce noise in audio. For example, in a call scenario, the noise reduction module can reduce noise in the audio captured by the microphone to filter out ambient noise and improve call quality.

[0154] The image fusion module can be used to merge multiple images into a single image. For example, in a video call scenario, the image fusion module can perform image fusion on the image captured by the camera and the display content corresponding to the voice emoticons.
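One common way to merge an overlay into a frame is alpha blending, sketched below. This is an assumption made for illustration: this application does not state which fusion method is used, and the function name, the grayscale pixel representation, and the `alpha`, `top`, and `left` parameters are hypothetical.

```python
def fuse(frame, overlay, alpha, top, left):
    """Alpha-blend a grayscale overlay onto a copy of frame at position (top, left)."""
    fused = [row[:] for row in frame]   # leave the camera frame itself untouched
    for r, overlay_row in enumerate(overlay):
        for c, value in enumerate(overlay_row):
            y, x = top + r, left + c
            # Skip overlay pixels that fall outside the frame.
            if 0 <= y < len(fused) and 0 <= x < len(fused[y]):
                fused[y][x] = round(alpha * value + (1 - alpha) * fused[y][x])
    return fused

camera_frame = [[0, 0, 0], [0, 0, 0]]   # image captured by the camera
emoticon_image = [[200, 100]]           # display content of a voice emoticon
fused = fuse(camera_frame, emoticon_image, alpha=0.5, top=1, left=1)
```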

[0155] The audio codec module can be used to encode audio and decode encoded audio. For example, in a call scenario, electronic device 100 can encode the call audio from its own end and send it to the other end, and decode the call audio received from the other end. Electronic device 100 can play the decoded audio through the receiver or speaker. In some embodiments, the audio codec module may include an audio encoding module and an audio decoding module.

[0156] The video codec module can be used to encode video and decode encoded video. For example, in a video call scenario, electronic device 100 can encode the call video on its own end and send it to the other end of the call, and decode the call video received from the other end. Electronic device 100 can then play the decoded video. In some embodiments, the video codec module may include a video encoding module and a video decoding module.

[0157] The embodiments of this application do not limit the specific methods of the above-mentioned audio encoding and decoding and video encoding and decoding.

[0158] The voice emoticon / call background sound management module can be used to manage voice emoticons and call background sounds. When the call management application receives an operation to manage voice emoticons / call background sounds, it can instruct the voice emoticon / call background sound management module to perform the corresponding management of voice emoticons and call background sounds.

[0159] The audio fitting control module can be used to control the mixing module to perform audio mixing. For example, during a call, the audio fitting control module can acquire the switching mode of the background sound and instruct the mixing module to mix the background sound with the audio captured by the microphone according to the preset background sound switching mode. As another example, during a call, the audio fitting control module can acquire the playback sound effects corresponding to the background sound / voice emoticons and instruct the mixing module to mix the background sound / voice emoticons with the audio captured by the microphone according to the preset playback sound effects.
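The control relationship described above can be sketched as a function that selects a gain envelope for a playback sound effect and then drives the mixing step with it. The effect names ("normal", "soft", "fade_in") and both function names are hypothetical; this application does not enumerate the available sound effects.

```python
def gain_envelope(effect, length):
    """Per-sample gains for a hypothetical playback sound effect."""
    if effect == "fade_in":
        # Linear ramp starting at 0 and rising toward 1 across the clip.
        return [i / length for i in range(length)]
    if effect == "soft":
        return [0.5] * length
    return [1.0] * length   # "normal": play the clip unchanged

def mix_with_effect(mic, clip, effect):
    """Apply the effect's envelope to clip, then add it to the mic samples with clipping."""
    gains = gain_envelope(effect, len(clip))
    mixed = list(mic)
    for i in range(min(len(mic), len(clip))):
        sample = mixed[i] + int(clip[i] * gains[i])
        mixed[i] = max(-32768, min(32767, sample))
    return mixed

mic = [0, 0, 0, 0]
clip = [1000, 1000, 1000, 1000]   # background sound or voice emoticon audio
faded = mix_with_effect(mic, clip, "fade_in")
```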

[0160] System libraries can include multiple functional modules. For example: surface manager, media libraries, 3D graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), etc.

[0161] The Surface Manager is used to manage the display subsystem and provides the blending of 2D and 3D layers for multiple applications.

[0162] The media library supports playback and recording of various common audio and video formats, as well as still image files. It supports multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.

[0163] The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.

[0164] A 2D graphics engine is a graphics engine for 2D drawing.

[0165] The kernel layer is the layer between hardware and software. The kernel layer contains at least the display driver, camera driver, audio driver, and sensor driver.

[0166] For the device structure of electronic device 101, refer to the structure of electronic device 100 shown in Figures 2A and 2B above.

[0167] Figure 3 illustrates an exemplary schematic diagram of another communication system architecture provided in an embodiment of this application.

[0168] As shown in Figure 3, electronic device 100 and electronic device 101 can establish a call connection based on a call application. The call applications in electronic device 100 and electronic device 101 can be the same or different.

[0169] During a call, the electronic device 100 can acquire audio via an audio input device. For example, the audio input device may include a microphone in the electronic device 100. Alternatively, the electronic device 100 may be connected to headphones and can acquire the user's voice during the call via the microphone in the headphones. The embodiments of this application do not limit the method by which the electronic device 100 acquires the user's voice during a call.

[0170] In some embodiments, the audio input device in the electronic device 100 can send the acquired audio to the noise reduction module. The noise reduction module can reduce the noise in the audio and send the noise-reduced audio to the mixing module. The embodiments of this application do not limit the noise reduction method. For example, the noise reduction module can filter out sounds other than human voices from the audio. That is, the noise-reduced audio may contain only the user's voice.

[0171] In some embodiments, the noise reduction module can reduce audio noise to different degrees. A higher noise reduction level means the module filters out more components from the audio, so that less of the ambient background noise remains. For example, after the electronic device 100 receives an instruction from the user to set a background sound for a call, the noise reduction module can increase the noise reduction level. In this way, the actual background noise in the user's environment is more fully replaced by the user-selected background sound, which improves the effect of the background sound setting and protects the user's call privacy.
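As a non-limiting illustration, the level-dependent noise reduction described above might be sketched as follows. The level values, the threshold, and the function names `choose_noise_reduction_level` and `reduce_noise` are hypothetical, and the simple noise gate merely stands in for whatever noise reduction method an embodiment actually uses.

```python
def choose_noise_reduction_level(background_sound_enabled, default_level=1, boosted_level=3):
    """Pick a higher (hypothetical) level once the user has set a call background sound."""
    return boosted_level if background_sound_enabled else default_level


def reduce_noise(samples, level, base_threshold=0.02):
    """Toy noise gate: samples quieter than a level-scaled threshold are zeroed.

    This is only a stand-in; the application does not specify the algorithm.
    """
    threshold = base_threshold * level
    return [s if abs(s) >= threshold else 0.0 for s in samples]


captured = [0.01, 0.05, -0.03, 0.5]  # normalized samples in [-1.0, 1.0]
normal = reduce_noise(captured, choose_noise_reduction_level(False))
boosted = reduce_noise(captured, choose_noise_reduction_level(True))
```

In this sketch, the boosted level removes the two quieter samples that the default level lets through, which mirrors the idea that more of the real environment is filtered out before a user-selected background sound is mixed in.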

[0172] In some embodiments, the audio input device in the electronic device 100 can also directly send the collected audio to the mixing module. That is, during a call, the electronic device 100 may not need to perform noise reduction processing on the audio collected by the audio input device.

[0173] If the user does not set a background sound or send any voice emoticons, the mixing module can directly send the received audio (e.g., noise-reduced audio) to the audio encoding module. The audio encoding module can encode the received audio and send the encoded audio to the call application. Then, the call application in electronic device 100 can send the encoded audio to the call application in electronic device 101 via the call connection.

[0174] When call audio is received from electronic device 101, the call application of electronic device 100 can send that call audio to the audio decoding module. The audio decoding module can decode the call audio and send the decoded audio to the noise reduction module. The noise reduction module can reduce the noise of the received audio and send the noise-reduced audio to the mixing module.

[0175] Optionally, the audio decoding module can also directly send the decoded audio to the mixing module. That is, during a call, the electronic device 100 does not need to perform noise reduction processing on the audio from the other end of the call.

[0176] If the user has not set a background sound or sent any voice emoticons, the mixing module can directly send the received audio to the audio output device. For example, the audio output device may include the earpiece and speaker in electronic device 100. Electronic device 100 can play the call audio from electronic device 101 through its earpiece or speaker. Alternatively, if electronic device 100 is connected to headphones, it can also play the call audio from electronic device 101 through the earpiece in the headphones. The embodiments of this application do not limit the method by which electronic device 100 plays call audio from electronic device 101.

[0177] It should be noted that the uplink call audio in electronic device 100 can be the audio that electronic device 100 needs to send to the other end of the call (e.g., electronic device 101). The downlink call audio in electronic device 100 can be the audio received by electronic device 100 from the other end of the call. That is to say, in electronic device 100, the audio processing path composed of audio input device, noise reduction module, mixing module, and audio encoding module can be the processing path for uplink call audio; the audio processing path composed of audio decoding module, noise reduction module, mixing module, and audio output device can be the processing path for downlink call audio.
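As a non-limiting illustration, the two audio processing paths described above can be viewed as ordered chains of stages. In the sketch below, every stage is a simplified stand-in: the noise gate, the pass-through mixer, and the 16-bit quantization used as an "encoder" are assumptions chosen for brevity, and the names `run_path`, `UPLINK_PATH`, and `DOWNLINK_PATH` are hypothetical. The audio input device and audio output device sit outside the code, at the start and end of the respective paths.

```python
def noise_reduction(audio):
    """Stand-in noise reduction: zero out very quiet samples."""
    return [s if abs(s) >= 0.02 else 0.0 for s in audio]


def mix_passthrough(audio):
    """Mixing module with no background sound or voice emoticon set: pass through."""
    return list(audio)


def encode(audio):
    """Stand-in encoder: quantize normalized samples to 16-bit integers."""
    return [int(round(s * 32767)) for s in audio]


def decode(pcm):
    """Stand-in decoder: map 16-bit integers back to normalized samples."""
    return [v / 32767 for v in pcm]


def run_path(data, stages):
    """Feed data through each stage of a processing path in order."""
    for stage in stages:
        data = stage(data)
    return data


# Uplink: input device -> noise reduction -> mixing -> encoding
UPLINK_PATH = [noise_reduction, mix_passthrough, encode]
# Downlink: decoding -> noise reduction -> mixing -> output device
DOWNLINK_PATH = [decode, noise_reduction, mix_passthrough]

sent = run_path([0.0, 1.0, -1.0], UPLINK_PATH)
played = run_path(sent, DOWNLINK_PATH)
```

Dropping `noise_reduction` from either list corresponds to the optional embodiments in which audio is sent directly to the mixing module.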

[0178] For the uplink and downlink call audio in electronic device 101, refer to the description of the uplink and downlink call audio in electronic device 100 above. Details are not repeated here.

[0179] If the call connection established between electronic device 100 and electronic device 101 is a video call connection, video of the call can also be transmitted between electronic device 100 and electronic device 101.

[0180] Electronic device 100 can acquire images using an image acquisition device. For example, the image acquisition device may include a camera in electronic device 100. Alternatively, electronic device 100 may be connected to other devices that include cameras. Electronic device 100 can acquire images through its connected devices that include cameras. This application does not limit the method by which electronic device 100 acquires images.

[0181] In some embodiments, the image acquisition device in the electronic device 100 can send the acquired image to the image fusion module.

[0182] If the user does not send a voice emoticon or the emoticon sent by the user does not have corresponding display content, the image fusion module in electronic device 100 can directly send the received image to the video encoding module. The video encoding module can encode the received image and send the encoded video to the call application. Then, the call application in electronic device 100 can send the encoded video to the call application in electronic device 101 through the call connection.

[0183] When call video is received from electronic device 101, the call application of electronic device 100 can send that call video to the video decoding module. The video decoding module can decode the call video and send the decoded video to the image fusion module.

[0184] If the user does not send a voice emoticon or the emoticon sent by the user does not have corresponding display content, the image fusion module can directly send the received video to the image display device. For example, the image display device may include the screen of electronic device 100. Electronic device 100 can play the call video from electronic device 101 through the image display device. Alternatively, electronic device 100 may be connected to other devices that include screens and can play the call video from electronic device 101 through those connected devices. The embodiments of this application do not limit the method by which electronic device 100 plays the call video from electronic device 101.

[0185] Specifically, the uplink video in electronic device 100 can be the video that electronic device 100 needs to send to the other end of the call. The downlink video in electronic device 100 can be the video received by electronic device 100 from the other end of the call. That is to say, in electronic device 100, the video processing path composed of the image acquisition device, image fusion module, and video encoding module can be the processing path for uplink video; the video processing path composed of the video decoding module, image fusion module, and image display device can be the processing path for downlink video.

[0186] In some embodiments, the electronic device 100 can play both the downlink and the uplink call video. This allows the user of the electronic device 100 to see both the video feed from the other end of the video call and the video feed of themselves being filmed.

[0187] For the uplink and downlink call video in electronic device 101, refer to the description of the uplink and downlink call video in electronic device 100 above. Details are not repeated here.

[0188] In some embodiments, when an operation to set a background sound for a call is received, the call management application of the electronic device 100 can invoke the voice emoticon / background sound management module to set and determine the background sound selected by the user. The voice emoticon / background sound management module can send the user-selected background sound to the audio fitting control module. The audio fitting control module can instruct the mixing module to mix the background sound and the call audio sent by the noise reduction module.

[0189] The audio fitting control module in electronic device 100 can instruct the mixing module to mix the background sound and the uplink call audio. In the uplink call audio processing path, the mixing module can send the audio obtained by mixing the background sound and the uplink call audio to the audio encoding module. Thus, electronic device 100 can send the call audio mixed with the user-set background sound to electronic device 101. For example, if the background sound set by the user of electronic device 100 is rain, the user of electronic device 101 can hear both the rain and the voice of the user of electronic device 100 when answering the call.

[0190] If the user of electronic device 101 has also set a background sound for the call, the downlink call audio received by electronic device 100 may be audio containing the background sound set by the user of electronic device 101.

[0191] In some embodiments, during a video call, when an operation to set a virtual background is received, the call management application of electronic device 100 can send the user-selected virtual background to the image fusion module. The image fusion module can replace the background of the image captured by the image acquisition device with the user-selected virtual background and send the background-replaced image to the video encoding module. In this way, electronic device 100 can send the uplink call video generated based on the background-replaced image to electronic device 101 for display.

[0192] The electronic device 100 can also display the uplink call video generated from the image with the replaced background. In this way, the user of the electronic device 100 can see the display effect of the virtual background they set.
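As a non-limiting illustration, the background replacement performed by the image fusion module might be sketched as a per-pixel selection. The application does not state how the foreground is identified; the boolean `person_mask` below is an assumed input (for example, the output of some segmentation step), and the function name `replace_background` is hypothetical.

```python
def replace_background(frame, person_mask, virtual_background):
    """Keep pixels where the mask marks the user; take all others from the virtual background.

    All three arguments are equally sized 2D lists (rows of pixels).
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            frame[y][x] if person_mask[y][x] else virtual_background[y][x]
            for x in range(width)
        ]
        for y in range(height)
    ]


# Tiny 2x2 example: "p" = person pixel, "r" = real room, "b" = virtual beach
frame = [["p", "r"], ["r", "p"]]
person_mask = [[True, False], [False, True]]
virtual_background = [["b", "b"], ["b", "b"]]
fused = replace_background(frame, person_mask, virtual_background)
```

The fused frame can then be sent to the video encoding module for the uplink and, optionally, shown locally so that the user sees the effect of the virtual background.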

[0193] Optionally, the background sound mixed into the uplink call audio and the virtual background integrated into the uplink call video can be associated with the same scene. The virtual background and the background sound can work together to create a more realistic virtual video call scene, thereby better protecting the user's privacy during video calls.

[0194] In some embodiments, when an operation to send a voice emoticon is received, the call management application of the electronic device 100 can invoke the voice emoticon / call background sound management module to determine the voice emoticon to be sent. The voice emoticon / call background sound management module can send the audio corresponding to the voice emoticon to be sent to the audio fitting control module. The audio fitting control module can instruct the mixing module to mix the audio corresponding to the voice emoticon with the call audio sent by the noise reduction module.

[0195] The audio fitting control module in electronic device 100 can instruct the mixing module to mix the audio corresponding to the voice emoticon and the uplink call audio. In the uplink call audio processing path, the mixing module can send the audio obtained by mixing the audio corresponding to the voice emoticon and the uplink call audio to the audio encoding module. Thus, electronic device 100 can send the call audio mixed with the audio corresponding to the voice emoticon to electronic device 101. In this way, the user of electronic device 101 can hear the audio corresponding to the voice emoticon sent by the user of electronic device 100.

[0196] Optionally, if a background sound is set for the call, the audio fitting control module in the electronic device 100 can instruct the mixing module to mix the audio corresponding to the voice emoticon, the uplink call audio, and the background sound for the call.
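As a non-limiting illustration, the mixing of the uplink call audio with an optional background sound and an optional voice emoticon might be sketched as a sample-wise sum. The looping of the background sound, the gain value, the clipping to [-1.0, 1.0], and the names `mix`, `bg_gain`, and `emoticon_offset` are assumptions of this sketch rather than features stated in the application.

```python
def mix(voice, background=None, emoticon=None, emoticon_offset=0, bg_gain=0.5):
    """Mix call audio with an optional looping background sound and a one-shot emoticon.

    All inputs are lists of normalized samples; the result is clipped to [-1.0, 1.0].
    """
    out = list(voice)
    if background:
        for i in range(len(out)):
            # Loop the background sound for the whole length of the call audio.
            out[i] += bg_gain * background[i % len(background)]
    if emoticon:
        for i, sample in enumerate(emoticon):
            j = emoticon_offset + i
            if j < len(out):
                out[j] += sample
    return [max(-1.0, min(1.0, s)) for s in out]


plain = mix([0.25, 0.25, 0.25, 0.25])
with_rain = mix([0.0, 0.0, 0.0, 0.0], background=[0.5, -0.5])
with_emoticon = mix([0.5, 0.5], emoticon=[0.75], emoticon_offset=1)
```

With neither a background sound nor a voice emoticon, the function returns the call audio unchanged, which corresponds to the pass-through case described earlier.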

[0197] Optionally, the audio fitting control module in the electronic device 100 can also instruct the mixing module to mix the audio corresponding to the voice emoticon and the downlink call audio. In the downlink call audio processing path, the mixing module can send the audio obtained by mixing the audio corresponding to the voice emoticon and the downlink call audio to the audio output device. In this way, after sending a voice emoticon, the user of the electronic device 100 can hear the audio corresponding to the voice emoticon they sent.

[0198] In some embodiments, during a voice call, when an operation to send a voice emoticon is received, the call management application of electronic device 100 can also send the display content corresponding to the voice emoticon to the call application. The call application of electronic device 100 can send the display content corresponding to the voice emoticon to the call application of electronic device 101. The call application of electronic device 101 can then have the display content corresponding to the voice emoticon displayed by an image display device.

[0199] In some embodiments, during a voice call, when a display content corresponding to a voice emoticon is received from electronic device 101, the call application in electronic device 100 can forward the display content to an image display device for display. If the display content corresponding to the voice emoticon is video, the call application in electronic device 100 can send the display content to a video decoding module. The video decoding module can then send the video-decoded display content corresponding to the voice emoticon to the image display device via an image fusion module for display.

[0200] In some embodiments, during a video call, when an operation to send a voice emoticon is received, the call management application of electronic device 100 can send the display content corresponding to the voice emoticon to the image fusion module. The image fusion module can fuse the image captured by the image acquisition device and the display content corresponding to the voice emoticon, and send the fused image to the video encoding module. The video encoding module in electronic device 100 can encode the fused image and send the encoded video to the call application. The call application in electronic device 100 can then send the encoded video to the call application of electronic device 101 via the call connection. Thus, electronic device 101 can display a video screen that incorporates the display content corresponding to the voice emoticon.

[0201] The call audio that includes the audio corresponding to the voice emoticon, and the display content corresponding to the voice emoticon, can each carry time synchronization information (e.g., a timestamp). This time synchronization information can be used to synchronize playback of the audio corresponding to the voice emoticon with display of the display content. That is, based on the time synchronization information, the electronic device 101 can play the call audio that includes the audio corresponding to the voice emoticon while simultaneously displaying the display content corresponding to the voice emoticon. For the synchronization method, reference may be made to the audio-video synchronization method used in a video call scenario. The embodiments of this application do not limit the method for synchronizing the audio corresponding to the voice emoticon and the display content.
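As a non-limiting illustration, a receiving device might use the timestamps as follows: display content is held until the playback position of the call audio reaches the timestamp carried by that content. The class name `EmoticonDisplayScheduler`, the millisecond unit, and the callback names are hypothetical; the application leaves the synchronization method open.

```python
import heapq


class EmoticonDisplayScheduler:
    """Release display content once audio playback reaches its timestamp."""

    def __init__(self):
        self._pending = []  # min-heap of (timestamp_ms, content)

    def on_display_content(self, timestamp_ms, content):
        """Called when display content for a voice emoticon arrives."""
        heapq.heappush(self._pending, (timestamp_ms, content))

    def on_audio_frame(self, audio_timestamp_ms):
        """Called as each audio frame is played; returns content that is now due."""
        due = []
        while self._pending and self._pending[0][0] <= audio_timestamp_ms:
            due.append(heapq.heappop(self._pending)[1])
        return due


scheduler = EmoticonDisplayScheduler()
scheduler.on_display_content(40, "fireworks animation")
early = scheduler.on_audio_frame(20)
on_time = scheduler.on_audio_frame(40)
later = scheduler.on_audio_frame(60)
```

Here the content is not shown at the 20 ms frame, is released at the 40 ms frame that carries the emoticon audio, and is not released a second time afterwards.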

[0202] Optionally, the image fusion module of the electronic device 100 can also perform image fusion on the images contained in the video decoded by the video decoding module and the display content corresponding to the voice emoticons, and then display the fused image on the image display device. In this way, after sending a voice emoticon, the user of the electronic device 100 can see the display content corresponding to the voice emoticon they sent.

[0203] In the scenario where electronic device 100 sends a voice emoticon to electronic device 101, the image fused with the display content corresponding to the voice emoticon can be an image contained in the downlink video call of electronic device 100, or it can be an image contained in the uplink video call of electronic device 100.

[0204] As can be seen, when the user of electronic device 100 sets a background sound for the call, electronic device 100 can mix the background sound with the uplink call audio and send it to electronic device 101. In a voice call scenario, when receiving an operation to send a voice emoticon from the user, electronic device 100 can mix the audio corresponding to the voice emoticon with the uplink call audio and send it to electronic device 101, and also send the display content corresponding to the voice emoticon to electronic device 101. In a video call scenario, when receiving an operation to send a voice emoticon from the user, electronic device 100 can mix the audio corresponding to the voice emoticon with the uplink call audio and send it to electronic device 101, and also merge the display content corresponding to the voice emoticon with the uplink call video and send it to electronic device 101. Electronic device 101 plays the audio from electronic device 100, allowing the user of electronic device 101 to hear the background sound set by the other end of the call and / or the audio corresponding to the voice emoticon sent by the other end. Electronic device 101 displays content according to the display data from electronic device 100, allowing the user of electronic device 101 to see the display content corresponding to the voice emoticon sent by the other end of the call. Therefore, in this architecture, the call management application, voice emoticon / call background sound management module, and audio fitting control module in electronic device 101 are optional.

[0205] Figure 4 illustrates an exemplary schematic diagram of another communication system architecture provided in an embodiment of this application.

[0206] As shown in Figure 4, electronic device 100 and electronic device 101 can establish a call connection based on a call application.

[0207] During a call, electronic device 100 can send uplink call audio to electronic device 101. Electronic device 100 can receive and play downlink call audio from electronic device 101. If the call connection between electronic device 100 and electronic device 101 is a video call connection, electronic device 100 can send uplink call video to electronic device 101. Electronic device 100 can receive and play downlink call video from electronic device 101. For the uplink call audio, downlink call audio, uplink call video, and downlink call video in electronic device 100, refer to the description in the foregoing embodiments.

[0208] In some embodiments, when an operation to set a background sound for a call is received, the call management application of electronic device 100 can send an identifier of the background sound for the call to the call management application in electronic device 101.

[0209] The call management application in electronic device 101 can send an identifier of the call background sound from electronic device 100 to the voice emoticon / call background sound management module. The voice emoticon / call background sound management module can obtain the call background sound based on the identifier and send it to the audio fitting control module. The audio fitting control module can instruct the mixing module to mix the call background sound and the downlink call audio from electronic device 100. Specifically, the call application in electronic device 101 can send the downlink call audio from electronic device 100 to the audio decoding module. The audio decoding module can decode the downlink call audio and send the decoded downlink call audio to the noise reduction module. The noise reduction module can reduce the noise in the decoded downlink call audio and send the denoised downlink call audio to the mixing module. The mixing module can mix the denoised downlink call audio and the call background sound sent by the audio fitting control module and then output the mixed audio to the audio output device for playback.

[0210] In some embodiments, in a video call scenario, when an operation to set a virtual background is received, the call management application of electronic device 100 can send an identifier of the virtual background to the call management application in electronic device 101.

[0211] The call management application in electronic device 101 can send a corresponding virtual background to the image fusion module based on the identifier of the virtual background from electronic device 100. The image fusion module in electronic device 101 can replace the background of the video call from electronic device 100, which has been decoded, with the virtual background corresponding to the identifier of the aforementioned virtual background. The image fusion module in electronic device 101 can then display the video with the replaced background on an image display device.

[0212] In some embodiments, when an operation to send a voice emoticon is received, the call management application of electronic device 100 can send the identifier of the voice emoticon to the call management application in electronic device 101.

[0213] The call management application in electronic device 101 can send the identifier of the voice emoticon from electronic device 100 to the voice emoticon / call background sound management module. The voice emoticon / call background sound management module can obtain the audio and / or display content corresponding to the voice emoticon based on the aforementioned voice emoticon identifier.

[0214] The voice emoticon / call background sound management module in electronic device 101 can send the audio corresponding to the voice emoticon to the audio fitting control module, and the audio fitting control module can instruct the mixing module to mix the audio corresponding to the voice emoticon with the downlink call audio from electronic device 100. The mixing module can mix the noise-reduced downlink call audio with the audio corresponding to the voice emoticon sent by the audio fitting control module, and then output the mixed audio to the audio output device for playback. In this way, the user of electronic device 101 can hear the audio corresponding to the voice emoticon sent by the user of electronic device 100.
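As a non-limiting illustration, the identifier-based handling on the receiving device might be sketched as a lookup in a locally stored library followed by mixing into the downlink call audio. The library contents, the identifiers, the behavior for an unknown identifier, and the names `VoiceEmoticon`, `EMOTICON_LIBRARY`, and `handle_emoticon_identifier` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceEmoticon:
    audio: List[float]                      # normalized samples
    display_content: Optional[str] = None   # e.g. a reference to an animation


# Hypothetical local library on the receiving device, keyed by identifier.
EMOTICON_LIBRARY = {
    "applause": VoiceEmoticon(audio=[0.25, 0.25], display_content="clap.gif"),
    "drum": VoiceEmoticon(audio=[0.5]),
}


def handle_emoticon_identifier(identifier, downlink_audio):
    """Resolve an identifier and mix the emoticon audio into downlink call audio.

    Returns (audio_to_play, display_content). An unknown identifier leaves the
    call audio untouched; that fallback is an assumption of this sketch.
    """
    emoticon = EMOTICON_LIBRARY.get(identifier)
    if emoticon is None:
        return list(downlink_audio), None
    mixed = list(downlink_audio)
    for i, sample in enumerate(emoticon.audio):
        if i < len(mixed):
            mixed[i] = max(-1.0, min(1.0, mixed[i] + sample))
    return mixed, emoticon.display_content


audio_out, content = handle_emoticon_identifier("applause", [0.0, 0.5, 0.0])
fallback = handle_emoticon_identifier("unknown", [0.125])
```

Because only the identifier crosses the call connection in this architecture, both devices would need to hold, or be able to obtain, the same emoticon resources.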

[0215] Optionally, if the audio fitting control module in electronic device 101 has received the call background sound determined by the identifier of the call background sound from electronic device 100 before receiving the audio corresponding to the voice emoticon, the audio fitting control module in electronic device 101 can instruct the mixing module to mix the audio corresponding to the voice emoticon, the downlink call audio from electronic device 100, and the call background sound.

[0216] Optionally, in a scenario where electronic device 100 sends a voice emoticon to electronic device 101, the audio fitting control module of electronic device 100 can instruct the mixing module to mix the audio corresponding to the voice emoticon and the downlink call audio. The mixing module of electronic device 100 can mix the noise-reduced downlink call audio and the audio corresponding to the voice emoticon sent by the audio fitting control module, and then output the mixed audio to the audio output device for playback. In this way, the user of electronic device 100 can hear the audio corresponding to the voice emoticon they sent after sending it.

[0217] In some embodiments, during a voice call, when an identifier for a voice emoticon is received from the electronic device 100, the call management application in the electronic device 101 can send the display content corresponding to the voice emoticon to the image fusion module. The image fusion module in the electronic device 101 can then have the display content corresponding to the voice emoticon displayed by an image display device. The display content corresponding to the voice emoticon can be displayed floating on the screen. Alternatively, the display content corresponding to the voice emoticon can be displayed in a preset area of the voice call interface. This preset area can be an area agreed upon by the call management application and the call application. The embodiments of this application do not limit the display area and display method of the voice emoticon.

[0218] In some embodiments, during a video call, the image fusion module of electronic device 101 can also receive the downlink call video after video decoding. The image fusion module of electronic device 101 can perform image fusion on the display content corresponding to the voice emoticon and the image contained in the downlink call video, and then send the fused image to an image display device for display. In this way, electronic device 101 can display the display content corresponding to the voice emoticon sent by electronic device 100.

[0219] Optionally, in a scenario where electronic device 100 sends voice emoticons to electronic device 101, the call management application of electronic device 100 can send the display content corresponding to the voice emoticon to the image fusion module. The image fusion module in electronic device 100 can either have the display content corresponding to the voice emoticon displayed by an image display device, or perform image fusion between the display content corresponding to the voice emoticon and the image contained in the downlink call video, and then display the result on the image display device. In this way, after sending a voice emoticon, the user of electronic device 100 can see the display content corresponding to the voice emoticon they sent.

[0220] In some embodiments, the identifiers of the background sound, virtual background, and voice emoticon can be transmitted based on the call connection established between the call application in electronic device 100 and the call application in electronic device 101. For example, after electronic device 100 and electronic device 101 establish a call connection, they can transmit call audio and video through this connection. When an operation is detected where a user sets a background sound or sends a voice emoticon, the call application in electronic device 100 can obtain the identifier of the background sound or the identifier of the voice emoticon from the call management application. Then, the call application in electronic device 100 can send the identifier of the background sound or the identifier of the voice emoticon to electronic device 101 through the call connection. The call management application in electronic device 101 can obtain the identifier of the background sound or the identifier of the voice emoticon from the call application.

[0221] Alternatively, the transmission path for the aforementioned background sound identifier, virtual background identifier, and voice emoticon identifier may differ from the transmission path used to transmit call audio and video. For example, when the call management application in electronic device 100 detects that a user has set a background sound or sent a voice emoticon, the call management application can obtain information about the other party in the call (such as the other party's account information, device information, etc.) from the call application of electronic device 100. Based on the information of the other party, the call management application of electronic device 100 can send the background sound identifier or the voice emoticon identifier to the call management application of the other party (such as electronic device 101).
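As a non-limiting illustration, an identifier could be carried as a small self-describing message, regardless of whether it travels over the call connection or over a separate path between the call management applications. The JSON layout, the field names `kind` and `id`, and the function names below are hypothetical; the application does not define a message format.

```python
import json

ALLOWED_KINDS = {"background_sound", "virtual_background", "voice_emoticon"}


def encode_identifier_message(kind, identifier):
    """Serialize an identifier into bytes for transmission to the other device."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported identifier kind: {kind}")
    return json.dumps({"kind": kind, "id": identifier}).encode("utf-8")


def decode_identifier_message(payload):
    """Parse a received message and return (kind, identifier)."""
    message = json.loads(payload.decode("utf-8"))
    if message.get("kind") not in ALLOWED_KINDS:
        raise ValueError("unsupported identifier kind")
    return message["kind"], message["id"]


payload = encode_identifier_message("voice_emoticon", "applause")
kind, identifier = decode_identifier_message(payload)
```

On receipt, the call management application of the other device can dispatch on `kind` to the background sound, virtual background, or voice emoticon handling described above.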

[0222] In some embodiments, when both electronic devices 100 and 101 include a call management application, a voice emoticon / call background sound management module, and an audio fitting control module, electronic device 100, upon detecting a user's action of setting a call background sound or sending a voice emoticon, can mix the audio corresponding to the call background sound / voice emoticon with the acquired audio and send it to electronic device 101. Furthermore, electronic device 100 can also fuse the display content corresponding to the virtual background / voice emoticon with the acquired image and send it to electronic device 101. For details, please refer to the description in Figure 3 above.

[0223] In a voice call scenario, when a user sends a voice emoticon, electronic device 100 can mix the audio corresponding to the voice emoticon with the acquired audio and send it to electronic device 101, as well as send the display content corresponding to the voice emoticon to electronic device 101. Since electronic device 101 includes a call management application, the call management application in electronic device 101 can recognize and display the display content corresponding to the voice emoticon sent by electronic device 100 on the screen.

[0224] As can be seen, electronic device 100 can send one or more of the following identifiers to electronic device 101: the identifier of the background audio, the identifier of the virtual background, and the identifier of the voice emoticon. Electronic device 101 can obtain the background audio based on the identifier of the background audio, the virtual background based on the identifier of the virtual background, and the audio and / or display content corresponding to the voice emoticon based on the identifier of the voice emoticon. In this way, electronic device 101 can play call audio containing the background audio and / or the voice emoticon, and display the corresponding content of the voice emoticon and / or video footage containing the virtual background.

[0225] For the method by which electronic device 101 sends call audio containing a user-set background sound to electronic device 100, refer to the method, described with reference to Figure 3 or Figure 4, by which electronic device 100 sends call audio containing a user-set background sound to electronic device 101. Likewise, for the method by which electronic device 101 sends voice emoticons to electronic device 100, refer to the method, described with reference to Figure 3 or Figure 4, by which electronic device 100 sends voice emoticons to electronic device 101. Details are not repeated here.

[0226] The following describes some call scenarios provided by the embodiments of this application.

[0227] Figures 5A to 5E illustrate some call scenarios.

[0228] As shown in Figure 5A, the electronic device 100 can display a call interface 510. The call interface 510 may include contact identifiers 511.

[0229] The contact identifier 511 can be used to indicate the party with whom the electronic device 100 is communicating. The contact identifier 511 may include one or more pieces of information such as the contact's name, phone number, and profile picture. For example, a contact identifier 511 that includes "Li Si" can indicate that the other party to the call on the electronic device 100 is the user with the contact name "Li Si". The call interface 510 is the user interface for an outgoing call. That is, the electronic device 100 is the caller and is calling the user with the contact name "Li Si".

[0230] The call interface 510 may also include a background sound control 512. The background sound control 512 can be used to bring up a user interface for setting the call background sound. In response to an operation on the background sound control 512, the electronic device 100 can display the user interface 520 shown in Figure 5B.

[0231] As shown in Figure 5B, the user interface 520 may include a switch 545. Switch 545 can be used to turn on or off the call background sound. When the call background sound is on, the electronic device 100 can mix the user-set call background sound into the call audio transmitted to the other end of the call. When the call background sound is off, the electronic device 100 can stop mixing the user-set call background sound into the call audio transmitted to the other end of the call.

[0232] User interface 520 may include one or more options corresponding to background sounds during a call, such as background sound option 1 522, background sound option 2 523, background sound option 3 524, and background sound option 4 525. User interface 520 may include a checkbox 521. The checkbox 521 can be used to indicate the currently selected background sound option. For example, as shown in Figure 5B, the checkbox 521 is located at the position corresponding to background sound option 1 522, which can indicate that background sound option 1 522 is currently selected.

[0233] In some embodiments, the call background sound options may include options for call background sounds stored in the electronic device 100, as well as options for call background sounds that are stored on the cloud server and have not yet been downloaded to and stored in the electronic device 100.

[0234] The user interface 520 may also include a more control 526, a search control 527, a one-click background sound configuration control 528, a recording control 529, a background sound generation control 530, and a volume adjustment control 531.

[0235] The more control 526 can be used to view more call background sound options.

[0236] The search control 527 can be used to search for call background sounds. For example, based on the search terms entered in the search control 527, the electronic device 100 can search local storage or a cloud server for call background sounds related to the search terms.
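
A search across local storage and a cloud server could look like the following minimal sketch. The function name, the plain substring match, and the `"local"` / `"cloud"` labels are assumptions made for illustration; the text above does not specify how matching is performed.

```python
def search_background_sounds(term, local, cloud):
    """Return (name, source) pairs whose name contains the search term.

    local and cloud are lists of sound names. A sound that is already
    stored locally is reported once, as "local".
    """
    term = term.lower()
    results = [(name, "local") for name in local if term in name.lower()]
    local_names = set(local)
    results += [
        (name, "cloud")
        for name in cloud
        if term in name.lower() and name not in local_names
    ]
    return results
```

Labeling each hit with its source lets a user interface distinguish sounds that are ready to use from sounds that still need to be downloaded.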

[0237] The one-click background sound configuration control 528 can be used to select audio and / or video to generate background sound for a call. For example, in response to an operation on the one-click background sound configuration control 528, the electronic device 100 can provide a user interface for selecting audio and / or video. The electronic device 100 can generate background sound for a call based on the audio and / or video selected by the user. The audio and / or video selected by the user can be audio and / or video stored in the electronic device 100 (e.g., audio and video recorded by the user), or audio and / or video stored on a cloud server (e.g., audio in a music playback application, video in a video playback application).

[0238] The recording control 529 can be used to trigger the electronic device 100 to record audio.

[0239] Background sound generation control 530 can be used to process audio recorded by recording control 529 to generate background sound for calls.

[0240] The embodiments of this application do not limit the methods for generating call background sound based on audio and / or video as described above.

[0241] As can be seen, users can generate call background sounds using previously recorded audio or video, audio or video downloaded from a cloud server, or temporarily recorded audio. These embodiments provide users with multiple ways to obtain call background sounds, allowing them to use their preferred background sounds during calls and improving their overall experience.

[0242] The volume adjustment control 531 can be used to adjust the volume of the call background sound in the call audio into which it is mixed. That is, the electronic device 100 can determine the volume of the call background sound according to the volume adjustment control 531, and mix the call background sound with the acquired audio at that volume.
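
Mixing at a user-selected volume can be illustrated with the sketch below. It assumes audio is represented as lists of float samples in the range -1.0 to 1.0, that the background sound loops when it is shorter than the acquired audio, and that the result is hard-clipped; none of these details are stated in this application, and `mix_with_background` is a hypothetical name.

```python
def mix_with_background(voice, background, bg_volume):
    """Mix the background into the voice at the given gain, sample by sample.

    voice, background: lists of float samples in [-1.0, 1.0].
    bg_volume: gain from 0.0 to 1.0 taken from the volume adjustment control.
    """
    if not 0.0 <= bg_volume <= 1.0:
        raise ValueError("bg_volume must be between 0.0 and 1.0")
    if not background:
        return list(voice)
    mixed = []
    for i, sample in enumerate(voice):
        # Loop the background if it is shorter than the voice.
        total = sample + bg_volume * background[i % len(background)]
        # Clip so the mixed sample stays in the valid range.
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```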

[0243] The user interface 520 may also include one or more volume change mode options, such as a gradual increase option 532, a gradual decrease option 533, a proportional change option 534, and a sound duration extension option 535. The gradual increase option 532 can be used to select a volume change mode in which the volume of the call background sound in the call audio gradually increases (e.g., from a preset minimum volume to a preset maximum volume). The gradual decrease option 533 can be used to select a volume change mode in which the volume of the call background sound in the call audio gradually decreases (e.g., from a preset maximum volume to a preset minimum volume). The proportional change option 534 can be used to select a volume change mode in which the volume of the call background sound changes in proportion to the volume of the speech (i.e., human voice) in the call audio (e.g., if the volume of the human voice in the call audio increases, the volume of the call background sound in the call audio also increases proportionally). The sound duration extension option 535 can be used to set the call background sound to play at a slow speed (e.g., 0.5x speed) or a fast speed (e.g., 2x speed). Changing the playback speed of the call background sound can also change its pitch. The above volume change modes are merely illustrative examples of this application and should not be construed as limiting the scope of this application.
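
The three volume change modes can be expressed as a gain function, as in the following sketch. The linear ramp, the default minimum and maximum gains, and the `ratio` parameter are assumptions chosen for illustration; this application only says that the volume increases, decreases, or follows the speech volume.

```python
def background_gain(mode, progress, voice_level=0.0,
                    min_gain=0.1, max_gain=1.0, ratio=0.5):
    """Return the background sound gain for one audio frame.

    mode: "increase", "decrease", or "proportional".
    progress: position within the ramp, from 0.0 (start) to 1.0 (end).
    voice_level: current speech level in [0.0, 1.0]; used by "proportional".
    """
    progress = max(0.0, min(1.0, progress))
    if mode == "increase":
        return min_gain + (max_gain - min_gain) * progress
    if mode == "decrease":
        return max_gain - (max_gain - min_gain) * progress
    if mode == "proportional":
        # The gain follows the speech level, capped at the maximum gain.
        return max(0.0, min(max_gain, ratio * voice_level))
    raise ValueError(f"unknown mode: {mode}")
```

The returned gain would be applied to the background sound before it is mixed with the acquired audio.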

[0244] The user interface 520 may also include one or more background sound switching mode options, such as "Fade Out-Quiet-Fade In" option 536, "Fade Out-Mix-Fade In" option 537, and direct switching option 538. The "Fade Out-Quiet-Fade In" option 536 can be used to select the call background sound switching mode as "Fade Out-Quiet-Fade In". The "Fade Out-Mix-Fade In" option 537 can be used to select the call background sound switching mode as "Fade Out-Mix-Fade In". The direct switching option 538 can be used to select the call background sound switching mode as "Direct Switch". For example, the call background sound before the switch is background sound 1, and the call background sound after the switch is background sound 2. "Fade Out-Quiet-Fade In" can mean gradually decreasing the volume of background sound 1 in the call audio to 0, then switching the call background sound used for mixing from background sound 1 to background sound 2, and gradually increasing the volume of background sound 2 in the call audio. "Fade out-mix-fade in" can mean gradually decreasing the volume of background sound 1 in the call audio to 0, simultaneously mixing background sound 1 and background sound 2 in the call audio before the volume of background sound 1 decreases to 0, and gradually increasing the volume of background sound 2. "Direct switch" can mean directly switching the call background sound from background sound 1 to background sound 2 in the call audio, that is, there is no process of gradually decreasing the volume of background sound 1 and gradually increasing the volume of background sound 2 when switching the call background sound. The above background sound switching modes are only illustrative examples of this application and should not be construed as limiting this application.
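
The difference between the three switching modes can be made concrete by listing the gains applied to background sound 1 and background sound 2 at each step of a switch. The sketch below is illustrative only: the mode strings, the linear ramps, and the even split between fade-out and fade-in are assumptions, not details given in this application.

```python
def switch_gains(mode, steps):
    """Return a list of (gain_sound_1, gain_sound_2) pairs, one per step."""
    if steps < 4:
        raise ValueError("steps must be at least 4")
    if mode == "direct":
        # No ramp: sound 2 replaces sound 1 immediately.
        return [(0.0, 1.0)] * steps
    if mode == "fade_out_mix_fade_in":
        # Crossfade: both sounds are mixed while sound 1 falls and sound 2 rises.
        return [(1.0 - i / (steps - 1), i / (steps - 1)) for i in range(steps)]
    if mode == "fade_out_quiet_fade_in":
        # Sound 1 falls to 0 first; only then does sound 2 rise from 0.
        half = steps // 2
        rest = steps - half
        fade_out = [(1.0 - i / (half - 1), 0.0) for i in range(half)]
        fade_in = [(0.0, i / (rest - 1)) for i in range(rest)]
        return fade_out + fade_in
    raise ValueError(f"unknown mode: {mode}")
```

In the "fade_out_quiet_fade_in" result the two gains are never nonzero at the same step, whereas in the "fade_out_mix_fade_in" result the middle steps carry both sounds.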

[0245] The user interface 520 may also include one or more background sound switching duration options, such as options 539, 540, and 541. The background sound switching duration can be used to indicate the time required to complete the background sound switching. Specifically, when the selected background sound switching mode is the aforementioned "fade out-quiet-fade in" option 536 or "fade out-mix-fade in" option 537, the electronic device 100 can provide the aforementioned background sound switching duration options. Option 539 corresponds to a background sound switching duration of 500 milliseconds (ms). Option 540 corresponds to a background sound switching duration of 1 second (s). Option 541 corresponds to a background sound switching duration of 2 seconds. It is understood that the longer the background sound switching duration, the slower the rate at which the volume of background sound 1 decreases and the rate at which the volume of background sound 2 increases during the switching from background sound 1 to background sound 2. Not limited to options 539, 540, and 541, the electronic device 100 may also provide more or fewer background sound switching duration options.
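
The relationship between switching duration and fade rate can be worked through numerically. The sketch below assumes a 16 kHz sample rate and a linear fade over the whole duration; both are illustrative assumptions, and `fade_step` is a hypothetical helper.

```python
def fade_step(duration_ms, sample_rate_hz=16000):
    """Return (samples_in_fade, gain_change_per_sample) for a full-range fade."""
    samples = duration_ms * sample_rate_hz // 1000
    if samples <= 0:
        raise ValueError("duration is too short for the given sample rate")
    return samples, 1.0 / samples


# The three durations offered by options 539, 540, and 541.
for duration_ms in (500, 1000, 2000):
    samples, step = fade_step(duration_ms)
    print(duration_ms, "ms:", samples, "samples, gain step", step)
```

Under these assumptions, doubling the switching duration halves the per-sample gain change, which matches the statement that a longer duration gives a slower rate of change.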

[0246] The user interface 520 may also include a preview control 542, a setting control 543, and a cancel control 544.

[0247] The preview control 542 can be used to preview the selected background sound. In response to an operation on the preview control 542, the electronic device 100 can play the call background sound corresponding to the selected background sound option in the user interface 520, such as the call background sound corresponding to background sound option 1 522 shown in Figure 5B. The electronic device 100 can also play a switching audio for the selected call background sound, based on the selected volume change mode option, the selected background sound switching mode option, and the selected background sound switching duration option in the user interface 520. This allows users to preview the sound effect of the background sound switch and adjust one or more background sound settings such as volume, background sound switching mode, and background sound switching duration. The current call background sound in the call audio can be the call background sound previously set by the user, or it can be the actual background sound captured by the audio input device during the call.

[0248] The setting control 543 can be used to set the background sound of the selected background sound option in the user interface 520 as the background sound of the current call. The electronic device 100 can mix the selected background sound into the call audio based on one or more background sound settings in the user interface 520 (e.g., volume, background sound switching mode, background sound switching duration).

[0249] The cancel control 544 can be used to cancel switching the call background sound. In response to an operation on the cancel control 544, the electronic device 100 can exit the user interface 520 and display the call interface 510 shown in Figure 5A.

[0250] As shown in Figure 5B, in response to an operation on background sound option 2 523, the electronic device 100 can display the user interface 520 shown in Figure 5C.

[0251] As shown in Figure 5C, the electronic device 100 moves the checkbox 521 from the position corresponding to background sound option 1 522 to the position corresponding to background sound option 2 523. That is, background sound option 2 523 is currently selected. In response to an operation on the setting control 543 shown in Figure 5C, the electronic device 100 can set the call background sound corresponding to background sound option 2 523 as the call background sound for the current call.

[0252] Setting the call background sound corresponding to background sound option 2 523 as the call background sound for the current call can include the following: after a call made by electronic device 100 is answered, electronic device 100 can mix the call background sound corresponding to background sound option 2 523 with the acquired audio and send the mixed audio to the other end of the call. Optionally, before mixing, electronic device 100 can perform audio signal processing such as noise reduction on the acquired audio.

[0253] In some embodiments, in addition to the background sound control 512 shown in Figure 5A, the electronic device 100 may also provide other operation entry points for users to set the call background sound. For example, the electronic device 100 may provide an operation entry point for setting the call background sound in a pull-down notification interface.

[0254] As shown in Figure 5D, the electronic device 100 displays a call interface 510. In response to a swipe down from the top of the screen, the electronic device 100 can display a pull-down notification interface 550 as shown in Figure 5E. This embodiment does not limit the operation used to access the pull-down notification interface 550.

[0255] As shown in Figure 5E, the pull-down notification interface 550 may include notification box 551 and notification box 552.

[0256] The notification box 551 can be used to control an ongoing call on the electronic device 100. The notification box 551 may include a contact avatar 551A and a contact name 551B. The contact avatar 551A may be the avatar of a logged-in user on a calling application within an electronic device (e.g., electronic device 101) that has established a call connection with electronic device 100. Both the contact avatar 551A and the contact name 551B can be used to indicate the person being called by electronic device 100.

[0257] Notification box 552 can be used to set the call background sound. Notification box 552 may include switch 552I. Switch 552I can be used to turn the call background sound on or off.

[0258] The notification box 552 may include one or more background sound options, such as rain sound option 552A, singing sound option 552B, traffic sound option 552C, and keyboard sound option 552D. Rain sound option 552A may include rain sounds as the background sound. When rain sound option 552A is selected, the electronic device 100 can simulate a user talking in a rainy environment using a background sound including rain. Singing sound option 552B may include singing sounds as the background sound. When singing sound option 552B is selected, the electronic device 100 can simulate a user talking in a karaoke venue using a background sound including singing. Traffic sound option 552C may include traffic sounds as the background sound. When traffic sound option 552C is selected, the electronic device 100 can simulate a user talking on the side of the road using a background sound including traffic. Keyboard sound option 552D may include keyboard sounds as the background sound. When keyboard sound option 552D is selected, the electronic device 100 can simulate a user talking in an office environment using a background sound including keyboard. The background sound options described above are merely illustrative examples of this application and should not be construed as limiting the scope of this application.

[0259] The notification box 552 may also include a default control 552E, a preview control 552F, a settings control 552G, and a more control 552H. The default control 552E can be used to set the call background sound as the default background sound, or to cancel a previously set default background sound. The default background sound means that, if the user does not switch the call background sound, the electronic device 100 can directly mix the default background sound with the audio acquired by the audio input device during a call, and send the mixed audio to the other end of the call. The preview control 552F can be used to preview the selected call background sound in the notification box 552. The settings control 552G can be used to set the selected call background sound in the notification box 552 as the call background sound for the current call of the electronic device 100. The more control 552H can be used to bring up a user interface containing more background sound settings. For example, in response to an operation on the more control 552H, the electronic device 100 can display the user interface 520 shown in Figure 5B.
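
The interaction between the on/off switch, an explicit selection, and the default background sound can be summarized in a few lines. This is a sketch under the assumption that an explicit selection for the current call takes priority over the default; the function and parameter names are hypothetical.

```python
from typing import Optional


def choose_background(switch_on: bool,
                      selected: Optional[str],
                      default: Optional[str]) -> Optional[str]:
    """Return the background sound to mix, or None when nothing is mixed."""
    if not switch_on:
        # The call background sound is turned off: send the acquired audio only.
        return None
    # Use the sound chosen for this call; otherwise fall back to the default.
    return selected if selected is not None else default
```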

[0260] As described in the above scenario, a user can use electronic device 100 to call another user and set a call background sound before the call is answered. Once the call is connected, electronic device 100 can generate uplink call audio based on the user-set call background sound and the acquired audio. This simulates a virtual call environment for the caller as soon as the call is connected, which helps protect the caller's privacy.

[0261] Figures 6A and 6B illustrate schematic diagrams of other call scenarios.

[0262] As shown in Figure 6A, the electronic device 101 can display an incoming call answering interface 610. The incoming call answering interface 610 can be used to indicate that the electronic device 101 has received an incoming call.

[0263] The incoming call answering interface 610 may include a contact identifier 611. The contact identifier 611 can be used to indicate the other party to the call on the electronic device 101. For example, a contact identifier 611 that includes "Zhang San" can indicate that the other party to the call on the electronic device 101 is the user whose contact name is "Zhang San". It can be seen that the electronic device 101 is the called party and is being called by the user whose contact name is "Zhang San".

[0264] For example, electronic device 101 can be the electronic device called by electronic device 100 shown in Figure 5A, that is, the electronic device of the user whose contact name is "Li Si". Electronic device 100 can be the electronic device of the user whose contact name is "Zhang San" shown in Figure 6A.

[0265] The incoming call answering interface 610 may include an SMS control 612, a hang-up control 613, a virtual scene answering control 614, and a normal answering control 615.

[0266] SMS control 612 can be used by electronic device 101 to send SMS messages to the other end of a call.

[0267] The hang-up control 613 can be used by electronic device 101 to hang up the phone.

[0268] The virtual scene answering control 614 can be used to connect a phone call. In response to an operation of the virtual scene answering control 614, the electronic device 101 can connect the phone call and mix a preset background sound into the audio transmitted to the other end of the call. For example, the preset background sound could be a default background sound set by the user in advance.

[0269] The normal answering control 615 can be used to connect a phone call. When a phone call is connected through an operation on the normal answering control 615, the call audio transmitted by the electronic device 101 to the other party does not include a virtual background sound.

[0270] In some embodiments, in addition to the virtual scene answering control 614 shown in Figure 6A, the electronic device 101 may also provide other operation entry points for users to set the call background sound. For example, the electronic device 101 may provide an operation entry point for setting the call background sound in the pull-down notification interface.

[0271] As shown in Figure 6A, in response to a swipe-down operation from the top of the screen of electronic device 101, electronic device 101 can display a pull-down notification interface 620 as shown in Figure 6B. The pull-down notification interface 620 may include a notification box 621 and a notification box 622.

[0272] The notification box 621 can be used to control an ongoing call on the electronic device 101. The notification box 621 may include a contact avatar 621A and a contact name 621B. The contact avatar 621A may be the avatar of a logged-in user on a calling application in an electronic device (e.g., electronic device 100) that has established a call connection with electronic device 101. Both the contact avatar 621A and the contact name 621B can be used to indicate the other party to the call with electronic device 101.

[0273] The notification box 622 can be used to set the background sound during a call. For the notification box 622, refer to the notification box 552 shown in Figure 5E above.

[0274] As can be seen from the above scenario, after receiving an incoming call, the user can answer the call with one click and turn on the background sound using the virtual scene answering control 614. Alternatively, after receiving an incoming call, the user can set the background sound before answering the call through other operation interfaces provided by the electronic device 101. The above embodiments can immediately simulate the called user's call scene as soon as the call is connected, protecting the called user's call privacy.

[0275] Not limited to the voice call scenarios shown in Figures 5A-5E, 6A, and 6B above, in video call scenarios, both the calling and called users can also set background sounds before the video call is connected. For details, please refer to the methods for setting background sounds in the voice call scenarios described above. These will not be repeated here.

[0276] Figures 7A to 7G illustrate schematic diagrams of other call scenarios.

[0277] As shown in Figure 7A, after the telephone call between electronic device 100 and electronic device 101 is connected, electronic device 100 can display a call interface 710. The call interface 710 may include a contact identifier 711, a call duration 712, a background sound control 713, and an emoticon control 714.

[0278] For the contact identifier 711, refer to the contact identifier 511 shown in Figure 5A above.

[0279] The call duration 712 can be used to indicate how long the call has lasted since it was connected.

[0280] For the background sound control 713, refer to the background sound control 512 shown in Figure 5A above. It can be seen that, in addition to setting the background sound before the call is connected, the user can also set the background sound after the call is connected (for example, turning the background sound on / off, switching the background sound, adjusting the volume of the background sound, adjusting the switching mode of the background sound, etc.).

[0281] The emoticon control 714 can be used to bring up the user interface for sending voice emoticons. In response to an operation on the emoticon control 714, the electronic device 100 can display the prompt box 720 shown in Figure 7B.

[0282] As shown in Figure 7B, the prompt box 720 may include an emoticon display area 721, a search control 722, a one-click configuration control 723, a recording control 724, and a volume adjustment control 725.

[0283] The emoticon display area 721 includes one or more voice emoticons, such as voice emoticon 721B. The emoticon display area 721 may also include a checkbox 721A. The checkbox 721A can be used to indicate which voice emoticon is currently selected. For example, if the checkbox 721A is located at the position corresponding to voice emoticon 721B, it can indicate that voice emoticon 721B is currently selected. The voice emoticons in the emoticon display area 721 are merely illustrative examples of this application and should not be construed as limiting the scope of this application.

[0284] Search control 722 can be used to search for voice emoticons. For example, based on the search terms entered in search control 722, electronic device 100 can search for voice emoticons related to the search terms in local storage or a cloud server.

[0285] The one-click configuration control 723 can be used to select one or more types of data, such as audio, video, text, and symbols, to generate voice emoticons. For example, in response to an operation on the one-click configuration control 723, the electronic device 100 can provide a user interface for selecting data such as audio, video, text, and symbols. The electronic device 100 can generate voice emoticons based on the data selected by the user. The data selected by the user can be data stored in the electronic device 100 or data stored on a cloud server.

[0286] The recording control 724 can be used to trigger the electronic device 100 to record audio and generate voice emoticons based on the recorded audio.

[0287] This application does not limit the method for generating voice emoticons described above.

[0288] As can be seen, users can generate voice emoticons using pre-stored data, data downloaded from a cloud server, or temporarily recorded audio. These embodiments provide users with multiple ways to obtain voice emoticons, allowing them to use their favorite emoticons during calls and improving their overall experience.

[0289] The volume adjustment control 725 can be used to adjust the volume of the audio corresponding to the voice emoticon in the call audio. That is, the electronic device 100 can determine the volume of the audio corresponding to the voice emoticon according to the volume adjustment control 725, and mix the audio corresponding to the voice emoticon with the acquired audio according to the volume.

[0290] The prompt box 720 may also include one or more volume change mode options, such as a gradual increase option 726, a gradual decrease option 727, a proportional change option 728, and a sound duration extension option 729. The gradual increase option 726 can be used to select that the volume of the audio corresponding to the voice emoticon in the call audio gradually increases. The gradual decrease option 727 can be used to select that the volume of the audio corresponding to the voice emoticon in the call audio gradually decreases. The proportional change option 728 can be used to select that the volume of the audio corresponding to the voice emoticon in the call audio changes in proportion to the volume of the speech (i.e., human voice) in the call audio. The above volume change modes are merely illustrative examples of this application and should not be construed as limiting this application.

[0291] The prompt box 720 may also include an emoticon avatar fusion switch 730, a preview control 731, and a send control 732.

[0292] The emoticon avatar fusion switch 730 can be used to enable or disable the function of fusion between the display content corresponding to the voice emoticon and the user's avatar. When the function of fusion between the display content corresponding to the voice emoticon and the user's avatar is enabled, the electronic device 100 can perform image fusion between the display content corresponding to the voice emoticon and the personal avatar in the call application of the electronic device 100, and send the fused image to the other end of the call for display. This application embodiment does not limit the above image fusion method. For example, the image fusion method can be artificial intelligence (AI) image fusion.
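
Because the image fusion method is left open, the following sketch substitutes a plain per-pixel alpha blend for the AI image fusion mentioned as an example. It assumes that the avatar and the display content of the voice emoticon are equal-sized grids of (R, G, B) tuples; `fuse_images` and the `alpha` parameter are hypothetical.

```python
def fuse_images(avatar, overlay, alpha=0.5):
    """Blend the overlay onto the avatar pixel by pixel.

    avatar, overlay: equal-sized lists of rows of (R, G, B) integer tuples.
    alpha: weight of the overlay, from 0.0 (avatar only) to 1.0 (overlay only).
    """
    same_shape = len(avatar) == len(overlay) and all(
        len(row_a) == len(row_o) for row_a, row_o in zip(avatar, overlay)
    )
    if not same_shape:
        raise ValueError("avatar and overlay must have the same dimensions")
    return [
        [
            tuple(round((1 - alpha) * a + alpha * o) for a, o in zip(px_a, px_o))
            for px_a, px_o in zip(row_a, row_o)
        ]
        for row_a, row_o in zip(avatar, overlay)
    ]
```

The fused grid would then be encoded as an image and sent to the other end of the call in place of the unmodified display content.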

[0293] The preview control 731 can be used to preview the audio corresponding to the selected voice emoticon and to view the displayed content of the voice emoticon. The electronic device 100 can play the audio corresponding to the selected voice emoticon in the emoticon display area 721 according to the volume adjustment control 725 in the prompt box 720 and the selected volume change mode option. The electronic device 100 can display the displayed content corresponding to the voice emoticon, or the displayed content after the voice emoticon is merged with the user's avatar, depending on the on / off state of the emoticon avatar fusion switch 730. This allows users to easily preview the sound effects of the voice emoticons and view their display effects, thereby adjusting one or more emoticon settings such as volume and the emoticon avatar fusion switch 730.

[0294] The send control 732 can be used to send selected voice emoticons to the other end of a call.

[0295] As shown in Figure 7B, when the voice emoticon 721B is selected, in response to an operation on the send control 732, the electronic device 100 can send the voice emoticon 721B to the other end of the call (i.e., electronic device 101). For the method by which electronic device 100 sends the voice emoticon 721B to the other end of the call, refer to the description of Figure 3 or Figure 4 above.

[0296] As shown in Figure 7C, a voice call connection is established between electronic device 100 and electronic device 101. Electronic device 101 can display a call interface 750 and play the call audio of electronic device 100. Specifically, electronic device 101 can display content 751 on the call interface 750 based on a voice emoticon 721B from electronic device 100. The display content 751 can be the content corresponding to the voice emoticon 721B. The display content 751 can be a static image or a dynamic image. The call audio of electronic device 100 can include the audio corresponding to the voice emoticon 721B. For example, the voice emoticon 721B means "laughing." The audio corresponding to the voice emoticon 721B can include the sound of laughing. The call audio of electronic device 100 can also include the voice of the user of electronic device 100 during the call. If electronic device 100 has enabled a background sound and sets background sound 1 as the background sound, the call audio of electronic device 100 can also include background sound 1. This embodiment does not limit the audio content of background sound 1.

[0297] After electronic device 100 sends voice emoticon 721B to electronic device 101, electronic device 100 can display the call interface 740 shown in Figure 7C and play the audio corresponding to voice emoticon 721B and the call audio of electronic device 101. The call interface 740 may include display content 741. Display content 741 and display content 751 may be the same. The call audio of electronic device 101 may be sent by electronic device 101 to electronic device 100 and may include the voice of the user of electronic device 101 during the call. If electronic device 101 has enabled the call background sound and set background sound 2 as the call background sound, the call audio of electronic device 101 may also include background sound 2. The embodiments of this application do not limit the audio content of background sound 2.

[0298] In this way, after sending a voice emoticon, the user of electronic device 100 can see the playback sound effect and display effect of the voice emoticon they sent.

[0299] Optionally, the electronic device 100 may neither display the display content 741 nor play the audio corresponding to the voice emoticon 721B.

[0300] It should be noted that a voice emoticon may only have corresponding audio or only corresponding display content. If a voice emoticon only has corresponding audio and no corresponding display content, the electronic device receiving the voice emoticon only needs to play the audio. If a voice emoticon only has corresponding display content and no corresponding audio, the electronic device receiving the voice emoticon only needs to display the display content.
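
The receiving side's handling of a voice emoticon that carries only audio, only display content, or both can be sketched as follows. The `VoiceEmoticon` class and the `("play", ...)` / `("show", ...)` action tuples are hypothetical and stand in for whatever playback and rendering interfaces a device actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceEmoticon:
    """A received voice emoticon; either part may be missing."""
    audio: Optional[str] = None
    display: Optional[str] = None


def receiver_actions(emoticon):
    """Return the actions the receiving device performs for this emoticon."""
    actions = []
    if emoticon.audio is not None:
        actions.append(("play", emoticon.audio))
    if emoticon.display is not None:
        actions.append(("show", emoticon.display))
    return actions
```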

[0301] In some embodiments, the playback duration of the audio corresponding to the voice emoticon is limited. For example, the playback duration of the audio corresponding to the voice emoticon may be 1 second, 2 seconds, and so on. That is, after the call audio mixed with the audio corresponding to the voice emoticon has finished playing, the electronic device that received the voice emoticon can continue playing call audio that does not contain the audio corresponding to the voice emoticon. The display content corresponding to the voice emoticon can be displayed in synchronization with the playback of the audio corresponding to the voice emoticon. The display duration of the display content corresponding to the voice emoticon can be the same as, slightly shorter than, or slightly longer than the playback duration of the audio corresponding to the voice emoticon. That is, once the display duration has elapsed, the electronic device that received the voice emoticon can remove the display content corresponding to the voice emoticon from the screen.
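
The limited playback and display durations can be modeled as a simple timeline check. In this sketch, `emoticon_state` is a hypothetical helper, and time is measured in seconds from the moment the voice emoticon starts to be presented.

```python
def emoticon_state(elapsed_s, audio_duration_s, display_duration_s):
    """Return (audio_active, display_visible) at elapsed_s seconds.

    audio_active: the emoticon audio is still being mixed into the call audio.
    display_visible: the emoticon display content is still on the screen.
    """
    audio_active = 0 <= elapsed_s < audio_duration_s
    display_visible = 0 <= elapsed_s < display_duration_s
    return audio_active, display_visible
```

For example, with a 1 second audio clip and a 1.2 second display duration, the display content remains visible briefly after the emoticon audio has ended, which corresponds to the "slightly longer" case described above.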

[0302] As shown in Figure 7D, after playing the call audio containing the audio corresponding to the voice emoticon 721B shown in Figure 7C, electronic device 101 can play call audio from electronic device 100 that does not contain the audio corresponding to the voice emoticon. Furthermore, after displaying the display content 751 shown in Figure 7C, electronic device 101 can display the call interface 770 shown in Figure 7D. The call interface 770 does not contain the display content corresponding to the voice emoticon. It can be seen that, after receiving the voice emoticon, electronic device 101 can play the corresponding audio and display the corresponding display content for a period of time. Once that period has elapsed, electronic device 101 can stop displaying the display content on the call interface. This makes room for newly received voice emoticons and keeps prolonged display of the emoticon content from interfering with the user's operation of the call interface.

[0303] Similarly, after playing the audio corresponding to the voice emoticon 721B, electronic device 100 can play only the call audio of electronic device 101. After displaying the display content 741 shown in Figure 7C, electronic device 100 can display the call interface 760 shown in Figure 7D. The call interface 760 does not include the display content corresponding to the voice emoticon.

[0304] As shown in Figure 7E, the electronic device 100 can display a prompt box 720. The prompt box 720 may include an emoticon avatar fusion switch 730. The emoticon avatar fusion switch 730 is in the off state. In response to an operation on the emoticon avatar fusion switch 730, such as a click operation, the electronic device 100 can enable the function of fusing the display content corresponding to the voice emoticon with the user's avatar, and switch the emoticon avatar fusion switch 730 to the on state.

[0305] As shown in Figure 7F, the emoticon avatar fusion switch 730 is in the on state. When the voice emoticon 721B is selected, in response to an operation on the send control 732 shown in Figure 7F, the electronic device 100 can fuse the display content corresponding to the voice emoticon 721B with the personal avatar in the call application of the electronic device 100, and send the fused image to the electronic device 101. For the personal avatar in the call application of the electronic device 100, refer to the contact avatar 621A shown in Figure 6B above. The contact avatar 621A can represent the user of the electronic device 100, that is, the user who sent the voice emoticon 721B. The electronic device 100 can also send call audio containing the audio corresponding to the voice emoticon 721B to the electronic device 101.

[0306] As shown in Figure 7G, the electronic device 101 can display a call interface 790 and play the call audio from the electronic device 100. The call audio from the electronic device 100 may include the audio corresponding to the voice emoticon 721B. The call interface 790 may include display content 791. The display content 791 can be obtained by fusing the display content corresponding to the voice emoticon 721B with the personal avatar (i.e., the contact avatar 621A shown in Figure 6B) in the call application of the electronic device 100. For example, the electronic device 100 can perform image fusion based on the meaning of the voice emoticon 721B, the display content corresponding to the voice emoticon 721B, and the contact avatar 621A. Since the meaning of the voice emoticon 721B is laughing, the electronic device 100 can change the face in the contact avatar 621A to a laughing face, thereby obtaining the display content 791. The display content 791 can be a static image or a dynamic image.

[0307] After electronic device 100 sends a voice emoticon 721B to electronic device 101, electronic device 100 can display the call interface 780 shown in Figure 7G and play the audio corresponding to the voice emoticon 721B and the call audio of electronic device 101. The call interface 780 may include display content 781. Display content 781 and display content 791 can be the same. In this way, after sending a voice emoticon, the user of electronic device 100 can see the display effect after their personal avatar in the call application is merged with the display content corresponding to the voice emoticon.

[0308] Optionally, the electronic device 100 need not display the display content 781 or play the audio corresponding to the voice emoticon 721B.

[0309] As described above, during a call, a user can send voice emoticons to the other end. The electronic device receiving the voice emoticon can play the corresponding audio and / or display the corresponding display content. The electronic device sending the voice emoticon can also play the corresponding audio and / or display the corresponding display content. Furthermore, when sending a voice emoticon, the user can choose to fuse the display content with their personal avatar in the call application before sending it to the other end. The user's own avatar then shows the corresponding expression. This enhances the fun of the call and improves the user's call experience.

[0310] Figures 8A to 8D illustrate schematic diagrams of other call scenarios.

[0311] As shown in Figure 8A, the electronic device 100 can display a call interface 810. The call interface 810 may include a prompt box 821. The prompt box 821 may be displayed by the electronic device 100 in response to an operation on the emoticon control 714 shown in Figure 7A above.

[0312] The prompt box 821 may include one or more voice emoticons, such as voice emoticons 822, 823, 824, 825, 826, 827, etc. It can be seen that the type of voice emoticons in the prompt box 821 is different from the type of voice emoticons in the emoticon display area 721 shown in Figure 7B. The voice emoticons in the emoticon display area 721 shown in Figure 7B are image-based voice emoticons. The voice emoticons in the prompt box 821 are text-based voice emoticons.

[0313] For example, the text corresponding to voice emoticon 822 may include "That works too." The text corresponding to voice emoticon 824 may include "Hahahahahahaha." The audio corresponding to a text-based voice emoticon may include the speech formed by the text corresponding to the emoticon. The display content corresponding to a text-based voice emoticon may include the text content corresponding to the emoticon.

[0314] For the controls in prompt box 821 for searching for voice emoticons, generating voice emoticons, and adjusting the volume of voice emoticons, refer to prompt box 720 shown in Figure 7B above. Further details are not repeated here.

[0315] In some embodiments, the prompt box 821 may also include one or more settings for configuring the display style of text-based voice emoticons. As shown in Figure 8A, the prompt box 821 may include font options, such as KaiTi option 828, SongTi option 829, LiShu option 830, etc. These font options can be used to determine the font of the display content corresponding to the text-based voice emoticons. In addition to font options, the prompt box 821 may also include settings for other display styles (e.g., font size, font color, etc.). This application embodiment does not limit this.

[0316] The prompt box 821 may also include a preview control 831 and a send control 832. For the preview control 831, refer to the preview control 731 shown in Figure 7B above. The send control 832 can be used to send the selected voice emoticon to the other end of the call.

[0317] As shown in Figure 8A, when the voice emoticon 824 is selected, in response to an operation on the send control 832, the electronic device 100 can send the voice emoticon 824 to the other end of the call (i.e., electronic device 101). For the method by which electronic device 100 sends the voice emoticon 824 to the other end of the call, refer to the description of Figure 3 or Figure 4 above.

[0318] As shown in Figure 8B, electronic device 101 can display a call interface 850 and play the call audio from electronic device 100. Specifically, electronic device 101 can display the display content 851 based on the voice emoticon 824 from electronic device 100. The display content 851 can be the display content corresponding to the voice emoticon 824. The font of the text in the display content 851 can be the font selected in the prompt box 821 shown in Figure 8A (e.g., KaiTi option 828). The display content 851 can be static or dynamic. The call audio from electronic device 100 can include the audio corresponding to the voice emoticon 824.

[0319] The electronic device 101 can play call audio containing the audio corresponding to the voice emoticon 824 while displaying the display content 851.

[0320] After electronic device 100 sends voice emoticon 824 to electronic device 101, electronic device 100 can display the call interface 840 shown in Figure 8B and play the audio corresponding to voice emoticon 824 and the call audio of electronic device 101. The call interface 840 may include display content 841. Display content 841 and display content 851 can be the same. In this way, after sending a voice emoticon, the user of electronic device 100 can see the playback sound effects and display effects of the voice emoticon they sent.

[0321] Optionally, the electronic device 100 need not display the display content 841 or play the audio corresponding to the voice emoticon 824.

[0322] In some embodiments, in addition to the emoticon control 714 shown in Figure 7A, the electronic device 100 may also provide other entry points for users to select and set voice emoticons. For example, the electronic device 100 may provide an entry point for selecting and setting voice emoticons in a pull-down notification interface.

[0323] As shown in Figure 8C, the electronic device 100 displays a call interface 860, indicating that the electronic device 100 is currently in a call scenario. In response to a swipe-down operation from the top of the screen of the electronic device 100, the electronic device 100 can display a pull-down notification interface 870 as shown in Figure 8D.

[0324] As shown in Figure 8D, the pull-down notification interface 870 may include notification boxes 871, 872, and 873.

[0325] The notification box 871 can be used to control an ongoing call on the electronic device 100. For the notification box 871, refer to the notification box 551 shown in Figure 5E above.

[0326] The notification box 872 can be used to set the background sound during a call. For the notification box 872, refer to the notification box 552 shown in Figure 5E above.

[0327] The notification box 873 can be used to select and set voice emoticons. The notification box 873 may include an emoticon display area 873A, a preview control 873B, a send control 873C, and a more control 873D.

[0328] The emoticon display area 873A may include one or more voice emoticons, such as image-based voice emoticons, text-based voice emoticons, etc. This application embodiment does not limit the type of voice emoticons in the emoticon display area 873A.

[0329] The preview control 873B can be used to preview voice emoticons. For example, in response to an operation on the preview control 873B, the electronic device 100 can play the audio corresponding to the selected voice emoticon in the emoticon display area 873A and display the content corresponding to the selected voice emoticon in the emoticon display area 873A.

[0330] The send control 873C can be used to send the selected voice emoticon in the emoticon display area 873A to the other end of the call.

[0331] The more control 873D can be used to bring up a user interface containing more emoticon settings. For example, in response to an operation on the more control 873D, the electronic device 100 can display the prompt box 720 shown in Figure 7B or the prompt box 821 shown in Figure 8A.

[0332] As described above, during a call, users can send various types of voice emoticons, including image-based and text-based ones. Users can choose the appropriate emoticon to send. Furthermore, users can adjust the sound effects and / or display of the emoticons. This enhances the fun of the call and improves the user's call experience.

[0333] Figures 9A to 9E illustrate schematic diagrams of other call scenarios.

[0334] As shown in Figure 9A, a video call connection is established between electronic device 100 and electronic device 101. Electronic device 100 can display a call interface 910. The call interface 910 may include call duration 911, display area 912, display area 913, emoticon control 914, background sound control 915, hang-up control 916, voice switch control 917, switching control 918, and virtual background control 919.

[0335] The call duration 911 can be used to indicate the length of a video call.

[0336] Display areas 912 and 913 can be used to display videos captured by electronic device 100 and videos from electronic device 101, respectively. The videos displayed in display areas 912 and 913 can be interchanged. For example, electronic device 100 can display a video captured by itself in display area 912 and a video from electronic device 101 in display area 913. In response to the operation of swapping video display areas, electronic device 100 can display a video from electronic device 101 in display area 912 and a video captured by itself in display area 913.

[0337] The emoticon control 914 can be used to bring up the user interface for sending voice emoticons. For the emoticon control 914, refer to the emoticon control 714 shown in Figure 7A above.

[0338] The background sound control 915 can be used to bring up the user interface for setting the background sound during a call. For the background sound control 915, refer to the background sound control 512 shown in Figure 5A above.

[0339] The hang-up control 916 can be used to hang up video calls.

[0340] The voice switch control 917 can be used to switch the current video call to a voice call.

[0341] The switching control 918 can be used to switch the camera used for shooting video in the electronic device 100, for example, switching the camera for shooting video from the front camera to the rear camera.

[0342] The virtual background control 919 can be used to set the background image of the local video call. For example, the electronic device 100 can blur the background image of the local video call or replace the background image of the local video call with a preset image.

[0343] Electronic device 101 can display a call interface 920. The call interface 920 may include a display area 921 and a display area 922. For display areas 921 and 922, refer to the aforementioned display areas 912 and 913. For example, electronic device 101 may display a video captured by itself in display area 921 and a video from electronic device 100 in display area 922.

[0344] In response to an operation on the emoticon control 914 shown in Figure 9A, the electronic device 100 may display the prompt box 930 shown in Figure 9B. For the prompt box 930, refer to the prompt box 720 shown in Figure 7B above. The prompt box 930 may include a voice emoticon 931 and a send control 932.

[0345] As shown in Figure 9B, when the voice emoticon 931 is selected, in response to an operation on the send control 932, the electronic device 100 can send the voice emoticon 931 to the electronic device 101. For the method of sending voice emoticons in a video call scenario, refer to the description of Figure 3 or Figure 4 above.

[0346] As shown in Figure 9C, electronic device 101 can display a call interface 950 and play call audio from electronic device 100. Specifically, electronic device 101 can display video captured by itself in display area 951 and video from electronic device 100 in display area 952. The video from electronic device 100 may include an image integrated with display content 953. Display content 953 may be the content corresponding to voice emoticon 931. Display content 953 can be static or dynamic. The call audio from electronic device 100 may include audio corresponding to voice emoticon 931.

[0347] As can be seen, if electronic device 100 sends a voice emoticon to electronic device 101 during a video call, the display content corresponding to the voice emoticon can be integrated with the video captured by electronic device 100 and displayed on the screen of electronic device 101. That is, the user of electronic device 101 can see the display content corresponding to the voice emoticon sent by the other end of the video call in the video screen of the other end.

[0348] If electronic device 101 displays the video from electronic device 100 in display area 951, then the display content 953 shown in Figure 9C can be displayed in display area 951.

[0349] This application does not limit the method for displaying the content corresponding to voice emoticons in a video call scenario.

[0350] In some embodiments, when electronic device 100 sends a voice emoticon 931 to electronic device 101, the display content 953 corresponding to the voice emoticon 931 can be merged with the video captured by electronic device 101 and displayed in the display area of the call interface 950 used to display the video captured by electronic device 101. For example, electronic device 101 displays the video captured by electronic device 101 in the display area 951. Electronic device 101 can merge the display content 953 and the video captured by electronic device 101 and display the merged image in the display area 951.

[0351] Alternatively, when electronic device 100 sends a voice emoticon 931 to electronic device 101, electronic device 101 can display the corresponding content 953 on the top layer of the call interface 950. For example, the corresponding content 953 can be displayed floating on the screen. The display position of the content 953 does not need to change as electronic device 101 exchanges display content between display areas 951 and 952. That is, the content 953 does not need to be merged with the video captured by electronic device 100 or electronic device 101 before being displayed.
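The display alternatives described in the preceding paragraphs can be summarized with a small selection function. This is only a sketch under stated assumptions: the mode names, the `TOP_LAYER` marker, and the dictionary that maps each video source to its current display area are hypothetical and are not defined in this application.

```python
from typing import Dict, Union

TOP_LAYER = "top_layer"  # hypothetical marker for a floating overlay


def overlay_target(mode: str, video_area: Dict[str, int]) -> Union[int, str]:
    """Return where the display content of a received voice emoticon is shown.

    mode:
      "sender_video"   -> fused with the video from the sending device
      "receiver_video" -> fused with the video captured by the receiving device
      "floating"       -> drawn on the top layer, independent of the display areas
    video_area maps "sender" / "receiver" to the display area that currently
    shows that video, so fused content follows its video when areas are swapped.
    """
    if mode == "floating":
        return TOP_LAYER
    if mode == "sender_video":
        return video_area["sender"]
    if mode == "receiver_video":
        return video_area["receiver"]
    raise ValueError(f"unknown mode: {mode}")
```

For example, with `{"receiver": 951, "sender": 952}` the fused content follows the sender's video in area 952; after the areas are swapped it follows that video to area 951, while a floating overlay stays on the top layer in both cases.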

[0352] After electronic device 100 sends a voice emoticon 931 to electronic device 101, electronic device 100 can display the call interface 940 shown in Figure 9C and play the audio corresponding to the voice emoticon 931 and the call audio of electronic device 101. Electronic device 100 can also display video captured by itself in display area 941. Electronic device 100 can merge the displayed content 943 with the video from electronic device 101 and display the merged image in display area 942. The displayed content 943 can be the same as the previously mentioned displayed content 953. When the displayed content in display areas 941 and 942 is swapped, the display position of the displayed content 943 shown in Figure 9C can switch from display area 942 to display area 941 along with the video from electronic device 101.

[0353] Alternatively, after electronic device 100 sends a voice emoticon 931 to electronic device 101, electronic device 100 can merge the display content 943 corresponding to the voice emoticon 931 with the video captured by electronic device 100, and display the merged image in display area 941 or display area 942. Alternatively, after electronic device 100 sends a voice emoticon 931 to electronic device 101, electronic device 100 can display the display content 943 corresponding to the voice emoticon 931 on the top layer of the call interface 940. That is, the display content 943 does not need to be merged with the video captured by electronic device 100 or electronic device 101 before being displayed.

[0354] In some embodiments, in the video call scenario between electronic device 100 and electronic device 101 shown in Figures 9A to 9C, if electronic device 100 turns on the call background sound and sets background sound 1 as the call background sound, the call audio played by electronic device 101 shown in Figure 9C may also include background sound 1.

[0355] In some embodiments, during a video call, the electronic device 100 can replace the background of the video captured by the electronic device 100 according to a virtual background set by the user. For example, the electronic device 100 can perform subject recognition on the captured image to distinguish the subject and background in the image. The subject of the image may include, but is not limited to, people, animals, etc. This application embodiment does not limit the method of image subject recognition. The electronic device 100 may retain the subject in the image and replace the background of the image with a virtual background set by the user.
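The background replacement step can be illustrated as follows, assuming that subject recognition has already produced a per-pixel mask. The application does not limit how that mask is obtained, so the mask is simply an input here; the function name and the nested-list image representation are assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def replace_background(frame: Image, subject_mask: List[List[bool]],
                       virtual_background: Image) -> Image:
    """Keep subject pixels from `frame`; take all other pixels from the virtual background."""
    if not (len(frame) == len(subject_mask) == len(virtual_background)):
        raise ValueError("frame, mask, and background must have the same height")
    result = []
    for frame_row, mask_row, bg_row in zip(frame, subject_mask, virtual_background):
        if not (len(frame_row) == len(mask_row) == len(bg_row)):
            raise ValueError("rows must have the same width")
        result.append([f if is_subject else b
                       for f, is_subject, b in zip(frame_row, mask_row, bg_row)])
    return result
```

A mask value of `True` marks a pixel that belongs to the subject (for example, a person), so that pixel is retained and every other pixel is taken from the virtual background set by the user.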

[0356] The virtual background set by the user can be determined based on the background sound of the call.

[0357] As shown in Figure 9D, in response to an operation on the virtual background control 919, the electronic device 100 can display the prompt box 960 shown in Figure 9E.

[0358] As shown in Figure 9E, the prompt box 960 may include a virtual background switch 961, a follow call background sound switch 962, a virtual background option area 963, a more control 964, an OK control 965, and a cancel control 966.

[0359] The virtual background switch 961 can be used to turn the virtual background on or off. When the virtual background is turned on, the electronic device 100 can replace the background in the video captured by the electronic device 100 with the virtual background set by the user. The electronic device 100 can then send the background-replaced video to the other end of the video call, achieving the effect of hiding the user's video call scene.

[0360] The "Follow Call Background Sound" switch 962 can be used to enable or disable the function of the virtual background following the call background sound. If the function is enabled, the electronic device 100 can determine the virtual background based on the currently set call background sound. For example, if the call background sound is audio containing rain sounds, the virtual background following that call background sound could be an image containing a rain scene. If the call background sound is audio containing traffic sounds, the virtual background following that call background sound could be an image containing a road scene. If the call background sound is audio containing keyboard sounds, the virtual background following that call background sound could be an image containing an office scene.
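One simple way to picture the follow function is a lookup from a scene tag of the call background sound to a matching image. The tags, file names, and function below are hypothetical; the application only gives rain, traffic, and keyboard sounds as examples and does not specify how the association is made.

```python
from typing import Optional

# Hypothetical scene tags and image names, based on the three examples in the text.
SOUND_TO_BACKGROUND = {
    "rain": "rain_scene.png",
    "traffic": "road_scene.png",
    "keyboard": "office_scene.png",
}


def pick_virtual_background(follow_enabled: bool, sound_tag: Optional[str],
                            manual_choice: Optional[str]) -> Optional[str]:
    """Choose the virtual background for the outgoing video.

    When the follow function is enabled and the current call background sound
    has a known scene, use the matching image; otherwise fall back to the
    option the user selected manually (which may be None).
    """
    if follow_enabled and sound_tag in SOUND_TO_BACKGROUND:
        return SOUND_TO_BACKGROUND[sound_tag]
    return manual_choice
```

Falling back to the manual selection when no matching scene is known is a design choice made for this sketch, not a behavior stated in the application.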

[0361] As can be seen, when the function of the virtual background following the call background sound is enabled, the virtual background and the call background sound provided by the electronic device 100 can be associated with the same scene. The virtual background and the call background sound can work together to create a more realistic virtual video call scene, thereby better protecting the user's privacy during video calls.

[0362] The above-described method of associating a virtual background with background audio in a video call scenario is merely an illustrative example of this application and should not be construed as limiting this application. For example, the electronic device 100 can first determine the virtual background set by the user, and then determine the associated background audio based on the user-set virtual background.

[0363] The virtual background option area 963 may include one or more virtual background options, such as the option corresponding to background 1, the option corresponding to background 2, the option corresponding to background 3, the option corresponding to background 4, and so on. The electronic device 100 can use the virtual background corresponding to the selected virtual background option in the virtual background option area 963 to replace the background in the video recorded by the electronic device 100. That is to say, the virtual background and the call background sound can be set independently. This application embodiment does not limit the virtual background options in the virtual background option area 963.

[0364] The more control 964 can be used to view more virtual background options.

[0365] The OK control 965 can be used to apply the virtual background settings made in the prompt box 960.

[0366] The cancel control 966 can be used to discard changes to the virtual background settings.

[0367] The prompt box 960 may also include more controls for setting the virtual background (e.g., controls for adding filters to the virtual background, etc.). This application embodiment does not limit this.

[0368] As described above, in video calls, users can send voice emoticons to the other end. Users can adjust the playback sound effects and / or display effects of these emoticons. Furthermore, users can set background sounds and virtual backgrounds, and manually adjust the playback sound effects of the background sounds and the display effects of the virtual backgrounds. This not only enhances the fun of video calls but also protects user privacy and improves the overall video call experience.

[0369] In some embodiments, during multi-person voice or video calls, electronic device 100 can send the same voice emoticon to multiple electronic devices with which it has established a call connection. Multiple electronic devices receiving the same voice emoticon can play the audio corresponding to the voice emoticon and / or display the corresponding content. For details, refer to the operations performed by electronic device 101 after receiving the voice emoticon as shown in Figures 7C / 7G / 8B / 9C. That is, in multi-person voice or video calls, a user can send a voice emoticon to one user in the call, or they can choose to send a voice emoticon to multiple users in the call.
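The choice between sending to one user and sending to several users in a multi-person call can be sketched as a recipient selection step. The function and peer identifiers below are hypothetical and serve only as an illustration.

```python
from typing import Iterable, List, Optional


def select_recipients(connected: Iterable[str],
                      chosen: Optional[Iterable[str]] = None) -> List[str]:
    """Return the peers that should receive the voice emoticon.

    With no explicit choice, every connected peer receives it. Otherwise only
    the chosen peers that are still connected receive it, in connection order.
    """
    connected_list = list(connected)
    if chosen is None:
        return connected_list
    chosen_set = set(chosen)
    return [peer for peer in connected_list if peer in chosen_set]
```

Each selected peer would then handle the voice emoticon in the same way as electronic device 101 does in the two-party scenarios.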

[0370] In some embodiments, the electronic device 100 can also set background sounds / voice emoticons for calls even when no call is being made.

[0371] Figures 10A to 10F illustrate some scenarios for setting background sounds / voice emoticons during calls.

[0372] As shown in Figure 10A, the electronic device 100 can display a user interface 1010. The user interface 1010 can be a user interface of the settings application. The user interface 1010 may include an applications and services control 1011. In response to an operation on the applications and services control 1011, the electronic device 100 can display the user interface 1020 shown in Figure 10B. The user interface 1020 may include options corresponding to applications in the electronic device 100.

[0373] As shown in Figure 10B, the user interface 1020 may include a phone application option 1021. In response to an operation on the phone application option 1021, the electronic device 100 may display the user interface 1030 shown in Figure 10C. The user interface 1030 may be used to provide controls for setting permissions, functions, and other related information for the phone application.

[0374] As shown in Figure 10C, the user interface 1030 may include a background sound control 1031 and an emoticon control 1032. In response to an operation on the background sound control 1031, the electronic device 100 may display the user interface 1040 shown in Figure 10D. The user interface 1040 can be used to set the background sound for calls made after the electronic device 100 establishes a call connection through a telephone application. The user interface 1040 can refer to the user interface 520 shown in Figure 5B above.

[0375] As shown in Figure 10D, the user interface 1040 may include a preview control 1041 and a set-default control 1042.

[0376] The preview control 1041 can be used to preview the background sound of a call. For example, in response to an operation on the preview control 1041, the electronic device 100 can play the selected background sound of a call in the user interface 1040. The sound effect of the background sound played by the electronic device 100 can be determined based on one or more background sound settings in the user interface 1040 (e.g., settings corresponding to volume, settings for background sound switching modes, settings for background sound switching duration, etc.).

[0377] The Set as Default control 1042 can be used to set a default background sound. For example, in response to the operation of the Set as Default control 1042, the electronic device 100 can set the selected call background sound in the user interface 1040 as the default background sound, and determine the default sound effect when the default background sound is played based on one or more background sound settings in the user interface 1040.

[0378] As shown in Figure 10E, in response to an operation on the emoticon control 1032, the electronic device 100 can display the user interface 1050 shown in Figure 10F. The user interface 1050 can be used to set voice emoticons.

[0379] As shown in Figure 10F, the user interface 1050 may include a display area 1051 and a display area 1052.

[0380] Display area 1051 may include one or more image-based voice emoticons.

[0381] Display area 1052 may include one or more text-based voice emoticons.

[0382] In addition to image-based and text-based voice emoticons, the user interface 1050 can also include many other types of voice emoticons.

[0383] The user interface 1050 may also include controls for searching for voice emoticons, controls for generating voice emoticons, and one or more emoticon settings (e.g., volume settings). See the prompt box 720 in Figure 7B and the prompt box 821 in Figure 8A for details. Further description is omitted here.

[0384] The user interface 1050 may also include a preview control 1053. The preview control 1053 can be used to preview the audio corresponding to the voice emoticon and / or to preview the display content corresponding to the voice emoticon. For example, in response to an operation on the preview control 1053, the electronic device 100 can play the audio corresponding to the selected voice emoticon in the user interface 1050 and / or display the display content corresponding to the selected voice emoticon in the user interface 1050. The sound effects of the audio corresponding to the voice emoticon played by the electronic device 100, and the display effects of the display content corresponding to the voice emoticon, can be determined based on one or more emoticon settings in the user interface 1050.

[0385] As described above, users can customize call background sounds and voice emoticons when not in a call. For example, users can set default background sounds and effects, input audio and / or video data to create custom background sounds, adjust playback effects for voice emoticons, and customize the display of their content. Users can also create their own personalized voice emoticons. During a call, users can then use pre-set background sounds and / or voice emoticons, enhancing their overall experience.

[0386] Figure 11 illustrates a flowchart of a call method provided by an embodiment of this application.

[0387] S1111~S1116: Establish a voice call connection.

[0388] S1111, Electronic device 100 receives an operation to make a voice call to electronic device 101.

[0389] S1112, Electronic device 100 initiates a voice call request to electronic device 101.

[0390] S1113, Electronic device 101 receives the voice call request from electronic device 100.

[0391] S1114, Electronic device 101 receives the operation to answer a phone call.

[0392] The above-mentioned operation to answer the call can be an operation on the virtual scene answering control 614 shown in Figure 6A, or an operation on the ordinary answering control 615.

[0393] S1115, Electronic device 100 receives an operation to set audio 1 as the background sound for a call.

[0394] For example, audio 1 can be the call background sound corresponding to option 523 of background sound 2 shown in Figure 5C. For the operation of setting audio 1 as the call background sound, refer to the operation on the setting control 543 shown in Figure 5C.

[0395] S1116, Electronic device 100 and electronic device 101 establish a voice call connection.

[0396] After establishing a voice call connection, electronic devices 100 and 101 can transmit call audio to each other.

[0397] S1117~S1122: Send call audio including the background sound.

[0398] S1117, Electronic device 100 mixes audio 1 with audio acquired through a microphone to generate audio 2.

[0399] In some embodiments, before mixing, the electronic device 100 may perform noise reduction processing on the audio captured by the microphone. For example, the noise reduction processing may eliminate background noise in the audio captured by the microphone. Then, the electronic device 100 may mix the noise-reduced audio with audio 1 to generate audio 2. This allows audio 1 to replace the actual background noise in the captured audio, thereby protecting the user's call privacy.

[0400] S1118, Electronic device 101 acquires audio 3 through a microphone.

[0401] In some embodiments, audio 3 may be the audio obtained by noise reduction of the sound captured by the microphone.

[0402] S1119, Electronic device 100 sends audio 2 to electronic device 101.

[0403] S1120, Electronic device 101 sends audio 3 to electronic device 100.

[0404] S1121, Electronic device 101 plays audio 2.

[0405] S1122, Electronic device 100 plays audio 3.

[0406] It can be seen that audio 2 includes audio 1. That is, when the user of electronic device 101 listens to the call audio from electronic device 100, they can hear the call background sound set by the user of electronic device 100.
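Steps S1117 to S1122 can be pictured with a minimal sketch. It assumes audio is represented as a list of float samples in the range [-1.0, 1.0] and uses a simple noise gate as a stand-in for noise reduction, since this application does not specify the noise reduction algorithm. The names `noise_gate` and `mix` and all sample values are hypothetical.

```python
def noise_gate(samples, threshold=0.05):
    """Stand-in for noise reduction: zero out samples quieter than the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def mix(voice, background, background_gain=0.5):
    """Mix voice with a looping background sound, clipping to [-1.0, 1.0]."""
    if not background:
        return list(voice)
    mixed = []
    for i, v in enumerate(voice):
        b = background[i % len(background)] * background_gain
        mixed.append(max(-1.0, min(1.0, v + b)))
    return mixed


# Hypothetical data: microphone capture containing low-level ambient noise.
mic = [0.50, 0.02, -0.60, 0.01]
audio_1 = [0.20, -0.20]  # user-selected background sound, looped while mixing
audio_2 = mix(noise_gate(mic), audio_1)  # what device 100 would send to device 101
```

In this sketch the quiet ambient samples are removed before mixing, so the receiver hears only the voice and the selected background sound.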

[0407] In some embodiments, when an operation to adjust the playback sound effect of audio 1 is received, the electronic device 100 can adjust the audio parameters of audio 1 to achieve the corresponding sound effect. The electronic device 100 can mix the audio 1 with the audio acquired by the microphone after the audio parameters are adjusted. For example, the operation to adjust the playback sound effect may include, but is not limited to, operations to adjust the volume, operations to adjust the volume change mode, etc. Audio parameters may include, but are not limited to, gain, frequency, phase, etc. The embodiments of this application do not limit the method of sound effect adjustment.

[0408] For example, in response to the operation of increasing the volume of audio 1 from volume 1 to volume 2, electronic device 100 can increase the volume of audio 1 in audio 2. In this way, when the user of electronic device 101 listens to the call audio from electronic device 100, they can perceive an increase in the volume of the background sound from the other end of the call.
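As a rough illustration of adjusting the gain parameter of audio 1, the sketch below assumes that volume 1 and volume 2 are expressed in decibels and converts them to linear gain with the standard relation gain = 10^(dB/20). The decibel values and the function names are assumptions made for illustration; this application does not limit how volume is represented.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def apply_gain(samples, gain):
    """Scale samples by gain and clip the result to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


audio_1 = [0.1, -0.1, 0.05]
volume_1_db, volume_2_db = -12.0, -6.0  # hypothetical settings for volume 1 and volume 2
quiet = apply_gain(audio_1, db_to_gain(volume_1_db))
louder = apply_gain(audio_1, db_to_gain(volume_2_db))
```

The adjusted samples would then be mixed with the microphone audio in place of the unadjusted audio 1.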

[0409] In some embodiments, when an operation to switch the call background sound to audio 1' is received, electronic device 100 can switch the audio mixed with the audio acquired by the microphone in step S1117 from audio 1 to audio 1'. Electronic device 100 can send the audio generated by mixing audio 1' with the audio acquired by the microphone to electronic device 101. For example, audio 1 may contain keyboard sounds and audio 1' may contain traffic sounds. Switching the call background sound from audio 1 to audio 1' can indicate that the user's virtual call scenario has changed from an office scenario to a roadside scenario. The embodiments of this application do not limit the types of sounds contained in audio 1 and audio 1'.

[0410] Optionally, the electronic device 100 can also switch the background sound during a call according to the background sound switching mode and the background sound switching duration set by the user. For the background sound switching mode and the background sound switching duration, refer to the description in the foregoing embodiments. For example, if the background sound switching mode is "fade out-quiet-fade in" as shown in Figure 5B, and the background sound switching duration is 2 seconds, then the electronic device 100 can gradually change the call background sound mixed into audio 2 from audio 1 to audio 1' within 2 seconds. This achieves a transition of the virtual call scene.

[0411] The execution order of step S1115 is not limited in this embodiment. For example, step S1115 can be executed before step S1116 or after step S1116. Electronic device 101 can also receive an operation of setting the call background sound. For the method by which electronic device 101 mixes the background sound into the call audio sent to the other end of the call, refer to the method by which electronic device 100 mixes the background sound into the call audio.

[0412] Step S1115 is optional. In some embodiments, when the background sound is enabled, the electronic device 100 can use the default background sound after establishing a call connection with the electronic device 101.

[0413] S1123~S1128: Send voice emoticons.

[0414] S1123, Electronic device 100 receives the operation to send voice emoticon 1.

[0415] For voice emoticon 1, refer to the voice emoticon pack 721B shown in Figure 7B above, or the voice emoticon pack 824 shown in Figure 8A. For the operation of sending voice emoticon 1, refer to the operation on the sending control 732 shown in Figure 7B above.

[0416] S1124, Electronic device 100 mixes audio 1, the audio acquired through the microphone, and the audio corresponding to voice emoticon 1 to generate audio 4.

[0417] The electronic device 100 can mix the audio acquired through the microphone after receiving the operation of sending voice emoticon 1 with audio 1 and the audio corresponding to voice emoticon 1 to generate audio 4. The audio acquired through the microphone can be noise-reduced audio.

[0418] S1125, Electronic device 100 sends audio 4 and the display content corresponding to voice emoticon 1 to electronic device 101.

[0419] In some embodiments, the display content corresponding to voice emoticon 1 can be a single image, that is, static display content. Alternatively, the display content corresponding to voice emoticon 1 can include multiple images, that is, dynamic display content. Alternatively, the display content corresponding to voice emoticon 1 can include a piece of text or a string of characters. The embodiments of this application do not limit the display content corresponding to voice emoticon 1.

[0420] S1126, Electronic device 101 plays audio 4 and displays the content corresponding to voice emoticon 1.

[0421] For how electronic device 101 displays the content corresponding to the received voice emoticon 1 in a voice call scenario, refer to the scenarios shown in Figure 7C or Figure 8B. For example, the display content corresponding to voice emoticon 1 may include the display content 751 shown in Figure 7C.

[0422] S1127, Electronic device 100 mixes audio 3 and the audio corresponding to voice emoticon 1 to generate audio 5.

[0423] Audio 3 may include audio sent by electronic device 101 to electronic device 100 in real time during a call.

[0424] Electronic device 100 can mix the call audio from electronic device 101 after receiving the operation of sending voice emoticon 1 with the audio corresponding to voice emoticon 1 to generate audio 5.

[0425] S1128, Electronic device 100 plays audio 5 and displays the content corresponding to voice emoticon 1.

[0426] For how the electronic device 100 displays the content corresponding to voice emoticon 1, refer to the scenarios shown in Figure 7C or Figure 8B above.

[0427] Steps S1127 and S1128 are optional. For example, electronic device 100 can directly play audio 3 from electronic device 101 without mixing audio 3 with the audio corresponding to voice emoticon 1, and without displaying the display content corresponding to voice emoticon 1.
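The mixing in step S1124 can be sketched as summing three sources, where the voice emoticon audio is a short clip that starts at the moment the send operation is received. This is a minimal sketch under assumed conditions: float samples, a hypothetical `mix_sources` helper, and made-up sample values.

```python
def mix_sources(length, sources):
    """Sum (samples, start_index, gain) sources into one buffer of the given length."""
    out = [0.0] * length
    for samples, start, gain in sources:
        for i, s in enumerate(samples):
            pos = start + i
            if 0 <= pos < length:
                out[pos] += s * gain
    # Clip so the summed signal stays within the valid sample range.
    return [max(-1.0, min(1.0, x)) for x in out]


mic = [0.25] * 6             # noise-reduced microphone audio (hypothetical)
audio_1 = [0.125] * 6        # call background sound (hypothetical)
emoticon_audio = [0.5, 0.5]  # short clip corresponding to voice emoticon 1
send_index = 2               # sample position at which the send operation is received

audio_4 = mix_sources(len(mic), [
    (mic, 0, 1.0),
    (audio_1, 0, 1.0),
    (emoticon_audio, send_index, 1.0),
])
```

The same helper covers step S1127 by passing audio 3 and the emoticon clip as the only two sources.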

[0428] In some embodiments, the electronic device 100 can provide a function to merge the display content corresponding to the voice emoticon with the user's avatar. If this function is enabled, the electronic device 100 can perform image fusion between the display content corresponding to voice emoticon 1 and the user's personal avatar in the call application on the electronic device 100. That is, in step S1125 above, the electronic device 100 can send audio 4 together with the data obtained by merging the display content corresponding to voice emoticon 1 with the user's avatar. For the scenario of merging the display content corresponding to the voice emoticon with the user's avatar, refer to the scenarios shown in Figures 7E to 7G above.

[0429] In some embodiments, the electronic device 100 can determine the audio parameters of the audio corresponding to the voice emoticon based on the user-defined playback sound effect of the voice emoticon. For example, in response to an operation to increase the volume of the voice emoticon, the electronic device 100 can increase the volume of the audio corresponding to the voice emoticon in the call audio to be sent.

[0430] In some embodiments, the electronic device 100 can determine the display style of the content corresponding to the voice emoticon based on the user-defined voice emoticon display style. The voice emoticon display style may include, but is not limited to, display size, display color, display font, etc.

[0431] S1129~S1138: Send voice emoticons after turning off the call background sound.

[0432] S1129, Electronic device 100 receives an operation to turn off the call background sound.

[0433] S1130, Electronic device 100 acquires audio 6 through a microphone.

[0434] S1131, Electronic device 100 sends audio 6 to electronic device 101.

[0435] S1132, Electronic device 101 plays audio 6.

[0436] It should be noted that during the process of electronic device 100 sending audio 6, electronic device 101 can send the real-time acquired audio, i.e., audio 3, to electronic device 100.

[0437] S1133, Electronic device 100 receives the operation to send voice emoticon 2.

[0438] For step S1133, refer to the aforementioned step S1123.

[0439] S1134, Electronic device 100 mixes the audio corresponding to voice emoticon 2 with the audio acquired through the microphone to generate audio 7.

[0440] The electronic device 100 can mix the audio acquired through the microphone after receiving the operation to send the voice emoticon 2 with the audio corresponding to the voice emoticon 2 to generate audio 7. The audio acquired through the microphone can be noise-reduced audio.

[0441] S1135, Electronic device 100 sends the audio 7 and the display content corresponding to the voice emoticon 2 to electronic device 101.

[0442] S1136, Electronic device 101 plays audio 7 and displays the content corresponding to voice emoticon 2.

[0443] S1137, Electronic device 100 mixes audio 3 and the audio corresponding to voice emoticon 2 to generate audio 8.

[0444] S1138, Electronic device 100 plays audio 8 and displays the content corresponding to voice emoticon 2.

[0445] For steps S1135 to S1138, refer to the aforementioned steps S1125 to S1128.

[0446] As described above, in a voice call scenario, the electronic device 100 can replace the actual background noise in the acquired audio with a user-set background sound. This creates a virtual call scenario for the user and protects the user's call privacy. Furthermore, the user can manually adjust the playback sound effect of the background sound, allowing the electronic device 100 to personalize the background sound in the call audio to meet the user's individual needs. In addition, the electronic device 100 can send user-selected voice emoticons to the other end of the call. The user can manually adjust the playback sound effect of the audio corresponding to the voice emoticon and / or the display effect of the content corresponding to the voice emoticon. This enhances the fun of the call and improves the user's call experience.

[0447] Figure 12 illustrates a flowchart of another call method provided by an embodiment of this application.

[0448] S1211, Electronic device 100 and electronic device 101 establish a voice call connection.

[0449] S1212, Electronic device 100 receives an operation to set audio 1 as the background sound for a call.

[0450] For step S1212, refer to step S1115 shown in Figure 11 above.

[0451] S1213, Electronic device 100 acquires audio 11 through a microphone.

[0452] In some embodiments, audio 11 may be noise-reduced audio.

[0453] S1214, Electronic device 100 sends audio 11 and the identifier of audio 1 to electronic device 101.

[0454] S1215, Electronic device 101 sends audio 12 to electronic device 100.

[0455] S1216, Electronic device 101 obtains audio 1 according to the identifier of audio 1, mixes audio 1 with audio 11, and plays the mixed audio.

[0456] In some embodiments, electronic device 101 may first check its memory to see if audio 1 is stored. If audio 1 is stored, electronic device 101 can read audio 1 from its memory. If audio 1 is not stored in electronic device 101, electronic device 101 can retrieve audio 1 from a cloud server based on the identifier of audio 1. The cloud server may be an application server corresponding to the call application.
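The lookup order described above (local storage first, then the cloud server) can be sketched as follows. The class name, the identifier string, and the `fake_cloud_fetch` function are hypothetical; the step of keeping a local copy after a remote fetch is an added assumption that this application does not state.

```python
class BackgroundSoundStore:
    """Resolve an audio identifier from local storage first, then from a remote source."""

    def __init__(self, fetch_remote):
        self._local = {}
        self._fetch_remote = fetch_remote  # callable: identifier -> samples

    def get(self, identifier):
        if identifier in self._local:
            return self._local[identifier]
        samples = self._fetch_remote(identifier)
        self._local[identifier] = samples  # assumption: keep a copy for later calls
        return samples


remote_calls = []


def fake_cloud_fetch(identifier):
    """Stand-in for the cloud server; a real client would issue a network request."""
    remote_calls.append(identifier)
    return {"bg-office": [0.1, -0.1]}[identifier]


store = BackgroundSoundStore(fake_cloud_fetch)
first = store.get("bg-office")   # not stored locally, so fetched from the stand-in server
second = store.get("bg-office")  # now read from local storage
```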

[0457] In some embodiments, in addition to the identifier of audio 1, electronic device 100 may also send audio parameters of audio 1 to electronic device 101. Electronic device 101 can mix audio 1 and audio 11 according to the audio parameters of audio 1. The parameters of audio 1 may be determined by electronic device 100 based on the playback sound effects of the call background sound set by the user.

[0458] S1217, Electronic device 100 plays audio 12.

[0459] In some embodiments, when receiving an operation to adjust the playback sound effect of audio 1, electronic device 100 can adjust the audio parameters of audio 1 and send the adjusted audio parameters to electronic device 101. Electronic device 101 can mix audio 1 and audio 11 according to the adjusted audio parameters. For example, in response to an operation to increase the volume of audio 1 from volume 1 to volume 2, electronic device 100 can send a message to electronic device 101 instructing it to increase the volume of audio 1 to volume 2. Electronic device 101 can then increase the volume of audio 1 to volume 2 in the audio generated by mixing audio 1 and audio 11. In this way, when the user of electronic device 101 listens to the call audio from electronic device 100, they can perceive an increase in the volume of the background sound from the other end of the call.

[0460] In some embodiments, when receiving an operation to switch the background audio of a call to audio 1', electronic device 100 can send an identifier of audio 1' to electronic device 101 and instruct electronic device 101 to switch the background audio mixed in the call audio from electronic device 100 to audio 1'. Electronic device 101 can obtain audio 1' based on the identifier of audio 1'. Electronic device 101 can mix audio 1' and audio 11 and then play them.

[0461] Optionally, electronic device 100 can also send background sound switching data, such as the user-set background sound switching mode and background sound switching duration, to electronic device 101. Electronic device 101 can switch the background sound during the call based on the aforementioned background sound switching data. For example, if the background sound switching mode is "fade out-quiet-fade in" as shown in Figure 5B, and the background sound switching duration is 2 seconds, then electronic device 101 can gradually change the audio mixed with audio 11 from audio 1 to audio 1' within 2 seconds. Specifically, electronic device 101 can play the mixed audio 1 and audio 11, gradually decreasing the volume of audio 1. When the volume of audio 1 decreases to 0, electronic device 101 can play the mixed audio 1' and audio 11, gradually increasing the volume of audio 1' to a preset volume. This allows for the transition of the virtual call scene.
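The "fade out-quiet-fade in" switch described above can be sketched as a pair of gain envelopes applied to audio 1 and audio 1'. The sketch assumes linear ramps and splits the switching duration into three equal phases; neither the ramp shape nor the split is specified in this application, and the function names are hypothetical.

```python
def switch_gains(t, duration, preset_gain=1.0):
    """Return (gain of audio 1, gain of audio 1') at t seconds into the switch."""
    third = duration / 3.0
    if t <= 0:
        return preset_gain, 0.0
    if t < third:  # phase 1: fade out audio 1
        return preset_gain * (1.0 - t / third), 0.0
    if t < 2 * third:  # phase 2: quiet gap, no background sound
        return 0.0, 0.0
    if t < duration:  # phase 3: fade in audio 1'
        return 0.0, preset_gain * (t - 2 * third) / third
    return 0.0, preset_gain


def mixed_sample(voice, old_bg, new_bg, t, duration):
    """Mix one voice sample with whichever background sound is audible at time t."""
    g_old, g_new = switch_gains(t, duration)
    return voice + g_old * old_bg + g_new * new_bg
```

With a 2-second switching duration, `switch_gains(1.0, 2.0)` falls in the quiet phase, so only the voice (audio 11) is heard at that moment.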

[0462] S1218, Electronic device 100 receives the operation to send voice emoticon 1.

[0463] For step S1218, refer to step S1123 shown in Figure 11 above.

[0464] S1219, Electronic device 100 sends the identifier of voice emoticon 1 to electronic device 101.

[0465] S1220, Electronic device 101 obtains the audio and display content corresponding to voice emoticon 1 according to the identifier of voice emoticon 1, mixes audio 1, audio 11 and the audio corresponding to voice emoticon 1, plays the mixed audio, and displays the display content corresponding to voice emoticon 1.

[0466] For the method by which electronic device 101 obtains the audio and display content corresponding to voice emoticon 1 based on the identifier of voice emoticon 1, refer to the method of obtaining audio 1 based on the identifier of audio 1 in step S1216 above.

[0467] It should be noted that if the voice emoticon 1 only has corresponding audio, the electronic device 101 can mix and play audio 1, audio 11, and the audio corresponding to the voice emoticon 1. If the voice emoticon 1 only has corresponding display content, the electronic device 101 can mix and play audio 1 and audio 11, and display the display content corresponding to the voice emoticon 1.
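The two cases above (an emoticon with only audio, or with only display content) can be sketched on the receiving side as follows. The `VoiceEmoticon` type, its field names, and the sample values are hypothetical and serve only to illustrate the branching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceEmoticon:
    identifier: str
    audio: Optional[List[float]] = None  # None if the emoticon has no audio
    display: Optional[str] = None        # None if the emoticon has no display content


def handle_emoticon(emoticon, call_audio, background):
    """Return (samples to play, content to display or None) on the receiving device."""
    n = len(call_audio)
    out = list(call_audio)
    if background:
        for i in range(n):
            out[i] += background[i % len(background)]
    if emoticon.audio is not None:
        for i, s in enumerate(emoticon.audio[:n]):
            out[i] += s
    out = [max(-1.0, min(1.0, x)) for x in out]
    return out, emoticon.display


audio_11 = [0.25, 0.25]
audio_1 = [0.125]
audio_only = handle_emoticon(VoiceEmoticon("e1", audio=[0.5]), audio_11, audio_1)
display_only = handle_emoticon(VoiceEmoticon("e2", display="clap"), audio_11, audio_1)
```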

[0468] S1221, Electronic device 100 mixes audio 12 and the audio corresponding to voice emoticon 1, plays the mixed audio, and displays the display content corresponding to voice emoticon 1.

[0469] Audio 12 may include audio sent by electronic device 101 to electronic device 100 in real time during a call.

[0470] The electronic device 100 can mix the call audio received from the electronic device 101 after the operation to send voice emoticon 1 with the audio corresponding to voice emoticon 1, and play the mixed audio. For how the electronic device 100 displays the content corresponding to voice emoticon 1, refer to the scenarios shown in Figure 7C or Figure 8B above.

[0471] Step S1221 is optional. For example, electronic device 100 can directly play audio 12 from electronic device 101 without mixing audio 12 with the audio corresponding to voice emoticon 1, and without displaying the display content corresponding to voice emoticon 1.

[0472] In some embodiments, electronic device 100 can provide a function to merge the display content corresponding to a voice emoticon with the user's avatar. If this function is enabled, electronic device 100 can instruct electronic device 101 to merge the display content corresponding to the voice emoticon with the user's avatar. For example, electronic device 100 can send a message to electronic device 101 indicating that the merging function is enabled. Electronic device 101 can merge the display content corresponding to voice emoticon 1 with the avatar, in the call application, of the user corresponding to electronic device 100. Electronic device 101 can display the merged display content, as shown in the scenarios illustrated in Figures 7E to 7G above.

[0473] In some embodiments, electronic device 100 can determine the audio parameters of the audio corresponding to a voice emoticon based on the user-defined playback sound effect of the voice emoticon. Electronic device 100 can then send the audio parameters of the audio corresponding to the voice emoticon to electronic device 101. Electronic device 101 can then mix the audio corresponding to the voice emoticon with the call audio from electronic device 100 based on these audio parameters.

[0474] In some embodiments, electronic device 100 can determine the display style of the content corresponding to the voice emoticon based on the user-defined voice emoticon display style. Electronic device 100 can send the display style of the content corresponding to the voice emoticon to electronic device 101. Electronic device 101 can then display the content corresponding to the voice emoticon according to the display style.

[0475] S1222, Electronic device 100 receives an operation to turn off the call background sound.

[0476] For example, the operation to turn off the call background sound can be an operation on the switch 545 shown in Figure 5B above.

[0477] S1223, Electronic device 100 acquires audio 13 through a microphone.

[0478] In some embodiments, audio 13 may be noise-reduced audio. The noise reduction process performed on audio 13 may be the same as or different from the noise reduction process performed on audio 11 in step S1213.

[0479] S1224, Electronic device 100 sends audio 13 and a message to turn off the call background sound to electronic device 101.

[0480] S1225, Electronic device 101 plays audio 13.

[0481] Based on the aforementioned message to turn off the call background sound, electronic device 101 can stop mixing the background sound (e.g., audio 1) with the audio from electronic device 100. Electronic device 101 can then play audio 13.
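The control messages exchanged in the Figure 12 method (set a background sound by identifier, adjust its parameters, turn it off) can be sketched as a small state holder on the receiving device. The message format, the `type` values, and the class name are hypothetical; this application does not define a wire format. Clipping is omitted for brevity.

```python
class RemoteBackgroundState:
    """Tracks which background sound the receiver mixes into incoming call audio."""

    def __init__(self):
        self.background_id = None
        self.gain = 1.0

    def on_message(self, message):
        kind = message["type"]  # hypothetical message types
        if kind == "set_background":
            self.background_id = message["id"]
            self.gain = message.get("gain", 1.0)
        elif kind == "adjust":
            self.gain = message["gain"]
        elif kind == "background_off":
            self.background_id = None

    def render(self, call_audio, library):
        """Return the samples to play for one chunk of incoming call audio."""
        if self.background_id is None:
            return list(call_audio)
        bg = library[self.background_id]
        return [v + self.gain * bg[i % len(bg)] for i, v in enumerate(call_audio)]


library = {"audio-1": [0.25]}
state = RemoteBackgroundState()
state.on_message({"type": "set_background", "id": "audio-1"})
with_bg = state.render([0.5, 0.5], library)
state.on_message({"type": "adjust", "gain": 0.5})
adjusted = state.render([0.5, 0.5], library)
state.on_message({"type": "background_off"})
without_bg = state.render([0.5, 0.5], library)
```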

[0482] As described above, in a voice call scenario, electronic device 100 can send the identifier of the user-set background sound to electronic device 101 when the background sound is enabled. Based on the received background sound identifier, electronic device 101 can mix the call audio from electronic device 100 with the corresponding background sound and play it. This creates a virtual call scenario for the user of electronic device 100 and protects their call privacy. Furthermore, electronic device 100 can also send the identifier of a voice emoticon that the user wants to send to electronic device 101. Based on the received voice emoticon identifier, electronic device 101 can play the corresponding audio and / or display the corresponding content. This enhances the fun of the call and improves the user's call experience.

[0483] Figure 13 illustrates a flowchart of another call method provided by an embodiment of this application.

[0484] S1311~S1316: Establish a video call connection.

[0485] S1311, Electronic device 100 receives an operation to initiate a video call to electronic device 101.

[0486] S1312, Electronic device 100 initiates a video call request to electronic device 101.

[0487] S1313, Electronic device 101 receives a video call from electronic device 100.

[0488] S1314, Electronic device 101 receives the operation to answer the video call.

[0489] S1315, Electronic device 100 receives an operation to set audio 1 as the background sound for a call.

[0490] For step S1315, refer to step S1115 shown in Figure 11 above.

[0491] S1316, Electronic device 100 and electronic device 101 establish a video call connection.

[0492] After establishing a video call connection, electronic devices 100 and 101 can transmit call audio and video to each other.

[0493] S1317~S1322: Send call audio including the background sound.

[0494] S1317, Electronic device 100 mixes audio 1 with audio acquired through a microphone to generate audio 21.

[0495] In some embodiments, before mixing, the electronic device 100 may perform noise reduction processing on the audio captured by the microphone. For example, the noise reduction processing may eliminate background noise in the audio captured by the microphone. Then, the electronic device 100 may mix the noise-reduced audio with audio 1 to generate audio 21. This allows audio 1 to replace the actual background noise in the captured audio, thereby protecting the user's call privacy.

[0496] S1318, Electronic device 101 acquires audio 22 through a microphone.

[0497] S1319, Electronic device 100 sends audio 21 and video 1, which electronic device 100 captures through its camera, to electronic device 101.

[0498] S1320, Electronic device 101 sends audio 22 and video 2, which electronic device 101 captures through its camera, to electronic device 100.

[0499] S1321, Electronic device 101 plays audio 21 and video 1.

[0500] In some embodiments, in addition to playing video 1, electronic device 101 can also play video 2.

[0501] S1322, Electronic device 100 plays audio 22 and video 2.

[0502] In some embodiments, in addition to playing video 2, electronic device 100 can also play video 1.

[0503] In some embodiments, the electronic device 100 may adjust the playback sound effect of audio 1 in audio 21, or switch the mixed background sound of the call in audio 21 to other audio. For details, please refer to the description of the method shown in Figure 11 above.

[0504] In some embodiments, in a video call scenario, in addition to setting the background sound, the electronic device 100 can also provide the function of setting a virtual background. For the scenario of setting a virtual background, refer to the scenarios shown in Figures 9D to 9E. For example, in response to the operation of setting background 1 as a virtual background, the electronic device 100 can perform subject recognition on the images in video 1 to distinguish the subject from the background in each image. The electronic device 100 can replace the background of the images in video 1 with background 1 to generate video 1'. The electronic device 100 can send video 1' to the electronic device 101. The electronic device 101 can play video 1'. Background 1 can be an image or a video. The embodiments of this application do not limit the content of background 1.

[0505] In this scenario, background 1 and audio 1 can be related, such as being associated with the same scene. This allows the virtual background and the background sound to work together, creating a more realistic virtual video call environment and better protecting user privacy during video calls. Alternatively, background 1 and audio 1 can be independent. Users can set the background sound and the virtual background separately.
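The background replacement that generates video 1' can be sketched per frame as a mask-based composite. The sketch assumes that subject recognition has already produced a binary mask (1 for subject pixels, 0 for background pixels) and represents images as nested lists of pixel values; the recognition step itself is not shown, and the function name is hypothetical.

```python
def replace_background(frame, mask, background):
    """Keep pixels where mask is 1 (subject); take the rest from the virtual background."""
    return [
        [f if m else b for f, m, b in zip(frow, mrow, brow)]
        for frow, mrow, brow in zip(frame, mask, background)
    ]


frame = [[1, 2], [3, 4]]         # one image from video 1 (hypothetical pixel values)
mask = [[1, 0], [0, 1]]          # assumed output of subject recognition
background_1 = [[9, 9], [9, 9]]  # virtual background image
frame_out = replace_background(frame, mask, background_1)  # one image of video 1'
```

If background 1 is a video rather than an image, the same composite can be applied with a different background image for each frame.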

[0506] S1323~S1330: Send voice emoticons.

[0507] S1323, Electronic device 100 receives the operation of sending voice emoticon 1.

[0508] S1324, Electronic device 100 mixes audio 1, the audio acquired through the microphone, and the audio corresponding to voice emoticon 1 to generate audio 23.

[0509] For steps S1323 and S1324, refer to steps S1123 and S1124 shown in Figure 11 above.

[0510] S1325, Electronic device 100 performs image fusion between the display content corresponding to voice emoticon 1 and the images captured by the camera to generate video 3.

[0511] In some embodiments, the content displayed corresponding to the voice emoticon 1 can be an image. The electronic device 100 can fuse the image corresponding to the voice emoticon 1 with multiple images captured by a camera. In this way, the electronic device 101 can continuously display a static voice emoticon for a period of time.

[0512] Alternatively, the display content corresponding to the voice emoticon 1 may include multiple images. The electronic device 100 can merge the multiple images corresponding to the voice emoticon with multiple images captured by a camera. In this way, the electronic device 101 can display dynamic voice emoticons for a period of time.
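The fusion in step S1325 can be sketched as pasting an emoticon image onto each camera frame. With a single emoticon image every frame receives the same overlay (static display); with several images the overlay cycles from frame to frame (dynamic display). Images are represented as nested lists, `None` marks a transparent emoticon pixel, and all names and values are hypothetical.

```python
def overlay(frame, sticker, top, left):
    """Paste sticker onto a copy of frame; None in the sticker means transparent."""
    out = [row[:] for row in frame]
    for r, srow in enumerate(sticker):
        for c, pixel in enumerate(srow):
            y, x = top + r, left + c
            if pixel is not None and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out


def fuse(camera_frames, emoticon_images, top=0, left=0):
    """Fuse emoticon images with camera frames, cycling through the emoticon images."""
    return [
        overlay(frame, emoticon_images[i % len(emoticon_images)], top, left)
        for i, frame in enumerate(camera_frames)
    ]


camera_frames = [[[0, 0], [0, 0]] for _ in range(3)]  # three captured 2x2 images
static_video = fuse(camera_frames, [[[9]]])           # one image: static emoticon
dynamic_video = fuse(camera_frames, [[[1]], [[2]]])   # two images: dynamic emoticon
```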

[0513] S1326, Electronic device 100 sends audio 23 and video 3 to electronic device 101.

[0514] S1327, Electronic device 101 plays audio 23 and video 3.

[0515] For the scenario in which electronic device 101 plays audio 23 and video 3, refer to the scenario shown in Figure 9C above.

[0516] S1328, Electronic device 100 mixes audio 22 and the audio corresponding to voice emoticon 1 to generate audio 24.

[0517] For step S1328, refer to step S1127 shown in Figure 11 above.

[0518] S1329, Electronic device 100 performs image fusion between the images in video 2 and the display content corresponding to voice emoticon 1 to generate video 4.

[0519] Video 2 may include images captured in real time by electronic device 101 during a video call.

[0520] The electronic device 100 can fuse the images in video 2 received after the operation of sending voice emoticon 1 with the display content corresponding to voice emoticon 1 to generate video 4.

[0521] S1330, Electronic device 100 plays audio 24 and video 4.

[0522] In some embodiments, steps S1328 and S1329 are optional. For example, after sending voice emoticon 1 to the electronic device 101, the electronic device 100 can continue playing audio 22 and video 2.

[0523] Alternatively, the electronic device 100 can fuse the display content corresponding to voice emoticon 1 with multiple images captured by its own camera after receiving the operation of sending voice emoticon 1, and display the fused images.

[0524] In some embodiments, the electronic device 100 may also provide functions for merging the display content corresponding to the voice emoticon with the user's avatar, setting the playback sound effect of the voice emoticon, and setting the display style of the voice emoticon. For details, please refer to the description of the method shown in Figure 11 above. This will not be repeated here.

[0525] As described above, in a video call scenario, the electronic device 100 can replace the actual background noise in the captured audio with the user-set background sound, and replace the actual background in the captured image with a user-set virtual background. This creates a virtual video call scenario for the user and protects the user's call privacy. The electronic device 100 can also send user-selected voice emoticons to the device at the other end of the call. The user of the electronic device 100 can see, in the video call frame, the display content corresponding to voice emoticons sent by the other end, and hear the corresponding audio. This increases the fun of the call and enhances the user's call experience.

[0526] In some embodiments, after electronic device 100 and electronic device 101 establish a video call connection, electronic device 100 can send an identifier of the background sound to electronic device 101. Based on this identifier, electronic device 101 can mix the corresponding background sound with the call audio from electronic device 100 and play it. For details, refer to steps S1214 to S1216 shown in Figure 12 above. That is, electronic device 100 can instruct electronic device 101 to mix the background sound with the call audio from electronic device 100.

[0527] When an operation to send a voice emoticon is received, electronic device 100 can send the identifier of the voice emoticon to electronic device 101. Based on the identifier of the voice emoticon, electronic device 101 can mix the audio corresponding to the voice emoticon with the call audio from electronic device 100 and play it, and display the display content corresponding to the voice emoticon. That is, electronic device 100 does not need to perform steps S1324 and S1325 shown in Figure 13 above.

[0528] This application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, can implement the steps in the above-described method embodiments.

[0529] This application also provides a computer program product, including a computer program that, when run on a processor, can implement the steps in the various method embodiments described above.

[0530] This application also provides a chip system, which includes a processing circuit and an interface circuit. The interface circuit receives code instructions and transmits them to the processing circuit. The processing circuit executes the code instructions to enable the chip system to implement the steps of any method embodiment of this application. The chip system can be a single chip or a chip module composed of multiple chips.

[0531] It is understood that the user interfaces described in the embodiments of this application are merely example interfaces and do not constitute a limitation on the solution of this application. In other embodiments, the user interface may adopt different interface layouts, may include more or fewer controls, and may add or remove other functional options. As long as such interfaces are based on the same inventive concept provided in this application, they are all within the protection scope of this application.

[0532] It should be noted that, without causing contradictions or conflicts, any feature in any embodiment of this application, or any part of any feature, can be combined, and the combined technical solution is also within the scope of the embodiments of this application.

[0533] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit it. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.

Claims

1. A method for making a call, characterized in that, The method is applied to a first device, and the method includes: The first device and a second device establish a call connection; The first device sends a first audio to the second device; The first device receives a second audio from the second device and plays the second audio; The first device receives a first operation for setting the playback sound effect of the voice emoticon to a first sound effect, and a second operation for sending a first voice emoticon; The first device mixes the first audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a third audio, and sends the third audio to the second device.

2. The method of claim 1, wherein, The method further includes: The first device receives an operation to send a second voice emoticon; The first device mixes the first audio and the audio corresponding to the second voice emoticon according to the first sound effect to generate a fourth audio, and sends the fourth audio to the second device.

3. The method according to claim 1 or 2, characterized in that, The method further includes: The first device receives an operation to set the playback sound effect of the voice emoticon to the second sound effect, and an operation to send the third voice emoticon; The first device mixes the first audio and the audio corresponding to the third voice emoticon according to the second sound effect to generate a fifth audio, and sends the fifth audio to the second device.

4. The method according to any one of claims 1 to 3, characterized in that, The first sound effect includes one or more of the following: the volume of the voice emoticon is a first volume, and the volume change mode of the voice emoticon is a first change mode.
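As an illustration of a sound effect made up of a volume and a volume change mode, the following sketch applies a per-sample gain envelope to the voice emoticon audio before mixing it into the call audio. The mode labels `"constant"`, `"fade_in"` and `"fade_out"`, the linear ramp, and the function names are hypothetical; the claims do not specify any particular change mode.

```python
from typing import List


def envelope(n: int, volume: float, change_mode: str) -> List[float]:
    """Return a per-sample gain for n samples.

    change_mode is a hypothetical label: "constant", "fade_in" or "fade_out".
    """
    if change_mode == "constant" or n <= 1:
        return [volume] * n
    ramp = [i / (n - 1) for i in range(n)]  # 0.0 ... 1.0, linear
    if change_mode == "fade_in":
        return [volume * r for r in ramp]
    if change_mode == "fade_out":
        return [volume * (1.0 - r) for r in ramp]
    raise ValueError(f"unknown change mode: {change_mode}")


def mix_emoticon(call_audio: List[float], emoticon_audio: List[float],
                 volume: float, change_mode: str) -> List[float]:
    """Mix emoticon audio into call audio according to the sound effect.

    Emoticon samples beyond the end of the call frame are dropped, and
    mixed samples are clamped to [-1.0, 1.0].
    """
    gains = envelope(len(emoticon_audio), volume, change_mode)
    mixed = list(call_audio)
    for i, (sample, gain) in enumerate(zip(emoticon_audio, gains)):
        if i >= len(mixed):
            break
        mixed[i] = max(-1.0, min(1.0, mixed[i] + sample * gain))
    return mixed


# A "first sound effect" of volume 0.5 with a fade-out.
third_audio = mix_emoticon([0.0, 0.0, 0.0, 0.25], [1.0, 1.0, 1.0], 0.5, "fade_out")
```

Keeping the envelope separate from the mixing step means the same setting can be reused for every voice emoticon sent until the user changes the sound effect.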

5. The method according to any one of claims 1-4, characterized in that, After the first device receives the second operation for sending the first voice emoticon, the method further includes: The first device mixes the second audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a sixth audio; The first device plays the sixth audio.

6. The method according to any one of claims 1-5, characterized in that, The method further includes: When the playback sound effect of the voice emoticon is the first sound effect, the first device receives an operation to preview a fourth voice emoticon; The first device mixes the second audio and the audio corresponding to the fourth voice emoticon according to the first sound effect to generate a seventh audio; The first device plays the seventh audio.

7. The method according to any one of claims 1 to 6, characterized in that, The call connection is a video call connection, and the method further includes: The first device sends the first video to the second device; The first device receives a second video from the second device and plays the second video; In response to the second operation for sending the first voice emoticon, the first device performs image fusion on the first video and the display content corresponding to the first voice emoticon to generate a third video; The first device sends the third video to the second device.
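One common way to realize image fusion of a video frame with display content is per-pixel alpha blending. The sketch below assumes that technique; the claims do not state which fusion method is used, and the frame representation (nested lists of RGB tuples), the alpha mask, and the name `fuse_overlay` are illustrative choices only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def fuse_overlay(frame: Frame, overlay: Frame, alpha: List[List[float]],
                 top: int, left: int) -> Frame:
    """Blend overlay onto a copy of frame with its corner at (top, left).

    alpha[r][c] in [0.0, 1.0] is the opacity of overlay pixel (r, c).
    Overlay pixels that fall outside the frame are skipped.
    """
    out = [row[:] for row in frame]  # leave the input frame untouched
    for r, overlay_row in enumerate(overlay):
        for c, fg in enumerate(overlay_row):
            y, x = top + r, left + c
            if not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue
            a = alpha[r][c]
            bg = out[y][x]
            out[y][x] = tuple(round(a * f + (1.0 - a) * b)
                              for f, b in zip(fg, bg))
    return out


# A 2x2 black frame with a one-pixel, half-transparent emoticon image.
first_frame: Frame = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
third_frame = fuse_overlay(first_frame, [[(200, 100, 50)]], [[0.5]], 1, 1)
```

Applying the same function to each frame of the first video would yield the frames of the third video under these assumptions.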

8. The method of claim 7, wherein, The method further includes: The first device performs image fusion on the second video and the display content corresponding to the first voice emoticon to generate a fourth video; The first device plays the fourth video.

9. The method according to claim 7 or 8, characterized in that, The display content corresponding to the first voice emoticon includes text and / or images.

10. The method according to any one of claims 7-9, characterized in that, Before the first device receives the second operation for sending the first voice emoticon, the method further includes: The first device receives an operation to set the display style of the voice emoticon to a first style; The first device performs image fusion on the first video and the display content corresponding to the first voice emoticon to generate a third video, specifically including: The first device performs image fusion on the first video and the display content corresponding to the first voice emoticon according to the first style to generate the third video, in which the display content corresponding to the first voice emoticon is displayed according to the first style.

11. The method according to any one of claims 7-10, characterized in that, The method further includes: The first device receives an operation to enable the first function and an operation to send the fifth voice emoticon. The first device acquires a first user avatar, which is the avatar of a user logged into the calling application used to establish the call connection on the first device. The first device performs image fusion on the first user avatar and the display content corresponding to the fifth voice emoticon to generate the first display content; The first device performs image fusion on the first video and the first displayed content to generate a fifth video; The first device sends the fifth video to the second device.

12. The method according to any one of claims 1-11, characterized in that, Before the first device and the second device establish a call connection, the method further includes: The first device receives an operation to turn on the first background sound; After the first device and the second device establish a call connection, the method further includes: The first device mixes the audio corresponding to the first background sound and the audio acquired by the first device to generate the first audio.

13. The method of claim 12, wherein, Before the first device and the second device establish a call connection, the method further includes: The first device receives an operation to set the background sound playback effect to a third sound effect; The first device mixes the audio corresponding to the first background sound and the audio acquired by the first device, specifically including: The first device mixes the audio corresponding to the first background sound and the audio collected by the first device according to the third sound effect.

14. The method of claim 13, wherein, After the first device mixes the audio corresponding to the first background sound and the audio acquired by the first device according to the third sound effect, the method further includes: The first device receives an operation to set the background sound playback effect to a fourth sound effect; The first device mixes the audio corresponding to the first background sound and the audio acquired by the first device according to the fourth sound effect to generate an eighth audio; The first device sends the eighth audio to the second device.

15. The method according to claim 13 or 14, characterized in that, The third sound effect includes one or more of the following: the background sound volume is a second volume, the background sound volume change mode is a second change mode, the background sound switching mode is a first switching mode, and the background sound switching duration is a first duration.
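A background sound switching mode with a switching duration can be pictured as a crossfade between the old and the new background sound. The sketch below assumes a linear crossfade; this is only one possible switching mode, and the function name and parameters are hypothetical.

```python
from typing import List


def crossfade(old: List[float], new: List[float],
              sample_rate: int, duration_s: float) -> List[float]:
    """Switch from the old background sound to the new one.

    During the first duration_s seconds the two sounds are blended with a
    linearly increasing weight for the new sound; afterwards only the new
    sound remains. A duration of 0 gives an immediate cut.
    """
    n = min(int(sample_rate * duration_s), len(old), len(new))
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the new sound, strictly between 0 and 1
        out.append((1.0 - w) * old[i] + w * new[i])
    out.extend(new[n:])
    return out


# Toy values: 4 samples per second and a 0.75 s switching duration.
switched = crossfade([1.0] * 4, [0.0] * 6, sample_rate=4, duration_s=0.75)
```

The result would then be mixed with the audio acquired by the first device in the same way as any other background sound.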

16. The method according to any one of claims 12-15, characterized in that, The call connection is a video call connection, and the method further includes: When the first background sound is turned on, the first device replaces the background in the image captured by the first device with the first background corresponding to the first background sound, and generates the sixth video; The first device sends the sixth video to the second device; The first device receives a second video from the second device and plays the second video.

17. A call method, characterized in that, The method is applied to a communication system including a first device and a second device, the method comprising: The first device and the second device establish a call connection; The first device sends a first audio to the second device, and the second device sends a second audio to the first device; The first device plays the second audio, and the second device plays the first audio; The first device receives a first operation for setting the playback sound effect of the voice emoticon to a first sound effect, and a second operation for sending a first voice emoticon; The first device sends the first sound effect information corresponding to the first sound effect and the first emoticon identifier corresponding to the first voice emoticon to the second device; The second device obtains the audio corresponding to the first voice emoticon based on the first emoticon identifier, and mixes the first audio and the audio corresponding to the first voice emoticon based on the first sound effect information to generate a third audio; The second device plays the third audio.

18. The method of claim 17, wherein, The method further includes: The first device receives an operation to send a second voice emoticon; The first device sends the second emoticon identifier corresponding to the second voice emoticon to the second device; The second device obtains the audio corresponding to the second voice emoticon based on the second emoticon identifier, and mixes the first audio and the audio corresponding to the second voice emoticon based on the first sound effect information to generate a fourth audio. The second device plays the fourth audio.

19. The method of claim 17 or 18, wherein, The method further includes: The first device receives an operation to set the playback sound effect of the voice emoticon to a second sound effect, and an operation to send a third voice emoticon; The first device sends the second sound effect information corresponding to the second sound effect and the third emoticon identifier corresponding to the third voice emoticon to the second device; The second device obtains the audio corresponding to the third voice emoticon based on the third emoticon identifier, and mixes the first audio and the audio corresponding to the third voice emoticon based on the second sound effect information to generate a fifth audio; The second device plays the fifth audio.
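The exchange in claims 17 to 19 can be sketched as a small receiver-side state machine: the second device remembers the most recent sound effect information and applies it to every emoticon identifier that arrives until new sound effect information is received. The message layout, the class names, and the reduction of the sound effect to a single volume value are assumptions for illustration; the claims do not define a message format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoundEffectInfo:
    volume: float = 1.0  # simplified: the sound effect is only a volume here


@dataclass
class EmoticonMessage:
    emoticon_id: str
    # Assumption: sound effect information is attached only when it changes.
    sound_effect: Optional[SoundEffectInfo] = None


@dataclass
class Receiver:
    library: Dict[str, List[float]]  # emoticon identifier -> audio samples
    current_effect: SoundEffectInfo = field(default_factory=SoundEffectInfo)

    def handle(self, message: EmoticonMessage,
               call_audio: List[float]) -> List[float]:
        """Mix the identified emoticon audio into the received call audio."""
        if message.sound_effect is not None:
            self.current_effect = message.sound_effect
        emoticon_audio = self.library.get(message.emoticon_id, [])
        mixed = list(call_audio)
        for i, sample in enumerate(emoticon_audio[:len(mixed)]):
            value = mixed[i] + sample * self.current_effect.volume
            mixed[i] = max(-1.0, min(1.0, value))
        return mixed


second_device = Receiver(library={"laugh": [0.5, 0.5], "clap": [1.0]})
# First emoticon arrives together with the first sound effect information.
third_audio = second_device.handle(
    EmoticonMessage("laugh", SoundEffectInfo(volume=0.5)), [0.0, 0.25, 0.0])
# Second emoticon carries no sound effect information; the stored one is reused.
fourth_audio = second_device.handle(EmoticonMessage("clap"), [0.0])
# Third emoticon arrives with new (second) sound effect information.
fifth_audio = second_device.handle(
    EmoticonMessage("clap", SoundEffectInfo(volume=1.0)), [0.25])
```

In this variant the first device transmits only identifiers and settings, so the emoticon audio must already be available on the second device.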

20. The method of any one of claims 17-19, wherein, The first sound effect includes one or more of the following: the volume of the voice emoticon is a first volume, and the volume change mode of the voice emoticon is a first change mode.

21. The method according to any one of claims 17-20, characterized in that, After the first device receives the second operation for sending the first voice emoticon, the method further includes: The first device mixes the second audio and the audio corresponding to the first voice emoticon according to the first sound effect to generate a sixth audio; The first device plays the sixth audio.

22. The method of any one of claims 17-21, wherein, After the first device sends the first sound effect information corresponding to the first sound effect and the first emoticon identifier corresponding to the first voice emoticon to the second device, the method further includes: The second device obtains the display content corresponding to the first voice emoticon based on the first emoticon identifier; The second device displays the display content corresponding to the first voice emoticon.

23. The method of claim 22, wherein, The call connection is a video call connection, and the method further includes: The first device sends a first video to the second device, and the second device sends a second video to the first device; The first device plays the second video, and the second device plays the first video; The second device displays the display content corresponding to the first voice emoticon, specifically including: The second device performs image fusion on the first video and the display content corresponding to the first voice emoticon to generate a third video; The second device plays the third video.

24. The method of claim 22 or 23, wherein, After the first device receives the second operation for sending the first voice emoticon, the method further includes: The first device displays the display content corresponding to the first voice emoticon.

25. The method of any one of claims 17-24, wherein, The method further includes: The first device receives an operation to enable the first function and an operation to send the fifth voice emoticon. The first device sends a first message to the second device to indicate that the first function is enabled, as well as the fifth emoticon identifier corresponding to the fifth voice emoticon; According to the first message, the second device obtains the first user avatar, which is the avatar of the user logged in on the call application used to establish the call connection in the first device; The second device performs image fusion on the first user avatar and the display content corresponding to the fifth voice emoticon to generate the second display content; The second device displays the second display content.

26. The method of any one of claims 17-25, wherein, After the first device and the second device establish a call connection, the method further includes: The first device receives an operation to turn on the first background sound; The first device sends the first background sound identifier corresponding to the first background sound to the second device; The second device obtains the audio corresponding to the first background sound based on the first background sound identifier, and mixes the audio corresponding to the first background sound with the first audio to generate the ninth audio. The second device plays the ninth audio.

27. The method of claim 26, wherein, Before the first device receives the operation to turn on the first background sound, the method further includes: The first device receives an operation to set the background sound playback effect to a third sound effect; The first device sends the third sound effect information corresponding to the third sound effect to the second device; The mixing of the audio corresponding to the first background sound and the first audio specifically includes: The second device mixes the audio corresponding to the first background sound and the first audio according to the third sound effect information.

28. The method of claim 27, wherein, The third sound effect includes one or more of the following: the background sound volume is a second volume, the background sound volume change mode is a second change mode, the background sound switching mode is a first switching mode, and the background sound switching duration is a first duration.

29. The method of any one of claims 26-28, wherein, The call connection is a video call connection, and the method further includes: The first device sends a first video to the second device, and the second device sends a second video to the first device; The second device obtains the first background corresponding to the first background sound based on the first background sound identifier, replaces the background of the image in the first video with the first background, and generates the sixth video; The second device plays the sixth video; The first device plays the second video.

30. An electronic device, characterized in that, The electronic device includes a memory and a processor, wherein the memory is used to store a computer program, and the processor executes the computer program to implement the method of any one of claims 1-16.

31. A computer-readable storage medium storing instructions, wherein, When the instructions are executed by a processor, they implement the method of any one of claims 1-16.

32. A computer program product, characterised in that, The computer program product includes computer instructions that, when executed by a processor, implement the method of any one of claims 1-16.