Audio processing method and apparatus, and device and storage medium

By applying enhancement processing or signal cancellation processing to the audio output for speakers located on opposite sides of the terminal device, the method addresses incorrect sound orientation, achieves a deep stereo sound field effect, and improves the stereo playback effect.

WO2026129339A1 · PCT designated stage · Publication Date: 2026-06-25 · GUANGZHOU KUGOU COMP TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: GUANGZHOU KUGOU COMP TECH CO LTD
Filing Date: 2024-12-20
Publication Date: 2026-06-25

Abstract

The present application relates to the technical field of computers. Disclosed are an audio processing method and apparatus, and a device and a storage medium. The method is executed by a terminal device, which comprises a first speaker and a second speaker. The method comprises: when a first speaker and a second speaker are respectively located on two sides of a terminal device, acquiring a left-channel difference signal, a right-channel difference signal and a common signal in an audio signal; when the terminal device is in a first state, executing enhancement processing on the common signal to obtain a processed audio signal; and when the terminal device is in a second state, respectively executing signal cancellation processing on the left-channel difference signal and the right-channel difference signal to obtain a processed left-channel audio signal and a processed right-channel audio signal. According to the present application, different audio processing methods are used on the basis of the positions on two sides where a first speaker and a second speaker are located from a visual effect perspective, thereby improving the stereophonic playback effect.

Description

Audio processing method, apparatus, device and storage medium

Technical Field

[0001] This application relates to the field of computer technology, and in particular to an audio processing method, apparatus, device, and storage medium.

Background Technology

[0002] With the development of technology, stereo speakers have become increasingly common in mobile phones, tablets and other terminal devices. Compared with terminal devices that only have one speaker, terminal devices with stereo speakers can retain stereo information when playing back audio, resulting in a better playback effect.

[0003] In related technologies, the left and right channel speakers of a terminal device are typically located above and below the device when it is held vertically. For example, the two speakers are positioned near the earpiece and charging port, respectively, with the left channel speaker near the earpiece and the right channel speaker near the charging port. Since the user typically holds the terminal device vertically, the stereo left and right channels will emit sound simultaneously from the top and bottom of the device, allowing the user to hear a stereo sound effect.

[0004] However, because the human ear perceives stereo sound from sources on its left and right, playing the stereo channels from the top and bottom of the device places the sound in the wrong position and degrades the stereo playback effect.

Summary of the Invention

[0005] This application provides an audio processing method, apparatus, device, and storage medium. The technical solutions provided by this application are as follows:

[0006] According to one aspect of the embodiments of this application, an audio processing method is provided, the method being executed by a terminal device, the terminal device including a first speaker and a second speaker, the method comprising:

[0007] With the first speaker and the second speaker located on opposite sides of the terminal device, the left channel difference signal, the right channel difference signal, and the common signal in the audio signal are acquired. The left channel difference signal is a signal that exists alone in the left channel audio signal, the right channel difference signal is a signal that exists alone in the right channel audio signal, and the common signal is a signal that exists together in the left channel audio signal and the right channel audio signal.

[0008] When the terminal device is in the first state, the common signal is enhanced to obtain a processed audio signal. The processed audio signal is then output to the first speaker and the second speaker, respectively. The enhancement process is used to improve the sound effect of the common signal in the audio signal. The first state refers to the state in which the first speaker and the second speaker are located on the upper and lower sides of the terminal device, respectively.

[0009] When the terminal device is in the second state, signal cancellation processing is performed on the left channel difference signal and the right channel difference signal respectively to obtain a processed left channel audio signal and a processed right channel audio signal. The processed left channel audio signal is output to the first speaker, and the processed right channel audio signal is output to the second speaker. The signal cancellation processing is used to reduce the influence of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to reduce the influence of the processed left channel audio signal on the playback effect of the processed right channel audio signal. The second state refers to the state in which the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively.

[0010] According to one aspect of the embodiments of this application, an audio processing apparatus is provided, the apparatus comprising:

[0011] The signal acquisition module is used to acquire the left channel difference signal, right channel difference signal and common signal in the audio signal when the first speaker and the second speaker are respectively located on both sides of the terminal device. The left channel difference signal is a signal that exists alone in the left channel audio signal, the right channel difference signal is a signal that exists alone in the right channel audio signal, and the common signal is a signal that exists together in the left channel audio signal and the right channel audio signal.

[0012] The first processing module is configured to perform enhancement processing on the common signal when the terminal device is in a first state, to obtain a processed audio signal, and to output the processed audio signal to the first speaker and the second speaker respectively. The enhancement processing is used to improve the sound effect of the common signal in the audio signal. The first state refers to the state in which the first speaker and the second speaker are located on the upper side and the lower side of the terminal device respectively.

[0013] The second processing module is configured to perform signal cancellation processing on the left channel difference signal and the right channel difference signal respectively when the terminal device is in the second state, to obtain a processed left channel audio signal and a processed right channel audio signal, output the processed left channel audio signal to the first speaker, and output the processed right channel audio signal to the second speaker. The signal cancellation processing is used to reduce the influence of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to reduce the influence of the processed left channel audio signal on the playback effect of the processed right channel audio signal. The second state refers to the state in which the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively.

[0014] According to one aspect of the embodiments of this application, a terminal device is provided, the terminal device including a processor and a memory, the memory storing a computer program, the computer program being loaded and executed by the processor to implement the above-described audio processing method.

[0015] According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, wherein a computer program is stored in the computer-readable storage medium, the computer program being loaded and executed by a processor to implement the above-described audio processing method.

[0016] According to one aspect of the embodiments of this application, a computer program product is provided, the computer program product including a computer program, the computer program being loaded and executed by a processor to implement the above-described audio processing method.

[0017] The technical solution provided in this application can bring the following beneficial effects:

[0018] By taking into account the distribution of the first and second speakers on the terminal device, as well as the placement of the terminal device, different audio processing methods are adopted according to the visual positions of the first and second speakers on opposite sides.

[0019] When the first speaker and the second speaker are visually positioned at the top and bottom of the terminal device, respectively, the sound signals from the first speaker and the second speaker are made completely consistent, solving the problem of misaligned upper and lower sound fields. By performing enhancement processing on the common signal, a deep sound field effect is achieved to highlight the common signal, drawing the user's attention to the common signal and compensating for the lack of stereo sound, thus indirectly achieving a stereo sound playback effect.

[0020] When the first speaker and the second speaker are visually positioned on the left and right sides of the terminal device, respectively, signal cancellation processing is performed on the left channel difference signal and the right channel difference signal. The right channel difference signal entering the left channel is canceled, and the left channel difference signal entering the right channel is canceled, thereby reducing signal crosstalk that occurs during the propagation of the audio signal and improving the stereo playback effect.

Attached Figure Description

[0021] Figure 1 is a schematic diagram of the implementation environment of a solution provided in an embodiment of this application;

[0022] Figure 2 is a schematic diagram of the placement state of a terminal device provided in an embodiment of this application;

[0023] Figure 3 is a schematic diagram of the propagation path of an audio signal provided in an embodiment of this application;

[0024] Figure 4 is a flowchart of an audio processing method provided in an embodiment of this application;

[0025] Figure 5 is a schematic diagram of the speaker distribution when the terminal device provided in an embodiment of this application is in a vertical state;

[0026] Figure 6 is a schematic diagram of the complete process of extracting common signals from audio signals according to an embodiment of this application;

[0027] Figure 7 is a schematic diagram of the generation process of the processed audio signal provided in an embodiment of this application;

[0028] Figure 8 is a schematic diagram of the generation process of the processed left channel audio signal and the processed right channel audio signal provided in an embodiment of this application;

[0029] Figure 9 is a flowchart of an audio processing method provided in another embodiment of this application;

[0030] Figure 10 is a block diagram of an audio processing apparatus provided in an embodiment of this application;

[0031] Figure 11 is a structural block diagram of a terminal device provided in an embodiment of this application.

Detailed Implementation

[0032] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.

[0033] Please refer to Figure 1, which shows a schematic diagram of an implementation environment provided by an embodiment of this application. This implementation environment can be implemented as an audio processing system. The implementation environment may include: a terminal device 10, which has speakers, including a first speaker 201 and a second speaker 202.

[0034] In this embodiment, the execution entity for each step is the terminal device 10, which refers to an electronic device with data computing, processing, and storage functions. The terminal device can be a smartphone, tablet, laptop, desktop computer, smart speaker, smartwatch, or other electronic device, but is not limited to these. The terminal device may have a client of a target application installed. The target application has the function of playing audio signals. This application does not limit the type of the target application, which includes but is not limited to audio applications, video applications, social applications, game applications, search applications, and news applications. Optionally, the target application can be an application that must be downloaded and installed, or an application that can be used without installation; this embodiment does not limit this.

[0035] Terminal device 10 processes the audio signal to obtain an audio signal output from the first speaker 201 and an audio signal output from the second speaker 202. The first speaker 201 and the second speaker 202 are used to amplify and play the received audio signal, allowing the user to hear a stereo sound effect. Terminal device 10 can be connected to the first speaker 201 and the second speaker 202 via a network.

[0036] In this embodiment, the first speaker and the second speaker can be located on opposite sides of the terminal device, or on the same side of the terminal device. First, the left channel difference signal, the right channel difference signal, and the common signal in the audio signal are acquired. The left channel difference signal is a signal that exists independently in the left channel audio signal, the right channel difference signal is a signal that exists independently in the right channel audio signal, and the common signal is a signal that exists in both the left and right channel audio signals.

[0037] With the first and second speakers located on opposite sides of the terminal device, if the terminal device is in a first state (i.e., the first and second speakers are located above and below the terminal device, respectively), enhancement processing is performed on the common signal to improve its sound effect in the audio signal, resulting in a processed audio signal. This processed audio signal is then output to the first and second speakers, respectively. If the terminal device is in a second state (i.e., the first and second speakers are located to the left and right of the terminal device, respectively), signal cancellation processing is performed on the left and right channel difference signals, respectively. This reduces the impact of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and also reduces the impact of the processed left channel audio signal on the playback effect of the processed right channel audio signal, resulting in processed left and right channel audio signals. The processed left channel audio signal is then output to the first speaker, and the processed right channel audio signal is output to the second speaker.

[0038] When the first speaker and the second speaker are located on the same side of the terminal device, the state of the terminal device is not distinguished. The common signal is enhanced to obtain the processed audio signal, and then the processed audio signal is output to the first speaker and the second speaker respectively.
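The selection logic in paragraphs [0037] and [0038] can be sketched as a small dispatch function. This is only an illustrative sketch: the names `Layout`, `State`, `select_processing`, and the returned strings are hypothetical and do not appear in the application.

```python
from enum import Enum


class Layout(Enum):
    """Hypothetical label for where the two speakers sit on the device."""
    OPPOSITE_SIDES = "opposite_sides"
    SAME_SIDE = "same_side"


class State(Enum):
    """Hypothetical label for the placement state of the device."""
    FIRST = "first"    # speakers visually above and below
    SECOND = "second"  # speakers visually left and right


def select_processing(layout, state=None):
    """Return the name of the processing path for a layout and state."""
    if layout is Layout.SAME_SIDE:
        # Same side: the state of the device is not distinguished.
        return "enhance_common"
    if state is State.FIRST:
        # Opposite sides, top/bottom: enhance the common signal.
        return "enhance_common"
    if state is State.SECOND:
        # Opposite sides, left/right: cancel the difference signals.
        return "cancel_difference"
    raise ValueError("state is required when speakers are on opposite sides")
```

Under these assumptions, only the left/right arrangement on opposite sides takes the signal cancellation path; every other combination takes the enhancement path.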

[0039] To achieve stereo sound playback, when playing audio signals, the terminal device outputs the audio signal from both the left and right channels. This allows the user's left ear to hear the audio signal from the left channel, and the user's right ear to hear the audio signal from the right channel, resulting in a richer audio playback experience. Specifically, the terminal device's first speaker plays the audio signal from the left channel, and its second speaker plays the audio signal from the right channel. Therefore, the first speaker in this application can also be referred to as the left speaker, and the second speaker can also be referred to as the right speaker.

[0040] However, since the first speaker and the second speaker may be located in different positions on the terminal device, the first speaker and the second speaker will produce different sound effects when they emit their respective audio signals.

[0041] For example, when the first speaker and the second speaker are located on opposite sides of the terminal device, the terminal device will be placed in different states due to the different ways users use the terminal device. The different placement states of the terminal device and the fact that the first speaker and the second speaker are located on different sides of the terminal device will result in different sound playback effects.

[0042] The placement of the terminal device includes both vertical and horizontal orientations. A vertical orientation means the terminal device is placed upright, as shown in Figure 2(1), in which case the display interface is in portrait mode. A horizontal orientation means the terminal device is placed horizontally, as shown in Figure 2(2), in which case the display interface is in landscape mode. Both orientations refer to the terminal device's screen facing the user or the back panel (the side opposite the screen) facing the user.

[0043] The first speaker and the second speaker being located on different sides of the terminal device covers two cases: the two speakers are located on the top and bottom sides of the terminal device, respectively, or on the left and right sides of the terminal device, respectively. It should be noted that the top, bottom, left, and right sides mentioned here must be interpreted together with the placement state of the terminal device. For example, the top and bottom sides described in the vertical state are equivalent to the left and right sides described in the horizontal state, and the left and right sides described in the vertical state are equivalent to the top and bottom sides described in the horizontal state. As shown in Figure 2(1) and Figure 2(2), the two sides corresponding to the width in Figure 2(1) are the top and bottom sides described in the vertical state, the two sides corresponding to the width in Figure 2(2) are the left and right sides described in the horizontal state, the two sides corresponding to the length in Figure 2(1) are the left and right sides described in the vertical state, and the two sides corresponding to the length in Figure 2(2) are the top and bottom sides described in the horizontal state.

[0044] When the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively (for example, the first speaker is on the left side of the terminal device and the second speaker is on the right side), the user's left ear hears the left channel audio signal emitted by the first speaker, and the user's right ear hears the right channel audio signal emitted by the second speaker. At the same time, however, the user's left ear also hears the right channel audio signal emitted by the second speaker, and the user's right ear also hears the left channel audio signal emitted by the first speaker. Referring to Figure 3, which illustrates the audio signal propagation paths, the solid arrows represent the paths the audio signal should follow, i.e., the left channel audio signal enters the left ear and the right channel audio signal enters the right ear. The dashed arrows represent the additional propagation paths that arise during actual sound wave transmission, i.e., the left channel audio signal enters the right ear and the right channel audio signal enters the left ear. Therefore, when the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively, the technical solution provided in this application needs to address the propagation paths indicated by the dashed arrows in Figure 3, that is, to keep the left channel audio signal from entering the right ear and the right channel audio signal from entering the left ear as far as possible.

[0045] When the first speaker and the second speaker are located on the top and bottom sides of the terminal device, respectively—for example, when the first speaker is on the top side and the second speaker is on the bottom side—the sound direction of the left channel audio signal emitted by the first speaker and the right channel audio signal emitted by the second speaker becomes disordered, thus affecting the stereo sound reproduction effect. Therefore, the technical solution provided in this application aims to solve the problem of stereo sound field disorder when the first speaker and the second speaker are located on the top and bottom sides of the terminal device, respectively.

[0046] For example, when the first speaker and the second speaker are located on the same side of the terminal device, since the first speaker and the second speaker emit the left channel audio signal and the right channel audio signal on the same side of the terminal device, the placement of the terminal device will not affect the sound playback effect of the audio signal, and the sound wave propagation path problem shown in Figure 3 will not occur. However, sound playback on the same side will cause the sound orientation to be disordered. Therefore, the technical solution provided by the embodiments of this application also needs to solve the problem of stereo sound field disorder when the first speaker and the second speaker are located on the same side of the terminal device.

[0047] Please refer to Figure 4, which shows a flowchart of an audio processing method provided in an embodiment of this application. The execution entity of each step of this method can be a terminal device. The method may include at least one of the following steps 410 to 430:

[0048] Step 410: With the first speaker and the second speaker located on opposite sides of the terminal device, acquire the left channel difference signal, the right channel difference signal, and the common signal in the audio signal.

[0049] The left channel difference signal is a signal that exists alone in the left channel audio signal, the right channel difference signal is a signal that exists alone in the right channel audio signal, and the common signal is a signal that exists in both the left and right channel audio signals.

[0050] The two sides of a terminal device refer to the two sides that correspond to each other among the four sides of the terminal device. For example, the two sides of a terminal device can refer to the top and bottom sides of the terminal device, or the left and right sides of the terminal device. The top and bottom sides, as well as the left and right sides of the terminal device, need to be considered in conjunction with the placement of the terminal device. For example, in Figure 2(1), the top and bottom sides of the terminal device refer to the two sides corresponding to the width of the terminal device. In Figure 2(2), the top and bottom sides of the terminal device refer to the two sides corresponding to the length of the terminal device.

[0051] The first speaker and the second speaker are located on opposite sides of the terminal device, but this application does not limit which side each speaker is on or its specific position on that side.

[0052] For example, when the terminal device is in a vertical position, refer to the speaker distribution diagram of the terminal device in a vertical position shown in Figure 5. In Figure 5(1), the first speaker 201 is located on the upper right side of the terminal device, and the second speaker 202 is located on the lower right side of the terminal device. In Figure 5(2), the first speaker 201 is located on the lower left side of the terminal device, and the second speaker 202 is located on the lower right side of the terminal device.

[0053] Taking a mobile phone as an example, the first speaker is located near the earpiece on the upper side of the phone, and the second speaker is located near the charging port on the lower side of the phone.

[0054] Audio signal refers to the audio signal that the terminal device originally intended to send to the speaker for playback. The audio signal includes the left channel audio signal and the right channel audio signal. The left channel audio signal is the audio signal that the terminal device originally intended to send to the first speaker for playback, and the right channel audio signal is the audio signal that the terminal device originally intended to send to the second speaker for playback.

[0055] The left channel audio signal includes a left channel difference signal and a common signal, and the right channel audio signal includes a right channel difference signal and a common signal. Therefore, the audio signal contains three signal components: the left channel difference signal, the right channel difference signal, and the common signal. The left channel difference signal is a signal that exists only in the left channel audio signal and not in the right channel audio signal. The right channel difference signal is a signal that exists only in the right channel audio signal and not in the left channel audio signal. The common signal is a signal that exists in both the left and right channel audio signals.

[0056] From the recording perspective, the left channel difference signal refers to the audio signal picked up only by the left microphone during the audio signal recording stage, the right channel difference signal refers to the audio signal picked up only by the right microphone during the audio signal recording stage, and the common signal refers to the audio signal picked up by both the left and right microphones during the audio signal recording stage. In some embodiments, the left channel difference signal refers to the audio signal that arrives at the left microphone before the right microphone during the audio signal recording stage, the right channel difference signal refers to the audio signal that arrives at the right microphone before the left microphone during the audio signal recording stage, and the common signal refers to the audio signal that arrives at the left and right microphones simultaneously or within a very short time interval during the audio signal recording stage.

[0057] Normally, the left channel difference signal and the right channel difference signal are the audio signals corresponding to the accompaniment, while the common signal is the audio signal corresponding to the human voice.

[0058] For example, the left channel audio signal L = L′ + C, and the right channel audio signal R = R′ + C, where L′ represents the left channel difference signal, R′ represents the right channel difference signal, and C represents the common signal.

[0059] Based on the left and right channel audio signals in the audio signal, the signal components are separated to obtain a common signal. Specific signal separation methods can be found in the following embodiments, which will not be described here. A left channel difference signal is obtained from the left channel audio signal and the common signal, and a right channel difference signal is obtained from the right channel audio signal and the common signal.
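As a minimal sketch of the last step in paragraph [0059], the difference signals follow from the relations L = L′ + C and R = R′ + C once the common signal C is available. The function name `split_difference_signals` and the sample-list representation are assumptions for illustration. Extracting C itself is left to the later embodiments and is not shown here.

```python
def split_difference_signals(left, right, common):
    """Return (L', R') given L, R and an already-extracted common signal C.

    Uses L' = L - C and R' = R - C, sample by sample. How C is extracted
    is outside the scope of this sketch.
    """
    if not (len(left) == len(right) == len(common)):
        raise ValueError("all signals must have the same number of samples")
    left_diff = [l - c for l, c in zip(left, common)]
    right_diff = [r - c for r, c in zip(right, common)]
    return left_diff, right_diff
```

For example, with `left = [0.5, 1.0]`, `right = [0.25, 0.75]` and `common = [0.25, 0.5]`, the function returns `([0.25, 0.5], [0.0, 0.25])`.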

[0060] Step 420: When the terminal device is in the first state, perform enhancement processing on the common signal to obtain the processed audio signal, and output the processed audio signal to the first speaker and the second speaker respectively.

[0061] Enhancement processing is used to improve the sound effect of the common signal in the audio signal. The first state refers to the state in which the first speaker and the second speaker are located on the upper and lower sides of the terminal device, respectively.

[0062] The terminal device being in the first state refers to the state where the first speaker and the second speaker are located on the upper and lower sides of the terminal device, respectively. In other words, the first speaker and the second speaker are positioned visually on the upper and lower sides of the terminal device when it is in its placement state. For example, when the terminal device is in a vertical position, the first speaker and the second speaker are located on either side corresponding to the width of the terminal device; when the terminal device is in a horizontal position, the first speaker and the second speaker are located on either side corresponding to the length of the terminal device.

[0063] When the first speaker and the second speaker are located on the top and bottom of the terminal device, respectively, playing the left channel audio signal from the first speaker and the right channel audio signal from the second speaker causes a misalignment of the sound direction. Therefore, it is necessary to mix the stereo signal into a mono signal so that the sound signals from the top and bottom speakers are completely consistent, thus solving the problem of sound field misalignment. However, mixing into a mono signal loses the left and right sound field information in the original stereo signal, reducing the stereo sound effect. To compensate, the common signal can be highlighted by creating a deep sound field effect, which draws the user's attention to the common signal and makes up for the lack of stereo sound, thus indirectly achieving the stereo sound effect.

[0064] Step 420 performs a merging process on the left and right channel difference signals, combining them into a single mono signal. It also performs enhancement processing on the common signal, improving its sound quality within the audio signal and thus enhancing its clarity and high-frequency response. The resulting processed audio signal is then output to the first and second speakers respectively. Although the audio signals entering the left and right ears are identical, a stereo sound effect is achieved by creating a deep sound field. Specific processing methods can be found in the following embodiments, which will not be described here.
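A rough sketch of the first-state path is given below. It assumes that the merge of the two difference signals is a plain average and that a single gain factor stands in for the enhancement processing. Both are placeholders chosen for illustration, since the actual processing is described in the later embodiments. The names `first_state_output` and `common_gain` are hypothetical.

```python
def first_state_output(left_diff, right_diff, common, common_gain=1.5):
    """Build one mono signal and send the same samples to both speakers.

    Assumptions (not from the application): the difference signals are
    merged by averaging, and enhancement is modelled as a simple gain
    applied to the common signal.
    """
    if not (len(left_diff) == len(right_diff) == len(common)):
        raise ValueError("all signals must have the same number of samples")
    # Merge L' and R' into a single mono component.
    merged = [0.5 * (l + r) for l, r in zip(left_diff, right_diff)]
    # Emphasise the common signal relative to the merged component.
    mono = [common_gain * c + m for c, m in zip(common, merged)]
    # Both speakers receive identical signals.
    return mono, list(mono)
```

The property this sketch is meant to show is that the first and second speaker outputs are identical, which is what removes the top/bottom sound field misalignment.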

[0065] Step 430: When the terminal device is in the second state, perform signal cancellation processing on the left channel difference signal and the right channel difference signal respectively to obtain the processed left channel audio signal and the processed right channel audio signal. Output the processed left channel audio signal to the first speaker and the processed right channel audio signal to the second speaker.

[0066] Signal cancellation processing is used to reduce the impact of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to reduce the impact of the processed left channel audio signal on the playback effect of the processed right channel audio signal. The second state refers to the state in which the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively.

[0067] The second state of the terminal device refers to the state where the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively. In other words, the first speaker and the second speaker are positioned on the left and right sides of the terminal device from a visual perspective, depending on its orientation. For example, when the terminal device is in a vertical position, the first speaker and the second speaker are located on the sides corresponding to the length of the terminal device; when the terminal device is in a horizontal position, the first speaker and the second speaker are located on the sides corresponding to the width of the terminal device.

[0068] When the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively, since the left channel audio signal emitted by the first speaker is heard by the user's right ear, and the right channel audio signal emitted by the second speaker is heard by the user's left ear, it is necessary to perform signal cancellation processing on the left channel difference signal and the right channel difference signal respectively to obtain the processed left channel audio signal and the processed right channel audio signal. The processed left channel audio signal is output through the first speaker, and the processed right channel audio signal is output through the second speaker.

[0069] Step 430 involves performing signal cancellation processing on the left and right channel difference signals respectively. This involves adding an inverted right channel difference signal to the left channel audio signal and an inverted left channel difference signal to the right channel audio signal. Additionally, enhancement processing is applied to the common signal to improve its sound quality within the audio signal, thereby enhancing the playback performance of both the processed left and right channel audio signals. Specific processing methods can be found in the following embodiments, which will not be described here.

[0070] By adding inverted right-channel difference signals to the processed left-channel audio signal and inverted left-channel difference signals to the processed right-channel audio signal, when the processed right-channel audio signal enters the left ear, the inverted right-channel difference signals in the left channel cancel out most of the right-channel difference signals entering the left ear. This reduces the impact of the processed right-channel audio signal on the playback quality of the processed left-channel audio signal. Similarly, when the processed left-channel audio signal enters the right ear, the inverted left-channel difference signals in the right channel cancel out most of the left-channel difference signals entering the right ear, further reducing the impact of the processed left-channel audio signal on the playback quality of the processed right-channel audio signal. This reduces signal crosstalk during sound wave propagation and improves stereo playback quality.

[0071] The technical solution provided in this application takes into account the distribution of the first speaker and the second speaker on the terminal device, as well as the placement state of the terminal device, so that different audio processing methods are adopted according to the visual positions of the first speaker and the second speaker on opposite sides.

[0072] When the first speaker and the second speaker are visually positioned at the top and bottom of the terminal device, respectively, the sound signals from the first speaker and the second speaker are made completely consistent, solving the problem of misaligned upper and lower sound fields. By performing enhancement processing on the common signal, a deep sound field effect is achieved to highlight the common signal, drawing the user's attention to the common signal and compensating for the lack of stereo sound, thus indirectly achieving a stereo sound playback effect.

[0073] When the first speaker and the second speaker are visually positioned on the left and right sides of the terminal device, respectively, signal cancellation processing is performed on the left channel difference signal and the right channel difference signal. The right channel difference signal entering the left channel is canceled, and the left channel difference signal entering the right channel is canceled, thereby reducing signal crosstalk that occurs during the propagation of the audio signal and improving the stereo playback effect.

[0074] In some embodiments, step 410 includes at least one of sub-steps 411 to 417.

[0075] Sub-step 411: Obtain the left channel audio signal and the right channel audio signal from the audio signal.

[0076] The audio signal includes a left channel audio signal and a right channel audio signal. The left channel audio signal is the audio signal that the terminal device originally intended to send to the first speaker for playback, and the right channel audio signal is the audio signal that the terminal device originally intended to send to the second speaker for playback. The left channel audio signal includes a left channel difference signal and a common signal, and the right channel audio signal includes a right channel difference signal and a common signal. Step 410 is used to extract the left channel difference signal, the right channel difference signal, and the common signal from the audio signal.

[0077] Sub-step 412: Divide the left channel audio signal and the right channel audio signal into frames to obtain n left channel audio frames and n right channel audio frames, where n is an integer greater than 1.

[0078] In some embodiments, the left channel audio signal and the right channel audio signal are divided into frames using a first frame length, with each first frame length being taken as an audio frame, resulting in n left channel audio frames and n right channel audio frames. This application does not limit the size of the first frame length, which can be determined according to the audio processing requirements in the actual application. For example, the first frame length can be 50ms.

[0079] In some embodiments, the left channel audio signal and the right channel audio signal are sampled at a first frequency, and an audio frame is taken every first number of sampling points, resulting in n left channel audio frames and n right channel audio frames. This application does not limit the magnitude of the first frequency and the first number, which can be determined according to the audio processing requirements in the actual application. For example, the first frequency can be 48kHz, and the first number can be 2400, meaning that an audio frame is taken every 2400 sampling points.

[0080] In some embodiments, when the left channel audio signal and the right channel audio signal are framed separately, frame shift is considered, that is, some overlapping data is retained between frames so that the audio signals of the n left channel audio frames and n right channel audio frames after the frame division are continuous.

[0081] For example, the frame shift (here, the overlap retained between adjacent frames) can be set to a fixed time length, such as 10ms. Then, for both the left and right channel audio signals, starting from the first audio frame, a new frame begins every 40ms and each frame lasts 50ms, so that adjacent frames overlap by 10ms. Alternatively, the frame shift can be set to a fixed number of sampling points, such as 100 sampling points. Then, for both the left and right channel audio signals, starting from the first audio frame, a new frame begins every 2300 sampling points and each frame contains 2400 sampling points, so that adjacent frames overlap by 100 sampling points.
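The framing described in [0078] to [0081] can be sketched as follows. This is a minimal illustration, assuming the sample-count variant (2400-sample frames, one starting every 2300 samples); the function name `split_into_frames` and the decision to drop a trailing partial frame are assumptions made for the example and are not stated in this application.

```python
def split_into_frames(samples, frame_len=2400, hop=2300):
    """Cut one channel into frames of frame_len samples, one every hop samples.

    Adjacent frames overlap by frame_len - hop samples. A trailing partial
    frame is dropped (an assumption; zero-padding would be another option).
    """
    if len(samples) < frame_len:
        return []
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


# One second of a 48 kHz channel; the sample index stands in for the value.
left = list(range(48000))
frames = split_into_frames(left)
print(len(frames), len(frames[0]), frames[1][0])
```

The same function would be applied to the right channel so that the two channels yield the same number n of frames.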

[0082] Sub-step 413: Window the n left channel audio frames and the n right channel audio frames respectively to obtain n windowed left channel audio frames and n windowed right channel audio frames.

[0083] Framing introduces discontinuities at the beginning and end of each audio frame, and the more frames there are, the greater the error relative to the original audio signal. Windowing smooths the frame boundaries so that the framed audio signal is continuous, and makes the audio signal, which originally had no periodic characteristics, exhibit the characteristics of a periodic function.

[0084] Windowing refers to multiplying an audio frame by a window function. The window function can be any one of a cosine window, rectangular window, Hamming window, Blackman window, etc., and this application does not limit it.
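Windowing as an element-wise product can be sketched as below. The Hamming window is used here only because it is one of the windows listed above and has a well-known closed form; the helper names `hamming_window` and `apply_window` are illustrative assumptions.

```python
import math


def hamming_window(length):
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, window):
    """Multiply an audio frame by a window function, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


frame = [1.0] * 2400          # a constant frame makes the taper easy to see
windowed = apply_window(frame, hamming_window(len(frame)))
print(round(windowed[0], 2), round(max(windowed), 2))
```

With a constant input, the windowed frame is simply the window itself: close to 1 in the middle and tapered towards both ends.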

[0085] In some embodiments, a cosine window can be selected to window at least one left channel audio frame and at least one right channel audio frame respectively. The window function of the cosine window is expressed as:

[0086] Where n = 1, 2, ..., N-1, N represents the total length of the window function, M represents the effective length of the window function, and 0 < β < 1. This application does not limit the value of β; for example, β = 0.35.

[0087] Window each of the n left channel audio frames to obtain n windowed left channel audio frames. Similarly, window each of the n right channel audio frames to obtain n windowed right channel audio frames. The windowed left and right channel audio frames are in real number form.

[0088] Sub-step 414: Perform fast real Fourier transform on the n windowed left channel audio frames and the n windowed right channel audio frames respectively to obtain n left channel frequency domain signals and n right channel frequency domain signals.

[0089] Perform a fast real Fourier transform (a real-input FFT) on each of the n windowed left-channel audio frames and each of the n windowed right-channel audio frames to transform them from the time domain to the frequency domain, resulting in n left-channel frequency domain signals and n right-channel frequency domain signals. The n left-channel frequency domain signals are the frequency domain representation of the left-channel audio signal after transformation from the time domain, and the n right-channel frequency domain signals are the frequency domain representation of the right-channel audio signal after transformation from the time domain. Both the n left-channel and n right-channel frequency domain signals are in complex form.

[0090] By sequentially performing frame division, windowing, and fast real Fourier transform on the left and right channel audio signals in the audio signal, the audio signal is converted from the time domain to the frequency domain, resulting in a frequency domain signal with regular representation. This facilitates the extraction of the common frequency domain signal from the frequency domain signal, improves the accuracy of the common frequency domain signal, and reduces signal delay and signal error.
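The time-to-frequency step can be illustrated with the sketch below. A practical implementation would call an FFT routine; a direct DFT is used here only to keep the example free of dependencies, and it produces the same complex bins that a real-input FFT would. The function name `real_dft` and the 64-sample test frame are assumptions made for the illustration.

```python
import cmath
import math


def real_dft(frame):
    """Direct DFT of a real frame, keeping bins 0..N/2 (the non-redundant half)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


# A 64-sample frame holding exactly 4 cycles of a cosine.
n = 64
frame = [math.cos(2.0 * math.pi * 4 * t / n) for t in range(n)]
spectrum = real_dft(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
print(len(spectrum), peak_bin)
```

The real-valued frame becomes a list of complex values, one per frequency bin, which matches the statement that the frequency domain signals are in complex form.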

[0091] Sub-step 415: For the one-to-one corresponding left channel frequency domain signal and right channel frequency domain signal, calculate the frequency domain projection of the left channel frequency domain signal and the right channel frequency domain signal to obtain the common frequency domain signal.

[0092] There is a one-to-one correspondence between the n left channel frequency domain signals and the n right channel frequency domain signals. For the corresponding left channel frequency domain signals and right channel frequency domain signals, calculate the frequency domain projection of the left channel frequency domain signals and right channel frequency domain signals of each group to obtain the common frequency domain signal corresponding to the left channel frequency domain signals and right channel frequency domain signals of each group.

[0093] In some embodiments, the left channel frequency domain signal and the right channel frequency domain signal can be viewed as vectors. Each left channel frequency domain signal and the right channel frequency domain signal contains a frequency domain signal corresponding to at least one sampling point. Then, the vector corresponding to each frequency domain signal contains a vector component corresponding to at least one sampling point. The vector component corresponding to each sampling point contains the real part and the imaginary part of the frequency domain signal.

[0094] In some embodiments, the projection of the left channel frequency domain signal onto the right channel frequency domain signal is calculated to obtain the common frequency domain signal.

[0095] For example, the left channel frequency domain signal can be represented as LF, and the right channel frequency domain signal can be represented as RF. Then, the projection of the left channel frequency domain signal onto the right channel frequency domain signal can be represented as CF = |LF|*cosθ, where CF represents the common frequency domain signal, |LF| represents the magnitude of the vector corresponding to the left channel frequency domain signal, and θ is the angle between the vector corresponding to the left channel frequency domain signal and the vector corresponding to the right channel frequency domain signal.

[0096] In some embodiments, the projection of the right channel frequency domain signal onto the left channel frequency domain signal is calculated to obtain the common frequency domain signal.

[0097] For example, the projection of the right channel frequency domain signal onto the left channel frequency domain signal can be expressed as CF = |RF|*cosθ, where CF represents the common frequency domain signal, |RF| represents the magnitude of the vector corresponding to the right channel frequency domain signal, and θ is the angle between the vector corresponding to the left channel frequency domain signal and the vector corresponding to the right channel frequency domain signal.

[0098] Calculating the common frequency domain signal using the frequency domain projection method can reduce the computational difficulty and effectively ensure the accuracy of the common frequency domain signal.
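One possible reading of the frequency domain projection is sketched below. It treats each frequency domain signal as a real vector whose components are the real and imaginary parts of every bin, computes the scalar |LF|*cosθ, and then places that length along the direction of RF to obtain a complex common signal. That last step, and the function name `common_by_projection`, are assumptions of this sketch; the application only gives the scalar expression.

```python
import math


def common_by_projection(lf, rf):
    """Project the left spectrum onto the right spectrum.

    lf and rf are equal-length lists of complex bins. Returns (scalar, cf):
    scalar is |LF| * cos(theta); cf places that length along the direction
    of rf (an assumption of this sketch).
    """
    dot = sum((a * b.conjugate()).real for a, b in zip(lf, rf))
    rf_norm = math.sqrt(sum(abs(b) ** 2 for b in rf))
    if rf_norm == 0.0:
        return 0.0, [0j] * len(rf)
    scalar = dot / rf_norm                   # equals |LF| * cos(theta)
    cf = [scalar * b / rf_norm for b in rf]
    return scalar, cf


# Toy spectra: the left channel is the right channel plus an orthogonal part.
rf = [1 + 0j, 0 + 2j, 0j]
lf = [1 + 0j, 0 + 2j, 3 + 0j]
scalar, cf = common_by_projection(lf, rf)
print(round(scalar, 4), cf)
```

In this toy case the left spectrum equals the right spectrum plus a component orthogonal to it, so the projection recovers the part the two channels have in common.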

[0099] Sub-step 416: For the i-th left channel frequency domain signal and the i-th right channel frequency domain signal, remove the i-th common frequency domain signal from the i-th left channel frequency domain signal to obtain the i-th left channel difference frequency domain signal, and remove the i-th common frequency domain signal from the i-th right channel frequency domain signal to obtain the i-th right channel difference frequency domain signal.

[0100] For example, the i-th left channel difference frequency domain signal can be represented as LF′_i = LF_i - CF_i, where "-" represents complex number subtraction, LF_i represents the i-th left channel frequency domain signal, and CF_i represents the i-th common frequency domain signal. The n left channel difference frequency domain signals are the frequency domain signals of the left channel difference signals.

[0101] For example, the i-th right channel difference frequency domain signal can be represented as RF′_i = RF_i - CF_i, where "-" represents complex number subtraction, RF_i represents the i-th right channel frequency domain signal, and CF_i represents the i-th common frequency domain signal. The n right channel difference frequency domain signals are the frequency domain signals of the right channel difference signals.

[0102] Sub-step 417: Perform a fast inverse real Fourier transform on the i-th left channel difference frequency domain signal to obtain the i-th left channel difference audio frame; perform a fast inverse real Fourier transform on the i-th right channel difference frequency domain signal to obtain the i-th right channel difference audio frame; and perform a fast inverse real Fourier transform on the i-th common frequency domain signal to obtain the i-th common audio frame.

[0103] Perform a fast inverse real Fourier transform (inverse FFT) on each of the n left-channel difference frequency domain signals to transform them from the frequency domain to the time domain, obtaining n left-channel difference audio frames. Perform a fast inverse real Fourier transform on each of the n right-channel difference frequency domain signals to transform them from the frequency domain to the time domain, obtaining n right-channel difference audio frames. Perform a fast inverse real Fourier transform on each of the n common frequency domain signals to transform them from the frequency domain to the time domain, obtaining n common audio frames.

[0104] By splicing n left-channel difference audio frames, a left-channel difference signal is obtained; by splicing n right-channel difference audio frames, a right-channel difference signal is obtained; and by splicing n common audio frames, a common signal is obtained.

[0105] Figure 6 illustrates the complete process of extracting the left channel difference signal, right channel difference signal, and common signal from an audio signal. First, the left and right channel audio signals are acquired. Then, the left and right channel audio signals are framed separately, resulting in n left channel audio frames and n right channel audio frames. Next, each left and right channel audio frame is windowed, resulting in n windowed left and right channel audio frames. A Fast Real Fourier Transform is then performed on each windowed left and right channel audio frame to obtain the left and right channel frequency domain signals. For each corresponding left and right channel frequency domain signal, the frequency domain projection of the left and right channel frequency domain signals is calculated to obtain the common frequency domain signal. For each left channel frequency domain signal, remove the corresponding common frequency domain signal to obtain the left channel difference frequency domain signal; similarly, for each right channel frequency domain signal, remove the corresponding common frequency domain signal to obtain the right channel difference frequency domain signal. Perform a fast inverse real Fourier transform on the left channel difference frequency domain signal to obtain the left channel difference audio frame; perform a fast inverse real Fourier transform on the right channel difference frequency domain signal to obtain the right channel difference audio frame; and perform an inverse Fourier transform on the common frequency domain signal to obtain the common audio frame. Concatenate n left channel difference audio frames to obtain the left channel difference signal; concatenate n right channel difference audio frames to obtain the right channel difference signal; and concatenate n common audio frames to obtain the common signal.

[0106] In some embodiments, since the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, step 420 includes at least one of substeps 421 to 424.

[0107] Sub-step 421: Merge the left channel difference audio frame and the right channel difference audio frame into mono to obtain a mono audio frame.

[0108] The left and right channel difference audio frames are merged into a single mono audio frame.

[0109] In some embodiments, left channel difference audio frames and right channel difference audio frames are superimposed to obtain a superimposed audio frame; the superimposed audio frame is delayed by a first duration to obtain a mono audio frame.

[0110] For example, the superimposed audio frame can be represented as B = L′ + R′, where B represents the superimposed audio frame, L′ represents the left channel difference audio frame, and R′ represents the right channel difference audio frame. Delaying the superimposed audio frame B by a first duration yields the mono audio frame B′.

[0111] This application does not limit the size of the first duration, which can be determined according to the audio processing needs in the actual application process. For example, the first duration can be 5ms.

[0112] By merging the left and right channel difference audio frames into a mono audio signal, the playback signals of the upper and lower speakers are made completely consistent, solving the problem of upper and lower sound field disorder. Furthermore, by delaying the superimposed audio frames by a first duration, the mono audio frames are made to lag behind the common signal. This allows the common signal to reach the human ear first without affecting the overall playback effect, making the user perceive the common signal as stronger. This weakens the defect of losing the left and right sound field information in the original stereo, achieving a deep sound field effect.
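Sub-step 421 can be sketched as follows, using the example values given above (a 48kHz sampling rate and a first duration of 5ms). The helper names and the way the delay is realized inside a single frame (zero-filling the start and dropping the tail) are assumptions made for the illustration; how delayed samples are carried across frame boundaries is not covered here.

```python
def superimpose(left_diff, right_diff):
    """B = L' + R', sample by sample."""
    return [l + r for l, r in zip(left_diff, right_diff)]


def delay(frame, delay_samples):
    """Shift a frame later in time by delay_samples, keeping its length.

    The first delay_samples values are zero-filled and the tail is dropped;
    carrying the dropped tail into the next frame is left out of this sketch.
    """
    n = min(max(delay_samples, 0), len(frame))
    return [0.0] * n + list(frame[:len(frame) - n])


SAMPLE_RATE = 48000              # Hz, the example rate from [0079]
FIRST_DURATION_MS = 5            # the example first duration from [0111]
delay_samples = SAMPLE_RATE * FIRST_DURATION_MS // 1000

left_diff = [0.5] * 2400         # stand-in for a left channel difference frame L'
right_diff = [-0.2] * 2400       # stand-in for a right channel difference frame R'
mono = delay(superimpose(left_diff, right_diff), delay_samples)
print(delay_samples, mono[0], round(mono[delay_samples], 2))
```

At 48kHz a 5ms delay corresponds to 240 sampling points, so the first 240 values of the mono audio frame B′ are silent and the superimposed content follows.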

[0113] Sub-step 422: Perform enhancement processing on the common audio frame to obtain the processed common audio frame.

[0114] If a common audio frame is represented as C, then the processed common audio frame can be represented as C′. The enhancement processing steps for the common audio frame are described in the following embodiments and are not repeated here.

[0115] Sub-step 423: Superimpose the mono audio frame and the processed common audio frame to obtain the processed audio frame.

[0116] For example, the processed audio frame can be represented as B′+C′, that is, the audio frame of the left channel can be represented as Lo=B′+C′, and the audio frame of the right channel can be represented as Ro=B′+C′. The audio frame Lo of the left channel is the audio frame corresponding to the audio signal of the left channel output by the first speaker, and the audio frame Ro of the right channel is the audio frame corresponding to the audio signal of the right channel output by the second speaker.

[0117] Sub-step 424: Concatenate n processed audio frames to obtain the processed audio signal.

[0118] By splicing the audio frames Lo from the left channel, we obtain the audio signal Lo′ of the left channel output from the first speaker. By splicing the audio frames Ro from the right channel, we obtain the audio signal Ro′ of the right channel output from the second speaker. The audio signals Lo′ of the left channel and Ro′ of the right channel are the same; both are processed audio signals.

[0119] By merging the left and right channel difference audio frames into a mono signal, the problem of disordered upper and lower sound fields is solved. However, the left and right sound field information in the original stereo is lost after being mixed into a mono signal. Therefore, the common audio frame is enhanced to highlight the common signal, so that the user's attention is shifted to the common signal, making up for the lack of stereo sound and indirectly achieving the stereo sound playback effect.

[0120] In some embodiments, sub-step 422 (enhancement process) includes at least one of steps A1 to A4.

[0121] Step A1: The common audio frame is filtered by a high-pass filter and a low-pass filter respectively to obtain a high-pass audio frame and a low-pass audio frame. The high-pass filter and the low-pass filter are power complementary filters.

[0122] The high-pass and low-pass filters are power complementary filters, and they have the same cutoff frequency. For example, the high-pass filter could be a fourth-order Butterworth filter, and the low-pass filter could be a Linkwitz-Riley filter.

[0123] The common audio frame is filtered by a high-pass filter to obtain a high-pass audio frame, and the common audio frame is filtered by a low-pass filter to obtain a low-pass audio frame. Here, the common audio frame can be represented as C, the high-pass audio frame as CH, and the low-pass audio frame as CL.

[0124] Step A2: Delay the high-pass audio frame for a second duration to obtain the delayed high-pass audio frame.

[0125] The high-pass audio frame CH is delayed by a second duration to obtain the delayed high-pass audio frame CH′. This application does not limit the size of the second duration, which can be determined according to the audio processing requirements in the actual application. For example, the second duration can be 1.5ms.

[0126] Step A3: Perform equalization processing on the delayed high-pass audio frame using an equalizer to obtain the processed high-pass audio frame.

[0127] Adjusting the frequency balance of sound using an equalizer can significantly improve sound quality, making music clearer and more vivid. Here, equalization processing is performed on the high-pass audio frame, which can improve the high-frequency sound effect, making the high-frequency part clearer and sharper and highlighting the high-frequency part in the overall sound.

[0128] The delayed high-pass audio frame CH′ is processed by an equalizer to obtain the processed high-pass audio frame CH″. The specific equalization process can be found in the following example.

[0129] Step A4: Superimpose the low-pass audio frame and the processed high-pass audio frame to obtain the processed common audio frame.

[0130] For example, the processed common audio frame can be represented as C′=CL+CH″.

[0131] By delaying the high-pass audio frame in the common audio frame by a second duration and performing equalization processing, the audio quality and playback effect of the high-pass audio frame in the common audio frame are enhanced. This allows the human ear to perceive the processed high-pass audio frame more sensitively in the processed common audio frame, enhancing the depth effect between the high-pass audio frame and the low-pass audio frame, thereby achieving a stereo sound effect.
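The order of steps A1 to A4 can be illustrated with the sketch below. It is not the filter design of this application: a first-order low-pass and its subtractive complement stand in for the power complementary filter pair, the 1kHz cutoff is a hypothetical value, and the equalizer is passed in as a function that defaults to doing nothing. All names are assumptions made for the illustration.

```python
import math


def one_pole_lowpass(frame, cutoff_hz, sample_rate):
    """First-order low-pass, a simple stand-in for the low-pass branch."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in frame:
        state += alpha * (x - state)
        out.append(state)
    return out


def delay(frame, n):
    """Zero-fill the first n samples and drop the last n (length preserved)."""
    n = min(max(n, 0), len(frame))
    return [0.0] * n + list(frame[:len(frame) - n])


def enhance_common(frame, sample_rate=48000, cutoff_hz=1000.0,
                   second_duration_ms=1.5, eq=lambda f: f):
    """C' = CL + CH'' following the order of steps A1 to A4."""
    low = one_pole_lowpass(frame, cutoff_hz, sample_rate)       # A1: CL
    high = [x - l for x, l in zip(frame, low)]                  # A1: CH (stand-in)
    high = delay(high, round(sample_rate * second_duration_ms / 1000))  # A2: CH'
    high = eq(high)                                             # A3: CH''
    return [l + h for l, h in zip(low, high)]                   # A4: C'


common = [math.sin(0.05 * t) for t in range(500)]
processed = enhance_common(common)
unchanged = enhance_common(common, second_duration_ms=0)
print(len(processed), max(abs(a - b) for a, b in zip(unchanged, common)) < 1e-9)
```

With a zero delay and no equalization the two branches sum back to the input, which is a convenient sanity check; with the 1.5ms example delay (72 sampling points at 48kHz) only the high-pass branch is shifted.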

[0132] In some embodiments, the equalizer includes K peak filters, the angular frequencies of which are K frequency points in the frequency range of the delayed high-pass audio frame. The K frequency points are the top K frequency points obtained by sorting each frequency point in the frequency range of the delayed high-pass audio frame in descending order of fidelity, where K is a positive integer.

[0133] The angular frequency corresponding to a peak filter is the frequency point in the frequency band of the peak filter that is close to the starting frequency within the frequency band. Generally, the angular frequency corresponding to a peak filter can be regarded as the starting frequency within the frequency band of the peak filter.

[0134] The frequency range of the delayed high-pass audio frame contains multiple frequency points. These frequency points are sorted in descending order of fidelity to obtain sorted frequency points. The fidelity of a frequency point indicates the similarity between the output signal and the input signal at that frequency point, and can be expressed as (actual frequency - measured frequency) / actual frequency. Then, the top K frequency points from the sorted frequency points are selected. These K frequency points represent the K frequency points with the lowest distortion within the frequency range of the delayed high-pass audio frame. Since the human ear is highly sensitive to frequency ranges with high fidelity, the K frequency ranges with high fidelity can be determined based on these K frequency points, and these K frequency ranges can then be enhanced.

[0135] This application does not limit the setting of the value of K; for example, K can be 3.
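Selecting the K frequency points can be sketched as a sort followed by a slice. The fidelity scores below are hypothetical placeholders rather than measured data, and the function name `top_k_by_fidelity` is an assumption; how the fidelity of each frequency point is measured is outside the scope of the sketch.

```python
def top_k_by_fidelity(fidelity_by_freq, k):
    """Return the k frequency points with the highest fidelity, best first."""
    ranked = sorted(fidelity_by_freq.items(),
                    key=lambda item: item[1], reverse=True)
    return [freq for freq, _ in ranked[:k]]


# Hypothetical fidelity score per frequency point (Hz -> score).
fidelity = {500: 0.71, 1000: 0.97, 2000: 0.95, 3500: 0.93, 6000: 0.80}
selected = top_k_by_fidelity(fidelity, 3)
print(selected)
```

The selected frequency points would then serve as the angular frequencies of the K peak filters.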

[0136] In some embodiments, step A3 includes at least one of sub-steps A31 to A32.

[0137] Sub-step A31: Based on the angular frequency and quality factor corresponding to the K peak filters, obtain the gain range corresponding to the K peak filters respectively. The quality factor is used to indicate the bandwidth of the peak filters.

[0138] The specific value of the quality factor is preset by technicians according to audio gain requirements, and this application does not limit this. The quality factor is used to indicate the bandwidth of the peak filter. The bandwidth of the peak filter is obtained based on the angular frequency corresponding to the peak filter and the quality factor of the peak filter. Based on the bandwidth of the peak filter, the corresponding gain range of the peak filter is obtained.

[0139] For example, the bandwidth of a peak filter equals the angular frequency of the peak filter divided by its quality factor. The gain interval of the peak filter is obtained by taking the angular frequency of the peak filter as the starting frequency and the bandwidth of the peak filter as the length of the gain interval.

[0140] For example, the angular frequencies corresponding to the K peak filters can be 1kHz, 2kHz and 3.5kHz respectively, and the quality factors corresponding to the K peak filters can be 1, 1.3 and 2.7 respectively. Then the gain ranges corresponding to the K peak filters are 1-2kHz, 2-3.5kHz and 3.5-4.8kHz respectively.
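The arithmetic behind this example can be checked with a short sketch that follows the rule in [0139]: the bandwidth is the angular frequency divided by the quality factor, and the gain interval starts at the angular frequency and is one bandwidth long. The function name `gain_interval` is an assumption.

```python
def gain_interval(start_khz, q):
    """Gain interval of one peak filter, in kHz.

    bandwidth = angular frequency / quality factor; the interval starts at
    the angular frequency and has the bandwidth as its length.
    """
    if q <= 0:
        raise ValueError("quality factor must be positive")
    bandwidth = start_khz / q
    return start_khz, start_khz + bandwidth


for f_khz, q in [(1.0, 1.0), (2.0, 1.3), (3.5, 2.7)]:
    start, end = gain_interval(f_khz, q)
    print(f"{start:.1f}-{end:.1f} kHz")
```

Rounded to one decimal place, the three intervals come out as 1.0-2.0kHz, 2.0-3.5kHz and 3.5-4.8kHz, in line with the figures quoted above.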

[0141] In some embodiments, a gain range can be preset according to audio gain requirements. For example, if the gain range is set to 1-3.7kHz, the bandwidth of the K peak filters can be adjusted by changing their respective quality factors based on their angular frequencies. This allows the adjusted K peak filters to obtain the desired gain range based on their own characteristics. For example, since the gain range is set to 1-3.7kHz, the gain ranges corresponding to the K peak filters are 1-2kHz, 2-3.5kHz, and 3.5-3.7kHz, respectively. Based on the angular frequencies corresponding to the K peak filters, their respective quality factors are 1, 1.3, and 17.2, respectively.

[0142] Sub-step A32: Using K peak-type filters, perform gain processing on the gain intervals corresponding to the K peak-type filters respectively using pre-set gain values to obtain the processed high-pass audio frame.

[0143] The gain value is used to indicate the amplification factor of the audio signal within the gain range. This application does not limit the specific value of the gain value.

[0144] Optionally, the gain intervals corresponding to the K peak filters can each correspond to a single gain value. This single gain value is then used to enhance the signal in each of the gain intervals corresponding to the K peak filters, resulting in a boosted audio frame. Since no gain processing is performed on the audio signals other than the gain intervals in the delayed high-pass audio frame, the boosted audio frame and the other audio signals other than the gain intervals in the delayed high-pass audio frame are superimposed to obtain the processed high-pass audio frame. For example, the preset gain value can be 4dB.
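The effect of a single preset gain value can be illustrated as below. The sketch does not implement peak filters: it scales a list of (frequency, amplitude) components directly, which is a simplified stand-in, and it assumes the usual amplitude convention in which a gain of G dB corresponds to a factor of 10^(G/20). The function names are assumptions.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def boost_intervals(components, intervals_khz, gain_db):
    """Scale the components whose frequency lies in any gain interval.

    components: list of (frequency_khz, amplitude) pairs.
    Components outside every interval pass through unchanged.
    """
    factor = db_to_amplitude(gain_db)
    boosted = []
    for freq, amp in components:
        inside = any(lo <= freq < hi for lo, hi in intervals_khz)
        boosted.append((freq, amp * factor if inside else amp))
    return boosted


intervals = [(1.0, 2.0), (2.0, 3.5), (3.5, 4.8)]   # example intervals from [0140]
components = [(0.5, 1.0), (1.5, 1.0), (4.0, 1.0), (6.0, 1.0)]
boosted = boost_intervals(components, intervals, 4.0)
print(boosted)
```

Under that convention a 4dB gain multiplies the amplitude by roughly 1.58; the components at 0.5kHz and 6kHz lie outside every gain interval and are left as they were.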

[0145] Optionally, the gain intervals corresponding to the K peak filters can each correspond to different gain values. Therefore, gain processing is performed on the gain intervals corresponding to the K peak filters using the gain values of the K peak filters, resulting in a gained audio frame. Since no gain processing is performed on the other audio signals in the delayed high-pass audio frame besides the gain intervals, the gained audio frame and the other audio signals in the delayed high-pass audio frame besides the gain intervals are superimposed to obtain the processed high-pass audio frame.

[0146] By employing K peak-type filters to perform gain processing on the gain intervals of the delayed high-pass audio frame, the high-fidelity part of the audio signal in the high-pass audio frame is amplified, enhancing the playback effect of the high-pass audio frame. This further enhances the depth effect between the high-pass audio frame and the low-pass audio frame, improving the stereo sound reproduction effect.

[0147] Figure 7 illustrates the generation process of the processed audio signal. For the left and right channel difference audio frames, they are first superimposed to obtain a superimposed audio frame. Then, the superimposed audio frame is delayed by a first duration to obtain a mono audio frame. For the common audio frame, a high-pass filter and a low-pass filter are used to filter the common audio frame to obtain a high-pass audio frame and a low-pass audio frame, respectively. Then, the high-pass audio frame is delayed by a second duration to obtain a delayed high-pass audio frame. An equalizer composed of K peak-type filters is used to perform equalization processing on the delayed high-pass audio frame to obtain a processed high-pass audio frame. The low-pass audio frame and the processed high-pass audio frame are superimposed to obtain the processed common audio frame. Then, the mono audio frame and the processed common audio frame are superimposed to obtain the processed audio frame. Multiple processed audio frames are spliced together to obtain the processed audio signal.

[0148] In some embodiments, since the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, step 430 includes at least one of substeps 431 to 434.

[0149] Sub-step 431: Perform enhancement processing on the common audio frame to obtain the processed common audio frame.

[0150] For the enhancement processing performed on the common audio frame, refer to the embodiments corresponding to steps A1 to A4 above; the details are not repeated here.

[0151] Sub-step 432: Perform signal cancellation processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain the processed left channel difference audio frame and the processed right channel difference audio frame.

[0152] The left channel difference audio frame can be represented as L′, and the processed left channel difference audio frame can be represented as L″. The right channel difference audio frame can be represented as R′, and the processed right channel difference audio frame can be represented as R″. The signal cancellation processing for the left channel difference audio frame and the right channel difference audio frame is described in the embodiments below and is not detailed here.

[0153] Sub-step 433: Superimpose the processed common audio frame and the processed left channel difference audio frame to obtain the processed left channel audio frame, and superimpose the processed common audio frame and the processed right channel difference audio frame to obtain the processed right channel audio frame.

[0154] For example, the processed left channel audio frame can be represented as Lc = L″ + C′, and the processed right channel audio frame can be represented as Rc = R″ + C′.

[0155] Sub-step 434: splice n processed left channel audio frames to obtain the processed left channel audio signal, and splice n processed right channel audio frames to obtain the processed right channel audio signal.

[0156] The n processed left channel audio frames Lc are spliced to obtain the processed left channel audio signal Lc′, which is output from the first speaker, and the n processed right channel audio frames Rc are spliced to obtain the processed right channel audio signal Rc′, which is output from the second speaker.

[0157] By performing signal cancellation processing on the left and right channel difference signals respectively, the right channel difference signal entering the left channel is canceled, and the left channel difference signal entering the right channel is canceled, thereby reducing signal crosstalk that occurs during the propagation of audio signals and improving the stereo playback effect.

[0158] In some embodiments, sub-step 432 (signal cancellation processing) includes at least one of steps B1 to B3.

[0159] Step B1: Perform phase inversion processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain the left channel phase inverted difference audio frame corresponding to the left channel difference audio frame and the right channel phase inverted difference audio frame corresponding to the right channel difference audio frame.

[0160] A left channel phase-inverted difference audio frame is an audio frame whose phase is opposite to that of the audio signal in the left channel difference audio frame, while other parameters remain unchanged. A right channel phase-inverted difference audio frame is an audio frame whose phase is opposite to that of the audio signal in the right channel difference audio frame, while other parameters remain unchanged. For example, a left channel phase-inverted difference audio frame can be represented as -L′, and a right channel phase-inverted difference audio frame can be represented as -R′, where "-" indicates phase inversion.

[0161] Step B2: Combine the left channel phase-inverted difference audio frame with the gain control factor to obtain the left channel difference audio frame after phase inversion control, and combine the right channel phase-inverted difference audio frame with the gain control factor to obtain the right channel difference audio frame after phase inversion control.

[0162] The gain control factor is used to control the signal strength of the left channel phase-inverted difference audio frame and the right channel phase-inverted difference audio frame. Because the sound played by a speaker travels some distance through the air, the sound signal that reaches the human ear is weaker than the sound signal emitted by the speaker. Therefore, the gain control factor is a value between 0 and 1.

[0163] For example, if the gain control factor is denoted as G, then the left channel difference audio frame after phase inversion control can be denoted as G*(-L′), and the right channel difference audio frame after phase inversion control can be denoted as G*(-R′).

[0164] The value of the gain control factor is related to the physical distance between the first speaker and the second speaker. The greater the physical distance between the first speaker and the second speaker, the smaller the gain control factor; conversely, the smaller the physical distance between the first speaker and the second speaker, the larger the gain control factor. This application does not limit the specific value of the gain control factor; for example, the gain control factor can be set to 0.5.

[0165] Step B3: Superimpose the left channel difference audio frame and the right channel difference audio frame after phase inversion control to obtain the processed left channel difference audio frame, and superimpose the right channel difference audio frame and the left channel difference audio frame after phase inversion control to obtain the processed right channel difference audio frame.

[0166] For example, the processed left channel difference audio frame can be represented as L″=L′+G*(-R′), and the processed right channel difference audio frame can be represented as R″=R′+G*(-L′). Therefore, the processed left channel audio frame can be represented as Lc=L′+G*(-R′)+C′, and the processed right channel audio frame can be represented as Rc=R′+G*(-L′)+C′.
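Steps B1 to B3 and the formulas above can be sketched in a few lines. The sketch assumes that "combining" a frame with the gain control factor means multiplying by G, as the expression G*(-L′) suggests, and that G lies in the closed interval from 0 to 1; the function names and the default G = 0.5 (the example value mentioned above) are illustrative only.

```python
import numpy as np


def cancel_crosstalk(l_diff, r_diff, gain=0.5):
    """Steps B1 to B3: L'' = L' + G*(-R') and R'' = R' + G*(-L')."""
    if not 0.0 <= gain <= 1.0:
        # Assumption: the boundaries 0 and 1 are accepted here.
        raise ValueError("gain control factor G is expected to lie between 0 and 1")
    l_diff = np.asarray(l_diff, dtype=float)
    r_diff = np.asarray(r_diff, dtype=float)
    l_inv, r_inv = -l_diff, -r_diff                # B1: phase inversion
    l_ctrl, r_ctrl = gain * l_inv, gain * r_inv    # B2: apply the gain control factor
    l_out = l_diff + r_ctrl                        # B3: L' plus scaled inverted R'
    r_out = r_diff + l_ctrl                        # B3: R' plus scaled inverted L'
    return l_out, r_out


def second_state_frame(l_diff, r_diff, common_processed, gain=0.5):
    """Sub-step 433: Lc = L'' + C' and Rc = R'' + C'."""
    common_processed = np.asarray(common_processed, dtype=float)
    l_out, r_out = cancel_crosstalk(l_diff, r_diff, gain)
    return l_out + common_processed, r_out + common_processed
```

For instance, with L′ = [1, 0], R′ = [0, 1], and G = 0.5, the sketch yields L″ = [1, -0.5] and R″ = [-0.5, 1]: each channel keeps its own difference signal and carries a half-strength inverted copy of the other.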

[0167] When the first speaker and the second speaker are actually playing audio signals, the processed left channel audio signal also reaches the right ear, and the processed right channel audio signal also reaches the left ear. The audio signal actually entering the right ear can be represented as Rc′+A*Lc′, and the audio signal actually entering the left ear can be represented as Lc′+A*Rc′. The value of A depends on factors such as the distance between the user and the two speakers, the size of the user's head, and the air humidity, so it varies from one listening situation to another and cannot be matched exactly by a preset gain control factor. Therefore, for the audio signal actually entering the right ear, the left channel difference audio frame after phase inversion control cannot completely cancel out the left channel difference audio frame entering the right ear, and for the audio signal actually entering the left ear, the right channel difference audio frame after phase inversion control cannot completely cancel out the right channel difference audio frame entering the left ear.

[0168] Therefore, the technical solution provided in this application can only improve the stereo playback effect by controlling the size of the gain control factor to minimize the impact of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to minimize the impact of the processed left channel audio signal on the playback effect of the processed right channel audio signal.
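The residual crosstalk can be made explicit by substituting Lc = L′ + G*(-R′) + C′ and Rc = R′ + G*(-L′) + C′ into the ear-signal model above. The following worked expansion is a sketch under simplifying assumptions that go beyond the text: it works on a single frame, treats A as one frequency-independent scalar that is the same for both ears, and ignores any propagation delay. The symbols E_R and E_L for the signals at the right and left ear are introduced here for illustration only.

```latex
\begin{aligned}
E_R &= R_c + A\,L_c = (R' - G L' + C') + A\,(L' - G R' + C') \\
    &= (1 - AG)\,R' + (A - G)\,L' + (1 + A)\,C', \\
E_L &= L_c + A\,R_c = (L' - G R' + C') + A\,(R' - G L' + C') \\
    &= (1 - AG)\,L' + (A - G)\,R' + (1 + A)\,C'.
\end{aligned}
```

Under these assumptions, the opposite-channel difference signal reaches each ear with coefficient (A - G), which vanishes only when G equals A. Since A changes with the listener and the environment, a fixed G can reduce this term but generally cannot remove it.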

[0169] Figure 8 illustrates how the processed left channel audio signal and the processed right channel audio signal are generated. Phase inversion is performed on the left channel difference audio frame and the right channel difference audio frame respectively to obtain a left channel phase-inverted difference audio frame and a right channel phase-inverted difference audio frame. Each phase-inverted difference audio frame is then combined with the gain control factor to obtain the left channel difference audio frame after phase inversion control and the right channel difference audio frame after phase inversion control. The left channel difference audio frame is superimposed with the right channel difference audio frame after phase inversion control to obtain the processed left channel difference audio frame, and the right channel difference audio frame is superimposed with the left channel difference audio frame after phase inversion control to obtain the processed right channel difference audio frame. The common audio frame is filtered by a high-pass filter and a low-pass filter to obtain a high-pass audio frame and a low-pass audio frame, respectively. The high-pass audio frame is delayed by a second duration to obtain a delayed high-pass audio frame, and an equalizer composed of K peak-type filters equalizes the delayed high-pass audio frame to obtain a processed high-pass audio frame. The low-pass audio frame and the processed high-pass audio frame are then superimposed to obtain the processed common audio frame. The processed common audio frame and the processed left channel difference audio frame are superimposed to obtain the processed left channel audio frame, and the processed common audio frame and the processed right channel difference audio frame are superimposed to obtain the processed right channel audio frame. Multiple processed left channel audio frames are spliced together to obtain the processed left channel audio signal, and multiple processed right channel audio frames are spliced together to obtain the processed right channel audio signal.

[0170] Please refer to Figure 9, which shows a flowchart of an audio processing method provided in another embodiment of this application. The execution entity of each step of this method can be a terminal device. The method may include at least one of the following steps 910-920:

[0171] Step 910: When the first speaker and the second speaker are located on the same side of the terminal device, acquire the left channel difference signal, the right channel difference signal and the common signal in the audio signal.

[0172] The first speaker and the second speaker are located on the same side of the terminal device. For example, both speakers can be located on any one of the top, bottom, left, or right sides of the terminal device, where the top, bottom, left, and right sides are determined in conjunction with how the terminal device is placed. This application does not limit which side the two speakers are on or their specific positions on that side.

[0173] Step 920: Perform enhancement processing on the common signal to obtain the processed audio signal, and output the processed audio signal to the first speaker and the second speaker respectively.

[0174] Since the first speaker and the second speaker emit the left channel audio signal and the right channel audio signal from the same side of the terminal device, the placement of the terminal device does not affect the audio signal playback effect, and the sound wave propagation path problem shown in Figure 3 does not occur. Therefore, only the problem of stereo sound field disorder needs to be solved.

[0175] For the specific implementation of step 920, refer to the above embodiments; the details are not repeated here.

[0176] By enhancing the common signal, a deeper sound field effect is achieved to highlight the common signal, drawing the user's attention to it and compensating for the lack of stereo sound, thus achieving a stereo sound playback effect.
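The choice of processing across the embodiments can be summarized in a small dispatch sketch. The enum `Layout`, its members, the function `select_processing`, and the returned labels are hypothetical names introduced for illustration; how the terminal device detects its state is not covered here.

```python
from enum import Enum


class Layout(Enum):
    """Speaker arrangement as perceived with the device in its current placement."""
    SAME_SIDE = "same_side"        # both speakers on one side (Figure 9 embodiment)
    UPPER_LOWER = "upper_lower"    # first state: speakers on the upper and lower sides
    LEFT_RIGHT = "left_right"      # second state: speakers on the left and right sides


def select_processing(layout):
    """Return a label for the processing path applied to the difference signals.

    In the second state the difference signals go through signal cancellation
    (the common frame is still enhanced, per sub-step 431); in the other two
    cases only common-signal enhancement is applied and the same processed
    signal is sent to both speakers.
    """
    if layout is Layout.LEFT_RIGHT:
        return "signal_cancellation"
    return "common_enhancement"
```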

[0177] The following are embodiments of the apparatus described in this application, which can be used to execute the embodiments of the method described in this application. For details not disclosed in the apparatus embodiments of this application, please refer to the embodiments of the method described in this application.

[0178] Please refer to Figure 10, which shows a block diagram of an audio processing apparatus provided in one embodiment of this application. This apparatus has the function of implementing the above-described audio processing method; the function can be implemented in hardware or by hardware executing corresponding software. This apparatus can be the terminal device described above, or it can be installed within a terminal device. As shown in Figure 10, the apparatus 1000 may include: a signal acquisition module 1010, a first processing module 1020, and a second processing module 1030.

[0179] The signal acquisition module 1010 is used to acquire a left channel difference signal, a right channel difference signal, and a common signal in an audio signal when the first speaker and the second speaker are respectively located on both sides of the terminal device. The left channel difference signal is a signal that exists alone in the left channel audio signal, the right channel difference signal is a signal that exists alone in the right channel audio signal, and the common signal is a signal that exists together in the left channel audio signal and the right channel audio signal.

[0180] The first processing module 1020 is configured to perform enhancement processing on the common signal when the terminal device is in a first state, to obtain a processed audio signal, and to output the processed audio signal to the first speaker and the second speaker respectively. The enhancement processing is used to improve the sound effect of the common signal in the audio signal. The first state refers to the state in which the first speaker and the second speaker are located on the upper side and the lower side of the terminal device respectively.

[0181] The second processing module 1030 is configured to perform signal cancellation processing on the left channel difference signal and the right channel difference signal respectively when the terminal device is in the second state, to obtain a processed left channel audio signal and a processed right channel audio signal, output the processed left channel audio signal to the first speaker, and output the processed right channel audio signal to the second speaker. The signal cancellation processing is used to reduce the influence of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to reduce the influence of the processed left channel audio signal on the playback effect of the processed right channel audio signal. The second state refers to the state in which the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively.

[0182] In some embodiments, the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, where n is an integer greater than 1; the first processing module 1020 is configured to:

[0183] merge the left channel difference audio frame and the right channel difference audio frame into mono to obtain a mono audio frame;

[0184] perform enhancement processing on the common audio frame to obtain the processed common audio frame;

[0185] superimpose the mono audio frame and the processed common audio frame to obtain the processed audio frame; and

[0186] splice n processed audio frames to obtain the processed audio signal.

[0187] In some embodiments, the first processing module 1020 is configured to:

[0188] superimpose the left channel difference audio frame and the right channel difference audio frame to obtain the superimposed audio frame; and

[0189] delay the superimposed audio frame by a first duration to obtain the mono audio frame.

[0190] In some embodiments, the first processing module 1020 is configured to:

[0191] filter the common audio frame with a high-pass filter and a low-pass filter respectively to obtain a high-pass audio frame and a low-pass audio frame, where the high-pass filter and the low-pass filter are power complementary filters;

[0192] delay the high-pass audio frame by a second duration to obtain the delayed high-pass audio frame;

[0193] perform equalization processing on the delayed high-pass audio frame with an equalizer to obtain the processed high-pass audio frame; and

[0194] superimpose the low-pass audio frame and the processed high-pass audio frame to obtain the processed common audio frame.

[0195] In some embodiments, the equalizer includes K peak-type filters, the angular frequencies corresponding to the K peak-type filters are K frequency points in the frequency range of the delayed high-pass audio frame, and the K frequency points are the first K frequency points obtained by sorting the frequency points in the frequency range of the delayed high-pass audio frame in descending order of fidelity, where K is a positive integer; the first processing module 1020 is configured to:

[0196] obtain the gain intervals corresponding to the K peak-type filters based on the angular frequencies and quality factors corresponding to the K peak-type filters, where the quality factor indicates the bandwidth of a peak-type filter; and

[0197] perform, using the K peak-type filters, gain processing on the gain intervals corresponding to the K peak-type filters respectively with preset gain values, to obtain the processed high-pass audio frame.

[0198] In some embodiments, the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, where n is an integer greater than 1; the second processing module 1030 is configured to:

[0199] perform enhancement processing on the common audio frame to obtain the processed common audio frame;

[0200] perform signal cancellation processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain the processed left channel difference audio frame and the processed right channel difference audio frame;

[0201] superimpose the processed common audio frame and the processed left channel difference audio frame to obtain the processed left channel audio frame, and superimpose the processed common audio frame and the processed right channel difference audio frame to obtain the processed right channel audio frame; and

[0202] splice n processed left channel audio frames to obtain the processed left channel audio signal, and splice n processed right channel audio frames to obtain the processed right channel audio signal.

[0203] In some embodiments, the second processing module 1030 is configured to:

[0204] perform phase inversion processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain the left channel phase-inverted difference audio frame corresponding to the left channel difference audio frame and the right channel phase-inverted difference audio frame corresponding to the right channel difference audio frame;

[0205] combine the left channel phase-inverted difference audio frame with the gain control factor to obtain the left channel difference audio frame after phase inversion control, and combine the right channel phase-inverted difference audio frame with the gain control factor to obtain the right channel difference audio frame after phase inversion control; and

[0206] superimpose the left channel difference audio frame and the right channel difference audio frame after phase inversion control to obtain the processed left channel difference audio frame, and superimpose the right channel difference audio frame and the left channel difference audio frame after phase inversion control to obtain the processed right channel difference audio frame.

[0207] In some embodiments, the first processing module 1020 is configured to:

[0208] when the first speaker and the second speaker are located on the same side of the terminal device, perform enhancement processing on the common signal to obtain the processed audio signal, and output the processed audio signal to the first speaker and the second speaker respectively.

[0209] It should be noted that, when the apparatus provided in the above embodiments implements its functions, the division into the functional modules described above is only an example. In actual applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.

[0210] Please refer to Figure 11, which shows a structural block diagram of a terminal device 1100 provided in one embodiment of this application. The terminal device 1100 can be any electronic device with data computing, processing, and storage functions. The terminal device 1100 can be used to implement the audio processing method provided in the above embodiments.

[0211] Typically, terminal device 1100 includes a processor 1101 and a memory 1102.

[0212] Processor 1101 may include one or more processing cores, such as a quad-core processor, an octa-core processor, etc. Processor 1101 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). Processor 1101 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 1101 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 1101 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0213] The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 1102 are used to store a computer program configured to be executed by one or more processors to implement the above-described audio processing method.

[0214] Those skilled in the art will understand that the structure shown in Figure 11 does not constitute a limitation on the terminal device 1100; the terminal device 1100 may include more or fewer components than shown, combine certain components, or use a different component arrangement.

[0215] In an exemplary embodiment, a computer-readable storage medium is also provided, wherein a computer program is stored in the storage medium, and the computer program implements the above-described audio processing method when executed by the processor of a terminal device. Optionally, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.

[0216] In an exemplary embodiment, a computer program product is also provided, comprising a computer program stored in a computer-readable storage medium. A processor of a terminal device reads the computer program from the computer-readable storage medium and executes the computer program, causing the terminal device to perform the aforementioned audio processing method.

[0217] It should be understood that "multiple" as used herein refers to two or more. "And/or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and/or B can represent: A alone, A and B simultaneously, or B alone. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. Furthermore, the step numbers described herein merely illustrate one possible execution order. In some other embodiments, the steps may not be executed in numerical order; for example, two steps with different numbers may be executed simultaneously, or two steps with different numbers may be executed in the reverse of the illustrated order. This application does not limit this.

[0218] The above description is merely an exemplary embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. An audio processing method, the method being executed by a terminal device, the terminal device including a first speaker and a second speaker, the method comprising: when the first speaker and the second speaker are respectively located on two sides of the terminal device, acquiring a left channel difference signal, a right channel difference signal, and a common signal in an audio signal, wherein the left channel difference signal is a signal that exists alone in the left channel audio signal, the right channel difference signal is a signal that exists alone in the right channel audio signal, and the common signal is a signal that exists together in the left channel audio signal and the right channel audio signal; when the terminal device is in a first state, performing enhancement processing on the common signal to obtain a processed audio signal, and outputting the processed audio signal to the first speaker and the second speaker respectively, wherein the enhancement processing is used to improve the sound effect of the common signal in the audio signal, and the first state refers to the state in which the first speaker and the second speaker are located on the upper side and the lower side of the terminal device respectively; and when the terminal device is in a second state, performing signal cancellation processing on the left channel difference signal and the right channel difference signal respectively to obtain a processed left channel audio signal and a processed right channel audio signal, outputting the processed left channel audio signal to the first speaker, and outputting the processed right channel audio signal to the second speaker, wherein the signal cancellation processing is used to reduce the influence of the processed right channel audio signal on the playback effect of the processed left channel audio signal, and to reduce the influence of the processed left channel audio signal on the playback effect of the processed right channel audio signal, and the second state refers to the state in which the first speaker and the second speaker are located on the left and right sides of the terminal device, respectively.

2. The method according to claim 1, wherein the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, where n is an integer greater than 1; and performing enhancement processing on the common signal to obtain the processed audio signal comprises: merging the left channel difference audio frame and the right channel difference audio frame into mono to obtain a mono audio frame; performing enhancement processing on the common audio frame to obtain a processed common audio frame; superimposing the mono audio frame and the processed common audio frame to obtain a processed audio frame; and splicing n processed audio frames to obtain the processed audio signal.

3. The method according to claim 2, wherein merging the left channel difference audio frame and the right channel difference audio frame into mono to obtain a mono audio frame comprises: superimposing the left channel difference audio frame and the right channel difference audio frame to obtain a superimposed audio frame; and delaying the superimposed audio frame by a first duration to obtain the mono audio frame.

4. The method according to claim 2 or 3, wherein performing enhancement processing on the common audio frame to obtain the processed common audio frame comprises: filtering the common audio frame with a high-pass filter and a low-pass filter respectively to obtain a high-pass audio frame and a low-pass audio frame, wherein the high-pass filter and the low-pass filter are power complementary filters; delaying the high-pass audio frame by a second duration to obtain a delayed high-pass audio frame; performing equalization processing on the delayed high-pass audio frame with an equalizer to obtain a processed high-pass audio frame; and superimposing the low-pass audio frame and the processed high-pass audio frame to obtain the processed common audio frame.

5. The method according to claim 4, wherein the equalizer includes K peak filters, the angular frequencies corresponding to the K peak filters are K frequency points in the frequency range of the delayed high-pass audio frame, and the K frequency points are the first K frequency points obtained by sorting the frequency points in the frequency range of the delayed high-pass audio frame in descending order of fidelity, where K is a positive integer; and performing equalization processing on the delayed high-pass audio frame with an equalizer to obtain a processed high-pass audio frame comprises: obtaining the gain intervals corresponding to the K peak filters based on the angular frequencies and quality factors corresponding to the K peak filters, wherein the quality factor indicates the bandwidth of a peak filter; and performing, using the K peak filters, gain processing on the gain intervals corresponding to the K peak filters respectively with preset gain values, to obtain the processed high-pass audio frame.

6. The method according to any one of claims 1 to 5, wherein the left channel difference signal includes n left channel difference audio frames, the right channel difference signal includes n right channel difference audio frames, and the common signal includes n common audio frames, where n is an integer greater than 1; and performing signal cancellation processing on the left channel difference signal and the right channel difference signal respectively to obtain the processed left channel audio signal and the processed right channel audio signal comprises: performing enhancement processing on the common audio frame to obtain a processed common audio frame; performing signal cancellation processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain a processed left channel difference audio frame and a processed right channel difference audio frame; superimposing the processed common audio frame and the processed left channel difference audio frame to obtain a processed left channel audio frame, and superimposing the processed common audio frame and the processed right channel difference audio frame to obtain a processed right channel audio frame; and splicing n processed left channel audio frames to obtain the processed left channel audio signal, and splicing n processed right channel audio frames to obtain the processed right channel audio signal.

7. The method according to claim 6, wherein performing signal cancellation processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain the processed left channel difference audio frame and the processed right channel difference audio frame comprises: performing phase inversion processing on the left channel difference audio frame and the right channel difference audio frame respectively to obtain a left channel phase-inverted difference audio frame corresponding to the left channel difference audio frame and a right channel phase-inverted difference audio frame corresponding to the right channel difference audio frame; combining the left channel phase-inverted difference audio frame with a gain control factor to obtain a left channel difference audio frame after phase inversion control, and combining the right channel phase-inverted difference audio frame with the gain control factor to obtain a right channel difference audio frame after phase inversion control; and superimposing the left channel difference audio frame and the right channel difference audio frame after phase inversion control to obtain the processed left channel difference audio frame, and superimposing the right channel difference audio frame and the left channel difference audio frame after phase inversion control to obtain the processed right channel difference audio frame.

8. The method according to any one of claims 1 to 7, wherein the method further includes: when the first speaker and the second speaker are located on the same side of the terminal device, performing enhancement processing on the common signal to obtain the processed audio signal, and outputting the processed audio signal to the first speaker and the second speaker respectively.

9. An audio processing apparatus, the apparatus comprising: a signal acquisition module, configured to acquire a left channel difference signal, a right channel difference signal and a common signal in an audio signal when a first speaker and a second speaker are respectively located on two sides of a terminal device, wherein the left channel difference signal is a signal that exists only in the left channel audio signal, the right channel difference signal is a signal that exists only in the right channel audio signal, and the common signal is a signal that exists in both the left channel audio signal and the right channel audio signal; a first processing module, configured to perform enhancement processing on the common signal when the terminal device is in a first state, to obtain a processed audio signal, and to output the processed audio signal to the first speaker and the second speaker respectively, wherein the enhancement processing is used to improve the sound effect of the common signal in the audio signal, and the first state refers to the state in which the first speaker and the second speaker are located on the upper side and the lower side of the terminal device respectively; and a second processing module, configured to perform signal cancellation processing on the left channel difference signal and the right channel difference signal respectively when the terminal device is in a second state, to obtain a processed left channel audio signal and a processed right channel audio signal, to output the processed left channel audio signal to the first speaker, and to output the processed right channel audio signal to the second speaker, wherein the signal cancellation processing is used to reduce the influence of the processed right channel audio signal on the playback effect of the processed left channel audio signal and to reduce the influence of the processed left channel audio signal on the playback effect of the processed right channel audio signal, and the second state refers to the state in which the first speaker and the second speaker are located on the left side and the right side of the terminal device respectively.

10. A terminal device comprising a processor and a memory, the memory storing a computer program, the computer program being loaded and executed by the processor to implement the audio processing method as claimed in any one of claims 1 to 8.

11. A computer-readable storage medium storing a computer program, the computer program being loaded and executed by a processor to implement the audio processing method as claimed in any one of claims 1 to 8.

12. A computer program product comprising a computer program, the computer program being loaded and executed by a processor to implement the audio processing method as claimed in any one of claims 1 to 8.