Audio processing method, apparatus, device, medium, and program product

By setting up one-way and two-way audio processing modes in smart cars and utilizing microphone arrays and noise reduction and echo cancellation technologies, the problem of imperfect sound interaction between the inside and outside of smart cars has been solved, improving the user experience and interaction effect.

CN122245333A · Pending · Publication Date: 2026-06-19 · BEIJING XIAOMI MOBILE SOFTWARE CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING XIAOMI MOBILE SOFTWARE CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-19


Abstract

This disclosure relates to an audio processing method, apparatus, device, medium, and program product, involving fields such as audio processing technology and artificial intelligence technology. The audio processing method includes: determining the operating mode of the device, which includes a first mode and a second mode. The first mode is a transparent mode for unidirectional sound transmission from the outside of the device to the inside, and the second mode is a transparent mode for bidirectional sound interaction between the outside and inside of the device. Based on the audio signal processing method corresponding to the operating mode, an initial audio signal is processed to determine the output signal. By identifying the two operating modes—unidirectional sound transmission and bidirectional sound interaction—and processing the audio inside and outside the device based on different operating modes, users can improve the privacy of the sound inside the device or the perception and interaction performance of the external sound environment by switching between different operating modes, thus enhancing the user experience.

Description

Technical Field

[0001] The present disclosure relates to the fields of audio processing technology, artificial intelligence technology, etc., and particularly relates to an audio processing method, apparatus, device, medium, and program product.

Background Art

[0002] As users' demand for spatial privacy continues to grow, the requirements placed on the supporting audio processing equipment are also gradually increasing in spaces that need internal and external sound interaction, such as intelligent vehicles, smart homes, smart conference rooms, and security monitoring rooms.

Summary of the Invention

[0003] The present disclosure provides an audio processing method, apparatus, device, medium, and program product.

[0004] According to the first aspect of the embodiments of the present disclosure, an audio processing method is provided. The audio processing method includes: determining the working mode of the device, wherein the working mode includes a first mode and a second mode, the first mode is a transparent mode in which sound is transmitted unidirectionally from the outside of the device to the inside of the device, and the second mode is a transparent mode in which sound is exchanged bidirectionally between the outside and the inside of the device; and processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal.

[0005] In this embodiment, two working modes of unidirectional sound transmission and bidirectional sound interaction can be recognized, and the internal and external audio of the device is processed based on different working modes, which is beneficial for users to switch different working modes to improve the privacy inside the device or improve the perception and interaction performance with the outside of the device, and enhance the user experience.

[0006] In some embodiments of the present disclosure, in response to the working mode being the first mode, the corresponding audio signal processing method includes a noise reduction processing method; wherein processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: processing the initial audio signal using the noise reduction processing method to determine the output signal; wherein the initial audio signal is collected by a microphone array outside the device.

[0007] In this embodiment, through noise reduction processing, the probability of the user obtaining the target sound can be increased, and the clarity of the target sound can be improved, thereby enhancing the user experience.

[0008] In some embodiments of the present disclosure, in response to the working mode being the second mode, the corresponding audio signal processing method includes an echo cancellation processing method and a noise reduction processing method; The step of processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: An echo cancellation process is used to eliminate the echo of the initial audio signal; wherein, the initial audio signal includes: an external audio signal acquired by an external microphone array of the device, or an internal audio signal acquired by an internal microphone array of the device; The initial audio signal after echo cancellation is processed using noise reduction techniques to determine the output signal.

[0009] In this embodiment, an echo cancellation mechanism is used to suppress echo interference during two-way voice interaction between the device and the user, reducing the probability of closed-loop echo howling. Furthermore, noise reduction processing is used to reduce the impact of environmental noise. Compared with traditional direct dialogue, this greatly improves call quality and enables clear voice interaction between the device and the user.

[0010] In some embodiments of this disclosure, before processing the initial audio signal after echo cancellation using noise reduction processing, the method further includes: The initial audio signal, after echo cancellation, is subjected to noise addition processing to recover at least a portion of the noise.

[0011] In this embodiment, part of the noise is restored through noise addition to improve listening comfort.
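The noise addition step in [0010] can be illustrated with a minimal sketch. The disclosure does not state how the added noise is generated, so the uniform random noise, the `level` parameter, and the function name `add_comfort_noise` below are assumptions made for illustration only.

```python
import random
from typing import List, Optional, Sequence


def add_comfort_noise(
    signal: Sequence[float],
    level: float = 0.01,          # hypothetical noise amplitude
    seed: Optional[int] = None,   # fixed seed gives reproducible noise
) -> List[float]:
    """Add low-level uniform noise to an echo-cancelled signal.

    Assumption: uniform noise in [-level, level] stands in for whatever
    noise model an actual implementation would use.
    """
    if level < 0.0:
        raise ValueError("level must be non-negative")
    rng = random.Random(seed)
    return [x + rng.uniform(-level, level) for x in signal]
```

With `level=0.0` the signal passes through unchanged, which makes the step easy to disable.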

[0012] In some embodiments of this disclosure, the noise reduction process includes: The input signal is separated into a human voice signal and an ambient sound signal; wherein, when the working mode is the first mode, the input signal is the initial audio signal; or, when the working mode is the second mode, the input signal is the initial audio signal after echo cancellation. Obtain noise reduction parameters; The ratio of the human voice signal to the ambient sound signal is adjusted according to the noise reduction parameters; wherein, the output signal includes the human voice signal and the ambient sound signal after the adjustment.

[0013] In this embodiment, by separating human voice and ambient sound from the sound input signal, and then using noise reduction parameters to readjust the proportion of human voice and ambient sound, purposeful sound optimization adjustment is achieved to meet the user's need to acquire any target sound from human voice or ambient sound, thereby improving the user experience.

[0014] In some embodiments of this disclosure, the noise reduction process further includes: Determine the wind noise of the input signal; Based on the wind noise and the noise reduction parameters, a wind noise suppression gain is determined, which is used to adaptively suppress wind noise for the separated human voice signal and the ambient sound signal, respectively.

[0015] In this embodiment, wind noise suppression gain is used to track and adaptively suppress wind noise, thereby combining noise reduction with wind noise reduction to improve audio quality.

[0016] In some embodiments of this disclosure, the step of suppressing wind noise in the separated human voice signal and the ambient sound signal according to the wind noise includes: determining the signal-to-noise ratio and the energy-weighted signal-to-noise ratio based on the human voice signal and the ambient sound signal; determining, based on the signal-to-noise ratio and the energy-weighted signal-to-noise ratio, the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio; and determining the wind noise suppression gain based on the minimum of the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio.

[0017] In this embodiment, the signal-to-noise ratio and energy-weighted signal-to-noise ratio of the current signal are determined by using the noise-reduced human voice and ambient sound signals to determine whether wind noise affects the output signal. When wind noise has an impact, wind noise suppression is performed, thereby achieving a combination of noise reduction and wind noise reduction to improve the quality of the output signal.
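The minimum-of-two-gains rule in [0016] can be sketched as follows. The disclosure does not specify how a signal-to-noise ratio is mapped to a frame gain, so the linear ramp in `snr_to_frame_gain`, its thresholds, and the gain floor are hypothetical choices; only the final `min` reflects the text above.

```python
def snr_to_frame_gain(
    snr_db: float,
    low_db: float = 0.0,    # hypothetical: at or below this, apply the floor gain
    high_db: float = 20.0,  # hypothetical: at or above this, apply unity gain
    floor: float = 0.1,     # hypothetical minimum gain
) -> float:
    """Assumed mapping: gain rises linearly from `floor` to 1.0 between the thresholds."""
    if snr_db <= low_db:
        return floor
    if snr_db >= high_db:
        return 1.0
    fraction = (snr_db - low_db) / (high_db - low_db)
    return floor + fraction * (1.0 - floor)


def wind_suppression_gain(snr_db: float, weighted_snr_db: float) -> float:
    """Take the smaller of the two frame gains, as described in [0016]."""
    return min(snr_to_frame_gain(snr_db), snr_to_frame_gain(weighted_snr_db))
```

Taking the minimum means that the more pessimistic of the two estimates decides how strongly a frame is attenuated.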

[0018] In some embodiments of this disclosure, the method further includes: Peak limiting processing is performed on the human voice signal after wind noise suppression and the ambient sound signal after wind noise suppression, respectively. The peak limiting processing is used to protect the peak integrity of the signal. The adjustment of the ratio of the human voice signal to the ambient sound signal includes: The ratio of the peak-limited human voice signal and the peak-limited ambient sound signal is adjusted, wherein the output signal is obtained by adjusting the ratio of the peak-limited human voice signal and the ambient sound signal based on the noise reduction parameters.

[0019] In this embodiment, peak limiting processing is used to reduce problems such as audio signal overload or popping, ensuring the integrity of signal transmission and improving the user experience.
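One simple way to picture peak limiting is to scale a frame down whenever its largest sample exceeds a ceiling. The disclosure does not name a specific limiter, so the per-frame scaling approach, the `ceiling` value, and the function name `limit_peak` are assumptions for illustration.

```python
from typing import List, Sequence


def limit_peak(frame: Sequence[float], ceiling: float = 1.0) -> List[float]:
    """Scale the whole frame so that no sample exceeds `ceiling` in magnitude.

    Assumption: uniform scaling is used instead of hard clipping so the
    waveform shape of the frame is preserved.
    """
    if ceiling <= 0.0:
        raise ValueError("ceiling must be positive")
    peak = max((abs(x) for x in frame), default=0.0)
    if peak <= ceiling:
        return list(frame)          # already within range, leave untouched
    scale = ceiling / peak
    return [x * scale for x in frame]
```

In the flow described in [0018], such a function would be applied separately to the human voice signal and to the ambient sound signal before their ratio is adjusted.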

[0020] In some embodiments of this disclosure, the echo cancellation processing method for eliminating the echo of the initial audio signal includes: Obtain the reference signal corresponding to the initial audio signal; The echo of the initial audio signal is eliminated based on the reference signal.

[0021] In this embodiment, the echo of the initial audio signal is eliminated using a reference signal, thereby improving the accuracy and efficiency of echo elimination.
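Reference-based echo cancellation can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a common textbook technique swapped in here for illustration; the disclosure does not state which algorithm is used, and the names `nlms_echo_cancel`, `taps`, `mu`, and `demo_residual` are hypothetical.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(
    mic: Sequence[float],
    reference: Sequence[float],
    taps: int = 4,      # hypothetical filter length
    mu: float = 0.5,    # hypothetical step size, stable for 0 < mu < 2
    eps: float = 1e-8,  # avoids division by zero on silent frames
) -> List[float]:
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def demo_residual(n: int = 600, seed: int = 0) -> float:
    """Synthetic check: noise played through a 2-tap echo path, no near-end talker."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.6 * reference[i] + (0.3 * reference[i - 1] if i else 0.0) for i in range(n)]
    return max(abs(e) for e in nlms_echo_cancel(mic, reference)[-100:])
```

In the synthetic demo the microphone signal contains only echo, so the residual shrinks toward zero once the filter has adapted. In the second mode the reference would be the signal sent to the speaker on the same side as the microphone array.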

[0022] In some embodiments of this disclosure, obtaining the reference signal corresponding to the initial audio signal includes: In the second mode, in response to the initial audio signal being an external audio signal of the device, a reference signal from outside the device is acquired.

[0023] In this embodiment, reference signals inside and outside the device can be acquired in real time, thereby effectively eliminating bidirectional echoes outside the device in different scenarios or environments and improving the accuracy of echo cancellation.

[0024] In some embodiments of this disclosure, obtaining the noise reduction parameters includes: obtaining a first noise reduction parameter corresponding to a user command, wherein the user command includes the noise reduction parameter input by the user; or obtaining a second noise reduction parameter corresponding to a preset scene, wherein the preset scene is a scene identified based on the input signal, and there is a correspondence between the preset scene and the noise reduction parameter.

[0025] In this embodiment, user-defined noise reduction parameters can be implemented, thereby enabling noise reduction processing based on user-defined methods to better meet user needs and improve user experience; or the application scenario of noise reduction can be automatically identified, and noise reduction parameters can be applied to different application scenarios to meet different user needs and improve the flexibility of noise reduction.

[0026] In some embodiments of this disclosure, the preset scenario includes a first scenario and a second scenario, wherein the second noise reduction parameter corresponding to the first scenario is greater than the second noise reduction parameter corresponding to the second scenario, wherein the first scenario is a scenario in which the intensity values of the human voice signal and the ambient sound signal are greater than a first threshold, and the second scenario is a scenario in which the intensity values of the human voice signal and the ambient sound signal are less than a second threshold, wherein the second threshold is less than the first threshold.

[0027] In this embodiment, in the automatic scene recognition, different noise reduction parameters are preset for different scenes to achieve different proportions of human voice signals and ambient sound signals output, thereby improving the user's scene experience.
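The two-scenario rule in [0026] can be sketched as a small lookup. All numeric values below are hypothetical, as is the fallback used when neither scenario applies. The sketch also assumes that both intensity values must cross the threshold, which is one possible reading of the text.

```python
def scene_noise_reduction_parameter(
    voice_level: float,
    ambient_level: float,
    first_threshold: float = 0.6,   # hypothetical
    second_threshold: float = 0.2,  # hypothetical, must be below first_threshold
    loud_alpha: float = 0.8,        # hypothetical parameter for the first scenario
    quiet_alpha: float = 0.2,       # hypothetical parameter for the second scenario
    default_alpha: float = 0.5,     # hypothetical fallback, not in the disclosure
) -> float:
    """Return a noise reduction parameter for the recognized scenario."""
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be less than first_threshold")
    if loud_alpha <= quiet_alpha:
        raise ValueError("first-scenario parameter must exceed second-scenario parameter")
    if voice_level > first_threshold and ambient_level > first_threshold:
        return loud_alpha    # first scenario: both signals are strong
    if voice_level < second_threshold and ambient_level < second_threshold:
        return quiet_alpha   # second scenario: both signals are weak
    return default_alpha
```

The two validity checks mirror the constraints stated in [0026]: the second threshold is lower than the first, and the first scenario receives the larger parameter.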

[0028] In some embodiments of this disclosure, the output signal y satisfies: y = s + (1-α)×n; In the formula, s is the human voice signal, n is the ambient sound signal, and α is the noise reduction parameter, α∈[0,1].

[0029] In this embodiment, the output ratio of ambient sound signals is controlled by noise reduction parameters, eliminating the need to adjust human voice signals and ensuring that relatively critical human voice signals can be completely preserved, thereby improving safety.
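The formula y = s + (1-α)×n can be illustrated sample by sample. The function name `mix_output` and the representation of signals as lists of floats are assumptions made for illustration.

```python
from typing import List, Sequence


def mix_output(voice: Sequence[float], ambient: Sequence[float], alpha: float) -> List[float]:
    """Compute y = s + (1 - alpha) * n for each pair of samples.

    alpha = 1 removes the ambient sound entirely; alpha = 0 keeps all of it.
    The human voice signal is always passed through unscaled.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(voice) != len(ambient):
        raise ValueError("voice and ambient must have the same length")
    return [s + (1.0 - alpha) * n for s, n in zip(voice, ambient)]
```

For example, with s = [1.0, 2.0], n = [0.5, 0.5], and α = 0.5, the output is [1.25, 2.25]: the voice samples are unchanged and half of the ambient sound is retained.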

[0030] In some embodiments of this disclosure, the method further includes: In response to the operating mode being the first mode, the output signal is played using the speaker inside the device.

[0031] In this embodiment, in the one-way transparency mode, the output signal is played by the speaker inside the device, realizing the one-way transparency function of transmitting processed audio from outside the device to inside the device. The sound inside the device will not be transmitted out, thus improving the privacy of the sound protection inside the device and the comprehensiveness of the sound acquisition outside the device.

[0032] In some embodiments of this disclosure, the method further includes: in response to the working mode being the second mode, playing the output signal corresponding to the external audio signal of the device through the speaker inside the device, and playing the output signal corresponding to the internal audio signal of the device through the speaker outside the device.

[0033] In this embodiment, in the two-way interactive transparency mode, the output signal may include the output signal corresponding to the external audio signal of the device and the output signal corresponding to the internal audio signal of the device. During the interaction, the two output signals are played by the corresponding internal or external speakers of the device, which improves the quality of the output audio and enhances the interactive experience.

[0034] According to a second aspect of the present disclosure, an audio processing apparatus is provided, the audio processing apparatus comprising: The determining module is used to determine the working mode of the device. The working mode includes a first mode and a second mode. The first mode is a transparent mode in which sound is transmitted unidirectionally from the outside of the device to the inside of the device, and the second mode is a transparent mode in which sound is transmitted bidirectionally between the outside of the device and the inside of the device. The processing module is used to process the initial audio signal according to the audio signal processing method corresponding to the working mode, and determine the output signal.

[0035] According to a third aspect of the present disclosure, an electronic device is provided, the electronic device comprising: An audio component, configured to input and / or output signals; A processor, which is signal-connected to the audio component; A memory for storing processor-executable instructions, the memory being signal-connected to the processor; The processor is configured to perform the audio processing method as described in the first aspect.

[0036] According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the audio processing method as described in the first aspect.

[0037] According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the audio processing method as described in the first aspect.

[0038] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Attached Figure Description

[0039] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

[0040] Figure 1 is a schematic diagram illustrating the composition of an audio processing system in an intelligent vehicle application scenario, according to an exemplary embodiment.

[0041] Figure 2 is flow chart 1 of an audio processing method according to an exemplary embodiment.

[0042] Figure 3 is flow chart 2 of an audio processing method according to an exemplary embodiment.

[0043] Figure 4 is flow chart 3 of an audio processing method according to an exemplary embodiment.

[0044] Figure 5 is flow chart 4 of an audio processing method according to an exemplary embodiment.

[0045] Figure 6 is flow chart 5 of an audio processing method according to an exemplary embodiment.

[0046] Figure 7 is a schematic diagram of a one-way transparent algorithm framework for an audio processing method for an intelligent vehicle, according to an exemplary embodiment.

[0047] Figure 8 is a schematic diagram of a two-way transparent algorithm framework for an audio processing method for an intelligent vehicle, according to an exemplary embodiment.

[0048] Figure 9 is a block diagram of an audio processing apparatus according to an exemplary embodiment.

[0049] Figure 10 is a block diagram of an electronic device according to an exemplary embodiment.

[0050] Figure 11 is a schematic diagram illustrating a noise reduction process according to an exemplary embodiment.

Detailed Implementation

[0051] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.

[0052] In spaces with internal and external sound interaction needs, such as smart cars, smart homes, smart conference rooms, and security monitoring rooms, users' demands for such interaction are constantly increasing. For example, as users' requirements for the comfort and safety of the in-car environment become increasingly higher, smart cars need to be more intelligent in terms of internal and external sound interaction compared to traditional cars. Traditional cars, while maintaining a quiet interior, isolate external sounds, forcing users to rely on mechanical methods like opening doors or windows to perceive external sounds or interact with the outside world, resulting in a poor user experience. The demands of smart cars for internal and external sound interaction are mainly reflected in the following aspects: First, the interaction range extends from inside the car to outside, such as evolving from single-modal voice control inside the car to external voice control and voice interaction between the inside and outside of the car; second, multimodal fusion interaction, such as evolving from single voice control to fusion interaction of voice and gesture, voice and facial recognition; third, the acoustic experience is upgraded to be personalized and immersive, such as enjoying independent sound zones, adaptive audio adjustment, and advanced noise reduction features inside the car. Therefore, smart cars need to balance the quietness of the car interior with the intelligence of the interaction, but the audio control of internal and external sounds in existing smart cars is not perfect enough, which reduces the user experience.

[0053] Based on this, this disclosure provides an audio processing method, apparatus, device, medium, and program product. By setting a first mode for unidirectional sound transmission and a second mode for bidirectional interactive sound, the acquired initial audio signal can be processed in a corresponding manner according to different working modes. This allows users to switch between different working modes to achieve privacy of sound within the device or full perception of sound outside the device, thereby enhancing the user experience.

[0054] The electronic devices or equipment using the above-described audio processing method can be smart cars, smart homes, smart conference rooms, security monitoring rooms, etc. This disclosure uses a smart car as an example for description; other devices can refer to the implementation methods in the following embodiments.

[0055] Taking an intelligent vehicle as an example, Figure 1 shows the structural composition of an audio processing system in one possible intelligent vehicle application scenario. This audio processing system includes at least the following components: The microphone 11 includes a microphone installed inside the vehicle and a microphone installed outside the vehicle. The microphone inside the vehicle can be a microphone array, such as a microphone array installed on the center console, configured to collect sound from inside the vehicle to generate an initial audio signal (or raw audio signal). The microphone outside the vehicle can be a microphone array, such as an array with a microphone installed on each of the side mirrors and on at least one of the front and rear of the vehicle, configured to collect sound from outside the vehicle to generate an initial audio signal.

[0056] The speaker 12 includes speakers installed inside the vehicle and speakers installed outside the vehicle. The speakers inside the vehicle can be speaker arrays, such as speaker arrays installed on the inside of the vehicle doors, configured to play an output signal generated after processing an initial audio signal from outside the vehicle. The speakers outside the vehicle can also be speaker arrays, such as speaker arrays installed at the front of the vehicle, configured to play an output signal generated after processing an initial audio signal from inside the vehicle.

[0057] The audio processor 13 is configured to receive an initial audio signal collected by a microphone, process the initial audio signal to obtain an output signal, and send the output signal to a speaker.

[0058] In one exemplary embodiment, an audio processing method is provided, applied to a device or electronic device. As shown in Figure 2, the method includes the following steps S110 to S120:

S110. Determine the operating mode of the device.

[0059] In step S110, the device's processor can identify the operating mode. The operating modes include a first mode and a second mode. The first mode is a transparent mode for unidirectional sound transmission from the outside of the device to the inside of the device, and the second mode is a transparent mode for bidirectional sound interaction between the outside of the device and the inside of the device.

[0060] In step S110, the operating mode can be manually adjusted by the user; for example, the user can indicate to the device, based on voice commands, screen touch commands, operation commands, etc., that the selected operating mode is the first mode or the second mode.

[0061] In step S110, the operating mode can be automatically optimized by the system. The device's control system or processor identifies the current scenario of the device and determines the operating mode. For example, if the device is in a parking lot or temporary parking scenario, the operating mode can be the first mode; if the device is in a scenario requiring communication, the operating mode can be the second mode. The current scenario of the device can be identified based on the sound signals collected by the microphone array inside and outside the device.
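Step S110 can be sketched as a small selection function. The scene labels, the rule that a manual selection takes precedence over automatic recognition, and the names `Mode`, `SCENE_TO_MODE`, and `determine_mode` are assumptions; the disclosure describes both manual and automatic determination but does not state how they are combined.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    FIRST = "first"    # one-way transparency: outside -> inside
    SECOND = "second"  # two-way transparency


# Hypothetical scene labels, following the examples in [0061].
SCENE_TO_MODE = {
    "parking_lot": Mode.FIRST,
    "temporary_parking": Mode.FIRST,
    "communication": Mode.SECOND,
}


def determine_mode(user_choice: Optional[Mode] = None, scene: Optional[str] = None) -> Mode:
    """Return the operating mode from a user command or a recognized scene."""
    if user_choice is not None:
        return user_choice              # assumption: manual selection wins
    if scene in SCENE_TO_MODE:
        return SCENE_TO_MODE[scene]     # automatic determination from the scene
    raise ValueError("no user selection and no recognized scene")
```

A call such as `determine_mode(scene="temporary_parking")` returns `Mode.FIRST`, while an explicit user choice overrides the recognized scene.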

[0062] S120. Process the initial audio signal according to the audio signal processing method corresponding to the working mode, and determine the output signal.

[0063] In step S120, the initial audio signal can be the raw audio signal collected by the microphone array inside and outside the device.

[0064] In the first mode (unidirectional), the initial audio signal is the sound signal acquired by the external microphone array of the device.

[0065] In the second mode (bidirectional), in the external-to-internal direction, the initial audio signal is the sound signal collected by the external microphone array of the device; in the internal-to-external direction, the initial audio signal is the sound signal collected by the internal microphone array of the device.

[0066] In step S120, the audio signal processing methods corresponding to different working modes may be different or partially the same, so that the initial audio signal can be processed in the corresponding way under different working modes.

[0067] In step S120, when the operating mode is the first mode, the audio signal processing method corresponding to the first mode may include noise reduction processing. The initial audio signal is processed using the corresponding audio signal processing method to determine the output signal. For example, the audio processing controls the initial audio signal outside the vehicle so that the user can receive the sound outside the vehicle inside the vehicle, but the sound inside the vehicle will not be transmitted to the outside, thus improving the privacy of the vehicle interior.

[0068] In step S120, when the operating mode is the second mode, the audio signal processing method corresponding to the second mode may include noise reduction processing and echo cancellation processing to process the initial audio signal and determine the output signal. For example, audio processing of the initial audio signal from inside the vehicle transmits the sound inside the vehicle to the outside, so that a person outside the vehicle can hear it. At the same time, audio processing of the initial audio signal from outside the vehicle allows the user inside the vehicle to hear the sound outside the vehicle, thus achieving clear interaction between the sound inside and outside the vehicle without opening the doors or windows, improving the convenience and safety of the interaction.
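The mode-dependent processing in step S120 can be sketched as a dispatch function. The `denoise` and `cancel_echo` callables are placeholders for the noise reduction and echo cancellation stages, and the names `Mode`, `process_by_mode`, and the speaker keys are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional

Signal = List[float]
Stage = Callable[[Signal], Signal]


class Mode(Enum):
    FIRST = "first"    # one-way transparency: outside -> inside
    SECOND = "second"  # two-way transparency


def process_by_mode(
    mode: Mode,
    external: Signal,            # collected by the external microphone array
    internal: Optional[Signal],  # collected by the internal microphone array
    denoise: Stage,
    cancel_echo: Stage,
) -> Dict[str, Signal]:
    """Return output signals keyed by the speaker that should play them."""
    if mode is Mode.FIRST:
        # Only noise reduction; nothing from inside is sent outside.
        return {"internal_speaker": denoise(external)}
    if internal is None:
        raise ValueError("the second mode needs an internal audio signal")
    # Second mode: echo cancellation first, then noise reduction, in both directions.
    return {
        "internal_speaker": denoise(cancel_echo(external)),
        "external_speaker": denoise(cancel_echo(internal)),
    }
```

In the first mode the internal signal is ignored, which reflects the privacy property described in [0067].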

[0069] For example, when a user is inside the car and selects the first mode, they can hear sounds outside the vehicle, including pedestrian conversations, passing vehicles, and alarms, without transmitting sounds from inside the car. This preserves the privacy of the in-car environment while providing an immersive experience of ambient sound throughout the entire scene outside the vehicle. It achieves one-way transparent sound transmission from outside to inside, ensuring that external dynamics are captured in real time, thus improving driving and parking safety. When the user selects the second mode, they can engage in seamless voice interaction with the outside world. For example, when talking to pedestrians, asking for directions, or making payments, they can interact clearly without opening doors or windows, unaffected by wind, rain, dust, or other environmental factors, effectively avoiding safety hazards caused by opening windows.

[0070] For example, in a low-speed parking scenario within a residential area, the vehicle is traveling at a low speed (≤5km / h) with the windows closed. The user selects the first mode, or the system determines the first mode by recognizing the external scene. In this mode, the device activates the one-way transmission function, clearly transmitting footsteps, shouts, and children's playful sounds outside the vehicle to avoid collisions due to blind spots. At the same time, because the environment is relatively quiet, the transparency level is adjusted to a medium-high level to preserve details such as birdsong and ambient sounds, enhancing the user's perception.

[0071] For example, in a temporary parking and rest scenario, the user closes the car windows to rest. The user selects the first mode or the system determines the first mode by recognizing the external scene. In this mode, the device activates the Quiet + Environmental Awareness function, and the system transmits low-intensity ambient sounds (such as birdsong in a park or the sound of a breeze) to create an immersive experience. At the same time, if the ambient noise is strong, the user can adjust the noise reduction level to achieve high-intensity noise reduction to filter traffic and crowd noise, ensuring a quiet resting environment and preventing conversations inside the car from being leaked out.

[0072] For example, in temporary roadside communication scenarios (such as asking for directions or picking up a package), after the vehicle stops, users can select the second mode or the system can identify the second mode by recognizing the external scene. In this mode, the device enables two-way interaction, and users do not need to get out of the car or open the window. People inside and outside the car can have a clear conversation. At the same time, based on the intensity of the external ambient sound, roadside traffic noise is filtered to ensure smooth communication.

[0073] For example, the operating mode of the device may also include a third mode, which is configured as a transparency mode that transmits sound unidirectionally from inside the device to the outside.

[0074] By employing the aforementioned audio processing method and setting up two working modes—one-way sound transmission and two-way interactive sound—effective control over audio transmission inside and outside the device can be achieved. This allows users to switch between different working modes to ensure the privacy of sound inside the device or to fully perceive the sound environment outside the device, thereby enhancing the user experience.

[0075] In one exemplary embodiment, an audio processing method is provided, applied to a device or electronic device. As shown in Figure 3, the method includes the following steps S210 and S220:

S210. Determine the operating mode of the device as the first mode.

[0076] In step S210, the first mode is a transparent mode that transmits sound unidirectionally from the outside of the device to the inside of the device. The unidirectionally transmitted audio signal will not have an echo, so only the audio signal itself needs to be noise-reduced. That is, when the working mode of the device is determined to be the first mode, the audio signal processing method corresponding to the first mode is determined to include the noise reduction processing method.

[0077] The method for determining the working mode as the first mode in step S210 can be found in the description of step S110.

[0078] S220: The initial audio signal is processed using noise reduction methods to determine the output signal.

[0079] In step S220, the initial audio signal is acquired by an external microphone array and needs to be transmitted into the vehicle; it may contain various types of noise. A noise reduction process is used to reduce the noise in the initial audio signal acquired by the external microphone array to improve the sound quality of the output signal.

[0080] The following example illustrates an audio signal processing method in the first mode. As shown in Figure 4, the method includes steps S310 to S370 below: S310. Determine the operating mode of the device as the first mode.

[0081] The description of step S310 can be found in step S210 of the above embodiments.

[0082] In the above embodiment, step S220 may include the following steps S320 to S340.

[0083] S320: Separate the input signal into human voice signal and ambient sound signal.

[0084] In step S320, when the working mode is the first mode, the input signal is the initial audio signal, so as to perform noise reduction processing on the acquired initial audio signal.

[0085] For example, Figure 7 shows a unidirectional transparent algorithm framework for audio processing in a smart car. The input signal is collected by a microphone outside the vehicle, and an AI noise reduction model separates the human voice signal from the noise signal (which can also be understood as the ambient sound signal).

[0086] S330. Obtain noise reduction parameters.

[0087] In step S330, the noise reduction parameters are used to adjust the ratio of human voice signal to ambient sound signal after wind noise suppression. The noise reduction parameters can be fixed parameters specified by user instructions, noise reduction parameters automatically identified and matched by the audio processing device based on the input signal, or factory preset values, thereby satisfying both user-defined settings and intelligent matching in non-user-defined situations, and improving the flexibility of noise reduction parameter acquisition.

[0088] As shown in Figure 7, the relative intensity of the human voice signal and the ambient sound signal is dynamically adjusted according to the noise reduction parameters, which the user can adjust independently, to obtain the adjusted output signal.

[0089] The noise reduction parameters in step S330 can be obtained in the following two ways: Method 1: S331. Obtain the first noise reduction parameter corresponding to the user command.

[0090] In step S331, the first noise reduction parameter is the noise reduction parameter corresponding to the user command, which includes the noise reduction parameter input by the user. For example, if the user wants to hear only human voices, adjusting the first noise reduction parameter can filter out all ambient sounds, ensuring that only human voices from outside the device are played. Alternatively, if the user wants to hear human voices while retaining a small portion of ambient sound, they can also adjust the first noise reduction parameter to allow ambient sound to serve as background noise for human voices, improving the comprehensiveness of the user's perception of external sounds and the realism of the auditory experience.

[0091] Method 2: S332, Obtain the second noise reduction parameter corresponding to the preset scene.

[0092] In step S332, the second noise reduction parameter is a noise reduction parameter corresponding to a preset scene. The preset scene is a scene identified based on the input signal, and there is a correspondence between the preset scene and the noise reduction parameter.

[0093] In step S332, the preset scenes include a first scene and a second scene. The first scene is one in which the human voice signal and the ambient sound signal are greater than a first threshold, and the second scene is one in which the human voice signal and the ambient sound signal are less than a second threshold. The second threshold is less than the first threshold, and the noise reduction parameter corresponding to the first scene is greater than the noise reduction parameter corresponding to the second scene. For example, when a noisy environment is detected from the input signal, the current scene is determined to be the first scene. In this case most of the noise needs to be filtered out, so the noise reduction parameter is increased to reduce the ambient sound signal. As another example, when a quiet environment is detected from the input signal, such as a natural environment like a country road, the current scene is determined to be the second scene. In this case the ambient sound can be retained, so the noise reduction parameter is decreased to create an immersive, fully perceptive environment and improve the user experience.
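The scene-to-parameter correspondence described above can be illustrated with a minimal Python sketch. The function name, the level inputs, and every numeric default are hypothetical; the text only requires that the second threshold is below the first and that the first scene maps to the larger noise reduction parameter. The handling of levels that fall between the two thresholds is not specified in the text and is an assumption here.

```python
def select_scene_alpha(voice_level, ambient_level,
                       first_threshold=0.5, second_threshold=0.1,
                       alpha_noisy=0.9, alpha_quiet=0.2, alpha_default=0.5):
    """Return (scene, alpha) for one analysis window.

    All defaults are illustrative assumptions, not values from the source.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be below first_threshold")
    if voice_level > first_threshold and ambient_level > first_threshold:
        # First scene (noisy): larger parameter, most ambient sound is filtered.
        return "first", alpha_noisy
    if voice_level < second_threshold and ambient_level < second_threshold:
        # Second scene (quiet): smaller parameter, ambient sound is retained.
        return "second", alpha_quiet
    # Levels between the thresholds: behaviour assumed, not stated in the text.
    return "unclassified", alpha_default
```

A caller would feed in short-term level estimates of the separated signals and pass the returned parameter on to the ratio adjustment step.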

[0094] Method one for obtaining noise reduction parameters allows users to customize noise reduction parameters, enabling noise reduction processing based on user-defined methods, which better meets user needs and improves user experience. Method two for obtaining noise reduction parameters can automatically identify the application scenario of noise reduction and apply noise reduction parameters according to different application scenarios to meet different user needs and improve the flexibility of noise reduction.

[0095] It should be understood that when the device acquires noise reduction parameters corresponding to user commands, the first noise reduction parameter has a higher priority than the second noise reduction parameter. This ensures that the device prioritizes the user's own adjustments, improving usability. When the user command indicates that the user-adjustable noise reduction parameter function is disabled, the device uses the second noise reduction parameter: it identifies scene information from the acquired input information, such as the sound intensity and/or timbre complexity of the input signal, and then dynamically adjusts the noise reduction parameters based on the identified scene information, improving the level of intelligence. This embodiment can determine the ambient noise intensity from the input signal while supporting personalized pass-through intensity adjustment according to user preferences. Users inside the device are thus undisturbed by noise in noisy scenes and can enjoy an immersive environment in quiet scenes.
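The priority rule between the two parameters can be sketched as follows. This is a minimal illustration under assumptions: the function name, the use of `None` for "no user value", and the clamping to [0, 1] are hypothetical choices, not details from the text.

```python
from typing import Optional


def resolve_alpha(user_alpha: Optional[float], scene_alpha: float,
                  user_adjust_enabled: bool = True) -> float:
    """Pick the noise reduction parameter, giving the user value priority."""
    if user_adjust_enabled and user_alpha is not None:
        alpha = user_alpha      # first noise reduction parameter (user command)
    else:
        alpha = scene_alpha     # second noise reduction parameter (scene-based)
    # Clamp to the valid range of the parameter (assumed safeguard).
    return min(1.0, max(0.0, alpha))
```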

[0096] S340. Adjust the ratio of human voice signal to ambient sound signal according to the noise reduction parameters.

[0097] In step S340, the output signal includes the human voice signal and the ambient sound signal after the ratio is adjusted.

[0098] By adjusting the noise reduction parameters to control the proportions of human voice and ambient sound signals in the output signal—for example, by making the human voice signal greater than the ambient sound signal to make the human voice clearer, or by making the ambient sound signal greater than the human voice signal to make the ambient sound clearer—users can selectively receive human voice or ambient sound according to their own needs.

[0099] For example, the ratio of human voice signal to ambient sound signal is adjusted using the following formula: y = s + (1-α)×n. In the formula, y is the output signal, s is the human voice signal, n is the ambient sound signal, and α is the noise reduction parameter, with α∈[0,1].

[0100] Optionally, when α=0, the output signal y = s + n, meaning the output signal completely retains both the human voice and ambient sound signals. When α=1, the output signal y = s, meaning the output signal is only the human voice signal. When 0 < α < 1, the output signal retains the full human voice signal together with a portion (1-α) of the ambient sound signal. By retaining a portion of the ambient sound signal and adjusting its strength using the noise reduction parameter, the human voice signal is not masked by the ambient sound signal, and the retained ambient sound makes the output audio sound more natural and realistic. By separating human voice and ambient sound in the input sound signal and then readjusting their proportion using the noise reduction parameter, targeted sound optimization can be achieved, meeting the user's need to obtain either human voice or ambient sound as the target sound, thus improving the user experience.
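The formula y = s + (1-α)×n can be applied sample by sample, as in the following sketch. The function name and the list-of-floats representation are assumptions made for illustration.

```python
def mix_voice_and_ambient(voice, ambient, alpha):
    """Apply y = s + (1 - alpha) * n to two equally long sample lists."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(voice) != len(ambient):
        raise ValueError("voice and ambient must have the same length")
    keep = 1.0 - alpha  # share of the ambient sound that is retained
    return [s + keep * n for s, n in zip(voice, ambient)]
```

With α=0 the ambient sound passes through unchanged, and with α=1 only the human voice remains, matching the two boundary cases in the text.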

[0101] S350, Determine the wind noise of the input signal.

[0102] In step S350, wind noise differs from ambient sound and is considered harmful noise that affects audio quality. Suppressing wind noise helps improve the clarity of other sounds. Since wind noise has a unique distribution pattern, it can be extracted using model analysis before suppression. For example, by analyzing time-frequency domain features, the characteristics of wind noise can be determined, and wind noise can be identified from the input signal. This can be achieved, for instance, by using multi-microphone signal correlation analysis and extracting wind noise based on spectral features, thereby enabling accurate detection and extraction of wind noise from the input signal.
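One way to picture the multi-microphone correlation analysis is the sketch below. It relies on an assumption that is common in wind noise detection but not stated in the text: wind turbulence is largely uncorrelated between two spaced microphones, while acoustic sound reaches both in a correlated way. The function names and thresholds are hypothetical, the correlation is taken at zero lag only, and the spectral features also mentioned in the text are omitted.

```python
import math
import random


def correlation(a, b):
    """Zero-lag Pearson correlation of two equally long frames."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0


def detect_wind(mic_a, mic_b, corr_threshold=0.5, energy_floor=1e-4):
    """Flag a frame as windy when it is energetic but weakly correlated."""
    energy = sum(x * x for x in mic_a) / len(mic_a)
    if energy < energy_floor:
        return False  # near-silent frame: nothing to classify
    return correlation(mic_a, mic_b) < corr_threshold


# Synthetic frames: a tone seen by both microphones, and independent noise.
rng = random.Random(0)
tone = [math.sin(2.0 * math.pi * 5.0 * i / 200.0) for i in range(400)]
gust_a = [rng.gauss(0.0, 1.0) for _ in range(400)]
gust_b = [rng.gauss(0.0, 1.0) for _ in range(400)]
```

A practical detector would also compensate for the propagation delay between microphones and restrict the analysis to the low-frequency band where wind noise is concentrated.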

[0103] As shown in Figure 7, signal diagnosis (i.e., wind noise detection) is performed on the mic signal (input signal) collected by the external microphone to determine wind noise.

[0104] S360. Determine the wind noise suppression gain based on wind noise and noise reduction parameters.

[0105] In step S360, the wind noise suppression gain is used to adaptively suppress wind noise on the separated human voice signal and ambient sound signal. Through the linkage of real-time signal-to-noise ratio estimation and wind noise detection, adaptive wind noise suppression is achieved.

[0106] In step S360, the wind noise determined in step S350 can be subtracted from the separated human voice signal and ambient sound signal respectively to achieve wind noise suppression, improve audio quality, and reduce the listening discomfort caused by wind noise.

[0107] As shown in Figure 7, after the AI noise reduction model separates the input signal into a human voice signal and a noise signal (which can also be understood as the ambient sound signal), the determined wind noise is used to apply wind noise suppression to the separated human voice signal and ambient sound signal.

[0108] Optionally, in addition to directly subtracting wind noise, a real-time energy-weighted signal-to-noise ratio estimation method can be combined to achieve adaptive suppression of wind noise intensity. In step S360, the wind noise suppression gain is determined based on the wind noise and noise reduction parameters, including the following steps S361 to S363: S361. Determine the signal-to-noise ratio and energy-weighted signal-to-noise ratio based on the human voice signal and the ambient sound signal.

[0109] In step S361, let k denote the frequency bin index of the human voice signal and the ambient sound signal, and let K denote the number of frequency bins. The human voice power corresponding to the human voice signal and the ambient sound power corresponding to the ambient sound signal are, respectively:

[0110]

[0111] In the formula, the mask term is the denoising mask output by the denoising model (the noise reduction parameters can be converted into a denoising mask using the denoising model), and the signal term is the frequency domain signal collected by the microphone.

[0112] The signal-to-noise ratio is determined from the human voice power and the ambient sound power:

[0113] In the formula, a small constant is included to prevent the denominator from being 0.

[0114] The energy-weighted signal-to-noise ratio is:

[0115] In the formula, the weighting term is the energy weighting factor.
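The formulas for this step are not reproduced in the text, so the sketch below only illustrates the quantities it names. Every functional form is an assumption: the human voice power is taken as the masked spectrum energy, the ambient sound power as the energy of the complementary mask, the signal-to-noise ratio as their ratio in dB, and the energy weighting as a per-bin weight equal to the bin energy raised to a hypothetical exponent `gamma`.

```python
import math

EPS = 1e-12  # small constant that keeps denominators away from 0


def band_powers(mask, magnitudes):
    """Assumed form: voice = sum (M*|X|)^2, ambient = sum ((1-M)*|X|)^2."""
    voice = sum((m * x) ** 2 for m, x in zip(mask, magnitudes))
    ambient = sum(((1.0 - m) * x) ** 2 for m, x in zip(mask, magnitudes))
    return voice, ambient


def snr_db(mask, magnitudes):
    """Frame signal-to-noise ratio in dB (assumed logarithmic form)."""
    voice, ambient = band_powers(mask, magnitudes)
    return 10.0 * math.log10((voice + EPS) / (ambient + EPS))


def weighted_snr_db(mask, magnitudes, gamma=0.5):
    """Per-bin SNR averaged with weights |X|^(2*gamma) (assumed form)."""
    num = 0.0
    den = 0.0
    for m, x in zip(mask, magnitudes):
        weight = (x * x) ** gamma
        voice = (m * x) ** 2
        ambient = ((1.0 - m) * x) ** 2
        num += weight * 10.0 * math.log10((voice + EPS) / (ambient + EPS))
        den += weight
    return num / (den + EPS)
```

Here `mask` holds one value in [0, 1] per frequency bin and `magnitudes` holds the matching magnitudes of the microphone spectrum.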

[0116] Optionally, after the signal-to-noise ratio and the energy-weighted signal-to-noise ratio are determined, both can be smoothed using Minima Controlled Recursive Averaging (MCRA):

[0117]

[0118] The formula involves the signal-to-noise ratio of the previous frame, the signal-to-noise ratio of the current frame, the energy-weighted signal-to-noise ratio of the previous frame, and the energy-weighted signal-to-noise ratio of the current frame. The smoothing factor is adaptively selected based on the following variation range:

[0119] In the formula, the comparison value is the threshold constant.
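Since the smoothing formula itself is missing from the text, the following is only a simplified stand-in: a first-order recursive average whose smoothing factor switches according to how far the current frame deviates from the previous smoothed value. It does not implement the minimum tracking of full MCRA. The function name, the threshold, and both smoothing factors are hypothetical.

```python
def smooth_snr(previous, current, threshold_db=6.0,
               alpha_slow=0.9, alpha_fast=0.5):
    """Recursive averaging with a change-dependent smoothing factor."""
    change = abs(current - previous)
    # Large jumps are tracked quickly, small fluctuations are averaged out.
    alpha = alpha_fast if change > threshold_db else alpha_slow
    return alpha * previous + (1.0 - alpha) * current
```

The same function would be called once per frame for the signal-to-noise ratio and once for the energy-weighted signal-to-noise ratio.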

[0120] S362. Determine the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio based on the signal-to-noise ratio and the energy-weighted signal-to-noise ratio.

[0121] In step S362, the smoothed energy-weighted signal-to-noise ratio (WSNR) and the smoothed signal-to-noise ratio (SNR) are passed through a sigmoid function to obtain two frame-level gains: the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio:

[0122]

[0123] The formula uses four parameters: the steepness parameter and the center point parameter for the energy-weighted signal-to-noise ratio, and the steepness parameter and the center point parameter for the signal-to-noise ratio.

[0124] S363. Determine the wind noise suppression gain based on the minimum of the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio.

[0125] In step S363, the minimum value is taken as the wind noise suppression gain for the current frame:

[0126] When the wind noise detection indicates "no wind", the wind noise suppression gain can be set to 1 (no suppression).

[0127] When the wind noise detection indicates "windy", for example, when the wind noise is greater than the preset intensity, the effective wind noise state eff_wind is 1; otherwise, it is 0.
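Steps S362 and S363 can be sketched together as follows. The sigmoid with a steepness and a center point, the minimum of the two frame gains, and the eff_wind gate come from the text; the exact sigmoid argument, the function names, and all numeric defaults are assumptions.

```python
import math


def sigmoid_gain(value_db, steepness, center_db):
    """Map a smoothed SNR value in dB to a gain in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (value_db - center_db)))


def wind_suppression_gain(snr_db, wsnr_db, wind_level, wind_threshold=0.5,
                          snr_steepness=0.5, snr_center=5.0,
                          wsnr_steepness=0.5, wsnr_center=5.0):
    """Frame gain: 1 when there is no effective wind, else min of two gains."""
    eff_wind = 1 if wind_level > wind_threshold else 0
    if eff_wind == 0:
        return 1.0  # "no wind": do not suppress
    g_snr = sigmoid_gain(snr_db, snr_steepness, snr_center)
    g_wsnr = sigmoid_gain(wsnr_db, wsnr_steepness, wsnr_center)
    return min(g_snr, g_wsnr)
```

Taking the minimum means the more pessimistic of the two estimates decides how strongly the frame is attenuated.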

[0128] Optionally, first-order IIR smoothing can be applied to the frame gain to obtain the smoothed frame gain, with a lower limit applied for protection:

[0129] In the formula, the smoothing term is the frame gain time-domain smoothing coefficient. The smoothed frame gain is used for frame-level modulation of the frequency domain mask of the speech path.

[0130] Frame gain modulation is performed only when the wind noise detection indicates "windy". The mask of the entire frame is uniformly scaled by the smoothed frame gain, and a lower limit is set for the frame gain of the speech path, to determine the frequency domain mask after frame gain modulation:

[0131] The final output is obtained by applying the mask to the microphone frequency domain signal:

[0132] In the formula, the mask is the frequency domain mask after frame gain modulation. The output signal includes both human voice and ambient sound, and the same gain is applied to both within the same audio signal.
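The gain smoothing and mask modulation can be sketched as below. The formulas are not reproduced in the text, so the recursion, the floor, and the element-wise masking are assumed forms, and the names `beta` and `gain_floor` and their defaults are hypothetical.

```python
def smooth_frame_gain(previous_gain, frame_gain, beta=0.8, gain_floor=0.1):
    """First-order IIR smoothing of the frame gain with a lower limit."""
    smoothed = beta * previous_gain + (1.0 - beta) * frame_gain
    return max(gain_floor, smoothed)


def apply_modulated_mask(mask, spectrum, smoothed_gain, windy):
    """Scale the whole frame mask uniformly, then mask the mic spectrum."""
    scale = smoothed_gain if windy else 1.0  # modulate only when windy
    return [scale * m * x for m, x in zip(mask, spectrum)]
```

The `spectrum` entries may be real magnitudes or Python complex values, since the mask only rescales each bin.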

[0133] As shown in Figure 11, the input signal is processed by wind noise detection and by a noise reduction model. Wind noise detection determines the wind noise, and the noise reduction model determines the frequency domain mask. Then, based on the frequency domain mask and the human voice and ambient sound signals in the input signal, signal-to-noise ratio estimates are computed, namely the SNR and the energy-weighted SNR. The estimates are then smoothed, for example by MCRA smoothing. When effective wind noise is detected, such as when the wind noise is greater than a preset intensity, the gain derived from the smoothed estimates is applied to the input signal. This allows the input signal to be dynamically adjusted according to the wind noise intensity, thereby achieving an effective combination of wind noise detection and noise reduction.

[0134] Optionally, before step S360, peak limiting processing can be performed on the human voice signal after wind noise suppression and the ambient sound signal after wind noise suppression, respectively.

[0135] Peak limiting processing is used to protect the peak integrity of the signal. For example, a limiter is used to limit the range of audio peak values and dynamically compress the audio according to the intensity of the input signal, preventing audio distortion caused by an excessive input signal and adapting to intensity changes of the input signal in different scenarios.
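A limiter of the kind described can be sketched as follows. This is a minimal illustration with an instant attack and a gradual release; the function name, the ceiling, and the release coefficient are assumptions, and practical limiters usually add look-ahead.

```python
def limit_peaks(samples, ceiling=0.9, release=0.999):
    """Keep every output sample within +/- ceiling using a time-varying gain."""
    gain = 1.0
    out = []
    for x in samples:
        peak = abs(x)
        # Gain that would bring this sample exactly to the ceiling.
        target = ceiling / peak if peak > ceiling else 1.0
        if target < gain:
            gain = target  # instant attack on an over-range sample
        else:
            # Gradual release back toward the target gain.
            gain = release * gain + (1.0 - release) * target
        out.append(x * gain)
    return out
```

Because the gain never exceeds the target for the current sample, the output stays at or below the ceiling, while quiet passages are passed through unchanged.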

[0136] Accordingly, adjusting the ratio of human voice signal to ambient sound signal includes adjusting the ratio of the peak-limited human voice signal to the peak-limited ambient sound signal. The output signal is obtained by adjusting this ratio based on the noise reduction parameters.

[0137] S370. Play the output signal through the speaker inside the device.

[0138] In step S370, only the speaker inside the device is driven to play, ensuring that the sound inside the device will not be played outside the device, thus improving the spatial privacy in the first mode.

[0139] As shown in Figure 7, the output signal from the external algorithm is sent to the in-vehicle speaker for playback.

[0140] In this embodiment, by setting the first mode, the focus is on the one-way optimized transmission of ambient sound outside the vehicle, supporting immersive environmental perception and improving the user experience.

[0141] In one exemplary embodiment, an audio processing method is provided, applied to a device or electronic device. As shown in Figure 5, the method includes the following steps S410 to S430: S410. Determine the operating mode of the device as the second mode.

[0142] In step S410, the second mode is a transparent mode for bidirectional audio interaction between the outside and inside of the device. In this second mode, sound acquisition and playback are achieved both inside and outside the device. For example, an external microphone collects external sound, and an external speaker plays internal sound; simultaneously, an internal microphone collects internal sound, and an internal speaker plays external sound.

[0143] The method for determining the working mode as the second mode in step S410 can be found in the description of step S110.

[0144] S420. Eliminate the echo of the initial audio signal using echo cancellation processing.

[0145] In step S420, the initial audio signal includes: an external audio signal acquired by an external microphone array, or an internal audio signal acquired by an internal microphone array. Since the second mode performs sound acquisition and playback both inside and outside the device, it is prone to echo phenomena. For example, if the external sound is mixed with the internal sound and then collected by the internal microphone, the external sound will be played back outwards, creating an echo that affects interaction. Furthermore, if the echo is repeatedly collected and played back, it will loop continuously, causing a howling effect and severely impacting normal interaction between the inside and outside of the device. Therefore, when the operating mode is the second mode, in addition to noise reduction processing of the initial audio signal, echo cancellation processing is also required. That is, the audio signal processing methods corresponding to the second mode include echo cancellation and noise reduction processing.
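The correspondence between working mode and processing method can be summarized in a short sketch. The enum, the function name, and the stage labels are hypothetical names used only for illustration; the order of stages follows the text, with echo cancellation before noise reduction in the second mode.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1   # one-way transparent mode: outside of the device to inside
    SECOND = 2  # two-way transparent mode: outside and inside interact


def processing_chain(mode):
    """Return the ordered processing stages for a working mode."""
    if mode is Mode.FIRST:
        # One-way transmission produces no echo, so only noise reduction runs.
        return ["noise_reduction"]
    if mode is Mode.SECOND:
        # Two-way interaction needs echo cancellation before noise reduction.
        return ["echo_cancellation", "noise_reduction"]
    raise ValueError("unsupported mode")
```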

[0146] S430: The initial audio signal after echo cancellation is processed using noise reduction technology to determine the output signal.

[0147] The description of the noise reduction processing method in step S430 can be found in the description of step S220 in the above embodiments, the only difference being that the input signals for step S430 and step S220 are different. In the second mode, the device collects sound from within the device and excludes playback sounds belonging to the external part of the sound from within the device to eliminate echoes. Then, it performs noise reduction processing on the sound with eliminated echoes, thereby outputting clear sound from within the device to the outside. Simultaneously, the device collects sound from outside the device and excludes playback sounds belonging to the internal part of the sound from outside the device to eliminate echoes. Then, it performs noise reduction processing on the sound with eliminated echoes, thereby outputting clear sound from outside the device to the inside.

[0148] Optionally, after step S420 and before step S430, the method further includes the following step S440: S440, Noise is added to the initial audio signal after echo cancellation.

[0149] In step S440, noise addition processing is used to recover at least part of the noise.

[0150] Because echo cancellation processing is used in step S420, the noise of the initial audio signal may be removed along with its echo. Noise addition processing restores part of that noise, for example by adding back part of the noise floor so that the noise level is more consistent before and after echo cancellation. This improves the listening comfort of the audio signal without affecting its clarity.
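Noise addition can be pictured as mixing a low-level synthetic noise floor back into the echo-cancelled signal. The sketch below is an assumption-laden illustration: the text does not say how the noise is generated, so Gaussian noise is used here, and the function names, `restore_ratio`, and the fixed seed are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_comfort_noise(cleaned, noise_floor_rms, restore_ratio=0.5, seed=0):
    """Add back a fraction of the noise floor measured before cancellation."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    level = restore_ratio * noise_floor_rms
    return [x + rng.gauss(0.0, level) for x in cleaned]


# Example: a silent echo-cancelled frame regains half of a 0.01 RMS floor.
silence = [0.0] * 2000
restored = add_comfort_noise(silence, noise_floor_rms=0.01, restore_ratio=0.5)
```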

[0151] The following example illustrates an audio signal processing method in the second mode. As shown in Figure 6, the method includes steps S510 to S592 below: S510. Determine the operating mode of the device as the second mode.

[0152] The description of step S510 can be found in step S410 of the above embodiments.

[0153] Step S420 in the above embodiment may include the following steps S520, S521 and S530: S520: Obtain the reference signal corresponding to the initial audio signal.

[0154] In step S520, the reference signal corresponding to the initial audio signal includes the reference signal corresponding to the initial audio signal outside the device and / or the reference signal corresponding to the initial audio signal inside the device.

[0155] In step S520, the reference signal corresponding to the initial audio signal is determined based on the sound played by the speaker of the audio system. Optionally, an adaptive filtering algorithm is applied to the reference signal to establish a transfer function model between the speaker signal and the echo received by the microphone. For example, a multi-channel adaptive filter is combined with voice activity detection on the external reference signal, and its parameters are adjusted in real time to follow the dynamic environment, the sound played by the speaker, and the echo path, so that echo is cancelled effectively even when people speak simultaneously inside and outside the device. In some embodiments, to avoid excessive noise suppression during echo cancellation that would reduce listening comfort, noise intensity is dynamically monitored and tracked during echo cancellation processing, and the corresponding noise component is adaptively compensated according to the noise level. This makes the audio output more natural and mitigates abrupt changes in timbre and inconsistent listening experience before and after noise cancellation.

[0156] Step S520 may further include step S521: S521. In response to the initial audio signal being an external audio signal of the device, obtain a reference signal from outside the device.

[0157] In step S521, since external audio signals usually contain more types of sounds, it is more difficult to separate human voice signals and ambient sound signals as input signals. Therefore, when the initial audio signal acquired is an external audio signal, the output signal to the outside of the device can be used as a reference signal to the outside of the device. This reference signal can be used to eliminate echo interference from the external audio signal, thereby improving audio processing efficiency and accuracy.

[0158] S530. Based on the reference signal, eliminate the echo of the initial audio signal.

[0159] In step S530, based on an accurate reference signal, the echo along the path from the speaker inside the device to the microphone outside the device and the echo along the path from the speaker outside the device to the microphone inside the device can be eliminated simultaneously. This removes harmful noise in both directions, prevents howling, and ensures that echoes are still effectively eliminated when people speak simultaneously inside and outside the device.
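Reference-based echo cancellation is commonly built on an adaptive filter. The sketch below uses a single-channel normalized LMS (NLMS) filter as a stand-in; the text mentions a multi-channel adaptive filter with voice activity detection, which this simplified version does not include. The function name, tap count, and step size are assumptions.

```python
import random


def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # echo-cancelled output sample
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        out.append(error)
    return out


# Synthetic check: the mic hears only a delayed, attenuated copy of playback.
rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * x for x in far_end[:-2]]
residual = nlms_echo_cancel(mic, far_end)
```

In the second mode one such canceller would run per direction, each using the signal sent to the speaker on its own side as the reference. When both sides talk at once, adaptation is normally slowed or frozen, which is the role voice activity detection plays in the text.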

[0160] Figure 8 illustrates a bidirectional transparency algorithm framework for audio processing in a smart car. The system collects mic signals from the in-vehicle microphones to determine the initial audio signal inside the vehicle. A reference signal (ref signal) is then determined using the output signal played by the in-vehicle speakers. Echo cancellation is then performed on the initial audio signal inside the vehicle based on the reference signal. Similarly, the system collects mic signals from the external microphones to determine the initial audio signal outside the vehicle. A reference signal (ref signal) is then determined using the output signal played by the external speakers. Echo cancellation is then performed on the initial audio signal outside the vehicle based on the external reference signal.

[0161] In this embodiment, step S430 may include the following steps S540 to S580: S540. Separate the input signal into a human voice signal and an ambient sound signal.

[0162] As shown in Figure 8, the echo-cancelled input signal is separated into a human voice signal (human voice) and an ambient sound signal (noise) by the AI noise reduction model.

[0163] S550, obtain noise reduction parameters.

[0164] The method for obtaining the noise reduction parameters in step S550 also includes the following methods S551 and S552: S551, Obtain the first noise reduction parameter corresponding to the user command.

[0165] S552, Obtain the second noise reduction parameter corresponding to the preset scene.

[0166] S560: Adjust the ratio of human voice signal to ambient sound signal according to the noise reduction parameters.

[0167] S570, Determine the wind noise of the input signal.

[0168] As shown in Figure 8, the wind noise of the input signal inside the vehicle is determined from the mic signal collected by the microphone inside the vehicle, and the wind noise of the input signal outside the vehicle is determined from the mic signal collected by the microphone outside the vehicle.

[0169] S580. Determine the wind noise suppression gain based on wind noise and noise reduction parameters.

[0170] As shown in Figure 8, wind noise suppression is applied to the human voice signal and the ambient sound signal separated from the in-vehicle input signal, and likewise to the human voice signal and the ambient sound signal separated from the outside input signal.

[0171] For a detailed description of steps S540 to S580, please refer to the description of steps S320 to S360 in the above embodiment. The only difference is that the input signal is different. In this embodiment, the input signal is the internal audio signal of the device after echo cancellation, or the external audio signal of the device after echo cancellation.

[0172] S591. The output signal corresponding to the external audio signal of the device is played by the internal speaker of the device.

[0173] In step S591, the device sends the output signal corresponding to the external audio signal, after echo cancellation and noise reduction processing, to the speaker inside the device, so that the external sound can be heard clearly inside the device according to actual needs.

[0174] As shown in Figure 8, the output signal from the external algorithm is sent to the in-vehicle speaker for playback.

[0175] S592. The output signal corresponding to the internal audio signal of the device is played through an external speaker.

[0176] In step S592, the device sends the output signal corresponding to the internal audio signal, after echo cancellation and noise reduction processing, to the external speaker, so that the sound from inside the device can be heard clearly outside the device.

[0177] As shown in Figure 8, the output signal from the in-vehicle algorithm is sent to the external speaker for playback.

[0178] In steps S591 and S592, by distinguishing between playback from speakers inside and outside the device, the accuracy of sound transmission direction is ensured, interference during call interaction is reduced, and call clarity is improved. It should be understood that steps S591 and S592 are not sequential; both begin playback output in response to receiving the corresponding output signal, ensuring real-time interaction.

[0179] In this embodiment, the quality of voice interaction inside and outside the vehicle is optimized by setting the second mode to ensure the clarity of two-way calls.

[0180] In summary, the audio processing method disclosed herein meets the user's need to perceive the external environment of the vehicle without opening the door or window in inconvenient or unsafe scenarios, and can activate the interaction between the inside and outside of the vehicle when needed, thus achieving a balance between safety, privacy and quietness, perception of the external environment of the device, and human-to-human interaction between the inside and outside of the device for smart cars and other devices.

[0181] In one exemplary embodiment, an audio processing apparatus is provided. As shown in Figure 9, the audio processing apparatus 20 includes: a determination module 21, used to determine the working mode of the device, where the working mode includes a first mode and a second mode, the first mode is a transparent mode that transmits sound unidirectionally from the outside of the device to the inside of the device, and the second mode is a transparent mode that allows two-way interaction of sound between the outside of the device and the inside of the device; and a processing module 22, used to process the initial audio signal according to the audio signal processing method corresponding to the working mode, and determine the output signal.

[0182] In this embodiment, by setting two working modes—one-way sound transmission and two-way interactive sound—effective control over audio transmission inside and outside the device can be achieved. This allows users to switch between different working modes to achieve privacy of sound inside the device or full perception of the sound environment outside the device, thereby enhancing the user experience.

[0183] In one embodiment, in response to the working mode being the first mode, the corresponding audio signal processing method includes a noise reduction processing method, and the processing module 22 is further configured to process the initial audio signal using the noise reduction processing method to determine the output signal, where the initial audio signal is acquired by an external microphone array.

[0184] In one embodiment, in response to the working mode being the second mode, the corresponding audio signal processing method includes an echo cancellation processing method and a noise reduction processing method, and the processing module 22 is further configured to: eliminate the echo of the initial audio signal using the echo cancellation processing method, where the initial audio signal includes an external audio signal acquired by the external microphone array of the device, or an internal audio signal acquired by the internal microphone array of the device; and process the initial audio signal after echo cancellation using the noise reduction processing method to determine the output signal.

[0185] In one embodiment, the processing module 22 is further configured to: perform noise addition processing on the echo-cancelled initial audio signal to recover at least part of the noise.

[0186] In one embodiment, the processing module 22 is further configured to: separate the input signal into a human voice signal and an ambient sound signal, wherein, when the working mode is the first mode, the input signal is the initial audio signal, or, when the working mode is the second mode, the input signal is the initial audio signal after echo cancellation; obtain noise reduction parameters; and adjust the ratio of the human voice signal to the ambient sound signal based on the noise reduction parameters, wherein the output signal includes the adjusted human voice signal and the adjusted ambient sound signal.

[0187] In one embodiment, the processing module 22 is further configured to: determine the wind noise of the input signal; and determine a wind noise suppression gain based on the wind noise and the noise reduction parameters, wherein the wind noise suppression gain is used to suppress wind noise in the separated human voice signal and ambient sound signal.

[0188] In one embodiment, the processing module 22 is further configured to: determine the signal-to-noise ratio (SNR) and the energy-weighted SNR based on the human voice signal and the ambient sound signal; determine the frame gain of the SNR and the frame gain of the energy-weighted SNR based on the SNR and the energy-weighted SNR; and determine the wind noise suppression gain based on the minimum of the frame gain of the SNR and the frame gain of the energy-weighted SNR.
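
A minimal sketch of the "minimum of two frame gains" rule is given below. The disclosure does not define how the energy-weighted SNR or the frame gains are computed, so everything beyond the final `min` is an assumption made for illustration: the inputs are per-band energies of the separated signals, the energy-weighted SNR is taken as the per-band SNR weighted by each band's share of the total energy, and each SNR is mapped to a gain with a Wiener-like rule `snr / (1 + snr)`. The function names are hypothetical.

```python
from typing import Sequence


def _wiener_like_gain(snr: float) -> float:
    """Map a linear SNR to a gain in [0, 1). Assumed mapping, not from the source."""
    return snr / (1.0 + snr)


def wind_noise_suppression_gain(voice_bands: Sequence[float],
                                ambient_bands: Sequence[float]) -> float:
    """Per-frame gain from band energies of the voice and ambient signals."""
    if not voice_bands or len(voice_bands) != len(ambient_bands):
        raise ValueError("band energy lists must be non-empty and equal in length")
    eps = 1e-12  # guards against division by zero

    # Plain SNR: total voice energy over total ambient energy.
    snr = sum(voice_bands) / (sum(ambient_bands) + eps)

    # Energy-weighted SNR (assumed definition): per-band SNR weighted by
    # the band's share of the total frame energy.
    total = sum(voice_bands) + sum(ambient_bands) + eps
    ew_snr = sum(((v + n) / total) * (v / (n + eps))
                 for v, n in zip(voice_bands, ambient_bands))

    # The rule stated in the text: take the smaller of the two frame gains.
    return min(_wiener_like_gain(snr), _wiener_like_gain(ew_snr))
```

Taking the minimum means the more pessimistic of the two estimates decides how strongly the frame is attenuated.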

[0189] In one embodiment, the processing module 22 is further configured to: perform peak limiting processing separately on the human voice signal after wind noise suppression and on the ambient sound signal after wind noise suppression, wherein the peak limiting processing is used to protect the peak integrity of the signal. In this case, adjusting the ratio of the human voice signal to the ambient sound signal includes: adjusting the ratio of the peak-limited human voice signal to the peak-limited ambient sound signal based on the noise reduction parameters to obtain the output signal.
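
As an illustration of the peak limiting step, the sketch below applies the simplest possible limiter, a hard clamp to a ceiling, to one signal at a time. The disclosure does not specify the limiter design; practical limiters usually apply a smoothed gain rather than clipping, so the function `limit_peaks` and its `ceiling` parameter are assumptions for illustration only.

```python
from typing import List, Sequence


def limit_peaks(signal: Sequence[float], ceiling: float = 1.0) -> List[float]:
    """Clamp every sample to the range [-ceiling, ceiling]."""
    if ceiling <= 0.0:
        raise ValueError("ceiling must be positive")
    return [max(-ceiling, min(ceiling, x)) for x in signal]
```

In the flow described above, this function would be called once on the wind-suppressed human voice signal and once on the wind-suppressed ambient sound signal before the two are mixed.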

[0190] In one embodiment, the processing module 22 is further configured to: obtain the reference signal corresponding to the initial audio signal; and eliminate the echo of the initial audio signal based on the reference signal.

[0191] In one embodiment, the processing module 22 is further configured to: in the second mode, in response to the initial audio signal being an external audio signal of the device, acquire a reference signal from outside the device.
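
Reference-based echo cancellation is commonly implemented with an adaptive filter. The disclosure does not name a specific algorithm, so the sketch below substitutes a normalized least mean squares (NLMS) filter as one plausible choice. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical; `ref` stands for the reference signal and `mic` for the initial audio signal that contains its echo.

```python
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 4, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `ref` from `mic`."""
    weights = [0.0] * taps   # estimated echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual: List[float] = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # echo-cancelled sample
        norm = sum(h * h for h in history) + eps   # reference energy
        # NLMS update: step along the reference, scaled by its energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

When the microphone signal contains only a delayed and attenuated copy of the reference, the residual decays toward zero as the filter converges; any near-end sound that is uncorrelated with the reference remains in the residual.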

[0192] In one embodiment, the processing module 22 is further configured to: obtain a first noise reduction parameter corresponding to a user command; or obtain a second noise reduction parameter corresponding to a preset scene, wherein the preset scene is a scene identified based on the input signal, and there is a correspondence between the preset scene and the noise reduction parameter.

[0193] In one embodiment, the preset scene includes a first scene and a second scene. The second noise reduction parameter corresponding to the first scene is greater than the second noise reduction parameter corresponding to the second scene. The first scene is a scene in which the intensity values of the human voice signal and the ambient sound signal are greater than a first threshold, and the second scene is a scene in which the intensity values of the human voice signal and the ambient sound signal are less than a second threshold. The second threshold is less than the first threshold.
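
The selection of the noise reduction parameter can be sketched as a small lookup. All numeric values below (thresholds, per-scene parameters, and the fallback) are hypothetical placeholders; the disclosure only requires that the second threshold be less than the first and that the first scene map to the larger parameter. Giving the user command priority over scene detection, and using a fallback value when neither scene matches, are also assumptions.

```python
from typing import Optional

FIRST_THRESHOLD = 0.6    # hypothetical intensity threshold for the first scene
SECOND_THRESHOLD = 0.3   # hypothetical; must be less than FIRST_THRESHOLD
SCENE_PARAMS = {"first": 0.8, "second": 0.2}  # first scene > second scene
FALLBACK_PARAM = 0.5     # assumed value when no preset scene matches


def noise_reduction_parameter(voice_level: float, ambient_level: float,
                              user_value: Optional[float] = None) -> float:
    """Return the noise reduction parameter in [0, 1]."""
    if user_value is not None:
        # First noise reduction parameter: taken from the user command.
        if not 0.0 <= user_value <= 1.0:
            raise ValueError("user_value must lie in [0, 1]")
        return user_value
    # Second noise reduction parameter: looked up from the identified scene.
    if voice_level > FIRST_THRESHOLD and ambient_level > FIRST_THRESHOLD:
        return SCENE_PARAMS["first"]
    if voice_level < SECOND_THRESHOLD and ambient_level < SECOND_THRESHOLD:
        return SCENE_PARAMS["second"]
    return FALLBACK_PARAM
```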

[0194] In one embodiment, the processing module 22 is further configured to adjust the ratio of the human voice signal to the ambient sound signal using the following formula: y = s + (1-α)×n, where y is the output signal, s is the human voice signal, n is the ambient sound signal, and α is the noise reduction parameter, α∈[0,1].
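
The formula can be applied sample by sample, as in the sketch below. With α = 1 the ambient sound is removed entirely and only the human voice remains; with α = 0 the ambient sound is passed through unattenuated. The function name `mix_voice_and_ambient` is hypothetical, and the signals are assumed to be equal-length lists of samples.

```python
from typing import List, Sequence


def mix_voice_and_ambient(voice: Sequence[float], ambient: Sequence[float],
                          alpha: float) -> List[float]:
    """Compute y = s + (1 - alpha) * n for each pair of samples."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(voice) != len(ambient):
        raise ValueError("voice and ambient must have the same length")
    return [s + (1.0 - alpha) * n for s, n in zip(voice, ambient)]
```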

[0195] In one embodiment, the processing module 22 is further configured to: in response to the working mode being the first mode, play the output signal through the internal speaker of the device.

[0196] In one embodiment, the processing module 22 is further configured to: in response to the working mode being the second mode, play the output signal corresponding to the external audio signal of the device through the internal speaker of the device, and play the output signal corresponding to the internal audio signal of the device through the external speaker of the device.
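
The playback routing in the two modes can be summarized by the sketch below. The string labels and the function name `select_speaker` are hypothetical. Rejecting an internally captured signal in the first mode is an assumption that follows from the first mode being one-way, from the outside to the inside.

```python
def select_speaker(mode: str, source: str) -> str:
    """Pick the speaker for an output signal.

    mode:   "first" (one-way) or "second" (two-way)
    source: "external" or "internal", i.e. which microphone array
            captured the initial audio signal
    """
    if mode not in ("first", "second"):
        raise ValueError(f"unknown mode: {mode}")
    if source == "external":
        # Outside sound is played inside the device in both modes.
        return "internal_speaker"
    if source == "internal":
        if mode == "second":
            # Inside sound is played outside only in the two-way mode.
            return "external_speaker"
        raise ValueError("the first mode does not transmit internal sound outward")
    raise ValueError(f"unknown source: {source}")
```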

[0197] In one exemplary embodiment, an electronic device is provided, which may include, for example, a mobile phone, a tablet computer, a camera, or other smart device, and is capable of performing the audio processing method described above.

[0198] Referring to Figure 10, the electronic device may include one or more of the following components: processing component 101, memory 102, power component 103, multimedia component 104, audio component 105, input/output (I/O) interface 106, sensor component 107, and communication component 108.

[0199] Processing component 101 typically controls the overall operation of an electronic device, such as operations associated with display, telephone calls, data communication, camera operation, and recording. Processing component 101 may include one or more processors 109 to execute instructions to perform all or part of the steps of the methods described above. Furthermore, processing component 101 may include one or more modules to facilitate interaction between processing component 101 and other components. For example, processing component 101 may include a multimedia module to facilitate interaction between multimedia component 104 and processing component 101.

[0200] Memory 102 is configured to store various types of data to support the operation of the electronic device. Examples of such data include instructions for any application or method used to operate on the electronic device, contact data, phonebook data, messages, pictures, videos, etc. Memory 102 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0201] Power component 103 provides power to various components of the electronic device. Power component 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device.

[0202] Multimedia component 104 includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of touch or swipe actions but also the duration and pressure associated with the touch or swipe operation. In some embodiments, multimedia component 104 includes a front-facing camera and/or a rear-facing camera. When the electronic device is in an operating mode, such as a shooting mode or a video mode, the front-facing camera and/or the rear-facing camera may receive external multimedia data. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

[0203] Audio component 105 is configured to output and/or input audio signals. For example, audio component 105 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operating mode, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 102 or transmitted via communication component 108. In some embodiments, audio component 105 also includes a speaker for outputting audio signals.

[0204] I/O interface 106 provides an interface between processing component 101 and peripheral interface modules, such as keyboards, click wheels, and buttons. These buttons may include, but are not limited to, home buttons, volume buttons, power buttons, and lock buttons.

[0205] Sensor component 107 includes one or more sensors for providing state assessments of various aspects of the electronic device. For example, sensor component 107 can detect the on/off state of the electronic device, the relative positioning of components such as the display and keypad of the electronic device, changes in the position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and temperature changes of the electronic device. Sensor component 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. Sensor component 107 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, sensor component 107 may also include an accelerometer, a gyroscope, a magnetometer, a pressure sensor, or a temperature sensor.

[0206] Communication component 108 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device can access wireless networks based on communication standards, such as WiFi, 2G, or 3G, or combinations thereof. In one exemplary embodiment, communication component 108 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communication component 108 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

[0207] In an exemplary embodiment, the electronic device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the audio processing method applied to the electronic device described above.

[0208] In one exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 102 including instructions, which can be executed by a processor 109 of an electronic device to perform the audio processing method applied to the electronic device described above. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc. When the instructions in the storage medium are executed by the processor 109 of the electronic device, the electronic device is able to perform the audio processing method shown in the above embodiments.

[0209] In one exemplary embodiment, a computer program product is also provided, including a computer program that, when executed by processor 109, implements the audio processing method shown in the above embodiments.

[0210] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of the invention are indicated by the following claims.

[0211] It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. An audio processing method, characterized in that, The audio processing method includes: The working mode of the device is determined, which includes a first mode and a second mode. The first mode is a transparent mode in which sound is transmitted unidirectionally from the outside of the device to the inside of the device, and the second mode is a transparent mode in which sound is transmitted bidirectionally between the outside of the device and the inside of the device. The initial audio signal is processed according to the audio signal processing method corresponding to the working mode to determine the output signal.

2. The audio processing method according to claim 1, characterized in that, In response to the operating mode being the first mode, the corresponding audio signal processing method includes noise reduction processing. The step of processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: The initial audio signal is processed using noise reduction techniques to determine the output signal; wherein the initial audio signal is acquired by an external microphone array of the device.

3. The audio processing method according to claim 1, characterized in that, In response to the second working mode, the corresponding audio signal processing methods include echo cancellation processing and noise reduction processing. The step of processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: An echo cancellation process is used to eliminate the echo of the initial audio signal; wherein, the initial audio signal includes: an external audio signal acquired by an external microphone array of the device, or an internal audio signal acquired by an internal microphone array of the device; The initial audio signal after echo cancellation is processed using noise reduction techniques to determine the output signal.

4. The audio processing method according to claim 3, characterized in that, Before processing the initial audio signal after echo cancellation using noise reduction processing, the method further includes: The initial audio signal, after echo cancellation, is subjected to noise addition processing to recover at least a portion of the noise.

5. The audio processing method according to claim 2 or 3, characterized in that, The noise reduction process includes: The input signal is separated into a human voice signal and an ambient sound signal; wherein, when the working mode is the first mode, the input signal is the initial audio signal; or, when the working mode is the second mode, the input signal is the initial audio signal after echo cancellation. Obtain noise reduction parameters; The ratio of the human voice signal to the ambient sound signal is adjusted according to the noise reduction parameters; wherein, the output signal includes the human voice signal and the ambient sound signal after the adjustment.

6. The audio processing method according to claim 5, characterized in that, The noise reduction process also includes: Determine the wind noise of the input signal; Based on the wind noise and the noise reduction parameters, a wind noise suppression gain is determined, which is used to adaptively suppress wind noise on the separated human voice signal and the ambient sound signal.

7. The audio processing method according to claim 6, characterized in that, The step of determining the wind noise suppression gain based on the wind noise and the noise reduction parameters includes: Based on the human voice signal and the ambient sound signal, determine the signal-to-noise ratio and the energy-weighted signal-to-noise ratio; Based on the signal-to-noise ratio and the energy-weighted signal-to-noise ratio, determine the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio; The wind noise suppression gain is determined based on the minimum of the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio.

8. The audio processing method according to claim 6, characterized in that, The method further includes: Peak limiting processing is performed on the human voice signal after wind noise suppression and the ambient sound signal after wind noise suppression, respectively. The peak limiting processing is used to protect the peak integrity of the signal. The adjustment of the ratio of the human voice signal to the ambient sound signal includes: The ratio of the peak-limited human voice signal and the peak-limited ambient sound signal is adjusted, wherein the output signal is obtained by adjusting the ratio of the peak-limited human voice signal and the ambient sound signal based on the noise reduction parameters.

9. The audio processing method according to claim 5, characterized in that the acquisition of noise reduction parameters includes: obtaining a first noise reduction parameter corresponding to a user command, wherein the user command includes the noise reduction parameter input by the user; or obtaining a second noise reduction parameter corresponding to a preset scene, wherein the preset scene is a scene identified based on the input signal, and there is a correspondence between the preset scene and the noise reduction parameter.

10. The audio processing method according to claim 9, characterized in that the preset scene includes a first scene and a second scene. The second noise reduction parameter corresponding to the first scene is greater than the second noise reduction parameter corresponding to the second scene. The first scene is a scene in which the intensity values of the human voice signal and the ambient sound signal are greater than a first threshold, and the second scene is a scene in which the intensity values of the human voice signal and the ambient sound signal are less than a second threshold. The second threshold is less than the first threshold.

11. The audio processing method according to claim 3, characterized in that, The echo cancellation processing method, used to eliminate the echo of the initial audio signal, includes: Obtain the reference signal corresponding to the initial audio signal; The echo of the initial audio signal is eliminated based on the reference signal.

12. The audio processing method according to claim 11, characterized in that, The step of obtaining the reference signal corresponding to the initial audio signal includes: In the second mode, in response to the initial audio signal being an external audio signal of the device, a reference signal from outside the device is acquired.

13. The audio processing method according to claim 2, characterized in that, The method further includes: In response to the operating mode being the first mode, the output signal is played using the speaker inside the device.

14. The audio processing method according to claim 3, characterized in that the method further includes: in response to the working mode being the second mode, playing the output signal corresponding to the external audio signal of the device through the internal speaker of the device, and playing the output signal corresponding to the internal audio signal of the device through the external speaker of the device.

15. An audio processing apparatus, characterized in that, The audio processing device includes: The determining module is used to determine the working mode of the device. The working mode includes a first mode and a second mode. The first mode is a transparent mode in which sound is transmitted unidirectionally from the outside of the device to the inside of the device, and the second mode is a transparent mode in which sound is transmitted bidirectionally between the outside of the device and the inside of the device. The processing module is used to process the initial audio signal according to the audio signal processing method corresponding to the working mode, and determine the output signal.

16. The audio processing apparatus according to claim 15, characterized in that, The processing module is also used for: In response to the operating mode being the first mode, the corresponding audio signal processing method includes noise reduction processing. The step of processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: The initial audio signal is processed using noise reduction techniques to determine the output signal; The initial audio signal is acquired by an external microphone array of the device.

17. The audio processing apparatus according to claim 15, characterized in that, The processing module is also used for: In response to the second working mode, the corresponding audio signal processing methods include echo cancellation processing and noise reduction processing. The step of processing the initial audio signal according to the audio signal processing method corresponding to the working mode to determine the output signal includes: An echo cancellation process is used to eliminate the echo of the initial audio signal; wherein, the initial audio signal includes: an external audio signal acquired by an external microphone array of the device, or an internal audio signal acquired by an internal microphone array of the device; The initial audio signal after echo cancellation is processed using noise reduction techniques to determine the output signal.

18. The audio processing apparatus according to claim 17, characterized in that, The processing module is also used for: The initial audio signal, after echo cancellation, is subjected to noise addition processing to recover at least a portion of the noise.

19. The audio processing apparatus according to claim 16 or 17, characterized in that, The processing module is also used for: The input signal is separated into a human voice signal and an ambient sound signal; wherein, when the working mode is the first mode, the input signal is the initial audio signal; or, when the working mode is the second mode, the input signal is the initial audio signal after echo cancellation. Obtain noise reduction parameters; The ratio of the human voice signal to the ambient sound signal is adjusted according to the noise reduction parameters; wherein, the output signal includes the human voice signal and the ambient sound signal after the adjustment.

20. The audio processing apparatus according to claim 19, characterized in that, The processing module is also used for: Determine the wind noise of the input signal; Based on the wind noise and the noise reduction parameters, a wind noise suppression gain is determined, which is used to adaptively suppress wind noise on the separated human voice signal and the ambient sound signal.

21. The audio processing apparatus according to claim 20, characterized in that, The processing module is also used for: Based on the human voice signal and the ambient sound signal, determine the signal-to-noise ratio and the energy-weighted signal-to-noise ratio; Based on the signal-to-noise ratio and the energy-weighted signal-to-noise ratio, determine the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio; The wind noise suppression gain is determined based on the minimum of the frame gain of the signal-to-noise ratio and the frame gain of the energy-weighted signal-to-noise ratio.

22. The audio processing apparatus according to claim 19, characterized in that the processing module is also used for: obtaining a first noise reduction parameter corresponding to a user command, wherein the user command includes the noise reduction parameter input by the user; or obtaining a second noise reduction parameter corresponding to a preset scene, wherein the preset scene is a scene identified based on the input signal, and there is a correspondence between the preset scene and the noise reduction parameter.

23. An electronic device, characterized in that the electronic device includes: an audio component, configured to input and/or output audio signals; a processor, which is signal-connected to the audio component; and a memory for storing processor-executable instructions, the memory being signal-connected to the processor; wherein the processor is configured to perform the audio processing method as described in any one of claims 1 to 14.

24. A non-transitory computer-readable storage medium, characterized in that, When the instructions in the storage medium are executed by the processor of the electronic device, the electronic device is able to perform the audio processing method as described in any one of claims 1 to 14.

25. A computer program product, comprising a computer program, characterized in that, When the computer program is executed by a processor, it implements the audio processing method as described in any one of claims 1 to 14.