Method for controlling multi-audio playback, electronic device, and storage medium
By identifying and processing the sound source components of the main audio and ambient audio, the method resolves audio interference in vehicles and achieves a clear multi-audio playback experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG ZEEKR INTELLIGENT TECH CO LTD
- Filing Date
- 2024-07-09
- Publication Date
- 2026-06-12
AI Technical Summary
When the main audio and ambient audio are played simultaneously in a vehicle, the overlap of the same sound source causes audio interference, affecting the listening experience.
By identifying the sound source components of the main audio and ambient audio, overlapping parts are detected, and the corresponding sound source components of the ambient audio are stopped or paused when they overlap, thus avoiding interference.
The method prevents the ambient audio from interfering with the main audio, keeping the main audio clear and stable while still meeting users' needs for multi-audio playback.
Figure CN118760413B_ABST
Description
Technical Field
[0001] This specification relates to the field of audio playback technology, and in particular to a method, an electronic device, and a storage medium for controlling multi-audio playback.
Background Technology
[0002] With the development of automotive electronics, in-vehicle entertainment systems have become increasingly rich and diverse. Modern vehicles are typically equipped with advanced audio systems that not only play main audio (such as music and radio) but also provide ambient audio (such as natural sounds and white noise) to enhance the driving experience. Ambient audio aims to create a comfortable auditory environment and usually includes natural sounds such as birdsong, flowing water, and wind, which can help drivers and passengers relax and reduce driving fatigue. However, the background sound components of the main audio, for example in radio broadcasts or music, may contain similar sound sources. For example, a song being played may include natural birdsong as a background sound effect behind the vocals. When this birdsong overlaps with the birdsong in the simultaneously playing ambient audio, the main audio and the ambient audio interfere with each other, making the main audio sound chaotic and affecting its normal playback.
Summary of the Invention
[0003] To overcome the problems existing in related technologies, this specification provides methods, electronic devices, and storage media for controlling multi-audio playback.
[0004] According to a first aspect of the embodiments of this specification, a method for controlling multi-audio playback is provided. The method is applied to a vehicle, specifically to an application installed in a vehicle infotainment system for managing in-vehicle audio systems. This application can be deployed within the vehicle's cabin area; or it can be applied to the vehicle infotainment system itself (in which case the vehicle infotainment system manages the in-vehicle audio system). The method includes:
[0005] When the vehicle is playing main audio and ambient audio simultaneously, the main audio sound components contained in the main audio and the ambient sound components contained in the ambient audio are identified.
[0006] If at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component, then the playback of the ambient audio is stopped or the playback of the overlapping ambient sound component in the ambient audio is stopped.
[0007] According to a second aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method for controlling multi-audio playback as described in the first aspect above.
[0008] According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the steps of the method for controlling multi-audio playback as described in the first aspect above.
[0009] The technical solutions provided in the embodiments of this specification may include the following beneficial effects:
[0010] In this embodiment, when the vehicle is playing both main audio and ambient audio simultaneously, the system identifies the main audio sound components from the main audio and the ambient sound components from the ambient audio. If at least a portion of the main audio sound components overlaps with at least a portion of the ambient sound components, the system stops playing the ambient audio or stops playing the overlapping ambient sound components within the ambient audio. Therefore, when the main audio and ambient audio are played simultaneously, by identifying the main audio sound components from the main audio and the ambient sound components from the ambient audio, the system can detect whether they contain overlapping sound components. Once these overlapping sound components are detected, the system can stop playing the ambient audio or stop playing the overlapping ambient sound components within the ambient audio, thereby preventing interference from the overlapping sound components in the ambient audio to the main audio. If there are no overlapping sound components, the main audio and ambient audio can be played normally. This satisfies the user's need to play both main audio and ambient audio simultaneously while resolving the mutual interference problem caused by the same sound source during simultaneous playback of the main audio and ambient audio.
[0011] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this specification.
Attached Figure Description
[0012] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain the principles of this specification.
[0013] Figure 1 is a flowchart illustrating a method for controlling multi-audio playback according to an exemplary embodiment of this specification.
[0014] Figure 2 is a flowchart illustrating another method for controlling multi-audio playback according to an exemplary embodiment of this specification.
[0015] Figure 3 is a schematic diagram illustrating the determination of overlapping ambient sound components according to an exemplary embodiment of this specification.
[0016] Figure 4 is a block diagram illustrating an electronic device according to an exemplary embodiment.
[0017] Figure 5 is a block diagram illustrating an apparatus for controlling multi-audio playback according to an exemplary embodiment of this specification.
Detailed Implementation
[0018] The embodiments described in this specification will now be described in detail.
[0019] Figure 1 is a flowchart illustrating a method for controlling multi-audio playback according to an exemplary embodiment of this specification. As shown in Figure 1, the method is applied to a vehicle and includes steps 101-102:
[0020] Step 101: When the vehicle is playing main audio and ambient audio at the same time, identify the main audio sound components contained in the main audio and the ambient sound components contained in the ambient audio.
[0021] Step 102: If at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component, then stop playing the ambient audio or stop playing the overlapping ambient sound components in the ambient audio.
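Steps 101-102 can be sketched as a set comparison, assuming the identification step has already reduced each stream to a set of sound-source labels. The function name `decide_ambient_action`, the label strings, the action names, and the `stop_whole_track` flag are illustrative assumptions and do not appear in this specification.

```python
from typing import Set, Tuple


def decide_ambient_action(
    main_sources: Set[str],
    ambient_sources: Set[str],
    stop_whole_track: bool = False,
) -> Tuple[str, Set[str]]:
    """Return (action, ambient sources to silence) for one detection pass."""
    overlapping = main_sources & ambient_sources
    if not overlapping:
        # No shared sound source: both streams keep playing.
        return "play_all", set()
    if stop_whole_track:
        # First option of step 102: stop the ambient audio entirely.
        return "stop_ambient", set(ambient_sources)
    # Second option of step 102: stop only the overlapping components.
    return "stop_overlapping", overlapping


action, muted = decide_ambient_action(
    {"vocals", "guitar", "birdsong"}, {"flowing_water", "birdsong"}
)
print(action, sorted(muted))
```

In this sketch the choice between the two options of step 102 is left to the caller through a flag; a real system could equally fix one of them.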
[0022] The main audio refers to the multimedia audio of the vehicle's audio system. It includes, but is not limited to, the following. Music playback: music played through the vehicle's audio system, such as songs played via radio, CD, or streaming applications. Video playback: the audio track of video played through the vehicle's multimedia equipment. This specification does not limit the type of main audio.
[0023] The main audio typically consists of core sound components and background sound components. The core sound components are the main content of the main audio and carry the important information or entertainment elements; these are the parts the user needs to hear and understand clearly, for example the lyrics in music. Background sound components are auxiliary sounds or environmental sound effects present in the main audio. Although they do not carry the main information, they can enhance the realism or richness of the audio content. For example, background sounds in music, such as a gentle breeze or birdsong, make the sound more natural and less harsh.
[0024] Ambient audio refers to background sounds used in vehicle audio systems to create a comfortable and pleasant environment. This type of audio typically does not contain essential driving information but rather aims to enhance overall driving comfort and experience. Ambient audio includes, but is not limited to, the following: Natural sounds: such as birdsong, flowing water, and wind, which can provide relaxing background sounds during driving. White noise: used to mask unwanted external noise and improve the quietness of the vehicle interior. Music or ambient music: playing soft background music to create a relaxing atmosphere, but not the main content that the user is focused on listening to. Simulated in-vehicle sound effects: such as simulating the sound of an electric vehicle engine to enhance the driving experience.
[0025] Ambient audio typically comprises primary ambient sound components (also known as the "soundbed") and secondary ambient sound components. Primary ambient sound components are the main sound elements used to create a specific auditory environment or atmosphere; these components form the core of the ambient audio and directly influence the user's auditory experience. Secondary ambient sound components are the secondary sound elements in ambient audio, playing a supplementary and supporting role, enhancing or enriching the overall auditory experience. These components are not the main content of the ambient audio, but they add depth and realism. For example, in ambient audio containing birdsong and flowing water, the primary ambient sound component is the flowing water sound, which permeates the entire audio playback and is the sound primarily perceived by the user, directly determining the theme of the ambient audio. The secondary ambient sound component is the birdsong, which appears intermittently throughout the audio playback, supporting and enhancing the primary ambient sound component. Secondary ambient sound components can include one or more sound sources, with each sound source corresponding to a specific sound element within the secondary ambient sound component. For example, secondary ambient sound components can include different sound elements such as birdsong, frog croaks, and cicada chirps.
[0026] When the main audio and ambient audio are played simultaneously, interference and conflicts can occur if they contain the same sound source. For example, while playing vocals in music, natural bird calls may be added as background sound effects. When these background sound effects overlap with the bird calls in the ambient sound components of the ambient audio, the sounds from the same sound source in the main audio and ambient audio will interfere with each other, affecting the user's listening experience.
[0027] To address the aforementioned issues, embodiments of this specification identify the main audio sound components from the main audio and the ambient sound components from the ambient audio. If it is detected that both contain the same sound source, i.e., at least a portion of the main audio sound components overlaps with at least a portion of the ambient sound components, then the playback of the ambient audio or the overlapping ambient sound components in the ambient audio can be stopped. This avoids interference from the overlapping sound sources in the ambient audio to the main audio, thereby ensuring the clarity and stability of the main audio.
[0028] In one embodiment, throughout the playback of the main audio, the main audio segment about to be played is captured in real time before the audio signal is transmitted to the playback device, and the main audio sound components contained in the current segment are identified. When at least a portion of the main audio sound components is detected to overlap with at least a portion of the ambient sound components, playback of the ambient audio is stopped until the current main audio finishes playing. Playback of the ambient audio resumes when the audio playback device switches to the next main audio.
[0029] In another embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be captured in real time for real-time detection, identifying the main audio sound components contained in the current main audio segment. When at least a portion of the main audio sound components is detected to overlap with at least a portion of the ambient sound components, the playback of the overlapping ambient sound components in the ambient audio can be stopped until the main audio playback ends, and the normal playback of the ambient audio can be resumed when the audio playback device switches to the next main audio playback.
[0030] In one embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the currently played main audio segment can be intercepted and detected in real time to identify the main audio sound components contained in the current main audio segment. If it is detected that at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component, playback of the ambient audio is stopped for a preset duration after the current moment. If, during the real-time detection of the main audio signal within the preset duration, no further overlap between at least a portion of the main audio sound component and at least a portion of the ambient sound component is detected, playback of the ambient audio resumes after the preset duration.
[0031] In another embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be intercepted in real time for real-time detection. The main audio sound components contained in the current main audio segment can be identified. If at least a portion of the main audio sound component is detected to overlap with at least a portion of the ambient sound component, playback of the overlapping ambient sound component in the ambient audio can be stopped for a preset duration after the current moment. If, during the real-time detection of the main audio signal within the preset duration, no further overlap between at least a portion of the main audio sound component and at least a portion of the ambient sound component is detected, playback of the overlapping ambient sound component in the ambient audio can resume after the preset duration.
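The preset-duration behavior described in the preceding embodiments can be sketched as a small timer object. This is a minimal sketch under stated assumptions: the class name `AmbientHold`, the use of seconds as the time unit, and the choice to restart the window when a further overlap arrives during the hold are all assumptions, since the specification only states what happens when no further overlap is detected.

```python
from typing import Optional


class AmbientHold:
    """Keeps ambient playback paused for a preset duration after an overlap."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self.resume_at: Optional[float] = None  # None means ambient is playing

    def update(self, now: float, overlap_detected: bool) -> bool:
        """Process one detection result; return True if ambient should play."""
        if overlap_detected:
            # Assumption: a new overlap (re)starts the preset hold window.
            self.resume_at = now + self.hold_seconds
        elif self.resume_at is not None and now >= self.resume_at:
            # The window elapsed without a further overlap: resume playback.
            self.resume_at = None
        return self.resume_at is None


hold = AmbientHold(hold_seconds=5.0)
for timestamp, overlap in [(0.0, False), (1.0, True), (3.0, False), (6.0, False)]:
    print(timestamp, hold.update(timestamp, overlap))
```

The same object can gate either the whole ambient track or a single overlapping component, depending on which embodiment is being followed.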
[0032] If an ambient audio track contains the sounds of insects chirping and flowing water, and the flowing water is the dominant ambient sound throughout its playback, while the insect chirping appears intermittently as a secondary ambient sound, then even if the main audio track played simultaneously does not contain the flowing water sound, its continuous presence as the dominant ambient sound can easily mask or interfere with the sound information in the main audio, making the main audio's sound information unclear. For example, continuous flowing water can obscure details in the main audio, making it difficult to hear clearly. On the other hand, the insect chirping, as a secondary ambient sound, appears intermittently, with each sound lasting only a short duration. Because the insect chirping occurs intermittently and with short durations, its primary function is to supplement the overall atmosphere of the audio, with less interference to the main audio.
[0033] To address the above problem, this specification proposes another method for controlling multi-audio playback. As shown in Figure 2, the method includes steps 201-202:
[0034] Step 201: When the vehicle is playing both main audio and ambient audio at the same time, only the secondary ambient sound component of the ambient audio is played, and the main audio sound component is identified from the main audio.
[0035] Step 202: If at least a portion of the main audio sound component overlaps with at least a portion of the secondary ambient sound component, then stop playing the ambient audio or stop playing the overlapping ambient sound components in the ambient audio.
[0036] When main audio and ambient audio are played simultaneously, this embodiment plays only the secondary ambient sound component of the ambient audio and omits the primary ambient sound component, thus avoiding continuous interference from the primary ambient sound component. To further prevent interference between the secondary ambient sound component and the main audio caused by shared sound sources, this embodiment identifies the main audio sound component from the main audio and stops playing the ambient audio, or the overlapping ambient sound component, when at least a portion of the main audio sound component overlaps with at least a portion of the secondary ambient sound component. It is understood that if the secondary ambient sound component and the main audio do not share a sound source, the main audio and the secondary ambient sound component of the ambient audio can be played simultaneously.
[0037] In one embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be captured in real time for real-time detection, identifying the main audio sound components contained in the current main audio segment. When at least a portion of the main audio sound components is detected to overlap with at least a portion of the secondary ambient sound components, the playback of the ambient audio can be stopped until the main audio playback ends. When the audio playback device switches to the next main audio playback, the playback of the secondary ambient sound components of the ambient audio can be resumed.
[0038] In another embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be intercepted in real time for real-time detection, identifying the main audio sound components contained in the current main audio segment. When at least a portion of the main audio sound components is detected to overlap with at least a portion of the secondary ambient sound components, the playback of the overlapping secondary ambient sound components in the ambient audio can be stopped until the main audio playback ends. When the audio playback device switches to the next main audio playback, the normal playback of the secondary ambient sound components of the ambient audio can be resumed.
[0039] In one embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be intercepted in real time for real-time detection. The main audio sound components contained in the current main audio segment can be identified. If it is detected that at least a portion of the main audio sound component overlaps with at least a portion of the secondary ambient sound component, the playback of the secondary ambient sound component of the ambient audio can be stopped for a preset duration after the current moment. If, during the real-time detection within the preset duration, no further overlap between at least a portion of the main audio sound component and at least a portion of the secondary ambient sound component is detected, the playback of the secondary ambient sound component of the ambient audio can be resumed after the preset duration.
[0040] In another embodiment, during the entire playback of the main audio, before transmitting the audio signal to the playback device, the main audio segment to be played can be intercepted in real time for real-time detection. The main audio sound components contained in the current main audio segment can be identified. When at least a portion of the main audio sound component is detected to overlap with at least a portion of the secondary ambient sound component, playback of the overlapping secondary ambient sound component in the ambient audio is stopped for a preset duration after the current moment. If, during the real-time detection within the preset duration, no further overlap between at least a portion of the main audio sound component and at least a portion of the secondary ambient sound component is detected, playback of the overlapping secondary ambient sound component in the ambient audio resumes after the preset duration.
[0041] In one embodiment, in order to play only the secondary ambient sound components of the ambient audio, the primary ambient sound components can be filtered out in real time while the ambient audio is playing, so that only the secondary ambient sound components are retained for playback.
[0042] In another embodiment, a corresponding ambient audio copy can be set for each ambient audio track, containing only the secondary ambient sound components. When playing the main audio and ambient audio simultaneously, the ambient audio copy containing only the secondary ambient sound components is selected for playback. By pre-creating audio copies containing only the secondary ambient sound components, the complexity of real-time audio separation processing and the waste of computational resources are reduced. Optionally, the original ambient audio or its copy can be dynamically selected for playback based on actual conditions. For example, when it is detected that the main audio and ambient audio are playing simultaneously, the playback automatically switches to the copy containing only the secondary ambient sound components; when it is detected that only ambient audio is playing, the playback of the complete ambient audio is resumed.
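The dynamic selection between the complete ambient audio and its secondary-only copy can be sketched as follows. The `AmbientTrack` type, its field names, and the file paths are hypothetical and serve only to illustrate the switching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbientTrack:
    full_path: str            # complete ambient audio (primary + secondary)
    secondary_only_path: str  # pre-rendered copy without the primary component


def select_ambient_asset(track: AmbientTrack, main_audio_playing: bool) -> str:
    """Pick the secondary-only copy while main audio plays, else the full track."""
    return track.secondary_only_path if main_audio_playing else track.full_path


# Hypothetical asset paths, for illustration only.
forest = AmbientTrack("ambient/forest_full.wav", "ambient/forest_secondary.wav")
print(select_ambient_asset(forest, main_audio_playing=True))
print(select_ambient_asset(forest, main_audio_playing=False))
```

Because the copy is prepared in advance, the switch is a lookup rather than a real-time separation step, which is the saving this embodiment describes.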
[0043] In one embodiment, the main audio sound component includes a core sound component and a background sound component. Since the core sound component of the main audio is mainly human voice, it almost never overlaps with the sound source of the ambient audio. The part that contains the same sound source as the ambient audio generally appears in the background sound component of the main audio. Furthermore, the main audio sound component is mostly composed of the core sound component. Therefore, when determining whether the main audio sound component contains the same sound source as the ambient audio, it is possible to ignore whether the core sound component of the main audio overlaps with the ambient sound component of the ambient audio. Instead, it is only necessary to determine whether the background sound component of the main audio overlaps with the ambient sound component of the ambient audio to improve computational efficiency.
[0044] Specifically, when the vehicle is playing both main audio and ambient audio simultaneously, the background sound components contained in the main audio are identified. If at least a portion of the identified background sound components overlaps with at least a portion of the ambient sound components, the playback of the ambient audio or the overlapping ambient sound components in the ambient audio is stopped.
[0045] In one embodiment, to obtain the background sound component of the main audio and the ambient sound component of the ambient audio, the main audio signal of the main audio output by the vehicle system can be captured through the first audio output interface, and the captured main audio signal can be separated based on an audio separation model to extract the background sound component. Similarly, the ambient audio signal of the ambient audio output by the vehicle system can be captured through the second audio output interface, and the captured ambient audio signal can be separated based on an audio separation model to extract the ambient sound component.
[0046] To achieve the goal of capturing both the main audio signal and the ambient audio signal, a high-sensitivity first audio capture device and a second audio capture device can be configured at the first and second audio output interfaces of the vehicle's infotainment system, respectively. The first audio capture device captures the main audio signal output from the first audio output interface, and the second audio capture device captures the ambient audio signal output from the second audio output interface. Alternatively, audio capture devices can be configured to be connected to both the first and second audio output interfaces respectively, allowing for the parallel capture of the main audio signal and the ambient audio signal output from the first and second audio output interfaces, respectively.
[0047] In separating sound components from audio signals, taking the main audio signal as an example, the main audio signal is captured using an audio capture device, and the real-time captured main audio signal is input into an audio separation model. Using time-frequency masking technology, the background sound components are separated from the main audio signal based on the masking value of the audio separation model. Similarly, for ambient audio signals, the ambient audio signal can also be captured using an audio capture device, and the real-time captured ambient audio signal is input into an audio separation model. Using time-frequency masking technology, the ambient sound components are separated from the ambient audio signal based on the masking value of the audio separation model.
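Time-frequency masking can be illustrated as an element-wise product between the spectrogram of the captured signal and a mask whose values lie between 0 and 1. The sketch below assumes the magnitude spectrogram and the mask are already available as nested lists; the short-time Fourier transform and the audio separation model that would produce the mask are omitted, and the numbers are toy values.

```python
from typing import List

Matrix = List[List[float]]


def apply_tf_mask(mixture: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of a mixture spectrogram and a mask of equal shape."""
    if len(mixture) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(mixture, mask)
    ):
        raise ValueError("mixture and mask must have the same shape")
    return [
        # Mask values are clipped to [0, 1] before being applied.
        [value * min(max(weight, 0.0), 1.0) for value, weight in zip(row, mask_row)]
        for row, mask_row in zip(mixture, mask)
    ]


# Rows are frequency bins, columns are time frames.
mixture = [[4.0, 2.0], [1.0, 8.0]]
background_mask = [[0.25, 0.0], [1.0, 0.5]]  # stand-in for a model output
background = apply_tf_mask(mixture, background_mask)
print(background)
```

A mask value near 1 keeps a bin as part of the target component, and a value near 0 suppresses it.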
[0048] Regarding the method of separating target sound components from audio, the same audio separation model can be used to separate the background sound components of the main audio and the ambient sound components of the ambient audio. Specifically, by using diverse sound datasets containing both main audio and ambient audio types, a general audio separation model is trained, enabling it to identify and separate the background sound components of various main audio and ambient sound components of different ambient audio from the input audio. This general audio separation model can be used to separate both the background sound components of the main audio and the ambient sound components of the ambient audio, thus reducing the resources required to store the model and facilitating its deployment in in-vehicle systems. Furthermore, a multi-task learning strategy can be introduced into the model to enable parallel processing of main audio and ambient audio tasks, thereby improving processing efficiency.
[0049] Of course, different audio separation models can also be used to separate the background sound components of the main audio and the ambient sound components of the ambient audio, respectively. Specifically, each audio separation model is designed with a structure optimized specifically for a particular type of audio. For example, one model can be trained and optimized specifically for separating the background sound components in the main audio, while another model can be trained and optimized specifically for separating the secondary ambient sound components in the ambient audio. This allows each model to focus on the task of separating specific types of sound components in a specific audio, and the model parameters and architecture can be adjusted specifically for different types of audio signals to ensure better separation results.
[0050] The audio separation models described in the above embodiments can be based on architectures such as Wave-U-net, U-net, or CNN networks. This specification does not limit the model architecture.
[0051] After identifying the main audio sound components from the main audio and the ambient sound components from the ambient audio, it can be determined whether at least a portion of the main audio sound components overlaps with at least a portion of the ambient sound components in the following way:
[0052] In one embodiment, as shown in Figure 3, main audio sound features corresponding to each sound source are extracted from the main audio sound component 31, and ambient sound features corresponding to each sound source are extracted from the ambient sound component 32. For example, if the main audio sound component includes different sound sources such as human voice, violin, guitar, and wind, a main audio sound feature representing the characteristics of each of these sound sources can be extracted from the main audio sound component 31. Specifically, the main audio sound component 31 can be separated into independent sound sources using a non-negative matrix factorization algorithm, and a main audio sound feature can then be extracted from each separated source. Spectral peaks, MFCCs, and similar descriptors can be used as the main audio sound features. The ambient sound component 32 is processed in the same way, which is not elaborated here.
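As an illustration of the non-negative matrix factorization step, the sketch below factors a toy magnitude spectrogram into spectral templates and activations using the standard multiplicative updates for the squared-error objective. It is a pure-Python sketch under stated assumptions: the rank, iteration count, random initialization, and the idea of treating each template column as a crude per-source feature are choices made for this example, and MFCC extraction is omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, rank: int, iterations: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Factor a non-negative spectrogram v (freq x time) as w @ h.

    Column k of w is the spectral template of source k; row k of h is
    that source's activation over time.
    """
    rng = random.Random(seed)
    eps = 1e-9  # keeps the divisions well defined
    n_freq, n_time = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # Update activations: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(rank)]
        # Update templates: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
             for f in range(n_freq)]
    return w, h


def reconstruction_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of v - w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx) for a, b in zip(ra, rb)) ** 0.5


# Toy spectrogram: one source occupies the two low bins in the first two
# frames, another occupies the two high bins in the last two frames.
spectrogram = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]
templates, activations = nmf(spectrogram, rank=2)
print(reconstruction_error(spectrogram, templates, activations))
```

In practice a numerical library would replace the nested-list arithmetic; the update rules stay the same.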
[0053] In this embodiment, for any ambient audio stored locally on the vehicle's infotainment system, the ambient sound features of that ambient audio can be pre-stored in an ambient sound feature library. When the main audio and ambient audio are played simultaneously, the main audio segment about to be played can be extracted in real time before the main audio signal is transmitted to the external playback device. This segment can be, for example, a 20-millisecond audio signal frame, which serves as the main audio sound component 31 to be detected. The main audio sound features corresponding to each sound source in the main audio sound component 31 are extracted and compared with the ambient sound features of the currently playing ambient audio, obtained from the ambient sound feature library, to determine whether the main audio played in the current time period contains sound sources that overlap with the ambient audio. For ambient audio streamed online on the vehicle's infotainment system, the same method used for the main audio signal can be applied: before the ambient audio signal is transmitted to the external playback device, the ambient audio segment about to be played is extracted. This segment can likewise be a 20-millisecond audio signal frame, which serves as the ambient sound component 32 to be detected.
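The 20-millisecond framing can be made concrete with a short sketch. The sample rates used here are assumptions, since the specification does not state one, and the helper names are illustrative.

```python
from typing import Iterator, List, Sequence


def samples_per_frame(sample_rate_hz: int, frame_ms: float = 20.0) -> int:
    """Number of samples in one frame of the given duration."""
    return int(round(sample_rate_hz * frame_ms / 1000.0))


def iter_frames(samples: Sequence[float], frame_len: int) -> Iterator[List[float]]:
    """Yield consecutive non-overlapping frames, dropping a trailing partial frame."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield list(samples[start:start + frame_len])


# At an assumed 48 kHz sample rate, a 20 ms frame holds 960 samples.
frame_len = samples_per_frame(48_000)
one_second = [0.0] * 48_000
frame_count = sum(1 for _ in iter_frames(one_second, frame_len))
print(frame_len, frame_count)
```

Each yielded frame would then be passed to feature extraction before the corresponding audio reaches the playback device.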
[0054] After the main audio sound features and ambient sound features are extracted, the sound source similarity between each main audio sound feature and each ambient sound feature can be calculated. For example, for any main audio sound feature i, the sound source similarity between feature i and every ambient sound feature extracted from the ambient sound component 32 is calculated. Any ambient sound feature whose sound source similarity to at least one main audio sound feature exceeds a similarity threshold is identified, and playback of the ambient sound component corresponding to that ambient sound feature is stopped.
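The threshold comparison can be sketched as follows. The specification does not name a similarity measure, so cosine similarity is an assumption here, as are the threshold value of 0.9, the function names, and the toy feature vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def overlapping_ambient_sources(
    main_features: Dict[str, Sequence[float]],
    ambient_features: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """Ambient sources whose similarity to any main-audio source exceeds the threshold."""
    return sorted(
        name
        for name, ambient_vec in ambient_features.items()
        if any(cosine_similarity(main_vec, ambient_vec) > threshold
               for main_vec in main_features.values())
    )


main_features = {"vocals": [1.0, 0.0, 0.0], "bird_fx": [0.0, 1.0, 0.1]}
ambient_features = {"birdsong": [0.0, 1.0, 0.0], "flowing_water": [0.0, 0.0, 1.0]}
print(overlapping_ambient_sources(main_features, ambient_features))
```

The returned names identify the ambient sound components whose playback would be stopped.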
[0055] As explained in the foregoing embodiments, the core sound component 331 in the main audio sound component 31 will hardly overlap with the sound source of the ambient sound component 32. Therefore, it is not necessary to calculate the sound source similarity between the core sound component 331 and the ambient sound component 32. Even if the main ambient sound component 341 in the ambient sound component 32 does not have any components overlapping with the main audio sound component 31, its continuous interference to the main audio will affect the user's auditory experience. Therefore, when playing the main audio and ambient audio at the same time, the main ambient sound component can be filtered out, and it is not necessary to calculate the sound source similarity between the main ambient sound component 341 and the main audio sound component 31.
[0056] Therefore, in one embodiment, a core sound component 331 and a background sound component 332 can be determined from the main audio sound component 31, and a main ambient sound component 341 and a secondary ambient sound component 342 can be determined from the ambient sound component 32. The main audio sound features corresponding to each sound source are extracted from the background sound component 332, and the ambient sound features corresponding to each sound source are extracted from the secondary ambient sound component 342. If an ambient sound feature has a similarity greater than a similarity threshold with any main audio sound feature, the playback of the ambient sound component corresponding to that ambient sound feature is stopped. For the main ambient sound component of the ambient audio, the main ambient sound component can be kept off during the simultaneous playback of the main audio and ambient audio.
[0057] In addition, this specification proposes another method for determining whether at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component:
[0058] Determine the main audio source category corresponding to each sound source in the main audio sound component, and determine the ambient audio source category corresponding to each sound source in the ambient sound component; if there is an ambient audio source category that belongs to the same source category as any main audio source category, then stop playing the ambient sound component corresponding to the ambient audio source category.
[0059] Specifically, sound source separation algorithms such as Non-negative Matrix Factorization (NMF) and Independent Component Analysis (ICA) are used to separate different sound sources from the main audio sound components. For each separated sound source, a pre-trained machine learning model can be used to identify the main audio sound source category to which it belongs. Similarly, sound source separation algorithms such as Non-negative Matrix Factorization (NMF) and Independent Component Analysis (ICA) are used to separate different sound sources from the ambient sound components. For each separated sound source, a pre-trained machine learning model can be used to identify the ambient audio sound source category to which it belongs. If there is an ambient audio sound source category that belongs to the same sound source category as any main audio sound source category, then the ambient sound component corresponding to that ambient audio sound source category is stopped from playing.
[0060] For example, assuming that the main audio sound component is identified to include guitar sounds, flowing water sounds, and birdsong, and the ambient sound component is identified to include birdsong and wind sounds, then it can be determined that the main audio sound component and the ambient sound component contain the same sound source category (birdsong), and the ambient sound component corresponding to the ambient sound source category (birdsong) will be stopped from playing.
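The category comparison in this example can be sketched as a set intersection. The sketch assumes the sound source categories have already been produced by a separation step and a pre-trained classifier, neither of which is shown. The function name is hypothetical.

```python
def overlapping_categories(main_categories, ambient_categories):
    """Sound source categories present in both the main audio and the ambient audio."""
    return set(main_categories) & set(ambient_categories)


# Categories from the example above.
main_categories = ["guitar", "flowing water", "birdsong"]
ambient_categories = ["birdsong", "wind"]

to_stop = overlapping_categories(main_categories, ambient_categories)
# Ambient categories that are not shared keep playing.
keep_playing = [c for c in ambient_categories if c not in to_stop]
```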
[0061] In one embodiment, in a scenario where main audio and ambient audio are played simultaneously, the theme to which the main audio belongs can be identified, and ambient audio matching the theme can be played.
[0062] Specifically, when building an ambient audio library, ambient audio can be categorized into different categories based on different themes. For example, ambient audio can be categorized into "natural ambient sounds," "urban ambient sounds," and "leisure music sounds." For each category, a matching rule is defined between the theme of the main audio and the category of the ambient audio. For example, the theme "driving instructions" matches "natural ambient sounds" (such as wind sounds, birdsong, etc.). The theme "music playback" matches "urban ambient sounds" (such as street noise, café background sounds, etc.). Based on the identified theme of the currently playing main audio, ambient audio from the matching category is selected for playback. To identify the theme of the main audio, it can be converted into text. By analyzing the text content, the theme of the main audio can be determined. For example, for the audio clip "Next song A will be played," by converting the audio clip into the text "Next song A will be played," and then extracting keywords such as "will be played" and "next song," the theme category of the main audio is determined to be "music playback" based on semantic analysis.
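A minimal sketch of the matching rule described above follows. It assumes the main audio has already been converted to text (speech recognition is not shown) and replaces semantic analysis with simple keyword counting. The keyword lists, in particular those for "driving instructions", and all function names are hypothetical.

```python
# Hypothetical keyword lists per theme; a real system would use semantic analysis.
THEME_KEYWORDS = {
    "music playback": ("next song", "will be played"),
    "driving instructions": ("turn left", "turn right", "navigation"),
}

# Matching rules taken from the examples in the text above.
THEME_TO_AMBIENT_CATEGORY = {
    "driving instructions": "natural ambient sounds",
    "music playback": "urban ambient sounds",
}


def identify_theme(transcript):
    """Return the theme with the most keyword hits, or None if nothing matches."""
    text = transcript.lower()
    best_theme, best_hits = None, 0
    for theme, keywords in THEME_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits > best_hits:
            best_theme, best_hits = theme, hits
    return best_theme


def ambient_category_for(transcript):
    """Ambient audio category matching the transcript's theme, or None."""
    return THEME_TO_AMBIENT_CATEGORY.get(identify_theme(transcript))


category = ambient_category_for("Next song A will be played")
```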
[0063] In the embodiments described in this specification, by playing corresponding ambient audio to match the main audio theme, a harmonious audio environment can be created, enhancing the auditory experience for drivers and passengers. For example, music combined with natural ambient sounds can reduce driving stress and enhance concentration.
[0064] Corresponding to the embodiments of the foregoing methods, this specification also provides embodiments of the apparatus and the terminal to which it is applied.
[0065] Figure 4 is a schematic diagram illustrating the structure of an electronic device 400 according to an exemplary embodiment of this specification. As shown in Figure 4, at the hardware level, the device 400 includes a processor 402, an internal bus 404, a network interface 406, memory 408, and non-volatile memory 410, and may also include other hardware required for various services. One or more embodiments of this specification can be implemented in software; for example, the processor 402 reads the corresponding computer program from the non-volatile memory 410 into memory 408 and then runs it. Of course, besides software implementation, one or more embodiments of this specification do not exclude other implementation methods, such as logic devices or a combination of hardware and software. That is to say, the execution entity of the following processing flow is not limited to individual logic modules, but can also be hardware or logic devices.
[0066] Figure 5 illustrates a device for controlling multi-audio playback according to an exemplary embodiment of this specification. As shown in Figure 5, the device 500 can be applied to, for example, the electronic device shown in Figure 4 to implement the technical solution of this specification. The device 500 is applied to a vehicle and includes:
[0067] The component recognition module 502 is used to identify the main audio sound components contained in the main audio and the ambient sound components contained in the ambient audio when the vehicle is playing main audio and ambient audio at the same time.
[0068] The first playback mode control module 504 is used to stop playing the ambient audio or stop playing the overlapping ambient sound components in the ambient audio if at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component.
[0069] The main audio sound component includes a background sound component, and the component recognition module 502 is specifically used to identify the background sound component included in the main audio. The first playback mode control module 504 is specifically used to determine whether at least a portion of the identified background sound component overlaps with at least a portion of the ambient sound component.
[0070] The main audio is output through the first audio output interface of the vehicle system, and the ambient audio is output through the second audio output interface of the vehicle system. The component recognition module 502 is specifically used to capture the main audio signal of the main audio through the first audio output interface, and perform separation processing on the main audio signal based on the audio separation model to separate the background sound component; and / or, capture the ambient audio signal of the ambient audio through the second audio output interface, and perform separation processing on the ambient audio signal based on the audio separation model to separate the ambient sound component.
[0071] The first playback mode control module 504 is specifically used to determine the main audio sound features corresponding to each sound source in the main audio sound component, and to determine the ambient sound features corresponding to each sound source in the ambient sound component; if there is an ambient sound feature with a similarity greater than a similarity threshold with any main audio sound feature, then the playback of the ambient sound component corresponding to that ambient sound feature is stopped; or, to determine the main audio sound source category corresponding to each sound source in the main audio sound component, and to determine the ambient audio sound source category corresponding to each sound source in the ambient sound component; if there is an ambient audio sound source category belonging to the same sound source category as any main audio sound source category, then the playback of the ambient sound component corresponding to that ambient audio sound source category is stopped.
[0072] The ambient sound feature library pre-stores the ambient sound features of various ambient audios, and the first playback mode control module 504 is specifically used to obtain the ambient sound features of the target ambient audio from the ambient sound feature library.
[0073] The ambient audio includes a main ambient sound component and a secondary ambient sound component. The device 500 also includes a second playback mode control module 506, which is used to play the secondary ambient sound component of the ambient audio while playing the main audio and the ambient audio simultaneously.
[0074] The second playback mode control module 506 is specifically used to play an ambient audio copy containing only the secondary ambient sound component; or, when playing the ambient audio, to filter the primary ambient sound component in the ambient audio in real time.
[0075] The first playback mode control module 504 is specifically used to identify the theme to which the main audio belongs and play ambient audio that matches the theme.
[0076] The first playback mode control module 504 is specifically used to stop playing the ambient audio or stop playing overlapping ambient sound components in the ambient audio within a preset duration; or, stop playing the ambient audio or stop playing overlapping ambient sound components in the ambient audio until the main audio playback ends.
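The two stop conditions above (a preset duration, or until the main audio playback ends) can be sketched as a single decision function. The mode names, the function name, and the 30-second default are assumptions made for illustration and do not appear in this specification.

```python
def ambient_stays_stopped(mode, elapsed_s, main_playing, preset_s=30.0):
    """Decide whether a stopped ambient component should remain stopped.

    mode: "preset_duration" keeps it stopped for preset_s seconds;
          "until_main_ends" keeps it stopped while the main audio is playing.
    """
    if mode == "preset_duration":
        return elapsed_s < preset_s
    if mode == "until_main_ends":
        return main_playing
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical checks 10 s and 45 s after the ambient component was stopped.
still_stopped_early = ambient_stays_stopped("preset_duration", 10.0, main_playing=True)
still_stopped_late = ambient_stays_stopped("preset_duration", 45.0, main_playing=True)
```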
[0077] The specific implementation process of the functions and roles of each module in the above device can be found in the implementation process of the corresponding steps in the above method, and will not be repeated here.
[0078] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected to achieve the purpose of the solution in this specification according to actual needs. Those skilled in the art can understand and implement this without creative effort.
[0079] This specification also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the foregoing methods for controlling multi-audio playback provided in this disclosure.
[0080] Specifically, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, such as semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
Claims
1. A method for controlling the playback of multiple audio files, characterized in that the method is applied to a vehicle and includes: when the vehicle is playing main audio and ambient audio simultaneously, identifying the main audio sound components contained in the main audio and the ambient sound components contained in the ambient audio, wherein the main audio sound components include background sound components, and identifying the main audio sound components contained in the main audio includes: identifying the background sound components contained in the main audio; and if at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component, stopping the playback of the ambient audio or stopping the playback of the overlapping ambient sound component in the ambient audio, wherein the overlap of at least a portion of the main audio sound component with at least a portion of the ambient sound component includes: the overlap of at least a portion of the identified background sound component with at least a portion of the ambient sound component.
2. The method according to claim 1, characterized in that the main audio is output through the first audio output interface of the vehicle's infotainment system, and the ambient audio is output through the second audio output interface of the vehicle's infotainment system; the step of identifying background sound components from the main audio includes: capturing the main audio signal of the main audio through the first audio output interface, and performing separation processing on the main audio signal based on an audio separation model to separate the background sound components therefrom; and/or, the step of identifying ambient sound components from the ambient audio includes: capturing the ambient audio signal of the ambient audio through the second audio output interface, and performing separation processing on the ambient audio signal based on an audio separation model to separate the ambient sound components therefrom.
3. The method according to claim 1, characterized in that determining that at least a portion of the main audio sound component overlaps with at least a portion of the ambient sound component, and stopping playback of the overlapping ambient sound component in the ambient audio, includes: determining the main audio sound features corresponding to each sound source in the main audio sound component, and determining the ambient sound features corresponding to each sound source in the ambient sound component; if there is an ambient sound feature whose similarity to any main audio sound feature is greater than a similarity threshold, stopping playback of the ambient sound component corresponding to that ambient sound feature; or, determining the main audio source category corresponding to each sound source in the main audio sound component, and determining the ambient audio source category corresponding to each sound source in the ambient sound component; if there is an ambient audio source category that belongs to the same source category as any main audio source category, stopping playback of the ambient sound component corresponding to that ambient audio source category.
4. The method according to claim 3, characterized in that an ambient sound feature library pre-stores the ambient sound features of various ambient audios, and determining the ambient sound features corresponding to each sound source in the ambient sound component includes: obtaining the ambient sound features of the target ambient audio from the ambient sound feature library.
5. The method according to claim 1, characterized in that the ambient audio includes a primary ambient sound component and a secondary ambient sound component, and the method further includes: while playing the main audio and the ambient audio simultaneously, playing the secondary ambient sound component of the ambient audio.
6. The method according to claim 5, characterized in that playing the secondary ambient sound component of the ambient audio includes: playing an ambient audio copy containing only the secondary ambient sound component; or, filtering out the primary ambient sound component in the ambient audio in real time while the ambient audio is being played.
7. The method according to claim 1, characterized in that, when the vehicle is simultaneously playing main audio and ambient audio, playing the ambient audio includes: identifying the theme to which the main audio belongs, and playing ambient audio that matches the theme.
8. The method according to claim 1, characterized in that the step of stopping the playback of the ambient audio or stopping the playback of overlapping ambient sound components in the ambient audio includes: stopping the playback of the ambient audio or of the overlapping ambient sound components in the ambient audio within a preset duration; or, stopping the playback of the ambient audio or of the overlapping ambient sound components in the ambient audio until the main audio finishes playing.
9. An electronic device, characterized in that it includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the method of any one of claims 1 to 8.