Audio processing methods, devices, chips, module equipment and storage media
By performing weighted averaging and Fourier transform processing on the amplitude attenuation coefficients of dual-channel audio, the non-uniform attenuation problem caused by traditional audio enhancement methods is solved, achieving uniform attenuation of audio signals and reducing distortion.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNISOC CHONGQING TECH CO LTD
- Filing Date
- 2023-08-09
- Publication Date
- 2026-06-30
AI Technical Summary
Traditional audio enhancement methods are prone to non-uniform attenuation when processing audio signals, resulting in audio signal distortion.
By weighted averaging of the amplitude attenuation coefficients at each frequency point in the first frequency point set for each frame of the dual-channel audio signal, the single-frame amplitude attenuation coefficient is determined. Based on the single-frame attenuation coefficient, the frequency points obtained after Fourier transform are attenuated to achieve uniform attenuation across the entire spectrum.
It reduces audio signal distortion and ensures that the audio signal is not distorted after enhancement.
Figure CN116887129B_ABST
Description
Technical Field
[0001] This application relates to the field of audio processing technology, and in particular to an audio processing method, apparatus, chip, module device and storage medium.
Background Technology
[0002] Microphones typically pick up interference signals when capturing audio signals. Traditional single microphones have limited ability to process non-stationary interference signals, while arrays of two or more microphones can utilize more spatial information to suppress non-stationary interference signals, thereby achieving audio enhancement.
[0003] Currently, one method of audio enhancement is to apply amplitude enhancement or attenuation coefficients to audio signals from multiple microphones, thereby preserving the spectral characteristics of the parts of interest in the audio signal while destroying the spectral characteristics of interfering parts. However, amplitude enhancement or attenuation coefficients are applied to individual frequency points, and the values of these coefficients typically differ for different frequency points. Therefore, when audio signals are processed with different degrees of amplitude enhancement or attenuation, the audio signal will experience non-uniform attenuation across the entire spectrum, leading to audio signal distortion. How to reduce audio signal distortion during audio enhancement has become one of the urgent problems to be solved.
Summary of the Invention
[0004] This application provides an audio processing method, apparatus, chip, module device, and storage medium that can uniformly attenuate the signal across the entire spectrum using a single-frame amplitude attenuation coefficient, thereby reducing audio signal distortion.
[0005] In a first aspect, this application provides an audio processing method, the method comprising:
[0006] Acquire dual-channel audio from a dual-microphone array;
[0007] Process the dual-channel audio to obtain the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set;
[0008] For each frame of dual-channel audio signal, the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame signal.
[0009] For each frame of the dual-channel audio signal, the amplitude attenuation processing is performed on each frequency point in the second frequency point set of each frame signal based on the single-frame amplitude attenuation coefficient of each frame signal to obtain the attenuated signal of each frame of the dual-channel audio. The first frequency point set is a subset of the second frequency point set, and the second frequency point set is the frequency point set obtained after performing Fourier transform on the dual-channel audio.
[0010] Beamforming and inverse Fourier transform are performed on the attenuated signal of each frame of the dual-channel audio to obtain the enhanced dual-channel audio.
[0011] As can be seen, by weighted averaging the amplitude attenuation coefficients of each frequency point in the first frequency point set for each frame of the dual-channel audio signal, the single-frame amplitude attenuation coefficient of each frame signal is obtained; and by performing amplitude attenuation on each frequency point obtained after Fourier transform based on the single-frame attenuation coefficient, the signal of each frame can be attenuated uniformly across the entire spectrum, and the signal of each frame will not be distorted after amplitude attenuation, thus reducing the distortion of the enhanced dual-channel audio.
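The core of the first aspect, reducing the per-frequency-point coefficients of one frame to a single coefficient and then applying that coefficient to every frequency point, can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names, the example spectrum, and the weight values are assumptions introduced here, since the text does not fix how the weights are chosen.

```python
from typing import List, Sequence


def single_frame_coefficient(coeffs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the per-frequency-point attenuation coefficients of one frame."""
    if len(coeffs) != len(weights) or not coeffs:
        raise ValueError("coeffs and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(coeffs, weights)) / total


def attenuate_frame(spectrum: Sequence[complex], coeff: float) -> List[complex]:
    """Scale every frequency point of one frame by the same coefficient."""
    return [coeff * x for x in spectrum]


# Hypothetical frame: 6 frequency points after the Fourier transform (second set).
spectrum = [1 + 0j, 2 + 2j, 0 - 1j, 3 + 0j, 1 + 1j, 0.5 + 0j]
# Assumed coefficients and weights for the 3 points that form the first set.
first_set_coeffs = [0.9, 0.5, 0.7]
first_set_weights = [1.0, 2.0, 1.0]

eta = single_frame_coefficient(first_set_coeffs, first_set_weights)
attenuated = attenuate_frame(spectrum, eta)
```

Because every frequency point is multiplied by the same value, the ratios between magnitudes within the frame are unchanged, which is the sense in which the attenuation is uniform across the spectrum.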
[0012] In one possible implementation, the dual-channel audio is processed to obtain the amplitude attenuation coefficients of each frame of the dual-channel audio signal at each frequency point in the first frequency point set, including:
[0013] The dual-channel audio is framed, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio.
[0014] Determine the phase difference of each frame of the dual-channel audio signal at each frequency point in the first frequency point set;
[0015] Based on the relationship between phase difference and amplitude attenuation coefficient, and the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame signal of the first channel audio in the dual-channel audio at each frequency point in the first frequency point set, and the amplitude attenuation coefficient of each frame signal of the second channel audio in the dual-channel audio at each frequency point in the first frequency point set are obtained.
[0016] Since the deviation between the direction of the part of interest and the direction of noise in dual-channel audio is related to the phase difference of the dual-channel audio signal, the amplitude attenuation coefficient at each frequency point in the first frequency point set can be obtained by the phase difference of each frame signal in the dual-channel audio signal at each frequency point in the first frequency point set. The amplitude attenuation coefficient can be used to suppress the noise part in the signal, thereby achieving the effect of enhancing the part of interest in the signal.
[0017] In one possible implementation, the dual-channel audio is processed to obtain the amplitude attenuation coefficients of each frame of the dual-channel audio signal at each frequency point in the first frequency point set, including:
[0018] The dual-channel audio is framed, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio.
[0019] Determine the signal-to-noise ratio (SNR) of each frame of the first audio channel in the dual-channel audio at each frequency point in the first frequency point set, and the SNR of each frame of the second audio channel in the dual-channel audio at each frequency point in the first frequency point set;
[0020] Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the first channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the first channel audio signal at each frequency point in the first frequency point set is obtained.
[0021] Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the second channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the second channel audio signal at each frequency point in the first frequency point set is obtained.
[0022] Since the signal-to-noise ratio (SNR) indicates the ratio of the part of interest to the noise part of the signal, the amplitude attenuation coefficient at each frequency point in the first frequency point set can be obtained based on the SNR of the first and second channel audio in the first frequency point set.
[0023] In one possible implementation, for each frame of the dual-channel audio signal, the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame signal, including:
[0024] The weighted average of the amplitude attenuation coefficients of each frame of the first channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the first channel audio signal.
[0025] The weighted average of the amplitude attenuation coefficients of each frame of the second channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the second channel audio signal.
[0026] As can be seen, by processing each frame of the first audio channel signal and each frame of the second audio channel signal separately, the single-frame amplitude attenuation coefficient of each frame of the first audio channel signal and the single-frame amplitude attenuation coefficient of each frame of the second audio channel signal can be obtained.
[0027] In one possible implementation, the method further includes:
[0028] A nonlinear transformation is performed on each frequency point in the second frequency point set to obtain the nonlinear frequency point corresponding to each frequency point in the second frequency point set;
[0029] The nonlinear frequency points that meet the preset conditions among the nonlinear frequency points corresponding to each frequency point in the second frequency point set are determined as the target nonlinear frequency points;
[0030] The frequency point corresponding to the target nonlinear frequency point before the nonlinear transformation is determined as the frequency point in the first frequency point set.
[0031] As can be seen, a subset of frequencies can be selected from the second frequency set and used as frequencies in the first frequency set to participate in the calculation of the amplitude attenuation coefficients for each frequency in the first frequency set. For example, the frequencies in the first frequency set might be the frequencies in the second frequency set that are sensitive to human hearing. This method can effectively reduce the amount of calculation required for the amplitude attenuation coefficients, thereby improving the efficiency of audio processing.
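One way to picture this selection is sketched below. It assumes the nonlinear transformation is a commonly used mel-scale formula and that the preset condition is "closest to one of a set of evenly spaced values on the mel scale"; neither choice is stated in the text, and the function names, sample rate, and transform size are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """A commonly used mel-scale formula, taken here as the nonlinear transformation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def select_first_set(num_bins: int, sample_rate: float, fft_size: int, num_targets: int) -> List[int]:
    """Return indices of the frequency points chosen for the first frequency point set."""
    if num_targets < 2:
        raise ValueError("num_targets must be at least 2")
    freqs = [i * sample_rate / fft_size for i in range(num_bins)]
    mels = [hz_to_mel(f) for f in freqs]      # nonlinear frequency points
    step = mels[-1] / (num_targets - 1)       # evenly spaced grid on the mel axis
    chosen = set()
    for t in range(num_targets):
        target = t * step
        # Assumed preset condition: keep the point nearest to each grid value.
        nearest = min(range(num_bins), key=lambda i: abs(mels[i] - target))
        chosen.add(nearest)
    return sorted(chosen)


# Illustrative values: 512-point transform at 16 kHz gives 257 frequency points.
first_set = select_first_set(num_bins=257, sample_rate=16000.0, fft_size=512, num_targets=24)
```

With these values the selected points are dense at low frequencies and sparse at high frequencies, so at most 24 coefficients need to be computed per frame instead of 257.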
[0032] In one possible implementation, the method further includes:
[0033] The amplitude attenuation coefficients of each frame of the dual-channel audio signal at each frequency point in the first frequency point set are input into the target model to obtain the weighted weights of the amplitude attenuation coefficients of each frame of the signal; the weighted weights are used to obtain the single-frame amplitude attenuation coefficients of each frame of the signal.
[0034] Alternatively, search the database for a weighted value of the amplitude attenuation coefficient that matches the amplitude attenuation coefficient at each frequency point in the first frequency point set for each frame of signal.
[0035] As can be seen, the weighting values used in calculating the single-frame amplitude attenuation coefficient can be determined through model processing or database lookup. Since the model can obtain the corresponding weighting values with better amplitude attenuation effect for different amplitude attenuation coefficients at various frequency points during pre-training or database establishment, this method can determine more suitable weighting values, thereby making the obtained single-frame amplitude attenuation coefficient more accurate.
[0036] In a second aspect, this application provides a communication device that includes units for implementing the methods described in the first aspect and any possible implementation thereof.
[0037] In a third aspect, this application provides a chip including a processor and a communication interface, the processor being configured to cause the chip to perform the methods described in the first aspect above and any possible implementation thereof.
[0038] In a fourth aspect, this application provides a module device, which includes a communication module, a power module, a storage module, and a chip, wherein: the power module is used to provide electrical energy to the module device; the storage module is used to store data and instructions; the communication module is used for internal communication within the module device, or for communication between the module device and external devices; and the chip is used to execute the methods in the first aspect and any possible implementation thereof.
[0039] In a fifth aspect, this application provides a terminal device including a memory and a processor. The memory is used to store a computer program, the computer program including program instructions, and the processor is configured to invoke the program instructions to execute the methods described in the first aspect and any possible implementation thereof.
[0040] In a sixth aspect, this application provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed on a communication device, cause the communication device to perform the methods described in the first aspect and any possible implementation thereof.
[0041] In a seventh aspect, this application provides a computer program or computer program product, including program instructions that, when executed on a computer, cause the computer to perform the methods described in the first aspect above and any possible implementation thereof.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a schematic diagram of a communication system architecture provided in an embodiment of this application;
[0044] Figure 2 is a schematic flowchart of an audio processing method provided in an embodiment of this application;
[0045] Figure 3 is a schematic diagram of the spectrum before and after amplitude attenuation provided in an embodiment of this application;
[0046] Figure 4 is a schematic diagram of the structure of a communication device provided in an embodiment of this application;
[0047] Figure 5 is a schematic diagram of the structure of a terminal device provided in an embodiment of this application;
[0048] Figure 6 is a schematic diagram of the structure of a module device provided in an embodiment of this application.
Detailed Implementation
[0049] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0050] The terms "first" and "second," etc., used in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses.
[0051] First, some of the terms used in the embodiments of this application will be explained to facilitate understanding by those skilled in the art.
[0052] 1. Speech enhancement
[0053] Typically, microphones capture sound along with background noise. For example, when capturing a speaker's voice, a microphone will also pick up background noise. Speech enhancement technology aims to extract the purest possible target sound from the sound captured by the microphone while suppressing background noise. Microphone array-based speech enhancement technology utilizes the spatial information contained in the sound captured by multiple microphones to process speech and achieve speech enhancement, including but not limited to: amplitude attenuation technology based on phase difference and beamforming technology.
[0054] 2. Beamforming
[0055] Beamforming, also known as beam shaping, refers to the process of merging multiple audio signals captured by a microphone array. During merging, interference signals from non-target directions are suppressed, while signals from the target direction are amplified, resulting in the desired audio signal. The target and non-target directions vary depending on the audio capture scenario. For example, when filming an instructional video, if the subject is a teacher, the target direction is the teacher's position relative to the microphone array, and the directions of other sound sources are considered non-target directions.
[0056] Traditional beamforming techniques include fixed beamforming and adaptive beamforming. Fixed beamforming uses pre-set beam weights corresponding to the target direction to ensure the synthesized audio signal points in the desired direction. Adaptive beamforming analyzes the received audio in real-time to obtain information about each sound source within the audio, and then continuously adjusts and adapts the beam weights based on this information to ensure the filtered audio signal points as close to the target direction as possible.
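As an illustration of fixed beamforming for two microphones, the sketch below uses a frequency-domain delay-and-sum combiner, a classic fixed beamformer. The text does not say which beamformer is used, so this choice, the function name, and the example delay and frequencies are all assumptions.

```python
import cmath
import math
from typing import List, Sequence


def delay_and_sum(x1: Sequence[complex], x2: Sequence[complex],
                  freqs_hz: Sequence[float], target_delay_s: float) -> List[complex]:
    """Combine two channel spectra with fixed weights steered at the target direction.

    target_delay_s is the assumed arrival delay of the target at microphone 2
    relative to microphone 1.
    """
    out = []
    for a, b, f in zip(x1, x2, freqs_hz):
        steer = cmath.exp(2j * math.pi * f * target_delay_s)  # undo the target's delay on mic 2
        out.append(0.5 * (a + b * steer))
    return out


freqs = [500.0, 1000.0, 2000.0]   # illustrative frequency points
tau = 1e-4                        # illustrative inter-microphone delay of the target

# A unit-amplitude source from the target direction reaches mic 2 delayed by tau.
target_x1 = [1 + 0j] * 3
target_x2 = [cmath.exp(-2j * math.pi * f * tau) for f in freqs]
target_out = delay_and_sum(target_x1, target_x2, freqs, tau)

# A source from the opposite side reaches mic 2 earlier by tau instead.
interf_x1 = [1 + 0j] * 3
interf_x2 = [cmath.exp(2j * math.pi * f * tau) for f in freqs]
interf_out = delay_and_sum(interf_x1, interf_x2, freqs, tau)
```

The target passes with unit gain at every frequency point, while the source from the other direction adds out of phase and is reduced in magnitude.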
[0057] 3. Human ear sensitivity spectrum
[0058] Because different types of sounds have different frequency ranges and the human ear's audible frequency range is limited, the human ear's sensitivity to different types of sounds varies. Generally, the human ear is more sensitive to low-frequency sounds and less sensitive to high-frequency sounds; that is, the sensitivity of the human ear to sound has a non-linear relationship with the frequency of the sound. Currently, this non-linear relationship can be represented by the Mel spectrum, the Bark spectrum, or the equivalent rectangular bandwidth (ERB) spectrum.
[0059] Because the amplitude attenuation technique based on phase difference in voice enhancement technology causes non-uniform attenuation of the audio across the entire spectrum, resulting in audio distortion, this application proposes an audio processing method, apparatus, chip, module device, and storage medium to reduce audio distortion. This method can be applied to various communication systems, such as: Global System for Mobile Communication (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), General Packet Radio Service (GPRS), Long Term Evolution (LTE), LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), Universal Mobile Telecommunications System (UMTS), Worldwide Interoperability for Microwave Access (WiMAX), 5th Generation (5G), New Radio (NR), and future communication systems.
[0060] The aforementioned communication system may include a terminal device. The technical solution of this embodiment can be executed by the terminal device so that the terminal device can enhance the dual-channel audio.
[0061] This terminal device can be referred to as a terminal, mobile terminal (MT), access terminal device, vehicle-mounted terminal device, industrial control terminal device, user equipment (UE) unit, UE station, mobile station, remote station, remote terminal device, mobile device, UE terminal device, wireless communication device, UE agent or UE device, etc. For example, terminal devices can be mobile phones, tablets, desktop computers, laptops, all-in-one computers, in-vehicle terminals, virtual reality (VR) terminal devices, augmented reality (AR) terminal devices, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical surgery, wireless terminals in smart grids, wireless terminals in transportation safety, wireless terminals in smart cities, wireless terminals in smart homes, cellular phones, cordless phones, session initiation protocol (SIP) phones, wireless local loop (WLL) stations, personal digital assistants (PDAs), handheld devices with wireless communication capabilities, computing devices or other processing devices connected to a wireless modem, wearable devices, user equipment in future mobile communication networks, or terminal devices in future evolved public land mobile networks (PLMNs).
[0062] Optionally, the aforementioned communication system may further include a server, which can be used to assist the terminal device in executing the technical solutions proposed in this application. For example, Figure 1 is a schematic diagram of a communication system architecture provided in an embodiment of this application. As shown in Figure 1, the communication system includes a terminal device 101 and a server 102. The terminal device 101 and the server 102 are connected through a network, which can be a wired network, a wireless network, etc. In this application, the server 102 can send weighting values for the amplitude attenuation coefficients to the terminal device 101, so that the terminal device 101 can obtain the single-frame amplitude attenuation coefficient of each frame signal according to the weighting values.
[0063] It should be noted that server 102 can be an independent physical server, a server cluster or distributed system consisting of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, content delivery networks (CDNs), and big data and artificial intelligence platforms.
[0064] The audio processing method, apparatus, chip, module device, and storage medium proposed in this application are described in detail below with reference to Figures 2 to 6.
[0065] Please refer to Figure 2. Figure 2 is a schematic flowchart of an audio processing method provided in an embodiment of this application, which includes steps S201 to S205. The method shown in Figure 2 can be executed by a terminal device or a chip within a terminal device, as described above. This application does not limit the execution subject of the method; the corresponding execution subject can be any device, chip, or software module capable of implementing the method, or a combination thereof. For ease of subsequent description, this application uses a terminal device as an example for illustration. Wherein:
[0066] S201. The terminal device acquires dual-channel audio collected by the dual-microphone array.
[0067] In this application, the microphone array configured in the terminal device is a dual-microphone array, which refers to an array composed of two microphones arranged in a preset manner. Dual-channel audio includes the channel audio collected by the two microphones respectively. For example, a dual-microphone array includes a first microphone and a second microphone, and dual-channel audio includes the first channel audio collected by the first microphone and the second channel audio collected by the second microphone.
[0068] Optionally, the dual-channel audio can be audio captured in real time, audio stored in the terminal device, or audio obtained by the terminal device from other devices in real time. This application does not restrict the source of the dual-channel audio.
[0069] S202. The terminal device processes the dual-channel audio and obtains the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set.
[0070] In this context, each frame of the dual-channel audio signal refers to the frequency domain signal of each frame of the dual-channel audio. The first frequency point set is a subset of the second frequency point set, which is the set of frequency points obtained after performing a Fourier transform on the dual-channel audio. The amplitude attenuation coefficient of each frame of the dual-channel audio signal at each frequency point in the first frequency point set is used to either enhance or suppress the amplitude of each frame of the dual-channel audio signal at each frequency point in the first frequency point set.
[0071] In one possible implementation, the terminal device processes dual-channel audio to obtain the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set. Specifically, this includes: performing frame segmentation, windowing, and Fourier transform on the dual-channel audio to obtain each frame signal of the dual-channel audio; determining the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set; and based on the relationship between the phase difference and the amplitude attenuation coefficient, and the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, obtaining the amplitude attenuation coefficients of each frame signal of the first channel audio and the amplitude attenuation coefficients of each frame signal of the second channel audio at each frequency point in the first frequency point set.
[0072] In this application, the terminal device can perform framing, windowing, and Fourier transform on the first and second audio channels of the dual-channel audio, respectively, to obtain each frame signal of the dual-channel audio. Each frame signal of the dual-channel audio includes each frame signal of the first channel and each frame signal of the second channel. For example, the windowing process involved in this process can use any of the following window functions: rectangular window, sine window, Hanning window, Hamming window, Tukey window, etc. For example, the Fourier transform in this process includes, but is not limited to, Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), etc.
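The framing, windowing, and Fourier transform step can be sketched per channel as below. This is a standard-library illustration under stated assumptions: a Hann window (one of the windows listed above), a direct DFT rather than an FFT, and a frame length, hop size, and test tone that are chosen here only for the example.

```python
import cmath
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct DFT of a real frame, keeping the non-negative frequency points only."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def frames_to_spectra(samples: Sequence[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Split one channel into overlapping frames, window each frame, and transform it."""
    window = hann_window(frame_len)
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        windowed = [samples[start + i] * window[i] for i in range(frame_len)]
        spectra.append(dft(windowed))
    return spectra


# Illustrative input: a 1 kHz tone sampled at 8 kHz; channel 2 lags channel 1 by one sample.
ch1 = [math.sin(2.0 * math.pi * t / 8.0) for t in range(256)]
ch2 = [math.sin(2.0 * math.pi * (t - 1) / 8.0) for t in range(256)]

spec_ch1 = frames_to_spectra(ch1, frame_len=64, hop=32)
spec_ch2 = frames_to_spectra(ch2, frame_len=64, hop=32)
peak_bin = max(range(len(spec_ch1[0])), key=lambda k: abs(spec_ch1[0][k]))
```

Each channel yields a list of frames, and each frame holds one complex value per frequency point; in practice an FFT routine would replace the direct DFT for speed.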
[0073] Then, in this method, the terminal device can obtain the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set. For example, suppose the terminal device performs framing on the dual-channel audio and obtains N frames for the first channel audio and N frames for the second channel audio, and the first frequency point set includes M frequency points, each corresponding to a frequency. The phase of the k-th frame signal of the first channel audio at the w-th frequency point in the first frequency point set is $\theta_1(k,w)$, and the phase of the k-th frame signal of the second channel audio at the w-th frequency point in the first frequency point set is $\theta_2(k,w)$, where k is an integer with 1 ≤ k ≤ N and w is an integer with 1 ≤ w ≤ M. The phase difference $\theta_k(w)$ of the k-th frame signal of the dual-channel audio at the w-th frequency point in the first frequency point set can be obtained from the following formula:
[0074] $\theta_k(w) = |\theta_1(k,w) - \theta_2(k,w)|$ (Formula 1)
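Formula 1 can be evaluated directly from the complex spectra of one frame, as in the following sketch. The function name and the example values are hypothetical. The difference is taken exactly as written in Formula 1; any wrapping of the result into a principal range is not described in the text and is therefore omitted.

```python
import cmath
from typing import List, Sequence


def phase_difference(frame_ch1: Sequence[complex], frame_ch2: Sequence[complex]) -> List[float]:
    """Absolute phase difference between the two channels at each frequency point."""
    return [abs(cmath.phase(a) - cmath.phase(b)) for a, b in zip(frame_ch1, frame_ch2)]


# Hypothetical k-th frame restricted to three frequency points of the first set,
# written as (magnitude, phase) pairs for readability.
frame_ch1 = [cmath.rect(1.0, 0.3), cmath.rect(2.0, -1.0), cmath.rect(0.5, 2.0)]
frame_ch2 = [cmath.rect(1.0, 0.1), cmath.rect(2.0, -1.0), cmath.rect(0.5, 1.5)]

theta = phase_difference(frame_ch1, frame_ch2)
```

Here the second frequency point has identical phases in both channels, so its phase difference is zero, while the other two points differ by roughly 0.2 and 0.5 radians.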
[0075] Furthermore, the terminal device can obtain the amplitude attenuation coefficient at each frequency point in the first frequency point set for each frame of the dual-channel audio signal based on the relationship between the phase difference and the amplitude attenuation coefficient, and the phase difference obtained above. For example, Formula 2 shows the correspondence between the phase difference and the amplitude attenuation coefficient:
[0076]
[0077] Here, $\eta_k(w)$ is the amplitude attenuation coefficient of the k-th frame signal of the dual-channel audio at the w-th frequency point in the first frequency point set, the value of $\eta_k(w)$ does not exceed 1, and γ is a fixed constant. From Formula 2, it can be seen that when $\theta_k(w)$ is 0, there is no phase difference between the first channel audio and the second channel audio at the w-th frequency point of the k-th frame signal; in this case, Formula 2 gives $\eta_k(w) = 1$, meaning that the signal amplitude is neither amplified nor attenuated. The larger $\theta_k(w)$ is, the greater the phase difference between the first channel audio and the second channel audio at the w-th frequency point of the k-th frame signal; in this case, Formula 2 gives a smaller $\eta_k(w)$, which corresponds to greater attenuation of the signal amplitude. In dual-channel audio, the deviation between the direction of the part of interest (the target direction) and the direction of the noise (a non-target direction) is related to the phase difference of the dual-channel audio signal: the greater the deviation, the greater the phase difference. Therefore, when the deviation is large, the resulting smaller $\eta_k(w)$ suppresses the noise more effectively.
[0078] Based on the above description, the amplitude attenuation coefficients of the first channel audio and the second channel audio at the w-th frequency point of the k-th frame signal are both $\eta_k(w)$.
[0079] In another possible implementation, the terminal device processes dual-channel audio to obtain the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set. Specifically, this includes: performing frame segmentation, windowing, and Fourier transform on the dual-channel audio to obtain each frame signal of the dual-channel audio; determining the signal-to-noise ratio (SNR) of each frame signal of the first channel audio at each frequency point in the first frequency point set, and the SNR of each frame signal of the second channel audio at each frequency point in the first frequency point set; obtaining the amplitude attenuation coefficient of each frame signal of the first channel audio at each frequency point in the first frequency point set based on the relationship between the SNR and the amplitude attenuation coefficient and the SNR of each frame signal of the first channel audio at each frequency point in the first frequency point set; and obtaining the amplitude attenuation coefficient of each frame signal of the second channel audio at each frequency point in the first frequency point set based on the relationship between the SNR and the amplitude attenuation coefficient and the SNR of each frame signal of the second channel audio at each frequency point in the first frequency point set.
[0080] In this method, after the terminal device obtains each frame of the dual-channel audio signal, it can determine the signal-to-noise ratio (SNR) of each frame at each frequency point in the first frequency point set. For example, if the SNR of the k-th frame of the first channel audio signal at the w-th frequency point in the first frequency point set is $R_1(k,w)$, and the SNR of the k-th frame of the second channel audio signal at the w-th frequency point in the first frequency point set is $R_2(k,w)$, then the amplitude attenuation coefficients of the k-th frame of the first channel audio and the second channel audio at the w-th frequency point in the first frequency point set can be obtained according to the following Formula 3:
[0081]
[0082] Here, when calculating the amplitude attenuation coefficient of the k-th frame signal of the first channel audio at the w-th frequency point according to Formula 3, R_k(w) takes the value R1(k,w); when calculating the amplitude attenuation coefficient of the k-th frame signal of the second channel audio at the w-th frequency point according to Formula 3, R_k(w) takes the value R2(k,w). As Formula 3 shows, the greater the noise in the audio, the lower the signal-to-noise ratio R_k(w), the smaller the value of η_k(w) obtained according to Formula 3, and the greater the attenuation of the signal amplitude.
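The monotonic behavior described above (lower SNR, smaller coefficient, stronger attenuation) can be illustrated with a short sketch. Formula 3 itself is not reproduced here; the Wiener-style rule R / (1 + R) used below is an assumed stand-in that merely shares this monotonic property, and the function name and sample SNR values are hypothetical.

```python
def attenuation_from_snr(snr_linear: float) -> float:
    """Map a linear-scale SNR (>= 0) to a gain in [0, 1).

    Assumption: a Wiener-style rule R / (1 + R) stands in for Formula 3.
    """
    if snr_linear < 0.0:
        raise ValueError("linear SNR must be non-negative")
    return snr_linear / (1.0 + snr_linear)


# Hypothetical SNRs R1(k, w) of one frame of the first channel at four frequency points.
r1_frame = [9.0, 1.0, 0.25, 0.0]
eta1_frame = [attenuation_from_snr(r) for r in r1_frame]
```

The same mapping would be applied to R2(k,w) to obtain the coefficients of the second channel audio.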
[0083] The following explains why the first frequency point set is a subset of the second frequency point set:
[0084] Optionally, if the first frequency point set is the same as the second frequency point set, the terminal device needs to obtain, for each frame signal of the dual-channel audio, the amplitude attenuation coefficients at all frequency points obtained after the Fourier transform. For example, for each frame signal of the dual-channel audio, a total of M amplitude attenuation coefficients at M frequency points are needed.
[0085] Optionally, if the first frequency point set is a subset of the selected frequency points from the second frequency point set, the terminal device does not need to obtain the amplitude attenuation coefficients at all frequency points after the Fourier transform; it only needs to process a subset of the selected frequency points, which can effectively reduce the amount of data processing required to obtain the amplitude attenuation coefficients. In one possible implementation, the terminal device can perform a nonlinear transformation on each frequency point in the second frequency point set to obtain the nonlinear frequency points corresponding to each frequency point in the second frequency point set; determine the nonlinear frequency points that meet preset conditions among the nonlinear frequency points corresponding to each frequency point in the second frequency point set as target nonlinear frequency points; and determine the frequency points corresponding to the target nonlinear frequency points before the nonlinear transformation as the frequency points in the first frequency point set.
[0086] For example, when the terminal device performs a nonlinear transformation on each frequency point in the second frequency point set to obtain the nonlinear frequency points, it can nonlinearly transform each frequency point in the second frequency point set to the Mel scale, the Bark scale, or the equivalent rectangular bandwidth (ERB) scale. Since, after the nonlinear transformation, the frequency of each frequency point in the second frequency point set changes linearly with the sensitivity of the human ear, the frequency points that meet a preset condition can be selected as the target nonlinear frequency points. For example, a frequency point that meets the preset condition is a frequency point in a frequency band where the sensitivity of the human ear exceeds a preset threshold, such as the band from 1000 Hz to 3000 Hz.
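The selection of the first frequency point set can be sketched as below, taking the Mel scale as the nonlinear transformation. This is an illustrative sketch under stated assumptions: the Mel formula is the commonly used 2595 * log10(1 + f / 700) form, the preset condition is taken to be "the Mel value lies within the band corresponding to 1000 Hz to 3000 Hz", and the function names, sample rate, and FFT size are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Common Mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def select_first_set(sample_rate: float, fft_size: int,
                     low_hz: float = 1000.0, high_hz: float = 3000.0) -> list[int]:
    """Return indices of FFT bins whose Mel value lies in the target Mel band."""
    mel_low, mel_high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    selected = []
    for s in range(fft_size // 2 + 1):          # non-negative frequencies only
        mel = hz_to_mel(s * sample_rate / fft_size)
        if mel_low <= mel <= mel_high:           # assumed "preset condition"
            selected.append(s)                   # keep the pre-transform bin index
    return selected


first_set = select_first_set(sample_rate=16000.0, fft_size=64)
```

With a 250 Hz bin spacing, the selected indices run from the 1000 Hz bin to the 3000 Hz bin, so amplitude attenuation coefficients are computed for 9 frequency points instead of all 33.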
[0087] S203. For each frame of dual-channel audio signal, the terminal device determines the single-frame amplitude attenuation coefficient of each frame signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set.
[0088] In one possible implementation, for each frame of the dual-channel audio signal, the terminal device determines the single-frame amplitude attenuation coefficient of each frame signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set, including: determining the single-frame amplitude attenuation coefficient of each frame of the first channel audio signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set; and determining the single-frame amplitude attenuation coefficient of each frame of the second channel audio signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set.
[0089] In this embodiment, the weighting values for calculating the single-frame amplitude attenuation coefficient can be determined using a window function, including rectangular windows, triangular windows, etc. Taking a rectangular window as an example, the single-frame amplitude attenuation coefficient of the k-th frame signal of the first channel audio / second channel audio can be obtained by the following formula:
[0090]
[0091] Where M is the total number of frequency points in the first frequency point set. As can be seen from Formula 4, when using a rectangular window to obtain the single-frame amplitude attenuation coefficient corresponding to a frame of signal, the single-frame amplitude attenuation coefficient is equal to the average value of the amplitude attenuation coefficients corresponding to M frequency points in that frame of signal.
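Consistent with the description above, the weighted average can be written as follows. The symbols $\bar{\eta}_k$ (single-frame amplitude attenuation coefficient of the k-th frame signal) and $c_w$ (weighting value of the w-th frequency point) are notation assumed here for illustration; $\eta_k(w)$ is the per-frequency-point coefficient discussed with Formula 3.

```latex
\bar{\eta}_k = \frac{\sum_{w=1}^{M} c_w \, \eta_k(w)}{\sum_{w=1}^{M} c_w},
\qquad
\text{rectangular window } (c_w = 1):\quad
\bar{\eta}_k = \frac{1}{M} \sum_{w=1}^{M} \eta_k(w)
```

For instance, with M = 4 and coefficients 0.9, 0.5, 0.2, and 0.0, the rectangular-window result is (0.9 + 0.5 + 0.2 + 0.0) / 4 = 0.4.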
[0092] Specifically, the terminal device can also determine the weighting values for calculating the single-frame amplitude attenuation coefficient according to any of the following methods:
[0093] Optionally, the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set are input into the target model to obtain the weighting values of the amplitude attenuation coefficients of each frame signal.
[0094] The target model can be pre-trained on a server and then sent to the terminal device for storage. During pre-training, the server can calculate the weighted values corresponding to better amplitude attenuation effects for different combinations of amplitude attenuation coefficients at various frequency points. Then, the terminal device can input the amplitude attenuation coefficients of the k-th frame signal from the first / second audio channel at various frequency points in the first frequency set into the stored target model to obtain the weighted values corresponding to the k-th frame signal. These weighted values will be used to obtain the single-frame amplitude attenuation coefficients with better amplitude attenuation effects. For example, the target model may include, but is not limited to, deep learning models such as convolutional neural networks.
[0095] Optionally, the terminal device searches the database for a weighted value of the amplitude attenuation coefficient that matches the amplitude attenuation coefficient at each frequency point in the first frequency point set for each frame of signal.
[0096] The database can be pre-built on the server and sent to the terminal device for storage. The server can build the database through expert experience, machine learning, etc., and the database stores the correspondence between different combinations of amplitude attenuation coefficients at each frequency point and their weighting values. The terminal device can search the database for the combinations of amplitude attenuation coefficients at the frequencies that have the highest similarity to the amplitude attenuation coefficients at each frequency point in the first frequency point set of the k-th frame signal of the first / second channel audio. The weighting value corresponding to the combination of the highest similarity amplitude attenuation coefficients is then used as the weighting value of the k-th frame signal of the first / second channel audio.
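The database lookup can be sketched as a nearest-match search. This is a hypothetical illustration only: the database contents are invented for the example, and Euclidean distance is assumed as the similarity measure (smaller distance meaning higher similarity), which the description does not specify.

```python
import math

# Hypothetical database: each entry pairs a coefficient pattern with its weights.
WEIGHT_DB: list[tuple[list[float], list[float]]] = [
    ([0.9, 0.9, 0.9], [1.0, 1.0, 1.0]),
    ([0.9, 0.5, 0.1], [0.5, 1.0, 0.5]),
    ([0.1, 0.1, 0.1], [1.0, 2.0, 1.0]),
]


def lookup_weights(coeffs: list[float]) -> list[float]:
    """Return the weights of the stored pattern closest to coeffs.

    Assumption: similarity is measured by Euclidean distance.
    """
    _, best_weights = min(WEIGHT_DB, key=lambda entry: math.dist(entry[0], coeffs))
    return best_weights


def weighted_average(coeffs: list[float], weights: list[float]) -> float:
    """Weighted mean of the per-frequency-point coefficients."""
    return sum(c * w for c, w in zip(coeffs, weights)) / sum(weights)


frame_coeffs = [0.8, 0.6, 0.2]
weights = lookup_weights(frame_coeffs)
single_frame_coeff = weighted_average(frame_coeffs, weights)
```

Here the second stored pattern is the closest match, so its weighting values are used to compute the single-frame amplitude attenuation coefficient of the frame.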
[0097] S204. For each frame of dual-channel audio signal, the terminal device performs amplitude attenuation processing on each frequency point in the second frequency point set of each frame signal based on the single-frame amplitude attenuation coefficient of each frame signal to obtain the attenuated signal of each frame of dual-channel audio.
[0098] In this embodiment, the attenuation signal per frame of the dual-channel audio includes the attenuation signal per frame of the first channel audio and the attenuation signal per frame of the second channel audio. The terminal device can perform amplitude attenuation based on the single-frame amplitude attenuation coefficient of each frame signal of the first channel audio to obtain the attenuation signal per frame of the first channel audio; and perform amplitude attenuation based on the single-frame amplitude attenuation coefficient of each frame signal of the second channel audio to obtain the attenuation signal per frame of the second channel audio.
[0099] For example, taking the k-th frame signal of the first channel audio as an example, if the second frequency point set includes S frequency points, the attenuated amplitude y1′(k,s) of the k-th frame signal of the first channel audio at the s-th frequency point in the second frequency point set can be obtained according to the following formula:
[0100]
[0101] Where s is a positive integer greater than or equal to 1 and less than or equal to S, and y1(k,s) is the amplitude of the k-th frame signal of the first channel audio at the s-th frequency point in the second frequency point set before attenuation. In other words, after obtaining the single-frame amplitude attenuation coefficient corresponding to the k-th frame signal, the amplitude corresponding to each frequency point can be multiplied by the single-frame amplitude attenuation coefficient corresponding to that frame.
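The multiplication described above can be sketched in a few lines. The spectrum values, the coefficient 0.5, and the function name `attenuate_frame` are hypothetical and serve only to show that a single coefficient per frame leaves the relative levels of the frequency points unchanged.

```python
def attenuate_frame(spectrum: list[complex], single_frame_coeff: float) -> list[complex]:
    """Multiply every frequency point of one frame by the same coefficient."""
    return [single_frame_coeff * y for y in spectrum]


# Hypothetical spectrum y1(k, s) of one frame with S = 4 frequency points.
y1_frame = [4.0 + 0.0j, 0.0 + 2.0j, -1.0 + 0.0j, 0.5 - 0.5j]
y1_attenuated = attenuate_frame(y1_frame, 0.5)

# Ratios between bins are unchanged, so the spectral shape is preserved.
ratio_before = abs(y1_frame[0]) / abs(y1_frame[1])
ratio_after = abs(y1_attenuated[0]) / abs(y1_attenuated[1])
```

By contrast, applying a different coefficient to each frequency point would change these ratios, which is the non-uniform attenuation this application aims to avoid.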
[0102] Based on this method, each frame of the first / second audio channel signal can be attenuated uniformly without distortion after amplitude attenuation.
[0103] S205. The terminal device performs beamforming and inverse Fourier transform on the attenuation signal of each frame of the dual-channel audio to obtain the enhanced dual-channel audio.
[0104] In this embodiment, the terminal device can combine the attenuation signals of each frame of the first channel audio and the attenuation signals of each frame of the second channel audio based on the beam weights obtained in beamforming, thereby obtaining each frame signal of the enhanced dual-channel audio. For example, the beam weights can be obtained using any of several methods, such as delay-sum beamforming (DSB) and minimum variance distortionless response (MVDR). These beam weights can enhance signals in the target direction (such as human voice) and suppress signals in non-target directions (such as ambient noise) in the dual-channel audio. Then, the terminal device can perform an inverse Fourier transform on each frame signal of the enhanced dual-channel audio to obtain the enhanced dual-channel audio.
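As an illustration of combining the two attenuated channels with beam weights, the following sketch applies delay-sum beamforming per frequency point. It is a simplified example under stated assumptions: the target's arrival delay at the second microphone relative to the first is assumed known, the function names and sample values are hypothetical, the result is one combined spectrum per frame, and the inverse Fourier transform is omitted.

```python
import cmath
import math


def dsb_weights(freq_hz: float, delay_s: float) -> tuple[complex, complex]:
    """Delay-and-sum weights for two microphones at one frequency.

    delay_s is the assumed arrival delay of the target at microphone 2
    relative to microphone 1.
    """
    return 0.5 + 0.0j, 0.5 * cmath.exp(2j * math.pi * freq_hz * delay_s)


def beamform_frame(ch1: list[complex], ch2: list[complex],
                   bin_freqs_hz: list[float], delay_s: float) -> list[complex]:
    """Combine the attenuated spectra of both channels bin by bin."""
    out = []
    for y1, y2, f in zip(ch1, ch2, bin_freqs_hz):
        w1, w2 = dsb_weights(f, delay_s)
        out.append(w1 * y1 + w2 * y2)
    return out


# Hypothetical frame: channel 2 is channel 1 delayed by 0.1 ms (a target-direction source).
freqs = [0.0, 500.0, 1000.0, 1500.0]
delay = 1e-4
ch1 = [1.0 + 0.0j, 0.5 + 0.5j, 0.0 - 1.0j, 0.25 + 0.0j]
ch2 = [y * cmath.exp(-2j * math.pi * f * delay) for y, f in zip(ch1, freqs)]
enhanced = beamform_frame(ch1, ch2, freqs, delay)
```

A signal arriving from the target direction is passed unchanged, while signals arriving with other delays add out of phase and are reduced. MVDR weights would replace `dsb_weights` with weights derived from a noise covariance estimate.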
[0105] Optionally, when the terminal device uses an array of more than two microphones, the amplitude attenuation coefficient of each frame of audio signal captured by each microphone at each frequency point in the first frequency point set can be determined based on the signal-to-noise ratio calculation method mentioned above. Then, the amplitude attenuation coefficients of each frame of audio signal captured by each microphone at each frequency point in the first frequency point set are weighted and averaged to obtain the single-frame amplitude attenuation coefficient of each frame of audio signal captured by each microphone. As in the above description, the single-frame amplitude attenuation coefficient can then be used to uniformly attenuate each frame of signal, and the enhanced multi-channel audio can be obtained based on beamforming and inverse Fourier transform.
[0106] Based on the embodiment described in Figure 2, the terminal device can enhance the portion of interest (the target part) of the dual-channel audio and suppress the portion of no interest and/or the interfering parts through amplitude attenuation and beamforming. Furthermore, the terminal device can perform a weighted average of the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set to obtain a single-frame amplitude attenuation coefficient for each frame signal, and perform amplitude attenuation on each frequency point obtained after the Fourier transform based on the single-frame amplitude attenuation coefficient. This ensures that each frame signal is attenuated uniformly across the entire spectrum and does not become distorted by the amplitude attenuation, thus reducing distortion in the enhanced dual-channel audio.
[0107] The following uses Figure 3 as an example to illustrate the effect of performing amplitude attenuation with a single-frame amplitude attenuation coefficient in this application. As shown in Figure 3, spectrogram 310 is the frequency-amplitude diagram of one channel of audio in the dual-channel audio before amplitude attenuation processing. After processing this channel of audio with a different amplitude attenuation coefficient at each frequency point, spectrogram 320 is obtained. Comparing spectrograms 310 and 320, region 301 of spectrogram 310 appears granular and discontinuous in spectrogram 320, indicating severe distortion in this channel of audio. Severely distorted audio sounds unnatural. By first calculating the single-frame amplitude attenuation coefficient of this channel of audio using the method described in this application, and then applying the single-frame amplitude attenuation coefficient for amplitude attenuation, spectrogram 330 is obtained. In spectrogram 330, the spectrum of region 301 no longer appears granular; the spectrum is clearer and more continuous, indicating that the distortion of this channel of audio has been suppressed. Therefore, based on the scheme proposed in this application, distortion in audio enhancement can be effectively reduced, and the naturalness of the enhanced audio can be improved.
[0108] Please refer to Figure 4, which is a schematic diagram of a communication device provided in an embodiment of the present invention. The communication device can be a terminal device or a device with terminal device functions (such as a chip). Specifically, as shown in Figure 4, the communication device 400 may include:
[0109] Acquisition unit 401 is used to acquire dual-channel audio collected by a dual-microphone array;
[0110] Processing unit 402 is used to process dual-channel audio and obtain the amplitude attenuation coefficient of each frame signal of dual-channel audio at each frequency point in the first frequency point set.
[0111] The processing unit 402 is also used to determine the single-frame amplitude attenuation coefficient of each frame signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set for each frame signal of the dual-channel audio.
[0112] The processing unit 402 is also used to perform amplitude attenuation processing on each frequency point in the second frequency point set of each frame signal based on the single frame amplitude attenuation coefficient of each frame signal for each frame signal of the dual-channel audio, so as to obtain the attenuated signal of each frame of the dual-channel audio, wherein the first frequency point set is a subset of the second frequency point set, and the second frequency point set is the frequency point set obtained after performing Fourier transform on the dual-channel audio.
[0113] The processing unit 402 is also used to perform beamforming and inverse Fourier transform on the attenuation signal of each frame of the dual-channel audio to obtain the enhanced dual-channel audio.
[0114] In one possible implementation, when processing the dual-channel audio and obtaining the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, the processing unit 402 is specifically used for:
[0115] The dual-channel audio is framed, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio.
[0116] Determine the phase difference of each frame of the dual-channel audio signal at each frequency point in the first frequency point set;
[0117] Based on the relationship between phase difference and amplitude attenuation coefficient, and the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame signal of the first channel audio in the dual-channel audio at each frequency point in the first frequency point set, and the amplitude attenuation coefficient of each frame signal of the second channel audio in the dual-channel audio at each frequency point in the first frequency point set are obtained.
[0118] In one possible implementation, when processing the dual-channel audio and obtaining the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, the processing unit 402 is specifically used for:
[0119] The dual-channel audio is framed, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio.
[0120] Determine the signal-to-noise ratio (SNR) of each frame of the first audio channel in the dual-channel audio at each frequency point in the first frequency point set, and the SNR of each frame of the second audio channel in the dual-channel audio at each frequency point in the first frequency point set;
[0121] Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the first channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the first channel audio signal at each frequency point in the first frequency point set is obtained.
[0122] Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the second channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the second channel audio signal at each frequency point in the first frequency point set is obtained.
[0123] In one possible implementation, when the processing unit determines the single-frame amplitude attenuation coefficient of each frame signal as the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set for each frame of the dual-channel audio signal, it specifically performs the following:
[0124] The weighted average of the amplitude attenuation coefficients of each frame of the first channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the first channel audio signal.
[0125] The weighted average of the amplitude attenuation coefficients of each frame of the second channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the second channel audio signal.
[0126] In one possible implementation, the processing unit 402 is further configured to:
[0127] A nonlinear transformation is performed on each frequency point in the second frequency point set to obtain the nonlinear frequency point corresponding to each frequency point in the second frequency point set;
[0128] The nonlinear frequency points that meet the preset conditions among the nonlinear frequency points corresponding to each frequency point in the second frequency point set are determined as the target nonlinear frequency points;
[0129] The frequency point corresponding to the target nonlinear frequency point before the nonlinear transformation is determined as the frequency point in the first frequency point set.
[0130] In one possible implementation, the processing unit 402 is further configured to:
[0131] The amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set are input into the target model to obtain the weighting values of the amplitude attenuation coefficients of each frame signal; the weighting values are used to obtain the single-frame amplitude attenuation coefficient of each frame signal.
[0132] Alternatively, search the database for a weighted value of the amplitude attenuation coefficient that matches the amplitude attenuation coefficient at each frequency point in the first frequency point set for each frame of signal.
[0133] This application also provides a chip that can execute the relevant steps of the terminal device in the foregoing method embodiments. The chip includes a processor and a communication interface, and the processor is configured to enable the chip to execute the method described in Figure 2.
[0134] Please refer to Figure 5, which is a schematic diagram of a terminal device according to an embodiment of the present invention. The terminal device 500 may include a memory 501 and a processor 502. Optionally, it may also include a communication interface 503. The memory 501, processor 502, and communication interface 503 are connected through one or more communication buses. The communication interface 503 is controlled by the processor 502 for sending and receiving information.
[0135] Memory 501 may include read-only memory and random access memory, and provides instructions and data to processor 502. A portion of memory 501 may also include non-volatile random access memory.
[0136] Communication interface 503 is used to receive or send data.
[0137] Processor 502 can be a Central Processing Unit (CPU), but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor; optionally, processor 502 can also be any conventional processor. Wherein:
[0138] Memory 501 is used to store computer programs, which include program instructions.
[0139] Processor 502 is used to call program instructions stored in memory 501.
[0140] The processor 502 calls the program instructions stored in the memory 501, causing the terminal device 500 to execute the method executed by the terminal device in the above method embodiment.
[0141] Please refer to Figure 6, which is a schematic diagram of the structure of a module device provided in an embodiment of this application. The module device 600 can perform the relevant steps of the terminal device in the aforementioned method embodiments. The module device 600 includes: a communication module 601, a power module 602, a storage module 603, and a chip 604.
[0142] The power module 602 is used to provide power to the module device; the storage module 603 is used to store data and instructions; the communication module 601 is used for internal communication within the module device or for communication between the module device and external devices; and the chip 604 is used to execute the method executed by the terminal device in the above method embodiments.
[0143] It should be noted that for any aspects of the devices, chips, terminal devices, and module devices in this application that are not mentioned in the previous embodiments, as well as the specific implementation of each step, please refer to the embodiments shown in Figures 1-2 and the foregoing content, which will not be repeated here.
[0144] This application also provides a computer-readable storage medium storing a computer program, which includes program instructions. When the program instructions are executed on a processor, the method flow of the above method embodiments is implemented.
[0145] This application also provides a computer program, which includes program instructions that, when executed on a computer, cause the computer to perform the method described in the above method embodiments.
[0146] The modules / units included in the various devices and products described in the above embodiments can be software modules / units, hardware modules / units, or a combination of both. For example, for various devices and products applied to or integrated into a chip, all of their modules / units can be implemented using hardware such as circuits, or at least some modules / units can be implemented using software programs that run on the processor integrated within the chip, while the remaining (if any) modules / units can be implemented using hardware such as circuits. For various devices and products applied to or integrated into a chip module, all of their modules / units can be implemented using hardware such as circuits, and different modules / units can be located in the same component (e.g., chip, circuit module, etc.) or in different components of the chip module; alternatively, at least some modules / units can be implemented using software programs that run on the processor integrated within the chip module, while the remaining (if any) modules / units can be implemented using hardware such as circuits. For various devices and products applied to or integrated into a terminal, all of their modules / units can be implemented using hardware such as circuits, and different modules / units can be located in the same component (e.g., chip, circuit module, etc.) or in different components within the terminal; alternatively, at least some modules / units can be implemented using software programs that run on the processor integrated within the terminal, while the remaining (if any) modules / units can be implemented using hardware such as circuits.
[0147] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some operations can be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.
[0148] The descriptions of the various embodiments provided in this application can be referenced mutually. Each embodiment has its own emphasis, and parts not described in detail in a certain embodiment can be referred to the relevant descriptions of other embodiments. For the sake of convenience and brevity, for example, the functions and operations of the various devices and equipment provided in the embodiments of this application can be referred to the relevant descriptions of the method embodiments of this application. The method embodiments and the device embodiments can also be referenced, combined or cited from each other.
[0149] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of this application.
Claims
1. An audio processing method, characterized in that, The method includes: Acquire dual-channel audio from a dual-microphone array; The dual-channel audio is processed to obtain the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set; For each frame of the dual-channel audio signal, the weighted average of the amplitude attenuation coefficients of each frame signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame signal. For each frame of the dual-channel audio signal, amplitude attenuation processing is performed on each frequency point in the second frequency point set of the signal based on the single-frame amplitude attenuation coefficient of each frame to obtain the attenuated signal of each frame of the dual-channel audio. The first frequency point set is a subset of the second frequency point set, and the second frequency point set is the frequency point set obtained after performing Fourier transform on the dual-channel audio. Beamforming and inverse Fourier transform are performed on the attenuation signal of each frame of the dual-channel audio to obtain the enhanced dual-channel audio.
2. The method according to claim 1, characterized in that, The process of processing the dual-channel audio to obtain the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set includes: The dual-channel audio is divided into frames, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio. Determine the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set; Based on the relationship between phase difference and amplitude attenuation coefficient, and the phase difference of each frame signal of the dual-channel audio at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame signal of the first channel audio in the dual-channel audio at each frequency point in the first frequency point set, and the amplitude attenuation coefficient of each frame signal of the second channel audio in the dual-channel audio at each frequency point in the first frequency point set are obtained.
3. The method according to claim 1, characterized in that, The process of processing the dual-channel audio to obtain the amplitude attenuation coefficient of each frame signal of the dual-channel audio at each frequency point in the first frequency point set includes: The dual-channel audio is divided into frames, windowed, and subjected to Fourier transform to obtain the signal of each frame of the dual-channel audio. Determine the signal-to-noise ratio (SNR) of each frame of the first channel audio in the dual-channel audio at each frequency point in the first frequency point set, and the SNR of each frame of the second channel audio in the dual-channel audio at each frequency point in the first frequency point set; Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the first channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the first channel audio signal at each frequency point in the first frequency point set is obtained. Based on the relationship between signal-to-noise ratio and amplitude attenuation coefficient, and the signal-to-noise ratio of each frame of the second channel audio signal at each frequency point in the first frequency point set, the amplitude attenuation coefficient of each frame of the second channel audio signal at each frequency point in the first frequency point set is obtained.
4. The method according to claim 2 or 3, characterized in that, For each frame of the dual-channel audio signal, the weighted average of the amplitude attenuation coefficients at each frequency point in the first frequency point set is used to determine the single-frame amplitude attenuation coefficient of each frame signal, including: The weighted average of the amplitude attenuation coefficients of each frame of the first channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the first channel audio signal. The weighted average of the amplitude attenuation coefficients of each frame of the second channel audio signal at each frequency point in the first frequency point set is determined as the single-frame amplitude attenuation coefficient of each frame of the second channel audio signal.
5. The method according to any one of claims 1-3, characterized in that the method further includes: performing a nonlinear transformation on each frequency point in a second frequency point set to obtain the nonlinear frequency point corresponding to each frequency point in the second frequency point set; determining, among the nonlinear frequency points corresponding to the frequency points in the second frequency point set, those that meet a preset condition as target nonlinear frequency points; and determining the frequency points that correspond to the target nonlinear frequency points before the nonlinear transformation as the frequency points in the first frequency point set.
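The selection in claim 5 can be illustrated with a short sketch. The claim names neither the nonlinear transformation nor the preset condition; the mel scale and a simple range check on the transformed value are assumptions chosen for illustration, as are the function names.

```python
import math


def hz_to_mel(f_hz):
    """ASSUMED nonlinear transformation: the common 2595*log10(1 + f/700) mel formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def select_first_set(second_set_hz, mel_low, mel_high):
    """Keep the original frequency points whose transformed value meets the condition.

    ASSUMED preset condition: mel_low <= mel(f) <= mel_high.
    """
    selected = []
    for f in second_set_hz:
        if mel_low <= hz_to_mel(f) <= mel_high:
            selected.append(f)  # keep the pre-transformation frequency point
    return selected
```

For example, `select_first_set(candidates, hz_to_mel(300), hz_to_mel(3400))` would keep only candidate frequencies between 300 Hz and 3400 Hz, since the mel mapping is monotonic. A different transformation or condition would give a different first frequency point set.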
6. The method according to any one of claims 1-3, characterized in that the method further includes: inputting the amplitude attenuation coefficients of each frame signal of the dual-channel audio at each frequency point in the first frequency point set into a target model to obtain the weight values of the amplitude attenuation coefficients of each frame signal, the weight values being used to obtain the single-frame amplitude attenuation coefficient of each frame signal; or searching a database for the weight values that match the amplitude attenuation coefficients of each frame signal at each frequency point in the first frequency point set.
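The two alternatives in claim 6 for obtaining the weights might look like the sketch below. Neither the target model nor the database format is described in the claim, so both are hypothetical stand-ins: a softmax over the coefficients takes the place of a trained model, and the database is a plain list of `(coefficient_pattern, weights)` records matched by smallest squared distance.

```python
import math


def softmax_weight_model(gains):
    """HYPOTHETICAL stand-in for the target model: larger coefficients get larger weights."""
    exps = [math.exp(g) for g in gains]
    total = sum(exps)
    return [e / total for e in exps]


def lookup_weights(gains, database):
    """Return the weights stored with the closest coefficient pattern.

    database: list of (pattern, weights) tuples; this layout is an assumption.
    """
    def squared_distance(record):
        pattern = record[0]
        return sum((p - g) ** 2 for p, g in zip(pattern, gains))

    return min(database, key=squared_distance)[1]


def weighted_average(gains, weights):
    """Single-frame coefficient from per-bin coefficients and their weights."""
    return sum(g * w for g, w in zip(gains, weights)) / sum(weights)
```

Either source of weights feeds the same weighted average, so the two alternatives are interchangeable from the point of view of the later steps.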
7. A communication device, characterized in that it includes units for implementing the method of any one of claims 1-6.
8. A chip, characterized in that it includes a processor and a communication interface, the processor being configured to cause the chip to perform the method as described in any one of claims 1-6.
9. A module device, characterized in that the module device includes a communication module, a power module, a storage module, and a chip, wherein: the power module is used to provide power to the module device; the storage module is used to store data and instructions; the communication module is used for internal communication within the module device, or for communication between the module device and external devices; and the chip is used to perform the method as described in any one of claims 1 to 6.
10. A terminal device, characterized in that the terminal device includes a memory and a processor, the memory being used to store a computer program, the computer program including program instructions, and the processor being configured to invoke the program instructions to perform the method as described in any one of claims 1 to 6.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed on a communication device, cause the communication device to perform the method of any one of claims 1 to 6.