Stereo coding method and device

A stereo encoding technology applied in the multimedia field, which addresses the problems that a realistic restoration cannot be achieved, the listening experience is uncomfortable, and recovery requirements cannot be met, and achieves the effects of improving coding efficiency and enhancing the sound field.

Inactive Publication Date: 2013-10-23
HUAWEI TECH CO LTD
0 Cites 14 Cited by

AI-Extracted Technical Summary

Problems solved by technology

ILD is a ubiquitous signal characteristic parameter that reflects the sound field of a signal and captures its energy distribution well. However, stereo sound usually also contains background space and sound field information in the left and right directions, so transmitting only ILD is not enough to realistically restore the original stereo signal. Schemes were therefore proposed that transmit more parameters to better restore the stereo signal: in addition to extracting the basic ILD parameters, they transmit the inter-channel phase difference (IPD) between the left and right channels and the inter-channel cross-correlation (ICC) parameters, and sometimes the phase difference (OPD) between the left channel and the downmix signal. These parameters, which reflect the background space of the stereo signal and the sound field information in the left and right directions, are encoded together with the ILD parameters as side information and sent to the decoder to restore the stereo signal.
[0003] Coding bit rate is one of the important evaluation factors of multimedia signal coding performance. Th...

Method used

In the stereo encoding method of the embodiment of the present invention, the left- and right-channel signals in the frequency domain are used to estimate the group delay and group phase between the left and right channels of the stereo signal, which reflect the global orientation information of the signal, so that the orientation information of the sound field effectively enhances the spatial characteristic parameters of the stereo signal. The estimation of the group delay and group phase is applied to stereo coding at little additional bit-rate cost, so that the spatial information and the global orientation information can be effectively combined to obtain more accurate sound field information, enhance the sound field effect, and greatly improve the coding efficiency.
In the stereo coding device of the embodiment of the present invention, the left- and right-channel signals in the frequency domain are likewise used to estimate the group delay and group phase between the left and right channels of the stereo signal, which reflect the global orientation information of the signal, so that the orientation information of the sound field effectively enhances the spatial characteristic parameters of the stereo signal. The estimation of the group delay and group phase is applied to stereo coding at little additional bit-rate cost, so that the spatial information and the global orientation information can be effectively combined to obtain more accurate sound field information, enhance the sound field effect, and greatly improve the coding efficiency.
Wherein α and β are weighting constants, 0 ≤ α ≤ 1 and β = 1 − α; in the present embodiment, before estimating the group delay and group phase, the mutual weighting of the left and right channels that is obtained ...

Abstract

The embodiment of the invention relates to a stereo coding method, which comprises the following steps: transforming the time-domain left-channel and right-channel signals of a stereo signal into the frequency domain to form a left-channel signal and a right-channel signal in the frequency domain; down-mixing the left-channel and right-channel signals in the frequency domain to generate a mono downmix signal, and transmitting the bits of the encoded and quantized downmix signal; extracting spatial parameters of the left-channel and right-channel signals in the frequency domain; estimating a group delay and a group phase between the left and right channels of the stereo signal by using the left-channel and right-channel signals in the frequency domain; and quantizing and encoding the group delay, the group phase, and the spatial parameters, so as to achieve high stereo coding performance at a low bit rate.

Application Domain

Speech analysis

Technology Topic

Stereophonic sound, Frequency domain +4

Image

  • Stereo coding method and device

Examples

  • Experimental program(9)

Example Embodiment

[0031] Example 1:
[0032] Figure 1 is a schematic diagram of an implementation of a stereo encoding method, which includes:
[0033] Step 101: Transform the left channel signal and the right channel signal of the time domain stereo to the frequency domain to form the left channel signal and the right channel signal in the frequency domain.
[0034] Step 102: Down-mix the left-channel and right-channel frequency-domain signals to generate a mono downmix signal (DMX), transmit the bits obtained by encoding and quantizing the DMX signal, and quantize and encode the extracted spatial parameters of the left-channel and right-channel signals in the frequency domain.
[0035] Spatial parameters are parameters representing spatial characteristics of stereo signals, such as ILD parameters.
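As an illustration only (the text names ILD merely as an example of a spatial parameter and does not fix a formula), ILD is commonly computed per sub-band as the log ratio of left- and right-channel energies. A minimal Python/NumPy sketch under that assumption, with the band layout `bands` being a hypothetical choice:

```python
import numpy as np

def extract_ild(X1, X2, bands):
    """Per-band inter-channel level difference (ILD) in dB.

    X1, X2: complex frequency-domain left/right channel signals.
    bands:  list of (start, stop) frequency-bin index pairs (hypothetical layout).
    Uses the common log-energy-ratio definition of ILD; the patent text itself
    only names ILD as an example of a spatial parameter.
    """
    eps = 1e-12  # guard against log of zero
    ild = []
    for lo, hi in bands:
        e1 = np.sum(np.abs(X1[lo:hi]) ** 2) + eps
        e2 = np.sum(np.abs(X2[lo:hi]) ** 2) + eps
        ild.append(10.0 * np.log10(e1 / e2))
    return np.array(ild)
```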
[0036] Step 103: Use the left and right channel signals in the frequency domain to estimate the group delay (Group Delay) and the group phase (Group Phase) between the left channel signal and the right channel signal in the frequency domain.
[0037] The group delay reflects the global orientation information of the time delay of the envelope between the stereo left and right channels, and the group phase reflects the global information of the similarity of the waveforms of the stereo left and right channels after time alignment.
[0038] Step 104: Quantize and encode the estimated group delay and group phase.
[0039] The group delay and group phase are quantized and encoded to form the content of the side information stream to be transmitted.
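Putting steps 101 to 104 together, the sketch below shows one possible end-to-end frame encoder. The FFT as the time-frequency transform, the simple average downmix, the broadband ILD, and the cross-correlation based group delay/phase estimate (detailed in Embodiment 2) are assumptions, not requirements of the text, and the actual quantization and bitstream writing are omitted:

```python
import numpy as np

def stereo_encode_frame(left, right, n_fft=1024):
    """Sketch of steps 101-104 for one frame of time-domain samples."""
    # Step 101: time-to-frequency transform (a plain FFT is assumed here;
    # the text only requires a time-frequency transform of length N).
    X1 = np.fft.fft(left, n_fft)
    X2 = np.fft.fft(right, n_fft)

    # Step 102: mono downmix DMX (average used as an assumed downmix rule)
    # plus a spatial parameter (broadband ILD, purely as an example).
    DMX = 0.5 * (X1 + X2)
    ild_db = 10.0 * np.log10((np.sum(np.abs(X1) ** 2) + 1e-12) /
                             (np.sum(np.abs(X2) ** 2) + 1e-12))

    # Step 103: group delay / group phase from the time-domain cross-correlation.
    half = n_fft // 2
    Cr = np.zeros(n_fft, dtype=complex)
    Cr[:half + 1] = X1[:half + 1] * np.conj(X2[:half + 1])
    cr = np.fft.ifft(Cr)
    idx = int(np.argmax(np.abs(cr)))
    d_g = idx if idx <= half else idx - n_fft
    theta_g = float(np.angle(cr[idx]))  # same value the piecewise rule selects

    # Step 104: DMX, ild_db, d_g and theta_g would now be quantized and
    # encoded as the downmix bitstream plus side information.
    return DMX, ild_db, d_g, theta_g
```

For example, when the right channel is a circularly delayed copy of a broadband left channel (right = np.roll(left, 3)), this sketch yields a group delay of -3 under the sign convention used above.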
[0040] In the stereo encoding method of the embodiment of the present invention, the group delay and group phase are estimated while the spatial characteristic parameters of the stereo signal are extracted, and the estimated group delay and group phase are applied to the stereo encoding, so that the spatial parameters and the global orientation information are effectively combined. Through this global orientation estimation, more accurate sound field information can be obtained at a low bit rate, which enhances the sound field effect and greatly improves the coding efficiency.

Example Embodiment

[0041] Embodiment 2:
[0042] Figure 2 is a schematic diagram of another stereo coding method embodiment, which includes:
[0043] Step 201: Transform the time-domain stereo left-channel and right-channel signals to the frequency domain to form the frequency-domain stereo left-channel signal X_1(k) and right-channel signal X_2(k), where k is the frequency-point index of the frequency-domain signal.
[0044] Step 202, performing a downmix operation on the left channel signal and the right channel signal in the frequency domain, encoding and quantizing the downmix signal and transmitting, and encoding stereo spatial parameters, quantizing to form side information and transmitting, which may include the following steps:
[0045] Step 2021, the left channel signal and the right channel signal in the frequency domain are downmixed to generate a synthesized mono downmix signal DMX.
[0046] Step 2022: Encode and quantize the mono downmix signal DMX, and transmit the quantized information.
[0047]Step 2023: Extract the ILD parameters of the left channel signal and the right channel signal in the frequency domain.
[0048] Step 2024: Quantize and encode the ILD parameter to form side information and transmit it.
[0049] Steps 2021 and 2022 and steps 2023 and 2024 do not affect each other and can be executed independently. The side information formed by the former can be multiplexed with the side information formed by the latter and then transmitted.
[0050] In another embodiment, the mono downmix signal obtained by downmixing can further undergo a frequency-to-time transform to obtain a time-domain signal of the mono downmix signal DMX, and the time-domain signal of the DMX is encoded and quantized and the resulting bits are transmitted.
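A minimal sketch of steps 2021 to 2024, assuming an FFT as the time-frequency transform, a simple average as the downmix rule, a broadband ILD as the spatial parameter, and a 1 dB uniform scalar quantizer for the ILD side information; none of these specific choices is prescribed by the text:

```python
import numpy as np

def step_202(x1_time, x2_time, n_fft=1024):
    """Sketch of step 202: downmix, DMX handling, ILD extraction and quantization."""
    # Frequency-domain left/right channel signals X_1(k), X_2(k) (step 201).
    X1 = np.fft.fft(x1_time, n_fft)
    X2 = np.fft.fft(x2_time, n_fft)

    # Step 2021: downmix to a mono signal DMX (simple average is an assumption).
    DMX = 0.5 * (X1 + X2)

    # Step 2022: DMX would be encoded and quantized by a core mono coder here.
    # Per paragraph [0050], it may instead be transformed back to the time
    # domain first and coded there:
    dmx_time = np.fft.ifft(DMX).real

    # Step 2023: extract the ILD parameter (broadband log energy ratio as an example).
    ild_db = 10.0 * np.log10((np.sum(np.abs(X1) ** 2) + 1e-12) /
                             (np.sum(np.abs(X2) ** 2) + 1e-12))

    # Step 2024: quantize the ILD to form side information
    # (1 dB uniform steps are a hypothetical choice).
    ild_index = int(np.round(ild_db))

    return dmx_time, ild_index
```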
[0051] Step 203: Estimate the group delay and the group phase between the left and right channel signals in the frequency domain.
[0052] Using the left- and right-channel signals in the frequency domain to estimate the group delay and group phase between them includes determining the cross-correlation function of the stereo left- and right-channel frequency-domain signals and estimating the group delay and group phase of the stereo signal from that cross-correlation function. As shown in Figure 3, this can specifically include the following steps:
[0053] Step 2031: Determine the cross-correlation function between the stereo left and right channel signals in the frequency domain.
[0054] The cross-correlation function of the stereo left- and right-channel frequency-domain signals can be a weighted cross-correlation function. When determining the cross-correlation function, estimating the group delay and group phase from a weighted cross-correlation tends to give more stable stereo coding results than other operations. The weighted cross-correlation function is a weighting of the product of the left-channel frequency-domain signal and the conjugate of the right-channel frequency-domain signal, and its value at frequency points above half of the transform length N is 0. The cross-correlation function of the stereo left- and right-channel frequency-domain signals can be expressed as follows:
[0055]
$$
C_r(k) = \begin{cases}
W(k)\,X_1(k)\,X_2^*(k), & 0 \le k \le N/2 \\
0, & k > N/2
\end{cases}
$$
[0056] where W(k) represents the weighting function and X_2^*(k) denotes the conjugate of X_2(k); it can also be expressed as C_r(k) = X_1(k)X_2^*(k), 0 ≤ k ≤ N/2+1. In another form of the cross-correlation function, combined with a different weighting, the cross-correlation function of the stereo left- and right-channel frequency-domain signals can be expressed as follows:
[0057]
$$
C_r(k) = \begin{cases}
X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|\,|X_2(k)|\big), & k = 0 \\
2\,X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|\,|X_2(k)|\big), & 1 \le k \le N/2 - 1 \\
X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|\,|X_2(k)|\big), & k = N/2 \\
0, & k > N/2
\end{cases}
$$
[0058] where N is the length of the time-frequency transform of the stereo signal, and |X_1(k)| and |X_2(k)| are the magnitudes of X_1(k) and X_2(k), respectively. At frequency points 0 and N/2 the weighting is the reciprocal of the product of the left- and right-channel amplitudes at the corresponding frequency point, while at the other frequency points it is twice that reciprocal. In other implementations, the weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals can also be expressed in other forms, for example:
[0059]
$$
C_r(k) = \begin{cases}
X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|^2 + |X_2(k)|^2\big), & k = 0 \\
2\,X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|^2 + |X_2(k)|^2\big), & 1 \le k \le N/2 - 1 \\
X_1(k)\,X_2^*(k) \,/\, \big(|X_1(k)|^2 + |X_2(k)|^2\big), & k = N/2 \\
0, & k > N/2
\end{cases}
$$
[0060] In this regard, this embodiment is not limited, and any modification of the above formulas is within the scope of protection.
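The weighted cross-correlation of paragraph [0057] (normalization by the amplitude product, with the factor 2 at the interior frequency points) can be computed as in the sketch below; the alternative weightings of [0055] and [0059] only change the denominator:

```python
import numpy as np

def weighted_cross_correlation(X1, X2):
    """Weighted cross-correlation C_r(k) following the form in paragraph [0057].

    X1, X2: length-N complex spectra of the left/right channels.
    Returns a length-N array that is zero for k > N/2.
    """
    N = len(X1)
    eps = 1e-12                                   # guard against zero magnitudes
    Cr = np.zeros(N, dtype=complex)
    k = np.arange(N // 2 + 1)
    Cr[k] = X1[k] * np.conj(X2[k]) / (np.abs(X1[k]) * np.abs(X2[k]) + eps)
    Cr[1:N // 2] *= 2.0                           # factor 2 at bins 1 .. N/2-1
    return Cr
```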
[0061] Step 2032: Perform an inverse time-frequency transform on the weighted cross-correlation function of the stereo left- and right-channel frequency-domain signals to obtain the time-domain cross-correlation signal C_r(n), where this time-domain signal is a complex signal.
[0062] Step 2033, estimate the group delay and group phase of the stereo signal according to the time domain signal of the cross-correlation function.
[0063] In another embodiment, the group delay and group phase of the stereo signal can be estimated directly according to the cross-correlation function between the stereo left and right channel signals in the frequency domain determined in step 2031 .
[0064] In step 2033, the group delay and group phase of the stereo signal can be estimated directly from the time-domain cross-correlation signal; alternatively, some signal preprocessing can first be performed on the time-domain cross-correlation signal, and the group delay and group phase of the stereo signal can then be estimated from the preprocessed signal.
[0065] If some signal preprocessing is performed on the cross-correlation function time domain signal, estimating the group delay and group phase of the stereo signal based on the preprocessed signal may include:
[0066] 1) Normalize or smooth the cross-correlation function time domain signal;
[0067] The smoothing of the cross-correlation function time-domain signal can be performed as follows:
[0068]
$$
C_{ravg}(n) = \alpha \cdot C_{ravg}(n) + \beta \cdot C_r(n)
$$
[0069] where α and β are weighting constants, 0 ≤ α ≤ 1 and β = 1 − α. In this embodiment, preprocessing such as smoothing the time-domain cross-correlation signal between the left and right channels before estimating the group delay and group phase makes the estimated group delay more stable.
[0070] 2) Further smoothing is performed after normalizing the cross-correlation function time domain signal;
[0071] 3) Normalize or smooth the absolute value of the cross-correlation function time-domain signal;
[0072] The smoothing of the absolute value of the cross-correlation function time-domain signal can be performed as follows:
[0073]
$$
C_{ravg\_abs}(n) = \alpha \cdot C_{ravg\_abs}(n) + \beta \cdot |C_r(n)|
$$
[0074] 4) Further smooth the absolute-value signal obtained after normalizing the time-domain cross-correlation signal.
[0075] It can be understood that, before the group delay and group phase of the stereo signal are estimated, the preprocessing of the time-domain cross-correlation signal may also include other processing, such as autocorrelation, or a combination of autocorrelation and/or smoothing.
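Step 2032 and the preprocessing options 1) to 4) can be sketched as follows; the smoothing uses the recursive forms of paragraphs [0068] and [0073], with α = 0.75 and peak normalization chosen purely for illustration (the text does not fix either):

```python
import numpy as np

def time_domain_cross_correlation(Cr):
    """Step 2032: inverse transform of the weighted cross-correlation.
    The result is in general a complex time-domain signal C_r(n)."""
    return np.fft.ifft(Cr)

def normalize(cr):
    """Normalization option: scale so the largest magnitude is 1
    (peak normalization is an assumed choice)."""
    peak = np.max(np.abs(cr))
    return cr / peak if peak > 0 else cr

def smooth(cr, cr_avg, alpha=0.75, use_abs=False):
    """Recursive smoothing C_ravg(n) = a*C_ravg(n) + (1-a)*C_r(n),
    or its absolute-value variant, per paragraphs [0068] and [0073].

    cr:     current frame's (possibly normalized) time-domain cross-correlation.
    cr_avg: smoothed signal carried over from the previous frame (caller-held state).
    """
    beta = 1.0 - alpha
    target = np.abs(cr) if use_abs else cr
    return alpha * cr_avg + beta * target
```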

Example Embodiment

[0076] Combined with the above preprocessing of the time-domain cross-correlation signal, the group delay and group phase of the stereo signal can be estimated in step 2033 with the same estimation method, or they can be estimated separately. Specifically, at least the following implementations of group delay and group phase estimation are possible:
[0077] Step 2033, Embodiment 1, as shown in Figure 4a:
[0078] The group delay is estimated from the index of the largest-amplitude value in the time-domain cross-correlation signal (or in the processed time-domain cross-correlation signal), and the group phase is estimated from the phase angle of the cross-correlation value corresponding to the group delay. This includes the following steps:
[0079] Determine the relationship between the index of the largest-amplitude value in the time-domain cross-correlation signal and the symmetric intervals related to the transform length N. In one embodiment, if the index of the largest-amplitude value is less than or equal to N/2, the group delay equals that index; if the index is greater than N/2, the group delay is that index minus the transform length N. Here [0, N/2] and (N/2, N] can be regarded as the first and second symmetric intervals related to the time-frequency transform length N of the stereo signal. In another implementation, the judgment ranges can be the first and second symmetric intervals [0, m] and (N-m, N], where m is less than N/2: if the index of the largest-amplitude value lies in [0, m], the group delay equals that index; if it lies in (N-m, N], the group delay is that index minus N. In practical applications, the judgment can also be based on a value adjacent to the largest-amplitude index; without affecting the subjective effect, or according to requirements, an index corresponding to a value slightly smaller than the maximum can be selected as the judgment condition, for example the index of the second-largest value or of a value whose amplitude differs from the maximum by a fixed or preset range. Taking the index of the largest-amplitude value of the time-domain cross-correlation signal as an example, a specific form is as follows:
[0080]
$$
d_g = \begin{cases}
\arg\max_n |C_{ravg}(n)|, & \arg\max_n |C_{ravg}(n)| \le N/2 \\
\arg\max_n |C_{ravg}(n)| - N, & \arg\max_n |C_{ravg}(n)| > N/2
\end{cases}
$$
where argmax_n |C_ravg(n)| is the index of the largest-amplitude value of C_ravg(n); this embodiment also protects various deformations of the above form.
[0081] The group phase is obtained from the phase angle of the time-domain cross-correlation value corresponding to the group delay: when the group delay d_g is greater than or equal to zero, the group phase is obtained as the phase angle of the cross-correlation value at index d_g; when d_g is less than zero, the group phase is the phase angle of the cross-correlation value at index d_g + N. This can be embodied in the following form or any deformation of it:
[0082]
$$
\theta_g = \begin{cases}
\angle C_{ravg}(d_g), & d_g \ge 0 \\
\angle C_{ravg}(d_g + N), & d_g < 0
\end{cases}
$$
where ∠C_ravg(d_g) is the phase angle of the time-domain cross-correlation value C_ravg(d_g), and ∠C_ravg(d_g + N) is the phase angle of the time-domain cross-correlation value C_ravg(d_g + N).
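Embodiment 1 of step 2033 (the argmax rule for d_g and the phase-angle rule for θ_g above) in sketch form:

```python
import numpy as np

def estimate_group_delay_phase_argmax(cr_avg):
    """Embodiment 1 of step 2033.

    cr_avg: (possibly smoothed) complex time-domain cross-correlation, length N.
    Returns (d_g, theta_g) following the two piecewise formulas above.
    """
    N = len(cr_avg)
    idx = int(np.argmax(np.abs(cr_avg)))      # index of the largest-amplitude value
    # Map the index into the symmetric intervals [0, N/2] and (N/2, N].
    d_g = idx if idx <= N // 2 else idx - N
    # Group phase: phase angle of the cross-correlation value at d_g,
    # or at d_g + N when d_g is negative.
    theta_g = float(np.angle(cr_avg[d_g] if d_g >= 0 else cr_avg[d_g + N]))
    return d_g, theta_g
```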
[0083] Step 2033, Embodiment 2, as shown in Figure 4b:
[0084] The phase of the cross-correlation function (or of the processed cross-correlation function) is extracted, where ∠C_r(k) denotes the phase angle of the complex value C_r(k), and the average value α_1 of the phase difference is obtained over a low-frequency band. The group delay is determined from the ratio of the product of this mean phase difference and the transform length to the frequency information. Similarly, the group phase information is obtained from the difference between the phase at the current frequency point of the cross-correlation function and the product of the frequency-point index and the mean phase difference, specifically in the following way:
[0085]
$$
\alpha_1 = E\{\hat{\Phi}(k+1) - \hat{\Phi}(k)\}, \quad k < \mathrm{Max}
$$
[0086]
$$
d_g = -\frac{\alpha_1 \, N}{2\pi \cdot F_s}
$$
[0087]
$$
\theta_g = E\{\hat{\Phi}(k) - \alpha_1 \cdot k\}, \quad k < \mathrm{Max}
$$
[0088] where α_1 denotes the average value of the phase difference, Φ̂(k) is the extracted phase of the cross-correlation function at frequency point k, Fs is the sampling frequency used, and Max is the upper cut-off for calculating the group delay and group phase, used to prevent phase rotation.
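Embodiment 2 of step 2033 in sketch form, following the three formulas above literally; the division by Fs is read here as expressing the group delay in time units, which is an interpretation of paragraph [0086]:

```python
import numpy as np

def estimate_group_delay_phase_slope(Cr, fs, max_bin):
    """Embodiment 2 of step 2033: phase-slope based estimation.

    Cr:      weighted cross-correlation spectrum C_r(k) of length N (zero above N/2).
    fs:      sampling frequency Fs.
    max_bin: upper cut-off 'Max' used to avoid phase rotation.
    """
    N = len(Cr)
    phi = np.angle(Cr[:max_bin + 1])          # phases of the cross-correlation, k <= Max
    # alpha_1: mean phase difference between adjacent frequency points below Max.
    alpha_1 = float(np.mean(np.diff(phi)))
    # Group delay from the phase slope (paragraph [0086]).
    d_g = -alpha_1 * N / (2.0 * np.pi * fs)
    # Group phase: mean residual phase after removing the linear term (paragraph [0087]).
    k = np.arange(len(phi))
    theta_g = float(np.mean(phi - alpha_1 * k))
    return d_g, theta_g
```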
[0089] Step 204: Quantize and encode the group delay and group phase to form side information for transmission.
[0090] The group delay is scalar-quantized within a preset or arbitrary range, which is either a symmetric range of positive and negative values [-Max, Max] or the range of values available in the arbitrary case; the scalar-quantized group delay is then transmitted over a longer time span or processed with differential coding to obtain side information. The value range of the group phase is usually [0, 2π], which can be taken as [0, 2π) or (-π, π]; the group phase is scalar-quantized and encoded within this range. The side information formed by the quantized and encoded group delay and group phase is multiplexed into an encoded code stream and sent to the stereo signal recovery device.
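A sketch of the scalar quantization in paragraph [0090]; the step size and the number of phase levels are hypothetical, and the differential-coding option and the multiplexing into the code stream are omitted:

```python
import numpy as np

def quantize_group_params(d_g, theta_g, d_max, delay_step=1.0, phase_bits=5):
    """Scalar-quantize the group delay (within [-Max, Max]) and the group phase
    (mapped into [0, 2*pi)); step size and bit allocation are illustrative only."""
    # Group delay: clamp to the symmetric range and quantize with a uniform step.
    d_clamped = max(-d_max, min(d_max, d_g))
    delay_index = int(round(d_clamped / delay_step))

    # Group phase: wrap into [0, 2*pi) and quantize uniformly.
    two_pi = 2.0 * np.pi
    theta_wrapped = theta_g % two_pi
    levels = 2 ** phase_bits
    phase_index = int(theta_wrapped / two_pi * levels) % levels

    return delay_index, phase_index
```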
[0091] In the stereo encoding method according to the embodiment of the present invention, the left- and right-channel signals in the frequency domain are used to estimate the group delay and group phase between the left and right channels of the stereo signal, which reflect the global orientation information of the signal, so that the orientation information of the sound field effectively enhances the spatial characteristic parameters of the stereo signal. The estimation of the group delay and group phase is applied to stereo coding at little additional bit-rate cost, so that the spatial information and the global orientation information are effectively combined, more accurate sound field information can be obtained, the sound field effect is enhanced, and the coding efficiency is greatly improved.

PUM

no PUM

