A real-time optimization method and system for adaptive personal sound zones in a dynamic environment

By constructing a sound field cost function and updating the acoustic transfer matrix in real time in a dynamic acoustic environment, the problem of performance degradation in individual sound zones caused by changes in the acoustic environment is solved. This achieves a stable and efficient independent listening zone in a dynamic environment, providing a high-quality sound field for multiple users, and exhibiting robustness and stability.

CN120595580B · Active · Publication Date: 2026-06-30 · NINGBO ARTIFICIAL INTELLIGENCE RES INST OF SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NINGBO ARTIFICIAL INTELLIGENCE RES INST OF SHANGHAI JIAOTONG UNIV
Filing Date
2025-05-27
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In dynamic acoustic environments, existing personal sound zone systems degrade in performance when the acoustic environment is unknown or changing, making it difficult to stably and efficiently provide users with independent, high-quality listening zones in different environments.

Method used

An adaptive real-time optimization method for personal sound zones is adopted. A sound field cost function is constructed and the control filter is updated, while the acoustic transfer matrix is identified and updated in real time using RRLS online system identification to track and compensate for environmental changes. The control filter is optimized using stochastic gradient descent or Newton's method, and a regularization term is introduced to suppress noise interference.

Benefits of technology

It achieves real-time response and stable control of the acoustic environment in dynamic environments, maintains the quality of sound field reconstruction, provides independent sound zones for multiple users, has robustness and stability, adapts to environmental changes, and isolates the listening zones of different users.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a real-time optimization method and system for adaptive personal sound zones in dynamic environments, relating to the field of sound field control. The method includes: Step 1, constructing a sound field cost function whose goal is that the error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B, and the error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D, are both less than a preset threshold, where $p_B[n] = R_B W$ and $p_D[n] = R_D W$; Step 2, updating the control filter W to obtain the drive signal u for the speaker array; Step 3, playing the drive signal u through the speaker array so that it propagates through the air to the microphone arrays in regions B and D. The difference between the reconstructed signal in region B and the actual signal obtained from the microphone array in region B, and the difference between the reconstructed signal in region D and the actual signal obtained from the microphone array in region D, are used as error signals. Steps 2 and 3 are repeated continuously, and $R_B$ and $R_D$ are updated in real time using the RRLS method, which enables tracking and compensation of changes in the acoustic environment.

Description

Technical Field

[0001] This invention relates to the field of sound field control, and more particularly to a real-time optimization method and system for adaptive personal sound zones in dynamic environments.

Background Technology

[0002] Personal sound zone systems (PZS) can generate independent listening zones for multiple users within the same physical space, and show broad application prospects in car cabins, mobile devices, and public places. Currently, most related research focuses on achieving optimal performance in fixed acoustic environments and assumes that the precise transfer function from the speaker to the microphone is known in advance. In real-world applications, however, the acoustic environment changes over time, for example when furniture is moved, people walk around, or the temperature varies. These changes alter the transfer function and thus limit the performance and practicality of PZS.

[0003] Therefore, those skilled in the art are dedicated to developing a real-time optimization system for adaptive personal sound zones in dynamic acoustic environments, thereby overcoming the aforementioned deficiencies in the prior art.

Summary of the Invention

[0004] In view of the above-mentioned deficiencies of the prior art, the technical problem to be solved by the present invention is the performance degradation of personal sound zone systems in dynamic acoustic environments caused by an unknown or changing acoustic environment, so that a personal sound zone system can stably and efficiently provide users with independent, high-quality listening zones in different environments.

[0005] To achieve the above objectives, the present invention provides a real-time optimization method for adaptive personal sound zones in dynamic environments, the method comprising the following steps:

[0006] Step 1: Construct the sound field cost function. The goal of the sound field cost function is that the error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B, and the error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D, are both less than the preset threshold;

[0007] Where $p_B[n] = R_B W$ and $p_D[n] = R_D W$; $R_B$ is the acoustic transfer matrix from the speaker array to the microphone array in area B, $R_D$ is the acoustic transfer matrix from the speaker array to the microphone array in area D, and W is the control filter;

[0008] Step 2: Update the control filter W to obtain the drive signal u of the speaker array;

[0009] Step 3: The drive signal u is played through the speaker array and propagates through the air to the microphone arrays in areas B and D. The difference between the reconstructed signal in area B and the actual signal obtained from the microphone array in area B, and the difference between the reconstructed signal in area D and the actual signal obtained from the microphone array in area D, are used as error signals for RRLS online system identification. Steps 2 and 3 are repeated continuously, using the RRLS method to update the acoustic transfer matrices $R_B$ and $R_D$ in real time, which enables tracking and compensation of changes in the acoustic environment.

[0010] Furthermore, in step 1, the objective of the sound field cost function is achieved by minimizing the sum of the squared error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B and the squared error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D.

[0011] Furthermore, the weighted objective of the sound field cost function is to minimize the sum of squared norms of the weighted error vectors, and the control filter W that minimizes this weighted objective is sought. Specifically, the weighted objective J[n] is:

[0012] $$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n] - x_D[n]\|^2\}$$

[0013] Where β∈[0,1] is the weight of the error in region B, and (1-β) is the weight of the error in region D; when β=1, only the sound pressure matching in region B is focused on, and the sound pressure in region D is not controlled; when β=0.5, the sound pressure in both regions B and D will be optimized at the same time; E represents the mathematical expectation.
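To make the role of β concrete, the following minimal sketch evaluates the weighted objective for a single snapshot, that is, without the expectation E. The matrix sizes, the numeric values, and the helper names `matvec`, `sq_norm`, and `weighted_cost` are illustrative assumptions and do not come from the patent text. Plain Python lists are used so that the sketch has no dependencies.

```python
def matvec(R, w):
    """Multiply matrix R (a list of rows) by vector w."""
    return [sum(r_ij * w_j for r_ij, w_j in zip(row, w)) for row in R]


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def weighted_cost(R_B, R_D, W, x_B, x_D, beta):
    """Single-snapshot value of beta*||p_B - x_B||^2 + (1-beta)*||p_D - x_D||^2."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    p_B = matvec(R_B, W)  # reconstructed signal in region B
    p_D = matvec(R_D, W)  # reconstructed signal in region D
    e_B = [p - x for p, x in zip(p_B, x_B)]
    e_D = [p - x for p, x in zip(p_D, x_D)]
    return beta * sq_norm(e_B) + (1.0 - beta) * sq_norm(e_D)


# Hypothetical toy setup: 2 speakers, 2 microphones in B, 1 microphone in D.
R_B = [[1.0, 0.5], [0.8, 0.4]]
R_D = [[0.4, 1.0]]
W = [1.0, 0.0]
x_B = [1.0, 1.0]
x_D = [0.0]

for beta in (1.0, 0.5):
    print(beta, weighted_cost(R_B, R_D, W, x_B, x_D, beta))
```

With β = 1 the leakage into region D does not contribute to the cost at all, whereas β = 0.5 weights the region D error equally with the region B error.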

[0014] Furthermore, let $x_D[n] = 0$, so that no sound is heard in region D, and the weighted objective J[n] becomes:

[0015] $$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n]\|^2\}$$

[0016] Then substitute the signal model into the weighted objective J[n]:

[0017] $$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\}$$

[0018] Where $z_B = E\{R_B^T R_B\}$, $z_D = E\{R_D^T R_D\}$, $Q_B = E\{R_B^T X_B\}$;

[0019] Finally, adding the regularization term $\rho W^T W$ transforms the weighted objective J[n] into:

[0020] $$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\} + \rho W^T W$$
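As a worked step, the regularized objective can be differentiated directly. The sketch below assumes W is a real-valued vector and introduces the shorthand $A$ and the identity matrix $I$, neither of which appears in the original text. The closed-form minimizer is a standard consequence of the quadratic form, not a formula quoted from the patent.

```latex
\begin{aligned}
A &= \beta z_B + (1-\beta) z_D + \rho I, \\
J[n] &= W^T A W - \beta\left(W^T Q_B + Q_B^T W\right) + \beta E\{X_B^T X_B\}, \\
\nabla_W J[n] &= 2 A W - 2 \beta Q_B, \\
\nabla_W J[n] = 0 \;&\Longrightarrow\; W^\star = \beta A^{-1} Q_B .
\end{aligned}
```

Because $z_B$ and $z_D$ are positive semidefinite, any $\rho > 0$ makes $A$ positive definite, so the inverse exists and the stationary point is the unique minimizer.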

[0021] Furthermore, in step 2, the control filter W is updated using stochastic gradient descent.

[0022] Furthermore, in the stochastic gradient descent method, the gradient of the weighted objective J[n] is calculated.

[0023]

[0024] Update W(n+1) again:

[0025]

[0026] Where μ is the step size, which must satisfy:

[0027]

[0028] to ensure convergence, where $H_B$ is the matrix representation of the transfer function from the speaker array to the microphone array in area B, and $H_D$ is the matrix representation of the transfer function from the speaker array to the microphone array in area D.

[0029] Furthermore, a momentum variable v is introduced to record the accumulated information of the previous gradients;

[0030] The update rules for the momentum variable v and the control filter W are as follows:

[0031]

[0032] Where $v_n$ is the momentum at the n-th iteration; α is the momentum coefficient, which controls the degree to which previous momentum information is retained; and η is the learning rate.

[0033] W(n+1) is then updated according to the new momentum $v_{n+1}$, moving in the direction of negative momentum:

[0034] $$W(n+1) = W(n) - v_{n+1}$$

[0035] When the direction of the gradient remains consistent across multiple iterations, momentum accumulates continuously, causing the step size μ of parameter updates to gradually increase, thereby accelerating convergence.

[0036] Furthermore, in step 2, the control filter W is updated using Newton's method.

[0037] Furthermore, in the RRLS online system identification in step 3, a regularization term is introduced. Following the recursive least squares (RLS) algorithm, the cost function $\xi_m[n]$ of the plant model from the speaker array to the m-th microphone in area B is defined as:

[0038]

[0039] Where λ denotes the forgetting factor, δ denotes the regularization parameter, $p_{B,m}[n]$ denotes the sound pressure at the m-th microphone, and the remaining term denotes the real-time estimate of the transfer function from the speaker array to the m-th microphone;

[0040] Calculate the a priori estimation error $\xi_B[n]$:

[0041]

[0042] The driving matrix of the transfer function from the speaker array to the m-th microphone in area B is updated as follows:

[0043]

[0044] Calculate the gain vector k[n]:

[0045]

[0046] Update the covariance matrix:

[0047]

[0048] where

[0049] Similarly, the driving matrix of region D is updated as:

[0050]

[0051] This invention also provides a real-time optimization system for adaptive personal sound zones in dynamic environments, comprising a speaker array consisting of L speakers and two microphone arrays located in zones B and D, wherein the microphone arrays contain $M_B$ and $M_D$ microphones respectively, characterized in that the system further implements the real-time optimization method for adaptive personal sound zones in dynamic environments as described above.

[0052] The real-time optimization method and system for adaptive personal sound zones in a dynamic environment provided by this invention has at least the following technical effects:

[0053] 1. The technical solution provided by this invention has dynamic adaptability, including real-time response to environmental changes and dynamic adjustment of the control filter. The real-time response to environmental changes is achieved through online RRLS modeling, allowing the system to track changes in the acoustic path (such as personnel movement and temperature fluctuations) in real time and update the transfer function matrix, avoiding performance degradation caused by fixed models in traditional methods. In the dynamic adjustment of the control filter, the LMS algorithm quickly adjusts the control weights based on the current environmental model to maintain the quality of the sound field reconstruction.

[0054] 2. The technical solution provided by this invention is robust and stable. The regularization term in RRLS suppresses model overfitting under noise interference, ensuring stable parameter estimation.

[0055] 3. The technical solution provided by this invention supports multiple independent sound zones for different users. By extending the multi-channel control architecture, independent sound fields can be generated for different users, for example isolating the listening zones of the driver and passengers.

[0056] The following will further explain the concept, specific structure, and technical effects of the present invention in conjunction with the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.

Attached Figure Description

[0057] Figure 1 is a schematic diagram of a real-time optimization method according to a preferred embodiment of the present invention.

Detailed Implementation

[0058] The following description, with reference to the accompanying drawings, illustrates several preferred embodiments of the present invention to make its technical content clearer and easier to understand. The present invention can be embodied in many different forms, and the scope of protection of the present invention is not limited to the embodiments mentioned herein.

[0059] This invention provides an adaptive personal acoustic zone method and system suitable for dynamic environments with online system identification. It uses the Least Mean Squares (LMS) algorithm to control the updating of the control filter and the Regularized Recursive Least Squares (RRLS) algorithm to enable the system model to adapt to changes in the acoustic environment, creating independent listening zones for multiple users in the same space.

[0060] Example 1

[0061] As shown in Figure 1, this embodiment of the invention provides a real-time optimization method for adaptive personal sound zones in a dynamic environment, comprising the following steps:

[0062] Step 1: Construct the sound field cost function. The goal of the sound field cost function is that the error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B, and the error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D, are both less than the preset threshold;

[0063] Where $p_B[n] = R_B W$ and $p_D[n] = R_D W$; $R_B$ is the acoustic transfer matrix from the speaker array to the microphone array in area B, $R_D$ is the acoustic transfer matrix from the speaker array to the microphone array in area D, and W is the control filter;

[0064] Step 2: Update the control filter W to obtain the drive signal u of the speaker array;

[0065] Step 3: The drive signal u is played through the speaker array and propagates through the air to the microphone arrays in areas B and D. The difference between the reconstructed signal in area B and the actual signal obtained from the microphone array in area B, and the difference between the reconstructed signal in area D and the actual signal obtained from the microphone array in area D, are used as error signals for RRLS online system identification. Steps 2 and 3 are repeated continuously, using the RRLS method to update the acoustic transfer matrices $R_B$ and $R_D$ in real time, which enables tracking and compensation of changes in the acoustic environment.

[0066] Example 2

[0067] Based on Example 1, in step 1, the sound field cost function is to make the reconstructed signal as close as possible to the desired signal. This can usually be achieved by minimizing the sum of squared errors between them, and the importance of the errors in the two regions is measured according to different weights.

[0068] Specifically, the objective of the sound field cost function is achieved by minimizing the sum of the squared error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B and the squared error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D.

[0069] Specifically, the weighted objective of the sound field cost function is to minimize the sum of squared L2 norms of the weighted error vectors, and the control filter W that minimizes this weighted objective is sought. The weighted objective J[n] is specifically:

[0070] $$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n] - x_D[n]\|^2\}$$

[0071] Where β∈[0,1] is the weight of the error in region B, and (1-β) is the weight of the error in region D; when β=1, only the sound pressure matching in region B is focused on, and the sound pressure in region D is not controlled; when β=0.5, the sound pressure in both regions B and D is optimized simultaneously, and the importance of the errors in regions B and D is balanced by the weight β to ensure that the reconstructed signal is optimal in the sense of weighted mean square error; E represents the mathematical expectation, which means that the cost is considered in the sense of statistical average. In practical applications, acoustic signals may be affected by random factors such as noise and environmental changes. By calculating the expectation, the average performance of the objective function can be optimized under various possible conditions, rather than just for a specific situation.

[0072] Specifically, let $x_D[n] = 0$, so that no sound is heard in region D, and the weighted objective J[n] becomes:

[0073] $$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n]\|^2\}$$

[0074] Then substitute the signal model into the weighted objective J[n]:

[0075] $$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\}$$

[0076] Where $z_B = E\{R_B^T R_B\}$, $z_D = E\{R_D^T R_D\}$, $Q_B = E\{R_B^T X_B\}$;

[0077] Finally, adding the regularization term $\rho W^T W$ transforms the weighted objective J[n] into:

[0078] $$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\} + \rho W^T W$$

[0079] Example 3

[0080] Based on Example 1 or 2, in step 2, the control filter W is updated using the stochastic gradient descent method.

[0081] In the stochastic gradient descent method, the gradient of the weighted objective J[n] is calculated.

[0082]

[0083] Update W(n+1) again:

[0084]

[0085] Where μ is the step size, which must satisfy:

[0086]

[0087] to ensure convergence, where $H_B$ is the matrix representation of the transfer function from the speaker array to the microphone array in area B, and $H_D$ is the matrix representation of the transfer function from the speaker array to the microphone array in area D.

[0088] The step size controls the magnitude of the filter parameter updates in each iteration. A smaller step size can make the algorithm converge more stably, but may lead to a slower convergence speed; a larger step size may speed up the convergence speed, but may also cause the algorithm to oscillate or even fail to converge during the convergence process. The step size setting is determined after a trade-off to meet the requirements of algorithm stability and convergence speed in the simulation.
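The step-size trade-off described above can be illustrated on a small quadratic. The sketch below is not the patent's update rule: it assumes the objective has been written as J(W) = WᵀAW − 2bᵀW plus a constant, with hypothetical A and b, and it uses the exact gradient 2AW − 2b instead of a stochastic estimate. For that gradient, plain gradient descent is stable when μ is below 1/λ_max(A); this bound is a standard result used here for illustration, not a quotation of the bound in the source.

```python
def grad(A, b, W):
    """Gradient 2*A*W - 2*b of the quadratic W^T A W - 2 b^T W."""
    AW = [sum(a * w for a, w in zip(row, W)) for row in A]
    return [2.0 * (aw - bi) for aw, bi in zip(AW, b)]


def descend(A, b, W, mu, steps):
    """Run `steps` iterations of W <- W - mu * grad(W)."""
    for _ in range(steps):
        g = grad(A, b, W)
        W = [w - mu * gi for w, gi in zip(W, g)]
    return W


# Hypothetical diagonal example: eigenvalues of A are 2 and 1, minimizer is [1, 1].
A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]

W_small = descend(A, b, [0.0, 0.0], mu=0.1, steps=200)  # mu < 1/2: converges
W_large = descend(A, b, [0.0, 0.0], mu=0.6, steps=50)   # mu > 1/2: first coordinate diverges
print(W_small, W_large)
```

With μ = 0.1 the error shrinks by a constant factor each iteration, while μ = 0.6 exceeds the stability limit for the larger eigenvalue and the corresponding coordinate oscillates with growing amplitude.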

[0089] In particular, by employing momentum-based stochastic gradient descent, which comprehensively considers previous gradient information, the shortcomings of traditional stochastic gradient descent are overcome to some extent. This allows for more effective optimization of the objective function, faster convergence, and reduced oscillations. Specifically, a momentum variable v is introduced to record the accumulated information of previous gradients.

[0090] The update rules for momentum variable v and control filter W are as follows:

[0091]

[0092] Where $v_n$ is the momentum at the n-th iteration; α is the momentum coefficient, which controls the degree to which previous momentum information is retained and is usually taken as around 0.9; and η is the learning rate;

[0093] W(n+1) is then updated according to the new momentum $v_{n+1}$, moving in the direction of negative momentum:

[0094] $$W(n+1) = W(n) - v_{n+1}$$

[0095] When the gradient direction remains consistent across multiple iterations, momentum accumulates, causing the step size μ of parameter updates to gradually increase, thus accelerating convergence. When the gradient direction changes, momentum acts as a buffer, preventing drastic oscillations in parameter updates. This is because momentum contains information from previous gradients, preventing the parameters from drastically changing direction to completely follow the current gradient.
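A minimal sketch of the momentum update follows. The filter update W(n+1) = W(n) − v_{n+1} is taken from the text above. The momentum recursion v_{n+1} = α·v_n + η·∇J is the common heavy-ball form and is an assumption here, since the patent's own formula is not reproduced in this text. The quadratic test problem and the names `quad_grad` and `momentum_descend` are hypothetical.

```python
def quad_grad(W):
    """Gradient of a hypothetical quadratic W^T A W - 2 b^T W with A = diag(2, 1), b = [2, 1]."""
    return [2.0 * (2.0 * W[0] - 2.0), 2.0 * (1.0 * W[1] - 1.0)]


def momentum_descend(grad_fn, W, alpha, eta, steps):
    """Gradient descent with momentum.

    v accumulates past gradients (assumed form: v <- alpha*v + eta*grad),
    and W moves in the direction of negative momentum (W <- W - v).
    """
    v = [0.0] * len(W)
    for _ in range(steps):
        g = grad_fn(W)
        v = [alpha * vi + eta * gi for vi, gi in zip(v, g)]
        W = [wi - vi for wi, vi in zip(W, v)]
    return W


W_final = momentum_descend(quad_grad, [0.0, 0.0], alpha=0.9, eta=0.05, steps=500)
print(W_final)  # approaches the minimizer [1, 1]
```

When successive gradients point the same way, v grows toward η·∇J/(1 − α), which is the sense in which the effective step becomes larger than η.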

[0096] In particular, in step 2, the control filter W can also be updated using Newton's method.
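The text gives no formula for the Newton update, so the following is only a sketch under stated assumptions: the objective is taken to be the quadratic J(W) = WᵀAW − 2bᵀW with a symmetric positive definite A, so the gradient is 2AW − 2b and the Hessian is 2A. The helper `solve`, the function `newton_step`, and the numeric values are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def newton_step(A, b, W):
    """One Newton step W <- W - H^{-1} g for J(W) = W^T A W - 2 b^T W."""
    AW = [sum(a * w for a, w in zip(row, W)) for row in A]
    g = [2.0 * (aw - bi) for aw, bi in zip(AW, b)]  # gradient
    H = [[2.0 * a for a in row] for row in A]       # Hessian
    d = solve(H, g)
    return [w - di for w, di in zip(W, d)]


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
W_newton = newton_step(A, b, [5.0, -7.0])
print(W_newton)  # one step reaches the minimizer of a quadratic
```

For a quadratic objective a single Newton step lands on the minimizer regardless of the starting point, at the cost of solving a linear system with the Hessian in each update.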

[0097] Example 3

[0098] In real-world dynamic environments, the acoustic characteristics change continuously from when sound is emitted by a speaker to when it is received by a microphone. Precise sound field control needs to account for this change to avoid errors caused by variations in the acoustic transfer function. To address this issue, a real-time system identification based on the RLS algorithm is introduced, continuously updating model parameters based on actual data to track the dynamic changes of the system in real time.

[0099] Based on Examples 1, 2, or 3, step 3 identifies and updates the transfer function matrices of regions B and D using RRLS online system identification. Specifically, the drive signal obtained in step 2 is played through the speaker array, propagates through the air to regions B and D, and is recorded by the two microphone arrays placed in regions B and D for sound monitoring. The difference between the desired signal and the actual signal recorded by the microphone arrays is used as an error signal. By continuously comparing the two, the acoustic transfer matrix is updated so that the model better fits the actual situation.

[0100] Specifically, in the RRLS online system identification in step 3, a regularization term is introduced. Following the recursive least squares (RLS) algorithm, the cost function $\xi_m[n]$ of the plant model from the speaker array to the m-th microphone in area B is defined as:

[0101]

[0102] Where λ denotes the forgetting factor, δ denotes the regularization parameter, $p_{B,m}[n]$ denotes the sound pressure at the m-th microphone, and the remaining term denotes the real-time estimate of the transfer function from the speaker array to the m-th microphone;

[0103] Calculate the a priori estimation error $\xi_B[n]$:

[0104]

[0105] The driving matrix of the transfer function from the speaker array to the m-th microphone in area B is updated as follows:

[0106]

[0107] Calculate the gain vector k[n]:

[0108]

[0109] Update the covariance matrix:

[0110]

[0111] where

[0112] Similarly, the driving matrix of region D is updated as:

[0113]

[0114] By repeating steps 2 and 3, the acoustic transfer matrix is updated in real time and the control filter is adjusted to reconstruct the desired signal through the loudspeaker array.
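The tracking behaviour of the identification step can be sketched for a single microphone. The patent's own RRLS formulas are not reproduced in this text, so the sketch uses the textbook exponentially weighted RLS recursion, with the regularization parameter δ entering through the initial covariance P[0] = I/δ. That choice, the noise-free toy signal model, and all names and numeric values are assumptions made for illustration.

```python
import random


def rls_update(h, P, u, d, lam):
    """One RLS step for a single microphone.

    h: current transfer-function estimate, P: inverse-correlation matrix,
    u: speaker drive samples, d: measured pressure, lam: forgetting factor.
    """
    L = len(h)
    e = d - sum(hi * ui for hi, ui in zip(h, u))          # a priori error
    Pu = [sum(P[i][j] * u[j] for j in range(L)) for i in range(L)]
    denom = lam + sum(ui * pui for ui, pui in zip(u, Pu))
    k = [pui / denom for pui in Pu]                        # gain vector
    h = [hi + ki * e for hi, ki in zip(h, k)]              # estimate update
    # P is symmetric, so u^T P equals (P u)^T.
    P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(L)] for i in range(L)]
    return h, P, e


def track_changing_path(lam=0.95, delta=0.01, n_per_phase=200, seed=0):
    """Identify a 2-tap path that changes abruptly halfway through."""
    rng = random.Random(seed)
    L = 2
    h = [0.0] * L
    P = [[(1.0 / delta) if i == j else 0.0 for j in range(L)] for i in range(L)]
    estimates = []
    for h_true in ([0.7, -0.3], [0.2, 0.5]):               # hypothetical acoustic paths
        for _ in range(n_per_phase):
            u = [rng.gauss(0.0, 1.0) for _ in range(L)]
            d = sum(t * ui for t, ui in zip(h_true, u))    # noise-free measurement
            h, P, _e = rls_update(h, P, u, d, lam)
        estimates.append(h[:])
    return estimates


h_before, h_after = track_changing_path()
print(h_before, h_after)
```

A forgetting factor below 1 discounts old samples, which is what lets the estimate move from the first path to the second after the change. Values closer to 1 give smoother estimates but slower tracking.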

[0115] Example 4

[0116] This invention also provides a real-time optimization system for adaptive personal sound zones in dynamic environments, comprising a speaker array consisting of L speakers and two microphone arrays located in zones B and D, wherein the microphone arrays contain $M_B$ and $M_D$ microphones respectively, characterized in that the system further implements the real-time optimization method for adaptive personal sound zones in a dynamic environment as described in any one of Embodiments 1 to 3.

[0117] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. A real-time optimization method for adaptive personal sound zones in a dynamic environment, characterized in that the method includes the following steps:

Step 1: Construct the sound field cost function. The goal of the sound field cost function is that the error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B, and the error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D, are both less than the preset threshold, where $p_B[n] = R_B W$, $p_D[n] = R_D W$, $R_B$ is the acoustic transfer matrix from the speaker array to the microphone array in area B, $R_D$ is the acoustic transfer matrix from the speaker array to the microphone array in area D, and W is the control filter;

Step 2: Update the control filter W to obtain the drive signal u of the speaker array;

Step 3: The drive signal u is played through the speaker array and propagates through the air to the microphone arrays in areas B and D. The difference between the reconstructed signal in area B and the actual signal obtained from the microphone array in area B, and the difference between the reconstructed signal in area D and the actual signal obtained from the microphone array in area D, are used as error signals for RRLS online system identification. Steps 2 and 3 are repeated continuously, using the RRLS method to update the acoustic transfer matrices $R_B$ and $R_D$ in real time, which enables tracking and compensation of changes in the acoustic environment;

In the RRLS online system identification in step 3, a regularization term is introduced. Following the recursive least squares (RLS) algorithm, the cost function $\xi_m[n]$ of the plant model from the speaker array to the m-th microphone in region B is defined in terms of the forgetting factor λ, the regularization parameter δ, the sound pressure $p_{B,m}[n]$ at the m-th microphone, and the real-time estimate of the transfer function from the speaker array to the m-th microphone; the a priori estimation error $\xi_B[n]$ is calculated; the driving matrix of the transfer function from the speaker array to the m-th microphone in area B is updated; the gain vector k[n] is calculated; the covariance matrix is updated; and, similarly, the driving matrix of region D is updated.

2. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 1, characterized in that, in step 1, the objective of the sound field cost function is achieved by minimizing the sum of the squared error between the reconstructed signal $p_B[n]$ and the desired signal $x_B[n]$ in region B and the squared error between the reconstructed signal $p_D[n]$ and the desired signal $x_D[n]$ in region D.

3. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 2, characterized in that the weighted objective of the sound field cost function is to minimize the sum of squared L2 norms of the weighted error vectors, and the control filter W that minimizes the weighted objective is then found. The weighted objective J[n] is specifically:

$$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n] - x_D[n]\|^2\}$$

where β∈[0,1] is the weight of the error in region B and (1-β) is the weight of the error in region D; when β=1, only the sound pressure in region B is matched and the sound pressure in region D is not controlled; when β=0.5, the sound pressure in both regions B and D is optimized at the same time; E denotes the mathematical expectation.

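4. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 3, characterized in that $x_D[n] = 0$ is set, so that no sound is heard in region D, and the weighted objective J[n] becomes:

$$J[n] = E\{\beta \|p_B[n] - x_B[n]\|^2 + (1-\beta)\|p_D[n]\|^2\}$$

Then the signal model is substituted into the weighted objective J[n]:

$$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\}$$

where $z_B = E\{R_B^T R_B\}$, $z_D = E\{R_D^T R_D\}$, $Q_B = E\{R_B^T X_B\}$; finally, adding the regularization term $\rho W^T W$, the weighted objective J[n] becomes:

$$J[n] = W^T[\beta z_B + (1-\beta) z_D]W - \beta(W^T Q_B + Q_B^T W) + \beta E\{X_B^T X_B\} + \rho W^T W$$

5. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 4, characterized in that, in step 2, the control filter W is updated using stochastic gradient descent.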

6. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 5, characterized in that, in the stochastic gradient descent method, the gradient of the weighted objective J[n] is calculated and W(n+1) is then updated, where μ is the step size, which must satisfy a bound that ensures convergence, $H_B$ is the matrix representation of the transfer function from the speaker array to the microphone array in area B, and $H_D$ is the matrix representation of the transfer function from the speaker array to the microphone array in area D.

7. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 6, characterized in that a momentum variable v is introduced to record the accumulated information of the previous gradients; the momentum variable v and the control filter W are updated according to update rules in which $v_n$ is the momentum at the n-th iteration, α is the momentum coefficient, which controls the degree to which previous momentum information is retained, and η is the learning rate; W(n+1) is updated according to the new momentum $v_{n+1}$, moving in the direction of negative momentum:

$$W(n+1) = W(n) - v_{n+1}$$

When the direction of the gradient remains consistent across multiple iterations, momentum accumulates, so that the step size μ of parameter updates gradually increases, thus accelerating convergence.

8. The real-time optimization method for adaptive personal sound zones in a dynamic environment as described in claim 4, characterized in that, in step 2, the control filter W is updated using Newton's method.

9. A real-time optimization system for adaptive personal sound zones in a dynamic environment, comprising a speaker array consisting of L loudspeakers and two microphone arrays located in zones B and D, wherein the microphone arrays contain $M_B$ and $M_D$ microphones respectively, characterized in that the system further implements the real-time optimization method for adaptive personal sound zones in a dynamic environment as described in any one of claims 1-8.