A directional sector range pickup method, system, terminal and medium based on a linear array of microphones

By using a linear microphone array and a deep network model, the limitations of traditional devices in defining the sound pickup range and suppressing interference are overcome, achieving precise directional fan-shaped sound pickup and improving the user experience and anti-interference capabilities of the sound pickup device.

CN121967959B · Active · Publication Date: 2026-06-30 · ELEVOC TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ELEVOC TECH CO LTD
Filing Date
2026-04-01
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Traditional single-microphone or dual-microphone devices suffer from problems such as unclear sound pickup range and multiple sound sources affecting the interactive experience in outdoor recording and smart conferencing scenarios, making it difficult to achieve accurate sound pickup and anti-interference.

Method used

By employing a linear microphone array, the angle and positional relationship of the sound source are determined through the phase difference between omnidirectional microphones and the amplitude difference between directional and omnidirectional microphones. Combined with a deep network model, directional fan-shaped range sound pickup is achieved.

Benefits of technology

It enables precise control of the pickup distance and angle, isolates interference sources, enhances the user interaction experience, and improves the accuracy and anti-interference capability of the pickup.

✦ Generated by Eureka AI based on patent content.

Figure CN121967959B_ABST

Abstract

This invention discloses a method, system, terminal, and medium for directional fan-shaped sound pickup based on a microphone linear array. The method includes: establishing a microphone linear array, which includes at least one directional microphone and several omnidirectional microphones; determining the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones; determining a fan-shaped sound pickup area based on the sound source angle and distance to achieve fan-shaped sound pickup; determining the positional relationship between the sound source and the microphone linear array based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones; and obtaining a directional fan-shaped sound pickup area based on the positional relationship and the fan-shaped sound pickup area to achieve directional fan-shaped sound pickup. The positional relationship refers to the sound source being located in front of or behind the microphone linear array. This invention can precisely control the pickup distance and angle, effectively solving the problem of directional fan-shaped sound pickup.

Description

Technical Field

[0001] This invention relates to the field of AI intelligent voice processing technology, and in particular to a method, system, terminal and medium for directional fan-shaped sound pickup based on a microphone linear array.

Background Technology

[0002] With the increasing prevalence of outdoor recording, smart conferencing, and various in-store or shopping mall interactions, the market has placed higher demands on the accuracy, anti-interference capabilities, and scene adaptability of voice pickup devices. Traditional single-microphone or dual-microphone devices suffer from unclear pickup range. While dual-microphone devices can isolate speakers outside the pickup angle, in most scenarios, multiple sound sources will still be present within the pickup angle, significantly impacting the interactive experience or drastically reducing recognition rates, thus affecting user experience.

[0003] Therefore, existing technologies still have shortcomings.

Summary of the Invention

[0004] To address the aforementioned deficiencies in the prior art, this invention provides a method, system, terminal, and medium for directional fan-shaped sound pickup based on a microphone linear array. The technical solution adopted by this invention is as follows:

[0005] In a first aspect, the present invention provides a directional fan-shaped range sound pickup method based on a linear microphone array, the method comprising:

[0006] A linear microphone array is established, comprising at least one directional microphone and several omnidirectional microphones;

[0007] The sound source angle and the distance between the sound source and the center of the omnidirectional microphones are determined based on the phase difference between the omnidirectional microphones. A fan-shaped pickup area is then determined based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones to achieve fan-shaped range pickup.

[0008] Based on the amplitude difference between the directional microphone and any omnidirectional microphone, the positional relationship between the sound source and the linear array of microphones is determined, and based on the positional relationship and the fan-shaped pickup area, a directional fan-shaped pickup area is obtained to achieve directional fan-shaped range pickup. The positional relationship refers to the sound source being located in front of or behind the linear array of microphones.

[0009] In one implementation, at least three omnidirectional microphones are provided.

[0010] In one implementation, one directional microphone and four omnidirectional microphones are provided, the omnidirectional microphones being a first microphone, a second microphone, a third microphone, and a fourth microphone. Determining the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones, and determining a fan-shaped pickup area based on the sound source angle and that distance to achieve fan-shaped range pickup, includes:

[0011] When the sound source is a single, noise-free, reverberation-free, and interference-free ideal sound source, short-time Fourier transforms are performed on the received signals of the first microphone, the second microphone, the third microphone, and the fourth microphone respectively to obtain the frequency domain characteristics of the first microphone, the second microphone, the third microphone, and the fourth microphone respectively.

[0012] The first phase difference between the received signals of the first microphone and the second microphone is determined based on the first correlation between the frequency domain characteristics of the first microphone and the second microphone, and the second phase difference between the received signals of the third microphone and the fourth microphone is determined based on the second correlation between the frequency domain characteristics of the third microphone and the fourth microphone.

[0013] The first time difference between the received signals of the first microphone and the second microphone is determined based on the first phase difference, and the second time difference between the received signals of the third microphone and the fourth microphone is determined based on the second phase difference.

[0014] The first sound source angle and the second sound source angle are determined based on the first time difference and the second time difference;

[0015] Determine the first spacing between the first microphone and the second microphone, and the second spacing between the second microphone and the third microphone. Based on the first sound source angle, the second sound source angle, the first spacing, and the second spacing, determine the first distance from the sound source to the midpoint between the first microphone and the second microphone, and the second distance from the sound source to the midpoint between the third microphone and the fourth microphone, respectively, thereby obtaining the distance between the sound source and the center of the omnidirectional microphones.

[0016] Based on the first spacing, the second spacing, the first distance, and the second distance, the sound source distance between the sound source and the directional microphone is determined;

[0017] The sound source distance is compared with a preset sound source distance threshold. If the sound source distance is less than or equal to the preset sound source distance threshold, the sound source is determined to be within the fixed-distance pickup range.

[0018] Based on the first distance, the second distance, the first sound source angle, and the second sound source angle, a fan-shaped pickup area is obtained, thereby achieving fan-shaped range pickup.

[0019] In one implementation, determining the positional relationship between the sound source and the linear microphone array based on the amplitude difference between the directional microphone and any omnidirectional microphone includes:

[0020] Perform short-time Fourier transforms on the received signals from the directional microphone and any omnidirectional microphone respectively, and calculate the first decibel value and the second decibel value;

[0021] Based on the first decibel value and the second decibel value, the amplitude difference between the directional microphone and any omnidirectional microphone is obtained;

[0022] The amplitude difference is compared with a preset amplitude difference threshold to obtain the positional relationship between the sound source and the microphone linear array. If the amplitude difference is less than the preset amplitude difference threshold, the positional relationship is that the sound source is in front of the microphone linear array. If the amplitude difference is greater than the preset amplitude difference threshold, the positional relationship is that the sound source is behind the microphone linear array.

[0023] In one implementation, the method further includes:

[0024] When the sound source is a multi-source sound source containing noise signal, main signal, and interference signal, training data is generated in a simulation, and a deep network model is trained based on the training data. The training data is interference and noise signals in a multi-dimensional spatial environment simulating different reverberation times and sound source distances.

[0025] Short-time Fourier transforms are performed on the received signals from the first microphone, the second microphone, the third microphone, and the fourth microphone to obtain the frequency domain features of the first microphone, the second microphone, the third microphone, and the fourth microphone, respectively. The real and imaginary parts of the frequency domain features of the first microphone, the second microphone, the third microphone, and the fourth microphone are then extracted.

[0026] Perform short-time Fourier transforms on the received signals from the directional microphone and any one of the omnidirectional microphones respectively, and extract the amplitudes of the directional microphone and any one of the omnidirectional microphones.

[0027] The extracted real and imaginary features, along with all amplitudes, are combined to form an input feature set. This input feature set is then fed into the trained deep network model, which outputs the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area. Finally, an inverse short-time Fourier transform is used to obtain the time-domain speech signal corresponding to the directional fan-shaped pickup area.
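The feature assembly described in [0025] to [0027] can be sketched for a single STFT frame as follows. This is only an illustration: the function name `build_input_features`, the bin count, and the ordering of the concatenated features are assumptions, since the text does not specify the layout of the input feature set.

```python
import numpy as np

def build_input_features(omni_specs: np.ndarray,
                         dir_spec: np.ndarray,
                         ref_omni_spec: np.ndarray) -> np.ndarray:
    """Assemble one frame of model input (layout is an assumption).

    omni_specs:    (4, bins) complex spectra of the four omnidirectional microphones
    dir_spec:      (bins,) complex spectrum of the directional microphone
    ref_omni_spec: (bins,) complex spectrum of the chosen omnidirectional microphone
    """
    # Real and imaginary parts of all four omnidirectional microphones.
    real_imag = np.concatenate([omni_specs.real.ravel(), omni_specs.imag.ravel()])
    # Amplitudes of the directional microphone and of one omnidirectional microphone.
    amplitudes = np.concatenate([np.abs(dir_spec), np.abs(ref_omni_spec)])
    return np.concatenate([real_imag, amplitudes])

bins = 129  # hypothetical: one-sided spectrum of a 256-sample frame
omni = np.ones((4, bins), dtype=complex) * (3 + 4j)
directional = np.full(bins, 0.5 + 0j)
features = build_input_features(omni, directional, omni[0])
```

With four omnidirectional spectra of 129 bins each, the vector holds 4 × 129 real values, 4 × 129 imaginary values, and 2 × 129 amplitudes.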

[0028] In one implementation, the deep network model employs a recurrent neural network, and the loss function includes frequency domain perceptual loss and time domain fidelity loss.

[0029] In one implementation, feeding the input feature set into the trained deep network model and outputting the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area includes:

[0030] The real and imaginary features and magnitude are modeled using two independent LSTMs in the deep network model to obtain the modeling results.

[0031] The modeling results are fused and time-series modeled using a continuous LSTM in the deep network model, and the masking results are obtained using two independent mask decoders in the deep network model.

[0032] The mask result is multiplied onto the real and imaginary features of the first microphone to output the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area.
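The masking step in [0032] can be illustrated with a short sketch. It assumes that the two mask decoders produce one real-valued mask for the real part and one for the imaginary part; the text does not state this, so the pairing and the name `apply_masks` are hypothetical.

```python
import numpy as np

def apply_masks(mic1_spec: np.ndarray,
                mask_real: np.ndarray,
                mask_imag: np.ndarray) -> np.ndarray:
    """Multiply per-bin masks onto the real and imaginary parts of mic1's spectrum.

    Assumption: one decoder yields mask_real, the other mask_imag.
    """
    return mask_real * mic1_spec.real + 1j * (mask_imag * mic1_spec.imag)

mic1 = np.array([1 + 2j, 3 - 4j])
enhanced = apply_masks(mic1,
                       mask_real=np.array([0.5, 1.0]),
                       mask_imag=np.array([0.0, 0.5]))
```

The resulting spectrum would then go through an inverse short-time Fourier transform to produce the time-domain signal.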

[0033] In a second aspect, embodiments of the present invention also provide a directional fan-shaped range sound pickup system based on a microphone linear array, wherein the system is used to implement the steps of the directional fan-shaped range sound pickup method based on a microphone linear array as described in any of the above solutions, and the system includes:

[0034] A microphone linear array construction module is used to build a microphone linear array, wherein the microphone linear array includes at least one directional microphone and several omnidirectional microphones;

[0035] The fan-shaped pickup module is used to determine the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones, and to determine the fan-shaped pickup area based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones, thereby realizing fan-shaped range pickup.

[0036] The directional fan-shaped pickup module is used to determine the positional relationship between the sound source and the linear array of microphones based on the amplitude difference between the directional microphone and any omnidirectional microphone, and to obtain the directional fan-shaped pickup area based on the positional relationship and the fan-shaped pickup area, thereby realizing directional fan-shaped pickup. The positional relationship refers to the sound source being located in front of or behind the linear array of microphones.

[0037] In a third aspect, embodiments of the present invention also provide a terminal, wherein the terminal includes a memory, a processor, and a directional fan-shaped range pickup program based on a microphone linear array stored in the memory and executable on the processor. When the processor executes the directional fan-shaped range pickup program based on a microphone linear array, it implements the steps of the directional fan-shaped range pickup method based on a microphone linear array according to any of the above-described solutions.

[0038] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a directional fan-shaped range pickup program based on a microphone linear array which, when executed, implements the steps of the directional fan-shaped range pickup method based on a microphone linear array as described in any of the above solutions.

[0039] Beneficial Effects: Compared with existing technologies, this invention provides a directional fan-shaped pickup method based on a microphone linear array. First, a microphone linear array is established, comprising at least one directional microphone and several omnidirectional microphones. Then, the sound source angle and the distance between the sound source and the center of the omnidirectional microphones are determined based on the phase difference between the omnidirectional microphones. A fan-shaped pickup area is then determined based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones, achieving fan-shaped range pickup. Next, the positional relationship between the sound source and the microphone linear array is determined based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones. Based on this positional relationship and the fan-shaped pickup area, a directional fan-shaped pickup area is obtained, achieving directional fan-shaped range pickup. The positional relationship refers to the sound source being located in front of or behind the microphone linear array.

[0040] This invention overcomes the limitations of traditional microphone technology in range definition and interference suppression by constructing a precise, controllable, and highly interference-resistant sound pickup system. Its core function lies in the precise control of pickup distance and angle, resulting in a fan-shaped pickup area rather than the purely angular area of dual-microphone or traditional multi-microphone systems. It isolates all interference sources outside the pickup range, picking up only the speaker's voice within the fan-shaped area. Furthermore, since a linear microphone array cannot distinguish between front and back directions, resulting in a symmetrical pickup range, this invention adds a directional microphone to the linear array to achieve directional fan-shaped range pickup, thereby significantly improving the user's interactive experience.

Attached Figure Description

[0041] Figure 1 is a flowchart of a preferred embodiment of the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0042] Figure 2 is a schematic diagram of the structure of the microphone linear array in the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0043] Figure 3 is a schematic diagram illustrating the principle of the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0044] Figure 4 is a flowchart illustrating the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0045] Figure 5 is a suppression diagram of the directional microphone in the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0046] Figure 6 is a flowchart illustrating the application of the directional fan-shaped range sound pickup method based on a microphone linear array to a deep network model according to an embodiment of the present invention.

[0047] Figure 7 is a structural diagram of the deep network model in the directional fan-shaped range sound pickup method based on a microphone linear array according to an embodiment of the present invention.

[0048] Figure 8 is a spectrum analysis diagram of the collected input signals of 36 test utterances, with 0° directly in front of the microphone array and one test angle every 10°.

[0049] Figure 9 is a spectrum analysis diagram of the output result obtained by applying the directional fan-shaped range sound pickup method based on a microphone linear array of this embodiment of the invention to test voice 1.

[0050] Figure 10 is a spectrum analysis diagram of the input signals collected under various test scenarios, including fixed-distance tests (0.7 m, 0.9 m, 1.1 m, 1.5 m), front and rear dual-talk, external dual-talk, and one internal and one external dual-talk.

[0051] Figure 11 is a spectrum analysis diagram of the output result obtained by applying the directional fan-shaped range sound pickup method based on a microphone linear array of this embodiment of the invention to test voice 2.

[0052] Figure 12 is a schematic diagram of a directional fan-shaped range sound pickup system based on a microphone linear array provided by an embodiment of the present invention.

[0053] Figure 13 is a schematic diagram of a terminal provided in an embodiment of the present invention.

Detailed Implementation

[0054] To make the objectives, technical solutions, and effects of this invention clearer and more explicit, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0055] The flowchart shown in the attached diagram is for illustrative purposes only and does not necessarily include all content, operations, or steps, nor does it require execution in the described order. For example, some operations or steps can be broken down, combined, or partially merged, so the actual execution order may change depending on the actual situation.

[0056] It should be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0057] It should be understood that, in order to clearly describe the technical solutions of the embodiments of the present invention, the terms "first" and "second" are used in the embodiments of the present invention to distinguish identical or similar items with essentially the same function and effect. For example, "first control information" and "second control information" are only used to distinguish different control information and do not limit their order.

[0058] Those skilled in the art will understand that the words "first" and "second" do not limit the quantity or the order of execution, and that the words "first" and "second" do not necessarily imply that they are different.

[0059] It should also be understood that the term “and / or” as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0060] This invention provides a directional fan-shaped range sound pickup method based on a microphone linear array, which effectively solves the problem of sound pickup in a directional fan-shaped area, while also addressing voice enhancement and noise suppression. Specifically, this method can be applied to a terminal, such as a computer, smart speaker, or other intelligent device with sound pickup capabilities. As shown in Figure 1, the directional fan-shaped range sound pickup method based on a microphone linear array in this embodiment includes the following steps:

[0061] Step S100: Establish a microphone linear array, which includes at least one directional microphone and several omnidirectional microphones.

[0062] Specifically, the linear microphone array in this embodiment includes at least one directional microphone and several omnidirectional microphones. At least three omnidirectional microphones are provided. It should be noted that this embodiment does not limit the position of each microphone or the spacing between all microphones; the spacing between all microphones is flexible and variable, and the included angle and fixed distance range can be adjusted according to requirements.

[0063] In a preferred embodiment, as shown in Figure 2, four omnidirectional microphones and one directional microphone are provided. The directional microphone (i.e., mic5 in Figure 2) is located at the center of the linear microphone array and may be of a type such as ECM (electret condenser microphone) or MEMS (microelectromechanical system microphone). The four omnidirectional microphones are located on either side of the directional microphone, forming a symmetrical linear topology. The four omnidirectional microphones are: the first microphone (i.e., mic1 in Figure 2), the second microphone (i.e., mic2 in Figure 2), the third microphone (i.e., mic3 in Figure 2), and the fourth microphone (i.e., mic4 in Figure 2). The first and second microphones are located to the left of the directional microphone, and the third and fourth microphones are located to the right of the directional microphone. As can be seen from Figure 2, the first and fourth microphones are symmetrically arranged, and the second and third microphones are symmetrically arranged, forming a linear topology that is symmetrical from left to right.
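The symmetric layout can be sketched as coordinates along a single axis. This is a minimal illustration: it assumes the directional microphone mic5 sits at the origin, that the mic1 to mic2 spacing equals the mic3 to mic4 spacing (called d1 in the description of Figure 3), and that d2 is the mic2 to mic3 spacing. The function name and the example spacings are hypothetical, as the text gives no numeric values.

```python
def linear_array_positions(d1: float, d2: float) -> dict[str, float]:
    """Return x-coordinates (metres) of a left-right symmetric 4+1 linear array.

    Assumptions: mic5 is at x = 0, mic1 and mic2 are on the negative side,
    mic3 and mic4 are on the positive side.
    """
    half = d2 / 2.0
    return {
        "mic1": -(half + d1),
        "mic2": -half,
        "mic5": 0.0,
        "mic3": half,
        "mic4": half + d1,
    }

positions = linear_array_positions(d1=0.04, d2=0.08)  # hypothetical spacings
# Midpoints of the two omnidirectional pairs; they end up d1 + d2 apart.
left_mid = (positions["mic1"] + positions["mic2"]) / 2.0
right_mid = (positions["mic3"] + positions["mic4"]) / 2.0
baseline = right_mid - left_mid
```

In this geometry the two pair midpoints are separated by d1 + d2 and the directional microphone lies halfway between them, which is what the later distance calculation relies on.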

[0064] Step S200: Determine the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones. Determine the fan-shaped pickup area based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones to achieve fan-shaped range pickup.

[0065] In practical applications, the sound source is assumed to be a single, noise-free, reverberation-free, and interference-free ideal source. In beamforming and linear array signal processing, spatial signals are often treated as plane waves, which facilitates the calculation of signal source angles and beamforming. However, for fixed-distance sound pickup, it is necessary to split the signal across multiple microphones or to treat the signal as a spherical wave, as shown in the schematic diagram of Figure 3. The principle is explained using the left-right symmetrical linear topology of Figure 2 as an example; other asymmetrical linear topologies can also implement the solution of this invention. In Figure 3, d1 represents the first spacing between the first and second microphones, which also equals the spacing between the third and fourth microphones, and d2 represents the second spacing between the second and third microphones. α and β represent the angles between the sound source and the two microphone pairs: α is the first sound source angle between the sound source and the array formed by the first and second microphones, and β is the second sound source angle between the sound source and the array formed by the third and fourth microphones. dm1 and dm2 are the distances from the sound source to the centers of the two omnidirectional microphone pairs. Specifically, since there are four omnidirectional microphones (the first and second microphones are located to the left of the directional microphone, and the third and fourth microphones are located to the right), dm1 is the first distance from the sound source to the center of the left omnidirectional microphones (the midpoint between the first and second microphones), dm2 is the second distance from the sound source to the center of the right omnidirectional microphones (the midpoint between the third and fourth microphones), and dm is the final calculated sound source distance between the sound source and the directional microphone.

[0066] Specifically, as shown in Figure 4, this embodiment first performs a short-time Fourier transform on the received signals of the first microphone, the second microphone, the third microphone, and the fourth microphone to obtain the frequency domain characteristics of the first microphone, the second microphone, the third microphone, and the fourth microphone, respectively. Then, based on the real and imaginary parts of the obtained frequency domain characteristics, angle and fan-shaped range sound pickup processing is performed to obtain the fan-shaped pickup area and carry out fan-shaped range pickup.

[0067] This embodiment takes the first microphone and the second microphone as an example. Let the received signal of the first microphone (mic1) be $x_1(n)$ and the received signal of the second microphone (mic2) be $x_2(n)$, with a sampling rate f, a sound speed c, and a sound source distance threshold dt. Performing a short-time Fourier transform on the received signal of the first microphone yields its frequency domain characteristics $X_1(k)$, and performing a short-time Fourier transform on the received signal of the second microphone yields the frequency domain characteristics of the second microphone, $X_2(k)$.
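As an illustration of the short-time Fourier transform step, the sketch below frames a signal, applies a window, and takes a one-sided FFT per frame. The frame length, hop size, Hann window, and 16 kHz sampling rate are assumptions chosen for the example; the text does not specify them.

```python
import numpy as np

def stft(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return a (frames, frame_len // 2 + 1) complex array of one-sided spectra."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

fs = 16000                             # hypothetical sampling rate in Hz
t = np.arange(fs // 10) / fs           # 100 ms of samples
x1 = np.sin(2 * np.pi * 1000 * t)      # stand-in for the mic1 signal: a 1 kHz tone
X1 = stft(x1)
real_part, imag_part = X1.real, X1.imag  # the features used by later steps
```

The same function would be applied to each of the four omnidirectional signals and to the directional microphone signal.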

[0068] Next, a first phase difference between the received signals of the first and second microphones is determined based on a first correlation between their frequency domain characteristics, and a second phase difference between the received signals of the third and fourth microphones is determined based on a second correlation between their frequency domain characteristics. Taking the first and second microphones as an example, with $X_1(k)$ and $X_2(k)$ denoting their frequency domain characteristics, the first correlation is expressed as $R_{12}(k) = \dfrac{X_1(k)\,X_2^{*}(k)}{\left|X_1(k)\,X_2^{*}(k)\right|}$, where $(\cdot)^{*}$ denotes the complex conjugate and $|\cdot|$ denotes the magnitude. An inverse FFT (Fast Fourier Transform) is then performed on the first correlation $R_{12}(k)$ to obtain $r_{12}(n)$, and the peak position of $r_{12}(n)$ is recorded as $\tau_{12}$ (a sampling point offset). This peak position $\tau_{12}$ is the first phase difference between the received signals of the first and second microphones. Similarly, the second correlation between the frequency domain characteristics of the third and fourth microphones, as well as the second phase difference between the received signals of the third and fourth microphones, can be calculated.
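The correlation, inverse FFT, and peak search described here match the structure of a phase-transform weighted cross-correlation (often called GCC-PHAT). The sketch below implements that technique as one plausible reading of the step, not necessarily the exact formula of the filing. The function name, the zero-padding length, the `max_shift` search window, and the sign convention are assumptions.

```python
import numpy as np

def gcc_phat_offset(x1: np.ndarray, x2: np.ndarray, max_shift: int) -> int:
    """Return the lag (in samples) at which the PHAT-weighted correlation peaks.

    Sign convention (an assumption): the result is negative when x2 is a
    delayed copy of x1, and positive when x1 is the delayed one.
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)               # cross-spectrum using the conjugate
    cross /= np.abs(cross) + 1e-12         # normalise by magnitude, keep phase only
    r = np.fft.irfft(cross, n=n)           # back to the lag domain
    # Reorder so that index 0 corresponds to lag -max_shift.
    r = np.concatenate((r[-max_shift:], r[: max_shift + 1]))
    return int(np.argmax(r)) - max_shift

rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(3), source[:-3]))  # second signal lags by 3 samples
offset = gcc_phat_offset(source, delayed, max_shift=16)
```

Restricting the search to `max_shift` samples reflects the fact that the physical delay between two microphones is bounded by their spacing divided by the speed of sound.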

[0069] Furthermore, a first time difference between the received signals of the first microphone and the second microphone is determined based on the first phase difference, and a second time difference between the received signals of the third microphone and the fourth microphone is determined based on the second phase difference. Taking the first and second microphones as an example, after the first phase difference (i.e., the peak position, a sampling point offset) $\tau_{12}$ between their received signals has been calculated, the first time difference between the received signals of the first and second microphones is expressed as $\Delta t_{12} = \tau_{12} / f$, where f is the sampling rate. Similarly, the second time difference between the received signals of the third and fourth microphones can be calculated. Then, based on the first time difference and the second time difference, the first sound source angle (i.e., the included angle α in Figure 3) and the second sound source angle (i.e., the included angle β in Figure 3) are determined. Taking the first and second microphones as an example, the first sound source angle α is obtained from the first time difference $\Delta t_{12}$, the sound speed c, and the first spacing d1; the second sound source angle β can be calculated similarly.
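The conversion from sample offset to time difference and then to an angle can be sketched as below. Dividing the offset by the sampling rate follows directly from the text. The angle formula is an assumption, because the original expression is not legible: the sketch uses the common far-field relation cos(angle) = c · Δt / spacing, with the angle measured from the array axis. The function name and the default sound speed of 343 m/s are also assumptions.

```python
import math

def tdoa_to_angle(offset_samples: int, fs: float, spacing: float,
                  c: float = 343.0) -> tuple[float, float]:
    """Return (time difference in seconds, angle in degrees from the array axis).

    Assumes a far-field (plane-wave) model for the microphone pair.
    """
    delta_t = offset_samples / fs
    # Clamp to [-1, 1] so that estimation noise cannot break acos.
    cos_angle = max(-1.0, min(1.0, c * delta_t / spacing))
    return delta_t, math.degrees(math.acos(cos_angle))

# Zero offset means the wavefront reaches both microphones together (broadside).
dt_broadside, angle_broadside = tdoa_to_angle(0, fs=16000, spacing=0.04)
# A one-sample offset at 16 kHz with a hypothetical 4 cm spacing.
dt_one, angle_one = tdoa_to_angle(1, fs=16000, spacing=0.04)
```

At small spacings a single sample of offset already corresponds to a large change in angle, so practical systems often interpolate around the correlation peak; that refinement is not part of this sketch.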

[0070] Next, this embodiment determines the first spacing d1 between the first microphone and the second microphone and the second spacing d2 between the second microphone and the third microphone, and, based on the first sound source angle α, the second sound source angle β, the first spacing d1, and the second spacing d2, determines the first distance dm1 from the sound source to the midpoint between the first and second microphones and the second distance dm2 from the sound source to the midpoint between the third and fourth microphones, respectively. This gives the distances between the sound source and the centers of the two omnidirectional microphone pairs. Specifically, dm1 and dm2 are calculated using the law of sines for triangles.
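The law-of-sines step can be illustrated under explicit assumptions, since the original expressions for dm1 and dm2 are not legible. The sketch considers the triangle formed by the sound source and the midpoints of the two omnidirectional pairs, takes α and β as the interior angles of that triangle at the left and right midpoints, and uses a baseline of d1 + d2 between the midpoints, which holds for the symmetric topology. The function name and the example geometry are hypothetical.

```python
import math

def pair_distances(alpha_deg: float, beta_deg: float,
                   d1: float, d2: float) -> tuple[float, float]:
    """Return (dm1, dm2) by the law of sines.

    alpha_deg, beta_deg: interior angles at the left and right pair midpoints.
    The angle at the source is 180 - alpha - beta degrees.
    """
    alpha, beta = math.radians(alpha_deg), math.radians(beta_deg)
    baseline = d1 + d2
    sin_apex = math.sin(alpha + beta)      # equals sin(pi - alpha - beta)
    if sin_apex <= 1e-9:
        raise ValueError("angles do not form a triangle with a finite apex")
    dm1 = baseline * math.sin(beta) / sin_apex   # side opposite beta
    dm2 = baseline * math.sin(alpha) / sin_apex  # side opposite alpha
    return dm1, dm2

# Hypothetical check geometry: source at (0.3, 0.8) m, midpoints at x = -0.06 and 0.06.
sx, sy, left_mid, right_mid = 0.3, 0.8, -0.06, 0.06
alpha_deg = math.degrees(math.atan2(sy, sx - left_mid))
beta_deg = math.degrees(math.atan2(sy, right_mid - sx))
dm1, dm2 = pair_distances(alpha_deg, beta_deg, d1=0.04, d2=0.08)
```

When the source is far away compared with the baseline, α + β approaches 180° and the estimate becomes sensitive to small angle errors, which is why the guard on `sin_apex` is included.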

[0071] Furthermore, in this embodiment, the sound source distance dm between the sound source and the directional microphone is determined based on the first spacing d1, the second spacing d2, the first distance dm1, and the second distance dm2. By the properties of triangles, the length of the line connecting a vertex to the midpoint of the opposite side (the median) can be calculated from the side lengths. In the symmetrical topology the midpoints of the two omnidirectional pairs are d1 + d2 apart and the directional microphone lies midway between them, so the sound source distance is $d_m = \tfrac{1}{2}\sqrt{2 d_{m1}^2 + 2 d_{m2}^2 - (d_1 + d_2)^2}$. Finally, the sound source distance dm is compared with a preset sound source distance threshold dt. If the sound source distance dm is less than or equal to the preset sound source distance threshold dt, the sound source is determined to be within the fixed-distance pickup range. Therefore, based on the first distance dm1, the second distance dm2, the first sound source angle α, and the second sound source angle β, this embodiment obtains a fan-shaped pickup area and achieves fan-shaped range pickup. If the sound source distance dm is greater than the preset sound source distance threshold dt, the sound source is outside the fixed-distance pickup range and is isolated.
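The median-length calculation and the threshold comparison can be sketched as follows. The sketch assumes the symmetric topology, in which the two pair midpoints are d1 + d2 apart and the directional microphone is halfway between them, and applies the standard triangle median formula. The function names and numeric values are hypothetical.

```python
import math

def source_distance(dm1: float, dm2: float, d1: float, d2: float) -> float:
    """Length of the median from the source to the midpoint of the baseline."""
    baseline = d1 + d2
    radicand = 2.0 * dm1 ** 2 + 2.0 * dm2 ** 2 - baseline ** 2
    return 0.5 * math.sqrt(max(radicand, 0.0))  # guard against rounding below zero

def within_pickup_range(dm: float, dt: float) -> bool:
    """True when the source is inside the fixed-distance pickup range (dm <= dt)."""
    return dm <= dt

# Same hypothetical geometry as a source at (0.3, 0.8) m from the array centre.
dm = source_distance(math.hypot(0.36, 0.8), math.hypot(0.24, 0.8), d1=0.04, d2=0.08)
inside = within_pickup_range(dm, dt=1.0)
outside = within_pickup_range(dm, dt=0.7)
```

A source that fails the comparison is treated as outside the pickup range and is isolated, as the paragraph above describes.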

[0072] Step S300: Based on the amplitude difference between the directional microphone and any omnidirectional microphone, determine the positional relationship between the sound source and the linear array of microphones, and based on the positional relationship and the fan-shaped pickup area, obtain a directional fan-shaped pickup area to achieve directional fan-shaped range pickup. The positional relationship refers to the sound source being located in front of or behind the linear array of microphones.

[0073] Although a linear microphone array can restrict sound pickup by angle and distance, it cannot distinguish front from back. Therefore, this embodiment uses the amplitude difference between a directional microphone and an omnidirectional microphone to determine the front-back direction. For example, as shown in Figure 5, by analyzing the suppression amplitude of the directional microphone (mic5) in different directions, an amplitude difference threshold X for distinguishing front from rear is set: when the amplitude difference is less than X, the sound source is considered to be in front; when the amplitude difference is greater than X, the sound source is considered to be behind. In Figure 5, 0 on the outer red scale represents directly in front and 180 represents directly behind, the green scale represents the suppression value in decibels, and the blue curve represents the gain of the directional microphone mic5 when the signal source is at angular positions from 0° to 360°.

[0074] Specifically, as shown in Figure 4, this embodiment performs short-time Fourier transforms on the received signals of the directional microphone and of any one omnidirectional microphone (taking the first microphone mic1 as an example) to obtain their frequency domain features. The amplitudes of the directional microphone and the omnidirectional microphone are then calculated, and directional sound pickup processing is performed. This embodiment calculates acoustic decibel values from the frequency domain features to obtain a first decibel value D1 and a second decibel value D2. The calculation of D1 and D2 is the classic frequency-domain amplitude-to-decibel conversion used in speech signal processing, performed frame by frame and frequency bin by frequency bin, and is not elaborated here. Next, the amplitude difference between the directional microphone and the omnidirectional microphone is obtained from the two decibel values as D = |D1 - D2|. The amplitude difference D is then compared with a preset amplitude difference threshold X to obtain the positional relationship between the sound source and the microphone linear array: if D is less than X, the sound source is in front of the microphone linear array; if D is greater than X, the sound source is behind the microphone linear array.
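
A minimal sketch of this front/back decision for a single frame is given below. Several details are assumptions: the Hann window, the averaging of the per-bin differences into one value, and the handling of D equal to X (which the text does not specify and which the sketch treats as "back"). The function names are hypothetical.

```python
import numpy as np


def frame_db(frame, eps=1e-12):
    """Per-bin magnitude of one windowed frame, converted to decibels."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    return 20.0 * np.log10(np.abs(spectrum) + eps)


def front_or_back(directional_frame, omni_frame, threshold_db):
    """Compare D = |D1 - D2| (averaged over bins, an assumption) with the threshold X."""
    d1 = frame_db(directional_frame)
    d2 = frame_db(omni_frame)
    diff = float(np.mean(np.abs(d1 - d2)))
    # D < X -> front; D > X -> back. D == X is unspecified in the text; treated as back here.
    label = "front" if diff < threshold_db else "back"
    return label, diff


rng = np.random.default_rng(1)
omni = rng.standard_normal(512)
# A directional microphone that attenuates the source by 20 dB (factor 0.1).
label, diff = front_or_back(0.1 * omni, omni, threshold_db=6.0)
```

In practice the threshold X would be read from a measured pattern such as the one in Figure 5; the 6 dB value above is only a placeholder.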

[0075] Once the positional relationship is determined, it is known whether the sound source is located in front of or behind the microphone linear array. In practical applications, as shown in Figure 4, the angle and distance processing determines the fan-shaped pickup area, the directional pickup processing determines whether the sound source is in front of or behind the microphone linear array, and combining the positional relationship with the fan-shaped pickup area yields the directional fan-shaped pickup area. The frequency domain features of the speech within the directional fan-shaped pickup area are thereby obtained. After an inverse short-time Fourier transform, the time-domain speech signal is output, thus achieving directional fan-shaped pickup. Therefore, this invention can precisely control the pickup distance and angle, with a fan-shaped pickup range that isolates interference sources outside the range and picks up only the speech of speakers within the fan-shaped area.

[0076] In another implementation, real-world scenarios are affected by factors such as human interference, noise, and reverberation, and theoretical calculation alone cannot make an accurate determination. Deep learning is therefore used: simulation data is generated for training, and the trained model performs the processing. Accordingly, this embodiment can train a deep network model. As shown in Figure 6, the entire process of training the deep network model consists of four parts: data simulation and generation, input feature calculation, the deep network model, and the loss function. The deep network model replaces the range pickup processing and the directional pickup processing in Figure 4. Based on the deep network model, this embodiment can handle the effects of interference, noise, reverberation, and other factors in real-world scenarios more accurately and robustly.

[0077] In practical applications, when the sound source is a multi-source sound source containing noise signals, main speaker signals, and interference signals, training data is generated through simulation, and a deep network model is trained on this training data. The training data simulates interference and noise signals in multi-dimensional spatial environments with different reverberation times and sound source distances. The training data also includes a preset angle range and a preset distance threshold (such as a 90° angle and a 1 m distance). These two parameters are incorporated into the model through scene annotations in the training data, which simulates both "in-sector sound sources within the specified angle and distance range" and "interference sound sources outside the range." Through learning, the deep network model uses a mask to assign effective weights to signals within the fan-shaped area (preserving them) and zero weights to signals outside the fan-shaped area (filtering them out). The final pickup range is the fan-shaped area defined by the angle and the distance, which is consistent with the fan-shaped pickup area calculated for an ideal sound source.
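
The scene annotation described above can be illustrated with a small labeling function for simulated sources. This is a sketch under stated assumptions: the front sector is centered on 0° and the rear sector on 180°, the sector boundary and the distance threshold are treated as inclusive, and the name `sector_label` is hypothetical. The default values of 90° and 1 m are the examples given in the text.

```python
def sector_label(azimuth_deg, distance_m, width_deg=90.0, max_distance_m=1.0):
    """Label a simulated source as 'front', 'rear', or 'outside' the pickup sectors."""
    if distance_m > max_distance_m:
        return "outside"                      # beyond the fixed-distance range
    azimuth = azimuth_deg % 360.0
    half_width = width_deg / 2.0
    front_offset = min(azimuth, 360.0 - azimuth)  # angular distance from 0 degrees
    rear_offset = abs(azimuth - 180.0)            # angular distance from 180 degrees
    if front_offset <= half_width:
        return "front"
    if rear_offset <= half_width:
        return "rear"
    return "outside"


labels = [sector_label(a, 0.9) for a in (0, 40, 90, 140, 220, 320, 350)]
```

Sources labeled "front" or "rear" would serve as targets for the corresponding mask, and sources labeled "outside" as interference.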

[0078] For directionality, the deep network model in this embodiment has two independent mask decoders (the front mask decoder and the rear mask decoder in Figure 7) and combines amplitude modeling of the directional microphone and of any one omnidirectional microphone. The two masks therefore correspond to directional fan-shaped areas in front of and behind the microphone linear array, respectively, rather than to the symmetrical fan-shaped area of a conventional linear array. The front mask decoder selects only the signal within the fan-shaped area directly in front, and the rear mask decoder selects only the signal within the fan-shaped area directly behind. Furthermore, a unidirectional (e.g., front only) or bidirectional directional fan-shaped pickup range can be selected according to actual needs, fully meeting the core requirement of directionality.

[0079] Specifically, this embodiment uses the established microphone linear array to generate a large-scale set of simulated room impulse responses (RIRs) as the basis for the noise signals, speaker signals, and interference signals. Based on this dataset, multi-dimensional spatial environments with different reverberation times and sound source distances can be accurately simulated to form a training dataset. Furthermore, to address potential consistency deviations in mass-produced hardware, this embodiment introduces a dynamic frequency response perturbation and nonlinear distortion simulation mechanism during training data generation. Additionally, because this embodiment adds a directional microphone, the directional microphone (mic5) in the generated RIRs does not necessarily meet the corresponding amplitude and frequency response requirements. This embodiment therefore further performs dynamic synthesis according to Figure 5 and the position of the sound source, so that the simulated signal matches the real environment.

[0080] Furthermore, in this embodiment, short-time Fourier transforms are first performed on the received signals from the first microphone (mic1), the second microphone (mic2), the third microphone (mic3), and the fourth microphone (mic4) to obtain the frequency domain features of the first microphone, the second microphone, the third microphone, and the fourth microphone, respectively. The real and imaginary parts of the frequency domain features of the first microphone, the second microphone, the third microphone, and the fourth microphone are then extracted. Short-time Fourier transforms are then performed on the received signals from the directional microphone (mic5) and any omnidirectional microphone (taking the first microphone mic1 as an example), respectively. The amplitudes of the directional microphone and any omnidirectional microphone are then extracted. All extracted real and imaginary part features, along with all amplitudes, are used to form an input feature set.
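
The construction of the input feature set can be sketched as follows. The STFT parameters (512-point Hann window, hop of 256 samples), the array layout, and the names `stft` and `build_features` are assumptions for illustration; the text does not specify them.

```python
import numpy as np


def stft(signal, n_fft=512, hop=256):
    """Simple STFT returning a (frames, bins) complex array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def build_features(omni_signals, directional_signal):
    """Real/imaginary parts of mic1..mic4 plus amplitudes of mic5 and mic1."""
    specs = [stft(channel) for channel in omni_signals]
    # Channel order: mic1 real, mic1 imag, mic2 real, mic2 imag, ...
    real_imag = np.stack([part for s in specs for part in (s.real, s.imag)], axis=-1)
    amplitudes = np.stack([np.abs(stft(directional_signal)), np.abs(specs[0])], axis=-1)
    return real_imag, amplitudes


rng = np.random.default_rng(0)
omni = rng.standard_normal((4, 16000))      # mic1..mic4, one second at an assumed 16 kHz
directional = rng.standard_normal(16000)    # mic5
real_imag, amplitudes = build_features(omni, directional)
```

Keeping the real/imaginary features and the amplitudes in separate arrays mirrors the two independent branches that model them in the network.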

[0081] Next, the input feature set is input into the trained deep network model. As shown in Figure 7, the deep network model in this embodiment employs a recurrent neural network. Its input consists of all extracted real and imaginary features and all amplitudes, i.e., the real and imaginary features of the first microphone mic1, the second microphone mic2, the third microphone mic3, and the fourth microphone mic4, and the amplitudes of the directional microphone mic5 and the first microphone mic1. Two independent LSTM (Long Short-Term Memory) networks in the deep network model model the real and imaginary features and the amplitudes respectively, yielding the modeling results. Consecutive LSTM layers in the deep network model then perform feature fusion and time-series modeling on the modeling results. Because it is necessary to distinguish whether the sound source is in front of or behind the microphone linear array and to determine the sound source distance, the deep network model has two independent mask decoders, a front mask decoder and a rear mask decoder, which estimate the front real and imaginary mask and the rear real and imaginary mask, respectively, and thereby accurately distinguish the front and rear directions. Finally, the mask results output by the two independent mask decoders are multiplied onto the real and imaginary features of the first microphone mic1 to estimate the spectral features of the main speaker signal. This outputs the frequency domain features corresponding to the main speaker signal within the target pickup area (i.e., the directional fan-shaped pickup area), completing the intelligent determination of the pickup range. The time-domain speech signal corresponding to the directional fan-shaped pickup area is then obtained through an inverse short-time Fourier transform, thereby realizing directional fan-shaped pickup.
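
The final masking step can be illustrated as below. The text says only that the mask results are "multiplied onto" the real and imaginary features of mic1; this sketch assumes a complex multiplication of mask and reference spectrum, which is one common reading, while an element-wise product of the real parts and of the imaginary parts would be another. The function name is hypothetical.

```python
import numpy as np


def apply_mask(mask_real, mask_imag, ref_real, ref_imag):
    """Complex multiplication (mask_real + j*mask_imag) * (ref_real + j*ref_imag)."""
    out_real = mask_real * ref_real - mask_imag * ref_imag
    out_imag = mask_real * ref_imag + mask_imag * ref_real
    return out_real, out_imag


# Toy (frames, bins) spectrum of mic1 and two example masks.
ref_real = np.array([[1.0, 3.0]])
ref_imag = np.array([[2.0, 4.0]])
keep = apply_mask(np.ones((1, 2)), np.zeros((1, 2)), ref_real, ref_imag)    # pass-through
reject = apply_mask(np.zeros((1, 2)), np.zeros((1, 2)), ref_real, ref_imag)  # fully suppressed
```

Applying the front mask and the rear mask separately to the same mic1 spectrum gives the front and rear output channels.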

[0082] Furthermore, the loss function of the deep network model in this embodiment includes a frequency-domain perceptual loss and a time-domain fidelity loss. After the waveform synthesis and output steps in the model training phase, the loss function calculates the error between the reconstructed time-domain speech signal and the real target speech to guide backpropagation and parameter updates of the deep network model. To balance spectral detail recovery and time-domain waveform fidelity within a unified optimization framework, the loss function is defined as:

[0083]

[0084] where the terms denote the frequency-domain perceptual loss and the time-domain fidelity loss, respectively.

[0085] The frequency-domain perceptual loss is a weighted average of the amplitude compression loss and the complex normalization loss. It uses power-law compression to balance high- and low-frequency energy and to constrain phase consistency, and is expressed as:

[0086]

[0087] The amplitude compression loss uses power-law compression (exponent 0.3) to increase the weight of weak signals and is expressed as:

[0088]

[0089] where the subscript F denotes the L2 (Frobenius) norm of a matrix, and the two spectra are the speech spectrum predicted by the neural network and the true clean speech spectrum.

[0090] The complex normalization loss introduces a 0.7-power normalization so that the complex phase error is constrained within the compressed domain; its expression includes a very small constant that prevents the denominator from being zero.
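
Because the formulas themselves are not reproduced in this text, the following is only a plausible sketch of the two frequency-domain terms, built from the description above. It assumes a squared Frobenius norm, a normalization of each spectrum by its magnitude raised to the power 0.7, a small constant `eps` of 1e-8, and an equal weighting `alpha` of 0.5 between the two terms. None of these specifics is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np


def amplitude_compression_loss(pred, target, power=0.3):
    """Squared Frobenius norm of the difference of power-law compressed magnitudes."""
    return float(np.sum((np.abs(pred) ** power - np.abs(target) ** power) ** 2))


def complex_normalization_loss(pred, target, norm_power=0.7, eps=1e-8):
    """Squared Frobenius norm of the difference of 0.7-power normalized complex spectra."""
    pred_c = pred / (np.abs(pred) ** norm_power + eps)        # eps avoids division by zero
    target_c = target / (np.abs(target) ** norm_power + eps)
    return float(np.sum(np.abs(pred_c - target_c) ** 2))


def frequency_domain_loss(pred, target, alpha=0.5):
    """Weighted average of the two terms; alpha is an assumed weight."""
    return (alpha * amplitude_compression_loss(pred, target)
            + (1.0 - alpha) * complex_normalization_loss(pred, target))


rng = np.random.default_rng(2)
clean = rng.standard_normal((10, 257)) + 1j * rng.standard_normal((10, 257))
loss_same = frequency_domain_loss(clean, clean)
loss_flipped = frequency_domain_loss(-clean, clean)   # same magnitude, opposite phase
```

The phase-flipped example shows why both terms are used: the amplitude term alone cannot see a phase error, while the complex term penalizes it.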

[0091] The time-domain fidelity loss uses the scale-invariant signal-to-noise ratio (SI-SNR) to eliminate gain scaling errors and to constrain the temporal structure of the waveform, and is expressed as follows:

[0092]

[0093] where the expression is built from the target projection vector and the noise residual, which are calculated as follows:

[0094]

[0095] where the two waveforms are the predicted waveform recovered by the iSTFT (inverse short-time Fourier transform) and the actual target waveform, respectively.
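
For reference, the commonly used definition of SI-SNR can be sketched as follows: the predicted waveform is projected onto the target to obtain the target projection vector, the remainder is the noise residual, and the loss is the negative ratio of their energies in decibels. The zero-mean step, the constant `eps`, and the function name are assumptions; the exact expression of this embodiment is not reproduced in the text.

```python
import numpy as np


def si_snr_loss(pred, target, eps=1e-8):
    """Negative scale-invariant SNR (dB) between a predicted and a target waveform."""
    pred = pred - np.mean(pred)        # zero-mean, a common convention (assumption)
    target = target - np.mean(target)
    # Target projection vector: component of the prediction along the target.
    s_target = (np.dot(pred, target) / (np.dot(target, target) + eps)) * target
    e_noise = pred - s_target          # noise residual
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return float(-10.0 * np.log10(ratio))


rng = np.random.default_rng(3)
target = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
loss_clean = si_snr_loss(target + 0.1 * noise, target)
loss_noisy = si_snr_loss(target + 1.0 * noise, target)
loss_scaled = si_snr_loss(2.0 * (target + 0.1 * noise), target)
```

Rescaling the prediction leaves the loss essentially unchanged, which is the gain-invariance property the paragraph refers to.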

[0096] In summary, this invention proposes a regional sound pickup method based on a linear microphone array, which constructs a precise, controllable, and interference-resistant sound pickup system, breaking through the limitations of traditional sound pickup technology in terms of range definition and interference suppression.

[0097] Furthermore, based on the trained deep learning model, this embodiment designs test scenarios covering angle, front/back direction, fixed distance, and multiple sound sources to verify the model's accuracy in angular, directional, and fixed-distance sound pickup. The experimental results in turn verify the rationality of the hardware topology, the theoretical calculations, and the deep learning model design. Specifically, Figure 8 is a spectrum analysis diagram of the collected input signals for 36 test utterances, with 0° directly in front of the microphone array and one test angle every 10°; it is used to test the accuracy of the angle and of the front/back determination. Figure 9 is a spectrum analysis diagram of the output obtained by applying the directional fan-shaped range sound pickup method based on a microphone linear array of this embodiment to test speech 1. The first channel is the reference first microphone mic1, the second channel is the front output channel, and the third channel is the rear output channel; the included angle is set to a 90° range. As Figure 9 shows, the second channel outputs 0°~40° and 320°~350°, which falls exactly within the 90° pickup angle directly in front, and the third channel outputs 140°~220°, which falls exactly within the 90° pickup angle directly behind, verifying the correctness of the pickup angle and of the front/back pickup. Figure 10 is a spectrum analysis diagram of the input signals collected under various test scenarios, including fixed-distance tests (0.7 m, 0.9 m, 1.1 m, 1.5 m), front-and-rear double-talk, double-talk with both speakers outside the range, and double-talk with one speaker inside and one outside the range. Figure 11 is a spectrum analysis diagram of the output obtained by applying the method of this embodiment to test speech 2. With the fixed-distance range set to 1 m, the method correctly isolates the speech of speakers beyond 1 m. As Figure 11 shows, the 0.7 m and 0.9 m signals are output on the first channel, while the 1.1 m and 1.5 m signals are isolated as outside the pickup range. The front-and-rear double-talk outputs are well separated; when both speakers are outside the range, no sound leaks through; and when one speaker is inside and one outside, the speaker position is correctly identified and the interfering speech is isolated. The above experiments verify the effectiveness of the algorithm for angular, directional, and fixed-distance pickup.

[0098] Based on the above embodiments, the present invention also provides a directional fan-shaped range sound pickup system based on a microphone linear array. The system in this embodiment is used to implement the steps of the above method embodiments. As shown in Figure 12, the system includes: a microphone linear array construction module 10, a fan-shaped pickup module 20, and a directional fan-shaped pickup module 30. Specifically, the microphone linear array construction module 10 is used to establish a microphone linear array, which includes at least one directional microphone and several omnidirectional microphones. The fan-shaped pickup module 20 is used to determine the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones, and to determine a fan-shaped pickup area based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones, thereby achieving fan-shaped pickup. The directional fan-shaped pickup module 30 is used to determine the positional relationship between the sound source and the microphone linear array based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones, and to obtain a directional fan-shaped pickup area based on the positional relationship and the fan-shaped pickup area, thereby achieving directional fan-shaped pickup. The positional relationship refers to the sound source being located in front of or behind the microphone linear array.

[0099] The principle of each module in the directional fan-shaped range sound pickup system based on the microphone linear array in this embodiment is the same as that of each step in the above method embodiment, and will not be elaborated further here.

[0100] Based on the above embodiments, the present invention also provides a terminal, a schematic block diagram of which may be as shown in Figure 13. The terminal may include one or more processors 100 (only one is shown in Figure 13), a memory 101, and a computer program 102 stored in the memory 101 and executable on the one or more processors 100, for example, a directional fan-shaped range pickup program based on a microphone linear array. When the one or more processors 100 execute the computer program 102, they can implement the steps in the embodiments of the directional fan-shaped range pickup method based on a microphone linear array. Alternatively, when the one or more processors 100 execute the computer program 102, they can implement the functions of the modules/units in the embodiment of the directional fan-shaped range pickup system based on a microphone linear array, without limitation herein.

[0101] In one embodiment, the processor 100 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0102] In one embodiment, memory 101 can be an internal storage unit of the terminal, such as a hard disk or RAM. Memory 101 can also be an external storage device of the terminal, such as a plug-in hard disk, smart media card (SM), secure digital card (SD), flash card, etc. Furthermore, memory 101 can include both internal and external storage units. Memory 101 is used to store computer programs and other programs and data required by the terminal. Memory 101 can also be used to temporarily store data that has been output or will be output.

[0103] Those skilled in the art will understand that the block diagram shown in Figure 13 is merely a partial structural diagram related to the present invention and does not constitute a limitation on the terminal to which the present invention is applied. A specific terminal may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0104] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

[0105] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for directional fan-shaped range sound pickup based on a microphone linear array, characterized in that the method includes: A microphone linear array is established, comprising at least one directional microphone and several omnidirectional microphones; The sound source angle and the distance between the sound source and the center of the omnidirectional microphones are determined based on the phase difference between the omnidirectional microphones, and a fan-shaped pickup area is determined based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones to achieve fan-shaped range pickup; Based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones, the positional relationship between the sound source and the microphone linear array is determined, and based on the positional relationship and the fan-shaped pickup area, a directional fan-shaped pickup area is obtained to achieve directional fan-shaped range pickup, the positional relationship referring to the sound source being located in front of or behind the microphone linear array; At least three omnidirectional microphones are provided; When one directional microphone and four omnidirectional microphones are provided, the omnidirectional microphones are respectively: a first microphone, a second microphone, a third microphone, and a fourth microphone.
Determining the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones, and determining a fan-shaped pickup area based on the sound source angle and the distance between the sound source and the center of the omnidirectional microphones to achieve fan-shaped range pickup, includes: When the sound source is a single ideal sound source free of noise, reverberation, and interference, short-time Fourier transforms are performed on the received signals of the first microphone, the second microphone, the third microphone, and the fourth microphone respectively to obtain the frequency domain characteristics of the first microphone, the second microphone, the third microphone, and the fourth microphone respectively; The first phase difference between the received signals of the first microphone and the second microphone is determined based on the first correlation between the frequency domain characteristics of the first microphone and the second microphone, and the second phase difference between the received signals of the third microphone and the fourth microphone is determined based on the second correlation between the frequency domain characteristics of the third microphone and the fourth microphone; The first time difference between the received signals of the first microphone and the second microphone is determined based on the first phase difference, and the second time difference between the received signals of the third microphone and the fourth microphone is determined based on the second phase difference; The first sound source angle and the second sound source angle are determined based on the first time difference and the second time difference; The first spacing between the first microphone and the second microphone and the second spacing between the second microphone and the third microphone are determined.
Based on the first sound source angle, the second sound source angle, the first spacing, and the second spacing, the first distance from the sound source to the midpoint between the first microphone and the second microphone and the second distance from the sound source to the midpoint between the third microphone and the fourth microphone are determined respectively, giving the distance between the sound source and the center of the omnidirectional microphones; Based on the first spacing, the second spacing, the first distance, and the second distance, the sound source distance between the sound source and the directional microphone is determined; The sound source distance is compared with a preset sound source distance threshold, and if the sound source distance is less than or equal to the preset sound source distance threshold, the sound source is determined to be within the fixed-distance pickup range; Based on the first distance, the second distance, the first sound source angle, and the second sound source angle, a fan-shaped pickup area is obtained, thereby achieving fan-shaped range pickup; Wherein the sound source distance dm is expressed in terms of dm1, dm2, d1, and d2, where dm1 is the first distance from the sound source to the midpoint between the first microphone and the second microphone, dm2 is the second distance from the sound source to the midpoint between the third microphone and the fourth microphone, d1 is the first spacing between the first microphone and the second microphone, and d2 is the second spacing between the second microphone and the third microphone; dm1 and dm2 are expressed in terms of the first sound source angle and the second sound source angle, the second sound source angle being calculated in the same way as the first sound source angle; the first sound source angle is expressed in terms of the first time difference and c, the speed of sound; and the first time difference is expressed in terms of the sampling-point offset between the peak positions of the received signals of the first microphone and the second microphone and the sampling rate;
Determining the positional relationship between the sound source and the microphone linear array based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones includes: Short-time Fourier transforms are performed on the received signals of the directional microphone and of any one of the omnidirectional microphones respectively, and the first decibel value and the second decibel value are calculated; Based on the first decibel value and the second decibel value, the amplitude difference between the directional microphone and the omnidirectional microphone is obtained; The amplitude difference is compared with a preset amplitude difference threshold to obtain the positional relationship between the sound source and the microphone linear array: if the amplitude difference is less than the preset amplitude difference threshold, the sound source is in front of the microphone linear array, and if the amplitude difference is greater than the preset amplitude difference threshold, the sound source is behind the microphone linear array. The method further includes: When the sound source is a multi-source sound source containing a noise signal, a main speaker signal, and an interference signal, training data is generated by simulation, and a deep network model is trained based on the training data; The training data consists of interference and noise signals in multi-dimensional spatial environments with different reverberation times and sound source distances; During the generation of the training data, a dynamic frequency response perturbation and nonlinear distortion simulation mechanism is introduced.
Short-time Fourier transforms are performed on the received signals of the first microphone, the second microphone, the third microphone, and the fourth microphone to obtain the frequency domain features of the first microphone, the second microphone, the third microphone, and the fourth microphone, respectively, and the real and imaginary parts of these frequency domain features are extracted; Short-time Fourier transforms are performed on the received signals of the directional microphone and of any one of the omnidirectional microphones respectively, and the amplitudes of the directional microphone and of the omnidirectional microphone are extracted; All extracted real and imaginary features, together with all amplitudes, form an input feature set; The input feature set is input into the trained deep network model, which outputs the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area; The time-domain speech signal corresponding to the directional fan-shaped pickup area is obtained through an inverse short-time Fourier transform; The deep network model uses a recurrent neural network, and its loss function includes a frequency-domain perceptual loss and a time-domain fidelity loss and is defined in terms of these two losses; The frequency-domain perceptual loss is a weighted average of an amplitude compression loss and a complex normalization loss, and uses power-law compression to balance high- and low-frequency energy and to constrain phase consistency; The amplitude compression loss uses 0.3 power-law compression to increase the weight of weak signals, and is expressed using the L2 norm F of a matrix applied to the speech spectrum predicted by the neural network and the true clean speech spectrum;
The complex normalization loss introduces a 0.7-power normalization so that the complex phase error is constrained within the compressed domain, with a very small constant preventing the denominator from being zero; The time-domain fidelity loss uses the scale-invariant signal-to-noise ratio to eliminate gain scaling errors and to constrain the temporal structure of the waveform, and is expressed in terms of a target projection vector and a noise residual, which are in turn calculated from the predicted waveform recovered by the inverse short-time Fourier transform and the actual target waveform. Inputting the input feature set into the trained deep network model and outputting the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area includes: The real and imaginary features and the amplitudes are modeled respectively by two independent long short-term memory networks in the deep network model to obtain modeling results; Feature fusion and time-series modeling are performed on the modeling results by consecutive long short-term memory networks in the deep network model, and mask results are obtained by two independent mask decoders in the deep network model; The mask results are multiplied onto the real and imaginary features of the first microphone to output the frequency domain features corresponding to the main speaker signal within the directional fan-shaped pickup area.

2. A directional fan-shaped range sound pickup system based on a microphone linear array, characterized in that the system is used to implement the steps of the directional fan-shaped range sound pickup method based on a microphone linear array as described in claim 1, and the system includes: a microphone linear array construction module, used to build a microphone linear array, wherein the microphone linear array includes at least one directional microphone and several omnidirectional microphones; a fan-shaped pickup module, used to determine the sound source angle and the distance between the sound source and the center of the omnidirectional microphones based on the phase difference between the omnidirectional microphones, and to determine a fan-shaped pickup area based on the sound source angle and that distance, thereby realizing fan-shaped pickup; and a directional fan-shaped pickup module, used to determine the positional relationship between the sound source and the microphone linear array based on the amplitude difference between the directional microphone and any one of the omnidirectional microphones, and to obtain a directional fan-shaped pickup area based on the positional relationship and the fan-shaped pickup area, thereby realizing directional fan-shaped pickup; wherein the positional relationship refers to the sound source being located in front of or behind the microphone linear array.
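The decisions made by the fan-shaped pickup module and the directional fan-shaped pickup module can be sketched as below. The claim gives no formulas, so this is only an illustration under stated assumptions: the angle uses the textbook far-field relation between phase difference and arrival angle for one microphone pair, the directional microphone is assumed to face forward so that a front source is louder on it than on the omnidirectional microphone, and the distance is taken as already estimated. All function names, the threshold, and the sector limits are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def angle_from_phase(phase_diff_rad, freq_hz, spacing_m):
    """Far-field arrival angle (radians from broadside) for one omnidirectional pair."""
    s = phase_diff_rad * SPEED_OF_SOUND / (2 * math.pi * freq_hz * spacing_m)
    return math.asin(max(-1.0, min(1.0, s)))  # clamp to guard against noise


def is_in_front(directional_amp, omni_amp, threshold_db=0.0):
    """Front/back decision from the directional-to-omnidirectional level difference."""
    level_diff_db = 20.0 * math.log10((directional_amp + 1e-12) / (omni_amp + 1e-12))
    return level_diff_db > threshold_db


def in_directional_sector(angle_rad, distance_m, front, half_width_rad, max_distance_m):
    """A source is picked up only if it is in front, inside the angle range, and close enough."""
    return front and abs(angle_rad) <= half_width_rad and distance_m <= max_distance_m
```

For example, a source 20 degrees off broadside at 1.5 m, judged to be in front, falls inside a sector with a 30 degree half-width and a 2 m limit; the same source judged to be behind the array is rejected.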

3. A terminal, characterized in that the terminal includes a memory, a processor, and a directional fan-shaped range pickup program based on a microphone linear array that is stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the directional fan-shaped range sound pickup method based on a microphone linear array as described in claim 1.

4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a directional fan-shaped range pickup program based on a microphone linear array, wherein the program, when executed by a processor, implements the steps of the directional fan-shaped range sound pickup method based on a microphone linear array as described in claim 1.