Earphone intelligent transparent transmission system

CN115835080B · Active · Publication Date: 2026-06-12 · SHANGHAI ZHENCHENG MICROELECTRONICS TECH CO LTD

Cites: 3 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI ZHENCHENG MICROELECTRONICS TECH CO LTD
Filing Date: 2022-11-22
Publication Date: 2026-06-12

Application Information

Patent Timeline: 22 Nov 2022 (Application) · 12 Jun 2026 (Publication, CN115835080B)

IPC: H04R1/10; H04R3/00

AI Tagging

Application Domain

Earpiece/earphone attachments · Transducer circuits

Similar Technology Patents

Headset with folding behind-the-ear over-ear open back noise cancelling headphones
CN224356233U · Earpiece/earphone attachments
Earpad assembly, earpad structure, earphone, and earphone system
WO2026118973A1 · Earpiece/earphone attachments
Acoustic signal output device
US20260164172A1 · Loudspeaker transducer fixing; Earpiece/earphone attachments
Magnetic quick-release anti-slip earphone leather sheath
CN224356226U · Earpiece/earphone attachments
Ear tips and related devices and methods
JP2026095459A · Earpiece/earphone attachments

[Figure CN115835080B_ABST: abstract drawing]

Abstract

The application uses binaural earphone microphone arrays and complex Gaussian mixture models, pre-trained for different directions, to capture spatial audio from a specified area direction; uses a neural network intelligent model to adjust the sound of a specified type from that direction; and, after balancing the sound of the two binaural channels, delivers the specified type of sound from the specified direction to the user. This provides an intelligent, personalized listening experience for different users, environments, and application requirements, strengthens and extends the hearing-aid capability of earphones in daily life, and meets the practical need to use earphones as a daily-life helper.

Description

[Technical Field]

[0001] This invention belongs to the field of electronic technology and specifically relates to an intelligent transparent transmission (pass-through) system for headphones.

[Background Technology]

[0002] Headphones, as one of the most common wearable electronic products, have become an indispensable part of daily life and have accordingly gained more and more functions: active noise cancellation (ANC) removes ambient noise while the user listens to music; environmental noise cancellation (ENC) prevents noise around the talker from reaching the far-end listener during phone calls; and the so-called "passthrough" function lets users hear external sounds clearly without removing the headphones. However, the performance of both noise cancellation and passthrough is strongly affected by the complexity of the ambient sound, the way the headphones are worn, and differences in ear anatomy. This degrades the listening experience and, in many situations, makes it difficult for users to hear clearly the sounds they actually want to hear. For example, a user cooking in the kitchen with the range hood on may want to listen to music without being disturbed by the hood's noise, while still hearing a child crying in the bedroom or the doorbell ringing; a user watching TV in the living room while others chat nearby may want to hear only the TV, not the conversation; and a user at a party may want to hear only the conversation, not the background music. Ordinary headphone passthrough cannot meet these needs.

[0003] Currently, headphone pass-through is mainly implemented as follows: the headphone microphone picks up ambient sound; the signal undergoes gain processing, analog-to-digital conversion, sampling filtering, and pass-through filtering; and it is then converted back to analog and played by the headphone speaker, so that the user ultimately hears a reproduction of the natural ambient sound. A voice-enhanced pass-through mode can be obtained by choosing an appropriate passband for the band-pass filter.
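The voice-enhancing filtering step of this prior-art pass-through can be sketched as follows. This is a minimal illustration of the digital filtering stage only: it assumes a 16 kHz frame and an assumed 300 to 3400 Hz voice passband (a common telephone band, not a value given in this document), and it uses FFT-domain masking as a stand-in for the band-pass filter a real headphone DSP would run. The function name `voice_passthrough` is hypothetical.

```python
import numpy as np

def voice_passthrough(frame, fs=16000, f_lo=300.0, f_hi=3400.0, gain=1.0):
    """Apply a gain and an FFT-domain band-pass mask to one frame of ambient sound.

    fs, f_lo, f_hi and gain are illustrative assumptions, not values from the patent.
    """
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Keep only the bins inside the assumed voice passband.
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return gain * np.fft.irfft(spectrum * mask, n=len(frame))

# Usage: a 1 kHz tone (inside the band) passes; a 6 kHz tone (outside) is removed.
fs = 16000
t = np.arange(160) / fs
frame = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
out = voice_passthrough(frame, fs)
```

Both tones complete a whole number of cycles in the 160-sample frame, so the mask separates them exactly; with real signals, spectral leakage would let some out-of-band energy through.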

[0004] To further meet user needs and improve the user experience, some practitioners have proposed using two microphones on each side of the headphones to pick up ambient sound and determine the direction of the sound source. The determined direction is matched against a pre-established correspondence between sound source positions and sound effect signals to obtain a target sound effect signal, and the headphone speakers are then controlled to play sound with that effect.

[0005] However, the pass-through functions described above only let users, once pass-through mode is entered, perceive ambient sound clearly or enhance only the human voices in the environment without removing the headphones. Users cannot selectively acquire sound from particular directions, nor selectively acquire specific sound types other than human voices. These approaches therefore cannot flexibly adapt to users' personalized needs in different environments, nor can they extend the hearing-aid function of headphones.

[Summary of the Invention]

[0006] The purpose of this invention is to provide an intelligent pass-through system for headphones that solves the problem that existing technologies cannot flexibly meet the different hearing-assistance needs of different users in different scenarios and therefore fail to provide a personalized, ideal listening experience.

[0007] To achieve the above objectives, the intelligent transparent transmission system for headphones according to the present invention includes:

[0008] Left and right ear microphone arrays, arranged symmetrically to correspond to the left and right ears, each including multiple microphones to capture audio signals;

[0009] Left and right ear spatial audio capture modules, each including multiple complex Gaussian mixture intelligent models corresponding to spatial directions. Each module applies an FFT to the sound signals collected by the corresponding microphone array and inputs the result simultaneously to every complex Gaussian mixture intelligent model. According to the sound direction selected by the user, each model analyzes the input sound signals to capture the spatial audio signal in the specified direction; at the same time, the likelihood that the direction represented by each model matches the current sound source direction is calculated to obtain the weight coefficient of the sound source in each direction. The spatial audio signals captured by the models are transformed by IFFT and then weighted to obtain the left ear sound signal and the right ear sound signal, respectively, in the specified area direction;

[0010] Left and right ear spatial sound effect adjustment modules, each including a neural network intelligent model. Each module performs MFCC feature extraction on the left or right ear sound signal output by the corresponding spatial audio capture module, and uses a filter bank to obtain audio signals in different frequency bands. The MFCC feature parameters are input to the neural network intelligent model for analysis: selected sound types are enhanced and non-selected sound types are suppressed, yielding the left ear signal gain $G_{ALi}$ and the right ear signal gain $G_{ARi}$ for each frequency band. The filter bank likewise outputs the left ear signal $S_{FLi}$ and the right ear signal $S_{FRi}$ in each band. Multiplying $S_{FLi}$ and $S_{FRi}$ by the corresponding gains $G_{ALi}$ and $G_{ARi}$ produces the sound-effect-adjusted signal of the specified type from the specified area direction, namely the left ear sound signal $S_{OLi}$ and the right ear sound signal $S_{ORi}$ in each frequency band;

[0011] A channel balance module, which first calculates the volume of each frequency band in each frame of the input left ear sound signal $S_{OLi}$ and right ear sound signal $S_{ORi}$, and then balances the volume of the left and right ears.

[0012] Based on the above key features, each microphone array includes four microphones: the second and third microphones are external microphones, the fourth microphone is a call microphone used to capture ambient sound from the external space, and the first microphone is an internal microphone used to capture sound inside the ear canal.

[0013] Based on the above main features, each of the left and right ear spatial audio capture modules includes 14 complex Gaussian mixture models, corresponding to 14 directions of the predefined listening field space. The listening field space is represented by a sphere centered on the human body. This space is divided into 14 directions in an axially symmetric manner: up, down, left, right, front, back, upper left front space center direction, upper right front space center direction, upper left back space center direction, upper right back space center direction, lower left front space center direction, lower right front space center direction, lower left back space center direction, and lower right back space center direction.

[0014] Based on the aforementioned key features, the 14 complex Gaussian mixture intelligent models of each spatial audio capture module correspond to the 14 listening field spatial directions. According to the sound direction selected by the user, each model analyzes the input sound signal and captures the spatial audio signal, so that 14 captured sound signals are output. At the same time, the likelihood that the direction represented by each model matches the current sound source direction is calculated, and the 14 likelihood values are normalized to obtain the weight $weight_k$ of the sound source in each direction. The weight coefficient is $\omega_k = 0 \cdot weight_k$ for unselected directions and $\omega_k = 1 \cdot weight_k$ for the selected direction. The sound signals captured by the models are transformed by IFFT and then weighted to obtain the left ear sound signal $S_{SL}$ and the right ear sound signal $S_{SR}$ in the selected specified area direction. The weighted calculation formula is as follows:

$$S_{SL} = \sum_{k=1}^{14} \omega_k S_k \quad (\text{and likewise for } S_{SR})$$

[0015] where $S_k$ is the sound signal captured by the k-th CGMM intelligent model.

[0016] Based on the above main features, when the left and right ear spatial sound effect adjustment modules perform MFCC feature extraction, processing is done frame by frame with 10 ms per frame. Each frame is divided into n frequency bands (n is an integer from 20 to 40), and the signal in each frequency band has 160 sampling points.

[0017] Based on the above key features, the NN model of the intelligent pass-through system must be trained so that it supports the selectable pass-through sound types. The NN model can also be trained on demand according to the user's needs, after which the NN model parameters in the headphones are updated to provide additional selectable sound types.

[0018] Based on the aforementioned key features, the spatial sound effect adjustment module uses the filter bank to obtain, frame by frame, the left ear signals $S_{FLi}$ and right ear signals $S_{FRi}$ of the different frequency bands, where i = 1 to n and n, the number of frequency bands per frame, is an integer between 20 and 40.

[0019] Based on the aforementioned key features, the channel balancing module calculates the volume of each frequency band in each frame of the input left ear sound signal $S_{OLi}$ and right ear sound signal $S_{ORi}$ according to the following formula:

[0020]

[0021] where j denotes the sampling point of each frame of the audio signal within one frequency band. The calculated volumes $E_{Li}$ / $E_{Ri}$ of each frequency band of each frame are then used to balance the left and right ears. The reference for adjustment is the maximum left/right volume difference $BE_{max}$, obtained from actual measurement over the passband, which serves as a boundary value that the interaural volume difference should not exceed. If the difference is within this boundary value, it meets expectations and no balance adjustment is needed; if it exceeds the boundary value, balance coefficients $G_{Li}$ / $G_{Ri}$ are computed for the corresponding band signals $S_{OLi}$ / $S_{ORi}$. These coefficients serve as weighting coefficients: $S_{OLi}$ and $S_{ORi}$ are weighted by them to obtain the final playback sound, thereby balancing the sound in the left and right ears. The specific balance adjustment method is as follows:

[0022] Let $E_{diff} = |E_{Li} - E_{Ri}|$;

[0023] When $E_{diff} < BE_{max}$, no adjustment is required;

[0024] When $E_{diff} > BE_{max}$, set $G_i = \frac{1}{2}(E_{diff} - BE_{max})$;

[0025] When $E_{Li} > E_{Ri}$, then $G_{Li} = 1 - G_i / E_{Li}$, $G_{Ri} = 1 + G_i / E_{Ri}$;

[0026] When $E_{Li} < E_{Ri}$, then $G_{Li} = 1 + G_i / E_{Li}$, $G_{Ri} = 1 - G_i / E_{Ri}$;

[0027] After that, the sound signal of each frequency band in each frame is weighted to obtain the target sound signal:

[0028]

[0029] According to the above main features, the intelligent headphone passthrough system is used in conjunction with an application program, and the user can select the sound direction and sound type through the application program.

[0030] Compared with the prior art, the present invention uses binaural headphone microphone arrays and complex Gaussian mixture models pre-trained for different directions to capture spatial audio from a specified area direction, uses a neural network intelligent model to adjust the sound of a specified type from that direction, and then balances the two binaural channels before delivering the specified type of sound from the user-specified direction to the user. This provides an intelligent, personalized listening experience for different users, environments, and application requirements, strengthens and extends the hearing-aid capability of headphones in daily life, and effectively meets the need to use headphones as a daily-life helper.

[Brief Description of the Drawings]

[0031] Figure 1 is a schematic diagram of the overall framework of the intelligent headphone passthrough system implementing the present invention.

[0032] Figures 2A and 2B are schematic diagrams showing the distribution of the microphone array in the headphones.

[0033] Figures 3A, 3B, and 3C are schematic diagrams of the defined listening space.

[0034] Figure 4 is a schematic diagram of the setup process in actual use.

[0035] Figure 5 is a schematic diagram illustrating the working principle of the spatial audio capture module.

[0036] Figure 6 is a schematic diagram illustrating the working principle of the system's spatial sound effect adjustment module.

[0037] Figure 7 is a schematic diagram illustrating the working principle of the left and right ear channel balancing module.

[Detailed Description of the Embodiments]

[0038] Figure 1 shows the structural framework of the intelligent earphone pass-through system of the present invention, which includes left and right ear microphone arrays, spatial audio capture modules, spatial sound effect adjustment modules, and left and right ear channel balancing modules. The composition and function of each module are described in detail below.

[0039] As shown in Figures 2A and 2B, in a specific implementation the left and right ear microphone arrays are arranged symmetrically to correspond to the left and right ears. Each microphone array includes four microphones: the second microphone M2 and the third microphone M3 are external microphones, the fourth microphone M4 is a call microphone used to capture ambient sound from the external space, and the first microphone M1 is an internal microphone used to capture sound inside the ear canal.

[0040] The left and right ear microphone arrays acquire the ambient sound signals $S_{LM1}$ to $S_{LM4}$ for the left ear and $S_{RM1}$ to $S_{RM4}$ for the right ear, respectively, and input them to the left ear spatial audio capture module and the right ear spatial audio capture module.

[0041] Both the left and right ear spatial audio capture modules include multiple complex Gaussian mixture models (CGMMs). A CGMM is a Gaussian mixture model trained with complex-valued model coefficients. Each spatial audio capture module first applies an FFT (Fast Fourier Transform) to its input sound signals, and the four transformed signals are input simultaneously to the module's 14 CGMMs. Each CGMM analyzes the input according to the user-selected sound direction to capture the spatial audio signal in the specified direction; at the same time, the likelihood between the direction represented by each CGMM and the current sound source direction is calculated to obtain the weight coefficient of the sound source in each direction. The spatial audio signals captured by the CGMMs are transformed back with an IFFT (Inverse Fast Fourier Transform) and then weighted to obtain the left ear sound signal $S_{SL}$ and the right ear sound signal $S_{SR}$ in the specified area direction.
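The internals of the CGMMs are not specified in this document. As a hedged illustration only, the sketch below swaps in a simplified, commonly used form: each direction model is reduced to a single zero-mean circular complex Gaussian per FFT bin, parameterized by a direction-specific spatial covariance matrix over the microphones, and the per-direction likelihoods of one frame are normalized into weights. In the patent the models come from offline training; here the covariances are built from hypothetical steering vectors, and the function names are hypothetical.

```python
import numpy as np

def complex_gaussian_loglik(x, R):
    """Log-density of an M-channel complex FFT vector x under a zero-mean
    circular complex Gaussian with spatial covariance R (M x M, Hermitian PD)."""
    M = x.shape[0]
    _, logdet = np.linalg.slogdet(R)                     # log|det R|
    quad = np.real(np.conj(x) @ np.linalg.solve(R, x))   # x^H R^{-1} x
    return -M * np.log(np.pi) - logdet - quad

def direction_weights(X, covariances):
    """X: (F, M) FFT of one frame from M microphones.
    covariances: list of K arrays of shape (F, M, M), one model per direction.
    Returns K weights: per-direction likelihoods normalized to sum to 1."""
    loglik = np.array([
        sum(complex_gaussian_loglik(X[f], R[f]) for f in range(X.shape[0]))
        for R in covariances
    ])
    # Normalize in the log domain (softmax) so tiny likelihoods do not underflow.
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

# Usage with hypothetical steering vectors: 4 mics, 8 bins, 3 candidate directions.
M, F = 4, 8
delays = [0, 1, 2]  # hypothetical per-microphone delay (in samples) for each direction
f = np.arange(F)[:, None]
m = np.arange(M)[None, :]
steer = [np.exp(-2j * np.pi * f * m * d / F) for d in delays]           # each (F, M)
covs = [np.einsum("fi,fj->fij", s, s.conj()) + 0.1 * np.eye(M) for s in steer]
X = steer[1]                        # a frame arriving from direction index 1
weights = direction_weights(X, covs)
```

The frame built from the second steering vector receives almost all of the weight, which is the behavior the weighting step relies on.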

[0042] In practical implementation, the listening space must first be defined, as shown in Figures 3A, 3B, and 3C. Specifically, a sphere centered on the user represents the listening space, which is divided in an axially symmetric manner into 14 directions: up, down, left, right, front, back, upper left front, upper right front, upper left back, upper right back, lower left front, lower right front, lower left back, and lower right back. Each direction is assigned a number (as shown in Table 1 below) so that the user can select the direction of the sound source to pass through as needed.

[0043] Correspondingly, one CGMM intelligent model is trained for each of the 14 directions of the listening space. The numbering of the 14 CGMM models corresponds one-to-one with the numbering of the 14 directions and is used to identify the direction of sound, as shown in Table 1 below.

[0044]
Direction number | Corresponding CGMM | Sound source direction
1 | CGMM1 | Up
2 | CGMM2 | Down
3 | CGMM3 | Left
4 | CGMM4 | Right
5 | CGMM5 | Front
6 | CGMM6 | Back
7 | CGMM7 | Upper left front space center direction
8 | CGMM8 | Upper right front space center direction
9 | CGMM9 | Upper left back space center direction
10 | CGMM10 | Upper right back space center direction
11 | CGMM11 | Lower left front space center direction
12 | CGMM12 | Lower right front space center direction
13 | CGMM13 | Lower left back space center direction
14 | CGMM14 | Lower right back space center direction

[0045] Table 1: Spatial Direction Numbering Table
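For illustration, Table 1 can be held as a lookup from direction number to a unit vector. The axis convention (x = front, y = left, z = up) and the reading of each "space center direction" as the diagonal through the center of the corresponding octant are assumptions; Figures 3A to 3C would fix the actual geometry. `nearest_direction` is a hypothetical helper that maps an arbitrary vector to the closest table entry.

```python
import math

# Assumed axis convention (not from the source): x = front, y = left, z = up.
AXES = {
    1: ("Up", (0, 0, 1)),
    2: ("Down", (0, 0, -1)),
    3: ("Left", (0, 1, 0)),
    4: ("Right", (0, -1, 0)),
    5: ("Front", (1, 0, 0)),
    6: ("Back", (-1, 0, 0)),
    7: ("Upper left front", (1, 1, 1)),
    8: ("Upper right front", (1, -1, 1)),
    9: ("Upper left back", (-1, 1, 1)),
    10: ("Upper right back", (-1, -1, 1)),
    11: ("Lower left front", (1, 1, -1)),
    12: ("Lower right front", (1, -1, -1)),
    13: ("Lower left back", (-1, 1, -1)),
    14: ("Lower right back", (-1, -1, -1)),
}

def unit(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Direction number -> (name, unit vector); the diagonals are normalized by sqrt(3).
DIRECTIONS = {num: (name, unit(v)) for num, (name, v) in AXES.items()}

def nearest_direction(v):
    """Return the Table 1 number whose direction has the largest cosine with v."""
    u = unit(v)
    return max(DIRECTIONS, key=lambda k: sum(a * b for a, b in zip(u, DIRECTIONS[k][1])))
```

For example, a source slightly up and to the left of straight ahead, `nearest_direction((0.9, 0.2, 0.1))`, maps to direction 5 (front).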

[0046] The left ear sound signal $S_{SL}$ and the right ear sound signal $S_{SR}$ for the specified area direction are input to the left and right ear spatial sound effect adjustment modules, respectively, which are based on a neural network (NN) intelligent model. Each module performs MFCC (Mel-Frequency Cepstral Coefficient) feature extraction on its input signal and uses an Fbank (filter bank) to obtain audio signals in different frequency bands. The MFCC feature parameters are input to the NN intelligent model for analysis: selected sound types are enhanced and non-selected sound types are suppressed, yielding the per-band signal gains $G_{ALi}$ (left) and $G_{ARi}$ (right). In parallel, the Fbank outputs the left ear signal $S_{FLi}$ and the right ear signal $S_{FRi}$ in each band. Multiplying each band signal $S_{FLi}$ / $S_{FRi}$ by its gain $G_{ALi}$ / $G_{ARi}$ produces the sound-effect-adjusted signal of the specified type from the specified area direction, namely the left ear sound signal $S_{OLi}$ and the right ear sound signal $S_{ORi}$ in each band. Throughout, i = 1 to n, where n is the number of frequency bands per frame, an integer from 20 to 40.

[0047] In practical implementation, the neural network (NN) model of the spatial sound effect adjustment module must be trained to support the selectable pass-through sound types; for example, users can select voices, whistles, doorbells, or children crying as pass-through sounds. Beyond the sound types provided by the system, the NN model can also be trained on demand according to the user's needs, and the NN model parameters in the headphones are then updated to offer additional selectable sound types.

[0048] The sound-effect-adjusted left ear sound signal $S_{OLi}$ and right ear sound signal $S_{ORi}$, of the specified type and from the specified direction, are input to the left and right ear channel balancing modules. These modules first calculate the volume of each frequency band in each frame for their respective channels, then calculate the balance coefficient of each frequency band signal in each frame for both ears and use these coefficients as weighting factors for balance adjustment. Finally, they weight the sound signal of each frequency band in each frame for both ears to obtain the final balanced left ear sound signal $S_{BL}$ and right ear sound signal $S_{BR}$, which are output to the speakers of the left and right headphones, respectively, giving users the ideal listening experience they expect in a real environment.

[0049] Figure 4 shows the setup process in actual use. In practice, the intelligent pass-through system for headphones of this invention is used together with an application (APP), through which users select the sound direction (any of the 14 directions in Table 1) and the sound type (preset by the system or provided according to user needs) to configure the pass-through function.

[0050] To facilitate understanding, the specific working methods of each of the above functional modules will be explained in detail below.

[0051] As shown in Figure 5, the spatial audio capture module analyzes the input ambient sound signal and captures audio according to the sound direction the user selects with the sound direction selector in the APP. The ambient sound input for each ear has four channels: $S_{LM1}$ to $S_{LM4}$ for the left ear and $S_{RM1}$ to $S_{RM4}$ for the right ear. Each input signal undergoes an FFT and is fed simultaneously into the 14 CGMM intelligent models, which correspond to the 14 auditory spatial directions. Each CGMM analyzes the input sound signal and captures the spatial audio signal according to the user-selected sound direction, so that 14 captured sound signals are output. At the same time, the likelihood between the direction represented by each CGMM and the current sound source direction is calculated, and the 14 likelihood values are normalized to obtain the weight $weight_k$ of the sound source in each direction. The weight coefficient is $\omega_k = 0 \cdot weight_k$ for unselected directions and $\omega_k = 1 \cdot weight_k$ for the selected direction. The sound signals captured by the CGMM intelligent models are transformed by IFFT and then weighted to obtain the left ear sound signal $S_{SL}$ and the right ear sound signal $S_{SR}$ in the selected specified area direction. The weighted calculation formula is as follows:

$$S_{SL} = \sum_{k=1}^{14} \omega_k S_k \quad (\text{and likewise for } S_{SR})$$

[0052] where $S_k$ is the sound signal captured by the k-th CGMM intelligent model.
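The masking and weighting step can be sketched as below. This minimal illustration assumes the captured signals $S_k$ are already available as time-domain arrays after the IFFT, and that the combination is the plain weighted sum $\sum_k \omega_k S_k$ with $\omega_k$ obtained by zeroing the normalized weights of unselected directions. The function name and array layout are hypothetical.

```python
import numpy as np

def combine_directions(captured, likelihoods, selected):
    """Masked, weighted combination of per-direction captured signals.

    captured:    (K, T) time-domain signals S_k after IFFT, one row per CGMM.
    likelihoods: (K,) likelihood values, one per direction model.
    selected:    iterable of selected direction numbers (1-based, as in Table 1).
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    weight = likelihoods / likelihoods.sum()      # normalized weight_k
    mask = np.zeros_like(weight)
    mask[[k - 1 for k in selected]] = 1.0         # 1 for selected, 0 otherwise
    omega = mask * weight                         # omega_k = {0, 1} * weight_k
    return omega @ np.asarray(captured)           # sum_k omega_k * S_k

# Usage: 14 directions, 160-sample frames, row k holds the constant value k + 1.
K, T = 14, 160
captured = np.ones((K, T)) * np.arange(1, K + 1)[:, None]
likelihoods = np.ones(K)                          # equal likelihoods: weight_k = 1/14
s_sl = combine_directions(captured, likelihoods, selected=[5])
```

With equal likelihoods and only direction 5 selected, every output sample equals 5/14. Note that this sketch does not renormalize after masking, so the output level scales with how much likelihood the selected direction receives.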

[0053] As shown in Figure 6, after the sound signals $S_{SL}$ / $S_{SR}$ in the specified area direction are input to the left and right ear spatial sound effect adjustment modules, MFCC feature extraction is performed and, simultaneously, audio signals in different frequency bands are obtained with the FBank. Feature extraction is performed frame by frame, with 10 ms per frame. In human hearing assessment the tested frequency range is generally 125 Hz to 8 kHz, so the audio analysis bandwidth is generally 0 to 8 kHz. By the Nyquist sampling theorem a sampling rate of 16 kHz is therefore sufficient, giving 160 sampling points per 10 ms frame. Each frame is divided into n frequency bands (n is an integer between 20 and 40), and the signal in each frequency band has 160 sampling points.
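The frame arithmetic (16 kHz × 10 ms = 160 samples) and the per-band split can be sketched as follows. The Fbank's actual filter shapes are not given in the document; this sketch assumes rectangular, mel-spaced FFT-domain masks with n = 24 bands (within the stated 20 to 40), chosen so that each band signal keeps all 160 samples and the bands sum back to the original frame. MFCC computation is omitted, and `split_bands` is a hypothetical name.

```python
import numpy as np

FS = 16000                          # sampling rate from the text (Hz)
FRAME_MS = 10                       # frame length from the text (ms)
FRAME_LEN = FS * FRAME_MS // 1000   # 160 samples per frame

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def split_bands(frame, n_bands=24, fs=FS):
    """Split one frame into n_bands sub-band signals of the same length.

    Assumption: rectangular, mel-spaced FFT masks over 0..fs/2 stand in for the
    unspecified Fbank. The masks partition the spectrum, so the bands sum to the frame.
    """
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_bands + 1))
    bands = np.empty((n_bands, len(frame)))
    for i in range(n_bands):
        lo, hi = edges[i], edges[i + 1]
        # The last band also takes the Nyquist bin.
        mask = (freqs >= lo) & (freqs < hi) if i < n_bands - 1 else (freqs >= lo)
        bands[i] = np.fft.irfft(spectrum * mask, n=len(frame))
    return bands

# Usage: split a random 10 ms frame into 24 band signals.
rng = np.random.default_rng(0)
frame = rng.standard_normal(FRAME_LEN)
bands = split_bands(frame, n_bands=24)
```

At 100 Hz per FFT bin, the lowest mel bands each hold only a single bin, which is one reason a practical filter bank would likely use overlapping filters or longer analysis windows.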

[0054] After FBank processing, the audio signal $S_{FLi}$ / $S_{FRi}$ of each frequency band of each frame is output. The MFCC parameters obtained by feature extraction are input to the NN intelligent model for analysis; the NN model enhances the selected sound types and suppresses the non-selected sound types, yielding the signal gain $G_{ALi}$ / $G_{ARi}$ for each frequency band of each frame.

[0055] The signal $S_{FLi}$ / $S_{FRi}$ in each frequency band of each frame after FBank processing is multiplied by the corresponding gain $G_{ALi}$ / $G_{ARi}$ ($S_{OLi} = G_{ALi} S_{FLi}$, $S_{ORi} = G_{ARi} S_{FRi}$) to output the audio signal $S_{OLi}$ / $S_{ORi}$ in each frequency band of each frame, that is, the sound-effect-adjusted signal of the specified type from the specified area direction.
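The per-band gain step is an element-wise multiplication. The sketch below assumes the NN model's output is already available as one scalar gain per band for the current frame; the NN itself is not modeled, and the document does not state how its gains are scaled. `apply_band_gains` is a hypothetical name.

```python
import numpy as np

def apply_band_gains(band_signals, gains):
    """Multiply each band signal S_F,i by its gain G_A,i to obtain S_O,i.

    band_signals: (n, L) sub-band signals of one frame (e.g. n in 20..40, L = 160).
    gains:        (n,) per-band gains, assumed here to come from the NN model.
    """
    band_signals = np.asarray(band_signals, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if gains.shape != (band_signals.shape[0],):
        raise ValueError("need exactly one gain per band")
    return gains[:, None] * band_signals     # broadcast each gain over its band's samples

# Usage: three bands that are enhanced (x2), suppressed (x0) and passed unchanged (x1).
bands = np.array([[1.0, 1.0], [3.0, 3.0], [0.5, -0.5]])
out = apply_band_gains(bands, [2.0, 0.0, 1.0])
```

In this toy example, a gain above 1 plays the role of enhancement and a gain of 0 the role of full suppression.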

[0056] Because the ambient sound in the specified direction is not sound from a single absolute direction but sound from a specified directional area, the volume (amplitude) of the left and right ear signals may differ after spatial audio capture and spatial sound effect adjustment. The volumes of the two ears must therefore be balanced to give the user the desired listening experience.

[0057] As shown in Figure 7, the volume of each frequency band of each frame of the left and right ears is first calculated from the input left and right ear sound signals $S_{OLi}$ / $S_{ORi}$ using the following formula:

[0058]

[0059] where j is the sampling point of the sound signal of each frame in a frequency band.

[0060] The calculated volumes $E_{Li}$ / $E_{Ri}$ of each frequency band of each frame are used to balance the left and right ears. The reference for adjustment is the maximum left/right volume difference $BE_{max}$ measured over the passband (actual testing showed a left/right difference within 6 dB, i.e., $BE_{max}$ = 6 dB), which serves as a boundary value that the interaural volume difference must not exceed. If the difference is within this boundary value, it meets expectations and no balance adjustment is required; if it exceeds the boundary value, balance coefficients $G_{Li}$ / $G_{Ri}$ are computed for the corresponding band signals $S_{OLi}$ / $S_{ORi}$ and used as weighting coefficients: $S_{OLi}$ / $S_{ORi}$ are weighted by $G_{Li}$ / $G_{Ri}$ to obtain the final playback sound, thereby balancing the sound in the left and right ears. The specific balance adjustment method is as follows:

[0061] Let $E_{diff} = |E_{Li} - E_{Ri}|$;

[0062] When $E_{diff} < BE_{max}$, no adjustment is required;

[0063] When $E_{diff} > BE_{max}$, let $G_i = \frac{1}{2}(E_{diff} - BE_{max})$;

[0064] When $E_{Li} > E_{Ri}$, then $G_{Li} = 1 - G_i / E_{Li}$, $G_{Ri} = 1 + G_i / E_{Ri}$;

[0065] When $E_{Li} < E_{Ri}$, then $G_{Li} = 1 + G_i / E_{Li}$, $G_{Ri} = 1 - G_i / E_{Ri}$;
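The balancing rules in [0061] to [0065] translate directly into code. This sketch assumes the band volumes $E_{Li}$ and $E_{Ri}$ are already computed (their formula is not reproduced in this text) and expressed in dB, so that the measured $BE_{max}$ = 6 dB can serve as the default. It also treats the boundary case $E_{diff} = BE_{max}$, which the text leaves unspecified, as needing no adjustment. The function names are hypothetical.

```python
def balance_band(e_l, e_r, be_max=6.0):
    """Return the balance coefficients (G_L,i, G_R,i) for one band of one frame.

    e_l, e_r: band volumes E_L,i and E_R,i (same units as be_max, nonzero).
    Assumption: E_diff == be_max is treated as "no adjustment".
    """
    e_diff = abs(e_l - e_r)
    if e_diff <= be_max:
        return 1.0, 1.0                      # within tolerance: leave both ears unchanged
    g = 0.5 * (e_diff - be_max)              # G_i
    if e_l > e_r:
        return 1.0 - g / e_l, 1.0 + g / e_r  # attenuate left, boost right
    return 1.0 + g / e_l, 1.0 - g / e_r      # boost left, attenuate right

def balance_frame(e_left, e_right, be_max=6.0):
    """Per-band coefficients for one frame, given lists of n band volumes per ear."""
    return [balance_band(l, r, be_max) for l, r in zip(e_left, e_right)]
```

For example, `balance_band(20.0, 10.0)` gives $G_i = 2$ and the coefficients (0.9, 1.2), so the louder left band is attenuated and the quieter right band is boosted.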

[0066] Then, the audio signals of each frequency band in each frame are weighted to obtain the target audio signal.

[0067]

[0068] The signals $S_{BL}$ / $S_{BR}$ are output to the speakers of the left and right headphones, respectively.

[0069] Compared with existing technologies, this invention utilizes a binaural microphone array and a pre-trained complex Gaussian mixture model in different directions to capture spatial audio in a specified area. It then uses a neural network intelligent model to adjust the sound effects of a specified type of sound in that specified area, and finally performs sound effect balance adjustment on the binaural channels before transmitting the specified type of sound in the specified direction to the user. This achieves an intelligent and personalized ideal sound experience for different users, in different environments, and with different application needs, enhancing and extending the hearing aid capabilities of headphones in daily life and truly meeting people's needs to use headphones as a daily life assistant.

[0070] It is understood that those skilled in the art can make equivalent substitutions or modifications to the technical solution and inventive concept of the present invention, and all such substitutions or modifications should fall within the protection scope of the appended claims.

Claims

1. A smart transparent transmission system for headphones, comprising: The left and right ear microphone arrays are symmetrically arranged to correspond to the left and right ears, and include multiple microphones to capture audio signals; The left and right ear spatial audio capture module includes multiple complex Gaussian mixture intelligent models corresponding to spatial directions. The left and right ear spatial audio capture modules respectively perform FFT transformation on the sound signals collected by the left and right ear microphone arrays, and then simultaneously input them to each complex Gaussian mixture intelligent model. Each complex Gaussian mixture intelligent model analyzes and processes the input sound signals according to the sound direction selected by the user to capture the spatial audio signal in the specified direction. At the same time, it calculates the likelihood value of the direction represented by each complex Gaussian mixture intelligent model and the current sound source direction to obtain the weight coefficient of the sound source in each direction. The spatial audio signals captured by each complex Gaussian mixture intelligent model are then weighted after IFFT transformation to obtain the left ear sound signal and right ear sound signal in the specified area direction respectively. The spatial sound effect adjustment module for the left and right ear systems includes a neural network intelligent model. The left and right ear spatial sound effect adjustment module performs MFCC feature extraction on the left ear sound signal and the right ear sound signal output by the left and right ear spatial audio capture modules, respectively, and uses a filter bank to obtain audio signals of different frequency bands. The feature parameters extracted by MFCC are input to the neural network intelligent model for analysis and processing. 
Sound effect enhancement is applied to the selected, specified type of sound and sound effect suppression to sounds of non-specified types, yielding the left ear signal gain GALi and the right ear signal gain GARi for each frequency band. The left ear signal SFLi and right ear signal SFRi of each frequency band produced by the filter bank are multiplied by the corresponding gains GALi and GARi, and the sound-effect-adjusted signal of the specified type in the specified area direction is output, namely the left ear sound signal SOLi and the right ear sound signal SORi of each frequency band. The system further comprises a channel balance module, which first calculates, frame by frame, the volume of each frequency band of the input left ear sound signal SOLi and right ear sound signal SORi, and then adjusts the left and right ear volumes so that they are balanced. Each of the left and right ear spatial audio capture modules includes 14 complex Gaussian mixture intelligent models, corresponding to 14 directions of a predefined listening field space. The listening field space is represented by a sphere centered on the human body and is divided axially symmetrically into 14 directions: up, down, left, right, front, and back, plus the center directions of the upper-left-front, upper-right-front, upper-left-back, upper-right-back, lower-left-front, lower-right-front, lower-left-back, and lower-right-back spaces. The 14 complex Gaussian mixture intelligent models of each spatial audio capture module correspond to these 14 listening field directions.
Each complex Gaussian mixture intelligent model analyzes and processes the input sound signal and captures the spatial audio signal according to the sound direction selected by the user, so that 14 captured sound signals are output. Simultaneously, the likelihood that the direction represented by each complex Gaussian mixture intelligent model matches the current sound source direction is calculated, and the 14 likelihood values are normalized to obtain the weight of the sound source in each direction. The weight coefficient ωk is 0 for directions that are not selected and ωk = 1 for the selected direction. With these weights, the sound signals captured by the complex Gaussian mixture intelligent models are weighted after IFFT, giving the left ear sound signal SSL and the right ear sound signal SSR in the selected specified area direction. The weighting formula is $S_{SL} = \sum_{k=1}^{14} \omega_k S_k$ (and likewise $S_{SR}$ for the right ear), where Sk is the sound signal captured by the k-th CGMM intelligent model.
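As an illustration of the direction weighting described in claim 1, the following Python sketch lists the 14 listening field directions, normalizes per-direction likelihoods, and forms the weighted sum over the 14 model outputs. It is a minimal sketch under stated assumptions, not the patented implementation: the CGMM models themselves are not implemented, their outputs are assumed to be already IFFT-transformed time-domain signals, the likelihoods are assumed to arrive as log-likelihoods, and the direction labels and helper names (`normalize_likelihoods`, `selection_weights`, `combine_directions`) are hypothetical. Because the text does not spell out how the normalized weights interact with the 0/1 coefficients ωk, the two are kept as separate helpers.

```python
import numpy as np

# Hypothetical labels for the 14 listening-field directions named in claim 1.
DIRECTIONS = [
    "up", "down", "left", "right", "front", "back",
    "upper-left-front", "upper-right-front", "upper-left-back", "upper-right-back",
    "lower-left-front", "lower-right-front", "lower-left-back", "lower-right-back",
]

def normalize_likelihoods(log_likelihoods):
    """Normalize 14 per-direction log-likelihoods into weights summing to 1.

    Assumption: each CGMM reports a log-likelihood. Subtracting the maximum
    before exponentiating (a softmax) avoids overflow.
    """
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())
    return w / w.sum()

def selection_weights(selected):
    """omega_k = 1 for selected directions and 0 otherwise (claim 1)."""
    unknown = set(selected) - set(DIRECTIONS)
    if unknown:
        raise ValueError(f"unknown directions: {sorted(unknown)}")
    return np.array([1.0 if d in selected else 0.0 for d in DIRECTIONS])

def combine_directions(captured, omega):
    """Weighted sum S = sum_k omega_k * S_k over the 14 model outputs.

    captured: shape (14, num_samples), one time-domain signal per CGMM model.
    Returns a 1-D signal of length num_samples.
    """
    captured = np.asarray(captured, dtype=float)
    return omega @ captured
```

With `omega = selection_weights({"front"})`, `combine_directions(captured, omega)` simply returns the output of the front-direction model; selecting several directions sums their outputs. The same function would be called once with the left ear module's outputs (giving SSL) and once with the right ear module's (giving SSR).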

2. The intelligent transparent transmission system for headphones as described in claim 1, characterized in that: each microphone array includes four microphones, of which the second and third microphones are external microphones and the fourth microphone is a call microphone, used to capture sound from the external environment, while the first microphone is an internal microphone that captures sound inside the ear canal.

3. The intelligent transparent transmission system for headphones as described in claim 1, characterized in that: the left and right ear spatial sound effect adjustment modules perform MFCC feature extraction frame by frame; each frame is 10 ms long and is divided into n frequency bands (n = 20 to 40), with 160 sampling points in each frequency band.
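The frame parameters of claim 3 fix the sampling rate implicitly: if each band signal keeps the full sampling rate, 160 sampling points in a 10 ms frame correspond to 16 kHz. The sketch below shows the framing step under that assumption and the further assumption of non-overlapping frames (neither is stated explicitly in the claim); `split_frames` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE_HZ = 16_000  # assumption: implied by 160 samples per 10 ms frame
FRAME_MS = 10
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples

def split_frames(x, frame_len=FRAME_LEN):
    """Split a 1-D signal into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped here
    (assumption; a real-time system would buffer them for the next call).
    Returns an array of shape (num_frames, frame_len).
    """
    x = np.asarray(x)
    num_frames = len(x) // frame_len
    return x[: num_frames * frame_len].reshape(num_frames, frame_len)
```

One second of audio at the assumed rate therefore yields 100 frames of 160 samples each.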

4. The intelligent transparent transmission system for headphones as described in claim 3, characterized in that: the neural network intelligent model of the headset intelligent passthrough system is trained so that it supports the selected passthrough sound types. In addition, the neural network intelligent model can be retrained according to user needs, its parameters on the headset can be updated, and further optional sound types can be provided on demand.

5. The intelligent transparent transmission system for headphones as described in claim 4, characterized in that: the left and right ear spatial sound effect adjustment modules use a filter bank to obtain, frame by frame, the left ear signals SFLi and right ear signals SFRi of the different frequency bands, where i = 1, ..., n and n, the number of frequency bands into which each frame is divided, is an integer from 20 to 40.
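Claims 1 and 5 leave the filter bank design open. One simple possibility, used here purely for illustration, is an ideal FFT-mask filter bank: the spectrum of a 160-sample frame is cut into n contiguous, roughly equal-width groups of bins, and each group is transformed back into a 160-sample band signal SFi. Because the masks partition the spectrum, the band signals sum back to the original frame. The per-band gains from the neural network model are then applied as SOi = GAi · SFi. The function names are hypothetical, and the equal-width band layout is an assumption (MFCC front ends typically use mel-spaced bands instead).

```python
import numpy as np

def fft_filter_bank(frame, n_bands):
    """Split one frame into n_bands time-domain band signals SF_1..SF_n.

    Assumption: an ideal (brick-wall) FFT-mask filter bank with roughly
    equal-width bands; the patent text does not specify the filter design.
    Returns an array of shape (n_bands, len(frame)).
    """
    if not 20 <= n_bands <= 40:
        raise ValueError("claim 5 gives n as an integer from 20 to 40")
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)  # 81 bins for a 160-sample frame
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = np.empty((n_bands, len(frame)))
    for i in range(n_bands):
        masked = np.zeros_like(spectrum)
        masked[edges[i]:edges[i + 1]] = spectrum[edges[i]:edges[i + 1]]
        bands[i] = np.fft.irfft(masked, n=len(frame))
    return bands

def apply_band_gains(bands, gains):
    """SO_i = GA_i * SF_i for every band i (one gain per band, claim 1)."""
    return np.asarray(gains, dtype=float)[:, None] * bands
```

Summing the rows of `fft_filter_bank(frame, n)` reconstructs `frame`, so with all gains equal to 1 the chain is transparent; a gain below 1 suppresses that band and a gain above 1 enhances it.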

6. The intelligent transparent transmission system for headphones as described in claim 5, characterized in that: the channel balance module computes, for each frame, the volume ELi / ERi of each frequency band of the input left ear sound signal SOLi and right ear sound signal SORi from the sampling points j of that frame on that frequency band. It then balances the left and right ear volumes according to the calculated per-band volumes ELi / ERi. The reference for adjustment is the maximum left/right volume difference BEmax over the passband, obtained from actual tests, which serves as a boundary value that the volume difference between the two ears must not exceed. If the volume difference between the two ears is within this boundary value, the result meets expectations and no balance adjustment is required; if it exceeds the boundary value, the balance coefficients GLi / GRi of the corresponding frequency band signals SOLi / SORi are adjusted. The balance coefficients GLi / GRi serve as weights for each frequency band signal of the left ear sound signal SOLi and right ear sound signal SORi, which are weighted to produce the final playback sound, so that the left and right ear sound effects are balanced. The specific balance adjustment method is as follows. Let $E_{diff} = |E_{Li} - E_{Ri}|$. When $E_{diff} < BE_{max}$, no adjustment is required. When $E_{diff} > BE_{max}$, let $G_i = \frac{1}{2}(E_{diff} - BE_{max})$; then, if $E_{Li} > E_{Ri}$, $G_{Li} = 1 - G_i / E_{Li}$ and $G_{Ri} = 1 + G_i / E_{Ri}$; if $E_{Li} < E_{Ri}$, $G_{Li} = 1 + G_i / E_{Li}$ and $G_{Ri} = 1 - G_i / E_{Ri}$. Finally, the sound signal of each frequency band in each frame is weighted to obtain the target sound signals $G_{Li} S_{OLi}$ (left ear) and $G_{Ri} S_{ORi}$ (right ear).

7. The intelligent transparent transmission system for headphones as described in claim 6, characterized in that: the headset intelligent passthrough system is used together with an application program, through which the user can select the sound direction and the sound types.
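The balance rule of claim 6 can be sketched per frame as follows. The volume formula itself is not reproduced in this text, so `band_volume` assumes the sum of absolute sample values over j. With a measure that scales linearly with amplitude, the coefficients bring the left/right difference exactly down to BEmax: for example, volumes 10 and 4 with BEmax = 2 give GLi = 0.8 and GRi = 1.5, and the weighted volumes 8 and 6 differ by 2. The guard for a silent band, where the claim's formula would divide by zero, is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np

def band_volume(band_signal):
    """Volume E of one frequency band in one frame.

    Assumption: sum of |S(j)| over the sampling points j; the patent's own
    volume formula is not given in this text.
    """
    return float(np.sum(np.abs(band_signal)))

def balance_coefficients(e_l, e_r, be_max):
    """Balance coefficients (G_Li, G_Ri) for one band, following claim 6."""
    e_diff = abs(e_l - e_r)
    if e_diff <= be_max:
        # Within the tested boundary BEmax (at equality G_i = 0 anyway).
        return 1.0, 1.0
    if min(e_l, e_r) == 0.0:
        # Assumption: the formula divides by the quieter band's volume,
        # so a silent band is left unchanged.
        return 1.0, 1.0
    g = 0.5 * (e_diff - be_max)
    if e_l > e_r:
        return 1.0 - g / e_l, 1.0 + g / e_r  # attenuate left, boost right
    return 1.0 + g / e_l, 1.0 - g / e_r      # boost left, attenuate right

def balance_frame(so_l, so_r, be_max):
    """Weight every band of one frame: returns (G_Li * SO_Li, G_Ri * SO_Ri).

    so_l, so_r: arrays of shape (n_bands, samples_per_band).
    """
    so_l = np.asarray(so_l, dtype=float)
    so_r = np.asarray(so_r, dtype=float)
    out_l = np.empty_like(so_l)
    out_r = np.empty_like(so_r)
    for i in range(so_l.shape[0]):
        g_l, g_r = balance_coefficients(
            band_volume(so_l[i]), band_volume(so_r[i]), be_max
        )
        out_l[i] = g_l * so_l[i]
        out_r[i] = g_r * so_r[i]
    return out_l, out_r
```

Splitting the correction G_i evenly between the two ears, as the claim does, means neither channel has to move by more than half of the excess difference.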