Bluetooth earphone intelligent control method and system
By analyzing real-time audio and motion data from Bluetooth headsets, a multi-dimensional fusion mapping scene recognition mechanism is constructed. Combined with user preferences and safety assessments, this enables Bluetooth headsets to dynamically and adaptively adjust in complex environments. This solves the problem of existing technologies being unable to distinguish between scenarios with different safety requirements, thereby improving user safety and comfort.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENZHEN XUSHENG TECH CO LTD
- Filing Date
- 2026-02-05
- Publication Date
- 2026-06-12
AI Technical Summary
Existing Bluetooth headset noise cancellation control methods rely on a single audio energy threshold, which cannot make correct decisions in scenarios with similar physical acoustic characteristics but different safety requirements. This results in the inability to capture key warning signals in time when walking outdoors, causing safety hazards.
By acquiring real-time audio signals from Bluetooth headsets and user motion data, frequency domain features are extracted and change patterns are analyzed. A multi-dimensional fusion mapping scene recognition mechanism is constructed, and combined with a user historical preference feature library and security assessment rules, dynamic adaptive adjustment of headset function configuration is achieved.
It enables precise differentiation of the user's situation under complex and changing working conditions, ensuring that safety needs are prioritized and dynamically balancing outdoor safety and commuting comfort, avoiding the safety hazards and hearing shielding problems caused by a single control strategy in traditional solutions.
Figure CN122205293A_ABST
Description
Technical Field
[0001] This invention relates to the field of audio device control, and more particularly to a method and system for intelligent control of Bluetooth headsets.
Background Technology
[0002] With the booming development of mobile internet and smart wearable technology, audio interaction devices based on smart sensors have penetrated into people's daily lives, becoming an important bridge connecting virtual information and the real auditory environment. Their scene adaptability directly determines the user's interactive experience and travel safety.
[0003] In the prior art, Bluetooth headsets typically control the noise cancellation mode through a single trigger mechanism based on a fixed audio energy threshold. For example, when the external sound pressure level collected by the microphone continuously exceeds a preset decibel threshold (e.g., 75dB), comparator logic determines that the current environment is noisy and forcibly activates the active noise cancellation (ANC) function to isolate external interference. This approach relies mainly on instantaneous energy data from acoustic sensors and assumes that a high-noise environment necessarily corresponds to an auditory need for interference isolation, ignoring the complex nonlinear relationship between user behavior and environmental semantics. Because this control logic lacks multimodal fusion perception of user motion characteristics, it struggles to make correct decisions in scenarios with similar physical acoustic characteristics but drastically different safety requirements. For example, the ambient background noise is at a high intensity level both when walking on a busy street and when commuting in a subway car. Current technologies uniformly apply deep noise cancellation in both cases, so a user walking outdoors cannot promptly detect critical warning signals such as car horns, which creates serious safety hazards.
[0004] In summary, existing technologies cannot accurately identify specific scene attributes and user behavioral intentions in a high-noise background based on a single audio feature, making it impossible to achieve an intelligent dynamic balance between ensuring outdoor walking safety and providing a noise reduction experience in enclosed spaces.
Summary of the Invention
[0005] This invention provides a method and system for intelligent control of Bluetooth headsets, enabling dynamic adaptive adjustment based on environmental noise and user movement status.
[0006] Firstly, in order to solve the above-mentioned technical problems, the present invention provides a Bluetooth headset intelligent control method, comprising:
acquiring real-time audio signals from the Bluetooth headset and user motion data, extracting frequency domain features from the real-time audio signals, and analyzing the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set;
performing multi-dimensional fusion mapping on the environmental noise feature set and the user motion state feature set to generate the current scene identifier;
if the current scene identifier corresponds to a high-noise scene, matching and calculating the environmental noise feature set, the user motion state feature set, and the preset safety assessment rules to output the user safety requirement level;
extracting matching preference parameters from a preset user historical preference feature library based on the user safety requirement level, and fusing the preference parameters with the environmental noise feature set to calculate and generate the environmental sound activation ratio;
determining the mixing weight of the external ambient sound signal and the internal audio signal according to the ambient sound activation ratio, and adaptively adjusting the headphone audio channel according to the mixing weight;
continuously collecting the real-time audio signal and the user motion data, recalculating the ambient sound activation ratio and the corresponding mixing weight based on the collection results, and calculating the deviation from the current actual mixing weight; if the deviation exceeds the preset error range, updating the current actual mixing weight in real time and adjusting the audio channel or switching the working mode accordingly to generate a dynamically adapted headphone function configuration.
[0007] Secondly, the present invention provides a Bluetooth headset intelligent control system, comprising:
an environmental perception module, which acquires real-time audio signals from the Bluetooth headset and user motion data, extracts frequency domain features from the real-time audio signals, and analyzes the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set;
a scene recognition module, which performs multi-dimensional fusion mapping on the environmental noise feature set and the user motion state feature set to generate the current scene identifier;
a safety assessment module, which, if the current scene identifier corresponds to a high-noise scene, matches and calculates the environmental noise feature set, the user motion state feature set, and the preset safety assessment rules, and outputs the user safety requirement level;
a preference decision module, which extracts matching preference parameters from the preset user historical preference feature library according to the user safety requirement level, and fuses the preference parameters with the environmental noise feature set to calculate and generate the environmental sound activation ratio;
an audio mixing module, which determines the mixing weight of the external ambient sound signal and the internal audio signal based on the ambient sound activation ratio, and adaptively adjusts the headphone audio channel according to the mixing weight;
a closed-loop adaptive module, which continuously collects the real-time audio signal and the user motion data, recalculates the ambient sound activation ratio and the corresponding mixing weight based on the collection results, and calculates the deviation from the current actual mixing weight; if the deviation exceeds the preset error range, it updates the current actual mixing weight in real time and adjusts the audio channel or switches the working mode accordingly to generate a dynamically adapted headphone function configuration.
[0008] Compared with the prior art, the present invention has the following beneficial effects: (1) This invention acquires real-time audio frequency domain features and user motion change patterns and performs multi-dimensional fusion mapping to construct a scene recognition mechanism based on dual verification of acoustic fingerprint and behavioral state. This mechanism can deeply analyze the semantic attributes of environmental signals, so that the system can accurately distinguish the specific situation of the user in high-noise environments with similar sound pressure levels but different scene attributes (such as streets with dense traffic and subway cars in operation). This fundamentally breaks through the bottleneck of scene misidentification caused by existing technologies that rely solely on a single audio energy threshold for linear judgment, and provides a reliable perceptual basis for the differentiated and accurate matching of subsequent functions.
[0009] (2) When the scene is detected to switch from low noise to high noise, the present invention introduces a preset safety assessment rule and combines it with the user’s historical preference feature library for weighted fusion, forming a hybrid decision logic with physical safety as a rigid constraint and user habits as a flexible adjustment. This enables the headphones to prioritize the calculation and response to the user’s safety needs level when faced with sudden environmental changes (such as retaining warning sounds when walking outdoors), effectively avoiding the safety hazards caused by the traditional solution’s uniform strong noise reduction that masks key environmental sounds, and achieving a dynamic balance between travel safety and personalized listening experience.
[0010] (3) The present invention constructs a dual-mode adaptive configuration mechanism of gain allocation and noise shielding based on the ambient sound activation ratio and the current scene identifier. It can flexibly switch or linearly adjust between "ambient sound pass-through mode" and "active noise reduction shielding mode" according to the calculated mixing weight, thereby driving the audio channel to enhance external signals in open spaces to maintain context awareness and suppress low-frequency noise in enclosed spaces to ensure immersive experience. It completely solves the technical contradiction that a single control strategy cannot simultaneously take into account outdoor safety and commuting comfort under complex and variable working conditions.
[0011] (4) This invention establishes a closed-loop feedback control system with adaptive correction capability by continuously calculating the deviation between the regenerated target mixing weight and the current actual mixing weight and triggering real-time updates when the deviation exceeds the preset error range. This system can offset the interference caused by sensor data fluctuations or non-stationary changes in environmental noise, ensuring that the adjustment command of the audio channel always closely follows the dynamic changes of the target value. It eliminates the sudden changes in volume, adjustment lag or auditory oscillation caused by open-loop control from the source, and significantly improves the smoothness and stability of the audio interaction process.
Attached Figure Description
[0012] Figure 1 is a schematic flowchart of the Bluetooth headset intelligent control method provided in the first embodiment of the present invention; Figure 2 is a schematic diagram of the Bluetooth headset intelligent control system provided in the second embodiment of the present invention.
Detailed Implementation
[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0014] Referring to Figure 1, the first embodiment of the present invention provides an intelligent control method for Bluetooth headsets, including the following steps:
S11, acquire the real-time audio signal and user motion data of the Bluetooth headset, extract frequency domain features from the real-time audio signal and analyze the change pattern of the user motion data to obtain an environmental noise feature set and a user motion state feature set;
S12, perform multi-dimensional fusion mapping between the environmental noise feature set and the user motion state feature set to generate the current scene identifier;
S13, if the current scene identifier corresponds to a high-noise scene, match and calculate the environmental noise feature set, the user motion state feature set and the preset safety assessment rules to output the user safety requirement level;
S14, based on the user safety requirement level, extract matching preference parameters from the preset user historical preference feature library, and fuse the preference parameters with the environmental noise feature set to calculate and generate the environmental sound activation ratio;
S15, determine the mixing weight of the external ambient sound signal and the internal audio signal according to the ambient sound activation ratio, and adaptively adjust the headphone audio channel according to the mixing weight;
S16, continuously collect the real-time audio signal and the user motion data, recalculate the ambient sound activation ratio and corresponding mixing weight based on the collection results, and calculate the deviation from the current actual mixing weight. If the deviation exceeds the preset error range, update the current actual mixing weight in real time and adjust the audio channel or switch the working mode accordingly to generate a dynamically adapted headphone function configuration.
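As a rough illustration of steps S15 and S16, the sketch below blends one ambient sample with one playback sample and only updates the applied weight when the recalculated target drifts outside a tolerance band. The linear crossfade, the function names, and the 0.05 tolerance are assumptions made for illustration; the text does not specify the mixing law or the size of the preset error range.

```python
def mix_sample(ambient: float, playback: float, ambient_weight: float) -> float:
    """Blend one external ambient sample with one internal playback sample.

    Assumes a simple linear crossfade; the source does not state the mixing law.
    """
    return ambient_weight * ambient + (1.0 - ambient_weight) * playback


def update_weight(current: float, target: float, tolerance: float = 0.05) -> float:
    """Keep the current weight unless the deviation exceeds the tolerance.

    `tolerance` stands in for the "preset error range"; 0.05 is hypothetical.
    """
    if abs(target - current) > tolerance:
        return target
    return current


# Targets recalculated on successive control cycles (made-up values).
weight = 0.30
for target in (0.32, 0.34, 0.50, 0.52):
    weight = update_weight(weight, target)
```

Small fluctuations in the recalculated target leave the applied weight untouched, which is one way to read the claim that the closed loop avoids audible oscillation.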
[0015] In step S11, real-time audio signals from the Bluetooth headset and user motion data are acquired. Frequency domain features are extracted from the real-time audio signals, and change pattern analysis is performed on the user motion data to obtain an environmental noise feature set and a user motion state feature set, including: S111, acquire real-time audio signals, perform Fourier transform on the real-time audio signals, extract frequency spectrum features and noise levels, and match and classify them with a pre-stored noise template library to obtain the environmental noise type and noise level. S112, acquire user motion data, perform statistical analysis on the acceleration change amplitude and frequency of the user motion data, and compare it with a preset intensity threshold at multiple levels to obtain the motion intensity level and motion mode label; S113, the environmental noise type, the noise level and the frequency spectrum features are fused into an environmental noise feature set, and the motion intensity level and motion mode label are fused into a user motion state feature set.
[0016] Specifically, for the processing of the real-time audio signal in step S111, the Short Time Fourier Transform (STFT) algorithm is used to convert the time-domain audio stream into a time-frequency domain spectrogram. In this process, a Hamming window is selected as the window function to reduce spectral leakage, and the window length is set to 20ms to 30ms. This window length is based on the well-known characteristic that environmental noise and speech signals are stationary within a short time range (i.e., within 20-30ms). When extracting frequency spectrum features, Mel-frequency cepstral coefficients (MFCCs) are further calculated. Typically, the first 13 dimensions are extracted as the core feature vector because this number of dimensions can cover the key frequency band information perceived by human hearing and achieves a balance between computational efficiency and feature representation capability. For the calculation of noise levels, the A-weighting algorithm is used to weight and sum the spectral energy to simulate the differences in human ear sensitivity to different frequencies, thereby obtaining the sound pressure level in decibels (dBA). In the environmental noise type matching and classification stage, the Mahalanobis distance algorithm is used to calculate the distance between the currently extracted MFCC feature vector and the center vectors of various noise types (such as traffic noise, office background noise, wind noise, etc.) in the pre-stored noise template library. Mahalanobis distance takes into account the correlation and distribution variance between the various dimensions of the features, and can provide more accurate classification results based on the statistical distribution characteristics of the multidimensional feature space.
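The A-weighted level calculation mentioned above can be sketched as follows. The weighting curve uses the standard analytic A-weighting formula; the helper names and the idea of feeding in per-band levels at centre frequencies are assumptions for illustration, not details taken from the text.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """Standard analytic A-weighting gain in dB for a frequency in Hz (freq_hz > 0)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


def dba_from_bands(band_levels_db: dict) -> float:
    """Combine per-band levels (dB, keyed by centre frequency in Hz) into one dBA value."""
    total_power = sum(
        10 ** ((level + a_weighting_db(freq)) / 10.0)
        for freq, level in band_levels_db.items()
    )
    return 10.0 * math.log10(total_power)


# A low-frequency rumble counts for much less than a mid-band tone of equal level.
level = dba_from_bands({100.0: 70.0, 1000.0: 70.0})
```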
[0017] In one implementation, for the user motion data analysis in step S112, raw data is acquired using a triaxial accelerometer and preprocessed using a Butterworth low-pass filter with a cutoff frequency set to 20Hz. This cutoff is based on the biomechanical principle that the frequency of daily human movement is usually below 20Hz, so the filter removes high-frequency electronic noise generated by the sensor itself. Subsequently, the signal vector magnitude (SVM) of the triaxial acceleration is calculated, and statistical features are extracted within a preset sliding time window (e.g., 2 seconds; the window length is set based on the statistical observation that a complete gait cycle usually lasts 1-1.5 seconds, which ensures that at least one complete motion cycle falls within the window). The statistical features include the mean and standard deviation of the SVM, and the motion frequency estimated from the zero-crossing rate (ZCR). The preset intensity threshold used for comparison is the optimal critical value obtained by analyzing the historical exercise data of a large number of users of different heights and weights with ROC curves (Receiver Operating Characteristic curves) and selecting the point that maximizes the Youden Index, thereby accurately distinguishing different intensity levels such as stillness, slow walking, and vigorous exercise.
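A minimal sketch of the windowed motion features described above is given here. It assumes the samples have already been low-pass filtered upstream, and it counts zero crossings of the mean-removed SVM, taking two crossings per oscillation; the function name, dictionary keys, and synthetic test signal are illustrative choices rather than details from the text.

```python
import math
import statistics


def motion_features(samples, sample_rate_hz):
    """Compute SVM statistics for one window of (ax, ay, az) samples."""
    svm = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    mean = statistics.fmean(svm)
    std = statistics.pstdev(svm)
    # Remove the mean so that oscillations cross zero.
    centred = [v - mean for v in svm]
    crossings = sum(
        1 for a, b in zip(centred, centred[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(svm) / sample_rate_hz
    # Each full oscillation produces two zero crossings.
    freq_hz = crossings / (2.0 * duration_s)
    return {"svm_mean": mean, "svm_std": std, "motion_freq_hz": freq_hz}


# Synthetic 2-second window at 50 Hz: a 2 Hz bounce on the vertical axis.
rate = 50
window = [
    (0.0, 0.0, 1.0 + 0.5 * math.sin(2 * math.pi * 2.0 * n / rate + 0.3))
    for n in range(2 * rate)
]
features = motion_features(window, rate)
```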
[0018] It is worth noting that the feature fusion in step S113 is not a simple numerical concatenation, but rather a Z-Score normalization method is used to normalize the environmental noise features and user motion state features separately. Since noise level (dB) and acceleration (g) are physical quantities with different dimensions, Z-Score processing can transform the data into a distribution with a mean of 0 and a standard deviation of 1. The mathematical basis of this processing is to eliminate the interference of dimensional differences on the weights of subsequent multi-dimensional fusion mapping algorithms (such as neural networks or cluster analysis). The normalized data is constructed into a structured tensor or feature vector, which not only contains the current instantaneous features, but preferably also incorporates historical feature frames from the time series (e.g., the feature mean of the past 5 seconds) to construct an environmental noise feature set and a user motion state feature set containing temporal context information, providing input data with a time dimension for subsequent scene recognition.
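The effect of Z-Score normalization on quantities with different units can be illustrated with a short sketch. The sample values are made up; the point is only that noise levels in dB and accelerations in g end up on the same unitless scale.

```python
import statistics


def z_score(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


noise_dba = [62.0, 70.0, 78.0, 86.0]   # hypothetical noise levels
accel_g = [0.02, 0.10, 0.18, 0.26]     # hypothetical acceleration magnitudes
noise_z = z_score(noise_dba)
accel_z = z_score(accel_g)
```

Both series are evenly spaced, so after normalization they coincide even though the raw magnitudes differ by orders of magnitude.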
[0019] In step S12, the environmental noise feature set and the user motion state feature set are fused and mapped in a multi-dimensional manner to generate a current scene identifier, including: S121, perform multi-level matching between the environmental noise feature set and the preset scene feature library to obtain a preliminary scene probability distribution; S122, the preliminary scene probability distribution and the user motion state feature set are weighted and fused to generate a scene identifier.
[0020] Specifically, for scene feature matching in step S121, the k-Nearest Neighbors (k-NN) algorithm is used to perform similarity retrieval in a high-dimensional feature space. The environmental noise feature set extracted in step S11 is used as the query vector, and its cosine similarity with each sample point in the preset scene feature library (containing cluster centers of typical scenes such as subways, streets, offices, and libraries) is calculated. Cosine similarity is chosen instead of Euclidean distance because it is a well-known mathematical principle that in a high-dimensional audio feature space, the directional difference of feature vectors is more representative of sound texture characteristics than the difference in numerical magnitude. The k neighbors with the highest similarity (e.g., k=5, this value is set based on the lowest error rate determined after leave-one-out cross-validation on the test set) are selected, and the frequency of the scene category to which these k neighbors belong is counted. Subsequently, the frequency statistics results are transformed into a normalized probability distribution vector using the Softmax function (e.g., [subway: 0.7, street: 0.2, other: 0.1]), thus obtaining the preliminary scene probability distribution.
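The matching procedure in step S121 can be sketched as below: rank library entries by cosine similarity, count the scene labels among the k nearest, and pass the counts through Softmax. The toy three-dimensional vectors and library contents are invented for illustration, and applying Softmax directly to raw vote counts is one literal reading of the paragraph.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scene_distribution(query, library, k=5):
    """library: list of (scene_label, feature_vector) pairs."""
    ranked = sorted(
        library, key=lambda item: cosine_similarity(query, item[1]), reverse=True
    )
    votes = Counter(label for label, _ in ranked[:k])
    labels = sorted({label for label, _ in library})
    # Softmax over the vote counts of every known scene label.
    exps = {label: math.exp(votes.get(label, 0)) for label in labels}
    total = sum(exps.values())
    return {label: exps[label] / total for label in labels}


library = [
    ("subway", [0.90, 0.10, 0.00]),
    ("subway", [0.80, 0.20, 0.10]),
    ("subway", [0.85, 0.15, 0.05]),
    ("street", [0.10, 0.90, 0.20]),
    ("street", [0.20, 0.80, 0.30]),
    ("office", [0.00, 0.10, 0.90]),
    ("office", [0.10, 0.00, 0.80]),
]
probs = scene_distribution([0.88, 0.12, 0.02], library, k=5)
```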
[0021] It is worth noting that the preset scene feature library is built and maintained through a combination of offline supervised learning and online incremental updates. During the construction phase, the system collects massive amounts of audio and motion data samples of typical labeled scenes (such as subway cars, busy streets, and quiet offices). Features are extracted from each type of sample, and a Gaussian Mixture Model (GMM) is used to calculate the mean vector and covariance matrix of the distribution, forming a structured database indexed by "scene ID" and containing "feature centroids" and "distribution boundaries." Regarding data updates, the system introduces an adaptive calibration mechanism based on user behavior feedback. If a user manually forces a change in the working mode (indicating a system recognition deviation), the system captures the current feature vector as a correction sample and uses an incremental learning algorithm to fine-tune the feature centroids of the corresponding scene in the library at a preset learning rate, ensuring that the feature library continuously evolves with environmental changes.
[0022] In one implementation, for the weighted fusion in step S122, a Bayesian inference model is used to correct the preliminary scene probability distribution, with the user motion state feature set supplying the prior. Specifically, a motion-scene correlation matrix is constructed, defining the conditional probability of each scene occurring under a specific motion mode (e.g., in the "vigorous running" mode, the probability of the "library" scene is extremely low). The posterior probability distribution is obtained by multiplying the preliminary scene probability distribution (as the likelihood) element by element with the prior probability corresponding to the motion state and normalizing the result so that it sums to 1. The category with the highest posterior probability is selected as the scene identifier for the current time frame. During this process, if the motion state is "stationary," higher weights (e.g., 0.8) are assigned to audio features, while lower weights (e.g., 0.2) are assigned to motion features; conversely, in a "high-frequency vibration" state, the weights of audio features are reduced to suppress wind noise interference. This dynamic weight allocation strategy is based on the Shannon entropy principle, that is, assigning greater decision weight to the modality with lower information entropy (higher certainty).
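The Bayesian correction can be sketched as an element-wise product of the audio-derived scene probabilities and a motion-conditioned prior, followed by normalization. The audio probabilities reuse the example distribution given earlier in the description; the prior for a hypothetical "walking" motion mode is invented for illustration.

```python
def fuse(audio_probs, motion_prior):
    """Multiply per-scene probabilities by the prior and renormalize."""
    joint = {s: audio_probs[s] * motion_prior.get(s, 0.0) for s in audio_probs}
    total = sum(joint.values())
    if total == 0:
        # The prior rules out every scene; fall back to the audio-only estimate.
        return dict(audio_probs)
    return {s: p / total for s, p in joint.items()}


audio_probs = {"subway": 0.7, "street": 0.2, "other": 0.1}
prior_given_walking = {"subway": 0.15, "street": 0.75, "other": 0.10}  # hypothetical
posterior = fuse(audio_probs, prior_given_walking)
scene = max(posterior, key=posterior.get)
```

In this toy case the audio evidence alone favours "subway", but the walking prior shifts the posterior to "street", which is the kind of disambiguation the paragraph describes.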
[0023] It is worth noting that, to avoid abrupt changes in scene recognition caused by transient noise (such as sudden car horns), a sliding time window mechanism is used for smoothing. The time window length is set to 3 seconds, based on the "scene integration time window" theory in human auditory cognitive psychology, which states that the human brain typically requires about 2 to 3 seconds of continuous perception to confirm a change in the environment. Within each time window, a majority voting method is used to determine the dominant scene identifier for that window. Then, the dominant identifier of the current time window is compared with the dominant identifier of the previous time window. The preset switching condition is defined as follows: a scene switching event is confirmed only if the new scene identifier remains consistent across N consecutive (e.g., N=3) overlapping windows, and the mean posterior probability confidence exceeds a threshold (e.g., 0.75, determined based on the requirement of a false positive rate (FPR) < 5% on the ROC curve). This double confirmation mechanism effectively filters out short-term interference, ensuring the robustness of scene switching judgments.
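The double confirmation rule can be sketched as a small state machine. Each window is a list of per-frame (scene, confidence) pairs; the class and function names are hypothetical, and taking the window confidence as the mean over the frames that voted for the dominant scene is an assumption, since the text does not say how the mean confidence is computed.

```python
from collections import Counter


def dominant(window):
    """Return the majority scene in a window and the mean confidence of its frames."""
    scene, _ = Counter(s for s, _ in window).most_common(1)[0]
    confs = [c for s, c in window if s == scene]
    return scene, sum(confs) / len(confs)


class SceneSwitcher:
    def __init__(self, initial, n_confirm=3, min_conf=0.75):
        self.current = initial
        self.n_confirm = n_confirm    # N consecutive windows (example value from the text)
        self.min_conf = min_conf      # confidence threshold (example value from the text)
        self._candidate = None
        self._history = []

    def update(self, window):
        scene, conf = dominant(window)
        if scene == self.current:
            # Any window that agrees with the current scene resets the candidate.
            self._candidate, self._history = None, []
            return self.current
        if scene != self._candidate:
            self._candidate, self._history = scene, []
        self._history.append(conf)
        if len(self._history) >= self.n_confirm:
            mean_conf = sum(self._history[-self.n_confirm:]) / self.n_confirm
            if mean_conf > self.min_conf:
                self.current = scene
                self._candidate, self._history = None, []
        return self.current


switcher = SceneSwitcher("subway")
street = [("street", 0.8), ("street", 0.85), ("subway", 0.4)]
states = [switcher.update(street) for _ in range(3)]
```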
[0024] In step S13, if the current scene identifier corresponds to a high-noise scene, the environmental noise feature set, the user motion state feature set, and the preset safety assessment rules are matched and calculated to output the user safety requirement level, including: S131, compare the exercise intensity level with the preset intensity threshold level by level to obtain a preliminary risk score; S132, the preliminary risk score and the noise level are weighted and summed to obtain the comprehensive risk value; S133, match the comprehensive risk value with the preset safety assessment rules, and output the user safety requirement level.
[0025] Specifically, for the risk score assessment in step S131, a piecewise linear mapping function is used to convert the exercise intensity level into a quantified risk value (0-100). In this process, the baseline exercise threshold is set to an acceleration fluctuation of 1.5 m/s². This threshold is set based on reaction time theory in human biomechanics: when the acceleration fluctuation caused by walking speed exceeds this value, the average physical reaction time of the human body to sudden road conditions is significantly prolonged, leading to a decrease in hazard avoidance ability. If the exercise intensity level is "fast running" (corresponding to acceleration fluctuation > 2.5 m/s²), a higher base risk score (e.g., 80 points) is assigned, because the visual field experiences a tunneling effect during high-speed movement, reducing the user's perception of their surroundings. Conversely, if the user is walking at a slow speed, a lower risk score (e.g., 20 points) is assigned. This tiered scoring mechanism ensures that the risk assessment is directly linked to the user's physical risk avoidance potential.
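One possible shape for the piecewise linear mapping is sketched below. The anchor points pair the 1.5 m/s² and 2.5 m/s² thresholds with the example scores of 20 and 80 from the paragraph; that pairing, the end anchors at 0 and 4.0 m/s², and interpolating on the acceleration fluctuation itself are all assumptions, since the text does not give the exact breakpoints.

```python
# (acceleration fluctuation in m/s^2, risk score) anchor points; hypothetical layout.
ANCHORS = [(0.0, 0.0), (1.5, 20.0), (2.5, 80.0), (4.0, 100.0)]


def risk_score(accel_fluctuation: float) -> float:
    """Linearly interpolate a 0-100 risk score between the anchor points."""
    if accel_fluctuation <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if accel_fluctuation <= x1:
            return y0 + (y1 - y0) * (accel_fluctuation - x0) / (x1 - x0)
    # Beyond the last anchor the score saturates.
    return ANCHORS[-1][1]
```

With these anchors the score rises slowly up to the baseline threshold and steeply between 1.5 and 2.5 m/s², reflecting the idea that risk grows quickly once the user moves faster than a walking pace.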
[0026] In one implementation, the Entropy Weight Method (EWM) is used to objectively determine the fusion weights of each factor in the comprehensive risk value calculation in step S132. First, the input noise level (dBA) is Min-Max normalized to map it to the [0,1] interval. Then, an evaluation matrix is established containing the motion risk score and the normalized environmental noise value. The dispersion of each indicator is measured by calculating its information entropy, thereby deriving the objective weights. Typically, the weight of environmental noise (e.g., …) is set higher than the weight of motion risk (e.g., …). The statistical basis for this weighting is that, in outdoor safety accident analysis, the correlation between accidents and the perception loss caused by the auditory masking effect is significantly higher than the correlation between accidents and movement speed alone. Based on the determined weights, a weighted linear summation formula is used to calculate the comprehensive risk value, thereby mathematically representing the overall safety threat level in the current scenario.
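The Entropy Weight Method can be sketched as follows, using the usual steps: min-max scale each indicator, convert each column to proportions, compute its normalized entropy, and weight indicators by one minus entropy. The history rows are made-up (motion risk score, normalized noise) pairs, and scaling the normalized noise by 100 so that both terms share the 0-100 range is an assumption, not something the text states.

```python
import math


def entropy_weights(rows):
    """rows: list of equal-length indicator tuples; returns one weight per indicator."""
    n, m = len(rows), len(rows[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        scaled = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
        total = sum(scaled)
        if total == 0:
            divergences.append(0.0)  # constant indicator: no discriminating power
            continue
        probs = [v / total for v in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences] if s else [1.0 / m] * m


def comprehensive_risk(motion_score, noise_norm, weights):
    """Weighted linear sum; noise is rescaled to 0-100 (assumed) to match the score."""
    return weights[0] * motion_score + weights[1] * 100.0 * noise_norm


history = [(20, 0.30), (35, 0.55), (80, 0.90), (50, 0.95), (25, 0.60)]  # hypothetical
weights = entropy_weights(history)
risk = comprehensive_risk(80, 0.9, weights)
```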
[0027] It is worth noting that the preset safety assessment rules for the level matching in step S133 are generated by unsupervised learning on historical safety event data using the K-Means clustering algorithm. Specifically, the large volume of historically collected risk value data is clustered into three clusters, corresponding to "low risk," "medium risk," and "high risk," respectively, and the boundary between adjacent clusters is used as the critical threshold for level classification. For example, a comprehensive risk value greater than 75 is set as a "high safety requirement level." This threshold is not arbitrarily selected but is based on the d' (sensitivity index) analysis in Signal Detection Theory (SDT). When the risk value exceeds 75, the amount of environmental noise masking warning sounds (such as car horns) usually exceeds the human hearing threshold, requiring external intervention (such as pass-through enhancement) to restore safety perception. Through this lookup table matching, continuous risk values can be quickly discretized into executable control command levels.
[0028] In step S14, matching preference parameters are extracted from a preset user historical preference feature library based on the user safety requirement level, and the preference parameters are fused with the environmental noise feature set to calculate and generate the environmental sound activation ratio, including: S141, extract multiple historical environmental sound ratio records that match the scene identifier and the user safety requirement level from the preset user historical preference feature library; S142, assign different weights to the environmental sound ratio records and the environmental noise feature set, then perform a fusion calculation to generate the environmental sound activation ratio.
[0029] Specifically, for the historical record extraction in step S141, a content-based recommendation algorithm is used to construct a retrieval model. First, the current scene identifier (e.g., a One-Hot encoded vector) and the user's security requirement level are combined to construct a query feature vector. Then, in a pre-defined user historical preference feature library, the cosine similarity between the query vector and each historical record vector is calculated. Specifically, in this embodiment, the user historical preference feature library is constructed as a multi-dimensional sparse mapping table structure. The index key of this mapping table is generated by combining the scene identifier and the user's security requirement level, and the corresponding stored value is the statistical distribution characteristics (including the mean) of the proportion of ambient sound activation actively set by the user under this specific combination condition in the past. With variance To ensure the preference model can adapt to the dynamic drift of user habits, the database employs an event-driven online update mechanism. Whenever the system detects a user manually adjusting volume or switching modes in a specific scenario, it captures the scenario label and security status at that moment. Then, it uses an exponentially weighted moving average algorithm with a time decay factor to integrate the new sample data into the corresponding historical records, thus assigning higher weight to recent behaviors. During the extraction operation, the system directly uses the current scenario identifier and the user's security requirement level output in step S13 as a joint index key to retrieve and extract the corresponding historical mean from the database. This serves as the baseline preference ratio. 
If the retrieval fails (e.g., during the cold start phase), a pre-defined safety fallback strategy is triggered, directly applying the default safety threshold corresponding to the current security level.
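The keyed lookup, online update, and cold-start fallback described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and function names, the default ratios in `DEFAULT_SAFE_RATIO`, and the fixed update weight `alpha` (used here in place of the time decay factor) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical per-level default ratios, used only when no history exists.
DEFAULT_SAFE_RATIO = {"low": 0.2, "medium": 0.4, "high": 0.6}


@dataclass
class PreferenceStats:
    mean: float
    variance: float


class PreferenceLibrary:
    """Sparse table keyed by (scene identifier, safety requirement level)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest manual adjustment (assumed)
        self.table = {}

    def update(self, scene, level, ratio):
        """Fold a manual user adjustment into the matching record (EWMA)."""
        key = (scene, level)
        stats = self.table.get(key)
        if stats is None:
            self.table[key] = PreferenceStats(mean=ratio, variance=0.0)
            return
        delta = ratio - stats.mean
        stats.mean += self.alpha * delta
        stats.variance = (1 - self.alpha) * (stats.variance + self.alpha * delta * delta)

    def baseline(self, scene, level):
        """Return the historical mean, or the safety default on a cold start."""
        stats = self.table.get((scene, level))
        if stats is None:
            return DEFAULT_SAFE_RATIO[level]
        return stats.mean
```

Because recent samples enter with weight `alpha` and older ones decay geometrically, the stored mean follows gradual changes in user habits without keeping the full history.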
[0030] During the query process, the top-K records with the highest similarity scores are selected (e.g., K=5). The value of K balances the law of large numbers against computational efficiency. Experimental data shows that when the sample size reaches 5, the variance of the sample mean is significantly reduced, which filters out interference from accidental erroneous records while avoiding the low-relevance noise introduced by too many samples. Furthermore, a time decay factor is preferably introduced: the similarity scores are weighted by an exponential decay function of the form $e^{-\lambda \Delta t}$, where $\Delta t$ is the age of the record and the attenuation coefficient $\lambda$ is set to 0.05 (with time measured in days). This coefficient is based on a variant of the Ebbinghaus forgetting curve: users' operating habits drift over time, and recent interaction data is more representative of their current true intentions.
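A minimal sketch of the top-K retrieval with time-decayed cosine similarity is given below. The record layout `(vector, age_days, ratio)` and the final score-weighted average in `weighted_baseline` are assumptions made for illustration; the text does not state how the selected records are combined.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_records(query, records, k=5, decay=0.05):
    """Score each (vector, age_days, ratio) record and keep the best k.

    The similarity is multiplied by exp(-decay * age_days), so older
    records count less. Returns a list of (score, ratio) pairs.
    """
    scored = []
    for vector, age_days, ratio in records:
        score = cosine_similarity(query, vector) * math.exp(-decay * age_days)
        scored.append((score, ratio))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def weighted_baseline(top):
    """Score-weighted mean of the retrieved ratios (assumed combination rule)."""
    total = sum(score for score, _ in top)
    if total <= 0.0:
        return None  # caller falls back to the safety default
    return sum(score * ratio for score, ratio in top) / total
```

With a decay coefficient of 0.05 per day, a record that is 10 days old keeps roughly 61% of its similarity score.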
[0031] In one implementation, for the fusion calculation in step S142, the system uses a "deviation correction model under safety constraints" to dynamically optimize the baseline preference ratio. Specifically, the final ambient sound activation ratio is set as $R_{final} = R_{base} + \Delta R$, where $R_{base}$ is the aforementioned extracted baseline value and $\Delta R$ is a fine-tuning amount calculated from the environmental noise characteristics. The system analyzes the spectral flatness of the environmental noise. If harsh high-frequency wind noise or mechanical howling is detected in the current environment, then even with a high baseline preference the system generates a negative fine-tuning amount (e.g., -10%) to improve listening comfort. Furthermore, the algorithm introduces a hard lower-bound clamp determined by the user safety requirement level: regardless of the fine-tuning result, the final output ratio $R_{final}$ must not fall below the minimum safety threshold corresponding to the current safety level (e.g., the lower limit is locked at 60% under a high safety requirement level), so that personalized adjustments always stay above the physical safety baseline.
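The correction-then-clamp logic can be sketched as below. Only the -10% fine-tuning example and the 60% floor for the high level come from the text; the other floor values, the boolean `harsh_noise` input (standing in for the spectral flatness analysis), and the function name are hypothetical.

```python
# Minimum ambient sound ratio per safety level. Only the "high" value (0.6)
# is taken from the text; the others are placeholders.
MIN_RATIO_BY_LEVEL = {"low": 0.0, "medium": 0.3, "high": 0.6}

HARSH_NOISE_ADJUSTMENT = -0.10  # negative fine-tuning amount from the example


def corrected_ratio(baseline, harsh_noise, level):
    """Apply the fine-tuning amount, then clamp to the safety floor."""
    delta = HARSH_NOISE_ADJUSTMENT if harsh_noise else 0.0
    ratio = baseline + delta
    floor = MIN_RATIO_BY_LEVEL[level]
    # The floor is applied last so that comfort tuning can never undercut it.
    return min(1.0, max(floor, ratio))
```

For example, a baseline of 65% with harsh wind noise would drop to 55%, but under the high safety level the clamp returns 60%.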
[0032] In step S15, the mixing weight of the external ambient sound signal and the internal playback audio signal is determined according to the ambient sound activation ratio, and the headphone audio channel is adaptively adjusted according to the mixing weight, including: S151, calculate the gain coefficient of the external ambient sound signal and the attenuation coefficient of the internal audio signal based on the ambient sound activation ratio. S152, perform real-time weighted superposition of the two audio signals according to the gain coefficient and attenuation coefficient and output the result; S153, when the ambient sound activation ratio is lower than the preset lower threshold, the external ambient sound gain coefficient is set to the preset minimum value and the noise reduction processing intensity is increased, and the noise shielding mode is entered.
[0033] Specifically, the coefficient calculation in step S151 does not employ a simple linear mapping, but rather a nonlinear gain mapping algorithm based on the equal-loudness contour. Since the human ear's perception of loudness is logarithmically related to sound pressure level, the ambient sound activation ratio (0-100%) is normalized to the [0,1] interval and used as the independent variable of a logarithmic function that yields the gain coefficient of the external ambient sound signal. For example, a formula is used in which the input is the activation ratio and a tiny constant is added to prevent a logarithmic singularity. The attenuation coefficient of the internally played audio signal is calculated inversely using side-chain compression logic, which ensures that when the external sound gain increases, the internal audio is proportionally "ducked". This gain-attenuation relationship follows the constant power law, that is, the combined power of the two signals is held constant. This rule prevents the total energy from overflowing after the two signals are superimposed, which would cause digital clipping distortion.
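One possible form of such a mapping is sketched below. The exact formula in the original is not recoverable, so the logarithmic curve here, the constants `EPSILON` and `DYNAMIC_DECADES`, and the function names are assumptions; the only properties taken from the text are the normalized input, the small constant guarding the logarithm, and the constant power relationship between the two coefficients.

```python
import math

EPSILON = 1e-6         # tiny constant that avoids log10(0)
DYNAMIC_DECADES = 2.0  # hypothetical: ratios at or below 1% map to zero gain


def external_gain(ratio):
    """Map a normalized activation ratio in [0, 1] to a gain in [0, 1]."""
    r = min(1.0, max(0.0, ratio))
    gain = 1.0 + math.log10(r + EPSILON) / DYNAMIC_DECADES
    return min(1.0, max(0.0, gain))


def mixing_gains(ratio):
    """Return (external gain, internal attenuation) with constant power.

    The internal coefficient is derived from the external one so that
    g_ext**2 + g_int**2 == 1, which ducks playback as ambient sound rises.
    """
    g_ext = external_gain(ratio)
    g_int = math.sqrt(1.0 - g_ext * g_ext)
    return g_ext, g_int
```

Because the squares of the two coefficients always sum to one, raising the ambient gain automatically lowers the playback gain and the summed power does not grow.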
[0034] In one implementation, time-domain weighted superposition technology is employed in a digital signal processor (DSP) for the real-time weighted superposition in step S152. To avoid abrupt changes in gain coefficients introducing "zipper noise" into the audio stream, linear interpolation or smoothing filtering algorithms are used to buffer coefficient changes when applying new gain coefficients. The ramp time for gain changes is set to 10ms to 20ms. This time parameter is based on the time integration characteristics of the human ear; transient changes below 10ms may produce audible clicking sounds, while delays above 30ms will produce a perceptible volume lag. During the mixing process, comb filtering is applied to compensate for the external ambient sound signal to correct the phase delay introduced by the microphone acquisition, ensuring that it remains in phase with the residual sound passively transmitted through the physical earpiece.
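The coefficient smoothing can be illustrated with a per-sample linear ramp. This is a sketch under assumptions: the 48 kHz sample rate, the 15 ms default (chosen from the 10ms to 20ms range given above), and the function names are not from the source, and the phase compensation step is omitted.

```python
def ramp_gains(current, target, ramp_ms=15.0, sample_rate=48000):
    """Linearly interpolate from the current gain to the target gain.

    Returns one gain value per sample over the ramp duration, ending
    exactly on the target, so the coefficient never jumps abruptly.
    """
    steps = max(1, int(sample_rate * ramp_ms / 1000.0))
    return [current + (target - current) * (i + 1) / steps for i in range(steps)]


def mix_block(ambient, playback, ext_gains, int_gains):
    """Weighted superposition of the two signals, sample by sample."""
    return [
        g_ext * a + g_int * p
        for a, p, g_ext, g_int in zip(ambient, playback, ext_gains, int_gains)
    ]
```

At 48 kHz a 15 ms ramp spreads the change over 720 samples, which is what suppresses the audible stepping known as zipper noise.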
[0035] It is worth noting that, regarding the mode switching in step S153, when the ambient sound activation ratio is lower than a preset lower threshold (e.g., 15%), it is determined that the noise shielding mode is entered. This threshold is based on the masking threshold in signal detection theory. When the gain of the external ambient sound is lower than this level, its sound pressure level is usually lower than the background noise inside the headphones or the residual noise after passive sound insulation. Continuing to play this signal will not only fail to provide effective information, but will also introduce unnecessary background noise. In this mode, the FxLMS (Filtered-x Least Mean Squares) adaptive filtering algorithm is activated to generate a cancellation wave that is out of phase with the external low-frequency noise. At this time, the noise reduction processing intensity (i.e., the amplitude gain of the reverse wave) is increased to the maximum allowable value (e.g., 25dB). The setting of this maximum value is limited by the linear excursion range of the speaker and the stability margin reserved to avoid the "fluctuating bellows effect" under non-steady-state noise.
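The mode decision in step S153 reduces to a threshold test, sketched below. The 15% threshold and the 25dB maximum come from the examples above; the minimum external gain, the normal noise reduction depth, and all names are hypothetical, and the FxLMS filter itself is not modeled.

```python
from dataclasses import dataclass

LOWER_THRESHOLD = 0.15      # example lower threshold from the text
MAX_ANC_DEPTH_DB = 25.0     # example maximum noise reduction intensity
MIN_EXTERNAL_GAIN = 0.0     # hypothetical preset minimum ambient gain
NORMAL_ANC_DEPTH_DB = 10.0  # hypothetical depth used while mixing


@dataclass(frozen=True)
class AudioMode:
    name: str
    external_gain: float
    anc_depth_db: float


def select_mode(ratio, external_gain):
    """Enter noise shielding when the activation ratio is below the threshold."""
    if ratio < LOWER_THRESHOLD:
        return AudioMode("noise_shielding", MIN_EXTERNAL_GAIN, MAX_ANC_DEPTH_DB)
    return AudioMode("mixing", external_gain, NORMAL_ANC_DEPTH_DB)
```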
[0036] In step S16, the real-time audio signal and the user's motion data are continuously collected. Based on the collection results, the ambient sound activation ratio and corresponding mixing weight are recalculated, and the deviation is calculated with the current actual mixing weight. If the deviation exceeds the preset error range, the current actual mixing weight is updated in real time, and the audio channel is adjusted or the working mode is switched accordingly to generate a dynamically adapted headphone function configuration, including: S161, re-acquire the real-time audio signal and the user motion data within a preset period, and update the environmental noise feature set and the user motion state feature set based on the acquisition results; S162, based on the updated feature set, re-execute scene recognition, security requirement assessment and ambient sound activation ratio calculation to obtain the latest target ambient sound activation ratio and corresponding hybrid weights. S163, calculate the deviation between the corresponding mixing weight and the mixing weight currently being executed by the headphones in real time to obtain the weight deviation value; S164, if the weight deviation value exceeds the preset error range, then the audio channel adjustment is re-executed or the corresponding working mode is switched based on the latest mixed weight, until the deviation enters the preset error range.
[0037] Specifically, for the data re-acquisition and update in step S161, an incremental update mechanism based on a sliding time window is adopted. The sampling update period, which is also the control period, is set to 50ms. This period is based on the fact that the human ear's auditory temporal integration constant is typically between 100ms and 200ms. Following a control-theory variant of the Nyquist sampling theorem, half of the lower bound of the integration constant (50ms) is chosen as the control period. This ensures that the system responds to environmental changes faster than the human ear's perception delay, while avoiding the redundant processor power consumption caused by oversampling. During feature updates, the statistics of the entire dataset are not recalculated. Instead, an Exponentially Weighted Moving Average (EWMA) algorithm is used to smooth the environmental noise feature set (such as the noise level) and the user motion state feature set (such as the acceleration variance). The smoothing factor is set so as to retain 80% of the historical state inertia (that is, each new sample contributes 20%), in order to filter out random Gaussian white noise generated during sensor acquisition.
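The incremental feature update can be sketched as a one-line EWMA per feature. The class name is hypothetical, and the value `alpha=0.2` is inferred from the statement that 80% of the historical state is retained.

```python
class EwmaFeature:
    """Exponentially weighted moving average of a single scalar feature."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight of the new sample; 1 - alpha is retained
        self.value = None

    def update(self, sample):
        """Blend a new sample into the running value and return it."""
        if self.value is None:
            self.value = sample  # first sample initializes the state
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value
```

For instance, with a smoothed noise level of 60 dB, a new reading of 70 dB moves the estimate to 62 dB rather than jumping straight to 70 dB, and no sample history needs to be stored.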
[0038] In one implementation, to reduce the real-time computational load on the embedded chip during the recalculation in step S162, a lightweight decision tree or a pre-computed look-up table (LUT) is preferably used instead of a complex deep neural network model for fast inference. The updated feature vector is mapped to a look-up table index to directly obtain the latest target ambient sound activation ratio. Subsequently, based on the aforementioned equal-loudness model, this activation ratio is converted into target mixing weights. In this process, hysteresis comparator logic is introduced: the target weight is updated only when the newly calculated ratio differs from the previous ratio by more than a "minimum change threshold" (e.g., 3%). This threshold is set based on Weber's law, which states that the relative change in stimulus intensity must reach a certain proportion (the just-noticeable difference, JND) for the perceptual system to recognize it, thereby avoiding frequent adjustments to the audio channel caused by small computational fluctuations.
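The hysteresis logic can be sketched as a small gate that holds the previous target until the change is large enough. The class name is hypothetical, and the 3% threshold is interpreted here as an absolute difference of 0.03 on the normalized ratio, which is an assumption.

```python
class HysteresisGate:
    """Hold the current target ratio unless the new one differs enough."""

    def __init__(self, initial, min_change=0.03):
        self.ratio = initial
        self.min_change = min_change  # minimum change threshold (assumed absolute)

    def propose(self, new_ratio):
        """Accept new_ratio only if it moves by more than min_change."""
        if abs(new_ratio - self.ratio) > self.min_change:
            self.ratio = new_ratio
        return self.ratio
```

Small fluctuations in the computed ratio therefore leave the mixing weights untouched, so the audio channel is only re-adjusted for changes a listener could plausibly notice.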
[0039] In steps S163 and S164, the deviation between the corresponding mixing weight and the mixing weight currently being executed by the headphones in real time is calculated to obtain a weight deviation value. If the weight deviation value exceeds the preset error range, the audio channel adjustment is re-executed or the corresponding working mode is switched based on the latest mixing weight until the deviation enters the preset error range.
[0040] Specifically, this step preferably employs a proportional-integral-derivative (PID) control algorithm derived from classical control theory to dynamically calculate the gain adjustment step size of the audio channel. This algorithm can generate gain adjustment commands quickly, smoothly, and without artifacts, ensuring that the headphone audio output can accurately track the target mixing weights even when there are sudden changes in environmental noise or drastic fluctuations in the user's movement, thus avoiding auditory discomfort caused by sudden volume changes.
[0041] The algorithm first performs error calculation by comparing the latest target mixing weight (i.e., the setpoint SP, for example, 0.6, representing 60% ambient sound) determined in S162 with the current actual mixing weight (referring to the physical gain configuration parameter currently in effect in the headphone digital signal processor, for example, 0.45), thereby calculating the deviation between the two. The target hybrid weight is dynamically calibrated based on the current scene identifier (e.g., "walking on the street") and the user's level of safety needs, representing the optimal acoustic balance between environmental perception and auditory comfort. This error represents the degree of deviation between the current audio channel gain state and the ideal safe auditory state.
[0042] Next, and most importantly, in the adjustment phase, the algorithm calculates the proportional (P) term: a gain adjustment amount proportional to the current weight deviation, whose magnitude is determined by a preset proportional gain. The proportional term serves to respond quickly and eliminate most of the gain discrepancy, rapidly bringing the volume close to the target value. To further enhance the smoothness of the listening experience and the control precision, the algorithm also introduces integral (I) and derivative (D) terms. The integral term eliminates minor steady-state errors caused by the inherent nonlinear response of the DSP hardware or by digital quantization errors, ensuring that the final output strictly aligns with the target weights. The derivative term predicts the trend of the deviation (e.g., detecting whether the ambient sound proportion is rising sharply or leveling off) and introduces damping in advance to prevent overshoot during gain adjustment. This prevents excessive volume adjustment that could cause momentary harshness or sudden changes in volume, resulting in a smoother listening transition.
[0043] The effects of these three terms are finally weighted and summed using the classic PID control formula to obtain the total gain adjustment $u(t)$: $u(t) = K_p \left[ e(t) + \frac{1}{T_i} \int_0^t e(\tau)\,d\tau + T_d \frac{de(t)}{dt} \right]$, where $e(t)$ is the time-varying weight error signal (i.e., $e(t) = \mathrm{SP} - \mathrm{PV}$); $K_p$ is the proportional gain, a dimensionless number used to set the overall response sensitivity of the gain adjustment; $T_i$ is the integral time, measured in units of time, which adjusts the speed at which the system eliminates steady-state deviations (a smaller value means a stronger ability to correct historical errors); and $T_d$ is the derivative time, also measured in units of time, which is used to predict the trend of error changes in order to suppress overshoot and oscillation, thereby improving the stability of audio gain transitions.
[0044] It should be noted that the PID parameters (the proportional gain, integral time, and derivative time) are determined through simulation-based optimization of an electroacoustic system model. Specifically, a transfer function model is first established to describe the gain response characteristics of the Bluetooth headset's digital signal processor (DSP) and the temporal integration characteristics of human hearing. Subsequently, in simulation software (such as MATLAB / Simulink), step response tests are conducted on multiple combinations of PID parameters to evaluate the system's performance under each combination (e.g., whether the rise time is less than the human ear's echo threshold of 50ms, whether the gain overshoot is less than 1dB to protect hearing, and whether the steady-state error converges to within 0.5%). Finally, the set of parameter values that achieves the fastest convergence, keeps the gain overshoot within the psychoacoustic comfort curve, and produces no auditory artifacts (such as pops or zipper noise) is selected as the fixed parameter set.
[0045] The total adjustment is then translated into specific digital gain step instructions. In the example above, if the actual mixing weight (PV, 0.45) is lower than the target mixing weight (SP, 0.6), indicating insufficient ambient sound, the PID controller calculates a positive adjustment, which is mapped to the step value of the DSP gain register. For example, it drives the variable gain amplifier (VGA) to linearly boost the gain of the external transparency channel by 3dB over the next 20ms, rapidly enhancing environmental awareness when danger is detected while keeping the listening experience smooth and natural.
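A discrete-time version of the PID adjustment can be sketched as follows, using the SP = 0.6 and PV = 0.45 example from the text. The class and helper names are hypothetical, the parameter values used in the usage note are placeholders rather than tuned values, and the "plant" is simplified to an actual weight that directly accumulates each adjustment.

```python
class PidController:
    """Discrete PID in the standard form Kp * (e + (1/Ti) * integral + Td * de/dt)."""

    def __init__(self, kp, ti, td, dt):
        self.kp, self.ti, self.td, self.dt = kp, ti, td, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measured):
        """Return the gain adjustment for one control period."""
        error = setpoint - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first period
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * (error + self.integral / self.ti + self.td * derivative)


def track(setpoint, start, steps, pid):
    """Apply successive adjustments to the actual weight (simplified plant)."""
    weight = start
    for _ in range(steps):
        weight = min(1.0, max(0.0, weight + pid.step(setpoint, weight)))
    return weight
```

Calling `track(0.6, 0.45, 200, PidController(kp=0.3, ti=0.5, td=0.01, dt=0.05))` starts with a positive adjustment, because the actual weight is below the target, and settles at the 0.6 target; `dt=0.05` corresponds to the 50ms control period mentioned earlier.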
[0046] In summary, this invention, based on the spatiotemporal fusion analysis of multidimensional audio features and motion states, combined with a scene-linked safety assessment model and a user historical preference weighting mechanism, ensures that the system continuously meets the performance goals of high environmental safety, high auditory comfort, and dynamic adaptability of audio control under complex conditions such as non-stationary fluctuations in environmental noise and rapid switching of user motion scenes through adaptive audio mixing weight allocation, dual-mode intelligent switching strategy, and closed-loop feedback adjustment mechanism based on deviation correction.
[0047] Referring to Figure 2, the second embodiment of the present invention provides a Bluetooth headset intelligent control system, including: an environmental perception module, which acquires real-time audio signals from the Bluetooth headset and user motion data, extracts frequency domain features from the real-time audio signals, and analyzes the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set; a scene recognition module, which performs multi-dimensional fusion mapping on the environmental noise feature set and the user motion state feature set to generate the current scene identifier; a safety assessment module, which, if the current scene identifier corresponds to a high-noise scene, matches and calculates the environmental noise feature set, the user motion state feature set, and the preset safety assessment rules, and outputs the user safety requirement level; a preference decision module, which extracts matching preference parameters from the preset user historical preference feature library according to the user safety requirement level, and fuses the preference parameters with the environmental noise feature set to calculate and generate the ambient sound activation ratio; an audio mixing module, which determines the mixing weight of the external ambient sound signal and the internal playback audio signal based on the ambient sound activation ratio, and adaptively adjusts the headphone audio channel according to the mixing weight; and a closed-loop adaptive module, which continuously collects the real-time audio signals and the user motion data, recalculates the ambient sound activation ratio and the corresponding mixing weight based on the collection results, calculates the deviation from the current actual mixing weight, and, if the deviation exceeds the preset error range, updates the current actual mixing weight in real time and adjusts the audio channel or switches the working mode accordingly to generate a dynamically adapted headphone function configuration.
[0048] It should be noted that the Bluetooth headset intelligent control system provided in this embodiment of the invention is used to execute all the process steps of the Bluetooth headset intelligent control method in the above embodiment. The working principle and beneficial effects of the two are one-to-one, so they will not be described again.
[0049] This invention also provides an electronic device. The electronic device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, such as an audio mixing program. When the processor executes the computer program, it implements the steps described in the Bluetooth headset intelligent control method embodiments above, for example, step S11 shown in Figure 1. Alternatively, when the processor executes the computer program, it implements the functions of each module / unit in the above system embodiments, such as the closed-loop adaptive module.
[0050] For example, the computer program may be divided into one or more modules / units, which are stored in the memory and executed by the processor to complete the present invention. The one or more modules / units may be a series of computer program instruction segments capable of performing a specific function, which describe the execution process of the computer program in the electronic device.
[0051] The electronic device may be a desktop computer, laptop, handheld computer, or smart tablet, etc. The electronic device may include, but is not limited to, a processor and memory. Those skilled in the art will understand that the above components are merely examples of electronic devices and do not constitute a limitation on the electronic device. It may include more or fewer components than described above, or combine certain components, or different components. For example, the electronic device may also include input / output devices, network access devices, buses, etc.
[0052] The processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor. The processor is the control center of the electronic device, connecting all parts of the electronic device via various interfaces and lines.
[0053] The memory can be used to store the computer programs and / or modules. The processor implements various functions of the electronic device by running or executing the computer programs and / or modules stored in the memory and by calling data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store the operating system and at least one application program required for a function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the electronic device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
[0054] Wherein, if the modules / units integrated in the electronic device are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying the computer program code, recording media, USB flash drives, portable hard drives, magnetic disks, optical disks, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
[0055] It should be noted that the system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, in the accompanying drawings of the system embodiments provided by this invention, the connection relationships between modules indicate that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines. Those skilled in the art can understand and implement this without any creative effort.
[0056] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit its scope of protection. In particular, it should be noted that, for those skilled in the art, any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for intelligent control of Bluetooth headsets, characterized in that it comprises: acquiring real-time audio signals from a Bluetooth headset and user motion data, extracting frequency domain features from the real-time audio signals, and analyzing the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set; performing multi-dimensional fusion mapping on the environmental noise feature set and the user motion state feature set to generate a current scene identifier; if the current scene identifier corresponds to a high-noise scene, matching and calculating the environmental noise feature set, the user motion state feature set, and preset safety assessment rules to output a user safety requirement level; extracting matching preference parameters from a preset user historical preference feature library based on the user safety requirement level, and fusing the preference parameters with the environmental noise feature set to calculate and generate an ambient sound activation ratio; determining the mixing weight of the external ambient sound signal and the internal playback audio signal according to the ambient sound activation ratio, and adaptively adjusting the headphone audio channel according to the mixing weight; and continuously collecting the real-time audio signals and the user motion data, recalculating the ambient sound activation ratio and the corresponding target mixing weight based on the collection results, calculating the deviation from the current actual mixing weight, and, if the deviation exceeds a preset error range, updating the current actual mixing weight in real time and adjusting the audio channel or switching the working mode accordingly to generate a dynamically adapted headphone function configuration.
2. The Bluetooth headset intelligent control method as described in claim 1, characterized in that the step of acquiring real-time audio signals from a Bluetooth headset and user motion data, extracting frequency domain features from the real-time audio signals, and analyzing the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set comprises: acquiring the real-time audio signals, performing a Fourier transform on the real-time audio signals, extracting frequency spectrum features and noise levels, and matching and classifying them against a pre-stored noise template library to obtain the environmental noise type and noise level; acquiring the user motion data, performing statistical analysis on the amplitude and frequency of the acceleration changes in the user motion data, and comparing the results with preset intensity thresholds at multiple levels to obtain a motion intensity level and a motion mode label; and fusing the environmental noise type, the noise level, and the frequency spectrum features into the environmental noise feature set, and fusing the motion intensity level and the motion mode label into the user motion state feature set.
3. The Bluetooth headset intelligent control method as described in claim 1, characterized in that the step of performing multi-dimensional fusion mapping on the environmental noise feature set and the user motion state feature set to generate a current scene identifier comprises: matching the environmental noise feature set against a preset scene feature library at multiple levels to obtain a preliminary scene probability distribution; and performing weighted fusion of the preliminary scene probability distribution with the user motion state feature set to generate the scene identifier.
4. The Bluetooth headset intelligent control method as described in claim 2, characterized in that the step of, if the current scene identifier corresponds to a high-noise scene, matching and calculating the environmental noise feature set, the user motion state feature set, and preset safety assessment rules to output a user safety requirement level comprises: comparing the motion intensity level with the preset intensity thresholds level by level to obtain a preliminary risk score; performing a weighted summation of the preliminary risk score and the noise level to obtain a comprehensive risk value; and matching the comprehensive risk value against the preset safety assessment rules to output the user safety requirement level.
5. The Bluetooth headset intelligent control method as described in claim 1, characterized in that the step of extracting matching preference parameters from a preset user historical preference feature library based on the user safety requirement level, and fusing the preference parameters with the environmental noise feature set to calculate and generate an ambient sound activation ratio comprises: extracting multiple historical ambient sound ratio records that match the scene identifier and the user safety requirement level from the preset user historical preference feature library; and assigning different weights to the ambient sound ratio records and the environmental noise feature set, and then performing a fusion calculation to generate the ambient sound activation ratio.
6. The Bluetooth headset intelligent control method as described in claim 1, characterized in that the step of determining the mixing weight of the external ambient sound signal and the internal playback audio signal according to the ambient sound activation ratio, and adaptively adjusting the headphone audio channel according to the mixing weight comprises: calculating the gain coefficient of the external ambient sound signal and the attenuation coefficient of the internal playback audio signal based on the ambient sound activation ratio; performing real-time weighted superposition of the two audio signals according to the gain coefficient and the attenuation coefficient and outputting the result; and, when the ambient sound activation ratio is lower than a preset lower threshold, setting the external ambient sound gain coefficient to a preset minimum value, increasing the noise reduction processing intensity, and entering the noise shielding mode.
7. The Bluetooth headset intelligent control method as described in claim 1, characterized in that the step of continuously collecting the real-time audio signals and the user motion data, recalculating the ambient sound activation ratio and the corresponding target mixing weight based on the collection results, calculating the deviation from the current actual mixing weight, and, if the deviation exceeds a preset error range, updating the current actual mixing weight in real time and adjusting the audio channel or switching the working mode accordingly to generate a dynamically adapted headphone function configuration comprises: re-acquiring the real-time audio signals and the user motion data within a preset period, and updating the environmental noise feature set and the user motion state feature set based on the acquisition results; re-executing scene recognition, safety requirement assessment, and ambient sound activation ratio calculation based on the updated feature sets to obtain the latest target ambient sound activation ratio and the corresponding mixing weight; calculating in real time the deviation between the corresponding mixing weight and the mixing weight currently being executed by the headphones to obtain a weight deviation value; and, if the weight deviation value exceeds the preset error range, re-executing the audio channel adjustment or switching the corresponding working mode based on the latest mixing weight until the deviation falls within the preset error range.
8. A smart control system for Bluetooth headsets, characterized in that it includes: an environmental perception module, which acquires real-time audio signals from the Bluetooth headset and user motion data, extracts frequency domain features from the real-time audio signals, and analyzes the change patterns of the user motion data to obtain an environmental noise feature set and a user motion state feature set; a scene recognition module, which performs multi-dimensional fusion mapping between the environmental noise feature set and the user motion state feature set to generate a current scene identifier; a safety assessment module, which, if the current scene identifier corresponds to a high-noise scene, matches the environmental noise feature set and the user motion state feature set against preset safety assessment rules and outputs a user safety requirement level; a preference decision module, which extracts matching preference parameters from a preset user historical preference feature library according to the user safety requirement level, and fuses the preference parameters with the environmental noise feature set to calculate and generate an ambient sound activation ratio; an audio mixing module, which determines the mixing weight of the external ambient sound signal and the internal playback audio signal based on the ambient sound activation ratio, and adaptively adjusts the headphone audio channel according to the mixing weight; and a closed-loop adaptive module, which continuously collects the real-time audio signal and the user motion data, recalculates the ambient sound activation ratio and the corresponding mixing weight based on the collection results, and calculates the deviation from the current actual mixing weight; if the deviation exceeds the preset error range, the closed-loop adaptive module updates the current actual mixing weight in real time and adjusts the audio channel or switches the working mode accordingly to generate a dynamically adapted headphone function configuration.
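The data flow between the modules of claim 8 can be illustrated with a minimal Python sketch. The claims do not disclose concrete features, rules, or preference values, so everything concrete below is hypothetical: the two-field `Features` summary, the 70 dB and 90 dB thresholds, the scene names, the `PREFERENCES` table, and the fusion rule. The sketch shows only the order in which the modules hand results to one another.

```python
from dataclasses import dataclass


@dataclass
class Features:
    noise_level_db: float  # hypothetical summary of the environmental noise feature set
    is_moving: bool        # hypothetical summary of the user motion state feature set


def recognize_scene(f):
    """Scene recognition module: fuse noise and motion into a scene identifier."""
    if f.noise_level_db >= 70.0:  # assumed high-noise threshold
        return "high_noise_moving" if f.is_moving else "high_noise_stationary"
    return "quiet"


def assess_safety(scene, f):
    """Safety assessment module: only high-noise scenes are assessed."""
    if not scene.startswith("high_noise"):
        return "low"
    # Assumed rule: moving in a noisy place implies a high safety requirement.
    return "high" if f.is_moving else "low"


# Hypothetical preference library: safety level -> preferred ambient ratio.
PREFERENCES = {"high": 0.75, "low": 0.25}


def activation_ratio(level, f):
    """Preference decision module: fuse the preference with the noise features."""
    preferred = PREFERENCES[level]
    # Assumed fusion rule: in very loud surroundings, pass less ambient
    # sound unless the safety requirement is high.
    if level != "high" and f.noise_level_db >= 90.0:
        preferred *= 0.5
    return preferred


def configure(f):
    """Run the modules in order and return the resulting configuration."""
    scene = recognize_scene(f)
    level = assess_safety(scene, f)
    ratio = activation_ratio(level, f)
    return {
        "scene": scene,
        "safety_level": level,
        "ambient_weight": ratio,         # audio mixing module, assumed linear mapping
        "playback_weight": 1.0 - ratio,
    }
```

Under these assumptions, `configure(Features(80.0, True))` gives an ambient weight of 0.75 (noisy and moving, safety prioritized), while `configure(Features(95.0, False))` gives 0.125 (very loud but stationary, comfort prioritized). The closed-loop adaptive module would call `configure` again each period and compare the result with the weight currently in use.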