An intelligent predictive maintenance system for audio equipment failure

By integrating multi-source data and using artificial intelligence analysis, combined with a hybrid model of convolutional neural networks and long short-term memory networks, intelligent predictive maintenance of audio equipment has been achieved. This addresses the shortcomings of traditional maintenance methods and improves the accuracy of fault prediction and the stability of equipment operation.

CN120996781B · Active · Publication Date: 2026-06-16 · SHENZHEN JIEYU INFORMATION TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN JIEYU INFORMATION TECH CO LTD
Filing Date: 2025-08-07
Publication Date: 2026-06-16

Patent Timeline

07 Aug 2025: Application
16 Jun 2026: Publication

CN120996781B

IPC: G06Q10/20; G08B29/18; G08B31/00; G06F18/20; G06F18/15; G06F18/25; G06N3/045; G06N3/0442; G06N3/0464; G06F123/02

CPC: G08B29/188; G08B29/186; G08B31/00; G06Q10/20; G06F18/20; G06F18/15; G06F18/253; G06N3/045

AI Tagging

Application Domain

Biological models; Alarms


AI Technical Summary

⚠Technical Problem

Existing audio equipment maintenance methods make it difficult to adjust maintenance cycles according to the actual health status of the equipment, resulting in over-maintenance or under-maintenance. Furthermore, repairs after a fault cause excessive downtime for the equipment. Traditional condition monitoring and early warning systems have low accuracy and cannot achieve intelligent predictive maintenance.

⚗Method used

Employing multi-source data fusion and artificial intelligence analysis methods, the system acquires multi-source operational data from audio devices through a data acquisition module, preprocesses the data using a multimodal feature fusion algorithm, and combines a hybrid analysis model of convolutional neural networks and long short-term memory networks with an attention mechanism to assess health status, predict fault risks in real time, and generate maintenance decisions and early warnings.

🎯Benefits of technology

It enables accurate assessment and real-time prediction of the operating status of audio equipment, reducing the risk of failure, minimizing equipment downtime and maintenance costs, and improving operational reliability and service life.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses an intelligent predictive maintenance system for audio equipment failure, comprising: a data acquisition module for acquiring internal sensor data, output audio signal feature parameters, external environment parameters and historical operation logs during audio equipment operation; a feature preprocessing module for standardizing and denoising multi-source data based on a multi-modal feature fusion algorithm; a health state evaluation module for outputting real-time health state evaluation results of the audio equipment through a hybrid analysis model combining a convolutional neural network, a long short-term memory network and an attention mechanism; a failure risk prediction module for generating predictive maintenance decision parameters by performing real-time failure risk prediction according to the evaluation results; and a maintenance decision and early warning module for outputting failure early warning information according to the prediction parameters, and automatically generating maintenance operation suggestions when the early warning level reaches a preset condition. The application can effectively improve the operation reliability of the audio equipment and realize intelligent prediction and advance maintenance of the failure.


Description

Technical Field

[0001] This invention relates to the field of audio equipment maintenance technology, and in particular to an intelligent predictive maintenance system for audio equipment malfunctions.

Background Technology

[0002] With the rapid development of information technology and the consumer electronics industry, audio equipment has permeated various fields such as daily life, industrial production, and public services. From home audio systems and headphones to professional stage audio equipment and broadcast television transmission equipment, its stable operation directly affects user experience and the continuity of production activities. Audio equipment is usually composed of complex electronic components, acoustic components, and software systems. During long-term operation, it is susceptible to failure due to environmental factors, frequency of use, and component aging. Therefore, effective maintenance is crucial to ensuring equipment reliability.

[0003] Currently, audio equipment maintenance primarily involves two methods: regular preventative maintenance and post-failure repair. Regular preventative maintenance involves inspecting, cleaning, and replacing parts according to fixed schedules to reduce the probability of malfunctions. Post-failure repair, on the other hand, involves repairing or replacing parts after a significant malfunction has occurred. In recent years, some equipment has begun to incorporate basic condition monitoring functions, using a single sensor to collect parameters such as temperature and current. When these parameters exceed preset thresholds, an alarm is triggered, providing some reference for maintenance.

[0004] However, existing maintenance methods have significant limitations. Regular preventative maintenance struggles to adjust maintenance cycles based on the actual health status of the equipment, potentially leading to over-maintenance or under-maintenance, increasing unnecessary costs, or failing to detect potential faults in a timely manner. Post-fault repair is a reactive approach, often resulting in excessive downtime and disrupting normal operation. Furthermore, basic condition monitoring relies on only a single type of data, making it difficult to comprehensively reflect the equipment's operating status. Alarm thresholds are often fixed, failing to adapt to changes in equipment characteristics at different stages of use, resulting in low accuracy of warnings and hindering truly intelligent predictive maintenance.

Summary of the Invention

[0005] In view of this, embodiments of the present invention provide an intelligent predictive maintenance system for audio equipment failures to solve the above-mentioned technical problems.

[0006] To achieve the above objectives, in a first aspect, an intelligent predictive maintenance system for audio equipment malfunctions is provided, the system comprising:

[0007] The data acquisition module is used to collect multi-source operating data of the audio equipment during operation, including internal sensor data of the audio equipment, characteristic parameters of the output audio signal, external environmental parameters, and historical operating logs.

[0008] The feature preprocessing module is used to preprocess the multi-source running data based on a multimodal feature fusion algorithm to obtain running feature data after feature standardization and noise reduction.

[0009] The health status assessment module includes a pre-trained audio device health status assessment model, which is used to input the operating feature data into the audio device health status assessment model and output the health status assessment result of the audio device; the audio device health status assessment model is a hybrid analysis model that combines convolutional neural networks and long short-term memory networks with an attention mechanism;

[0010] The fault risk prediction module is used to predict the fault risk of the audio device in real time based on the health status assessment results and generate predictive maintenance decision parameters.

[0011] The maintenance decision and early warning module is used to output audio equipment fault early warning information according to the predictive maintenance decision parameters, and generate corresponding maintenance operation suggestions according to the predictive maintenance decision parameters and the early warning level when the early warning level represented by the audio equipment fault early warning information meets preset conditions.

[0012] In a second aspect, an intelligent predictive maintenance method for audio equipment failure is provided, applicable to audio equipment, the method comprising the following steps:

[0013] Collect multi-source operating data of the audio device during operation, including internal sensor data of the audio device, characteristic parameters of the output audio signal, external environmental parameters, and historical operating logs;

[0014] The multi-source operational data is preprocessed based on a multimodal feature fusion algorithm to obtain operational feature data after feature standardization and noise reduction.

[0015] The operational feature data is input into a pre-trained audio device health status assessment model, which outputs the health status assessment result of the audio device. The audio device health status assessment model is a hybrid analysis model that combines convolutional neural networks and long short-term memory networks with an attention mechanism.

[0016] Based on the health status assessment results, real-time fault risk prediction is performed on the audio equipment to generate predictive maintenance decision parameters;

[0017] Based on the predictive maintenance decision parameters, output audio equipment fault warning information, and when the warning level represented by the audio equipment fault warning information meets preset conditions, generate corresponding maintenance operation suggestions based on the predictive maintenance decision parameters and the warning level.

[0018] In a third aspect, an electronic device is provided, comprising:

[0019] One or more processors;

[0020] A storage device for storing one or more programs;

[0021] When the one or more programs are executed by the one or more processors, the one or more processors implement the intelligent predictive maintenance method for audio equipment failures as described in the second aspect.

[0022] In a fourth aspect, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the intelligent predictive maintenance method for audio equipment failures as described in the second aspect.

[0023] In a fifth aspect, a computer program product is provided, comprising a computer-readable storage medium storing a computer program, which, when executed, performs the intelligent predictive maintenance method for audio equipment failures as described in the second aspect.

[0024] The above technical solution has the following beneficial technical effects:

[0025] The above technical solution acquires multi-source operational data through a data acquisition module, enabling comprehensive capture of the operating status and environmental influencing factors of audio equipment, overcoming the limitations of single-data monitoring. The feature preprocessing module performs standardization and noise reduction based on a multimodal feature fusion algorithm, effectively improving data quality and providing a reliable foundation for subsequent analysis. The health status assessment module adopts a hybrid model combining convolutional neural networks, long short-term memory networks, and attention mechanisms, which can accurately assess the health status of audio equipment, avoiding the limitations of traditional fixed threshold judgments. The fault risk prediction module works in conjunction with the maintenance decision and early warning module to predict fault risks in real time and generate targeted maintenance suggestions, realizing the transformation from passive repair and periodic maintenance to proactive predictive maintenance. This helps reduce audio equipment downtime, lower maintenance costs, and improve the reliability and service life of audio equipment.

Attached Figure Description

[0026] The accompanying drawings are provided to better understand the invention and are not intended to unduly limit the scope of the invention. Wherein:

[0027] Figure 1 is a functional block diagram of an intelligent predictive maintenance system for audio equipment failure according to an embodiment of the present invention;

[0028] Figure 2 is a schematic diagram of the logical structure of the data acquisition module according to an embodiment of the present invention;

[0029] Figure 3 is a schematic diagram of the processing flow of the audio feature extraction unit in an embodiment of the present invention;

[0030] Figure 4 is a schematic diagram of the logical structure of the feature preprocessing module in an embodiment of the present invention;

[0031] Figure 5 is a schematic diagram of the logical structure of the wavelet denoising unit in an embodiment of the present invention;

[0032] Figure 6 is a schematic diagram of the logical structure of the health status assessment module according to an embodiment of the present invention;

[0033] Figure 7 is a schematic diagram of the logical structure of the fault risk prediction module according to an embodiment of the present invention;

[0034] Figure 8 is a schematic diagram of the logical structure of the maintenance decision and early warning module according to an embodiment of the present invention;

[0035] Figure 9 is a flowchart of an intelligent predictive maintenance method for audio equipment failure according to an embodiment of the present invention;

[0036] Figure 10 is a schematic diagram of the structure of a computer system according to an embodiment of the present invention.

Detailed Implementation

[0037] The following description, in conjunction with the accompanying drawings, illustrates exemplary embodiments of the present invention, including various details to aid understanding. These details should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.

[0038] As shown in Figure 1, this embodiment provides an intelligent predictive maintenance system for audio equipment failures, the system comprising:

[0039] The data acquisition module is used to collect multi-source operating data of the audio equipment during operation, including internal sensor data of the audio equipment, characteristic parameters of the output audio signal, external environmental parameters, and historical operating logs.

[0040] The feature preprocessing module is used to preprocess the multi-source running data based on a multimodal feature fusion algorithm to obtain running feature data after feature standardization and noise reduction.

[0041] The health status assessment module includes a pre-trained audio device health status assessment model, which is used to input the operating feature data into the audio device health status assessment model and output the health status assessment result of the audio device; the audio device health status assessment model is a hybrid analysis model that combines convolutional neural networks and long short-term memory networks with an attention mechanism;

[0042] The fault risk prediction module is used to predict the fault risk of the audio device in real time based on the health status assessment results and generate predictive maintenance decision parameters.

[0043] The maintenance decision and early warning module is used to output audio equipment fault early warning information according to the predictive maintenance decision parameters, and generate corresponding maintenance operation suggestions according to the predictive maintenance decision parameters and the early warning level when the early warning level represented by the audio equipment fault early warning information meets preset conditions.

[0044] Specifically, the data acquisition module in this embodiment includes multiple sensors and data interfaces, including sensors internal to the audio equipment, such as temperature sensors, vibration sensors, voltage and current sensors, and microphone arrays, for real-time acquisition of audio equipment operating status data. In addition, the data acquisition module also includes an audio analysis unit for extracting audio signal feature parameters (such as spectral characteristics, harmonic distortion, and signal-to-noise ratio), environmental parameter acquisition devices (such as temperature and humidity sensors and air quality sensors), and a log database interface for collecting historical operating logs. All acquired data is uniformly stored in a data buffer and provided to the feature preprocessing module in real time.

[0045] Specifically, the feature preprocessing module in this embodiment employs a multimodal feature fusion algorithm, including data cleaning, standardization, and noise reduction. First, data cleaning is completed through missing value imputation and outlier identification and removal. Then, z-score standardization is used to standardize the features of the multi-source data. Finally, wavelet transform and filtering algorithms are used for noise reduction, outputting standardized, denoised operating feature data.

[0046] This operating feature data is used by the subsequent health status assessment module.
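The cleaning, standardization, and denoising steps described above can be illustrated with a minimal sketch. It is not the patent's algorithm. Several choices here are assumptions made only for illustration: the function names `clean_and_standardize` and `haar_denoise`, median imputation, a median-absolute-deviation outlier rule that replaces outliers rather than deleting them (so the series keeps its length), and a single-level Haar wavelet with soft thresholding standing in for the unspecified wavelet transform.

```python
import numpy as np

def clean_and_standardize(x, z_clip=3.0):
    """Impute NaNs, replace robust outliers, then z-score one feature column."""
    x = np.asarray(x, dtype=float).copy()
    # Missing-value imputation (assumption: median of the observed values).
    median = np.nanmedian(x)
    x[np.isnan(x)] = median
    # Outlier identification with a robust z-score based on the MAD.
    mad = np.median(np.abs(x - median))
    scale = 1.4826 * mad
    if scale > 0:
        outliers = np.abs(x - median) > z_clip * scale
        x[outliers] = median  # assumption: replace instead of delete
    # z-score standardization; a constant column maps to all zeros.
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def haar_denoise(x, threshold):
    """Single-level Haar wavelet denoising with soft thresholding."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("expects an even number of samples")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    # Shrink small detail coefficients toward zero (treated as noise).
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out
```

With a threshold of zero, `haar_denoise` reconstructs its input exactly, which is a quick way to check the transform pair before tuning the threshold.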

[0047] Specifically, the health status assessment module in this embodiment incorporates a pre-trained hybrid analysis model. This model combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory Network (LSTM), supplemented by an attention mechanism, to capture the temporal and spatial correlations in the audio device's operational data. In practice, preprocessed feature data is used as model input. Spatial features are extracted through a CNN layer, temporal features are modeled through an LSTM layer, and the features are weighted and fused using an attention mechanism to accurately output the current health status assessment result of the audio device.
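The final fusion step above, where an attention mechanism weights per-time-step features, can be sketched as follows. The source does not specify the form of the attention, so additive (tanh) attention is an assumption, as are the name `attention_pool` and the parameters `w` and `v`. The `hidden` array stands in for the outputs of the CNN and LSTM layers, which are not implemented here.

```python
import numpy as np

def attention_pool(hidden, w, v):
    """Weight and fuse a sequence of feature vectors with additive attention.

    hidden: (T, d) per-time-step features (e.g. LSTM outputs)
    w:      (d, k) projection matrix (learned in a real model)
    v:      (k,)   scoring vector (learned in a real model)
    """
    scores = np.tanh(hidden @ w) @ v           # one score per time step, shape (T,)
    scores = scores - scores.max()             # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden                 # weighted sum of time steps, shape (d,)
    return context, weights
```

The weights sum to one, so the fused vector is a convex combination of the time steps. In a trained model, a classifier or regression head on `context` would produce the health status assessment result.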

[0048] Specifically, the fault risk prediction module in this embodiment uses a risk prediction algorithm to predict the potential fault risks of the audio equipment in real time based on the above health status assessment results. Specifically, it employs classification or regression models (such as random forests, support vector machines, etc.) to generate predictive maintenance decision parameters, including fault probability, fault severity, and remaining life prediction, for subsequent processing by the maintenance decision and early warning module.

[0049] Specifically, the maintenance decision and early warning module in this embodiment outputs fault early warning information in real time based on predictive maintenance decision parameters, including fault type, fault location, and severity. When the fault early warning level reaches a preset condition (e.g., the severity exceeds a threshold), the system automatically generates corresponding maintenance operation suggestions, such as immediately stopping operation, repairing or replacing components, and optimizing operating parameters, and pushes these suggestions to the relevant maintenance personnel's terminals to achieve timely and proactive maintenance response.
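The mapping from predictive maintenance decision parameters to a warning level and a maintenance suggestion can be sketched as a simple rule table. Everything numeric here is hypothetical: the source gives no thresholds, no formula for combining probability and severity, and no number of warning levels. The names `DecisionParams`, `warning_level`, and `maintenance_suggestion` are likewise illustrative.

```python
from dataclasses import dataclass

@dataclass
class DecisionParams:
    fault_probability: float      # 0..1
    severity: float               # 0..1, hypothetical normalized scale
    remaining_life_hours: float   # predicted remaining useful life

# Hypothetical suggestions, echoing the examples given in the text.
SUGGESTIONS = {
    3: "stop operation immediately",
    2: "repair or replace components",
    1: "optimize operating parameters",
}

def warning_level(p):
    """Map decision parameters to a level 0..3 (thresholds are assumptions)."""
    risk = p.fault_probability * p.severity
    if risk >= 0.6 or p.remaining_life_hours < 24:
        return 3
    if risk >= 0.3 or p.remaining_life_hours < 168:
        return 2
    if risk >= 0.1:
        return 1
    return 0

def maintenance_suggestion(p, preset_level=2):
    """Return (level, suggestion); suggestion is None below the preset level."""
    level = warning_level(p)
    if level >= preset_level:
        return level, SUGGESTIONS[level]
    return level, None
```

Keeping the preset level as a parameter mirrors the text's "preset condition": the warning is always computed, but a suggestion is only generated once that condition is met.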

[0050] The intelligent predictive maintenance system for audio equipment faults proposed in this embodiment achieves accurate assessment and real-time prediction of the operating status of audio equipment through multi-source data fusion and artificial intelligence analysis methods, effectively improving the accuracy and timeliness of fault prediction. Specifically, through a multimodal feature fusion algorithm, it comprehensively utilizes internal sensor data, audio signal feature parameters, environmental parameters, and historical operating logs of the audio equipment to achieve efficient data standardization and noise reduction, thereby improving data quality. Furthermore, the hybrid analysis model integrating convolutional neural networks and long short-term memory networks with an attention mechanism can more comprehensively and deeply mine the spatiotemporal features in the data, improving the accuracy of health status assessment. Based on this, the system performs real-time fault risk prediction, proactively generating clear maintenance warning information and specific maintenance operation suggestions, enabling early detection and proactive maintenance of potential audio equipment faults. This effectively reduces the risk of fault occurrence, decreases audio equipment downtime and maintenance costs, and improves the operational stability and reliability of the audio equipment.

[0051] As shown in Figure 2, in some embodiments, the data acquisition module may include:

[0052] The internal sensor unit includes any number of temperature sensors, current sensors, vibration sensors, air pressure sensors and Hall effect sensors disposed inside the housing of the audio device, for acquiring hardware operating status data of the audio device during operation;

[0053] An audio feature extraction unit is connected to a microphone array disposed inside the housing of the audio device. It is used to acquire the output audio signal through the microphone array and perform time-domain analysis and frequency-domain analysis based on the output audio signal to extract the characteristic parameters of the output audio signal.

[0054] An external environment sensing unit is connected to an external environment sensor mounted on the housing of the audio device, and is used to collect external environment parameters of the operating environment of the audio device;

[0055] The log reading unit is used to read historical operation log data from the storage unit of the audio device or a remote server.

[0056] Specifically, the internal sensor unit includes: a temperature sensor mounted close to the power amplifier module housing of the audio equipment, which monitors the module's operating temperature in real time, with a measurement range of -40℃ to 125℃ and an accuracy of ±0.5℃; a current sensor connected in series in the power input circuit to collect the equipment's operating current, with a measurement range of 0-10A and an accuracy of ±1%; a vibration sensor bolted to the metal bracket of the speaker unit, employing piezoelectric principles to capture vibration signals in the 5Hz-10kHz frequency band; a pressure sensor installed near the internal ventilation openings to monitor internal air pressure changes, with a measurement range of 80kPa-110kPa; and a Hall effect sensor positioned near the mechanical moving parts of the audio equipment to monitor their operating conditions, including displacement, rotation, and opening/closing. These sensors are connected to the main controller via an I2C bus, with a sampling frequency of 1kHz, transmitting hardware operating status data in real time to the data acquisition module's buffer. In high-end audio systems, Hall effect sensors can be used to detect changes in displacement and angle of automatic telescopic structures, opening and closing of front panels, or rotation of adjustment components in speakers; in smart speakers or multimedia control terminals, Hall effect sensors can be used to monitor the rotation amplitude of motorized volume knobs, screen flipping, or the extension and retraction status of microphone arrays; in professional recording or monitoring equipment, Hall effect sensors can be used to detect the position of motorized tone faders, the opening and closing status of door control modules, or the rotation status of cooling fans; in installation-type broadcast audio systems, Hall effect sensors can also be used to detect changes in height and locking status of motorized hoists or lifting devices.

[0057] Specifically, the microphone array connected to the audio feature extraction unit consists of four omnidirectional microphones arranged in a square, installed 10cm in front of the speaker output port inside the audio equipment housing. The microphones have a frequency response range of 20Hz-20kHz and a sensitivity of -42dB±3dB. The microphone array is connected to the processing unit via an audio acquisition card, with a sampling frequency of 44.1kHz and a quantization bit depth of 16 bits. When extracting the characteristic parameters of the output audio signal, the time-domain analysis uses the empirical mode decomposition method to decompose the audio signal into eight intrinsic mode components, extracting features such as energy and peak factor for each component. The frequency-domain analysis uses short-time Fourier transform to obtain the spectrum, extracting the 20Hz-20kHz frequency band, calculating the energy distribution and spectral centroid of each 1/3-octave band, and finally forming a 32-dimensional audio signal feature parameter vector.

[0058] Specifically, the external environment sensing unit is connected to external environment sensors installed on the top of the audio equipment housing. The ambient temperature sensor is a digital temperature and humidity sensor with a measurement range of -20℃ to 60℃, and the humidity measurement range is 0-100%RH, with accuracies of ±0.3℃ and ±2%RH respectively. The light sensor is installed on the side of the housing and is a photoresistor type, with a measurement range of 0-100,000 lux. The dust sensor...

[0059] Through inhalation sampling, it can detect the concentration of particulate matter larger than 0.3 μm, with a measurement range of 0-1000 μg/m³. These sensors communicate with the data acquisition module via an RS485 bus, with a sampling interval set to 10 seconds, and upload external environmental parameters in real time.

[0060] Specifically, the log reading unit connects to the audio device's flash memory storage unit via the device's internal SPI interface. It can read operational log data such as the device's power-on time, power-off time, fault code records, and parameter adjustment history, storing the data in CSV file format. A new log file is automatically generated at midnight every day. When the device is connected to the internet, the log reading unit can also establish a connection with a remote server via the TCP/IP protocol and download backup data of the past three months of historical operational logs using the FTP protocol, ensuring the integrity and traceability of the log data.
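As an illustration of how such CSV logs might be consumed downstream, the sketch below totals powered-on time and counts fault codes. The source states only that logs are stored as CSV and which kinds of records they hold. The column names (`timestamp`, `event`, `detail`), the event labels, and the function name `summarize_log` are all hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime

def summarize_log(csv_text):
    """Return (uptime_seconds, fault_code_counts) for one day's log.

    Assumed (hypothetical) columns: timestamp (ISO 8601), event, detail.
    """
    uptime = 0.0
    faults = Counter()
    power_on = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        if row["event"] == "power_on":
            power_on = ts
        elif row["event"] == "power_off" and power_on is not None:
            uptime += (ts - power_on).total_seconds()
            power_on = None
        elif row["event"] == "fault":
            faults[row["detail"]] += 1
    return uptime, faults

sample = """timestamp,event,detail
2025-08-07T08:00:00,power_on,
2025-08-07T09:30:00,fault,E12
2025-08-07T10:00:00,power_off,
"""
uptime_seconds, fault_counts = summarize_log(sample)
```

Aggregates like these (running hours, fault frequency per code) are one plausible way historical logs could feed long-term trend features.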

[0061] The data acquisition module in this embodiment achieves comprehensive monitoring and refined perception of the audio equipment's operating status through the coordinated operation of an internal sensor unit, an audio feature extraction unit, an external environment perception unit, and a log reading unit. Specifically, the internal sensor unit captures real-time hardware operating status data such as temperature, current, vibration, air pressure, and magnetic field changes, facilitating timely detection of internal anomalies in the audio equipment; the audio feature extraction unit performs precise time-domain and frequency-domain analysis of audio signals using a microphone array, promptly identifying sound quality degradation or equipment malfunctions; the external environment perception unit collects real-time operating environment information, effectively identifying the impact of external environmental factors on equipment performance; and the log reading unit fully mines historical operating information, enhancing the system's ability to judge long-term trend changes and potential risks.

[0062] In some preferred embodiments, the temperature sensor is located inside the audio device housing near the power amplifier circuit and battery module to collect the real-time operating temperature of the power amplifier circuit and battery module; the current sensor is located on the power supply line of the audio device to detect the magnitude and fluctuation of the power supply current; the vibration sensor is fixed to the acoustic unit mounting position inside the audio device housing to detect the vibration characteristics of the speaker, microphone assembly, and mechanical connecting parts; the air pressure sensor is located in the acoustic cavity area inside the audio device housing to collect the internal air pressure changes of the acoustic cavity; the Hall effect sensor is located near the mechanical moving parts of the audio device to monitor the operating status of the mechanical moving parts, including displacement, rotation, or opening and closing; the hardware operating status data includes: power amplifier circuit temperature, battery module temperature, power supply current, vibration characteristics of the acoustic unit, internal air pressure changes of the acoustic cavity, and operating status of the mechanical moving parts.

[0063] As shown in Figure 3, in some preferred embodiments, the audio feature extraction unit may specifically include:

[0064] A multi-scale time-frequency analysis subunit is used to perform joint analysis on the output audio signal acquired by the microphone array using a multi-scale time-frequency fusion algorithm to obtain a multi-scale time-frequency feature representation of the output audio signal. The multi-scale time-frequency feature representation includes: intrinsic mode components obtained by performing joint processing of empirical mode decomposition and variational mode decomposition on the output audio signal in the time domain; and spectral energy distribution obtained by a multi-resolution analysis method based on a combination of improved short-time Fourier transform and continuous wavelet transform in the frequency domain.

[0065] The audio feature extraction subunit is used to extract the feature parameters of the output audio signal based on the intrinsic mode components and the spectral energy distribution in the multi-scale time-frequency feature representation.

[0066] Specifically, the implementation process of the multi-scale time-frequency analysis subunit is as follows: The microphone array uses 4-channel MEMS microphones, which are evenly distributed within a 10cm range around the speaker module of the audio equipment. The sampling frequency is set to 44.1kHz. The audio signal is synchronously acquired and beamformed to suppress environmental noise interference. The execution steps of the multi-scale time-frequency fusion algorithm are as follows: First, the single-channel audio signal after beamforming is processed in the time domain. Empirical Mode Decomposition (EMD) is used to decompose the signal into 8-12 intrinsic mode components (IMFs), and the termination condition is set to a standard deviation of less than 0.2. Then, variational mode decomposition (VMD) is performed on each IMF component. The preset number of modes is 3-5, and the penalty factor is 2000. More stable secondary intrinsic mode components are obtained through iterative optimization, and finally merged into 12-15 time-domain feature components. In the frequency domain processing, an improved short-time Fourier transform is first used, with the Hanning window length set to 1024 points and the overlap rate to 50%. The window length is dynamically adjusted (512-2048 points) to adapt to different frequency components. Then, a continuous wavelet transform is performed on the transform result, using the Morlet wavelet basis with a scale factor ranging from 1 to 32, to calculate the wavelet coefficients at 32 scales. Finally, multi-resolution analysis is used to divide the frequency domain into 10 frequency bands from 20Hz to 20kHz, and the energy values and energy proportions of each frequency band are statistically analyzed to form a spectral energy distribution matrix.
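The frequency-domain part of this process can be sketched with a plain short-time Fourier transform. This is a simplified sketch under stated assumptions: it uses a fixed 1024-point Hanning window with 50% overlap and does not implement the dynamic window adjustment, the continuous wavelet transform, or the EMD/VMD steps. The source does not say how the ten bands between 20Hz and 20kHz are spaced, so logarithmic spacing is an assumption, as are the function names.

```python
import numpy as np

def stft_power(x, win_len=1024, overlap=0.5):
    """Power spectrogram with a Hanning window; returns (n_frames, win_len//2 + 1)."""
    x = np.asarray(x, dtype=float)
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    hop = int(win_len * (1.0 - overlap))
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def band_energy_proportions(power, fs, win_len=1024,
                            f_lo=20.0, f_hi=20000.0, n_bands=10):
    """Energy and energy proportion per band (log-spaced edges are an assumption)."""
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    per_bin = power.sum(axis=0)  # total energy per frequency bin over all frames
    energies = np.array([per_bin[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    total = energies.sum()
    proportions = energies / total if total > 0 else np.zeros(n_bands)
    return edges, energies, proportions
```

At 44.1kHz a 1024-point window gives a bin spacing of about 43Hz, so the lowest log-spaced band (roughly 20Hz to 40Hz) contains no bins. This is one practical reason a longer window may be preferred for low-frequency content.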

[0067] The improved short-time Fourier transform (STFT) is an enhanced Fourier analysis technique that builds on the traditional STFT and optimizes the resolution, accuracy, and noise resistance of time-frequency analysis results by introducing adaptive window functions, time-frequency resolution optimization mechanisms, or window function superposition strategies. Specifically, during the sliding-window Fourier transform, this method dynamically adjusts the length and shape of the window function to better match the local features of the signal, thereby improving the joint characterization of time and frequency domain information. Furthermore, this method can be combined with amplitude spectrum enhancement, phase compensation, and edge smoothing strategies to improve the signal's discriminability and robustness at multiple frequency scales. The improved STFT is suitable for high-resolution time-frequency analysis scenarios, especially for the fine detection of weak frequency drifts or transient disturbances in audio fault signals, providing a more stable and accurate spectral energy representation for subsequent feature extraction and pattern recognition.

[0068] Specifically, the audio feature extraction subunit operates as follows: For the intrinsic mode components in the multi-scale time-frequency feature representation, time-domain feature parameters of each component are extracted, including peak amplitude, root mean square value, kurtosis, skewness, zero-crossing rate, and duration. The zero-crossing rate calculation uses an adaptive threshold (5% of the signal amplitude) to avoid low-frequency noise interference. For the spectral energy distribution, the center frequency, bandwidth, energy entropy, spectral centroid, and spectral kurtosis of each frequency band are extracted. The energy entropy is calculated using the Shannon entropy formula based on the energy proportion of each frequency band. Subsequently, the time-domain and frequency-domain features are jointly screened. Redundant features with a correlation greater than 0.85 are removed using the Pearson correlation coefficient, ultimately retaining 32-dimensional feature parameters, including 16 time-domain features and 16 frequency-domain features, forming the feature vector of the output audio signal for subsequent fusion processing in the preprocessing module.
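Two of the steps above, the Shannon energy entropy and the Pearson-based redundancy screening, can be illustrated with a short sketch. The assumptions are: the natural logarithm is used (the text does not fix the base), the absolute value of the correlation is compared against 0.85, and a feature is dropped when it correlates too strongly with any feature already kept. All function names are hypothetical.

```python
import math


def energy_entropy(proportions):
    """Shannon entropy of the band energy proportions (natural log assumed)."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(vx * vy)


def drop_redundant(features, threshold=0.85):
    """Keep features in insertion order, skipping any whose |r| with an
    already-kept feature exceeds the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "rms": [1.0, 2.0, 3.0, 4.0],
    "peak": [2.0, 4.0, 6.0, 8.1],  # nearly proportional to rms
    "zcr": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_redundant(features)
```

In this toy input `peak` is removed because it is almost perfectly correlated with `rms`, while `zcr` is retained.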

[0069] In a preferred embodiment, the output audio signal characteristic parameters include: fundamental frequency offset, harmonic energy distribution, time-domain envelope fluctuation characteristics, instantaneous frequency change rate, amplitude-frequency coupling characteristics, and sparse feature vectors of acoustic anomalies. Specifically, the fundamental frequency offset refers to the difference between the actual fundamental frequency of the audio signal and the nominal fundamental frequency of the device. It is calculated by extracting the signal fundamental frequency using the autocorrelation method and comparing it with the factory-calibrated standard fundamental frequency. Its value reflects the frequency stability of the audio device's oscillator or the degree of deviation in the resonant frequency of the speaker diaphragm. An offset exceeding 0.5% usually indicates diaphragm aging or an abnormal circuit clock. Harmonic energy distribution refers to the proportion of energy at integer multiples of the fundamental frequency (2nd to 8th harmonics) to the total signal energy. It is obtained by decomposing the signal spectrum using the Fourier transform and statistically analyzing the proportion of each harmonic's energy. This parameter directly reflects the nonlinear distortion of the power amplifier or the degradation state of the speaker's magnetic circuit system. Under normal operating conditions, the proportion of higher harmonics (5th and above) is usually less than 3%, but it increases significantly in abnormal situations. The time-domain envelope fluctuation characteristic is obtained by extracting the instantaneous amplitude envelope of the signal using the Hilbert transform and then calculating parameters such as the variance, peak factor, and rise time of the envelope. It is used to characterize the consistency of the device's response to dynamic audio signals. A change of more than 5% in the envelope fluctuation standard deviation indicates aging of the amplifier circuit capacitors or a decrease in the elasticity of the speaker suspension edge. The instantaneous frequency change rate is a statistical characteristic (e.g., mean, maximum rate of change) of the first derivative of the instantaneous frequency of the signal (obtained through the Hilbert-Huang transform). It reflects the frequency drift speed over time and can sensitively capture the instantaneous fluctuation of the speaker resonant frequency caused by overheating of the voice coil. The amplitude-frequency coupling characteristic is obtained by calculating the cross-correlation coefficient between amplitude and frequency within a specific frequency band (e.g., 20Hz-20kHz) and describes the degree of correlation between amplitude change and frequency change. The amplitude-frequency coupling coefficient of normal equipment is less than 0.3, and this coefficient increases significantly when the component parameters of the speaker crossover drift. The sparse feature vector of acoustic abnormal events is a low-dimensional vector (16-dimensional) extracted by the K-SVD sparse coding algorithm for abnormal events such as sudden noise and intermittent noise in the signal, where the non-zero elements correspond to the occurrence time, energy peak, and frequency range of the abnormal event. It can accurately locate the precursors of hidden faults such as instantaneous noise caused by poor contact.
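As a rough illustration of the first two parameters, the sketch below computes a relative fundamental frequency offset and the energy share of the 2nd to 8th harmonics from a power spectrum. It assumes the fundamental has already been estimated (for example by autocorrelation, which is not shown) and that the offset is expressed relative to the nominal value. The matching tolerance `tol_hz` and all function names are hypothetical.

```python
def fundamental_offset_pct(measured_f0, nominal_f0):
    """Relative deviation of the measured fundamental from nominal, in percent."""
    return abs(measured_f0 - nominal_f0) / nominal_f0 * 100.0


def harmonic_energy_ratios(freqs, power, f0, harmonics=range(2, 9), tol_hz=5.0):
    """Share of total spectral energy found near each harmonic h * f0.

    tol_hz is an illustrative matching tolerance, not a value from the text.
    """
    total = sum(power)
    ratios = {}
    for h in harmonics:
        energy = sum(p for f, p in zip(freqs, power) if abs(f - h * f0) <= tol_hz)
        ratios[h] = energy / total if total > 0 else 0.0
    return ratios


def higher_harmonic_share(ratios, from_order=5):
    """Combined share of the harmonics of order from_order and above."""
    return sum(r for h, r in ratios.items() if h >= from_order)


freqs = [100.0, 200.0, 300.0, 500.0, 600.0]
power = [90.0, 5.0, 2.0, 2.0, 1.0]
ratios = harmonic_energy_ratios(freqs, power, f0=100.0)
share = higher_harmonic_share(ratios)
offset = fundamental_offset_pct(1005.0, 1000.0)
```

Under the figures quoted in the paragraph, a `share` at or above 0.03 or an `offset` above 0.5 would be the kind of value worth flagging.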

[0070] The advantages of this technical solution are as follows: the audio feature extraction unit performs joint deep analysis of the output audio signal in both the time and frequency domains through the multi-scale time-frequency analysis subunit. The combination of empirical mode decomposition (EMD) and variational mode decomposition (VMD) not only overcomes the shortcomings of single mode decomposition, such as susceptibility to noise interference and mode aliasing, but also captures subtle time-domain distortions in the audio signal caused by minor component aging or latent faults. Furthermore, the combination of the improved short-time Fourier transform (STFT) and the continuous wavelet transform retains the time resolution advantage of the STFT in the mid-to-high frequency bands while enhancing the frequency resolution in the low-frequency band through the multi-scale characteristics of the wavelet transform, achieving accurate representation of the full-band spectral energy distribution. On this basis, the audio feature extraction subunit's fused extraction of these two types of features enables cross-scale feature capture for audio equipment, from minor performance degradation to potential fault precursors, improving the accuracy of subsequent health status assessments and identifying latent fault risks that are difficult to detect using traditional methods 3-5 audio equipment operation cycles in advance.

[0071] As shown in Figure 4, the feature preprocessing module may specifically include:

[0072] The data synchronization and completion unit is used to perform time synchronization and interpolation completion on the multi-source running data to obtain time-aligned and data-complete multi-source running data.

[0073] The wavelet denoising unit is used to filter the time-aligned and complete multi-source running data using a wavelet denoising method based on acoustic characteristic constraints, so as to obtain denoised multi-source running data.

[0074] The feature fusion unit is used to perform feature compression and fusion on the noise-reduced multi-source operating data using principal component analysis combined with acoustic feature weights, so as to obtain fused low-dimensional operating feature data, which preserves the dynamic change trend of the acoustic operating features of the audio device.

[0075] In a preferred embodiment, the specific working process of the data synchronization and completion unit is as follows: Time synchronization of multi-source operating data is based on the high-precision real-time clock (RTC) built into the audio device. Millisecond-level timestamps are added to various types of data. Internal sensor data (sampling frequency 10Hz) and external environmental parameters (sampling frequency 1Hz) are synchronized via hardware triggering. Output audio signal characteristic parameters (sampling frequency 44.1kHz) and historical operating logs (event-triggered records) are aligned through timestamp mapping. For data missing issues, a hierarchical interpolation strategy is adopted: For short-term missing (≤3 sampling points) sensor data, linear interpolation is used for completion; for long-term missing (>3 sampling points) slowly varying parameters such as temperature and current, forward padding combined with a moving average (window size 5) is used; for missing high-frequency audio characteristic parameters, cubic spline interpolation is used to ensure that the completed data is continuous and consistent in trend on the time axis, ultimately forming a complete multi-source dataset with a time interval of 100ms.
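The hierarchical interpolation strategy for sensor data can be sketched as below. This is one possible reading, not the definitive procedure: missing samples are represented as `None`, gaps of up to 3 samples with valid neighbours on both sides are filled linearly, and longer gaps are filled with the mean of the last 5 available values (one interpretation of "forward padding combined with a moving average"). Cubic spline filling of audio features is not shown. The function name `fill_gaps` is hypothetical.

```python
def fill_gaps(values, short_gap=3, window=5):
    """Fill None entries: linear interpolation for short gaps, trailing
    moving average of already-available values for longer gaps."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap, or n
        gap = end - start
        has_left, has_right = start > 0, end < n
        if gap <= short_gap and has_left and has_right:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                t = (k - start + 1) / (gap + 1)
                out[k] = left + t * (right - left)
        elif has_left:
            for k in range(start, end):
                recent = [v for v in out[max(0, k - window):k] if v is not None]
                out[k] = sum(recent) / len(recent)
        # a leading gap with no earlier sample is left as None
    return out


short = fill_gaps([1.0, None, None, 4.0])
long_gap = fill_gaps([2.0, 4.0, None, None, None, None, 9.0])
```

The first call interpolates linearly between 1.0 and 4.0; the second holds the running average of the preceding values until the next valid sample.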

[0076] In a preferred embodiment, the specific operation of the wavelet denoising unit is as follows: The wavelet denoising method based on acoustic characteristic constraints first selects the wavelet basis function according to the data type. For low-frequency signals such as temperature and current, the db4 wavelet is selected, and for high-frequency signals such as vibration and audio features, the sym8 wavelet is selected. The number of decomposition layers is set to 5-7 (dynamically adjusted according to the highest frequency of the signal; 7 layers are used for high-frequency signals). The denoising threshold is determined by combining the main frequency range of the output audio signal (20Hz-20kHz). For the wavelet coefficients of each scale obtained by decomposition, the threshold is set according to the frequency band: a more lenient threshold (1.2 times the signal standard deviation) is used for the coefficients corresponding to the low frequency band (≤500Hz) to retain the basic operating characteristics of the equipment; a more stringent threshold (0.8 times the signal standard deviation) is used for the coefficients corresponding to the mid-to-high frequency band (>500Hz) to suppress high-frequency noise. For coefficients below the threshold, soft threshold attenuation is performed (i.e., when |x|<t, sign(x)(|x|-t) is output), while coefficients above the threshold remain unchanged. Finally, the data is reconstructed through the inverse wavelet transform, which improves the signal-to-noise ratio of the denoised data by 15-20dB without significant loss of key information such as peaks and abrupt changes in acoustic features.

[0077] In a preferred embodiment, the specific implementation process of the feature fusion unit is as follows: In the principal component analysis (PCA) combined with acoustic feature weights, the noise-reduced multi-source data is first standardized (mean = 0, standard deviation = 1), giving 49 features: internal sensor data (12-dimensional), external environmental parameters (5-dimensional), and output audio signal feature parameters (32-dimensional). The weights of each feature are calculated using the entropy weighting method, with the weight coefficient for the audio signal feature parameters set to 0.6 (higher than the 0.4 for other features) to highlight the importance of acoustic characteristics. The weighted feature matrix is input into the PCA model to calculate the covariance matrix and solve for the eigenvalues. Principal components are selected in descending order of eigenvalue until the cumulative contribution rate reaches 95% (retaining 15-20 principal components). The fused low-dimensional operating feature data is verified by comparing the dynamic change trends of the audio signal features before and after fusion (e.g., the rate of change of fundamental frequency offset, the fluctuation amplitude of harmonic energy distribution) to ensure that the trend retention of acoustic features is ≥90%, ultimately obtaining fused features that both compress the data dimensions and accurately reflect the acoustic operating status of the device.
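The component selection rule (take eigenvalues from largest to smallest until the cumulative contribution rate reaches 95%) can be shown in isolation. The sketch assumes the eigenvalues of the weighted covariance matrix are already available; the function name `select_components` and the sample eigenvalues are illustrative only.

```python
def select_components(eigenvalues, target=0.95):
    """Return (k, cumulative_rate): the smallest number of leading
    components whose cumulative contribution rate reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), 1.0


k, rate = select_components([0.3, 6.0, 0.1, 3.0, 0.6])
```

With these toy eigenvalues the first three components account for 96% of the variance, so `k` is 3.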

[0078] The advantages of the above technical solution are as follows: the feature preprocessing module achieves time alignment and completeness of multi-source operational data through the data synchronization and completion unit, solving the problems of temporal misalignment and missing data of different types, and providing a consistent data foundation for subsequent analysis; the wavelet denoising unit performs filtering based on acoustic characteristic constraints, effectively suppressing noise while accurately preserving the key acoustic features of the audio device, avoiding the loss of useful information; the feature fusion unit combines principal component analysis with acoustic feature weights, compressing data dimensions to improve processing efficiency while focusing on preserving the dynamic change trend of acoustic operational features, ensuring that the fused features can truly reflect the core operating status of the device. The synergistic effect of these three components improves the data quality and effectiveness input to the health status assessment model.

[0079] As shown in Figure 5, the wavelet denoising unit may specifically include:

[0080] The wavelet decomposition subunit is used to perform wavelet decomposition on various types of sensor data in the time-aligned and complete multi-source running data, and decompose each type of sensor data into wavelet coefficients of different scales.

[0081] The threshold determination subunit is used to determine the denoising threshold for each scale based on the frequency distribution and harmonic energy characteristics of the output audio signal.

[0082] A threshold processing subunit is used to perform soft threshold attenuation processing on wavelet coefficients below the denoising threshold, and retain wavelet coefficients above the threshold.

[0083] The signal reconstruction subunit is used to reconstruct the processed wavelet coefficients by inverse wavelet transform, so as to obtain multi-source operating data with noise reduction and reduced random noise interference while preserving the changes in the operating characteristics of the audio equipment.

[0084] In some embodiments, the specific implementation process of the wavelet decomposition subunit includes: performing wavelet decomposition according to data type for time-aligned and complete multi-source operational data. For low-frequency slowly varying signals such as temperature (0-100℃, sampling rate 10Hz) and air pressure (80-120kPa, sampling rate 1Hz), the db4 wavelet basis is selected, and the number of decomposition layers is set to 5, thereby obtaining 5 sets of approximate coefficients and 5 sets of detail coefficients from low frequency to high frequency; for mid-to-high frequency signals such as current (0-5A, sampling rate 100Hz) and vibration (0-500Hz, sampling rate 1kHz), the sym8 wavelet basis is selected, and the number of decomposition layers is set to 6; for high-frequency characteristic parameters related to audio (e.g., instantaneous frequency change rate, sampling rate 44.1kHz), the coif5 wavelet basis is selected, and the number of decomposition layers is set to 7, ensuring that the coefficients at each scale can match the frequency components of the corresponding signal (low-frequency coefficients correspond to stable operating characteristics of the equipment, and high-frequency coefficients correspond to dynamic fluctuations or noise). The decomposition process is implemented using the MATLAB wavelet toolbox. After each decomposition, the wavelet coefficient matrix at that scale is output, and the matrix dimension is consistent with the length of the original data.

[0085] In some embodiments, the threshold determination subunit operates as follows: First, the output audio signal is subjected to spectral analysis to divide it into 5 key frequency bands (20-200Hz, 200-1kHz, 1-5kHz, 5-10kHz, 10-20kHz), and the energy proportion of each frequency band is statistically analyzed (15%, 25%, 30%, 20%, and 10% under normal operating conditions, respectively). Then, according to the scale-frequency correspondence of wavelet decomposition (for example, in a 7-level decomposition, scale 1 corresponds to 10-20kHz, scale 2 corresponds to 5-10kHz, and so on), the energy proportion of each frequency band is mapped to the corresponding decomposition scale. For the threshold of each scale, the "signal standard deviation × dynamic coefficient" is used for calculation, where the dynamic coefficient is positively correlated with the harmonic energy proportion of the corresponding frequency band at that scale: if the harmonic energy proportion of a certain frequency band is higher than the normal threshold (for example, the 1-5kHz band exceeds 30%), the dynamic coefficient is set to 1.2 (relaxing the threshold to retain more features); if it is lower than the normal threshold, the dynamic coefficient is set to 0.8 (tightening the threshold for strong noise reduction). For example, for scale 3 corresponding to the 1-5kHz frequency band, when the harmonic energy accounts for 35%, the threshold is set to 1.2 times the standard deviation of the scale coefficient.
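The per-scale rule "signal standard deviation × dynamic coefficient" can be sketched as follows. The assumptions are that the population standard deviation of the scale's coefficients is used and that a harmonic share exactly equal to the normal value takes the stricter coefficient (the text does not cover the equal case). The name `scale_threshold` is hypothetical.

```python
import statistics


def scale_threshold(coeffs, harmonic_share, normal_share):
    """Denoising threshold for one wavelet scale.

    The dynamic coefficient is 1.2 when the band's harmonic energy share is
    above its normal value, otherwise 0.8.
    """
    sigma = statistics.pstdev(coeffs)
    dynamic = 1.2 if harmonic_share > normal_share else 0.8
    return dynamic * sigma


scale3 = [1.0, -1.0, 1.0, -1.0]  # toy coefficients for the 1-5kHz scale
relaxed = scale_threshold(scale3, harmonic_share=0.35, normal_share=0.30)
strict = scale_threshold(scale3, harmonic_share=0.25, normal_share=0.30)
```

The `relaxed` case mirrors the paragraph's example, where a 35% harmonic share on scale 3 yields a threshold of 1.2 times the standard deviation.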

[0086] In some embodiments, the specific working process of the threshold processing subunit is as follows: Soft threshold attenuation processing is performed on wavelet coefficients at each scale. The calculation formula is: if the absolute value of the coefficient |x| ≤ threshold t, then the output y = sign(x) × (|x| - t) (where sign(x) is the sign function); if |x| > t, then the output y = x. Processing is performed sequentially by scale. Low-frequency approximation coefficients (corresponding to basic equipment operating characteristics) are treated with a mild attenuation coefficient of 0.9 (i.e., t × 0.9) to avoid excessive attenuation leading to trend distortion; high-frequency detail coefficients (corresponding to noise or burst features) are treated with an enhanced attenuation coefficient of 1.1 (i.e., t × 1.1) to enhance noise suppression. After processing, the coefficient energy percentage is verified. The energy retention rate of the coefficients after processing at each scale must be ≥85% (low-frequency coefficients) and ≥70% (high-frequency coefficients) to ensure that effective features are not over-filtered.
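The piecewise rule and the energy retention check can be sketched as below. `threshold_as_described` follows the wording of the paragraph literally. `soft_threshold` is the conventional soft-thresholding operator, which instead sets coefficients at or below the threshold to zero and shrinks the rest; it is included only for comparison, and which behaviour is intended in practice is treated here as an open assumption. All names are hypothetical.

```python
def sign(x):
    """Sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)


def threshold_as_described(x, t):
    """Piecewise rule exactly as worded in the paragraph above."""
    if abs(x) <= t:
        return sign(x) * (abs(x) - t)
    return x


def soft_threshold(x, t):
    """Conventional soft thresholding, shown for comparison only."""
    return sign(x) * max(abs(x) - t, 0.0)


def effective_threshold(t, is_approximation):
    """Apply the 0.9 (approximation) or 1.1 (detail) attenuation coefficient."""
    return t * (0.9 if is_approximation else 1.1)


def energy_retention(before, after):
    """Ratio of coefficient energy after processing to energy before."""
    e_before = sum(v * v for v in before)
    return sum(v * v for v in after) / e_before if e_before > 0 else 1.0


detail = [3.0, 0.5, -2.5, 0.2]
t = effective_threshold(1.0, is_approximation=False)
processed = [threshold_as_described(c, t) for c in detail]
retained = energy_retention(detail, processed)
```

Comparing `retained` against the 85% (low-frequency) or 70% (high-frequency) floor gives the verification step mentioned at the end of the paragraph.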

[0087] In some embodiments, the specific process of the signal reconstruction subunit is as follows: the wavelet coefficients (including approximation coefficients and detail coefficients) of each scale after thresholding are input into the inverse wavelet transform function (matched with the wavelet basis used during decomposition) in reverse order of the original decomposition level, and the signal is reconstructed layer by layer. A multi-scale energy compensation strategy is adopted during the reconstruction process: 10% energy compensation of the original signal is superimposed on the reconstructed signal in the low-frequency band (≤1kHz) to avoid baseline drift caused by thresholding; the reconstructed signal in the high-frequency band (>1kHz) is smoothed and filtered (window size 3) to reduce reconstruction oscillations. The final denoised multi-source operating data must meet two indicators: first, the signal-to-noise ratio (SNR) is improved by ≥18dB compared with the original data (calculated by comparing the mean square error of the reconstructed signal and the original signal); second, the retention rate of key features (such as vibration peak and audio fundamental frequency offset) is ≥95% (calculated from the feature value deviation rate), ensuring that the characteristic changes of the equipment operating state are fully preserved while random noise is suppressed.
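The two acceptance indicators can be checked with a few lines of arithmetic. This sketch assumes a reference signal is available for the SNR estimate and that feature retention is defined as one minus the relative deviation of the feature value; neither definition is spelled out in the text, and the function names are hypothetical.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB, with noise taken as the squared error against the reference."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def feature_retention(before, after):
    """1 minus the relative deviation of a key feature value."""
    return 1.0 - abs(after - before) / abs(before)


def meets_targets(snr_gain_db, retention, min_gain=18.0, min_retention=0.95):
    """Check both indicators from the paragraph above."""
    return snr_gain_db >= min_gain and retention >= min_retention


reference = [1.0, 1.0, 1.0, 1.0]
noisy = [2.0, 0.0, 2.0, 0.0]
denoised = [1.1, 0.9, 1.1, 0.9]
gain = snr_db(reference, denoised) - snr_db(reference, noisy)
ok = meets_targets(gain, feature_retention(10.0, 9.6))
```

Here the toy signal gains about 20 dB and the feature retains 96% of its value, so both indicators pass.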

[0088] In some embodiments, the setting of the denoising threshold satisfies the following constraints: preserving low-frequency variation trends related to audio equipment malfunctions, suppressing high-frequency random noise, and maintaining effective information on the main harmonic frequency bands related to the normal operating characteristics of the audio equipment. In specific implementations, the denoising threshold is configured for each frequency band in conjunction with the operating characteristics of the audio equipment. For low-frequency variation trends related to malfunctions (e.g., slow temperature drift, vibration baseline shift, etc., corresponding to frequencies ≤500Hz), the denoising threshold for the corresponding wavelet scale is set to 1.5 times the signal standard deviation, so that the relaxed threshold preserves malfunction precursor features such as a slow rise in power amplifier module temperature of 0.5℃ per hour and a baseline shift in speaker diaphragm vibration amplitude of 0.1mm. For high-frequency random noise (e.g., electromagnetic interference, ambient airflow noise, etc., corresponding to frequencies >10kHz), the denoising threshold for the corresponding wavelet scale is set to 0.5 times the signal standard deviation, so that the strict threshold attenuates the coefficients of this frequency band and suppresses unrelated interference such as high-frequency power supply ripple (15kHz) and sudden environmental noise (above 20kHz), reducing noise energy in this frequency band by more than 60%.
For the main harmonic frequency band (20Hz-10kHz, covering the main harmonic components of human voice and music signals) related to the normal operating characteristics of the equipment, the thresholds are subdivided by frequency band: the threshold for 20-500Hz (low-frequency harmonics) is set to 1.2 times the signal standard deviation to preserve the harmonic energy distribution of the bass unit; the threshold for 500-5kHz (mid-frequency harmonics) is set to 1.0 times the signal standard deviation to maintain the harmonic proportion of the mid-range unit; and the threshold for 5-10kHz (high-frequency harmonics) is set to 0.8 times the signal standard deviation to preserve the harmonic characteristics of the tweeter while denoising. Ultimately, the energy ratio deviation of the fundamental frequency and the 2nd-8th harmonics in this frequency band is controlled within ±5%, ensuring that effective information related to normal operating characteristics is not over-filtered. Through the above-mentioned frequency band threshold settings, the low-frequency trends related to faults such as slow temperature changes are fully preserved, high-frequency random noise is suppressed, and the characteristic integrity of the main harmonic frequency band is maintained, meeting the preset constraint requirements.

[0089] The advantages of this technical solution are as follows: the wavelet denoising unit performs targeted wavelet decomposition on different types of sensor data through the wavelet decomposition subunit, ensuring that the wavelet coefficients at each scale accurately match the signal frequency characteristics; the threshold determination subunit dynamically sets the denoising threshold based on the frequency distribution and harmonic energy characteristics of the output audio signal, making the threshold adaptable to changes in the acoustic characteristics of the equipment; the threshold processing subunit uses soft threshold attenuation processing to suppress noise while avoiding distortion of key features; and the signal reconstruction subunit achieves signal reconstruction through inverse wavelet transform and energy compensation, effectively preserving changes in the operating characteristics of the equipment. The synergistic effect of these four components reduces random noise interference while accurately preserving key operating characteristics such as vibration peaks and audio fundamental frequency shifts, overcoming the difficulty of balancing noise suppression and feature preservation in traditional denoising methods, and improving the accuracy and stability of fault prediction for the entire system.

[0090] As shown in Figure 6, the audio device health status assessment model includes: a feature extraction unit, a convolutional neural network and a long short-term memory neural network connected to the feature extraction unit, and an attention mechanism unit for weighted fusion of the outputs of the convolutional neural network and the long short-term memory neural network;

[0091] The health status assessment module specifically includes:

[0092] The feature decomposition unit is used to input the operational feature data into the audio device health status assessment model, and to perform spectral decomposition and time series feature extraction on the operational feature data in the feature extraction unit of the audio device health status assessment model to obtain frequency domain features and time series features.

[0093] A frequency domain processing unit is used to input the frequency domain features into the convolutional neural network for processing to obtain a deep representation of the frequency domain features;

[0094] A time series processing unit is used to input the time series features into the long short-term memory neural network for processing to obtain a prediction vector of the time series features;

[0095] The fusion evaluation unit is used, under the action of the attention mechanism unit, to fuse the deep representation of the frequency domain features with the prediction vector of the time series features to generate a comprehensive health status index, and output the health status evaluation result of the audio device based on the index.

[0096] In some embodiments, the feature extraction unit adopts a dual-channel parallel structure. The frequency domain feature extraction branch converts the 15-20 dimensional low-dimensional running feature data into a 256×256 spectrogram (time step 256, frequency resolution 256) using a short-time Fourier transform. The temporal feature extraction branch converts the feature data into time series segments of length 60 using a sliding window (window size 60, stride 10). The convolutional neural network adopts a 3-layer convolutional structure. The first layer consists of 64 3×3 convolutional kernels (stride 1, padding = 1) + ReLU activation function + 2×2 max pooling. The second layer consists of 128 3×3 convolutional kernels + ReLU + 2×2 max pooling. The third layer consists of 256 3×3 convolutional kernels + ReLU + global average pooling and outputs 256-dimensional deep features in the frequency domain. The long short-term memory (LSTM) neural network consists of two bidirectional LSTM layers (64 units per layer) + dropout (probability 0.3), and finally outputs a 64-dimensional time series prediction vector through a fully connected layer. The attention mechanism unit adopts an additive attention model to calculate the weights of the 256-dimensional features output by the convolutional network and the 64-dimensional vector output by the LSTM (the weight values are normalized by softmax). The initial value of the frequency domain feature weight is set to 0.6, and the initial value of the time series feature weight is set to 0.4. After dynamic adjustment, a 128-dimensional fused feature vector is obtained by weighted summation.
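The attention-weighted fusion of the two branch outputs can be sketched without a deep learning framework. Because a 256-dimensional and a 64-dimensional vector cannot be summed directly, the sketch assumes each is first linearly projected to 128 dimensions; the text does not describe this step. Scores follow the usual additive form v·tanh(W h) and are normalized by softmax. The random matrices stand in for learned parameters, and all names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, seed):
    """Small random matrix standing in for learned weights (illustrative)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def additive_score(h, W, v):
    """Additive attention score v . tanh(W h)."""
    return sum(vi * math.tanh(ui) for vi, ui in zip(v, matvec(W, h)))


def fuse(freq_feat, time_feat, P_f, P_t, W, v):
    """Project both branches to a common size, weight them, and sum."""
    pf, pt = matvec(P_f, freq_feat), matvec(P_t, time_feat)
    a_f, a_t = softmax([additive_score(pf, W, v), additive_score(pt, W, v)])
    fused = [a_f * x + a_t * y for x, y in zip(pf, pt)]
    return fused, (a_f, a_t)


P_f, P_t = rand_matrix(128, 256, seed=0), rand_matrix(128, 64, seed=1)
W, v = rand_matrix(16, 128, seed=2), rand_matrix(1, 16, seed=3)[0]
fused, weights = fuse([1.0] * 256, [0.5] * 64, P_f, P_t, W, v)
```

The initial weights of 0.6 and 0.4 quoted above correspond to starting the two softmax scores at log 0.6 and log 0.4.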

[0097] In some embodiments, the feature decomposition unit receives the fused low-dimensional running feature data (15-20 dimensions) and divides it into two parallel processing paths: one path is input to the frequency domain branch of the feature extraction unit, which obtains frequency domain features in the form of a spectrogram (256×256) through a short-time Fourier transform; the other path is input to the time-series branch, which obtains time-series features (length 60) through sliding window processing. The frequency domain processing unit inputs the spectrogram into the convolutional neural network, and after three layers of convolution and pooling operations, outputs a 256-dimensional deep frequency domain representation, which contains an abstract mapping of key frequency domain features such as spectral energy distribution and harmonic distortion. The time-series processing unit inputs the time-series features into the bidirectional LSTM and captures the long-term dependencies of the features (e.g., temperature trends and vibration cycle changes) through two layers of recurrent computation, outputting a 64-dimensional time series prediction vector containing feature predictions for the next five time steps. The fusion evaluation unit calls the attention mechanism unit to dynamically weight the 256-dimensional frequency domain features and the 64-dimensional time series features (the weights are updated in real time according to the importance of the features; for example, the weight of the time series features during the equipment aging stage is increased to 0.5). After weighted fusion, a comprehensive health status index of 0-100 is output through a fully connected layer (128→64→1) (100 is the best state, 0 is a serious fault). The evaluation results are output according to the index range (for example, 90-100 is healthy, 70-89 is mild degradation, 50-69 is moderate degradation, and <50 is high fault risk).
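The mapping from the comprehensive health status index to an evaluation result can be written as a simple lookup. The ranges come from the example in the paragraph; treating fractional values such as 89.5 as belonging to the lower class is an assumption, and the function name is hypothetical.

```python
def health_category(index):
    """Map a 0-100 health index to the example categories in the text."""
    if not 0 <= index <= 100:
        raise ValueError("health index must be within 0-100")
    if index >= 90:
        return "healthy"
    if index >= 70:
        return "mild degradation"
    if index >= 50:
        return "moderate degradation"
    return "high fault risk"


labels = [health_category(i) for i in (95, 72, 50, 49)]
```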

[0098] As shown in Figure 7, the fault risk prediction module may specifically include:

[0099] The threshold adjustment unit is used to calculate and dynamically adjust the health threshold of the audio device based on the acoustic sensitivity curve, compare the comprehensive health status index with the dynamically adjusted health threshold, and calculate the current health score.

[0100] The trend prediction unit is used to input the current health score and the historical health score sequence into the prediction model, calculate the health score change trend in the future operating cycle, and perform weighted processing on the health score change trend based on the importance weights of each acoustic frequency band obtained during the training of the prediction model to obtain the predicted health score for the future operating cycle.

[0101] The decision parameter generation unit is used to generate predictive maintenance decision parameters when the predicted health score for the future operating cycle is lower than the dynamically adjusted health threshold.

[0102] In a specific embodiment of the threshold adjustment unit, the acoustic sensitivity curve is obtained by fitting the device's factory calibration and historical fault data. The horizontal axis represents the audio device's operating frequency (20Hz-20kHz), and the vertical axis represents the sensitivity coefficient (0-1, with higher values indicating greater sensitivity to faults in that frequency band). The sensitivity coefficient for the 1-5kHz band is set to 0.8 (the main frequency band for human voices, where faults have a significant impact), and the 20-200Hz band is set to 0.5. During dynamic adjustment, the distribution of the main frequency of the current device's output audio is first detected (e.g., extracting frequency bands accounting for more than 30% when playing music). The sensitivity coefficient weight of the frequency bands covered by the main frequency is increased (e.g., when the current main frequency is 1-3kHz, the weight of that frequency band is increased to 1.2). Then, a correction coefficient (range 0.8-1.2) is calculated based on the weight to determine the comprehensive health threshold benchmark value (default 70 points). The final dynamic threshold = 70 × correction coefficient (e.g., the threshold is increased to 84 points when the high-sensitivity frequency band is active, and decreased to 56 points when the sensitivity is low). The comprehensive health status index is compared with the dynamic threshold. The current health score is calculated as (index / dynamic threshold) × 100 (for example, when the index is 60 points and the threshold is 70 points, the score is approximately 85.7 points).
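The threshold and score arithmetic in this paragraph can be reproduced directly. The sketch takes the correction coefficient as an input, since the text gives its range (0.8-1.2) but not the formula that derives it from the sensitivity weights. The function names are hypothetical.

```python
def dynamic_threshold(correction, base=70.0):
    """Dynamic health threshold = benchmark value x correction coefficient."""
    if not 0.8 <= correction <= 1.2:
        raise ValueError("correction coefficient must be within 0.8-1.2")
    return base * correction


def health_score(index, threshold):
    """Current health score = (index / dynamic threshold) x 100."""
    return index / threshold * 100.0


high = dynamic_threshold(1.2)  # high-sensitivity band active
low = dynamic_threshold(0.8)   # low sensitivity
score = health_score(60.0, 70.0)
```

These calls reproduce the worked figures in the text: thresholds of 84 and 56 points, and a score of approximately 85.7 for an index of 60 against a threshold of 70.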

[0103] In a specific embodiment of the trend prediction unit, the prediction model adopts an improved Prophet time series model. The input is the current health score and the historical score sequence of the previous 60 operating cycles (each cycle is 24 hours). The model parameters are set as follows: annual cycle term strength 0.1 (annual aging trend of equipment), weekly cycle term strength 0.3 (weekly regularity of usage frequency), and trend flexibility parameter 0.05 (smoothing short-term fluctuations). After predicting the health score change trend for the next 14 operating cycles, acoustic frequency band importance weights are introduced (obtained during training through random forest feature importance calculations: 1-5kHz band weight 0.35, 5-10kHz band weight 0.25, 20-200Hz band weight 0.2, and other frequency bands weight 0.2) to perform weighted correction on the trend curve. If the health index corresponding to a high-weight frequency band in the predicted trend decreases at a faster rate (e.g., a daily decrease of more than 2 points for the 1-5kHz related index), the predicted score for the corresponding cycle is reduced by 5%-10%. Finally, the corrected predicted health score sequence for the next 14 cycles is output.
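The correction step applied after forecasting can be sketched as follows. This covers only the post-forecast penalty, not the Prophet model itself. The function name `correct_forecast`, the layout of `band_index`, and the use of a single fixed penalty within the 5%-10% range are assumptions made for illustration; the text does not state how the exact percentage is chosen.

```python
from typing import List


def correct_forecast(
    predicted: List[float],
    band_index: List[float],
    drop_limit: float = 2.0,
    penalty: float = 0.05,
) -> List[float]:
    """Reduce the predicted score in cycles where a high-weight band index
    falls by more than drop_limit points per cycle.

    band_index holds the current value followed by one forecast value per
    cycle, so it has one more element than predicted (assumed layout).
    """
    if not 0.05 <= penalty <= 0.10:
        raise ValueError("penalty must lie in the 5%-10% range given in the text")
    if len(band_index) != len(predicted) + 1:
        raise ValueError("band_index needs the current value plus one value per cycle")
    corrected = []
    for i, score in enumerate(predicted):
        drop = band_index[i] - band_index[i + 1]  # decrease during cycle i
        corrected.append(score * (1.0 - penalty) if drop > drop_limit else score)
    return corrected


if __name__ == "__main__":
    scores = [80.0, 78.0, 70.0]
    band = [90.0, 89.0, 86.0, 85.0]  # the index drops by 3 points in the second cycle
    print(correct_forecast(scores, band, penalty=0.10))
```

Only the second cycle is penalised in this usage example, because it is the only one in which the band index falls by more than 2 points.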

[0104] In a specific embodiment of the decision parameter generation unit, the parameter generation process is triggered when the predicted health score for any future period is lower than the dynamically adjusted health threshold (e.g., a predicted score of 52 points in period 7 < threshold 60 points). The predictive maintenance decision parameters include seven core types of information: fault risk level (divided into three levels based on the threshold difference: >20 points for high risk, 10-20 points for medium risk, and <10 points for low risk), expected fault occurrence period (the period from which the score first falls below the threshold, e.g., period 7), associated faulty components (based on historical data mapping, e.g., a drop in high-frequency indicators corresponds to a tweeter), key influencing factors (e.g., 30% for temperature exceeding limits, 50% for abnormal vibration), recommended maintenance window (from now until 3 days before the expected fault period), estimated maintenance cost (based on component type and labor costs), and priority score (0-10 points, considering both risk level and importance of the usage scenario).
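The trigger condition and the risk grading described above can be sketched as below. The names `risk_level` and `first_breach` are hypothetical. Grading the risk from the score at the first breach is an assumption, since the text does not say which predicted score the threshold difference is taken from.

```python
from typing import List, Optional, Tuple


def risk_level(threshold: float, predicted: float) -> str:
    """Grade the risk by how far the predicted score falls below the threshold."""
    gap = threshold - predicted
    if gap > 20:
        return "high"
    if gap >= 10:
        return "medium"
    return "low"


def first_breach(scores: List[float], threshold: float) -> Optional[Tuple[int, str]]:
    """Return (expected fault period, risk level) for the first period whose
    predicted score is below the threshold, or None if no period breaches it.
    Periods are numbered from 1.
    """
    for period, score in enumerate(scores, start=1):
        if score < threshold:
            return period, risk_level(threshold, score)
    return None


if __name__ == "__main__":
    forecast = [70, 68, 66, 64, 63, 61, 52]
    print(first_breach(forecast, threshold=60))
```

For the example in the text (a predicted score of 52 in period 7 against a threshold of 60), the difference is 8 points, which falls in the low-risk band under the stated cut-offs.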

[0105] The beneficial technical effects are as follows: the threshold adjustment unit dynamically adapts the health threshold through the acoustic sensitivity curve, avoiding the misjudgments that fixed thresholds cause under different operating conditions; the trend prediction unit combines historical data with acoustic frequency band weights to improve the accuracy of future health trend predictions, especially for the forward-looking identification of faults associated with high-sensitivity frequency bands; and the decision parameter generation unit outputs multi-dimensional maintenance parameters when a risk is triggered, providing a quantitative basis for subsequent maintenance decisions. The synergy of these three components improves the timeliness of fault risk prediction and reduces the false alarm rate.

[0106] As shown in Figure 8, the maintenance decision and early warning module specifically includes:

[0107] The early warning generation unit is used to generate audio equipment fault early warning information based on the predictive maintenance decision parameters, and to determine the early warning level corresponding to the audio equipment fault early warning information.

[0108] The strategy determination unit is used to compare the warning level with preset conditions, and when the warning level reaches the preset conditions, determine the maintenance strategy, which is generated based on the predictive maintenance decision parameters and the warning level.

[0109] The suggestion output unit is used to output maintenance operation suggestions based on the maintenance strategy; the maintenance operation suggestions include at least one of software update, component inspection, component replacement, acoustic performance self-calibration, or parameter optimization.

[0110] In practice, after receiving predictive maintenance decision parameters, the early warning generation unit generates fault early warning information based on three dimensions: risk level, expected failure period, and associated components. The risk level is mapped to an early warning level (high risk corresponds to red, medium risk to yellow, and low risk to blue), the expected failure period is stated in the form "possible failure after X cycles", and the associated components are named specifically (e.g., tweeter unit, power amplifier module). For example, when the risk level in the decision parameters is medium, the expected failure period is 7, and the associated component is the tweeter unit, the system generates a yellow early warning message stating that the tweeter unit of the audio equipment is expected to fail after 7 operating cycles and that the current risk level is medium. The system also records the early warning generation time, trigger threshold, and other relevant metadata, storing them in the system log.

[0111] In specific implementation, the preset conditions for the strategy determination unit are as follows: red alerts correspond to immediate response (maintenance initiated within 1 hour), yellow alerts correspond to priority handling (maintenance planned within 24 hours), and blue alerts correspond to routine tracking (planning developed within 72 hours). When the alert level reaches the preset conditions, a maintenance strategy is generated based on predictive maintenance decision parameters: the red alert strategy includes emergency shutdown recommendations, backup equipment switching plans, and an emergency procurement list (e.g., immediate shutdown for inspection, activation of backup tweeters); the yellow alert strategy includes phased inspection steps and key parameter monitoring frequencies (e.g., daily monitoring of tweeter harmonic energy distribution, disassembly inspection in the 3rd cycle); the blue alert strategy includes a periodic data re-acquisition plan and a performance degradation trend tracking cycle (e.g., re-acquiring vibration data every 3 cycles, continuously tracking fundamental frequency offset changes). Key influencing factors in the decision parameters are simultaneously considered when formulating the strategy; for example, when the proportion of vibration anomalies exceeds 50%, the vibration sensor calibration process is strengthened in the strategy.
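The mapping from risk level to warning level and response deadline described in the two paragraphs above can be sketched as a pair of lookup tables. The function name `build_warning` and the dictionary keys are hypothetical; only the colour mapping, the response times, and the wording of the message come from the text.

```python
from typing import Dict, Union

# Risk level -> early warning level, as stated in the text.
RISK_TO_ALERT = {"high": "red", "medium": "yellow", "low": "blue"}

# Early warning level -> maximum response time in hours, as stated in the text.
RESPONSE_HOURS = {"red": 1, "yellow": 24, "blue": 72}


def build_warning(risk: str, period: int, component: str) -> Dict[str, Union[str, int]]:
    """Assemble early warning information from the three dimensions:
    risk level, expected failure period, and associated component.
    """
    level = RISK_TO_ALERT[risk]  # raises KeyError for an unknown risk level
    return {
        "level": level,
        "respond_within_hours": RESPONSE_HOURS[level],
        "message": (
            f"The {component} of the audio equipment is expected to fail "
            f"after {period} operating cycles; current risk level: {risk}."
        ),
    }


if __name__ == "__main__":
    print(build_warning("medium", 7, "tweeter unit"))
```

Keeping the two tables separate means the response deadlines can be changed without touching the risk-to-colour mapping.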

[0112] In practical implementation, the suggestion output unit matches specific operation suggestions from the types mentioned above according to the maintenance strategy. Software update suggestions should specify the version and update content (e.g., updating firmware to V2.3.1 to fix high-frequency signal processing algorithm vulnerabilities); component inspection suggestions should detail the steps (e.g., disassembling the tweeter dust cover to check for diaphragm cracks and loose leads); component replacement suggestions should specify the model and replacement standard (e.g., replacing a CAP-100 filter capacitor with a capacitance deviation ≤5%); acoustic performance self-calibration suggestions should include calibration parameters (e.g., starting the built-in calibration program to automatically correct the frequency response curve in the 20Hz-20kHz band, with a target deviation ≤±1dB); parameter optimization suggestions should specify the adjustment values (e.g., reducing the power amplifier module gain from 8dB to 6dB to reduce the risk of overheating). Maintenance suggestions should be output in a combination of text and graphics, and can be exported as PDF or synchronized to the maintenance terminal.

[0113] Audio devices include at least one of Bluetooth speakers, smart headphones, conference microphones, mixing consoles, and audio amplifiers.

[0114] Its advantages lie in the fact that the early warning generation unit achieves precise fault location and risk visualization through three-dimensional early warning information, avoiding the ambiguity of traditional early warnings; the strategy determination unit, based on a dynamic matching response mechanism of early warning levels, ensures that resource investment is adapted to risk levels, reducing over-maintenance; and the suggestion output unit transforms maintenance strategies into specific and actionable steps, covering multiple dimensions such as software, hardware, and parameters, improving the accuracy and efficiency of maintenance execution. The synergy of these three components shortens the fault early warning response time for audio equipment, reducing the probability of downtime and maintenance costs.

[0115] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0116] As shown in Figure 9, this embodiment also provides an intelligent predictive maintenance method for audio equipment faults, applicable to audio equipment. The method includes the following steps:

[0117] S10: Collect multi-source operating data of the audio device during operation, including internal sensor data of the audio device, characteristic parameters of the output audio signal, external environmental parameters, and historical operating logs;

[0118] S20: The multi-source running data is preprocessed based on a multimodal feature fusion algorithm to obtain running feature data after feature standardization and noise reduction.

[0119] S30: Input the running feature data into the pre-trained audio device health status assessment model, and output the health status assessment result of the audio device; the audio device health status assessment model is a hybrid analysis model that combines convolutional neural networks and long short-term memory networks with an attention mechanism;

[0120] S40: Based on the health status assessment results, perform real-time fault risk prediction on the audio device and generate predictive maintenance decision parameters;

[0121] S50: Output audio equipment fault warning information according to the predictive maintenance decision parameters, and when the warning level represented by the audio equipment fault warning information meets the preset conditions, generate corresponding maintenance operation suggestions according to the predictive maintenance decision parameters and the warning level.

[0122] Furthermore, step S10 specifically includes:

[0123] S11: Obtain hardware operating status data of the audio device during operation by means of temperature sensor, current sensor, vibration sensor, air pressure sensor and Hall effect sensor installed inside the housing of the audio device;

[0124] S12: The output audio signal is acquired by a microphone array located inside the housing of the audio device, and time-domain and frequency-domain analysis is performed on the output audio signal to extract characteristic parameters of the output audio signal;

[0125] S13: External environmental parameters of the audio device's operating environment are collected by an external environment sensor installed on the housing of the audio device;

[0126] S14: Read historical operation log data from the storage unit of the audio device or a remote server.

[0127] Further, step S12 specifically includes:

[0128] S121: Based on the output audio signal acquired by the microphone array, a multi-scale time-frequency fusion algorithm is used to jointly analyze the output audio signal to obtain a multi-scale time-frequency feature representation of the output audio signal. The multi-scale time-frequency feature representation includes: intrinsic mode components obtained by performing joint processing of empirical mode decomposition and variational mode decomposition on the output audio signal in the time domain; and spectral energy distribution obtained by a multi-resolution analysis method based on a combination of improved short-time Fourier transform and continuous wavelet transform in the frequency domain.

[0129] S122: Extract the characteristic parameters of the output audio signal based on the intrinsic mode components and the spectral energy distribution in the multi-scale time-frequency feature representation.

[0130] Furthermore, step S20 specifically includes:

[0131] S21: Perform time synchronization and interpolation on the multi-source running data to obtain time-aligned and complete multi-source running data;

[0132] S22: The time-aligned and complete multi-source running data is filtered using a wavelet denoising method based on acoustic characteristic constraints to obtain denoised multi-source running data.

[0133] S23: The principal component analysis method combined with acoustic feature weights is used to perform feature compression and fusion on the noise-reduced multi-source operating data to obtain fused low-dimensional operating feature data, so that the operating feature data retains the dynamic change trend of the acoustic operating features of the audio device.
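The time synchronization and interpolation of step S21 can be illustrated with a minimal sketch that resamples one sensor stream onto a shared time grid. Linear interpolation and holding the edge values outside the sampled range are assumptions; the text does not name the interpolation method. The function name `resample_linear` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def resample_linear(
    times: Sequence[float], values: Sequence[float], grid: Sequence[float]
) -> List[float]:
    """Linearly interpolate the samples (times, values) onto the time grid.

    times must be strictly increasing. Grid points outside the sampled
    range take the nearest edge value (assumption).
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and of equal length")
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_left(times, t)  # times[j - 1] < t <= times[j]
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


if __name__ == "__main__":
    # A temperature stream sampled every 10 s, aligned to a 5 s grid.
    print(resample_linear([0, 10], [20.0, 30.0], [0, 5, 10, 15]))
```

Applying the same grid to every sensor stream yields time-aligned series with no gaps, which is the input the denoising step S22 expects.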

[0134] Further, step S22 specifically includes:

[0135] S221: Perform wavelet decomposition on the various types of sensor data in the time-aligned and complete multi-source operational data, and decompose each type of sensor data into wavelet coefficients of different scales.

[0136] S222: Determine the denoising threshold for each scale based on the frequency distribution and harmonic energy characteristics of the output audio signal;

[0137] S223: Perform soft threshold attenuation processing on wavelet coefficients below the denoising threshold, and retain wavelet coefficients above the threshold;

[0138] S224: Reconstruct the processed wavelet coefficients using inverse wavelet transform to obtain denoised multi-source operating data that reduces random noise interference while preserving the changes in the operating characteristics of the audio equipment.
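Steps S221 to S224 can be sketched with a single-level Haar wavelet transform. The choice of the Haar wavelet, the single decomposition level, and the standard soft-thresholding rule (coefficients whose magnitude is below the threshold become zero, larger ones shrink toward zero by the threshold) are assumptions and may differ in detail from the attenuation scheme intended here. The threshold, which step S222 derives from the audio signal characteristics, is taken as an input.

```python
import math
from typing import List, Tuple


def haar_forward(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (S221)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_forward (S224)."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft_threshold(c: float, thr: float) -> float:
    """Standard soft thresholding: shrink |c| by thr, clipping at zero (S223)."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def denoise(x: List[float], thr: float) -> List[float]:
    """Decompose, threshold the detail coefficients, and reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, thr) for d in detail])


if __name__ == "__main__":
    print(denoise([1.0, 1.2, 5.0, 5.0], thr=10.0))
```

A multi-level version would apply `haar_forward` repeatedly to the approximation coefficients and use a separate threshold per scale, as step S222 describes.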

[0139] Furthermore, the audio device health status assessment model includes: a feature extraction module, a convolutional neural network module and a long short-term memory neural network connected to the feature extraction module, and an attention mechanism module for weighted fusion of the outputs of the convolutional neural network module and the long short-term memory neural network.

[0140] Furthermore, step S30 specifically includes:

[0141] S31: Input the operational feature data into the audio device health status assessment model, and perform spectral decomposition and time series feature extraction on the operational feature data in the feature extraction module of the audio device health status assessment model to obtain frequency domain features and time series features;

[0142] S32: Input the frequency domain features into the convolutional neural network for processing to obtain a deep representation of the frequency domain features;

[0143] S33: Input the time series features into the long short-term memory neural network for processing to obtain the prediction vector of the time series features;

[0144] S34: Under the action of the attention mechanism module, the deep representation of the frequency domain features is fused with the prediction vector of the time series features to generate a comprehensive health status index, and the health status assessment result of the audio device is output based on the index.
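The weighted fusion of step S34 can be illustrated with a small sketch in which two relevance scores are turned into attention weights by a softmax and used to combine the two branch outputs. In a trained model these scores would be produced by learned parameters; here they are plain inputs, and the function names `softmax` and `fuse` are hypothetical. The mapping from the fused vector to the comprehensive health status index is not specified in the text and is left out.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(
    freq_repr: Sequence[float],
    time_repr: Sequence[float],
    scores: Sequence[float],
) -> List[float]:
    """Attention-weighted sum of the CNN branch output (freq_repr) and the
    LSTM branch output (time_repr). scores holds one relevance score per
    branch, in that order (assumed input).
    """
    if len(freq_repr) != len(time_repr):
        raise ValueError("both representations must have the same length")
    if len(scores) != 2:
        raise ValueError("expected one score per branch")
    w_f, w_t = softmax(scores)
    return [w_f * f + w_t * t for f, t in zip(freq_repr, time_repr)]


if __name__ == "__main__":
    # Equal scores give equal weights, so the result is the element-wise mean.
    print(fuse([1.0, 0.0], [0.0, 1.0], scores=[0.0, 0.0]))
```

A higher score for one branch shifts the fused vector toward that branch, which is how the attention mechanism lets the model emphasise frequency-domain or time-series evidence as needed.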

[0145] Furthermore, step S40 specifically includes:

[0146] S41: Calculate and dynamically adjust the health threshold of the audio device based on the acoustic sensitivity curve, compare the comprehensive health status index with the dynamically adjusted health threshold, and calculate the current health score;

[0147] S42: Input the current health score and the historical health score sequence into the prediction model, calculate the health score change trend in the future operating cycle, and perform weighted processing on the health score change trend based on the importance weights of each acoustic frequency band obtained during the training of the prediction model to obtain the predicted health score for the future operating cycle.

[0148] S43: When the predicted health score for the future operating cycle is lower than the dynamically adjusted health threshold, predictive maintenance decision parameters are generated.

[0149] Furthermore, step S50 specifically includes:

[0150] S51: Based on the predictive maintenance decision parameters, generate audio equipment fault warning information and determine the warning level corresponding to the audio equipment fault warning information;

[0151] S52: Compare the warning level with preset conditions. When the warning level reaches the preset conditions, determine the maintenance strategy. The maintenance strategy is generated based on the predictive maintenance decision parameters and the warning level.

[0152] S53: Output maintenance operation suggestions based on the maintenance strategy.

[0153] Furthermore, the maintenance operation recommendations include at least one of the following: software update, component inspection, component replacement, acoustic performance self-calibration, or parameter optimization.

[0154] This invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any of the methods described above.

[0155] If the integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc.

[0156] Of course, there are other forms of readable storage media, such as quantum memories, graphene memories, etc. It should be noted that the content of the computer-readable medium may be appropriately expanded or reduced according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

[0157] The present invention also provides an electronic device. The electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the one or more processors to implement the method provided by the present invention.

[0158] Referring now to Figure 10, a schematic diagram is shown of the structure of a computer system 800 suitable for implementing an electronic device according to embodiments of the present invention. The electronic device shown in Figure 10 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.

[0159] As shown in Figure 10, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 802 or programs loaded from storage section 808 into random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the computer system 800. The CPU 801, ROM 802, and RAM 803 are interconnected via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

[0160] The following components are connected to I/O interface 805: an input section 806 including a keyboard, mouse, etc.; an output section 807 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 808 including a hard disk, etc.; and a communication section 809 including a network interface card such as a LAN card, modem, etc. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to I/O interface 805 as needed. A removable medium 811, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 810 as needed so that computer programs read from it can be installed into storage section 808 as needed.

[0161] In particular, according to the embodiments disclosed in this invention, the processes described in the above main step diagrams can be implemented as computer software programs. For example, embodiments of this invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the main step diagrams. In the above embodiments, the computer program can be downloaded and installed from a network via communication section 809, and/or installed from removable medium 811. When the computer program is executed by central processing unit 801, it performs the functions defined in the system of this invention.

[0162] It should be noted that the computer-readable medium shown in this invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof.

[0163] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. An intelligent predictive maintenance system for audio equipment malfunctions, characterized in that, The system includes: The data acquisition module is used to collect multi-source operating data of the audio equipment during operation, including internal sensor data of the audio equipment, characteristic parameters of the output audio signal, external environmental parameters, and historical operating logs. The feature preprocessing module is used to preprocess the multi-source running data based on a multimodal feature fusion algorithm to obtain running feature data after feature standardization and noise reduction. The health status assessment module includes a pre-trained audio device health status assessment model, which is used to input the operating feature data into the audio device health status assessment model and output the health status assessment result of the audio device; the audio device health status assessment model is a hybrid analysis model that combines convolutional neural networks and long short-term memory networks with an attention mechanism; The fault risk prediction module is used to predict the fault risk of the audio device in real time based on the health status assessment results and generate predictive maintenance decision parameters. The maintenance decision and early warning module is used to output audio equipment fault early warning information according to the predictive maintenance decision parameters, and generate corresponding maintenance operation suggestions according to the predictive maintenance decision parameters and the early warning level when the early warning level represented by the audio equipment fault early warning information meets the preset conditions. 
The health status assessment module specifically includes: The feature decomposition unit is used to input the operational feature data into the audio device health status assessment model, and to perform spectral decomposition and time series feature extraction on the operational feature data in the feature extraction unit of the audio device health status assessment model to obtain frequency domain features and time series features. A frequency domain processing unit is used to input the frequency domain features into the convolutional neural network for processing to obtain a deep representation of the frequency domain features; The time series processing unit is used to input the time series features into the long short-term memory network for processing to obtain a prediction vector of the time series features; The fusion evaluation unit is used, under the action of the attention mechanism unit, to fuse the deep representation of the frequency domain features with the prediction vector of the time series features to generate a comprehensive health status index, and output the health status evaluation result of the audio device based on the index. Specifically, the fault risk prediction module includes: The threshold adjustment unit is used to calculate and dynamically adjust the health threshold of the audio device based on the acoustic sensitivity curve, compare the comprehensive health status index with the dynamically adjusted health threshold, and calculate the current health score. The trend prediction unit is used to input the current health score and the historical health score sequence into the prediction model, calculate the health score change trend in the future operating cycle, and perform weighted processing on the health score change trend based on the importance weights of each acoustic frequency band obtained during the training of the prediction model to obtain the predicted health score for the future operating cycle. 
The decision parameter generation unit is used to generate predictive maintenance decision parameters when the predicted health score for the future operating cycle is lower than the dynamically adjusted health threshold.

2. The intelligent predictive maintenance system for audio equipment failure according to claim 1, characterized in that, The data acquisition module includes: The internal sensor unit includes any number of temperature sensors, current sensors, vibration sensors, air pressure sensors and Hall effect sensors disposed inside the housing of the audio device, for acquiring hardware operating status data of the audio device during operation; An audio feature extraction unit is connected to a microphone array disposed inside the housing of the audio device. It is used to acquire the output audio signal through the microphone array and perform time-domain analysis and frequency-domain analysis based on the output audio signal to extract the characteristic parameters of the output audio signal. An external environment sensing unit is connected to an external environment sensor mounted on the housing of the audio device, and is used to collect external environment parameters of the operating environment of the audio device; The log reading unit is used to read historical operation log data from the storage unit of the audio device or a remote server.

3. The intelligent predictive maintenance system for audio equipment failure according to claim 2, characterized in that, The audio feature extraction unit specifically includes: A multi-scale time-frequency analysis subunit is used to perform joint analysis on the output audio signal acquired by the microphone array using a multi-scale time-frequency fusion algorithm to obtain a multi-scale time-frequency feature representation of the output audio signal. The multi-scale time-frequency feature representation includes: intrinsic mode components obtained by performing joint processing of empirical mode decomposition and variational mode decomposition on the output audio signal in the time domain; and spectral energy distribution obtained by a multi-resolution analysis method based on a combination of improved short-time Fourier transform and continuous wavelet transform in the frequency domain. The audio feature extraction subunit is used to extract the feature parameters of the output audio signal based on the intrinsic mode components and the spectral energy distribution in the multi-scale time-frequency feature representation.

4. The intelligent predictive maintenance system for audio equipment failure according to claim 1, characterized in that, The feature preprocessing module includes: The data synchronization and completion unit is used to perform time synchronization and interpolation completion on the multi-source running data to obtain time-aligned and data-complete multi-source running data. The wavelet denoising unit is used to filter the time-aligned and complete multi-source running data using a wavelet denoising method based on acoustic characteristic constraints, so as to obtain denoised multi-source running data. The feature fusion unit is used to perform feature compression and fusion on the noise-reduced multi-source operating data using principal component analysis combined with acoustic feature weights, so as to obtain fused low-dimensional operating feature data, which preserves the dynamic change trend of the acoustic operating features of the audio device.

5. The intelligent predictive maintenance system for audio equipment failure according to claim 4, characterized in that, The wavelet denoising unit specifically includes: The wavelet decomposition subunit is used to perform wavelet decomposition on various types of sensor data in the time-aligned and complete multi-source running data, and decompose each type of sensor data into wavelet coefficients of different scales. The threshold determination subunit is used to determine the denoising threshold for each scale based on the frequency distribution and harmonic energy characteristics of the output audio signal. A threshold processing subunit is used to perform soft threshold attenuation processing on wavelet coefficients below the denoising threshold, and retain wavelet coefficients above the threshold. The signal reconstruction subunit is used to reconstruct the processed wavelet coefficients by inverse wavelet transform, so as to obtain multi-source operating data with noise reduction and reduced random noise interference while preserving the changes in the operating characteristics of the audio equipment.

6. The intelligent predictive maintenance system for audio equipment failure according to claim 1, characterized in that, The audio device health status assessment model includes: a feature extraction unit, a convolutional neural network and a long short-term memory network connected to the feature extraction unit, and an attention mechanism unit for weighted fusion of the outputs of the convolutional neural network and the long short-term memory network.

7. The intelligent predictive maintenance system for audio equipment failure according to claim 1, characterized in that, The maintenance decision and early warning module specifically includes: The early warning generation unit is used to generate audio equipment fault early warning information based on the predictive maintenance decision parameters, and to determine the early warning level corresponding to the audio equipment fault early warning information. The strategy determination unit is used to compare the warning level with preset conditions, and when the warning level reaches the preset conditions, determine the maintenance strategy, which is generated based on the predictive maintenance decision parameters and the warning level. The suggested output unit is used to output maintenance operation suggestions based on the maintenance strategy.

8. The intelligent predictive maintenance system for audio equipment failure according to claim 1, characterized in that, The maintenance recommendations include at least one of the following: software update, component inspection, component replacement, acoustic performance self-calibration, or parameter optimization.