Machine voiceprint feature modeling method, system, device, medium and program product
By performing time-blocking and frequency importance assessment on two-dimensional time-frequency acoustic spectrograms, the problem of insufficient machine acoustic feature extraction capability in existing technologies is solved, achieving more robust and generalized feature representation and improving the accuracy and adaptability of industrial equipment fault diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NORTHEASTERN UNIV CHINA
- Filing Date
- 2025-08-15
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to effectively capture key time-frequency information from machine acoustic signatures in complex and ever-changing industrial environments, resulting in insufficient accuracy and generalization of fault diagnosis models in identification and anomaly detection.
By performing time-block processing on the two-dimensional time-frequency spectrogram and introducing a frequency importance assessment mechanism and a sorting pooling strategy, feature vectors of local time periods and key frequency regions are extracted to construct a machine voiceprint feature matrix.
It significantly improves the identification accuracy and robustness of fault diagnosis models in complex environments, strengthens the modeling of key local features, and is suitable for voiceprint modeling tasks for various types of industrial equipment.
[Abstract figure: CN120766686B_ABST]
Abstract
Description
Technical Field
[0001] This invention belongs to the field of voiceprint analysis technology and specifically relates to machine voiceprint feature modeling methods, systems, devices, media, and program products.
Background Technology
[0002] As industrial automation continues to advance, industrial machines are becoming more complex and their operating environments more variable. Both trends place higher demands on the real-time monitoring and intelligent diagnosis of equipment operating status. Traditional fault diagnosis methods often rely on human experience, which is inefficient and easily affected by subjective factors. In recent years, signal processing and artificial intelligence technologies have developed rapidly, and industrial machine voiceprint analysis has become a key means of addressing this problem. A "machine voiceprint" is the characteristic sound pattern a machine emits under a specific structure, working condition, and operating state. Voiceprint features extracted from this pattern can be used to determine whether the equipment is operating normally or has potential abnormalities.
[0003] However, in actual production, industrial machines vary widely in type and operate in complex states, and the spectral structure of their sounds differs significantly between devices. Environmental noise and scarce fault data add further challenges. Under such domain-shift conditions, it is difficult to characterize machine voiceprints effectively using only traditional time-domain and frequency-domain audio features (such as root mean square value, short-time energy, Mel-frequency cepstral coefficients, and power spectral density). Current methods address these issues in one of two ways. Some convert the one-dimensional sound signal into a two-dimensional time-frequency spectrogram and extract its statistical features for voiceprint modeling and classification. Others use neural networks to extract more complex and abstract representations directly from the raw sound signal in an end-to-end manner. These methods improve voiceprint analysis to some extent. However, they generally use global average pooling or statistical averaging in the feature aggregation stage. As a result, they cannot dynamically model local differences in the time-frequency spectrogram or highlight key time periods and important frequency regions, which limits their applicability and generalization in complex and variable operating environments.
[0004] Therefore, an effective method for representing and modeling machine voiceprint features is of great significance. Such a method would improve the accuracy and robustness of industrial voiceprints in tasks such as anomaly detection, fault diagnosis, and identification.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention provides a machine voiceprint feature modeling method, system, device, medium, and program product. Industrial practice currently lacks a feature representation technique that can efficiently and fully capture the key time-frequency information of machine voiceprints, and the invention fills this gap. It thereby improves the accuracy and generalization of fault diagnosis models in identifying the operating status of industrial machines and detecting anomalies in complex environments.
[0006] In a first aspect, the present invention provides a machine voiceprint feature modeling method, comprising the following steps:
[0007] Step 1: Collect the operating sound signals of industrial equipment, preprocess them, and generate a two-dimensional time-frequency spectrogram;
[0008] Preprocessing of the operating sound signals includes pre-emphasis, framing, windowing, noise reduction, and the short-time Fourier transform (STFT). A Mel transform then converts the one-dimensional audio signal into a two-dimensional time-frequency spectrogram $X \in \mathbb{R}^{F \times T}$, where $F$ is the frequency dimension and $T$ is the time dimension;
[0009] Step 2: Divide the two-dimensional time-frequency spectrogram into several non-overlapping sub-blocks;
[0010] The two-dimensional time-frequency spectrogram $X$ is divided along the time dimension $T$ into $N$ non-overlapping sub-blocks of length $W$: $X = [X_1, X_2, \dots, X_N]$ with $X_n \in \mathbb{R}^{F \times W}$, where the block length is $W = T / N$;
[0011] Step 3: Within each sub-block of the two-dimensional time-frequency spectrogram, sort the time frames by importance along the time dimension. Then apply weighted pooling to each sorted sub-block to reduce it to a one-dimensional feature vector.
[0012] Step 3.1: For each sub-block $X_n$, sort the energy values of each frequency band in descending order along the time dimension. The sorted sub-block is denoted $\tilde{X}_n \in \mathbb{R}^{F \times W}$;
[0013] Step 3.2: Apply a learnable pooling vector $p \in \mathbb{R}^{W}$ to each sorted sub-block $\tilde{X}_n$ (weighted pooling). This yields a one-dimensional feature vector $z_n \in \mathbb{R}^{F}$ that represents the voiceprint characteristics of the sub-block within its local time period, as shown in the formulas below:
[0014] $$p = \frac{1}{S(r)}\left[r^{0}, r^{1}, \dots, r^{W-1}\right]^{\top}$$
[0015] $$z_n = \tilde{X}_n\, p$$
[0016] where the superscript $\top$ denotes the matrix transpose, $r \in [0,1]$ is a learnable parameter, and $S(r) = \sum_{j=1}^{W} r^{j-1}$ is the sum of the powers of $r$;
[0017] Step 4: Calculate the frequency attention weights of the two-dimensional time-frequency spectrogram;
[0018] Step 4.1: Calculate the frequency importance value of the two-dimensional time-frequency spectrogram;
[0019] For each frequency band $f \in \{1, \dots, F\}$, compute the average of the spectrogram $X$ over the time dimension $T$. Use this average as the frequency importance value, which measures the overall energy level of the band, as shown in the formula below:
[0020] $$a_f = \frac{1}{T} \sum_{t=1}^{T} X(f, t), \qquad a = [a_1, a_2, \dots, a_F]^{\top}$$
[0021] A larger frequency importance value $a_f$ indicates that frequency band $f$ is more important;
[0022] Step 4.2: Calculate the frequency attention weight of the two-dimensional time-frequency spectrogram based on the frequency importance value of the two-dimensional time-frequency spectrogram;
[0023] A multilayer perceptron network followed by a Softmax normalization layer maps the frequency importance values to a weight vector over the frequency bands, as shown in the formula below:
[0024] $$A = \mathrm{Softmax}(K a + b)$$
[0025] where $K$ and $b$ are learnable parameters, and $A = [A_1, A_2, \dots, A_F]^{\top}$ with $\sum_{f=1}^{F} A_f = 1$;
[0026] Step 5: Based on the frequency attention weights of the two-dimensional time-frequency spectrogram and the one-dimensional feature vector of each sub-block, calculate the weighted feature vector of each sub-block of the two-dimensional time-frequency spectrogram;
[0027] The frequency weight vector $A$ is multiplied element-wise with the one-dimensional feature vector $z_n$ of each sub-block in turn. This weights each sub-block by frequency importance and yields its weighted feature vector $\hat{z}_n$, as shown in the formula below:
[0028] $$\hat{z}_n = A \odot z_n, \qquad n = 1, 2, \dots, N$$
[0029] Step 6: Concatenate the weighted feature vectors of each sub-block sequentially to obtain the machine voiceprint feature matrix;
[0030] The weighted feature vectors $\hat{z}_n$ of all sub-blocks are concatenated in order to obtain the machine voiceprint feature matrix, as shown in the formula below:
[0031] $$V = [\hat{z}_1, \hat{z}_2, \dots, \hat{z}_N] \in \mathbb{R}^{F \times N}$$
[0032] In a second aspect, the present invention provides a machine voiceprint feature modeling system. The system includes an industrial sound signal data acquisition module, a sound preprocessing and noise reduction module, and a machine voiceprint representation and modeling module;
[0033] The industrial sound signal data acquisition module collects the operating sound signals of industrial equipment, stores them as audio signals, and transmits them to the sound preprocessing and noise reduction module.
[0034] The sound preprocessing and noise reduction module preprocesses the collected sound signals and uses a Mel transform to convert the one-dimensional audio signal into a two-dimensional time-frequency spectrogram. It then transmits the spectrogram to the machine voiceprint representation and modeling module.
[0035] The machine voiceprint representation and modeling module is used to process the two-dimensional time-frequency spectrogram. It divides the two-dimensional time-frequency spectrogram into several sub-blocks in the time dimension and extracts the one-dimensional feature vector of each sub-block. At the same time, it calculates the weighted feature vector of each sub-block based on frequency attention weights. Finally, it concatenates the weighted feature vectors of each sub-block in sequence to obtain the machine voiceprint feature matrix.
[0036] In a third aspect, the present invention provides an electronic device. The device includes one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the machine voiceprint feature modeling method.
[0037] In a fourth aspect, the present invention provides a computer-readable storage medium storing executable instructions that, when executed, cause a processor to perform the machine voiceprint feature modeling method.
[0038] In a fifth aspect, the present invention provides a computer program product. The product includes a computer program or instructions that, when executed by a processor, implement the machine voiceprint feature modeling method described above.
[0039] The beneficial effects of the above technical solution are as follows. Existing industrial voiceprint feature extraction methods extract key information weakly, have insufficient time-frequency resolution, and generalize poorly. The machine voiceprint feature modeling method provided by this invention addresses these problems. It strengthens the modeling of local voiceprint features, improves recognition and diagnosis along the frequency dimension, and increases the adaptability and robustness of the diagnostic model, giving it broad application prospects and practical value. Specifically:
[0040] The original time-frequency spectrogram is divided into multiple time blocks, and weighted pooling and feature aggregation are performed within each local block. This significantly improves the model's ability to identify key local features. It also avoids the blurred feature representation and the neglect of dynamic time information caused by global pooling in conventional methods.
[0041] Introducing a frequency importance assessment mechanism makes the voiceprint representation modeling process focus more on frequency regions that are strongly correlated with the device's operating status, thereby improving the recognition accuracy under abnormal conditions.
[0042] A unified structured strategy of segmentation, adaptive weighting, and aggregation efficiently captures the voiceprint characteristics of different machine types and learns more robust, higher-level audio representations. This further improves the recognition accuracy and generalization of subsequent diagnostic models in non-ideal acoustic settings, such as those involving multiple operating states and multiple device types.
[0043] This invention has a simple, interpretable structure. It can be flexibly applied to voiceprint modeling of the operating status of various types of industrial equipment, including power equipment, machine tools, fans, pumps, and valves. It has good engineering adaptability and deployment feasibility, and it can provide reliable voiceprint modeling support for industrial equipment fault detection and diagnosis.
Attached Figure Description
[0044] Figure 1 is a flowchart of the segmented time-frequency weighted pooling representation modeling of machine voiceprints provided in Embodiment 1 of the present invention;
[0045] Figure 2 is a schematic diagram of the machine voiceprint feature modeling system provided in Embodiment 2 of the present invention;
[0046] Figure 3 shows the voiceprint feature modeling results for the sound signal of a transformer in an abnormal operating state, as provided in Embodiment 3 of the present invention: (a) is the original spectrogram of the sound signal; (b) is the sorted two-dimensional time-frequency spectrogram; (c) is the pooling vector; and (d) is the machine voiceprint feature matrix.
Detailed Implementation
[0047] The specific implementation methods of this application will be further described in detail below with reference to the accompanying drawings and embodiments.
[0048] Example 1
[0049] Several methods exist for sound feature extraction and modeling in industrial machine acoustic signature analysis. The most common converts the raw audio signal into a two-dimensional time-frequency spectrogram using the short-time Fourier transform (STFT). It then constructs feature vectors from statistics, such as the mean or variance, of the entire spectrogram. These methods are simple to implement and computationally efficient, but they fail to highlight differences between key time segments and important frequency regions. In recent years, deep learning methods have increasingly been applied to audio feature representation. For example, convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks are used to extract deep features from audio. However, these network structures are complex and the extracted features lack interpretability. They also require substantial computational resources and training data to reach a good model, and such resources are often unavailable in real industrial environments. Another approach applies global weighted average pooling along the time dimension of the spectrogram to obtain a one-dimensional pooling vector that represents the machine's acoustic signature. However, machine operating sounds are strongly non-stationary and diverse. A single global model of the entire signal segment cannot capture information between time segments, so short-term dynamic features are neglected. This makes such methods unsuitable for the complex and ever-changing working conditions of practical applications.
[0050] The aforementioned audio feature extraction and representation modeling methods still have some problems, such as ignoring temporal dynamic anomaly information, failing to perform differentiated modeling of key frequency regions, and lacking interpretability in the representation learning process.
[0051] This embodiment proposes a machine voiceprint feature modeling method. By performing time block processing on the two-dimensional time-frequency acoustic spectrogram and introducing a frequency importance assessment mechanism and a sorting pooling strategy, the voiceprint characteristics of key time and frequency frames are fully captured and learned, effectively improving the performance and adaptability of industrial machine voiceprint modeling in actual fault diagnosis tasks.
[0052] As shown in Figure 1, the machine voiceprint feature modeling method in this embodiment includes the following steps:
[0053] Step 1: Collect the operating sound signals of industrial equipment, preprocess them, and generate a two-dimensional time-frequency spectrogram;
[0054] Preprocessing of the operating sound signals includes pre-emphasis, framing, windowing, noise reduction, and the short-time Fourier transform (STFT). A Mel transform then converts the one-dimensional audio signal into a two-dimensional time-frequency spectrogram $X \in \mathbb{R}^{F \times T}$, where $F$ is the frequency dimension and $T$ is the time dimension;
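The preprocessing chain above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The text does not specify any of the following, so they are all illustrative choices: the 16 kHz sampling rate, 1024-sample frames, 512-sample hop, pre-emphasis coefficient 0.97, Hamming window, HTK-style mel scale, 64 mel bands, and log compression. The noise-reduction step is omitted. The function names `mel_filterbank` and `log_mel_spectrogram` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption: the patent does not name a variant)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(left, center):           # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=1024, hop=512, n_mels=64, alpha=0.97):
    """Return X with shape (F, T) = (n_mels, n_frames); assumes len(signal) >= n_fft."""
    # Pre-emphasis: y[t] = x[t] - alpha * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: overlapping frames of n_fft samples, advanced by hop samples
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_fft)  # windowing
    # (Noise reduction would go here; omitted in this sketch.)
    # STFT power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel transform, then log compression (small floor avoids log(0))
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10).T
```

For a one-second signal at 16 kHz with these settings, the result has 64 rows (frequency, $F$) and 30 columns (time frames, $T$), matching the $F \times T$ layout used in the following steps.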
[0055] Step 2: Divide the two-dimensional time-frequency spectrogram into several non-overlapping sub-blocks;
[0056] The two-dimensional time-frequency spectrogram $X$ is divided along the time dimension $T$ into $N$ non-overlapping sub-blocks of length $W$: $X = [X_1, X_2, \dots, X_N]$ with $X_n \in \mathbb{R}^{F \times W}$, where the block length is $W = T / N$;
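The time-blocking step amounts to a reshape of the spectrogram. In the minimal sketch below, `split_time_blocks` is a hypothetical helper. The text does not say how to handle a $T$ that is not divisible by $N$, so this sketch assumes that trailing frames beyond $N \cdot W$ are dropped.

```python
import numpy as np

def split_time_blocks(X: np.ndarray, n_blocks: int) -> np.ndarray:
    """Split X of shape (F, T) into n_blocks non-overlapping blocks of W = T // n_blocks frames.

    Returns an array of shape (N, F, W) where result[n] == X[:, n*W:(n+1)*W].
    Frames beyond N*W are dropped (assumption for the non-divisible case).
    """
    F, T = X.shape
    W = T // n_blocks
    if W == 0:
        raise ValueError("n_blocks exceeds the number of time frames")
    trimmed = X[:, : n_blocks * W]
    # (F, N*W) -> (F, N, W) groups consecutive frames; then move the block axis first
    return trimmed.reshape(F, n_blocks, W).transpose(1, 0, 2)
```

Because the reshape only regroups consecutive columns, each block keeps its frames in their original temporal order.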
[0057] Step 3: Within each sub-block of the two-dimensional time-frequency spectrogram, sort the time frames by importance along the time dimension. Then apply weighted pooling to each sorted sub-block to reduce it to a one-dimensional feature vector.
[0058] Step 3.1: For each sub-block $X_n$, sort the energy values of each frequency band in descending order along the time dimension. The sorted sub-block is denoted $\tilde{X}_n \in \mathbb{R}^{F \times W}$;
[0059] Step 3.2: Apply a learnable pooling vector $p \in \mathbb{R}^{W}$ to each sorted sub-block $\tilde{X}_n$ (weighted pooling). This yields a one-dimensional feature vector $z_n \in \mathbb{R}^{F}$ that represents the voiceprint characteristics of the sub-block within its local time period, as shown in the formulas below:
[0060] $$p = \frac{1}{S(r)}\left[r^{0}, r^{1}, \dots, r^{W-1}\right]^{\top}$$
[0061] $$z_n = \tilde{X}_n\, p$$
[0062] where the superscript $\top$ denotes the matrix transpose, $r \in [0,1]$ is a learnable parameter, and $S(r) = \sum_{j=1}^{W} r^{j-1}$ is the sum of the powers of $r$;
[0063] When the learnable parameter $r = 0$, the operation reduces to max pooling, which ignores the stationary characteristics of the operating state. When $r = 1$, it reduces to average pooling, which ignores the transient characteristics of the operating state. The learnable parameter $r$ therefore takes different values, adapted to the operating characteristics of different industrial equipment. Compared with global sorting pooling, this block-wise method better preserves the temporal information of the two-dimensional time-frequency spectrogram $X$ and models the machine's acoustic characteristics more robustly. The one-dimensional feature vector $z_n$ of each sorted sub-block $\tilde{X}_n$ has length $F$, which is far lower in dimensionality than the $F \times T$ spectrogram $X$;
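The sorting and weighted pooling of Steps 3.1 and 3.2 can be sketched as below. The pooling-vector form $p_j \propto r^{j-1}$ is an assumption inferred from the stated behaviour (max pooling at $r = 0$, average pooling at $r = 1$); this form matches global weighted ranking pooling applied per block. Here $r$ is a fixed float rather than a trained parameter, and `pooling_vector` and `sorted_weighted_pool` are hypothetical names.

```python
import numpy as np

def pooling_vector(r: float, W: int) -> np.ndarray:
    """p = [r^0, r^1, ..., r^(W-1)] / S(r), where S(r) is the sum of those powers."""
    powers = r ** np.arange(W, dtype=float)  # NumPy evaluates 0.0 ** 0.0 as 1.0
    return powers / powers.sum()

def sorted_weighted_pool(blocks: np.ndarray, r: float) -> np.ndarray:
    """blocks: (N, F, W) sub-blocks -> (N, F), one pooled vector z_n per sub-block."""
    # Step 3.1: sort each frequency row of each block in descending order along time
    sorted_blocks = -np.sort(-blocks, axis=-1)
    # Step 3.2: z_n = X~_n p (matmul with a 1-D vector contracts the last axis)
    return sorted_blocks @ pooling_vector(r, blocks.shape[-1])
```

With `r = 0.0` the weights are `[1, 0, ..., 0]`, so each row keeps only its largest value (max pooling). With `r = 1.0` the weights are uniform (average pooling). Intermediate values give a decaying emphasis on the highest-energy frames. To make $r$ trainable, one could parametrize it as a sigmoid of an unconstrained variable in an autodiff framework; that choice is an assumption, not something the text specifies.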
[0064] Step 4: Calculate the frequency attention weights of the two-dimensional time-frequency spectrogram;
[0065] The weighted pooling in Step 3 models structure along the time dimension. However, it does not account for the different importance of frequency bands across machine types and operating conditions, and this varying importance is crucial for anomaly monitoring and fault diagnosis of industrial equipment. Extracting frequency importance scores and assigning different weights to different frequency bands lets the classifier focus on the frequency regions most critical for fault detection.
[0066] Step 4.1: Calculate the frequency importance value of the two-dimensional time-frequency spectrogram;
[0067] For each frequency band $f \in \{1, \dots, F\}$, compute the average of the spectrogram $X$ over the time dimension $T$. Use this average as the frequency importance value, which measures the overall energy level of the band, as shown in the formula below:
[0068] $$a_f = \frac{1}{T} \sum_{t=1}^{T} X(f, t), \qquad a = [a_1, a_2, \dots, a_F]^{\top}$$
[0069] A larger frequency importance value $a_f$ indicates that frequency band $f$ is more important;
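The frequency importance value is a row mean of the spectrogram. The small worked example below uses a made-up 2-band, 3-frame matrix; `frequency_importance` is a hypothetical name.

```python
import numpy as np

def frequency_importance(X: np.ndarray) -> np.ndarray:
    """a_f = (1/T) * sum_t X[f, t]; returns a vector of length F."""
    return X.mean(axis=1)

# Toy spectrogram: F = 2 bands, T = 3 frames (illustrative values only)
X = np.array([[1.0, 3.0, 5.0],
              [0.0, 0.0, 3.0]])
a = frequency_importance(X)
# a == [3.0, 1.0]: band 0 has the higher average energy, so it ranks as more important
```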
[0070] Step 4.2: Calculate the frequency attention weight of the two-dimensional time-frequency spectrogram based on the frequency importance value of the two-dimensional time-frequency spectrogram;
[0071] A multilayer perceptron network followed by a Softmax normalization layer maps the frequency importance values to a weight vector over the frequency bands, as shown in the formula below:
[0072] $$A = \mathrm{Softmax}(K a + b)$$
[0073] where $K$ and $b$ are learnable parameters, and $A = [A_1, A_2, \dots, A_F]^{\top}$ with $\sum_{f=1}^{F} A_f = 1$;
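A minimal sketch of the frequency attention weights is shown below. It assumes the perceptron is a single affine layer $K a + b$ with $K \in \mathbb{R}^{F \times F}$ and $b \in \mathbb{R}^{F}$, since $K$ and $b$ are the only parameters named. A deeper network with hidden layers would also fit the description "multilayer perceptron". The parameter values are untrained and purely illustrative.

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    # Subtract the maximum for numerical stability; the output sums to 1
    e = np.exp(v - v.max())
    return e / e.sum()

def frequency_attention(a: np.ndarray, K: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A = Softmax(K a + b) with K of shape (F, F) and b of shape (F,)."""
    return softmax(K @ a + b)

# Example with F = 4 bands and randomly initialised (untrained) parameters
rng = np.random.default_rng(0)
F = 4
a = np.array([2.0, 0.5, 1.0, 3.0])      # frequency importance values a_f
K = rng.normal(scale=0.1, size=(F, F))   # hypothetical initial weights
b = np.zeros(F)
A = frequency_attention(a, K, b)
```

In training, $K$ and $b$ would be updated by the downstream diagnostic loss, so the learned weights need not follow the raw energy ranking of $a$.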
[0074] Step 5: Based on the frequency attention weights of the two-dimensional time-frequency spectrogram and the one-dimensional feature vector of each sub-block, calculate the weighted feature vector of each sub-block of the two-dimensional time-frequency spectrogram;
[0075] The frequency weight vector $A$ is multiplied element-wise with the one-dimensional feature vector $z_n$ of each sub-block in turn. This weights each sub-block by frequency importance and yields its weighted feature vector $\hat{z}_n$, as shown in the formula below:
[0076] $$\hat{z}_n = A \odot z_n, \qquad n = 1, 2, \dots, N$$
[0077] Step 6: Concatenate the weighted feature vectors of each sub-block sequentially to obtain the machine voiceprint feature matrix;
[0078] The weighted feature vectors $\hat{z}_n$ of all sub-blocks are concatenated in order to obtain the machine voiceprint feature matrix, as shown in the formula below:
[0079] $$V = [\hat{z}_1, \hat{z}_2, \dots, \hat{z}_N] \in \mathbb{R}^{F \times N}$$
[0080] The machine voiceprint feature matrix V takes into account information from both key time frames and frequency frames, enabling a more robust and generalized feature representation of machine voiceprint characteristics. This allows it to serve as input for subsequent fault detection, identification, and diagnosis modules, improving their diagnostic accuracy and robustness.
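Steps 5 and 6 reduce to a broadcast multiplication followed by stacking the weighted vectors as columns. In the sketch below, `voiceprint_matrix` is a hypothetical name. The sketch assumes the pooled vectors are held row-wise in an array `Z` of shape (N, F), which is the layout the per-block pooling naturally produces.

```python
import numpy as np

def voiceprint_matrix(Z: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Build V from pooled sub-block vectors.

    Z: shape (N, F); row n is the pooled vector z_n of sub-block n.
    A: shape (F,); frequency attention weights.
    Returns V of shape (F, N); column n is A * z_n, in temporal order.
    """
    weighted = Z * A[None, :]   # Step 5: element-wise weighting of every z_n by A
    return weighted.T           # Step 6: place the weighted vectors side by side as columns
```

Keeping the blocks as columns in their original order preserves the coarse temporal layout ($N$ time blocks). The frequency weighting is shared across blocks because $A$ is computed once per spectrogram.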
[0081] This embodiment proposes a machine voiceprint feature modeling method. It divides the time-frequency spectrogram into multiple local blocks along the time dimension and weights each block in turn to strengthen the modeling of key temporal regions. This addresses the coarse feature granularity and the neglect of dynamic time information caused by global pooling in traditional methods. A frequency importance evaluation mechanism and a sorting pooling strategy weight the frequency-dimension features within each time block. This highlights the diagnostic value of key frequency bands and improves the discriminative power of the voiceprint representation across machine types and operating conditions. The method follows a structured local representation modeling process: starting from the original two-dimensional time-frequency spectrogram, it performs segmented modeling through unified block division, adaptive weighting, and aggregation. This avoids the loss of key information caused by global sorting average pooling and yields a more targeted, robust, and generalizable voiceprint feature matrix, enhancing the model's local sensitivity and fine-grained recognition capability. The proposed method is lightweight and highly scalable. It can be integrated efficiently into existing industrial acoustic signal diagnostic processes as a feature extraction stage and is suitable for voiceprint modeling under various equipment types and operating conditions.
[0082] Compared with existing audio feature extraction methods, the machine voiceprint feature modeling method in this embodiment can adaptively adjust and optimize hyperparameters according to the machine type in the actual industrial scenario. It can model a more robust voiceprint feature representation as input to the diagnostic model, improve the accuracy and interpretability of the diagnostic results, and has good prospects for practical application and promotion.
[0083] Example 2
[0084] As shown in Figure 2, this embodiment proposes a machine voiceprint feature modeling system. The system includes an industrial sound signal data acquisition module, a sound preprocessing and noise reduction module, and a machine voiceprint representation and modeling module.
[0085] The industrial sound signal data acquisition module collects the operating sound signals of industrial equipment, stores them as audio signals, and transmits them to the sound preprocessing and noise reduction module.
[0086] The sound preprocessing and noise reduction module preprocesses the collected sound signals and uses a Mel transform to convert the one-dimensional audio signal into a two-dimensional time-frequency spectrogram. It then transmits the spectrogram to the machine voiceprint representation and modeling module.
[0087] The machine voiceprint representation and modeling module is used to process the two-dimensional time-frequency spectrogram. It divides the two-dimensional time-frequency spectrogram into several sub-blocks in the time dimension and extracts the one-dimensional feature vector of each sub-block. At the same time, it calculates the weighted feature vector of each sub-block based on frequency attention weights. Finally, it concatenates the weighted feature vectors of each sub-block in sequence to obtain the machine voiceprint feature matrix.
[0088] Example 3
[0089] This embodiment proposes a method for acoustic signature modeling and fault diagnosis under abnormal transformer conditions, based on the aforementioned machine acoustic signature modeling method. The acoustic signature modeling process for the sound signal under abnormal transformer operating conditions adopts the machine acoustic signature modeling method of Embodiment 1, and includes the following steps:
[0090] S1: Collect sound signals of the transformer under abnormal operating conditions and preprocess them to obtain the two-dimensional time-frequency spectrogram $X$ of these signals.
[0091] Preprocessing of the sound signals under abnormal transformer operation includes pre-emphasis, framing, windowing, noise reduction, and the short-time Fourier transform (STFT). A Mel transform then converts the one-dimensional audio signal into the two-dimensional time-frequency spectrogram $X \in \mathbb{R}^{F \times T}$, where $F$ is the frequency dimension and $T$ is the time dimension;
[0092] S2: Divide the two-dimensional time-frequency spectrogram $X$ of the sound signal under abnormal transformer operation into several non-overlapping sub-blocks;
[0093] The spectrogram $X$ of the sound signal under abnormal transformer operation is divided along the time dimension $T$ into $N$ non-overlapping sub-blocks of length $W$: $X = [X_1, X_2, \dots, X_N]$ with $X_n \in \mathbb{R}^{F \times W}$, where the block length is $W = T / N$;
[0094] S3: Within each sub-block of the spectrogram of the sound signal under abnormal transformer operation, sort the time frames by importance along the time dimension. Then apply weighted pooling to each sorted sub-block to reduce it to a one-dimensional feature vector.
[0095] S3.1: For each sub-block $X_n$ of the spectrogram $X$ of the sound signal under abnormal transformer operation, sort the energy values of each frequency band in descending order along the time dimension. The sorted sub-block is denoted $\tilde{X}_n \in \mathbb{R}^{F \times W}$;
[0096] S3.2: Apply a learnable pooling vector $p \in \mathbb{R}^{W}$ to each sorted sub-block $\tilde{X}_n$ (weighted pooling). This yields a one-dimensional feature vector $z_n \in \mathbb{R}^{F}$ that represents the voiceprint characteristics of the sub-block within its local time period, as shown in the formulas below:
[0097] $$p = \frac{1}{S(r)}\left[r^{0}, r^{1}, \dots, r^{W-1}\right]^{\top}$$
[0098] $$z_n = \tilde{X}_n\, p$$
[0099] where the superscript $\top$ denotes the matrix transpose, $r \in [0,1]$ is a learnable parameter, and $S(r) = \sum_{j=1}^{W} r^{j-1}$ is the sum of the powers of $r$;
[0100] S4: Calculate the frequency attention weight of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation conditions;
[0101] S4.1: Calculate the frequency importance value of the two-dimensional time-frequency spectrogram of the sound signal under abnormal operation of the transformer;
[0102] For each frequency band $f \in \{1, \dots, F\}$, compute the average of the spectrogram $X$ of the sound signal under abnormal transformer operation over the time dimension $T$. Use this average as the frequency importance value, which measures the overall energy level of the band, as shown in the formula below:
[0103] $$a_f = \frac{1}{T} \sum_{t=1}^{T} X(f, t), \qquad a = [a_1, a_2, \dots, a_F]^{\top}$$
[0104] A larger frequency importance value $a_f$ indicates that frequency band $f$ is more important;
[0105] S4.2: Based on the frequency importance value of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation, calculate the frequency attention weight of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation.
[0106] A multilayer perceptron network followed by a Softmax normalization layer maps the frequency importance values to a weight vector over the frequency bands, as shown in the formula below:
[0107] $$A = \mathrm{Softmax}(K a + b)$$
[0108] where $K$ and $b$ are learnable parameters, and $A = [A_1, A_2, \dots, A_F]^{\top}$ with $\sum_{f=1}^{F} A_f = 1$;
[0109] S5: Based on the frequency attention weights of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation and the one-dimensional feature vector of each sub-block, calculate the weighted feature vector of each sub-block of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation.
[0110] The frequency weight vector $A$ of the sound signal under abnormal transformer operation is multiplied element-wise with the one-dimensional feature vector $z_n$ of each sub-block in turn. This weights each sub-block by frequency importance and yields its weighted feature vector $\hat{z}_n$, as shown in the formula below:
[0111] $$\hat{z}_n = A \odot z_n, \qquad n = 1, 2, \dots, N$$
[0112] S6: The weighted feature vector of each sub-block of the two-dimensional time-frequency spectrogram of the sound signal under abnormal transformer operation. By sequentially splicing the data, the machine acoustic signature matrix of the sound signal under abnormal transformer operation conditions is obtained. As shown in the formula below:
[0113]
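Concatenating the N weighted vectors in order produces the feature matrix. The text does not recover the matrix orientation. The sketch below assumes the vectors are stacked as columns, giving an (F, N) matrix whose columns follow the time order of the sub-blocks. `voiceprint_matrix_from` is a hypothetical name.

```python
import numpy as np


def voiceprint_matrix_from(weighted: list[np.ndarray]) -> np.ndarray:
    """Concatenate N weighted sub-block vectors of shape (F,) in order.

    Assumption: the vectors become the columns of an (F, N) matrix, so
    column n corresponds to the n-th time sub-block.
    """
    if not weighted:
        raise ValueError("need at least one sub-block vector")
    return np.stack(weighted, axis=1)


weighted = [np.array([1.0, 0.6, 0.4]), np.array([5.0, 0.0, 1.0])]
M = voiceprint_matrix_from(weighted)
print(M.shape)  # (3, 2)
```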
[0114] S7: The machine voiceprint feature matrix of the sound signal under abnormal transformer operation is input into the fault diagnosis model to obtain the fault diagnosis result for the transformer;
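The text does not say how the feature matrix is shaped for each of the diagnostic models used in this embodiment (SVM, BP, LSTM). One common choice is shown here purely as an assumption. The (F, N) matrix is flattened into a single vector for the SVM and BP models, and it is presented to the LSTM as a sequence of N time steps with F features each. Both helper names are hypothetical.

```python
import numpy as np


def to_vector_input(M: np.ndarray) -> np.ndarray:
    """Flatten an (F, N) feature matrix into one (F * N,) vector for SVM / BP."""
    return M.reshape(-1)


def to_sequence_input(M: np.ndarray) -> np.ndarray:
    """Turn an (F, N) matrix into N time steps of F features for an LSTM."""
    return M.T


M = np.arange(6.0).reshape(3, 2)   # toy matrix with F = 3 bands, N = 2 sub-blocks
print(to_vector_input(M).shape)    # (6,)
print(to_sequence_input(M).shape)  # (2, 3)
```

Keeping the sequence layout for the LSTM preserves the time order of the sub-blocks, which flattening discards for the SVM and BP models.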
[0115] In this embodiment, SVM, BP, and LSTM models are used as the fault diagnosis models. Figure 3 illustrates the intermediate results of applying the machine voiceprint feature modeling method of this embodiment to the sound signal of a transformer in an abnormal state. Panel (a) is the original spectrogram of the sound signal, panel (b) is the sorted two-dimensional time-frequency spectrogram, panel (c) is the pooling vector, and panel (d) is the machine voiceprint feature matrix of the sound signal. To verify the effectiveness of the invention in modeling the operating sounds of industrial machines in practice, this embodiment selected audio samples of a transformer in an abnormal state. It then performed, in sequence, extraction of the original spectrogram, sorting along the time dimension, construction of the pooling vector, and output of the corresponding time-frequency feature matrix. These four stages correspond to panels (a), (b), (c), and (d) of Figure 3, respectively. The results show that the invention models the voiceprint representation effectively, compressing redundant information while preserving the main temporal and frequency structures. Furthermore, the voiceprint feature matrix in Figure 3(d) shows that the method highlights the main energy regions of the abnormal sound signal and suppresses uninformative frequency bands and time frames, providing more robust and discriminative input features for the subsequent diagnostic models. This further verifies the practicality and effectiveness of the invention in industrial scenarios.
[0116] Table 1 compares the recognition accuracy of three transformer fault diagnosis models (SVM, BP, and LSTM) before and after applying the machine voiceprint feature modeling method proposed in this embodiment. To verify the effectiveness and practicality of the invention in modeling the operating sound signals of industrial equipment, this embodiment uses a dataset of audio recorded from transformers in actual operation. Each model is trained and evaluated on two kinds of features: those extracted directly from the original spectrogram by the traditional method, and those produced by the voiceprint representation modeling method of this invention. To ensure a fair comparison, the training and test sets are split in a 4:1 ratio, and the three models, all commonly used in practice, are run with their default parameter settings unchanged. Table 1 shows the recognition accuracies obtained with the traditional method and with the method of the invention. With the feature modeling method of the invention, all three diagnostic models achieve higher recognition accuracy than with the basic feature extraction method. This indicates that the extracted feature representation helps the models learn information that is key to fault identification.
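A 4:1 training-to-test ratio means 80% of the samples are used for training and 20% for testing. The source does not state whether the samples were shuffled or stratified by class. The sketch below assumes a simple random permutation with a fixed seed. `split_4_to_1` is a hypothetical name.

```python
import numpy as np


def split_4_to_1(n_samples: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Randomly split sample indices into training and test sets at a 4:1 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = (4 * n_samples) // 5   # 4 of every 5 samples go to training
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = split_4_to_1(100)
print(len(train_idx), len(test_idx))  # 80 20
```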
[0117] Table 1. Comparison of the recognition results of the three fault diagnosis models before and after applying the voiceprint feature modeling method of this embodiment.
[0118]
[0119] The experimental results in Figure 3 and Table 1 verify the effectiveness and universality of the invention in modeling the voiceprint features of industrial sound signals. The invention can be applied as a general voiceprint feature extraction and modeling solution for fault detection and diagnosis of industrial equipment, improving the recognition accuracy and robustness of the diagnostic model.
[0120] Example 4
[0121] This embodiment proposes an electronic device, including: one or more processors, and a memory, wherein the memory is used to store instructions, and when the instructions are executed by the one or more processors, the one or more processors execute the machine voiceprint feature modeling method.
[0122] The electronic device may be a mobile phone, computer, or tablet computer, etc., and includes a memory and a processor. The memory stores a computer program, which, when executed by the processor, implements the machine voiceprint feature modeling method as described in the embodiments. It is understood that the electronic device may also include an input/output (I/O) interface and communication components.
[0123] The processor is used to execute all or part of the steps in the machine voiceprint feature modeling method as described in the above embodiments. The memory is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.
[0124] The processor can be implemented as an Application Specific Integrated Circuit (ASIC), Digital Signal Processor (DSP), Programmable Logic Device (PLD), Field Programmable Gate Array (FPGA), controller, microcontroller, microprocessor, or other electronic components, and is used to execute the machine voiceprint feature modeling method described in the above embodiments.
[0125] Example 5
[0126] This embodiment proposes a computer-readable storage medium storing executable instructions which, when executed, implement the machine voiceprint feature modeling method. If the method is implemented in the form of software functional units and sold or used as an independent product, it can be stored in a computer-readable storage medium.
[0127] The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the machine voiceprint feature modeling method described in the various embodiments of this application.
[0128] The aforementioned storage media include flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random-access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, servers, app stores, and other media capable of storing program code. These media store computer programs which, when executed by a processor, implement the steps of the aforementioned machine voiceprint feature modeling method.
[0129] Example 6
[0130] This embodiment proposes a computer program product, including a computer program or instructions, which, when executed by a processor, implements the machine voiceprint feature modeling method.
[0131] Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a computer program product.
[0132] The various embodiments in this application are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.
[0133] The scope of protection of this application is not limited to the embodiments described above. Those skilled in the art can make various modifications and variations to this disclosure without departing from its scope and spirit. If such modifications and variations fall within the scope of this disclosure and its equivalents, this disclosure is intended to include them as well.
Claims
1. A machine voiceprint feature modeling method, characterized in that it includes the following steps: Step 1: collect the operating sound signals of industrial equipment, preprocess them, and generate a two-dimensional time-frequency spectrogram; Step 2: divide the two-dimensional time-frequency spectrogram into several non-overlapping sub-blocks; Step 3: within each sub-block of the two-dimensional time-frequency spectrogram, sort the time frames by importance along the time dimension, and then perform weighted pooling on each sorted sub-block to convert it into a one-dimensional feature vector; Step 4: calculate the frequency attention weights of the two-dimensional time-frequency spectrogram; Step 5: based on the frequency attention weights of the two-dimensional time-frequency spectrogram and the one-dimensional feature vector of each sub-block, calculate the weighted feature vector of each sub-block; Step 6: concatenate the weighted feature vectors of the sub-blocks in sequence to obtain the machine voiceprint feature matrix; wherein Step 3 includes: Step 3.1: for each sub-block of the two-dimensional time-frequency spectrogram, sort the energies of the sub-block's frames along the time dimension, the sorted two-dimensional time-frequency spectrogram being given by the corresponding formula; Step 3.2: use a learnable pooling vector to perform weighted pooling on each sorted sub-block, obtaining a one-dimensional feature vector that represents the voiceprint characteristics of the sub-block within a local time period, as given by the corresponding formula, where T denotes the matrix transpose and r ∈ [0,1] are learnable parameters. Let r be the sum of the learnable parameters r.
2. The machine voiceprint feature modeling method according to claim 1, characterized in that the preprocessing of the industrial equipment operating sound signals in Step 1 includes pre-emphasis, framing, windowing, noise reduction, and short-time Fourier transform operations, and a Mel transform is used to convert the one-dimensional audio signal into a two-dimensional time-frequency spectrogram $X \in \mathbb{R}^{F \times T}$, where F is the frequency dimension and T is the time dimension.
3. The machine voiceprint feature modeling method according to claim 1, characterized in that Step 2 is performed as follows: the two-dimensional time-frequency spectrogram X is divided along the time dimension T into N non-overlapping sub-blocks, each of time length W = T / N.
4. The machine voiceprint feature modeling method according to claim 1, characterized in that, Step 4 includes: Step 4.1: Calculate the frequency importance value of the two-dimensional time-frequency spectrogram; Step 4.2: Calculate the frequency attention weight of the two-dimensional time-frequency spectrogram based on the frequency importance value of the two-dimensional time-frequency spectrogram.
5. The machine voiceprint feature modeling method according to claim 4, characterized in that Step 4.1 is performed as follows: for each frequency band f ∈ F of the two-dimensional time-frequency spectrogram X, calculate its global average over the time dimension T and use it as the frequency importance value, which measures the overall energy level of that band, as given by the corresponding formula. The larger the frequency importance value, the more important the frequency band f.
6. The machine voiceprint feature modeling method according to claim 5, characterized in that Step 4.2 is performed as follows: a multilayer perceptron network followed by a Softmax normalization layer computes a weight vector over the frequency importance values of all frequency bands, as given by the corresponding formula, where K and b are both learnable parameters.
7. The machine voiceprint feature modeling method according to claim 1, characterized in that Step 5 is performed as follows: the weight vector A_f of the frequency importance values of the frequency bands is multiplied element-wise, in turn, with the one-dimensional feature vector of each sub-block, weighting the two-dimensional time-frequency spectrogram according to frequency importance and yielding the weighted feature vector of each sub-block, as given by the corresponding formula.
8. The machine voiceprint feature modeling method according to claim 1, characterized in that Step 6 is performed as follows: the weighted feature vectors of the sub-blocks are concatenated in sequence to obtain the machine voiceprint feature matrix, as given by the corresponding formula.
9. A machine voiceprint feature modeling system that performs machine voiceprint feature modeling based on the method according to claim 1, characterized in that it includes an industrial sound signal data acquisition module, a sound preprocessing and denoising module, and a machine voiceprint representation and modeling module. The industrial sound signal data acquisition module is used to collect the operating sound signals of industrial equipment, store them as audio signals, and transmit them to the sound preprocessing and denoising module. The sound preprocessing and denoising module is used to preprocess the collected sound signals, convert the one-dimensional audio signal into a two-dimensional time-frequency spectrogram using the Mel transform, and transmit the spectrogram to the machine voiceprint representation and modeling module. The machine voiceprint representation and modeling module is used to process the two-dimensional time-frequency spectrogram: it divides the spectrogram into several sub-blocks along the time dimension and extracts the one-dimensional feature vector of each sub-block, calculates the weighted feature vector of each sub-block based on the frequency attention weights, and finally concatenates the weighted feature vectors of the sub-blocks in sequence to obtain the machine voiceprint feature matrix.
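Taken together, claims 1 and 3 to 8 describe the modeling module end to end. Several details below are assumptions because the formulas were lost. First, within a sub-block, each frequency band's energies are assumed to be sorted in descending order along time (one reading of Step 3.1). Second, the pooling is assumed to be `sorted_block @ r`, with a length-W vector r whose entries lie in [0, 1]. Third, the MLP is assumed to be a single affine layer plus Softmax. Fourth, the output matrix is assumed to have shape (F, N). All function names are hypothetical.

```python
import numpy as np


def split_subblocks(X: np.ndarray, N: int) -> list[np.ndarray]:
    """Claim 3: split an (F, T) spectrogram into N non-overlapping (F, W) blocks."""
    _, T = X.shape
    if T % N != 0:
        raise ValueError("T must be divisible by N so that W = T / N")
    W = T // N
    return [X[:, n * W:(n + 1) * W] for n in range(N)]


def sort_pool(block: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Claim 1, Steps 3.1-3.2 (assumed reading).

    Each band's energies are sorted in descending order along time, then the
    sorted (F, W) block is pooled with a length-W weight vector r -> shape (F,).
    """
    sorted_block = np.sort(block, axis=1)[:, ::-1]
    return sorted_block @ r


def softmax(v: np.ndarray) -> np.ndarray:
    # Subtracting the maximum avoids overflow; the result sums to 1.
    e = np.exp(v - np.max(v))
    return e / e.sum()


def voiceprint_feature_matrix(X, N, r, K, b):
    """Claims 1 and 3-8 chained together; returns an (F, N) matrix (assumed layout)."""
    blocks = split_subblocks(X, N)
    Z = [sort_pool(blk, r) for blk in blocks]   # one (F,) vector per sub-block
    s = X.mean(axis=1)                          # claim 5: importance per band
    A_f = softmax(K @ s + b)                    # claim 6: single affine layer assumed
    weighted = [A_f * z for z in Z]             # claim 7: element-wise weighting
    return np.stack(weighted, axis=1)           # claim 8: concatenate in order


rng = np.random.default_rng(0)
F, T, N = 4, 8, 2
W = T // N
X = rng.random((F, T))                          # stand-in for a Mel spectrogram
r = np.full(W, 1.0 / W)                         # hypothetical pooling weights in [0, 1]
K, b = np.eye(F), np.zeros(F)                   # hypothetical MLP parameters
M = voiceprint_feature_matrix(X, N, r, K, b)
print(M.shape)  # (4, 2)
```

Sorting before pooling makes r act on ranks rather than on time positions. With r = (1, 0, ..., 0), the pooling reduces to each band's maximum within the sub-block, and with a uniform r it reduces to the mean. A learned r can interpolate between these two cases.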
Citation Information
Patent Citations
Method for training transformer fault detection model, fault diagnosis method, and related device
US20250370068A1