A machine learning-based site micro-vibration source identification method

The method uses machine learning: micro-vibration accelerometers collect signals, which then undergo feature extraction and cluster analysis. This solves the problem of identifying micro-vibration sources at a site, improves the vibration resistance of semiconductor manufacturing plants, and helps ensure stable equipment operation.

CN115392284B · Active · Publication Date: 2026-06-23 · TIANJIN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TIANJIN UNIV
Filing Date
2022-07-14
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Current technology lacks effective methods for identifying and analyzing the sources of micro-vibrations at a site, especially in semiconductor manufacturing plants, where such vibrations affect the normal operation of precision equipment and the product yield. Moreover, there are few engineering applications of this technology in China or abroad.

Method used

A machine learning-based approach is adopted, using micro-vibration acceleration sensors to collect signals. Through Welch overlapped segment averaging, spectral subtraction, MFCC feature extraction, clustering by dynamic time warping (DTW) distance, and Gaussian mixture modeling, the micro-vibration sources of the site are identified and classified.

Benefits of technology

It enables accurate identification and classification of micro-vibration sources in the field, helping to improve vibration damping design, reduce economic losses, and ensure the normal operation of precision equipment.



Abstract

The application discloses a machine learning-based method for identifying micro-vibration sources at a site. Signals measured by acceleration sensors at the edge and at the center of the measured site are processed in sequence: denoising, transient impact signal extraction, feature extraction, clustering of the transient impact signals by dynamic time warping (DTW) distance based on their feature matrices, and GMM modeling. The number of independent Gaussian components of each model is verified, and the number of components with the minimum Akaike information criterion (AIC) value is selected as the optimal model. Maximum a posteriori probability identification is then performed on the transient impact signals captured by the sensors. GMM parameters are compared for sets that may belong to the same category, and two sets whose models overlap heavily are merged into one set, which serves as the data set of vibrations that are generated around the site and affect its center.

Description

Technical Field

[0001] This invention relates to the field of micro-vibration signal measurement technology in electronic industrial plant environments, and is used to identify micro-vibration sources around the site. It mainly relates to a site micro-vibration source identification method based on machine learning.

[0002] Background Information

[0003] With the continuous development of China's semiconductor industry, achieving independent processing and production of high-end chips has become an indispensable part of the development of domestic technology. In addition to the extremely high requirements for temperature, humidity, and cleanliness of the factory, the most important aspect of high-end chip manufacturing is controlling the micro-vibration of the production site. Excessive micro-vibration affects the normal operation of precision equipment and greatly reduces the product yield.

[0004] According to the "Technical Specification for Micro-vibration Prevention Engineering in the Electronic Industry" (GB 51076-2015), promulgated in China in 2015, micro-vibration is defined as vibration with a time-domain amplitude below 10 μm and a vibration velocity below 1000 μm/s. For specific precision equipment, taking a universal tool microscope with a contact interferometer accuracy of 1 μm as an example, the maximum permissible vibration velocity is 300 μm/s. The environment around production plants and similar facilities contains numerous micro-vibration sources, whose combined effect can be approximated as a superposition of transient impact sources and some periodic sources. These include natural sources, rail transportation, power equipment, construction, and human-induced vibrations, all of which are potential sources of interference. To monitor and analyze such sources, the vibration signals must be extracted from long-term observation signals and classified, and signals acquired later must be identified. This reveals the locations and other properties of the potential sources, which in turn helps improve the vibration isolation of the plant, protect precision equipment, and reduce economic losses.

[0005] Currently, research on vibration source identification methods is relatively limited. Most methods developed in recent years utilize various machine learning techniques and blind signal separation algorithms based on the "cocktail party problem" in information theory. These methods offer better accuracy compared to earlier approaches using wavelet transform and multimodal analysis, and also solve the problem of identifying transient time-domain signals. Furthermore, there are few engineering applications of micro-vibration analysis and identification, both domestically and internationally. Most applications are in fiber optic sensors and geological activity measurement, leaving the identification and analysis of site micro-vibration sources virtually nonexistent.

[0006] Currently, based on the development of the entire machine learning field, problems can be divided into two main categories: supervised learning and unsupervised learning. For "unlabeled" classification problems such as micro-vibration source identification, clustering algorithms from machine learning are suitable. These algorithms classify existing vibration signal data sets by feature analysis, thereby obtaining specific information about each type of signal. Based on the core idea of unsupervised learning, the following data processing approach can be derived for the problem of micro-vibration source monitoring and analysis: various vibration signals are extracted from a large amount of long-term site monitoring signals; features are extracted from the extracted signals and clustered based on the feature vectors; and the clustered data sets are modeled mathematically to obtain unified characteristics that represent each type of vibration source.

Summary of the Invention

[0007] The purpose of this invention is to propose a machine learning-based method for identifying the source of micro-vibration in a site. This method is applied to the processing and analysis of long-term micro-vibration monitoring data in laboratories and semiconductor manufacturing vibration isolation plants that house precision equipment. The aim is to classify the micro-vibrations emitted by different vibration sources around the site that have been captured by long-term monitoring, and then analyze the impact of various vibration sources on the site. At the same time, it can determine whether different sensors have captured the vibration signal emitted by the same vibration source, thereby improving the vibration isolation design. Furthermore, it can determine whether the same type of vibration source still exists in the improved vibration isolation site.

[0008] This invention specifically includes the following:

[0009] A machine learning-based method for identifying the source of micro-vibrations in a site includes the following steps:

[0010] S1: A micro-vibration acceleration sensor at the edge of the tested site and a micro-vibration acceleration sensor at the center of the site measure the signals x_i and y_i, respectively. The power spectrum of each signal is estimated using the Welch overlapped segment averaging method to obtain its dominant frequency components. The magnitude-squared coherence is then calculated, its peaks are extracted, and the corresponding common frequencies f_kcom are returned. These k frequencies are taken to be periodic vibration sources at the periphery of the site that affect the center of the site;

[0011] S2: The discrete time-domain signal x(n) collected by the sensor is denoised by spectral subtraction to obtain the denoised frequency-domain signal s(ω), from which the denoised time-domain signal s(n) is recovered;

[0012] S3: An endpoint detection algorithm based on two-dimensional Gaussian distribution is used to extract transient impact signals from the denoised time-domain signal s(n);

[0013] S4: Perform MFCC feature extraction on the extracted n transient impact signals s(i), i∈1,...,n, and perform energy normalization on the extracted MFCC features to obtain the feature matrix EM_i of each transient impact signal;

[0014] S5: Based on the feature matrices EM_i, the transient impact signals are clustered by dynamic time warping (DTW) distance, with K-medoids as the clustering method. The number of clusters K is determined by the silhouette coefficient method together with the connected components of the cross-correlation undirected graph, resulting in K sets S(k), k∈K. The number of members I(k), k∈K of each set is counted to obtain how often each vibration source occurs during long-term real-time micro-vibration monitoring;

[0015] S6: GMM modeling is performed on the set of all feature matrices of each set S(k), treating the row vectors as independent variables, to obtain models m_k(c), c∈(2,...,n). The model parameters (Σ1,...,Σ_i, μ1,...,μ_i, σ1,...,σ_i) are obtained iteratively with the EM algorithm. The number of independent Gaussian components c is then verified: according to the Akaike information criterion (AIC), the number of components with the minimum AIC value is selected as the optimal model, which is used as the final model for the set S(k);

[0016] S7: The time-domain discrete signal y(n) acquired by the other sensor is processed by repeating steps S2-S6 above;

[0017] S8: Maximum a posteriori probability identification is performed on the transient impact signals t(i), i∈1,...,n captured by the sensors. Sets that may belong to the same class are compared using their GMM parameters, and sets with highly overlapping models are merged into a single set. Statistical analysis then yields the data set of vibrations around the site that affect the center of the site.
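The model-selection rule in step S6 can be written compactly. The block below is only a sketch in standard textbook notation, not a formula taken from the patent: φ_i follows the weight symbol used later in the text, while d (feature dimension), p_c (number of free parameters), and L̂_c (maximized likelihood of the c-component model) are symbols introduced here for illustration. The parameter count assumes full covariance matrices.

```latex
% Mixture density for one cluster S(k) with c Gaussian components
p(\mathbf{x}) = \sum_{i=1}^{c} \varphi_i \,
  \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{c} \varphi_i = 1

% Free parameters of a full-covariance mixture in d dimensions
p_c = (c - 1) + c\,d + c\,\frac{d(d+1)}{2}

% Akaike information criterion and the selected component count
\mathrm{AIC}(c) = 2\,p_c - 2 \ln \hat{L}_c,
\qquad c^{*} = \arg\min_{c} \mathrm{AIC}(c)
```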

[0018] In the above technical solution, step S2 specifically includes:

[0019] S2-1: Obtain the noise spectrum estimate E_mean(ω) by performing a Fourier transform on the discrete-time signal x(n);

[0020] S2-2: Perform spectral subtraction noise reduction on the original vibration signal. The specific expression is as follows:

[0021]

[0022] where the left-hand side is the magnitude spectrum of the noise-reduced vibration signal and |Y(ω)| is that of the original vibration signal. An IFFT is then performed to obtain the denoised time-domain signal s(n).

[0023] In the above technical solution, step S3 specifically includes:

[0024] S3-1: For the training dataset, first perform frame segmentation. Then, perform STFT transformation on the segmented data x(i) to obtain the frequency domain signal X(k), as shown in the formula:

[0025] framesize is the frame length, and N is the number of sampling points per frame.

[0026] S3-2: Divide each frame converted to the frequency domain into n frequency sub-bands of unequal bandwidth, and calculate the energy P_j of each sub-band between its lower and upper frequency limits:

[0027] where f_j^size is the bandwidth of the j-th sub-band.

[0028] S3-3: For training data Y_i containing transient impact waveforms and silent state waveforms, construct a feature matrix from the frame-by-frame energy features:

[0029] where a_mn is the energy of the n-th sub-band of the m-th frame.

[0030] S3-4: Use a clustering algorithm to binary classify each row vector of the transient impact waveform and the silent state waveform into different clusters. Since the feature variable is energy magnitude, the Euclidean distance is chosen as the distance metric for clustering.

[0031] Where a, b∈(1,2...m)

[0032] For each clustering result, construct a Gaussian model with dimension two as shown in the following formula. The parameters of the model are obtained by the EM algorithm through multiple iterations.

[0033]

[0034]

[0035] where φ_i is the weight of each independent Gaussian distribution, and μ_i and Σ_i are its expectation and covariance;

[0036] S3-5: Perform the same frame division and frequency sub-band energy calculation on the input waveform data, calculate the probabilities P_s(x) and P_n(x) of each frame under the two models, and determine whether the i-th frame contains a transient impact signal according to the likelihood ratio test criterion:

[0037]

[0038]

[0039] where P_s(x,i,j) and P_n(x,i,j) are the probabilities that the j-th feature variable in the i-th frame belongs to the transient impact or to the silent state, respectively. The probability that each frame contains a transient impact is obtained by thresholding the likelihood ratio:

[0040] where T_ξ and T_η are the energy thresholds for each frame and each sub-band, obtained from the training data set.

[0041] For two adjacent, partially overlapping frames, the union of F_AD^i and F_AD^(i+1) is taken as the determination result for the frame; frames with F_AD^i = 1 that are consecutive or separated by gaps of no more than 20 frames are treated as one complete transient impact waveform.
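The framing and sub-band energy calculation of steps S3-1 to S3-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the energy formula is not reproduced in the text, so the sum of squared FFT magnitudes inside each band is assumed; the band edges are the six bands listed in the embodiment; and `framesize=128` in the usage note is a hypothetical value, since the text elides the frame length actually used.

```python
import numpy as np

# Sub-band edges in Hz, taken from the embodiment described in the text.
BANDS_HZ = [(10, 100), (100, 300), (300, 500),
            (500, 800), (800, 1100), (1100, 1600)]

def band_energy_matrix(x, fs, framesize, overlap, bands=BANDS_HZ):
    """Return the matrix A with A[m, n] = energy of sub-band n in frame m.

    Assumption: energy = sum of squared FFT magnitudes inside the band.
    """
    step = framesize - overlap
    n_frames = 1 + (len(x) - framesize) // step
    freqs = np.fft.rfftfreq(framesize, d=1.0 / fs)
    A = np.empty((n_frames, len(bands)))
    for m in range(n_frames):
        frame = x[m * step:m * step + framesize]
        power = np.abs(np.fft.rfft(frame)) ** 2
        for n, (lo, hi) in enumerate(bands):
            A[m, n] = power[(freqs >= lo) & (freqs < hi)].sum()
    return A
```

For example, `band_energy_matrix(x, fs=3200, framesize=128, overlap=40)` yields one row of six band energies per frame; the rows are the vectors that step S3-4 clusters into transient and silent states.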

[0042] In the above technical solution, step S4 specifically includes:

[0043] S4-1: Truncate the transient impulse signal s(i), i∈1,...,n into frames of the same length as the frame length in step S3-1, and perform Mel filter bank filtering. Replace the traditional Mel scale and Hertz conversion relationship with the following formula:

[0044]

[0045] S4-2: Perform MFCC feature extraction on s(i) with the transformed scale to form the feature matrix M_i. Energy normalization is performed on M_i as follows:

[0046]

[0047] CM_i(t,f) = (1 - s)·CM_i(t-1,f) + s·M_i(t,f)

[0048] where α, s, r, and δ are the main parameters, selected according to the micro-vibration sensor chosen for the project and the amplitude distribution of the measured vibration data: s = 0.025, α = 0.98, δ = 2, r = 0.5, ε = 10^-6;

[0049] S4-3: Obtain the feature matrix EM_i belonging to s(i):

[0050]

[0051] Where row T represents the number of frames for each independent transient impulse, and column J represents the coefficients of the Mel filter bank.

[0052] In the above technical solution, step S5 specifically includes:

[0053] S5-1: Perform K-medoids clustering based on DTW distance:

[0054] (1) Treat the feature matrix EM_i of each s(i) as a point, and select the K points with the largest mutual DTW distances as the initial centroids;

[0055] (2) Cluster according to the nearest principle, calculate the DTW distance from each s(i) to the centroid of the K classes in turn, and assign all remaining s(i) to the nearest class according to the principle of the closest distance to the centroid of the K classes, forming K classes;

[0056] (3) Redetermine the centroid and cost function for each class:

[0057]

[0058] In the formula, SUM_k(i) is the sum of the DTW distances from s(i) to the other transient impacts of the same class k. Find the s(i) with the smallest SUM_k(i) in the class, compare it with the SUM_k(i) of the original centroid, and select the one with the smaller cost function value as the new centroid;

[0059] (4) Repeat steps (2) and (3) until the centroids no longer change or the maximum number of iterations is reached, to obtain the K-classified data.

[0060] S5-2: The optimal number of clusters K is determined by combining the silhouette coefficient method with the connected components of the cross-correlation undirected graph;

[0061] (1) Using K as a variable, perform multiple clustering operations on all s(i), where K takes values in the range [2, 2K_c]. The candidate optimal values K_s(j) are obtained by the silhouette coefficient method:

[0062] K_s(j) = peakfinder(c_sum(k)), where peakfinder is the peak extraction function, K_s(j) denotes the j distinct peak values, and the silhouette coefficient is given by:

[0063] Where a(i) and b(i) represent the DTW distance from element i to the centroid of its cluster and the DTW distance from the element to the next nearest centroid of the cluster, respectively;

[0064] (2) Obtain the value K_c using the connected-component method on the cross-correlation undirected graph. Assuming that the normalized cross-correlation of transient impact signals of the same type is above 0.8, then for all s(i):

[0065] 1) Calculate the cross-correlation between all pairs of s(i), take the maximum value, and form a cross-correlation matrix:

[0066] where a_ij is the maximum cross-correlation value between s(i) and s(j).

[0067] 2) Threshold the cross-correlation matrix to obtain the cross-correlation adjacency matrix:

[0068]

[0069] where L^2(i) represents the square of the maximum amplitude of the transient impact signal s(i);

[0070] 3) Find all connected components of the cross-correlation adjacency matrix and denote their number as K_c;

[0071] (3) The optimal K is determined jointly by K_s(i) and K_c:

[0072] K = min{K_s(i) | K_s(i) ≥ K_c}

[0073] In the above technical solution, step S8 specifically includes:

[0074] S8-1: Identify a single t(i) and determine the type k* of transient impact signal captured by the sensor to which it belongs:

[0075]

[0076] At the same time, it is necessary to determine whether t(i) is a transient impact signal emitted by an external vibration source. This requires a threshold determination on k* to confirm that t(i) is a transient impact signal of the same type as those in the set. Here T is the equal error rate (EER) threshold or a high false rejection rate (FRR) threshold, which must be determined from the training data set during model training.

[0077] S8-2: The GMM models of the various transient impact signals are determined separately for the two sensors. For one model from each sensor, the sum of the relative errors of the parameters and the DTW distance between the cluster center points are calculated to decide whether the two types of signals belong to the same class. Let i index the parameters of the first model and j the corresponding parameters of the second model; for each parameter of the first model, find the closest parameter in the second model, and calculate the similarity between the two models:

[0078]

[0079] When i≠j, pad with zeros for any missing parameters.

[0080] S8-3: Traverse the ModelSimilarity of all clusters of the two sensors, find the smallest pair, and compare the DTW distance between their center points. If the distance is less than the average distance within the cluster, then the two models are considered to describe the signal dataset emitted by the same vibration source.
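The maximum a posteriori decision with rejection described in step S8-1 can be sketched as below. The patent's own expression for k* is not reproduced in the text, so this is an assumed, generic form: each class is scored by the total log-likelihood of the feature frames under its GMM plus the log prior, and the result is rejected when the best score falls below a threshold. Diagonal covariances, the `priors` argument, and the `threshold` value are simplifications and hypothetical names introduced here.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of the rows of X under a diagonal-covariance GMM.

    X: (T, J) feature frames; means, variances: (c, J); weights: (c,).
    """
    X = np.asarray(X, dtype=float)[:, None, :]            # (T, 1, J)
    mu = np.asarray(means, dtype=float)[None]             # (1, c, J)
    var = np.asarray(variances, dtype=float)[None]        # (1, c, J)
    log_comp = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=2)
    log_comp = log_comp + np.log(weights)[None]           # (T, c)
    m = log_comp.max(axis=1, keepdims=True)                # log-sum-exp for stability
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def identify(X, models, priors, threshold):
    """Return the index k* of the best-scoring class, or None if rejected."""
    scores = [gmm_log_likelihood(X, *mdl) + np.log(p)
              for mdl, p in zip(models, priors)]
    k_star = int(np.argmax(scores))
    return k_star if scores[k_star] >= threshold else None
```

A natural choice for `priors` would be the relative set sizes I(k) from step S5, though the text does not say so; a returned `None` corresponds to a signal that matches none of the modeled vibration sources.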

[0081] The advantages and beneficial effects of this invention are as follows:

[0082] The machine learning-based site micro-vibration source identification method proposed in this invention is a complete source identification scheme. Using limited data, it can classify the vibration signals captured by a single sensor deployed at the site and model the different types of vibration signals, which facilitates the later detection, elimination, and isolation of vibration sources. It can also determine whether different sensors are affected by the same vibration source.

Description of the Drawings

[0083] Figure 1 is a flowchart of the present invention.

[0084] Figure 2 shows the time-domain signal of the measured micro-vibration velocity.

[0085] Figure 3 shows the WOSA power spectrum estimation result.

[0086] Figure 4 shows the magnitude-squared coherence result.

[0087] Figure 5 shows the result of the endpoint detection algorithm.

[0088] Figure 6 shows the EN-MFCC features of transient impact waveforms.

[0089] Figure 7 shows three data sets used to demonstrate the clustering effect.

[0090] Figure 8 is a t-SNE visualization of the clustering results.

[0091] For those skilled in the art, other related figures can be obtained from the above figures without any creative effort.

Detailed Description of the Embodiments

[0092] To enable those skilled in the art to better understand the present invention, the technical solution is further described below with reference to specific embodiments. An example is demonstrated using time-domain data collected by a micro-vibration sensor deployed at the test site. The data are shown in Figure 2; the sampling rate is 3200 SPS, and the data comprise one hour of measurements in a laboratory where the vibration is relatively stable.

[0093] A machine learning-based method for identifying the source of micro-vibrations in a site, whose flowchart is shown in Figure 1, comprises the following specific steps:

[0094] S1: To determine the location of periodic vibration sources at the center and edge of the site, the following steps are performed:

[0095] S1-1: Assume that a sensor placed at the center of the site and a sensor placed in the edge area of the site measured the data x_i and y_i, respectively. Select the data within the time range to be analyzed, or divide the data into L segments, with x_i(n) and y_i(n) denoting the i-th data segment. The dominant frequency components of each signal are obtained by estimating its power spectrum with the Welch overlapped segment averaging (WOSA) method. In this embodiment, data from the same randomly selected time period were used; the results are shown in Figure 3.

[0096] S1-2: Perform cross-power spectral density estimation on x_i and y_i:

[0097]

[0098] S1-3: Calculate the magnitude-squared coherence of x_i and y_i:

[0099]

[0100] The peaks f_kcom are the frequencies common to x_i and y_i. In this embodiment, the result is shown in Figure 4: peak picking with a threshold of 0.08 gives the common frequencies f_kcom = 90.625, 118.75, 240.625, and 256.25 Hz. Vibration sources at these frequencies at the edge of the site can therefore be considered to have also interfered, to varying degrees, with the center of the site. To locate the actual vibration source emitting a given frequency from these data, the parameters of the frequency-domain conversion and the accuracy of the spectrum estimate would need to be optimized. From the above results, the first pair of peaks indicates a periodic vibration source at about 100 Hz, and the second pair a periodic vibration source at about 248 Hz.
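Step S1 can be illustrated with a small NumPy sketch of Welch-style averaging and magnitude-squared coherence. The coherence is computed in its standard form |P_xy|² / (P_xx · P_yy); the Hann window, the segment length `nperseg=1024`, the 50% overlap, and all function names are assumptions made for this example and are not values given in the patent.

```python
import numpy as np

def _segments(x, nperseg, noverlap):
    """Split x into overlapping, mean-removed, Hann-windowed segments."""
    step = nperseg - noverlap
    n_seg = 1 + (len(x) - nperseg) // step
    segs = np.stack([x[i * step:i * step + nperseg] for i in range(n_seg)])
    segs = segs - segs.mean(axis=1, keepdims=True)
    return segs * np.hanning(nperseg)

def welch_csd(x, y, fs, nperseg=1024, noverlap=512):
    """Segment-averaged cross-spectrum (unscaled); x == y gives the auto-spectrum."""
    X = np.fft.rfft(_segments(x, nperseg, noverlap), axis=1)
    Y = np.fft.rfft(_segments(y, nperseg, noverlap), axis=1)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, (np.conj(X) * Y).mean(axis=0)

def msc(x, y, fs, **kw):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy)."""
    f, pxy = welch_csd(x, y, fs, **kw)
    _, pxx = welch_csd(x, x, fs, **kw)
    _, pyy = welch_csd(y, y, fs, **kw)
    return f, np.abs(pxy) ** 2 / (pxx.real * pyy.real)

def common_frequencies(f, cxy, threshold):
    """Frequencies of local coherence maxima that exceed the threshold."""
    idx = [k for k in range(1, len(cxy) - 1)
           if cxy[k] > threshold and cxy[k] >= cxy[k - 1] and cxy[k] > cxy[k + 1]]
    return f[idx]
```

Because the scaling factors cancel in the ratio, the unscaled spectra are sufficient for coherence. The embodiment's threshold of 0.08 would be passed as `threshold=0.08`; a suitable value depends on the number of averaged segments, since fewer segments raise the coherence floor of unrelated signals.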

[0101] Next, to identify the transient impact sources contained in the collected data, the discrete-time signal x(n) acquired by the sensor is processed.

[0102] S2: The discrete-time signal x(n) acquired by the sensor is denoised using spectral subtraction to obtain the denoised frequency-domain signal s(ω), and the denoised time-domain signal s(n) is then recovered. The noise spectrum can be obtained by manually extracting a representative long-term noise signal, or by taking the mean over a set of silent state segments obtained by the endpoint detection algorithm in step S3. Step S2 specifically includes:

[0103] S2-1: Obtain the noise spectrum estimate E_mean(ω) by performing a Fourier transform on the discrete-time signal x(n).

[0104] S2-2: Perform spectral subtraction noise reduction on the original vibration signal. The specific expression is as follows:

[0105]

[0106] where the left-hand side is the magnitude spectrum of the noise-reduced vibration signal and |Y(ω)| is that of the original vibration signal. An IFFT is then performed to obtain the denoised time-domain signal s(n).
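A minimal sketch of step S2 follows. Since the exact subtraction expression is not reproduced in the text, the basic magnitude-domain form is assumed here: the mean noise magnitude is subtracted from |Y(ω)|, negative results are clipped to zero, and the phase of the noisy signal is reused for the inverse transform. The function names are hypothetical, and the noise segments are assumed to have the same length as the signal being denoised.

```python
import numpy as np

def mean_noise_spectrum(noise_segments):
    """Average magnitude spectrum over silent-state segments of equal length."""
    return np.mean([np.abs(np.fft.rfft(seg)) for seg in noise_segments], axis=0)

def spectral_subtract(x, noise_mag):
    """Subtract a noise magnitude spectrum from x and return the time signal.

    noise_mag must have the same length as np.fft.rfft(x).
    """
    Y = np.fft.rfft(x)
    mag = np.maximum(np.abs(Y) - noise_mag, 0.0)   # clip negative magnitudes
    S = mag * np.exp(1j * np.angle(Y))             # reuse the noisy phase
    return np.fft.irfft(S, n=len(x))
```

In practice the subtraction is usually applied frame by frame with overlap-add so that the noise estimate and the signal frames have matching lengths; the whole-record form above only shows the core operation.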

[0107] S3: Transient impulse extraction is performed on the obtained denoised time-domain signal s(n), using an endpoint detection algorithm based on a two-dimensional Gaussian distribution. Step S3 specifically includes:

[0108] S3-1: The training dataset is first processed by frame segmentation. Since transient impacts are usually short, the frame length is approximately 0.01s. For signals with a sampling frequency Fs, the frame length framesize is set based on engineering experience. (In this embodiment, the framesize of the signal with a sampling frequency of 3200 SPS is set to...) The inter-frame overlap can be set to 25%-50% of the frame length (in this embodiment, the inter-frame overlap is set to 40 samples). The STFT transform is performed on the segmented data x(i) to obtain X(k), as shown in the formula:

[0109]

[0110] S3-2: Calculate the frequency band energy for each frame converted to the frequency domain. A micro-vibration sensor differs from an ordinary accelerometer in that it has extremely high sensitivity and a frequency response concentrated in the low-frequency band. Assuming that the highest effective vibration frequency captured by the micro-vibration sensor is around 1600 Hz, divide the spectrum into 6 frequency sub-bands of varying bandwidth: [10-100; 100-300; 300-500; 500-800; 800-1100; 1100-1600] Hz. Calculate the energy P_j of each sub-band between its lower and upper frequency limits:

[0111]

[0112] S3-3: For training data Y_i containing transient impact waveforms and non-transient waveforms (hereinafter referred to as "silent state waveforms"), construct a feature matrix from the frame-by-frame energy features:

[0113] where a_mn is the energy of the n-th sub-band of the m-th frame.

[0114] S3-4: Use a clustering algorithm to binary classify each row vector of the transient impact waveform and the silent state waveform into different clusters. Since the feature variable is energy magnitude, the Euclidean distance is chosen as the distance metric for clustering.

[0115]

[0116] When selecting the training dataset, we should try to ensure that the transient impact waveforms collected by the micro-vibration sensor are as diverse as possible to improve the accuracy of the endpoint detection algorithm in the later stage. We construct a Gaussian model with a dimension of two according to Equation (7) based on the clustering results. The parameters of the model are obtained by the EM algorithm through multiple iterations.

[0117] The multidimensional Gaussian distribution model is as follows:

[0118]

[0119] where φ_i is the weight of each independent Gaussian distribution, and μ_i and Σ_i are its expectation and covariance.

[0120] S3-5: Perform the same frame division and frequency sub-band energy calculation on the input waveform data, calculate the probabilities P_s(x) and P_n(x) of each frame under the two models, and, based on the likelihood ratio test criterion, determine whether the i-th frame contains a transient impact signal rather than a silent state:

[0121]

[0122] where P_s(x,i,j) and P_n(x,i,j) are the probabilities that the j-th feature variable in the i-th frame belongs to a transient impact or to a silent state, respectively. Thresholding the likelihood ratio yields the probability that each frame contains a transient impact:

[0123]

[0124] For two adjacent, overlapping frames, the union of F_AD^i and F_AD^(i+1) is used as the determination result for the frame. The detection result is shown in Figure 5, where the blue line represents the probability of a transient impulse signal at that moment. Frames with F_AD^i = 1 that are consecutive or separated by gaps of no more than 20 frames are then treated as one complete transient impact waveform.
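The final grouping rule of step S3-5 (union of adjacent frame decisions, then bridging gaps of up to 20 frames) can be sketched in plain Python. The interpretation that a "gap" counts the unflagged frames between two flagged ones, and the function names, are assumptions made for this illustration.

```python
def union_adjacent(flags):
    """Flag frame i if frame i or frame i+1 was detected (overlapping frames)."""
    nxt = list(flags[1:]) + [0]
    return [int(bool(a) or bool(b)) for a, b in zip(flags, nxt)]

def merge_detections(flags, max_gap=20):
    """Group flagged frames into (start, end) frame-index intervals.

    Flagged frames separated by at most max_gap unflagged frames are
    treated as one transient impact waveform.
    """
    events = []
    start = last = None
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 <= max_gap:
            last = i                      # bridge the short gap
        else:
            events.append((start, last))  # close the previous event
            start = last = i
    if start is not None:
        events.append((start, last))
    return events
```

Each returned interval can then be converted to sample indices with the frame step used in S3-1 to cut the transient impact signal s(i) out of s(n).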

[0125] S4: Perform EN-MFCC (Energy Normalized MFCC) feature extraction on the extracted n transient impact signals s(i). Since the micro-vibration sensor has a better frequency response in the low-frequency region, the traditional MFCC Mel-scale expression is discarded, and a frequency scale that better matches the characteristics of the micro-vibration signal is selected. The extracted MFCC features are then subjected to energy normalization to obtain the EN-MFCC feature matrix EM_i for each transient impact signal. The specific steps are as follows:

[0126] S4-1: Truncate the transient impact signal s(i) into frames of the same length as the framesize in step S3-1, and perform Mel filter bank filtering. Based on the low-frequency sensitivity characteristics of micro-vibration sensors in engineering, the traditional conversion relationship between Mel scale and Hertz is replaced by the following formula:

[0127]

[0128] S4-2: Perform MFCC feature extraction on s(i) with the transformed scale to form the feature matrix M_i. To ensure that the amplitudes of the individual s(i) do not affect one another, energy normalization is performed on M_i as follows:

[0129]

[0130] where α, s, r, and δ are the main parameters, selected according to the micro-vibration sensor chosen for the project and the amplitude distribution of the measured vibration data. A set of practically usable parameter settings is: s = 0.025, α = 0.98, δ = 2, r = 0.5, and ε is a very small quantity that ensures the divisor is not zero, usually taken as ε = 10^-6.

S4-3: Obtain the feature matrix EM_i belonging to s(i):

[0131]

[0132] where the rows T correspond to the frames of each independent transient impulse and the columns J to the coefficients of the Mel filter bank. In this embodiment, Figure 6 shows the feature matrices of four individual transient impulse signals, where the X-axis represents the frame number within each independent transient impulse and the Y-axis represents the EN-MFCC coefficients.
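The energy normalization of step S4-2 can be sketched as follows. Only the smoothing recursion CM_i(t,f) = (1 - s)·CM_i(t-1,f) + s·M_i(t,f) survives in the text; the normalization expression itself is missing. The parameter names and values (s, α, δ, r, ε) match those of the widely used per-channel energy normalization (PCEN), so that form is assumed here. Both this assumption and the initialization of the smoother with the first frame are choices made for illustration.

```python
import numpy as np

def energy_normalize(M, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """PCEN-style normalization of a (T, J) matrix of non-negative energies.

    Assumed form: EM = (M / (eps + CM)**alpha + delta)**r - delta**r,
    where CM is the first-order smoothed version of M along time.
    """
    M = np.asarray(M, dtype=float)
    CM = np.empty_like(M)
    CM[0] = M[0]                          # assumption: start from the first frame
    for t in range(1, M.shape[0]):
        CM[t] = (1 - s) * CM[t - 1] + s * M[t]
    return (M / (eps + CM) ** alpha + delta) ** r - delta ** r
```

With α close to 1 the division by the smoothed energy removes most of the overall gain, which is consistent with the stated goal that the amplitudes of different transient impacts should not influence one another.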

[0133] S5: Based on the feature matrices EM_i, the transient impact signals are clustered with K-medoids using the dynamic time warping (DTW) distance, with the number of clusters K determined by the silhouette coefficient method and the connected components of the cross-correlation undirected graph (CUG). This yields K sets S(k), k∈K. The number of members I(k), k∈K in each set is then counted to represent how often each vibration source occurs in long-term real-time micro-vibration monitoring. The specific steps are as follows:

[0134] S5-1: Perform K-medoids clustering based on DTW distance until the centroids no longer change or the maximum number of iterations is reached.

[0135] (1) Treat the feature matrix EM_i of each s(i) as a point, and select the K points with the largest mutual DTW distances as the initial center points (centroids);

[0136] (2) Clustering is performed according to the nearest principle. The DTW distance from each s(i) to the centroids of the K classes is calculated in turn, and all remaining s(i) are assigned to the nearest classes according to the principle of proximity to the centroids of the K classes, forming K classes;

[0137] (3) Redetermine the centroid and cost function for each class:

[0138]

[0139] In the formula, SUM_k(i) is the sum of the DTW distances from s(i) to the other transient impacts of the same class k. Find the s(i) with the smallest SUM_k(i) in the class, compare it with the SUM_k(i) of the original centroid, and select the one with the smaller cost function value as the new centroid.

[0140] (4) Repeat steps (2) and (3) until the centroids no longer change or the maximum number of iterations is reached, to obtain the K-classified data.
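Steps (1) to (4) can be sketched as below. This is an illustrative implementation under assumptions, not the patented code: the DTW frame cost is taken to be the Euclidean distance between feature rows, the initial centroids are chosen by a greedy farthest-point rule as an approximation of "largest mutual DTW distance", and the function names are hypothetical.

```python
import numpy as np

def dtw_distance(A, B):
    """DTW distance between feature matrices A (Ta, J) and B (Tb, J)."""
    Ta, Tb = len(A), len(B)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])   # Euclidean frame cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[Ta, Tb])

def k_medoids(dist, K, max_iter=100):
    """K-medoids on a precomputed (n, n) distance matrix.

    Returns (medoid indices, cluster label of every element).
    """
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    medoids = [int(i), int(j)][:K]
    while len(medoids) < K:                              # greedy farthest point
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)     # nearest-centroid rule
        new = []
        for k in range(K):
            members = np.flatnonzero(labels == k)
            if len(members) == 0:                        # keep an empty cluster's medoid
                new.append(medoids[k])
                continue
            sums = dist[np.ix_(members, members)].sum(axis=1)   # SUM_k(i)
            new.append(int(members[np.argmin(sums)]))
        if new == medoids:
            break
        medoids = new
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

The distance matrix is built once with `dtw_distance` over all pairs of EM_i and then reused for every candidate K, which keeps the repeated clustering runs of S5-2 inexpensive.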

[0141] S5-2: Determine the optimal number of clusters K, which is also the type of vibration signal. Here, a method is adopted that combines the silhouette coefficient method and the connected components of the cross-correlation undirected graph (CUG).

[0142] (1) Using K as a variable, perform multiple clustering operations on all s(i), where K takes values in the range $[2, 2K_c]$. The optimal K values $K_s(j)$ are obtained by the silhouette coefficient method:

[0143] $K_s(j) = \mathrm{peakfinder}(c_{\mathrm{sum}}(k))$, where peakfinder is the peak extraction function and $K_s(j)$ represents the j distinct peak values.

[0144] Silhouette coefficient of element i: $\dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$

[0145] Where a(i) and b(i) represent the DTW distance from element i to the centroid of its cluster and the DTW distance from the element to the next nearest centroid of the cluster, respectively.
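The silhouette-based candidate selection can be sketched as follows. The sketch assumes the standard silhouette form (b − a)/max(a, b) with a(i) and b(i) taken as the distances to the nearest and next-nearest centroid, assumes the score for each K is the sum over all elements, and uses a simple local-maximum rule in place of the unspecified peakfinder; the function names are hypothetical.

```python
from typing import Dict, List


def simplified_silhouette_sum(dist_to_centroids: List[List[float]]) -> float:
    """Sum of per-element silhouette values.

    dist_to_centroids[i][c] is the DTW distance from element i to centroid c;
    every row needs at least two centroids.
    """
    total = 0.0
    for row in dist_to_centroids:
        a, b = sorted(row)[:2]  # own centroid, next-nearest centroid
        total += 0.0 if b == 0 else (b - a) / max(a, b)
    return total


def peak_candidates(score_by_k: Dict[int, float]) -> List[int]:
    """Return the K values at which the score is a local maximum."""
    ks = sorted(score_by_k)
    peaks = []
    for idx, k in enumerate(ks):
        left = score_by_k[ks[idx - 1]] if idx > 0 else float("-inf")
        right = score_by_k[ks[idx + 1]] if idx < len(ks) - 1 else float("-inf")
        if score_by_k[k] > left and score_by_k[k] >= right:
            peaks.append(k)
    return peaks
```

In use, `score_by_k` would be filled by clustering once per K in the search range and calling `simplified_silhouette_sum` on the resulting distances.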

[0146] (2) Obtain the value $K_c$ using the connected-component method of the cross-correlation undirected graph (CUG). Assuming that the normalized cross-correlation value between transient impact signals of the same type is above 0.8, then for all s(i):

[0147] 1) Calculate the cross-correlation between all pairs of s(i), take the maximum value, and form a cross-correlation matrix:

[0148] The matrix entries are $a_{ij}$, the maximum cross-correlation value between s(i) and s(j). (14)

[0149] 2) Threshold the cross-correlation matrix to obtain the cross-correlation adjacency matrix.

[0150]

[0151] Here, $L^2(i)$ represents the square of the maximum amplitude of the transient impact signal s(i).

[0152] 3) Find all connected components of the cross-correlation adjacency matrix and denote their number as $K_c$.
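The three CUG steps can be illustrated with the sketch below. Note the substitution: the text's thresholding formula involving $L^2(i)$ is not reproduced here, so the sketch normalizes the cross-correlation by the two signals' energy norms, which is the common normalized cross-correlation and only an assumption about the intended normalization. The function names are hypothetical.

```python
from typing import List, Sequence


def max_normalized_xcorr(x: Sequence[float], y: Sequence[float]) -> float:
    """Maximum over all lags of the energy-normalized cross-correlation."""
    ex = sum(v * v for v in x) ** 0.5
    ey = sum(v * v for v in y) ** 0.5
    if ex == 0 or ey == 0:
        return 0.0
    best = 0.0
    for lag in range(-(len(y) - 1), len(x)):
        lo, hi = max(0, lag), min(len(x), len(y) + lag)
        s = sum(x[i] * y[i - lag] for i in range(lo, hi))
        best = max(best, s / (ex * ey))
    return best


def count_connected_components(signals: List[Sequence[float]],
                               threshold: float = 0.8) -> int:
    """Build the thresholded adjacency matrix and count its components (K_c)."""
    n = len(signals)
    adj = [[max_normalized_xcorr(signals[i], signals[j]) >= threshold
            for j in range(n)] for i in range(n)]
    seen, count = set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        stack = [start]  # depth-first traversal of one component
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(j for j in range(n) if adj[node][j] and j not in seen)
    return count
```

Because the maximum is taken over all lags, time-shifted and rescaled copies of the same waveform fall into one component.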

[0153] (3) The optimal K value is jointly determined by $K_s(i)$ and $K_c$:

[0154] $K = \min\{K_s(i) \mid K_s(i) \ge K_c\}$ (16)
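Equation (16) amounts to taking the smallest silhouette peak that is not below the connected-component count. A minimal sketch follows; the function name `choose_k` is hypothetical, and returning `None` when no peak satisfies the condition is an assumption, since the text does not specify that case.

```python
from typing import Iterable, Optional


def choose_k(silhouette_peaks: Iterable[int], k_c: int) -> Optional[int]:
    """K = min{K_s(i) | K_s(i) >= K_c}; None if no candidate qualifies."""
    candidates = [k for k in silhouette_peaks if k >= k_c]
    return min(candidates) if candidates else None
```

For example, with silhouette peaks at K = 3, 5, 8 and $K_c = 4$, the rule selects K = 5.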

[0155] The results are divided into K categories. To better demonstrate the classification performance of this method, three datasets obtained in the laboratory are selected, as shown in Figure 7. Eight transient impact signals were selected for classification, and the results were visualized using the t-SNE dimensionality reduction method, as shown in Figure 8.

[0156] S6: GMM (Gaussian Mixture Model) modeling is performed on the set of all feature matrices of set S(k), treating the row vectors, i.e., the frame vectors, as independent variables to obtain the model $m_k(c)$, c∈(2,...,n). The model parameters $(\Sigma_1,\ldots,\Sigma_i,\ \mu_1,\ldots,\mu_i,\ \sigma_1,\ldots,\sigma_i)$ are obtained iteratively using the EM algorithm. The number of independent Gaussian components c in the model is verified, and the GMM with the smallest $\mathrm{AIC}(m_k(c))$ according to the Akaike Information Criterion (AIC) is used as the final model. Step S6 specifically includes:

[0157] S6-1: GMM modeling is performed for each of the K classes. According to equation (7), the row vectors of all feature matrices of each class are used as independent elements to fit a mixture model of c independent Gaussian components. The parameters $(\Sigma_1,\ldots,\Sigma_i,\ \mu_1,\ldots,\mu_i,\ \sigma_1,\ldots,\sigma_i)$ of each Gaussian component are obtained using the EM algorithm. The model is denoted $m_k(c)$, c∈(2,...,n), where n is usually an integer less than or equal to 40.

[0158] S6-2: Determine the optimal number of components c for model $m_k(c)$: using the Akaike Information Criterion (AIC), the value of c that minimizes $\mathrm{AIC}(m_k(c))$ is taken as the final number of components, giving the final model.
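The AIC comparison in S6-2 can be illustrated with the standard definition AIC = 2p − 2 ln L, where p is the number of free parameters and ln L is the maximized log-likelihood. The sketch below assumes full-covariance components and takes the fitted log-likelihoods as given inputs; the function names and the numbers in the usage comment are purely illustrative, not results from the patent.

```python
from typing import Dict


def gmm_free_parameters(c: int, d: int) -> int:
    """Free parameters of a c-component, d-dimensional full-covariance GMM:
    (c - 1) mixing weights, c*d means, c*d*(d+1)/2 covariance entries."""
    return (c - 1) + c * d + c * d * (d + 1) // 2


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2p - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_component_count(loglik_by_c: Dict[int, float], d: int) -> int:
    """Return the component count c whose fitted model has the smallest AIC."""
    return min(loglik_by_c,
               key=lambda c: aic(loglik_by_c[c], gmm_free_parameters(c, d)))


# Illustrative only: made-up log-likelihoods for c = 2, 3, 4 with d = 2.
# select_component_count({2: -500.0, 3: -480.0, 4: -478.0}, d=2)
```

The penalty term 2p is what stops the selection from always preferring the largest c, since the log-likelihood alone never decreases as components are added.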

[0159] S7: For the time-domain discrete signal y(n) acquired by the other sensor, repeat steps S2-S6 above.

[0160] S8: Maximum a posteriori probability identification is performed on the transient impact signals t(i), i∈1,...,n, captured by the sensor; sets that may belong to the same class are compared using GMM parameters, and sets with highly overlapping models are merged into a single set. Statistical analysis is then performed to obtain a dataset of vibrations around the site that affect the center of the site.

[0161] Take the two sensors as an example. To determine whether a transient impact signal t(i), captured now or subsequently by one sensor, belongs to the same type as a certain class of signals of the other sensor, EN-MFCC feature extraction is performed on t(i), the probability density is calculated under each GMM model, and a judgment is made against a threshold. Meanwhile, to determine whether two classes of vibration sources from the two sensors are the same type of vibration source, the judgment can be based on ModelSimilarity and the DTW distance between the center points of the two classes.

[0162] Step S8 specifically includes:

[0163] S8-1: Identify a single t(i) and determine the type $k^*$ of transient impact signal captured by the sensor to which it belongs:

[0164]

[0165] At the same time, it is necessary to determine whether t(i) is a transient impact signal emitted by an external vibration source. A threshold determination is therefore applied to $k^*$ to confirm that t(i) is a transient impact signal of the same type within the set. The threshold T can be either the EER (equal error rate) threshold or a high-FRR (false rejection rate) threshold, and is determined comprehensively using the training dataset during model training.
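The identification and threshold test in S8-1 can be sketched as below. Several simplifications are assumed and are not taken from the patent: equal class priors (so the maximum a posteriori choice reduces to the maximum likelihood), diagonal covariances instead of full covariance matrices, and the frame-averaged log-likelihood as the score compared against T. The function names and the model layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Component = Tuple[float, Sequence[float], Sequence[float]]  # weight, mean, variance


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_avg_loglik(frames: List[Sequence[float]], gmm: List[Component]) -> float:
    """Average per-frame log-likelihood, frames treated as independent."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total / len(frames)


def identify(frames: List[Sequence[float]], models: Dict[str, List[Component]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Return (k*, score); k* is None when the best score falls below T."""
    scores = {k: gmm_avg_loglik(frames, g) for k, g in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores[best]
```

A signal rejected by the threshold would be treated as coming from a source outside the known classes; where T sits on the EER or high-FRR side of the trade-off has to be tuned on the training data, as the text states.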

[0166] S8-2: The GMM models of the various classes of transient impact signals are determined separately for the two sensors. Assuming each sensor has one such model, the sum of the relative errors of the corresponding parameters and the DTW distance between the center points are calculated as the basis for determining whether the two classes of signals belong to the same class. Let the parameters of the first model be indexed by i and the corresponding parameters of the second model by j; for each parameter of the first model, find the closest parameter in the second model, and calculate the similarity between the two models.

[0167]

[0168] When i≠j, pad with zeros for any missing parameters.

[0169] S8-3: Traverse the ModelSimilarity of all clusters of the two sensors, find the pair with the smallest similarity, and compare the DTW distance between the center points of the two. When the distance is less than the average distance within the cluster, it can be considered that the two models describe the signal dataset emitted by the same vibration source.
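The comparison in S8-2 and S8-3 might be sketched as follows. The exact ModelSimilarity formula is not reproduced in the text, so this is only one plausible reading: each GMM is represented as a list of flattened component parameter vectors, components are matched greedily to their closest counterpart, the relative errors of matched parameters are summed, and unmatched components are compared against zeros. All names here are hypothetical.

```python
from typing import List, Sequence


def relative_error(a: Sequence[float], b: Sequence[float],
                   eps: float = 1e-12) -> float:
    """Sum of per-parameter relative errors |a - b| / |a|."""
    return sum(abs(x - y) / max(abs(x), eps) for x, y in zip(a, b))


def model_similarity(model_a: List[Sequence[float]],
                     model_b: List[Sequence[float]]) -> float:
    """Summed relative error between matched components; lower = more similar."""
    if len(model_a) < len(model_b):  # iterate over the larger model
        model_a, model_b = model_b, model_a
    unused = list(model_b)
    total = 0.0
    for comp in model_a:
        if unused:
            match = min(unused, key=lambda b: relative_error(comp, b))
            unused.remove(match)
        else:
            match = [0.0] * len(comp)  # zero-pad the missing component
        total += relative_error(comp, match)
    return total


def same_source(medoid_dtw: float, mean_intra_cluster_dtw: float) -> bool:
    """S8-3 rule: same source if the medoids are closer than the
    average within-cluster DTW distance."""
    return medoid_dtw < mean_intra_cluster_dtw
```

In use, `model_similarity` would be evaluated for every pair of clusters across the two sensors, and `same_source` applied to the pair with the smallest value.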

[0170] This invention effectively solves the problem of classifying vibration signals from different sources in long-term micro-vibration monitoring data, making it easier to analyze the micro-vibration sources around the site. It can also identify whether transient impact signals captured by different sensors or by the same sensor belong to a certain type of known vibration source, which is helpful in finding micro-vibration sources and vibration isolation treatment in the later stage. Furthermore, it can determine whether the same type of vibration source still exists in the improved vibration isolation site.

Claims

1. A method for identifying site micro-vibration sources based on machine learning, characterized in that it includes the following steps:
S1: Using a micro-vibration acceleration sensor at the edge of the tested site and a micro-vibration acceleration sensor at the center of the tested site, the respective signals are measured; the power spectrum of each signal is estimated using the Welch overlapping segment averaging method to obtain the respective dominant frequency components; the magnitude-squared coherence is calculated, the peaks of the result are extracted, and the corresponding common frequencies are returned; these frequencies are determined to be the k periodic vibration sources at the periphery of the site that affect the center of the site;
S2: Spectral subtraction is performed on the time-domain discrete signal acquired by one of the sensors to reduce noise, yielding the denoised frequency-domain signal, from which the denoised time-domain signal is recovered;
S3: An endpoint detection algorithm based on a two-dimensional Gaussian distribution is applied to the denoised time-domain signal to extract the transient impact signals;
S4: MFCC feature extraction is performed on each extracted transient impact signal, and the extracted MFCC features are then energy-normalized to obtain the feature matrix of each transient impact signal;
S5: Based on the feature matrix, the transient impact signals are clustered by dynamic time warping distance, with K-medoids as the clustering method; the number of clusters K is determined by combining the silhouette coefficient with the connected components of the cross-correlation undirected graph, yielding K sets S(k); the number of members I(k) in each set is counted to represent the vibration frequency of the different vibration sources in long-term real-time micro-vibration monitoring;
S6: GMM modeling is performed on the set of feature matrices of each set S(k), treating the row vectors as independent variables to obtain the model $m_k(c)$; the model parameters are obtained iteratively using the EM algorithm; the number of independent Gaussian components of the model is verified, and the number of components with the smallest Akaike information criterion value is selected as the optimal model, which is used as the final model of the set;
S7: Steps S2-S6 above are repeated for the time-domain discrete signal y(n) acquired by the other sensor;
S8: Maximum a posteriori probability identification is performed on the transient impact signals t(i) captured by the sensor; sets that may belong to the same class are compared using GMM parameters, and sets with highly overlapping models are merged into a single set; statistical analysis is then performed to obtain a dataset of vibrations around the site that affect the center of the site.

2. The site micro-vibration source identification method based on machine learning according to claim 1, characterized in that step S2 specifically includes:
S2-1: A Fourier transform is performed on the discrete-time signal and the noise spectrum is estimated;
S2-2: Spectral subtraction noise reduction is performed on the original vibration signal, and an IFFT is performed on the result to obtain the denoised time-domain signal.

3. The method for identifying site micro-vibration sources based on machine learning according to claim 1, characterized in that step S3 specifically includes:
S3-1: The training dataset is first divided into frames, and the STFT is then applied to each frame to convert it into a frequency-domain signal, given the frame length and the number of sampling points per frame;
S3-2: Each frame of the frequency-domain signal is divided into n frequency sub-bands of unequal bandwidth, and the energy of each sub-band is calculated between its lower and upper limits, given the bandwidth of each sub-band;
S3-3: From training data containing transient impact waveforms and silent-state waveforms, a feature matrix of frame-by-frame energy features is constructed, in which each entry represents the energy of the n-th sub-band of the m-th frame of data;
S3-4: A clustering algorithm is used to binary-classify the row vectors into a transient impact cluster and a silent-state cluster; since the feature variable is energy magnitude, the Euclidean distance is chosen as the distance metric for clustering; for each clustering result, a two-dimensional Gaussian model is constructed, whose parameters, namely the weight of each independent Gaussian distribution and its expectation and variance, are obtained by the EM algorithm through multiple iterations;
S3-5: The same framing and frequency sub-band energy calculation are applied to the input waveform data, the probability of each frame under both models is calculated, and the likelihood ratio test criterion is used to determine whether each frame contains a transient impact signal, based on the probabilities that the j-th feature variable in the i-th frame belongs to the transient impact state or the silent state; the likelihood ratio is thresholded to decide whether each frame contains a transient impact, with the per-frame energy threshold and the sub-band energy threshold obtained from the training dataset; for two adjacent frames that partially overlap, the union of their results is taken as the determination for the frame; consecutive detected frames, and detected frames within 20 frames of each other, are treated as one complete transient impact waveform.

4. The method for identifying site micro-vibration sources based on machine learning according to claim 1, characterized in that step S4 specifically includes:
S4-1: The transient impact signal is divided into frames of the same length as in step S3-1 and then filtered using a Mel filter bank, based on the conversion relationship between the Mel scale and Hertz;
S4-2: MFCC feature extraction is performed on the scale-transformed signal to form a feature matrix, which is then energy-normalized; the normalization parameters are selected according to the micro-vibration sensor chosen in the project and the amplitude distribution of the specific vibration data measured;
S4-3: The feature matrix belonging to each transient impact signal is obtained, where row T represents the number of frames of each independent transient impact and column J represents the coefficients of the Mel filter bank.

5. The site micro-vibration source identification method based on machine learning according to claim 4, characterized in that step S5 specifically includes:
S5-1: Perform K-medoids clustering based on DTW distance:
(1) Treat the feature matrix of each s(i) as a point and select the K points with the largest mutual DTW distance as the initial centroids;
(2) Cluster by the nearest-centroid principle: calculate the DTW distance from each s(i) to the K class centroids in turn, and assign all remaining s(i) to the nearest class, forming K classes;
(3) Redetermine the centroid and the cost function of each class, where $\mathrm{SUM}_k(i)$ is the sum of the DTW distances from s(i) to the other transient impacts of the same class k; find the s(i) with the smallest $\mathrm{SUM}_k(i)$ in the class, compare it with the $\mathrm{SUM}_k(i)$ of the original centroid, and select the one with the smaller cost function value as the new centroid;
(4) Repeat steps (2) and (3) until the centroids no longer change or the maximum number of iterations is reached, to obtain the data divided into K classes;
S5-2: Determine the optimal number of clusters K by combining the silhouette coefficient method with the connected components of the cross-correlation undirected graph:
(1) Taking K as a variable, perform multiple clustering operations on all s(i), with K taking values within a certain range, and obtain the optimal K values $K_s$ by the silhouette coefficient method, where a(i) and b(i) represent the DTW distance from element i to the centroid of its cluster and the DTW distance from the element to the next-nearest centroid, respectively;
(2) Obtain the value $K_c$ using the connected-component method of the cross-correlation undirected graph, assuming that the normalized cross-correlation value between transient impact signals of the same type is above 0.8; for all s(i): 1) calculate the cross-correlation between every pair of s(i), take the maximum value, and form the cross-correlation matrix; 2) threshold the cross-correlation matrix to obtain the cross-correlation adjacency matrix, where $L^2(i)$ represents the square of the maximum amplitude of the transient impact signal s(i); 3) find all connected components of the cross-correlation adjacency matrix and denote their number as $K_c$;
(3) The optimal K value is jointly determined by $K_s$ and $K_c$: $K = \min\{K_s(i) \mid K_s(i) \ge K_c\}$.

6. The method for identifying site micro-vibration sources based on machine learning according to claim 5, characterized in that step S8 specifically includes:
S8-1: A single t(i) is identified and the type $k^*$ of transient impact signal captured by the sensor to which it belongs is determined according to equation (18); to determine whether t(i) is a transient impact signal emitted by an external vibration source, a threshold determination is applied to $k^*$ to confirm it as a transient impact signal of the same type within the set, where T is the EER or high-FRR threshold, determined comprehensively using the training dataset during model training;
S8-2: The GMM models of the various classes of transient impact signals are determined separately for the two sensors; assuming each sensor has one such model, the sum of the relative errors of the corresponding parameters and the DTW distance between the center points are calculated as the basis for determining whether the two classes of signals belong to the same class; with the parameters of the first model indexed by i and the corresponding parameters of the second model indexed by j, the closest parameter in the second model is found for each parameter of the first, and the similarity between the two models is calculated;
S8-3: Traverse the ModelSimilarity of all clusters of both sensors, find the pair with the smallest value, and compare the DTW distance between their center points; if the distance is less than the average distance within the cluster, the two models are considered to describe the signal dataset emitted by the same vibration source.