A method for nondestructive testing and quality grading of thawed state of prefrozen meat

What is AI technical title?
AI technical title is built by PatSnap AI team. It summarizes the technical point description of the patent document.
By combining cross-modal temporal resonance analysis of spontaneous acoustic signals and visible light image sequences, the problem of assessing the thawing uniformity during the thawing process of frozen meat is solved, realizing high-sensitivity and low-cost thawing status detection, which is suitable for intelligent monitoring of cold chain food.

CN122193418APending Publication Date: 2026-06-12GUANGDONG HAICHANGYUAN GUOTONG FOOD MATEERIAL CO LTD

View PDF 0 Cites 0 Cited by

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: GUANGDONG HAICHANGYUAN GUOTONG FOOD MATEERIAL CO LTD
Filing Date: 2026-03-05
Publication Date: 2026-06-12

Application Information

Patent Timeline

05 Mar 2026

Application

12 Jun 2026

Publication

CN122193418A

IPC: G01N29/14; G01N29/44; G01N29/46; G01N29/50; G01N21/84; G01B11/28; G01B11/30

AI Tagging

Application Domain

Material analysis using acoustic emission techniquesProcessing detected response signal

Explore More Agents

Novelty Search
Search existing technologies and assess novelty
↗
FTO
Analyze whether a product may infringe others' patents
↗
Design FTO
Check prior-design risk for exterior design
↗
Drafting
Draft patent application text based on a technical solution
↗
Find Solutions with TRIZ
Generate feasible solution to solve your technical challenge
↗

Similar Technology Patents

Shield tunnel joint toughness test reciprocating loading method and system
CN122192746AMachine part testingMaterial analysis using acoustic emission techniques
Method and apparatus for testing the impact processing speed of acoustic emission detection systems
CN122193428AMaterial analysis using acoustic emission techniquesProcessing detected response signal
Online monitoring of sinter quality
WO2026119526A1Analysing solids using sonic/ultrasonic/infrasonic wavesMaterial analysis using acoustic emission techniquesMetallurgyIndustrial engineering
A cultural relic acoustic emission multi-source information disease monitoring method and device
CN120761501BMaterial analysis using acoustic emission techniquesProcessing detected response signal
A method for analyzing multi-field coupling creep characteristics of rock mass with anchor joint
CN121656005BMaterial analysis using acoustic emission techniquesMaterial strength using tensile/compressive forces

Get free access to AI patent search and analysis

Check patentability, review prior art and ask IP Agent with full patent context.

AI Technical Summary

⚠Technical Problem

Existing technologies struggle to accurately assess the uniformity of thawing during the freezing of frozen meat, particularly the dynamic differences in microstructure. Furthermore, existing methods rely on external imaging or contact temperature measurement, lacking a highly sensitive, calibration-free, and fully non-contact detection solution.

⚗Method used

By acquiring spontaneous acoustic signals and visible light image sequences of frozen meat samples during the thawing process, a time-aligned acoustic signal and image frame sequence is generated. Adaptive wavelet threshold denoising is used to extract phase transition acoustic fingerprints. Combined with cross-modal temporal resonance analysis, a thawing uniformity evaluation feature vector is generated and input into a classification model for judgment.

🎯Benefits of technology

It achieves highly sensitive, robust, and low-cost detection of the thawing state of frozen meat, accurately captures microscopic physical processes, improves the sensitivity and reliability of thawing state identification, reduces hardware and computing overhead, and is suitable for continuous monitoring of cold chain food.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN122193418A_ABST

Patent Text Reader

Abstract

The present application relates to the multi-modal data acquisition and thawing uniformity intelligent evaluation technology in the frozen meat thawing process. In view of the low detection precision and poor real-time performance of the existing thawing uniformity detection, the acoustic fingerprint features reflecting the ice crystal phase transition process are extracted through adaptive wavelet denoising and short-time Fourier analysis; combined with U-Net neural network and dynamic threshold segmentation, the pixel-level identification of high-precision thawing area and the area and edge roughness quantification are realized. The dynamic cross-correlation and beamforming algorithm are used to establish the cross-modal coupling strength index, which allows spatial positioning of thawing unevenness and realizes real-time trend thermodynamic map analysis. Finally, the multi-dimensional sound-visual data is fused, and the thawing uniformity grading and abnormal early warning are realized through the lightweight XGBoost model. The scheme improves the fineness and intelligent level of the thawing process state monitoring, and has significant application value for cold chain food quality control.

Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the fields of cold chain food processing and non-destructive testing of food, and in particular to a method for non-destructive testing and quality grading of the thawing state of pre-frozen meat. Background Technology

[0002] Currently, in the field of non-destructive testing of cold chain food processing and related foods, the assessment of the thawing uniformity of frozen meat primarily relies on visual image analysis as the core technology. Mainstream solutions largely depend on single-modal visible light imaging or thermal imaging techniques. By analyzing the surface color, texture changes, and temperature distribution of frozen meat during the thawing process, they achieve dynamic monitoring and uniformity assessment of the thawing process. These technologies typically employ industrial cameras, infrared sensors, or multispectral sensors to acquire timed image sequences or temperature thermal maps, and use algorithms such as image segmentation, region extraction, and edge recognition to quantify the rate of thawing area advancement and surface state changes. With the development of deep learning and image segmentation technologies, some solutions further integrate neural network models such as U-Net to improve boundary recognition accuracy and attempt to determine thawing uniformity through multi-frame image temporal analysis. Simultaneously, the industry trend is evolving towards integrating multiple sensing methods and improving real-time performance and intelligent decision-making capabilities.

[0003] Although visible light imaging and thermal imaging methods have certain advantages in practical production applications, such as speed, non-contact operation, and ease of integration, the following problems have also been prominently exposed in practical applications: Relying solely on a single visual modality is susceptible to significant interference from sample appearance heterogeneity, surface reflection, localized frost, and differences in color or shape. Existing two-dimensional imaging schemes struggle to effectively perceive phase transitions and temperature gradients within internal structures.

[0004] While thermal imaging can reflect the temperature distribution on the surface and at a limited depth, accurately obtaining the interface of the internal thawing area depends on the heat conduction model and calibration. It is significantly correlated with meat density, thickness, and fat distribution, and has poor versatility and reliability.

[0005] Some multi-sensor combination solutions attempt to perform data fusion (such as image + temperature), but they usually face two major challenges: first, the synchronization of multimodal hardware is difficult, and the cost of timestamp alignment and spatial registration increases; second, a large amount of calibration and supervised training is required, which greatly increases the difficulty of deployment and maintenance.

[0006] The aforementioned mainstream methods mostly remain at the level of macroscopic characterization of the thawing front or area changes, which are difficult to effectively reveal the dynamic differences in the microscale structure of meat during the solid-liquid phase transition. They are also unable to accurately characterize the local unevenness of thawing caused by differences in variety, morphology, and initial conditions, resulting in limited accuracy and stability in the determination.

[0007] Current technologies largely ignore the physical signals generated by frozen meat during thawing, particularly the spontaneous acoustic responses excited by ice crystal melting and structural relaxation during the solid-liquid phase transition. For a long time, thawing uniformity assessment has relied primarily on external imaging or contact temperature measurement, failing to fully exploit the transmission, emission, or passive acoustic information characteristics during the phase transition process. The industry lacks highly sensitive, calibration-free, end-to-end non-contact detection solutions for the natural kinetics of frozen meat thawing. Furthermore, few technologies can effectively and dynamically couple microscopic dynamic parameters with macroscopic visual processes to achieve end-to-end intelligent quality control from internal structural evolution to a visually discernible thawing state. Summary of the Invention

[0008] This application provides a method for non-destructive testing and quality grading of the thawing state of pre-frozen meat, aiming to solve one of the problems or issues of the prior art mentioned in the background.

[0009] This application provides a non-destructive testing and quality grading method for the thawing state of pre-frozen meat, specifically including: S1: Based on the spontaneous acoustic signals and visible light image sequences that evolve over time in a constant temperature thawing environment of the obtained frozen meat samples, generate a time-aligned original acoustic signal sequence and image frame sequence. S2: Perform adaptive wavelet threshold denoising on the acoustic signal sequence to obtain a time-frequency spectrum, and calculate three types of dynamic features in the sensitive frequency band: instantaneous acoustic energy entropy, main frequency drift rate, and short-time zero-crossing rate variation coefficient, to obtain a phase transition acoustic fingerprint vector sequence that characterizes the microstructure relaxation properties of the phase transition process. S3: Generate a thawing area proportion curve and an edge roughness time series sequence based on the image frame sequence, and obtain a reference sequence of the thawing process change rate by performing a first-order difference operation on the thawing area proportion curve and the edge roughness time series sequence. S4: Based on the thawing process change rate benchmark sequence, calculate the dynamic cross-correlation function between it and the features of each dimension in the phase change acoustic fingerprint vector sequence, and construct a cross-modal temporal resonance feature set by identifying the acoustic sub-features with the largest time delay correlation and their corresponding time delay parameters; S5: Generate a frame-by-frame updated temporal sequence of resonance intensity based on the cross-modal temporal resonance feature set; S7: By fusing the mean, spatial dispersion, and acoustic fingerprint entropy slope of the resonance intensity time series, a thawing uniformity evaluation feature vector is obtained, and this feature vector is input into the classification model to output the thawing uniformity level judgment result.

[0010] This application provides a non-destructive testing and quality grading method for the thawing state of pre-frozen meat, which has the following beneficial effects: (1) The method proposed in this application constructs a thawing state recognition framework based on passive acoustic perception for the first time by mining the acoustic signals spontaneously generated by the solid-liquid phase transition during the natural thawing of frozen meat. This effectively overcomes the technical bottlenecks of high system complexity, high energy consumption, and difficulty in continuous monitoring caused by the reliance on external excitation sources or high-cost thermal imaging equipment in traditional methods. Compared with the problems of the single visual modality commonly used in the prior art being susceptible to changes in illumination and interference from surface reflection, which leads to a decrease in segmentation accuracy, and the limitation of infrared imaging that only reflects the surface temperature distribution and cannot perceive the internal phase transition process, this scheme uses a ring-arranged multi-channel low-noise microphone array to non-contactly collect wide-band spontaneous acoustic signals. Combined with adaptive wavelet denoising and sensitive frequency band feature extraction, it accurately captures the "phase transition acoustic fingerprint" in the range of 1–5 kHz that is strongly correlated with microscopic physical processes such as ice crystal breakage and water migration. This achieves a deep, non-destructive, and real-time characterization of the thawing dynamics process, significantly improving the sensitivity and physical interpretability of state recognition.

[0011] (2) Furthermore, this invention innovatively introduces a cross-modal temporal resonance analysis mechanism, dynamically aligning and cross-correlates the acoustic fingerprint sequence with the evolution trajectory of the thawing region obtained from image segmentation. This identifies a lag response of approximately 12–18 seconds between key acoustic features such as the dominant frequency drift rate and the macroscopic thawing area change. Based on this, a "resonance intensity" index with clear physical meaning is defined to quantify the coupling relationship between local phase transition activity and the overall thawing process. This mechanism not only reveals the potential predictive ability of acoustic response leading visible appearance changes in the time dimension, but also achieves spatial localization of the sound source through beamforming algorithms. This is superimposed onto the visual frame to generate an acoustic-visual fusion heatmap. Combined with spatial consistency criteria (high resonance intensity in multiple quadrants and excessive spatial dispersion), it realizes early warning and spatial source tracing of thawing non-uniformity, significantly improving detection reliability and discrimination robustness. This solves the common problems of high misjudgment rate and large feedback delay caused by the lack of spatiotemporal joint criteria in existing technologies.

[0012] (3) Finally, this scheme abandons the complex multimodal synchronous calibration process and expensive multispectral sensing devices, and constructs a lightweight closed-loop evaluation system based entirely on passive perception. It integrates three core indicators: temporal resonance mean, spatial dispersion, and acoustic entropy slope, to drive the classifier to output multi-level uniformity levels and generate trend visualization maps. While ensuring high-precision recognition, it significantly reduces hardware costs and computational overhead, and has good engineering adaptability and production line deployment potential. The entire method does not require sample pretreatment, external excitation, or variable temperature control conditions. It is suitable for continuous, non-invasive quality monitoring of various cold chain foods under constant temperature conditions, and provides an original technical path for achieving precise thawing control in intelligent cold chain logistics. It has outstanding practicality, scalability, and industrial application prospects.

[0013] In summary, this invention establishes a complete mapping chain from phase transition physical mechanisms to passive acoustic responses and multi-dimensional spatiotemporal fusion criteria, achieving highly sensitive, robust, and low-cost identification of the thawing state of cold chain food. It breaks through the fundamental limitations of traditional visual or thermal imaging-dominated methods in terms of deep information acquisition and dynamic process analysis, and constructs a new intelligent monitoring system with physical interpretability, no need for complex parameter tuning, and easy integration and deployment, effectively promoting the evolution of thawing processes towards digitalization and intelligence. Attached Figure Description

[0014] Figure 1 This is the main flowchart of a method for non-destructive testing and quality grading of pre-frozen meat in its thawing state.

[0015] Figure 2 This is a sub-flowchart of a method for non-destructive testing and quality grading of pre-frozen meat in its thawing state.

[0016] Figure 3 This is another sub-flowchart of a method for non-destructive testing and quality grading of pre-frozen meat in its thawing state. Detailed Implementation

[0017] Embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0018] The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, specific examples of components and arrangements are described below. Of course, these are merely examples and are not intended to limit the invention. Furthermore, reference numerals and / or letters may be repeated in different examples; such repetition is for simplification and clarity and does not in itself indicate a relationship between the various embodiments and / or arrangements discussed.

[0019] like Figure 1 As shown, this application provides a method for non-destructive testing and quality grading of pre-frozen meat in its thawing state, specifically including: S1: Based on the spontaneous acoustic signals and visible light image sequences that evolve over time in a constant temperature thawing environment of the obtained frozen meat samples, generate a time-aligned original acoustic signal sequence and image frame sequence. S2: Perform adaptive wavelet threshold denoising on the acoustic signal sequence to obtain a time-frequency spectrum, and calculate three types of dynamic features in the sensitive frequency band: instantaneous acoustic energy entropy, main frequency drift rate, and short-time zero-crossing rate variation coefficient, to obtain a phase transition acoustic fingerprint vector sequence that characterizes the microstructure relaxation properties of the phase transition process. S3: Generate a thawing area proportion curve and an edge roughness time series sequence based on the image frame sequence, and obtain a reference sequence of the thawing process change rate by performing a first-order difference operation on the thawing area proportion curve and the edge roughness time series sequence. S4: Based on the thawing process change rate benchmark sequence, calculate the dynamic cross-correlation function between it and the features of each dimension in the phase change acoustic fingerprint vector sequence, and construct a cross-modal temporal resonance feature set by identifying the acoustic sub-features with the largest time delay correlation and their corresponding time delay parameters; S5: Generate a frame-by-frame updated temporal sequence of resonance intensity based on the cross-modal temporal resonance feature set; S7: By fusing the mean, spatial dispersion, and acoustic fingerprint entropy slope of the resonance intensity time series, a thawing uniformity evaluation feature vector is obtained, and this feature vector is input into the classification model to output the thawing uniformity level judgment result.

[0020] Step S1: A non-contact, ring-shaped array of 8 low-noise condenser microphones is arranged within the thawing chamber to simultaneously acquire spontaneous acoustic signals from frozen meat samples evolving over time in a constant-temperature thawing environment. A top-mounted industrial camera acquires corresponding visible light image sequences at a frame rate of 1 fps. Hardware timestamps are used to achieve strict alignment of the acoustic and visual dual-modal data, generating a time-aligned original acoustic signal sequence and image frame sequence. Specifically, this includes: S1.1: Based on the physical mechanism that frozen meat spontaneously generates a wideband acoustic response due to microscale structural relaxation caused by ice crystal phase change during thawing, a sensor array consisting of 8 low-noise condenser microphones is arranged non-contactly in a ring distribution position inside the thawing chamber. Each microphone has a frequency response range of 50 Hz–20 kHz and a signal-to-noise ratio of ≥75 dB to uniformly cover the three-dimensional spatial acoustic radiation field of the sample and obtain passive acoustic signal input from multiple perspectives.

[0021] Based on the microscale structural relaxation acoustic emission mechanism triggered by the ice crystal phase transition during the thawing of frozen meat, a response range covering... Hz, signal-to-noise ratio not lower than A low-noise condenser microphone of dB is used as the sensing element to construct a sensing array with a ring distribution feature to ensure three-dimensional uniform coverage of the sound radiation field.

[0022] A geometric uniform point distribution algorithm is used (parameter: array radius). mm, vertical spacing Eight microphones are arranged on the inner wall of the thawing chamber (mm) to ensure that each sensing unit covers different azimuth and elevation angles of the frozen meat sample, forming an all-round passive acoustic acquisition layout.

[0023] Furthermore, an array directivity equalization optimization method is used (parameters: sensitivity deviation compensation value of each unit ±...). (dB) calibrates the pickup characteristics of individual microphones to equalize the overall array's receiving gain in all directions and reduce spatial acquisition blind spots caused by differences in unit response.

[0024] Furthermore, through acoustic propagation modeling methods (based on the spherical wave equation and the reflection coefficient of the cavity inner wall), The coverage efficiency of the array inside the thawing cavity was evaluated, and the microphone spacing and installation angle were optimized so that the spatial sampling grid of the array could meet the Nyquist spatial sampling conditions in the target frequency band, thereby avoiding spatial aliasing.

[0025] Through a non-contact installation structure (parameter: natural frequency of the suspended shock-absorbing bracket) (Hz), to isolate the coupling interference of mechanical vibration on the microphone, and to stably suspend the ring array at the designated position in the thawing cavity, ensuring the purity and stability of acoustic signal acquisition, and achieving the design goal of acquiring multi-view passive acoustic signal input.

[0026] For example, in a machine with a volume of m In the constant temperature defrosting chamber, a CMC-8L condenser microphone array unit was selected, with each unit having a frequency response range of [missing information]. Hz, signal-to-noise ratio is dB. Using a geometrically uniform distribution algorithm, eight microphones are arranged at equal intervals in a diameter... A circular ring, mm in diameter, perpendicular to the center of the sample. mm, adjust the installation angle to the azimuth interval °, elevation angle °. After the array was built, the gain of each microphone was calibrated using a pink noise signal, and the sensitivity error of each unit was measured to be controlled within ± dB range. Verification using a spherical wave propagation model showed that the array coverage efficiency reached the design value within the target frequency band, and spatial sampling satisfied the Nyquist condition; the suspended anti-vibration bracket effectively isolated the workbench vibration, reducing the channel noise floor to [value missing]. μPa effectively improves the dynamic range of subsequent signal processing stages. The final output data is an 8-channel multi-view passive acoustic raw signal sequence, supporting the subsequent phase change acoustic fingerprint extraction process and achieving multi-directional, low-interference spontaneous sound field acquisition.

[0027] S1.2: The original analog acoustic signal output by the 8-channel microphone array is pre-amplified and anti-aliasing filtered. Then, the analog-to-digital conversion is completed by the multi-channel synchronous sampling module at a sampling rate of 48 kHz to generate the original acoustic signal sequence in the digital domain. This ensures that the phase consistency error between each channel is less than ±1.5°, so as to guarantee the spatial positioning accuracy of subsequent beamforming.

[0028] For the raw analog sound signal acquired by the 8-channel low-noise condenser microphone array, a low-noise preamplifier circuit (gain set to 20 dB, input impedance ≥ 10 kΩ) is used to increase the amplitude of the weak spontaneous sound signal, ensuring that the signal amplitude matches the dynamic range of the subsequent analog-to-digital conversion.

[0029] Furthermore, an anti-aliasing filtering algorithm (type: fourth-order Butterworth low-pass filter, cutoff frequency set to 24 kHz) is used to attenuate spectral components above the Nyquist frequency to prevent spectral aliasing during sampling and to obtain the filtered analog signal output.

[0030] Furthermore, a multi-channel synchronous sampling module (sampling rate 48 kHz, quantization bit depth 24 bit, clock source unified from embedded PLL module) is used to realize multi-channel analog-to-digital conversion, converting the amplified and filtered analog signal into a digital domain signal. At the same time, a phase calibration parameter is introduced into the sampling control logic to ensure that the sampling trigger of each channel is synchronized, and the original acoustic signal sequence in the digital domain is obtained.

[0031] Furthermore, a phase consistency detection algorithm (based on cross-correlation phase difference calculation) is used to monitor and fine-tune the phase difference between digital signals in each channel. The phase consistency error threshold is set to ±1.5°. Once the detection exceeds the limit, the phase compensation module is triggered to obtain a corrected, highly consistent multi-channel digital signal.

[0032] Through the above-mentioned multi-channel preamplification, anti-aliasing filtering, synchronous analog-to-digital conversion and phase consistency correction processing, the original analog acoustic signal is transformed into a digital domain original acoustic signal sequence with high amplitude fidelity, high spectral purity and high spatial phase consistency, thus realizing the basic data conditions required for subsequent beamforming and spatial positioning.

[0033] For example, when configuring an 8-channel low-noise condenser microphone array with a ring distribution radius of 250 mm, the preamplifier circuit of each microphone is set to a gain of 20 dB and an input impedance of 15 kΩ to accommodate the weak spontaneous acoustic signals of frozen meat samples under constant temperature (4 ℃ ± 0.3 ℃) conditions in the thawing chamber. The anti-aliasing filter uses a fourth-order Butterworth structure with an analog cutoff frequency of 24 kHz, achieving attenuation of over −48 dB for high-frequency interference. The multi-channel synchronous sampling module operates at a sampling rate of 48 kHz and a quantization accuracy of 24 bit. The unified clock source comes from a PLL circuit with an accuracy of ±50 ppm, and channel triggering synchronization is achieved through a delay matching algorithm within the firmware. The phase consistency detection module performs cross-correlation phase difference calculation within each second signal segment, using the following formula: in, and Let be the digital signal vectors of the i-th channel and the j-th channel, respectively. Here is their cross-correlation matrix. The phase difference is calculated. If the detected value exceeds ±1.5°, the software delay compensation module is triggered to perform channel phase adjustment. Through this processing flow, the phase consistency of the output digital domain original acoustic signal sequence remains within ±1° in actual testing, the waveform fidelity is significantly improved, and the environmental noise suppression effect is significantly optimized, providing a stable guarantee for subsequent spatial positioning accuracy.

[0034] S1.3: An industrial-grade CMOS camera is orthogonally mounted on the top of the thawing chamber. Its imaging frame rate is set to 1 fps, and the exposure parameters are locked at a fixed gain and shutter speed. Based on the ambient constant illuminance (>800 lux) condition, visible light images reflected from the surface of frozen meat are acquired to generate a time-continuous sequence of raw image frames as visual modal input.

[0035] An industrial-grade CMOS camera (pixel resolution ≥4000×3000, dynamic range ≥70 dB) is fixedly installed at the orthogonal position at the top of the thawing chamber, ensuring that the optical axis is perpendicular to the sample's horizontal plane to obtain a visual perspective complementary to the acoustic array's arrangement direction, thereby enabling visible light image acquisition.

[0036] Based on the stability of the acquisition task, the camera imaging frame rate is set to 1 fps to ensure an integer multiple relationship with the acoustic sampling period, providing periodic synchronization conditions for subsequent hardware timestamp alignment.

[0037] By locking the exposure parameters at a fixed gain and a fixed shutter speed using the camera control module, the non-linear effect of exposure drift caused by slow changes in ambient brightness on pixel grayscale can be avoided.

[0038] The image is acquired using constant ambient light conditions. When the light intensity is maintained at >800 lux, an image acquisition command is triggered to convert the visible light signal reflected from the surface of the frozen meat into a digital image matrix.

[0039] The continuously acquired digital image matrix is encapsulated into a sequence of original image frames in chronological order, and state parameters such as the sensor’s built-in temperature and ambient brightness reference value are recorded in the metadata of each image frame, providing an additional reference channel for subsequent visual modality feature extraction.

[0040] By using the imaging control and locking method described above, the stable acquisition conditions established in the previous step are transformed into RGB image data with good temporal continuity, thereby achieving high consistency and synchronicity of visual modal input.

[0041] For example, an industrial-grade CMOS camera (IMX253, resolution 4096×3000, pixel size 3.45 μm, dynamic range 72 dB) was installed in a constant-temperature thawing chamber. The camera was fixed to the top of the thawing chamber using a custom bracket, with its optical axis perpendicular to the horizontal plane and a focal length set to 35 mm to cover the entire sample area. The control system was set to a frame rate of 1 fps, a fixed gain of 12 dB, a shutter speed of 1 / 60 s, and a constant illumination of 850 lux. During operation, when the illumination sensor detected brightness fluctuations not exceeding ±5 lux, the CMOS camera acquired RGB image frames and recorded additional parameters such as camera temperature (28.3°C) and ambient brightness (849 lux) in the metadata. The acquired image sequence was verified to have no exposure drift, and the time interval between each frame was a whole second. A continuous sequence of 180 frames was output for subsequent visual modality analysis, verifying that the output sequence possessed high illumination stability and a mean brightness difference between frames of less than 0.5.

[0042] S1.4: Using a unified hardware clock source in the embedded system, each frame of acoustic data packet and image data frame output by the microphone array and industrial camera is stamped with a UTC timestamp accurate to the millisecond level. Based on the timestamp, the time axis alignment operation of the heterogeneous data stream is performed to generate an audio-visual dual-modal synchronization dataset with strict time sequence correspondence.

[0043] The system acquires raw image data frames from an industrial-grade CMOS camera and digital acoustic data packets from an 8-channel low-noise condenser microphone array through preprocessing and analog-to-digital conversion, which are then used as inputs for heterogeneous modal synchronization processing.

[0044] A unified hardware clock source synchronization strategy (parameter: high-precision UTC reference signal ±0.5 ms error) is adopted to unify the time base of the industrial camera acquisition module and the multi-channel acoustic acquisition module in the digital domain output stage, and lock the acquisition trigger signals of each data stream to the same phase reference.

[0045] Furthermore, a timestamp generation algorithm (parameters: millisecond-level resolution, 64-bit integer storage format) is used to perform UTC timestamp calibration on each frame of RGB image and the corresponding second-level acoustic data segment, ensuring that the consistency error between the timestamp and the actual trigger acquisition moment does not exceed 1 ms, and forming a standardized timestamp field that can be parsed across systems.

[0046] Furthermore, a timeline alignment algorithm (method: sorting and matching based on timestamp differences) is used to achieve temporal pairing of bimodal data. Using timestamps as unique indexes, the algorithm searches for data pairs with the smallest absolute time difference in the acoustic and visual data streams, establishes a one-to-one mapping relationship, and generates a complete pairing index table.

[0047] Furthermore, the pairing index table is validated by an abnormal frame removal algorithm (parameter: time difference threshold = 2 ms), and all audio-visual data pairs with time differences exceeding the threshold are removed to avoid timing errors introduced by abnormal sampling or data loss, and to ensure that the time synchronization accuracy of the final dataset meets the requirements of subsequent physical coupling analysis.

[0048] By unifying the hardware clock source and aligning the time axis, the multi-channel digital acoustic signal sequence and the original image frame sequence from the previous step are transformed into an acoustic-visual dual-modal synchronous dataset with strict temporal correspondence, thus realizing the basic conditions for cross-modal signal analysis.

[0049] For example, in the thawing detection chamber of a cold chain food processing line, the UTC synchronization module of an industrial-grade CMOS camera and the acquisition controller of the microphone array are both connected to a hardware clock source (temperature-controlled crystal oscillator, frequency stability <0.1 ppm) of an embedded system. This clock source maintains UTC time synchronization accuracy of ±0.5 ms with the host computer via the PTP protocol. Each time the camera acquires a frame, it triggers a hardware interrupt to send the current UTC timestamp. The microphone acquisition module records the UTC timestamp of the initial sampling every time a 1-second audio segment is recorded. The timestamp uses a 64-bit integer to record the number of milliseconds; for example, 2024-04-15 10:34:12.532 UTC corresponds to a recorded value of... During the data alignment phase, the system follows a pairing rule: it searches the acoustic data list for the image with the smallest difference in timestamp that does not exceed a specified value. Millisecond audio clips are used to create bimodal matching records. For a continuous 300-second thawing process of samples within the testing chamber, 300 records with a temporal consistency error not exceeding [a certain threshold] are ultimately generated. The millisecond acoustic-visual paired data unit is used for subsequent cross-modal resonance analysis of phase change acoustic fingerprints and thawed region pixel features. In this embodiment, the temporal consistency of multi-frame verification significantly improves the reliability of dynamic coupling analysis and the stability of hierarchical determination.

[0050] S1.5: The multi-channel digital audio signal sequence and the image frame sequence, which are aligned by timestamps, are paired and encapsulated according to time index to generate data units in a structured storage format. Each time unit contains an 8-channel, 1-second audio segment and a 1-frame RGB image corresponding to the time, which is used to support the extraction of 'phase change acoustic fingerprint' and joint modeling of pixel-level thawing regions in subsequent stages.

[0051] Step S2: Adaptive wavelet threshold denoising is performed on the original acoustic signal sequence to suppress environmental background noise interference. The time-spectrum of each denoised signal frame is extracted, and three types of dynamic features—instantaneous acoustic energy entropy, dominant frequency drift rate, and short-time zero-crossing rate variation coefficient—are calculated within the 1–5 kHz sensitive frequency band. These constitute a 'phase transition acoustic fingerprint' vector sequence characterizing the microstructural relaxation properties of the phase transition process. Specifically, this includes: S2.1: Obtain the original acoustic signal sequence acquired by an 8-channel low-noise condenser microphone array with a circular distribution. Based on the Discrete Wavelet Transform (DWT) algorithm, the db6 wavelet basis function is used to decompose each channel signal into 5 levels to extract wavelet coefficients at each scale, so as to construct a multi-resolution time-frequency representation space and obtain a set of layered wavelet coefficients.

[0052] S2.2: Based on the aforementioned layered wavelet coefficient set, the noise standard deviation at each scale is calculated using the signal local variance estimation model. Then, the optimal threshold parameter for each layer of wavelet coefficients is adaptively determined according to the SURE unbiased risk estimation criterion, generating a scale-adaptive threshold vector to achieve accurate modeling of non-stationary background noise.

[0053] S2.3: Apply a soft threshold function to the high-frequency detail coefficients in the layered wavelet coefficient set to perform coefficient shrinkage processing, retain strong correlation components and suppress weak amplitude noise terms, and then reconstruct the denoised acoustic signal waveform through inverse discrete wavelet transform (IDWT) to output a time-aligned denoised acoustic signal sequence.

[0054] S2.4: Based on the denoised acoustic signal sequence, a high-resolution time spectrum is generated using short-time Fourier transform (STFT), and the sensitive frequency band of 1–5 kHz is divided into equally spaced sub-bands. The instantaneous acoustic energy entropy in each sub-band is calculated as a nonlinear dynamic feature. At the same time, the time trajectory of the main frequency component is extracted and its first-order difference mean is calculated as the main frequency drift rate. The standard deviation of the short-time zero-crossing rate is further calculated as the coefficient of variation to form a three-dimensional dynamic feature vector group.

[0055] S2.5: The three-dimensional dynamic feature vector group is aligned and spliced according to time frames to generate a time-series feature matrix that is updated frame by frame. Based on normalization processing, the gain difference between channels is eliminated, and finally a standardized 'phase change acoustic fingerprint' vector sequence is output as the core acoustic characterization of the microscale structural relaxation process caused by ice crystal phase change.

[0056] The three-dimensional dynamic feature vector group generated by S2.4 is subjected to frame-level alignment processing according to a unified time base. The timestamp index matching algorithm (parameter: millisecond-level UTC timestamp) is used to realize the synchronous pairing of feature data from different channels at the same time.

[0057] Furthermore, by using a structured feature matrix splicing method (parameters: column dimension is instantaneous acoustic energy entropy, main frequency drift rate, and short-time zero-crossing rate variation coefficient), a time-series feature matrix that is updated frame by frame is generated, forming a two-dimensional data structure with time as the row index and feature dimension as the column index.

[0058] Furthermore, the gain difference between channels is eliminated through a normalization method (parameters: Z-score standardization, mean μ, standard deviation σ estimated based on full-time-series samples). The calculation formula is as follows: in, For the original value of the feature, This is the mean of the feature over the entire time series. The standard deviation of this feature over the entire time series is given.

[0059] Furthermore, by using the column-by-column standard deviation test method of the normalized matrix (parameter: test threshold set to 1.0), we ensure that each feature dimension fluctuates within a uniform scale range, thus preventing scale bias in the subsequent cross-modal modeling process.

[0060] Through the above processing, the time-aligned and normalized temporally sequenced feature matrix is transformed into a standardized 'phase transition acoustic fingerprint' vector sequence, realizing the core acoustic characterization of the microscale structural relaxation process induced by ice crystal phase transition, and providing accurate acoustic feature input for subsequent vision-driven cross-modal dynamic cross-correlation analysis.

[0061] For example, during the thawing of a frozen beef brisket sample, the three-dimensional dynamic feature vector set includes an instantaneous acoustic energy entropy value ranging from 2.1 to 3.4, a dominant frequency drift rate ranging from 0.05 to 0.12 Hz / s, and a short-time zero-crossing rate coefficient of variation ranging from 1.8 to 3.0. Millisecond-level UTC timestamps are used to synchronize the three features, generating a temporal feature matrix containing 180 rows (corresponding to a 180-second thawing process) × 3 columns (corresponding to the three feature dimensions). An example row in the matrix is [2.85, 0.083, 2.45]. The mean instantaneous acoustic energy entropy is calculated when Z-score normalization is performed. =2.75, standard deviation =0.28, the normalized result corresponding to the first value is =0.36. After all normalization processes are completed, the feature standard deviation test values are all between 0.98 and 1.02, indicating that the three feature dimensions are within a unified scale range. The standardized 'phase change acoustic fingerprint' vector sequence is output for cross-modal correlation analysis with the visual thawing process change rate benchmark sequence in the subsequent S3 step.

[0062] like Figure 2 As shown, step S3 involves jointly performing pixel-level recognition of the thawing region based on dynamic threshold segmentation of the HSV color space V channel and a fine-tuned U-Net network. This extracts the thawing region from each frame in the image frame sequence, generating a thawing area percentage curve and an edge roughness time-series sequence at the corresponding time point. A first-order difference operation is then performed to obtain a baseline sequence of the thawing process change rate. Specifically, this includes: S3.1: Acquire a sequence of visible light image frames captured by the top industrial camera at a frame rate of 1 fps. Based on the HSV color space conversion algorithm, convert each frame of the original RGB image into an HSV format image to enhance robustness to illumination changes. Utilize the sensitivity of the V channel (luminance component) to the brightness difference between the melted ice crystal area and the unthawed area on the surface of frozen meat to extract the V channel grayscale image of each frame to generate a V channel image sequence, which serves as input data for subsequent adaptive threshold segmentation.

[0063] Acquire the raw RGB image frame sequence output by the top industrial camera at a frame rate of 1 fps as the processing object of the current sub-step, ensuring that the acquisition is performed under the condition of exposure time and gain parameter locking to maintain the stability of cross-frame illumination conditions.

[0064] The HSV color space conversion algorithm (parameters: image size is fixed at the acquisition resolution, color channel conversion precision is 32-bit floating point) is used to perform non-linear transformation on each frame of original RGB image data by pixel to obtain an HSV format image containing Hue, Saturation, and Value, thereby achieving robust processing of illumination changes.

[0065] Furthermore, through channel separation operation (parameter: retain only the grayscale value of the Value channel, discard the Hue and Saturation components), the Value channel matrix of each frame image is extracted, and the data type is converted into a single-channel grayscale image to retain the original precision to avoid quantization error, thus forming a single-channel grayscale image that is sensitive to brightness information.

[0066] Furthermore, by utilizing the brightness difference sensitivity of the Value channel, the brightness distribution difference between the melted ice crystal area and the unthawed area on the surface of frozen meat is expressed in matrix form, ensuring that subsequent adaptive threshold segmentation can achieve preliminary differentiation between the thawed area and the unthawed background based on this brightness feature.

[0067] By using the above color space conversion and channel extraction processing methods, the RGB image sequence is transformed into a Value channel image sequence containing brightness-sensitive characteristics, thus providing a stable, single-channel, and physically meaningful data source for subsequent dynamic threshold segmentation input.

[0068] For example, in a sequence of RGB images with a resolution of 1920×1080, the industrial camera captures images with an ambient illumination of 850 lux, an exposure time of 12 ms, and a fixed gain of 6 dB. Each frame is processed by an HSV conversion algorithm, outputting three component matrices: Hue, Saturation, and Value. The Value matrix has values ranging from 0 to 255. After extracting the Value channel and converting it to grayscale, the average brightness of this grayscale image is approximately 90 in the ice crystal region and approximately 150 in the unthawed region, resulting in a brightness difference of 60. This brightness difference is used as a key input for maximizing the inter-class variance in the subsequent Otsu algorithm calculation during adaptive thresholding segmentation. The generated binary mask significantly improves the separation between ice crystals and unthawed textures under this parameter configuration, ensuring that the thresholding segmentation results stably reproduce the contour features of the thawed region in different batches of samples.

[0069] S3.2: Perform dynamic threshold segmentation processing on each frame in the V channel image sequence: use the Otsu algorithm to adaptively calculate the optimal segmentation threshold within a local time window to cope with the slow drift of background illumination and changes in reflection characteristics during the thawing process; based on the dynamic threshold, perform binarization processing on the V channel image to generate a preliminary thawing area mask image and obtain the initial pixel-level thawing area contour.

[0070] S3.3: Obtain a U-Net semantic segmentation model that has been fine-tuned on images of the thawing process of multiple types of frozen meat samples (such as beef brisket, pork tenderloin, and chicken breast). This model is fine-tuned end-to-end using a real scene dataset with labeled thawing boundaries on the basis of ImageNet pre-trained weights through transfer learning, optimizing its ability to identify the thawing front of low contrast and blurred edges. Input the V-channel image sequence into the fine-tuned U-Net model, perform pixel-level classification inference, and output a high-precision probability map of the thawing region.

[0071] We obtained the parameters of a U-Net semantic segmentation model that had been pre-fine-tuned on images of thawing processes of multiple types of frozen meat samples. The model structure was based on the standard encoder-decoder framework. On the basis of ImageNet pre-trained weight initialization, a feature mapping layer specific to the thawing scene was introduced to improve the detection capability of thawing regions with low contrast and blurred edges.

[0072] A transfer learning approach (parameters: base network ResNet34, pre-trained dataset ImageNet) is used to construct the feature extraction channel of the U-Net encoder. The weights of the first two convolutional kernels are frozen to retain general low-level features, and the high-level convolutions are unfrozen for targeted training to achieve sensitive recognition of the unique textures at the unfrozen front.

[0073] Furthermore, the model was trained on a real-world dataset with labeled thawing boundaries using an end-to-end fine-tuning approach (parameters: learning rate 1e−4, optimizer Adam, batch size 16). The model was optimized for classifying the boundary between thawed and unthawed regions by using a composite objective function of cross-entropy loss and Dice coefficient, and a weighted model that converged on the validation set was obtained.

[0074] Furthermore, by using data augmentation methods (parameters: random rotation 0–15°, random scaling factor 0.9–1.1, brightness perturbation ±10%), the robustness of the model to environmental changes during the thawing process is enhanced, effectively improving its adaptability to dynamic lighting and slight surface water marks, and generating a model version with stronger generalization ability.

[0075] Furthermore, the V-channel image sequence is input into the fine-tuned U-Net model, pixel-by-pixel classification inference is performed, Softmax normalization is used to obtain the probability value of each pixel belonging to the thaw region, and a two-dimensional probability matrix is output to form a high-precision thaw region probability map corresponding to each frame.

[0076] By using the finely tuned U-Net network inference output, the V-channel image generated in the previous step is accurately mapped to the probability distribution data of the thawing region, achieving precise pixel-level recognition of the thawing front with low contrast and blurred edges.

[0077] For example, in a scenario where three types of samples—beef brisket, pork tenderloin, and chicken breast—are thawed simultaneously, an industrial camera captures V-channel images at 1 fps and inputs them into a fine-tuned U-Net model. The encoder uses a ResNet34 backbone, freezing the parameters of the first two convolutional kernels and thawing the subsequent convolutional layers for training on the thawed texture. The real-world dataset is labeled with boundary pixel locations, and the training is performed with a batch size of 16 and a learning rate of [missing information]. Perform 20 iterations, with the loss function being... ,in For cross-entropy loss, The loss is the Dice coefficient; the average IoU of pixel recognition in the thawed area is significantly improved on the validation set, and the output probability map maintains boundary coherence in the blurred edge area, effectively supporting the subsequent fusion optimization processing of S3.4, which can significantly improve the accuracy and stability of the refined binary mask of the thawed area.

[0078] S3.4: Based on the probability map output by the U-Net and the mask image obtained by dynamic threshold segmentation, perform fusion optimization processing: a weighted fusion strategy is adopted to combine the results of the two, in which the high confidence region is mainly based on the U-Net output, and the edge transition region is introduced with a dynamic mask for morphological repair. The continuity of the thawed region boundary is optimized by opening operation and boundary tracking algorithm, and finally a fine-grained binary mask of the thawed region is generated for each frame; based on this, the proportion of thawed pixels to total pixels in each frame is calculated, and a time-series curve of the proportion of thawed area is generated.

[0079] Based on the thawing region probability map output by U-Net and the mask image obtained from dynamic thresholding, a weighted fusion algorithm (parameter: fusion weight matrix W_f adaptively assigned according to region confidence) is used to fuse the segmentation results from the two sources. Furthermore, a high-confidence pixel set is obtained through a region confidence calculation method (parameter: the average product of the probability map pixel value p_i and the mask binary label m_i), and the high-confidence regions are assigned fusion weights with U-Net output as the dominant result, preserving their boundary shape and internal consistency. Further, a dynamic threshold mask repair mechanism (parameter: 3×3 local neighborhood window, intensity difference threshold δ=0.15) is introduced in the edge transition region to fill low-value areas of the probability map and anchor edge contours, resulting in a pre-repaired fused image. Finally, a morphological opening operation algorithm (parameter: structuring element type = ellipse, radius r=2 px) is used to remove small-area noise regions and clean up isolated pixels, improving boundary smoothness. Furthermore, based on the boundary tracking algorithm (parameter: four-neighbor search mode), the boundary contour sequence of the fused image is extracted, and contour node reordering and corner smoothing filtering are performed to optimize the continuity and geometric stability of the thawed region boundaries. Through binarization, the optimized fused result is transformed into a refined binary mask of the thawed region. The ratio of the number of thawed region pixels N_defrost to the total number of pixels N_total in each frame is calculated to form a time-series curve of the thawed area proportion, achieving a quantitative output of the dynamic change of the thawed area proportion over time.

[0080] By combining the U-Net probability map with the dynamic threshold segmentation mask through the weighted fusion and morphological repair algorithm, the thawing region identification result with higher spatial and boundary consistency is transformed into a stable calculation of the thawing area proportion time series curve, providing high-precision visual input for subsequent roughness index extraction and thawing rate analysis.

[0081] For example, in a frozen meat thawing monitoring scenario involving beef brisket and pork tenderloin, the U-Net model outputs a thawing region probability map with a resolution of 1920×1080. The initial mask obtained from dynamic threshold segmentation exhibits boundary breakage. The fusion weight matrix W_f is set with a high-confidence region weight of 0.85 and an edge transition region weight of 0.4. A 3×3 neighborhood window and a brightness threshold of δ=0.15 are used for mask repair. The morphological opening operation structuring element is an ellipse with a radius of 2 pixels, effectively removing noise spots with a diameter less than 3 pixels. The contour sequence output from boundary tracking contains approximately 1200 points, and after corner smoothing filtering, the boundary continuity is improved to over 97%. In the test video sequence, the number of pixels in the thawed area of a certain frame is N_defrost=528000, and the total number of pixels is N_total=2073600. The thawed area ratio is calculated to be 0.2548 according to the above formula. The continuous time series output forms a smooth ratio change curve, which is used for subsequent calculation of the thaw rate ΔA(t) and cross-modal cross-correlation analysis, significantly improving the visual segmentation accuracy and edge stability of the thaw uniformity evaluation model.

[0082] S3.5: Based on the refined binary mask sequence of the thawing region, calculate the Hausdorff distance variation coefficient and Fourier descriptor frequency domain energy entropy of the edge contour frame by frame to quantify the spatial complexity and irregularity of the thawing front and generate an edge roughness time series sequence; perform a first-order difference operation on the thawing area proportion time series curve to calculate the area change ΔA(t) between adjacent time points and obtain the thawing process change rate benchmark sequence as a visual driving reference signal for subsequent cross-modal dynamic cross-correlation analysis.

[0083] Based on the refined binary mask sequence of the thawed region, an edge contour extraction algorithm (parameters: Canny operator, low threshold 0.2, high threshold 0.5) is used to accurately detect the boundary pixel chain of the thawed region at each time step and generate a complete set of contour points.

[0084] Furthermore, by using the Hausdorff distance calculation method (parameter: bidirectional maximum and minimum distance mode), the bidirectional Hausdorff distance is calculated between the contour point sets of adjacent time frames to obtain the magnitude of local boundary position changes. Combined with the statistical analysis model, the coefficient of variation of the Hausdorff distance is calculated to generate a spatial index characterizing the stability and complexity of edge morphology.

[0085] Furthermore, the Fourier descriptor extraction algorithm (parameters: boundary points are normalized to 512 points, and the first 40 coefficients are truncated in the frequency domain) is used to map the contour point sequence to the frequency domain, calculate the energy distribution entropy value of the frequency coefficients, obtain the frequency domain complexity index reflecting the degree of edge irregularity, and construct the frequency domain components in the edge roughness sequence.

[0086] Furthermore, by splicing time series data, the Hausdorff distance variation coefficient and the Fourier descriptor frequency domain energy entropy are sequentially mapped frame by frame to generate a dual-component combined edge roughness time series, which is used to uniformly characterize the spatial complexity and local structural irregularity of the thawing front.

[0087] Furthermore, a first-order difference algorithm (parameter: the difference step size corresponds to an image frame rate of 1 fps) is used to calculate the area change at adjacent time points on the time-series curve of the thawed area proportion, as shown in the following formula: in, This represents the percentage of the area thawed at the current point in time. For time indexing, difference results A baseline value for describing the rate of change of the thawing process in adjacent frames.

[0088] By using a first-order difference processing method, the thawing area percentage curve is transformed into a baseline sequence of thawing process change rates, achieving the expected technical effect of providing an accurate visual-driven reference signal for cross-modal dynamic cross-correlation analysis.

[0089] For example, in the image sequence of the thawing process of beef brisket samples, the resolution of the industrial camera is 1920×1080 pixels, and the refined binary mask is output from the previous step. Canny edge extraction is performed on each frame mask, with edge pixel chain lengths between 3500 and 4200, generating a set of contour points. The bidirectional distance between adjacent frame contour point sets is calculated using Hausdorff distance, with distance values between 3 and 12 pixels, and the coefficient of variation is calculated to obtain parameter values ranging from 0.18 to 0.26. Based on Fourier descriptor analysis of the boundary point sequence, the energy entropy of the frequency domain coefficients fluctuates between 2.8 and 3.4, and the edge roughness sequence is stitched together frame by frame from the above two indices. Inter-frame difference calculation is performed on the thawing area percentage curve to obtain... The value ranges from 0.004 to 0.011. When this rate reference sequence is input into the subsequent cross-modal dynamic cross-correlation analysis module, it can significantly improve the sensitivity and stability of the identification of the coupling relationship between acoustic fingerprint and visual thawing process.

[0090] like Figure 3 As shown, step S4 involves: using the thawing process change rate benchmark sequence as a reference input, calculating the dynamic cross-correlation function between it and the features of each dimension in the 'phase change acoustic fingerprint' vector sequence, identifying the acoustic sub-feature with the maximum time delay correlation and its corresponding time delay parameter, and constructing a cross-modal temporal resonance feature set containing time delay matching relationships. Specifically, this includes: S4.1: Based on the reference signal of the thawing process change rate benchmark sequence of the preceding output, the feature subsequences of each dimension in the 'phase change acoustic fingerprint' vector sequence aligned with its time are obtained, including the instantaneous acoustic energy entropy sequence, the dominant frequency drift rate sequence and the short-time zero-crossing rate variation coefficient sequence, as inputs to the multi-channel acoustic dynamic response to be analyzed, so as to support cross-modal temporal correlation modeling.

[0091] S4.1: Obtain the reference signal of the thawing process change rate output by S3, and at the same time obtain the feature subsequences of each dimension in the 'phase change acoustic fingerprint' vector sequence generated by S2, including the instantaneous acoustic energy entropy sequence, the dominant frequency drift rate sequence and the short-time zero-crossing rate variation coefficient sequence, as input to the multi-channel acoustic dynamic response to be analyzed, so as to support cross-modal temporal correlation modeling.

[0092] S4.1: Obtain the reference signal of the thawing process change rate output by S3, and at the same time obtain the feature subsequences of each dimension in the phase change acoustic fingerprint vector sequence generated by S2, including the instantaneous acoustic energy entropy sequence, the dominant frequency drift rate sequence and the short-time zero-crossing rate variation coefficient sequence, as input to the multi-channel acoustic dynamic response to be analyzed, so as to support cross-modal temporal correlation modeling.

[0093] Based on the thawing process change rate benchmark sequence ΔA(t) generated by step S3, the visual modal reference input is used. The data pipeline loading mechanism (parameter: timestamp accuracy ≤ 1 ms) is used to import the sequence into the cross-modal analysis module to achieve standardized input of the visual reference signal.

[0094] Furthermore, using a feature channel selection algorithm (parameters: three feature dimensions, including instantaneous acoustic energy entropy, dominant frequency drift rate, and short-time zero-crossing rate variation coefficient), the corresponding multi-channel acoustic dynamic response subsequences are extracted from the phase-change acoustic fingerprint vector sequence generated in step S2, while maintaining the time index at the original sampling rate.

[0095] Furthermore, using a timestamp synchronization verification method (parameter: maximum allowed asynchronous deviation ±0.5 ms), a time-axis comparison is performed between the visual reference signal and each acoustic feature subsequence to eliminate segments with time drift or missing frames, ensuring phase consistency and sampling integrity of the input data in cross-modal analysis.

[0096] Furthermore, a data normalization processing algorithm (parameter: Z-score normalization, for the target range with a mean of 0 and a variance of 1) is applied to unify the amplitude scale of the visual reference signal ΔA(t) and the multi-channel acoustic feature subsequence, respectively, in order to eliminate the influence of the difference in the dimensions of different modes on the cross-correlation calculation.

[0097] Furthermore, an input matrix for dynamic cross-correlation analysis is constructed. The acoustic feature channels and visual reference signals are arranged according to a unified time index, forming a bimodal basic dataset that can be directly used for sliding window segmentation.

[0098] Through the above feature extraction and time alignment processing, the original visual reference sequence and acoustic dynamic response sequence from the previous step are transformed into cross-modal input data with phase consistency, scale uniformity and channel integrity, thereby achieving the accuracy and operability of cross-modal temporal correlation modeling.

[0099] For example, in an evaluation of the thawing uniformity of a cold-chain beef brisket sample, images acquired by an industrial camera at 1 fps were segmented at the pixel level to extract the thawing region, resulting in a ΔA(t) sequence with a length of 1800 sampling points. Correspondingly, acoustic signals acquired by a microphone array were denoised using wavelet thresholding and extracted using STFT features to generate an instantaneous acoustic energy entropy sequence, a dominant frequency drift rate sequence, and a short-time zero-crossing rate variation coefficient sequence. The number of sampling points for each sequence was strictly consistent with ΔA(t). Timestamp synchronization verification was used to detect and remove four segments with a 0.7 ms delay, retaining 1796 valid time points. Z-score normalization was performed on each sequence; for example, the original mean of the instantaneous acoustic energy entropy was 3.52, and after normalization, the mean was 0 and the standard deviation was 1. ΔA(t) and the three types of acoustic features were combined into a matrix by time index, with a matrix dimension of 1796×4. In the subsequent S4.2 sliding window processing, this matrix was divided into 353 local time series blocks with a window length of 30 s (30 sampling points) and a step size of 5 s (5 sampling points). This provided complete and dimensionless dual-modal input data for dynamic cross-correlation calculation and time delay feature identification, significantly improving the accuracy and robustness of cross-modal analysis.

[0100] S4.2: Perform sliding window segmentation on the reference sequence of the thawing process change rate and the acoustic feature subsequence of each dimension respectively. Use an overlapping window function with a length of 30 seconds and a step size of 5 seconds to divide the time series segments and generate a set of local time series blocks to enhance the time resolution of dynamic correlation analysis and suppress global noise interference, thereby obtaining a dual-modal time series data unit with local stationarity.

[0101] S4.2: Perform sliding window segmentation on the reference sequence of the thawing process change rate and the acoustic feature subsequence of each dimension respectively. Use an overlapping window function with a length of 30 seconds and a step size of 5 seconds to divide the time series segments and generate a set of local time series blocks to enhance the time resolution of dynamic correlation analysis and suppress global noise interference, thereby obtaining a dual-modal time series data unit with local stationarity.

[0102] S4.2: Perform sliding window segmentation on the reference sequence of the thawing process change rate and the acoustic feature subsequence of each dimension respectively. Use an overlapping window function with a length of 30 seconds and a step size of 5 seconds to divide the time series segments and generate a set of local time series blocks to enhance the time resolution of dynamic correlation analysis and suppress global noise interference, thereby obtaining a dual-modal time series data unit with local stationarity.

[0103] S4.3: Based on each pair of local time blocks, calculate the dynamic cross-correlation function values of the thawing process rate reference signal and each acoustic feature subsequence under different time lags τ (-20s ≤ τ ≤ +20s), and use the normalized cross-correlation algorithm ρ(τ) = Σ t / (√Σ[t]x²(t) √Σ[t]y²(t)), extract the maximum absolute cross-correlation coefficient between each pair of signals and its corresponding optimal time delay parameter Δt_max to form a preliminary time delay correlation measurement result.

[0104] S4.3: Based on each pair of local time blocks, calculate the dynamic cross-correlation function values of the reference signal for the rate of change of the thawing process and each acoustic feature subsequence within different time delay ranges. Use the normalized cross-correlation algorithm to extract the maximum absolute cross-correlation coefficient between each pair of signals and its corresponding optimal time delay parameter to form preliminary time delay correlation measurement results.

[0105] S4.3: Based on each pair of local time blocks, calculate the dynamic cross-correlation function values of the reference signal for the rate of change of the thawing process and each acoustic feature subsequence under different time delays τ (-20s ≤ τ ≤ +20s). Use the normalized cross-correlation algorithm to extract the maximum absolute cross-correlation coefficient between each pair of signals and its corresponding optimal time delay parameter Δt_max, forming a preliminary time delay correlation measurement result.

[0106] S4.4: Based on the maximum cross-correlation coefficient extracted from all sliding windows and the corresponding optimal time delay parameter, perform spatiotemporal consistency clustering analysis. Use the DBSCAN clustering algorithm to perform pattern recognition on the acoustic feature dimension, determine the acoustic sub-features that stably appear in continuous time segments with high correlation and concentrated time delay distribution, mark them as the dominant acoustic response channels with significant dynamic resonance characteristics, and output the dominant feature identifier list.

[0107] The set of maximum cross-correlation coefficients and the corresponding set of optimal time delay parameters within each sliding window output by S4.3 are obtained to construct a time-labeled feature matrix as input data for pattern recognition. The DBSCAN clustering algorithm (parameters: minimum sample size minPts, radius threshold ε) is used to perform density clustering analysis on the cross-correlation coefficient-time delay two-dimensional coordinate space of the feature matrix. Furthermore, a feature classification method based on Euclidean distance (distance metric: weighted sum of cross-correlation coefficient differences and time delay differences) is used to merge acoustic features with similar peak correlations and concentrated time lag distributions, resulting in a set of candidate clusters. Finally, the time coverage (the proportion of time slices in which the cluster appears in the window sequence) and concentration index within each cluster are calculated. (in The standard deviation of the time delay. The method uses the mean cross-correlation coefficient to jointly evaluate stability and concentration characteristics, and selects clusters that appear stably in continuous time segments and have high concentration of time delay distribution. Further, clusters with time coverage greater than the threshold T_cov and concentration index less than the threshold T_con are selected from the candidate clusters and identified as dominant acoustic response channels, with their corresponding feature dimension numbers recorded. Through DBSCAN clustering and stability screening, the time delay correlation measurement results from the previous step are transformed into a list of dominant feature identifiers, enabling automatic identification of significant dynamic resonance channels in cross-modal time series data.

[0108] For example, in the analysis of the thawing process of a frozen beef brisket sample, the sliding window length was set to 30 seconds with a step size of 5 seconds. The maximum cross-correlation coefficients and corresponding optimal time delay parameters of 400 window segments were obtained, forming a 400×2-dimensional feature matrix. The DBSCAN parameters were configured as ε = 0.05 and minPts = 5. Seven clusters were obtained from the clustering results, with three clusters having a time coverage rate above 0.6 and a concentration index below 0.15. Using thresholds T_cov = 0.5 and T_con = 0.2 as criteria, three dominant acoustic response channels were selected, corresponding to one dimension of instantaneous acoustic energy entropy, dominant frequency drift rate, and short-time zero-crossing rate coefficient of variation. In this scenario, the channels identified by the dominant features showed significantly higher coupling strength than the non-dominant channels in subsequent resonance intensity modeling, verifying the effectiveness of the clustering selection.

[0109] S4.4: Spatiotemporal consistency clustering analysis is performed based on the maximum cross-correlation coefficient extracted within all sliding windows and the corresponding Δt_max. The DBSCAN clustering algorithm is used for pattern recognition of acoustic feature dimensions to determine the stable occurrence of high correlation (|ρ(Δt_max)) in continuous time segments. maxAcoustic sub-features with time delay distribution concentrated at |>0.6) are marked as dominant acoustic response channels with significant dynamic resonance characteristics, and a list of dominant feature identifiers is output.

[0110] S4.4: Based on the maximum cross-correlation coefficient extracted within all sliding windows and the corresponding Δt max Spatiotemporal consistency clustering analysis was performed, and the DBSCAN clustering algorithm was used to perform pattern recognition on the acoustic feature dimensions to determine the stable occurrence of high correlation (|ρ(Δt)) in continuous time segments. max Acoustic sub-features with time delay distribution concentrated at |>0.6) are marked as dominant acoustic response channels with significant dynamic resonance characteristics, and a list of dominant feature identifiers is output.

[0111] S4.5: Optimal time delay parameter Δt for integrating the dominant acoustic response channel max Based on the corresponding maximum cross-correlation coefficient, a structured cross-modal temporal resonance feature set is constructed. This feature set includes the name of each type of effective acoustic feature, its Δt relationship with the rate of change of the thawing process, and the corresponding maximum cross-correlation coefficient. max The mean, standard deviation, and average correlation strength serve as the core inputs for subsequent modeling and spatial consistency verification of the resonance intensity index Ri(t).

[0112] S4.5: Optimal time delay parameter Δt for integrating the dominant acoustic response channel max Based on the corresponding maximum cross-correlation coefficient, a structured cross-modal temporal resonance feature set is constructed. This feature set includes the name of each type of effective acoustic feature, its Δt relationship with the rate of change of the thawing process, and the corresponding maximum cross-correlation coefficient. max The mean, standard deviation, and average correlation strength serve as the core inputs for subsequent modeling and spatial consistency verification of the resonance intensity index Ri(t).

[0113] S4.5: By integrating the optimal time delay parameters of the dominant acoustic response channels with the corresponding maximum cross-correlation coefficients, a structured cross-modal temporal resonance feature set is constructed. This feature set includes the name of each type of effective acoustic feature, the mean, standard deviation, and average correlation strength of the optimal time delay between it and the rate of change of the thawing process, which serve as the core input basis for subsequent modeling and spatial consistency verification of the resonance intensity index R_i(t).

[0114] Using the list of dominant feature identifiers output by S4.4 and its associated maximum cross-correlation coefficient and optimal time delay parameter set as input data, a structured data mapping method (parameters: feature dimension identifier, time delay array, cross-correlation coefficient array) is adopted to encode the time delay matching relationship between each dominant acoustic response channel and the thawing process change rate benchmark sequence one-to-one, thereby realizing the data structure initialization of cross-modal correlation characteristics.

[0115] Furthermore, using statistical analysis methods (parameters: time window length = total sampling duration, statistical method = arithmetic mean and standard deviation), the mean and standard deviation of the optimal time delay parameters for the same acoustic feature within multiple sliding windows are calculated to form a generalized distribution of the time delay for that feature; the mean delay is calculated using the following formula: in Let be the optimal time delay value for the k-th sliding window. This represents the total number of windows.

[0116] Furthermore, using a correlation aggregation algorithm (parameter: cross-correlation threshold = 0.5), the arithmetic mean of the maximum absolute cross-correlation coefficients of the same acoustic feature across all windows is calculated to generate an average correlation strength index; the calculation is performed using the following formula: in It represents the maximum absolute cross-correlation coefficient of the k-th sliding window.

[0117] Furthermore, by using a field combination method (parameters: feature name string, mean delay, standard deviation of delay, and average correlation strength value), the above quantization results are encapsulated into quadruple records, and a cross-modal temporal resonance characteristic data table is established with the feature name as the index, so as to achieve orderly storage of quantization indicators of different acoustic features.

[0118] By using a formatted output algorithm (parameter: JSON key-value pair pattern), the cross-modal time-series resonance characteristic data table is transformed into a structured cross-modal time-series resonance feature set, enabling the resonance characteristic data to be used in subsequent resonance intensity indices. Direct invocation during the modeling and spatial consistency verification process ensures accurate alignment of time delay matching relationships with relevant intensity indicators.

[0119] For example, in monitoring the thawing process of a batch of beef brisket samples in a cold chain processing workshop, the input conditions were: the dominant acoustic response channel included instantaneous acoustic energy entropy and dominant frequency drift rate; the total number of sliding windows K was set to 60; and the time delay array values ranged from 12 to 18 seconds. When calculating the mean delay, the obtained instantaneous acoustic energy entropy... for seconds, standard deviation is seconds; frequency drift rate for seconds, standard deviation is Seconds. The average correlation intensity calculation results show that the instantaneous sound energy entropy... = The main frequency drift rate = The above results are recorded in quadruplicate form as (feature name: instantaneous acoustic energy entropy, mean delay: 15.2s, standard deviation of delay: 1.8s, average correlation strength: 0.68) and (feature name: dominant frequency drift rate, mean delay: 14.6s, standard deviation of delay: 2.1s, average correlation strength: 0.71), and stored in the cross-modal temporal resonance feature set. In the subsequent R_i(t) modeling stage, the system directly calls this feature set, which significantly improves the accuracy of the resonance intensity index in time delay compensation of local phase transition dynamics and the cross-modal coupling characterization ability, effectively improving the robustness and accuracy of spatial consistency detection.

[0120] Step S5: Based on the cross-modal temporal resonance feature set, a resonance intensity index Ri(t) is defined. This index is formed by the product of the absolute value of the maximum cross-correlation coefficient and the standard deviation of the instantaneous acoustic energy entropy. It is used to quantify the coupling tightness between the local phase transition dynamic response and the macroscopic thawing process, generating a frame-by-frame updated temporal sequence of resonance intensity. Specifically, it includes: S5.1: Based on the cross-modal temporal resonance feature set output by the previous step S4, extract the acoustic sub-features with the largest time delay correlation and their corresponding time delay parameter Δt_max as the key input conditions for constructing the resonance intensity index; use this time delay parameter to perform time delay alignment processing on the features of each dimension in the acoustic fingerprint vector sequence to ensure synchronous comparability with the benchmark sequence of the thawing process change rate in dynamic response.

[0121] Based on the cross-modal temporal resonance feature set output by step S4, a feature retrieval method (parameters: list of dominant feature identifiers, mean optimal time delay) is used to identify the acoustic sub-features with the greatest time delay correlation and their corresponding delay parameters. Precise extraction.

[0122] Furthermore, by using the index matching method (parameters: acoustic fingerprint vector sequence, multi-dimensional feature index), the extracted acoustic sub-features are mapped to specific dimensions in the acoustic fingerprint vector sequence, and a reference to the original temporal feature sequence is obtained.

[0123] Furthermore, a time-delay alignment algorithm is employed (parameter: time delay). Interpolation method = cubic spline), which realizes the time axis translation operation of the acoustic sub-feature sequence and generates an acoustic feature synchronization sequence that is strictly aligned with the reference sequence of the rate of change of the thawing process in the dynamic response dimension.

[0124] Furthermore, by using a boundary condition check method (parameters: synchronization sequence length, missing data padding method = zero-order hold), missing time boundary data caused by time delay translation is filled in, and an complete aligned signal sequence is obtained.

[0125] Furthermore, a phase consistency correction algorithm (parameters: least squares phase fitting, multi-channel phase difference threshold = ±1.5°) is used to fine-tune and calibrate the aligned multi-channel acoustic features in the phase dimension, ensuring that there is no coupling index error caused by phase drift during cross-modal comparison.

[0126] By using the aforementioned time-delay alignment processing method, the cross-modal temporal resonance feature set results from the previous step are transformed into an acoustic dynamic response sequence that is strictly synchronized with the visual reference signal, thereby providing distortion-free input conditions for subsequent resonance intensity calculations.

[0127] For example, in a cold chain pork tenderloin thawing detection, the dominant acoustic sub-feature identified by the cross-modal temporal resonance feature set is the instantaneous acoustic energy entropy and the time delay parameter. for seconds. The length of this feature dimension in the acoustic fingerprint vector sequence is... The frame corresponds to the length of the baseline sequence for the rate of change in the thawing process. Frame. The alignment algorithm uses cubic spline interpolation to shift the acoustic feature sequence as a whole. Seconds are interpolated to fill the beginning and end, and the boundary ends are insufficient. The frame uses zero-order hold padding. In multi-channel phase correction, the phase difference between each channel and the reference channel is calculated; if it exceeds... The delay parameters were then optimized using a least-squares fitting method. The final aligned entropy sequence corresponds precisely to the visual reference signal in each frame. In subsequent resonance intensity calculations, the correlation coefficient remained stable in a high-level range, significantly improving the accuracy of the thawing homogeneity coupling analysis.

[0128] S5.2: Normalize the acoustic sub-feature sequence after time delay alignment to eliminate the influence of dimensional differences and generate a standardized dynamic response sequence. Based on this, calculate the first-order dynamic cross-correlation coefficient ρ(Δt_max) between the sub-feature and the reference sequence of the thawing process change rate at each time t to obtain a dimensionless correlation metric that reflects the instantaneous coupling strength between the two, which serves as the first component of the resonance intensity index.

[0129] For the acoustic sub-feature sequence after time delay alignment with the time delay parameter Δt_max, the Z-score normalization method (parameters: mean μ, standard deviation σ) is used to eliminate dimensions and normalize amplitudes of features of different dimensions. Furthermore, the normalization transformation formula is applied... To obtain a dimensionless dynamic response numerical sequence, the distribution patterns of each feature are normalized to zero mean and unit variance. Furthermore, a first-order dynamic cross-correlation calculation method (parameters: time delay Δt_max, sliding window length L, step size s) is employed to measure the instantaneous correlation between the normalized sub-feature sequence and the baseline sequence of the thawing process change rate. Further, the normalized cross-correlation formula is used... The normalized cross-correlation coefficient is calculated to obtain the instantaneous correlation ρ(Δt_max) at time t, which serves as the first component of the resonance intensity index. Furthermore, through the aforementioned cross-correlation processing, the synchronicity between the sub-features and the visual baseline sequence in the dynamic response is quantified into a dimensionless index, enabling a direct measurement of the cross-modal dynamic coupling strength.

[0130] For example, in the scenario of monitoring the thawing of cold chain meat, an acoustic sub-feature sequence formed by the dominant frequency drift rate is selected. After time alignment of Δt_max = 15 s, Z-score standardization is performed with a sliding window length of L = 20 s and a step size of s = 5 s to obtain sequence data with zero mean and unit variance. Normalized cross-correlation calculation is performed on this sequence and the thawing area change rate curve. The parameters μ = 0 and σ are dynamically updated according to the statistical characteristics of the standardized sequence itself, using the above formula. The cross-correlation coefficient ρ is output in the range of [-1, 1] within each time window. In the verification of multiple sets of samples, when the thawing non-uniformity is obvious, the instantaneous correlation ρ(Δt_max) between the dominant frequency drift rate and the visual area change rate exceeds 0.75 in most time windows, reflecting the significant dynamic coupling effect between acoustic indicators and macroscopic thawing process. These ρ values will serve as the core input for subsequent resonance intensity modeling, which can effectively improve the sensitivity and physical interpretability of thawing uniformity determination.

[0131] S5.3: Extract the instantaneous acoustic energy entropy value corresponding to each frame from the original acoustic signal sequence as the core parameter characterizing the relaxation complexity of the microstructure during the ice crystal phase transition; calculate the standard deviation of the entropy value sequence using a sliding window to obtain its fluctuation intensity σ(Energy_Entropy_t) within the local time window, which is used to characterize the non-stationarity and local activity of the phase transition process, forming the second component of the resonance intensity index.

[0132] S5.4: Based on the absolute value of the maximum cross-correlation coefficient |ρ(Δt_max)| and the standard deviation σ(Energy_Entropy_t) of the instantaneous acoustic energy entropy, a product fusion operation is performed to define the resonance intensity index R_i(t) = |ρ(Δt_max)| ×σ(Energy_Entropy_t); this fusion mechanism comprehensively considers the cross-modal dynamic coupling strength and the inherent irregularity of the acoustic response, and generates a sequence of quantitative indices that can characterize the tightness of coupling between local phase transition dynamics and macroscopic thawing process.

[0133] Based on the first-order dynamic cross-correlation coefficient sequence obtained by S5.2 and the instantaneous acoustic energy entropy standard deviation sequence obtained by S5.3, a product fusion method is adopted (the parameter setting follows the principle of equal weighting of cross-correlation coefficient and entropy value fluctuation intensity) to achieve a unified characterization of the two parameters of cross-modal coupling strength and acoustic response complexity.

[0134] Furthermore, by using numerical matching rules (parameter: time index strictly aligned to the millisecond level), the absolute value of the cross-correlation coefficient is paired with the standard deviation of the entropy value of the corresponding time window to obtain a list of fusion coefficients for each time step.

[0135] Furthermore, the resonance intensity index is calculated using the multiplication fusion operator (parameter: 64-bit floating-point precision) and the following formula is executed: in, To achieve optimal time delay Normalized cross-correlation coefficient under the given conditions It represents the standard deviation of the instantaneous sound energy entropy within a local time window.

[0136] Furthermore, by windowing smoothing (parameters: window length 3 seconds, weighting coefficients [0.25, 0.5, 0.25]), short-term spike noise interference is suppressed, and a smoothed resonance intensity index sequence is generated.

[0137] Through the above product fusion and smoothing process, the cross-correlation coefficient and entropy change fluctuation intensity are transformed into a single quantitative sequence, achieving a stable and comparable characterization of the tight coupling between local phase transition dynamics and macroscopic thawing process.

[0138] For example, in a isothermal thawing test of a group of beef brisket samples, the absolute value of the cross-correlation coefficient reached 0.78 at the optimal time delay of 1.2 seconds, and the standard deviation of the instantaneous acoustic energy entropy was 0.042. Substituting both into the formula: The resonance intensity value was 0.03276. In continuous frame calculations, this index gradually increased in the early stage of the thawing process, remained stable in the middle stage, and slightly decreased in the later stage, indicating that the sample had a high degree of coupling between local phase transition and macroscopic process in the middle stage. After windowing smoothing, the sequence fluctuations were significantly suppressed, making it suitable for subsequent spatial positioning analysis. It can clearly locate the high resonance region in the beamforming heatmap overlay and provide reliable input for spatial consistency detection.

[0139] S5.5: The generated resonance intensity index R_i(t) sequence is time-aligned according to the image frame rate (1 fps) to form a frame-by-frame updated resonance intensity time sequence; this time sequence is used as the input condition for the subsequent spatial heat map overlay and anomaly detection module to drive the spatial positioning and continuity criterion evaluation of high resonance areas.

[0140] For the resonance intensity index R_i(t) sequence output by S5.4, a time alignment algorithm (parameter: frame rate 1fps) is used to achieve a strict correspondence between the index sequence and the image frame sequence in terms of sampling time.

[0141] Furthermore, by using the interpolation resampling method (parameters: linear interpolation, time reference is the image acquisition timestamp), the temporal domain reconstruction of the R_i(t) sequence is realized, and an index value matrix corresponding to each image frame is obtained.

[0142] Furthermore, a sliding window smoothing filter method (parameters: window length 3 frames, weighting coefficients [0.25, 0.5, 0.25]) is used to achieve local stabilization of the time series indicators and generate a smoothed frame-by-frame resonance intensity sequence.

[0143] Furthermore, by using a time index mapping method (parameters: index type UTC millisecond level, mapping strategy is doubly linked list mapping), the smoothing index sequence and image frame sequence index are synchronously bound, and a data interface that can be directly called by the spatial heatmap overlay module is generated.

[0144] By using the time alignment and smooth binding processing methods described above, the original resonance intensity results from the previous step are transformed into frame-by-frame quantization inputs required for spatial positioning and continuity criterion evaluation, thereby achieving a precise driving effect for the resonance region in the spatial analysis stage.

[0145] For example, during the thawing process of a batch of beef brisket samples, the resonance intensity index R_i(t) output by S5.4 is a time-series vector of length 360, with a sampling rate corresponding to the calculated frequency of 48 kHz audio. A time alignment algorithm is used, with the timestamps of images acquired by an industrial camera (1-second intervals) as the target sequence. Linear interpolation resampling is performed to map R_i(t) to a frame-by-frame sequence of length 360, where each element corresponds to an image frame. During this process, the interpolation function matches the time points in the audio calculation domain to the UTC millisecond-level time points in the image acquisition domain, ensuring no frame loss. Similarly, a weighted smoothing filter with a sliding window length of 3 frames is used to stabilize the mapped index sequence. For example, at frame 120, the original R_i(t) values are 0.68, 0.71, and 0.69, which are then smoothed to... The fluctuation amplitude was significantly reduced. Finally, the frame-by-frame smoothing sequence was bound to the image frame index in a doubly linked list to form the frame-by-frame resonance intensity matrix required for spatial positioning. This matrix was input into the subsequent beamforming thermal superposition module, realizing accurate spatial capture and continuity criterion evaluation of the high resonance region of the beef brisket sample during thawing.

[0146] Following S5 are: S6: Input the multi-channel acoustic signal received by the microphone array into the beamforming algorithm to reconstruct the spatial distribution heat map of the sound source, and superimpose it onto the image frame at the corresponding time to locate the spatial coordinates of the high resonance intensity region; count the number of consecutive frames that satisfy the resonance intensity index greater than the preset value and the spatial dispersion greater than the preset quadrant, and generate a spatial consistency anomaly flag. The spatial consistency anomaly flag is used as a benchmark for determining the eigenvector of the thawing uniformity evaluation.

[0147] Step S6 specifically includes: S6.1: Obtain the 8-channel raw acoustic signals collected by the microphone array during the synchronous sampling period. Based on the timestamp alignment results of each channel signal, construct a multi-channel time-domain signal matrix as input data for the beamforming algorithm to support subsequent sound source spatial orientation estimation.

[0148] S6.2: Based on the multi-channel time-domain signal matrix, a delay-and-sum beamforming algorithm is used to scan different spatial directions under the geometric constraints of the ring array, calculate the sound pressure energy value at each angle grid point, and generate a two-dimensional sound source spatial distribution map as the initial representation of the sound source spatial distribution heat map.

[0149] S6.3: The spatial distribution heat map of the sound source is spatially registered according to the rigid transformation relationship between the thawing cavity coordinate system and the image coordinate system. It is then resampled to the same resolution as the visible light image frame using bilinear interpolation and superimposed on the thawing region segmentation image with the corresponding timestamp to form a sound-visual joint heat map that integrates visual context.

[0150] Given the generated two-dimensional sound source spatial distribution map as input, a rigid transformation matrix registration method (parameters: calibration results of the three-dimensional ring array coordinate system, calibration results of the industrial camera imaging plane, and rigid body rotation and translation parameters between the two coordinate systems) is used to achieve a consistent mapping between the sound source distribution map and the thawing cavity coordinate system.

[0151] Furthermore, the spatial position of each pixel in the sound source spectrum is calculated on the image plane using a coordinate transformation formula, as follows: in The coordinates of the reprojected image plane are: This is the intrinsic parameter matrix of an industrial camera. It is a three-dimensional rotation matrix. The spatial coordinate vector of the sound source. It is a three-dimensional translation vector.

[0152] Furthermore, a bilinear interpolation algorithm (parameters: target image resolution M×N, input sound source spectrum resolution m×n, interpolation weight calculation formula) is used for resampling to adjust the original sound source spatial distribution spectrum to the same resolution as the visible light image frame, while preserving the spatial gradient characteristics of the sound pressure energy value.

[0153] Furthermore, through mask overlay operation (parameters: refined binary mask image of the thawed region, and resampled sound source spectrum pixel value matrix), weighted fusion is performed on the corresponding positions of the two pixels one by one. The weight coefficients are calculated according to the product of the normalized sound pressure energy value and the mask pixel value to generate a sound-visual joint heatmap matrix that integrates the visual context.

[0154] Furthermore, the fused sound-visual joint heatmap matrix is subjected to color-level mapping processing (parameters: minimum threshold of resonance intensity, maximum threshold of resonance intensity, linear gradient color table) to convert the matrix values into a visual color intensity distribution, which is used to intuitively present the sound pressure activity level in the thawing area.

[0155] By using rigid transformation registration and bilinear interpolation resampling, the two-dimensional sound source spatial distribution map from the previous step is transformed into a spatial resolution consistent with visible light visual data and superimposed on the thawed region segmentation result, realizing the spatial fusion of acoustic and visual information, and providing an accurate multimodal thermal map basis for subsequent spatial localization of resonance intensity and non-uniformity detection.

[0156] For example, within a constant-temperature thawing chamber (temperature 4℃±0.3℃), an industrial camera with a resolution of 1920×1080 and a circular microphone array radius of 150 mm are used. The rotation matrix between the microphone coordinate system and the camera coordinate system is described below. The translation vector is obtained from the three-dimensional calibration field measurement. for mm. Using the intrinsic parameter matrix. and the above , The parameters are used to rigidly transform the spatial distribution map of the sound sources and map it onto the camera imaging plane. The original sound source map resolution is 256×256, which is then resampled to 1920×1080 using bilinear interpolation. A linear weighted fusion algorithm is then used to fuse the finely detailed thawing region mask with the sound pressure energy normalization matrix. In the resulting acoustic-visual joint heatmap, the high sound pressure active region closely matches the boundary of the thawing region, and the color gradient clearly reflects the spatial distribution of acoustic activity. This verifies the accuracy and effectiveness of the proposed method in acoustic and visual spatial registration and fusion.

[0157] S6.4: Based on the sound-visual joint heatmap, extract the active sound source pixel group with resonance intensity R_i(t)>0.65 in each frame, calculate its spatial centroid coordinate sequence in the image plane, and divide the image into four spatial regions: upper left, upper right, lower left, and lower right according to the four-quadrant division rule, and count the distribution frequency of active sound source groups in each quadrant.

[0158] Based on the joint sound-visual heatmap output by S6.3, a threshold filtering method (parameter: R_i(t) threshold is set to 0.65) is used to achieve pixel-level identification of active sound source regions in each time frame.

[0159] Furthermore, by using a connected component analysis algorithm (parameter: 4-neighborhood connection), spatial clustering is performed on the set of pixels that meet the resonance intensity condition, and the set of pixel coordinates for each active sound source group is obtained.

[0160] Furthermore, by using the centroid calculation method (parameter: the centroid calculation formula takes the arithmetic mean of all pixel coordinates), the spatial centroid coordinates of the active sound source group in the image plane are extracted, and a frame-by-frame centroid coordinate sequence is generated.

[0161] Furthermore, by using a spatial region division method (parameters: the midpoint of the image width and height are taken as the division boundary), the image plane is divided into four quadrants: upper left, upper right, lower left, and lower right, and a quadrant division matrix is formed for subsequent statistics.

[0162] Furthermore, a frequency accumulation statistical method (parameter: statistical period is one frame) is adopted to accumulate the frequency of occurrence of active sound source groups in each quadrant and generate quadrant distribution frequency table data.

[0163] By using the clustering, centroid extraction, and quadrant statistics algorithms described above, the acoustic-visual heatmap results from the previous step are transformed into spatial distribution frequency data, thereby enabling a quantitative characterization of the distribution patterns of high-resonance active sound sources in different spatial quadrants.

[0164] For example, in a cold chain thawing detection scenario, the combined sound-visual heatmap output by the ring microphone array has a resolution of 640×480 pixels. With a resonance intensity threshold R_i(t) = 0.65, five connected component regions are identified as the active sound source regions in the current frame, each containing an average of 120 pixels. Connected component analysis is used to obtain the set of pixel coordinates for each region, and the centroid calculation formula is then applied. in and These are the pixel horizontal and vertical coordinates, Given the total number of pixels within the region, five centroid coordinates are calculated and distributed across the four quadrants of the image: two in the upper left quadrant, one in the upper right quadrant, one in the lower left quadrant, and one in the lower right quadrant. Based on the quadrant division matrix, a frequency table of quadrant distribution for the current frame is obtained, with the upper left quadrant showing the highest frequency. This output is then input into the S6.5 spatial dispersion calculation module, enabling the identification of spatial inconsistencies in high-frequency activations across consecutive frame quadrants during subsequent detection, thereby significantly improving the accuracy of thawing non-uniformity localization.

[0165] S6.5: Based on the quadrant distribution statistics of three consecutive frames, calculate the spatial dispersion σ_xy of the active sound source group. If σ_xy of the current frame is greater than 15 mm and there are high-frequency activations in more than three quadrants, then generate a spatial consistency anomaly flag bit as '1', otherwise set it to '0', which is used to indicate whether there is uneven thawing across regions.

[0166] The system acquires three consecutive frames of quadrant distribution statistics output by S6.4, using the centroid coordinate sequence as the input vector set for spatial dispersion calculation. The Euclidean distance formula is used to calculate pairwise distances between the centroid coordinates of continuously activated sound sources, forming the original sample set for dispersion. Furthermore, the spatial dispersion value of the centroid coordinates in the two-dimensional plane is determined using the standard deviation calculation method. The calculation formula is as follows: ,in For a single centroid to the mean distance, The sample size is given. Further, based on the quadrant distribution frequency statistics, the number of quadrants simultaneously active in three consecutive frames is extracted. A logical criterion is used: when the spatial dispersion of the current frame... When the number of active quadrants is greater than 15 and the number of active quadrants is greater than 3, the spatial consistency anomaly flag is set to 15. Furthermore, when the criterion is not met, the spatial consistency anomaly flag is set to... And output. Through the above logical judgment, the quadrant distribution and dispersion results of the previous step are transformed into spatial consistency anomaly detection indicators, realizing the automatic identification of uneven thawing phenomena across regions.

[0167] For example, in the monitoring scenario of the thawing process of cold chain beef brisket, the centroid coordinates of the high resonance intensity region in three consecutive frames are (120 mm, 85 mm), (135 mm, 92 mm), and (150 mm, 88 mm), respectively. The Euclidean distance samples are calculated to be 18.03 mm, 15.52 mm, and 15.81 mm, respectively, with a mean of 16.45 mm. In this embodiment, the spatial dispersion is calculated as follows: The number of active quadrants in three consecutive frames shows that sound source activation exists simultaneously in all four quadrants: top left, top right, bottom left, and bottom right, meaning the number of active quadrants is 4. According to the criteria, if the dispersion is >15 and the number of active quadrants is >3, the spatial consistency anomaly flag is triggered. The output detection results indicate significant uneven thawing across regions. In another set of data from the same scene, the spatial dispersion is 14.26 mm and the number of active quadrants is 3, which does not meet the criterion, and the flag output is... This indicates that the spatial distribution falls within the uniformity criterion.

[0168] Step S7: The mean, spatial dispersion, and acoustic fingerprint entropy slope of the resonance intensity time series are fused to form a multi-dimensional thawing uniformity evaluation feature vector. This feature vector is then input into a classification model trained using the lightweight XGBoost algorithm, outputting a four-level thawing uniformity rating result. Specifically, this includes: S7.1: Obtain the time series of resonance intensity, calculate its mean value in the current evaluation period based on the sliding window method, so as to characterize the overall stability level of the coupling between the local phase change dynamic response and the macroscopic thawing process in this period, and generate the 'mean resonance intensity' characteristic parameter.

[0169] S7.2: Obtain the sequence of heat map of spatial distribution of sound sources reconstructed by beamforming algorithm, extract the set of spatial coordinates of high resonance intensity region, calculate its standard deviation in two-dimensional plane as spatial dispersion, measure the degree of spatial inconsistency of the thawing front advancement, and generate the 'spatial dispersion' feature parameter.

[0170] S7.3: Obtain the time series sequence of instantaneous acoustic energy entropy in the phase change acoustic fingerprint, perform linear regression fitting on it, extract the slope of the regression line to reflect the evolution trend of the microstructure relaxation complexity during the thawing process, and generate the 'acoustic fingerprint entropy change slope' feature parameter.

[0171] S7.4: The three feature parameters, 'average resonance intensity', 'spatial dispersion' and 'acoustic fingerprint entropy change slope', are normalized. The dimensional differences are eliminated based on the Z-score normalization method to generate a multidimensional thawing uniformity evaluation feature vector under a unified scale, which is used as the input of the lightweight XGBoost classification model.

[0172] Obtain the three feature parameters 'average resonance intensity', 'spatial dispersion' and 'acoustic fingerprint entropy change slope' calculated by steps S7.1, S7.2 and S7.3 respectively, and construct the original feature vector sequence as input data for normalization processing.

[0173] The Z-score standardization algorithm (parameters: using the population mean μ and standard deviation σ of each feature in the current evaluation period as statistical benchmarks) is used to achieve zero mean and unit variance for each feature component. The formula is as follows: in, The eigenvalues to be normalized are... This is the mean of the feature in the current sample set. This is the standard deviation of the feature on the current sample set.

[0174] Furthermore, the 'average resonance intensity' feature is standardized using the above formula to obtain a zero-mean feature component with dimensionless differences removed; based on this, the 'spatial dispersion' feature is standardized using the same parameter method to obtain a spatial consistency characterization component with the same scale as the previous feature.

[0175] Furthermore, the same standardized function interface is called to process the 'acoustic fingerprint entropy slope' feature, generating a unit variance-based dynamic complexity change rate component, thereby eliminating the interference of variance differences between features on model training.

[0176] By performing vector concatenation on the three standardized feature components in a predetermined order, a multidimensional thawing uniformity evaluation feature vector under a unified scale is generated, ensuring that each component has equal weight and comparability in the input stage of the lightweight XGBoost classification model.

[0177] By using the Z-score standardization and vectorization concatenation methods described above, the results of the previous step are transformed into input feature data that fits the classification model, thereby achieving the expected technical effect of unifying and making the thawing uniformity assessment features comparable across different physical dimensions.

[0178] For example, in a cold chain food processing scenario, the system monitors the thawing process of a batch of beef brisket samples, obtaining the following characteristic values: 'average resonance intensity' (2.35), 'spatial dispersion' (18.6 mm), and 'acoustic fingerprint entropy change slope' (0.042). The mean values of these three characteristics for this batch of samples within the current evaluation period are 1.95, 16.2, and 0.039, with standard deviations of 0.25, 2.4, and 0.003, respectively. Substituting these data into the Z-score formula, the standardized result for 'average resonance intensity' is... =1.6; the standardized result for 'spatial dispersion' is =1.0; the standardized result of 'acoustic fingerprint entropy slope' is =1.0. The three are concatenated to form a standardized feature vector [1.6, 1.0, 1.0], which is input into the lightweight XGBoost classification model. This significantly improves the balance of decision contributions among different features during the model inference stage and obtains accurate unfreezing uniformity level output in the subsequent level determination.

[0179] S7.5: Input the multidimensional thawing uniformity evaluation feature vector into the pre-trained lightweight XGBoost classification model. The model optimizes the node splitting strategy based on historical labeled samples (including manual rating results) using a weighted cross-entropy loss function, outputs the category label corresponding to the highest probability, and generates a four-level thawing uniformity rating result (excellent / good / medium / poor).

[0180] The multidimensional thawing uniformity evaluation feature vector output by S7.4 is received as input data for the classification model, ensuring that each dimension of the feature vector is on a uniform numerical scale after Z-score standardization.

[0181] A lightweight XGBoost algorithm (parameters: maximum tree depth ≤ 4, learning rate ≤ 0.05, subsampling ratio = 0.8) is adopted, and pre-trained model weights are loaded to achieve weighted decision tree classification inference on input features.

[0182] Furthermore, by setting a weighted cross-entropy loss function, the weights of each class are adjusted to address the imbalance in the class ratio of the training samples. Optimize the split gain of the model on a minority of classes.

[0183] Furthermore, the contribution of three types of features—'average resonance intensity', 'spatial dispersion', and 'acoustic fingerprint entropy slope'—to node splitting is evaluated using a feature importance ranking algorithm (based on the Gain metric). This dynamically adjusts the feature splitting order within the model, thereby improving the overall classification accuracy.

[0184] Furthermore, by calculating the output probability vector for each category during the prediction phase... The category label with the highest probability is selected as the thawing uniformity level of the current sample.

[0185] This lightweight XGBoost inference processing method transforms the feature vector from the previous step into a qualitative hierarchical classification result, enabling intelligent determination of four-level thawing uniformity driven by multi-feature fusion.

[0186] For example, in the real-time monitoring of the beef brisket thawing process, the extracted 'average resonance intensity' feature value is 0.58, the 'spatial dispersion' feature value is 12.4 mm, and the 'acoustic fingerprint entropy slope' feature value is −0.004. After Z-score normalization, a three-dimensional feature vector (−0.12, 0.35, −0.08) is obtained. This feature vector is input into a lightweight XGBoost model, which contains 41 decision trees of depth 4, with a learning rate of 0.04 and class weights set as Excellent: 1.0, Good: 1.2, Average: 1.5, Poor: 1.8. After weighted cross-entropy optimization, the model outputs a probability vector. The highest probability of 0.44 corresponds to the 'Good' level, and the model directly generates a 'Good' result for the thawing uniformity. This result is then passed to S8 for trend heatmap plotting and early warning criterion judgment. In the same scenario, if changes in the three-dimensional feature vector lead to a significant increase in the probability of the 'Poor' category, the model's output level will immediately change to achieve rapid response and intelligent intervention to changes in the thawing process.

[0187] Following S7, step S8 is also included: generating a thawing uniformity trend heatmap based on the thawing uniformity level determination result. If the determination level is 'poor' or the determination level decreases in two consecutive steps, an early warning signal is triggered and the current grading process is paused, entering the manual review mode.

[0188] Step S8 specifically includes: S8.1: Obtain the four-level thawing uniformity level judgment results on the continuous time series output by the lightweight XGBoost classification model, construct a time-level mapping sequence based on the judgment results, where each time point corresponds to a classification level label (excellent / good / medium / poor), and encode it into a numerical level index sequence E(t) for subsequent trend quantification analysis.

[0189] S8.2: Based on the grade index sequence E(t), calculate the trend slope within the sliding time window, and use the least squares linear fitting algorithm to estimate the slope of the E(t) values at the most recent N time points to generate the trend coefficient ΔE of the thawing uniformity change, which serves as the basis for determining whether the thawing quality continues to deteriorate.

[0190] S8.3: Based on the trend coefficient ΔE and the current level status, perform a dual anomaly judgment logic: if the current level is 'poor', or the level index decreases by more than or equal to 1 within two consecutive adjacent time steps and the trend coefficient ΔE < -0.5, then generate an anomaly judgment flag flag_abnormal = True as the decision input to trigger an early warning.

[0191] S8.4: Based on the grade index sequence E(t) and its spatial consistency anomaly markers, the resonance intensity thermal distribution data are fused to generate a two-dimensional spatiotemporally aligned thawing uniformity trend heat map; the heat map uses image frames as spatial references and color intensity to characterize the time cumulative change rate of the historical resonance intensity mean of each region, intuitively presenting the evolution path and focus area of thawing non-uniformity.

[0192] S8.5: Based on the status of the anomaly detection flag_abnormal, execute process control operations: When flag_abnormal is True, send an early warning signal to the central control system and automatically pause the current automated grading process, lock the current sample information, activate the human-machine interface to prompt manual review instructions, and enter the manual review mode to await operator intervention for confirmation or adjustment.

[0193] For those skilled in the art, various other corresponding changes and modifications can be made based on the technical solutions and concepts described above, and all such changes and modifications should fall within the protection scope of the claims of this invention.

[0194] Unless otherwise defined, the technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application pertains. The terms “first,” “second,” “third,” and similar terms used in this patent application specification and claims do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an” or “a” and similar terms do not indicate a quantity limitation, but rather indicate the presence of at least one. The terms “comprising” or “including” and similar terms mean that the elements or objects preceding “comprising” or “including” encompass the elements or objects listed following “comprising” or “including” and their equivalents, and do not exclude other elements or objects. The “multiple” mentioned in the embodiments of this application refers to two or more. A and / or B indicate three possibilities: A; B; and A and B.

[0195] The above description is merely an exemplary embodiment of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and such modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A non-destructive testing and quality grading method for the thawing state of pre-frozen meat, specifically comprising: S1: Based on the spontaneous acoustic signals and visible light image sequences that evolve over time in a constant temperature thawing environment of the obtained frozen meat samples, generate a time-aligned original acoustic signal sequence and image frame sequence. S2: Perform adaptive wavelet threshold denoising on the acoustic signal sequence to obtain a time-frequency spectrum, and calculate three types of dynamic features in the sensitive frequency band: instantaneous acoustic energy entropy, main frequency drift rate, and short-time zero-crossing rate variation coefficient, to obtain a phase transition acoustic fingerprint vector sequence that characterizes the microstructure relaxation properties of the phase transition process. S3: Generate a thawing area proportion curve and an edge roughness time series sequence based on the image frame sequence, and obtain a reference sequence of the thawing process change rate by performing a first-order difference operation on the thawing area proportion curve and the edge roughness time series sequence. S4: Based on the thawing process change rate benchmark sequence, calculate the dynamic cross-correlation function between it and the features of each dimension in the phase change acoustic fingerprint vector sequence, and construct a cross-modal temporal resonance feature set by identifying the acoustic sub-features with the largest time delay correlation and their corresponding time delay parameters; S5: Generate a frame-by-frame updated temporal sequence of resonance intensity based on the cross-modal temporal resonance feature set; S7: By fusing the mean, spatial dispersion, and acoustic fingerprint entropy slope of the resonance intensity time series, a thawing uniformity evaluation feature vector is obtained, and this feature vector is input into the classification model to output the thawing uniformity level judgment result.

2. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Between S5 and S7, there is also: S6: Input the multi-channel acoustic signal received by the microphone array into the beamforming algorithm to reconstruct the spatial distribution heat map of the sound source, and superimpose it onto the image frame at the corresponding time to locate the spatial coordinates of the high resonance intensity region; count the number of consecutive frames that satisfy the resonance intensity index greater than the preset value and the spatial dispersion greater than the preset quadrant, and generate a spatial consistency anomaly flag. The spatial consistency anomaly flag is used as a benchmark for determining the eigenvector of the thawing uniformity evaluation.

3. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Following S7 are: S8: Generate a thawing uniformity trend heatmap based on the thawing uniformity level determination result. If the determination level is 'poor' or the determination level decreases in two consecutive steps, trigger an early warning signal and pause the current grading process, and enter the manual review mode.

4. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Step S1 specifically includes: S1.1: Based on the physical mechanism that frozen meat spontaneously generates a broadband acoustic response due to the relaxation of microscale structures caused by the phase change of ice crystals during the thawing process, a sensor array consisting of multiple low-noise capacitive microphones is arranged non-contactly in a ring-shaped distribution position inside the thawing cavity to obtain passive acoustic signal input from multiple perspectives. S1.2: The original analog acoustic signal output by the microphone array is pre-amplified and anti-aliasing filtered, and then analog-to-digital conversion is completed through a multi-channel synchronous sampling module to generate a digital domain original acoustic signal sequence; S1.3: An industrial-grade CMOS camera is orthogonally mounted on the top of the thawing chamber to acquire visible light images reflected from the surface of frozen meat under constant ambient illuminance conditions, generating a time-continuous sequence of raw image frames. S1.4: Using a unified hardware clock source in the embedded system, each frame of acoustic data packet and image data frame output by the microphone array and industrial camera is timestamped with UTC. Based on the timestamping, the time axis alignment operation of the heterogeneous data stream is performed to generate an audio-visual dual-modal synchronization dataset with strict time sequence correspondence. S1.5: The multi-channel digital audio signal sequence aligned with the timestamp is paired and encapsulated with the image frame sequence according to the time index to generate data units in a structured storage format.

5. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Step S2 specifically includes: S2.1: Obtain the original acoustic signal sequence collected by a ring-shaped distributed condenser microphone array, and decompose the signal of each channel using wavelet basis functions based on the discrete wavelet transform algorithm to extract wavelet coefficients at each scale; S2.2: Based on the aforementioned layered wavelet coefficient set, the noise standard deviation at each scale is calculated using the signal local variance estimation model. Then, the optimal threshold parameter for each layer of wavelet coefficients is adaptively determined according to the SURE unbiased risk estimation criterion, generating a scale-adaptive threshold vector. S2.3: Apply a soft thresholding function to the high-frequency detail coefficients in the layered wavelet coefficient set to perform coefficient shrinkage processing, and reconstruct the denoised acoustic signal waveform through inverse discrete wavelet transform (IDWT) to output a time-aligned denoised acoustic signal sequence; S2.4: Based on the denoised acoustic signal sequence, a high-resolution time spectrum is generated using short-time Fourier transform, and the sensitive frequency band is divided into equally spaced sub-bands. The instantaneous acoustic energy entropy in each sub-band is calculated as a nonlinear dynamic feature. At the same time, the time trajectory of the main frequency component is extracted and its first-order difference mean is calculated as the main frequency drift rate. The standard deviation of the short-time zero-crossing rate is calculated as the coefficient of variation, forming a three-dimensional dynamic feature vector group. S2.5: Align and concatenate the three-dimensional dynamic feature vector group according to time frames to generate a time-series feature matrix that is updated frame by frame, and output a standardized phase-change acoustic fingerprint vector sequence.

6. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Step S3 specifically includes: S3.1: Acquire the visible light image frame sequence captured by the top industrial camera, convert each frame of the original RGB image into an HSV format image based on the HSV color space conversion algorithm, and extract the V channel grayscale image of each frame by utilizing the sensitivity of the V channel to the brightness difference between the melted area and the unthawed area on the surface of frozen meat, and generate a V channel image sequence. S3.2: Perform dynamic threshold segmentation processing on each frame in the V channel image sequence: use the Otsu algorithm to adaptively calculate the optimal segmentation threshold within a local time window, and perform binarization processing on the V channel image based on the dynamic threshold to generate a preliminary thawed area mask image and obtain the initial pixel-level thawed area contour. S3.3: Obtain the U-Net semantic segmentation model that has been fine-tuned on images of the thawing process containing multiple types of frozen meat samples. Perform end-to-end fine-tuning using a real scene dataset with labeled thawing boundaries to optimize its ability to identify the thawing front in low-contrast, blurred edges. Input the V-channel image sequence into the fine-tuned U-Net model, perform pixel-level classification inference, and output a probability map of the thawing region. S3.4: Based on the probability map output by the U-Net and the mask image obtained by dynamic threshold segmentation, perform fusion optimization processing: adopt a weighted fusion strategy to combine the results of the two, optimize the continuity of the thawing region boundary through opening operation and boundary tracking algorithm, and generate a binary mask of the thawing region for each frame; generate a thawing area proportion time series curve by calculating the proportion of thawing pixels to total pixels in each frame. S3.5: Based on the refined binary mask sequence of the thawing region, calculate the Hausdorff distance variation coefficient and Fourier descriptor frequency domain energy entropy of the edge contour frame by frame to measure the spatial complexity and irregularity of the thawing front and generate an edge roughness time series sequence; perform a first-order difference operation on the thawing area proportion time series curve to calculate the area change between adjacent time points and obtain a reference sequence of the thawing process change rate.

7. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Step S4 specifically includes: S4.1: Obtain the feature subsequences of each dimension in the baseline sequence of the thawing process change rate and the phase change acoustic fingerprint vector sequence; S4.2: Perform sliding window segmentation processing on the reference sequence of the thawing process change rate and the acoustic feature subsequence of each dimension respectively to generate a set of local time series blocks to obtain dual-modal time series data units. S4.3: Based on each pair of local time blocks, calculate the dynamic cross-correlation function values of the reference signal of the thawing process change rate and each acoustic feature subsequence within different time delay ranges. Use the normalized cross-correlation algorithm to extract the maximum absolute cross-correlation coefficient between each pair of signals and its corresponding optimal time delay parameter to form preliminary time delay correlation measurement results. S4.4: Based on the maximum cross-correlation coefficient extracted from all sliding windows and the corresponding optimal time delay parameter, perform spatiotemporal consistency clustering analysis, use the DBSCAN clustering algorithm to perform pattern recognition on the acoustic feature dimension, determine the acoustic sub-features that stably appear in continuous time segments with high correlation and concentrated time delay distribution, mark them as the dominant acoustic response channels with significant dynamic resonance characteristics, and output the dominant feature identifier list. S4.5: By integrating the optimal time delay parameters of the dominant acoustic response channels with the corresponding maximum cross-correlation coefficients, a structured cross-modal temporal resonance feature set is constructed. This feature set includes the name of each type of effective acoustic feature, the mean of its optimal time delay with respect to the rate of change of the thawing process, its standard deviation, and the average correlation strength.

8. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, Step S5 specifically includes: S5.1: Based on the cross-modal temporal resonance feature set, the acoustic sub-features with the largest time delay correlation and their corresponding time delay parameters are extracted. The time delay parameters are then used to perform time delay alignment processing on the features of each dimension in the acoustic fingerprint vector sequence. S5.2: Normalize the acoustic sub-feature sequence after time delay alignment to generate a standardized dynamic response sequence; and calculate the first-order dynamic cross-correlation coefficient between the sub-feature and the reference sequence of the rate of change of the thawing process at each time step to obtain a dimensionless correlation metric that reflects the instantaneous coupling strength between the two. S5.3: Extract the instantaneous acoustic energy entropy value corresponding to each frame from the original acoustic signal sequence, calculate the standard deviation of the entropy value sequence by sliding window, obtain its fluctuation intensity within the local time window, and form the second component of the resonance intensity index; S5.4: Based on the absolute value of the maximum cross-correlation coefficient and the standard deviation of the instantaneous acoustic energy entropy, perform a product fusion operation to generate a quantitative index sequence that can characterize the degree of coupling between local phase transition dynamics and macroscopic thawing process; S5.5: Time-align the resonance intensity index sequence according to the image frame rate to form a frame-by-frame updated resonance intensity time sequence.

9. The method for non-destructive testing and quality grading of the thawing state of pre-frozen meat according to claim 1, characterized in that, Step S2 denoising process employs discrete wavelet transform, signal local variance estimation, SURE unbiased risk estimation method adaptive thresholding, soft threshold shrinkage, and inverse wavelet transform reconstruction. After denoising, the signal is analyzed by short-time Fourier transform, and the instantaneous acoustic energy entropy, dominant frequency drift rate, and short-time zero-crossing rate variation coefficient characteristics are output frame by frame.

10. The method for non-destructive testing and quality grading of pre-frozen meat thawing state according to claim 1, characterized in that, In step S3, image preprocessing performs HSV color space conversion and extracts the V channel for each frame. The Otsu algorithm is used for dynamic thresholding and binarization. Combined with the thawing probability map output by the fine-tuned U-Net semantic segmentation model, a pixel-level thawing region binary mask is generated through a weighted fusion algorithm and morphological repair.