Machine tool thermal error prediction method and thermal error compensation system based on differentiable physical regularization

By introducing a differentiable finite element residual regularization module into the thermal error prediction model, and combining the physical laws of heat conduction with data-driven methods, the problem of insufficient adaptability and physical consistency of thermal error modeling across operating conditions is solved, and high-precision, real-time thermal error prediction and compensation are achieved.

CN122242090A · Pending · Publication Date: 2026-06-19 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2025-12-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing thermal error modeling methods are insufficient in terms of cross-condition adaptability and physical consistency, making it difficult to achieve high-precision, real-time thermal error prediction and compensation. They are prone to overfitting or failure, especially when data is scarce or operating conditions change.

Method used

A machine tool thermal error prediction method with differentiable physical regularization is adopted. By introducing a differentiable finite element residual regularization module, the physical laws of heat conduction are integrated into the backbone neural network as soft constraints to construct a joint optimization path driven by physics and data, forming a unified model of physical consistency and data fitting.

Benefits of technology

It significantly improves the accuracy of thermal error prediction and the model's cross-condition generalization ability, reduces the dependence on massive amounts of data, achieves high efficiency in small-sample learning and plug-and-play model functionality, and forms a fully automatic closed-loop thermal error compensation system.

✦ Generated by Eureka AI based on patent content.

Figure CN122242090A_ABST

Abstract

This invention discloses a method for constructing a machine tool thermal error prediction model, and a compensation system, based on differentiable physical regularization. The thermal error prediction model comprises a backbone neural network and a differentiable finite element residual regularization module. During training, the predicted thermal error is treated as an equivalent heat source, a learnable stiffness matrix K and a learnable mass matrix M are introduced, and a physical residual loss is constructed from the discretized heat conduction equation. This residual loss is optimized jointly with the data-driven loss, so that the model output achieves both data-fitting accuracy and physical consistency. The compensation system comprises a physical layer, a digital twin core layer, and a user interaction layer. Sensor data is integrated via the OPC UA protocol, and the prediction model generates thermal error values and compensation commands in real time, driving the CNC system to form closed-loop control. The invention thereby integrates physical principles with data-driven modeling, significantly improving the cross-condition generalization ability and compensation accuracy of thermal error prediction and providing an effective solution for the thermal stability control of high-precision machine tools.

Description

Technical Field

[0001] This invention belongs to the field of machine tool thermal error control technology, and specifically relates to a machine tool thermal error prediction method and thermal error compensation system based on differentiable physical regularization.

Background Technology

[0002] Modern manufacturing is entering a new phase where competition is defined by micrometer or even sub-micrometer precision. Achieving such extreme machining accuracy requires more than just high-precision, high-rigidity mechanical structures; it necessitates controlling minute yet ubiquitous error sources such as geometric, load, and thermal errors. Thermal errors, in particular, often determine the upper limit of a manufacturing system's performance. Even slight fluctuations in ambient temperature or internal temperature gradients can trigger micrometer-level or even more severe error accumulation, ultimately affecting the machining quality and precision of critical components such as aero-engine parts, precision electronic devices, and advanced optical systems.

[0003] Thermal error modeling methods are generally divided into two main categories: physics-based methods and data-driven methods. The former emphasizes theoretical modeling and physical interpretation, possessing a clear physical foundation; however, these models heavily rely on the precise setting of material properties, boundary conditions, and structural parameters. Under complex and variable conditions, such models struggle to adapt quickly, and their prediction accuracy often drops significantly when faced with uncalibrated scenarios. In contrast, data-driven methods learn from historical data, achieving higher prediction accuracy in specific cases. From early Support Vector Machines (SVM) and Artificial Neural Networks (ANN) to more recent deep neural networks—including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), and Gated Recurrent Units (GRU)—data-driven models have been widely used in thermal error modeling. Although data-driven methods have achieved considerable accuracy, their fundamental limitations remain unresolved. While numerous studies on thermal error modeling have emerged in recent years, most methods still improve prediction performance by increasing model depth and complexity. These methods may achieve excellent training results under specific conditions, but they generally suffer from poor generalization and a lack of physical consistency constraints. Furthermore, the "black box" nature of deep neural networks makes them highly dependent on a large amount of historical data. When the sample size is limited or the operating conditions change, they are prone to overfitting or even failure, which severely restricts their applicability in industrial settings.

[0004] To address these issues, existing technologies have begun exploring hybrid modeling approaches that integrate physical mechanisms with data-driven methods to improve robustness and interpretability. In current research, physical mechanisms are typically used only in the feature engineering stage, with the core prediction model remaining purely data-driven, lacking continuous physical embedding and explicit physical constraints. Most existing technologies still suffer from fundamental limitations such as rigid model structure, poor trainability, and limited interpretability. In contrast, research deeply integrating physical mechanisms with data-driven modeling remains relatively scarce.

[0005] Existing methods struggle to simultaneously achieve trainability, structural universality, and the integration of physical mechanisms, lacking a unified modeling framework that can flexibly embed physical principles into various deep learning architectures while ensuring generalization performance and deployment efficiency. To address this, researchers are gradually integrating deep learning into digital twin systems, improving overall performance through closed-loop mechanisms. While current research has made progress in model development and system implementation, a unified modeling method that simultaneously guarantees physical consistency, structural universality, and deployment friendliness is still lacking. Furthermore, an integrated system architecture is needed to achieve a fully closed-loop workflow from data perception and thermal error prediction to CNC (Computer Numerical Control) compensation, as well as highly interactive visualization. The main challenges currently facing thermal error prediction and system-level control include the following three points.

[0006] (1) Although some studies have introduced the principle of heat conduction into data-driven modeling, most of them only perform shallow embedding in the feature selection stage. The core prediction model still lacks interpretability and continuous physical constraints. At present, there is a lack of trainable modeling methods that clearly integrate physical laws, especially schemes that can maintain stable generalization ability under different conditions and model architectures.

[0007] (2) Existing methods rely heavily on a large amount of historical data, which can easily lead to overfitting or failure when data is scarce or operating conditions change. Their complex structure and tightly coupled parameters further reduce deployment efficiency and cross-condition adaptability, which restricts their practical application in industrial scenarios.

[0008] (3) Current thermal error modeling and system-level control lack deep integration, making it difficult to achieve real-time closed-loop operation of "sensing-prediction-compensation". Existing digital twin systems still have limitations in terms of physical modeling depth, compensation execution mechanism, and industrial deployment adaptability.

Summary of the Invention

[0009] In view of this, the purpose of this invention is to provide a machine tool thermal error prediction method and thermal error compensation system based on differentiable physical regularization. By introducing a differentiable physical regularization module into an arbitrary backbone neural network as a physical constraint, a joint optimization path driven by both physics and data is formed, which can effectively improve the accuracy of thermal error prediction.

[0010] To achieve the above objectives, the present invention provides the following technical solution:

This invention first proposes a machine tool thermal error prediction method based on differentiable physical regularization, comprising the following steps:

Step 1: Acquire temperature sensor time-series data of the machine tool to be predicted at multiple consecutive time steps.

Step 2: Input the temperature time-series data into the trained thermal error prediction model to obtain the corresponding thermal error prediction value. The thermal error prediction model includes a backbone neural network for time-series feature extraction and mapping, and a differentiable finite element residual regularization module coupled to the backbone neural network.

Step 3: During the training process of the thermal error prediction model, the differentiable finite element residual regularization module receives the thermal error prediction value output by the backbone neural network and the corresponding input temperature sensor time-series data, constructs the physical residual based on the discretized transient heat conduction equation, and calculates the physical residual loss term $\mathcal{L}_{phys}$, in which the predicted thermal error value is regarded as an equivalent heat source acting on the heat conduction system. This includes the following sub-steps:
31) Regard the predicted thermal error $\hat{y}_t$ as an equivalent time-varying heat source acting on the heat conduction system;
32) Introduce a learnable stiffness matrix $K$ and a learnable mass matrix $M$, where $K$ characterizes the spatial thermal coupling relationship between nodes in the temperature field and $M$ characterizes the thermal response inertia of each node;
33) Based on the discretized transient heat conduction equation, calculate the physical residual $r_t$ at each time step;
34) Calculate the physical residual loss term $\mathcal{L}_{phys}$ as the mean square error of the physical residuals over all time steps.

Step 4: Form a weighted sum of the physical residual loss term $\mathcal{L}_{phys}$ and the data-driven loss term $\mathcal{L}_{data}$, which is calculated from the predicted and actual values, to construct the total loss function $\mathcal{L}_{total}$.

Step 5: Minimize the total loss function $\mathcal{L}_{total}$ using the gradient backpropagation algorithm, optimizing the parameters of the backbone neural network end to end together with the stiffness matrix $K$ and mass matrix $M$ in the differentiable finite element residual regularization module, so that the optimized model, while outputting the predicted thermal error, satisfies the physical consistency constraint between its output and the input temperature field defined by $K$ and $M$.

[0011] Furthermore, the physical residual calculated by the differentiable finite element residual regularization module is expressed as:

$$r_t = M\,\frac{T_{t+1} - T_{t-1}}{2\Delta t} + K\,T_t - \hat{y}_t$$

where $T_t$ is the node temperature vector composed of temperature sensor data at time $t$; $\hat{y}_t$ is the thermal error value vector predicted by the backbone neural network at time $t$; $K$ is a learnable stiffness matrix used to characterize the spatial thermal coupling effect between nodes; $M$ is a learnable mass matrix used to characterize the thermal response inertia of a node; and $\Delta t$ is the sampling time interval.

[0012] Furthermore, the stiffness matrix $K$ is constrained to be a symmetric positive definite matrix during training, and the mass matrix $M$ is parameterized using Cholesky decomposition, i.e.:

$$M = L L^{\top} + \varepsilon I$$

where $L$ is a trainable lower triangular matrix; $\varepsilon$ is a small positive number; and $I$ is the identity matrix, thus ensuring the positive definiteness and numerical stability of the mass matrix $M$.

[0013] Furthermore, the physical residual loss term $\mathcal{L}_{phys}$ is defined as the mean square error of the physical residuals over all time steps:

$$\mathcal{L}_{phys} = \frac{1}{N} \sum_{t=1}^{N} \lVert r_t \rVert_2^2$$

where $N$ is the total number of time steps.

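[0014] Furthermore, in Step 4, the total loss function $\mathcal{L}_{total}$ is expressed as:

$$\mathcal{L}_{total} = \mathcal{L}_{data} + \lambda\,\mathcal{L}_{phys}$$

where $\mathcal{L}_{data}$ is the data-driven loss term and $\lambda$ is an adjustable regularization weight used to balance data fit and physical consistency.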

[0015] This invention also proposes a machine tool thermal error compensation system based on differentiable physical regularization, comprising:

The physical layer, which includes a temperature sensor array deployed in the critical heat source areas of the machine tool, a displacement sensor for measuring the axial thermal elongation of the spindle, a CNC system for executing compensation commands, and a data acquisition edge gateway for time synchronization, filtering, and preprocessing of sensor data. The data acquisition edge gateway integrates an OPC UA server module, which encapsulates the synchronized and filtered multi-source sensor data in structured form and publishes it to the upper layer via the OPC UA protocol.

The digital twin core layer, deployed on the server side, which includes an OPC UA client module, a data preprocessing module, a thermal error prediction module, and a compensation strategy module. The OPC UA client subscribes to the OPC UA server of the physical layer and periodically acquires real-time temperature and thermal error data. The data preprocessing module cleans and normalizes the acquired data and handles outliers. The thermal error prediction module integrates a thermal error prediction model constructed using the method described in any one of claims 1-7, which receives the preprocessed temperature data and outputs predicted thermal error values in real time. The compensation strategy module generates the corresponding CNC axis displacement compensation commands based on the predicted thermal error values.

The user interaction layer, a web-based client that provides a 3D visualization human-machine interface for remotely displaying the machine tool's 3D digital twin model, temperature field cloud map, thermal error prediction curve, and thermal error compensation status in real time. It also supports users in selecting models, configuring thresholds, and monitoring the system.

The digital twin core layer sends the compensation command to the CNC system of the physical layer through a communication interface to perform real-time dynamic compensation of the machining process, forming a closed-loop control of "perception-modeling-prediction-compensation".
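
As a rough illustration of one pass through the "perception-prediction-compensation" loop described above, the following minimal Python sketch wires three stand-in callables together. The function names, the dead-band threshold, and the choice of commanding the negative of the predicted error are assumptions made for illustration only; the source does not specify the compensation strategy, and no real OPC UA or CNC interface is used here.

```python
from typing import Callable, List, Sequence


def compensation_command(predicted_error_um: float, threshold_um: float) -> float:
    """Axis offset that cancels the predicted error (assumed strategy).

    Errors smaller than the threshold are ignored (assumed dead band).
    """
    if abs(predicted_error_um) < threshold_um:
        return 0.0
    return -predicted_error_um


def run_cycle(read_temps: Callable[[], Sequence[float]],
              predict: Callable[[Sequence[float]], float],
              send_to_cnc: Callable[[float], None],
              threshold_um: float = 1.0) -> float:
    """One sense -> predict -> compensate pass; returns the command sent."""
    temps = read_temps()                       # perception (sensor data)
    predicted = predict(temps)                 # thermal error prediction
    command = compensation_command(predicted, threshold_um)
    send_to_cnc(command)                       # compensation
    return command


# Hypothetical stand-ins for the gateway, the trained model, and the CNC link.
samples = iter([[22.0, 24.0], [20.2, 20.2]])
sent: List[float] = []


def toy_predict(temps: Sequence[float]) -> float:
    # Toy linear model: 2 um of error per kelvin above 20 degrees C.
    return 2.0 * (sum(temps) / len(temps) - 20.0)


for _ in range(2):
    run_cycle(lambda: next(samples), toy_predict, sent.append)
```

In this toy run the first sample yields a predicted error of 6 µm and a command of -6 µm, while the second falls inside the dead band and produces no offset.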

[0016] The beneficial effects of this invention are as follows:

The machine tool thermal error prediction method based on differentiable physical regularization of this invention has the following main technical effects:

(1) Significantly improved physical consistency and generalization ability: By introducing the differentiable finite element residual regularization (FERR) module, the discretized physical law of heat conduction is explicitly embedded as a soft constraint in model training. While fitting the data, the model is driven to satisfy the dynamic relationship between its output (predicted thermal error) and input (temperature field) defined by the learnable stiffness matrix K and mass matrix M. This addresses the problem that purely data-driven models lack physical interpretation and are prone to failure when working conditions change. It can effectively improve the accuracy of thermal error prediction and significantly enhance the model's cross-condition generalization ability and robustness.

(2) High learning efficiency with small samples, which alleviates overfitting: Introducing physical knowledge into the loss function as a regularization term provides strong prior guidance for model optimization. In the common industrial scenario of scarce training data, the physical constraints effectively limit the solution space of the model and guide it to converge to a solution that is more consistent with physical laws. This greatly reduces the dependence on massive labeled data and significantly suppresses the overfitting tendency of complex neural networks under small sample sizes, improving the feasibility and economy of the model in actual industrial deployment.

(3) Plug and play with strong structural versatility: The FERR module is decoupled from the backbone neural network (such as CNN, GRU, TCN, Informer, etc.) and interacts with it only through a standardized interface, so it can be integrated into various time-series prediction architectures as a general component. With this design, the embedding of physical knowledge no longer depends on a specific model structure, realizing a "model-independent" physical regularization framework that greatly improves the applicability and scalability of the method.

[0017] The machine tool thermal error compensation system of this invention, based on differentiable physical regularization, constructs a complete architecture spanning the physical layer, the digital twin core layer, and the user interaction layer. Through the OPC UA protocol, it achieves standardized integration and real-time communication of industrial data, dynamically senses the thermal state of the machine tool, generates compensation commands using a high-precision prediction model, and feeds them back to the CNC system in real time for execution and correction, forming a fully automatic online compensation closed loop. This addresses the disconnect between modeling and control and the lack of real-time response in traditional methods, and can effectively improve the accuracy and real-time response of thermal error compensation.

Attached Figure Description

[0018] To make the objectives, technical solutions, and beneficial effects of this invention clearer, the following figures are provided for illustration:

Figure 1: General framework for thermal error prediction that integrates data-driven models and physical constraints;
Figure 2: Working principle of the differentiable FERR module;
Figure 3: Process of building a digital twin for thermal error modeling;
Figure 4: Digital twin system based on the OPC UA protocol and B/S architecture;
Figure 5: Temperature distribution and thermal error curves under operating conditions #1 and #2;
Figure 6: Comparison of the prediction accuracy of models with and without the integrated differentiable FERR module;
Figure 7: Fitted curve (left) and residual histogram (right) for thermal error prediction, with training on condition #1 and testing on condition #2;
Figure 8: Radar chart of model performance during the training (fitting) and testing (prediction) phases;
Figure 9: Thermal error prediction curves (left) and residual distributions (right) under the cross-condition setting of training on condition #2 and testing on condition #1;
Figure 10: Heatmaps of the prediction performance of the four models (CNN, GRU, TCN, Informer) under different physical regularization weights and training-set ratios;
Figure 11: Loss curves for different batch sizes, with GRU as the backbone model: (a) differentiable FERR module enabled; (b) differentiable FERR module disabled;
Figure 12: (a) and (b) rate of change of the data loss and of the differentiable FERR module loss, respectively, during the first 50 training rounds of the GRU+FERR model; (c) schematic diagram of the gradient angle; (d) evolution of the angle between the gradient directions of the differentiable FERR module loss and the data loss;
Figure 13: Distribution of parameter gradient ranges for the four key layers (layers 1/2/3 and the fully connected layer) during training of the GRU+FERR model; the dots represent the original data, the black diamonds represent the mean, and the gray bars mark the standard deviation intervals;
Figure 14: Evolution patterns and heatmaps of key matrix elements under different operating conditions (C1 and C2);
Figure 15: Fabrication of the test piece: (a) solid workpiece; (b) arrangement of measuring points;
Figure 16: Workpiece flatness error under three compensation strategies: (a) no compensation; (b) GRU-based compensation; (c) GRU-FERR-based compensation; (d) comparison of errors under the three conditions;
Figure 17: Layout diagram of temperature and displacement sensors;
Figure 18: Temperature and thermal error under two operating conditions: (a) nine-channel temperature data for operating condition 1; (b) thermal error data for operating condition 1; (c) nine-channel temperature data for operating condition 2; (d) thermal error data for operating condition 2.

Detailed Implementation

[0019] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.

[0020] 1. Thermal error model

This embodiment provides a machine tool thermal error prediction method based on differentiable physical regularization, which includes the following steps.

[0021] Step 1: Obtain temperature sensor time series data of the machine tool to be predicted at multiple consecutive time steps.

[0022] Step 2: Input the temperature time-series data into the trained thermal error prediction model to obtain the corresponding thermal error prediction value. The thermal error prediction model includes a backbone neural network for time-series feature extraction and mapping, and a differentiable finite element residual regularization module coupled to the backbone neural network. Specifically, the backbone neural network is a deep learning model for time-series prediction, selected from any of the following: a convolutional neural network, a gated recurrent unit network, a temporal convolutional network, or an attention-based Informer model. The differentiable finite element residual regularization module is structurally decoupled from the backbone neural network and interacts with it only by receiving the prediction output $\hat{y}_t$ of the backbone neural network and the corresponding temperature input $T_t$. The module is designed as a plug-and-play component that can be integrated into different types of time-series prediction backbone neural networks through a standardized interface, without changing the internal structure of the backbone neural network.

[0023] Step 3: During the training process of the thermal error prediction model, the differentiable finite element residual regularization module receives the thermal error prediction value output by the backbone neural network and the corresponding input temperature sensor time-series data, constructs the physical residual based on the discretized transient heat conduction equation, and calculates the physical residual loss term $\mathcal{L}_{phys}$, in which the predicted thermal error value is regarded as an equivalent heat source acting on the heat conduction system. This includes the following sub-steps:
31) Regard the predicted thermal error $\hat{y}_t$ as an equivalent time-varying heat source acting on the heat conduction system;
32) Introduce a learnable stiffness matrix $K$ and a learnable mass matrix $M$, where $K$ characterizes the spatial thermal coupling relationship between nodes in the temperature field and $M$ characterizes the thermal response inertia of each node;
33) Based on the discretized transient heat conduction equation, calculate the physical residual $r_t$ at each time step;
34) Calculate the physical residual loss term $\mathcal{L}_{phys}$ as the mean square error of the physical residuals over all time steps.

[0024] Step 4: Form a weighted sum of the physical residual loss term $\mathcal{L}_{phys}$ and the data-driven loss term $\mathcal{L}_{data}$, which is calculated from the predicted and actual values, to construct the total loss function $\mathcal{L}_{total}$.

[0025] Step 5: Minimize the total loss function $\mathcal{L}_{total}$ using the gradient backpropagation algorithm, optimizing the parameters of the backbone neural network end to end together with the stiffness matrix $K$ and mass matrix $M$ in the differentiable finite element residual regularization module, so that the optimized model, while outputting the predicted thermal error, satisfies the physical consistency constraint between its output and the input temperature field defined by $K$ and $M$.

[0026] 1.1 Differentiable Finite Element Residual Regularization Module

Existing physics-based thermal error prediction methods rely heavily on precise physical parameters such as thermal conductivity and specific heat capacity, as well as fine structural division, making it difficult to adapt to the actual variability and uncertainty of machine tools under dynamic operating conditions. Purely data-driven methods can effectively extract patterns from historical data, but they lack physical consistency, are susceptible to noise interference, and have limited generalization ability. Therefore, this embodiment proposes a differentiable FERR module that embeds the heat conduction equation, in residual form, into a deep learning framework. Minimizing the residual during training constrains the model output, so that the prediction results conform to the laws of heat conduction and avoid physical contradictions. This mechanism integrates prior physical knowledge, reduces dependence on large amounts of labeled data, and models the spatial thermal coupling effect between temperature nodes through a parameterized stiffness matrix $K$ and mass matrix $M$. As learnable tensors, these matrices can adapt to the thermal structural characteristics of a specific system during training, achieving a unified fusion of physical modeling and data-driven approaches.

[0027] Machine tool thermal error is essentially thermal expansion caused by dynamic changes in the temperature field. Existing methods often treat thermal error as a static compensation value, but the time-varying characteristics of heat sources in actual machining require dynamic modeling methods. This embodiment introduces the thermal error prediction term $\hat{y}(t)$ directly into the heat conduction equation as a time-varying internal heat source, constructing a closed-loop feedback mechanism between thermal error and the temperature field. The transient heat conduction equation can be expressed as:

$$\rho c\,\frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + Q$$

where $\rho$ is the density of the material; $c$ is the specific heat capacity; $k$ is the thermal conductivity; $Q$ is the internal heat source; and $\partial T / \partial t$ is the rate of change of temperature.

[0028] The equation is discretized into matrix form using the Galerkin finite element method:

$$M\,\dot{T}(t) + K\,T(t) = F(t)$$

where $T$ is the node temperature vector; $F$ is the thermal load vector; $K$ is the stiffness matrix; and $M$ is the mass matrix.
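
To give a feel for the roles of the two matrices, the sketch below integrates a single-node (scalar) version of the semi-discrete form M dT/dt + K T = F with an explicit Euler step. This is an illustrative toy, not part of the disclosed method: the scalar reduction, the explicit Euler scheme, and all numerical values are assumptions. The node settles at F / K, and a larger M only slows the approach, which matches the description of M as thermal response inertia.

```python
def simulate_lumped_node(M: float, K: float, F: float,
                         T0: float, dt: float, steps: int) -> float:
    """Explicit Euler integration of the scalar model M * dT/dt + K * T = F.

    Stable for dt * K / M < 2; returns the temperature after `steps` updates.
    """
    T = T0
    for _ in range(steps):
        T += dt * (F - K * T) / M
    return T


# Made-up parameters: the steady state is F / K = 2.0 regardless of M.
T_light = simulate_lumped_node(M=2.0, K=0.5, F=1.0, T0=0.0, dt=0.1, steps=2000)
T_heavy_early = simulate_lumped_node(M=20.0, K=0.5, F=1.0, T0=0.0, dt=0.1, steps=50)
T_light_early = simulate_lumped_node(M=2.0, K=0.5, F=1.0, T0=0.0, dt=0.1, steps=50)
```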

[0029] Thermal expansion can be viewed as a response to changes in temperature:

$$\Delta L = \alpha\,L\,\Delta T$$

where $\alpha$ is the coefficient of thermal expansion; $\Delta T$ is the change in temperature; and $L$ is the characteristic length. In this case, the interaction between the thermal error and the temperature field can be considered equivalent to a heat source, and its effect on the temperature field manifests as a local change in thermal resistance:

$$Q_{eq}(t) = \beta\,\hat{y}(t)$$

where $\beta$ is the implicit coupling coefficient used to convert displacement into an equivalent heat source.
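
For a sense of scale, the linear expansion relation (elongation = expansion coefficient × characteristic length × temperature change) can be evaluated with a few lines of Python. The numbers below are illustrative assumptions, not values from the source: an expansion coefficient of 12e-6 per kelvin (a typical order of magnitude for steel), a 300 mm characteristic length, and a 2 K temperature rise.

```python
def thermal_elongation_um(alpha_per_k: float, length_mm: float,
                          delta_t_k: float) -> float:
    """Linear thermal expansion alpha * L * dT, returned in micrometres."""
    return alpha_per_k * length_mm * delta_t_k * 1000.0  # mm -> um


# Illustrative (assumed) values: steel-like alpha, 300 mm length, 2 K rise.
elongation = thermal_elongation_um(alpha_per_k=12e-6, length_mm=300.0,
                                   delta_t_k=2.0)
```

With these assumed values the elongation is about 7.2 µm, which shows how a temperature change of only a couple of kelvin already produces micrometer-level error.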

[0030] At the micrometer scale, the interaction between the temperature field and the mechanical deformation field is weak and can be approximated as a unidirectional coupling; therefore, the feedback of thermal error to the temperature field is negligible. Based on this assumption, introducing the thermal error $\hat{y}(t)$ as an external excitation term simplifies calculations and aligns with engineering practice. After the predicted thermal error $\hat{y}(t)$ is regarded as a dynamic heat source, the corrected heat conduction equation is:

$$M\,\dot{T}(t) + K\,T(t) = \hat{y}(t)$$

This embodiment models thermal errors as time-varying "virtual heat sources," characterizing system changes that cannot be explained by heat conduction alone by introducing additional energy inputs and outputs into the heat conduction model. This equivalent treatment allows the model to explicitly consider the effects of thermal errors during the evolution of the temperature field. Its function is similar to adding a local heat source or radiator: positive thermal errors correspond to additional heat sources that raise the local temperature, while negative errors are equivalent to a heat dissipation effect that lowers the temperature.

[0031] During training, the model automatically adjusts the weights of its stiffness matrix $K$ and mass matrix $M$ to establish an implicit mapping between thermal error and the equivalent heat source. Although $\hat{y}$ and $Q$ have different physical dimensions, internal consistency is achieved through the learnable parameters in $K$ and $M$. Therefore, the model maintains dimensional consistency with the physical model throughout backpropagation. The corresponding residual form is expressed as:

$$r(t) = M\,\dot{T}(t) + K\,T(t) - \hat{y}(t)$$

Since sensor data is discrete in time, the time derivative is approximated using the central difference method:

$$\dot{T}_t \approx \frac{T_{t+1} - T_{t-1}}{2\Delta t}$$

The central difference scheme uses the temperature values from the two neighboring time steps and exhibits second-order accuracy. Compared to forward or backward difference methods, it better captures the temperature variation over time and provides higher stability and accuracy. Substituting the approximate time derivative into the residual equation yields the discretized residual expression:

$$r_t = M\,\frac{T_{t+1} - T_{t-1}}{2\Delta t} + K\,T_t - \hat{y}_t$$

where $T_t$ is the node temperature vector composed of temperature sensor data at time $t$; $\hat{y}_t$ is the thermal error value vector predicted by the backbone neural network at time $t$; $K$ is a learnable stiffness matrix used to characterize the spatial thermal coupling effect between nodes; $M$ is a learnable mass matrix used to characterize the thermal response inertia of a node; and $\Delta t$ is the sampling time interval.
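
The discretized residual described in this paragraph can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the residual is taken as M times the central-difference temperature derivative, plus K times the node temperatures, minus the predicted thermal error treated as the heat source; the prediction vector is assumed to have the same length as the temperature vector; and only interior time steps are evaluated because the central difference needs both neighbors. All numbers are made up. In practice the same arithmetic would be written in an automatic-differentiation framework so that gradients reach K, M, and the backbone.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def ferr_residuals(temps: List[Vector], y_hat: List[Vector],
                   K: Matrix, M: Matrix, dt: float) -> List[Vector]:
    """Residuals M (T[t+1] - T[t-1]) / (2 dt) + K T[t] - y_hat[t].

    Returned for interior steps t = 1 .. N-2 only.
    """
    residuals = []
    for t in range(1, len(temps) - 1):
        dT_dt = [(nxt - prv) / (2.0 * dt)
                 for nxt, prv in zip(temps[t + 1], temps[t - 1])]
        storage = matvec(M, dT_dt)          # thermal inertia term
        conduction = matvec(K, temps[t])    # node-to-node coupling term
        residuals.append([s + c - q
                          for s, c, q in zip(storage, conduction, y_hat[t])])
    return residuals


# Two-node toy example (all values hypothetical).
K = [[2.0, -1.0], [-1.0, 2.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
temps = [[20.0, 20.0], [21.0, 20.5], [22.0, 21.0]]
y_hat = [[0.0, 0.0], [22.5, 20.25], [0.0, 0.0]]
res = ferr_residuals(temps, y_hat, K, M, dt=1.0)
```

Here the first node satisfies the discretized equation exactly, while the second leaves a residual of 0.25 that the physical loss would penalize.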

[0032] The stiffness matrix $K$ is derived from the spatial discretization of the heat conduction differential equation and characterizes the heat transfer interactions between discrete nodes. The mass matrix $M$ is obtained by discretizing the heat capacity term and reflects the heat storage capacity of each node. According to finite element theory, these two matrices are symmetric positive definite, a property that guarantees the physical rationality and numerical stability of the solution. In the differentiable FERR module, the stiffness matrix $K$ is explicitly constrained to be a symmetric positive definite matrix to ensure bidirectional heat diffusion, and the mass matrix $M$ is parameterized through Cholesky decomposition:

$$M = L L^{\top} + \varepsilon I$$

where $L$ is a trainable lower triangular matrix; $\varepsilon$ is a small positive number; and $I$ is the identity matrix, thus ensuring the positive definiteness and numerical stability of the mass matrix $M$.
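
The Cholesky-style parameterization can be illustrated with a short pure-Python sketch, assuming the mass matrix is built as a lower triangular factor times its transpose plus a small multiple of the identity. The function names and numbers are hypothetical. The example deliberately uses a factor with a zero on the diagonal, so the product alone is singular and the small diagonal shift is what keeps the result strictly positive definite.

```python
from typing import List

Matrix = List[List[float]]


def spd_from_factor(raw: Matrix, eps: float = 1e-6) -> Matrix:
    """Build L L^T + eps * I, with L the lower triangle of `raw`.

    Any real-valued `raw` yields a symmetric positive definite result,
    so the entries can be trained without extra constraints.
    """
    n = len(raw)
    L = [[raw[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    out = [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        out[i][i] += eps
    return out


def quadratic_form(A: Matrix, x: List[float]) -> float:
    """x^T A x, which is positive for every non-zero x when A is SPD."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


raw = [[0.5, 9.9], [-1.0, 0.0]]      # the upper entry 9.9 is ignored
mass = spd_from_factor(raw, eps=1e-3)
q = quadratic_form(mass, [2.0, 1.0])  # direction where L L^T alone gives 0
```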

[0033] All operations in the differentiable FERR module, including matrix multiplication, addition, and discrete differentiation, are differentiable with respect to the temperature and thermal error variables. This allows deep learning frameworks to compute the gradient of the residual loss with respect to the network parameters by automatic differentiation, integrating physical constraints into the model parameter update process and enabling end-to-end joint training. Within this framework, the stiffness matrix $K$ and mass matrix $M$ are both trainable and are updated dynamically during training, enabling the model to adaptively represent the thermal conductivity and heat capacity characteristics of a specific structure, thereby enhancing its ability to represent the underlying physical processes. The physical residual loss term $\mathcal{L}_{phys}$ is defined as the mean square of the residuals: $\mathcal{L}_{phys} = \frac{1}{N} \sum_{t=1}^{N} \lVert r_t \rVert_2^2$, where: $N$ represents the total number of time steps.

[0034] During model training, minimizing this residual drives the predicted thermal error and the temperature field to jointly satisfy the discretized heat conduction equation, thereby enhancing the model's adherence to physical laws. The total loss is defined as the weighted sum of the data loss and the physical loss: $\mathcal{L}_{total} = \mathcal{L}_{data} + \lambda \mathcal{L}_{phys}$, where: $\mathcal{L}_{data}$ is the data-driven loss term; and $\lambda$ is an adjustable regularization weight used to balance data fit and physical consistency.
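A minimal sketch of the joint objective follows, assuming the physical loss is the mean over time steps of the squared residual norm and the total loss is the data loss plus a single weight `lam` times the physical loss. The function names and the numbers are hypothetical; the mean squared error for the data term matches the loss stated in the experimental configuration.

```python
def data_loss(y_pred, y_true):
    """Mean squared error between predicted and measured thermal error."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


def physics_loss(residuals):
    """Mean over time steps of the squared Euclidean norm of each residual."""
    return sum(sum(v * v for v in r) for r in residuals) / len(residuals)


def total_loss(y_pred, y_true, residuals, lam=1.0):
    """Weighted sum: data loss plus lam times the physical residual loss."""
    return data_loss(y_pred, y_true) + lam * physics_loss(residuals)


# Illustrative values only.
y_pred = [1.0, 2.0, 4.0]
y_true = [1.0, 2.0, 2.0]
residuals = [[1.0, 1.0], [0.0, 2.0]]

loss = total_loss(y_pred, y_true, residuals, lam=0.5)
print(loss)
```

With `lam = 0` the objective reduces to plain data fitting; larger values pull the solution toward the discretized heat conduction equation at the cost of data fit.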

[0035] To enhance model adaptability and scalability, the differentiable FERR module adopts a modular design, forming a highly general and plug-and-play architecture. This module relies solely on temperature input and the main model's predictions, and is not strongly coupled with the internal network structure. It can be flexibly embedded into various deep learning architectures while maintaining strong structural decoupling. This design transforms the incorporation of physical knowledge from a model-specific constraint into a general regularization term applicable to a wide range of thermal error modeling tasks. During training, guided by the differentiable FERR module, the model learns temporal features from the data and produces outputs that better conform to the laws of thermodynamics, significantly improving prediction accuracy and generalization ability under complex conditions.

[0036] 2. Machine tool thermal error compensation system As shown in Figure 1, this embodiment is a machine tool thermal error compensation system based on differentiable physical regularization, including a physical layer, a digital twin core layer, and a user interaction layer.

[0037] The physical layer includes a temperature sensor array deployed in the critical heat source areas of the machine tool, a displacement sensor for measuring the axial thermal elongation of the spindle, a CNC system for executing compensation commands, and a data acquisition edge gateway. The gateway performs time synchronization, filtering, and preprocessing of the sensor data. It integrates an OPC UA server module that encapsulates the multi-source sensor data in a structured form and publishes it to the upper layer via the OPC UA protocol.

[0038] The core layer of the digital twin is deployed on the server side and includes an OPC UA client module, a data processing module, a thermal error prediction module, and a compensation strategy module. The OPC UA client module subscribes to the OPC UA server of the physical layer and periodically acquires real-time temperature and thermal error data from it. The data processing module cleans and normalizes the acquired data and handles outliers. The thermal error prediction module integrates a thermal error prediction model constructed using the method described in any one of claims 1-7, which receives the preprocessed temperature data and outputs the predicted thermal error value in real time. The compensation strategy module generates the corresponding CNC axis displacement compensation commands based on the predicted thermal error value. The core layer of the digital twin also includes a model update module, which monitors the output of the thermal error prediction module; when the predicted thermal error value continuously exceeds a preset error threshold, it automatically triggers retraining of the thermal error prediction model using historical data. The thermal error prediction module supports switching between or fusing multiple backbone neural networks and provides corresponding visual comparisons at the user interaction layer. The backbone neural networks include convolutional neural networks, gated recurrent unit networks, temporal convolutional networks, and the attention-based Informer model. The core layer of the digital twin sends the compensation commands to the CNC system of the physical layer via a communication interface to perform real-time dynamic compensation of the machining process, forming a closed-loop control of "perception-modeling-prediction-compensation".
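The retraining trigger of the model update module can be sketched as a counter of consecutive threshold exceedances. This is only one possible reading of "continuously exceeds": the class name `RetrainMonitor`, the `patience` parameter (how many consecutive samples count as continuous), and the numeric values are assumptions not given in this embodiment.

```python
class RetrainMonitor:
    """Signals retraining after `patience` consecutive threshold exceedances."""

    def __init__(self, threshold, patience=3):
        self.threshold = threshold  # preset error threshold
        self.patience = patience    # assumed length of a "continuous" run
        self.streak = 0

    def update(self, error_value):
        """Feed one monitored value; return True when retraining should start."""
        if abs(error_value) > self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # a single in-range sample breaks the run
        if self.streak >= self.patience:
            # Reset so one long excursion does not retrigger on every sample.
            self.streak = 0
            return True
        return False


# Illustrative stream of monitored values.
monitor = RetrainMonitor(threshold=5.0, patience=3)
stream = [1.0, 6.0, 7.0, 2.0, 6.0, 6.5, 8.0, 9.0]
flags = [monitor.update(v) for v in stream]
print(flags)
```

Requiring a run of exceedances rather than a single one keeps isolated outliers from launching a retraining job.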

[0039] The user interaction layer is a web-based client that provides a 3D visualization human-machine interface for remotely displaying the machine tool's 3D digital twin model, temperature field cloud map, thermal error prediction curve, and thermal error compensation status in real time. It also supports users in selecting models, configuring thresholds, and monitoring the system.

[0040] 2.1 Overall Model Architecture and Module Integration Strategy Figure 1 presents a general prediction framework that integrates data-driven modeling and physical consistency constraints. The differentiable FERR module proposed in this embodiment serves as a pluggable component that can be integrated into any time-series prediction architecture. In the thermal error prediction scenario, this module implements a dual-driven modeling paradigm of physics and data, coupling data features with physical laws through bidirectional information interaction and overcoming the limitations of traditional single-path modeling. The model employs a dual-channel collaborative design: in the data-driven channel, the backbone network uses a time-series model to extract temporal features from the input temperature sequence, establishing a nonlinear mapping between the temperature field and the thermal error; in the physical constraint channel, the differentiable FERR module constructs a weak-form equivalent of the heat conduction equation, checks the network output for physical consistency, and encodes the coupling relationship between structural response and heat conduction as a differentiable residual constraint.

[0041] Compared to the strongly constrained form of explicitly solving the thermal-structural coupling equations in the traditional finite element method, this embodiment treats the predicted thermal error as an equivalent source term in the heat conduction equation. A feedback mechanism between the thermal error and the temperature field is established through inverse mapping, providing a reduced-order representation of complex thermal-structural coupling effects. The differentiable FERR module has three advantages at the engineering level: first, by constructing a physical constraint subspace, it theoretically improves the stability of the model in cross-condition migration prediction; second, the residual-based constraint mechanism is expected to improve convergence and robustness under small-sample conditions, since with limited training data the model can still converge to a physically reasonable solution space, mitigating performance degradation caused by data scarcity; third, it employs a trainable stiffness matrix $K$ and mass matrix $M$, which reduces the reliance on strict physical parameters or boundary conditions while preserving physical consistency, thereby reducing modeling complexity.

[0042] The stiffness matrix $K$ is determined by the structural layout and material thermal conductivity; it typically remains stable and exhibits structural memory under different operating conditions. The mass matrix $M$, in contrast, is influenced by heat source input, boundary conditions, and external disturbances, and its structure adjusts adaptively to reflect environmental variability. Within the differentiable FERR module, the time derivative of the temperature sequence is first approximated by the central difference; the physical residual term is then calculated using the trainable $K$ and $M$ matrices, and the physical loss formed from this residual participates in the joint optimization of the total loss function. The dual-channel modeling framework proposed in this embodiment takes module decoupling and joint optimization as its core design principles, combining structural flexibility, training feasibility, and engineering adaptability. While maintaining strong data fitting capabilities, it significantly improves physical consistency, providing an efficient, interpretable, and universally applicable modeling paradigm for thermal error prediction.

[0043] 2.2 Working principle of the differentiable FERR module The core function of the differentiable FERR module is to embed the thermal error predicted by the backbone network into a differentiable heat conduction process, treating it as an equivalent heat source to construct physical residual terms, and providing physical guidance for model learning through joint optimization. As shown in Figure 2, the model first maps the current predicted value $\hat{y}_t$ to an equivalent heat source: positive thermal errors correspond to structural "thermal elongation errors," interpreted as internal heat generation; negative errors correspond to "contraction," treated as heat absorption or cooling. Note that this modeling paradigm applies not only to materials with positive thermal expansion but also to those with negative thermal expansion. This "error-heat source" coupling mechanism is essentially a weak physical approximation linking structural response to thermodynamics, allowing thermal errors to be incorporated as a source term into the modeling of the heat conduction process. The equivalent heat source and the temperature data are input into the differentiable FERR module, which uses the trainable stiffness matrix $K$ and mass matrix $M$ to calculate the physical loss $\mathcal{L}_{phys}$ from the heat conduction residual.

[0044] It must be emphasized that the differentiable FERR module is not used as a hard constraint during training, but rather as a regularization path that works synergistically with the data loss of the backbone network to form a dynamic joint optimization mechanism. From an optimization perspective, there are two interacting driving forces during training: data loss prompts the model to fit the observations, while physical residual loss guides the model towards a physically consistent solution space. These two objectives establish a tension between data fidelity and physical consistency—the former emphasizes empirical accuracy, while the latter focuses on structural rationality; they are both contradictory and complementary.

[0045] The model achieves a dynamic balance between the two forces, ultimately converging to the optimal solution that balances prediction accuracy and physical consistency. This game-theoretic optimization mechanism is the essential characteristic that distinguishes the differentiable FERR module from traditional regularization terms. Through the design of equivalent heat source mapping and trainable structural parameter matrices, this module constructs a unified modeling path that combines physical consistency, structural interpretability, and data compatibility, providing a new solution framework for applying deep learning to complex heat-structure interaction problems.

[0046] 2.3 Decoupling Modeling Paradigm from Backbone Network To systematically verify the structural independence and general adaptability of the differentiable FERR module, this embodiment designed a series of integration experiments under different modeling paradigms. Specifically, four types of temporal backbone models—CNN, GRU, Temporal Convolutional Network (TCN), and Informer—were selected as the basic prediction framework, and the differentiable FERR module was seamlessly integrated through a standardized interface. CNN represents a static feature extraction paradigm; although it does not explicitly model temporal dynamics, it captures spatial temperature distribution through convolution operations. GRU belongs to the recurrent architecture family and, due to its sequence memory characteristics, is suitable for modeling the short- to medium-term evolution patterns of temperature and thermal error. TCN adopts a hierarchical temporal modeling strategy, capturing long-term dependencies through stacked dilated causal convolutions. Informer is based on the principle of global modeling and uses a sparse attention mechanism to capture features across time steps.

[0047] These four types of models cover the four mainstream time-series modeling paradigms, namely static convolution, recurrent update, hierarchical dilation, and global attention [39], forming a structurally complementary validation set. Based on the proposed decoupling framework, the differentiable FERR module can be flexibly integrated into any backbone network through a unified interface. The backbone model only needs to supply the predicted thermal error vector and the corresponding temperature time-series data for the module to be embedded quickly and for structural parameters to be exchanged. By comparing the performance before and after the integration of heterogeneous models, the module's adaptability to different modeling paradigms, its structural independence relative to the backbone network, and its plug-and-play compatibility with emerging architectures are evaluated.
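The decoupling described here can be illustrated with a loss function that sees only the temperature sequence and the backbone's predictions, never the backbone's internals. The sketch below is a scalar, single-node simplification with fixed coefficients `k` and `m`; the two stand-in "backbones" are plain functions invented for illustration and are not the CNN, GRU, TCN, or Informer models of this embodiment.

```python
from typing import Callable, List, Sequence

# A backbone is any callable mapping a temperature sequence to predictions.
Backbone = Callable[[Sequence[float]], List[float]]


def residual_penalty(temps, preds, k, m, dt):
    """Mean squared scalar residual m * dT/dt + k * T - y over interior steps."""
    res = [
        m * (temps[t + 1] - temps[t - 1]) / (2.0 * dt) + k * temps[t] - preds[t]
        for t in range(1, len(temps) - 1)
    ]
    return sum(r * r for r in res) / len(res)


def regularized_loss(backbone: Backbone, temps, targets, k, m, dt, lam=1.0):
    """Data loss plus physics penalty; uses only the backbone's output."""
    preds = backbone(temps)
    data = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
    return data + lam * residual_penalty(temps, preds, k, m, dt)


def linear_backbone(temps):
    """Hypothetical stand-in model: error proportional to temperature."""
    return [0.5 * T for T in temps]


def lagged_backbone(temps):
    """Hypothetical stand-in model: same mapping, delayed by one step."""
    return [0.5 * temps[max(i - 1, 0)] for i in range(len(temps))]


temps = [20.0, 21.0, 22.0, 23.0]
targets = [10.0, 10.5, 11.0, 11.5]
for model in (linear_backbone, lagged_backbone):
    print(model.__name__, regularized_loss(model, temps, targets, k=0.5, m=1.0, dt=1.0))
```

Swapping the backbone changes nothing in `regularized_loss`, which is the sense in which the regularizer is plug-and-play in this sketch.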

[0048] The subsequent experimental section will systematically compare different backbone-differentiable FERR module combinations to verify the adaptability and stability of the modules under different modeling strategies, and further explore their generalization ability, decoupling strength and cross-architecture scalability.

[0049] 2.4. Thermal Error Modeling and Compensation Method Based on Digital Twin 2.4.1 Digital Twin Construction The construction of a digital twin system is the technical foundation for thermal error prediction and closed-loop compensation. Addressing the thermal error control requirements of CNC machine tools, this embodiment integrates physical modeling, data-driven learning, and interactive visualization to design a real-time and scalable digital twin workflow. With thermal error modeling and compensation at its core, the system employs an integrated strategy of "physical modeling - behavior mapping - intelligent prediction" to construct a multi-layered twin architecture that maps physical entities to virtual space. Figure 3 shows the entire system workflow, from 3D model creation through sensing and monitoring and intelligent prediction to digital twin feedback, reflecting the logical construction path from physical objects to digital mirrors.

[0050] The 3D modeling phase is fundamental to a digital twin system, with the core objective of accurately replicating the geometric features, assembly relationships, and visual appearance of a physical machine tool. This process begins with acquiring geometric data through dimensional measurements and structural disassembly, followed by the reconstruction of key components (such as the spindle, bed, and worktable) in modeling software. To ensure structural consistency and motion linkage, the modeling process defines assembly constraints and parent-child hierarchical relationships, establishing driving logic between components through kinematic pairs to achieve hierarchical motion representation. After the geometric model is completed, texture mapping and material processing enhance visual realism, and physical images are used for texture alignment to improve visual fidelity. To support real-time web-based rendering, the model also undergoes optimization processes such as mesh simplification, redundancy elimination, and texture compression to reduce memory load and latency, and improve interactive response efficiency. The final 3D model not only maintains high geometric fidelity but also supports behavioral modeling and thermal error simulation, ensuring consistency between visual presentation and structural logic in the digital twin.

[0051] The behavior modeling phase is a core component of the digital twin system, enabling real-time interaction and synchronization between the virtual and physical worlds. Based on a 3D structural model, this phase defines the operational logic and response mechanisms of each component of the machine tool. By encapsulating basic behavior modules, a multi-layered behavior model is established, ranging from unit-level and component-level to system-level. These models dynamically drive the 3D geometry, simulating and mapping the actual machine tool operation. During this process, machine tool data is uploaded to the core layer of the digital twin via the OPC UA protocol, serving as the perceptual input to trigger behavior responses. Each typical action (such as spindle rotation, table translation, or clamping) is encapsulated as a reusable behavior module and dynamically bound to the corresponding geometry. The system automatically invokes the corresponding behavior through an event-driven triggering mechanism, achieving real-time visualization and interactive simulation. Through hierarchical behavior abstraction and hierarchical dependencies, the digital twin can accurately reproduce the operating state of the physical machine tool. Behavior modeling thus becomes a bridge between perception and dynamic response, achieving a two-way mapping between physical states and virtual actions. This modeling method significantly improves the adaptability and scenario compatibility of the digital twin, ensuring the consistency and coordination of behavior between the physical and virtual systems.

[0052] The thermal error modeling module, as the core of intelligent analysis in digital twins, is responsible for transforming historical state data and real-time sensor inputs into accurate thermal error predictions. This module integrates multi-source temperature signals and historical operating data, generating feature tensors as model inputs through data cleaning, normalization, and temporal structuring. This embodiment integrates multiple temporal models such as GRU, CNN, TCN, and Informer, and embeds a differentiable FERR module to establish a physical information optimization path. The model adaptively captures the nonlinear relationship between temperature input and thermal error output, jointly modeling structural response inertia, heat source disturbances, and spatial coupling effects. The prediction results are transmitted to the compensation strategy module, where, after correction calculation, real-time control commands are sent to the CNC system, forming a closed-loop compensation mechanism. Within this framework, the thermal error module not only improves prediction accuracy but also ensures interpretability and robustness under different operating conditions through the integration of physical consistency and data-driven modeling.

[0053] The digital twin system developed in this embodiment integrates 3D modeling, behavior mapping, and thermal error modeling into a unified workflow of perception-prediction-control. The 3D modeling component ensures high-fidelity geometric reconstruction and lightweight adaptation, laying the visual and structural foundation for the digital twin. The behavior model, through multi-layered behavior encapsulation driven by OPC UA, achieves real-time mapping between physical states and virtual actions. The thermal error module, enhanced by a differentiable FERR module and a multi-model architecture, provides a prediction engine with physical interpretability and generalization capabilities, supporting intelligent compensation strategies. This multi-dimensional integrated construction method strengthens the real-time response and thermal sensing capabilities of the digital twin, while improving the engineering scalability and deployment feasibility of the compensation system.

[0054] 2.4.2 Digital Twin Framework and Thermal Error Compensation Strategy To achieve real-time monitoring, accurate modeling, and dynamic compensation of machine tool thermal errors, a digital twin system based on the OPC UA protocol and a B/S (browser/server) architecture was designed and deployed, as shown in Figure 4. The system adopts a three-layer distributed structure consisting of a physical layer, a digital twin core layer, and a user interaction layer. Through collaborative operation, these layers establish a fully closed-loop system covering perception, modeling, prediction, and control.

[0055] The physical layer is responsible for acquiring physical information and executing control actions, and consists of the machine tool body, the CNC system, temperature and displacement sensor arrays, and the edge data acquisition gateway. The sensor modules continuously collect key thermal variables, while the edge gateway performs local preprocessing tasks such as time synchronization, anomaly filtering, and data alignment. The OPC UA server module deployed on the gateway encapsulates multi-source sensor data into structured data frames, enabling unified upstream publishing. This processing flow improves data quality and transmission stability and enhances the system's adaptability in multi-device, multi-protocol environments. In the system, OPC UA serves not only as a data acquisition interface but also as a unified communication protocol, enabling structured and standardized publishing of sensor data with strong cross-platform compatibility and suitability for industrial deployment. The CNC system, as the control core of the physical layer, connects to the modeling module through serial communication and input/output (I/O) interfaces, synchronizing system status and mapping the model output back to control parameters in real time so that compensation commands are executed effectively. By integrating sensing, preprocessing, and control interfaces, the physical layer provides high-quality data sources and real-time physical interfaces for the upper-layer thermal error modeling and compensation modules, ensuring stable system operation and closed-loop control under varying operating conditions. It serves as the sensing foundation and execution center of the entire digital twin system.

[0056] The core layer of the digital twin, deployed on the server side, serves as the central hub for data processing and compensation decisions. The OPC UA client module periodically and proactively retrieves temperature and thermal error data from the physical layer, establishing a reliable and efficient cross-layer communication mechanism. The data processing module performs cleaning, normalization, and anomaly removal to ensure the validity and consistency of subsequent modeling inputs. The preprocessed data is then input into the thermal error modeling module, which integrates a regularization method based on the differentiable FERR module. By introducing differentiable physical residuals, it achieves physically consistent modeling of thermal response behavior, improving generalization across multiple operating conditions. Furthermore, the system incorporates an error threshold monitoring mechanism: when the predicted thermal error exceeds a preset range, the system automatically initiates a model retraining process that uses historical data to optimize parameters and update the model structure, maintaining the long-term accuracy and robustness of the prediction model. Predicted thermal error values are processed by the compensation strategy module to generate real-time correction commands, which are transmitted to the CNC system via a communication interface, forming a closed-loop dynamic compensation process. As the system control center, the server receives real-time physical layer status data and responds to user interaction requests, bridging data flow, model execution, and compensation control to achieve collaborative operation of all components of the digital twin system.

[0057] The interaction layer serves as the human-machine interface of the digital twin system, handling visualization, model configuration, and status monitoring. It provides system-level operational feedback and user interaction. Based on a B/S architecture, this layer uses WebGL (Web Graphics Library) for 3D visualization and communicates with the core layer via the standard Hypertext Transfer Protocol (HTTP), supporting data requests, command transmission, and status updates. Users can remotely access the machine tool virtual twin through a browser client. The interface integrates a 3D structural model, the temperature sensor layout, thermal error trend curves, a model selection panel, and threshold configuration options. The system supports real-time visualization of prediction results generated by backbone models such as GRU, CNN, and TCN combined with the differentiable FERR module. Users can flexibly configure sensor input channels and error tolerance thresholds, while the system continuously performs thermal error compensation tasks. When the prediction error exceeds a set threshold, a retraining mechanism is automatically triggered to maintain prediction accuracy. The generation and execution of the compensation strategy are handled by the core layer; the interaction layer is only responsible for input configuration and feedback display, ensuring a safe and stable control process. Because the client is decoupled from the underlying OPC UA communication logic, users do not need to deal with the details of industrial protocols, which simplifies client deployment and system maintenance while enhancing cross-platform compatibility and engineering adaptability.

[0058] The system of this embodiment achieves standardized integration of multi-source sensor data across devices through the OPC UA protocol, and uses an HTTP interface for data interaction between the web client and the core layer, thereby constructing a fully closed-loop workflow covering perception, modeling, prediction, compensation, and control. This architecture clearly distinguishes two core data paths: the data pipeline ensures stable transmission of thermal error information and drives the modeling process, while the control pipeline is responsible for real-time compensation and strategy adjustment feedback. Together, they form a thermal stability control mechanism based on physical modeling. Leveraging the deployment flexibility of the B/S architecture and the industrial interoperability of the OPC UA protocol, the system is highly scalable and platform-adaptable, allowing digital twin deployment across various machine tool models and complex operating conditions.

[0059] 3. Experimental Research and Verification 3.1 Thermal Information Data Acquisition This embodiment uses a TGK46100 CNC machine tool as the experimental platform for thermal error analysis. To avoid interfering with normal machining, all thermal information (including temperature and thermocouple signals) was collected under no-load conditions. The sensor layout is detailed in Appendix A. The experiment focuses on the machine tool's electric spindle and sets two typical operating conditions: Condition #1, where the spindle speed increases in a stepwise manner, and Condition #2, where the spindle speed fluctuates randomly. To ensure the system reaches thermal steady state, a 5-minute preheating period was performed before each data acquisition, and the sampling frequency was set to once per minute. Over the total collection period, 500 sets of time-series samples of temperature and thermal error were obtained. The temperature change curves and corresponding thermal error distributions under the two operating conditions are shown in Appendix A.

[0060] The calculated machine tool temperature and thermal error distributions under operating conditions #1 and #2 are shown in Figure 5. As shown in the upper chart, although the average temperatures of the two operating conditions are similar, their fluctuation ranges and time patterns differ significantly, reflecting the dynamic response characteristics of the system under different thermal input modes. The thermal error distribution in the lower chart further reveals the structural response to thermal deformation. The thermal error amplitude of operating condition #1 is significantly higher than that of operating condition #2, and the fluctuations are more severe, indicating that the excitation mode of the heat source directly affects the thermal error behavior of the machine tool. This dual difference in thermal input and thermal response demonstrates that the thermal characteristics of the machine tool change significantly with operating conditions. Therefore, the cross-condition generalization ability of the thermal error prediction model is crucial for practical applications. In addition, due to the low sampling frequency and limited dataset size, the modeling method must be able to accurately capture complex thermodynamic characteristics under small-sample conditions.

[0061] In subsequent sections, the effectiveness of the differentiable FERR module in enhancing cross-condition generalization is evaluated by exchanging the training and test sets. Specifically, the model is trained using the temperature time series of one condition as input and the corresponding thermal error as the supervision signal. The model is then applied to temperature data from the other condition for prediction, and finally the predicted thermal error is compared with the measured value. Mean absolute error (MAE), root mean square error (RMSE), mean square error (MSE), and the coefficient of determination ($R^2$) are used to quantitatively evaluate model performance.
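For reference, the four evaluation indicators can be computed as in the following sketch, which uses their standard textbook definitions. The function name `regression_metrics` and the sample values are illustrative and are not taken from the experiments.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Illustrative measured and predicted thermal errors.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

In a cross-condition test, `measured` would come from the operating condition that was held out of training, so these indicators reflect generalization rather than fit.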

[0062] 3.2 Validation of the thermal error model To ensure fairness and reproducibility, all models were trained using a uniform configuration. The mean squared error (MSE) loss function was used, and the Adam algorithm with a learning rate of 0.001 was employed as the optimizer. The maximum number of training epochs was set to 1000, and an early stopping strategy was used to prevent overfitting. All experiments were conducted on an NVIDIA RTX 3090 GPU using the PyTorch framework. The dataset consisted of nine temperature sensor channels, including eight spindle surface measurement points and one ambient temperature measurement point. Both the training and testing sequences contained 500 time steps. To improve convergence efficiency and eliminate dimensional differences, all input features were independently min-max normalized per channel. Among the backbone models, the CNN architecture includes two one-dimensional convolutional layers (16 and 32 channels respectively, with a kernel size of 3), along with a Dropout layer and a fully connected layer; the GRU model adopts a three-layer stacked structure, with the hidden state at the final time step as the output; the TCN model consists of two causal dilated convolutional layers (16 channels, kernel size 3) and residual connections; the Informer model follows the Transformer architecture, introduces a ProbSparse attention mechanism to improve the efficiency of long-sequence modeling, and constructs the decoder input by combining the label length and the prediction length.
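The per-channel min-max normalization mentioned above can be sketched as follows. Fitting the statistics on the training sequence and reusing them for the test sequence is an assumption made here for the cross-condition setting; the embodiment does not state which data the minima and maxima are taken from. The function names and sample values are hypothetical.

```python
def fit_min_max(rows):
    """Return one (min, max) pair per channel; rows are time steps."""
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]


def apply_min_max(rows, stats):
    """Scale each channel to [0, 1] using previously fitted statistics."""
    scaled = []
    for row in rows:
        scaled.append([
            # A constant channel has zero range; map it to 0.0 to avoid dividing by zero.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Two illustrative channels: a varying sensor and a constant one.
train = [[20.0, 15.0], [30.0, 15.0], [25.0, 15.0]]
stats = fit_min_max(train)
train_scaled = apply_min_max(train, stats)
print(train_scaled)

# Values from another condition may fall outside [0, 1]; they are not clipped here.
test_scaled = apply_min_max([[35.0, 15.0]], stats)
print(test_scaled)
```

Normalizing each channel separately keeps a sensor with a wide temperature range from dominating the others during training.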

[0063] To improve the modeling efficiency and engineering deployability of the differentiable FERR module, this embodiment employs a feature abstraction and approximation strategy for input selection. Given that the prediction target is the axial elongation of the spindle, temperature time-series data from the T1 sensor located on the rear bearing end cap and the T6 sensor on the front bearing end cap are selected as module inputs. This modeling method simplifies the spindle into a one-dimensional heat conduction system defined by two end nodes, ignoring the internal temperature distribution and equating the thermal error to the axial displacement between the two nodes. Under this configuration, the differentiable FERR module learns the corresponding axial displacement response using the temperatures at both ends as boundary conditions, thereby constructing a structured physical model of the thermal error. This approximation scheme significantly reduces modeling complexity while still effectively capturing the spindle's thermal deformation trend, enhancing the module's practicality and scalability.

[0064] The differentiable FERR module is integrated into the four backbone models as a general physical regularization component, transforming predicted thermal errors into equivalent heat sources for calculating heat conduction residuals. The module internally contains two trainable matrices, K and M, which represent the spatial coupling characteristics and the node response characteristics, respectively. They are initialized as sparse structures to accelerate convergence and remain differentiable throughout to support gradient updates. The weight of the physical residual loss term is set to 1 for all models to ensure consistent physical constraint effects across different models. See Appendix B for detailed network architectures and training configurations of each backbone model.
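The following is a rough sketch of how a heat conduction residual of this kind might be evaluated, using the two matrices denoted K and M elsewhere in this description. The residual form (a semi-discrete heat equation, M·dT/dt + K·T − q), the backward-difference time derivative, and the rule that the predicted error acts as the same source term on every node are all assumptions made for illustration; the exact residual formula of the embodiment is not reproduced in this section. Plain Python is used here for readability, whereas an actual implementation would rely on an automatic differentiation framework so that K and M receive gradients.

```python
def ferr_residual_loss(temps, pred_err, K, M, dt=1.0):
    """Mean squared residual of an ASSUMED semi-discrete heat equation.

    temps:    list of node-temperature vectors, one per time step
    pred_err: predicted thermal error per time step (equivalent heat source)
    K, M:     square nested lists standing in for the trainable matrices
    """
    if len(temps) < 2:
        raise ValueError("at least two time steps are required")
    n = len(K)
    total, count = 0.0, 0
    for t in range(1, len(temps)):
        # Backward-difference estimate of dT/dt at each node.
        d_temp = [(temps[t][i] - temps[t - 1][i]) / dt for i in range(n)]
        for i in range(n):
            storage = sum(M[i][j] * d_temp[j] for j in range(n))
            conduction = sum(K[i][j] * temps[t][j] for j in range(n))
            # Assumption: the predicted error is applied as the same
            # source term on every node.
            residual = storage + conduction - pred_err[t]
            total += residual * residual
            count += 1
    return total / count


def total_loss(data_mse, physics_loss, weight=1.0):
    """Data loss plus weighted physical residual loss (weight 1 per the text)."""
    return data_mse + weight * physics_loss
```

With a uniform temperature field, a conduction matrix whose rows sum to zero, and no source, the residual vanishes, which is the behavior expected of a physically consistent state.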

[0065] 3.2.1 Accuracy Validation: Multi-model Comparison and Baseline Enhancement To evaluate the generality of the differentiable FERR module and the performance improvement it brings across different time-series prediction models, eight comparative experiments were conducted based on four backbone architectures (CNN, GRU, TCN, and Informer), four with the module and four without. Under a unified dataset, training strategy, and loss function setting, each model's performance on the thermal error prediction task was systematically evaluated, with metrics including RMSE, MAE, the coefficient of determination (R²), and total training time. The results are shown in Figure 6; detailed data can be found in Appendix C.
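For reference, the standard definitions of RMSE, MAE, and the coefficient of determination used in this section can be written as follows. The function names are illustrative; the embodiment's own evaluation code is not shown in the source.

```python
import math


def rmse(measured, predicted):
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```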

[0066] In terms of root mean square error (RMSE) and mean absolute error (MAE), all models showed significant error reductions after integrating the differentiable FERR module. The combination of the Informer model and the differentiable FERR module performed best, reducing RMSE to 0.9294 μm and MAE to 0.7591 μm, reductions of 61.1% and 63.3%, respectively, compared with the original model. The combinations of GRU and the differentiable FERR module, and of CNN and the differentiable FERR module, also achieved RMSE reductions of approximately [value missing] and [value missing], respectively, confirming the module's universal enhancement effect across various network architectures. In terms of R², all models integrating the differentiable FERR module achieved values above 0.9912. Among the models without this module, however, the more complex TCN and Informer models had lower R² values of 0.9695 and 0.9602, respectively, indicating that the differentiable FERR module effectively alleviates the overfitting of complex models during small-sample training. In terms of training time, the module introduces only a slight computational overhead (less than 15%), which is cost-effective in high-precision thermal error modeling scenarios given the significant improvement in prediction accuracy. Overall, the differentiable FERR module demonstrates strong compatibility and performance gains across all backbone architectures, with particularly significant improvements for the GRU and Informer models. It not only greatly reduces errors but also improves generalization ability, laying a solid foundation for the subsequent cross-condition validation and the analysis of the regularization mechanism.

[0067] To compare the fitting performance and error distribution characteristics of different models in thermal error prediction, Figure 7 illustrates the prediction curves and residual distributions under training condition 1 and testing condition 2. The left panel compares predicted and measured values as line graphs, while the right panel is a residual frequency histogram revealing the distribution across different error intervals. As seen in the left panel, models without the integrated differentiable FERR module generally exhibit significant systematic bias, especially during abrupt or gradual increases in thermal error, where the prediction curve is prone to amplitude lag or shift. After integrating this module, the agreement between the prediction curve and the measured values is significantly improved, particularly at inflection points and plateau regions, demonstrating the module's advantage in capturing the inertia and thermal coupling behavior of physical responses.

[0068] The residual histogram on the right further reveals the convergence characteristics of the model error: without the integration of the differentiable FERR module, the residual distribution is scattered, exhibiting a long-tail effect and center shift; after integration, the residuals are significantly concentrated, with the peak value concentrated near zero, forming a quasi-normal distribution with a high peak and narrow tail. This indicates that the differentiable FERR module effectively reduces systematic and extreme errors, enhancing the overall stability and reliability of the prediction.

[0069] To visually demonstrate how the differentiable FERR module balances model fitting ability and generalization performance, Figure 8 compares the performance of the four backbone models on the training and test sets using radar charts. The left chart shows the results of the original models without this module, and the right chart shows the results after integration. After introducing the differentiable FERR module, the training-set performance metrics (such as R² and MAE) deteriorated slightly; for example, the training-set R² of GRU decreased from 0.9915 to 0.9700, indicating a reduction in model flexibility. However, all models showed significant improvements on the test set. The combination of GRU and the differentiable FERR module performed best, with R² reaching 0.9936 and RMSE and MAE decreasing simultaneously, which shows that the physical residual constraint introduced by this module effectively suppresses overfitting to the training data and enhances robustness and generalization ability across different working conditions.

[0070] This performance shift reflects a new modeling paradigm: by moderately sacrificing the flexibility of pure data fitting and introducing physical consistency and structural priors, the model achieves better predictive stability and interpretability under unseen conditions. This strategy is particularly valuable in thermal error modeling of complex engineering systems. In contrast to the priorities of traditional purely data-driven methods, practical applications place greater emphasis on the reliability of the model under environmental disturbances and changing boundary conditions. Therefore, the differentiable FERR module proposed in this study not only achieves theoretical synergy between physics and data but also provides a general solution for intelligent modeling of complex industrial scenarios.

[0071] 3.2.2 Cross-condition prediction experiment To comprehensively evaluate the cross-condition generalization ability and directional symmetry of the differentiable FERR module, this embodiment conducts a reverse cross-validation experiment. Specifically, after training the model under condition #2, thermal error prediction is tested under condition #1. This setting simulates the model's adaptability and residual control performance when encountering unseen data characterized by low heat load or early behavior, further verifying the robustness and generalization ability of the differentiable FERR module. As mentioned in the previous section, condition #1 has a stronger thermal response and more complex error fluctuations, making it more difficult to model; while condition #2 has weaker thermal dynamics and a more stable error pattern, making it easier to model. Therefore, testing under condition #2 after training under condition #1 is a favorable generalization scenario, which can verify the model's robustness; the reverse setting constitutes a more stringent test, which can more rigorously evaluate the generalization ability.

[0072] Figure 9 shows the prediction curves and residual distributions of each model under this cross-condition setting. The results are consistent with the previous experiment (condition #1 training, condition #2 prediction): the models without the integrated differentiable FERR module still show significant prediction lag or amplitude amplification during abrupt changes in thermal error (especially at local peaks and valleys), while the models with the integrated module better capture the overall error trend, produce smoother prediction curves, and follow the trend significantly better. The residual histogram on the right further confirms this advantage: without the differentiable FERR module, the residuals show a multi-peak distribution, peak shift, and heavy tails; after integration, the residuals are closely concentrated near zero, the distribution range is significantly narrowed, the systematic error is reduced, and the prediction robustness is enhanced.

[0073] The results show that the differentiable FERR module not only improves unidirectional generalization performance (training in condition #1 and prediction in condition #2), but also achieves stable gains under inverse conditions, exhibiting cross-condition symmetrical generalization characteristics. This consistent performance improvement in both directions has significant engineering value for systems subjected to complex environmental disturbances and variable thermal configurations—something that is difficult to achieve with purely data-driven methods. Although the differentiable FERR module performs stably in different condition switching scenarios, its sensitivity to the amount of training data and the strength of regularization still needs to be investigated. The following section will analyze in detail the joint effect of data volume and physical regularization weights.

[0074] 3.2.3 The combined effect of data volume and regularization strength To systematically evaluate the adaptability and robustness of the differentiable FERR module under different training data scales and physical regularization strengths, two-dimensional parameter experiments were conducted on four backbone models (CNN, GRU, TCN, and Informer). The training data ratio (TrainRatio ∈ [values missing]) and the regularization weight ([values missing]) were varied, and their impact on predictive performance on the test set, measured by the coefficient of determination (R²), was analyzed. The total length of the training sequence is fixed at 500 time steps, and the test set always retains the full 500 time steps to ensure a stable assessment of generalization ability.
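A two-dimensional parameter experiment of this kind can be organized as a simple grid loop, sketched below. The truncation rule (keeping the leading fraction of the training sequence), the function names, and the `evaluate` callback are hypothetical; the actual grid values used in the embodiment are not recoverable from this text and must be supplied by the caller.

```python
from itertools import product


def take_train_fraction(sequence, ratio):
    """Keep the leading fraction of the training sequence (assumed rule)."""
    n = max(1, round(len(sequence) * ratio))
    return sequence[:n]


def grid_search(train_seq, ratios, weights, evaluate):
    """Return {(ratio, weight): score} for every combination in the grid.

    `evaluate(subset, weight)` is a caller-supplied function that trains a
    model on `subset` with the given regularization weight and returns a
    test-set score such as the coefficient of determination.
    """
    results = {}
    for ratio, weight in product(ratios, weights):
        subset = take_train_fraction(train_seq, ratio)
        results[(ratio, weight)] = evaluate(subset, weight)
    return results
```

The resulting dictionary maps directly onto a heatmap with one axis per parameter.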

[0075] Figure 10 presents heatmaps of prediction accuracy for each model over the parameter grid. The results show that, with limited training data, the differentiable FERR module significantly improves generalization performance. For example, when TrainRatio = 0.5, the R² of the GRU model without physical regularization is 0.9177; after the differentiable FERR module is introduced (regularization weight [value missing]), it increases to 0.9664. Similarly, the R² of the Informer model reaches 0.978 at a regularization weight of approximately [value missing]. This indicates that the module can enhance the sample efficiency of the backbone model, making it particularly suitable for real-world industrial scenarios with limited data and complex operating conditions.

[0076] The heatmap results further demonstrate that the differentiable FERR module achieves optimal performance when the regularization weight lies within the range [values missing], demonstrating a strong synergistic effect between physics guidance and data fitting. When the weight is too small, the regularization effect is weak and physical information cannot be effectively transferred to the model structure; conversely, an excessively large weight can inhibit the model's ability to extract effective features from the data, leading to performance degradation. These findings suggest that the core value of the differentiable FERR module lies not only in introducing physical constraints but also in its tunability: a soft constraint mechanism establishes a balance between physical consistency and learning flexibility.

[0077] Furthermore, the differentiable FERR module consistently improves performance across all modeling paradigms, including CNNs, GRUs, TCNs, and attention-based Informer networks. This structure-independent and model-compatible characteristic indicates that the module can serve as a general physical regularization component, flexibly integrated into mainstream time-series models to enhance prediction accuracy. Even under small sample conditions, the differentiable FERR module exhibits strong robustness, demonstrating good scalability and engineering applicability. Its design concept of adjustable physical constraints and decoupled interfaces provides a theoretically sound and practically feasible physical-data hybrid modeling paradigm for thermal error prediction and related fields.

[0078] 3.3 Analysis of FERR Regularization Mechanism: The Physical Basis of Performance Gain 3.3.1 Dynamic Interaction Between Physical Loss and Data Loss To gain a deeper understanding of the training optimization mechanism of the differentiable FERR module, two analysis methods were used: (1) observing the trend of the loss functions under different batch sizes (Figure 11); and (2) analyzing the rates of change of the differentiable FERR module loss and the data loss, as well as the consistency of their gradient directions. In Figure 11, the combination of GRU and the differentiable FERR module is selected as a typical case for detailed analysis.

[0079] Training was conducted with batch sizes of 8, 16, 32, 64, 128, and 256, revealing that the introduction of physical constraints made the overall training process more stable. The loss curve converged more smoothly and the validation loss fluctuated less, indicating stronger suppression of overfitting. Although the combination of GRU and the differentiable FERR module converged slightly more slowly in the early stages of training, it achieved more balanced performance metrics and ultimately obtained better generalization ability. Across batch sizes, this combination showed significantly stronger robustness to changes in optimization granularity. In contrast, for the baseline model, [symbol missing] decreases significantly with increasing batch size, leading to reduced convergence stability. Overall, the differentiable FERR module enhances training stability and generalization performance through structured physical priors, effectively mitigating gradient estimation noise and the risk of performance degradation. Notably, at the early stopping point, the gradient angle of the optimal model still lies within the range [value missing] to [value missing] (e.g., [value missing] at batch size [value missing]). This indicates that the module does not improve performance by guiding the model toward a collaborative optimization path, but rather by continuously offsetting the model's tendency to overfit the data through constant optimization tension. This dynamic pressure drives the model into a solution space that simultaneously satisfies the task objective and the physical constraints.

[0080] Figure 11 (a) illustrates the training process of the GRU combined with the differentiable FERR module at different batch sizes, including the total training loss, data loss, differentiable FERR module loss, and validation loss. The results show that all loss components decrease steadily with training iterations. The differentiable FERR module loss converges synchronously with the data loss, indicating that this module has excellent differentiability and can share the optimization path with the backbone network. The convergence speed slows slightly with increasing batch size, but the final prediction accuracy remains at a high level, demonstrating that the regularization mechanism is stable and effective at different training granularities. In contrast, Figure 11 (b) shows the training curves of the baseline GRU model (without the integrated differentiable FERR module), highlighting the changes in data loss and validation loss. The data loss decreases rapidly in the early stages of training, reflecting strong fitting ability; however, the validation loss does not decrease accordingly and fluctuates under specific batch settings, suggesting a risk of overfitting. Compared with the combination of GRU and the differentiable FERR module, the baseline GRU typically achieves a lower final [metric missing] value, and its generalization performance decreases more significantly at batch sizes of 128 and 256.

[0081] Figure 12 (a) and (b) show the rate of decrease (i.e., the first derivative with respect to the training epochs) of the loss from the differentiable FERR module and the data loss in the first 50 training epochs for the combination of GRU and differentiable FERR modules at different batch sizes. The results show that at small batch sizes (8, 16, and 32), the decreasing trends of the two loss components are highly consistent, and no obvious trade-off is observed in the loss curves. However, at larger batch sizes (64, 128, and 256), the rate of decrease begins to diverge significantly, even showing opposite directions in the early stages of training. This indicates a dynamic trade-off between the differentiable FERR module objective and the data objective along the optimization trajectory. The underlying reason for this phenomenon lies in the gradient estimation characteristics of large batches: larger batch sizes provide more stable gradient estimates, allowing the gradient signal from the differentiable FERR module loss to propagate more clearly. Therefore, the differentiable FERR module component exerts a stronger influence in the early stages of training, causing the model to deviate from the steepest descent path determined by pure data loss and instead guide convergence to a solution that is more physically consistent.
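The rate of decrease described above can be approximated from recorded per-epoch losses by a first difference. The sketch below also flags the epochs in which the two loss components move in opposite directions; the function names are illustrative and the use of a simple forward difference is an assumption about how the derivative was estimated.

```python
def loss_rate(losses):
    """Forward difference of a per-epoch loss curve (negative means decreasing)."""
    return [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]


def opposite_direction_epochs(rate_a, rate_b):
    """Indices at which two loss components change in opposite directions."""
    return [i for i, (a, b) in enumerate(zip(rate_a, rate_b)) if a * b < 0]
```

For example, applying `loss_rate` to the first 50 epochs of the physical loss and the data loss, then passing both results to `opposite_direction_epochs`, lists the epochs where one objective improves while the other worsens.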

[0082] Figure 12 (c) shows a schematic diagram of the angle between the data loss gradient and the loss gradient of the differentiable FERR module, and Figure 12 (d) illustrates the evolution of this angle during training. Across all batch size settings, the angle always remains close to [value missing], indicating that the two objectives are not aligned and that the differentiable FERR module exerts a persistent tension on the optimization. This mechanism behaves differently at different batch sizes: smaller batches introduce larger gradient variance, which to some extent blurs the conflict between the physical and data objectives. In this setting, the differentiable FERR module plays a milder role, acting as a "soft constraint" that coordinates the optimization process. Conversely, at larger batch sizes, the gradient is more stable and the influence of the differentiable FERR module becomes more dominant, with its perturbation effect reflected more clearly and directly in the optimization dynamics. Although this high-tension state may cause short-term fluctuations in the loss curve during the early stages of training, it ultimately drives the model to achieve better generalization under the constraints of physical structure. During training, a reduction in the loss from the differentiable FERR module is often accompanied by a short-term rebound in the data loss. This "gradient perturbation-tension-driven" mechanism allows the model to escape the local minima of purely data-driven optimization, promoting structural generalization.
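The angle between two gradients is conventionally obtained from their normalized inner product. The following sketch assumes that each gradient has been flattened into a single list of floats; the function name is illustrative.

```python
import math


def gradient_angle_deg(grad_a, grad_b):
    """Angle in degrees between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("angle is undefined for a zero gradient")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))
```

An angle near 0° means the two losses pull the parameters in the same direction, an angle near 90° means they are essentially independent, and an angle above 90° means they conflict.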

[0083] From an optimization dynamics perspective, the differentiable FERR module is not merely a passive supplement to model capacity. Instead, it guides the model towards a balance between physical consistency and prediction accuracy by applying directional interventions early in training. This gradient tension-based collaborative optimization strategy establishes a novel "soft physics constraint" modeling paradigm. This method does not rely on gradient alignment but instead improves prediction accuracy, convergence stability, and cross-conditional generalization ability simultaneously through structural interventions.

[0084] 3.3.2 Gradient Propagation and Enhanced Training Stability To investigate the impact of the differentiable FERR module on gradient propagation and training stability, the distribution of the parameter update ranges of each layer in the GRU model was analyzed. Experiments were conducted with three batch sizes (16, 64, and 256), with the module enabled and disabled respectively. Figure 13 presents violin plots of the gradient range distribution of the four key layers (layer 1, layer 2, layer 3, and the fully connected layer) throughout the training process, where the dots represent the original data, the black diamonds represent the mean, and the gray bars mark the standard deviation intervals.

[0085] The data show that enabling the differentiable FERR module significantly reduces parameter update variation, especially in the intermediate layers (layers 2 and 3). For example, at a batch size of 64, the gradient range of layer 2 decreases from 0.0969 to 0.0343, a reduction of over 64%. At a batch size of [value missing], the second and third layers decrease by 62.2% and 32.7% respectively, while the fully connected layer changes only slightly, by 11.8%. This indicates that the modulatory effect of this module is most significant in the middle layers of the network and relatively weak at the input and output layers.
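The reduction quoted for layer 2 at a batch size of 64 can be checked with a one-line relative-change calculation. The helper name is hypothetical; the two input values are those stated in the paragraph above.

```python
def relative_reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# Layer 2 gradient range at batch size 64, module disabled vs. enabled.
layer2_reduction = relative_reduction_percent(0.0969, 0.0343)
print(f"{layer2_reduction:.1f}%")  # about 64.6%, consistent with "over 64%"
```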

[0086] This phenomenon can be explained from two aspects: the model architecture and the operating mechanism of the FERR module. First, the initial layer (layer 1) is mainly responsible for local perception of the original temperature sequence and extraction of low-level features, while the intermediate layers (layers 2 and 3) are responsible for temporal dynamic integration and abstract physical response modeling, serving as the core of "structural memory" and "thermal inertia learning." The output layer (FC) focuses on mapping the abstract representation to the final prediction. Since the residual gradient of the FERR module propagates backward from the model output, its moderating effect is most concentrated in the intermediate layers. Second, the FERR mechanism feeds the thermal error predicted by the backbone network into the equation as the heat source, constructs the residual term, and then calculates the physical loss. The gradient is then back-propagated; it acts most directly on the output layer and has its most significant effect on the intermediate layers. As the error propagation path lengthens, the gradient gradually decays toward the bottom layer.

[0087] It is worth noting that the effect of the differentiable FERR module is more pronounced in large-batch settings. Large batches provide more accurate gradient estimates, allowing the directionality and intensity of FERR-related gradient signals to be maintained during propagation, thereby more effectively suppressing redundant updates and enhancing structural stability. For example, at a large batch size ([value missing]), the gradient distribution of the second layer is more concentrated, its tail shrinks, and its mean is stable, indicating that the module exerts stronger control under "high-resolution gradient feedback". In contrast, when the module is disabled, the gradient distribution of the intermediate layers is more dispersed and exhibits obvious long-tail characteristics, suggesting that frequent and violent parameter oscillations may lead to convergence instability or overfitting.

[0088] In summary, the differentiable FERR module improves overall training stability—especially under large-batch conditions—by modulating intermediate layer parameter fluctuations through gradient-level intervention based on physical residuals. Its significant effect on intermediate layers not only strengthens the structural consistency of temporal modeling but also supports improved model generalization ability and physical interpretability. Functionally, this module, as an implicit structure controller, applies targeted influences along the gradient propagation path, demonstrating strong versatility and engineering applicability.

[0089] 3.3.3 Structural Memory and Thermal Response Learning in the K / M Model The differentiable FERR module introduces a learnable stiffness matrix K and a learnable mass matrix M, and constructs physical residual terms from the backbone model output during training. These residual terms act as a regularizer, guiding the optimization process toward a balance between data fitting and physical consistency. To explore the evolution and physical interpretation of these matrices, Figure 14 shows the changes in key matrix elements and the corresponding heatmap visualizations under operating conditions #1 and #2.

[0090] The stiffness matrix K essentially reflects the thermal interaction relationships between nodes and is an abstract encoding of sensor layout, structural topology, and material thermal conductivity. Experimental results show that under operating conditions #1 and #2, the K matrix converges rapidly to a stable configuration, exhibiting a highly consistent spatial pattern and numerical structure. This indicates that K captures a structure-invariant representation, one that is independent of specific heat source distributions or boundary conditions. This condition-invariant stability endows the model with structural memory when transferring between different conditions, providing a foundation for small-sample generalization. By explicitly embedding this structural information into the neural network training path, the differentiable FERR module serves as a bridge between data-driven learning and physics-based modeling.

[0091] In contrast, the mass matrix M characterizes the response of each temperature node to heat input, reflecting condition-sensitive features such as heat capacity or thermal inertia. M converged slowly during training and exhibited significantly different spatial distributions under conditions #1 and #2. This indicates that the learned M matrix is highly adaptive to the operating conditions. This variability does not weaken the model's generalization ability; rather, it enhances the model's adaptability and robustness to specific training conditions by dynamically adjusting the thermal response mechanism. M thus complements the structurally invariant K matrix, and together they form a collaborative modeling strategy that combines structural memory with operating condition awareness.

[0092] It is important to emphasize that the differentiable FERR module is introduced only as a regularization component during the training phase. The K and M matrices are not invoked during inference, which ensures that the model remains lightweight and consistent during deployment. Even with physical constraints imposed during training, the computational graph during inference remains identical to that of the original backbone model (such as GRU, CNN, TCN, or Informer), so that physical interpretability is achieved without incurring additional inference overhead and practicality for industrial deployment is maintained.

[0093] This embodiment introduces a stiffness matrix K to explicitly model the structural interactions and heat conduction paths between temperature nodes, enhancing the model's ability to perceive spatial structural consistency across different operating conditions; simultaneously, a mass matrix M is employed to characterize the thermal response inertia of each node to heat source disturbances, enabling adaptive modeling of operating condition sensitivity and thermodynamics. Based on this, the differentiable FERR module embeds the predicted thermal error and the temperature sequence into a residual formula governed by K and M, which abstracts and simplifies the underlying physical mechanisms. This formula implicitly guarantees dimensional consistency and balance with the physical constraints, ensuring that the constructed regularization path possesses physical consistency, structural interpretability, and full differentiability, and guiding the model to converge to a structurally sound parameter space during optimization. Notably, since the differentiable FERR module is used only during the training phase, it does not introduce any additional computational burden during inference, thus maintaining deployment efficiency and industrial applicability.

[0094] In summary, the proposed general thermal error prediction framework not only improves the accuracy and stability of thermal error prediction, but also significantly enhances cross-condition generalization ability and engineering adaptability, providing an efficient, plug-and-play, and generalizable modeling paradigm for physics-informed deep learning.

[0095] 3.4 Thermal Error Compensation Experiment Verification To achieve dynamic sensing and compensation control of spindle thermal errors, a digital twin system based on the OPC UA protocol and a B/S architecture was developed. This system establishes a closed-loop process of "sensing-modeling-prediction-control": a temperature sensor array is arranged circumferentially around the spindle to continuously collect data from key heat source areas; the edge gateway performs time synchronization, filtering, and structured processing before transmitting the data to the server via the OPC UA protocol; the server uses a digital twin framework integrating multiple time-series models and a differentiable FERR module to predict the spindle axial elongation in real time. When the prediction error remains within a preset threshold, the system generates an axial compensation command based on the current spindle state and sends it to the CNC system controller to perform dynamic displacement correction; if the error exceeds the limit, the automatic model retraining mechanism is triggered to enhance the system's robustness and prediction accuracy. The entire process is scheduled by the server core module, and the temperature field and compensation status are visualized in real time through a web interface to achieve online correction of thermal errors.
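The threshold-based branching of the closed loop can be sketched as follows. This is an illustrative simplification: the function name, the use of a reference measurement to compute the prediction error, and the sign convention of the offset (opposite to the predicted elongation) are assumptions, and no OPC UA or CNC interface is modeled.

```python
def control_step(predicted_um, reference_um, threshold_um):
    """Decide between compensation and retraining for one control cycle.

    predicted_um: predicted spindle axial elongation (micrometres)
    reference_um: reference measurement used to judge the prediction (assumed)
    threshold_um: preset limit on the absolute prediction error
    """
    error = abs(predicted_um - reference_um)
    if error <= threshold_um:
        # Assumption: the axis offset cancels the predicted elongation.
        return {"action": "compensate", "offset_um": -predicted_um}
    # Prediction no longer trusted: request retraining, apply no offset.
    return {"action": "retrain", "offset_um": 0.0}
```

A scheduler on the server side would call such a function once per prediction cycle and forward the result either to the CNC controller or to the retraining routine.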

[0096] The experiment used pre-machined parts made of 304 stainless steel, with the spindle speed, feed rate, and depth of cut set to [values missing]. A combined strategy of reverse milling and contour milling was adopted. A high-rigidity cubic boron nitride (CBN) face milling cutter was used to ensure shallow cutting stability. Emulsion was sprayed directly into the cutting area through a nozzle to reduce local temperature rise and suppress heat accumulation. The workpiece shown in Figure 15 (a) has a typical annular structure that is sensitive to axial thermal elongation of the spindle. Figure 15 (b) shows the planar annular end face and the layout of 16 evenly distributed sampling points, which cover the inner and outer ring areas to capture thermally induced overall warpage and local fluctuations. All measurements were completed in a constant temperature environment by a coordinate measuring machine (CMM) to ensure the reliability of the analysis, providing a practical solution for high-precision machining under complex thermal conditions.

[0097] To verify the effectiveness of the digital twin thermal error compensation system and its differentiable FERR module, end-face milling experiments were conducted. Nine ring-shaped specimens with identical geometric dimensions were selected and divided into three groups under the same machining parameters: W1-W3 served as the uncompensated control group; W4-W6 were machined using a digital twin compensation system based on the GRU model; and W7-W9 additionally incorporated the differentiable FERR module to enhance physical consistency modeling. After machining, thickness data were collected at 16 equally spaced measurement points on each end face using a CMM to calculate the flatness error. The experimental results are shown in Figure 16.
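One simple way to turn the 16 thickness readings of a specimen into a single flatness figure is the peak-to-valley range, averaged over the specimens of a group. This definition is an assumption made for illustration; the source does not state how the flatness error was computed from the CMM data, and the function names are hypothetical.

```python
def flatness_peak_to_valley(thickness):
    """Peak-to-valley spread of the thickness readings (assumed flatness measure)."""
    return max(thickness) - min(thickness)


def group_mean_flatness(specimens):
    """Average flatness error over the specimens of one group (e.g., W1-W3)."""
    values = [flatness_peak_to_valley(points) for points in specimens]
    return sum(values) / len(values)
```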

[0098] The thickness distributions under the three compensation strategies are shown in Figure 16 (a)-(c). In the uncompensated group (Figure 16 (a)), thermal expansion of the spindle causes obvious surface fluctuations and end-face deformation; after the GRU-based digital twin system is enabled (Figure 16 (b)), the thickness variation is effectively reduced and the measurement point distribution becomes more concentrated; when the differentiable FERR module is further integrated (Figure 16 (c)), the consistency of the measurement point heights is significantly improved, and the machined surface is closer to the ideal design surface, exhibiting better geometric accuracy and thermal stability.

[0099] Figure 16(d) presents the statistical analysis of the flatness errors of the three groups. The average flatness error of the uncompensated group is [value missing], and the error fluctuates significantly. Introducing the GRU model reduced the error to 0.0107, a decrease of approximately [percentage missing], which initially verified the effectiveness of the compensation. After incorporating the differentiable FERR module, the error was further reduced to [value missing], an overall reduction of 82.4% compared to the uncompensated case. These results demonstrate that the proposed digital twin system has strong modeling and control capabilities, with the differentiable FERR module playing a key role in improving prediction accuracy and system robustness.
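The flatness statistics discussed above can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the text does not state how flatness error is derived from the 16 thickness readings, so the peak-to-valley spread used here is an assumption, and the function names and sample readings are hypothetical placeholders rather than measured data.

```python
from statistics import mean


def flatness_error(thicknesses):
    """Peak-to-valley spread of the measured points (assumed flatness metric)."""
    if not thicknesses:
        raise ValueError("at least one measurement point is required")
    return max(thicknesses) - min(thicknesses)


def group_mean_flatness(specimens):
    """Average flatness error over the specimens of one group (e.g. W1-W3)."""
    return mean(flatness_error(points) for points in specimens)


def relative_reduction(baseline, compensated):
    """Percentage reduction of the compensated error relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline - compensated) / baseline


# Placeholder readings for two specimens (NOT data from the experiment).
demo_group = [
    [10.000, 10.004, 9.998, 10.002],
    [10.001, 10.003, 9.999, 10.005],
]

if __name__ == "__main__":
    print("mean flatness error:", group_mean_flatness(demo_group))
    print("reduction (%):", relative_reduction(0.05, 0.01))
```

With a real data set, each inner list would hold the 16 CMM readings of one specimen, and `relative_reduction` would be applied to the group means of the uncompensated and compensated groups.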

[0100] 4. Conclusion
This embodiment proposes a new paradigm for machine tool thermal error modeling that integrates data-driven learning with physical consistency. A physical regularization module based on differentiable finite element residuals is constructed, and its structured residual constraints guide the model toward a physically consistent solution. A digital twin framework and a thermal error compensation strategy are also designed, and both are validated: the thermal error model shows better generalization and wider applicability across several mainstream time-series prediction models, and the digital twin framework is validated through actual machining and error compensation. The method combines theoretical innovation with practical engineering value, and the main conclusions are as follows:
(1) The differentiable FERR module proposed in this embodiment introduces a learnable stiffness matrix K and a learnable mass matrix M to model, respectively, the spatial coupling of the temperature field and the thermal response inertia, establishing a structured physical mapping between thermal error and temperature. The method does not require complex prior physical knowledge; it needs only [inputs missing], which allows the training process to autonomously establish a dimensionally consistent mapping mechanism. Embedded as a physical residual term in the backbone model, the differentiable FERR module significantly improves physical consistency and generalization ability during training. In cross-condition prediction experiments, the module maintains high accuracy and stability under unseen thermal conditions, reducing the average RMSE by [percentage missing] and demonstrating a strong ability to adapt to different scenarios.

[0101] (2) From the perspective of optimization dynamics and training behavior, the differentiable FERR module sets up a dynamic game between the physical loss and the data loss and applies a structured intervention to the gradient path. This mechanism effectively suppresses overfitting and improves generalization. Further analysis of gradient propagation and parameter updates shows that the module significantly reduces parameter fluctuations during training and enhances the model's stability when handling complex thermal dynamics. Because it is plug-and-play and model-independent, the differentiable FERR module can be integrated into various mainstream time-series prediction networks without structural adjustments or prior physical modeling, demonstrating good versatility and engineering applicability.

[0102] (3) The thermal error compensation digital twin system developed in this embodiment is built on the OPC UA protocol and a B/S architecture, and provides data perception, prediction, and closed-loop control capabilities. The system implements a hierarchical collaborative mechanism of "perception-modeling-prediction-control" and supports cross-platform compatibility and industrial deployment. Milling experiments on a typical ring-shaped workpiece confirmed that the system reduces the flatness error from [value missing] to [value missing], an improvement exceeding 82%, effectively suppressing thermally induced planar deformation. These results validate that the proposed compensation strategy has high accuracy and practical application potential under complex thermal conditions.
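The physical regularization summarized in conclusions (1) and (2) can be illustrated with a small forward-pass sketch. Several points are assumptions rather than the formulation of this embodiment: the residual uses a backward difference, r_t = M (T_t - T_{t-1}) / Δt + K T_t - f_t; the loss averages the squared residual norm over the available steps; the total loss is the data loss plus λ_phy times the physical loss; and all function names are hypothetical. In actual training, K and the Cholesky factor L would be learnable parameters updated by an automatic differentiation framework, which this plain-Python sketch does not attempt.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def spd_from_cholesky(L, eps=1e-6):
    """Build M = L L^T + eps * I, using only the lower triangle of L."""
    n = len(L)
    low = [[L[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    return [
        [sum(low[i][k] * low[j][k] for k in range(n)) + (eps if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def physical_residuals(T, f, K, M, dt):
    """Assumed residual r_t = M (T_t - T_{t-1}) / dt + K T_t - f_t for t >= 1."""
    residuals = []
    for t in range(1, len(T)):
        dT = [(a - b) / dt for a, b in zip(T[t], T[t - 1])]
        inertia = matvec(M, dT)        # thermal response inertia term
        coupling = matvec(K, T[t])     # spatial thermal coupling term
        residuals.append([i + c - s for i, c, s in zip(inertia, coupling, f[t])])
    return residuals


def physics_loss(residuals):
    """Mean over time steps of the squared residual norm."""
    return sum(sum(r * r for r in res) for res in residuals) / len(residuals)


def data_loss(predicted, measured):
    """Mean squared error between predicted and measured thermal error."""
    return sum((p - y) ** 2 for p, y in zip(predicted, measured)) / len(predicted)


def total_loss(l_data, l_phy, lambda_phy):
    """Assumed weighted sum of data loss and physical residual loss."""
    return l_data + lambda_phy * l_phy


if __name__ == "__main__":
    K = [[2.0, -1.0], [-1.0, 2.0]]                    # illustrative values only
    M = spd_from_cholesky([[1.0, 0.0], [0.5, 1.0]], eps=0.0)
    T = [[20.0, 20.0], [21.0, 20.5]]                  # two nodes, two time steps
    f = [[0.0, 0.0], [21.75, 21.125]]                 # equivalent source terms
    l_phy = physics_loss(physical_residuals(T, f, K, M, dt=1.0))
    print(total_loss(data_loss([0.010], [0.012]), l_phy, lambda_phy=0.1))
```

The two-node example mirrors the use of two temperature nodes as FERR inputs; the numeric values are placeholders chosen so the residual is easy to check by hand.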

[0103] 5. Appendix
Appendix A
Figure 17 shows the layout of the temperature and displacement sensors within the electric spindle system. The system deploys nine temperature sensors (T1-T9), covering key heat source areas such as the spindle structure, the cooling channels, and the surrounding environment. T1 and T6 are located at the Y-axis positions of the rear and front bearings, respectively, and serve as input nodes for the differentiable FERR module to construct a simplified representation of the axial thermal field. T2 is mounted on the spindle base to monitor overall heat conduction in the housing; T3 and T4 are located at the coolant outlets of the front and rear bearings, respectively; and T5 records the ambient temperature. T7 is positioned on the Y-axis side of the motor, and T8 and T9 correspond to the coolant inlet and outlet of the motor, respectively. The axial thermal error is measured by displacement sensor S1, which is mounted on the spindle end face to record the real-time thermal elongation along the Z-axis. This sensor configuration covers the main heat sources and conduction paths and monitors local temperature rise, cooling efficiency, and overall thermal deformation trends, providing the data foundation for subsequent modeling and experimental verification. Figure 18 presents the evolution of the temperature field and the spindle thermal error under two typical operating conditions. Figures 18(a) and (b) show the nine-channel temperature curves and the corresponding thermal errors for operating condition 1, respectively; Figures 18(c) and (d) show the test results for operating condition 2. The data show significant differences in temperature distribution between operating conditions, and the spindle thermal error exhibits obvious nonlinear accumulation. These findings reveal the cross-condition complexity of thermal error modeling and highlight the need for the model to adapt to different thermal environments.
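The sensor layout described above can be captured in a small configuration structure, which makes it explicit which channels feed the differentiable FERR module. This is only an illustrative sketch: the dictionary keys follow the sensor labels in the text, while the names `SENSOR_LAYOUT`, `FERR_INPUT_NODES`, and `ferr_node_vector`, and the sample readings, are hypothetical.

```python
# Sensor labels and locations as described for Figure 17.
SENSOR_LAYOUT = {
    "T1": "rear bearing, Y-axis position",
    "T2": "spindle base",
    "T3": "front bearing coolant outlet",
    "T4": "rear bearing coolant outlet",
    "T5": "ambient temperature",
    "T6": "front bearing, Y-axis position",
    "T7": "motor, Y-axis side",
    "T8": "motor coolant inlet",
    "T9": "motor coolant outlet",
    "S1": "spindle end face, Z-axis thermal elongation (displacement)",
}

# Channels used as input nodes of the differentiable FERR module.
FERR_INPUT_NODES = ("T1", "T6")


def ferr_node_vector(sample):
    """Extract the node temperature vector for the FERR module from one sample."""
    missing = [name for name in FERR_INPUT_NODES if name not in sample]
    if missing:
        raise KeyError(f"missing FERR input channels: {missing}")
    return [sample[name] for name in FERR_INPUT_NODES]


if __name__ == "__main__":
    # Placeholder readings in degrees Celsius (not measured data).
    sample = {"T1": 25.0, "T2": 24.1, "T5": 22.0, "T6": 31.0}
    print(ferr_node_vector(sample))
```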

[0104] Appendix B To ensure the reproducibility of the model training process, the structural parameters and unified training configurations of each backbone model are detailed in Tables B-1 to B-5. The corresponding code and experimental procedures are provided in the appendix for model reconstruction and training validation. Table B-1 lists the basic training settings shared by all models, including input feature dimension, time step, sliding window size, optimizer configuration, normalization method, and computational environment. Tables B-2 to B-5 detail the architecture and key hyperparameters of CNN, GRU, TCN, and Informer models, covering elements such as the number of layers, number of channels, activation function, attention mechanism, kernel size, label length, and prediction stride. Together, these elements constitute the complete technical framework of the model design and implementation strategy adopted in this study.

[0105] Table 1 Basic Training Parameter Configuration
Table 2 CNN Model Structure and Parameter Settings
Table 3 GRU Model Structure and Parameter Settings
Table 4 TCN Model Structure and Parameter Settings
Table 5 Informer Model Structure and Parameter Settings
Appendix C
This appendix provides supplementary performance evaluation data for each backbone model with the differentiable FERR module enabled and disabled. The results correspond to Figure 4 and Figure 6 and are presented in Tables 6 and 7, respectively. Table 6 (Table C-1) corresponds to Figure 4 and summarizes the key performance metrics of the six models, including the mean absolute error (MAE), the root mean square error (RMSE), the coefficient of determination (R²), and the total training time (in seconds). The results show that after introducing the differentiable FERR module, the MAE and RMSE of all models were significantly reduced and the R² values improved significantly, confirming the module's ability to enhance modeling accuracy and generalization. Meanwhile, the training time remained within a reasonable range, demonstrating the module's practical feasibility in engineering applications.

[0106] Table 7 corresponds to Figure 6 and reports the model performance during the training phase, which is used to evaluate the impact of the differentiable FERR module on learning stability and fitting ability. Although introducing the regularization term slightly increased the training error of some models, the prediction accuracy on the test set improved significantly, further confirming the effectiveness of the module in enhancing the generalization and robustness of the model across working conditions.

[0107] Table 6 Comparison of Prediction Results
Table 7 Comparison of Fitting Performance
The above-described embodiments are merely preferred embodiments provided to fully illustrate the present invention, and the scope of protection of the present invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the present invention all fall within the scope of protection of the present invention. The scope of protection of the present invention is defined by the claims.

Claims

1. A method for constructing a machine tool thermal error prediction model based on differentiable physical regularization, characterized in that it includes the following steps:
Step 1: Acquire temperature sensor time-series data of the machine tool to be predicted at multiple consecutive time steps;
Step 2: Input the temperature time-series data into the trained thermal error prediction model to obtain the corresponding thermal error prediction value; wherein the thermal error prediction model includes a backbone neural network for time-series feature extraction and mapping, and a differentiable finite element residual regularization module coupled to the backbone neural network;
Step 3: During training of the thermal error prediction model, the differentiable finite element residual regularization module receives the thermal error prediction value output by the backbone neural network and the corresponding input temperature sensor time-series data, constructs the physical residual based on the discretized transient heat conduction equation, and calculates the physical residual loss term, wherein the predicted thermal error value is treated as an equivalent heat source acting on the heat conduction system; this includes the following sub-steps:
31) The predicted thermal error e is regarded as an equivalent time-varying heat source acting on the heat conduction system;
32) A learnable stiffness matrix K and a learnable mass matrix M are introduced, where the stiffness matrix K characterizes the spatial thermal coupling between temperature field nodes, and the mass matrix M characterizes the thermal response inertia of each node;
33) Based on the discretized transient heat conduction equation, the physical residual at each time step t is calculated;
34) The physical residual loss term is calculated as the mean square error of the physical residuals over all time steps;
Step 4: Perform a weighted summation of the physical residual loss term and the data-driven loss term, which is calculated from the predicted and actual values, to construct the total loss function;
Step 5: Minimize the total loss function using the gradient backpropagation algorithm, simultaneously performing end-to-end joint optimization of the parameters of the backbone neural network and of the stiffness matrix K and mass matrix M in the differentiable finite element residual regularization module, so that the optimized model, while outputting the predicted thermal error, satisfies the physical consistency constraint defined by the stiffness matrix K and the mass matrix M between its output and the input temperature field.

2. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 1, characterized in that: the differentiable finite element residual regularization module is structurally decoupled from the backbone neural network and interacts with it only by receiving the predicted output e of the backbone neural network and the corresponding temperature input $T_t$; the differentiable finite element residual regularization module is designed as a plug-and-play component, which can be integrated into different types of time-series prediction backbone neural networks through a standardized interface without changing the internal structure of the backbone neural network.

3. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 1, characterized in that: The backbone neural network is a deep learning model for time series prediction, selected from any of the following: convolutional neural network, gated recurrent unit network, temporal convolutional network, and Informer model based on attention mechanism.

4. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 1, characterized in that: the physical residual calculated by the differentiable finite element residual regularization module is expressed as: [equation missing] where: $T_t$ represents the node temperature vector composed of temperature sensor data at time t; $f_t$ represents the thermal error vector predicted by the backbone neural network at time t; K is a learnable stiffness matrix used to characterize the spatial thermal coupling between nodes; M is a learnable mass matrix used to characterize the thermal response inertia of the nodes; and Δt is the sampling time interval.

5. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 4, characterized in that: the stiffness matrix K is constrained to be a symmetric positive definite matrix during training; the mass matrix M is parameterized through Cholesky decomposition, i.e. $M = LL^{\top} + \epsilon I$, where L is a trainable lower triangular matrix, $\epsilon$ is a small positive constant, and I is the identity matrix, thus ensuring the positive definiteness and numerical stability of the mass matrix M.

6. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 4, characterized in that: the physical residual loss term is defined as the mean square error of the physical residuals over all time steps: [equation missing] where N is the total number of time steps.

7. The method for constructing a machine tool thermal error prediction model based on differentiable physical regularization according to claim 4, characterized in that: in Step 4, the total loss function is expressed as: [equation missing] where: [symbol missing] is the data-driven loss term; and $\lambda_{phy}$ is an adjustable regularization weight used to balance data fit and physical consistency.

8. A machine tool thermal error compensation system based on differentiable physical regularization, characterized in that it includes:
a physical layer, which includes a temperature sensor array deployed in the critical heat source areas of the machine tool, a displacement sensor for measuring the axial thermal elongation of the spindle, a CNC system for executing compensation commands, and a data acquisition edge gateway for time synchronization, filtering, and preprocessing of sensor data; the data acquisition edge gateway integrates an OPC UA server module, which performs time synchronization, filtering, and structured encapsulation of multi-source sensor data and publishes the data to the upper layer via the OPC UA protocol;
a digital twin core layer, deployed on the server side, which includes an OPC UA client module, a data preprocessing module, a thermal error prediction module, and a compensation strategy module; the OPC UA client module is used to actively subscribe to, and periodically obtain, real-time temperature and thermal error data from the OPC UA server of the physical layer; the data preprocessing module is used to clean and normalize the acquired data and to handle outliers; the thermal error prediction module integrates a thermal error prediction model constructed by the method described in any one of claims 1-7, and is used to receive the preprocessed temperature data and output the predicted thermal error value in real time; the compensation strategy module is used to generate the corresponding CNC axis displacement compensation commands based on the predicted thermal error values;
a user interaction layer, which is a web-based client providing a 3D visualization human-machine interface for remotely displaying, in real time, the 3D digital twin model of the machine tool, the temperature field cloud map, the thermal error prediction curve, and the thermal error compensation status, and which also supports users in selecting models, configuring thresholds, and monitoring the system;
wherein the digital twin core layer sends the compensation command to the CNC system of the physical layer through a communication interface to perform real-time dynamic compensation of the machining process and form a closed control loop.

9. The machine tool thermal error compensation system based on differentiable physical regularization according to claim 8, characterized in that: The core layer of the digital twin also includes a model update module, which is used to monitor the output of the thermal error prediction module; when the predicted thermal error value continues to exceed the preset error threshold, the process of retraining the thermal error prediction model using historical data is automatically triggered.

10. The machine tool thermal error compensation system based on differentiable physical regularization according to claim 8, characterized in that: The thermal error prediction module in the core layer of the digital twin supports switching or fusing multiple different backbone neural networks and provides corresponding visual comparisons in the user interaction layer. The backbone neural networks include convolutional neural networks, gated recurrent unit networks, temporal convolutional networks, and Informer models based on attention mechanisms.
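The compensation and model-update behavior described in claims 8 and 9 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: "continues to exceed" is interpreted as a fixed number of consecutive samples above the threshold, the compensation command is taken to be the negated predicted error clamped to a safety limit, and the names `RetrainMonitor` and `z_axis_compensation` are hypothetical. Communication with the CNC system and the OPC UA server is outside the scope of the sketch.

```python
from collections import deque


class RetrainMonitor:
    """Signals retraining when the last `window` predictions all exceed the threshold."""

    def __init__(self, threshold, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, predicted_error):
        """Record one prediction and return True if retraining should be triggered."""
        self.recent.append(abs(predicted_error) > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


def z_axis_compensation(predicted_error, limit):
    """Assumed command: offset opposite to the predicted error, clamped to +/- limit."""
    offset = -predicted_error
    return max(-limit, min(limit, offset))


if __name__ == "__main__":
    monitor = RetrainMonitor(threshold=0.01, window=3)
    for error in [0.02, 0.02, 0.005, 0.02, 0.02, 0.02]:  # placeholder predictions
        command = z_axis_compensation(error, limit=0.015)
        print(command, monitor.update(error))
```

A window-based trigger avoids retraining on a single outlier; the window length and threshold would correspond to the values a user configures through the user interaction layer.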