A multi-modal perception based self-adaptive drying method for color box printing

By combining multimodal perception with neural fuzzy reasoning algorithms, the problem of insufficient real-time perception in traditional color box printing drying systems is solved, enabling dynamic optimization of ink status and environmental adaptive control, thereby improving printing quality and production efficiency.

CN121523040B · Active · Publication Date: 2026-06-19 · QINGDAO BAINA PACKAGING CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: QINGDAO BAINA PACKAGING CO LTD
Filing Date: 2025-11-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Traditional color box printing drying systems lack real-time sensing of complex variables such as ink condition, humidity changes, and ambient temperature, resulting in insufficient or excessive drying, which fails to meet the demand for stable and high-quality production.

Method used

By employing a multimodal sensing method, data is fused from infrared, visual, and humidity sensors, and combined with a neural fuzzy inference algorithm to dynamically adjust drying parameters, thereby achieving real-time response to ink status and stable control under environmental disturbances.

Benefits of technology

It achieves comprehensive perception and precise control of the color box printing process, improves printing quality and production efficiency, overcomes the limitations of traditional control methods, and has the ability to learn, adjust and compensate itself.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention provides an adaptive drying method for color box printing based on multimodal perception, belonging to the field of intelligent manufacturing and printing process control technology. It collects multimodal data during the printing process using multimodal sensors and preprocesses it into spatially consistent multimodal feature sequences. These sequences are then input into a lightweight cross-modal adaptive coding network. A shallow feature extraction network extracts and weights the spatial features of each modality, while a temporal dynamic modeling network captures the temporal evolution of the data. Finally, a modal consistency correction network achieves heterogeneous modal fusion and consistency calibration, outputting a comprehensive representation vector of the ink drying state. This vector is then input into an adaptive controller, where a Gaussian membership function completes feature fuzzification mapping. Based on fuzzy rules, neural fuzzy inference is used to generate optimal drying parameters. This invention achieves intelligent upgrades in the closed-loop control chain of perception, analysis, decision-making, and correction, significantly reducing energy consumption and scrap rates.

Description

Technical Field

[0001] This invention belongs to the field of intelligent manufacturing and printing process control technology, and particularly relates to an adaptive drying method for color box printing based on multimodal perception.

Background Technology

[0002] Color box printing is a crucial step in the packaging industry, and its printing quality directly impacts the product's appearance and market competitiveness. On modern high-speed printing production lines, the ink drying process is a key factor determining the quality of the finished product and production efficiency. Traditional color box printing drying systems mostly employ fixed temperature, fixed airflow, or simple time-delay control methods for drying regulation. These control strategies lack real-time sensing of complex variables such as ink state, humidity changes, and ambient temperature, leading to frequent instances of under-drying or over-drying, which in turn causes problems such as color differences, adhesion, wrinkles, and excessive energy consumption. Especially in multi-color overprinting, high-gloss, or special material color box printing, this traditional control method, lacking adaptive adjustment, cannot meet the demands for stable, high-quality production.

[0003] In existing technologies, some companies attempt to monitor the printing environment simply by installing infrared temperature or humidity sensors; other studies have proposed methods based on single-modal ink thickness or surface reflectivity detection to indirectly estimate the drying state. However, because the printing process is affected by multiple factors, such as ink layer thickness, paper adsorption properties, equipment thermal field distribution, ventilation structure, and ambient humidity, single-modal data cannot fully reflect the true drying state of the ink. Furthermore, existing drying control algorithms mostly rely on static rules or simple PID feedback control, lacking self-learning and multi-variable coordination capabilities, making it difficult to cope with dynamic changes and complex coupling characteristics in the production process.

[0004] To address the aforementioned issues, an intelligent control method is needed that can sense the ink drying status in real time and dynamically optimize drying parameters in complex production environments.

Summary of the Invention

[0005] This invention aims to solve the following three core problems: data isolation and information fragmentation, by using multimodal sensor fusion to achieve unified modeling of multidimensional data such as infrared, vision, and humidity, thereby improving the accuracy of perception of printing status; control response lag and parameter rigidity, by using a neural fuzzy inference algorithm to dynamically adjust drying parameters, replacing fixed rule-based control, and achieving real-time response to changes in ink status; and model mismatch and environmental disturbance, by using a statistical deviation compensation mechanism and online self-correction function to enable the system to maintain stable control performance under fluctuations in ambient temperature and humidity.

[0006] This invention proposes an adaptive drying method for color box printing based on multimodal sensing, comprising the following steps:

[0007] S1: A multimodal sensor array deployed at key locations on the printing production line synchronously collects printing surface temperature field image data, printing surface high-resolution texture image data, printing surface humidity data, and ink thickness data during the color box printing process;

[0008] S2: The data collected in S1 are preprocessed to form a spatially consistent multimodal feature sequence;

[0009] S3: The multimodal feature sequence is input into a lightweight cross-modal adaptive coding network. A shallow feature extraction network first performs spatial feature extraction and weighting to obtain weighted multi-scale features of the printing surface temperature field, weighted multi-scale features of the printing surface color brightness, weighted features of the printing surface humidity, and weighted features of the ink thickness. Then, a temporal dynamic modeling network captures the nonlinear relationship of the evolution of each modality's data over time to obtain the temporal features of the printing surface temperature field, the temporal features of the printing surface humidity, and the temporal features of the ink thickness. Finally, a modal consistency correction network performs heterogeneous modal fusion and outputs a comprehensive representation vector that reflects the ink drying state;

[0010] S4: The comprehensive representation vector is input into the adaptive controller based on neural fuzzy reasoning (NFR). Feature fuzzification mapping is completed through a Gaussian membership function, and neural fuzzy reasoning is performed based on fuzzy rules to generate the optimal drying parameters, which are then transmitted to the actuator.

[0011] Preferably, the preprocessing in S2 includes spatial registration, specifically:

[0012] Using the visible light industrial camera coordinate system as a reference, the mapping matrix between the temperature field image data of the printed surface and the high-resolution texture image data is solved through a calibration plate. A dot array calibration plate is placed on the surface of the printing press worktable. A visible light industrial camera and an infrared thermal imager simultaneously capture the same scene. The coordinates of feature points on the calibration plate are detected in both images. Based on the paired point set, the mapping matrix is obtained using the least squares method; this matrix is then used to map the pixel coordinates of the temperature field image data of the printed surface to the coordinate system of the visible light industrial camera;

[0013] When spatially registering printing surface humidity data with ink thickness data, a physical location index on the printing press is introduced to identify the two-dimensional physical location of each sensor sampling point in the printing platform coordinate system. A three-dimensional coordinate form is then introduced by adding the printing plane height to the planar coordinates. The resulting three-dimensional coordinates of the sampling points for printing surface humidity data and ink thickness data are mapped to the visible light industrial camera coordinate system through a coordinate transformation that uses the intrinsic parameter matrix, rotation matrix, and translation vector of the visible light industrial camera, thereby unifying the multimodal data into the same reference space.
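The coordinate transformation in [0013] can be illustrated with the standard pinhole camera model. The following is a minimal sketch only: the function name `project_to_pixel`, the matrices `K`, `R`, `t`, and all numeric values are hypothetical placeholders, not calibration results from the patent.

```python
def project_to_pixel(point_xyz, K, R, t):
    """Project a 3-D platform point into camera pixel coordinates (pinhole model)."""
    # Camera-frame coordinates: X_c = R * X_w + t
    cam = [sum(R[i][j] * point_xyz[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous pixel coordinates: K * X_c, then divide by the third component
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Hypothetical intrinsics for a 1920x1080 camera and an identity pose 2 m above the platform
K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

# A humidity sampling point at (0.10 m, -0.05 m) on the printing plane (height 0)
u, v = project_to_pixel((0.10, -0.05, 0.0), K, R, t)
```

In practice the intrinsic and extrinsic parameters would come from a camera calibration procedure; the sketch only shows how a sensor sampling point indexed in platform coordinates ends up in the camera's pixel grid.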

[0014] Preferably, the shallow feature extraction network in S3 includes two branches and a cross-modal channel attention layer. The first branch extracts features from the temperature field image data of the printed surface and the high-resolution texture image data of the printed surface; the second branch extracts features from the humidity data of the printed surface and the ink thickness data. Finally, the cross-modal channel attention layer calculates weights that adjust the contribution of the four extracted modal features to the degree of printing drying. The weights are multiplied channel by channel with the four modal features respectively to obtain the weighted multi-scale features of the printing surface temperature field, the weighted multi-scale features of the printing surface color brightness, the weighted printing surface humidity features, and the weighted ink thickness features.

[0015] Preferably, the first branch utilizes a multi-scale convolutional layer to simultaneously capture local texture and overall contour information of the image. The multi-scale convolutional layer includes three-scale shallow convolutional groups. Each convolutional group consists of depthwise separable convolution, batch normalization, LeakyReLU activation function, and feature concatenation layer. The depthwise separable convolutional kernel sizes of the three shallow convolutional groups are 3*3, 5*5, and 7*7, respectively, and the number of output channels per scale is set to 32. The printing surface temperature field image data and the printing surface high-resolution texture image data are respectively input into the multi-scale convolutional layer to obtain the printing surface temperature field multi-scale features and the printing surface color brightness multi-scale features.
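The "lightweight" property of the depthwise separable convolutions in [0015] can be checked with simple arithmetic. The sketch below counts weights only (biases and batch normalization parameters are ignored) and assumes, purely for illustration, an input with 3 channels; the patent does not state the input channel count.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in an ordinary k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

C_IN = 3    # assumed input channels (hypothetical)
C_OUT = 32  # output channels per scale, as stated in the text

for k in (3, 5, 7):
    std = standard_conv_params(k, C_IN, C_OUT)
    dws = depthwise_separable_params(k, C_IN, C_OUT)
    print(f"{k}x{k}: standard={std}, depthwise separable={dws}")
```

Under these assumptions the saving grows with kernel size, which is consistent with using separable convolutions for the larger 5×5 and 7×7 scales.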

[0016] Before convolving the data features, the second branch uses a linear projection layer to map the printing surface humidity data and ink thickness data to the same dimension. Then, it sequentially uses a path composed of depthwise separable convolution, batch normalization and LeakyReLU activation function to extract features from the printing surface humidity data and ink thickness data, respectively, to obtain the printing surface humidity features and ink thickness features.

[0017] Preferably, the temporal dynamic modeling network includes a multi-scale one-dimensional temporal convolutional layer and a temporal smoothing layer. The obtained weighted multi-scale features of the printing surface temperature field, weighted printing surface humidity features, and weighted ink thickness features are respectively input into the multi-scale one-dimensional temporal convolutional layer to obtain the multi-scale temporal features of the printing surface temperature field, the printing surface humidity, and the ink thickness. The multi-scale one-dimensional temporal convolutional layer consists of two one-dimensional temporal convolutions with kernel scales of 1 and 3, respectively, used to capture the short-term and long-term relationships of the three modal features over time, and it uses residual connections. Next, the multi-scale temporal features of the printing surface temperature field, the printing surface humidity, and the ink thickness are respectively input into the temporal smoothing layer. By performing exponential weighted averaging on the feature values at adjacent time points, noise in the features is suppressed, and finally the temporal features of the printing surface temperature field, the printing surface humidity, and the ink thickness are obtained.
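The two stages in [0017] can be sketched on a single scalar feature channel. This is an illustration under stated assumptions: the kernel weights are fixed placeholders rather than learned values, the two convolution outputs are combined by summation (the text does not say how they are merged), and the smoothing factor `alpha` is hypothetical.

```python
def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def multiscale_temporal(x, kernel_1, kernel_3):
    """Kernel-size-1 and kernel-size-3 convolutions plus a residual connection."""
    short = conv1d_same(x, kernel_1)
    longer = conv1d_same(x, kernel_3)
    # Residual connection: add the unmodified input back to the convolved features
    return [xi + s + l for xi, s, l in zip(x, short, longer)]

def ema_smooth(x, alpha=0.5):
    """Exponentially weighted average over adjacent time points."""
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1.0 - alpha) * out[-1])
    return out

features = multiscale_temporal([1.0, 2.0, 3.0], [1.0], [1.0, 1.0, 1.0])
smoothed = ema_smooth(features, alpha=0.5)
```

A smaller `alpha` suppresses more noise at the cost of a slower response to genuine changes in the drying state.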

[0018] Preferably, the modal consistency correction network specifically comprises:

[0019] First, the weighted multi-scale features of color brightness of the printed surface are input into a one-dimensional convolutional layer for linear projection to obtain visual modal feature vectors; second, the time features of temperature field of printed surface, time features of humidity of printed surface and time features of ink thickness are input into a one-dimensional convolutional layer for linear projection to obtain a set of time modal feature vectors, which includes the projected time features of temperature field of printed surface, time features of humidity of printed surface and time features of ink thickness.

[0020] Next, cosine similarity is used to calculate the temporal consistency score of each modality to obtain modal coherence. The cosine similarity between the visual modal feature vector and each feature vector in the temporal modal feature vector set is calculated; its range is [-1, 1], and the closer it is to 1, the more consistent the directions of the visual feature vector and the temporal feature vector. This yields the cosine similarities between the visual feature and the printing surface temperature field temporal feature, between the visual feature and the printing surface humidity temporal feature, and between the visual feature and the ink thickness temporal feature. The three are then concatenated to form the modal consistency feature.
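The consistency score in [0020] is an ordinary cosine similarity. A minimal sketch follows; the helper names and the toy two-dimensional vectors are hypothetical and assume that all projected feature vectors share one dimension.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def consistency_feature(visual, temporal_set):
    """Concatenate the similarity of the visual vector to each temporal vector."""
    return [cosine_similarity(visual, vec) for vec in temporal_set]

visual = [1.0, 0.0]
temporal_set = [[1.0, 0.0],   # temperature field: same direction
                [0.0, 1.0],   # humidity: orthogonal
                [-1.0, 0.0]]  # ink thickness: opposite direction
scores = consistency_feature(visual, temporal_set)
```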

[0021] Next, a weighted fusion is performed on the visual modal feature vector, the feature vectors in the temporal modal feature vector set, and the modal consistency feature. First, a multi-head attention weighting mechanism is used to calculate the attention score for each feature. Second, the attention scores are normalized using the Softmax function. Finally, the normalized attention scores are used to perform a linear weighted fusion of the feature vectors to obtain the comprehensive representation vector.
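The last two steps of [0021], Softmax normalization and linear weighted fusion, can be sketched as follows. The attention scores are taken as given inputs here; the multi-head attention that would produce them is not reproduced, and the assumption that every feature vector has the same dimension is ours.

```python
import math

def softmax(scores):
    """Numerically stable Softmax; the outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(vectors, scores):
    """Linear weighted fusion of equally sized vectors using Softmax weights."""
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(weights[k] * vectors[k][d] for k in range(len(vectors)))
            for d in range(dim)]

# Two toy feature vectors with equal attention scores fuse to their mean
fused = fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```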

[0022] Preferably, in step S4, feature fuzzification mapping is performed using a Gaussian membership function. The specific process is as follows:

[0023] The comprehensive representation vector is input into the fuzzy inference module, where a Gaussian membership function is used to perform fuzzy mapping on the input features:

[0024] $$\mu_{ij}(x_j) = \exp\left(-\frac{(x_j - c_{ij})^2}{2\sigma_{ij}^2}\right);$$

[0025] where $\mu_{ij}$ is the membership degree of the j-th feature with respect to the i-th fuzzy subset; $c_{ij}$ and $\sigma_{ij}$ are the center and width parameters of the membership function, respectively; and $x_j$ is the j-th feature value in the comprehensive representation vector. Through membership degree calculation, the multimodal feature space is divided into multiple semantic regions to describe the brightness state, humidity level, ink thickness, and temperature change trends, including:

[0026] Brightness status zones: Low brightness, Medium brightness, High brightness;

[0027] Humidity status zones: Low humidity, Medium humidity, High humidity;

[0028] Ink thickness range: thin layer, medium thickness, thick layer;

[0029] Temperature change trend zones: rapid decrease, slow decrease, stable.
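The fuzzification step can be sketched for one of the four semantic regions. The sketch assumes the common Gaussian form exp(-(x - c)^2 / (2σ^2)) and a humidity feature normalized to [0, 1]; the centers and widths in `HUMIDITY_SETS` are hypothetical placeholders, since in the described method these parameters are learned.

```python
import math

def gaussian_membership(x, center, width):
    """Gaussian membership degree of x for a fuzzy subset with the given center and width."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

# Hypothetical fuzzy subsets for a humidity feature normalized to [0, 1]: (center, width)
HUMIDITY_SETS = {"low": (0.2, 0.15), "medium": (0.5, 0.15), "high": (0.8, 0.15)}

def fuzzify(x, fuzzy_sets):
    """Return the membership degree of x in every fuzzy subset."""
    return {name: gaussian_membership(x, c, w) for name, (c, w) in fuzzy_sets.items()}

degrees = fuzzify(0.5, HUMIDITY_SETS)
dominant = max(degrees, key=degrees.get)
```

Because neighboring subsets overlap, a value between two centers belongs partly to both, which is what lets the controller interpolate smoothly between rules.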

[0030] Preferably, the specific process of generating optimal drying parameters by performing neural fuzzy inference based on fuzzy rules is as follows:

[0031] The membership degree of each input feature serves as the input activation intensity for the neural fuzzy inference rules. R fuzzy rules are first constructed, each rule representing a drying state response pattern, in the following form:

[0032] If the first component of the input feature vector belongs to the fuzzy subset that the rule assigns to it, the second component belongs to its corresponding fuzzy subset, and so on for each component up to the feature dimension, then the output of the rule is given by a linear model consisting of a constant term plus a linear combination of the input components, each multiplied by its weight. Here, each fuzzy subset is the one corresponding to the j-th input of the r-th rule; the constant term and the weights are linear parameters learned by the neural network; and the result is the linear output of the r-th rule. The fuzzy rules are constructed based on the Takagi-Sugeno model and cover all combinations of the four semantic regions;

[0033] The activation strength of each rule is calculated and normalized. The activation strength of a rule is the product of its input membership degrees. The activation strengths of all rules are summed and used as a normalization factor, and the activation strength of each rule is divided by this sum to obtain its normalized activation value, so that the normalized activation weights of all rules sum to 1;

[0034] The sub-output values of each rule are weighted and merged according to their normalized weights to obtain the comprehensive control output of the drying system, whose components represent the optimal control parameters for drying temperature, air velocity, and drying time, respectively.
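The inference procedure in [0031] to [0034] follows the usual first-order Takagi-Sugeno pattern and can be sketched as below. Several things are assumptions for illustration: the four inputs are treated as scalar features normalized to [0, 1], each has three Gaussian subsets (which gives 3^4 = 81 rules if every combination is covered), and the consequent coefficients are constant placeholders rather than learned parameters.

```python
import math
from itertools import product

def gaussian(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def ts_inference(x, fuzzy_sets, consequents):
    """First-order Takagi-Sugeno inference with product activation.

    fuzzy_sets[j] lists (center, width) pairs for input j; consequents maps a
    rule (one subset index per input) to rows [a0, a1, ..., an], one row per output.
    """
    # 1. Activation strength of each rule: product of the input membership degrees
    strengths = {}
    for rule in product(*(range(len(s)) for s in fuzzy_sets)):
        w = 1.0
        for j, i in enumerate(rule):
            center, width = fuzzy_sets[j][i]
            w *= gaussian(x[j], center, width)
        strengths[rule] = w
    # 2. Normalize so that the activation weights sum to 1
    total = sum(strengths.values())
    if total == 0.0:
        raise ValueError("no rule is activated for this input")
    weights = {rule: w / total for rule, w in strengths.items()}
    # 3. Merge the linear rule outputs according to the normalized weights
    n_outputs = len(next(iter(consequents.values())))
    y = [0.0] * n_outputs
    for rule, w_bar in weights.items():
        for k, coeffs in enumerate(consequents[rule]):
            y_rk = coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))
            y[k] += w_bar * y_rk
    return y, weights

# Hypothetical setup: brightness, humidity, ink thickness, temperature trend
fuzzy_sets = [[(0.2, 0.2), (0.5, 0.2), (0.8, 0.2)] for _ in range(4)]
rules = list(product(range(3), repeat=4))
# Placeholder consequents: constant outputs for temperature, air velocity, drying time
consequents = {r: [[80.0, 0, 0, 0, 0], [2.5, 0, 0, 0, 0], [12.0, 0, 0, 0, 0]]
               for r in rules}
y, weights = ts_inference([0.3, 0.6, 0.4, 0.7], fuzzy_sets, consequents)
```

With identical constant consequents the output simply reproduces those constants, which is a convenient sanity check that the normalization step is correct before any parameters are trained.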

[0035] Compared with the prior art, the present invention has the following innovative features and beneficial effects:

[0036] (1) Multimodal fusion mechanism. This invention is the first to achieve multimodal fusion of infrared thermal imaging, visual texture, humidity and ink thickness signals in the field of color box printing drying, forming a spatiotemporally consistent feature sequence, breaking through the limitations of traditional single-modal detection in information coverage and state representation;

[0037] (2) Lightweight cross-modal adaptive coding network. A feature self-balancing mechanism with modality consistency correction function is proposed, which can realize dynamic weight adjustment and feature space alignment under the condition of heterogeneous multimodal information, ensuring the real-time performance and robustness of feature extraction and fusion;

[0038] (3) Neural fuzzy reasoning control model. Combining the self-learning ability of deep learning with the interpretability of fuzzy logic, nonlinear adaptive optimization control of printing drying parameters is achieved, overcoming the limitations of traditional PID or fixed rule control;

[0039] (4) Online self-calibration and environmental compensation mechanism. Lightweight online correction of model parameters is achieved through sliding statistical bias analysis, which has the ability to learn, recover and adapt to the environment, and significantly improves the sustainable stability of the system in actual industrial scenarios.

[0040] This invention, through the collaborative design of multimodal perception, intelligent fusion, and adaptive control, transforms the color box printing drying process from a traditional control mode relying on manual experience and static parameter settings into an intelligent, dynamic closed-loop adjustment system with self-learning, self-adjustment, and self-compensation capabilities. This system can achieve comprehensive perception and precise control of the ink drying state in complex and ever-changing production environments, significantly improving printing quality and production efficiency. Through the synergistic effect of a multi-source sensor array, a cross-modal feature fusion network, and a neural fuzzy adaptive control algorithm, this invention constructs a complete intelligent control link covering data acquisition, state representation, strategy decision-making, and self-correction feedback, overcoming the technical bottlenecks of information isolation, response lag, and coarse control in traditional printing drying processes.

Attached Figure Description

[0041] Figure 1 is a flowchart illustrating the overall technical route of the present invention.

[0042] Figure 2 is a diagram of the cross-modal adaptive coding network structure.

[0043] Figure 3 is a flowchart for implementing the neural fuzzy inference controller.

[0044] Figure 4 compares the multimodal feature fusion effects in the embodiment.

[0045] Figure 5 shows the output response surface of the neural fuzzy controller in the embodiment.

[0046] Figure 6 is a diagram illustrating the self-correction effect of the drying system error in the embodiment.

Detailed Implementation

[0047] This invention proposes an adaptive drying method for color box printing based on multimodal sensing, realizing full-time monitoring and intelligent control of the color box printing drying process. The overall process is shown in Figure 1. In specific implementation, this invention first deploys a multimodal sensor array at key locations on the printing production line. A time synchronization module performs unified timestamp correction and registration on the various signals to obtain a multimodal feature sequence with spatiotemporal consistency, providing a reliable data foundation for subsequent feature fusion and decision control. In the data processing stage, this invention designs a lightweight cross-modal adaptive coding network (CAEN) to achieve efficient fusion and adaptive representation of multimodal features. The CAEN network ultimately outputs a comprehensive state feature vector, which reflects the ink drying degree, temperature gradient change, and moisture evaporation rate under the current printing state, providing a quantitative state basis for the adaptive control process. In the control strategy generation stage, this invention introduces an adaptive control method based on neural fuzzy reasoning (NFR). This controller learns the mapping relationship between the CAEN output features and drying parameters through the nonlinear fitting capability of neural networks, while simultaneously integrating fuzzy logic rules to achieve interpretability and robustness of control decisions. This strategy can not only adjust adaptively according to real-time changes in ink drying rate but also transfer automatically across different printing materials, ink types, and environmental conditions, achieving state-driven intelligent closed-loop regulation.
To ensure the long-term stable operation of the system, this invention also proposes an online self-correction and environmental compensation mechanism based on statistical deviation compensation. After each production cycle, the system compares the deviation between the predicted drying effect and the actual drying result, and establishes an adaptive correction coefficient through moving average and standard deviation analysis to dynamically correct key weight parameters in the control model. Furthermore, when significant fluctuations in external ambient temperature or humidity are detected, the system automatically triggers an environmental compensation mode, making small-scale dynamic adjustments to the target drying temperature and wind speed parameters, enabling the system to maintain stable control performance without retraining the model. Throughout the system's workflow, each module forms a closed-loop feedback loop between data and control. A multimodal sensor array continuously collects new state data, which, after time synchronization and CAEN feature extraction, forms a comprehensive state vector. The NFR controller calculates the optimal drying control parameters based on this state and outputs them to the actuator. The drying results, after detection and statistical analysis, are fed back to the correction module to update model parameters and guide the next control cycle. This closed-loop mechanism ensures that the system maintains optimal drying conditions at different stages of the printing process, achieving fully automatic adaptive control from data acquisition to decision execution.
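The online self-correction idea, comparing predicted and actual drying results over recent production cycles and deriving a correction coefficient from their moving average and standard deviation, can be sketched as follows. The text does not give the actual formula, so the rule used here (correct only when the mean deviation exceeds its standard deviation) and the names `DeviationCompensator`, `window`, and `gain` are hypothetical.

```python
from collections import deque
import statistics

class DeviationCompensator:
    """Sliding-window bias estimate that yields a multiplicative correction coefficient."""

    def __init__(self, window=20, gain=0.1):
        self.errors = deque(maxlen=window)  # most recent prediction deviations
        self.gain = gain                    # how strongly the bias changes the coefficient

    def update(self, predicted, actual):
        """Record one production cycle and return the current correction coefficient."""
        self.errors.append(actual - predicted)
        mean = statistics.mean(self.errors)
        std = statistics.pstdev(self.errors)
        # Assumed rule: act only on a consistent bias, not on zero-mean noise
        if abs(mean) > std:
            return 1.0 + self.gain * mean
        return 1.0

compensator = DeviationCompensator(window=20, gain=0.1)
coefficient = compensator.update(predicted=1.0, actual=1.5)
```

The coefficient could then scale a key weight in the control model; which weights are corrected, and by how much, is left open by the description.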

[0048] The specific implementation process of the present invention will be described in detail below with reference to specific embodiments.

[0049] I. Multimodal Data Acquisition and Synchronous Calibration

[0050] In the color box printing and drying process, to achieve accurate monitoring of multiple factors such as temperature, humidity, ink thickness, and surface condition, multimodal sensing data acquisition was implemented in this stage. This system consists of an infrared thermal imager, a visible light industrial camera, a humidity sensor, and an optical ink thickness measurement module. Time synchronization and spatial calibration mechanisms ensure spatiotemporal consistency of the data. All sensor data undergo sampling delay correction via the time synchronization module and spatial registration via coordinate mapping, thus forming a structurally consistent and time-aligned multimodal feature sequence. This provides a high-precision input foundation for subsequent cross-modal feature fusion and adaptive drying control. Specifically, the following steps are included:

[0051] 1. Multimodal data acquisition

[0052] To achieve high-precision dynamic monitoring of the thermal field, humidity field, and ink layer thickness during the printing process on a color box printing production line, this invention designs a multimodal sensor array to collect state data under different physical dimensions. The array mainly consists of an infrared thermal imager, a visible light industrial camera, a humidity sensor, and an optical ink thickness measurement module.

[0053] (1) Acquisition of temperature field image data of the printed surface: an infrared thermal imager with an uncooled infrared detector is used, with a resolution of 640×480 pixels, a wavelength range of 8~14 μm, a frame rate of 30 fps, and a thermal sensitivity of 0.05℃. It obtains temperature field image data of the printed surface through the principle of infrared radiation thermometry; these data give the temperature at each position of the printed surface at each sampling time;

[0054] (2) Acquisition of high-resolution texture image data of the printed surface: a visible light industrial camera with a resolution of 1920×1080 pixels and a frame rate of 60 fps is used to acquire high-resolution texture images of the printed surface; these data give the color brightness value at each position of the printed surface at each sampling time. The high-resolution texture image data of the printed surface are used to quantify the degree of surface dryness;

[0055] (3) Acquisition of humidity data of the printed surface: a DHT-22 humidity sensor with a measurement range of 0-100%RH, an accuracy of ±2%RH, and a sampling frequency of 10 Hz is used to obtain humidity data of the printed surface;

[0056] (4) Ink thickness data acquisition: ink thickness is measured using an optical interferometric thickness sensor with a measurement range of 0-50 μm and a resolution of 0.01 μm;

[0057] The sampling frequencies of the uncooled infrared detector, visible light industrial camera, humidity sensor, and optical interferometric thickness sensor are controlled by a unified sampling controller. The four modalities of collected data are timestamped and recorded to form a multimodal dataset of color box printing.
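Because the sensors run at different rates (30 fps, 60 fps, and 10 Hz in the text), timestamped samples have to be paired before they can form one dataset. One simple option, shown here only as a sketch, is nearest-timestamp matching against a reference stream; the function names, the tolerance, and the integer millisecond timestamps are hypothetical.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the entry in a sorted timestamp list that is closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align(reference_ts, streams, tolerance):
    """Pair each reference timestamp with the nearest sample of every stream.

    streams maps a name to (sorted timestamps, values). Reference instants for
    which any stream has no sample within the tolerance are dropped.
    """
    aligned = []
    for t in reference_ts:
        row = {"t": t}
        for name, (ts, values) in streams.items():
            j = nearest_index(ts, t)
            if abs(ts[j] - t) > tolerance:
                break
            row[name] = values[j]
        else:
            aligned.append(row)
    return aligned

# Timestamps in milliseconds; the slowest sensor (humidity) serves as the reference
rows = align([0, 100],
             {"thermal": ([0, 33, 67, 100], ["a", "b", "c", "d"])},
             tolerance=5)
```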

[0058] 2. Spatial registration

[0059] Because the spatial coordinates of the modalities in the obtained multimodal dataset of color box printing are inconsistent, spatial registration must be performed to align the modal data to a unified spatial reference coordinate system.

[0060] Using the visible light industrial camera coordinate system as a reference, the mapping matrix between the temperature field image data of the printed surface and the high-resolution texture image data is solved through a calibration plate. A dot array calibration plate is placed on the surface of the printing press worktable. A visible light industrial camera and an infrared thermal imager simultaneously capture the same scene. The coordinates of feature points on the calibration plate are detected in both images. Based on the paired point set, the mapping matrix $H$ is obtained using the least squares method; $H$ is then used to map the pixel coordinates of the printed surface temperature field image data to the visible light industrial camera coordinate system, as follows:

[0061] $$\begin{bmatrix} u_v \\ v_v \\ 1 \end{bmatrix} = H \begin{bmatrix} u_t \\ v_t \\ 1 \end{bmatrix};$$

[0062] where $(u_v, v_v)$ are the pixel coordinates of the high-resolution texture image data of the printed surface, $(u_t, v_t)$ are the pixel coordinates of the temperature field image data of the printed surface, and $H$ is a homogeneous transformation matrix. To achieve spatial registration between printing surface humidity data and ink thickness data, a physical location index on the printing press is introduced to identify the two-dimensional physical location of each sensor sampling point in the printing platform coordinate system. Considering the imaging geometric mapping relationship, a three-dimensional coordinate form is introduced into the model by adding the printing plane height to the planar coordinates. The resulting three-dimensional coordinates of the sampling points for printing surface humidity data and ink thickness data are mapped to the visible light industrial camera coordinate system through a coordinate transformation that uses the intrinsic parameter matrix, rotation matrix, and translation vector of the visible light industrial camera, thereby unifying the multimodal data into the same reference space.
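The least-squares estimation of the thermal-to-visible mapping matrix from paired calibration points can be sketched with NumPy. This is one common formulation (a 3×3 homography with its last entry fixed to 1, solved with `numpy.linalg.lstsq`), not necessarily the one used in the embodiment; the point coordinates are synthetic and the function names are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Least-squares 3x3 mapping from src to dst pixel coordinates, with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair contributes two linear equations in the eight unknowns
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h, *_ = np.linalg.lstsq(np.asarray(A, dtype=float),
                            np.asarray(b, dtype=float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def map_point(H, x, y):
    """Apply the mapping to one pixel and return inhomogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Synthetic calibration dots: thermal pixels and their visible-image counterparts,
# generated by a pure scaling from a 640x480 grid to a 1920x1080 grid
thermal_pts = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 50)]
visible_pts = [(3.0 * x, 2.25 * y) for x, y in thermal_pts]

H = estimate_homography(thermal_pts, visible_pts)
u, v = map_point(H, 10, 10)
```

At least four point pairs with no three collinear are needed; using more calibration dots than the minimum lets the least-squares fit average out detection noise.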

[0063] Finally, the spatially registered multimodal dataset is combined with the spatially mapped pixel coordinates to generate a spatially consistent multimodal feature sequence:

[0064] $$X(t) = \{\tilde{T}(t),\ I(t),\ \tilde{R}(t),\ \tilde{D}(t)\};$$

[0065] where $\tilde{T}$ is the spatially registered printed surface temperature field image data, $I$ is the high-resolution texture image data of the printed surface, which already lies in the reference coordinate system, $\tilde{R}$ is the spatially registered printing surface humidity data, and $\tilde{D}$ is the spatially registered ink thickness data. The multimodal feature sequence $X$ is thus spatially aligned, providing standardized input data for subsequent cross-modal feature fusion and adaptive representation models.

[0066] II. Design of a Lightweight Cross-Modal Adaptive Coding Network (CAEN)

[0067] After completing the acquisition and spatial synchronization of multimodal data, this invention constructs a lightweight cross-modal adaptive coding network (CAEN) to extract high-dimensional features that reflect the drying state of color box printing from complex and heterogeneous sensor information. Taking the multimodal feature sequence as input, CAEN achieves spatial feature extraction, temporal dynamic analysis, and intermodal consistency correction through joint modeling of multi-source information including printing surface temperature field image data, printing surface high-resolution texture image data, printing surface humidity data, and ink thickness data. CAEN comprises a shallow feature extraction network, a temporal dynamic modeling network, and a modal consistency correction network, enabling high-precision real-time characterization of the printing state while maintaining low computational complexity.

[0068] 1. Shallow Feature Extraction Network

[0069] This network is primarily responsible for the preliminary encoding of the input multimodal feature sequence, extracting spatial texture, color, humidity, and thickness features at each location of the printed surface. The shallow feature extraction network contains two branches and a cross-modal channel attention layer. The first branch extracts features from the temperature field image data and the high-resolution texture image data of the printed surface. The second branch extracts features from the humidity data and the ink thickness data of the printed surface.

[0070] The first branch: Considering the difference in spatial resolution between the temperature field image data and the high-resolution texture image data of the printed surface, a multi-scale convolutional layer is used to capture local texture and overall contour information of the image at the same time. This multi-scale convolutional layer includes three shallow convolutional groups, each consisting of a depthwise separable convolution, batch normalization, a LeakyReLU activation function, and a feature concatenation layer. The depthwise separable convolution kernel sizes of the three groups are 3×3, 5×5, and 7×7, respectively, with 32 output channels per scale, which reduces computational overhead while maintaining feature richness. The temperature field image data and the high-resolution texture image data are each input into the multi-scale convolutional layer to obtain the multi-scale features of the printed surface temperature field and the multi-scale features of the printed surface color brightness.

[0071] The second branch: To enhance the sensitivity of the features to ink thickness and surface dryness, the second branch first uses a linear projection layer to map the printing surface humidity data and the ink thickness data to the same dimension. It then applies a pathway consisting of depthwise separable convolution, batch normalization, and the LeakyReLU activation function to each of the two inputs, obtaining the humidity features of the printed surface and the ink thickness features.

[0072] Cross-modal channel attention layer: This invention adjusts the contribution of multi-scale features of printing surface temperature field, multi-scale features of printing surface color brightness, printing surface humidity, and ink thickness to the degree of printing drying by adjusting the output weights of the cross-modal channel attention layer. It highlights the key information that reflects the printing drying state among the four modal features. The specific weight calculation formula is as follows:

[0073] ;

[0074] where GAP denotes global average pooling, and the remaining symbols denote, respectively, the channel attention weights, channel-by-channel multiplication, the two attention layer parameters, and the four modal features: the multi-scale features of the printing surface temperature field, the multi-scale features of the printing surface color brightness, the printing surface humidity features, and the ink thickness features. This design enables the network to automatically amplify, in the shallow layers, the channel responses that are sensitive to drying conditions. The channel attention weights are multiplied channel by channel with each of the four modal features to obtain the weighted multi-scale features of the printing surface temperature field, the weighted multi-scale features of the printing surface color brightness, the weighted printing surface humidity features, and the weighted ink thickness features.
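The channel attention step can be illustrated with a minimal sketch. The text names only global average pooling, two layer parameters, and channel-by-channel multiplication, so this sketch assumes one dense layer (weight matrix `W`, bias `b`) followed by a sigmoid; that layer layout, the function name, and the toy values are assumptions rather than the patented formula.

```python
import math


def channel_attention(features, W, b):
    """features: list of C channels, each a flat list of spatial values."""
    # Global average pooling: one scalar descriptor per channel.
    gap = [sum(ch) / len(ch) for ch in features]
    # Assumed dense layer plus sigmoid, giving one weight in (0, 1) per channel.
    logits = [sum(W[i][j] * gap[j] for j in range(len(gap))) + b[i]
              for i in range(len(W))]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Channel-by-channel multiplication of weights and features.
    weighted = [[w * x for x in ch] for w, ch in zip(weights, features)]
    return weights, weighted


# Toy input: channel 0 responds strongly, channel 1 is silent.
features = [[1.0, 3.0], [0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical identity weights
b = [0.0, 0.0]

weights, weighted = channel_attention(features, W, b)
print(weights)
```

With these toy parameters the active channel receives the larger weight, which is the amplification of drying-sensitive channels that the paragraph describes.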

[0075] 2. Temporal Dynamic Modeling Network

[0076] The temporal dynamic modeling network models how ink temperature, humidity, and thickness on the printed surface change over time, thereby capturing the nonlinear relationship among them. The network comprises a multi-scale one-dimensional temporal convolutional layer and a temporal smoothing layer. The weighted multi-scale features of the printed surface temperature field, the weighted printed surface humidity features, and the weighted ink thickness features obtained from the shallow feature extraction network are input into the multi-scale one-dimensional temporal convolutional layer to obtain the multi-scale temporal features of the printed surface temperature field, the printed surface humidity, and the ink thickness. The multi-scale one-dimensional temporal convolutional layer consists of two one-dimensional temporal convolutions with kernel scales of 1 and 3, used to capture the short-term and long-term relationships of the three modal features over time. To avoid gradient explosion caused by deep feature extraction, residual connections are used in the multi-scale one-dimensional temporal convolutional layer. Next, the multi-scale temporal features of the printing surface temperature field, printing surface humidity, and ink thickness are input into the temporal smoothing layer, which suppresses noise by taking an exponentially weighted average of the feature values at adjacent time points, finally yielding the temporal features of the printing surface temperature field, printing surface humidity, and ink thickness. The calculation formula for the temporal smoothing layer is shown below:

[0077] ;

[0078] where the symbols denote, respectively, the input features, the coefficient that controls the temporal smoothing intensity, the features after temporal smoothing, and the input feature length.
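Exponentially weighted smoothing of a feature sequence can be sketched as follows. The sketch assumes the common recursive form, in which each smoothed value mixes the current input with the previous smoothed value; the exact form used in the patent is not recoverable from the text, and the names `exp_smooth` and `alpha` are illustrative.

```python
def exp_smooth(seq, alpha):
    """Recursive exponential smoothing (assumed form): s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    out = [seq[0]]  # the first smoothed value is the first input
    for x in seq[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


# A noisy feature trace of length 4; alpha = 0.5 weights new and old equally.
smoothed = exp_smooth([1.0, 3.0, 2.0, 6.0], 0.5)
print(smoothed)  # [1.0, 2.0, 2.0, 4.0]
```

A smaller `alpha` smooths more strongly but makes the output lag further behind real changes in the drying state.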

[0079] 3. Modal Consistency Correction Network

[0080] The goal of a modal consistency correction network is to perform consistency correction and cross-modal fusion on heterogeneous modalities in the time dimension, and output a global representation. The two main implementation mechanisms of this layer are: (1) modal projection and cross-modal consistency scoring; and (2) attention-weighted fusion based on current consistency.

[0081] First, the weighted multi-scale features of color and brightness of the printed surface are input into a one-dimensional convolutional layer for linear projection to obtain visual modal feature vectors. Second, the time features of temperature field, humidity, and ink thickness of the printed surface are input into a one-dimensional convolutional layer for linear projection to obtain a set of time modal feature vectors. The set of time modal feature vectors includes the projected time features of temperature field, humidity, and ink thickness of the printed surface. The purpose of this step is to make the set of time modal feature vectors and the visual modal feature vectors lie in the same semantic space, so that they become comparable modal vectors.

[0082] Next, cosine similarity is used to calculate the temporal consistency score of each modality and thus obtain modal coherence. The cosine similarity between the visual modal feature vector and each feature vector in the set of time modal feature vectors is calculated; its value lies in [-1, 1]. The closer it is to 1, the more consistent the directions of the visual feature vector and the temporal feature vector are, indicating that the two influence the drying state of the printing surface in the same direction. This yields three cosine similarities: between the visual feature and the printing surface temperature field time feature, between the visual feature and the printing surface humidity time feature, and between the visual feature and the ink thickness time feature. The three are then concatenated to form the modal consistency feature.
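The consistency scoring can be illustrated with a minimal sketch that computes the three cosine similarities and concatenates them. The vectors are toy values chosen to show the extremes of the [-1, 1] range; the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def consistency_feature(visual, temporal_set):
    """Concatenate the similarity of the visual vector with each temporal vector."""
    return [cosine(visual, v) for v in temporal_set]


visual = [1.0, 0.0]
temporal_set = [
    [2.0, 0.0],   # temperature: same direction as the visual vector
    [0.0, 3.0],   # humidity: orthogonal
    [-1.0, 0.0],  # ink thickness: opposite direction
]
scores = consistency_feature(visual, temporal_set)
print(scores)
```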

[0083] The visual modal feature vector, the feature vectors in the set of time modal feature vectors, and the modal consistency feature are then weighted and fused. Specifically, a multi-head attention weighting mechanism first calculates the attention score of each feature; the scores are then normalized using the Softmax function to achieve an adaptive balance of modality weights; finally, the normalized attention scores are used to linearly weight and fuse the feature vectors to obtain the comprehensive representation vector. This weighted fusion mechanism achieves information complementarity while preserving the differences between modal features. By integrating multi-source dynamic features such as visual texture, thermal imaging temperature, ambient humidity, and ink thickness, the method comprehensively reflects the key physical changes in ink drying rate, heat diffusion, and humidity coupling effects during the printing process. The structure of the cross-modal adaptive coding network is shown in Figure 2.
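The normalization and linear fusion steps can be sketched as below. To stay short, the sketch takes the attention scores as given inputs instead of computing them with multi-head attention, and it assumes all feature vectors share one dimension; both simplifications, and the names used, are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(vectors, scores):
    """Weight each vector by its softmax-normalized score and sum them."""
    weights = softmax(scores)
    dim = len(vectors[0])
    fused = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return weights, fused


# Two toy modality vectors with equal scores receive equal weight.
weights, fused = fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(weights, fused)

# A higher score shifts the fused vector toward that modality.
w2, f2 = fuse([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```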

[0084] III. Adaptive Drying Strategy Generation and Control Optimization

[0085] This stage uses the comprehensive representation vector output by the CAEN model to achieve adaptive control and optimization of the drying process. The drying system incorporates a Neural Fuzzy Inference Controller (NFR) that integrates the nonlinear learning capability of neural networks with the interpretability of fuzzy logic. The NFR used in this invention is a hybrid neural fuzzy network based on a Takagi-Sugeno structure; it employs a five-layer ANFIS-like structure to combine nonlinear mapping learning with fuzzy rule interpretation, and dynamically adjusts the drying temperature, air velocity, and drying time of color box printing. This control structure can maintain stable and efficient drying performance under different raw material moisture contents, ink thicknesses, and environmental disturbances. The process specifically includes:

[0086] 1. State feature input and fuzzy mapping:

[0087] The comprehensive representation vector from the CAEN model is fed into the fuzzy inference module, where a Gaussian membership function performs fuzzy mapping on the input features:

[0088] $\mu_{ij}(x_j) = \exp\left(-\dfrac{(x_j - c_{ij})^2}{2\sigma_{ij}^2}\right)$;

[0089] where $\mu_{ij}(x_j)$ is the membership degree of the j-th feature with respect to the i-th fuzzy subset; $c_{ij}$ and $\sigma_{ij}$ are the center and width parameters of the membership function, respectively; and $x_j$ is the j-th eigenvalue in the comprehensive representation vector. Through the membership degree calculation, the system divides the multimodal feature space into multiple semantic regions that describe brightness state, humidity level, ink thickness, and temperature change trend, including:

[0090] (1) Brightness status areas: low brightness, medium brightness, high brightness;

[0091] (2) Humidity status zones: low humidity, medium humidity, high humidity;

[0092] (3) Ink thickness range: thin layer, medium thickness, thick layer;

[0093] (4) Temperature change trend areas: rapid decrease, slow decrease, stable.
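The fuzzification of one feature into its semantic regions can be illustrated with a minimal sketch. It assumes the common Gaussian membership form exp(-(x - c)^2 / (2 * width^2)); the centers and width chosen for the three humidity regions are hypothetical and are not values from the patent.

```python
import math


def gaussian_membership(x, center, width):
    """Gaussian membership degree of x for a fuzzy subset (assumed standard form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# Hypothetical fuzzy subsets for a humidity feature normalized to [0, 1].
humidity_regions = {"low": 0.2, "medium": 0.5, "high": 0.8}
width = 0.15

x = 0.5
degrees = {name: gaussian_membership(x, c, width)
           for name, c in humidity_regions.items()}
print(degrees)
```

A reading of 0.5 belongs fully to the medium region and only weakly to the low and high regions, so neighboring regions overlap smoothly rather than switching at a hard threshold.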

[0094] 2. Neural Fuzzy Reasoning and Control Parameter Generation:

[0095] After fuzzification, the membership degree of each input feature is used directly as the input activation intensity of the neural fuzzy inference rules. First, R fuzzy rules are constructed, each representing a drying state response pattern, in the form:

[0096] $\text{Rule } r:\ \text{IF } x_1 \text{ is } A_1^r \text{ AND } \dots \text{ AND } x_n \text{ is } A_n^r,\ \text{THEN } y_r = p_0^r + \sum_{j=1}^{n} p_j^r x_j$;

[0097] where $A_j^r$ is the fuzzy subset corresponding to the j-th input of the r-th rule; $p_0^r$ and $p_j^r$ are the linear parameters learned by the neural network; $y_r$ is the linear output of the r-th rule; and $n$ is the feature dimension. The fuzzy rules are constructed based on the Takagi-Sugeno model and cover all combinations of the four semantic regions.

[0098] Next, the activation strength $w_r$ of each rule is calculated as the product of its input membership degrees and then normalized: the activation strengths of all rules are summed, and each rule's activation strength is divided by this sum to give its normalized activation value $\bar{w}_r = w_r / \sum_{k=1}^{R} w_k$. This ensures that the normalized activation weights of all rules sum to 1 for the subsequent output weighting.

[0099] Finally, the sub-outputs of the rules are weighted by their normalized activation values and summed to obtain the system's comprehensive control output, whose components are the optimal control parameters for drying temperature, air velocity, and drying time. This output is transmitted directly to the actuator as a control command signal to control the drying system.
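The inference chain described above (rule activation by product, normalization, weighted output) can be sketched for a Takagi-Sugeno system as follows. The rule layout, the Gaussian membership form, and all numeric values are assumptions made for illustration; each rule here carries one row of linear coefficients per control output.

```python
import math


def gauss(x, c, s):
    """Assumed Gaussian membership function with center c and width s."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))


def ts_infer(x, rules):
    """rules: list of (centers, widths, coeffs); coeffs has one row [p0, p1..pn] per output."""
    strengths, outputs = [], []
    for centers, widths, coeffs in rules:
        # Activation strength: product of the input membership degrees.
        w = 1.0
        for xj, c, s in zip(x, centers, widths):
            w *= gauss(xj, c, s)
        strengths.append(w)
        # Linear consequent: constant term plus weighted inputs, per output.
        outputs.append([row[0] + sum(p * xj for p, xj in zip(row[1:], x))
                        for row in coeffs])
    total = sum(strengths)  # normalization factor (assumed non-zero)
    norm = [w / total for w in strengths]
    y = [sum(nw * out[k] for nw, out in zip(norm, outputs))
         for k in range(len(outputs[0]))]
    return norm, y


# Two hypothetical rules on one feature, each proposing a drying temperature.
rules = [
    ([0.0], [0.5], [[60.0, 0.0]]),  # "low" region  -> 60 degrees
    ([1.0], [0.5], [[80.0, 0.0]]),  # "high" region -> 80 degrees
]
norm, y = ts_infer([0.5], rules)
print(norm, y)
```

An input halfway between the two rule centers activates both rules equally, so the output is the midpoint of the two proposed temperatures.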

[0100] 3. Adaptive update and control optimization of neural weights:

[0101] To enhance the self-learning capability of the NFR controller under different environmental disturbances, this invention introduces an online neural weight update mechanism during the training phase. An instantaneous error energy function is defined from the preset desired control output vector and the controller's actual output; it reflects the current deviation of the drying parameters. Each linear weight parameter is then updated based on the gradient descent principle: in each iteration, the new weight equals the previous weight minus the learning rate multiplied by the partial derivative of the error energy with respect to that weight, so that the parameters are corrected along the steepest descent direction, with the gradient term indicating the direction of parameter adjustment. Furthermore, to improve the stability of training and avoid oscillations, a momentum term is introduced into the weight update: the new weight change equals the weighted sum of the current gradient term and the previous weight change. The momentum factor smooths the update process, enabling the system to maintain response speed while reducing parameter oscillations.
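The momentum-corrected update can be illustrated with a minimal sketch on a single linear output. It assumes a squared-error energy E = 0.5 * (target - y)^2; the learning rate, momentum factor, and function name are hypothetical choices, not values from the patent.

```python
def momentum_step(w, x, target, prev_delta, lr=0.1, momentum=0.5):
    """One gradient-descent step with momentum on a linear output y = sum(w_j * x_j)."""
    y = sum(wj * xj for wj, xj in zip(w, x))
    err = target - y
    energy = 0.5 * err ** 2  # assumed instantaneous error energy
    # dE/dw_j = -(target - y) * x_j
    grad = [-err * xj for xj in x]
    # Weight change: current gradient term plus a fraction of the previous change.
    delta = [-lr * g + momentum * d for g, d in zip(grad, prev_delta)]
    new_w = [wj + dj for wj, dj in zip(w, delta)]
    return new_w, delta, energy


w, delta = [0.0], [0.0]
energies = []
for _ in range(50):
    w, delta, e = momentum_step(w, [1.0], 1.0, delta)
    energies.append(e)
print(energies[0], energies[-1])
```

In this toy run the error energy starts at 0.5 and decays toward zero; raising the momentum factor speeds up early progress at the cost of more overshoot.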

[0102] 4. Control output execution and closed-loop optimization

[0103] Once the optimal control parameters are obtained, the system transmits them to the execution unit via a fieldbus or industrial communication interface. The temperature setpoint controls the power output of the dryer's heating module; the fan speed setpoint adjusts the fan motor speed; and the drying time coefficient adjusts the duration of the drying stage. Figure 3 is a flowchart illustrating the implementation of the neuro-fuzzy inference controller.

[0104] IV. Simulation Experiment

[0105] As shown in Figure 4, the correlation between different modal features was low before fusion, and the off-diagonal elements of the correlation matrix were weak. After fusion by the CAEN model, the correlation between features was significantly enhanced, indicating that the model can effectively extract cross-modal coupled features and provide high-dimensional semantic information for subsequent fuzzy control.

[0106] Figure 5 shows the nonlinear response surface of the neural fuzzy controller in the input feature space. The surface changes smoothly and continuously, indicating that the fuzzy rule layer successfully captures the coupling relationship between the input features. At the same time, the neural network part enhances the global nonlinear mapping capability of the output, realizing multidimensional dynamic optimization of the temperature, wind speed, and time control parameters.

[0107] As shown in Figure 6, the online self-calibration and environmental compensation mechanism proposed in this invention can effectively suppress fluctuations in the system's drying error under environmental disturbances. When the environmental humidity changes abruptly (t>60), the uncalibrated system (red line) exhibits a significant increase in error and unstable fluctuations. After the self-calibration mechanism is enabled (blue line), the system corrects key control weights in real time through moving mean deviation analysis, so that the drying error gradually converges and remains within a low, stable range. This indicates that the designed deviation compensation and self-adjustment strategy has good robustness and steady-state control performance under changes in environmental temperature and humidity, and can maintain the stability and consistency of the drying process without retraining the model.

[0108] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

[0109] While the specific embodiments of the present invention have been described above, they are not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.

Claims

1. A multi-modal perception based self-adaptive drying method for color box printing, characterized in that it includes the following steps: S1, synchronously collecting, through a multimodal sensor array deployed at key locations in the printing production line, printing surface temperature field image data, printing surface high-resolution texture image data, printing surface humidity data, and ink thickness data during the color box printing process; S2, preprocessing the data collected in S1 to form a spatially consistent multimodal feature sequence; S3, inputting the multimodal feature sequence into a lightweight cross-modal adaptive coding network, and performing spatial feature extraction and weighting through a shallow feature extraction network to obtain weighted multi-scale features of the printing surface temperature field, weighted multi-scale features of the printing surface color brightness, weighted features of the printing surface humidity, and weighted features of the ink thickness; then capturing, through a time dynamic modeling network, the nonlinear relationship of the evolution of each modality's data over time to obtain the time features of the printing surface temperature field, the time features of the printing surface humidity, and the time features of the ink thickness; and finally performing heterogeneous modal fusion through a modal consistency correction network to output a comprehensive representation vector that reflects the ink drying state; S4, inputting the comprehensive representation vector into the adaptive controller NFR based on neural fuzzy reasoning, completing the feature fuzzification mapping through a Gaussian membership function, and performing neural fuzzy reasoning based on fuzzy rules to generate the optimal drying parameters, which are then transmitted to the actuator.

2. The multi-modal perception based self-adaptive drying method for color box printing according to claim 1, characterized in that: the preprocessing in S2 includes spatial registration, specifically: using the visible light industrial camera coordinate system as a reference, the mapping matrix between the temperature field image data of the printed surface and the high-resolution texture image data is solved by means of a calibration plate; a dot array calibration plate is placed on the surface of the printing press worktable, a visible light industrial camera and an infrared thermal imager simultaneously capture the same scene, the coordinates of the feature points on the calibration plate are detected in both images, and the mapping matrix is obtained from the paired point set using the least squares method; the mapping matrix is used to map the pixel coordinates of the temperature field image data of the printed surface to the coordinate system of the visible light industrial camera; when spatially registering the printing surface humidity data and the ink thickness data, a physical location index on the printing press is introduced to identify the two-dimensional physical location of each sensor sampling point in the printing platform coordinate system; a three-dimensional coordinate form is introduced, that is, a printing plane height is added to the planar coordinates to obtain a three-dimensional point; the sampling points of the printing surface humidity data and the ink thickness data, indexed by their coordinates on the printing press, are mapped into the visible light industrial camera coordinate system through a coordinate transformation using the camera's intrinsic parameter matrix, rotation matrix, and translation vector, thereby unifying the multimodal data in the same reference space.

3. The multi-modal perception based self-adaptive drying method for color box printing according to claim 1, characterized in that: the shallow feature extraction network in S3 includes two branches and a cross-modal channel attention layer; the first branch is used to extract features of the temperature field image data of the printed surface and features of the high-resolution texture image data of the printed surface; the second branch is used to extract the humidity data features of the printed surface and the ink thickness data features; finally, weights are calculated through the cross-modal channel attention layer to adjust the contribution of the four extracted modal features to the degree of printing drying, and the weights are multiplied channel by channel with each of the four modal features to obtain the weighted multi-scale features of the printing surface temperature field, the weighted multi-scale features of the printing surface color brightness, the weighted printing surface humidity features, and the weighted ink thickness features.

4. The multi-modal perception based self-adaptive drying method for color box printing according to claim 3, characterized in that: the first branch uses a multi-scale convolutional layer to simultaneously capture local texture and overall contour information of the image; the multi-scale convolutional layer includes shallow convolutional groups at three scales, each consisting of a depthwise separable convolution, batch normalization, a LeakyReLU activation function, and a feature concatenation layer; the depthwise separable convolution kernel sizes of the three shallow convolutional groups are 3×3, 5×5, and 7×7, respectively, and the number of output channels per scale is set to 32; the printing surface temperature field image data and the printing surface high-resolution texture image data are each input into the multi-scale convolutional layer to obtain the multi-scale features of the printing surface temperature field and the multi-scale features of the printing surface color brightness; before convolving the data features, the second branch uses a linear projection layer to map the printing surface humidity data and the ink thickness data to the same dimension, and then applies a pathway composed of depthwise separable convolution, batch normalization, and the LeakyReLU activation function to each of them to obtain the printing surface humidity features and the ink thickness features.

5. The multi-modal perception based self-adaptive drying method for color box printing according to claim 3, characterized in that: the time dynamic modeling network includes a multi-scale one-dimensional time convolutional layer and a time smoothing layer; the weighted multi-scale features of the printing surface temperature field, the weighted printing surface humidity features, and the weighted ink thickness features are each input into the multi-scale one-dimensional time convolutional layer to obtain the multi-scale time features of the printing surface temperature field, the printing surface humidity, and the ink thickness; the multi-scale one-dimensional time convolutional layer consists of two one-dimensional time convolutions with kernel scales of 1 and 3, respectively, used to capture the short-term and long-term relationships of the three modal features over time, and uses residual connections; the multi-scale time features of the printing surface temperature field, the printing surface humidity, and the ink thickness are then each input into the time smoothing layer, where noise in the features is suppressed by an exponentially weighted average of the feature values at adjacent time points, finally yielding the time features of the printing surface temperature field, the printing surface humidity, and the ink thickness.

6. The multi-modal perception based self-adaptive drying method for color box printing according to claim 5, characterized in that: the modal consistency correction network is specifically as follows: first, the weighted multi-scale features of the color brightness of the printed surface are input into a one-dimensional convolutional layer for linear projection to obtain the visual modal feature vector; second, the time features of the printing surface temperature field, the printing surface humidity, and the ink thickness are input into a one-dimensional convolutional layer for linear projection to obtain a set of time modal feature vectors, which includes the projected time features of the printing surface temperature field, the printing surface humidity, and the ink thickness; next, cosine similarity is used to calculate the temporal consistency score of each modality to obtain modal coherence: the cosine similarity between the visual modal feature vector and each feature vector in the set of time modal feature vectors is calculated, with a range of [-1, 1], where a value closer to 1 means the directions of the visual feature vector and the time feature vector are more consistent; the cosine similarities between the visual feature and the printing surface temperature field time feature, between the visual feature and the printing surface humidity time feature, and between the visual feature and the ink thickness time feature are obtained and concatenated to form the modal consistency feature; the visual modal feature vector, the feature vectors in the set of time modal feature vectors, and the modal consistency feature are then weighted and fused: first, a multi-head attention weighting mechanism is used to calculate the attention score of each feature; second, the attention scores of the modalities are normalized using the Softmax function; finally, the normalized attention scores are used to perform linear weighted fusion of the feature vectors to obtain the comprehensive representation vector.

7. The multi-modal perception based self-adaptive drying method for color box printing according to claim 1, characterized in that: in S4, the feature fuzzification mapping is completed through a Gaussian membership function, specifically: the comprehensive representation vector is fed into the fuzzy inference module, where a Gaussian membership function performs fuzzy mapping on the input features; the membership function gives the membership degree of the j-th feature with respect to the i-th fuzzy subset, is parameterized by a center parameter and a width parameter, and takes as its argument the j-th eigenvalue in the comprehensive representation vector; through the membership degree calculation, the multimodal feature space is divided into multiple semantic regions that describe the brightness state, humidity level, ink thickness, and temperature change trend, including: brightness status zones: low brightness, medium brightness, high brightness; humidity status zones: low humidity, medium humidity, high humidity; ink thickness range: thin layer, medium thickness, thick layer; temperature change trend zones: rapid decrease, slow decrease, stable.

8. The multi-modal perception based self-adaptive drying method for color box printing according to claim 7, characterized in that: the specific process of generating the optimal drying parameters through neural fuzzy reasoning based on fuzzy rules is as follows: the membership degree of each input feature is used as the input activation intensity of the neural fuzzy inference rules; R fuzzy rules are first constructed, each representing a drying state response pattern, in the form: if the first component of the input feature vector belongs to the fuzzy subset that the rule assigns to it, the second component belongs to its corresponding fuzzy subset, and so on for every component, then the rule output is given by a linear model consisting of a constant term plus a linear combination of the input components, each multiplied by its weight; here, each fuzzy subset is the one corresponding to the j-th input of the r-th rule, the constant term and the weights are linear parameters learned by the neural network, the rule output is the linear output of the r-th rule, and the number of input components is the feature dimension; the fuzzy rules are constructed based on the Takagi-Sugeno model and cover all combinations of the four semantic regions; the activation strength of each rule is calculated as the product of the input membership degrees and then normalized: the activation strengths of all rules are summed and used as a normalization factor, and the activation strength of each rule is divided by this sum to obtain its normalized activation value, so that the activation weights of all rules sum to 1; the sub-output values of the rules are weighted by their normalized weights and merged to obtain the comprehensive control output of the drying system, whose components represent the optimal control parameters for drying temperature, air velocity, and drying time, respectively.