Method for extracting moire fringe alignment offset based on neural network
By using a neural network method based on convolutional attention mechanism, the problem of extracting the fundamental frequency component in small-area moiré fringe images is solved, and high-precision alignment offset extraction is achieved, which is suitable for photolithography and precision alignment measurement.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF OPTICS & ELECTRONICS CHINESE ACAD OF SCI
- Filing Date
- 2026-02-10
- Publication Date
- 2026-06-23
AI Technical Summary
Existing algorithms for analyzing moiré fringe offsets struggle to extract the effective fundamental frequency component from moiré fringe images with small alignment marks with high precision, resulting in insufficient alignment accuracy.
A neural network method based on convolutional attention mechanism is adopted to adaptively generate a two-dimensional filter template through frequency domain transformation, filtering and feature fusion, suppress zero-frequency component interference, extract effective spectral components, and determine the alignment offset between the mask and the substrate through multi-channel fusion features.
It achieves high-precision and robust offset extraction under small-area marking conditions, improves alignment accuracy, and is applicable to photolithography and other precision alignment measurement fields.
Figure CN122265660A_ABST
Description
Technical Field
[0001] This invention relates to the field of photolithography alignment technology, and more specifically to a method for extracting moiré fringe alignment offset based on a neural network.
Background Technology
[0002] Moiré fringe-based photolithography alignment technology is widely used in contact or proximity lithography systems due to its high precision and simple structure. Alignment technology is mainly used to detect the lateral offset between the mask and the substrate to ensure that the pattern on the mask can be accurately transferred to the substrate.
[0003] The larger the area of the substrate available for the exposure pattern, the higher the utilization of a single substrate. It is therefore preferable to place the alignment mark in the dicing channel so that it does not occupy area reserved for the exposure pattern. The main obstacle to miniaturizing the alignment mark, however, comes from existing moiré fringe offset analysis algorithms (mainly Fourier-based algorithms). These algorithms need to extract the fundamental frequency component from the spectrum of the moiré fringes to obtain the phase, and then compute the phase difference to obtain the offset. Moiré fringe images generated by small-area alignment marks contain too few effective pixels, so the fundamental frequency component and the zero-frequency component are severely mixed in the image spectrum, and traditional filtering algorithms cannot recover the fundamental frequency component with high accuracy.
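As an illustration of the Fourier-based approach described above, the following is a minimal sketch (not the algorithm of any specific prior-art system) that extracts the phase of the fundamental frequency bin from two one-dimensional fringe profiles and converts the phase difference into a fringe shift. The function names, the synthetic signals, and the 512-sample length are assumptions chosen for illustration; the conversion from fringe shift to mask-substrate offset depends on the grating periods and is omitted.

```python
import numpy as np

def fringe_phase(signal: np.ndarray) -> float:
    """Phase of the strongest non-zero-frequency bin of a 1-D fringe profile."""
    spectrum = np.fft.rfft(signal - signal.mean())  # remove the mean before the peak search
    k = int(np.argmax(np.abs(spectrum[1:]))) + 1    # fundamental bin, skipping bin 0
    return float(np.angle(spectrum[k]))

def shift_from_phases(phi_a: float, phi_b: float, period: float) -> float:
    """Convert a wrapped phase difference into a shift in the units of `period`."""
    dphi = np.angle(np.exp(1j * (phi_a - phi_b)))   # wrap to (-pi, pi]
    return float(dphi / (2 * np.pi) * period)

# Synthetic example: two fringes with the same period and a known relative shift.
n, period, shift = 512, 64.0, 5.0                   # samples, fringe period (px), shift (px)
x = np.arange(n)
fringe_a = 1.0 + np.cos(2 * np.pi * x / period)
fringe_b = 1.0 + np.cos(2 * np.pi * (x - shift) / period)
estimated_shift = shift_from_phases(fringe_phase(fringe_a), fringe_phase(fringe_b), period)
```

The example uses a whole number of fringe periods, so the fundamental falls exactly on one bin. With only a few pixels and a fraction of a period, the fundamental leaks into neighbouring bins, including the zero-frequency bin, which is the failure mode this paragraph describes.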
[0004] Therefore, to reduce the area of alignment marks, an offset extraction method suited to small-area moiré fringe images is needed.
Summary of the Invention
[0005] In view of this, the present invention provides a method for extracting moiré fringe alignment offset based on neural networks.
[0006] This invention provides a method for extracting moiré fringe alignment offset based on a neural network, comprising: converting each sub-image of multiple sets of moiré fringe images generated by an alignment system to the frequency domain to obtain their respective spectral distributions; filtering each spectral distribution using a convolutional attention mechanism to obtain a two-dimensional filter template for suppressing zero-frequency component interference, and extracting effective spectral components from the corresponding spectral distribution based on the two-dimensional filter template, wherein the effective spectral components carry alignment offset information; performing feature fusion on each extracted effective spectral component to obtain a multi-channel fusion feature for enhancing interference suppression, and determining the alignment offset between the mask and the substrate based on the multi-channel fusion feature.
[0007] According to an embodiment of the present invention, filtering the spectral distribution using a convolutional attention mechanism to obtain a two-dimensional filter template includes: performing pooling processing on the spectral distribution to obtain pooling features, wherein the pooling features characterize the global context information of the spectral distribution; performing convolution processing on the pooling features to obtain spatial attention weight features, wherein the spatial attention weight features characterize the importance distribution of different spatial locations in the spectral distribution; and performing normalization processing on the spatial attention weight features to generate a two-dimensional attention weight template with a value range of a preset range, which serves as the two-dimensional filter template.
[0008] According to an embodiment of the present invention, pooling processing is performed on the spectrum distribution to obtain pooling features, including: performing global average pooling and global max pooling in parallel on the spectrum distribution to obtain a first pooling feature characterizing the overall energy distribution of the spectrum distribution and a second pooling feature characterizing significant local responses in the spectrum distribution, respectively; concatenating the first pooling feature and the second pooling feature into channels to obtain a pooling feature that integrates the overall energy distribution and significant local responses to characterize global context information.
[0009] According to an embodiment of the present invention, extracting effective spectral components from a spectral distribution based on a two-dimensional filter template includes: multiplying the two-dimensional attention weight template with the spectral distribution element by element to obtain filtered spectral data as effective spectral components.
[0010] According to an embodiment of the present invention, feature fusion of the extracted effective spectral components to obtain multi-channel fused features includes: splicing the effective spectral components along the channel dimension to obtain initial multi-channel features; performing convolution preprocessing on the initial multi-channel features to extract and fuse the correlation features between different effective spectral components while suppressing redundant components in the initial multi-channel features, thereby obtaining multi-channel fused features.
[0011] According to an embodiment of the present invention, the convolutional preprocessing of the initial multi-channel features includes: performing two-dimensional convolution processing on the initial multi-channel features to extract cross-channel correlation spatial features; performing batch normalization and nonlinear activation processing on the correlation spatial features to obtain activation features with enhanced representational power; and performing max pooling processing on the activation features to achieve feature dimensionality reduction and retain significant spatial structure, thereby obtaining multi-channel fused features.
[0012] According to an embodiment of the present invention, determining the alignment offset between the mask and the substrate based on multi-channel fusion features includes: weighting the multi-channel fusion features using a channel attention mechanism to obtain channel recalibration features that calibrate the importance of each feature channel; performing nonlinear transformation and feature integration on the channel recalibration features to obtain high-order abstract features characterizing the relative positional relationship between the mask and the substrate; and performing linear regression mapping on the high-order abstract features to decode the high-order abstract features into scalar outputs, thereby obtaining the alignment offset.
[0013] According to an embodiment of the present invention, the weighted processing of multi-channel fusion features using a channel attention mechanism includes: performing global pooling on the multi-channel fusion features to compress the spatial dimension of the multi-channel fusion features and aggregate global information to obtain a channel description vector; performing one-dimensional convolution and non-linear activation on the channel description vector to construct the dependency relationship between channels and generate a channel attention weight vector; and performing element-wise multiplication of the channel attention weight vector with the multi-channel fusion features along the channel dimension to recalibrate the importance of the multi-channel fusion features to obtain channel recalibrated features.
[0014] According to an embodiment of the present invention, linear regression mapping of the high-order abstract features includes: inputting the high-order abstract features into a regression network consisting of multiple linear layers connected sequentially, wherein the output dimension of each linear layer in the regression network decreases layer by layer to compress and focus the high-order abstract features step by step; and performing the regression calculation and outputting the alignment offset by the last linear layer, whose output dimension is 1.
[0015] According to an embodiment of the present invention, multiple sets of moiré fringe images include a first set of moiré fringe images and a second set of moiré fringe images with different frequencies, each set containing sub-images with different phases; the first set of moiré fringe images and the second set of moiré fringe images have different spectral distribution center positions in the frequency domain; the spectral distributions of the first set of moiré fringe images and the second set of moiré fringe images are respectively filtered by a first filter module and a second filter module with different learnable parameters.
[0016] The neural network-based moiré fringe alignment offset extraction method according to embodiments of the present invention introduces adaptive filtering based on a convolutional attention mechanism, which can accurately generate a two-dimensional filter template in the frequency domain, thereby effectively suppressing zero-frequency interference and extracting the effective spectral components carrying offset information with high fidelity, thus purifying the input spectrum at the source. Next, by fusing features from multiple purified spectra, a multi-channel fused feature with enhanced anti-interference capability is generated, and the alignment offset between the mask and the substrate is directly determined based on this fused feature, achieving high-precision and highly robust offset extraction.
Attached Figure Description
[0017] The above-described features, other objects, and advantages of the present invention will become clearer from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
[0018] Figure 1 schematically shows the overall structure of the moiré fringe alignment system according to an embodiment of the present invention;
[0019] Figure 2 schematically shows a grating pattern for alignment according to an embodiment of the present invention;
[0020] Figure 3 schematically shows the moiré fringes generated by the alignment gratings on different alignment objects according to an embodiment of the present invention;
[0021] Figure 4 schematically shows two sets of moiré fringe images cropped from the moiré fringe image shown in Figure 3;
[0022] Figure 5 schematically shows a flowchart of a neural network model based on a convolutional attention filter processing two sets of moiré fringe images according to an embodiment of the present invention;
[0023] Figure 6 schematically shows the processing flow of a convolutional attention filter according to an embodiment of the present invention;
[0024] Figure 7 schematically shows the processing flow of the convolutional preprocessing module according to an embodiment of the present invention;
[0025] Figure 8 schematically shows the processing flow of the efficient channel attention module according to an embodiment of the present invention;
[0026] Figure 9 schematically shows the processing flow of the efficient channel attention layer according to an embodiment of the present invention;
[0027] Figure 10 schematically shows a comparison of the computational errors of a neural network model based on a convolutional attention filter according to an embodiment of the present invention and a traditional frequency domain-based analytical algorithm, both applied to moiré fringe images generated by small-area alignment marks;
[0028] Figure 11 schematically shows a flowchart of the pre-training process of a neural network model based on a convolutional attention filter according to an embodiment of the present invention;
[0029] Figure 12 schematically shows a flowchart of a neural network-based method for extracting moiré fringe alignment offsets according to an embodiment of the present invention.
Detailed Implementation
[0030] To address the limitations of existing Fourier analysis-based analytical algorithms for moiré fringe offsets, which struggle to accurately extract the effective fundamental frequency component when processing images with severe spectral aliasing caused by reduced alignment mark area, this invention provides a neural network-based method for extracting moiré fringe alignment offsets. This method systematically solves the aforementioned problems through collaborative steps including frequency domain transformation, spectral filtering, feature fusion, and offset determination. First, after transforming the image to the frequency domain, a filter module based on a convolutional attention mechanism is creatively introduced to adaptively generate a two-dimensional weighted template, extracting the effective spectral component carrying offset information from the aliased spectrum with high fidelity, thus achieving source purification of input features. Next, by fusing multiple effective spectral components, a multi-channel fusion feature with enhanced anti-interference capability is generated, completing the integration and enhancement of key information. Finally, based on this multi-channel fusion feature, the alignment offset between the mask and the substrate is directly determined, establishing a robust mapping from purified features to offset. This invention provides an efficient and reliable solution for achieving nanometer-level precision alignment using miniaturized alignment marks within a limited space.
[0031] This invention has a wide range of applications, especially suitable for scenarios requiring high-precision, small-size alignment marks. For example, in the field of photolithography, it can be used to accurately detect the lateral alignment offset between a substrate and a mask. Furthermore, the principles of this invention can also be extended to other precision alignment and measurement fields, such as precision optical assembly and micro / nano assembly, to detect the relative positional deviation of upper and lower layers in a plane, demonstrating good technical versatility and application potential.
[0032] To make the above-mentioned objects, features, and advantages of the present invention clearer and easier to understand, the technical solution of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that similar or identical parts are referred to by the same reference numerals in the drawings or description. Implementations not shown or described in the drawings are forms known to those skilled in the art.
[0033] Figures 1 to 4 illustrate the process of generating and acquiring the moiré fringe images processed by the method of the present invention.
[0034] As shown in Figure 1, the alignment system 10 of this embodiment may include a substrate 11 and a mask 12 with alignment marks, a light source 13, a microscope objective 14, and an imaging device 15. X, Y, and Z in the figure represent different directions. As an example, the imaging device 15 may be a charge-coupled device (CCD) camera, and the light generated by the light source 13 may be a laser.
[0035] In this embodiment, the alignment system 10 can be a dual-frequency heterodyne moiré fringe alignment system, which can achieve high-precision alignment over a large range. The dual-frequency heterodyne moiré fringe alignment system can generate two sets of moiré fringes with slightly different frequencies. The frequencies of the two moiré fringes within each set of moiré fringes are consistent. When the mask and the substrate are not perfectly aligned, there is a phase difference in both sets of moiré fringes, but the phase difference values are different.
[0036] As shown in Figure 2, in one specific embodiment of the dual-frequency heterodyne moiré fringe alignment system, the alignment marks on the mask 12 can be a set of line gratings with periods P1, P2, P3, and P4, respectively, and the alignment marks on the substrate 11 can be checkerboard gratings with corresponding periods. These gratings can be arranged in different quadrants around the alignment center line. It should be noted that the specific pattern of the gratings is not limited to the line type and checkerboard pattern shown in the figure.
[0037] As shown in Figure 1, when the alignment system 10 is operating, the light source 13 directs laser light onto the alignment area of the substrate 11 and the mask 12. The laser light, after being diffracted by the gratings, generates moiré fringes containing alignment information. The generated moiré fringes are imaged onto the imaging device 15 via the microscope objective 14, forming a complete moiré fringe image such as that shown in Figure 3. Specifically, a line grating with period P3 on the mask interacts with a checkerboard grating with period P4 on the substrate, producing a moiré fringe A with period T1. The line grating with period P4 on the mask interacts with the checkerboard grating with period P3 on the substrate, producing a moiré fringe B with the same period T1 but a different phase. Moiré fringes A and B constitute the first set of moiré fringes. The line grating with period P1 on the mask interacts with the checkerboard grating with period P2 on the substrate, producing a moiré fringe C with period T2. The line grating with period P2 on the mask interacts with the checkerboard grating with period P1 on the substrate, producing a moiré fringe D with the same period T2 but a different phase. Moiré fringes C and D constitute the second set of moiré fringes.
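The description does not state how T1 and T2 follow from the grating periods. As a rough illustration only, the sketch below assumes the classic beat relation for two superimposed gratings, T = Pa·Pb / |Pa − Pb|; the actual system with line and checkerboard gratings may differ by a constant factor. The numeric periods are hypothetical and are not taken from the patent.

```python
def moire_period(p_a: float, p_b: float) -> float:
    """Beat period of two superimposed gratings with periods p_a and p_b (assumed relation)."""
    if p_a == p_b:
        raise ValueError("equal periods produce no finite beat period")
    return p_a * p_b / abs(p_a - p_b)

# Hypothetical grating periods in micrometres (illustrative values only).
P1, P2, P3, P4 = 4.0, 4.5, 5.0, 5.5
T1 = moire_period(P3, P4)   # period shared by fringes A and B
T2 = moire_period(P1, P2)   # period shared by fringes C and D
```

Because the relation is symmetric in its two arguments, swapping which grating sits on the mask and which on the substrate leaves the period unchanged, which is consistent with fringes A and B (and likewise C and D) sharing a period.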
[0038] When there is an alignment deviation between the mask 12 and the substrate 11, a first phase difference is generated between moiré fringe A and moiré fringe B, and a second phase difference is generated between moiré fringe C and moiré fringe D, and the first phase difference and the second phase difference are different.
[0039] When the alignment mark area is small, the moiré fringe image generated by the above process has a limited number of effective pixels, resulting in severe aliasing of the fundamental frequency component and the zero frequency component in its spectrum. As is well known to those skilled in the field of image processing, this makes it difficult for traditional frequency domain-based analytical algorithms to accurately extract the effective spectral components.
[0040] To suit processing by the neural network model designed in this invention, representative local regions can be cropped from the complete moiré fringe image shown in Figure 3. For example, as shown in Figure 4, sub-images a, b, c, and d can be extracted from moiré fringes A, B, C, and D, respectively. These sub-images together constitute the input data of the neural network. Sub-images a and b (with the same period T1 but different phases) form the first set of moiré fringe images, while sub-images c and d (with the same period T2 but different phases) form the second set of moiré fringe images. Because their periods differ (T1 and T2), these two sets of images have different spectral distribution center positions in the frequency domain. Subsequent processing steps can use these two sets of images as input.
[0041] It is important to emphasize that the specific number of moiré fringe images input to the neural network model based on convolutional attention filters, and the number of sub-images within each group, are not limited to the two groups and two sub-images per group as described in the embodiment. The core of the method of this invention lies in utilizing multiple groups of moiré fringe images with frequency differences to provide richer frequency domain information, thereby enhancing the robustness and accuracy of the model under spectral aliasing conditions. Therefore, in some embodiments, three or more groups of moiré fringe images can be designed and used as input.
[0042] However, while increasing the number of input groups theoretically provides more information dimensions, it also significantly increases computational complexity and system design costs. More image groups mean designing more grating pairs with subtle frequency differences, which places higher demands on the design and fabrication of alignment marks and may encroach on the limited alignment mark area on the substrate. Furthermore, a larger number of input groups directly increases the parameter size and computational burden of the neural network, affecting the efficiency of model training and real-time inference. Therefore, the two-group design adopted in this embodiment is a preferred solution that effectively solves the problem of spectral aliasing in small-area marks while balancing practicality and economy.
[0043] Figure 5 illustrates the overall processing flow of a convolutional attention filter-based neural network (CAFN) model according to an embodiment of the present invention. This CAFN model can be used to implement the neural network-based moiré fringe alignment offset extraction method of the present invention.
[0044] As shown in Figure 5, the CAFN model mainly includes the following modules connected in sequence: a Fast Fourier Transform (FFT) module, a Convolutional Attention Filter (CAF) group, a Convolutional Preprocessing Module (CPM) group, an Efficient Channel Attention Module (ECAM) group, a residual connection network, and a fully connected layer group.
[0045] The processing flow of the CAFN model can be as follows:
[0046] First, two sets of moiré fringe images input to the model (e.g., the first set consisting of sub-images a and b, and the second set consisting of sub-images c and d, respectively) are fed into the FFT module. The FFT module transforms each sub-image (a, b, c, d) from the spatial domain to the frequency domain, obtaining its corresponding spectral distribution. Due to the small area of the alignment markers, the effective spectral components (i.e., the components carrying alignment offset information) in these spectral distributions are severely aliased with the zero-frequency components, resulting in significant interference.
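The frequency-domain step can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the (real, imaginary) two-channel encoding, the 16×16 crop size, and the synthetic fringe are all assumptions.

```python
import numpy as np

def to_spectrum(sub_image: np.ndarray) -> np.ndarray:
    """2-D FFT of one sub-image with the zero frequency shifted to the centre.

    Returns an array of shape (2, H, W) holding the real and imaginary parts,
    an assumed encoding for feeding complex spectra to a real-valued network.
    """
    spec = np.fft.fftshift(np.fft.fft2(sub_image))
    return np.stack([spec.real, spec.imag])

# Hypothetical 16x16 crop that contains fewer than two fringe periods.
h = w = 16
yy, xx = np.mgrid[0:h, 0:w]
sub_a = 1.0 + 0.5 * np.cos(2 * np.pi * xx / 10.0)
spectrum_a = to_spectrum(sub_a)
magnitude = np.hypot(spectrum_a[0], spectrum_a[1])
peak = np.unravel_index(np.argmax(magnitude), magnitude.shape)  # location of the largest bin
```

In this toy crop the largest bin is the zero-frequency bin at the centre, and the fringe energy is smeared over the bins next to it, which is the mixing of fundamental and zero-frequency components that the subsequent filter stage is meant to handle.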
[0047] Next, each spectral distribution is fed into a corresponding Convolutional Attention Filter (CAF) module for filtering. Given that the two sets of moiré fringes have different frequencies, this embodiment uses two different CAFs (CAF1 and CAF2) to process the spectra of the corresponding groups. CAF1 and CAF2 have different learnable convolutional kernel parameters, which can be trained to adapt to the spectral distribution characteristics of their corresponding frequencies. Each CAF module (its detailed structure is shown in Figure 6 and described below) generates a two-dimensional filter template for suppressing zero-frequency component interference. By element-wise multiplication with the input spectral distribution, this template adaptively enhances the effective spectral components and suppresses zero-frequency and other noise interference, thereby extracting the purified effective spectral components.
[0048] Then, the effective spectral components extracted by the CAF modules are concatenated to form a multi-channel feature tensor (the initial multi-channel features), which is input into the convolutional preprocessing module (CPM; its detailed structure is shown in Figure 7). The CPM applies convolutional preprocessing to the feature tensor to extract and fuse the correlation features between different effective spectral components while suppressing redundant components, thereby generating multi-channel fusion features with enhanced interference suppression.
[0049] Subsequently, the multi-channel fused features are input into an ECAM group composed of multiple cascaded efficient channel attention modules (ECAMs) for deep feature extraction. As an example, this embodiment uses 10 ECAMs (10×), but the invention is not limited thereto. Each ECAM module (its detailed structure is shown in Figures 8 and 9) introduces a channel attention mechanism to adaptively calibrate the importance of each feature channel, enhancing the model's ability to perceive key features and yielding channel recalibrated features. The output of the ECAM group then undergoes further feature integration and nonlinear mapping through a residual connection network to extract higher-order abstract features, resulting in a high-dimensional feature tensor. As shown in Figure 5, in this embodiment, the residual connection network may sequentially include a max pooling layer, a fully connected layer, a one-dimensional convolutional layer, a batch normalization layer, an activation function (e.g., ReLU) layer, a one-dimensional convolutional layer, another batch normalization layer, and an activation function (e.g., ReLU) layer.
[0050] Finally, this high-dimensional feature tensor is fed into the fully connected layer group. The fully connected layer group performs progressive compression, focusing, and regression calculations on the high-order abstract features through multiple linear layers with progressively decreasing output dimensions, ultimately outputting a scalar value. This value is the alignment offset determined based on the multi-channel fusion features, characterizing the relative positional relationship between the mask and the substrate. In this embodiment, for the four sub-images, the fully connected layer group can be constructed by sequentially connecting four linear layers whose output dimensions decrease in turn (e.g., 32, 16, 8, 1), ultimately outputting a 1×1 tensor, i.e., a scalar value.
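The regression head with output widths 32, 16, 8, 1 can be sketched in NumPy as below. The input feature size of 64, the random weights, and the absence of activations between the linear layers are assumptions made for illustration; the description only specifies sequential linear layers with decreasing output dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_head(in_dim: int, widths=(32, 16, 8, 1)):
    """Random (weight, bias) pairs for linear layers with shrinking output widths."""
    layers, d = [], in_dim
    for width in widths:
        layers.append((rng.normal(0.0, d ** -0.5, (d, width)), np.zeros(width)))
        d = width
    return layers

def regress(features: np.ndarray, layers) -> np.ndarray:
    """Apply the linear layers in order; the result has shape (batch, 1)."""
    x = features
    for weight, bias in layers:
        x = x @ weight + bias
    return x

head = make_head(in_dim=64)                       # 64 is an assumed feature size
offset = regress(rng.normal(size=(1, 64)), head)  # untrained, so the value is meaningless
```

With trained weights, the single output value would be read as the alignment offset; here it only demonstrates the shape flow from a feature vector down to a 1×1 tensor.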
[0051] Figure 6 schematically illustrates the internal processing flow of a convolutional attention filter (CAF) according to an embodiment of the present invention.
[0052] As shown in Figure 6, the CAF module mainly consists of a pooling layer, a two-dimensional convolutional layer, and an activation function layer connected in sequence.
[0053] First, a pooling layer is used to obtain pooling features of the input spectral distribution to capture its global contextual information. In a preferred embodiment, this pooling layer may include parallel global average pooling and global max pooling operations to obtain a first pooling feature characterizing the overall energy distribution of the spectral distribution and a second pooling feature characterizing significant local responses in the spectral distribution, respectively. These two features are then concatenated along the channel dimension to form a comprehensive pooling feature. This design helps the network to simultaneously focus on both the overall distribution of the spectrum and local salient information.
[0054] Next, a two-dimensional convolutional layer is used to receive pooling features. The importance distribution of different spatial locations in the spectral distribution is learned through convolution operations to obtain spatial attention weight features.
[0055] Then, an activation function layer is used to perform a nonlinear transformation and normalization on the spatial attention weight features output by the two-dimensional convolutional layer, generating a two-dimensional attention weight template with a preset range, which serves as the two-dimensional filter template. For example, the activation function can be the Sigmoid function, with a preset range of [0, 1]. Using the Sigmoid function can constrain the weight values within the [0, 1] interval, thus generating the final two-dimensional attention weight template. Multiplying this template element-wise with the original input spectrum distribution achieves adaptive weighting of components at different positions in the spectrum, thereby enhancing effective components and suppressing interference.
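The three CAF stages described above (pooling, two-dimensional convolution, Sigmoid) can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the pooling is taken across the channel axis so that a two-dimensional map remains for the convolution (as in common spatial-attention designs), the spectrum is supplied as real-valued channels, and the 7×7 kernel size is arbitrary. None of these details are confirmed by the description.

```python
import numpy as np

def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same' cross-correlation: x (C, H, W), kernel (C, k, k) with odd k -> (H, W)."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out

def caf(spectrum: np.ndarray, kernel: np.ndarray):
    """spectrum: (C, H, W) real-valued channels. Returns (filtered, template)."""
    avg_map = spectrum.mean(axis=0)            # first pooling feature (overall energy)
    max_map = spectrum.max(axis=0)             # second pooling feature (salient responses)
    pooled = np.stack([avg_map, max_map])      # concatenate along the channel axis
    logits = conv2d_same(pooled, kernel)       # spatial attention weight features
    template = 1.0 / (1.0 + np.exp(-logits))   # Sigmoid keeps weights in [0, 1]
    return spectrum * template, template       # element-wise filtering

rng = np.random.default_rng(0)
spec = rng.normal(size=(2, 16, 16))            # stand-in for one spectral distribution
filtered, template = caf(spec, rng.normal(scale=0.1, size=(2, 7, 7)))
```

In a trained model the kernel would be learned so that the template is close to 0 around the zero-frequency region and close to 1 around the fundamental; with random weights the sketch only shows the data flow.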
[0056] Figure 7 schematically illustrates the internal processing flow of the convolutional preprocessing module (CPM) according to an embodiment of the present invention.
[0057] As shown in Figure 7, the CPM module may include a two-dimensional convolutional layer, a batch normalization layer, an activation function layer (e.g., a ReLU function), and a max pooling layer connected in sequence.
[0058] The CPM module's processing can be described as follows: First, a two-dimensional convolutional layer is used to extract and fuse the effective spectral components of the input multi-channel array, extracting cross-channel correlation spatial features. Then, a batch normalization layer is used to standardize the correlation spatial features of the convolutional output to accelerate training and improve stability. Nonlinearity is introduced through an activation function layer to obtain stable activation features with enhanced representational power. Finally, a max pooling layer is used to downsample the activation features to achieve dimensionality reduction and extract more robust features, thereby outputting multi-channel fused features.
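The CPM sequence (two-dimensional convolution, batch normalization, ReLU, max pooling) can be sketched in NumPy as below. The channel counts, the 3×3 kernel, the 2×2 pooling window, and the use of per-sample statistics in place of trained batch-normalization parameters are assumptions made for illustration.

```python
import numpy as np

def conv2d(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """'Same' cross-correlation. x: (Cin, H, W); weight: (Cout, Cin, k, k), odd k."""
    _, h, w = x.shape
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((weight.shape[0], h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], xp[:, i:i + h, j:j + w])
    return out

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel standardisation with this sample's statistics (stand-in for trained BN)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def cpm(components, weight: np.ndarray) -> np.ndarray:
    x = np.concatenate(components, axis=0)   # splice components along the channel axis
    x = conv2d(x, weight)                    # cross-channel correlation features
    x = np.maximum(batch_norm(x), 0.0)       # batch normalization followed by ReLU
    return max_pool2x2(x)                    # downsample, keeping salient structure

rng = np.random.default_rng(0)
components = [rng.normal(size=(2, 16, 16)) for _ in range(4)]   # four filtered spectra
fused = cpm(components, rng.normal(scale=0.1, size=(16, 8, 3, 3)))
```

Here four two-channel components become an 8-channel input, and the output is a 16-channel map at half the spatial resolution.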
[0059] Figure 8 schematically illustrates the overall structure of the efficient channel attention module (ECAM) according to an embodiment of the present invention, and Figure 9 illustrates the processing flow of the Efficient Channel Attention Layer (ECAL) within the ECAM.
[0060] As shown in Figure 8, an ECAM module can include a main branch and a shortcut connection. The main branch can contain a series of operations, such as one-dimensional convolution, batch normalization, activation functions, two-dimensional convolution, and further batch normalization, to perform non-linear transformations on the input features to obtain intermediate features. These intermediate features are then fed into an efficient channel attention layer.
[0061] As shown in Figure 9, the Efficient Channel Attention Layer (ECAL) first performs global average pooling on the input intermediate features to compress spatial information and generate a channel description vector. Then, it convolves this description vector through a one-dimensional convolutional layer (whose kernel size k can be adaptively determined or set to a fixed value) to capture cross-channel interaction information. Finally, it generates attention weights for each channel using a Sigmoid activation function, with values ranging from 0 to 1. These channel attention weights are then element-wise multiplied with the original intermediate features along the channel dimension to recalibrate channel importance. The recalibrated features are then element-wise added to the original input features of the ECAM module via a shortcut connection to form a residual structure, outputting the final features. This design avoids excessive parameter overhead while introducing channel attention, achieving a balance between efficiency and performance.
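The ECAL steps and the residual connection can be sketched in NumPy as follows. The main branch is passed in as a hypothetical callable (the demonstration uses an identity function), and the kernel size of 3 and the zero padding at the channel boundaries are assumptions.

```python
import numpy as np

def ecal(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention. features: (C, H, W); kernel: (k,) with odd k."""
    descriptor = features.mean(axis=(1, 2))               # global average pooling -> (C,)
    padded = np.pad(descriptor, len(kernel) // 2)         # zero-pad so the output keeps length C
    logits = np.correlate(padded, kernel, mode="valid")   # 1-D convolution across channels
    weights = 1.0 / (1.0 + np.exp(-logits))               # Sigmoid -> weights in (0, 1)
    return features * weights[:, None, None]              # recalibrate each channel

def ecam(x: np.ndarray, main_branch, kernel: np.ndarray) -> np.ndarray:
    """Main branch, then ECAL, then the shortcut (residual) addition."""
    intermediate = main_branch(x)          # hypothetical stand-in for conv/BN/activation
    return x + ecal(intermediate, kernel)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8))
# With an all-zero kernel every channel weight is 0.5, so an identity main
# branch gives x + 0.5 * x.
out = ecam(x, main_branch=lambda t: t, kernel=np.zeros(3))
```

The only learnable parameters in the attention layer are the k kernel weights, which reflects the low parameter overhead noted in the paragraph above.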
[0062] Please refer to Figure 10, which compares the performance of the CAFN model according to an embodiment of the present invention with that of a traditional frequency-domain analytical algorithm (such as the DMFH-S algorithm) when processing moiré fringe images generated by small-area alignment marks. The figure shows the error distribution of the two methods on the test dataset, with the horizontal axis representing the misalignment and the vertical axis representing the offset extraction error, both in nanometers (nm). The prediction error of the CAFN model (curve with dots) is significantly lower than that of the traditional algorithm (curve without dots). Specifically, the average error of CAFN on the test set is as low as 0.7 nm, while the average error of the traditional algorithm reaches hundreds of nanometers. This comparison verifies the advantage of the method of the present invention in offset extraction accuracy when processing small-area moiré fringe images with spectral aliasing. The fundamental reason is that traditional algorithms struggle to accurately separate the effective fundamental frequency component from severely aliased spectra, while the CAFN model of the present invention adaptively learns and extracts robust features through a data-driven approach.
[0063] Please refer to Figure 11, which shows a flowchart of a method for pre-training the CAFN model described above according to an embodiment of the present invention. The pre-training method includes operations S111 to S113.
[0064] In operation S111, training images are acquired and ground truth values are generated. Large-area, high-resolution images containing complete moiré fringe information are acquired by the alignment system 10. These images are processed with a mature, high-precision frequency-domain analytical algorithm (such as a Fourier transform-based algorithm) to obtain accurate alignment offsets, which serve as the ground truth values required for model training.
[0065] In operation S112, a training dataset is constructed. To simulate the imaging conditions of small-area alignment marks, multiple local image patches are cropped from the large-area training images obtained in operation S111. For example, image patches with a resolution of 80 pixels × 80 pixels can be cropped, and each training sample contains 4 such image patches (corresponding to sub-images a, b, c, d). Each sample is associated with its corresponding ground truth offset to construct a large-scale dataset of image and ground truth pairs. In one embodiment, a dataset containing millions of samples can be constructed and divided proportionally into training, validation, and test sets, for example at a ratio of 8:1:1.
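The dataset construction in operation S112 can be illustrated with the short NumPy sketch below. It is a sketch under stated assumptions: the text does not specify how crop positions are chosen or whether the four sub-images share one crop window, so a shared random window is assumed here, and the names `crop_sample` and `split_indices` are hypothetical.

```python
import numpy as np

def crop_sample(sub_images, size=80, rng=None):
    """Crop one training sample of shape (4, size, size).
    sub_images: dict with keys 'a', 'b', 'c', 'd' holding equally sized
    large-area 2-D arrays. A shared random crop window is an assumption."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = sub_images["a"].shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return np.stack([sub_images[k][top:top + size, left:left + size] for k in "abcd"])

def split_indices(n, ratios=(8, 1, 1), seed=0):
    """Shuffle n sample indices and split them into train/validation/test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Each cropped sample would be stored together with the ground truth offset computed in operation S111 for the large-area image it was taken from.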
[0066] In operation S113, end-to-end joint training is performed. Using the constructed dataset, the CAFN model (including its CAF filter module and the subsequent neural network part) undergoes end-to-end supervised training. An advanced optimizer (such as NAdamW) can be employed during training, along with a dynamic learning rate adjustment strategy to accelerate convergence. For example, the ReduceLROnPlateau callback function can be used to automatically reduce the learning rate if the validation set performance does not improve over multiple consecutive epochs. To improve efficiency, training can be performed in a distributed parallel manner in a multi-GPU environment, with an appropriate batch size, such as 128. Through iterative optimization, the model learns to directly regress high-precision alignment offsets from the input small-area moiré fringe image patches.
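The plateau-based learning rate strategy mentioned above can be illustrated with the plain-Python sketch below. It is a simplified stand-in for the rule, not the library implementation of ReduceLROnPlateau: the class name `PlateauScheduler` and the values of `factor`, `patience`, and `min_lr` are assumptions, since the text does not give them.

```python
class PlateauScheduler:
    """Reduce the learning rate when the validation loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor        # multiplicative decay (assumed value)
        self.patience = patience    # epochs without improvement tolerated (assumed)
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the current lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

In a training loop, `step` would be called after each validation pass and the returned value handed to the optimizer for the next epoch.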
[0067] Please refer to Figure 12, which shows a flowchart of a neural network-based method for extracting moiré fringe alignment offsets according to an embodiment of the present invention.
[0068] As shown in Figure 12, the neural network-based moiré fringe alignment offset extraction method in this embodiment may include operations S121 to S123.
[0069] In operation S121, each sub-image of the multiple sets of moiré fringe images generated by the alignment system 10 is converted to the frequency domain to obtain their respective spectral distributions.
[0070] In operation S122, for each spectral distribution, the spectral distribution is filtered using the convolutional attention mechanism to obtain a two-dimensional filter template for suppressing zero-frequency component interference. Based on the two-dimensional filter template, the effective spectral components in the corresponding spectral distribution are extracted. The effective spectral components carry alignment offset information.
[0071] In operation S123, feature fusion is performed on each extracted effective spectral component to obtain a multi-channel fusion feature for enhancing interference suppression, and the alignment offset between the mask and the substrate is determined based on the multi-channel fusion feature.
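The data flow of operations S121 to S123, up to the channel-wise fusion input, can be sketched in NumPy as follows. This is a hedged sketch and not the embodiment's network: representing each spectrum as a real/imaginary stack, pooling across that stack, the 7 × 7 kernel in the usage note, and the function names are all assumptions, and the learned regression part that maps the fused features to an offset is omitted.

```python
import numpy as np

def to_spectrum(img):
    """S121: centred 2-D FFT, stored as a (2, H, W) real/imaginary stack (assumed layout)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    return np.stack([f.real, f.imag])

def filter_template(spec, kernel, bias=0.0):
    """S122: pooled maps -> 2-D convolution -> sigmoid, giving a template in [0, 1].
    spec: (2, H, W); kernel: (2, k, k) with odd k (learnable in a real model)."""
    pooled = np.stack([spec.mean(axis=0), spec.max(axis=0)])   # average and max pooling
    k = kernel.shape[-1]
    pad = k // 2
    pp = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = spec.shape[1:]
    score = np.full((h, w), float(bias))
    for i in range(k):
        for j in range(k):
            score += np.tensordot(kernel[:, i, j], pp[:, i:i + h, j:j + w], axes=1)
    # tanh form of the sigmoid avoids overflow for large spectral values
    return 0.5 * (1.0 + np.tanh(0.5 * score))

def extract_and_stack(sub_images, kernels):
    """S122 filtering plus the concatenation that feeds the S123 fusion."""
    comps = []
    for img, kern in zip(sub_images, kernels):
        spec = to_spectrum(img)
        comps.append(filter_template(spec, kern) * spec)        # element-wise filtering
    return np.concatenate(comps, axis=0)                        # channel dimension
```

For four hypothetical 80 × 80 sub-images and four (2, 7, 7) kernels, `extract_and_stack` returns an (8, 80, 80) array. In a trained model the template would learn low weights near the zero-frequency peak at the centre of the shifted spectrum.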
[0072] For the specific implementation of the above operations, the details of the modules involved (such as the convolutional attention filter and the specific structure of the neural network model), and the training method, please refer to the embodiments described above with reference to Figures 1 to 11, which are not repeated here.
[0073] It should be noted that, in this document, relational terms such as “first” and “second” are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. The terms “comprising,” “including,” or any other variations are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0074] In this document, the descriptions of each embodiment have different focuses. Parts not detailed in a particular embodiment can be found in the relevant descriptions of other embodiments. The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Those skilled in the art should understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this invention should be included within the scope of protection of this invention. Therefore, the scope of protection of this invention should be determined by the scope of the claims.
Claims
1. A method for extracting moiré fringe alignment offset based on neural networks, characterized in that the method includes: each sub-image of the multiple sets of moiré fringe images generated by the alignment system is transformed to the frequency domain to obtain its respective spectral distribution; for each of the spectral distributions, a convolutional attention mechanism is used to filter the spectral distribution to obtain a two-dimensional filter template for suppressing zero-frequency component interference, and based on the two-dimensional filter template, the effective spectral components in the corresponding spectral distribution are extracted, the effective spectral components carrying alignment offset information; the extracted effective spectral components are fused to obtain multi-channel fusion features for enhancing interference suppression, and the alignment offset between the mask and the substrate is determined based on the multi-channel fusion features.
2. The method according to claim 1, characterized in that, The two-dimensional filter template is obtained by filtering the spectral distribution using a convolutional attention mechanism, including: The spectral distribution is subjected to pooling processing to obtain pooling features, which characterize the global context information of the spectral distribution; The pooling features are convolved to obtain spatial attention weight features, which characterize the importance distribution of different spatial locations in the spectral distribution; The spatial attention weight features are normalized to generate a two-dimensional attention weight template with a preset value range, which serves as the two-dimensional filter template.
3. The method according to claim 2, characterized in that, The pooling process is applied to the spectral distribution to obtain the pooling features, including: The spectral distribution is subjected to parallel global average pooling and global max pooling to obtain a first pooling feature characterizing the overall energy distribution of the spectral distribution and a second pooling feature characterizing significant local responses in the spectral distribution, respectively; The first pooling feature and the second pooling feature are concatenated to obtain the pooling feature that integrates the overall energy distribution and significant local response, thereby characterizing the global context information.
4. The method according to claim 2, characterized in that, Extracting the effective spectral components from the spectral distribution based on the two-dimensional filter template includes: The two-dimensional attention weight template is multiplied element-wise with the spectral distribution to obtain the filtered spectral data as the effective spectral component.
5. The method according to claim 1, characterized in that, The extracted effective spectral components are fused to obtain the multi-channel fused features, which include: The effective spectral components are concatenated along the channel dimension to obtain the initial multi-channel features; The initial multi-channel features are preprocessed by convolution to extract and fuse the correlation features between different effective spectral components, while suppressing redundant components in the initial multi-channel features, thereby obtaining the multi-channel fused features.
6. The method according to claim 5, characterized in that, The convolutional preprocessing of the initial multi-channel features includes: The initial multi-channel features are subjected to two-dimensional convolution processing to extract cross-channel correlation spatial features; The correlation spatial features are subjected to batch normalization and nonlinear activation processing to obtain activation features with enhanced representation capabilities. The activated features are subjected to max pooling to reduce feature dimensionality while preserving significant spatial structure, thereby obtaining the multi-channel fused features.
7. The method according to claim 1, characterized in that, Determining the alignment offset between the mask and the substrate based on the multi-channel fusion features includes: The multi-channel fusion features are weighted using a channel attention mechanism to obtain channel recalibration features that calibrate the importance of each feature channel; By performing nonlinear transformation and feature integration on the channel recalibration features, high-order abstract features characterizing the relative positional relationship between the mask and the substrate are obtained; A linear regression mapping is performed on the high-order abstract features to decode them into a scalar output, thereby obtaining the alignment offset.
8. The method according to claim 7, characterized in that, The multi-channel fusion features are weighted using a channel attention mechanism, including: The multi-channel fusion features are subjected to global pooling to compress the spatial dimension of the multi-channel fusion features and aggregate global information to obtain channel description vectors; The channel description vectors are subjected to one-dimensional convolution and non-linear activation to construct the dependencies between channels and generate channel attention weight vectors. The channel attention weight vector is multiplied element-wise by the multi-channel fusion feature along the channel dimension to recalibrate the importance of the multi-channel fusion feature, thus obtaining the channel recalibrated feature.
9. The method according to claim 7, characterized in that, Performing linear regression mapping on the high-order abstract features includes: The high-order abstract features are input into a regression network consisting of multiple linear layers connected sequentially; wherein the output dimension of each linear layer in the regression network decreases layer by layer, so as to compress and focus the high-order abstract features step by step; The alignment offset is calculated and output by regression from the last linear layer with an output dimension of 1.
10. The method according to claim 1, characterized in that, The multiple sets of moiré fringe images include a first set of moiré fringe images and a second set of moiré fringe images with different frequencies, each set containing sub-images with different phases; the first set of moiré fringe images and the second set of moiré fringe images have different spectral distribution center positions in the frequency domain; The spectral distributions of the first set of moiré fringe images and the second set of moiré fringe images are filtered using a first filter module and a second filter module with different learnable parameters, respectively.