Multi-modal deep learning network-based overlay mark asymmetry compensation method and apparatus
By using a multimodal deep learning network to address the asymmetry problem of overlay markings in integrated circuit manufacturing, high-precision error prediction and automated compensation are achieved. This solves the measurement deviation and process instability caused by target asymmetry, and improves production efficiency and model adaptability.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- MZ OPTOELECTRONIC TECHNOLOGY (SHANGHAI) CO LTD
- Filing Date
- 2025-12-30
- Publication Date
- 2026-06-25
AI Technical Summary
In integrated circuit manufacturing, the asymmetry of the target leads to measurement accuracy problems, especially in high-precision applications such as automotive electronics and new energy power chips. Traditional methods cannot effectively deal with the asymmetry of overlay marks, resulting in problems such as measurement deviation, data inconsistency, process instability and resource waste.
By employing a multimodal deep learning network, we can extract and fuse features from manufacturing data of different modalities, predict the asymmetric error of overlay marks, and compensate for it in actual production. We can also use machine learning technology to automate feature engineering and reduce manual intervention.
It improves measurement accuracy and process stability, reduces rework rate, increases production efficiency, enhances the model's adaptability to changing production environments, and provides more accurate analysis results.
Abstract
Description
Method and apparatus for overlay mark asymmetry compensation based on a multimodal deep learning network
Technical Field
[0001] This application relates to the field of semiconductor manufacturing technology, and more specifically, to a method and apparatus for compensating overlay mark asymmetry based on a multimodal deep learning network.
Background Technology
[0002] In integrated circuit manufacturing, processes that use third-generation compound materials differ from the relatively mature silicon-based processes. They typically exhibit distinctive characteristics such as thick films, residues, and mark asymmetry. This asymmetry significantly impacts measurement accuracy, particularly in high-precision applications such as automotive electronics and new energy power chips.
[0003] In the manufacturing of automotive electronics and new energy power chips, traditional solutions cannot meet the measurement accuracy challenges caused by target asymmetry and cannot cope with the asymmetry of overlay marks.
[0004] Therefore, a new scheme for compensating overlay mark asymmetry is needed.
Summary of the Invention
[0005] In view of this, embodiments of this specification provide a method and apparatus for compensating overlay mark asymmetry based on a multimodal deep learning network.
[0006] The embodiments in this specification provide the following technical solutions:
[0007] This specification provides an embodiment of a method for compensating overlay mark asymmetry based on a multimodal deep learning network, including:
[0008] Acquire manufacturing data for different modalities;
[0009] Manufacturing data from different modalities are input into a multimodal deep learning network for processing to obtain the prediction error of overlay mark asymmetry. The multimodal deep learning network extracts features from the manufacturing data of different modalities and fuses the corresponding vector features of different modalities. It obtains the prediction error of overlay mark asymmetry by associating and combining the physical parameters of actual production during the manufacturing process with the parameters observed by the equipment.
[0010] Based on the prediction error of asymmetric overlay marks, compensation is made for the actual overlay marks in actual production.
[0011] This specification provides an embodiment of an apparatus for compensating overlay mark asymmetry based on a multimodal deep learning network, comprising:
[0012] The acquisition module is used to acquire manufacturing data of different modalities;
[0013] The processing module is used to input manufacturing data of different modalities into a multimodal deep learning network for processing to obtain the prediction error of overlay mark asymmetry. The multimodal deep learning network extracts features from the manufacturing data of different modalities and fuses the vector features corresponding to different modalities. It obtains the prediction error of overlay mark asymmetry by associating and combining the physical parameters of actual production in the manufacturing process with the parameters observed by the equipment.
[0014] The compensation module is used to compensate for the actual overlay marks in actual production based on the predicted error of the asymmetry of the overlay marks.
[0015] Compared with the prior art, the beneficial effects that at least one technical solution adopted in the embodiments of this specification can achieve include at least:
[0016] The embodiments of this specification provide an overlay mark asymmetry compensation method based on a multimodal deep learning network. By integrating data from different sources (such as images, structured data, and text), the method can comprehensively mine potential features and thereby provide more accurate analysis results. It can also automatically learn and optimize feature engineering, reducing reliance on manual intervention and improving the efficiency and accuracy of feature extraction. Furthermore, by integrating data from different modalities, the model can adapt to various processes and products, which enhances its generalization ability and prediction accuracy and keeps it effective in changing production environments.
Attached Figure Description
[0017] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0018] Figure 1 is a schematic diagram of the asymmetry of overlay markings in the prior art;
[0019] Figure 2 is a schematic diagram of an actual SEM image of asymmetric marks in an embodiment of this application;
[0020] Figure 3 is a schematic diagram of overlay mark asymmetry compensation using a multimodal deep learning network according to this application;
[0021] Figure 4 is a flowchart of the overlay mark asymmetry compensation method based on a multimodal deep learning network provided in this application;
[0022] Figure 5 is a schematic diagram comparing the trends of the asymmetry results in this application;
[0023] Figure 6 shows the distribution of residual data after asymmetry compensation in this application.
Detailed Implementation
[0024] The embodiments of this application will now be described in detail with reference to the accompanying drawings.
[0025] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. This application can also be implemented or applied through other different specific embodiments, and the details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, in the absence of conflict, the following embodiments and features in the embodiments can be combined with each other. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0026] It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It will be apparent that the aspects described herein can be embodied in a wide variety of forms, and any particular structure and / or function described herein is merely illustrative. Based on this application, those skilled in the art will understand that one aspect described herein can be implemented independently of any other aspect, and two or more of these aspects can be combined in various ways. For example, any number and aspects set forth herein can be used to implement the device and / or practice the method. Additionally, this device and / or method can be implemented using structures and / or functionalities other than one or more of the aspects set forth herein.
[0027] It should also be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of this application. The drawings only show the components related to this application and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0028] Additionally, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that practice can be carried out without these specific details.
[0029] Integrated circuit manufacturing processes often have unique characteristics, such as thick films, residues, and asymmetric markings. The asymmetry of the target significantly impacts measurement accuracy, especially in applications requiring high precision, such as automotive electronics and new energy power chips.
[0030] Asymmetry in overlay marks has the following effects:
[0031] The effect of asymmetry on measurement accuracy
[0032] Target asymmetry can lead to a series of measurement errors, which in turn affect product quality and performance:
[0033] Measurement deviation: Measurement deviation caused by asymmetry directly affects the setting of process parameters, which may lead to products not meeting specifications, thereby affecting product reliability and market competitiveness.
[0034] Data inconsistency: During production, different batches of products may yield inconsistent measurement results because of target asymmetry, which makes quality control more difficult.
[0035] The impact of overlay mark asymmetry on the fab
[0036] Overlay mark asymmetry has a significant impact on the overall efficiency and product quality of a semiconductor fab. Specifically, this manifests as follows:
[0037] Quality fluctuations: Measurement errors caused by asymmetry can affect wafer uniformity and performance, leading to a decrease in product yield. Final products may be discarded due to non-compliance, increasing production costs.
[0038] Process stability: The presence of asymmetry may render the setting of process parameters ineffective, leading to increased fluctuations in the production process and affecting the stability and repeatability of the process.
[0039] Resource waste: Due to measurement errors caused by asymmetry, resources (such as raw materials and time) may be wasted in the production process.
[0040] Substandard products increase overall production costs and environmental burden.
[0041] Challenges in Fab Applications
[0042] In wafer fabrication (fab) environments, complex processes such as thick-film coating frequently cause target asymmetry, bringing new challenges:
[0043] Increased processing difficulty: The requirements of the thick-film process make the product more prone to asymmetry during formation, which affects subsequent measurement and inspection.
[0044] Cost of rework: To address the problems caused by asymmetry, production lines often need to be reworked multiple times. This not only wastes materials and time but also increases production costs and negatively impacts overall efficiency.
[0045] Although there are ways to deal with asymmetry problems in existing technologies, traditional methods rely heavily on human intervention, which is not only time-consuming but also easily affected by subjective factors and may not fully reflect the complexity of asymmetry.
[0046] Many traditional models perform well on specific datasets, but their predictive ability is poor on new data or under different process conditions, i.e., they suffer from overfitting and poor generalization. This limitation restricts their application in variable production environments.
[0047] Traditional methods often fail to fully extract the potential information in multimodal data and fail to achieve deep learning of features, resulting in an incomplete understanding of target asymmetry.
[0048] Based on this, the embodiments of this specification propose a new scheme for compensating overlay mark asymmetry using a multimodal deep learning network, which aims to solve the measurement error caused by target asymmetry. By introducing a multimodal learning network, the scheme fully explores and uses the various data features to enhance the generalization ability of the model. It also uses machine learning to make the equipment more intelligent, providing an effective source-tracing and localization solution for process problems in wafer manufacturing.
[0049] The following are explanations of technical terms:
[0050] Overlay error (OVL) describes the deviation between the current lithography layer and the previous lithography layer in the photolithography process, thereby enabling monitoring and providing data to correct the process.
[0051] Overlay marks (OVL-Mark) are graphic marks used to measure overlay error. A number of these marks are distributed within each lithography unit. By measurement type, they are generally divided into imaging-based measurement (IBO, Image-Based Overlay) and non-imaging diffraction-based measurement (DBO, Diffraction-Based Overlay), and the mark patterns differ accordingly. Specific patterns are illustrated in part A of Figure 1.
[0052] Asymmetric marks: overlay marks are typically designed to be symmetrical, so that they have a symmetrical physical morphology when patterned in photoresist. However, certain special processes can introduce a degree of asymmetry into the marks. These processes include CMP (Chemical Mechanical Polishing), STI (Shallow Trench Isolation), thick resist processes, and double-layer resist processes. This asymmetry is a type of processing error, as illustrated in part B of Figure 1. The aim is to minimize the measurement inaccuracy that such processing errors cause in the OVL measurement system.
[0053] CIS (CMOS Image Sensor) is a sensor used to receive light signals from markers on a test sample.
[0054] The technical solutions provided by the various embodiments of this application are described below with reference to the accompanying drawings.
[0055] As shown in Figures 3 and 4, this application provides a method for asymmetric compensation of overlay marks using a multimodal deep learning network, including steps S401-S403. Step S401 involves acquiring manufacturing data from different modalities. Step S402 involves inputting the manufacturing data from different modalities into a multimodal deep learning network for processing to obtain the asymmetric prediction error of the overlay marks. The multimodal deep learning network extracts features from the manufacturing data of different modalities and fuses the corresponding vector features of different modalities. It obtains the prediction error of the asymmetric overlay marks by associating and combining the physical parameters of actual production during the manufacturing process with the parameters observed by the equipment. Step S403 involves compensating for the actual overlay marks in actual production based on the prediction error of the asymmetric overlay marks.
[0056] The manufacturing data of different modalities mainly concerns the overlay marks used in the photolithography process of semiconductor manufacturing. An overlay mark not only marks specific position and size information on the wafer to ensure the accuracy and consistency of subsequent processes, but also helps to monitor and correct small deviations in the production process, ensuring that the circuit patterns on each wafer are precisely aligned so that the required chip precision is achieved.
[0057] Step S401 involves acquiring manufacturing data from different modalities, such as historical and real-time data, including data from various sources. This means integrating data from different sources and utilizing a multimodal deep learning network to comprehensively uncover potential features.
[0058] In step S402, manufacturing data from different modalities are input into a multimodal deep learning network for processing. This allows for the integration of data from different modalities through the multimodal deep learning network, providing more accurate analysis results. Specifically, the multimodal deep learning network extracts features from the manufacturing data of different modalities and fuses the corresponding vector features. This process involves associating the actual physical parameters of the manufacturing process with the parameters observed by the equipment (i.e., linking the process with the measurement of asymmetric overlay marks). This provides in-depth source tracing and localization analysis for asymmetry and other process problems, quickly identifying potential process defects or anomalies, thereby obtaining a more accurate prediction error for overlay mark asymmetry.
[0059] The physical parameters of actual production include specific features that affect the wafer surface morphology and have a significant impact on asymmetry during wafer manufacturing, such as wafer images and process parameters. The parameters observed by the equipment include: measurement images of the overlay marks on the monitored targets, equipment error analysis data, signal data measured by the equipment, equipment hardware parameters, and equipment software parameters.
[0060] In some embodiments, historical parameters observed by the equipment are also included. They serve not only as training data but also as potential factors for the model's feature representation, enabling more accurate predictions.
[0061] In step S403, the actual overlay marks are compensated based on the prediction error of the asymmetric overlay marks predicted by the multimodal deep learning network.
[0062] The residual is calculated after compensating for the predicted error caused by the asymmetry of the overlay marks. In some embodiments, it has been verified that this application can effectively predict and compensate for the error caused by asymmetry.
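The compensation and residual calculation described in [0061] and [0062] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the predicted asymmetry error is simply subtracted from the measured overlay value on each axis, and that the residual is taken against a reference overlay value. The function names, the nanometre unit, and all numbers are hypothetical.

```python
from typing import List, Tuple

Overlay = Tuple[float, float]  # (x, y) overlay value; nanometres is an assumed unit


def compensate(measured: Overlay, predicted_error: Overlay) -> Overlay:
    """Subtract the predicted asymmetry-induced error from a measured overlay value."""
    return (measured[0] - predicted_error[0], measured[1] - predicted_error[1])


def residuals(compensated: List[Overlay], reference: List[Overlay]) -> List[Overlay]:
    """Residual per mark: compensated overlay minus an assumed reference overlay."""
    return [(c[0] - r[0], c[1] - r[1]) for c, r in zip(compensated, reference)]


# Hypothetical values for two overlay marks (illustration only).
measured = [(2.5, -1.0), (3.0, 0.5)]
predicted = [(0.5, -0.25), (1.0, 0.25)]
reference = [(2.0, -0.75), (1.75, 0.25)]

compensated = [compensate(m, p) for m, p in zip(measured, predicted)]
res = residuals(compensated, reference)
```

A residual distribution centred near zero after compensation would indicate that the predicted error captures most of the asymmetry-induced deviation.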
[0063] In some embodiments, prediction errors for target asymmetry are generated and the prediction errors and feature importance analysis are displayed using visualization tools such as charts and dashboards.
[0064] In some embodiments, the multimodal deep learning network performs multi-level feature fusion. Feature vectors of the different modalities are extracted from the manufacturing data of the different modalities and are directly concatenated at the input layer to obtain an initial feature vector. In the intermediate layers, the feature vectors of each modality are fused by adding fully connected layers, and the outputs of the fully connected layers are concatenated again; an activation function or a dimensionality reduction layer is used to enhance the output feature vectors of the fully connected layers. Finally, an attention mechanism applies attention weights to each modality's feature vector to obtain the output feature vector after feature weight allocation.
[0065] Specifically, early fusion directly concatenates the features of each modality into a single feature vector at the input layer. Image feature vectors, structured data feature vectors, and text feature vectors are directly concatenated to form a large vector. Assuming the manufacturing data from different modalities are represented as image feature vectors (v_img), structured data feature vectors (v_struct), and text feature vectors (v_text), the initial feature vector is represented as v_earlyfusion = [v_img, v_struct, v_text]. By fusing data from different sources at the input layer to form a unified feature representation, model training is facilitated, and the complementary advantages of different data sources are fully utilized. This provides more accurate information for overlay marking asymmetry analysis, enhancing adaptability and reliability under various environmental conditions.
[0066] Early fusion eliminates the need for complex network structures to process data from different modalities, simplifying the model and training process. It also reduces information loss by directly integrating features from all modalities. Furthermore, it enables parallel feature extraction from different modalities, facilitating rapid data processing.
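Early fusion as described above amounts to concatenating the per-modality feature vectors. The sketch below shows only that step; the vector lengths and values are hypothetical, and the feature extraction that would produce them is assumed to have happened already.

```python
from typing import List


def early_fusion(v_img: List[float], v_struct: List[float], v_text: List[float]) -> List[float]:
    """Concatenate per-modality feature vectors into one input-layer vector."""
    return [*v_img, *v_struct, *v_text]


# Hypothetical feature vectors for one wafer (illustration only).
v_img = [0.12, 0.80, 0.33]   # image features
v_struct = [88.5, 1.2]       # structured data features
v_text = [0.0, 1.0]          # text features

v_earlyfusion = early_fusion(v_img, v_struct, v_text)
```

In practice the modalities usually have very different scales, so some normalization before concatenation is a common design choice; the source does not specify one at this stage.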
[0067] The process involves fusion using one or more fully connected layers, fusing features at intermediate layers. Image features and structured data features are extracted separately, and then fully connected layers are added to each layer in the intermediate layers. The outputs of these fully connected layers are then concatenated. The fused feature representation capability is improved by adding activation functions or dimensionality reduction layers; activation functions introduce nonlinearity to enhance the learning of complex feature combinations and interactions; dimensionality reduction layers, such as Principal Component Analysis (PCA), reduce redundancy and retain key information.
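The intermediate fusion step can be sketched as one fully connected layer per modality, followed by an activation function and concatenation of the layer outputs. This is a minimal sketch under stated assumptions: ReLU is chosen as the activation function, the weights are fixed by hand rather than learned, and all names and values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, bias)


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y[i] = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Nonlinear activation (an assumed choice; the source names no specific function)."""
    return [max(0.0, v) for v in x]


def intermediate_fusion(v_img: Vector, v_struct: Vector,
                        img_layer: Layer, struct_layer: Layer) -> Vector:
    """Pass each modality through its own layer, then concatenate the outputs."""
    h_img = relu(dense(v_img, *img_layer))
    h_struct = relu(dense(v_struct, *struct_layer))
    return h_img + h_struct


# Hand-picked weights and inputs (illustration only; real weights are learned).
img_layer = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
struct_layer = ([[2.0]], [1.0])
fused = intermediate_fusion([2.0, 3.0], [1.5], img_layer, struct_layer)
```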
[0068] By incorporating an attention mechanism and applying attention weights to the feature vectors of each modality, the model automatically learns the weight of each modality. With image feature weight w_img, structured data feature weight w_struct, and text feature weight w_text, the final feature vector after feature weight allocation is represented as v_final = v_img*w_img + v_struct*w_struct + v_text*w_text. In other words, the weighted features are integrated into a unified feature vector to obtain the output feature vector.
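The attention-weighted combination can be illustrated with a softmax over per-modality scores followed by a weighted sum. This is a sketch under assumptions not stated in the source: the three feature vectors are taken to have already been projected to the same dimension, the weights come from a softmax, and the scores are fixed here rather than learned. All names and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: Dict[str, List[float]],
                   scores: Dict[str, float]) -> Tuple[Dict[str, float], List[float]]:
    """Weight each modality's vector by its attention weight and sum element-wise."""
    names = list(features)
    weights = dict(zip(names, softmax([scores[n] for n in names])))
    dim = len(features[names[0]])
    fused = [sum(weights[n] * features[n][i] for n in names) for i in range(dim)]
    return weights, fused


# Hypothetical equal-length feature vectors and attention scores.
features = {"img": [1.0, 0.0], "struct": [0.0, 1.0], "text": [1.0, 1.0]}
scores = {"img": 2.0, "struct": 1.0, "text": 0.0}
weights, v_final = attention_fuse(features, scores)
```

Here `weights["img"]`, `weights["struct"]`, and `weights["text"]` play the roles of w_img, w_struct, and w_text.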
[0069] In some embodiments, the multimodal deep learning network further includes preprocessing and classifying the input data, and extracting features using different classification designs. During feature fusion, the extracted high-dimensional features are reduced or normalized, and principal components are retained based on feature contributions. Specifically, feature extraction is performed on the input image data to obtain asymmetric image visual features, which are then fused and correlated with structured data features during feature fusion. Features are extracted from the structured data to obtain the combined interaction effect between process parameters and equipment status, and to obtain the evaluation score of each feature in the structured data, so as to obtain the contribution of each feature to the output feature vector during feature fusion. Temporal features are extracted from the text, and the correlation between contextual information is obtained based on the temporal features, so as to set key factors in the context during feature fusion, thereby increasing potential factors for compensation of overlay mark asymmetry.
[0070] Specifically, data from different modalities, including wafer images, process parameters, equipment status, and related production documents, will be collected. The data will be preprocessed to ensure its quality and consistency. Different classification designs will be used for feature extraction. During feature fusion, the extracted high-dimensional features will be reduced or normalized, and principal components will be retained based on feature contributions. A feature contribution analysis algorithm similar to PCA will be used to reduce the dimensionality of high-dimensional images and structured data, reduce redundancy, and retain the most representative features on the principal components, ensuring that the feature dimensions are compact and information-rich.
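The dimensionality reduction step can be illustrated with plain PCA. The source only says that an algorithm "similar to PCA" is used, so the following is a stand-in rather than the applicant's algorithm: it finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it. The data and function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def first_principal_component(rows: Matrix, iterations: int = 100) -> Tuple[List[float], List[float]]:
    """Return (column means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows: Matrix, means: List[float], component: List[float]) -> List[float]:
    """Reduce each row to one coordinate along the principal component."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(means)))
            for row in rows]


# Hypothetical, perfectly correlated features (second column = 2 * first).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, pc1 = first_principal_component(data)
scores = project(data, means, pc1)
```

Because the two columns are redundant, a single component retains all of the variance, which is the sense in which redundancy is reduced while key information is kept.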
[0071] For example, feature extraction from input image data can be performed using methods such as Convolutional Neural Networks (CNNs) and Vision Transformers, primarily extracting visual features of asymmetry in the image (such as edges, textures, and shapes). Each image is input into a CNN, which outputs a high-dimensional feature vector to be fused and correlated with structured data features during feature fusion. Neural networks for images also include ResNet, Inception, Transformer, or other networks.
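The basic operation behind the edge features a CNN extracts is a sliding-window product between the image and a small kernel. The sketch below applies one hand-written horizontal-gradient kernel to a tiny synthetic image; a real CNN learns many such kernels and stacks them in layers. The image, kernel, and function name are hypothetical.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image without padding (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


# Synthetic 3x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1]]  # responds to left-to-right intensity increases

edge_map = conv2d_valid(image, kernel)
```

The response is nonzero only where the intensity changes, so the position and strength of the two edges of a mark can be compared to characterize asymmetry.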
[0072] It should be noted that the embodiments in this specification are for an imaging-based OVL error measurement method. The image is the first intuitive data acquired by the device, and image feature extraction is also very important. Much of the structured data is also obtained through image signal processing.
[0073] Features extracted from structured data, such as the glue wall angle and trench depth, have a clear physical correlation with asymmetry. By obtaining an evaluation score for each feature in the structured data and extracting importance weights with specific algorithms, the magnitude of each process parameter's impact on asymmetry can be explained intuitively, revealing the hierarchical relationship between process parameters and equipment states. Structured data feature extraction can also expose interaction effects among process parameters, among equipment states, and between process parameters and equipment states. For example, the synergistic effect between the glue wall angle and sidewall height may exacerbate asymmetry; such interaction characteristics cannot be directly reflected by image and text features. In other words, the influence of two or more interacting parameters on the output is obtained, so that the contribution of each feature to the output feature vector can be determined during feature fusion.
[0074] Text such as production documents (the descriptive information they contain) is converted into vector representations. Recurrent Neural Networks (RNNs) and variants such as LSTMs are used to extract temporal features, and the correlation between pieces of contextual information is obtained from these temporal features. Potential factors of asymmetry are extracted from the textual descriptions, which facilitates the subsequent setting of key contributing factors and guides process improvement. That is, during feature fusion, key factors in the context (such as equipment hardware or software parameters obtained from historical data) are set, adding potential factors for compensating overlay mark asymmetry.
[0075] In some embodiments, multiple machine learning models are selected to train the multimodal deep learning network, including random forests for classification and regression, evaluating the importance of features; Bayesian classifiers for probabilistic prediction, providing probability estimates of asymmetric features; and Markov chains for time-series data analysis, capturing temporal dependencies in the data.
[0076] In some embodiments, when a random forest or decision tree model is used to extract features from structured data, the split nodes corresponding to process parameters and equipment status are evaluated, and the results are sorted in descending order to obtain the evaluation score of each feature. In some embodiments, the structured data processing network also includes an MLP (Multilayer Perceptron), RNN, or other network, and unnecessary noise is reduced by keeping only the structured features of high importance.
[0077] By training a network to process structured data, an "importance score" is obtained for each feature, and these scores are sorted in descending order. Importance assessment based on tree models (such as random forests and decision trees) can automatically calculate the contribution of each feature to the target variable. During training, random forests evaluate the split nodes of each feature to determine how much the feature improves the model's predictive accuracy. Generally, features that provide higher information gain at split nodes are assigned higher importance.
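The idea of scoring a feature by how much its split nodes improve the prediction can be shown with a single-split simplification. The sketch below is not a random forest: it evaluates, for each feature, the best single threshold split by variance reduction of the target and sorts the features in descending order of that gain. A random forest aggregates such gains over many trees and many splits. The feature names, values, and target are hypothetical.

```python
from typing import Dict, List, Tuple


def variance(values: List[float]) -> float:
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split_gain(feature: List[float], target: List[float]) -> float:
    """Largest variance reduction achievable with one threshold on this feature."""
    parent = variance(target)
    best = 0.0
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(target)
        best = max(best, parent - weighted)
    return best


def rank_features(table: Dict[str, List[float]], target: List[float]) -> List[Tuple[str, float]]:
    """Score every feature and sort the scores in descending order."""
    scores = {name: best_split_gain(column, target) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical structured data for four wafers and their asymmetry error.
table = {
    "glue_wall_angle": [85.0, 86.0, 89.0, 90.0],
    "trench_depth": [1.0, 2.0, 1.0, 2.0],
}
asymmetry_error = [0.1, 0.1, 0.9, 0.9]

ranked = rank_features(table, asymmetry_error)
```

In this toy data the wall angle separates low and high errors perfectly, while trench depth does not separate them at all, so the angle ranks first.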
[0078] In some embodiments, a Bayesian algorithm is used to obtain the interaction effect between process parameters and equipment status after feature extraction from structured data, in order to eliminate noisy data.
[0079] Specifically, Bayesian algorithms can classify data based on conditional probabilities. After extracting features from structured data, they can reveal the interaction effects between process parameters and equipment status. For example, the synergistic effect between the glue wall angle and sidewall height may exacerbate asymmetry. These interactive features cannot be directly reflected by image and text features. The process parameter and equipment status data in the embodiments of this specification often have high noise levels. Bayesian algorithms are particularly effective for such noisy or highly uncertain structured data, modeling the probability association between each feature and asymmetry, thereby better understanding the impact of structured data on asymmetry—something traditional methods cannot achieve.
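Classification by conditional probability can be sketched with a categorical naive Bayes classifier. This is an assumed, simplified stand-in for the Bayesian algorithm mentioned above: it uses Laplace smoothing and treats the features as conditionally independent given the label, so it models the probabilistic association between each feature and asymmetry but not joint interaction effects. The feature values, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def train_naive_bayes(samples: List[Sequence[str]], labels: List[str]) -> Tuple:
    """Count labels, per-label feature values, and the distinct values of each feature."""
    label_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return label_counts, feature_counts, values


def posterior(model: Tuple, features: Sequence[str]) -> Dict[str, float]:
    """P(label | features) using Bayes' rule with Laplace (add-one) smoothing."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        p = count / total  # prior
        for i, value in enumerate(features):
            p *= (feature_counts[(label, i)][value] + 1) / (count + len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: p / norm for label, p in scores.items()}


# Hypothetical records: (wall angle deviation, equipment status) -> outcome.
samples = [("high", "worn"), ("high", "ok"), ("low", "ok"), ("low", "ok")]
labels = ["asym", "asym", "sym", "sym"]

model = train_naive_bayes(samples, labels)
post = posterior(model, ("high", "worn"))
```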
[0080] In some embodiments, the method further includes using a Markov chain to obtain context information corresponding to the text information in historical data based on temporal characteristics.
[0081] Specifically, a Markov chain is used to construct a state transition matrix based on temporal characteristics, treating the text feature vector as the state. This matrix describes the probability of transitioning from one state (text feature vector) to another. Each state transition is viewed as a connection between historical data points, obtaining contextual information corresponding to the text information in historical data, and making predictions by analyzing the state transition probabilities. The embodiments in this specification not only analyze from the perspective of parameter prediction error but also predict future errors based on performance on long-term data of the same type.
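The state transition matrix can be estimated by counting consecutive state pairs in a historical sequence and normalizing each row. The sketch below assumes the text feature vectors have already been mapped to a small set of discrete state labels; that mapping, the labels, and the sequence are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

TransitionMatrix = Dict[str, Dict[str, float]]


def transition_matrix(states: List[str]) -> TransitionMatrix:
    """Estimate P(next state | current state) from consecutive pairs in the sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def most_likely_next(matrix: TransitionMatrix, state: str) -> str:
    """Predict the next state as the one with the highest transition probability."""
    row = matrix[state]
    return max(row, key=row.get)


# Hypothetical state sequence derived from historical production documents.
history = ["normal", "normal", "drift", "alarm", "normal", "drift", "drift", "alarm"]
matrix = transition_matrix(history)
prediction = most_likely_next(matrix, "drift")
```

Each row of the matrix sums to 1, and the prediction for a given state depends only on that state, which is the Markov assumption.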
[0082] In some embodiments, the data of the different modalities includes historical and real-time data from the wafer fabrication process, categorized as wafer image data, structured data, and production documents.
[0083] Structured data refers to a dataset containing many entries, which include the following parameters:
[0084] 1. Process parameters
[0085] Process parameters encompass specific characteristics that influence the surface morphology of wafers. These parameters significantly impact asymmetry during wafer fabrication, and specifically include:
[0086] Asymmetric morphology: refers to the irregular shape features of a wafer surface or structure, including possible asymmetric features such as offset and bending.
[0087] Glue wall angle (resist wall angle): The angle of the wall formed by the thick resist (glue) material. If the angle deviates too much, it leads to asymmetrical morphology in subsequent steps.
[0088] Sidewall width: The thickness of the sidewalls of the wafer structure. This parameter affects the accuracy of photolithography or etching. Wider sidewalls result in more complex asymmetries.
[0089] Trench depth: refers to the depth of the trench structure on the wafer. Trenches that are too deep or too shallow can cause asymmetric phenomena such as light deflection or scattering during the photolithography process.
[0090] Groove radius: The radius of curvature or fillet inside the groove. A smaller radius of curvature is more likely to cause asymmetrical deformation.
[0091] Sidewall height: The height of the sidewalls of the wafer structure. This parameter affects the symmetry of photolithography or etching. Height deviations can make it difficult to achieve symmetry control during manufacturing.
[0092] 2. Equipment Status
[0093] Measurement images: Image data captured by the equipment when measuring a wafer, used to detect and evaluate changes in asymmetric morphology.
[0094] Measurement error data: refers to the error data generated by the equipment during the measurement process, including position deviation, signal fluctuation, etc.
[0095] Equipment error analysis data: By analyzing the source and type of equipment measurement errors, systematic deviations during equipment operation can be identified, which helps in equipment calibration.
[0096] Signal data measured by the equipment: The multimodal signals (such as reflected light, transmitted light, interference signals, etc.) acquired by the equipment when measuring the wafer can provide important information about the surface characteristics of the wafer.
[0097] Device hardware parameters:
[0098] Light source: The type of light source used in the equipment (such as laser, LED, etc.). Different light sources have different effects on the measurement accuracy of specific materials.
[0099] Wavelength: The wavelength parameter of a light source, which corresponds to the properties and absorption characteristics of different materials.
[0100] Numerical aperture (NA): Affects resolution and focusing accuracy, and is an important parameter for image quality and precision.
[0101] Focal offset: The amount of focus shift when the equipment performs measurements at different levels, which affects the accuracy and consistency of the measurement.
[0102] Device software parameters:
[0103] Measurement algorithm parameters: The type of measurement algorithm used by the device (such as edge detection, shape matching, etc.). Different algorithms have different effects on asymmetric measurement.
[0104] Optimization algorithm parameters: The optimization algorithm of the equipment's measurement process (such as the error correction algorithm) affects the stability and accuracy of the measurement.
[0105] Preprocessing algorithm parameters: Preprocessing algorithms for measurement data (such as filtering and noise reduction) can improve the reliability and consistency of the data.
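One possible way to hold a single structured-data entry with the parameters listed above is sketched below. The field names, units, and grouping are assumptions made for illustration and are not defined by this specification; only a subset of the listed parameters is shown.

```python
from dataclasses import dataclass, asdict

@dataclass
class ProcessParameters:
    wall_angle_deg: float
    sidewall_width_nm: float
    trench_depth_nm: float
    groove_radius_nm: float
    sidewall_height_nm: float

@dataclass
class HardwareParameters:
    light_source: str
    wavelength_nm: float
    numerical_aperture: float
    focal_offset_nm: float

@dataclass
class StructuredRecord:
    process: ProcessParameters
    hardware: HardwareParameters
    measurement_algorithm: str

    def numeric_features(self):
        """Flatten the numeric fields into one list for feature extraction."""
        values = list(asdict(self.process).values())
        values += [
            self.hardware.wavelength_nm,
            self.hardware.numerical_aperture,
            self.hardware.focal_offset_nm,
        ]
        return values

# Hypothetical values, for illustration only.
record = StructuredRecord(
    process=ProcessParameters(88.5, 120.0, 300.0, 15.0, 450.0),
    hardware=HardwareParameters("LED", 532.0, 0.7, 20.0),
    measurement_algorithm="edge_detection",
)
features = record.numeric_features()
```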
[0106] Production documents refer to all kinds of documents and records generated during the production process.
[0107] In some embodiments, cross-validation is used to evaluate model performance and select the best model for final prediction.
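A minimal sketch of cross-validated model selection is given below. The two candidate models (a constant mean predictor and a one-variable least-squares line) and the data are hypothetical stand-ins chosen to keep the example short; the specification does not state which models or how many folds are used.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cross_val_mse(fit, xs, ys, k=3):
    """Mean squared error over all held-out points across k folds."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += [(model(xs[i]) - ys[i]) ** 2 for i in fold]
    return sum(errors) / len(errors)

# Hypothetical data: one process parameter vs. asymmetry error.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.2, 4.8]
scores = {
    "mean": cross_val_mse(fit_mean, xs, ys),
    "line": cross_val_mse(fit_line, xs, ys),
}
best = min(scores, key=scores.get)
```

Because every point is scored only by a model that never saw it during fitting, the comparison reflects generalization rather than fit to the training data.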
[0108] Correspondingly, the embodiments of this specification introduce a multimodal method to predict errors. The trained multimodal deep learning network generalizes more broadly and is applicable to wafers of different layers and wafers of different processes.
[0109] The embodiments in this specification utilize multimodal deep learning networks to analyze historical and real-time data, enabling in-depth tracing and localization analysis of asymmetries. By integrating different types of data, the generalization ability of the model is improved, maintaining high accuracy for different processes and wafer products and exhibiting good adaptability in various complex production environments.
[0110] Some embodiments were verified experimentally. Figure 2 shows an actual SEM image of an asymmetric mark. The bevel of the mark was measured by SEM, the edge difference was defined, and the actual asymmetry difference was observed in the SEM image. These results were then compared with the model's predictions to check accuracy. As shown in Figure 5, the data validity verification indicates that the asymmetry results obtained by SEMCD and by this application follow a consistent trend, which supports the validity of the data.
[0111] The asymmetry-induced error remaining between the model-compensated data and the actual values was then calculated; the residuals after compensation are shown in Figure 6. In this data validity verification, the residuals after asymmetry compensation are distributed around the zero axis, with a mean residual of 0.7 nm. The experiment indicates that the system can effectively predict and compensate for the error caused by asymmetry.
[0112] The target asymmetry evaluation system based on multimodal learning in the embodiments of this specification can overcome the limitations of traditional methods and has the following advantages:
[0113] Deep feature mining: By integrating data from different sources (such as images, structured data, and text), multimodal systems can comprehensively mine potential features, thereby providing more accurate analysis results.
[0114] Automated feature engineering: Multimodal learning networks can automatically learn and optimize feature engineering, reducing reliance on manual intervention and improving the efficiency and accuracy of feature extraction.
[0115] Enhanced generalization ability: By integrating data from different modalities, the model can adapt to various processes and products, improve prediction accuracy, and maintain high efficiency in changing production environments.
[0116] The machine learning workflow is carried out as follows:
[0117] Data preparation: Input data from various sources, including image data, structured data, and relevant production documents.
[0118] Preprocess the data to ensure its quality and consistency.
[0119] Image feature processing: Visual features are extracted from images using a pre-trained convolutional neural network (CNN). Each image is fed into the CNN, which outputs a high-dimensional feature vector. The model can also include ResNet, Inception, Transformer, or other networks.
[0120] Structured data feature processing: Algorithms such as decision trees, random forests, and Bayesian methods are used to extract features from structured data. The importance of each feature is also evaluated, and the parameters with the greatest impact on target asymmetry are selected. The processing networks may include MLPs, RNNs, and other networks.
[0121] Text data feature processing: Recurrent neural networks (RNNs) are used to process text data.
[0122] Feature fusion: The extracted features are fused. Principal component analysis (PCA) is used to reduce the dimensionality of image features and structured features, reduce redundancy, retain the main information, and optimize the fused features.
[0123] That is, feature standardization and preprocessing
[0124] Image: Image features extracted using deep models such as CNNs are high-dimensional vectors that can be aligned with the dimensions of structured features through dimensionality reduction or normalization.
[0125] Structured data: Standardize or normalize the numerical features of structured data (such as glue wall angle and trench depth) and equipment status data to ensure that the magnitude of these data is consistent with other modal features.
[0126] Text: Convert text features (such as descriptive information in production documents) into vector representations, for example, using RNN or LSTM encoding, and output the result as a text feature vector.
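The standardization of numerical structured features described above can be sketched as a column-wise z-score. The choice of z-score scaling (rather than, say, min-max normalization) and the sample values are assumptions.

```python
import math

def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical rows: [glue wall angle (deg), trench depth (nm)]
rows = [[88.0, 300.0], [90.0, 320.0], [92.0, 340.0]]
scaled = standardize_columns(rows)
```

After scaling, the two columns are directly comparable even though their raw magnitudes differ, which is the purpose of aligning the modalities before fusion.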
[0127] Feature selection and dimensionality reduction
[0128] Using a feature contribution analysis algorithm such as PCA, the high-dimensional image features and structured features are reduced in dimensionality, and the most representative features are retained on the principal components, so that the feature dimensions remain compact and information-rich.
[0129] A model such as a random forest is used to select the most important structured features, thereby reducing unnecessary noise.
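The importance-based selection above can be illustrated with a deliberately simplified stand-in for a random forest: each feature is scored by the largest variance reduction that a single threshold split on it achieves, and the scores are sorted in descending order. A real random forest averages such impurity decreases over many trees and splits; the single-split scoring, the feature names, and the data here are assumptions made to keep the sketch short.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split_gain(feature, target):
    """Largest variance reduction achievable by one threshold on this feature."""
    base = variance(target)
    best = 0.0
    # Skip the maximum so the right-hand side of the split is never empty.
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(target)
        best = max(best, base - weighted)
    return best

def rank_features(rows, target, names):
    """Score every feature and sort the scores in descending order."""
    scores = {
        name: best_split_gain([row[i] for row in rows], target)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rows: [wall angle deviation, focal offset]; target: asymmetry error.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
target = [0.2, 0.3, 1.1, 1.2]
ranking = rank_features(rows, target, ["wall_angle_dev", "focal_offset"])
```

Features near the bottom of the ranking explain little of the variation in the target and are candidates for removal.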
[0130] Feature fusion
[0131] The features reduced by PCA are then fused with the text features. Early fusion or intermediate fusion methods can be used to integrate the features into a unified feature vector.
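The PCA reduction referred to above can be sketched in plain Python. For brevity this sketch computes only the first principal component and uses power iteration on the covariance matrix; both are illustrative choices rather than details given in this specification, and the data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit direction of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]
    # Uneven start vector; power iteration can stall if the start is
    # exactly orthogonal to the leading direction.
    start = [1.0 / (j + 1) for j in range(d)]
    scale = math.sqrt(sum(x * x for x in start))
    vec = [x / scale for x in start]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance along the current direction; stop iterating
            break
        vec = [x / norm for x in nxt]
    return means, vec

def project(rows, means, component):
    """Reduce every row to its coordinate along the component."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in rows]

# Hypothetical two-dimensional features that vary along one direction only.
rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, component = first_principal_component(rows)
reduced = project(rows, means, component)
```

Here two correlated columns collapse to a single coordinate per row without losing information, which is the redundancy reduction the text describes.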
[0132] Early fusion concatenates the image feature vector, the structured data feature vector, and the text feature vector into a single feature vector at the input layer. Denoting these vectors as v_img, v_struct, and v_text, the resulting feature vector is v_earlyfusion = [v_img, v_struct, v_text];
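Early fusion as defined above amounts to a plain concatenation. A minimal sketch, with hypothetical vector contents and lengths:

```python
def early_fusion(v_img, v_struct, v_text):
    """Concatenate per-modality feature vectors into one input vector."""
    return list(v_img) + list(v_struct) + list(v_text)

v_img = [0.12, 0.80, 0.33]      # hypothetical image features
v_struct = [1.5, -0.2]          # hypothetical standardized process features
v_text = [0.0, 0.7, 0.1, 0.4]   # hypothetical text feature vector
v_earlyfusion = early_fusion(v_img, v_struct, v_text)
```

The fused length is the sum of the modality lengths, so the modalities do not need to share a dimension at this stage.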
[0133] Intermediate fusion integrates the features at intermediate layers through one or more fully connected layers. Image features and structured data features are extracted separately, a fully connected layer is added for each of them in the intermediate layers, and the outputs of these fully connected layers are then concatenated. Adding activation functions or dimensionality reduction layers improves the representational capability of the fused features.
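The intermediate fusion step can be sketched as one fully connected layer with a ReLU activation per modality, followed by concatenation of the layer outputs. The layer width, the ReLU choice, and the randomly initialized weights (which stand in for parameters that training would learn) are assumptions.

```python
import random

def dense(vector, weights, bias):
    """Fully connected layer followed by a ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weights, bias)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights stand in for parameters that training would learn."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def intermediate_fusion(features, layers):
    """Project each modality through its own layer, then concatenate the outputs."""
    fused = []
    for name, vector in features.items():
        weights, bias = layers[name]
        fused += dense(vector, weights, bias)
    return fused

rng = random.Random(0)
features = {
    "img": [0.12, 0.80, 0.33],       # hypothetical image features
    "struct": [1.5, -0.2],           # hypothetical structured features
    "text": [0.0, 0.7, 0.1, 0.4],    # hypothetical text features
}
layers = {name: init_layer(len(vec), 4, rng) for name, vec in features.items()}
v_mid = intermediate_fusion(features, layers)
```

Projecting every modality to the same width before concatenation also makes the later weighted combination of modalities possible, since the vectors then share a dimension.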
[0134] By incorporating an attention mechanism, attention weights are applied to the feature vectors of each modality, enabling the model to automatically learn the weight of each modality. The weight for image features is w_img, the weight for structured data features is w_struct, and the weight for text features is w_text. The final feature vector after weight allocation is represented as v_final = v_img*w_img + v_struct*w_struct + v_text*w_text.
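The weighted combination v_final = v_img*w_img + v_struct*w_struct + v_text*w_text can be sketched as follows. The sketch assumes that the three vectors have already been brought to the same length and that the weights come from a softmax over per-modality scores; the softmax normalization and the score values are assumptions, since the specification only states that the weights are learned.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fusion(vectors, scores):
    """Weighted sum of equal-length modality vectors."""
    weights = softmax(scores)
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("modality vectors must share one dimension")
    fused = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return fused, weights

# Hypothetical equal-length modality vectors and untrained (equal) scores.
v_img, v_struct, v_text = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
v_final, (w_img, w_struct, w_text) = attention_fusion(
    [v_img, v_struct, v_text], scores=[0.0, 0.0, 0.0]
)
```

With equal scores each modality receives a weight of one third; during training the scores would shift so that more informative modalities contribute more.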
[0135] Model training and prediction: Various machine learning models are trained on the fused features; for example, Markov chain models are used for time series analysis to improve prediction accuracy.
[0136] Multiple machine learning models were selected for training, including:
[0137] Random forest: used for classification and regression to evaluate the importance of features.
[0138] Bayesian classifiers: used for probability prediction, providing probability estimates of asymmetry.
[0139] Markov chains: used for time series data analysis to capture time dependencies in the data.
[0140] Model evaluation: Cross-validation is used to evaluate model performance, and the best model is selected for final prediction.
[0141] Output: Generate target asymmetry prediction results and display the prediction results and feature importance analysis through visualization tools such as charts and dashboards.
[0142] In other words, leveraging machine learning techniques, especially multimodal learning networks, can significantly improve the ability to handle asymmetry problems:
[0143] Intelligent enhancement: Machine learning methods can automatically identify and learn data features of different modalities, reducing reliance on manual feature selection, thereby improving the intelligence level of the system.
[0144] Problem tracing and localization: By analyzing historical and real-time data, machine learning models can provide in-depth tracing and localization analysis for asymmetry and other process problems, and quickly identify potential process defects or anomalies.
[0145] Enhanced generalization ability: By integrating different types of data, multimodal learning networks can improve the model's generalization ability, enabling it to maintain high accuracy in error prediction even when facing different processes and wafer products. This characteristic makes the system highly adaptable to various complex production environments.
[0146] The embodiments described in this specification provide accurate error prediction, particularly in the manufacturing of automotive electronics and new energy power chips, significantly improving production efficiency and reducing rework rates, thereby offering an innovative solution for the semiconductor manufacturing industry.
[0147] In conjunction with the above embodiments, this specification also provides an overlay mark asymmetry compensation apparatus based on a multimodal deep learning network, comprising:
[0148] The acquisition module is used to acquire manufacturing data of different modalities;
[0149] The processing module is used to input manufacturing data of different modalities into a multimodal deep learning network for processing to obtain the prediction error of overlay mark asymmetry. The multimodal deep learning network extracts features from the manufacturing data of different modalities and fuses the vector features corresponding to different modalities. It obtains the prediction error of overlay mark asymmetry by associating and combining the physical parameters of actual production in the manufacturing process with the parameters observed by the equipment.
[0150] The compensation module is used to compensate for the actual overlay marks in actual production based on the predicted error of the asymmetry of the overlay marks.
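The three modules can be outlined as a small pipeline. This is only a structural sketch: the class and method names, the stub data source, the placeholder network that returns a constant, and the use of simple subtraction as the compensation rule are all assumptions and are not specified by this application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ManufacturingData:
    image: List[float]
    structured: List[float]
    text: List[float]

class AcquisitionModule:
    """Acquires manufacturing data of different modalities from a source."""
    def __init__(self, source: Callable[[], Dict[str, List[float]]]):
        self.source = source

    def acquire(self) -> ManufacturingData:
        raw = self.source()
        return ManufacturingData(raw["image"], raw["structured"], raw["text"])

class ProcessingModule:
    """Fuses the modalities and asks the network for the predicted error."""
    def __init__(self, network: Callable[[List[float]], float]):
        self.network = network

    def predict_error(self, data: ManufacturingData) -> float:
        fused = data.image + data.structured + data.text
        return self.network(fused)

class CompensationModule:
    """Applies the predicted error to a measured overlay value (assumed rule)."""
    def compensate(self, measured_overlay_nm: float, predicted_error_nm: float) -> float:
        return measured_overlay_nm - predicted_error_nm

# Hypothetical stubs standing in for real equipment data and a trained network.
acquisition = AcquisitionModule(
    lambda: {"image": [0.1, 0.2], "structured": [1.5], "text": [0.3]}
)
processing = ProcessingModule(lambda fused: 0.5)
compensation = CompensationModule()

data = acquisition.acquire()
predicted = processing.predict_error(data)
corrected = compensation.compensate(3.0, predicted)
```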
[0151] In conjunction with the above embodiments, this specification also provides an overlay mark asymmetry compensation system based on a multimodal deep learning network, comprising: a memory, a processor, and a computer program, wherein the computer program is stored in the memory, and the processor runs the computer program to execute the overlay mark asymmetry compensation method based on a multimodal deep learning network in any of the above technical solutions.
[0152] In conjunction with the above embodiments, this specification also provides a readable storage medium storing a computer program, which, when executed by a processor, implements the overlay mark asymmetry compensation method based on a multimodal deep learning network in any of the above technical solutions.
[0153] For the same or similar parts, the various embodiments in this specification can be referred to mutually. Each embodiment focuses on describing its differences from the other embodiments. In particular, the product embodiments described later are described relatively briefly because they correspond to the methods; for the relevant parts, refer to the descriptions in the method embodiments.
[0154] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. An overlay mark asymmetry compensation method based on a multimodal deep learning network, characterized in that it comprises: acquiring manufacturing data of different modalities; inputting the manufacturing data of different modalities into a multimodal deep learning network for processing to obtain a prediction error of overlay mark asymmetry, wherein the multimodal deep learning network extracts features from the manufacturing data of different modalities, fuses the vector features corresponding to the different modalities, and obtains the prediction error of overlay mark asymmetry by associating and combining the physical parameters of actual production in the manufacturing process with the parameters observed by the equipment; and compensating the actual overlay marks in actual production based on the prediction error of overlay mark asymmetry.
2. The overlay mark asymmetry compensation method based on a multimodal deep learning network of claim 1, wherein multi-level feature fusion is performed in the multimodal deep learning network: for the manufacturing data of different modalities, a feature vector of each modality is extracted, and the feature vectors of the modalities are directly concatenated at the input layer to obtain an initial feature vector; the feature vectors of the modalities are fused by adding fully connected layers in the intermediate layers, and the outputs of the fully connected layers are concatenated again; through an attention mechanism, attention weights are applied to the feature vector of each modality to obtain an output feature vector after feature weight allocation; wherein activation functions or dimensionality reduction layers are used to enhance the output feature vectors of the fully connected layers.
3. The overlay mark asymmetry compensation method based on a multimodal deep learning network of claim 1, further comprising: the multimodal deep learning network preprocesses and classifies the input data and extracts features using a different design for each class; during feature fusion, the extracted high-dimensional features are reduced in dimensionality or normalized, and principal components are retained based on feature contributions; wherein feature extraction is performed on the input image data to obtain visual features of the asymmetry in the images, which are fused and associated with the structured data features during feature fusion; features are extracted from the structured data to obtain the combined interaction effect between process parameters and equipment status, and an evaluation score of each feature in the structured data is obtained so as to determine the contribution of each feature to the output feature vector during feature fusion; and temporal features are extracted from the text, the correlation between contextual information is obtained based on the temporal features, and key factors in the context are set during feature fusion to add potential factors for the compensation of overlay mark asymmetry.
4. The overlay mark asymmetry compensation method based on a multimodal deep learning network of claim 3, further comprising: while a random forest or decision tree model is used to extract features from the structured data, the split nodes corresponding to process parameters and equipment status are evaluated and sorted in descending order to obtain the evaluation score of each feature.
5. The overlay mark asymmetry compensation method based on a multimodal deep learning network of claim 3, further comprising: a Bayesian algorithm is used to extract features from the structured data to obtain the interaction effect between process parameters and equipment status, thereby eliminating noisy data.
6. The overlay mark asymmetry compensation method based on a multimodal deep learning network of claim 1, further comprising: a Markov chain is used to obtain, based on temporal characteristics, the contextual information corresponding to the text information in historical data.
7. The overlay mark asymmetry compensation method based on a multimodal deep learning network of any one of claims 1-6, wherein the manufacturing data of different modalities include historical and real-time data of wafer fabrication processes, categorized into wafer image data, structured data, and production documents; correspondingly, the trained multimodal deep learning network is applicable to wafers of different wafer types, wafers of different layers, and wafers of different processes.
8. An overlay mark asymmetry compensation apparatus based on a multimodal deep learning network, characterized in that it comprises: an acquisition module, used to acquire manufacturing data of different modalities; a processing module, used to input the manufacturing data of different modalities into a multimodal deep learning network for processing to obtain a prediction error of overlay mark asymmetry, wherein the multimodal deep learning network extracts features from the manufacturing data of different modalities, fuses the vector features corresponding to the different modalities, and obtains the prediction error of overlay mark asymmetry by associating and combining the physical parameters of actual production in the manufacturing process with the parameters observed by the equipment; and a compensation module, used to compensate the actual overlay marks in actual production based on the prediction error of overlay mark asymmetry.
9. An overlay mark asymmetry compensation system based on a multimodal deep learning network, characterized in that it comprises: a memory, a processor, and a computer program, wherein the computer program is stored in the memory, and the processor executes the computer program to perform the overlay mark asymmetry compensation method based on a multimodal deep learning network according to any one of claims 1-7.
10. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the overlay mark asymmetry compensation method based on a multimodal deep learning network according to any one of claims 1-7.