Method for predicting random contributors

By using machine learning models and independent component analysis methods, error contribution sources in the lithography process are identified and decomposed, solving the problem of inaccurate identification of error contribution sources in existing technologies, improving the efficiency and yield of the lithography process, and reducing costs.

CN115605811B · Active · Publication Date: 2026-06-30 · ASML NETHERLANDS BV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ASML NETHERLANDS BV
Filing Date
2021-05-12
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies struggle to accurately distinguish and eliminate sources of random error during photolithography, which reduces process yield and wafer throughput. Existing methods are also time-consuming and costly.

Method used

By employing machine learning model training and independent component analysis, combined with feature image data, error contribution sources are identified and decomposed, reducing the impact of noise and improving the accuracy of error contribution sources.

Benefits of technology

It improves the accuracy of identifying error contributors during photolithography, reduces measurement time and wafer damage, lowers costs, and increases process yield and wafer production.

✦ Generated by Eureka AI based on patent content.


Abstract

Described herein is a method for training a machine learning model to determine error contribution sources for multiple features of a pattern printed on a substrate. The method includes: acquiring training data with multiple datasets, each dataset having error contribution values representing the error contribution to a feature from one of multiple sources, and each dataset being associated with an actual classification that identifies the error contribution source for the corresponding dataset; and training the machine learning model based on the training data to predict the classification of a reference dataset among the datasets, such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

Description

[0001] Cross-references to related applications

[0002] This application claims priority to EP application 20174556.9, filed on May 14, 2020; EP application 20177933.7, filed on June 3, 2020; and EP application 21171063.7, filed on April 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

[0003] The description herein relates to lithography equipment and processes, and more specifically to tools for determining random variations in printed patterns (e.g., in a mask or in a resist layer on a wafer), which can be used to detect defects (e.g., defects in the mask or on the wafer) and to optimize patterning processes, such as mask optimization and source optimization.

Background

[0004] Photolithography equipment is a machine that applies a desired pattern to a target area of a substrate. Photolithography equipment can be used in, for example, the manufacture of integrated circuits (ICs). For instance, the IC chip in a smartphone can be as small as a human thumbnail and can contain over 2 billion transistors. Manufacturing ICs is a complex and time-consuming process, involving circuit components in different layers and hundreds of individual steps. An error in even one step can lead to problems with the final IC and potentially to device failure. High process yield and high wafer throughput can be negatively impacted by the presence of defects, especially when operator intervention is required to inspect them. Inspection tools such as optical microscopes or scanning electron microscopes (SEMs) are used to identify defects to help maintain high yields and low costs.

Summary of the Invention

[0005] In one embodiment, a non-transitory computer-readable medium is provided, including instructions that, when executed by a computer, cause the computer to perform a method for training a machine learning model to determine sources of error contribution for multiple features of a pattern printed on a substrate. The method includes: acquiring training data having multiple datasets, each dataset having error contribution values representing error contributions to a feature from one of multiple sources, and each dataset being associated with an actual classification, the actual classification identifying the source of error contribution for the corresponding dataset; and training the machine learning model based on the training data to predict a classification for a reference dataset among the datasets, such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

[0006] In one embodiment, a non-transitory computer-readable medium is provided including instructions that, when executed by a computer, cause the computer to perform a method for determining error contribution sources for a plurality of features of a pattern printed on a substrate. The method includes: inputting a specified dataset having error contribution values into a machine learning model, each error contribution value representing an error contribution to a feature from one of a plurality of sources; and executing the machine learning model to determine a classification associated with the specified dataset, wherein the classification identifies a specified source among the plurality of sources as the error contribution source of the error contribution values in the specified dataset.

[0007] Furthermore, in one embodiment, a method is provided for training a machine learning model to determine error contribution sources for multiple features of a pattern printed on a substrate. The method includes: acquiring training data having multiple datasets, each dataset having error contribution values representing an error contribution to a feature from one of multiple sources, wherein each dataset is associated with an actual classification, the actual classification identifying the error contribution source corresponding to that dataset; and training the machine learning model based on the training data to predict the classification of a reference dataset among the datasets, such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

[0008] Furthermore, in one embodiment, a method is provided for determining error contribution sources of multiple features of a pattern printed on a substrate. The method includes: inputting a specified dataset having error contribution values representing the error contribution to a feature from one of a plurality of sources into a machine learning model; and executing the machine learning model to determine a classification associated with the specified dataset, wherein the classification identifies a specified source among the plurality of sources as the error contribution source of the error contribution values in the specified dataset.

[0009] Furthermore, in one embodiment, an apparatus is provided for training a machine learning model to determine error contribution sources for multiple features of a pattern printed on a substrate. The apparatus includes a memory storing an instruction set; and at least one processor configured to execute the instruction set to cause the apparatus to perform a method including: acquiring training data having multiple datasets, each dataset having error contribution values representing an error contribution to a feature from one of multiple sources, wherein each dataset is associated with an actual classification, the actual classification identifying the error contribution source corresponding to the dataset; and training the machine learning model based on the training data to predict the classification of a reference dataset among the datasets, such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

[0010] Furthermore, in one embodiment, a non-transitory computer-readable medium is provided including instructions that, when executed by a computer, cause the computer to perform a method for training a machine learning model to determine error contributions to features of a pattern printed on a substrate. The method includes: acquiring training data having multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on the substrate, and (b) first error contribution data, the first error contribution data including error contributions to one or more features from multiple sources; and training a machine learning model based on the training data to predict error contribution data for the first dataset such that a cost function is reduced, the cost function indicating the difference between the predicted error contribution data and the first error contribution data.

[0011] Furthermore, in one embodiment, a non-transitory computer-readable medium is provided including instructions that, when executed by a computer, cause the computer to perform a method for determining error contribution data, the error contribution data including error contributions from multiple sources to features of a pattern to be printed on a substrate. The method includes: receiving image data of a feature set of a specified pattern to be printed on a first substrate; inputting the image data into a machine learning model; and executing the machine learning model to determine error contribution data including error contributions to the feature set from multiple sources.

[0012] Furthermore, in one embodiment, a method is provided for training a machine learning model to determine error contributions of features of a pattern printed on a substrate. The method includes acquiring training data having multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on the substrate, and (b) first error contribution data, the first error contribution data including error contributions to one or more features from multiple sources; and training a machine learning model based on the training data to predict error contribution data for the first dataset such that a cost function is reduced, the cost function indicating the difference between the predicted error contribution data and the first error contribution data.

[0013] Furthermore, in one embodiment, a method is provided for determining error contribution data, which includes error contributions to pattern features printed on a substrate from multiple sources. The method includes receiving image data of a feature set of a specified pattern to be printed on a first substrate; inputting the image data into a machine learning model; and executing the machine learning model to determine the error contribution data, which includes error contributions to the feature set from multiple sources.

[0014] Furthermore, in one embodiment, an apparatus is provided for training a machine learning model to determine error contributions to features of a pattern printed on a substrate. The apparatus includes a memory storing an instruction set; and at least one processor configured to execute the instruction set to cause the apparatus to perform a method including: acquiring training data having multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, the first error contribution data including error contributions to the one or more features from multiple sources; and training a machine learning model based on the training data to predict error contribution data for the first dataset such that a cost function is reduced, the cost function indicating the difference between the predicted error contribution data and the first error contribution data.

[0015] Furthermore, in one embodiment, an apparatus is provided for determining error contribution data, the error contribution data including error contributions from multiple sources to features of a pattern to be printed on a substrate. The apparatus includes a memory storing an instruction set; and at least one processor configured to execute the instruction set to cause the apparatus to perform a method including: receiving image data of a feature set of a specified pattern to be printed on a first substrate; inputting the image data into a machine learning model; and executing the machine learning model to determine error contribution data, the error contribution data including error contributions to the feature set from multiple sources.

[0016] In addition, in one embodiment, a computer program product is provided including a non-transitory computer-readable medium having instructions recorded thereon, the instructions implementing the aforementioned method when executed by a computer system.

Description of the Drawings

[0017] Embodiments will now be described by way of example only with reference to the accompanying drawings, in which:

[0018] Figure 1 is a block diagram of various subsystems of a lithography system according to one embodiment.

[0019] Figure 2 is a block diagram of simulation models corresponding to the subsystems in Figure 1, according to one embodiment.

[0020] Figure 3 is a block diagram illustrating the use of independent component analysis (ICA) to decompose data according to one embodiment.

[0021] Figure 4 is a block diagram illustrating an example scanning electron microscope (SEM) image and a graph of the critical dimension (CD) values of a contact hole printed on a substrate according to one embodiment.

[0022] Figure 5 is a graph showing the measured values of features corresponding to multiple thresholds acquired at multiple measurement points, according to one embodiment.

[0023] Figure 6 is a block diagram of a decomposer module, according to one embodiment, which decomposes measurement data associated with features to obtain error contributors.

[0024] Figure 7A is a graph of LCDU data for decomposing error contributors according to one embodiment.

[0025] Figure 7B is another graph of LCDU data for decomposing error contributors according to one embodiment.

[0026] Figure 8A is a flowchart of a process for decomposing measurements of features to derive error contributions from multiple sources, according to one embodiment.

[0027] Figure 8B is a flowchart of a process for deriving error contributions from a linear mixture using ICA, according to one embodiment.

[0028] Figure 9 is a flowchart of a process for obtaining the measurements used in the decomposition process of Figure 8A, according to one embodiment.

[0029] Figure 10 is a diagram illustrating a process for obtaining measurements of contour lines for various thresholds according to one embodiment.

[0030] Figure 11 schematically depicts an SEM according to one embodiment.

[0031] Figure 12 schematically depicts an electron beam inspection apparatus according to one embodiment.

[0032] Figure 13 is a flowchart illustrating various aspects of an example method for joint optimization according to one embodiment.

[0033] Figure 14 shows another optimization method according to one embodiment.

[0034] Figure 15A, Figure 15B, and Figure 16 show example flowcharts of various optimization processes according to one embodiment.

[0035] Figure 17 is a block diagram of an example computer system according to one embodiment.

[0036] Figure 18 is a schematic diagram of a photolithography projection apparatus according to one embodiment.

[0037] Figure 19 is a schematic diagram of another photolithography projection apparatus according to one embodiment.

[0038] Figure 20 is a more detailed view of the apparatus of Figure 19, according to one embodiment.

[0039] Figure 21 is a more detailed view of the source collector module SO of the apparatus of Figure 19 and Figure 20, according to one embodiment.

[0040] Figure 22 is a block diagram illustrating the classification, based on the error contribution source, of a dataset or error contribution signal representing error contribution values, according to one embodiment.

[0041] Figure 23 is a block diagram illustrating the training of a classifier model to classify the error contribution signals of Figure 22 based on error contribution sources, according to one embodiment.

[0042] Figure 24 is a flowchart of a process for generating an error contribution signal according to one embodiment.

[0043] Figure 25A is a flowchart illustrating the process of determining the classification of error contributor signals for training a classifier model, according to one embodiment.

[0044] Figure 25B is a flowchart illustrating the process of determining the classification of error contributor signals for training a classifier model, according to one embodiment.

[0045] Figure 26 is a flowchart of a process for determining the source of an error contribution signal according to one embodiment.

[0046] Figure 27A is a flowchart illustrating a process for training an error contribution model to predict error contributions from multiple sources, according to one embodiment.

[0047] Figure 27B is a flowchart illustrating a process for training an error contribution model to predict error contributions from multiple sources, according to one embodiment.

[0048] Figure 28 is a block diagram illustrating the training of an error contribution model to determine error contributions from multiple sources, according to one embodiment.

[0049] Figure 29 is a flowchart of a process for determining the error contributions of multiple sources to a feature of a pattern to be printed on a substrate, according to one embodiment.

[0050] Figure 30 is a block diagram for determining the error contributions of multiple sources to a feature of a pattern to be printed on a substrate, according to one embodiment.

[0051] Embodiments will now be described in detail with reference to the accompanying drawings, which are provided as illustrative examples to enable those skilled in the art to practice these embodiments. Note that the following drawings and examples are not intended to limit the scope to a single embodiment; other embodiments are possible by interchanging some or all of the elements described or illustrated. The same reference numerals are used throughout the drawings to denote the same or similar parts whenever convenient. Where certain elements of these embodiments can be implemented partially or entirely using known components, only those portions of such known components necessary for understanding these embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the description of these embodiments. Embodiments illustrating a single component in this specification should not be considered limiting; rather, unless expressly stated otherwise herein, the scope of the invention is intended to cover other embodiments comprising a plurality of identical components, and vice versa. Furthermore, unless expressly stated otherwise, the applicant does not intend to assign any term in the specification or claims an unusual or special meaning. Moreover, the scope covers current and future known equivalents of the components mentioned herein as illustrative.

Detailed Description

[0052] Photolithography equipment is a machine that applies a desired pattern onto a target portion of a substrate. The process of transferring a desired pattern onto a substrate is called a patterning process. The patterning process may include a patterning step that transfers a pattern from a patterning apparatus (such as a mask) onto a substrate. Furthermore, one or more related patterning steps may then occur, such as resist development using a developing apparatus, baking the substrate using a baking tool, etching the pattern onto the substrate using an etching apparatus, etc. Various variations (e.g., random variations, errors, or noise caused by any of the inspection tools, masks, or resists) can potentially limit the realization of photolithography in high-volume semiconductor manufacturing (HVM). To characterize, understand, and determine such variations, reliable methods are needed in industry to measure these variations across various design patterns.

[0053] Some implementations use independent component analysis (ICA) to derive random variations. In ICA, measurement data for multiple features are acquired using multiple sensors. For example, three measurement datasets are acquired using three different sensors, and these three datasets are input as three signals to the ICA method. The ICA method decomposes the three input signals to obtain three output signals corresponding to error contributions from three sources, such as masks, resists, and inspection tools like scanning electron microscopes (SEM). However, in some cases, the ICA method may not be able to determine which output signal corresponds to which source of error contribution because the error contributions from various sources may be similar, and therefore the ICA method may not be able to distinguish them.
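The linear-mixture picture behind ICA, and the reason its outputs come back unlabeled, can be sketched as follows. This is only an illustration under stated assumptions, not the method of this disclosure: it uses two toy sources instead of three, the signals and the mixing matrix A are invented, and the unmixing matrix is obtained by inverting the known A rather than being estimated by an ICA algorithm.

```python
# Toy illustration of the linear mixing model x = A s assumed by ICA.
# All numbers and names are hypothetical. The unmixing matrix is the known
# inverse of A, not an ICA estimate.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def inverse_2x2(m):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two hypothetical error-contribution signals sampled at four features.
mask_err = [0.5, -0.2, 0.1, -0.4]
resist_err = [-0.3, 0.6, -0.1, 0.2]
A = [[1.0, 0.5], [0.3, 1.0]]  # hypothetical mixing matrix

samples = list(zip(mask_err, resist_err))
measured = [matvec(A, s) for s in samples]  # what two sensors would record

W = inverse_2x2(A)
recovered = [matvec(W, x) for x in measured]

# Swapping the rows of W is an equally valid unmixing: it yields the same
# components in a different order, so the output order carries no source label.
W_swapped = [W[1], W[0]]
recovered_swapped = [matvec(W_swapped, x) for x in measured]
```

Because both unmixing matrices separate the signals equally well, nothing in the decomposition itself says which output belongs to the mask and which to the resist.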

[0054] Some embodiments of this disclosure identify error contribution sources for a given error contribution signal. A machine learning (ML) model is trained to distinguish error contributions from various sources, and the trained ML model is used to determine the classification (e.g., error contribution source) of a given signal.
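Training a classifier so that a cost function comparing predicted and actual classifications is reduced can be sketched with a small logistic-regression model. Everything here is an assumption made for illustration: the disclosure does not specify the model type, the two summary statistics per dataset, the labels, or the cross-entropy cost.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Predicted probability that dataset x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def cost(weights, bias, xs, ys):
    """Mean cross-entropy between predicted and actual classifications."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(score(weights, bias, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def train(xs, ys, lr=0.5, steps=500):
    """Batch gradient descent on the cross-entropy cost."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            err = score(weights, bias, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias

def predict(weights, bias, x):
    return 1 if score(weights, bias, x) >= 0.5 else 0

# Hypothetical training data: each error-contribution dataset is summarized
# by two invented statistics; label 0 stands for "mask", 1 for "resist".
features = [[0.9, 0.1], [1.1, 0.2], [1.0, 0.0], [0.3, 0.8], [0.2, 0.9], [0.4, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

initial_cost = cost([0.0, 0.0], 0.0, features, labels)
w, b = train(features, labels)
final_cost = cost(w, b, features, labels)
predictions = [predict(w, b, x) for x in features]
```

Once trained, `predict` plays the role of the classification step: given an unlabeled signal, it returns the source the model considers most likely.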

[0055] While the ICA method can be used to determine error contributions from multiple sources, it is characterized by the assumption that the error contributions are a linear mixture of errors from different sources. In some embodiments, additional noise sources may exist, such as noise from sources different from those determined using ICA, and if these noise sources are not removed when using the ICA method, the error contributions determined by the ICA method may be inaccurate. Therefore, the ICA method can be constrained by the aforementioned assumption. Embodiments of this disclosure implement an ML model to determine error contributions from a set of sources. For example, an ML model is trained using images of various features and error contribution measurements associated with these features to predict error contributions from a set of sources for a given feature. Error contribution measurements used to train the ML model can be obtained using a method that is not constrained by the assumption that the error contributions are a linear mixture of errors from the set of sources. For prediction, an image of a feature (e.g., a contact hole) is provided as input to the ML model, and the ML model predicts error contributions from various sources for the input feature. By training an ML model based on error contribution data determined using the following method, which is not constrained by the assumption that error contributions are a linear mixture of source sets, the error contribution data predicted by the ML model can be unaffected by the presence of additional noise sources, thereby improving the accuracy of determining error contributions.
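The training described above, image data in and per-source error contributions out, with a cost measuring the difference between predicted and measured contributions, can be sketched with a deliberately simple linear model. The model form, the three-pixel "images", the target values, and the mean-squared-error cost are all assumptions for illustration; the disclosure does not tie the ML model to any of them.

```python
def predict(weights, pixels):
    """One linear output per source: sum over pixels of weight * pixel value."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def cost(weights, images, targets):
    """Mean squared difference between predicted and measured contributions."""
    total, count = 0.0, 0
    for pixels, target in zip(images, targets):
        for pred, actual in zip(predict(weights, pixels), target):
            total += (pred - actual) ** 2
            count += 1
    return total / count

def train(images, targets, lr=0.1, steps=2000):
    """Stochastic gradient descent, one image at a time."""
    n_sources, n_pixels = len(targets[0]), len(images[0])
    weights = [[0.0] * n_pixels for _ in range(n_sources)]
    for _ in range(steps):
        for pixels, target in zip(images, targets):
            preds = predict(weights, pixels)
            for k in range(n_sources):
                err = preds[k] - target[k]
                weights[k] = [w - lr * err * p for w, p in zip(weights[k], pixels)]
    return weights

# Hypothetical flattened feature images and measured contributions
# (mask, resist, SEM) for each image, in arbitrary units.
images = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
targets = [[0.30, 0.50, 0.10], [0.20, 0.70, 0.10], [0.40, 0.60, 0.20]]

initial_cost = cost([[0.0] * 3 for _ in range(3)], images, targets)
weights = train(images, targets)
final_cost = cost(weights, images, targets)
```

A practical model would likely be nonlinear (for example, a convolutional network), but the training loop has the same shape: predict, compare against the measured contributions, and adjust the parameters to lower the cost.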

[0056] As a brief introduction, Figure 1 illustrates an exemplary photolithography projection apparatus 10A.

[0057] While specific references may be made herein to the manufacture of ICs, it should be clearly understood that the description herein has many other possible applications. For example, it can be used to manufacture integrated optical systems, guidance and detection patterns for magnetic domain memories, liquid crystal display panels, thin-film magnetic heads, etc. Those skilled in the art will understand that, in the context of such alternative applications, any use of the terms “reticle,” “wafer,” or “die” herein should be considered interchangeable with the more general terms “mask,” “substrate,” and “target portion,” respectively.

[0058] In this document, the terms “radiation” and “beam” are used to cover all types of electromagnetic radiation, including ultraviolet radiation (e.g., with wavelengths of 365 nm, 248 nm, 193 nm, 157 nm, or 126 nm) and EUV (extreme ultraviolet radiation, e.g., with wavelengths in the range of 5–20 nm).

[0059] The term "optimizing" as used herein refers to adjusting the lithography projection equipment so that the lithography result or process has more desirable characteristics, such as higher accuracy of projection of the design layout onto the substrate and a larger process window.

[0060] Furthermore, the lithography projection apparatus can be of the type having two or more substrate stages (or two or more pattern forming apparatus stages). In such a "multi-stage" apparatus, additional stages can be used in parallel, or preparation steps can be performed on one or more stages while one or more other stages are used for exposure. A dual-stage lithography projection apparatus is described, for example, in US 5,969,441, which is incorporated herein by reference.

[0061] The patterning apparatus mentioned above includes or can form design layouts. Design layouts can be generated using CAD (Computer-Aided Design) programs, a process often referred to as EDA (Electronic Design Automation). Most CAD programs follow a predetermined set of design rules to create functional design layouts / patterning apparatuses. These rules are set by processing and design constraints. For example, design rules define the spatial tolerances between circuit devices (such as gates, capacitors, etc.) or interconnects to ensure that circuit devices or lines do not interact with each other in an undesirable manner. Design rule constraints are often referred to as “critical dimensions” (CD). The critical dimension of a circuit can be defined as the minimum width of a line or via, or the minimum spacing between two lines or two vias. Therefore, CD determines the overall size and density of the designed circuit. Of course, one goal in integrated circuit manufacturing is (via the patterning apparatus) to faithfully reproduce the original circuit design on the substrate.
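The kind of spacing rule described above can be sketched as a one-dimensional check on line widths and gaps. The function name, the interval representation, and the rule values are assumptions for illustration; real design-rule checking in EDA tools operates on two-dimensional polygons with far richer rule sets.

```python
def check_design_rules(lines, min_width, min_space):
    """Return (rule, index) pairs for lines that violate the width or spacing rule.

    lines: list of (start, end) positions in nm, sorted along one axis.
    """
    violations = []
    for i, (start, end) in enumerate(lines):
        if end - start < min_width:
            violations.append(("width", i))
        if i > 0 and start - lines[i - 1][1] < min_space:
            violations.append(("space", i))
    return violations

# Hypothetical layout: the third line is both too narrow and too close
# to its left neighbor.
layout = [(0, 40), (70, 110), (130, 160)]
violations = check_design_rules(layout, min_width=40, min_space=30)
```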

[0062] The terms “mask” or “patterning apparatus” as used herein can be broadly interpreted to refer to a general patterning apparatus that can be used to impart a patterned cross-section to an incident radiation beam, corresponding to a pattern to be created in a target portion of the substrate; the term “light valve” may also be used in this context. Examples of such patterning apparatuses, besides classical masks (transmissive or reflective; binary, phase-shifting, hybrid, etc.), include:

[0063] - Programmable mirror arrays. An example of such a device is a matrix-addressable surface with a viscoelastic control layer and a reflective surface. The basic principle behind such a device is (for example) that addressed areas of the reflective surface reflect incident radiation as diffracted radiation, while unaddressed areas reflect incident radiation as non-diffracted radiation. Using an appropriate filter, the non-diffracted radiation can be filtered out of the reflected beam, leaving only the diffracted radiation; in this way, the beam is patterned according to the addressing pattern of the matrix-addressable surface. The required matrix addressing can be performed using suitable electronic components. More information about such mirror arrays can be found, for example, in U.S. Patent Nos. 5,296,891 and 5,523,193, which are incorporated herein by reference.

[0064] - Programmable LCD array. An example of such a construction is given in U.S. Patent No. 5,229,872, which is incorporated herein by reference.

[0065] The main components are: a radiation source 12A, which may be a deep ultraviolet excimer laser source or another type of source, including an extreme ultraviolet (EUV) source (as mentioned above, the photolithography projection apparatus itself does not need to have a radiation source); illumination optics that define the partial coherence (denoted as σ) and may include optics 14A, 16Aa, and 16Ab that shape the radiation from source 12A; a patterning apparatus 14A; and transmission optics 16Ac that project an image of the pattern of the patterning apparatus onto a substrate plane 22A. An adjustable filter or aperture 20A at the pupil plane of the projection optics can limit the range of beam angles that strike the substrate plane 22A, where the maximum possible angle θ_max defines the numerical aperture of the projection optics as NA = sin(θ_max).
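As a small worked example of the numerical aperture relation, the helper below evaluates NA from the maximum beam angle. The function name and the example angle are assumptions; the optional `refractive_index` factor follows the general optics definition NA = n·sin(θ_max) and is not stated in the text above, so it defaults to 1.0, which reproduces the formula as written.

```python
import math

def numerical_aperture(theta_max_deg, refractive_index=1.0):
    """NA = n * sin(theta_max); n = 1.0 matches the formula in the text."""
    return refractive_index * math.sin(math.radians(theta_max_deg))

# Hypothetical example: a maximum half-angle of 30 degrees.
na_example = numerical_aperture(30.0)
```

A larger maximum angle admitted by the aperture 20A gives a larger NA, up to the limit set by the medium between the optics and the substrate.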

[0066] In the optimization process of a system, a figure of merit of the system can be represented as a cost function. The optimization process boils down to finding the set of system parameters (design variables) that minimizes the cost function. Depending on the optimization objective, the cost function can have any suitable form. For example, the cost function can be the weighted root mean square (RMS) of the deviations of certain characteristics of the system (evaluation points) from the expected values (e.g., ideal values) of these characteristics; the cost function can also be the maximum of these deviations (i.e., the worst-case deviation). The term "evaluation point" in this document should be interpreted broadly to include any characteristic of the system. The design variables of the system can be restricted to a finite range or can be interdependent because of the practicalities of implementing the system. In the case of photolithography projection equipment, constraints are typically associated with the physical properties and characteristics of the hardware (such as adjustable ranges or patterning device manufacturability design rules), and evaluation points can include physical points on the resist image on the substrate as well as non-physical characteristics (such as dose and focus).
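The two cost-function forms named above can be written out directly. This is a minimal sketch: the function names, the example deviations and weights, and the choice to normalize the weighted RMS by the sum of the weights are assumptions, since the text does not fix a normalization.

```python
import math

def weighted_rms(deviations, weights):
    """Weighted root mean square of deviations from the expected values."""
    weighted_sum = sum(w * d * d for w, d in zip(weights, deviations))
    return math.sqrt(weighted_sum / sum(weights))

def worst_case(deviations):
    """Largest absolute deviation over all evaluation points."""
    return max(abs(d) for d in deviations)

# Hypothetical deviations at three evaluation points (e.g., in nm),
# with the third point weighted twice as heavily.
deviations = [1.0, -2.0, 2.0]
weights = [1.0, 1.0, 2.0]

rms_cost = weighted_rms(deviations, weights)
max_cost = worst_case(deviations)
```

The RMS form rewards lowering the average deviation, while the worst-case form only improves when the single largest deviation is reduced, so the two can favor different design-variable settings.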

[0067] In a photolithography projection apparatus, a source provides illumination (i.e., radiation); projection optics guide and shape the illumination via a patterning apparatus and direct the illumination onto a substrate. The term "projection optics" should be broadly defined herein as any optical component that can alter the wavefront of the radiation beam. For example, projection optics may include at least some of components 14A, 16Aa, 16Ab, and 16Ac. An aerial image (AI) is the distribution of radiation intensity at the substrate level. A resist layer on the substrate is exposed, and the aerial image is transferred to the resist layer as a latent "resist image" (RI). The resist image (RI) can be defined as the spatial distribution of the solubility of the resist in the resist layer. A resist model can be used to calculate the resist image from the aerial image, an example of which can be found in commonly assigned U.S. Patent Application Serial No. 12 / 315,849, the entire contents of which are incorporated herein by reference. The resist model relates only to the properties of the resist layer (e.g., the effects of chemical processes occurring during exposure, PEB, and development). The optical properties of the lithographic projection apparatus (e.g., the properties of the source, the patterning apparatus, and the projection optics) dictate the aerial image. Since the patterning apparatus used in a lithographic projection apparatus can be changed, it is desirable to separate the optical properties of the patterning apparatus from the optical properties of the rest of the lithographic projection apparatus, which includes at least the source and the projection optics.

[0068] Figure 2 illustrates an exemplary flowchart for simulating lithography in a lithographic projection apparatus. Source model 31 represents the optical characteristics of the source (including radiation intensity distribution or phase distribution). Projection optics model 32 represents the optical characteristics of the projection optics (including changes to the radiation intensity distribution or phase distribution caused by the projection optics). Design layout model 35 represents the optical characteristics of a design layout (including changes to the radiation intensity distribution or phase distribution caused by a given design layout 33), which is a representation of the arrangement of features on, or formed by, the patterning device. An aerial (spatial) image 36 can be simulated from the source model 31, the projection optics model 32, and the design layout model 35. A resist image 38 can be simulated from the aerial image 36 using a resist model 37. The lithography simulation can, for example, predict contours and CDs in the resist image.

[0069] More specifically, source model 31 can represent the optical characteristics of the source, including but not limited to NA-sigma (σ) settings and any particular illumination source shape (e.g., off-axis radiation sources such as annular, quadrupole, and dipole). Projection optics model 32 can represent the optical characteristics of the projection optics, including aberrations, distortions, refractive indices, physical dimensions, etc. Design layout model 35 can also represent the physical properties of a physical patterning device, as described, for example, in U.S. Patent No. 7,587,704, which is incorporated herein by reference in its entirety. The purpose of the simulation is to accurately predict, for example, edge locations, aerial (spatial) image intensity slope, and CD, which can then be compared with the intended design. The intended design is typically defined as a pre-OPC design layout that can be provided in a standardized digital file format such as GDSII or OASIS, or in another file format.

[0070] From the design layout, one or more portions (referred to as "segments") can be identified. In one embodiment, a set of segments is extracted that represents the complex patterns in the design layout (typically about 50 to 1000 segments, although any number can be used). As those skilled in the art will understand, these patterns or segments represent small portions of the design (e.g., circuits, cells, or patterns), in particular small portions that require special attention or verification. In other words, segments can be portions of the design layout, or can resemble or behave similarly to portions of the design layout, where key features are identified by experience (including segments provided by the customer), by trial and error, or by running a full-chip simulation. Segments typically contain one or more test patterns or gauge patterns.

[0071] The initial large set of segments can be provided a priori by the customer based on known key feature regions in the design layout that require specific image optimization. Alternatively, in another embodiment, the initial large set of segments can be extracted from the entire design layout using some automated (such as machine vision) or manual method that identifies key feature regions.

[0072] Because of the "few" photons per millijoule of dose and the preference for low-dose processes (e.g., in terms of shrink potential and exposure dose specification), stochastic variations in the patterning process (e.g., the resist process) potentially limit the implementation of EUV lithography in semiconductor high-volume manufacturing (HVM), affecting the product yield or the wafer throughput of the patterning process, or both. In one embodiment, stochastic variations in the resist layer can manifest as different failure modes, described by, for example, line width roughness (LWR), line edge roughness (LER), local CD non-uniformity, closed holes or trenches, or, in the extreme case, broken lines. Such stochastic variations affect and limit successful HVM. To characterize, understand, and predict stochastic variations, the industry needs reliable methods to measure such variations across a variety of design patterns.
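The remark about "few" photons per unit dose can be made concrete with a short calculation. The sketch below is illustrative only: it assumes an EUV wavelength of 13.5 nm, uses a 193 nm wavelength purely for comparison, counts incident (not absorbed) photons, and treats the photon count as Poisson-distributed so that the relative fluctuation is 1/sqrt(N). None of these values or function names come from the disclosure.

```python
import math

PLANCK_J_S = 6.62607015e-34       # Planck constant, J*s
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light, m/s

def photons_per_nm2(dose_mj_per_cm2, wavelength_nm):
    """Mean number of incident photons per nm^2 for a given dose and wavelength."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    dose_j_per_nm2 = dose_mj_per_cm2 * 1e-3 / 1e14   # 1 cm^2 = 1e14 nm^2
    return dose_j_per_nm2 / photon_energy_j

def relative_shot_noise(mean_photon_count):
    """Relative fluctuation of a Poisson-distributed count: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(mean_photon_count)

euv = photons_per_nm2(1.0, 13.5)    # assumed EUV wavelength
duv = photons_per_nm2(1.0, 193.0)   # comparison wavelength
ratio = duv / euv                   # how many times fewer photons EUV delivers
```

Under these assumptions, each mJ/cm² of EUV dose delivers fewer than one photon per square nanometre, roughly fourteen times fewer than at 193 nm, which is why photon shot noise becomes a significant stochastic contributor at low dose.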

[0073] Existing methods for measuring stochastic variations use different measurement techniques for different features. For example, lines/spaces are measured in one direction (e.g., x or y), whereas contact holes or arrays of contact holes printed on a substrate can be measured in two directions (e.g., x and y). As examples of such measures, the measure for a line pattern is line width roughness (LWR) (a unidirectional measure), and the measure for a repeating dense contact array is local CD uniformity (LCDU) (a bidirectional measure). Various random (stochastic) contributors lead to variations in the LWR/LCDU of a feature.

[0074] To control, reduce, and predict random contributors, the semiconductor industry needs robust solutions to measure them accurately. Currently, industry metrics estimate random contributors using LWR for lines and LCDU for repeating contact arrays. Furthermore, these metrics characterize only the pattern level (e.g., one number per pattern) and not the edge-point level where hotspots occur (e.g., points along the pattern contour).

[0075] In one embodiment, a metrology tool such as a scanning electron microscope (SEM) is used to characterize the random contributors associated with a desired pattern. Noise is embedded in the SEM image data captured by the SEM tool. In one embodiment, the SEM image can be analyzed to determine the CD of a feature (e.g., the CD of a contact hole), δCD (the deviation of the CD from the mean of the CD distribution), and the LCDU of the contact holes. In one embodiment, the term "local" (e.g., in LCDU) can refer to a specific region (e.g., a unit cell or a specific die). In one embodiment, the CD of a contact hole or the LCDU may be affected by a number of contributors, including: (i) SEM noise (or SEM error contribution) $\delta CD_{SEM}$, (ii) mask noise (or mask error contribution) $\delta CD_{MASK}$, and (iii) resist noise (or resist error contribution) $\delta CD_{RESIST}$. The measured CD of a contact hole can be expressed as:

[0076] $$CD = \overline{CD} + \delta CD_{MASK} + \delta CD_{RESIST} + \delta CD_{SEM} \quad (1)$$

[0077] where $\overline{CD}$ is the average CD of multiple contact holes.

[0078] Mask noise can originate from errors during mask manufacturing. Resist noise (also known as shot noise) can originate from the chemical layers in the resist and from the photon shot noise of the light source of the lithographic apparatus used to print the pattern on the substrate. SEM noise can originate from the SEM (e.g., electron shot noise). In the prior art, noise decomposition can be performed based on a linear nested model. For example, the local critical dimension uniformity (LCDU) of contact holes has various contributions, including SEM noise, mask noise, and resist noise. In one embodiment, LCDU data can be provided to the linear nested model to decompose these three contributions.

[0079] In one embodiment, to prepare data for the decomposition method of the existing technique, a dedicated experiment is performed to conduct the measurements, which includes: printing a design pattern on a substrate, capturing images of the printed pattern on the substrate twice using the same SEM metrology recipe, and enabling local alignment in the recipe to reduce the SEM measurement position offset between different measurement repetitions. Similar measurements can be performed across different dies. In one embodiment, an anchor feature (e.g., at the center of the area to be scanned) is typically included in the SEM's field of view (FOV) to help align the SEM images between different measurements (and different dies).

[0080] In this disclosure, the term "repetition", as used for measurements relating to the substrate, refers to multiple measurements performed at a specified location on the substrate using a specified metrology recipe. For example, repetition data refers to multiple images acquired at a first location on the substrate (e.g., the center of a specified die) using a specified metrology recipe (e.g., landing energy, probe current, scan rate, etc.). In one embodiment, at least two sets of repetition data are generated from the multiple images.

[0081] The disadvantages of the existing techniques include, but are not limited to, the following. (i) Obtaining the measurements may require dedicated experiments, which are time-consuming, costly, and consume significant computational and manufacturing resources; the measurement process involves at least two repetitions. (ii) A large (x, y, z) placement offset exists between any two measurement repetitions. For example, when the SEM metrology is run multiple times, the recipe must perform global and local alignment (e.g., wafer alignment) for each run. Even with local alignment (which reduces measurement throughput), the typical (x, y) placement error is approximately 10 nm. (iii) The time lag associated with the same die location varies widely, resulting in significant SEM shrinkage uncertainty associated with the resist of the measured substrate. For example, when the SEM metrology is run twice, it is difficult to control the time elapsed between the first and second measurement repetitions on different dies. The elapsed time increases the shrinkage uncertainty between the two measurement repetitions, and this uncertainty reduces the accuracy of the decomposition results, such as SEM noise, mask noise, and resist noise. (iv) Data acquisition takes longer and carries a higher chance of wafer damage. For example, to obtain good-quality SEM images at defined locations on a substrate, the metrology tool must perform focus adjustment and global and local alignment for each run of the recipe. When the SEM beam is used for focusing and local alignment, it can damage the wafer surface.

[0082] This disclosure uses independent component analysis (ICA) to decompose the LWR/LCDU/CD distribution. Advantages of the disclosed method include eliminating the need for dedicated experiments and multiple repetitions, and minimizing the number of SEM images required for the decomposition (typically far fewer SEM images than known methods require). Furthermore, the disclosed method performs the decomposition with less measurement time and less wafer damage than existing methods. In one embodiment, the method uses a large-FOV, high-throughput SEM tool (such as an HMI tool) capable of acquiring SEM images covering a large wafer area in a short time. While the following embodiments for deriving error contributors are described with reference to CD distribution and LCDU data, the embodiments are not limited to such data; they can also be used to decompose LWR data of features into error contributions.

[0083] Figure 3 is a block diagram illustrating a method 300 for decomposing data using ICA according to various embodiments. ICA is a known decomposition method in signal processing; it is nonetheless briefly described below for convenience. ICA is a technique for the blind source separation of linearly mixed signals without any information about the original signals. ICA attempts to decompose a multivariate signal into independent non-Gaussian signals. For example, sound is typically a signal composed of the sum, at each time t, of the signals from several sources. The question then becomes whether these contributing sources can be separated from the observed total signal. Blind separation of mixed signals by ICA yields very good results when the statistical independence assumption holds.

[0084] A simple application of ICA is the "cocktail party problem", in which the underlying speech signals (e.g., a first source signal 301 and a second source signal 302) are separated from sample data consisting of people speaking simultaneously in a room. The sample data can be different observations of the people talking simultaneously. For example, the first observation can be a first mixed signal 305 of source signals 301 and 302 output from a first sensor 311 (e.g., a microphone) located at a first position in the room, while the second observation can be a second mixed signal 306 of source signals 301 and 302 output from a second sensor 312 (e.g., a microphone) located at a second position different from the first position. A decomposer module 320 based on the ICA method can analyze the mixed signals 305 and 306 as linear mixtures, determine the mixing matrix (A) 313, and decompose the linear mixtures using an unmixing matrix 314 to recover the original source signals 301 and 302.

[0085] In some embodiments, ICA determines the mixing matrix as follows. In ICA, the n mixed signals (e.g., mixed signals 305 and 306) are represented as n linear mixtures $x_1, \dots, x_n$ of n independent components $s_1, \dots, s_n$ (e.g., source signals 301 and 302):

[0086] $$x_j = a_{j1}s_1 + a_{j2}s_2 + \dots + a_{jn}s_n, \quad \text{for all } j \quad (2)$$

[0087] In some embodiments, a linear mixture is a linear function of a set of coefficients and explanatory variables (independent variables), the values of which are used to predict the outcome of the dependent variable. In Equation 2 above, the dependent variable can be $x_j$, the coefficient set can be $a_{j1}$ to $a_{jn}$, and the explanatory variables can be $s_1$ to $s_n$.

[0088] Let x denote the vector whose elements are the linear mixtures $x_1, \dots, x_n$, and similarly let s denote the vector with elements $s_1, \dots, s_n$. Let A denote the matrix with coefficients $a_{ij}$. Using this vector-matrix notation, the above mixture model can be written as:

[0089] $$x = As \quad (3)$$

[0090] or

[0091] $$x = \sum_{i=1}^{n} a_i s_i \quad (4)$$ where $a_i$ denotes the i-th column of the matrix A.

[0092] In some embodiments, the statistical model in Equation 4 is referred to as independent component analysis (ICA) or the ICA model. The ICA model is a generative model, meaning that it describes how the observed data are generated by a process of mixing the components $s_i$. The independent components are latent variables, meaning that they cannot be directly observed. Furthermore, the mixing matrix (A) 313 is assumed to be unknown. All that is observed is the random vector x, and both A and s must be estimated from it. This must be done under assumptions that are as general as possible.

[0093] The ICA model performs several processes (e.g., linearly mixing the source signals and whitening the mixed signals, which are not described here for simplicity) to determine the mixing matrix (A) 313. Then, after the mixing matrix (A) 313 has been estimated, its inverse 314, denoted W, is obtained and used to obtain the source components s via the following equation:

[0094] $$s = Wx \quad (5)$$
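The algebra of the mixing and unmixing steps can be illustrated with a small numerical sketch. The source waveforms and the mixing matrix below are hypothetical, and the mixing matrix is treated as known purely to show how the relation s = Wx inverts x = As; in ICA itself the matrix is unknown and must be estimated from the mixtures.

```python
import numpy as np

# Two hypothetical source signals (stand-ins for source signals 301 and 302).
t = np.linspace(0.0, 1.0, 500)
s = np.vstack([
    np.sin(2 * np.pi * 5 * t),            # first source: sine wave
    np.sign(np.sin(2 * np.pi * 3 * t)),   # second source: square wave
])

# Hypothetical mixing matrix A: each row describes what one sensor records.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s                  # mixed signals, x = As
W = np.linalg.inv(A)       # unmixing matrix, W = A^-1
s_recovered = W @ x        # recovered sources, s = Wx
```

Each row of `x` plays the role of one microphone recording; applying `W` returns the original rows of `s` to within floating-point error.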

[0095] In some embodiments, ICA is based on two assumptions: (1) the source signals $s_i$ are independent of each other, and (2) the values in each source signal $s_i$ have a non-Gaussian distribution. Furthermore, one constraint in ICA may be that if there are N sources, at least N observations (e.g., sensors or microphones) are required to recover the original N signals. While the following paragraphs describe deriving three error contributors using three input signals, it should be noted that more than three input signals can be used to derive three error contributors. In another example, if two error contributors are to be derived, two or more input signals may be required. In some embodiments, the ICA method can be implemented using one of a variety of algorithms, such as FastICA, infomax, JADE, and kernel independent component analysis.
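For illustration, the estimation step can be sketched with a compact symmetric FastICA iteration written directly in NumPy. This is a simplified teaching sketch, not the implementation used in the disclosure: the tanh nonlinearity, the iteration limits, the synthetic source signals, and all function names are assumptions. ICA recovers sources only up to ordering, sign, and scale, so the check compares absolute correlations rather than raw values.

```python
import numpy as np

def _inv_sqrt(sym):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fast_ica(x, max_iter=500, tol=1e-10, seed=0):
    """Estimate an unmixing matrix for mixtures x of shape (n_signals, n_samples)."""
    n, m = x.shape
    xc = x - x.mean(axis=1, keepdims=True)    # center each mixture
    k = _inv_sqrt(xc @ xc.T / m)              # whitening matrix
    z = k @ xc                                # whitened mixtures (identity covariance)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n, n))
    w = _inv_sqrt(w @ w.T) @ w                # start from an orthogonal matrix
    for _ in range(max_iter):
        g = np.tanh(w @ z)
        # Fixed-point update: E[g(y) z^T] - diag(E[g'(y)]) W
        w_new = g @ z.T / m - np.diag((1.0 - g ** 2).mean(axis=1)) @ w
        w_new = _inv_sqrt(w_new @ w_new.T) @ w_new   # symmetric decorrelation
        delta = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0))
        w = w_new
        if delta < tol:
            break
    return w @ k    # unmixing matrix for the centered mixtures

# Two independent, non-Gaussian synthetic sources mixed by a hypothetical matrix.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 4000)
sources = np.vstack([np.sign(np.sin(2 * np.pi * 7 * t)),   # square wave
                     rng.uniform(-1.0, 1.0, t.size)])      # uniform noise
mixing = np.array([[1.0, 0.5],
                   [0.7, 1.0]])
mixed = mixing @ sources

unmixing = fast_ica(mixed)
estimated = unmixing @ (mixed - mixed.mean(axis=1, keepdims=True))

# Absolute correlation between each true source (rows) and each estimate (columns).
corr = np.abs(np.corrcoef(np.vstack([sources, estimated]))[:2, 2:])
```

Both synthetic sources satisfy the two assumptions stated above (mutual independence and non-Gaussian values), and the number of mixtures equals the number of sources. A Gaussian source, or fewer mixtures than sources, would violate the conditions under which this separation works.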

[0096] In some embodiments, the ICA method can be used to determine the error contributors, such as $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$, to the LCDU/CD distribution of contact holes printed on the substrate, as described below at least with reference to Figures 4-9. Note that the decomposition of error contributors is not limited to the ICA method; variations of the ICA method can also be used, such as the Reconstruction ICA (RICA) method or the orthogonal ICA method.

[0097] Figure 4 is a block diagram illustrating an example SEM image and a graph of CD values for a contact hole printed on a substrate according to one embodiment. SEM image 405 may be an image of a design pattern printed on the substrate, obtained using an image acquisition tool such as an SEM. The design pattern printed on the substrate may include multiple features, such as the contact holes 410 shown in SEM image 405. One or more measurements can be obtained from SEM image 405 and used to derive each of multiple error contributors, such as $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$. Examples of such measurements include the CD distribution (e.g., CD values or δCD values) or the LCDU, as described in detail below.

[0098] In some embodiments, the contour of the contact hole 410 can be obtained using a threshold associated with the SEM image 405. For example, the SEM image 405 can be a grayscale image, and the threshold can be a pixel value (e.g., corresponding to the bright white band in the grayscale image), such as 30%, 50%, or 70%, as shown in graph 415. Graph 415 shows the CD value of the contact hole contour for various thresholds (e.g., bright white band values). In some embodiments, if the value of a white pixel is "1" and the value of a black pixel is "0", the threshold for 30% of the bright white band can be 30% of "1", which is "0.3". For this threshold, the position of the contour (e.g., the contour height) can be obtained, and thus the CD of the contour can be obtained. In some embodiments, the threshold corresponds to the sensor of the ICA method described with reference to Figure 3.
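How a CD value depends on the chosen threshold can be sketched with a one-dimensional intensity profile. The sketch makes strong simplifying assumptions that are not part of the disclosure: the profile is a single normalized cut across one contact hole, it is dark (0) inside the hole and bright (1) outside, and the contour position is found by linear interpolation between samples. The function name and the sample data are hypothetical.

```python
def cd_at_threshold(positions, intensity, threshold):
    """Width of the region where a normalized 1-D profile falls below the threshold.

    Edge positions are linearly interpolated between the last sample at or above
    the threshold and the first sample below it, on each side of the hole.
    """
    below = [i for i, v in enumerate(intensity) if v < threshold]
    if not below or below[0] == 0 or below[-1] == len(intensity) - 1:
        raise ValueError("profile does not cross the threshold on both sides")

    def crossing(i_out, i_in):
        x0, x1 = positions[i_out], positions[i_in]
        v0, v1 = intensity[i_out], intensity[i_in]
        return x0 + (threshold - v0) * (x1 - x0) / (v1 - v0)

    left = crossing(below[0] - 1, below[0])
    right = crossing(below[-1] + 1, below[-1])
    return right - left

# Hypothetical profile: positions in nm, intensity normalized to [0, 1].
positions = [float(p) for p in range(11)]
intensity = [1.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]

cd_30 = cd_at_threshold(positions, intensity, 0.30)
cd_50 = cd_at_threshold(positions, intensity, 0.50)
cd_70 = cd_at_threshold(positions, intensity, 0.70)
```

Because the edges of the profile are sloped, the three thresholds give three different CD values for the same hole, which is what allows each threshold to act as a separate "sensor" observing a different mixture of the same underlying contributors.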

[0099] The position of the contour, and therefore the CD of the contour, is typically influenced by the error contributors. Therefore, the CD value of the contour at a first threshold 421 (e.g., 30%) can be used as, or used to derive, a mixed signal that is input to the ICA method and decomposed to obtain the error contributors to the CD distribution. In some embodiments, a δCD value can be used instead of the CD value as the mixed signal input to the ICA method. In some embodiments, the δCD value of a contact hole can be the difference between the average CD value and the CD value of the contact hole. In some embodiments, the average CD value is the average of the CD values of multiple contact holes. Furthermore, in some embodiments, the δCD values can be determined by shifting the average CD value to "0" (that is, subtracting the average from the CD values of all contact holes). In some embodiments, the δCD value of a contact hole can be the distance between a specified point on the contour of the contact hole and a reference point on a reference contour of the contact hole. The reference contour can be obtained from a target pattern simulated based on the mask pattern of the corresponding contact hole.

[0100] In some embodiments, the relationship between the δCD value of the contact hole and the error contributor can be expressed as:

[0101] $$\delta CD = \delta CD_{MASK} + \delta CD_{RESIST} + \delta CD_{SEM} \quad (6)$$

[0102] In order to use ICA to decompose the error contributors, in some embodiments, δCD can be represented as a linear mixture of the error contributors as follows:

[0103] $$\delta CD = a_{11}\,\delta CD_{MASK} + a_{12}\,\delta CD_{RESIST} + a_{13}\,\delta CD_{SEM} \quad (7)$$

[0104] where $a_{11}$ to $a_{13}$ are the coefficients of the linear mixture and form part of the mixing matrix (A) 313 of ICA.

[0105] The δCD value can be used as input to the ICA method. However, in some embodiments, because there are three error contributors, at least three different δCD values may be required for the decomposition process, since ICA has the constraint that the number of mixed signals provided as input must be equal to or greater than the number of source components to be derived or decomposed. Therefore, δCD values are obtained for three different thresholds of the bright white band: a first δCD value, $\delta CD_{30\%}$, is obtained based on the CD value at the first threshold 421 (e.g., 30% of the bright white band); a second δCD value, $\delta CD_{50\%}$, is obtained based on the CD value at the second threshold 422 (e.g., 50% of the bright white band); and a third δCD value, $\delta CD_{70\%}$, is obtained based on the CD value at the third threshold 423 (e.g., 70% of the bright white band). The three δCD values can be represented as three different linear mixtures of the error contributors as follows:

[0106] $$\delta CD_{30\%} = a_{11}\,\delta CD_{MASK} + a_{12}\,\delta CD_{RESIST} + a_{13}\,\delta CD_{SEM} \quad (8)$$

[0107] $$\delta CD_{50\%} = a_{21}\,\delta CD_{MASK} + a_{22}\,\delta CD_{RESIST} + a_{23}\,\delta CD_{SEM} \quad (9)$$

[0108] $$\delta CD_{70\%} = a_{31}\,\delta CD_{MASK} + a_{32}\,\delta CD_{RESIST} + a_{33}\,\delta CD_{SEM} \quad (10)$$

[0109] or

[0110] $$\begin{bmatrix} \delta CD_{30\%} \\ \delta CD_{50\%} \\ \delta CD_{70\%} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} \delta CD_{MASK} \\ \delta CD_{RESIST} \\ \delta CD_{SEM} \end{bmatrix} \quad (11)$$

[0111] where the matrix of coefficients $a_{11}$ to $a_{33}$ is the mixing matrix 313, and $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$ are functions of the error contributions in Equations 8-10. For example, $\delta CD_{MASK}$ can be taken as the average of the $\delta CD_{MASK(30\%)}$, $\delta CD_{MASK(50\%)}$, and $\delta CD_{MASK(70\%)}$ values, or as one of these values.

[0112] Although the above δCD values, i.e., $\delta CD_{30\%}$, $\delta CD_{50\%}$, and $\delta CD_{70\%}$, are described with respect to a single measurement point, multiple such δCD values are obtained at multiple measurement points for each of the three thresholds, producing three different signals: the first signal includes multiple $\delta CD_{30\%}$ values, the second signal includes multiple $\delta CD_{50\%}$ values, and the third signal includes multiple $\delta CD_{70\%}$ values.

[0113] Figure 5 illustrates a graph of measured values of features, corresponding to each of a plurality of thresholds, obtained at a plurality of measurement points according to one embodiment. Graph 505 shows the CD values obtained at each measurement point for each of the three thresholds. For example, graph 505 shows a first CD value set 515 obtained at the first threshold 421 of 30%, a second CD value set 520 obtained at the second threshold 422 of 50%, and a third CD value set 525 obtained at the third threshold 423 of 70%. Each CD value set is a vector of CD values whose size is the number of measurement points. The CD value sets are further processed (e.g., by calculating the mean and shifting the mean to "0") to obtain the δCD values for each threshold. For example, a first δCD value set 515a is obtained from the first CD value set 515, a second δCD value set 520a is obtained from the second CD value set 520, and a third δCD value set 525a is obtained from the third CD value set 525. In some embodiments, each δCD value set may be input as a mixed signal to the decomposer module 320.

[0114] In some embodiments, the measurement points (e.g., the points at which CD values are measured) may be on the same contact hole or on different contact holes.
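The preparation of the mixed signals can be sketched as follows: one vector of CD values per threshold is stacked into a matrix, and the mean of each row is shifted to zero to give the δCD value sets. The CD values, the dictionary layout, and the function name are hypothetical and serve only to show the data shape expected by a decomposition step.

```python
import numpy as np

# Hypothetical CD values (nm) at four measurement points, one vector per threshold.
cd_by_threshold = {
    0.30: np.array([21.0, 19.5, 20.5, 19.0]),
    0.50: np.array([23.0, 21.0, 22.5, 21.5]),
    0.70: np.array([25.5, 23.5, 24.0, 23.0]),
}

def delta_cd_signals(cd_sets):
    """Stack CD vectors by ascending threshold and shift each row mean to zero."""
    thresholds = sorted(cd_sets)
    cd = np.vstack([cd_sets[th] for th in thresholds])
    return thresholds, cd - cd.mean(axis=1, keepdims=True)

thresholds, mixed_signals = delta_cd_signals(cd_by_threshold)
```

The result has one row per threshold ("sensor") and one column per measurement point, so each row corresponds to one mixed signal.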

[0115] Figure 6 is a block diagram of a decomposer module, according to one embodiment, that decomposes measurement data associated with features to obtain error contributors. Decomposer module 320 decomposes measurement data, such as CD distribution data, to obtain the error contributors that cause variations in the CD distribution, such as $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$. In some embodiments, the CD distribution data includes the δCD values of the contact holes, such as the first, second, and third δCD value sets 515a-525a of the contact holes.

[0116] In some embodiments, the decomposer module 320 is implemented using the ICA method, which is discussed in detail at least with reference to Figure 3. As mentioned above, the ICA method may require N mixed signals in order to decompose them into N independent components. In some embodiments, since the variation in the δCD values may include signals from three sources (e.g., $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$), three input signals 615, 620, and 625 are provided to the decomposer module 320. The first input signal 615 may include the first δCD value set 515a, the second input signal 620 may include the second δCD value set 520a, and the third input signal 625 may include the third δCD value set 525a.

[0117] The decomposer module 320 can process the first, second, and third δCD value sets 515a-525a of the contact holes (e.g., based on the ICA method described above at least with reference to Figure 3) to determine a mixing matrix 613, which is the set of coefficients of the linear mixtures represented by the first, second, and third δCD value sets 515a-525a. In some embodiments, the mixing matrix 613 is similar to the mixing matrix (A) 313 shown in Equation 3 or 11. After obtaining the mixing matrix 613, the decomposer module 320 obtains the error contributors based on the inverse 614 of the mixing matrix 613 and the first, second, and third δCD value sets 515a-525a, as shown below. Note that in embodiments where the mixing matrix 613 is not a square matrix (e.g., the number of sensors is greater than the number of sources to be decomposed), the inverse 614 of the mixing matrix 613 may be a pseudo-inverse.

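[0118] $$\begin{bmatrix} \delta CD_{MASK} \\ \delta CD_{RESIST} \\ \delta CD_{SEM} \end{bmatrix} = A^{-1} \begin{bmatrix} \delta CD_{30\%} \\ \delta CD_{50\%} \\ \delta CD_{70\%} \end{bmatrix} \quad (12)$$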
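The inversion step, including the non-square case in which a pseudo-inverse is used, can be sketched numerically. The contributor values and the mixing matrices below are synthetic and hypothetical, and the mixing matrix is taken as given; the sketch shows only how the inverse or the Moore-Penrose pseudo-inverse maps the mixed signals back to the contributors once a mixing matrix has been determined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the three contributors (rows: MASK, RESIST, SEM).
contributors = rng.uniform(-1.0, 1.0, size=(3, 200))

# Square case: three sensors (thresholds) and three sources.
A_square = np.array([[1.0, 0.8, 0.5],
                     [0.9, 1.0, 0.6],
                     [0.7, 0.9, 1.0]])
mixed_square = A_square @ contributors
recovered_square = np.linalg.inv(A_square) @ mixed_square

# Non-square case: four sensors and three sources, so A has no ordinary inverse
# and the Moore-Penrose pseudo-inverse is used instead.
A_tall = np.vstack([A_square, [0.6, 0.7, 0.9]])
mixed_tall = A_tall @ contributors
recovered_tall = np.linalg.pinv(A_tall) @ mixed_tall
```

In the noise-free synthetic case both routes return the original contributor rows. With measured data, the pseudo-inverse gives the least-squares estimate of the contributors from the over-determined set of mixtures.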

[0119] Therefore, the decomposer module 320 can determine the value of each error contributor based on Equation 12. The decomposer module 320 can output three signals or datasets corresponding to the $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$ error contributions. For example, the first output signal or dataset may include values corresponding to the $\delta CD_{MASK}$ error contribution 601, the second output signal or dataset may include values corresponding to the $\delta CD_{RESIST}$ error contribution 602, and the third output signal or dataset may include values corresponding to the $\delta CD_{SEM}$ error contribution 603. In Figure 6, the error contributions are shown as graphs. In some embodiments, each output dataset can be a vector whose size is the same as that of the vectors of the input mixed signals 615-625.

[0120] In some embodiments, the decomposer module 320 can determine a specific error contribution as a single value (instead of, or in addition to, a vector). For example, the decomposer module 320 can determine the average of the values in the first dataset 601 as the $\delta CD_{MASK}$ error contribution.

[0121] In some embodiments, the error contribution values 601-603 can be used to improve or optimize various aspects of the patterning process, such as source optimization, mask optimization, or optical proximity correction. For example, based on the $\delta CD_{MASK}$ error contribution or the $\delta CD_{RESIST}$ error contribution, one or more parameters of the mask/patterning device or of the lithographic apparatus used to print the pattern can be adjusted so that the pattern printed on the substrate meets specified criteria. Adjustable parameters may include parameters of the source, the patterning device, the projection optics, dose, focus, design layout/pattern features, etc. Typically, optimizing or improving the patterning process involves adjusting one or more parameters until one or more cost functions associated with the process are minimized or meet specified criteria. Examples of such optimization are described below at least with reference to Figures 13-16.

[0122] Although the above decomposition process uses CD distribution data, such as the first, second, and third δCD value sets 515a-525a, as inputs 615-625 to determine the error contributors, in some embodiments the decomposition process may also use LCDU data as inputs 615-625 to obtain the error contributors.

[0123] Figures 7A and 7B are graphs of LCDU data used for decomposing error contributors according to one embodiment. In some embodiments, the LCDU is the 3σ value of the CD distribution. In some embodiments, the LCDU values can be obtained from a focus exposure matrix (FEM) wafer using focus and dose values. Different parameters can be used as sensors to generate different mixed signals (e.g., for use as inputs to the decomposer module 320). For example, the dose level can be used as a sensor, and different LCDU datasets can be obtained as input signals 615-625 for different dose levels (e.g., as shown in the graph of Figure 7A).
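The 3σ definition of LCDU can be illustrated with a short calculation. The sketch assumes the sample standard deviation (n - 1 denominator); the disclosure does not state which estimator is used, and the CD values and function name are hypothetical.

```python
import math
import statistics

def lcdu_3sigma(cd_values):
    """LCDU as three times the sample standard deviation of the CD values."""
    return 3.0 * statistics.stdev(cd_values)

# Hypothetical CDs (nm) of six contact holes within one local region.
cds = [20.0, 21.0, 19.0, 20.0, 22.0, 18.0]
lcdu = lcdu_3sigma(cds)
```

Repeating this calculation for each focus and dose condition of an FEM wafer would yield one LCDU value per condition, from which datasets such as those grouped by dose level can be assembled.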

[0124] As shown in the graph of Figure 7A, the first LCDU dataset 715 includes through-focus LCDU values corresponding to a first dose level (e.g., 45.60 mJ/cm²), the second LCDU dataset 720 includes through-focus LCDU values corresponding to a second dose level (e.g., 52.44 mJ/cm²), and the third LCDU dataset 725 includes through-focus LCDU values corresponding to a third dose level (e.g., 59.2 mJ/cm²).

[0125] Each LCDU dataset can be represented as a linear mixture of the three error contributors, as shown in the following equations (e.g., similar to the linear mixtures of the CD distribution in Equations 8-10).

[0126] $$LCDU_1 = a_{11}\,LCDU_{MASK} + a_{12}\,LCDU_{RESIST} + a_{13}\,LCDU_{SEM} \quad (13)$$

[0127] $$LCDU_2 = a_{21}\,LCDU_{MASK} + a_{22}\,LCDU_{RESIST} + a_{23}\,LCDU_{SEM} \quad (14)$$

[0128] $$LCDU_3 = a_{31}\,LCDU_{MASK} + a_{32}\,LCDU_{RESIST} + a_{33}\,LCDU_{SEM} \quad (15)$$

[0129] The LCDU datasets 715-725 above can be provided as inputs 615-625 to the decomposer module 320. The decomposer module 320 processes the first, second, and third LCDU datasets (e.g., based on the ICA method described above at least with reference to Figure 3, and similarly to the processing of the first, second, and third δCD value sets 515a-525a described at least with reference to Figure 6) to determine the error contributors, such as $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$ (e.g., similar to the $\delta CD_{MASK}$ error contribution 601, the $\delta CD_{RESIST}$ error contribution 602, and the $\delta CD_{SEM}$ error contribution 603).

[0130] In another example, the bright white band values in the SEM image can be used as a sensor (e.g., as described at least with reference to Figure 4), and different LCDU datasets can be obtained as input signals 615-625 for different threshold levels of the bright white band (e.g., as shown in the graph of Figure 7B). As shown in the graph of Figure 7B, the first LCDU dataset 765 includes LCDU values at a first threshold (e.g., 30%) of the bright white band, the second LCDU dataset 770 includes LCDU values at a second threshold (e.g., 50%) of the bright white band, and the third LCDU dataset 775 includes LCDU values at a third threshold (e.g., 70%) of the bright white band. Each LCDU dataset can be represented as a linear mixture of the three error contributors as shown in Equations 13-15, and can be provided as inputs 615-625 to the decomposer module 320 to obtain the error contributors, such as $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$.

[0131] In another example, the focus level can be used as a sensor, and different LCDU datasets can be obtained as input signals 615-625 for different focus levels. For example, a first LCDU dataset, a second LCDU dataset, and a third LCDU dataset can be obtained, wherein the first LCDU dataset includes LCDU values for multiple dose values at a first focus level, the second LCDU dataset includes LCDU values for multiple dose values at a second focus level, and the third LCDU dataset includes LCDU values for multiple dose values at a third focus level.

[0132] Figure 8A is a flowchart of a process 800 for decomposing measurements of a feature to derive the error contributions of multiple sources to the feature, according to one embodiment. In some embodiments, the feature of the pattern may be a contact hole, and multiple such contact holes may be printed on a substrate. At operation 805, an image 801 of the pattern printed on the substrate is obtained. In some embodiments, image 801 may include SEM image 405. In some embodiments, image 801 is obtained using a tool such as an SEM. In some embodiments, multiple images of the pattern may be obtained.

[0133] At operation 810, image 801 is used to obtain multiple measurements 811 of the features of the pattern. For example, measurements 811 may include CD distribution data (e.g., CD or δCD values) or LCDU data of multiple contact holes for different sensor values. Different parameters can be used as sensors. For example, a threshold associated with image 801 (such as a threshold of the white band in image 801) can be used as a sensor, and the measurements 811 for different thresholds of the white band may include, as described at least with reference to Figures 4 and 5, the first δCD value set 515a obtained at the first threshold 421 (e.g., 30% of the white band), the second δCD value set 520a obtained at the second threshold 422 (e.g., 50% of the white band), and the third δCD value set 525a obtained at the third threshold 423 (e.g., 70% of the white band).

[0134] In another example, the dose level can be used as a sensor, and the measurements 811 for different dose levels may include, as described at least with reference to Figure 7A, the first LCDU dataset 715 obtained for the first dose level, the second LCDU dataset 720 obtained for the second dose level, and the third LCDU dataset 725 obtained for the third dose level.

[0135] At operation 815, each measurement 811 is correlated with a linear mixture of multiple error contributions to generate multiple linear mixtures 816. In some embodiments, the error contributions are derived using an ICA method (e.g., as described at least with reference to Figures 3 and 6). Because there are three error contributions (e.g., $\delta CD_{MASK}$, $\delta CD_{RESIST}$, and $\delta CD_{SEM}$), the decomposition process may require at least three different linear mixtures 816, since the ICA method has the constraint that the number of mixed signals provided as input must equal the number of source components to be derived or decomposed from the input. Thus, three different linear mixtures 816 may need to be generated. In one example, the three different linear mixtures 816 may include the first, second, and third δCD value sets 515a-525a, which can be represented using Equations 8-10. In another example, the three different linear mixtures 816 may include the first, second, and third LCDU datasets 715-725, which can be represented using Equations 13-15.

[0136] At operation 820, the error contributions 821 are derived from the linear mixtures 816. In some embodiments, the linear mixtures 816 are decomposed using the ICA method described at least with reference to Figures 3 and 6. For example, the linear mixtures 816 comprising the first, second, and third δCD value sets 515a-525a can be decomposed by providing them as inputs 615-625 to the decomposer module 320 (e.g., implemented using the ICA method) to derive the error contributions 821, such as the mask error contribution (e.g., $\delta CD_{MASK}$ error contribution 601), the resist error contribution (e.g., $\delta CD_{RESIST}$ error contribution 602), and the SEM error contribution (e.g., $\delta CD_{SEM}$ error contribution 603) described at least with reference to Figure 6. In another example, the linear mixtures 816 comprising the first, second, and third LCDU datasets 715-725 can be decomposed by providing them as inputs 615-625 to the decomposer module 320 to derive the error contributions 821, such as the mask error contribution (e.g., $LCDU_{MASK}$), the resist error contribution (e.g., $LCDU_{RESIST}$), and the SEM error contribution (e.g., $LCDU_{SEM}$).

[0137] Figure 8B is a flowchart of a process 850 for deriving error contributions from linear mixtures using ICA, according to one embodiment. In some embodiments, process 850 is performed as part of operation 820 of process 800 of Figure 8A. At operation 855, the linear mixtures 816 are processed using the ICA method to determine a mixing matrix, such as mixing matrix 613, which is the set of coefficients of the linear mixtures 816 represented by the first, second, and third δCD value sets 515a-525a. The mixing matrix 613 can be represented as shown in Equation 3 or 11. In some embodiments, the mixing matrix 613 is determined as described at least with reference to Figures 3 and 6.

[0138] At operation 860, the inverse of the mixing matrix A (613) is determined, for example as shown in Equation 12, to obtain the unmixing matrix 614.

[0139] At operation 865, the error contributions 821 are derived from the linear mixtures 816 using the unmixing matrix 614, for example as shown in Equation 12.
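
Operations 860 and 865 can be illustrated with the following minimal Python sketch, which is offered under stated assumptions and is not the implementation of the decomposer module 320. The sketch assumes that a 3 x 3 mixing matrix is already available (in process 850 it would be estimated by the ICA method at operation 855, which is not reproduced here), inverts it to obtain an unmixing matrix, and applies that matrix to three linear mixtures. The helper names `invert` and `matmul`, the matrix entries, and the source signals are all hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("mixing matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical error contributions: rows are mask, resist, and SEM,
# columns are measurement points.
sources = [
    [0.5, -0.2, 0.1, -0.4],
    [-0.3, 0.6, -0.1, -0.2],
    [0.1, 0.1, -0.3, 0.1],
]

# Hypothetical mixing matrix (assumed to be already estimated at operation 855).
mixing = [
    [1.0, 0.5, 0.2],
    [0.8, 1.0, 0.4],
    [0.6, 0.7, 1.0],
]

mixtures = matmul(mixing, sources)      # three linear mixtures, one per sensor value
unmixing = invert(mixing)               # operation 860
recovered = matmul(unmixing, mixtures)  # operation 865

for name, row in zip(("mask", "resist", "SEM"), recovered):
    print(name, [round(v, 6) for v in row])
```

Because the mixing matrix is known exactly in this sketch, the recovered rows reproduce the source rows up to floating-point rounding; with a matrix estimated by ICA, the recovered components are generally determined only up to ordering and scaling.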

[0140] Figure 9 is a flowchart of a process 900 for obtaining the measurement values used in the decomposition process of Figure 8A, according to one embodiment. In some embodiments, process 900 is performed as part of operation 810 of Figure 8A. At operation 905, a contour line 906 of a feature of the pattern is obtained. For example, contour line 906 may include the contour line of a contact hole in the SEM image 405. In some embodiments, any of a number of known methods can be used to determine the contour line of the contact hole. For example, a thresholding technique can be applied to the SEM image to obtain the contour line of the feature. In some embodiments, the thresholding technique determines the contour line based on the variation of pixel values in the grayscale SEM image; for example, pixels whose values satisfy a specified threshold (e.g., a white band value) can form the contour line of the feature. Figure 10 shows the contour line of a feature obtained using such a technique.

[0141] In some embodiments, contour line 906 is distorted by noise (e.g., error contributions from multiple sources such as the mask, the resist, and the SEM), which produces different contour line heights, such as contour lines 906a, 906b, and 906c. In some embodiments, the distorted contour lines 906a-906c can be identified by thresholding the SEM image at different thresholds, and the CD value of contour line 906 can be obtained for each threshold. For example, contour line 906a can be identified by thresholding the SEM image 405 at a first threshold (e.g., 30% of the white band value, as described at least with reference to Figures 4 and 5), and contour line 906b can be identified by thresholding the SEM image 405 at a second threshold (e.g., 50% of the white band value, as described at least with reference to Figures 4 and 5).

[0142] At operation 910, CD values are obtained for the different thresholds. For example, the specified threshold 1051 may be the first threshold 421 (e.g., 30% of the white band value) shown in graph 415 of Figure 4, in which case the CD value corresponds to the first threshold 421.

[0143] The CD value can be obtained using any of several methods. Figure 10 illustrates a method for obtaining the CD value of a contour line according to one embodiment. In some embodiments, the CD value of contour line 906 is measured by defining cut lines (e.g., measurement points associated with contour line 906). For the measurement, different cut lines are defined such that each cut line (e.g., cut line 1005) passes through contour line 906 in a direction perpendicular to contour line 906. Such cut lines can be used to measure contour lines of arbitrary shape. Each cut line may extend to intersect contour line 906, and the intersection is referred to as a measurement point. A one-dimensional (1D) image (e.g., an SEM signal such as pixel value versus x, where x is the coordinate of a specific pixel relative to a specified reference point) is generated from cut line 1005, as shown in graph 1050. A specified threshold 1051 can be applied to the 1D image to obtain a value dx for cut line 1005, which provides the CD value of contour line 906 at that cut line (e.g., measurement point) for the specified threshold 1051. In some embodiments, different thresholds are applied to the 1D image to obtain CD values corresponding to the different thresholds. For example, if the specified threshold 1051 is the first threshold 421 (e.g., 30% of the white band value), then dx is the CD value corresponding to the first threshold 421 as shown in graph 415. In another example, if the specified threshold 1051 is the second threshold 422 (e.g., 50% of the white band value), then dx is the CD value corresponding to the second threshold 422 as shown in graph 415. In another example, if the specified threshold 1051 is the third threshold 423 (e.g., 70% of the white band value), then dx is the CD value corresponding to the third threshold 423 as shown in graph 415.
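
By way of a hedged illustration, the following Python sketch shows one way a value dx could be read from a 1D pixel profile along a cut line for several thresholds. It makes simplifying assumptions that are not taken from the description: the threshold is interpreted as a fraction of the range between the minimum and maximum pixel values, the profile is a hypothetical single bright plateau, crossings are located by linear interpolation, and dx is measured between the outermost crossings. The function names and the profile values are hypothetical.

```python
def crossing(x0, y0, x1, y1, level):
    """Linearly interpolate the x position at which a segment crosses `level`."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def cd_at_threshold(xs, values, fraction):
    """Return dx between the outermost crossings of the thresholded 1D profile.

    `fraction` is the threshold expressed as a fraction (e.g., 0.3 for 30%)
    of the range between the minimum and maximum pixel values.
    """
    lo, hi = min(values), max(values)
    level = lo + fraction * (hi - lo)
    left = right = None
    for i in range(len(values) - 1):
        y0, y1 = values[i], values[i + 1]
        if left is None and y0 < level <= y1:   # first rising crossing
            left = crossing(xs[i], y0, xs[i + 1], y1, level)
        if y0 >= level > y1:                    # keep the last falling crossing
            right = crossing(xs[i], y0, xs[i + 1], y1, level)
    if left is None or right is None:
        raise ValueError("profile does not cross the threshold on both sides")
    return right - left


# Hypothetical 1D profile along one cut line: pixel value versus x (in pixels).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
profile = [0, 0, 50, 100, 100, 100, 50, 0, 0]

# One CD value per threshold for this measurement point.
cd_values = {f: cd_at_threshold(xs, profile, f) for f in (0.3, 0.5, 0.7)}
print(cd_values)
```

Repeating this for many cut lines would yield one set of CD values per threshold, in the manner of the CD value sets 515, 520, and 525.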

[0144] At the end of operation 910, different CD values (e.g., three different CD values) corresponding to the different thresholds (e.g., the three thresholds 421-423) are obtained for a specific cut line (or measurement point). In some embodiments, operations 905 and 910 are repeated for a finite number of iterations (e.g., a user-defined number) to obtain a CD value for each threshold at a finite number of measurement points (e.g., cut lines). The measurement points can be in the same contact hole or in different contact holes. At the end of the iterations of operations 905 and 910, different sets of CD values are created. For example, the following sets, each containing a CD value for every measurement point, are created (e.g., as shown in Figure 5): the first CD value set 515 corresponding to the first threshold 421 of 30%, the second CD value set 520 corresponding to the second threshold 422 of 50%, and the third CD value set 525 corresponding to the third threshold 423 of 70%.

[0145] At operation 915, an average value 916 of the CD values is determined. The CD values may include those obtained in operation 910, such as the first, second, and third CD value sets 515-525.

[0146] At operation 920, the average value 916 can be shifted to a specified value (e.g., "0"). In some embodiments, shifting the average value 916 to the specified value may include subtracting the difference between the average value 916 and the specified value from each CD value.

[0147] At operation 925, a δCD value is obtained for each CD value in the first, second, and third CD value sets 515-525. For example, as shown in Figure 5, the first δCD value set 515a corresponding to the first threshold 421 is obtained from the first CD value set 515, the second δCD value set 520a corresponding to the second threshold 422 is obtained from the second CD value set 520, and the third δCD value set 525a corresponding to the third threshold 423 is obtained from the third CD value set 525.
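
Operations 915 through 925 amount to a mean shift, which the following minimal Python sketch illustrates. It assumes, as one possible reading, that the average is taken separately over each CD value set; the function name `delta_cd`, the dictionary keys, and the CD values are hypothetical.

```python
def delta_cd(cd_values, target=0.0):
    """Shift a CD value set so that its average equals `target`."""
    mean = sum(cd_values) / len(cd_values)   # operation 915: average value
    shift = mean - target                    # difference between average and specified value
    return [cd - shift for cd in cd_values]  # operations 920 and 925: shifted (delta CD) values


# Hypothetical CD value sets in nm, one per threshold.
cd_sets = {
    "set_515_threshold_30": [24.0, 25.0, 26.0, 25.0],
    "set_520_threshold_50": [22.5, 23.5, 24.0, 22.0],
    "set_525_threshold_70": [20.0, 21.5, 21.0, 19.5],
}

delta_cd_sets = {name: delta_cd(values) for name, values in cd_sets.items()}

for name, values in delta_cd_sets.items():
    print(name, [round(v, 3) for v in values])
```

After the shift each set has zero mean, so the resulting values express only the variation about the average.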

[0148] In some embodiments, after obtaining the first, second, and third δCD value sets 515a-525a, process 900 may return to operation 815 of process 800.

[0149] Figure 11 depicts an embodiment of a scanning electron microscope (SEM) tool, according to various embodiments. In some embodiments, the inspection apparatus may be an SEM that produces an image of a structure (e.g., some or all of the structure of a device) exposed or transferred on a substrate. A primary electron beam EBP emitted from an electron source ESO is converged by a condenser lens CL and then passes through a beam deflector EBD1, an E×B deflector EBD2, and an objective lens OL to irradiate a substrate PSub on a substrate stage ST at the focal point.

[0150] When the substrate PSub is irradiated with the electron beam EBP, secondary electrons are generated from the substrate PSub. The secondary electrons are deflected by the E×B deflector EBD2 and detected by a secondary electron detector SED. A two-dimensional electron beam image can be obtained by detecting the electrons generated from the sample in synchronization with, for example, a two-dimensional scan of the electron beam by the beam deflector EBD1, or with repeated scans of the electron beam EBP by the beam deflector EBD1 in the X or Y direction together with continuous movement of the substrate PSub by the substrate stage ST in the other of the X or Y direction.

[0151] The signal detected by the secondary electron detector SED is converted into a digital signal by an analog-to-digital (A/D) converter ADC, and the digital signal is sent to an image processing system IPU. In one embodiment, the image processing system IPU may have a memory MEM for storing all or part of the digital images processed by a processing unit PU. The processing unit PU (e.g., specially designed hardware or a combination of hardware and software) is configured to convert or process the digital images into datasets representing the digital images. Furthermore, the image processing system IPU may have a storage medium STOR configured to store the digital images and the corresponding datasets in a reference database. A display device DIS may be connected to the image processing system IPU so that an operator can perform necessary operations of the equipment with the aid of a graphical user interface.

[0152] Figure 12 schematically illustrates another embodiment of the inspection apparatus. The system is used to inspect a sample 90 (such as a substrate) on a sample stage 89 and includes a charged particle beam generator 81, a focusing lens module 82, a probe forming objective module 83, a charged particle beam deflection module 84, a secondary charged particle detector module 85, and an image forming module 86.

[0153] Charged particle beam generator 81 generates a primary charged particle beam 91. A focusing lens module 82 focuses the generated primary charged particle beam 91. A probe forming objective module 83 focuses the focused primary charged particle beam into a charged particle beam probe 92. A charged particle beam deflection module 84 causes the formed charged particle beam probe 92 to scan across the surface of a region of interest on a sample 90 fixed on a sample stage 89. In one embodiment, the charged particle beam generator 81, the focusing lens module 82, and the probe forming objective module 83, or equivalent designs, alternatives, or any combination thereof, together form a charged particle beam probe generator for generating the scanning charged particle beam probe 92.

[0154] The secondary charged particle detector module 85 detects secondary charged particles 93 emitted from the sample surface when bombarded by the charged particle beam probe 92 (and may also include other reflected or scattered charged particles from the sample surface) to generate a secondary charged particle detection signal 94. An image forming module 86 (e.g., a computing device) is coupled to the secondary charged particle detector module 85 to receive the secondary charged particle detection signal 94 from the secondary charged particle detector module 85 and thus form at least one scan image. In one embodiment, the secondary charged particle detector module 85 and the image forming module 86, or their equivalents, alternatives, or any combination thereof, together form an image forming apparatus that forms a scan image based on the detected secondary charged particles emitted from the sample 90 bombarded by the charged particle beam probe 92.

[0155] As mentioned above, SEM images can be processed to extract contour lines that describe the edges of objects representing device structures in the image. These contour lines are then quantified via metrics such as CD. Thus, images of device structures are typically compared and quantified using simple metrics such as edge-to-edge distance (CD) or simple pixel differences between images. Typical contour line models that detect the edges of objects in an image in order to measure CD use image gradients, and these models rely on strong image gradients. In practice, however, images are often noisy and have discontinuous boundaries. Techniques such as smoothing, adaptive thresholding, edge detection, erosion, and dilation can be used to process the results of image gradient contour line models to address noisy and discontinuous images, but they ultimately result in a low-resolution quantification of a high-resolution image. Therefore, in most cases, the mathematical processing performed on images of device structures to reduce noise and automate edge detection leads to a loss of image resolution and thus a loss of information. The result is a low-resolution quantification that amounts to a simplified representation of a complex, high-resolution structure.
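
The reliance of such models on strong image gradients can be illustrated with the following minimal Python sketch, which locates the two edges of a feature in a 1D pixel profile at the positions of the strongest rising and falling gradients and reports the edge-to-edge distance in pixels. The sketch is a simplified, hypothetical example: the function names, the central-difference gradient, and the profile values are assumptions, and no particular contour line model from the description is reproduced.

```python
def gradient(values):
    """Central-difference gradient of a 1D pixel profile (one-sided at the ends)."""
    n = len(values)
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append((values[hi] - values[lo]) / (hi - lo))
    return grad


def edge_to_edge_distance(values):
    """Distance in pixels between the strongest rising and strongest falling edge."""
    grad = gradient(values)
    rising = max(range(len(grad)), key=lambda i: grad[i])
    falling = min(range(len(grad)), key=lambda i: grad[i])
    return abs(falling - rising)


# Hypothetical grayscale profile across a feature (pixel values).
profile = [10, 10, 12, 60, 110, 112, 111, 110, 58, 11, 10, 10]

print(edge_to_edge_distance(profile))
```

On a noisy profile the extreme gradients may no longer coincide with the true edges, which is why smoothing or similar processing is commonly applied first, at the cost of resolution as noted above.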

[0156] Therefore, it is desirable to have a mathematical representation of a structure (e.g., a circuit feature, an alignment mark, or a measurement target portion (e.g., a grating feature)) produced or expected to be produced using a patterning process, such that the representation preserves resolution while still describing the overall shape of the structure, regardless of whether the structure is in a latent resist image, in a developed resist image, or in a layer transferred to a substrate, for example, by etching. In the context of lithography or other patterning processes, the structure can be a device being manufactured or a portion thereof, and the image can be an SEM image of the structure. In some cases, the structure can be a feature of a semiconductor device (e.g., an integrated circuit). In some cases, the structure can be an alignment mark or a portion thereof (e.g., a grating of the alignment mark) used in an alignment measurement process to determine the alignment of an object (e.g., a substrate) with another object (e.g., a patterning apparatus), or a measurement target or a portion thereof (e.g., a grating of the measurement target) used to measure a parameter of the patterning process (e.g., overlay, focus, dose, etc.). In one embodiment, the measurement target is a diffraction grating used to measure, for example, overlay.

[0157] In one embodiment, measurement data relating to the printed pattern (e.g., random variations) determined according to the method of Figure 3 can be used to optimize the patterning process or to adjust parameters of the patterning process. As an example, OPC addresses the fact that the final size and placement of an image of the design layout projected onto the substrate will not be identical to, or simply depend only on, the size and placement of the design layout on the patterning apparatus. It should be noted that the terms “mask,” “reticle,” and “patterning apparatus” are used interchangeably herein. Similarly, those skilled in the art will recognize that, particularly in the context of lithography simulation/optimization, the terms “mask”/“patterning apparatus” and “design layout” can be used interchangeably, because in lithography simulation/optimization a physical patterning apparatus is not necessarily used, and a design layout can be used to represent the physical patterning apparatus. For the small feature sizes and high feature densities present on some design layouts, the position of a particular edge of a given feature will be influenced to some extent by the presence or absence of other adjacent features. These proximity effects arise from minute amounts of radiation coupled from one feature to another or from non-geometrical optical effects such as diffraction and interference. Similarly, proximity effects can arise from diffusion and other chemical effects during post-exposure bake (PEB), resist development, and etching, which generally follow lithography.

[0158] To ensure that the projected image of the design layout conforms to the requirements of a given target circuit design, proximity effects need to be predicted and compensated for using sophisticated numerical models, corrections, or pre-distortions of the design layout. The article “Full-Chip Lithography Simulation and Design Analysis: How OPC Is Changing IC Design”, C. Spence, Proc. SPIE, Vol. 5751, pp. 1-14 (2005) provides an overview of current “model-based” optical proximity correction processes. In a typical high-end design, almost every feature of the design layout undergoes some modification to achieve high fidelity of the projected image to the target design. These modifications can include shifting or biasing of edge positions or line widths, as well as the application of “assist” features intended to assist the projection of other features.

[0159] Given that a chip design typically contains millions of features, applying model-based OPC to a target design involves good process models and considerable computational resources. However, applying OPC is generally not an "exact science" but an empirical, iterative process that does not always compensate for all possible proximity effects. Therefore, the effect of OPC (e.g., the design layout after application of OPC and any other RETs) needs to be verified by design inspection, i.e., intensive full-chip simulation using calibrated numerical process models, in order to minimize the possibility of design flaws being built into the patterning device pattern. This is driven by the enormous cost of making high-end patterning devices, which runs in the multi-million dollar range, as well as by the impact on turnaround time of reworking or repairing actual patterning devices once they have been manufactured.

[0160] Both OPC and full-chip RET verification can be based on numerical modeling systems and methods, such as those described in U.S. Patent Application No. 10/815,573 and in the article “Optimized Hardware and Software For Fast, Full Chip Simulation”, by Y. Cao et al., Proc. SPIE, Vol. 5754, 405 (2005).

[0161] One RET relates to adjustment of the global bias of the design layout. The global bias is the difference between the patterns in the design layout and the patterns intended to be printed on the substrate. For example, a circular pattern with a diameter of 25 nm can be printed on the substrate by a 50 nm diameter pattern in the design layout, or by a 20 nm diameter pattern in the design layout but with a high dose.

[0162] In addition to optimization of the design layout or patterning apparatus (e.g., OPC), the illumination source can also be optimized, either jointly with the patterning apparatus or separately, to improve overall lithographic fidelity. The terms "illumination source" and "source" are used interchangeably herein. Since the 1990s, many off-axis illumination sources, such as annular, quadrupole, and dipole sources, have been introduced, providing more freedom for OPC design and thereby improving imaging results. Off-axis illumination is a proven way to resolve fine structures (i.e., target features) contained in the patterning apparatus. However, compared with a conventional illumination source, an off-axis illumination source typically provides less radiation intensity for the aerial image (AI). Therefore, it is desirable to optimize the illumination source to achieve an optimal balance between finer resolution and reduced radiation intensity.

[0163] For example, several source optimization methods can be found in the article by Rosenbluth et al., "Optimum Mask and Source Patterns to Print A Given Shape", Journal of Microlithography, Microfabrication, Microsystems 1(1), pp. 13-20 (2002). The source is divided into several regions, each corresponding to a certain region of the pupil spectrum. The source distribution is then assumed to be uniform in each source region, and the brightness of each region is optimized for the process window. However, the assumption that the source distribution is uniform in each source region is not always valid, and as a result the effectiveness of the method suffers. In another example, set forth in the article by Granik, "Source Optimization for Image Fidelity and Throughput", Journal of Microlithography, Microfabrication, Microsystems 3(4), pp. 509-522 (2004), several existing source optimization methods are reviewed, and a method based on illuminator pixels is proposed that converts the source optimization problem into a series of non-negative least squares optimizations. Although these methods have demonstrated some success, they typically require multiple complicated iterations to converge. In addition, it can be difficult to determine appropriate or optimal values for some extra parameters, such as γ in Granik's method, which dictates the trade-off between optimizing the source for substrate image fidelity and the smoothness requirement of the source.

[0164] For low-k1 lithography, optimization of the source and patterning apparatus is useful for ensuring a feasible process window for projecting critical circuit patterns. Some algorithms (e.g., Socha et al. Proc. SPIE vol.5853, 2005, p.180) discretize the illumination into individual source points and the mask into diffraction orders in the spatial frequency domain, and separately formulate the cost function (defined as a function of selected design variables) based on process window metrics such as exposure latitude, which can be predicted by an optical imaging model based on the source point intensity and the diffraction order of the patterning apparatus. The term "design variables" as used herein includes the set of parameters of the lithography projection apparatus or the lithography process, such as user-adjustable parameters of the lithography projection apparatus or image characteristics that the user can adjust by adjusting these parameters. It should be understood that any characteristic of the lithography projection process (including the characteristics of the source, patterning apparatus, projection optics, or resist) can be included in the optimized design variables. The cost function is typically a nonlinear function of the design variables. Standard optimization techniques are then used to minimize the cost function.

[0165] Relatedly, the pressure of ever-shrinking design rules has driven semiconductor wafer manufacturers deeper into the low-k1 lithography era with existing 193 nm ArF lithography. Lithography toward lower k1 places heavy demands on RETs, exposure tools, and lithography-friendly design. 1.35 ArF hyper numerical aperture (NA) exposure tools may be used in the future. To help ensure that a circuit design can be produced on a substrate with a workable process window, source and patterning device optimization (referred to herein as source mask optimization or SMO) is becoming an important RET for the 2x nm node.

[0166] A source and patterning apparatus (design layout) optimization method and system that allows simultaneous optimization of the source and the patterning apparatus using a cost function, without constraints and within a practicable amount of time, is described in commonly assigned International Patent Application No. PCT/US2009/065359, filed on November 20, 2009 and published as WO2010/059954, the entire contents of which are incorporated herein by reference.

[0167] Another source and mask optimization method and system, which involves optimizing the source by adjusting pixels of the source, is described in commonly assigned U.S. Patent Application No. 12/813456, filed on June 10, 2010 and published as U.S. Patent Application Publication No. 2010/0315614, entitled “Source-Mask Optimization in Lithographic Apparatus”, the entire contents of which are incorporated herein by reference.

[0168] In a photolithography projection device, as an example, the cost function is expressed as:

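[0169] $$CF(z_1, z_2, \ldots, z_N) = \sum_{p=1}^{P} w_p \, f_p^2(z_1, z_2, \ldots, z_N) \qquad \text{(Equation 1)}$$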

[0170] where $(z_1, z_2, \ldots, z_N)$ are N design variables or values thereof. $f_p(z_1, z_2, \ldots, z_N)$ can be a function of the design variables $(z_1, z_2, \ldots, z_N)$, such as the difference between the actual value and the intended value of a characteristic at an evaluation point for a given set of values of the design variables. $w_p$ is a weight constant associated with $f_p(z_1, z_2, \ldots, z_N)$. An evaluation point or pattern that is more critical than others can be assigned a higher $w_p$ value. Patterns or evaluation points that occur more frequently can also be assigned a higher $w_p$ value. Examples of evaluation points can be any physical point or pattern on the substrate, any point on a virtual design layout, a resist image, an aerial image, or a combination thereof. $f_p(z_1, z_2, \ldots, z_N)$ can also be a function of one or more stochastic effects, such as LWR, which are themselves functions of the design variables $(z_1, z_2, \ldots, z_N)$. The cost function can represent any suitable characteristic of the lithographic projection apparatus or the substrate, such as feature failure rate, focus, CD, image shift, image distortion, image rotation, stochastic effects, yield, CDU, or a combination thereof. CDU is the local CD variation (e.g., three times the standard deviation of the local CD distribution) and is interchangeably referred to as LCDU. In one embodiment, the cost function represents (i.e., is a function of) CDU, yield, and stochastic effects. In another embodiment, the cost function represents (i.e., is a function of) EPE, yield, and stochastic effects. In one embodiment, the design variables $(z_1, z_2, \ldots, z_N)$ include the dose, the global bias of the patterning apparatus, the shape of the illumination from the source, or a combination thereof. Since the resist image often dictates the circuit pattern on the substrate, the cost function usually includes functions representing some characteristics of the resist image. For example, $f_p(z_1, z_2, \ldots, z_N)$ of such an evaluation point can simply be the distance between a point in the resist image and the intended position of that point (i.e., the edge placement error $EPE_p(z_1, z_2, \ldots, z_N)$). The design variables can be any adjustable parameters, such as adjustable parameters of the source, the patterning apparatus, the projection optics, dose, focus, etc. The projection optics can include components collectively referred to as "wavefront manipulators," which can be used to adjust the shape of the wavefront and the intensity distribution or phase shift of the radiation beam. Preferably, the projection optics can adjust the wavefront and intensity distribution at any location along the optical path of the lithographic projection apparatus, such as before the patterning apparatus, near the pupil plane, near the image plane, or near the focal plane. The projection optics can be used to correct or compensate for certain distortions of the wavefront and intensity distribution caused by, for example, the source, the patterning apparatus, temperature variation in the lithographic projection apparatus, or thermal expansion of components of the lithographic projection apparatus. Adjusting the wavefront and intensity distribution can change the values of the evaluation points and of the cost function. Such changes can be simulated from a model or actually measured. Of course, $CF(z_1, z_2, \ldots, z_N)$ is not limited to the form in Equation 1; $CF(z_1, z_2, \ldots, z_N)$ can take any other suitable form.

[0171] It should be understood that the normal weighted root mean square (RMS) of $f_p(z_1, z_2, \ldots, z_N)$ is defined as $\sqrt{\frac{1}{P}\sum_{p=1}^{P} w_p \, f_p^2(z_1, z_2, \ldots, z_N)}$. Therefore, minimizing the weighted RMS of $f_p(z_1, z_2, \ldots, z_N)$ is equivalent to minimizing the cost function $CF(z_1, z_2, \ldots, z_N) = \sum_{p=1}^{P} w_p \, f_p^2(z_1, z_2, \ldots, z_N)$ defined in Equation 1. Thus, for notational simplicity herein, the weighted RMS of $f_p(z_1, z_2, \ldots, z_N)$ and Equation 1 can be used interchangeably.
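
The equivalence noted above can be checked numerically with the following minimal Python sketch. It assumes that Equation 1 is the weighted sum of squares of the $f_p$ values and that the weighted RMS is the square root of that sum divided by the number P of evaluation points; because the square root is monotonic and P is fixed, both quantities rank candidate settings identically. The function names, the weights, and the candidate $f_p$ values (which could be, for example, EPE values) are hypothetical.

```python
import math


def cost_function(f_values, weights):
    """Equation 1 as assumed here: CF = sum over p of w_p * f_p**2."""
    return sum(w * f * f for w, f in zip(weights, f_values))


def weighted_rms(f_values, weights):
    """Weighted RMS as assumed here: sqrt((1/P) * sum over p of w_p * f_p**2)."""
    return math.sqrt(cost_function(f_values, weights) / len(f_values))


# Hypothetical weights for three evaluation points; the first is treated as more critical.
weights = [2.0, 1.0, 1.0]

# Hypothetical f_p values at the three evaluation points for three candidate
# settings of the design variables.
candidates = {
    "setting_a": [0.8, -1.5, 0.4],
    "setting_b": [0.5, -0.9, 1.1],
    "setting_c": [1.2, 0.3, -0.2],
}

by_cost = sorted(candidates, key=lambda k: cost_function(candidates[k], weights))
by_rms = sorted(candidates, key=lambda k: weighted_rms(candidates[k], weights))

print(by_cost)
print(by_rms)
```

Both orderings agree, so the setting that minimizes one quantity also minimizes the other.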

[0172] Furthermore, if maximizing the process window (PW) is considered, the same physical location under different PW conditions can be regarded as different evaluation points in the cost function (Equation 1). For example, if U PW conditions are considered, the evaluation points can be categorized according to their PW conditions and the cost function can be written as:

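[0173] $$CF(z_1, z_2, \ldots, z_N) = \sum_{p=1}^{P} w_p \, f_p^2(z_1, z_2, \ldots, z_N) = \sum_{u=1}^{U} \sum_{p_u=1}^{P_u} w_{p_u} \, f_{p_u}^2(z_1, z_2, \ldots, z_N)$$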

[0174] where $f_{p_u}(z_1, z_2, \ldots, z_N)$ is the value of $f_p(z_1, z_2, \ldots, z_N)$ under the u-th PW condition, u = 1, ..., U. When $f_p(z_1, z_2, \ldots, z_N)$ is the EPE, minimizing the above cost function is equivalent to minimizing the edge shift under the various PW conditions, which leads to maximizing the PW. In particular, if the PW also includes different mask biases, minimizing the above cost function also includes minimizing the MEEF (mask error enhancement factor), which is defined as the ratio between the substrate EPE and the induced mask edge bias.

[0175] Design variables can have constraints, which can be expressed as $(z_1, z_2, \ldots, z_N) \in Z$, where $Z$ is the set of possible values of the design variables. One possible constraint on the design variables can be imposed by the yield or the desired throughput of the lithography projection apparatus. The desired yield or throughput may limit the dose and thus has implications for stochastic effects (e.g., it imposes a lower bound on stochastic effects). Higher throughput generally results in lower dose, shorter exposure time, and larger stochastic effects. Higher yield generally results in a constrained design that may be sensitive to stochastic risk. Since stochastic effects are a function of the design variables, jointly considering substrate throughput, yield, and the minimization of stochastic effects can constrain the possible values of the design variables. Without such a constraint imposed by the desired throughput, optimization can produce an unrealistic set of values for the design variables. For example, if the dose is among the design variables, then without such a constraint, optimization could yield a dose value that makes the throughput economically impossible. However, the usefulness of constraints should not be interpreted as a necessity. Throughput can be affected by adjusting the parameters of the patterning process based on the failure rate. It is desirable to maintain a high throughput while having a low feature failure rate. Throughput can also be affected by the resist chemistry. Slower resists (e.g., those requiring more light to be properly exposed) result in lower throughput. Therefore, appropriate parameters for the patterning process can be determined based on an optimization process involving the feature failure rate due to resist chemistry or fluctuations, and the dose requirements for higher throughput.

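[0176] Therefore, the optimization process involves finding the set of design variable values that minimizes the cost function under the constraints $(z_1, z_2, \ldots, z_N) \in Z$, i.e., finding

[0177] $(\tilde{z}_1, \tilde{z}_2, \ldots, \tilde{z}_N) = \arg\min_{(z_1, z_2, \ldots, z_N) \in Z} CF(z_1, z_2, \ldots, z_N)$ (Equation 2)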

[0178] Figure 13 illustrates an overall method for optimizing a lithography projection apparatus according to one embodiment. The method includes step S1202: defining a multivariate cost function of multiple design variables. The design variables may include any suitable combination selected from characteristics of the illumination source (1200A) (e.g., pupil fill ratio, i.e., the percentage of radiation from the source passing through the pupil or aperture), characteristics of the projection optics (1200B), and characteristics of the design layout (1200C). For example, the design variables may include characteristics of the illumination source (1200A) and characteristics of the design layout (1200C) (e.g., global bias), but not characteristics of the projection optics (1200B), which leads to source-mask optimization (SMO). Alternatively, the design variables may include characteristics of the illumination source (1200A), characteristics of the projection optics (1200B), and characteristics of the design layout (1200C), which leads to source-mask-lens optimization (SMLO). In step S1204, the design variables are simultaneously adjusted so that the cost function moves toward convergence. In step S1206, it is determined whether a predetermined termination condition is met. The predetermined termination condition can include various possibilities: the cost function is minimized or maximized, as required by the numerical technique used; the value of the cost function is equal to or exceeds a threshold; the value of the cost function reaches a preset error limit; or a preset number of iterations is reached. If any of the conditions in step S1206 is met, the method ends. If none of the conditions in step S1206 is met, steps S1204 and S1206 are repeated until the desired result is obtained. Optimization does not necessarily result in a single set of values for the design variables, as there may be physical constraints caused by factors such as failure rate, pupil fill factor, resist chemistry, throughput, etc. Optimization can provide multiple sets of values for the design variables and associated performance characteristics (e.g., throughput), and allows the user of the lithography apparatus to select one or more sets.
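The loop formed by steps S1204 and S1206 can be sketched as follows. This is only an illustrative outline, not the method of the embodiment: the quadratic cost, the gradient step, the threshold, and all function names are assumptions chosen to keep the example self-contained.

```python
def optimize(cost, step, z0, threshold=1e-6, max_iterations=100):
    """Adjust all design variables together until a termination condition is met."""
    z = list(z0)
    for iteration in range(1, max_iterations + 1):
        z = step(z)                       # S1204: adjust the variables simultaneously
        if cost(z) <= threshold:          # S1206: cost reached the preset limit
            return z, iteration, "threshold"
    return z, max_iterations, "max_iterations"   # S1206: iteration budget exhausted

# Hypothetical two-variable cost and a plain gradient step (both are assumptions).
def cost(z):
    return (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2

def gradient_step(z, rate=0.25):
    return [z[0] - rate * 2 * (z[0] - 1.0), z[1] - rate * 2 * (z[1] + 2.0)]

z_final, n_iter, reason = optimize(cost, gradient_step, [0.0, 0.0])
```

Returning the reason for termination alongside the result makes it easy to tell whether the run converged or simply ran out of iterations.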

[0179] In a lithography projection apparatus, the source, patterning apparatus, and projection optics can be optimized alternately (referred to as alternating optimization) or simultaneously (referred to as simultaneous optimization). The terms "simultaneous," "simultaneously," "joint," and "jointly" as used herein mean that the design variables of the characteristics of the source, patterning apparatus, and projection optics, or any other design variables, are allowed to change at the same time. The terms "alternating" and "alternately" as used herein mean that not all of the design variables are allowed to change at the same time.

[0180] In Figure 13, the optimization of all design variables is performed simultaneously. Such a process can be called a simultaneous process or a joint optimization process. Alternatively, as shown in Figure 14, the optimization of all design variables can be performed alternately. In this process, at each step, some design variables are fixed while the others are optimized to minimize the cost function; then, in the next step, a different set of variables is fixed while the others are optimized to minimize the cost function. These steps are performed alternately until convergence or until some termination condition is met.

[0181] As shown in the non-limiting example flowchart of Figure 14, a design layout is first obtained (step S1302). Then, in step S1304, a source optimization (SO) step is performed, in which all design variables of the illumination source are optimized to minimize the cost function while all other design variables are fixed. In the next step S1306, mask optimization (MO) is performed, in which all design variables of the patterning apparatus are optimized to minimize the cost function while all other design variables are fixed. These two steps are performed alternately until certain termination conditions are met in step S1308. Various termination conditions can be used, such as the cost function value becoming equal to a threshold, the cost function value exceeding a threshold, the cost function value reaching a preset error limit, or a preset number of iterations being reached. Note that SO-MO alternating optimization is used as an example of an alternating process. Alternating processes can take many different forms, such as SO-LO-MO alternating optimization, in which SO, LO (lens optimization), and MO are performed alternately and iteratively; or SMO can be performed once first, followed by LO and MO performed alternately and iteratively, and so on. Finally, the optimization result is output in step S1310, and the process stops.
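The alternating flow can be illustrated with a minimal sketch in which one scalar stands in for the source variables and another for the mask variables. The coupled quadratic cost, the search interval, and the golden-section line search are assumptions made for the example and are not taken from the disclosure.

```python
def minimize_1d(func, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def alternate_so_mo(cost, source, mask, max_rounds=50, tol=1e-10):
    """Alternate SO (mask fixed) and MO (source fixed) until the cost stops improving."""
    previous = cost(source, mask)
    current = previous
    for _ in range(max_rounds):
        source = minimize_1d(lambda s: cost(s, mask), -10.0, 10.0)   # S1304: SO
        mask = minimize_1d(lambda m: cost(source, m), -10.0, 10.0)   # S1306: MO
        current = cost(source, mask)
        if previous - current < tol:                                  # S1308: terminate
            break
        previous = current
    return source, mask, current

# Hypothetical cost that couples one source variable with one mask variable.
def cost(s, m):
    return (s - 1.0) ** 2 + (m - 2.0) ** 2 + 0.5 * s * m

source_opt, mask_opt, cf_opt = alternate_so_mo(cost, 0.0, 0.0)
```

Because the two variables are coupled, a single SO pass followed by a single MO pass is not enough; the alternation is repeated until the improvement per round falls below the tolerance.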

[0182] As previously described, the pattern selection algorithm can be combined with simultaneous or alternating optimization. For example, when alternating optimization is used, a full-chip SO can be performed first, "hot spots" or "warm spots" are identified, and then an MO can be performed. Given this disclosure, various permutations and combinations of sub-optimizations are possible to achieve the desired optimization results.

[0183] Figure 15A shows an exemplary method for optimization, in which the cost function is minimized. In step S502, initial values for the design variables are obtained, including their tuning ranges (if any). In step S504, a multivariate cost function is established. In step S506, for the first iteration step (i = 0), the cost function is expanded within a sufficiently small neighborhood around the initial values of the design variables. In step S508, standard multivariate optimization techniques are applied to minimize the cost function. Note that constraints, such as tuning ranges, may be applied during the optimization process in S508 or at a later stage of the optimization process. Step S520 indicates that each iteration is performed for the given test patterns (also referred to as "gauges") for the identified evaluation points that have been selected to optimize the lithography process. In step S510, the lithographic response is predicted. In step S512, the result of step S510 is compared with the desired or ideal lithographic response value obtained in step S522. If the termination condition is met in step S514, i.e., the optimization generates a lithographic response value sufficiently close to the desired value, then the final values of the design variables are output in step S518. The output step may also include outputting other functions using the final values of the design variables, such as a wavefront aberration adjustment map at the pupil plane (or other plane), an optimized source image, and an optimized design layout. If the termination condition is not met, then in step S516 the values of the design variables are updated using the result of the i-th iteration, and the process returns to step S506. The process of Figure 15A is described in detail below.

[0184] In the exemplary optimization process, no relationship between the design variables $(z_1, z_2, \ldots, z_N)$ and $f_p(z_1, z_2, \ldots, z_N)$ is assumed or approximated, except that $f_p(z_1, z_2, \ldots, z_N)$ is sufficiently smooth (e.g., the first derivatives $\partial f_p / \partial z_n$, n = 1, 2, ..., N, exist), which is generally valid in a lithography projection apparatus. Algorithms such as the Gauss-Newton algorithm, the Levenberg-Marquardt algorithm, gradient descent, simulated annealing, and genetic algorithms can be applied to find the optimal values of the design variables.

[0185] Here, the Gauss-Newton algorithm is used as an example. The Gauss-Newton algorithm is an iterative method applicable to general nonlinear multivariable optimization problems. In the i-th iteration, in which the design variables $(z_1, z_2, \ldots, z_N)$ take the values $(z_{1i}, z_{2i}, \ldots, z_{Ni})$, the Gauss-Newton algorithm linearizes $f_p(z_1, z_2, \ldots, z_N)$ in the vicinity of $(z_{1i}, z_{2i}, \ldots, z_{Ni})$, and then calculates the values $(z_{1(i+1)}, z_{2(i+1)}, \ldots, z_{N(i+1)})$ in the vicinity of $(z_{1i}, z_{2i}, \ldots, z_{Ni})$ that give the minimum of $CF(z_1, z_2, \ldots, z_N)$. The design variables $(z_1, z_2, \ldots, z_N)$ take the values $(z_{1(i+1)}, z_{2(i+1)}, \ldots, z_{N(i+1)})$ in the (i+1)-th iteration. The iteration continues until convergence is reached (i.e., $CF(z_1, z_2, \ldots, z_N)$ no longer decreases) or a preset number of iterations is reached.

[0186] Specifically, in the i-th iteration, in the vicinity of $(z_{1i}, z_{2i}, \ldots, z_{Ni})$,

[0187]

[0188] According to the approximation in Equation 3, the cost function becomes:

[0189]

[0190] which is a quadratic function of the design variables $(z_1, z_2, \ldots, z_N)$. Every term is constant except the design variables $(z_1, z_2, \ldots, z_N)$.
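As a sketch of the two expressions referred to in paragraphs [0186] to [0190], assuming that Equation 3 is the usual first-order Taylor expansion about the i-th iterate and that Equation 1 is the weighted sum of squares $\sum_p w_p f_p^2$, the linearization and the resulting quadratic cost function can be written as:

```latex
f_p(z_1, \ldots, z_N) \approx f_p(z_{1i}, \ldots, z_{Ni})
  + \sum_{n=1}^{N} \left. \frac{\partial f_p(z_1, \ldots, z_N)}{\partial z_n}
    \right|_{z_1 = z_{1i}, \ldots, z_N = z_{Ni}} (z_n - z_{ni})

CF(z_1, \ldots, z_N) \approx \sum_{p=1}^{P} w_p \left( f_p(z_{1i}, \ldots, z_{Ni})
  + \sum_{n=1}^{N} \left. \frac{\partial f_p(z_1, \ldots, z_N)}{\partial z_n}
    \right|_{z_1 = z_{1i}, \ldots, z_N = z_{Ni}} (z_n - z_{ni}) \right)^2
```

The second expression is obtained by substituting the first into the assumed form of Equation 1; squaring a function that is linear in $z_n$ is what makes the result quadratic in the design variables.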

[0191] If the design variables $(z_1, z_2, \ldots, z_N)$ are not subject to any constraints, $(z_{1(i+1)}, z_{2(i+1)}, \ldots, z_{N(i+1)})$ can be derived by solving the N linear equations $\partial CF(z_1, z_2, \ldots, z_N) / \partial z_n = 0$, where n = 1, 2, ..., N.
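The unconstrained Gauss-Newton iteration can be sketched for the two-variable case as follows. The sketch assumes a weighted sum-of-squares cost; the exponential response model, the data, and the function names are hypothetical and are chosen only so that the example is self-contained and has a known solution.

```python
import math

def gauss_newton(residuals, jacobian, weights, z, iterations=20):
    """Minimize sum_p w_p * f_p(z)^2 over two design variables by Gauss-Newton."""
    for _ in range(iterations):
        f = residuals(z)
        J = jacobian(z)          # J[p][n] = d f_p / d z_n at the current iterate
        # Setting dCF/dz_n = 0 for the linearized f_p gives (J^T W J) dz = -J^T W f.
        a11 = sum(w * row[0] * row[0] for w, row in zip(weights, J))
        a12 = sum(w * row[0] * row[1] for w, row in zip(weights, J))
        a22 = sum(w * row[1] * row[1] for w, row in zip(weights, J))
        b1 = -sum(w * row[0] * fp for w, row, fp in zip(weights, J, f))
        b2 = -sum(w * row[1] * fp for w, row, fp in zip(weights, J, f))
        det = a11 * a22 - a12 * a12
        # Solve the 2x2 linear system by Cramer's rule.
        dz = [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]
        z = [z[0] + dz[0], z[1] + dz[1]]
    return z

# Hypothetical response model: f_p(z) = z0 * exp(z1 * x_p) - y_p, true z = (2, -0.5).
x_points = [0.0, 1.0, 2.0, 3.0]
targets = [2.0 * math.exp(-0.5 * x) for x in x_points]

def residuals(z):
    return [z[0] * math.exp(z[1] * x) - y for x, y in zip(x_points, targets)]

def jacobian(z):
    return [[math.exp(z[1] * x), z[0] * x * math.exp(z[1] * x)] for x in x_points]

z_opt = gauss_newton(residuals, jacobian, [1.0] * 4, [1.5, -0.3])
```

For more than two design variables, the same normal equations would be solved with a general linear solver instead of Cramer's rule.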

[0192] If the design variables $(z_1, z_2, \ldots, z_N)$ are subject to constraints in the form of J inequalities (e.g., the tuning ranges of $(z_1, z_2, \ldots, z_N)$), $\sum_{n=1}^{N} A_{nj} z_n \le B_j$ for j = 1, 2, ..., J, and in the form of K equalities (e.g., interdependence between design variables), $\sum_{n=1}^{N} C_{nk} z_n = D_k$ for k = 1, 2, ..., K, the optimization process becomes a classic quadratic programming problem, where $A_{nj}$, $B_j$, $C_{nk}$, and $D_k$ are constants. Additional constraints can be imposed in each iteration. For example, a "damping factor" $\Delta_D$ can be introduced to limit the difference between $(z_{1(i+1)}, z_{2(i+1)}, \ldots, z_{N(i+1)})$ and $(z_{1i}, z_{2i}, \ldots, z_{Ni})$, so that the approximation of Equation 3 holds. Such constraints can be expressed as $z_{ni} - \Delta_D \le z_n \le z_{ni} + \Delta_D$. $(z_{1(i+1)}, z_{2(i+1)}, \ldots, z_{N(i+1)})$ can be derived using, for example, the methods described by Jorge Nocedal and Stephen J. Wright in Numerical Optimization (2nd Edition) (Berlin / New York: Vandenberg, Cambridge University Press).

[0193] Instead of minimizing the RMS of $f_p(z_1, z_2, \ldots, z_N)$, the optimization process can minimize the magnitude of the largest deviation (the worst-case defect) among the evaluation points from their expected values. In this approach, the cost function can alternatively be expressed as:

[0194]

[0195] where $CL_p$ is the maximum allowable value for $f_p(z_1, z_2, \ldots, z_N)$. This cost function represents the worst-case defect among the evaluation points. Optimization using this cost function minimizes the magnitude of the worst-case defect. An iterative greedy algorithm can be used for this optimization.

[0196] The cost function in Equation 5 can be approximated as:

[0197]

[0198] where q is an even positive integer, such as at least 4, preferably at least 10. Equation 6 mimics the behavior of Equation 5, while allowing the optimization to be carried out analytically and accelerated by using methods such as the steepest descent method and the conjugate gradient method.
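The effect of the exponent q can be illustrated with a small sketch. It assumes that Equation 5 is the largest ratio $f_p / CL_p$ over the evaluation points and that Equation 6 is the weighted sum of $(f_p / CL_p)^q$; both forms, the function names, and the sample values are assumptions used for illustration.

```python
def worst_case_cost(f_values, limits):
    """Assumed Equation 5 form: largest ratio of f_p to its allowed maximum CL_p.

    abs() is an added assumption so that negative deviations count like
    positive ones, which is also what an even exponent q does.
    """
    return max(abs(f) / cl for f, cl in zip(f_values, limits))

def smooth_cost(f_values, limits, q=10, weights=None):
    """Assumed Equation 6 form: weighted sum of (f_p / CL_p)**q with even q."""
    weights = weights or [1.0] * len(f_values)
    return sum(w * (f / cl) ** q for w, f, cl in zip(weights, f_values, limits))

limits = [1.0, 1.0, 2.0]
candidates = {
    "a": [0.9, 0.1, 0.2],    # small on average, but one large defect
    "b": [0.6, 0.6, 1.1],    # larger on average, but a smaller worst defect
}

best_worst = min(candidates, key=lambda k: worst_case_cost(candidates[k], limits))
best_q2 = min(candidates, key=lambda k: smooth_cost(candidates[k], limits, q=2))
best_q10 = min(candidates, key=lambda k: smooth_cost(candidates[k], limits, q=10))
```

With q = 2 the sum behaves like an RMS measure and prefers candidate "a", whereas with q = 10 the largest term dominates and the choice agrees with the worst-case criterion, while the cost remains differentiable.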

[0199] Minimizing the worst-case defect size can also be combined with the linearization of $f_p(z_1, z_2, \ldots, z_N)$. Specifically, $f_p(z_1, z_2, \ldots, z_N)$ is approximated as in Equation 3. Then, the constraints on the worst-case defect size are written as the inequalities $E_{Lp} \le f_p(z_1, z_2, \ldots, z_N) \le E_{Up}$, where $E_{Lp}$ and $E_{Up}$ are two constants specifying the minimum and maximum permissible deviations of $f_p(z_1, z_2, \ldots, z_N)$. Inserting Equation 3, these constraints are transformed into the following inequalities, where p = 1, ..., P.

[0200]

[0201] and

[0202]

[0203] Since Equation 3 is usually valid only in the vicinity of $(z_{1i}, z_{2i}, \ldots, z_{Ni})$, if the desired constraints $E_{Lp} \le f_p(z_1, z_2, \ldots, z_N) \le E_{Up}$ cannot be achieved in that vicinity (which can be determined by any conflict among the inequalities), the constants $E_{Lp}$ and $E_{Up}$ can be relaxed until the constraints are achievable. This optimization process minimizes the worst-case defect size in the vicinity of $(z_{1i}, z_{2i}, \ldots, z_{Ni})$. Each step then reduces the worst-case defect size gradually, and the steps are performed iteratively until some termination condition is met. This results in an optimal reduction of the worst-case defect size.

[0204] Another way to minimize the worst-case defect is to adjust the weight $w_p$ in each iteration. For example, after the i-th iteration, if the r-th evaluation point is the worst defect, then $w_r$ can be increased in the (i+1)-th iteration, so that reducing the defect size at this evaluation point is given higher priority.
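The weight adjustment can be sketched as follows. The multiplicative boost factor, the function name, and the sample defect sizes are hypothetical; the text only states that the weight of the worst evaluation point is increased.

```python
def reweight_worst(weights, f_values, boost=2.0):
    """Return new weights with the weight of the worst evaluation point increased."""
    worst = max(range(len(f_values)), key=lambda p: abs(f_values[p]))
    updated = list(weights)
    updated[worst] *= boost          # the boost factor is a hypothetical choice
    return updated, worst

weights = [1.0, 1.0, 1.0]
f_after_iteration_i = [0.2, -0.7, 0.4]     # hypothetical defect sizes after iteration i
weights_next, r = reweight_worst(weights, f_after_iteration_i)
```

The updated weights would then be used in the cost function of the (i+1)-th iteration, leaving the optimizer itself unchanged.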

[0205] Additionally, the cost functions in Equations 4 and 5 can be modified by introducing Lagrange multipliers to achieve a trade-off between optimizing the RMS of defect size and optimizing the worst-case defect size, i.e.,

[0206]

[0207] Here, λ is a preset constant that specifies the trade-off between optimizing the RMS of the defect size and optimizing the worst-case defect size. Specifically, if λ = 0, this becomes Equation 4, and only the RMS of the defect size is minimized; if λ = 1, this becomes Equation 5, and only the worst-case defect size is minimized; if 0 < λ < 1, both are considered in the optimization. Such an optimization can be solved using various methods. For example, similar to what was previously described, the weights in each iteration can be adjusted. Alternatively, similar to minimizing the worst-case defect size from inequalities, the inequalities of Equations 6' and 6'' can be viewed as constraints on the design variables while solving the quadratic programming problem. Then, the bounds on the worst-case defect size can be relaxed incrementally, or the weight of the worst-case defect size can be increased incrementally; the cost function value is calculated for each achievable worst-case defect size, and the design variable values that minimize the total cost function are chosen as the initial point for the next step. By doing this iteratively, the new cost function can be minimized.
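The role of λ can be sketched with a small function. It assumes that the combined cost is $(1 - \lambda)$ times the weighted sum of squares plus $\lambda$ times the largest ratio $f_p / CL_p$, which is consistent with the two limiting cases described above; the function name and sample values are hypothetical.

```python
def blended_cost(f_values, weights, limits, lam):
    """(1 - lam) * weighted sum of squares + lam * worst ratio f_p / CL_p (assumed form)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    rms_term = sum(w * f * f for w, f in zip(weights, f_values))
    worst_term = max(abs(f) / cl for f, cl in zip(f_values, limits))
    return (1.0 - lam) * rms_term + lam * worst_term

f_values = [0.3, -0.4, 0.2]      # hypothetical defect sizes
weights = [1.0, 1.0, 1.0]
limits = [1.0, 0.5, 1.0]         # hypothetical maximum allowable values CL_p

rms_only = blended_cost(f_values, weights, limits, 0.0)     # RMS term only
worst_only = blended_cost(f_values, weights, limits, 1.0)   # worst-case term only
mixed = blended_cost(f_values, weights, limits, 0.5)        # both terms considered
```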

[0208] Optimizing the lithography projection apparatus can expand the process window. A larger process window provides greater flexibility in process and chip design. The process window can be defined as a set of focus and dose values for which the resist image is within certain limits of the design goals of the resist image. Note that all methods discussed herein can also be extended to a generalized process window definition, which can be established by different or additional fundamental parameters besides exposure dose and defocus. These can include, but are not limited to, optical settings such as NA, σ, aberrations, polarization, or optical constants of the resist layer. For example, as previously mentioned, if the PW also includes different mask biases, optimization includes minimizing the MEEF (mask error enhancement factor), which is defined as the ratio between the substrate EPE and the resulting mask edge deviation. The process window defined on focus and dose values is merely an example in this disclosure. The following describes a method for maximizing the process window according to one embodiment.

[0209] In the first step, starting from a known condition $(f_0, \varepsilon_0)$ in the process window, where $f_0$ is the nominal focus and $\varepsilon_0$ is the nominal dose, one of the following cost functions is minimized in the vicinity $(f_0 \pm \Delta f, \varepsilon_0 \pm \Delta\varepsilon)$:

[0210]

[0211] or

[0212]

[0213] or

[0214]

[0215] If the nominal focus $f_0$ and the nominal dose $\varepsilon_0$ are allowed to shift, they can be optimized jointly with the design variables $(z_1, z_2, \ldots, z_N)$. In the next step, $(f_0 \pm \Delta f, \varepsilon_0 \pm \Delta\varepsilon)$ is accepted as part of the process window if a set of values of $(z_1, z_2, \ldots, z_N, f, \varepsilon)$ can be found such that the cost function is within a preset limit.

[0216] Alternatively, if the focus and dose are not allowed to shift, the design variables $(z_1, z_2, \ldots, z_N)$ are optimized with the focus and dose fixed at the nominal focus $f_0$ and the nominal dose $\varepsilon_0$. In an alternative embodiment, $(f_0 \pm \Delta f, \varepsilon_0 \pm \Delta\varepsilon)$ is accepted as part of the process window if a set of values of $(z_1, z_2, \ldots, z_N)$ can be found such that the cost function is within a preset limit.
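The acceptance test for a candidate process window can be sketched as follows. The sketch assumes a cost of the sum-of-squares type evaluated at the four corners $(f_0 \pm \Delta f, \varepsilon_0 \pm \Delta\varepsilon)$; the response model, the candidate design-variable sets, the limit, and all function names are hypothetical.

```python
from itertools import product

def window_cost(f_p, z, f0, e0, df, de, weights):
    """Assumed cost: weighted squared f_p summed over the four window corners."""
    total = 0.0
    for focus, dose in product((f0 - df, f0 + df), (e0 - de, e0 + de)):
        total += sum(w * fp ** 2 for w, fp in zip(weights, f_p(z, focus, dose)))
    return total

def accept_window(f_p, candidates, f0, e0, df, de, weights, limit):
    """Accept (f0 +/- df, e0 +/- de) if some candidate z keeps the cost within the limit."""
    return any(window_cost(f_p, z, f0, e0, df, de, weights) <= limit for z in candidates)

# Hypothetical response: EPE at two evaluation points as a function of z, focus, and dose.
def f_p(z, focus, dose):
    return [z[0] + 0.01 * focus + 2.0 * (dose - 1.0), z[1] - 0.02 * focus]

candidates = [[0.0, 0.0], [0.5, 0.5]]
small = accept_window(f_p, candidates, 0.0, 1.0, 10.0, 0.05, [1.0, 1.0], limit=0.5)
large = accept_window(f_p, candidates, 0.0, 1.0, 50.0, 0.05, [1.0, 1.0], limit=0.5)
```

In this example the smaller focus range is accepted and the larger one is rejected. In practice the candidate values would come from an optimizer rather than from a fixed list.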

[0217] The methods previously described in this disclosure can be used to minimize the corresponding cost functions of Equations 7, 7', or 7''. If the design variables are characteristics of the projection optics, such as Zernike coefficients, minimizing the cost function of Equations 7, 7', or 7'' results in maximization of the process window based on projection optics optimization (i.e., LO). If the design variables are characteristics of the source and patterning apparatus in addition to the characteristics of the projection optics, minimizing the cost function of Equations 7, 7', or 7'' results in maximization of the process window based on SMLO, as illustrated in Figure 14. If the design variables are characteristics of the source and patterning apparatus, minimizing the cost function of Equations 7, 7', or 7'' results in maximization of the process window based on SMO. The cost function of Equations 7, 7', or 7'' may also include at least one $f_p(z_1, z_2, \ldots, z_N)$, such as that in Equation 7 or Equation 8, that is a function of one or more random effects (such as the LWR of 2D features, or local CD variation) and throughput.

[0218] Figure 16 shows a specific example of how a simultaneous SMLO process can use the Gauss-Newton algorithm for optimization. In step S702, the initial values of the design variables are identified. The tuning range for each variable can also be identified. In step S704, the cost function is defined using the design variables. In step S706, the cost function is expanded around the initial values for all evaluation points in the design layout. In optional step S710, a full-chip simulation is performed to cover all critical patterns in the full-chip design layout. In step S714, the desired lithographic response metrics (such as CD or EPE) are obtained, and in step S712, the desired lithographic response metrics are compared with the predicted values of these quantities. In step S716, the process window is determined. Steps S718, S720, and S722 are similar to the corresponding steps S514, S516, and S518 described above with reference to Figure 15A. As previously mentioned, the final output can be a wavefront aberration map in the pupil plane, which is optimized to produce the desired imaging performance. The final output can also be an optimized source image or an optimized design layout.

[0219] Figure 15B shows an exemplary method for optimizing the cost function, where the design variables $(z_1, z_2, \ldots, z_N)$ include design variables that can take only discrete values.

[0220] The method begins by defining pixel groups of the irradiation source and pattern forming apparatus blocks of the pattern forming apparatus (step S802). In general, a pixel group or a pattern forming apparatus block can also be referred to as a division of a lithography process component. In an exemplary method, substantially as described above, the irradiation source is divided into 117 pixel groups, and 94 pattern forming apparatus blocks are defined for the pattern forming apparatus, resulting in a total of 211 divisions.

[0221] In step S804, a lithography model is selected as the basis for lithography simulation. The lithography simulation produces results for calculating lithography parameters or responses. Specific lithography parameters are defined as performance parameters to be optimized (step S806). In step S808, initial (pre-optimized) conditions for the illumination source and pattern forming apparatus are set. Initial conditions include the initial state of the pixel group of the illumination source and the pattern forming apparatus block of the pattern forming apparatus, allowing reference to the initial illumination shape and the initial pattern forming apparatus pattern. Initial conditions may also include mask bias, NA, and focus ramp range. Although steps S802, S804, S806, and S808 are described as sequential steps, it should be understood that in other embodiments of the invention, these steps may be performed in other sequences.

[0222] In step S810, pixel groups and pattern forming apparatus blocks are sorted. Pixel groups and pattern forming apparatus blocks can be interleaved during sorting. Various sorting methods can be used, including: sequentially (e.g., from pixel group 1 to pixel group 117, and from pattern forming apparatus block 1 to pattern forming apparatus block 94), randomly, according to the physical location of pixel groups and pattern forming apparatus blocks (e.g., sorting pixel groups closer to the center of the illumination source higher), and according to how changes to pixel groups or pattern forming apparatus blocks affect performance metrics.

[0223] Once the pixel group and pattern forming apparatus blocks are sorted, the illumination source and pattern forming apparatus are adjusted to improve performance metrics (step S812). In step S812, each of the pixel group and pattern forming apparatus blocks is analyzed in sorting order to determine whether a change in the pixel group or pattern forming apparatus block will result in improved performance metrics. If it is determined that the performance metrics will be improved, the pixel group or pattern forming apparatus block is changed accordingly, and the resulting improved performance metrics and the modified illumination shape or modified pattern forming apparatus pattern form a baseline for comparison, which is then used for subsequent analysis of lower-level pixel groups and pattern forming apparatus blocks. In other words, the change to improve performance metrics is retained. When the state of the pixel group and pattern forming apparatus blocks is changed and maintained, the initial illumination shape and the initial pattern forming apparatus pattern are changed accordingly, such that the modified illumination shape and the modified pattern forming apparatus pattern are generated by the optimization process in step S812.

[0224] In other methods, adjustment of the polygon shapes of the pattern forming apparatus and pairwise polling of pixel groups or pattern forming apparatus blocks are also performed during the optimization process of S812.

[0225] In an alternative embodiment, the simultaneous optimization process may include changing the pixel group of the irradiation source and, if an improvement in performance metrics is found, gradually increasing and decreasing the dose to seek further improvements. In another alternative embodiment, the gradual increase and decrease of dose or intensity may be replaced by a change in the bias of the pattern of the pattern forming apparatus to seek further improvements in the simultaneous optimization process.

[0226] In step S814, it is determined whether the performance metric has converged. For example, if the performance metric has shown little or no improvement in the last few iterations of steps S810 and S812, it can be considered that the performance metric has converged. If the performance metric has not converged, steps S810 and S812 are repeated in the next iteration, wherein the modified irradiation shape and modified pattern forming apparatus from the current iteration are used as the initial irradiation shape and initial pattern forming apparatus for the next iteration (step S816).
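The discrete procedure of steps S810 to S816 can be sketched as a greedy search over binary divisions. The on/off representation of each division, the sequential ranking, the target-mismatch metric, and the function names are assumptions made for illustration; a real performance metric would come from lithography simulation.

```python
def greedy_discrete_optimize(metric, state, max_passes=20):
    """Flip one binary division at a time, keeping only changes that improve the metric."""
    state = list(state)
    best = metric(state)
    for _ in range(max_passes):                  # S814/S816: repeat until converged
        improved = False
        order = range(len(state))                # S810: sequential ranking (one option)
        for index in order:                      # S812: analyze divisions in ranked order
            trial = list(state)
            trial[index] = 1 - trial[index]
            value = metric(trial)
            if value < best:                     # a lower metric is treated as better here
                state, best, improved = trial, value, True
        if not improved:                         # no change helped: metric has converged
            break
    return state, best

# Hypothetical metric: mismatch between the on/off pattern and a target pattern.
target = [1, 0, 1, 1, 0, 0]

def metric(state):
    return sum((s - t) ** 2 for s, t in zip(state, target))

final_state, final_metric = greedy_discrete_optimize(metric, [0, 0, 0, 0, 0, 0])
```

Each accepted change becomes the new baseline for the divisions analyzed after it, which mirrors the retention of improvements described for step S812.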

[0227] The optimization methods described above can be used to increase the throughput of a lithography projection apparatus. For example, the cost function can include an $f_p(z_1, z_2, \ldots, z_N)$ that is a function of the exposure time. The optimization of such a cost function is preferably constrained or influenced by measures of stochastic effects or other metrics. Specifically, a computer-implemented method for increasing the throughput of a lithography process may include optimizing a cost function that is a function of one or more stochastic effects of the lithography process and of the exposure time of the substrate, in order to minimize the exposure time.

[0228] In one embodiment, the cost function includes at least one $f_p(z_1, z_2, \ldots, z_N)$ that is a function of one or more random effects. Random effects can include feature failures, measurement data (e.g., SEPE) determined as in the method of Figure 3, LWR of 2D features, or local CD variations. In one embodiment, random effects include random variations in the properties of the resist image. For example, such random variations may include feature failure rate, line edge roughness (LER), line width roughness (LWR), and critical dimension uniformity (CDU). Including random variations in the cost function allows finding design variable values that minimize random variations, thereby reducing the risk of defects due to random effects.

[0229] Figure 17 is a block diagram illustrating a computer system 100 that can assist in implementing the various methods and systems disclosed herein. Computer system 100 includes a bus 102 or other communication mechanism for communicating information, and a processor 104 (or multiple processors 104 and 105) coupled to the bus 102 to process information. Computer system 100 also includes a main memory 106, such as random access memory (RAM) or other dynamic storage device, coupled to the bus 102, for storing information and instructions to be executed by the processor 104. Main memory 106 can also be used to store temporary variables or other intermediate information during the execution of instructions to be executed by the processor 104. Computer system 100 also includes a read-only memory (ROM) 108 or other static storage device coupled to the bus 102 for storing static information and instructions for the processor 104. A storage device 110, such as a magnetic disk or optical disk, is provided and coupled to the bus 102 for storing information and instructions.

[0230] Computer system 100 can be coupled to display 112, such as a cathode ray tube (CRT), flat panel display, or touch panel display, via bus 102 for displaying information to the computer user. Input device 114, including alphanumeric keys and other keys, is coupled to bus 102 for communicating information and command selection to processor 104. Another type of user input device is cursor controller 116, such as a mouse, trackball, or cursor arrow keys, for communicating directional information and command selection to processor 104, and for controlling cursor movement on display 112. This input device typically has two degrees of freedom on two axes (a first axis (e.g., x) and a second axis (e.g., y)), allowing the device to specify its position in a plane. Touch panel (screen) displays can also be used as input devices.

[0231] According to one embodiment, portions of one or more methods described herein can be executed by computer system 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another computer-readable medium, such as storage device 110. Execution of the sequence of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. One or more processors in a multiprocessor arrangement may also be used to execute the sequence of instructions contained in main memory 106. In an alternative embodiment, hardwired circuitry may be used in place of or in combination with software instructions. Therefore, the description herein is not limited to any particular combination of hardware circuitry and software.

[0232] As used herein, the term "computer-readable medium" refers to any medium that participates in providing instructions to processor 104 for execution. Such media can take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical discs or magnetic disks, such as storage device 110. Volatile media include dynamic memory, such as main memory 106. Transmission media include coaxial cables, copper wires, and optical fibers, including conductors forming bus 102. Transmission media can also take the form of acoustic or optical waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic media, CD-ROMs, DVDs, any other optical media, punched cards, paper tape, any other physical media with a perforated pattern, RAM, PROMs and EPROMs, FLASH-EPROMs, any other memory chips or cartridges, carrier waves as described below, or any other media from which a computer can read.

[0233] Various forms of computer-readable media may involve carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a disk of a remote computer. The remote computer may load the instructions into its dynamic memory and transmit the instructions over a telephone line using a modem. A modem local to computer system 100 may receive data over the telephone line and convert the data into an infrared signal using an infrared transmitter. An infrared detector coupled to bus 102 may receive the data carried in the infrared signal and place the data on bus 102. Bus 102 transfers the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 before or after execution by processor 104.

[0234] Computer system 100 also preferably includes a communication interface 118 coupled to bus 102. Communication interface 118 provides bidirectional data communication coupled to network link 120, which is connected to local network 122. For example, communication interface 118 may be an Integrated Services Digital Network (ISDN) card or modem for providing data communication connectivity to a corresponding type of telephone line. As another example, communication interface 118 may be a Local Area Network (LAN) card for providing data communication connectivity to a compatible LAN. A wireless link may also be implemented. In any such implementation, communication interface 118 transmits and receives electrical, electromagnetic, or optical signals carrying digital data streams representing various types of information.

[0235] Network link 120 typically provides data communication to other data devices via one or more networks. For example, network link 120 may provide a connection to host computer 124 or to data devices operated by Internet Service Provider (ISP) 126 via local network 122. ISP 126 then provides data communication services via a global packet data communication network now commonly referred to as the "Internet" 128. Both local network 122 and Internet 128 use electrical, electromagnetic, or optical signals that carry digital data streams. Signals through various networks and on network link 120, as well as signals through communication interface 118 (which carries digital data to and from computer system 100), are exemplary forms of carrier waves for transmitting information.

[0236] Computer system 100 can send messages and receive data (including program code) via the network(s), network link 120, and communication interface 118. In the Internet example, server 130 can transmit requested code for an application via Internet 128, ISP 126, local network 122, and communication interface 118. For example, an application downloaded in this way can provide illumination optimization for an embodiment. The received code can be executed by processor 104 upon receipt or stored in storage device 110 or other non-volatile storage for later execution. In this way, computer system 100 can obtain application code in the form of a carrier wave.

[0237] Figure 18 schematically depicts an exemplary lithographic projection apparatus whose illumination source can be optimized using the methods described herein. The apparatus includes:

[0238] - Irradiation system IL, used to modulate radiation beam B. In this specific case, the irradiation system also includes radiation source SO;

[0239] - A first stage (e.g., a mask stage) MT, which is provided with a pattern forming device holder to hold the pattern forming device MA (e.g., a mask), and is connected to a first positioner to precisely position the pattern forming device relative to the article PS;

[0240] - A second stage (substrate stage) WT, which is provided with a substrate holder to hold the substrate W (e.g., a silicon wafer coated with resist) and is connected to a second positioner to precisely position the substrate relative to the article PS;

[0241] - A projection system (“lens”) PS (e.g., a refractive, reflective, or catadioptric optical system) for imaging the irradiated portion of the pattern forming apparatus MA onto a target portion C (e.g., including one or more dies) of the substrate W.

[0242] As depicted herein, the device is transmissive (i.e., has a transmissive mask). However, in general, it can also be reflective, for example (with a reflective mask). Alternatively, the device can employ another class of patterning apparatus as an alternative to a conventional mask; examples include programmable mirror arrays or LCD matrices.

[0243] A source SO (e.g., a mercury lamp or excimer laser) generates a radiation beam. This beam is fed, directly or after passing through an adjustment device (such as a beam expander Ex), into an irradiation system (irradiator) IL. The irradiator IL may include an adjustment device AD for setting the outer or inner radial range of the intensity distribution in the beam (typically referred to as σ_outer and σ_inner, respectively). Additionally, it typically includes various other components, such as an integrator IN and a condenser CO. In this way, the beam B striking the pattern forming apparatus MA has the desired uniformity and intensity distribution across its cross-section.

[0244] With regard to Figure 18, it should be noted that the source SO can be inside the housing of the lithography projection device (e.g., this is usually the case when the source SO is a mercury lamp), but the source SO can also be located away from the lithography projection device, and the radiation beam it generates can be directed into the device (e.g., by means of suitable directional mirrors); the latter is usually the case when the source SO is an excimer laser (e.g., based on KrF, ArF, or F2 lasers).

[0245] The beam PB then intercepts the patterning apparatus MA held on the patterning apparatus stage MT. After passing through the patterning apparatus MA, the beam B passes through the lens PL, which focuses the beam B onto the target portion C of the substrate W. With the aid of a second positioning device (and an interferometric measuring device IF), the substrate stage WT can be precisely moved, for example, to position different target portions C within the path of the beam PB. Similarly, the first positioning device can be used to precisely position the patterning apparatus MA relative to the path of the beam B, for example, after mechanically retrieving the patterning apparatus MA from the patterning apparatus library, or during scanning. Typically, the movement of the stages MT and WT is achieved using long-stroke modules (coarse positioning) and short-stroke modules (fine positioning), which are not explicitly depicted in Figure 18. However, in the case of a wafer stepper (as opposed to a step-and-scan tool), the patterning apparatus stage MT can be connected only to a short-stroke actuator or can be fixed.

[0246] The depicted tool can be used in two different modes:

[0247] - In step mode, the patterning apparatus stage MT remains essentially stationary, and the entire patterning apparatus image is projected onto the target portion C in one go (i.e., a single "flash"). The substrate stage WT then shifts in the x or y direction, allowing the beam PB to illuminate different target portions C;

[0248] - In scanning mode, essentially the same scenario applies, except that a given target portion C is not exposed in a single "flash". Instead, the patterning apparatus stage MT can move at a speed v in a given direction (the so-called "scanning direction", such as the y-direction), causing the projected beam B to scan over the patterning apparatus image; simultaneously, the substrate stage WT moves in the same or opposite direction at a speed V = Mv, where M is the magnification of the lens PL (typically M = 1/4 or 1/5). In this way, a relatively large target portion C can be exposed without sacrificing resolution.
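As a worked illustration of the relation V = Mv, the following minimal Python sketch computes the substrate stage speed from an assumed mask stage speed. The function name and the 400 mm/s figure are hypothetical and chosen only for illustration; M = 1/4 is one of the typical values cited above.

```python
from fractions import Fraction

def substrate_stage_speed(mask_stage_speed, magnification):
    """Return the substrate stage speed V = M * v used in scanning mode."""
    return magnification * mask_stage_speed

# Hypothetical mask stage speed v in mm/s (an assumption, not from the text).
v = Fraction(400)
# Typical magnification cited in the text.
M = Fraction(1, 4)

V = substrate_stage_speed(v, M)
print(f"v = {v} mm/s, M = {M}, V = {V} mm/s")
```

Exact fractions are used so that the 1/4 and 1/5 ratios do not pick up floating-point rounding.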

[0249] Figure 19 schematically depicts another exemplary lithography projection apparatus LA whose illumination source can be optimized using the methods described herein.

[0250] Photolithography projection equipment (LA) includes:

[0251] - Source collector module SO;

[0252] - Irradiation system (irradiator) IL, which is configured to modulate the radiation beam B (e.g., EUV radiation);

[0253] - A support structure (e.g., a mask stage) MT, configured to support a pattern forming apparatus (e.g., a mask or a mask plate) MA and connected to a first positioner PM, the first positioner PM being configured to precisely position the pattern forming apparatus;

[0254] - A substrate stage (e.g., a wafer stage) WT, configured to hold a substrate (e.g., a wafer coated with resist) W and connected to a second positioner PW, the second positioner PW being configured to precisely position the substrate; and

[0255] - A projection system (e.g., a reflective projection system) PS, configured to project a pattern imparted to the radiation beam B by the pattern forming apparatus MA onto a target portion C (e.g., including one or more dies) of the substrate W.

[0256] As depicted herein, the apparatus LA is reflective (e.g., employing a reflective mask). It should be noted that because most materials are absorbent in the EUV wavelength range, the mask can have a multilayer reflector comprising, for example, a multi-stack of molybdenum and silicon. In one example, the multilayer reflector has 40 layer pairs of molybdenum and silicon, where the thickness of each layer is a quarter wavelength. Even smaller wavelengths can be produced using X-ray lithography. Since most materials are absorbent at both EUV and X-ray wavelengths, a thin piece of patterned absorbing material on the topography of the patterning apparatus (e.g., a TaN absorber on top of the multilayer reflector) defines the locations where features will be printed (positive resist) or not printed (negative resist).

[0257] Referring to Figure 19, the irradiator IL receives an extreme ultraviolet (EUV) radiation beam from the source collector module SO. Methods for generating EUV radiation include, but are not limited to, converting a material having at least one element (e.g., xenon, lithium, or tin) with one or more emission lines in the EUV range into a plasma state. In one such method, commonly referred to as laser-produced plasma (“LPP”), the plasma can be generated by irradiating a fuel (such as a droplet, stream, or cluster of a material having the line-emitting element) with a laser beam. The source collector module SO can be part of an EUV radiation system that includes a laser (not shown in Figure 19) for providing the laser beam that excites the fuel. The resulting plasma emits output radiation, such as EUV radiation, which is collected using a radiation collector located in the source collector module. For example, when a CO2 laser is used to provide the laser beam for fuel excitation, the laser and the source collector module can be separate entities.

[0258] In such cases, the laser is not considered part of the lithography apparatus, and the radiation beam is delivered from the laser to the source collector module via a beam delivery system that includes, for example, suitable directional mirrors or beam expanders. In other cases, such as when the source is a discharge-generated plasma EUV generator (often referred to as a DPP source), the source can be part of the source collector module.

[0259] An irradiator IL may include adjusters for adjusting the angular intensity distribution of the radiation beam. Typically, at least the outer or inner radial range of the intensity distribution in the pupil plane of the irradiator (often referred to as σ_outer and σ_inner, respectively) can be adjusted. Additionally, the irradiator IL may include various other components, such as faceted field reflector assemblies and pupil reflector assemblies. The irradiator can be used to adjust the radiation beam to have a desired uniformity and intensity distribution in its cross-section.

[0260] The radiation beam B is incident on the patterning apparatus (e.g., a mask) MA held on the support structure (e.g., a mask stage) MT and is patterned by the patterning apparatus. After reflection from the patterning apparatus (e.g., the mask) MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. The substrate stage WT can be precisely moved, for example, to position different target portions C within the path of the radiation beam B, using the second positioner PW and a position sensor PS2 (e.g., an interferometer, a linear encoder, or a capacitive sensor). Similarly, the first positioner PM and another position sensor PS1 can be used to precisely position the patterning apparatus (e.g., the mask) MA relative to the path of the radiation beam B. The patterning apparatus (e.g., the mask) MA and the substrate W can be aligned using patterning apparatus alignment marks M1, M2 and substrate alignment marks P1, P2.

[0261] The depicted device LA can be used in at least one of the following modes:

[0262] 1. In step mode, the support structure (e.g., mask stage) MT and the substrate stage WT are kept substantially stationary, while the entire pattern imparted to the radiation beam is projected onto the target portion C at once (i.e., a single static exposure). The substrate stage WT is then shifted in the X or Y direction so that different target portions C can be exposed.

[0263] 2. In scanning mode, the support structure (e.g., mask stage) MT and the substrate stage WT are scanned synchronously while the pattern imparted to the radiation beam is projected onto the target portion C (i.e., a single dynamic exposure). The velocity and direction of the substrate stage WT relative to the support structure (e.g., mask stage) MT can be determined by the (de-)magnification and image inversion characteristics of the projection system PS.

[0264] 3. In another mode, the support structure (e.g., a mask stage) MT is kept substantially stationary, holding a programmable patterning apparatus, while the substrate stage WT is moved or scanned as the pattern imparted to the radiation beam is projected onto the target portion C. In this mode, a pulsed radiation source is typically employed, and the programmable patterning apparatus is updated as needed after each movement of the substrate stage WT or between successive radiation pulses during scanning. This operating mode can be readily applied to maskless lithography utilizing a programmable patterning apparatus, such as programmable mirror arrays of the type described above.

[0265] Figure 20 shows the apparatus LA in more detail, including the source collector module SO, the irradiation system IL, and the projection system PS. The source collector module SO is constructed and arranged such that a vacuum environment can be maintained within a closed structure 220 of the source collector module SO. An EUV radiation emitting plasma 210 can be formed by a discharge-produced plasma source. EUV radiation can be produced by a gas or vapor (e.g., Xe gas, Li vapor, or Sn vapor) in which the very hot plasma 210 is created to emit radiation in the EUV range of the electromagnetic spectrum. The very hot plasma 210 is created, for example, by an electrical discharge that causes an at least partially ionized plasma. Efficient generation of the radiation may require a partial pressure of, for example, 10 Pa of Xe, Li, Sn vapor, or any other suitable gas or vapor. In one embodiment, a plasma of excited tin (Sn) is provided to generate EUV radiation.

[0266] Radiation emitted by the hot plasma 210 enters collector chamber 212 from source chamber 211 via an optional gas barrier or contaminant trap 230 (also referred to in some cases as a contaminant barrier or foil trap), located in or behind an opening in source chamber 211. Contaminant trap 230 may include a channel structure. Contaminant trap 230 may also include a gas barrier or a combination of a gas barrier and a channel structure. The contaminant trap or contaminant barrier 230 further described herein includes at least a channel structure as known in the art.

[0267] Collector chamber 212 may include a radiation collector CO, which may be a so-called grazing incidence collector. The radiation collector CO has an upstream radiation collector side 251 and a downstream radiation collector side 252. Radiation passing through the collector CO may be reflected from a grating spectral filter 240 to be focused along the optical axis indicated by the dashed line 'O' into a virtual source point IF. The virtual source point IF is often referred to as the intermediate focus, and the source collector module is arranged such that the intermediate focus IF is located at or near the opening 221 in the closed structure 220. The virtual source point IF is an image of the radiation-emitting plasma 210.

[0268] Subsequently, radiation passes through an illumination system IL, which may include a faceted field mirror assembly 22 and a faceted pupil mirror assembly 24. The faceted field mirror assembly 22 and the faceted pupil mirror assembly 24 are arranged to provide a desired angular distribution of the radiation beam 21 at the patterning apparatus MA, and to provide a desired uniformity of radiation intensity at the patterning apparatus MA. When the radiation beam 21 is reflected at the patterning apparatus MA, held by the support structure MT, a patterned beam 26 is formed, and the patterned beam 26 is imaged by the projection system PS via reflective elements 28 and 30 onto the substrate W, held by the substrate stage WT.

[0269] More components than shown may generally be present in the illumination optics unit IL and the projection system PS. Depending on the type of lithography equipment, the grating spectral filter 240 may optionally be present. Furthermore, more mirrors may be present than are shown in the figures; for example, the projection system PS may contain 1 to 6 additional reflective elements beyond those shown in Figure 20.

[0270] As shown in Figure 20, the collector optics CO is depicted as a nested collector with grazing incidence reflectors 253, 254, and 255, merely as an example of a collector (or collector mirror). The grazing incidence reflectors 253, 254, and 255 are arranged axially symmetrically about the optical axis O, and this type of collector optics CO is preferably used in combination with a discharge-produced plasma source, commonly referred to as a DPP source.

[0271] Alternatively, the source collector module SO can be part of an LPP radiation system as shown in Figure 21. The laser LA is configured to deposit laser energy into a fuel such as xenon (Xe), tin (Sn), or lithium (Li), thereby creating a highly ionized plasma 210 with electron temperatures of tens of eV. Energetic radiation generated during the de-excitation and recombination of these ions is emitted from the plasma, collected by a near-normal-incidence collector optics CO, and focused onto the opening 221 in the closed structure 220.

[0272] The concepts disclosed herein can be used to simulate or mathematically model any generic imaging system for imaging sub-wavelength features, and are particularly applicable to emerging imaging technologies capable of producing increasingly shorter wavelengths. Emerging technologies already in use include EUV (extreme ultraviolet) lithography and DUV lithography, the latter of which can produce a wavelength of 193 nm using an ArF laser, and even 157 nm using a fluorine laser. Furthermore, EUV lithography can produce wavelengths in the 20 nm to 5 nm range by using a synchrotron or by bombarding a material (either solid or plasma) with high-energy electrons to produce photons within this range.

[0273] While the concepts disclosed herein can be used for imaging on substrates such as silicon wafers, it should be understood that the disclosed concepts can be used in any type of lithography imaging system, such as lithography imaging systems for imaging on substrates other than silicon wafers.

[0274] The preceding paragraphs describe decomposing CD distribution or LCDU data into error contributions from various sources. For example, as described at least with reference to Figure 6, the resolver module 320 decomposes three input signals 615, 620, and 625 (which include, respectively, a first δCD value set 515a, a second δCD value set 520a, and a third δCD value set 525a for a plurality of contact holes) into three output signals 601, 602, and 603. These three output signals 601, 602, and 603 represent error contributions from sources such as the mask, the resist, and the SEM. However, in some embodiments, the resolver module 320 may not be able to determine which output signal corresponds to which source's error contribution, because the error contributions from various sources can be similar and therefore difficult for the resolver module 320 to distinguish.

[0275] This disclosure identifies the error contribution source of a given error contribution signal. In some embodiments, a machine learning (ML) model is trained to distinguish error contributions from various sources, and the trained ML model is used to determine the classification of a given signal (e.g., error contribution source) or to identify a label for the error contribution source.

[0276] Figure 22 is a block diagram illustrating the classification of a dataset, or error contribution signal, representing error contribution values based on error contribution sources, according to one embodiment. An error contribution signal 2205, representing the error contribution values, is input to a classifier model 2250. In some embodiments, the classifier model 2250 is an ML model trained to determine the classification of the input signal (e.g., the source of the error contribution values in the signal). The classifier model 2250 analyzes the signal 2205 and determines or predicts a classification 2225 for the error contribution signal 2205. The classification 2225 may indicate the source of the error contribution values in the signal 2205, such as a mask, resist, or SEM. The classification 2225 may be in any of a variety of formats. In some embodiments, the classification 2225 may be output as a probability value (e.g., 0.0 to 1.0) indicating the probability that the error contribution values in the signal 2205 come from a specified source. For example, the classification 2225 may be “P_RESIST = 0.98”, indicating a 98% probability that the error contribution values in signal 2205 are resist noise. In some embodiments, the classification 2225 can indicate the probability that the error contribution values come from each source. For example, the classification 2225 can be “P_RESIST = 0.98”, “P_MASK = 0.015”, and “P_SEM = 0.005”, indicating a 98% probability that the error contribution values in signal 2205 are resist noise, a 1.5% probability that they are mask noise, and a 0.5% probability that they are SEM noise. In some embodiments, the classification 2225 can be an enumerated value that indicates one of multiple sources. For example, the classification 2225 can be “1”, “2”, or “3”, where each number represents a specified error contribution source. In another example, the classification 2225 can be text indicating a specified error contribution source (such as “resist”, “mask”, or “SEM”).
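The three output formats described above (per-source probabilities, an enumerated value, and a text label) can be illustrated with a minimal Python sketch. It assumes the classifier produces one raw score per source and that a softmax converts the scores to probabilities; the softmax step, the numbering of the sources, the function names, and the example scores are assumptions for illustration and are not specified in the text.

```python
import math

# The three sources named in the text; the ordering (and hence the
# enumerated values 1, 2, 3) is a hypothetical choice.
SOURCES = ["resist", "mask", "SEM"]

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def format_classification(scores):
    """Return the classification in the three formats described above."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "probabilities": dict(zip(SOURCES, probs)),  # e.g. P_RESIST, P_MASK, P_SEM
        "enum": best + 1,                            # enumerated value 1, 2, or 3
        "label": SOURCES[best],                      # text label
    }

# Hypothetical raw scores for one error contribution signal.
result = format_classification([4.0, 0.0, -1.0])
print(result)
```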

[0277] In some embodiments, signal 2205 is generated using any of the methods described at least with reference to Figure 6 (e.g., the ICA method). Signal 2205 can be any output signal of the resolver module 320, such as the first output signal 601, the second output signal 602, or the third output signal 603. Signals 601-603 can include values corresponding to the δCD_MASK error contribution 601 (e.g., mask noise), the δCD_RESIST error contribution 602 (e.g., resist noise), and the δCD_SEM error contribution 603 (e.g., SEM noise). Although the error contributions 601-603 are shown in Figure 6 as classified by source, in at least some embodiments the resolver module 320 may not be able to identify the sources of the error contributions in its output signals. Details of training the classifier model 2250 are discussed below at least with reference to Figure 23.

[0278] Figure 23 is a block diagram illustrating training of the classifier model of Figure 22 to classify error contribution signals based on error contribution sources, according to one embodiment. In some embodiments, the classifier model 2250 is an ML model implemented using a neural network such as a convolutional neural network (CNN), a deep CNN, or a recurrent neural network. The following paragraphs describe classification using CNNs; however, it should be noted that classification is not limited to CNNs and other ML techniques can be used. In short, the CNN model used to determine the classification of error contribution signal 2305 includes an input layer 2330 and an output layer 2335, as well as multiple hidden layers (such as convolutional layers, normalization layers, and pooling layers) between the input layer 2330 and the output layer 2335. As part of training, the parameters of the hidden layers are optimized to minimize a loss function. In some embodiments, the CNN model can be trained to model the behavior of any process or combination of processes related to metrology or lithography.
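To make the layer structure concrete, the following is a minimal pure-Python sketch of a forward pass through one convolutional layer, one normalization layer, one pooling layer, and an output layer. It is an illustration under stated assumptions, not the architecture of the classifier model 2250: the layer sizes, the ReLU activation, the softmax output, and the random untrained parameters are all hypothetical.

```python
import math
import random

def conv1d(signal, kernel, bias):
    """Valid 1-D convolution (cross-correlation, as commonly used in CNNs)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def normalize(values, eps=1e-8):
    """Normalization layer: shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def max_pool(values, size=2):
    """Pooling layer: non-overlapping max pooling."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(signal, params):
    """Input layer -> hidden layers -> output layer (one probability per source)."""
    hidden = conv1d(signal, params["kernel"], params["conv_bias"])
    hidden = normalize(hidden)
    hidden = [max(0.0, h) for h in hidden]  # ReLU activation (an assumption)
    hidden = max_pool(hidden)
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(params["dense_w"], params["dense_b"])]
    return softmax(scores)

rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(16)]  # hypothetical input signal
pooled_len = (16 - 3 + 1) // 2                     # 14 conv outputs -> 7 pooled values
params = {                                         # random, untrained parameters
    "kernel": [rng.uniform(-1, 1) for _ in range(3)],
    "conv_bias": 0.0,
    "dense_w": [[rng.uniform(-1, 1) for _ in range(pooled_len)] for _ in range(3)],
    "dense_b": [0.0, 0.0, 0.0],
}
probs = forward(signal, params)
print(probs)
```

Because the parameters are untrained, the output probabilities carry no meaning; the sketch only shows how a signal flows through the layer types named above.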

[0279] In some embodiments, training a CNN-based classifier model 2250 to determine the classification of error contribution signals includes adjusting model parameters, such as the weights and biases of the CNN, such that the cost function for predicting, determining, or generating classifications is minimized. In some embodiments, adjusting model parameter values includes adjusting one or more weights of a CNN layer, one or more biases of a CNN layer, the CNN's hyperparameters, and/or the number of CNN layers. In some embodiments, the number of layers is a CNN hyperparameter that can be pre-selected and may remain unchanged during the training process. In some embodiments, a series of training procedures can be performed in which the number of layers can be modified.

[0280] In some embodiments, training the classifier model 2250 involves determining the value of a cost function and progressively adjusting the weights of one or more layers of the CNN such that the cost function is reduced (in one embodiment, minimized or not reduced beyond a specified threshold). In some embodiments, the cost function indicates the difference between the predicted classification 2320 of the input signal 2305 (e.g., the classification given by the CNN's output vector) and the actual classification of the input signal 2305 (e.g., specified or provided along with the input signal 2305). In some embodiments, the cost function may be a loss function such as binary cross-entropy. The cost function is reduced by modifying the values of the CNN model parameters (e.g., weights, biases, stride, etc.). In one embodiment, the cost function is computed as CF = f(actual classification - CNN(input, cnn_parameters)), where CNN(input, cnn_parameters) is the predicted classification. In this step, the training input includes the input signal and its corresponding actual classification, and cnn_parameters are the weights and biases of the CNN, with initial values that can be randomly chosen.
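As an illustration of a cost that measures the difference between a predicted and an actual classification, the following Python sketch evaluates a cross-entropy cost for a single signal. The text names binary cross-entropy as one option; the categorical (multi-class) form used here for three sources is an assumption, as are the function name and the example probabilities, which reuse the values from the Figure 22 discussion.

```python
import math

SOURCES = ["resist", "mask", "SEM"]

def cross_entropy(predicted_probs, actual_label):
    """Cost for one signal: -log of the probability assigned to the actual source."""
    p = predicted_probs[SOURCES.index(actual_label)]
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)

predicted = [0.98, 0.015, 0.005]  # P_RESIST, P_MASK, P_SEM

# The cost is small when the prediction agrees with the actual classification ...
cost_match = cross_entropy(predicted, "resist")
# ... and large when it disagrees.
cost_mismatch = cross_entropy(predicted, "SEM")

print(f"actual = resist: cost {cost_match:.4f}")
print(f"actual = SEM:    cost {cost_mismatch:.4f}")
```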

[0281] In some embodiments, the gradient corresponding to the cost function can be dcost / dparameter, where the cnn_parameters value can be updated based on an equation (e.g., parameter = parameter - learning_rate * gradient). The parameters can be weights and / or biases, and the learning_rate can be a hyperparameter used to tune the training process and can be selected by the user or the computer to improve the convergence of the training process (e.g., faster convergence).
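The update rule parameter = parameter - learning_rate * gradient can be demonstrated on a toy problem. The sketch below applies it to a hypothetical one-parameter quadratic cost with a known minimum; the cost function, starting value, learning rate, and iteration count are assumptions chosen for illustration and do not represent the CNN cost.

```python
def cost(parameter):
    """Toy cost with its minimum at parameter = 3 (hypothetical, not the CNN cost)."""
    return (parameter - 3.0) ** 2

def gradient(parameter):
    """dcost/dparameter for the toy cost above."""
    return 2.0 * (parameter - 3.0)

parameter = 0.0      # initial value (could also be chosen randomly)
learning_rate = 0.1  # hyperparameter controlling the step size
history = [cost(parameter)]

for _ in range(50):
    # The update rule quoted in the text.
    parameter = parameter - learning_rate * gradient(parameter)
    history.append(cost(parameter))

print(f"final parameter {parameter:.5f}, final cost {history[-1]:.2e}")
```

A larger learning rate converges in fewer steps up to a point, beyond which the updates overshoot the minimum; this is why the text treats it as a tunable hyperparameter.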

[0282] The classifier model 2250 is trained using labeled training data 2325, which includes multiple error contribution signals representing error contribution values from multiple sources, such as a first error contribution signal 2305, a second error contribution signal 2310, and a third error contribution signal 2315. Each error contribution signal in the training data 2325 includes: (a) error contribution values from a specified source for a set of contact holes printed on the substrate, and (b) a label indicating the specified error contribution source (e.g., the actual classification of the error contribution signal). For example, the first error contribution signal 2305 may include: (a) a set of first error contribution values associated with a first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "resist". Similarly, the second error contribution signal 2310 may include: (a) a set of second error contribution values associated with the first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "mask", and the third error contribution signal 2315 may include: (a) a set of third error contribution values associated with the first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "SEM". Training data 2325 may include various such error contribution signals for various contact holes. In some embodiments, training data 2325 is divided into multiple subsets, where each subset includes error contribution signals for a different set of contact holes. For example, a first subset of training data may include three error contribution signals for a first subset of contact holes (e.g., one error contribution signal for each source), while a second subset of training data includes three error contribution signals for a second subset of contact holes (e.g., one error contribution signal for each source). The classifier model 2250 is trained by inputting different subsets at different stages of training.
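One possible in-memory layout for such labeled training data is sketched below in Python. The class name, field names, and numeric values are hypothetical placeholders; real signals would come from a decomposition of measured CDs.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ErrorContributionSignal:
    values: List[float]  # (a) error contribution values, one per contact hole
    label: str           # (b) actual classification: "resist", "mask", or "SEM"
    hole_set: int        # which set of contact holes the values belong to (hypothetical field)

# Placeholder values only, for illustration.
training_data = [
    ErrorContributionSignal([0.4, -0.2, 0.1], "resist", 1),
    ErrorContributionSignal([0.1, 0.1, -0.2], "mask", 1),
    ErrorContributionSignal([0.05, -0.03, -0.02], "SEM", 1),
    ErrorContributionSignal([-0.3, 0.5, -0.2], "resist", 2),
    ErrorContributionSignal([0.2, -0.1, -0.1], "mask", 2),
    ErrorContributionSignal([0.01, 0.02, -0.03], "SEM", 2),
]

def split_into_subsets(
    signals: List[ErrorContributionSignal],
) -> Dict[int, List[ErrorContributionSignal]]:
    """Group signals so that each subset covers one set of contact holes."""
    subsets: Dict[int, List[ErrorContributionSignal]] = {}
    for signal in signals:
        subsets.setdefault(signal.hole_set, []).append(signal)
    return subsets

subsets = split_into_subsets(training_data)
for hole_set, group in sorted(subsets.items()):
    print(hole_set, [s.label for s in group])
```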

[0283] In some embodiments, training the classifier model 2250 is an iterative process, and each iteration may involve: inputting different training data (e.g., error contribution input signals, such as input signal 2305), predicting the classification 2320 corresponding to the error contribution signals, determining a cost function based on the actual classification (e.g., provided in the labels) and the predicted classification 2320, and minimizing the cost function. In some embodiments, a first set of iterations is performed using a first subset of the training data, then a second set of iterations is performed using a second subset of the training data, and so on. After several iterations of training (e.g., when the cost function is minimized or does not decrease beyond a specified threshold), optimized cnn_parameters are obtained and further used as model parameter values for the trained classifier model 2250. Then, for example as described at least with reference to Figure 22, the trained classifier model 2250 can be used to predict the classification of any desired error contribution signal by using the error contribution signal as input.
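The iterative procedure (predict, compute the cost, update the parameters, then move to the next subset) can be sketched in pure Python. To keep the example short, a softmax regression on a single hand-crafted feature stands in for the CNN, and the training signals are synthetic Gaussian noise whose spread differs by source. Both choices, and all numeric settings, are assumptions made only so the loop can run end to end.

```python
import math
import random

SOURCES = ["resist", "mask", "SEM"]
# Hypothetical noise levels that make the synthetic sources distinguishable.
SOURCE_STD = {"resist": 4.0, "mask": 1.0, "SEM": 0.25}

def feature(signal):
    """Hand-crafted feature: log of the signal's standard deviation."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    return 0.5 * math.log(var)

def make_subset(rng, signals_per_source=20, holes=50):
    """One training subset as (feature, index of the actual source) pairs."""
    subset = []
    for index, label in enumerate(SOURCES):
        for _ in range(signals_per_source):
            signal = [rng.gauss(0.0, SOURCE_STD[label]) for _ in range(holes)]
            subset.append((feature(signal), index))
    return subset

def predict(x, weights, biases):
    """Predicted probability for each source (softmax over linear scores)."""
    scores = [w * x + b for w, b in zip(weights, biases)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost_and_gradients(subset, weights, biases):
    """Mean cross-entropy cost and its gradients for one subset."""
    n = len(subset)
    cost, grad_w, grad_b = 0.0, [0.0] * 3, [0.0] * 3
    for x, actual in subset:
        probs = predict(x, weights, biases)
        cost -= math.log(max(probs[actual], 1e-12)) / n
        for c in range(3):
            err = probs[c] - (1.0 if c == actual else 0.0)
            grad_w[c] += err * x / n
            grad_b[c] += err / n
    return cost, grad_w, grad_b

rng = random.Random(0)
subsets = [make_subset(rng), make_subset(rng)]
weights, biases = [0.0] * 3, [0.0] * 3
learning_rate = 0.5
costs = []
for subset in subsets:        # first set of iterations, then second set
    for _ in range(300):
        cost, grad_w, grad_b = cost_and_gradients(subset, weights, biases)
        costs.append(cost)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        biases = [b - learning_rate * g for b, g in zip(biases, grad_b)]

all_items = [item for subset in subsets for item in subset]
correct = sum(
    max(range(3), key=predict(x, weights, biases).__getitem__) == actual
    for x, actual in all_items
)
accuracy = correct / len(all_items)
print(f"initial cost {costs[0]:.3f}, final cost {costs[-1]:.3f}, accuracy {accuracy:.2f}")
```

A fixed iteration count is used here for simplicity; the stopping rule described above (cost no longer decreasing beyond a threshold) could replace it.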

[0284] Training data 2325 can be generated using any of a number of known methods. An example method for generating error contribution signals for training the classifier model 2250 is described below at least with reference to Figure 24.

[0285] Figure 24 is a flowchart of a process 2400 for generating error contribution signals according to one embodiment. In some embodiments, process 2400 uses a linear nested model to decompose LCDU data associated with a set of contact holes into error contributions from multiple sources. The decomposition process is described in detail in the article “Roughness decomposition: an on-wafer methodology to discriminate mask, metrology, and shot noise contributions” by Lorusso, Gian; Rispens, Gijsbert; Rutigliani, Vito; Roey, Frieda; Frommhold, Andreas; and Schiffelers, Guido (2019/03/26, 10.1117/12.2515175), the entire contents of which are incorporated herein by reference. However, for convenience, the decomposition process 2400 is briefly described below. Process 2400 can be used to generate multiple error contribution signals that can be used to train the classifier model 2250, such as the training data 2325 of Figure 23.

[0286] At operation 2405, a measurement process is performed to obtain measurement data 2401 of multiple contact holes printed on the substrate, such as CDs. Measurement values ​​can be obtained from CDU wafers and FEM wafers. The LCDU is decomposed into three components: mask noise, resist noise (including shot noise), and SEM noise. The measurement process is designed based on the following principles:

[0287] • "N" contact holes are selected on the mask;

[0288] • Each contact hole is imaged "M" times under equivalent conditions;

[0289] • Each of the N×M wafer images of the contact holes is measured "S" times using the SEM.

[0290] In this experiment, N contact holes of the same (intended) size are selected on a mask; these contact holes are typically part of a contact hole array. The actual size of the selected contact holes on the mask may vary due to mask errors. Mask errors are transferred to the wafer with each exposure, and thus produce a systematic fingerprint in the wafer CD measurements of every exposure. The residual random component of the wafer CD variability is attributed to resist noise (along with shot noise) and SEM noise. To separate the SEM error component, all wafer CDs are measured S times (S images are taken at each measurement location), as summarized in Table 1.

[0291]

[0292] Table 1

[0293] The CD of the contact hole can be written as:

[0294]

[0295] where $\overline{CD}$ is the average CD over the entire experiment, which can be determined as:

[0296]

[0297] $\delta CD_i^{MASK}$ is the effect on the substrate of the mask noise present at contact hole i of the mask, $\delta CD_{ij}^{SN}$ is the resist noise, together with the shot noise, generated by exposure j of contact hole i, and $\delta CD_{ijk}^{SEM}$ is the residual random noise attributed to SEM error.
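The equations of paragraphs [0294] and [0296] do not appear in this text. As a hedged reconstruction from the term definitions above, and assuming indices i = 1..N over contact holes, j = 1..M over exposures, and k = 1..S over SEM measurements, the linear nested model and the overall average can be written as follows. The labels (1A) and (2A) are assumed from the later reference to Equations 3A-5A.

```latex
% Assumed reconstruction; CD_{ijk} is the k-th SEM measurement of exposure j of contact hole i.
\begin{align}
CD_{ijk} &= \overline{CD} + \delta CD_i^{MASK} + \delta CD_{ij}^{SN} + \delta CD_{ijk}^{SEM} \tag{1A} \\
\overline{CD} &= \frac{1}{N M S} \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{S} CD_{ijk} \tag{2A}
\end{align}
```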

[0298] After the measurement data 2401 is obtained, at operation 2410, the error contributions 2411 are derived from the measurement data 2401. For example, the following equations give the error contributions 2411 from sources such as the mask, resist, and SEM:

[0299]

[0300]

[0301]

[0302] The mask noise of the i-th contact hole on the mask, $\delta CD_i^{MASK}$, is the deviation of the substrate CD, averaged over all measurements of that contact hole (across all exposures and SEM runs), from the overall average CD. The shot noise $\delta CD_{ij}^{SN}$ is a factor nested under the mask error factor, and its level is related to the level of mask noise. $\delta CD_{ij}^{SN}$ measures the effect of exposure j on contact hole i. Specifically, for mask contact hole i, $\delta CD_{ij}^{SN}$ is the deviation of the substrate CD after exposure j from the average CD measured for that contact hole (averaged over all exposures and SEM runs). For a specific hole i and exposure j, the SEM noise $\delta CD_{ijk}^{SEM}$ is the deviation of the k-th measurement from the average CD over all measurements of that image.
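The three equations referred to above do not appear in this text. A hedged reconstruction that follows the verbal definitions in the preceding paragraph, assuming $CD_{ijk}$ denotes the k-th SEM measurement of exposure j of contact hole i and $\overline{CD}$ denotes the average over all N·M·S measurements, is given below. The labels (3A) to (5A) are taken from the later reference to Equations 3A-5A; their assignment to individual equations is an assumption.

```latex
% Assumed reconstruction; the three terms sum to CD_{ijk} - \overline{CD}.
\begin{align}
\delta CD_i^{MASK} &= \frac{1}{M S} \sum_{j=1}^{M} \sum_{k=1}^{S} CD_{ijk} - \overline{CD} \tag{3A} \\
\delta CD_{ij}^{SN} &= \frac{1}{S} \sum_{k=1}^{S} CD_{ijk} - \frac{1}{M S} \sum_{j=1}^{M} \sum_{k=1}^{S} CD_{ijk} \tag{4A} \\
\delta CD_{ijk}^{SEM} &= CD_{ijk} - \frac{1}{S} \sum_{k=1}^{S} CD_{ijk} \tag{5A}
\end{align}
```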

[0303] It is understood that the error contribution values 2411 corresponding to each source are calculated using Equations 3A-5A. The above process 2400 can be used to generate multiple error contribution signals for multiple contact holes, and these error contribution signals can be used as training data 2325 to train, for example, the classifier model 2250 described at least with reference to Figure 23.
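The nested decomposition described for process 2400 can be sketched numerically as follows. This is a minimal illustration, not the patented implementation: it assumes the CD measurements are stored in a NumPy array of shape (N, M, S), and the function name `decompose_cd` and the synthetic noise levels are hypothetical.

```python
import numpy as np

def decompose_cd(cd):
    """Split CD measurements of shape (N holes, M exposures, S SEM runs)
    into mask, resist/shot-noise and SEM deviations (nested means)."""
    cd = np.asarray(cd, dtype=float)
    grand_mean = cd.mean()                    # average CD of the whole experiment
    hole_mean = cd.mean(axis=(1, 2))          # per contact hole, over exposures and SEM runs
    image_mean = cd.mean(axis=2)              # per wafer image, over SEM runs
    d_mask = hole_mean - grand_mean           # shape (N,)
    d_sn = image_mean - hole_mean[:, None]    # shape (N, M)
    d_sem = cd - image_mean[:, :, None]       # shape (N, M, S)
    return grand_mean, d_mask, d_sn, d_sem

# Synthetic measurement data (assumed values, for illustration only).
rng = np.random.default_rng(0)
N, M, S = 5, 4, 3
cd = (20.0
      + rng.normal(0.0, 0.5, size=(N, 1, 1))   # mask-like term, fixed per hole
      + rng.normal(0.0, 0.8, size=(N, M, 1))   # resist/shot-like term, per exposure
      + rng.normal(0.0, 0.3, size=(N, M, S)))  # SEM-like term, per measurement

grand_mean, d_mask, d_sn, d_sem = decompose_cd(cd)
reconstructed = grand_mean + d_mask[:, None, None] + d_sn[:, :, None] + d_sem
```

Each returned array can serve as one error contribution signal; adding the three deviations back to the overall average recovers the original measurements.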

[0304] Figure 25A is a flowchart of a process 2500 for training a classifier model to determine the classification of error contributor signals, according to one embodiment. At operation 2505, training data having multiple datasets or error contributor signals is acquired, the error contributor signals representing the error contributions of multiple sources to features printed on a substrate. For example, the training data may be training data 2325 including error contribution signals such as a first error contribution signal 2305, a second error contribution signal 2310, and a third error contribution signal 2315. For example, the first error contribution signal 2305 may include: (a) a set of first error contribution values associated with a first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "resist". Similarly, the second error contribution signal 2310 may include: (a) a set of second error contribution values associated with the first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "mask", and the third error contribution signal 2315 may include: (a) a set of third error contribution values associated with the first set of contact holes printed on the substrate, and (b) a label indicating the error contribution source as "SEM". Training data 2325 may include various such error contribution signals for various contact holes.

[0305] In some embodiments, the training data 2325 is divided into multiple subsets, each subset including error contribution signals for different sets of contact holes. For example, a first subset of the training data 2325 may include three error contribution signals for a first subset of contact holes (e.g., one error contribution signal for each source), while a second subset of the training data 2325 includes three error contribution signals for a second subset of contact holes (e.g., one error contribution signal for each source).
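One way to picture the labelled signals and their grouping into subsets is the following sketch. The class name, field names, and numeric values are all hypothetical and chosen only for illustration; the source does not prescribe a data layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class ErrorContributionSignal:
    """One labelled training signal (all field names are hypothetical)."""
    values: Tuple[float, ...]  # error contribution values, one per contact hole
    label: str                 # actual classification: "mask", "resist" or "sem"
    hole_set: int              # which set of contact holes the values belong to

def build_subset(hole_set: int, per_source: Dict[str, List[float]]) -> List[ErrorContributionSignal]:
    """Create one labelled signal per source for a given set of contact holes."""
    return [ErrorContributionSignal(tuple(vals), source, hole_set)
            for source, vals in per_source.items()]

# Made-up values: three signals (one per source) for each of two hole sets.
training_data = (
    build_subset(1, {"resist": [0.4, -0.2, 0.1], "mask": [0.1, 0.1, -0.2], "sem": [0.05, -0.03, 0.0]})
    + build_subset(2, {"resist": [-0.3, 0.2, 0.2], "mask": [0.0, -0.1, 0.1], "sem": [0.02, 0.01, -0.04]})
)

# Group the signals into subsets keyed by contact hole set.
subsets = defaultdict(list)
for signal in training_data:
    subsets[signal.hole_set].append(signal)
```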

[0306] In some embodiments, training data 2325 is generated using any of a variety of methods, such as the linear nested model described above at least with reference to Figure 24.

[0307] At operation 2510, classifier model 2250 is trained based on the training data to predict the classification of each error contributor signal from the training data. In some embodiments, classifier model 2250 is a CNN model. Classifier model 2250 is executed by inputting a first error contribution signal 2305 from training data 2325. Classifier model 2250 predicts a classification 2320 (e.g., an error contribution source) for the first error contribution signal 2305, and a cost function is computed that determines the difference between the predicted classification and the actual classification of the first error contribution signal 2305. Training of classifier model 2250 is an iterative process and continues (e.g., by inputting different error contribution signals from different subsets of training data 2325) until the cost function is reduced (e.g., reduced beyond a specified threshold or no longer decreasing), i.e., until the predicted classification of any error contributor signal from training data 2325 is similar to the actual classification of the corresponding error contributor signal. Additional details of the training process are described below at least with reference to Figure 25B.

[0308] After the cost function has met specified criteria (e.g., no longer decreasing, decreasing beyond a specified threshold, or decreasing at a rate below a specified threshold), the classifier model 2250 is considered trained and can be used to predict the classification of any desired error contribution signal, for example as described at least with reference to Figure 22.

[0309] Figure 25B is a flowchart of a process 2550 for training a classifier model to determine the classification of error contributor signals, according to one embodiment. In some embodiments, process 2550 is performed as part of operation 2510 of process 2500.

[0310] At operation 2555, a classifier model 2250 is executed by inputting a reference error contribution signal, such as a first error contribution signal 2305, to output a predicted classification of the reference error contribution signal, such as a predicted classification 2320 of the first error contribution signal 2305.

[0311] At operation 2560, the cost function of classifier model 2250 is calculated, for example, as the difference between the predicted classification and the actual classification. For example, cost function 2561 is determined as the difference between the predicted classification 2320 and the actual classification of the first error contribution signal 2305. In some embodiments, the actual classification, i.e., the error contribution source of the first error contribution signal 2305, is provided as a label with the first error contribution signal 2305.

[0312] At operation 2565, classifier model 2250 is adjusted to reduce cost function 2561. In some embodiments, adjusting classifier model 2250 to reduce cost function 2561 includes adjusting model parameters, such as the weights and biases of classifier model 2250 (e.g., parameters of a CNN model).

[0313] At operation 2570, it is determined whether the cost function 2561 has been reduced (e.g., is no longer decreasing, has decreased beyond a specified threshold, or is decreasing at a rate below a specified threshold).

[0314] If the cost function 2561 has been reduced, the classifier model 2250 is considered trained, and the process returns to operation 2510 of process 2500. However, if the cost function 2561 has not been reduced, operations 2555-2570 are repeated using different error contribution signals from training data 2325 until the cost function 2561 is reduced. For example, a first set of iterations can be performed by inputting a first subset of the training data, which includes three error contribution signals for a first subset of contact holes (e.g., one error contribution signal for each source), then a second set of iterations can be performed using a second subset of the training data, which includes three error contribution signals for a second subset of contact holes (e.g., one error contribution signal for each source), and so on, until the cost function 2561 is reduced.
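The loop of operations 2555-2570 can be sketched as below. This is an illustration under stated assumptions rather than the disclosed model: a linear softmax classifier stands in for the CNN, cross-entropy is assumed as the cost function, the stopping rule is "the cost no longer decreases by more than a small amount", and the toy signals (three classes with different mean levels) are synthetic. The names `train_classifier`, `lr`, and `min_decrease` are hypothetical.

```python
import numpy as np

SOURCES = ["mask", "resist", "sem"]  # class index -> error contribution source

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_classifier(signals, labels, lr=0.2, max_iters=2000, min_decrease=1e-6):
    n_classes = len(SOURCES)
    weights = np.zeros((signals.shape[1], n_classes))
    biases = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    history = []
    for _ in range(max_iters):
        # Operation 2555: predict a classification for each input signal.
        probs = softmax(signals @ weights + biases)
        # Operation 2560: cost between predicted and actual classification.
        cost = float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))
        history.append(cost)
        # Operation 2565: adjust weights and biases to reduce the cost.
        grad = (probs - one_hot) / len(labels)
        weights -= lr * (signals.T @ grad)
        biases -= lr * grad.sum(axis=0)
        # Operation 2570: stop once the cost no longer decreases appreciably.
        if len(history) > 1 and history[-2] - history[-1] < min_decrease:
            break
    return weights, biases, history

# Toy labelled signals (synthetic, for illustration only).
rng = np.random.default_rng(0)
centers = np.array([-1.0, 0.0, 1.0])
labels = np.repeat(np.arange(3), 40)
signals = centers[labels][:, None] + rng.normal(0.0, 0.3, size=(120, 8))

weights, biases, history = train_classifier(signals, labels)
predicted = softmax(signals @ weights + biases).argmax(axis=1)
accuracy = float((predicted == labels).mean())
```

Feeding successive subsets of the training data, as described above, would amount to calling the update step on each subset in turn instead of on the full array.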

[0315] Figure 26 is a flowchart of a process 2600 for determining the source of an error contribution signal, according to one embodiment. At operation 2605, an error contribution signal, such as error contribution signal 2205, is input to classifier model 2250. In some embodiments, error contribution signal 2205 includes a plurality of error contribution values representing the error contribution to a feature set of a pattern printed on a substrate from one of a plurality of sources. For example, error contribution signal 2205 may represent the error contribution from a source such as a mask, resist, or SEM. Error contribution signal 2205 can be generated using any of a number of known methods. For example, as described above at least with reference to Figure 6, the error contribution signal 2205 can be generated using the ICA method from CD distribution data or LCDU data associated with multiple contact holes.

[0316] At operation 2610, the error contribution signal 2205 is input to the trained classifier model 2250 to determine a classification 2225 that indicates the source of the error contribution values in the error contribution signal 2205. The classifier model 2250 can output the classification 2225 in any of a variety of formats. In some embodiments, the classification 2225 is output as a probability value (e.g., 0.0 to 1.0) indicating the probability that the error contribution values in the signal 2205 come from a specified source. For example, a classification 2225 of "$P_{RESIST} = 0.98$" indicates that the probability that the error contribution values in the error contribution signal 2205 are resist noise is 98%. In some embodiments, the classification 2225 indicates the probability that the error contribution values come from each source. For example, a classification 2225 of "$P_{RESIST} = 0.98$", "$P_{MASK} = 0.015$", and "$P_{SEM} = 0.005$" indicates that the probability that the error contribution values in signal 2205 are resist noise is 98%, the probability that they are mask noise is 1.5%, and the probability that they are SEM noise is 0.5%. In some embodiments, the classifier model 2250 can be configured to identify the error contribution source as the source with the highest probability.
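Selecting the source with the highest probability can be illustrated with the example values from the paragraph above. The dictionary layout and the function name `most_likely_source` are hypothetical; the source does not specify an output format beyond probability values.

```python
def most_likely_source(classification):
    """Return the source whose probability is highest.

    `classification` maps a source name to the probability that the
    error contribution values come from that source.
    """
    total = sum(classification.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("per-source probabilities are expected to sum to 1")
    return max(classification, key=classification.get)

# Example values taken from the text for classification 2225.
classification_2225 = {"RESIST": 0.98, "MASK": 0.015, "SEM": 0.005}
source = most_likely_source(classification_2225)
```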

[0317] This disclosure uses an ML model to determine error contributions from multiple sources. The ML model is trained to predict error contributions from various sources for a given feature. For example, an image of a feature (e.g., a contact hole) is provided as input to the ML model, and the ML model predicts error contributions from various sources for the input feature. Details of training the ML model are described at least with reference to Figures 27A-28, and details of predicting the error contributions are described at least with reference to Figures 29-30.

[0318] Figure 27A is a flowchart of a process 2700 for training an error contribution model to predict error contributions from multiple sources, according to one embodiment. Figure 28 is a block diagram illustrating the training of an error contribution model to determine error contributions from multiple sources, according to one embodiment. In some embodiments, the error contribution model 2805 is an ML model implemented using a neural network such as a CNN, deep CNN, or recurrent neural network.

[0319] At operation 2705, multiple datasets are acquired as training data 2810, each dataset including image data of a feature of a pattern printed on a substrate and error contribution data, the error contribution data having error contribution values representing the error contributions of different sources to the feature. For example, a first dataset 2815 may include first image data 2816 and first error contribution data 2817 of a first feature of the pattern (e.g., a contact hole), the first error contribution data 2817 having error contribution values representing the error contributions of multiple sources such as a mask, resist, and SEM to the first feature. The first image data 2816 may include an image of the first feature. The image of the feature can be obtained using an inspection tool such as an SEM. For example, the first error contribution data 2817 may include $\delta CD^{MASK}$, $\delta CD^{RESIST}$, and $\delta CD^{SEM}$ values, which are the error contributions from sources such as the mask, resist, and SEM, respectively. As described above at least with reference to Equation 1, δCD is the deviation of the CD value of a given feature from the average of the CD values of multiple features. The error contribution values can be obtained using measurement data of the features, such as CD values. For example, the error contribution values can be obtained using the linear nested model described at least with reference to Figure 24. The training data 2810 can include multiple such datasets for various features.

[0320] At operation 2710, training data 2810 is provided as input to error contribution model 2805, which is then trained to predict error contribution data using the training data. Training of error contribution model 2805 is an iterative process and continues (e.g., by inputting the same dataset or different datasets from training data 2810) until the cost function is reduced (e.g., reduced beyond a specified threshold or no longer decreasing). Additional details of the training process are described below at least with reference to Figure 27B. After the cost function has met specified metrics (e.g., no longer decreasing, decreasing beyond a specified threshold, or decreasing at a rate below a specified threshold), the error contribution model 2805 is considered "trained" and can be used to predict the error contribution values for any desired feature, for example as described at least with reference to Figure 28.

[0321] Figure 27B is a flowchart of a process 2750 for training an error contribution model to predict error contributions from multiple sources, according to one embodiment. In some embodiments, process 2750 is performed as part of operation 2710 of process 2700.

[0322] At operation 2755, the error contribution model 2805 is executed by inputting a reference dataset, such as the first dataset 2815, to output predicted error contribution data 2820 having error contribution values for the reference dataset. In some embodiments, the predicted error contribution data 2820 may be a set of error contribution values, such as $\delta CD^{MASK}$, $\delta CD^{RESIST}$, and $\delta CD^{SEM}$.

[0323] At operation 2760, the cost function of the error contribution model 2805 is calculated, for example, as the difference between the predicted error contribution data 2820 associated with the reference dataset and the actual error contribution data. For example, cost function 2761 is determined as the difference between the set of predicted error contribution values in the predicted error contribution data 2820 and the set of error contribution values from the first error contribution data 2817. In some embodiments, the set of error contribution values from the first error contribution data 2817 is provided as a label with the first image data 2816.

[0324] At operation 2765, the error contribution model 2805 is adjusted to reduce the cost function 2761. In some embodiments, adjusting the error contribution model 2805 to reduce the cost function 2761 includes adjusting model parameters, such as the weights and biases of the error contribution model 2805.

[0325] At operation 2770, it is determined whether the cost function 2761 meets the training metric (e.g., the cost function no longer decreases, has decreased beyond a specified threshold, or is decreasing at a rate below a specified threshold).

[0326] If the cost function 2761 satisfies the training metric, the error contribution model 2805 is considered trained, and the process returns to operation 2710 of process 2700. However, if the cost function 2761 does not satisfy the training metric, operations 2755-2770 are repeated using a different dataset or the same dataset from the training data 2810 until the training metric is satisfied. For example, a first set of iterations can be performed by inputting a first subset of the training data 2810 for a first subset of the contact holes, then a second set of iterations can be performed by inputting a second subset of the training data 2810 for a second subset of the contact holes, and so on, until the cost function 2761 satisfies the training metric.
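Operations 2755-2770 can be sketched in the same spirit. The following is an illustration only: a linear regressor on flattened pixels stands in for the neural network, a squared-error cost is assumed, and the images and target values are synthetic. The helper names `meets_training_metric` and `train_error_contribution_model`, and all thresholds, are hypothetical.

```python
import numpy as np

def meets_training_metric(history, cost_threshold=1e-3, rate_threshold=1e-7):
    """True if the cost is below a threshold, has stopped decreasing,
    or is decreasing at a rate below a threshold."""
    if history[-1] <= cost_threshold:
        return True
    if len(history) < 2:
        return False
    decrease = history[-2] - history[-1]
    return decrease <= 0.0 or decrease < rate_threshold

def train_error_contribution_model(images, targets, lr=0.05, max_iters=5000):
    x = images.reshape(len(images), -1)            # flatten each image
    weights = np.zeros((x.shape[1], targets.shape[1]))
    biases = np.zeros(targets.shape[1])
    history = []
    for _ in range(max_iters):
        predicted = x @ weights + biases           # operation 2755
        residual = predicted - targets
        # Operation 2760: mean over samples of the summed squared error.
        history.append(float(np.mean(np.sum(residual ** 2, axis=1))))
        # Operation 2765: gradient step on weights and biases.
        weights -= lr * 2.0 * (x.T @ residual) / len(x)
        biases -= lr * 2.0 * residual.mean(axis=0)
        if meets_training_metric(history):         # operation 2770
            break
    return weights, biases, history

# Synthetic stand-ins: 60 "images" of 4x4 pixels and 3 target values each
# (playing the role of the mask, resist and SEM contributions).
rng = np.random.default_rng(0)
images = rng.normal(size=(60, 4, 4))
true_weights = 0.1 * rng.normal(size=(16, 3))
targets = images.reshape(60, -1) @ true_weights + rng.normal(0.0, 0.01, size=(60, 3))

weights, biases, history = train_error_contribution_model(images, targets)
```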

[0327] Figure 29 is a flowchart of a process 2900 for determining the error contributions of multiple sources to a feature of a pattern printed on a substrate, according to one embodiment. Figure 30 is a block diagram illustrating the determination of the error contributions of multiple sources to a feature of a pattern printed on a substrate, according to one embodiment. At operation 2905, image data 3005 of the feature for which error contribution values are to be predicted (such as an image of a contact hole) is input to the trained error contribution model 2805. In some embodiments, the image data 3005 can be acquired using an inspection tool such as an SEM.

[0328] At operation 2910, error contribution model 2805 is executed using image data 3005 to generate a prediction of error contribution data 3025. Error contribution data 3025 may include error contribution values representing the error contributions of multiple sources to the feature in image data 3005. For example, the predicted error contribution data 3025 may include a set of error contribution values, such as $\delta CD^{MASK}$, $\delta CD^{RESIST}$, and $\delta CD^{SEM}$, which are the error contributions from sources such as the mask, resist, and SEM, respectively.

[0329] While the preceding paragraphs describe the prediction of error contributions in terms of δCD, the error contribution model 2805 can also be used to predict error contributions in terms of LCDU. For example, the error contributions of sources such as the mask, resist, and SEM to the LCDU of the features can be expressed as $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$. The error contribution model 2805 can be trained using LCDU values instead of δCD values. For example, in the process 2700 of training the error contribution model 2805, each dataset in the training data 2810 may include multiple images and a set of LCDU values as error contribution values. For example, the first dataset 2815 may include multiple images corresponding to multiple features (e.g., contact holes) as image data 2816, and a set of $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$ values as the error contribution data 2817, where the $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$ values represent the error contributions of the various sources to the LCDU of the features. In some embodiments, similar to the δCD values, the LCDU error contribution values can be obtained using the linear nested model described at least with reference to Figure 24. During prediction, multiple images corresponding to multiple features (e.g., contact holes) for which LCDU error contribution values are to be predicted are input as image data 3005 to the trained error contribution model 2805. The trained error contribution model 2805 generates a prediction of a set of $LCDU_{MASK}$, $LCDU_{RESIST}$, and $LCDU_{SEM}$ values as error contribution data 3025, which represent the error contributions from the various sources.

[0330] Furthermore, while the preceding paragraphs describe the prediction of one set of error contribution values per feature, the error contribution model 2805 can also be used to predict error contributions for multiple measurement points on a feature. For example, the error contribution model 2805 can predict a first set of error contribution values (e.g., $\delta CD_1^{MASK}$, $\delta CD_1^{RESIST}$, and $\delta CD_1^{SEM}$) for a first measurement point on a feature, and a second set of error contribution values (e.g., $\delta CD_2^{MASK}$, $\delta CD_2^{RESIST}$, and $\delta CD_2^{SEM}$) for a second measurement point. The error contribution model 2805 can be trained using multiple sets of error contribution values instead of a single set of error contribution values for each feature. For example, in the process 2700 of training the error contribution model 2805, each dataset in the training data 2810 may include an image of the feature and multiple sets of error contribution values, where each set of error contribution values corresponds to a single measurement point on the feature. For example, if the number of measurement points "n" is "20", the first dataset 2815 may include an image of the first feature as image data 2816, and the error contribution data 2817 may include "20" sets of error contribution values, one set for each of the "20" measurement points. During prediction, an image of the feature for which error contribution values are to be predicted is input as image data 3005 to the trained error contribution model 2805. The trained error contribution model 2805 generates predictions of "n" sets of error contribution values as error contribution data 3025, where each set of error contribution values corresponds to one of the "n" measurement points on the feature. Error contribution model 2805 can be configured in various ways to predict error contribution values for "n" measurement points on a feature. For example, the dense layers in the neural network model used to implement error contribution model 2805 can be configured to generate n*m values, where n is the number of measurement points on the feature and m is the number of sources contributing to the error (e.g., "3" for sources such as the mask, resist, and SEM). In another example, the image of the feature can be encoded (e.g., using a neural network encoder) into n*m values, which can be fed as training data into error contribution model 2805 to train error contribution model 2805 to generate predictions of error contribution values for each of the n measurement points on the feature.
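How n*m dense-layer outputs could be read back as n sets of m error contribution values is sketched below. The ordering of the flat output (all sources of point 1, then all sources of point 2, and so on) is an assumption, as are the function name `split_dense_output` and the placeholder values; the source does not specify a layout.

```python
import numpy as np

SOURCES = ("MASK", "RESIST", "SEM")  # m = 3 sources

def split_dense_output(dense_output, n_points, sources=SOURCES):
    """Reshape a flat vector of n*m values into one dict per measurement point.

    Assumes point-major ordering: the first m values belong to point 1, etc.
    """
    values = np.asarray(dense_output, dtype=float)
    m = len(sources)
    if values.size != n_points * m:
        raise ValueError(f"expected {n_points * m} values, got {values.size}")
    table = values.reshape(n_points, m)
    return [dict(zip(sources, row.tolist())) for row in table]

# Placeholder output for n = 20 measurement points (60 values in total).
n = 20
dense_output = np.arange(n * len(SOURCES)) / 10.0
per_point = split_dense_output(dense_output, n)
```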

[0331] The embodiments may be further described using the following clauses:

[0332] 1. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for decomposing error contributions from multiple sources to multiple features of a pattern printed on a substrate, the method comprising:

[0333] acquiring an image of the pattern on the substrate;

[0334] obtaining, using the image, multiple measurement values of a feature of the pattern, wherein the measurement values are obtained for different sensor values;

[0335] associating, using a decomposition method, each of the multiple measurement values with a linear mixture of the error contributions to generate multiple linear mixtures of the error contributions; and

[0336] deriving each of the error contributions from the linear mixtures using the decomposition method.

[0337] 2. The computer-readable medium according to Clause 1, wherein the different sensor values correspond to different thresholds associated with the image, wherein each threshold corresponds to a threshold value of a pixel in the image.

[0338] 3. The computer-readable medium according to Clause 2, wherein each measurement value corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

[0339] 4. The computer-readable medium according to Clause 2, wherein the error contributions include:

[0340] an image acquisition tool error contribution associated with the image acquisition tool used to acquire the image,

[0341] a mask error contribution associated with the mask used to print the pattern on the substrate, and

[0342] a resist error contribution associated with the resist used to print the pattern, the resist error contribution including photoresist chemical noise and shot noise associated with the source of the lithographic apparatus used to print the pattern.

[0343] 5. The computer-readable medium according to Clause 4, further comprising:

[0344] adjusting, based on the mask error contribution, one or more parameters of at least one of the mask or the source of the lithographic apparatus used to print the pattern.

[0345] 6. The computer-readable medium according to Clause 4, further comprising:

[0346] adjusting, based on the resist error contribution, one or more parameters of at least one of the mask or the source of the lithographic apparatus used to print the pattern.

[0347] 7. The computer-readable medium according to any one of Clauses 3-6, wherein obtaining the measurement values comprises:

[0348] obtaining, at a first threshold of the different thresholds, a first signal having multiple first ΔCD values from multiple measurement points,

[0349] obtaining, at a second threshold of the different thresholds, a second signal having multiple second ΔCD values from the multiple measurement points, and

[0350] obtaining, at a third threshold of the different thresholds, a third signal having multiple third ΔCD values from the multiple measurement points.

[0351] 8. The computer-readable medium according to Clause 7, wherein each ΔCD value is determined per threshold and per measurement point, and indicates the deviation of the CD value of a given feature from the average of a plurality of CD values of the feature.

[0352] 9. The computer-readable medium according to Clause 7, wherein each ΔCD value indicates, at a given threshold, the distance between a designated point on a contour line of a given feature and a reference point on a reference contour line of a given feature, wherein the reference contour line is a simulated version of the contour line of the given feature.

[0353] 10. The computer-readable medium according to Clause 7, wherein associating each measurement value includes:

[0354] associating each of the multiple first ΔCD values in the first signal with a first linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution,

[0355] associating each of the multiple second ΔCD values in the second signal with a second linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution, and

[0356] associating each of the multiple third ΔCD values in the third signal with a third linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution.

[0357] 11. The computer-readable medium according to Clause 10, wherein deriving each of the error contributions includes:

[0358] deriving, using the first linear mixture, the second linear mixture, and the third linear mixture, and from each of the multiple first ΔCD values, the multiple second ΔCD values, and the multiple third ΔCD values: (a) a first output signal having multiple image acquisition tool error contributions, (b) a second output signal having multiple mask error contributions, and (c) a third output signal having multiple resist error contributions.

[0359] 12. The computer-readable medium according to Clause 11, wherein each error contribution is determined based on the corresponding error contribution at the first threshold level, the second threshold level, and the third threshold level.

[0360] 13. The computer-readable medium according to Clause 11, wherein deriving each of the error contributions includes:

[0361] determining a mixing matrix having a set of coefficients that generates, based on the multiple first ΔCD values, the multiple second ΔCD values, and the multiple third ΔCD values, the first linear mixture, the second linear mixture, and the third linear mixture of the error contributions corresponding to each ΔCD value,

[0362] determining the inverse of the mixing matrix, and

[0363] determining, using the inverse of the mixing matrix and based on the multiple first ΔCD values, the multiple second ΔCD values, and the multiple third ΔCD values, respectively: (a) the first output signal having multiple image acquisition tool error contributions, (b) the second output signal having multiple mask error contributions, and (c) the third output signal having multiple resist error contributions.

[0364] 14. The computer-readable medium according to any one of Clauses 2-3, wherein obtaining the measurement values comprises:

[0365] obtaining a first contour line of the feature corresponding to a first threshold of the different thresholds,

[0366] obtaining a first CD value of the first contour line,

[0367] obtaining a second contour line of the feature corresponding to a second threshold of the different thresholds, and

[0368] obtaining a second CD value of the second contour line.

[0369] 15. The computer-readable medium according to Clause 14, further comprising:

[0370] obtaining a first ΔCD value at the first threshold, wherein the first ΔCD value indicates the deviation of the first CD value from the average of multiple first CD values measured at multiple measurement points.

[0371] 16. The computer-readable medium according to Clause 15, wherein obtaining the first ΔCD value includes:

[0372] obtaining, at the multiple measurement points, the multiple first CD values corresponding to the first threshold,

[0373] obtaining the average of the multiple first CD values,

[0374] shifting the average value to zero, and

[0375] obtaining the first ΔCD value as the difference between the first CD value and the average value.

[0376] 17. The computer-readable medium according to Clause 15, wherein the multiple measurement points are located on at least one of (a) the feature of the pattern or (b) multiple features of the pattern.

[0377] 18. The computer-readable medium according to any one of Clauses 15-17, wherein associating each measurement value includes:

[0378] associating the first ΔCD value corresponding to the first threshold with a first linear mixture of a first error contribution and a second error contribution of the error contributions, and

[0379] associating a second ΔCD value corresponding to the second threshold with a second linear mixture of the first error contribution and the second error contribution.

[0380] 19. The computer-readable medium according to Clause 18, wherein deriving each of the error contributions comprises:

[0381] deriving, using the decomposition method, the first error contribution and the second error contribution based on the first ΔCD value and the second ΔCD value, as well as the first linear mixture and the second linear mixture.

[0382] 20. The computer-readable medium according to Clause 1, wherein the measurement values correspond to local critical dimension uniformity (LCDU) values of the feature for the different sensor values.

[0383] 21. A computer-readable medium according to any one of Clauses 1 and 20, wherein different sensor values ​​correspond to different dose levels associated with a source of a photolithography apparatus used for printing patterns.

[0384] 22. The computer-readable medium according to any one of Clauses 1 and 20, wherein different sensor values ​​correspond to different focus levels associated with a source of a photolithography apparatus used for printing patterns.

[0385] 23. The computer-readable medium according to any one of clauses 20-21 further includes:

[0386] Based on a specified focus level, obtain the first LCDU value corresponding to the first dose level, and

[0387] Based on a specified focus level, obtain a second LCDU value corresponding to the second dose level.

[0388] 24. The computer-readable medium according to any one of clauses 20 or 22 further includes:

[0389] Based on a specified dose level, obtain the first LCDU value corresponding to the first focus level, and

[0390] Based on a specified dose level, obtain a second LCDU value corresponding to the second focus level.

[0391] 25. A computer-readable medium according to any one of clauses 23 or 24, wherein associating each measurement value includes:

[0392] The first LCDU value is correlated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and

[0393] The second LCDU value is correlated with a second linear mixture of the first error contribution and the second error contribution.

[0394] 26. The computer-readable medium according to Clause 25, wherein deriving each of the error contributions comprises:

[0395] Using a decomposition method, the first and second error contributions are derived based on the first LCDU value and the second LCDU value, as well as the first linear mixture and the second linear mixture.

[0396] 27. The computer-readable medium according to Clause 1, wherein the measured value corresponds to the linewidth roughness (LWR) value of the feature for different sensor values.

[0397] 28. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for decomposing error contributions from multiple sources to multiple features associated with a pattern printed on a substrate, the method comprising:

[0398] Obtain the image of the pattern;

[0399] At different contour heights of the feature of the pattern, multiple Δ critical dimension (ΔCD) values are obtained, wherein the multiple ΔCD values include: (a) a first ΔCD value set corresponding to a first contour height of the feature, (b) a second ΔCD value set corresponding to a second contour height of the feature, and (c) a third ΔCD value set corresponding to a third contour height of the feature.

[0400] Using a decomposition method, (a) the first set of ΔCD values is associated with a first linear mixture of the first, second, and third error contributions; (b) the second set of ΔCD values is associated with a second linear mixture of the first, second, and third error contributions; (c) the third set of ΔCD values is associated with a third linear mixture of the first, second, and third error contributions; and the first, second, and third error contributions are derived based on the linear mixtures and using the decomposition method.
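
For illustration only, the three linear mixtures of Clause 28 can be written in matrix form. The symbols $x_i$, $s_j$, $a_{ij}$, $A$ and $p$ are notational assumptions introduced here; they are not defined in this section.

```latex
\begin{aligned}
x_i(p) &= a_{i1}\, s_1(p) + a_{i2}\, s_2(p) + a_{i3}\, s_3(p), \qquad i = 1, 2, 3, \\
\mathbf{x}(p) &= A\, \mathbf{s}(p), \qquad
\hat{\mathbf{s}}(p) = A^{-1}\, \mathbf{x}(p) \quad \text{(provided } A \text{ is invertible)}.
\end{aligned}
```

Here $x_i(p)$ denotes the ΔCD value at contour height $i$ and measurement point $p$, $s_j(p)$ denotes the $j$-th error contribution at that point, and $A = (a_{ij})$ collects the mixing coefficients. When the decomposition method is a blind one such as ICA, $A$ is estimated from the data, and the recovered contributions are determined only up to scaling and ordering.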

[0401] 29. The computer-readable medium according to Clause 28, wherein each ΔCD value indicates the deviation of the CD value of the feature from the average of a plurality of CD values measured at a plurality of measurement points at a specified contour height.

[0402] 30. The computer-readable medium according to Clause 28, wherein each ΔCD value indicates the distance between a specified point on the contour of a feature and a reference point on a reference contour of the feature at a given contour height, wherein the reference contour is a simulated version of the contour of the given feature.

[0403] 31. The computer-readable medium as described in Clause 28, wherein each contour height is determined by thresholding the pixel values of the image to a specified value.
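
As a rough illustration of how thresholding pixel values can yield a CD value per contour height, the following Python sketch measures the width of a feature along a single one-dimensional intensity profile. The function `cd_at_threshold`, the sample profile, and the assumptions of a single bright feature per profile and linear interpolation at the edges are all hypothetical; the specification does not prescribe this procedure.

```python
def cd_at_threshold(profile, threshold, pixel_size=1.0):
    """Width of the region where profile >= threshold.

    Assumes one bright feature per profile and interpolates linearly
    between neighbouring pixels at the two edge crossings.
    """
    above = [i for i, v in enumerate(profile) if v >= threshold]
    if not above:
        return 0.0
    left, right = above[0], above[-1]
    if left > 0:
        lo, hi = profile[left - 1], profile[left]
        left_edge = (left - 1) + (threshold - lo) / (hi - lo)
    else:
        left_edge = float(left)
    if right < len(profile) - 1:
        hi, lo = profile[right], profile[right + 1]
        right_edge = right + (hi - threshold) / (hi - lo)
    else:
        right_edge = float(right)
    return (right_edge - left_edge) * pixel_size


# Hypothetical grey-level profile across one feature (one value per pixel).
profile = [0, 50, 100, 150, 200, 150, 100, 50, 0]
cds = {t: cd_at_threshold(profile, t) for t in (75, 100, 150)}
print(cds)
```

A lower threshold cuts the profile nearer its base and gives a wider CD, so each threshold yields a distinct measurement of the same feature.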

[0404] 32. The computer-readable medium as described in Clause 28 further comprises:

[0405] Based on one or more of the error contributions, adjust one or more parameters of at least one of the mask or the source of the photolithography equipment used for printing the pattern.

[0406] 33. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for decomposing error contributions from multiple sources to multiple features associated with a pattern on a substrate, the method comprising:

[0407] Acquire local critical dimension uniformity (LCDU) data associated with a pattern, wherein for a specified focus level of the source of the photolithography apparatus used to print the pattern, the LCDU data includes: (a) a first set of LCDU values of the pattern features corresponding to a first dose level of the source, (b) a second set of LCDU values of the features corresponding to a second dose level, and (c) a third set of LCDU values of the features corresponding to a third dose level.

[0408] Using a decomposition method, (a) a first set of LCDU values is associated with a first linear mixture of the first, second, and third error contributions; (b) a second set of LCDU values is associated with a second linear mixture of the first, second, and third error contributions; and (c) a third set of LCDU values is associated with a third linear mixture of the first, second, and third error contributions; and

[0409] Based on linear mixing and using a decomposition method, the first, second, and third error contributions are derived.
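
The LCDU data of Clause 33 can be pictured as one set of LCDU values per dose level at a fixed focus level. The Python sketch below builds such sets from hypothetical CD values. It assumes the common convention that LCDU is reported as three standard deviations of the local CD distribution; this section does not define the statistic, so the factor `k=3.0`, the function name `lcdu`, and all sample numbers are assumptions.

```python
from statistics import pstdev


def lcdu(cd_values, k=3.0):
    """LCDU taken as k population standard deviations of local CD values.

    k = 3 is an assumed convention, not a value stated in this section.
    """
    return k * pstdev(cd_values)


# Hypothetical CD values (nm): for each dose level, two local regions
# of the pattern measured at the same specified focus level.
cd_by_dose = {
    "dose_1": [[20.0, 22.0, 19.0, 23.0], [21.0, 21.0, 20.0, 22.0]],
    "dose_2": [[21.0, 21.5, 20.5, 21.0], [20.0, 21.0, 21.0, 22.0]],
    "dose_3": [[18.0, 24.0, 17.0, 25.0], [19.0, 23.0, 20.0, 22.0]],
}

# One set of LCDU values per dose level, as in items (a) to (c).
lcdu_sets = {
    dose: [lcdu(region) for region in regions]
    for dose, regions in cd_by_dose.items()
}
print(lcdu_sets)
```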

[0410] 34. A method for decomposing error contributions from multiple sources to multiple features associated with a pattern printed on a substrate, the method comprising:

[0411] Acquire an image of the pattern on the substrate;

[0412] Using the image, obtain multiple measurements of a feature of the pattern, wherein each measurement corresponds to a different threshold associated with the image;

[0413] Using a decomposition method, each of the multiple measurements is correlated with a linear mixture of error contributions to generate multiple linear mixtures of error contributions; and

[0414] Based on linear mixing and using a decomposition method, each of the error contributions is derived.

[0415] 35. The method according to Clause 34, wherein each measurement corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

[0416] 36. The method according to Clause 35, wherein each threshold corresponds to a threshold value of a pixel in the image.

[0417] 37. The method according to any one of clauses 35-36, wherein the error contribution includes:

[0418] The first, second, and third error contributions to the CD value, wherein the first error contribution comes from the resist used to print the pattern, the second error contribution comes from the mask used to print the pattern on the substrate, and the third error contribution comes from the image acquisition tool used to acquire the image.

[0419] 38. A method for decomposing error contributions from multiple sources to one or more features associated with a pattern printed on a substrate, the method comprising:

[0420] Acquire local critical dimension uniformity (LCDU) data associated with a pattern, wherein for a specified focus level of the source of the photolithography apparatus used to print the pattern, the LCDU data includes: (a) a first set of LCDU values of one or more features of the pattern corresponding to a first dose level of the source, (b) a second set of LCDU values of one or more features corresponding to a second dose level, and (c) a third set of LCDU values of one or more features corresponding to a third dose level.

[0421] Using a decomposition method, (a) a first set of LCDU values is associated with a first linear mixture of the first, second, and third error contributions; (b) a second set of LCDU values is associated with a second linear mixture of the first, second, and third error contributions; and (c) a third set of LCDU values is associated with a third linear mixture of the first, second, and third error contributions; and

[0422] Based on linear mixing and using a decomposition method, the first, second, and third error contributions are derived.

[0423] 39. An apparatus for decomposing error contributions from multiple sources to multiple features of a pattern printed on a substrate, the apparatus comprising:

[0424] A memory storing a set of instructions; and

[0425] At least one processor configured to execute the set of instructions to cause the apparatus to perform a method comprising:

[0426] Acquire an image of the pattern on the substrate;

[0427] Using the image, obtain multiple measurements of a feature of the pattern, wherein the measurements are obtained for different sensor values;

[0428] Using a decomposition method, each of the multiple measurements is correlated with a linear mixture of error contributions to generate multiple linear mixtures of error contributions; and

[0429] Based on linear mixing and using a decomposition method, each of the error contributions is derived.

[0430] 40. The device according to Clause 39, wherein different sensor values correspond to different thresholds associated with the image, wherein each threshold corresponds to a threshold value of a pixel in the image.

[0431] 41. The device according to Clause 40, wherein each measurement corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

[0432] 42. The device as described in Clause 40, wherein the error contribution includes:

[0433] An image acquisition tool error contribution associated with the image acquisition tool used to acquire the image,

[0434] A mask error contribution associated with the mask used to print the pattern on the substrate, and

[0435] The resist error contribution associated with the resist used for printing patterns, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used for printing patterns.

[0436] 43. The device according to clause 42, further comprising:

[0437] Based on the mask error contribution, adjust one or more parameters of at least one of the mask or the source of the photolithography equipment used for printing the pattern.

[0438] 44. The device according to clause 42, further comprising:

[0439] Based on the resist error contribution, adjust one or more parameters of at least one of the mask or the source of the photolithography equipment used to print the pattern.

[0440] 45. The device according to any one of clauses 41-44, wherein acquiring the measured value comprises:

[0441] At a first threshold among the different thresholds, a first signal with multiple first ΔCD values is acquired from multiple measurement points,

[0442] At a second threshold among the different thresholds, a second signal with multiple second ΔCD values is acquired from multiple measurement points, and

[0443] At a third threshold among the different thresholds, a third signal with multiple third ΔCD values is acquired from multiple measurement points.

[0444] 46. The device according to Clause 45, wherein each ΔCD value is determined based on each threshold and each measurement point, and indicates the deviation of the CD value of a given feature from the average of a plurality of CD values of the feature.

[0445] 47. The device according to Clause 45, wherein each ΔCD value indicates, at a given threshold, the distance between a designated point on the contour of a given feature and a reference point on a reference contour of a given feature, wherein the reference contour is a simulated version of the contour of the given feature.

[0446] 48. The device as described in Clause 45, wherein associating each measurement value includes:

[0447] Each of the multiple first ΔCD values in the first signal is correlated with a first linear mixture of the image acquisition tool, mask, and resist error contributions,

[0448] Each of the multiple second ΔCD values in the second signal is correlated with a second linear mixture of the image acquisition tool, mask, and resist error contributions, and

[0449] Each of the multiple third ΔCD values in the third signal is correlated with a third linear mixture of the image acquisition tool, mask, and resist error contributions.

[0450] 49. The apparatus as described in Clause 48, wherein deriving each of the error contributions includes:

[0451] Using a first, second, and third linear mixture, and based on each of a plurality of first ΔCD values, a plurality of second ΔCD values, and a plurality of third ΔCD values, the following are derived: (a) a first output signal with a plurality of image acquisition tool error contributions, (b) a second output signal with a plurality of mask error contributions, and (c) a third output signal with a plurality of resist error contributions.

[0452] 50. The apparatus as described in Clause 49, wherein deriving each of the error contributions includes:

[0453] Use the Independent Component Analysis (ICA) method to derive each of the error contributions.

[0454] 51. The apparatus according to Clause 50, wherein deriving each of the error contributions using the ICA method includes:

[0455] Determine a mixing matrix having a set of coefficients, wherein the set of coefficients generates the first, second, and third linear mixtures of the error contributions corresponding to each of the multiple first ΔCD values, multiple second ΔCD values, and multiple third ΔCD values,

[0456] Determine the inverse of the mixing matrix, and

[0457] Using the inverse of the mixing matrix, based on multiple first ΔCD values, multiple second ΔCD values, and multiple third ΔCD values, respectively determine: (a) a first output signal with multiple image acquisition tool error contributions, (b) a second output signal with multiple mask error contributions, and (c) a third output signal with multiple resist error contributions.
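
The inversion step of Clause 51 can be sketched in plain Python as follows. This is not an ICA implementation: ICA estimates the mixing matrix from the observed signals, whereas here a hypothetical mixing matrix `A` is simply assumed to be known so that the role of its inverse is visible. The coefficient values, the source signals, and the helper names `invert` and `matmul` are illustrative assumptions.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("mixing matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical mixing matrix: rows = thresholds 1..3,
# columns = (image acquisition tool, mask, resist) coefficients.
A = [[1.0, 0.5, 0.2],
     [1.0, 0.8, 0.6],
     [1.0, 1.0, 1.0]]

# Hypothetical error contributions: rows = sources, columns = measurement points.
S = [[0.1, -0.2, 0.1],
     [0.5, 0.0, -0.5],
     [-1.0, 2.0, -1.0]]

X = matmul(A, S)              # observed first, second, third ΔCD signals
S_hat = matmul(invert(A), X)  # output signals recovered with the inverse
print(S_hat)
```

Each row of `S_hat` corresponds to one output signal of items (a) to (c); with an estimated rather than known mixing matrix, the rows would be recovered only up to scale and order.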

[0458] 52. The apparatus as described in Clause 49, wherein deriving each of the error contributions includes:

[0459] Use the reconstruction ICA method or the orthogonal ICA method to derive each of the error contributions.

[0460] 53. The device according to any one of clauses 40-41, wherein acquiring the measured value comprises:

[0461] Obtain the first contour line corresponding to the first threshold among different thresholds of the feature.

[0462] Obtain the first CD value of the first contour line.

[0463] Obtain the second contour line corresponding to the second threshold among different thresholds of the feature, and

[0464] Obtain the second CD value of the second contour line.

[0465] 54. The device according to Clause 53, further comprising:

[0466] A first ΔCD value is obtained at a first threshold, wherein the first ΔCD value indicates the deviation of the first CD value from the average of multiple first CD values measured at multiple measurement points.

[0467] 55. The device according to clause 54, wherein obtaining the first ΔCD value includes:

[0468] Obtain, at multiple measurement points, multiple first CD values corresponding to the first threshold,

[0469] Obtain the average of the multiple first CD values,

[0470] Shift the average value to zero, and

[0471] The first ΔCD value is obtained as the difference between the first CD value and the average value.

[0472] 56. The device according to Clause 55, wherein a plurality of measuring points are located on at least one of (a) the feature of the pattern or (b) a plurality of features.

[0473] 57. The device according to any one of clauses 53-55, wherein associating each measurement value includes:

[0474] The first ΔCD value corresponding to the first threshold is associated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and

[0475] The second ΔCD value corresponding to the second threshold is associated with a second linear mixture of the first and second error contributions.

[0476] 58. The apparatus as described in Clause 57, wherein deriving each of the error contributions includes:

[0477] Using a decomposition method, the first and second error contributions are derived based on the first and second ΔCD values and the first and second linear mixtures.

[0478] 59. The device according to Clause 39, wherein the measured value corresponds to a local critical dimension uniformity (LCDU) value of the feature for different sensor values.

[0479] 60. The device according to any one of clauses 39 and 59, wherein different sensor values correspond to different dose levels associated with a source of the photolithography apparatus used for printing patterns.

[0480] 61. The apparatus according to any one of clauses 39 and 59, wherein different sensor values correspond to different focus levels associated with a source of the photolithography apparatus used for printing patterns.

[0481] 62. The device according to any one of clauses 59-60 further includes:

[0482] Based on a specified focus level, obtain the first LCDU value corresponding to the first dose level, and

[0483] Based on a specified focus level, obtain a second LCDU value corresponding to the second dose level.

[0484] 63. The device according to any one of clauses 59 or 61 further includes:

[0485] Based on a specified dose level, obtain a first LCDU value corresponding to a first focus level, and

[0486] Based on a specified dose level, obtain a second LCDU value corresponding to a second focus level.

[0487] 64. The device according to any one of clauses 62 or 63, wherein associating each measurement value includes:

[0488] The first LCDU value is correlated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and

[0489] The second LCDU value is correlated with a second linear mixture of the first and second error contributions.

[0490] 65. The apparatus according to Clause 64, wherein deriving each of the error contributions includes:

[0491] Using a decomposition method, the first and second error contributions are derived from the first and second LCDU values and the first and second linear mixtures.

[0492] 66. The device according to Clause 39, wherein the measured value corresponds to the linewidth roughness (LWR) value of the feature for different sensor values.

[0493] 67. A computer program product comprising a non-transitory computer-readable medium having instructions recorded thereon, the instructions, when executed by a computer, implementing the method according to any one of the preceding clauses.

[0494] 68. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for training a machine learning model to determine error contribution sources to multiple features of a pattern printed on a substrate, the method comprising:

[0495] Obtain training data having multiple datasets, each dataset having an error contribution value representing the error contribution to a feature from one of multiple sources, and each dataset being associated with an actual classification that identifies the source of the error contribution for the corresponding dataset; and

[0496] Train the machine learning model based on the training data to predict the classification of a reference dataset among the datasets such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

[0497] 69. The computer-readable medium according to Clause 68, wherein obtaining training data includes:

[0498] Obtain local critical dimension uniformity (LCDU) data associated with the features using different focus and dose levels of the equipment used for printing the pattern.

[0499] 70. The computer-readable medium according to Clause 69, wherein obtaining training data includes:

[0500] Decompose the LCDU data associated with the features to derive the error contribution value from each of the multiple sources.

[0501] 71. The computer-readable medium according to Clause 68, wherein obtaining training data includes:

[0502] Generate (a) a first dataset of training data, the first dataset having error contribution values representing the error contributions from a first source among multiple sources; (b) a second dataset of training data, the second dataset having error contribution values representing the error contributions from a second source among multiple sources; and (c) a third dataset of training data, the third dataset having error contribution values representing the error contributions from a third source among multiple sources.

[0503] (d) Associating the first dataset with the first category, where the first category identifies the first source as an error contribution source; (e) Associating the second dataset with the second category, where the second category identifies the second source as an error contribution source; and (f) Associating the third dataset with the third category, where the third category identifies the third source as an error contribution source.
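
Steps (a) to (f) of Clause 71 amount to pairing each decomposed dataset with a class label that names its source. A minimal Python sketch follows, using the three sources named in Clause 72; the identifiers `SOURCES` and `build_training_data`, the integer label encoding, and the sample values are hypothetical.

```python
# Class index = position in this tuple (an assumed encoding).
SOURCES = ("image_acquisition_tool", "mask", "resist")


def build_training_data(decomposed):
    """Pair each source's error contribution values with its class index."""
    missing = [name for name in SOURCES if name not in decomposed]
    if missing:
        raise KeyError(f"no dataset for sources: {missing}")
    return [(decomposed[name], label) for label, name in enumerate(SOURCES)]


# Hypothetical error contribution values produced by a decomposition step.
decomposed = {
    "image_acquisition_tool": [0.1, -0.2, 0.1],
    "mask": [0.5, 0.0, -0.5],
    "resist": [-1.0, 2.0, -1.0],
}
training_data = build_training_data(decomposed)
print(training_data)
```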

[0504] 72. The computer-readable medium according to Clause 71, wherein the first source is an image acquisition tool for acquiring an image of a pattern, wherein the second source is a mask for printing the pattern on a substrate, and wherein the third source is a resist for printing the pattern and photon shot noise of the apparatus for printing the pattern on the substrate.

[0505] 73. The computer-readable medium according to clause 71, wherein generating the first dataset comprises:

[0506] Generate multiple groups for the first, second, and third datasets, where each group includes error contribution values representing the error contributions of the first, second, and third sources to different subsets of the features, respectively.

[0507] 74. The computer-readable medium according to Clause 68, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0508] (a) Use training data to execute a machine learning model to output a predicted classification of the reference dataset.

[0509] (b) Define the cost function as the difference between the predicted classification and the actual classification.

[0510] (c) Adjust the machine learning model.

[0511] (d) Determine whether the cost function decreases due to the adjustment, and

[0512] (e) In response to the cost function not being reduced, repeat steps (a), (b), (c) and (d).
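
The iteration of Clause 74 can be sketched with a deliberately small stand-in model. The Python code below trains a softmax (multinomial logistic) classifier by gradient descent rather than the convolutional neural network of Clause 75, takes cross-entropy as the cost function, and halves the step size when an adjustment fails to reduce the cost. All of these choices, the two-element feature vectors, the learning rate, and the iteration limit are assumptions made for illustration.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities for feature vector x (a bias term is appended)."""
    xb = x + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])


def cost(weights, data):
    """Step (b): mean cross-entropy between predicted and actual classes."""
    return -sum(math.log(predict(weights, x)[y]) for x, y in data) / len(data)


def adjust(weights, data, lr):
    """Step (c): one gradient-descent update of the model weights."""
    grads = [[0.0] * len(row) for row in weights]
    for x, y in data:
        probs = predict(weights, x)  # step (a): execute the model
        xb = x + [1.0]
        for k, p in enumerate(probs):
            err = p - (1.0 if k == y else 0.0)
            for j, v in enumerate(xb):
                grads[k][j] += err * v / len(data)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


# Hypothetical (features, actual class) pairs; classes 0, 1, 2 name the sources.
data = [([0.1, 0.9], 0), ([0.2, 0.8], 0),
        ([0.9, 0.1], 1), ([0.8, 0.2], 1),
        ([0.9, 0.9], 2), ([0.8, 1.0], 2)]

weights = [[0.0] * 3 for _ in range(3)]
initial_cost = cost(weights, data)
current_cost, lr = initial_cost, 0.5
for _ in range(500):
    candidate = adjust(weights, data, lr)
    candidate_cost = cost(candidate, data)
    if candidate_cost < current_cost:   # step (d): did the cost decrease?
        weights, current_cost = candidate, candidate_cost
    else:                               # step (e): not reduced, so try again
        lr *= 0.5
print(initial_cost, current_cost)
```

With all weights at zero the model predicts each of the three classes with equal probability, so the starting cost equals ln 3; every accepted adjustment lowers it from there.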

[0513] 75. A computer-readable medium according to any one of clauses 68-74, wherein the machine learning model is a convolutional neural network.

[0514] 76. The computer-readable medium according to Clause 68, further comprising:

[0515] Receive a specified dataset with error contribution values, which represent the error contribution from one of multiple sources to a feature set of a specified pattern printed on a specified substrate; and

[0516] Execute a machine learning model to determine a classification associated with a given dataset, where the classification identifies a given source among multiple sources as the source of error contribution values in the given dataset.

[0517] 77. The computer-readable medium according to clause 76, wherein receiving the specified dataset includes:

[0518] Using a decomposition method, multiple measurements associated with a feature set are decomposed to derive a set of datasets representing the error contribution from each of the multiple sources, where the specified dataset is one of the datasets and corresponds to the error contribution from one of the multiple sources.

[0519] 78. The computer-readable medium according to Clause 77, wherein the decomposition of the measurement value includes:

[0520] Obtain an image of the specified pattern;

[0521] Using the image, obtain the measurements, wherein the measurements are obtained for different sensor values;

[0522] Using a decomposition method, each measurement is associated with a linear mixture of error contributions to generate multiple linear mixtures of error contributions; and

[0523] Based on linear mixing and using a decomposition method, each of the error contributions is derived.

[0524] 79. The computer-readable medium according to Clause 78, wherein different sensor values correspond to different threshold levels associated with the image, wherein each measurement corresponds to a Δ critical dimension (ΔCD) value of a feature in the feature set at one of the different thresholds, wherein the ΔCD value indicates the deviation of the CD value of the feature from the average of a plurality of CD values in the feature set.

[0525] 80. The computer-readable medium according to Clause 79, wherein each of the different thresholds corresponds to a threshold of pixel values in the image.

[0526] 81. The computer-readable medium according to Clause 78, wherein the measured value corresponds to the LCDU value of the feature at different sensor values.

[0527] 82. The computer-readable medium according to Clause 81, wherein different sensor values correspond to different dose levels associated with a source of a photolithography apparatus used for printing patterns.

[0528] 83. The computer-readable medium according to Clause 81, wherein different sensor values correspond to different focus levels associated with a source of a photolithography apparatus used for printing patterns.

[0529] 84. The computer-readable medium according to any one of clauses 78-83, wherein deriving each of the error contributions comprises:

[0530] Use independent component analysis (ICA) as the decomposition method to derive each of the error contributions.

[0531] 85. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for determining error contribution sources to a plurality of features of a pattern printed on a substrate, the method comprising:

[0532] Process one or more images of the pattern to obtain a set of datasets, wherein each dataset in the set has an error contribution value representing the error contribution to a feature from one of a plurality of sources;

[0533] Input a specified dataset from the set of datasets into a machine learning model; and

[0534] Execute the machine learning model to determine a classification associated with the specified dataset, where the classification identifies a given source among the multiple sources as the source of the error contribution values in the specified dataset.

[0535] 86. The computer-readable medium according to Clause 85, wherein executing the machine learning model to determine a classification includes:

[0536] Train the machine learning model using multiple datasets to determine the classification of a given dataset, wherein each of the multiple datasets includes an error contribution value representing the error contribution to a feature from one of multiple sources, and wherein each dataset is associated with an actual classification that identifies the source of the error contribution value for the corresponding dataset.

[0537] 87. The computer-readable medium according to Clause 86, wherein training the machine learning model comprises:

[0538] Train the machine learning model to determine the predicted classification of a reference dataset among the datasets such that a cost function is reduced, the cost function determining the difference between the predicted classification and the actual classification of the reference dataset.

[0539] 88. The computer-readable medium as described in Clause 87, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0540] (a) Use the multiple datasets to execute the machine learning model to output a predicted classification for the reference dataset.

[0541] (b) Define the cost function as the difference between the predicted classification and the actual classification.

[0542] (c) Adjust the machine learning model.

[0543] (d) Determine whether the cost function decreases due to the adjustment, and

[0544] (e) In response to the cost function not being reduced, repeat steps (a), (b), (c) and (d).

[0545] 89. The computer-readable medium as described in Clause 86, wherein training a machine learning model comprises:

[0546] Generate (a) a first dataset from multiple datasets, the first dataset having error contribution values representing the error contributions from a first source among multiple sources; (b) a second dataset from multiple datasets, the second dataset having error contribution values representing the error contributions from a second source among multiple sources; and (c) a third dataset from multiple datasets, the third dataset having error contribution values representing the error contributions from a third source among multiple sources.

[0547] (d) Associating the first dataset with the first category, which identifies the first source as an error contribution source; (e) Associating the second dataset with the second category, which identifies the second source as an error contribution source; and (f) Associating the third dataset with the third category, which identifies the third source as an error contribution source.

[0548] 90. The computer-readable medium according to Clause 89, wherein generating the first dataset comprises:

[0549] Generate multiple groups for the first, second, and third datasets, where each group includes error contribution values representing the error contributions of the first, second, and third sources to different subsets of the features, respectively.

[0550] 91. The computer-readable medium according to Clause 90, further comprising:

[0551] Train the machine learning model by inputting one group of the first, second, and third datasets after another.

[0552] 92. The computer-readable medium according to clause 85, wherein processing one or more images to obtain a dataset collection comprises:

[0553] Obtain multiple Δcritical dimension (CD) values ​​at different heights of the feature's contour line, wherein the multiple ΔCD values ​​include: (a) a first ΔCD value set corresponding to the height of the first contour line of the feature, (b) a second ΔCD value set corresponding to the height of the second contour line of the feature, and (c) a third ΔCD value set corresponding to the height of the third contour line of the feature.

[0554] Using a decomposition method, (a) a first set of ΔCD values ​​is associated with a first linear mixture of error contributions from multiple sources; (b) a second set of ΔCD values ​​is associated with a second linear mixture of error contributions from multiple sources; and (c) a third set of ΔCD values ​​is associated with a third linear mixture of error contributions from multiple sources; and

[0555] Based on linear mixing and using a decomposition method, the error contribution from each source is derived.

[0556] The first dataset in the dataset set includes error contribution values ​​representing the error contribution from the first source among multiple sources.

[0557] The second dataset in the dataset set includes error contribution values ​​representing the error contributions from a second source among multiple sources, and

[0558] The third dataset in the dataset set includes error contribution values ​​representing the error contribution from a third source among multiple sources.

[0559] 93. The computer-readable medium according to Clause 92, wherein each contour height is determined by thresholding the pixel values ​​of one or more images to a specified value.
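
The thresholding of Clause 93 can be pictured with a minimal sketch, assuming a one-dimensional grey-level cross-section through a bright feature. The function name `cd_at_threshold`, the pixel values, and the threshold values are hypothetical and chosen only for illustration:

```python
def cd_at_threshold(profile, threshold):
    """Width, in pixels, of the longest run of pixels at or above the threshold."""
    best = run = 0
    for value in profile:
        run = run + 1 if value >= threshold else 0
        best = max(best, run)
    return best


# Hypothetical grey levels across one feature (assumed brighter than background).
profile = [10, 20, 60, 120, 200, 230, 200, 120, 60, 20, 10]

# Each threshold selects a different contour height and therefore a different CD.
thresholds = [50, 100, 150]
cds = [cd_at_threshold(profile, t) for t in thresholds]
print(dict(zip(thresholds, cds)))
```

A higher threshold cuts the profile nearer its peak, so the measured width shrinks. Repeating this over many features gives one set of CD values per contour height.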

[0560] 94. The computer-readable medium according to clause 85, wherein processing one or more images to obtain a dataset collection comprises:

[0561] Acquire local critical dimension uniformity (LCDU) data associated with the pattern, wherein, for a specified focus level of the source of the photolithography apparatus used to print the pattern, the LCDU data includes: (a) a first set of LCDU values corresponding to a first dose level of the source, (b) a second set of LCDU values corresponding to a second dose level, and (c) a third set of LCDU values corresponding to a third dose level;

[0562] Using a decomposition method, (a) a first set of LCDU values ​​is associated with a first linear mixture of error contributions from multiple sources, (b) a second set of LCDU values ​​is associated with a second linear mixture of error contributions from multiple sources, and (c) a third set of LCDU values ​​is associated with a third linear mixture of error contributions from multiple sources; and

[0563] Based on linear mixing and using a decomposition method, the error contribution from each source is derived.

[0564] The first dataset in the dataset set includes error contribution values ​​representing the error contribution from the first source among multiple sources.

[0565] The second dataset in the dataset set includes error contribution values ​​representing the error contributions from a second source among multiple sources, and

[0566] The third dataset in the dataset set includes error contribution values ​​representing the error contribution from a third source among multiple sources.
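
A minimal sketch of how the per-dose LCDU values of Clause 94 might be assembled is given below. It assumes LCDU is reported as three times the standard deviation of local CD values, which is a common convention but is not stated in the clauses. The dose labels and CD values are hypothetical:

```python
import statistics


def lcdu(cd_values):
    """LCDU as 3 x population standard deviation (assumed convention)."""
    return 3.0 * statistics.pstdev(cd_values)


# Hypothetical local CD measurements (nm) at one focus level and three dose levels.
cd_by_dose = {
    "dose_low": [20.0, 22.0, 24.0],
    "dose_nominal": [21.0, 22.0, 23.0],
    "dose_high": [21.5, 22.0, 22.5],
}

# One LCDU value per dose level; each is treated as a different linear mixture.
lcdu_by_dose = {dose: lcdu(values) for dose, values in cd_by_dose.items()}
print(lcdu_by_dose)
```

The three resulting values play the role of the first, second, and third LCDU sets that the decomposition method then separates into per-source contributions.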

[0567] 95. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for determining error contribution sources to a plurality of features of a pattern printed on a substrate, the method comprising:

[0568] Input a specified dataset into a machine learning model, the specified dataset having error contribution values ​​representing the error contribution to features from one of multiple sources; and

[0569] Execute a machine learning model to determine a classification associated with a given dataset, where the classification identifies a given source among multiple sources as the source of error contribution values ​​in the given dataset.

[0570] 96. The computer-readable medium according to Clause 95, wherein inputting the specified dataset comprises:

[0571] The image of the pattern is processed to obtain a set of datasets, where each dataset in the set has an error contribution value representing the error contribution to a feature from one of a plurality of sources, and the specified dataset is one of the datasets in the set of datasets.

[0572] 97. A method for training a machine learning model to determine sources of error contributions to multiple features of a pattern printed on a substrate, the method comprising:

[0573] Obtain training data from multiple datasets, each dataset having an error contribution value representing the error contribution to a feature from one of multiple sources, and each dataset being associated with an actual classification that identifies the source of the error contribution for that dataset; and

[0574] A machine learning model is trained based on the training data to predict the classification of a reference dataset in the dataset, thereby reducing the cost function, which determines the difference between the predicted classification and the actual classification of the reference dataset.

[0575] 98. The method according to Clause 97, wherein obtaining training data includes:

[0576] Using different focus and dose levels of the apparatus used to print the pattern, acquire local critical dimension uniformity (LCDU) data or line width roughness (LWR) data associated with the features.

[0577] 99. The method according to Clause 98, wherein obtaining training data includes:

[0578] Decompose the LCDU data or LWR data associated with the features to derive the error contribution from each of the multiple sources.

[0579] 100. The method according to Clause 97, wherein obtaining training data includes:

[0580] Generate (a) a first dataset of training data, the first dataset having error contribution values ​​representing the error contributions from a first source among multiple sources; (b) a second dataset of training data, the second dataset having error contribution values ​​representing the error contributions from a second source among multiple sources; and (c) a third dataset of training data, the third dataset having error contribution values ​​representing the error contributions from a third source among multiple sources.

[0581] (d) Associate the first dataset with the first category that identifies the source of error contribution as the first source, (e) Associate the second dataset with the second category that identifies the source of error contribution as the second source, and (f) Associate the third dataset with the third category that identifies the source of error contribution as the third source.

[0582] 101. The method according to Clause 100, wherein the first source is an image acquisition tool for acquiring an image of the pattern, wherein the second source is a mask for printing the pattern on a substrate, and wherein the third source is a resist for printing the pattern and photon shot noise of the apparatus for printing the pattern on the substrate.

[0583] 102. The method according to clause 100, wherein generating the first dataset comprises:

[0584] Generate multiple groups for the first, second, and third datasets, where each group includes error contribution values ​​representing the error contributions of the first, second, and third sources to different subsets of the features, respectively.
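
The labelling of Clause 100 and the grouping of Clause 102 can be sketched as follows. This is only an illustration under stated assumptions: the labels `image_tool`, `mask`, and `resist`, the helper `make_groups`, and all numeric values are hypothetical:

```python
def make_groups(datasets, group_size):
    """Split labelled per-source value lists into groups over feature subsets."""
    n_features = len(next(iter(datasets.values())))
    groups = []
    for start in range(0, n_features, group_size):
        # Each group keeps the source label next to that subset's values.
        group = [(label, values[start:start + group_size])
                 for label, values in datasets.items()]
        groups.append(group)
    return groups


# Hypothetical error contribution values for six features, keyed by source label.
datasets = {
    "image_tool": [0.10, -0.20, 0.05, 0.00, 0.15, -0.10],
    "mask": [0.30, 0.25, -0.35, 0.20, -0.30, 0.10],
    "resist": [-0.50, 0.60, 0.40, -0.45, 0.55, -0.60],
}

groups = make_groups(datasets, group_size=3)
for index, group in enumerate(groups):
    print(index, group)
```

Each group then holds one labelled slice per source, so the groups can be fed to the model one after another during training.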

[0585] 103. The method according to clause 97, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0586] (a) Use training data to execute a machine learning model to output a predicted classification of the reference dataset.

[0587] (b) Define the cost function as the difference between the predicted classification and the actual classification.

[0588] (c) Adjust the machine learning model.

[0589] (d) Determine whether the cost function decreases due to the adjustment, and

[0590] (e) In response to the cost function not being reduced, repeat steps (a), (b), (c) and (d).
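
The iteration of steps (a) to (e) can be sketched with a small softmax classifier trained by gradient descent on a cross-entropy cost. This is a stand-in chosen for brevity, not the model of the clauses. The feature vectors, class indices, learning rate, and function names are all hypothetical:

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    # (a) execute the model: one probability per candidate source
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def cost(weights, data):
    # (b) cost: cross-entropy between predicted and actual classification
    return -sum(math.log(predict(weights, x)[y]) for x, y in data) / len(data)


def train(data, n_classes, lr=0.5, max_iter=200):
    n_features = len(data[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    history = [cost(weights, data)]
    for _ in range(max_iter):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, y in data:
            probs = predict(weights, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    grad[k][j] += err * x[j] / len(data)
        # (c) adjust the model
        candidate = [[w - lr * g for w, g in zip(w_row, g_row)]
                     for w_row, g_row in zip(weights, grad)]
        # (d) check whether the adjustment reduced the cost
        new_cost = cost(candidate, data)
        if new_cost < history[-1]:
            weights = candidate
            history.append(new_cost)
        else:
            lr *= 0.5  # (e) not reduced: try again with a smaller adjustment
    return weights, history


# Hypothetical summaries [bias, f1, f2] of datasets; classes 0, 1, 2 are sources.
data = [([1.0, 0.9, 0.1], 0), ([1.0, 0.8, 0.2], 0),
        ([1.0, 0.1, 0.9], 1), ([1.0, 0.2, 0.8], 1),
        ([1.0, 0.1, 0.1], 2), ([1.0, 0.2, 0.2], 2)]
weights, history = train(data, n_classes=3)
print(history[0], history[-1])
```

Only adjustments that lower the cost are kept, so the recorded cost history is strictly decreasing.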

[0591] 104. A method for determining error contribution sources of multiple features of a pattern printed on a substrate, the method comprising:

[0592] The images of the patterns are processed to obtain a set of datasets, where each dataset in the set has an error contribution value representing the error contribution to the feature from one of a plurality of sources;

[0593] Input a specified dataset from multiple datasets into a machine learning model; and

[0594] Execute a machine learning model to determine a classification associated with a given dataset, where the classification identifies a given source among multiple sources as the source of error contribution values ​​in the given dataset.

[0595] 105. An apparatus for training a machine learning model to determine error contribution sources to multiple features of a pattern printed on a substrate, the apparatus comprising:

[0596] Memory for storing instruction sets; and

[0597] At least one processor configured to execute the instruction set to cause the apparatus to perform a method comprising:

[0598] Obtain training data from multiple datasets, each dataset having an error contribution value representing the error contribution to a feature from one of multiple sources, and each dataset being associated with an actual classification that identifies the source of the error contribution for that dataset; and

[0599] A machine learning model is trained based on the training data to predict the classification of a reference dataset in the dataset, thereby reducing the cost function, which determines the difference between the predicted classification and the actual classification of the reference dataset.

[0600] 106. The device according to clause 105, wherein acquiring training data includes:

[0601] Acquire local critical dimension uniformity (LCDU) data or line width roughness (LWR) data associated with the features, either for different threshold levels on an image of the features or for different focus and dose levels of the apparatus used to print the pattern.

[0602] 107. The device according to clause 106, wherein acquiring training data includes:

[0603] Decompose the LCDU data or LWR data associated with the features to derive the error contribution value from each of the multiple sources.

[0604] 108. The device according to clause 105, wherein acquiring training data includes:

[0605] Generate (a) a first dataset of training data, the first dataset having error contribution values ​​representing the error contributions from a first source among multiple sources; (b) a second dataset of training data, the second dataset having error contribution values ​​representing the error contributions from a second source among multiple sources; and (c) a third dataset of training data, the third dataset having error contribution values ​​representing the error contributions from a third source among multiple sources.

[0606] (d) Associate the first dataset with the first category that identifies the first source as an error contribution source, (e) Associate the second dataset with the second category that identifies the second source as an error contribution source, and (f) Associate the third dataset with the third category that identifies the third source as an error contribution source.

[0607] 109. The apparatus according to Clause 108, wherein the first source is an image acquisition tool for acquiring an image of a pattern, wherein the second source is a mask for printing the pattern on a substrate, and wherein the third source is a resist for printing the pattern and photon shot noise of the apparatus for printing the pattern on the substrate.

[0608] 110. The device according to clause 108, wherein generating the first dataset comprises:

[0609] Generate multiple groups for the first, second, and third datasets, where each group includes error contribution values ​​representing the error contributions of the first, second, and third sources to different subsets of the features, respectively.

[0610] 111. The device according to clause 105, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0611] (a) Use training data to execute a machine learning model to output a predicted classification of the reference dataset.

[0612] (b) Define the cost function as the difference between the predicted classification and the actual classification.

[0613] (c) Adjust the machine learning model.

[0614] (d) Determine whether the cost function decreases due to the adjustment, and

[0615] (e) In response to the cost function not being reduced, repeat steps (a), (b), (c) and (d).

[0616] 112. The device according to any one of clauses 105-111, wherein the machine learning model is a recurrent neural network.

[0617] 113. The device according to Clause 105, further comprising:

[0618] Receive a specified dataset with error contribution values, each representing the error contribution from one of multiple sources to a feature set of a specified pattern printed on a specified substrate; and

[0619] Execute a machine learning model to determine a classification associated with a given dataset, where the classification identifies a given source among multiple sources as the source of error contribution values ​​in the given dataset.

[0620] 114. The device according to clause 113, wherein receiving the specified dataset includes:

[0621] Using a decomposition method, multiple measurements associated with a feature set are decomposed to derive a set of datasets representing the error contribution from each of the multiple sources, where the specified dataset is one of the datasets and corresponds to the error contribution from one of the multiple sources.

[0622] 115. The device according to clause 114, wherein the decomposition of the measurement value includes:

[0623] Get the image of the specified pattern;

[0624] Images are used to acquire measurements, which are obtained for different sensor values;

[0625] Using a decomposition method, each measurement is associated with a linear mixture of error contributions to generate multiple linear mixtures of error contributions; and

[0626] Based on linear mixing and using a decomposition method, each of the error contributions is derived.

[0627] 116. The device according to Clause 115, wherein the different sensor values correspond to different threshold levels associated with the image, wherein each measurement corresponds to a Δ critical dimension (ΔCD) value of a feature in the feature set at one of the different threshold levels, and wherein the ΔCD value indicates the deviation of the CD value of the feature from the average of a plurality of CD values in the feature set.
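
The ΔCD definition in Clause 116 amounts to subtracting the mean CD of the feature set from each feature's CD. A minimal sketch follows; the function name `delta_cd` and the CD values are hypothetical:

```python
def delta_cd(cd_values):
    """Deviation of each feature's CD from the mean CD of the feature set."""
    mean_cd = sum(cd_values) / len(cd_values)
    return [cd - mean_cd for cd in cd_values]


# Hypothetical CD values (nm) of four features measured at one threshold level.
cds = [21.0, 22.0, 23.0, 22.0]
deltas = delta_cd(cds)
print(deltas)  # [-1.0, 0.0, 1.0, 0.0]
```

Applying the same function to the CD values obtained at each threshold level yields one ΔCD set per threshold, which is the input the decomposition method expects.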

[0628] 117. The apparatus according to Clause 116, wherein the CD value is the difference between the measured profile of the feature and the simulated profile of the feature.

[0629] 118. The device according to clause 116, wherein each of the different thresholds corresponds to a threshold of pixel values ​​in an image.

[0630] 119. The device according to Clause 115, wherein the measured value corresponds to the LCDU value or LWR value of the feature at different sensor values.

[0631] 120. The apparatus according to Clause 119, wherein different sensor values ​​correspond to different dose levels associated with a source of a photolithography apparatus used for printing patterns.

[0632] 121. The apparatus according to Clause 119, wherein different sensor values ​​correspond to different focus levels associated with a source of a photolithography apparatus used for printing patterns.

[0633] 122. The device according to any one of clauses 115-121, wherein deriving each of the error contributions comprises:

[0634] Using an independent component analysis (ICA) method as the decomposition method to derive each of the error contributions.
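
To illustrate the kind of decomposition an ICA method performs, the sketch below separates two linearly mixed signals by whitening the mixtures and then searching for the rotation that maximizes the absolute excess kurtosis of the outputs. This is a simple textbook ICA variant chosen for brevity, not necessarily the one used in practice. It uses two sources rather than the three described in the clauses, and the signals, mixing coefficients, and function names are hypothetical:

```python
import math


def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def kurtosis(v):
    """Excess kurtosis of a zero-mean sequence."""
    n = len(v)
    m2 = sum(x * x for x in v) / n
    m4 = sum(x ** 4 for x in v) / n
    return m4 / (m2 * m2) - 3.0


def whiten(x1, x2):
    """Rotate and scale two mixtures so they are uncorrelated with unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    a = sum(p * p for p in x1) / n
    d = sum(q * q for q in x2) / n
    b = sum(p * q for p, q in zip(x1, x2)) / n
    alpha = 0.5 * math.atan2(2 * b, a - d)   # principal-axis angle
    c, s = math.cos(alpha), math.sin(alpha)
    r = math.hypot(a - d, 2 * b)
    lam1, lam2 = (a + d + r) / 2, (a + d - r) / 2   # covariance eigenvalues
    z1 = [(c * p + s * q) / math.sqrt(lam1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(lam2) for p, q in zip(x1, x2)]
    return z1, z2


def ica_two_sources(x1, x2, steps=180):
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def correlation(u, v):
    u, v = center(u), center(v)
    num = sum(p * q for p, q in zip(u, v))
    return num / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))


# Two hypothetical independent, non-Gaussian source signals.
n = 31 * 29
s1 = [float((7 * i) % 31 - 15) for i in range(n)]
s2 = [float((13 * i) % 29 - 14) for i in range(n)]

# Two measurements, each a different linear mixture (assumed coefficients).
x1 = [p + 0.5 * q for p, q in zip(s1, s2)]
x2 = [0.3 * p + q for p, q in zip(s1, s2)]

y1, y2 = ica_two_sources(x1, x2)
```

The recovered components match the original sources only up to ordering, sign, and scale, which is why the comparison is best made through the absolute correlation.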

[0635] 123. A computer program product comprising a non-transitory computer-readable medium having instructions recorded thereon, the instructions, when executed by a computer, implementing the method according to any one of the preceding clauses.

[0636] 124. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for training a machine learning model to determine error contributions to features of a pattern printed on a substrate, the method comprising:

[0637] Acquire training data with multiple datasets, wherein the datasets include a first dataset having (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, the first error contribution data including error contributions to one or more features from multiple sources; and

[0638] A machine learning model is trained based on the training data to predict the error contribution data of the first dataset, thereby reducing the cost function, which indicates the difference between the predicted error contribution data and the first error contribution data.

[0639] 125. The computer-readable medium according to Clause 124, wherein the first image data includes a first image of a first feature among the one or more features, and wherein the first error contribution data includes a set of first error contribution values corresponding to a Δ critical dimension (ΔCD) value of the first feature.

[0640] 126. The computer-readable medium according to Clause 125, wherein each ΔCD value indicates the deviation of the CD value of a first feature from the average of a plurality of CD values ​​of one or more features.

[0641] 127. The computer-readable medium according to Clause 124, wherein the first image data comprises a first image set of a plurality of features among one or more features, and wherein the first error contribution data comprises a first set of error contribution values ​​corresponding to the local CD uniformity (LCDU) values ​​of the features.

[0642] 128. The computer-readable medium according to Clause 124, wherein the first error contribution data includes a plurality of sets of error contribution values corresponding to a plurality of measurement points on a feature of the one or more features, wherein the plurality of sets includes a first set of error contribution values representing the error contributions from the multiple sources at a first measurement point among the measurement points.

[0643] 129. The computer-readable medium according to Clause 124, wherein the first error contribution data is determined based on measurement data of one or more features.

[0644] 130. The computer-readable medium according to Clause 129, wherein the measurement data includes CD values ​​of one or more features or LCDU values ​​of multiple features among one or more features.

[0645] 131. The computer-readable medium as described in Clause 124, wherein the error contribution includes:

[0646] Image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data.

[0647] The contribution of mask error associated with the mask used to print patterns on the substrate, and

[0648] The resist error contribution associated with the resist used for printing patterns, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used for printing patterns.
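
If the three error contributions listed in Clause 131 are treated as statistically independent (an assumption made here for illustration; the clause itself does not state it), their variances add, which gives a simple consistency check on any decomposition. The σ symbols below are notation assumed here:

```latex
\sigma_{\text{total}}^{2}
  = \sigma_{\text{tool}}^{2}
  + \sigma_{\text{mask}}^{2}
  + \sigma_{\text{resist}}^{2}
```

Here σ_tool, σ_mask, and σ_resist denote the spread attributable to the image acquisition tool, the mask, and the resist (including shot noise), respectively, and σ_total denotes the spread of the measured values.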

[0649] 132. The computer-readable medium according to Clause 124, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0650] (a) Execute the machine learning model using the multiple datasets to output predicted error contribution data.

[0651] (b) Determine the cost function as the difference between the predicted error contribution data and the first error contribution data.

[0652] (c) Adjust the machine learning model.

[0653] (d) Determine whether the cost function decreases due to the adjustment, and

[0654] (e) In response to the cost function not decreasing, repeat steps (a), (b), (c) and (d).
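
Steps (a) to (e) of Clause 132 can be sketched with a small linear model that maps image-derived features to three error contribution values and is trained on a mean-squared-error cost. The linear model is a stand-in chosen for brevity. The feature vectors, target values, learning rate, and function names are all hypothetical:

```python
def predict(weights, x):
    # (a) execute the model: one predicted contribution per source
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cost(weights, data):
    # (b) cost: mean squared difference between predicted and reference values
    total = 0.0
    for x, target in data:
        total += sum((p - t) ** 2 for p, t in zip(predict(weights, x), target))
    return total / len(data)


def train(data, lr=0.1, max_iter=500):
    n_in, n_out = len(data[0][0]), len(data[0][1])
    weights = [[0.0] * n_in for _ in range(n_out)]
    current = cost(weights, data)
    for _ in range(max_iter):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x, target in data:
            pred = predict(weights, x)
            for k in range(n_out):
                for j in range(n_in):
                    grad[k][j] += 2 * (pred[k] - target[k]) * x[j] / len(data)
        # (c) adjust the model
        candidate = [[w - lr * g for w, g in zip(w_row, g_row)]
                     for w_row, g_row in zip(weights, grad)]
        # (d) check whether the adjustment reduced the cost
        new_cost = cost(candidate, data)
        if new_cost < current:
            weights, current = candidate, new_cost
        else:
            lr *= 0.5  # (e) not reduced: repeat with a smaller adjustment
    return weights, current


# Hypothetical image features [bias, f1, f2] paired with reference
# contributions ordered as (image tool, mask, resist).
data = [([1.0, 0.2, 0.5], [0.3, 0.1, 0.6]),
        ([1.0, 0.4, 0.3], [0.4, 0.2, 0.5]),
        ([1.0, 0.6, 0.8], [0.5, 0.3, 0.9]),
        ([1.0, 0.8, 0.1], [0.6, 0.4, 0.3])]

initial_cost = cost([[0.0] * 3 for _ in range(3)], data)
weights, final_cost = train(data)
print(initial_cost, final_cost)
```

In practice the input would be image data rather than a short feature vector, and the model would be far larger, but the accept-or-retry structure of the loop is the same.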

[0655] 133. The computer-readable medium as described in Clause 124, further comprising:

[0656] Receive image data of a feature set of a specified pattern to be printed on a specified substrate; and

[0657] Execute a machine learning model to determine error contribution data, which includes error contributions to the feature set from multiple sources.

[0658] 134. The computer-readable medium according to Clause 133, wherein the image data comprises an image of features in a feature set, and wherein the error contribution data comprises an error contribution value corresponding to a ΔCD value associated with a feature.

[0659] 135. The computer-readable medium according to Clause 133, wherein the image data comprises an image set of a feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to LCDU values ​​associated with the feature set.

[0660] 136. The computer-readable medium according to Clause 133, wherein the error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on features in a feature set, wherein the error contribution value set includes a first set of error contribution values ​​representing error contributions from a plurality of sources at a first measurement point in the measurement points.
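
One possible in-memory layout for the error contribution data of Clause 136 is a list with one record per measurement point, each record holding one value per source. The class name, field names, helper function, and values below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ErrorContribution:
    """Error contribution values from each source at one measurement point."""
    image_tool: float
    mask: float
    resist: float


def contributions_for(points, source):
    """Collect one source's contribution across all measurement points."""
    return [getattr(point, source) for point in points]


# Hypothetical contributions at two measurement points on a feature contour.
points = [
    ErrorContribution(image_tool=0.1, mask=-0.2, resist=0.4),
    ErrorContribution(image_tool=0.0, mask=0.1, resist=-0.3),
]

print(contributions_for(points, "mask"))  # [-0.2, 0.1]
```

The first element of `points` corresponds to the first set of error contribution values at the first measurement point.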

[0661] 137. The computer-readable medium as described in Clause 133 further comprises:

[0662] Based on the mask error contribution in the error contribution, adjust one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0663] 138. The computer-readable medium as described in Clause 133, further comprising:

[0664] Based on the resist error contribution in the error contribution, adjust one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0665] 139. A non-transitory computer-readable medium having instructions, which, when executed by a computer, cause the computer to perform a method for determining error contribution data, the error contribution data including error contributions to features of a pattern printed on a substrate from multiple sources, the method comprising:

[0666] Receive image data of a feature set of a specified pattern to be printed on a first substrate;

[0667] Inputting image data into a machine learning model; and

[0668] Execute a machine learning model to determine error contribution data, which includes error contributions to the feature set from multiple sources.

[0669] 140. The computer-readable medium according to Clause 139, wherein the image data comprises an image of features in a feature set, and wherein the error contribution data comprises an error contribution value corresponding to a ΔCD value associated with a feature.

[0670] 141. The computer-readable medium according to Clause 139, wherein the image data comprises an image set of a feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to LCDU values ​​associated with the feature set.

[0671] 142. The computer-readable medium according to Clause 139, wherein the error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature set, wherein the error contribution value set includes a first set of error contribution values ​​representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0672] 143. The computer-readable medium as described in Clause 139, wherein executing the machine learning model to determine the error contribution data includes:

[0673] The machine learning model is trained using multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, which includes error contributions to one or more features from multiple sources.

[0674] 144. The computer-readable medium according to Clause 143, wherein the first image data includes a first image of a feature among one or more features, and wherein the first error contribution data includes a set of first error contribution values ​​corresponding to the ΔCD value of the first feature.

[0675] 145. The computer-readable medium according to Clause 143, wherein the first image data comprises a first image set of a plurality of features among one or more features, and wherein the first error contribution data comprises a first set of error contribution values ​​corresponding to the LCDU values ​​of the features.

[0676] 146. The computer-readable medium as described in Clause 143, wherein the error contribution includes:

[0677] Image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data.

[0678] The contribution of mask error associated with the mask used to print patterns on the substrate, and

[0679] The resist error contribution associated with the resist used for printing patterns, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used for printing patterns.

[0680] 147. A method for training a machine learning model to determine the error contribution to features of a pattern printed on a substrate, the method comprising:

[0681] Acquire training data with multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, the first error contribution data including error contributions to one or more features from multiple sources; and

[0682] A machine learning model is trained based on the training data to predict the error contribution data of the first dataset, thereby reducing the cost function, which indicates the difference between the predicted error contribution data and the first error contribution data.

[0683] 148. The method according to Clause 147, wherein the first image data includes a first image of a first feature among the one or more features, and wherein the first error contribution data includes a set of first error contribution values associated with a Δ critical dimension (ΔCD) value of the first feature.

[0684] 149. The method according to Clause 148, wherein each ΔCD value indicates the deviation of the CD value of the first feature from the average of a plurality of CD values ​​of one or more features.

[0685] 150. The method according to Clause 147, wherein the first image data comprises a first image set of a plurality of features among one or more features, and wherein the first error contribution data comprises a first set of error contribution values ​​corresponding to the local CD uniformity (LCDU) values ​​of the features.

[0686] 151. The method according to Clause 147, wherein the first error contribution data includes a plurality of sets of error contribution values corresponding to multiple measurement points on a feature of the one or more features, wherein the plurality of sets includes a first set of error contribution values representing the error contributions from the multiple sources at a first measurement point among the measurement points.

[0687] 152. The method according to Clause 147, wherein the first error contribution data is determined based on measurement data of one or more features.

[0688] 153. The method according to Clause 152, wherein the measurement data includes the CD value of one or more features or the LCDU value of multiple features among one or more features.

[0689] 154. The method according to Clause 147, wherein the error contribution includes:

[0690] Image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data.

[0691] The contribution of mask error associated with the mask used to print patterns on the substrate, and

[0692] The resist error contribution associated with the resist used for printing patterns, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used for printing patterns.

[0693] 155. The method according to Clause 147, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0694] (a) Execute the machine learning model using the multiple datasets to output predicted error contribution data.

[0695] (b) Determine the cost function as the difference between the predicted error contribution data and the first error contribution data.

[0696] (c) Adjust the machine learning model.

[0697] (d) Determine whether the cost function decreases due to the adjustment, and

[0698] (e) In response to the cost function not decreasing, repeat steps (a), (b), (c) and (d).

[0699] 156. The method according to Clause 147, further comprising:

[0700] Receive image data of a feature set of a specified pattern to be printed on a specified substrate; and

[0701] Execute a machine learning model to determine error contribution data, which includes error contributions to the feature set from multiple sources.

[0702] 157. The method according to Clause 156, wherein the image data comprises images of features in a feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to ΔCD values ​​associated with the features.

[0703] 158. The method according to Clause 156, wherein the image data comprises an image set of the feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to LCDU values ​​associated with the feature set.

[0704] 159. The method according to Clause 156, wherein the error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature set, wherein the error contribution value set includes a first set of error contribution values ​​representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0705] 160. The method according to Clause 156 further includes:

[0706] Based on the mask error contribution in the error contribution, adjust one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0707] 161. The method according to Clause 156 further includes:

[0708] Based on the resist error contribution in the error contribution, adjust one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0709] 162. A method for determining error contribution data, the error contribution data including error contributions to features of a pattern printed on a substrate from multiple sources, the method comprising:

[0710] Receive image data of a feature set of a specified pattern to be printed on a first substrate;

[0711] Inputting image data into a machine learning model; and

[0712] Execute a machine learning model to determine error contribution data, which includes error contributions to the feature set from multiple sources.

[0713] 163. The method according to Clause 162, wherein the image data comprises images of features in a feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to ΔCD values ​​associated with the features.

[0714] 164. The method according to Clause 162, wherein the image data comprises an image set of the feature set, and wherein the error contribution data comprises error contribution values ​​corresponding to LCDU values ​​associated with the feature set.

[0715] 165. The method according to Clause 162, wherein the error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature set, wherein the error contribution value set includes a first set of error contribution values ​​representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0716] 166. The method according to Clause 162, wherein executing the machine learning model to determine the error contribution data includes:

[0717] The machine learning model is trained using multiple datasets, wherein the datasets include a first dataset having: (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, which includes error contributions to one or more features from multiple sources.

[0718] 167. The method according to Clause 162, wherein the error contribution includes:

[0719] Image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data.

[0720] The contribution of mask error associated with the mask used to print patterns on the substrate, and

[0721] The resist error contribution associated with the resist used for printing patterns, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used for printing patterns.

[0722] 168. An apparatus for training a machine learning model to determine error contributions to features of a pattern printed on a substrate, the apparatus comprising:

[0723] A memory for storing an instruction set; and

[0724] At least one processor configured to execute the instruction set to cause the apparatus to perform the following operations:

[0725] Acquiring training data with multiple datasets, wherein the datasets include a first dataset having (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, the first error contribution data including error contributions to the one or more features from multiple sources; and

[0726] Training the machine learning model based on the training data to predict the error contribution data of the first dataset such that a cost function is reduced, the cost function indicating the difference between the predicted error contribution data and the first error contribution data.

[0727] 169. The apparatus according to Clause 168, wherein the first image data includes a first image of a first feature among the one or more features, and wherein the first error contribution data includes a set of first error contribution values corresponding to a delta critical dimension (ΔCD) value of the first feature.

[0728] 170. The apparatus according to Clause 169, wherein each ΔCD value indicates the deviation of the CD value of the first feature from the average of a plurality of CD values of the one or more features.

[0729] 171. The apparatus according to Clause 168, wherein the first image data comprises a first image set of a plurality of features among the one or more features, and wherein the first error contribution data comprises a first set of error contribution values associated with the local CD uniformity (LCDU) values of the features.

[0730] 172. The apparatus according to Clause 168, wherein the first error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature of the one or more features, wherein the error contribution value sets include a first set of error contribution values representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0731] 173. The apparatus according to Clause 168, wherein the first error contribution data is determined based on measurement data of the one or more features.

[0732] 174. The apparatus according to Clause 173, wherein the measurement data includes the CD value of one or more features or the LCDU value of multiple features among one or more features.

[0733] 175. The apparatus according to Clause 168, wherein the error contributions include:

[0734] An image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data,

[0735] A mask error contribution associated with the mask used to print the pattern on the substrate, and

[0736] A resist error contribution associated with the resist used to print the pattern, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used to print the pattern.

[0737] 176. The apparatus according to Clause 168, wherein training the machine learning model is an iterative process, wherein each iteration includes:

[0738] (a) Executing the machine learning model using the multiple datasets to output predicted error contribution data,

[0739] (b) Determining the cost function as the difference between the predicted error contribution data and the first error contribution data,

[0740] (c) Adjusting the machine learning model,

[0741] (d) Determining whether the cost function is reduced as a result of the adjustment, and

[0742] (e) In response to the cost function not being reduced, repeating steps (a), (b), (c) and (d).
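The iterative process of steps (a) to (e) can be illustrated with a minimal sketch. It is not the model described in this document: a plain linear model stands in for the machine learning model, a short feature vector stands in for the image data, and the mean squared error stands in for the cost function. The names `predict`, `cost`, `gradient`, `train`, `TRUE_WEIGHTS`, and `FEATURES` are all hypothetical, and halving the step size when the cost is not reduced is one possible way to retry the adjustment.

```python
def predict(weights, features):
    """Step (a): one predicted error contribution per source (rows of weights)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cost(weights, datasets):
    """Step (b): mean squared difference between predicted and labeled contributions."""
    total = 0.0
    for features, target in datasets:
        pred = predict(weights, features)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(datasets)


def gradient(weights, datasets):
    """Gradient of the mean squared error with respect to every weight."""
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for features, target in datasets:
        pred = predict(weights, features)
        for i, (p, t) in enumerate(zip(pred, target)):
            for j, x in enumerate(features):
                grad[i][j] += 2.0 * (p - t) * x / len(datasets)
    return grad


def train(datasets, n_sources=3, step=0.1, max_iters=500, tol=1e-10):
    n_features = len(datasets[0][0])
    weights = [[0.0] * n_features for _ in range(n_sources)]
    current = cost(weights, datasets)
    for _ in range(max_iters):
        grad = gradient(weights, datasets)
        # Step (c): adjust the model.
        trial = [[w - step * g for w, g in zip(w_row, g_row)]
                 for w_row, g_row in zip(weights, grad)]
        trial_cost = cost(trial, datasets)
        # Step (d): keep the adjustment only if the cost was reduced.
        if trial_cost < current:
            weights, current = trial, trial_cost
        else:
            # Step (e): cost not reduced, so retry with a smaller adjustment.
            step *= 0.5
        if current < tol:
            break
    return weights, current


# Hypothetical toy data: each label holds (tool, mask, resist) contributions.
TRUE_WEIGHTS = [[0.5, 0.0], [0.2, 0.3], [0.0, 1.0]]
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
DATASETS = [(x, predict(TRUE_WEIGHTS, x)) for x in FEATURES]

trained_weights, final_cost = train(DATASETS)
```

In this toy setting the labels are generated from a known linear map, so the trained weights approach `TRUE_WEIGHTS`; real image data would require a far richer model.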

[0743] 177. The apparatus according to Clause 168, wherein the operations further comprise:

[0744] Receiving image data of a feature set of a specified pattern to be printed on a specified substrate; and

[0745] Executing the machine learning model to determine error contribution data, which includes error contributions to the feature set from multiple sources.

[0746] 178. The apparatus according to Clause 177, wherein the image data comprises images of features in a feature set, and wherein the error contribution data comprises error contribution values corresponding to ΔCD values associated with the features.

[0747] 179. The apparatus according to Clause 177, wherein the image data comprises an image set of features, and wherein the error contribution data comprises error contribution values corresponding to LCDU values associated with the feature set.

[0748] 180. The apparatus according to Clause 177, wherein the error contribution data includes a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature set, wherein the error contribution value set includes a first set of error contribution values representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0749] 181. The apparatus according to Clause 177, wherein the operations further comprise:

[0750] Adjusting, based on the mask error contribution among the error contributions, one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0751] 182. The apparatus according to Clause 177, wherein the operations further comprise:

[0752] Adjusting, based on the resist error contribution among the error contributions, one or more parameters of at least one of the mask or the source of the lithography equipment used to print the specified pattern.

[0753] 183. An apparatus for determining error contribution data, the error contribution data including error contributions to features of a pattern printed on a substrate from multiple sources, the apparatus comprising:

[0754] A memory for storing an instruction set; and

[0755] At least one processor configured to execute the instruction set to cause the apparatus to perform the following operations:

[0756] Receiving image data of a feature set of a specified pattern to be printed on a first substrate;

[0757] Inputting the image data into a machine learning model; and

[0758] Executing the machine learning model to determine the error contribution data, which includes error contributions to the feature set from multiple sources.

[0759] 184. The apparatus according to Clause 183, wherein the image data comprises images of features in a feature set, and wherein the error contribution data comprises error contribution values corresponding to ΔCD values associated with the features.

[0760] 185. The apparatus according to Clause 183, wherein the image data comprises an image set of features, and wherein the error contribution data comprises error contribution values corresponding to LCDU values associated with the feature set.

[0761] 186. The apparatus according to Clause 183, wherein the error contribution data comprises a plurality of error contribution value sets corresponding to a plurality of measurement points on a feature set, wherein the error contribution value set comprises a first set of error contribution values representing error contributions from a plurality of sources at a first measurement point among the measurement points.

[0762] 187. The apparatus according to Clause 183, wherein executing the machine learning model to determine the error contribution data includes:

[0763] Training the machine learning model using multiple datasets, wherein the datasets include a first dataset having (a) first image data of one or more features of a pattern to be printed on a substrate, and (b) first error contribution data, which includes error contributions to the one or more features from multiple sources.

[0764] 188. The apparatus according to Clause 183, wherein the error contributions include:

[0765] An image acquisition tool error contribution associated with the image acquisition tool used to acquire the first image data,

[0766] A mask error contribution associated with the mask used to print the pattern on the substrate, and

[0767] A resist error contribution associated with the resist used to print the pattern, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography equipment used to print the pattern.

[0768] As used herein, unless otherwise specified, the term "or" covers all possible combinations unless it is impractical. For example, if a statement of components includes A or B, then unless otherwise explicitly stated or impractical, the components may include A or B, or A and B. As a second example, if a statement of components includes A, B, or C, then unless otherwise explicitly stated or impractical, the components may include A, or B, or C, or A and B, or A and C, or B and C, or A, B, and C. Expressions such as "at least one" do not necessarily modify the entire appended list and do not necessarily modify each member of the list, such that "at least one of A, B, and C" should be understood to include only one of A, only one of B, only one of C, or any combination of A, B, and C. The phrase "one of A and B" or "any one of A and B" should be interpreted in the broadest sense to include either one of A or one of B.

[0769] The above description is intended to be illustrative and not limiting. Therefore, it will be apparent to those skilled in the art that modifications as described can be made without departing from the scope of the claims set forth below.

Claims

1. A non-transitory computer-readable medium having instructions which, when executed by a computer, cause the computer to perform operations for decomposing error contributions from multiple sources to multiple features of a pattern printed on a substrate via photolithography, the operations comprising: acquiring an image of the pattern on the substrate; obtaining, using the image, multiple measurements of the features of the pattern as a function of height, wherein the measurements are obtained for different sensor values of a parameter associated with the height; associating, using a decomposition method, each of the plurality of measurements with a linear mixture of the error contributions corresponding to the different values of the parameter associated with the height, to generate a plurality of linear mixtures of the error contributions; and deriving each of the error contributions based on the linear mixtures and using the decomposition method, wherein the different sensor values correspond to different thresholds associated with the image.

2. The computer-readable medium of claim 1, wherein each threshold corresponds to a threshold value of a pixel in the image.

3. The computer-readable medium of claim 2, wherein each measurement corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

4. The computer-readable medium of claim 2, wherein the error contributions comprise: an image acquisition tool error contribution associated with the image acquisition tool used to acquire the image; a mask error contribution associated with the mask used to print the pattern on the substrate; and a resist error contribution associated with the resist used to print the pattern, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography apparatus used to print the pattern.

5. The computer-readable medium of claim 4, wherein the operations further comprise: adjusting, based on the mask error contribution, one or more parameters of at least one of the mask or the source of the photolithography apparatus used to print the pattern.

6. The computer-readable medium of claim 4, wherein the operations further comprise: adjusting, based on the resist error contribution, one or more parameters of at least one of the mask or the source of the photolithography apparatus used to print the pattern.

7. The computer-readable medium of claim 3, wherein obtaining the measurements comprises: acquiring, at a first threshold among the different thresholds, a first signal with multiple first ΔCD values from multiple measurement points; acquiring, at a second threshold among the different thresholds, a second signal with multiple second ΔCD values from the multiple measurement points; and acquiring, at a third threshold among the different thresholds, a third signal with multiple third ΔCD values from the multiple measurement points.

8. The computer-readable medium of claim 7, wherein each ΔCD value is determined based on a threshold and a measurement point, and indicates the deviation of the CD value of a given feature from the average of a plurality of CD values ​​of the feature.

9. The computer-readable medium of claim 7, wherein each ΔCD value indicates, at a given threshold, the distance between a designated point on a contour line of a given feature and a reference point on a reference contour line of the given feature, wherein the reference contour line is a simulated version of the contour line of the given feature.

10. The computer-readable medium of claim 7, wherein associating each measurement comprises: associating each of the plurality of first ΔCD values in the first signal with a first linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution; associating each of the plurality of second ΔCD values in the second signal with a second linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution; and associating each of the plurality of third ΔCD values in the third signal with a third linear mixture of the image acquisition tool error contribution, the mask error contribution, and the resist error contribution.

11. The computer-readable medium of claim 10, wherein deriving each of the error contributions comprises: deriving, using the first linear mixture, the second linear mixture, and the third linear mixture, and based on each of the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values: (a) a first output signal having a plurality of the image acquisition tool error contributions, (b) a second output signal having a plurality of the mask error contributions, and (c) a third output signal having a plurality of the resist error contributions.

12. The computer-readable medium of claim 11, wherein each error contribution is determined based on the corresponding error contributions at the first threshold level, the second threshold level, and the third threshold level.

13. The computer-readable medium of claim 11, wherein deriving each of the error contributions comprises: determining a mixing matrix with a set of coefficients, wherein the set of coefficients generates the first linear mixture, the second linear mixture, and the third linear mixture of the error contributions corresponding to each ΔCD value, based on the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values, respectively; determining the inverse of the mixing matrix; and determining, using the inverse of the mixing matrix and based on the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values: (a) a first output signal having a plurality of the image acquisition tool error contributions, (b) a second output signal having a plurality of the mask error contributions, and (c) a third output signal having a plurality of the resist error contributions.
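The mixing-matrix step of claim 13 can be sketched as follows, under a strong simplifying assumption: the mixing matrix is taken as already known, whereas a decomposition method would have to estimate it from the ΔCD signals. Each row of the hypothetical `MIXING` matrix holds the coefficients for one threshold, each row of the hypothetical `SOURCES` holds one error contribution (tool, mask, resist) over four measurement points, and `invert` and `matmul` are illustrative helpers.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("mixing matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical error contributions at four measurement points.
SOURCES = [
    [0.2, -0.1, 0.3, -0.4],   # image acquisition tool
    [0.5, 0.1, -0.2, -0.4],   # mask
    [-0.3, 0.6, 0.1, -0.4],   # resist
]

# Hypothetical mixing coefficients, one row per threshold.
MIXING = [
    [1.0, 0.8, 0.5],
    [1.0, 1.0, 1.0],
    [1.0, 0.7, 1.4],
]

# Three dCD signals, each a linear mixture of the three contributions.
DELTA_CD_SIGNALS = matmul(MIXING, SOURCES)

# Applying the inverse of the mixing matrix yields the three output signals.
RECOVERED = matmul(invert(MIXING), DELTA_CD_SIGNALS)
```

With a known, invertible mixing matrix the recovery is exact up to rounding; when the matrix is estimated, the recovered signals are generally determined only up to scale and ordering.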

14. The computer-readable medium of claim 2, wherein obtaining the measurements comprises: obtaining a first contour line of the feature corresponding to a first threshold among the different thresholds; obtaining a first CD value of the first contour line; obtaining a second contour line of the feature corresponding to a second threshold among the different thresholds; and obtaining a second CD value of the second contour line.
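As a rough illustration of obtaining CD values at different thresholds, the sketch below reduces the contour line to a one-dimensional case: a hypothetical intensity profile `PROFILE` taken across a single feature, with the CD at a threshold defined as the distance between the outermost threshold crossings. The function `cd_at_threshold`, the linear interpolation between pixels, and the `pixel_size` parameter are assumptions of this sketch, not details taken from the claims.

```python
def cd_at_threshold(profile, threshold, pixel_size=1.0):
    """Distance between the outermost crossings of `threshold` in a 1-D profile.

    Crossing positions are linearly interpolated between neighboring pixels.
    """
    crossings = []
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        rising = a < threshold <= b
        falling = a >= threshold > b
        if rising or falling:
            crossings.append(i + (threshold - a) / (b - a))
    if len(crossings) < 2:
        raise ValueError("profile does not cross the threshold twice")
    return (crossings[-1] - crossings[0]) * pixel_size


# Hypothetical gray-level profile across one feature (arbitrary units).
PROFILE = [0.0, 0.0, 50.0, 100.0, 100.0, 100.0, 50.0, 0.0, 0.0]

first_cd = cd_at_threshold(PROFILE, threshold=25.0)   # lower threshold, wider cut
second_cd = cd_at_threshold(PROFILE, threshold=75.0)  # higher threshold, narrower cut
```

Because the profile has sloped edges, the lower threshold cuts the feature at a wider point than the higher one, which is why the same feature yields a different CD value at each threshold.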

15. The computer-readable medium of claim 14, further comprising: At the first threshold, a first ΔCD value is obtained for the first CD value, wherein the first ΔCD indicates the deviation of the first CD value from the average of a plurality of first CD values ​​measured at a plurality of measurement points.

16. The computer-readable medium of claim 15, wherein obtaining the first ΔCD value comprises: obtaining, at the plurality of measurement points, the plurality of first CD values corresponding to the first threshold; obtaining the average value of the plurality of first CD values; shifting the average value to zero; and obtaining the first ΔCD value as the difference between the first CD value and the average value.
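The mean-centering of claim 16 amounts to a few lines of arithmetic. The sketch below uses a hypothetical helper `delta_cd` and made-up CD values for one threshold; the unit (nm) is an assumption.

```python
def delta_cd(cd_values):
    """Return dCD = CD - mean(CD) for CD values measured at one threshold."""
    mean_cd = sum(cd_values) / len(cd_values)
    return [cd - mean_cd for cd in cd_values]


# Hypothetical first CD values (nm) at four measurement points, first threshold.
FIRST_CD_VALUES = [20.0, 21.0, 19.0, 20.0]

FIRST_DELTA_CD = delta_cd(FIRST_CD_VALUES)
```

After centering, the ΔCD values average to zero, so each value expresses only the local deviation at its measurement point.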

17. The computer-readable medium of claim 15, wherein the plurality of measurement points are located on at least one of (a) the feature of the pattern or (b) the plurality of features of the pattern.

18. The computer-readable medium according to any one of claims 15-17, wherein associating each measurement value comprises: The first ΔCD value corresponding to the first threshold is associated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and The second ΔCD value corresponding to the second threshold is associated with a second linear mixture of the first error contribution and the second error contribution.

19. The computer-readable medium of claim 18, wherein deriving each of the error contributions comprises: deriving, using the decomposition method, the first error contribution and the second error contribution based on the first ΔCD value and the second ΔCD value, as well as the first linear mixture and the second linear mixture.

20. The computer-readable medium of claim 1, wherein, for the different sensor values of the parameter, each measurement corresponds to a local critical dimension uniformity (LCDU) value of the feature.

21. The computer-readable medium according to any one of claims 1 and 20, wherein the different sensor values ​​of said parameter correspond to different dose levels associated with a source of a photolithography apparatus used to print said pattern.

22. The computer-readable medium according to any one of claims 1 and 20, wherein the different sensor values ​​of said parameter correspond to different focus levels associated with a source of a photolithography apparatus used to print said pattern.

23. The computer-readable medium according to any one of claims 20-21, further comprising: Based on a specified focus level, obtain the first LCDU value corresponding to the first dose level, and Based on the specified focus level, obtain the second LCDU value corresponding to the second dose level.
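Obtaining one LCDU value per dose level at a fixed focus level can be sketched as below. The claims do not define how LCDU is computed; taking it as three times the population standard deviation of the local CD values is an assumption of this sketch, as are the names `lcdu`, `lcdu_by_dose`, and `CD_BY_CONDITION` and all numeric values.

```python
import statistics


def lcdu(cd_values, n_sigma=3.0):
    """LCDU taken here as n_sigma times the population standard deviation of CD.

    The 3-sigma convention is an assumption of this sketch.
    """
    return n_sigma * statistics.pstdev(cd_values)


def lcdu_by_dose(cd_by_condition, focus):
    """Return {dose level: LCDU} for the entries measured at the given focus level."""
    return {dose: lcdu(cds)
            for (f, dose), cds in cd_by_condition.items()
            if f == focus}


# Hypothetical CD measurements (nm), keyed by (focus level, dose level).
CD_BY_CONDITION = {
    (0.0, 40.0): [20.0, 22.0, 18.0, 20.0],
    (0.0, 44.0): [18.0, 19.0, 17.0, 18.0],
    (0.05, 40.0): [21.0, 24.0, 18.0, 21.0],
}

LCDU_AT_FOCUS = lcdu_by_dose(CD_BY_CONDITION, focus=0.0)
first_lcdu = LCDU_AT_FOCUS[40.0]    # first dose level
second_lcdu = LCDU_AT_FOCUS[44.0]   # second dose level
```

Swapping the roles of the two key components gives the complementary case of a fixed dose level with varying focus levels.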

24. The computer-readable medium according to any one of claims 20 or 22, wherein the operations further comprise: obtaining, based on a specified dose level, a first LCDU value corresponding to a first focus level; and obtaining, based on the specified dose level, a second LCDU value corresponding to a second focus level.

25. The computer-readable medium according to any one of claims 23 or 24, wherein associating each measurement value comprises: The first LCDU value is correlated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and The second LCDU value is associated with a second linear mixture of the first error contribution and the second error contribution.

26. The computer-readable medium of claim 25, wherein deriving each of the error contributions comprises: Using a decomposition method, the first error contribution and the second error contribution are derived based on the first LCDU value, the second LCDU value, the first linear mixture, and the second linear mixture.

27. The computer-readable medium of claim 1, wherein the measured value corresponds to the linewidth roughness (LWR) value of the feature for the different sensor values ​​of the parameter.

28. A method for decomposing error contributions from multiple sources to multiple features associated with a pattern printed on a substrate via photolithography, the method comprising: acquiring an image of the pattern on the substrate; obtaining, using the image, multiple measurements of the features of the pattern as a function of height, wherein the measurements correspond to different values of a parameter associated with the height; associating, using a decomposition method, each of the plurality of measurements with a linear mixture of the error contributions corresponding to the different values of the parameter associated with the height, to generate a plurality of linear mixtures of the error contributions; and deriving each of the error contributions based on the linear mixtures and using the decomposition method.

29. The method of claim 28, wherein each measurement corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

30. The method of claim 29, wherein each threshold corresponds to a threshold value of a pixel in the image.

31. The method according to any one of claims 29-30, wherein the error contribution comprises: The first, second, and third error contributions to the CD value, wherein the first error contribution comes from the resist used to print the pattern, the second error contribution comes from the mask used to print the pattern on the substrate, and the third error contribution comes from the image acquisition tool used to acquire the image.

32. An apparatus for decomposing error contributions from multiple sources to multiple features of a pattern printed on a substrate via photolithography, the apparatus comprising: a memory for storing an instruction set; and at least one processor configured to execute the instruction set to cause the apparatus to perform the following operations: acquiring an image of the pattern on the substrate; obtaining, using the image, multiple measurements of the features of the pattern as a function of height, wherein the measurements are obtained for different sensor values of a parameter associated with the height; associating, using a decomposition method, each of the plurality of measurements with a linear mixture of the error contributions corresponding to the different values of the parameter associated with the height, to generate a plurality of linear mixtures of the error contributions; and deriving each of the error contributions based on the linear mixtures and using the decomposition method, wherein the different sensor values correspond to different thresholds associated with the image.

33. The device of claim 32, wherein each threshold corresponds to a threshold value of a pixel in the image.

34. The apparatus of claim 33, wherein each measurement corresponds to a critical dimension (CD) value of the feature at one of the different thresholds.

35. The device of claim 33, wherein the error contribution includes: Image acquisition tool error contribution associated with the image acquisition tool used to acquire the image. The mask error contribution associated with the mask used to print the pattern on the substrate, and The resist error contribution associated with the resist used to print the pattern, wherein the resist error contribution includes photoresist chemical noise and shot noise associated with the source of the photolithography apparatus used to print the pattern.

36. The apparatus of claim 35, further comprising: Based on the mask error contribution, adjust one or more parameters of at least one of the mask or the source of the photolithography apparatus used to print the pattern.

37. The device according to claim 35, further comprising: Based on the resist error contribution, adjust one or more parameters of at least one of the mask or the source of the photolithography apparatus used to print the pattern.

38. The device according to any one of claims 34-37, wherein acquiring the measured value comprises: At the first threshold among the different thresholds, a first signal with multiple first ΔCD values ​​is acquired from multiple measurement points. At the second threshold among the different thresholds, a second signal with multiple second ΔCD values ​​is acquired from the plurality of measurement points, and At the third threshold among the different thresholds, a third signal with multiple third ΔCD values ​​is obtained from the plurality of measurement points.

39. The device of claim 38, wherein each ΔCD value is determined based on each threshold and each measurement point, and indicates the deviation of the CD value of a given feature from the average of a plurality of CD values ​​of said feature.

40. The device of claim 38, wherein each ΔCD value indicates, at a given threshold, the distance between a designated point on the contour line of a given feature and a reference point on a reference contour line of the given feature, wherein the reference contour line is a simulated version of the contour line of the given feature.

41. The apparatus of claim 38, wherein associating each measurement value includes: Each of the plurality of first ΔCD values ​​in the first signal is associated with a first linear mixture of the image acquisition tool error contribution, mask error contribution, and resist error contribution. Each of the plurality of second ΔCD values ​​in the second signal is correlated with a second linear mixture of the image acquisition tool error contribution, mask error contribution, and resist error contribution, and Each of the plurality of third ΔCD values ​​in the third signal is associated with a third linear mixture of the image acquisition tool error contribution, mask error contribution, and resist error contribution.

42. The apparatus of claim 41, wherein deriving each of the error contributions comprises: deriving, using the first linear mixture, the second linear mixture, and the third linear mixture, and based on each of the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values: (a) a first output signal having a plurality of the image acquisition tool error contributions, (b) a second output signal having a plurality of the mask error contributions, and (c) a third output signal having a plurality of the resist error contributions.

43. The device of claim 42, wherein deriving each of the error contributions comprises: Each of the error contributions is derived using the Independent Component Analysis (ICA) method.

44. The apparatus of claim 43, wherein deriving each of the error contributions using the ICA method comprises: determining a mixing matrix with a set of coefficients, wherein the set of coefficients generates the first, second, and third linear mixtures of the error contributions corresponding to each ΔCD value, based on the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values, respectively; determining the inverse of the mixing matrix; and determining, using the inverse of the mixing matrix and based on the plurality of first ΔCD values, the plurality of second ΔCD values, and the plurality of third ΔCD values: (a) a first output signal having a plurality of the image acquisition tool error contributions, (b) a second output signal having a plurality of the mask error contributions, and (c) a third output signal having a plurality of the resist error contributions.

45. The apparatus of claim 42, wherein deriving each of the error contributions comprises: Each of the error contributions is derived using either the reconstructed ICA method or the orthogonal ICA method.

46. The apparatus according to any one of claims 33-34, wherein obtaining the measurements comprises: obtaining a first contour line of the feature corresponding to a first threshold among the different thresholds; obtaining a first CD value of the first contour line; obtaining a second contour line of the feature corresponding to a second threshold among the different thresholds; and obtaining a second CD value of the second contour line.

47. The apparatus of claim 46, further comprising: At the first threshold, a first ΔCD value is obtained for the first CD value, wherein the first ΔCD indicates the deviation of the first CD value from the average of a plurality of first CD values ​​measured at a plurality of measurement points.

48. The device of claim 47, wherein obtaining the first ΔCD value comprises: At the plurality of measurement points, the plurality of first CD values ​​corresponding to the first threshold are obtained. Obtain the average value of the plurality of first CD values. Shift the average value to zero, and The first ΔCD value is obtained as the difference between the first CD value and the average value.

49. The device of claim 48, wherein the plurality of measuring points are located on at least one of (a) the feature of the pattern or (b) the plurality of features.

50. The device according to any one of claims 46-48, wherein associating each measurement value comprises: The first ΔCD value corresponding to the first threshold is associated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and The second ΔCD value corresponding to the second threshold is associated with a second linear mixture of the first error contribution and the second error contribution.

51. The apparatus of claim 50, wherein deriving each of the error contributions comprises: Using a decomposition method, the first error contribution and the second error contribution are derived based on the first ΔCD value and the second ΔCD value, as well as the first linear mixture and the second linear mixture.

52. The apparatus of claim 32, wherein, for the different sensor values of the parameter, each measurement corresponds to a local critical dimension uniformity (LCDU) value of the feature.

53. The device according to any one of claims 32 and 52, wherein the different sensor values ​​of said parameter correspond to different dose levels associated with a source of the photolithography apparatus used to print said pattern.

54. The device according to any one of claims 32 and 52, wherein the different sensor values ​​of said parameter correspond to different focus levels associated with the source of the photolithography apparatus used to print said pattern.

55. The device according to any one of claims 52-53, further comprising: Based on a specified focus level, obtain the first LCDU value corresponding to the first dose level, and Based on the specified focus level, obtain the second LCDU value corresponding to the second dose level.

56. The apparatus according to any one of claims 52 or 54, wherein the operations further comprise: obtaining, based on a specified dose level, a first LCDU value corresponding to a first focus level; and obtaining, based on the specified dose level, a second LCDU value corresponding to a second focus level.

57. The device according to any one of claims 55 or 56, wherein associating each measurement value comprises: The first LCDU value is correlated with a first linear mixture of the first error contribution and the second error contribution in the error contribution, and The second LCDU value is associated with a second linear mixture of the first error contribution and the second error contribution.

58. The apparatus of claim 57, wherein deriving each of the error contributions comprises: Using a decomposition method, the first error contribution and the second error contribution are derived from the first LCDU value, the second LCDU value, the first linear mixture, and the second linear mixture.

59. The device of claim 32, wherein the measured value corresponds to the linewidth roughness (LWR) value of the feature for the different sensor values ​​of the parameter.