Using generative artificial intelligence for focus and dose metrology and for generating reference images for charged-particle beam metrology

A generative AI model generates synthetic images to tune image processing algorithms for charged-particle beam tools, addressing precision and efficiency challenges in focus and dose metrology, thereby improving semiconductor manufacturing yield.

WO2026130924A1 · PCT designated stage · Publication Date: 2026-06-25 · ASML NETHERLANDS BV

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
ASML NETHERLANDS BV
Filing Date
2025-11-18
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing focus and dose metrology technologies for charged-particle beam tools struggle to achieve precise control and efficient processing of images for semiconductor manufacturing, particularly in high-NA EUV systems, leading to potential defects and reduced yield due to inefficient image processing algorithms.

Method used

Employing a generative artificial intelligence model, specifically a variational autoencoder, to generate synthetic images for tuning image processing algorithms, allowing for accurate focus and dose metrology by training on reference images with known values and using a decoder to construct new images for parameter tuning.

Benefits of technology

Enhances the efficiency and accuracy of focus and dose metrology, reducing the number of images needed for measurement and improving wafer throughput by calibrating image processing algorithms effectively.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure EP2025083344_25062026_PF_FP_ABST
Patent Text Reader

Abstract

A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for tuning an image processing algorithm with constructed reference images. The operations include: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.

Description

USING GENERATIVE ARTIFICIAL INTELLIGENCE FOR FOCUS AND DOSE METROLOGY AND FOR GENERATING REFERENCE IMAGES FOR CHARGED-PARTICLE BEAM METROLOGY

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority of US application 63/734,731, which was filed on 16 December 2024 and which is incorporated herein in its entirety by reference.

TECHNICAL FIELD

[0002] The embodiments provided herein relate to using a generative artificial intelligence model for metrology, and more particularly to using a generative artificial intelligence model for focus and dose metrology and to generate reference images for charged-particle beam metrology.

BACKGROUND

[0003] Charged-particle beam tools, such as a charged-particle beam inspection tool, can acquire high spatial resolution images of test targets, actual devices, and device-like structures on a wafer. There are various image processing algorithms that can extract information from the images, such as critical dimension (CD) and overlay (OV) values. To achieve accurate metrology, the image processing algorithms need to be properly tuned.

[0004] Scanner focus and dose control directly impacts the wafer patterning quality and wafer yield. The term “focus” refers to the alignment of the focal plane of the lithography system’s optical projection system with the surface of the wafer. Proper focus ensures that the light used to expose the photoresist on the wafer is sharply concentrated at the desired level, producing clear and accurately defined patterns. The term “dose” refers to the amount of energy per unit area delivered to the photoresist-coated wafer during the exposure process. The dose determines how much light energy is absorbed by the photoresist, which in turn affects the chemical reactions necessary for pattern development. The term “focus and dose” refers to focus or dose and can include determining the focus alone, the dose alone, or both the focus and the dose.

[0005] The term “focus and dose marks” can refer to marks which may have properties that can be utilized to determine the focus or the dose. Focus marks are elements designed to be part of a lithography mask pattern and printed on the wafers. They help ensure that the photolithographic process achieves the necessary precision and accuracy for the fine features of semiconductor devices. Focus marks are used to monitor and control focus by helping to accurately determine and adjust the focal plane of the lithography system to ensure that the wafer surface is precisely in focus. They can also be used to calibrate and fine-tune the lithography equipment, ensuring optimal performance. Focus marks can include a series of patterns or structures fabricated onto the wafer or reticle. Accurate focus determination is important for controlling the critical dimensions (CD) of the features being printed on the wafer.

[0006] Dose marks, similar to focus marks, are also elements used in the lithography process for printing integrated circuits. They help monitor and control the exposure dose of light that reaches the wafer, ensuring the proper transfer of the mask pattern onto the wafer. Dose marks can help in accurately determining the amount of energy (or dose) delivered to the photoresist-coated wafer during the exposure process. They can assist in ensuring that the exposure dose is uniform across the entire wafer, which is important for consistent feature sizes and patterns. Dose marks can include a series of patterns or features that are sensitive to variations in exposure dose, such as arrays of lines, dots, or other geometries designed to exhibit measurable changes when exposed to different doses of light.

[0007] Optical metrology sensors may be used to determine focus or dose utilizing the focus and dose marks. An optical metrology sensor may be configured to project a radiation beam onto a focus and dose mark and determine, based on a reflected radiation beam, a focus or dose associated with the lithography system or process. As noted, focus and dose marks are special marks that are printed on the wafer, requiring engineering time to design the marks and to place the marks in certain locations on the wafer where the marks can be measured and not interfere with the rest of the patterns to be printed on the wafer. For example, the focus and dose marks may be printed in between the scribe lines of a wafer.

[0008] Due to the small depth of focus, high-NA EUV systems require even tighter focus and dose control than previous scanner generations. Existing focus and dose metrology technologies are mainly based on optical systems and charged-particle beam (CPB) tools. For example, existing optical focus and dose metrology tools acquire diffraction signals from specially designed marks. It is common to use CPB tools to measure wafer feature CD (critical dimension) at various FEM (focus-exposure-matrix) conditions and calibrate Bossung curves to estimate the focus and dose. Another focus and dose measurement method includes correlating pattern placement errors and CDs found in large field of view CPB images.

SUMMARY

[0009] Some embodiments provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for tuning an image processing algorithm with constructed reference images. The operations include: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.
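The four operations in this paragraph (construct, extract, compute KPIs, select) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_decoder` and `toy_encoder` are hypothetical stand-ins for a trained encoder-decoder model, the "image" is a one-dimensional intensity profile with one bottom-layer and one top-layer feature, the `construction_error` argument merely simulates an imperfectly constructed image, and the KPI is taken to be the absolute difference between the extracted overlay and the reference value. The source does not specify the KPI or the model internals.

```python
import math

WIDTH, BOTTOM_CENTER, NOMINAL_GAP, SIGMA = 64, 16.0, 32.0, 2.0

def toy_decoder(reference_overlay, construction_error=0.0):
    """Hypothetical stand-in for a trained decoder: returns a 1-D intensity
    profile with a bottom-layer feature and a shifted top-layer feature."""
    top_center = BOTTOM_CENTER + NOMINAL_GAP + reference_overlay + construction_error
    return [
        math.exp(-0.5 * ((x - BOTTOM_CENTER) / SIGMA) ** 2)
        + math.exp(-0.5 * ((x - top_center) / SIGMA) ** 2)
        for x in range(WIDTH)
    ]

def centroid(profile, start, stop):
    """Intensity-weighted mean position of profile[start:stop]."""
    total = sum(profile[start:stop])
    return sum(x * profile[x] for x in range(start, stop)) / total

def toy_encoder(image):
    """Hypothetical stand-in for a trained encoder: extracts an overlay value
    as the feature separation minus the nominal gap."""
    half = WIDTH // 2
    return centroid(image, half, WIDTH) - centroid(image, 0, half) - NOMINAL_GAP

def select_reference_images(candidates, kpi_max):
    """Keep constructed images whose KPI falls inside the target range."""
    selected = []
    for reference, error in candidates:
        image = toy_decoder(reference, error)   # 1. construct a new image
        extracted = toy_encoder(image)          # 2. extract a metrology value
        kpi = abs(extracted - reference)        # 3. KPI (assumed definition)
        if kpi <= kpi_max:                      # 4. select within target range
            selected.append((reference, image))
    return selected

# (reference overlay, simulated construction error) pairs, arbitrary units
candidates = [(-2.0, 0.0), (0.0, 0.05), (1.0, 0.8), (2.0, 0.0)]
selected = select_reference_images(candidates, kpi_max=0.1)
print([reference for reference, _ in selected])
```

In this toy setup the candidate with a large simulated construction error is rejected, and the remaining images could then serve as reference inputs when tuning another image processing algorithm.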

[0010] Some embodiments provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for generating synthetic images to tune an image processing algorithm. The operations include: using a trained decoder of a variational autoencoder to generate the synthetic images based on a selected value of one latent space parameter and average latent space parameters, wherein the average latent space parameters are determined by training an encoder and the decoder of the variational autoencoder on a set of training images; verifying the generated synthetic images using the encoder; and using the verified generated synthetic images to tune the image processing algorithm.
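One way to read "a selected value of one latent space parameter and average latent space parameters" is as a latent sweep: every latent dimension is held at its average over the training set while a single dimension is set to chosen values. The Python sketch below illustrates that reading only. The linear `toy_decoder` and `toy_encoder` are hypothetical, exactly invertible stand-ins for a trained variational autoencoder, and the verification rule (re-encode and compare the swept dimension within a tolerance) is an assumption, not the claimed procedure.

```python
def average_latents(latent_vectors):
    """Per-dimension mean of the latent vectors obtained for the training images."""
    count = len(latent_vectors)
    return [sum(column) / count for column in zip(*latent_vectors)]

def latent_sweep(mean_latent, index, values):
    """One latent vector per selected value; only dimension `index` changes."""
    sweep = []
    for value in values:
        z = list(mean_latent)
        z[index] = value
        sweep.append(z)
    return sweep

def toy_decoder(z):
    """Hypothetical decoder: maps a 3-D latent vector to a 3-pixel 'image'."""
    return [z[0] + z[1], z[0] - z[1], z[2]]

def toy_encoder(image):
    """Hypothetical encoder: exact inverse of toy_decoder."""
    return [(image[0] + image[1]) / 2, (image[0] - image[1]) / 2, image[2]]

def generate_verified(mean_latent, index, values, tol=1e-6):
    """Decode each swept latent vector, then keep the image only if the
    encoder recovers the selected latent value within `tol`."""
    verified = []
    for z in latent_sweep(mean_latent, index, values):
        image = toy_decoder(z)
        if abs(toy_encoder(image)[index] - z[index]) <= tol:
            verified.append((z[index], image))
    return verified

training_latents = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]  # made-up values
mean_latent = average_latents(training_latents)
images = generate_verified(mean_latent, index=0, values=[-1.0, 0.0, 1.0])
print(mean_latent, len(images))
```

With a real model the decoder would be a neural network and verification could reject some generated images; here every image passes because the toy pair is exactly invertible.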

[0011] Some embodiments provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for focus and dose metrology. The operations include: providing a charged particle beam (CPB) image of a wafer pattern to a machine learning model; and executing the machine learning model to generate focus and dose information in generating the wafer pattern in a lithographic process.

[0012] Other advantages of the embodiments of the present disclosure will become apparent from the following description taken in conjunction with the accompanying drawings wherein are set forth, by way of illustration and example, certain embodiments of the present invention.

BRIEF DESCRIPTION OF FIGURES

[0013] The above and other aspects of the present disclosure will become more apparent from the description of exemplary embodiments, taken in conjunction with the accompanying drawings.

[0014] Fig. 1 is a schematic diagram illustrating an example charged-particle beam (CPB) system, consistent with embodiments of the present disclosure.

[0015] Fig. 2 is a schematic diagram illustrating an example charged-particle beam tool, consistent with embodiments of the present disclosure that may be a part of the example charged-particle beam system of Fig. 1.

[0016] Fig. 3 is a schematic diagram illustrating an example multi-beam tool, consistent with embodiments of the present disclosure that can be a part of the example charged-particle beam system of Fig. 1.

[0017] Fig. 4 is a block diagram of an exemplary server, consistent with embodiments of the present disclosure.

[0018] Fig. 5 is a schematic diagram illustrating an example neural network implementing a variational autoencoder, consistent with embodiments of the present disclosure.

[0019] Fig. 6 is a plot showing programmed overlay values and measured overlay values, consistent with embodiments of the present disclosure.

[0020] Fig. 7 is a schematic diagram of a variational autoencoder for use in connection with overlay metrology, consistent with embodiments of the present disclosure.

[0021] Fig. 8 is a diagram showing an example accuracy self-reference image dataset, consistent with embodiments of the present disclosure.

[0022] Fig. 9 is a flowchart of an example method for training a variational autoencoder for use in connection with overlay metrology, consistent with embodiments of the present disclosure.

[0023] Fig. 10 shows example images generated by a trained decoder, consistent with embodiments of the present disclosure.

[0024] Fig. 11 is a flowchart of an example method for generating new images using the trained decoder, consistent with embodiments of the present disclosure.

[0025] Fig. 12 is a schematic diagram of a variational autoencoder for use in connection with focus and dose metrology, consistent with embodiments of the present disclosure.

[0026] Fig. 13 is an example of synthesized images, consistent with embodiments of the present disclosure.

[0027] Fig. 14 is a flowchart of an example method for training a variational autoencoder for use in connection with focus and dose metrology, consistent with embodiments of the present disclosure.

DETAILED DESCRIPTION

[0028] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of exemplary embodiments do not represent all implementations. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the disclosed embodiments as recited in the appended claims. For example, although some embodiments are described in the context of utilizing electron beams, the disclosure is not so limited. Other types of charged-particle beams (e.g., including protons, ions, muons, or any other particle carrying electric charges) may be similarly applied. Furthermore, other imaging systems may be used, such as optical imaging, photon detection, x-ray detection, ion detection, etc.

[0029] Relative dimensions of components in drawings may be exaggerated for clarity. Within the following description of drawings, the same or like reference numbers refer to the same or like components or entities, and only the differences with respect to the individual embodiments are described. As used herein, unless specifically stated otherwise, the term “or” encompasses all possible combinations, except where infeasible. For example, if it is stated that a component may include A or B, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or A and B. As a second example, if it is stated that a component may include A, B, or C, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.
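The inclusive reading of “or” defined in this paragraph amounts to enumerating every non-empty combination of the listed options. A short Python illustration (the function name `inclusive_or` is hypothetical, and the "unless infeasible" exception is not modeled):

```python
from itertools import combinations

def inclusive_or(options):
    """All non-empty combinations of the listed options."""
    return [
        combo
        for size in range(1, len(options) + 1)
        for combo in combinations(options, size)
    ]

print(inclusive_or(["A", "B"]))            # [('A',), ('B',), ('A', 'B')]
print(len(inclusive_or(["A", "B", "C"])))  # 7
```

The seven combinations for A, B, and C come out in the same order as the paragraph lists them.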

[0030] Electronic devices are constructed of circuits formed on a piece of semiconductor material called a substrate. The semiconductor material may include, for example, silicon, gallium arsenide, indium phosphide, silicon germanium, or the like. Many circuits may be formed together on the same piece of silicon and are called integrated circuits or ICs. The size of these circuits has decreased dramatically so that many more of them can be fit on the substrate. For example, an IC chip in a smartphone can be as small as a thumbnail and yet may include over 2 billion transistors, the size of each transistor being less than 1/1000th the size of a human hair.

[0031] Making these ICs with extremely small structures or components is a complex, time-consuming, and expensive process, often involving hundreds of individual steps. Errors in even one step have the potential to result in defects in the finished IC, rendering it useless. Thus, one goal of the manufacturing process is to avoid such defects to maximize the number of functional ICs made in the process; that is, to improve the overall yield of the process.

[0032] One component of improving yield is monitoring the chip-making process to ensure that it is producing a sufficient number of functional integrated circuits. One way to monitor the process is to inspect the chip circuit structures at various stages of their formation. Inspection can be carried out using a scanning charged-particle microscope (SCPM). For example, an SCPM may be a scanning electron microscope (SEM). An SCPM can be used to image these extremely small structures, in effect, taking a “picture” of the structures of the wafer. The image can be used to determine if the structure was formed properly in the proper location. If the structure is defective, then the process can be adjusted, so the defect is less likely to recur.

[0033] The working principle of an SCPM (e.g., an SEM) is similar to that of a camera. A camera takes a picture by receiving and recording the intensity of light reflected or emitted from people or objects. An SCPM takes a “picture” by receiving and recording energies or quantities of charged particles (e.g., electrons) reflected or emitted from the structures of the wafer. Typically, the structures are made on a substrate (e.g., a silicon substrate) that is placed on a platform, referred to as a stage, for imaging. Before taking such a “picture,” a charged-particle beam may be projected onto the structures, and when the charged particles are reflected or emitted (“exiting”) from the structures (e.g., from the wafer surface, from the structures underneath the wafer surface, or both), a detector of the SCPM may receive and record the energies or quantities of those charged particles to generate an inspection image. To take such a “picture,” the charged-particle beam may scan through the wafer (e.g., in a line-by-line or zigzag manner), and the detector may receive exiting charged particles coming from a region under charged-particle beam projection (referred to as a “beam spot”). The detector may receive and record exiting charged particles from each beam spot one at a time and join the information recorded for all the beam spots to generate the inspection image. Some SCPMs use a single charged-particle beam (referred to as a “single-beam SCPM,” such as a single-beam SEM) to take a single “picture” to generate the inspection image, while some SCPMs use multiple charged-particle beams (referred to as a “multi-beam SCPM,” such as a multi-beam SEM) to take multiple “sub-pictures” of the wafer in parallel and stitch them together to generate the inspection image. By using multiple charged-particle beams, the SCPM may provide more charged-particle beams onto the structures for obtaining these multiple “sub-pictures,” resulting in more charged particles exiting from the structures. Accordingly, the detector may receive more exiting charged particles simultaneously and generate inspection images of the structures of the wafer with higher efficiency and faster speed.
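The stitching of multi-beam “sub-pictures” into one inspection image can be pictured with a small Python sketch. It assumes, purely for illustration, that the sub-images are equally sized, perfectly aligned, and non-overlapping tiles arranged on a regular grid; a real tool would also have to handle overlap, registration, and distortion. The function name `stitch` is hypothetical.

```python
def stitch(tiles):
    """Join a grid of sub-images into one image.

    tiles[r][c] is the sub-image (a list of pixel rows) captured by the beam
    covering grid cell (r, c); all sub-images are assumed to be the same size.
    """
    stitched = []
    for tile_row in tiles:
        tile_height = len(tile_row[0])
        for y in range(tile_height):
            row = []
            for tile in tile_row:
                row.extend(tile[y])  # append this tile's pixels for image row y
            stitched.append(row)
    return stitched

# Four 2x2 sub-images, one per beam, labeled by beam number
tiles = [
    [[[1, 1], [1, 1]], [[2, 2], [2, 2]]],
    [[[3, 3], [3, 3]], [[4, 4], [4, 4]]],
]
image = stitch(tiles)
for row in image:
    print(row)
```

The four sub-images end up as the four quadrants of a single 4x4 image.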

[0034] Generating and processing these images to determine whether any defects exist (sometimes as small as the nanometer scale) are computationally intensive. And as the physical sizes of IC components continue to shrink, accuracy and yield in defect detection become more important. To inspect a single wafer, it is not uncommon for an inspection system to generate and process a substantial number of images. For example, if each image taken corresponds to a 6 µm × 6 µm portion of a wafer, for a 200 mm wafer, it would take over 872 million images to image the entire wafer. If these images are not processed and evaluated efficiently, not surprisingly, yield will be dramatically impacted, thereby affecting the wafer throughput.
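The figure of over 872 million images can be reproduced with a back-of-the-envelope calculation, reading the field of view as 6 µm on a side and assuming the images tile the full circular wafer area with no overlap and no edge loss. The function name below is hypothetical.

```python
import math

def images_to_cover_wafer(wafer_diameter_mm, field_of_view_um):
    """Wafer area divided by the area of one square image."""
    radius_um = wafer_diameter_mm * 1000 / 2
    wafer_area_um2 = math.pi * radius_um ** 2
    return wafer_area_um2 / field_of_view_um ** 2

count = images_to_cover_wafer(wafer_diameter_mm=200, field_of_view_um=6)
print(f"{count / 1e6:.1f} million images")  # prints "872.7 million images"
```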

[0035] As used herein, the term “charged-particle beam inspection tool” may be understood to include an SCPM or an SEM as described above. The term “charged-particle beam tool” may be understood to include similar tools used in different environments, such as used in connection with a scanner (e.g., an EUV or DUV scanner).

[0036] As the physical sizes of IC components continue to shrink, accuracy and yield in defect detection become more important. Metrology tools can be used to determine whether the ICs are correctly manufactured by identifying a number of defects on each wafer, including at different levels of detail, such as a pattern level, an image (field of view) level, a die level, a care area level, or a wafer level.

[0037] Embodiments of the present disclosure can provide a method for tuning image processing algorithms using specifically constructed reference images. An encoder-decoder model may be trained with multiple target images by special loss functions. The trained decoder of the model may be used to construct new images at reference values. The trained encoder of the model may be used to extract metrology values from the constructed new images and calculate reference values of the new images. The constructed new images whose reference values are within or outside of a target range may be selected as reference images for tuning parameters of the image processing algorithm. For example, the reference images may be constructed with specific values for certain metrology parameters, such as overlay (e.g., placement of an object in a design between two patterning steps) and critical dimension (e.g., the size of an object in the design). By using reference images with known metrology values, other image processing algorithms may be better tuned or calibrated.

[0038] Embodiments of the present disclosure can provide a method for performing focus and dose metrology. A machine learning model may be trained using images that have substantially a same wafer pattern as a charged-particle beam (CPB) image to be inspected. The training images may include measured CPB images with known focus and dose values (e.g., labels of the images). A loss function between a measured focus and dose value and the labels may be minimized to train the model. After training, a new CPB image may be provided to the encoder which can quickly determine the focus and dose for the CPB image. Using this method may reduce the number of CPB images needed to measure the wafer focus and dose, and also eliminate the pattern placement and size calculation and therefore reduce the time needed to process a wafer.
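The training idea in this paragraph, minimizing a loss between predicted focus and dose values and the known labels, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the disclosed model: a linear regressor stands in for the neural encoder, each CPB image is assumed to have already been reduced to a two-element feature vector, the loss is assumed to be squared error, and all numbers are synthetic, normalized values invented for the example.

```python
def predict(weights, bias, features):
    """Linear model: one output per row of weights (here: focus, dose)."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]

def train(samples, labels, lr=0.1, epochs=500):
    """Stochastic gradient descent on the squared error between predictions
    and the labeled (focus, dose) values."""
    n_out, n_in = len(labels[0]), len(samples[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = predict(weights, bias, x)
            for k in range(n_out):
                err = pred[k] - y[k]  # gradient of 0.5 * err**2 w.r.t. pred[k]
                for j in range(n_in):
                    weights[k][j] -= lr * err * x[j]
                bias[k] -= lr * err
    return weights, bias

# Synthetic training set: hypothetical image features with known labels,
# generated from focus = 2*x0 - x1 and dose = 0.5*x0 + x1 + 1.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
labels = [[2 * a - b, 0.5 * a + b + 1.0] for a, b in samples]

weights, bias = train(samples, labels)
focus, dose = predict(weights, bias, [0.2, 0.8])  # features of a "new image"
print(round(focus, 3), round(dose, 3))
```

Once trained, inference is a single forward pass per image, which is the property the paragraph relies on when it says the focus and dose can be determined quickly without separate pattern placement and size calculations.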

[0039] Fig. 1 illustrates an exemplary charged-particle beam (CPB) system 100 consistent with embodiments of the present disclosure. CPB system 100 may be used for imaging. For example, CPB system 100 may use an electron beam for imaging. As shown in Fig. 1, CPB system 100 includes a main chamber 101, a load/lock chamber 102, a beam tool 104, and an equipment front end module (EFEM) 106. Beam tool 104 is located within main chamber 101. EFEM 106 includes a first loading port 106a and a second loading port 106b. EFEM 106 may include additional loading port(s). First loading port 106a and second loading port 106b receive wafer front opening unified pods (FOUPs) that contain wafers (e.g., semiconductor wafers or wafers made of other material(s)) or samples to be inspected (the terms “wafers” and “samples” may be used interchangeably). A “lot” is a plurality of wafers that may be loaded for processing as a batch.

[0040] One or more robotic arms (not shown) in EFEM 106 may transport the wafers to load/lock chamber 102. Load/lock chamber 102 is connected to a load/lock vacuum pump system (not shown) which removes gas molecules in load/lock chamber 102 to reach a first pressure below the atmospheric pressure. After reaching the first pressure, one or more robotic arms (not shown) may transport the wafer from load/lock chamber 102 to main chamber 101. Main chamber 101 is connected to a main chamber vacuum pump system (not shown) which removes gas molecules in main chamber 101 to reach a second pressure below the first pressure. After reaching the second pressure, the wafer is subject to inspection by beam tool 104. Beam tool 104 may be a single-beam system or a multi-beam system.

[0041] A controller 109 is electronically connected to beam tool 104. Controller 109 may be a computer that may execute various controls of CPB system 100. While controller 109 is shown in Fig. 1 as being outside of the structure that includes main chamber 101, load/lock chamber 102, and EFEM 106, it is appreciated that controller 109 may be a part of the structure.

[0042] In some embodiments, controller 109 may include one or more processors (not shown). A processor may be a generic or specific electronic device capable of manipulating or processing information. For example, the processor may include any combination of any number of a central processing unit (or “CPU”), a graphics processing unit (or “GPU”), an optical processor, a programmable logic controller, a microcontroller, a microprocessor, a digital signal processor, an intellectual property (IP) core, a Programmable Logic Array (PLA), a Programmable Array Logic (PAL), a Generic Array Logic (GAL), a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), a System On Chip (SoC), an Application-Specific Integrated Circuit (ASIC), and any type of circuit capable of data processing. The processor may also be a virtual processor that includes one or more processors distributed across multiple machines or devices coupled via a network.

[0043] In some embodiments, controller 109 may further include one or more memories (not shown). A memory may be a generic or specific electronic device capable of storing codes and data accessible by the processor (e.g., via a bus). For example, the memory may include any combination of any number of a random-access memory (RAM), a read-only memory (ROM), an optical disc, a magnetic disk, a hard drive, a solid-state drive, a flash drive, a secure digital (SD) card, a memory stick, a compact flash (CF) card, or any type of storage device. The codes may include an operating system (OS) and one or more application programs (or “apps”) for specific tasks. The memory may also be a virtual memory that includes one or more memories distributed across multiple machines or devices coupled via a network.

[0044] Fig. 2 illustrates an example imaging system 200 consistent with embodiments of the present disclosure. Beam tool 104 of Fig. 2 may be configured for use in CPB system 100. Beam tool 104 may be a single beam apparatus or a multi-beam apparatus. As shown in Fig. 2, beam tool 104 includes a motorized sample stage 201, and a wafer holder 202 supported by motorized sample stage 201 to hold a wafer 203 to be inspected. Beam tool 104 further includes an objective lens assembly 204, a charged-particle detector 206 (which includes charged-particle sensor surfaces 206a and 206b), an objective aperture 208, a condenser lens 210, a beam limit aperture 212, a gun aperture 214, an anode 216, and a cathode 218. Objective lens assembly 204, in some embodiments, may include a modified swing objective retarding immersion lens (SORIL), which includes a pole piece 204a, a control electrode 204b, a deflector 204c, and an exciting coil 204d. Beam tool 104 may additionally include an Energy Dispersive X-ray Spectrometer (EDS) detector (not shown) to characterize the materials on wafer 203.

[0045] A primary charged-particle beam 220 (or simply “primary beam 220”), such as an electron beam, is emitted from cathode 218 by applying an acceleration voltage between anode 216 and cathode 218. Primary beam 220 passes through gun aperture 214 and beam limit aperture 212, both of which may determine the size of the charged-particle beam entering condenser lens 210, which resides below beam limit aperture 212. Condenser lens 210 focuses primary beam 220 before the beam enters objective aperture 208 to set the size of the charged-particle beam before entering objective lens assembly 204. Deflector 204c deflects primary beam 220 to facilitate beam scanning on the wafer. For example, in a scanning process, deflector 204c may be controlled to deflect primary beam 220 sequentially onto different locations of the top surface of wafer 203 at different time points, to provide data for image reconstruction for different parts of wafer 203. Moreover, deflector 204c may also be controlled to deflect primary beam 220 onto different sides of wafer 203 at a particular location, at different time points, to provide data for stereo image reconstruction of the wafer structure at that location. Further, in some embodiments, anode 216 and cathode 218 may generate multiple primary beams 220, and beam tool 104 may include a plurality of deflectors 204c to project the multiple primary beams 220 to different parts/sides of the wafer at the same time, to provide data for image reconstruction for different parts of wafer 203.

[0046] Exciting coil 204d and pole piece 204a generate a magnetic field that begins at one end of pole piece 204a and terminates at the other end of pole piece 204a. A part of wafer 203 being scanned by primary beam 220 may be immersed in the magnetic field and may be electrically charged, which, in turn, creates an electric field. The electric field reduces the energy of impinging primary beam 220 near the surface of wafer 203 before it collides with wafer 203. Control electrode 204b, being electrically isolated from pole piece 204a, controls an electric field on wafer 203 to prevent micro-arcing of wafer 203 and to ensure proper beam focus.

[0047] A secondary charged-particle beam 222 (or “secondary beam 222”), such as secondary electron beams, may be emitted from the part of wafer 203 upon receiving primary beam 220. Secondary beam 222 may form a beam spot on sensor surfaces 206a and 206b of charged-particle detector 206. Charged-particle detector 206 may generate a signal (e.g., a voltage, a current, or the like) that represents an intensity of the beam spot and provide the signal to an image processing system 250. The intensity of secondary beam 222, and the resultant beam spot, may vary according to the external or internal structure of wafer 203. Moreover, as discussed above, primary beam 220 may be projected onto different locations of the top surface of the wafer or different sides of the wafer at a particular location, to generate secondary beams 222 (and the resultant beam spot) of different intensities. Therefore, by mapping the intensities of the beam spots with the locations of wafer 203, the processing system may reconstruct an image that reflects the internal or surface structures of wafer 203.
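Mapping beam-spot intensities to wafer locations can be illustrated with a small Python sketch. It assumes, for illustration only, that the detector delivers exactly one intensity sample per beam spot and that the beam follows the zigzag (serpentine) scan mentioned earlier in the description; the function names are hypothetical and real acquisition involves timing, dwell, and calibration details not modeled here.

```python
def zigzag_positions(rows, cols):
    """Beam-spot (row, col) order for a zigzag raster scan: even rows are
    scanned left to right, odd rows right to left."""
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in columns:
            yield r, c

def reconstruct(signal, rows, cols):
    """Place each detector sample at the location the beam was scanning
    when that sample was recorded."""
    if len(signal) != rows * cols:
        raise ValueError("expected exactly one sample per beam spot")
    image = [[0.0] * cols for _ in range(rows)]
    for value, (r, c) in zip(signal, zigzag_positions(rows, cols)):
        image[r][c] = value
    return image

# Detector samples in acquisition order (arbitrary intensity units)
signal = [0.1, 0.2, 0.3, 0.6, 0.5, 0.4]
image = reconstruct(signal, rows=2, cols=3)
print(image)  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

The second row arrives in reverse order because of the zigzag path, and the mapping restores it to its spatial order.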

[0048] Imaging system 200 may be used for inspecting a wafer 203 on motorized sample stage 201 and includes beam tool 104, as discussed above. Imaging system 200 may also include an image processing system 250 that includes an image acquirer 260, storage 270, and controller 109. Image acquirer 260 may include one or more processors. For example, image acquirer 260 may include a computer, server, mainframe host, terminals, personal computer, any kind of mobile computing devices, and the like, or a combination thereof. Image acquirer 260 may connect with detector 206 of beam tool 104 through a medium such as an electrical conductor, optical fiber cable, portable storage media, IR, Bluetooth, internet, wireless network, wireless radio, or a combination thereof. Image acquirer 260 may receive a signal from detector 206 and may construct an image. Image acquirer 260 may thus acquire images of wafer 203. Image acquirer 260 may also perform various post-processing functions, such as generating contours, superimposing indicators on an acquired image, and the like. Image acquirer 260 may perform adjustments of brightness and contrast, or the like, of acquired images. Storage 270 may be a storage medium such as a hard disk, cloud storage, random access memory (RAM), other types of computer readable memory, and the like. Storage 270 may be coupled with image acquirer 260 and may be used for saving scanned raw image data as original images, post-processed images, or other images assisting the processing. Image acquirer 260 and storage 270 may be connected to controller 109. In some embodiments, image acquirer 260, storage 270, and controller 109 may be integrated together as one control unit.

[0049] In some embodiments, image acquirer 260 may acquire one or more images of a sample based on an imaging signal received from detector 206. An imaging signal may correspond to a scanning operation for conducting charged particle imaging. An acquired image may be a single image including a plurality of imaging areas. The single image may be stored in storage 270. The single image may be an original image that may be divided into a plurality of regions. Each of the regions may include one imaging area containing a feature of wafer 203.

[0050] Consistent with some embodiments of this disclosure, a computer-implemented method of training a machine learning model for defect detection may include obtaining training data that includes an inspection image of a fabricated integrated circuit (IC) and design layout data of the IC. The obtaining operation, as used herein, may refer to accepting, taking in, admitting, gaining, acquiring, retrieving, receiving, reading, accessing, collecting, or any operation for inputting data. An inspection image, as used herein, may refer to an image generated as a result of an inspection process performed by a charged-particle inspection apparatus (e.g., system 100 of Fig. 1 or system 200 of Fig. 2). For example, an inspection image may be an SCPM image generated by image processing system 250 in Fig. 2. A fabricated IC in this disclosure may refer to an IC manufactured on a sample (e.g., a wafer) in a semiconductor manufacturing process (e.g., a photolithography process). For example, the fabricated IC may be manufactured in a die of the sample. Design layout data of an IC, as used herein, may refer to data representing a designed layout of the IC. In some embodiments, the design layout data may include a design layout file in a GDS format (e.g., a GDS layout file). The design layout file may be visualized (also referred to as “rendered”) to be a 2D image (referred to as a “rendered image” herein) that presents the layout of the IC. The rendered image may include various geometric features (e.g., vertices, edges, corners, polygons, holes, bridges, vias, or the like) of the IC.

[0051] Fig. 3 illustrates a schematic diagram of an example multi-beam beam tool 104 (also referred to herein as apparatus 104) and an image processing system 390 that may be configured for use in CPB system 100 (Fig. 1), consistent with embodiments of the present disclosure.

[0052] Beam tool 104 comprises a charged-particle source 302, a gun aperture 304, a condenser lens 306, a primary charged-particle beam 310 emitted from charged-particle source 302, a source conversion unit 312, a plurality of beamlets 314, 316, and 318 of primary charged-particle beam 310, a primary projection optical system 320, a motorized wafer stage 380, a wafer holder 382, multiple secondary charged-particle beams 336, 338, and 340, a secondary optical system 342, and a charged-particle detection device 344. Primary projection optical system 320 can comprise a beam separator 322, a deflection scanning unit 326, and an objective lens 328. Charged-particle detection device 344 can comprise detection sub-regions 346, 348, and 350.

[0053] Charged-particle source 302, gun aperture 304, condenser lens 306, source conversion unit 312, beam separator 322, deflection scanning unit 326, and objective lens 328 can be aligned with a primary optical axis 360 of apparatus 104. Secondary optical system 342 and charged-particle detection device 344 can be aligned with a secondary optical axis 352 of apparatus 104.

[0054] Charged-particle source 302 can emit one or more charged particles, such as electrons, protons, ions, muons, or any other particle carrying electric charges. In some embodiments, charged-particle source 302 may be an electron source. For example, charged-particle source 302 may include a cathode, an extractor, or an anode, wherein primary electrons can be emitted from the cathode and extracted or accelerated to form primary charged-particle beam 310 (in this case, a primary electron beam) with a crossover (virtual or real) 308. For ease of explanation without causing ambiguity, electrons are used as examples in some of the descriptions herein. However, it should be noted that any charged particle may be used in any embodiment of this disclosure, not limited to electrons. Primary charged-particle beam 310 can be visualized as being emitted from crossover 308. Gun aperture 304 can block off peripheral charged particles of primary charged-particle beam 310 to reduce the Coulomb effect. The Coulomb effect may cause an increase in size of probe spots.

[0055] Source conversion unit 312 can comprise an array of image-forming elements and an array of beam-limit apertures. The array of image-forming elements can comprise an array of micro-deflectors or micro-lenses. The array of image-forming elements can form a plurality of parallel images (virtual or real) of crossover 308 with a plurality of beamlets 314, 316, and 318 of primary charged-particle beam 310. The array of beam-limit apertures can limit the plurality of beamlets 314, 316, and 318. While three beamlets 314, 316, and 318 are shown in Fig. 3, embodiments of the present disclosure are not so limited. For example, in some embodiments, the apparatus 104 may be configured to generate a first number of beamlets. In some embodiments, the first number of beamlets may be in a range from 1 to 1000. In some embodiments, the first number of beamlets may be in a range from 200-500. In some embodiments, an apparatus 104 may generate 400 beamlets.

[0056] Condenser lens 306 can focus primary charged-particle beam 310. The electric currents of beamlets 314, 316, and 318 downstream of source conversion unit 312 can be varied by adjusting the focusing power of condenser lens 306 or by changing the radial sizes of the corresponding beam-limit apertures within the array of beam-limit apertures. Objective lens 328 can focus beamlets 314, 316, and 318 onto a wafer 330 for imaging, and can form a plurality of probe spots 370, 372, and 374 on a surface of wafer 330.

[0057] Beam separator 322 can be a beam separator of Wien filter type generating an electrostatic dipole field and a magnetic dipole field. In some embodiments, if they are applied, the force exerted by the electrostatic dipole field on a charged particle (e.g., an electron) of beamlets 314, 316, and 318 can be substantially equal in magnitude and opposite in direction to the force exerted on the charged particle by the magnetic dipole field. Beamlets 314, 316, and 318 can, therefore, pass straight through beam separator 322 with zero deflection angle. However, the total dispersion of beamlets 314, 316, and 318 generated by beam separator 322 can also be non-zero. Beam separator 322 can separate secondary charged-particle beams 336, 338, and 340 from beamlets 314, 316, and 318 and direct secondary charged-particle beams 336, 338, and 340 towards secondary optical system 342.
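
The force balance described for beam separator 322 can be illustrated numerically. The following is a minimal sketch, assuming idealized uniform and mutually perpendicular fields; the function names and field strengths are hypothetical and are not taken from this disclosure.

```python
ELECTRON_CHARGE_C = -1.602176634e-19  # charge of an electron, in coulombs


def wien_pass_speed(e_field_v_per_m, b_field_t):
    """Speed at which the electric and magnetic forces cancel: v = E / B."""
    if b_field_t == 0:
        raise ValueError("magnetic field must be non-zero")
    return e_field_v_per_m / b_field_t


def net_transverse_force(charge_c, speed_m_per_s, e_field_v_per_m, b_field_t):
    """Net force q * (E - v * B) on a particle moving perpendicular to both fields.

    A negative speed denotes travel in the opposite direction along the axis.
    """
    return charge_c * (e_field_v_per_m - speed_m_per_s * b_field_t)


# Assumed, illustrative field strengths (not values from the disclosure).
E_FIELD = 1.0e5  # V/m
B_FIELD = 1.0e-2  # T

speed = wien_pass_speed(E_FIELD, B_FIELD)
forward_force = net_transverse_force(ELECTRON_CHARGE_C, speed, E_FIELD, B_FIELD)
reverse_force = net_transverse_force(ELECTRON_CHARGE_C, -speed, E_FIELD, B_FIELD)
```

In this idealized picture, a particle travelling at the balance speed experiences a net force of approximately zero, while a particle travelling in the opposite direction experiences forces that add rather than cancel, so it is deflected.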

[0058] Deflection scanning unit 326 can deflect beamlets 314, 316, and 318 to scan probe spots 370, 372, and 374 over a surface area of wafer 330. In response to the incidence of beamlets 314, 316, and 318 at probe spots 370, 372, and 374, secondary charged-particle beams 336, 338, and 340 may be emitted from wafer 330. Secondary charged-particle beams 336, 338, and 340 may comprise charged particles (e.g., electrons) with a distribution of energies. For example, secondary charged-particle beams 336, 338, and 340 may be secondary electron beams including secondary electrons (energies < 50 eV) and backscattered electrons (energies between 50 eV and landing energies of beamlets 314, 316, and 318). Secondary optical system 342 can focus secondary charged-particle beams 336, 338, and 340 onto detection sub-regions 346, 348, and 350 of charged-particle detection device 344. Detection sub-regions 346, 348, and 350 may be configured to detect corresponding secondary charged-particle beams 336, 338, and 340 and generate corresponding signals (e.g., voltage, current, or the like) used to reconstruct an inspection image of structures on or underneath the surface area of wafer 330.
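
The energy ranges given above for secondary electrons and backscattered electrons can be expressed as a simple classification rule. The sketch below is illustrative only; the function name and the treatment of the exact 50 eV boundary are assumptions.

```python
SE_UPPER_BOUND_EV = 50.0  # secondary electrons have energies below 50 eV


def classify_emitted_electron(energy_ev, landing_energy_ev):
    """Label an emitted electron as 'SE' or 'BSE' from its energy in eV.

    Assumption: an energy of exactly 50 eV is treated as backscattered.
    """
    if energy_ev < 0 or energy_ev > landing_energy_ev:
        raise ValueError("energy must lie between 0 and the landing energy")
    return "SE" if energy_ev < SE_UPPER_BOUND_EV else "BSE"


# Example with an assumed landing energy of 1000 eV.
labels = [classify_emitted_electron(e, 1000.0) for e in (5.0, 49.9, 50.0, 800.0)]
```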

[0059] The generated signals may represent intensities of secondary charged-particle beams 336, 338, and 340 and may be provided to image processing system 390 that is in communication with charged-particle detection device 344, primary projection optical system 320, and motorized wafer stage 380. The movement speed of motorized wafer stage 380 may be synchronized and coordinated with the beam deflections controlled by deflection scanning unit 326, such that the movement of the scan probe spots (e.g., scan probe spots 370, 372, and 374) may orderly cover regions of interest on the wafer 330. The parameters of such synchronization and coordination may be adjusted to adapt to different materials of wafer 330. For example, different materials of wafer 330 may have different resistance-capacitance characteristics that may cause different signal sensitivities to the movement of the scan probe spots.

[0060] The intensity of secondary charged-particle beams 336, 338, and 340 may vary according to the external or internal structure of wafer 330, and thus may indicate whether wafer 330 includes defects. Moreover, as discussed above, beamlets 314, 316, and 318 may be projected onto different locations of the top surface of wafer 330, or different sides of local structures of wafer 330, to generate secondary charged-particle beams 336, 338, and 340 that may have different intensities. Therefore, by mapping the intensity of secondary charged-particle beams 336, 338, and 340 with the areas of wafer 330, image processing system 390 may reconstruct an image that reflects the characteristics of internal or external structures of wafer 330.
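
The mapping of detected intensities to wafer locations can be sketched as follows. This is a simplified illustration, assuming that each detector sample has already been associated with an integer pixel coordinate; the function name and sample format are hypothetical.

```python
def reconstruct_image(samples, width, height):
    """Build a 2D intensity image from (x, y, intensity) samples.

    Samples that land on the same pixel are averaged; pixels that were
    never sampled remain 0.0.
    """
    sums = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for x, y, intensity in samples:
        sums[y][x] += intensity
        counts[y][x] += 1
    return [
        [sums[y][x] / counts[y][x] if counts[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


# Three samples on a 2 x 1 image; pixel (1, 0) is sampled twice.
image = reconstruct_image([(0, 0, 1.0), (1, 0, 3.0), (1, 0, 5.0)], width=2, height=1)
```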

[0061] In some embodiments, image processing system 390 may include an image acquirer 392, a storage 394, and a controller 396. Image acquirer 392 may comprise one or more processors. For example, image acquirer 392 may comprise a computer, server, mainframe host, terminals, personal computer, any kind of mobile computing devices, or the like, or a combination thereof. Image acquirer 392 may be communicatively coupled to charged-particle detection device 344 of beam tool 104 through a medium such as an electric conductor, optical fiber cable, portable storage media, IR, Bluetooth, internet, wireless network, wireless radio, or a combination thereof. In some embodiments, image acquirer 392 may receive a signal from charged-particle detection device 344 and may construct an image. Image acquirer 392 may thus acquire inspection images of wafer 330. Image acquirer 392 may also perform various post-processing functions, such as generating contours, superimposing indicators on an acquired image, or the like. Image acquirer 392 may be configured to perform adjustments of brightness and contrast of acquired images. In some embodiments, storage 394 may be a storage medium such as a hard disk, flash drive, cloud storage, random access memory (RAM), other types of computer-readable memory, or the like. Storage 394 may be coupled with image acquirer 392 and may be used for saving scanned raw image data as original images, and post-processed images. Image acquirer 392 and storage 394 may be connected to controller 396. In some embodiments, image acquirer 392, storage 394, and controller 396 may be integrated together as one control unit.

[0062] In some embodiments, image acquirer 392 may acquire one or more inspection images of a wafer based on an imaging signal received from charged-particle detection device 344. An imaging signal may correspond to a scanning operation for conducting charged particle imaging. An acquired image may be a single image comprising a plurality of imaging areas. The single image may be stored in storage 394. The single image may be an original image that may be divided into a plurality of regions. Each of the regions may comprise one imaging area containing a feature of wafer 330. The acquired images may comprise multiple images of a single imaging area of wafer 330 sampled multiple times over a time sequence. The multiple images may be stored in storage 394. In some embodiments, image processing system 390 may be configured to perform image processing steps with the multiple images of the same location of wafer 330.
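
Dividing a single acquired image into a plurality of regions can be sketched as below. The regular, non-overlapping grid is an assumption made for illustration; the disclosure does not specify how regions are laid out, and the function name is hypothetical.

```python
def split_into_regions(image, region_height, region_width):
    """Split a 2D image (list of rows) into non-overlapping rectangular regions.

    Regions are returned in row-major order. The image size must be an
    exact multiple of the region size.
    """
    if region_height <= 0 or region_width <= 0:
        raise ValueError("region dimensions must be positive")
    height = len(image)
    width = len(image[0]) if image else 0
    if height % region_height or width % region_width:
        raise ValueError("image size must be a multiple of the region size")
    regions = []
    for top in range(0, height, region_height):
        for left in range(0, width, region_width):
            rows = image[top:top + region_height]
            regions.append([row[left:left + region_width] for row in rows])
    return regions


# A 4 x 4 test image whose pixel values equal their row-major index.
test_image = [[r * 4 + c for c in range(4)] for r in range(4)]
regions = split_into_regions(test_image, 2, 2)
```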

[0063] In some embodiments, image processing system 390 may include measurement circuits (e.g., analog-to-digital converters) to obtain a distribution of the detected secondary charged particles (e.g., secondary electrons). The charged-particle distribution data collected during a detection time window, in combination with corresponding scan path data of beamlets 314, 316, and 318 incident on the wafer surface, can be used to reconstruct images of the wafer structures under inspection. The reconstructed images can be used to reveal various features of the internal or external structures of wafer 330, and thereby can be used to reveal any defects that may exist in the wafer.

[0064] In some embodiments, the charged particles may be electrons. When electrons of primary charged-particle beam 310 are projected onto a surface of wafer 330 (e.g., probe spots 370, 372, and 374), the electrons of primary charged-particle beam 310 may penetrate the surface of wafer 330 for a certain depth, interacting with particles of wafer 330. Some electrons of primary charged-particle beam 310 may elastically interact with (e.g., in the form of elastic scattering or collision) the materials of wafer 330 and may be reflected or recoiled out of the surface of wafer 330. An elastic interaction conserves the total kinetic energies of the bodies (e.g., electrons of primary charged-particle beam 310) of the interaction, in which the kinetic energy of the interacting bodies does not convert to other forms of energy (e.g., heat, electromagnetic energy, or the like). Such reflected electrons generated from elastic interaction may be referred to as backscattered electrons (BSEs). Some electrons of primary charged-particle beam 310 may inelastically interact with (e.g., in the form of inelastic scattering or collision) the materials of wafer 330. An inelastic interaction does not conserve the total kinetic energies of the bodies of the interaction, in which some or all of the kinetic energy of the interacting bodies converts to other forms of energy. For example, through the inelastic interaction, the kinetic energy of some electrons of primary charged-particle beam 310 may cause electron excitation and transition of atoms of the materials. Such inelastic interaction may also generate electrons exiting the surface of wafer 330, which may be referred to as secondary electrons (SEs). Yield or emission rates of BSEs and SEs depend on, e.g., the material under inspection and the landing energy of the electrons of primary charged-particle beam 310 landing on the surface of the material, among others. The energy of the electrons of primary charged-particle beam 310 may be imparted in part by its acceleration voltage (e.g., the acceleration voltage between the anode and cathode of charged-particle source 302 in Fig. 3). The quantity of BSEs and SEs may be greater than, less than, or equal to the quantity of injected electrons of primary charged-particle beam 310.

[0065] Fig. 4 is a block diagram of an example server 400, consistent with embodiments of the disclosure. As shown in Fig. 4, server 400 can include processor 402. When processor 402 executes instructions described herein, server 400 can become a specialized machine. Processor 402 can be any type of circuitry capable of manipulating or processing information. For example, processor 402 can include any combination of any number of a central processing unit (“CPU”), a graphics processing unit (“GPU”), a neural processing unit (“NPU”), a microcontroller unit (“MCU”), an optical processor, a programmable logic controller, a microcontroller, a microprocessor, a digital signal processor, an intellectual property (IP) core, a Programmable Logic Array (PLA), a Programmable Array Logic (PAL), a Generic Array Logic (GAL), a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), a System On Chip (SoC), an Application-Specific Integrated Circuit (ASIC), or the like. In some embodiments, processor 402 can also be a set of processors grouped as a single logical component. For example, as shown in Fig. 4, processor 402 can include multiple processors, including processor 402a, processor 402b, and processor 402n. In some embodiments, server 400 can assist with training a neural network by itself or in combination with other servers.

[0066] Server 400 can also include memory 404 configured to store data (e.g., a set of instructions, computer codes, intermediate data, or the like). For example, as shown in Fig. 4, the stored data can include program instructions and data for processing. Processor 402 can access the program instructions and data for processing (e.g., via bus 410), and execute the program instructions to perform an operation or manipulation on the data for processing. Memory 404 can include a high-speed random-access storage device or a non-volatile storage device. In some embodiments, memory 404 can include any combination of any number of a random-access memory (RAM), a read-only memory (ROM), an optical disc, a magnetic disk, a hard drive, a solid-state drive, a flash drive, a secure digital (SD) card, a memory stick, a compact flash (CF) card, or the like. Memory 404 can also be a group of memories (not shown in Fig. 4) grouped as a single logical component.

[0067] Bus 410 can be a communication device that transfers data between components inside server 400, such as an internal bus (e.g., a CPU-memory bus), an external bus (e.g., a universal serial bus port, a peripheral component interconnect express port), or the like.

[0068] For ease of explanation without causing ambiguity, processor 402 and other data processing circuits are collectively referred to as a “data processing circuit” in this disclosure. The data processing circuit can be implemented entirely as hardware, or as a combination of software, hardware, or firmware. In addition, the data processing circuit can be a single independent module or can be combined entirely or partially into any other component of server 400.

[0069] Server 400 can further include network interface 406 to provide wired or wireless communication with a network (e.g., the Internet, an intranet, a local area network, a mobile communications network, or the like). In some embodiments, network interface 406 can include any combination of any number of a network interface controller (NIC), a radio frequency (RF) module, a transponder, a transceiver, a modem, a router, a gateway, a wired network adapter, a wireless network adapter, a Bluetooth adapter, an infrared adapter, a near-field communication (“NFC”) adapter, a cellular network chip, or the like.

[0070] In some embodiments, optionally, server 400 can further include peripheral interface 408 to provide a connection to one or more peripheral devices. As shown in Fig. 4, the peripheral device can include, but is not limited to, a cursor control device (e.g., a mouse, a touchpad, or a touchscreen), a keyboard, a display (e.g., a cathode-ray tube display, a liquid crystal display, or a light-emitting diode display), a video input device (e.g., a camera or an input interface coupled to a video archive), or the like.

[0071] Consistent with embodiments of this disclosure, the computer-implemented method of using a generative artificial intelligence model may also include training the model using obtained training data. In some embodiments, the model may be trained by a computer hardware system, such as server 400 or server 400 in combination with other servers.

[0072] For example, in some embodiments, a machine learning system may be operated in association with, e.g., controller 109, image processing system 250, image acquirer 260, storage 270, image processing system 390, image acquirer 392, or storage 394 of FIGs. 1-3, and server 400 of FIG. 4. For example, as described further below, a generative model may be configured for generating an image from a design clip that resembles a corresponding location on a wafer in a CPB image. This may be performed by 1) training the generative model with design clips and the associated actual CPB images from those locations on the wafer; and 2) using the model in inference mode to feed the model design clips in locations for which simulated CPB images are desired. Such simulated images can be used as reference images in, e.g., die-to-database inspection. While the same hardware and software can be used to perform both the training and the inferencing, it is appreciated that one or more servers (e.g., server 400) can be involved with training the model and separate hardware and software may be used at the inferencing stage, such as controller 109, image processing system 250, image acquirer 260, storage 270, image processing system 390, image acquirer 392, or storage 394.

[0073] A generative model can be generally defined as a model that is probabilistic in nature. In other words, a “generative” model is not one that performs forward simulation or rule-based approaches and, as such, it may not be necessary to model the physics of the processes involved in generating an actual image or output (for which a simulated image or output is being generated). Instead, the generative model can be learned (in that its parameters can be learned) based on a suitable training set of data. Such generative models may have a number of advantages for the embodiments described herein. In addition, the generative model may be configured to have a deep learning architecture in that the generative model may include multiple layers, which may perform a number of algorithms or transformations. The number of layers included in the generative model may depend on the particular use case. For practical purposes, a suitable range of layers is from two layers to a few tens of layers.

[0074] Deep learning is a type of machine learning. The machine learning described herein may be further performed as described in “Introduction to Statistical Machine Learning,” by Sugiyama, Morgan Kaufmann, 2016, 534 pages; “Discriminative, Generative, and Imitative Learning,” by Jebara, MIT Thesis, 2002, 212 pages; and “Principles of Data Mining (Adaptive Computation and Machine Learning)” by Hand et al., MIT Press, 2001, 578 pages; which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references.

[0075] In some embodiments, a machine learning system may comprise a neural network. For example, a model may be a deep neural network with a set of weights that model the world according to the data that it has been fed to train it. Neural networks can be generally defined as a computational approach which is based on a relatively large collection of neural units loosely modeling the way a biological brain solves problems with relatively large clusters of biological neurons connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units. These systems are self-learning and trained rather than explicitly programmed and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.

[0076] Neural networks typically consist of multiple layers, and the signal path traverses from front to back. The goal of the neural network is to solve problems in the same way that the human brain would, although several neural networks are much more abstract. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections. The neural network may have any suitable architecture or configuration known in the art.

[0077] In a further embodiment, a model may comprise a convolutional and deconvolution neural network. For example, the embodiments described herein can take advantage of learning concepts such as a convolution and deconvolution neural network to solve the normally intractable representation conversion problem (e.g., rendering). The model may have any convolution and deconvolution neural network configuration or architecture known in the art.

[0078] A neural network, as used herein, may refer to a computing model for analyzing underlying relationships in a set of input data by way of mimicking human brains. Similar to a biological neural network, the neural network may include a set of connected units or nodes (referred to as “neurons”), structured as different layers, where each connection (also referred to as an “edge”) may obtain and send a signal between neurons of neighboring layers in a way similar to a synapse in a biological brain. The signal may be any type of data (e.g., a real number). Each neuron may obtain one or more signals as an input and output another signal by applying a non-linear function to the inputted signals. Neurons and edges may typically be weighted by corresponding weights to represent the knowledge the neural network has acquired. During a training process (similar to a learning process of a biological brain), the weights may be adjusted (e.g., by increasing or decreasing their values) to change the strengths of the signals between the neurons to improve the performance accuracy of the neural network. Neurons may apply a thresholding function (referred to as an “activation function”) to the output values of the non-linear function such that a signal is outputted only when an aggregated value (e.g., a weighted sum) of the output values of the non-linear function exceeds a threshold determined by the thresholding function. Different layers of neurons may transform their input signals in different manners (e.g., by applying different non-linear functions or activation functions). The output of the last layer (referred to as an “output layer”) may output the analysis result of the neural network, such as, for example, a categorization of the set of input data (e.g., as in image recognition cases), a numerical result, or any type of output data for obtaining an analytical result from the input data.
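
The behavior of a single neuron can be illustrated with a short sketch. This is one possible reading of the description above, assuming a weighted sum as the aggregated value, a hyperbolic tangent as the non-linear function, and a fixed threshold; the function name and default values are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias=0.0, threshold=0.0):
    """Return the output signal of one neuron.

    The aggregated value is the weighted sum of the inputs plus a bias.
    A signal (tanh of the aggregated value) is emitted only when the
    aggregated value exceeds the threshold; otherwise the output is 0.0.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    aggregated = sum(w * x for w, x in zip(weights, inputs)) + bias
    if aggregated <= threshold:
        return 0.0
    return math.tanh(aggregated)


firing = neuron_output([1.0, 2.0], [0.5, 0.25])  # aggregated value is 1.0
silent = neuron_output([1.0], [-1.0])  # aggregated value is -1.0
```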

[0079] Training of the neural network, as used herein, may refer to a process of improving the accuracy of the output of the neural network. Typically, the training may be categorized into three types: supervised training, unsupervised training, and reinforcement training. In the supervised training, a set of target output data (also referred to as “labels” or “ground truth”) may be generated based on a set of input data using a method other than the neural network. The neural network may then be fed with the set of input data to generate a set of output data that is typically different from the target output data. Based on the difference between the output data and the target output data, the weights of the neural network may be adjusted in accordance with a rule. If such adjustments are successful, the neural network may generate another set of output data more similar to the target output data in a next iteration using the same input data. If such adjustments are not successful, the weights of the neural network may be adjusted again. After a sufficient number of iterations, the training process may be terminated in accordance with one or more predetermined criteria (e.g., the difference between the final output data and the target output data is below a predetermined threshold, or the number of iterations reaches a predetermined threshold). The trained neural network may be applied to analyze other input data.
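
The supervised training iterations and termination criteria described above can be sketched for a deliberately small model. The example assumes a single-weight linear model, a mean-squared-error measure of the difference, and gradient descent as the weight-adjustment rule; none of these choices is specified by the disclosure, and all names are hypothetical.

```python
def train_single_weight(inputs, targets, learning_rate=0.05,
                        loss_threshold=1e-6, max_iterations=10000):
    """Fit output = weight * input to the target output data.

    Training stops when the loss falls below loss_threshold or when
    max_iterations is reached, whichever comes first.
    Returns (weight, final_loss, iterations_used).
    """
    weight = 0.0
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations:
        errors = [weight * x - t for x, t in zip(inputs, targets)]
        loss = sum(e * e for e in errors) / len(errors)
        if loss < loss_threshold:
            break
        # Gradient of the mean squared error with respect to the weight.
        gradient = 2.0 * sum(e * x for e, x in zip(errors, inputs)) / len(inputs)
        weight -= learning_rate * gradient
        iterations += 1
    return weight, loss, iterations


# Target output data generated by a known rule (target = 2 * input).
weight, final_loss, used = train_single_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```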

[0080] In the unsupervised training, the neural network is trained without any external gauge (e.g., labels) to identify patterns in the input data rather than generating labels for them. Typically, the neural network may analyze shared attributes (e.g., similarities and differences) and relationships among the elements of the input data in accordance with one or more predetermined rules or algorithms (e.g., principal component analysis, clustering, anomaly detection, or latent variable identification). The trained neural network may extrapolate the identified relationships to other input data.
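
Clustering, one of the unlabeled techniques named above, can be illustrated with a minimal one-dimensional k-means sketch. It is not a neural network and is offered only to show pattern identification without labels; the initialization scheme and function name are assumptions.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group scalar values into k clusters and return the sorted cluster centers.

    Centers start evenly spaced between the minimum and maximum value.
    A cluster that receives no values keeps its previous center.
    """
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    low, high = min(values), max(values)
    centers = [low + i * (high - low) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


centers = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
```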

[0081] In the reinforcement training, the neural network is trained without any external gauge (e.g., labels) in a trial-and-error manner to maximize benefits in decision making. The input data sets of the neural network may be different in the reinforcement training. For example, a reward value or a penalty value may be determined for the output of the neural network in accordance with one or more rules during training, and the weights of the neural network may be adjusted to maximize the reward values (or to minimize the penalty values). The trained neural network may apply its learned decision-making knowledge to other input data.

[0082] During the training of a neural network, a loss function (or referred to as a “cost function”) may be used to evaluate the output data. The loss function, as used herein, may map output data of a machine learning model (e.g., the neural network) onto a real number (referred to as a “loss” or a “cost”) that intuitively represents a loss or an error (e.g., representing a difference between the output data and target output data) associated with the output data. The training of the neural network may seek to maximize or minimize the loss function (e.g., by pushing the loss towards a local maximum or a local minimum in a loss curve). For example, one or more parameters of the neural network may be adjusted or updated purporting to maximize or minimize the loss function. After adjusting or updating the one or more parameters, the neural network may obtain new input data in a next iteration of its training. When the loss function is maximized or minimized, the training of the neural network may be terminated.
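
A loss function that maps output data onto a single real number can be sketched as follows. Mean squared error and mean absolute error are used here as common examples; the disclosure does not name a specific loss function, and the function names are hypothetical.

```python
def mean_squared_error(outputs, targets):
    """Average of the squared differences between outputs and targets."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equal in length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def mean_absolute_error(outputs, targets):
    """Average of the absolute differences between outputs and targets."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equal in length")
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(outputs)


mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
mae = mean_absolute_error([1.0, 2.0], [1.0, 4.0])
```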

[0083] By way of example, Fig. 5 is a schematic diagram illustrating an example neural network 500 implementing a variational autoencoder, consistent with embodiments of the present disclosure. As depicted in Fig. 5, neural network 500 may include an input layer 510, including input 510-1, . . ., input 510-b (b being an integer). For example, an input of neural network 500 may include any structured or unstructured data (e.g., an image). In some embodiments, neural network 500 may obtain a plurality of inputs simultaneously. For example, in Fig. 5, neural network 500 may obtain b inputs simultaneously. In some embodiments, input layer 510 may obtain b inputs in succession such that input layer 510 receives input 510-1 in a first cycle (e.g., in a first inference) and pushes data from input 510-1 to an encoder (e.g., encoder 520), then receives a second input in a second cycle (e.g., in a second inference) and pushes data from the second input to the encoder, and so on. Input layer 510 may obtain any number of inputs in the simultaneous manner, the successive manner, or any manner of grouping the inputs.

[0084] Encoder 520 may include one or more nodes, including node 520-1, node 520-2, . . ., node 520-c (c being an integer). A node (also referred to as a “machine perceptron” or a “neuron”) may model the functioning of a biological neuron. Each node may apply an activation function to received inputs (e.g., one or more of input 510-1, . . ., input 510-b). An activation function may include a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a rectified linear unit (ReLU) function (e.g., a ReLU6 function or a Leaky ReLU function), a hyperbolic tangent (“tanh”) function, or any non-linear function. The output of the activation function may be weighted by a weight associated with the node. A weight may include a positive value between 0 and 1, or any numerical value that may scale outputs of some nodes in a layer more or less than outputs of other nodes in the same layer. It is noted that while Fig. 5 shows one layer of nodes for the encoder 520, the encoder 520 may include multiple layers of nodes.
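
Several of the activation functions listed above can be written compactly. The following sketch uses common textbook definitions; the value of the Heaviside function at zero, the unit-width Gaussian, and the Leaky ReLU slope of 0.01 are conventions assumed for illustration.

```python
import math


def heaviside(x):
    """Step function; the value at x == 0 is taken to be 1.0 by convention."""
    return 1.0 if x >= 0 else 0.0


def gaussian(x):
    """Unit-width Gaussian exp(-x^2)."""
    return math.exp(-x * x)


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return min(max(0.0, x), 6.0)


def leaky_relu(x, negative_slope=0.01):
    """ReLU with a small slope for negative inputs."""
    return x if x > 0 else negative_slope * x


def tanh(x):
    """Hyperbolic tangent."""
    return math.tanh(x)
```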

[0085] As further depicted in Fig. 5, neural network 500 includes a latent space 530. The latent space 530 may include multiple distinct sets of parameters generated by the encoder 520, including latent space parameter 530-1, . . ., latent space parameter 530-d (d being an integer). It is noted that the latent space 530 may include any number of parameter sets derived from the input layer 510 and generated by the encoder 520. The encoder 520 may non-linearly decompose the inputs 510-1 to 510-b to generate the latent space parameters 530-1 to 530-d.

[0086] As further depicted in Fig. 5, neural network 500 may include a decoder 540, including one or more nodes, including node 540-1, node 540-2, . . ., node 540-e (e being an integer). Each node may apply an activation function to received inputs (e.g., one or more of latent space parameter 530-1, . . ., latent space parameter 530-d). Similar to the encoder 520, the activation function used in each node of the decoder 540 may include a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a rectified linear unit (ReLU) function (e.g., a ReLU6 function or a Leaky ReLU function), a hyperbolic tangent (“tanh”) function, or any non-linear function. The output of the activation function may be weighted by a weight associated with the node. A weight may include a positive value between 0 and 1, or any numerical value that may scale outputs of some nodes in a layer more or less than outputs of other nodes in the same layer. It is noted that while Fig. 5 shows one layer of nodes for the decoder 540, the decoder 540 may include multiple layers of nodes. In some embodiments, the number of nodes in the decoder 540 may not match the number of nodes in the encoder 520 (e.g., the integers c and e may be different). In other embodiments, the number of nodes in the decoder 540 may match the number of nodes in the encoder 520 (e.g., the integers c and e may be the same).

[0087] As further depicted in Fig. 5, neural network 500 may include an output layer 550 that finalizes outputs, including output 550-1, output 550-2, . . ., output 550-f (f being an integer). In some embodiments, the number of outputs 550 may not match the number of inputs 510 (e.g., the integers b and f may be different). In other embodiments, the number of outputs 550 may match the number of inputs 510 (e.g., the integers b and f may be the same).

[0088] Although the nodes of the neural network 500 are depicted in Fig. 5 as being connected to each node of its previous layer and next layer (referred to as “fully connected”), the layers of neural network 500 may use any connection scheme. For example, one or more layers (e.g., input layer 510, encoder 520, latent space 530, decoder 540, or output layer 550) of neural network 500 may be connected using a convolutional scheme, a sparsely connected scheme, or any connection scheme that uses fewer connections between one layer and a previous layer than the fully connected scheme as depicted in Fig. 5.

[0089] Moreover, although the inputs and outputs of the layers of neural network 500 are depicted as propagating in a forward direction (e.g., being fed from input layer 510 to output layer 550, referred to as a “feedforward network”) in Fig. 5, neural network 500 may additionally or alternatively use backpropagation (e.g., feeding data from output layer 550 towards input layer 510) for other purposes. For example, the backpropagation may be implemented by using long short-term memory nodes (LSTM). Accordingly, although neural network 500 is depicted similar to a convolutional neural network (CNN), neural network 500 may include a recurrent neural network (RNN) or any other neural network.

[0090] To achieve accurate and precise metrology, the metrology algorithm parameters need to be properly tuned. For example, with overlay metrology, the parameters of the overlay algorithm may be tuned based on a set-get overlay slope = 1. To determine the set-get overlay slope, the set-overlay value is plotted on an x-axis, where the set-overlay value is a programmed overlay value (for example, a known overlay value when a wafer is printed), and the get-overlay value is plotted on a y-axis, where the get-overlay value is a measured overlay value determined by the overlay image processing algorithm or latent space parameter. A first-order polynomial may be fit to the coordinate pairs (each pair being the set-overlay value and the corresponding get-overlay value for a particular point) to obtain the set-get overlay line slope and an intercept for the line. Prior work has not considered the intercept when tuning the overlay algorithm parameters, which may lead to overlay accuracy bias and uncertainty.

[0091] Fig. 6 is a plot 600 showing programmed overlay values and measured overlay values, consistent with embodiments of the present disclosure. In the plot 600, “set-OV” 602 (along the x-axis) is a programmed overlay value. “Get-OV” 604 (along the y-axis) is the measured overlay value. Prior work has tuned the overlay algorithm parameters based on achieving a set-get overlay slope = 1, such as the slope of line 606 in plot 600. By fitting a first-order polynomial to the coordinate pairs (each pair being one point in the plot 600), the set-get overlay slope of the line 606 and intercept 608 may be obtained.

[0092] In the plot 600, five targets 610a-610e are printed on a wafer with a known spacing between the targets (e.g., the set-OV value is known for each of the targets 610a-610e). The targets 610a-610e are measured using a CPB tool (e.g., the get-OV values are obtained through measurements made by the CPB tool). The resulting set-OV values and get-OV values are plotted as shown in the plot 600. In the plot 600, the middle target 610c is intended to have an overlay value equal to zero (e.g., plotted at the origin of the plot 600). As shown in the plot 600, the target 610c is offset from the origin of the plot 600 by the intercept 608. It is still desirable to set the slope of the line 606 drawn through the targets 610a-610e to have a slope of 1 to determine a correct overlay value for an image. Determining the intercept 608 helps to provide accurate overlay results.
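By way of a non-limiting illustration, the first-order fit of get-OV against set-OV may be sketched as follows using an ordinary least-squares line. The function name and the five sample values are hypothetical and are chosen only to show a slope of 1 together with a non-zero intercept.

```python
def fit_set_get_line(set_ov, get_ov):
    """Least-squares first-order fit: get_ov ~ slope * set_ov + intercept."""
    n = len(set_ov)
    if n < 2 or n != len(get_ov):
        raise ValueError("need at least two matching (set, get) pairs")
    mean_x = sum(set_ov) / n
    mean_y = sum(get_ov) / n
    sxx = sum((x - mean_x) ** 2 for x in set_ov)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_ov, get_ov))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical programmed (set) and measured (get) overlay values in nm for
# five targets. Every measurement is shifted by the same 0.3 nm offset.
set_ov = [-4.0, -2.0, 0.0, 2.0, 4.0]
get_ov = [-3.7, -1.7, 0.3, 2.3, 4.3]
slope, intercept = fit_set_get_line(set_ov, get_ov)
```

In this example the fitted slope is 1 while the intercept is 0.3 nm, which is the situation in which tuning on the slope alone would leave an overlay bias.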

[0093] Fig. 7 is a schematic diagram of a variational autoencoder (VAE) 700 for use in connection with overlay metrology, consistent with embodiments of the present disclosure. The VAE 700 may be viewed as a non-linear decompose and reconstruct method, which is good at automating the process of representing raw data more efficiently. The VAE 700 includes an encoder 706 and a decoder 712. The encoder 706 helps to decompose input images with different overlay values efficiently into latent space parameters. The decoder 712 constructs new images based on the latent space parameters. The VAE 700 includes a probabilistic model that helps generate new content similar to, yet different from, the original content. The probability distribution is good at managing stochastics of feature characteristics due to wafer manufacturing process noise and CPB tool noise. In some embodiments, the VAE 700 may be implemented using the neural network 500.

[0094] A first input image 702 (at 0° rotation) and a second input image 704 (at 180° rotation relative to the first input image 702) are provided to the encoder 706. The encoder 706 generates a first set of latent space parameters 708 for the first image 702 and a second set of latent space parameters 710 for the second image 704. The latent space parameters 708, 710 may include an overlay (OV) parameter, which represents a placement shift of features between two layers of the wafer; a critical dimension (CD) parameter, which represents a feature size; and a non-linear (NL) parameter, which represents non-linear effects in the image, including, but not limited to CPB distortion, charging effect, and lithography pattern stochastic effect. It is noted that the latent space parameters 708, 710 may include additional parameters other than those specifically listed. In some embodiments, the latent space parameters 708, 710 may include eight or 16 parameters each.

[0095] The first set of latent space parameters 708 and the second set of latent space parameters 710 are provided to the decoder 712. The decoder 712 uses the latent space parameters to generate new images to attempt to recover the original input image based on the latent space parameters. The decoder 712 generates a first output image 714 using latent space parameters 708 (attempting to recover the first input image 702) and a second output image 716 using latent space parameters 710 (attempting to recover the second input image 704). The encoder 706 and the decoder 712 are trained by adjusting the weights of the encoder and decoder to get the output images 714, 716 close to the input images 702, 704.

[0096] The overlay parameter and the critical dimension parameter have a special relationship when viewed in the context of the input images 702, 704 being rotated by 180°. For example, the overlay value will reverse sign (e.g., positive to negative or negative to positive) when the input image is rotated by 180°. As another example, the critical dimension value will remain the same when the input image is rotated by 180° (e.g., the size of the feature as indicated by the critical dimension value will not change).
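The sign reversal and size invariance under a 180° rotation can be illustrated with a toy image, as in the following sketch. The image, the centroid-based shift measure, and the width measure are simplified stand-ins assumed for illustration; they are not the overlay or critical dimension measurements of the disclosure.

```python
def rotate_180(image):
    # A 180 degree rotation reverses both the row order and each row.
    return [row[::-1] for row in image[::-1]]


def x_centroid_offset(image):
    """Intensity-weighted x centroid relative to the image centre, in pixels."""
    width = len(image[0])
    total = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    return cx - (width - 1) / 2


def feature_width(image):
    """Largest number of non-zero pixels in any row (toy size measure)."""
    return max(sum(1 for v in row if v) for row in image)


# Toy image: a one-pixel-wide vertical line, one pixel right of centre.
image = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
rotated = rotate_180(image)

shift_0 = x_centroid_offset(image)
shift_180 = x_centroid_offset(rotated)
width_0 = feature_width(image)
width_180 = feature_width(rotated)
```

Here the shift changes sign between the two orientations while the width is unchanged, mirroring the relationship described for the overlay and critical dimension values.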

[0097] To train the encoder 706 and the decoder 712, various loss functions may be used. For example, an overlay loss may be calculated based on the formula:

OV_loss = mse(OV_latent@0° + OV_latent@180°)    Equation (1)

where mse( ) is the mean squared error function, OV_latent@0° is the latent space overlay parameter obtained at image rotation angle 0°, and OV_latent@180° is the latent space overlay parameter obtained at image rotation angle 180°. As noted above, because the sign of the overlay value at image rotation angle 0° is the opposite of the sign of the overlay value at image rotation angle 180°, adding these overlay values together should be zero. For example, if the overlay value at image rotation angle 0° is +2 nm, the overlay value at image rotation angle 180° should be -2 nm. The OV loss function aims to drive the sum of the overlay value at image rotation angle 0° and the overlay value at image rotation angle 180° toward zero. It is noted that while the mean squared error function is used to calculate the OV loss, other typical machine learning training loss functions may be used, such as mean absolute error. The choice of error function to use may vary without affecting the overall operation of the embodiments described herein.

[0098] A critical dimension loss may be calculated based on the formula:

CD_loss = mse(CD_latent@0° - CD_latent@180°)    Equation (2)

where CD_latent@0° is the latent space CD parameter obtained at image rotation angle 0°, and CD_latent@180° is the latent space CD parameter obtained at image rotation angle 180°. As noted above, because the CD value at image rotation angle 0° is the same as the CD value at image rotation angle 180°, subtracting these CD values from one another should be zero. For example, if the CD value at image rotation angle 0° is 10 nm, the CD value at image rotation angle 180° should also be 10 nm. The CD loss function aims to minimize the difference between the CD value at image rotation angle 0° and the CD value at image rotation angle 180°.
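As a minimal sketch, Equations (1) and (2) may be evaluated over a batch of image pairs as follows. It is assumed here that mse( ) of a single argument denotes the mean of the squared values of that argument over the batch; the function names and sample values are hypothetical.

```python
def mse(values):
    # Assumption: mean of the squared residuals over the batch.
    return sum(v * v for v in values) / len(values)


def ov_loss(ov_latent_0, ov_latent_180):
    # Equation (1): the two overlay latents should cancel when added.
    return mse([a + b for a, b in zip(ov_latent_0, ov_latent_180)])


def cd_loss(cd_latent_0, cd_latent_180):
    # Equation (2): the two CD latents should be equal.
    return mse([a - b for a, b in zip(cd_latent_0, cd_latent_180)])


# Hypothetical latent values in nm for a batch of two image pairs.
perfect_ov = ov_loss([2.0, -1.0], [-2.0, 1.0])
perfect_cd = cd_loss([10.0, 12.0], [10.0, 12.0])
imperfect_ov = ov_loss([2.0], [-1.5])
```

Both losses are zero when the rotation relationships hold exactly, and grow quadratically with the residual otherwise (a 0.5 nm residual gives a loss of 0.25 in the last example).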

[0099] The above examples use an image rotation angle of 180° to show the OV and CD relationship between a non-zero angle and the image at 0° rotation. The same concept can be expanded to other image rotation angles (such as 90° or 270°) as well as flipping the image along the x-axis and the y-axis.

[0100] Fig. 8 is a diagram showing an example accuracy self-reference (ASR) image dataset 800, consistent with embodiments of the present disclosure. A wafer 802 includes a plurality of dies, and one or more ASR targets 804 can be placed on one or more dies. An ASR target 804 may be created with different ASR subtargets 806. Each ASR subtarget 806 has a different known overlay value. For example, as shown in Fig. 8, the ASR target 804 in each die is divided into 81 subtargets 806, whose structures are designed with different known overlay values. Because all the subtargets 806 in one ASR target 804 are printed at the same time, any shift in the images (e.g., the intercept value) will be the same. It is noted that the overlay value for each subtarget 806, the distribution of the overlay values in the ASR target 804, and the total number of subtargets 806 in the ASR target 804 may vary without affecting the operation of the systems and methods described herein. For example, the overlay values may vary between -5.0 nm and +5.0 nm.

[0101] A single subtarget 806 (as shown in enlargement 808) may include a repeatable pattern. For example, the pattern may include a unicell 810 having a parallelogram with contact holes on the left side and the right side of the parallelogram. The parallelogram is on one layer and the contact holes are on a separate layer. The overlay shift along the x-axis may be determined by comparing the difference between the center of the parallelogram along the x-axis and a center point between the two contact holes along the x-axis. If the overlay value is zero, then the center of the parallelogram and the center point between the two contact holes would be the same (e.g., the two layers are aligned).

[0102] In some embodiments, the subtargets 806 may include an overlay shift along the y-axis (not shown in Fig. 8). For example, the overlay shift along the x-axis may be zero and the overlay shift along the y-axis may be 2 nm. It is noted that any overlay shift in the x-y plane is possible without altering the operation of the systems and methods described herein.

[0103] A mean position shift of many unicells 810 in the pattern may be obtained and fed back to the scanner to adjust for the offset during printing. The offset value for a single subtarget 806 may be based on an average shift of all unicells 810 in the subtarget 806. During training, random samples from the subtarget 806 may be taken by clipping different parts of the image of the subtarget 806, for example, in 128 x 128 pixel image clips. By randomly taking image clips from different parts of the image of the subtarget 806, the encoder 706 may be better trained to adjust the weights to handle any input image, because the input image may not always be centered such that one entire unicell 810 is visible.
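The random clipping described above may be sketched as follows. The image is represented as a list of pixel rows, and the function name, the image dimensions, and the fixed random seed are assumptions made for illustration.

```python
import random


def random_clip(image, clip_size=128, rng=random):
    """Return a clip_size x clip_size clip taken at a random position."""
    height, width = len(image), len(image[0])
    if height < clip_size or width < clip_size:
        raise ValueError("image is smaller than the requested clip size")
    # Choose the top-left corner so that the clip stays inside the image.
    top = rng.randrange(height - clip_size + 1)
    left = rng.randrange(width - clip_size + 1)
    return [row[left:left + clip_size] for row in image[top:top + clip_size]]


# Hypothetical 200 x 300 pixel subtarget image filled with zeros.
subtarget_image = [[0] * 300 for _ in range(200)]
clip = random_clip(subtarget_image, clip_size=128, rng=random.Random(0))
```

Because the corner position is drawn independently for each clip, a unicell may appear at any offset within a clip, which is the variation the encoder is intended to tolerate.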

[0104] When the encoder 706 is trained on the ASR subtargets 806, the latent space parameters may also include an intercept overlay value (e.g., intercept-OV_latent@0°, the intercept overlay value at 0° rotation). An intercept loss may be calculated based on the formula:

intercept_OV_loss = mse(OV_label(j) + intercept-OV_latent@0°(j))    Equation (3)

where OV_label is the predetermined overlay value for the target and j = 1..N, where N is the number of ASR subtargets in an ASR target.

[0105] The intercept may be calculated based on the formula:

intercept = mean(OV_latent@0°(j) - OV_label(j))    Equation (4)

where j = 1..N, and N is the number of ASR subtargets in an ASR target.
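As an illustrative sketch, the intercept of Equation (4) is the mean difference between the overlay latent at 0° rotation and the programmed label over the subtargets of one ASR target. The function name and the sample values are hypothetical.

```python
def intercept_from_subtargets(ov_latent_0, ov_label):
    """Equation (4): mean of (OV_latent@0 - OV_label) over the subtargets."""
    if len(ov_latent_0) != len(ov_label) or not ov_label:
        raise ValueError("need one latent value per labelled subtarget")
    diffs = [latent - label for latent, label in zip(ov_latent_0, ov_label)]
    return sum(diffs) / len(diffs)


# Hypothetical labels (nm) and latents that carry a common 0.2 nm shift.
ov_label = [-2.0, -1.0, 0.0, 1.0, 2.0]
ov_latent_0 = [-1.8, -0.8, 0.2, 1.2, 2.2]
intercept = intercept_from_subtargets(ov_latent_0, ov_label)
```

When the labels are symmetric about zero, as in this example, this mean difference coincides with the intercept of a least-squares line fit of the latents against the labels.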

[0106] The total overlay loss function per ASR target may be calculated based on the formula:

All_OV_loss = OV_loss + intercept_OV_loss    Equation (5)

[0107] The VAE 700 may be trained using the ASR target images. Per ASR target 804, the decoder 712 constructs new images at reference overlay values. Per ASR target 804, the encoder 706 extracts overlay values from the constructed images and calculates the slope of the set_get_OV line 606 and the intercept 608. Reference images are selected from the constructed images where the constructed image has a set_get_OV line slope = 1 ± 0.01 and an absolute value of the intercept < 0.05 nm. In some embodiments, the values 0.01 and 0.05 nm may be user-defined threshold values based on a specific use case.

[0108] Besides the above loss functions, a reconstruction loss and a Kullback-Leibler (KL) divergence loss may be calculated to ensure that the reconstructed image is similar to the original image. The reconstruction loss measures how close the output image of decoder 712 is to the original input image. The reconstruction loss measures the difference between the output image and the input image at the pixel level, and may be calculated using the mean square error between the output image and the original input image. The KL divergence loss measures how much the distribution of the latent space parameters deviates from a prior distribution, which may be assumed to be a standard normal distribution.

[0109] The reconstruction loss and Kullback-Leibler (KL) divergence loss may be calculated based on the formulas:

reconstruction_loss = (mse(image_reconstructed@0° - image_original@0°) / number_of_pixels) + (mse(image_reconstructed@180° - image_original@180°) / number_of_pixels)    Equation (6)

KL_divergence_loss = KL_divergence_loss@0° + KL_divergence_loss@180°    Equation (7)

where image_reconstructed@0° is the reconstructed image obtained at image rotation angle 0°, image_original@0° is the original image obtained at image rotation angle 0°, number_of_pixels is the total number of pixels per image, image_reconstructed@180° is the reconstructed image obtained at image rotation angle 180°, and image_original@180° is the original image obtained at image rotation angle 180°. The reconstruction error is used to ensure that the constructed image is similar to the input image. The KL divergence loss is used to capture the stochastic effect of noise from the metrology tool and the wafer manufacturing process.
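Equations (6) and (7) may be sketched as follows. The per-rotation KL term is not spelled out in the text; this sketch assumes the commonly used closed form for a diagonal Gaussian, parameterized by a mean and a log-variance per latent dimension, measured against a standard normal prior. The function names are hypothetical, and the division by number_of_pixels is applied exactly as Equation (6) is written.

```python
import math


def mse_images(a, b):
    """Pixel-level mean squared error between two equally sized images."""
    sq = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(sq) / len(sq)


def reconstruction_loss(recon_0, orig_0, recon_180, orig_180):
    # Equation (6), including the division by number_of_pixels as written.
    number_of_pixels = len(orig_0) * len(orig_0[0])
    return (mse_images(recon_0, orig_0) / number_of_pixels
            + mse_images(recon_180, orig_180) / number_of_pixels)


def kl_to_standard_normal(mu, log_var):
    # Assumed closed form: KL( N(mu, exp(log_var)) || N(0, 1) ), summed
    # over the latent dimensions.
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def kl_divergence_loss(mu_0, log_var_0, mu_180, log_var_180):
    # Equation (7): sum of the terms for the 0 and 180 degree images.
    return (kl_to_standard_normal(mu_0, log_var_0)
            + kl_to_standard_normal(mu_180, log_var_180))
```

With this form, the KL term is zero when every latent mean is 0 and every log-variance is 0, that is, when the latent distribution already matches the standard normal prior.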

[0110] A total loss function per ASR target may be calculated based on the formula:

Total_Loss = reconstruction_loss + (β x KL_divergence_loss) + (γ x All_OV_loss) + (α x CD_loss)    Equation (8)

where β, γ, and α are hyperparameters to weight the KL_divergence_loss, the All_OV_loss, and the CD_loss, respectively. As an example, β = 1e-6, γ = 50, and α = 5 may be used as the weights. In some embodiments, other hyperparameters may be set, such as learning rate = 1e-3, latent space parameter dimension = 8, and mini-batch size = 64 images. It is noted that depending on the particular use case, the hyperparameters may have different values. In some embodiments, multiple ASR targets’ images may be used to train the neural network.
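The weighted sum of Equation (8) may be sketched as follows, using the example weights stated above as defaults. The argument names beta, gamma, and alpha are assumed names for the weights of the KL_divergence_loss, the All_OV_loss, and the CD_loss, respectively, and the individual loss values in the example are hypothetical.

```python
def total_loss(reconstruction_loss, kl_divergence_loss, all_ov_loss, cd_loss,
               beta=1e-6, gamma=50.0, alpha=5.0):
    """Equation (8): weighted sum of the four loss terms."""
    return (reconstruction_loss
            + beta * kl_divergence_loss
            + gamma * all_ov_loss
            + alpha * cd_loss)


# Hypothetical loss values for one mini-batch.
example_total = total_loss(
    reconstruction_loss=0.1,
    kl_divergence_loss=1000.0,
    all_ov_loss=0.002,
    cd_loss=0.01,
)
```

With these example values the four contributions are 0.1, 0.001, 0.1, and 0.05, which shows how the small KL weight and the large overlay weight balance terms of very different magnitudes.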

[0111] The VAE quality is verified using test data to confirm that the encoder 706 has achieved an acceptable set-get-OV slope of approximately 1, within a margin of error of ±3%, for example. After the VAE quality is verified, the VAE decoder 712 may be used to construct new images.

[0112] Fig. 9 is a flowchart of an example method 900 for training a variational autoencoder (e.g., VAE 700), consistent with embodiments of the present disclosure. In some embodiments, the method 900 may be implemented by controller 109, image processing system 250, image acquirer 260, storage 270, image processing system 390, image acquirer 392, or storage 394 of Figs. 1-3, or server 400 of Fig. 4.

[0113] At step 902, a sample wafer is created with features having known programmed overlay values. For example, the sample wafer may include features with overlay values at 1.0 nm intervals. As another example, the wafer shown in Fig. 8 may be used. It is noted that other latent space parameters besides overlay may be used. Latent space parameters are extracted from the sample images and serve as measured parameters, which may include a measured overlay value (referred to as get_OV). The known programmed overlay values are referred to as set_OV.

[0114] At step 904, images are taken of the sample wafer. For example, CPB or SEM images may be taken of the wafer. Because the overlay values are known, the images may be labeled with the known overlay values.

[0115] At step 906, image clips are obtained. For example, random samples may be taken by clipping different parts of the image, e.g., in 128 x 128 pixel image clips. By randomly taking image clips from different parts of the image, the encoder 706 may be better trained to adjust the weights to handle any input image, because the input image may not always be centered such that one entire unicell 810 is visible.

[0116] At step 908, each image clip is passed through the encoder 706 to obtain the overlay latent space parameters.

[0117] At step 910, the slope of the set_get_OV line 606 and the intercept 608 are calculated.

[0118] At step 912, a total loss is calculated. For example, Equation (8) may be used to calculate the Total_Loss.

[0119] At step 914, the encoder and decoder weights may be iteratively updated until the total loss has converged and the set_get_OV slope is within a threshold, for example, 0.99-1.01 for multiple ASR targets.

[0120] After training (e.g., by the method 900), the trained decoder 712 may be used to construct new images at reference overlay (OV) values. All the measured ASR subtargets’ SEM images with a set_OV label = 0 nm are selected. Each image is regularly cropped into unit cell image clips of the same size as those used during the VAE model training. It is noted that randomly clipped images are not generated for reference image construction. The VAE encoder 706 is applied to obtain the latent space parameters for each unit cell image clip. The unit cell image clips are clipped from one subtarget image with a step size that is an integer multiple of the pattern pitch, so all the image clips are similar to each other. All the latent space parameters from all the unit cell image clips are averaged to obtain latent_avg(i), i = 1..latent_dimension, where latent_dimension is the dimension of the latent space parameters. The overlay parameter is set to a desired value (e.g., OV_latent = OV_ref(j), where j indexes the desired overlay values) and all other latent space parameters are kept the same as in latent_avg. Using these parameters, the decoder 712 constructs the new image. The decoder 712 constructs new images for each OV_ref(j), j = 1..N, where N is the number of elements in the OV_ref array. A total of N constructed images can be obtained from the decoder 712.

[0121] For example, per ASR target, all subtarget SEM images having OV_label = 0 nm are selected for constructing new images at reference OV_ref = [-10, .., -5, .., -2, -1, 0, 1, 2, .., 5, .., 10] nm. As the VAE model for the ASR target has a latent space dimension = 8, 8 latent space parameters per subtarget SEM image clip can be obtained. For example, if there are 64 unit cell image clips per subtarget SEM image, there are a total of 64 x 8 latent parameters, and the averaged latent parameter latent_avg(i), i = 1..8 may be calculated by averaging the latent space parameters over all 64 unit cell image clips. Because a first parameter in the latent space has been trained to represent overlay, by replacing latent_avg(1) with OV_ref(j) and inputting the latent_avg vector into the decoder 712, new images constructed by the decoder with the desired latent space parameter may be obtained. If there are 41 SEM images with OV_label = 0 per ASR target, 41 sets of reference images may be generated per ASR target. For multiple ASR targets, more sets of reference images may be generated. In another embodiment, latent_avg(i) may be calculated through all image clips and all subtargets per ASR target. For example, if there are 64 unit cell image clips per subtarget SEM image, and 41 subtarget SEM images per ASR target, latent_avg(i) may be calculated by averaging 64 x 8 x 41 latent space parameters.
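The averaging and overlay substitution described above may be sketched as follows. The function names, the three-dimensional toy latent vectors, and the use of index 0 for the overlay parameter (the first latent parameter) are assumptions made for illustration.

```python
def average_latents(latents):
    """Average each latent dimension over all unit cell image clips."""
    n = len(latents)
    dim = len(latents[0])
    return [sum(vec[i] for vec in latents) / n for i in range(dim)]


def reference_latents(latent_avg, ov_ref, ov_index=0):
    """Build one decoder input per reference overlay value.

    Only the overlay entry is replaced; every other entry keeps its
    averaged value.
    """
    vectors = []
    for ov in ov_ref:
        vec = list(latent_avg)
        vec[ov_index] = ov
        vectors.append(vec)
    return vectors


# Hypothetical latents from two clips with a latent dimension of 3.
clip_latents = [[0.1, 1.0, 2.0], [-0.1, 3.0, 2.0]]
latent_avg = average_latents(clip_latents)
decoder_inputs = reference_latents(latent_avg, ov_ref=[-1.0, 0.0, 1.0])
```

Each entry of `decoder_inputs` would then be passed to the trained decoder to construct one image at the corresponding reference overlay value.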

[0122] Fig. 10 shows example images generated by a trained decoder, consistent with embodiments of the present disclosure. The averaged latent space parameters 1002 and the desired overlay value (OV_ref(j)) 1004 are provided to the decoder 712, which generates a new image 1006. For each of the OV_ref = -10 nm .. +10 nm values, the decoder 712 generates images at each overlay value; for example, image 1008a at an overlay value = 0 nm and image 1008b at an overlay value = -5 nm (e.g., the parallelogram is shifted along the x-axis relative to the circle). It is noted that while Fig. 10 shows images 1008a and 1008b generated with rounded overlay values (e.g., 0 nm and -5 nm), the generated images may be at any desired overlay value (e.g., 2.5 nm, 3.1 nm, etc.).

[0123] Not all the constructed images qualify as OV reference images because some of the constructed images contain an overlay bias due to stochastic characteristics of CPB images. To evaluate the quality of the constructed images, the encoder is used to extract overlay values (encoder_OV) as the get_OV values of all constructed images. Per ASR target, the slope of the set_get_OV line and the intercept are calculated using OV_ref as the set_OV value. The constructed images of an ASR target that achieves the expected set-get-OV slope ≈ 1 and intercept ≈ 0 nm are selected as the reference images.

[0124] Based on the encoder overlay value, the set_get_OV line slope, and the intercept, the constructed images that meet the criteria of set_get_OV line slope = 1 ± 0.01 and an absolute value of the intercept < 0.05 nm can be selected as OV reference images. In some embodiments, the values of 0.01 for the slope variance and 0.05 nm for the intercept may be user-defined thresholds. In other embodiments, depending on the use case, the user may choose slope variance to be ±0.02 or the intercept to be 0.025 nm. In other embodiments, the thresholds for the slope variance and the intercept may have other values. The stringent criteria on the set_get_OV line slope and the intercept are to ensure that the OV reference images have negligible OV bias while still capturing CPB image stochastics. The OV reference images are then qualified to be used to fine-tune or check various OV algorithms’ parameters. For example, the other OV algorithms may include, but are not limited to, a contour-based algorithm (e.g., an algorithm that finds feature edges) or a region-based algorithm (e.g., template matching, in which an image is searched to look for a pattern in each unicell and the image is “slid” to find a cross-correlation maximum).

[0125] Other latent space parameters besides the OV_latent parameter may be modified, such as critical dimension (CD) or edge placement error (EPE). The decoder is then used to construct images with various feature CDs, feature placements, and noise levels. Those types of constructed images can be utilized to evaluate the robustness of various metrology algorithms.

[0126] Fig. 11 is a flowchart of an example method 1100 for generating new images using the trained decoder 712, consistent with embodiments of the present disclosure. In some embodiments, the method 1100 may be performed by image processing system 250 of Fig. 2 or by the VAE 700 of Fig. 7.

[0127] At step 1102, subtarget images with a set_OV label = 0 nm are selected. The reference images may be generated as described in connection with Figs. 9 and 10.

[0128] At step 1104, the selected subtarget images are cropped into unit cell image clips. The image clips are selected to provide similar images in each of the clips.

[0129] At step 1106, each image clip is provided to the trained encoder 706 to obtain latent space parameters for the image clip.

[0130] At step 1108, averaged latent space parameters are calculated over the set of all selected image clips.

[0131] At step 1110, the desired overlay value (e.g., OV_ref(j) 1004 as described in Fig. 10) is set in the latent space parameters. The desired overlay value may be set to any value.

[0132] At step 1112, the desired overlay value and the averaged latent space parameters are provided to the trained decoder 712. The decoder 712 generates a new image using the desired overlay value and the averaged latent space parameters.

[0133] At step 1114, the generated new image is provided to the trained encoder 706 to generate new latent space parameters.

[0134] At step 1116, the new overlay values (also referred to as encoder_OV) from the latent space parameters of the generated new image and OV_ref(j) are used to obtain the set_get_OV slope and intercept. The generated new image whose set-get-OV slope and intercept are within the thresholds, for example, set_get_OV line slope = 1 ± 0.01 and an absolute value of the intercept < 0.05 nm, is selected. In some embodiments, the values of 0.01 for the slope variance and 0.05 nm for the intercept may be user-defined thresholds. In other embodiments, depending on the use case, the user may choose slope variance to be ±0.02 or the intercept to be 0.025 nm. In other embodiments, the thresholds for the slope variance and the intercept may have other values.

[0135] At step 1118, the selected new image is provided to another image processing algorithm to tune the parameters of that image processing algorithm.
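Steps 1110 through 1116 may be sketched end to end as follows. The `encode` and `decode` callables stand in for the trained encoder 706 and decoder 712 and are replaced here by toy functions; the function names, the toy models, and the placement of the overlay parameter at index 0 are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def construct_and_qualify(latent_avg, ov_ref, encode, decode, ov_index=0,
                          slope_tol=0.01, intercept_tol_nm=0.05):
    images, encoder_ov = [], []
    for ov in ov_ref:
        latent = list(latent_avg)
        latent[ov_index] = ov                        # step 1110
        image = decode(latent)                       # step 1112
        images.append(image)
        encoder_ov.append(encode(image)[ov_index])   # step 1114
    slope, intercept = fit_line(ov_ref, encoder_ov)  # step 1116
    qualified = (abs(slope - 1.0) <= slope_tol
                 and abs(intercept) < intercept_tol_nm)
    selected = images if qualified else []
    return selected, slope, intercept


# Toy stand-ins: the "image" is just the latent vector, and the toy encoder
# reads the overlay back with a small gain error and offset.
def toy_decode(latent):
    return list(latent)


def toy_encode(image):
    return [1.002 * image[0] + 0.01] + list(image[1:])


selected, slope, intercept = construct_and_qualify(
    latent_avg=[0.0, 2.0, 2.0],
    ov_ref=[-2.0, -1.0, 0.0, 1.0, 2.0],
    encode=toy_encode,
    decode=toy_decode,
)
```

In this toy case the slope is 1.002 and the intercept is 0.01 nm, so all five constructed images pass the example thresholds and would be forwarded to step 1118.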

[0136] Focus and dose measurements relate to metrology of the scanner, e.g., an EUV scanner or a DUV scanner. The focus and dose variables may be plotted on a Bossung curve, which shows that for a given focus value, there are different dose values and different mean line CD values, to map a control surface for the CDs as a function of the focus and dose variables. For example, when the dose is large, the CD is small and conversely, when the dose is small, the CD is large. The Bossung curve reflects how good the scanner control is (to control the CD). For an anchor feature of a design, it is desirable to have a large process window. The curves may be fit to be parabolic, with the center of the parabola at the nominal focus condition. It is also desirable for the process window (e.g., the focus and dose window) to be as large as possible. When the pitch is small (e.g., the features on the wafer are close together) and the focus is changed, the distance between the features will also change. For example, if the distance between two features is 12 nm at nominal conditions and the focus changes, the distance between the two features will change (e.g., the distance may change from 12 nm to 11.5 nm). When the dose changes, the line width (as measured by the CD) may also change (e.g., from 12 nm to 13 nm). It is possible to analyze CPB images to examine the size of features and the distance between features to determine the focus and dose values.
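As a minimal sketch of the parabolic fit mentioned above, one Bossung curve (CD against focus at a fixed dose) may be fit with a parabola centered at the nominal focus. With the center fixed, the model CD = cd_at_nominal + curvature x (focus - nominal_focus)^2 is linear in the squared focus offset, so an ordinary least-squares line fit suffices. The function name, the units, and the sample values are hypothetical.

```python
def fit_bossung_parabola(focus, cd, nominal_focus=0.0):
    """Fit CD = cd_at_nominal + curvature * (focus - nominal_focus)**2."""
    # The squared focus offset is the single regressor of a linear fit.
    u = [(f - nominal_focus) ** 2 for f in focus]
    n = len(u)
    mean_u, mean_cd = sum(u) / n, sum(cd) / n
    suu = sum((x - mean_u) ** 2 for x in u)
    sucd = sum((x - mean_u) * (y - mean_cd) for x, y in zip(u, cd))
    curvature = sucd / suu
    cd_at_nominal = mean_cd - curvature * mean_u
    return cd_at_nominal, curvature


# Hypothetical focus offsets (nm) and mean CD values (nm) at one dose.
focus = [-40.0, -20.0, 0.0, 20.0, 40.0]
cd = [12.0 - 0.001 * f * f for f in focus]
cd_at_nominal, curvature = fit_bossung_parabola(focus, cd)
```

Repeating the fit for each dose value gives one parabola per dose, and together these curves describe the CD as a function of focus and dose.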

[0137] One way to obtain the size of features and distance between features is through optical metrology. Optical metrology requires large-size marks with specially designed patterns (e.g., focus and dose marks). Those marks are typically placed in the scribe line area between dies on the wafer. When using optical metrology, mark-to-device focus and dose offsets are also considered by using the focus and dose measurement in the device area as the reference.

[0138] In some cases, in-device focus and dose metrology may be based on CPB images. Most CPB-based focus and dose metrology methods depend on the CPB image analysis to acquire the pattern CD or placement measurement. A large number of CPB images are required to reduce CD and placement measurement uncertainty during both calibration and application phases. For both optical and CPB-based focus and dose metrology, the goal of the calibration phase is to build a correlation between the measured signals and the focus and dose conditions. The goal of the application phase is to infer the focus and dose from the measured signals based on the correlation found in the calibration phase.

[0139] Similar to the above-described methods for using a VAE to generate images for overlay metrology, a VAE may also be used to determine the focus and dose metrology. For example, instead of finding the edge placement error (EPE) in an image, which may help determine the size of a feature and the distance between features, a trained VAE may be used to directly find the focus and dose values. The VAE may be trained with CPB images acquired at various focus exposure matrix (FEM) conditions. Special loss functions are also defined to address this specific application of the VAE. Other generative models may also be applied to handle more challenging CPB images or layers. Examples of other model types include generative adversarial networks (GANs), diffusion models, and flow-based models.

[0140] Fig. 12 is a schematic diagram of a variational autoencoder (VAE) 1200 for use in connection with focus and dose metrology, consistent with embodiments of the present disclosure. The VAE 1200 includes an encoder 1206 and a decoder 1212. The encoder 1206 helps to decompose input images with different focus and dose values efficiently in latent space parameters. The decoder 1212 constructs new images based on the latent space parameters. The VAE 1200 includes a probabilistic model that helps generate new content similar to, yet different from, the original content. The probability distribution is good at managing stochastics of feature characteristics due to wafer manufacturing process noise and CPB tool noise. In some embodiments, the VAE 1200 may be implemented using the neural network 500.

[0141] A first input image 1202 (at 0° rotation) and a second input image 1204 (at 180° rotation relative to the first input image 1202) are provided to the encoder 1206. The encoder 1206 generates a first set of latent space parameters 1208 for the first image 1202 and a second set of latent space parameters 1210 for the second image 1204. The latent space parameters 1208, 1210 may include a focus parameter, which represents a focus condition based on the placement among different features in CPB images; a dose parameter, which represents a dose condition based on the feature size among different features in CPB images; and a non-linear (NL) parameter, which represents non-linear effects in the image, including, but not limited to CPB distortion, charging effect, and lithography pattern stochastic effect. It is noted that the latent space parameters 1208, 1210 may include additional parameters other than those specifically listed. In some embodiments, the latent space parameters 1208, 1210 may include eight or 16 parameters each.

[0142] The first set of latent space parameters 1208 and the second set of latent space parameters 1210 are provided to the decoder 1212. The decoder 1212 uses the latent space parameters to generate new images that recover the original input images as closely as possible. The decoder 1212 generates a first output image 1214 using the latent space parameters 1208 (attempting to recover the first input image 1202) and a second output image 1216 using the latent space parameters 1210 (attempting to recover the second input image 1204). The encoder 1206 and decoder 1212 are trained by adjusting the weights of the encoder and decoder to get the output images 1214, 1216 close to the input images 1202, 1204.

[0143] The feature placement parameter (e.g., the position of the feature) and the feature size parameter (e.g., the CD parameter) have a special relationship when viewed in the context of the input images 1202, 1204 being rotated by 180°. For example, the placement value will reverse sign (e.g., positive to negative or negative to positive) when the input image is rotated by 180°. As another example, the feature size will remain the same when the input image is rotated by 180° (e.g., the size of the feature will not change).
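
The rotation relationship can be checked numerically on a toy image. The sketch below is illustrative only: the helper placement_and_size, the two-feature layout, and the use of a square as a stand-in for the circle are assumptions made for this example. Placement is taken as the signed x-offset between the centroids of the two features, and size as the pixel area of one feature.

```python
import numpy as np

def placement_and_size(img):
    """Signed x-offset between the two features, and pixel area of feature 2."""
    _, x1 = np.nonzero(img == 1)   # column indices of the bar pixels
    _, x2 = np.nonzero(img == 2)   # column indices of the second feature
    return x2.mean() - x1.mean(), int((img == 2).sum())

img = np.zeros((32, 32), dtype=int)
img[4:28, 8:11] = 1      # "bar" feature
img[12:18, 18:24] = 2    # square stand-in for the "circle" feature

p0, s0 = placement_and_size(img)
p180, s180 = placement_and_size(np.rot90(img, 2))  # 180-degree rotation

print(p0, p180)   # placement reverses sign
print(s0, s180)   # size is unchanged
```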

[0144] The scanner focus and dose parameters may be correlated with the feature placement and the feature size, respectively. To train the encoder 1206 and the decoder 1212, various loss functions may be used. For example, a focus loss may be calculated based on the formula:

Focus_loss = mse(Focus_latent@0° + Focus_latent@180°)    Equation (9),

where mse( ) is the mean squared error function, Focus_latent@0° is the latent space position parameter obtained at image rotation angle 0°, and Focus_latent@180° is the latent space position parameter obtained at image rotation angle 180°.

[0145] A dose loss may be calculated based on the formula:

Dose_loss = mse(Dose_latent@0° - Dose_latent@180°)    Equation (10),

where Dose_latent@0° is the latent space CD parameter obtained at image rotation angle 0°, and Dose_latent@180° is the latent space CD parameter obtained at image rotation angle 180°. It is noted that not all patterns have the same amount of placement shift or CD change when the focus and dose are changed. The above examples use an image rotation angle of 180° to show the relationship of the feature placement and the feature size between the rotated image and the image at 0° rotation. The same concept can be extended to other image rotation angles (such as 90° or 270°) as well as to flipping the image along the x-axis or the y-axis.

[0146] In the latent space, the difference between Focus_latent and Focus_label, and the difference between Dose_latent and Dose_label should be reduced. Focus_label and Dose_label are read from the FEM conditions which are used during the field exposure process. Both Focus_label and Dose_label are known when the wafer is made and the measured focus and dose values are correlated with the labels. The set_get_focus loss function and the set_get_dose loss function may be calculated based on the formulas:

Set_get_focus_loss = mse(Focus_latent@0° - Focus_label)    Equation (11)

Set_get_dose_loss = mse(Dose_latent@0° - Dose_label)    Equation (12).

[0147] The focus- and dose-specific loss function may be calculated based on the formula:

Focus_Dose_loss = Focus_loss + Dose_loss + Set_get_focus_loss + Set_get_dose_loss    Equation (13).
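
Equations (9) through (13) can be sketched as follows. This is a minimal illustration, assuming that mse( ) is the mean of the squared residual taken over the images in a mini-batch; the function names and the numerical values are hypothetical.

```python
import numpy as np

def mse(residual):
    """Mean of the squared entries; the argument is already a residual."""
    return float(np.mean(np.square(residual)))

def focus_dose_loss(focus_0, focus_180, dose_0, dose_180, focus_label, dose_label):
    focus_loss = mse(focus_0 + focus_180)        # Equation (9): placement flips sign
    dose_loss = mse(dose_0 - dose_180)           # Equation (10): size is unchanged
    set_get_focus = mse(focus_0 - focus_label)   # Equation (11)
    set_get_dose = mse(dose_0 - dose_label)      # Equation (12)
    return focus_loss + dose_loss + set_get_focus + set_get_dose  # Equation (13)

# Illustrative latent values for a batch of two images.
loss = focus_dose_loss(
    focus_0=np.array([1.0, -2.0]), focus_180=np.array([-1.0, 2.5]),
    dose_0=np.array([3.0, 4.0]), dose_180=np.array([3.0, 4.0]),
    focus_label=np.array([1.0, -2.0]), dose_label=np.array([3.0, 5.0]),
)
print(loss)
```

In this example only the second image contributes: its focus latents do not cancel exactly (0.125 after averaging) and its dose latent misses the label by one unit (0.5 after averaging).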

[0148] Besides the above specific loss functions related with the Focus_latent parameter and the Dose_latent parameter, the typical autoencoder loss functions are also included to ensure that the reconstructed image is similar to the original image. The reconstruction loss and Kullback-Leibler (KL) divergence loss may be calculated based on Equations (6) and (7) as described above.

[0149] A total loss function per image may be calculated based on the formula:

Total_Loss = reconstruction_loss + (β × KL_divergence_loss) + (γ × Focus_Dose_loss)    Equation (14),

where β and γ are hyperparameters to weight the KL_divergence_loss and the Focus_Dose_loss, respectively. As an example, β = 1e-6 and γ = 5 may be used as the weights. In some embodiments, other hyperparameters may be set, such as learning rate = 1e-3, latent space parameter dimension = 8, and mini-batch size = 64 images. It is noted that depending on the particular use case, the hyperparameters may have different values.
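
A minimal sketch of Equation (14) follows. Equations (6) and (7) are not reproduced in this passage, so the sketch assumes a pixel-wise mean squared error for the reconstruction loss and the usual closed-form KL divergence between a diagonal Gaussian and a standard normal; both are assumptions, not the definitions of the disclosure. The two weights are named beta and gamma here, names chosen for this sketch, and default to the example values 1e-6 and 5.

```python
import numpy as np

def total_loss(x, x_hat, mu, log_var, focus_dose_loss, beta=1e-6, gamma=5.0):
    # Assumed reconstruction loss: pixel-wise mean squared error.
    recon = float(np.mean((x - x_hat) ** 2))
    # Assumed KL term: diagonal Gaussian N(mu, exp(log_var)) versus N(0, 1).
    kl = float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))
    return recon + beta * kl + gamma * focus_dose_loss

# Illustrative values with a latent space dimension of 8.
x = np.ones((4, 4))
x_hat = 0.5 * np.ones((4, 4))
mu = np.zeros(8)
mu[0] = 2.0
log_var = np.zeros(8)

total = total_loss(x, x_hat, mu, log_var, focus_dose_loss=0.2)
print(total)
```

With these weights the KL term contributes very little, so the total is dominated by the reconstruction term and the focus and dose term.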

[0150] In the VAE training phase, one or more training wafers include various focus and dose conditions. CPB images at various locations on the wafers are obtained. Those CPB images and their focus and dose conditions are used to train the VAE. The VAE model quality is verified with CPB images that have not been seen during the training. Once the VAE model quality is confirmed, the VAE will be used in the application phase to infer focus and dose from any new images. Because some FEM wafers are manufactured to tune and verify wafer process settings, those same FEM wafers may be reused for CPB image acquisition.

[0151] In the inference phase, the VAE takes new CPB images as the input and predicts the focus and dose per image. Key performance indicators (KPIs) are tracked during run time and the VAE may be retrained or fine-tuned as needed. For example, the local variance of the predicted focus and dose may be periodically checked. The noise resistance and input perturbation sensitivity may also be checked to ensure the robustness of the model. The reconstruction loss and Kullback-Leibler (KL) loss may be used as KPIs to monitor inference quality. Generative quality KPIs, such as Fréchet Inception Distance (FID) and Inception Score (IS), may also be used to monitor inference quality. If any of those KPIs falls outside a user-defined threshold, an update of the VAE model may be prompted. The VAE 1200 is faster than prior solutions because it is not necessary to perform contour extraction, EPE finding, and linear model building on the input image to determine the focus and dose. The relationship between the placement and the focus is already captured in the VAE encoder during the training process. With a new input image, the focus and dose may be quickly reported, because the focus and dose are generated by the trained VAE encoder as the latent space parameters. Because the focus and dose are associated with corresponding labels during training, no additional images need to be generated after the VAE is trained. The VAE 1200 can take one CPB image to obtain the focus and dose result without any feature size calculation or feature placement calculation. This can reduce the calculation time from 3-20 seconds to 50-500 ms or less and improve the ease of use.
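
The local variance check can be sketched as below. The function name, the KPI names, and the threshold values are hypothetical; in practice the thresholds would be user-defined, and other KPIs (noise resistance, reconstruction loss, and so on) could be checked in the same way.

```python
import numpy as np

def kpis_out_of_range(pred_focus, pred_dose, max_focus_var, max_dose_var):
    """Return the names of KPIs whose local variance exceeds its threshold."""
    flagged = []
    if np.var(pred_focus) > max_focus_var:
        flagged.append("focus_local_variance")
    if np.var(pred_dose) > max_dose_var:
        flagged.append("dose_local_variance")
    return flagged

# Illustrative predictions from five neighboring sites.
pred_focus = [10.0, 10.2, 9.8, 10.1, 9.9]
pred_dose = [40.0, 42.0, 38.0, 41.0, 39.0]

flagged = kpis_out_of_range(pred_focus, pred_dose, max_focus_var=0.05, max_dose_var=1.0)
if flagged:
    print("model update suggested:", flagged)
```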

[0152] The VAE-based focus and dose metrology does not need a large FOV CPB image (such as a 4-20 µm FOV) and instead uses a small FOV (such as a 0.5-1 µm FOV), which can save in-device metrology area and CPB tool acquisition time (such as 1-5 seconds per site). This allows more wafer area for the device and takes less metrology tool time, and therefore yields a large cost saving per wafer. The VAE-based focus and dose metrology aims at in-device metrology, which can eliminate mark-to-device offset and achieve better accuracy. No special focus and dose marks need to be designed for the VAE-based focus and dose metrology, which can save engineering effort on designing such marks. In the training phase, reusing an existing customer FEM wafer and the CPB images used in the wafer processing setup can save at least 30 minutes per wafer, leading to faster overall wafer processing times.

[0153] Fig. 13 is an example of synthesized images, consistent with embodiments of the present disclosure. Each unit cell in the synthesized images contains a bar and a circle. Image 1302 shows features generated at a low dose, with small feature sizes (e.g., both the circle and the bar size are small). Image 1304 shows features generated at a higher dose than image 1302, with relatively larger feature sizes (e.g., both the circle and the bar are larger than in image 1302). To mimic the dose effect as a feature size change, the circle radius varies, for example, from 14 pixels to 22 pixels with a step size of 1 pixel while the bar width varies from 11 pixels to 19 pixels with a step size of 1 pixel. The circle size and bar width increase and decrease together, to provide a total of nine dose conditions. To mimic the focus effect as the placement change, the distance between the circle center and the bar center varies from -5 pixels to +4 pixels with a step size of 1 pixel, to provide a total of nine focus conditions. Combining the nine dose conditions and the nine focus conditions, 81 FEM conditions with 81 synthetic images may be created. For example, 35 synthetic images may be selected (which represents 35 FEM conditions) to train the model and the remaining 46 synthetic images may be used to test model performance.
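
The FEM grid and the train/test split described above can be sketched as follows. The dictionary keys, the fixed random seed, and the random selection of the 35 training conditions are assumptions for illustration. The focus offsets are taken here as the nine integers from -4 to +4; this endpoint choice is an assumption made so that the grid contains nine focus conditions.

```python
import itertools
import random

radii = range(14, 23)        # circle radius, 14..22 pixels
bar_widths = range(11, 20)   # bar width, 11..19 pixels
# Circle and bar sizes increase together, giving nine dose conditions.
dose_conditions = list(zip(radii, bar_widths))
# Assumption: nine integer offsets (-4..+4) stand in for the focus conditions.
focus_offsets = list(range(-4, 5))

fem = [
    {"radius": r, "bar_width": w, "offset": d}
    for (r, w), d in itertools.product(dose_conditions, focus_offsets)
]

rng = random.Random(0)       # fixed seed so the split is repeatable
shuffled = fem[:]
rng.shuffle(shuffled)
train, held_out = shuffled[:35], shuffled[35:]

print(len(fem), len(train), len(held_out))
```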

[0154] Similar to the overlay metrology example, the synthetic images may be plotted using the set_get_dose line slope and the set_get_focus line slope. For example, a set_CD value may be plotted on the x-axis and the get_dose value may be plotted on the y-axis to determine the set_get_dose line slope. In some embodiments, the CD label may be used to approximate the dose label. For example, if a circle radius = 18 pixels is equivalent to a dose = 40 mJ/cm² in a wafer process, the 18 pixel CD labels may be replaced with a dose 40 mJ/cm² label during VAE model training and inference. As another example, an on-product overlay (OPO) value may be plotted on the x-axis and the get_focus value may be plotted on the y-axis to determine the set_get_focus line slope.
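
A set_get line slope can be obtained with a straight-line fit of the predicted ("get") values against the programmed ("set") values. The sketch below uses made-up dose values; the function name set_get_line and the data are hypothetical.

```python
import numpy as np

def set_get_line(set_values, get_values):
    """Least-squares line through (set, get) points; returns (slope, intercept)."""
    slope, intercept = np.polyfit(set_values, get_values, deg=1)
    return float(slope), float(intercept)

# Illustrative data: predicted dose tracks the programmed dose with a small bias.
set_dose = np.array([36.0, 38.0, 40.0, 42.0, 44.0])
get_dose = 0.98 * set_dose + 0.5

slope, intercept = set_get_line(set_dose, get_dose)
print(slope, intercept)
```

A slope close to 1 and an intercept close to 0 would indicate that the predicted values follow the programmed values.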

[0155] Fig. 14 is a flowchart of an example method 1400 for training a variational autoencoder for use in connection with focus and dose metrology, consistent with embodiments of the present disclosure. In some embodiments, the method 1400 may be performed by image processing system 250 of Fig. 2 or by the VAE 1200 of Fig. 12.

[0156] At step 1402, one or more sample wafers are created with features having known focus and dose values.

[0157] At step 1404, images are taken of the sample wafers. For example, CPB or SEM images may be taken of the wafers. Because the focus and dose values are known, the images may be labeled with the known focus and dose values.

[0158] At step 1406, the image clips are obtained.

[0159] At step 1408, each sampled image clip is passed through the encoder 1206 to obtain the focus and dose latent space parameters.

[0160] At step 1410, the total loss (e.g., using Equation (14)) is calculated.

[0161] At step 1412, the encoder 1206 and decoder 1212 weights are adjusted to minimize the total loss.

[0162] At step 1414, the VAE 1200 is monitored based on latent space KPIs and robustness KPIs like local variance of the predicted focus and dose, resistance to noise, and sensitivity to input perturbation. In some embodiments, step 1414 is an optional step, and may be used to monitor wafer process change during focus and dose metrology.

[0163] A non-transitory computer readable medium may be provided that stores instructions for a processor of a controller (e.g., controller 109 of FIG. 1) to carry out, among other things, image inspection, image acquisition, stage positioning, beam focusing, electric field adjustment, beam bending, condenser lens adjusting, activating charged particle source, beam deflecting, method 900, method 1100, and method 1400. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a Compact Disc Read Only Memory (CD-ROM), any other optical data storage medium, any physical medium with patterns of holes, a Random Access Memory (RAM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), a FLASH-EPROM or any other flash memory, Non-Volatile Random Access Memory (NVRAM), a cache, a register, any other memory chip or cartridge, and networked versions of the same.

[0164] The embodiments may further be described using the following clauses:

1. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for tuning an image processing algorithm with constructed reference images, the operations comprising: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.

2. The non-transitory computer readable medium of clause 1, wherein the reference values include metrology values.

3. The non-transitory computer readable medium of clauses 1 or 2, wherein the reference values include any one of overlay, critical dimension, or edge placement error.

4. The non-transitory computer readable medium of any one of clauses 1-3, wherein the encoder-decoder model includes a variational autoencoder.

5. The non-transitory computer readable medium of any one of clauses 1-4, wherein the encoder creates a set of latent space parameters to represent the target image.

6. The non-transitory computer readable medium of clause 5, wherein the reference value is a predetermined value for one of the latent space parameters.

7. The non-transitory computer readable medium of any one of clauses 1-6, wherein the KPIs include an overlay slope, wherein: a first overlay value is a programmed overlay value of one new image; a second overlay value is a measured overlay value of the one new image; and the overlay slope is determined by: calculating an x-axis value based on the first overlay value; calculating a y-axis value based on the second overlay value; and calculating a line through the calculated x-axis value and the calculated y-axis value.

8. The non-transitory computer readable medium of clause 7, wherein the KPIs further include an intercept determined by an offset of the line relative to an origin of the y-axis.

9. The non-transitory computer readable medium of any one of clauses 1-8, the operations further comprising: clipping the target images, wherein the clipping includes setting an origin point for the constructed new images to match an origin point of the target images.

10. The non-transitory computer readable medium of any one of clauses 1-9, wherein training of the encoder-decoder model includes: encoding an image of a batch of images to determine a first latent space representation; transforming the image; encoding the transformed image to determine a second latent space representation; selecting parameters from the first latent space representation and the second latent space representation; and calculating a loss function over all images in the batch of images based on the selected parameters from the first latent space representation and the second latent space representation.

11. The non-transitory computer readable medium of clause 10, wherein the transforming includes rotating the image by any one of: 90°, 180°, 270°, or by flipping the image along the x-axis or the y-axis.

12. The non-transitory computer readable medium of clauses 10 or 11, wherein: the first latent space representation and the second latent space representation includes an overlay value; and the loss function includes an error function based on adding the overlay value of the image and the overlay value of the transformed image.

13. The non-transitory computer readable medium of clause 12, wherein the error function includes a mean squared error or a mean absolute error.

14. The non-transitory computer readable medium of clauses 12 or 13, wherein training of the encoder-decoder model further includes: calculating an intercept value as a mean of a difference between the overlay value in the latent space and a corresponding overlay label for the image, wherein the mean is calculated over all images in the batch.

15. The non-transitory computer readable medium of any one of clauses 10-14, wherein: the first latent space representation and the second latent space representation includes a critical dimension value; and the loss function includes an error function based on subtracting the critical dimension value of the transformed image from the critical dimension value of the image.

16. The non-transitory computer readable medium of any one of clauses 10-15, wherein: the first latent space representation and the second latent space representation includes an intercept overlay value; and the loss function includes an error function based on adding the intercept overlay value of the image and a label of the image, wherein the label is a predetermined overlay value of the image.

17. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for generating synthetic images to tune an image processing algorithm, the operations comprising: using a trained decoder of a variational autoencoder to generate the synthetic images based on a selected value of one latent space parameter and average latent space parameters, wherein the average latent space parameters are determined by training an encoder and the decoder of the variational autoencoder on a set of training images; verifying the generated synthetic images using the encoder; and using the verified generated synthetic images to tune the image processing algorithm.

18. The non-transitory computer readable medium of clause 17, wherein the one latent space parameter is any one of overlay, critical dimension, or edge placement error.

19. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for focus and dose metrology, the operations comprising: providing a charged particle beam (CPB) image of a wafer pattern to a machine learning model; and executing the machine learning model to generate focus and dose information in generating the wafer pattern in a lithographic process.

20. The non-transitory computer readable medium of clause 19, wherein the machine learning model is trained using images that have substantially a same pattern as the CPB image.

21. The non-transitory computer readable medium of clause 20, wherein the training images include measured images with known focus and dose values.

22. The non-transitory computer readable medium of clause 21, wherein the known focus and dose values are labels of the training images and the operations further comprise: minimizing a loss function between a measured focus and dose value and the labels.

23. The non-transitory computer readable medium of any one of clauses 19-22, wherein the operations further comprise: calculating a first focus loss as an error function of a sum of a first focus value of a first measured image at a first angle and a second focus value of a second measured image at a second angle; and calculating a first dose loss as an error function of a difference of a first dose value of the first measured image and a second dose value of the second measured image.

24. The non-transitory computer readable medium of clause 23, wherein: the first angle and the second angle are different; and the first angle and the second angle are any one of: 0°, 90°, 180°, 270°, or by flipping the image along the x-axis or the y-axis.

25. The non-transitory computer readable medium of clauses 23 or 24, wherein the operations further comprise: calculating a second focus loss as an error function of a difference between the first focus value and a focus label, wherein the focus label is a known focus value of the wafer pattern; and calculating a second dose loss as an error function of a difference between the first dose value and a dose label, wherein the dose label is a known dose value of the wafer pattern.

26. The non-transitory computer readable medium of clause 25, wherein the operations further comprise: calculating a focus and dose loss as a sum of the first focus loss, the first dose loss, the second focus loss, and the second dose loss; and calculating the total loss as the weighted sum of focus and dose loss, image reconstruction error and Kullback-Leibler (KL) divergence loss.

27. The non-transitory computer readable medium of any one of clauses 19-26, wherein the machine learning model includes a variational autoencoder.

28. The non-transitory computer readable medium of any one of clauses 19-27, wherein the executing the machine learning model includes: providing a new input image to an encoder to determine focus and dose values for the new input image.

29. The non-transitory computer readable medium of any one of clauses 19-28, wherein the operations further comprise: monitoring key performance indicators (KPIs) during run time.

30. The non-transitory computer readable medium of clause 29, wherein the KPIs include a local variance of the generated focus and dose information.

31. The non-transitory computer readable medium of clauses 29 or 30, wherein the KPIs include a noise resistance, an input perturbation sensitivity, a reconstruction loss, a Kullback-Leibler loss, a Frechet Inception Distance, or an Inception Score.

32. The non-transitory computer readable medium of any one of clauses 29-31, wherein the operations further comprise: updating the machine learning model on a condition that the KPIs are outside of a threshold.

[0165] Block diagrams in the figures may illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer hardware or software products according to various exemplary embodiments of the present disclosure. In some embodiments, a non-transitory computer-readable medium is provided and can include instructions to perform the functions described in connection with any one or more of Figs. 6-14. In this regard, each block in a schematic diagram may represent certain arithmetical or logical operation processing that may be implemented using hardware such as an electronic circuit. Blocks may also represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical functions. It should be understood that in some alternative implementations, functions indicated in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may be executed or implemented substantially concurrently, or two blocks may sometimes be executed in reverse order, depending upon the functionality involved. Some blocks may also be omitted. It should also be understood that each block of the block diagrams, and combination of the blocks, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

[0166] It will be appreciated that the embodiments of the present disclosure are not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The present disclosure has been described in connection with various embodiments, and other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the technology disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope of the invention being indicated by the following claims.

Claims

1. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for tuning an image processing algorithm with constructed reference images, the operations comprising: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.

2. The non-transitory computer readable medium of claim 1, wherein the reference values include values of any one of overlay, critical dimension, or edge placement error.

3. The non-transitory computer readable medium of claim 1, wherein: the encoder creates a set of latent space parameters to represent the target image; and the reference value is a predetermined value for one of the latent space parameters.

4. The non-transitory computer readable medium of claim 1, wherein the KPIs include: an overlay slope, wherein: a first overlay value is a programmed overlay value of one new image; a second overlay value is a measured overlay value of the one new image; and the overlay slope is determined by: calculating an x-axis value based on the first overlay value; calculating a y-axis value based on the second overlay value; and calculating a line through the calculated x-axis value and the calculated y-axis value; and an intercept determined by an offset of the line relative to an origin of the y-axis.

5. The non-transitory computer readable medium of claim 1, wherein the image processing algorithm includes any one of an overlay calculating algorithm, a critical dimension calculating algorithm, an edge placement error calculating algorithm, or a defect detection algorithm.

6. The non-transitory computer readable medium of claim 1, the operations further comprising: clipping the target images, wherein the clipping includes setting an origin point for the constructed new images to match an origin point of the target images.

7. The non-transitory computer readable medium of claim 1, wherein training of the encoder-decoder model includes: encoding an image of a batch of images to determine a first latent space representation; transforming the image; encoding the transformed image to determine a second latent space representation; selecting parameters from the first latent space representation and the second latent space representation; and calculating a loss function over all images in the batch of images based on the selected parameters from the first latent space representation and the second latent space representation.

8. The non-transitory computer readable medium of claim 7, wherein: the first latent space representation and the second latent space representation includes an overlay value and a critical dimension value; the loss function includes: a first error function based on adding the overlay value of the image and the overlay value of the transformed image; and a second error function based on subtracting the critical dimension value of the transformed image from the critical dimension value of the image; and training of the encoder-decoder model further includes: calculating an intercept value as a mean of a difference between the overlay value in the latent space and a corresponding overlay label for the image, wherein the mean is calculated over all images in the batch.

9. The non-transitory computer readable medium of claim 8, wherein the error function includes a mean squared error or a mean absolute error.

10. An apparatus for tuning an image processing algorithm with constructed reference images, comprising: a memory storing a set of instructions; and at least one processor configured to execute the set of instructions to cause the apparatus to perform operations comprising: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.

11. The apparatus of claim 10, wherein the reference values include any one of overlay, critical dimension, or edge placement error.

12. The apparatus of claim 10, wherein: the encoder creates a set of latent space parameters to represent the target image; and the reference value is a predetermined value for one of the latent space parameters.

13. The apparatus of claim 10, the operations further comprising: clipping the target images, wherein the clipping includes setting an origin point for the constructed new images to match an origin point of the target images.

14. The apparatus of claim 10, wherein the KPIs include: an overlay slope, wherein: a first overlay value is a programmed overlay value of one new image; a second overlay value is a measured overlay value of the one new image; and the overlay slope is determined by: calculating an x-axis value based on the first overlay value; calculating a y-axis value based on the second overlay value; and calculating a line through the calculated x-axis value and the calculated y-axis value; and an intercept determined by an offset of the line relative to an origin of the y-axis.

15. A method for tuning an image processing algorithm with constructed reference images, comprising: using a decoder of an encoder-decoder model to construct new images having features with known reference values, wherein the encoder-decoder model has been trained with multiple target images with programmed reference values; extracting metrology values from the constructed new images using the encoder; calculating key performance indicators (KPIs) of the new images based on the extracted metrology values; and selecting the constructed new images whose KPIs are within a target range as reference images for tuning parameters of the image processing algorithm.

16. The method of claim 15, wherein the KPIs include: an overlay slope, wherein: a first overlay value is a programmed overlay value of one new image; a second overlay value is a measured overlay value of the one new image; and the overlay slope is determined by: calculating an x-axis value based on the first overlay value; calculating a y-axis value based on the second overlay value; and calculating a line through the calculated x-axis value and the calculated y-axis value; and an intercept determined by an offset of the line relative to an origin of the y-axis.