Using generative artificial intelligence for charged-particle beam image defect detection

The use of a variational autoencoder model for defect detection in charged-particle beam inspection images addresses the challenge of efficiently identifying nanometer-scale defects, enhancing accuracy and throughput in IC manufacturing.

WO2026130987A1 · PCT designated stage · Publication Date: 2026-06-25 · ASML NETHERLANDS BV

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
ASML NETHERLANDS BV
Filing Date
2025-11-25
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing charged-particle beam inspection methods struggle to efficiently detect defects in nanometer-scale IC components due to the stochastic nature of wafer manufacturing processes, leading to reduced yield and throughput in defect detection.

Method used

Employing a variational autoencoder (VAE) model trained on normal images with noise to determine latent space parameters and calculate key performance indicators (KPIs) for defect detection in charged-particle beam inspection images, using reconstruction error and latent space parameters to identify defects.

Benefits of technology

Enhances defect detection accuracy and efficiency, improving yield and throughput by accurately identifying defects in high-resolution charged-particle beam images.

✦ Generated by Eureka AI based on patent content.

Abstract

A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for defect detection. The operations include: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips. Latent space parameters are received from the VAE. Key performance indicators (KPIs) are calculated based on the received latent space parameters and a reconstruction error. Defects are determined in the CPBI image based on the calculated KPIs.

Description

USING GENERATIVE ARTIFICIAL INTELLIGENCE FOR CHARGED-PARTICLE BEAM IMAGE DEFECT DETECTION

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority of US application 63/736,552, which was filed on 19 December 2024 and which is incorporated herein in its entirety by reference.

TECHNICAL FIELD

[0002] The embodiments provided herein relate to using a generative artificial intelligence model for defect detection, and more particularly, to using a generative artificial intelligence model to determine whether a charged-particle beam inspection image includes a defect.

BACKGROUND

[0003] Charged-particle beam tools, such as a charged-particle beam inspection tool, can acquire high spatial resolution images of test targets, actual devices, and device-like structures on a wafer. There are various image processing algorithms that can determine whether the image includes a defect.

[0004] With shrinking feature sizes and higher density patterns on advanced chips, defects are getting smaller and occur more randomly due to the stochastic nature of the wafer manufacturing process. High-speed scanning electron microscopy (SEM) tools (including multi-electron-beam systems) may be used in defect inspection because of their higher resolution than optical inspection tools. Traditional SEM defect inspection usually determines the pattern defects by comparing an image with some reference data, for example, a die-to-die comparison.

SUMMARY

[0005] Some embodiments provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for defect detection. The operations include: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips. Latent space parameters are received from the VAE. Key performance indicators (KPIs) are calculated based on the received latent space parameters and a reconstruction error. Defects are determined in the CPBI image based on the calculated KPIs.

[0006] Some embodiments provide an apparatus for performing defect detection. The apparatus includes a memory storing a set of instructions and at least one processor configured to execute the set of instructions to cause the apparatus to perform operations. The operations include: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips. Latent space parameters are received from the VAE. Key performance indicators (KPIs) are calculated based on the received latent space parameters and a reconstruction error. Defects are determined in the CPBI image based on the calculated KPIs.

[0007] Other advantages of the embodiments of the present disclosure will become apparent from the following description taken in conjunction with the accompanying drawings wherein are set forth, by way of illustration and example, certain embodiments of the present invention.

BRIEF DESCRIPTION OF FIGURES

[0008] The above and other aspects of the present disclosure will become more apparent from the description of exemplary embodiments, taken in conjunction with the accompanying drawings.

[0009] Fig. 1 is a schematic diagram illustrating an example charged-particle beam (CPB) system, consistent with embodiments of the present disclosure.

[0010] Fig. 2 is a schematic diagram illustrating an example charged-particle beam tool, consistent with embodiments of the present disclosure that may be a part of the example charged-particle beam system of Fig. 1.

[0011] Fig. 3 is a schematic diagram illustrating an example multi-beam tool, consistent with embodiments of the present disclosure that can be a part of the example charged-particle beam system of Fig. 1.

[0012] Fig. 4 is a block diagram of an exemplary server, consistent with embodiments of the present disclosure.

[0013] Fig. 5 is a schematic diagram illustrating an example neural network implementing a variational autoencoder, consistent with embodiments of the present disclosure.

[0014] Fig. 6 is a schematic diagram of a variational autoencoder (VAE) being trained with image clips, consistent with embodiments of the present disclosure.

[0015] Fig. 7 is an example of generating synthetic images with noise and defects, consistent with embodiments of the present disclosure.

[0016] Fig. 8 is a flowchart of a method for performing inference using the trained VAE, consistent with embodiments of the present disclosure.

[0017] Fig. 9A is an example of generating image clips without overlapping margins, consistent with embodiments of the present disclosure.

[0018] Fig. 9B is an example of generating image clips with overlapping margins, consistent with embodiments of the present disclosure.

[0019] Fig. 10 is a flowchart of a method for determining whether an image clip contains a defect based on calculated key performance indicators, consistent with embodiments of the present disclosure.

[0020] Fig. 11 is an example of reclipping an image, consistent with embodiments of the present disclosure.

DETAILED DESCRIPTION

[0021] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The implementations set forth in the following description of exemplary embodiments do not represent all implementations. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the disclosed embodiments as recited in the appended claims. For example, although some embodiments are described in the context of utilizing electron beams, the disclosure is not so limited. Other types of charged-particle beams (e.g., including protons, ions, muons, or any other particle carrying electric charges) may be similarly applied. Furthermore, other imaging systems may be used, such as optical imaging, photon detection, x-ray detection, ion detection, etc.

[0022] Relative dimensions of components in drawings may be exaggerated for clarity. Within the following description of drawings, the same or like reference numbers refer to the same or like components or entities, and only the differences with respect to the individual embodiments are described. As used herein, unless specifically stated otherwise, the term “or” encompasses all possible combinations, except where infeasible. For example, if it is stated that a component may include A or B, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or A and B. As a second example, if it is stated that a component may include A, B, or C, then, unless specifically stated otherwise or infeasible, the component may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.

[0023] Electronic devices are constructed of circuits formed on a piece of semiconductor material called a substrate. The semiconductor material may include, for example, silicon, gallium arsenide, indium phosphide, silicon germanium, or the like. Many circuits may be formed together on the same piece of silicon and are called integrated circuits or ICs. The size of these circuits has decreased dramatically so that many more of them can be fit on the substrate. For example, an IC chip in a smartphone can be as small as a thumbnail and yet may include over 2 billion transistors, the size of each transistor being less than 1 / 1000th the size of a human hair.

[0024] Making these ICs with extremely small structures or components is a complex, time-consuming, and expensive process, often involving hundreds of individual steps. Errors in even one step have the potential to result in defects in the finished IC, rendering it useless. Thus, one goal of the manufacturing process is to avoid such defects to maximize the number of functional ICs made in the process; that is, to improve the overall yield of the process.

[0025] One component of improving yield is monitoring the chip-making process to ensure that it is producing a sufficient number of functional integrated circuits. One way to monitor the process is to inspect the chip circuit structures at various stages of their formation. Inspection can be carried out using a scanning charged-particle microscope (SCPM). For example, an SCPM may be a scanning electron microscope (SEM). An SCPM can be used to image these extremely small structures, in effect, taking a “picture” of the structures of the wafer. The image can be used to determine if the structure was formed properly in the proper location. If the structure is defective, then the process can be adjusted, so the defect is less likely to recur.

[0026] The working principle of an SCPM (e.g., an SEM) is similar to a camera. A camera takes a picture by receiving and recording intensity of light reflected or emitted from people or objects. An SCPM takes a “picture” by receiving and recording energies or quantities of charged particles (e.g., electrons) reflected or emitted from the structures of the wafer. Typically, the structures are made on a substrate (e.g., a silicon substrate) that is placed on a platform, referred to as a stage, for imaging. Before taking such a “picture,” a charged-particle beam may be projected onto the structures, and when the charged particles are reflected or emitted (“exiting”) from the structures (e.g., from the wafer surface, from the structures underneath the wafer surface, or both), a detector of the SCPM may receive and record the energies or quantities of those charged particles to generate an inspection image. To take such a “picture,” the charged-particle beam may scan through the wafer (e.g., in a line-by-line or zig-zag manner), and the detector may receive exiting charged particles coming from a region under charged-particle beam projection (referred to as a “beam spot”). The detector may receive and record exiting charged particles from each beam spot one at a time and join the information recorded for all the beam spots to generate the inspection image. Some SCPMs use a single charged-particle beam (referred to as a “single-beam SCPM,” such as a single-beam SEM) to take a single “picture” to generate the inspection image, while some SCPMs use multiple charged-particle beams (referred to as a “multi-beam SCPM,” such as a multi-beam SEM) to take multiple “sub-pictures” of the wafer in parallel and stitch them together to generate the inspection image. By using multiple charged-particle beams, the SCPM may provide more charged-particle beams onto the structures for obtaining these multiple “sub-pictures,” resulting in more charged particles exiting from the structures. Accordingly, the detector may receive more exiting charged particles simultaneously and generate inspection images of the structures of the wafer with higher efficiency and faster speed.
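By way of a non-limiting illustration, the line-by-line and zig-zag scan patterns mentioned above may be sketched as the order in which beam-spot locations are visited on a small grid. The function names `raster_order` and `zigzag_order` and the grid dimensions are hypothetical and are not taken from the disclosure.

```python
def raster_order(rows, cols):
    """Line-by-line scan: every row is traversed left to right."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def zigzag_order(rows, cols):
    """Zig-zag scan: even rows left to right, odd rows right to left."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


# Illustrative 2 x 3 grid of beam-spot locations (an assumption).
print(raster_order(2, 3))
print(zigzag_order(2, 3))
```

Both orders visit the same set of beam spots; only the sequence differs, and the zig-zag order avoids returning the beam to the start of each row.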

[0027] Generating and processing these images to determine whether any defects exist (sometimes as small as the nanometer scale) are computationally intensive. And as the physical sizes of IC components continue to shrink, accuracy and yield in defect detection become more important. To inspect a single wafer, it is not uncommon for an inspection system to generate and process a substantial number of images. For example, if each image taken corresponds to a 6 μm x 6 μm portion of a wafer, for a 200 mm wafer, it would take over 872 million images to image the entire wafer. If these images are not processed and evaluated efficiently, not surprisingly, yield will be dramatically impacted, thereby affecting the wafer throughput.
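The image count given above can be checked with a short calculation for a 6 μm x 6 μm field of view. The sketch below assumes a circular 200 mm wafer, non-overlapping fields, and no edge exclusion; the function name `images_to_cover_wafer` is hypothetical.

```python
import math


def images_to_cover_wafer(wafer_diameter_mm, field_um):
    """Wafer area divided by the area of one square field of view."""
    radius_um = wafer_diameter_mm * 1000.0 / 2.0
    wafer_area_um2 = math.pi * radius_um ** 2
    return wafer_area_um2 / (field_um ** 2)


count = images_to_cover_wafer(200, 6)
print(f"{count / 1e6:.1f} million images")  # about 872.7 million
```

Under these assumptions the result is roughly 872.7 million images, which is consistent with the figure of over 872 million stated above.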

[0028] As used herein, the term “charged-particle beam inspection tool” may be understood to include an SCPM or an SEM as described above. The term “charged-particle beam tool” may be understood to include similar tools used in different environments, such as used in connection with a scanner (e.g., an EUV or DUV scanner).

[0029] As the physical sizes of IC components continue to shrink, accuracy and yield in defect detection become more important. Metrology tools can be used to determine whether the ICs are correctly manufactured by identifying a number of defects on each wafer, including at different levels of detail, such as a pattern level, an image (field of view) level, a die level, a care area level, or a wafer level.

[0030] Machine learning (ML) has been adopted for defect detection. In general, there are two approaches in applying ML to defect detection. In a first approach, the ML is trained with abnormal data in advance, and then the trained ML model finds abnormal parts of the data to inspect. In a second approach, the ML is trained with normal data in advance, and then the trained ML model finds the parts that do not match the learned normal data from the data to inspect.

[0031] Some prior implementations of the second approach used an autoencoder or a variational autoencoder to detect the defects in the SEM images. A discrepancy index key performance indicator may be calculated as a pixel number normalized mean square difference between an original image clip and a reconstructed image clip. The image clip with the defect is found if its discrepancy index is larger than a specific threshold (such as 0.003). The threshold depends on the use case and needs to be carefully determined to ensure correct defect detection. The value of the discrepancy index depends on the relationship between the clipping window position and the defect patterns. Using only the discrepancy index makes it difficult to find all the defects in an image.
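As a minimal sketch of the discrepancy index described above, the following computes the mean squared pixel difference between an original clip and its reconstruction and compares it with the example threshold of 0.003. It assumes clips are given as equally sized nested lists of pixel values normalized to the range 0 to 1; the function names and the sample clips are hypothetical.

```python
def discrepancy_index(original, reconstructed):
    """Mean squared pixel difference, normalized by the number of pixels."""
    if len(original) != len(reconstructed):
        raise ValueError("clips must have the same number of rows")
    total = 0.0
    n_pixels = 0
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("clips must have the same number of columns")
        for p_o, p_r in zip(row_o, row_r):
            total += (p_o - p_r) ** 2
            n_pixels += 1
    if n_pixels == 0:
        raise ValueError("clips must not be empty")
    return total / n_pixels


def exceeds_threshold(original, reconstructed, threshold=0.003):
    """True if the discrepancy index is larger than the threshold."""
    return discrepancy_index(original, reconstructed) > threshold


# Illustrative 2 x 2 clip with one bright pixel the reconstruction misses.
clip = [[0.5, 0.5], [0.5, 0.9]]
reconstruction = [[0.5, 0.5], [0.5, 0.5]]
print(exceeds_threshold(clip, reconstruction))
```

Because the index averages over the whole clip, a small defect near the edge of a large clipping window contributes little to it, which illustrates why the value depends on the clipping window position.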

[0032] Embodiments of the present disclosure can provide a variational autoencoder (VAE) trained to inspect an image of a wafer to detect defects. The VAE may be trained on measured normal images containing noise but not the various defects to be detected. The trained VAE may determine latent space parameters from an image. Key performance indicators (KPIs) are calculated based on the latent space parameters. Using the latent space parameter KPIs and a reconstruction difference KPI, the VAE may determine whether the image contains a defect. In some embodiments, the VAE can be trained with a majority of normal images and a small number of images that contain the defects.
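The passage above does not specify at this point which KPIs are derived from the latent space parameters. Purely as an assumption for illustration, the sketch below uses the Kullback-Leibler divergence between the encoder's Gaussian (mean `mu`, log-variance `log_var`) and a standard normal prior as a latent space KPI, and flags a clip when either that KPI or the reconstruction error exceeds a threshold. The function names, the decision rule, and the threshold values are hypothetical.

```python
import math


def latent_kl_kpi(mu, log_var):
    """KL(q || p) for q = N(mu, exp(log_var)) and p = N(0, 1), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def clip_has_defect(mu, log_var, reconstruction_error,
                    kl_threshold, error_threshold):
    """Assumed rule: flag the clip if either KPI exceeds its threshold."""
    return (latent_kl_kpi(mu, log_var) > kl_threshold
            or reconstruction_error > error_threshold)


# Illustrative values only: a clip whose latent mean is far from the prior.
print(clip_has_defect([3.0, 0.0], [0.0, 0.0], 0.001,
                      kl_threshold=1.0, error_threshold=0.003))
```

Combining a latent space KPI with a reconstruction KPI in this way allows a clip to be flagged even when its reconstruction error alone stays below the threshold.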

[0033] Fig. 1 illustrates an exemplary charged-particle beam (CPB) system 100 consistent with embodiments of the present disclosure. CPB system 100 may be used for imaging. For example, CPB system 100 may use an electron beam for imaging. As shown in Fig. 1, CPB system 100 includes a main chamber 101, a load / lock chamber 102, a beam tool 104, and an equipment front end module (EFEM) 106. Beam tool 104 is located within main chamber 101. EFEM 106 includes a first loading port 106a and a second loading port 106b. EFEM 106 may include additional loading port(s). First loading port 106a and second loading port 106b receive wafer front opening unified pods (FOUPs) that contain wafers (e.g., semiconductor wafers or wafers made of other material(s)) or samples to be inspected (the terms “wafers” and “samples” may be used interchangeably). A “lot” is a plurality of wafers that may be loaded for processing as a batch.

[0034] One or more robotic arms (not shown) in EFEM 106 may transport the wafers to load / lock chamber 102. Load / lock chamber 102 is connected to a load / lock vacuum pump system (not shown) which removes gas molecules in load / lock chamber 102 to reach a first pressure below the atmospheric pressure. After reaching the first pressure, one or more robotic arms (not shown) may transport the wafer from load / lock chamber 102 to main chamber 101. Main chamber 101 is connected to a main chamber vacuum pump system (not shown) which removes gas molecules in main chamber 101 to reach a second pressure below the first pressure. After reaching the second pressure, the wafer is subject to inspection by beam tool 104. Beam tool 104 may be a single-beam system or a multi-beam system.

[0035] A controller 109 is electronically connected to beam tool 104. Controller 109 may be a computer that may execute various controls of CPB system 100. While controller 109 is shown in Fig. 1 as being outside of the structure that includes main chamber 101, load / lock chamber 102, and EFEM 106, it is appreciated that controller 109 may be a part of the structure.

[0036] In some embodiments, controller 109 may include one or more processors (not shown). A processor may be a generic or specific electronic device capable of manipulating or processing information. For example, the processor may include any combination of any number of a central processing unit (or “CPU”), a graphics processing unit (or “GPU”), an optical processor, a programmable logic controller, a microcontroller, a microprocessor, a digital signal processor, an intellectual property (IP) core, a Programmable Logic Array (PLA), a Programmable Array Logic (PAL), a Generic Array Logic (GAL), a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), a System On Chip (SoC), an Application-Specific Integrated Circuit (ASIC), and any type of circuit capable of data processing. The processor may also be a virtual processor that includes one or more processors distributed across multiple machines or devices coupled via a network.

[0037] In some embodiments, controller 109 may further include one or more memories (not shown). A memory may be a generic or specific electronic device capable of storing codes and data accessible by the processor (e.g., via a bus). For example, the memory may include any combination of any number of a random-access memory (RAM), a read-only memory (ROM), an optical disc, a magnetic disk, a hard drive, a solid-state drive, a flash drive, a secure digital (SD) card, a memory stick, a compact flash (CF) card, or any type of storage device. The codes may include an operating system (OS) and one or more application programs (or “apps”) for specific tasks. The memory may also be a virtual memory that includes one or more memories distributed across multiple machines or devices coupled via a network.

[0038] Fig. 2 illustrates an example imaging system 200 consistent with embodiments of the present disclosure. Beam tool 104 of Fig. 2 may be configured for use in CPB system 100. Beam tool 104 may be a single beam apparatus or a multi-beam apparatus. As shown in Fig. 2, beam tool 104 includes a motorized sample stage 201, and a wafer holder 202 supported by motorized sample stage 201 to hold a wafer 203 to be inspected. Beam tool 104 further includes an objective lens assembly 204, a charged-particle detector 206 (which includes charged-particle sensor surfaces 206a and 206b), an objective aperture 208, a condenser lens 210, a beam limit aperture 212, a gun aperture 214, an anode 216, and a cathode 218. Objective lens assembly 204, in some embodiments, may include a modified swing objective retarding immersion lens (SORIL), which includes a pole piece 204a, a control electrode 204b, a deflector 204c, and an exciting coil 204d. Beam tool 104 may additionally include an Energy Dispersive X-ray Spectrometer (EDS) detector (not shown) to characterize the materials on wafer 203.

[0039] A primary charged-particle beam 220 (or simply “primary beam 220”), such as an electron beam, is emitted from cathode 218 by applying an acceleration voltage between anode 216 and cathode 218. Primary beam 220 passes through gun aperture 214 and beam limit aperture 212, both of which may determine the size of the charged-particle beam entering condenser lens 210, which resides below beam limit aperture 212. Condenser lens 210 focuses primary beam 220 before the beam enters objective aperture 208 to set the size of the charged-particle beam before entering objective lens assembly 204. Deflector 204c deflects primary beam 220 to facilitate beam scanning on the wafer. For example, in a scanning process, deflector 204c may be controlled to deflect primary beam 220 sequentially onto different locations of the top surface of wafer 203 at different time points, to provide data for image reconstruction for different parts of wafer 203. Moreover, deflector 204c may also be controlled to deflect primary beam 220 onto different sides of wafer 203 at a particular location, at different time points, to provide data for stereo image reconstruction of the wafer structure at that location. Further, in some embodiments, anode 216 and cathode 218 may generate multiple primary beams 220, and beam tool 104 may include a plurality of deflectors 204c to project the multiple primary beams 220 to different parts/sides of the wafer at the same time, to provide data for image reconstruction for different parts of wafer 203.

[0040] Exciting coil 204d and pole piece 204a generate a magnetic field that begins at one end of pole piece 204a and terminates at the other end of pole piece 204a. A part of wafer 203 being scanned by primary beam 220 may be immersed in the magnetic field and may be electrically charged, which, in turn, creates an electric field. The electric field reduces the energy of impinging primary beam 220 near the surface of wafer 203 before it collides with wafer 203. Control electrode 204b, being electrically isolated from pole piece 204a, controls an electric field on wafer 203 to prevent micro-arcing of wafer 203 and to ensure proper beam focus.

[0041] A secondary charged-particle beam 222 (or “secondary beam 222”), such as secondary electron beams, may be emitted from the part of wafer 203 upon receiving primary beam 220. Secondary beam 222 may form a beam spot on sensor surfaces 206a and 206b of charged-particle detector 206. Charged-particle detector 206 may generate a signal (e.g., a voltage, a current, or the like) that represents an intensity of the beam spot and provide the signal to an image processing system 250. The intensity of secondary beam 222, and the resultant beam spot, may vary according to the external or internal structure of wafer 203. Moreover, as discussed above, primary beam 220 may be projected onto different locations of the top surface of the wafer or different sides of the wafer at a particular location, to generate secondary beams 222 (and the resultant beam spot) of different intensities. Therefore, by mapping the intensities of the beam spots with the locations of wafer 203, the processing system may reconstruct an image that reflects the internal or surface structures of wafer 203.
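The mapping of beam-spot intensities to wafer locations described above may be sketched, under simplifying assumptions, as filling a two-dimensional pixel grid from a list of location and intensity pairs. The function name `reconstruct_image`, the integer pixel coordinates, and the sample values are hypothetical.

```python
def reconstruct_image(samples, rows, cols, fill=0.0):
    """Place each recorded beam-spot intensity at its (row, col) location."""
    image = [[fill] * cols for _ in range(rows)]
    for (r, c), intensity in samples:
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"location {(r, c)} is outside the image")
        image[r][c] = intensity
    return image


# Illustrative detector readings keyed by scan location (assumed values).
samples = [((0, 0), 0.2), ((0, 1), 0.8), ((1, 1), 0.5)]
print(reconstruct_image(samples, rows=2, cols=2))
```

Locations that were not scanned keep the fill value, so a partial scan still yields a well-formed image.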

[0042] Imaging system 200 may be used for inspecting a wafer 203 on motorized sample stage 201 and includes beam tool 104, as discussed above. Imaging system 200 may also include an image processing system 250 that includes an image acquirer 260, storage 270, and controller 109. Image acquirer 260 may include one or more processors. For example, image acquirer 260 may include a computer, server, mainframe host, terminals, personal computer, any kind of mobile computing device, and the like, or a combination thereof. Image acquirer 260 may connect with a detector 206 of beam tool 104 through a medium such as an electrical conductor, optical fiber cable, portable storage media, IR, Bluetooth, internet, wireless network, wireless radio, or a combination thereof. Image acquirer 260 may receive a signal from detector 206 and may construct an image. Image acquirer 260 may thus acquire images of wafer 203. Image acquirer 260 may also perform various post-processing functions, such as generating contours, superimposing indicators on an acquired image, and the like. Image acquirer 260 may perform adjustments of brightness, contrast, or the like of acquired images. Storage 270 may be a storage medium such as a hard disk, cloud storage, random access memory (RAM), other types of computer readable memory, and the like. Storage 270 may be coupled with image acquirer 260 and may be used for saving scanned raw image data as original images, post-processed images, or other images assisting the processing. Image acquirer 260 and storage 270 may be connected to controller 109. In some embodiments, image acquirer 260, storage 270, and controller 109 may be integrated together as one control unit.

[0043] In some embodiments, image acquirer 260 may acquire one or more images of a sample based on an imaging signal received from detector 206. An imaging signal may correspond to a scanning operation for conducting charged particle imaging. An acquired image may be a single image including a plurality of imaging areas. The single image may be stored in storage 270. The single image may be an original image that may be divided into a plurality of regions. Each of the regions may include one imaging area containing a feature of wafer 203.
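As one possible illustration of dividing a single acquired image into a plurality of regions, the sketch below tiles an image on a uniform grid. The uniform grid is an assumption made for simplicity, since the regions described above need only each contain an imaging area; the function name `split_into_regions` and the sample image are hypothetical.

```python
def split_into_regions(image, region_rows, region_cols):
    """Return a list of ((top, left), region) tiles covering the image.

    Tiles at the bottom or right edge may be smaller than the requested size.
    """
    if region_rows <= 0 or region_cols <= 0:
        raise ValueError("region size must be positive")
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    regions = []
    for top in range(0, n_rows, region_rows):
        for left in range(0, n_cols, region_cols):
            tile = [row[left:left + region_cols]
                    for row in image[top:top + region_rows]]
            regions.append(((top, left), tile))
    return regions


# Illustrative 4 x 4 image whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
for origin, tile in split_into_regions(image, 2, 2):
    print(origin, tile)
```

Keeping the origin of each tile alongside its pixels allows a result found in a region to be mapped back to its location in the original image.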

[0044] Consistent with some embodiments of this disclosure, a computer-implemented method of training a machine learning model for defect detection may include obtaining training data that includes an inspection image of a fabricated integrated circuit (IC) and design layout data of the IC. The obtaining operation, as used herein, may refer to accepting, taking in, admitting, gaining, acquiring, retrieving, receiving, reading, accessing, collecting, or any operation for inputting data. An inspection image, as used herein, may refer to an image generated as a result of an inspection process performed by a charged-particle inspection apparatus (e.g., system 100 of Fig. 1 or system 200 of Fig. 2). For example, an inspection image may be a CPB image generated by image processing system 250 in Fig. 2. A fabricated IC in this disclosure may refer to an IC manufactured on a sample (e.g., a wafer) in a semiconductor manufacturing process (e.g., a photolithography process). For example, the fabricated IC may be manufactured in a die of the sample. Design layout data of an IC, as used herein, may refer to data representing a designed layout of the IC. In some embodiments, the design layout data may include a design layout file in a GDS format (e.g., a GDS layout file). The design layout file may be visualized (also referred to as “rendered”) to be a 2D image (referred to as a “rendered image” herein) that presents the layout of the IC. The rendered image may include various geometric features (e.g., vertices, edges, corners, polygons, holes, bridges, vias, or the like) of the IC.

[0045] Fig. 3 illustrates a schematic diagram of an example multi-beam beam tool 104 (also referred to herein as apparatus 104) and an image processing system 390 that may be configured for use in CPB system 100 (Fig. 1), consistent with embodiments of the present disclosure.

[0046] Beam tool 104 comprises a charged-particle source 302, a gun aperture 304, a condenser lens 306, a primary charged-particle beam 310 emitted from charged-particle source 302, a source conversion unit 312, a plurality of beamlets 314, 316, and 318 of primary charged-particle beam 310, a primary projection optical system 320, a motorized wafer stage 380, a wafer holder 382, multiple secondary charged-particle beams 336, 338, and 340, a secondary optical system 342, and a charged-particle detection device 344. Primary projection optical system 320 can comprise a beam separator 322, a deflection scanning unit 326, and an objective lens 328. Charged-particle detection device 344 can comprise detection sub-regions 346, 348, and 350.

[0047] Charged-particle source 302, gun aperture 304, condenser lens 306, source conversion unit 312, beam separator 322, deflection scanning unit 326, and objective lens 328 can be aligned with a primary optical axis 360 of apparatus 104. Secondary optical system 342 and charged-particle detection device 344 can be aligned with a secondary optical axis 352 of apparatus 104.

[0048] Charged-particle source 302 can emit one or more charged particles, such as electrons, protons, ions, muons, or any other particle carrying electric charges. In some embodiments, charged-particle source 302 may be an electron source. For example, charged-particle source 302 may include a cathode, an extractor, or an anode, wherein primary electrons can be emitted from the cathode and extracted or accelerated to form primary charged-particle beam 310 (in this case, a primary electron beam) with a crossover (virtual or real) 308. For ease of explanation without causing ambiguity, electrons are used as examples in some of the descriptions herein. However, it should be noted that any charged particle may be used in any embodiment of this disclosure, not limited to electrons. Primary charged-particle beam 310 can be visualized as being emitted from crossover 308. Gun aperture 304 can block off peripheral charged particles of primary charged-particle beam 310 to reduce the Coulomb effect. The Coulomb effect may cause an increase in size of probe spots.

[0049] Source conversion unit 312 can comprise an array of image-forming elements and an array of beam-limit apertures. The array of image-forming elements can comprise an array of micro-deflectors or micro-lenses. The array of image-forming elements can form a plurality of parallel images (virtual or real) of crossover 308 with a plurality of beamlets 314, 316, and 318 of primary charged-particle beam 310. The array of beam-limit apertures can limit the plurality of beamlets 314, 316, and 318. While three beamlets 314, 316, and 318 are shown in Fig. 3, embodiments of the present disclosure are not so limited. For example, in some embodiments, the apparatus 104 may be configured to generate a first number of beamlets. In some embodiments, the first number of beamlets may be in a range from 1 to 1000. In some embodiments, the first number of beamlets may be in a range from 200 to 500. In some embodiments, an apparatus 104 may generate 400 beamlets.

[0050] Condenser lens 306 can focus primary charged-particle beam 310. The electric currents of beamlets 314, 316, and 318 downstream of source conversion unit 312 can be varied by adjusting the focusing power of condenser lens 306 or by changing the radial sizes of the corresponding beam-limit apertures within the array of beam-limit apertures. Objective lens 328 can focus beamlets 314, 316, and 318 onto a wafer 330 for imaging, and can form a plurality of probe spots 370, 372, and 374 on a surface of wafer 330.

[0051] Beam separator 322 can be a beam separator of Wien filter type generating an electrostatic dipole field and a magnetic dipole field. In some embodiments, if they are applied, the force exerted by the electrostatic dipole field on a charged particle (e.g., an electron) of beamlets 314, 316, and 318 can be substantially equal in magnitude and opposite in direction to the force exerted on the charged particle by the magnetic dipole field. Beamlets 314, 316, and 318 can, therefore, pass straight through beam separator 322 with zero deflection angle. However, the total dispersion of beamlets 314, 316, and 318 generated by beam separator 322 can also be non-zero. Beam separator 322 can separate secondary charged-particle beams 336, 338, and 340 from beamlets 314, 316, and 318 and direct secondary charged-particle beams 336, 338, and 340 towards secondary optical system 342.
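
The force balance described for beam separator 322 can be written as a short worked condition. The symbols q (particle charge), v (particle speed), E (electrostatic field strength), and B (magnetic flux density) are introduced here for illustration only and do not appear in the original text; the two fields are assumed to be perpendicular to each other and to the beam axis.

```latex
\begin{aligned}
F_E &= qE, \qquad F_B = qvB, \\
F_E &= F_B \;\Longrightarrow\; v = \frac{E}{B}.
\end{aligned}
```

Under these assumptions, a particle travelling at v = E/B experiences no net force and passes undeflected, while particles at other speeds experience a net force, which is consistent with the non-zero dispersion noted above.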

[0052] Deflection scanning unit 326 can deflect beamlets 314, 316, and 318 to scan probe spots 370, 372, and 374 over a surface area of wafer 330. In response to the incidence of beamlets 314, 316, and 318 at probe spots 370, 372, and 374, secondary charged-particle beams 336, 338, and 340 may be emitted from wafer 330. Secondary charged-particle beams 336, 338, and 340 may comprise charged particles (e.g., electrons) with a distribution of energies. For example, secondary charged-particle beams 336, 338, and 340 may be secondary electron beams including secondary electrons (energies < 50 eV) and backscattered electrons (energies between 50 eV and landing energies of beamlets 314, 316, and 318). Secondary optical system 342 can focus secondary charged-particle beams 336, 338, and 340 onto detection sub-regions 346, 348, and 350 of charged-particle detection device 344. Detection sub-regions 346, 348, and 350 may be configured to detect corresponding secondary charged-particle beams 336, 338, and 340 and generate corresponding signals (e.g., voltage, current, or the like) used to reconstruct an inspection image of structures on or underneath the surface area of wafer 330.
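
The energy ranges quoted above for secondary and backscattered electrons can be expressed as a simple rule. The following is a minimal sketch, not part of the disclosed apparatus; the function name `classify_electron` and its parameters are hypothetical, and only the 50 eV boundary comes from the text.

```python
SE_MAX_ENERGY_EV = 50.0  # boundary between SEs and BSEs quoted in the text


def classify_electron(energy_ev, landing_energy_ev):
    """Label an emitted electron as 'SE' or 'BSE' from its energy in eV.

    Hypothetical helper: energies below 50 eV are treated as secondary
    electrons, energies from 50 eV up to the landing energy as
    backscattered electrons.
    """
    if energy_ev < 0 or energy_ev > landing_energy_ev:
        raise ValueError("energy must lie between 0 and the landing energy")
    return "SE" if energy_ev < SE_MAX_ENERGY_EV else "BSE"
```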

[0053] The generated signals may represent intensities of secondary charged-particle beams 336, 338, and 340 and may be provided to image processing system 390 that is in communication with charged-particle detection device 344, primary projection optical system 320, and motorized wafer stage 380. The movement speed of motorized wafer stage 380 may be synchronized and coordinated with the beam deflections controlled by deflection scanning unit 326, such that the movement of the scan probe spots (e.g., scan probe spots 370, 372, and 374) may orderly cover regions of interest on the wafer 330. The parameters of such synchronization and coordination may be adjusted to adapt to different materials of wafer 330. For example, different materials of wafer 330 may have different resistance-capacitance characteristics that may cause different signal sensitivities to the movement of the scan probe spots.

[0054] The intensity of secondary charged-particle beams 336, 338, and 340 may vary according to the external or internal structure of wafer 330, and thus may indicate whether wafer 330 includes defects. Moreover, as discussed above, beamlets 314, 316, and 318 may be projected onto different locations of the top surface of wafer 330, or different sides of local structures of wafer 330, to generate secondary charged-particle beams 336, 338, and 340 that may have different intensities. Therefore, by mapping the intensity of secondary charged-particle beams 336, 338, and 340 with the areas of wafer 330, image processing system 390 may reconstruct an image that reflects the characteristics of internal or external structures of wafer 330.

[0055] In some embodiments, image processing system 390 may include an image acquirer 392, a storage 394, and a controller 396. Image acquirer 392 may comprise one or more processors. For example, image acquirer 392 may comprise a computer, server, mainframe host, terminals, personal computer, any kind of mobile computing devices, or the like, or a combination thereof. Image acquirer 392 may be communicatively coupled to charged-particle detection device 344 of beam tool 104 through a medium such as an electric conductor, optical fiber cable, portable storage media, IR, Bluetooth, internet, wireless network, wireless radio, or a combination thereof. In some embodiments, image acquirer 392 may receive a signal from charged-particle detection device 344 and may construct an image. Image acquirer 392 may thus acquire inspection images of wafer 330. Image acquirer 392 may also perform various post-processing functions, such as generating contours, superimposing indicators on an acquired image, or the like. Image acquirer 392 may be configured to perform adjustments of brightness and contrast of acquired images. In some embodiments, storage 394 may be a storage medium such as a hard disk, flash drive, cloud storage, random access memory (RAM), other types of computer-readable memory, or the like. Storage 394 may be coupled with image acquirer 392 and may be used for saving scanned raw image data as original images, and postprocessed images. Image acquirer 392 and storage 394 may be connected to controller 396. In some embodiments, image acquirer 392, storage 394, and controller 396 may be integrated together as one control unit.

[0056] In some embodiments, image acquirer 392 may acquire one or more inspection images of a wafer based on an imaging signal received from charged-particle detection device 344. An imaging signal may correspond to a scanning operation for conducting charged particle imaging. An acquired image may be a single image comprising a plurality of imaging areas. The single image may be stored in storage 394. The single image may be an original image that may be divided into a plurality of regions. Each of the regions may comprise one imaging area containing a feature of wafer 330. The acquired images may comprise multiple images of a single imaging area of wafer 330 sampled multiple times over a time sequence. The multiple images may be stored in storage 394. In some embodiments, image processing system 390 may be configured to perform image processing steps with the multiple images of the same location of wafer 330.

[0057] In some embodiments, image processing system 390 may include measurement circuits (e.g., analog-to-digital converters) to obtain a distribution of the detected secondary charged particles (e.g., secondary electrons). The charged-particle distribution data collected during a detection time window, in combination with corresponding scan path data of beamlets 314, 316, and 318 incident on the wafer surface, can be used to reconstruct images of the wafer structures under inspection. The reconstructed images can be used to reveal various features of the internal or external structures of wafer 330, and thereby can be used to reveal any defects that may exist in the wafer.
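
As a rough illustration of combining detected intensities with scan path data, the sketch below places each intensity sample at the pixel the beamlet was scanning when the sample was taken. It is a simplified assumption rather than the disclosed reconstruction: the function name `reconstruct_image`, the `(row, col, intensity)` sample format, and the averaging of repeated hits are all hypothetical.

```python
def reconstruct_image(samples, height, width):
    """Build a 2-D gray-level image from (row, col, intensity) samples.

    Pixels visited more than once are averaged; pixels that were never
    visited keep the value 0.0.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for row, col, intensity in samples:
        total[row][col] += intensity
        count[row][col] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]
```

Averaging repeated hits is one plausible way to combine multiple samples of the same location; other combination rules could be substituted without changing the overall structure.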

[0058] In some embodiments, the charged particles may be electrons. When electrons of primary charged-particle beam 310 are projected onto a surface of wafer 330 (e.g., probe spots 370, 372, and 374), the electrons of primary charged-particle beam 310 may penetrate the surface of wafer 330 for a certain depth, interacting with particles of wafer 330. Some electrons of primary charged-particle beam 310 may elastically interact with (e.g., in the form of elastic scattering or collision) the materials of wafer 330 and may be reflected or recoiled out of the surface of wafer 330. An elastic interaction conserves the total kinetic energies of the bodies (e.g., electrons of primary charged-particle beam 310) of the interaction, in which the kinetic energy of the interacting bodies does not convert to other forms of energy (e.g., heat, electromagnetic energy, or the like). Such reflected electrons generated from elastic interaction may be referred to as backscattered electrons (BSEs). Some electrons of primary charged-particle beam 310 may inelastically interact with (e.g., in the form of inelastic scattering or collision) the materials of wafer 330. An inelastic interaction does not conserve the total kinetic energies of the bodies of the interaction, in which some or all of the kinetic energy of the interacting bodies convert to other forms of energy. For example, through the inelastic interaction, the kinetic energy of some electrons of primary charged-particle beam 310 may cause electron excitation and transition of atoms of the materials. Such inelastic interaction may also generate electrons exiting the surface of wafer 330, which may be referred to as secondary electrons (SEs). Yield or emission rates of BSEs and SEs depend on, e.g., the material under inspection and the landing energy of the electrons of primary charged-particle beam 310 landing on the surface of the material, among others. 
The energy of the electrons of primary charged-particle beam 310 may be imparted in part by its acceleration voltage (e.g., the acceleration voltage between the anode and cathode of charged-particle source 302 in Fig. 3). The quantity of BSEs and SEs may be greater than, fewer than, or equal to the quantity of injected electrons of primary charged-particle beam 310.

[0059] Fig. 4 is a block diagram of an example server 400, consistent with embodiments of the disclosure. As shown in Fig. 4, server 400 can include processor 402. When processor 402 executes instructions described herein, server 400 can become a specialized machine. Processor 402 can be any type of circuitry capable of manipulating or processing information. For example, processor 402 can include any combination of any number of a central processing unit (“CPU”), a graphics processing unit (“GPU”), a neural processing unit (“NPU”), a microcontroller unit (“MCU”), an optical processor, a programmable logic controller, a microcontroller, a microprocessor, a digital signal processor, an intellectual property (IP) core, a Programmable Logic Array (PLA), a Programmable Array Logic (PAL), a Generic Array Logic (GAL), a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), a System On Chip (SoC), an Application-Specific Integrated Circuit (ASIC), or the like. In some embodiments, processor 402 can also be a set of processors grouped as a single logical component. For example, as shown in Fig. 4, processor 402 can include multiple processors, including processor 402a, processor 402b, and processor 402n.

[0060] Server 400 can also include memory 404 configured to store data (e.g., a set of instructions, computer codes, intermediate data, or the like). For example, as shown in Fig. 4, the stored data can include program instructions and data for processing. Processor 402 can access the program instructions and data for processing (e.g., via bus 410), and execute the program instructions to perform an operation or manipulation on the data for processing. Memory 404 can include a high-speed random-access storage device or a non-volatile storage device. In some embodiments, memory 404 can include any combination of any number of a random-access memory (RAM), a read-only memory (ROM), an optical disc, a magnetic disk, a hard drive, a solid-state drive, a flash drive, a secure digital (SD) card, a memory stick, a compact flash (CF) card, or the like. Memory 404 can also be a group of memories (not shown in Fig. 4) grouped as a single logical component.

[0061] Bus 410 can be a communication device that transfers data between components inside server 400, such as an internal bus (e.g., a CPU-memory bus), an external bus (e.g., a universal serial bus port, a peripheral component interconnect express port), or the like.

[0062] For ease of explanation without causing ambiguity, processor 402 and other data processing circuits are collectively referred to as a “data processing circuit” in this disclosure. The data processing circuit can be implemented entirely as hardware, or as a combination of software, hardware, or firmware. In addition, the data processing circuit can be a single independent module or can be combined entirely or partially into any other component of server 400.

[0063] Server 400 can further include network interface 406 to provide wired or wireless communication with a network (e.g., the Internet, an intranet, a local area network, a mobile communications network, or the like). In some embodiments, network interface 406 can include any combination of any number of a network interface controller (NIC), a radio frequency (RF) module, a transponder, a transceiver, a modem, a router, a gateway, a wired network adapter, a wireless network adapter, a Bluetooth adapter, an infrared adapter, a near-field communication (“NFC”) adapter, a cellular network chip, or the like.

[0064] In some embodiments, optionally, server 400 can further include peripheral interface 408 to provide a connection to one or more peripheral devices. As shown in Fig. 4, the peripheral device can include, but is not limited to, a cursor control device (e.g., a mouse, a touchpad, or a touchscreen), a keyboard, a display (e.g., a cathode-ray tube display, a liquid crystal display, or a light-emitting diode display), a video input device (e.g., a camera or an input interface coupled to a video archive), or the like.

[0065] Consistent with embodiments of this disclosure, the computer-implemented method of using a generative artificial intelligence model may also include training the model using obtained training data. In some embodiments, the model may be trained by a computer hardware system. For example, in some embodiments, a machine learning system may be operated in association with, e.g., controller 109, image processing system 250, image acquirer 260, storage 270, image processing system 390, image acquirer 392, or storage 394 of Figs. 1-3, and server 400 of Fig. 4. For example, as described further below, a generative model may be configured for generating an image from a design clip that resembles a corresponding location on a wafer in a CPB image. This may be performed by 1) training the generative model with design clips and the associated actual CPB images from those locations on the wafer; and 2) using the model in inference mode to feed the model design clips in locations for which simulated CPB images are desired. Such simulated images can be used as reference images in, e.g., die-to-database inspection. While the same hardware and software can be used to perform both the training and the inferencing, it is appreciated that one or more servers (e.g., server 400) can be involved with training the model and separate hardware and software may be used at the inferencing stage, such as controller 109, image processing system 250, image acquirer 260, storage 270, image processing system 390, image acquirer 392, or storage 394.

[0066] A generative model can be generally defined as a model that is probabilistic in nature. In other words, a “generative” model is not one that performs forward simulation or rule-based approaches and, as such, it may not be necessary to model the physics of the processes involved in generating an actual image or output (for which a simulated image or output is being generated). Instead, the generative model can be learned (in that its parameters can be learned) based on a suitable training set of data. Such generative models may have a number of advantages for the embodiments described herein. In addition, the generative model may be configured to have a deep learning architecture in that the generative model may include multiple layers, which may perform a number of algorithms or transformations. The number of layers included in the generative model may depend on the particular use case. For practical purposes, a suitable range of layers is from two layers to a few tens of layers.

[0067] Deep learning is a type of machine learning. The machine learning described herein may be further performed as described in “Introduction to Statistical Machine Learning,” by Sugiyama, Morgan Kaufmann, 2016, 534 pages; “Discriminative, Generative, and Imitative Learning,” by Jebara, MIT Thesis, 2002, 212 pages; and “Principles of Data Mining (Adaptive Computation and Machine Learning)” by Hand et al., MIT Press, 2001, 578 pages; which are incorporated by reference as if fully set forth herein. The embodiments described herein may be further configured as described in these references.

[0068] In some embodiments, a machine learning system may comprise a neural network. For example, a model may be a deep neural network with a set of weights that model the world according to the data that it has been fed to train it. Neural networks can be generally defined as a computational approach which is based on a relatively large collection of neural units loosely modeling the way a biological brain solves problems with relatively large clusters of biological neurons connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units. These systems are self-learning and trained rather than explicitly programmed and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.

[0069] Neural networks typically consist of multiple layers, and the signal path traverses from front to back. The goal of the neural network is to solve problems in the same way that the human brain would, although several neural networks are much more abstract. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections. The neural network may have any suitable architecture or configuration known in the art.

[0070] In a further embodiment, a model may comprise a convolutional and deconvolution neural network. For example, the embodiments described herein can take advantage of learning concepts such as a convolution and deconvolution neural network to solve the normally intractable representation conversion problem (e.g., rendering). The model may have any convolution and deconvolution neural network configuration or architecture known in the art.

[0071] A neural network, as used herein, may refer to a computing model for analyzing underlying relationships in a set of input data by way of mimicking human brains. Similar to a biological neural network, the neural network may include a set of connected units or nodes (referred to as “neurons”), structured as different layers, where each connection (also referred to as an “edge”) may obtain and send a signal between neurons of neighboring layers in a way similar to a synapse in a biological brain. The signal may be any type of data (e.g., a real number). Each neuron may obtain one or more signals as an input and output another signal by applying a non-linear function to the inputted signals. Neurons and edges may typically be weighted by corresponding weights to represent the knowledge the neural network has acquired. During a training process (similar to a learning process of a biological brain), the weights may be adjusted (e.g., by increasing or decreasing their values) to change the strengths of the signals between the neurons to improve the performance accuracy of the neural network. Neurons may apply a thresholding function (referred to as an “activation function”) to their output values of the non-linear function such that a signal is outputted only when an aggregated value (e.g., a weighted sum) of the output values of the non-linear function exceeds a threshold determined by the thresholding function. Different layers of neurons may transform their input signals in different manners (e.g., by applying different non-linear functions or activation functions). The output of the last layer (referred to as an “output layer”) may output the analysis result of the neural network, such as, for example, a categorization of the set of input data (e.g., as in image recognition cases), a numerical result, or any type of output data for obtaining an analytical result from the input data.
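
The behavior of a single neuron can be illustrated with a minimal sketch. It assumes the common formulation in which the neuron forms a weighted sum of its inputs and passes it through a sigmoid non-linearity; the function name `neuron_output`, the bias term, and the choice of sigmoid are assumptions made for illustration and are not specified in the text.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Increasing a weight strengthens the contribution of the corresponding input signal, which is the quantity that training adjusts.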

[0072] Training of the neural network, as used herein, may refer to a process of improving the accuracy of the output of the neural network. Typically, the training may be categorized into three types: supervised training, unsupervised training, and reinforcement training. In the supervised training, a set of target output data (also referred to as “labels” or “ground truth”) may be generated based on a set of input data using a method other than the neural network. The neural network may then be fed with the set of input data to generate a set of output data that is typically different from the target output data. Based on the difference between the output data and the target output data, the weights of the neural network may be adjusted in accordance with a rule. If such adjustments are successful, the neural network may generate another set of output data more similar to the target output data in a next iteration using the same input data. If such adjustments are not successful, the weights of the neural network may be adjusted again. After a sufficient number of iterations, the training process may be terminated in accordance with one or more predetermined criteria (e.g., the difference between the final output data and the target output data is below a predetermined threshold, or the number of iterations reaches a predetermined threshold). The trained neural network may be applied to analyze other input data.
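
The supervised loop described above (generate outputs, compare with targets, adjust weights, stop on a criterion) can be sketched on the smallest possible model, a single weight fitted by gradient descent. This is an illustrative assumption rather than the disclosed training procedure; the function name `train_linear`, the mean-squared-error measure, the learning rate, and the stopping thresholds are all hypothetical choices.

```python
def train_linear(inputs, targets, lr=0.05, max_iters=1000, tol=1e-9):
    """Fit output = w * input by gradient descent on the mean squared error.

    Stops when the error falls below `tol` or after `max_iters` iterations,
    mirroring the two termination criteria mentioned in the text.
    """
    w = 0.0
    loss = float("inf")
    n = len(inputs)
    for _ in range(max_iters):
        outputs = [w * x for x in inputs]
        loss = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / n
        if loss < tol:
            break
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (o - t) * x for o, t, x in zip(outputs, targets, inputs)) / n
        w -= lr * grad
    return w, loss
```

For example, `train_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` drives the weight towards 2, the value that reproduces the targets.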

[0073] In the unsupervised training, the neural network is trained without any external gauge (e.g., labels) to identify patterns in the input data rather than generating labels for them. Typically, the neural network may analyze shared attributes (e.g., similarities and differences) and relationships among the elements of the input data in accordance with one or more predetermined rules or algorithms (e.g., principal component analysis, clustering, anomaly detection, or latent variable identification). The trained neural network may extrapolate the identified relationships to other input data.

[0074] In the reinforcement training, the neural network is trained without any external gauge (e.g., labels) in a trial-and-error manner to maximize benefits in decision making. The input data sets of the neural network may be different in the reinforcement training. For example, a reward value or a penalty value may be determined for the output of the neural network in accordance with one or more rules during training, and the weights of the neural network may be adjusted to maximize the reward values (or to minimize the penalty values). The trained neural network may apply its learned decision-making knowledge to other input data.

[0075] During the training of a neural network, a loss function (or referred to as a “cost function”) may be used to evaluate the output data. The loss function, as used herein, may map output data of a machine learning model (e.g., the neural network) onto a real number (referred to as a “loss” or a “cost”) that intuitively represents a loss or an error (e.g., representing a difference between the output data and target output data) associated with the output data. The training of the neural network may seek to maximize or minimize the loss function (e.g., by pushing the loss towards a local maximum or a local minimum in a loss curve). For example, one or more parameters of the neural network may be adjusted or updated purporting to maximize or minimize the loss function. After adjusting or updating the one or more parameters, the neural network may obtain new input data in a next iteration of its training. When the loss function is maximized or minimized, the training of the neural network may be terminated.

[0076] By way of example, Fig. 5 is a schematic diagram illustrating an example neural network 500 implementing a variational autoencoder, consistent with embodiments of the present disclosure. As depicted in Fig. 5, neural network 500 may include an input layer 510, including input 510-1, . . ., input 510-b (b being an integer). For example, an input of neural network 500 may include any structured or unstructured data (e.g., an image). In some embodiments, neural network 500 may obtain a plurality of inputs simultaneously. For example, in Fig. 5, neural network 500 may obtain b inputs simultaneously. In some embodiments, input layer 510 may obtain the inputs in succession such that input layer 510 receives input 510-1 in a first cycle (e.g., in a first inference) and pushes data from input 510-1 to an encoder (e.g., encoder 520), then receives a second input in a second cycle (e.g., in a second inference) and pushes data from the second input to the encoder, and so on. Input layer 510 may obtain any number of inputs in the simultaneous manner, the successive manner, or any manner of grouping the inputs.

[0077] Encoder 520 may include one or more nodes, including node 520-1, node 520-2, . . ., node 520-c (c being an integer). A node (also referred to as a “machine perceptron” or a “neuron”) may model the functioning of a biological neuron. Each node may apply an activation function to received inputs (e.g., one or more of input 510-1, . . ., input 510-b). An activation function may include a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a rectified linear unit (ReLU) function (e.g., a ReLU6 function or a Leaky ReLU function), a hyperbolic tangent (“tanh”) function, or any non-linear function. The output of the activation function may be weighted by a weight associated with the node. A weight may include a positive value between 0 and 1, or any numerical value that may scale outputs of some nodes in a layer more or less than outputs of other nodes in the same layer. It is noted that while Fig. 5 shows one layer of nodes for the encoder 520, the encoder 520 may include multiple layers of nodes.
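
Several of the activation functions listed above have simple closed forms, sketched here for reference. The function names are hypothetical, the Heaviside value at exactly zero and the leaky-ReLU slope of 0.01 are conventions chosen for this sketch, and the Gaussian is shown in its unit-width form.

```python
import math


def heaviside(x):
    """Step function: 1 for x >= 0, else 0 (value at 0 is a convention)."""
    return 1.0 if x >= 0 else 0.0


def gaussian(x):
    """Unit-width Gaussian bump, exp(-x^2)."""
    return math.exp(-x * x)


def sigmoid(x):
    """Logistic sigmoid, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, x)


def relu6(x):
    """ReLU clipped at 6."""
    return min(max(0.0, x), 6.0)


def leaky_relu(x, slope=0.01):
    """ReLU with a small slope for negative inputs (slope is an assumption)."""
    return x if x > 0 else slope * x
```

The hyperbolic tangent is available directly as `math.tanh` in the Python standard library.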

[0078] As further depicted in Fig. 5, neural network 500 includes a latent space 530. The latent space 530 may include multiple distinct sets of parameters generated by the encoder 520, including latent space parameter 530-1, ... , latent space parameter 530-d (d being an integer). It is noted that the latent space 530 may include any number of parameter sets derived from the input layer 510 and generated by the encoder 520. The encoder 520 may non-linearly decompose the inputs 510-1 to 510-b to generate the latent space parameters 530-1 to 530-d.

[0079] As further depicted in Fig. 5, neural network 500 may include a decoder 540, including one or more nodes, including node 540-1, node 540-2, . . ., node 540-e (e being an integer). Each node may apply an activation function to received inputs (e.g., one or more of latent space parameter 530-1, . . ., latent space parameter 530-d). Similar to the encoder 520, the activation function used in each node of the decoder 540 may include a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a rectified linear unit (ReLU) function (e.g., a ReLU6 function or a Leaky ReLU function), a hyperbolic tangent (“tanh”) function, or any non-linear function. The output of the activation function may be weighted by a weight associated with the node. A weight may include a positive value between 0 and 1, or any numerical value that may scale outputs of some nodes in a layer more or less than outputs of other nodes in the same layer. It is noted that while Fig. 5 shows one layer of nodes for the decoder 540, the decoder 540 may include multiple layers of nodes. In some embodiments, the number of nodes in the decoder 540 may not match the number of nodes in the encoder 520 (e.g., the integers c and e may be different). In other embodiments, the number of nodes in the decoder 540 may match the number of nodes in the encoder 520 (e.g., the integers c and e may be the same).

[0080] As further depicted in Fig. 5, neural network 500 may include an output layer 550 that finalizes outputs, including output 550-1, output 550-2, . . ., output 550-f (f being an integer). In some embodiments, the number of outputs 550 may not match the number of inputs 510 (e.g., the integers b and f may be different). In other embodiments, the number of outputs 550 may match the number of inputs 510 (e.g., the integers b and f may be the same).
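
The path from input layer 510 through encoder 520, latent space 530, and decoder 540 to output layer 550 can be sketched as a single forward pass. The sketch assumes the standard variational autoencoder formulation, in which the encoder emits a mean and a log-variance per latent dimension and a latent sample is drawn by reparameterization; the text does not state this parameterization, and the names `linear` and `vae_forward`, the single-layer encoder and decoder, and the tanh output are illustrative assumptions.

```python
import math
import random


def linear(vec, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def vae_forward(x, enc_mu, enc_logvar, dec, rng):
    """One pass: input -> latent (mean, log-variance) -> sample -> output.

    `enc_mu`, `enc_logvar`, and `dec` are (weights, biases) pairs.
    """
    mu = linear(x, *enc_mu)
    logvar = linear(x, *enc_logvar)
    # Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    out = [math.tanh(v) for v in linear(z, *dec)]
    return mu, logvar, out
```

In this sketch the number of outputs is set by the decoder weights alone, so it may match or differ from the number of inputs, as the text allows.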

[0081] Although the nodes of the neural network 500 are depicted in Fig. 5 as being connected to each node of its previous layer and next layer (referred to as “fully connected”), the layers of neural network 500 may use any connection scheme. For example, one or more layers (e.g., input layer 510, encoder 520, latent space 530, decoder 540, or output layer 550) of neural network 500 may be connected using a convolutional scheme, a sparsely connected scheme, or any connection scheme that uses fewer connections between one layer and a previous layer than the fully connected scheme as depicted in Fig. 5.

[0082] Moreover, although the inputs and outputs of the layers of neural network 500 are depicted as propagating in a forward direction (e.g., being fed from input layer 510 to output layer 550, referred to as a “feedforward network”) in Fig. 5, neural network 500 may additionally or alternatively use backpropagation (e.g., feeding data from output layer 550 towards input layer 510) for other purposes. For example, the backpropagation may be implemented by using long short-term memory nodes (LSTM). Accordingly, although neural network 500 is depicted similar to a convolutional neural network (CNN), neural network 500 may include a recurrent neural network (RNN) or any other neural network.

[0083] A generative artificial intelligence model with special loss functions and multiple key performance indicators (KPIs) may be used for defect inspection. In some embodiments, the method may use a variational autoencoder (VAE) to detect a defect by utilizing two latent space parameter loss KPIs along with a reconstruction loss KPI. The VAE may be trained using CPBI images or generated images (such as a golden image) with normal patterns (e.g., images without defects). For example, the generated images may be generated by the encoder of the VAE. The generated images reflect the “essence” of the real images (e.g., CPBI images), such as similar pattern types and similar gray level and contrast. In some embodiments, the generated images may be created by a generative machine learning model. Other generative models may be applied to handle more challenging CPBI images or layers, such as generative adversarial networks (GANs), diffusion models, and flow-based models.

[0084] Fig. 6 is a schematic diagram of a variational autoencoder (VAE) 600 being trained with image clips, consistent with embodiments of the present disclosure. The VAE 600 may be viewed as a non-linear decompose and reconstruct method, which is good at automating the process of representing raw data more efficiently. The VAE 600 includes an encoder 608 and a decoder 614. The encoder 608 helps to decompose input images efficiently into latent space parameters. The decoder 614 constructs new images based on the latent space parameters. The VAE 600 includes a probabilistic model that helps generate new content similar to, yet different from, the original content. The probability distribution is good at managing stochastics of feature characteristics due to wafer manufacturing process noise and CPB tool noise. In some embodiments, the VAE 600 may be implemented using the neural network 500.

[0085] A training image 602 is divided into a plurality of image clips. The image clips are randomly taken from different parts of the image 602. For training the VAE 600, some of the image clips are taken at a 0° rotation (e.g., not rotated from their orientation in the training image 602) as a first set of input image clips 604. Some of the image clips are taken at a 180° rotation (e.g., rotated from their orientation in the training image 602 by 180°) as a second set of input image clips 606.
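The clip sampling described above can be illustrated with a minimal Python sketch. It assumes that each randomly positioned clip is paired with its own 180° rotated copy, which the later loss functions suggest but which is not stated explicitly here. The function names (`rotate_180`, `random_clip`, `make_training_pairs`) and the representation of an image as a list of pixel rows are hypothetical.

```python
import random

def rotate_180(clip):
    """Rotate a 2-D clip (list of rows) by 180 degrees."""
    return [row[::-1] for row in clip[::-1]]

def random_clip(image, size, rng):
    """Cut one size x size clip from a random position in the image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

def make_training_pairs(image, size, count, rng):
    """Return (clip at 0 deg, same clip at 180 deg) pairs (pairing is an assumption)."""
    pairs = []
    for _ in range(count):
        clip = random_clip(image, size, rng)
        pairs.append((clip, rotate_180(clip)))
    return pairs
```

Passing a seeded `random.Random` instance as `rng` keeps the sampled clip positions reproducible between runs.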

[0086] The first set of input image clips 604 and the second set of input image clips 606 are provided to the encoder 608. The encoder 608 generates a first set of latent space parameters 610 for the first set of input image clips 604 and a second set of latent space parameters 612 for the second set of input image clips 606. The latent space parameters 610, 612 may include an edge placement (EP) parameter, which represents a placement shift of all features in the image clips; a critical dimension (CD) parameter, which represents a feature size of all features in the image clips; and a non-linear (NL) parameter, which represents non-linear effects in the image, including, but not limited to, CPB distortion, charging effect, and lithography pattern stochastic effect. It is noted that the EP parameter does not provide an exact EP shift in nanometers and that the CD parameter does not provide an exact CD change in nanometers. It is noted that the latent space parameters 610, 612 may include additional parameters other than those specifically listed. In some embodiments, the latent space parameters 610, 612 may include 8 or 16 parameters each.

[0087] The first set of latent space parameters 610 and the second set of latent space parameters 612 are provided to the decoder 614. The decoder 614 uses the latent space parameters to generate new images that attempt to recover the original input images almost exactly. The decoder 614 generates a first set of output image clips 616 using latent space parameters 610 (attempting to recover the first set of input image clips 604) and a second set of output image clips 618 using latent space parameters 612 (attempting to recover the second set of input image clips 606). The encoder 608 and decoder 614 are trained by adjusting the weights of the encoder and the decoder to get the sets of output image clips 616, 618 close to the input image clips 604, 606.

[0088] The EP parameter and the CD parameter have a special relationship when viewed in the context of the input image clips 604, 606 being rotated by 180°. For example, the EP value will reverse sign (e.g., positive to negative or negative to positive) when the input image clip is rotated by 180°. As another example, the CD value will remain the same when the input image clip is rotated by 180° (e.g., the size of the feature as indicated by the CD value will not change).

[0089] To train the encoder 608 and the decoder 614, various loss functions may be used. For example, an EP loss may be calculated based on the formula:

EP loss = mse(EP_latent@0° + EP_latent@180°)    Equation (1)

where mse( ) is the mean squared error function, EP_latent@0° is the latent space EP parameter obtained at image clip rotation angle 0°, and EP_latent@180° is the latent space EP parameter obtained at image clip rotation angle 180°. Because the EP_latent@0° and the EP_latent@180° have opposite signs, the EP loss function adds the values together to attempt to get the loss close to zero.

[0090] A CD loss may be calculated based on the formula:

CD loss = mse(CD_latent@0° - CD_latent@180°)    Equation (2)

where CD_latent@0° is the latent space CD parameter obtained at image clip rotation angle 0°, and CD_latent@180° is the latent space CD parameter obtained at image clip rotation angle 180°. Because the CD_latent@0° and the CD_latent@180° have the same value, the CD loss function subtracts the values to attempt to get the loss close to zero. The above examples use an image rotation angle of 180° to show the EP and CD relationship between a non-zero angle and the image at 0° rotation. The same concept can be expanded to other image rotation angles (such as 90° or 270°) as well as to flipping the image along the x-axis and the y-axis.
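As an illustration of Equations (1) and (2), the following minimal Python sketch computes the two symmetry losses over a batch of latent values. It assumes that mse( ) averages the squared argument over the batch; the function names `mse`, `ep_loss`, and `cd_loss` are hypothetical and not part of the disclosure.

```python
def mse(values):
    """Mean of the squared values (the residual is passed in directly)."""
    return sum(v * v for v in values) / len(values)

def ep_loss(ep_0, ep_180):
    """Equation (1): EP flips sign under 180 deg rotation, so the sum should vanish."""
    return mse([a + b for a, b in zip(ep_0, ep_180)])

def cd_loss(cd_0, cd_180):
    """Equation (2): CD is unchanged by 180 deg rotation, so the difference should vanish."""
    return mse([a - b for a, b in zip(cd_0, cd_180)])
```

Both losses are zero for a perfectly antisymmetric EP pair and a perfectly symmetric CD pair, and grow quadratically as the encoder violates either relationship.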

[0091] Besides the EP loss and the CD loss, a reconstruction loss and a Kullback-Leibler (KL) divergence loss may be calculated to ensure that the reconstructed image clip is similar to the original image clip. The reconstruction loss measures how close the output image clip of decoder 614 is to the original input image clip. The reconstruction loss measures the difference between the output image clip and the input image clip at the pixel level, and may be calculated based on the formula:

reconstruction loss = (mse(image_reconstructed@0° - image_original@0°) / number_of_pixels) + (mse(image_reconstructed@180° - image_original@180°) / number_of_pixels)    Equation (3)

where image_reconstructed@0° is the reconstructed image clip obtained at image clip rotation angle 0°, image_original@0° is the original image clip obtained at image clip rotation angle 0°, number_of_pixels is the total number of pixels per image clip, image_reconstructed@180° is the reconstructed image clip obtained at image clip rotation angle 180°, and image_original@180° is the original image clip obtained at image clip rotation angle 180°.

[0092] The KL divergence loss measures how much the distribution of the latent space parameters deviates from a prior distribution, which may be assumed to be a standard normal distribution. The KL divergence loss is used to capture the stochastic effect of noise from the metrology tool and the wafer manufacturing process. The KL divergence loss may be calculated based on the formula:

KL divergence loss = KL_divergence_loss@0° + KL_divergence_loss@180°    Equation (4)

[0093] A total loss function per image clip may be calculated based on the formula:

Total Loss = reconstruction loss + (β x KL divergence loss) + (γ x (EP loss + CD loss))    Equation (5)

where β and γ are hyperparameters to weight the KL divergence loss and the combined EP loss and CD loss, respectively. As an example, β = 1e-6 and γ = 1 may be used as the weights. In some embodiments, other hyperparameters may be set, such as learning rate = 1e-3, latent space parameter dimension = 16, and batch size = 256 images. It is noted that depending on the particular use case, the hyperparameters may have different values.
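The weighting in Equation (5) can be sketched in Python as follows. The names `beta` and `gamma` for the two weights, and the function names, are assumptions made for illustration. The KL term uses the standard closed form for a diagonal Gaussian against a standard normal prior, parameterized by a mean and a log-variance per latent dimension; the disclosure does not state which parameterization is used, so this choice is also an assumption.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.

    Assumes a diagonal Gaussian posterior described by mean and log-variance.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def total_loss(recon_loss, kl_loss, ep_loss, cd_loss, beta=1e-6, gamma=1.0):
    """Equation (5) with the example weights as defaults."""
    return recon_loss + beta * kl_loss + gamma * (ep_loss + cd_loss)
```

With the example weights, the KL term contributes very little compared with the reconstruction and symmetry terms, so it acts as a light regularizer on the latent distribution.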

[0094] In the training phase, CPBI images with normal patterns may be used, which means that there are no defects in the images. The CPBI images may be randomly clipped into image clips with, for example, a 32x32 pixel size. As is standard ML practice, the model quality may be verified with CPBI images that have not been seen during the training. Once the ML model quality is confirmed, the model can be used in the inference phase to calculate the KPIs for defect detection. It is noted that the image clip size may be optimized per use case. In some embodiments, the starting image clip size may be approximately 2-3 times the minimal feature size. For example, if the minimal feature size on the layer is 16 nm, then the starting image clip size can be in the range of 32x32 nm to 48x48 nm.

[0095] As an example of how to detect a defect, synthesized images may be used, as shown in Fig. 7. Fig. 7 is an example 700 of generating synthetic images with noise and defects, consistent with embodiments of the present disclosure.

[0096] The unit cell in the synthesized clean image 702 contains a bar and a circle. To mimic the pattern size variations in the wafer manufacturing process, the circle radius may vary from 16 pixels to 20 pixels with a step size of 1 pixel while the bar width may vary from 13 pixels to 17 pixels with a step size of 1 pixel. The circle size and the bar width may increase and decrease together, resulting in a total of five feature size conditions. To mimic the pattern placement variations in the wafer manufacturing process, the distance between the circle center and the bar center may vary from -2 pixels to +2 pixels with a step size of 1 pixel, resulting in a total of five placement conditions. Combining the five size conditions and the five placement conditions, 25 synthetic images may be created, each with a slightly different pattern size and placement. It is noted that the feature sizes, the step size, and the distance between the features may have different values than those listed above without affecting the overall operation of the embodiments described herein.
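The 25 size and placement combinations described above can be enumerated with a short Python sketch. The function name and dictionary keys are hypothetical, and the sketch only lists the conditions; it does not draw the images.

```python
from itertools import product

def synthetic_conditions():
    """Enumerate the 5 size conditions x 5 placement conditions."""
    # Circle radius and bar width grow together: (16, 13), (17, 14), ..., (20, 17)
    sizes = [(16 + i, 13 + i) for i in range(5)]
    # Offset of the circle-to-bar center distance, in pixels
    offsets = range(-2, 3)
    return [
        {"circle_radius": r, "bar_width": w, "placement_offset": d}
        for (r, w), d in product(sizes, offsets)
    ]
```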

[0097] Random noise 704 may be added to the synthesized clean image 702, and the combined noisy image may then be converted to an 8-bit bitmap (bmp) image 706. It is noted that other gray level resolutions (e.g., 16-bit) and other image formats (e.g., JPEG, GIF, PNG) may be used without affecting the overall operation of the embodiments described herein. The converted combined image 706 contains no defects, as can be seen in enlargement 708 of a portion of image 706.

[0098] Defects may then be added to the converted combined image 706 to create a second combined image 710. As can be seen in enlargement 712 of the image 710, three unit cells in the top left area have added defects to imitate a necking type defect 714a, a bridging type defect 714b, and an extra spot type defect 714c. It is noted that the defects 714a, 714b, and 714c shown in Fig. 7 are examples and that other defect types may be introduced into the image 710.

[0099] During the VAE training, 12 synthetic images (from the 25 synthetic images noted above) may be selected (and the remaining 13 synthetic images may be used for VAE testing, as described below). Each training image 706 may be randomly clipped into 7168 image clips at a size of 32x32 pixels. For example, with 12 training images, a total of 86016 image clips (12 images x 7168 clips per image) may be used per epoch of training. In some embodiments, the image intensity may be normalized to be in a range from 0-1 during the VAE training.

[0100] During the VAE testing, the remaining 13 synthetic images may be used to check the set-get-EP and the set-get-CD relationships to confirm the quality of the VAE model prior to using the VAE model to detect defects. The set-EP value is a programmed EP value (for example, a known EP value when a wafer is printed), and the get-EP value is a measured EP value. The set-EP value may be plotted on an x-axis and the get-EP value may be plotted on a y-axis. A first-order polynomial may be fit to the pairs of coordinates (the set-EP value and the corresponding get-EP value for each point) to obtain the set-get EP line slope and an intercept for the line. Similarly, the set-CD value is a programmed CD value (for example, a known CD value when a wafer is printed), and the get-CD value is a measured CD value. The set-CD value may be plotted on an x-axis and the get-CD value may be plotted on a y-axis. A first-order polynomial may be fit to the pairs of coordinates (the set-CD value and the corresponding get-CD value for each point) to obtain the set-get CD line slope and an intercept for the line. It is noted that the set-get-CD correlation R-square is near 1.
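The set-get check can be illustrated with an ordinary least-squares line fit. The sketch below is a generic first-order fit written in plain Python; the function name `fit_line` is hypothetical, and the sample values in the usage note are invented for illustration, not measured data.

```python
def fit_line(set_values, get_values):
    """Least-squares first-order fit; returns (slope, intercept, r_squared)."""
    n = len(set_values)
    mean_x = sum(set_values) / n
    mean_y = sum(get_values) / n
    sxx = sum((x - mean_x) ** 2 for x in set_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_values, get_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination of the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(set_values, get_values))
    ss_tot = sum((y - mean_y) ** 2 for y in get_values)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```

For example, `fit_line([-2, -1, 0, 1, 2], [-0.9, -0.4, 0.1, 0.6, 1.1])` describes a perfectly linear set-get relationship. Because the latent parameters are not expressed in nanometers, the slope need not be 1; a strong model is indicated by an R-square near 1.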

[0101] In some embodiments, eight image clips may be randomly selected from the 13 testing images to check the reconstruction loss because reconstructed image clips have lower noise.

[0102] Fig. 8 is a flowchart of a method 800 for performing inference using the trained VAE 600, consistent with embodiments of the present disclosure. In some embodiments, the method 800 may be performed by the VAE 600, which may be implemented using the neural network 500.

[0103] At step 802, a test image is divided into image clips. In some embodiments, the image clips may overlap each other, may not overlap each other, or may be a combination of overlapping and non-overlapping image clips. For example, as shown in connection with Fig. 6, the image clips in the first set of input image clips 604 and the second set of input image clips 606 may overlap, may not overlap, or may be a combination of overlapping and non-overlapping image clips.

[0104] Fig. 9A is an example 900 of generating image clips without overlapping margins, consistent with embodiments of the present disclosure. An image 902 may be divided into image clips 904, with each image clip 904 having a size (e.g., a clip interval) of 32x32 pixels.

[0105] Fig. 9B is an example 920 of generating image clips with overlapping margins, consistent with embodiments of the present disclosure. The image 902 may be divided into image clips 922, with each image clip 922 having a size of 32x32 pixels but with a different clip interval. In some embodiments, a clip interval of 24x24 pixels results in an overlapping margin of 8 pixels per image clip. As the overlapping margin increases, the number of image clips 922 that may be created from the image 902 also increases. Using overlapping image clips 922 may help avoid missing tiny defects that are located on the clipping boundary and that occupy only a small number of pixels. In some embodiments, the defect detection result for an overlapping image clip 922 may be tracked along with those of the neighboring image clips 922 on its four sides.
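The relationship between clip size, clip interval, and the number of clips can be sketched as follows. The function names and the 128x128 pixel image size in the usage note are hypothetical; the sketch only keeps clips that fit entirely inside the image, which is an assumption about how image borders are handled.

```python
def clip_origins(image_size, clip_size, interval):
    """Top-left coordinates of clips along one axis, stepping by the clip interval."""
    return list(range(0, image_size - clip_size + 1, interval))

def count_clips(height, width, clip_size, interval):
    """Number of clips that fit entirely inside a height x width image."""
    rows = len(clip_origins(height, clip_size, interval))
    cols = len(clip_origins(width, clip_size, interval))
    return rows * cols

def overlap_margin(clip_size, interval):
    """Overlap in pixels between neighboring clips along one axis."""
    return clip_size - interval
```

For a hypothetical 128x128 pixel image, `count_clips(128, 128, 32, 32)` gives 16 non-overlapping clips, while `count_clips(128, 128, 32, 24)` gives 25 clips with an 8 pixel overlapping margin.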

[0106] Referring back to Fig. 8, at step 804, the image clips are input into the VAE model to generate latent space parameters for each image clip. For example, as shown in connection with Fig. 6, the image clips in the first set of input image clips 604 and the second set of input image clips 606 are provided to the encoder 608 to generate the first set of latent space parameters 610 and the second set of latent space parameters 612.

[0107] At step 806, the test code calculates and reports defect-related KPIs. For example, as shown in connection with Fig. 6, the encoder 608 outputs the first set of latent space parameters 610 and the second set of latent space parameters 612, and the decoder 614 takes the latent space parameters and outputs the reconstructed image clips. The test code then uses the latent space parameters and the reconstructed image clips to calculate the defect-related KPIs. The defect-related KPIs may include an EP-related KPI (EP dif KPI), a CD-related KPI (CD dif KPI), and a reconstruction-related KPI (reconstruction dif KPI). EP dif KPI and CD dif KPI indicate placement changes and size changes caused by a defect, respectively, while reconstruction dif KPI represents the dissimilarity between the reconstructed image and the original image.

[0108] EP dif KPI and CD dif KPI may be based on latent space parameters 610, 612 obtained by the VAE encoder 608 from image clips at 0° rotation and 180° rotation, and may be calculated based on the formulas:

EP_dif_0 = EP_latent[0]@0° - EP_latent[0]@180°    Equation (6)
EP_dif_1 = EP_latent[1]@0° - EP_latent[1]@180°    Equation (7)
CD_dif_0 = CD_latent[0]@0° + CD_latent[0]@180°    Equation (8)
CD_dif_1 = CD_latent[1]@0° + CD_latent[1]@180°    Equation (9)
EP dif KPI = max(|EP_dif_0|, |EP_dif_1|)    Equation (10)
CD dif KPI = max(|CD_dif_0|, |CD_dif_1|)    Equation (11)

where EP_latent[0]@0° and EP_latent[1]@0° are two EP latent space parameters from an image clip at 0° rotation, CD_latent[0]@0° and CD_latent[1]@0° are two CD latent space parameters from an image clip at 0° rotation, EP_latent[0]@180° and EP_latent[1]@180° are two EP latent space parameters from an image clip at 180° rotation, and CD_latent[0]@180° and CD_latent[1]@180° are two CD latent space parameters from an image clip at 180° rotation. EP dif KPI is the maximum of the absolute values of EP_dif_0 and EP_dif_1. CD dif KPI is the maximum of the absolute values of CD_dif_0 and CD_dif_1. The EP and CD values are calculated based on all pixels in an image clip.

[0109] It is noted that the EP dif in Equations (6) and (7) is calculated differently from the EP loss function in Equation (1). In Equations (6) and (7), the EP values are subtracted to magnify the KPI sensitivity to placement. When trying to find a defect, it is preferable to have the EP be more noticeable; subtracting the EP values doubles the placement error, making a defect easier to locate. This is different from the EP loss, in which the EP values are added to minimize the loss. In some embodiments, two different EP latent space parameters may be used, such as an x-axis placement and a y-axis placement. For example, if an image flipped along the x-axis or the y-axis is used, EP_latent[0] may represent the x-direction shift or the y-direction shift. In some embodiments, the images used to calculate EP dif may include an image clip at 0° rotation and an image clip at 90° rotation, and more latent space parameters for EP representation may be used. For example, if the image is rotated every 90°, four latent space parameters may be used for the EP representation.

[0110] It is noted that the CD dif in Equations (8) and (9) is calculated differently from the CD loss function in Equation (2). In Equations (8) and (9), the CD values are added to magnify the KPI sensitivity to feature size. When trying to find a defect, it is preferable to have the CD be more noticeable, so by adding the CD values, it doubles the feature size error, making a defect easier to locate. This is different from the CD loss, in which the CD values are subtracted to minimize the loss. In some embodiments, two different CD latent space parameters may be used, such as an x-axis feature size and a y-axis feature size.
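Equations (6) through (11) can be sketched in Python as follows. The function names are hypothetical, and the inputs are assumed to be short sequences holding the EP or CD latent values for one image clip at each rotation.

```python
def ep_dif_kpi(ep_0, ep_180):
    """Equations (6), (7), (10): subtract the EP latents, then take the largest magnitude."""
    return max(abs(a - b) for a, b in zip(ep_0, ep_180))

def cd_dif_kpi(cd_0, cd_180):
    """Equations (8), (9), (11): add the CD latents, then take the largest magnitude."""
    return max(abs(a + b) for a, b in zip(cd_0, cd_180))
```

Note the sign convention is the reverse of the training losses: the losses combine the values so that they cancel, whereas the KPIs combine them so that they reinforce each other.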

[0111] Reconstruction dif KPI may be based on a sum of squares of the intensity difference between the original image clip and the reconstructed image clip, and may be calculated based on the formulas:

Image_dif = reconstructed image - original image    Equation (12)
G_smoothed_Image_dif = GaussianFilter(Image_dif)    Equation (13)

where GaussianFilter( ) is a Gaussian filter function with a mean of 0 and a sigma of 1 pixel,

Sorted_I_square_array = sort(G_smoothed_Image_dif^2, direction = ‘descend’)    Equation (14)

where sort( ) is the sort function, which generates an array ordered from largest value to smallest value,

reconstruction dif KPI = sum(Sorted_I_square_array[:N])    Equation (15)
N = round(top_pixel_area_ratio x total_number_of_pixels_per_image_clip)    Equation (16)

where sum(Sorted_I_square_array[:N]) sums the first N values of Sorted_I_square_array, and N is the number of pixels, calculated as top_pixel_area_ratio multiplied by the total number of pixels in one image clip.

[0112] For example, if a 32x32 pixel image clip is used, then total_number_of_pixels_per_image_clip = 32 x 32 = 1024 pixels. If the top_pixel_area_ratio = 0.05, then N = round(0.05 x 1024) = 51 pixels. This means that reconstruction dif KPI is the sum of the squared intensity differences of the 51 pixels with the largest reconstruction error.
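The reconstruction KPI of Equations (12) through (16) can be sketched in plain Python. The separable Gaussian filter below truncates the kernel at three sigma and replicates edge pixels at the clip border; both are assumptions, since the disclosure specifies only a mean of 0 and a sigma of 1 pixel. All function names are hypothetical.

```python
import math

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian weights (truncation radius is an assumption)."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_filter(image, sigma=1.0):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(horizontal), kernel))

def reconstruction_dif_kpi(original, reconstructed, top_pixel_area_ratio=0.05):
    """Sum of the N largest squared, smoothed pixel differences."""
    diff = [[r - o for r, o in zip(rec_row, org_row)]
            for rec_row, org_row in zip(reconstructed, original)]
    smoothed = gaussian_filter(diff)
    squared = sorted((v * v for row in smoothed for v in row), reverse=True)
    n = round(top_pixel_area_ratio * len(squared))
    return sum(squared[:n])
```

Summing only the top fraction of pixels keeps the KPI sensitive to a small, localized defect that would otherwise be averaged out over the whole clip.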

[0113] It is noted that applying the Gaussian filter can effectively suppress the noise at the pattern boundary on Image dif.

[0114] If only reconstruction dif KPI is used to detect defects, some defects may be mixed with normal images (e.g., images with no defects). To further refine the defect detection, the image clips are sorted by reconstruction dif KPI.

[0115] Referring back to Fig. 8, at step 808, the defect-related KPIs are compared to respective thresholds. For example, there may be separate thresholds for each of the EP-related KPI, the CD-related KPI, and the reconstruction-related KPI.

[0116] Fig. 10 is a flowchart of a method 1000 for determining whether an image clip contains a defect based on calculated key performance indicators (KPIs), consistent with embodiments of the present disclosure. The method 1000 may be performed as part of step 808 of Fig. 8. In some embodiments, application software that combines the VAE 600 (which may be implemented using the neural network 500) with the method 1000 may enable fully automated defect inspection.

[0117] At step 1002, a determination is made whether reconstruction dif KPI is greater than a maximum threshold. In some embodiments, a top 1% of pixels or a top 5% of pixels may be used to calculate reconstruction dif KPI. The value selected for the maximum threshold may vary depending on what percentage of pixels are selected. For example, if the top 1% of pixels are selected, the maximum threshold may be 0.06. As another example, if the top 5% of pixels are selected, the maximum threshold may be 0.05. It is noted that other values for the maximum threshold may be used.

[0118] At step 1004, if reconstruction dif KPI is greater than the maximum threshold (step 1002, “yes” branch), then this is a good indicator that there is a defect present in the image.

[0119] At step 1006, if reconstruction dif KPI is not greater than the maximum threshold (step 1002, “no” branch), then a determination is made whether reconstruction dif KPI is greater than a minimum threshold. The value selected for the minimum threshold may vary depending on what percentage of pixels are selected. For example, if the top 1% of pixels are selected, the minimum threshold may be 0.025. As another example, if the top 5% of pixels are selected, the minimum threshold may be 0.014. It is noted that other values for the minimum threshold may be used.

[0120] At step 1008, if reconstruction dif KPI is not greater than the minimum threshold (step 1006, “no” branch), then this is a good indicator that there is no defect present in the image.

[0121] At step 1010, if reconstruction dif KPI is greater than the minimum threshold (step 1006, “yes” branch), then this is an indication that the image likely has a defect but that additional parameters should be evaluated to confirm whether the image contains a defect. A determination is made whether EP dif KPI is greater than an EP dif KPI threshold. For example, the EP dif KPI threshold may be 0.25. It is noted that because EP dif KPI is based on all pixels in an image clip, the number of pixels selected during the reconstruction dif KPI calculation does not impact the EP dif KPI. If EP dif KPI is greater than the EP dif KPI threshold (step 1010, “yes” branch), then this is an indication that a defect is present in the image (step 1004).

[0122] At step 1012, if EP dif KPI is not greater than the EP dif KPI threshold (step 1010, “no” branch), then a determination is made whether CD dif KPI is greater than a CD dif KPI threshold. For example, the CD dif KPI threshold may be 0.35. It is noted that because CD dif KPI is based on all pixels in an image clip, the number of pixels selected during the reconstruction dif KPI calculation does not impact the CD dif KPI. If CD dif KPI is greater than the CD dif KPI threshold (step 1012, “yes” branch), then this is an indication that a defect is present in the image (step 1004).

[0123] The different thresholds used in the method 1000 may be summarized in Table 1 below.

Table 1. Defect KPI thresholds

[0124] At step 1014, if CD dif KPI is not greater than the CD dif KPI threshold (step 1012, “no” branch), then a determination is made whether the image clip being examined is from a reclipped image. It is possible that the image clip being examined may have a defect, for example, a defect near an edge of the image clip. To confirm whether the image clip has a defect near the edge, the image may be reclipped to create multiple images such that the edges of the original image clip are located in different portions of the reclipped images. This image reclipping process is performed once per image clip, so if the image clip is from a reclipped image (step 1014, “yes” branch), then this is an indication that there is no defect present in the image clip (step 1008).

[0125] At step 1016, if the image clip is not from a reclipped image (step 1014, “no” branch), then the image is reclipped.
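The decision flow of steps 1002 through 1016 can be summarized in a short Python sketch. The function name and return labels are hypothetical. The default thresholds are the example values given in the description for the case where the top 5% of pixels are used for reconstruction dif KPI, and would need tuning per use case.

```python
def classify_clip(recon_kpi, ep_kpi, cd_kpi, is_reclipped,
                  recon_max=0.05, recon_min=0.014,
                  ep_threshold=0.25, cd_threshold=0.35):
    """Return "defect", "normal", or "reclip" for one image clip."""
    if recon_kpi > recon_max:        # step 1002, "yes" branch
        return "defect"              # step 1004
    if recon_kpi <= recon_min:       # step 1006, "no" branch
        return "normal"              # step 1008
    if ep_kpi > ep_threshold:        # step 1010
        return "defect"
    if cd_kpi > cd_threshold:        # step 1012
        return "defect"
    if is_reclipped:                 # step 1014: reclipping happens only once
        return "normal"
    return "reclip"                  # step 1016
```

A clip that returns "reclip" would be reclipped and each resulting clip evaluated again with `is_reclipped=True`, so the flow always terminates in either "defect" or "normal".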

[0126] Fig. 11 is an example 1100 of reclipping an image, consistent with embodiments of the present disclosure. The example 1100 may be performed as part of step 1016 of Fig. 10. A first reclipping pattern 1102 reclips an original image clip 1104 into four segments, with each image reclip including one half of the original image clip 1104. For example, a first image reclip 1106a may include a top half of the original image clip 1104 as a bottom half of the first image reclip 1106a. A second image reclip 1106b may include a right half of the original image clip 1104 as a left half of the second image reclip 1106b. A third image reclip 1106c may include a bottom half of the original image clip 1104 as a top half of the third image reclip 1106c. A fourth image reclip 1106d may include a left half of the original image clip 1104 as a right half of the fourth image reclip 1106d.

[0127] A second reclipping pattern 1110 reclips an original image clip 1112 into four segments, with each image reclip including one quarter of the original image clip 1112. For example, a fifth image reclip 1114a may include an upper left quarter of the original image clip 1112 as a lower right quarter of the fifth image reclip 1114a. A sixth image reclip 1114b may include an upper right quarter of the original image clip 1112 as a lower left quarter of the sixth image reclip 1114b. A seventh image reclip 1114c may include a lower right quarter of the original image clip 1112 as an upper left quarter of the seventh image reclip 1114c. An eighth image reclip 1114d may include a lower left quarter of the original image clip 1112 as an upper right quarter of the eighth image reclip 1114d.

[0128] In some embodiments, all eight image reclips 1106a-1106d and 1114a-1114d may be evaluated to determine whether each image reclip contains a defect. In some embodiments, either the first reclipping pattern 1102 (e.g., image reclips 1106a-1106d) or the second reclipping pattern 1110 (e.g., image reclips 1114a-1114d) may be evaluated to determine whether each image reclip contains a defect.
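One way to read the two reclipping patterns is as windows of the same size as the original image clip, shifted by half a clip size along one axis (first pattern) or along both axes (second pattern). The sketch below computes the top-left corner of each reclip under that reading. The same-size, half-shift interpretation, the (top, left) coordinate convention with rows increasing downward, and the function name are assumptions; handling of windows that extend past the image border is omitted.

```python
def reclip_origins(top, left, size):
    """Return (first_pattern, second_pattern) lists of (top, left) reclip origins."""
    half = size // 2
    first_pattern = [
        (top - half, left),   # original top half becomes the reclip's bottom half
        (top, left + half),   # original right half becomes the reclip's left half
        (top + half, left),   # original bottom half becomes the reclip's top half
        (top, left - half),   # original left half becomes the reclip's right half
    ]
    second_pattern = [
        (top - half, left - half),  # original upper left quarter -> lower right quarter
        (top - half, left + half),  # original upper right quarter -> lower left quarter
        (top + half, left + half),  # original lower right quarter -> upper left quarter
        (top + half, left - half),  # original lower left quarter -> upper right quarter
    ]
    return first_pattern, second_pattern
```

With either pattern, every edge of the original image clip falls in the interior of at least one reclip, which is what allows a defect near the original clip boundary to be re-examined.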

[0129] Referring back to Fig. 8, at step 810, based on results of the comparisons of the KPIs to their respective thresholds, each image clip is assigned to a “normal” group (meaning that there are no defects in the image clip, as indicated in step 1008 of Fig. 10) or to a “defect” group (meaning that there is a defect in the image clip, as indicated in step 1004 of Fig. 10).

[0130] In some embodiments, the defect KPIs may be tracked during the run time and VAE 600 may be retrained or fine-tuned as needed. For example, if it is noticed that a new type of defect is missed due to, for example, a different pattern type or a different pattern size, new images may be added to retrain the VAE 600 on the different pattern type or different pattern size.

[0131] A non-transitory computer readable medium may be provided that stores instructions for a processor of a controller (e.g., controller 109 of Fig. 1) to carry out, among other things, image inspection, image acquisition, stage positioning, beam focusing, electric field adjustment, beam bending, condenser lens adjusting, activating charged particle source, beam deflecting, method 800, and method 1000. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a Compact Disc Read Only Memory (CD-ROM), any other optical data storage medium, any physical medium with patterns of holes, a Random Access Memory (RAM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), a FLASH-EPROM or any other flash memory, Non-Volatile Random Access Memory (NVRAM), a cache, a register, any other memory chip or cartridge, and networked versions of the same.

[0132] The embodiments may further be described using the following clauses:1. A method of defect detection, comprising:inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips; receiving latent space parameters from the VAE; calculating key performance indicators (KPIs) based on the received latent space parameters and a reconstruction error; and determining defects in the CPBI image based on the calculated KPIs.2. The method of clause 1, wherein the latent space parameters include edge placement (EP) latent space parameters and critical dimension (CD) latent space parameters.3. The method of clause 2, wherein the latent space parameters include at least two EP latent space parameters.4. The method of clause 3, wherein calculating KPIs includes: calculating a first EP difference between a first image and a second image; calculating a second EP difference between the first image and the second image; and calculating an EP KPI as a maximum of the first EP difference and the second EP difference.5. The method of clause 4, wherein: the first image is an input image at 0° rotation; and the second image is an input image at 180° rotation relative to the first image.6. The method of any one of clauses 2-5, wherein the latent space parameters include at least two CD latent space parameters.7. The method of clause 6, wherein calculating KPIs includes: calculating a first CD difference between a first image and a second image; calculating a second CD difference between the first image and the second image; and calculating a CD KPI as a maximum of the first CD difference and the second CD difference.8. The method of clause 7, wherein: the first image is an input image at 0° rotation; and the second image is an input image at 180° rotation relative to the first image.9. 
The method of any one of clauses 2-8, wherein: the first image is an input image at 0° rotation; and the second image is an input image at 90° rotation relative to the first image, 270° rotation relative to the first image, flipped in an x-direction, or flipped in a y-direction.

10. The method of any one of clauses 1-9, wherein calculating KPIs includes: calculating a reconstruction difference between an input image and a corresponding reconstructed image output by the VAE.

11. The method of clause 10, wherein calculating the reconstruction difference includes: applying a Gaussian filter to a difference between the reconstructed image and the input image; sorting image pixels in descending order of intensity; and selecting a percentage of the sorted image pixels.

12. The method of any one of clauses 1-11, wherein calculating KPIs includes: calculating a first edge placement (EP) difference between a first image at 0° rotation and a second image at 180° rotation relative to the first image; calculating a second EP difference between the first image and the second image; calculating an EP KPI as a maximum of the first EP difference and the second EP difference; calculating a first critical dimension (CD) difference between the first image and the second image; calculating a second CD difference between the first image and the second image; calculating a CD KPI as a maximum of the first CD difference and the second CD difference; and calculating a reconstruction difference between an input image and a corresponding reconstructed image output by the VAE.

13. The method of clause 12, wherein determining defects in the CPBI image based on the calculated KPIs includes: determining whether the reconstruction difference is greater than a first reconstruction threshold; and determining that a defect is present on a condition that the reconstruction difference is greater than the first reconstruction threshold.

14. The method of clause 13, wherein on a condition that the reconstruction difference is not greater than the first reconstruction threshold: determining whether the reconstruction difference is greater than a second reconstruction threshold; and determining that a defect is not present on a condition that the reconstruction difference is greater than the second reconstruction threshold.

15. The method of clause 14, wherein on a condition that the reconstruction difference is not greater than the second reconstruction threshold: determining whether the EP KPI is greater than an EP threshold; and determining that a defect is present on a condition that the EP KPI is greater than the EP threshold.

16. The method of clause 15, wherein on a condition that the EP KPI is not greater than the EP threshold: determining whether the CD KPI is greater than a CD threshold; and determining that a defect is present on a condition that the CD KPI is greater than the CD threshold.

17. The method of clause 16, wherein on a condition that the CD KPI is not greater than the CD threshold: determining whether the image clip is from a reclipped image; determining that a defect is not present on a condition that the image clip is from a reclipped image; and reclipping the image on a condition that the image clip is not from a reclipped image.

18. The method of clause 17, wherein reclipping the image includes: generating a plurality of reclipped images, wherein each reclipped image includes a portion of the image.

19. The method of clause 18, wherein generating the plurality of reclipped images includes: generating a first reclipped image including a top half of the image; generating a second reclipped image including a bottom half of the image; generating a third reclipped image including a left half of the image; and generating a fourth reclipped image including a right half of the image.

20. The method of clause 19, wherein generating the plurality of reclipped images further includes: generating a fifth reclipped image including an upper left portion of the image; generating a sixth reclipped image including an upper right portion of the image; generating a seventh reclipped image including a lower left portion of the image; and generating an eighth reclipped image including a lower right portion of the image.

21. The method of any one of clauses 1-20, wherein the VAE model is trained on images without defects.

22. The method of clause 21, wherein: the training images are randomly clipped to generate the training image clips; a first image clip is positioned at a 0° rotation relative to the training image; and a second image clip is positioned at a 180° rotation relative to the first image clip.

23. The method of clause 22, wherein the VAE model is configured to generate a set of latent space parameters based on the image clips.

24. The method of clause 23, wherein: the set of latent space parameters includes an edge placement parameter; and the VAE model is trained using a first loss function that is calculated based on minimizing a mean square error of a sum of the edge placement parameter in the first image clip and the edge placement parameter in the second image clip.

25. The method of clause 23, wherein: the set of latent space parameters includes a critical dimension parameter; and the VAE model is trained using a second loss function that is calculated based on minimizing a mean square error of a difference of the critical dimension parameter in the first image clip and the critical dimension parameter in the second image clip.

26. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for defect detection, the operations comprising any one of clauses 1-25.

27. An apparatus for performing defect detection, comprising: a memory storing a set of instructions; and at least one processor configured to execute the set of instructions to cause the apparatus to perform operations comprising any one of clauses 1-25.
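
By way of non-limiting illustration, the reconstruction difference of clause 11 and the reclipping of clauses 19 and 20 may be sketched as follows. The sketch is not taken from the disclosure. The function names, the use of an absolute pixel difference, the averaging of the selected pixels into a single value, the default sigma and percentage, the border handling, and the choice of quadrants for the "upper left", "upper right", "lower left", and "lower right" portions are all assumptions made for illustration.

```python
import math


def _gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at about 3 sigma (assumed)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_line(values, kernel):
    """Convolve one row or column, clamping indices at the borders (assumed)."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + k - radius, 0), last)]
            for k, w in enumerate(kernel))
        for i in range(len(values))
    ]


def gaussian_blur(image, sigma):
    """Separable Gaussian filter over a 2-D list of floats."""
    kernel = _gaussian_kernel(sigma)
    rows = [_blur_line(row, kernel) for row in image]
    cols = [_blur_line(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reconstruction_difference(input_image, reconstructed,
                              sigma=1.0, top_fraction=0.01):
    """Filter the difference image, sort pixels, keep the brightest fraction."""
    # Assumption: absolute per-pixel difference between reconstruction and input.
    diff = [[abs(r - x) for x, r in zip(in_row, rec_row)]
            for in_row, rec_row in zip(input_image, reconstructed)]
    smoothed = gaussian_blur(diff, sigma)
    # Sort pixels in descending order of intensity.
    ranked = sorted((p for row in smoothed for p in row), reverse=True)
    # Select a percentage of the sorted pixels (at least one pixel).
    keep = max(1, int(len(ranked) * top_fraction))
    # Assumption: the selected pixels are averaged into one scalar.
    return sum(ranked[:keep]) / keep


def reclip(image):
    """Return eight reclipped images: four halves and four quadrants."""
    height, width = len(image), len(image[0])
    mid_r, mid_c = height // 2, width // 2

    def crop(r0, r1, c0, c1):
        return [row[c0:c1] for row in image[r0:r1]]

    return {
        "top": crop(0, mid_r, 0, width),
        "bottom": crop(mid_r, height, 0, width),
        "left": crop(0, height, 0, mid_c),
        "right": crop(0, height, mid_c, width),
        # Assumption: the corner "portions" are taken as quadrants.
        "upper_left": crop(0, mid_r, 0, mid_c),
        "upper_right": crop(0, mid_r, mid_c, width),
        "lower_left": crop(mid_r, height, 0, mid_c),
        "lower_right": crop(mid_r, height, mid_c, width),
    }
```

In such a sketch, each reclipped image could be passed back through the VAE model and the KPI calculation. The hand-written filter is used only to keep the example self-contained; a library routine such as `scipy.ndimage.gaussian_filter` could be substituted.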

[0133] Block diagrams in the figures may illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer hardware or software products according to various exemplary embodiments of the present disclosure. In some embodiments, a non-transitory computer-readable medium is provided and can include instructions to perform the functions described in connection with any one or more of Figs. 6-11. In this regard, each block in a schematic diagram may represent certain arithmetical or logical operation processing that may be implemented using hardware such as an electronic circuit. Blocks may also represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical functions. It should be understood that in some alternative implementations, functions indicated in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may be executed or implemented substantially concurrently, or two blocks may sometimes be executed in reverse order, depending upon the functionality involved. Some blocks may also be omitted. It should also be understood that each block of the block diagrams, and combination of the blocks, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

[0134] It will be appreciated that the embodiments of the present disclosure are not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. The present disclosure has been described in connection with various embodiments, and other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the technology disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope of the invention being indicated by the following claims.

Claims

1. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computing device to cause the computing device to perform operations for defect detection, the operations comprising: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips; receiving latent space parameters from the VAE; calculating key performance indicators (KPIs) based on the received latent space parameters and a reconstruction error; and determining defects in the CPBI image based on the calculated KPIs.

2. The non-transitory computer readable medium of claim 1, wherein: the latent space parameters include at least two EP latent space parameters; and calculating KPIs includes: calculating a first EP difference between a first image and a second image; calculating a second EP difference between the first image and the second image; and calculating an EP KPI as a maximum of the first EP difference and the second EP difference.

3. The non-transitory computer readable medium of claim 2, wherein: the first image is an input image at 0° rotation; and the second image is an input image at 180° rotation relative to the first image.

4. The non-transitory computer readable medium of claim 1, wherein: the latent space parameters include at least two CD latent space parameters; and calculating KPIs includes: calculating a first CD difference between a first image and a second image; calculating a second CD difference between the first image and the second image; and calculating a CD KPI as a maximum of the first CD difference and the second CD difference.

5. The non-transitory computer readable medium of claim 4, wherein: the first image is an input image at 0° rotation; and the second image is an input image at 180° rotation relative to the first image.

6. The non-transitory computer readable medium of claim 1, wherein calculating KPIs includes: calculating a reconstruction difference between an input image and a corresponding reconstructed image output by the VAE.

7. The non-transitory computer readable medium of claim 1, wherein calculating KPIs includes: calculating a first edge placement (EP) difference between a first image at 0° rotation and a second image at 180° rotation relative to the first image; calculating a second EP difference between the first image and the second image; calculating an EP KPI as a maximum of the first EP difference and the second EP difference; calculating a first critical dimension (CD) difference between the first image and the second image; calculating a second CD difference between the first image and the second image; calculating a CD KPI as a maximum of the first CD difference and the second CD difference; and calculating a reconstruction difference between an input image and a corresponding reconstructed image output by the VAE.

8. The non-transitory computer readable medium of claim 7, wherein determining defects in the CPBI image based on the calculated KPIs includes: determining whether the reconstruction difference is greater than a first reconstruction threshold; and determining that a defect is present on a condition that the reconstruction difference is greater than the first reconstruction threshold.

9. The non-transitory computer readable medium of claim 8, wherein on a condition that the reconstruction difference is not greater than the first reconstruction threshold: determining whether the reconstruction difference is greater than a second reconstruction threshold; and determining that a defect is not present on a condition that the reconstruction difference is greater than the second reconstruction threshold.

10. The non-transitory computer readable medium of claim 9, wherein on a condition that the reconstruction difference is not greater than the second reconstruction threshold: determining whether the EP KPI is greater than an EP threshold; and determining that a defect is present on a condition that the EP KPI is greater than the EP threshold.

11. The non-transitory computer readable medium of claim 10, wherein on a condition that the EP KPI is not greater than the EP threshold: determining whether the CD KPI is greater than a CD threshold; and determining that a defect is present on a condition that the CD KPI is greater than the CD threshold.

12. The non-transitory computer readable medium of claim 11, wherein on a condition that the CD KPI is not greater than the CD threshold: determining whether the image clip is from a reclipped image; determining that a defect is not present on a condition that the image clip is from a reclipped image; and reclipping the image on a condition that the image clip is not from a reclipped image.

13. The non-transitory computer readable medium of claim 12, wherein reclipping the image includes: generating a plurality of reclipped images, wherein each reclipped image includes a portion of the image.

14. An apparatus for performing defect detection, comprising: a memory storing a set of instructions; and at least one processor configured to execute the set of instructions to cause the apparatus to perform operations comprising: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips; receiving latent space parameters from the VAE; calculating key performance indicators (KPIs) based on the received latent space parameters and a reconstruction error; and determining defects in the CPBI image based on the calculated KPIs.

15. A method of defect detection, comprising: inputting an image clip of a charged-particle beam inspection (CPBI) image to a variational autoencoder (VAE) model that has been trained with multiple training image clips; receiving latent space parameters from the VAE; calculating key performance indicators (KPIs) based on the received latent space parameters and a reconstruction error; and determining defects in the CPBI image based on the calculated KPIs.
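
By way of non-limiting illustration, the KPI calculation of claims 2 to 5 and the threshold sequence of claims 8 to 12 may be sketched as follows. The sketch is one possible reading and is not taken from the disclosure. The names `ep_kpi`, `cd_kpi`, `Thresholds`, `Verdict`, and `classify_clip` are hypothetical, as are any threshold values. The claims do not state how each EP or CD difference is formed; the sketch assumes one difference per latent space parameter, taken as the magnitude of a sum for EP and the magnitude of a difference for CD between the 0° and 180° images, mirroring the loss functions of clauses 24 and 25. The claims also do not state an ordering of the two reconstruction thresholds, and the sketch assumes none.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    DEFECT = "defect"
    NO_DEFECT = "no defect"
    RECLIP = "reclip"


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container; the claims name the thresholds but give no values."""
    recon_first: float
    recon_second: float
    ep: float
    cd: float


def ep_kpi(ep_at_0, ep_at_180):
    """EP KPI as the maximum of the per-parameter EP differences.

    Assumption: each difference is |ep(0 deg) + ep(180 deg)|, which is
    zero when the EP parameter changes sign under a 180 degree rotation.
    """
    return max(abs(a + b) for a, b in zip(ep_at_0, ep_at_180))


def cd_kpi(cd_at_0, cd_at_180):
    """CD KPI as the maximum of the per-parameter CD differences.

    Assumption: each difference is |cd(0 deg) - cd(180 deg)|, which is
    zero when the CD parameter is unchanged by a 180 degree rotation.
    """
    return max(abs(a - b) for a, b in zip(cd_at_0, cd_at_180))


def classify_clip(recon_diff, ep_value, cd_value, thresholds, is_reclipped):
    """Apply the checks in the order recited in claims 8 to 12."""
    if recon_diff > thresholds.recon_first:        # claim 8
        return Verdict.DEFECT
    if recon_diff > thresholds.recon_second:       # claim 9
        return Verdict.NO_DEFECT
    if ep_value > thresholds.ep:                   # claim 10
        return Verdict.DEFECT
    if cd_value > thresholds.cd:                   # claim 11
        return Verdict.DEFECT
    # Claim 12: a clip that already came from a reclipped image is
    # treated as defect-free; otherwise the image is reclipped.
    return Verdict.NO_DEFECT if is_reclipped else Verdict.RECLIP
```

A caller might, for example, compute `ep_kpi` and `cd_kpi` from the latent space parameters of the 0° and 180° inputs, pass them to `classify_clip` with a reconstruction difference, and, on a `Verdict.RECLIP` result, repeat the procedure for each reclipped image with `is_reclipped=True`.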