Multi-wavelength substrate defect analysis

A multi-wavelength inspection system with AI-enhanced defect analysis addresses the sensitivity-throughput tradeoff, enhancing defect detection and classification efficiency in substrate manufacturing.

WO2026142955A1 · PCT designated stage · Publication Date: 2026-07-02 · APPLIED MATERIALS INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: APPLIED MATERIALS INC
Filing Date: 2025-12-19
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Conventional substrate inspection systems face a tradeoff between sensitivity and throughput, often compromising on one or the other, and struggle to efficiently detect and classify various types of defects using single illumination sources.

Method used

Implementing a multi-wavelength inspection system that utilizes multiple illumination sources and artificial intelligence models to analyze substrate defects, combining spatially segmented data and background-corrected images to enhance defect detection and classification.

Benefits of technology

Improves defect detection accuracy and throughput by effectively identifying and classifying defects across different types and materials, reducing the likelihood of defects and associated costs, and optimizing manufacturing processes.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure US2025060595_02072026_PF_FP_ABST
Patent Text Reader

Abstract

A method includes obtaining a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. The method further includes obtaining a plurality of reference frames. Each of the reference frames is associated with one of the images. The method further includes determining a defect of the substrate, based on the plurality of images and the plurality of reference frames. The method further includes providing an alert to a user indicative of the defect.

Description

Attorney Docket No.: 36119.2956 (L2093PCT)

MULTI-WAVELENGTH SUBSTRATE DEFECT ANALYSIS

TECHNICAL FIELD

[0001] Embodiments of the present disclosure relate to methods associated with analysis of defects in substrate processing. Specifically, embodiments of the present disclosure relate to multi-wavelength analysis of defects in substrate processing.

BACKGROUND

[0002] Products may be produced by performing one or more manufacturing processes using manufacturing equipment. For example, semiconductor manufacturing equipment may be used to produce substrates via semiconductor manufacturing processes. Products are to be produced with particular properties, suited for a target application. Product properties may include repeatability, e.g., freedom of products from defects. Machine learning models are used in various process control and predictive functions associated with manufacturing equipment. Machine learning models are trained using data associated with the manufacturing equipment. Output of machine learning models may be associated with the generation of substrate defects.

SUMMARY

[0003] The following is a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended to neither identify key or critical elements of the disclosure, nor delineate any scope of the particular embodiments of the disclosure or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.

[0004] In one aspect of the present disclosure, a method includes obtaining a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. The method further includes obtaining a plurality of reference frames. Each of the reference frames is associated with one of the images. The method further includes determining a defect of the substrate, based on the plurality of images and the plurality of reference frames. The method further includes providing an alert to a user indicative of the defect.

[0005] In another aspect of the disclosure, a method includes obtaining a plurality of inspection images. The plurality of inspection images includes images associated with a plurality of inspection wavelengths. The plurality of inspection images further includes images associated with a plurality of substrates. The method further includes obtaining a plurality of defect location data associated with the plurality of substrates. The method further includes training an artificial intelligence model to determine defect location data based on a set of inspection images, by providing the plurality of inspection images as training input and the plurality of defect location data as target output.

[0006] In another aspect of the disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions which, when executed, cause a processing device to perform operations. The operations include obtaining a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. The operations further include obtaining a plurality of reference frames. Each of the reference frames is associated with one of the images. The operations further include determining a defect of the substrate, based on the plurality of images and the plurality of reference frames. The operations further include providing an alert to a user indicative of the defect.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The present disclosure is illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings.

[0008] FIG. 1 is a block diagram illustrating an exemplary system architecture, according to some embodiments.

[0009] FIG. 2 depicts a block diagram of a system including an example data set generator for creating data sets for one or more supervised models, according to some embodiments.

[0010] FIG. 3 is a block diagram illustrating a system for generating output data, according to some embodiments.

[0011] FIG. 4A is a flow diagram of a method for generating a data set for a machine learning model, according to some embodiments.

[0012] FIG. 4B is a flow diagram of a method for performing multi-wavelength defect analysis, according to some embodiments.

[0013] FIG. 4C is a flow diagram of a method for training an artificial intelligence model to perform multi-wavelength substrate defect analysis, according to some embodiments.

[0014] FIG. 5 depicts a logical flow for performance of multi-wavelength defect analysis, according to some embodiments.

[0015] FIG. 6 is a block diagram illustrating a computer system, according to some embodiments.

DETAILED DESCRIPTION

[0016] Described herein are technologies related to improving processes of substrate manufacturing, in particular by reducing substrate defects. Manufacturing equipment is used to produce products, such as substrates (e.g., semiconductor wafers). Manufacturing equipment may include a manufacturing or processing chamber to separate the substrate from the environment. The properties of produced substrates are to meet target values to facilitate specific functionalities. Manufacturing parameters are selected to produce substrates that meet the target property values. Many manufacturing parameters (e.g., hardware parameters, process parameters, etc.) contribute to the properties of processed substrates. Manufacturing systems may control parameters by specifying a set point for a property value and receiving data from sensors disposed within the manufacturing chamber, and making adjustments to the manufacturing equipment until the sensor readings match the set point. Adjustments made to the manufacturing equipment may be made based on one or more metrics.

[0017] Various types of models may be applied in several ways associated with processing chambers and / or manufacturing equipment. Models applicable to a process chamber may include a physics-based model, a digital twin model, a statistical model, a machine learning model, or the like.

[0018] In some systems, substrate defects may occur during processing. The substrate defects may occur in connection with one or more of process parameters (e.g., process recipe), hardware parameters (e.g., process tool equipment constants), installed hardware components, chamber chemistry, or other constraints affecting substrate processing operations. It may be the focus of considerable effort to analyze root causes of defects, predict defect formation, and correct defect sources to improve consistency of substrate processing procedures.

[0019] Defects may be of various types (e.g., pits, scratches, particles, etc.), be provided from various sources, be related to various procedures, or the like. In some systems, particle defects may be a concern. A particle defect may occur when a particle of material is liberated from some location of a processing system, and falls onto a substrate undergoing processing, which may impact substrate performance, interrupt or interfere with further process operations performed on the substrate, etc. In some systems, a scratch, irregularity, substrate damage, warping, or another defect may be of interest. Determining presence of a defect, location of a defect, and / or classification of a defect may be performed via imaging, e.g., based on one or more inspection images of the substrate.

[0020] In some systems, inspection images may be generated and analyzed based on a single illumination source, type, or wavelength. In some systems, multiple illumination types may be considered separately, e.g., based on expected performance for a target defect type, substrate design, or the like.

[0021] In some systems, an inspection system may target high sensitivity and / or high throughput for locating and / or classifying substrate defects. Of further utility in an inspection system is non-specificity, e.g., an inspection system that is efficient at identifying many types of defects in many substrate materials, designs, or the like. There may be a tradeoff in conventional systems between these goals, as well as cost and / or convenience of manufacturing or operating the inspection apparatus. For example, higher throughput devices may provide lower sensitivity.

[0022] In some systems, selecting a single illumination metric may represent a compromise or a targeting of a particular defect or set of defects. For example, different defects may provide different responses to various illumination types (e.g., illumination wavelengths). Different defect types, defect materials, substrate properties, or the like may cause a defect to appear more strongly in an inspection system based on one illumination type than another type. Some more complex inspection systems may utilize multiple illuminations (e.g., different wavelength sources, filters applied to illumination or detection side of the system, or the like), which may improve applicability of the inspection system but may result in poor throughput and at best a lack of improvement to sensitivity compared to comparable single illumination systems.

[0023] An inspection system may provide illumination to a target region of a substrate for imaging and / or analyzing substrate defects. The illumination source may be selected to target a particular combination of defect type, defect size, substrate material, substrate design, etc., of interest.

[0024] Aspects of the present disclosure may address one or more of the shortcomings of conventional solutions. In some embodiments, an inspection system includes multiple illumination sources (e.g., different sources of electromagnetic radiation, different filtering schemes on the illumination or detection side, wavelength-sensitive detection, or the like). A set of inspection images may be obtained. The set may include spatially segmented data (e.g., an inspection image may correspond to a portion of a substrate, and multiple inspection images may be generated to map the entirety of the substrate). The set may include data segmented by wavelength (e.g., by scanning illumination source, scanning filtering, utilizing color-dependent or wavelength-dependent detection, or the like). Each inspection image may be associated with a target substrate region and a target wavelength or wavelength profile. Inspection images associated with multiple wavelengths (e.g., multiple wavelength profiles) and the same region of the substrate may be utilized together in determining whether a defect is present, classifying the defect, performing a corrective action based on the defect, or the like.
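The association of each inspection image with a substrate region and a wavelength can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the `InspectionImage` class, the `(row, col)` region index, and the wavelength values are hypothetical names and numbers chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionImage:
    region: tuple          # hypothetical (row, col) index of a substrate tile
    wavelength_nm: float   # hypothetical illumination wavelength label
    pixels: tuple          # image data, stored here as a tuple of row tuples


def group_by_region(images):
    """Map each substrate region to its images, keyed by wavelength."""
    grouped = defaultdict(dict)
    for img in images:
        grouped[img.region][img.wavelength_nm] = img
    return dict(grouped)


def complete_regions(grouped, wavelengths):
    """Return, in sorted order, the regions imaged at every requested wavelength."""
    needed = set(wavelengths)
    return sorted(r for r, by_wl in grouped.items() if needed <= set(by_wl))


# Illustrative data only: two wavelengths for tile (0, 0), one for tile (0, 1).
images = [
    InspectionImage((0, 0), 405.0, ((1, 2), (3, 4))),
    InspectionImage((0, 0), 532.0, ((1, 2), (3, 5))),
    InspectionImage((0, 1), 405.0, ((0, 0), (0, 0))),
]
grouped = group_by_region(images)
```

Under these assumptions, only regions returned by `complete_regions` would be ready for joint multi-wavelength analysis; how incomplete regions are handled is left open here.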

[0025] In some embodiments, an inspection image (e.g., a new image frame) is obtained by a processing system. A reference frame may be generated and used for background cancelation, e.g., to highlight any unusual structures such as defects. Generation of the reference frame and background subtraction may include calibration, pre-processing, registration, averaging, and other techniques. Areas of concern may be determined based on the background-corrected images. One or more peak finding tools may be utilized, including thresholding, difference of Gaussian techniques, Laplacian of Gaussian techniques, top-hat filtering, template matching, Hough circle transform, or other techniques used for finding areas of interest in the background-corrected image.
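As a rough illustration of background correction followed by peak finding, the Python sketch below subtracts a reference frame pixel by pixel and then applies simple thresholding combined with a local-maximum check. It covers only the thresholding option from the list above; the function names, the 8-connected neighborhood, and the threshold value are assumptions, and registration and calibration are taken as already done.

```python
def subtract_background(image, reference):
    """Pixel-wise difference between an inspection frame and its reference frame."""
    return [[p - r for p, r in zip(img_row, ref_row)]
            for img_row, ref_row in zip(image, reference)]


def find_peaks(corrected, threshold):
    """Return (row, col) of pixels above `threshold` that are also local maxima
    within their 8-connected neighborhood."""
    rows, cols = len(corrected), len(corrected[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = corrected[r][c]
            if value <= threshold:
                continue
            neighbors = [corrected[rr][cc]
                         for rr in range(max(0, r - 1), min(rows, r + 2))
                         for cc in range(max(0, c - 1), min(cols, c + 2))
                         if (rr, cc) != (r, c)]
            if all(value >= n for n in neighbors):
                peaks.append((r, c))
    return peaks


# Illustrative 4x4 frame with two bright spots on a flat background of 10.
image = [
    [10, 11, 10, 10],
    [10, 40, 12, 10],
    [10, 12, 10, 10],
    [10, 10, 10, 30],
]
reference = [[10] * 4 for _ in range(4)]
corrected = subtract_background(image, reference)
peaks = find_peaks(corrected, threshold=15)  # [(1, 1), (3, 3)]
```

A plateau of equal values would be reported more than once by this check; a production peak finder would need a tie-breaking rule.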

[0026] A number of inspection images may be treated similarly. The inspection images may be grouped by area of the substrate, e.g., many images may be used to represent an entire surface or area of interest of the substrate. The inspection images may further be grouped by wavelength. The same techniques may be applied to determine locations of anomalies (e.g., potential defects) in the images and corresponding locations on the substrate(s).

[0027] Potential defect locations may be gathered and grouped together into presumptive defects. For example, defect locations from different wavelength images within a radius of each other may be grouped together, as they may be deemed likely to result from the same on-substrate anomaly. For example, a first provisional location may be identified based on a first wavelength, a second provisional location may be identified based on a second wavelength, and an aggregated location may be generated based on the provisional locations.
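One possible way to group provisional locations is sketched below in Python. The greedy centroid-based merge, the use of the mean as the aggregated location, and the labels `wl_1` and `wl_2` are illustrative assumptions; the disclosure does not specify a particular grouping algorithm.

```python
import math


def group_detections(detections, radius):
    """Greedily merge detections that lie within `radius` of a group's
    running centroid.

    detections: list of (x, y, wavelength_label) tuples.
    Returns a list of dicts holding the aggregated location and the members.
    """
    groups = []
    for x, y, wl in detections:
        for g in groups:
            cx, cy = g["location"]
            if math.hypot(x - cx, y - cy) <= radius:
                g["members"].append((x, y, wl))
                n = len(g["members"])
                # Aggregated location: mean of the provisional locations.
                g["location"] = (sum(m[0] for m in g["members"]) / n,
                                 sum(m[1] for m in g["members"]) / n)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append({"location": (x, y), "members": [(x, y, wl)]})
    return groups


# Illustrative detections: two nearby hits at different wavelengths, one far away.
detections = [
    (10.0, 10.0, "wl_1"),
    (11.0, 10.0, "wl_2"),
    (50.0, 50.0, "wl_1"),
]
groups = group_detections(detections, radius=2.0)
```

Here the first two detections merge into one presumptive defect at (10.5, 10.0) and the third remains separate. A greedy pass depends on input order; a clustering method that does not would be a reasonable alternative.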

[0028] In some embodiments, a table of congregated defects may be generated. Signal intensity may be collected in relation to each presumptive defect at each inspection wavelength. For example, an N-digit intensity vector may be generated, where N is the number of inspection wavelengths, and each digit comprises an intensity value of the defect in the corresponding wavelength inspection image (e.g., background-corrected inspection image). The N-digit intensity vector may be utilized for various detection and classification operations. For example, the N-digit intensity vector may be used to determine whether the abnormality is predicted to be a defect, or an artifact. For example, if the intensity is low in most wavelengths, it may be determined that the signal is an artifact or nuisance signal. An expected intensity profile across wavelengths may be generated (e.g., based on modeling, empirical data, or the like) corresponding to one or more expected defect responses. For example, a codebook of N-digit intensity vectors may be generated corresponding to one or more expected, predicted, or target types of defects. The codebook may include predicted defect signatures. A distance metric between the expected intensity profile(s) and the measured intensity profile (e.g., the N-digit intensity vector) may be utilized to determine whether the abnormality corresponds to a defect or artifact, may be utilized to classify defect type, or the like.
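The codebook comparison can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the Euclidean distance metric, the use of mean intensity as the "low in most wavelengths" test, the cutoff parameters, and the signature values for N = 3.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(vector, codebook, min_mean_intensity, max_distance):
    """Label a measured N-digit intensity vector.

    Returns "artifact" when the mean intensity is weak or when no codebook
    signature is close enough; otherwise returns the nearest signature name.
    """
    if sum(vector) / len(vector) < min_mean_intensity:
        return "artifact"
    name, distance = min(
        ((n, euclidean(vector, sig)) for n, sig in codebook.items()),
        key=lambda item: item[1],
    )
    return name if distance <= max_distance else "artifact"


# Hypothetical signatures for N = 3 inspection wavelengths (not from the source).
codebook = {
    "particle": (0.9, 0.8, 0.7),
    "scratch": (0.2, 0.9, 0.3),
}
label = classify((0.85, 0.75, 0.7), codebook,
                 min_mean_intensity=0.3, max_distance=0.5)
```

With these illustrative numbers the measured vector sits closest to the "particle" signature. Other distance metrics (e.g., cosine distance) would fit the same structure.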

[0029] In some embodiments, the N-digit intensity vector may be simplified, e.g., binarized (for example, intensities above a threshold value may be converted to one, while intensities below the threshold value are converted to zero). The simplified N-digit intensity vector may be compared to reference vectors (e.g., a codebook), used for distance calculations, or the like for classification of abnormalities or defects.
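A minimal Python sketch of the binarized comparison is shown below. The Hamming distance, the treatment of values exactly at the threshold (mapped to zero here), and the binary signatures are assumptions, since the text leaves these choices open.

```python
def binarize(vector, threshold):
    """Convert intensities above `threshold` to 1 and all others to 0.
    Values exactly at the threshold map to 0 (an assumption)."""
    return tuple(1 if v > threshold else 0 for v in vector)


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_code(bits, binary_codebook):
    """Return (name, distance) of the closest binary signature."""
    return min(((name, hamming(bits, code))
                for name, code in binary_codebook.items()),
               key=lambda item: item[1])


# Hypothetical binary signatures for N = 4 inspection wavelengths.
binary_codebook = {"particle": (1, 1, 1, 0), "scratch": (0, 1, 0, 0)}
bits = binarize((0.8, 0.7, 0.6, 0.1), threshold=0.5)
match = nearest_code(bits, binary_codebook)
```

In this example the binarized vector matches the "particle" signature exactly, giving a distance of zero.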

[0030] In some embodiments, one or more of the operations described may be performed by a trained artificial intelligence (e.g., machine learning) model. For example, operations directed toward identifying image anomalies (e.g., spot finding or peak finding operations performed on background-corrected inspection images) may be performed by trained artificial intelligence models. In some embodiments, defect classification may be performed by a trained artificial intelligence model. For example, an N-digit intensity vector may be provided to a model, and the model may predict what type of defect the vector corresponds to (in some embodiments with some additional input, such as substrate material, substrate design, etc.). In some embodiments, artificial intelligence may be utilized to set one or more boundary conditions. For example, a combination of average N-digit vector intensity and distance from a target vector (e.g., corresponding to an expected defect response) may be used to determine whether a signal is an artifact or a likely defect. An artificial intelligence model may be trained to set such a boundary, which may be of complex shape, based on training data indicating false defect detections and true defect detections. In some embodiments, grouping anomalies into a smaller number of defects (e.g., anomalies whose calculated locations differ slightly between different wavelength inspections) may be performed or augmented by artificial intelligence. For example, a radius within which two anomalies may be considered to be the same defect may be tuned for one or more applications by artificial intelligence, or an artificial intelligence model may be provided with data indicative of anomaly locations (e.g., potentially further including additional information, such as wavelength of illumination or intensity), and provide predictions of groups of anomalies that are likely to correspond to the same defects.
In some embodiments, a radius or other metric for grouping of anomalies may be adjusted on-the-fly, e.g., based on quality checks on registration, background subtraction, image quality, or the like. In some embodiments, a reference frame may be generated based on one or more other (e.g., adjacent) structures, dies, patterns, cells, or the like to reference a target area of the substrate, e.g., for background subtraction. In some embodiments, an artificial intelligence model may be utilized to determine which available candidates may be of use in generating a reference frame, may be utilized to determine a combination method (e.g., maximum intensity, averaging, median, etc.) for generating the reference image, or the like. In some embodiments, image registration operations (e.g., ensuring two images are well aligned, such as for the purpose of background removal) may be performed and / or augmented by trained artificial intelligence models. In some embodiments, one or more artificial intelligence models may provide a confidence output indicating a level of certainty that the output is accurate. Confidence intervals may be utilized, e.g., to determine parameters for additional processing steps.
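The combination methods named above for building a reference frame (maximum intensity, averaging, median) can be compared with a short Python sketch. It assumes the candidate frames are already registered to each other and of equal size; the function name and the sample data are hypothetical, and the choice of method, which the text suggests a model might make, is passed in by hand here.

```python
from statistics import mean, median

# Supported pixel-wise combination methods.
COMBINERS = {"mean": mean, "median": median, "max": max}


def build_reference(candidates, method="median"):
    """Combine aligned candidate frames pixel by pixel into one reference frame."""
    combine = COMBINERS[method]
    rows, cols = len(candidates[0]), len(candidates[0][0])
    return [[combine([frame[r][c] for frame in candidates])
             for c in range(cols)]
            for r in range(rows)]


# Three hypothetical neighboring-die frames; the second has a bright outlier.
neighbors = [
    [[10, 10], [10, 10]],
    [[10, 90], [10, 10]],
    [[12, 10], [10, 10]],
]
reference = build_reference(neighbors, "median")
reference_max = build_reference(neighbors, "max")
```

With this data the median suppresses the single bright outlier, while the maximum carries it into the reference frame, which illustrates why the combination method may matter when a candidate frame itself contains a defect.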

[0031] In some embodiments, one or more artificial intelligence models may perform a significant portion of operations related to defect detection. For example, a set of inspection images may be provided to a trained machine learning model. The set of images may be raw, pre-processed, background-corrected, or the like for various applications. The set of inspection images may include images generated based on multiple wavelengths (e.g., multiple wavelength bands, multiple illumination sources or schemes, etc.). An artificial intelligence model (e.g., a convolutional neural network, a U-Net model, or the like) may obtain the images, and output one or more outputs related to defect detections. The output may include predicted defect locations, a map of defect locations, a down-sampled binarized map of defect locations, classification of defects, or the like.
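To make the "down-sampled binarized map of defect locations" concrete, the Python sketch below converts a list of defect pixel coordinates into a coarse binary grid. It shows only the output format, not the model that would predict it; the function name, the block-wise down-sampling rule, and the sample coordinates are assumptions.

```python
def downsampled_defect_map(defects, height, width, factor):
    """Build a coarse binary grid: a cell is 1 if any defect falls inside it.

    defects: iterable of (row, col) pixel coordinates in the full-size image.
    factor: edge length, in pixels, of each coarse cell.
    """
    out_h = -(-height // factor)  # ceiling division
    out_w = -(-width // factor)
    grid = [[0] * out_w for _ in range(out_h)]
    for r, c in defects:
        if 0 <= r < height and 0 <= c < width:
            grid[r // factor][c // factor] = 1
    return grid


# Two illustrative defects in an 8x8 image, down-sampled by a factor of 4.
target = downsampled_defect_map([(1, 1), (6, 7)], height=8, width=8, factor=4)
```

A grid of this kind could plausibly serve as target output when training a model on multi-wavelength input, although the disclosure does not prescribe this construction.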

[0032] Systems and methods of the present disclosure provide technological advantages over conventional methods. Performing modeling operations to predict defect locations based on multi-wavelength input may improve performance of an inspection system, in particular when multiple types of defects, materials, or the like may be of interest to detect. Improving detection of defects enables performance of corrective actions, correction of process recipes, indication of maintenance to be performed on the process chamber, adjustments to improve one or more metrics of interest in the process, etc. Reducing likelihood of defects may increase a likelihood of developing products that meet performance thresholds, increase efficiency of processing in terms of throughput, material cost, energy cost, environmental impact, etc., reduce costs associated with disposing of defective products, reduce wear and tear on substrate processing equipment, etc. Performing multi-wavelength analysis operations to determine substrate defects may improve efficiency (e.g., accuracy and / or throughput) of correcting defect root causes compared to other methods. Increased efficiency in correcting root causes may include reduced chamber down time or maintenance time, more time at peak chamber productivity, etc. Performing multi-wavelength analysis to detect defects may reduce costs of determining defect root causes relative to experimental methods, by avoiding costs associated with performing experiments to determine a likelihood of defect formation, such as costs associated with process materials, substrate materials, energy expenditure, time, environmental impact, costs associated with disposing of test substrates, etc.

[0033] In one aspect of the present disclosure, a method includes obtaining a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. The method further includes obtaining a plurality of reference frames. Each of the reference frames is associated with one of the images. The method further includes determining a defect of the substrate, based on the plurality of images and the plurality of reference frames. The method further includes providing an alert to a user indicative of the defect.

[0034] In another aspect of the disclosure, a method includes obtaining a plurality of inspection images. The plurality of inspection images includes images associated with a plurality of inspection wavelengths. The plurality of inspection images further includes images associated with a plurality of substrates. The method further includes obtaining a plurality of defect location data associated with the plurality of substrates. The method further includes training an artificial intelligence model to determine defect location data based on a set of inspection images, by providing the plurality of inspection images as training input and the plurality of defect location data as target output.

[0035] In another aspect of the disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions which, when executed, cause a processing device to perform operations. The operations include obtaining a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. The operations further include obtaining a plurality of reference frames. Each of the reference frames is associated with one of the images. The operations further include determining a defect of the substrate, based on the plurality of images and the plurality of reference frames. The operations further include providing an alert to a user indicative of the defect.

[0036] FIG. 1 is a block diagram illustrating an exemplary system 100 (exemplary system architecture), according to some embodiments. The system 100 includes a client device 120, manufacturing equipment 124, metrology equipment 128, predictive server 112, and data store 140. The predictive server 112 may be part of predictive system 110. Predictive system 110 may further include server machines 170 and 180. Manufacturing equipment 124 may include one or more sensors, which may include an inspection system for performing multi-wavelength inspection of substrates.

[0037] Manufacturing equipment 124 may include one or more process tools, process chambers, or the like for performing processing operations to manufacture substrates. Substrates may have property values (film thickness, film strain, etc.) measured by metrology equipment 128. Metrology data 160 may be a component of data store 140. Metrology data 160 may include historical metrology data (e.g., metrology data associated with previously processed products). In some embodiments, historical metrology data may be used in training a machine learning model, in calibrating a physics-based model, in generating a reduced-order model, or the like. For example, historical metrology data may include defect classification data, defect location data, etc., which may be utilized in preparing analysis techniques for defect identification. Historical metrology data may be utilized in determining a historical likelihood of developing substrate defects, and the historical likelihood may be utilized in generating a machine learning model, in calibrating a physics-based model, in determining whether to use a model in association with a process of interest, or the like.

[0038] Metrology data 160 may be provided by instruments separate from a manufacturing mainframe, e.g., substrates may be measured at a standalone metrology facility. In some embodiments, metrology data 160 may be provided without use of a standalone metrology facility, e.g., in-situ metrology data (e.g., metrology or a proxy for metrology collected during processing), integrated metrology data (e.g., metrology or a proxy for metrology collected while a product is within a chamber or under vacuum, but not during processing operations), inline metrology data (e.g., data collected after a substrate is removed from vacuum), etc. A substrate inspection system may be included as integrated, in-situ, or inline metrology, in some embodiments. Metrology data 160 may include current metrology data (e.g., metrology data associated with a product currently or recently processed). Current metrology data may be provided to update one or more models in association with defect root cause correction, e.g., by updating weights or biases of a machine learning or artificial intelligence model, updating parameters of a physics-based model, updating coefficients of a reduced order model, or the like.

[0039] Data store 140 may further include manufacturing parameters 150. Manufacturing parameters 150 may include parameters associated with performing substrate processing procedures, such as recipe data (e.g., process parameters), equipment constants (e.g., hardware parameters, parameters determining how operations of manufacturing equipment 124 are performed), indications of installed hardware components, or the like. Manufacturing parameter data, similar to metrology data 160, may include historical parameters and current parameters. Historical parameters may be utilized in generating a model (e.g., one or more models 190) for defect correction, e.g., to be used to reduce a likelihood of developing a particle defect during substrate processing. Manufacturing parameters 150 may include data associated with upcoming processes, e.g., recipe data, which may be adjusted responsive to analysis of one or more substrate defects via a multi-wavelength defect analysis operation.

[0040] Data store 140 may further include inspection image data 154. Inspection image data 154 may include substrate images, pre-processed images and pre-processing parameters, registration parameters, reference frames, background-corrected substrate images, or other data for use in performing multi-wavelength defect analysis.

[0041] In some embodiments metrology data 160, inspection image data 154, and / or manufacturing parameters 150 may be processed (e.g., by the client device 120 and / or by the predictive server 112). Processing of the data may include generating features. In some embodiments, the features are a pattern in the metrology data 160, inspection image data 154, and / or manufacturing parameters 150 (e.g., slope, width, height, peak, etc.) or a combination of values from the metrology data and / or manufacturing parameters (e.g., power derived from voltage and current, etc.). Inspection image data 154 may include features and the features may be used by predictive component 114 for performing signal processing and / or for obtaining predictive data 168 for performance of a corrective action. Predictive data 168 may include indications of defect locations, defect classifications, etc., based on multi-wavelength inspection image defect analysis.

[0042] Each instance of metrology data 160, inspection image data 154, and / or manufacturing parameters 150 may correspond to a product, a set of manufacturing equipment, a type of substrate produced by manufacturing equipment, or the like. A model 190 may also be associated with a particular product, substrate design, set of manufacturing equipment, design of manufacturing chamber, or the like. For example, a defect analysis model may be associated with a particular product or substrate design, a reduced order or machine learning model may be generated based on data from a particular design of chamber or a specific specimen of process chamber (e.g., to account for differences between nominally identical chambers), or the like. The data store may further store information associating sets of different data types, e.g., information indicative that a set of sensor data, a set of metrology data, and a set of manufacturing parameters are all associated with the same product, manufacturing equipment, type of substrate, etc.

[0043] In some embodiments, a processing device (e.g., via a model) may be used to generate predictive data 168. Predictive data 168 may include predictions of defect locations and / or classifications based on inspection images. Predictive data 168 may include one or more indications of predicted improvements to a processing operation (e.g., to improve efficiency, to reduce a likelihood of generating defects on substrate, or the like). Predictive data 168 may be utilized by system 100 for performance of a corrective action (e.g., providing alerts to a user, updating process recipes, updating manufacturing parameters, scheduling maintenance, or the like).

[0044] In some embodiments, predictive system 110 may generate predictive data 168 utilizing a physics-based model. A physics-based model may include a mathematical representation of the laws of nature at play in the process chamber. The physics-based model may be a first principles model, an approximate model, or the like. The physics-based model may include a representation or parameterization of chamber geometry, pumping parameters, gas flow parameters, or the like. A physics-based model may include one or more parameters that are allowed to be adjusted to fit the physics-based model to data, e.g., historical metrology data 164, e.g., to account for details of physics of the process chamber not captured by the original model parameters.

[0045] In some embodiments, predictive system 110 may generate predictive data 168 utilizing a reduced order model. A reduced order model may include a simplified version of a complex model (e.g., a simplified version of a first-principles computational model). The reduced order model may mimic the performance of the full model under a target range of conditions (e.g., relevant to substrate processing conditions), while being more computationally efficient. Training data (e.g., historical metrology data 164, historical parameters 152, etc.) may be utilized in determining which simplifications from a more complete model to make, in determining coefficients of a reduced order model, or the like.
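
As an illustrative, non-limiting sketch of determining coefficients of a reduced order model, the following Python fragment fits a linear surrogate to samples of a more expensive model over a target range. The function `full_model`, the linear form, and the range of 1.0 to 2.0 are assumptions made for illustration only and are not taken from the disclosure.

```python
import math


def full_model(x: float) -> float:
    """Hypothetical stand-in for an expensive first-principles calculation."""
    return math.exp(0.3 * x)


def fit_linear_rom(xs, ys):
    """Ordinary least squares fit of y ~ a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Sample the full model only inside the assumed target range (1.0 to 2.0).
xs = [1.0 + 0.1 * i for i in range(11)]
ys = [full_model(x) for x in xs]
a, b = fit_linear_rom(xs, ys)


def reduced_model(x: float) -> float:
    """Cheap surrogate, intended to be trusted only inside the sampled range."""
    return a + b * x


# Worst-case disagreement between surrogate and full model on the samples.
max_err = max(abs(reduced_model(x) - full_model(x)) for x in xs)
```

In this sketch the surrogate is evaluated with one multiply and one add, and `max_err` gives a simple check that it tracks the full model within the sampled range; outside that range no agreement is implied.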

[0046] In some embodiments, predictive system 110 may generate predictive data 168 using supervised machine learning (e.g., predictive data 168 includes output from a machine learning model that was trained using labeled data, such as manufacturing parameter data labelled with metrology data (e.g., which may include rates of defect formation, or other metrology of interest)). In some embodiments, predictive system 110 may generate predictive data 168 using unsupervised machine learning (e.g., predictive data 168 includes output from a machine learning model that was trained using unlabeled data, output may include clustering results, principal component analysis, anomaly detection, etc.). In some embodiments, predictive system 110 may generate predictive data 168 using semi-supervised learning (e.g., training data may include a mix of labeled and unlabeled data, etc.).

[0047] Client device 120, manufacturing equipment 124, metrology equipment 128, predictive server 112, data store 140, server machine 170, and server machine 180 may be coupled to each other via network 130 for generating predictive data 168 to perform corrective actions. In some embodiments, network 130 may provide access to cloud-based services. Operations performed by client device 120, predictive system 110, data store 140, etc., may be performed by virtual cloud-based devices.

[0048] In some embodiments, network 130 is a public network that provides client device 120 with access to the predictive server 112, data store 140, and other publicly available computing devices. In some embodiments, network 130 is a private network that provides client device 120 access to manufacturing equipment 124, metrology equipment 128, data store 140, and other privately available computing devices. Network 130 may include one or more Wide Area Networks (WANs), Local Area Networks (LANs), wired networks (e.g., Ethernet network), wireless networks (e.g., an 802.11 network or a Wi-Fi network), cellular networks (e.g., a Long Term Evolution (LTE) network), routers, hubs, switches, server computers, cloud computing networks, and / or a combination thereof.

[0049] Client device 120 may include computing devices such as Personal Computers (PCs), laptops, mobile phones, smartphones, tablet computers, netbook computers, network connected televisions (“smart TV”), network-connected media players (e.g., Blu-ray player), a set-top-box, Over-the-Top (OTT) streaming devices, operator boxes, etc. Client device 120 may include a corrective action component 122. Corrective action component 122 may receive user input (e.g., via a Graphical User Interface (GUI) displayed via the client device 120) of an indication associated with manufacturing equipment 124. In some embodiments, corrective action component 122 transmits the indication to the predictive system 110, receives output (e.g., predictive data 168) from the predictive system 110, determines a corrective action based on the output, and causes the corrective action to be implemented. In some embodiments, corrective action component 122 obtains model input data associated with manufacturing equipment 124 (e.g., from data store 140, etc.) and provides the model input data (e.g., inspection image data 154) associated with the manufacturing equipment 124 to predictive system 110.

[0050] In some embodiments, corrective action component 122 receives an indication of a corrective action from the predictive system 110 and causes the corrective action to be implemented. Each client device 120 may include an operating system that allows users to one or more of generate, view, or edit data (e.g., indication associated with manufacturing equipment 124, corrective actions associated with manufacturing equipment 124, etc.).

[0051] In some embodiments, metrology data 160 (e.g., historical metrology data) corresponds to historical property data of products (e.g., products processed using manufacturing parameters associated with historical manufacturing parameters) and predictive data 168 is associated with predicted property data (e.g., of products to be produced or that have been produced in conditions recorded by current manufacturing parameters). In some embodiments, predictive data 168 is or includes predicted metrology data (e.g., virtual metrology data, defect appearance likelihood, defect location or classification) of the products to be produced or that have been produced according to conditions recorded as current measurement data and / or current manufacturing parameters, e.g., as recorded in inspection image data 154.

[0052] In some embodiments, predictive data 168 is or includes an indication of any abnormalities (e.g., abnormal products, abnormal components, abnormal manufacturing equipment 124, abnormal energy usage, etc.) and optionally one or more causes of the abnormalities. In some embodiments, predictive data 168 is an indication of change over time or drift in some component of manufacturing equipment 124, metrology equipment 128, and the like. In some embodiments, predictive data 168 is an indication of an end of life of a component of manufacturing equipment 124, metrology equipment 128, or the like.

[0053] Performing manufacturing processes that result in defective products can be costly in time, energy, products, components, manufacturing equipment 124, the cost of identifying the defects and discarding the defective product, etc. By inputting inspection image data 154 into predictive system 110, receiving output of predictive data 168, and performing a corrective action based on the predictive data 168, system 100 can have the technical advantage of avoiding the cost of producing, identifying, and discarding defective products.

[0054] Performing manufacturing processes that result in failure of the components of the manufacturing equipment 124 can be costly in downtime, damage to products, damage to equipment, express ordering replacement components, etc. By inputting inspection image data 154 to predictive system 110, receiving output of predictive data 168, and performing corrective action (e.g., predicted operational maintenance, such as replacement, processing, cleaning, etc. of components causing defects to develop on substrates during processing) based on the predictive data 168, system 100 can have the technical advantage of avoiding the cost of one or more of unexpected component failure, unscheduled downtime, productivity loss, unexpected equipment failure, product scrap, or the like. Monitoring the performance over time of components, e.g., manufacturing equipment 124, metrology equipment 128, and the like, may provide indications of degrading components.

[0055] Manufacturing parameters may be suboptimal for producing product which may have costly results of increased resource (e.g., energy, coolant, gases, etc.) consumption, increased amount of time to produce the products, increased component failure, increased amounts of defective products, etc. By inputting inspection image data 154 into a model 190 (which may include many operations, types of models, algorithms, etc.), receiving an output of predictive data 168, and performing a corrective action of updating manufacturing parameters (e.g., setting optimal manufacturing parameters, updating a process recipe, or the like), system 100 can have the technical advantage of using optimal manufacturing parameters (e.g., hardware parameters, process parameters, optimal design) to avoid costly results of suboptimal manufacturing parameters, including reducing a likelihood of developing particle defects on substrates, maintaining high product throughput while managing a likelihood of developing defects, or the like. Further improvements such as reducing cost of operation, reducing environmental impact, improving throughput, or the like may be attained through similar means as those described above.

[0056] In some embodiments, the corrective action includes providing an alert to a user (e.g., an alarm to stop or not perform the manufacturing process if the predictive data 168 indicates a predicted abnormality, such as an abnormality of the product, a component, or manufacturing equipment 124). In some embodiments, performance of the corrective action includes causing updates to one or more manufacturing parameters. Performance of a corrective action may include performing maintenance, replacing one or more components, initiating chamber cleaning or seasoning operations, etc. In some embodiments, performance of a corrective action may include recalibration or adjustment of parameters of a physics-based model or reduced order model. In some embodiments, performance of a corrective action may include retraining a machine learning model associated with manufacturing equipment 124. In some embodiments, performance of a corrective action may include training a new machine learning model associated with manufacturing equipment 124.

[0057] Manufacturing parameters 150 may include hardware parameters (e.g., information indicative of which components are installed in manufacturing equipment 124, indicative of component replacements, indicative of component age, indicative of software version or updates, etc.) and / or process parameters (e.g., temperature, pressure, flow, rate, electrical current, voltage, gas flow, lift speed, etc.). In some embodiments, the corrective action includes causing preventative operative maintenance (e.g., replace, process, clean, etc. components of the manufacturing equipment 124). In some embodiments, the corrective action includes causing design optimization (e.g., updating manufacturing parameters, manufacturing processes, manufacturing equipment 124, etc. for an optimized product). In some embodiments, the corrective action includes updating a recipe (e.g., altering the timing of manufacturing subsystems entering an idle or active mode, altering set points of various property values, etc.).

[0058] Predictive server 112, server machine 170, and server machine 180 may each include one or more computing devices such as a rackmount server, a router computer, a server computer, a personal computer, a mainframe computer, a laptop computer, a tablet computer, a desktop computer, Graphics Processing Unit (GPU), accelerator Application-Specific Integrated Circuit (ASIC) (e.g., Tensor Processing Unit (TPU)), etc. Operations of predictive server 112, server machine 170, server machine 180, data store 140, etc., may be performed by a cloud computing service, cloud data storage service, etc.

[0059] Predictive server 112 may include a predictive component 114. In some embodiments, the predictive component 114 may receive inspection image data 154 (e.g., receive from the client device 120, retrieve from the data store 140) and generate output (e.g., predictive data 168) for performing corrective action associated with the manufacturing equipment 124 based on the current data. In some embodiments, predictive data 168 may include one or more predicted defects of a processed product. In some embodiments, predictive data 168 may include a prediction of conditions in-chamber that may result in defect formation, such as gas backflow. In some embodiments, predictive component 114 may use one or more models, which may include trained artificial intelligence models, to determine the output for performing the corrective action based on current data.

[0060] Manufacturing equipment 124 may be associated with one or more models, e.g., model 190. In some embodiments, model(s) 190 may be or include physics-based models, reduced order models, artificial intelligence models, etc. Artificial intelligence models associated with manufacturing equipment 124 may perform many tasks, including process control, classification, performance predictions, etc. Model 190 may be trained using data associated with manufacturing equipment 124 or products processed by manufacturing equipment 124, e.g., sensor data, manufacturing parameters 150 (e.g., associated with process control of manufacturing equipment 124), metrology data 160 (e.g., generated by metrology equipment 128), etc.

[0061] One type of machine learning model that may be used to perform some or all of the above tasks is an artificial neural network, such as a deep neural network. Artificial neural networks generally include a feature representation component with a classifier or regression layers that map features to a desired output space. A convolutional neural network (CNN), for example, hosts multiple layers of convolutional filters. Pooling is performed, and nonlinearities may be addressed, at lower layers, on top of which a multi-layer perceptron is commonly appended, mapping top layer features extracted by the convolutional layers to decisions (e.g. classification outputs).
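
For illustration only, the convolution, nonlinearity, and pooling stages mentioned above can be sketched in plain Python on a tiny image. The kernel values, the choice of a rectified linear nonlinearity, and the 2x2 pooling window are assumptions chosen for readability; a practical network would learn its filters and would typically use a numerical library.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products.

    This is the cross-correlation form that neural network libraries
    conventionally call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Elementwise nonlinearity: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
            for c in range(w)
        ]
        for r in range(h)
    ]


# Toy 4x4 image with a dark-to-bright vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # hand-picked filter that responds to that edge

features = relu(conv2d_valid(image, edge_kernel))
pooled = max_pool_2x2(features)
```

Here `features` marks the column where the edge occurs, and `pooled` is a coarser map that still records the presence of the edge; a classifier stage would operate on such pooled features.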

[0062] In some embodiments, a U-net model may be utilized for one or more operations associated with defect analysis based on multi-wavelength inspection. A U-net is a type of neural network architecture configured for localization of image regions, and may be particularly applicable to locating defects in substrate inspection images. A U-net may include encoder and decoder layers, e.g., down-sampling and up-sampling layers to capture context, learn spatial features, reconstruct images, enable pixel-level classification, etc. A U-net model may include skip connections to enable recovery of spatial information that may be lost during down-sampling.

[0063] A recurrent neural network (RNN) is another type of machine learning model. A recurrent neural network model is designed to interpret a series of inputs where inputs are intrinsically related to one another, e.g., time trace data, sequential data, etc. Output of a perceptron of an RNN is fed back into the perceptron as input, to generate the next output.
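
The feedback described above can be sketched, under simplifying assumptions, as a single recurrent unit whose previous output is combined with each new input. The weights `w_in` and `w_rec`, the bias, and the tanh activation are hypothetical values chosen for illustration; they are not specified in the disclosure.

```python
import math


def run_recurrent_unit(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Process a sequence with one recurrent unit and return every output.

    The output at each step is fed back as part of the next step's input.
    """
    h = 0.0  # state before any input has been seen
    outputs = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        outputs.append(h)
    return outputs


# A single pulse followed by zeros: later outputs stay nonzero because the
# unit's own previous output is fed back in.
outputs = run_recurrent_unit([1.0, 0.0, 0.0])
```

The nonzero outputs after the pulse show how such a unit carries information forward through a sequence such as time trace data.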

[0064] Deep learning is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Deep neural networks may learn in a supervised (e.g., classification) and / or unsupervised (e.g., pattern analysis) manner. Deep neural networks include a hierarchy of layers, where the different layers learn different levels of representations that correspond to different levels of abstraction. In deep learning, each level learns to transform its input data into a slightly more abstract and composite representation. In an image recognition application, for example, the raw input may be a matrix of pixels; the first representational layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode higher level shapes (e.g., teeth, lips, gums, etc.); and the fourth layer may recognize a scanning role. Notably, a deep learning process can learn which features to optimally place in which level on its own. The "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs may be that of the network and may be the number of hidden layers plus one. For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited.

[0065] In some embodiments, predictive component 114 obtains inspection image data 154, performs signal processing to break down the current data into sets of current data, provides the sets of current data as input to a trained model 190, and obtains outputs indicative of predictive data 168 from the trained model 190. In some embodiments, predictive data is indicative of metrology data (e.g., prediction of substrate quality, substrate defect likelihood, or the like). In some embodiments, predictive data is indicative of manufacturing equipment health (e.g., an indication of a component or components likely to be contributing to substrate defects).

[0066] In some embodiments, the various models discussed in connection with model 190 (e.g., supervised machine learning model, unsupervised machine learning model, etc.) may be combined in one model (e.g., an ensemble model), or may be separate models.

[0067] Data store 140 may be a memory (e.g., random access memory), a drive (e.g., a hard drive, a flash drive), a database system, a cloud-accessible memory system, or another type of component or device capable of storing data. Data store 140 may include multiple storage components (e.g., multiple drives or multiple databases) that may span multiple computing devices (e.g., multiple server computers). The data store 140 may store manufacturing parameters 150, metrology data 160, inspection image data 154, and predictive data 168.

[0068] In some embodiments, predictive system 110 further includes server machine 170 and server machine 180. Server machine 170 includes a data set generator 172 that is capable of generating data sets (e.g., a set of data inputs and a set of target outputs) to train, validate, and / or test model(s) 190, including one or more machine learning models. Some operations of data set generator 172 are described in detail below with respect to FIGS. 2 and 4A. In some embodiments, data set generator 172 may partition the historical data (e.g., historical metrology data, historical inspection image data, etc.) into a training set (e.g., sixty percent of the historical data), a validating set (e.g., twenty percent of the historical data), and a testing set (e.g., twenty percent of the historical data).
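
By way of a non-limiting example, the sixty / twenty / twenty partition described above may be sketched as follows. The function name `partition_historical_data`, the random shuffle, and the fixed seed are illustrative assumptions; the disclosure does not specify how records are assigned to each set.

```python
import random


def partition_historical_data(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Split records into training, validating, and testing sets.

    The remainder after the training and validating sets becomes the
    testing set (twenty percent with the default fractions).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical record identifiers standing in for historical inspection data.
train_set, val_set, test_set = partition_historical_data(list(range(100)))
```

Every record lands in exactly one of the three sets, so data used for testing is never seen during training or validation.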

[0069] Server machine 180 includes a training engine 182, a validation engine 184, selection engine 185, and / or a testing engine 186. An engine (e.g., training engine 182, a validation engine 184, selection engine 185, and a testing engine 186) may refer to hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, processing device, etc.), software (such as instructions run on a processing device, a general purpose computer system, or a dedicated machine), firmware, microcode, or a combination thereof. The training engine 182 may be capable of training a model 190 using one or more sets of features associated with the training set from data set generator 172. The training engine 182 may generate multiple trained models 190, where each trained model 190 corresponds to a distinct set of features of the training set. For example, a first trained model may have been trained using all features (e.g., X1-X5), a second trained model may have been trained using a first subset of the features (e.g., X1, X2, X4), and a third trained model may have been trained using a second subset of the features (e.g., X1, X3, X4, and X5) that may partially overlap the first subset of features. Data set generator 172 may receive the output of a trained model, collect that data into training, validation, and testing data sets, and use the data sets to train a second model (e.g., a machine learning model configured to output predictive data, corrective actions, etc.).

[0070] Validation engine 184 may be capable of validating a trained model 190 using a corresponding set of features of the validation set from data set generator 172. For example, a first trained machine learning model 190 that was trained using a first set of features of the training set may be validated using the first set of features of the validation set. The validation engine 184 may determine an accuracy of each of the trained models 190 based on the corresponding sets of features of the validation set. Validation engine 184 may discard trained models 190 that have an accuracy that does not meet a threshold accuracy. In some embodiments, selection engine 185 may be capable of selecting one or more trained models 190 that have an accuracy that meets a threshold accuracy. In some embodiments, selection engine 185 may be capable of selecting the trained model 190 that has the highest accuracy of the trained models 190.

[0071] Testing engine 186 may be capable of testing a trained model 190 using a corresponding set of features of a testing set from data set generator 172. For example, a first trained machine learning model 190 that was trained using a first set of features of the training set may be tested using the first set of features of the testing set. Testing engine 186 may determine a trained model 190 that has the highest accuracy of all of the trained models based on the testing sets.
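
The train, validate, select, and test flow of the engines described above can be illustrated with the following sketch. A nearest-centroid classifier is used purely as a stand-in for model 190, and the toy data, the feature indices, and the threshold of 0.9 are hypothetical; none of these choices is taken from the disclosure.

```python
def train_centroids(rows, labels, features):
    """Train a nearest-centroid classifier restricted to the given feature indices."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row, features):
    """Return the label whose centroid is closest to the row."""
    def dist2(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(features))
    return min(model, key=lambda label: dist2(model[label]))


def accuracy(model, rows, labels, features):
    hits = sum(predict(model, r, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def select_model(train, val, feature_subsets, threshold):
    """Train one model per feature subset, discard those below the
    threshold accuracy on the validation set, and keep the most accurate."""
    kept = []
    for features in feature_subsets:
        model = train_centroids(train[0], train[1], features)
        acc = accuracy(model, val[0], val[1], features)
        if acc >= threshold:
            kept.append((acc, features, model))
    return max(kept, key=lambda item: item[0]) if kept else None


# Toy data: feature 0 separates the classes, feature 1 is misleading.
train = ([[0.0, 5.0], [0.2, 1.0], [1.0, 1.2], [0.9, 5.1]], [0, 0, 1, 1])
val = ([[0.1, 4.9], [0.95, 1.1]], [0, 1])
test = ([[0.05, 3.0], [1.1, 3.0]], [0, 1])

best = select_model(train, val, [(0, 1), (1,), (0,)], threshold=0.9)
val_acc, best_features, best_model = best
# Final check on held-out data that played no part in training or selection.
test_acc = accuracy(best_model, test[0], test[1], best_features)
```

In this toy case the model trained only on the misleading feature is discarded at validation, and the surviving model is then scored once on the testing set.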

[0072] In the case of a machine learning model, model 190 may refer to the model artifact that is created by training engine 182 using a training set that includes data inputs and corresponding target outputs (correct answers for respective training inputs). Patterns in the data sets can be found that map the data input to the target output (the correct answer), and machine learning model 190 is provided mappings that capture these patterns. The machine learning model 190 may use one or more of Support Vector Machine (SVM), Radial Basis Function (RBF), clustering, supervised machine learning, semi-supervised machine learning, unsupervised machine learning, k-Nearest Neighbor algorithm (k-NN), linear regression, random forest, neural network (e.g., artificial neural network, recurrent neural network), etc. In some embodiments, one or more machine learning models 190 may be trained using historical data (e.g., historical inspection image data 154, historical metrology data including defect locations, etc.).

[0073] Predictive component 114 may provide current data to model 190 and may run model 190 on the input to obtain one or more outputs. For example, predictive component 114 may provide inspection image data 154 to model 190 and may run model 190 on the input to obtain one or more outputs. Predictive component 114 may be capable of determining (e.g., extracting) predictive data 168 from the output of model 190. Predictive component 114 may determine (e.g., extract) confidence data from the output that indicates a level of confidence that predictive data 168 is an accurate predictor of a process associated with the input data for products produced or to be produced using the manufacturing equipment 124 at the current manufacturing parameters. Predictive component 114 or corrective action component 122 may use the confidence data to decide whether to cause a corrective action associated with the manufacturing equipment 124 based on predictive data 168.

[0074] The confidence data may include or indicate a level of confidence that the predictive data 168 is an accurate prediction for products or components associated with at least a portion of the input data. In one example, the level of confidence is a real number between 0 and 1 inclusive, where 0 indicates no confidence that the predictive data 168 is an accurate prediction for products processed according to input data or component health of components of manufacturing equipment 124 and 1 indicates absolute confidence that the predictive data 168 accurately predicts properties of products processed according to input data or component health of components of manufacturing equipment 124. Responsive to the confidence data indicating a level of confidence below a threshold level for a predetermined number of instances (e.g., percentage of instances, frequency of instances, total number of instances, etc.) predictive component 114 may cause trained model 190 to be re-trained (e.g., based on current manufacturing parameters, current metrology, measurements of conditions in the chamber, etc.). In some embodiments, retraining may include generating one or more data sets (e.g., via data set generator 172) utilizing historical data.
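
As one hedged illustration of the retraining trigger described above, the sketch below flags a model for retraining when the share of low-confidence predictions exceeds a set percentage of instances. The function name `should_retrain` and the values 0.7 and 0.2 are hypothetical; the disclosure leaves the threshold level and the predetermined number of instances open.

```python
def should_retrain(confidences, threshold=0.7, max_low_fraction=0.2):
    """Return True when too many recent predictions fall below the threshold.

    Each confidence is expected to be a real number in [0, 1], where 0
    means no confidence and 1 means absolute confidence.
    """
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
    if not confidences:
        return False  # nothing observed yet, so no evidence of degradation
    low = sum(1 for c in confidences if c < threshold)
    return low / len(confidences) > max_low_fraction


# One low-confidence instance in six stays under the 20 percent limit.
stable = should_retrain([0.9, 0.95, 0.8, 0.85, 0.9, 0.6])
# Two low-confidence instances in four exceeds it.
degraded = should_retrain([0.9, 0.5, 0.6, 0.95])
```

A count-based or frequency-based variant would follow the same pattern, replacing the fraction test with a comparison against a total number of instances or a rate.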

[0075] For purpose of illustration, rather than limitation, aspects of the disclosure describe the training of one or more machine learning models 190 using historical data (e.g., historical metrology data 164, historical manufacturing parameters) and inputting current data (e.g., current manufacturing parameters, and current metrology data) into the one or more trained machine learning models to determine predictive data 168. In other embodiments, a heuristic model, physics-based model, or rule-based model is used to determine predictive data 168 (e.g., without using a trained machine learning model). In some embodiments, such models may be trained using historical data. In some embodiments, these models may be retrained utilizing historical data and / or current data. Predictive component 114 may monitor historical manufacturing parameters and metrology data 160. Any of the information described with respect to data inputs 210 of FIG. 2 may be monitored or otherwise used in the heuristic, physics-based, or rule-based model.

[0076] In some embodiments, the functions of client device 120, predictive server 112, server machine 170, and server machine 180 may be provided by a fewer number of machines. For example, in some embodiments server machines 170 and 180 may be integrated into a single machine, while in some other embodiments, server machine 170, server machine 180, and predictive server 112 may be integrated into a single machine. In some embodiments, client device 120 and predictive server 112 may be integrated into a single machine. In some embodiments, functions of client device 120, predictive server 112, server machine 170, server machine 180, and data store 140 may be performed by a cloud-based service.

[0077] In general, functions described in one embodiment as being performed by client device 120, predictive server 112, server machine 170, and server machine 180 can also be performed on predictive server 112 in other embodiments, if appropriate. In addition, the functionality attributed to a particular component can be performed by different or multiple components operating together. For example, in some embodiments, the predictive server 112 may determine the corrective action based on the predictive data 168. In another example, client device 120 may determine the predictive data 168 based on output from the trained machine learning model.

[0078] In addition, the functions of a particular component can be performed by different or multiple components operating together. One or more of the predictive server 112, server machine 170, or server machine 180 may be accessed as a service provided to other systems or devices through appropriate application programming interfaces (API).

[0079] In embodiments, a “user” may be represented as a single individual. However, other embodiments of the disclosure encompass a “user” being an entity controlled by a plurality of users and / or an automated source. For example, a set of individual users federated as a group of administrators may be considered a “user.”

[0080] FIG. 2 depicts a block diagram of example data set generator 272 (e.g., data set generator 172 of FIG. 1) to create data sets for training, testing, validating, calibrating, etc. a model (e.g., model 190 of FIG. 1), according to some embodiments. Each data set generator 272 may be part of server machine 170 of FIG. 1. In some embodiments, data set generator 272 may generate data sets to be utilized to adjust, validate, test, or the like a physics-based model or reduced order model. In some embodiments, data set generator 272 may generate data sets to be utilized in generating, validating, etc., machine learning models in association with the manufacturing equipment. In some embodiments, several models associated with manufacturing equipment 124 may be trained, used, and maintained (e.g., within a manufacturing facility). One or more physics-based models, one or more reduced order models, and / or one or more trained machine learning models may be generated and maintained in association with the manufacturing equipment. Each model may be associated with one data set generator 272, multiple models may share a data set generator 272, etc.

[0081] FIG. 2 depicts a system 200 including data set generator 272 for creating data sets for one or more supervised models (e.g., including data associated with input to a model and output from the model). Data set generator 272 may create data sets (e.g., data input 210, target output 220) using historical data, which may include manufacturing parameters, defect generation likelihood, defect classification, defect locations, inspection images in multiple wavelengths, N-dimensional vectors related to anomaly intensity, or the like. In some embodiments, a data set generator similar to data set generator 272 may be utilized to train an unsupervised model, e.g., target output 220 may not be generated by data set generator 272. For example, for clustering, anomaly recognition, peak-finding operations, or the like, target output 220 may not be generated.

[0082] Data set generator 272 may generate data sets to train, test, and validate a model, e.g., a machine learning model. Data set generator 272 may generate data sets to calibrate a model, e.g., a physics-based model (including reduced order models). In some embodiments, data set generator 272 may generate data sets for an artificial intelligence model. In some embodiments, data set generator 272 may generate data sets for training, testing, and / or validating a model configured to generate predictive defect data in a substrate processing system, such as generating data indicating a likelihood of particle defect formation, a defect classification, a defect location, a predicted particle source, a recommended update to substrate processing, or the like.

[0083] A model to be generated (e.g., trained, calibrated, or the like) may be provided with a set of historical inspection images 252-1 as data input 210. The set of historical inspection images 252-1 may include multiple wavelength images, e.g., images associated with illumination or detection from multiple wavelengths or wavelength bands. The model may be configured to accept multi-wavelength inspection images as input and generate locations and / or classification of defects as output. The model may instead be configured to perform operations related to multi-wavelength defect analysis, such as peak finding in background-corrected images; defect classification based on intensity, confidence, anomaly locations, N-digit intensity vectors, or the like; generation of boundary conditions for classification of anomalies, defects, artifacts, or the like; selection of clustering radii for defect conglomeration; generation of reference frames; or the like. For these or other applications, data set generator 272 may generate sets of data appropriate for training an artificial intelligence or other model to perform the target functions, corresponding to operations described with respect to an artificial intelligence model configured to generate defect data based on inspection images, as described in connection with FIG. 2.
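
For illustration, peak finding in background-corrected images and assembly of a per-wavelength intensity vector might be sketched as below. Treating background correction as pixelwise subtraction of a reference frame, the 8-neighbour local maximum test, the threshold of 5, and the toy 3x3 images are all assumptions made for this sketch rather than details of the disclosed method.

```python
def background_correct(image, reference):
    """Subtract a reference frame from an inspection image, pixel by pixel."""
    return [
        [p - r for p, r in zip(img_row, ref_row)]
        for img_row, ref_row in zip(image, reference)
    ]


def find_peaks(corrected, threshold):
    """Return (row, col) of pixels above threshold that are not exceeded
    by any of their (up to eight) neighbours."""
    rows, cols = len(corrected), len(corrected[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = corrected[r][c]
            if v <= threshold:
                continue
            neighbours = [
                corrected[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def intensity_vector(corrected_images, location):
    """Collect the corrected intensity at one location across all wavelengths."""
    r, c = location
    return [img[r][c] for img in corrected_images]


# Toy data for two hypothetical inspection wavelengths sharing one reference.
reference = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
image_a = [[10, 11, 10], [10, 50, 10], [10, 10, 10]]
image_b = [[10, 10, 10], [10, 30, 10], [10, 10, 12]]

corrected = [background_correct(img, reference) for img in (image_a, image_b)]
# Union of peak locations found at any wavelength.
locations = sorted({p for img in corrected for p in find_peaks(img, threshold=5)})
vectors = {loc: intensity_vector(corrected, loc) for loc in locations}
```

Each entry of `vectors` pairs a candidate anomaly location with one intensity per wavelength, which is the kind of N-dimensional input a downstream classifier could consume.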

[0084] In some embodiments, data set generator 272 generates a data set (e.g., training set, validating set, testing set) that includes one or more data inputs 210 (e.g., training input, validating input, testing input). Data inputs 210 may be provided to training engine 182, validating engine 184, or testing engine 186. The data set may be used to train, validate, or test the model (e.g., model 190 of FIG. 1).

[0085] In some embodiments, data input 210 may include one or more sets of data. As an example, system 200 may produce sets of manufacturing parameter data that may include one or more of parameter data from one or more types of components, combinations of parameter data from one or more types of components, patterns from parameter data from one or more types of components, or the like. In some embodiments, target output 220 may include sets of output related to the various sets of data input 210.

[0086] In some embodiments, data set generator 272 may generate a first data input corresponding to a first set of historical inspection images 252-1 to train, validate, or test a first machine learning model. Data set generator 272 may generate a second data input corresponding to a second set of historical inspection images (e.g., a set of historical inspection images 252-2, not shown) to train, validate, or test a second machine learning model. Further sets of historical data may be utilized in generating further machine learning models. Any number of sets of historical data may be utilized in generating any number of machine learning models, up to a final set, set of historical inspection images 252-N (N representing any target quantity of data sets, models, etc.).

[0087] In some embodiments, data set generator 272 may generate a first data input corresponding to a first set of historical inspection images 252-1 to train, validate, or test a first machine learning model. Data set generator 272 may generate a second data input corresponding to a second set of historical inspection images 252-2 (not shown) to train, validate, or test a second machine learning model.

[0088] In some embodiments, data set generator 272 generates a data set (e.g., training set, validating set, testing set) that includes one or more data inputs 210 (e.g., training input, validating input, testing input) and may include one or more target outputs 220 that correspond to the data inputs 210. The data set may also include mapping data that maps the data inputs 210 to the target outputs 220. In some embodiments, data set generator 272 may generate data for training a model configured to generate output relevant to preventing particle defect formation, by generating data sets including output predictive defect data 268. Data inputs 210 may also be referred to as “features,” “attributes,” or “information.” In some embodiments, data set generator 272 may provide the data set to training engine 182, validating engine 184, or testing engine 186, where the data set is used to train, validate, or test the model (e.g., one of the machine learning models that are included in model 190, ensemble model 190, etc.).

[0089] In some embodiments, subsequent to generating a data set and training, validating or testing a machine learning model using the data set, the model may be further trained, validated, or tested, or adjusted (e.g., adjusting weights or parameters associated with input data of the model, such as connection weights in a neural network).

[0090] FIG. 3 is a block diagram illustrating system 300 for generating output data (e.g., predictive data 168 of FIG. 1), according to some embodiments. In some embodiments, system 300 may be used in conjunction with a model (e.g., physics-based, reduced order, data-based, machine learning, artificial intelligence, or the like) configured to generate predictive data related to substrate defects. In some embodiments, system 300 is utilized for generating output data by a model such as model 190 of FIG. 1. In some embodiments, system 300 is utilized in conjunction with a model that receives as input multi-wavelength inspection image data, and generates as output defect locations, defect classification, defect analysis, and / or the like. In some embodiments, system 300 may be used in conjunction with a model to determine a corrective action associated with manufacturing equipment. In some embodiments, system 300 may be used in conjunction with a model to determine a fault of manufacturing equipment, e.g., a component resulting in particles being deposited on substrates during processing operations. In some embodiments, system 300 may be used in conjunction with a machine learning model to cluster or classify substrates or substrate defects, e.g., clustering defects which react similarly in a multi-wavelength inspection system. System 300 may be used in conjunction with a machine learning model with a different function than those listed, associated with a manufacturing system.

[0091] At block 310, system 300 (e.g., components of predictive system 110 of FIG. 1) performs data partitioning (e.g., via data set generator 172 of server machine 170 of FIG. 1) of data to be used in training, validating, and / or testing a model, such as a machine learning model. In some embodiments, inspection and defect data 364 includes historical data, such as historical metrology data (e.g., defect locations or classifications), historical inspection images (e.g., multi-wavelength inspection images), etc. Inspection and defect data 364 may undergo data partitioning at block 310 to generate training set 302, validation set 304, and testing set 306. For example, the training set may be 60% of the training data, the validation set may be 20% of the training data, and the testing set may be 20% of the training data.

[0092] The generation of training set 302, validation set 304, and testing set 306 may be tailored for a particular application. For example, the training set may be 60% of the training data, the validation set may be 20% of the training data, and the testing set may be 20% of the training data. System 300 may generate a plurality of sets of features for each of the training set, the validation set, and the testing set. For example, if inspection and defect data 364 includes inspection images associated with 10 different wavelengths or wavelength bands, the data may be divided into a first set of features associated with wavelengths 1-5 and a second set of features associated with wavelengths 6-10. Either target input, target output, both, or neither may be divided into sets. Multiple models may be trained on different sets of data.
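
As a minimal sketch of the partitioning and feature-set division described above, the following Python example splits a collection of records 60/20/20 and then divides each training record into wavelengths 1-5 and wavelengths 6-10. The function names, the record layout (a dictionary keyed by wavelength index), and the use of a seeded shuffle are assumptions made for illustration and are not taken from the disclosure.

```python
import random

def partition(items, percents=(60, 20, 20), seed=0):
    """Shuffle items and split them into training, validation, and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_val = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the testing set
    return train, val, test

def select_wavelengths(record, wavelength_ids):
    """Keep only the image channels for the requested wavelength indices."""
    return {w: record[w] for w in wavelength_ids}

# Hypothetical records: each maps a wavelength index (1-10) to an image placeholder.
records = [{w: f"site{i}_wl{w}" for w in range(1, 11)} for i in range(100)]
train, val, test = partition(records)

# First and second sets of features of the training set.
feature_set_1 = [select_wavelengths(r, range(1, 6)) for r in train]   # wavelengths 1-5
feature_set_2 = [select_wavelengths(r, range(6, 11)) for r in train]  # wavelengths 6-10
```

The same `select_wavelengths` call could be applied to the validation and testing sets so that each trained model is evaluated on the feature set it was trained with.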

[0093] At block 312, system 300 performs model training (e.g., via training engine 182 of FIG. 1) using training set 302. Training of a machine learning model and / or of a physics-based model (e.g., a digital twin) may be achieved in a supervised learning manner, which involves providing a training dataset including labeled inputs through the model, observing its outputs, defining an error (by measuring the difference between the outputs and the label values), and using techniques such as deep gradient descent and backpropagation to tune the weights of the model such that the error is minimized. In many applications, repeating this process across the many labeled inputs in the training dataset yields a model that can produce correct output when presented with inputs that are different than the ones present in the training dataset. In some embodiments, training of a machine learning model may be achieved in an unsupervised manner, e.g., labels or classifications may not be supplied during training. An unsupervised model may be configured to perform anomaly detection, result clustering, etc.

[0094] For each training data item in the training dataset, the training data item may be input into the model (e.g., into the machine learning model). The model may then process the input training data item (e.g., one or more manufacturing parameter values, etc.) to generate an output. The output may include, for example, a location of defect formation or a classification of a defect type. The output may be compared to a label of the training data item (e.g., a measured defect location or classification).

[0095] Processing logic may then compare the generated output (e.g., predicted defect generation data) to the label (e.g., actual defect generation data) that was included in the training data item. Processing logic determines an error (i.e., a classification error) based on the differences between the output and the label(s). Processing logic adjusts one or more weights and / or values of the model based on the error.

[0096] In the case of training a neural network, an error term or delta may be determined for each node in the artificial neural network. Based on this error, the artificial neural network adjusts one or more of its parameters for one or more of its nodes (the weights for one or more inputs of a node). Parameters may be updated in a back propagation manner, such that nodes at a highest layer are updated first, followed by nodes at a next layer, and so on. An artificial neural network contains multiple layers of “neurons”, where each layer receives as input values from neurons at a previous layer. The parameters for each neuron include weights associated with the values that are received from each of the neurons at a previous layer. Accordingly, adjusting the parameters may include adjusting the weights assigned to each of the inputs for one or more neurons at one or more layers in the artificial neural network.
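
The supervised loop described above (generate an output, compare it to the label, determine an error, adjust the weights) can be illustrated with a deliberately small Python sketch. It trains a single sigmoid neuron by gradient descent, which is the simplest case of the error-term update; a multi-layer network would propagate such deltas backward through each layer. The learning rate, epoch count, and the two-wavelength intensity samples are hypothetical values chosen for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid neuron for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Fit weights w and bias b of one sigmoid neuron by gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = predict(w, b, x)
            # Error term (delta): gradient of the cross-entropy loss
            # with respect to the neuron's pre-activation value.
            delta = out - y
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Hypothetical two-wavelength intensities; label 1 = defect, 0 = artifact.
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
w, b = train_neuron(samples, labels)
```

After training, `predict(w, b, [0.85, 0.85])` is above 0.5 and `predict(w, b, [0.15, 0.15])` is below 0.5, i.e., the neuron produces the expected output for inputs that were not in the training data.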

[0097] System 300 may train multiple models using multiple sets of features of the training set 302 (e.g., a first set of features of the training set 302, a second set of features of the training set 302, etc.). For example, system 300 may train a model to generate a first trained model using the first set of features in the training set (e.g., inspection images associated with wavelengths 1-5, etc.) and to generate a second trained model using the second set of features in the training set (e.g., inspection images associated with wavelengths 6-10, etc.). In some embodiments, the first trained model and the second trained model may be combined to generate a third trained model (e.g., which may be a better predictor than the first or the second trained model on its own). In some embodiments, sets of features used in comparing models may overlap (e.g., first set of features being wavelengths 1-7 and second set of features being wavelengths 3-10). In some embodiments, hundreds of models may be generated including models with various permutations of features and combinations of models.

[0098] At block 314, system 300 performs model validation (e.g., via validation engine 184 of FIG. 1) using the validation set 304. The system 300 may validate each of the trained models using a corresponding set of features of the validation set 304. For example, system 300 may validate the first trained model using the first set of features in the validation set (e.g., inspection images associated with wavelengths 1-5) and the second trained model using the second set of features in the validation set (e.g., inspection images associated with wavelengths 6-10). In some embodiments, system 300 may validate hundreds of models (e.g., models with various permutations of features, combinations of models, etc.) generated at block 312. At block 314, system 300 may determine an accuracy of each of the one or more trained models (e.g., via model validation) and may determine whether one or more of the trained models has an accuracy that meets a threshold accuracy. Responsive to determining that none of the trained models has an accuracy that meets a threshold accuracy, flow returns to block 312 where the system 300 performs model training using different sets of features of the training set. Responsive to determining that one or more of the trained models has an accuracy that meets a threshold accuracy, flow continues to block 316. System 300 may discard the trained models that have an accuracy that is below the threshold accuracy (e.g., based on the validation set).

[0099] At block 316, system 300 performs model selection (e.g., via selection engine 185 of FIG. 1) to determine which of the one or more trained models that meet the threshold accuracy has the highest accuracy (e.g., the selected model 308, based on the validating of block 314). Responsive to determining that two or more of the trained models that meet the threshold accuracy have the same accuracy, flow may return to block 312 where the system 300 performs model training using further refined training sets corresponding to further refined sets of features for determining a trained model that has the highest accuracy.
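
The validation and selection flow of blocks 314 and 316 can be sketched as follows. This is an illustrative Python example under stated assumptions: the candidate models are stand-in threshold classifiers, the validation examples and the threshold accuracy of 0.75 are hypothetical, and the tie-handling branch (returning to block 312 when two models have equal accuracy) is omitted for brevity.

```python
def accuracy(model, examples):
    """Fraction of (features, label) examples the model predicts correctly."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)

def validate_and_select(models, validation_set, threshold):
    """Discard models below threshold; return (name, accuracy) of the best, or None."""
    scores = {name: accuracy(m, validation_set) for name, m in models.items()}
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if not passing:
        return None  # no model meets the threshold: caller returns to training
    best = max(passing, key=passing.get)
    return best, passing[best]

# Hypothetical candidate models: threshold classifiers on summed intensity.
models = {
    "wavelengths_1_5": lambda x: int(sum(x[:5]) > 2.0),
    "wavelengths_6_10": lambda x: int(sum(x[5:]) > 3.0),
}
# Hypothetical validation examples: ten per-wavelength intensities and a label.
validation_set = [
    ([0.9] * 10, 1),
    ([0.8] * 10, 1),
    ([0.1] * 10, 0),
    ([0.5] * 10, 0),
]
selected = validate_and_select(models, validation_set, threshold=0.75)
```

Here both candidates meet the threshold, and the second is selected because it classifies all four validation examples correctly while the first misclassifies one.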

[0100] At block 318, system 300 performs model testing (e.g., via testing engine 186 of FIG. 1) using testing set 306 to test selected model 308. System 300 may test, using the first set of features in the testing set (e.g., inspection images associated with wavelengths 1-5), the first trained model to determine whether the first trained model meets a threshold accuracy. Determining whether the first trained model meets a threshold accuracy may be based on the first set of features of testing set 306. Responsive to accuracy of the selected model 308 not meeting the threshold accuracy, flow returns to block 312 where system 300 performs model training (e.g., retraining) using different training sets corresponding to different sets of features. Accuracy of selected model 308 may not meet threshold accuracy if selected model 308 is overly fit to the training set 302 and / or validation set 304. Accuracy of selected model 308 may not meet threshold accuracy if selected model 308 is not applicable to other data sets, including testing set 306. Training using different features may include training using data from different sensors, different manufacturing parameters, etc. Responsive to determining that selected model 308 has an accuracy that meets a threshold accuracy based on testing set 306, flow continues to block 320. In at least block 312, the model may learn patterns in the training data to make predictions. In block 318, the system 300 may apply the model on the remaining data (e.g., testing set 306) to test the predictions.

[0101] At block 320, system 300 uses the trained model (e.g., selected model 308) to receive current data 322 and determines (e.g., extracts), from the output of the trained model, predictive data 324. Current data 322 may be inspection images, metrology data, or other data related to a process, operation, or action of interest. Current data 322 may be substrate data related to a process under development, redevelopment, investigation, etc. A corrective action associated with the manufacturing equipment 124 of FIG. 1 may be performed in view of predictive data 324. In some embodiments, current data 322 may correspond to the same types of features in the historical data used to train the machine learning model. In some embodiments, current data 322 corresponds to a subset of the types of features in historical data that are used to train selected model 308. For example, a machine learning model may be trained using a number of wavelengths of illumination and / or detection, and configured to generate output based on a subset of the wavelengths.

[0102] In some embodiments, the performance of a machine learning model trained, validated, and tested by system 300 may deteriorate. For example, a manufacturing system associated with the trained machine learning model may undergo a gradual change or a sudden change. A change in the manufacturing system may result in decreased performance of the trained machine learning model. A new model may be generated to replace the machine learning model with decreased performance. The new model may be generated by altering the old model by retraining, by generating a new model, etc.

[0103] Generation of a new model may include providing additional training data 346. Generation of a new model may further include providing current data 322, e.g., data that has been used by the model to make predictions. In some embodiments, current data 322 when provided for generation of a new model may be labeled with an indication of an accuracy of predictions generated by the model based on current data 322. Additional training data 346 may be provided to model training at block 312 for generation of one or more new machine learning models, updating, retraining, and / or refining of selected model 308, etc.

[0104] In some embodiments, one or more of the acts 310-320 may occur in various orders and / or with other acts not presented and described herein. In some embodiments, one or more of acts 310-320 may not be performed. For example, in some embodiments, one or more of data partitioning of block 310, model validation of block 314, model selection of block 316, or model testing of block 318 may not be performed.

[0105] FIG. 3 depicts a system configured for training, validating, testing, and using one or more machine learning models. The machine learning models are configured to accept data as input (e.g., set points provided to manufacturing equipment, sensor data, metrology data, etc.) and provide data as output (e.g., predictive data, corrective action data, classification data, etc.). Partitioning, training, validating, selection, testing, and using blocks of system 300 may be executed similarly to train a second model, utilizing different types of data. Retraining may also be performed, utilizing current data 322 and / or additional training data 346.

[0106] FIGS. 4A-C are flow diagrams of methods 400A-C associated with utilizing models to predict and / or correct substrate particle defect root causes, according to certain embodiments. Methods 400A-C may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, processing device, etc.), software (such as instructions run on a processing device, a general purpose computer system, or a dedicated machine), firmware, microcode, or a combination thereof. In some embodiments, methods 400A-C may be performed, in part, by predictive system 110. Method 400A may be performed, in part, by predictive system 110 (e.g., server machine 170 and data set generator 172 of FIG. 1, data set generator 272 of FIG. 2). Predictive system 110 may use method 400A to generate a data set to at least one of train, validate, or assess / test a model (e.g., a physics-based model, a reduced order model, an artificial intelligence model), in accordance with embodiments of the disclosure. Methods 400B-C may be performed by predictive server 112 (e.g., predictive component 114) and / or server machine 180 (e.g., training, validating, and testing operations may be performed by server machine 180). In some embodiments, a non-transitory machine-readable storage medium stores instructions that when executed by a processing device (e.g., of predictive system 110, of server machine 180, of predictive server 112, etc.) cause the processing device to perform one or more of methods 400A-C.

[0107] For simplicity of explanation, methods 400A-C are depicted and described as a series of operations. However, operations in accordance with this disclosure can occur in various orders and / or concurrently and with other operations not presented and described herein. Furthermore, not all illustrated operations may be performed to implement methods 400A-C in accordance with the disclosed subject matter. In addition, those skilled in the art will understand and appreciate that methods 400A-C could alternatively be represented as a series of interrelated states via a state diagram or events.

[0108] FIG. 4A is a flow diagram of a method 400A for generating a data set for a model, according to some embodiments. Referring to FIG. 4A, in some embodiments, at block 401 the processing logic implementing method 400A initializes a training set T to an empty set.

[0109] At block 402, processing logic generates first data input (e.g., first training input, first validating input) that may include multi-wavelength inspection images, N-dimensional intensity vectors, or other quantities related to multi-wavelength defect analysis operations. In some embodiments, the first data input may include a first set of features for types of data and a second data input may include a second set of features for types of data (e.g., as described with respect to FIG. 3). Input data may include historical data and / or data output by a model (e.g., a physics-based model output used for training an artificial intelligence model).

[0110] In some embodiments, at block 403, processing logic optionally generates a first target output for one or more of the data inputs (e.g., first data input). In some embodiments, the input includes multi-wavelength inspection images and the target output is an indication related to defect formation or classification. In some embodiments, the target output is a recommended corrective action, such as an update to a process recipe, recommended maintenance, seasoning, or cleaning operations, recommended component replacement, etc. In some embodiments, the first target output is predictive data.

[0111] At block 404, processing logic optionally generates mapping data that is indicative of an input / output mapping. The input / output mapping (or mapping data) may refer to the data input (e.g., one or more of the data inputs described herein), the target output for the data input, and an association between the data input(s) and the target output. In some embodiments, such as in association with artificial intelligence models where no target output is provided, block 404 may not be executed.

[0112] At block 405, processing logic adds the mapping data generated at block 404 to data set T, in some embodiments.

[0113] At block 406, processing logic branches based on whether data set T is sufficient for at least one of training, validating, and / or testing a machine learning model, such as synthetic data generator 174 or model 190 of FIG. 1. If so, execution proceeds to block 407, otherwise, execution continues back at block 402. It should be noted that in some embodiments, the sufficiency of data set T may be determined based simply on the number of inputs, mapped in some embodiments to outputs, in the data set, while in some other embodiments, the sufficiency of data set T may be determined based on one or more other criteria (e.g., a measure of diversity of the data examples, accuracy, etc.) in addition to, or instead of, the number of inputs.

[0114] At block 407, processing logic provides data set T (e.g., to server machine 180) to train, validate, and / or test machine learning model 190. In some embodiments, data set T is a training set and is provided to training engine 182 of server machine 180 to perform the training. In some embodiments, data set T is a validation set and is provided to validation engine 184 of server machine 180 to perform the validating. In some embodiments, data set T is a testing set and is provided to testing engine 186 of server machine 180 to perform the testing. In the case of a neural network, for example, input values of a given input / output mapping (e.g., numerical values associated with data inputs 210) are input to the neural network, and output values (e.g., numerical values associated with target outputs 220) of the input / output mapping are stored in the output nodes of the neural network. The connection weights in the neural network are then adjusted in accordance with a learning algorithm (e.g., back propagation, etc.), and the procedure is repeated for the other input / output mappings in data set T. After block 407, a model (e.g., model 190) can be at least one of trained using training engine 182 of server machine 180, validated using validating engine 184 of server machine 180, or tested using testing engine 186 of server machine 180. The trained model may be implemented by predictive component 114 (of predictive server 112) to generate predictive data 168 for performing signal processing, or for performing a corrective action associated with manufacturing equipment 124.
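
The loop formed by blocks 401 through 407 of method 400A can be sketched in a few lines of Python. This is an illustration only: the function name, the dictionary layout of the mapping data, and the use of a simple example count as the sufficiency criterion are assumptions; the disclosure also contemplates other criteria, such as diversity of the data examples.

```python
def generate_data_set(source, min_examples):
    """Accumulate input/output mappings until the data set is deemed sufficient."""
    data_set = []                                    # block 401: T starts empty
    for images, label in source:
        data_input = images                          # block 402: data input
        target_output = label                        # block 403: optional target output
        mapping = {"input": data_input,              # block 404: input/output mapping
                   "target": target_output}
        data_set.append(mapping)                     # block 405: add mapping to T
        if len(data_set) >= min_examples:            # block 406: sufficiency check
            break
    return data_set                                  # block 407: provide T

# Hypothetical source: (two-wavelength intensities, defect label) pairs.
source = [([0.1 * i, 0.2 * i], i % 2) for i in range(10)]
T = generate_data_set(source, min_examples=4)
```

For an unsupervised model, the target output and mapping steps would be skipped and only the data inputs collected.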

[0115] FIG. 4B is a flow diagram of a method 400B for performing multi-wavelength defect analysis, according to some embodiments. At block 412, process logic obtains a plurality of images of a substrate. Each of the plurality of images is associated with a different inspection wavelength. Each of the plurality of images is associated with the same spatial region of the substrate. In embodiments, multiple sets (e.g., multiple pluralities) of images may be obtained, which are associated with different spatial regions of the substrate, different substrates, or the like. For example, an inspection region (e.g., field of view of an inspection system) may be on the order of a millimeter, a die of interest may be on the order of a centimeter, a substrate (e.g., semiconductor wafer) of interest may be on the order of 30 centimeters, etc.

[0116] At block 414, process logic obtains a plurality of reference frames. Each reference frame is associated with one of the images (e.g., each reference frame is associated with a location and a wavelength). The process logic may further generate the reference frames. The reference frame may be generated from images of one or more nominally identical portions of substrates to the portion pictured in the image of interest, e.g., of the plurality of images of the substrate. For example, the reference frame may be of a corresponding position or corresponding area of another die (e.g., of the same substrate, of a different substrate, of an adjacent die of the same substrate depicted in the image, etc.). The reference frame may be generated based on another location of the substrate (or another substrate) with nominally identical structures, e.g., another cell, another portion of a die that includes periodic structures, or the like. The reference frame may be generated based on another portion of a repeated pattern of structures. The reference frame may be generated by combining multiple reference regions, e.g., by averaging or another statistical combination (e.g., thresholding) of multiple nearby regions. The reference frames may be generated by combining data from multiple locations of the substrate, each reference frame based on data associated with one inspection wavelength.

[0117] At block 416, process logic determines a defect of the substrate. Determining the defect is based on the plurality of images and plurality of reference frames. Determining the defect may include generating background-free, e.g., background corrected images, based on the frames of interest and the reference frames. For example, background subtraction may be used to eliminate signal from the nominally identical structures. In some embodiments, registration methods may be utilized to align the components of the reference frame and the image, improving background cancelation. For example, die-to-die, cell-to-cell, or structure-to-structure (e.g., in a periodically repeating patterned substrate) registration methods may be utilized. In some embodiments, multiple reference frames may be generated, e.g., including different reference frame components, different adjacent structure images, or the like, and a reference frame that is predicted or calculated to perform the best background cancelation may be selected for use in analysis.
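
Reference frame generation and background subtraction for a single inspection wavelength can be sketched as below. The example is a simplified Python illustration: it assumes the images are already registered, represents each image as a nested list of pixel values, and uses a pixel-wise median as the statistical combination, which is one possible choice alongside the averaging mentioned above. The function names and pixel values are hypothetical.

```python
from statistics import median

def build_reference(frames):
    """Pixel-wise median of nominally identical regions (e.g., neighboring dies)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def subtract_background(image, reference):
    """Background-corrected image: target image minus its reference frame."""
    return [[p - q for p, q in zip(img_row, ref_row)]
            for img_row, ref_row in zip(image, reference)]

# Hypothetical 2x2 images of three neighboring dies at one wavelength.
neighbors = [
    [[10, 12], [11, 10]],
    [[11, 12], [10, 10]],
    [[10, 13], [11, 11]],
]
# Hypothetical target image with one anomalously bright pixel.
target = [[10, 12], [11, 40]]

reference = build_reference(neighbors)
corrected = subtract_background(target, reference)
```

In this example `corrected` is zero everywhere except the bright pixel, which retains a residual of 30. In a multi-wavelength system the same two steps would be repeated per wavelength, yielding one background-corrected image per channel.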

[0118] In some embodiments, an abnormality may be detected in a background-free image, e.g., indicative of a difference in signal received from a target image and associated reference frame. A determination may be made of whether the abnormality corresponds to an on-substrate defect or some other artifact, nuisance, false signal, or the like. Determining the abnormality corresponds to a defect may be based on one or more intensities extracted from background-free images, e.g., based on intensities of signals in various wavelength channels. Determining the abnormality corresponds to a defect may be based on an aggregate intensity, e.g., aggregated across the various wavelengths. For example, low intensity across wavelengths may indicate an artifact (e.g., based on poor image registration), high intensity in only a few wavelengths may indicate an artifact associated with those images, high to moderate intensity in many wavelengths may indicate a defect.
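
The intensity-based heuristic in the preceding paragraph can be expressed as a short rule. The following Python sketch is illustrative only: the normalization of intensities to a 0-1 range, the cutoff of 0.5 for a "high" channel, and the minimum of three bright channels are hypothetical parameters, and the returned labels are placeholders rather than terms used in the disclosure.

```python
def classify_abnormality(intensities, high=0.5, min_channels=3):
    """Label an abnormality from its per-wavelength intensities (normalized 0-1)."""
    bright = sum(1 for value in intensities if value >= high)
    if bright == 0:
        # Low intensity across all wavelengths, e.g., poor image registration.
        return "artifact_low_signal"
    if bright < min_channels:
        # High intensity in only a few wavelengths: artifact tied to those images.
        return "artifact_few_channels"
    # High to moderate intensity in many wavelengths.
    return "defect"

low_everywhere = classify_abnormality([0.1, 0.05, 0.1, 0.2])
single_channel = classify_abnormality([0.9, 0.1, 0.1, 0.1])
many_channels = classify_abnormality([0.8, 0.7, 0.6, 0.9])
```

In practice such boundaries could be set empirically or learned by a model rather than fixed by hand.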

[0119] In some embodiments, abnormalities may be detected in multiple wavelength channels. In some cases, a location of the abnormalities (e.g., found via peak-finding algorithms from the background-free images) may be somewhat different between different wavelength channels. Different signals (e.g., from different wavelength channels) may be collected into predicted defects. Conglomeration of anomalies into predicted defects may include determining whether multiple anomalies are within a target radius (e.g., within a threshold distance), may be performed by a trained artificial intelligence model, or the like.
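
Conglomeration by target radius can be sketched with a simple greedy grouping. This Python example is an assumption-laden illustration, not the disclosed algorithm: each anomaly is a hypothetical (x, y, wavelength index) tuple, an anomaly joins the first existing cluster whose centroid lies within the radius, and the radius value is arbitrary.

```python
import math

def conglomerate(anomalies, radius):
    """Greedily group (x, y, wavelength) anomalies whose positions fall within radius."""
    clusters = []
    for x, y, wl in anomalies:
        for cluster in clusters:
            # Centroid of the anomalies already assigned to this cluster.
            cx = sum(p[0] for p in cluster) / len(cluster)
            cy = sum(p[1] for p in cluster) / len(cluster)
            if math.hypot(x - cx, y - cy) <= radius:
                cluster.append((x, y, wl))
                break
        else:
            # No existing cluster is close enough: start a new predicted defect.
            clusters.append([(x, y, wl)])
    return clusters

# Hypothetical peak locations found in three wavelength channels.
anomalies = [(10.0, 10.0, 1), (10.4, 9.8, 2), (50.0, 50.0, 1), (10.1, 10.3, 3)]
predicted_defects = conglomerate(anomalies, radius=1.0)
```

Here the three anomalies near (10, 10) are collected into one predicted defect spanning wavelengths 1-3, and the anomaly at (50, 50) forms a second predicted defect seen in only one channel.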

[0120] At block 418, process logic optionally determines a number of intensity values associated with the abnormality, each associated with a different wavelength and extracted from the corresponding image. Determining that the abnormalities correspond to a defect may be based on the intensity values. Determining that the abnormalities correspond to a defect may include generating a list, array, vector, or the like of the intensities across the inspection wavelengths represented in inspection images associated with the abnormality / defect. In some embodiments, a vector of intensities may be used for performing a distance calculation, e.g., distance from a target vector corresponding to a target defect, defect of interest, or the like. For example, physics-based modeling may be utilized to determine an expected wavelength-dependent intensity response to a particular defect, and a distance between the modeled intensity profile and the measured intensity profile across the inspection wavelengths may be used to determine a likelihood that the anomaly corresponds to such a defect. In some embodiments, an aggregate intensity (e.g., average intensity, total vector magnitude, or another metric) may also be used to determine whether the anomaly corresponds to a defect, e.g., overall low intensity may be less likely to correspond to a defect. In some embodiments, multiple defect types may have associated target vectors (e.g., modeled intensity profiles across relevant inspection wavelengths). Determining that a defect is not an artifact or determining a defect type may be based on whether a distance between an intensity vector and a reference vector associated with a predicted defect signature satisfies a threshold condition (e.g., the distance is small enough to be confident that the defect determination is accurate). In some embodiments, boundaries (e.g., boundaries between regions corresponding to defects and artifacts, boundaries between regions of vector space corresponding to various defect types) may be placed by subject matter experts, placed empirically, placed by artificial intelligence, or the like.
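
The distance calculation between a measured intensity vector and per-type target vectors can be sketched as follows. This is an illustrative Python example under assumptions: Euclidean distance is used as the metric (the disclosure does not fix one), and the defect type names, target vectors, and distance threshold are hypothetical.

```python
import math

def nearest_signature(intensity_vector, signatures, max_distance):
    """Return (defect_type, distance) for the closest target vector.

    The defect type is None when no target vector lies within max_distance,
    i.e., the threshold condition is not satisfied.
    """
    best_type, best_dist = None, float("inf")
    for defect_type, target in signatures.items():
        dist = math.dist(intensity_vector, target)  # Euclidean distance
        if dist < best_dist:
            best_type, best_dist = defect_type, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_type, best_dist

# Hypothetical modeled intensity profiles across four inspection wavelengths.
signatures = {
    "particle": [0.9, 0.7, 0.4, 0.2],
    "void": [0.2, 0.3, 0.8, 0.9],
}
measured = [0.8, 0.7, 0.5, 0.2]
result = nearest_signature(measured, signatures, max_distance=0.3)
unmatched = nearest_signature([0.5, 0.5, 0.5, 0.5], signatures, max_distance=0.3)
```

The measured vector lies close to the hypothetical "particle" profile and is matched to it, while the flat vector is farther than the threshold from every profile and is left unclassified.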

[0121] At block 420, process logic optionally determines a type of the defect based on a relationship between intensity values and corresponding inspection wavelengths, e.g., based on intensity vector values. Determining the defect type may include determining a distance between the intensity vector and each of a set of target vectors corresponding to various defect types. Determining the defect type may include simplifying the vector, e.g., rounding intensity values, binarizing the intensity values, or the like. Simplifying the intensity vector values may improve performance of defect type classification, may permit an intensity lookup (e.g., comparing an intensity vector to a defect codebook, comparing intensity to predicted defect signatures, or the like) rather than a distance calculation, or the like. Binarizing the vector values (e.g., intensities under 50% of some maximum value may be converted to 0, intensities over 50% may be converted to 1) may ease comparison to a set of reference codes associated with a plurality of defect types. In some embodiments, a binarized intensity vector may be compared to one or more binary codes associated with defects, defect types, or the like.
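
Binarization followed by a codebook lookup can be sketched as below. The Python example makes several assumptions for illustration: the "maximum value" is taken to be the largest intensity within the vector itself, values exactly at the 50% cutoff are mapped to 1, and the codebook entries and defect type names are hypothetical.

```python
def binarize(intensities, fraction=0.5):
    """Map each intensity to 1 if it is at least fraction of the vector maximum, else 0."""
    peak = max(intensities)
    if peak <= 0:
        return tuple(0 for _ in intensities)  # no signal in any channel
    return tuple(1 if value >= fraction * peak else 0 for value in intensities)

def lookup_defect_type(intensities, codebook):
    """Compare the binarized intensity vector to binary codes of known defect types."""
    return codebook.get(binarize(intensities), "unknown")

# Hypothetical codebook: binary code per wavelength channel -> defect type.
codebook = {
    (1, 1, 0, 0): "particle",
    (0, 0, 1, 1): "void",
}
code = binarize([0.8, 0.7, 0.3, 0.1])
matched = lookup_defect_type([0.8, 0.7, 0.3, 0.1], codebook)
unmatched_type = lookup_defect_type([0.1, 0.9, 0.1, 0.9], codebook)
```

The lookup replaces a distance calculation with a dictionary access, at the cost of discarding intensity detail within each channel.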

[0122] At block 422, process logic provides an alert to a user indicative of the defect. The alert may include defect location. The alert may include defect type. The process logic may further enact corrective actions. Process logic may make one or more updates or perform one or more corrective operations based on the presence of one or more defects on one or more substrates. Corrective actions may include updating a process recipe, scheduling maintenance of a process chamber, scheduling replacement of one or more components of a process chamber, scheduling or initiating cleaning or seasoning operations (e.g., cleaning or seasoning recipes associated with operation of components of the process chamber to adjust future performance of the process chamber), or the like. One or more of the operations described in connection with FIG. 4B may be performed by algorithms, physics-based models, artificial intelligence models, or the like.

[0123] FIG. 4C is a flow diagram of a method 400C for training an artificial intelligence model to perform multi-wavelength substrate defect analysis, according to some embodiments. At block 432, process logic obtains a plurality of inspection images. The plurality of inspection images comprises images associated with a plurality of inspection wavelengths. The plurality of inspection images includes images associated with a plurality of substrates. The plurality of inspection images may include images targeting multiple areas of a substrate, e.g., complete scans of substrates may include many images each imaging a different portion of the substrate.

[0124] At block 434, process logic obtains a plurality of defect location data associated with the plurality of substrates. The defect location data may be generated by metrology. The defect location data may indicate presence or absence of defects at any location imaged.

[0125] At block 436, process logic trains an artificial intelligence model. The model may be a U-net model. The model may be configured to determine defect location data based on a set of inspection images. Training the model may include providing the plurality of inspection images as training input. Training the model may include providing the plurality of defect location data as target output.

[0126] In some embodiments, the output data may comprise a map corresponding to locations of the substrate. The map may indicate defect locations, e.g., the map may indicate a likelihood of a defect being present at locations represented by the map. The map may be a binary map, e.g., pixels of the map may indicate that there is predicted to be a defect in the corresponding location or there is not expected to be a defect in the corresponding location. In some embodiments, output may be downsampled compared to input. For example, input may include an array of images, such as a 128x128xN set of input data (e.g., 128 pixel by 128 pixel by N wavelength set of input images). Output may be a down-sampled image, e.g., a 16 by 16 grid indicating locations of detected defects in the substrate images.
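When the model output is a 16 by 16 grid for a 128 by 128 pixel input, target defect maps used in training need the same coarse resolution. The description does not say how such targets are produced; the sketch below assumes block-wise max pooling over 8 by 8 pixel blocks, so a grid cell is marked whenever any pixel in it is marked. The function name and block size are illustrative.

```python
def downsample_defect_map(defect_map, block=8):
    """Max-pool a square binary defect map.

    An output cell is 1 if any pixel inside its block-by-block region is 1.
    """
    size = len(defect_map)
    if size % block != 0 or any(len(row) != size for row in defect_map):
        raise ValueError("map must be square with a side divisible by block")
    cells = size // block
    return [
        [
            max(
                defect_map[r][c]
                for r in range(i * block, (i + 1) * block)
                for c in range(j * block, (j + 1) * block)
            )
            for j in range(cells)
        ]
        for i in range(cells)
    ]


# A 128 x 128 pixel-level map with a single defect pixel at row 20, column 100.
full_map = [[0] * 128 for _ in range(128)]
full_map[20][100] = 1

coarse_map = downsample_defect_map(full_map)
print(len(coarse_map), len(coarse_map[0]))  # grid dimensions
print(coarse_map[2][12])                    # cell containing pixel (20, 100)
```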

[0127] At block 438, process logic optionally further trains the artificial intelligence model to perform additional tasks. For example, the model may be trained to classify defect types. Training the model to classify defect types may include providing defect type data as target output, instead of or in addition to the defect location data.

[0128] FIG. 5 depicts a logical flow 500 for performance of multi-wavelength defect analysis, according to some embodiments. Flow 500 includes broad categories of actions, pre-processing 510, defect finding 520, and defect lookup 530. Pre-processing 510 includes operations directed toward collecting inspection images and preparing the images for analysis. Defect finding 520 includes operations directed toward detecting anomalies in the inspection images which may be connected to on-substrate defects. Defect lookup 530 includes operations directed toward determining which anomalies are predicted to correspond to defects, classifying the defects, and performing actions (e.g., generating output data) based on the defects.

[0129] Flow 500 begins with process logic receiving new image frame 502. The new frame is associated with a particular portion of a substrate (e.g., related to the field of view of the inspection system). The new frame is associated with a wavelength of inspection (e.g., each frame including the new frame is acquired under a certain inspection wavelength). The inspection system could be of many designs, including different options for illumination source, illumination delivery, filtering, detection, etc. For example, illumination could be provided via a scanning model (e.g., illumination provided to the substrate is scanned through different wavelengths, either by adjusting the source, providing filtering, or the like) or a wide-field model. Scanned illumination sources may include lasers, LEDs, or other scannable illumination sources. Wide-field illumination could include xenon arc lamps, mercury arc lamps, supercontinuum lasers, or the like. Delivery could be performed through free space, via fiber coupling, etc. Filtering could be performed via spectral dispersion (e.g., diffraction grating, prism dispersion, etc.) or wavelength switching mechanisms (e.g., filter wheels, tunable filtering, or the like). Detection options could include a charge-coupled device, a complementary metal oxide semiconductor detector, an array of detectors such as photomultiplier tubes or avalanche photodiodes, or the like.

[0130] Imaging techniques may be directed toward imaging multiple wavelengths simultaneously, imaging multiple points on a substrate simultaneously, or the like for improving imaging throughput. For example, a point scan may be performed with a dispersive element on the detection side, provided to an array of sensors. At each point, many wavelengths may be simultaneously probed, and images associated with the dispersed wavelengths generated. In another example scheme, illumination may be provided along a line, and detection may further be dispersed by wavelength, to generate a two-dimensional array of detection in wavelength and location along the scan line. In another example scheme, a single wavelength may be collected (e.g., by single source or filtered detection) across an area of the substrate, and a two-dimensional detection scheme may capture information from a region of the substrate in the target wavelength. In another example scheme, broadband illumination may be provided to a two-dimensional region of the substrate, which may be collected by an array of prisms and provided to multiple arrays of detectors, for capturing two-dimensional spatial information and wavelength information in a single shot. In some embodiments, the inspection system may be an optical inspection system. The inspection system may include other wavelengths instead of or in addition to optical wavelengths, e.g., infrared, ultraviolet, etc.

[0131] Once a target image frame is received, flow proceeds to calibration 504. Calibration may be used to correct for non-ideal behavior of an inspection system. In some embodiments, there may be unexpected shifts, e.g., lateral shifts, axial shifts, etc. In some embodiments, preliminary measurements may be performed to adjust properties of incoming images, such as brightness or contrast, translation or rotation, or the like. In some embodiments, one or more processes of calibration may be performed or augmented by artificial intelligence algorithms. In some embodiments, performance of calibration may include or depend on characterization of a tool or instrument, e.g., for establishing one or more calibration parameters.

[0132] Flow proceeds to processing frame 506. Processing frame 506 may include making adjustments to the incoming frame in accordance with the calibration operations to conclude pre-processing 510, upon which the image data is provided for defect finding 520.

[0133] The calibrated, pre-processed frame is provided to find corresponding structures 508. The corresponding structures may be used for performing background negation operations. Background subtraction may be performed based on die-to-die correction, cell-to-cell correction, or the like. For some substrates (e.g., integrated circuits), multiple regions of a substrate may be nominally identical, called dies. Images generated from corresponding regions on other dies of a substrate may be utilized for background correction operations. In some embodiments, adjacent dies may be convenient, as increased on-substrate distance between regions used for generating reference frames and regions associated with target frames may increase a probability of poor background subtraction, e.g., due to substrate warping or other effects aggravated over comparatively large distances.

[0134] For some substrates (e.g., substrates for use as memory devices), a pattern is repeated on a substrate. Another portion of the substrate (e.g., an image frame associated with a nearby portion) may include a nominally identical pattern, and may be used for background cancelation operations. In some embodiments, multiple corresponding structures may be utilized. For example, 4 adjacent dies may be used for background cancelation. In some embodiments, several combinations of potential corresponding structure images may be collected and utilized for background correction, with a “best” result (e.g., based on measurements of degree of background cancelation or another metric of interest) used for further processing operations.

[0135] Flow continues to register frames 509. Registering frames (e.g., registering the reference frames including corresponding structures to the target frame) may be performed to ensure the information can be well extracted. High quality background subtraction may markedly improve performance of the multi-wavelength defect analysis operations. Many algorithms may be used for registration, including intensity-based methods (e.g., mutual information, cross correlation, normalized cross correlation, etc.), feature-based methods (e.g., scale-invariant feature transform, Harris corner detection, etc.), optimization-based methods (e.g., gradient descent, Powell’s method, etc.), or the like. Artificial intelligence methods may also be used for registration, such as a convolutional neural network, spatial transformer network, feature pyramid network, or the like.
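Of the registration options listed, cross correlation is the simplest to illustrate. The sketch below is an assumption-laden toy rather than a production routine: it searches integer translations only, uses plain (not normalized) cross correlation averaged over the overlapping area, and operates on nested Python lists so that it has no dependencies. All function names are hypothetical.

```python
def cross_correlation(target, reference, dy, dx):
    """Mean product of overlapping pixels when reference is shifted by (dy, dx)."""
    rows, cols = len(target), len(target[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            rr, cc = r - dy, c - dx  # reference pixel that lands on (r, c)
            if 0 <= rr < rows and 0 <= cc < cols:
                total += target[r][c] * reference[rr][cc]
                count += 1
    return total / count if count else 0.0


def estimate_shift(target, reference, max_shift=3):
    """Return the integer (dy, dx) within +/- max_shift that maximizes correlation."""
    candidates = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return max(candidates, key=lambda s: cross_correlation(target, reference, *s))


def apply_shift(image, dy, dx, fill=0.0):
    """Translate an image by (dy, dx), filling uncovered pixels with `fill`."""
    rows, cols = len(image), len(image[0])
    return [
        [
            image[r - dy][c - dx] if 0 <= r - dy < rows and 0 <= c - dx < cols else fill
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def blank(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Reference frame with two bright structures; the target shows the same
# structures displaced by one row down and two columns left.
reference = blank(10, 10)
reference[4][4] = 1.0
reference[7][2] = 1.0
target = blank(10, 10)
target[5][2] = 1.0
target[8][0] = 1.0

shift = estimate_shift(target, reference)
registered = apply_shift(reference, *shift)
print(shift)
```

Real inspection images would typically need sub-pixel shifts, rotation, and intensity normalization, which this sketch does not attempt.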

[0136] Flow continues to generate reference frame 512. Generating the reference frame may be based on the one or more corresponding structure images, and may include averaging, maximum intensity projection, minimum intensity projection, or the like. Generating the reference frame may be based on improving background cancelation, e.g., by leveraging reference data from neighboring structures. After generating the reference frame, flow can proceed to background correction 514. Background correction may utilize the target frame and the reference frame to produce a background-free image, e.g., via background subtraction. If registration and generation of a reference frame are performed well, the highest intensity signals of background-free inspection images are likely to be defects or measurement artifacts. In some embodiments, using a subset of available reference images to generate the reference frame may yield better performance, e.g., when a reference frame and the target frame both have a defect. In such cases, metrics, models, or algorithms may be applied to improve performance of the reference frame, e.g., based on a completeness of background subtraction.
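Reference frame generation and background subtraction can be sketched with plain Python lists. The combining rule is passed in as a parameter so that averaging, maximum intensity projection, or minimum intensity projection can be swapped; the function names and the toy pixel values are assumptions for illustration, and the images are assumed to be registered already.

```python
from statistics import mean


def build_reference_frame(reference_images, combine=mean):
    """Combine registered reference images pixel by pixel.

    `combine` may be statistics.mean (averaging), max (maximum intensity
    projection), or min (minimum intensity projection).
    """
    rows, cols = len(reference_images[0]), len(reference_images[0][0])
    return [
        [combine([image[r][c] for image in reference_images]) for c in range(cols)]
        for r in range(rows)
    ]


def subtract_background(target, reference):
    """Return the background-free image target - reference."""
    return [
        [t - b for t, b in zip(target_row, reference_row)]
        for target_row, reference_row in zip(target, reference)
    ]


# Nominal repeated pattern, e.g., the same region imaged on neighboring dies.
pattern = [[10, 50, 10], [10, 50, 10]]
flawed_neighbor = [[50, 50, 10], [10, 50, 10]]  # a reference die with its own defect
target_frame = [[10, 50, 10], [10, 50, 40]]     # defect signal of +30 at (1, 2)

references = [pattern, flawed_neighbor, pattern]
by_mean = subtract_background(target_frame, build_reference_frame(references))
by_min = subtract_background(target_frame, build_reference_frame(references, combine=min))
print(by_mean)
print(by_min)
```

In this toy case the defect on the neighboring die leaves a negative residual under averaging but not under minimum intensity projection, which is one way a choice of combining rule or reference subset could affect the completeness of background subtraction.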

[0137] Flow continues to spot detection 516. Spot detection 516 may include one or more algorithms or other methods for finding spots of intensity in the background corrected images, e.g., likely defects. Spot detection 516 may include Gaussian fitting, maximum intensity peak finding, difference of Gaussians, Laplacian of Gaussians, Hough circle finding, or other algorithms.
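As one illustration of maximum intensity peak finding, the sketch below flags pixels of a background-corrected image that meet an intensity threshold and are strictly brighter than all of their neighbors. The function name, the 8-neighborhood, and the threshold value are assumptions; flat plateaus are deliberately not reported by this simple rule.

```python
def find_spots(image, threshold):
    """Return (row, col, intensity) for strict local maxima at or above threshold."""
    rows, cols = len(image), len(image[0])
    spots = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            if value < threshold:
                continue
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                spots.append((r, c, value))
    return spots


# Toy background-free image with two bright spots.
background_free = [
    [0, 1, 0, 0, 0],
    [1, 9, 2, 0, 0],
    [0, 2, 1, 0, 1],
    [0, 0, 0, 2, 6],
    [0, 0, 0, 1, 2],
]
print(find_spots(background_free, threshold=5))
```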

[0138] Flow continues to loop over first metric 518 and over second metric 522. In some embodiments, the first and second metrics may be wavelength and region of the substrate. Looping by both metrics may be performed until spot detection has been performed on all areas of interest of the substrate, on inspection images associated with all inspection wavelengths of interest. Operations of defect lookup 530 may then be performed.

[0139] Flow continues from defect finding 520 to congregate defects 524. In some embodiments, defect detection associated with a particular region of the substrate may flag anomalies in multiple locations, e.g., slightly different locations in different wavelength channels. Congregating defects may include generating a table of locations, intensities, and wavelengths corresponding to detected anomalies. Congregating defects may include grouping anomalies into likely defect groups, e.g., similarly positioned defect signals across multiple wavelength channels may be congregated with positions averaged or otherwise aggregated.
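The congregation step can be sketched as a greedy grouping of detections by position. The tuple layout, the wavelengths, and the separation threshold are hypothetical; a detection joins the first group whose mean position lies within the threshold, and group positions are then averaged as the paragraph describes.

```python
import math


def congregate(detections, max_separation=2.0):
    """Group detections from all wavelength channels by spatial proximity.

    Each detection is (wavelength_nm, x, y, intensity). A detection joins the
    first group whose mean position is within max_separation, else it starts
    a new group.
    """
    groups = []
    for detection in detections:
        _, x, y, _ = detection
        for group in groups:
            cx = sum(d[1] for d in group) / len(group)
            cy = sum(d[2] for d in group) / len(group)
            if math.hypot(x - cx, y - cy) <= max_separation:
                group.append(detection)
                break
        else:
            groups.append([detection])
    return groups


def summarize(group):
    """Average the position and collect the intensity seen in each channel."""
    return {
        "x": sum(d[1] for d in group) / len(group),
        "y": sum(d[2] for d in group) / len(group),
        "intensities": {d[0]: d[3] for d in group},
    }


# Hypothetical anomalies flagged in three wavelength channels.
detections = [
    (405, 10.0, 20.0, 0.8),
    (532, 10.6, 19.7, 0.5),
    (633, 10.2, 20.4, 0.1),
    (405, 40.0, 5.0, 0.3),
    (633, 40.5, 5.5, 0.9),
]
groups = congregate(detections)
table = [summarize(group) for group in groups]
for row in table:
    print(row)
```

Greedy grouping depends on detection order; a clustering method could be substituted where detections are dense.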

[0140] Flow continues to generate intensity vectors 526. Vectors for each congregated defect may be generated. The vectors may be N-dimensional, where N is the number of wavelength channels of interest. The vector values may be intensities of the defect signal. In some embodiments, the vector values may be simplified, e.g., rounded, binarized, or the like.
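A minimal sketch of building an N-dimensional intensity vector from per-channel intensities follows. The channel list is hypothetical, and filling channels in which no anomaly was detected with 0.0 is an assumption, since the description does not say how missing channels are handled.

```python
WAVELENGTHS_NM = (405, 532, 633)  # hypothetical wavelength channels of interest


def intensity_vector(channel_intensities, wavelengths=WAVELENGTHS_NM, missing=0.0):
    """Order per-channel intensities into a fixed-length vector.

    Channels with no detected anomaly are filled with `missing`.
    """
    return [channel_intensities.get(w, missing) for w in wavelengths]


def rounded(vector, decimals=1):
    """Simplify a vector by rounding each intensity."""
    return [round(value, decimals) for value in vector]


print(intensity_vector({405: 0.3, 633: 0.9}))
print(rounded([0.8123, 0.4961, 0.1049]))
```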

[0141] Flow continues to classify defect vs. artifact 528. Attributes may be utilized to classify whether an anomaly flagged and congregated corresponds to an actual defect or an artifact, nuisance, or the like. Intensity vectors, the congregate table, or another input may be utilized in determining whether a detected anomaly is likely to be an on-substrate defect or some other artifact. In some embodiments, intensity vectors may be compared to known or predicted intensity responses, e.g., a defect codebook to determine whether they are likely to be defects or artifacts. In some embodiments, an aggregate or total intensity may be used to determine whether the anomaly corresponds to a defect. In some embodiments, a distance metric of the vector from one or more target vectors may be utilized to determine whether the anomaly corresponds to a defect. In some embodiments, an artificial intelligence model may perform classification operations.

[0142] Flow continues to classify defect types 532. Classification of defect types may be performed based on one or more vectors of intensities, the congregate table, or the like. Boundaries may be drawn in N-dimensional vector space between defect types. In some embodiments, boundary drawing may be performed by an artificial intelligence model. In some embodiments, classification operations may be performed by an artificial intelligence model. In some embodiments, distance from one or more defect fingerprints (e.g., vector intensity codebook) may be utilized in classifying defect types. Output of defect analysis may then be provided via generate defect list 534, for use in further actions, corrective actions, providing alerts to users, etc.
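Classifying by distance from defect fingerprints amounts to a nearest-neighbor assignment, which implicitly draws boundaries between defect types in the N-dimensional vector space. In the sketch below the fingerprint values, type names, and rejection threshold are hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math

# Hypothetical fingerprints: expected intensity per wavelength channel.
FINGERPRINTS = {
    "particle": [0.9, 0.5, 0.1],
    "scratch": [0.4, 0.8, 0.4],
    "void": [0.1, 0.4, 0.9],
}


def classify_defect_type(vector, fingerprints=FINGERPRINTS, max_distance=0.3):
    """Return the nearest fingerprint's type, or "unclassified" if none is close."""
    label, distance = min(
        ((name, math.dist(vector, reference)) for name, reference in fingerprints.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else "unclassified"


print(classify_defect_type([0.85, 0.55, 0.15]))
print(classify_defect_type([0.5, 0.1, 0.5]))
```

The rejection threshold keeps vectors far from every fingerprint out of the defect list's typed entries; boundaries placed by experts or learned by a model could replace the nearest-fingerprint rule.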

[0143] In some embodiments, many of the operations described with respect to FIG. 5 may be replaced by operations of a trained artificial intelligence model, such as a U-net model. An artificial intelligence model (e.g., a neural network such as a deep learning model, or the like) may abstract much of the complexity of the process. In some embodiments, a more thorough analysis procedure may be used, and output of the analysis procedure may be provided to train, retrain, or update an artificial intelligence model, e.g., for improved computational efficiency for future substrates. In some embodiments, an artificial intelligence model may enter in various portions of the flow described in FIG. 5. For example, the model may be trained to receive as input the raw inspection images, calibrated images, or background corrected images and generate as output predictions on defect locations, defect types, etc.

[0144] FIG. 6 is a block diagram illustrating a computer system 600, according to some embodiments. In some embodiments, computer system 600 may be connected (e.g., via a network, such as a Local Area Network (LAN), an intranet, an extranet, or the Internet) to other computer systems. Computer system 600 may operate in the capacity of a server or a client computer in a client-server environment, or as a peer computer in a peer-to-peer or distributed network environment. Computer system 600 may be provided by a personal computer (PC), a tablet PC, a Set-Top Box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that device. Further, the term "computer" shall include any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods described herein.

[0145] In a further aspect, the computer system 600 may include a processing device 602, a volatile memory 604 (e.g., Random Access Memory (RAM)), a non-volatile memory 606 (e.g., Read-Only Memory (ROM) or Electrically-Erasable Programmable ROM (EEPROM)), and a data storage device 618, which may communicate with each other via a bus 608.

[0146] Processing device 602 may be provided by one or more processors such as a general purpose processor (such as, for example, a Complex Instruction Set Computing (CISC) microprocessor, a Reduced Instruction Set Computing (RISC) microprocessor, a Very Long Instruction Word (VLIW) microprocessor, a microprocessor implementing other types of instruction sets, or a microprocessor implementing a combination of types of instruction sets) or a specialized processor (such as, for example, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), or a network processor).

[0147] Computer system 600 may further include a network interface device 622 (e.g., coupled to network 674). Computer system 600 also may include a video display unit 610 (e.g., an LCD), an alphanumeric input device 612 (e.g., a keyboard), a cursor control device 614 (e.g., a mouse), and a signal generation device 620.

[0148] In some embodiments, data storage device 618 may include a non-transitory computer-readable storage medium 624 (e.g., non-transitory machine-readable medium, non-transitory machine-readable storage medium, or the like) which may store instructions 626 encoding any one or more of the methods or functions described herein, including instructions encoding components of FIG. 1 (e.g., predictive component 114, corrective action component 122, model 190, etc.) and for implementing methods described herein. The non-transitory machine-readable storage medium may store instructions which are used to execute methods related to multi-wavelength defect analysis operations, adjusting processing system operations to improve substrate processing operations, reducing defect generation during substrate processing, or the like.

[0149] Instructions 626 may also reside, completely or partially, within volatile memory 604 and / or within processing device 602 during execution thereof by computer system 600; hence, volatile memory 604 and processing device 602 may also constitute machine-readable storage media.

[0150] While computer-readable storage medium 624 is shown in the illustrative examples as a single medium, the term "computer-readable storage medium" shall include a single medium or multiple media (e.g., a centralized or distributed database, and / or associated caches and servers) that store the one or more sets of executable instructions. The term "computer-readable storage medium" shall also include any tangible medium that is capable of storing or encoding a set of instructions for execution by a computer that cause the computer to perform any one or more of the methods described herein. The term "computer-readable storage medium" shall include, but not be limited to, solid-state memories, optical media, and magnetic media.

[0151] The methods, components, and features described herein may be implemented by discrete hardware components or may be integrated in the functionality of other hardware components such as ASICs, FPGAs, DSPs or similar devices. In addition, the methods, components, and features may be implemented by firmware modules or functional circuitry within hardware devices. Further, the methods, components, and features may be implemented in any combination of hardware devices and computer program components, or in computer programs.

[0152] Unless specifically stated otherwise, terms such as “receiving,” “performing,” “providing,” “obtaining,” “causing,” “accessing,” “determining,” “adding,” “using,” “training,” “reducing,” “generating,” “correcting,” or the like, refer to actions and processes performed or implemented by computer systems that manipulate and transform data represented as physical (electronic) quantities within the computer system registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices. Also, the terms "first," "second," "third," "fourth," etc. as used herein are meant as labels to distinguish among different elements and may not have an ordinal meaning according to their numerical designation.

[0153] Examples described herein also relate to an apparatus for performing the methods described herein. This apparatus may be specially constructed for performing the methods described herein, or it may include a general purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program may be stored in a computer-readable tangible storage medium.

[0154] The methods and illustrative examples described herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used in accordance with the teachings described herein, or it may prove convenient to construct more specialized apparatus to perform methods described herein and / or each of their individual functions, routines, subroutines, or operations. Examples of the structure for a variety of these systems are set forth in the description above.

[0155] The above description is intended to be illustrative, and not restrictive. Although the present disclosure has been described with references to specific illustrative examples and embodiments, it will be recognized that the present disclosure is not limited to the examples and embodiments described. The scope of the disclosure should be determined with reference to the following claims, along with the full scope of equivalents to which the claims are entitled.

Claims

Attorney Docket No.: 36119.2956 (L2093PCT)

What is claimed is:

1. A method, comprising:
obtaining a plurality of images of a substrate, each of the plurality of images associated with a different inspection wavelength;
obtaining a plurality of reference frames, each associated with one of the images;
based on the plurality of images and the plurality of reference frames, determining a defect of the substrate; and
providing an alert to a user indicative of the defect.

2. The method of claim 1, wherein determining the defect comprises:
utilizing a first image of the plurality of images and a first reference frame of the plurality of reference frames to generate a first background-free image; and
detecting an abnormality in the first background-free image.

3. The method of claim 2, wherein the first reference frame is generated based on one or more regions of the substrate, different than a region associated with the first image.

4. The method of claim 3, wherein the one or more regions comprise corresponding areas of one or more dies of the substrate different than a die associated with the first image, or one or more areas of the die associated with the first image, wherein the die associated with the first image comprises a repeated pattern of structures.

5. The method of claim 2, further comprising:
determining a first intensity, the first intensity associated with the first image and the abnormality;
utilizing a second image of the plurality of images and a second reference frame of the plurality of reference frames to generate a second background-free image;
detecting the abnormality in the second background-free image;
determining a second intensity, the second intensity associated with the second image and the abnormality; and
based on the first intensity, the second intensity, and a distance metric, determining that the abnormality corresponds to the defect.

6. The method of claim 5, further comprising determining a type of the defect based on a relationship between the first intensity and the second intensity, and a first inspection wavelength associated with the first intensity and a second inspection wavelength associated with the second intensity.

7. The method of claim 6, wherein determining the type of defect comprises converting intensities associated with the abnormality to a binary code, each digit of the binary code corresponding to one of the inspection wavelengths, and comparing the binary code to a plurality of binary codes associated with a plurality of defect types.

8. The method of claim 5, wherein determining the abnormality corresponds to the defect comprises:
determining a vector of intensities associated with the abnormality, each intensity corresponding to one of the inspection wavelengths;
determining an aggregate intensity of the vector; and
determining that a combination of a distance between the vector of intensities and one or more reference vectors associated with predicted defect signatures and the aggregate intensity satisfies a threshold condition.

9. The method of claim 1, further comprising performing a corrective action based on the defect, wherein the corrective action comprises one or more of:
updating a process recipe associated with the substrate;
scheduling maintenance of a process chamber associated with the substrate; or
scheduling replacement of one or more components of a process tool associated with the substrate.

10. The method of claim 1, wherein determining the defect comprises:
determining a first provisional location of the defect based on a first inspection wavelength;
determining a second provisional location of the defect based on a second inspection wavelength; and
determining that the first provisional location is within a threshold distance of the second provisional location.

11. A method, comprising:
obtaining, by a processing device, a plurality of inspection images, the plurality of inspection images comprising:
images associated with a plurality of inspection wavelengths, and
images associated with a plurality of substrates;
obtaining a plurality of defect location data associated with the plurality of substrates;
training an artificial intelligence model to determine defect location data based on a set of inspection images, the training comprising providing the plurality of inspection images as training input and the plurality of defect location data as target output.

12. The method of claim 11, wherein the artificial intelligence model comprises a U-net model.

13. The method of claim 11, wherein the artificial intelligence model is configured to output an image indicating defect locations, and wherein the output image is down-sampled compared to an associated input image.

14. The method of claim 11, wherein the artificial intelligence model is configured to receive, as input, a plurality of inspection images, each associated with a target spatial region of a substrate, and each associated with one of the plurality of inspection wavelengths.

15. The method of claim 11, further comprising providing defect type data as target output, wherein the artificial intelligence model is further trained to determine a defect type based on the set of inspection images.

16. A non-transitory machine-readable storage medium, storing instructions which, when executed, cause a processing device to perform operations comprising:
obtaining a plurality of images of a substrate, each of the plurality of images associated with a different inspection wavelength;
obtaining a plurality of reference frames, each associated with one of the images;
based on the plurality of images and the plurality of reference frames, determining a defect of the substrate; and
providing an alert to a user indicative of the defect.

17. The non-transitory machine-readable storage medium of claim 16, wherein determining the defect comprises:
utilizing a first image of the plurality of images and a first reference frame of the plurality of reference frames to generate a first background-free image; and
detecting an abnormality in the first background-free image.

18. The non-transitory machine-readable storage medium of claim 17, wherein the first reference frame is generated based on one or more regions of the substrate, different than a region associated with the first image.

19. The non-transitory machine-readable storage medium of claim 17, wherein the operations further comprise:
determining a first intensity, the first intensity associated with the first image and the abnormality;
utilizing a second image of the plurality of images and a second reference frame of the plurality of reference frames to generate a second background-free image;
detecting the abnormality in the second background-free image;
determining a second intensity, the second intensity associated with the second image and the abnormality; and
based on the first intensity, the second intensity, and a distance metric, determining that the abnormality corresponds to the defect.

20. The non-transitory machine-readable storage medium of claim 19, wherein determining the abnormality corresponds to the defect comprises:
determining a vector of intensities associated with the abnormality, each intensity corresponding to one of the inspection wavelengths;
determining an aggregate intensity of the vector; and
determining that a combination of a distance between the vector of intensities and one or more reference vectors associated with predicted defect signatures and the aggregate intensity satisfies a threshold condition.