Generation of synthetic microscope images of substrates

Machine learning models trained on spectral and measurement data generate synthetic microscope images and CD profiles, addressing the limitations of conventional predictive modeling by optimizing model selection and reducing resource consumption.

JP2026520806A · Pending · Publication Date: 2026-06-25 · APPLIED MATERIALS INC

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: APPLIED MATERIALS INC
Filing Date: 2024-05-24
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

Conventional predictive modeling algorithms struggle to accurately predict spatial variations in process results, such as critical dimension (CD) profiles across substrates. They often require destructive and costly measurements, and they are limited by the variability of measurement techniques and by high resource consumption.

Method used

A method involving machine learning models trained on spectral and measurement data to generate synthetic microscope images and CD profiles, using feature model configurations and combinations to optimize model selection, enabling non-destructive and cost-effective prediction of substrate features.

Benefits of technology

Enables accurate and efficient generation of synthetic microscope images and CD profiles, reducing material waste and costs associated with destructive measurements while maintaining high prediction accuracy.

Abstract

The method includes processing measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict a critical dimension (CD) profile of the substrate. The method further includes generating a CD profile prediction image based on the predicted CD profile of the substrate. The method further includes processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

Description

Technical Field

[0001] This disclosure relates to an image of a substrate. More particularly, this disclosure relates to generating a synthetic microscope image of a substrate.

Background Art

[0002] Products may be produced by performing one or more manufacturing processes using manufacturing equipment. For example, semiconductor manufacturing equipment may be used to produce a substrate by a semiconductor manufacturing process. Products are produced such that specific characteristics are suitable for a target application. Machine learning models are used in various process control and prediction functions associated with manufacturing equipment. The machine learning models are trained using data associated with the manufacturing equipment. Images of products (e.g., manufactured devices) may be taken, and those images may, for example, enhance the understanding of device function, faults, and / or performance, or may be used for measurement or inspection.

Summary of the Invention

[0003] The following is a simplified summary of the present disclosure to provide a basic understanding of some aspects of the present disclosure. This summary is not an extensive overview of the present disclosure. This summary is not intended to identify key or critically important elements of the present disclosure, nor is it intended to limit the scope of the specific embodiments of the present disclosure or the scope of the claims. The sole purpose of this summary is to present some concepts of the present disclosure in a simplified form as a prelude to the more detailed description that follows.

[0004] In one aspect of this disclosure, the method includes receiving spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The method further includes determining a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The method further includes determining a plurality of feature model combinations, wherein each of the plurality of feature model combinations includes a subset of the plurality of feature model configurations. The method further includes generating a plurality of input datasets, wherein each of the plurality of input datasets is generated on the basis of applying spectral data to each of the plurality of feature model combinations. The method further includes training a plurality of machine learning models, wherein each machine learning model is trained to produce an output using an input dataset from a plurality of input datasets and measurement data. The method further includes selecting from the plurality of trained machine learning models a trained machine learning model that satisfies one or more selection criteria.

[0005] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions that, when executed, cause a processing device to perform operations. The operations include receiving spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The operations further include determining a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The operations further include determining a plurality of feature model combinations, wherein each of the plurality of feature model combinations includes a subset of the plurality of feature model configurations. The operations further include generating a plurality of input datasets, wherein each of the plurality of input datasets is generated based on applying the spectral data to each of the plurality of feature model combinations. The operations further include training a plurality of machine learning models, wherein each machine learning model is trained to produce an output using an input dataset from the plurality of input datasets and the measurement data. The operations further include selecting, from the plurality of trained machine learning models, a trained machine learning model that satisfies one or more selection criteria.

[0006] Additional aspects of the present disclosure include a system, the system including a memory and a processing device coupled to the memory. The processing device is configured to receive spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The processing device is further configured to determine a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The processing device is further configured to determine a feature model combination, wherein the feature model combination includes a subset of the plurality of feature model configurations. The processing device is further configured to generate an input dataset, wherein the input dataset is generated based on applying spectral data to the feature model combination. The processing device is further configured to train a plurality of machine learning models, wherein each machine learning model is trained to produce an output using the input dataset and measurement data. The processing device is further configured to select from the plurality of trained machine learning models a trained machine learning model that satisfies one or more selection criteria.

[0007] In another aspect of the present disclosure, the method includes processing measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict a critical dimension (CD) profile of the substrate. The method further includes generating a CD profile prediction image based on the predicted CD profile of the substrate. The method further includes processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

[0008] In another aspect of the present disclosure, the method includes receiving a plurality of scanning electron microscope (SEM) images and a plurality of CD measurements related to a substrate. The method further includes generating a plurality of CD profile images based on the plurality of CD measurements. The method further includes generating an input dataset comprising the plurality of SEM images and the plurality of CD profile images. The method further includes training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training inputs and providing the plurality of SEM images to the machine learning model as target outputs.

[0009] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions that, when executed, cause a processing device to perform operations. The operations include processing measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. The operations further include generating a CD profile prediction image based on the predicted CD profile of the substrate. The operations further include processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

[0010] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions that, when executed, cause a processing device to perform operations. The operations include receiving a plurality of SEM images and a plurality of CD measurements related to a substrate. The operations further include generating a plurality of CD profile images based on the plurality of CD measurements. The operations further include generating an input dataset containing the plurality of SEM images and the plurality of CD profile images. The operations further include training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training inputs and providing the plurality of SEM images to the machine learning model as target outputs.

[0011] Additional aspects of the present disclosure include a system, the system including a memory and a processing device coupled to the memory. The processing device is configured to process measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. The processing device is further configured to generate a CD profile prediction image based on the predicted CD profile of the substrate. The processing device is further configured to process the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

[0012] The present disclosure is illustrated by way of example, and not by way of limitation, in the accompanying drawings.

Brief Description of the Drawings

[0013]
[Figure 1] A block diagram showing an exemplary system architecture according to some embodiments.
[Figure 2A] A block diagram of a system including an exemplary dataset generator for generating datasets for one or more supervised models, according to some embodiments.
[Figure 2B] A block diagram of an exemplary dataset generator for generating datasets for a supervised model configured to generate anomaly indicators, according to some embodiments.
[Figure 3] A block diagram showing a system for generating output data according to some embodiments.
[Figures 4A-4B] Flowcharts of methods 400A-B related to the training and use of machine learning models according to one embodiment.
[Figure 5A] A block diagram showing feature modeling and training of a set of machine learning models according to some embodiments.
[Figures 5B-5D] Examples of different feature model combinations and machine learning models that can be trained in parallel.
[Figure 6] A flowchart of a method for model training and verification according to some embodiments.
[Figure 7] A block diagram showing the output processing of cross-sectional measurement data according to some embodiments.
[Figure 8] A block diagram showing feature modeling according to some embodiments.
[Figure 9A] A scatter plot of single-output training and test predictions of machine learning models according to some embodiments.
[Figure 9B] A time series diagram of single-output predictions from trained machine learning models according to some embodiments.
[Figure 9C] A wafer map of single-output predictions of machine learning models according to some embodiments.
[Figures 10A-10B] Flowcharts of methods 1000A-B related to generating synthetic microscope images by training and utilizing a machine learning model, according to one embodiment.
[Figure 11A] A block diagram showing the process of training a machine learning model to generate synthetic microscope images.
[Figure 11B] A block diagram showing the generation of synthetic microscope images.
[Figure 12] A block diagram showing the processing of scanning electron microscope (SEM) images according to some embodiments.
[Figure 13] A block diagram related to the generation of a CD profile according to some embodiments.
[Figure 14A] A block diagram of a generative adversarial network according to some embodiments.
[Figure 14B] A block diagram showing exemplary machine learning architectures for generating synthetic data, according to some embodiments.
[Figure 14C] A flowchart of a method for training a machine learning model to generate realistic synthetic microscope images, according to some embodiments.
[Figure 14D] A flowchart of a method for generating a synthetic microscope image using a trained machine-learning-based image generator, according to some embodiments.
[Figure 15] A block diagram showing a generative adversarial network according to some embodiments.
[Figure 16] An example of a synthetic microscope image according to some embodiments.
[Figure 17] A block diagram showing a computer system according to some embodiments.

Embodiments for Carrying Out the Invention

[0014] This specification describes techniques related to generating synthetic microscope images of substrates. This specification further describes techniques related to predicting the feature profiles (e.g., critical dimension (CD) profiles) of processed substrates, and training a model to predict such feature profiles.

[0015] Manufacturing equipment is used to produce products (e.g., semiconductor devices) by performing one or more operations on a substrate (e.g., a wafer). The manufacturing equipment may include a manufacturing chamber or a processing chamber for isolating the substrate from the external environment and for performing one or more processes on the substrate. The characteristics of the processed substrate are intended to meet target values to facilitate certain functions. Manufacturing parameters are selected to produce a substrate (e.g., a patterned substrate) that meets the target characteristic values. Many manufacturing parameters (e.g., hardware parameters, process parameters, etc.) contribute to the characteristics of the processed substrate. The manufacturing system may control the parameters by specifying a setpoint for the characteristic value, receiving data from sensors disposed within the manufacturing chamber, and adjusting the manufacturing equipment until the sensor readings match the setpoint. In some embodiments, a trained machine learning model is utilized to improve the performance of the manufacturing equipment.
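
The setpoint-driven adjustment described in this paragraph can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than anything taken from the disclosure: `ToyChamber` is a toy object whose single reading responds instantly to adjustments, and a simple proportional rule stands in for whatever control scheme real equipment uses.

```python
class ToyChamber:
    """Hypothetical chamber with one controllable, directly observable value."""

    def __init__(self, temperature):
        self.temperature = temperature

    def read_sensor(self):
        return self.temperature

    def adjust(self, delta):
        self.temperature += delta


def run_to_setpoint(chamber, setpoint, gain=0.5, tolerance=0.01, max_steps=200):
    """Adjust the chamber until the sensor reading is within tolerance of the setpoint.

    Returns the number of adjustments that were applied.
    """
    for step in range(max_steps):
        error = setpoint - chamber.read_sensor()
        if abs(error) <= tolerance:
            return step
        # Proportional correction: move a fixed fraction of the remaining error.
        chamber.adjust(gain * error)
    raise RuntimeError("setpoint not reached within max_steps")


chamber = ToyChamber(temperature=20.0)
steps_taken = run_to_setpoint(chamber, setpoint=60.0)
```

With a gain between 0 and 1 the remaining error shrinks geometrically, so the loop terminates for this toy chamber; real equipment has lag and noise that this sketch ignores.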

[0016] Embodiments disclosed herein include a trained machine learning model that receives measurement data of a processed substrate and estimates one or more features or parameters of the processed substrate. The measurement data may be generated by a measurement tool, which may be connected to the same mainframe as the processing chamber, and the measurement tool may receive and measure the processed substrate immediately after processing (e.g., before performing subsequent processes on the substrate).

[0017] In some cases, predictive modeling algorithms (such as virtual metrology (VM) algorithms) may be used to predict values such as critical dimensions or film thickness, using sensor data collected from the processing chamber during processing of the processed substrate. However, such predictive modeling algorithms that predict features (e.g., CD) based on sensor data generally cannot predict spatial variations in process results (e.g., changes in CD across the surface of the processed substrate). In contrast, embodiments described herein can predict such spatial variations in process results, such as CD profiles or thickness profiles across the substrate.

[0018] Embodiments disclosed herein further cover techniques for training a machine learning model to perform feature profile prediction (e.g., CD profile prediction) across a processed substrate. In the embodiments, spectral data of the processed substrate and related measurement data (e.g., scanning electron microscope (SEM) images) are received. The processing logic determines multiple feature model configurations for multiple different feature models, each of which includes one or more feature model conditions. Examples of feature models include principal component analysis (PCA) models, independent component analysis (ICA) models, and fast Fourier transform (FFT) models. Each feature model correlates features of the input data (e.g., spectral data and / or reflectometry data) with a feature profile (e.g., CD profile). The processing logic determines multiple feature model combinations, each of which includes a subset of feature model configurations. The processing logic generates multiple input datasets, each input dataset generated by applying spectral data to each of several feature model combinations. The processing logic then trains several machine learning models, each machine learning model being trained to produce an output using an input dataset from the multiple input datasets and measurement data. The processing logic then compares the accuracy of the different trained models and may select from the multiple trained machine learning models a trained machine learning model that satisfies one or more selection criteria (e.g., the trained machine learning model with the highest confidence or accuracy).
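
As a rough illustration of the flow in this paragraph, the sketch below enumerates feature model configurations, forms combinations of them, builds one input dataset per combination, trains one model per dataset, and selects the model with the lowest held-out error. It is a sketch under stated assumptions, not the disclosed implementation: the spectra and CD values are random toy data, the "condition" of each configuration is taken to be a component or bin count, ordinary least squares stands in for the machine learning model, and all function and variable names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 60 substrates, 32-point spectra, one CD value each.
n_substrates, n_wavelengths = 60, 32
spectra = rng.normal(size=(n_substrates, n_wavelengths))
cd = spectra[:, :4].sum(axis=1) + 0.05 * rng.normal(size=n_substrates)


def pca_features(x, n_components):
    """Project mean-centered spectra onto the leading principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fft_features(x, n_bins):
    """Magnitudes of the lowest-frequency FFT bins of each spectrum."""
    return np.abs(np.fft.rfft(x, axis=1))[:, :n_bins]


# Feature model configurations: (feature model name, function, condition).
configurations = [
    ("pca", pca_features, 2),
    ("pca", pca_features, 6),
    ("fft", fft_features, 4),
    ("fft", fft_features, 8),
]

# Feature model combinations: subsets with at most one configuration per feature model.
combinations = [
    combo
    for size in (1, 2)
    for combo in itertools.combinations(configurations, size)
    if len({name for name, _, _ in combo}) == size
]


def build_input_dataset(x, combo):
    """Apply the spectra to every configuration in the combination and concatenate."""
    return np.hstack([fn(x, condition) for _, fn, condition in combo])


def fit_and_score(features, target, n_train):
    """Fit least squares on the first n_train rows; return RMSE on the remaining rows."""
    design = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(design[:n_train], target[:n_train], rcond=None)
    residual = design[n_train:] @ weights - target[n_train:]
    return float(np.sqrt(np.mean(residual ** 2)))


# For brevity the feature models are fit on all spectra, including held-out rows.
results = []
for combo in combinations:
    label = "+".join(f"{name}{condition}" for name, _, condition in combo)
    rmse = fit_and_score(build_input_dataset(spectra, combo), cd, n_train=40)
    results.append((label, rmse))

# Selection criterion assumed here: lowest held-out RMSE.
best_label, best_rmse = min(results, key=lambda item: item[1])
```

Each iteration of the loop is independent of the others, which is what would allow the candidate models to be trained in parallel.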

[0019] Machine learning models are trained by using algorithms and inputting data for training the machine learning model. Based on criteria such as accuracy, processor utilization, processing speed, and / or memory utilization, a machine learning model trained with a particular algorithm may be selected if it performs better than another model trained with a different algorithm. The algorithms available for training machine learning models are generally limited, and therefore, a limited number of configurations exist when attempting to optimize a machine learning model. Many trained machine learning models do not meet the criteria for accuracy, processor utilization, processing speed, and / or memory utilization. Embodiments described herein test multiple different machine learning models in parallel, and each of these multiple different machine learning models may be associated with a different feature model and / or combination of feature models. Based on this testing, the optimal machine learning model may be selected.
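
The selection step can be sketched as a filter over candidate models followed by a ranking. The thresholds, the candidate names, and the numbers below are all made up for illustration, and choosing the most accurate eligible model is only one possible reading of "optimal".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical evaluation record for one trained model."""
    name: str
    accuracy: float      # fraction of predictions within tolerance
    latency_ms: float    # inference time per substrate
    memory_mb: float     # peak memory during inference


def select_model(candidates, min_accuracy, max_latency_ms, max_memory_mb):
    """Return the most accurate candidate meeting every criterion, or None."""
    eligible = [
        c for c in candidates
        if c.accuracy >= min_accuracy
        and c.latency_ms <= max_latency_ms
        and c.memory_mb <= max_memory_mb
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.accuracy)


# Illustrative values only; not measurements from the disclosure.
candidates = [
    Candidate("pca_only", accuracy=0.90, latency_ms=5.0, memory_mb=50.0),
    Candidate("pca_plus_fft", accuracy=0.95, latency_ms=20.0, memory_mb=200.0),
    Candidate("all_features", accuracy=0.97, latency_ms=120.0, memory_mb=900.0),
]

chosen = select_model(candidates, min_accuracy=0.90, max_latency_ms=50.0, max_memory_mb=512.0)
```

In this toy run the most accurate candidate is rejected on latency and memory, which shows why the criteria are applied before ranking.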

[0020] Machine learning models may be applied in several ways related to processing chambers and / or manufacturing equipment. A machine learning model may take sensor data measuring the values of characteristics within the processing chamber as input. A machine learning model may be configured to predict process results, e.g., measurement results of the finished product. A machine learning model may take in-situ data (e.g., reflection spectroscopy data of a semiconductor wafer during an etching process) and / or ex-situ data (e.g., measurement data related to the workpiece or substrate) as input. This machine learning model may be configured to estimate one or more characteristics (e.g., critical dimensions) of a manufactured device on a substrate or workpiece. In some embodiments, a machine learning model may accept measurement data of a substrate (e.g., generated after the process is complete) and / or in-situ data of a substrate (e.g., generated during the manufacturing process) as input. This measurement data and / or in-situ data may generally be data collected with or without damaging the substrate. Many measurements are destructive and damage the substrate, such as those performed by cross-sectional imaging tools including cross-sectional scanning electron microscopy (XSEM) or transmission electron microscopy (TEM). Machine learning models may be configured to produce outputs such as one or more measurements (e.g., critical dimension (CD) measurements) and / or measurement or feature profiles (e.g., CD profiles) that would typically be obtained only after destroying the substrate, synthetic microscope images, and predictions of the root cause of product anomalies (e.g., processing defects). These are just a few representative examples of machine learning applications related to manufacturing equipment; many others exist.

[0021] In some embodiments, product measurements may be predicted (e.g., using machine learning models, physics-based models, etc.). Measurement prediction may be performed by considering target conditions in the processing chamber, measured conditions near the substrate, and in-situ measurements of the product (e.g., taken during processing). In some embodiments, the predicted measurements may include several predictions of product dimensions, such as a prediction of the thickness at the center of the substrate, a prediction of the thickness profile across the substrate, and a prediction of the CD profile across the substrate.

[0022] In some embodiments, measurement data of a product may be obtained. Performing a measurement may involve obtaining one or more microscopic images of the product. Microscopic images may include images captured using optical techniques, images captured using electron-based techniques (e.g., scanning electron microscope (SEM) images, transmission electron microscope (TEM) images, etc.), or images captured using other similar techniques. In some embodiments, internal structures may be measured. Measuring the internal structure of a product may involve cutting a cross section of the product and taking images of the internal structure. Advantages of performing microscopy of a cross section of a product include imaging internal structures that are not normally visible, providing the ability to measure (e.g., from microscopic images) the dimensions of structures that would otherwise not be measured or predicted, and ensuring that predictive models maintain threshold accuracy. Measurements (e.g., standalone measurements, measurements performed outside the processing chamber, etc.) may be costly and / or time-consuming to perform. Measurements (e.g., cross-sectional microscopy) may damage or destroy the processed product. Measurements may be noisy, and the measurement data may require preprocessing.

[0023] In some embodiments, microscopic images may be used when training a machine learning model. For example, to train a machine learning model, dimensions measured by performing cross-sectional measurements (e.g., by SEM) may be provided as a target output for predicting the internal dimensions of a processed product. In some embodiments, large amounts of data are used to train a machine learning model. Hundreds, thousands, or even more substrates may be used when training a machine learning model. The cost of performing comprehensive measurements, such as the time spent, materials consumed, products destroyed and discarded, and the energy and processing equipment used in production, may be multiplied by this large amount of data collected for training.

[0024] In some embodiments, properties of microscopic images (e.g., contrast) may vary due to inconsistent measurement techniques, differences in cross-section formation or exposure introduced by different imaging technicians, or other similar factors. Performing measurements across several such images can be challenging, further increasing the cost of generating sufficient data to train machine learning models.

[0025] The methods and devices disclosed herein may solve one or more of the shortcomings of conventional solutions. In some embodiments, realistic microscopic images (e.g., top views, cross-sectional views, etc.) of the processed product are generated. These images may be used for the measurement, visualization, input, or training of machine learning models or other models.

[0026] In some embodiments, one or more trained machine learning models may be used to generate a synthetic microscope image of the product. A first trained machine learning model may be used to generate a synthetic CD profile (e.g., a predicted CD profile) of the product. The synthetic CD profile may then be input to a generative model, which generates a synthetic microscope image.

[0027] A machine learning model may be trained by using input data (i.e., measurement data and spectral data) and applying a feature model combination, which includes feature model configurations and feature model conditions, to the input data. An input dataset may be generated using spectral data and feature model combinations. From among several machine learning models trained using different feature model combinations, a machine learning model that satisfies certain criteria may be selected. The selected machine learning model may be a model trained to generate a feature profile (e.g., CD profile) of the processed substrate.

[0028] In some embodiments, a large amount of historical image data may be available. For example, a large amount of historical data related to related products may be available for use when training a synthetic microscope image generation machine learning model, including related products of different designs, older generations of products, related manufacturing processes, related process recipes, etc. In some embodiments, a generative adversarial network (GAN) generator model may be configured to generate synthetic data consistent with the distribution of true data, for example, synthetic data that is statistically and structurally similar to true microscope images.

[0029] In some embodiments, the microscope image generator may be configured to accept one or more feature profiles of the substrate (e.g., the substrate's CD profile) as input. In some embodiments, a machine learning model associated with the processing chamber (e.g., a machine learning model configured to accept reflectometry data and / or other optical data of the substrate generated after processing as input and to produce a predicted CD profile or other feature profile as output) may generate an output used by the image generator to produce a synthetic microscope image. In some embodiments, in-situ measurement values (e.g., spectral measurement values of the substrate during processing) may be used as input to the image generator. In some embodiments, integrated measurement values (e.g., measurement values taken while the substrate is not being processed but is still in a vacuum) or in-line measurement values (e.g., measurement values from equipment coupled to the processing equipment but outside the vacuum) may be used by the image generator to produce a synthetic microscope image. In some embodiments, standalone measurement values (e.g., measurement values obtained in a measurement facility that are less invasive or destructive than acquiring the target image, such as measurement values that do not damage the substrate or do not involve cross-section formation) may be used as input to the image generator.

[0030] In some embodiments, the CD profile generator may be configured to receive spectral data related to the substrate as input. In some embodiments, a machine learning model related to the processing chamber (e.g., a machine learning model configured to receive sensor data during processing and / or spectral data generated after processing as input and to produce an indication of one or more predicted measurements of the product as output) may generate an output used by the synthetic CD profile generator to generate a synthetic CD profile. In some embodiments, in-situ measurement values (e.g., spectral measurement values of the substrate during processing) may be used as input to the synthetic CD profile generator. In some embodiments, integrated measurement values (e.g., measurement values taken while the substrate is not being processed but is still in a vacuum) or in-line measurement values (e.g., measurement values from equipment coupled to the processing equipment but outside the vacuum) may be used by the synthetic CD profile generator to generate a synthetic CD profile. In some embodiments, standalone measurement values (e.g., measurement values obtained in a measurement facility that are less invasive or destructive than measuring the target CD profile, such as measurement values that do not damage the substrate or do not involve cross-section formation) may be used as input to the CD profile generator.

[0031] In some embodiments, the data used to train a generator model that generates a synthetic microscope image, and / or a machine learning model that generates a CD profile or feature profile, may be labeled with one or more attributes. The attributes may include labels that identify one or more features of the data. The attribute information may include data indicating a process recipe related to the product, for example, data indicating a series of recipe operations. The attribute information may include structural information of the product, for example, an indication of rules regarding the order and / or arrangement of parts of the product as imaged. The attribute information may include data indicating the product design. The attribute information may include target features of the output data, for example, a color scale, contrast values, and / or luminance values of the target synthetic image. The attributes may include labels that identify the state of the manufacturing system, for example, labels for defects present in the processing equipment or an indication of the time since the manufacturing equipment was installed or maintained.
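
One way to picture the attribute labels listed in this paragraph is as a record attached to each training sample. The sketch below is purely illustrative: the class names, field names, units, and example values are assumptions made for this example and are not defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleAttributes:
    """Hypothetical attribute labels for one training sample."""
    recipe_operations: List[str] = field(default_factory=list)   # series of recipe operations
    layer_order: List[str] = field(default_factory=list)         # structural ordering rules
    design_id: Optional[str] = None                               # product design indicator
    target_contrast: Optional[float] = None                       # target feature of the output image
    target_luminance: Optional[float] = None                      # target feature of the output image
    equipment_defects: List[str] = field(default_factory=list)    # manufacturing system state
    hours_since_maintenance: Optional[float] = None                # manufacturing system state


@dataclass
class LabeledSample:
    """A CD profile paired with its attribute labels (units assumed to be nm)."""
    cd_profile_nm: List[float]
    attributes: SampleAttributes


sample = LabeledSample(
    cd_profile_nm=[8.0, 10.0, 12.0],
    attributes=SampleAttributes(
        recipe_operations=["deposit", "etch"],
        layer_order=["oxide", "nitride"],
        hours_since_maintenance=36.0,
    ),
)
```

Fields left at their defaults represent attributes that were not recorded for a given sample, so a model consuming these records would need a policy for missing labels.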

[0032] In some embodiments, a synthetic image generator may accept as input a CD profile, a predicted CD profile, or a CD profile prediction image, for example, one output by a CD profile predictor generator model. The CD profile predictor generator model may be configured to synthesize data indicating product measurements (e.g., in-situ measurements, output of a predictive machine learning model, etc.) and additional product information (e.g., product design, device type, structural layer order, relationships between structural dimensions, etc.) to generate a predicted CD profile for a product or device. The CD profile predictor generator model may or may not include a trained machine learning model. The predicted CD profile may be provided as input to a synthetic microscope image generator. This generator may be configured to produce a realistic synthetic image incorporating data from the predicted CD profile, for example, a realistic synthetic image replicating structural information from the CD profile. In some embodiments, the predicted CD profile may be converted to a CD profile prediction image, and the CD profile prediction image may be provided as input to the synthetic microscope image generator.
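
The conversion of a predicted CD profile into an image can be sketched as a simple rasterization. The disclosure does not specify the image encoding, so this example makes several assumptions: the profile is taken to be a list of feature widths at successive depths, the feature is centered horizontally, and the output is a binary mask with one row per depth sample. The function name and parameters are hypothetical.

```python
import numpy as np


def render_cd_profile_image(cd_by_depth, width_px, px_per_nm=1.0):
    """Rasterize a CD profile (feature width at each depth) into a binary mask.

    Row i of the result contains ones wherever a pixel center lies inside a
    feature of width cd_by_depth[i], centered horizontally in the image.
    """
    widths = np.asarray(cd_by_depth, dtype=float)
    half_widths_px = 0.5 * widths * px_per_nm
    # Signed horizontal distance of each pixel center from the centerline.
    offsets = np.arange(width_px) + 0.5 - width_px / 2.0
    mask = np.abs(offsets)[None, :] <= half_widths_px[:, None]
    return mask.astype(np.uint8)


# Illustrative profile in nm: a feature that bows outward, then tapers.
profile_nm = [8.0, 10.0, 12.0, 10.0, 6.0]
image = render_cd_profile_image(profile_nm, width_px=16)
```

At one pixel per nanometer, each row of `image` contains as many ones as the CD at that depth, so the structural information in the profile is carried into the image that a downstream generator would receive.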

[0033] In some embodiments, the generation of synthetic data may involve the use of a GAN. A GAN is a type of unsupervised machine learning model in which training inputs are provided to the model without providing a target output during the training operation. A basic GAN consists of two parts: a generator and a discriminator. The generator generates synthetic data, for example, synthetic microscope image data. The discriminator is then provided with both the synthetic data and true data, for example, data collected by cross-sectional SEM images of a product. The discriminator attempts to label the data as true or synthetic, and the generator attempts to generate synthetic data that the discriminator cannot distinguish from true data. After the generator has achieved its target efficiency (for example, after reaching a threshold portion of outputs that the discriminator does not classify as synthetic data), the generator may be used to generate synthetic data for use in other applications.
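
The stopping condition mentioned above (a threshold portion of generator outputs that the discriminator does not classify as synthetic) can be illustrated with a minimal Python sketch. The function names, the toy discriminator, and the threshold values are hypothetical and are not taken from this disclosure; a real discriminator would be a trained network rather than the intensity rule used here.

```python
def fooled_fraction(synthetic_samples, discriminator):
    """Fraction of synthetic samples that the discriminator labels as true data."""
    if not synthetic_samples:
        raise ValueError("need at least one synthetic sample")
    fooled = sum(1 for sample in synthetic_samples if discriminator(sample))
    return fooled / len(synthetic_samples)


def generator_ready(synthetic_samples, discriminator, threshold=0.5):
    """True once the fooled fraction reaches the (assumed) target threshold."""
    return fooled_fraction(synthetic_samples, discriminator) >= threshold


def toy_discriminator(image):
    """Hypothetical stand-in: calls an image 'true' if its mean intensity exceeds 0.4."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels) > 0.4


# Four tiny 2x2 "synthetic images" with mean intensities 0.75, 0.15, 0.5, 0.3.
samples = [
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.1, 0.2], [0.1, 0.2]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.3, 0.3], [0.3, 0.3]],
]

fraction = fooled_fraction(samples, toy_discriminator)        # two of four fool it
ready_at_half = generator_ready(samples, toy_discriminator, threshold=0.5)
ready_at_three_quarters = generator_ready(samples, toy_discriminator, threshold=0.75)
```

In an actual GAN training loop, a check of this kind would typically be evaluated on a fresh batch of generator outputs after each training epoch.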

[0034] Aspects of this disclosure offer technical advantages over conventional solutions. The techniques of this disclosure enable the generation of accurate (e.g., sufficiently realistic) synthetic microscope images of a product based on several measured parameters and / or one or more design attributes of a manufactured device, either measured or predicted. In some embodiments, this technique enables the prediction of device dimensions that could otherwise be measured directly only at great expense, such as by using independent measurement equipment or by destroying the device to generate cross-sectional images. In some conventional systems, generating accurate (e.g., beyond a threshold) predictions may involve performing independent measurements (e.g., destructive cross-sectional imaging) on a large number of products. The amount of data needed to generate accurate predictions can grow further with changing chamber conditions (e.g., aging and drift, parts replacement, maintenance, etc.), rare target events (e.g., defect or anomaly detection), etc. Conventional systems may involve numerous processing runs to generate the data used to generate predictions. This can result in large amounts of wasted material, significant chamber downtime, and energy consumption.

[0035] Measurement images (e.g., SEM images of a product) can vary considerably from image to image. Synthetic image data can be made more consistent than measurement data, for example, by training a synthetic image generator using a selection of similar images, or by configuring the generator to produce synthetic images with selected contrast values, luminance values, or other uniform properties.

[0036] In some embodiments, measured images are used to train a machine learning model to predict defects present in, for example, a manufacturing or processing system based on microscopic images. The use of a synthetic microscope image generator can enable the rapid and low-cost generation of large amounts of synthetic microscope data. Attribute data may be provided to this generator to generate synthetic data containing a target set of characteristics, which may be difficult to obtain without this data. For example, the generator may be configured to generate image data associated with products processed using equipment where defects occur. Recording image data that indicates defects may involve operating processing equipment under non-ideal conditions, which may increase costs, increase processing time, increase material consumption, increase energy consumption, or reduce component lifespan, or cause other similar issues. Generating synthetic microscope images using a generator may help avoid these additional costs.

[0037] In one aspect of this disclosure, the method includes receiving spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The method further includes determining a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The method further includes determining a plurality of feature model combinations, wherein each of the plurality of feature model combinations includes a subset of the plurality of feature model configurations. The method further includes generating a plurality of input datasets, wherein each of the plurality of input datasets is generated on the basis of applying spectral data to each of the plurality of feature model combinations. The method further includes training a plurality of machine learning models, wherein each machine learning model is trained to produce an output using an input dataset from a plurality of input datasets and measurement data. The method further includes selecting from the plurality of trained machine learning models a trained machine learning model that satisfies one or more selection criteria.
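
The flow in the preceding paragraph (configurations, combinations, input datasets, training, selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the two feature models (`band_mean`, `band_slope`), their band configurations, the choice of one configuration per feature model as the "subset", the least-squares regressor, and the RMSE threshold used as the selection criterion are all hypothetical. For brevity the sketch scores each model on its training data; a practical flow would score on held-out data.

```python
from itertools import product

import numpy as np


# Hypothetical feature models: each maps (spectra, configuration) to one feature column.
def band_mean(spectra, cfg):
    lo, hi = cfg["band"]
    return spectra[:, lo:hi].mean(axis=1)


def band_slope(spectra, cfg):
    lo, hi = cfg["band"]
    return spectra[:, hi - 1] - spectra[:, lo]


# Each feature model has several configurations (here: wavelength-index bands).
FEATURE_MODELS = {
    "band_mean": (band_mean, [{"band": (0, 4)}, {"band": (4, 8)}]),
    "band_slope": (band_slope, [{"band": (0, 4)}, {"band": (4, 8)}]),
}


def feature_model_combinations(feature_models):
    """Yield combinations that pick one configuration per feature model (an assumption)."""
    names = list(feature_models)
    config_lists = [feature_models[name][1] for name in names]
    for configs in product(*config_lists):
        yield tuple(zip(names, configs))


def build_input_dataset(spectra, combination, feature_models):
    """Apply the spectral data to every (feature model, configuration) in the combination."""
    columns = [feature_models[name][0](spectra, cfg) for name, cfg in combination]
    return np.column_stack(columns + [np.ones(len(spectra))])  # last column is a bias term


def train_and_score(X, y):
    """Fit a linear model by least squares and return (weights, training RMSE)."""
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = float(np.sqrt(np.mean((X @ weights - y) ** 2)))
    return weights, rmse


def select_model(spectra, measurements, feature_models, max_rmse):
    """Train one model per combination; return the best if it meets the criterion."""
    candidates = []
    for combination in feature_model_combinations(feature_models):
        X = build_input_dataset(spectra, combination, feature_models)
        weights, rmse = train_and_score(X, measurements)
        candidates.append((rmse, combination, weights))
    best = min(candidates, key=lambda c: c[0])
    return best if best[0] <= max_rmse else None


# Fabricated demo data (not from the disclosure): 40 spectra with 8 channels each,
# and a CD value that depends on the mean of band (4, 8) and the slope of band (0, 4).
rng = np.random.default_rng(0)
spectra = rng.random((40, 8))
cd = 30.0 + 5.0 * spectra[:, 4:8].mean(axis=1) - 2.0 * (spectra[:, 3] - spectra[:, 0])

best = select_model(spectra, cd, FEATURE_MODELS, max_rmse=0.1)
```

Because the demo target is built from exactly one of the four combinations, the selection step recovers that combination; with real spectra the candidates would differ less sharply, which is why the selection criterion matters.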

[0038] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions which, when executed, cause a processing device to perform operations. The operations include receiving spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The operations further include determining a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The operations further include determining a plurality of feature model combinations, wherein each of the plurality of feature model combinations includes a subset of the plurality of feature model configurations. The operations further include generating a plurality of input datasets, wherein each of the plurality of input datasets is generated based on applying the spectral data to a respective one of the plurality of feature model combinations. The operations further include training a plurality of machine learning models, wherein each machine learning model is trained to produce an output using an input dataset from the plurality of input datasets and the measurement data. The operations further include selecting, from the plurality of trained machine learning models, a trained machine learning model that satisfies one or more selection criteria.

[0039] Additional aspects of the present disclosure include a system, the system including a memory and a processing device coupled to the memory. The processing device is configured to receive spectral data of a substrate and measurement data corresponding to the spectral data of the substrate. The processing device is further configured to determine a plurality of feature model configurations for each of a plurality of feature models, wherein each of the plurality of feature model configurations includes one or more feature model conditions. The processing device is further configured to determine a feature model combination, wherein the feature model combination includes a subset of the plurality of feature model configurations. The processing device is further configured to generate an input dataset, wherein the input dataset is generated based on applying spectral data to the feature model combination. The processing device is further configured to train a plurality of machine learning models, wherein each machine learning model is trained to produce an output using the input dataset and measurement data. The processing device is further configured to select from the plurality of trained machine learning models a trained machine learning model that satisfies one or more selection criteria.

[0040] In another aspect of the present disclosure, the method includes processing measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. The method further includes generating a CD profile prediction image based on the predicted CD profile of the substrate. The method further includes processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.
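
The intermediate step, turning a predicted CD profile into a CD profile prediction image, can be sketched as a simple rasterization. The disclosure does not specify how the image is drawn, so everything here is an assumption: the function name, the interpretation of the profile as one CD value per depth step, the binary pixel encoding, and the centering of the feature in each row.

```python
def cd_profile_to_image(cd_profile_nm, nm_per_pixel, image_width_px):
    """Rasterize a vertical CD profile into a binary image (a list of pixel rows).

    Row i corresponds to depth step i. The feature is drawn as a run of 1s whose
    length is the CD at that depth, centered horizontally (left-biased when the
    leftover width is odd).
    """
    image = []
    for cd in cd_profile_nm:
        width_px = round(cd / nm_per_pixel)
        if width_px < 0 or width_px > image_width_px:
            raise ValueError(f"CD of {cd} nm does not fit in {image_width_px} pixels")
        left = (image_width_px - width_px) // 2
        right = image_width_px - left - width_px
        image.append([0] * left + [1] * width_px + [0] * right)
    return image


# Hypothetical predicted profile: the feature widens from 4 nm at the top to 8 nm
# at the bottom, drawn at 2 nm per pixel on a 6-pixel-wide canvas.
predicted_profile_nm = [4.0, 6.0, 8.0]
prediction_image = cd_profile_to_image(predicted_profile_nm, nm_per_pixel=2.0, image_width_px=6)
```

An image of this kind would then be the input to the second trained machine learning model, which produces the synthetic microscope image.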

[0041] In another aspect of the present disclosure, the method includes receiving a plurality of SEM images and a plurality of CD measurements related to a substrate. The method further includes generating a plurality of CD profile images based on the plurality of CD measurements. The method further includes generating an input dataset comprising the plurality of SEM images and the plurality of CD profile images. The method further includes training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training inputs and providing the plurality of SEM images to the machine learning model as target outputs.

[0042] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions which, when executed, cause a processing device to perform operations. The operations include processing measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. The operations further include generating a CD profile prediction image based on the predicted CD profile of the substrate. The operations further include processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

[0043] In another aspect of this disclosure, a non-transitory machine-readable storage medium is disclosed. The storage medium stores instructions which, when executed, cause a processing device to perform operations. The operations include receiving a plurality of SEM images and a plurality of CD measurements related to a substrate. The operations further include generating a plurality of CD profile images based on the plurality of CD measurements. The operations further include generating an input dataset containing the plurality of SEM images and the plurality of CD profile images. The operations further include training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training inputs and providing the plurality of SEM images to the machine learning model as target outputs.

[0044] Additional aspects of the present disclosure include a system, the system including a memory and a processing device coupled to the memory. The processing device is configured to process measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. The processing device is further configured to generate a CD profile prediction image based on the predicted CD profile of the substrate. The processing device is further configured to process the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

[0045] Figure 1 is a block diagram showing an exemplary system 100 (exemplary system architecture) according to some embodiments. System 100 includes a client device 120, manufacturing equipment 124, sensors 126, measuring instruments 128, a prediction server 112, and a data store 140. The prediction server 112 may be part of the prediction system 110. The prediction system 110 may further include server machines 170 and 180.

[0046] Sensors 126 may provide sensor data 142 related to the manufacturing equipment 124 (for example, related to the manufacturing equipment 124 producing corresponding products such as substrates). Sensor data 142 may be used to assess the health of the equipment and / or the health of the products (e.g., product quality). The manufacturing equipment 124 may produce products according to a recipe or by performing runs over a period of time. In some embodiments, sensor data 142 may include values of one or more of optical sensor data, spectral data, temperature (e.g., heater temperature), spacing (SP), pressure, high-frequency radio frequency (HFRF), radio frequency (RF) match voltage, RF match current, RF match capacitor position, electrostatic chuck (ESC) voltage, actuator position, current, flow rate, power, voltage, etc. Sensor data 142 may include historical sensor data 144 and current sensor data 146. Current sensor data 146 may relate to the product currently being processed, the most recently processed product, a number of recently processed products, etc. Current sensor data 146 may be used as input to a trained machine learning model, for example, to generate prediction data 168. Historical sensor data 144 may include data stored in relation to previously produced products. Historical sensor data 144 may be used to train a machine learning model, for example, model 190, synthetic data generator 174, etc. Historical sensor data 144 and / or current sensor data 146 may include attribute data, for example, labels of manufacturing equipment ID or design, labels of sensor ID, type, and / or location, and labels of the state of the manufacturing equipment, such as defects present, service life, etc.

[0047] The sensor data 142 may relate to, or indicate, manufacturing parameters such as hardware parameters of the manufacturing equipment 124 (e.g., hardware settings or installed components, e.g., size, type, etc.) or process parameters of the manufacturing equipment 124 (e.g., heating device settings, gas flow, etc.). Alternatively, or in addition to, data related to several hardware parameters and / or process parameters may be stored as manufacturing parameters 150, which may include historical manufacturing parameters (e.g., related to historical processing runs) and current manufacturing parameters. The manufacturing parameters 150 may indicate input settings for the manufacturing device (e.g., heating device power, gas flow, etc.). The sensor data 142 and / or manufacturing parameters 150 may be provided while the manufacturing equipment 124 is performing the manufacturing process (e.g., readings from the equipment while processing a product). The sensor data 142 may differ for each product (e.g., each substrate). The substrate may have characteristic values (film thickness, film strain, etc.) measured by the measuring instrument 128, for example, characteristic values (film thickness, film strain, etc.) measured in an independent measuring facility. The measurement data 160 may be a component of the data store 140. The measurement data 160 may include historical measurement data 164 (for example, measurement data related to previously processed products).

[0048] In some embodiments, measurement data 160 may be provided without the use of standalone measurement equipment, for example, as in-situ measurement data (e.g., measurements or proxies for measurements collected during processing), integrated measurement data (e.g., measurements or proxies for measurements collected while the product is in the chamber or under vacuum, but not during processing), in-line measurement data (e.g., data collected after the substrate has been removed from vacuum), etc. Measurement data 160 may include current measurement data 166 (e.g., measurement data related to the product currently being processed or a recently processed product). In embodiments, measurement data 160 may include data generated by non-destructive optical measurements of the substrate after the substrate has been processed, such as reflectometry data or spectral data. Measurement data 160 may include measurements at many different locations across the surface of the substrate. A measurement tool (e.g., an integrated or in-situ measurement tool) may generate measurements at multiple locations on the substrate by moving the scanning head and / or the substrate during measurement.

[0049] In some embodiments, sensor data 142, measurement data 160, or manufacturing parameters 150 may be processed (e.g., by a client device 120 and / or a prediction server 112). Processing of sensor data 142 may include generating features. In some embodiments, these features may be patterns in sensor data 142, measurement data 160, and / or manufacturing parameters 150 (e.g., slope, width, height, peak, etc.), or combinations of values from sensor data 142, measurement data 160, and / or manufacturing parameters 150 (e.g., power derived from voltage and current). Sensor data 142 may also contain features, which may be used by the prediction component 114 to perform signal processing and / or to obtain prediction data 168. The prediction data 168 may be used to perform corrective actions, to predict product yield, etc.
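
The two kinds of features named above, combinations of values (power from voltage and current) and patterns in a trace (slope, peak, height), can be sketched in a few lines of Python. The function names, the sample values, and the use of an end-to-end slope are illustrative assumptions only.

```python
def derive_power(voltage, current):
    """Combine two sensor traces into a derived trace: P = V * I, sample by sample."""
    if len(voltage) != len(current):
        raise ValueError("voltage and current traces must have the same length")
    return [v * i for v, i in zip(voltage, current)]


def trace_features(samples, dt):
    """Extract simple pattern features from an evenly sampled trace.

    slope  : average rate of change between the first and last sample
    peak   : maximum value in the trace
    height : peak minus the minimum value
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    peak = max(samples)
    slope = (samples[-1] - samples[0]) / (dt * (len(samples) - 1))
    return {"slope": slope, "peak": peak, "height": peak - min(samples)}


# Hypothetical readings sampled every 0.5 s.
voltage = [100.0, 110.0, 120.0]   # volts
current = [2.0, 2.0, 2.5]         # amperes

power = derive_power(voltage, current)      # watts
features = trace_features(power, dt=0.5)
```

Features of this kind could serve as model inputs in place of, or alongside, the raw traces.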

[0050] Each instance (e.g., set) of sensor data 142 may correspond to a product (e.g., a substrate), a set of manufacturing equipment (e.g., a processing chamber), a type of substrate produced by the manufacturing equipment, or other similar items. Similarly, each instance of measurement data 160 and manufacturing parameter 150 may correspond to a product, a set of manufacturing equipment, a type of substrate produced by the manufacturing equipment, or other similar items. The data store may also store information relating sets of different data types, for example, information indicating that sets of sensor data, sets of measurement data, and sets of manufacturing parameters all relate to the same product, the same manufacturing equipment, the same type of substrate, etc.

[0051] In some embodiments, a processing device may be used (e.g., through the application of a machine learning model) to generate the synthetic data 162. The synthetic data may be processed in any of the ways described above in relation to other types of data, for example, by generating features, combining values, concatenating data from a particular recipe, chamber, or substrate. The synthetic data 162 may share features with some of the measurement data 160, for example, it may include image data, CD profile data (e.g., vertical CD profile data), or it may resemble microscope images (e.g., SEM and / or TEM images) contained in the measurement data 160.

[0052] In some embodiments, the prediction system 110 may generate prediction data 168 using supervised machine learning (for example, the prediction data 168 includes output from a machine learning model trained on labeled data such as sensor data labeled with other measurement data (e.g., destructive measurement data) or first measurement data (e.g., non-destructive measurement data)). In some embodiments, the prediction system 110 may generate prediction data 168 using unsupervised machine learning (for example, the prediction data 168 includes output from a machine learning model trained on unlabeled data, the output may include clustering results, principal component analysis (PCA), anomaly detection, etc.). In some embodiments, the prediction system 110 may generate prediction data 168 using semi-supervised learning (for example, the training data may include a mixture of labeled and unlabeled data, etc.).

[0053] The client device 120, manufacturing equipment 124, sensor 126, measuring instrument 128, prediction server 112, data store 140, server machines 170 and 180 may be coupled to each other via network 130 to generate prediction data 168, such as prediction feature profiles, prediction CD profiles, or prediction (e.g., synthesized) microscopic images. Such prediction data may be used, for example, to perform corrective actions or to predict product yield. In some embodiments, network 130 may provide access to cloud-based services. The operations performed by the client device 120, prediction system 110, data store 140, etc., may be performed by cloud-based virtual devices.

[0054] In some embodiments, network 130 is a public network that provides client devices 120 with access to a prediction server 112, a data store 140, and other public computing devices. In some embodiments, network 130 is a private network that provides client devices 120 with access to manufacturing equipment 124, sensors 126, measuring instruments 128, a data store 140, and other private computing devices. Network 130 may include one or more wide area networks (WANs), local area networks (LANs), wired networks (e.g., Ethernet networks), wireless networks (e.g., 802.11 networks or Wi-Fi networks), cellular networks (e.g., Long-Term Evolution (LTE) networks), routers, hubs, switches, server computers, cloud computing networks, and / or combinations thereof.

[0055] The client device 120 may include computing devices such as personal computers (PCs), laptops, mobile phones, smartphones, tablet computers, netbooks, network-connected televisions ("smart TVs"), network-connected media players (e.g., Blu-ray players), set-top boxes, over-the-top (OTT) streaming devices, and operator boxes. The client device 120 may also include a corrective action component 122. The corrective action component 122 may receive user input of instructions related to the manufacturing equipment 124 (e.g., via a graphical user interface (GUI) displayed through the client device 120). In some embodiments, the corrective action component 122 transmits these instructions to a prediction system 110, receives output from the prediction system 110 (e.g., prediction data 168), determines a corrective action based on the output, and has the corrective action implemented. In some embodiments, the corrective action component 122 acquires sensor data 142 (e.g., current sensor data 146) related to the manufacturing equipment 124 (e.g., from a data store 140, etc.) and provides the sensor data 142 (e.g., current sensor data 146) related to the manufacturing equipment 124 to the prediction system 110.

[0056] In some embodiments, the corrective action component 122 may retrieve current measurement data 166 (e.g., predicted or surrogate measurement values of the product being processed) and provide this data to the feature profile generator 175 and / or the synthetic data generator 174. The feature profile generator 175 may generate a predicted feature profile, for example, a predicted profile of the substrate's CD, thickness, material composition, dielectric value, etc. In embodiments, the feature profile generated by the feature profile generator 175 may be provided as input to the synthetic data generator 174, which may generate as output a predicted synthetic microscope image of the product related to the current measurement data 166.

[0057] In some embodiments, the corrective action component 122 may store feature profiles (e.g., CD profiles) and / or synthetic image data in the data store 140. In some embodiments, the corrective action component 122 stores data in the data store 140 to be used as input to machine learning models or other models (e.g., current sensor data 146 and current measurement data 166 provided to models 190A-Z, synthetic data generator 174, feature profile generator 175, prediction component 114, etc.), and components of the prediction system 110 (e.g., prediction server 112, server machine 170) retrieve the sensor data 142 from the data store 140. In some embodiments, the prediction server 112 may store the output of the trained model 190 (e.g., prediction data 168) in the data store 140, and a client device 120 may retrieve this output from the data store 140.

[0058] In some embodiments, the corrective action component 122 receives instructions for corrective action from the prediction system 110 and causes the corrective action to be implemented. Each client device 120 may include an operating system that enables the user to perform one or more of the following actions: generate, view, or edit data (e.g., instructions related to manufacturing equipment 124, corrective actions related to manufacturing equipment 124, etc.).

[0059] In some embodiments, measurement data 160 (e.g., historical measurement data 164) corresponds to historical characteristic data of a product (e.g., a product processed using manufacturing parameters associated with historical sensor data 144 and the historical manufacturing parameters of manufacturing parameters 150), and prediction data 168 corresponds to predicted characteristic data (e.g., of a product to be produced or a product that has been produced under conditions recorded by current sensor data 146 and / or current manufacturing parameters). In some embodiments, prediction data 168 is, or includes, predicted measurement data (e.g., virtual measurement data, a virtual synthetic microscope image, virtual CD profile data) of a product to be produced or a product that has been produced under conditions recorded as current sensor data 146, current measurement data, and / or current manufacturing parameters. In some embodiments, the prediction data 168 is, or includes, an indication of any anomaly (e.g., an abnormal product, an abnormal component, abnormal manufacturing equipment 124, abnormal energy usage, etc.) and optionally an indication of one or more causes of the anomaly. In some embodiments, the prediction data 168 is an indication of change over time or drift in a component of the manufacturing equipment 124, sensors 126, measuring instruments 128, and other similar items. In some embodiments, the prediction data 168 is an indication of the end of life of a component of the manufacturing equipment 124, sensors 126, measuring instruments 128, or other similar items. In some embodiments, the prediction data 168 is an indication of the progress of an ongoing processing operation, e.g., for use in process control.

[0060] Performing a manufacturing process that results in defective products can be costly in terms of time, energy, products, components, and manufacturing equipment 124, as well as the cost of identifying defects and discarding defective products. By inputting sensor data 142 (e.g., manufacturing parameters that are being used or are to be used to manufacture a product) and / or measurement data 160 into the prediction system 110, receiving the output of prediction data 168, and performing corrective actions based on the prediction data 168, system 100 may have the technical advantage of avoiding the costs of producing, identifying, and discarding defective products. By supplying several measurement values to a feature profile generator 175 to generate a predicted feature profile, and / or supplying several measurement values and / or a predicted feature profile to a synthetic data generator 174 and receiving a synthetic microscope image as output, it is possible to identify products that are not expected to meet performance thresholds and to stop production, perform corrective actions, send alarms to users, update recipes, etc.

[0061] Performing a manufacturing process that results in the failure of components of the manufacturing equipment 124 can lead to high costs related to downtime, product damage, equipment damage, and the need to urgently order replacement components. By inputting sensor data 142 (e.g., manufacturing parameters that are being used or are to be used to manufacture the product), measurement data, etc., into one or more trained machine learning models, receiving output of prediction data 168 (e.g., a predicted CD profile, a synthetic microscope image, etc.), and performing corrective actions based on the prediction data 168 (e.g., predicted operational maintenance such as component replacement, processing, cleaning, etc.), the system 100 may have the technical advantage of avoiding one or more of the costs of unexpected component failures, unplanned downtime, loss of productivity, unexpected equipment failures, product disposal, or other similar costs. The performance of components, e.g., of the manufacturing equipment 124, sensors 126, measuring instruments 128, and other similar components, may be monitored over time to provide indications of degraded components.

[0062] Manufacturing parameters may be suboptimal for producing a product, which can have costly consequences such as increased resource (e.g., energy, coolant, gas, etc.) consumption, increased production time, increased component failures, and an increased volume of defective products. By inputting indications of measurement data, sensor data, etc., into the feature profile generator 175 and / or the synthetic data generator 174, receiving the output of the synthetic data 162 and / or the prediction data 168, and performing corrective actions (e.g., based on the synthetic data 162 and / or the prediction data 168) to update the manufacturing parameters (e.g., set optimal manufacturing parameters), the system 100 may have the technical advantage of using optimal manufacturing parameters (e.g., hardware parameters, process parameters, optimal design) to avoid the costly consequences of suboptimal manufacturing parameters.

[0063] The corrective action may relate to one or more of the following: computational process control (CPC), statistical process control (SPC) (e.g., SPC on electronic components to determine whether a process is in control, SPC to predict the useful life of components, SPC to compare against a 3-sigma chart, etc.), advanced process control (APC), model-based process control, preventive operational maintenance, design optimization, manufacturing parameter updates, manufacturing recipe updates, feedback control, machine learning corrections, or other similar actions.
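
As one concrete reading of the 3-sigma comparison mentioned above, the sketch below computes control limits from historical values and flags a new predicted value that falls outside them. The function names, the use of the sample standard deviation, and the CD values are assumptions for illustration; the disclosure does not specify how the limits are computed.

```python
from statistics import mean, stdev


def three_sigma_limits(history):
    """Lower and upper control limits: mean -/+ 3 sample standard deviations."""
    center = mean(history)
    sigma = stdev(history)
    return center - 3 * sigma, center + 3 * sigma


def out_of_control(value, history):
    """True if the value lies outside the 3-sigma limits of the history."""
    lower, upper = three_sigma_limits(history)
    return not (lower <= value <= upper)


# Hypothetical historical CD values (nm) from earlier in-control substrates.
history_nm = [20.0, 20.2, 19.8, 20.1, 19.9]

lower, upper = three_sigma_limits(history_nm)
small_shift = out_of_control(20.3, history_nm)   # within the limits
large_shift = out_of_control(21.0, history_nm)   # outside the limits
```

A flagged value could then trigger one of the corrective actions described in the following paragraphs, such as an alarm or a parameter update.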

[0064] In some embodiments, the corrective action includes issuing an alarm (for example, an alarm to stop or not proceed with the manufacturing process if the prediction data 168 indicates a predicted anomaly, such as an anomaly of a product, a component, or the manufacturing equipment 124).

[0065] In some embodiments, the corrective action includes providing feedback control (e.g., feedback control that modifies manufacturing parameters in response to prediction data 168 indicating a predicted anomaly). In some embodiments, performing the corrective action includes performing an update to one or more manufacturing parameters. In some embodiments, performing the corrective action may include retraining a machine learning model associated with the manufacturing equipment 124. In some embodiments, performing the corrective action may include training a new machine learning model associated with the manufacturing equipment 124.

[0066] The manufacturing parameters 150 may include hardware parameters (e.g., information indicating which components are installed in the manufacturing equipment 124, information indicating component replacement, information indicating the age of the components, information indicating the software version or update, etc.) and / or process parameters (e.g., temperature, pressure, flow rate, rate, current, voltage, gas flow rate, lift speed, etc.). In some embodiments, corrective actions include causing preventive operational maintenance (e.g., replacement, processing, cleaning, etc. of components of the manufacturing equipment 124). In some embodiments, corrective actions include causing design optimization (e.g., updating manufacturing parameters, manufacturing processes, manufacturing equipment 124, etc., to optimize the product). In some embodiments, corrective actions include updating the recipe (e.g., changing the timing of when the manufacturing subsystem enters idle or active mode, changing the setpoints for various characteristic values, etc.).

[0067] The prediction server 112, server machine 170, and server machine 180 may each include one or more computing devices such as a rack-mount server, router computer, server computer, personal computer, mainframe computer, laptop computer, tablet computer, desktop computer, graphics processing unit (GPU), and accelerator application-specific integrated circuit (ASIC) (e.g., tensor processing unit (TPU)). The operation of the prediction server 112, server machine 170, server machine 180, data store 140, etc., may be performed by a cloud computing service, a cloud data storage service, etc.

[0068] The prediction server 112 may include a prediction component 114. In some embodiments, the prediction component 114 may receive current sensor data 146 and / or current manufacturing parameters (e.g., received from a client device 120 or retrieved from a data store 140), and / or recent or current measurement data (e.g., reflectometry and / or spectral data), and generate output (e.g., prediction data 168, such as predicted feature profile data). In some embodiments, the prediction data 168 may be used to perform corrective actions related to the manufacturing equipment 124. In some embodiments, the prediction component 114 corresponds to a feature profile generator 175. In some embodiments, the prediction data 168 may be feature profile data (e.g., a predicted CD profile) and may be provided to a synthetic data generator 174 to generate synthetic microscope image data, e.g., synthetic data 162. In some embodiments, the prediction data 168 may include one or more predicted dimensional measurements of the processed product (e.g., a predicted CD profile). In some embodiments, the prediction data 168 may be further processed (for example, the predicted CD profile may be converted into a CD profile prediction image). In some embodiments, the processed prediction data 168 (e.g., the CD profile prediction image) may be provided to the synthetic data generator 174 to generate synthetic microscope image data, e.g., synthetic data 162. In some embodiments, the prediction component 114 may use one or more trained machine learning models 190 to determine an output for performing corrective actions based on the current data.

[0069] The manufacturing equipment 124 may be associated with one or more machine learning models, e.g., models 190A-Z. The machine learning models associated with the manufacturing equipment 124 may perform many tasks, including process control, classification, and performance prediction. Models 190A-Z may be trained using data associated with the manufacturing equipment 124 or data associated with products processed by the manufacturing equipment 124, e.g., sensor data 142 (e.g., collected by sensor 126), manufacturing parameters 150 (e.g., related to process control of the manufacturing equipment 124), measurement data 160 (e.g., generated by measuring instrument 128), etc. In embodiments, different models 190A-Z may be trained for different manufacturing equipment 124. For example, one model may be trained for each process chamber and / or for each category of process chambers.

[0070] One type of machine learning model that may be used to perform some or all of the above tasks is an artificial neural network, such as a deep neural network. Artificial neural networks generally include a feature representation component with a classifier or regression layers that map features to a desired output space. For example, a convolutional neural network (CNN) hosts multiple layers of convolutional filters. Pooling may be performed and nonlinearities may be addressed in the lower layers, on top of which a multilayer perceptron is typically appended to map the top-layer features extracted by the convolutional layers to a decision (e.g., a classification output).

[0071] A recurrent neural network (RNN) is another type of machine learning model. Recurrent neural network models are designed to interpret a series of inputs that are inherently related to each other, such as time-trace data or sequential data. The output of an RNN perceptron is fed back as input to that perceptron to generate the next output.
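The feedback described above can be sketched with a single recurrent unit. This is a minimal illustration, not the model of the disclosure: the function name `run_recurrent_unit`, the tanh activation, and the weight values are all assumptions chosen for the example.

```python
import math

def run_recurrent_unit(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run one recurrent unit over a sequence (hypothetical weights).

    The unit's previous output h is fed back as an input when computing
    the next output, so earlier inputs influence later outputs.
    """
    h = 0.0  # state before any input has been seen
    outputs = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        outputs.append(h)
    return outputs

# A time trace with a single pulse followed by zeros.
trace = [1.0, 0.0, 0.0]
outputs = run_recurrent_unit(trace)
```

Although the second and third inputs are zero, the corresponding outputs are nonzero because the unit carries the first input forward through its fed-back state.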

[0072] Deep learning is a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer takes the output from the preceding layer as input. Deep neural networks may be trained in a supervised (e.g., classification) and / or unsupervised (e.g., pattern analysis) manner. Deep neural networks have a hierarchical structure of layers, where different layers learn different levels of representation corresponding to different levels of abstraction. In deep learning, each level learns to transform its input data into a slightly more abstract and complex representation. For example, in an image recognition application, the raw input may be a matrix of pixels, the first representational layer may abstract the pixels and encode edges, the second layer may compose and encode arrangements of edges, the third layer may encode higher-level shapes (e.g., teeth, lips, gums, etc.), and the fourth layer may recognize the scanning task. Notably, the deep learning process can learn on its own which features are best placed at which level. The "deep" in "deep learning" refers to the number of layers through which data is transformed. More precisely, deep learning systems have a considerable credit assignment path (CAP) depth. A CAP is a chain of transformations from input to output. A CAP describes the potentially causal connections between inputs and outputs. For feedforward neural networks, the CAP depth may be the depth of the network, that is, the number of hidden layers plus one. For recurrent neural networks, where a signal may propagate through a layer more than once, the CAP depth is potentially infinite.

[0073] In some embodiments, the generator of a generative adversarial network (GAN) includes a synthetic data generator 174, a feature profile generator 175, a prediction component 114, and / or models 190A-Z. In one embodiment, this generator is trained to produce a synthetic microscope image. A target machine learning model may be trained by including it in a GAN. A GAN pits two (or more) machine learning models against each other (e.g., in an adversarial arrangement) to facilitate model training. A simple GAN includes a generator and a discriminator. The generator is configured to produce synthetic data that resembles data from a set of true data. The discriminator is configured to classify the output from the generator as either true data or synthetic data. The model weights and biases are adjusted to improve the generator's data generation and the discriminator's classification. A GAN may also be configured to transform images from one space to another, for example, by replacing certain characteristics of an input image with other characteristics. In some embodiments, an image-to-image (e.g., pix2pix) GAN may be configured to convert a primitive drawing or cartoon into a realistic image. In some embodiments, the discriminator of the image-to-image GAN may be provided with the original (e.g., true) image and / or a synthetic image produced by the generator, and the discriminator may classify which of the two is true and which is synthetic.
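For orientation, one widely used way to express the adversarial arrangement is the standard minimax objective below. The source does not state a loss function, so this is offered only as a common formulation, with $G$ the generator, $D$ the discriminator, $x$ a true sample, and $z$ the generator input (all symbols are assumptions introduced here):

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[ \log\!\left( 1 - D(G(z)) \right) \right]
```

The discriminator is rewarded for scoring true data high and generated data low, while the generator is rewarded when its output is scored as true. In an image-to-image arrangement, the generator input would be an image (e.g., a cartoon) rather than noise.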

[0074] In some embodiments, the prediction component 114 receives current sensor data 146, current measurement data 166, and / or current manufacturing parameters, performs signal processing to decompose the current data into sets of current data, provides these sets of current data as input to one or more trained models 190A-Z, and obtains an output from the trained models 190A-Z indicating prediction data 168. In some embodiments, the prediction component 114 receives measurement data of the substrate (e.g., predicted measurement data based on sensor data) and provides the measurement data to the trained models 190A-Z. For example, the current sensor data 146 may include sensor data indicating measurement values of the substrate (e.g., shape dimensions). Models 190A-Z may be configured to accept data indicating substrate measurements and generate a predicted synthetic microscope image as an output (e.g., by performing the operation of the synthetic data generator 174) and / or generate a predicted feature profile as an output (e.g., by performing the operation of the feature profile generator 175). In some embodiments, the prediction data indicates measurement data (e.g., the predicted quality of the substrate). In some embodiments, the prediction data indicates the health of components. In some embodiments, the prediction data indicates the progress of a process (e.g., used to terminate a processing operation).

[0075] In some embodiments, the various models discussed in relation to Model 190 (e.g., supervised machine learning models, unsupervised machine learning models, etc.) may be combined into a single model (e.g., an ensemble model) or they may be separate models.

[0076] Data may be passed bidirectionally between several separate models included in models 190A-Z, the synthetic data generator 174, the feature profile generator 175, and / or the predictive component 114. In some embodiments, instead, some or all of these operations may be performed by different devices, such as client device 120, server machine 170, server machine 180, etc. Those skilled in the art will understand that variations in data flow, which component performs which process, which data is provided to which model, and other similar matters are within the scope of this disclosure.

[0077] The data store 140 may be memory (e.g., random access memory), drives (e.g., hard drives, flash drives), a database system, a cloud-accessible memory system, or another type of component or device capable of storing data. The data store 140 may include multiple storage components (e.g., multiple drives or multiple databases) that may reside across multiple computing devices (e.g., multiple server computers). The data store 140 may store sensor data 142, manufacturing parameters 150, measurement data 160, synthetic data 162, and prediction data 168.

[0078] Sensor data 142 may include historical sensor data 144 and current sensor data 146. The sensor data may include time traces of sensor data over the entire duration of the manufacturing process, correlation between data and physical sensors, preprocessed data such as averages and composite data, and data indicating sensor performance over time (i.e., across many manufacturing processes). Manufacturing parameters 150 and measurement data 160 may include similar features, e.g., historical measurement data 164 and current measurement data 166. Historical sensor data 144, historical measurement data 164, and historical manufacturing parameters may be historical data (e.g., at least portions of this data may be used to train model 190). Current sensor data 146 and / or current measurement data 166 may be current data (e.g., at least portions of this data are input to the trained model 190 after the historical data) for which prediction data 168 is generated (e.g., to perform corrective actions). The synthetic data 162 may include synthetic images generated by the synthetic data generator 174 that resemble microscope images, such as scanning electron microscope (SEM) images, transmission electron microscope (TEM) images, or other similar images. The synthetic data 162 may also include synthetic data such as a predicted CD profile, a CD profile prediction image, or other similar images generated by the synthetic data generator 174.

[0079] In some embodiments, the prediction system 110 includes server machines 170 and 180. Server machine 170 includes a dataset generator 172 that can generate datasets (e.g., sets of data inputs and sets of target outputs) for training, verifying, and / or testing a model 190 that includes one or more machine learning models. Some operations of the dataset generator 172 are described in detail below with reference to Figures 2A-2B and 4A. In some embodiments, the dataset generator 172 may divide historical data (e.g., historical sensor data 144, historical manufacturing parameters, historical measurement data 164) into a training set (e.g., 60 percent of the historical data), a verification set (e.g., 20 percent of the historical data), and a test set (e.g., 20 percent of the historical data).
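The 60 / 20 / 20 division described above can be sketched as follows. This is an illustrative example only: the function name `split_historical_data`, the shuffling step, and the fixed seed are assumptions, and the source does not specify how records are assigned to each set.

```python
import random

def split_historical_data(records, train_frac=0.6, verify_frac=0.2, seed=0):
    """Divide records into training, verification, and test sets.

    The fractions follow the 60/20/20 example in the text; shuffling
    with a fixed seed is an assumption made for reproducibility.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_verify = round(len(shuffled) * verify_frac)
    train = shuffled[:n_train]
    verify = shuffled[n_train:n_train + n_verify]
    test = shuffled[n_train + n_verify:]  # whatever remains
    return train, verify, test

# Stand-in for 100 historical substrate records.
train_set, verification_set, test_set = split_historical_data(range(100))
```

Every record lands in exactly one of the three sets, so the verification and test sets contain no data seen during training.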

[0080] In some embodiments, the prediction system 110 generates multiple sets of features (for example, via prediction components 114). For example, a first set of features may correspond to a first set of sensor data and / or measurement data types corresponding to each of the datasets (e.g., a training set, a verification set, and a test set) (e.g., a first set of measuring instruments, a first combination of measurements from the first set of measuring instruments, a first pattern of measurements from the first set of measuring instruments, a first set of sensors, a first combination of values from the first set of sensors, and / or a first pattern of values from the first set of sensors), and a second set of features may correspond to a second set of sensor data types corresponding to each of the datasets (e.g., a second set of sensors different from the first set of sensors, a second combination of values different from the first combination, a second pattern different from the first pattern).

[0081] The server machine 170 may include a synthetic data generator 174. The synthetic data generator 174 may include one or more trained machine learning models, models based on physical phenomena, rule-based models, or other similar models. In one embodiment, the synthetic data generator 174 is or includes a GAN generator, such as an image-to-image GAN generator. The synthetic data generator 174 may be trained to generate synthetic microscope images from input data. In an embodiment, this input data may be or include a simple line drawing or cartoon of a side section of a device or structure. The synthetic data generator 174 may be trained using measurement data 160, for example, measurement data 160 collected by a measuring instrument 128. The synthetic data generator 174 may be configured to generate synthetic data, for example, synthetic microscope image data. The synthetic data 162 may resemble historical measurement data 164. The synthetic data 162 may be used to train the machine learning model 190, for example, to generate predictive data 168 for performing corrective actions. The dataset generator 172 may combine the measurement data 160 and the synthetic data 162 to generate datasets such as training, test, and verification datasets.

[0082] In some embodiments, the synthetic data generator 174 may be configured to generate a synthetic microscope image of a manufactured product. During training, the synthetic data generator 174 may be provided with a true microscope image (e.g., an image of the manufactured device acquired by a microscope system such as an SEM or TEM system), and the synthetic data generator 174 may be configured to generate a synthetic image similar to the true image. In some embodiments, during training, the synthetic data generator 174 may be provided with indications of the dimensions of the substrate. For example, during training, the output of an input measurement system may be provided to the synthetic data generator 174. The synthetic data generator 174 may accept indications of one or more measurements (e.g., a list of measurements from a measurement system, a list of predicted measurements from a model configured to predict substrate dimensions), and be configured to generate one or more synthetic microscope images as output. In some embodiments, the synthetic data generator 174 may be provided with indications of the measured values of the manufactured product in the form of a cartoon image or cartoon drawing of the product (e.g., a CD profile prediction image, which may be generated based on the CD profile output by the feature profile generator 175 and / or prediction component 114). In some embodiments, the synthetic data generator 174 may include a CD profile prediction image generator.
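One way a CD profile could be turned into a cartoon image is sketched below. The source does not describe the rendering procedure, so everything here is an assumption made for illustration: the function name `render_cd_profile_image`, the representation of a CD profile as feature widths at successive depths, the centered layout, and the pixel scale.

```python
def render_cd_profile_image(cd_profile_nm, nm_per_pixel=1.0, image_width=32):
    """Rasterize a CD profile into a binary cartoon image (hypothetical).

    cd_profile_nm lists the feature width at successive depths, top to
    bottom. Each width becomes one image row in which pixels inside the
    feature are 1 and all other pixels are 0, centered horizontally.
    """
    image = []
    for width_nm in cd_profile_nm:
        width_px = min(image_width, round(width_nm / nm_per_pixel))
        left = (image_width - width_px) // 2
        row = [0] * image_width
        for col in range(left, left + width_px):
            row[col] = 1
        image.append(row)
    return image

# A feature that narrows with depth (widths in nm, assumed values).
profile = [20.0, 18.0, 16.0, 12.0]
cartoon = render_cd_profile_image(profile)
```

An image of this kind could then serve as the input to an image-to-image generator, which would be responsible for adding realistic microscope texture and contrast.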

[0083] In some embodiments, the outputs from the synthetic data generator 174 and / or the feature profile generator 175 may be used for performance analysis (e.g., analysis of predicted substrate performance, substrate processing system performance, etc.). In some embodiments, the outputs from the synthetic data generator 174 and / or the feature profile generator 175 may be used to select substrates for further investigation, for example, to flag potentially defective substrates for additional measurement. In some embodiments, the outputs from the synthetic data generator 174 and / or the feature profile generator 175 may be provided to another model, e.g., another machine learning model. This second model may be configured to accept one or more measurement images related to the substrate. This second model may be configured to perform defect detection, performance estimation, corrective action recommendations, etc., based on a microscopic image (e.g., a synthetic microscopic image generated by the synthetic data generator 174) or a predicted CD profile (e.g., a synthetic CD profile generated by the synthetic data generator 174).

[0084] In some embodiments, historical data is provided to machine learning models 190A-Z as training data. In some embodiments, synthetic data 162 is provided to machine learning models 190A-Z as training data. In some embodiments, this historical and / or synthetic sensor data may be or include microscopic image data. This historical and / or synthetic sensor data may be or include measurement data (e.g., reflectometry data or spectral data) of the substrate surface generated by a non-destructive measurement device. The type of data provided will depend on the application of the machine learning model. For example, the machine learning model may be trained by providing historical sensor data 144 and / or first measurement data 160 (e.g., measurement data from a non-destructive measurement tool) as training input and the corresponding measurement data 160 (e.g., measurement data from a destructive measurement tool) as the target output. In some embodiments, models 190A-Z may be trained using a large amount of data. For example, sensor and measurement data from hundreds of substrates may be used. In some embodiments, only a relatively small amount of data is available to train models 190A-Z, for example, when models 190A-Z are trained to recognize rare events such as equipment failures, or to generate predictions for newly seasoned or maintained chambers. To augment the true data available to train models 190A-Z (e.g., data generated by the measuring instrument 128), synthetic data 162 may be generated by the synthetic data generator 174.

[0085] The server machine 180 includes a training engine 182, a verification engine 184, a selection engine 185, and / or a test engine 186. The engines (e.g., training engine 182, verification engine 184, selection engine 185, and test engine 186) may refer to hardware (e.g., circuits, dedicated logic circuits, programmable logic circuits, microcode, processing devices, etc.), software (e.g., instructions executed on processing devices, general-purpose computer systems, or dedicated machines), firmware, microcode, or a combination thereof. The training engine 182 may be capable of training models 190A-Z using one or more sets of features associated with a training set from the dataset generator 172 (models 190A-Z may correspond, for example, to a synthetic data generator 174 and / or a feature profile generator 175). The training engine 182 may generate multiple trained models 190A-Z, in which case each trained model 190A-Z corresponds to a different set of features in the training set (e.g., sensor data from different sets of sensors, or different sets of features in sensor and / or measurement data). For example, the first trained model may be trained using all features (e.g., X1 to X5, where X refers to a feature), the second trained model may be trained using a first subset of features (e.g., X1, X2, X4), and the third trained model may be trained using a second subset of features (e.g., X1, X3, X4, and X5). The second subset of features may partially overlap with the first subset of features. The dataset generator 172 may receive the output of a trained model (e.g., synthetic data 162 from the synthetic data generator 174), collect that data to form training, verification, and test datasets, and use those datasets to train a second model (e.g., a machine learning model configured to output predictive data, corrective actions, etc.).
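Training one model per feature set, as in the X1 to X5 example, can be sketched as follows. This is a toy illustration under stated assumptions: the mapping of the labels 190A, 190B, and 190C to the three feature sets, the helper names, the sample values, and the use of a 1-nearest-neighbor predictor as a stand-in for whatever model type is actually trained are all hypothetical.

```python
# Feature sets from the text's example; the model labels are assumed.
FEATURE_SETS = {
    "190A": ["X1", "X2", "X3", "X4", "X5"],  # all features
    "190B": ["X1", "X2", "X4"],              # first subset
    "190C": ["X1", "X3", "X4", "X5"],        # second, overlapping subset
}

def select_features(rows, feature_names):
    """Keep only the named feature columns, in the given order."""
    return [[row[name] for name in feature_names] for row in rows]

def train_nearest_neighbor(inputs, targets):
    """Return a 1-nearest-neighbor predictor (stand-in for any model)."""
    def predict(query):
        distances = [sum((a - b) ** 2 for a, b in zip(query, x)) for x in inputs]
        return targets[distances.index(min(distances))]
    return predict

# Two made-up training records with a CD target in nm.
training_rows = [
    {"X1": 0.0, "X2": 1.0, "X3": 0.2, "X4": 0.1, "X5": 3.0, "cd_nm": 20.0},
    {"X1": 1.0, "X2": 0.0, "X3": 0.9, "X4": 0.8, "X5": 1.0, "cd_nm": 30.0},
]
targets = [row["cd_nm"] for row in training_rows]

# One trained model per feature set.
trained_models = {
    name: train_nearest_neighbor(select_features(training_rows, features), targets)
    for name, features in FEATURE_SETS.items()
}
```

Each resulting model expects its inputs in the order of its own feature set, so a query to model 190B supplies only X1, X2, and X4.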

[0086] The verification engine 184 may be able to verify the trained model 190 using a corresponding set of features from the verification set of the dataset generator 172. For example, a first trained machine learning model 190A, trained using a first set of features from the training set, may be verified using a first set of features from the verification set. The verification engine 184 may determine the accuracy of each of the trained models 190A to Z based on a corresponding set of features from the verification set. The verification engine 184 may discard trained models 190A to Z that have an accuracy that does not meet a threshold accuracy. In some embodiments, the selection engine 185 may be able to select one or more trained models 190A to Z that have an accuracy that meets a threshold accuracy. In some embodiments, the selection engine 185 may be able to select the trained model 190A to Z that has the highest accuracy among the trained models 190A to Z.
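The discard-then-select logic described above can be sketched as follows. The function name `select_model`, the accuracy values, and the threshold are assumptions for illustration; the source does not specify how accuracy is computed.

```python
def select_model(accuracies, threshold):
    """Discard models below the threshold accuracy, then pick the best.

    accuracies maps a model label to its accuracy on the verification
    set. Returns (best_label, surviving_models); best_label is None if
    no model meets the threshold.
    """
    surviving = {name: acc for name, acc in accuracies.items() if acc >= threshold}
    if not surviving:
        return None, surviving
    best = max(surviving, key=surviving.get)
    return best, surviving

# Made-up verification accuracies for three trained models.
verification_accuracy = {"190A": 0.91, "190B": 0.78, "190C": 0.94}
best_model, kept = select_model(verification_accuracy, threshold=0.85)
```

With these values, model 190B is discarded and model 190C is selected as the most accurate of the remaining models.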

[0087] The test engine 186 may be capable of testing the trained models 190A-Z using a corresponding set of features from the test set derived from the dataset generator 172. For example, a first trained machine learning model 190A, trained using a first set of features from the training set, may be tested using the first set of features from the test set. Based on the test set, the test engine 186 may determine which of the trained models 190A-Z has the highest accuracy.

[0088] In the case of machine learning models, models 190A-Z may refer to model artifacts generated by the training engine 182 using a training set containing data inputs and corresponding target outputs (the correct answers for each training input). In one embodiment, the training set includes a synthetic microscope image generated by the synthetic data generator 174. Patterns in the dataset can be found that map data inputs to target outputs (correct answers), and machine learning models 190A-Z are provided with mappings that capture these patterns. Machine learning models 190A-Z may use one or more of the following: support vector machines (SVMs), radial basis functions (RBFs), clustering, supervised machine learning, semi-supervised machine learning, unsupervised machine learning, k-nearest neighbors algorithm (k-NN), linear regression, random forests, and neural networks (e.g., artificial neural networks, recurrent neural networks). The synthetic data generator 174 may contain one or more machine learning models, and these may include one or more models of the same types (e.g., artificial neural networks).

[0089] In some embodiments, one or more machine learning models 190A to Z may be trained using historical data (e.g., historical sensor data 144). In some embodiments, model 190 may be trained using synthetic data 162, or a combination of historical and synthetic data.

[0090] In some embodiments, the synthetic data generator 174 and / or the feature profile generator 175 may be trained using historical data. For example, the synthetic data generator 174 may be trained to generate synthetic data 162 using historical measurement data 164. In some embodiments, the synthetic data generator 174 may include a GAN. The GAN includes at least a generator and a discriminator. The generator attempts to generate data (e.g., time-trace sensor data) that is similar to the input data (e.g., true sensor data). The discriminator attempts to distinguish the true data from the synthetic data. Training the GAN involves the generator becoming proficient in generating data similar to the true sensor data, and the discriminator becoming proficient in distinguishing the true data from the synthetic data. The trained GAN includes a generator configured to generate synthetic data that includes many of the features of the true data used to train the GAN. In some embodiments, the input data may be labeled with one or more attributes, such as information about tools, sensors, products, or product designs related to the input data. In some embodiments, the generator may be configured to generate synthetic data having a certain set of attributes, such as target processing operation, target processing equipment defects, target measurement configuration (e.g., contrast, cross-sectional or top image, brightness, etc.) or other similar data.

[0091] Generating and utilizing synthetic data 162 offers significant technical advantages over other methods. In some embodiments, measurements of one or more dimensions of a device are to be generated. Some measurements may be performed using cost-effective, non-destructive methods. For example, in-situ reflection spectrum measurement data may correlate with etching depth. Similarly, integrated or in-line measurements may provide one or more dimensions of a manufactured device. In some embodiments, target dimension measurements, such as the dimensions of the internal structure of a product or device, are not readily available. Directly measuring target dimensions can be costly in terms of time, for example, because the measurement may be performed in a standalone measurement facility, or costly in terms of materials, for example, because the measurement may be destructive to the product. Measurements that are difficult to obtain by cost-effective conventional methods may instead be obtained by measuring the synthetic data 162, for example, by calculating product measurements from one or more synthetic microscope images. The synthetic data 162 generated by the synthetic data generator 174 may have high accuracy. Therefore, the synthetic data generator 174 may generate synthetic data that provides information about the internal structure of a substrate that could otherwise be obtained only by destructive means.
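Calculating a product measurement from a synthetic microscope image might look like the sketch below. The source does not describe a measurement algorithm, so this is only one simple possibility: the function name `measure_cd_nm`, the intensity threshold, the pixel scale, and the sample image are all assumptions.

```python
def measure_cd_nm(image, row_index, nm_per_pixel, threshold=0.5):
    """Estimate feature width at one image row (hypothetical method).

    Counts pixels whose intensity is at or above the threshold and
    converts the count to nanometers using the assumed pixel scale.
    """
    row = image[row_index]
    feature_pixels = sum(1 for value in row if value >= threshold)
    return feature_pixels * nm_per_pixel

# A tiny made-up grayscale image: bright pixels belong to the feature.
synthetic_image = [
    [0.1, 0.9, 0.8, 0.9, 0.9, 0.1],  # near the top of the feature
    [0.1, 0.2, 0.9, 0.8, 0.1, 0.1],  # near the bottom of the feature
]
top_cd = measure_cd_nm(synthetic_image, 0, nm_per_pixel=2.0)
bottom_cd = measure_cd_nm(synthetic_image, 1, nm_per_pixel=2.0)
```

Here the top row yields a wider measurement than the bottom row, which is the kind of internal dimension that would otherwise require a cross-sectional image of a physically cut substrate.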

[0092] The synthetic data 162 may also offer technical advantages when used to train additional models such as models 190A-Z. In some embodiments, large amounts of data (e.g., data from hundreds of substrates) may be used to train machine learning models, models based on physical phenomena, etc. Generating such amounts of data can be costly, for example, in terms of raw materials consumed, process gases, energy, time, and equipment wear. The synthetic data generator 174 may be used to quickly and inexpensively generate large amounts of image data that may be used to train models.

[0093] In some embodiments, true microscope images may vary in unpredictable or undesirable ways. For example, different images may have different characteristics, such as different contrast, brightness, clarity, etc. This may be due to operator error, microscopy procedure, etc. The synthetic data 162 may be tuned to exhibit desired characteristics. In some embodiments, training data may be selected (e.g., by a user or an algorithm) to train a synthetic data generator 174 whose output exhibits the desired characteristics. In some embodiments, attribute data describing the characteristics of the image may be provided to the generator, and the generator may be configured to generate an image according to a target set of attributes and / or characteristics.

[0094] The prediction component 114 may provide current data to models 190A-Z and run models 190A-Z on this input to obtain one or more outputs. For example, the prediction component 114 may provide current sensor data 146 to models 190A-Z and run models 190A-Z on this input to obtain one or more outputs. The prediction component 114 may determine (e.g., extract) prediction data 168 from the outputs of models 190A-Z. From its outputs, the prediction component 114 may determine (e.g., extract) confidence data indicating the confidence that the prediction data 168 is an accurate predictor of the process related to the input data for products produced or to be produced using the manufacturing equipment 124 with respect to the current sensor data 146 and / or current manufacturing parameters. The prediction component 114 or the corrective action component 122 may use this confidence data to decide, based on the prediction data 168, whether to perform a corrective action related to the manufacturing equipment 124.

[0095] The confidence data may include, or indicate, a confidence level that the prediction data 168 is an accurate prediction of a product or component related to at least a portion of the input data. For example, the confidence level may be a real number between 0 and 1, where 0 indicates no confidence that the prediction data 168 is an accurate prediction of a product processed according to the input data or an accurate prediction of the component health of a component of the manufacturing equipment 124, and 1 indicates absolute confidence that the prediction data 168 is an accurate prediction of the characteristics of a product processed according to the input data or the component health of a component of the manufacturing equipment 124. In response to the confidence data indicating a confidence level below a threshold level for a given number of instances (e.g., percentage of instances, frequency of instances, total number of instances, etc.), the prediction component 114 may retrain the trained models 190A-Z (e.g., based on current sensor data 146, current manufacturing parameters, etc.). In some embodiments, retraining may include generating one or more datasets (e.g., via a dataset generator 172) using historical and / or synthetic data.
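The retraining trigger described above can be sketched as a simple check over recent confidence levels. The function name `should_retrain`, the threshold of 0.7, and the 20 percent limit are assumptions chosen for illustration; the text allows other criteria, such as frequency or total number of instances.

```python
def should_retrain(confidences, threshold=0.7, max_low_fraction=0.2):
    """Decide whether low-confidence predictions warrant retraining.

    confidences holds recent confidence levels in [0, 1]. Retraining is
    requested when the fraction of instances below the threshold exceeds
    max_low_fraction (a percentage-of-instances criterion).
    """
    if not confidences:
        return False
    low_count = sum(1 for c in confidences if c < threshold)
    return low_count / len(confidences) > max_low_fraction

# Made-up confidence levels for five recent predictions.
recent = [0.95, 0.6, 0.9, 0.5, 0.65]
retrain_needed = should_retrain(recent)
```

Three of the five recent predictions fall below the threshold, so this check would request retraining.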

[0096] For illustrative purposes rather than limitation, aspects of this disclosure describe training one or more machine learning models 190A-Z using historical data (e.g., historical sensor data 144, historical manufacturing parameters) and synthetic data 162, and inputting current data (e.g., current sensor data 146, current manufacturing parameters, and current measurement data) into one or more trained machine learning models to determine predictive data 168. In other embodiments, heuristic models, physical phenomenon-based models, or rule-based models are used to determine predictive data 168 (e.g., without using trained machine learning models). In some embodiments, such models may be trained using historical and / or synthetic data. In some embodiments, these models may be retrained using a combination of true historical and synthetic data. The predictive component 114 may monitor historical sensor data 144, historical manufacturing parameters, and measurement data 160. Any of the information described with respect to the data inputs 210A-B in Figures 2A-2B may be monitored by the heuristic models, physical phenomenon-based models, or rule-based models, or used in other ways.

[0097] In some embodiments, the functions of client device 120, prediction server 112, server machine 170, and server machine 180 may be provided by fewer machines. For example, in some embodiments, server machines 170 and 180 may be integrated into a single machine, and in some other embodiments, server machine 170, server machine 180, and prediction server 112 may be integrated into a single machine. In some embodiments, client device 120 and prediction server 112 may be integrated into a single machine. In some embodiments, the functions of client device 120, prediction server 112, server machine 170, server machine 180, and data store 140 may be performed by a cloud-based service.

[0098] In general, functions described in one embodiment as being performed by the client device 120, prediction server 112, server machine 170, and server machine 180 may, in other embodiments, be performed on the prediction server 112 where appropriate. Furthermore, functions that are attributed to a particular component may be performed by different or multiple components working together. For example, in some embodiments, the prediction server 112 may determine corrective actions based on the prediction data 168. In another example, the client device 120 may determine the prediction data 168 based on the output from a trained machine learning model.

[0099] Furthermore, different or multiple components working together can perform the functions of a particular component. One or more of the prediction server 112, server machine 170, or server machine 180 may be accessed as a service provided to other systems or devices through an appropriate application programming interface (API).

[0100] In some embodiments, “User” may be represented as a single individual. However, other embodiments of the present disclosure include the case where “User” is an entity controlled by multiple users and / or an automated source. For example, a collection of individual users integrated as a group of administrators may be considered “User.”

[0101] Embodiments of this disclosure may be applied to data quality assessment, feature enhancement, model evaluation, virtual metrology (VM), predictive maintenance (PdM), limit optimization, process control, or other similar applications.

[0102] Figures 2A and 2B show block diagrams of exemplary dataset generators 272A-B (e.g., dataset generator 172 in Figure 1) that generate datasets for training, testing, and validating models (e.g., models 190A-Z in Figure 1) according to several embodiments. Each dataset generator 272 may be part of the server machine 170 in Figure 1. In some embodiments, several machine learning models associated with the manufacturing equipment 124 may be trained, used, and maintained (e.g., within the manufacturing facility). For example, each machine learning model may be associated with one dataset generator 272, or multiple machine learning models may share a dataset generator 272.

[0103] Figure 2A shows a system 200A that includes a dataset generator 272A for generating datasets for training one or more supervised models (e.g., the synthetic data generator 174 and / or feature profile generator 175 in Figure 1). The dataset generator 272A may generate datasets (e.g., data input 210A, target output 220A) using historical data. In some embodiments, an unsupervised machine learning model may be trained using a dataset generator similar to the dataset generator 272A; for example, such a dataset generator may not generate the target output 220A.

[0104] The dataset generator 272A may generate datasets for training, testing, and verifying a model. In some embodiments, the dataset generator 272A may generate datasets for a machine learning model. In some embodiments, the dataset generator 272A may generate datasets for training, testing, and / or verifying a generator model configured to generate synthetic microscopy image data. In some embodiments, the dataset generator 272A may generate datasets for training, testing, and / or verifying a generator model configured to generate predictive feature profile data, such as predicted CD profile data. This machine learning model is provided with a set of historical measurement data 264A and / or related historical CD profile data as data input 210A. This machine learning model may be configured to accept measurement and / or CD profiles as input data and to generate synthetic microscopy image data as output. This machine learning model may be configured to accept measurement data (e.g., spectral data) as input data and to generate synthetic CD profile prediction data as output.

[0105] In some embodiments, the dataset generator 272A may be configured to generate datasets for training, testing, and verifying GANs. In some embodiments, the dataset generator 272A may be configured to generate datasets for parallel training, testing, and verifying multiple different types of machine learning models and / or configurations of machine learning models. In some embodiments, the dataset generator 272A may be used to generate datasets for image-to-image (e.g., pix2pix) GANs. In some embodiments, measurement data may be provided to another model, which generates synthetic data from the measurements and provides it to the machine learning model. In some embodiments, this other model may generate a predicted CD profile or CD profile predictive image of a manufactured product, for example, one that incorporates known measurement values and various design rules. For purposes of Figure 2A, the dataset generator is assumed to supply measurement data to the model. For example, the model may be considered an ensemble model incorporating a predicted CD profile generator (or CD profile predictive image generator) and an image-to-image GAN.

[0106] The dataset generator 272A may be used to generate data for any type of machine learning model that takes measurement data as input. The dataset generator 272A may also be used to generate data for a machine learning model that generates predictive measurement data of a substrate, such as a predicted CD profile of the substrate and / or a synthetic microscope image of the substrate.

[0107] In some embodiments, the dataset generator 272A generates a dataset (e.g., training set, verification set, test set) containing one or more data inputs 210A (e.g., training input, verification input, test input). The data inputs 210A may be provided to the training engine 182, verification engine 184, or test engine 186. This dataset may be used to train, verify, or test a model (e.g., the synthetic data generator 174 in Figure 1).

[0108] In some embodiments, data input 210A may include one or more sets of data. For example, system 200A may generate a set of measurement data which may include one or more of the following: measurement data from one or more types of measuring instruments, combinations of measurement data from one or more types of measuring instruments, patterns from measurement data from one or more types of measuring instruments, and / or composite versions thereof. For example, system 200A may generate a set of sensor data which may include one or more of the following: sensor data from one or more types of sensors, combinations of sensor data from one or more types of sensors, patterns from sensor data from one or more types of sensors, and / or composite versions thereof.

[0109] In some embodiments, data input 210A may include one or more sets of data. For example, system 200A may include one or more sets of measurement data for a group of device dimensions (e.g., including device height and width, but not optical data or surface roughness, etc.), sets of historical measurement data, measurement data derived from one or more types of sensors, combinations of measurement data derived from one or more types of sensors, patterns from measurement data, etc. The set of data input 210A may include data describing different manufacturing modes, e.g., combinations of measurement data and sensor data, combinations of measurement data and manufacturing parameters, combinations of certain measurement data, certain manufacturing parameter data and certain sensor data, etc.

[0110] In some embodiments, the dataset generator 272A may generate a first data input corresponding to a first set 264A of historical measurement data for training, verifying, or testing a first machine learning model. The dataset generator 272A may generate a second data input corresponding to a second set 264B of historical measurement data for training, verifying, or testing a second machine learning model.

[0111] In some embodiments, the dataset generator 272A generates a dataset (e.g., a training set, a validation set, a test set) which includes one or more data inputs 210A (e.g., a training input, a validation input, a test input) and may include one or more target outputs 220A corresponding to the data inputs 210A. The dataset may further include mapping data that maps the data inputs 210A to the target outputs 220A. In some embodiments, the dataset generator 272A may generate data for training a machine learning model configured to output realistic synthetic microscope image data by generating a dataset including output data 268. In some embodiments, the dataset generator 272A may generate data for training a machine learning model configured to output predicted CD profile data and / or CD profile predicted image data by generating a dataset including output data 268. The data inputs 210A may also be referred to as “features,” “attributes,” or “information.” In some embodiments, the dataset generator 272A may provide this dataset to the training engine 182, the validation engine 184, or the test engine 186, which use this dataset to train, validate, or test machine learning models (e.g., one of the machine learning models included in the synthetic data generator 174, model 190, ensemble model 190, etc.).
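
A dataset of the kind described above (data inputs, optional target outputs, and mapping data between them) might be represented as in the following sketch. The Dataset class, the build_dataset helper, and the image file names are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Dataset:
    """Hypothetical container for data inputs, target outputs, and their mapping."""
    data_inputs: List[Sequence[float]]                      # e.g. spectra or measured dimensions
    target_outputs: Optional[List[str]] = None              # e.g. IDs of real microscope images
    mapping: Dict[int, int] = field(default_factory=dict)   # input index -> target index


def build_dataset(inputs, targets=None) -> Dataset:
    """Pair each input with one target; omit targets for unsupervised training."""
    inputs = list(inputs)
    if targets is None:
        return Dataset(data_inputs=inputs)
    targets = list(targets)
    if len(inputs) != len(targets):
        raise ValueError("every data input needs exactly one target output")
    return Dataset(inputs, targets, {i: i for i in range(len(inputs))})


# Illustrative values only.
spectra = [[0.12, 0.40, 0.33], [0.10, 0.45, 0.30]]
images = ["sem_0001.png", "sem_0002.png"]
supervised = build_dataset(spectra, images)
unsupervised = build_dataset(spectra)
```

Leaving target_outputs empty corresponds to the unsupervised case, in which the dataset generator does not produce a target output.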

[0112] Figure 2B shows a block diagram of an exemplary dataset generator 272B for generating datasets for supervised models configured to generate anomaly predictions, according to several embodiments. System 200B, including dataset generator 272B (e.g., dataset generator 172 in Figure 1), generates datasets for one or more machine learning models (e.g., model 190 in Figure 1). Dataset generator 272B may generate datasets (e.g., data input 210B) using historical data. The exemplary dataset generator 272B is configured to generate datasets for a machine learning model configured to take predicted microscopy image data and / or predicted CD profile data as input and produce anomaly prediction data 269 as output. Similar dataset generators (or similar operations of dataset generator 272B) may be used for machine learning models configured to perform different functions, such as machine learning models configured to accept sensor data and predictive measurement data as input, or machine learning models configured to accept target measurement data (e.g., target microscope images, target predicted CD profiles, target CD profile predicted images, etc.) as input and to produce estimated conditions or processing operation recipes as output that may produce a device matching the input target data. Dataset generator 272B may share features and / or functions with dataset generator 272A.

[0113] The dataset generator 272B may generate datasets for training, testing, and validating a machine learning model. This machine learning model is provided with a set of historical synthetic microscope data 262A (e.g., output synthetic data from the synthetic data generator 174, output from a model trained using datasets from the dataset generator 272A) as data input 210B. This machine learning model may include two or more separate models (e.g., this machine learning model may be an ensemble model). This machine learning model may be configured to generate output data indicating the performance of a processing chamber, such as an indication of anomalies present within the processing equipment. In some embodiments, training may not include providing a target output to the machine learning model. The dataset generator 272B may generate datasets for training an unsupervised machine learning model, such as a model configured to take synthetic microscope data as input and generate clustered data, outlier detection data, anomaly detection data, etc., as outputs.
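
As one simple stand-in for the unsupervised outlier detection mentioned above, the sketch below flags values whose robust z-score (based on the median absolute deviation) exceeds a limit. The disclosure does not name this technique; the scalar summary (a mean CD per synthetic image), the 3.5 limit, and the function name flag_outliers are assumptions made for illustration.

```python
import statistics
from typing import List, Sequence


def flag_outliers(values: Sequence[float], z_limit: float = 3.5) -> List[bool]:
    """Flag values whose MAD-based robust z-score exceeds z_limit."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All deviations are (nearly) identical; nothing can be ranked as an outlier.
        return [False] * len(values)
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [abs(0.6745 * (v - med) / mad) > z_limit for v in values]


# Hypothetical mean CD (nm) extracted from six synthetic microscope images.
mean_cd_nm = [30.1, 29.8, 30.3, 30.0, 29.9, 36.5]
flags = flag_outliers(mean_cd_nm)
```

A production system would more likely operate on image features or embeddings than on a single scalar, but the thresholding idea is the same.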

[0114] In some embodiments, the dataset generator 272B generates a dataset (e.g., a training set, a validation set, a test set), which includes one or more data inputs 210B (e.g., a training input, a validation input, a test input). The data inputs 210B may also be referred to as “features,” “attributes,” or “information.” In some embodiments, the dataset generator 272B may provide this dataset to the training engine 182, the validation engine 184, or the test engine 186, which use this dataset to train, validate, or test a machine learning model (e.g., model 190 in Figure 1). Several embodiments of generating the training set are described further with reference to Figure 4A.

[0115] In some embodiments, the dataset generator 272B may generate a first data input corresponding to a first set 244A of historical sensor data for training, verifying, or testing a first machine learning model, and the dataset generator 272A may generate a second data input corresponding to a second set 244B of historical sensor data for training, verifying, or testing a second machine learning model.

[0116] The data input 210B for training, verifying, or testing a machine learning model may include information for a specific manufacturing chamber (e.g., of a particular substrate manufacturing machine). In some embodiments, the data input 210B may include information for a specific type of manufacturing machine, e.g., manufacturing machines that share certain characteristics. The data input 210B may include data related to a certain type of device, e.g., a device with an intended function, a device with a design, a device produced using a specific recipe, etc. Training a machine learning model based on one type of machine, device, recipe, etc. may enable the trained model to generate plausible synthetic sensor data in several settings (e.g., for several different pieces of equipment, products, etc.).

[0117] In some embodiments, the process may involve generating a dataset and then training, verifying, or testing a machine learning model using that dataset, followed by further training, verifying, or testing the model (for example, by adjusting weights or parameters such as connection weights in a neural network that are related to the model's input data).

[0118] Figure 3 is a block diagram showing a system 300 for generating output data (e.g., the synthetic data 162 and / or prediction data 168 in Figure 1) according to several embodiments. In some embodiments, the system 300 may be used with a machine learning model (e.g., the synthetic data generator 174 in Figure 1) configured to generate synthetic microscope image data and / or synthetic CD profile data. In some embodiments, the system 300 may be used with a machine learning model to determine corrective actions related to manufacturing equipment. In some embodiments, the system 300 may be used with a machine learning model to determine defects in manufacturing equipment. In some embodiments, the system 300 may be used with a machine learning model to cluster or classify substrates. The system 300 may be used with a machine learning model related to a manufacturing system that has functions different from those listed above.

[0119] Figure 3 shows a system configured to train, validate, test, and use one or more machine learning models. These machine learning models are configured to accept data (e.g., setpoints provided to manufacturing equipment, sensor data, measurement data, etc.) as input and to provide data (e.g., prediction data, corrective action data, classification data, synthetic image data, etc.) as output. The split, train, validate, select, test, and use blocks of system 300 may be performed similarly to train a second model using a different type of data. Furthermore, retraining may be performed using the current data 322 and / or additional training data 346.

[0120] In block 310, system 300 (for example, a component of prediction system 110 in Figure 1) partitions (for example, via a dataset generator 172 on server machine 170 in Figure 1) the data to be used in training, validating, and / or testing machine learning models. In some embodiments, training data 364 includes historical data such as historical measurement data, historical classification data (e.g., classification of whether a product meets a performance threshold), historical microscopy image data, and historical CD profile data. In some embodiments, for example, when training a second machine learning model using a synthetic microscopy image generated by a trained machine learning model, training data 364 may include synthetic microscopy image data, for example, synthetic microscopy image data generated by the synthetic data generator 174 in Figure 1. In some embodiments, for example, when training a second machine learning model using a synthetic CD profile generated by a trained machine learning model, training data 364 may include synthetic CD profile data, for example, synthetic CD profile data generated by the synthetic data generator 174 in Figure 1. To generate the training set 302, the validation set 304, and the test set 306, the training data 364 may undergo data partitioning in block 310. For example, the training set may be 60% of the training data, the validation set may be 20% of the training data, and the test set may be 20% of the training data.
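
The 60% / 20% / 20% partition given as an example above can be sketched as follows. The shuffle, the fixed seed, and the function name partition are assumptions; the disclosure does not state how items are assigned to each set.

```python
import random
from typing import List, Sequence, Tuple


def partition(items: Sequence, train_frac: float = 0.6, val_frac: float = 0.2,
              seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle items and split them into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducible splits
    n_train = int(round(len(shuffled) * train_frac))
    n_val = int(round(len(shuffled) * val_frac))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains (20% by default)
    return train, val, test


# 100 hypothetical substrate records identified by index.
train_set, validation_set, test_set = partition(list(range(100)))
```

Each record lands in exactly one set, so the test set stays unseen during training and validation.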

[0121] The generation of training set 302, verification set 304, and test set 306 may be adjusted to suit a specific application. For example, the training set may consist of 60% of the training data, the verification set may consist of 20% of the training data, and the test set may consist of 20% of the training data. The system 300 may generate multiple sets of features for each training set, verification set, and test set. For example, if training data 364 includes sensor data, and this sensor data includes features derived from sensor data from 20 sensors (e.g., sensor 126 in Figure 1) and 10 manufacturing parameters (e.g., manufacturing parameters corresponding to the same processing runs as the sensor data from these 20 sensors), the sensor data may be divided into a first set of features including sensors 1-10 and a second set of features including sensors 11-20. Furthermore, the manufacturing parameters may be divided into multiple sets, for example, into a first set of manufacturing parameters including parameters 1-5 and a second set of manufacturing parameters including parameters 6-10. The target input may be split into multiple sets, the target output may be split into multiple sets, both the target input and target output may be split into multiple sets, or neither the target input nor target output may be split into multiple sets. Multiple models may be trained on different sets of data.
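
The division into feature sets in the example above (sensors 1-10 and 11-20, manufacturing parameters 1-5 and 6-10) amounts to selecting column groups from each processing run. The sketch below assumes a hypothetical row layout in which columns 0-19 hold the 20 sensor features and columns 20-29 hold the 10 manufacturing parameters.

```python
from typing import Dict, List, Sequence


def select_features(rows: Sequence[Sequence[float]],
                    columns: Sequence[int]) -> List[List[float]]:
    """Keep only the listed column indices from every row."""
    return [[row[c] for c in columns] for row in rows]


# Hypothetical column layout; the names are illustrative only.
FEATURE_SETS: Dict[str, List[int]] = {
    "sensors_1_10": list(range(0, 10)),
    "sensors_11_20": list(range(10, 20)),
    "params_1_5": list(range(20, 25)),
    "params_6_10": list(range(25, 30)),
}

# Three placeholder processing runs with 30 feature columns each.
runs = [[float(run * 100 + col) for col in range(30)] for run in range(3)]
subsets = {name: select_features(runs, cols) for name, cols in FEATURE_SETS.items()}
```

Each entry of subsets could then be used to train a separate model, consistent with the statement that multiple models may be trained on different sets of data.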

[0122] In block 312, system 300 performs model training (e.g., via training engine 182 in Figure 1) using training set 302. Training of machine learning models and / or physics-based models (e.g., digital twins) may be achieved in a supervised learning manner, which includes feeding a training dataset of labeled inputs through the model, observing its outputs, defining the error (by measuring the difference between the outputs and the label values), and tuning the model's weights to minimize the error using techniques such as deep gradient descent and backpropagation. In many applications, repeating this process across the many labeled inputs of the training dataset yields a model that can produce the correct output when presented with inputs different from those present in the training dataset. In some embodiments, training of a machine learning model may be achieved in an unsupervised manner, for example, without providing labels or classifications during training. The unsupervised model may be configured to perform anomaly detection, result clustering, etc.

[0123] For each training data item in the training dataset, that training data item may be input into a model (e.g., a machine learning model). The model may then process the input training data items (e.g., some measured dimensions of a manufactured device, a cartoon picture or cartoon drawing of a manufactured device, a predicted CD profile image of a manufactured device, spectral data of a manufactured device, etc.) to produce an output. The output may include, for example, a synthetic microscope image, a predicted CD profile, a predicted CD profile image, and / or other similar items. This output may be compared to the labels of the training data items (e.g., an actual microscope image of a device, an actual CD profile, and / or an actual CD profile image related to the measured dimensions and / or spectral data).

[0124] Next, the processing logic may compare the generated output (e.g., synthetic image, predicted CD profile, and / or CD profile predicted image) with the labels included in the training data items (e.g., actual image, actual CD profile, and / or actual CD profile image). The processing logic determines the error (i.e., classification error) based on the difference between the output and the labels. The processing logic adjusts one or more weights and / or values of the model based on this error.
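
The loop described in the preceding paragraphs (process each input, compare the output with its label, determine the error, adjust the weights) is sketched below. For brevity a one-weight linear model trained by plain gradient descent on mean squared error stands in for the image-generating networks of the disclosure; the data, learning rate, and epoch count are assumptions.

```python
from typing import List, Tuple


def train_linear(xs: List[float], ys: List[float],
                 lr: float = 0.05, epochs: int = 2000) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: model output for every training item.
        preds = [w * x + b for x in xs]
        # Error: difference between each output and its label.
        errors = [p - y for p, y in zip(preds, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Weight adjustment in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Labeled toy data generated from y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]
weight, bias = train_linear(inputs, labels)
```

After training, weight and bias approach 2 and 1, the values used to generate the labels.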

[0125] When training a neural network, an error term or delta may be determined for each node of the artificial neural network. Based on this error, the artificial neural network adjusts one or more of its parameters for one or more of its nodes (e.g., the weights for one or more inputs to a node). Parameters may be updated in a backpropagation manner, such that the nodes of the highest layer are updated first, then the nodes of the next layer, and so on. The artificial neural network contains multiple layers of "neurons," each layer receiving values as input from the neurons of the previous layer. The parameters for each neuron include weights associated with the values received from each of the neurons of the previous layer. Therefore, parameter adjustment may involve adjusting the weights assigned to each of the inputs to one or more neurons in one or more layers of the artificial neural network.
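
A minimal sketch of the per-node error terms and layer-by-layer updates described above is given below for a tiny network with one scalar input, three tanh hidden neurons, and one linear output neuron. The architecture, initial weights, learning rate, and toy target (t = x squared) are assumptions for illustration only.

```python
import math


def forward(p, x):
    """Return the hidden activations and the network output for input x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y = sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"]
    return hidden, y


def loss(p, x, t):
    return 0.5 * (forward(p, x)[1] - t) ** 2


def gradients(p, x, t):
    hidden, y = forward(p, x)
    # Error term (delta) of the output node, computed first.
    delta_out = y - t
    # Deltas of the hidden nodes, propagated back through the output weights.
    delta_hidden = [delta_out * w * (1.0 - h * h) for w, h in zip(p["w2"], hidden)]
    return {
        "w2": [delta_out * h for h in hidden],
        "b2": delta_out,
        "w1": [d * x for d in delta_hidden],
        "b1": list(delta_hidden),
    }


def sgd_step(p, x, t, lr=0.05):
    """Adjust every weight against its gradient for one training item."""
    g = gradients(p, x, t)
    for key in ("w1", "b1", "w2"):
        p[key] = [v - lr * gv for v, gv in zip(p[key], g[key])]
    p["b2"] -= lr * g["b2"]


params = {"w1": [0.5, -0.3, 0.8], "b1": [0.1, 0.0, -0.1],
          "w2": [0.3, -0.2, 0.5], "b2": 0.0}
samples = [(x, x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def mean_loss(p):
    return sum(loss(p, x, t) for x, t in samples) / len(samples)


loss_before = mean_loss(params)
for _ in range(500):
    for x, t in samples:
        sgd_step(params, x, t)
loss_after = mean_loss(params)
```

All deltas are computed from the weights as they were before the update, which is why gradients() finishes before sgd_step() changes any parameter.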

[0126] System 300 may train multiple models using multiple sets of features from the training set 302 (e.g., a first set of features from the training set 302, a second set of features from the training set 302, etc.). In addition, or instead, system 300 may train multiple models based on different feature model configurations and / or combinations. System 300 may select one or more feature model configurations to process the data (e.g., measurement data) before inputting it into one or more machine learning models under training. Different feature model configurations include different principal component analysis (PCA) model configurations, different independent component analysis (ICA) model configurations, different fast Fourier transform (FFT) model configurations, other different model configurations, and / or combinations thereof. Furthermore, different types of models may be trained in parallel. For example, system 300 may train models to produce a first trained model using a first feature model combination, and a second trained model using a second feature model combination. The first and second models may each be, for example, one of the following: multi-layer perceptron (MLP), gradient boosted tree (GBT), random forest, support vector regression (SVR), neural network, or recursive algorithm. In some embodiments, the first and second trained models may be combined to produce a third trained model (which may, for example, be a predictor or synthetic data generator that is better than the first or second trained model on its own). In some embodiments, there may be overlapping sets of features and / or feature models used in different machine learning models. In some embodiments, hundreds of models may be generated, including models, feature models, machine learning models, and / or combinations thereof, with various feature permutations.
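
The idea of pairing different feature model configurations with different model types can be sketched as below. To stay dependency-free, a naive discrete Fourier transform with a configurable number of retained coefficients stands in for the FFT feature model, and a k-nearest-neighbour regressor stands in for the MLP, GBT, random forest, or SVR models named above. All names, configurations, and the toy spectra are assumptions.

```python
import cmath
import math
from itertools import product


def dft_magnitudes(signal, n_coeffs):
    """Feature model: magnitudes of the first n_coeffs DFT bins (naive O(n^2) DFT)."""
    n = len(signal)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, s in enumerate(signal)))
            for k in range(n_coeffs)]


def knn_predict(train_x, train_y, query, k):
    """Stand-in regressor: average target of the k nearest training items."""
    ranked = sorted((sum((a - b) ** 2 for a, b in zip(x, query)), y)
                    for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


FEATURE_CONFIGS = {"fft_2": 2, "fft_4": 4, "fft_8": 8}   # hypothetical configurations
MODEL_CONFIGS = {"knn_1": 1, "knn_3": 3}                 # hypothetical model variants


def build_candidates(train_signals, train_targets):
    """Return one predictor per (feature configuration, model configuration) pair."""
    candidates = {}
    for (f_name, n_coeffs), (m_name, k) in product(FEATURE_CONFIGS.items(),
                                                   MODEL_CONFIGS.items()):
        feats = [dft_magnitudes(s, n_coeffs) for s in train_signals]

        def predict(signal, feats=feats, n_coeffs=n_coeffs, k=k):
            return knn_predict(feats, train_targets,
                               dft_magnitudes(signal, n_coeffs), k)

        candidates[(f_name, m_name)] = predict
    return candidates


def toy_spectrum(amplitude, n=16):
    """Placeholder 'spectral data' whose amplitude plays the role of the CD."""
    return [amplitude * math.cos(2 * math.pi * i / n) for i in range(n)]


train_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_signals = [toy_spectrum(a) for a in train_targets]
candidates = build_candidates(train_signals, train_targets)
example_prediction = candidates[("fft_4", "knn_1")](toy_spectrum(3.0))
```

Three feature configurations and two model variants give six candidates here; larger grids of this kind are one way the hundreds of models mentioned above could arise.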

[0127] In block 314, system 300 performs model verification using verification set 304 (e.g., via verification engine 184 in Figure 1). System 300 may verify each of the trained models using a corresponding set of features, feature model configurations, etc., from verification set 304. In some embodiments, system 300 may verify hundreds of models generated in block 312 (e.g., models with various feature permutations, combinations of models, etc.). In some embodiments, system 300 may determine a score for each of the trained machine learning models. In some embodiments, system 300 may determine which of the trained machine learning models has the highest accuracy score, processor utilization score, processing speed score, and / or memory utilization score. In some embodiments, system 300 may further determine a selection value for each of the trained machine learning models, the selection value including at least one of a root mean square error (RMSE) value, an R-squared (R2) value, or an error variance value. In some embodiments, the system 300 may determine which trained machine learning model has the best RMSE value, R2 value, or error variance value.
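
The three selection values named above can be computed as in the following sketch. The function name selection_values and the sample CD values are assumptions; the error variance is taken here as the population variance of the prediction errors.

```python
import math
from typing import Dict, Sequence


def selection_values(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, R-squared, and error variance for one trained model."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    mean_err = sum(errors) / n
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        "error_variance": sum((e - mean_err) ** 2 for e in errors) / n,
    }


# Hypothetical measured vs. predicted CD values (nm) on the validation set.
measured_cd = [10.0, 12.0, 14.0, 16.0]
predicted_cd = [11.0, 11.0, 15.0, 15.0]
values = selection_values(measured_cd, predicted_cd)
```

Lower RMSE and error variance and higher R-squared indicate a better model, so a ranking over candidates needs to account for the direction of each value.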

[0128] In block 314, system 300 may determine the accuracy of each of the one or more trained models (e.g., via model validation) and determine whether one or more of the trained models have an accuracy that meets a threshold accuracy. In response to determining that none of the trained models has an accuracy that meets the threshold accuracy, the flow returns to block 312, in which system 300 performs model training using different sets of features from the training set. In response to determining that one or more of the trained models have an accuracy that meets the threshold accuracy, the flow proceeds to block 316. System 300 may discard trained models that have an accuracy lower than the threshold accuracy (e.g., based on the validation set).

[0129] In block 316, system 300 performs model selection (e.g., via selection engine 185 in Figure 1) to determine which of the one or more trained models that meet the threshold accuracy has the highest accuracy (e.g., the selected model 308, based on the validation in block 314). In response to determining that two or more of the trained models that meet the threshold accuracy have the same accuracy, the flow may return to block 312, in which system 300 performs model training using a further refined training set corresponding to a further refined set of features in order to determine the trained model with the highest accuracy.

[0130] In block 318, system 300 tests the selected model 308 by performing model testing (e.g., via the test engine 186 in Figure 1) using the test set 306. System 300 may test a first trained model using a first set of features and / or feature models in the test set to determine whether the first trained model meets the threshold accuracy. In response to determining that the accuracy of the selected model 308 does not meet the threshold accuracy (e.g., the selected model 308 is overfit to the training set 302 and / or validation set 304 and does not generalize to other datasets such as the test set 306), the flow returns to block 312, in which system 300 performs model training (e.g., retraining) using different training sets corresponding to different sets of features and / or feature models. In response to determining, based on the test set 306, that the selected model 308 has an accuracy that meets the threshold accuracy, the flow proceeds to block 320. In at least block 312, the model may learn patterns in the training data to make predictions or generate synthetic data, and in block 318, the system 300 may apply this model to the remaining data (e.g., test set 306) to test the predictions or the generation of synthetic data.
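
The control flow across blocks 314, 316, and 318 can be condensed into the sketch below. Models are represented only by name with precomputed accuracy scores; the function name select_model, the score values, and the 0.85 threshold are assumptions. The flow states that a tie "may" lead back to training, and the sketch takes that option.

```python
from typing import Dict, Optional, Tuple


def select_model(validation_scores: Dict[str, float],
                 test_scores: Dict[str, float],
                 threshold: float) -> Tuple[Optional[str], str]:
    """Return (selected model, next action), where the action is 'use' or 'retrain'."""
    # Block 314: discard models whose validation accuracy is below the threshold.
    passing = {name: s for name, s in validation_scores.items() if s >= threshold}
    if not passing:
        return None, "retrain"            # back to block 312
    # Block 316: pick the model with the highest validation accuracy.
    best = max(passing.values())
    leaders = [name for name, s in passing.items() if s == best]
    if len(leaders) > 1:
        return None, "retrain"            # tie: retrain with refined features
    selected = leaders[0]
    # Block 318: confirm the selected model on the held-out test set.
    if test_scores[selected] < threshold:
        return None, "retrain"            # e.g. overfit to training/validation data
    return selected, "use"                # proceed to block 320


# Hypothetical accuracy scores for three trained models.
validation_scores = {"model_a": 0.91, "model_b": 0.87, "model_c": 0.62}
test_scores = {"model_a": 0.89, "model_b": 0.88, "model_c": 0.60}
decision = select_model(validation_scores, test_scores, threshold=0.85)
```

In this example model_c is discarded at validation, model_a is selected over model_b, and model_a then passes the test.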

[0131] In block 320, system 300 uses a trained model (e.g., selected model 308) to receive current data 322 (e.g., current measurement data 166 in Figure 1, such as measurements from an in-situ measurement device or an integrated measurement device) and determines (e.g., extracts) output data 324 (e.g., synthetic data 162 in Figure 1) from the output of the trained model. In one embodiment, a CD profile is generated by the trained model. In one embodiment, the CD profile is used by another trained model to generate a synthetic microscope image. In one embodiment, measurements from the measurement device, the CD profile, and / or the synthetic microscope image are input to another machine learning model and / or rule-based engine that determines corrective actions. Corrective actions related to the manufacturing equipment 124 in Figure 1 may be performed in view of the output data 324. In some embodiments, the current data 322 may correspond to the same type of features in the historical data used to train the machine learning model. In some embodiments, the current data 322 may correspond to a subset of the types of features in the historical data used to train the selected model 308 (for example, the machine learning model may be trained using some measurements and may be configured to produce an output based on a subset of the measurements).
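
The two-stage use described above (measurement data to CD profile, CD profile to profile image, profile image to synthetic microscope image) is sketched below. Both models are trivial placeholder callables, and the binary-mask rendering of the CD profile is an assumption; the disclosure does not specify the image format, and the names used here are hypothetical.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def render_profile_image(cd_profile: Sequence[int], width: int) -> List[List[int]]:
    """Draw one centred row of 1s per depth level; row length equals the CD in pixels."""
    image = []
    for cd in cd_profile:
        left = (width - cd) // 2
        image.append([1 if left <= col < left + cd else 0 for col in range(width)])
    return image


def generate_synthetic_image(measurement: Sequence[float],
                             cd_model: Callable[[Sequence[float]], List[int]],
                             image_model: Callable[[List[List[int]]], Image],
                             width: int = 16) -> Image:
    cd_profile = cd_model(measurement)                       # first trained model
    profile_image = render_profile_image(cd_profile, width)  # CD profile prediction image
    return image_model(profile_image)                        # second trained model


def placeholder_cd_model(spectrum: Sequence[float]) -> List[int]:
    """Stand-in for the first trained model: scales spectrum values to pixel widths."""
    return [max(1, round(8 * v)) for v in spectrum]


def placeholder_image_model(mask: List[List[int]]) -> Image:
    """Stand-in for an image-to-image GAN: maps mask pixels to two grey levels."""
    return [[0.8 if pixel else 0.2 for pixel in row] for row in mask]


current_spectrum = [0.5, 0.75, 1.0]
synthetic_image = generate_synthetic_image(current_spectrum,
                                           placeholder_cd_model,
                                           placeholder_image_model)
```

Swapping the placeholders for trained models leaves the pipeline unchanged, which reflects the separation between the CD profile predictor and the image generator.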

[0132] In some embodiments, the operation using the trained model of block 320 may not include providing current data 322 to the selected model 308. In some embodiments, the selected model 308 may be configured to generate synthetic microscopy image data. Training may include providing true microscopy image data to the machine learning model. In some embodiments, the selected model 308 may be configured to generate synthetic CD profile data and / or synthetic CD profile prediction image data. Training may include providing true CD profile data and / or true CD profile prediction image data to the machine learning model. The training data (e.g., training set 302) may include attribute data. The attribute data may include information to label the training data, such as an indication of which tool is associated with the data, the type and ID of the sensor, an indication of the tool's service life (e.g., time elapsed since the tool was installed, time elapsed since the last maintenance event), an indication of defects or impending defects in the manufacturing equipment which may be reflected in the training data, product type or design, target characteristics of the output image, target characteristics of the output data, and so on. The use of the selected model 308 may include providing the model with instructions to generate synthetic microscope image data. The use of the selected model 308 may also include providing the model with instructions to generate synthetic CD profile data and / or synthetic CD profile predictive image data. The use of the selected model 308 may also include providing one or more attributes. The generated data may conform to this one or more attributes, and synthetic data may be generated that resembles, for example, data from a specific tool, data collected when a defect is present in manufacturing equipment, data collected from a specific product design, or image data of a target contrast level or target brightness level.

[0133] In some embodiments, the performance of a machine learning model trained, verified, and tested by system 300 may degrade. For example, the manufacturing system associated with the trained machine learning model may undergo gradual or abrupt changes. As a result of the changes in the manufacturing system, the performance of the trained machine learning model may degrade. A new model may be generated to use in place of the degraded machine learning model. This new model may be generated by retraining the old model, by training a new model from scratch, or by other means.

[0134] In some embodiments, one or more of operations 310-320 may be performed in various orders and / or together with other operations not presented and described herein. In some embodiments, one or more of operations 310-320 may not be performed. For example, in some embodiments, one or more of the data partitioning of block 310, model verification of block 314, model selection of block 316, or model testing of block 318 may not be performed.

[0135] Figures 4A-4B are flowcharts of methods 400A-B relating to the training and utilization of a machine learning model according to one embodiment. Methods 400A-B may be executed by processing logic, which may include hardware (e.g., circuits, dedicated logic circuits, programmable logic circuits, microcode, processing devices, etc.), software (e.g., instructions executed on processing devices, general-purpose computer systems or dedicated machines), firmware, microcode, or a combination thereof. In some embodiments, methods 400A-B may be partially executed by a prediction system 110. Method 400A may be partially executed by a prediction system 110 (e.g., server machine 170 and dataset generator 172 in Figure 1, dataset generators 272A-B in Figures 2A-2B, etc.). The prediction system 110 may use method 400A to generate a dataset for performing at least one of training, verification, or testing a machine learning model according to embodiments of the present disclosure. Method 400B may be executed by the prediction server 112 (e.g., prediction component 114) and / or the server machine 180 (e.g., the server machine 180 may perform training, verification, and test operations). In some embodiments, a non-transitory machine-readable storage medium stores instructions that, when executed by a processing device (e.g., the processing device of the prediction system 110, the processing device of the server machine 180, the processing device of the prediction server 112, etc.), cause the processing device to execute one or more of the methods 400A-B.

[0136] For the sake of simplicity, methods 400A and 400B are illustrated and described as a series of operations. However, the operations according to this disclosure may be performed in various orders and / or simultaneously, and may be performed in conjunction with other operations not presented and described herein. Furthermore, not all illustrated operations may be performed to carry out methods 400A and 400B according to the disclosed subject matter. In addition, those skilled in the art will understand and recognize that methods 400A and 400B may also be alternatively represented as a series of interrelated states by a state diagram or events.

[0137] Figure 4A is a flowchart of method 400A for generating a dataset for a machine learning model, according to several embodiments. Referring to Figure 4A, in some embodiments, in block 401, the processing logic that implements method 400A initializes the training set T to an empty set.

[0138] In block 402, the processing logic generates a first data input (e.g., a first training input, a first verification input) which may include one or more of the following: sensor data, manufacturing parameters, and measurement data (e.g., non-destructive measurement data and / or destructive measurement data). In some embodiments, (e.g., as described with respect to Figure 3,) this first data input may include a first set of features for the data type, and a second data input may include a second set of features for the data type. In some embodiments, the input data may include historical data and / or synthesized data.

[0139] In some embodiments, in block 403, the processing logic optionally generates a first target output for one or more of the data inputs (e.g., the first data input). In some embodiments, the input includes one or more measured values, and the target output is a microscope image. In some embodiments, the input includes one or more predicted CD profiles and / or CD profile prediction images, and the target output is a microscope image. In some embodiments, the input includes spectral data, and the target output is a predicted CD profile and / or CD profile prediction image. In some embodiments, the input includes a cartoon image of the device (e.g., generated using a combination of measured values and one or more design rules), and the target output is a microscope image. In some embodiments, the first target output is predicted data. In some embodiments, the input data may be in the form of sensor data, as in the case of a machine learning model configured to identify a faulty manufacturing system, and the target output may be a list of components that are likely to be defective. In some embodiments, no target output is generated (e.g., for an unsupervised machine learning model that can group or find correlations in the input data without requiring the provision of a target output).

[0140] In block 404, the processing logic optionally generates mapping data that indicates an input / output mapping. This input / output mapping (or mapping data) may relate to data inputs (e.g., one or more of the data inputs described herein), target outputs for the data inputs, and the relationship between the data inputs and target outputs. In some embodiments, such as embodiments relating to machine learning models in which no target outputs are provided, block 404 may not be executed.

[0141] In some embodiments, in block 405, the processing logic adds the mapping data generated in block 404 to the dataset T.

[0142] In block 406, the processing logic branches based on whether the dataset T is sufficient for at least one of training, verification, and / or testing a machine learning model, such as the synthetic data generator 174 or model 190 in Figure 1. If it is sufficient, execution proceeds to block 407; otherwise, execution returns to block 402. It should be noted that in some embodiments, the determination of whether the dataset T is sufficient may be based simply on the number of inputs in the dataset, in some embodiments on the number of inputs in the dataset mapped to outputs, and in some other embodiments, the determination of whether the dataset T is sufficient may be based on one or more other criteria (e.g., a measure of data example diversity, precision, etc.) in addition to or instead of the number of inputs.

[0143] In block 407, the processing logic provides dataset T (e.g., to server machine 180) to train, verify, and / or test a machine learning model 190. In some embodiments, dataset T is the training set, and dataset T is provided to the training engine 182 of the server machine 180 to perform training. In some embodiments, dataset T is the verification set, and dataset T is provided to the verification engine 184 of the server machine 180 to perform verification. In some embodiments, dataset T is the test set, and dataset T is provided to the test engine 186 of the server machine 180 to perform testing. For example, in the case of a neural network, input values of a given input / output mapping (e.g., numerical values associated with data input 210A) are input to the neural network, and output values of the input / output mapping (e.g., numerical values associated with target output 220A) are stored in the output nodes of the neural network. The connection weights of the neural network are then adjusted according to a learning algorithm (e.g., backpropagation), and this procedure is repeated for the remaining input / output mappings of dataset T. After block 407, at least one of the following can be performed on the model (e.g., model 190): training using the training engine 182 of the server machine 180, verification using the verification engine 184 of the server machine 180, or testing using the test engine 186 of the server machine 180. The trained model may be implemented by the prediction component 114 (of the prediction server 112) to generate prediction data 168 for performing signal processing, to generate synthetic data 162, or to perform corrective actions related to the manufacturing equipment 124.
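The dataset-generation flow of blocks 401 to 407 can be sketched as a simple accumulation loop. The following is a minimal illustration only, not the disclosed implementation: the function name `build_dataset`, the three callables, and the `max_examples` safety cap are all hypothetical, and the toy inputs and targets are stand-ins for real sensor, spectral, or measurement data.

```python
from typing import Callable, Optional


def build_dataset(
    generate_input: Callable[[int], dict],
    generate_target: Optional[Callable[[dict], object]],
    is_sufficient: Callable[[list], bool],
    max_examples: int = 10_000,  # hypothetical safety cap, not in the source
) -> list:
    """Accumulate input/output mappings until the dataset is judged sufficient."""
    dataset = []  # block 401: dataset T starts empty
    index = 0
    while index < max_examples:
        data_input = generate_input(index)  # block 402: generate a data input
        # Block 403 is optional: an unsupervised model has no target output.
        target = generate_target(data_input) if generate_target else None
        mapping = {"input": data_input, "target": target}  # block 404
        dataset.append(mapping)  # block 405: add the mapping to T
        index += 1
        if is_sufficient(dataset):  # block 406: sufficiency check
            break
    return dataset  # block 407: hand T to training, verification, or testing


# Toy usage: sufficiency here is based simply on the number of examples.
examples = build_dataset(
    generate_input=lambda i: {"spectrum": [0.1 * i, 0.2 * i]},
    generate_target=lambda x: sum(x["spectrum"]),
    is_sufficient=lambda t: len(t) >= 5,
)
```

The sufficiency test is passed in as a callable because the text allows it to depend on the example count, on the number of mapped inputs, or on other criteria such as diversity.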

[0144] Figure 4B is a flowchart of a method for model training and verification according to several embodiments. Referring to Figure 4B, in some embodiments, in block 412, the processing logic that implements method 400B receives spectral data of the substrate (which may be a first type of measurement data, e.g., from a non-destructive measurement tool) and measurement data corresponding to the spectral data of the substrate (e.g., TEM images, SEM images, data from a destructive measurement tool, etc.). In some embodiments, the spectral data may include infrared (IR) signal data and / or reflectance data.

[0145] In block 414, the processing logic determines a plurality of feature model configurations for each of the plurality of feature models, each of the plurality of feature model configurations containing one or more feature model conditions. In some embodiments, the plurality of feature models may include at least one of the following: principal component analysis (PCA) models, independent component analysis (ICA) models, or fast Fourier transform (FFT) models. In some embodiments, the feature models may be used to generate feature vectors and / or feature datasets from spectral data. In some embodiments, such feature vectors and / or feature datasets are input to the machine learning model instead of, or in addition to, the raw spectral data. In some embodiments, different representations of spectral data (e.g., different feature models) may better correspond to different types of predictions (e.g., predicted CD profiles). In some embodiments, spectral data may be processed using one model or a combination of models and / or model configurations before inputting the data to the machine learning model in order to provide accurate predictions for a given type of prediction.

[0146] In block 416, the processing logic determines a plurality of feature model combinations, each of which includes a subset of a plurality of feature model configurations. In some embodiments, the feature model combinations may better address different types of predictions (e.g., predicted CD profiles). In some embodiments, spectral data may be processed using the model configurations before inputting the data into the machine learning model in order to provide accurate predictions for a given type of prediction. For example, in some embodiments, the feature combinations may include PCA conditions 1 and 2, ICA condition 1, and FFT conditions 1 and 2. In some embodiments, the processing logic may iteratively generate many feature combinations and exhaust all available combinations.
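Exhaustively enumerating feature model combinations, as block 416 describes, amounts to listing subsets of the available configurations. The sketch below assumes that each configuration can be represented by a plain label; the labels (`"PCA-1"`, `"ICA-1"`, and so on) and the function name are hypothetical and are used only for illustration.

```python
from itertools import combinations

# Hypothetical labels standing in for feature model configurations.
configurations = ["PCA-1", "PCA-2", "ICA-1", "FFT-1", "FFT-2"]


def feature_model_combinations(configs):
    """Yield every non-empty subset of the given configurations."""
    for size in range(1, len(configs) + 1):
        for subset in combinations(configs, size):
            yield subset


all_combinations = list(feature_model_combinations(configurations))
print(len(all_combinations))  # 2**5 - 1 = 31 non-empty subsets
```

Because the number of subsets grows exponentially with the number of configurations, an exhaustive search of this kind is only practical for a small set; for larger sets, the iterative selection described elsewhere in this disclosure may be preferable.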

[0147] In block 418, the processing logic generates multiple input datasets, each of which is generated by applying spectral data to each of the multiple feature model combinations. In some embodiments, spectral data may be applied to a feature model combination including PCA condition 1, ICA condition 1, and FFT condition 1, for example. Alternatively, spectral data may be applied to a feature model combination including PCA condition 1 and ICA condition 1. Since each feature model combination is different, input datasets are generated by applying spectral data to each of these feature model combinations.
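One way to picture block 418 is as a function that applies every feature extractor named in a combination to each spectrum and concatenates the results, producing one input dataset per combination. This is a sketch under stated assumptions: the extractors `feat_mean` and `feat_range` are deliberately trivial, hypothetical stand-ins for PCA, ICA, or FFT feature models, and the function name is not from the source.

```python
def build_input_datasets(spectra, extractors, combinations):
    """Return one feature matrix per feature model combination."""
    datasets = {}
    for combo in combinations:
        rows = []
        for spectrum in spectra:
            features = []
            for name in combo:
                # Each extractor maps one spectrum to a list of feature values.
                features.extend(extractors[name](spectrum))
            rows.append(features)
        datasets[combo] = rows
    return datasets


# Trivial stand-in extractors (hypothetical; real ones would be PCA/ICA/FFT).
extractors = {
    "feat_mean": lambda s: [sum(s) / len(s)],
    "feat_range": lambda s: [max(s) - min(s)],
}
spectra = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
combos = [("feat_mean",), ("feat_mean", "feat_range")]
input_datasets = build_input_datasets(spectra, extractors, combos)
```

Each key of `input_datasets` is a combination, so different combinations yield input datasets with different feature columns for the same spectra.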

[0148] In block 420, the processing logic trains multiple machine learning models, each machine learning model being trained to produce an output using an input dataset from the plurality of input datasets and the measurement data. In some embodiments, this output may include CD profile predictions. In some embodiments, the plurality of machine learning models may include at least one of the following: multilayer perceptron (MLP), gradient boosting tree (GBT), random forest, support vector regression (SVR), neural network, or recursive algorithm. In some embodiments, the processing logic further processes the measurement data before performing training, and this processing includes at least one of filtering, smoothing, clustering, or quantizing the measurement data. In some embodiments, the models being trained may be of the same type but may be trained using different input data (e.g., different data generated by different feature model combinations). In some embodiments, the models being trained may be of different types and may be trained using the same input data and / or different input data (e.g., different data generated by different feature model combinations). To obtain the model that best satisfies the selection criteria (see block 422), the type of model being trained and the data used to train the model may be varied.

[0149] In block 422, the processing logic selects a trained machine learning model from among several trained machine learning models that satisfies one or more selection criteria. In some embodiments, one or more selection criteria may include at least one of accuracy criteria, processor utilization criteria, processing speed criteria, or memory utilization criteria. In some embodiments, the processing logic may further determine a selection value for each trained machine learning model among the several trained machine learning models. In some embodiments, this selection value may include at least one of root mean square error (RMSE) value, R-squared (R2) value, or error variance value. In some embodiments, this selection value is a numerical value corresponding to a criterion. For example, the processing logic may select the machine learning model that most accurately defines the depth of CD. In such an example, the machine learning model may be selected under this criterion because it has the selection value (e.g., error variance value) that best satisfies the selection criterion (e.g., it has the lowest error variance value, i.e., it is closest to predicting the depth of CD). In some embodiments, one or more selection criteria may include a selection value criterion.
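The selection values named above have standard definitions, and selecting a model by one of them can be sketched as follows. This is an illustration, not the disclosed implementation: the function names, the candidate names, and the numeric CD values are hypothetical and chosen only to make the example runnable.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def error_variance(y_true, y_pred):
    """Population variance of the prediction errors."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_err = sum(errors) / len(errors)
    return sum((e - mean_err) ** 2 for e in errors) / len(errors)


def select_model(candidates, y_true, metric=rmse, lower_is_better=True):
    """Score each candidate's predictions and return the best name and all scores."""
    scored = {name: metric(y_true, pred) for name, pred in candidates.items()}
    chooser = min if lower_is_better else max
    return chooser(scored, key=scored.get), scored


# Made-up CD values (illustrative only) and predictions from two candidates.
measured_cd = [50.0, 48.0, 45.0, 41.0]
candidates = {
    "model_a": [51.0, 47.0, 46.0, 40.0],
    "model_b": [50.5, 48.5, 44.5, 41.5],
}
best_name, scores = select_model(candidates, measured_cd)
```

For RMSE and error variance a lower value is better, whereas for R-squared a higher value is better, which is why `select_model` takes a `lower_is_better` flag.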

[0150] Figure 5 is a block diagram illustrating the feature modeling and training of a set of machine learning models according to several embodiments. In some embodiments, feature modeling 500 may be performed to extract meaningful information for predictive modeling and training of the machine learning model. In some embodiments, different methods may be applied to generate features with different physical and / or statistical meanings. Each method requires configuration parameters that need to be optimized. In some embodiments, features may be measurable attributes of the training data for the machine learning model. Features may also be quantifiable properties that a machine learning model algorithm can use to train the machine learning model. In other words, features may be properties of the input data (e.g., spectral data) used to perform predictions. In some embodiments, multiple feature models may include at least one of the following: principal component analysis (PCA) models, independent component analysis (ICA) models, or fast Fourier transform (FFT) models. Each feature may be identified as more or less important to a certain type of model that performs certain predictions by using the feature model. In some embodiments, feature models may be used to generate feature vectors and / or feature datasets from spectral data.

[0151] In some embodiments, a feature model 502 is applied to spectral data 501. In some embodiments, the spectral data 501 may include reflectometry data and / or IR data of the substrate. The spectral data 501 may include a set of measurements taken across the surface of the substrate. For example, the spectral data of the substrate may include a map of the substrate having spectral measurements at multiple locations on the substrate. In some embodiments, the feature model may include principal component analysis (PCA), independent component analysis (ICA), and / or fast Fourier transform (FFT) analysis. Each of the feature models may perform feature extraction of different types and / or levels of the spectral data 501 to extract useful information from the spectral data 501 and / or put the spectral data 501 into a form more convenient for input into a machine learning model. In some embodiments, different feature extraction techniques may be used to analyze the spectral data 501.

[0152] In some embodiments, PCA may be used to analyze datasets containing multiple dimensions and / or features per observation, increasing the interpretability of the data while preserving the maximum amount of information and enabling visualization of multidimensional data. In some embodiments, PCA may be used as a statistical technique to reduce the dimensionality of the dataset. This may be done by linearly transforming the data into a new coordinate system that can describe (most of) the variation in the data with fewer dimensions than the initial data. In some embodiments, spectral data 501 (e.g., IR signals) may be compressed (dimensionality reduced) into a common correlation pattern space. The number of common patterns found in the IR signal data may determine the amount or number of PCA components 503. Different numbers of PCA components may be more or less useful for predicting different types of properties (e.g., CD profiles). In some embodiments, individual components may correspond to PCA components 506. PCA condition 1 may be a first common pattern or a first principal component, PCA condition 2 may be a second common pattern or a second principal component, and so on. Different PCA feature model configurations may be determined and used to train different machine learning models to identify the optimal PCA model configuration for a particular feature being predicted. Each PCA feature model configuration may correspond to several different PCA components (for example, the top X PCA components may be used for model configuration 1, the top Y PCA components for model configuration 2, and so on). In some embodiments, feature model configuration 1 for PCA may include setting the number of PCA components to use for feature calculation (e.g., PCA conditions 1 to K). PCA-based features may indicate common behavior across a substrate (e.g., across a wafer) that can be extracted from spectral data.
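To make the idea of using the top K PCA components as features concrete, the sketch below computes PCA scores with the standard library only. It substitutes power iteration with deflation for the eigendecomposition that a numerical library would normally provide, and all function names and the toy spectra are hypothetical. The sign of each component is arbitrary.

```python
import math


def center_rows(rows):
    """Subtract the per-column mean so each feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_k_components(cov, k, iterations=200):
    """Approximate the k leading eigenvectors by power iteration with deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start
        start_norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / start_norm for x in vec]
        for _ in range(iterations):
            product = mat_vec(work, vec)
            norm = math.sqrt(sum(x * x for x in product))
            if norm < 1e-12:  # no remaining variance to explain
                break
            vec = [x / norm for x in product]
        eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(work, vec)))
        components.append(vec)
        # Deflation: remove the component just found from the matrix.
        for i in range(d):
            for j in range(d):
                work[i][j] -= eigenvalue * vec[i] * vec[j]
    return components


def pca_scores(rows, k):
    """Project centered rows onto the top-k principal components."""
    centered = center_rows(rows)
    comps = top_k_components(covariance_matrix(centered), k)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps]
            for row in centered]


# Toy spectra (illustrative only): every row is a multiple of (1, 2, 3).
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
scores = pca_scores(spectra, 1)
```

Here `k` plays the role of the configurable number of PCA components: a different `k` gives a different feature model configuration for the same spectra.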

[0153] In some embodiments, ICA may be used to split a multivariate signal into additive subcomponents. This may be done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent of each other. ICA may be used to determine where on the substrate surface the reflectometry data originates. For example, an IR signal may be decomposed into individual components. The different components may be based on spectral data, for example, spectral data from the substrate plane, the substrate hole area, the surface area between holes on the substrate, etc. In some embodiments, the individual components may correspond to ICA condition 507. For example, ICA condition 1 may be a component of the decomposed IR signal corresponding to an etched pattern on the substrate surface. ICA condition 2 may be a component of the decomposed IR signal corresponding to particles on the substrate surface. Different ICA feature model configurations may be determined and used to train different machine learning models to identify the optimal ICA model configuration for a particular characteristic under prediction. Each ICA feature model configuration may correspond to a different single ICA component or combination of ICA components. In some embodiments, the feature modeling configuration 1 for ICA may include setting the number and / or classes (e.g., ICA conditions 1 to M) of ICA components to be used in feature calculations. In some embodiments, the ICA-based features may include ICA conditions 507 and ICA-based reprojections 504. In embodiments, the ICA features may provide information relating to etching morphology.

[0154] In some embodiments, the FFT computes the discrete Fourier transform (DFT) of a sequence or its inverse (IDFT). FFT analysis transforms a signal from its original domain (often time or space) into a frequency domain representation and vice versa. The DFT may be obtained by decomposing a sequence of values into components of different frequencies. In some embodiments, the FFT may be used to decompose an IR signal into individual frequency-domain components. In some embodiments, the individual components may correspond to different FFT conditions 508. For example, FFT condition 1 may be the frequency component with the highest intensity, and FFT condition 2 may be the frequency component with the second highest intensity. In some embodiments, feature modeling configuration 1 for the FFT may include setting the number of frequency components (e.g., FFT conditions 1 to N) to be used for feature calculation.
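Ranking frequency components by intensity can be illustrated with a direct evaluation of the DFT. The sketch below is a naive O(n²) DFT rather than a fast algorithm, chosen so that it needs only the standard library. The function names, the decision to skip the zero-frequency (DC) bin, and the synthetic signal are assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..n/2 of a real-valued signal (naive O(n^2))."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(coeff))
    return magnitudes


def top_frequency_bins(signal, count):
    """Return the indices of the `count` strongest non-DC frequency bins."""
    mags = dft_magnitudes(signal)
    # Assumption: bin 0 (the mean level) is excluded from the ranking.
    ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return ranked[:count]


# Synthetic signal (illustrative only): a strong component in bin 4
# and a weaker component in bin 9.
n = 32
signal = [3.0 * math.sin(2 * math.pi * 4 * t / n)
          + 1.0 * math.sin(2 * math.pi * 9 * t / n) for t in range(n)]
strongest = top_frequency_bins(signal, 2)
print(strongest)  # [4, 9]
```

In this picture, the first returned bin corresponds to FFT condition 1 and the second to FFT condition 2, and `count` is the configurable number N.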

[0155] Different feature model combinations (e.g., different combinations of PCA, ICA, and / or FFT configurations) may be determined. The determined feature model combinations may then be used to process the training data, which is then used to train a machine learning model. Different feature model combinations may be used to process data for training different machine learning models. These different models may be of the same type or of different types. Therefore, different combinations of processes performed on the input data may be used for different models and different types of models.

[0156] In some embodiments, the input processing may include settings for how feature modeling results are combined to generate the input data (e.g., PCA score / ICA projection / FFT projection / PCA+ICA projection / ICA projection+FFT projection / PCA+FFT projection, etc.). In some embodiments, the FFT-based features may include FFT conditions 508 and an FFT-based reprojection 505.

[0157] In some embodiments, different types of machine learning algorithms / models may be trained. Different machine learning models may be trained using data processed with the same or different input combinations (e.g., combinations of feature models used to process the data before inputting the data into the machine learning model). In some embodiments, the machine learning algorithms may include multilayer perceptron (MLP) 512, gradient boosting tree (GBT) 514, random forest algorithms, support vector regression (SVR) 516, neural networks 517, recurrent algorithms, or other types of machine learning models.

[0158] Figures 5B–5D show a few examples of different combinations of feature model configurations and machine learning models that can be trained in parallel. It should be noted that in practice, many more different combinations of feature model configurations and / or machine learning models may be tested.

[0159] In Figure 5B, feature model configurations are applied to training data 519. In some embodiments, for example, PCA configuration 1 520A, ICA configuration 1 522A, and FFT configuration 1 524A are applied to training data 519. After applying this set of feature model configurations, training data 519 becomes modified training data 1 526A. Then, the modified training data 1 526A is used to train the machine learning model 1 530A.

[0160] In Figure 5C, feature model configurations are applied to the training data 519. In some embodiments, for example, PCA configuration 1 520A and ICA configuration 1 522A are applied to the training data 519. After applying this set of feature model configurations, the training data 519 becomes modified training data 2 526B. The machine learning model 1 530B is then trained using the modified training data 2 526B. It should be noted that although machine learning model 1 530B is trained using the same machine learning model training algorithm as machine learning model 1 530A, machine learning model 1 530B may be a different machine learning model because it is trained using a different dataset generated by applying a different set of feature models to the training data 519.

[0161] In Figure 5D, feature model configurations are applied to the training data 519. In some embodiments, for example, PCA configuration 1 520A, ICA configuration 1 522A, and FFT configuration 1 524A are applied to the training data 519. After applying this set of feature model configurations, the training data 519 becomes modified training data 1 526A. Next, the machine learning model 2 530C is trained using the modified training data 1 526A. It should be noted that although machine learning model 2 530C is trained on the same data as machine learning model 1 530A, it may be a different machine learning model because it is trained using a different machine learning algorithm.

[0162] Referring again to Figure 5A, multiple machine learning models may be trained in parallel or sequentially. Different models may be trained using input data processed with different feature models (e.g., different PCA models, ICA models, FFT models, etc.). The different models may be different types of machine learning models, or they may be of the same type. In block 518, each trained model may be checked. Checking may include determining the accuracy of the trained machine learning model in making predictions (e.g., predicting CD profiles based on spectral data). Based on the accuracy of one or more trained models, patterns of feature model combinations and / or machine learning algorithms that yield more favorable results may be determined, and such patterns may be used to select additional feature model combinations and / or machine learning algorithms to test. In embodiments, multiple training iterations may be performed, in which case each subsequent iteration may be based on the results of one or more previous iterations. In each iteration, different combinations of feature model configurations and / or machine learning models, informed by the previous tests, may be tested. Generally, one or more performance parameters improve with each additional test iteration or every several additional tests. For example, accuracy may be improved, processing time may be reduced, and the size of the machine learning model may be decreased. This process may be repeated until a machine learning model that satisfies one or more target criteria is trained. The machine learning model may then be put into production. For example, the determined feature model configuration may be applied to new spectral data of a processed substrate, and the resulting processed spectral data may then be processed by the trained machine learning model to predict the CD profile of the processed substrate.

[0163] Figure 6 is a flowchart of a method 600 for model training and verification according to several embodiments. Method 600 may be performed by processing logic, which may include hardware (e.g., circuits, dedicated logic circuits, programmable logic circuits, microcode, processing devices, etc.), software (e.g., instructions executed on processing devices, general-purpose computer systems, or dedicated machines), firmware, microcode, or a combination thereof. In some embodiments, method 600 may be performed in part by a prediction system 110 (e.g., server machine 170 and dataset generator 172 in Figure 1, dataset generators 272A-B in Figures 2A-2B). The prediction system 110 may use method 600 to generate a dataset for performing at least one of training, verification, or testing a machine learning model according to embodiments of the present disclosure. Method 600 may be performed by a prediction server 112 (e.g., prediction component 114) and / or server machine 180 (e.g., training, verification, and testing operations may be performed by server machine 180). In some embodiments, a non-transitory machine-readable storage medium stores instructions that, when executed by a processing device (e.g., a processing device of the prediction system 110, a processing device of the server machine 180, a processing device of the prediction server 112, etc.), cause the processing device to execute method 600.

[0164] For the sake of simplicity, Method 600 is illustrated and described as a series of operations. However, the operations provided herein may be performed in various orders and / or simultaneously, and may be performed in conjunction with other operations not presented and described herein. Furthermore, not all illustrated operations may be performed to carry out Method 600 according to the disclosed subject matter. In addition, those skilled in the art will understand and recognize that Method 600 may also be alternatively represented as a series of interrelated states by a state diagram or events.

[0165] Referring to Figure 6, in some embodiments, in block 602, the processing logic implementing method 600 determines what type of output prediction the model will be trained to produce. In some embodiments, the model output is a single output prediction (i.e., the model predicts one CD measurement, e.g., top CD, middle CD, or bottom CD). In some embodiments, the model output is a multiple output prediction (e.g., vertical CD profile measurement points, or a wafer map of a defined depth region). In some embodiments, the wafer map of a defined depth region may include the top CD region, the middle CD region, and / or the bottom CD region.

[0166] In some embodiments, in block 604, the processing logic loads a dataset. In some embodiments, this dataset is matched with the IR and measurement outputs of one or more substrates. In some embodiments, there are test locations defined by customer monitoring locations.

[0167] In some embodiments, in block 606, processing logic processes output data using output processing logic. In some embodiments, the output processing logic is applied to a multi-output prediction model. Measurement data loaded as a dataset for a multi-output prediction model may be noisy, and the measurement data may be processed to reduce the noise. In some embodiments, the output data processing logic includes filtering, smoothing, clustering, and / or quantization of the measurement data.
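Two of the noise-reduction steps named above, smoothing and quantization, can be sketched in a few lines. This is a minimal illustration under stated assumptions: a centered moving average is used as the smoothing method and rounding to a fixed step as the quantization method, neither of which is specified by the source, and the function names and profile values are hypothetical.

```python
def moving_average(values, window=3):
    """Smooth a profile with a centered moving average (shorter window at edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def quantize(values, step):
    """Snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


# Made-up CD profile with a noisy spike in the middle (illustrative only).
profile = [10.0, 12.0, 20.0, 12.0, 10.0]
cleaned = quantize(moving_average(profile), 0.5)
print(cleaned)  # [11.0, 14.0, 14.5, 14.0, 11.0]
```

Smoothing spreads the spike across its neighbors, and quantization then limits the target values to a fixed grid, which may make a multi-output model less sensitive to measurement noise.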

[0168] In some embodiments, in block 608, the processing logic executes an automatic modeling loop. In some embodiments, the automatic modeling loop includes blocks 610-618.

[0169] In some embodiments, in block 610, the processing logic computes a feature modeling approach for PCA.

[0170] In some embodiments, in block 612, the processing logic computes a feature modeling approach for ICA.

[0171] In some embodiments, in block 614, the processing logic computes a feature modeling approach for the FFT.

[0172] In some embodiments, in block 616, the processing logic generates an input dataset for modeling using the feature modeling results. Generating the input dataset involves processing data from the training dataset using the determined feature modeling approach (e.g., determined in blocks 610, 612, and 614) to generate an updated training dataset.

[0173] In some embodiments, in block 618, the processing logic trains and validates a model using a predefined test dataset generated in block 616, as described with reference to Figure 5A. After training and validation, the automated modeling loop 608 may be repeated using different combinations of feature modeling approaches and / or different types of machine learning models. This process may be repeated until the trained machine learning model satisfies one or more target decision criteria.
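The repetition of the automated modeling loop until a target criterion is met can be sketched as follows. This is an illustration only: `train_and_score` is a hypothetical callable standing in for blocks 610 to 618, the target is assumed to be an RMSE threshold, and `toy_score` returns made-up values that do not represent real training results.

```python
from itertools import product


def automated_modeling_loop(feature_combinations, model_types,
                            train_and_score, target_rmse):
    """Try (combination, model type) pairs until one meets the target RMSE."""
    results = []
    for combo, model_type in product(feature_combinations, model_types):
        score = train_and_score(combo, model_type)  # stands in for blocks 610-618
        results.append((score, combo, model_type))
        if score <= target_rmse:  # target decision criterion satisfied
            break
    best = min(results, key=lambda r: r[0])
    return best, results


def toy_score(combo, model_type):
    """Made-up scoring rule for illustration; not a real training result."""
    return 2.0 / len(combo) + (0.1 if model_type == "SVR" else 0.0)


combos = [("PCA-1",), ("PCA-1", "ICA-1"), ("PCA-1", "ICA-1", "FFT-1")]
best, results = automated_modeling_loop(combos, ["MLP", "SVR"], toy_score, 0.7)
```

If no pair reaches the target, the loop simply exhausts every pair and returns the best one found, which matches the alternative stopping condition of having used all available combinations.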

[0174] In some embodiments, in block 620, the processing logic summarizes the automated modeling results. In some embodiments, this summary includes stacking the results from the loop. For example, the feature modeling approach calculation may determine that the feature combination includes PCA conditions 1 and 2, ICA condition 1, and FFT conditions 1 and 2. The processing logic may use the automated modeling loop to train a first model using the feature model combination PCA 2 + ICA 1. The processing logic may then use the automated modeling loop to train a second model using the feature combination PCA 1 + ICA 1 + FFT 1. In embodiments, the results of these different trained machine learning models may be stacked and / or compared. In some embodiments, the processing logic may continue to use the automated modeling loop to train machine learning models with new feature combinations until all possible feature combinations have been used, or until one or more machine learning models that satisfy one or more criteria have been trained. In some embodiments, each feature combination may be used to train a machine learning model or multiple machine learning models using a multilayer perceptron (MLP) algorithm, a gradient boosting tree (GBT) algorithm, a random forest algorithm, a support vector regression (SVR) algorithm, a neural network algorithm, and / or a recursive algorithm. In some embodiments, other machine learning algorithms may be used to train the machine learning model.

[0175] In some embodiments, the MLP algorithm configuration may include the initial seed, learning rate, etc. In some embodiments, the GBT algorithm configuration may include the number of parameters, the number of local models, etc. In some embodiments, the SVR algorithm configuration may include the regularization condition, tolerance condition, etc.

[0176] In some embodiments, the automated modeling scheme may include an automated input combination process incorporated with different feature modeling configurations. Different input combinations may be used and verified during the model training and verification steps.

[0177] In some embodiments, in block 622, the processing logic determines the best modeling configuration. In some embodiments, this determination includes determining which models satisfy one or more selection criteria. In some embodiments, one or more selection criteria may include at least one of accuracy criteria, processor utilization criteria, processing speed criteria, or memory utilization criteria. In some embodiments, the processing logic may further determine a selection value for each of the trained machine learning models, the selection value including at least one of root mean square error (RMSE) value, R-squared (R2) value, or error variance value. In some embodiments, one or more selection criteria may include a selection value criterion.
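As an illustration of the selection values named above, the following sketch computes RMSE, R-squared, and error variance for one model and then picks a model that satisfies an accuracy criterion. The function names, the dictionary layout, and the rule of choosing the lowest RMSE below a threshold are assumptions made for illustration, not the selection logic of this disclosure.

```python
import math


def selection_values(measured, predicted):
    """Return RMSE, R-squared, and error variance for paired value lists."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    mean_error = sum(errors) / n
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "error_variance": sum((e - mean_error) ** 2 for e in errors) / n,
    }


def select_best(results, max_rmse):
    """Return the model name with the lowest RMSE not exceeding max_rmse, or None."""
    passing = {name: v for name, v in results.items() if v["rmse"] <= max_rmse}
    if not passing:
        return None
    return min(passing, key=lambda name: passing[name]["rmse"])


measured = [1.0, 2.0, 3.0, 4.0]
results = {
    "model_a": selection_values(measured, [2.0, 3.0, 4.0, 5.0]),
    "model_b": selection_values(measured, [1.0, 2.0, 3.0, 4.0]),
}
best = select_best(results, max_rmse=0.5)
```

Processor, speed, or memory criteria could be added as further filters in select_best in the same way.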

[0178] In some embodiments, in block 624, the processing logic performs the final training and verification of the selected model.

[0179] Figure 7 is a block diagram showing the output processing of cross-sectional measurement data according to several embodiments. In the embodiments, this output processing may be performed, for example, in block 606 of method 600.

[0180] In some embodiments, output processing logic is applied to remove noise from cross-sectional measurement data (e.g., cross-sectional SEM (xSEM) and / or TEM). In some embodiments, the cross-sectional measurement data is raw data 706. In some embodiments, the output processing logic may include a smoothing process 707, a filtering process 708, a clustering process 710, and / or a quantization process 709.

[0181] In some embodiments, the prediction algorithm may benefit from a discretized output rather than a continuous signal, and the output data may be discretized using a quantization process 709.
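A minimal sketch of two of the output processing steps follows: a moving-average smoothing step and a uniform quantization step. The window size, the quantization step, and the function names are hypothetical; the disclosure does not specify which smoothing or quantization scheme is used.

```python
def moving_average(values, window=3):
    """Smooth a sequence with a centered window that shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def quantize(values, step):
    """Map each value to the nearest multiple of step (a discretized output)."""
    return [round(v / step) * step for v in values]


raw_cd = [101.2, 103.9, 97.4]          # illustrative CD values, units assumed nm
discrete_cd = quantize(raw_cd, 5.0)    # each value snapped to a 5 nm grid
smooth = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```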

[0182] Cross-sectional SEM measurements may be taken from the top to the bottom. In some embodiments, measurements may be performed at 5-nanometer intervals from the top to the bottom. In some embodiments, measurements may be performed at 10-nanometer intervals or at any other interval. In some embodiments, the measurement at the highest point is top CD 701, the intermediate measurement is intermediate CD 702, and the measurement at the lowest point is bottom CD 703. In some embodiments, only one CD measurement is performed per sample. The measurement performed may be any one of top CD 701, intermediate CD 702, or bottom CD 703.

[0183] Figure 8 is a block diagram showing feature modeling according to several embodiments. In some embodiments, the feature modeling module 800 may be expanded to extract meaningful information for predictive modeling from the IR signal data 802. In some embodiments, different methods may be applied to generate features with different physical and / or statistical meanings. Each method requires configuration parameters that need to be optimized.

[0184] In some embodiments, the PCA-based feature may include a PCA condition 804 and a PCA component 810. The PCA-based feature may exhibit common behavior across the substrate. In some embodiments, the overall behavior may be extracted.

[0185] In some embodiments, the ICA-based feature may include an ICA condition 806 and an ICA-based reprojection. The ICA-based feature may represent signals collected from different sources (e.g., a plane, a hole area, a surface between holes, etc.). In some embodiments, the ICA-based feature may be etching morphology-oriented information.

[0186] In some embodiments, the FFT-based feature may represent the FFT condition 808 and the reprojection based on the FFT. The FFT-based feature may include signals collected from different phases (for example, the light source is a kind of wave signal). In some embodiments, the FFT-based feature may compare the etching shape and different wave signals.
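To make the FFT-based feature concrete, the sketch below computes the magnitudes of the first few frequency bins of a sampled signal. It uses a direct discrete Fourier transform rather than a fast algorithm to stay within the standard library, and it treats the FFT condition as the number of bins kept, which is an assumption. The function name dft_magnitude_features is hypothetical.

```python
import cmath
import math


def dft_magnitude_features(signal, n_bins):
    """Return magnitudes of the first n_bins DFT coefficients of signal."""
    n = len(signal)
    features = []
    for k in range(n_bins):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        features.append(abs(coeff))
    return features


# One full cosine period sampled at four points; energy appears in bin 1.
features = dft_magnitude_features([1.0, 0.0, -1.0, 0.0], n_bins=3)
```

The resulting feature vector could be concatenated with PCA-based and ICA-based features before being passed to the prediction model.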

[0187] Figure 9A is a scatter plot of single-output training and test predictions for several embodiments of machine learning models.

[0188] In some embodiments, scatter plot 902 shows training and test predictions. In some embodiments, the y-axis represents measurement results and the x-axis represents model results. Each point on the scatter plot represents a prediction made by the machine learning model (e.g., a predicted CD measurement). In some embodiments, training predictions represent predictions made during training of the machine learning model. In some embodiments, test predictions represent predictions made during testing of the trained machine learning model. In some embodiments, a correlation between training and test predictions indicates that the trained machine learning model is performing well on both the training and test data. In some embodiments, such a correlation may mean that the model is performing well on new data that was not used to train the model.

[0189] Figure 9B shows time series diagrams of single-output predictions from trained machine learning models in several embodiments.

[0190] In some embodiments, time series diagram 904 shows the measurement results and model prediction results. In some embodiments, the y-axis represents the measurement results and the x-axis represents the samples. Each line on the time series diagram represents the relationship between the predictions made by the machine learning model (e.g., measurement results such as predicted CD measurements) and the samples over time. In some embodiments, the measurement results represent the actual measurement results that were performed (e.g., data points in the training dataset). In some embodiments, the model prediction results represent the predictions made during testing of the trained machine learning model. In some embodiments, the correlation between the training predictions and the test predictions indicates that the trained machine learning model is performing well and making predictions that are close to the actual measurement results. In some embodiments, such a correlation may mean that the model is performing well on new data that was not used to train the model.

[0191] Figure 9C is a wafer map of single-output predictions of machine learning models in several embodiments.

[0192] In some embodiments, the wafer map 906 indicates the measurement result at a certain coordinate on the wafer map 906 using pixel colors. In some embodiments, different colors or color intensities may represent different measurement results. For example, if a grayscale is used, darker grays may correlate with deeper CD holes, and lighter grays may correlate with shallower CD holes. In a color-coded map, different colors may represent different ranges of measured values and / or predicted values.

[0193] Figures 11A and 11B are flowcharts of methods 1100A and 1100B relating to generating synthetic microscope images by training and utilizing a machine learning model, according to one embodiment. Methods 1100A and 1100B may be performed by processing logic, which may include hardware (e.g., circuits, dedicated logic circuits, programmable logic circuits, microcode, processing devices, etc.), software (e.g., instructions executed on processing devices, general-purpose computer systems or dedicated machines), firmware, microcode, or a combination thereof. In some embodiments, methods 1100A and 1100B may be performed in part by a prediction system 110. Method 1100A may be performed in part by a prediction system 110 (e.g., server machine 170 and dataset generator 172 in Figure 1, dataset generators 272A and 272B in Figures 2A and 2B). The prediction system 110 may use method 1100A to generate a dataset for performing at least one of training, verification, or testing of a machine learning model according to embodiments of the present disclosure. Method 1100B may be executed by the prediction server 112 (e.g., prediction component 114) and / or the server machine 180 (e.g., the server machine 180 may perform training, verification, and test operations). In some embodiments, a non-transitory machine-readable storage medium stores instructions that, when executed by a processing device (e.g., the processing device of the prediction system 110, the processing device of the server machine 180, the processing device of the prediction server 112, etc.), cause that processing device to execute one or more of methods 1100A and 1100B.

[0194] For the sake of simplicity, methods 1100A and B are illustrated and described as a series of operations. However, the operations according to this disclosure may be performed in various orders and / or simultaneously, and may be performed in conjunction with other operations not presented and described herein. Furthermore, not all illustrated operations may be performed to carry out methods 1100A and B according to the disclosed subject matter. In addition, those skilled in the art will understand and recognize that methods 1100A and B may also be alternatively represented as a series of interrelated states by a state diagram or events.

[0195] Figure 10A is a flowchart of method 1000A related to generating synthetic microscope images according to several embodiments.

[0196] Referring to Figure 10A, in some embodiments, in block 1002, a processing logic that implements method 1000A processes measurement data of a substrate processed according to a manufacturing process using a first trained machine learning model to predict the CD profile of the substrate. In some embodiments, the measurement data may include a profile map of at least one of films or features on the substrate. In some embodiments, the measurement data may include a topography map of the substrate, a defect map of the substrate, an electrical characterization of the substrate, a yield analysis of the substrate, and / or other similar data. In some embodiments, the measurement data may include spectral data. In some embodiments, the measurement data may include reflectometry data. In some embodiments, the measurement data may include ellipsometry data, photoluminescence spectroscopy data, X-ray diffraction data, Hall effect measurement data, current-voltage characteristic data, and / or other similar data. In some embodiments, the first trained machine learning model may include a deep neural network. In some embodiments, the first trained machine learning model may include a convolutional neural network, a recurrent neural network, a Boltzmann machine, a multilayer perceptron, a gradient boosting machine, a support vector machine, a radial basis function network, a random forest, a Gaussian mixture model, and / or other analogous models. In some embodiments, the measurement data is processed using one or more feature models (e.g., an ICA model, a PCA model, an FFT model, etc.) to generate a transformed representation (e.g., a feature vector) of the measurement data. As described with respect to Figures 5A to 8, the feature models may be parts of a feature model combination, which in some embodiments may be determined when the first trained machine learning model was trained. The output of the feature models may be input to the first trained machine learning model, which may output a CD profile.

[0197] In block 1004, the processing logic generates a CD profile prediction image based on the predicted CD profile of the substrate. In some embodiments, a curve symmetric to the predicted CD profile is generated by an image processing technique (for example, using the predicted CD profile and drawing a symmetric curve using decalcomania).
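One way to picture the symmetric-curve step is to mirror the predicted CD profile about a vertical center line and rasterize the two resulting edges. The sketch below does this for a list of CD widths ordered from top to bottom. The image width, the pixel scale, and the function name profile_to_image are assumptions made for illustration; the disclosure does not specify the rendering.

```python
def profile_to_image(cd_values, width, nm_per_pixel=1.0):
    """Render a CD profile as a binary image with mirrored left and right edges.

    cd_values holds one CD width per depth row, ordered top to bottom.
    An odd width gives an exact center column to mirror about.
    """
    center = width // 2
    image = []
    for cd in cd_values:
        half = int(round(cd / nm_per_pixel / 2))
        row = [0] * width
        # Clip to the image bounds so wide profiles do not raise an error.
        row[max(0, center - half)] = 1
        row[min(width - 1, center + half)] = 1
        image.append(row)
    return image


profile_image = profile_to_image([6.0, 4.0, 2.0], width=11)
for row in profile_image:
    print("".join("#" if px else "." for px in row))
```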

[0198] In block 1006, the processing logic processes the CD profile prediction image using a second trained machine learning model to generate a composite microscope image related to the substrate. In some embodiments, the second trained machine learning model may include a generative model. In some embodiments, the composite microscope image may include a virtual scanning electron microscope (VSEM) image of the cross-section of the substrate. In some embodiments, the processing logic may further measure the features of the composite microscope image and calculate the dimensions of the manufactured device based on the measurement of the features of the composite microscope image. In some embodiments, the processing logic may further cause corrective actions to be performed considering the calculated dimensions of the manufactured device. In some embodiments, the corrective actions include scheduling maintenance, updating the process recipe, and / or providing an alert to the user.

[0199] Figure 10B is a flowchart of a method relating to training a machine learning model to generate a synthetic microscope image, according to several embodiments. Referring to Figure 10B, in some embodiments, in block 1022, a processing logic that implements method 1000B receives a plurality of SEM images and a plurality of CD measurements related to a substrate. In some embodiments, the plurality of SEM images may include a plurality of CD hole images. In some embodiments, the processing logic may further remove embedded SEM scanning information and resolution defects from the plurality of SEM images. In some embodiments, the processing logic may further determine the tops and bottoms of the plurality of CD hole images based on noise reduction and blurring. In some embodiments, the processing logic may further determine the CD hole separation for the plurality of CD hole images based on maximal search. In some embodiments, the processing logic may further determine a plurality of individual CD hole cropping areas corresponding to the plurality of CD hole images based on the tops, bottoms, and CD hole separation of the plurality of CD hole images. In some embodiments, the processing logic may further crop the plurality of CD hole images based on the individual CD hole cropping areas. In some embodiments, the processing logic may further resize the plurality of cropped CD hole images. In some embodiments, the resizing of the plurality of cropped CD hole images may be based on the pixel specification of the corresponding CD measurement among the plurality of CD measurements.

[0200] In block 1024, the processing logic generates multiple CD profile images based on multiple CD measurements.

[0201] In block 1026, the processing logic generates an input dataset containing multiple SEM images and multiple CD profile images.

[0202] In block 1028, the processing logic trains a machine learning model using the input dataset. Training the machine learning model includes providing multiple CD measurements to the machine learning model as training inputs and providing multiple SEM images to the machine learning model as target outputs. In some embodiments, training the machine learning model may include training a GAN. In some embodiments, the input dataset may include multiple cropped CD hole images.

[0203] In some embodiments, the SEM image generator may generate SEM images using predicted CD values. In some embodiments, SEM images (e.g., of a substrate) may be generated during mass production (e.g., using sampling).

[0204] Figure 11A is a block diagram showing the process of training a machine learning model to generate synthetic microscope images.

[0205] In some embodiments, the SEM image 1102 (or multiple SEM images) may be preprocessed. In some embodiments, the preprocessing includes removing embedded SEM scanning information and resolution defects from the SEM image 1102. In some embodiments, the preprocessing includes determining the top and bottom of the SEM image 1102 based on noise reduction and blurring. In some embodiments, the preprocessing includes determining the CD hole separation for individual CD hole images in the SEM image 1102 based on maximal search. In some embodiments, the preprocessing includes determining individual CD hole cropping areas corresponding to the CD hole images based on the top, bottom, and CD hole separation of the CD hole images. In some embodiments, the preprocessing includes cropping the CD hole images based on the individual CD hole cropping areas. In some embodiments, the preprocessing includes resizing the cropped CD hole images. In some embodiments, the resizing of multiple cropped CD hole images may be based on the pixel specification of the corresponding CD measurement among multiple CD measurements.

[0206] In some embodiments, a CD profile measurement value 1106 is obtained by measuring an SEM image 1102.

[0207] In some embodiments, block 1108 generates a profile image (e.g., a CD predictive profile image) using the CD profile prediction model 1124 shown in Figure 11B. In some embodiments, the CD profile prediction model 1124 may be the selected model 308 shown in Figure 3.

[0208] In some embodiments, block 1110 combines the preprocessed image from block 1104 with the generated profile image from block 1108. In some embodiments, the preprocessed image from block 1104 may be resized based on the pixel specification of the corresponding generated profile image (e.g., CD measurement).

[0209] In some embodiments, block 1112 inputs the combined image from block 1110 as training data into the VSEM generator model. In some embodiments, the VSEM generator model 1112 is a GAN. In some embodiments, the VSEM generator model 1112 may be trained using an SEM image (e.g., SEM image 1102) and a CD profile measurement (e.g., CD profile measurement 1106).

[0210] Figure 11B is a block diagram showing the generation of a synthetic microscope image.

[0211] In some embodiments, IR signal data 1122 is provided to a CD profile prediction model 1124. In some embodiments, the IR signal data 1122 is processed using filtering, smoothing, clustering, quantization, and / or other similar methods before being provided to the CD profile prediction model 1124. In some embodiments, a feature model (e.g., a feature model combination) is applied to the IR signal data 1122 to extract features before being provided to the CD profile prediction model 1124.

[0212] In some embodiments, the CD profile prediction model 1124 outputs a predicted CD profile.

[0213] In some embodiments, the predicted CD profile 1126 is converted into an image 1128. In some embodiments, the image 1128 may be more suitable than the predicted CD profile 1126 for input to the VSEM generator model 1130. In some embodiments, a curve symmetric to the predicted CD profile is generated by image processing techniques (e.g., using the predicted CD profile and drawing a symmetric curve using decalcomania).

[0214] In some embodiments, an image 1128 (e.g., a CD profile prediction image) is provided as input to a VSEM generator model 1130. The VSEM generator model 1130 outputs a VSEM image 1132.

[0215] Figure 12 is a block diagram illustrating the processing of SEM images according to several embodiments. In these embodiments, the SEM image data is segmented to obtain multiple different data inputs that are used to train a generative model to generate a synthetic microscope image.

[0216] In some embodiments, block 1201 processes the cross-sectional SEM image 1202. In some embodiments, the cross-sectional SEM image 1202 may be a cross-sectional SEM image of the die. The cross-sectional SEM image 1202 of the die may include multiple images of individual CD holes or other features on the substrate. In some embodiments, the initial preparation of the cross-sectional SEM image 1202 may include removing any embedded SEM scanning information 1203 and resolution defects that may interfere with image processing.

[0217] In some embodiments, in block 1205, noise reduction and blurring 1206 may be applied to the cross-sectional SEM image 1202 based on Gaussian smoothing to determine the top and bottom edges of the CD hole.

[0218] In some embodiments, in block 1210, all CD holes in the image are separated by CD hole separation 1211 based on maximal search. In some embodiments, CD hole separation may be determined individually.

[0219] In some embodiments, block 1215 determines the cropping boundary values for the top and bottom of the CD hole. In some embodiments, the cropping boundary values include a top cropping boundary 1216 and a bottom cropping boundary 1217. In some embodiments, the CD hole separation 1211, the top cropping boundary 1216, and the bottom cropping boundary 1217 form the CD hole cropping area 1218.

[0220] In some embodiments, cropping is performed at the individual hole level in block 1220. In some embodiments, a CD hole cropping area 1218 defines the area to be cropped. In some embodiments, the size of each cropped hole image 1221 may be changed by a pixel specification consistent with the actual CD profile.
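The separation and cropping steps of blocks 1210 to 1220 can be sketched as follows. The sketch assumes that the walls between CD holes appear brighter than the holes, so that local maxima of the column-wise intensity mark the separation positions; this brightness convention is an assumption, and the Gaussian smoothing and resizing steps are omitted for brevity. The function names are hypothetical.

```python
def local_maxima(profile):
    """Return indices whose value rises from the left and does not rise to the right."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i - 1] < profile[i] >= profile[i + 1]
    ]


def crop_holes(image, top, bottom):
    """Split image rows [top, bottom) into one crop per pair of adjacent maxima."""
    rows = image[top:bottom]
    column_sums = [sum(row[c] for row in rows) for c in range(len(image[0]))]
    peaks = local_maxima(column_sums)
    crops = []
    for left, right in zip(peaks, peaks[1:]):
        crops.append([row[left:right + 1] for row in rows])
    return crops


# Toy cross-section: bright walls (9) separating two darker holes (1).
toy_image = [[0, 9, 1, 1, 9, 1, 1, 9, 0] for _ in range(4)]
hole_crops = crop_holes(toy_image, top=1, bottom=3)
```

Here top and bottom play the role of the top and bottom cropping boundaries, and each entry of hole_crops corresponds to one individual CD hole image.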

[0221] The cross-sectional SEM image of the die contains many smaller images of the CD holes. The process described in Figure 12 is a data augmentation technique that turns one cross-sectional SEM image of the die into many individual CD hole images, thereby increasing the number of data samples in the dataset.

[0222] Figure 13 is a block diagram related to the generation of a CD profile according to several embodiments.

[0223] In some embodiments, ground truth CD measurements from SEM images are included in a ground truth measurement table 1305, and these ground truth CD measurements are used when plotting the ground truth CD hole profile 1310. In some embodiments, each CD hole may have many acquired measurements, and these values are stored in a table, and these values may be used to form a vertical CD hole profile. In some embodiments, a single curve (e.g., CD hole profile 1310) is transformed into a symmetrical curve (e.g., CD profile prediction image 1320). In some embodiments, the CD profile 1310 and the SEM image are used to generate the CD profile prediction image 1320. In some embodiments, image processing techniques may be used to transform the CD hole profile 1310 into the CD profile prediction image 1320. In some embodiments, individually cropped CD hole images from the SEM image are aligned with the corresponding CD profile prediction image 1320.

[0224] In some embodiments, multiple output data may be xSEM data. In some embodiments, a vertical CD profile may be predicted. In some embodiments, a wafer map may be generated, and the wafer map may define depth regions (e.g., top region CD, middle region CD, bottom region CD). In some embodiments, the vertical CD profile may include the top CD, middle CD, and / or bottom CD. In some embodiments, the vertical CD profile may include more data points than the top CD, middle CD, and / or bottom CD.

[0225] In some embodiments, two CD profiles are shown. In some embodiments, one CD profile may represent a measured CD profile, and the other CD profile may represent a CD profile prediction (e.g., a predicted measurement) made by a trained machine learning model. In some embodiments, if the two profiles are closely correlated, the CD profile prediction by the trained machine learning model may be an accurate prediction. If the two profiles are not closely correlated, the CD profile prediction by the trained machine learning model may be an inaccurate prediction.

[0226] In some embodiments, wafer maps may be used to show measurement results and / or CD profile prediction results. In some embodiments, wafer maps may exist for the top, middle, and bottom regions. In some embodiments, different colors or color intensities may represent different measurement results. For example, if a grayscale is used, darker grays may correlate with deeper CD holes, and lighter grays may correlate with shallower CD holes. In a color-coded map, different colors may represent different ranges of measured values and / or predicted values.

[0227] After selecting a combination of feature models and training a machine learning model, spectral data may be processed using the combination of feature models and the trained machine learning model to determine predicted and / or estimated values regarding films, devices, etc., associated with the processed substrate that generated the spectral data. In one embodiment, the machine learning model is trained to output CD profile predictions. In an embodiment, such CD profile predictions may be input to another trained machine learning model (e.g., a generative model), which may be trained to output synthetic microscope images based on CD profiles.

[0228] Figures 14A and 14B illustrate the processes and architectures for training and operating generative adversarial networks in several embodiments. Figure 14A shows a simple GAN 1400A. In some embodiments, Figures 14A and 14B describe how GANs such as the VSEM generator model 1112 in Figure 11A and the VSEM generator model 1130 in Figure 11B may be trained. During training, input data 1402 is provided to the discriminator 1408. The discriminator 1408 is configured to distinguish whether the input data 1402 is true data or synthetic data. The discriminator 1408 is trained until it achieves an acceptable accuracy. Accuracy parameters may be tuned based on the application, for example, based on the amount of training data available.

[0229] In some embodiments, input data 1402 (e.g., drawn from the same dataset used to train discriminator 1408) may be provided to the generator 1406 to train it to produce plausible synthetic data. Random inputs, such as noise 1404 (e.g., a fixed-length vector of pseudorandom values), may be provided to the generator 1406. The generator 1406 uses this random input as a seed to generate synthetic data (e.g., a synthetic microscope image). The generator 1406 provides this synthetic data to the discriminator 1408. The discriminator 1408 is further provided with additional input data 1402 (e.g., true data drawn from the same set of data used to train discriminator 1408). The discriminator 1408 attempts to distinguish the input data 1402 from the synthetic data provided by the generator 1406.

[0230] The discriminator 1408 provides the classification validation module 1410 with the classification result (e.g., whether each dataset supplied to the discriminator 1408 was labeled as true or as synthetic). The classification validation module 1410 determines whether one or more datasets were correctly labeled by the discriminator 1408. It provides feedback data indicating the labeling accuracy to both the discriminator 1408 and the generator 1406. Both the generator 1406 and the discriminator 1408 are updated taking into account the information received from the classification validation module 1410 (e.g., by backpropagation). The generator 1406 is updated to produce synthetic data that better reproduces the features of the input data 1402, for example, to produce synthetic data that is more frequently labeled as true data by the discriminator 1408. The discriminator 1408 is updated to improve its accuracy in distinguishing true data from synthetic data. The training process may be repeated until the generator 1406 reaches an accuracy threshold, for example, until a sufficiently large portion of the data generated by the generator 1406 is no longer correctly classified by the discriminator 1408.

[0231] Figure 14B is a block diagram showing the operational process of an exemplary GAN 1400B for generating synthetic microscope image data according to several embodiments. In some embodiments, the exemplary GAN 1400B may include many of the features discussed in relation to Figure 14A.

[0232] In some embodiments, the GAN 1400B includes a set of generators 1420 and a set of discriminators 1430. In some embodiments, the discriminators 1430 are trained by supplying them with input data 1436. The discriminators 1430 are configured to distinguish between true data and synthetic data. The generators 1420 may be configured to generate synthetic data. Noise 1412, for example, random or pseudo-random inputs, may be provided to the generators 1420 as a seed.

[0233] In some embodiments, the GAN 1400B may include a plurality of generators 1420 and / or a plurality of discriminators 1430. The discriminators 1430 may be configured to accept output data from different generators or sets of generators. In some embodiments, the generator 1420 may be configured to generate attribute data by an attribute generator 1422 and to generate related data (e.g., composite microscope image data) by a feature generator 1426. In some embodiments, the feature generator 1426 is configured to generate normalized data (e.g., composite microscope data with luminance values ranging from 0 to 1), and a minimum / maximum generator is configured to generate the minimum and maximum values of that data. In some embodiments, a technique for separating the minimum / maximum generator from the feature generator 1426 may improve the performance of the generator 1420.

[0234] In some embodiments, noise 1412 may be provided to attribute generator 1422 and / or feature generator 1426. In some embodiments, each of the generators 1420 may be provided with a different set of noise (e.g., a different set of random inputs). In some embodiments, the output of attribute generator 1422 (e.g., synthetic attribute data) may be provided to auxiliary discriminator 1432. The auxiliary discriminator 1432 may determine whether this combination of attribute values is likely to be related to true data. A preliminary determination may be performed to save the processing power that would otherwise be spent generating and / or discriminating synthetic data from feature generator 1426. The output of generator 1420 may be provided to discriminator 1434. The discriminator 1434 may distinguish true data from synthetic data, including attribute data, feature (e.g., microscope image) data, etc. In some embodiments, the minimum / maximum generator may be an optional feature. For example, GAN 1400B may be configured to normalize data from feature generator 1426 with minimum / maximum values, or it may be configured to generate data values directly using feature generator 1426.
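The split between normalized feature data and separately generated minimum / maximum values can be illustrated with a simple rescaling sketch. It assumes a linear mapping from the range 0 to 1 onto the generated minimum and maximum; the function names and the example values are hypothetical.

```python
def normalize(image):
    """Scale pixel values to the range 0 to 1 and return the original min and max."""
    flat = [px for row in image for px in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant image
    return [[(px - lo) / span for px in row] for row in image], lo, hi


def denormalize(normalized, lo, hi):
    """Map normalized values back to the range given by lo and hi."""
    return [[lo + px * (hi - lo) for px in row] for row in normalized]


# Normalized output (as from a feature generator) plus separately supplied
# minimum and maximum values (as from a minimum / maximum generator).
restored = denormalize([[0.0, 0.5, 1.0]], lo=10.0, hi=30.0)
round_trip, lo, hi = normalize(restored)
```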

[0235] In some embodiments, the generator 1420 may be provided with target attributes 1423. The target attributes may define the characteristics of the target microscope image to be generated. In some embodiments, the target attributes may include features of the generated image, such as image quality, contrast, brightness, etc. In some embodiments, the target attributes may include target measurement values, one or more target design rules, etc. In some embodiments, the target measurement values and / or design rules may be provided to the generator 1420 by providing the generator 1420 with a cartoon image (such as the one shown in Figure 5E). The generator 1420 may be configured to generate a realistic composite microscope image that shares features common to the provided attributes (e.g., including a provided predicted CD profile, CD profile prediction, drawing, and / or a shape similar to the cartoon).

[0236] In some embodiments, the feature generator 1426 may include a machine learning generator model designed to generate image data, for example, a machine learning generator model designed to generate synthetic microscope image data of a manufactured product. In some embodiments, the GAN 1400B may include a conditional GAN. In the conditional GAN, during training, synthetic and true images may be provided to a discriminator, which may be configured to determine whether the synthetic image is an acceptable representation of the true image. In some embodiments, the generator may be further updated by an additional loss function. In some embodiments, the output of the generator (generated from, for example, cartoon images, drawings, predicted CD profiles and / or CD profile prediction images related to a manufactured product) may be compared to true microscope images of the related product. In some embodiments, these images may be compared pixel by pixel, feature by feature, or in other similar ways. The generator may be penalized with respect to the difference between the input and output, based on the sum of absolute errors (e.g., L1 loss function), the sum of squares of errors (e.g., L2 loss function), etc.
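The pixel-by-pixel penalties mentioned above can be written out as a short sketch. Images are represented as nested lists of equal shape, and the function names are hypothetical; how such a term would be weighted against the adversarial loss is not specified in this disclosure.

```python
def l1_loss(generated, true_image):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(
        abs(g - t)
        for g_row, t_row in zip(generated, true_image)
        for g, t in zip(g_row, t_row)
    )


def l2_loss(generated, true_image):
    """Sum of squared pixel differences between two equally sized images."""
    return sum(
        (g - t) ** 2
        for g_row, t_row in zip(generated, true_image)
        for g, t in zip(g_row, t_row)
    )


generated = [[0.0, 0.5], [1.0, 1.0]]
true_image = [[0.0, 1.0], [0.5, 1.0]]
l1 = l1_loss(generated, true_image)
l2 = l2_loss(generated, true_image)
```

Because the L2 form squares each difference, it penalizes a few large pixel errors more heavily than the L1 form does for the same total absolute error.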

[0237] In some embodiments, the synthetic data may include target shapes or patterns. For example, a future simulated image, e.g., a future simulated image for training a second machine learning model, may include one or more anomalous structures within an imaged product. In some embodiments, the feature generator 1426 may accept instructions to facilitate the generation of synthetic data containing target shapes or patterns. Ranges or distributions such as position (e.g., spatial position), value, or shape may be provided to the feature generator 1426. The feature generator 1426 may generate data having target shapes or patterns represented according to a distribution of features, for example, anomalous structures may appear in many sets of synthetic data within a single position range having one height range and one width range.
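As one possible illustration of supplying ranges to the feature generator, the sketch below draws anomaly specifications from a position range, a height range, and a width range. Uniform sampling, the dictionary keys, and the function name `sample_anomalies` are assumptions made for illustration; the disclosure does not specify the distribution.

```python
import random


def sample_anomalies(n, x_range, height_range, width_range, seed=0):
    """Draw n anomaly specifications uniformly from the given (low, high) ranges."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        {
            "x": rng.uniform(*x_range),
            "height": rng.uniform(*height_range),
            "width": rng.uniform(*width_range),
        }
        for _ in range(n)
    ]


# One position range, one height range, and one width range (arbitrary units).
specs = sample_anomalies(5, x_range=(10, 20), height_range=(1, 3), width_range=(2, 4))
```

Each specification could then be passed to the generator as an instruction describing where and how large an anomalous structure should appear.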

[0238] In some embodiments, synthetic data (e.g., data output from generator 1420) may be used to train one or more machine learning models. In some embodiments, synthetic data may be used to train a machine learning model configured for event detection, for example, a machine learning model configured to determine whether synthetic image data is within normal variation or indicates a system anomaly. In some embodiments, a robust model may be generated using synthetic data, or synthetic data with a higher noise level than the true data; a machine learning model trained using such synthetic data may provide more useful outputs for a wider variety of inputs than a model trained only on true data. In some embodiments, the model may be tested for robustness using synthetic data. In some embodiments, a model for anomaly detection and/or classification may be generated using synthetic data. In some embodiments, synthetic data may be provided as training input to train a machine learning model, and one or more items of attribute data (e.g., attribute data indicating a system defect) may be provided as target output to train the machine learning model. In some embodiments, attribute data may include indications of the service life of the manufacturing system, such as the time since the system was installed, the number of products produced since the system was installed, the time since the last maintenance event, or the number of products produced since the last maintenance event.
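The idea of training on synthetic data with a higher noise level than the true data can be sketched as follows. The sketch assumes grayscale images scaled to [0, 1] and zero-mean Gaussian noise; the noise model, the clipping step, and the name `add_noise` are illustrative assumptions rather than details from the disclosure.

```python
import random


def add_noise(image, sigma, seed=0):
    """Return a copy of a [0, 1] grayscale image with zero-mean Gaussian noise added."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]  # clip to [0, 1]
        for row in image
    ]


clean = [[0.2, 0.8], [0.5, 0.5]]
noisy = add_noise(clean, sigma=0.1)
```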

[0239] Figure 14C is a flowchart of Method 1400C for training a machine learning model (e.g., a GAN) to generate realistic synthetic microscope images, according to some embodiments. In block 1440, a true image (e.g., a microscope image of a manufactured device taken using a microscopy technique) is generated. This image may be generated using any microscopy technique, such as scanning electron microscopy or transmission electron microscopy. This image may be an image of a device or structure, including a cross-sectional image, a top view, etc. In some embodiments, the images used for training may be selected because they exhibit target characteristics, such as contrast, clarity, sharpness, etc. In some embodiments, one or more characteristics of the image (e.g., contrast, sharpness, etc.) may be classified as attributes.

[0240] In block 1442, one or more critical dimensions (CDs) are measured from the microscope image. The measured dimensions may include the height of the structure, the width of the structure, etc. The measurement may include determining the edges of the structure (e.g., using a machine learning image processing model), determining the distance from one edge to the opposite edge of the structure on the image, and calculating the size of the imaged device from the size of the image.
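The three steps of block 1442 (find the edges, take the edge-to-edge distance, convert from image size to device size) can be sketched on a single row of pixel intensities. A fixed intensity threshold stands in here for the machine learning edge detector mentioned above; the threshold, the nanometers-per-pixel scale, and the function name are assumptions for illustration.

```python
def measure_width_nm(row, threshold, nm_per_pixel):
    """Span between the outermost below-threshold pixels of a 1-D intensity row, in nm."""
    inside = [i for i, value in enumerate(row) if value < threshold]
    if not inside:
        return 0.0  # no feature found in this row
    left_edge, right_edge = inside[0], inside[-1]
    pixel_width = right_edge - left_edge + 1  # edge-to-edge distance in pixels
    return pixel_width * nm_per_pixel         # convert image size to device size


# Dark pixels (values below 100) represent the hole; bright pixels are material.
row = [200, 190, 40, 35, 30, 45, 195, 200]
width = measure_width_nm(row, threshold=100, nm_per_pixel=2.5)
```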

[0241] In block 1444, the measured CD and/or the design rules (from block 1443) are provided to the CD profile prediction image generator. The CD profile prediction image generator is configured to combine the predicted CD profile and the design rules to produce a CD profile prediction image of the manufactured device (e.g., a generated drawing of the CD profile). This CD profile prediction image may include information from both the measured CD and the design rules. In some embodiments, this CD profile prediction image presents accurate dimensions in a simplified picture. An exemplary CD profile prediction image is shown in Figure 13.

[0242] In block 1446, this CD profile prediction image is supplied to the synthetic image generator to be trained. This synthetic image generator may be included in a GAN, such as a conditional GAN or an image-to-image (e.g., pix2pix) GAN. This synthetic image generator generates a synthetic image in block 1448.

[0243] In block 1452, the CD is measured from the synthetic image. In some embodiments, the same CD measured in block 1442 may be measured. In some embodiments, this synthetic image may resemble the true image, and similar CDs may be measurable from these two images. The operation of block 1452 may have features in common with the operation of block 1442.

[0244] In block 1454, an additional loss term is included in the analysis. For example, the true image and the synthetic image may be compared pixel by pixel. In some embodiments, an L1 loss function may be applied to the synthetic image, for example, to calculate a penalty that helps train the machine learning model. Data indicating the differences between the synthetic image and the true image (e.g., loss terms, differences in measured CD, etc.) may be provided to the image generator. The generator may be updated to improve the similarity between the true image and the synthetic image (e.g., by adjusting the weights between neurons in a neural network).
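One way to combine the two kinds of difference data mentioned in block 1454 is a weighted sum of a pixel-wise L1 term and a CD-difference term. This is a sketch only: the weighting scheme, the use of mean rather than summed errors, and the names `generator_penalty`, `pixel_weight`, and `cd_weight` are assumptions and are not specified in the disclosure.

```python
def generator_penalty(real_img, synth_img, real_cds, synth_cds,
                      pixel_weight=1.0, cd_weight=1.0):
    """Weighted sum of mean pixel-wise L1 error and mean absolute CD difference."""
    pixel_errors = [abs(r - s)
                    for row_r, row_s in zip(real_img, synth_img)
                    for r, s in zip(row_r, row_s)]
    cd_errors = [abs(r - s) for r, s in zip(real_cds, synth_cds)]
    pixel_term = sum(pixel_errors) / len(pixel_errors)
    cd_term = sum(cd_errors) / len(cd_errors)
    return pixel_weight * pixel_term + cd_weight * cd_term


real_img = [[0, 1], [1, 0]]
synth_img = [[0, 1], [1, 1]]   # one of four pixels differs
real_cds = [10.0, 50.0]        # CDs measured in block 1442 (illustrative values)
synth_cds = [12.0, 50.0]       # the same CDs re-measured in block 1452
penalty = generator_penalty(real_img, synth_img, real_cds, synth_cds, cd_weight=0.5)
```

In a GAN setting this penalty would typically be added to the adversarial loss before the generator weights are updated.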

[0245] Figure 14D shows a flowchart of Method 1400D for generating a synthetic microscope image using a trained machine learning image generator, according to some embodiments. In block 1460, one or more critical dimensions (CDs) related to the manufactured product are provided to the processing logic. The provided CD may be a predicted CD, for example, generated by a machine learning model related to a system that processes the manufactured product. The provided CD may be measured, for example, by an in-situ measurement device during product processing. The provided CD may be measured, for example, by an in-line, integrated, or standalone measurement system between or after processing operations. In some embodiments, the provided product CD may be generated using non-destructive means, for example, using optical measurements, predictions based on sensor data, etc.

[0246] In block 1462, the product CD and the design rules (from block 1461) are provided to the CD profile prediction image generator. This CD profile prediction image generator may have functions similar to those of the CD profile prediction image generator in block 1444 in Figure 14C. This CD profile prediction image generator produces a cartoon and/or image associated with the manufactured device as output. This cartoon and/or image may be a CD profile prediction image. This cartoon and/or image may indicate the provided measurements (e.g., CD) and/or the provided design rules. In block 1464, the CD profile prediction image is provided to the synthetic image generator.

[0247] The synthetic image generator in block 1464 may include components such as a GAN (e.g., one or more generators of a GAN), a conditional GAN, an image-to-image GAN, a neural network, or other machine learning models. This synthetic image generator accepts a CD profile prediction image indicating the structure of a fabricated device and is configured to produce a realistic synthetic microscope image 1466 of the device as output.

[0248] The CD profile prediction image generator may be, or may include, a trained machine learning model or a drawing module that generates a synthetic CD profile prediction image (e.g., a simple line drawing) using one or more rules. The CD profile prediction image generator may be provided with several design rules to use to approximate the shape of the manufactured device. The design rules may be determined based on intended device characteristics, characteristics measured by a measurement system, etc. In some embodiments, the design rules may be an approximation of the dimensions of the device. For example, the two-dimensional cross-section of a CD hole in an SEM image of a device may be roughly trapezoidal or V-shaped, and may deviate from a strict trapezoid or V-shape due to the physical properties of deposition/etching, non-uniformity of deposition/etching, influence from adjacent structures, etc. The CD profile prediction image generator may be configured to approximate the true cross-sectional shape of the CD hole as a V-shape. Similarly, the CD profile prediction image generator may approximate other structures as having a simpler shape when those structures may actually be more complex.

[0249] The CD profile prediction image generator may receive several measurements of the device from a manufacturing system, measurement system, etc. The CD profile prediction image generator may also receive several design rules. The design rules may be approximations of the true structural shape. In some embodiments, the synthetic image generator may transform the approximation into a more realistic shape.

[0250] A CD profile prediction image generator may generate a CD profile prediction image by combining supplied measurement values with design rules. Measurement data may be available for some dimensions of the device and not for others. The CD profile prediction image generator may fill in the missing measurements using design rules. For example, suppose the measurements provide one dimension of a structure, such as its height. By combining this height with known design rules, the CD profile prediction image generator may generate an image that includes all internal structures by, for example, estimating the width (or incorporating another measurement value that specifies it), incorporating the intended design rules, and generating an image of the entire device.
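The combination of measurements and design rules described above can be sketched as a small rule-based drawing routine. It assumes a trapezoidal hole, dimensions expressed in pixels, and design rules that act as default values for any dimension that was not measured; the parameter names (`top_width`, `bottom_width`, `depth`) and the function name are hypothetical.

```python
def draw_cd_profile(measured, design_rules, size=(8, 10)):
    """Rasterize a trapezoidal hole as a binary image (1 = hole, 0 = material)."""
    params = {**design_rules, **measured}  # measured values override design-rule defaults
    rows, cols = size
    top, bottom, depth = params["top_width"], params["bottom_width"], params["depth"]
    centre = cols / 2
    image = []
    for r in range(rows):
        if r >= depth:
            image.append([0] * cols)  # below the hole bottom: solid material
            continue
        frac = r / (depth - 1) if depth > 1 else 0.0
        width = top + (bottom - top) * frac  # linear taper from top to bottom
        image.append([1 if abs(c + 0.5 - centre) < width / 2 else 0
                      for c in range(cols)])
    return image


# Only the depth was measured; both widths fall back to the design rules.
design_rules = {"top_width": 6, "bottom_width": 2, "depth": 6}
cartoon = draw_cd_profile({"depth": 4}, design_rules)
```

The resulting binary image plays the role of the simplified drawing that is passed on to the synthetic image generator.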

[0251] In some embodiments, the CD profile prediction image 1320 in Figure 13 may be provided to a machine learning image generator (e.g., an image-to-image GAN generator). This machine learning model may generate a synthetic microscope image of the manufactured device as output. In some embodiments, measurements not taken during / after processing, measurements not provided to the CD profile prediction image generator, etc., may be calculated from this synthetic image. For example, consecutive width measurements at various locations on the structure may be measured from this synthetic image, and the etching depth may be quantified.
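Deriving additional measurements from the synthetic image can be sketched as follows, assuming the synthetic image has already been segmented into a binary mask in which 1 marks the hole. The segmentation step, the nanometers-per-pixel scale, and the function names are illustrative assumptions.

```python
def width_profile_nm(mask, nm_per_pixel):
    """Hole width at each depth (image row), in nm, from a binary mask (1 = hole)."""
    return [sum(row) * nm_per_pixel for row in mask]


def etch_depth_nm(mask, nm_per_pixel):
    """Etch depth in nm, counted as the number of rows that contain any hole pixel."""
    return sum(1 for row in mask if any(row)) * nm_per_pixel


mask = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
widths = width_profile_nm(mask, nm_per_pixel=2.0)
depth = etch_depth_nm(mask, nm_per_pixel=2.0)
```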

[0252] Figure 15 is a block diagram showing generative adversarial networks in several embodiments.

[0253] In some embodiments, a CD profile 1502 is provided as input to a generator 1504. The generator 1504 generates a generated SEM image 1506. In some embodiments, the generated SEM image 1506 may be provided as input to a discriminator 1508. In some embodiments, a real SEM image 1512 may be provided as input to the discriminator 1508. The discriminator 1508 may classify the data given as input as generated data 1509 or real data 1510.

[0254] In some embodiments, the discriminator 1508 may receive a CD profile 1502 (e.g., a CD profile image), a generated SEM image 1506, and/or a real SEM image 1512 as input. In some embodiments, the discriminator 1508 may receive a CD profile 1502 (e.g., a CD profile image) and a generated SEM image 1506, or a CD profile 1502 and a real SEM image 1512, as input. The CD profile 1502 is provided as input to the generator 1504 in order to preserve the CD profile in the output of the generator 1504 (e.g., a generated SEM image 1506).

[0255] In some embodiments, during training, the output of the discriminator 1508 is provided as input to the generator 1504 via a feedback loop (e.g., in order to provide the generator 1504 with feedback on how to produce a better generated image).
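The pairing of a CD profile with either a generated or a real SEM image can be sketched as a routine that assembles labeled discriminator inputs. The tuple layout, the label convention (1 = real, 0 = generated), and the stand-in generator are assumptions made for illustration; a real generator would be a trained neural network.

```python
def discriminator_batch(cd_profiles, real_images, generator):
    """Build (cd_profile, image, label) triples; label 1 = real, 0 = generated."""
    batch = []
    for profile, real in zip(cd_profiles, real_images):
        batch.append((profile, real, 1))                # CD profile + real SEM image
        batch.append((profile, generator(profile), 0))  # CD profile + generated SEM image
    return batch


def toy_generator(profile):
    """Stand-in generator (hypothetical): doubles each value of the profile."""
    return [2 * value for value in profile]


batch = discriminator_batch([[1, 2]], [[9, 9]], toy_generator)
```

Because every triple carries the CD profile, the discriminator can judge not only whether an image looks real but also whether it is consistent with the profile it is paired with.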

[0256] In some embodiments, the generator is sufficiently trained when the discriminator cannot distinguish between the generated SEM image and the actual SEM image.
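One possible reading of this criterion is that the discriminator's accuracy on a balanced set of real and generated images falls to roughly chance level (50%). The sketch below encodes that reading; the balanced-set assumption, the tolerance value, and the function names are illustrative and do not come from the disclosure.

```python
def discriminator_accuracy(labels, predictions):
    """Fraction of images the discriminator classified correctly."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def generator_sufficiently_trained(labels, predictions, tolerance=0.05):
    """True when discriminator accuracy is within `tolerance` of chance (0.5)."""
    return abs(discriminator_accuracy(labels, predictions) - 0.5) <= tolerance


labels = [1, 1, 0, 0]        # 1 = real SEM image, 0 = generated SEM image
at_chance = [1, 0, 1, 0]     # discriminator is right half the time
confident = [1, 1, 0, 0]     # discriminator is always right
```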

[0257] Figure 16 shows examples of synthetic microscope images from several embodiments.

[0258] In some embodiments, the input image 1602 (e.g., a CD profile prediction image) corresponds to the ground truth SEM image 1604 (e.g., of a cross-sectional CD hole) and the prediction SEM image 1606 (e.g., output by the generator 1504 in Figure 15). In some embodiments, the input to the generator 1504 may be multiple input images 1602, and the generator may output a cross-sectional virtual SEM image of the die. In some embodiments, when the generator outputs a single CD hole SEM image, image processing techniques may be used to combine multiple output images to generate and output a cross-sectional virtual SEM image of one or more dies on the substrate.
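Combining several single-hole output images into one cross-sectional image can be sketched as a side-by-side concatenation. The sketch assumes all images share the same height and are placed left to right without spacing or blending; the disclosure does not specify which image processing technique is used, and the function name is hypothetical.

```python
def stitch_horizontally(images):
    """Concatenate same-height images (lists of pixel rows) side by side."""
    height = len(images[0])
    if any(len(img) != height for img in images):
        raise ValueError("all images must have the same number of rows")
    return [[px for img in images for px in img[r]] for r in range(height)]


hole_a = [[1, 2], [3, 4]]
hole_b = [[5], [6]]
cross_section = stitch_horizontally([hole_a, hole_b])
```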

[0259] Figure 17 is a block diagram showing computer system 1700 according to some embodiments. In some embodiments, computer system 1700 may be connected to other computer systems (for example, via a network such as a local area network (LAN), intranet, extranet, or the Internet). Computer system 1700 may operate as a server or client computer in a client-server environment, or as a peer computer in a peer-to-peer or distributed network environment. Computer system 1700 may be provided by a personal computer (PC), tablet PC, set-top box (STB), personal digital assistant (PDA), mobile phone, web appliance, server, network router, switch or bridge, or any device capable of executing a set of instructions (sequential or otherwise) that specify the actions to be taken by that device. Furthermore, the term “computer” includes any collection of computers that individually or jointly execute a set (or multiple sets) of instructions to perform one or more of the methods described herein.

[0260] In an additional embodiment, the computer system 1700 may include a processing device 1702, volatile memory 1704 (e.g., random access memory (RAM)), non-volatile memory 1706 (e.g., read-only memory (ROM) or electrically erasable programmable ROM (EEPROM)), and a data storage device 1718, which may communicate with each other via a bus 1708.

[0261] The processing device 1702 may be provided by one or more processors, such as a general-purpose processor (e.g., a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a microprocessor that implements other types of instruction sets, or a microprocessor that implements a combination of instruction set types) or a specialized processor (e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or a network processor).

[0262] The computer system 1700 may further include a network interface device 1722 (e.g., coupled to network 1774). The computer system 1700 may also include a video display unit 1710 (e.g., a liquid crystal display), an alphanumeric input device 1712 (e.g., a keyboard), a cursor control device 1714 (e.g., a mouse), and a signal generation device 1720.

[0263] In some embodiments, the data storage device 1718 may include a non-transitory computer-readable storage medium 1724 (e.g., a non-transitory machine-readable medium) which may store instructions 1726 that encode one or more of the methods or functions described herein, including instructions that encode the components of Figure 1 (e.g., the predictive component 114, the corrective action component 122, the model 190, etc.) and instructions for performing the methods described herein.

[0264] Instructions 1726 may also reside entirely or partially in volatile memory 1704 and/or processing device 1702 while instructions 1726 are being executed by computer system 1700, so that volatile memory 1704 and processing device 1702 may also constitute machine-readable storage media.

[0265] In the illustrative examples, computer-readable storage medium 1724 is shown as a single medium, but the term “computer-readable storage medium” includes a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) that store one or more sets of executable instructions. The term “computer-readable storage medium” also includes any tangible medium that can store or encode a set of instructions for a computer to execute, causing that computer to perform one or more of the methods described herein. The term “computer-readable storage medium” includes, but is not limited to, solid-state memory, optical media, and magnetic media.

[0266] The methods, components, and features described herein may be implemented by individual hardware components or integrated into the functions of other hardware components such as application-specific integrated circuits (ASICs), FPGAs, DSPs, or similar devices. Furthermore, the methods, components, and features may be implemented by firmware modules or by functional circuits within hardware devices. Moreover, the methods, components, and features may be implemented by any combination of hardware devices and computer program components, or by computer programs.

[0267] Unless otherwise specified, terms such as “select,” “process,” “receive,” “provide,” “execute,” “decide,” “use,” “train,” “generate,” “measure,” “calculate,” “schedule,” “update,” “remove,” “crop,” “resize,” “smooth,” “filter,” “cluster,” “quantize,” or other similar terms relate to operations and processes performed or carried out by a computer system that operate on data expressed as physical (electronic) quantities in computer system registers and memory, and convert that data into other data similarly expressed as physical quantities in computer system memory or registers, or in other such information storage, transmission, or display devices. Furthermore, terms such as “first,” “second,” “third,” and “fourth” as used herein serve as labels to distinguish between different elements and may not have ordinal meanings based on their numerical designations.

[0268] The examples described herein further relate to apparatus for carrying out the methods described herein. This apparatus may be specifically constructed for carrying out the methods described herein, or it may include a general-purpose computer system selectively programmed by a computer program stored in the computer system. Such a computer program may be stored in a computer-readable tangible storage medium.

[0269] The methods and examples described herein are not inherently related to any particular computer or other device. Various general-purpose systems may be used in accordance with the teachings provided herein, or it may be more convenient to construct a more specialized device to perform the methods described herein and/or each of the individual functions, routines, subroutines, or operations of those methods. Examples of structures for these various systems are provided in the above description.

[0270] The above description is intended to be illustrative and not limiting. While the disclosure has described examples and embodiments for specific illustrative purposes, it should be recognized that the disclosure is not limited to the examples and embodiments described. The scope of the disclosure should be determined by reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

1. A method comprising: processing measurement data of a substrate processed according to a manufacturing process, using a first trained machine learning model, to predict a critical dimension (CD) profile of the substrate; generating a CD profile prediction image based on the predicted CD profile of the substrate; and processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

2. The method according to claim 1, wherein the measurement data includes a profile map of at least one of the films or features on the substrate.

3. The method according to claim 1, wherein the measurement data includes at least one of spectral data or reflectometry data.

4. The method according to claim 1, wherein the first trained machine learning model includes a deep neural network.

5. The method according to claim 1, wherein the second trained machine learning model includes a generative model.

6. The method according to claim 1, wherein the synthetic microscope image includes a virtual scanning electron microscope (VSEM) image of a cross-section of the substrate.

7. The method according to claim 1, further comprising: measuring a feature of the synthetic microscope image; and calculating a dimension of a manufactured device based on the measurement of the feature of the synthetic microscope image.

8. The method according to claim 7, further comprising performing a corrective action in view of the calculated dimension of the manufactured device, wherein the corrective action comprises one or more of: scheduling maintenance; updating a process recipe; or providing an alert to a user.

9. A method comprising: receiving a plurality of scanning electron microscope (SEM) images and a plurality of critical dimension (CD) measurements associated with a substrate; generating a plurality of CD profile images based on the plurality of CD measurements; generating an input dataset including the plurality of SEM images and the plurality of CD profile images; and training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training input and providing the plurality of SEM images to the machine learning model as target output.

10. The method according to claim 9, wherein training the machine learning model includes training a generative adversarial network (GAN).

11. The method according to claim 9, wherein the plurality of SEM images include a plurality of CD hole images, the method further comprising: removing embedded SEM scan information and resolution defects from the plurality of SEM images; determining a top and a bottom of the plurality of CD hole images based on noise reduction and blurring; determining a CD hole separation for the plurality of CD hole images based on a maximum-value search; determining a plurality of individual CD hole crop areas corresponding to the plurality of CD hole images based on the top, the bottom, and the CD hole separation of the plurality of CD hole images; cropping the plurality of CD hole images based on the individual CD hole crop areas; and resizing the plurality of cropped CD hole images.

12. The method according to claim 11, wherein the resizing of the plurality of cropped CD hole images is based on the pixel specification of the corresponding CD measurement among the plurality of CD measurement values.

13. The method according to claim 11, wherein the input dataset includes the plurality of cropped CD hole images.

14. A non-transitory computer-readable storage medium storing instructions that, when executed, cause a processing device to perform operations comprising: processing measurement data of a substrate processed according to a manufacturing process, using a first trained machine learning model, to predict a critical dimension (CD) profile of the substrate; generating a CD profile prediction image based on the predicted CD profile of the substrate; and processing the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

15. The non-transitory computer-readable storage medium according to claim 14, wherein the measurement data includes a profile map of at least one of the films or features on the substrate.

16. The non-transitory computer-readable storage medium according to claim 14, wherein the measurement data includes at least one of spectral data or reflectometry data.

17. The non-transitory computer-readable storage medium according to claim 14, wherein the second trained machine learning model includes a generative model.

18. The non-transitory computer-readable storage medium according to claim 14, wherein the synthetic microscope image includes a virtual scanning electron microscope (VSEM) image of a cross-section of the substrate.

19. A non-transitory computer-readable storage medium storing instructions that, when executed, cause a processing device to perform operations comprising: receiving a plurality of scanning electron microscope (SEM) images and a plurality of critical dimension (CD) measurements associated with a substrate; generating a plurality of CD profile images based on the plurality of CD measurements; generating an input dataset including the plurality of SEM images and the plurality of CD profile images; and training a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training input and providing the plurality of SEM images to the machine learning model as target output.

20. The non-transitory computer-readable storage medium according to claim 19, wherein the plurality of SEM images include a plurality of CD hole images, the operations further comprising: removing embedded SEM scan information and resolution defects from the plurality of SEM images; determining a top and a bottom of the plurality of CD hole images based on noise reduction and blurring; determining a CD hole separation for the plurality of CD hole images based on a maximum-value search; determining a plurality of individual CD hole crop areas corresponding to the plurality of CD hole images based on the top, the bottom, and the CD hole separation of the plurality of CD hole images; cropping the plurality of CD hole images based on the individual CD hole crop areas; and resizing the plurality of cropped CD hole images.

21. A system comprising: a memory; and a processing device coupled to the memory, wherein the processing device is to: process measurement data of a substrate processed according to a manufacturing process, using a first trained machine learning model, to predict a critical dimension (CD) profile of the substrate; generate a CD profile prediction image based on the predicted CD profile of the substrate; and process the CD profile prediction image using a second trained machine learning model to generate a synthetic microscope image associated with the substrate.

22. The system according to claim 21, wherein the processing device is further to: receive a plurality of scanning electron microscope (SEM) images and a plurality of CD measurements associated with the substrate; generate a plurality of CD profile images based on the plurality of CD measurements; generate an input dataset including the plurality of SEM images and the plurality of CD profile images; and train a machine learning model using the input dataset, wherein training the machine learning model includes providing the plurality of CD measurements to the machine learning model as training input and providing the plurality of SEM images to the machine learning model as target output.