Apparatus and method for predicting wafer test data
An electronic device with an AI model preprocesses wafer acceptance test (WAT) data into positional embedding vectors and combines them with chip probing (CP) data to predict the complete WAT distribution across the wafer, addressing the limited coverage of sparse test data and improving spatial resolution.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- DIGWISE TECH CORP LTD
- Filing Date
- 2024-12-16
- Publication Date
- 2026-07-02
AI Technical Summary
Existing methods struggle to accurately represent the electrical characteristic distribution of a wafer using limited test data, making it difficult to determine the complete distribution of electrical characteristics across the entire wafer.
An electronic device with a processing circuit and memory uses an artificial intelligence model to preprocess wafer acceptance test (WAT) data into positional embedding vectors and combine them with chip probing (CP) data, predicting the complete distribution of WAT data with an MLP, VAE, or U-Net model and thereby generating a larger number of data points.
The method accurately predicts the electrical characteristic distribution of the entire wafer using a relatively small amount of test data, enhancing spatial resolution and completeness of the test data.
Abstract
Description
Technical Field
[0001] The present invention relates to a method and an apparatus, and particularly to a method for predicting wafer test data and an electronic device related thereto.
Background Art
[0002] The electrical characteristics on a wafer usually have a complex distribution, and in many test situations, it is difficult to completely cover or reflect these characteristics with limited test data. Therefore, how to accurately obtain the electrical characteristic distribution of the entire wafer using limited test data has become an important issue in this field.
Summary of the Invention
[0006] Based on the above, the wafer test data prediction method and electronic device provided by the present invention can predict the electrical characteristic distribution on a wafer using a relatively small amount of test data.
Brief Description of the Drawings
[0007]
[Figure 1] A block diagram of an electronic device according to an embodiment of the present invention.
[Figure 2] A flowchart of the prediction method according to an embodiment of the present invention.
[Figure 3] A positional distribution diagram of test data on a wafer according to an embodiment of the present invention.
[Figure 4] A schematic diagram of an artificial intelligence model and preprocessing according to an embodiment of the present invention.
[Figure 5] A distribution diagram of test data on multiple wafers according to embodiments of the present invention.
[Figure 6] A schematic diagram of an artificial intelligence model and preprocessing according to an embodiment of the present invention.
[Figure 7] A flowchart illustrating the training method for artificial intelligence models.
Modes for Carrying Out the Invention
[0008] Figure 1 is a block diagram of an electronic device 1 according to an embodiment of the present invention. The electronic device 1 in Figure 1 includes a processing circuit 10 and a memory 11. The memory 11 stores instructions 110, and the processing circuit 10 can access the instructions 110 in the memory 11. Although not explicitly shown in Figure 1, the memory 11 also stores the preprocessing and the related weight data of the artificial intelligence model, so the processing circuit 10 can execute the artificial intelligence model to process the received chip probing (CP) data CP1 and wafer acceptance test (WAT) data WAT1 and, by predicting the distribution of test data across the entire wafer, generate predicted WAT data WAT2. Specifically, with the aid of the CP data CP1, which has a relatively large number of data points, the processing circuit 10 can accurately predict the complete distribution of the predicted WAT data WAT2 on the wafer from the relatively sparse WAT data WAT1.
[0009] In some embodiments, the processing circuit 10 may be, for example, a central processing unit (CPU), or other programmable general-purpose or application-specific microcontroller (MCU), microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), graphics processing unit (GPU), neural processing unit (NPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field-programmable gate array (FPGA), any other type of integrated circuit, state machine, advanced RISC machine (ARM) based processor, or other similar components, or a combination of the above-mentioned components.
[0010] In some embodiments, the memory 11 may be, for example, any form of fixed or removable random-access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component, or a combination of the above components, and is used to store multiple modules or application programs that can be executed by the processing circuit 10. In this embodiment, the artificial intelligence model 110 is stored in the memory 11, and its function will be described later.
[0011] In some embodiments, the artificial intelligence model 110 may be a machine learning model, such as an MLP (Multilayer Perceptron), a VAE (Variational Autoencoder), a cVAE (Conditional Variational Autoencoder), a U-Net, or a model of another similar architecture.
[0012] Figure 2 is a flowchart of a prediction method according to an embodiment of the present invention. This method may be applied to the electronic device 1 shown in Figure 1, and when executed, it predicts the distribution of test data across the entire wafer based on the received WAT data WAT1 and CP data CP1 of the same wafer, thereby generating predicted WAT data WAT2.
[0013] The prediction method in Figure 2 includes steps S20 and S21. In step S20, the processing circuit 10 may preprocess the wafer WAT data WAT1 to convert the positional information corresponding to the WAT data WAT1 on the wafer into a positional embedding vector. Specifically, the WAT data WAT1 that is preprocessed is relatively sparse, records average value information on the wafer, and has only a first number of data points. In some embodiments, WAT data is actually obtained by testing only 13 data points on a single wafer. In other embodiments, the first number of data points may be increased or decreased according to different requirements.
[0014] Figure 3 is a positional distribution diagram of test data WAT1 on wafer 3 according to an embodiment of the present invention. Specifically, the wafer 3 shown in Figure 3 is divided into multiple grids based on the size of the unit photomask. During the process of manufacturing wafer 3, processing may be carried out sequentially based on the size of the unit photomask. After wafer 3 is completed, wafer acceptance testing (WAT), chip probing (CP), and final testing (Final Test, FT) are performed.
[0015] In CP testing, each die within the entire wafer 3 may be tested by probing, with the purpose of ensuring that the electrical characteristics of each die within the entire wafer 3 (e.g., verification of functions such as current, voltage, and timing) meet the basic design specifications.
[0016] On wafer 3, scribe lines are provided at the boundaries between each unit photomask. WAT testing may acquire test data by providing test circuits (test keys) on the scribe lines. Specifically, the test circuits include various components such as N-type transistors (NMOS), P-type transistors (PMOS), resistors, capacitors, etc., of different sizes. WAT testing may acquire their electrical parameters (e.g., conduction current, conduction resistance, breakdown voltage, threshold voltage, etc.) by testing these components. Therefore, WAT testing may acquire test data at different locations by providing test circuits at multiple locations on wafer 3 as shown in Figure 3.
[0017] Finally, after the dies on wafer 3 are packaged, an FT test may be performed. The packaged die may be tested using automated test equipment (ATE) and/or a system-level test (SLT). Generally, FT tests are used to verify that the packaged die functions normally and reliably.
[0018] Returning to the WAT test, as described above, since the test circuit can obtain more detailed test data, the WAT test circuit usually has a relatively large size. Therefore, in the formal mass production stage of the wafer, only a small number of test circuits can be provided on wafer 3. In the embodiment of Figure 3, only 13 sets of test circuits are provided on wafer 3, one at each of the 13 marked unit photomask positions D1 to D13.
[0019] Furthermore, since the WAT test can only obtain test data at some of the positions in Figure 3, one of the main purposes of the prediction method in Figure 2 is to determine the complete electrical characteristic distribution of the entire wafer 3 using limited test data.
[0020] Furthermore, in step S20, the processing circuit 10 preprocesses the WAT data WAT1 and inputs its position information into the artificial intelligence model 110 as a position embedding vector. In this way, the artificial intelligence model 110 can capture position-related information in the prediction process and make position-dependent predictions based on it. In some embodiments, the preprocessing may be an absolute position embedding (for example, sinusoidal positional embedding) or may generate the position embedding vector using an MLP.
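The sinusoidal option for the preprocessing of step S20 can be illustrated with a minimal sketch. The source does not give the embedding dimension, the frequency base, or the coordinate system, so `dim=8`, `base=10000.0`, the helper names, and the grid coordinates of the sites are all assumptions chosen for illustration only.

```python
import math

def sinusoidal_embedding(value, dim=8, base=10000.0):
    """Encode one scalar coordinate as `dim` interleaved sin/cos features."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically, starting from 1.
        freq = 1.0 / (base ** (2 * i / dim))
        features.append(math.sin(value * freq))
        features.append(math.cos(value * freq))
    return features

def embed_site(x, y, dim=8):
    """Concatenate the embeddings of the x and y grid indices of a WAT site."""
    return sinusoidal_embedding(x, dim) + sinusoidal_embedding(y, dim)

# Hypothetical grid coordinates for three of the 13 WAT sites (not from the source).
wat_sites = {"D1": (3, 0), "D7": (3, 3), "D13": (3, 6)}
embeddings = {name: embed_site(x, y) for name, (x, y) in wat_sites.items()}
```

Each site is mapped to a fixed-length vector whose entries lie between -1 and 1, so nearby sites receive similar vectors. An MLP-based preprocessing, the other option named above, would instead learn this mapping during training.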
[0021] Subsequently, in step S21, the artificial intelligence model 110 receives the CP data CP1 and the position embedding vector and performs calculations to predict the WAT data WAT2 of the entire wafer. Specifically, the artificial intelligence model combines the position embedding vector (derived from the test data WAT1 of a small number of sample points) with the CP data CP1 (for example, output impedance RO or leakage current SIDD), which has a relatively large number of data points and can fully characterize the uniformity of the wafer. Thus, the artificial intelligence model can predict the complete test data distribution across the entire wafer 3 and generate the corresponding predicted WAT data WAT2. In other words, the second number of data points in the predicted WAT data WAT2 is greater than the first number of data points in the WAT data WAT1, meaning it has a higher spatial resolution. In some embodiments, the CP data CP1 may have data points for the full map of the wafer, and the generated predicted WAT data WAT2 may have the same number of data points as the CP data CP1. In some embodiments, the artificial intelligence model may be an MLP, VAE, U-Net model, etc.
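The data flow of step S21 can be sketched as a small MLP forward pass evaluated once per die. This is only a sketch under stated assumptions: the weights are random placeholders rather than trained values, the 5x5 grid, the two CP readings per die, the `position_features` stand-in, and the use of the wafer-level WAT mean as an extra input are all hypothetical, since the source does not specify how the sparse WAT1 values enter the model.

```python
import math
import random

def linear(inputs, weights, biases):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a trained model would load real ones."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def position_features(x, y):
    """Stand-in positional embedding (hypothetical): sin/cos of each grid index."""
    return [math.sin(x), math.cos(x), math.sin(y), math.cos(y)]

def predict_wat(cp_map, wat1, layers):
    """Return one predicted WAT value for every die that has CP data."""
    wat_mean = sum(wat1.values()) / len(wat1)  # assumed wafer-level conditioning
    predicted = {}
    for (x, y), cp_values in cp_map.items():
        hidden = list(cp_values) + position_features(x, y) + [wat_mean]
        for index, (weights, biases) in enumerate(layers):
            hidden = linear(hidden, weights, biases)
            if index < len(layers) - 1:
                hidden = relu(hidden)
        predicted[(x, y)] = hidden[0]
    return predicted

rng = random.Random(0)
# Hypothetical full-map CP data: two readings per die on a 5x5 grid.
cp_map = {(x, y): (rng.gauss(1.0, 0.1), rng.gauss(0.5, 0.05))
          for x in range(5) for y in range(5)}
# Hypothetical sparse WAT1 measurements at five sites.
wat1 = {(0, 2): 0.41, (2, 0): 0.39, (2, 2): 0.35, (2, 4): 0.40, (4, 2): 0.42}
layers = [init_layer(7, 8, rng), init_layer(8, 1, rng)]
wat2 = predict_wat(cp_map, wat1, layers)
```

Here `wat2` holds 25 values against 5 measured ones, which mirrors the statement that the second number of data points exceeds the first. The predicted values themselves are meaningless until the weights are trained.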
[0022] Figure 4 is a schematic diagram of an artificial intelligence model 41 and preprocessing 40 according to an embodiment of the present invention. As shown in the figure, Figure 4 includes preprocessing 40 and artificial intelligence model 41. Test data WAT1 may be sent to preprocessing 40. After calculation, preprocessing 40 can represent the position of the test data WAT1 with a position embedding vector.
[0023] In this embodiment, the artificial intelligence model 41 is a multilayer perceptron model. Although not clearly shown in the figure, the position embedding vectors are provided to several residual network (ResNet) units, and the artificial intelligence model 41 combines the CP data CP1 and WAT data WAT1, and performs calculations based on information such as the mean value, positional relationship, and order to generate predicted WAT data WAT2.
[0024] Figure 5 is a distribution diagram of test data on multiple wafers according to embodiments of the present invention. Figure 5 illustrates the distribution diagrams of test data on three wafers. As shown, the test data on these three wafers generally exhibits a donut-shaped distribution tendency, with the data protruding around the periphery of the wafer and concave in the center. In some embodiments, the test data on the wafer may have a distribution structure other than a donut shape, such as a Mexican hat shape (higher in the middle, lower around the periphery), a bowl shape (lower in the middle, higher around the periphery), or other annular or radial patterns.
[0025] Figure 6 is a schematic diagram of an artificial intelligence model 61 and preprocessing 60 according to an embodiment of the present invention. In this embodiment, the preprocessing 60 is, for example, absolute position embedding or MLP, and the artificial intelligence model 61 is, for example, VAE, cVAE, U-Net, MLP, etc. In the illustrated embodiment, the invention is described using an implementation in which the artificial intelligence model 61 is a U-Net, but the present invention is not limited thereto. The U-Net designated as the artificial intelligence model 61 includes an input / output unit, a 2D convolutional unit, a residual network unit, an attention unit, a linear attention unit, and an up / downsampling unit.
[0026] As shown in Figure 6, test data WAT1 may be provided to preprocessing 60. Preprocessing 60 may preprocess the positional information of test data WAT1 to convert it into positional embedding vectors and provide them to the residual network unit of artificial intelligence model 61. In this embodiment, the left side of artificial intelligence model 61 is an encoder, and the right side is a corresponding and symmetrical decoder. Artificial intelligence model 61 may receive CP data CP1 and perform calculations based on the positional embedding vectors. As the calculation process progresses, the encoder of artificial intelligence model 61 may extract features by gradually reducing the size of the CP data CP1, and the decoder may gradually enhance the features in the process of restoring the resolution to generate predicted WAT data WAT2. Throughout the encoding and decoding process, the U-Net may be further provided with residual paths having several skip connections, which are used to avoid gradient vanishing due to an excessively deep model structure by supporting the transmission of information through direct connections. Furthermore, in the process in which the entire artificial intelligence model 61 performs calculations, the preprocessing generates position embedding vectors by encoding the position information in the WAT data WAT1 and provides them to multiple residual paths of the artificial intelligence model 61, enabling the artificial intelligence model 61 to be constructed based on the WAT data WAT1.
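The resolution flow described above (the encoder reduces the size of the CP data, the decoder restores it, and a skip connection carries full-resolution information across) can be shown with a deliberately simplified sketch. It is not a U-Net: the learned convolution, attention, and residual units are replaced by fixed 2x2 averaging and cell repetition, the function names are hypothetical, and the grid dimensions are assumed to be even.

```python
def downsample(grid):
    """Halve resolution by averaging each 2x2 block (stand-in for an encoder stage)."""
    return [[(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

def upsample(grid):
    """Double resolution by repeating each cell (stand-in for a decoder stage)."""
    rows = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        rows.append(wide)
        rows.append(list(wide))
    return rows

def add(a, b):
    """Element-wise sum of two grids of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_decode(cp_grid):
    skip = cp_grid                    # full-resolution features kept for the skip path
    bottleneck = downsample(cp_grid)  # coarse features at half resolution
    restored = upsample(bottleneck)   # back to the input resolution
    return add(restored, skip)        # skip connection merges coarse and fine detail

# Hypothetical 4x4 CP map with one value per die.
cp_grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
output = encode_decode(cp_grid)
```

The output has the same shape as the input, and each cell combines its own value with the average of its 2x2 neighbourhood. In a real U-Net the same direct path also gives gradients a short route back through the network.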
[0027] Figure 7 is a flowchart of a method for training an artificial intelligence model. The training method in Figure 7 may be used, for example, to train the artificial intelligence model 110 stored in memory 11 in Figure 1, the artificial intelligence model 41 in Figure 4, or the artificial intelligence model 61 in Figure 6. The training method in Figure 7 may be executed, for example, by the electronic device 1 in Figure 1, or on another electronic device with similar computing capabilities to obtain a trained artificial intelligence model, which may then be stored in the electronic device 1 in Figure 1.
[0028] The training method shown in Figure 7 includes steps S70 to S76. In step S70, a training dataset and a validation dataset may be obtained. Specifically, the CP data, the training dataset, and the validation dataset are test data from the same wafer. To perform training and validation, tests may first be performed at different locations on the same wafer to obtain a relatively large WAT measurement dataset. This measurement dataset may then be divided into a training dataset and a validation dataset. For example, the measurement dataset may be divided between the training dataset and the validation dataset in a 2:8 ratio. Alternatively, the measurement data of 13 data points in the measurement dataset may be used as the training dataset, and the measurement data of 78 data points may be used as the validation dataset. In some embodiments, the training dataset and the validation dataset may partially overlap or not overlap at all. In other embodiments, the number of data points in the training dataset and the validation dataset may be adjusted according to different design requirements.
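The 13-point and 78-point example of step S70 can be sketched as a simple non-overlapping split. The random selection and the `split_measurements` helper are assumptions. The source does not say how the training points are chosen, and in practice they might instead be fixed to the locations used in mass production.

```python
import random

def split_measurements(sites, n_train, seed=0):
    """Shuffle measurement sites and split them into training and validation sets."""
    if not 0 < n_train < len(sites):
        raise ValueError("n_train must leave at least one site for validation")
    shuffled = list(sites)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical dense WAT measurement set: 91 sites identified by index.
measurement_sites = list(range(91))
train_sites, validation_sites = split_measurements(measurement_sites, n_train=13)
```

With 91 measured sites this yields 13 training sites and 78 validation sites. A 2:8 split of the same set would use `n_train=round(0.2 * len(measurement_sites))` instead.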
[0029] In step S71, the CP data and training dataset from the same wafer are provided to the artificial intelligence model for training. Although not clearly shown in Figure 7, in this embodiment, the training dataset may also undergo preprocessing to encode the positional information of the training dataset into positional embedding vectors and provide them to the artificial intelligence model. Therefore, the artificial intelligence model may make predictions based on the training dataset and the corresponding positional embedding vectors, and generate predicted WAT data in step S72. Since both the preprocessing and the artificial intelligence model may be MLPs, both the preprocessing and the artificial intelligence model may be trained simultaneously during the training process.
[0030] Next, in step S73, the difference between the predicted WAT data and the validation dataset may be compared. Generally, in this step, the difference between the predicted WAT data and the validation data can be expressed and quantified using a loss function, and based on this, it may be determined whether the predicted WAT data converges to the validation dataset or to a predetermined range.
[0031] If, in step S74, it is determined that the predicted data does not converge to the validation data, or if the discrepancy between the predicted data and the validation dataset exceeds a predetermined range, it may be determined that the training of the artificial intelligence model on that data is incomplete, and the process proceeds to step S75.
[0032] In step S75, the artificial intelligence model may be adjusted by modifying parameters or weights in the artificial intelligence model based on the discrepancy between the predicted data and the validation data, and then trained again based on the training dataset and the validation dataset.
[0033] Conversely, if the predicted data converges to the validation data, or if the difference between the predicted data and the validation data is within a predetermined range, it can be determined that training of the artificial intelligence model on that data is already complete, and the process proceeds to step S76.
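The loop formed by steps S71 to S76 can be sketched with a toy model. Several things here are assumptions: the artificial intelligence model is replaced by a two-parameter linear map from one CP value to one WAT value, the data are synthetic, and the weights are updated by gradient descent on the training data while the validation loss serves only as the convergence test. The source says only that the parameters are adjusted based on the discrepancy.

```python
def predict(weights, cp_value):
    """Toy stand-in for the AI model: a linear map from CP value to WAT value."""
    return weights[0] * cp_value + weights[1]

def mse(weights, samples):
    """Mean squared error between predictions and measured WAT values."""
    return sum((predict(weights, cp) - wat) ** 2 for cp, wat in samples) / len(samples)

def train(train_set, validation_set, tolerance=1e-4, lr=0.05, max_rounds=10000):
    weights = [0.0, 0.0]
    n = len(train_set)
    for round_index in range(max_rounds):
        # S72 to S74: predict, quantify the gap to the validation data, test convergence.
        loss = mse(weights, validation_set)
        if loss <= tolerance:
            return weights, loss, round_index  # S76: training complete
        # S75: adjust the weights with one gradient-descent step on the training data.
        grad_w = sum(2 * (predict(weights, cp) - wat) * cp for cp, wat in train_set) / n
        grad_b = sum(2 * (predict(weights, cp) - wat) for cp, wat in train_set) / n
        weights = [weights[0] - lr * grad_w, weights[1] - lr * grad_b]
    return weights, mse(weights, validation_set), max_rounds

# Synthetic (CP value, WAT value) pairs following WAT = 2 * CP + 1.
train_set = [(cp, 2.0 * cp + 1.0) for cp in (0.0, 0.5, 1.0, 1.5, 2.0)]
validation_set = [(cp, 2.0 * cp + 1.0) for cp in (0.25, 0.75, 1.25, 1.75)]
weights, final_loss, rounds = train(train_set, validation_set)
```

The `tolerance` argument plays the role of the predetermined range, and `max_rounds` is an added safeguard so that the loop stops even if convergence is never reached.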
[0034] In some embodiments, after the training is completed, the next set of measurement data may be received, and the training of the artificial intelligence model may continue until all the measurement data has been input into the artificial intelligence model and the training is complete. In some embodiments, the training method shown in Figure 7 may be applied to retrain the artificial intelligence model, generating more accurate predicted WAT data by dynamically adjusting the weight data of the artificial intelligence model for each wafer and each production lot, and adapting to changes in the characteristics of different wafer production lots. In some embodiments, the circuit may be adaptively fine-tuned based on predicted WAT data to overcome imperfections in the manufacturing process, improve yield, and improve circuit performance.
[0035] In summary, the wafer test data prediction method and electronic device of the present invention can appropriately represent the positional information of the measurement data as a positional embedding vector by preprocessing the measurement data. In this way, the artificial intelligence model can generate prediction data of the measurement data distribution across the entire wafer based on a relatively small amount of measurement data, given that positional information is available.
Industrial Applicability
[0036] The wafer test data prediction method and electronic device of the present invention can be used in the semiconductor verification field to determine the chip yield in a wafer, and can be applied in the circuit design field to adaptively adjust circuit parameters.
Explanation of Symbols
[0037]
1: Electronic device
3: Wafer
10: Processing circuit
11: Memory
40, 60: Preprocessing
41, 61: Artificial intelligence model
CP1: CP data
D1 to D13: Position
WAT1: WAT data
WAT2: Predicted WAT data
S20, S21, S70 to S76: Step
Claims
1. A method for predicting wafer test data, comprising: receiving WAT data collected by performing a wafer acceptance test (WAT) on a wafer, and preprocessing the WAT data to convert a plurality of pieces of positional information of the WAT data within the wafer into a plurality of positional embedding vectors, wherein the WAT data has a first number of data points, and the preprocessing is either a sinusoidal positional embedding model or a multilayer perceptron model; and receiving CP data collected by performing chip probing (CP) on the wafer, inputting the CP data and the plurality of positional embedding vectors into an artificial intelligence model, and causing the artificial intelligence model to generate predicted WAT data for the wafer, wherein the CP data and the predicted WAT data have a second number of data points greater than the first number, and the artificial intelligence model is a multilayer perceptron model.
2. The method for predicting wafer test data according to claim 1, wherein the CP data and the predicted WAT data have data points for the full map of the wafer.
3. The method for predicting wafer test data according to claim 1, wherein the plurality of positional embedding vectors are provided to a plurality of residual network (ResNet) units of the artificial intelligence model.
4. The method for predicting wafer test data according to claim 3, wherein the multilayer perceptron model is one of a VAE (Variational Autoencoder) model, a cVAE (Conditional Variational Autoencoder) model, or a U-Net model.
5. The method for predicting wafer test data according to claim 1, wherein the wafer is a first wafer, the CP data, the WAT data, and the predicted WAT data of the first wafer are, respectively, first CP data, first WAT data, and first predicted WAT data, and the method further comprises training the preprocessing and the artificial intelligence model by: dividing second WAT data of a second wafer into training WAT data and validation WAT data; inputting the training WAT data of the second wafer to the preprocessing, inputting second CP data of the second wafer to the artificial intelligence model, and causing the artificial intelligence model to generate second predicted WAT data; and adjusting the preprocessing and the artificial intelligence model by comparing the second predicted WAT data with the validation WAT data and calculating a loss function based on the comparison result.
6. The method for predicting wafer test data according to claim 1, wherein the WAT data records average value information of the wafer, and parameters of the artificial intelligence model are dynamically adjusted based on the predicted WAT data.
7. An electronic device for predicting wafer data, comprising: a memory storing instructions; and a processing circuit coupled to the memory, wherein the processing circuit executes an artificial intelligence model by accessing the instructions, and the processing circuit is configured to perform: receiving WAT data collected by performing a wafer acceptance test (WAT) on a wafer, and preprocessing the WAT data to convert a plurality of pieces of positional information of the WAT data within the wafer into a plurality of positional embedding vectors, wherein the WAT data has a first number of data points, and the preprocessing is either a sinusoidal positional embedding model or a multilayer perceptron model; and receiving CP data collected by performing chip probing (CP) on the wafer, inputting the CP data and the plurality of positional embedding vectors into the artificial intelligence model, and causing the artificial intelligence model to generate predicted WAT data for the wafer, wherein the CP data and the predicted WAT data have a second number of data points greater than the first number, and the artificial intelligence model is a multilayer perceptron model.
8. The electronic device according to claim 7, wherein the CP data and the predicted WAT data have data points for the full map of the wafer.
9. The electronic device according to claim 7, wherein the plurality of positional embedding vectors are provided to a plurality of residual network (ResNet) units of the artificial intelligence model.
10. The electronic device according to claim 9, wherein the multilayer perceptron model is one of a VAE (Variational Autoencoder) model, a cVAE (Conditional Variational Autoencoder) model, or a U-Net model.
11. The electronic device according to claim 7, wherein the wafer is a first wafer, the CP data, the WAT data, and the predicted WAT data of the first wafer are, respectively, first CP data, first WAT data, and first predicted WAT data, and the processing circuit is further configured to train the preprocessing and the artificial intelligence model by: dividing second WAT data of a second wafer into training WAT data and validation WAT data; inputting the training WAT data of the second wafer to the preprocessing, inputting second CP data of the second wafer to the artificial intelligence model, and causing the artificial intelligence model to generate second predicted WAT data; and adjusting the preprocessing and the artificial intelligence model by comparing the second predicted WAT data with the validation WAT data and calculating a loss function based on the comparison result.