Screening device

The screening device addresses the challenge of high-dimensional and biased training data by using machine learning techniques for feature extraction and classification, enabling reliable candidate material selection for applications like solid electrolytes.

JP2026109006A · Pending · Publication Date: 2026-07-01 · UNIV OF TSUKUBA +1

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
UNIV OF TSUKUBA
Filing Date
2024-12-19
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Existing screening devices face challenges in selecting candidate materials for applications like solid electrolytes due to high-dimensional and small-sample training data with significant bias between positive and negative examples, making reliable screening difficult.

Method used

A screening device that uses trained machine-learning models based on high-dimensional, small-sample statistics, employing Difference of Average or Regularized Principal Component Analysis for feature extraction and Geometric Quadratic Discriminant Analysis for classification to select candidate samples.

Benefits of technology

Enables highly reliable screening of materials even with high-dimensional and small-sample training data, effectively reducing dimensionality and bias, ensuring accurate candidate selection.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026109006000001_ABST

Abstract

[Problem] To provide a screening device capable of performing highly reliable screening. [Solution] The screening device according to this disclosure comprises: a classification unit that classifies multiple training data used for machine learning, taken from among multiple data, into a first class and a second class; a feature extraction unit that extracts principal components of the multiple data using a first trained model constructed by learning, for each of the multiple training data, to extract principal components characteristic of the training data belonging to the first class; a classification unit that selects one or more data belonging to the first class from among the multiple data whose principal components have been extracted by the feature extraction unit, using a second trained model constructed by learning to select training data belonging to the first class from among the training data whose principal components have been extracted; and an output unit that outputs one or more samples corresponding to the one or more data selected by the classification unit.

Description

Technical Field

[0001] The present disclosure relates to a screening device.

Background Art

[0002] A screening device is required to select, from a plurality of samples, one or more candidate samples suitable for a given application target; for example, it may select candidate materials with high ionic conductivity for use as the solid electrolyte of an all-solid-state battery. As a related technique, Patent Document 1, for example, discloses a technique related to material search.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] A screening device could select candidate materials using a trained model constructed by machine learning. However, the training data available for screening high-performance crystal structures, for example data obtained from crystal structure models of materials, are poorly suited to machine learning: they are high-dimensional, few in number, and strongly imbalanced between positive and negative examples.

[0005] The present disclosure has been made in view of the above background, and an object thereof is to provide a screening device capable of performing highly reliable screening.

Means for Solving the Problems

[0006] The screening device according to this disclosure screens data of multiple samples and outputs one or more candidate samples to be applied to a target. It comprises: an acquisition unit that acquires, for the multiple samples, multiple datasets each consisting of data and a class label corresponding to an index of that data; a classification unit that classifies multiple training data used for machine learning, taken from among the multiple data, into a first class whose class label indicates an index at or above a predetermined value and a second class whose class label indicates an index below the predetermined value; a feature extraction unit that extracts principal components of the multiple data using a first trained model constructed by learning, for each of the multiple training data, to extract principal components characteristic of the training data belonging to the first class; a classification unit that selects one or more data belonging to the first class from among the multiple data whose principal components have been extracted by the feature extraction unit, using a second trained model constructed by learning to select training data belonging to the first class from among the multiple training data whose principal components have been extracted; and an output unit that outputs one or more samples corresponding to the one or more data selected by the classification unit as candidates for one or more samples to be applied to the target. Because it uses trained models constructed by machine learning based on high-dimensional, small-sample statistics, this screening device can perform highly reliable screening even when the training data are high-dimensional, few in number, and strongly imbalanced between positive and negative examples.

Effects of the Invention

[0007] This disclosure provides a screening device capable of performing highly reliable screening.

Brief Explanation of the Drawings

[0008]
[Figure 1] This figure shows an example of the configuration of the screening device relating to this disclosure.
[Figure 2] This is a conceptual diagram illustrating the processing flow of the screening device related to this disclosure.
[Figure 3] This diagram illustrates the Difference of Average method used for feature extraction.
[Figure 4] This is a conceptual diagram illustrating the evaluation process of the screening device related to this disclosure.
[Figure 5] This figure shows the first evaluation result of the screening device relating to this disclosure.
[Figure 6] This figure shows the second evaluation result of the screening device relating to this disclosure.
[Figure 7] This figure shows the third evaluation result of the screening device relating to this disclosure.

Modes for Carrying Out the Invention

[0009] The following describes specific embodiments to which the present invention is applied, with reference to the drawings. However, the present invention is not limited to the following embodiments. Also, for clarity of explanation, the following description and drawings have been simplified as appropriate.

[0010] <Configuration of the screening device 100> Figure 1 shows an example configuration of the screening device 100 according to this disclosure. Figure 2 is a conceptual diagram illustrating the processing flow of the screening device 100 according to this disclosure.

[0011] The screening device 100 described herein selects, from a group of samples, one or more candidate samples suitable for application to the target. As an example, the screening device 100 selects candidate materials (samples) with high ionic conductivity that are suitable for application to the solid electrolyte (target) of an all-solid-state battery. Because it uses a trained model constructed by machine learning based on high-dimensional, small-sample statistics, the screening device 100 can perform highly reliable screening even when the training data are high-dimensional, few in number, and strongly imbalanced between positive and negative examples. This is explained in detail below.

[0012] As shown in Figure 1, the screening device 100 includes an acquisition unit 101, a classification unit 102, a feature extraction unit 103, a classification unit 104, and an output unit 105.

[0013] The acquisition unit 101 acquires multiple datasets 201 for multiple samples. Each dataset 201 consists of data 201a on the sample and a class label 201b assigned according to an index representing the suitability of the sample for application to the target. As the datasets 201, the acquisition unit 101 may also acquire datasets in which the numbers of positive and negative examples are strongly imbalanced, such as datasets of high-dimensional data used in materials research and development, or datasets of data on defective products generated during product manufacturing.

[0014] Sample data 201a are data on the sample, such as a material that may be applied to the target. For example, if the target is the solid electrolyte of an all-solid-state battery, the sample is a material that can be used in the solid electrolyte. Sample data 201a include, for example, the results of X-ray Diffraction (XRD) analysis of the material's crystal structure, or the results of Radial Density Function (RDF) analysis of the material's crystal structure.
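As an illustration of how an RDF-type feature vector like data 201a might be produced, the sketch below histograms interatomic distances in a periodic cubic cell and normalizes the counts into a radial distribution g(r), one common way to compute such a radial function. It is a minimal sketch under stated assumptions: a cubic cell, Cartesian coordinates already extracted from the crystal structure, and the minimum-image convention. The function name `radial_distribution` and its parameters are hypothetical; the disclosure does not specify how its RDF analysis is computed.

```python
import numpy as np


def radial_distribution(positions, box_length, r_max, n_bins):
    """Return (bin centres, g(r)) for atoms in a periodic cubic cell.

    positions: (N, 3) Cartesian coordinates in the same length unit as box_length.
    The minimum-image convention is used, so r_max should not exceed box_length / 2
    for the normalisation to be meaningful.
    """
    positions = np.asarray(positions, dtype=float)
    n_atoms = len(positions)
    # Pairwise displacement vectors, wrapped to the nearest periodic image.
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)
    # Count each unordered pair once (upper triangle, no self-pairs).
    pair_dist = dist[np.triu_indices(n_atoms, k=1)]
    counts, edges = np.histogram(pair_dist, bins=n_bins, range=(0.0, r_max))
    # Normalise by the pair count expected for a uniform density in each shell.
    density = n_atoms / box_length ** 3
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    expected = 0.5 * n_atoms * density * shell_volume
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, counts / expected
```

Each bin of the returned g(r) becomes one dimension of the sample's data, which is one reason such data are high-dimensional relative to the number of available samples.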

[0015] The class label 201b is information on the class (group) assigned according to an index indicating how suitable the sample is for application to the target, in other words, an index indicating the performance of the sample with respect to the target. The class label 201b may also be the index itself.

[0016] For example, when the target is the solid electrolyte of an all-solid-state battery, the index indicating the suitability of the sample is based on the migration energy of the sample material. The lower the migration energy of the material, the higher the ionic conductivity of the solid electrolyte, so the index indicating the suitability of the material for the solid electrolyte takes a high value. Conversely, the higher the migration energy of the material, the lower the ionic conductivity of the solid electrolyte, so the index takes a low value.
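The link between migration energy and ionic conductivity can be made concrete with the Arrhenius-type relation commonly used for ionic conductors, σT = A·exp(−E_a / (k_B·T)). The disclosure does not state this formula; the sketch assumes two materials share the same prefactor A and the same temperature T, so their conductivity ratio depends only on the difference in migration (activation) energy. The function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k_B in eV/K


def conductivity_ratio(e_low_ev, e_high_ev, temperature_k=300.0):
    """sigma(e_low) / sigma(e_high) under sigma*T = A*exp(-E/(k_B*T)).

    Assumes both materials share the same prefactor A and temperature T,
    so A and T cancel and only the energy difference matters.
    """
    kt = BOLTZMANN_EV_PER_K * temperature_k
    return math.exp((e_high_ev - e_low_ev) / kt)
```

Under these assumptions, at 300 K a migration energy 0.1 eV lower corresponds to roughly a 48-fold higher conductivity, which is why a lower migration energy maps to a higher suitability index.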

[0017] In this example, class label C0 is assigned to a sample whose index is at or above a predetermined value, and class label C1 is assigned to a sample whose index is below the predetermined value. However, the number of class labels is not limited to two and may be three or more. As already described, the class label may also be the index itself.
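The two-class labeling rule can be expressed in a few lines. The sketch is hypothetical: it assumes the index is a higher-is-better suitability score and, for illustration only, derives it from migration energy by negation so that a lower energy barrier gives a higher index; the disclosure does not specify this mapping.

```python
def suitability_from_migration_energy(energies_ev):
    # Hypothetical mapping: negate the migration energy so that a higher
    # index means a lower energy barrier (higher ionic conductivity).
    return [-e for e in energies_ev]


def assign_labels(indices, threshold):
    """Label each sample "C0" if its index is at or above threshold, else "C1"."""
    return ["C0" if value >= threshold else "C1" for value in indices]


# Usage: energies of 0.15, 0.45 and 0.25 eV with a threshold index of -0.3
labels = assign_labels(suitability_from_migration_energy([0.15, 0.45, 0.25]), -0.3)
```

Here the 0.45 eV material is the only one whose index falls below the threshold, so it alone is labeled C1.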

[0018] The classification unit 102 classifies the training data 202 used for machine learning, taken from among the data 201a, into a class (first class) with class label C0, indicating an index at or above the predetermined value, and a class (second class) with class label C1, indicating an index below the predetermined value. Hereinafter, the class with class label C0 is also referred to as class C0, and the class with class label C1 as class C1.

[0019] The feature extraction unit 103 extracts the principal components (feature quantities) of each of the plurality of data 201a. In other words, the feature extraction unit 103 reduces the dimensionality of the plurality of data 201a. In the principal component extraction by the feature extraction unit 103, for example, the method of Difference of Average or Regularized Principal Component Analysis (RPCA) is used.

[0020] Specifically, first, in the learning mode, the feature extraction unit 103 performs learning to extract the principal components characteristic of the training data 202 belonging to the class C0 for each of the plurality of training data 202. Thereby, a first learned model is constructed. After that, in the inference mode (normal operation mode), the feature extraction unit 103 extracts the principal components of the plurality of data 201a using the first learned model. That is, the feature extraction unit 103 reduces the dimensionality of the plurality of data 201a using the first learned model.
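The learning-mode/inference-mode split can be sketched with ordinary PCA fitted only to the class-C0 training data and then applied to every sample. This is a stand-in under stated assumptions: the regularization that distinguishes RPCA is not described in the text and is not reproduced here, and the function names `fit_class_pca` and `transform` are hypothetical.

```python
import numpy as np


def fit_class_pca(X_train, y_train, positive_label, n_components):
    """Learning mode: estimate principal axes from class-C0 training data only."""
    X_pos = np.asarray(X_train, dtype=float)[np.asarray(y_train) == positive_label]
    mean = X_pos.mean(axis=0)
    # Thin SVD of the centred C0 data; rows of vt are orthonormal principal axes.
    # With n << p this avoids forming the p x p covariance matrix.
    _, _, vt = np.linalg.svd(X_pos - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(X, mean, components):
    """Inference mode: project any data onto the learned axes (dimension reduction)."""
    return (np.asarray(X, dtype=float) - mean) @ components.T
```

Because the axes are estimated from class C0 alone, the centred C0 data have rank at most one less than the number of C0 training samples, so only that many components carry information; with a handful of positives, this is the regime high-dimensional, small-sample methods target.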

[0021] FIG. 3 is a diagram for explaining the Difference of Average method used for feature extraction. As shown in FIG. 3, in the Difference of Average method, for example, the average of the data belonging to class C0 is compared with the average of the data belonging to class C1, and the components with a large difference are extracted as the principal components. The number of peaks to extract is determined by cross-validation (CV) based on the F1 score.
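The Difference of Average ranking, together with choosing how many top-ranked components (peaks) to keep by cross-validated F1 score, can be sketched as follows. Assumptions: features are ranked by the absolute difference of class means; a simple nearest-class-mean classifier stands in for the scoring model inside cross-validation; and folds are stratified so each one contains positives. All function names are hypothetical, and the disclosure may pair the selection with a different classifier.

```python
import numpy as np


def difference_of_average_ranking(X, y, positive_label="C0"):
    """Feature indices sorted by |mean over C0 - mean over the rest|, largest first."""
    X = np.asarray(X, dtype=float)
    is_pos = np.asarray(y) == positive_label
    gap = np.abs(X[is_pos].mean(axis=0) - X[~is_pos].mean(axis=0))
    return np.argsort(gap)[::-1]


def f1_score(y_true, y_pred, positive_label="C0"):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive_label) & (y_pred == positive_label))
    fp = np.sum((y_true != positive_label) & (y_pred == positive_label))
    fn = np.sum((y_true == positive_label) & (y_pred != positive_label))
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))


def nearest_mean_predict(X_fit, y_fit, X_new, positive_label="C0", negative_label="C1"):
    # Hypothetical stand-in scorer: assign each row to the closer class mean.
    is_pos = np.asarray(y_fit) == positive_label
    d_pos = np.linalg.norm(X_new - X_fit[is_pos].mean(axis=0), axis=1)
    d_neg = np.linalg.norm(X_new - X_fit[~is_pos].mean(axis=0), axis=1)
    return np.where(d_pos < d_neg, positive_label, negative_label)


def choose_k_by_cv(X, y, candidate_ks, n_folds=3, seed=0):
    """Return (k, mean F1) for the number of top-ranked components that scores best."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    # Stratified folds: deal each class's shuffled indices round-robin.
    folds = [[] for _ in range(n_folds)]
    for label in np.unique(y):
        for i, sample in enumerate(rng.permutation(np.flatnonzero(y == label))):
            folds[i % n_folds].append(sample)
    best_k, best_f1 = None, -1.0
    for k in candidate_ks:
        scores = []
        for f in range(n_folds):
            test_idx = np.array(folds[f])
            train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            # Rank on the training fold only so the held-out fold cannot leak in.
            top = difference_of_average_ranking(X[train_idx], y[train_idx])[:k]
            pred = nearest_mean_predict(X[train_idx][:, top], y[train_idx],
                                        X[test_idx][:, top])
            scores.append(f1_score(y[test_idx], pred))
        mean_f1 = float(np.mean(scores))
        if mean_f1 > best_f1:
            best_k, best_f1 = k, mean_f1
    return best_k, best_f1
```

The F1 score is used rather than accuracy because, with roughly 1% positives, a classifier that labels everything C1 would already reach about 99% accuracy.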

[0022] The classification unit 104 selects one or more data belonging to class C0 from among the data 201a whose principal components have been extracted by the feature extraction unit 103 (that is, the dimensionality-reduced data 201a). For the classification by the classification unit 104, Geometric Quadratic Discriminant Analysis (GQDA), for example, is used.

[0023] Specifically, in the learning mode, the classification unit 104 first learns to select training data belonging to class C0 from among the training data 202 whose principal components have been extracted by the feature extraction unit 103. This constructs a second trained model. Subsequently, in the inference mode, the classification unit 104 uses the second trained model to select data belonging to class C0 from among the data 201a whose principal components have been extracted by the feature extraction unit 103.
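A geometric quadratic discriminant rule can be sketched by treating each class covariance as a scaled identity, (tr S_i / d)·I, so that only class means x̄_i and covariance traces tr S_i need to be estimated; this keeps the rule usable when the number of samples n_i is far smaller than the dimension d. A sample x is assigned to C0 when W(x) = d‖x − x̄₁‖²/tr S₁ − d/n₁ − (d‖x − x̄₂‖²/tr S₂ − d/n₂) + d·log(tr S₁ / tr S₂) < 0. The bias-correction terms d/n_i follow the form we understand from the high-dimensional statistics literature and should be treated as an assumption; the exact GQDA used in the disclosure may differ, and the function names are hypothetical.

```python
import numpy as np


def fit_geometric_qda(Z, y, positive_label="C0"):
    """Learning mode: per-class mean, covariance trace tr(S_i) and size n_i."""
    Z, y = np.asarray(Z, dtype=float), np.asarray(y)
    params = {}
    for key, mask in (("pos", y == positive_label), ("neg", y != positive_label)):
        Zc = Z[mask]
        # tr(S_i) is the sum of unbiased per-feature variances (needs n_i >= 2).
        params[key] = (Zc.mean(axis=0), float(Zc.var(axis=0, ddof=1).sum()), len(Zc))
    return params


def geometric_qda_score(Z, params):
    """W(x) for each row; W < 0 favours the positive class C0."""
    Z = np.asarray(Z, dtype=float)
    d = Z.shape[1]
    (m1, t1, n1), (m2, t2, n2) = params["pos"], params["neg"]
    term1 = d * np.sum((Z - m1) ** 2, axis=1) / t1 - d / n1  # bias-corrected
    term2 = d * np.sum((Z - m2) ** 2, axis=1) / t2 - d / n2
    return term1 - term2 + d * np.log(t1 / t2)


def select_positive(Z, params, positive_label="C0", negative_label="C1"):
    """Inference mode: keep rows whose score favours C0."""
    return np.where(geometric_qda_score(Z, params) < 0, positive_label, negative_label)
```

Replacing the full covariance with its trace avoids inverting a singular d × d sample covariance, which ordinary QDA would require and which does not exist when n_i < d.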

[0024] The output unit 105 outputs the one or more samples corresponding to the data selected by the classification unit 104 as candidates for one or more samples to be applied to the target. The output of the output unit 105 is displayed on a monitor, for example.
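The data flow through units 101 to 105 can be summarized as a small pipeline in which the two trained models are pluggable. This is a hypothetical skeleton: `SampleDataset`, `screen`, and the two toy stand-in models are illustrative names, not part of the disclosure, and the stand-ins are deliberately trivial so the wiring stays visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set


@dataclass
class SampleDataset:
    """Hypothetical container for one dataset 201."""
    sample_id: str
    data: List[float]  # data 201a, e.g. an XRD or RDF feature vector
    label: str         # class label 201b: "C0" or "C1"


def screen(datasets: Sequence[SampleDataset], training_ids: Set[str],
           fit_extractor: Callable, fit_selector: Callable,
           positive_label: str = "C0") -> List[str]:
    # Acquisition (101) is assumed done: `datasets` holds every sample.
    # Classification (102): gather the training data and their class labels.
    train = [d for d in datasets if d.sample_id in training_ids]
    x_train = [d.data for d in train]
    y_train = [d.label for d in train]
    # Feature extraction (103): learn the first model, a data -> features transform.
    extract = fit_extractor(x_train, y_train)
    # Selection (104): learn the second model on the reduced training data.
    select = fit_selector([extract(x) for x in x_train], y_train)
    # Inference over all samples; output (105) the ids predicted as C0.
    return [d.sample_id for d in datasets
            if select(extract(d.data)) == positive_label]


def fit_first_feature(x_train, y_train):
    # Toy stand-in for the first trained model: keep only feature 0.
    return lambda x: x[0]


def fit_midpoint(z_train, y_train):
    # Toy stand-in for the second trained model: threshold halfway between the
    # class means, assuming C0 has the larger mean on the kept feature.
    pos = [z for z, lab in zip(z_train, y_train) if lab == "C0"]
    neg = [z for z, lab in zip(z_train, y_train) if lab != "C0"]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda z: "C0" if z > cut else "C1"
```

Passing the model-fitting functions in as arguments mirrors the text's point that the feature extraction method (Difference of Average or RPCA) and the classification method (GQDA or, for comparison, GSVM) are interchangeable.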

[0025] Thus, by using trained models constructed by machine learning based on high-dimensional, small-sample statistics, the screening device 100 according to this disclosure can perform highly reliable screening even when the training data are high-dimensional, few in number, and strongly imbalanced between positive and negative examples.

[0026] <Evaluation of Screening Device 100> Next, the evaluation of the screening device 100 is explained. Figure 4 is a diagram illustrating the evaluation flow of the screening device 100. In the following, the target is the solid electrolyte of an all-solid-state battery, and the index representing the suitability of the sample for the target (an index representing the performance of the sample for the target) is the migration energy of the sample material.

[0027] As shown in Figure 4, in the evaluation of the screening device 100, the dataset 201 for each sample consists of either the results of XRD analysis of a Crystallographic Information File (CIF) representing the crystal structure of the sample material, or the results of RDF analysis of the CIF of the sample material. Each dataset 201 is also assigned a class label corresponding to the migration energy of the sample material, which is calculated by running a simulation on the CIF of the sample material.

[0028] In the example in Figure 4, of the 941 samples, the 10 samples with the highest index values (the top approximately 1%) are assigned class label C0, and the remaining 931 samples (the bottom approximately 99%) are assigned class label C1. In addition, approximately 70% of the 941 samples are used as training data 202, and the remaining approximately 30% are used as evaluation data (test data).
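The counts in this example follow from simple arithmetic: 941 − 931 = 10 samples in class C0 (about 1.06%), and a 70% split of 941 samples gives roughly 659 training samples. The disclosure does not say whether the split was stratified; the sketch below assumes it was, because with only 10 positives an unstratified random split could leave very few C0 samples in the test set. The function names are hypothetical.

```python
import random


def top_fraction_labels(scores, n_top):
    """Label the n_top highest-scoring samples "C0" and the rest "C1"."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:n_top])
    return ["C0" if i in top else "C1" for i in range(len(scores))]


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps about the same train/test ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)
```

With the default fraction, the 10 C0 samples split 7/3 and the 931 C1 samples split 652/279, giving 659 training and 282 test samples.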

[0029] In the example shown in Figure 4, the Difference of Average method or the RPCA method is used for feature extraction (principal component extraction) by the feature extraction unit 103, and the GQDA method is used for classification (selection) by the classification unit 104. For comparison, the Generalized Support Vector Machine (GSVM) method is also used for classification by the classification unit 104.

[0030] (Comparison of RPCA and Difference of Average) Figure 5 shows the first evaluation results of the screening device 100. Two confusion matrices are shown in Figure 5. Specifically, the right side of Figure 5 shows the confusion matrix when the dataset 201 uses XRD analysis results, the feature extraction unit 103 uses the Difference of Average method for feature extraction, and the classification unit 104 uses the GQDA method for classification (selection). The left side of Figure 5 shows the confusion matrix when the dataset 201 uses XRD analysis results, the feature extraction unit 103 uses the RPCA method for feature extraction, and the classification unit 104 uses the GQDA method for classification. In other words, the two confusion matrices shown in Figure 5 differ in whether the feature extraction unit 103 uses the Difference of Average method or the RPCA method for feature extraction.

[0031] Referring to Figure 5, it can be seen that when the RPCA method is used for feature extraction, the true positive rate is higher compared to when the Difference of Average method is used, making it suitable for screening.
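The comparisons in Figures 5 to 7 are read off confusion matrices. The sketch below, with hypothetical function names, computes the quantities the text refers to, treating C0 as the positive class: the true positive rate (share of C0 samples kept), the share of C1 samples removed, and the number of samples, including false positives, passed on as candidates. The figures' actual counts are not reproduced in the text, so none are assumed here.

```python
def confusion_counts(y_true, y_pred, positive_label="C0"):
    """Return (TP, FN, FP, TN), treating positive_label (C0) as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive_label:
            if pred == positive_label:
                tp += 1  # high-performance sample kept
            else:
                fn += 1  # high-performance sample wrongly removed
        elif pred == positive_label:
            fp += 1  # low-performance sample wrongly kept
        else:
            tn += 1  # low-performance sample correctly removed
    return tp, fn, fp, tn


def _ratio(numerator, denominator):
    # Undefined rates (empty class) are reported as NaN rather than raising.
    return numerator / denominator if denominator else float("nan")


def screening_rates(tp, fn, fp, tn):
    return {
        "true_positive_rate": _ratio(tp, tp + fn),  # share of C0 samples retained
        "c1_removal_rate": _ratio(tn, tn + fp),     # share of C1 samples screened out
        "candidates_output": tp + fp,               # samples passed on as candidates
    }
```

For screening, a high true positive rate matters most, since a missed C0 material is lost for good, while each false positive only adds one more candidate for downstream evaluation.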

[0032] (Comparison of GQDA and GSVM) Figure 6 shows the second evaluation result of the screening device 100. Two confusion matrices are shown in Figure 6. Specifically, the right side of Figure 6 shows the confusion matrix when the GSVM method is used for classification by the classification unit 104. The left side of Figure 6 shows the confusion matrix when the GQDA method is used for classification by the classification unit 104.

[0033] Referring to Figure 6, when the GQDA method is used for classification by the classification unit 104, more of the high-performance class C0 samples are retained and more of the low-performance class C1 samples are removed than when the GSVM method is used, which makes GQDA better suited to screening.

[0034] (Comparison of XRD and RDF) Figure 7 shows the third evaluation result of the screening device 100. Two confusion matrices are shown in Figure 7. Specifically, the right side of Figure 7 shows the confusion matrix when RDF analysis results are used for the dataset 201, the RPCA method is used for feature extraction by the feature extraction unit 103, and the GQDA method is used for classification by the classification unit 104. The left side of Figure 7 shows the confusion matrix when XRD analysis results are used for the dataset 201, the RPCA method is used for feature extraction by the feature extraction unit 103, and the GQDA method is used for classification by the classification unit 104. In other words, the difference between the two confusion matrices shown in Figure 7 is whether RDF analysis results or XRD analysis results are used for the dataset 201.

[0035] Referring to Figure 7, it can be seen that when RDF analysis results are used for dataset 201, the number of false positives is lower compared to when XRD analysis results are used, making it more suitable for screening.

[0036] From the evaluation results in Figures 5 to 7, it can be seen that the screening device 100 is effective when the RDF analysis results are used for the dataset 201, the RPCA method is used for feature extraction by the feature extraction unit 103, and the GQDA method is used for classification (selection) by the classification unit 104.

[0037] As described above, by using trained models constructed by machine learning based on high-dimensional, small-sample statistics, the screening device 100 according to this disclosure can perform highly reliable screening even when the training data are high-dimensional, few in number, and strongly imbalanced between positive and negative examples.

[0038] Furthermore, this disclosure can be realized by having a Central Processing Unit (CPU) execute a computer program to perform part or all of the processing of the screening device 100.

[0039] The program described above includes a set of instructions (or software code) that, when loaded into a computer, causes the computer to perform one or more of the functions described in the embodiments. The program may be stored in a non-transitory computer-readable medium or a tangible storage medium. Examples include, but are not limited to, Random-Access Memory (RAM), Read-Only Memory (ROM), flash memory, Solid-State Drive (SSD) or other memory technologies, CD-ROM, Digital Versatile Disc (DVD), Blu-ray® disc or other optical disc storage, and magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The program may also be transmitted over a transitory computer-readable medium or a communication medium. Examples include, but are not limited to, electrical, optical, acoustic, or other forms of propagated signals.

[0040] Although the present disclosure has been described above with reference to embodiments, the present disclosure is not limited to the embodiments described above. Various modifications to the structure and details of the present disclosure that can be understood by those skilled in the art may be made within the scope of the present disclosure. Furthermore, each embodiment can be combined with other embodiments as appropriate.

Explanation of Symbols

[0041]
100 Screening device
101 Acquisition unit
102 Classification unit
103 Feature extraction unit
104 Classification unit
105 Output unit
201 Dataset
201a Data
201b Class label
202 Training data

Claims

1. A screening device that screens data of multiple samples and outputs one or more candidate samples to be applied to a target, the screening device comprising:
an acquisition unit that acquires multiple datasets, each consisting of data and a class label corresponding to an index representing the suitability of applying the sample to the target;
a classification unit that classifies multiple training data used for machine learning, taken from among the multiple data, into a first class whose class label indicates an index at or above a predetermined value and a second class whose class label indicates an index below the predetermined value;
a feature extraction unit that extracts principal components of the multiple data using a first trained model constructed by learning, for each of the multiple training data, to extract principal components characteristic of the training data belonging to the first class;
a classification unit that selects one or more data belonging to the first class from among the multiple data whose principal components have been extracted, using a second trained model constructed by learning to select training data belonging to the first class from among the multiple training data whose principal components have been extracted by the feature extraction unit; and
an output unit that outputs one or more samples corresponding to the one or more data selected by the classification unit as candidates for one or more samples to be applied to the target.

2. The screening device according to claim 1, wherein the classification unit that selects the one or more data belonging to the first class uses Geometric Quadratic Discriminant Analysis (GQDA).

3. The screening device according to claim 1, wherein the feature extraction unit uses the Difference of Average method or Regularized Principal Component Analysis (RPCA).

4. The screening device according to claim 1, wherein the samples are materials, and the datasets include the results of X-ray Diffraction (XRD) analysis of the crystal structures of the materials or the results of Radial Density Function (RDF) analysis of the crystal structures of the materials.

5. The screening device according to claim 1, wherein the target is a solid electrolyte of an all-solid-state battery, and the index representing the performance of the multiple sample materials for the target is the migration energy of the sample materials, which corresponds to the ionic conductivity of the solid electrolyte.