Information processing systems, information processing methods, and programs

The information processing system addresses variability in material design by using rule-based workflows and AI integration to enhance analysis support, ensuring data confidentiality and improving the consistency and accuracy of material design outcomes.

JP7875642B1 · Active · Publication Date: 2026-06-18 · データケミカル株式会社

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
データケミカル株式会社
Filing Date
2025-12-05
Publication Date
2026-06-18

Abstract

To provide an information processing system and the like that can effectively support analysis, including material design. [Solution] The information processing system 1 includes a cloud service server 10 that provides cloud services related to analysis including material design, molecular design, and process design, and allows a user to select a predetermined workflow; a first generation unit 21 that controls the selected predetermined workflow to operate the cloud service server 10 and generates analysis results and comments as first generation results in a rule-based manner; and a second generation unit 22 that, based on the first generation results, communicates with an external generation AI 30 connected via a network and generates a summary, insights / interpretation comments / recommendation comments, and a report related to the analysis as second generation results in a large language model (LLM) layer.

Description

【Technical Field】 【0001】 The present disclosure relates to an information processing system, an information processing method, and a program, and particularly to an information processing system, an information processing method, and a program for assisting analysis including material design. 【Background Art】 【0002】 Conventionally, for example, with the problem of providing a material design device that can derive an optimal solution of design conditions satisfying desired material properties in a short time, a design condition setting unit that sets a specified range of design conditions for a material to be designed, a comprehensive prediction point generation unit that generates a plurality of comprehensive prediction points within the specified range set by the design condition setting unit, a data set in which material property values calculated by inputting the comprehensive prediction points generated by the comprehensive prediction point generation unit into a learned model are associated with each point of the comprehensive prediction points, a required property setting unit that sets a specified range of required properties of the material to be designed, and a design condition extraction unit that extracts a data set satisfying the required properties set by the required property setting unit from the design condition - material property table have been disclosed (see Patent Document 1). 【Prior Art Documents】 【Patent Documents】 【0003】 【Patent Document 1】 International Publication No. 2020 / 090848 【Summary of the Invention】 【Problems to be Solved by the Invention】 【0004】 In the background technology described above, the design condition setting unit, comprehensive prediction point generation unit, trained model, and design condition-material property table are positioned as the forward problem analysis unit, while the required property setting unit and design condition extraction unit are positioned as the inverse problem analysis unit. 
The design conditions that satisfy the required properties displayed by the design condition extraction unit are to be adjusted by the material designer based on their experience, and this collaboration between machine learning prediction and the material designer's experience is emphasized as an effective feature. However, this poses a problem for the material design device: the analysis process and its results may vary with the material designer's level of experience. 【0005】 Therefore, in one aspect, the present invention aims to provide an information processing system, etc., that can effectively support analysis, including material design. [Means for Solving the Problem] 【0006】 In one aspect, there is provided an information processing system comprising: a cloud service server that provides cloud services related to analysis, including material design, molecular design, and process design, and allows a user to select a predetermined workflow; and an analysis support agent server that supports the analysis, the analysis support agent server comprising: a first generation unit that controls the selected predetermined workflow to operate the cloud service server and generates analysis results and comments as first generation results in a rule-based manner; and a second generation unit that, based on the first generation results, communicates with an external generation AI connected via a network and generates a summary, insights / interpretation comments / recommendation comments, and a report related to the analysis as second generation results in a large language model (LLM) layer, wherein the second generation unit masks the feature names related to the analysis when communicating with the generation AI, and removes the masking from the feature names when generating the report for the user. 
[Effects of the Invention] 【0007】 In one aspect, the present invention provides an information processing system, etc., that can effectively support analysis, including material design. [Brief Description of the Drawings] 【0008】 [Figure 1] This figure shows an example configuration of the information processing system according to this embodiment. [Figure 2] This is a diagram (part 1) showing an example of a user screen (input) in the user interface unit. [Figure 3] This is a diagram (part 2-1) showing an example of a user screen (input) in the user interface unit. [Figure 4] This is a diagram (part 2-2) showing an example of a user screen (input) in the user interface unit. [Figure 5] This figure shows an example of a user screen (output) in the user interface unit. [Figure 6] This figure shows an example of training data. [Figure 7] This is a scatter plot showing an example of training data. [Figure 8] This diagram shows an example of the operation of the information processing system of this embodiment in flowchart form. [Modes for Carrying Out the Invention] 【0009】 The embodiments will be described in detail below with reference to the attached drawings. 【0010】 (Overall configuration of Information Processing System 1) Figure 1 shows an example configuration of the information processing system 1 according to this embodiment. The information processing system 1 comprises a cloud service server 10 and an analysis support agent server 20. The information processing system 1 may further comprise a database server 30. The information processing system 1 communicates with the generation AI 40 via the network using the analysis support agent server 20. The user communicates with the information processing system 1 via the network using a user terminal 50 and utilizes the cloud services provided by the information processing system 1. 
【0011】 The cloud service server 10 provides users with data analysis and machine learning cloud services in the chemical field for various materials, particularly chemical materials. Data analysis may include material design, molecular design, and process design. This disclosure mainly provides examples of material design, but supplementary information on molecular design and process design will be provided as appropriate. Chemical materials include reagents used in chemical analysis, experiments, research and development, and testing, as well as industrial chemicals and industrial raw materials used in the manufacture of various products. Furthermore, chemical materials include pharmaceuticals and medicines used in the fields of biochemistry and medicine. This disclosure is applicable to any of these chemical materials and is not limited to any particular chemical material. 【0012】 The cloud service server 10 is a server that provides cloud services related to analysis, including material design, and includes a user interface unit 11 and a processing unit 13. 【0013】 The user interface unit 11 includes an input unit 111, which includes a workflow selection unit 111a and various setting input units 111b, and an output unit 112. The workflow selection unit 111a accepts the selection of a workflow related to the analysis from the user, and the various setting input units 111b accept inputs related to various settings necessary for the analysis from the user, while the output unit 112 outputs the analysis results. The processing unit 13 processes one or more tasks that constitute the selected predetermined workflow and includes a data verification / visualization unit 131, a data preprocessing unit 132, a feature design unit 133, a model optimization / evaluation unit 134, a prediction data creation unit 135, and a prediction / search unit 136. 
【0014】 Each part of the processing unit 13 may process tasks such as the following. The data verification / visualization unit 131 may process tasks such as box plots, correlation coefficients, scatter plots, histograms, basic statistics, scatter plot matrices, dimensionality reduction, clustering, and Pareto optimal solution search. The data preprocessing unit 132 may process tasks such as missing value imputation, dummy variable creation (string to numerical conversion), and spectral smoothing. The feature design unit 133 may process tasks such as feature transformation (creating new variables), feature selection (deleting unnecessary variables), and mixture calculation (creating new variables). The model optimization / evaluation unit 134 may process tasks such as model optimization (when the target variable is numerical), classification (when the target variable is a string), variable importance calculation, and anomaly factor analysis. The prediction data creation unit 135 may process tasks for creating prediction data related to desired matters, including virtual sample generation. The prediction / search unit 136 may process tasks such as regression analysis and Bayesian optimization. Furthermore, the data verification / visualization unit 131, data preprocessing unit 132, feature design unit 133, model optimization / evaluation unit 134, and prediction / search unit 136 process tasks common to material design, molecular design, and process design, while the prediction data creation unit 135 processes tasks as needed. 【0015】 In material design, tasks are processed by the data verification / visualization unit 131, data preprocessing unit 132, feature design unit 133, model optimization / evaluation unit 134, prediction data creation unit 135, and prediction / search unit 136. 
For example, when a predetermined workflow is the first workflow related to the material design described later, the first workflow mainly includes tasks related to the correlation coefficient calculation unit and scatter diagram drawing unit in the data confirmation / visualization unit 131, the model optimization / evaluation unit 134, and the prediction / search unit 136. Based on the control of the selected workflow by the analysis support agent server 20, the processing unit 13 will execute the correlation coefficient calculation process by the correlation coefficient calculation unit in the data confirmation / visualization unit 131, the scatter diagram drawing process by the scatter diagram drawing unit, the model optimization process by the model optimization / evaluation unit 134, and the prediction process by the prediction / search unit 136. In addition, tasks related to other units may be processed as necessary. For example, the prediction data creation unit 135 may prepare table data including material properties and experimental conditions as usage data and create prediction data as virtual samples (numerical data satisfying constraints). 【0016】 In molecular design, table data including material properties and molecular structures is prepared as usage data, and tasks are processed by the data confirmation / visualization unit 131, data preprocessing unit 132, feature quantity design unit 133, model optimization / evaluation unit 134, prediction data creation unit 135, and prediction / search unit 136. In the data preprocessing unit 132, descriptor calculation (molecular structure data (SMILES) → numerical conversion) is performed, and in the prediction data creation unit 135, structure generation (creation of molecular structure data by combining fragments) and reagent database search (search / extraction of molecular structure data from the DB) are performed. 
【0017】 In process design, time-series data including sensor values obtained from equipment and devices is prepared as usage data, and tasks are processed by the data confirmation / visualization unit 131, data preprocessing unit 132, feature quantity design unit 133, model optimization / evaluation unit 134, and prediction / search unit 136. In the model optimization / evaluation unit 134, construction of an anomaly detection model, construction of a soft sensor model, etc. are performed, and in the prediction / search unit 136, anomaly detection, soft sensor prediction, etc. are performed. 【0018】 The analysis support agent server 20 is a server that provides an agent service for supporting analysis including material design, and includes a first generation unit 21 and a second generation unit 22. 【0019】 The first generation unit 21 controls the selected workflow and operates each part of the processing unit 13 of the cloud service server 10 (data confirmation / visualization unit 131, data preprocessing unit 132, feature quantity design unit 133, model optimization / evaluation unit 134, prediction data creation unit 135, prediction / search unit 136) based on rules, that is, based on a series of rules and conditions preset by experts related to material design. Further, the first generation unit 21 aggregates and generates the analysis results obtained by each part of the processing unit 13 (data confirmation / visualization unit 131, data preprocessing unit 132, feature quantity design unit 133, model optimization / evaluation unit 134, prediction data creation unit 135, prediction / search unit 136), and generates an interpretation comment (rule-based) (also referred to as the first generation result). The second generation unit 22 includes a summary generation unit 221, a comment generation unit 222, and a report generation unit 223. 
The summary generation unit 221 generates a summary of the first generation result, the comment generation unit 222 gives suggestions, interpretation comments, and recommended comments for the first generation result, and the report generation unit 223 generates a report in the large language model (LLM; Large Language Models) layer (also referred to as the second generation result). Here, the purpose of generating rule-based interpretation comments, etc. in the first generation unit 21 includes complementing domain knowledge that cannot be handled by the external generation AI 40 described later, improving the accuracy of the output of the generation AI 40, and reducing the risk of hallucination (illusion) of the output of the generation AI 40. 【0020】 Here, the various setting input units 111b of the user interface unit 11 receive input for the training data and prediction data to be analyzed, but the feature names of these data are masked in the second generation unit 22 of the analysis support agent server 20. This is done considering that feature names are often confidential to the user. As will be described later, the second generation unit 22 utilizes an external generation AI 40, and since the data may be stored for a certain period of time by this external service, the feature name masking process is implemented to ensure the confidentiality of the user's data. Specifically, the second generation unit 22 masks the feature names of the first generation results in the preceding processing, then sends the masked data to the generation AI 40 to obtain a summary, insights / interpretation comments / recommendation comments, reports, etc. After the analysis is completed, the second generation unit 22 demasks the feature names in the subsequent processing and sends the results, including the report with the demasked feature names, from the report generation unit 223 to the output unit 112 of the user interface unit 11. 
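The masking and demasking round trip described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`build_mask_map`, `mask_text`, `unmask_text`) and the sequential `f_001` token scheme are hypothetical, and the reply of the generation AI 40 is a hard-coded stand-in rather than a real API call.

```python
import re


def build_mask_map(feature_names):
    """Assign each real feature name an opaque token such as f_001 (hypothetical scheme)."""
    return {name: f"f_{i:03d}" for i, name in enumerate(feature_names, start=1)}


def _substitute(text, table):
    """Replace every key of `table` found in `text` with its value, in a single pass."""
    if not table:
        return text
    # Longest keys first, so "raw material 10" wins over "raw material 1".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


def mask_text(text, mask_map):
    """Preceding processing: hide real feature names before the text is sent out."""
    return _substitute(text, mask_map)


def unmask_text(text, mask_map):
    """Subsequent processing: restore real feature names in the returned text."""
    reverse = {token: name for name, token in mask_map.items()}
    return _substitute(text, reverse)


if __name__ == "__main__":
    features = ["raw material 1", "raw material 10", "temperature"]
    mask_map = build_mask_map(features)
    first_result = "temperature and raw material 10 correlate strongly with each other."
    prompt = mask_text(first_result, mask_map)
    # Stand-in for the reply of the external generation AI (no real service is called).
    reply = "Consider dropping f_002 or f_003 before modelling."
    print(prompt)
    print(unmask_text(reply, mask_map))
```

Because the mapping never leaves the agent server, only the tokens reach the external service. Note that this sketch uses plain substring matching, so a short feature name such as "time" would also be replaced inside a longer word; a production version would likely need word-boundary handling.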
The user can check, on the user terminal 50, the content displayed with the original feature names on the output unit 112. 【0021】 The database server 30 is a server for performing Retrieval-Augmented Generation (RAG). The database server 30 is provided to prevent hallucination and to make use of the corporate knowledge of the provider of the information processing system 1: when the comment generation unit 222 of the analysis support agent server 20 provides suggestions and generates interpretation and recommendation comments, the database server 30 responds to requests from the comment generation unit 222. Text data and image data adapted to the workflow are chunked and stored as vector data 31 in the database server 30. When new text data or image data is generated, the database server 30 adds them to the vector data 31 as new entries. 【0022】 The generation AI 40 is not limited as long as it is available on the network and capable of providing an LLM 41 useful to the information processing system 1 according to this disclosure. Examples include GPT® provided by OpenAI, LaMDA® provided by Google, and LLaMA® provided by Meta. The summary generation unit 221, comment generation unit 222, and report generation unit 223 of the second generation unit 22 of the analysis support agent server 20 communicate with the external generation AI 40 when creating a summary of the first generation result, suggestions for the first generation result, interpretation comments and recommendation comments, and a report as the second generation result, respectively. Each part of the second generation unit 22 sends a prompt to the generation AI 40, and the generation AI 40 replies with a response. This response is reflected in the final processing of the second generation result. 【0023】 (Functions of Information Processing System 1) Information processing system 1, with the configuration described above, performs the following functions. Here, workflow (1) related to material design is used as an example. 
The functions are broadly divided into steps 0 to 5. 【0024】 Step 0 In step 0, the user terminal 50 selects a workflow, loads training data (CSV file), and inputs various settings. 【0025】 Specifically, first, a workflow is selected in the workflow selection unit 111a of the input unit 111 in the user interface unit 11 of the cloud service server 10. Multiple workflows may be provided, but below we will explain the case where a first workflow (sometimes displayed as workflow WF_1) is selected as the predetermined workflow, which analyzes the individual tasks related to data visualization (correlation coefficient, scatter plot), model optimization, and prediction in the order of "#1 (1) Data visualization (correlation coefficient, scatter plot) → Model optimization → Prediction (regression analysis or Bayesian optimization)". 【0026】 Next, training data consisting of a CSV (Comma-Separated Values) file is loaded into the various setting input section 111b. When loading the training data, various settings are set, such as the target variable and the target value (target value range) of the target variable. One or two target variables are set. The loaded training data undergoes a data format check. Specifically, the training data format check is performed, and missing data, strings, and data exceeding the capacity are excluded. 【0027】 The training data consists of, for example, property a, property b, raw material 1, raw material 2, raw material 3, temperature, and time. Property a and property b refer to properties a and b of a chemical material (an example of a desired substance) obtained as a result of the analysis, and are the objective variables of the analysis in Information Processing System 1. Raw material 1, raw material 2, raw material 3, temperature, and time refer to matters that may correlate with the properties of the chemical material, and are the explanatory variables of the analysis in Information Processing System 1. 
For example, raw material 1, raw material 2, and raw material 3 may represent the content of raw materials 1-3 of the chemical material, temperature may represent the reaction temperature, and time may represent the reaction time. However, raw material 1, raw material 2, raw material 3, temperature, and time are generic stand-ins for specific feature names; the kind and number of raw materials, and the particular temperature and time they denote, are not limited to specific ones. Furthermore, once the aforementioned masking is applied, these feature names are converted into meaningless strings or tokens such as "f_001" and "Qx7pL2". 【0028】 Step 1 In the first step, as the first stage of data visualization, the correlation coefficient between the dependent variable and the independent variables is calculated. 【0029】 Specifically, in the input unit 111 of the user interface unit 11 of the cloud service server 10, once input in the various setting input units 111b is completed and the execute button is pressed, the first generation unit 21 of the analysis support agent server 20 controls the workflow and activates the correlation coefficient calculation unit of the data verification / visualization unit 131 of the processing unit 13 of the cloud service server 10. The training data and prediction data input in the various setting input units 111b are read into the first generation unit 21 as feature names related to the analysis. The correlation coefficient calculation unit calculates the correlation coefficient between property a and / or property b and raw material 1, raw material 2, raw material 3, temperature, and time using the read training data as data visualization. 
The pair of features with the highest correlation coefficient is extracted, and the calculation and extraction results are sent to the first generation section 21 as the first generation result, and the first generation section 21 adds interpretation comments (rule-based). 【0030】 Step 2 In the second step, as the second stage of data visualization, a scatter plot is drawn that graphs the correlation between the dependent variable and the independent variables. The scatter plot can be drawn with the dependent variable / independent variable on the vertical axis / horizontal axis, or with the dependent variable / independent variable on the horizontal axis / vertical axis, or both patterns can be placed side by side. 【0031】 Specifically, the first generation unit 21 of the analysis support agent server 20 controls the workflow and, after the correlation coefficient calculation unit, activates the scatter plot drawing unit of the data verification and visualization unit 131 of the processing unit 13 of the cloud service server 10. The scatter plot drawing unit uses the loaded training data to create scatter plots of property a and / or property b and raw material 1, raw material 2, raw material 3, temperature, and time, for the pair of features with the highest correlation coefficient extracted in the correlation coefficient calculation. The drawing results are then sent to the first generation unit 21 as the first generation result, and the first generation unit 21 adds interpretation comments (rule-based). 【0032】 • Step 3 In the third step, the model is optimized using MI (Materials Informatics) techniques. 【0033】 Specifically, the first generation unit 21 of the analysis support agent server 20 controls the workflow and, after the scatter plot drawing unit, activates the model optimization and evaluation unit 134 of the processing unit 13 of the cloud service server 10. 
The model optimization / evaluation unit 134 constructs all models (linear models, various nonlinear models) using the training data, and from these, the model with the best accuracy index is selected as the optimal model. The selection result is then sent to the first generation unit 21 as the first generation result, and the first generation unit 21 adds interpretation comments (rule-based). The selection result may include drawing a yy plot (a plot of actual values (y-obs) on the horizontal axis and predicted values (y-pred) on the vertical axis) to visualize the selection of the optimal model. 【0034】 • Step 4 In the fourth step, predictions are made using MI (Materials Informatics) methods. 【0035】 Specifically, the first generation unit 21 of the analysis support agent server 20 controls the workflow, and after the model optimization / evaluation unit 134, the prediction / search unit 136 of the processing unit 13 of the cloud service server 10 is activated. The prediction / search unit 136 performs regression analysis or Bayesian optimization, and recommended conditions are presented based on the prediction results and the AD (Applicability Domain; the range over which the model is applicable). Then, the target variable of the prediction data is predicted using the optimal model with the best accuracy, and the prediction results are sent to the first generation unit 21 as the first generation result, and the first generation unit 21 adds interpretation comments (rule-based). 【0036】 • Step 5 In the fifth step, the first generation results obtained in the first to fourth steps described above are used to generate the second generation results using the database server 30 (RAG) and / or an external generation AI 40. 
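The model selection in the third step (build several candidate models, keep the one with the best accuracy index) can be sketched as below. This is an illustration under assumptions, not the system's actual procedure: the source does not name the accuracy index, so leave-one-out R² is assumed here; the three single-variable candidates (a mean baseline, ordinary least squares, and a 1-nearest-neighbour model standing in for "various nonlinear models") and all function names are hypothetical.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares on one explanatory variable (xs must not be constant)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """Simple nonlinear stand-in: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def r2(y_obs, y_pred):
    """Coefficient of determination between observed and predicted values."""
    my = mean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - my) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def loo_predictions(fit, xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def select_best_model(candidates, xs, ys):
    """Score every candidate and return (name of the best one, all scores)."""
    scores = {name: r2(ys, loo_predictions(fit, xs, ys))
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    best, scores = select_best_model(CANDIDATES, xs, ys)
    print(best, scores)
```

The pairs of observed values and leave-one-out predictions produced by `loo_predictions` are also the points one would draw in a yy plot (y-obs on the horizontal axis, y-pred on the vertical axis).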
【0037】 Specifically, the first generation unit 21 of the analysis support agent server 20 controls the workflow, and once the processing of the prediction / search unit 136 is completed, the second generation unit 22 produces the following as second generation results: a summary of the first generation results by the summary generation unit 221; suggestions, interpretation comments, and recommendation comments for the first generation results by the comment generation unit 222; and a report by the report generation unit 223. These results are then sent to the output unit 112 of the user interface unit 11 of the cloud service server 10. 【0038】 In the fifth step, the database server 30 for Retrieval-Augmented Generation (RAG) may be used when the comment generation unit 222 generates suggestions, interpretation comments, and recommendation comments for the first generation results. Furthermore, an external generation AI 40 may be used in the summarization of the first generation results by the summary generation unit 221, the generation of suggestions, interpretation comments, and recommendation comments for the first generation results by the comment generation unit 222, and the generation of the report by the report generation unit 223. When using the external generation AI 40, as described above, feature names are masked to ensure confidentiality before being sent to the generation AI 40. The comment generation unit 222 is composed of an LLM layer and adds suggestions, interpretation comments, and recommendation comments to the analysis data, analysis flow, and analysis results obtained as the first generation results, and outputs a summary of them as a report. The report generation unit 223 removes the masking of feature names, generates a report, and sends it to the output unit 112 of the user interface unit 11. At the output unit 112, the report can be downloaded, for example, as a PDF file. 
When the second generation unit 22 communicates with the generation AI 40, it masks the feature names related to the analysis to ensure confidentiality, and when generating a report for the user, it removes the masking from the feature names before outputting it. 【0039】 (Data Visualization - Correlation Coefficient, Scatter Plot) The data visualization (calculation of correlation coefficients and plotting of scatter plots) related to the first and second steps described above will be explained in more detail below. Note that the feature design described below may be processed as a task related to the feature design unit 133 of the processing unit 13. 【0040】 • General Information The purpose of calculating the correlation coefficient is to understand the relationships between features and to use this information to improve feature design (feature engineering) and the accuracy, stability, and interpretability of predictive models. Since the correlation coefficient may be overestimated or underestimated due to outliers, it is recommended to actually draw a scatter plot and check it visually. To prepare for such cases, the input section 111 of the user interface section 11 of the cloud service server 10 displays a "Visualization" button, and under its functions, in addition to "Visualization > Data Visualization > Correlation Coefficient," it may be possible to draw a scatter plot by selecting "Visualization > Data Visualization > Scatter Plot" or "Visualization > Data Visualization > Scatter Plot Matrix." The data distribution and outliers may also be checked using the various functions under "Visualization > Data Visualization." 【0041】 • Are there any explanatory variables with a large correlation coefficient |r| for each dependent variable? When determining the correlation coefficient, a threshold value may be set depending on the degree required for the analysis. 
For example, |r|≧0.7, which is generally considered to indicate a strong correlation, can be used as the threshold value. 【0042】 [If applicable explanatory variables exist] The relevant explanatory variables are important variables for predicting the corresponding dependent variable and may be candidates for variables that should be prioritized in experiments, such as those whose conditions should be varied or whose range should be broadened. Regarding the sign of the correlation coefficient r, a positive sign indicates that as the explanatory variable increases / decreases, the dependent variable tends to increase / decrease; a negative sign indicates that as the explanatory variable increases / decreases, the dependent variable tends to decrease / increase. For model construction, it may be possible to construct a highly interpretable model using a linear model (OLS (Ordinary Least Squares)). However, it is necessary to check the correlation coefficients with other explanatory variables and to confirm that there is no multicollinearity (correlation between explanatory variables). 【0043】 [If there are no applicable explanatory variables] When constructing a model, it may be necessary to consider nonlinear models (NSVR (Nonlinear Support Vector Regression)), preprocessing by feature transformation, and the addition of features based on domain knowledge. In such cases, the user can press the "Preprocessing" button displayed on the input unit 111 of the user interface unit 11 of the cloud service server 10, and under its functions, preprocessing by feature transformation can be performed via "Preprocessing > Feature Transformation". 【0044】 • Are there pairs of features with a large correlation coefficient |r| between the explanatory variables? In determining the correlation coefficient, |r|≧0.7 can be used as the baseline value, as described above. 
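The threshold rule for explanatory variables versus a dependent variable can be sketched as follows. This is a minimal illustration, assuming Pearson's correlation coefficient and the |r| ≧ 0.7 threshold mentioned above; the function names, the data layout (a dict of column lists), and the exact comment wording are hypothetical and are not the rule set actually used by the first generation unit 21.

```python
from math import sqrt

THRESHOLD = 0.7  # |r| >= 0.7 is treated as a strong correlation, as in the text


def pearson_r(xs, ys):
    """Pearson correlation coefficient (raises ZeroDivisionError for a constant column)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def comment_on_target(target, data, explanatory):
    """Return rule-based interpretation comments for one dependent variable."""
    strong = []
    for name in explanatory:
        r = pearson_r(data[name], data[target])
        if abs(r) >= THRESHOLD:
            strong.append((name, r))

    if not strong:
        return [f"No explanatory variable has |r| >= {THRESHOLD} with {target}: "
                "consider nonlinear models, feature transformation, "
                "or features based on domain knowledge."]

    comments = []
    for name, r in strong:
        trend = "increase" if r > 0 else "decrease"
        comments.append(
            f"{name} (r = {r:+.2f}) is strongly correlated with {target}: "
            f"{target} tends to {trend} as {name} increases. "
            "Candidate to prioritize in experiments; a linear model (OLS) may suffice "
            "if there is no multicollinearity.")
    return comments


if __name__ == "__main__":
    data = {
        "property a": [1.0, 2.1, 2.9, 4.2, 5.0],
        "temperature": [10, 20, 30, 40, 50],
        "time": [5, 1, 4, 2, 3],
    }
    for line in comment_on_target("property a", data, ["temperature", "time"]):
        print(line)
```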
【0045】 [If there is a matching pair of features] The relevant feature pairs have overlapping information, and multicollinearity can lead to unstable results in model construction. Therefore, it is desirable to remove one of the feature pairs with a large correlation coefficient |r| beforehand. In such cases, the "Preprocessing" button displayed in the input section 111 can be pressed, and under its functions, feature removal preprocessing can be performed by selecting "Preprocessing > Feature Selection > Remove Features with a Large Proportion of Same Values". Additionally, eliminating information redundancy through dimensionality reduction can be effective. In such cases, the "Visualization" button displayed in the input section 111 can be pressed, and under its functions, dimensionality reduction can be performed by selecting "Visualization > Reduce Dimensionality". Furthermore, there are methods to prioritize the selection of models that are relatively resistant to multicollinearity during model construction, such as "PLS (Partial Least Squares), RR (Ridge Regression), EN (Elastic Net)". However, it is desirable to remove explanatory variables beforehand whenever possible. 【0046】 [If no matching feature pair is found] No action is required. 【0047】 • When there are two or more dependent variables, is there a pair of features with a large correlation coefficient |r| between the dependent variables? In determining the correlation coefficient, |r|≧0.7 can be used as the baseline value, as described above. 【0048】 [If no matching feature pair is found] It can be said that each target variable tends to be independent, and in such cases, simultaneous optimization may be possible by pressing the "MI" button displayed on the input unit 111 and then using the "MI>Model Optimization" function to build and select the optimal model for each target variable. 
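The check for strongly correlated pairs among the explanatory variables, and the removal of one feature from each such pair, can be sketched as below. This is an illustration under assumptions: the text only says that one of the pair should be removed, so the choice made here (keep the feature listed first, drop the later one) is an arbitrary hypothetical policy, as are the function names and the dict-of-columns data layout.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient (raises ZeroDivisionError for a constant column)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(data, explanatory, threshold=0.7):
    """List every pair of explanatory variables with |r| >= threshold."""
    pairs = []
    for a, b in combinations(explanatory, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


def drop_redundant(explanatory, pairs):
    """For each flagged pair still fully present, drop its second member."""
    kept = list(explanatory)
    for a, b, _ in pairs:
        if a in kept and b in kept:
            kept.remove(b)
    return kept


if __name__ == "__main__":
    data = {
        "raw material 1": [1, 2, 3, 4, 5],
        "raw material 2": [2.1, 3.9, 6.2, 8.0, 9.9],
        "temperature": [30, 10, 50, 20, 40],
    }
    names = list(data)
    flagged = collinear_pairs(data, names)
    print(flagged)
    print(drop_redundant(names, flagged))
```

In practice the decision of which feature to keep would more likely rest on domain knowledge or on each feature's correlation with the target variable; the same pair detection applied to the dependent variables supports the check in paragraph 【0047】.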
【0049】 [If there is a matching pair of features / the correlation coefficient has a negative sign] If the sign is negative, it may indicate that the paired features are in a trade-off relationship, and prioritizing the target variable may be necessary when determining experimental conditions. Understanding the Pareto optimal solution is useful for visualizing trade-off relationships and as a guideline when determining experimental conditions. In such cases, the "Visualize" button displayed on the input unit 111 can be pressed, and this can be done under the function "Visualize > Pareto Optimal Solution". 【0050】 [If there is a matching pair of features / the sign of the correlation coefficient is positive] If the sign is positive, there may be overlapping information, such as one feature in a pair being computable, and it is necessary to consider whether optimization is truly needed for all target variables. 【0051】 [Other cases] If there are correlations between multiple target variables and you want to build a model that takes these relationships into account, you can press the "MI-Expert" button displayed in the input section 111 and use the "MI-Expert>Multiple Y Simultaneous" function to optimize multiple target variables simultaneously. However, since this is an advanced function, it is recommended to start with "MI>Model Optimization". 【0052】 (Example of a user interface) Based on the configuration of the information processing system 1 described above, an example of the user interface section 11 of the cloud service server 10 (hereinafter sometimes referred to as the user screen; the user screen is displayed on the user terminal 50) will be explained with reference to Figures 2 to 5. 
These user screens relate to material design, and the menu section includes buttons for "AI Agent" to operate the analysis support agent server 20, as well as buttons for "Preprocessing," "Visualization," "MI," and "MI-Expert" to be selected according to the analysis of the input data. 【0053】 The user screen first displays a workflow selection screen, as shown in Figure 2 (Figure 1 showing an example of input). On this screen, under the title "Workflow Selection," the message "Please select the workflow you want to execute." is displayed. Multiple workflows are listed, but here we will explain the case where "#1 (1) Data Visualization (Correlation Coefficient, Scatter Plot) → Model Optimization → Prediction (Regression Analysis or Bayesian Optimization)" is selected as workflow (1). Note that the workflow may also include other options, such as workflow (2) "#2 (2) Material Composition Recipe Optimization." 【0054】 The workflow selection screen displays an overview, user requirements, and notes for each workflow, but for workflow (1), it will be displayed as follows, for example: 【0055】 [overview] The functionality of Workflow (1) is described as follows: "It handles training data and prediction data (loaded / automatically generated), automatically recommends regression analysis / BO (Bayesian Optimization) based on accessibility and interpolability, and generates a summary & PDF using Markdown (one of the lightweight markup languages) notation." Here, the " / " in "prediction data (loaded / automatically generated)" represents "or," and the user selects one or the other. From the user's perspective, "loaded" means uploading the data. 【0056】 [User preparations] The following points are displayed as things that users using workflow (1) need to prepare. The training data and prediction data are prepared as CSV text files, but an example of training data shown in tabular format is shown in Figure 6. 
Regarding the "Training Data (CSV)," a note states that "the first column should be name or blank / NaN (Not a Number). A header is required. Rows ≤ 100, and explanatory variable columns ≤ 10." Regarding the "dependent variables (1 to 2 columns)," a note states that the "name" column is excluded. In the example in Figure 6, the dependent variables are represented by two columns: property a and property b. Regarding the "target range for the dependent variable," a note states that "lower and upper numerical values are required." Regarding the "Prediction Data (CSV)," a note states that "it can be loaded or automatically created. Rows ≤ 1000, and all explanatory variables from the training data must be included." 【0057】 [remarks] The following points are displayed as notes and supplementary information regarding the functionality of workflow (1). • A note that "Correlation is Pearson (numerical columns only). Be aware of the influence of outliers." • A note that "the automatic creation of prediction data will be based on the min-max ± extrapolation ratio (default 20%)." 【0058】 Furthermore, if workflow (2) is specified as, for example, "#2 (2) Material Composition Recipe Optimization," the description would be: "Bayesian optimization search of the design space for material composition to optimize the trade-off between performance and cost." 【0059】 The user screen then displays the input screen for training data, as shown in Figure 3 (Figure 2-1 showing an example of input). On this screen, under the title "Analysis Support Agent: WF_1", the items to be entered for "Training Data" are displayed, for example, as follows: 【0060】 • At the top, a button labeled <Select Training Data> is displayed, and the user presses it. • The filename "virtual_resin.csv" is displayed as a note indicating that the loaded input is a CSV file. • Under "Select dependent variables (1-2)", with the note "Dependent variables (excluding the name column)", the selection buttons <property a> and <property b> are displayed.
The user presses at least one of <property a> and <property b>. • Under "Select explanatory variables," with the note "Explanatory variables (dependent variable cannot be selected)," the selection buttons <raw material 1>, <raw material 2>, <raw material 3>, <temperature>, and <time> are displayed. The user presses at least one of <raw material 1>, <raw material 2>, <raw material 3>, <temperature>, and <time>. • If <property a> is selected, the user enters numerical values in the <Lower Limit> and <Upper Limit> input fields displayed below it under "Target value for property a". Below the input fields, a note reads "Enter a numerical value (cannot be left blank: to enable the prediction tab)". The lower and upper limits can be set arbitrarily depending on the specific analysis, but here an example with <Lower Limit=0> and <Upper Limit=1> is used. Below the <Lower Limit>, a note reads "min=0.037 (sample_12), max=0.775 (sample_8), number of items=20 (sample_1~20)" based on the distribution of the input training data (here, the example shown in Figure 6). • If <property b> is selected, the "Target value for property b" is set in the same way. 【0061】 The user screen then displays a screen for inputting prediction data, as shown in Figure 4 (Figure 2-2 showing an example of input). On this screen, under the title "Analysis Support Agent: WF_1", the items to be entered for "Prediction Data" are displayed, for example, as follows: 【0062】 • For "How to create prediction data," the user selects either <Load> or <Automatic creation (including training data distribution ± extrapolation)>. This example shows the case where <Load> is selected. After making the selection, the user presses the <Select prediction data> button. If the user selects <Automatic creation>, the prediction data creation unit 135 of the processing unit 13 creates the prediction data.
• For "Recommended Approach (Automatic Determination: Feasibility + Interpolability)," either <Regression Analysis> or <Bayesian Optimization> will be selected. Here, we show the case where <Bayesian Optimization> is selected, and as justification, based on the input training data (here, an example shown in Figure 6), "Feasibility Check: Possible (Number of training data rows that satisfy the condition: 20)" and "Interpolability Check: Extrapolation (Percentage of interpolated rows: 62.2% / Threshold: 80%)" are displayed. • The entered (loaded) "prediction data" is displayed in a table format at the bottom of the screen. Here, as selected in Figure 3, name, raw material 1, raw material 2, raw material 3, temperature, and time are displayed, with the case where the number of items is 1000 being used as an example. 【0063】 Once the analysis is complete, the user screen will display the output screen, as shown in Figure 5 (a diagram showing an example of the output). On this screen, under the title "Summary Report (Markdown Preview)," "# Analysis Results Report" will be displayed. Markdown is a lightweight markup language that uses symbols such as "#" and "*" to structure text. The content of "# Analysis Results Report" will be displayed based on the training data and prediction data given as examples above, for example, as follows: 【0064】 [## 1. Overview] • The "dependent variable / target range (target value range)" is displayed as "property a, property b / property a: [0,1] / property b:[0,1]". • The "explanatory variables (selection)" will be displayed as "raw material 1, raw material 2, raw material 3, temperature, time". • The "Recommended Approach" is displayed as "**Bayesian Optimization** (Basis: Feasibility = Possible (Number of cases: 20), Interpolation rate = 62.2%)". 【0065】 [## 2. Data Overview] • The "training data" is displayed as "20 rows x 8 columns (0.0% missing data)". 
• The "prediction data" is displayed as "1000 rows x 6 columns (Creation method: Load)." 【0066】 [## 3. Correlation Analysis (Pearson (Training Data - Numeric Columns Only))] The analysis results for "Major Positive Correlations" are displayed as "property a ~ raw material 3 (+0.95) / property a ~ temperature (+0.05) / property b ~ raw material 2 (+0.65) / property b ~ raw material 3 (+0.30)". For property a, raw material 3 and temperature are listed as paired features, and for property b, raw material 2 and raw material 3 are listed as paired features. The analysis results for "Major Negative Correlations" are displayed as "property a ~ raw material 1 (-0.89) / property a ~ raw material 2 (-0.24) / property b ~ raw material 1 (-0.54) / property b ~ time (-0.03)". For property a, raw material 1 and raw material 2 are listed as paired features, and for property b, raw material 1 and time are listed as paired features. The "Notes" section displays the following: "Correlation does not imply causation. Please be aware of the influence of outliers." Regarding "scatter plots," for example, "*Auxiliary scatter plots (if available) are shown below: property a × raw material 3, property a × raw material 1*" is displayed. The scatter plots displayed are those with the largest absolute correlation coefficients among the major positive and negative correlations. The scatter plots may be displayed directly below on the output screen, but for the sake of explanation, the scatter plot of property a × raw material 3 is shown separately in Figure 7 as an example. The scatter plot of property a × raw material 1 is omitted. 【0067】 Figure 7 shows an example where the dependent variable property a is on the horizontal axis and the explanatory variable raw material 3 is on the vertical axis. However, the axes can also be reversed, with the dependent variable property a on the vertical axis and the explanatory variable raw material 3 on the horizontal axis.
When the dependent variable property a is on the horizontal axis and the explanatory variable raw material 3 is on the vertical axis, as in Figure 7, it is relatively easy to read off the amount of raw material 3 needed to obtain the target value of property a. 【0068】 [## 4. Model Selection and Performance] • The selected model is displayed as "Bayesian optimization". • The "regression performance (R² / RMSE / MAE)" may be included in the output screen, but here it is provided as a separate output. R² represents the coefficient of determination, RMSE represents the Root Mean Squared Error, and MAE represents the Mean Absolute Error. Note that the evaluation metrics for regression performance are not limited to these, and other metrics may be used. • For "important features," features can be selected from raw material 1, raw material 2, raw material 3, temperature, and time and displayed. 【0069】 [Correlation Analysis (Additional Visualization)] • As an aid to the "Correlation Analysis (Pearson)" guidelines, the message "A scatter plot is being drawn" is displayed. Here, as mentioned above, Figure 7 shows a scatter plot of property a × raw material 3 as an example. 【0070】 (Example of information processing system operation) Next, an example of the operation of workflow (1) related to material design in the information processing system 1 will be explained. 【0071】 Figure 8 is a flowchart showing an example of the operation of the information processing system 1. This example is implemented as a computer program that can effectively support analysis, including material design, through the processing functions related to each step described below.
The program implements each processing function corresponding to each part of the cloud service server 10 and the analysis support agent server 20 (workflow selection function and various setting input functions included in the input function, output function, memory function, correlation coefficient calculation function, scatter plot drawing function, model optimization function and prediction function included in the processing function, first generation function, summary generation function, comment generation function and report generation function included in the second generation function). The second generation function includes communication functions with the database server 30 (RAG) and the generation AI 40. 【0072】 In step S100, the user selects a workflow related to data analysis to be processed in the information processing system 1 using the user terminal 50. Data analysis may include material design, molecular design, and process design, but here we will explain the analysis process for material design. Furthermore, the workflow related to material design may include multiple workflows, but here we will explain the case where "#1 (1) Data visualization (correlation coefficient, scatter plot) → Model optimization → Prediction (regression analysis or Bayesian optimization)" is selected as workflow (1). 【0073】 In step S102, the training data (CSV file) is input into the information processing system 1. Various settings, including the target variable and the target value (target value range) of the target variable, are also input. Note that steps S100 and S102 correspond to the aforementioned step 0. 【0074】 In step S104, the prediction data (CSV file) is input into the information processing system 1. 【0075】 In step S106, the data visualization process is executed in the processing unit 13 of the cloud service server 10 by the control of the workflow (1) by the first generation unit 21 of the analysis support agent server 20. 
For data visualization, the correlation coefficient calculation process is first performed in the correlation coefficient calculation unit of the data verification and visualization unit 131, and based on the result, the scatter plot drawing process is performed in the scatter plot drawing unit of the same data verification and visualization unit 131. Step S106 corresponds to the first and second steps described above. 【0076】 In step S108, under the control of workflow (1) by the first generation unit 21 of the analysis support agent server 20, the model optimization and evaluation unit 134 selects the optimal model from among multiple models. Step S108 corresponds to the third step described above. 【0077】 In step S110, under the control of workflow (1) by the first generation unit 21 of the analysis support agent server 20, the prediction / search unit 136 performs a prediction using the optimized model. Step S110 corresponds to the fourth step described above. 【0078】 In step S112, the first generation unit 21 of the analysis support agent server 20 aggregates the results of data visualization (correlation coefficient, scatter plot), model optimization, and prediction as the first generation result, and the second generation unit 22 generates a summary of the first generation result, insights / interpretation comments / recommendation comments, and a report as the second generation result. In doing so, the second generation unit 22 communicates with the database server 30 (RAG) and the external generation AI 40 to make the second generation result more appropriate. 【0079】 In step S114, upon receiving a request from the comment generation unit 222 of the second generation unit 22, the database server 30 (RAG) performs retrieval for retrieval-augmented generation in order to prevent hallucinations and to draw on the in-house knowledge of the provider of the information processing system 1, and the retrieved content is reflected in the generation of the insights / interpretation comments / recommendation comments.
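For the prediction in step S110, the user-screen example reports an automatic recommendation between regression analysis and Bayesian optimization, justified by a feasibility check (number of training rows that satisfy the target range) and an interpolability check (percentage of interpolated prediction rows against an 80% threshold). The following is a minimal sketch of one way such a recommendation could be computed. How the two checks are defined and combined is an assumption here: a prediction row is counted as interpolated when every explanatory variable lies within the training minimum and maximum, and regression analysis is recommended only when at least one training row is feasible and the interpolated share reaches the threshold. All function names are hypothetical.

```python
def count_feasible_rows(train_rows, target_ranges):
    """Count training rows whose dependent variables all lie in range.

    target_ranges maps a dependent-variable name to (lower, upper).
    """
    return sum(
        all(lo <= row[name] <= hi for name, (lo, hi) in target_ranges.items())
        for row in train_rows
    )


def interpolated_fraction(train_rows, pred_rows, explanatory):
    """Share of prediction rows inside the training min-max box (assumed rule)."""
    bounds = {
        x: (min(r[x] for r in train_rows), max(r[x] for r in train_rows))
        for x in explanatory
    }
    inside = sum(
        all(bounds[x][0] <= row[x] <= bounds[x][1] for x in explanatory)
        for row in pred_rows
    )
    return inside / len(pred_rows)


def recommend(train_rows, pred_rows, explanatory, target_ranges, threshold=0.8):
    """Return (approach, feasible_row_count, interpolated_fraction)."""
    feasible = count_feasible_rows(train_rows, target_ranges)
    fraction = interpolated_fraction(train_rows, pred_rows, explanatory)
    # Assumed combination rule: interpolation-dominated and feasible data
    # suggests regression analysis; otherwise Bayesian optimization.
    if feasible > 0 and fraction >= threshold:
        approach = "regression analysis"
    else:
        approach = "Bayesian optimization"
    return approach, feasible, fraction


if __name__ == "__main__":
    # Hypothetical toy rows, not the data of Figure 6.
    train = [{"x": 0.0, "y": 0.2}, {"x": 1.0, "y": 0.8}]
    pred = [{"x": v} for v in (0.5, 0.2, 0.9, 1.5, -0.1)]
    print(recommend(train, pred, ["x"], {"y": (0.0, 1.0)}))
```

Under this assumed rule, the example in the description (20 feasible rows, 62.2% interpolated rows, 80% threshold) would lead to Bayesian optimization, which matches the recommendation shown on the screen.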
【0080】 In step S116, the generation AI 40 receives prompts from the summary generation unit 221, comment generation unit 222, and report generation unit 223 of the second generation unit 22, and generates responses for the summary of the first generation result, the insights / interpretation comments / recommendation comments, and the report. The final second generation result is generated by reflecting each of these responses. When the first generation result is sent to the generation AI 40, the feature names are masked. 【0081】 In step S118, the final second generation result is output as a report to the output unit 112 of the cloud service server 10. The user can view the report output to the output unit 112 on the user screen of the user interface section 11 on the user terminal 50, and can also download it as a PDF file. In the report, the masking of feature names is removed. 【0082】 As explained in detail above, the information processing system 1 according to this embodiment includes a cloud service server 10 and an analysis support agent server 20. Through the cooperation of both servers, analysis results with extremely high-quality comments utilizing the LLM layer can be obtained, thereby providing users with an information processing system that can effectively support analysis, including material design. 【0083】 Although embodiments have been described in detail, the invention is not limited to any particular embodiment, and various modifications and changes are possible within the scope of the claims. Furthermore, it is possible to combine all or some of the components of the embodiments described above. [Explanation of symbols] 【0084】
1 Information processing system
10 Cloud service server
11 User interface section
111 Input section
111a Workflow selection section
111b Various setting input section
112 Output section
13 Processing unit
131 Data verification and visualization unit
132 Data preprocessing unit
133 Feature design unit
134 Model optimization and evaluation unit
135 Prediction data creation unit
136 Prediction and search unit
20 Analysis support agent server
21 First generation unit
22 Second generation unit
221 Summary generation unit
222 Comment generation unit
223 Report generation unit
30 Database server
31 Vector data
40 Generation AI
41 LLM
50 User terminal
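The masking of feature names before communication with the generation AI 40, and its removal when the report is generated, can be illustrated by the following minimal sketch. The placeholder format ("FEATURE_1", "FEATURE_2", ...) and the helper names are assumptions for illustration; the embodiment does not specify how the masking is implemented.

```python
import re


def build_mask(feature_names):
    """Map each real feature name to a neutral placeholder (assumed format)."""
    return {name: f"FEATURE_{i}" for i, name in enumerate(feature_names, start=1)}


def _replace_all(text, mapping):
    """Replace every key of mapping found in text with its value."""
    if not mapping:
        return text
    # Longest keys first so that "raw material 10" wins over "raw material 1".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def mask(text, mapping):
    """Hide real feature names before the text leaves for the external LLM."""
    return _replace_all(text, mapping)


def unmask(text, mapping):
    """Restore real feature names in the reply used for the user's report."""
    return _replace_all(text, {v: k for k, v in mapping.items()})


if __name__ == "__main__":
    names = ["raw material 1", "raw material 2", "raw material 3",
             "temperature", "time", "property a", "property b"]
    mapping = build_mask(names)
    first_result = "property a correlates with raw material 3 (+0.95)."
    outgoing = mask(first_result, mapping)   # what the external AI would see
    print(outgoing)
    print(unmask(outgoing, mapping))         # what the report would show
```

A round trip through `mask` and `unmask` restores the original text as long as the original text does not itself contain placeholder tokens, which is why a placeholder format unlikely to occur in real feature names is assumed here.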

Claims

[Claim 1] A cloud service server that provides data analysis and machine learning cloud services in the chemical field related to analysis including material design, molecular design, and process design, and includes a processing unit that processes one or more tasks constituting a predetermined workflow selected by the user for the said analysis to obtain analysis results related to the said analysis, The system includes an analysis support agent server that supports the analysis, comprising: a first generation unit that controls the selected predetermined workflow to operate the cloud service server and generates a first generation result including the analysis results and rule-based interpretation comments on the analysis results based on a series of rules and conditions set in advance by the expert involved in the analysis; and a second generation unit that transmits the first generation result to a large-scale language model (LLM) of an external generation AI connected via a network, receives a reply from the generation AI, and generates a second generation result including a summary, insights / interpretation comments / recommendation comments, and a report related to the analysis. The second generation unit is an information processing system that, when communicating with the generation AI, performs a masking process on the feature names related to the analysis, and when generating the report for the user, removes the masking of the feature names. 
[Claim 2] The information processing system according to claim 1, wherein the processing unit of the cloud service server comprises: a data verification and visualization unit that handles tasks including visualization of input data; a preprocessing unit that handles tasks including missing value imputation, dummy variable creation, and spectral smoothing; a feature design unit that handles tasks including feature transformation, feature selection, and calculation of mixtures; a model optimization and evaluation unit that handles tasks including model optimization, classification, variable importance calculation, and anomaly factor analysis models; a prediction data creation unit that handles tasks including creation of prediction data relating to desired matters; and a prediction and search unit that handles tasks including regression analysis and Bayesian optimization. [Claim 3] The information processing system according to claim 1, further comprising a database server for retrieval-augmented generation (RAG), wherein the analysis support agent server communicates with the database server via the second generation unit and, in order to prevent hallucinations in the second generation result, utilizes the in-house knowledge of the provider of the system and reflects it in the generation of the insights / interpretation comments / recommendation comments. [Claim 4] The information processing system according to claim 1, wherein the predetermined workflow includes a first workflow relating to material design, and the first workflow comprises individual tasks of data visualization, model optimization, and prediction regarding the correlation between a target variable, which is a numerical value representing a property of a desired substance, and an explanatory variable, which is a numerical value that may correlate with the property of the desired substance.
[Claim 5] The aforementioned data visualization is performed by calculating correlation coefficients and drawing scatter plots based on training data in a comma-separated values ​​(CSV) file entered by the user. The aforementioned model optimization is performed by selecting from multiple models based on accuracy metrics. The information processing system according to claim 4, wherein the prediction is performed by selecting from regression analysis or Bayesian optimization based on prediction data in a CSV file input by the user or automatically generated. [Claim 6] A cloud service server that provides data analysis and machine learning cloud services in the chemical field, including material design, molecular design, and process design, includes a processing step of processing one or more tasks that constitute a predetermined workflow selected by the user for the said analysis to obtain analysis results related to the said analysis, In the analysis support agent server that supports the analysis, a first generation step is performed to control the selected predetermined workflow to make the cloud service server function and to generate a first generation result that includes the analysis results and rule-based interpretation comments on the analysis results based on a series of rules and conditions set in advance by the expert involved in the analysis. The analysis support agent server includes a second generation step in which it transmits the first generation result to a large-scale language model (LLM) of an external generation AI connected via a network, receives a reply from the generation AI, and generates a summary, insights / interpretation comments / recommendation comments, and a report related to the analysis as a second generation result. 
The second generation step is an information processing method that, when communicating with the generating AI, masks the feature names related to the analysis, and when generating the report for the user, removes the masking of the feature names. [Claim 7] A cloud service server that provides data analysis and machine learning cloud services in the chemical field, including material design, molecular design, and process design, includes a processing step of processing one or more tasks that constitute a predetermined workflow selected by the user for the said analysis to obtain analysis results related to the said analysis, In the analysis support agent server that supports the analysis, a first generation step is performed to control the selected predetermined workflow to make the cloud service server function and to generate a first generation result that includes the analysis results and rule-based interpretation comments on the analysis results based on a series of rules and conditions set in advance by the expert involved in the analysis. In the aforementioned analysis support agent server, the first generation result is transmitted to a large-scale language model (LLM) of an external generation AI connected via the network, and a reply is received from the generation AI, and a second generation step is performed in which a summary, insights / interpretation comments / recommendation comments, and a report related to the analysis are generated as the second generation result. The cloud service server and the analysis support agent server are instructed to execute the following: The second generation step is a program that, when communicating with the generation AI, performs a masking process on the feature names related to the analysis, and when generating the report for the user, removes the masking of the feature names.