Facility state diagnosis system and method using multivariate probability distribution and classification model
The system addresses the challenges of data label scarcity and expert reliance by using a multivariate probability distribution and classification model to automate data preprocessing and labeling, enhancing the accuracy and efficiency of equipment condition diagnosis in power plants.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- KOREA HYDRO & NUCLEAR POWER CO LTD
- Filing Date
- 2025-06-19
- Publication Date
- 2026-07-02
AI Technical Summary
Existing equipment condition diagnosis systems face challenges in accurately diagnosing equipment status due to a lack of data labels and reliance on expert knowledge, which hinders the reliability and efficiency of condition diagnosis in complex power plant facilities.
The system uses a multivariate probability distribution and a classification model to generate data labels automatically, integrating data preprocessing, label assignment, model training, and diagnosis. It applies techniques such as PCA and SVM to analyze operating signals and distinguish normal from abnormal data.
The approach improves the reliability and efficiency of equipment condition diagnosis by diagnosing equipment status accurately and reducing the need for manual labeling and expert intervention, enabling real-time detection of abnormalities.
Abstract
Description
Equipment condition diagnosis system and method using multivariate probability distribution and classification model
[0001] The present invention relates to an equipment condition diagnosis system using a multivariate probability distribution and a classification model, and more specifically, to an equipment condition diagnosis system using a multivariate probability distribution and a classification model that diagnoses the condition of equipment and classifies abnormal data by utilizing data labeling automation and a multivariate probability distribution model.
[0002] Modern power plant facilities consist of complex systems, and technology for accurately diagnosing equipment status is essential to enhance operational efficiency and prevent failures in advance. While data-driven approaches are being introduced in equipment status diagnosis, existing technologies have the following problems:
[0003] First, it is difficult to accurately diagnose the condition of collected equipment data due to a lack of labels. Labels contain information indicating whether specific data is in a normal or abnormal state; however, manually assigning labels at power plant sites is practically difficult due to the vast volume and complexity of the data. This lack of labels lowers the reliability of the training model, which in turn makes accurate condition diagnosis difficult.
[0004] Second, in the absence of experts, it is difficult to analyze data and secure reliable labels. Expert knowledge of equipment and operations is essential for condition diagnosis, which requires considering various factors and variables affecting equipment status. However, when experts are scarce or absent, situations arise where it is difficult to accurately interpret the data status.
[0005] To address this, technology is required that analyzes operating signals generated by equipment to identify data status and automatically assigns normal and abnormal labels. Such technology can simultaneously resolve the issue of data label shortages and reliance on experts. Furthermore, a condition diagnosis process is required that integrates and automates the entire process, from operating signal analysis and label generation to learning model construction and condition diagnosis.
[0006] [Prior Art Literature]
[0007] [Patent Literature]
[0008] (Patent Document 1) Republic of Korea Patent Publication No. 2023-0028995 (March 3, 2023)
[0009] The present invention aims to provide an equipment condition diagnosis system using a multivariate probability distribution and a classification model that resolves the problem of insufficient data labels and the difficulty of diagnosis in the absence of experts, and improves the reliability and efficiency of condition diagnosis by automatically distinguishing normal and abnormal data based on operating signals.
[0010] According to the present invention, the system comprises: a data set generation unit (100) that generates training data including normal data and abnormal data and preprocesses it to form a dataset suitable for training; a multivariate probability distribution calculation unit (200) that receives operating signal data, analyzes key variables, and calculates a joint probability density function (Joint-PDF) to calculate the degree of abnormality of the data; a label automation unit (300) that automatically assigns normal and abnormal labels based on the calculated degree of abnormality and generates labeled data; a learning model generation unit (400) that generates a classification model capable of distinguishing equipment status based on the labeled data; a condition diagnosis execution unit (500) that uses the generated classification model to analyze input unlabeled data and diagnose equipment status, generating diagnosis result data; and a result output unit (600) that visualizes the equipment status based on the diagnosis result data.
[0011] The data set generation unit (100) handles missing values in the operating signal data, removes outliers, and normalizes the data to generate a dataset suitable for training.
[0012] The multivariate probability distribution calculation unit (200) applies principal component analysis (PCA) to extract the principal variables used to calculate the joint probability density function (Joint-PDF).
[0013] The label automation unit (300) automatically assigns normal and abnormal labels by comparing the calculated degree of abnormality with a predefined abnormality threshold.
[0014] The learning model generation unit (400) uses a Support Vector Machine (SVM) to generate, from the labeled data, a model capable of classifying equipment status.
[0015] A method using an equipment condition diagnosis system utilizing a multivariate probability distribution and a classification model comprises: (a) a step in which the equipment condition diagnosis system generates training data including normal data and abnormal data; (b) a step in which the equipment condition diagnosis system receives operating signal data, analyzes key variables, and calculates a Joint Probability Density Function (Joint-PDF) to calculate the degree of abnormality of the data; (c) a step in which the equipment condition diagnosis system automatically assigns normal and abnormal labels based on the calculated degree of abnormality and generates labeled data; (d) a step in which the equipment condition diagnosis system generates a classification model capable of distinguishing equipment conditions based on the labeled data; (e) a step in which the equipment condition diagnosis system utilizes the generated classification model to analyze input unlabeled data and diagnose equipment conditions to generate diagnosis result data; and (f) a step in which the equipment condition diagnosis system visualizes the results of the equipment conditions based on the generated diagnosis result data.
[0016] According to the present invention, the problem of insufficient data labels and reliance on experts is reduced, and the diagnostic efficiency and reliability are improved by accurately diagnosing the condition of the equipment based on operating signals.
[0017] FIG. 1 shows an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0018] FIG. 2 is a flowchart illustrating a method using an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0019] FIG. 3 is a diagram for explaining condition diagnosis in an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0020] FIG. 4 is a diagram for explaining multivariate probability distribution-based diagnosis by defect mode in an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0021] FIG. 1 shows an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0022] As illustrated in FIG. 1, the equipment condition diagnosis system (10) using a multivariate probability distribution and a classification model includes a data set generation unit (100), a multivariate probability distribution calculation unit (200), a label automation unit (300), a learning model generation unit (400), a condition diagnosis execution unit (500), and a result output unit (600).
[0023] The data set generation unit (100) is responsible for generating training data including normal data and abnormal data. This configuration processes missing values, removes outliers, and performs preprocessing steps such as data normalization to form a dataset suitable for training. This ensures data quality and improves the accuracy of subsequent analysis and training stages.
[0024] That is, the data set generation unit (100) handles missing values in the operating signal data, removes outliers, and normalizes the data to generate a dataset suitable for training.
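The three preprocessing steps named above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how missing values are filled, how outliers are detected, or which normalization is used, so median imputation, a z-score cutoff (`z_max`), and min-max scaling are choices made here, and the function name `preprocess` is hypothetical.

```python
import numpy as np


def preprocess(signals, z_max=3.0):
    """Impute missing values, drop outlier rows, and min-max normalize columns.

    All three strategies are assumptions; the source names the steps only.
    """
    data = np.array(signals, dtype=float)  # np.array copies the input

    # 1) Missing values: replace NaN with the column median (assumed strategy).
    col_median = np.nanmedian(data, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(data))
    data[nan_rows, nan_cols] = col_median[nan_cols]

    # 2) Outliers: drop rows whose z-score exceeds z_max in any column (assumed rule).
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z_scores = np.abs((data - data.mean(axis=0)) / std)
    data = data[(z_scores <= z_max).all(axis=1)]

    # 3) Normalization: scale each column to the range [0, 1] (assumed method).
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span
```

The z-score rule needs a reasonably large sample to flag anything, because a single extreme value in a very small data set inflates the standard deviation it is measured against.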
[0025] The multivariate probability distribution calculation unit (200) receives operating signal data, analyzes key variables, and calculates the joint probability density function (Joint-PDF) to determine the degree of abnormality of the data. To this end, it uses principal component analysis (PCA) to extract key variables and quantitatively evaluates the abnormal state of the data on that basis.
[0026] For reference, Principal Component Analysis (PCA) is a data dimensionality reduction technique that extracts key information by reducing high-dimensional data to a lower dimension. PCA transforms data to represent it in a new coordinate system called principal components, and sets the main axes based on the direction that maximizes the data's variance.
[0027] The key characteristics of Principal Component Analysis are as follows. First, it enhances the efficiency of data visualization and analysis by reducing high-dimensional data to a smaller dimension. During this process, it preserves as much important information regarding data variability as possible even while reducing dimensions. Second, it extracts key features of the data by selecting principal components, which are the axes containing the most information based on the data's variance. The first principal component represents the direction of the data's greatest variance, while the second principal component represents the direction of the second greatest variance that is orthogonal to the first.
[0028] Third, unnecessary data can be removed through noise removal, and the core patterns of the data can be derived. Fourth, the covariance matrix is calculated using mathematical methods, and eigenvalue decomposition is performed to obtain eigenvectors (principal components) and eigenvalues (importance of each principal component). Then, important principal components—that is, those with large eigenvalues—are selected to transform the data.
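The covariance and eigenvalue-decomposition procedure described above can be illustrated with a short NumPy sketch. The function name `pca` and its return layout are assumptions made for this example, not part of the disclosed system.

```python
import numpy as np


def pca(data, n_components):
    """PCA by eigenvalue decomposition of the covariance matrix.

    Returns (scores, components, eigenvalues), with components ordered by
    decreasing eigenvalue, i.e. by decreasing explained variance.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)

    # Covariance matrix of the variables (columns).
    cov = np.cov(centered, rowvar=False)

    # eigh is intended for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    # Project the centered data onto the selected principal axes.
    scores = centered @ components
    return scores, components, eigvals[order]
```

Each returned eigenvalue equals the sample variance of the corresponding column of `scores`, which is what makes it usable as the "importance" of that component.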
[0029] Applications of Principal Component Analysis include data visualization, feature extraction, and noise reduction. In data visualization, high-dimensional data can be reduced to two or three dimensions and represented as graphs. In feature extraction, computational efficiency can be improved by reducing the dimensionality of input data for training models. Additionally, noise reduction can minimize abnormal fluctuations in data and strengthen signals.
[0030] In the present invention, PCA extracts key variables from the operating signal data, selecting the data needed to calculate the Joint Probability Density Function (Joint-PDF). This improves the efficiency and accuracy of the model and reduces the risk of overfitting caused by unnecessary data.
[0031] The label automation unit (300) automatically generates labeling data by distinguishing between normal and abnormal states based on the calculated degree of abnormality. This unit assigns normal and abnormal labels by comparing the degree of abnormality with a predefined threshold value, thereby efficiently performing data labeling without manual work.
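A minimal sketch of the threshold comparison described above follows. The threshold value of 0.5, the label encoding (0 for normal, 1 for abnormal), and the function name `assign_labels` are assumptions for illustration; the text only states that a predefined threshold exists.

```python
import numpy as np

NORMAL, ABNORMAL = 0, 1  # assumed label encoding


def assign_labels(anomaly_degree, threshold=0.5):
    """Label each sample ABNORMAL when its degree of abnormality exceeds the threshold."""
    scores = np.asarray(anomaly_degree, dtype=float)
    return np.where(scores > threshold, ABNORMAL, NORMAL)
```

In this sketch a score exactly equal to the threshold is treated as normal; whether the boundary case counts as abnormal is a design choice the text leaves open.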
[0032] The learning model generation unit (400) generates a classification model capable of distinguishing equipment status based on labeled data. This learning model generation unit trains a model that predicts equipment status by utilizing algorithms such as Support Vector Machine (SVM), and improves the accuracy of status classification by searching for an optimal hyperplane.
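As an illustration of training such a classifier on labeled data, the sketch below uses scikit-learn's `SVC`. The linear kernel, the `C` value, the standardization step, and the synthetic two-cluster data are all assumptions; the text names SVM and a hyperplane search but gives no further configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_classifier(features, labels):
    """Fit a linear SVM that separates normal (0) from abnormal (1) samples."""
    # Kernel and C are assumed settings, not taken from the source.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
    model.fit(features, labels)
    return model


# Synthetic stand-in for automatically labeled data (hypothetical values).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
abnormal = rng.normal(loc=3.0, scale=0.5, size=(100, 2))
X = np.vstack([normal, abnormal])
y = np.array([0] * 100 + [1] * 100)

model = train_classifier(X, y)

# Diagnose new unlabeled samples with the trained model.
predictions = model.predict([[0.1, -0.2], [3.2, 2.9]])
```

The same `predict` call corresponds to the diagnosis step, where unlabeled data is passed to the trained model.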
[0033] The condition diagnosis execution unit (500) uses the generated classification model to analyze input unlabeled data and diagnose the state of the equipment, generating diagnosis result data. This unit receives real-time or batch data, thereby enabling rapid detection of abnormal conditions in equipment currently in operation.
[0034] The result output unit (600) visualizes the equipment status, or provides it to the user, based on the diagnosis result data generated by the condition diagnosis execution unit (500). This unit outputs the diagnosis results in the form of a graph, dashboard, or report to give the user intuitive information about the equipment status.
[0035] FIG. 2 is a flowchart illustrating a method using an equipment condition diagnosis system (hereinafter referred to as the system) using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0036] As illustrated in FIG. 2, in the method using the equipment condition diagnosis system using a multivariate probability distribution and a classification model, (a) the system first generates training data including normal data and abnormal data.
[0037] Next, (b) the system receives operating signal data, analyzes key variables, and calculates the Joint Probability Density Function (Joint-PDF) to determine the degree of abnormality of the data.
[0038] Next, (c) the system automatically assigns normal and abnormal labels based on the calculated degree of abnormality and generates labeled data.
[0039] Next, (d) the system generates a classification model capable of distinguishing equipment status based on the labeled data.
[0040] Next, (e) the system uses the generated classification model to analyze input unlabeled data and diagnose the equipment status, generating diagnosis result data.
[0041] Finally, (f) the system visualizes the equipment status based on the generated diagnosis result data.
[0042] FIG. 3 is a diagram for explaining condition diagnosis in an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0043] When condition diagnosis begins, the system establishes a Joint Probability Density Function (JPDF) for each failure mode. This prepares the basis for quantitatively calculating the degree of abnormality for each failure mode.
[0044] Next, the degree of abnormality is calculated by applying the JPDF to the collected operating signal data. This gives an initial assessment of whether the current data is in a normal or abnormal state.
[0045] Next, a dataset containing normal and abnormal data is generated using preliminary training data. The generated dataset undergoes a preprocessing stage where tasks such as handling missing values, removing outliers, and normalizing the data are performed to transform it into data suitable for training.
[0046] Next, the preprocessed data is organized into a training dataset, and the creation of a classification model begins. During this process, key variables are extracted, and their relationships with specific failure modes are analyzed.
[0047] During the time stamp verification process, it is checked whether the time information of the existing labeled data is up to date. If it is not up to date, the data is updated.
[0048] During the model training accuracy evaluation process, the accuracy of the trained classification model is evaluated, and model training is completed when the accuracy is 90% or higher. The classifiers used include SVM (Support Vector Machine).
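The accuracy check can be sketched as a simple gate around the training step. The 90% target comes from the text; the retry loop, the `max_rounds` limit, and the `train_round` and `evaluate` callables are hypothetical placeholders, since the text does not specify what happens when accuracy falls short.

```python
from typing import Any, Callable, Sequence, Tuple

TARGET_ACCURACY = 0.90  # completion criterion stated in the text


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def train_until_accurate(
    train_round: Callable[[], Any],
    evaluate: Callable[[Any], float],
    max_rounds: int = 10,  # assumed safety limit, not from the source
) -> Tuple[Any, float, int]:
    """Repeat training until the evaluated accuracy reaches TARGET_ACCURACY."""
    for round_no in range(1, max_rounds + 1):
        model = train_round()   # e.g. retrain on refreshed labeled data
        acc = evaluate(model)   # e.g. accuracy on held-out labeled data
        if acc >= TARGET_ACCURACY:
            return model, acc, round_no
    raise RuntimeError("target accuracy not reached within max_rounds")
```

Evaluating on data held out from training, rather than on the training set itself, is the usual way to keep this criterion meaningful.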
[0049] In the status diagnosis execution, the generated classification model is utilized to receive new unlabeled data as input and perform a status diagnosis. Through this process, it is determined whether the input data is in a normal or abnormal state, and diagnosis result data regarding the equipment status is generated.
[0050] Furthermore, the diagnostic results generated by the status diagnosis execution unit are visualized in the result output unit or delivered to the user. The output results are provided in the form of graphs, dashboards, or reports, allowing for an intuitive understanding of the equipment status.
[0051] Regarding abnormal state detection and feedback, if an abnormal state is determined based on the diagnostic results, the system updates the degree of abnormality and provides feedback for model training. This allows for the continuous improvement of the system's diagnostic accuracy by incorporating new data. This step-by-step state diagnosis process analyzes and diagnoses equipment status in real time based on operating signal data, thereby enhancing equipment stability and operational efficiency.
[0052] As illustrated in FIG. 3, the state diagnosis proposed in the present invention consists of calculating the degree of abnormality in a data set, creating a learning model using normal and abnormal data as learning data, and diagnosing a new data state based on the generated classification model.
[0053] For reference, the state diagnosis proposed in this invention calculates the degree of abnormality in a data set through the following procedure.
[0054] First, operating signal data is analyzed to establish Joint Probability Density Functions (JPDFs) for each failure mode. This lays the foundation for modeling how specific variables in the data influence abnormal states.
[0055] Second, the data is analyzed using Principal Component Analysis (PCA) or similar methods to select important variables (key parameters) for each failure mode. These variables play a key role in calculating the degree of abnormality.
[0056] Third, a joint probability density function is generated from the selected key variables to represent the distribution of the data set. Because the data set includes both normal and abnormal data, the function can be used to estimate whether newly input data is in a normal or abnormal state.
[0057] Fourth, when new data is input, it is substituted into the joint probability density function to calculate the likelihood. This value indicates the probability that the data belongs to the normal state.
[0058] Finally, the degree of abnormality is calculated as 1 - likelihood. A higher value indicates that the data is closer to an abnormal state.
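The likelihood and 1 - likelihood steps can be sketched as below. The text does not specify the form of the joint probability density function, so a multivariate Gaussian fitted to reference data is assumed here. A raw density value is not bounded by 1, so this sketch also assumes the likelihood is the density divided by its peak value, which keeps it in (0, 1]. The class name `GaussianJointPdf` is hypothetical.

```python
import numpy as np


class GaussianJointPdf:
    """Joint PDF modeled as a multivariate Gaussian (an assumed distribution)."""

    def fit(self, reference_data):
        x = np.asarray(reference_data, dtype=float)
        self.mean_ = x.mean(axis=0)
        # A small ridge keeps the covariance matrix invertible.
        cov = np.cov(x, rowvar=False) + 1e-9 * np.eye(x.shape[1])
        self.precision_ = np.linalg.inv(cov)
        return self

    def likelihood(self, samples):
        """Density relative to its peak: exp(-0.5 * squared Mahalanobis distance)."""
        d = np.atleast_2d(np.asarray(samples, dtype=float)) - self.mean_
        maha_sq = np.einsum("ij,jk,ik->i", d, self.precision_, d)
        return np.exp(-0.5 * maha_sq)

    def anomaly_degree(self, samples):
        """Degree of abnormality as 1 - likelihood, as described in the text."""
        return 1.0 - self.likelihood(samples)
```

With this normalization a sample at the center of the fitted distribution scores 0 and samples far from it approach 1, matching the reading that higher values are closer to an abnormal state.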
[0059] The present invention can diagnose the condition of equipment by expressing data as a joint probability density function and quantifying the degree of abnormality based on likelihood. Through this, the state of new data can be efficiently evaluated, and abnormal conditions can be detected early.
[0060] In addition, the process of FIG. 3 can be reorganized and explained in detail as follows.
[0061] First, when the condition diagnosis process begins, the system enters an initial phase for analyzing the equipment status. In this phase, data is analyzed by failure mode, and a Joint Probability Density Function (JPDF) is generated to model the distribution of key variables influencing each failure mode. The generated JPDF is used to calculate the degree of abnormality in the input data, a task performed by the abnormality calculation module. Finally, the degree of abnormality is quantitatively calculated to evaluate how far the data has deviated from the normal state.
[0062] In the next step, a classification learning model is created. Preliminary training data is prepared to construct a dataset containing both normal and abnormal data, and data standardization and normalization processes are performed to transform the data into a form suitable for training. Subsequently, a classification model is generated based on the training data using supervised learning algorithms (e.g., Support Vector Machines). The prediction accuracy of the trained model is evaluated, and the training process is terminated if the accuracy is 90% or higher. During this process, the time information of the training data is checked and updated if necessary to repeatedly perform training with new data.
[0063] Finally, the predictive diagnosis process begins. Utilizing a trained classification model, new unlabeled data is received as input to evaluate the current status of the equipment. The input data is analyzed by the predictive diagnosis judgment module, which quantitatively determines the equipment status. The diagnosis results are visualized or output in the form of a report and provided to the user. Once all analyses are complete, the predictive diagnosis process is terminated, and the final results are delivered to the user.
[0064] FIG. 4 is a diagram for explaining multivariate probability distribution-based diagnosis by defect mode in an equipment condition diagnosis system using a multivariate probability distribution and a classification model according to one embodiment of the present invention.
[0065] The present invention includes a process of securing training data (data labeling) through diagnosis based on a multivariate joint probability distribution model. A Joint Probability Density Function (Joint-PDF) is used to extract the proportion of abnormal data, or abnormal values, from a data set. For each predefined failure type, explanatory factors (diagnostic parameters) are constructed from the operational data values that affect that failure, and principal variables (n parameters) are extracted by Principal Component Analysis (PCA) from the multivariate data for each failure type. The extracted principal variables are used to generate the Joint Probability Density Function, and when new data is input, its likelihood is calculated. Finally, the complement of the likelihood (1 - Likelihood) is calculated to output the degree of abnormality quantitatively.
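The per-failure-type flow in this paragraph might be organized as in the sketch below, with one model per failure mode. Everything specific here is an assumption: the failure mode names, the column indices in `FAILURE_MODE_COLUMNS`, the Gaussian form of the density in principal-component space, and the peak-normalized likelihood.

```python
import numpy as np

# Hypothetical failure modes and the signal columns assumed to affect each.
FAILURE_MODE_COLUMNS = {"bearing_wear": [0, 1, 2], "misalignment": [2, 3]}


def fit_mode_model(data, n_components):
    """PCA via SVD, then a Gaussian density over the principal-component scores."""
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    variances = np.maximum(s[:n_components] ** 2 / (len(x) - 1), 1e-12)
    return {"mean": mean, "axes": vt[:n_components], "var": variances}


def anomaly_degree(model, sample):
    """1 - likelihood, with likelihood taken as the density relative to its peak."""
    z = (np.asarray(sample, dtype=float) - model["mean"]) @ model["axes"].T
    likelihood = np.exp(-0.5 * np.sum(z ** 2 / model["var"], axis=-1))
    return 1.0 - likelihood


def fit_all_modes(signals, n_components=2):
    """Fit one model per failure mode on that mode's diagnostic parameters."""
    x = np.asarray(signals, dtype=float)
    return {
        mode: fit_mode_model(x[:, cols], n_components)
        for mode, cols in FAILURE_MODE_COLUMNS.items()
    }


def diagnose(models, sample):
    """Degree of abnormality of one sample under each failure mode's model."""
    s = np.asarray(sample, dtype=float)
    return {
        mode: float(anomaly_degree(models[mode], s[FAILURE_MODE_COLUMNS[mode]]))
        for mode in models
    }
```

Because principal-component scores are uncorrelated, the joint Gaussian in that space factorizes into one term per component, which is why the likelihood reduces to a sum of squared scores scaled by the component variances.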
[0066] FIG. 4 shows the multivariate probability distribution-based diagnostic process by failure mode and explains the data analysis and processing flow for equipment status diagnosis.
[0067] First, the system collects and accumulates data based on the equipment's failure modes. This data includes both normal and abnormal states and is stored separately by failure mode. It serves as foundational data for identifying key variables that influence failure states.
[0068] Next, the collected data undergoes preprocessing steps such as missing value handling, outlier removal, and normalization. Through these preprocessing steps, the quality of the data is improved, and it is transformed into a format suitable for training and diagnosis.
[0069] A Joint Probability Density Function (JPDF) is generated based on the preprocessed data. To achieve this, key variables are analyzed and the data distribution is modeled to establish criteria for distinguishing between normal and abnormal states.
[0070] When new data is input into a trained model, it is diagnosed whether the data is in a normal or abnormal state. The new data is evaluated based on the previously learned data distribution.
[0071] The likelihood is calculated for the new input data. The likelihood value represents the probability that the data is in a normal state, and it is calculated from the learned joint probability density function.
[0072] Finally, the degree of abnormality is calculated as 1 - likelihood. It is used to quantitatively evaluate how far the equipment condition deviates from the normal state.
[0073] FIG. 4 visualizes this process and explains the data-driven approach to equipment condition diagnosis step by step. In particular, the accumulation of data by failure mode, preprocessing, generation of a probability density model, likelihood calculation, and calculation of the degree of abnormality are linked in sequence, providing a foundation for evaluating the equipment condition quantitatively and reliably.
[0074] The labeled data, obtained by extracting the proportion of abnormalities by failure mode from the data set, is used as preliminary training data to form a training dataset. The normal and abnormal datasets separated by failure mode are normalized so that their scales are comparable, and key features are extracted. The classifier used in this process is a Support Vector Machine (SVM), which searches for the optimal hyperplane separating the feature values according to the damage status of rotating equipment.
[0075] The present invention includes a series of diagnostic processes that utilize a multivariate probability distribution model to store data and construct an abnormal data set, generate a classification model, and output results, for the purpose of introducing data-based diagnostic technology in an initial environment where data labels are scarce. Cost reduction is possible by automating the labeling process, and it is expected that the performance of condition diagnosis will be improved by linking different diagnostic algorithms.
Claims
1. An equipment condition diagnosis system using a multivariate probability distribution and a classification model, comprising: a data set generation unit (100) that generates training data including normal data and abnormal data; a multivariate probability distribution calculation unit (200) that receives operating signal data, analyzes key variables, calculates a joint probability density function (Joint-PDF), and calculates the degree of abnormality of the data; a label automation unit (300) that automatically assigns normal and abnormal labels based on the calculated degree of abnormality and generates labeled data; a learning model generation unit (400) that generates a classification model capable of distinguishing equipment status based on the labeled data; a condition diagnosis execution unit (500) that uses the generated classification model to analyze input unlabeled data and diagnose the equipment status to generate diagnosis result data; and a result output unit (600) that visualizes the equipment condition based on the diagnosis result data.
2. The equipment condition diagnosis system using a multivariate probability distribution and a classification model of claim 1, wherein the data set generation unit (100) handles missing values in the operating signal data, removes outliers, and normalizes the data to generate a dataset suitable for training.
3. The equipment condition diagnosis system using a multivariate probability distribution and a classification model of claim 1, wherein the multivariate probability distribution calculation unit (200) applies principal component analysis (PCA) to extract the principal variables used to calculate the joint probability density function (Joint-PDF).
4. The equipment condition diagnosis system using a multivariate probability distribution and a classification model of claim 1, wherein the label automation unit (300) automatically assigns normal and abnormal labels by comparing the calculated degree of abnormality with a predefined abnormality threshold.
5. The equipment condition diagnosis system using a multivariate probability distribution and a classification model of claim 1, wherein the learning model generation unit (400) uses a Support Vector Machine (SVM) to generate, from the labeled data, a model capable of classifying equipment status.
6. A method using an equipment condition diagnosis system using a multivariate probability distribution and a classification model, comprising: (a) a step in which the equipment condition diagnosis system generates training data including normal data and abnormal data; (b) a step in which the equipment condition diagnosis system receives operating signal data, analyzes key variables, calculates a joint probability density function (Joint-PDF), and calculates the degree of abnormality of the data; (c) a step in which the equipment condition diagnosis system automatically assigns normal and abnormal labels based on the calculated degree of abnormality and generates labeled data; (d) a step in which the equipment condition diagnosis system generates a classification model capable of distinguishing equipment conditions based on the labeled data; (e) a step in which the equipment condition diagnosis system uses the generated classification model to analyze input unlabeled data and diagnose the equipment condition to generate diagnosis result data; and (f) a step in which the equipment condition diagnosis system visualizes the equipment condition based on the generated diagnosis result data.