A feature-to-indicator influence evaluation method, device, equipment and storage medium

By screening and building feature models, the impact of features on industrial indicators can be quickly assessed, solving the problem of difficulty in assessing feature impact in existing technologies. This provides simple and effective production guidance and improves product quality.

CN115878994B · Active · Publication Date: 2026-06-26 · CISDI INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CISDI INFORMATION TECH CO LTD
Filing Date
2022-11-16
Publication Date
2026-06-26

Abstract

The application provides a method, device, equipment, and storage medium for evaluating the influence of features on indicators. The method comprises the following steps: obtaining an industrial production dataset; performing fitting regression processing on the industrial indicator data and the corresponding feature data in the industrial production dataset to screen out the important feature data of the industrial indicator data; and inputting the industrial indicator data and the important feature data into a preset feature model to obtain an indicator-feature relationship curve, which indicates the correlation between the industrial indicator data and the important feature data and is then used to evaluate the influence of the feature on the indicator. Through the visualized curve, the application can evaluate both the direction and the numerical level of the influence of each important feature on the industrial indicator data, so that the evaluation result can be applied directly and clearly to production guidance.

Description

Technical Field

[0001] This invention relates to the field of data analysis, and specifically to a method, apparatus, device, and storage medium for evaluating the impact of features on indicators.

Background Technology

[0002] In modern industrial product manufacturing, various industrial indicator data determine product quality and performance. With the development of IoT technology, more and more companies are collecting the production process parameters and environmental parameters associated with these industrial indicator data in order to analyze their impact. Using big data and machine learning technologies, the influence trends and degrees of the various correlated features on industrial indicator data can be accurately analyzed and evaluated. This helps industrial production operators analyze the reasons for changes in product indicator values and adjust the parameters of the production process accordingly, keeping product indicator values within a reasonable range and improving product qualification rates. In most large factories, however, the control of industrial indicator data still relies mainly on manual adjustments made by operators based on experience, which is inefficient and demands highly experienced operators.

[0003] There is currently limited research in both academia and industry on the evaluation and analysis of correlated features of industrial production indicators. Existing methods suffer from two main shortcomings. First, most existing methods rely on tree models or neural networks to analyze feature importance. These methods can only determine how important a feature is to the indicator value; they cannot directly assess the direction and level of the feature's direct influence on the indicator, which makes them difficult to apply directly to production guidance. Second, some existing deep learning methods are highly complex and require significant time and training data for retraining when operating conditions change, which poses a considerable challenge for direct application.

Summary of the Invention

[0004] In view of the shortcomings of the prior art described above, which make it difficult to directly assess the direction and level of the direct influence of features on indicators, the present invention provides a method, apparatus, device, and storage medium for evaluating the influence of features on indicators, so that the direction and level of the direct influence of features on indicators can be analyzed quickly.

[0005] This invention provides a method for evaluating the impact of features on indicators, comprising: acquiring an industrial production dataset to be analyzed, the dataset including industrial indicator data and multiple feature data corresponding to the industrial indicator data; filtering the industrial production dataset to obtain important feature data, the filtering including performing fitting regression processing on the industrial indicator data and multiple feature data; inputting the industrial indicator data and important feature data into a preset feature model to obtain an indicator-feature relationship curve to indicate the correlation between the industrial indicator data and the important feature data, thereby evaluating the impact of features on indicators.

[0006] In one embodiment of the present invention, screening important feature data includes training a preset regression model based on each feature data and the industrial indicator data until the preset regression model converges, determining the convergence feature weight coefficients of each feature data; sorting the feature data based on the convergence feature weight coefficients; and determining the feature data that is ranked at a preset sequence position as important feature data.

[0007] In one embodiment of the present invention, the screening of important feature data further includes deleting the feature data whose convergence feature weight coefficient is 0; determining the feature data whose sorting position is at a preset sequence position and whose convergence feature weight coefficient is positive as positively correlated important features; determining the feature data whose sorting position is at a preset sequence position and whose convergence feature weight coefficient is negative as negatively correlated important features; and determining the positively correlated important features and the negatively correlated important features as the important feature data.

[0008] In one embodiment of the present invention, the industrial indicator data and the important feature data are input into a preset feature model to obtain an indicator-feature relationship curve, which includes constructing sub-models for each of the important feature data based on a preset function, wherein the preset function is a polynomial piecewise function; the industrial indicator data and the sub-models are input into the preset feature model, wherein the preset feature model is an additive model; and the feature coefficients of the important features corresponding to each of the sub-models are determined based on the preset feature model, so as to obtain the indicator-feature relationship curve of the important features based on the feature coefficients.

[0009] In one embodiment of the present invention, determining the direction and level of influence of the important feature on the industrial indicator data based on the indicator-feature relationship curve includes: determining the direction of influence of the important feature on the industrial indicator data according to the trend of the indicator-feature relationship curve; and determining the level of influence of the important feature on the industrial indicator data according to the coordinate values of the indicator-feature relationship curve.

[0010] In one embodiment of the present invention, after determining the numerical impact level of the important feature on the industrial indicator data, the feature impact evaluation method further includes: determining the industrial implementation parameter set of the industrial indicator data based on multiple indicator-feature relationship curves of the industrial indicator data; acquiring current real-time industrial production data, the current real-time industrial production data including real-time indicator data and multiple real-time feature data corresponding to the real-time indicator data; and, if the real-time indicator value of the real-time indicator data is less than a target indicator threshold, the target indicator threshold being determined according to a preset correspondence between the real-time indicator data and the industrial indicator data, adjusting the real-time feature data through the industrial implementation parameter set of the industrial indicator data corresponding to the real-time indicator data, so as to improve the indicator level of the adjusted real-time indicator data.

[0011] In one embodiment of the present invention, after obtaining the industrial production dataset to be analyzed, the method for evaluating the impact of features on indicators further includes deleting invalid outliers in the industrial production dataset to be analyzed; deleting data items in the industrial production dataset to be analyzed that are missing data of the industrial indicator; determining the data missing rate of the feature data field in the industrial production dataset to be analyzed; if the data missing rate is greater than a preset threshold, deleting the feature data field; if the data missing rate is less than or equal to the preset threshold, filling the feature data field with data based on the median value.

[0012] This invention also provides a device for evaluating the impact of features on indicators. The device includes a data acquisition module for acquiring an industrial production dataset to be analyzed, the dataset including industrial indicator data and multiple feature data corresponding to the industrial indicator data; a feature selection module for performing fitting regression processing on the industrial indicator data and the multiple feature data corresponding to the industrial indicator data to select at least one feature data as important feature data; and a feature analysis and display module for obtaining an indicator-feature relationship curve from the industrial indicator data and the important feature data corresponding to the industrial indicator data through a preset feature model, and determining the direction and numerical impact level of the important feature on the industrial indicator data based on the indicator-feature relationship curve to evaluate the impact state of the important feature on the industrial indicator data.

[0013] The present invention also provides an electronic device, the electronic device including one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the electronic device to implement the feature impact evaluation method as described in the above embodiments.

[0014] The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a computer processor, causes the computer to perform a method for evaluating the impact of features on indicators as described in the above embodiments.

[0015] The beneficial effects of this invention are as follows: A method, apparatus, device, and storage medium for evaluating the impact of features on indicators. The method includes acquiring an industrial production dataset and performing regression analysis on industrial indicator data and corresponding feature data within the industrial production data to filter out important feature data. The industrial indicator data and important feature data are then input into a preset feature model to obtain an indicator-feature relationship curve, indicating the correlation between the industrial indicator data and the important feature data, thereby evaluating the impact of features on the indicators. The method for selecting important features in this invention is simple and fast, requiring minimal effort in training the feature recognition model. Furthermore, this invention visualizes the industrial indicator data and important feature data through a preset feature model, allowing for a clear assessment of the direction and numerical impact of important features on the indicators, and facilitating the direct application of the assessment results to production guidance.

[0016] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit the invention.

Attached Figure Description

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention. It is obvious that the drawings described below are merely some embodiments of the invention, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort. In the drawings:

[0018] Figure 1 is a schematic diagram of a system for evaluating the impact of features on indicators, as shown in an exemplary embodiment of the present invention;

[0019] Figure 2 is a schematic flowchart of a method for evaluating the impact of features on indicators, as shown in an exemplary embodiment of the present invention;

[0020] Figure 3(a) is a schematic diagram of the yield strength-No. 2 flying shear temperature relationship curve shown in an exemplary embodiment of the present invention;

[0021] Figure 3(b) is a schematic diagram of the yield strength-No. 3 flying shear temperature relationship curve shown in an exemplary embodiment of the present invention;

[0022] Figure 3(c) is a schematic diagram of the yield strength-S element content relationship curve shown in an exemplary embodiment of the present invention;

[0023] Figure 3(d) is a schematic diagram of the yield strength-water tank flow rate relationship curve shown in an exemplary embodiment of the present invention;

[0024] Figure 3(e) is a schematic diagram of the yield strength-No. 18 mill current relationship curve shown in an exemplary embodiment of the present invention;

[0025] Figure 3(f) is a schematic diagram of the yield strength-billet length relationship curve shown in an exemplary embodiment of the present invention;

[0026] Figure 3(g) is a schematic diagram of the relationship between yield strength and Si content in molten steel, illustrating an exemplary embodiment of the present invention.

[0027] Figure 3(h) is a schematic diagram of the relationship between yield strength and C content in molten steel, illustrating an exemplary embodiment of the present invention.

[0028] Figure 3(i) is a schematic diagram of the relationship between yield strength and N content in molten steel, illustrating an exemplary embodiment of the present invention.

[0029] Figure 3(j) is a schematic diagram of the relationship between yield strength and V content in molten steel, illustrating an exemplary embodiment of the present invention.

[0030] Figure 4 is a block diagram of a device for evaluating the impact of features on indicators, as shown in an exemplary embodiment of the present invention;

[0031] Figure 5 is a schematic diagram of a computer system suitable for implementing embodiments of the present invention.

Detailed Implementation

[0032] The embodiments of the present invention will be described below with reference to the accompanying drawings and preferred embodiments. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred embodiments are only for illustrating the present invention and not for limiting the scope of protection of the present invention.

[0033] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0034] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0035] First, it should be clarified that regression modeling is a predictive modeling technique that studies the relationship between a dependent variable (target) and independent variables (predictors). This technique is commonly used in predictive analytics, time series modeling, and the discovery of causal relationships between variables.

[0036] Generalized additive models are a type of flexible statistical model that can be used to detect the effects of nonlinear regression. Although linear models are simple, intuitive and easy to understand, in real life, the effects of variables are usually not linear, and the linear assumption may not meet the actual needs or even directly violate the actual situation. In this case, generalized additive models can solve the problem well.

[0037] Existing methods for analyzing industrial indicator data mostly utilize big data and machine learning technologies. However, these methods, based on tree models or neural networks, analyze the important features corresponding to industrial indicator data. These methods can only determine the degree of importance of the feature to the indicator value, but cannot directly assess the direction and level of the feature's direct impact on the indicator, making them difficult to directly apply to production guidance. Furthermore, analysis using deep learning methods is highly complex, and retraining under changing operating conditions is time-consuming, making direct application difficult. To address these issues, embodiments of the present invention propose a method for evaluating the impact of features on indicators, a device for evaluating the impact of features on indicators, an electronic device, and a computer-readable storage medium. These embodiments will be described in detail below.

[0038] Referring to Figure 1, Figure 1 is a schematic diagram of a system for evaluating the impact of features on indicators according to an exemplary embodiment of the present invention. In one embodiment, the system consists of two ends: a data acquisition unit 101 and a computer device 102. The computer device 102 can be at least one of a microcomputer, an embedded computer, or a neural network computer, while the data acquisition unit 101 can be an IoT sensor device or another device with data acquisition capabilities. The computer device 102 is used to perform fitting regression processing on the industrial production dataset, filter out important feature data, and then input the industrial indicator data and important feature data into a preset feature model to obtain an indicator-feature relationship curve, which indicates the correlation between the industrial indicator data and the important feature data and is used to evaluate the influence of the feature on the indicator. The data acquisition unit 101 is used to collect the industrial production dataset and other required data, and to input the collected data into the computer device 102.

[0039] Referring to Figure 2, Figure 2 is a schematic flowchart of a method for evaluating the impact of features on indicators, as shown in an exemplary embodiment of the present invention. In one exemplary embodiment, the method for evaluating the impact of features on indicators includes at least steps S210 to S240, which are described in detail below:

[0040] Step S210: Obtain the industrial production dataset to be analyzed. The industrial production dataset to be analyzed includes industrial indicator data and multiple feature data corresponding to the industrial indicator data.

[0041] In one embodiment of the present invention, an industrial production dataset is generated by collecting industrial indicator data and multiple feature datasets corresponding to the industrial indicator data using IoT sensor devices. The industrial indicator data includes performance values, quality values, and other data of interest in industrial product production. The feature data consists of parameters that may affect the industrial indicator data during the production process, such as production parameters like temperature, composition, and equipment settings.

[0042] Step S220: Perform regression fitting on the industrial production dataset to be analyzed and select important feature data.

[0043] In one embodiment of the present invention, the method for determining important feature data is as follows: training a preset regression model based on each feature dataset and industrial indicator data until the preset regression model converges; determining the convergence feature weight coefficients of each feature dataset; sorting the feature datasets based on the convergence feature weight coefficients; deleting feature data with convergence feature weight coefficients of 0; determining feature data that are ranked at a preset position in the sequence and have positive convergence feature weight coefficients as positively correlated important features; determining feature data that are ranked at a preset position in the sequence and have negative convergence feature weight coefficients as negatively correlated important features; and determining both positively correlated and negatively correlated important features as important feature data.

[0044] In one embodiment of the present invention, the Lasso model (Least Absolute Shrinkage and Selection Operator, regression model) is used to fit and regress industrial indicator data and feature datasets to obtain convergent feature weight coefficients. Then, important feature data are determined based on the convergent weight coefficients. Specifically, the following steps are included:

[0045] Initialize the Lasso model. The loss function J(β) of the Lasso model is defined as:

[0046]

[0047] In equation (1), $N$ is the number of data entries in the industrial production dataset, $m$ is the number of feature data, $y_i$ is the i-th industrial indicator data, $x_{ij}$ is the j-th feature data corresponding to the i-th industrial indicator data, and $\beta_j$ is the convergence feature weight coefficient of the j-th feature data, where $i = 1, \ldots, N$ and $j = 1, \ldots, m$; $\lambda$ is the L1 penalty coefficient. The 1-norm penalty term restricts the range of the convergence feature weight coefficients, thereby achieving dimensionality reduction and shrinkage and realizing the selection of important features.
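
A standard form of the Lasso loss that is consistent with the symbols defined for equation (1) can be sketched as follows. This is an illustration rather than the patent's own typesetting: the $1/(2N)$ scaling of the squared-error term and the omission of an intercept (which presumes centered data) are assumptions, since conventions differ on both points.

```latex
J(\beta) = \frac{1}{2N} \sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{m} x_{ij}\,\beta_j \Bigr)^{2}
         + \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert
```

The first term measures the fit to the industrial indicator data, and the second term is the 1-norm penalty that drives the weights of unimportant features to exactly zero.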

[0048] A Lasso model is trained using the N industrial indicator data and their corresponding feature data. After the model converges, the convergence feature weight coefficients $\beta_1, \ldots, \beta_j, \ldots, \beta_m$ corresponding to the m feature data are obtained.

[0049] The important feature data are selected based on the above m convergence feature weight coefficients. The selection steps include: first, excluding feature data whose convergence feature weight coefficient is 0; then, sorting the remaining feature data; selecting the n1 feature data with the largest positive convergence feature weight coefficients as the positively correlated important feature set; and selecting the n2 feature data with the smallest negative convergence feature weight coefficients as the negatively correlated important feature set. Finally, the selected important features are the union of the positively correlated important feature set and the negatively correlated important feature set.
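
The screening steps above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: it assumes a loss of the form (1/2N)·squared error + λ·L1 penalty, assumes the features and indicator are already centered (so no intercept is fitted), and solves the model by coordinate descent. The function names, the feature names `feat_a` to `feat_c`, and the toy data are all hypothetical.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator induced by the L1 penalty."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2N) * sum_i (y_i - sum_j x_ij * b_j)^2 + lam * sum_j |b_j|.

    Assumes centred columns and a centred target (no intercept term).
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def select_important(names, beta, n_pos, n_neg):
    """Drop zero weights, then keep the n_pos largest positive and n_neg most negative."""
    nonzero = [(b, name) for name, b in zip(names, beta) if b != 0.0]
    positive = sorted((p for p in nonzero if p[0] > 0), reverse=True)[:n_pos]
    negative = sorted(p for p in nonzero if p[0] < 0)[:n_neg]
    return [name for _, name in positive], [name for _, name in negative]


# Toy data (hypothetical): three orthogonal, centred features; only two matter.
X = [[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
     [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1]]
names = ["feat_a", "feat_b", "feat_c"]
y = [3 * a - 2 * b for a, b, _ in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
positive, negative = select_important(names, beta, n_pos=1, n_neg=1)
print(beta, positive, negative)
```

In this toy case the penalty shrinks the two informative weights toward zero and sets the weight of the irrelevant feature to exactly zero, which is the behavior the selection step relies on.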

[0050] Step S230: Input industrial indicator data and important feature data into a preset feature model to obtain an indicator-feature relationship curve, which indicates the correlation between industrial indicator data and important feature data, and then evaluates the impact of features on indicators.

[0051] In one embodiment of the present invention, sub-models for each important feature data are constructed based on a preset function, wherein the preset function is a polynomial piecewise function; industrial indicator data and sub-models are input into a preset feature model, wherein the preset feature model is an additive model; feature coefficients of important features corresponding to each sub-model are determined based on the preset feature model, so as to obtain the indicator-feature relationship curve of important features based on the feature coefficients. The specific steps are as follows:

[0052] Initialize the generalized additive model. The expression of the generalized additive model is defined as follows:

[0053]

[0054] In equation (2), $f_j(x_{ij})$ is a univariate function of important feature data j, and $b$ is the intercept. A separate sub-model $f_j(x_{ij})$ is established for each important feature data j. In this embodiment, $f_j(x_{ij})$ is constructed mainly using spline functions.
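
A common way to write an additive model that matches the symbols defined for equation (2) is sketched below. It is an illustration under stated assumptions, not the patent's own typesetting: an identity link is assumed, $n$ denotes the number of important features, and the residual term $\varepsilon_i$ is introduced here for completeness.

```latex
y_i = b + \sum_{j=1}^{n} f_j(x_{ij}) + \varepsilon_i, \qquad i = 1, \ldots, N
```

Because each $f_j$ depends on a single feature, plotting $f_j$ against that feature gives one curve per important feature.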

[0055] To solve the generalized additive model, this embodiment employs a local scoring algorithm. In practical applications, Python or R toolkits can be used directly. The solution yields the spline function $f_j(x_{ij})$ corresponding to each important feature data j; the coefficients of the spline function $f_j(x_{ij})$ are the feature coefficients of that important feature.

[0056] Based on the feature coefficients of the spline functions $f_j(x_{ij})$ of the n important features, the indicator-feature relationship curve of each important feature is obtained through visualization. The indicator-feature relationship curve indicates the degree of impact of changes in each important feature on the value of the industrial indicator data.
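
How an additive model yields one curve per feature can be illustrated with a small sketch. It is deliberately simplified and is not the embodiment's solver: the model is fitted by backfitting, and each sub-model is a piecewise-constant function (bin means over equal-width intervals) standing in for the spline functions. All function names and the toy data are hypothetical.

```python
def bin_index(x, n_bins):
    """Assign each value of x to one of n_bins equal-width intervals."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in x]


def fit_additive(columns, y, n_bins=5, n_iter=20):
    """Fit y ~ b + sum_j f_j(x_j) by backfitting with bin-mean smoothers.

    Returns the intercept b and, for each feature, the fitted value of
    f_j on every bin (the points of that feature's relationship curve).
    """
    n = len(y)
    b = sum(y) / n
    bins = [bin_index(x, n_bins) for x in columns]
    curves = [[0.0] * n_bins for _ in columns]
    for _ in range(n_iter):
        for j, idx in enumerate(bins):
            sums, counts = [0.0] * n_bins, [0] * n_bins
            for i in range(n):
                # Partial residual: remove the intercept and all other sub-models.
                others = sum(curves[k][bins[k][i]]
                             for k in range(len(columns)) if k != j)
                sums[idx[i]] += y[i] - b - others
                counts[idx[i]] += 1
            means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
            # Centre f_j over the data so that the intercept stays identifiable.
            centre = sum(means[k] for k in idx) / n
            curves[j] = [mu - centre for mu in means]
    return b, curves


# Toy data (hypothetical): the indicator rises with x1 and falls with x2.
grid = [(u, v) for u in range(10) for v in range(10)]
x1 = [float(u) for u, _ in grid]
x2 = [float(v) for _, v in grid]
y = [10.0 + 3.0 * u - 2.0 * v for u, v in grid]

intercept, (curve_x1, curve_x2) = fit_additive([x1, x2], y)
print(intercept, curve_x1, curve_x2)
```

The fitted values in `curve_x1` increase from the first bin to the last and those in `curve_x2` decrease, so the sign of the trend and the size of the steps can be read off each curve separately. A spline-based toolkit would produce smooth curves in place of these step functions.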

[0057] In one embodiment of the present invention, after collecting the industrial production dataset to be analyzed, the method for evaluating the impact of features on indicators further includes: deleting invalid outliers in the industrial production dataset to be analyzed; deleting data items with missing industrial indicator data in the industrial production dataset to be analyzed; determining the data missing rate of the feature data field in the industrial production dataset to be analyzed; if the data missing rate is greater than a preset threshold, deleting the feature data field; if the data missing rate is less than or equal to the preset threshold, filling the feature data field with data based on the median value.

[0058] In one embodiment of the present invention, invalid outliers are removed using a clustering algorithm (DBSCAN), including clustering the feature data field; if outlier data points exist in the feature data field after clustering, the data at the outlier data points are determined to be invalid outliers; and all invalid outliers in the feature data field are deleted.
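
The outlier step can be illustrated for a single feature data field. The sketch below is an assumption-laden simplification: it applies the DBSCAN noise definition (a point is noise if it is neither a core point nor within `eps` of a core point) to one-dimensional values with a brute-force neighbor search. The parameter values and the sample temperatures are hypothetical.

```python
def dbscan_noise_1d(values, eps, min_samples):
    """Return the indices that DBSCAN would label as noise for 1-D data.

    A point is a core point if at least min_samples points (itself
    included) lie within eps of it. Noise points are non-core points
    with no core point in their eps-neighbourhood.
    """
    n = len(values)
    neighbours = [
        [k for k in range(n) if abs(values[i] - values[k]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbours]
    return [
        i for i in range(n)
        if not is_core[i] and not any(is_core[k] for k in neighbours[i])
    ]


# Hypothetical temperature readings with one invalid value.
temps = [851.0, 850.0, 852.0, 849.5, 850.5, 990.0]
noise = dbscan_noise_1d(temps, eps=2.0, min_samples=3)
cleaned = [v for i, v in enumerate(temps) if i not in noise]
print(noise, cleaned)
```

In practice `eps` and `min_samples` would have to be tuned per field, since a threshold that suits a temperature reading will not suit an element content.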

[0059] In one embodiment of the present invention, if the missing rate of a feature data field in the industrial production dataset to be analyzed is greater than 10%, the field is deleted directly; the missing values in each remaining feature data field are then filled with the median value of that field.
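
The missing-data rules can be sketched as follows. This is a minimal illustration that assumes each data entry is a dictionary and that a missing value is represented by `None`; the function name, the field names, and the sample rows are hypothetical, while the 10% threshold comes from the embodiment above.

```python
from statistics import median


def preprocess(rows, indicator, max_missing_rate=0.10):
    """Apply the missing-data rules to a list of dict records.

    1. Drop entries whose indicator value is missing.
    2. Drop feature fields whose missing rate exceeds max_missing_rate.
    3. Fill the remaining gaps with the median of the respective field.
    """
    rows = [r for r in rows if r.get(indicator) is not None]
    if not rows:
        return []
    fill_values = {}
    for field in rows[0]:
        if field == indicator:
            continue
        present = [r[field] for r in rows if r.get(field) is not None]
        missing_rate = (len(rows) - len(present)) / len(rows)
        if not present or missing_rate > max_missing_rate:
            continue  # field is dropped
        fill_values[field] = median(present)
    cleaned = []
    for r in rows:
        record = {indicator: r[indicator]}
        for field, fill in fill_values.items():
            record[field] = r[field] if r.get(field) is not None else fill
        cleaned.append(record)
    return cleaned


# Hypothetical records: "a" has 10% missing (kept), "b" has 20% missing (dropped).
rows = [{"y": float(i), "a": float(i), "b": float(i)} for i in range(10)]
rows[3]["a"] = None
rows[4]["b"] = None
rows[5]["b"] = None
rows.append({"y": None, "a": 1.0, "b": 1.0})  # indicator missing: entry dropped

clean = preprocess(rows, indicator="y")
print(len(clean), clean[3])
```

The median is preferred over the mean here presumably because it is less sensitive to any extreme values that survive outlier removal.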

[0060] In one embodiment of the present invention, after obtaining the indicator-feature relationship curve in step S230, the method for evaluating the influence of features on indicators further includes:

[0061] The maximum value of the industrial indicator data on the indicator-feature relationship curve is determined as the preset indicator threshold of the industrial indicator data; the important feature data corresponding to the preset indicator threshold on the indicator-feature relationship curve is determined as the industrial implementation reference value; and the industrial implementation reference values determined from the indicator-feature relationship curves of the respective important features together form the industrial implementation parameter set of the industrial indicator data.

[0062] The system acquires current real-time industrial production data, which includes real-time indicator data and multiple real-time feature data corresponding to the real-time indicator data. The target indicator threshold is determined according to the preset correspondence between the real-time indicator data and the industrial indicator data. If the real-time indicator value of the real-time indicator data is less than the target indicator threshold, the system adjusts the real-time feature data using the industrial implementation parameter set of the industrial indicator data corresponding to the real-time indicator data, so as to improve the indicator level of the adjusted real-time indicator data.
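
The two paragraphs above can be sketched as a small routine. It assumes that each indicator-feature relationship curve is available as a list of (feature value, indicator value) points and that the target indicator threshold is supplied by the caller; the function names, feature names, and numbers are hypothetical.

```python
def reference_value(curve):
    """Return the (feature value, indicator value) point with the highest indicator."""
    return max(curve, key=lambda point: point[1])


def parameter_set(curves):
    """Collect one reference feature value per important feature."""
    return {name: reference_value(curve)[0] for name, curve in curves.items()}


def suggest_adjustments(realtime_features, realtime_indicator, target, params):
    """Suggest how far each feature should move toward its reference value.

    Returns an empty dict when the indicator already meets the target.
    """
    if realtime_indicator >= target:
        return {}
    return {
        name: params[name] - value
        for name, value in realtime_features.items()
        if name in params
    }


# Hypothetical curves for two important features.
curves = {
    "feat_a": [(1.0, 400.0), (2.0, 420.0), (3.0, 410.0)],
    "feat_b": [(0.1, 430.0), (0.2, 415.0)],
}
params = parameter_set(curves)
moves = suggest_adjustments(
    {"feat_a": 1.5, "feat_b": 0.2},
    realtime_indicator=405.0,
    target=415.0,
    params=params,
)
print(params, moves)
```

A positive entry in `moves` means the feature should be raised and a negative entry means it should be lowered. Whether such an adjustment is feasible on the production line is outside the scope of this sketch.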

[0063] Referring to Figures 3(a)-3(j), Figures 3(a)-3(j) are schematic diagrams of the indicator-feature relationship curves for multiple important features, as shown in an exemplary embodiment of the present invention. In one embodiment, the yield strength of the steel from a production line in the bar and wire rolling mill of a steel plant is selected as the industrial indicator data. The corresponding feature datasets include the composition of the molten steel, the furnace temperature, the rolling temperature during wire rolling, the water tank flow rate, and so on. The collected industrial indicator data and feature datasets are used as the industrial production dataset to be analyzed and are preprocessed. The preprocessing includes: deleting invalid outliers from the industrial production dataset to be analyzed; deleting data items with missing industrial indicator data from the industrial production dataset to be analyzed; and determining the data missing rate of each feature data field in the industrial production dataset to be analyzed. If the missing rate of a feature data field is greater than 10%, the field is deleted directly; the missing values in each remaining feature data field are then filled with the median value of that field.

[0064] After preprocessing, the industrial production dataset to be analyzed contains 2113 data entries, i.e., N=2113, and 31 feature data remain, i.e., m=31. The Lasso model in the above embodiment is used to fit and regress the industrial production dataset to be analyzed to determine the important feature set. The number of positively correlated important features n1 is set to 6, and the number of negatively correlated important features n2 is set to 4, so that 10 important features are finally selected for analysis and evaluation. The positively correlated important features are ['mill_electricity_18','billet_len','si_element','c_element','n_element','v_element'], which correspond to the current of mill No. 18, the length of the billet, and the content of Si, C, N, and V in the molten steel, respectively. The negatively correlated important features are ['snip_two_temp','snip_three_temp_mean','s_element','water_tank_flow_4'], which correspond to the temperature of flying shear No. 2, the temperature of flying shear No. 3, the content of S element, and the flow rate of water tank No. 4, respectively.

[0065] Based on the generalized additive model in the above embodiments, a sub-model is established for each of the 10 important features, yielding the index-feature relationship curves of the 10 important features. Figure 3(a) shows the relationship curve between yield strength and the temperature of flying shear No. 2; Figure 3(b) shows the relationship curve between yield strength and the temperature of flying shear No. 3; Figure 3(c) shows the yield strength-S element content relationship curve; Figure 3(d) shows the yield strength-water tank flow rate relationship curve; Figure 3(e) shows the yield strength-rolling mill current relationship curve; Figure 3(f) shows the yield strength-billet length relationship curve; Figure 3(g) shows the yield strength-Si element content relationship curve in molten steel; Figure 3(h) shows the yield strength-C element content relationship curve in molten steel; Figure 3(i) shows the yield strength-N element content relationship curve in molten steel; Figure 3(j) shows the yield strength-V element content relationship curve in molten steel.

[0066] From Figures 3(a)-3(j), it can be seen that, for the positively correlated important features, the industrial indicator data generally increases as the important feature increases, while for the negatively correlated important features the industrial indicator data generally decreases as the important feature increases. This is consistent with the features selected by the Lasso model. Furthermore, Figures 3(a)-3(j) also show how much a change in each important feature affects the industrial index data, here the yield strength. For example, as shown in Figure 3(c), for s_element, i.e., the S element content, an increase of 0.01 in S element content causes the yield strength to decrease by about 4 MPa.
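Reading the influence direction and the numerical influence level from a curve can be sketched as follows. The curve points are hypothetical and were chosen only to reproduce the example of about 4 MPa per 0.01 of S content; to keep the arithmetic exact, the x axis is expressed in units of 0.01. The function name and the use of linear interpolation between sampled points are assumptions.

```python
def direction_and_level(curve, x0, dx):
    """Return the influence direction and the change in the indicator when
    the feature moves from x0 to x0 + dx along a sampled curve."""
    def at(x):
        # Linear interpolation between neighbouring curve points.
        for (xa, ya), (xb, yb) in zip(curve, curve[1:]):
            if xa <= x <= xb:
                return ya + (yb - ya) * (x - xa) / (xb - xa)
        raise ValueError("x lies outside the sampled curve")
    change = at(x0 + dx) - at(x0)
    if change > 0:
        direction = "positive"
    elif change < 0:
        direction = "negative"
    else:
        direction = "flat"
    return direction, change

# Hypothetical curve: x is S content in units of 0.01,
# y is the contribution to yield strength in MPa.
s_curve = [(1, 6.0), (2, 2.0), (3, -2.0), (4, -6.0)]
direction, change = direction_and_level(s_curve, x0=2, dx=1)
```

Here `direction` is "negative" and `change` is -4.0, i.e., one step of 0.01 in S content lowers the yield strength by 4 MPa on this illustrative curve.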

[0067] In this embodiment, by evaluating the impact of features on indicators, steel mill operators can be assisted in analyzing the reasons for the changes in steel yield strength, thereby reasonably controlling various parameters in the production process to keep the yield strength within a reasonable range and improve the steel qualification rate. Specific methods include obtaining real-time data of current industrial production. If the yield strength is lower than the reasonable range at this time, the yield strength is increased by reducing the sulfur content.
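As a rough illustration of how such guidance might be quantified, the sketch below estimates how far a feature would have to move to bring the indicator back to the lower bound of its range. It assumes the relationship curve is approximately linear near the operating point and that the other features stay unchanged. The slope of -400 MPa per unit of S content follows from the 4 MPa per 0.01 example above; the current value, the lower bound, and the function name are hypothetical.

```python
def required_change(current, lower_bound, slope):
    """Feature change needed to raise the indicator to lower_bound,
    assuming a locally linear index-feature curve with the given slope."""
    if current >= lower_bound:
        return 0.0                      # already within range, nothing to do
    if slope == 0:
        raise ValueError("this feature has no effect on the indicator")
    return (lower_bound - current) / slope

# 4 MPa per 0.01 of S content, decreasing: slope = -400 MPa per unit.
delta_s = required_change(current=412.0, lower_bound=420.0, slope=-400.0)
```

A negative `delta_s` means the S content should be reduced; for this hypothetical deficit of 8 MPa the result is a reduction of about 0.02.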

[0068] In addition, the preset industrial implementation parameters are adjusted according to the curves shown in Figures 3(a)-3(j), so as to improve the real-time industrial production data.

[0069] In embodiments of the present invention, by sorting the convergence feature weight coefficients to select important feature data, features that have a greater impact on industrial indicator data can be selected as important features.

[0070] In embodiments of the present invention, by determining positively correlated important features and negatively correlated important features to finally determine important feature data, the degree of influence and numerical influence level of features on indicators can be analyzed from two dimensions of influence: positive and negative.

[0071] In embodiments of the present invention, an index-feature relationship curve is obtained based on a multi-segment piecewise function and an additive model, thereby indicating the degree of influence of features on the index and the level of numerical influence through the curve.
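How piecewise-polynomial sub-models combine in an additive model can be sketched as below. The sketch only evaluates a model whose coefficients are already known; it does not fit them. The truncated-power spline basis, the knot positions, the coefficient values, and all function names are assumptions chosen for illustration, and degree 1 is used in the example so that the numbers are easy to check by hand.

```python
def spline_basis(x, knots, degree=3):
    """Truncated-power basis: x, ..., x^d, then (x - k)_+^d for each knot."""
    powers = [x ** p for p in range(1, degree + 1)]
    hinges = [max(x - k, 0.0) ** degree for k in knots]
    return powers + hinges

def sub_model(x, coefs, knots, degree=3):
    """Piecewise-polynomial sub-model of one feature."""
    return sum(c * b for c, b in zip(coefs, spline_basis(x, knots, degree)))

def predict(intercept, sample, models):
    """Additive model: intercept plus the sum of per-feature sub-models."""
    return intercept + sum(sub_model(sample[name], *spec)
                           for name, spec in models.items())

def curve(spec, lo, hi, steps=5):
    """Sample one sub-model on a grid to obtain points of its curve."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [(x, sub_model(x, *spec)) for x in xs]

# Hypothetical sub-models: feature -> (coefficients, knots, degree).
models = {
    "s_element": ([-4.0, 1.0], [2.0], 1),   # slope -4, flattening after x = 2
    "c_element": ([3.0], [], 1),            # straight line with slope 3
}
y_hat = predict(420.0, {"s_element": 3, "c_element": 2}, models)
s_points = curve(models["s_element"], 0, 4)
```

Because each sub-model depends on a single feature, plotting the points returned by `curve` for one feature gives that feature's relationship curve independently of the others.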

[0072] In embodiments of the present invention, the influence direction and numerical influence level of important features on industrial indicator data are determined by analyzing the trend and coordinate values of the indicator-feature relationship curve.

[0073] In embodiments of the present invention, the industrial implementation parameter set is determined based on information obtained from the index-feature relationship curve, providing theoretical guidance for parameter values in actual production and thereby improving the index data of industrial production. The real-time feature data is then adjusted based on the industrial implementation parameter set to raise the level of the real-time index data in actual production.

[0074] In embodiments of the present invention, invalid data in the industrial production dataset to be analyzed is removed by preprocessing the data, including deleting invalid outliers, data items with missing industrial indicator data, feature data fields with high data missing rates, and supplementing feature data fields, thus facilitating subsequent steps.

[0075] Please refer to Figure 4, which is a block diagram of a device for evaluating the impact of features on indicators according to an exemplary embodiment of the present invention. The device includes a data acquisition module 401, a feature selection module 402, and a feature analysis and display module 403.

[0076] The data acquisition module 401 is used to acquire the industrial production dataset to be analyzed, which includes industrial indicator data and multiple feature data corresponding to the industrial indicator data.

[0077] The feature selection module 402 is used to perform fitting regression processing on the industrial indicator data and multiple feature data corresponding to the industrial indicator data, so as to select at least one feature data as important feature data.

[0078] The feature analysis and display module 403 is used to obtain the indicator-feature relationship curve by using a preset feature model to analyze the industrial indicator data and the important feature data corresponding to the industrial indicator data. Based on the indicator-feature relationship curve, the module determines the direction and level of influence of important features on the industrial indicator data in order to evaluate the influence status of important features on the industrial indicator data.
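The division into three modules can be pictured as a small pipeline. The sketch below is only one possible arrangement: the class name, the callable signatures, and the placeholder lambdas are assumptions, and the placeholders do not perform the fitting regression or the model analysis described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Dataset = List[Dict[str, float]]
Curve = List[Tuple[float, float]]

@dataclass
class FeatureImpactDevice:
    acquire: Callable[[], Dataset]                                   # module 401
    select: Callable[[Dataset, str], List[str]]                      # module 402
    analyze: Callable[[Dataset, str, List[str]], Dict[str, Curve]]   # module 403

    def run(self, target: str) -> Dict[str, Curve]:
        data = self.acquire()                   # dataset to be analyzed
        important = self.select(data, target)   # important feature names
        return self.analyze(data, target, important)

# Placeholder modules with hypothetical data, for illustration only.
device = FeatureImpactDevice(
    acquire=lambda: [
        {"yield_strength": 420.0, "s_element": 0.02},
        {"yield_strength": 416.0, "s_element": 0.03},
    ],
    select=lambda data, target: [k for k in data[0] if k != target],
    analyze=lambda data, target, feats: {
        f: sorted((r[f], r[target]) for r in data) for f in feats
    },
)
curves = device.run("yield_strength")
```

Keeping the modules as interchangeable callables reflects the statement that the functions may be assigned to different functional modules as needed.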

[0079] It should be noted that the feature impact evaluation device and the feature impact evaluation method provided in the above embodiments belong to the same concept. The specific manner in which each module and unit operates has been described in detail in the method embodiments and is not repeated here. In practical applications, the functions of the feature impact evaluation device provided in the above embodiments can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. No limitation is imposed here.

[0080] Embodiments of the present invention also provide an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the method for evaluating the impact of features on indicators provided in the above embodiments.

[0081] Figure 5 shows a schematic diagram of a computer system suitable for implementing embodiments of the present invention. It should be noted that the computer system 500 of the electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present invention.

[0082] As shown in Figure 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, based on programs stored in Read-Only Memory (ROM) 502 or programs loaded from storage section 508 into Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data required for system operation. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An Input/Output (I/O) interface 505 is also connected to the bus 504.

[0083] The following components are connected to I / O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to I / O interface 505 as needed. Removable media 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on drive 510 as needed so that computer programs read from them can be installed into storage section 508 as needed.

[0084] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, it performs the various functions defined in the system of the present invention.

[0085] It should be noted that the computer-readable medium shown in the embodiments of the present invention can be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a computer-readable computer program. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media can also be any computer-readable medium other than computer-readable storage media, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wired, etc., or any suitable combination thereof.

[0086] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. Each block in a flowchart or block diagram may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0087] The units described in the embodiments of the present invention can be implemented in software or hardware, and the described units can also be located in a processor. The names of these units do not necessarily limit the specific unit itself.

[0088] Another aspect of the present invention provides a computer-readable storage medium storing a computer program thereon, which, when executed by a computer processor, causes the computer to perform the aforementioned method for evaluating the impact of features on indicators. This computer-readable storage medium may be included in the electronic device described in the above embodiments, or it may exist independently and not incorporated into the electronic device.

[0089] Another aspect of the present invention provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the feature impact evaluation method for indicators provided in the various embodiments described above.

[0090] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.

Claims

1. A method for evaluating the impact of a feature on an indicator, characterized in that the method comprises: obtaining an industrial production dataset to be analyzed, which includes industrial indicator data and multiple feature data corresponding to the industrial indicator data; screening the industrial production dataset to be analyzed to obtain important feature data, wherein the screening includes performing fitting regression processing on the industrial indicator data and the multiple feature data; constructing a sub-model for each important feature data based on a preset function, wherein the preset function is a polynomial piecewise function; inputting the industrial indicator data and the sub-models into a preset feature model, wherein the preset feature model includes a generalized additive model expressed as $y_i = \beta_0 + \sum_{j=1}^{n} f_j(x_{ij})$, where $y_i$ is the $i$-th industrial indicator data, $f_j$ is the univariate function of the important feature data $x_j$, and $\beta_0$ is the intercept; establishing a different sub-model $f_j$ for each important feature data $x_j$, wherein the construction of the sub-model $f_j$ includes construction based on spline functions; solving the generalized additive model to obtain the coefficients of the sub-model $f_j$ corresponding to each important feature data $x_j$, and determining the coefficients of the sub-model $f_j$ as feature coefficients; and visualizing, based on the feature coefficients, the sub-models $f_j$ corresponding to the $n$ important feature data to obtain an indicator-feature relationship curve for each important feature data, the curve indicating the correlation between the industrial indicator data and the important feature data, thereby evaluating the impact of the feature on the indicator.

2. The method for evaluating the impact of features on indicators according to claim 1, characterized in that, Filtering important feature data includes: A preset regression model is trained based on the aforementioned feature data and the aforementioned industrial indicator data until the preset regression model converges, and the convergence feature weight coefficients of the aforementioned feature data are determined. The feature data are sorted based on the convergence feature weight coefficients; The feature data that is sorted at a preset sequence position is identified as important feature data.

3. The method for evaluating the impact of features on indicators according to claim 2, characterized in that screening to obtain the important feature data further includes: deleting the feature data whose convergence feature weight coefficient is 0; identifying the feature data that is ranked within a preset number of positions in the sequence and whose convergence feature weight coefficient is positive as positively correlated important features; identifying the feature data that is ranked within a preset number of positions in the sequence and whose convergence feature weight coefficient is negative as negatively correlated important features; and determining the positively correlated important features and the negatively correlated important features as the important feature data.

4. The method for evaluating the impact of features on indicators according to any one of claims 1-3, characterized in that determining the direction and level of influence of the important features on the industrial indicator data based on the indicator-feature relationship curve includes: determining the direction of the influence of the important feature on the industrial indicator data based on the trend of the indicator-feature relationship curve; and determining the influence level of the important feature on the industrial indicator data based on the coordinate values of the indicator-feature relationship curve.

5. The method for evaluating the impact of features on indicators according to any one of claims 1-3, characterized in that, after determining the numerical influence level of the important features on the industrial indicator data, the method further includes: determining an industrial implementation parameter set of the industrial indicator data based on multiple indicator-feature relationship curves of the industrial indicator data; acquiring real-time data of current industrial production, which includes real-time indicator data and multiple real-time feature data corresponding to the real-time indicator data; and, if the real-time indicator value of the real-time indicator data is less than a target indicator threshold, wherein the target indicator threshold is determined according to a preset correspondence between the real-time indicator data and the industrial indicator data, adjusting the real-time feature data using the industrial implementation parameter set of the industrial indicator data corresponding to the real-time indicator data, so as to improve the indicator level of the adjusted real-time indicator data.

6. The method for evaluating the impact of features on indicators according to any one of claims 1-3, characterized in that, After obtaining the industrial production dataset to be analyzed, the method for evaluating the impact of features on indicators further includes: Remove invalid outliers from the industrial production dataset to be analyzed; Delete the data items in the industrial production dataset to be analyzed that are missing the industrial indicator data; Determine the data missing rate of the characteristic data fields in the industrial production dataset to be analyzed; If the data missing rate is greater than a preset threshold, then the feature data field is deleted; If the data missing rate is less than or equal to a preset threshold, then the feature data field is filled with data based on the median value.

7. A device for evaluating the impact of a feature on an indicator, characterized in that the device includes: a data acquisition module, used to acquire an industrial production dataset to be analyzed, which includes industrial indicator data and multiple feature data corresponding to the industrial indicator data; a feature selection module, used to perform fitting regression processing on the industrial indicator data and the multiple feature data corresponding to the industrial indicator data, so as to select at least one feature data as important feature data; and a feature analysis and display module, used to construct a sub-model for each important feature data based on a preset function, wherein the preset function is a polynomial piecewise function, and to input the industrial indicator data and the sub-models into a preset feature model, wherein the preset feature model includes a generalized additive model expressed as $y_i = \beta_0 + \sum_{j=1}^{n} f_j(x_{ij})$, where $y_i$ is the $i$-th industrial indicator data, $f_j$ is the univariate function of the important feature data $x_j$, and $\beta_0$ is the intercept; a different sub-model $f_j$ is established for each important feature data $x_j$, and the construction of the sub-model $f_j$ includes construction based on spline functions; the generalized additive model is solved to obtain the coefficients of the sub-model $f_j$ corresponding to each important feature data $x_j$, and the coefficients of the sub-model $f_j$ are determined as feature coefficients; the sub-models $f_j$ corresponding to the $n$ important feature data are visualized based on the feature coefficients to obtain the indicator-feature relationship curve for each important feature data; and, based on the indicator-feature relationship curve, the direction and level of influence of the important feature on the industrial indicator data are determined, so as to evaluate the influence status of the important feature on the industrial indicator data.

8. An electronic device, characterized in that, The electronic device includes: One or more processors; A storage device for storing one or more programs, which, when executed by one or more processors, cause the electronic device to implement the method for evaluating the impact of features on indicators as described in any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that, It stores a computer program, which, when executed by the computer's processor, causes the computer to perform the method for evaluating the impact of the features on the indicators as described in any one of claims 1 to 6.