A method for lithology identification from well logging data that integrates noisy learning and local classifiers

By integrating noisy learning and local classifiers, the method optimizes lithology identification of well logging data, solves the problem of insufficient accuracy in lithology identification in complex geological structures, improves the accuracy of similar lithology and refined lithology identification, and enhances the data differentiation capability of mineral exploration.

CN118520275B · Active · Publication Date: 2026-06-30 · CHINA UNIV OF GEOSCIENCES (WUHAN) +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHINA UNIV OF GEOSCIENCES (WUHAN)
Filing Date: 2024-05-31
Publication Date: 2026-06-30


IPC: G06F18/213; G06F18/2415; G06F18/2431; G06F18/214; G06N3/047; G06N3/08

AI Tagging

Technology Topics

Lithology · Well logging




AI Technical Summary

Technical Problem

Existing well logging lithology identification methods struggle to accurately distinguish diverse lithological types in complex geological structures. Traditional methods cannot meet the refined requirements of modern lithology identification, especially regarding the unclear distinction between similar lithologies and lithology labels.

Method used

A method combining noisy learning and local classifiers is adopted to improve the accuracy of lithology identification through well logging data training set preprocessing, global optimal feature subset construction, easily confused lithology combination analysis, optimization of local classifier model and information-guided noisy learning algorithm.

Benefits of technology

It improves the accuracy and efficiency of well logging lithology identification, especially in distinguishing between similar lithologies and refined lithologies, solving the problem of insufficient accuracy in lithology identification and enhancing the application effect of machine learning algorithms in mineral exploration.


Abstract

This invention discloses a method for lithology identification from well logging data that integrates noisy learning and local classifiers, belonging to the field of deep learning technology. The method includes the following steps: preprocessing of the well logging data training set; constructing a pre-trained model; identifying easily confused lithological combinations; constructing an optimized local classifier model; establishing a lithology identification model that integrates noisy learning and optimized local classifiers; and lithology identification. This invention employs the above-mentioned method for lithology identification from well logging data that integrates noisy learning and local classifiers, which helps to distinguish well logging data in the field of mineral exploration, improves the accuracy and efficiency of machine learning algorithms in well logging lithology identification in the geological and mining fields, and effectively solves the problems of difficulty in distinguishing similar lithologies and unclear lithology labeling in lithology identification.


Description

Technical Field

[0001] This invention relates to the field of deep learning technology, and in particular to a method for identifying lithology in well logging data that integrates noisy learning and local classifiers.

Background Technology

[0002] Well logging lithology identification is the foundation for formation assessment and reservoir characterization at sampling points. It relies on well logging data to express information about underground rocks. The results of lithology identification can serve as the basis for reservoir parameter calculation (permeability, porosity, etc.) and geological research (formation correlation, sedimentary models, etc.), providing important reference for the spatial distribution and location of mineral resources.

[0003] Current geological lithology identification work mainly focuses on three research directions: lithology identification based on physical samples, lithology identification based on rock images and borehole images, and lithology identification based on well logging data. Among these, geological well logging data acquisition is automatically completed by mature well logging instruments, the data collection process is relatively convenient, and the data has high continuity, thus possessing high engineering application value.

[0004] Due to the geological structure of the study area, the lithology is extremely diverse, mainly including widely distributed volcanic rocks such as diabase and andesite, clastic rocks such as sandstone, mudstone, and conglomerate, as well as coal-bearing seams. The formation of these lithologies is primarily influenced by multiple phases of volcanic activity, unique sedimentary environments, and tectonic evolution processes. The lithology includes many complex types, such as silty mudstone, calcareous medium-grained sandstone, and sandy conglomerate, which affects the accuracy of lithology identification. However, there is a growing expectation that lithology identification can cover more lithological types, and with the continuous advancement of well logging technology leading to more refined lithology classification, traditional lithology identification methods face the challenge of insufficient accuracy to meet modern needs.

Summary of the Invention

[0005] The purpose of this invention is to provide a method for identifying lithology in well logging data that integrates noisy learning and local classifiers. This method helps to distinguish well logging data in the field of mineral exploration, improves the accuracy and efficiency of machine learning algorithms in well logging lithology identification in the geological and mining fields, and effectively solves the problems of difficulty in distinguishing similar lithologies and unclear lithology labeling in lithology identification.

[0006] To achieve the above objectives, this invention provides a method for lithology identification of well logging data that integrates noisy learning and local classifiers, comprising the following steps:

[0007] S1. Preprocessing of well logging data training set: Based on the lithology and property characteristics in the study area data, data screening and outlier data removal are performed, and the number of sampling points for lithology is balanced to form a well logging data training set;

[0008] S2. Construct a pre-trained model: Based on the correlation values between lithology and attributes, and between attributes, in the well logging data training set of step S1, analyze the globally optimal feature subset for all lithologies, construct a pre-trained model, and use the globally optimal feature subset as input.

[0009] S3. Determine easily confused lithological combinations: Input the data from step S1 into the model in step S2, train and verify it based on the global optimal feature subset, and then analyze the easily confused lithological combinations based on the identification results.

[0010] S4. Construct an optimized local classifier model: Use the easily confused lithological combinations as the basis for constructing the base classifiers, select the locally optimal feature subsets following step S2, and introduce an adaptive weight based on the Gaussian mixture model as the base classifier ensemble algorithm of the local classifier.

[0011] S5. Establish a local classifier model optimized with integrated noisy learning: Based on the optimized local classifier model in step S4, integrate the information-guided noisy learning algorithm to establish a local classifier model optimized with integrated noisy learning.

[0012] S6. Lithology identification: Apply the optimized model to well logging data from mineral exploration, identify lithology, and compare identification accuracy to assess the model's performance and efficiency.

[0013] Preferably, in step S1, outliers are removed directly. Redundant data and discrete sampling points are judged against the box plot of each logging attribute: a sampling point whose dispersion exceeds half of the box plot's maximum value in more than half of the logging attributes is removed, provided the lithology has a sufficient number of samples.

[0014] Preferably, the specific process of constructing the pre-trained model in step S2 is as follows:

[0015] S21. Correlation analysis among the attributes in the study area;

[0016] Kendall's correlation coefficient and Spearman's correlation coefficient are used together to analyze the correlation among the attributes in the study area, as shown below:

[0017]

[0018] where $\mathrm{Gain\_Attr}_i$ is the comprehensive score of the i-th attribute, $\mathrm{Kendall}(I_i)$ is the Kendall coefficient value of the i-th attribute, and $\mathrm{Spearman}(I_i)$ is the Spearman coefficient value of the i-th attribute;

[0019] S22. Correlation analysis between various attributes and lithology in the study area;

[0020] The information gain and mutual information gain were calculated to comprehensively analyze the correlation between various attributes and lithology in the study area, as shown below:

[0021] $\mathrm{Gain}_i = \mathrm{RobustScale}(\mathrm{IG}_i) + \mathrm{RobustScale}(\mathrm{MIG}_i)$ (2)

[0022] where i indexes the attribute features, $\mathrm{IG}_i$ is the information gain of the i-th attribute feature, $\mathrm{MIG}_i$ is the mutual information gain of the i-th attribute feature, $\mathrm{RobustScale}$ is the normalization function, and $\mathrm{Gain}_i$ is the comprehensive information gain of the i-th attribute feature;
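
As an illustration of formula (2), the following minimal Python sketch combines the two gains. It assumes that RobustScale means centering on the median and dividing by the interquartile range, as common robust scalers do; the patent text does not define it. The function names and the sample values are hypothetical.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (assumed definition)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 so constant inputs do not divide by zero
    med = median(values)
    return [(v - med) / iqr for v in values]

def combined_gain(ig, mig):
    """Formula (2): Gain_i = RobustScale(IG_i) + RobustScale(MIG_i)."""
    return [a + b for a, b in zip(robust_scale(ig), robust_scale(mig))]

# Hypothetical information gain and mutual information gain for four log attributes.
ig = [0.10, 0.20, 0.30, 0.40]
mig = [0.15, 0.10, 0.35, 0.40]
gain = combined_gain(ig, mig)
best_attribute = gain.index(max(gain))  # index of the attribute with the largest combined gain
```

Scaling each gain before summing keeps one measure from dominating the other merely because of its numeric range.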

[0023] S23. Determination of the globally optimal feature subset;

[0024] The comprehensive score of each attribute is obtained by comprehensively analyzing the correlation between attributes and between attributes and lithology. Since strong correlation between attributes may lead to information redundancy and affect the accuracy of lithology identification, the comprehensive score is obtained by formula (3) to reduce the redundant information caused by the correlation between attributes, as shown below:

[0025]

[0026] where $G_i$ is the comprehensive score of the i-th attribute of the study area; n is the number of attribute features in the study area; $\mathrm{Gain}_i$ is the final information gain result for the i-th attribute; $\mathrm{Gain\_Attr}$ is the correlation result between the i-th attribute and the j-th attribute; i ranges over [1, n] and j over [1, i-1] ∪ [i+1, n]; and abs is the absolute value function.

[0027] Preferably, in step S4, an optimized local classifier model is constructed, and the specific steps are as follows:

[0028] S41. Construct a base classifier for each easily confused lithological combination;

[0029] S42. Following the process of step S2, determine the locally optimal feature subset of each easily confused lithological combination and use it as the input of the corresponding base classifier;

[0030] S43. Calculate adaptive weights based on Gaussian mixture models: calculate the similarity between the test sample and each easily confused lithological combination, and the fitness of each base classifier for each easily confused lithological combination, as shown below:

[0031]

[0032] Here, each sample in cluster t has a probability vector under the Gaussian mixture model, and $P(x_e)$ is the Gaussian mixture model probability vector of the test sample $x_e$. The formula gives the similarity between the probability distributions of the test sample and a sample in the cluster. It uses the probability that the cluster sample belongs to the j-th Gaussian distribution and $\gamma_{e,j}$, the probability that the test sample $x_e$ belongs to the j-th Gaussian distribution;

[0033]

[0034] Here, for each cluster under label q, the nearest-neighbor samples are selected to form the local evaluation set of that cluster. The formula gives the feature similarity vector between the test sample $x_e$ and the local region sample sets $C$ of the clusters. The definition uses the total number of samples in the t-th cluster under label q, the proportion $\alpha$ of nearest-neighbor samples taken from the cluster, and the round-up (ceiling) operation;

[0035]

[0036] Here, each base classifier has a set of local regions on each cluster, with a corresponding fitness vector. The fitness is computed from the number of samples in the local evaluation set of the t-th cluster that the base classifier $M_j$ classifies correctly, the total number of samples in the t-th cluster under label q, the proportion $\alpha$ of nearest-neighbor samples taken from the cluster, and the round-up (ceiling) operation;

[0037]

[0038] Here, the fitness matrix collects the fitness of the K base classifiers on the local regions of the K clusters; each entry corresponds to the set of local regions of the i-th base classifier on the j-th cluster and its fitness vector;

[0039]

[0040] where $\omega_q$ is the weight for predicting label q for the test sample $x_e$; the formula also uses the weight with which the i-th base classifier predicts label q for the test sample $x_e$;

[0041] S44. Ensemble the results of the base classifiers;

[0042] Each base classifier predicts the probability that the test sample belongs to each label, and the final predicted label is output through a weighted voting strategy, as shown below:

[0043]

[0044] where q ∈ [1, …, Q], Q is the number of labels, and the remaining term is the probability, predicted by the j-th base classifier, that the test sample $x_e$ belongs to label q.

[0045] Preferably, in step S5, a fusion-noisy learning-optimized local classifier model is established;

[0046] Information guidance is incorporated into the noisy learning stage of a noisy learning algorithm based on dynamic label updates and a staged loss function. This is achieved by adding a penalty term to the loss function to control the direction in which the loss decreases, as shown below:

[0047]

[0048] where $L$ is the original loss function of the noisy learning stage; $y^d_i \in Y^d = \{y : y \in [0,1]^c,\ \mathbf{1}^T y = 1\}$ is the distribution label obtained from $y_i$ by the softmax function; θ is the set of network parameters; c is the number of lithology categories; $L_c$ is the classification loss function that guides the update of the network parameters θ; $f(x; θ)$ is the model's predicted discrete label distribution after softmax processing; x is the well logging attribute data input to the neural network; α and β are weight parameters that balance the contributions of the different loss functions; $L_o$ is the adaptive loss function used to ensure that the distribution label $y^d_i$ and the original lithology label are not entirely the same; and $L_e$ is the entropy loss function, which pushes the output toward a single category rather than an even distribution over the categories;

[0049]

[0050] L'=L+L_similarity (12)

[0051] where L_similarity is the cluster similarity penalty term, whose influence is controlled by a positive hyperparameter; L is the original loss function of the noisy learning stage; and L' is the loss function of the noisy learning stage after the penalty term is added.

[0052] Preferably, in step S6, lithology identification specifically includes:

[0053] S61. Construction of lithology identification model: The parameters of the original ensemble learning model are adjusted by building an optimized local classifier model and fusing a noisy learning algorithm.

[0054] S62. Lithology identification of well logging data: Apply the optimized model to well logging data from mineral exploration, identify lithology, and compare identification accuracy to assess the model's performance and efficiency.

[0055] Therefore, the present invention employs a well logging data lithology identification method that integrates noisy learning and local classifiers, which helps to differentiate well logging data in the field of mineral exploration and improves the accuracy and efficiency of machine learning algorithms in well logging lithology identification in the geological and mining fields. By integrating an information-guided noisy learning algorithm based on the optimization of the local classifier, the present invention solves the problem that when a certain lithology (such as sandy mudstone) and its refined lithology (such as silty mudstone) exist, the unrefined lithology will reduce the identification accuracy of this type of lithology, thus improving the identification accuracy of well logging lithology identification.

[0056] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0057] Figure 1 is a flowchart of the well logging data lithology identification method of the present invention that integrates noisy learning and local classifiers;

[0058] Figure 2 is a flowchart of the optimized classifier algorithm of the present invention;

[0059] Figure 3 is a flowchart of the optimized classifier algorithm of the present invention that integrates noisy learning.

Detailed Implementation

[0060] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0061] As shown in Figure 1, a method for lithology identification from well logging data that integrates noisy learning and local classifiers includes the following steps:

[0062] S1. Preprocessing of well logging data training set: Based on the lithology and property characteristics in the study area data, data screening and outlier data removal are performed, and the number of sampling points for lithology is balanced to form a well logging data training set;

[0063] S2. Construct a pre-trained model: Based on the correlation values between lithology and attributes, and between attributes, in the well logging data training set of step S1, analyze the globally optimal feature subset for all lithologies, construct a pre-trained model, and use the globally optimal feature subset as input.

[0064] S3. Determine easily confused lithological combinations: Input the data from step S1 into the model in step S2, train and verify it based on the global optimal feature subset, and then analyze the easily confused lithological combinations based on the identification results.

[0065] S4. Construct an optimized local classifier model: Use the easily confused lithological combinations as the basis for constructing the base classifiers, select the locally optimal feature subsets following step S2, and introduce an adaptive weight based on the Gaussian mixture model as the base classifier ensemble algorithm of the local classifier.

[0066] S5. Establish a local classifier model optimized with integrated noisy learning: Based on the optimized local classifier model in step S4, integrate the information-guided noisy learning algorithm to establish a local classifier model optimized with integrated noisy learning.

[0067] S6. Lithology identification: Apply the optimized model to well logging data from mineral exploration, identify lithology, and compare identification accuracy to assess the model's performance and efficiency.

[0068] Example

[0069] S1. Preprocessing of well logging data training set.

[0070] A subset of the lithology and attribute characteristics in the study area data is selected, which reduces the attributes of the sampling points in the well logging data to be processed, and duplicate category labels are merged according to rock lithology. Outliers are then removed directly. Redundant data and discrete sampling points are judged against the box plot of each well logging attribute: a sampling point whose dispersion exceeds half of the box plot's maximum value in more than half of the well logging attributes is removed, provided the lithology has a sufficient number of samples. Finally, the number of sampling points is balanced across lithologies to form the well logging data training set. The specific steps include:

[0071] S11. Lithology reduction: First, the data attributes of the well logging data are reduced, and lithologies with a sample size greater than 10,000 are used as the targets for multi-class lithology identification. The results are shown in Table 1.

[0072] Table 1. Major lithology categories, category labels, and sample sizes in the study area

[0073]
Rock lithology              Category label   Sample size
Mudstone                    2                52960
Siltstone                   3                52960
Fine sandstone              4                52960
Medium sandstone            5                52960
Coarse sandstone            6                52960
Sandstone                   7                12403
Silty mudstone              8                52960
Conglomerate                9                22706
Calcic sandstone            10               10696
Sandy conglomerate          17               19802
Sandy mudstone              25               10592
Fourth Series (Quaternary)  27               52960
Coarse sand                 31               23136
Medium sand                 32               52960
Fine sand                   33               37656
Silt                        34               12095
Diabase                     41               33082
Total                       -                617187
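
Several lithologies in Table 1 share the same sample size of 52960, which is consistent with capping the larger classes at a common maximum. The text does not state how the balancing is done, so the following Python sketch is only one possible reading: it randomly downsamples every class to at most `cap` sampling points. The function name, the `cap` parameter, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict

def balance_by_cap(samples, cap, seed=0):
    """Randomly downsample each lithology class to at most `cap` sampling points.

    `samples` is a list of (features, label) pairs; classes at or below the cap are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > cap:
            group = rng.sample(group, cap)
        balanced.extend(group)
    return balanced

# Hypothetical toy data: ten points of label 2 and three points of label 8.
toy = [([float(i)], 2) for i in range(10)] + [([float(i)], 8) for i in range(3)]
balanced = balance_by_cap(toy, cap=4)
```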

[0074] S12. Outlier and redundant value handling: During field logging data acquisition, equipment errors can cause some sampling points to record values that deviate from the actual values. Such values are called outliers. When a sampling point has fewer than two outliers, each outlier is replaced by the average of its nearest neighbors. When a sampling point has more than two but fewer than five outliers, it is marked as a redundant sampling point: it is removed when the training set data is sufficient and retained when the training set data is insufficient. When a sampling point has more than five outliers, it is removed.
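
The triage rule in S12 can be sketched in Python as follows. Two points are assumptions rather than statements from the text: outliers are detected per attribute with the conventional 1.5 × IQR box-plot whisker rule, and the boundary counts of exactly two and exactly five outliers, which the text leaves unspecified, are placed in the "redundant" band. The function names are hypothetical.

```python
from statistics import quantiles

def box_plot_outliers(column, whisker=1.5):
    """Flag values outside the box-plot whiskers (assumed 1.5 * IQR rule)."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    spread = whisker * (q3 - q1)
    return [v < q1 - spread or v > q3 + spread for v in column]

def triage(n_outliers, training_set_sufficient):
    """Decide what to do with one sampling point given its number of outlier attributes."""
    if n_outliers == 0:
        return "keep"
    if n_outliers < 2:
        return "impute"  # replace the outlier with the average of its nearest neighbors
    if n_outliers <= 5:  # redundant band; the handling of exactly 2 and 5 is an assumption
        return "remove" if training_set_sufficient else "keep"
    return "remove"

# Hypothetical readings of one log attribute; the last value is a spike.
flags = box_plot_outliers([1, 2, 3, 4, 100])
```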

[0075] S2. Construct a pre-trained model.

[0076] Based on the correlation values between lithology and attributes, and among attributes, in the well logging data training set of step S1, the globally optimal feature subset for all lithologies is analyzed and a pre-trained model is constructed, with the globally optimal feature subset as its input. The specific process is as follows:

[0077] S21. Correlation analysis among the attributes in the study area;

[0078] Kendall's correlation coefficient and Spearman's correlation coefficient are used together to analyze the correlation among the attributes in the study area, as shown below:

[0079]

[0080] where $\mathrm{Gain\_Attr}_i$ is the comprehensive score of the i-th attribute, $\mathrm{Kendall}(I_i)$ is the Kendall coefficient value of the i-th attribute, and $\mathrm{Spearman}(I_i)$ is the Spearman coefficient value of the i-th attribute;
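
For readers unfamiliar with the two coefficients used in S21, the following Python sketch computes Kendall's tau and Spearman's rho for a pair of attributes. It assumes no tied values (tau-a and the rank-difference form of rho) and does not attempt to reproduce formula (1), which combines them. The sample readings are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical readings of two log attributes at five sampling points.
attr_a = [1, 2, 3, 4, 5]
attr_b = [2, 1, 4, 3, 5]
tau = kendall_tau(attr_a, attr_b)
rho = spearman_rho(attr_a, attr_b)
```

Both coefficients are rank-based, so they capture monotonic relationships between log attributes without assuming linearity.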

[0081] S22. Correlation analysis between various attributes and lithology in the study area;

[0082] The information gain and mutual information gain were calculated to comprehensively analyze the correlation between various attributes and lithology in the study area, as shown below:

[0083] $\mathrm{Gain}_i = \mathrm{RobustScale}(\mathrm{IG}_i) + \mathrm{RobustScale}(\mathrm{MIG}_i)$ (2)

[0084] where i indexes the attribute features, $\mathrm{IG}_i$ is the information gain of the i-th attribute feature, $\mathrm{MIG}_i$ is the mutual information gain of the i-th attribute feature, $\mathrm{RobustScale}$ is the normalization function, and $\mathrm{Gain}_i$ is the comprehensive information gain of the i-th attribute feature;

[0085] S23. Determination of the globally optimal feature subset;

[0086] The comprehensive score of each attribute is obtained by comprehensively analyzing the correlation between attributes and between attributes and lithology. Since strong correlation between attributes may lead to information redundancy and affect the accuracy of lithology identification, the comprehensive score is obtained by formula (3) to reduce the redundant information caused by the correlation between attributes, as shown below:

[0087]

[0088] where $G_i$ is the comprehensive score of the i-th attribute of the study area; n is the number of attribute features in the study area; $\mathrm{Gain}_i$ is the final information gain result for the i-th attribute; $\mathrm{Gain\_Attr}$ is the correlation result between the i-th attribute and the j-th attribute; i ranges over [1, n] and j over [1, i-1] ∪ [i+1, n]; and abs is the absolute value function.
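
Formula (3) itself is not reproduced in this text, so the following Python sketch shows only one plausible form consistent with the definitions above: each attribute's gain is reduced by the mean absolute correlation with every other attribute. This exact form, the function names, and the sample values are assumptions, not the patented formula.

```python
def redundancy_adjusted_scores(gain, attr_corr):
    """Assumed form: G_i = Gain_i - mean over j != i of abs(Gain_Attr[i][j])."""
    n = len(gain)
    scores = []
    for i in range(n):
        penalty = sum(abs(attr_corr[i][j]) for j in range(n) if j != i) / (n - 1)
        scores.append(gain[i] - penalty)
    return scores

def select_top_k(scores, k):
    """Indices of the k attributes with the highest comprehensive score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical values: attributes 0 and 1 are strongly correlated with each other.
gain = [1.0, 0.9, 0.7]
attr_corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
scores = redundancy_adjusted_scores(gain, attr_corr)
subset = select_top_k(scores, 2)
```

In this toy case the weakly correlated attribute 2 outranks the redundant attribute 1 even though its raw gain is lower, which is the effect the paragraph on information redundancy describes.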

[0089] S3. Determine easily confused lithological combinations.

[0090] Input the data from step S1 into the model from step S2. After training and validation on the globally optimal feature subset, analyze the easily confused lithological combinations from the identification results. For each lithology, compare the proportions of its sampling points that are identified as each of the other lithologies, relative to the total number of sampling points of that lithology.

[0091] If the misidentified sampling points of a lithology are spread relatively evenly over the other lithologies, that lithology is judged to have no easily confused counterpart and is treated as an independent lithology;

[0092] If the misidentified sampling points of a lithology are distributed with a significant difference among the other lithologies, that lithology and the other lithologies that receive the most of its sampling points are judged to be a group of easily confused lithologies.

[0093] Finally, the judgment results are analyzed together to obtain the final easily confused lithological combinations.
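
The decision rule of step S3 can be sketched in Python from a confusion matrix. The text does not quantify "relatively even" or "significant difference", so the sketch assumes a hypothetical `dominance` threshold: a lithology is paired with the class that absorbs more than that share of its misidentified sampling points. The counts are invented for illustration.

```python
def confusable_pairs(confusion, labels, dominance=0.5):
    """Return pairs of lithologies judged easily confused.

    confusion[i][j] is the number of sampling points of lithology i identified as j.
    """
    pairs = set()
    for i, row in enumerate(confusion):
        errors = {j: count for j, count in enumerate(row) if j != i}
        total_errors = sum(errors.values())
        if total_errors == 0:
            continue  # never misidentified: independent lithology
        top = max(errors, key=errors.get)
        if errors[top] / total_errors > dominance:
            pairs.add(frozenset((labels[i], labels[top])))
    return pairs

# Hypothetical validation results for three lithologies.
labels = ["mudstone", "silty mudstone", "diabase"]
confusion = [
    [80, 18, 2],  # mudstone is mostly mistaken for silty mudstone
    [15, 83, 2],  # silty mudstone is mostly mistaken for mudstone
    [3, 3, 94],   # diabase errors are spread evenly: independent
]
pairs = confusable_pairs(confusion, labels)
```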

[0094] S4. Construct an optimized local classifier model.

[0095] As shown in Figure 2, the easily confused lithological combinations serve as the basis for constructing the base classifiers, the locally optimal feature subsets are selected following step S2, and an adaptive weighting algorithm based on a Gaussian mixture model is introduced as the base classifier ensemble algorithm of the local classifier. The specific steps are as follows:

[0096] S41. Construct a base classifier for each easily confused lithological combination;

[0097] S42. Following the process of step S2, determine the locally optimal feature subset of each easily confused lithological combination and use it as the input of the corresponding base classifier;

[0098] S43. Calculate adaptive weights based on Gaussian mixture models: calculate the similarity between the test sample and each easily confused lithological combination, and the fitness of each base classifier for each easily confused lithological combination, as shown below:

[0099]

[0100] Here, each sample in cluster t has a probability vector under the Gaussian mixture model, and $P(x_e)$ is the Gaussian mixture model probability vector of the test sample $x_e$. The formula gives the similarity between the probability distributions of the test sample and a sample in the cluster. It uses the probability that the cluster sample belongs to the j-th Gaussian distribution and $\gamma_{e,j}$, the probability that the test sample $x_e$ belongs to the j-th Gaussian distribution;
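
To make the probability vectors concrete, the following Python sketch computes the Gaussian mixture posterior probabilities (the γ values) of a sample and compares two such vectors. It is deliberately simplified to one-dimensional data with known mixture parameters, and it uses cosine similarity as the comparison, which is an assumption because the similarity formula is not reproduced in this text. All names and values are hypothetical.

```python
import math

def gmm_probability_vector(x, weights, means, stds):
    """Posterior probability that scalar x belongs to each Gaussian component."""
    densities = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stds)
    ]
    total = sum(densities)
    return [d / total for d in densities]

def cosine_similarity(p, q):
    """Cosine of the angle between two probability vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

# Hypothetical two-component mixture over a single log attribute.
weights, means, stds = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
p_test = gmm_probability_vector(0.5, weights, means, stds)  # test sample x_e
p_near = gmm_probability_vector(1.0, weights, means, stds)  # cluster sample near x_e
p_far = gmm_probability_vector(9.0, weights, means, stds)   # cluster sample far from x_e
sim_near = cosine_similarity(p_test, p_near)
sim_far = cosine_similarity(p_test, p_far)
```

Real well logging samples are multivariate, so a practical implementation would fit the mixture with a library and use full covariance structure.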

[0101]

[0102] Here, for each cluster under label q, the nearest-neighbor samples are selected to form the local evaluation set of that cluster. The formula gives the feature similarity vector between the test sample $x_e$ and the local region sample sets $C$ of the clusters;

[0103]

[0104] Here, each base classifier has a set of local regions on each cluster, with a corresponding fitness vector. The fitness is computed from the number of samples in the local evaluation set of the t-th cluster that the base classifier $M_j$ classifies correctly, the total number of samples in the t-th cluster under label q, the proportion $\alpha$ of nearest-neighbor samples taken from the cluster, and the round-up (ceiling) operation;
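
One reading of these definitions is that the local evaluation set holds the ⌈α·N⌉ cluster samples nearest to the test sample, and that the fitness of a base classifier is the fraction of that set it classifies correctly. The Python sketch below implements that reading for one-dimensional features; the reading itself, the distance measure, and all names are assumptions.

```python
import math

def local_evaluation_set(cluster, x_e, alpha):
    """Take the ceil(alpha * N) samples of the cluster nearest to the test sample x_e.

    `cluster` is a list of (feature, true_label) pairs with scalar features.
    """
    size = math.ceil(alpha * len(cluster))
    return sorted(cluster, key=lambda sample: abs(sample[0] - x_e))[:size]

def fitness(classifier, evaluation_set):
    """Fraction of the local evaluation set that the base classifier labels correctly."""
    correct = sum(classifier(x) == y for x, y in evaluation_set)
    return correct / len(evaluation_set)

# Hypothetical cluster of five samples and two toy base classifiers.
cluster = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (8.0, "b"), (9.0, "b")]
evaluation_set = local_evaluation_set(cluster, x_e=2.2, alpha=0.5)

def threshold_classifier(x):
    return "a" if x < 2.5 else "b"

def constant_classifier(x):
    return "a"

fit_threshold = fitness(threshold_classifier, evaluation_set)
fit_constant = fitness(constant_classifier, evaluation_set)
```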

[0105]

[0106] Here, the fitness matrix collects the fitness of the K base classifiers on the local regions of the K clusters; each entry corresponds to the set of local regions of the i-th base classifier on the j-th cluster and its fitness vector;

[0107]

[0108] where $\omega_q$ is the weight for predicting label q for the test sample $x_e$; the formula also uses the weight with which the i-th base classifier predicts label q for the test sample $x_e$;

[0109] S44. Ensemble the results of the base classifiers;

[0110] Each base classifier predicts the probability that the test sample belongs to each label, and the final predicted label is output through a weighted voting strategy, as shown below:

[0111]

[0112] where q ∈ [1, …, Q], Q is the number of labels, and the remaining term is the probability, predicted by the j-th base classifier, that the test sample $x_e$ belongs to label q.
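
A weighted voting step of this kind is commonly written as choosing the label q that maximizes the sum over base classifiers of weight times predicted probability. The Python sketch below uses that form as an assumption, since the voting formula is not reproduced in this text; the weights and probabilities are hypothetical.

```python
def weighted_vote(probabilities, weights):
    """Pick the label with the largest weighted sum of base classifier probabilities.

    probabilities[j][q]: probability from base classifier j that the sample has label q.
    weights[j][q]: adaptive weight of base classifier j for label q.
    """
    n_labels = len(probabilities[0])
    scores = [
        sum(w[q] * p[q] for p, w in zip(probabilities, weights))
        for q in range(n_labels)
    ]
    best = max(range(n_labels), key=scores.__getitem__)
    return best, scores

# Two hypothetical base classifiers that disagree about a test sample.
probabilities = [[0.6, 0.4], [0.3, 0.7]]
trust_first = [[0.9, 0.9], [0.1, 0.1]]
equal_trust = [[0.5, 0.5], [0.5, 0.5]]
label_a, scores_a = weighted_vote(probabilities, trust_first)
label_b, scores_b = weighted_vote(probabilities, equal_trust)
```

The example shows why the adaptive weights matter: the same probabilities yield different final labels depending on which base classifier is trusted for the region around the test sample.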

[0113] S5. Establish a local classifier model optimized with integrated noisy learning.

[0114] As shown in Figure 3, based on the optimized local classifier model in step S4, an information-guided noisy learning algorithm is integrated to establish a local classifier model optimized with integrated noisy learning. The specific process is as follows:

[0115] S51. A noisy learning algorithm based on dynamic label updates and a staged loss function divides the training process into stages and uses a different loss function in each stage. This prevents the prediction accuracy from peaking in the middle of training and then gradually declining, which would degrade performance. The algorithm also borrows the deep neural network pattern of forward-propagation training, backward loss reduction, and network parameter updates to update labels dynamically and thereby reduce the impact of noisy labels.

[0116] In the noisy learning stage, information guidance is incorporated by adding a penalty term to the loss function to control the direction of the loss function's decrease, as shown below:

[0117]

[0118] where $L$ is the original loss function of the noisy learning stage; $y^d_i \in Y^d = \{y : y \in [0,1]^c,\ \mathbf{1}^T y = 1\}$ is the distribution label obtained from $y_i$ by the softmax function; θ is the set of network parameters; c is the number of lithology categories; $L_c$ is the classification loss function that guides the update of the network parameters θ; $f(x; θ)$ is the model's predicted discrete label distribution after softmax processing; x is the well logging attribute data input to the neural network; α and β are weight parameters that balance the contributions of the different loss functions; $L_o$ is the adaptive loss function used to ensure that the distribution label $y^d_i$ and the original lithology label are not entirely the same; and $L_e$ is the entropy loss function, which pushes the output toward a single category rather than an even distribution over the categories;

[0119]

[0120] L'=L+L_similarity (12)

[0121] Where L_similarity is the cluster similarity penalty term, L is the original loss function of the noisy learning stage, a positive hyperparameter controls the influence of the penalty term, and L' is the loss function of the noisy learning stage after the penalty term is added.

[0122] S52. Building on the noisy learning algorithm with dynamic label updating and a staged loss function, the similarity between the sample and the cluster is calculated and used to dynamically adjust, during the label update, the relative weights of the label distribution predicted by the model and the representative label distribution of the cluster. This is mainly achieved by adding the penalty term L_similarity to the staged loss function.
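By way of a non-limiting illustration, the similarity-weighted label update and the penalized loss can be sketched as follows. Formula (11) is not reproduced in this text, so the form of L_similarity below (a cross-entropy against a blended target) is purely an assumption, as is the linear blending rule and the placement of the positive hyperparameter, here called `lam`, as a multiplier on the penalty. All function names are hypothetical.

```python
import math

def blended_target(pred, cluster_rep, similarity):
    """Mix the model's predicted label distribution with the cluster's
    representative label distribution. A similarity near 1 leans on the
    cluster; a similarity near 0 leans on the model (assumed rule)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return [similarity * r + (1.0 - similarity) * p
            for p, r in zip(pred, cluster_rep)]

def similarity_penalty(label_dist, target, eps=1e-12):
    """Hypothetical L_similarity: cross-entropy of the current label
    distribution measured against the blended target."""
    return -sum(t * math.log(q + eps) for t, q in zip(target, label_dist))

def penalized_loss(base_loss, l_similarity, lam):
    """L' = L + lam * L_similarity, with lam a positive hyperparameter."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return base_loss + lam * l_similarity
```

A sample that sits close to its cluster is thereby pulled towards the cluster's representative labels, while a sample far from any cluster keeps more of the model's own prediction.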

[0123] S6. Lithology identification.

[0124] S61. Construction of lithology identification model: The parameters of the original ensemble learning model are adjusted by building an optimized local classifier model and fusing a noisy learning algorithm.

[0125] S62. Lithology identification of well logging data: Using the optimized model, lithology identification is performed on well logging data from mineral exploration, and the identification accuracy is compared on the basis of the model's performance and efficiency.

[0126] Therefore, the present invention adopts the above-mentioned method for lithology identification of well logging data that integrates noisy learning and local classifier, which helps to distinguish well logging data in the field of mineral exploration, improves the accuracy and efficiency of machine learning algorithms in well logging lithology identification in the field of geology and mining, and effectively solves the problems of difficulty in distinguishing similar lithologies and unclear lithology labeling in lithology identification.

[0127] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for lithology identification of well logging data integrating noisy learning and local classifiers, characterized in that it includes the following steps:
S1. Preprocessing of the well logging data training set: based on the lithology and attribute characteristics of the study area data, data screening and outlier removal are performed, and the number of sampling points per lithology is balanced, to form a well logging data training set;
S2. Constructing a pre-trained model: based on the correlation values between lithology and attributes, and between attributes, in the well logging data training set of step S1, the globally optimal feature subset for all lithologies is analyzed, and a pre-trained model is constructed with the globally optimal feature subset as input;
S3. Determining easily confused lithological combinations: the data from step S1 are input into the model of step S2, which is trained and verified on the globally optimal feature subset, and the easily confused lithological combinations are then analyzed from the identification results;
S4. Constructing an optimized local classifier model: with the easily confused lithological combinations as the basis for constructing the base classifiers, and with the locally optimal feature subsets selected as in step S2, an adaptive weighting algorithm based on a Gaussian mixture model is introduced as the base classifier ensemble algorithm of the local classifier. The specific steps are as follows:
S41. For each easily confused lithological combination, construct a base classifier;
S42. Following the specific process of step S2, determine the locally optimal feature subset of each easily confused lithological combination and use it as the input of the corresponding base classifier;
S43. Calculate adaptive weights based on the Gaussian mixture model: calculate the similarity between the test sample and each easily confused lithological combination, and the fitness of each base classifier for each easily confused lithological combination;
S44. Ensemble the results of the base classifiers: each base classifier predicts the probability that the test sample belongs to each label, and the final predicted label is output through a weighted voting strategy, as shown in formula (9), whose terms are the number of labels, the weight with which each base classifier predicts the test sample to be a given label, and the probability, predicted by each base classifier, that the test sample belongs to that label;
S5. Establishing a local classifier model optimized by fused noisy learning: the information-guided noisy learning algorithm is fused with the optimized local classifier model of step S4 to establish a local classifier model optimized by fused noisy learning;
S6. Lithology identification: using the optimized model, lithology identification is performed on well logging data from mineral exploration, and the identification accuracy is compared on the basis of the model's performance and efficiency.
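By way of a non-limiting illustration, the weighted voting of step S44 can be sketched as follows. Formula (9) is not reproduced in this text, so the rule used here, choosing the label that maximizes the sum over base classifiers of weight times predicted probability, is an assumption consistent with the terms listed for that formula. The function name and the nested-list layout are hypothetical.

```python
def weighted_vote(weights, probs):
    """Return (best_label_index, scores).

    weights[k][l]: weight given to base classifier k for label l.
    probs[k][l]:   probability, predicted by base classifier k, that the
                   test sample belongs to label l.
    Assumed rule:  score[l] = sum over k of weights[k][l] * probs[k][l].
    """
    n_labels = len(probs[0])
    scores = [
        sum(w[l] * p[l] for w, p in zip(weights, probs))
        for l in range(n_labels)
    ]
    best = max(range(n_labels), key=scores.__getitem__)
    return best, scores
```

For example, with two base classifiers and two labels, `weighted_vote([[0.9, 0.1], [0.2, 0.8]], [[0.6, 0.4], [0.1, 0.9]])` gives scores of about 0.56 and 0.76, so the second label is selected.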

2. The method for lithology identification of well logging data integrating noisy learning and local classifiers according to claim 1, characterized in that: in step S1, outliers are removed directly. For redundant data and discrete sampling points, the box plot of each logging attribute is used as the standard: sampling points whose discrete amplitude exceeds half of the maximum value of the box plot, and for which this occurs in more than half of the logging attributes, are removed directly, subject to the number of samples of the lithology concerned being sufficient.
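By way of a non-limiting illustration, a box-plot screen of this kind can be sketched as follows. The claim's own threshold is not fully legible in translation, so this sketch swaps in the conventional 1.5 × IQR whisker rule as an assumption, and it flags a sampling point only when it falls outside the whiskers in more than half of the logging attributes. The function names are hypothetical, and the check on whether enough samples of a lithology remain is left to the caller.

```python
import statistics

def whisker_bounds(values, k=1.5):
    """Lower and upper box-plot whiskers using the k * IQR convention."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(samples, k=1.5):
    """Return indices of rows lying outside the whiskers in more than half
    of the attributes. samples is a list of rows, one value per attribute."""
    n_attrs = len(samples[0])
    bounds = [whisker_bounds([row[j] for row in samples], k)
              for j in range(n_attrs)]
    flagged = []
    for i, row in enumerate(samples):
        hits = sum(1 for v, (lo, hi) in zip(row, bounds) if v < lo or v > hi)
        if hits > n_attrs / 2:
            flagged.append(i)
    return flagged
```

A row that is extreme in only one of three attributes is kept under this rule, while a row that is extreme in two of three is flagged for removal.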

3. The method for lithological identification of well logging data integrating noisy learning and local classifiers according to claim 1, characterized in that the specific process of constructing the pre-trained model in step S2 is as follows:
S21. Correlation analysis between the attributes of the study area: Kendall's correlation coefficient and Spearman's correlation coefficient are used together to analyze the correlation between attributes in the study area, as shown in formula (1), whose terms are, for each attribute, its overall correlation score, its Kendall coefficient value, and its Spearman coefficient value;
S22. Correlation analysis between the attributes and lithology in the study area: the information gain and the mutual information gain are calculated to analyze comprehensively the correlation between each attribute and lithology, as shown in formula (2), whose terms are, for each attribute feature, the feature itself, its information gain, its mutual information gain, a normalization function, and its overall information gain;
S23. Determination of the globally optimal feature subset: a comprehensive score for each attribute is obtained by jointly analyzing the correlation between attributes and the correlation between attributes and lithology. Because strong correlation between attributes may cause information redundancy and reduce the accuracy of lithology identification, the comprehensive score is obtained by formula (3), which reduces the redundant information caused by the correlation between attributes. The terms of formula (3) are: an attribute of the study area; the number of attribute features of the study area; the final information gain result of each attribute; the correlation result between one attribute and another; the value ranges of the quantities concerned; and an absolute value function.
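By way of a non-limiting illustration, the two rank correlations named in step S21 can be computed with the Python standard library as follows. Formula (1) is not reproduced in this text, so the way the two coefficients are combined here (the mean of their absolute values) is an assumption, and Kendall's coefficient is computed in its tau-a form without tie correction. The function names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def combined_correlation(x, y):
    """Assumed combination: mean of |rho| and |tau|."""
    return 0.5 * (abs(spearman(x, y)) + abs(kendall_tau_a(x, y)))
```

Two logging attributes that rise and fall together score close to 1 under this measure, which marks them as candidates for the redundancy reduction described in step S23.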

4. The method for lithological identification of well logging data integrating noisy learning and local classifiers according to claim 1, characterized in that, in step S43, adaptive weights are calculated based on the Gaussian mixture model, and the similarity between the test sample and each easily confused lithological combination, as well as the fitness of each base classifier for each easily confused lithological combination, are calculated as shown in formulas (4) to (8):
Formula (4), whose terms are: the Gaussian mixture model probability vector of a cluster sample in a cluster; the Gaussian mixture model probability vector of the test sample; the similarity between the probability distributions of the test sample and a sample in the cluster; the probability that the cluster sample belongs to a given Gaussian distribution; and the probability that the test sample belongs to that Gaussian distribution;
Formula (5), whose terms are: the local evaluation set of each cluster, consisting of the nearest-neighbor samples selected under each label of that cluster; the feature similarity vector between the test sample and the local region sample set of each cluster; the total number of samples under a given label in each cluster; the proportion of samples in the nearest-neighbor region; and a round-up operation;
Formula (6), whose terms are: the fitness vector of a base classifier over the local region sets of the clusters; the number of samples in the local evaluation set of each cluster that the base classifier classifies correctly; the total number of samples under a given label in each cluster; the proportion of samples in the nearest-neighbor region; and a round-up operation;
Formula (7), whose terms are: the fitness matrix of all base classifiers over the local region sets of the clusters; and the fitness vector of each base classifier over those local region sets;
Formula (8), whose terms are: the weight with which the test sample is predicted as a given label; and the weight with which each base classifier predicts the test sample as that label.
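By way of a non-limiting illustration, the comparison of Gaussian mixture model probability vectors can be sketched as follows. Formula (4) is not reproduced in this text, so the similarity measure used here (cosine similarity between posterior vectors) is an assumption, and the mixture is one-dimensional with known parameters purely to keep the example short. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_posterior(x, weights, means, stds):
    """Probability that x belongs to each Gaussian component."""
    joint = [w * gaussian_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

def cosine_similarity(p, q):
    """Assumed similarity between two probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)
```

A test sample whose posterior vector closely matches that of a cluster sample receives a similarity near 1, so base classifiers that fit that cluster well would be weighted more heavily for it.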

5. The method for lithology identification of well logging data integrating noisy learning and local classifiers according to claim 1, characterized in that: in step S5, a local classifier model optimized by fused noisy learning is established; information guidance is incorporated into the noisy learning stage of the noisy learning algorithm based on dynamic label updating and a staged loss function, by adding a penalty term to the loss function to control the direction in which the loss function decreases, as shown in formula (10), where L is the original loss function of the noisy learning stage; y_di is the distribution label obtained from y_i by the softmax function; Y_d is the set of distribution labels; θ is the network parameter set; c is the number of lithology categories; L_c is the classification loss function used to guide the update of the network parameters θ; f(x; θ) is the model's predicted discrete label distribution after processing by the softmax function; x is the well logging attribute data input to the neural network; α and β are weight parameters used to balance the contributions of the different loss functions; L_o is the adaptive loss function used to ensure that the distribution label y_di and the original label are not entirely the same; L_e is the entropy loss function, used to make the output lean towards a single category rather than an even distribution over the categories; and the original label is the original lithology label of the sample; and as shown in formulas (11) and (12), where L_similarity is the cluster similarity penalty term, L is the original loss function of the noisy learning stage, a positive hyperparameter controls the influence of the penalty term, and L' is the loss function of the noisy learning stage after the penalty term is added.

6. The method for lithological identification of well logging data integrating noisy learning and local classifiers according to claim 1, characterized in that, in step S6, lithology identification specifically includes: S61. Construction of the lithology identification model: the parameters of the original ensemble learning model are adjusted by building the optimized local classifier model and fusing the noisy learning algorithm; S62. Lithology identification of well logging data: using the optimized model, lithology identification is performed on well logging data from mineral exploration, and the identification accuracy is compared on the basis of the model's performance and efficiency.