Method and system for identifying lithology based on comprehensive multi-scale relative feature recognition of XGBoost algorithm

By constructing a multi-scale relative feature recognition model using the XGBoost algorithm, the problems of low accuracy and slow efficiency in well logging lithology identification were solved, achieving high-precision lithology identification for complex geological blocks, especially significantly improving the identification effect of thin-layer lithology.

CN116167195B · Active · Publication Date: 2026-06-12 · CHINA PETROLEUM & CHEMICAL CORP +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA PETROLEUM & CHEMICAL CORP
Filing Date
2021-11-24
Publication Date
2026-06-12

Abstract

The application provides a method and system for identifying lithology based on the XGBoost algorithm. The method selects reasonable advantageous layer thickness intervals and sets smoothing windows accordingly, determines the multi-scale relative features that meet the requirements according to the smoothing windows, and trains and constructs a multi-scale relative feature identification model using the XGBoost algorithm. For the well section to be identified, relevant petrophysical parameters are selected as rock layer attribute features from the measurement data of logging and well logging operations, and these features are input into the pre-constructed multi-scale relative feature identification model to obtain a lithology label identification result. The scheme comprehensively considers the differences in feature distribution at different well points caused by geological heterogeneity, addresses the overlap of elastic features caused by sedimentation, structure and diagenesis as well as the insufficient ability of single-scale features to describe lithology, effectively identifies the lithology distribution around the well, and can significantly improve the accuracy of lithology identification compared with traditional identification techniques.

Description

Technical Field

[0001] This invention relates to the field of oil and gas field exploration and development technology, mainly to the identification of lithology of underground drilling strata using well logging data during oil and gas field exploration or development operations, and particularly to a method and system for identifying lithology based on comprehensive multi-scale relative features using the XGBoost algorithm.

Background Technology

[0002] In studies on hydrocarbon potential assessment and reservoir characterization, lithology serves as a prerequisite for reservoir distribution, playing a crucial role in delineating favorable stratigraphic positions and identifying oil-bearing layers. Based on this, geophysical logging lithology identification technology refers to the judgment of the lithology surrounding the wellbore using logging data. If lithology identification is inaccurate, the lithology logging responses used for identification will generate incorrect information, leading to biases in seismic lithology identification based on logging analysis.

[0003] Well logging response information of lithology is affected by many factors, resulting in different characteristic distributions of the same lithology at different depths and in different geological regions. Furthermore, the characteristic distributions of different lithologies may have a large degree of overlap, making geophysical well logging lithology identification quite difficult.

[0004] Currently, methods for identifying formation lithology using well logging data mainly include cross plotting and statistical methods. For example, CN110805435A discloses a method and system for identifying complex lithologies based on well logging data. This method determines multiple well logging characteristic values of a template lithology based on core data and well logging data. Then, it standardizes the minimum and maximum values of each well logging characteristic value of the template lithology. Finally, it plots the characteristic graphs of the standardized minimum and maximum values on the same chart to obtain a lithology identification chart. The standardized well logging characteristic values to be identified are then projected onto the lithology identification chart to identify the lithology.

[0005] The aforementioned traditional reservoir lithology identification techniques suffer from low accuracy, slow efficiency, and are highly susceptible to human factors. They also struggle to process high-dimensional information, hindering widespread practical application. In recent years, machine learning methods for automatic lithology identification have been widely adopted. These methods utilize machine learning models, such as support vector machines, random forests, gradient boosting trees, and various neural networks, to learn the complex nonlinear mapping relationship between multi-dimensional feature information and lithology labels, thereby determining the lithology category of the sample to be predicted based on its characteristics.

[0006] However, the aforementioned machine learning methods for well logging lithology identification often pay insufficient attention to lithology-related data characteristics. When the geological conditions of the study block are complex, the following problems may arise:

[0007] ① The logging petrophysical response characteristics of the well to be predicted differ from those of the training well, leading to deviations in the model prediction results. For example, when the tectonic undulations of the study block are large, or when factors such as sedimentary facies and lithofacies change, the parameter characteristic distribution range of wells in different locations may differ. When the characteristic diversity of the samples in the training well cannot cover the situation of the prediction well, the machine learning model obtained from the training samples will have difficulty correctly identifying the lithology of the well to be predicted.

[0008] ② The distribution of characteristics of different lithologies overlaps significantly, limiting the ability to distinguish lithologies. For example, when the target layer is a shallow to medium-depth stratum with obvious sedimentary compaction effects, shallow and deep characteristics are mixed together, blurring the regularity of different lithological characteristics and increasing the difficulty of lithology identification;

[0009] ③ Low-dimensional, single-scale features are insufficient to comprehensively and effectively characterize lithology. The complexity and diversity of rocks are reflected in the differences in various features. Using a limited number of features often cannot fully represent the differences between lithologies, and single-scale data features can only focus on a specific range, and cannot more appropriately represent lithological characteristics at larger or smaller scales.

[0010] As the above problems show, insufficient attention to and mining of data features limits the algorithm's ability to learn and represent the true mapping relationships, potentially leading to erroneous lithology identification results when machine learning is used for lithology prediction.

[0011] The information disclosed in the background section of this invention is intended only to enhance the understanding of the general background of this invention, and should not be construed as an admission or in any way implying that such information constitutes prior art known to those skilled in the art.

Summary of the Invention

[0012] To address the above problems, this invention provides a method for identifying lithology based on comprehensive multi-scale relative features using the XGBoost algorithm. In one embodiment, the method includes:

[0013] A step of acquiring the features to be identified: selecting rock physical parameters associated with lithology identification based on the measurement data from logging and well logging operations in the current well section, and using them as the rock layer attribute features of the current well section;

[0014] The lithology label identification step involves inputting the rock layer attribute characteristics into a pre-constructed multi-scale relative feature identification model to obtain the output lithology label data, which serves as the lithology identification result for the current well section.

[0015] The multi-scale relative feature recognition model is constructed by selecting the advantageous layer thickness intervals according to a set strategy, setting smoothing windows based on them, determining the multi-scale relative features that meet the requirements according to the set smoothing windows, and then training and optimizing the model using the XGBoost algorithm.

[0016] Preferably, in the process of constructing the multi-scale relative feature recognition model, the advantageous layer thickness interval is selected through the following operations:

[0017] Step A1: Collect data on all rock layer thickness intervals that have appeared in the selected wells within the set period and meet the set conditions. Sort the collected objects according to their frequency of occurrence and select several rock layer thickness intervals as the first candidate layer thickness intervals based on the highest frequency of occurrence.

[0018] Step A2: Using the greatest layer thickness reward as the selection criterion, select several rock layer thickness intervals as the second candidate layer thickness intervals;

[0019] Step A3: Perform a union operation on the first candidate layer thickness intervals and the second candidate layer thickness intervals to obtain several layer thickness intervals as the target advantageous layer thickness intervals.

[0020] Furthermore, in one embodiment, a smoothing window is set according to the selected advantageous layer thickness range based on the following logic:

[0021] w = v2 / s

[0022] In the formula, w is the reasonable smoothing window size corresponding to the current advantageous layer thickness interval [v1, v2), v2 is the upper bound of that interval, and s is the logging sampling interval, in m.
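
The formula in [0021] can be illustrated with a minimal sketch. The function name window_size, the rounding to a whole number of samples, and the lower bound of one sample are assumptions made for illustration; the specification states only w = v2 / s.

```python
def window_size(v2, s):
    """Smoothing window size w = v2 / s for an interval [v1, v2).

    v2: upper bound of the layer thickness interval, in m.
    s:  logging sampling interval, in m.
    Rounding to an integer number of samples is an assumption; it also
    absorbs floating-point noise such as 0.3 / 0.1 = 2.9999999999999996.
    """
    if s <= 0:
        raise ValueError("sampling interval s must be positive")
    return max(1, round(v2 / s))


# Hypothetical interval [0.4, 0.5) m logged at a 0.1 m sampling interval.
example_window = window_size(0.5, 0.1)
```

Under these assumptions, each advantageous layer thickness interval yields one smoothing window, so M intervals give M window sizes.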

[0023] Specifically, in one embodiment, during the construction of the multi-scale relative feature recognition model, the multi-scale relative features are determined from the set smoothing windows through the following operations:

[0024] Step B1: Obtain the rock layer attribute characteristics of each selected existing well as the original features;

[0025] Step B2: Using the moving average method, obtain, for each original feature of the existing well, one smoothed feature per smoothing window. Subtract each smoothed feature from the corresponding original feature to obtain the matching relative features, and associate them with the records of the existing well to which they belong.
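
Steps B1 and B2 can be sketched as follows for a single logging curve. This is an illustrative sketch only: the function names, the centered window, and the edge padding at the top and bottom of the curve are assumptions, since the specification names the moving average method without fixing these details.

```python
def moving_average(values, window):
    """Centered moving average; the ends are padded with the edge values
    (the padding scheme is an assumption, not taken from the specification)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    if not values:
        return []
    left = window // 2
    right = window - 1 - left
    padded = [values[0]] * left + list(values) + [values[-1]] * right
    return [sum(padded[i:i + window]) / window for i in range(len(values))]


def relative_features(values, windows):
    """For each smoothing window, return the original curve minus its smoothed curve."""
    result = {}
    for w in windows:
        smooth = moving_average(values, w)
        result[w] = [v - m for v, m in zip(values, smooth)]
    return result


# Hypothetical curve with a single thin-layer spike, smoothed with a 3-sample window.
spike = relative_features([0.0, 0.0, 3.0, 0.0, 0.0], [3])[3]
```

In this toy curve the relative feature keeps the thin-layer spike as a local anomaly while removing the background level, which is the behavior the relative features are meant to capture.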

[0026] In a preferred embodiment, to reduce feature redundancy during the construction of the multi-scale relative feature recognition model, a feature optimization step is further included, which selects the target relative feature set by performing the following operations:

[0027] Step C1: Group the relative features that share the same smoothing window into one group, and test the contribution of each smoothing window's relative features to lithology identification.

[0028] Step C2: Based on the contribution test results, add the feature groups of the smoothing windows to the feature set formed by the original features, one group at a time, in descending order of contribution. After each update of the feature set, compare the lithology identification effect of the updated feature set with that of the previous feature set. If the identification effect is improved, repeat step C2 to add the feature group of the next smoothing window to the current feature set. If the identification effect is not improved, proceed to step C3.

[0029] Step C3: Stop adding features and use the current feature set as the target relative feature set.
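
Steps C1 to C3 amount to a greedy forward selection over feature groups, which can be sketched as below. The function name, the feature names, and the score callable are hypothetical; score stands in for whatever evaluation is used (for example, a cross-validated F1 score), and scoring each group on its own in step C1 is an assumption. The sketch also assumes that a group which does not improve the result is not kept in the final feature set.

```python
def select_relative_groups(original, groups, score):
    """Greedy selection of relative-feature groups.

    original: list of original feature names.
    groups:   dict mapping smoothing window -> list of relative feature names.
    score:    callable taking a list of feature names and returning a float
              (higher is better); a placeholder for the real evaluation.
    """
    # C1: rank the windows by the contribution of their relative features.
    ranked = sorted(groups, key=lambda w: score(groups[w]), reverse=True)

    selected = list(original)
    best = score(selected)
    # C2: add one group at a time while the identification effect improves.
    for w in ranked:
        candidate = selected + groups[w]
        candidate_score = score(candidate)
        if candidate_score <= best:
            break  # C3: stop adding features
        selected, best = candidate, candidate_score
    return selected


# Toy additive score with made-up per-feature gains, for demonstration only.
toy_gain = {"vp": 0.5, "vp_rel_w3": 0.2, "vp_rel_w5": 0.1, "vp_rel_w9": -0.05}
toy_groups = {3: ["vp_rel_w3"], 5: ["vp_rel_w5"], 9: ["vp_rel_w9"]}
chosen = select_relative_groups(
    ["vp"], toy_groups, lambda feats: sum(toy_gain[f] for f in feats)
)
```

Passing the evaluation in as a callable keeps the selection logic independent of the classifier, so the same loop can be driven by any model and metric.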

[0030] Furthermore, in one embodiment, in step C1, the contribution of the relative features of each smoothing window to lithology identification is obtained by applying the XGBoost algorithm combined with cross-validation to all relative feature data, wherein the average F1 score is used as its evaluation index.
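
For reference, an average F1 score can be computed as in the sketch below. It assumes that "average" means the unweighted mean of the per-class F1 scores (macro averaging); the specification does not say whether the average is taken over lithology classes, over cross-validation folds, or both. The function name and the lithology labels are illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: one sandstone sample is misclassified as shale.
truth = ["sand", "sand", "shale", "shale"]
pred = ["sand", "shale", "shale", "shale"]
f1_value = macro_f1(truth, pred)
```

Here the per-class scores are 2/3 for sand and 4/5 for shale, so their mean is 11/15.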

[0031] Specifically, in one embodiment, during the construction of the multi-scale relative feature recognition model, the relational function of the multi-scale relative feature recognition model is obtained according to the following operations:

[0032] Select several known well sections as training wells according to the set requirements, and obtain the rock formation properties of the training wells;

[0033] The rock layer attribute features of each training well, combined with the corresponding optimized target relative features, are used as the input to the XGBoost gradient boosting tree algorithm, and the corresponding lithology labels are used as the standard output. With the set parameter tuning strategy, the XGBoost algorithm fits the best mapping relationship between the input and the standard output. This relational function is the final multi-scale relative feature recognition model.

[0034] In one optional embodiment, the rock layer properties include: P-wave velocity, S-wave velocity, density, natural gamma, porosity, and resistivity parameters.

[0035] Based on other aspects of the methods described in any one or more of the foregoing embodiments, the present invention also provides a storage medium storing program code that can implement the methods described in any one or more of the foregoing embodiments.

[0036] Based on the execution aspects of the methods described in any one or more of the above embodiments, the present invention also provides a system for identifying lithology based on the XGBoost algorithm using integrated multi-scale relative features, which executes the methods described in any one or more of the above embodiments.

[0037] Compared with the closest prior art, the present invention also has the following beneficial effects:

[0038] This invention provides a method and system for identifying lithology based on the XGBoost algorithm using comprehensive multi-scale relative features. The method selects reasonable advantageous layer thickness intervals and sets smoothing windows accordingly. Based on the set smoothing windows, it determines the multi-scale relative features that meet the requirements, and the XGBoost algorithm is used to train and construct a multi-scale relative feature identification model. In application, for the well section to be identified, relevant petrophysical parameters are selected from the measurement data of logging and well logging operations as rock layer attribute features and are input into the pre-constructed multi-scale relative feature identification model to obtain the lithology label identification result. This scheme explores the multi-scale behavior of stratum features in depth and comprehensively considers the differences in feature distribution at different well points caused by geological heterogeneity. It addresses the overlap of elastic features caused by sedimentary, tectonic, and diagenetic differences, and the insufficient ability of single-scale features to characterize lithology. It effectively identifies the lithology distribution around the well, significantly improves the accuracy of lithology identification compared with traditional identification techniques, yields lithology distribution data that conforms to the actual underground lithological development, and is more effective for identifying thin layers.

[0039] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by means of the structures particularly pointed out in the description, claims, and drawings.

Attached Figure Description

[0040] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with the embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:

[0041] Figure 1 is a flowchart of a method for identifying lithology based on the XGBoost algorithm using integrated multi-scale relative features, as provided in an embodiment of the present invention.

[0042] Figure 2 is a schematic diagram of the process for constructing an identification model in a method for identifying lithology using comprehensive multi-scale relative features, as provided in another embodiment of the present invention.

[0043] Figure 3 compares the F1 scores of the lithology prediction results for five wells in the study area obtained with the original feature identification method and with the integrated multi-scale relative feature identification method described in one embodiment of the present invention.

[0044] Figure 4 shows the actual lithology (Label) of a well in the study area, the lithology prediction results using the original features, and the lithology prediction results using the integrated multi-scale relative feature identification method of this invention (Original+Relative Features).

[0045] Figure 5 shows, for five wells in the study area and for each layer thickness, the change in accuracy of the lithology prediction results obtained with the integrated multi-scale relative feature identification method of this invention relative to the results obtained with the original features.

[0046] Figure 6 is a schematic diagram of the structure of a system for identifying lithology based on the XGBoost algorithm using integrated multi-scale relative features, as provided in an embodiment of the present invention.

Detailed Implementation

[0047] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples. Those skilled in the art will then fully understand how the present invention uses technical means to solve technical problems and achieve technical effects, and will be able to implement the present invention specifically based on the above-described implementation process. It should be noted that, as long as there is no conflict, the various embodiments and features of the present invention can be combined with each other, and the resulting technical solutions are all within the protection scope of the present invention.

[0048] Although the flowchart describes the operations as sequential processes, many of these operations can be performed in parallel, concurrently, or simultaneously. The order of the operations can be rearranged. A process can terminate when its operations are complete, but it may also have additional steps not included in the diagram. A process can correspond to a method, function, procedure, subroutine, subprogram, etc.

[0049] Computer equipment includes user equipment and network equipment. User equipment or clients include, but are not limited to, computers, smartphones, PDAs, etc.; network equipment includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing consisting of a large number of computers or network servers. Computer equipment can operate independently to implement this invention, or it can connect to a network and implement this invention through interaction with other computer equipment in the network. The network in which the computer equipment is located includes, but is not limited to, the Internet, wide area network, metropolitan area network, local area network, VPN network, etc.

[0050] The terms “first,” “second,” etc., may be used herein to describe various units, but these units should not be limited by these terms; they are used merely to distinguish one unit from another. The term “and / or” as used herein includes any and all combinations of one or more of the associated listed items. When a unit is referred to as “connected” or “coupled” to another unit, it may be directly connected or coupled to said other unit, or there may be intermediate units present.

[0051] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms “a” and “an” as used herein are also intended to include the plural. It should also be understood that the terms “comprising” and / or “including” as used herein specify the presence of the stated features, integers, steps, operations, units, and / or components, without excluding the presence or addition of one or more other features, integers, steps, operations, units, components, and / or combinations thereof.

[0052] In studies on hydrocarbon evaluation and reservoir description, lithology serves as a prerequisite for reservoir distribution and plays a crucial role in delineating favorable strata and identifying oil layers. In actual construction, if lithology identification is inaccurate, the lithology logging response based on it will generate incorrect information, leading to deviations in seismic lithology identification based on logging analysis.

[0053] Well logging response information of lithology is affected by many factors, resulting in different characteristic distributions of the same lithology at different depths and in different geological regions. Furthermore, the characteristic distributions of different lithologies may have a large degree of overlap, making geophysical well logging lithology identification quite difficult.

[0054] Currently, the main methods for identifying lithology using well logging data include cross plotting and statistical methods, for example: [1] Jian Z, Fu-hong GAO. Application of crossplots based on well log data in identifying volcanic lithology [J]. Global Geology, 2003, 22(2): 136-140; [2] Hu Dongfeng, Guo Xusheng, Wang Yan, et al. Method and system for identifying complex lithology based on well logging data, CN110805435A [P]. 2020, which determines multiple well logging characteristic values of a template lithology based on core data and well logging data, standardizes the minimum and maximum values of each well logging characteristic value of the template lithology, plots the characteristic graphs of the standardized minimum and maximum values on the same chart to obtain a lithology identification chart, and projects the standardized well logging characteristic values to be identified onto the lithology identification chart to identify the lithology; [3] Tian Yukun, Zhou Hui, Yuan Sanyi. Lithology identification method based on Markov random field [J]. Chinese Journal of Geophysics, 2013, 56(4): 1360-1368.

[0055] The aforementioned traditional reservoir lithology identification techniques suffer from low accuracy, slow efficiency, and are highly susceptible to human factors. They also struggle to process high-dimensional information, hindering widespread practical application. In recent years, machine learning methods for automatic lithology identification have been widely adopted. These methods utilize machine learning models, such as support vector machines, random forests, gradient boosting trees, and various neural networks, to learn the complex nonlinear mapping relationship between multi-dimensional feature information and lithology labels, thereby determining the lithology category of the sample to be predicted based on its characteristics. Specific results can be found in: [4] Mou D, Wang Z W. A comparison of binary and multiclass support vector machine models for volcanic lithology estimation using geophysical log data from Liaohe Basin, China [J]. Exploration Geophysics, 2015, 47(2): 145-149; [5] Cracknell M J, Reading A M. The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using random forests and support vector machines [J]. Geophysics, 2013, 78(3): WB113-WB126; [6] Benaouda D, Wadge G, Whitmarsh R B, et al. Inferring the lithology of borehole rocks by applying neural network classifiers to downhole logs: an example from the Ocean Drilling Program [J]. Geophysical Journal International, 1999, 136; [7] Yxa B, Cz C, Wen ZB, et al. Evaluation of machine learning methods for formation lithology identification: A comparison of tuning processes and model performances [J]. Journal of Petroleum Science and Engineering, 2018, 160: 182-193.

[0056] However, the aforementioned machine learning methods for well logging lithology identification often pay insufficient attention to lithology-related data characteristics. When the geological conditions of the study block are complex, the following problems may arise:

[0057] ① The logging petrophysical response characteristics of the well to be predicted differ from those of the training well, leading to deviations in the model prediction results. For example, when the tectonic undulations of the study block are large, or when factors such as sedimentary facies and lithofacies change, the parameter characteristic distribution range of wells in different locations may differ. When the characteristic diversity of the samples in the training well cannot cover the situation of the prediction well, the machine learning model obtained from the training samples will have difficulty correctly identifying the lithology of the well to be predicted.

[0058] ② The high degree of overlap in characteristic distribution limits the ability to distinguish lithologies. For example, when the target layer is a shallow to medium-depth stratum with obvious sedimentary compaction effects, shallow and deep characteristics are mixed together, blurring the regularity of different lithological characteristics and increasing the difficulty of lithology identification;

[0059] ③ Low-dimensional, single-scale features are insufficient to comprehensively and effectively characterize lithology. The complexity and diversity of rocks are reflected in the differences in various features. Using a limited number of features often cannot fully represent the differences between lithologies, and single-scale data features can only focus on a specific range, and cannot more appropriately represent lithological characteristics at larger or smaller scales.

[0060] Clearly, as the above problems show, insufficient attention to and mining of data features limits the algorithm's ability to learn and represent the true mapping relationships, potentially leading to erroneous lithology identification results when machine learning is used for lithology prediction.

[0061] To address the aforementioned issues, this invention provides a method and system for identifying lithology based on the XGBoost algorithm using comprehensive multi-scale relative features. This primarily involves the accurate identification of lithology in underground formations encountered during drilling using well logging data, laying the foundation for further formation evaluation, reservoir description, and reservoir parameter calculation.

[0062] The researchers of this invention considered the diverse elastic and physical characteristics of rocks in complex geological structures, and the fact that their logging response patterns vary with spatial location and depth, inevitably leading to overlapping characteristic distributions of different lithologies. Therefore, the ability of single-scale logging data to characterize lithology is limited. To address these issues, this invention proposes a method for identifying lithology using the XGBoost algorithm and multi-scale relative rock physical characteristics. This method can obtain lithology distributions that conform to the actual underground lithological development and effectively improves the lithology identification and prediction results for thin rock layers.

[0063] This invention uses well-calibrated lithology and logging data to obtain relative rock physical characteristics at different scales. A multi-scale relative feature recognition model trained with the XGBoost machine learning method is then used to identify the lithology around the well from these rock physical characteristics, thereby improving prediction accuracy and enhancing the ability to identify thin layers.

[0064] The following describes the detailed flow of the method according to an embodiment of the present invention with reference to the accompanying drawings, the steps of which can be executed in a computer system containing, for example, a set of computer-executable instructions. Although the logical order of the steps is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.

[0065] Embodiment 1

[0066] Figure 1 is a flowchart of the method for identifying lithology based on the XGBoost algorithm using integrated multi-scale relative features, as provided in Embodiment 1 of the present invention. As can be seen from Figure 1, the method includes the following steps.

[0067] The step of acquiring the features to be identified involves selecting rock physical parameters associated with lithology identification based on measurement data from logging and well logging operations in the current well section. These parameters are used as the rock layer attribute characteristics of the current well section. The rock layer attribute characteristics include at least: P-wave velocity, S-wave velocity, density, natural gamma, porosity, and resistivity parameters. The well section includes any oil and gas well or interval well section with research needs.

[0068] The lithology label identification step involves inputting the rock layer attribute characteristics into a pre-constructed multi-scale relative feature identification model to obtain the output lithology label data, which serves as the lithology identification result for the current well section.

[0069] The multi-scale relative feature recognition model is constructed by selecting the advantageous layer thickness intervals according to a set strategy, setting smoothing windows based on them, determining the multi-scale relative features that meet the requirements according to the set smoothing windows, and then training and optimizing the model using the XGBoost algorithm.

[0070] To enhance the recognition model's focus on multi-scale relative features, this invention, in constructing the multi-scale relative feature recognition model, not only acquires the rock layer attribute features of known well sections, but also selects reasonable rock layer thickness intervals from which the smoothing windows are determined. Based on the original features and the corresponding smoothed features, a series of multi-scale relative features is then determined to serve as auxiliary input parameters for training the recognition model. The selection of the advantageous layer thickness intervals follows two principles: 1) High layer thickness frequency, meaning that rock layers of this thickness appear most frequently in existing wells; 2) High layer thickness reward, meaning that correctly identifying formations of this thickness significantly improves the prediction accuracy.

[0071] Figure 2 illustrates the process of constructing an identification model in the method for identifying lithology using integrated multi-scale relative features provided in an embodiment of the present invention. As shown in Figure 2, in a preferred embodiment, during the construction of the multi-scale relative feature recognition model, the advantageous layer thickness intervals are selected through the following operations:

[0072] Step A1: Collect data on all rock layer thickness intervals that have appeared in the selected wells within the set period and meet the set conditions. Sort the collected objects according to their frequency of occurrence and select several rock layer thickness intervals as the first candidate layer thickness intervals based on the highest frequency of occurrence.

[0073] Step A2: Using the greatest layer thickness reward as the selection criterion, select several rock layer thickness intervals as the second candidate layer thickness intervals;

[0074] Step A3: Perform a union operation on the first candidate layer thickness intervals and the second candidate layer thickness intervals to obtain several layer thickness intervals as the target dominant layer thickness intervals.

[0075] Based on the strategy described in the above embodiments, advantageous layer thickness intervals are selected. These intervals will serve as the basis for determining the smoothing window size when subsequently acquiring relative features. Layer thickness is defined as the total length of lithological sampling points continuously distributed within the same lithology. Using the logging sampling interval s as a unit interval, the layer thickness distribution of existing wells is statistically analyzed, and advantageous layer thickness intervals are selected. Based on the above principles, the N1 layer thickness intervals with the highest frequency and the N2 layer thickness intervals with the highest reward are selected, and their union is used to obtain M advantageous intervals. To leverage the role of relative features while preventing feature redundancy, N1 and N2 are often set to be less than or equal to 5 as needed.
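As a minimal illustration of the layer thickness definition above, the following Python sketch splits a lithology label sequence into runs of identical labels and reports the length of each run in sampling points and in meters. The function name `layer_thicknesses` and the sample labels are hypothetical, and taking the thickness as (number of points) × s is an assumption of this sketch rather than a rule stated in the text.

```python
from itertools import groupby

def layer_thicknesses(labels, s=0.1):
    """Return (lithology, n_points, thickness_m) for each run of identical labels.

    labels: lithology label per sampling point, ordered by depth
    s:      logging sampling interval in meters
    """
    layers = []
    for lith, run in groupby(labels):
        n = sum(1 for _ in run)                  # points in this continuous layer
        layers.append((lith, n, round(n * s, 6)))
    return layers

# Hypothetical label column of one well, sampled every 0.1 m
labels = ["A"] * 8 + ["B"] * 20 + ["A"] * 10 + ["C"] * 8
layers = layer_thicknesses(labels, s=0.1)
for lith, n, thickness in layers:
    print(lith, n, thickness)
```

Pooling the resulting layer lists over all existing wells gives the layer thickness distribution from which the advantageous intervals are chosen.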

[0076] Step A2 aims to select the layer thickness intervals with the larger reward, that is, intervals whose correct identification brings a greater improvement to the overall accuracy. The reward is calculated as thickness × frequency. For example, a stratum with a total thickness of 500 m sampled at an interval of 0.1 m contains 5000 sampling points in total.

[0077] If the 10 m thick strata (100 sampling points each) are identified accurately (assuming 100% accuracy), the overall accuracy increases by 100 × n / 5000, where n is the number of 10 m thick strata.

[0078] If the 2 m thick strata (20 sampling points each) are identified accurately, the overall accuracy increases by 20 × k / 5000, where k is the number of 2 m thick strata.

[0079] Only when k / n > 5, that is, when the 2 m thick strata are more than five times as numerous as the 10 m thick strata, does the improvement in overall accuracy from the 2 m thick strata exceed that from the 10 m thick strata. A larger layer thickness reward therefore means that correctly identifying strata of that thickness significantly improves the prediction accuracy.
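Steps A1 to A3 can be sketched as follows, under stated assumptions: layer thicknesses are expressed in sampling points and used directly as bins, the reward is thickness × frequency as described above, and ties are broken in favor of the thinner layer. The function name `select_dominant_thicknesses` and the sample counts are hypothetical.

```python
from collections import Counter

def select_dominant_thicknesses(layer_points, n1=5, n2=5):
    """Pick dominant thickness bins by frequency (A1) and by reward (A2), then unite them (A3).

    layer_points: thickness of every layer, in sampling points, pooled over wells
    n1, n2:       number of bins kept by frequency and by reward
    """
    freq = Counter(layer_points)                      # bin -> number of layers
    reward = {t: t * c for t, c in freq.items()}      # thickness * frequency
    by_freq = sorted(freq, key=lambda t: (-freq[t], t))[:n1]
    by_reward = sorted(reward, key=lambda t: (-reward[t], t))[:n2]
    dominant = sorted(set(by_freq) | set(by_reward))  # union of both candidate lists
    return dominant, by_freq, by_reward

# Hypothetical pooled layer list: e.g. twelve layers of 20 points, one of 300 points
layer_points = [20] * 12 + [100] * 2 + [10] * 15 + [8] * 9 + [300] * 1
dominant, by_freq, by_reward = select_dominant_thicknesses(layer_points, n1=2, n2=2)
print(by_freq, by_reward, dominant)
```

In this toy input the single 300-point layer enters through the reward criterion even though it occurs only once, which is the reason the two criteria are combined by union.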

[0080] Further, a smoothing window (w) is determined based on the dominant range. In an optional embodiment, the smoothing window is set according to the selected dominant layer thickness range based on the following logic:

[0081] w = v2 / s

[0082] In the formula, w is the reasonable smoothing window size corresponding to the current dominant layer thickness interval [v1, v2), and s represents the logging sampling interval, in m.
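A minimal sketch of the window formula is given below. The function name `smoothing_window` is hypothetical, and rounding to the nearest integer is an assumption added here because floating-point division of decimal values is not guaranteed to return an exact integer.

```python
def smoothing_window(v2, s):
    """Window size in sampling points for the interval [v1, v2): w = v2 / s."""
    return int(round(v2 / s))

# Interval [1.9 m, 2.0 m) with a 0.1 m sampling interval
w = smoothing_window(2.0, 0.1)
print(w)
```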

[0083] In an optional embodiment, during the construction of the multi-scale relative feature recognition model, relative features are further obtained based on a smoothing window. For each feature of each well, a smoothing feature (sf) is obtained using a moving average method. The smoothing feature (sf) is then subtracted from the original feature (f) to obtain the relative feature (relf). Specifically, the multi-scale relative features are obtained based on the smoothing window decision set in the above scheme through the following operations:

[0084] Step B1: Use the obtained rock formation properties of each selected existing well as the original features; in practical applications, rock physical parameters obtained from well logging data are selected as features, such as P-wave velocity (Vp), S-wave velocity (Vs), density (Den), natural gamma (Gr), porosity (CNC), and resistivity (RD).

[0085] Step B2: Using the moving average method, obtain the corresponding multiple smooth features for each original feature of the existing well based on each sliding window. Subtract the smooth features from each original feature according to the following formula to obtain the matching multiple relative features, and associate them with the records of the existing wells to which they belong.

[0086] relf = f - sf

[0087] For each original feature, multiple relative features are obtained under multiple smoothing windows.
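Steps B1 and B2 can be sketched in plain Python as follows. The text specifies only "the moving average method", so the centered window and the shrinking of the window at the ends of the curve are assumptions of this sketch; the function names are hypothetical.

```python
def moving_average(f, w):
    """Centered moving average of list f with window w (window shrinks at the edges)."""
    prefix = [0.0]
    for x in f:
        prefix.append(prefix[-1] + x)            # prefix sums for O(1) window sums
    left, right = w // 2, w - 1 - w // 2         # points before / after the center
    n = len(f)
    sf = []
    for i in range(n):
        lo, hi = max(0, i - left), min(n, i + right + 1)
        sf.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return sf

def relative_features(f, windows):
    """Return {w: relf} with relf = f - sf for every smoothing window w."""
    return {w: [x - m for x, m in zip(f, moving_average(f, w))] for w in windows}

# Hypothetical log curve with a thin high-value layer in a uniform background
f = [1.0] * 5 + [3.0] * 2 + [1.0] * 5
rel = relative_features(f, windows=[4, 8])
print(rel[4])
```

A thin layer stands out against its local background in the relative feature even when the absolute level of the curve differs from well to well.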

[0088] To reduce feature redundancy, the construction of the multi-scale relative feature recognition model also includes a feature optimization step that optimizes the obtained relative features. A training-validation set is defined, and the relative features of the same window are grouped together for training. The ability of each window's relative features to identify lithology is tested on the validation set, and the window feature groups are added to the feature set formed from the original features in descending order of that ability. If adding a new group to the current feature set does not improve the model's lithology recognition performance, the relative features already in the current feature set are retained as the preferred relative features. Therefore, in a preferred embodiment, the target relative feature set is selected by performing the following operations:

[0089] Step C1: Group the relative features of the same smooth window into a group and test the contribution of the relative features of each smooth window to lithology identification.

[0090] Step C2: Based on the contribution capability test results, add the features of the smoothing windows to the feature set formed by the original features in descending order of capability. After each update of the feature set, test the lithology identification effect of the updated feature set against that of the previous feature set. If the identification effect is improved, repeat step C2 to add the group of features corresponding to the next smoothing window to the current feature set. If the identification effect is not improved, proceed to step C3.

[0091] Step C3: Stop adding features and use the current feature set as the target relative feature set.
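Steps C1 to C3 amount to a greedy forward selection over window groups, which can be sketched as below. The two scoring callables are placeholders for whatever validation procedure is used (for instance a cross-validated classifier); they, the function name, and the toy scores are hypothetical and not taken from the embodiment.

```python
def select_window_groups(windows, score_group, score_set):
    """Greedy forward selection of smoothing-window feature groups.

    score_group(w)      -> validation score of window w's relative features alone (C1)
    score_set(selected) -> validation score of the original features plus the
                           relative features of the selected windows (C2)
    """
    ranked = sorted(windows, key=score_group, reverse=True)  # descending capability
    selected = []
    best = score_set(selected)            # baseline: original features only
    for w in ranked:
        candidate = selected + [w]
        score = score_set(candidate)
        if score <= best:                 # no improvement: stop adding (C3)
            break
        selected, best = candidate, score
    return selected, best

# Toy scores standing in for real validation results
group_scores = {8: 0.70, 10: 0.55, 20: 0.65, 76: 0.60}
set_scores = {
    frozenset(): 0.80,
    frozenset({8}): 0.83,
    frozenset({8, 20}): 0.85,
    frozenset({8, 20, 76}): 0.85,
}
selected, best = select_window_groups(
    list(group_scores),
    score_group=lambda w: group_scores[w],
    score_set=lambda sel: set_scores[frozenset(sel)],
)
print(selected, best)
```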

[0092] Specifically, in one embodiment, in step C1, the XGBoost algorithm combined with cross-validation is applied to all relative feature data to obtain the contribution ability of the relative features of each smooth window to lithology identification, so as to characterize the feature identification ability of lithology, wherein the average F1 score is used as its evaluation index.
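The evaluation index can be illustrated with a small pure-Python computation. The text says only "average F1 score"; this sketch assumes the average is taken over lithology classes (macro F1), and the resulting value could in turn be averaged over cross-validation folds. The function name and labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)   # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)

y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```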

[0093] XGBoost is an optimized gradient boosting tree algorithm that integrates the results of multiple decision trees and is suitable for processing small to medium-sized structured data or tabular data. After selecting the set of relative features for the target, this invention uses the XGBoost algorithm to train a multi-scale relative feature recognition model based on multi-scale relative rock physical characteristics, which is used to achieve efficient and accurate lithology identification for the rock strata of the well section to be identified.

[0094] Therefore, in one embodiment, during the construction of the multi-scale relative feature recognition model, the relational function of the multi-scale relative feature recognition model is obtained by the following operations:

[0095] Select several known well sections as training wells according to the set requirements, and obtain the rock formation properties of the training wells;

[0096] The rock strata properties of each training well are combined with the optimized target relative features as input to the XGBoost gradient boosting tree algorithm, and the corresponding lithology identification label is used as the standard output. The XGBoost algorithm is fitted to obtain the best mapping relationship between the input and the standard output by combining the set parameter tuning strategy. This relationship function is the final multi-scale relative feature recognition model.

[0097] In the above embodiments, the multi-scale relative features selected in the preceding steps are combined with the original rock layer attribute features of the training well section as the input of the model, and the lithology label is used as the output of the model. Under parameter tuning, the XGBoost algorithm is fitted to obtain the best mapping relationship between the input and the output. This relationship is the final lithology identification model. XGBoost is an optimized gradient boosting tree algorithm. This method integrates the results of multiple decision trees and is suitable for processing small and medium-sized structural data or tabular data.

[0098] The lithology identification logic described in the above embodiments comprehensively considers the diversity of feature distributions at different well points caused by the geological heterogeneity of the target block, the elastic feature superposition caused by differences in sedimentation, tectonics, and diagenesis, and the insufficient ability of single-scale features to characterize lithology. Combining well logging observation data with the layer thickness distribution, it proposes an accurate well logging lithology identification method that integrates the computational advantages of the XGBoost algorithm with the assistance of multi-scale relative rock physical features. This method can effectively identify the lithology distribution around the well, significantly improves the accuracy of lithology identification, and is particularly effective for thin-layer lithology.

[0099] Example Case:

[0100] Taking a gas field P as an example, lithology identification was carried out for this gas field. Five wells are distributed in the block of gas field P, with the main lithologies being types A, B, and C. Well logging obtained six petrophysical parameters: P-wave velocity (Vp), S-wave velocity (Vs), density (Den), natural gamma (Gr), porosity (CNC), and resistivity (RD). The well logging sampling interval s was 0.1 m. Natural gamma, porosity, resistivity, and the calculated P-wave impedance (Ip) and P-wave / S-wave velocity ratio (Vp / Vs) were used as features for lithology identification.

[0101] After statistical analysis of the layer thickness distribution of the five wells, the five layer thickness intervals with the highest frequency were selected: [0.9m, 1m), [1.9m, 2m), [1.1m, 1.2m), [1.3m, 1.4m), and [0.7m, 0.8m).

[0102] The five layer thickness intervals with the largest rewards were selected: [29.9m, 30m), [1.9m, 2m), [0.9m, 1m), [2.9m, 3m), and [7.5m, 7.6m).

[0103] Taking the union of the two selections gives eight dominant layer thickness intervals: [0.7m, 0.8m), [0.9m, 1m), [1.1m, 1.2m), [1.3m, 1.4m), [1.9m, 2m), [2.9m, 3m), [7.5m, 7.6m), and [29.9m, 30m).

[0104] The corresponding smoothing window sizes are 8, 10, 12, 14, 20, 30, 76, and 300 points. Based on these eight smoothing windows, each feature yields eight relative features (taking P-wave impedance as an example, the eight relative features obtained are relIP8, relIP10, relIP12, relIP14, relIP20, relIP30, relIP76, and relIP300). That is, 40 relative features were constructed.
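The numbers in this example can be checked with a short script. It takes the upper bounds v2 of the listed intervals, forms their union, applies w = v2 / s with s = 0.1 m, and enumerates one relative feature per (feature, window) pair. The feature abbreviations and the naming pattern `rel<feature><window>` are illustrative assumptions.

```python
s = 0.1  # logging sampling interval in meters

# Upper bounds v2 (m) of the intervals selected by frequency and by reward
freq_v2 = [1.0, 2.0, 1.2, 1.4, 0.8]
reward_v2 = [30.0, 2.0, 1.0, 3.0, 7.6]

union_v2 = sorted(set(freq_v2) | set(reward_v2))       # eight distinct intervals
windows = [int(round(v2 / s)) for v2 in union_v2]      # window sizes in points

features = ["Gr", "CNC", "RD", "IP", "VpVs"]           # five input features
relative_names = [f"rel{name}{w}" for name in features for w in windows]

print(windows)
print(len(relative_names))
```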

[0105] The ability of each window's relative features to identify lithology was tested, and the relative features obtained with the smoothing windows of 8, 12, 20, 76, and 300 points were finally selected as the target relative features.

[0106] The XGBoost algorithm is used to predict the lithology of each well in turn; that is, the model is trained on data from four wells and used to predict the lithology of the remaining well. The lithology prediction performance and thin-layer identification capability are compared between using the combined features (30 features in total, comprising relative and original features) and using only the original features (5 features). Figure 3 compares the F1 scores for the two scenarios. Note that because well w1 belongs to a different sedimentary facies, the distribution range of its petrophysical parameters differs from that of the other four wells; therefore, only the 25 constructed relative features were used to predict this well. With relative features, the lithology predictions for the five wells became more accurate, with F1 score improvements of 0.0677, 0.0492, 0.0147, 0.0484, and 0.0270, respectively. Figure 4 shows, for one well in the study area, the actual lithology (Label), the lithology predicted using the original features, and the lithology predicted using both original and relative features (see Appendix); the improvement shown in Figure 3 is also reflected in the lithology columns of this well.
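The well-by-well evaluation protocol described above can be sketched independently of the classifier. In the sketch below the classifier is a trivial majority-label stand-in, not XGBoost, so that the example runs with the standard library only; all function names and the toy wells are hypothetical.

```python
from collections import Counter

def leave_one_well_out(wells, fit, predict, score):
    """Train on all wells but one, test on the held-out well, for every well.

    wells: dict well_name -> (X, y) with X a list of feature rows, y a list of labels
    """
    results = {}
    for held_out in wells:
        train_X, train_y = [], []
        for name, (X, y) in wells.items():
            if name != held_out:
                train_X.extend(X)
                train_y.extend(y)
        model = fit(train_X, train_y)
        test_X, test_y = wells[held_out]
        results[held_out] = score(test_y, predict(model, test_X))
    return results

# Stand-in classifier (NOT XGBoost): always predicts the most common training label
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, X):
    return [model for _ in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

wells = {
    "w1": ([[1.0], [1.1], [1.2], [2.0]], ["A", "A", "A", "B"]),
    "w2": ([[1.0], [1.1], [2.1], [2.2]], ["A", "A", "B", "B"]),
    "w3": ([[0.9], [1.0], [1.1], [1.9]], ["A", "A", "A", "B"]),
}
scores = leave_one_well_out(wells, fit_majority, predict_majority, accuracy)
print(scores)
```

With the xgboost package installed, `fit` and `predict` could instead wrap a gradient boosting classifier such as `xgboost.XGBClassifier`, in which case the lithology labels would typically need to be integer-encoded first.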

[0107] In the above embodiments, the present invention introduces multi-scale relative features into the application of lithology identification. In practical scenarios, the application mainly falls into two categories:

[0108] 1. In most application scenarios, the training and prediction targets are similar (their feature distribution ranges overlap or are similar), as for wells w2, w3, w4, and w5. With such data, training and prediction using only the original features already achieve good accuracy, and combining the original features with relative features further improves it.

[0109] 2. In a few application scenarios, the training and prediction targets are dissimilar. For example, when predicting well w1, the distribution range of its features (rock physics parameters) differs from that of the other wells, with little or no overlap. In this case, only the relative features should be used and the original features should not be combined with them; this choice significantly affects the prediction results.

[0110] In practical applications, rock physical analysis is usually performed before lithology prediction, and its results can effectively determine whether the feature distribution ranges overlap or are similar. The judgment is based on the data distribution ranges of the rock physical parameters. For example, the sedimentary facies and well depth of a well may be known, but these two factors are not the final judgment criteria; the overlap or similarity of the data distribution ranges is the most important reference.
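One simple way to quantify the overlap of distribution ranges is sketched below. The text leaves the judgment to rock physical analysis and gives no formula, so the min/max overlap ratio, the averaging over features, and the 0.5 threshold are all assumptions made for illustration; the function names and values are hypothetical.

```python
def range_overlap(train_values, target_values):
    """Fraction of the target well's [min, max] range covered by the training range."""
    t_lo, t_hi = min(target_values), max(target_values)
    lo = max(min(train_values), t_lo)
    hi = min(max(train_values), t_hi)
    span = t_hi - t_lo
    if span == 0:                                   # degenerate target range
        return 1.0 if min(train_values) <= t_lo <= max(train_values) else 0.0
    return max(0.0, hi - lo) / span

def choose_feature_mode(train, target, threshold=0.5):
    """Return the feature set to use and the per-feature overlaps.

    train, target: dict feature_name -> list of values
    threshold:     hypothetical cutoff on the mean overlap
    """
    overlaps = {k: range_overlap(train[k], target[k]) for k in target}
    mean_overlap = sum(overlaps.values()) / len(overlaps)
    mode = "original+relative" if mean_overlap >= threshold else "relative-only"
    return mode, overlaps

train = {"Gr": [40.0, 80.0, 120.0], "IP": [6000.0, 7500.0, 9000.0]}
target_far = {"Gr": [100.0, 140.0, 180.0], "IP": [9500.0, 11000.0, 12000.0]}
target_near = {"Gr": [50.0, 90.0, 110.0], "IP": [6500.0, 8000.0, 8800.0]}

mode_far, overlaps_far = choose_feature_mode(train, target_far)
mode_near, overlaps_near = choose_feature_mode(train, target_near)
print(mode_far, mode_near)
```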

[0111] Figure 5 details the improvement obtained, for specific layer thickness intervals, by using the combined relative features instead of only the original features. The figure mainly shows the change in recognition rate. For example, for strata with a thickness in [1.7, 1.8) m, the recognition accuracy was originally 15% and is now 90%, an improvement of 75 percentage points, which is shown in black in the figure; a decrease of 75 percentage points would be shown in gray.

[0112] The accuracy is calculated as follows: a total of m sample points belong to strata in the [1.7, 1.8) m interval (one sample point per 0.1 m). With the original features, i sample points are correctly identified; with the multi-scale relative features, j sample points are correctly identified. The identification accuracies are therefore i / m and j / m, respectively. Figure 5 clearly shows that the multi-scale relative feature recognition model of the present invention improves the accuracy of lithology prediction in thin-layer (0-10 m) sections.
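The per-interval accuracies i / m and j / m and their difference can be computed as in the following sketch. The function name, the bin labels, and the toy predictions are hypothetical; the toy data are constructed to reproduce the 15% and 90% figures used as an example above.

```python
def accuracy_change_by_thickness(thickness_bin, y_true, pred_orig, pred_rel):
    """Per thickness bin, return (i/m, j/m, (j - i)/m).

    thickness_bin[k]: thickness interval of the layer containing sample point k
    pred_orig:        predictions using the original features only
    pred_rel:         predictions using the multi-scale relative features
    """
    stats = {}
    for b, t, po, pr in zip(thickness_bin, y_true, pred_orig, pred_rel):
        m, i, j = stats.get(b, (0, 0, 0))
        stats[b] = (m + 1, i + (po == t), j + (pr == t))
    return {b: (i / m, j / m, (j - i) / m) for b, (m, i, j) in stats.items()}

# Twenty sample points belonging to layers in the [1.7, 1.8) m interval
bins = ["[1.7, 1.8)"] * 20
y_true = ["A"] * 20
pred_orig = ["A"] * 3 + ["B"] * 17      # 3 of 20 correct
pred_rel = ["A"] * 18 + ["B"] * 2       # 18 of 20 correct
result = accuracy_change_by_thickness(bins, y_true, pred_orig, pred_rel)
print(result)
```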

[0113] For the foregoing method embodiments, in order to simplify the description, they are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.

[0114] It should be noted that, in other embodiments of the present invention, the method can also combine one or more of the above embodiments to obtain a new method for identifying the lithology of oil and gas well reservoirs, so as to achieve accurate analysis of well section exploration data and development data.

[0115] It should be noted that, based on the methods in any one or more embodiments of the present invention described above, the present invention also provides a storage medium storing program code that can implement the methods described in any one or more embodiments. When the program code is executed by the operating system, it can implement the method for identifying lithology based on the XGBoost algorithm as described above.

[0116] Example 2

[0117] The methods of the present invention are described in detail in the embodiments disclosed above. These methods can be implemented by devices or systems of various forms. Therefore, in another aspect, based on the methods described in any one or more of the above embodiments, the present invention also provides a system for identifying lithology using integrated multi-scale relative features based on the XGBoost algorithm. This system is used to execute the method for identifying lithology using integrated multi-scale relative features based on the XGBoost algorithm described in any one or more of the above embodiments. Specific embodiments are given below for detailed description.

[0118] Specifically, Figure 6 shows a schematic diagram of the structure of a system for identifying lithology based on the XGBoost algorithm using integrated multi-scale relative features, as provided in an embodiment of the present invention. As shown in Figure 6, the system includes:

[0119] The feature acquisition module is configured to select rock physical parameters associated with lithology identification based on the measurement data obtained during logging and well logging operations for the current well section, and to use them as the rock layer attribute features of the current well section.

[0120] The lithology label recognition module is configured to input the rock layer attribute features into a pre-constructed multi-scale relative feature recognition model to obtain the output lithology label data, which serves as the lithology recognition result of the current well section.

[0121] The recognition model construction module is configured to select the advantageous layer thickness interval according to the set strategy, set a smooth window based on it, and then decide on the multi-scale relative features that meet the requirements according to the set smooth window, and use the XGBoost algorithm to optimize and train the multi-scale relative feature recognition model.

[0122] In a preferred embodiment, the recognition model construction module selects the advantageous layer thickness range through the following operations:

[0123] Step A1: Collect data on all rock layer thickness intervals that have appeared in the selected wells within the set period and meet the set conditions. Sort the collected objects according to their frequency of occurrence and select several rock layer thickness intervals as the first candidate layer thickness intervals based on the highest frequency of occurrence.

[0124] Step A2: Using the greatest layer thickness reward as the selection criterion, select several rock layer thickness intervals as the second candidate layer thickness intervals;

[0125] Step A3: Perform a union operation on the first candidate layer thickness intervals and the second candidate layer thickness intervals to obtain several layer thickness intervals as the target dominant layer thickness intervals.

[0126] Furthermore, in one embodiment, the recognition model construction module sets a smoothing window according to the selected advantageous layer thickness range based on the following logic:

[0127] w = v2 / s

[0128] In the formula, w is the reasonable smoothing window size corresponding to the current dominant layer thickness interval [v1, v2), and s represents the logging sampling interval, in m.

[0129] Specifically, in one embodiment, the recognition model building module obtains multi-scale relative features based on a set smoothing window decision through the following operations:

[0130] Step B1: Use the obtained strata attribute features of each selected existing well as the original features;

[0131] Step B2: Using the moving average method, obtain the corresponding multiple smooth features for each original feature of the existing well based on each sliding window. Subtract the smooth features from each original feature to obtain the matching multiple relative features, and associate them with the records of the existing well to which they belong.

[0132] To reduce feature redundancy, in a preferred embodiment, the recognition model building module selects the target relative feature set by performing the following operations:

[0133] Step C1: Group the relative features of the same smooth window into a group and test the contribution of the relative features of each smooth window to lithology identification.

[0134] Step C2: Based on the contribution capability test results, add the features of the smoothing windows to the feature set formed by the original features in descending order of capability. After each update of the feature set, test the lithology identification effect of the updated feature set against that of the previous feature set. If the identification effect is improved, repeat step C2 to add the group of features corresponding to the next smoothing window to the current feature set. If the identification effect is not improved, proceed to step C3.

[0135] Step C3: Stop adding features and use the current feature set as the target relative feature set.

[0136] Furthermore, in step C1, the identification model construction module applies the XGBoost algorithm combined with cross-validation to all relative feature data to obtain the contribution ability of the relative features of each smooth window to lithology identification, wherein the average F1 score is used as its evaluation index.

[0137] In one embodiment, the recognition model construction module further obtains the relational function of the multi-scale relative feature recognition model by performing the following operations:

[0138] Select several known well sections as training wells according to the set requirements, and obtain the rock formation properties of the training wells;

[0139] The rock strata properties of each training well are combined with the optimized target relative features as input to the XGBoost gradient boosting tree algorithm, and the corresponding lithology identification label is used as the standard output. The XGBoost algorithm is fitted to obtain the best mapping relationship between the input and the standard output by combining the set parameter tuning strategy. This relationship function is the final multi-scale relative feature recognition model.

[0140] In the lithology identification system based on the XGBoost algorithm provided in this embodiment of the invention, each module or unit structure can operate independently or in combination according to actual identification and computational needs to achieve the corresponding technical effects.

[0141] It should be understood that the embodiments disclosed herein are not limited to the specific structures, processing steps, or materials disclosed herein, but should be extended to equivalent substitutions of these features as understood by those skilled in the art. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

[0142] The phrase "an embodiment" in the specification means that a specific feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Therefore, the phrase "an embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment.

[0143] While the embodiments disclosed in this invention are as described above, the content is merely for the purpose of facilitating understanding of the invention and is not intended to limit the invention. Any person skilled in the art to which this invention pertains may make any modifications and variations in form and detail of the implementation without departing from the spirit and scope disclosed herein; however, the scope of patent protection for this invention shall still be determined by the scope defined in the appended claims.

Claims

1. A method for identifying lithology based on comprehensive multi-scale relative features using the XGBoost algorithm, characterized in that, The method includes: The steps for acquiring features to be identified are as follows: Selecting rock physical parameters associated with lithology identification based on the measurement data from the current well logging and well logging operations, and using them as the rock layer attribute features of the current well section; The lithology label identification step involves inputting the rock layer attribute characteristics into a pre-constructed multi-scale relative feature identification model to obtain the output lithology label data, which serves as the lithology identification result for the current well section. The multi-scale relative feature recognition model is constructed by selecting the advantageous layer thickness interval according to the set strategy, setting a smooth window based on it, and then deciding on the multi-scale relative features that meet the requirements according to the set smooth window, and using the XGBoost algorithm for optimization training. In constructing the multi-scale relative feature recognition model, the multi-scale relative features are obtained by making decisions based on the set smoothing window through the following operations: Step B1: Obtain the rock layer attribute characteristics of each selected existing well as the original features; Step B2: Using the moving average method, obtain the corresponding multiple smooth features for each original feature of the existing well based on each sliding window. Subtract the smooth features from each original feature to obtain the matching multiple relative features, and associate them with the records of the existing well to which they belong.

2. The method according to claim 1, characterized in that, in constructing the multi-scale relative feature recognition model, the advantageous layer thickness interval is selected through the following operations: Step A1: Collect data on all rock layer thickness intervals that have appeared in the selected wells within the set period and meet the set conditions. Sort the collected objects according to their frequency of occurrence and select several rock layer thickness intervals as the first candidate layer thickness intervals based on the highest frequency of occurrence. Step A2: Using the greatest layer thickness reward as the selection criterion, select several rock layer thickness intervals as the second candidate layer thickness intervals; Step A3: Perform a union operation on the first candidate layer thickness intervals and the second candidate layer thickness intervals to obtain several layer thickness intervals as the target dominant layer thickness intervals.

3. The method according to claim 1, characterized in that, the smoothing window is set based on the selected advantageous layer thickness interval according to the following logic: w = v2 / s, where w is the reasonable smoothing window size corresponding to the current advantageous layer thickness interval [v1, v2), and s represents the logging sampling interval in meters.

4. The method according to claim 1, characterized in that, In constructing the multi-scale relative feature recognition model, a feature optimization step is also included to reduce feature redundancy, which selects the target relative feature set by performing the following operations: Step C1: Group the relative features of the same smooth window into a group and test the contribution of the relative features of each smooth window to lithology identification. Step C2: Based on the contribution capability test results, add the features of these smoothing windows to the feature set formed by the original features in descending order of capability; After each update of the feature set, test the lithology identification effect of the updated feature set compared to the previous feature set. If the identification effect is improved, proceed to step C2 to add a set of features corresponding to the next smoothing window to the current feature set; if the identification effect is not improved, proceed to step C3. Step C3: Stop adding features and use the current feature set as the target relative feature set.

5. The method according to claim 4, characterized in that, In step C1, the XGBoost algorithm combined with cross-validation is applied to all relative feature data to obtain the contribution of the relative features of each smooth window to lithology identification, with the average F1 score used as the evaluation index.

6. The method according to claim 1, characterized in that, In constructing the multi-scale relative feature recognition model, the relational function of the multi-scale relative feature recognition model is obtained by following the steps below: Select several known well sections as training wells according to the set requirements, and obtain the rock formation properties of the training wells; The rock strata properties of each training well are combined with the optimized target relative features as input to the XGBoost gradient boosting tree algorithm, and the corresponding lithology identification label is used as the standard output. The XGBoost algorithm is fitted to obtain the best mapping relationship between the input and the standard output by combining the set parameter tuning strategy. This relationship function is the final multi-scale relative feature recognition model.

7. The method according to claim 1, characterized in that, The rock strata properties include: P-wave velocity, S-wave velocity, density, natural gamma, porosity, and resistivity parameters.

8. A storage medium, characterized in that, The storage medium stores program code capable of implementing the method as described in any one of claims 1 to 7.

9. A system for identifying lithology based on comprehensive multi-scale relative features using the XGBoost algorithm, characterized in that, The system performs the method as described in any one of claims 1 to 7.