Training method and program product for lithology identification model

A lithology identification model is constructed through multi-task pre-training and supervised fine-tuning. This addresses the low efficiency of traditional rock identification and its insufficient accuracy in small-sample scenarios, and improves both the feature representation of rock images and lithology identification accuracy.

CN122244866A · Pending · Publication Date: 2026-06-19 · PETROCHINA CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
PETROCHINA CO LTD
Filing Date
2026-05-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional manual identification of rock thin sections is inefficient and highly subjective. Deep learning-based lithology identification has limited accuracy in small-sample scenarios. Existing methods also adapt poorly to rock image features and do not fully mine texture and mineral coexistence features.

Method used

A first sample rock image is acquired and used for multi-task pre-training to construct a pre-trained encoder. A classification head is then attached to the encoder. The parameters of at least some feature extraction layers and of the classification head are adjusted by supervised fine-tuning to construct the lithology identification model.

Benefits of technology

It improves the robustness of rock image feature representation and the accuracy of lithology identification, enhances the model's generalization ability in small sample scenarios, and achieves more accurate lithology identification.

✦ Generated by Eureka AI based on patent content.

Figure CN122244866A_ABST

Abstract

This invention discloses a training method and program product for a lithology identification model. The method includes: acquiring a first sample rock image; pre-training an initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder, the initial encoder including multiple feature extraction layers; acquiring a second sample rock image and corresponding sample category labels; connecting a classification head to the output of the pre-trained encoder to obtain an initial lithology identification model; inputting the second sample rock image into the initial lithology identification model to obtain a sample rock category; and adjusting the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model according to the sample rock category and the sample category labels to obtain a target lithology identification model. Multi-task pre-training combined with supervised fine-tuning improves the accuracy and generalization ability of lithology identification from rock images.

Description

Technical Field

[0001] This invention relates to the field of petroleum exploration technology, and in particular to a training method and program product for a lithology identification model.

Background Technology

[0002] As oil and gas exploration and development advances into deeper and more complex reservoirs, accurate lithology identification has become crucial for reservoir evaluation. Traditional manual identification of rock thin sections is inefficient and highly subjective. Deep learning-based lithology identification technology, which can automatically extract microscopic image features, has become an important development direction in geological analysis.

[0003] In related technologies, models pre-trained on ImageNet are often transferred to rock images, or CNNs are trained directly from scratch. The former suffers from poor feature adaptation because the distribution of natural images differs from that of rock microstructure images. The latter is prone to overfitting in small-sample scenarios. Moreover, single-task training cannot fully exploit the multi-dimensional features of rock images, such as texture and mineral coexistence, which limits recognition accuracy.

Summary of the Invention

[0004] This invention provides a training method and program product for a lithology identification model to solve the problems of insufficient rock image feature representation and low lithology identification accuracy in small sample scenarios.

[0005] According to one aspect of the present invention, a method for training a lithology identification model is provided, comprising:

[0006] A first sample rock image is acquired, and an initial encoder is pre-trained based on the first sample rock image according to multiple learning tasks to obtain a pre-trained encoder. The initial encoder includes multiple feature extraction layers.

[0007] A second sample rock image and its corresponding sample category label are acquired, and a classification head is connected to the output of the pre-trained encoder to obtain an initial lithology identification model.

[0008] The second sample rock image is input into the initial lithology identification model to obtain the sample rock category. Based on the sample rock category and the sample category label, the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model are adjusted to obtain the target lithology identification model.
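A minimal pure-Python sketch of the fine-tuning step in [0008] follows. It rests on two assumptions. First, the classification head is linear and is trained with softmax cross-entropy on features from the pre-trained encoder. Second, a hypothetical policy unfreezes only the last few feature extraction layers. The patent specifies neither the head architecture, nor the loss function, nor which layers are adjusted. The helper names `trainable_layers` and `head_step` are illustrative and do not come from the source.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def trainable_layers(layer_names: Sequence[str], num_unfrozen: int) -> List[str]:
    """Hypothetical policy: unfreeze only the last `num_unfrozen` feature extraction layers."""
    if num_unfrozen <= 0:
        return []  # guard: layer_names[-0:] would wrongly return every layer
    return list(layer_names[-num_unfrozen:])


def head_step(W: Matrix, b: List[float], feature: Sequence[float], label: int,
              lr: float = 0.1) -> Tuple[Matrix, List[float], float]:
    """One SGD step of a linear classification head on a frozen-encoder feature.

    W has one row per rock category; returns (new_W, new_b, loss before the update).
    """
    logits = [sum(w * x for w, x in zip(row, feature)) + bk for row, bk in zip(W, b)]
    probs = softmax(logits)
    loss = -math.log(probs[label])  # cross-entropy against the sample category label
    # The gradient of softmax cross-entropy w.r.t. the logits is p - one_hot(label).
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    new_W = [[w - lr * g * x for w, x in zip(row, feature)] for row, g in zip(W, grad)]
    new_b = [bk - lr * g for bk, g in zip(b, grad)]
    return new_W, new_b, loss


if __name__ == "__main__":
    print(trainable_layers(["conv1", "conv2", "conv3", "conv4"], 2))  # ['conv3', 'conv4']
    W = [[0.0, 0.0] for _ in range(3)]  # 3 hypothetical rock categories, 2-D features
    b = [0.0, 0.0, 0.0]
    for step in range(5):
        W, b, loss = head_step(W, b, feature=[1.0, 0.5], label=2)
        print(step, round(loss, 4))
```

Only the head is updated here. In a full implementation, the gradient would also flow into the layers returned by `trainable_layers`, while earlier layers keep their pre-trained weights.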

[0009] According to another aspect of the present invention, a training apparatus for a lithology identification model is provided, comprising:

[0010] A pre-trained encoder determination module is used to acquire a first sample rock image, and pre-train an initial encoder based on the first sample rock image according to multiple learning tasks to obtain a pre-trained encoder. The initial encoder includes multiple feature extraction layers.

[0011] The initial model determination module is used to acquire a second sample rock image and its corresponding sample category label, and to connect a classification head to the output of the pre-trained encoder to obtain an initial lithology identification model.

[0012] The target model determination module is used to input the second sample rock image into the initial lithology identification model to obtain the sample rock category, and adjust the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model according to the sample rock category and the sample category label to obtain the target lithology identification model.

[0013] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising:

[0014] At least one processor; and

[0015] A memory communicatively connected to the at least one processor; wherein,

[0016] The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the training method of the lithology identification model according to any embodiment of the present invention.

[0017] According to another aspect of the present invention, a computer-readable storage medium is provided. It stores computer instructions that, when executed, cause a processor to implement the training method of the lithology identification model according to any embodiment of the present invention.

[0018] According to another aspect of the present invention, embodiments of this disclosure also provide a computer program product, including a computer program that, when executed by a processor, implements a training method for a lithology identification model as described in any of the embodiments of this disclosure.

[0019] In the technical solution of this invention, a first sample rock image is acquired, and an initial encoder is pre-trained on it using multiple learning tasks to obtain a pre-trained encoder. The initial encoder includes multiple feature extraction layers, and multi-task pre-training allows the encoder to learn more general and robust rock feature representations. Next, a second sample rock image and its corresponding sample category label are acquired, and a classification head is connected to the output of the pre-trained encoder to obtain an initial lithology identification model. The pre-trained encoder and the classification head together form a complete lithology classification chain. The second sample rock image is input into the initial lithology identification model to obtain the sample rock category. Based on the sample rock category and the sample category label, the parameters of at least some feature extraction layers and of the classification head are adjusted to obtain a target lithology identification model. Selectively fine-tuning some feature extraction layers together with the classification head adapts the model precisely to the lithology identification task. This solves the problems of insufficient rock image feature representation and low lithology identification accuracy in small-sample scenarios. Multi-task pre-training combined with supervised fine-tuning thus improves the accuracy and generalization ability of lithology identification from rock images.

[0020] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0022] Figure 1 is a flowchart of a training method for a lithology identification model according to Embodiment 1 of the present invention;

[0023] Figure 2 is a flowchart of a training method for a lithology identification model according to Embodiment 2 of the present invention;

[0024] Figure 3 is a schematic structural diagram of a training device for a lithology identification model according to Embodiment 3 of the present invention;

[0025] Figure 4 is a schematic structural diagram of an electronic device that implements the training method of the lithology identification model in the embodiments of the present invention.

Detailed Implementation

[0026] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0027] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0028] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0029] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0030] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0031] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.

[0032] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.

[0033] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0034] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.

[0035] Embodiment 1

[0036] Figure 1 is a flowchart of the training method for a lithology identification model provided in Embodiment 1 of the present invention. This embodiment is applicable to the intelligent analysis of drilling core images and the automatic classification and identification of formation lithology. The method can be executed by a training device for the lithology identification model. The device can be implemented in hardware and/or software and, optionally, deployed on an electronic device such as a mobile terminal, PC, or server.

[0037] As shown in Figure 1, the method may specifically include:

[0038] S110. Obtain a first sample rock image, and pre-train an initial encoder based on the first sample rock image according to multiple learning tasks to obtain a pre-trained encoder. The initial encoder includes multiple feature extraction layers.

[0039] The first sample rock image can be understood as unlabeled raw rock image data (e.g., rock scene or thin-section images) collected for the encoder pre-training stage. This data serves as input to the initial encoder. It allows the encoder to learn general visual features of rock images (texture, structure, etc.) and improves the generalization ability of subsequent lithology identification.

A learning task can be understood as a self-supervised or multi-task feature learning sub-task defined on the rock images. Each task mines low-level features along a different dimension, and together the tasks constrain the encoder to learn general representations.

The initial encoder can be understood as a learnable feature extraction network that has not yet been fully trained. It maps the input rock image to a high-level feature representation and is the core feature extraction module of the entire model. The pre-trained encoder is the encoder obtained after multi-task pre-training of the initial encoder. It outputs more generalized rock image features and provides high-quality representations for subsequent lithology classification.

A feature extraction layer can be understood as one of the network layers that make up the encoder. Together, the layers abstract rock image information step by step, from shallow contours to deep lithological semantic features.

[0040] Based on the above scheme, an optional implementation involves preprocessing the first sample rock image, which can be a raw high-resolution core scan image or a microscopic thin-section image of shale. Specifically, these raw images are automatically cleaned and standardized to construct a high-quality set of unlabeled image tensors. Shale images have a high clay mineral content and blur easily. A multi-scale Retinex algorithm is therefore used to decompose each image into illumination and reflectance components, and the details of dark mineral textures are recovered by suppressing the illumination component.

In the morphological processing stage, an adaptive structuring-element size selection mechanism based on grain size distribution is introduced. The gray-level co-occurrence matrix of local image regions is statistically analyzed to estimate the average mineral grain size, and the size of the morphological closing kernel is set dynamically as:

[0041] ;

[0042] In the formula, the terms denote, respectively, the size of the closing kernel, the estimated average mineral grain size in a local image region, the image spatial resolution, and the floor function. This mechanism fills intergranular micropores while strictly preserving the edge sharpness of brittle mineral grains such as quartz and feldspar. It thereby retains the structural boundary information that is crucial for subsequent bedding identification. The clean set of rock structure images output by this step is used as the first sample rock images, that is, the unlabeled sample library for subsequent self-supervised pre-training.
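The grain-size-adaptive closing kernel can be sketched as below under stated assumptions. The gray-level co-occurrence matrix (GLCM) and its homogeneity statistic are standard. Two parts are hypothetical stand-ins, since the patent's exact formula is not given in this text:

- the heuristic that reads a grain diameter off the offset at which homogeneity drops;
- the odd-size rule `2 * floor(d / (2 * r)) + 1` for grain size `d` and resolution `r`.

```python
import math
from typing import List

Gray = List[List[int]]  # quantized gray levels in range(levels)


def glcm(image: Gray, dx: int, dy: int, levels: int) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total if total else 0.0 for v in row] for row in counts]


def homogeneity(p: List[List[float]]) -> float:
    """GLCM homogeneity: high when co-occurring pixels have similar gray levels."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


def estimate_grain_size_px(image: Gray, levels: int, max_offset: int,
                           threshold: float = 0.6) -> int:
    """Hypothetical heuristic: the first horizontal offset at which homogeneity
    drops below `threshold` is taken as the typical grain diameter in pixels."""
    for d in range(1, max_offset + 1):
        if homogeneity(glcm(image, dx=d, dy=0, levels=levels)) < threshold:
            return d
    return max_offset


def closing_kernel_size(grain_size_um: float, resolution_um_per_px: float) -> int:
    """Hypothetical rule: odd kernel size close to the grain diameter in pixels."""
    return 2 * math.floor(grain_size_um / (2 * resolution_um_per_px)) + 1


if __name__ == "__main__":
    toy = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]  # 2 x 2 px "grains"
    resolution = 2.5  # µm per pixel (assumed)
    grain_px = estimate_grain_size_px(toy, levels=4, max_offset=3)
    print(grain_px, closing_kernel_size(grain_px * resolution, resolution))  # 2 3
```

Forcing an odd size keeps the structuring element centered on a pixel, which is the usual convention for morphological closing.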

[0043] Based on the above scheme, optionally, the learning task includes a contrastive task. The step of pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes:

- performing image enhancement on the first sample rock image to obtain a sample enhanced image;
- inputting the first sample rock image and the sample enhanced image into the initial encoder to obtain a first coding feature of the first sample rock image and a second coding feature of the sample enhanced image;
- inputting the first coding feature and the second coding feature into a projection head, which reduces their dimensionality to obtain a first low-dimensional feature and a second low-dimensional feature;
- determining a contrastive loss based on the first low-dimensional feature and the second low-dimensional feature; and
- adjusting the parameters of the initial encoder based on the contrastive loss to obtain a pre-trained encoder.

[0044] The contrastive task can be understood as a representation learning task whose optimization objective is the feature similarity of positive and negative sample pairs. It aims to reduce the feature distance between similar rock images and increase the feature distance between dissimilar images, so that the encoder learns the intrinsic features of the rock.

Image enhancement can be understood as a data augmentation operation that transforms the original rock image into variant images. This increases sample diversity, helps avoid overfitting, and improves the encoder's robustness to rock images. Positive sample pairs can be constructed through random cropping, color jittering, Gaussian blurring, and grayscale conversion. The sample enhanced image is the rock image obtained by applying the image enhancement transformation to the first sample rock image, and it serves as a positive sample for the contrastive task. Pairs of different views of the same image are thus constructed for self-supervised training.

The first encoded feature can be understood as the high-dimensional deep feature output when the original first sample rock image is input into the initial encoder. It is the model's current digital representation of the original rock image. The second encoded feature is the high-dimensional deep feature output when the sample enhanced image is input into the initial encoder. It represents the enhanced rock image and is contrasted with the features of the original image.

The projection head can be understood as an auxiliary multi-layer feature mapping module connected to the encoder output. It maps the high-dimensional encoder features into a representation space suited to contrastive learning, which facilitates loss calculation.
The dimensionality reduction process maps the high-dimensional encoded features to lower-dimensional, more compact feature vectors. This reduces feature redundancy and computational complexity, so contrastive learning takes place in a more compact space. The first low-dimensional feature is the low-dimensional vector obtained by the projection head from the first encoded feature. It is used together with its positive sample (the second low-dimensional feature) to calculate the contrastive loss. The second low-dimensional feature is the low-dimensional vector obtained by the projection head from the second encoded feature. It forms a positive pair with the first low-dimensional feature, and the contrastive loss pulls the two closer.

The contrastive loss can be understood as a self-supervised loss function computed from the similarity of the low-dimensional features of positive and negative samples. Its gradients guide the encoder to cluster similar features and separate dissimilar ones. Parameter adjustment can be understood as correcting the network weights using the loss gradient. The parameters of the initial encoder are gradually optimized until a pre-trained encoder better suited to representing rock images is obtained.
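The image enhancement used to build positive pairs can be illustrated with a small pure-Python sketch. The image is stored as nested lists of `(r, g, b)` tuples in [0, 1]. The sketch covers random cropping, a brightness-only form of color jittering, and grayscale conversion with BT.601 luma weights. Gaussian blurring is omitted for brevity. The crop size, jitter strength, and grayscale probability are assumed values, not parameters from the patent.

```python
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def random_crop(image: Image, size: int, rng: random.Random) -> Image:
    """Cut a random size x size window (image must be at least size x size)."""
    r0 = rng.randint(0, len(image) - size)
    c0 = rng.randint(0, len(image[0]) - size)
    return [row[c0:c0 + size] for row in image[r0:r0 + size]]


def brightness_jitter(image: Image, strength: float, rng: random.Random) -> Image:
    """Scale all channels by one random factor in [1 - strength, 1 + strength], clipped to [0, 1]."""
    factor = rng.uniform(1.0 - strength, 1.0 + strength)
    return [[tuple(min(1.0, max(0.0, ch * factor)) for ch in px) for px in row]
            for row in image]


def to_grayscale(image: Image) -> Image:
    """BT.601 luma, replicated to three channels so every view keeps the same shape."""
    out = []
    for row in image:
        lumas = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
        out.append([(y, y, y) for y in lumas])
    return out


def make_view(image: Image, crop: int, seed: int, p_gray: float = 0.2) -> Image:
    """One augmented view: random crop, brightness jitter, then grayscale with probability p_gray."""
    rng = random.Random(seed)
    view = brightness_jitter(random_crop(image, crop, rng), 0.4, rng)
    return to_grayscale(view) if rng.random() < p_gray else view


if __name__ == "__main__":
    rock = [[(0.2 * r, 0.1 * c, 0.5) for c in range(4)] for r in range(4)]  # toy 4 x 4 RGB image
    view_a, view_b = make_view(rock, crop=3, seed=1), make_view(rock, crop=3, seed=2)
    print(len(view_a), len(view_a[0]), len(view_b), len(view_b[0]))
```

Two views made from the same rock image with different seeds form one positive pair for the contrastive task.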

[0045] This technical solution proceeds as follows:

1. It augments rock images to construct positive and negative sample pairs.
2. It extracts features with an encoder.
3. It reduces their dimensionality with a projection head.
4. It iteratively optimizes the encoder parameters using the contrastive loss.

This approach can uncover the essential features of rock texture and structure without manual annotation. It enhances the robustness of the feature representation and improves the generalization ability of subsequent lithology identification models.
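One common way to realize the contrastive loss on projection-head outputs is the NT-Xent (normalized temperature-scaled cross-entropy) loss popularized by SimCLR. The patent does not name its loss, so the sketch below is an assumption. In a batch, the two views of each image form the positive pair and all other views act as negatives. The temperature of 0.5 is also assumed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def nt_xent(z1: Sequence[Vector], z2: Sequence[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss for a batch of projection-head outputs.

    z1[i] and z2[i] are the low-dimensional features of image i and of its
    enhanced view (a positive pair); every other sample in the batch is a negative.
    """
    views = list(z1) + list(z2)
    n, total = len(z1), 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [cosine(views[i], views[k]) / temperature for k in range(2 * n)]
        others = [sims[k] for k in range(2 * n) if k != i]  # exclude self-similarity
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - sims[pos]  # -log softmax probability of the positive
    return total / (2 * n)


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent(a, a))                         # views matched to their own image
    print(nt_xent(a, [[0.0, 1.0], [1.0, 0.0]]))  # views swapped between images
```

Lower values mean each view is closer to its own augmented partner than to the other images in the batch. This is the clustering and separation behavior described in [0044].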

[0046] Optionally, based on the above scheme, the learning task includes a rotation task; the step of pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes: rotating the first sample rock image to obtain a rotated image; inputting the first sample rock image and the rotated image into the initial encoder to obtain a third coding feature of the first sample rock image and a fourth coding feature of the rotated image; converting the third coding feature into a first histogram feature, converting the fourth coding feature into a second histogram feature, determining a rotation loss based on the first histogram feature and the second histogram feature, and adjusting the parameters of the initial encoder based on the rotation loss to obtain a pre-trained encoder.

[0047] The rotation task can be understood as a self-supervised learning task. A known rotation (e.g., 0°, 90°, 180°, or 270°) is applied to the image, and the model predicts the rotation angle. This forces the encoder to attend to the semantic structure and orientation of the image and improves the spatial understanding of the features.

Image rotation can be understood as a rigid rotation transformation of the image. It constructs image pairs with a known geometric relationship, which supply the supervision signal for the rotation task. The rotated image is the image obtained by rotating the first sample rock image by a certain angle. It forms an "original-rotated" pair with the original image for encoding and subsequent loss calculation.

The third encoded feature is the high-dimensional feature extracted by the initial encoder from the first sample rock image, i.e., the representation of the original rock image in feature space. It is used for histogram conversion and loss calculation. The fourth encoded feature is the high-dimensional feature extracted by the initial encoder from the rotated image. It represents the rotated rock image and is related to the third encoded feature through the loss.

A histogram feature is obtained by statistically mapping the encoded features into a vector describing their distribution. Converting high-dimensional abstract features into distribution statistics makes it easier to quantify the spatial differences between the two images. The first histogram feature is the distribution histogram vector obtained from the third encoded feature. It characterizes the numerical and structural distribution of the encoded features of the original rock image.
The second histogram feature can be understood as a feature distribution statistical histogram vector obtained by transforming the fourth encoding feature, characterizing the numerical and structural distribution patterns of the encoded features of the rotated rock image. The rotation loss can be understood as a self-supervised task loss function calculated using the differences between the first and second histogram features, quantifying the feature distribution deviation between the original and rotated images, and guiding the iterative optimization of encoder parameters.

[0048] This technical solution constructs self-supervised sample pairs by rotating rock images. It then converts the encoded features into histogram features, calculates the rotation loss, and back-propagates this loss to optimize the encoder parameters. This enables the encoder to learn the spatial orientation and structural distribution of rock bedding on its own. As a result, the model adapts better to changes in rock image orientation and extracts lithological features more effectively.
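A minimal sketch of the rotation-and-histogram pipeline follows. The 90° rotation and the normalized histogram are standard operations. The following are hypothetical stand-ins for the unspecified loss and the real network:

- the L1 distance between histograms;
- the bin settings;
- the two toy encoders (`flatten_encoder`, `row_mean_encoder`).

```python
from typing import List, Sequence

Image = List[List[float]]


def rotate90(image: Image, k: int = 1) -> Image:
    """Rotate a 2-D image counter-clockwise by k * 90 degrees (k = 0..3 covers 0°/90°/180°/270°)."""
    out = [row[:] for row in image]
    for _ in range(k % 4):
        out = [list(row) for row in zip(*out)][::-1]  # transpose, then flip rows = 90° CCW
    return out


def histogram(values: Sequence[float], bins: int = 8, lo: float = 0.0,
              hi: float = 1.0) -> List[float]:
    """Normalized histogram; values outside [lo, hi] are clamped into the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts] if values else [0.0] * bins


def rotation_loss(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histogram features (assumed choice of distance)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical stand-in encoders; a real encoder would be a trained network.
def flatten_encoder(image: Image) -> List[float]:
    return [v for row in image for v in row]


def row_mean_encoder(image: Image) -> List[float]:
    return [sum(row) / len(row) for row in image]


if __name__ == "__main__":
    bedded = [[0.0, 0.0], [1.0, 1.0]]  # toy image with horizontal "bedding"
    rotated = rotate90(bedded)
    for enc in (flatten_encoder, row_mean_encoder):
        print(enc.__name__, rotation_loss(histogram(enc(bedded)), histogram(enc(rotated))))
```

On this toy image, `flatten_encoder` gives a loss of 0.0 because its value distribution does not change under rotation. `row_mean_encoder` gives 2.0 because its features depend on orientation.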

[0049] Based on the above scheme, optionally, the learning task includes a reconstruction task; the step of pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes: occluding part of the content in the first sample rock image to obtain a sample missing image; reconstructing the sample missing image using a decoder to obtain a sample reconstructed image; inputting the first sample rock image and the sample reconstructed image into the initial encoder to obtain a fifth coding feature of the first sample rock image and a sixth coding feature of the sample reconstructed image; determining a reconstruction loss based on the fifth coding feature and the sixth coding feature; and adjusting the parameters of the initial encoder based on the reconstruction loss to obtain a pre-trained encoder.

[0050] The reconstruction task can be understood as a self-supervised or generative task, requiring the model to recover the original input from partial or damaged input, forcing the encoder to extract sufficiently complete semantic and structural information so that the decoder can reconstruct the original rock image as accurately as possible. The missing sample image can be understood as an image obtained by partially occluding the first sample rock image, serving as input to the encoding and reconstruction process to test the model's ability to recover "from incomplete to complete". The decoder can be understood as a learnable network module used to map encoded features or latent representations back to the image space, reconstructing the sample reconstructed image based on the features output by the encoder; it is the core generative component of the reconstruction task. The reconstructed sample image can be understood as the image output by the decoder after reconstructing the missing sample image, compared with the original first sample rock image, used to calculate the reconstruction loss and guide parameter updates. The fifth encoded feature can be understood as the feature extracted by the initial encoder from the "first sample rock image (complete image)," representing the high-level representation of the complete rock image, used for comparison with the features of the reconstructed image. The sixth encoded feature can be understood as the feature extracted by the initial encoder from the "sample reconstructed image," characterizing the feature information of the restored rock image, used for difference comparison with the features of the original image. 
The reconstruction loss can be understood as a self-supervised loss function constructed based on the feature differences between the fifth and sixth coding features, which quantifies the feature deviation between the original image and the reconstructed image and provides gradient signal constraints for network parameter updates.

[0051] This technical solution generates missing samples by occluding rock images and reconstructs them using a decoder. It constructs a reconstruction loss using the encoding features of the original and reconstructed images and optimizes the encoder parameters in a self-supervised manner, enabling the encoder to learn the global structure and local contextual features of the rock image, thereby enhancing the completeness and representational ability of feature extraction in complex lithological scenes.
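The occlusion step and a feature-space reconstruction loss can be sketched as follows. Two choices are assumptions, as the patent names neither the occlusion pattern nor the distance:

- masking random non-overlapping square patches;
- using mean squared error between the fifth and sixth encoded features.

The decoder is replaced by a placeholder and the encoder by a hypothetical row-mean function. The patch size, mask ratio, and seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]


def mask_patches(image: Image, patch: int, ratio: float, seed: int = 0,
                 fill: float = 0.0) -> Tuple[Image, List[Tuple[int, int]]]:
    """Occlude a random fraction of non-overlapping patch x patch blocks.

    Returns the sample missing image and the sorted top-left corners of the masked blocks.
    """
    rows, cols = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, rows, patch) for c in range(0, cols, patch)]
    chosen = random.Random(seed).sample(blocks, round(ratio * len(blocks)))
    out = [row[:] for row in image]  # copy so the original image is left untouched
    for r0, c0 in chosen:
        for r in range(r0, min(r0 + patch, rows)):
            for c in range(c0, min(c0 + patch, cols)):
                out[r][c] = fill
    return out, sorted(chosen)


def feature_mse(f5: Sequence[float], f6: Sequence[float]) -> float:
    """Mean squared error between encoder features of the original and reconstructed images."""
    if len(f5) != len(f6):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(f5, f6)) / len(f5)


def row_mean_encoder(image: Image) -> List[float]:
    """Hypothetical stand-in for the initial encoder."""
    return [sum(row) / len(row) for row in image]


if __name__ == "__main__":
    rock = [[1.0] * 4 for _ in range(4)]
    missing, masked = mask_patches(rock, patch=2, ratio=0.5, seed=3)
    reconstructed = missing  # placeholder: a trained decoder would fill the occluded blocks
    print(masked, feature_mse(row_mean_encoder(rock), row_mean_encoder(reconstructed)))
```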

[0052] Based on the above scheme, optionally, the learning task includes a distribution task. The step of pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes:

- selecting multiple target local images from the first sample rock image, and obtaining a reference local image from a third sample rock image;
- inputting the multiple target local images and the reference local image into the initial encoder to obtain the seventh coding features of the target local images and the eighth coding feature of the reference local image;
- determining the homologous feature similarity among the multiple target local images and the heterologous feature similarity between the target local images and the reference local image; and
- determining a distribution loss according to the ranking relationship between the homologous and heterologous feature similarities, and adjusting the parameters of the initial encoder according to the distribution loss to obtain the pre-trained encoder.
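The distribution task can be sketched with two pieces. The first chooses reference images by the depth-spacing condition defined in [0053]. The second is a ranking loss that requires every homologous similarity to exceed every heterologous one. The following are assumptions:

- the hinge form;
- the cosine similarity;
- the 0.2 margin;
- the dictionary keys `well` and `depth_m`.

The alternative condition based on differing category labels is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def select_references(candidates: List[Dict], well: str, depth_m: float,
                      min_spacing_m: float) -> List[Dict]:
    """Same-well candidates whose depth spacing exceeds the preset threshold."""
    return [c for c in candidates
            if c["well"] == well and abs(c["depth_m"] - depth_m) > min_spacing_m]


def distribution_loss(target_feats: Sequence[Vector], reference_feats: Sequence[Vector],
                      margin: float = 0.2) -> float:
    """Hinge ranking loss: every homologous similarity should beat every heterologous one by `margin`."""
    homologous = [cosine(a, b) for a, b in combinations(target_feats, 2)]
    heterologous = [cosine(t, r) for t in target_feats for r in reference_feats]
    if not homologous or not heterologous:
        raise ValueError("need >= 2 target local features and >= 1 reference feature")
    terms = [max(0.0, margin - (s_h - s_x)) for s_h in homologous for s_x in heterologous]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    cands = [{"id": "A", "well": "W1", "depth_m": 2010.0},
             {"id": "B", "well": "W1", "depth_m": 2100.0}]
    print([c["id"] for c in select_references(cands, "W1", 2000.0, 50.0)])  # ['B']
    print(distribution_loss([[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0]]))  # ranking satisfied
    print(distribution_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]))  # ranking violated
```

The loss is zero once every pair of local views from the same image is more similar than any target-reference pair by at least the margin.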

[0053] The distribution task can be understood as a self-supervised or weakly supervised task constrained by the feature-distribution relationship between homologous and heterologous samples: it regularizes the structure of the feature space by drawing the features of samples from the same source or category closer together and pushing the features of samples from different sources or categories further apart. The target local images can be understood as multiple local regions (image patches) selected from the first sample rock image, representing different local views of the same rock sample. The third sample rock image can be understood as another rock image sample from the same drilling area as the first sample rock image, with a depth interval greater than a preset threshold. Alternatively, the third sample rock image can be a sample rock image whose sample category label differs from that of the first sample rock image. In either case it satisfies a heterology condition with respect to the first sample rock image and therefore serves as the negative (heterologous) reference in the distribution task. The reference local image can be understood as one or more local images taken from the third sample rock image; it serves as the heterologous local image used to calculate feature similarity with the target local images. The "same drilling area" can be understood as a collection of rock samples from the same well, the same geological unit, or the same sedimentary environment; within one lithological or sedimentary context, similar depths generally imply stronger homology. The "depth interval" can be understood as the distance between two rock images along the well depth direction, which serves as one of the geological criteria for judging whether two samples are likely to be homologous or similar.
The "preset threshold" can be understood as the critical value set for the depth interval; when the depth interval exceeds it, the two images are considered likely to be heterologous and are therefore suitable as negative samples for the distribution task. The "seventh encoded feature" can be understood as the feature extracted by the initial encoder from each target local image, representing the texture and deep structural features of each local region within the same original image; it is used to calculate homologous similarity. The "eighth encoded feature" can be understood as the deep feature vector output by the initial encoder for the reference local image, representing the local rock features of the heterologous sample; it is compared with the target local features to calculate heterologous similarity. The "homologous feature similarity" can be understood as the quantified similarity between the encoded features of multiple target local images from the same first sample rock image; it measures how consistently local features are distributed within one sample and serves as the benchmark for the distribution constraint. The heterologous feature similarity can be understood as the quantified similarity between the encoded features of a target local image and the reference local image; it measures the difference between the local features of samples from different strata or lithologies and defines the boundary of the heterologous feature distribution. The ranking relationship can be understood as the required order between the two similarities: homologous similarity should be higher than heterologous similarity, which provides the basis for the loss calculation.
The distribution loss can be understood as a self-supervised loss function built on the ranking deviation between homologous and heterologous feature similarity; it quantifies the error when the similarity ranking does not meet this expectation and, through backpropagation, constrains the encoder to optimize its feature distribution.

[0054] In one optional implementation, a Siamese network architecture consisting of an initial encoder E and a projection head P is constructed, taking the first sample rock image as input. The projection head P is a small multilayer perceptron that serves only the contrastive task branch: it maps the high-dimensional feature vectors output by the encoder to a low-dimensional embedding space suitable for calculating the contrastive loss. The other three self-supervised task branches (the rotation, reconstruction, and distribution tasks) operate directly on the feature maps or feature vectors output by the encoder layers, without passing through the projection head. The projection head is used only during self-supervised pre-training and is discarded afterward. Joint multi-task optimization is performed on an unlabeled image set, with a loss function composed of four branches tailored to the geological characteristics of terrestrial shale.

[0055] The input first sample rock image is independently augmented twice using random cropping, color jitter, and geometric transformations: it is randomly cropped to 80%–100% of its original size and resized to 224×224; flipped horizontally or vertically with probability 0.5; jittered in brightness, contrast, saturation, and hue by ±10%; blurred with a Gaussian kernel whose standard deviation σ ∈ [0.1, 2.0]; and converted to grayscale with probability 0.2. Following the SimCLR framework, the initial encoder extracts high-dimensional features from the first sample rock image and the augmented image, and these features are fed into the projection head for dimensionality reduction. Positive sample pairs are formed from the reduced features, and the contrastive loss is computed from these pairs; for example, the InfoNCE (Information Noise-Contrastive Estimation) loss can be used. The loss pulls homologous features together and pushes heterologous features apart.
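The contrastive branch can be illustrated with a minimal NumPy sketch of the NT-Xent form of the InfoNCE loss used in SimCLR. It assumes that `z_a` and `z_b` are the projection-head outputs for two views of the same batch (row i of each forms a positive pair; every other row in the combined batch is a negative) and that the temperature is 0.5. The function name and the temperature value are illustrative assumptions, not values given in this document.

```python
import numpy as np


def info_nce_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent (InfoNCE) loss over a batch of N positive pairs (sketch, assumed temperature)."""
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize so dot product = cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) scaled similarity matrix
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    n = z_a.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # index of each row's positive
    # Numerically stable log-sum-exp per row; the -inf diagonal contributes exp(-inf) = 0.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - row_max).sum(axis=1)) + row_max[:, 0]
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())
```

With identical one-hot embeddings for both views and a batch of four, the loss equals log(1 + 6·exp(-2)), about 0.59; shuffling the second view so that rows no longer match raises it, which is the pull-together / push-apart behaviour described above.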

[0056] A rotation operation is applied to the input first sample rock image, and a rotation loss is introduced, calculated as:

[0057] ;

[0058] In this embodiment, the terms of the formula are: the rotation loss; the mean-square-error function; the histogram-of-oriented-gradients (HOG) feature extraction function applied to the shallow-layer output of the encoder; the input first sample rock image; and the rotation operation applied to that image, parameterized by the rotation angle. The superscript T denotes matrix transpose.

[0059] A rectangular region of the image is randomly occluded, and a decoder is trained to reconstruct the occluded content. A reconstruction loss constrained by lamination continuity is introduced:

[0060] ;

[0061] In the formula, the terms are: the reconstruction loss; the first sample rock image; the output image reconstructed by the decoder (the sample reconstructed image); element-wise multiplication; the penalty weight coefficient of the lamination-continuity constraint, whose value is greater than 0; the gradient field of the image; the convolution operation; the directional-derivative convolution kernel constructed along the main lamination direction; and the vector 2-norm. The formula penalizes repair errors perpendicular to the texture direction more heavily, forcing the model to follow the linear extension of the bedding during repair and thereby preventing it from generating disordered false textures.
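Because the formula in [0060] is not reproduced in this text, the following NumPy sketch shows only one plausible reading of it: a squared error over the occluded region plus a penalty, weighted by a hypothetical `lam`, on how the reconstruction error changes along the main lamination direction `theta`. Error that varies along the laminae (i.e. cuts across them) is penalized, while error that runs parallel to them is not. The directional derivative uses `np.gradient` instead of an explicit convolution kernel, the occluded-rectangle size range is assumed, and `feature_reconstruction_loss` illustrates the feature-level variant of [0050] as a plain mean squared difference. All function names are hypothetical.

```python
import numpy as np


def occlude_random_rectangle(image, rng, min_frac=0.2, max_frac=0.4):
    """Zero out one random rectangle; return (occluded image, mask) with mask == 1 where occluded."""
    h, w = image.shape
    rh = max(1, int(h * rng.uniform(min_frac, max_frac)))
    rw = max(1, int(w * rng.uniform(min_frac, max_frac)))
    top = int(rng.integers(0, h - rh + 1))
    left = int(rng.integers(0, w - rw + 1))
    mask = np.zeros(image.shape, dtype=float)
    mask[top:top + rh, left:left + rw] = 1.0
    return image * (1.0 - mask), mask


def directional_derivative(field, theta):
    """Derivative of a 2-D field along direction theta (radians from the column axis)."""
    d_row, d_col = np.gradient(field)  # derivatives along axis 0 (rows) and axis 1 (columns)
    return np.cos(theta) * d_col + np.sin(theta) * d_row


def lamination_reconstruction_loss(original, reconstructed, mask, theta, lam=0.1):
    """Masked squared error plus a penalty on error changes along the lamination direction."""
    error = reconstructed - original
    pixel_term = np.sum((mask * error) ** 2) / max(float(mask.sum()), 1.0)
    continuity_term = np.mean(directional_derivative(error, theta) ** 2)
    return float(pixel_term + lam * continuity_term)


def feature_reconstruction_loss(feat_original, feat_reconstructed):
    """Feature-level variant: mean squared distance between encoder features of the two images."""
    return float(np.mean((feat_original - feat_reconstructed) ** 2))
```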

[0062] Two target local images that overlap substantially (IoU > 0.6) are randomly selected from the same image, and a reference local image is taken from another image: the third sample rock image, which comes from the same drilling area as the first sample rock image with a depth interval greater than a preset threshold or, alternatively, carries a different sample category label from the first sample rock image. The distribution loss is constructed as:

[0063] ;

[0064] In this embodiment, the terms are: the distribution loss; the margin hyperparameter, set to 0.5, which controls the minimum separation between positive and negative sample pairs in the feature space; the hinge loss function; the similarity function, for which cosine similarity is used; and the local-block feature vectors extracted by the feature encoder. Optimizing this loss leads the model to map continuously deposited microfacies with similar mineral assemblages to neighboring regions of the feature space.
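A minimal sketch of the distribution task, assuming the missing formula in [0063] is a standard margin ranking (hinge) loss with the stated margin of 0.5 and cosine similarity. Two square crops with IoU above 0.6 are drawn from one image by rejection sampling, and the loss becomes zero once the homologous similarity exceeds the heterologous similarity by the margin. The box format, crop size, retry limit, and function names are illustrative assumptions.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two boxes given as (top, left, bottom, right) in pixels."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def sample_overlapping_crops(h, w, size, rng, min_iou=0.6, max_tries=1000):
    """Rejection-sample two square crops of side `size` whose IoU exceeds `min_iou`."""
    for _ in range(max_tries):
        t1, l1, t2, l2 = (int(rng.integers(0, n - size + 1)) for n in (h, w, h, w))
        a = (t1, l1, t1 + size, l1 + size)
        b = (t2, l2, t2 + size, l2 + size)
        if box_iou(a, b) > min_iou:
            return a, b
    raise RuntimeError("no crop pair met the IoU requirement")


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def distribution_loss(z_target_1, z_target_2, z_reference, margin=0.5):
    """Hinge loss: homologous similarity should exceed heterologous similarity by `margin`."""
    s_homologous = cosine_similarity(z_target_1, z_target_2)
    s_heterologous = cosine_similarity(z_target_1, z_reference)
    return max(0.0, margin - s_homologous + s_heterologous)
```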

[0065] Based on the above scheme, optionally, an overall loss function is constructed so that encoder E can be pre-trained end-to-end on a large set of unlabeled shale images.

[0066] Furthermore, the total loss function is defined as:

[0067] $$L_{\text{total}} = \lambda_{1} L_{\text{con}} + \lambda_{2} L_{\text{rot}} + \lambda_{3} L_{\text{rec}} + \lambda_{4} L_{\text{dis}}$$

[0068] In the formula, $L_{\text{total}}$ is the total loss value for multi-task joint pre-training; $L_{\text{con}}$ is the contrastive loss, $L_{\text{rot}}$ the rotation loss, $L_{\text{rec}}$ the reconstruction loss, and $L_{\text{dis}}$ the distribution loss; $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$, and $\lambda_{4}$ are the balancing weights of the contrastive, rotation, reconstruction, and distribution losses, respectively. All coefficients are greater than 0 and set the contribution of each task to the total loss. The loss is minimized iteratively with a stochastic gradient descent (SGD) optimizer with momentum until convergence. After pre-training, the encoder weights are saved and the projection head is discarded. This step outputs the pre-trained feature encoder E, which can extract general geological features such as shale bedding strike, mineral fabric, and grain size distribution.
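The weighted combination of the four pre-training losses and the momentum optimizer can be sketched as follows. The dictionary keys, the default learning rate and momentum, and the toy quadratic objective are assumptions for illustration. The update uses the convention v ← μv + g, p ← p − ηv (the one PyTorch's `torch.optim.SGD` uses with zero dampening), which is one of several equivalent momentum formulations.

```python
import numpy as np


def total_pretraining_loss(losses, weights):
    """Weighted sum of the four pre-training losses (hypothetical keys: con, rot, rec, dis)."""
    if any(w <= 0 for w in weights.values()):
        raise ValueError("all balancing weights must be greater than 0")
    return sum(weights[name] * losses[name] for name in ("con", "rot", "rec", "dis"))


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v + g, then p <- p - lr * v."""
    velocity = momentum * velocity + grads
    return params - lr * velocity, velocity


# Toy usage: minimize f(p) = 0.5 * ||p||^2, whose gradient is p itself.
params = np.array([1.0, -2.0])
velocity = np.zeros_like(params)
for _ in range(500):
    params, velocity = sgd_momentum_step(params, params, velocity, lr=0.05)
```

In real pre-training the gradient would come from backpropagating the weighted total loss through the encoder (and the projection head for the contrastive branch); the quadratic here only demonstrates that the iteration converges.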

[0069] This technical solution constructs homologous and heterologous sample pairs from local rock images, computes their feature similarities, builds a distribution loss from the ranking relationship, and optimizes the encoder parameters in a self-supervised manner. The encoder thus learns a feature distribution in which similar rocks cluster and dissimilar rocks separate, improving the discrimination and stability of lithological feature representation.

[0070] S120. Obtain the second sample rock image and the sample category label corresponding to the second sample rock image, and connect the classification head to the output of the pre-trained encoder to obtain the initial lithology identification model.

[0071] The second sample rock image can be understood as a labeled rock image sample used for supervised fine-tuning of the model; it is the supervised training input for tuning the model on the lithology classification task. The sample category label can be understood as the lithology category identifier of the second sample rock image, assigned manually or according to a standard; it serves as the ground truth that guides parameter updates during training. The output end can be understood as the position and interface at which the last layer of the encoder network outputs its features; it outputs the global deep features extracted by the encoder and connects to the downstream classification head. The classification head can be understood as a classification module connected to the back end of the encoder, consisting of fully connected layers, activation layers, and the like, that maps the deep features output by the encoder to probabilities of specific lithology categories. The initial lithology identification model can be understood as the complete model formed by adding the classification head to the pre-trained encoder before any fine-tuning for the classification task; it forms a complete basic inference chain from rock image input to lithology category output.

[0072] A fully connected classification head C is connected after the pre-trained encoder E, and a first stage of adaptation training is performed with a small number of labeled samples. Specifically, in this stage the weights of all convolutional layers of the encoder are frozen, while the scale factor (γ) and bias (β) parameters of the encoder's last batch-normalization layer are unfrozen. The training objective is a cross-entropy loss with a regularization term based on the prior distribution of shale lithofacies:

[0073] ;

[0074] In the formula, the terms are: the training loss value; summation over all labeled samples; the true lithology category label (one-hot encoded) of the i-th sample; the lithology category prediction probability vector output by the model for the i-th sample; the natural logarithm; the weighting coefficient of the prior-distribution regularization term, which controls the strength of the prior constraint; the Kullback–Leibler divergence, which measures the difference between two probability distributions; and the model's average predicted probability distribution over the current batch. The prior probability distribution of the lithologies is determined by analyzing geological logging data from the target exploration block (for example, clay shale accounts for 62%, silty shale for 28%, and so on). Using macroscopic geological data as a soft constraint, the formula corrects classifier output bias caused by sampling deviations under small-sample conditions, improving the model's calibration under the true distribution. This step outputs an initial lithology identification model whose classification head has completed this first stage of training.
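The first-stage objective can be sketched in NumPy as a summed cross-entropy plus a KL term between the lithology prior from well-log statistics and the batch-average prediction. The formula itself is not reproduced in this text, so the KL direction KL(prior ‖ mean prediction), the default weight `lam`, and the clipping constant `eps` are assumptions; the function name is hypothetical.

```python
import numpy as np


def prior_regularized_cross_entropy(y_onehot, probs, prior, lam=0.1, eps=1e-12):
    """Summed cross-entropy plus lam * KL(prior || batch-mean prediction) (assumed form).

    y_onehot: (N, C) one-hot labels; probs: (N, C) predicted class probabilities;
    prior: (C,) lithology proportions derived from logging data.
    """
    probs = np.clip(probs, eps, 1.0)                  # avoid log(0)
    ce = -np.sum(y_onehot * np.log(probs))            # summed over all labeled samples
    mean_pred = probs.mean(axis=0)                    # average predicted distribution in the batch
    kl = np.sum(prior * (np.log(np.clip(prior, eps, 1.0)) - np.log(mean_pred)))
    return float(ce + lam * kl)
```

When the batch-average prediction already matches the prior, the KL term vanishes and the loss reduces to plain cross-entropy; a mismatch adds a penalty that grows with `lam`.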

[0075] S130. Input the second sample rock image into the initial lithology identification model to obtain the sample rock category. Adjust the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model according to the sample rock category and the sample category label to obtain the target lithology identification model.

[0076] The sample rock category can be understood as the lithology category predicted by the initial lithology identification model for an input second sample rock image; this prediction is compared with the true label to produce the training loss. The parameters can be understood as learnable variables, such as the weights and biases of each neural network layer, that determine the network's feature extraction and mapping rules; their values are continuously optimized during training. The target lithology identification model can be understood as the final lithology identification model obtained after supervised fine-tuning and iterative optimization, which can be used to automatically classify and identify the lithology of unknown rock images.

[0077] Based on the above scheme, optionally, adjusting the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model according to the sample rock category and the sample category label further includes: determining the main lamination direction field angles of a fourth sample rock image and a fifth sample rock image that are adjacent along the depth direction among the second sample rock images; and, during training of the initial lithology identification model, if the difference between the main lamination direction field angles of the fourth and fifth sample rock images is less than a preset angle threshold while the model outputs different lithology category predictions for the two images, adjusting the model parameters of the initial lithology identification model so that it outputs the same sample rock category for both images.

[0078] The fourth sample rock image can be understood as one of the second sample rock images; together with the fifth sample rock image it forms a depth-adjacent pair used to judge whether the lithology predictions should be consistent. The fifth sample rock image can be understood as another rock image adjacent to the fourth sample rock image along the depth direction; the two images serve as an adjacent sample pair for the lamination consistency constraint. The main lamination direction field angle can be understood as the angle describing the dominant direction of lamination (bedding) in a rock image, reflecting the depositional or tectonic orientation of the rock; a small change in this angle can serve as a geological prior that two images belong to the same lithology. The preset angle threshold can be understood as the critical angle for judging whether two lamination directions are sufficiently close: when the angle difference is below this threshold, the two images are considered continuous in lamination structure and should tend to receive the same lithology. The lithology category prediction result can be understood as the category output by the initial lithology identification model for the fourth or fifth sample rock image.

[0079] For core images that form a sequence along the depth direction, a structure-tensor smoothing regularization term is introduced to eliminate non-geological jumps in the prediction results.
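The main lamination direction can be estimated from the image structure tensor, a standard technique: the dominant gradient direction is 0.5·atan2(2·Jxy, Jxx − Jyy), and laminae run perpendicular to it. The sketch below averages the tensor over the whole image to return a single angle in [0°, 180°), measured from the column axis toward the row axis. This global averaging, the angle convention, and the function name are assumptions; a per-pixel orientation field would instead smooth the tensor components locally (for example with a Gaussian window).

```python
import numpy as np


def main_lamination_angle(image):
    """Dominant lamination orientation in degrees, in [0, 180), from the structure tensor."""
    g_row, g_col = np.gradient(image.astype(float))
    j_xx = np.sum(g_col * g_col)  # structure tensor components summed over the image
    j_yy = np.sum(g_row * g_row)
    j_xy = np.sum(g_col * g_row)
    # Dominant gradient direction; laminae run perpendicular to it.
    gradient_angle = 0.5 * np.arctan2(2.0 * j_xy, j_xx - j_yy)
    lamination_angle = np.mod(gradient_angle + np.pi / 2, np.pi)
    return float(np.degrees(lamination_angle))
```

Horizontal bands give an angle of 0°, vertical bands 90°, and bands running along the anti-diagonal (constant along r + c) 135°.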

[0080] For two adjacent frames with their respective main lamination direction fields, a sequence smoothing loss function is constructed, and this implicit optimization objective is embedded into the inference process:

[0081] ;

[0082] In the formula, the terms are: the sequence smoothing loss; the main texture orientation angles (in radians or degrees) of the two adjacent frames; the absolute value of the difference between the two angles (or their Euclidean distance); an indicator function that equals 1 when the condition in parentheses holds and 0 otherwise; and the lithology category predictions for the corresponding frames. The formula enforces the following logic: if the bedding attitude (dip and strike) stays consistent between adjacent depths, the model tends to keep the same lithology prediction; a switch of lithology category is allowed only when the bedding angle changes abruptly by more than the threshold (reflecting a drastic change in the sedimentary environment or the influence of a fault). This step suppresses "flickering" in the lithology identification results caused by local image noise or uneven illumination, improving the continuity and geological plausibility of the vertical lithology profile. It outputs the vertical lithology profile identification results after sequence smoothing optimization.
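Under the assumption that the missing formula in [0081] sums, over adjacent frames, the product of an indicator for "angle difference below threshold" and an indicator for "predicted labels differ", the loss can be sketched as below. Treating lamination angles as orientations modulo 180°, the 5° default threshold, and the differentiable surrogate `soft_sequence_smoothing_loss` (usable during training, unlike the indicator count) are all illustrative assumptions; the function names are hypothetical.

```python
import numpy as np


def axial_angle_difference(a_deg, b_deg):
    """Smallest difference between orientations defined modulo 180 degrees."""
    d = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 180.0
    return np.minimum(d, 180.0 - d)


def sequence_smoothing_loss(angles_deg, labels, threshold_deg=5.0):
    """Count adjacent pairs whose bedding is continuous but whose predicted lithology switches."""
    angles = np.asarray(angles_deg, dtype=float)
    labels = np.asarray(labels)
    continuous = axial_angle_difference(angles[:-1], angles[1:]) < threshold_deg
    switched = labels[:-1] != labels[1:]
    return int(np.sum(continuous & switched))


def soft_sequence_smoothing_loss(angles_deg, probs, threshold_deg=5.0):
    """Differentiable surrogate: penalize 1 - p_t . p_{t+1} on pairs with continuous bedding."""
    angles = np.asarray(angles_deg, dtype=float)
    probs = np.asarray(probs, dtype=float)
    continuous = axial_angle_difference(angles[:-1], angles[1:]) < threshold_deg
    agreement = np.sum(probs[:-1] * probs[1:], axis=1)  # 1 when both frames predict the same class
    return float(np.sum(continuous * (1.0 - agreement)))
```

For angles [10°, 12°, 40°] and labels [0, 1, 1], only the first pair is penalized: its bedding is continuous (2° apart) but the label switches, whereas the second switch-free pair crosses a 28° change anyway.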

[0083] By adopting this technical solution, a consistency constraint on the main lamination direction field angle is introduced. It exploits the geological-continuity prior to correct implausible lithology jumps predicted for adjacent rock images, improving the smoothness and geological plausibility of the well-profile lithology identification results.

[0084] Based on the above scheme, optionally, before obtaining the target lithology identification model, the method further includes: determining a first prediction accuracy index based on the second sample rock image and its corresponding sample rock category; determining a second prediction accuracy index based on the prediction consistency of multiple sets of adjacent sixth and seventh sample rock images along the depth direction; and adjusting the model parameters of the initial lithology identification model if the first prediction accuracy index is less than a first preset threshold or the second prediction accuracy index is less than a second preset threshold.

[0085] The first prediction accuracy index can be understood as a quantitative measure of single-sample lithology identification accuracy, obtained by comparing the predicted lithology of the second sample rock images with their true labels; it measures how accurately the model classifies the lithology of an individual rock image and serves as a basis for judging whether model iteration has converged. The sixth sample rock image can be understood as one image in a group of rock images that are adjacent along the stratigraphic depth direction; it forms a depth-adjacent image pair with the seventh sample rock image and is used to evaluate the consistency of vertical lithology predictions. The seventh sample rock image can be understood as the rock image adjacent to the sixth sample rock image along the stratigraphic depth direction (their order is not specifically defined); the pair reflects the continuity and plausibility of the model's lithology predictions across continuous strata. The second prediction accuracy index can be understood as an overall consistency measure obtained by statistically analyzing the lithology prediction consistency of multiple groups of depth-adjacent rock images; it measures the continuity and plausibility of the model's vertical lithology predictions and complements single-sample accuracy with a geological constraint. The first preset threshold can be understood as the preset minimum acceptable value of the first prediction accuracy index and serves as the criterion for single-sample identification accuracy; if the index is below it, parameter tuning and training continue.
The second preset threshold can be understood as the preset minimum acceptable value of the second prediction accuracy index and serves as the criterion for the consistency of stratigraphic lithology prediction; if it is not met, model parameter adjustment is triggered.

[0086] In one alternative implementation, the first prediction accuracy index can be the macro-average F1 score. For example, the model parameters of the initial lithology identification model are adjusted until the macro-average F1 score exceeds 0.93 and the lamination-continuity identification accuracy exceeds 95%.
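The two stopping indices can be sketched as follows. The macro-average F1 score and the 0.93 / 0.95 thresholds come from [0086]; defining the second index as the share of depth-adjacent pairs whose predicted "same or different lithology" relation matches the labels is a hypothetical choice, since the text does not give its formula. Function names are likewise hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes that appear in either array."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return float(np.mean(scores))


def depth_consistency(y_true, y_pred):
    """Hypothetical second index: share of adjacent pairs whose predicted
    lithology switch / no-switch matches the labelled one."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_switch = y_true[:-1] != y_true[1:]
    pred_switch = y_pred[:-1] != y_pred[1:]
    return float(np.mean(true_switch == pred_switch))


def needs_more_tuning(f1, consistency, f1_threshold=0.93, consistency_threshold=0.95):
    """Keep adjusting parameters while either index is below its threshold."""
    return f1 < f1_threshold or consistency < consistency_threshold
```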

[0087] This technical solution uses two indicators to monitor classification accuracy and the consistency of profile predictions. When either indicator fails to meet its standard, parameter adjustment is triggered, so that the final model achieves both high classification accuracy and depth continuity consistent with geological laws.

[0088] Based on the above scheme, optionally, after obtaining the target lithology identification model, the method further includes: acquiring a target rock image, inputting the target rock image into the target lithology identification model, and obtaining the target lithology category and the lithology category prediction probability vector.

[0089] The target rock image can be understood as the original image of the actual rock to be detected for automatic lithology identification, serving as the input data source for the target lithology identification model to conduct actual lithology reasoning and identification. The target lithology category can be understood as the final lithology category result output by the target lithology identification model based on the target rock image, providing an intuitive rock naming result to meet the application requirements of geological exploration lithology classification and interpretation. The lithology category prediction probability vector can be understood as the probability distribution vector output by the classification head, where each element corresponds to the prediction probability of a lithology category, summing to 1. This represents the model's confidence level for each lithology category and can be used for result credibility assessment, uncertainty analysis, and auxiliary geological comprehensive interpretation.

[0090] This technical solution enables automated and accurate identification of unknown rock images and outputs category probabilities to provide prediction confidence, supporting subsequent geological decision-making.

[0091] Based on the above scheme, optionally, after obtaining the target lithology category and the lithology category prediction probability vector, the method further includes: correcting the target lithology category based on a pre-constructed shale lithofacies knowledge graph, according to the Bayesian method and the lithology category prediction probability vector.

[0092] The shale lithofacies knowledge graph can be understood as a structured geological knowledge base that describes entities and relationships such as shale (or related lithologies) lithofacies, sedimentary environments, mineral composition, physical properties, and symbiotic relationships. It provides geological priors (such as "which lithologies / minerals are often associated with a certain lithofacies" and "what lithofacies are more likely to occur under a certain environment") to constrain or correct purely data-driven prediction results. The Bayesian method can be understood as a probabilistic reasoning method based on Bayes' theorem (posterior ∝ prior × likelihood), integrating model prediction probabilities with geological prior knowledge to achieve probabilistic iterative correction and optimal inference of lithology categories.

[0093] Taking the predicted probability vector of the input image as the object to be processed, a knowledge graph is constructed that covers terrestrial shale lithology categories, key mineral assemblages, and typical sedimentary-structure constraints.

[0094] Furthermore, the probability vector output by the model (the lithology category prediction probability vector) undergoes posterior correction based on Bayesian inference:

[0095] ;

[0096] In the formula, the terms are: the final corrected posterior probability of a lithology category given the input image; the normalization factor, which ensures that the probabilities of all classes sum to 1; the model's lithology category prediction probability vector; the geological prior likelihood term defined on the knowledge graph; and the set of adjacent nodes in the knowledge graph that have a direct constraint relationship with the category. The likelihood term is defined as follows: if the cosine similarity between the essential mineral assemblage of the predicted category (e.g., felsic mineral content must exceed 45%) and the mineral abundance feature vector extracted from the encoder layer is below a set threshold, the term is set to a minimum value. Acting as a hard constraint layer, this step automatically corrects errors that violate basic geological principles, such as misclassifying laminated shale rich in clay minerals as calcareous sandstone.
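A minimal sketch of the Bayesian correction, assuming the posterior is the model probability multiplied by a knowledge-graph likelihood and renormalized by the factor Z. Here the neighbour-node constraints are collapsed into one check per class: if the cosine similarity between the class's required mineral assemblage and the encoder's mineral-abundance vector falls below a threshold, the likelihood is set to a small floor value, otherwise to 1. The threshold, floor, mineral vectors, and function names are hypothetical.

```python
import numpy as np


def knowledge_graph_likelihood(required_minerals, mineral_features, sim_threshold=0.8, floor=1e-6):
    """Per-class geological likelihood (assumed form).

    required_minerals: (C, M) essential mineral-assemblage vector for each lithology class.
    mineral_features: (M,) mineral-abundance vector taken from an encoder layer.
    """
    norms = np.linalg.norm(required_minerals, axis=1) * np.linalg.norm(mineral_features)
    cos = required_minerals @ mineral_features / norms
    return np.where(cos < sim_threshold, floor, 1.0)  # hard constraint: violators get the floor


def bayesian_correction(probs, likelihood):
    """Posterior proportional to model probability times likelihood, normalized by Z."""
    unnormalized = np.asarray(probs, dtype=float) * np.asarray(likelihood, dtype=float)
    z = unnormalized.sum()  # normalization factor so the posterior sums to 1
    return unnormalized / z
```

In the illustrative case where class 0 requires mineral A but the encoder reports mostly mineral B, a prediction of [0.7, 0.3] is flipped to favour class 1 after correction.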

[0097] Specifically, the knowledge graph constructed in this embodiment includes the following triplet constraint rules:

[0098] Table 1. Examples of constraint rules for shale lithofacies-mineral-structure knowledge graph

[0099]

[0100] This technical solution integrates geological knowledge graphs with Bayesian inference to effectively correct model prediction biases and improve the geological rationality and interpretability of shale lithofacies identification results.

[0101] In the technical solution of this invention, a first sample rock image is acquired, and an initial encoder comprising multiple feature extraction layers is pre-trained on it with multiple learning tasks to obtain a pre-trained encoder; multi-task pre-training lets the encoder learn more general and robust rock feature representations. Next, a second sample rock image and its sample category label are acquired, and a classification head is connected to the output of the pre-trained encoder to obtain an initial lithology identification model, which forms a complete chain from rock image input to lithology category output. The second sample rock image is input into the initial lithology identification model to obtain the sample rock category, and the parameters of at least some feature extraction layers and of the classification head are adjusted according to the sample rock category and the sample category label to obtain a target lithology identification model. Fine-tuning selected feature layers together with the classification head adapts the model to the lithology identification task, addressing insufficient rock image feature representation and low identification accuracy in small-sample scenarios. Multi-task pre-training combined with supervised fine-tuning thus improves the accuracy and generalization ability of lithology identification from rock images.

[0102] Embodiment 2

[0103] Figure 2 is a flowchart of a training method for a lithology identification model provided in Embodiment 2 of the present invention. This embodiment refines the step, in the above embodiment, of adjusting the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model according to the sample rock category and the sample category label. Optionally, the feature extraction layers include multiple convolutional layers, and this adjustment step includes: dividing the multiple feature extraction layers into a bottom layer network, a middle layer network, and a high layer network, where the bottom layer network is adjacent to the input of the initial encoder, the high layer network is adjacent to the output of the initial encoder, and the middle layer network lies between them; for the middle layer network, obtaining the second rock coding features output by multiple convolutional layers in the middle layer network and performing a fast Fourier transform on them to obtain rock frequency domain features; determining, according to the rock frequency domain features of the multiple convolutional layers, the target convolutional channels to be updated in the middle layer network; and determining a second target loss according to the sample rock category and its corresponding sample category label, freezing the parameters of the bottom layer network, and adjusting, according to the second target loss, the parameters of the target convolutional channels in the middle layer network, the high layer network, and the classification head. For specific implementation details, refer to the description of this embodiment.
Technical features that are the same as or similar to those in the foregoing embodiments are not repeated here.

[0104] As shown in Figure 2, the method may specifically include:

[0105] S210. Obtain a first sample rock image, and pre-train an initial encoder based on the first sample rock image according to multiple learning tasks to obtain a pre-trained encoder. The initial encoder includes multiple feature extraction layers, and the feature extraction layers include multiple convolutional layers.

[0106] The convolutional layer can be understood as a network layer that uses convolutional kernels to slide on the feature map to calculate local weighted sums, which can extract basic visual features such as edges, textures, layers, and details of rock images layer by layer.

[0107] S220. Obtain the second sample rock image and the sample category label corresponding to the second sample rock image, and connect the classification head to the output of the pre-trained encoder to obtain the initial lithology recognition model.

[0108] S230. Input the second sample rock image into the initial lithology identification model to obtain the sample rock category. Divide the multiple feature extraction layers into a bottom layer network, a middle layer network, and a high layer network. The bottom layer network is adjacent to the input end of the initial encoder, the high layer network is adjacent to the output end of the initial encoder, and the middle layer network is located between the bottom layer network and the high layer network.

[0109] The bottom-layer network can be understood as the shallow portion of the feature extraction layers near the image input of the initial encoder. It extracts basic low-level features of the rock image, such as contours, edges, and simple textures. The middle-layer network can be understood as the intermediate portion of the feature extraction layers between the bottom-layer and high-layer networks. It combines and abstracts the basic low-level features to extract local rock structure and texture-combination features. The high-layer network can be understood as the deep portion of the feature extraction layers near the encoder output. It condenses global, high-order semantic features and outputs abstract representations strongly correlated with lithology.
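The source does not specify how many layers belong to each group. The sketch below partitions an ordered list of layer names, with the input side first. The group sizes `n_bottom` and `n_high`, the function name `split_encoder_layers`, and the layer names `conv1` to `conv8` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def split_encoder_layers(layer_names: Sequence[str], n_bottom: int,
                         n_high: int) -> Dict[str, List[str]]:
    """Split ordered feature-extraction layers (input side first) into
    bottom, middle and high groups. The group sizes are hypothetical
    tuning choices; the source does not fix them."""
    if n_bottom < 1 or n_high < 1 or n_bottom + n_high >= len(layer_names):
        raise ValueError("each group needs at least one layer")
    names = list(layer_names)
    return {
        "bottom": names[:n_bottom],                       # nearest the image input
        "middle": names[n_bottom:len(names) - n_high],    # between the two
        "high": names[len(names) - n_high:],              # nearest the encoder output
    }


layers = [f"conv{i}" for i in range(1, 9)]
groups = split_encoder_layers(layers, n_bottom=2, n_high=2)
print(groups)
```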

[0110] S240. For the middle layer network, obtain the second rock coding features output by multiple convolutional layers in the middle layer network, and perform a fast Fourier transform on the second rock coding features to obtain rock frequency domain features. Determine the target convolutional channel to be updated in the middle layer network based on the rock frequency domain features corresponding to the multiple convolutional layers.

[0111] The second rock coding features can be understood as the intermediate rock feature maps or feature vectors that each convolutional layer of the middle-layer network outputs during forward inference. They carry the rock structure and texture details extracted by the middle-layer network and serve as the input to the frequency domain transform. The rock frequency domain features can be understood as the frequency domain representation obtained by applying the fast Fourier transform to the second rock coding features. They describe frequency domain patterns such as rock texture density, bedding periodicity, and structural undulation, and are used to screen effective convolutional channels. The target convolutional channels can be understood as the convolutional channels in the middle-layer network that frequency domain analysis has selected to participate in parameter updates during training. They are the effective channels that contribute most to lithology identification, and updating only these channels enables targeted, fine-grained (partial) parameter optimization.

[0112] Based on the above scheme, optionally, determining the target convolutional channel to be updated in the middle layer network according to the rock frequency domain features corresponding to the multiple convolutional layers includes: determining the frequency domain activation amplitude of multiple channels in the middle layer network according to the rock frequency domain features, and determining the channel with the frequency domain activation amplitude greater than a preset amplitude threshold as the target convolutional channel to be updated.
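The source does not define how the scalar amplitude is derived from a channel's spectrum. The sketch below assumes one plausible definition: the mean spectral magnitude over all non-DC frequencies, normalised by the number of pixels. It then keeps the channels whose amplitude exceeds the preset threshold. To stay self-contained it uses a naive 2-D DFT, which produces the same coefficients as an FFT, only more slowly. In practice a library routine such as `numpy.fft.fft2` would be used. The function names and the threshold value are hypothetical.

```python
import cmath
from typing import List

Matrix = List[List[float]]


def dft2(x: Matrix) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform (an FFT gives the same values faster)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * cmath.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)] for u in range(h)]


def activation_amplitude(fmap: Matrix) -> float:
    """Hypothetical scalar amplitude: mean spectral magnitude over all
    non-DC frequencies, normalised by the number of pixels."""
    h, w = len(fmap), len(fmap[0])
    spec = dft2(fmap)
    mags = [abs(spec[u][v]) for u in range(h) for v in range(w) if (u, v) != (0, 0)]
    return sum(mags) / len(mags) / (h * w)


def select_target_channels(channels: List[Matrix], threshold: float) -> List[int]:
    """Indices of channels whose amplitude exceeds the preset threshold."""
    return [c for c, fmap in enumerate(channels) if activation_amplitude(fmap) > threshold]


flat = [[1.0] * 4 for _ in range(4)]                                  # uniform response
checker = [[float((m + n) % 2) for n in range(4)] for m in range(4)]  # fine texture
print(select_target_channels([flat, checker], threshold=0.01))
```

The uniform channel has no energy outside the DC term and is left frozen, while the finely textured channel passes the gate.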

[0113] The frequency domain activation amplitude can be understood as a scalar derived from the rock frequency domain features that represents the overall activation intensity of a channel. It measures the channel's "response intensity" in the frequency domain: the larger the amplitude, the more discriminative or active the channel is for the current input.

[0114] In an optional implementation, a fast Fourier transform (FFT) is applied to the feature maps output by each feature layer of the encoder E, and layer-wise differentiated fine-tuning is performed based on the frequency domain energy distribution.

[0115] Specifically, the feature map F is decomposed into a low-frequency component, which reflects the bedding distribution and the color background, and a high-frequency component, which reflects mineral grain boundaries and microcracks.
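A minimal sketch of the low/high split, assuming a radial frequency cutoff in cycles per sample (the cutoff value and the helper names are hypothetical, since the source gives no boundary). DFT indices are folded to signed frequencies, and each bin's energy is assigned to the low band if its radial frequency is at most the cutoff, otherwise to the high band. The toy inputs are a two-bed "bedding" map and a pixel-scale checkerboard standing in for grain boundaries.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dft2(x: Matrix) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform (an FFT gives the same values faster)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * cmath.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)] for u in range(h)]


def band_energies(fmap: Matrix, cutoff: float) -> Tuple[float, float]:
    """Return (low, high) spectral energy. A bin is low-frequency when its
    radial frequency, in cycles per sample, is at most `cutoff`."""
    h, w = len(fmap), len(fmap[0])
    spec = dft2(fmap)
    low = high = 0.0
    for u in range(h):
        for v in range(w):
            fu = min(u, h - u) / h          # fold index to |signed frequency|
            fv = min(v, w - v) / w
            energy = abs(spec[u][v]) ** 2
            if math.hypot(fu, fv) <= cutoff:
                low += energy
            else:
                high += energy
    return low, high


def high_fraction(fmap: Matrix, cutoff: float = 0.25) -> float:
    low, high = band_energies(fmap, cutoff)
    return high / (low + high)


bedding = [[1.0 if m < 4 else 0.0 for _ in range(8)] for m in range(8)]
checker = [[float((m + n) % 2) for n in range(8)] for m in range(8)]
print(round(high_fraction(bedding), 3), round(high_fraction(checker), 3))
```

The bedding map keeps most of its energy in the low band. Half of the checkerboard's energy sits at the DC term (its mean grey level, i.e. the "color background") and half at the highest frequency.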

[0116] The bottom-layer network mainly responds to the low-frequency component. This component exhibits cross-domain invariance across shale formations in different basins, so all weights of the bottom-layer network are strictly frozen.

[0117] The response of the middle-layer network contains high-frequency components. For this layer, the invention proposes a frequency-domain gated fine-tuning mechanism. The frequency-domain activation amplitude of the feature map output by each convolution kernel is calculated, and only the weights of channels whose high-frequency activation amplitude exceeds a threshold are updated. The weight update formula is:

[0118] $$W^{\mathrm{mid}}_{t+1} = W^{\mathrm{mid}}_{t} - \eta \left( M \odot \frac{\partial \mathcal{L}_{\mathrm{task}}}{\partial W^{\mathrm{mid}}_{t}} \right)$$

[0119] In the formula, $W^{\mathrm{mid}}_{t}$ is the weight matrix of the middle-layer network at the $t$-th iteration, and $W^{\mathrm{mid}}_{t+1}$ is the updated weight matrix. $\eta$ is the learning rate in the fine-tuning stage. $M$ is a binary mask vector generated from the frequency-domain activation amplitudes: its entry at a channel position is 1 if that channel's activation amplitude exceeds the threshold $\tau$, and 0 otherwise. $\odot$ denotes element-wise multiplication, with the per-channel mask applied to all of that channel's weights. $\partial \mathcal{L}_{\mathrm{task}} / \partial W^{\mathrm{mid}}_{t}$ is the gradient of the task loss $\mathcal{L}_{\mathrm{task}}$ with respect to the middle-layer weights. This mechanism restricts fine-tuning to the channels that respond to the mineral assemblages and structural features specific to the current well section. It thereby prevents small-sample training from destroying the lamination-continuity perception established during pre-training. This step outputs the target lithology identification model after progressive unfreezing and fine-tuning. For any input image, the model outputs a lithology category prediction probability vector.
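The gated update above is ordinary gradient descent in which a 0/1 mask per output channel zeroes the step for channels that failed the amplitude gate. The minimal sketch below assumes the middle-layer weights are stored as one row of (flattened) kernel weights per output channel, and uses toy numbers. The names `masked_sgd_step`, `W`, `G` and `M` are illustrative.

```python
from typing import List


def masked_sgd_step(weights: List[List[float]], grads: List[List[float]],
                    mask: List[int], lr: float) -> List[List[float]]:
    """One step of W_{t+1} = W_t - lr * (M * dL/dW_t), applied per channel.
    Row c of `weights`/`grads` holds the flattened kernel of output channel c;
    mask[c] == 1 lets that channel update, mask[c] == 0 leaves it unchanged."""
    if not (len(weights) == len(grads) == len(mask)):
        raise ValueError("weights, grads and mask must cover the same channels")
    return [[w - lr * m * g for w, g in zip(w_row, g_row)]
            for w_row, g_row, m in zip(weights, grads, mask)]


W = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]]   # 3 channels, 2 weights each (toy values)
G = [[1.0, 1.0], [2.0, -2.0], [0.5, 0.5]]   # gradient of the task loss
M = [0, 1, 0]                               # only channel 1 passed the amplitude gate
print(masked_sgd_step(W, G, M, lr=0.1))
```

In a deep-learning framework, the same effect is usually obtained by multiplying the parameter's gradient by the mask before the optimizer step.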

[0120] The high-layer network is fully fine-tuned: all parameters of the high-level semantic layers and of the classification head are unfrozen and updated.

[0121] This technical solution uses the frequency domain activation amplitude to screen important channels in the middle-layer network, achieving channel-level fine-tuning. Updates are concentrated on key features while redundant parameter updates are suppressed, which improves both convergence efficiency and lithology identification accuracy.

[0122] S250. Determine the second target loss based on the sample rock category and its corresponding sample category label. Freeze the parameters of the bottom layer network, and adjust the parameters of the target convolutional channels in the middle layer network, the high layer network, and the classification head based on the second target loss to obtain the target lithology identification model.

[0123] The second target loss can be understood as a classification loss computed from the rock category predicted by the model and the actual sample category label. It quantifies the error between the lithology prediction and the true label and provides the gradient signal for backpropagation. Parameter freezing can be understood as fixing the weights and biases of a network layer so that backpropagation does not update them during training. Freezing preserves the general, basic rock features already learned by the bottom-layer network and prevents fine-tuning from disrupting the pre-training prior.
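The source only says that the second target loss is a classification loss. A common choice, assumed here rather than confirmed by the source, is softmax cross-entropy. The sketch pairs that loss with a hypothetical per-layer policy table for the fine-tuning stage: bottom layers frozen, middle layers restricted to gated channels, and high layers plus the classification head fully trainable. The group names follow the bottom/middle/high division described above. The layer names and the policy labels are illustrative.

```python
import math
from typing import Dict, List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def update_plan(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical training policy: which parameters move during fine-tuning."""
    plan = {name: "frozen" for name in groups["bottom"]}
    plan.update({name: "gated channels only" for name in groups["middle"]})
    plan.update({name: "full" for name in groups["high"] + ["classification_head"]})
    return plan


print(cross_entropy([2.0, 0.0, 0.0], label=0))
print(update_plan({"bottom": ["conv1"], "middle": ["conv2"], "high": ["conv3"]}))
```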

[0124] The technical solution of this invention achieves efficient parameter fine-tuning through layered freezing and frequency domain channel selection. While preserving the general features of the bottom layer, it accurately optimizes the key channels in the middle layer and the semantic expression in the high layer, which significantly improves the model's ability to perceive rock texture features and the accuracy of lithology identification.

[0125] Embodiment 3

[0126] Figure 3 is a schematic structural diagram of a training device for a lithology identification model provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes a pre-trained encoder determination module 310, an initial model determination module 320, and a target model determination module 330, wherein:

[0127] The pre-trained encoder determination module 310 is used to acquire a first sample rock image, pre-train an initial encoder based on the first sample rock image according to multiple learning tasks, and obtain a pre-trained encoder, wherein the initial encoder includes multiple feature extraction layers; the initial model determination module 320 is used to acquire a second sample rock image and the sample category label corresponding to the second sample rock image, and connect a classification head to the output of the pre-trained encoder to obtain an initial lithology recognition model; the target model determination module 330 is used to input the second sample rock image into the initial lithology recognition model to obtain the sample rock category, and adjust the parameters of at least some feature extraction layers and the classification head in the initial lithology recognition model according to the sample rock category and the sample category label to obtain a target lithology recognition model.

[0128] In the technical solution of this invention, the pre-trained encoder determination module acquires a first sample rock image and pre-trains an initial encoder, which includes multiple feature extraction layers, on that image using multiple learning tasks to obtain a pre-trained encoder. Multi-task pre-training enables the encoder to learn more general and robust rock feature representations. The initial model determination module then acquires a second sample rock image and its corresponding sample category label and connects a classification head to the output of the pre-trained encoder. The resulting initial lithology identification model, built from the pre-trained encoder and the classification head, establishes a complete lithology classification and identification pipeline. The target model determination module inputs the second sample rock image into the initial lithology identification model to obtain the sample rock category. Based on the sample rock category and the sample category label, it adjusts the parameters of at least some feature extraction layers and the classification head to obtain the target lithology identification model. Selectively fine-tuning some feature layers together with the classification head adapts the model precisely to the lithology identification task. This addresses the insufficient rock image feature representation and low lithology identification accuracy seen in small sample scenarios. Multi-task pre-training combined with supervised fine-tuning thus improves the accuracy and generalization of rock image lithology identification.

[0129] Optionally, the learning tasks include a contrastive task, and the pre-trained encoder determination module includes a first pre-training submodule. The first pre-training submodule is configured to:
- perform image enhancement on the first sample rock image to obtain a sample enhanced image;
- input the first sample rock image and the sample enhanced image into the initial encoder to obtain a first encoded feature of the first sample rock image and a second encoded feature of the sample enhanced image;
- input the first encoded feature and the second encoded feature into a projection head, which reduces their dimensionality to obtain a first low-dimensional feature and a second low-dimensional feature;
- determine a contrastive loss based on the first and second low-dimensional features; and
- adjust the parameters of the initial encoder based on the contrastive loss to obtain the pre-trained encoder.
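The source does not name the form of the contrastive loss. The sketch below assumes an InfoNCE-style batch loss, a widely used choice that the source does not confirm. Each original image's low-dimensional feature `z1[i]` is pulled toward that of its own enhanced view `z2[i]` and pushed away from the other views in the batch. The temperature `tau` and all names are hypothetical, and the toy 2-D vectors stand in for projection-head outputs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(z1: List[List[float]], z2: List[List[float]], tau: float = 0.1) -> float:
    """Batch InfoNCE: (z1[i], z2[i]) is the positive pair (original image and
    its enhanced view); every z2[j] with j != i acts as a negative."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_den = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_den - logits[i]
    return total / n


z1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]    # each view matches its own original
shuffled = [[0.0, 1.0], [1.0, 0.0]]   # views swapped between samples
print(info_nce(z1, aligned), info_nce(z1, shuffled))
```

The loss is close to zero when each enhanced view lines up with its own original, and large when the pairs are swapped.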

[0130] Optionally, the learning tasks include a rotation task, and the pre-trained encoder determination module includes a second pre-training submodule. The second pre-training submodule is configured to:
- rotate the first sample rock image to obtain a rotated image;
- input the first sample rock image and the rotated image into the initial encoder to obtain a third encoded feature of the first sample rock image and a fourth encoded feature of the rotated image;
- convert the third encoded feature into a first histogram feature and the fourth encoded feature into a second histogram feature;
- determine a rotation loss based on the first and second histogram features; and
- adjust the parameters of the initial encoder based on the rotation loss to obtain the pre-trained encoder.
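A histogram discards spatial position, so comparing the histograms of the two encoded features rewards an encoder whose activation distribution does not change when the rock image is rotated. The minimal sketch below assumes fixed-range binning and an L1 distance between normalised histograms; the source specifies neither. As a toy stand-in for "encode the rotated image", it rotates a feature grid directly. The bin count, the range, and all names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def rotate90(grid: Matrix) -> Matrix:
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flatten(grid: Matrix) -> List[float]:
    return [x for row in grid for x in row]


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalised histogram of activations over [lo, hi]; values outside the
    range are clamped into the first or last bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in values:
        k = min(int((x - lo) / width), bins - 1)
        counts[max(k, 0)] += 1
    return [c / len(values) for c in counts]


def rotation_loss(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms (a hypothetical choice of distance)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


feat = [[0.1, 0.9], [0.4, 0.6]]
other = [[0.1, 0.1], [0.1, 0.1]]
h_feat = histogram(flatten(feat), 4, 0.0, 1.0)
print(rotation_loss(h_feat, histogram(flatten(rotate90(feat)), 4, 0.0, 1.0)))
print(rotation_loss(h_feat, histogram(flatten(other), 4, 0.0, 1.0)))
```

Rotation only permutes the activations, so the first loss is zero. A genuinely different activation distribution gives a positive loss.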

[0131] Optionally, the learning tasks include a reconstruction task, and the pre-trained encoder determination module includes a third pre-training submodule. The third pre-training submodule is configured to:
- occlude part of the first sample rock image to obtain a missing sample image;
- reconstruct the missing sample image using a decoder to obtain a reconstructed sample image;
- input the first sample rock image and the reconstructed sample image into the initial encoder to obtain a fifth coding feature of the first sample rock image and a sixth coding feature of the reconstructed sample image;
- determine a reconstruction loss based on the fifth and sixth coding features; and
- adjust the parameters of the initial encoder based on the reconstruction loss to obtain the pre-trained encoder.
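The reconstruction task compares the two images in feature space rather than pixel space. The minimal sketch below shows the data flow: occlude, decode, encode both images, and compare the codes. It assumes a square occlusion patch and mean-squared error between the codes, and the source does not fix either choice. The encoder and decoder are trivial hypothetical stand-ins: row means, and a "decoder" that refills blanked pixels with the mean of the visible ones.

```python
from typing import Callable, List

Matrix = List[List[float]]


def occlude(img: Matrix, top: int, left: int, size: int, fill: float = 0.0) -> Matrix:
    """Return a copy of `img` with a size x size square blanked out."""
    out = [row[:] for row in img]
    for i in range(top, min(top + size, len(img))):
        for j in range(left, min(left + size, len(img[0]))):
            out[i][j] = fill
    return out


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_loss(img: Matrix, decoder: Callable[[Matrix], Matrix],
                        encoder: Callable[[Matrix], List[float]],
                        top: int, left: int, size: int) -> float:
    """Occlude -> decode -> encode both images -> compare the two codes."""
    restored = decoder(occlude(img, top, left, size))
    return mse(encoder(img), encoder(restored))


# Toy stand-ins (hypothetical): the encoder summarises each row by its mean.
def row_means(m: Matrix) -> List[float]:
    return [sum(r) / len(r) for r in m]


def identity(m: Matrix) -> Matrix:
    return m


def mean_fill(m: Matrix) -> Matrix:
    """Refill zero (blanked) pixels with the mean of the remaining pixels."""
    vals = [x for r in m for x in r if x != 0.0]
    mu = sum(vals) / len(vals)
    return [[mu if x == 0.0 else x for x in r] for r in m]


img = [[2.0] * 4 for _ in range(4)]
print(reconstruction_loss(img, identity, row_means, 0, 0, 2))   # no repair
print(reconstruction_loss(img, mean_fill, row_means, 0, 0, 2))  # perfect repair
```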

[0132] Optionally, the learning tasks include a distribution task, and the pre-trained encoder determination module includes a fourth pre-training submodule. The fourth pre-training submodule is configured to:
- select multiple target local images from the first sample rock image, and obtain a reference local image from a third sample rock image, where the third sample rock image either comes from the same drilling area as the first sample rock image at a depth interval greater than a preset threshold, or has a different sample category label from the first sample rock image;
- input the multiple target local images and the reference local image into the initial encoder to obtain seventh coding features of the target local images and an eighth coding feature of the reference local image;
- determine the homologous feature similarity among the multiple target local images and the heterologous feature similarity between the target local images and the reference local image; and
- determine a distribution loss based on the ranking relationship between the homologous and heterologous feature similarities, and adjust the parameters of the initial encoder based on the distribution loss to obtain the pre-trained encoder.
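The distribution loss is described only as depending on the ranking between homologous and heterologous similarities. The sketch below assumes a hinge (margin) ranking loss over cosine similarities: every homologous similarity, between two patches of the same image, should exceed every heterologous similarity, between a target patch and the reference patch, by at least `margin`. The margin value, the function names, and the toy 2-D codes are hypothetical. At least two target patches are needed.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def distribution_loss(targets: List[List[float]], reference: List[float],
                      margin: float = 0.2) -> float:
    """Hinge ranking loss averaged over all (homologous, heterologous) pairs."""
    if len(targets) < 2:
        raise ValueError("need at least two target local images")
    homo = [cosine(a, b) for a, b in combinations(targets, 2)]
    hetero = [cosine(t, reference) for t in targets]
    pairs = [(h, g) for h in homo for g in hetero]
    return sum(max(0.0, margin - (h - g)) for h, g in pairs) / len(pairs)


targets = [[1.0, 0.0], [1.0, 0.1]]     # codes of patches from the same image
far_ref = [0.0, 1.0]                   # reference patch that looks different
near_ref = [1.0, 0.05]                 # reference patch that looks too similar
print(distribution_loss(targets, far_ref), distribution_loss(targets, near_ref))
```

When the reference patch is already well separated, the loss is zero. When it looks as similar as the homologous patches, the hinge becomes active and produces a gradient signal.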

[0133] Optionally, the feature extraction layer includes multiple convolutional layers; the target model determination module includes a network layering submodule, a target convolutional channel determination submodule, and a first adjustment submodule. The network layering submodule is used to divide the multiple feature extraction layers into a bottom layer network, a middle layer network, and a high layer network, wherein the bottom layer network is adjacent to the input of the initial encoder, the high layer network is adjacent to the output of the initial encoder, and the middle layer network is located between the bottom layer network and the high layer network; the target convolutional channel determination submodule is used to obtain the second rock coding features output by multiple convolutional layers in the middle layer network, perform a Fast Fourier Transform on the second rock coding features to obtain rock frequency domain features, and determine the target convolutional channel to be updated in the middle layer network based on the rock frequency domain features corresponding to the multiple convolutional layers; the first adjustment submodule is used to determine a second target loss based on the sample rock category and its corresponding sample category label, freeze the parameters of the bottom layer network, and adjust the parameters of the target convolutional channel in the middle layer network, the high layer network, and the classification head based on the second target loss.

[0134] Optionally, the target convolutional channel determination submodule is specifically configured to determine the frequency domain activation amplitudes of multiple channels in the middle layer network according to the rock frequency domain features, and to determine the channels whose frequency domain activation amplitude is greater than a preset amplitude threshold as the target convolutional channels to be updated.

[0135] Optionally, the training device for the lithology identification model further includes an application module. The application module is used to acquire a target rock image after obtaining the target lithology identification model, input the target rock image into the target lithology identification model, and obtain the target lithology category and a lithology category prediction probability vector.

[0136] Optionally, the training device for the lithology identification model further includes a correction module. After the target lithology category and the lithology category prediction probability vector are obtained, the correction module is configured to correct the target lithology category by a Bayesian method that combines the lithology category prediction probability vector with a pre-constructed shale lithofacies knowledge graph.
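The source does not describe how the knowledge graph enters the Bayesian correction. One simple reading, assumed here, is that the graph supplies a prior over lithofacies for the current context, for example transition probabilities given the lithofacies of the overlying interval. That prior is multiplied with the model's probability vector and the product is renormalised. The lithofacies names and all probabilities below are invented for illustration, and `bayes_correct` is a hypothetical helper.

```python
from typing import Dict


def bayes_correct(model_probs: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """posterior(c) proportional to model_probs[c] * prior[c], renormalised.
    Falls back to the model output if the prior rules out every class."""
    unnorm = {c: p * prior.get(c, 0.0) for c, p in model_probs.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        return dict(model_probs)
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical model output and knowledge-graph prior for one depth interval.
model_probs = {"laminated shale": 0.45, "massive mudstone": 0.40, "siltstone": 0.15}
kg_prior = {"laminated shale": 0.2, "massive mudstone": 0.7, "siltstone": 0.1}

posterior = bayes_correct(model_probs, kg_prior)
print(max(model_probs, key=model_probs.get), "->", max(posterior, key=posterior.get))
```

With these invented numbers, the prior is strong enough to flip a narrow model decision. With a confident model output, the prior would only reweight it.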

[0137] Optionally, the target model determination module includes a second adjustment submodule. The second adjustment submodule is configured to determine the principal lamina orientation field angles of a fourth sample rock image and a fifth sample rock image that are adjacent along the depth direction among the second sample rock images. During training of the initial lithology identification model, if the difference between the principal lamina orientation field angles of the fourth and fifth sample rock images is less than a preset angle threshold, and the initial lithology identification model outputs different lithology category predictions for the two images, the model parameters of the initial lithology identification model are adjusted so that the same sample rock category is output for both images.
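The source does not say how the principal orientation angle is measured. As a swapped-in standard technique, the sketch below estimates a dominant orientation from the averaged gradient structure tensor. The returned angle is the dominant gradient direction, and the laminae run perpendicular to it. Both images are measured the same way, so the 90 degree offset cancels in their difference. Angles are axial (period 180 degrees). The helper flags a depth-adjacent pair for the consistency adjustment when their orientations agree within a hypothetical `max_angle` but the predictions differ. How the flagged pair then feeds back into training, for example as an extra loss term, is left open, as in the source.

```python
import math
from typing import List

Matrix = List[List[float]]


def dominant_orientation(img: Matrix) -> float:
    """Dominant gradient orientation in degrees, in [0, 180), from the
    averaged structure tensor (central differences on interior pixels)."""
    jxx = jyy = jxy = 0.0
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            gx = (img[i][j + 1] - img[i][j - 1]) / 2
            gy = (img[i + 1][j] - img[i - 1][j]) / 2
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    theta = 0.5 * math.atan2(2 * jxy, jxx - jyy)
    return math.degrees(theta) % 180.0


def angle_diff(a: float, b: float) -> float:
    """Smallest difference between two axial angles (period 180 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def needs_consistency_update(angle_a: float, angle_b: float, pred_a: str,
                             pred_b: str, max_angle: float = 10.0) -> bool:
    """True when two depth-adjacent images share a lamina orientation but
    received different lithology predictions."""
    return angle_diff(angle_a, angle_b) < max_angle and pred_a != pred_b


horizontal = [[float(i)] * 5 for i in range(5)]             # intensity changes with depth
vertical = [[float(j) for j in range(5)] for _ in range(5)]  # intensity changes across
print(dominant_orientation(horizontal), dominant_orientation(vertical))
```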

[0138] Optionally, the target model determination module includes a third adjustment submodule. This third adjustment submodule is used to: determine a first prediction accuracy index based on the second sample rock image and its corresponding sample rock category before obtaining the target lithology identification model; determine a second prediction accuracy index based on the prediction consistency of multiple sets of adjacent sixth and seventh sample rock images along the depth direction; and adjust the model parameters of the initial lithology identification model if the first prediction accuracy index is less than a first preset threshold or the second prediction accuracy index is less than a second preset threshold.
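The two indices and the stopping rule can be sketched as follows. This assumes the first index is plain accuracy against the labels, and that the second is the share of consecutive depth-ordered pairs receiving the same prediction. The source does not say how the adjacent pairs are chosen. Both thresholds and all names are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """First index: fraction of samples whose prediction matches the label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def adjacent_consistency(depth_ordered_preds: Sequence[str]) -> float:
    """Second index: fraction of consecutive depth pairs (i, i+1) that
    receive the same prediction."""
    pairs = list(zip(depth_ordered_preds, depth_ordered_preds[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def should_keep_adjusting(preds: Sequence[str], labels: Sequence[str],
                          depth_ordered_preds: Sequence[str],
                          acc_min: float = 0.9,
                          cons_min: float = 0.8) -> Tuple[bool, float, float]:
    """Keep adjusting the model if either index is below its threshold."""
    a = accuracy(preds, labels)
    c = adjacent_consistency(depth_ordered_preds)
    return (a < acc_min or c < cons_min), a, c


print(should_keep_adjusting(["A", "A", "B", "B"], ["A", "A", "B", "A"],
                            ["A", "A", "B", "B"]))
```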

[0139] The training device for the lithology identification model provided in the embodiments of the present invention can execute the training method for the lithology identification model provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0140] Embodiment 4

[0141] Figure 4 is a schematic structural diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and/or claimed herein.

[0142] As shown in Figure 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on a computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are interconnected via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

[0143] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0144] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as a method for training a lithology identification model.

[0145] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication unit 19, or installed from storage unit 18, or installed from ROM 12. When the computer program is executed by processor 11, it performs the functions defined in the methods of the embodiments of the present invention.

[0146] In some embodiments, a method for training a lithology identification model can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program can be loaded and / or installed on electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the method for training a lithology identification model described above can be performed. Alternatively, in other embodiments, processor 11 can be configured to perform a method for training a lithology identification model by any other suitable means (e.g., by means of firmware).

[0147] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0148] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device, such that, when executed by the processor, they cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The computer programs may be executed entirely on a machine, partly on a machine, partly on a machine as a standalone software package and partly on a remote machine, or entirely on a remote machine or server.

[0149] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0150] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0151] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0152] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host. A cloud server is a host product within the cloud computing service system that addresses the shortcomings of traditional physical hosts and VPS (virtual private server) services, such as difficult management and weak business scalability.

[0153] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0154] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A method for training a lithology identification model, characterized by comprising: acquiring a first sample rock image, and pre-training an initial encoder based on the first sample rock image according to multiple learning tasks to obtain a pre-trained encoder, wherein the initial encoder includes multiple feature extraction layers; acquiring a second sample rock image and a sample category label corresponding to the second sample rock image, and connecting a classification head to the output of the pre-trained encoder to obtain an initial lithology identification model; and inputting the second sample rock image into the initial lithology identification model to obtain a sample rock category, and adjusting the parameters of at least some feature extraction layers and the classification head in the initial lithology identification model based on the sample rock category and the sample category label to obtain a target lithology identification model.

2. The method according to claim 1, characterized in that the learning tasks include a contrastive task, and pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes: performing image enhancement on the first sample rock image to obtain a sample enhanced image; inputting the first sample rock image and the sample enhanced image into the initial encoder to obtain a first encoded feature of the first sample rock image and a second encoded feature of the sample enhanced image; inputting the first encoded feature and the second encoded feature into a projection head, so that the projection head performs dimensionality reduction on them to obtain a first low-dimensional feature and a second low-dimensional feature; and determining a contrastive loss based on the first low-dimensional feature and the second low-dimensional feature, and adjusting the parameters of the initial encoder based on the contrastive loss to obtain the pre-trained encoder.

3. The method according to claim 1, characterized in that the learning tasks include a rotation task, and pre-training the initial encoder based on the first sample rock image using multiple learning tasks to obtain a pre-trained encoder includes: rotating the first sample rock image to obtain a rotated image; inputting the first sample rock image and the rotated image into the initial encoder to obtain a third encoded feature of the first sample rock image and a fourth encoded feature of the rotated image; converting the third encoded feature into a first histogram feature and the fourth encoded feature into a second histogram feature; and determining a rotation loss based on the first histogram feature and the second histogram feature, and adjusting the parameters of the initial encoder based on the rotation loss to obtain the pre-trained encoder.

4. The method according to claim 1, characterized in that the learning tasks include a reconstruction task, and pre-training the initial encoder on the first sample rock image according to the multiple learning tasks to obtain the pre-trained encoder comprises: partially occluding the first sample rock image to obtain a missing sample image; reconstructing the missing sample image with a decoder to obtain a reconstructed sample image; inputting the first sample rock image and the reconstructed sample image into the initial encoder to obtain a fifth encoded feature of the first sample rock image and a sixth encoded feature of the reconstructed sample image; and determining a reconstruction loss based on the fifth encoded feature and the sixth encoded feature, and adjusting the parameters of the initial encoder based on the reconstruction loss to obtain the pre-trained encoder.
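The occlusion step and the feature-space loss of claim 4 can be sketched as follows. Patch-wise random masking, the mean-squared error and the toy `mean_fill_decoder` (a stand-in for the learned decoder, not the patent's decoder) are all assumptions. The encoder is left abstract: `feature_mse` simply compares the fifth and sixth encoded features once they have been computed.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def mask_patches(img: Grid, patch: int, ratio: float, rng: random.Random) -> Tuple[Grid, Mask]:
    """Zero out a random subset of non-overlapping patch x patch blocks.
    Returns the missing (occluded) image and a mask where True marks hidden pixels."""
    h, w = len(img), len(img[0])
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    hidden = rng.sample(blocks, k=int(round(ratio * len(blocks))))
    out = [row[:] for row in img]
    mask = [[False] * w for _ in range(h)]
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = 0.0
                mask[r][c] = True
    return out, mask


def mean_fill_decoder(masked: Grid, mask: Mask) -> Grid:
    """Hypothetical stand-in for the learned decoder: fills hidden pixels with the
    mean of the visible ones."""
    visible = [v for row, mrow in zip(masked, mask) for v, m in zip(row, mrow) if not m]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [[fill if m else v for v, m in zip(row, mrow)] for row, mrow in zip(masked, mask)]


def feature_mse(f_a: List[float], f_b: List[float]) -> float:
    """Reconstruction loss measured between encoder outputs rather than pixels."""
    return sum((a - b) ** 2 for a, b in zip(f_a, f_b)) / len(f_a)
```

Measuring the loss in feature space, as the claim describes, rewards reconstructions that the encoder represents like the original image, rather than ones that merely match pixel values.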

5. The method according to claim 1, characterized in that the learning tasks include a distribution task, and pre-training the initial encoder on the first sample rock image according to the multiple learning tasks to obtain the pre-trained encoder comprises: selecting multiple target local images from the first sample rock image, and obtaining a reference local image from a third sample rock image, wherein the third sample rock image is either a sample rock image from the same drilling area as the first sample rock image whose depth spacing from it is greater than a preset threshold, or a sample rock image whose sample category label differs from that of the first sample rock image; inputting the multiple target local images and the reference local image into the initial encoder to obtain seventh encoded features of the target local images and an eighth encoded feature of the reference local image; determining a homologous feature similarity between the multiple target local images and a heterologous feature similarity between the target local images and the reference local image; and determining a distribution loss based on the ranking relationship between the homologous feature similarity and the heterologous feature similarity, and adjusting the parameters of the initial encoder based on the distribution loss to obtain the pre-trained encoder.
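The reference-selection rule and the ranking-based distribution loss of claim 5 might be sketched as below. The `Sample` record, cosine similarity, and the margin ranking formulation (every homologous similarity should exceed every heterologous similarity by at least `margin`) are assumptions; the claim only states that the loss depends on the ranking relationship between the two kinds of similarity.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence


@dataclass
class Sample:
    """Hypothetical metadata record for one sample rock image."""
    drilling_area: str
    depth_m: float
    label: str


def is_valid_reference(anchor: Sample, cand: Sample, min_spacing_m: float) -> bool:
    """Claim 5 rule: same drilling area with depth spacing above the threshold,
    or a different sample category label."""
    same_area_far = (cand.drilling_area == anchor.drilling_area
                     and abs(cand.depth_m - anchor.depth_m) > min_spacing_m)
    return same_area_far or cand.label != anchor.label


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def distribution_loss(target_feats: List[List[float]], ref_feat: List[float],
                      margin: float = 0.2) -> float:
    """Margin ranking loss over all (homologous, heterologous) similarity pairs."""
    homo = [cosine(a, b) for a, b in combinations(target_feats, 2)]
    hetero = [cosine(t, ref_feat) for t in target_feats]
    terms = [max(0.0, margin - (h - g)) for h in homo for g in hetero]
    return sum(terms) / len(terms)
```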

6. The method according to claim 1, characterized in that the feature extraction layers include multiple convolutional layers, and adjusting the parameters of at least some of the feature extraction layers and of the classification head in the initial lithology identification model according to the sample rock category and the sample category label comprises: dividing the multiple feature extraction layers into a bottom-layer network, a middle-layer network, and a high-layer network, wherein the bottom-layer network is adjacent to the input end of the initial encoder, the high-layer network is adjacent to the output end of the initial encoder, and the middle-layer network lies between the bottom-layer network and the high-layer network; for the middle-layer network, obtaining the second rock encoded features output by the multiple convolutional layers in the middle-layer network, and applying a fast Fourier transform to the second rock encoded features to obtain rock frequency-domain features; determining, based on the rock frequency-domain features corresponding to the multiple convolutional layers, the target convolutional channels to be updated in the middle-layer network; determining a second target loss based on the sample rock category and its corresponding sample category label; and freezing the parameters of the bottom-layer network while adjusting, according to the second target loss, the parameters of the target convolutional channels in the middle-layer network, of the high-layer network, and of the classification head.
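The layer partition and update plan of claim 6 can be expressed as a per-channel trainability mask. In this sketch the layer names, the group sizes `n_bottom` and `n_high`, and the mask representation are hypothetical. In a framework such as PyTorch, a plan like this would typically be applied by setting `requires_grad = False` on frozen parameters and masking the gradients of non-target channels. The classification head, which the claim also updates, is not shown.

```python
from typing import Dict, List, Set


def split_layers(layer_names: List[str], n_bottom: int, n_high: int) -> Dict[str, List[str]]:
    """Partition encoder layers, ordered from input to output, into three groups."""
    if n_bottom + n_high > len(layer_names):
        raise ValueError("bottom and high groups exceed the number of layers")
    cut = len(layer_names) - n_high
    return {
        "bottom": layer_names[:n_bottom],
        "middle": layer_names[n_bottom:cut],
        "high": layer_names[cut:],
    }


def trainable_plan(groups: Dict[str, List[str]],
                   target_channels: Dict[str, Set[int]],
                   channels_per_layer: Dict[str, int]) -> Dict[str, List[bool]]:
    """Per-channel update mask: bottom frozen, middle limited to target channels,
    high fully trainable."""
    plan: Dict[str, List[bool]] = {}
    for name in groups["bottom"]:
        plan[name] = [False] * channels_per_layer[name]
    for name in groups["middle"]:
        chosen = target_channels.get(name, set())
        plan[name] = [c in chosen for c in range(channels_per_layer[name])]
    for name in groups["high"]:
        plan[name] = [True] * channels_per_layer[name]
    return plan
```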

7. The method according to claim 6, characterized in that determining the target convolutional channels to be updated in the middle-layer network based on the rock frequency-domain features corresponding to the multiple convolutional layers comprises: determining, based on the rock frequency-domain features, the frequency-domain activation amplitudes of multiple channels in the middle-layer network, and determining the channels whose frequency-domain activation amplitude is greater than a preset amplitude threshold as the target convolutional channels to be updated.
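Claim 7 does not define the frequency-domain activation amplitude. The sketch below assumes it is the mean magnitude of a channel's 2-D Fourier spectrum, excluding the zero-frequency (DC) term so that a flat activation map scores near zero. A naive DFT keeps the example dependency-free; it yields the same values as an FFT routine such as `numpy.fft.fft2`, only more slowly. The threshold is a hypothetical parameter.

```python
import cmath
from typing import Dict, List, Set

Grid = List[List[float]]


def dft2_magnitude(fmap: Grid) -> Grid:
    """Magnitude of the 2-D discrete Fourier transform (naive, for tiny maps only)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for x in range(h):
                for y in range(w):
                    acc += fmap[x][y] * cmath.exp(-2j * cmath.pi * (u * x / h + v * y / w))
            row.append(abs(acc))
        out.append(row)
    return out


def channel_amplitude(fmap: Grid, skip_dc: bool = True) -> float:
    """Hypothetical activation amplitude: mean spectral magnitude, optionally without DC."""
    mag = dft2_magnitude(fmap)
    vals = [m for u, row in enumerate(mag) for v, m in enumerate(row)
            if not (skip_dc and u == 0 and v == 0)]
    return sum(vals) / len(vals)


def select_target_channels(layer_maps: Dict[str, List[Grid]], threshold: float) -> Dict[str, Set[int]]:
    """Mark channels whose amplitude exceeds the preset threshold for updating."""
    return {name: {c for c, fmap in enumerate(maps) if channel_amplitude(fmap) > threshold}
            for name, maps in layer_maps.items()}
```

Under this assumption, channels that respond to fine texture (high spatial-frequency energy) are the ones released for fine-tuning, while channels with smooth activations stay fixed.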

8. The method according to claim 1, characterized in that, after the target lithology identification model is obtained, the method further comprises: acquiring a target rock image, and inputting the target rock image into the target lithology identification model to obtain a target lithology category and a lithology category prediction probability vector.
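At inference time (claim 8), a classifier's logits are commonly turned into a probability vector with a softmax, and the most probable entry gives the target lithology category. The softmax step is an assumption; the claim only states that a category and a prediction probability vector are output.

```python
import math
from typing import List, Sequence, Tuple


def predict(logits: Sequence[float], classes: Sequence[str]) -> Tuple[str, List[float]]:
    """Convert classifier logits into (predicted category, probability vector)."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs
```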

9. The method according to claim 8, characterized in that, after the target lithology category and the lithology category prediction probability vector are obtained, the method further comprises: correcting the target lithology category by a Bayesian method, based on a pre-constructed shale lithofacies knowledge graph and the lithology category prediction probability vector.
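Claim 9 does not specify how the knowledge graph enters the Bayesian correction. One plausible reading, assumed here, is that the graph supplies prior weights for each lithology category given some geological context (for example, the lithofacies of the overlying interval), and the corrected category maximises the product of that prior and the model's predicted probability. The graph layout (`graph[context][category] -> weight`) and the context key are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, Dict[str, float]]


def prior_from_graph(graph: Graph, context: str, classes: Sequence[str]) -> List[float]:
    """Hypothetical prior: normalised edge weights from `context` to each category.
    Falls back to a uniform prior when the graph has no usable edges for `context`."""
    weights = [graph.get(context, {}).get(c, 0.0) for c in classes]
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(classes)] * len(classes)
    return [w / total for w in weights]


def bayes_correct(probs: Sequence[float], classes: Sequence[str],
                  prior: Sequence[float]) -> Tuple[str, List[float]]:
    """Posterior proportional to model probability times prior; returns (category, posterior)."""
    unnorm = [p * q for p, q in zip(probs, prior)]
    z = sum(unnorm)
    if z == 0.0:
        # The prior rules out every category the model supports; keep the model output.
        posterior = list(probs)
    else:
        posterior = [u / z for u in unnorm]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return classes[best], posterior
```

With a uniform prior the correction leaves the model's prediction unchanged; a strong geological prior can overturn a marginal model decision.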

10. The method according to claim 1, characterized in that adjusting the parameters of at least some of the feature extraction layers and of the classification head in the initial lithology identification model according to the sample rock category and the sample category label further comprises: determining the principal lamination direction field angles of a fourth sample rock image and a fifth sample rock image that are adjacent along the depth direction among the second sample rock images; and, during training of the initial lithology identification model, if the difference between the principal lamination direction field angles of the fourth and fifth sample rock images is less than a preset angle threshold and the initial lithology identification model outputs different lithology category predictions for the two images, adjusting the model parameters of the initial lithology identification model so that it outputs the same sample rock category for the fourth and fifth sample rock images.
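Claim 10 does not say how the principal lamination direction is measured. A common choice for dominant texture orientation, assumed here, is the structure tensor built from image gradients; the lamination direction is taken as perpendicular to the dominant gradient direction, and angles are compared modulo 180 degrees. The consistency penalty (half the L1 distance between the two probability vectors, applied only when the angles agree but the predicted categories differ) and the 10-degree threshold are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def lamination_angle(img: Grid) -> float:
    """Dominant lamination direction in degrees, in [0, 180), via the structure tensor.
    x runs along columns and y along rows; central differences on interior pixels."""
    jxx = jyy = jxy = 0.0
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    # Orientation of the dominant gradient; laminae run perpendicular to it.
    grad_deg = math.degrees(0.5 * math.atan2(2.0 * jxy, jxx - jyy))
    return (grad_deg + 90.0) % 180.0


def angle_diff(a: float, b: float) -> float:
    """Smallest difference between two undirected orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def consistency_penalty(p4: Sequence[float], p5: Sequence[float],
                        angle4: float, angle5: float, max_angle_deg: float = 10.0) -> float:
    """Non-zero only if the laminae agree in direction but the top predictions differ."""
    if angle_diff(angle4, angle5) >= max_angle_deg:
        return 0.0
    top4 = max(range(len(p4)), key=p4.__getitem__)
    top5 = max(range(len(p5)), key=p5.__getitem__)
    if top4 == top5:
        return 0.0
    return 0.5 * sum(abs(a - b) for a, b in zip(p4, p5))
```

Adding this penalty to the training loss pulls the two adjacent predictions toward each other, which is one way to realise the parameter adjustment the claim describes.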

11. The method according to claim 1, characterized in that, before the target lithology identification model is obtained, the method further comprises: determining a first prediction accuracy index based on the second sample rock images and their corresponding sample rock categories; determining a second prediction accuracy index based on the prediction consistency of multiple pairs of sixth and seventh sample rock images that are adjacent along the depth direction; and, if the first prediction accuracy index is less than a first preset threshold or the second prediction accuracy index is less than a second preset threshold, adjusting the model parameters of the initial lithology identification model.
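The two checks of claim 11 might look like the sketch below. It assumes the first index is plain classification accuracy against the sample category labels and the second is the fraction of depth-adjacent image pairs that receive the same predicted category; both definitions and the threshold values are assumptions.

```python
from typing import Sequence, Tuple


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """First index: fraction of predictions matching the sample category labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def adjacent_consistency(pair_preds: Sequence[Tuple[str, str]]) -> float:
    """Second index: fraction of depth-adjacent pairs predicted as the same category."""
    return sum(a == b for a, b in pair_preds) / len(pair_preds)


def needs_more_training(preds: Sequence[str], labels: Sequence[str],
                        pair_preds: Sequence[Tuple[str, str]],
                        acc_min: float, cons_min: float) -> bool:
    """Keep adjusting parameters while either index is below its preset threshold."""
    return accuracy(preds, labels) < acc_min or adjacent_consistency(pair_preds) < cons_min
```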

12. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the method for training a lithology identification model according to any one of claims 1 to 11.