An underground reservoir lithology identification method based on active learning and multi-person cooperation

By constructing a globally connected nearest neighbor propagation tree and a multi-user collaboration mechanism, tasks are distributed asynchronously and in parallel, and user weights are dynamically updated. This solves the problems of low efficiency and error caused by single-user annotation, and achieves efficient and accurate lithology identification.

CN122196343A · Pending · Publication Date: 2026-06-12 · SOUTHWEST PETROLEUM UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTHWEST PETROLEUM UNIV
Filing Date
2026-03-13
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing lithology identification methods rely on single-user annotation, which leads to extended annotation cycles and occasional errors. They cannot effectively avoid the subjective problems caused by geological ambiguity, and traditional methods become inefficient when the data scale increases.

Method used

A multi-user collaboration mechanism is introduced. By constructing a globally connected nearest neighbor propagation tree, query tasks are asynchronously and in parallel distributed to multiple users. The annotation results are integrated in real time based on user reliability weights, and the weights of ordinary users are dynamically updated to optimize decision quality.

Benefits of technology

It significantly improves the efficiency and accuracy of lithology identification labeling, reduces labeling costs, effectively suppresses the impact of labeling noise, and improves the robustness of identification results.

✦ Generated by Eureka AI based on patent content.

Figure CN122196343A_ABST

Abstract

The application discloses an underground reservoir lithology identification method based on active learning and multi-person cooperation. To address the problem that existing methods rely excessively on a single expert, which leads to low efficiency and susceptibility to subjective error, the application first preprocesses lithology data and initializes a multi-user cooperation environment. Second, a globally connected nearest neighbor propagation tree is constructed. Third, a query sequence is generated based on node ambiguity and topological importance, and the tasks are distributed asynchronously and in parallel to multiple users for labeling. Fourth, the labeling results are integrated using dynamic reliability weights, and the weights of ordinary users are updated in real time. Finally, the lithology category clusters are updated, categories are diffused on the propagation tree, and the final result is obtained by iteration. The application reduces the dependence on experts and the labeling cost, and improves the labeling efficiency, accuracy, and robustness of lithology identification.

Description

Technical Field

[0001] This invention relates to the field of oil and gas geological exploration technology, and in particular to a method for identifying underground reservoir lithology based on active learning and multi-person collaboration.

Background Technology

[0002] As lithology identification moves towards digitalization and automation, improving identification accuracy at lower cost from massive amounts of data has become a core challenge. Against this backdrop, active learning based on human-computer interaction offers a new approach. Such methods rapidly label sample lithology by constructing a nearest-neighbor diffusion network. At the same time, they select the most informative samples based on the topological relationships between lithology data nodes and assign them to users for labeling, obtaining lithology identification results at lower labeling cost. This type of method improves identification accuracy and saves time and resources. However, existing methods still have two shortcomings. On the one hand, they usually rest on the idealized assumption that "annotators are always reliable" and directly use a single user's labeling as the final lithology identification result. Because of the inherent ambiguity of geology and the differing experience and knowledge of each user, judgments on complex and diverse lithology samples can be subjective, so accidental labeling errors cannot be avoided. On the other hand, they rely on a single user for serial labeling. In traditional active learning lithology identification, the user must work through the query sequence generated by the system one item at a time, like a linear task queue, and the system must wait for the user to complete all labeling tasks in the current batch before starting the next round of queries. As the data volume increases, a large number of query tasks are assigned to a single user in sequence, which significantly extends the annotation cycle.

Summary of the Invention

[0003] To address the shortcomings described in the background section, this invention proposes a lithology identification method that integrates a multi-user collaboration mechanism with active learning. While retaining the core advantages of active learning in reducing annotation costs and improving accuracy, it distributes query tasks asynchronously and in parallel to multiple users, dynamically integrates their annotations into the final lithology identification result, and updates the reliability weights of ordinary users in real time. To achieve these objectives, the invention adopts a technical solution comprising the following steps:
S1. Acquire lithological data containing multidimensional logging parameters and preprocess it. Initialize a multi-user collaborative annotation environment that includes experts and ordinary users, and initialize dynamic reliability weights for the different users.
S2. Based on the preprocessed lithological data, construct a globally connected nearest neighbor propagation tree that reflects the spatial distribution of strata characteristics. Calculate the ambiguity of each lithological data node, which represents the degree of lithological category mixing, and combine it with the topological importance of the nodes to generate a query sequence containing lithological judgment tasks.
S3. Based on the ambiguity, grade the judgment tasks in the query sequence by difficulty and distribute them asynchronously and in parallel: tasks with ambiguity greater than a preset threshold are classified as high-difficulty tasks and assigned to at least two experts for independent annotation; tasks with ambiguity not greater than the threshold are classified as normal-difficulty tasks and assigned to a mixed group of experts and ordinary users for collaborative annotation.
S4. Collect the independent annotation results and perform weighted integration based on the users' dynamic reliability weights to obtain a comprehensive lithology discrimination result. Based on the consistency between each single annotation and the comprehensive discrimination result, update the dynamic reliability weights of ordinary users using a dynamic decay mechanism with a penalty coefficient, where the penalty coefficient is adjusted dynamically according to the geological risk level of the lithological data points that cause the discrepancy.
S5. Update the lithology category clusters synchronously based on the lithology discrimination results, and perform category diffusion on the nearest neighbor propagation tree to complete the lithology identification for the current round.
S6. Recalculate the node ambiguity based on the identification results of the current round, iteratively execute steps S3 to S5, and output the final underground reservoir lithology identification results after the termination condition is met.

[0004] The beneficial effects of this invention are as follows: This invention applies multi-user collaborative active learning to lithology identification. First, it constructs a nearest neighbor propagation tree using preprocessed lithology data to capture the topological relationships between lithology data nodes, providing a reliable path for category diffusion. Second, it selects key nodes to generate query sequences based on the ambiguity and topological importance of lithology data nodes, thereby obtaining maximum lithology discrimination information with minimal annotation interactions. Subsequently, it introduces an asynchronous parallel task distribution multi-user collaborative annotation mechanism, sending multiple query sequence tasks to multiple users in parallel for independent annotation, eliminating the need for users to wait for each other and significantly improving annotation efficiency. It integrates annotation results using dynamic reliability weights, effectively suppressing the impact of low-quality or noisy annotations. Simultaneously, it updates the reliability weights of ordinary users in real time based on annotation consistency, continuously optimizing the quality of group decision-making, thereby significantly improving the robustness of lithology identification results.

[0005] Furthermore, the lithological data acquired in S1 specifically includes wellbore diameter, gamma ray, spontaneous potential, resistivity, density, photoelectric absorption index, medium induction, deep induction, sonic transit time, neutron porosity, and formation dip angle; the preprocessing includes filling missing values and feature extraction of the lithological data; the initialization of the multi-user collaborative annotation environment includes creating a user pool containing experts and ordinary users, and assigning a reliability weight of 1 to all users in the pool.
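The initialization in S1 can be illustrated with a minimal sketch. The `User` class, the function names, and the column-mean imputation are assumptions made for illustration; the text only states that missing values are filled and that every user starts with a reliability weight of 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class User:
    name: str
    is_expert: bool
    weight: float = 1.0  # every user starts with reliability weight 1


def fill_missing(rows):
    """Replace None entries with the mean of their column (assumed strategy)."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def init_user_pool(expert_names, ordinary_names):
    """Build one pool holding experts and ordinary users."""
    return ([User(n, True) for n in expert_names]
            + [User(n, False) for n in ordinary_names])
```

A column that contains only missing values would make `mean` raise an error, so such columns would need separate handling.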

[0006] The beneficial effects of the above-mentioned further solutions are: the present invention preprocesses lithological data, ensuring the integrity and accuracy of the lithological data in use, and providing reliable input for subsequent analysis; at the same time, by initializing a multi-user collaborative annotation environment, it lays the foundation for subsequent asynchronous parallel task distribution, dynamic reliability weight integration, and user task allocation strategies.

[0007] Furthermore, constructing the globally connected nearest neighbor propagation tree in S2 specifically includes the following steps: the node set consists of all preprocessed lithological data; in the initial stage, each node is connected only to the node closest to it in Euclidean distance, forming several connected components; a surrogate node selection index is defined from the node's degree plus the sum of the weights of its adjacent edges, together with the sum of the weights of all edges in the connected component to which the node belongs; in each connected component, the node with the largest index value is selected as the surrogate node; Euclidean distances are calculated over the set of surrogate nodes, and the pair of surrogate nodes with the smallest distance is connected iteratively; after each merge, the index is recalculated and the surrogate node is updated, until a single connected nearest neighbor propagation tree is formed. The generated tree satisfies [condition not reproduced in this text].
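The two-stage construction can be sketched as follows. This is an illustrative reading, not the patented formula: the edge weight `1 / (1 + distance)` and the exact form of the surrogate index (degree plus adjacent edge weights, divided by the component's total edge weight) are assumptions, and the function name is hypothetical. At least two points are required.

```python
import math
from itertools import combinations


def build_propagation_tree(points):
    """Return a set of undirected edges (i, j), i < j, forming a single tree."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    weight = lambda a, b: 1.0 / (1.0 + dist(a, b))  # assumed similarity weight

    # Stage 1: link every node to its nearest neighbour.
    edges = set()
    for i in range(n):
        j = min((k for k in range(n) if k != i), key=lambda k: dist(i, k))
        edges.add((min(i, j), max(i, j)))

    def components():
        # Union-find over the current edges.
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for a, b in edges:
            parent[find(a)] = find(b)
        groups = {}
        for v in range(n):
            groups.setdefault(find(v), []).append(v)
        return list(groups.values())

    def surrogate(group):
        members = set(group)
        inner = [e for e in edges if e[0] in members]
        total = sum(weight(a, b) for a, b in inner) or 1.0

        def index(v):  # assumed form of the surrogate selection index
            adj = [e for e in inner if v in e]
            return (len(adj) + sum(weight(a, b) for a, b in adj)) / total

        return max(group, key=index)

    # Stage 2: repeatedly join the two closest surrogate nodes.
    while True:
        groups = components()
        if len(groups) == 1:
            return edges
        reps = [surrogate(g) for g in groups]
        a, b = min(combinations(reps, 2), key=lambda p: dist(*p))
        edges.add((min(a, b), max(a, b)))
```

Because each merge joins two different components, no cycle is introduced, and the result for `n` points has `n - 1` edges.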

[0008] The beneficial effect of the above-mentioned further scheme is that it constructs a sparse and globally connected nearest neighbor propagation tree, captures the topological relationship between data nodes, and provides a reliable path for subsequent nodes to carry out category diffusion.

[0009] Furthermore, the calculation of node ambiguity in S2 specifically includes: the ambiguity $F(x_i)$ of a node $x_i$ is measured by the information entropy of the lithological category distribution of its $k$ nearest neighbors. Let the $k$-nearest-neighbor set of node $x_i$ contain $m$ different lithological categories. For any category $c_j$ among them, define its nearest-neighbor belonging rate $p_j$ as the proportion of the $k$ nearest neighbors that belong to that category. The ambiguity is then calculated as: $F(x_i) = -\sum_{j=1}^{m} p_j \log p_j$.
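As a small illustration of the entropy-based ambiguity, the sketch below computes it from the lithology labels of a node's nearest neighbors. The function name, the use of the natural logarithm, and the assumption that every neighbor already carries a label are illustrative choices not stated in the text.

```python
import math
from collections import Counter


def ambiguity(neighbor_labels):
    """Shannon entropy (natural log, an assumed base) of the neighbours' labels."""
    total = len(neighbor_labels)
    counts = Counter(neighbor_labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A node whose neighbors all share one category gets ambiguity 0, while an even split between two categories gives `math.log(2)`, the maximum for two categories.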

[0010] Furthermore, generating the node topological importance sequence in S2 specifically includes: first, the node with the highest degree in the entire nearest neighbor propagation tree is added to the sequence as the starting node and marked as visited; subsequently, among the unvisited nodes, the node connected by the largest-weight edge to any node already in the sequence is selected and added to the sequence; this process is repeated until all nodes have been added, resulting in a sequence sorted in descending order of topological importance.
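The ordering procedure resembles a Prim-style traversal and can be sketched as below. The adjacency-dictionary layout and the function name are assumptions; ties in edge weight are broken by node identifier, which the text does not specify.

```python
def importance_order(adjacency):
    """adjacency: {node: {neighbor: edge_weight}} for a connected tree.

    Returns the nodes from most to least topologically important.
    """
    start = max(adjacency, key=lambda v: len(adjacency[v]))  # highest degree
    order, visited = [start], {start}
    while len(order) < len(adjacency):
        # Candidate edges lead from a visited node to an unvisited one.
        _, node = max(
            (w, v)
            for u in visited
            for v, w in adjacency[u].items()
            if v not in visited
        )
        order.append(node)
        visited.add(node)
    return order
```

The input must be connected; otherwise no candidate edge exists at some point and `max` raises `ValueError`.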

[0011] Furthermore, generating the query sequence in S2 specifically includes: first, given a batch size, select all nodes with an ambiguity greater than 0. If the number of selected nodes reaches or exceeds the batch size, keep the nodes with the highest ambiguity up to the batch size; if the number is less than the batch size, fill the shortfall with the nodes of highest topological importance among the remaining nodes, forming a key node sequence. For each key node in the key node sequence, the node closest to it is selected from each lithology category cluster to build a query pair. The query pairs are then sorted in ascending order of distance to form a query queue.
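A minimal sketch of the key-node selection and query-queue construction follows. The function names and data layouts are hypothetical, and the sketch assumes that key nodes are not yet members of any cluster and that at least one cluster already exists.

```python
import math


def key_nodes(ambiguity, importance_order, batch_size):
    """ambiguity: {node: value}; importance_order: nodes, most important first."""
    fuzzy = sorted((v for v in ambiguity if ambiguity[v] > 0),
                   key=lambda v: ambiguity[v], reverse=True)
    chosen = fuzzy[:batch_size]
    for v in importance_order:  # fill any shortfall by topological importance
        if len(chosen) >= batch_size:
            break
        if v not in chosen:
            chosen.append(v)
    return chosen


def query_queue(chosen, clusters, points):
    """Pair each key node with the nearest member of every cluster.

    Returns (key_node, cluster_node) pairs, closest pairs first.
    """
    pairs = []
    for v in chosen:
        for members in clusters:
            rep = min(members, key=lambda m: math.dist(points[v], points[m]))
            pairs.append((math.dist(points[v], points[rep]), v, rep))
    return [(v, rep) for _, v, rep in sorted(pairs)]
```

Sorting by distance puts the pairs most likely to be "lithologically consistent" at the front of the queue.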

[0012] The beneficial effects of the above-mentioned further scheme are: by calculating the fuzziness and topological importance of nodes to generate query sequences, this strategy can ensure that the nodes queried are not only key data samples that can represent lithological categories, but also data samples located at the cluster boundaries that cause confusion in the clustering effect. Querying such nodes can obtain the maximum lithological discrimination information with the minimum manual annotation cost, thereby improving the accuracy of lithological identification and the overall convergence speed.

[0013] Furthermore, the asynchronous parallel distribution in S3 specifically includes: each query sample pair in the query sequence is encapsulated as an independent annotation task. The parallelism is reflected in two aspects: on the one hand, the independent annotation tasks of the query sequence are assigned simultaneously to multiple users in the collaborative environment; on the other hand, any given query task is distributed to several users, each of whom annotates it independently, and the annotation results submitted by users are collected automatically.
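The two levels of parallelism can be illustrated with `asyncio`. This is only a simulation under stated assumptions: `annotate` stands in for a human annotator, and `answer_fn`, `delays`, and the task and user identifiers are hypothetical.

```python
import asyncio


async def annotate(user, task, answer_fn, delay):
    """Stand-in for a human annotator: waits, then returns (user, label)."""
    await asyncio.sleep(delay)
    return user, answer_fn(user, task)


async def distribute(tasks, assignees, answer_fn, delays):
    """Send every task to all of its assignees at once; collect labels per task."""

    async def run_task(task):
        jobs = [annotate(u, task, answer_fn, delays[u]) for u in assignees[task]]
        return task, dict(await asyncio.gather(*jobs))

    results = {}
    running = [asyncio.create_task(run_task(t)) for t in tasks]
    for finished in asyncio.as_completed(running):
        task, labels = await finished
        results[task] = labels  # available as soon as this task's users reply
    return results
```

Each task completes when its own annotators have replied, so a slow user on one task does not delay the collection of results for another.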

[0014] Furthermore, the weighted integration based on the users' dynamic reliability weights in S4 specifically includes: each participating user submits an annotation result for a query sample pair, where one annotation value indicates "lithological consistency", meaning the user judges that the two nodes of the pair belong to the same lithological category, and the other value indicates "lithological inconsistency", meaning the user judges that they belong to different lithological categories; each user has a dynamic reliability weight. For the query sample pair, count the number of users supporting "lithological consistency" and the number of users supporting "lithological inconsistency"; calculate the reliability-weighted support for "lithological consistency" from the weights of the users who chose it; calculate the reliability-weighted support for "lithological inconsistency" in the same way; and calculate the lithological discrimination result from these two quantities. Depending on the value of the discrimination result, the two nodes are determined to be lithologically consistent or lithologically inconsistent.
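A reliability-weighted vote of this kind might look like the sketch below. Summing the weights on each side and resolving ties in favor of "consistent" are assumptions, since the exact formulas are not reproduced in this text; the function name is hypothetical.

```python
def integrate(labels, weights):
    """labels: {user: True if 'consistent', False if 'inconsistent'};
    weights: {user: reliability weight}.

    Returns (is_consistent, weight_for, weight_against).
    """
    w_for = sum(weights[u] for u, same in labels.items() if same)
    w_against = sum(weights[u] for u, same in labels.items() if not same)
    return w_for >= w_against, w_for, w_against  # tie rule is an assumption
```

In the usage below, one expert with weight 1 outvotes two ordinary users whose combined weight is 0.75, which shows how weights rather than head counts decide the outcome.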

[0015] Furthermore, in S4, the reliability weights of ordinary users are updated using a dynamic decay mechanism with a feature penalty, specifically including: the reliability weight of an expert remains constant at 1 throughout the iteration; for an ordinary user, the reliability weight is updated in real time based on the consistency between that user's single annotation and the system's weighted integration judgment. The update depends on the total number of times the ordinary user has taken part in lithological identification integration and on the user's reliability weight before the current annotation. If the user's current annotation is consistent with the comprehensive judgment result of the round, the judgment is deemed correct and a decaying reward update is executed; if it is inconsistent, the judgment is deemed incorrect and a decaying penalty update is executed. The penalty update includes a preset geological feature penalty coefficient. When the logging data points that cause the discrepancy lie in conventional homogeneous thick formations, the risk of misjudgment is low, and the system sets the coefficient to the value at which the formula degenerates into a standard decay penalty. When the logging data points lie in high-risk areas such as key marker layers, fluid interfaces, or highly heterogeneous thin interbeds, where lithological misjudgment would seriously affect subsequent reserve assessment, the system dynamically increases the penalty coefficient according to the risk level so that the penalty grows non-linearly. The updated reliability weight takes effect immediately in the user's subsequent query tasks.
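The update formulas themselves are not reproduced in this text, so the sketch below uses a purely hypothetical form to show the shape of such a mechanism: a step size that shrinks as the user's participation count grows, a reward that moves the weight toward 1, and a penalty whose strength is raised by a risk coefficient used as an exponent. With `risk_coefficient=1.0` the penalty is a plain decay.

```python
def update_weight(weight, times_participated, agreed, risk_coefficient=1.0):
    """Hypothetical decaying reward/penalty update for an ordinary user.

    The step schedule and the exponent form are illustrative assumptions,
    not the formulas of the patent.
    """
    step = 1.0 / (times_participated + 2)  # assumed decay schedule
    if agreed:
        return weight + step * (1.0 - weight)          # reward: move toward 1
    return weight * (1.0 - step) ** risk_coefficient   # penalty: shrink weight
```

Under these assumptions a first-time user who disagrees on a low-risk sample drops from 1.0 to 0.5, and to 0.25 if the sample lies in a high-risk zone with coefficient 2.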

[0016] The beneficial effects of the above explanation are as follows: Firstly, the introduction of parallel and asynchronous mechanisms fully utilizes the resources of the user pool and avoids user idle time. Secondly, asynchronicity decouples the work dependencies between users, allowing each user to handle their own tasks independently without waiting for other users to complete their tasks, thus improving user efficiency. Thirdly, a task allocation strategy based on task ambiguity difficulty assessment is introduced. High-difficulty tasks are prioritized for allocation to at least two expert users to ensure the accuracy of key annotations, while less difficult tasks are allocated to a mixed combination of "expert + ordinary users." This significantly reduces the consumption of scarce expert resources while ensuring overall annotation quality. Fourthly, the introduction of dynamic reliability weights as weighting coefficients adaptively amplifies the annotation contributions of high-quality users and suppresses the noise impact of low-quality annotations. Simultaneously, by setting a reward and penalty mechanism for the reliability weights of ordinary users, it ensures that high-level users can be relied upon during lithological identification integration, allowing them to have a greater influence on group reliability rather than relying on sheer numbers, thereby improving the reliability of the integration results.

[0017] Furthermore, updating the lithological category clusters in S5 specifically includes: a lithological category cluster is a data structure that stores the category relationships between rock data nodes and satisfies the following properties: 1. all nodes in a lithological category cluster have the same lithology type; 2. the intersection of any two different lithological category clusters is the empty set; 3. rock data nodes in any two different lithological category clusters belong to different lithological categories. When determining lithology for a rock data node to be identified, the node closest to it is selected from each current lithological category cluster as that cluster's representative node. Based on the lithology discrimination results generated in each round, the lithological category clusters are updated synchronously according to the following rules: when the rock data node is determined to be "lithologically consistent" with the representative node of a lithological category cluster, the node is added to that cluster; when the node is determined to be "lithologically inconsistent" with the representative nodes of all current lithological category clusters, the system creates a new lithological category cluster and adds the node to it, representing a new lithological category discovered during geological exploration.
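The update rule can be sketched with clusters stored as a list of sets. The function name and the `verdicts` mapping are hypothetical; if more than one verdict is "consistent", the sketch takes the first, a case the stated cluster properties should rule out.

```python
def update_clusters(clusters, node, verdicts):
    """clusters: list of sets of nodes.
    verdicts: {cluster_index: True if node was judged consistent with
               that cluster's representative node}.

    Adds the node to a matching cluster or opens a new one, and returns
    the index of the cluster that now holds the node.
    """
    for idx, same in verdicts.items():
        if same:
            clusters[idx].add(node)
            return idx
    clusters.append({node})  # inconsistent with every representative
    return len(clusters) - 1
```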

[0018] Furthermore, the category diffusion on the nearest neighbor propagation tree in S5 specifically includes: all current lithological category clusters serve as the starting points of category diffusion, and each lithological category cluster has a unique lithological label. Diffusion proceeds cluster by cluster, and the following traversal is performed for each lithological category cluster: a node is selected from the cluster as the source point of the lithological label, and all of its neighboring nodes in the nearest neighbor propagation tree are traversed. Category diffusion to a neighboring node occurs if both of the following conditions are met: 1. the neighboring node is unvisited, meaning it has not yet been added to any lithological category cluster; 2. the topological importance of the neighboring node is lower than that of the source point. The lithological label of the cluster is assigned to each neighboring node that meets the conditions, and that node is added to the cluster and used as a new source point for the next round of diffusion traversal. When all nodes in the cluster have completed label diffusion and no new nodes are added, category diffusion for that cluster stops. The nodes of the next lithological category cluster are then processed, until all lithological category clusters have been processed and the lithology identification result of the current round is obtained.
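The diffusion traversal can be sketched as a breadth-first walk per cluster. The data layouts and the function name are assumptions; `rank` encodes topological importance as a position in the importance sequence, so a larger rank means lower importance. The sketch modifies the cluster sets it is given.

```python
from collections import deque


def diffuse(adjacency, clusters, rank):
    """adjacency: {node: iterable of neighbours in the propagation tree};
    clusters: {label: set of nodes};
    rank: {node: position in the importance sequence, 0 = most important}.

    Returns {node: label} after diffusion.
    """
    label_of = {v: lab for lab, members in clusters.items() for v in members}
    for lab, members in clusters.items():
        queue = deque(members)
        while queue:
            source = queue.popleft()
            for nb in adjacency[source]:
                # Spread only to unlabeled nodes of lower importance.
                if nb not in label_of and rank[nb] > rank[source]:
                    label_of[nb] = lab
                    members.add(nb)
                    queue.append(nb)
    return label_of
```

In the usage below, on the path 0-1-2-3-4, the most important node (2) stays unlabeled because labels flow only toward less important nodes, so it is left for a later query round.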

[0019] The beneficial effects of the above-mentioned further scheme are: introducing a nearest neighbor propagation tree to propagate categories and using topological importance rules to determine the flow of labels not only avoids possible label conflicts, but also enables the rapid acquisition of high-quality lithology identification results.

Attached Figure Description

[0020] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0021] The technical solutions in the embodiments of the present invention are described in detail below. It should be understood that the embodiments described herein are only some of the embodiments of the invention, and that the invention is not limited to the scope of these specific implementations. For those skilled in the art, any changes to the embodiments herein that fall within the scope of the claims are within the protection scope of the present invention.

[0022] As shown in Figure 1, this invention provides a method for identifying the lithology of underground reservoirs based on active learning and multi-person collaboration. The method includes the following steps:
S1. Acquire lithological data containing multidimensional logging parameters and preprocess it. Initialize a multi-user collaborative annotation environment that includes experts and ordinary users, and initialize dynamic reliability weights for the different users.
S2. Based on the preprocessed lithological data, construct a globally connected nearest neighbor propagation tree that reflects the spatial distribution of strata characteristics. Calculate the ambiguity of each lithological data node, which represents the degree of lithological category mixing, and combine it with the topological importance of the nodes to generate a query sequence containing lithological judgment tasks.
S3. Based on the ambiguity, grade the judgment tasks in the query sequence by difficulty and distribute them asynchronously and in parallel: tasks with ambiguity greater than a preset threshold are classified as high-difficulty tasks and assigned to at least two experts for independent annotation; tasks with ambiguity not greater than the threshold are classified as normal-difficulty tasks and assigned to a mixed group of experts and ordinary users for collaborative annotation.
S4. Collect the independent annotation results and perform weighted integration based on the users' dynamic reliability weights to obtain a comprehensive lithology discrimination result. Based on the consistency between each single annotation and the comprehensive discrimination result, update the dynamic reliability weights of ordinary users using a dynamic decay mechanism with a penalty coefficient, where the penalty coefficient is adjusted dynamically according to the geological risk level of the lithological data points that cause the discrepancy.
S5. Update the lithology category clusters synchronously based on the lithology discrimination results, and perform category diffusion on the nearest neighbor propagation tree to complete the lithology identification for the current round.
S6. Recalculate the node ambiguity based on the identification results of the current round, iteratively execute steps S3 to S5, and output the final underground reservoir lithology identification results after the termination condition is met.

[0023] If the termination condition is met, the lithology identification result of the current round is output directly as the final lithology identification result. If it is not met, the ambiguity sequence of all lithology data nodes is recalculated based on the identification result; the key node sequence is generated from the updated ambiguity sequence and the topological importance sequence, and a query sequence is then generated. Next, the tasks in the query sequence are distributed asynchronously and in parallel to multiple users for annotation, the lithology discrimination result is obtained by integrating the annotations according to the users' dynamic reliability weights, and the lithology category clusters are updated. This process is iterated until the preset termination condition is met, after which the final identification result is output.

[0024] In this embodiment, active learning and multi-user collaboration are integrated for lithology identification. First, lithology logging data is acquired and preprocessed. Based on the preprocessed lithology data, a nearest neighbor propagation tree is constructed, providing a reliable path for subsequent category diffusion. Next, ambiguity is used to locate data samples that cause confusion at cluster boundaries, and topological importance is used to locate influential data samples at the topological center. A query sequence is obtained based on ambiguity and topological importance. The system distributes the query tasks in the query sequence asynchronously and in parallel to at least two users for independent annotation. Each user works independently without waiting for others. After the annotation results of all participating users for a query task have been collected, they are integrated according to the user reliability weights. At the same time, the reliability weights of the ordinary users who took part in the annotation are updated in real time through rewards and penalties. During annotation, tasks are allocated using an "expert users leading ordinary users" approach. The decision quality of the combined expert and ordinary users does not differ significantly from that of an all-expert group. The lithology category clusters are then updated based on the integrated lithology discrimination results. Using the lithology category clusters as source points, lithology labels are diffused along the nearest neighbor propagation tree, and a current identification result is produced in each round. The final identification result of the lithology data is output after the termination condition is met.

[0025] In this embodiment, lithological data are acquired, including wellbore diameter, gamma rays, spontaneous potential, resistivity, density, photoelectric absorption index, medium induction, deep induction, acoustic transit time, neutron porosity, and formation dip angle.

[0026] In this embodiment, the acquired well logging data is preprocessed, including filling in missing values; the preprocessed lithology data is used as the nodes for constructing the nearest neighbor propagation tree. A multi-user collaborative annotation environment is initialized, and a user pool is created that includes both expert and ordinary users; the initial reliability weights of all users in the pool are set to 1.

[0027] In this embodiment, a globally connected nearest neighbor propagation tree is constructed. Specifically, this includes the following steps: the node set consists of all preprocessed lithological data; in the initial stage, each node is connected only to the node closest to it in Euclidean distance, forming several connected components; a surrogate node selection index is defined from the node's degree plus the sum of the weights of its adjacent edges, together with the sum of the weights of all edges in the connected component to which the node belongs; in each connected component, the node with the largest index value is selected as the surrogate node; Euclidean distances are calculated over the set of surrogate nodes, and the pair of surrogate nodes with the smallest distance is connected iteratively; after each merge, the index is recalculated and the surrogate node is updated, until a single connected nearest neighbor propagation tree is formed. The generated tree satisfies [condition not reproduced in this text].

[0028] In this embodiment, a topological importance sequence is generated. The steps are as follows: first, the node with the highest degree in the entire nearest neighbor propagation tree is added to the sequence as the starting node and marked as visited; subsequently, among the unvisited nodes, the node connected by the largest-weight edge to any node already in the sequence is selected and added to the sequence; this process is repeated until all nodes have been added, resulting in a sequence sorted in descending order of topological importance.

[0029] In this embodiment, an ambiguity sequence is generated. The steps are as follows: the ambiguity $F(x_i)$ of a node $x_i$ is measured by the information entropy of the lithological category distribution of its $k$ nearest neighbors. Let the $k$-nearest-neighbor set of node $x_i$ contain $m$ different lithological categories. For any category $c_j$ among them, define its nearest-neighbor belonging rate $p_j$ as the proportion of the $k$ nearest neighbors that belong to that category. The ambiguity is then calculated as: $F(x_i) = -\sum_{j=1}^{m} p_j \log p_j$. The larger $F(x_i)$ is, the more mixed the lithological categories around node $x_i$, and the more likely the node lies in the boundary area between different categories.

[0030] In this embodiment, the steps for generating the query sequence include: first, given a batch size, select all nodes with an ambiguity greater than 0. If the number of selected nodes reaches or exceeds the batch size, keep the nodes with the highest ambiguity up to the batch size; if the number is less than the batch size, fill the shortfall with the nodes of highest topological importance among the remaining nodes, forming a key node sequence. For each key node in the key node sequence, the node closest to it is selected from each lithology category cluster to build a query pair. The query pairs are then sorted in ascending order of distance to form a query queue.

[0031] In this embodiment, each query sample pair in the query sequence is encapsulated as an independent annotation task, and the tasks are performed in parallel. This parallelism is reflected in two aspects: on the one hand, the tasks built from different query sample pairs are assigned simultaneously to multiple users in the user pool; on the other hand, any given query task is distributed to multiple users, each of whom labels it independently. The annotation results submitted by the users are collected automatically.

[0032] In this embodiment, to minimize annotation costs, a collaborative allocation mechanism based on task difficulty and user category is adopted. Specifically, the system grades task difficulty according to ambiguity: a difficulty threshold is set, tasks whose ambiguity is higher than the threshold are classified as high-difficulty tasks, and tasks whose ambiguity is lower than the threshold are classified as low-difficulty tasks, with higher ambiguity indicating greater annotation difficulty. Based on this, the following allocation strategy is implemented: high-difficulty tasks are assigned to two or more experts for annotation, ensuring the quality of annotations for key nodes; low-difficulty tasks are annotated collaboratively by a combination of at least one expert and at least one ordinary user. Expert users, who have extensive experience in lithology identification, have a constant reliability weight of 1 and provide a reliable benchmark for integrating lithology judgments. Ordinary users also start with a reliability weight of 1, but their weight is adjusted dynamically according to their annotation results through a reward and penalty mechanism. This "experts leading, ordinary users assisting" combination approaches the decision quality of "all-expert" collaboration, ensuring the reliability of annotation results while reducing the consumption of high-cost expert resources.
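The allocation strategy can be illustrated with a short sketch. The function name, the random choice of annotators, and the use of exactly two experts for high-difficulty tasks (the text requires "two or more") are assumptions made for the example.

```python
import random


def assign_annotators(task_ambiguity, threshold, experts, ordinary_users, rng=None):
    """Pick annotators for one task according to its difficulty.

    High difficulty (ambiguity above threshold): two distinct experts.
    Low difficulty: one expert plus one ordinary user.
    """
    rng = rng or random.Random()
    if task_ambiguity > threshold:
        if len(experts) < 2:
            raise ValueError("high-difficulty tasks need at least two experts")
        return rng.sample(experts, 2)
    if not experts or not ordinary_users:
        raise ValueError("low-difficulty tasks need an expert and an ordinary user")
    return [rng.choice(experts), rng.choice(ordinary_users)]


experts = ["expert_1", "expert_2", "expert_3"]
ordinary = ["user_1", "user_2"]
hard = assign_annotators(0.95, 0.6, experts, ordinary, rng=random.Random(0))
easy = assign_annotators(0.20, 0.6, experts, ordinary, rng=random.Random(0))
```

A production scheduler would likely also balance load across users; random selection is used here only to keep the sketch short.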

[0033] In this embodiment, to quantify user reliability and prevent low-quality annotations from affecting the lithology identification results, each user is assigned a dynamic reliability weight with an initial value of 1. For a given query sample pair, each user's annotation result takes one of two values: one value indicates that the user labeled the pair as "lithological consistency", that is, judged the two nodes of the pair to belong to the same lithological category; the other value indicates that the user labeled the pair as "lithological inconsistency", that is, judged the two nodes to belong to different lithological categories. For the query sample pair, the number of users supporting "lithological consistency" and the number of users supporting "lithological inconsistency" are counted. The sum of the reliability weights of the users supporting "lithological consistency" and the sum of the reliability weights of the users supporting "lithological inconsistency" are then calculated. The lithological discrimination result is computed from these two weighted sums, and according to its value the two nodes are judged to be either lithologically consistent or lithologically inconsistent.
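A minimal sketch of the weighted integration follows. The decision rule used here (consistency wins only when its weighted support is strictly larger, so ties count as inconsistent) is an assumption, as are the function name and the boolean encoding of annotations.

```python
def integrate_annotations(annotations, weights):
    """Combine independent annotations of one query pair by reliability weight.

    annotations: dict user -> bool (True = "lithological consistency")
    weights: dict user -> current reliability weight
    Returns (is_consistent, support_consistent, support_inconsistent).
    """
    support_consistent = sum(weights[u] for u, same in annotations.items() if same)
    support_inconsistent = sum(weights[u] for u, same in annotations.items() if not same)
    # Assumed tie rule: equal support is treated as "inconsistent".
    return support_consistent > support_inconsistent, support_consistent, support_inconsistent


votes = {"expert_1": True, "user_1": False, "user_2": False}
weights = {"expert_1": 1.0, "user_1": 0.4, "user_2": 0.5}
decision, w_same, w_diff = integrate_annotations(votes, weights)
```

In this example two ordinary users outvote the expert by head count, but their combined weight (0.9) is below the expert's weight (1.0), so the pair is judged consistent.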

[0034] In this embodiment, the reliability weight of an expert remains constant at 1 throughout the iteration process, while the reliability weight of an ordinary user is updated in real time according to whether that user's single annotation agrees with the system's weighted integration result. The update depends on the total number of times the ordinary user has participated in lithological identification and integration, and on the user's reliability weight before the current annotation. If the user's current annotation agrees with the comprehensive judgment result of this round, the judgment is deemed correct and a decaying reward update is applied; if it disagrees, the judgment is deemed incorrect and a decaying penalty update is applied, which involves a preset geological feature penalty coefficient. In this embodiment, considering the complexity of the underground reservoir environment, the engineering risk cost of a misjudgment varies with the geological characteristics. If only a conventional, undifferentiated time decay mechanism were used, the true professional ability of ordinary users in complex formations could not be identified effectively. Therefore, when the logging data point that causes the judgment discrepancy is located in a conventional, homogeneous, thick formation, the risk of misjudgment is low, and the system sets the penalty coefficient to its baseline value, at which the formula reduces to a standard decay penalty. When the logging data point that causes the judgment discrepancy is located in a high-risk zone such as a key marker bed, a fluid interface, or a highly heterogeneous thin interbed, a lithological misjudgment would seriously affect subsequent reserve assessment. In this case, the system raises the penalty coefficient above the baseline according to the risk level, so that the penalty increases non-linearly. The updated reliability weight takes effect immediately in the user's subsequent query tasks.

[0035] In this embodiment, lithological category clusters are defined. This data structure stores the category relationships between rock formation data nodes and satisfies the following properties: (1) the nodes in each lithology category cluster have the same lithology type; (2) the intersection of any two different lithology category clusters is the empty set; (3) the rock data nodes contained in any two different lithology category clusters belong to different lithology categories. When determining lithology, for a rock data node to be identified, the node closest to it is selected from each current lithology category cluster as that cluster's representative node. The lithology category clusters are updated synchronously with the lithology identification results generated in each round, according to the following rules: when the rock data node is judged "lithologically consistent" with the representative node of a lithology category cluster, the node is added to that cluster; when the node is judged "lithologically inconsistent" with the representative nodes of all current lithology category clusters, the system creates a new lithology category cluster and adds the node to it, representing a new lithological category discovered during geological exploration.
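The update rules can be sketched as follows, assuming clusters are stored as a list of Python sets and the integrated judgments for one node arrive as a mapping from cluster index to a boolean. Both representations and the function name are illustrative assumptions.

```python
def update_clusters(clusters, node, judgments):
    """Apply one round of judgments for `node` to the lithology clusters.

    clusters: list of sets, modified in place
    judgments: dict cluster_index -> bool (True = consistent with that
               cluster's representative node)
    Returns the index of the cluster the node joined, or None if undecided.
    """
    for idx, consistent in judgments.items():
        if consistent:
            clusters[idx].add(node)
            return idx
    # Inconsistent with every existing cluster: a new category was found.
    if len(judgments) == len(clusters):
        clusters.append({node})
        return len(clusters) - 1
    # Some clusters have not been compared yet; leave the node unassigned.
    return None


clusters = [{"a"}, {"d"}]
joined = update_clusters(clusters, "b", {0: True, 1: False})
created = update_clusters(clusters, "x", {0: False, 1: False})
pending = update_clusters(clusters, "y", {0: False})
```

Because each node is added to at most one cluster, the disjointness property of the clusters is preserved by construction.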

[0036] In this embodiment, all current lithological category clusters are used as the starting points for category diffusion, and each lithological category cluster has a unique lithological label. Diffusion proceeds cluster by cluster, and the following traversal is performed for each lithological category cluster: a node is selected from the cluster as the source point of the lithological label, and all of its neighboring nodes in the nearest neighbor propagation tree are traversed. Category diffusion to a neighboring node occurs if both of the following conditions are met: (1) the neighboring node is unvisited, meaning it has not yet been added to any lithology category cluster; (2) the topological importance of the neighboring node is lower than that of the source point. The lithology label of the cluster is assigned to each neighboring node that meets the conditions, the node is added to the cluster, and it is used as a new source point for the next round of diffusion traversal. When all nodes in the cluster have completed label diffusion and no new nodes are added, category diffusion for that cluster stops. The nodes of the next lithological category cluster are then processed in the same way until all lithology category clusters have been processed, which yields the current lithology identification result.
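The diffusion traversal can be sketched as a breadth-first expansion over the tree. The adjacency-dict representation, the use of a node's position in the importance sequence as its rank (0 = most important), and the function name are assumptions for illustration.

```python
from collections import deque


def diffuse_labels(adjacency, importance_rank, clusters):
    """Spread each cluster's label along the tree to less important neighbors.

    adjacency: dict node -> list of neighboring nodes in the tree
    importance_rank: dict node -> position in the importance sequence
                     (0 = most important, larger = less important)
    clusters: list of sets of seed nodes, extended in place
    Returns dict node -> cluster index.
    """
    label = {n: idx for idx, cluster in enumerate(clusters) for n in cluster}

    for idx, cluster in enumerate(clusters):
        frontier = deque(sorted(cluster, key=importance_rank.get))
        while frontier:
            source = frontier.popleft()
            for nb in adjacency.get(source, ()):
                unvisited = nb not in label
                less_important = importance_rank[nb] > importance_rank[source]
                if unvisited and less_important:
                    label[nb] = idx
                    cluster.add(nb)
                    frontier.append(nb)  # becomes a new source point
    return label


adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
rank = {"b": 0, "a": 1, "c": 2, "d": 3, "e": 4}
labels = diffuse_labels(adjacency, rank, [{"b"}, {"e"}])
```

Clusters are processed in list order, so an earlier cluster can claim a contested node first; nodes that no cluster reaches stay unlabeled for a later iteration.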

[0037] In summary, compared with the existing technology, the underground reservoir lithology identification method disclosed in this invention, which integrates active learning and multi-person collaboration, has the following beneficial effects: (1) Lithology identification results are obtained at lower manual cost: the nearest neighbor propagation tree structure captures node relationships in large-scale data and provides a reliable path for subsequent label propagation; at the same time, selecting nodes for labeling by ambiguity and topological importance extracts the maximum category information from each interaction, so that high-quality lithology category results are reached with the fewest queries. (2) Labeling efficiency is improved: query tasks are distributed asynchronously to multiple users for independent labeling, and users do not need to wait for each other, which removes the system delay caused by single-user serial labeling in traditional methods and improves the efficiency of human-computer interaction. (3) Labeling noise is suppressed and the robustness of results is enhanced: with the dynamic reliability weight mechanism, the lithology identification results no longer depend on the ideal assumption that "labeling is always reliable" and instead rely more heavily on the labels of more reliable users, which reduces the influence of low-quality or noisy labels and makes the results more reliable and consistent. (4) Labeling cost is reduced: under the "experts leading, ordinary users assisting" task allocation strategy, expert users guarantee labeling quality while ordinary users reduce labeling cost, with no significant difference in labeling effect compared with using expert users only.

Claims

1. A method for identifying the lithology of underground reservoirs based on active learning and multi-person collaboration, characterized in that it includes the following steps:
S1. Acquire lithological data containing multidimensional logging parameters and preprocess it; initialize a multi-user collaborative annotation environment that includes experts and ordinary users, and initialize dynamic reliability weights for the different users;
S2. Based on the preprocessed lithological data, construct a globally connected nearest neighbor propagation tree that reflects the spatial distribution of formation characteristics; calculate the ambiguity of each lithological data node, which represents the degree of lithological category mixing, and generate a query sequence of lithological judgment tasks by combining the ambiguity with the topological importance of the nodes;
S3. Based on the ambiguity, grade the judgment tasks in the query sequence by difficulty and distribute them asynchronously and in parallel: classify tasks whose ambiguity is greater than a preset threshold as high-difficulty tasks and assign them to at least two experts for independent annotation; classify tasks whose ambiguity is not greater than the threshold as normal-difficulty tasks and assign them to a mixed group of experts and ordinary users for collaborative annotation;
S4. Collect the independent annotation results and perform weighted integration based on the users' dynamic reliability weights to obtain a comprehensive lithology discrimination result; based on the consistency between each single annotation and the comprehensive discrimination result, update the dynamic reliability weights of ordinary users through a dynamic decay mechanism with a penalty coefficient, wherein the penalty coefficient is dynamically adjusted according to the geological risk level of the lithological data points that cause the judgment discrepancy;
S5. Update the lithology category clusters synchronously based on the lithology discrimination results, and perform category diffusion on the nearest neighbor propagation tree to complete the lithology identification for the current round;
S6. Recalculate the node ambiguity based on the identification results of this round, iteratively execute steps S3 to S5, and output the final underground reservoir lithology identification results after the termination condition is met.

2. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the lithological data acquired in S1 specifically includes wellbore diameter, natural gamma ray, spontaneous potential, resistivity, density, photoelectric absorption index, medium induction, deep induction, sonic transit time, neutron porosity, and formation dip angle; the preprocessing includes filling missing values and extracting features from the lithological data; the initialization of the multi-user collaborative annotation environment includes creating a user pool containing experts and ordinary users, and assigning a reliability weight of 1 to all users in the pool.

3. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the construction of the globally connected nearest neighbor propagation tree in S2 specifically includes the following steps: the node set consists of all preprocessed lithological data; in the initial stage, each node is connected only to the node at the smallest Euclidean distance from it, forming several connected components; a surrogate node selection index is defined from the degree of the node, the sum of the weights of its adjacent edges, and the sum of the weights of all edges in the connected component to which the node belongs; in each connected component, the node with the largest index value is selected as the surrogate node; Euclidean distances are calculated on the set of surrogate nodes, and the surrogate nodes with the smallest distance are connected iteratively; after each merge, the index is recalculated and the surrogate node is updated, until a single connected nearest neighbor propagation tree is formed.

4. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the calculation of node ambiguity in S2 specifically includes the following steps: the ambiguity of a node is measured by the information entropy of the lithological category distribution of its nearest neighbors; suppose the nearest neighbor set of the node contains several different lithological categories; for each of these categories, the nearest neighbor affiliation rate is defined as the proportion of the node's nearest neighbors that belong to that category, and the ambiguity is calculated as the information entropy of these affiliation rates.

5. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the calculation and generation of the node topological importance sequence in S2 specifically includes the following steps: first, the node with the highest degree in the nearest neighbor propagation tree is added to the sequence as the starting node and marked as visited; then, among the unvisited nodes, the node joined to any node already in the sequence by the edge with the largest weight is selected and appended to the sequence; this process is repeated until all nodes have been added, yielding a sequence sorted in descending order of topological importance.

6. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the process of generating the query sequence in S2 includes the following steps: first, a batch size is set, and all nodes with an ambiguity greater than 0 are selected; if the number of selected nodes reaches or exceeds the batch size, the nodes with the highest ambiguity are kept, up to the batch size; if the number is insufficient, the nodes with the highest topological importance among the remaining nodes are selected to fill the gap, forming a key node sequence; for each key node in the sequence, the node closest to that key node is selected from each lithology category cluster to build a query pair; the query pairs are then sorted in ascending order of distance to form a query queue.

7. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the asynchronous parallel allocation in S3 specifically includes the following steps: each query sample pair in the query sequence is encapsulated as an independent annotation task; the parallelism is reflected in two aspects: on the one hand, the independent annotation tasks built from different query sample pairs are assigned simultaneously to multiple users in the collaborative environment; on the other hand, any given query task is distributed to multiple users, each of whom labels it independently.

8. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the weighted integration based on the users' dynamic reliability weights in S4 specifically includes the following steps: for a given query sample pair, each user's annotation result takes one of two values, one indicating that the user labeled the pair as "lithological consistency", that is, judged the two nodes of the pair to belong to the same lithological category, and the other indicating that the user labeled the pair as "lithological inconsistency", that is, judged the two nodes to belong to different lithological categories; each user has a dynamic reliability weight; the number of users supporting "lithological consistency" and the number of users supporting "lithological inconsistency" for the query sample pair are counted; the sum of the reliability weights of the users supporting "lithological consistency" and the sum of the reliability weights of the users supporting "lithological inconsistency" are calculated; the lithological discrimination result is computed from these two weighted sums, and according to its value the two nodes are judged to be either lithologically consistent or lithologically inconsistent.

9. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the process of updating the reliability weight of ordinary users in S4 includes the following steps: the reliability weight of an expert remains constant at 1 throughout the iteration process; the reliability weight of an ordinary user is updated in real time according to whether that user's single annotation agrees with the system's weighted integration result; the update depends on the total number of times the ordinary user has participated in lithological identification and integration, and on the user's reliability weight before the current annotation; if the user's current annotation agrees with the comprehensive judgment result of this round, the judgment is deemed correct and a decaying reward update is applied; if it disagrees, the judgment is deemed incorrect and a decaying penalty update is applied, which involves a preset geological feature penalty coefficient; wherein, when the logging data point that causes the judgment discrepancy is located in a conventional, homogeneous, thick formation, the risk of misjudgment is low, and the system sets the penalty coefficient to its baseline value, at which the formula reduces to a standard decay penalty; when the logging data point that causes the judgment discrepancy is located in a high-risk zone such as a key marker bed, a fluid interface, or a highly heterogeneous thin interbed, where a lithological misjudgment would seriously affect subsequent reserve assessment, the system raises the penalty coefficient above the baseline according to the risk level, so that the penalty increases non-linearly; the updated reliability weight takes effect immediately in the user's subsequent query tasks.

10. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the updating of lithological category clusters in S5 specifically includes the following steps: the lithological category cluster is a data structure that stores the category relationships between rock formation data nodes and satisfies the following properties: (1) the nodes in each lithology category cluster have the same lithology type; (2) the intersection of any two different lithology category clusters is the empty set; (3) the rock data nodes contained in any two different lithology category clusters belong to different lithology categories; when determining lithology, for a rock data node to be identified, the node closest to it is selected from each current lithology category cluster as that cluster's representative node; the lithology category clusters are updated synchronously with the lithology identification results generated in each round, according to the following rules: when the rock data node is judged "lithologically consistent" with the representative node of a lithology category cluster, the node is added to that cluster; when the node is judged "lithologically inconsistent" with the representative nodes of all current lithology category clusters, the system creates a new lithology category cluster and adds the node to it, representing a new lithological category discovered during geological exploration.

11. The method for identifying underground reservoir lithology based on active learning and multi-person collaboration according to claim 1, characterized in that the category diffusion on the nearest neighbor propagation tree in S5 includes the following steps: all current lithological category clusters are used as the starting points for category diffusion, and each lithological category cluster has a unique lithological label; diffusion proceeds cluster by cluster, and the following traversal is performed for each lithological category cluster: a node is selected from the cluster as the source point of the lithological label, and all of its neighboring nodes in the nearest neighbor propagation tree are traversed; category diffusion to a neighboring node occurs if both of the following conditions are met: (1) the neighboring node is unvisited, meaning it has not yet been added to any lithology category cluster; (2) the topological importance of the neighboring node is lower than that of the source point; the lithology label of the cluster is assigned to each neighboring node that meets the conditions, the node is added to the cluster, and it is used as a new source point for the next round of diffusion traversal; when all nodes in the cluster have completed label diffusion and no new nodes are added, category diffusion for that cluster stops; the nodes of the next lithological category cluster are then processed in the same way until all lithology category clusters have been processed, which yields the current lithology identification result.