Data correlation-based digital core library construction method

By differentiating dimensions and constructing spatiotemporal sequences from core sample data, a data association map is generated, which solves the problem of scattered core library data resources in existing technologies and enables efficient multi-dimensional data retrieval and intelligent analysis.

CN122019512B · Active · Publication Date: 2026-06-26 · CHINA HYDROELECTRIC ENGINEERING CONSULTING GROUP CHENGDU RESEARCH HYDROELECTRIC INVESTIGATION DESIGN AND INSTITUTE +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA HYDROELECTRIC ENGINEERING CONSULTING GROUP CHENGDU RESEARCH HYDROELECTRIC INVESTIGATION DESIGN AND INSTITUTE
Filing Date
2026-04-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing digital core repositories lack in-depth exploration of the internal spatiotemporal correlations and cross-relationships of core data from different sources and of various types, resulting in fragmented data resources, low retrieval efficiency, and difficulty in supporting refined geological research and intelligent analysis.

Method used

By differentiating the dimensions of core sample data, generating spatiotemporal data sequences, constructing data association maps, and building a digital core library based on data indexing rules, the fusion and intelligent retrieval of multi-dimensional data are realized.

Benefits of technology

It significantly improves the depth and breadth of data retrieval, supports fast and accurate multi-level index retrieval, and enhances the utilization efficiency and intelligent analysis capabilities of data resources.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a data correlation-based digital core library construction method, relates to the technical field of core library digitization, and comprises the following steps: collecting core sample data, and performing dimension division on the collected core sample data; processing core sample data of different dimensions to generate corresponding dimensional spatiotemporal data sequences; constructing a data correlation graph between core sample data of different dimensions according to the spatiotemporal data sequences corresponding to the core sample data of different dimensions; generating corresponding data index rules according to the data correlation graph; and completing the construction of a digital core library based on the data index rules and the collected core sample data. The method is characterized in that the data is divided into multiple dimensions, spatiotemporal sequences are constructed, the same dimension and cross-dimension relationships between different dimensional data are quantified by using a correlation graph, different dimensional data is fused into the same data correlation graph, and multi-level index rules are automatically generated based on the data correlation graph, so that the depth, breadth and intelligent level of data retrieval are significantly improved.

Description

Technical Field

[0001] This invention relates to the field of core repository digitization technology, specifically a method for constructing a digital core repository based on data association.

Background Technology

[0002] Existing digital core repositories mostly focus on simple data storage and management, lacking in-depth mining of the spatiotemporal correlations and cross-relationships within core data from different sources and of various types, such as descriptive data, analytical test data, and image data. This results in scattered data resources, low retrieval efficiency, and difficulties in knowledge discovery, making it difficult to support the needs of refined geological research and intelligent analysis. To address this, we present a method for constructing a digital core repository based on data correlation.

Summary of the Invention

[0003] The purpose of this invention is to provide a method for constructing a digital core repository based on data association.

[0004] The objective of this invention can be achieved through the following technical solution: a method for constructing a digital core repository based on data association, comprising:

[0005] Collect core sample data and distinguish the dimensions of the collected core sample data;

[0006] Core sample data of different dimensions are processed to generate spatiotemporal data sequences of corresponding dimensions;

[0007] Based on the spatiotemporal data sequences corresponding to core sample data of different dimensions, a data association map between core sample data of different dimensions is constructed.

[0008] Based on the data association map, corresponding data index rules are generated, and a digital core library is constructed based on the data index rules and the collected core sample data.

[0009] Furthermore, the process of collecting core sample data and distinguishing the dimensions of the collected core sample data includes:

[0010] The core sample data consists of several types of data, including descriptive data, analytical test data, and image data.

[0011] The core sample data is split according to different data types to obtain the data items belonging to the corresponding data types;

[0012] Each data type is treated as a separate data dimension, denoted as the first data dimension;

[0013] Each data item under each data type is treated as a separate data dimension, denoted as the second data dimension.

[0014] Furthermore, the process of processing core sample data of different dimensions to generate spatiotemporal data sequences of corresponding dimensions includes:

[0015] The core sample data in each first data dimension are traversed separately, and the time-related data in the core sample data are marked and corresponding time labels are generated, and the spatially related data are marked and spatial labels are generated.

[0016] The second data dimensions that contain the time-related data marked with time labels and the spatially related data marked with spatial labels are aggregated to obtain the spatiotemporal data sequence corresponding to each first data dimension.

[0017] Furthermore, the process of constructing a data association map between core sample data of different dimensions based on the spatiotemporal data sequences corresponding to core sample data of different dimensions includes:

[0018] All time and spatial labels belonging to the same data dimension are aggregated, and the same-dimensional correlation coefficient between different labels is obtained based on the aggregated time and spatial labels.

[0019] Similarity matching is performed on various time and spatial labels of different data dimensions, and cross-dimensional correlation coefficients between different labels are obtained based on the similarity matching results;

[0020] Set several same-dimensional association threshold ranges and cross-dimensional association threshold ranges, and each same-dimensional association threshold range and cross-dimensional association threshold range has a corresponding association level;

[0021] The obtained same-dimensional correlation coefficients are matched with each same-dimensional correlation threshold range. When the same-dimensional correlation coefficient is within any same-dimensional correlation threshold range, the correlation level corresponding to the same-dimensional correlation threshold range is set as the same-dimensional correlation level between the tags corresponding to the same-dimensional correlation coefficient.

[0022] The obtained cross-dimensional correlation coefficients are matched with each cross-dimensional correlation threshold range. When the cross-dimensional correlation coefficient is within any cross-dimensional correlation threshold range, the correlation level corresponding to the cross-dimensional correlation threshold range is set as the tag cross-dimensional correlation level between the tags corresponding to the cross-dimensional correlation coefficient.

[0023] Generate topological nodes corresponding to each label, and generate topological routes and weights between different labels based on the same-dimensional association level between labels of data in the same dimension and the cross-dimensional association level between labels of data in different dimensions, thereby completing the construction of the data association graph.

[0024] Furthermore, the process for obtaining the same-dimensional correlation coefficient between different labels is as follows:

[0025] Select any second data dimension and obtain the relevant data corresponding to each label within that second data dimension, and set the corresponding comparison standard based on the relevant data content;

[0026] The relevant content corresponding to each tag is compared with the relevant content of another tag to obtain the corresponding relevance.

[0027] Based on the corresponding comparison criteria and the obtained correlation degree, the same-dimensional correlation coefficient between labels is obtained.

[0028] Furthermore, the process of obtaining the cross-dimensional correlation coefficients between different labels is as follows:

[0029] Choose any label as the baseline label and choose another label as the comparison label, where the comparison label and the baseline label do not belong to the same second data dimension;

[0030] Obtain the correlation degree between the relevant content of the baseline label and that of the comparison label, and obtain the cross-dimensional correlation coefficient between the baseline label and the comparison label based on the corresponding comparison standard and the obtained correlation degree.

[0031] Furthermore, the process of generating corresponding data indexing rules based on the data association graph includes:

[0032] Extract the corresponding text features from the relevant content of the label for each topological node;

[0033] Associate the extracted text features with the topological node;

[0034] Summarize all identical text features and aggregate the topology nodes corresponding to each identical text feature to obtain the set of topology nodes corresponding to that text feature;

[0035] Generate index links with the set of topological nodes, and associate the index links with the text features. Summarize the generated index links as first-level index links.

[0036] Select each topology node in the set of topology nodes corresponding to each first-level index link, and select other topology nodes according to the same-dimensional or cross-dimensional label association level between the selected topology node and other topology nodes, and mark the selected other topology nodes as second-level topology nodes.

[0037] The selected secondary topology nodes are aggregated to obtain a set of secondary topology nodes, and secondary index links are generated.

[0038] By associating all secondary index links corresponding to the selected topological node with the primary index links, the data index rules are constructed, and the core sample data can be retrieved according to the data index rules.

[0039] Compared with the prior art, the beneficial effects of the present invention are:

[0040] By dividing the data into multiple dimensions and constructing spatiotemporal sequences, and by using association graphs to quantify the same-dimensional and cross-dimensional relationships between data of different dimensions, data from different dimensions are integrated into the same data association graph. Multi-level index rules are generated automatically from the data association graph, allowing users to perform fast and accurate first-level searches based on text features. The method can also automatically expand the results with second-level data of high association strength, significantly improving the depth, breadth, and intelligence level of data retrieval.

Attached Figure Description

[0041] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.

[0042] Figure 1 is a schematic diagram of the present invention.

Detailed Implementation

[0043] Collect core sample data and distinguish the dimensions of the collected core sample data;

[0044] Core sample data of different dimensions are processed to generate spatiotemporal data sequences of corresponding dimensions;

[0045] Based on the spatiotemporal data sequences corresponding to core sample data of different dimensions, a data association map between core sample data of different dimensions is constructed.

[0046] Based on the data association map, corresponding data index rules are generated, and a digital core library is constructed based on the data index rules and the collected core sample data.

[0047] It should be further explained that, in the specific implementation process, the collection of core sample data and the dimensional differentiation of the collected core sample data include:

[0048] The core sample data consists of several types of data, including descriptive data, analytical test data, and image data.

[0049] The core sample data is split according to different data types to obtain the data items belonging to the corresponding data types;

[0050] Specifically:

[0051] For descriptive data, there are several descriptive items, and each descriptive item corresponds to descriptive content, and the descriptive content is divided into descriptive keywords or key parameters;

[0052] For analytical test type data, there are several analytical test items, and each analytical test item corresponds to a test result, and the test parameters and test standards in the test results are divided.

[0053] For image type data, there are several image label items, and each image label item corresponds to image information that corresponds to the image content, and the information keywords and information parameters in the image information are divided.

[0054] Each data type is treated as a separate data dimension, denoted as the first data dimension;

[0055] Each data item under each data type is treated as a data dimension, denoted as the second data dimension. That is, each first data dimension contains several second data dimensions.
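The two-level split in [0048] to [0055] can be sketched as follows. This is a minimal illustration, assuming each record arrives as a dictionary with a data type and a data item name; the field names, the type strings, the sample values, and the `split_dimensions` helper are all hypothetical and are not specified in the patent.

```python
from collections import defaultdict

# Hypothetical sample records; the patent does not prescribe a record layout.
records = [
    {"type": "descriptive", "item": "lithology", "content": "grey fine sandstone"},
    {"type": "analytical_test", "item": "porosity", "content": "12.4 percent"},
    {"type": "image", "item": "core_photo", "content": "box 3, depth 120-125 m"},
    {"type": "descriptive", "item": "color", "content": "grey"},
]


def split_dimensions(records):
    """Group by data type (first dimension), then by data item (second dimension)."""
    dims = defaultdict(lambda: defaultdict(list))
    for rec in records:
        dims[rec["type"]][rec["item"]].append(rec["content"])
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {first: dict(second) for first, second in dims.items()}


dimensions = split_dimensions(records)
```

Here each key of `dimensions` plays the role of a first data dimension, and each nested key plays the role of a second data dimension inside it.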

[0056] It should be further explained that, in the specific implementation process, the process of processing core sample data of different dimensions to generate spatiotemporal data sequences of corresponding dimensions includes:

[0057] The core sample data in each first data dimension are traversed separately; the time-related data in the core sample data are marked and corresponding time labels are generated, and the spatially related data are marked and corresponding spatial labels are generated. It should be noted that the time-related data includes date, year, duration, etc., and the spatially related data includes geographical location, latitude and longitude, direction, etc.

[0058] The second data dimensions that contain the time-related data marked with time labels and the spatially related data marked with spatial labels are aggregated to obtain the spatiotemporal data sequence corresponding to each first data dimension.
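One way to picture the labeling step in [0057] and [0058] is the sketch below. It assumes that time-related and spatially related data can be recognised with simple regular expressions; the two patterns, the label dictionary layout, and the `build_sequence` helper are illustrative assumptions, since the patent only lists example categories (date, year, duration; location, latitude and longitude, direction).

```python
import re

# Assumed patterns, not taken from the patent.
TIME_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{4}\b")
SPACE_PATTERN = re.compile(
    r"\b\d{1,3}\.\d+°?[NSEW]\b|\b(?:north|south|east|west)\b", re.IGNORECASE
)


def build_sequence(first_dimension):
    """Map {second_dimension: [text, ...]} to a list of time and spatial labels."""
    sequence = []
    for item, texts in first_dimension.items():
        for text in texts:
            for match in TIME_PATTERN.findall(text):
                sequence.append({"kind": "time", "value": match, "second_dim": item})
            for match in SPACE_PATTERN.findall(text):
                sequence.append({"kind": "space", "value": match, "second_dim": item})
    return sequence


# Hypothetical content of one first data dimension (descriptive data).
descriptive = {
    "sampling": ["Core drilled on 2024-05-01 at 30.67N, 104.07E"],
    "orientation": ["Bedding dips toward the north"],
}
sequence = build_sequence(descriptive)
```

Each label keeps a reference to the second data dimension it came from, which is what later allows same-dimensional and cross-dimensional comparisons to be told apart.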

[0059] It should be further explained that, in the specific implementation process, the process of constructing a data correlation map between core sample data of different dimensions based on the spatiotemporal data sequences corresponding to core sample data of different dimensions includes:

[0060] All time and spatial labels belonging to the same data dimension are aggregated, and the same-dimensional correlation coefficient between different labels is obtained based on the aggregated time and spatial labels.

[0061] Similarity matching is performed on various time and spatial labels of different data dimensions, and cross-dimensional correlation coefficients between different labels are obtained based on the similarity matching results;

[0062] Set several same-dimensional association threshold ranges and cross-dimensional association threshold ranges, and each same-dimensional association threshold range and cross-dimensional association threshold range has a corresponding association level;

[0063] The obtained same-dimensional correlation coefficients are matched with each same-dimensional correlation threshold range. When the same-dimensional correlation coefficient is within any same-dimensional correlation threshold range, the correlation level corresponding to the same-dimensional correlation threshold range is set as the same-dimensional correlation level between the tags corresponding to the same-dimensional correlation coefficient.

[0064] The obtained cross-dimensional correlation coefficients are matched with each cross-dimensional correlation threshold range. When the cross-dimensional correlation coefficient is within any cross-dimensional correlation threshold range, the correlation level corresponding to the cross-dimensional correlation threshold range is set as the tag cross-dimensional correlation level between the tags corresponding to the cross-dimensional correlation coefficient.

[0065] Generate topological nodes corresponding to each label, and generate topological routes and weights between different labels based on the same-dimensional association level between labels of data in the same dimension and the cross-dimensional association level between labels of data in different dimensions, thereby completing the construction of the data association graph.
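The threshold matching and graph construction in [0062] to [0065] can be sketched as below. The threshold ranges, the level numbers, the use of the second data dimension to decide whether two labels are "same-dimensional", and the helper names are all assumptions made for illustration; the patent does not give concrete values.

```python
# Assumed ranges: (lower bound inclusive, upper bound exclusive, association level).
SAME_DIM_RANGES = [(0.8, 1.01, 3), (0.5, 0.8, 2), (0.2, 0.5, 1)]
CROSS_DIM_RANGES = [(0.7, 1.01, 3), (0.4, 0.7, 2), (0.1, 0.4, 1)]


def level_for(coefficient, ranges):
    """Return the association level whose range contains the coefficient, else None."""
    for low, high, level in ranges:
        if low <= coefficient < high:
            return level
    return None


def build_graph(labels, coefficients):
    """labels: {label_id: second_dim}; coefficients: {(a, b): value}.

    Returns an adjacency dict {a: {b: level}}; the level doubles as the edge weight.
    """
    graph = {label: {} for label in labels}
    for (a, b), value in coefficients.items():
        same_dim = labels[a] == labels[b]
        level = level_for(value, SAME_DIM_RANGES if same_dim else CROSS_DIM_RANGES)
        if level is not None:  # coefficients outside every range create no route
            graph[a][b] = level
            graph[b][a] = level
    return graph


labels = {"t1": "sampling", "t2": "sampling", "s1": "core_photo"}
coefficients = {("t1", "t2"): 0.85, ("t1", "s1"): 0.45, ("t2", "s1"): 0.05}
graph = build_graph(labels, coefficients)
```

In this sketch a pair whose coefficient falls outside every range simply gets no route, which is one possible reading of the text; a lowest "no association" level would be an equally valid choice.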

[0066] It should be further explained that the process of obtaining the same-dimensional correlation coefficient between different labels is as follows:

[0067] Choose any second data dimension and aggregate all labels within that second data dimension. Number each label i, where i = 1, 2, ..., n;

[0068] Obtain the relevant data for each label and set corresponding comparison standards based on the relevant data content; it should be noted that the comparison standards for different labels are different, and the comparison standard depends on the relevant content of the label;

[0069] The relevant content of label i is compared in turn with the relevant content of every other label j (j ≠ i) to obtain the corresponding correlation degree.

[0070] Based on the corresponding comparison standard and the obtained correlation degree, the same-dimensional correlation coefficient between the labels is obtained;

[0071] Specifically:

[0072] The correlation degree between the content of label i and the content of any other label j is denoted by a symbol that is missing from this text, where i≠j;

[0073] The comparison standard is denoted K0;

[0074] Based on the corresponding comparison standard and the obtained correlation degree, the same-dimensional correlation coefficient between labels i and j is denoted as Tg(i,j), where:

[0075] [formula missing from this text]
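The expression for Tg(i,j) in [0075] did not survive in this text, so it cannot be reproduced here. Purely as a stand-in to show how a correlation degree and the comparison standard K0 could combine into a coefficient, the sketch below assumes a clipped ratio, Tg(i,j) = min(correlation degree / K0, 1). This formula, the helper name, and the sample numbers are assumptions and should not be read as the patent's definition.

```python
def same_dim_coefficients(relevance, k0):
    """relevance: {(i, j): correlation degree} for i != j in one second data dimension.

    Stand-in formula (an assumption, not the patent's): min(degree / K0, 1).
    """
    if k0 <= 0:
        raise ValueError("comparison standard K0 must be positive")
    return {pair: min(degree / k0, 1.0) for pair, degree in relevance.items()}


# Hypothetical correlation degrees between three labels.
relevance = {(1, 2): 0.4, (1, 3): 0.9, (2, 3): 0.1}
tg = same_dim_coefficients(relevance, k0=0.8)
```

With this stand-in, a pair whose correlation degree reaches or exceeds K0 is capped at 1, which keeps the coefficients on a scale that fixed threshold ranges can be applied to.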

[0076] It should be further explained that the process of obtaining the cross-dimensional correlation coefficient between different labels is as follows:

[0077] Choose any label as the baseline label and choose another label as the comparison label, where the comparison label and the baseline label do not belong to the same second data dimension;

[0078] Obtain the correlation degree between the relevant content of the baseline label and that of the comparison label. Based on the corresponding comparison standard and the obtained correlation degree, obtain the cross-dimensional correlation coefficient between the baseline label and the comparison label. It should be noted that this process differs from the same-dimensional case only in the comparison standard, which here is a cross-dimensional comparison standard, and it will not be elaborated here.

[0079] It should be noted that this invention mainly involves text data association, image data association, and text and image data association. Text data association is obtained using TF-IDF + cosine similarity, image data association is obtained using feature extraction + similarity calculation, and text and image data association is obtained using TF-IDF + image features + similarity calculation. The specific process will not be elaborated here.
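For the text-to-text case named in [0079], TF-IDF combined with cosine similarity can be sketched with the standard library alone. The whitespace tokenisation, the particular IDF variant (log(N / df) + 1), and the sample sentences are assumptions; the patent does not state which TF-IDF weighting is used.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * (log(N / df) + 1)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical label contents.
docs = [
    "grey fine sandstone with cross bedding",
    "grey fine sandstone with calcite veins",
    "porosity test result twelve percent",
]
vectors = tfidf_vectors(docs)
sim_close = cosine_similarity(vectors[0], vectors[1])  # shares several terms
sim_far = cosine_similarity(vectors[0], vectors[2])    # shares no terms
```

The image and text-plus-image cases would follow the same pattern once image features are expressed as vectors; that feature extraction step is outside this sketch.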

[0080] It should be further explained that the process of generating corresponding data indexing rules based on the data association graph includes:

[0081] For each topology node, extract the corresponding text features from the relevant content of the label. It should be further noted that if the relevant content of the label is image content, then convert the corresponding image features into the corresponding text content and extract the text features within the text content.

[0082] Associate the extracted text features with the topological node;

[0083] Summarize all identical text features and aggregate the topology nodes corresponding to each identical text feature to obtain the set of topology nodes corresponding to that text feature;

[0084] Generate index links with the set of topological nodes, and associate the index links with the text features. Summarize the generated index links as first-level index links.

[0085] Select each topology node in the set of topology nodes corresponding to each first-level index link, and select other topology nodes according to the same-dimensional or cross-dimensional label association level between the selected topology node and other topology nodes, and mark the selected other topology nodes as second-level topology nodes.

[0086] The selected secondary topology nodes are aggregated to obtain a set of secondary topology nodes, and secondary index links are generated.

[0087] All secondary index links corresponding to the selected topological node are associated with the primary index links, thereby constructing the data index rules and enabling retrieval of the core sample data according to the data index rules.
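The index construction in [0081] to [0087] can be sketched as a two-level lookup table. The sketch assumes that text features have already been extracted per topological node and that the graph is an adjacency dictionary whose values are association levels; the `min_level` cut-off used to pick secondary nodes, the helper name, and the sample data are hypothetical, since the patent does not state the selection rule.

```python
from collections import defaultdict


def build_index(node_features, graph, min_level=2):
    """node_features: {node: set of text features}; graph: {node: {neighbour: level}}.

    Returns {feature: {"primary": [nodes], "secondary": {node: [linked nodes]}}}.
    """
    primary = defaultdict(set)
    for node, features in node_features.items():
        for feature in features:
            primary[feature].add(node)  # nodes sharing a feature form one set

    index = {}
    for feature, nodes in primary.items():
        secondary = {}
        for node in sorted(nodes):
            # Secondary nodes: strongly associated neighbours not already primary.
            secondary[node] = sorted(
                other for other, level in graph.get(node, {}).items()
                if level >= min_level and other not in nodes
            )
        index[feature] = {"primary": sorted(nodes), "secondary": secondary}
    return index


node_features = {"t1": {"sandstone"}, "t2": {"sandstone", "porosity"}, "s1": {"photo"}}
graph = {"t1": {"t2": 3, "s1": 2}, "t2": {"t1": 3}, "s1": {"t1": 2}}
index = build_index(node_features, graph)
```

Keeping the secondary links keyed by the primary node they hang from mirrors the text's requirement that secondary index links stay associated with the primary index link of the selected node.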

[0088] Example:

[0089] The user inputs search terms, which are then matched with the text features of each topology node. The topology nodes corresponding to the matching results are used as first-level index links, and the relevant content within each topology node is retrieved through these first-level index links.

[0090] Based on the various topology nodes involved in the primary index links, the corresponding secondary topology nodes are determined, thereby determining the corresponding secondary index links, and the relevant content within each secondary topology node is retrieved through the secondary index links.
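The retrieval example in [0089] and [0090] can be sketched as a lookup against a two-level index. The index layout, the exact-match rule between search terms and text features, and the sample contents are assumptions made for illustration.

```python
def search(query, index, contents):
    """Return (first_level_hits, second_level_hits) as {node: content} dicts."""
    first, second = {}, {}
    for term in query.lower().split():
        entry = index.get(term)
        if entry is None:
            continue  # no text feature matches this search term
        for node in entry["primary"]:
            first[node] = contents[node]
        for linked in entry["secondary"].values():
            for node in linked:
                second[node] = contents[node]
    # A node already returned at the first level is not repeated at the second.
    for node in first:
        second.pop(node, None)
    return first, second


# Hypothetical index and node contents.
index = {
    "sandstone": {"primary": ["t1", "t2"], "secondary": {"t1": ["s1"], "t2": []}},
    "porosity": {"primary": ["t2"], "secondary": {"t2": ["t1"]}},
}
contents = {
    "t1": "sampled 2024-05-01",
    "t2": "sampled 2024-05-03",
    "s1": "core photo, box 3",
}
first_hits, second_hits = search("sandstone", index, contents)
```

A search for "sandstone" returns the two directly matching nodes at the first level and the associated core photo at the second level, which is the expansion behaviour the example describes.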

[0091] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any modifications or equivalent substitutions made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.

Claims

1. A method for constructing a digital core repository based on data association, characterized in that it comprises: collecting core sample data and distinguishing the dimensions of the collected core sample data; processing core sample data of different dimensions to generate spatiotemporal data sequences of corresponding dimensions; constructing, based on the spatiotemporal data sequences corresponding to core sample data of different dimensions, a data association graph between core sample data of different dimensions; generating corresponding data index rules based on the data association graph, and constructing the digital core library based on the data index rules and the collected core sample data; wherein the process of constructing the data association graph between core sample data of different dimensions based on the spatiotemporal data sequences corresponding to core sample data of different dimensions includes: aggregating all time and spatial labels belonging to the same data dimension, and obtaining the same-dimensional correlation coefficient between different labels based on the aggregated time and spatial labels; performing similarity matching on the time and spatial labels of different data dimensions, and obtaining cross-dimensional correlation coefficients between different labels based on the similarity matching results; setting several same-dimensional association threshold ranges and cross-dimensional association threshold ranges, each of which has a corresponding association level; matching the obtained same-dimensional correlation coefficients with each same-dimensional association threshold range, and, when a same-dimensional correlation coefficient falls within any same-dimensional association threshold range, setting the association level corresponding to that range as the same-dimensional association level between the labels corresponding to that coefficient; matching the obtained cross-dimensional correlation coefficients with each cross-dimensional association threshold range, and, when a cross-dimensional correlation coefficient falls within any cross-dimensional association threshold range, setting the association level corresponding to that range as the cross-dimensional association level between the labels corresponding to that coefficient; and generating topological nodes corresponding to each label, and generating topological routes and weights between different labels based on the same-dimensional association level between labels of data in the same dimension and the cross-dimensional association level between labels of data in different dimensions, thereby completing the construction of the data association graph.

2. The method for constructing a digital core repository based on data association according to claim 1, characterized in that, The process of collecting core sample data and distinguishing its dimensions includes: The core sample data consists of several types of data, including descriptive data, analytical test data, and image data. The core sample data is split according to different data types to obtain the data items belonging to the corresponding data types; Each data type is treated as a separate data dimension, denoted as the first data dimension; Each data item under each data type is treated as a separate data dimension, denoted as the second data dimension.

3. The method for constructing a digital core repository based on data association according to claim 2, characterized in that, The process of processing core sample data of different dimensions to generate spatiotemporal data sequences of corresponding dimensions includes: The core sample data in each first data dimension are traversed separately, and the time-related data in the core sample data are marked and corresponding time labels are generated, and the spatially related data are marked and spatial labels are generated. The second data dimension corresponding to the time-related data of the obtained time tags and the spatial-related data of the obtained spatial tags is summarized to obtain the spatiotemporal data sequence corresponding to each first data dimension.

4. The method for constructing a digital core repository based on data association according to claim 3, characterized in that, The process of obtaining the same-dimensional correlation coefficient between different labels is as follows: Select any second data dimension and obtain the relevant data corresponding to each label within that second data dimension, and set the corresponding comparison standard based on the relevant data content; The relevant content corresponding to each tag is compared with the relevant content of another tag to obtain the corresponding relevance. Based on the corresponding comparison criteria and the obtained correlation degree, the same-dimensional correlation coefficient between labels is obtained.

5. The method for constructing a digital core repository based on data association according to claim 4, characterized in that, The process of obtaining the cross-dimensional correlation coefficient between different labels is as follows: Choose any label as the baseline label and choose another label as the comparison label, where the comparison label and the baseline label do not belong to the same second data dimension; Obtain the correlation degree between the relevant content of the baseline label and that of the comparison label, and obtain the cross-dimensional correlation coefficient between the baseline label and the comparison label based on the corresponding comparison standard and the obtained correlation degree.

6. The method for constructing a digital core repository based on data association according to claim 5, characterized in that, The process of generating corresponding data indexing rules based on data association graphs includes: For each topology node, extract the corresponding text features from the relevant content of the label. It should be further noted that if the relevant content of the label is image content, then convert the corresponding image features into the corresponding text content and extract the text features within the text content. Associate the extracted text features with the topological node; Summarize all identical text features and aggregate the topology nodes corresponding to each identical text feature to obtain the set of topology nodes corresponding to that text feature; Generate index links with the set of topological nodes, and associate the index links with the text features. Summarize the generated index links as first-level index links. Select each topology node in the set of topology nodes corresponding to each first-level index link, and select other topology nodes according to the same-dimensional or cross-dimensional label association level between the selected topology node and other topology nodes, and mark the selected other topology nodes as second-level topology nodes. The selected secondary topology nodes are aggregated to obtain a set of secondary topology nodes, and secondary index links are generated. By associating all secondary index links corresponding to the selected topological node with the primary index links, the data index rules are constructed, and the core sample data can be retrieved according to the data index rules.