A blockchain hybrid index retrieval method, system and related device
By combining a blockchain Merkle tree with a semantic potential field mapping mechanism, the invention addresses weak semantic understanding and low retrieval efficiency in multi-source risk information retrieval, enabling efficient tracing and early warning of risk information in the capital market.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- THE THIRD RES INST OF MIN OF PUBLIC SECURITY
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing multi-source risk information retrieval methods cannot simultaneously achieve semantic understanding, verifiability, and retrieval efficiency, making it difficult to meet the needs of tracing and early warning of risk information in the capital market.
A semantic potential field model is introduced on top of the blockchain's Merkle tree structure to convert semantic similarity into potential energy values. Combined with an off-chain inverted index cache, this realizes semantic-driven block number retrieval and verification.
It achieves a balance between semantic understanding, verifiability, and efficient retrieval in the context of risk information tracing and early warning in the capital market, ensuring that the data is tamper-proof and the query results are accurate.

Figure CN122240744A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of information retrieval and blockchain technology, and specifically to an efficient hybrid index retrieval technology that combines the blockchain Merkle tree structure with a semantic potential field mapping mechanism.
Background Technology
[0002] With the rapid development of the capital market and the increasing diversification of information dissemination channels, market risk events often involve complex data spanning multiple institutions, time periods, and dimensions. How to efficiently retrieve and trace this multi-source risk information while ensuring data credibility and verifiability has become a crucial issue for regulators and risk-control functions.
[0003] Current retrieval methods for multi-source risk information mainly include keyword inverted indexing, vector semantic retrieval, and blockchain indexing. However, these methods all have shortcomings in practical applications and cannot meet actual needs.
[0004] The keyword inverted index method is fast but lacks semantic understanding and struggles with the implicit semantics of risk information. Vector semantic retrieval methods, such as those based on embedding models, can capture semantic similarity, but they are computationally expensive and make it difficult to guarantee the verifiability of results. Blockchain indexing methods (i.e., pure on-chain storage and retrieval) ensure data immutability through on-chain storage and hash verification, but they have low retrieval efficiency and lack semantic-layer support.
[0005] As can be seen from the above, there is a need for a multi-source risk information retrieval scheme that simultaneously takes into account semantic understanding, verifiability, and retrieval efficiency.
Summary of the Invention
[0006] To address the problems of existing multi-source risk information retrieval schemes, this invention provides a blockchain hybrid index retrieval scheme. The scheme combines the blockchain Merkle tree structure with a semantic potential field mapping mechanism to form an efficient hybrid index retrieval method that simultaneously takes into account semantic understanding, verifiability, and retrieval efficiency. It effectively overcomes the shortcomings of existing schemes and is suitable for scenarios such as source tracing, early warning, and verifiable querying of risk information in the capital market.
[0007] To achieve the above objectives, a first aspect of the present invention provides a blockchain hybrid index retrieval method in which a semantic potential field model is first introduced on the basis of the blockchain's Merkle tree structure to convert semantic similarity into corresponding potential energy values; next, the potential energy values are mapped to block number indexes to complete semantic-driven block number retrieval; finally, multi-dimensional attribute retrieval and verification are performed using an off-chain inverted index cache.
[0008] In some implementations of this retrieval method, when calculating the potential energy value based on a semantic potential field model, the retrieval method includes: (1) Generate an embedding vector for each document and query; (2) For the generated vectors, use cosine distance to measure vector similarity and calculate semantic distance; (3) Convert the obtained semantic distance into the corresponding potential energy value.
[0009] In some implementations of this retrieval method, the method establishes an off-chain inverted index cache in which each keyword or risk subject corresponds to a list of candidate block numbers.
[0010] In some embodiments of this retrieval method, a Merkle tree verification step is included.
[0011] In some implementations of this retrieval method, during retrieval and result verification, the method quickly locates candidate blocks in the off-chain index, computes the semantic hash of the keyword or subject, calls the on-chain Merkle tree for verification, and returns the sorted retrieval results together with the verification path.
[0012] To achieve the above objectives, a second aspect of the present invention provides a blockchain hybrid index retrieval system comprising a semantic potential field calculation module, a block number mapping module, an off-chain inverted index caching module, and a retrieval and result verification module. The semantic potential field calculation module is configured to introduce a semantic potential field model based on the Merkle tree structure of the blockchain and to convert semantic similarity into corresponding potential energy values. The block number mapping module is configured to interact with the semantic potential field calculation module to map potential energy values to block number indexes, thereby completing semantic-driven block number retrieval. The off-chain inverted index caching module is configured to maintain, off-chain, an inverted index table corresponding to the on-chain data and to verify it through the Merkle tree verification mechanism. The retrieval and result verification module is configured to interact with the off-chain inverted index caching module to perform multi-dimensional attribute retrieval and verification.
[0013] In some implementations of this retrieval system, the semantic potential field calculation module is configured to first generate an embedding vector for each document and query; then, for the generated vectors, use cosine distance to measure vector similarity and calculate semantic distance; finally, convert the obtained semantic distance into the corresponding potential energy value.
[0014] In some implementations of this retrieval system, the off-chain inverted index caching module is configured to map each keyword or risk subject to a list of candidate block numbers.
[0015] In some implementations of this retrieval system, the retrieval and result verification module is configured to quickly locate candidate blocks in the off-chain index, compute the semantic hash of the keyword or subject, call the on-chain Merkle tree for verification, and return the sorted retrieval results together with the verification path.
[0016] To achieve the above objectives, the present invention also provides a computer-readable storage medium having a program stored thereon that, when executed by a processor, implements the steps of the above-described blockchain hybrid index retrieval method.
[0017] To achieve the above objectives, the present invention also provides a processor for running a program that executes the steps of the above-described blockchain hybrid index retrieval method during runtime.
[0018] To achieve the above objectives, the present invention also provides a terminal device, the device including a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program code is loaded and executed by the processor to implement the steps of the above-described blockchain hybrid index retrieval method.
[0019] To achieve the above objectives, the present invention also provides a computer program product that, when executed on a data processing device, is adapted to perform the steps of the above-described blockchain hybrid index retrieval method.
[0020] The blockchain hybrid index retrieval scheme provided by this invention combines the blockchain Merkle tree structure with a semantic potential field mapping mechanism. By introducing a semantic potential field model on the basis of the blockchain's Merkle tree structure, semantic similarity is transformed into physical potential energy distribution, thereby realizing semantic-driven block number mapping retrieval. On this basis, it further combines an off-chain inverted index caching mechanism to achieve efficient querying and verification of multi-dimensional attributes such as risk subjects, keywords, and event types.
[0021] The blockchain hybrid index retrieval scheme provided by this invention can simultaneously take into account semantic understanding, verifiability, and retrieval efficiency, and can be applied to scenarios of tracing, early warning, and verifiable query of risk information in the capital market.
Attached Figure Description
[0022] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.
[0023] Figure 1 is a flowchart of the blockchain hybrid index retrieval method of the present invention.
[0024] Figure 2 is a system block diagram of the blockchain hybrid index retrieval system of the present invention.
[0025] Figure 3 is a schematic diagram of the structural principle of the inverted index cache module of the present invention.
Detailed Implementation
[0026] To make the technical means, creative features, objectives and effects of this invention easier to understand, the invention will be further described below with reference to specific illustrations.
[0027] This invention combines the blockchain Merkle tree structure with a semantic potential field mapping mechanism to form an efficient blockchain hybrid index retrieval scheme.
[0028] This blockchain hybrid index retrieval solution mainly achieves multi-source risk information retrieval by combining on-chain storage, off-chain retrieval, mapping logic, and result verification, while taking into account semantic understanding, verifiability, and retrieval efficiency.
[0029] Among them, on-chain storage means storing data on the blockchain and ensuring that the data is immutable through the Merkle tree.
[0030] Specifically, this solution stores all original risk information on the blockchain in the form of blocks. Each block contains information content, a timestamp, a block number, and a Merkle tree root, thereby ensuring the immutability and verifiability of on-chain data based on the Merkle tree structure.
[0031] Off-chain retrieval calculates text semantic vectors through a semantic potential field mapping model, transforms semantic similarity into potential energy, and establishes an inverted index cache.
[0032] Specifically, this solution generates semantic vectors (embeddings) for text data and calculates the similarity between each piece of information and the query semantics. It also constructs a semantic potential field model in which information with higher semantic similarity has lower potential energy and therefore exhibits stronger attraction. These potential energy values are mapped to block numbers to quickly locate relevant blocks. An off-chain inverted index cache is established that assigns each keyword or risk subject a list of candidate block numbers, improving the efficiency of multi-condition retrieval.
[0033] Block number mapping is used to map potential energy values to block numbers, enabling fast lookup of semantic indexes to block numbers.
[0034] Results verification: the consistency and completeness of the off-chain retrieval results are verified using Merkle Proof.
[0035] Specifically, this solution maps potential energy values to block number indexes to achieve semantic-driven block number retrieval. Based on this, Merkle Proof is used to verify the consistency between off-chain information and on-chain evidence for the retrieval results, ensuring that the query results are traceable and tamper-proof.
[0036] Accordingly, as shown in Figure 1, the blockchain hybrid index retrieval scheme provided by this invention can be implemented through the following steps.
Step 1: Semantic potential calculation. This step introduces a semantic potential field model based on the Merkle tree structure of the blockchain, transforming semantic similarity into corresponding potential energy values.
[0037] In this step, semantic similarity is quantified into a potential energy value. The lower the potential energy, the closer the semantics are to the query, which is then used for block number mapping and sorting.
[0038] The normalized potential energy value generated in this step is stored in an off-chain cache or temporary storage for use in the next step.
[0039] Step 2: Block number mapping. This step maps the potential energy values to the block number index, completing semantic-driven block number retrieval; semantic relevance is thereby transformed into location information stored in the blockchain, facilitating rapid indexing and retrieval.
[0040] This step involves storing the generated mapping relationship on the blockchain to ensure its immutability, and it also serves as the basic data source for building the off-chain inverted index.
[0041] Step 3: Build the off-chain inverted index cache. This step establishes an off-chain inverted index cache in which each keyword or risk subject corresponds to a list of candidate block numbers.
[0042] This step, when building the off-chain inverted index cache, first synchronizes and parses metadata containing {keyword, block_id, weight, timestamp} from the blockchain; then, in an off-chain database (such as Elasticsearch), using the keyword as the key, it constructs an inverted index pointing to a list of block_ids, sorted by weight. This transforms the immutable, structured semantic mapping results on the blockchain into a high-performance, flexibly queryable off-chain index structure, enabling each keyword to be quickly associated with a batch of potential blocks sorted by semantic relevance, providing crucial index support for multi-attribute retrieval.
[0043] Step 4: Search and result verification. This step uses the off-chain inverted index cache to perform multi-dimensional attribute retrieval and verification.
[0044] On this basis, the blockchain hybrid index retrieval scheme also integrates a Merkle tree verification mechanism, through which Merkle Proof verification can be provided on the blockchain for data such as candidate block numbers, the information hash lists in blocks (on-chain evidence storage), and off-chain retrieval results. Merkle Proof verification ensures the immutability and verifiability of on-chain data and verifies the consistency and integrity of off-chain retrieval results.
[0045] In this step, during Merkle tree verification, once candidate data have been obtained through the off-chain index, the Merkle Proof path is retrieved from a blockchain node using the data and its location on the blockchain (block number). Starting from the leaf hash of the target data, the verification module uses the hashes of the sibling nodes along this path to concatenate and hash layer by layer upwards until a root hash is produced. Finally, this calculated root hash is compared with the Merkle Root recorded in the corresponding block header. Through these steps, this scheme makes Merkle Proof verification a mandatory step in the retrieval process: results retrieved off-chain, which carry no inherent trust, receive an independently verifiable credential of authenticity backed by blockchain cryptography, ensuring the efficiency and accuracy of the retrieval results.
[0046] The following details the specific implementation scheme of each step in this blockchain hybrid index retrieval scheme.
[0047] In some implementations, when performing the semantic potential field calculation, this solution innovatively adds a layer of semantic mapping logic on top of the Merkle tree evidence storage system, which would otherwise serve only for data integrity verification, and uses this layer to form the corresponding semantic potential field model.
[0048] As further explanation, in this scheme each stored text is regarded as a field source, and its semantic vector determines its position in the semantic space. When a user submits a query, the query vector enters the semantic space as a "probe". The semantic potential field model computes the semantic distance between the probe and each field source and converts it into a potential value through a potential energy function (such as Φ_i = Σ_j Q_j / (d_i + ε)).
[0049] Accordingly, this scheme not only measures semantic similarity, but also quantifies multidimensional semantic relationships into a scalar potential that can be directly used for ranking and mapping through weighting and normalization.
[0050] On this basis, this solution can perform semantic potential field calculations between a user query (keywords or natural-language text) and a collection of on-chain or off-chain text information (Documents), computing the corresponding potential energy value Φ_i for each document. Semantic similarity is thus quantified as potential energy: the lower the potential energy, the closer the document is to the query semantics, which can then be used for block number mapping and sorting.
[0051] Specifically, this solution first generates an embedding vector for each document and query: Document vector: v_i = Embed(Doc_i); Query vector: v_q = Embed(Query).
[0052] Based on this, semantic distance is calculated for the generated embedding vectors. Cosine distance can be used to measure vector similarity, and this can be accomplished using the following calculation model: Cosine distance: d(i, q) = 1 - (v_i·v_q) / (‖v_i‖ ‖v_q‖).
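To make the distance measure concrete, the sketch below computes d(i, q) directly from two vectors in plain Python. It is a minimal illustration rather than the invention's implementation: ordinary float lists stand in for the outputs of Embed(), and the function name cosine_distance is hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(v_i: Sequence[float], v_q: Sequence[float]) -> float:
    """Return d(i, q) = 1 - (v_i . v_q) / (||v_i|| * ||v_q||)."""
    if len(v_i) != len(v_q):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v_i, v_q))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_q = math.sqrt(sum(b * b for b in v_q))
    if norm_i == 0.0 or norm_q == 0.0:
        # Cosine similarity is undefined when either vector has zero length.
        raise ValueError("zero-length vector")
    return 1.0 - dot / (norm_i * norm_q)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0  (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # 1.0  (orthogonal)
```

Identical directions give 0, orthogonal vectors give 1, and opposite vectors give 2, so smaller values mean closer semantics.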
[0053] Next, the semantic distance is converted into potential energy using the following computational model: Φ_i = Σ_{j=1}^{N_q} [ Q_j / ( d(i, q_j) + ε ) ], where Q_j is the importance weight of the j-th keyword in the query (computed with TF-IDF or another attention score), N_q is the number of keywords in the query, and ε is a small constant that prevents division by zero.
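A minimal sketch of the potential function for a single document, assuming the per-keyword distances d(i, q_j) have already been computed and the weights Q_j are supplied by the caller; the function name potential and the default ε = 1e-6 are illustrative assumptions. Read literally, the formula puts d(i, q_j) in the denominator, so Φ_i grows as the semantic distance shrinks.

```python
from typing import Sequence


def potential(distances: Sequence[float], weights: Sequence[float],
              eps: float = 1e-6) -> float:
    """Return Phi_i = sum_j Q_j / (d(i, q_j) + eps) for one document i.

    distances[j] is d(i, q_j), the distance between document i and the j-th
    query keyword; weights[j] is that keyword's importance weight Q_j.
    """
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per query keyword")
    return sum(q / (d + eps) for d, q in zip(distances, weights))


print(potential([0.5, 1.0], [1.0, 2.0], eps=0.0))  # 4.0
# A document closer to the query gets the larger raw potential.
print(potential([0.2], [1.0]) > potential([0.8], [1.0]))  # True
```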
[0054] Finally, all potential energy values Φ_i are normalized to the interval [0, 1], yielding Φ_i^{norm}.
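The description does not say which normalization is used. The sketch below assumes min-max scaling, one common choice that maps the smallest Φ_i to 0 and the largest to 1; the name normalize and the handling of the all-equal case are hypothetical.

```python
from typing import List, Sequence


def normalize(potentials: Sequence[float]) -> List[float]:
    """Min-max scale raw potentials Phi_i into [0, 1] (assumed method)."""
    if not potentials:
        return []
    lo, hi = min(potentials), max(potentials)
    if hi == lo:
        # Every document has the same potential; 0.0 is an arbitrary choice here.
        return [0.0] * len(potentials)
    return [(p - lo) / (hi - lo) for p in potentials]


print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```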
[0055] In some implementations, this scheme uses the normalized semantic potential value Φ_i_norm as input when mapping block numbers, and converts it into a specific block number through an innovative discrete mapping function BlockID_i = floor(α * Φ_i_norm * N_b).
[0056] This scheme introduces a smoothing coefficient α and uses the floor() function to round down. Adjusting α (0 to 1.2) controls how tightly potential values "cluster" when mapped to block numbers: the larger α is, the earlier the block number range to which high-potential (highly relevant) documents are mapped, and the higher their distinguishability; the smaller α is, the more uniform the mapping distribution, which avoids the hotspot-block problem caused by an excessive concentration of semantics.
[0057] Accordingly, this scheme successfully transforms continuous, unstructured semantic metrics into a discrete, physically meaningful on-chain address, achieving the core goal of "semantic indexability." That is, by storing evidence on the chain, this mapping relationship of "keyword-potential energy-block number" is fixed, forming a verifiable semantic addressing mechanism.
[0058] As further explanation, this step is preferably performed off-chain, but the generated mapping relationship (such as {keyword, block_id, weight}) will be submitted to the blockchain for notarization to ensure that the mapping rules and results themselves are also immutable.
[0059] On this basis, this scheme can map the normalized semantic potential value Φ_i^{norm} into location information stored in the blockchain, generating the corresponding block number BlockID_i, which facilitates quick indexing and retrieval.
[0060] As further explanation, the normalized semantic potential value Φ_i^{norm} is transformed into the corresponding block number BlockID_i using the following mapping model: BlockID_i = floor( α · Φ_i^{norm} · N_b ), where N_b is the total number of blocks on the chain, α is the smoothing coefficient (0 to 1.2) that controls the discrete balance of the mapping, and floor() rounds down to an integer block number.
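The mapping model can be sketched as below. Two details are assumptions rather than part of the description: block numbers are taken to start at 0, and the result is clamped to the valid range [0, N_b - 1], because α > 1 or Φ_i^{norm} = 1 would otherwise let floor(α · Φ_i^{norm} · N_b) reach or exceed N_b. As written, a larger Φ_i^{norm} yields a numerically larger BlockID_i, and α scales how much of the index range is used. The function name block_id is hypothetical.

```python
import math


def block_id(phi_norm: float, n_blocks: int, alpha: float = 1.0) -> int:
    """Return BlockID_i = floor(alpha * phi_norm * N_b), clamped to [0, N_b - 1]."""
    if n_blocks <= 0:
        raise ValueError("the chain must contain at least one block")
    if not 0.0 <= phi_norm <= 1.0:
        raise ValueError("phi_norm must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.2:
        raise ValueError("alpha is described as lying between 0 and 1.2")
    raw = math.floor(alpha * phi_norm * n_blocks)
    # Clamp (an assumption): alpha > 1 or phi_norm == 1 could exceed the last block.
    return min(max(raw, 0), n_blocks - 1)


for alpha in (0.5, 1.0):
    print(alpha, [block_id(p, 100, alpha) for p in (0.0, 0.25, 0.5, 1.0)])
```

With N_b = 100, α = 0.5 confines all documents to blocks 0 to 50, while α = 1.0 spreads them over the whole range 0 to 99.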
[0061] Based on this mapping model, each potential value is converted into a mapped block number BlockID_i. The higher the potential energy value (the more relevant), the earlier the block number it maps to, or the closer it is to a high-priority block, which facilitates fast retrieval through the off-chain inverted index. The mapping is also adjustable: the distribution density can be tuned with the smoothing coefficient α so that high-density potential values do not accumulate in a few blocks.
[0062] The determined block number BlockID_i enables rapid location, for example by directly locating the block that contains the information by its block number, which reduces the cost of traversing the entire chain. It also supports multi-dimensional retrieval: combined with the subsequent off-chain inverted index, it enables joint retrieval by block number, keyword, subject, and event type.
[0063] In some implementations, this scheme extends the verification scope of Merkle Proof to a cross-block result set aggregated by semantic retrieval; simultaneously, it operates Merkle tree verification as an off-chain verification service. When performing Merkle tree verification, this scheme does not passively verify individual data entries, but actively serves user queries, receiving a list of candidate data from the retrieval module, and independently retrieving the Merkle Proof path from the blockchain for each data entry in the list and performing verification. This transforms the Merkle tree from a static storage structure into a dynamic trust endorsement engine.
[0064] Furthermore, when performing Merkle tree verification, this scheme first receives the data to be verified and its block number from the retrieval module. It then interacts with a blockchain node to obtain the proof path of the data in the corresponding block's Merkle tree (i.e., the list of sibling node hashes along the path from the leaf node to the root node). Next, starting from the hash of the target data, it concatenates the current hash with each sibling hash in the path in turn and computes the new hash according to the Merkle tree's rules, iterating until the root hash is obtained. Finally, it compares this calculated root hash with the Merkle Root read from the block header.
[0065] This step runs in the off-chain verification module, and together with steps such as semantic potential field calculation and block number mapping, it forms a front-end and back-end collaboration to constitute a complete and reliable retrieval loop.
[0066] On this basis, the Merkle tree verification constructed in this scheme performs Merkle Proof verification on the generated candidate block numbers BlockID_i, the information hash lists in the blocks (on-chain evidence storage), and the off-chain retrieval results, ensuring that the off-chain retrieval results are consistent with the on-chain block evidence and thereby verifying that the data has not been tampered with.
[0067] Specifically, the Merkle Proof verification here can be achieved through the following calculation process: (1) Obtain the hash value of the leaf node based on the corresponding hash function: h_i = H(Data_i); It should be noted that the hash function can be SHA-256, etc.
[0068] Further obtain the parent node's hash value: h_p = H(h_left || h_right), where || represents concatenating hash values.
[0069] Repeating this up the tree yields the root node hash, shown here for four leaves: Root = h_root = H(H(h_0 ∥ h_1) ∥ H(h_2 ∥ h_3)).
(2) Starting from the given data hash h_i and its proof path, the root hash is recomputed and compared with the root hash recorded in the block header: Verify = (H_computed_root ?= H_block_root). If the two match, verification succeeds; otherwise, the data has been tampered with.
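The leaf and parent formulas above can be sketched with Python's hashlib, taking SHA-256 as H. How the tree handles a level with an odd number of nodes is not specified; this sketch assumes the last hash is duplicated, which is one common convention. Leaves and internal nodes are hashed with the same function and no domain-separation prefix, matching the formulas as written. The names merkle_root and verify_root are illustrative.

```python
import hashlib
from typing import List


def H(data: bytes) -> bytes:
    """The hash function H; SHA-256 is one option named in the description."""
    return hashlib.sha256(data).digest()


def merkle_root(items: List[bytes]) -> bytes:
    """Fold leaves h_i = H(Data_i) into parents h_p = H(h_left || h_right)."""
    if not items:
        raise ValueError("at least one data item is required")
    level = [H(d) for d in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last hash
        level = [H(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
    return level[0]


def verify_root(computed_root: bytes, block_root: bytes) -> bool:
    """Verify = (H_computed_root ?= H_block_root)."""
    return computed_root == block_root


data = [b"d0", b"d1", b"d2", b"d3"]
h = [H(d) for d in data]
expected = H(H(h[0] + h[1]) + H(h[2] + h[3]))
print(verify_root(merkle_root(data), expected))  # True
```

For four items the function reproduces Root = H(H(h_0 ∥ h_1) ∥ H(h_2 ∥ h_3)) exactly.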
[0070] As further explanation, based on the above Merkle Proof verification formulas, the Merkle Proof verification process of this scheme is as follows: (1) First, obtain the candidate data Data_i returned by the off-chain search.
[0071] (2) Next, calculate the hash h_i of the candidate data.
[0072] (3) Next, obtain the Merkle Proof path for the corresponding BlockID_i from the blockchain. As an example, this step can be achieved by sending a request to a full node via the blockchain's client API. The request includes the block number BlockID_i of the data to be verified and its transaction index or location within the block; the full node then reconstructs or reads the corresponding Merkle Proof path from its local block data and returns it.
(4) Next, using the path from step (3), compute the hash layer by layer according to the Merkle tree rules up to the root node, giving H_computed_root. As an example, this step first initializes the current hash h_current = h_i. For each sibling hash h_sibling in the Merkle Proof path, if the current node is the left child (determined by the path direction), then h_new = Hash(h_current || h_sibling); if it is the right child, then h_new = Hash(h_sibling || h_current). h_current is then updated to h_new and the process moves up one level, until the whole path has been processed.
[0073] (5) Finally, compare the calculated H_computed_root with the root hash stored in the block header: if they match, verification succeeds; otherwise, the data has been tampered with.
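Steps (1) to (5) can be sketched as follows. The function build_proof is a hypothetical stand-in for the full node in step (3): it builds the tree locally and returns each sibling hash together with a flag recording whether the running hash is the left child, which plays the role of the "path direction" in step (4). SHA-256 and the duplicate-last-hash rule for odd levels are assumptions carried over for illustration.

```python
import hashlib
from typing import List, Tuple

# One proof step: (sibling_hash, current_is_left). current_is_left=True means
# the running hash is the left child, so the parent is H(current || sibling).
ProofStep = Tuple[bytes, bool]


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_proof(items: List[bytes], index: int) -> Tuple[bytes, List[ProofStep]]:
    """Hypothetical full-node stand-in: return (root, proof path) for items[index]."""
    level = [H(d) for d in items]
    proof: List[ProofStep] = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last hash
        is_left = index % 2 == 0
        sibling = level[index + 1] if is_left else level[index - 1]
        proof.append((sibling, is_left))
        level = [H(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(data: bytes, proof: List[ProofStep], block_root: bytes) -> bool:
    h_current = H(data)                         # step (2): h_i = H(Data_i)
    for h_sibling, current_is_left in proof:    # step (4): walk up the path
        if current_is_left:
            h_current = H(h_current + h_sibling)
        else:
            h_current = H(h_sibling + h_current)
    return h_current == block_root              # step (5): compare with header root


items = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root, proof = build_proof(items, 3)
print(verify_proof(b"tx3", proof, root))       # True
print(verify_proof(b"tampered", proof, root))  # False
```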
[0074] Through the aforementioned steps, this solution completes the semantic potential field calculation and block number mapping; that is, semantic potential values have been calculated for the user query and mapped to the corresponding block numbers on the blockchain.
[0075] However, performing full-text searches or complex conditional searches directly on the blockchain would be extremely costly because of the following limitations of blockchain storage and retrieval: (1) accessing on-chain data is slow, since it requires traversing each block or querying remote nodes; (2) blocks are linked mainly in chronological order through a hash chain, with no flexible index structure; (3) multi-condition queries, such as “subject A + risk event + time period”, are difficult.
[0076] Accordingly, this solution abandons the method of performing full-text search or complex condition retrieval directly on the chain. Instead, it innovatively constructs an off-chain inverted index cache and maintains an inverted index table corresponding to the on-chain data off-chain (such as a local database or cache system). The authenticity and integrity of the data are guaranteed by the Merkle tree verification mechanism.
[0077] The inverted index table here is built based on the {keyword, block_id, weight, timestamp} metadata parsed from the blockchain, and can form a one-to-one mirror relationship with the on-chain data.
[0078] Specifically, this solution can construct an off-chain inverted index cache and maintain an inverted index table through the following steps.
[0079] Step S1: On-chain data synchronization and parsing.
[0080] This step is used to obtain the latest semantic potential field mapping results from the blockchain, including keywords, corresponding block numbers, semantic weights, timestamps, and other information. The semantic potential field mapping results here are the data generated and stored on the blockchain in step 2, based on the block number mapping.
[0081] Specifically, this step first reads the block data of the blockchain node, which can be done via RPC interface or smart contract, for example. Next, for the read block data, the metadata of each block is parsed; the {keyword, block_id, weight, timestamp} quadruple is extracted; duplicates are filtered out, and only the latest version is kept.
[0082] Finally, the latest semantic potential field mapping result, including keywords, corresponding block numbers, semantic weights, timestamps, etc., is formed, which is the on-chain semantic mapping data list ChainData = [{keyword, block_id, weight, timestamp}, …].
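Assuming the blocks have already been read and their metadata decoded into quadruples (the RPC or smart-contract access itself is not shown), the de-duplication described in [0081] could look like the sketch below. Treating (keyword, block_id) as the identity of a mapping, and "latest" as the largest timestamp, are assumptions; MappingRecord and dedupe_latest are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MappingRecord:
    """One {keyword, block_id, weight, timestamp} quadruple parsed from a block."""
    keyword: str
    block_id: int
    weight: float
    timestamp: int  # assumed to be Unix seconds


def dedupe_latest(records: Iterable[MappingRecord]) -> List[MappingRecord]:
    """Keep only the newest record for each (keyword, block_id) pair."""
    latest: Dict[Tuple[str, int], MappingRecord] = {}
    for rec in records:
        key = (rec.keyword, rec.block_id)
        # Replace an earlier version only if this record is strictly newer.
        if key not in latest or rec.timestamp > latest[key].timestamp:
            latest[key] = rec
    return list(latest.values())


raw = [
    MappingRecord("Financial risk", 42, 0.71, 1_700_000_000),
    MappingRecord("Financial risk", 42, 0.83, 1_710_000_000),  # newer version
    MappingRecord("Executive resignation", 7, 0.64, 1_705_000_000),
]
chain_data = dedupe_latest(raw)  # ChainData with two records
```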
[0083] Step S2: Construct the off-chain inverted index table.
[0084] Based on the list of on-chain semantic mapping data obtained in step S1, this step establishes a reverse index mapping from keywords to block numbers in an off-chain database (such as SQLite, LevelDB, ElasticSearch, etc.).
[0085] Specifically, this step first obtains the on-chain semantic mapping data list ChainData produced in step S1.
[0086] Next, for the obtained on-chain semantic mapping data list ChainData, all records are traversed; using keywords as index keys, the block number, weight, and timestamp are inserted into the corresponding inverted index table; and records under the same keyword are sorted by time and weight.
[0087] As a further explanation, this step involves iterating through all records in the obtained on-chain semantic mapping data list, ChainData, using keywords as index keys. For each record, {"BlockID": block_id, "Weight": weight, "Timestamp": timestamp} is inserted as an element into the list corresponding to that keyword. After iteration, for each keyword's list, it is first sorted in descending order by Weight (ensuring the most semantically relevant results come first), and then sorted in descending order by Timestamp for records with the same weight (ensuring the most recent results come first). This construction method directly integrates semantic relevance (Weight) and time freshness (Timestamp) at the index layer, laying the foundation for efficient sorting later. The resulting inverted index table has keywords as keys and an ordered list containing block information and metadata as values, supporting fast intersection, union, and other complex query operations.
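A minimal in-memory sketch of step S2, standing in for an off-chain store such as SQLite or Elasticsearch: it groups the ChainData quadruples by keyword and sorts each posting list by Weight descending, then by Timestamp descending, as described in [0087]. The function name build_inverted_index and the sample records are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (keyword, block_id, weight, timestamp) quadruples, as in ChainData.
Quad = Tuple[str, int, float, int]


def build_inverted_index(chain_data: Iterable[Quad]) -> Dict[str, List[dict]]:
    index: Dict[str, List[dict]] = defaultdict(list)
    for keyword, block_id, weight, timestamp in chain_data:
        index[keyword].append({"BlockID": block_id, "Weight": weight, "Timestamp": timestamp})
    for postings in index.values():
        # Weight descending first, then Timestamp descending to break ties.
        postings.sort(key=lambda e: (-e["Weight"], -e["Timestamp"]))
    return dict(index)


chain_data = [
    ("Financial risk", 42, 0.8, 1_700_000_000),
    ("Financial risk", 17, 0.8, 1_710_000_000),
    ("Financial risk", 9, 0.95, 1_690_000_000),
    ("Executive resignation", 42, 0.6, 1_700_000_000),
]
index = build_inverted_index(chain_data)
print([e["BlockID"] for e in index["Financial risk"]])  # [9, 17, 42]
```

Blocks 17 and 42 share weight 0.8, so the newer block 17 is placed ahead of block 42.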
[0088] Finally, output the inverted index table structure.
[0089] Step S3: Natural Language Query Parsing and Keyword Extraction.
[0090] This step is used to convert the natural language query input by the user into a keyword vector to match the index in the inverted index table generated in step S2.
[0091] Specifically, this step first obtains the user's query, such as "financial risks of company executives resigning in 2025".
[0092] Next, for the user query, NLP models (such as BERT or GPT embedding) are used for word segmentation and entity recognition, and keywords and weights (such as TF-IDF or attention weights) are extracted.
[0093] As an example, this step uses a hybrid strategy combining TF-IDF and query intent to determine keyword weights. First, TF-IDF is used to calculate the importance of a word within the document set. Then, a lightweight intent recognition model (e.g., rule-based or small neural networks) scores the query terms, determining whether they are core query terms or modifiers. The final weight is a weighted fusion of the TF-IDF score and the intent score. This ensures that core terms like "financial risk" receive higher weights, while time-limited terms like "2025" participate in the retrieval in a more appropriate way (e.g., as filtering conditions rather than core weights), thereby improving retrieval accuracy.
[0094] Finally, the step outputs a keyword list with the corresponding weights, for example ["executive resignation", "financial risk", "2025"] with weights [0.5, 0.4, 0.1].
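The hybrid weighting of [0093] can be sketched as follows. This is an illustrative assumption, not the model specified here: TF-IDF uses one common smoothed-IDF variant over a small tokenized corpus, the "intent model" is a single hypothetical rule that down-weights purely numeric terms such as years, and the fusion coefficient beta is a hypothetical parameter. The fused weights are renormalized to sum to 1.

```python
import math
from typing import Dict, List


def smoothed_idf(term: str, corpus: List[List[str]]) -> float:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def tfidf_scores(query_terms: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    n = len(query_terms)
    unique = list(dict.fromkeys(query_terms))  # de-duplicate, keep query order
    return {t: (query_terms.count(t) / n) * smoothed_idf(t, corpus) for t in unique}


def intent_score(term: str) -> float:
    """Hypothetical rule-based intent model: numeric terms (e.g. years) are modifiers."""
    return 0.2 if term.isdigit() else 1.0


def fuse_weights(query_terms: List[str], corpus: List[List[str]], beta: float = 0.5) -> Dict[str, float]:
    tfidf = tfidf_scores(query_terms, corpus)
    total_tfidf = sum(tfidf.values()) or 1.0
    # Weighted fusion of the normalized TF-IDF score and the intent score.
    fused = {t: beta * s / total_tfidf + (1 - beta) * intent_score(t) for t, s in tfidf.items()}
    total = sum(fused.values()) or 1.0
    return {t: w / total for t, w in fused.items()}


corpus = [
    ["executive resignation", "market panic"],
    ["financial risk", "regulatory penalty"],
    ["2025", "electric vehicle sales"],
    ["2025", "financial risk"],
]
weights = fuse_weights(["executive resignation", "financial risk", "2025"], corpus)
```

With this toy corpus the weights come out at roughly 0.43, 0.41, and 0.16, so the year "2025" is down-weighted as intended; the exact values depend on the corpus and on beta and are not meant to reproduce the illustrative [0.5, 0.4, 0.1].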
[0095] Step S4: Off-chain inverted index retrieval and intersection calculation. This step looks up the corresponding block number lists in the inverted index table constructed in step S2, using the keywords and weights in the keyword list extracted in step S3.
[0096] Specifically, this step obtains the keyword list and corresponding weights extracted in step S3, as well as the off-chain inverted index table constructed in step S2. Next, for each keyword in the keyword list, the set of matching block numbers is looked up in the inverted index table to form a candidate block number list; intersection or weighted union operations are performed across the keywords, and the results are weighted and merged according to the semantic weights to obtain a comprehensive score.
[0097] Specifically, for queries containing multiple keywords, the search results (block lists) of the individual keywords are first merged; a weighted union is then used to compute the score, i.e., the comprehensive score of a block = Σ (keyword weight × the original weight of the block under that keyword). This yields a candidate block list, sorted by comprehensive score, that incorporates multiple semantic dimensions; unlike a simple Boolean intersection, it more accurately reflects the overall relevance of each document to a complex query.
[0098] Finally, output the list of candidate block numbers and their combined scores.
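The scoring rule of [0097] can be sketched as below, using the index layout from step S2. The mode switch between weighted union and intersection is an illustrative assumption: in intersection mode a block is kept only if it appears under every query keyword, which assumes each block appears at most once per keyword list. The sample data are those of the worked example in [0174].

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_blocks(
    query_weights: Dict[str, float],
    index: Dict[str, List[dict]],
    mode: str = "union",
) -> List[Tuple[int, float]]:
    """Return (BlockID, comprehensive score) pairs, highest score first."""
    scores: Dict[int, float] = defaultdict(float)
    hits: Dict[int, int] = defaultdict(int)
    for keyword, q_weight in query_weights.items():
        for entry in index.get(keyword, []):
            # score(block) = sum over keywords of keyword weight * block weight
            scores[entry["BlockID"]] += q_weight * entry["Weight"]
            hits[entry["BlockID"]] += 1
    if mode == "intersection":
        # Keep only blocks matched by every query keyword.
        scores = {b: s for b, s in scores.items() if hits[b] == len(query_weights)}
    elif mode != "union":
        raise ValueError(f"unknown mode: {mode}")
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = {
    "executive resignation": [{"BlockID": 100, "Weight": 0.9}, {"BlockID": 101, "Weight": 0.8}],
    "financial risk": [{"BlockID": 100, "Weight": 0.85}, {"BlockID": 102, "Weight": 0.7}],
}
ranked = score_blocks({"executive resignation": 0.5, "financial risk": 0.4}, index)
```

For these inputs the ranking is [100, 101, 102] with scores of about 0.79, 0.4, and 0.28, matching [0174]; in intersection mode only block 100 survives.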
[0099] Step S5: Block content extraction and Merkle tree verification.
[0100] This step performs Merkle tree verification on the candidate block number list obtained in step S4 to ensure that the retrieved block content has not been tampered with.
[0101] Specifically, this step first obtains the list of candidate block numbers obtained in step S4, as well as the on-chain block header information (including Merkle Root), which is obtained directly from the synchronized blockchain node or through RPC query.
[0102] Next, the target data in the block and its Merkle Proof path are obtained; the leaf hash is calculated accordingly and combined layer by layer along the path until the root node is reached; the calculated root hash is verified to be consistent with the Merkle Root recorded in the block header.
[0103] Finally, output the set of real data that has passed the verification.
[0104] Step S6: Update the cached results and time decay.
[0105] This step is used to cache user query results, accelerate repeated searches, and update weights based on a time decay mechanism.
[0106] Specifically, this step first stores the retrieval results (i.e., the final data verified in step S5 together with their comprehensive scores) in the cache database; the cache is checked each time a new query is made, and the weights of older data are adjusted according to the time decay formula W_t = W_0 × e^{-λ·Δt}, where λ is the decay coefficient and Δt is the time difference.
[0107] Output: Latest and reliable semantic search results.
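A minimal sketch of the cache and time decay of step S6 follows. It assumes Δt is measured from each result's data timestamp to the query time, in the same time unit as λ (the description does not fix the unit), and that decayed results are re-ranked. The class and field names are hypothetical, and an in-memory dict stands in for the cache database.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def decayed_weight(w0: float, lam: float, dt: float) -> float:
    """W_t = W_0 * e^(-lambda * dt)."""
    return w0 * math.exp(-lam * dt)


@dataclass
class CachedResult:
    block_id: int
    score: float      # comprehensive score after verification (W_0)
    timestamp: float  # timestamp of the underlying data (hypothetical field)


class QueryCache:
    """Hypothetical in-memory stand-in for the cache database."""

    def __init__(self, lam: float) -> None:
        self.lam = lam
        self._store: Dict[str, List[CachedResult]] = {}

    def put(self, query: str, results: List[CachedResult]) -> None:
        self._store[query] = list(results)

    def get(self, query: str, now: float) -> Optional[List[Tuple[int, float]]]:
        """Return (block_id, W_t) pairs re-ranked by W_t, or None on a cache miss."""
        results = self._store.get(query)
        if results is None:
            return None
        decayed = [
            (r.block_id, decayed_weight(r.score, self.lam, max(0.0, now - r.timestamp)))
            for r in results
        ]
        return sorted(decayed, key=lambda pair: pair[1], reverse=True)


cache = QueryCache(lam=0.01)
cache.put("financial risks of executive resignation",
          [CachedResult(100, 0.79, timestamp=0.0), CachedResult(101, 0.6, timestamp=90.0)])
ranked = cache.get("financial risks of executive resignation", now=100.0)
```

With λ = 0.01 per time unit, the result scored 0.79 at t = 0 decays to about 0.29 by t = 100, falling below the newer result scored 0.6 at t = 90 (about 0.54), so the ranking flips.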
[0108] Step S7: Output the overall results.
[0109] This step is used to output the final results in the required format.
[0110] Based on the aforementioned steps, this solution can complete the blockchain hybrid index retrieval and result verification based on semantic potential field mapping and Merkle tree verification.
[0111] That is, after the user enters the query conditions, this solution will quickly locate the candidate block in the off-chain index based on the aforementioned steps; calculate the keyword or main semantic hash; call the on-chain Merkle Proof for verification; and return the sorted search results and verification path.
[0112] The blockchain hybrid index retrieval scheme provided in this invention can be implemented as a corresponding software program to form a blockchain hybrid index retrieval system. The program is stored in a corresponding storage medium, from which a processor retrieves and executes it; when running, it performs the aforementioned blockchain hybrid index retrieval method.
[0113] Referring to Figure 2, the blockchain hybrid index retrieval system 100 provided by the present invention mainly includes the following functional modules: a semantic potential field calculation module 110, a block number mapping module 120, an off-chain inverted index caching module 130, a retrieval module 140, and a verification module 150.
[0114] The semantic potential field calculation module 110 in this system is configured to introduce a semantic potential field model on top of the blockchain's Merkle tree structure and to convert semantic similarity into the corresponding potential energy values.
[0115] In this system, the block number mapping module 120 is configured to interact with the semantic potential field calculation module 110 to map potential energy values to block number indexes and complete semantic-driven block number retrieval.
[0116] The off-chain inverted index cache module 130 in this system is configured to cooperate logically with the block number mapping module 120 and the blockchain node, receiving the mapped block numbers and the original on-chain data and using them to maintain, off-chain, an inverted index table corresponding to the on-chain data.
[0117] The retrieval module 140 in this system is configured to interact with the semantic potential field calculation module 110, the block number mapping module 120, the off-chain inverted index caching module 130, and the verification module 150, in order to perform multi-dimensional attribute retrieval and verification in conjunction with the off-chain inverted index caching mechanism.
[0118] The verification module 150 in this system is configured to interact with the retrieval module 140 and to verify, via Merkle Proof, the consistency and completeness of the retrieval results it receives.
[0119] The following provides further explanation of the specific configuration schemes of each module in this system and the structures or equipment that may be involved.
[0120] In some implementations of this system, the semantic potential field calculation module 110 adds a layer of semantic mapping logic on top of the Merkle tree evidence-storage structure, which is otherwise used only for data integrity verification, and thereby forms the corresponding semantic potential field model.
[0121] The semantic potential field calculation module 110 treats each stored text as a field source, whose semantic vector determines its position in the semantic space. When a user submits a query, the vector of the query text enters the semantic space as a "probe". The semantic distance between the probe and each field source is calculated and converted into a potential value through a potential energy function (such as Φ_i = Σ_j Q_j / (d_i + ε)).
[0122] The semantic potential field calculation module 110 preferably runs on an off-chain server. The generated potential energy value is used as an intermediate result and temporarily stored in an off-chain cache or memory database for use by the block number mapping module.
[0123] Furthermore, the semantic potential field calculation module 110 thus formed, when running, first generates an embedding vector for each document and query; then, for the generated vectors, it uses cosine distance to measure vector similarity and calculates semantic distance; finally, it converts the obtained semantic distance into the corresponding potential energy value.
[0124] As an example, when running, the semantic potential field calculation module 110 takes as input the user's query (keywords or natural-language text) and a set of on-chain or off-chain text documents, Documents, and generates an embedding vector for each document and for the query: document vector v_i = Embed(Doc_i); query vector v_q = Embed(Query).
[0125] Based on this, the semantic potential field calculation module 110 performs semantic distance calculation on the generated embedding vectors. Cosine distance can be used to measure vector similarity, and this can be accomplished based on the following calculation model: Cosine distance: d(i, q) = 1 - (v_i·v_q) / (‖v_i‖ ‖v_q‖).
[0126] Next, the semantic potential field calculation module 110 converts the semantic distance into potential energy, which can be accomplished based on the following calculation model: Φ_i = Σ_{j=1}^{N_q} Q_j / (d(i, q) + ε).
[0127] Here, Q_j is the importance weight of the j-th keyword in the query, which can be calculated using TF-IDF or other attention scores; N_q is the number of keywords used in the search; and ε is a small positive constant that prevents division by zero.
[0128] Finally, all potential energy values Φ_i calculated by the semantic potential field calculation module 110 are normalized to the interval [0,1] to obtain Φ_i^{norm}, for example by min-max normalization: Φ_i^{norm} = (Φ_i - min_k Φ_k) / (max_k Φ_k - min_k Φ_k).
[0129] As a further example, the semantic potential field calculation module 110 takes as input the query "financial risks of executive resignation"; the document collection "Sudden Resignation of a Listed Company Executive Causes Market Panic", "Company Penalized by Regulators for Financial Fraud", and "Industry News: Sales of Smart Electric Vehicles Grow"; and the keyword weights [0.7, 0.3] (corresponding to "executive resignation" and "financial risk", respectively). The document collection is obtained from the on-chain document database, while the keyword weights are calculated from the query text by a separate NLP preprocessing service and passed to this module as input.
[0130] Based on this, the semantic potential field calculation module 110 generates vectors for the query and each document. Next, the cosine distance is calculated to measure vector similarity: Document 1 distance ≈ 0.2; Document 2 distance ≈ 0.3; Document 3 distance ≈ 0.9. The semantic distances are then converted into potential energy: Φ_1 ≈ 0.7 / (0.2 + ε) + 0.3 / (0.2 + ε) ≈ 5; Φ_2 ≈ 0.7 / (0.3 + ε) + 0.3 / (0.3 + ε) ≈ 3.33; Φ_3 ≈ 0.7 / (0.9 + ε) + 0.3 / (0.9 + ε) ≈ 1.11. Finally, all Φ_i are normalized to [0,1]: Φ_1^{norm} ≈ 1.0; Φ_2^{norm} ≈ (3.33 - 1.11) / (5 - 1.11) ≈ 0.57; Φ_3^{norm} ≈ 0.0. Output: [Φ_1^{norm}, Φ_2^{norm}, Φ_3^{norm}] = [1.0, 0.57, 0.0].
[0131] Note: Document 1 is the most relevant, and document 3 is the least relevant.
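The calculation in [0124] to [0130] can be reproduced with the sketch below. To stay self-contained it skips the embedding model: cosine_distance works on any pair of vectors, and the example feeds the rounded distances 0.2, 0.3, and 0.9 from [0130] directly. Min-max normalization and ε = 1e-6 are assumptions consistent with the worked example; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """d(i, q) = 1 - (v_i . v_q) / (||v_i|| ||v_q||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def potential(distance: float, keyword_weights: Sequence[float], eps: float = 1e-6) -> float:
    """Phi_i = sum_j Q_j / (d(i, q) + eps)."""
    return sum(q / (distance + eps) for q in keyword_weights)


def min_max_normalize(values: Sequence[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:  # all documents equally relevant; avoid division by zero
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


distances = [0.2, 0.3, 0.9]  # Documents 1 to 3 from the worked example
phi = [potential(d, [0.7, 0.3]) for d in distances]
phi_norm = min_max_normalize(phi)
print([round(p, 2) for p in phi], [round(p, 2) for p in phi_norm])  # [5.0, 3.33, 1.11] [1.0, 0.57, 0.0]
```

Because every keyword here shares the same document distance, Φ_i reduces to (Σ_j Q_j) / (d(i, q) + ε); the normalized result [1.0, 0.57, 0.0] is the input used by the block number mapping in [0138].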
[0132] In some implementations of this system, the block number mapping module 120 can be configured to receive the normalized potential energy value Φ_i^{norm} from the semantic potential field calculation module 110 and to convert it into a specific block number using the discrete mapping function BlockID_i = floor(α · Φ_i^{norm} · N_b).
[0133] The discrete mapping function run by the block number mapping module 120 introduces a smoothing coefficient α and uses the floor() function to round down. By adjusting α (from 0 to 1.2), the "clustering" of potential energy values onto block numbers can be controlled: the larger α is, the more the block numbers of high-potential (highly relevant) documents are pushed toward the high-priority range, and the more distinguishable they are; the smaller α is, the more uniform the mapping distribution, which avoids hot blocks caused by excessive semantic concentration.
[0134] Furthermore, this module runs off-chain, but the generated mapping relationships (such as {keyword, block_id, weight}) are assembled into transactions and submitted to the blockchain for notarization, ensuring the openness, transparency, and immutability of the mapping rules and results. The generated mapping results are also stored in an off-chain database for use by the subsequent inverted index building module.
[0135] Furthermore, at runtime the block number mapping module 120 obtains the semantic potential energy value Φ_i^{norm} calculated and normalized by the semantic potential field calculation module 110 and converts it into the corresponding block number BlockID_i through the following mapping model. BlockID_i is then converted into location information stored on the blockchain, facilitating rapid indexing and retrieval.
[0136] BlockID_i = floor(α · Φ_i^{norm} · N_b), where N_b is the total number of blocks on the chain; α is the smoothing coefficient (0 to 1.2), which controls the discrete balance of the mapping; and floor() rounds down to an integer block number.
[0137] For the BlockID_i generated by this mapping, the higher the potential energy value (the more relevant the document), the earlier the block number it maps to, or the closer it lies to the high-priority block, which facilitates fast retrieval through the off-chain inverted index. The mapping is also adjustable: the distribution density can be tuned with the smoothing coefficient α to prevent high-density potential energy from accumulating in a few blocks.
[0138] As a further example, continuing with the previous instance, let the total number of blocks on the chain be N_b = 1000 and the smoothing coefficient be α = 1.0. The semantic potential energy values calculated and normalized by the semantic potential field calculation module 110 are Φ^{norm} = [1.0, 0.57, 0.0].
[0139] The block number mapping module 120 performs the mapping calculation based on the mapping model and obtains the following results: Document 1: BlockID_1 = floor(1.0 × 1.0 × 1000) = 1000, capped at 999 (the maximum value limit); Document 2: BlockID_2 = floor(1.0 × 0.57 × 1000) = 570; Document 3: BlockID_3 = floor(1.0 × 0.0 × 1000) = 0.
[0140] The higher the potential energy (the more relevant), the earlier the mapped block number or the higher the priority region, which facilitates fast retrieval by the off-chain inverted index.
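The mapping BlockID_i = floor(α · Φ_i^{norm} · N_b) can be sketched as follows. Clamping to the range 0 to N_b - 1 reproduces the "maximum value limit" of [0139] and also covers α > 1, where the raw product can exceed N_b; treating block numbers as 0-based is an assumption, and map_to_block is a hypothetical name.

```python
import math


def map_to_block(phi_norm: float, n_blocks: int, alpha: float = 1.0) -> int:
    """BlockID_i = floor(alpha * Phi_i^norm * N_b), clamped to [0, N_b - 1]."""
    if not 0.0 <= alpha <= 1.2:
        raise ValueError("alpha is expected in the range 0 to 1.2")
    raw = math.floor(alpha * phi_norm * n_blocks)
    # Clamp: Phi = 1.0 (or alpha > 1) would otherwise exceed the last block number.
    return min(max(raw, 0), n_blocks - 1)


print([map_to_block(p, 1000) for p in [1.0, 0.57, 0.0]])  # [999, 570, 0]
```

Lowering α to 0.5 compresses the same inputs into block numbers 0 to 500, which illustrates how α controls the spread of the mapping.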
[0141] Based on this, the obtained BlockID_i can be used for the following fast indexing and retrieval. Fast location: the block containing the information is located directly by its block number, avoiding the cost of traversing the entire chain.
[0142] Multidimensional search support: Combined with the subsequent off-chain inverted index, it can perform joint searches based on block number, keywords, subject, and event type.
[0143] Adjustability: The smoothing coefficient α can be adjusted to change the distribution density, preventing high-density potential energy from accumulating in a few blocks.
[0144] In some implementations of this system, the verification module 150 is configured as a Merkle tree verification module, which operates as an off-chain verification service module. When performing Merkle tree verification, this module does not passively verify individual data entries, but actively provides query services to users, receiving a list of candidate data from the retrieval module. For each data entry in the list, it independently retrieves its Merkle Proof path from the blockchain and performs verification. This transforms the Merkle tree from a static structure into a dynamic trust endorsement engine.
[0145] Furthermore, when performing Merkle tree verification, the module first receives the data to be verified and its block number from the retrieval module. It then interacts with the blockchain node to obtain the proof path of the data in the corresponding block's Merkle tree (i.e., the list of sibling node hashes along the path from the leaf node to the root node). Next, starting from the hash of the target data, it concatenates the current hash with each sibling hash in the path in turn and computes the new hash according to the Merkle tree's calculation rules, iterating until the root hash is obtained. Finally, it compares this computed root hash with the Merkle Root read from the block header.
[0146] Specifically, the verification module 150 can complete Merkle Proof verification through the following processing flow: (1) Compute the hash value of the leaf node with the hash function H: h_i = H(Data_i). The hash function may be SHA-256 or similar.
[0147] Further obtain the parent node's hash value: h_p = H(h_left || h_right), where || represents hash value concatenation.
[0148] Applying this rule level by level yields the root node hash, for example, for a tree with four leaves: Root = h_root = H(H(h_0 ∥ h_1) ∥ H(h_2 ∥ h_3)). (2) The root hash recomputed from the given data hash h_i and its proof path is compared with the root hash recorded in the block header: Verify = (H_computed_root == H_block_root). If they match, verification succeeds; otherwise, the data has been tampered with.
[0149] Based on the above scheme, the verification module 150 of this system performs Merkle Proof verification on the candidate BlockID_i generated by the block number mapping module 120, using the information hash list in the block (on-chain evidence storage) and the off-chain retrieval results, to ensure that the off-chain retrieval results are consistent with the on-chain evidence and thereby verify that the data has not been tampered with.
[0150] As further explanation, the verification module 150 in this system completes the Merkle Proof verification process as follows:
(1) First, obtain the candidate data Data_i retrieved from the off-chain search.
(2) Next, calculate the hash h_i of the candidate data.
(3) Next, obtain the Merkle Proof path for the corresponding BlockID_i from the blockchain. As an example, this can be achieved by sending a request to a full node via the blockchain's client API. The request includes the block number BlockID_i of the data to be verified and its transaction index or location information within the block; the full node then reconstructs or reads the corresponding Merkle Proof path from its local block data and returns it.
(4) Next, using the path from step (3), calculate the hash layer by layer according to the Merkle tree rules up to the root node, H_computed_root. As an example, first initialize the current hash h_current = h_i. For each sibling hash h_sibling in the Merkle Proof path, if the current node is the left child (determined by the path direction), then h_new = Hash(h_current || h_sibling); if it is the right child, then h_new = Hash(h_sibling || h_current). Then update h_current to h_new and move to the next level until the whole path has been processed.
[0151] (5) Finally, compare the computed root H_computed_root with the Root Hash stored in the block header: if they match, the verification is successful; otherwise, the data has been tampered with.
[0152] As a further example, continuing with the previous instance, the verification module 150 completes the following data preparation before performing Merkle Proof verification: retrieve the candidate data Data_i from off-chain retrieval, "Announcement of Resignation of Company Executives"; calculate its hash, h_leaf = H("Announcement of Resignation of Company Executives") = "abc123..."; obtain the Merkle Proof path for the corresponding BlockID_i directly from the blockchain, [(h_s1, True), (h_s2, False), ...]; and obtain the block header Root Hash, "root789...".
[0153] Based on this, the verification module 150 performs Merkle Proof verification: it first calculates the leaf node hash h_leaf = "abc123..."; next, it concatenates the sibling node hashes in path order, e.g. h_sibling = "def456...", and calculates the parent node hash h_parent = H(h_leaf || h_sibling); it continues iterating until the root node H_root = H(...last layer...) is reached, and then checks whether H_root == Root_Hash_Stored (i.e., "root789...") stored in the block header. If they are equal, the verification succeeds; otherwise, it fails.
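The leaf-to-root verification of [0145] to [0153] can be sketched with SHA-256 from Python's hashlib. The sketch builds a small tree locally so that it has proofs to check; in the described system, the proof path and the root would instead come from a blockchain node and the block header. Assumptions: each proof step is a pair (sibling_hash, is_left), where is_left is True when the current node is the left child, and an odd node at any level is paired with a copy of itself (a convention some chains use, not one stated in this description). All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, current node is the left child)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(item: str) -> bytes:
    """h_i = H(Data_i)."""
    return h(item.encode("utf-8"))


def _pad(level: List[bytes]) -> List[bytes]:
    # Assumed convention: duplicate the last node when a level has odd length.
    return level + [level[-1]] if len(level) % 2 else level


def build_levels(items: List[str]) -> List[List[bytes]]:
    levels = [[leaf_hash(x) for x in items]]
    while len(levels[-1]) > 1:
        cur = _pad(levels[-1])
        # h_p = H(h_left || h_right)
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_root(items: List[str]) -> bytes:
    return build_levels(items)[-1][0]


def merkle_proof(items: List[str], index: int) -> Proof:
    proof: Proof = []
    for level in build_levels(items)[:-1]:
        level = _pad(level)
        is_left = index % 2 == 0
        proof.append((level[index + 1] if is_left else level[index - 1], is_left))
        index //= 2
    return proof


def verify(item: str, proof: Proof, block_root: bytes) -> bool:
    """Recompute the root along the proof path and compare it with the header root."""
    current = leaf_hash(item)
    for sibling, is_left in proof:
        current = h(current + sibling) if is_left else h(sibling + current)
    return current == block_root


docs = [
    "Announcement of Resignation of Company Executives",
    "Company Penalized by Regulators for Financial Fraud",
    "Industry News: Sales of Smart Electric Vehicles Grow",
]
root = merkle_root(docs)  # stands in for the Merkle Root in the block header
proof = merkle_proof(docs, 0)
print(verify(docs[0], proof, root))  # True
print(verify("Tampered announcement", proof, root))  # False
```

Changing even one character of the candidate data changes h_leaf and therefore the recomputed root, which is why a mismatch with the header root signals tampering.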
[0154] In some implementations of this system, the off-chain inverted index cache module 130 is specifically configured to build an off-chain inverted index cache, maintaining off-chain (e.g., in a local database or cache system) an inverted index table corresponding to the on-chain data, so that each keyword or risk subject corresponds to a list of potential block numbers, while the authenticity and integrity of the data are guaranteed by the Merkle tree verification mechanism.
[0155] In this solution, when the off-chain inverted index cache module 130 establishes the inverted index cache, it first periodically or in real time pulls incremental data containing {keyword, block_id, weight, timestamp} from the blockchain; then, in the local database, using keyword as the primary key, it organizes the parsed block_id, weight, timestamp and other information into an ordered list, which serves as the value of the primary key.
[0156] This scheme sorts by weight and timestamp during insertion, allowing subsequent retrieval operations to directly utilize this ordered structure for fast TOP-N queries without additional sorting calculations. This structure efficiently maps each keyword to a pre-sorted list of block numbers based on semantic relevance and timeliness, significantly improving the response speed of complex queries. Furthermore, the subsequent Merkle Proof verification mechanism ensures the consistency between the data retrieved from this structure and the on-chain data, achieving a balance between efficiency and reliability.
[0157] Furthermore, the off-chain inverted index caching module 130 will work closely with the block number mapping module 120, the retrieval module 140, and the verification module 150. First, it will obtain and parse the metadata required to build the index from the output of the block number mapping module 120 (the semantic mapping data that has been put on the chain) or directly from the data synchronized from the blockchain. Second, when the retrieval module 140 initiates a query, this module is responsible for quickly locating candidate blocks from the index. Finally, when the verification module 150 needs to verify a piece of data, this module needs to provide the location information of the data in the index (such as the block number) so that the verification module can accurately request Merkle Proof from the chain.
[0158] As shown in Figure 3, the off-chain inverted index caching module 130 of this scheme can be implemented through the cooperation of the on-chain data synchronization and parsing submodule 131, the off-chain inverted index table construction submodule 132, the parsing and extraction submodule 133, the retrieval and intersection calculation submodule 134, the extraction and verification submodule 135, the update submodule 136, and the output submodule 137.
[0159] Among them, the on-chain data synchronization and parsing submodule 131 is configured to obtain the latest semantic potential field mapping results from the blockchain, including keywords, corresponding block numbers, semantic weights, timestamps and other information.
[0160] Specifically, the on-chain data synchronization and parsing submodule 131 first reads the block data of the blockchain node, which can be done via RPC interface or smart contract, for example. Then, for the read block data, it parses the metadata in each block; extracts the {keyword, block_id, weight, timestamp} quadruple; filters out duplicates and keeps only the latest version; finally, it forms the latest semantic potential field mapping result including keywords, corresponding block numbers, semantic weights, timestamps, etc., which is the on-chain semantic mapping data list ChainData = [{keyword,block_id, weight, timestamp}, …].
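The "filter out duplicates and keep only the latest version" step of [0160] might look like the sketch below. Treating two records as versions of the same entry when they share the same (keyword, block_id) pair is an assumption, since the description does not define the duplicate key; latest_versions is a hypothetical name.

```python
from typing import Dict, List, Tuple


def latest_versions(records: List[dict]) -> List[dict]:
    """Keep, for each (keyword, block_id) pair, only the record with the newest timestamp."""
    latest: Dict[Tuple[str, int], dict] = {}
    for rec in records:
        key = (rec["keyword"], rec["block_id"])
        if key not in latest or rec["timestamp"] > latest[key]["timestamp"]:
            latest[key] = rec
    return list(latest.values())


chain_data = latest_versions([
    {"keyword": "financial risk", "block_id": 100, "weight": 0.80, "timestamp": 10},
    {"keyword": "financial risk", "block_id": 100, "weight": 0.85, "timestamp": 20},
    {"keyword": "executive resignation", "block_id": 100, "weight": 0.90, "timestamp": 15},
])
```

The result is the ChainData list handed to submodule 132: the older "financial risk" record (weight 0.80) is dropped in favour of the newer one (weight 0.85).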
[0161] The off-chain inverted index table construction submodule 132 is configured to interact with the on-chain data synchronization and parsing submodule 131. Based on the on-chain semantic mapping data list obtained by the on-chain data synchronization and parsing submodule 131, it can establish an inverted index mapping from keywords to block numbers in an off-chain database (such as SQLite, LevelDB, ElasticSearch, etc.).
[0162] Specifically, the off-chain inverted index table construction submodule 132 is configured to first obtain the on-chain semantic mapping data list ChainData from the results of the on-chain data synchronization and parsing submodule 131. Next, it iterates through all records in ChainData, using keywords as index keys; inserts the block number, weight, and timestamp into the corresponding inverted index table; sorts the records under the same keyword by weight and then by timestamp; and finally outputs the inverted index table structure.
[0163] The parsing and extraction submodule 133 is configured to parse natural-language queries and extract keywords: it converts the natural-language query entered by the user into a keyword vector that is matched against the index in the off-chain inverted index table generated by the off-chain inverted index table construction submodule 132.
[0164] Specifically, this parsing and extraction submodule 133 is configured to first obtain the user query, such as "company executive resignation financial risk 2025". Next, for the user query, NLP models (such as BERT or GPT embedding) are used for word segmentation and entity recognition, and keywords and weights (such as TF-IDF or attention weights) are extracted. Finally, an output keyword list and corresponding weights are generated, such as ["executive resignation", "financial risk", "2025"], and corresponding weights [0.5, 0.4, 0.1].
[0165] The retrieval and intersection calculation submodule 134 is configured to interact with the off-chain inverted index table construction submodule 132 and the parsing and extraction submodule 133 to perform off-chain inverted index retrieval and intersection calculation. It can search for the corresponding block number list in the off-chain inverted index table constructed by the off-chain inverted index table construction submodule 132 based on the keywords and corresponding weights in the keyword list extracted by the parsing and extraction submodule 133.
[0166] Specifically, the retrieval and intersection calculation submodule 134 is configured to first obtain the keyword list and corresponding weights extracted by the parsing and extraction submodule 133, as well as the off-chain inverted index table constructed by the off-chain inverted index table construction submodule 132. Next, for each keyword in the keyword list formed by the parsing and extraction submodule 133, a matching set of block numbers is searched in the inverted index table to form a candidate block number list; simultaneously, intersection or weighted union operations are performed on multiple keywords; and the results are weighted and fused according to semantic weights to obtain a comprehensive score. Finally, the candidate block number list and its comprehensive score are output.
[0167] The extraction and verification submodule 135 is configured to interact with the retrieval and intersection calculation submodule 134 to complete block content extraction and Merkle tree verification. This extraction and verification submodule 135 can perform Merkle tree verification on the candidate block number list obtained from the retrieval and intersection calculation submodule 134 to ensure that the retrieved block content has not been tampered with.
[0168] Specifically, the extraction and verification submodule 135 is configured to first obtain the candidate block number list and on-chain block header information (including the Merkle Root) obtained by the retrieval and intersection calculation submodule 134. Next, it obtains the target data in the block and its Merkle Proof path; and calculates the leaf hash accordingly, combining them layer by layer along the path until the root node; it then verifies whether the calculated root hash matches the Merkle Root recorded in the block header. Finally, it outputs the set of verified real data.
[0169] The update submodule 136 is configured to cache user query results, accelerate repeated searches, and update weights based on a time decay mechanism.
[0170] Specifically, the update submodule 136 is configured to first store the retrieval results (i.e., the set of real data verified by the extraction and verification submodule 135) in the cache database. The cache is then checked each time a new query is performed, and the weights of older data are adjusted according to the time decay formula W_t = W_0 × e^{-λ·Δt}, where λ is the decay coefficient and Δt is the time difference.
[0171] Output: Latest and reliable semantic search results.
[0172] The output submodule 137 is configured to output the combined results in the required final format.
[0173] The output of this submodule is the final data after the weight update performed by the update submodule 136, including the latest ranking and verification status of the search result set.
[0174] As a further example, continuing the previous example, for the query "financial risks of executive resignation", after the parsing and extraction submodule 133 extracts the keywords and weights, the retrieval and intersection calculation submodule 134 searches the index constructed by the off-chain inverted index table construction submodule 132. For example, the index entry "executive resignation" may yield [{BlockID: 100, Weight: 0.9}, {BlockID: 101, Weight: 0.8}], and the index entry "financial risk" may yield [{BlockID: 100, Weight: 0.85}, {BlockID: 102, Weight: 0.7}]. After the weighted union calculation, BlockID 100 scores 0.5 × 0.9 + 0.4 × 0.85 = 0.79, BlockID 101 scores 0.5 × 0.8 = 0.4, and BlockID 102 scores 0.4 × 0.7 = 0.28, giving the sorted candidate list [100, 101, 102]. The extraction and verification submodule 135 then initiates Merkle Proof verification on the data for BlockIDs 100 and 101 (the number of candidates verified depends on requirements). After successful verification, the update submodule 136 adjusts the final scores according to time decay, and the output submodule 137 returns the result. This process demonstrates the combination of efficient off-chain index retrieval and trusted on-chain verification.
[0175] In some implementations of this system, the retrieval module 140 is configured as the front-end module of the system, which can be used to interact with the user. The retrieval module 140 can obtain the user's input query conditions, and can quickly locate candidate blocks in the inverted index table built in the off-chain inverted index cache module 130 according to the query conditions; calculate keyword or main semantic hash; call the on-chain Merkle tree for verification; and return the sorted retrieval results and verification path.
[0176] As an example, after receiving a user query, the retrieval module 140 first calls the parsing and extraction submodule 133 to process the query; then, it passes the extracted keywords and weights to the retrieval and intersection calculation submodule 134 to perform a retrieval in the off-chain inverted index and obtain a sorted candidate list; next, it sends the candidate data (or the Top-K) in this list to the extraction and verification submodule 135 for Merkle Proof verification; finally, it encapsulates the verified data, along with its comprehensive score, Merkle Proof path, and block header information, into a structured response (such as JSON) and returns it to the user.
[0177] As can be seen from the above, the present invention combines the blockchain Merkle tree structure with a semantic potential field mapping mechanism to form an efficient blockchain hybrid index retrieval scheme. Compared with existing technology, the present invention first performs semantic potential field mapping: the potential energy function quantifies the semantic attraction between the query vector and each document vector, and the resulting values are normalized so that they can be compared. On this basis, the semantic metrics are mapped to block number indexes, and the mapping is bound through on-chain evidence storage, forming a semantically indexable and verifiable mapping mechanism. Finally, off-chain inverted index caching and on-chain verification close the loop: efficient indexing and weighted retrieval are performed off-chain, while Merkle Proof verification is provided on-chain, achieving both speed (off-chain) and authenticity (on-chain). Based on the blockchain hybrid index retrieval scheme described above, this embodiment of the invention also provides a computer-readable storage medium storing a program which, when executed by a processor, implements the steps of the blockchain hybrid index retrieval method described above.
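The semantic potential field step (embedding, cosine distance, conversion to potential energy, normalization, mapping to block numbers) can be sketched as follows. The potential energy function is not given in this section, so the exponential form `exp(-d / sigma)` and the sum-to-one normalization are assumptions, as are the toy vectors and the names `potential_energy` and `map_to_blocks`. Ranking block IDs by normalized potential is one plausible reading of "mapping the potential energy value to the block number index".

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def potential_energy(distance: float, sigma: float = 0.5) -> float:
    """ASSUMED potential function: attraction decays exponentially with distance."""
    return math.exp(-distance / sigma)


def map_to_blocks(
    query_vec: Sequence[float],
    block_vecs: Dict[int, Sequence[float]],
    sigma: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (block_id, normalized potential) pairs, strongest attraction first."""
    raw = {bid: potential_energy(cosine_distance(query_vec, vec), sigma)
           for bid, vec in block_vecs.items()}
    total = sum(raw.values())
    normalized = {bid: u / total for bid, u in raw.items()}  # values sum to 1
    return sorted(normalized.items(), key=lambda item: (-item[1], item[0]))


# Toy 3-dimensional vectors; real embeddings would come from a text model.
query = [1.0, 0.0, 0.5]
blocks = {100: [0.9, 0.1, 0.5], 101: [0.2, 1.0, 0.0], 102: [-1.0, 0.0, 0.0]}
ranking = map_to_blocks(query, blocks)
print([bid for bid, _ in ranking])  # [100, 101, 102]
```

Normalizing by the total makes potentials from different queries comparable on the same 0 to 1 scale; `sigma` controls how sharply attraction falls off with semantic distance.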
[0178] This invention also provides a processor for running a program, wherein the program executes the steps of the above-described blockchain hybrid index retrieval method during runtime.
[0179] This invention also provides a terminal device, which includes a processor, a memory, and a program stored in the memory and executable on the processor. The program code is loaded and executed by the processor to implement the steps of the above-described blockchain hybrid index retrieval method.
[0180] The present invention also provides a computer program product, which, when executed on a data processing device, is adapted to perform the steps of the above-described blockchain hybrid index retrieval method.
[0181] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0182] Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, devices, and modules described above; they are not repeated here.
[0183] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0184] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0185] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0186] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, causing a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0187] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
[0188] Memory may take the form of non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
[0189] Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
[0190] It should also be noted that the terms "comprise," "include," and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.
[0192] The method of the present invention described above, or specific system units or parts thereof, may be implemented as a pure software architecture. It can be deployed as program code on physical media, such as hard disks, optical discs, or any computer-readable storage medium, including one in an electronic device such as a smartphone. When a machine (for example, a smartphone) loads and executes the program code, the machine becomes a device for implementing the present invention. The method and device of the present invention can also be transmitted in program code form via transmission media, such as cables, optical fibers, or any other transmission method. When the program code is received, loaded, and executed by a machine (e.g., a smartphone), the machine becomes a device for implementing the present invention.
[0193] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of this invention is defined by the appended claims and their equivalents.
Claims
1. A blockchain hybrid index retrieval method, characterized in that the retrieval method first introduces a semantic potential field model based on the Merkle tree structure of the blockchain and converts the semantic similarity into a corresponding potential energy value; then maps the potential energy value to a block number index to complete semantic-driven block number retrieval; and then, in combination with an off-chain inverted index caching mechanism, performs retrieval and verification of multi-dimensional attributes.
2. The blockchain hybrid index retrieval method of claim 1, wherein, when calculating the potential energy value based on the semantic potential field model, the retrieval method includes: (1) generating an embedding vector for each document and query; (2) measuring vector similarity using cosine distance and calculating the semantic distance between the generated vectors; (3) converting the obtained semantic distance into a corresponding potential energy value.
3. The blockchain hybrid index retrieval method of claim 1, wherein the retrieval method establishes an off-chain inverted index cache in which each keyword or risk subject corresponds to a list of potential block numbers.
4. The blockchain hybrid index retrieval method of claim 1, wherein the retrieval method includes a Merkle tree verification step.
5. The blockchain hybrid index retrieval method of claim 1, wherein, when retrieving and verifying results, the retrieval method quickly locates candidate blocks in the off-chain index, calculates the keyword or subject semantic hash, calls the on-chain Merkle tree verification, and returns the sorted retrieval results and verification path.
6. A blockchain hybrid index retrieval system, characterized in that the retrieval system includes a semantic potential field calculation module, a block number mapping module, an off-chain inverted index caching module, and a retrieval and result verification module, wherein: the semantic potential field calculation module is configured to introduce a semantic potential field model based on the Merkle tree structure of the blockchain and to convert the semantic similarity into a corresponding potential energy value; the block number mapping module is configured to interact with the semantic potential field calculation module, to map the potential energy value to the block number index, and to complete semantic-driven block number retrieval; the off-chain inverted index caching module is configured to maintain, off-chain, an inverted index table corresponding to the on-chain data and to verify it through the Merkle tree verification mechanism; and the retrieval and result verification module is configured to interact with the off-chain inverted index caching module and to perform retrieval and verification of multi-dimensional attributes in combination with the off-chain inverted index caching mechanism.
7. The blockchain hybrid index retrieval system of claim 6, wherein the semantic potential field calculation module is configured to first generate an embedding vector for each document and query; then measure vector similarity using cosine distance and calculate the semantic distance between the generated vectors; and finally convert the obtained semantic distance into a corresponding potential energy value.
8. The blockchain hybrid index retrieval system of claim 6, wherein the retrieval and result verification module is configured to quickly locate candidate blocks in the off-chain index, calculate the keyword or subject semantic hash, call the on-chain Merkle tree verification, and return the sorted retrieval results and verification path.
9. A computer-readable storage medium having a program stored thereon, characterized in that, when the program is executed by a processor, the steps of the blockchain hybrid index retrieval method of any one of claims 1-5 are implemented.
10. A computer program product, characterized in that, when executed on a data processing device, it is adapted to perform the steps of the blockchain hybrid index retrieval method of any one of claims 1-5.