Legal document retrieval method and system based on multi-granularity index and hierarchical sorting
By employing multi-granularity indexing and hierarchical sorting, the original query is rewritten into multiple sub-queries. Combining dense and sparse indexes with multi-dimensional business characteristics addresses the retrieval accuracy and semantic adaptability problems of existing technologies, yielding more accurate and traceable legal document retrieval results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING MEGA INTELLIGENT TECHNOLOGY CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies for legal document retrieval suffer from problems such as coarse granularity, lack of hybrid indexing mechanisms, insufficient query comprehension capabilities, inconsistent sorting granularity, lack of preprocessing and compression for long documents, limitations of single-path recall mechanisms, lack of business feature integration, and fixed paragraph boundaries, resulting in insufficient retrieval accuracy and semantic adaptability.
By employing a multi-granularity indexing and hierarchical sorting approach, the original query statement is rewritten into multiple subqueries. A hybrid indexing mechanism combining dense and sparse indexes is used, along with multi-dimensional business characteristics and a legal relevance analysis model, to determine the relevance scores of candidate key statements, thereby achieving cross-granularity reordering and result optimization.
It significantly improves the accuracy, robustness, and legal interpretability of legal document retrieval, ensures the focus and traceability of results, enhances the adaptability to differences in expression, and improves the accuracy of case matching and legal provision association.
Figure CN122309706A_ABST
Description
Technical Field
[0001] This invention relates to the field of legal artificial intelligence technology, and in particular to a legal document retrieval method and system based on multi-granularity indexing and hierarchical sorting.
Background Technology
[0002] Currently, the relevant technologies have proposed the following technical means for legal document retrieval: (I) A Legal Document Intelligent Retrieval System Based on Vector Technology: First, the document vector generation module converts the documents into vector form. Then, the index building module builds an index based on these vectors. Simultaneously, the user-input query and text are preprocessed. Next, the query vector generation module converts the preprocessed query into a query vector. Then, the system inputs this query vector and the document vector into a similarity calculation module to calculate their similarity. Finally, the result optimization module optimizes the similarity calculation results, thus completing the entire information retrieval or text matching process. This technology has the following problems: coarse granularity, omission of key information; lack of a hybrid indexing mechanism; insufficient query understanding ability; inconsistent sorting granularity, etc.
[0003] (II) Intelligent Search Method for Legislative Materials Based on Vector Retrieval: First, the process is initiated based on the original legal foundation data and a finely tuned version of the vector model (which includes a vector embedding model) from the large language model. Then, the original legal foundation data is processed using the large language model to generate retrieval application text data. Next, the finely tuned vector embedding model is used to convert the retrieval application text data into retrieval application vector data, which is then stored in the legislative material vector library. Finally, the finely tuned vector embedding model is used again to convert the user's search request into a search request vector. The vector retrieval application matches the search request vector with the legislative material vector library, and then retrieves relevant original legal foundation data based on the matching results. This technique has the following problems: lack of preprocessing and compression for long documents; limitations of the single-path retrieval mechanism; and lack of business feature fusion.
[0004] (III) Paragraph Aggregation Retrieval Model (PARM): For long document retrieval, this technology proposes a paragraph-level retrieval scheme. For document-to-document retrieval tasks (such as case retrieval and prior art retrieval), long documents are divided into paragraphs. A dense paragraph retrieval (DPR) model is used to encode the query and paragraphs respectively. Relevant paragraphs are recalled through vector similarity. Finally, strategies such as reciprocal ranking-based fusion (RRF) or vector-based aggregation and reciprocal ranking fusion weighted (VRRF) are used to aggregate the paragraph-level results into document-level ranking results. This technology has the following problems: paragraph boundaries are fixed, and legal semantic integrity is not considered; query rewriting and expansion are lacking; the aggregation strategy is simple and lacks deep semantic re-ranking.
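For reference, reciprocal rank fusion as it is commonly described gives each item a score of 1 / (k + rank) from every ranked list it appears in and sums those scores. The sketch below is a minimal illustration of that general idea, not PARM's own implementation; the constant k = 60 is a commonly used default assumed here, and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical paragraph-level runs, already mapped to parent documents.
runs = [["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_a"], ["doc_b", "doc_d"]]
fused = reciprocal_rank_fusion(runs)
```

Because the fusion looks only at rank positions, it never compares the query text with the document text directly, which is the limitation the paragraph above attributes to shallow aggregation.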
[0005] In summary, existing technologies still have significant shortcomings in terms of accuracy, semantic adaptability, and business practicality in legal document retrieval, leaving considerable room for optimization.
Summary of the Invention
[0006] In view of this, the purpose of this invention is to provide a legal document retrieval method and system based on multi-granularity indexing and hierarchical sorting, which significantly improves the accuracy, robustness and legal interpretability of legal document retrieval.
[0007] In a first aspect, the present invention provides a legal document retrieval method based on multi-granularity indexing and hierarchical sorting, comprising:
receiving the original query statement uploaded by the user;
rewriting the original query statement into multiple subquery statements;
based on a pre-built multi-indexed legal document database, determining the target recall results of each subquery statement relative to multiple index categories, and aggregating the target recall results of all subquery statements relative to multiple index categories into a candidate key statement set, wherein the multi-indexed legal document database stores statement indexes of multiple index categories and their associated original key statements, and the original key statements are obtained by extracting key statements from the document summary of the legal document;
using a pre-trained legal relevance analysis model combined with multi-dimensional business characteristics, determining the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set, wherein the target relevance score characterizes the degree of relevance between the document summary associated with the candidate key statement and the original query statement; and
determining the query results corresponding to the original query statement based on the target relevance score, wherein the query results include at least the target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
[0008] In one implementation, rewriting the original query statement into multiple subquery statements includes: identifying the legal elements and the question type tag in the original query statement, where the question type tags include at least case retrieval, legal provision interpretation, and legal opinion query; expanding the legal elements with synonyms and/or decomposing them into constituent elements; and substituting the resulting legal elements into the legal retrieval paradigm corresponding to the question type tag to obtain multiple subquery statements.
[0009] In one implementation, before determining the target recall results of each subquery statement relative to multiple index categories based on the pre-built multi-indexed legal document database, the method further includes: obtaining the document summary corresponding to the legal document; extracting original key statements from the document summary; calling the index building model for each of the multiple index categories and using it to build statement indexes of the original key statements for those categories, where the index categories include at least a dense index category and a sparse index category; and constructing a dense index legal document database from the statement indexes of the dense index category and their associated original key statements, and a sparse index legal document database from the statement indexes of the sparse index category and their associated original key statements.
[0010] In one implementation, determining the target recall results of each subquery statement relative to multiple index categories based on the pre-built multi-indexed legal document database includes performing the following operations for each subquery statement: encoding the subquery statement into a query vector, and determining the first recall result based on the semantic similarity between the query vector and the dense indexes stored in the dense index legal document database; extracting the terms to be retrieved from the subquery statement, and retrieving, from the sparse index stored in the sparse index legal document database, the original key statements containing those terms as the second recall result; and merging the first and second recall results to obtain the target recall result corresponding to the subquery statement.
[0011] In one implementation, determining, with a pre-trained legal relevance analysis model combined with multi-dimensional business characteristics, the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set includes: taking the original query statement and the document summary associated with each candidate key statement in the candidate key statement set as input to the pre-trained legal relevance analysis model; performing, through the legal relevance analysis model, cross-attention processing on the original query statement and the document summary associated with each candidate key statement to obtain an initial relevance score between them; and dynamically adjusting the initial relevance score using the multi-dimensional business characteristics to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0012] In one implementation, the multi-dimensional business features include case authority weight, timeliness weight, and regional relevance weight; the initial relevance score is dynamically adjusted using these multi-dimensional business features to obtain a target relevance score between the original query statement and the document summary associated with each candidate key statement, including: Based on the case authority weight, timeliness weight, and regional relevance weight, the initial relevance score is dynamically adjusted by combining the preset fusion coefficient to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0013] In one implementation, after determining the query results corresponding to the original query statement based on the target relevance score, the method further includes: The query results are sent to a specified associated terminal to display the document summary contained in the query results on the graphical user interface of the specified associated terminal, and the position of the target key statement in the document summary is rendered with a specified effect.
[0014] In a second aspect, the present invention also provides a legal document retrieval system based on multi-granularity indexing and hierarchical sorting, comprising:
a statement receiving module, used to receive the original query statement uploaded by the user;
a statement rewriting module, used to rewrite the original query statement into multiple subquery statements;
a candidate statement determination module, used to determine the target recall results of each subquery statement relative to multiple index categories based on a pre-built multi-indexed legal document database, and to aggregate the target recall results of all subquery statements relative to multiple index categories into a candidate key statement set, wherein the multi-indexed legal document database stores statement indexes of multiple index categories and their associated original key statements, and the original key statements are obtained by extracting key statements from the document summary of the legal document;
a relevance analysis module, used to determine, with a pre-trained legal relevance analysis model combined with multi-dimensional business characteristics, the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set, wherein the target relevance score characterizes the degree of relevance between the document summary associated with the candidate key statement and the original query statement; and
a result determination module, used to determine the query results corresponding to the original query statement based on the target relevance score, wherein the query results include at least the target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
[0015] In a third aspect, the present invention also provides an electronic device including a processor and a memory, the memory storing computer-executable instructions executable by the processor, and the processor executing the computer-executable instructions to implement any of the methods provided in the first aspect.
[0016] In a fourth aspect, the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement any of the methods provided in the first aspect.
[0017] This invention provides a legal document retrieval method and system based on multi-granularity indexing and hierarchical sorting. First, it receives the original query statement uploaded by the user and rewrites it into multiple sub-queries. Then, based on a pre-built multi-index legal document database, it determines the target recall results of each sub-query statement relative to multiple index categories, and aggregates all the target recall results of all sub-queries relative to multiple index categories into a candidate key statement set. The multi-index legal document database stores statement indexes of multiple index categories and their associated original key statements, which are obtained by extracting key statements from the document summaries of legal documents. Next, using a pre-trained legal relevance analysis model combined with multi-dimensional business characteristics, it determines the target relevance score between the original query statement and the document summaries associated with each candidate key statement in the candidate key statement set. The target relevance score characterizes the degree of relevance between the document summaries associated with the candidate key statements and the original query statement. Finally, based on the target relevance score, it determines the query results corresponding to the original query statement. The query results include at least the target key statement, the document summaries associated with the target key statement, and the legal document to which the target key statement belongs. The above method rewrites the original query statement into multiple subqueries, covering the multiple legal intentions of the user's query, improving adaptability to differences in expression, and enhancing accuracy and robustness. 
It uses subqueries as conditions to filter candidate key statement sets from a multi-indexed legal document database, avoiding noise from full-text matching and making the results more focused and locatable. Finally, a relevance analysis model integrates multi-dimensional business characteristics, calculates the target relevance score between the original query statement and the document summaries associated with the candidate key statements, and determines the final query results. This ensures traceable results and verifiable evidence, significantly improving the accuracy, robustness, and legal interpretability of case matching and legal provision association.
[0018] Other features and advantages of the invention will be set forth in the following description, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention are realized and obtained through the structures particularly pointed out in the description and the drawings.
[0019] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.
Description of the Drawings
[0020] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0021] Figure 1 is a flowchart of a legal document retrieval method based on multi-granularity indexing and hierarchical sorting provided in an embodiment of the present invention;
Figure 2 is a technical framework diagram of a legal document retrieval method based on multi-granularity indexing and hierarchical sorting provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the structure of a legal document retrieval system based on multi-granularity indexing and hierarchical sorting provided in an embodiment of the present invention;
Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below in conjunction with the embodiments. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0023] Currently, the relevant technologies have proposed the following technical means for legal document retrieval: (I) A legal document intelligent retrieval system based on vector technology has the following drawbacks: (1.1) Coarse granularity, key information easily overlooked: Although this technology introduces semantic role labeling and structural division, the retrieval and indexing units are still the entire document or fixed-length paragraphs. However, legal documents are long (thousands to tens of thousands of words), and core legal elements (such as points of contention and judgments) are scattered throughout the text. Vectorization and retrieval based on the entire document or fixed paragraphs are easily diluted by redundant content, leading to the omission of key information and a decrease in retrieval accuracy. This invention solves this problem through key sentence extraction and independent indexing mechanisms.
[0024] (1.2) Lack of a hybrid indexing mechanism: This technology relies solely on dense vector indexes constructed from word vectors, without incorporating sparse indexing methods such as inverted indexes and BM25. In legal retrieval scenarios, users often need to perform precise matching of legal provision numbers, legal terms, and party names. Pure dense indexes have limitations in terms of literal consistency and terminological strictness, easily leading to misjudgments of semantically similar but legally different terms. This invention addresses this problem through a hybrid indexing mechanism that combines dense and sparse indexes.
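To make the role of sparse scoring concrete, the following is a minimal sketch of one common variant of Okapi BM25 over pre-tokenized key sentences. The parameter values k1 = 1.5 and b = 0.75 are typical defaults assumed here, and the sentences and the "article 28" query are hypothetical; the source names BM25 but does not specify a formula or parameters.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # Terms absent from the document contribute nothing.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical, pre-tokenized key sentences.
docs = [
    ["article", "28", "shareholder", "capital", "contribution"],
    ["article", "20", "shareholder", "abuse", "of", "rights"],
    ["contract", "breach", "damages"],
]
scores = bm25_scores(["article", "28"], docs)
```

Only the first sentence contains the exact provision number, so it scores highest, while a sentence citing a different article scores lower even though the two may be close in embedding space.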
[0026] (1.4) Inconsistent ranking granularity: This technology calculates similarity at the substructure level, such as title and clause, during the recall stage, but lacks a cross-granularity collaborative reordering mechanism in the final ranking stage, making it difficult to solve the problem of local matching but overall irrelevance. This invention solves this problem through a cross-granularity reordering mechanism that combines sentence-level recall with summary-level ranking.
[0027] (II) The intelligent search method for legislative materials based on vector retrieval has the following drawbacks: (2.1) Lack of preprocessing compression for long documents: Although this technology uses a large model to process and uniformly vectorize documents, the entire document contains multiple legal elements, and direct overall vectorization can easily lead to the dilution or omission of key information. This invention solves this problem through a preprocessing mechanism that combines summary compression and key sentence extraction.
[0028] (2.2) Limitations of the single-path recall mechanism: This technology relies solely on a single vector matching path for recall, without integrating other retrieval methods such as sparse indexes. In the legal field, a single path is insufficient to simultaneously address the requirements of semantic generalization and precise term matching. This invention addresses this issue by employing a mechanism that combines dense and sparse indexes in synergy with parallel recall of multiple subqueries.
[0029] (2.3) Lack of business feature integration: The re-ranking of this technology is based solely on the semantic relevance score output by the vector re-ranking model, without incorporating business features of concern in legal practice, such as court level, trial procedure, and judgment date, making it difficult to dynamically adapt to actual application needs. This invention addresses this problem by introducing a mechanism that involves multi-dimensional business features in fine-tuning the ranking.
[0030] (III) The Paragraph Aggregation Retrieval Model (PARM) has the following drawbacks: (3.1) Fixed paragraph boundaries that fail to consider the completeness of legal semantics: The PARM model uses fixed lengths or rules to divide paragraphs, which does not adapt to the inherent logical structure of legal documents (such as the focus of the dispute, the determination of facts, the reasoning of the judgment, and the judgment result). Mechanical segmentation easily breaks the complete semantic chain of legal argumentation. This invention ensures that each key sentence carries a complete legal proposition by combining summary compression and key sentence extraction, thus maintaining the completeness of legal semantics.
[0031] (3.2) Lack of query rewriting expansion: PARM directly uses the original query for paragraph retrieval without semantic expansion or rewriting the query, making it difficult to address the inconsistency between user expressions and legal document terminology. This invention solves this problem by generating multiple subqueries through query rewriting.
[0032] (3.3) Simple aggregation strategy, lacking deep semantic reordering: PARM adopts shallow aggregation strategies based on ranking or vector similarity, such as RRF or VRRF, and does not introduce a reordering model that supports deep interaction between queries and documents, thus limiting the final ranking accuracy. This invention achieves refined semantic reordering through a reordering model based on deep interaction.
[0033] Based on this, the present invention provides a legal document retrieval method and system based on multi-granularity indexing and hierarchical sorting, which significantly improves the accuracy, robustness and legal interpretability of legal document retrieval.
[0034] To facilitate understanding of this embodiment, the legal document retrieval method based on multi-granularity indexing and hierarchical sorting disclosed in this embodiment of the invention is first described in detail. Referring to the flowchart of the method shown in Figure 1, the method mainly includes the following steps S102 to S110:
Step S102: Receive the original query statement uploaded by the user.
[0035] This includes receiving the user's original query statement, which is described in natural language, such as "the legal liability of company shareholders for failing to fulfill their capital contribution obligations".
[0036] Step S104: Rewrite the original query statement into multiple subquery statements.
[0037] In one example, the original query can be rewritten according to a rewriting strategy, which may include synonym expansion, element decomposition, and question type conversion. Synonym expansion uses a legal domain dictionary or pre-trained language model to replace core legal terms in the original query with their professional synonyms or near-synonyms, covering semantically equivalent queries under different expression habits. Element decomposition breaks down the complex original query into several independent, searchable legal constituent clauses based on the legal logical structure, achieving a multi-dimensional explicit expression of intent. Question type conversion transforms the user's original question into different legal retrieval paradigms (i.e., commonly used keyword combinations or propositional expressions in legal retrieval) based on the semantic focus of the user's original question.
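The rewriting strategy above can be sketched as follows. This is a minimal illustration under stated assumptions: the synonym dictionary, the paradigm templates, and the function name `rewrite_query` are all hypothetical, and the legal elements and question type tag are assumed to have been identified upstream (for example, by a language model).

```python
# Hypothetical legal-domain synonym dictionary (illustrative only).
SYNONYMS = {
    "capital contribution obligation": ["obligation to pay subscribed capital"],
}

# Hypothetical retrieval paradigms (templates) keyed by question type tag.
PARADIGMS = {
    "case retrieval": "cases involving {element}",
    "legal provision interpretation": "statutory provisions on {element}",
    "legal opinion query": "legal opinions regarding {element}",
}

def rewrite_query(elements, question_type):
    """Expand legal elements with synonyms, then fill the paradigm template."""
    expanded = []
    for element in elements:
        expanded.append(element)
        expanded.extend(SYNONYMS.get(element, []))
    template = PARADIGMS[question_type]
    # Preserve order while dropping duplicate subqueries.
    seen, subqueries = set(), []
    for element in expanded:
        query = template.format(element=element)
        if query not in seen:
            seen.add(query)
            subqueries.append(query)
    return subqueries

# Elements decomposed from the example query about shareholder liability.
subqueries = rewrite_query(
    ["shareholder liability", "capital contribution obligation"],
    "legal provision interpretation",
)
```

Here element decomposition is represented by passing the elements as a list, so each element yields its own subquery and each synonym adds one more.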
[0038] Step S106: Based on the pre-built multi-index legal document database, determine the target recall results of each subquery statement relative to multiple index categories, and aggregate the target recall results of all subquery statements relative to multiple index categories into a candidate key statement set.
[0039] The multi-indexed legal document database stores statement indexes of multiple index categories and their associated original key statements. The index categories include at least dense index categories and sparse index categories. The original key statements are obtained by extracting key statements from the document summary of the legal document. The target recall result is the retrieval result of the subquery statement relative to a certain index category (i.e., candidate key statements). The candidate key statement set is the set of candidate key statements obtained by aggregating the target recall results of all subquery statements relative to multiple index categories.
[0040] In one implementation, for each subquery statement, a search is performed on the Dense index and the Sparse index respectively to obtain Dense results (denoted as the first recall result) and Sparse results (denoted as the second recall result). The Dense results and Sparse results of the same subquery statement are fused to obtain the target recall result corresponding to the subquery statement. Then, the target recall results corresponding to all subquery statements are aggregated to obtain a set of candidate key statements.
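The per-subquery recall and aggregation described above might look like the sketch below. It is illustrative only: two-dimensional vectors stand in for real embeddings, the sparse side is a plain inverted index, and fusion is a simple order-preserving union, which is an assumption because the source does not specify the fusion rule. All identifiers are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dense_recall(query_vec, dense_index, top_k=2):
    """First recall result: top_k key sentence IDs by cosine similarity."""
    ranked = sorted(dense_index, key=lambda sid: -cosine(query_vec, dense_index[sid]))
    return ranked[:top_k]

def sparse_recall(terms, inverted_index):
    """Second recall result: key sentence IDs containing any query term."""
    hits = set()
    for term in terms:
        hits |= inverted_index.get(term, set())
    return sorted(hits)

def recall_candidates(subqueries, dense_index, inverted_index):
    """Fuse dense and sparse hits per subquery, then aggregate all subqueries."""
    candidates = []
    for query_vec, terms in subqueries:
        merged = dense_recall(query_vec, dense_index) + sparse_recall(terms, inverted_index)
        for sid in merged:
            if sid not in candidates:  # Deduplicate while keeping order.
                candidates.append(sid)
    return candidates

# Toy indexes over three key sentences.
dense_index = {"s1": [1.0, 0.0], "s2": [0.8, 0.6], "s3": [0.0, 1.0]}
inverted_index = {"article 28": {"s3"}, "shareholder": {"s1", "s2"}}
# Each subquery is a (query vector, extracted terms) pair.
subqueries = [([1.0, 0.1], ["article 28"])]
candidates = recall_candidates(subqueries, dense_index, inverted_index)
```

In this toy run the dense path returns the two semantically closest sentences and the sparse path adds the one sentence that contains the exact term, so the candidate set covers both kinds of match.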
[0041] Step S108: Using a pre-trained legal relevance analysis model and combining multi-dimensional business characteristics, determine the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set.
[0042] The multi-dimensional business features include case authority weight, timeliness weight, and regional relevance weight. The target relevance score characterizes the degree of relevance between the document summary associated with a candidate key statement and the original query statement. In one implementation, the legal relevance analysis model is first used to determine the initial relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set. The initial relevance score is then dynamically adjusted using the multi-dimensional business features (case authority weight, timeliness weight, and regional relevance weight) to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set.
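One way to picture the adjustment step is a linear blend of the model's initial score with the three business features. The sketch below assumes that form; the source mentions preset fusion coefficients but gives neither a formula nor values, so the coefficients, the [0, 1] feature ranges, and all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BusinessFeatures:
    authority: float   # case authority weight, assumed to lie in [0, 1]
    timeliness: float  # timeliness weight, assumed to lie in [0, 1]
    region: float      # regional relevance weight, assumed to lie in [0, 1]

# Hypothetical preset fusion coefficients (not given in the source).
COEFFS = {"model": 0.7, "authority": 0.1, "timeliness": 0.1, "region": 0.1}

def target_score(initial_score, feats, coeffs=COEFFS):
    """Blend the initial relevance score with the business features."""
    return (
        coeffs["model"] * initial_score
        + coeffs["authority"] * feats.authority
        + coeffs["timeliness"] * feats.timeliness
        + coeffs["region"] * feats.region
    )

# Two summaries with the same initial model score but different features.
older_high_court = target_score(0.80, BusinessFeatures(1.0, 0.2, 0.5))
recent_local_court = target_score(0.80, BusinessFeatures(0.4, 0.9, 1.0))
```

With these assumed coefficients, two summaries that the model scores identically are separated by their business features, which is the behavior the adjustment step is meant to provide.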
[0043] Step S110: Determine the query results corresponding to the original query statement based on the target relevance score. The query results shall include at least the target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
[0044] In one example, multiple candidate key statements can be selected as target key statements (that is, the original key statements relevant to the original query statement) in descending order of target relevance score. The query results are then generated according to the preset mapping relationship between the original key statements, document summaries, and legal documents.
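The selection and trace-back step can be sketched as below. The mapping tables, the `top_k` cutoff, and the result field names are assumptions for illustration; the source only states that a preset mapping between key statements, summaries, and documents exists.

```python
def build_results(scored, sentence_to_summary, summary_to_document, top_k=2):
    """Pick the top_k key statements by target score and trace each to its source."""
    ranked = sorted(scored.items(), key=lambda item: -item[1])[:top_k]
    results = []
    for sentence_id, score in ranked:
        summary_id = sentence_to_summary[sentence_id]
        results.append({
            "key_statement": sentence_id,
            "summary": summary_id,
            "document": summary_to_document[summary_id],
            "score": score,
        })
    return results

# Hypothetical target relevance scores and mapping tables.
scored = {"s1": 0.79, "s2": 0.73, "s3": 0.41}
sentence_to_summary = {"s1": "sum1", "s2": "sum2", "s3": "sum1"}
summary_to_document = {"sum1": "docA", "sum2": "docB"}
results = build_results(scored, sentence_to_summary, summary_to_document)
```

Each result carries the key statement, its summary, and its parent document together, so a hit can be traced back to its source.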
[0045] The legal document retrieval method based on multi-granularity indexing and hierarchical sorting provided in this invention rewrites the original query statement into multiple sub-queries, covering the multiple legal intentions of the user query, improving adaptability to differences in expression, and enhancing accuracy and robustness. It uses sub-queries as conditions to filter candidate key statement sets from a multi-indexed legal document database, avoiding noise from full-text matching and making the results more focused and locatable. Finally, a relevance analysis model integrates multi-dimensional business characteristics, calculates the target relevance score between the original query statement and the document summaries associated with the candidate key statements, and determines the final query results, achieving traceable results and verifiable basis, significantly improving the accuracy, robustness, and legal interpretability of case matching and legal provision association.
[0046] For ease of understanding, this invention provides a specific implementation of the legal document retrieval method based on multi-granularity indexing and hierarchical sorting. Figure 2 shows the technical framework of the method. First, in the data preprocessing layer (offline stage), each legal document (long text) is summarized into a compressed representation. Key sentences covering legal elements, points of contention, or key points of judgment are then extracted. Each key sentence is treated as an independent document unit, and the relevant mapping information is constructed. Finally, a Dense index (vector similarity retrieval) and a Sparse index (sparse retrieval, keyword matching) are built, completing the data preprocessing stage.
[0047] In the retrieval and recall layer (online stage), after the user inputs the original query statement (a natural language description), the statement is first rewritten through operations such as expansion, decomposition, or question type conversion to generate multiple subqueries (typically 3 to 5, each covering one dimension of the search intent). The subqueries are then retrieved in parallel using Dense index retrieval (vector similarity matching) and Sparse index retrieval (keyword matching). Finally, the matching key sentences (doc) are returned together with their document IDs and mapping information.
[0048] Finally, in the fine ranking and optimization layer (online stage), the original document summary paragraphs are obtained by tracing the recall results back to their source. The Rerank model, which performs deep interactive calculation over the query and the summary, scores each candidate, and the ranking is then dynamically adjusted using factors such as the model ranking weight, time weight, and regional relevance weight. The results present the key sentences that are highly relevant to the query and hit in the original text, together with contextual fragments and other information.
[0049] The specific implementation process is as follows: (I) Data preprocessing layer (offline stage), including steps 1.1 to 1.5: Step 1.1: Obtain the document summary corresponding to the legal document. In one example, an existing language model is invoked: the legal document is input into the language model, which outputs the corresponding document summary under a built-in prompt. In another example, the document summary can be written manually. This embodiment of the invention does not limit the method of obtaining the document summary.
[0050] Step 1.2: Extract original key sentences from the document summary. In one implementation, jieba word segmentation and the TextRank algorithm can be used to extract original key sentences from the document summary.
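The TextRank option in Step 1.2 can be sketched as follows. This is a minimal illustration rather than the implementation used in this embodiment: the whitespace-and-regex tokenizer is a stand-in for jieba word segmentation, and the function names, damping factor, and iteration count are assumptions.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Split on Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Placeholder tokenizer (assumption): lowercase word tokens. For Chinese
    # text, a word segmenter such as jieba would be substituted here.
    return set(re.findall(r"\w+", sentence.lower()))


def sentence_similarity(a, b):
    """Word-overlap similarity normalized by the log of the sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_key_sentences(summary, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(summary)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Build a symmetric weighted graph over the sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = sentence_similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(ranked)]
```

Sentences that share vocabulary with many other sentences accumulate higher scores, so a procedural sentence with little overlap tends to be dropped.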
[0051] In other implementations, other feasible alternatives to extracting original key sentences from document summaries include: using a fixed window to slide out sentences and then selecting the Top-N sentences based on importance scores; using a generative model to directly generate key sentences; or using a reading comprehension (MRC) framework to extract relevant sentences for specific legal issues.
[0052] Step 1.3: Call the index building model corresponding to multiple index categories, and build the statement index of the original key statement relative to multiple index categories through the index building model. Construct a dense index legal document database based on the statement index of the dense index category and its associated original key statement, and construct a sparse index legal document database based on the statement index of the sparse index category and its associated original key statement.
[0053] (1) Dense Index: Using a pre-trained legal domain semantic model (such as LawBERT or a fine-tuned version of Legal-BERT), each original key statement is input into the above model to encode it as a 1024-dimensional dense vector representation; then, the 1024-dimensional dense vector representations of all original key statements are imported into vector retrieval libraries such as FAISS or Annoy to construct the Dense index corresponding to the original key statements.
[0054] (2) Sparse index: The original key sentences are segmented into Chinese words, and common stop words and redundant punctuation in the legal field are filtered to obtain a term sequence. Based on the term sequence, the weight of each term in the original key sentences is calculated using the BM25 algorithm to generate a term-sentence inverted index structure, and the Sparse index is constructed accordingly.
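The sparse index construction described above can be sketched as a term-to-sentence inverted index holding BM25 weights. The stop-word list, the tokenizer (a stand-in for Chinese word segmentation), the parameter values `k1=1.5` and `b=0.75`, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "a", "to", "for", "is"}  # hypothetical stop list


def tokenize(text):
    # Placeholder for Chinese word segmentation plus stop-word filtering.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def build_bm25_index(sentences, k1=1.5, b=0.75):
    """Build a term -> {sentence_id: BM25 weight} inverted index.

    `sentences` maps key sentence IDs to text and must not be empty.
    """
    docs = {sid: tokenize(text) for sid, text in sentences.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    doc_freq = Counter(term for toks in docs.values() for term in set(toks))
    index = defaultdict(dict)
    for sid, toks in docs.items():
        for term, tf in Counter(toks).items():
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            index[term][sid] = idf * tf * (k1 + 1) / norm
    return index


def sparse_search(index, query, top_k):
    """Sum the stored weights of the query terms and return the top_k hits."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for sid, weight in index.get(term, {}).items():
            scores[sid] += weight
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_k]
```

Because the weights are computed at indexing time, the online lookup reduces to summing precomputed values for each query term.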
[0055] Other feasible alternatives include: using methods such as SPLADE or ColBERT to incorporate sparse feature encoding into dense vectors; generating multiple token-level vector representations for each sentence; or adopting a cascaded indexing strategy, first performing coarse screening with a sparse index, and then finely ranking the candidate set in a dense index.
[0056] Step 1.4 involves maintaining source information for each original key statement and establishing a mapping relationship between the ID of the legal document, the ID of the document summary, and the ID of the original key statement. It should be noted that steps 1.3 and 1.4 can be executed in parallel or sequentially; this embodiment of the invention does not impose any restrictions on this.
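The mapping relationship in Step 1.4 can be illustrated with a small in-memory structure. The class and field names below are hypothetical; a deployed system would more likely keep this mapping in a database table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeySentenceRecord:
    sentence_id: str
    summary_id: str
    document_id: str
    text: str


class SourceMap:
    """Keeps the key sentence -> document summary -> legal document mapping."""

    def __init__(self):
        self._by_sentence = {}
        self._summaries = {}

    def add_summary(self, summary_id, document_id, summary_text):
        self._summaries[summary_id] = (document_id, summary_text)

    def add_sentence(self, sentence_id, summary_id, text):
        # The document ID is inherited from the summary the sentence came from.
        document_id, _ = self._summaries[summary_id]
        self._by_sentence[sentence_id] = KeySentenceRecord(
            sentence_id, summary_id, document_id, text
        )

    def trace(self, sentence_id):
        """Return (record, summary_text) for a recalled key sentence."""
        record = self._by_sentence[sentence_id]
        return record, self._summaries[record.summary_id][1]
```

The same lookup supports the later trace-back from a recalled key sentence to its document summary.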
[0057] (II) Retrieval and recall layer (online stage), including steps 2.1 to 2.3: Step 2.1: Receive the original query statement input by the user and rewrite the original query statement.
[0058] (1) Identify the legal elements and question type tags in the original query statement. Legal elements include subjects (such as company shareholders), behaviors (such as failure to fulfill capital contribution obligations), and needs (such as legal liability). Question type tags include at least case retrieval, legal interpretation, and legal opinion query.
[0059] (2) Expand legal elements with synonyms and / or decompose elements, and substitute the new legal elements obtained from the decomposition into the legal retrieval paradigm corresponding to the question type label to obtain multiple sub-query statements.
[0060] The rewriting operations are as follows. Synonym expansion: mapping core legal expressions such as "failure to fulfill capital contribution obligations" to common synonyms in the Company Law and judicial interpretations, such as "false capital contribution," "fraudulent capital contribution," and "withdrawal of capital." Element decomposition: based on the legal elements, decomposing the compound query into several semantically independent and searchable sub-queries, such as "shareholder's capital contribution obligation," "shareholder's liability," and "protection of company creditors." Question type conversion: based on the intent of the user's original question, converting it into a standardized form suitable for the search task, for example, converting "What if a shareholder fails to contribute capital?" into "legal liability of a shareholder for failing to fulfill capital contribution obligations" or "rules of adjudication for company creditors to request shareholders to repay debts within the scope of unpaid capital and interest."
[0061] Based on the above strategy, the original query statement is rewritten into N subqueries (usually 3-5), each subquery statement representing a dimension of the user's search intent.
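The rewriting strategy can be sketched as follows, assuming the legal elements and the question type tag have already been identified. The synonym table, the retrieval templates, and the function name are hypothetical placeholders, not content taken from this embodiment.

```python
SYNONYMS = {  # hypothetical synonym table
    "failure to fulfill capital contribution obligations": [
        "false capital contribution",
        "withdrawal of capital",
    ],
}

TEMPLATES = {  # hypothetical retrieval paradigms keyed by question type tag
    "case retrieval": "adjudication rules for {subject} {behavior}",
    "legal interpretation": "legal provisions on {subject} {behavior}",
    "legal opinion query": "{demand} of {subject} for {behavior}",
}


def rewrite_query(elements, question_type, max_subqueries=5):
    """Generate sub-queries from already identified legal elements.

    `elements` holds the subject, behavior, and demand extracted from the
    original query statement.
    """
    # Synonym expansion: the original behavior plus its known synonyms.
    behaviors = [elements["behavior"]] + SYNONYMS.get(elements["behavior"], [])
    template = TEMPLATES[question_type]
    subqueries = []
    for behavior in behaviors:
        variant = dict(elements, behavior=behavior)
        subqueries.append(template.format(**variant))
    # Element decomposition: a narrower sub-query pairing subject and demand.
    subqueries.append(f"{elements['subject']} {elements['demand']}")
    # Drop duplicates while keeping order, then cap the count.
    return list(dict.fromkeys(subqueries))[:max_subqueries]
```

Capping the list at five keeps the number of parallel retrievals within the 3 to 5 range mentioned above.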
[0062] Other feasible alternatives include: expanding queries by mining high-frequency extended terms from historical search logs; or clarifying user intent gradually through multiple rounds of dialogue interaction, rather than generating multiple subqueries in a single session.
[0063] Step 2.2: For each subquery statement, perform the following operations. The subquery is encoded into a query vector, and the first recall result is determined based on the semantic similarity between the query vector and the vectors of the dense index stored in the dense index legal document database. In one example, the subquery is encoded into a query vector, a semantic similarity search is run against the Dense index, and the Top-K1 key sentences corresponding to the subquery are returned as the first recall result.
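The dense recall step can be sketched as a brute-force cosine similarity search. This is an illustration only: the three-dimensional vectors are toy stand-ins for the encoder output, the identifiers are hypothetical, and a production system would delegate the search to a vector retrieval library rather than scan every vector.

```python
import heapq
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_top_k(query_vec, index, k1):
    """Return the k1 (sentence_id, score) pairs most similar to query_vec.

    `index` maps key sentence IDs to their dense vectors.
    """
    scored = ((sid, cosine(query_vec, vec)) for sid, vec in index.items())
    return heapq.nlargest(k1, scored, key=lambda pair: pair[1])


# Toy 3-dimensional vectors standing in for the real sentence encodings.
toy_index = {
    "ks-1": [0.9, 0.1, 0.0],
    "ks-2": [0.1, 0.9, 0.0],
    "ks-3": [0.7, 0.3, 0.1],
}
first_recall = dense_top_k([1.0, 0.0, 0.0], toy_index, k1=2)
```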
[0064] The search term is extracted from the subquery, and the original key sentences containing the search term are retrieved from the sparse index stored in the sparse indexed legal document database as the second recall result. In one example, the subquery is segmented and legal terms are extracted, and term matching retrieval is performed in the Sparse index to return the Top-K2 key sentences corresponding to the subquery as the second recall result.
[0065] The first and second recall results are merged to obtain the target recall results corresponding to the subquery. For each subquery, the first and second recall results are uniformly sorted using RRF or a weighted fusion strategy to generate the target recall list for that subquery. For example, the RRF score (1 / (rank + k)) is calculated for the two results of the same subquery according to the original ranking, or the similarity score and BM25 score are weighted and summed according to preset weights, then merged, deduplicated, and re-sorted to output the Top-K fusion result, which is the target recall result.
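The RRF option described above can be sketched as follows. The constant `k=60` is a commonly used default assumed here, not a value given in this embodiment, and the sentence IDs are illustrative.

```python
def rrf_fuse(ranked_lists, k=60, top_k=10):
    """Fuse ranked lists of sentence IDs with reciprocal rank fusion.

    Each ID scores sum(1 / (rank + k)) over the lists it appears in, with
    ranks starting at 1. IDs appearing in both lists are merged into one entry.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, sid in enumerate(ranking, start=1):
            scores[sid] = scores.get(sid, 0.0) + 1.0 / (rank + k)
    fused = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return fused[:top_k]


dense_hits = ["ks-1", "ks-3", "ks-2"]   # first recall result, best first
sparse_hits = ["ks-3", "ks-4", "ks-1"]  # second recall result, best first
target_recall = rrf_fuse([dense_hits, sparse_hits], top_k=3)
```

Because RRF uses only rank positions, it avoids having to normalize the vector similarity scores and BM25 scores onto a common scale, which the weighted-sum alternative would require.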
[0066] Step 2.3: Aggregate the target recall results of all subquery statements relative to multiple index categories into a candidate key statement set. In one implementation, the candidate key statements returned by each subquery statement are precisely deduplicated according to text content (or approximate deduplication based on semantic hashing), retaining the first occurrence, forming a set of candidate key sentences without duplication.
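The exact-text deduplication in Step 2.3 can be sketched as follows. The whitespace normalization is an added assumption, and the semantic-hashing variant is not shown.

```python
import re


def normalize(text):
    # Collapse whitespace so trivially different copies compare equal.
    return re.sub(r"\s+", " ", text).strip()


def aggregate_candidates(recall_lists):
    """Merge per-sub-query recall lists, keeping each text's first occurrence.

    Each recall list holds (sentence_id, text) pairs in ranked order.
    """
    seen = set()
    candidates = []
    for recall in recall_lists:
        for sentence_id, text in recall:
            key = normalize(text)
            if key not in seen:
                seen.add(key)
                candidates.append((sentence_id, text))
    return candidates
```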
[0067] (III) Fine ranking and optimization layer (online stage), including steps 3.1 to 3.3: Step 3.1, trace back to obtain the summary paragraph: Based on the document ID recorded with each candidate key statement, the original legal document is located, and its pre-stored document summary is extracted, thereby shifting the retrieval granularity from the key statement level to the document summary level.
[0068] Step 3.2, Rerank Depth Sort: The original query statement and the document summary associated with each candidate key statement in the candidate key statement set are used as input to a pre-trained legal relevance analysis model. The legal relevance analysis model performs cross-attention processing on the original query statement and the document summary associated with each candidate key statement to obtain an initial relevance score between the original query statement and the document summary associated with each candidate key statement.
[0069] In one implementation, a Rerank model (such as a Cross-Encoder architecture or a large language model fine-tuned in the legal domain) is constructed. The original query and candidate document summaries are used as joint inputs. Deep semantic interaction modeling is performed through cross-attention or multi-layer Transformer, and the initial relevance score in the range of 0–1 is output to represent the overall matching degree between the document summary and the user query.
[0070] In this embodiment of the invention, the large language model based on the Cross-Encoder architecture is improved to better output the initial relevance score. Specifically: Step 1: Receive the raw data (the original query statement and multiple candidate document summaries) through the input layer, load the pre-trained set of legal provision embeddings for the legal domain, and pass it into the legal provision embedding branch.
[0071] Step 2: Input the original query statement and multiple candidate document summaries into the backbone network (multi-layer Transformer). The backbone network performs basic semantic extraction on these original data to generate basic semantic features of the original query and basic semantic features of each candidate summary.
[0072] Step 3: Input the basic semantic features of the query output from the main branch into the query-legal provision branch, and at the same time input the standardized legal provision semantic vector output from the legal provision embedding branch into this branch. The two perform cross-attention interaction to generate query semantic features that integrate legal provision information.
[0073] Step 4: Input the basic semantic features of each candidate summary output from the main branch into the summary-legal provision branch, and perform cross-attention interaction with the legal provision semantic vector of the legal provision embedding branch to generate summary semantic features that integrate legal provision information.
[0074] Step 5: The dual-branch interaction layer performs a second cross-attention interaction between the query semantic features of the fused legal provisions and the summary semantic features of each fused legal provision, to obtain the query-summary semantic interaction features of the fused legal provisions, providing a foundation for subsequent multi-granularity interactions.
[0075] Step 6: Input the query-summary semantic interaction features with integrated legal provisions into the multi-granularity interaction module. Through the fine-grained terminology interaction module in the multi-granularity interaction module, call the legal terminology dictionary, part-of-speech tagging tool, and entity alignment model to perform part-of-speech tagging, entity alignment, and terminology ambiguity filtering on the legal terms in the query and each candidate document summary, to obtain the query-summary text without terminology ambiguity. At the same time, through the coarse-grained semantic matching module in the multi-granularity interaction module, use Sentence-BERT for sentence-level semantic encoding to capture the overall semantic framework of the query and each candidate document summary, to obtain coarse-grained semantic matching features. Integrate the query-summary text without terminology ambiguity and the coarse-grained semantic matching features to obtain the multi-granularity matching features.
[0076] Step 7: Input the multi-granularity matching features into the legal-specific cross-attention layer. Based on the pre-defined attention weight labeled dataset, call the improved cross-attention calculation function to assign high weights of 0.7-0.9 to the legal provisions and core facts in the query and each candidate document summary, and low weights of 0.1-0.3 to irrelevant expressions such as embellishment and procedural statements, to obtain the weighted semantic interaction features. Then, use the attention masking mechanism to filter out words without legal meaning and generate a mask matrix. Use the mask matrix to shield the interference of words without legal meaning to obtain interaction features that focus on the core semantics.
[0077] Step 8: Input the interaction features focusing on core semantics into the multi-layer Transformer hierarchical modeling layer. First, in the basic semantic layer, use a literal semantic matching algorithm such as cosine similarity to capture the literal matching relationship between the query and each candidate document summary, obtaining the literal semantic matching result. Then, input the literal semantic matching result and the legal provision embedding vector from the dual-branch architecture into the legal relationship layer, which calculates the semantic similarity among the query, the candidate document summary, and the legal provision, models this triangular relationship, and determines whether the query and the summary conform to the same legal logic, obtaining the legal relationship matching result. Finally, input the literal semantic matching result and the legal relationship matching result into the matching decision layer, which combines the two sets of output features, introduces the legal rules "legal provision matching takes precedence over fact matching" and "core demand matching takes precedence over secondary expression" by setting weight coefficients, and calculates the preliminary matching score through a fully connected layer.
Step 9: Output the preliminary matching score through the output layer. The preliminary matching score is the initial relevance score between the original query statement and each candidate document summary, which completes the model inference.
[0078] Other feasible alternatives include: using large language models combined with prompt engineering to achieve relevance scoring; using listwise ranking models (such as SetRank) to directly optimize the overall quality of the ranked list; or jointly modeling relevance judgment and business feature prediction through multi-task learning.
[0079] Step 3.3: Dynamically adjust the initial relevance score using multi-dimensional business features to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0080] In one implementation, the initial relevance score is dynamically adjusted based on the case authority weight, timeliness weight, and regional relevance weight, combined with a preset fusion coefficient, to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0081] Based on the initial relevance score output by Rerank, a dynamic weighted adjustment is made by incorporating legal business characteristics: (1) Weight of case authority: Different weights are assigned based on the court level (decreasing from the Supreme People's Court to the basic court) and the type of trial procedure (guiding cases, gazetted cases, typical cases, and ordinary cases); (2) Timeliness weight: The time decay weight is set according to the date of the judgment, and the score is adjusted in combination with the validity status of the legal text involved (currently valid or repealed); (3) Regional relevance weight: The weight is adjusted according to the matching degree between the location of the court of trial and the region of interest of the user, as well as the consistency of the region of application of law (such as the scope of application of local judicial guidance opinions).
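The three business weights can be sketched as simple functions. Every numeric value, the exponential half-life decay, the penalty factors, and the function names below are assumptions chosen for illustration; this embodiment does not specify them.

```python
from datetime import date

COURT_LEVEL_WEIGHT = {  # hypothetical values, decreasing with court level
    "supreme": 1.0, "high": 0.8, "intermediate": 0.6, "basic": 0.4,
}
CASE_TYPE_WEIGHT = {  # hypothetical values
    "guiding": 1.0, "gazetted": 0.9, "typical": 0.8, "ordinary": 0.6,
}


def authority_weight(court_level, case_type):
    """Combine court level and case type into one authority weight."""
    return COURT_LEVEL_WEIGHT[court_level] * CASE_TYPE_WEIGHT[case_type]


def timeliness_weight(judgment_date, today, law_in_force, half_life_years=5.0):
    """Exponential decay by judgment age, halved again if the law is repealed."""
    age_years = max((today - judgment_date).days, 0) / 365.25
    decay = 0.5 ** (age_years / half_life_years)
    return decay if law_in_force else decay * 0.5


def regional_weight(court_region, user_region, same_law_region):
    """Reward a matching court region and a consistent region of application."""
    weight = 1.0 if court_region == user_region else 0.5
    return weight if same_law_region else weight * 0.8
```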
[0082] The target relevance score can be determined by a formula defined over the following quantities: the target relevance score; the initial relevance score output by Rerank; a set of configurable parameters that can be adaptively adjusted for the specific application scenario; and the weights for case authority, timeliness, and regional relevance.
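One form consistent with these definitions is a linear weighted fusion, sketched below. The linear form itself is an assumption, since the formula is not legible in this text, and the parameter names `alpha`, `beta`, `gamma`, `delta` and their default values are hypothetical.

```python
def target_relevance(initial, authority, timeliness, regional,
                     alpha=0.7, beta=0.1, gamma=0.1, delta=0.1):
    """Assumed linear fusion of the Rerank score and three business weights.

    score = alpha * initial + beta * authority
            + gamma * timeliness + delta * regional
    """
    return (alpha * initial + beta * authority
            + gamma * timeliness + delta * regional)


# Example: a strong semantic match from an authoritative but older judgment.
score = target_relevance(initial=0.8, authority=1.0, timeliness=0.5, regional=1.0)
```

With coefficients that sum to 1 and inputs in the range 0 to 1, the fused score stays in the same range, which keeps it comparable across queries.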
[0083] Other feasible alternatives include: using Learning to Rank models (such as LambdaMART and LambdaRank) to automatically learn the optimal weights for each feature; using business features as hard constraints for ranking; or using a faceted navigation mechanism, allowing users to independently select dimensions such as authority and timeliness.
[0084] Finally, (1) the candidate documents are sorted in descending order of target relevance score, and the top-N query results are returned; (2) the query results are sent to the designated associated terminal, so that the document summary contained in the query results is displayed on the graphical user interface of that terminal and the position of the target key sentence in the document summary is rendered with a specified effect, for example, by locating and highlighting, in the summary paragraph of each returned document, the key sentence that was hit in the recall stage; (3) the case number, court, judgment date, and other metadata of the document are presented at the same time, together with the summary paragraph and the corresponding list of key sentences.
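The final sorting and highlighting can be sketched as follows. The `<mark>` tags and the dictionary layout of a candidate are assumptions; the rendering effect actually used by the terminal is not specified here.

```python
def top_n_results(candidates, n):
    """Sort candidates by target relevance score, descending, and keep n."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:n]


def highlight(summary, key_sentence, open_tag="<mark>", close_tag="</mark>"):
    """Wrap the first occurrence of the key sentence in highlight tags."""
    start = summary.find(key_sentence)
    if start < 0:
        return summary  # key sentence not found verbatim; leave unchanged
    end = start + len(key_sentence)
    return summary[:start] + open_tag + key_sentence + close_tag + summary[end:]
```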
[0085] In summary, the method provided by the embodiments of the present invention has at least the following characteristics: (1) Solving the problem of information dilution in long documents and improving retrieval accuracy: Existing technologies such as "intelligent legal document retrieval system based on vector technology" and "intelligent legislative material search method based on vector retrieval" directly vectorize long documents or use fixed-length segments, so key legal information is diluted by a large amount of redundant content. Through summary compression and key sentence extraction, this invention refines the retrieval unit into key sentences that carry complete legal propositions, increasing the information density by about 5 to 10 times. At the same time, building the index on key sentences avoids the semantic breaks caused by fixed segments (such as separating the focus of the dispute from the reasoning of the judgment) and achieves high-precision indexing while preserving the integrity of legal semantics.
[0086] (2) Hybrid indexing mechanism takes into account both semantic generalization and precise matching: Existing technologies all use a single index type, which makes it difficult to simultaneously meet the dual needs of legal retrieval for semantic understanding and precise term matching. This invention integrates dense indexes and sparse indexes: dense indexes support semantic generalization matching (such as "equity transfer" and "share transfer"), while sparse indexes ensure precise term-level recall (such as specific legal provision numbers and party names); combined with parallel retrieval of multiple subqueries on the two types of indexes, the recall rate is effectively improved.
[0087] (3) Query rewriting and multi-subquery recall expand intent coverage: Existing technologies retrieve directly with the original query, which makes it difficult to fully cover the multiple legal intents implied by the user. This invention generates multiple subqueries by rewriting the query, expands the semantic expression through synonym expansion, element decomposition, and question type conversion, and aggregates the results after parallel retrieval, which significantly improves intent coverage and recall rate.
[0088] (4) Balancing recall and accuracy through cross-granularity re-ranking: In the recall phase, fine-grained matching is performed on the key sentence-level index based on subqueries to ensure that relevant legal elements are detected as much as possible; in the re-ranking phase, deep semantic interaction calculation is carried out on the original query and document summary as units to comprehensively score the overall relevance of the document. This mechanism not only ensures recall, but also suppresses the situation of local matching but overall irrelevance through deep modeling at the summary level, thereby improving ranking accuracy.
[0089] (5) Business feature integration meets the needs of legal practice: The ranking of existing technologies only relies on the semantic relevance score output by the model, without incorporating business factors that are of concern in legal practice, such as court level, trial procedure, and judgment date. This invention dynamically integrates multi-dimensional business features during the re-ranking process, making the final ranking result more in line with the actual judgment logic and working habits of legal professionals.
[0090] (6) Source tracing and highlighting improves the interpretability of results: This invention establishes a precise source tracing relationship between key sentences and original legal documents, and highlights the key sentences and their document locations in the search results, which makes it easier for users to quickly verify the matching basis, intuitively understand the search logic, and enhance the credibility and interpretability of the results.
[0091] Based on the foregoing embodiments, this invention provides a legal document retrieval system based on multi-granularity indexing and hierarchical sorting. Figure 3 illustrates the structure of this system, which mainly comprises the following components:
The statement receiving module 302 is used to receive the original query statement uploaded by the user.
The statement rewriting module 304 is used to rewrite the original query statement into multiple subquery statements.
The candidate statement determination module 306 is used to determine the target recall results of each subquery statement relative to multiple index categories based on a pre-built multi-index legal document database, and to aggregate the target recall results of all subquery statements relative to multiple index categories into a candidate key statement set; the multi-index legal document database stores statement indexes of multiple index categories and their associated original key statements, and the original key statements are obtained by extracting key statements from the document summaries of the legal documents.
The relevance analysis module 308 is used to determine the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set by combining a pre-trained legal relevance analysis model and multi-dimensional business characteristics. The target relevance score characterizes the degree of relevance between the document summary associated with the candidate key statement and the original query statement.
The result determination module 310 is used to determine the query results corresponding to the original query statement based on the target relevance score. The query results include at least the target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
[0092] The legal document retrieval system based on multi-granularity indexing and hierarchical sorting provided in this invention rewrites the original query statement into multiple sub-queries, covering the multiple legal intentions of the user's query, improving adaptability to differences in expression, and enhancing accuracy and robustness. It uses sub-queries as conditions to filter candidate key statement sets from a multi-indexed legal document database, avoiding noise from full-text matching and making the results more focused and locatable. Finally, a relevance analysis model integrates multi-dimensional business characteristics, calculates the target relevance score between the original query statement and the document summaries associated with the candidate key statements, and determines the final query results, achieving traceable results and verifiable basis, significantly improving the accuracy, robustness, and legal interpretability of case matching and legal provision association.
[0093] In one implementation, the statement rewriting module 304 is specifically used to: identify the legal elements and question type tags in the original query statement, where the question type tags include at least case retrieval, legal provision interpretation, and legal opinion query; and expand the legal elements with synonyms and / or decompose them into elements, then substitute the newly obtained legal elements into the legal retrieval paradigm corresponding to the question type tag to obtain multiple subquery statements.
[0094] In one implementation, the system further includes a database building module, used to: obtain the document summary corresponding to the legal document; extract original key statements from the document summary; call the index building models corresponding to multiple index categories and, through these models, build the statement index of each original key statement relative to the multiple index categories, where the index categories include at least a dense index category and a sparse index category; and construct a dense index legal document database based on the statement index of the dense index category and its associated original key statements, and a sparse index legal document database based on the statement index of the sparse index category and its associated original key statements.
[0095] In one implementation, the candidate statement determination module 306 is specifically used to perform the following operations for each subquery statement: encode the subquery statement into a query vector, and determine the first recall result based on the semantic similarity between the query vector and the dense index stored in the dense index legal document database; extract the terms to be retrieved from the subquery statement, and retrieve the original key statements containing those terms from the sparse index stored in the sparse index legal document database as the second recall result; and merge the first and second recall results to obtain the target recall result corresponding to the subquery statement.
[0096] In one implementation, the relevance analysis module 308 is specifically used to: take the original query statement and the document summary associated with each candidate key statement in the candidate key statement set as input to the pre-trained legal relevance analysis model; perform, through the legal relevance analysis model, cross-attention processing on the original query statement and the document summary associated with each candidate key statement to obtain an initial relevance score between them; and dynamically adjust the initial relevance score using multi-dimensional business characteristics to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0097] In one implementation, the multi-dimensional business characteristics include the case authority weight, timeliness weight, and regional relevance weight; the relevance analysis module 308 is specifically used to dynamically adjust the initial relevance score based on these three weights, combined with the preset fusion coefficient, to obtain the target relevance score between the original query statement and the document summary associated with each candidate key statement.
[0098] In one implementation, the system further includes a visualization module, used to send the query results to a specified associated terminal, so that the document summary contained in the query results is displayed on the graphical user interface of that terminal and the position of the target key statement in the document summary is rendered with a specified effect.
[0099] The system provided in this embodiment of the invention has the same implementation principle and technical effects as the aforementioned method embodiment. For brevity, for any part not mentioned in the system embodiment, refer to the corresponding content in the aforementioned method embodiment.
[0100] This invention provides an electronic device, specifically, the electronic device includes a processor and a memory; the memory stores a computer program, which, when run by the processor, executes the method described in any of the above embodiments.
[0101] Figure 4 is a schematic diagram of the structure of an electronic device 100 provided by an embodiment of the present invention, which includes a processor 40, a memory 41, a bus 42, and a communication interface 43. The processor 40, the communication interface 43, and the memory 41 are connected through the bus 42. The processor 40 is used to execute executable modules, such as computer programs, stored in the memory 41.
[0102] The memory 41 may include high-speed random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 43 (which can be wired or wireless), such as the Internet, wide area network, local area network, metropolitan area network, etc.
[0103] Bus 42 can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is shown in Figure 4 as a single double-headed arrow, but this does not mean that there is only one bus or one type of bus.
[0104] The memory 41 is used to store programs. After receiving an execution instruction, the processor 40 executes the programs. The method defined by the flow disclosed in any of the foregoing embodiments of the present invention can be applied to the processor 40 or implemented by the processor 40.
[0105] Processor 40 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of processor 40 or by instructions in software form. Processor 40 can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this invention can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in memory 41. The processor 40 reads the information in memory 41 and, in conjunction with its hardware, completes the steps of the above method.
[0106] The computer program product of the readable storage medium provided in the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the foregoing method embodiments. For specific implementation, please refer to the foregoing method embodiments, which will not be repeated here.
[0107] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0108] Finally, it should be noted that the above-described embodiments are merely specific implementations of the present invention, used to illustrate the technical solutions of the present invention and not to limit them. The scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A legal document retrieval method based on multi-granularity index and hierarchical ordering, characterized in that it comprises:
receiving an original query statement uploaded by a user;
rewriting the original query statement into multiple subquery statements;
determining, based on a pre-built multi-index legal document database, the target recall results of each subquery statement relative to multiple index categories, and aggregating the target recall results of all subquery statements relative to the multiple index categories into a candidate key statement set; wherein the multi-index legal document database stores statement indexes of multiple index categories and their associated original key statements, and the original key statements are obtained by extracting key statements from the document summaries of legal documents;
determining, by using a pre-trained legal relevance analysis model combined with multi-dimensional business features, a target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set, wherein the target relevance score characterizes the degree of relevance between the document summary associated with the candidate key statement and the original query statement;
determining the query result corresponding to the original query statement based on the target relevance score, wherein the query result includes at least a target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
2. The legal document retrieval method based on multi-granularity index and hierarchical ordering according to claim 1, wherein rewriting the original query statement into multiple subquery statements comprises:
identifying the legal elements and the question type tag in the original query statement, wherein the question type tags include at least case retrieval, legal provision interpretation, and legal opinion query;
performing synonym expansion and/or element decomposition on the legal elements, and substituting the newly obtained legal elements into the legal retrieval paradigm corresponding to the question type tag to obtain multiple subquery statements.
3. The legal document retrieval method based on multi-granularity index and hierarchical ordering according to claim 1, wherein, before determining the target recall results of each subquery statement relative to multiple index categories based on the pre-built multi-index legal document database, the method further comprises:
obtaining the document summaries corresponding to legal documents;
extracting the original key statements from the document summaries;
invoking the index building models corresponding to multiple index categories, and building, through the index building models, the statement indexes of the original key statements relative to the multiple index categories, wherein the index categories include at least a dense index category and a sparse index category;
constructing a dense index legal document database based on the statement indexes of the dense index category and their associated original key statements, and constructing a sparse index legal document database based on the statement indexes of the sparse index category and their associated original key statements.
4. The method of claim 3, wherein determining, based on the pre-built multi-index legal document database, the target recall results of each subquery statement relative to multiple index categories comprises performing the following operations for any one of the subquery statements:
encoding the subquery statement into a query vector, and determining a first recall result based on the semantic similarity between the query vector and the dense indexes stored in the dense index legal document database;
extracting the terms to be retrieved from the subquery statement, and retrieving, from the sparse indexes stored in the sparse index legal document database, the original key statements containing the terms to be retrieved as a second recall result;
merging the first recall result and the second recall result to obtain the target recall result corresponding to the subquery statement.
5. The legal document retrieval method based on multi-granularity index and hierarchical ordering according to claim 1, wherein determining, by using the pre-trained legal relevance analysis model combined with multi-dimensional business features, the target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set comprises:
using the original query statement and the document summary associated with each candidate key statement in the candidate key statement set as input to the pre-trained legal relevance analysis model;
performing, through the legal relevance analysis model, cross-attention processing on the original query statement and the document summary associated with each candidate key statement to obtain an initial relevance score between the original query statement and the document summary associated with each candidate key statement;
dynamically adjusting the initial relevance score using the multi-dimensional business features to obtain the target relevance score between the original query statement and the document summary associated with each of the candidate key statements.
6. The method of claim 5, wherein the multi-dimensional business features include a case authority weight, a timeliness weight, and a regional relevance weight, and dynamically adjusting the initial relevance score using the multi-dimensional business features to obtain the target relevance score between the original query statement and the document summary associated with each of the candidate key statements comprises:
dynamically adjusting the initial relevance score based on the case authority weight, the timeliness weight, and the regional relevance weight, in combination with a preset fusion coefficient, to obtain the target relevance score between the original query statement and the document summary associated with each of the candidate key statements.
7. The method of claim 1, wherein, after determining the query result corresponding to the original query statement based on the target relevance score, the method further comprises:
sending the query result to a designated associated terminal, so that the document summary contained in the query result is displayed in the graphical user interface of the designated associated terminal, and the position of the target key statement in the document summary is rendered with a specified effect.
8. A legal document retrieval system based on multi-granularity index and hierarchical ordering, characterized in that it comprises:
a statement receiving module, used to receive an original query statement uploaded by a user;
a statement rewriting module, used to rewrite the original query statement into multiple subquery statements;
a candidate statement determination module, used to determine the target recall results of each subquery statement relative to multiple index categories based on a pre-built multi-index legal document database, and to aggregate the target recall results of all subquery statements relative to the multiple index categories into a candidate key statement set; wherein the multi-index legal document database stores statement indexes of multiple index categories and their associated original key statements, and the original key statements are obtained by extracting key statements from the document summaries of legal documents;
a relevance analysis module, used to determine, by using a pre-trained legal relevance analysis model combined with multi-dimensional business features, a target relevance score between the original query statement and the document summary associated with each candidate key statement in the candidate key statement set, wherein the target relevance score characterizes the degree of relevance between the document summary associated with the candidate key statement and the original query statement;
a result determination module, used to determine the query result corresponding to the original query statement based on the target relevance score, wherein the query result includes at least a target key statement, the document summary associated with the target key statement, and the legal document to which the target key statement belongs.
9. An electronic device, comprising a processor and a memory, wherein the memory stores computer-executable instructions executable by the processor, and the processor executes the computer-executable instructions to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions that, when invoked and executed by a processor, cause the processor to perform the method according to any one of claims 1 to 7.
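Illustrative note (not part of the claims): the dual-path recall of claim 4 and the score adjustment of claim 6 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function names, the in-memory dictionaries standing in for the dense and sparse index databases, the "any term matches" rule for sparse recall, and in particular the convex-combination formula in `fuse_score` are all hypothetical; the claims only state that three weights and a preset fusion coefficient are used, without giving the formula.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_recall(query_vec: Sequence[float],
                 dense_index: Dict[str, Sequence[float]],
                 top_k: int) -> List[str]:
    """First recall result: ids of the top_k key statements whose vectors
    are most similar to the query vector."""
    ranked = sorted(dense_index,
                    key=lambda sid: cosine(query_vec, dense_index[sid]),
                    reverse=True)
    return ranked[:top_k]


def sparse_recall(terms: Sequence[str],
                  inverted_index: Dict[str, Set[str]]) -> List[str]:
    """Second recall result: ids of key statements containing at least one
    of the terms (ASSUMPTION: any-term match; the claims do not specify)."""
    hits: Set[str] = set()
    for term in terms:
        hits |= inverted_index.get(term, set())
    return sorted(hits)


def merge_recall(first: List[str], second: List[str]) -> List[str]:
    """Merge both recall results, keeping first-seen order, dropping duplicates."""
    return list(dict.fromkeys(first + second))


def fuse_score(initial: float, authority: float, timeliness: float,
               region: float, alpha: float = 0.3) -> float:
    """ASSUMED fusion rule: convex combination of the model's initial score
    and the mean of the three business weights, with alpha playing the role
    of the preset fusion coefficient. The actual formula is not given."""
    business = (authority + timeliness + region) / 3.0
    return (1.0 - alpha) * initial + alpha * business


if __name__ == "__main__":
    # Toy data: three key statements with 2-d vectors and a tiny inverted index.
    dense_index = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [0.7, 0.7]}
    inverted_index = {"contract": {"s2", "s3"}, "breach": {"s2"}}
    first = dense_recall([1.0, 0.1], dense_index, top_k=2)
    second = sparse_recall(["contract", "breach"], inverted_index)
    print(merge_recall(first, second))
    print(fuse_score(0.8, authority=1.0, timeliness=0.5, region=0.0))
```

In this sketch the merge step is a simple order-preserving union, so a key statement found by both paths appears once in the candidate set. A production system would more likely rely on a vector database and a full-text engine for the two recall paths, and the fusion rule would follow whatever formula the description actually specifies.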