Contract review method and system
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUANGSHI OF HUBEI TOBACCO CORP
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent contract review, and in particular to a contract review method and system.
Background Technology
[0002] With the deepening of enterprise digital transformation, contract review, as a core link in enterprise risk management, is increasingly demanding intelligent solutions. Retrieval-Augmented Generation (RAG) technology, due to its ability to combine external knowledge bases with the generation capabilities of large language models, is gradually being applied to the field of intelligent contract review.
[0003] Among related technologies, RAG-based contract review solutions still have the following problems: First, in terms of retrieval accuracy, traditional vector retrieval methods are easily affected by factors such as semantic breaks in clauses and ambiguities in professional terminology when processing contract texts, resulting in the need to improve the relevance of recalled clause fragments to review requirements. Second, in terms of knowledge representation, existing solutions mostly rely on simple segmentation and vectorized storage of legal texts, resulting in a relatively simple knowledge organization dimension, which is difficult to fully support diverse query intentions. Third, in terms of prompt word design, the prompt word templates used in the review process are usually statically set, and there is room for improvement in their adaptability to different contract types or review scenarios. Fourth, in terms of system iteration, most solutions lack effective utilization of review result feedback information, making it difficult to achieve dynamic optimization of retrieval strategies and continuous updates to the knowledge base.
[0004] Therefore, it is necessary to provide a contract review solution that can achieve high-precision retrieval, dynamic prompt optimization, and continuous learning capabilities to improve the accuracy and efficiency of intelligent contract review.
Summary of the Invention
[0005] In view of this, it is necessary to provide a contract review method and system to solve the technical problems of low accuracy and efficiency in contract review.
[0006] To address the aforementioned problems, in a first aspect, the present invention provides a contract review method, comprising:
performing multi-channel retrieval on the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results;
re-ranking the contract retrieval results to generate an enhanced context for the contract to be reviewed;
matching a contract prompt word template from the prompt word template library according to the contract type of the contract to be reviewed, and adjusting the contract prompt word template based on historical feedback information to generate optimized prompt words;
inputting the contract to be reviewed, the enhanced context, and the optimized prompt words into a pre-scheduled optimal large language model to generate contract review opinions;
performing structured parsing on the contract review opinions and outputting structured review results; and
based on user feedback on the review results, adjusting at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, and triggering an incremental update of the contract knowledge base.
[0007] In one possible implementation, the steps for constructing the pre-built contract knowledge base include: A large model is used to perform semantic segmentation on the input legal and regulatory text, generating semantic text blocks; For each of the aforementioned semantic text blocks, a T2Q question-answer pair generation model is used to generate multiple question-answer pairs; The semantic text blocks and their corresponding question-and-answer pairs are encoded into semantic vectors and question-and-answer pair vectors, respectively, and stored in text block vector libraries and question-and-answer pair vector libraries, respectively. The mapping relationship between the semantic vectors and question-and-answer pair vectors and the original clause metadata is established to construct the contract knowledge base.
[0008] In one possible implementation, performing multi-channel retrieval on the contract to be reviewed based on a pre-built contract knowledge base includes: retrieving, from the text block vector library, text blocks that are semantically similar to the contract to be reviewed to obtain a first-channel retrieval result; retrieving, from the question-answer pair database, question-answer pairs that match the content of the contract to be reviewed to obtain a second-channel retrieval result; and combining the first-channel retrieval result and the second-channel retrieval result to form the contract retrieval results.
[0009] In one possible implementation, retrieving text blocks from the text block vector library that are semantically similar to the contract under review to obtain a first-channel retrieval result includes: The retrieval strategy parameters adopted by the multi-channel retrieval are obtained. The retrieval strategy parameters include preset document block size and overlapping window ratio, vector retrieval weight, keyword retrieval weight, vector similarity threshold, and initial recall quantity Top-K. The text blocks in the text block vector library are generated by semantic slicing of the legal and regulatory text based on the document block size and overlapping window ratio. A hybrid retrieval strategy is adopted to calculate the vector similarity score between the contract to be reviewed and each text block, as well as the keyword matching score between the contract to be reviewed and each text block. The vector similarity score and the keyword matching score are combined and calculated based on the vector retrieval weight and the keyword retrieval weight to obtain a comprehensive relevance score. Text blocks with vector similarity scores lower than the vector similarity threshold are filtered out. From the remaining text blocks, K text blocks are recalled from high to low based on their comprehensive relevance scores, which are used as the first channel retrieval results.
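The first-channel steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Block` type, the function name, the sample scores, and the use of a weighted sum to combine the two scores are all assumptions, since the text only states that the scores are combined based on the two weights.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    vector_score: float   # vector similarity to the contract, assumed in [0, 1]
    keyword_score: float  # keyword matching score, assumed in [0, 1]


def first_channel_retrieve(blocks, vector_weight, keyword_weight,
                           similarity_threshold, top_k):
    """Filter by vector similarity, then recall the top-K by combined score."""
    # Drop blocks whose vector similarity is below the threshold.
    kept = [b for b in blocks if b.vector_score >= similarity_threshold]

    # Assumed fusion rule: a weighted sum of the two scores.
    def combined(b):
        return vector_weight * b.vector_score + keyword_weight * b.keyword_score

    ranked = sorted(kept, key=combined, reverse=True)
    return [(b.block_id, round(combined(b), 4)) for b in ranked[:top_k]]


# Made-up scores for illustration only.
blocks = [
    Block("art-12", 0.82, 0.40),
    Block("art-07", 0.64, 0.90),
    Block("art-33", 0.45, 0.95),  # below the similarity threshold, filtered out
    Block("art-19", 0.71, 0.10),
]
results = first_channel_retrieve(blocks, vector_weight=0.7, keyword_weight=0.3,
                                 similarity_threshold=0.5, top_k=2)
print(results)
```

Note that the threshold is applied to the vector similarity score alone, while the ranking uses the comprehensive score, which is why a block with a strong keyword match can still be filtered out.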
[0010] In one possible implementation, retrieving question-answer pairs from the question-answer pair database that match the content of the contract to be reviewed, and obtaining the second-channel retrieval results, includes: Obtain the retrieval strategy parameters used in the multi-channel retrieval, which also include a similarity threshold and an initial recall quantity Top-N; The contract to be reviewed is vectorized to obtain a query vector; In the question-answer pair database, the similarity score between the query vector and each question-answer pair vector is calculated; Filter out question-answer pairs with similarity scores below the aforementioned similarity threshold; N question-answer pairs that match the content of the contract to be reviewed, ranked from highest to lowest similarity score, are retrieved as the second channel search results.
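The second-channel steps can be sketched in the same way. The sketch assumes cosine similarity as the similarity measure and a plain dictionary as the question-answer pair store; neither is specified in the text, and the vectors are made-up values.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def second_channel_retrieve(query_vec, qa_vectors, similarity_threshold, top_n):
    """Score every question-answer pair vector, filter, and keep the top N."""
    scored = [(qa_id, cosine(query_vec, vec)) for qa_id, vec in qa_vectors.items()]
    kept = [(qa_id, s) for qa_id, s in scored if s >= similarity_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Hypothetical store: question-answer pair id -> vector.
qa_vectors = {
    "qa-1": [1.0, 0.0, 0.0],
    "qa-2": [0.6, 0.8, 0.0],
    "qa-3": [0.0, 0.0, 1.0],  # orthogonal to the query, filtered out
}
hits = second_channel_retrieve([1.0, 0.0, 0.0], qa_vectors,
                               similarity_threshold=0.5, top_n=2)
print(hits)
```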
[0011] In one possible implementation, reordering the contract retrieval results to generate an enhanced context for the contract to be examined includes: Obtain the sorting strategy parameters used for the re-sorting, the sorting strategy parameters including the number of Top-N items to be retained after re-sorting and the re-sorting score threshold; A Cross-Encoder reordering strategy is adopted, and a fine-ranking model is used to evaluate the deep semantic relevance of each candidate document block in the contract retrieval results to obtain the relevance score of each candidate document block. The candidate document blocks are reordered according to the relevance scores. Filter out candidate document blocks whose relevance scores are lower than the reordering score threshold; Based on the top-N retained quantities after reordering, the N document blocks with the highest relevance scores are selected from the remaining candidate document blocks as the enhanced context of the contract to be reviewed.
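A minimal sketch of the re-ranking steps is given below. A real Cross-Encoder scores each query and candidate pair jointly with a trained model; here `toy_pair_scorer` is a placeholder (token overlap) used only so that the example runs without a model, and all names and sample texts are assumptions.

```python
def toy_pair_scorer(query, doc):
    """Placeholder for a fine-ranking model: Jaccard overlap of lowercase tokens."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query, candidates, score_pair, score_threshold, top_n):
    """Score each candidate against the query, sort, filter, and keep the top N."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    kept = [(doc, s) for doc, s in scored if s >= score_threshold]
    return kept[:top_n]


candidates = [
    "payment terms",
    "lease termination notice",
    "penalty for late payment",
]
enhanced_context = rerank("late payment penalty", candidates, toy_pair_scorer,
                          score_threshold=0.2, top_n=2)
print(enhanced_context)
```

Passing the scorer in as a parameter keeps the filtering and Top-N logic independent of whichever fine-ranking model is actually deployed.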
[0012] In one possible implementation, adjusting the contract prompt template based on historical feedback information to generate optimized prompts includes: Based on the historical feedback information, the prompt elements in the contract prompt template are adjusted. The prompt elements include at least one of the following: review focus, output format rigor, and role setting, to generate the optimized prompt.
[0013] In one possible implementation, triggering the incremental update of the contract knowledge base includes: Incremental update processes can be initiated periodically or through event-triggered methods. The newly added institutional texts will be semantically sliced, question-and-answer pairs generated, and vectorized, and then integrated into the contract knowledge base.
[0014] In one possible implementation, the structured review results include an interactive review report; The interactive review report has interactive access functionality, which includes at least one of the following: source tracing, version comparison, and report export.
[0015] In a second aspect, the present invention also provides a contract review system, comprising:
a retrieval unit, used to perform multi-channel retrieval on the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results;
a sorting unit, used to re-rank the contract retrieval results and generate an enhanced context for the contract to be reviewed;
an adjustment unit, used to match a contract prompt word template from the prompt word template library according to the contract type of the contract to be reviewed, and to adjust the contract prompt word template based on historical feedback information to generate optimized prompt words;
a generation unit, used to input the contract to be reviewed, the enhanced context, and the optimized prompt words into a pre-scheduled optimal large language model to generate contract review opinions;
a parsing unit, used to perform structured parsing on the contract review opinions and output structured review results; and
an iterative optimization unit, used to adjust at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, based on user feedback on the review results, and to trigger an incremental update of the contract knowledge base.
[0016] In a third aspect, the present invention also provides a computer-readable storage medium storing a computer-readable program or instructions which, when executed by a processor, implement the steps of the contract review method described in any of the above implementations.
[0017] The beneficial effects of this invention are as follows. The contract review method provided by this invention performs multi-channel retrieval on the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results. Multi-channel retrieval acquires knowledge related to the contract content from different dimensions, improving the comprehensiveness of the retrieval results. The contract retrieval results are re-ranked to generate an enhanced context for the contract to be reviewed, ensuring that the knowledge basis input into the large language model has high accuracy and relevance. A contract prompt word template is matched from a prompt word template library according to the contract type of the contract to be reviewed and adjusted based on historical feedback information, generating optimized prompt words that are better suited to the current review scenario. The contract to be reviewed, the enhanced context, and the optimized prompt words are input into the pre-scheduled optimal large language model to generate contract review opinions. By combining high-quality enhanced context with targeted optimized prompt words, the accuracy and usability of the model's generated results are improved. The contract review opinions are parsed in a structured manner to output structured review results, enhancing the readability and practicality of the results. Based on user feedback on the review results, at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, is adjusted, and an incremental update of the contract knowledge base is triggered. Through this closed-loop feedback mechanism, retrieval accuracy, ranking effect, and prompt word adaptability are improved, ensuring continuous optimization of review capabilities and thus enhancing the accuracy and efficiency of intelligent contract review.
Attached Figure Description
[0018] Figure 1 is a schematic flowchart of an embodiment of the contract review method provided by the present invention;
Figure 2 is a flowchart of the construction process of the pre-built contract knowledge base provided by the present invention;
Figure 3 is a schematic flowchart of an embodiment of S101 in Figure 1 of the present invention;
Figure 4 is a schematic flowchart of an embodiment of S301 in Figure 3 of the present invention;
Figure 5 is a schematic structural diagram of the contract review system provided by the present invention.
Detailed Implementation
[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0020] In the description of the embodiments of the present invention, unless otherwise stated, "a plurality of" means two or more.
[0021] The terms "first," "second," etc., used in the embodiments of this invention are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a technical feature defined with "first" or "second" may explicitly or implicitly include at least one of that feature.
[0022] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0023] This invention provides a contract review method and system, which will be described below.
[0024] The execution entity of the contract review method in this application embodiment can be the contract review system provided in this application embodiment, or different types of electronic devices such as server equipment, physical host, or user equipment (UE) that integrate the contract review system. The contract review system can be implemented in hardware or software. The UE can be a terminal device such as a smartphone, tablet computer, laptop computer, handheld computer, desktop computer, or personal digital assistant (PDA).
[0025] This invention provides a contract review method that can be applied to corporate legal management scenarios, enabling intelligent compliance review of user-submitted contracts.
[0026] Figure 1 is a schematic flowchart of an embodiment of the contract review method provided by the present invention. As shown in Figure 1, the contract review method includes:
S101. Based on a pre-built contract knowledge base, perform multi-channel retrieval on the contract to be reviewed to obtain contract retrieval results.
[0027] Multi-channel retrieval refers to simultaneously employing at least two different retrieval paths to obtain relevant knowledge from the contract knowledge base.
[0028] Specifically, upon receiving a contract submitted by a user for review, a multi-channel search is performed on the contract based on a pre-built contract knowledge base. This search process can execute two search channels in parallel, retrieving relevant legal and regulatory provisions and pre-generated question-and-answer pairs from the knowledge base respectively. The search results from both channels are then combined to form the preliminary contract search results. This multi-channel search approach allows for the acquisition of knowledge related to the contract content from different dimensions, improving the comprehensiveness of the search results.
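The parallel execution of the two channels can be sketched with a thread pool. The two search functions are hypothetical stand-ins that return fixed, made-up results; in a real system they would query the text block vector library and the question-answer pair vector library.

```python
from concurrent.futures import ThreadPoolExecutor


def search_text_blocks(contract_text):
    # Hypothetical stand-in for the first channel (legal and regulatory provisions).
    return [{"channel": "text_block", "id": "law-12", "score": 0.81}]


def search_qa_pairs(contract_text):
    # Hypothetical stand-in for the second channel (pre-generated Q&A pairs).
    return [{"channel": "qa_pair", "id": "qa-305", "score": 0.77}]


def multi_channel_retrieve(contract_text):
    """Run both channels in parallel and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        blocks_future = pool.submit(search_text_blocks, contract_text)
        qa_future = pool.submit(search_qa_pairs, contract_text)
        # Simple concatenation; the text does not specify a merge rule.
        return blocks_future.result() + qa_future.result()


retrieval_results = multi_channel_retrieve("The buyer shall pay within 180 days.")
print(retrieval_results)
```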
[0029] S102. The contract retrieval results are reordered to generate an enhanced context for the contract to be reviewed.
[0030] Here, the enhanced context refers to the set of contract retrieval results that has been re-ranked and filtered before being input into the large language model to assist in generating contract review opinions. Compared with the original contract retrieval results, the knowledge fragments in this set have higher relevance and quality.
[0031] Specifically, the contract retrieval results are re-ranked. Since the initial retrieval results may contain knowledge fragments with low relevance to the contract review requirements, a re-ranking model is used to perform a deep semantic relevance assessment on each candidate result. Based on the assessment results, the results are re-ranked, and the highest quality knowledge fragments are selected. These selected knowledge fragments, together with the contract to be reviewed, constitute an enhanced context, ensuring that the knowledge basis input into the large language model has high accuracy and relevance.
[0032] S103. Match contract prompt word templates from the prompt word template library according to the contract type of the contract to be reviewed, and adjust the contract prompt word templates based on historical feedback information to generate optimized prompt words.
[0033] The types of contracts to be reviewed can include purchase contracts, lease contracts, cooperation agreements, etc. Different contract types correspond to different prompt word templates. For example, for purchase contracts, the prompt words may focus on key review points such as supplier qualifications, delivery deadlines, and payment terms; for lease contracts, the focus may be more on rent payment methods, division of maintenance responsibilities, and early termination clauses.
[0034] Historical feedback information refers to the evaluative data collected and recorded during past contract review processes, reflecting user feedback on the review results. After the system outputs contract review comments to the user, the user can evaluate the review results, such as marking or rating whether the comments are accurate, whether the sources of evidence are appropriate, and whether the suggested modifications are reasonable.
[0035] Optimized prompt words refer to adaptive review instructions generated by dynamically adjusting the matched template according to the specific type of contract to be reviewed and the user feedback accumulated during historical review processes.
[0036] Specifically, based on the type of contract to be reviewed, a corresponding basic template, i.e., a contract prompt template, is matched from a pre-set prompt template library. Simultaneously, historical user feedback information is recorded during the review process. Based on this feedback, the review focus, output format requirements, and other elements in the contract prompt template are dynamically adjusted to generate optimized prompts that better match the current review scenario. This allows the system to adapt to the review needs of different contract types and enhances the relevance of the generated content.
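As a rough illustration of template matching and feedback-based adjustment, the sketch below keys a small template library by contract type and adjusts the review focus and output format rigor from feedback. The library layout, the feedback fields `missed_points` and `format_errors`, and the adjustment rules are all assumptions made for the example.

```python
# Hypothetical template library keyed by contract type; the review points
# are the examples given in the text.
TEMPLATE_LIBRARY = {
    "purchase": {
        "role": "a legal reviewer of purchase contracts",
        "focus": ["supplier qualifications", "delivery deadlines", "payment terms"],
    },
    "lease": {
        "role": "a legal reviewer of lease contracts",
        "focus": ["rent payment methods",
                  "division of maintenance responsibilities",
                  "early termination clauses"],
    },
}


def build_optimized_prompt(contract_type, feedback):
    """Match the template for the contract type and adjust it from feedback."""
    template = TEMPLATE_LIBRARY[contract_type]
    focus = list(template["focus"])  # copy so the library is not mutated
    # Assumed feedback shape: review points users reported as missed.
    for point in feedback.get("missed_points", []):
        if point not in focus:
            focus.append(point)
    # Assumed rule: tighten the output format after any reported format error.
    if feedback.get("format_errors", 0) > 0:
        format_rule = "Return only a JSON array; do not add free text."
    else:
        format_rule = "Return a JSON array of findings."
    return (
        f"You are {template['role']}.\n"
        f"Review focus: {'; '.join(focus)}.\n"
        f"{format_rule}"
    )


prompt = build_optimized_prompt(
    "purchase", {"missed_points": ["liability cap"], "format_errors": 2}
)
print(prompt)
```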
[0037] S104. Input the contract to be reviewed, the enhanced context, and the optimized prompts into the pre-scheduled optimal large language model to generate contract review opinions.
[0038] The optimal large language model refers to the model selected from multiple available large language models through an intelligent scheduling mechanism that performs best overall under the current review task. Optionally, the large language models include, but are not limited to: GPT series models (such as GPT-4 and GPT-4 Turbo), which possess strong general text understanding and generation capabilities and are suitable for contract clause review scenarios requiring deep semantic analysis; Claude series models (such as Claude 3 Opus and Claude 3 Sonnet), which perform well in legal text processing and can accurately identify risk clauses in contracts; ERNIE Bot, which is specifically optimized for Chinese legal text processing and is suitable for domestic legal and regulatory review scenarios; Qwen series models, which possess strong Chinese understanding capabilities and fast response speeds, suitable for review tasks with high real-time requirements; and open-source models (such as ChatGLM, Baichuan, and LLaMA), which can be privately deployed according to the enterprise's actual deployment environment to meet data security and cost control requirements.
[0039] Contract review opinions refer to the judgment results on contract compliance generated by the large language model after comprehensive analysis of three input components: the contract under review, enhanced context, and optimized prompts. These opinions include information from multiple dimensions, such as risk clause location (identifying the location and specific wording of clauses in the contract that pose compliance risks); violation nature determination (clarifying the nature of the conflict between the clause and the applicable laws and regulations); specific regulatory basis for the violation (citing relevant provisions from the knowledge base as the basis for judgment); modification suggestions (proposing specific revisions to the risk clauses); and a source index (linking to the original provisions in the knowledge base, facilitating user verification of the accuracy of the basis).
[0040] Specifically, a pre-determined optimal large language model is invoked, and the contract to be reviewed, enhanced context, and optimized prompts are input into the model. Based on the legal basis in the enhanced context and the review instructions in the optimized prompts, the model analyzes the contract to be reviewed and generates structured contract review opinions. By combining high-quality enhanced context with targeted optimized prompts, the accuracy and usability of the model's generated results are improved.
[0041] In one specific implementation, the system pre-connects to multiple different large language model interfaces, such as commercial models provided by different vendors or open-source self-deployed models, and these models differ in terms of response speed, processing cost, and understanding of specific contract domains. When contract review opinions need to be generated, the system performs a comprehensive evaluation and dynamic scheduling based on indicators such as the real-time response requirements of the current task, cost budget constraints, and the historical performance of different models on specific contract types. For example, for simple contract reviews requiring rapid response, the system may prioritize lightweight models with faster response times; for major contract reviews involving complex legal clauses, the system may choose professional models with stronger legal text understanding capabilities; and in cost-sensitive scenarios, the system prioritizes models with higher cost-effectiveness.
[0042] Understandably, through dynamic scheduling, the system can achieve the optimal balance between performance, cost, and effectiveness in different review scenarios, avoiding the impact of a single model's poor performance in a specific scenario on the overall review quality. The decision-making basis for model scheduling can include feedback data such as real-time monitoring response time, historical review result accuracy, and user satisfaction ratings for previous review results. This data is continuously accumulated and used to optimize subsequent scheduling decisions.
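One way such scheduling could work is sketched below: models that violate the latency or budget constraints are excluded, and the rest are ranked by a weighted score. The scoring formula, the weights, the model names, and the statistics are hypothetical; the text only lists the indicators that feed the decision.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    avg_latency_s: float        # monitored average response time
    cost_per_review: float      # average cost of one review
    historical_accuracy: float  # share of past opinions accepted, in [0, 1]


def pick_model(models, max_latency_s, budget, weights):
    """Pick the eligible model with the best weighted score."""
    eligible = [m for m in models
                if m.avg_latency_s <= max_latency_s and m.cost_per_review <= budget]
    if not eligible:
        raise ValueError("no model satisfies the latency and budget constraints")

    def score(m):
        speed = 1 - m.avg_latency_s / max_latency_s   # 1.0 means instant
        thrift = 1 - m.cost_per_review / budget       # 1.0 means free
        return (weights["accuracy"] * m.historical_accuracy
                + weights["speed"] * speed
                + weights["cost"] * thrift)

    return max(eligible, key=score)


# Made-up statistics for two hypothetical models.
models = [
    ModelStats("light-model", avg_latency_s=2.0, cost_per_review=0.01,
               historical_accuracy=0.80),
    ModelStats("legal-pro-model", avg_latency_s=12.0, cost_per_review=0.20,
               historical_accuracy=0.93),
]
weights = {"accuracy": 0.8, "speed": 0.1, "cost": 0.1}
simple_choice = pick_model(models, max_latency_s=5.0, budget=0.05, weights=weights)
major_choice = pick_model(models, max_latency_s=30.0, budget=0.50, weights=weights)
print(simple_choice.name, major_choice.name)
```

With a tight latency limit only the lightweight model is eligible; with relaxed limits and an accuracy-heavy weighting, the stronger model is chosen.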
[0043] S105. Perform structured parsing on the contract review comments and output structured review results.
[0044] Structured parsing refers to decomposing and recombining contract review opinions in the form of natural language text generated by a large language model according to a pre-defined data structure. This process extracts key fields with clear semantics and uses, forming structured data that can be directly processed by computer systems. Structured parsing can be implemented in several ways. One approach is to use JSON format and define the names and meanings of each field in the prompts. Another approach is to post-process the text after it has been generated by the model using an independent semantic parsing model or rule engine to extract information from predefined fields. Alternatively, both approaches can be combined to improve the accuracy and robustness of the parsing.
[0045] Specifically, the contract review opinions are parsed in a structured manner to extract key fields, forming a structured review result that is then output. These key fields can include the risk clause location, the determination of the nature of the violation, the specific regulatory basis for the violation, modification suggestions, and the source index of the basis. This structured result allows users to quickly locate risk clauses and the related basis in the contract, improving the readability and usability of the review results. Furthermore, the structured review result provides a clear data foundation for subsequent feedback iterations, making it easier for the system to analyze user acceptance of each specific review point and thereby enabling more targeted optimization of retrieval strategies and prompt word configurations.
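A minimal sketch of the JSON-based parsing approach follows. It assumes the prompt asked the model for a JSON array with the five fields named below; the field names, the `ReviewFinding` type, and the sample output are illustrative and not taken from the source.

```python
import json
from dataclasses import dataclass

# Assumed field names, one per key field listed in the text.
FIELDS = ("risk_clause", "violation_nature", "regulatory_basis",
          "suggestion", "source_index")


@dataclass
class ReviewFinding:
    risk_clause: str
    violation_nature: str
    regulatory_basis: str
    suggestion: str
    source_index: str


def parse_review_opinion(raw_text):
    """Extract the JSON array from the model output and validate each item."""
    # Tolerate prose around the array by slicing from the first '[' to the last ']'.
    start, end = raw_text.find("["), raw_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    findings = []
    for item in json.loads(raw_text[start:end + 1]):
        missing = [name for name in FIELDS if name not in item]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        findings.append(ReviewFinding(**{name: item[name] for name in FIELDS}))
    return findings


# Made-up model output for illustration only.
raw = '''Here are the findings:
[{"risk_clause": "Clause 8.2: payment within 180 days",
  "violation_nature": "payment term exceeds internal limit",
  "regulatory_basis": "Procurement Rules, Article 12",
  "suggestion": "shorten the payment term to 60 days",
  "source_index": "block-12"}]'''
findings = parse_review_opinion(raw)
print(findings[0].risk_clause)
```

Rejecting items with missing fields, rather than filling defaults, keeps malformed model output from silently reaching the review report; a rule engine or second parsing model could handle the rejected cases.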
[0046] S106. Based on user feedback on the review results, adjust at least one of the strategy parameters used in the multi-channel retrieval and re-ranking, as well as the weight and content of the prompt word template library, and trigger an incremental update of the contract knowledge base.
[0047] Among them, the strategy parameters used in multi-channel retrieval and the strategy parameters used in re-ranking jointly determine the recall quality and ranking accuracy of the retrieval results. By dynamically adjusting them, the review capability can be continuously optimized.
[0048] Specifically, after the review results are output, user feedback is collected. Based on this feedback, the system automatically adjusts the strategy parameters used for multi-channel retrieval and re-ranking, as well as at least one of the weights and contents of the prompt word template library, while simultaneously triggering incremental updates to the contract knowledge base. Through this closed-loop feedback mechanism, retrieval accuracy, ranking effectiveness, and prompt word adaptability are improved, ensuring continuous optimization of review capabilities. This enhances the accuracy of search results, the effectiveness of ranking and filtering, and the matching degree between prompt words and review scenarios, thereby improving the accuracy and efficiency of intelligent contract review.
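The parameter adjustment in this feedback loop might look like the sketch below. The rule shown (raise both thresholds by a fixed step when too few cited bases were judged relevant) is purely an assumed example; the source does not specify how parameters are adjusted, and the parameter and feedback field names are hypothetical.

```python
def adjust_strategy(params, feedback, min_relevant_share=0.7, step=0.05):
    """Return updated strategy parameters based on user feedback records."""
    if not feedback:
        return dict(params)
    relevant_share = sum(1 for f in feedback if f["basis_relevant"]) / len(feedback)
    updated = dict(params)  # never mutate the caller's parameters
    if relevant_share < min_relevant_share:
        # Assumed rule: too many irrelevant bases, so filter more strictly.
        updated["vector_similarity_threshold"] = min(
            0.95, round(params["vector_similarity_threshold"] + step, 2))
        updated["rerank_score_threshold"] = min(
            0.95, round(params["rerank_score_threshold"] + step, 2))
    return updated


params = {"vector_similarity_threshold": 0.50,
          "rerank_score_threshold": 0.60,
          "top_k": 20}
# Made-up feedback: only one of four cited bases was judged relevant.
feedback = [{"basis_relevant": True}, {"basis_relevant": False},
            {"basis_relevant": False}, {"basis_relevant": False}]
new_params = adjust_strategy(params, feedback)
print(new_params)
```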
[0049] This embodiment improves the accuracy and relevance of retrieval results by combining multi-channel retrieval and re-ranking; enhances the system's adaptability to different contract types and review scenarios through dynamic prompt word optimization; and, through a closed-loop feedback mechanism and incremental updates to the knowledge base, is able to keep improving its review results as the business develops, thereby improving the accuracy and efficiency of intelligent contract review.
[0050] In summary, the contract review method provided by this invention performs multi-channel retrieval on the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results. Multi-channel retrieval acquires knowledge related to the contract content from different dimensions, improving the comprehensiveness of the retrieval results. The contract retrieval results are re-ranked to generate an enhanced context for the contract to be reviewed, ensuring that the knowledge basis input into the large language model has high accuracy and relevance. A contract prompt word template is matched from a prompt word template library according to the contract type of the contract to be reviewed and adjusted based on historical feedback information, generating optimized prompt words that are better suited to the current review scenario. The contract to be reviewed, the enhanced context, and the optimized prompt words are input into the pre-scheduled optimal large language model to generate contract review opinions. By combining high-quality enhanced context with targeted optimized prompt words, the accuracy and usability of the model's generated results are improved. The contract review opinions are parsed in a structured manner to output structured review results, enhancing their readability and practicality. Based on user feedback on the review results, at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, is adjusted, and an incremental update of the contract knowledge base is triggered. Through this closed-loop feedback mechanism, retrieval accuracy, ranking performance, and prompt word adaptability are improved, ensuring continuous optimization of review capabilities and ultimately enhancing the accuracy and efficiency of intelligent contract review.
[0051] In some embodiments of the present invention, as shown in Figure 2, the steps for constructing the pre-built contract knowledge base include:
S201. Use a large model to perform semantic segmentation on the input legal and regulatory text to generate semantic text blocks;
S202. For each of the semantic text blocks, use a T2Q question-answer pair generation model to generate multiple question-answer pairs;
S203. Encode the semantic text blocks and the corresponding question-answer pairs into semantic vectors and question-answer pair vectors respectively, store them in the text block vector library and the question-answer pair vector library respectively, and establish the mapping relationship between the semantic vectors and question-answer pair vectors and the original clause metadata to construct the contract knowledge base.
[0052] Semantic slicing refers to the process of using a large language model to semantically understand legal and regulatory texts, identifying natural paragraphs, clause boundaries, and semantic transition points, and dividing the text into semantically complete and independent blocks. The large language model refers to a pre-trained language model with text understanding and semantic analysis capabilities. Optionally, this large language model includes, but is not limited to: GPT series models (such as GPT-3.5, GPT-4), Claude series models, LLaMA series and its derivative models (such as Chinese-LLaMA, Alpaca), etc.
[0053] T2Q (Text-to-Question) question-answer pair generation models are deep learning models capable of extracting or generating relevant questions and their corresponding answers from given text. These models take semantic text blocks as input and output several question-answer combinations related to the content of those text blocks. Examples of such models include, but are not limited to, the T5 (Text-to-Text Transfer Transformer) series, the BART (Bidirectional and Auto-Regressive Transformer) model, the UnifiedQA model, and the GPT series. In practical applications, the accuracy and relevance of generated question-answer pairs can be further improved through fine-tuning on domain-specific datasets.
[0054] Vectorization refers to the process of converting text data into a high-dimensional vector representation using a vectorization model. Vectorization models include, but are not limited to, BGE-M3, OpenAI Embedding, or text2vec. Metadata refers to information describing the source of the original provisions, including but not limited to the name of the law to which the provisions belong, the clause number, the chapter position, and the document identifier.
[0055] Specifically, the process involves acquiring legal and regulatory texts that require knowledge-based processing. A large-scale model is used to semantically segment the input legal and regulatory texts, generating semantic text blocks. Unlike traditional fixed-length mechanical segmentation methods, this semantic segmentation process utilizes a large language model to semantically understand the text content, ensuring that each segmented text block remains semantically complete and independent.
[0056] For each semantic text block, a T2Q question-answer pair generation model is used to automatically generate multiple question-answer pairs. This model takes the semantic text block as input and outputs several question-answer combinations related to its content. A vectorization model then encodes both the semantic text blocks and the question-answer pairs into high-dimensional vectors: each semantic text block is vectorized to obtain a corresponding semantic vector, and each question-answer pair is vectorized to obtain a corresponding question-answer pair vector. The vectorized data are stored in the text block vector library and the question-answer pair vector library, respectively. During storage, a mapping between the vectors and the original text metadata is established at the same time, recording the correspondence between vector identifiers and metadata, so that the original text content can be located quickly from any subsequently retrieved vector.
[0057] Understandably, this embodiment ensures the semantic integrity of knowledge units through semantic segmentation; it transforms static text into knowledge units that can be directly matched by natural language queries through the T2Q model, thereby achieving multi-dimensional organization of knowledge; and it provides support for efficient similarity retrieval and result tracing through vectorized storage and metadata mapping, thereby improving the accuracy of knowledge retrieval and the credibility of results in subsequent review processes.
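The construction steps S201 to S203 can be sketched as follows. This is only a minimal illustration under stated assumptions: `split_semantic`, `generate_qa_pairs`, and `embed` are hypothetical callables standing in for the large-model slicer, the T2Q model, and the vectorization model, and the toy versions used here are trivial placeholders rather than real models. The identifier scheme and metadata fields are likewise assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


@dataclass
class KnowledgeBase:
    block_vectors: Dict[str, Vector] = field(default_factory=dict)  # text block vector library
    qa_vectors: Dict[str, Vector] = field(default_factory=dict)     # question-answer pair vector library
    metadata: Dict[str, dict] = field(default_factory=dict)         # vector id -> original clause metadata


def build_knowledge_base(
    text: str,
    meta: dict,
    split_semantic: Callable[[str], List[str]],
    generate_qa_pairs: Callable[[str], List[Tuple[str, str]]],
    embed: Callable[[str], Vector],
) -> KnowledgeBase:
    kb = KnowledgeBase()
    for i, block in enumerate(split_semantic(text)):            # S201: semantic text blocks
        block_id = f"{meta['doc_id']}#b{i}"
        kb.block_vectors[block_id] = embed(block)               # S203: semantic vector
        kb.metadata[block_id] = {**meta, "text": block}
        for j, (q, a) in enumerate(generate_qa_pairs(block)):   # S202: question-answer pairs
            qa_id = f"{block_id}#qa{j}"
            kb.qa_vectors[qa_id] = embed(q + " " + a)           # S203: question-answer pair vector
            kb.metadata[qa_id] = {**meta, "question": q, "answer": a, "block_id": block_id}
    return kb


# Toy placeholders (assumptions, not real models).
def toy_split(text: str) -> List[str]:
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def toy_t2q(block: str) -> List[Tuple[str, str]]:
    return [("What does this provision state?", block)]


def toy_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


kb = build_knowledge_base(
    "Article 1. A.\n\nArticle 2. B.",
    {"doc_id": "law-001", "law_name": "Example Law"},
    toy_split, toy_t2q, toy_embed,
)
```

Keeping the metadata keyed by the same identifier as the vector is what allows a retrieved vector to be traced back to its original clause.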
[0058] In some embodiments of the present invention, as shown in Figure 3, step S101 includes:
S301. Retrieve text blocks from the text block vector library that are semantically similar to the contract to be reviewed, and obtain the first channel retrieval result;
S302. Retrieve question-answer pairs from the question-answer pair database that match the content of the contract to be reviewed, and obtain the second channel retrieval result;
S303. Merge the first channel retrieval result and the second channel retrieval result to form the contract retrieval result.
[0059] Multi-channel retrieval refers to the process of executing two retrieval channels in parallel, retrieving knowledge fragments related to the contract under review from a text block vector library and a question-answer pair vector library respectively, and then merging the retrieval results from the two channels.
[0060] Specifically, upon receiving a contract to be reviewed, the system retrieves text blocks semantically similar to the contract from a text block vector library, yielding the first-channel retrieval result. This retrieval process vectorizes the contract and calculates similarity in the text block vector library, recalling original legal and regulatory provisions semantically similar to the contract content. The system then retrieves question-and-answer pairs matching the contract content from a question-and-answer pair library, yielding the second-channel retrieval result. This process uses the contract text as the query, matches pre-defined question-and-answer pairs related to the contract content in the question-and-answer pair library, and recalls corresponding question-and-answer knowledge units. Finally, the first-channel and second-channel retrieval results are merged to form the final contract retrieval result.
[0061] Understandably, this embodiment executes two search channels in parallel to obtain knowledge related to the contract under review from the original text dimension and the question-and-answer pair dimension, respectively. This ensures that the search results include both the complete original text of the laws and regulations and the refined question-and-answer knowledge, thereby improving the comprehensiveness and diversity of the recall results. This provides richer candidate basis for the subsequent re-ranking process and avoids information omissions caused by a single search method.
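A minimal sketch of the parallel two-channel retrieval and merge (S301 to S303) is given below. The two search functions are hypothetical callables supplied by the caller; tagging each hit with its channel and dropping duplicate identifiers are assumptions of this sketch, since the text only states that the two results are merged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Hit = Dict[str, object]


def multi_channel_retrieve(
    contract_text: str,
    search_blocks: Callable[[str], List[Hit]],    # channel 1: text block vector library
    search_qa_pairs: Callable[[str], List[Hit]],  # channel 2: question-answer pair library
) -> List[Hit]:
    # Run both channels in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(search_blocks, contract_text)
        f2 = pool.submit(search_qa_pairs, contract_text)
        first, second = f1.result(), f2.result()

    # Merge, keeping the first occurrence of each id (assumed de-duplication rule).
    merged: List[Hit] = []
    seen = set()
    for channel, hits in (("text_block", first), ("qa_pair", second)):
        for hit in hits:
            if hit["id"] not in seen:
                seen.add(hit["id"])
                merged.append({**hit, "channel": channel})
    return merged
```

Recording the channel on each hit lets the later re-ranking stage, or a reviewer, see whether a fragment came from the original provisions or from the question-answer knowledge.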
[0062] In some embodiments of the present invention, as shown in Figure 4, step S301 includes:
S401. Obtain the retrieval strategy parameters adopted by the multi-channel retrieval. The retrieval strategy parameters include a preset document block size and overlapping window ratio, a vector retrieval weight, a keyword retrieval weight, a vector similarity threshold, and an initial recall quantity Top-K. The text blocks in the text block vector library are generated by semantic slicing of the legal and regulatory text based on the document block size and the overlapping window ratio;
S402. Using a hybrid retrieval strategy, calculate the vector similarity score between the contract to be reviewed and each text block, and the keyword matching score between the contract to be reviewed and each text block;
S403. Fuse the vector similarity score and the keyword matching score according to the vector retrieval weight and the keyword retrieval weight to obtain a comprehensive relevance score;
S404. Filter out text blocks whose vector similarity scores are lower than the vector similarity threshold, and recall K text blocks from the remaining text blocks in descending order of comprehensive relevance score as the first channel retrieval result.
[0063] The hybrid retrieval strategy refers to a retrieval method that combines vector semantic retrieval with keyword full-text retrieval and integrates the results of the two through weighting to obtain a more comprehensive retrieval effect.
[0064] Vector retrieval weight and keyword retrieval weight are used to control the proportion of vector similarity score and keyword matching score in the overall relevance score, respectively, and the sum of the two is 1.
[0065] The comprehensive relevance score is a score obtained by combining the vector similarity score and the keyword matching score according to preset weights. It is used to comprehensively measure the relevance of the text block to the contract under review.
[0066] Vector similarity threshold refers to a preset score used to filter low-relevance text blocks. Top-K initial recall refers to the number of candidate text blocks initially recalled during the retrieval phase.
[0067] Document chunk size refers to the length of text blocks set when semantically slicing legal and regulatory texts. Overlap window ratio refers to the proportion of the overlap area between adjacent text blocks to the length of the original text block.
[0068] Specifically, when performing text block retrieval, preset retrieval strategy parameters are obtained. Based on the document block size and overlap window ratio, the legal and regulatory text is semantically sliced, generating text blocks and storing them in a text block vector library. As an optional approach, the document block size can be set to 512 to 1024 tokens, and the overlap window ratio can be set to 15% to 25% to ensure that each text block maintains a relatively independent semantic unit, while avoiding semantic breaks caused by clauses being fragmented at segmentation boundaries.
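How the document block size and the overlap window ratio interact can be illustrated with a fixed-window sketch. This is a deliberate simplification: the text describes semantic slicing by a large language model, whereas the code below merely slides a window over a token list. The function name is hypothetical, and tokens are assumed to be already available as a list.

```python
from typing import List


def sliding_chunks(tokens: List[str], chunk_size: int, overlap_ratio: float) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share an overlapping region."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)   # tokens shared by adjacent blocks
    step = max(1, chunk_size - overlap)         # distance between block starts
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):   # last window reached the end
            break
        start += step
    return chunks


# With 10 tokens, a block size of 4 and a 25% overlap, adjacent blocks share one token.
example = sliding_chunks([str(i) for i in range(10)], chunk_size=4, overlap_ratio=0.25)
```

With the ranges given above (512 to 1024 tokens, 15% to 25% overlap), a clause that straddles a boundary appears in full in at least one block as long as it is shorter than the overlap region.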
[0069] A hybrid retrieval strategy is employed to calculate both the vector similarity score and the keyword matching score between the contract under review and each text block. Vector retrieval captures semantic relationships between clauses, such as the conceptual similarity between "liability for breach of contract" and "liability for compensation." Keyword retrieval ensures accurate matching of precise terms and values, such as contract numbers, amount percentages, and specific clause numbers.
[0070] The vector similarity score and keyword matching score are fused together using vector retrieval weights and keyword retrieval weights to obtain a comprehensive relevance score. The vector retrieval weight α can be set from 0.3 to 0.8, and the corresponding keyword retrieval weight is 1-α. Adjusting this weight can balance the contributions of semantic matching and exact matching in the retrieval process.
[0071] Text blocks with vector similarity scores below a threshold (0.65 to 0.85) are filtered out to reduce noise interference. From the remaining text blocks, K text blocks are recalled in descending order of their overall relevance scores as the first-channel retrieval results. The initial recall size Top-K can be set to 20 to 50 to determine the number of candidate document blocks to be sent to the re-ranking stage.
[0072] Understandably, this embodiment balances the contributions of semantic matching and exact matching by setting vector retrieval weights and keyword retrieval weights, enabling the retrieval results to capture semantic relationships between terms while ensuring accurate matching of precise terms and values; it filters low-relevance text blocks by using vector similarity thresholds to reduce noise in subsequent processing; it ensures the semantic integrity of text blocks by setting document block size and overlapping window ratio; and it achieves a balance between recall rate and processing efficiency by setting the initial recall number, providing high-quality candidate results for subsequent re-ranking stages.
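Steps S402 to S404 can be sketched as below. The fused score is taken as α · vector score + (1 − α) · keyword score, matching the weights described above. The keyword score used here (fraction of query terms found in the block) is an assumption, since the text does not fix a formula, and the field names `id`, `vector`, and `terms` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query_terms: Sequence[str], block_terms: Sequence[str]) -> float:
    # Assumed scoring rule: share of distinct query terms present in the block.
    q = set(query_terms)
    return len(q & set(block_terms)) / len(q) if q else 0.0


def hybrid_retrieve(
    query_vec: Sequence[float],
    query_terms: Sequence[str],
    blocks: List[Dict],
    alpha: float = 0.6,           # vector retrieval weight; keyword weight is 1 - alpha
    sim_threshold: float = 0.65,  # vector similarity threshold
    top_k: int = 20,              # initial recall quantity Top-K
) -> List[Dict]:
    scored = []
    for b in blocks:
        vec = cosine(query_vec, b["vector"])                 # S402
        if vec < sim_threshold:                              # S404: drop low-similarity blocks
            continue
        kw = keyword_score(query_terms, b["terms"])          # S402
        combined = alpha * vec + (1 - alpha) * kw            # S403
        scored.append({"id": b["id"], "vector_score": vec, "keyword_score": kw, "score": combined})
    scored.sort(key=lambda r: r["score"], reverse=True)      # S404: descending order
    return scored[:top_k]
```

Filtering on the vector score before ranking on the fused score means that a block with strong keyword overlap but weak semantic similarity is still excluded, which is the behaviour step S404 describes.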
[0073] In some embodiments of the present invention, step S302 includes: obtaining the retrieval strategy parameters used in the multi-channel retrieval, wherein the retrieval strategy parameters further include a similarity threshold and an initial recall quantity Top-N; vectorizing the contract to be reviewed to obtain a query vector; calculating the similarity score between the query vector and each question-answer pair vector in the question-answer pair database; filtering out question-answer pairs with similarity scores lower than the similarity threshold; and recalling N question-answer pairs that match the content of the contract to be reviewed in descending order of similarity scores as the second channel retrieval results.
[0074] The retrieval strategy parameters refer to the behavioral parameters used to control the retrieval process, including the similarity threshold and the initial recall quantity Top-N.
[0075] The similarity threshold is used to set the lower limit of the vector similarity score. Question-answer pairs below this threshold are judged as irrelevant. The initial recall number Top-N is used to determine the final number of question-answer pairs recalled.
[0076] A query vector is a high-dimensional vector representation obtained by transforming the text of the contract to be reviewed through a vectorization model. This vector is located in the same vector space as the vectors in the question-answering database, which facilitates similarity calculation.
[0077] Similarity score is a measure of the vector distance between the query vector and the question-answer pair vector. It can be calculated using cosine similarity or inner product. The higher the score, the stronger the semantic relevance between the two.
[0078] Specifically, when retrieving from the question-answer pair database, the retrieval strategy parameters used for multi-channel retrieval are first obtained, including the similarity threshold and the initial recall of Top-N pairs. The text of the contract to be reviewed is vectorized to obtain a query vector. In the question-answer pair database, the similarity score between the query vector and each question-answer pair vector is calculated one by one to obtain the relevance score of each question-answer pair to the contract to be reviewed. Based on the preset similarity threshold, question-answer pairs with similarity scores below the threshold are filtered out to exclude results with low relevance to the contract to be reviewed. From the remaining question-answer pairs, the top N question-answer pairs are selected in descending order of similarity scores as the second-channel retrieval results.
[0079] Understandably, this embodiment uses vectorized retrieval and threshold filtering to quickly filter out question-answer pairs that highly match the content of the contract to be reviewed from the question-answer pair library, thereby improving the efficiency and accuracy of the second-channel retrieval. By setting the initial recall quantity to Top-N, the number of candidates sent to the subsequent re-ranking stage is controlled, ensuring information sufficiency while avoiding unnecessary computational overhead.
[0080] In some embodiments of the present invention, step S102 includes: obtaining the ranking strategy parameters used for the re-ranking, the ranking strategy parameters including the Top-N number of documents to be retained after re-ranking and a re-ranking score threshold; adopting a Cross-Encoder re-ranking strategy, using a fine-ranking model to perform deep semantic relevance evaluation on the candidate document blocks in the contract retrieval results, and obtaining a relevance score for each candidate document block; re-ranking each candidate document block according to the relevance score; filtering out candidate document blocks with relevance scores lower than the re-ranking score threshold; and selecting the N document blocks with the highest relevance scores from the remaining candidate document blocks according to the Top-N number of documents to be retained after re-ranking, as the enhanced context of the contract to be reviewed.
[0081] Re-ranking refers to the process of re-ranking the initial search results. The Cross-Encoder re-ranking strategy refers to a ranking model that uses a cross-encoder architecture. This model can simultaneously encode the query and candidate documents and perform deep semantic interaction calculations to output a relevance score.
[0082] A fine-ranking model refers to a deep learning model that has been trained to perform fine-grained ranking of candidate results. Optionally, fine-ranking models include, but are not limited to, BERT series models, RoBERTa models, DeBERTa models, or lightweight ranking models (such as MiniLM, DistilBERT), etc.
[0083] The sorting strategy parameters are adjustable parameters used to control the number and precision of screening during the re-sorting process, including the number of items to retain after re-sorting and the re-sorting score threshold.
[0084] Specifically, the ranking strategy parameters used for re-ranking are obtained, including the Top-N retained quantities after re-ranking and the re-ranking score threshold. A Cross-Encoder re-ranking strategy is adopted, and a fine-grained ranking model is used to evaluate the deep semantic relevance of each candidate document block in the contract retrieval results. The fine-grained ranking model takes the contract to be reviewed and each candidate document block as input simultaneously, and captures the fine-grained semantic interaction between them through an internal cross-attention mechanism, outputting a relevance score representing the degree of relevance. This score reflects the practical value of the candidate document block for the current review task.
[0085] Candidate document blocks are reordered based on relevance scores, prioritizing those with higher relevance. Candidate document blocks with relevance scores below the reordering threshold are filtered out, excluding knowledge fragments with low relevance to review requirements and preventing low-quality content from entering subsequent generation stages. The top-N most relevant document blocks after reordering are selected as the enhanced context for the contract under review. For example, when Top-N is set to 5, only the 5 most relevant knowledge fragments are ultimately retained.
[0086] Understandably, this embodiment uses a Cross-Encoder reordering strategy to achieve a deep evaluation of the fine-grained semantic interaction between candidate document blocks and the contract to be reviewed, improving the ranking result from "semantic similarity" to "deep relevance"; by setting a reordering score threshold to filter out low-relevance segments, the quality of the enhanced context is ensured; by setting the number of elements retained after reordering to control the context length, a balance is achieved between information sufficiency and model input constraints, thereby improving the accuracy of the review opinions generated by the large language model.
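The score, filter, and truncate sequence of the re-ranking step can be sketched as follows. The pair scorer is passed in as a hypothetical callable `score_pair(query, document)`; in practice it would wrap a cross-encoder model, while the toy scorer used here is only word overlap and is not a cross-encoder.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],  # stand-in for the fine-ranking model
    score_threshold: float,                   # re-ranking score threshold
    top_n: int,                               # number of blocks retained after re-ranking
) -> List[Tuple[float, str]]:
    scored = [(score_pair(query, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= score_threshold]  # filter low relevance
    kept.sort(key=lambda p: p[0], reverse=True)                 # highest relevance first
    return kept[:top_n]                                         # enhanced context


def toy_score(query: str, doc: str) -> float:
    # Placeholder relevance: Jaccard overlap of lower-cased words (assumption).
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if (q | d) else 0.0


context = rerank(
    "late payment penalty",
    ["penalty for late payment", "governing law clause", "late payment"],
    toy_score, score_threshold=0.5, top_n=2,
)
```

Because the scorer sees the query and the candidate together, swapping `toy_score` for a real cross-encoder changes only the callable, not the filtering and truncation logic.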
[0087] In some embodiments of the present invention, step S104 includes: adjusting the prompt word elements in the contract prompt word template based on the historical feedback information, wherein the prompt word elements include at least one of review focus, output format rigor, and role setting, and generating the optimized prompt word.
[0088] Historical feedback information refers to the evaluation data of users on previous contract review results, including at least one of the following: the degree of acceptance of the review opinions, feedback on the accuracy of the source of evidence, and the score of the overall review results.
[0089] The prompt word elements refer to the core configuration items that constitute the prompt word template, including the review focus, the rigor of the output format, and the role setting. The review focus indicates the specific types of clauses or risk areas that the large language model should concentrate on during the review; the rigor of the output format controls the level of detail and structure of the review opinions; and the role setting defines the professional role played by the large language model during the review.
[0090] Specifically, in the process of generating optimized prompt words, user feedback on historical review results is first obtained. When a user gives a low rating to the review of a certain type of contract, or reports that the cited basis is inaccurate, this feedback is recorded and its relationship to the current prompt word elements is analyzed. Based on the analysis results, the prompt word elements in the prompt word template are adjusted.
[0091] Furthermore, the adjusted prompt word elements are applied to the prompt word template corresponding to the current contract type to generate optimized prompt words. These optimized prompt words, along with the contract to be reviewed and the enhanced context, are input into the large language model to guide the model in generating review opinions according to the adjusted review focus, output format, and role positioning.
[0092] In one specific implementation, for procurement contracts, if historical feedback shows that users are more concerned about supplier qualification review and payment terms, the review focus will be adjusted to "supplier qualification verification and compliance of payment terms"; if users report that previous review opinions were too brief, the rigor of the output format will be increased, requiring the output to include detailed content such as the original text of the terms, the basis for the violation, the risk level, and suggested modification plans; if users want the review results to be more in line with the perspective of legal professionals, the role will be adjusted to a contract review expert with corporate legal experience.
[0093] Understandably, this embodiment dynamically adjusts the prompt word elements by introducing historical feedback information, enabling the prompt words to be adaptively optimized according to user preferences and the actual needs of the review scenario. This improves the relevance and practicality of the generated content and avoids the problem that static prompt words are difficult to adapt to diverse contract types and review requirements.
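One possible way to adjust the three prompt word elements from historical feedback is sketched below. Everything specific here is an assumption made for illustration: the feedback fields, the 1 to 5 rating scale, the majority rule for raising output rigor, and the wording of the rendered prompt are not prescribed by the text.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PromptElements:
    review_focus: str
    output_rigor: str  # "brief" or "detailed" (assumed levels)
    role: str


@dataclass(frozen=True)
class Feedback:
    rating: int                # hypothetical scale: 1 (poor) to 5 (good)
    too_brief: bool = False    # user reported that opinions were too brief
    requested_focus: str = ""  # area the user wanted reviewed more closely


def adjust_elements(base: PromptElements, history: List[Feedback], low_rating: int = 2) -> PromptElements:
    elements = base
    # Raise output rigor when most feedback says the opinions were too brief.
    if history and 2 * sum(f.too_brief for f in history) > len(history):
        elements = replace(elements, output_rigor="detailed")
    # Shift the review focus to the area most often requested in low-rated reviews.
    focuses = [f.requested_focus for f in history if f.requested_focus and f.rating <= low_rating]
    if focuses:
        elements = replace(elements, review_focus=max(set(focuses), key=focuses.count))
    return elements


def render_prompt(elements: PromptElements, contract_type: str) -> str:
    return (
        f"You are {elements.role}. Review the {contract_type} contract, "
        f"focusing on {elements.review_focus}. Output style: {elements.output_rigor}."
    )


base = PromptElements("general compliance", "brief", "a contract review expert")
history = [Feedback(2, True, "payment terms"), Feedback(1, True, "payment terms"), Feedback(5)]
optimized = adjust_elements(base, history)
```

Keeping the elements immutable and returning an adjusted copy leaves the stored template untouched, so each contract type can keep its own baseline.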
[0094] In some embodiments of the present invention, triggering the incremental update of the contract knowledge base includes: initiating the incremental update process through periodic or event-triggered methods; performing semantic slicing, question-answer pair generation, and vectorization processing on the newly added institutional text, and integrating it into the contract knowledge base.
[0095] Incremental updates refer to the process of dynamically incorporating newly added legal and regulatory texts into the existing knowledge base without rebuilding the entire knowledge base.
[0096] Event triggering refers to a startup method activated by specific conditions, including but not limited to user-initiated updates, administrator backend operations, detection of new files in the knowledge base directory, or receipt of update notifications from external systems.
[0097] Specifically, the incremental update process for the knowledge base is initiated when preset update conditions are met. There are two kinds of update condition: periodic triggering, which executes automatically according to a preset time cycle, such as weekly or monthly; and event triggering, which executes immediately upon receiving an external instruction or detecting newly added regulatory text. After the incremental update process is initiated, the newly added regulatory text is acquired and processed in sequence through semantic slicing, question-answer pair generation, and vectorization encoding to generate the corresponding semantic text blocks, question-answer pairs, and their vector representations. The processed data are then integrated into the existing text block vector library and question-answer pair vector library, respectively. At the same time, a mapping is established between the vectors and the metadata of the newly added provisions, so that the new content can be accessed and used normally in subsequent retrieval.
[0098] Understandably, this embodiment initiates incremental updates periodically or through event triggering, which ensures the timeliness of the knowledge base while avoiding the computational resource consumption caused by rebuilding the entire knowledge base. By automatically slicing, generating question-answer pairs, and vectorizing the newly added text, it ensures that the new content has a consistent data structure and representation with the original knowledge base, thereby improving the efficiency and flexibility of knowledge base maintenance.
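The idea of processing only newly added text, without rebuilding the whole knowledge base, can be sketched as follows. Detecting new documents by content hash is an assumption of this sketch, and `ingest` is a hypothetical callable that would perform the slicing, question-answer pair generation, and vectorization for one document.

```python
import hashlib
from typing import Callable, Dict, List, Set


def incremental_update(
    documents: Dict[str, str],          # doc id -> regulatory text currently available
    indexed_hashes: Set[str],           # content hashes already in the knowledge base
    ingest: Callable[[str, str], None]  # hypothetical per-document processing pipeline
) -> List[str]:
    """Ingest only documents whose content has not been indexed; return their ids."""
    added: List[str] = []
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in indexed_hashes:    # already part of the knowledge base
            continue
        ingest(doc_id, text)
        indexed_hashes.add(digest)
        added.append(doc_id)
    return added
```

The same function can serve both trigger modes: a scheduler can call it on a fixed cycle, and an event handler can call it when a new file or an external notification arrives.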
[0099] In some embodiments of the present invention, the structured review results include an interactive review report; the interactive review report has interactive access functionality, which includes at least one of source tracing, version comparison, and report export.
[0100] Interactive review reports refer to visual outputs generated based on structured data that allow users to interact with the report content.
[0101] Source tracing refers to the function of locating the original clause in the knowledge base from the source index in the review results.
[0102] Version comparison refers to the function of comparing the differences between review results of different versions. Report export refers to the function of outputting and saving the review results in a specified format.
[0103] Specifically, after the structured review results are generated, an interactive review report is generated from this structured data. The report presents key information such as the identified risk clauses, the nature of each violation, the specific regulatory basis, suggested modifications, and an index of the sources of that basis. The source index is presented in a clickable format; when the user clicks on an index entry, the report interface jumps to the corresponding original clause in the knowledge base and displays the complete original text of the law or regulation. For multiple rounds of review of the same contract, the report interface provides a version comparison entry; after the user selects different versions, the system displays the changes in the review results between versions with the differences highlighted. The report interface also provides an export control; after the user selects an export format, the system writes the current review results to a file in that format for saving or printing.
[0104] Understandably, this embodiment makes the process of obtaining review conclusions traceable by relying on the traceability function, thereby improving the credibility of the review results; the version comparison function makes it easy for users to track the evolution of review opinions, meeting the compliance management requirements for process traceability; and the report export function meets the archiving and circulation needs of review results in different scenarios.
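The three report functions can be sketched with standard-library tools. The structure of a finding (`clause`, `risk`, `source_id`) and the choice of JSON as the export format are assumptions made for illustration; the text does not specify either.

```python
import difflib
import json
from typing import Dict, List


def trace_source(finding: Dict, kb_metadata: Dict[str, Dict]) -> Dict:
    """Source tracing: follow the finding's source index to the original clause."""
    return kb_metadata[finding["source_id"]]


def compare_versions(old_lines: List[str], new_lines: List[str]) -> List[str]:
    """Version comparison: unified diff between two rounds of review results."""
    return list(difflib.unified_diff(old_lines, new_lines, fromfile="v1", tofile="v2", lineterm=""))


def export_report(findings: List[Dict]) -> str:
    """Report export: serialize the structured results (JSON assumed here)."""
    return json.dumps(findings, ensure_ascii=False, indent=2)


kb_metadata = {"law-001#b3": {"law_name": "Example Law", "clause": "Article 12"}}
finding = {"clause": "Payment within 180 days", "risk": "high", "source_id": "law-001#b3"}
diff = compare_versions(["Risk: high"], ["Risk: medium"])
```

Because each finding carries only the identifier of its source, the report stays small while the full original text remains retrievable from the knowledge base on demand.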
[0105] To better implement the contract review method in the embodiments of the present invention, and corresponding to that method, as shown in Figure 5, this embodiment of the invention also provides a contract review system. The contract review system 500 includes:
The retrieval unit 501, used to perform multi-channel retrieval of the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results;
The sorting unit 502, used to re-rank the contract retrieval results and generate an enhanced context for the contract to be reviewed;
The adjustment unit 503, used to match a contract prompt word template from the prompt word template library according to the contract type of the contract to be reviewed, and to adjust the contract prompt word template based on historical feedback information to generate optimized prompt words;
The generation unit 504, used to input the contract to be reviewed, the enhanced context, and the optimized prompt words into a pre-scheduled optimal large language model to generate contract review opinions;
The parsing unit 505, used to perform structured parsing on the contract review opinions and output structured review results;
The iterative optimization unit 506, used to adjust at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, based on user feedback on the review results, and to trigger incremental updates to the contract knowledge base.
[0106] The contract review system 500 provided in the above embodiments can implement the technical solutions described in the above contract review method embodiments. The specific implementation principles of each module or unit can be found in the corresponding content in the above contract review method embodiments, and will not be repeated here.
[0107] Accordingly, this application also provides a computer-readable storage medium for storing computer-readable programs or instructions. When the programs or instructions are executed by a processor, they can implement the steps or functions in the contract review methods provided in the above-described method embodiments.
[0108] Those skilled in the art will understand that all or part of the processes of the methods described in the above embodiments can be implemented by a computer program instructing related hardware (such as a processor, controller, etc.), and the computer program can be stored in a computer-readable storage medium. The computer-readable storage medium may be a disk, optical disk, read-only memory, or random access memory, etc.
[0109] The contract review method and system provided by this invention have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this invention. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this invention. Therefore, the content of this specification should not be construed as a limitation of this invention.
Claims
1. A contract review method, characterized in that it comprises:
performing multi-channel retrieval of the contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results;
re-ranking the contract retrieval results to generate an enhanced context for the contract to be reviewed;
matching a contract prompt word template from the prompt word template library according to the contract type of the contract to be reviewed, and adjusting the contract prompt word template based on historical feedback information to generate optimized prompt words;
inputting the contract to be reviewed, the enhanced context, and the optimized prompt words into a pre-scheduled optimal large language model to generate contract review opinions;
performing structured parsing on the contract review opinions and outputting structured review results;
based on user feedback on the review results, adjusting at least one of the strategy parameters used in the multi-channel retrieval and the re-ranking, as well as the weight and content of the prompt word template library, and triggering an incremental update of the contract knowledge base.
2. The contract review method according to claim 1, characterized in that the steps for constructing the pre-built contract knowledge base include:
using a large model to perform semantic segmentation on the input legal and regulatory text to generate semantic text blocks;
for each of the semantic text blocks, using a T2Q question-answer pair generation model to generate multiple question-answer pairs;
encoding the semantic text blocks and their corresponding question-answer pairs into semantic vectors and question-answer pair vectors respectively, storing them in the text block vector library and the question-answer pair vector library respectively, and establishing the mapping relationship between these vectors and the original clause metadata to construct the contract knowledge base.
3. The contract review method according to claim 2, characterized in that the multi-channel retrieval of the contract to be reviewed based on a pre-built contract knowledge base includes:
retrieving text blocks that are semantically similar to the contract to be reviewed from the text block vector library to obtain the first channel retrieval result;
retrieving question-answer pairs that match the content of the contract to be reviewed from the question-answer pair database to obtain the second channel retrieval result;
merging the first channel retrieval result and the second channel retrieval result to form the contract retrieval result.
4. The contract review method according to claim 3, characterized in that retrieving, from the text block vector library, text blocks that are semantically similar to the contract to be reviewed to obtain the first-channel retrieval results comprises: obtaining the retrieval strategy parameters used in the multi-channel retrieval, the retrieval strategy parameters including a preset document block size and overlapping window ratio, a vector retrieval weight, a keyword retrieval weight, a vector similarity threshold, and an initial recall quantity Top-K, wherein the text blocks in the text block vector library are generated by semantic segmentation of the legal and regulatory text based on the document block size and the overlapping window ratio; adopting a hybrid retrieval strategy to calculate, for each text block, a vector similarity score and a keyword matching score between the contract to be reviewed and that text block; combining the vector similarity score and the keyword matching score according to the vector retrieval weight and the keyword retrieval weight to obtain a comprehensive relevance score; filtering out text blocks whose vector similarity scores are lower than the vector similarity threshold; and recalling, from the remaining text blocks, the K text blocks with the highest comprehensive relevance scores as the first-channel retrieval results.
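The filtering and scoring steps of claim 4 can be illustrated with a minimal sketch. It assumes the comprehensive relevance score is a linear weighted sum of the two scores, which the claim does not specify; the names `ScoredBlock` and `hybrid_recall` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredBlock:
    block_id: str
    vector_score: float   # assumed: similarity in [0, 1], e.g. cosine
    keyword_score: float  # assumed: normalized keyword match score in [0, 1]


def hybrid_recall(blocks, vector_weight, keyword_weight, vector_threshold, top_k):
    """Return up to top_k (block_id, combined_score) pairs, best first."""
    # Drop blocks whose vector similarity is below the threshold.
    kept = [b for b in blocks if b.vector_score >= vector_threshold]

    # Assumption: the combination is a linear weighted sum.
    def combined(b):
        return vector_weight * b.vector_score + keyword_weight * b.keyword_score

    ranked = sorted(kept, key=combined, reverse=True)
    return [(b.block_id, combined(b)) for b in ranked[:top_k]]


blocks = [
    ScoredBlock("A", 0.9, 0.1),
    ScoredBlock("B", 0.6, 0.9),
    ScoredBlock("C", 0.3, 1.0),  # removed by the vector similarity threshold
    ScoredBlock("D", 0.7, 0.2),
]
first_channel = hybrid_recall(blocks, 0.7, 0.3, vector_threshold=0.5, top_k=2)
```

In this toy input, block C has the best keyword match but is still excluded, because the threshold applies to the vector similarity score alone rather than to the combined score.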
5. The contract review method according to claim 3, characterized in that retrieving, from the question-answer pair vector library, question-answer pairs that match the content of the contract to be reviewed to obtain the second-channel retrieval results comprises: obtaining the retrieval strategy parameters used in the multi-channel retrieval, the retrieval strategy parameters further including a similarity threshold and an initial recall quantity Top-N; vectorizing the contract to be reviewed to obtain a query vector; calculating, in the question-answer pair vector library, a similarity score between the query vector and each question-answer pair vector; filtering out question-answer pairs whose similarity scores are lower than the similarity threshold; and recalling, from the remaining question-answer pairs, the N question-answer pairs with the highest similarity scores as the second-channel retrieval results.
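The second-channel retrieval of claim 5 can be sketched as follows. Cosine similarity is assumed as the similarity measure, since the claim only speaks of a "similarity score"; the function names, the in-memory dictionary standing in for the vector library, and the two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_qa_pairs(query_vec, qa_vectors, threshold, top_n):
    """Return up to top_n (qa_id, score) pairs at or above threshold, best first."""
    scored = [(qa_id, cosine(query_vec, vec)) for qa_id, vec in qa_vectors.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


# Toy question-answer pair vector library (hypothetical 2-dimensional vectors).
qa_vectors = {
    "q1": [1.0, 0.0],
    "q2": [1.0, 1.0],
    "q3": [0.0, 1.0],
    "q4": [2.0, 1.0],
}
second_channel = retrieve_qa_pairs([1.0, 0.0], qa_vectors, threshold=0.5, top_n=2)
```

A production system would typically delegate this scan to a vector database with an approximate nearest-neighbor index; the exhaustive loop here is only meant to make the threshold and Top-N steps explicit.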
6. The contract review method according to claim 1, characterized in that re-ranking the contract retrieval results to generate the enhanced context for the contract to be reviewed comprises: obtaining the ranking strategy parameters used in the re-ranking, the ranking strategy parameters including a Top-N quantity to be retained after re-ranking and a re-ranking score threshold; adopting a Cross-Encoder re-ranking strategy, in which a fine-ranking model evaluates the deep semantic relevance of each candidate document block in the contract retrieval results to obtain a relevance score for each candidate document block; re-ranking the candidate document blocks according to the relevance scores; filtering out candidate document blocks whose relevance scores are lower than the re-ranking score threshold; and selecting, from the remaining candidate document blocks, the N document blocks with the highest relevance scores, according to the Top-N quantity, as the enhanced context for the contract to be reviewed.
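The re-ranking flow of claim 6 can be sketched with the fine-ranking model abstracted as a callable. The `toy_overlap_score` function below is only a stand-in so the example runs without a trained model; it is not a Cross-Encoder, and all names and values are hypothetical.

```python
def rerank(query, candidates, score_fn, score_threshold, top_n):
    """Score each candidate against the query, sort, filter, and keep top_n."""
    # score_fn stands in for the fine-ranking (Cross-Encoder) model: it takes
    # the (query, candidate) pair jointly and returns one relevance score.
    scored = [(block, score_fn(query, block)) for block in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    return kept[:top_n]


def toy_overlap_score(query, block):
    """Placeholder scorer: fraction of query words that appear in the block."""
    query_words = set(query.lower().split())
    block_words = set(block.lower().split())
    return len(query_words & block_words) / len(query_words) if query_words else 0.0


candidates = [
    "governing law and jurisdiction",
    "payment schedule",
    "penalty for late payment",
]
enhanced_context = rerank(
    "late payment penalty clause",
    candidates,
    toy_overlap_score,
    score_threshold=0.2,
    top_n=2,
)
```

Passing the scorer in as a parameter keeps the threshold and Top-N logic independent of whichever fine-ranking model is actually deployed.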
7. The contract review method according to claim 1, characterized in that adjusting the contract prompt word template based on historical feedback information to generate the optimized prompt words comprises: adjusting, based on the historical feedback information, prompt elements in the contract prompt word template to generate the optimized prompt words, the prompt elements including at least one of the following: review focus, output format rigor, and role setting.
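One way the adjustment in claim 7 might look is sketched below. The claim does not define how feedback maps to prompt elements, so the feedback record shape (`kind`, `topic`), the `min_reports` rule, and the names `PromptTemplate` and `adjust_template` are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptTemplate:
    role: str                    # role setting
    review_focus: tuple          # review focus topics
    strict_output: bool = False  # output format rigor


def adjust_template(template, feedback, min_reports=2):
    """Return a new template adjusted by repeated feedback (hypothetical rule)."""
    # Add a review focus once the same missed risk is reported often enough.
    missed = Counter(f["topic"] for f in feedback if f["kind"] == "missed_risk")
    extra_focus = tuple(
        topic
        for topic, count in missed.items()
        if count >= min_reports and topic not in template.review_focus
    )
    # Tighten output format rigor after repeated format complaints.
    format_errors = sum(1 for f in feedback if f["kind"] == "format_error")
    return replace(
        template,
        review_focus=template.review_focus + extra_focus,
        strict_output=template.strict_output or format_errors >= min_reports,
    )


base = PromptTemplate(role="procurement contract reviewer", review_focus=("payment terms",))
history = [
    {"kind": "missed_risk", "topic": "liability cap"},
    {"kind": "missed_risk", "topic": "liability cap"},
    {"kind": "missed_risk", "topic": "termination"},
    {"kind": "format_error"},
    {"kind": "format_error"},
]
optimized = adjust_template(base, history)
```

The template is immutable and a new instance is returned, so the original entry in the template library stays available for comparison or rollback.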
8. The contract review method according to claim 1, characterized in that triggering the incremental update of the contract knowledge base comprises: initiating an incremental update process periodically or in response to an event trigger; and performing semantic segmentation, question-answer pair generation, and vectorization on newly added institutional texts, and then integrating the results into the contract knowledge base.
9. The contract review method according to claim 1, characterized in that the structured review results include an interactive review report, the interactive review report providing interactive access functions that include at least one of the following: source tracing, version comparison, and report export.
10. A contract review system, characterized in that it comprises: a retrieval unit, configured to perform multi-channel retrieval for a contract to be reviewed based on a pre-built contract knowledge base to obtain contract retrieval results; a ranking unit, configured to re-rank the contract retrieval results to generate an enhanced context for the contract to be reviewed; an adjustment unit, configured to match a contract prompt word template from a prompt word template library according to the contract type of the contract to be reviewed, and to adjust the contract prompt word template based on historical feedback information to generate optimized prompt words; a generation unit, configured to input the contract to be reviewed, the enhanced context, and the optimized prompt words into a pre-scheduled optimal large language model to generate contract review opinions; a parsing unit, configured to perform structured parsing on the contract review opinions and output structured review results; and an iterative optimization unit, configured to adjust, based on user feedback on the review results, at least one of (i) the strategy parameters used in the multi-channel retrieval and the re-ranking and (ii) the weights and content of the prompt word template library, and to trigger an incremental update of the contract knowledge base.