Contract clause automatic review method and system

By combining a clause classification model and a retrieval method, and using a BERT pre-trained language model to represent contract clauses semantically, the problems of low efficiency and low accuracy in contract clause review are solved, and efficient and accurate risk identification of contract clauses is achieved.

CN115630843B · Active · Publication Date: 2026-06-16 · ASPIRE INFORMATION TECH BEIJING

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ASPIRE INFORMATION TECH BEIJING
Filing Date
2022-11-01
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing methods for reviewing contract terms are inefficient and inaccurate, especially in their lack of adaptability to different industries and fields, and cannot meet the demand for high accuracy.

Method used

A clause classification model is used to classify contract clauses and paragraphs. Combined with retrieval methods, a combination of BERT pre-trained language model and fully connected layers is used to generate risk category labels and probabilities, and a threshold is used to judge the risk of the clauses.

Benefits of technology

It improves the accuracy and efficiency of contract clause review, adapts to the long-tail distribution problem in different industries and fields, and enhances the overall accuracy of review results.

Generated by Eureka AI based on patent content.

Abstract

The application discloses a method and system for automatically reviewing contract clauses. The method includes: obtaining a contract text; splitting the contract text to obtain all clauses and / or paragraphs; classifying the clauses and / or paragraphs to obtain the risk category label corresponding to each clause and / or paragraph and the probability of that label; and, if the probability of the risk category label is greater than a set probability threshold, determining that the clause and / or paragraph is at risk and classifying it as a classified risk clause. With this scheme, the efficiency and accuracy of contract clause review can be effectively improved.

Description

Technical Field

[0001] This invention relates to the field of information processing technology, and specifically to a method and system for automatically reviewing contract terms.

Background Technology

[0002] Modern enterprise management involves numerous commercial activities, and the smooth operation of these activities requires the establishment of corresponding contracts between the transacting parties. The quality of the contract, the reasonableness of the agreed rights and obligations of both parties, and the existence of legal compliance risks in the contract clauses directly affect the success or failure of commercial activities and even of the enterprise. However, a company often has only a small number of legal personnel, and their professional level varies. Therefore, manually reviewing a large number of contracts is time-consuming, inefficient, cumbersome, and lacks rigor.

With the development of search engine technology and the improvement of enterprise contract management, some companies are manually building risk clause databases. For contract texts to be reviewed, information retrieval technology is used to match against the risk clause database. If a corresponding risk clause is found, the contract text is considered to have the corresponding risk. Compared with manual methods, information retrieval significantly improves efficiency. However, the current mainstream information retrieval technology uses the BM25 algorithm. This algorithm takes the contract clause to be retrieved and a risk clause database as input, and outputs a ranking of the relevant risk clauses in the database relative to the input clause. The top-ranked risk clause is the one with the highest matching degree. If the matching degree of the top-ranked clause exceeds a certain threshold, the contract clause to be retrieved can be considered to have that risk.

Existing risk clause retrieval methods mainly include the following steps: first, a word segmentation algorithm is used to extract the feature word set of the contract clause to be retrieved; second, based on whether the clauses in the risk clause database contain feature words from that set, a set of risk clauses to be ranked is defined; third, the relevance score between the query clause and each clause in the risk clause set is calculated; fourth, the matching risk clauses are given based on the ranking of the scores and the defined threshold conditions. The risk clause retrieval method mainly gives the final score based on the combination of keywords in the text, the statistical weight of the keywords, and the length of the text. This method can be effective in scenarios where the accuracy requirement is not high. However, contract clause review scenarios have higher accuracy requirements, and existing information retrieval methods cannot meet the needs of this scenario.

Summary of the Invention

[0003] This invention provides a method and system for automatically reviewing contract terms, so as to improve the efficiency and accuracy of contract term review.

[0004] Therefore, the present invention provides the following technical solution:

[0005] A method for automatically reviewing contract terms, the method comprising:

[0006] Obtain the contract text;

[0007] The contract text is split to obtain all clauses and / or paragraphs;

[0008] The clauses and / or paragraphs are classified using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels;

[0009] If the probability of the risk category label is greater than the set probability threshold, then the clause and / or paragraph is determined to be risky, and the clause and / or paragraph with risk is classified as a risky clause.

[0010] Optionally, splitting the contract text to obtain all clauses and / or paragraphs includes:

[0011] The contract text is split according to its inherent logical structure to obtain all clauses and / or paragraphs; or

[0012] The contract text is split according to its style and structure to obtain all clauses and / or paragraphs.

[0013] Optionally, the step of classifying the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probability of the risk category labels includes:

[0014] The clause and / or paragraph are classified using the aforementioned clause classification model to obtain the probability distribution of the risk category labels corresponding to the clause and / or paragraph;

[0015] The risk category label with the highest probability in the probability distribution is selected as the risk category label for the clause and / or paragraph.

[0016] Optionally, the method further includes constructing the clause classification model in the following manner:

[0017] Collect the text of the risk terms and the corresponding risk category labels;

[0018] The risk clause text and the corresponding risk category labels are processed to obtain a training dataset;

[0019] A clause classification model structure is constructed, which adopts the BERT classification model; the first layer of the model is a BERT pre-trained language model layer, the second layer of the model is a fully connected layer, and the third layer of the model is a softmax layer.

[0020] The BERT classification model is trained using the training dataset to obtain the clause classification model.

[0021] Optionally, the method further includes: if the probability of the category label is less than or equal to the threshold corresponding to the category label, determining by means of a retrieval method whether the clause and / or paragraph is a retrieval risk clause.

[0022] Optionally, determining by means of a retrieval method whether the clause and / or paragraph is a retrieval risk clause includes:

[0023] Using the content of the aforementioned terms and / or paragraphs as input, the search engine is invoked to obtain multiple search results and the corresponding scores for each search result;

[0024] If the highest score corresponding to the search result is greater than the set score threshold, then the clause and / or paragraph is determined to be a risky clause; otherwise, the clause and / or paragraph is determined to be a risk-free clause.

[0025] Optionally, the score is a score calculated using the BM25 algorithm.

[0026] Optionally, the method further includes: aggregating the classified risk clauses, the retrieved risk clauses, and the risk-free clauses to obtain the clause review results.

[0027] An automated contract terms review system, the system comprising:

[0028] The contract text acquisition module is used to acquire contract texts.

[0029] The splitting module is used to split the contract text to obtain all clauses and / or paragraphs;

[0030] The classification module is used to classify the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels;

[0031] The first judgment module is used to determine that the clause and / or paragraph has a risk if the probability of the risk category label is greater than a set probability threshold, and to classify the clause and / or paragraph with a risk as a risk-category clause.

[0032] Optionally, the splitting module is specifically used to split the contract text according to its inherent logical structure to obtain all clauses and / or paragraphs; or to split the contract text according to its style structure to obtain all clauses and / or paragraphs.

[0033] Optionally, the classification module includes:

[0034] A classification unit is used to classify the clauses and / or paragraphs using the clause classification model to obtain the probability distribution of the risk category labels corresponding to the clauses and / or paragraphs;

[0035] The selection unit is used to select the risk category label with the highest probability in the probability distribution as the risk category label of the clause and / or paragraph.

[0036] Optionally, the system further includes a retrieval module; the retrieval module is used to determine whether the clause and / or paragraph is a retrieval risk clause by means of a retrieval method when the probability of the category tag is less than or equal to the threshold corresponding to the category tag.

[0037] Optionally, the retrieval module includes:

[0038] The calling module is used to take the content of the clauses and / or paragraphs as input and call the clause retrieval engine to obtain multiple search results and the corresponding scores for each search result.

[0039] The second judgment module is used to determine that the clause and / or paragraph is a risky clause if the highest score corresponding to the search result is greater than a set score threshold, otherwise determine that the clause and / or paragraph is a risk-free clause.

[0040] Optionally, the system further includes an aggregation module for aggregating the classified risk clauses, the retrieved risk clauses, and the risk-free clauses to obtain the clause review results.

[0041] The automatic contract clause review method and system provided by this invention utilizes a clause classification model to classify contract clauses and / or paragraphs, obtaining risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels; based on the probabilities, it is determined whether the clauses and / or paragraphs pose a risk. Compared to existing clause retrieval methods, the model-based approach, because it uses massive training data as the modeling corpus, can effectively represent clauses using semantic vectors, thus greatly improving the accuracy of clause review results.

[0042] Furthermore, considering that contract texts may involve different industries and fields, essentially constituting a long-tailed dataset, there is a problem of extremely uneven distribution of training data, which can lead to unsatisfactory review results for certain types of clauses and / or paragraphs. Therefore, after reviewing the clauses and / or paragraphs in the contract text using a model-based approach, a retrieval method is used as a supplement to the clause classification model. This effectively addresses the long-tail clause problem in the classification model and greatly improves the adaptability of the present invention to different industries and fields.

Attached Figure Description

[0043] Figure 1 is a flowchart of an automatic contract terms review method provided in an embodiment of the present invention;

[0044] Figure 2 is a schematic diagram of the training process of the clause classification model in an embodiment of the present invention;

[0045] Figure 3 is another flowchart of the automatic contract clause review method provided in the embodiments of the present invention;

[0046] Figure 4 is a flowchart illustrating the clause review process using a retrieval method in an embodiment of the present invention;

[0047] Figure 5 is a schematic diagram of a contract terms automatic review system provided in an embodiment of the present invention;

[0048] Figure 6 is another structural diagram of the automatic contract terms review system provided in this embodiment of the invention.

Detailed Implementation

[0049] To enable those skilled in the art to better understand the embodiments of the present invention, the embodiments of the present invention will be further described in detail below with reference to the accompanying drawings and implementation methods.

[0050] Contract terms are the expression and formalization of contractual conditions, serving as the basis for defining the rights and obligations of the contracting parties. In legal terms, the content of a contract refers to its various clauses. Therefore, contract terms should be clear, definitive, and complete, and they should not contradict each other. Otherwise, the formation, effectiveness, and performance of the contract, as well as the achievement of its purpose, will be affected. Accurately understanding the meaning of the clauses is therefore crucial. The purpose of contract term review is to identify risks within the contract terms. Using clause retrieval methods improves efficiency and saves valuable time for legal experts. Reviewing against a manually built risk clause database also addresses the issue of inconsistent standards. However, the process of constructing risk clauses still requires the participation of numerous legal experts, and the accuracy of the review results needs further improvement.

[0051] To this end, embodiments of the present invention provide a method and system for automatic review of contract terms, which uses a clause classification model to classify contract terms and / or paragraphs to obtain risk category labels corresponding to the terms and / or paragraphs and the probability of the risk category labels; and determines whether the terms and / or paragraphs have risks based on the probabilities.

[0052] Figure 1 is a flowchart of an automatic contract terms review method provided in an embodiment of the present invention. This embodiment includes the following steps:

[0053] Step 101: Obtain the contract text.

[0054] In practical applications, the contract text can be obtained from the contract document to be reviewed uploaded by the user. The contract document format can be a Word file (doc / docx format) or a PDF file (PDF text or PDF scan). This embodiment of the invention does not limit the format.

[0055] The contract text is obtained by parsing the contract document. Specifically, if it is a Word document, an open-source Word document parsing tool can be used to obtain the text. If it is a PDF document, an open-source PDF parsing tool can be used. If it is a scanned PDF document, since each page is an image, a corresponding recognition tool with OCR (Optical Character Recognition) capabilities can be used to recognize the text in each image, ultimately obtaining the entire contract text.

[0056] Step 102: Split the contract text to obtain all clauses and / or paragraphs.

[0057] A contract text can be broken down into clauses based on its internal logical structure. For example, the first clause in a contract text often lists the two parties to the contract, the second clause lists the subject matter of the contract, and other clauses may include provisions on liability for breach of contract, dispute resolution, etc.

[0058] In this embodiment of the invention, one method of splitting the contract text is to split it according to its inherent logical structure to obtain the contract terms; another method is to split the contract text according to its style structure, such as line breaks, and to split the contract text into paragraphs based on the line breaks. Based on the above two splitting logics, a set of contract terms and paragraphs can be obtained.

[0059] It should be noted that in practical applications, the contract text can be split using any of the above methods, or both of the above methods can be used to split the contract, and the resulting clauses and paragraphs can be reviewed separately.
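The two splitting approaches in step 102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the heading pattern (`Article N`, `Clause N`, or `N.` at the start of a line) and the function names `split_by_logic` and `split_by_style` are assumptions chosen for English-language examples.

```python
import re

# Hypothetical heading pattern; a real contract corpus would need its own rules.
CLAUSE_HEADING = re.compile(r"^(?:Article|Clause)\s+\d+\b|^\d+\.\s", re.MULTILINE)


def split_by_logic(text: str) -> list[str]:
    """Split at clause headings, i.e. by the contract's logical structure."""
    starts = [m.start() for m in CLAUSE_HEADING.finditer(text)]
    # Include position 0 so any preamble before the first heading is kept.
    bounds = sorted({0, *starts, len(text)})
    pieces = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]


def split_by_style(text: str) -> list[str]:
    """Split at line breaks, i.e. by the contract's style structure."""
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = (
    "Article 1 Parties\nParty A and Party B.\n"
    "Article 2 Subject matter\nThe goods.\n"
)
clauses = split_by_logic(sample)     # two clauses, one per heading
paragraphs = split_by_style(sample)  # four non-empty lines
```

Both result sets can then be reviewed separately, as paragraph [0059] notes.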

[0060] Step 103: Use the clause classification model to classify the clauses and / or paragraphs to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels.

[0061] The clause classification model can be trained in advance by collecting a large amount of training data. The specific training process will be explained in detail later.

[0062] When classifying the clauses and / or paragraphs, each clause and / or paragraph is passed through the clause classification model in turn. The model outputs the probability distribution over the risk category labels for that clause and / or paragraph, and the risk category label with the highest probability in the distribution is selected as its risk category label.

[0063] Step 104: If the probability of the risk category label is greater than the set probability threshold, then the clause and / or paragraph is determined to have a risk, and the clause and / or paragraph with a risk is classified as a risk clause.

[0064] The probability threshold can be set according to the required accuracy of the review results, for example, the probability threshold can be set to 0.5. If the probability of a category label corresponding to a certain clause or paragraph is greater than 0.5, it is considered to have a risk.
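Steps 103 and 104 reduce to picking the most probable label and comparing its probability with the threshold. A minimal sketch, assuming the model output is available as a label-to-probability mapping; the function name `judge_clause` and the example labels are hypothetical.

```python
def judge_clause(probabilities: dict[str, float], threshold: float = 0.5):
    """Return (label, probability, is_risky) for one clause or paragraph."""
    # Select the risk category label with the highest probability.
    label = max(probabilities, key=probabilities.get)
    prob = probabilities[label]
    # The clause is risky only if the probability is strictly greater
    # than the threshold, matching the wording of step 104.
    return label, prob, prob > threshold


decision = judge_clause({"liability": 0.7, "payment": 0.2, "other": 0.1})
```

A probability exactly equal to the threshold is not treated as risky here, because the text requires "greater than".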

[0065] In this embodiment of the invention, the clause classification model is built based on the BERT pre-trained language model. Since the pre-trained language model uses massive training data as modeling corpus, it can effectively represent clauses using semantic vectors.

[0066] As shown in Figure 2, the training process of the clause classification model in this embodiment of the invention includes the following steps:

[0067] Step 201: Collect the risk clause text and the risk category label corresponding to the risk clause text.

[0068] Step 202: Process the risk clause text and the risk category label corresponding to the risk clause text to obtain the training dataset.

[0069] Specifically, the collected risk clause texts and their corresponding risk category labels are preprocessed and converted into inputs that the BERT classification model can accept, i.e., the model's training set.

[0070] Step 203: Build the clause classification model structure, wherein the clause classification model adopts the BERT classification model.

[0071] The first layer of the model is a BERT pre-trained language model layer, which is a semantic vector encoding layer. The output of the encoding layer includes a cls vector, which is the semantic representation of the entire text input.

[0072] The second layer of the model is a fully connected layer that takes the cls vector as input and outputs a vector of the same dimension as the number of classification types.

[0073] The third layer of the model is the softmax layer, which normalizes the output vector of the second layer so that each value lies between 0 and 1 and the values sum to 1, giving the category probability distribution of the clause.

[0074] After the model structure is built, the loss function will be specified based on the model's structure. The loss function used for the clause classification model is cross-entropy loss.
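The second and third layers and the loss can be illustrated with a small forward pass. This sketch does not run a BERT encoder: `cls_vector` is a 4-dimensional stand-in (real encoders output much larger vectors, for example 768 dimensions for BERT-base), and the weights, bias, and three-category setup are made-up values for illustration.

```python
import math


def linear(cls_vector, weights, bias):
    """Fully connected layer: one output value (logit) per risk category."""
    return [sum(w * x for w, x in zip(row, cls_vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Map logits to values in (0, 1) that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Cross-entropy loss for a single example with a known label index."""
    return -math.log(probs[target_index])


cls_vector = [0.2, -0.1, 0.4, 0.3]      # stand-in for the encoder's cls output
weights = [[0.5, 0.1, -0.2, 0.3],       # 3 categories x 4 input dimensions
           [-0.3, 0.2, 0.4, 0.1],
           [0.0, -0.1, 0.1, 0.2]]
bias = [0.0, 0.1, -0.1]

probs = softmax(linear(cls_vector, weights, bias))
loss = cross_entropy(probs, target_index=1)
```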

[0075] Step 204: Train the BERT classification model using the training dataset to obtain the clause classification model.

[0076] Before training the model, an optimization algorithm is specified. For clause classification, the AdamW optimization algorithm can be used. During model training, the model parameters are first initialized. Then, the loss function value of the model is calculated based on the batch input data, and the AdamW optimization algorithm is used to update the model parameters. This process of calculating the loss function value and updating the model parameters is executed iteratively until the model's loss function reaches the set value, that is, the model has achieved a good classification effect, at which point model training stops.

[0077] By following the steps above, the clause classification model can be obtained.
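The training loop in step 204 can be sketched in plain Python. This is a toy under stated assumptions, not the patent's training code: only a linear softmax head is trained on fixed feature vectors (no BERT encoder is fine-tuned), the whole dataset is used as one batch, and the hyperparameters, `toy_data`, and function names are illustrative.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def batch_loss_and_grad(W, batch):
    """Mean cross-entropy loss and its gradient for a linear softmax head."""
    n_cls, dim = len(W), len(W[0])
    grad = [[0.0] * dim for _ in range(n_cls)]
    loss = 0.0
    for x, y in batch:
        probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
        loss -= math.log(probs[y])
        for c in range(n_cls):
            err = probs[c] - (1.0 if c == y else 0.0)
            for j in range(dim):
                grad[c][j] += err * x[j] / len(batch)
    return loss / len(batch), grad


def train(data, n_cls, lr=0.05, betas=(0.9, 0.999), eps=1e-8,
          weight_decay=0.01, target_loss=0.1, max_steps=2000, seed=0):
    rng = random.Random(seed)
    dim = len(data[0][0])
    # Initialize the model parameters.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_cls)]
    m = [[0.0] * dim for _ in range(n_cls)]  # first-moment estimates
    v = [[0.0] * dim for _ in range(n_cls)]  # second-moment estimates
    history = []
    for step in range(1, max_steps + 1):
        loss, grad = batch_loss_and_grad(W, data)
        history.append(loss)
        if loss <= target_loss:  # stop once the loss reaches the set value
            break
        for c in range(n_cls):
            for j in range(dim):
                g = grad[c][j]
                m[c][j] = betas[0] * m[c][j] + (1 - betas[0]) * g
                v[c][j] = betas[1] * v[c][j] + (1 - betas[1]) * g * g
                m_hat = m[c][j] / (1 - betas[0] ** step)
                v_hat = v[c][j] / (1 - betas[1] ** step)
                # AdamW: weight decay is applied to the weight directly,
                # not folded into the gradient.
                W[c][j] -= lr * (m_hat / (math.sqrt(v_hat) + eps)
                                 + weight_decay * W[c][j])
    return W, history


# Two toy categories; the last feature acts as a constant bias input.
toy_data = [([1.0, 0.0, 1.0], 0), ([0.9, 0.1, 1.0], 0),
            ([0.0, 1.0, 1.0], 1), ([0.1, 0.9, 1.0], 1)]
W, history = train(toy_data, n_cls=2)
```

In practice a deep learning framework would supply the optimizer and update the encoder parameters as well; the loop above only shows the order of operations described in paragraph [0076].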

[0078] The automatic contract clause review method provided by this invention utilizes a clause classification model to classify contract clauses and / or paragraphs, obtaining risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels; based on the probabilities, it determines whether the clauses and / or paragraphs pose a risk. Compared to existing clause retrieval methods, the model-based approach, because it uses massive training data as the modeling corpus, can effectively represent clauses using semantic vectors, thus greatly improving the accuracy of clause review results.

[0079] Furthermore, considering that contract texts may involve different industries and fields, essentially forming a long-tailed distributed dataset—meaning some types of clauses account for over 20% of the actual business data, while others account for less than 0.1%—this leads to an extremely uneven distribution of training data. Even with data augmentation methods to mitigate this, unsatisfactory review results may still occur for certain types of clauses and / or paragraphs. Therefore, in another non-limiting embodiment of this invention, after reviewing the clauses and / or paragraphs in the contract text using a model-based approach, a retrieval method can be used as a supplement to the clause classification model. This better addresses the long-tail clause problem in the classification model and improves the adaptability of the invention to different industries and fields.

[0080] Figure 3 is another flowchart of the automatic contract terms review method provided in this embodiment of the invention, which includes the following steps:

[0081] Step 301: Obtain the contract text.

[0082] Step 302: Split the contract text to obtain all clauses and / or paragraphs.

[0083] Step 303: Classify the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels.

[0084] Steps 301 to 303 above are the same as steps 101 to 103 in the embodiment shown in Figure 1 and will not be described in detail here.

[0085] Step 304: Determine whether the probability of the risk category label is greater than the set probability threshold; if so, proceed to step 305; otherwise, proceed to step 306.

[0086] Step 305: Determine that the clause and / or paragraph is risky, and classify the risky clause and / or paragraph as a classified risk clause. Then proceed to step 307.

[0087] Step 306: Determine by means of a retrieval method whether the clause and / or paragraph is a retrieval risk clause.

[0088] Step 307: Aggregate the classified risk clauses, the search risk clauses, and the risk-free clauses to obtain the clause review results.
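The control flow of steps 303 to 307 can be sketched as one function that takes the classifier and the retrieval scorer as pluggable callables. The callables, the default thresholds, and the toy stand-ins below are assumptions for illustration; a real system would plug in the trained clause classification model and a clause retrieval engine.

```python
CLASSIFIED_RISK = "classified risk clause"
RETRIEVAL_RISK = "retrieval risk clause"
RISK_FREE = "risk-free clause"


def review_clauses(clauses, classify, top_retrieval_score,
                   prob_threshold=0.5, score_threshold=10.0):
    """Apply steps 303 to 307 to a list of clause strings.

    classify(clause) -> (label, probability)   # hypothetical classifier hook
    top_retrieval_score(clause) -> float       # hypothetical retrieval hook
    """
    results = []
    for clause in clauses:
        label, prob = classify(clause)                        # step 303
        if prob > prob_threshold:                             # steps 304-305
            verdict = CLASSIFIED_RISK
        elif top_retrieval_score(clause) > score_threshold:   # step 306
            verdict, label = RETRIEVAL_RISK, None
        else:
            verdict, label = RISK_FREE, None
        results.append({"clause": clause, "verdict": verdict, "label": label})
    return results                                            # step 307


# Toy stand-ins so the sketch runs without a trained model or a search engine.
def toy_classify(clause):
    return ("liability", 0.9) if "penalty" in clause.lower() else ("other", 0.3)


def toy_top_score(clause):
    return 12.0 if "exclusive" in clause.lower() else 1.0


report = review_clauses(
    ["A penalty of 50% applies to late delivery.",
     "Party B grants exclusive rights in all regions.",
     "This contract is made in two copies."],
    toy_classify, toy_top_score)
```

Retrieval is only consulted when the classifier's probability does not exceed its threshold, so the more expensive or less precise path runs on the uncertain clauses alone.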

[0089] In this embodiment, for clauses and / or paragraphs that the classification model cannot determine, the existence of risks is further determined through clause retrieval logic. Specifically, a clause retrieval engine can be used to perform the corresponding retrieval. Finally, the risk clause results given by the classification model are aggregated, thereby making the final clause review results more accurate.

[0090] It should be noted that the clause retrieval engine can be any existing search engine with corresponding functions, and this embodiment of the invention does not limit this.

[0091] In addition, embodiments of the present invention also provide a clause retrieval engine, which may include the following modules: an information storage module, a text indexing module, a retrieval condition parsing module, a ranking algorithm module, and a vocabulary parser module. Wherein:

[0092] The information storage module is mainly for efficient and reliable information storage and retrieval.

[0093] The text indexing module is used to build an inverted index for text. An inverted index is an index mapping from words to text objects. Given the search terms, the text corresponding to them can be located quickly through the inverted index.

[0094] The retrieval condition parsing module is used to parse the input query conditions. Since the input search conditions are often a combination of "AND", "OR", and "NOT" Boolean logic, the search conditions need to be analyzed by the parsing module to find the specific search terms and connect them with the pre-built inverted index.

[0095] The ranking algorithm module mainly calculates the relevance score between search terms and text, sums the relevance scores between the query conditions and each document, and then sorts the documents according to the relevance scores to give the ranking results.

[0096] The vocabulary parser module is mainly used for feature extraction from text. A common example is the word segmentation parser, which segments the text into words.

[0097] In addition, based on the clause retrieval requirements, it is necessary to collect the clause data that needs to be input into the clause retrieval engine, and to preprocess the collected clause data to convert it into a format acceptable to the clause retrieval engine. The retrieval engine's information storage function is then used to store the clause data in the corresponding storage table. While storing the data, the retrieval engine creates an inverted index for the clause data, making it possible to quickly match clauses with keywords in the subsequent process.

[0098] In this embodiment of the invention, the storage table may contain three fields: a unique identifier for the clause text, the content of the clause text, and a tag corresponding to the clause text. The unique identifier for the clause text is a string, the content of the clause text is a string and requires a corresponding vocabulary parser, and the tag field corresponding to the clause text is a string.
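The three-field storage table and the inverted index built over it can be sketched as follows. The class name `ClauseRecord`, the field names, and the regular-expression tokenizer are assumptions; the tokenizer is only a stand-in for the vocabulary parser, and text in languages without word delimiters would need a proper word segmentation parser.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseRecord:
    clause_id: str  # unique identifier of the clause text
    content: str    # clause text, passed through the vocabulary parser
    tag: str        # tag (risk label) corresponding to the clause text


def tokenize(text: str) -> list[str]:
    """Stand-in vocabulary parser for whitespace-delimited text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(records):
    """Map each word to the set of clause identifiers that contain it."""
    index = defaultdict(set)
    for record in records:
        for term in tokenize(record.content):
            index[term].add(record.clause_id)
    return index


records = [
    ClauseRecord("c1", "Late payment penalty of 30 percent", "penalty"),
    ClauseRecord("c2", "Payment due within 30 days", "payment"),
]
index = build_inverted_index(records)
candidates = index["payment"]  # identifiers of clauses mentioning "payment"
```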

[0099] Figure 4 is a flowchart of a clause review process performed via retrieval in an embodiment of the present invention, which includes the following steps:

[0100] Step 401: Using the content of the clauses and / or paragraphs as input, call the search engine to obtain multiple search results and the corresponding scores for each search result.

[0101] For example, for each input clause or paragraph, the top 10 clauses with the highest similarity scores are found from the clause database (i.e., the storage table mentioned earlier). The similarity scores here can be computed with the BM25 algorithm.

[0102] Step 402: Determine whether the highest score corresponding to the search result is greater than the set score threshold; if yes, proceed to step 403; otherwise, proceed to step 404.

[0103] The scoring threshold can be determined by business experts based on their actual business experience.

[0104] Step 403: Determine that the clause and / or paragraph is a retrieval risk clause.

[0105] Step 404: Determine that the clause and / or paragraph is a risk-free clause.
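Steps 401 to 404 can be sketched with a small in-memory BM25 scorer. This is an illustration under assumptions, not a search engine: it uses a common BM25 variant with a non-negative IDF term, the parameters `k1=1.5` and `b=0.75` are conventional defaults rather than values from the patent, and the tokenizer, function names, and example clauses are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a common BM25 variant."""
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0
    df = Counter(term for d in docs for term in set(d))  # document frequency
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def review_by_retrieval(clause, risk_clauses, score_threshold, top_k=10):
    """Return (is_retrieval_risk, ranked) following steps 401 to 404."""
    scored = zip(bm25_scores(clause, risk_clauses), risk_clauses)
    ranked = sorted(scored, reverse=True)[:top_k]          # step 401
    best_score = ranked[0][0] if ranked else 0.0
    return best_score > score_threshold, ranked            # steps 402-404


risk_db = [
    "The supplier bears unlimited liability for all losses",
    "Payment is due within 30 days of invoice",
]
flagged, ranked = review_by_retrieval(
    "The supplier shall bear unlimited liability", risk_db, score_threshold=1.0)
```

The score threshold here is arbitrary; as paragraph [0103] notes, it would be set by business experts based on experience.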

[0106] The automatic contract clause review method provided in this invention combines classification and retrieval methods. This achieves high-precision clause risk identification at the semantic level and improves the identification of rare clauses through retrieval, thus enhancing the overall accuracy of clause review results. Both the classification model and the retrieval method incorporate threshold judgment mechanisms, enabling more flexible adaptation to various business scenarios.

[0107] Accordingly, embodiments of the present invention also provide an automatic contract terms review system. Figure 5 is a structural schematic of an automatic contract terms review system provided in an embodiment of the present invention.

[0108] In this embodiment, the system includes the following modules:

[0109] Contract text acquisition module 501 is used to acquire contract text;

[0110] The splitting module 502 is used to split the contract text to obtain all clauses and / or paragraphs;

[0111] The classification module 503 is used to classify the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels;

[0112] The first judgment module 504 is used to determine that the clause and / or paragraph has a risk when the probability of the risk category label is greater than a set probability threshold, and to classify the clause and / or paragraph with a risk as a risk-category clause.

[0113] Specifically, the splitting module 502 can split the contract text according to its inherent logical structure to obtain all clauses and / or paragraphs; or it can split the contract text according to its style structure to obtain all clauses and / or paragraphs.

[0114] Specifically, the classification module 503 may include a classification unit and a selection unit. The classification unit is used to classify the clauses and / or paragraphs using the clause classification model to obtain the probability distribution of risk category labels corresponding to the clauses and / or paragraphs; the selection unit is used to select the risk category label with the largest probability distribution of the risk labels as the risk category label of the clauses and / or paragraphs.

[0115] The clause classification model can be pre-built by a corresponding clause classification model construction module. The training process of the model can be referred to the description in the previous embodiments of the present invention, and will not be repeated here. The clause classification model construction module can be part of the system of the present invention, or it can be independent of the system. The embodiments of the present invention do not limit this.

[0116] Figure 6 is another structural schematic of the automatic contract terms review system provided in an embodiment of the present invention.

[0117] Unlike the embodiment shown in Figure 5, in this embodiment the system further includes a retrieval module 505 and an aggregation module 506. Wherein:

[0118] The retrieval module 505 is used to determine whether the clause and / or paragraph is a risky clause by means of a retrieval method when the probability of the category label is less than or equal to the threshold corresponding to the category label.

[0119] The aggregation module 506 is used to aggregate the classified risk clauses, the retrieval risk clauses determined by the retrieval module 505, and the risk-free clauses to obtain the clause review results.

[0120] In practical applications, the retrieval module 505 can utilize existing clause retrieval engines with corresponding functions to determine whether the clauses and / or paragraphs are risky clauses.

[0121] In another non-limiting embodiment of the present invention, the retrieval module 505 may also utilize the clause retrieval engine established in the preceding embodiments of the present invention to determine whether the clause and / or paragraph are risky clauses.

[0122] Accordingly, the retrieval module 505 may include a calling module and a second judgment module, wherein: the calling module is used to take the content of the clause and / or paragraph as input and call the clause retrieval engine to obtain multiple retrieval results and the score corresponding to each retrieval result; the second judgment module is used to determine that the clause and / or paragraph is a risky clause if the highest score among the retrieval results is greater than a set score threshold, and otherwise to determine that the clause and / or paragraph is a risk-free clause.
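
The calling module and second judgment module can be sketched as below. The clause retrieval engine of the preceding embodiments is not reproduced here; `toy_engine` is a hypothetical stand-in that scores by Jaccard token overlap against a small list of known risky clauses, and the score threshold in the usage note is likewise an assumption.

```python
def toy_engine(known_risky_clauses):
    """Hypothetical stand-in engine: scores by Jaccard token overlap."""
    def search(query):
        query_tokens = set(query.lower().split())
        results = []
        for clause in known_risky_clauses:
            clause_tokens = set(clause.lower().split())
            union = query_tokens | clause_tokens
            score = len(query_tokens & clause_tokens) / len(union) if union else 0.0
            results.append((clause, score))
        # Highest-scoring results first.
        return sorted(results, key=lambda r: r[1], reverse=True)
    return search


def call_engine(engine, content):
    """Calling module: pass the clause content to the engine as the query."""
    return engine(content)


def judge_by_retrieval(results, score_threshold):
    """Second judgment module: risky if the best score exceeds the threshold."""
    if not results:
        return False
    return max(score for _, score in results) > score_threshold
```

With an engine built from the single clause "the buyer waives all rights to compensation" and a threshold of 0.6, an identical query scores 1.0 and is judged risky, while a query sharing no tokens scores 0.0 and is judged risk-free.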

[0123] The automatic contract clause review system provided by this invention utilizes a clause classification model to categorize contract clauses and / or paragraphs, obtaining risk category labels corresponding to the clauses and / or paragraphs, and the probability of each risk category label. Based on the probabilities, it determines whether the clauses and / or paragraphs pose a risk. Compared to existing clause retrieval methods, the model-based approach, using massive training data as the modeling corpus, can effectively represent clauses semantically, significantly improving the accuracy of clause review results. Furthermore, after reviewing clauses and / or paragraphs in the contract text using the model-based approach, a retrieval method is used as a supplement to the clause classification model, effectively addressing the long-tail clause problem in the classification model and greatly enhancing the system's adaptability to different industries and fields.
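
The overall flow summarized above (classification first, retrieval as a fallback, then aggregation) can be sketched as a single routine. This is a simplified illustration: it uses one probability threshold for all labels rather than a threshold per category label, and the `stub_classify` and `stub_retrieval` functions are hypothetical stand-ins for the trained model and the retrieval engine.

```python
def review_clauses(clauses, classify, retrieval_says_risky, probability_threshold):
    """Route each clause to exactly one bucket of the review result."""
    result = {"classified_risk": [], "retrieval_risk": [], "risk_free": []}
    for clause in clauses:
        label, probability = classify(clause)
        if probability > probability_threshold:
            # The model is confident enough: classified risk clause.
            result["classified_risk"].append((clause, label))
        elif retrieval_says_risky(clause):
            # The model is not confident: fall back to retrieval.
            result["retrieval_risk"].append(clause)
        else:
            result["risk_free"].append(clause)
    return result


# Hypothetical stand-ins so the sketch runs without a trained model or engine.
def stub_classify(clause):
    return ("penalty", 0.9) if "penalty" in clause else ("penalty", 0.2)


def stub_retrieval(clause):
    return "waives" in clause
```

The returned dictionary plays the role of the aggregated clause review result: every clause appears in exactly one of the three lists.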

[0124] It should be noted that the terms "comprising" and "having" and any variations thereof in the specification, claims and accompanying drawings of this invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such processes, methods, products or devices.

[0125] The various embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be cross-referenced; each embodiment focuses on its differences from the others. Furthermore, the system embodiments described above are merely illustrative. The modules and units described as separate components may or may not be physically separate; that is, they may be located on a single network unit or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those skilled in the art can understand and implement this without creative effort.

[0126] The embodiments of the present invention have been described in detail above. Specific implementation methods have been used to illustrate the present invention. The descriptions of the embodiments above are only for the purpose of helping to understand the methods and systems of the present invention, and are merely some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention, and the content of this specification should not be construed as a limitation of the present invention. Therefore, any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for automatically reviewing contract clauses, characterized in that the method includes:
obtaining the contract text;
splitting the contract text to obtain all clauses and / or paragraphs;
classifying the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels;
if the probability of the risk category label is greater than the set probability threshold, determining that the clause and / or paragraph is risky, and classifying the clause and / or paragraph with risk as a risky clause;
wherein classifying the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels includes:
classifying the clause and / or paragraph using the clause classification model to obtain the probability distribution of the risk category labels corresponding to the clause and / or paragraph;
selecting the risk category label with the highest probability in the probability distribution as the risk category label for the clause and / or paragraph;
the clause classification model adopts the BERT classification model;
the first layer of the model is the BERT pre-trained language model layer, which is the semantic vector encoding layer; the output of the encoding layer contains the cls vector, which is the semantic representation of the entire text input;
the second layer of the model is a fully connected layer, which takes the cls vector as input and outputs a vector whose dimension equals the number of classification categories;
the third layer of the model is the softmax layer, which is used to transform the values of the output vector of the second layer to between 0 and 1, that is, the category probability distribution of the clause;
after the model structure is built, the loss function is specified according to the model structure; the loss function selected for the clause classification model is cross-entropy loss.

2. The method according to claim 1, characterized in that the process of splitting the contract text to obtain all clauses and / or paragraphs includes: splitting the contract text according to its inherent logical structure to obtain all clauses and / or paragraphs; or splitting the contract text according to its style structure to obtain all clauses and / or paragraphs.

3. The method according to claim 1 or 2, characterized in that the method further includes: if the probability of the category label is less than or equal to the threshold corresponding to the category label, using a retrieval method to determine whether the clause and / or paragraph is a retrieval risk clause.

4. An automatic contract clause review system, characterized in that the system includes:
a contract text acquisition module, used to acquire the contract text;
a splitting module, used to split the contract text to obtain all clauses and / or paragraphs;
a classification module, used to classify the clauses and / or paragraphs using a clause classification model to obtain the risk category labels corresponding to the clauses and / or paragraphs and the probabilities of the risk category labels;
a first judgment module, used to determine that the clause and / or paragraph has a risk when the probability of the risk category label is greater than a set probability threshold, and to classify the clause and / or paragraph with a risk as a risk-category clause;
wherein the classification module includes:
a classification unit, used to classify the clauses and / or paragraphs using the clause classification model to obtain the probability distribution of the risk category labels corresponding to the clauses and / or paragraphs;
a selection unit, used to select the risk category label with the highest probability in the probability distribution as the risk category label of the clause and / or paragraph;
the clause classification model adopts the BERT classification model;
the first layer of the model is the BERT pre-trained language model layer, which is the semantic vector encoding layer; the output of the encoding layer contains the cls vector, which is the semantic representation of the entire text input;
the second layer of the model is a fully connected layer, which takes the cls vector as input and outputs a vector whose dimension equals the number of classification categories;
the third layer of the model is the softmax layer, which is used to transform the values of the output vector of the second layer to between 0 and 1, that is, the category probability distribution of the clause;
after the model structure is built, the loss function is specified according to the model structure; the loss function selected for the clause classification model is cross-entropy loss.

5. The system according to claim 4, characterized in that the splitting module is specifically used to split the contract text according to its inherent logical structure to obtain all clauses and / or paragraphs; or to split the contract text according to its style structure to obtain all clauses and / or paragraphs.

6. The system according to claim 4 or 5, characterized in that the system further includes a retrieval module; the retrieval module is used to determine whether the clause and / or paragraph is a risky clause by means of a retrieval method when the probability of the category label is less than or equal to the threshold corresponding to the category label.

7. The system according to claim 6, characterized in that the retrieval module includes: a calling module, used to take the content of the clause and / or paragraph as input and call the clause retrieval engine to obtain multiple retrieval results and the score corresponding to each retrieval result; and a second judgment module, used to determine that the clause and / or paragraph is a risky clause if the highest score among the retrieval results is greater than a set score threshold, and otherwise to determine that the clause and / or paragraph is a risk-free clause.

8. The system according to claim 7, characterized in that the system further includes an aggregation module, used to aggregate the classified risk clauses, the retrieval risk clauses, and the risk-free clauses to obtain the clause review results.