A method and system for document content review based on large models

A document content verification method based on a large model, which integrates multiple detection models through ensemble learning, addresses the time-consuming and labor-intensive nature of manual document verification. It provides efficient and accurate automatic verification, supports multiple verification types, and improves the quality of official documents.

CN118673144B · Active · Publication Date: 2026-06-16 · JIANGSU SIJI TECH SERVICE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: JIANGSU SIJI TECH SERVICE CO LTD
Filing Date: 2024-06-21
Publication Date: 2026-06-16

Application Information

IPC: G06F16/353; G06F16/334; G06F40/284; G06F18/24; G06N3/0455; G06N5/04; G06N20/20

CPC: G06F16/35; G06F16/3344; G06F40/284; G06F18/24; G06N20/20

AI Tagging

Application Domain

Ensemble learning; Biological models

AI Technical Summary

Technical Problem

In existing technologies, document verification relies on manual proofreading, which is time-consuming and labor-intensive, and cannot guarantee the accuracy and completeness of the verification, especially when the rules change.

Method used

A document content verification method based on a large model is adopted. By constructing a document dataset and fine-tuning the model, and combining multiple detection models for ensemble learning, the automatic verification of document content is achieved. The ensemble learning method is used to extract and correct error locations from the output results of multiple models.

Benefits of technology

It improves the efficiency and accuracy of official document verification, reduces the consumption of human resources, supports various types of verification, including typos, extra words, etc., provides structured detection results, and improves the quality of official documents.

✦ Generated by Eureka AI based on patent content.

Abstract

A method and system for document content review based on large models, comprising: obtaining a training data set and training a plurality of detection models, each detection model having a different detection method; selecting the best detection model according to the error rate of the output results of each detection model; establishing a loading interface between the best detection model and the word processing software, and establishing a calling interface between each other detection model and the word processing software; while the word processing software loads the best detection model to perform document content review, having the word processing software also call each other detection model to review the document content that has been reviewed by the best detection model; and using an ensemble learning method to extract the position of the error content in the document from the output results of each detection model. Document correction is delivered as a plug-in, which reduces the time spent on manual review and improves the efficiency of document review.

Description

Technical Field

[0001] This invention belongs to the fields of power technology, information management technology, and office technology. More specifically, it relates to a method and system for document content verification based on a large model.

Background Technology

[0002] Official documents are documents that require clear and rigorous content and are used to convey important matters. Before being sent out, official documents require strict review. However, relying solely on manual verification is not only time-consuming and labor-intensive, but also fails to guarantee efficiency, accuracy, and completeness. Therefore, this patent proposes a document content verification method based on a large model. This method constructs document data, fine-tunes the model, and compares it with the document content to obtain the final verification result. By embedding the document verification function into WPS as a plugin, it effectively improves the work efficiency of staff, enabling quick and easy comparison work and reducing the workload of staff.

[0003] Currently, the company's traditional content verification method involves manually checking and verifying official documents using specified rules. This is not only time-consuming and labor-intensive but also consumes a significant amount of human resources. After the rules change, it is impossible to guarantee the accuracy and completeness of the verification.

[0004] Prior art document 1 (CN116663525B) discloses a document review method, apparatus, device, and storage medium. The document review method includes: acquiring the text of a target document content, where the target document content is the text of the content to be reviewed in the document to be reviewed; retrieving review reference information matching the target document content text from a review reference information database to obtain target review reference information, where the target review reference information is a standard used to determine whether the target document content text meets the requirements; and calling a pre-set large language model to generate review suggestions for the target document content text based on the target document content text and the target review reference information. The shortcoming of prior art document 1 is that it only reviews the content based on the inspection standards.

Summary of the Invention

[0005] To address the shortcomings of existing technologies, this invention provides a method and system for document content verification based on a large model. By constructing official document data and fine-tuning the model, the final verification result is obtained by comparing the model with the document content.

[0006] The invention relieves the burden of manual verification, which is an important measure for ensuring the quality of official documents, and greatly improves the efficiency of document verification. It offers high accuracy and high efficiency, significantly reduces the risk of leaking document content, and has wide applicability.

[0007] The present invention adopts the following technical solution.

[0008] The first aspect of this invention provides a method for document content verification based on a large model, comprising the following steps:

[0009] Obtain a training dataset and train multiple detection models, each with different detection methods; select the best detection model based on the error rate of each model's output.

[0010] Establish a loading interface between the optimal detection model and the word processing software, and establish calling interfaces between other detection models and the word processing software;

[0011] While the word processing software loads the best detection model to perform document content verification, it also calls other detection models to verify the document content that the best detection model has already verified; using ensemble learning, it extracts the location of the erroneous content in the document from the output results of each detection model.

[0012] Preferably, obtaining the training dataset includes collecting basic data and processing and transforming the data;

[0013] The basic data includes the company's full name and abbreviation, the full names and abbreviations of departments, the names and order of leaders, a dedicated thesaurus and a personal thesaurus; it also includes data from the publicly available sighan_2015 dataset and the State Grid Electric Power professional field dataset;

[0014] Data processing and transformation includes converting all data into training data formats suitable for each detection model.

[0015] Preferably, training multiple detection models specifically includes:

[0016] The training dataset was fed into the large language model chatyuan and the traditional model macbert-csc for training, respectively.

[0017] The large language model chatyuan was trained for 10 epochs using a 64-layer LoRA plugin; the traditional model macbert-csc was trained for 50 and 100 epochs, resulting in three models: chatyuan-10, macbert-csc-50, and macbert-csc-100.

[0018] The model's performance is evaluated based on performance metrics, including accuracy, recall, F1 score, and precision.

[0019] Preferably, the selection of the optimal detection model specifically includes:

[0020] Select the best detection model based on the error rate of the output results of each detection model;

[0021] If the error rate of the output results of multiple detection models is 0, gradually reduce the amount of data in the training dataset and compare the error rates of the output results of each detection model again.

[0022] While reducing the training dataset, the response time of each detection model was recorded;

[0023] The weighted sum of error rate and response time is used as the criterion for judging the best detection model.

[0024] Preferably, the establishment of the loading interface between the optimal detection model and the word processing software, and the establishment of the calling interface between other detection models and the word processing software, specifically include:

[0025] The word processing software provides a channel for users to customize the plugins they need. The attributes and functions of the word processing software loading interface are defined by modifying the jsplugins.xml file, which is used to add and modify word processing software add-ins.

[0026] Custom XML add-in files are configured in the oem.ini file of the word processing software. When the corresponding document is opened, the word processing software's function area displays the custom loading interface, including document verification assistance.

[0027] Preferably, while the word processing software loads the optimal detection model to perform document content verification, the word processing software also calls other detection models to verify the document content already verified by the optimal detection model, specifically including:

[0028] Establish a loading interface between the best detection model and the word processing software. The best detection model is displayed as a document auxiliary verification plugin in the word processing software's function area.

[0029] Establish calling interfaces between the other detection models and the word processing software, allowing the other detection models to be called in the background;

[0030] Define an error correction function to format the text into the model's input format; when a user uses an add-in in word processing software to proofread a document, the error correction function is called, and the best detection model is called through the loading interface to perform preliminary proofreading;

[0031] While the best detection model verifies the document content, other detection models are called in parallel through API calls to verify the same document content.

[0032] Preferably, the step of calling the error correction function displays the preliminary prediction results returned by the error correction function in the word processing software interface, and classifies and labels the error types.

[0033] The model's predicted error types, locations, and corrected text are integrated into a structured detection result;

[0034] The error types are categorized and labeled, and different error types are distinguished by different colors in the word processing software interface and highlighted.

[0035] Preferably, the predicted error types include: typos, extra words, missing words, word order errors, leader detection, non-standard expressions, sensitive content, and custom thesaurus.

[0036] Preferably, the step of employing ensemble learning to extract the location of erroneous content in the document from the output results of each detection model specifically includes:

[0037] By comparing the verification results of the best detection model with those of other detection models, the location of all erroneous content in the document is extracted;

[0038] The final error location is determined by ensemble learning. If at least two models detect an error at the same location, then a valid error is considered to exist at that location, and the final error location is determined.

[0039] Based on the identified error location, corrective suggestions are applied to modify the document content.

[0040] A second aspect of the present invention provides a document content verification system based on a large model, comprising: a data preparation module, a model training module, an add-in service configuration module, and an error prediction and correction module;

[0041] The data preparation module collects and prepares basic data for model training and fine-tuning, converting the collected data into training data formats for large language models and traditional models.

[0042] The model training module trains large language models and traditional models, combines multiple trained models, statistically evaluates and assesses the correct recognition of each model, and selects the model with the best performance.

[0043] The Add-in Service Configuration module allows you to customize and configure add-ins in WPS for auxiliary document proofreading.

[0044] The error prediction and correction module uses a trained model to perform text verification on documents and generate error detection and correction suggestions.

[0045] Compared with existing technologies, the beneficial effects of this invention include at least the following: the document content proofreading method based on a large model supports document verification for correcting errors in official documents. Through deep integration with collaborative office systems, the document proofreading function is implemented as a plugin in WPS. This method significantly reduces manpower and time expenditure, improves the efficiency of official document verification, and lays the foundation for building content verification frameworks for different scenarios.

Attached Figure Description

[0046] Figure 1 is a flowchart of document proofreading based on a large model, provided according to an embodiment of the present invention;

[0047] Figure 2 is a hierarchical structure diagram of the basic data preparation, provided according to an embodiment of the present invention;

[0048] Figure 3 is a model training flowchart provided according to an embodiment of the present invention;

[0049] Figure 4 is a flowchart of WPS custom add-ins provided according to an embodiment of the present invention;

[0050] Figure 5 is a pseudocode diagram for error correction provided according to an embodiment of the present invention.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of this invention. The embodiments described in this application are merely some embodiments of this invention, and not all embodiments. Based on the spirit of this invention, other embodiments obtained by those skilled in the art without creative effort are all within the protection scope of this invention.

[0052] As shown in Figure 1, Embodiment 1 of the present invention provides a method for document content verification based on a large model, including the following steps:

[0053] Obtain a training dataset and train multiple detection models, each with different detection methods; select the best detection model based on the error rate of each model's output.

[0054] Preferably, obtaining the training dataset includes collecting basic data and processing and transforming the data;

[0055] As shown in Figure 2, the basic data includes the company's full name and abbreviation, the department's full name and abbreviation, the names and order of leaders, a dedicated thesaurus and a personal thesaurus; it also includes data from the publicly available sighan_2015 dataset and the State Grid Electric Power professional field dataset;

[0056] Data processing and transformation includes converting all data into training data formats suitable for each detection model.

[0057] The models include the large language model chatyuan and the traditional model macbert-csc;

[0058] The training data format for the large language chatyuan model is as follows:

[0059]-[0069]

    {
      "conversation_id": 1,
      "category": "wordcheck",
      "conversation": [
        {
          "human": "Text correction:\nThermal power generation hours: 1128 hours, an increase of 31 hours compared to the same period last year\nAnswer",
          "assistant": "1128 minor incidents at thermal power plants, 31 hours more than the same period last year."
        }
      ],
      "dataset": "csc"
    }

[0070] The training data format for the traditional MacBERT-CSC model is as follows:

[0071]-[0076]

    {
      "id": 1,
      "original_text": "Thermal power generation hours totaled 1128 hours, an increase of 31 hours compared to the same period last year;",
      "wrong_ids": [7],
      "correct_text": "1128 minor incidents at thermal power plants, an increase of 31 hours compared to the same period last year;"
    }
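The conversion of one (erroneous text, corrected text) pair into the two training formats above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's own conversion code: the function names are hypothetical, the `original_text` key is assumed for the field holding the erroneous sentence, and the pair is assumed to differ only by same-length character substitutions so that `wrong_ids` can be computed by position.

```python
import json
from typing import Dict, List


def find_wrong_ids(original: str, corrected: str) -> List[int]:
    """Return the 0-based character positions where the two strings differ."""
    if len(original) != len(corrected):
        # Assumption: only substitution errors, so lengths must match.
        raise ValueError("substitution-only pairs must have equal length")
    return [i for i, (a, b) in enumerate(zip(original, corrected)) if a != b]


def to_chat_record(idx: int, original: str, corrected: str) -> Dict:
    """Build one record in the conversation-style format shown for chatyuan."""
    return {
        "conversation_id": idx,
        "category": "wordcheck",
        "conversation": [
            {
                "human": f"Text correction:\n{original}\nAnswer",
                "assistant": corrected,
            }
        ],
        "dataset": "csc",
    }


def to_csc_record(idx: int, original: str, corrected: str) -> Dict:
    """Build one record in the flat format shown for macbert-csc."""
    return {
        "id": idx,
        "original_text": original,  # assumed key name
        "wrong_ids": find_wrong_ids(original, corrected),
        "correct_text": corrected,
    }


if __name__ == "__main__":
    pair = ("渝店家园", "渝电家园")  # example pair taken from paragraph [0108]-[0110]
    print(json.dumps(to_chat_record(1, *pair), ensure_ascii=False))
    print(json.dumps(to_csc_record(1, *pair), ensure_ascii=False))
```

Deriving both records from the same pair keeps the two training sets aligned, so the models are compared on identical examples.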

[0077] Preferably, training multiple detection models specifically includes:

[0078] The training dataset was fed into the large language model chatyuan and the traditional model macbert-csc for training, respectively.

[0079] The large language model chatyuan was trained for 10 epochs using a 64-layer LoRA plugin; the traditional model macbert-csc was trained for 50 and 100 epochs, resulting in three models: chatyuan-10, macbert-csc-50, and macbert-csc-100.

[0080] Model performance is evaluated using accuracy, recall, F1 score, and precision, which together assess each model's classification effectiveness, as shown in Figure 3:

[0081] The prepared data was used to train three models: Chatyuan-10, Macbert-CSC-50, and Macbert-CSC-100. Each trained model was evaluated using four performance metrics: precision (TP), recall (R), F1 score (F1), and accuracy (ACC).

[0082] The evaluation results for the Chatyuan-10 model are: TP = 0.82, R = 0.94, F1 = 0.85, and ACC = 0.82.

[0083] The evaluation results of the Macbert-CSC-50 model are: TP = 0.76, R = 0.79, F1 = 0.82, and ACC = 0.70.

[0084] The evaluation results of the Macbert-CSC-100 model are: TP = 0.80, R = 0.74, F1 = 0.72, and ACC = 0.65.

[0085] Through these evaluation metrics, the best-performing model was finally obtained: the chatyuan-10 model.
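For reference, the four metrics named above can be computed from confusion counts as in the following Python sketch. The document does not state how its figures were calculated, so the standard definitions are assumed here; the function name and the counts in the usage example are hypothetical and are not taken from the patent's evaluation.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Each ratio falls back to 0.0 when its denominator is 0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```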

[0086] It is worth noting that by selecting the optimal detection model, we can ensure that the best model is always used for detection under different scenarios and requirements. This method can dynamically select the best-performing model based on actual application needs and specific circumstances, improving the accuracy and reliability of detection, while also saving resources and avoiding unnecessary computational overhead.

[0087] Preferably, the selection of the optimal detection model specifically includes:

[0088] Select the best detection model based on the error rate of the output results of each detection model;

[0089] If the error rate of the output results of multiple detection models is 0, gradually reduce the amount of data in the training dataset and compare the error rates of the output results of each detection model again.

[0090] While reducing the training dataset, the response time of each detection model was recorded;

[0091] The weighted sum of error rate and response time is used as the criterion for judging the best detection model.
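The weighted-sum criterion can be sketched in Python as follows. This is only an illustration: the patent does not give the weights or say how response time is scaled, so the weight values, the normalization of response time by the slowest model, and the function name are all assumptions.

```python
from typing import Dict, Tuple


def select_best_model(
    stats: Dict[str, Tuple[float, float]],
    w_error: float = 0.7,  # assumed weight for error rate
    w_time: float = 0.3,   # assumed weight for response time
) -> Tuple[str, Dict[str, float]]:
    """Pick the model with the lowest weighted sum of error rate and response time.

    stats maps a model name to (error_rate, response_time_seconds).
    Response time is divided by the slowest model's time (an assumption)
    so that both terms lie in the range 0..1.
    """
    if not stats:
        raise ValueError("stats must contain at least one model")
    max_time = max(t for _, t in stats.values()) or 1.0
    scores = {
        name: w_error * err + w_time * (t / max_time)
        for name, (err, t) in stats.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical measurements: two models tie at error rate 0,
    # so response time decides between them.
    measurements = {
        "chatyuan-10": (0.0, 1.0),
        "macbert-csc-50": (0.0, 2.0),
        "macbert-csc-100": (0.2, 0.5),
    }
    print(select_best_model(measurements))
```

A lower score is better in this sketch, so a model that ties on error rate wins by responding faster.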

[0092] First, establish the loading interface between the best detection model and the word processing software, and then establish the calling interfaces between the other detection models and the word processing software;

[0093] Preferably, the step of establishing the loading interface between the optimal detection model and the word processing software, and establishing the calling interfaces between the other detection models and the word processing software, specifically includes:

[0094] The word processing software provides a channel for users to customize the plugins they need. The attributes and functions of the word processing software loading interface are defined by modifying the jsplugins.xml file, which is used to add and modify word processing software add-ins.

[0095] Custom XML add-in files are configured in the oem.ini file of the word processing software. When the corresponding document is opened, the word processing software's function area displays the custom loading interface, including document verification assistance.

[0096] It is worth noting that setting the priority order for calling detection models ensures that the most suitable model is called first when performing a detection task. This not only improves detection efficiency and reduces processing time, but also optimizes resource utilization, ensuring that the system maintains high efficiency when dealing with detection tasks of varying complexity, thereby improving the overall system's response speed and performance.

[0097] While the word processing software loads the best detection model to perform document content verification, it also calls other detection models to verify the document content that the best detection model has already verified; using ensemble learning, it extracts the location of the erroneous content in the document from the output results of each detection model.

[0098] Preferably, while the word processing software loads the optimal detection model to perform document content verification, the word processing software also calls other detection models to verify the document content already verified by the optimal detection model, specifically including:

[0099] Establish a loading interface between the best detection model and the word processing software. The best detection model is displayed as a document auxiliary verification plugin in the word processing software's function area.

[0100] Establish calling interfaces between the other detection models and the word processing software, allowing the other detection models to be called in the background;

[0101] Define an error correction function to format the text into the input format of the model; when the user uses the add-in to check the document in a word processing software, call the error correction function, and call the best detection model through the loading interface to perform a preliminary check;

[0102] The defined error correction function:

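[0103]-[0106]

    def correct(prompt):
        # Wrap the raw text in the instruction template the model expects.
        prompt = "Error correction task:\n" + prompt + "\nAnswer:"
        # answer() is the model inference call; its definition is not given here.
        result = answer(prompt)
        return result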

[0107] Call the error correction function:

[0108] corrected_text = correct('渝店家园')

[0109] Output result:

[0110] Answer: 渝电家园

[0111] While the best detection model checks the document content, call other detection models in parallel through the interface to check the same document content.
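Running the best model and the other models on the same text at the same time can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch under assumptions: each model is represented by a plain callable that takes text and returns a result, whereas the patent calls the other models through an interface whose details are not given; the function name and the stand-in models are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def check_in_parallel(
    text: str,
    primary: Callable[[str], str],
    secondary: Dict[str, Callable[[str], str]],
) -> Tuple[str, Dict[str, str]]:
    """Run the primary model in the calling thread while the secondary
    models check the same text in background threads."""
    with ThreadPoolExecutor(max_workers=max(1, len(secondary))) as pool:
        # Submit the background calls first so they overlap with the primary call.
        futures = {name: pool.submit(fn, text) for name, fn in secondary.items()}
        primary_result = primary(text)
        # Collect background results; result() blocks until each call finishes.
        secondary_results = {name: fut.result() for name, fut in futures.items()}
    return primary_result, secondary_results


if __name__ == "__main__":
    # Stand-in "models" that replace one character; real models would be
    # loaded locally or called through an API.
    def fake_primary(s: str) -> str:
        return s.replace("店", "电")

    others = {"macbert-csc-50": fake_primary, "macbert-csc-100": lambda s: s}
    print(check_in_parallel("渝店家园", fake_primary, others))
```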

[0112] It should be noted that by comparing the output results of different detection models, the accuracy and consistency of the detection can be effectively verified. This method helps to discover the differences and potential errors between models, and then optimize and adjust. The result comparison can also provide multi-level verification, increase the credibility and robustness of the detection results, and provide reliable data support for further decision-making.

[0113] Preferably, call the error correction function, display the preliminary prediction result returned by the error correction function in the interface of the word processing software, and classify and label the error types;

[0114] Integrate the error types, positions and corrected text predicted by the model into a structured detection result;

[0115] The classification and labeling of the error types distinguish different error types through different colors in the interface of the word processing software and highlight them.

[0116] Preferably, the predicted error types include: typos, extra words, missing words, word order errors, leader detection, non-standard expressions, sensitive content, and custom thesaurus.
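A structured detection result that carries the error type, position, and corrected text, together with a per-type highlight color, could look like the following Python sketch. The field names, the enum values, and the color codes are hypothetical; the patent specifies only that these pieces of information are combined and that error types are distinguished by color.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Dict


class ErrorType(Enum):
    """The eight error types listed in the text."""
    TYPO = "typo"
    EXTRA_WORD = "extra_word"
    MISSING_WORD = "missing_word"
    WORD_ORDER = "word_order"
    LEADER = "leader_detection"
    NON_STANDARD = "non_standard_expression"
    SENSITIVE = "sensitive_content"
    CUSTOM_THESAURUS = "custom_thesaurus"


# Hypothetical color assignment; the patent does not name specific colors.
HIGHLIGHT_COLORS: Dict[ErrorType, str] = {
    ErrorType.TYPO: "#FF0000",
    ErrorType.EXTRA_WORD: "#FF8C00",
    ErrorType.MISSING_WORD: "#FFD700",
    ErrorType.WORD_ORDER: "#008000",
    ErrorType.LEADER: "#0000FF",
    ErrorType.NON_STANDARD: "#4B0082",
    ErrorType.SENSITIVE: "#8B0000",
    ErrorType.CUSTOM_THESAURUS: "#808080",
}


@dataclass
class Detection:
    error_type: ErrorType
    start: int       # 0-based start position in the document text
    end: int         # end position (exclusive)
    original: str    # the erroneous span
    suggestion: str  # the corrected text

    def to_dict(self) -> Dict:
        """Serialize to a plain dict, adding the highlight color."""
        d = asdict(self)
        d["error_type"] = self.error_type.value
        d["color"] = HIGHLIGHT_COLORS[self.error_type]
        return d


if __name__ == "__main__":
    print(Detection(ErrorType.TYPO, 1, 2, "店", "电").to_dict())
```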

[0117] The significant difference between this invention and existing technologies is that it supports the verification of multiple types of content, including typos, extra words, missing words, word order errors, leader detection, non-standard expressions, sensitive content, and common sense errors.

[0118] The advantage achieved compared to prior art document 1 is that it supports more types of verification.

[0119] Preferably, the step of employing ensemble learning to extract the location of erroneous content in the document from the output results of each detection model specifically includes:

[0120] By comparing the verification results of the best detection model with those of other detection models, the location of all erroneous content in the document is extracted;

[0121] The final error location is determined by ensemble learning. If at least two models detect an error at the same location, then a valid error is considered to exist at that location, and the final error location is determined.

[0122] Based on the identified error location, corrective suggestions are applied to modify the document content.

[0123] Ensemble learning methods include, but are not limited to, voting mechanisms. Those skilled in the art can use other ensemble methods, such as weighted average or maximum likelihood estimation, to improve the accuracy and reliability of the verification results, depending on the specific circumstances.

[0124] The specific pseudocode for the ensemble learning method is as follows:

[0125]-[0126] (Pseudocode figure; not reproduced in the text.)
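The agreement rule described in paragraph [0121], under which an error is accepted when at least two models report it at the same location, can be sketched in Python as follows. This is a minimal illustration and not a reproduction of the patent's pseudocode figure: the function name, the representation of each model's output as a list of character positions, and the sample positions are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def vote_error_positions(
    model_outputs: Dict[str, Iterable[int]],
    min_votes: int = 2,
) -> List[int]:
    """Return the positions flagged by at least min_votes models.

    model_outputs maps a model name to the error positions it reported.
    Each model counts at most once per position.
    """
    votes: Counter = Counter()
    for positions in model_outputs.values():
        votes.update(set(positions))  # de-duplicate within one model
    return sorted(pos for pos, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical outputs from the three trained models.
    outputs = {
        "chatyuan-10": [1, 5],
        "macbert-csc-50": [1],
        "macbert-csc-100": [5, 9],
    }
    print(vote_error_positions(outputs))
```

With these sample outputs, positions 1 and 5 are each reported by two models and are kept, while position 9 is reported by only one model and is dropped.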

[0127] Embodiment 2 of the present invention provides a document content verification system based on a large model, including: a data preparation module, a model training module, an add-in service configuration module, and an error prediction and correction module;

[0128] The data preparation module collects and prepares basic data for model training and fine-tuning, converting the collected data into training data formats for large language models and traditional models.

[0129] The model training module trains large language models and traditional models, combines multiple trained models, statistically evaluates and assesses the correct recognition of each model, and selects the model with the best performance.

[0130] The Add-in Service Configuration module allows you to customize and configure add-ins in WPS for auxiliary document proofreading.

[0131] The error prediction and correction module uses a trained model to perform text verification on documents and generate error detection and correction suggestions.

[0132] Compared with existing technologies, the beneficial effects of this invention include at least the following: the document content proofreading method based on a large model supports document verification for correcting errors in official documents. Through deep integration with collaborative office systems, the document proofreading function is implemented as a plugin in WPS. This method significantly reduces manpower and time expenditure, improves the efficiency of official document verification, and lays the foundation for building content verification frameworks for different scenarios.

[0133] This invention can be a system, method, and / or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of this disclosure.

[0134] Computer-readable storage media can be tangible devices capable of holding and storing instructions for use by an instruction execution device. Computer-readable storage media can be, for example, but not limited to, electrical storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory sticks, floppy disks, mechanically encoded devices, such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage media used herein are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

[0135] The computer-readable program instructions described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0136] Computer program instructions used to perform the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, status setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network—including a local area network (LAN) or a wide area network (WAN)—or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing the status information of the computer-readable program instructions to implement various aspects of this disclosure.

[0137] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention.

Claims

1. A method for document content verification based on a large model, wherein the large model includes multiple detection models, characterized in that the method includes the following steps:

obtaining a training dataset and training multiple detection models, each with a different detection method;

selecting the optimal detection model based on the error rate of each detection model's output; wherein training the multiple detection models specifically includes: feeding the training dataset into the large language model Chatyuan and the traditional model Macbert-CSC for training; training Chatyuan using a 64-layer LoRA plugin for 10 epochs; and training the traditional model Macbert-CSC for 50 epochs and for 100 epochs, to obtain three models: Chatyuan-10, Macbert-CSC-50, and Macbert-CSC-100; and wherein model performance is evaluated based on performance metrics including accuracy, recall, comprehensiveness, and precision;

establishing a loading interface between the optimal detection model and the word processing software, and establishing calling interfaces between the other detection models and the word processing software;

while the word processing software loads the optimal detection model to perform document content verification, also calling the other detection models to verify the document content already verified by the optimal detection model; specifically including: establishing the loading interface between the optimal detection model and the word processing software, and displaying the optimal detection model as a document verification plugin in the function area of the word processing software; establishing the calling interfaces between the other detection models and the word processing software, so that the other detection models can be called in the background; defining an error correction function that formats the text into the model's input format; when the user uses the plugin to verify a document in the word processing software, calling the error correction function and calling the optimal detection model through the loading interface to perform preliminary verification; and, while the optimal detection model is verifying the document content, calling the other detection models in parallel through the calling interfaces to verify the same document content;

using an ensemble learning approach to extract the locations of erroneous content in the document from the output of each detection model; specifically including: comparing the verification results of the optimal detection model with those of the other detection models to extract the locations of all erroneous content in the document; determining the final error locations through ensemble learning, wherein a location is considered a valid error, and thus a final error location, if at least two models detect an error at that location; and, based on the determined final error locations, applying the correction suggestions to modify the document content.
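The parallel calling and the "at least two models agree" rule of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the detector functions are hypothetical stand-ins for the trained models, the names `run_detectors` and `vote` are assumptions, and error locations are assumed to be character offsets.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set

# Assumption: a detector takes text and returns the character offsets it flags.
Detector = Callable[[str], Set[int]]


def run_detectors(text: str, best: Detector, others: List[Detector]) -> List[Set[int]]:
    """Run the optimal model and the other models concurrently on the same text."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(detector, text) for detector in [best, *others]]
        return [future.result() for future in futures]


def vote(results: List[Set[int]], min_votes: int = 2) -> List[int]:
    """Keep only the locations flagged by at least `min_votes` detectors."""
    counts = Counter(pos for flagged in results for pos in flagged)
    return sorted(pos for pos, n in counts.items() if n >= min_votes)


# Hypothetical stub detectors standing in for the three trained models.
def detector_a(text: str) -> Set[int]:
    return {2, 5}


def detector_b(text: str) -> Set[int]:
    return {5, 9}


def detector_c(text: str) -> Set[int]:
    return {2, 5, 7}


final_positions = vote(run_detectors("sample document text", detector_a, [detector_b, detector_c]))
print(final_positions)  # [2, 5]
```

Offsets 2 and 5 survive because two or more stubs flag them, while 7 and 9 are dropped as single-model detections.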

2. The method for document content verification based on a large model according to claim 1, characterized in that: obtaining the training dataset includes collecting basic data and processing and transforming the data; the basic data includes the company's full name and abbreviation, the full names and abbreviations of departments, the names and order of leaders, a dedicated thesaurus, and a personal thesaurus, and further includes the publicly available external dataset sighan_2015 and the State Grid Electric Power professional field dataset; and the data processing and transformation includes converting all data into the training data format suitable for each detection model.

3. The method for document content verification based on a large model according to claim 1, characterized in that selecting the optimal detection model specifically includes: selecting the optimal detection model based on the error rate of the output of each detection model; if the error rate of the output of multiple detection models is 0, gradually reducing the amount of data in the training dataset and comparing the error rates of the output of each detection model again; recording the response time of each detection model while the training dataset is being reduced; and using the weighted sum of error rate and response time as the criterion for judging the optimal detection model.
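The weighted-sum criterion of claim 3 could look roughly like the following Python sketch. The claim does not state the weights or how response time is scaled, so the weights (0.7 and 0.3), the normalization by the slowest model, the model names, and all numbers below are placeholder assumptions, not measured results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelStats:
    name: str
    error_rate: float       # fraction of wrong outputs, 0.0 to 1.0
    response_time_s: float  # mean response time in seconds


def select_best(stats: List[ModelStats], w_error: float = 0.7, w_time: float = 0.3) -> ModelStats:
    """Pick the model with the lowest weighted sum of error rate and response time.

    Assumption: response time is divided by the slowest model's time so that
    both terms lie in [0, 1] before weighting.
    """
    max_time = max(s.response_time_s for s in stats) or 1.0

    def score(s: ModelStats) -> float:
        return w_error * s.error_rate + w_time * (s.response_time_s / max_time)

    return min(stats, key=score)


# Placeholder values for three hypothetical candidates.
candidates = [
    ModelStats("model_a", error_rate=0.02, response_time_s=1.8),
    ModelStats("model_b", error_rate=0.05, response_time_s=0.2),
    ModelStats("model_c", error_rate=0.03, response_time_s=0.2),
]
best = select_best(candidates)
print(best.name)  # model_c
```

With these placeholder numbers the most accurate candidate is not selected, because its slower response outweighs its small accuracy advantage under the assumed weights.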

4. The method for document content verification based on a large model according to claim 1, characterized in that establishing the loading interface between the optimal detection model and the word processing software, and establishing the calling interfaces between the other detection models and the word processing software, specifically includes: the word processing software provides a channel for users to customize the plugins they need; the attributes and functions of the loading interface of the word processing software are defined by modifying the jsplugins.xml file, which is used to add and modify add-ins of the word processing software; the custom XML add-in file is configured in the oem.ini file of the word processing software; and when a corresponding document is opened, the function area of the word processing software displays the custom loading interface, including the document verification assistance function.

5. The method for document content verification based on a large model according to claim 1, characterized in that: the error correction function is called, and the preliminary prediction results it returns are displayed in the interface of the word processing software with the error types classified and labeled; the error types, locations, and corrected text predicted by the model are integrated into a structured detection result; and the error types are categorized and labeled, with different error types highlighted in different colors in the interface of the word processing software.
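One possible shape for the structured detection result of claim 5, together with the correction step of claim 1, is sketched below in Python. The field names, the `COLORS` mapping, and the sample sentence are illustrative assumptions; the claims do not specify which color goes with which error type.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    error_type: str   # e.g. "typo", "extra_word", "missing_word", "word_order"
    start: int        # start offset in the text (inclusive)
    end: int          # end offset in the text (exclusive)
    suggestion: str   # corrected text for the span


# Assumed color scheme; unknown error types fall back to gray.
COLORS: Dict[str, str] = {
    "typo": "red",
    "extra_word": "orange",
    "missing_word": "blue",
    "word_order": "purple",
}


def to_structured(text: str, detections: List[Detection]) -> List[dict]:
    """Merge type, location, and corrected text into one record per error."""
    return [
        {
            "type": d.error_type,
            "start": d.start,
            "end": d.end,
            "original": text[d.start:d.end],
            "suggestion": d.suggestion,
            "color": COLORS.get(d.error_type, "gray"),
        }
        for d in sorted(detections, key=lambda d: d.start)
    ]


def apply_corrections(text: str, detections: List[Detection]) -> str:
    """Apply suggestions from right to left so earlier offsets stay valid."""
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + d.suggestion + text[d.end:]
    return text


sample = "The the report was recieved."
found = [
    Detection("typo", 19, 27, "received"),
    Detection("extra_word", 4, 8, ""),
]
structured = to_structured(sample, found)
corrected = apply_corrections(sample, found)
print(corrected)  # The report was received.
```

Applying the edits from the end of the text backward avoids recomputing offsets after each replacement, which matters once a suggestion changes the length of the text.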

6. The method for document content verification based on a large model according to claim 1, characterized in that the predicted error types include: typos, extra words, missing words, word order errors, leader detection, non-standard expressions, sensitive content, and custom thesaurus.

7. A document content verification system based on a large model, comprising a data preparation module, a model training module, a loading interface configuration module, and an error prediction and correction module, and implementing the method for document content verification based on a large model according to any one of claims 1 to 6, characterized in that: the data preparation module collects and prepares basic data for model training and fine-tuning, and converts the collected data into the training data formats for large language models and traditional models; the model training module trains the large language models and traditional models, combines the multiple trained models, statistically evaluates the correct recognition of each model, and selects the model with the best performance; the loading interface configuration module customizes and configures add-ins in the word processing software to assist in document proofreading; and the error prediction and correction module uses the trained models to perform text verification on documents and to generate error detection results and correction suggestions.