A network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method and system

By introducing retrieval-enhanced generation technology and semantic analysis into the document generation system, knowledge loop feedback and full-process security protection were achieved. This solved the problems of unidirectional knowledge flow, insufficient self-evolution ability, and security risks in professional document generation, and improved the accuracy and security of document generation.

CN122196159A · Pending · Publication Date: 2026-06-12 · FAZHENG INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FAZHENG INTELLIGENT TECH CO LTD
Filing Date
2026-03-11
Publication Date
2026-06-12


Abstract

The application discloses a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method and system, applied in the technical fields of natural language processing and network security. The method comprises the following steps: according to a generation instruction of a user, relevant context is retrieved from a vector knowledge base using retrieval-enhanced generation technology, and a large language model is invoked to generate an initial draft of the document; a difference analysis is performed on the initial draft before and after user editing, and the extracted difference text blocks are semantically analyzed using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points; and the candidate knowledge points confirmed by the user are vectorized and stored in the vector knowledge base together with metadata. The application addresses the problems that knowledge in AI-generated content lags behind and that the system cannot continuously learn from user feedback, and it improves the accuracy, timeliness, and user satisfaction of professional document generation.

Description

Technical Field

[0001] This invention relates to the fields of network security and natural language processing, and more specifically to a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method and system.

Background Technology

[0002] In recent years, generative artificial intelligence based on large language models (LLMs) has made groundbreaking progress and has been widely applied in fields such as document writing and code generation. Retrieval-enhanced generation (RAG) technology, by introducing external knowledge bases, has to some extent mitigated the knowledge lag and hallucination inherent in large models. However, in practical professional document generation scenarios (such as bidding documents, legal documents, and consulting reports), existing technical solutions suffer from unidirectional knowledge flow, a lack of self-evolution capability, and difficulty in accumulating private domain knowledge, and they also face significant cybersecurity risks.

(1) Unidirectional knowledge flow: Most current systems operate in a "one-way output" mode, in which the AI generates content for users to use. When users find errors, omissions, or outdated information in the AI-generated content and correct them, this high-value correction data is usually retained only in the current document and cannot flow back to the system knowledge base.

[0003] (2) Lack of self-evolution ability: Because the system cannot absorb user feedback and modifications, the AI model repeats the same mistakes, and the quality of the generated documents does not improve as usage increases, so users have to make the same corrections again and again.

[0004] (3) Difficulty in accumulating private domain knowledge: Knowledge in professional fields (such as the latest project experience and customer preferences) is often implied in the user's editing behavior. The existing knowledge base updates rely on manual regular organization and import, which is inefficient and lagging.

[0005] (4) Risk of sensitive information leakage: During the document generation and editing process, there is a lack of identification and protection mechanisms for sensitive information such as trade secrets, personal privacy, and classified data, which can easily lead to the leakage of sensitive information through AI-generated content or user editing operations.

[0006] (5) Lack of access control: There is a lack of refined access control in the knowledge base access, document editing and knowledge feedback process. Unauthorized users may tamper with core knowledge or steal sensitive information.

[0007] (6) Malicious injection and compliance risks: There is no effective malicious content detection mechanism, which may result in the injection of false information, malicious code and other harmful content. At the same time, there is a lack of compliance verification, which may easily generate content that violates laws and regulations or corporate security policies.

[0008] Therefore, a problem urgently to be solved by those skilled in the art is how to provide a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method and system that combines "knowledge closed-loop feedback" with "full-process security protection", so that AI writing capabilities evolve continuously while sensitive information leakage, malicious attacks, and compliance risks are prevented.

Summary of the Invention

[0009] In view of this, the present invention provides a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method and system. It aims to solve the core technical problems of existing document generation technologies, such as "knowledge not being updated," "user modifications not being absorbed," and "AI not being able to continuously evolve," while also addressing the following network security-related technical issues:
(1) The risk of sensitive information leakage throughout the entire process of document generation, editing, and knowledge feedback;
(2) Lax access control in knowledge base access, document modification, and knowledge entry;
(3) Security risks to the knowledge base caused by malicious knowledge injection and the implantation of false information;
(4) The lack of operational traceability and auditing mechanisms, which makes it impossible to trace the source of security incidents;
(5) Non-compliance with the Data Security Law, the Personal Information Protection Law, and other compliance requirements.

[0010] To achieve the above objectives, the present invention adopts the following technical solution. A network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method includes:
Step 1: Based on the user's generation instruction, retrieve relevant context from the vector knowledge base using retrieval-enhanced generation technology, and invoke the large language model to generate the initial draft of the document;
Step 2: Perform a difference analysis on the initial draft of the document before and after user editing, and use preset candidate knowledge recognition rules and a natural language processing model to perform semantic analysis on the extracted difference text blocks to identify potential candidate knowledge points;
Step 3: Vectorize the candidate knowledge points confirmed by the user and store them in the vector knowledge base together with metadata.

[0011] Optionally, in step 1, retrieving relevant context from the vector knowledge base using retrieval-enhanced generation technology based on the user's generation instruction, and invoking the large language model to generate the initial draft of the document, specifically comprises: extracting the core keywords of the generation instruction and transforming them into a query vector using a vector embedding model; performing a similarity search in the vector knowledge base based on the query vector to obtain the Top-K most relevant pieces of contextual knowledge; structurally assembling the generation instruction and the retrieved contextual knowledge according to a preset Prompt template; and submitting the assembled Prompt to the large language model, which generates the initial draft of the document word by word in an autoregressive manner based on the generation instruction and the contextual knowledge.
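The Top-K similarity search described above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure (the text does not name one), and the names `cosine_similarity`, `retrieve_top_k`, and `toy_knowledge_base`, along with the three-dimensional toy vectors, are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vector: List[float],
    knowledge_base: List[Tuple[str, List[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k knowledge slices most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in knowledge_base]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors (assumption); real embeddings have far more dimensions.
toy_knowledge_base = [
    ("Smart city project case study", [0.9, 0.1, 0.0]),
    ("Bidding technical specification", [0.7, 0.7, 0.0]),
    ("Office holiday schedule", [0.0, 0.0, 1.0]),
]
top_hits = retrieve_top_k([1.0, 0.2, 0.0], toy_knowledge_base, k=2)
```

A production vector database would typically replace this linear scan with an approximate nearest-neighbour index, which is why the scan here should be read as a small-scale illustration only.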

[0012] Optionally, the transformation into a query vector using a vector embedding model is specifically as follows. A vector embedding model based on an encoder-only Transformer architecture is adopted, which has been trained by contrastive learning to map text to a semantic vector space. The specific transformation process is: the text to be converted is segmented into a sequence of tokens that the model can process; the token sequence is input into the Transformer encoder, which generates token vectors containing deep semantic information through a self-attention mechanism; an average pooling operation is performed on the token vectors output by the encoder, merging them into a single vector that represents the semantics of the entire text; and the resulting single vector is L2-normalized to produce the corresponding unit vector.

[0013] Optionally, step 1 further includes, after invoking the large language model to generate the initial draft of the document: recording a generation log for security auditing, where the log includes the user ID, the operation time, the scope of knowledge retrieved, and the generated content.

[0014] Optionally, in step 2, the difference analysis on the initial draft of the document before and after user editing is specifically as follows. The initial draft generated by the large language model is captured as the pre-edit version, and the document state at the moment the user triggers the save operation is captured as the post-edit version. The structured content of both versions is converted into plain text, and whitespace and line breaks are handled uniformly. The minimum sequence of editing operations from the pre-edit version to the post-edit version is calculated using a difference algorithm, where the editing operations include insertion, deletion, and unchanged. The operation sequence output by the algorithm is then traversed: text marked as inserted or deleted is identified as difference text blocks, and the unchanged text before and after each difference text block is attached to form difference data carrying context.

[0015] Optionally, in step 2, the semantic analysis of the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points is specifically as follows. The difference text blocks are first filtered quickly using the preset candidate knowledge recognition rules: obvious formatting adjustments and punctuation modifications are removed, and high-value potential knowledge is preliminarily marked. For difference text blocks that the rules cannot classify accurately, a text classification model fine-tuned on the Transformer architecture is invoked to output a probability distribution over intent classes, where the intent classes include fact update, information supplementation, and grammatical error correction. When the probability output by the model for either the fact update or the information supplementation class exceeds a preset threshold, the current difference text block is formally identified as a potential candidate knowledge point.

[0016] Optionally, in step 3, vectorizing the candidate knowledge points confirmed by the user and storing them in the vector knowledge base together with metadata is specifically as follows. The vector embedding model based on the encoder-only Transformer architecture is invoked to transform each candidate knowledge point into a single high-dimensional semantic vector through tokenization, encoding, pooling, and normalization. The high-dimensional semantic vector and the metadata are stored in a vector database, and the index is updated for subsequent retrieval. The metadata includes the modifier ID, the modification time, and the associated document context.

[0017] This invention also provides a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback system that uses the above method, comprising:
Document draft generation module: used to retrieve relevant context from the vector knowledge base using retrieval-enhanced generation technology based on the user's generation instruction, and to invoke the large language model to generate the initial draft of the document;
Potential candidate knowledge point identification module: used to perform a difference analysis on the initial draft of the document before and after user editing, and to perform semantic analysis on the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points;
Vector knowledge base update module: used to vectorize the candidate knowledge points confirmed by the user and store them in the vector knowledge base together with metadata.

[0018] As can be seen from the above technical solutions, compared with the prior art, the present invention discloses a method and system for document content generation, optimization, and knowledge base feedback in a network security-enhanced human-computer collaboration manner, achieving the following beneficial effects: (1) A complete knowledge loop has been constructed: an automated process from "AI generation" to "user revision" and then to "knowledge feedback" has been realized, which solves the problem of lagging knowledge base updates.

[0019] (2) The system has achieved self-evolution: As users continue to use and modify it, the system can automatically accumulate the latest professional knowledge and expression habits, and the accuracy and timeliness of document generation will be significantly improved over time.

[0020] (3) Reduced knowledge maintenance costs: The knowledge accumulation process is integrated into the user's daily writing process, eliminating the need for a dedicated knowledge administrator to perform tedious data organization work.

Attached Figure Description

[0021] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0022] Figure 1 is a schematic diagram of the method flow provided by the present invention.

Detailed Implementation

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0024] Example 1: Embodiment 1 of this invention discloses a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method which, as shown in Figure 1, includes: Step 1: Based on the user's generation instruction, retrieve relevant context from the vector knowledge base using retrieval-enhanced generation (RAG) technology, and invoke the large language model to generate the initial draft of the document.

[0025] Based on the user's generation instruction (such as "bidding proposal for a smart city project"), relevant context is retrieved from the vector knowledge base using retrieval-enhanced generation technology, and a large language model is invoked to generate the initial draft of the document. Specifically: the core keywords of the generation instruction ("smart city project", "bidding proposal") are extracted and transformed into a query vector through a vector embedding model; based on the query vector, a similarity search is performed in the vector knowledge base to obtain the Top-K most relevant pieces of contextual knowledge (text slices such as historical project cases and technical specifications); the generation instruction and the retrieved contextual knowledge are structurally assembled according to a preset Prompt template, in which the Prompt clearly states the role of the large language model (such as "bid expert"), the task to be completed, and the background information that must be referenced; and the assembled Prompt is submitted via API to a large language model based on a decoder-only Transformer architecture (such as GPT-4 or the Llama series). The large language model generates the initial draft of the document word by word in an autoregressive manner based on the generation instruction and the contextual knowledge. The system then receives and formats the draft before presenting it to the user.
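As an illustration of the Prompt assembly step, the following sketch fills a template with the role, the task, and the retrieved context slices. The template wording, the numbering of context slices, and the names `PROMPT_TEMPLATE` and `assemble_prompt` are assumptions; the text only states which three elements the Prompt must contain.

```python
from string import Template
from typing import List

# Hypothetical template: the source only says the Prompt states the role,
# the task, and the background information that must be referenced.
PROMPT_TEMPLATE = Template(
    "Role: $role\n"
    "Task: $instruction\n"
    "Background information that must be referenced:\n"
    "$context\n"
)


def assemble_prompt(role: str, instruction: str, context_slices: List[str]) -> str:
    """Fill the template with the instruction and numbered context slices."""
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(context_slices, start=1)
    )
    return PROMPT_TEMPLATE.substitute(role=role, instruction=instruction, context=numbered)


prompt = assemble_prompt(
    role="bid expert",
    instruction="Write a bidding proposal for a smart city project",
    context_slices=[
        "Historical project case (placeholder text)",
        "Technical specification (placeholder text)",
    ],
)
```

Numbering the slices is a design choice made here so that a generated draft could refer back to a specific source slice; the patent text does not require it.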

[0026] The user views the initial draft in a front-end text editor. Suppose the initial draft mentions "Project Manager Zhang San, with 10 years of experience," while the user knows that Zhang San recently won the "2024 Gold Medal Project Manager" award and has now gained 12 years of experience. The user directly modifies the content in the editor to "Project Manager Zhang San, a senior expert with 12 years of experience, and a recipient of the 2024 Gold Medal Project Manager award."

[0027] The keywords are transformed into a query vector through a vector embedding model, specifically as follows. A vector embedding model based on an encoder-only Transformer architecture (such as Sentence-BERT or a BGE model) is used, which has been trained by contrastive learning to map text to a semantic vector space. The specific transformation process is:
Tokenization: The text to be converted is divided into a sequence of tokens that the model can process;
Encoding: The token sequence is input into the Transformer encoder, which generates token vectors containing deep semantic information through the self-attention mechanism;
Pooling: An average pooling operation is performed on the token vectors output by the encoder, merging them into a single vector that represents the semantics of the entire text;
Normalization: The resulting single vector is L2-normalized to produce the corresponding unit vector.
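The pooling and normalization stages can be sketched independently of any particular model. The example below assumes the encoder has already produced one vector per token; the hard-coded `token_vectors` are a stand-in for that output, and the function names are hypothetical.

```python
import math
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average the token vectors element-wise into one text-level vector."""
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale the vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Stand-in for encoder output (assumption): three token vectors of dimension 2.
token_vectors = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]
pooled = mean_pool(token_vectors)  # [2.0, 2.0]
unit = l2_normalize(pooled)        # unit-length vector pointing the same way
```

Because the output has unit length, the dot product of two such vectors equals their cosine similarity, which simplifies the later similarity search. Batched implementations usually also exclude padding tokens from the average; that detail is omitted here.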

[0028] After the initial draft of the document is generated by invoking the large language model, the method further includes: recording a generation log for security auditing, where the log includes the user ID, the operation time, the scope of knowledge retrieved, and the generated content.
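One possible shape for such a generation log is a JSON record with the four fields named above. The field names, the JSON serialization, the use of knowledge slice identifiers to express the retrieved scope, and the function name `build_generation_log` are all assumptions made for this sketch.

```python
import json
from datetime import datetime, timezone
from typing import List


def build_generation_log(user_id: str, retrieved_ids: List[str], generated_content: str) -> str:
    """Serialize one generation event as a JSON line for a security audit trail."""
    record = {
        "user_id": user_id,
        # UTC timestamp in ISO 8601 form so records sort and compare consistently.
        "operation_time": datetime.now(timezone.utc).isoformat(),
        # Hypothetical slice identifiers standing in for the retrieved knowledge scope.
        "retrieved_knowledge_scope": retrieved_ids,
        "generated_content": generated_content,
    }
    return json.dumps(record, ensure_ascii=False)


log_line = build_generation_log("user-001", ["kb-17", "kb-42"], "Draft text")
```

Writing one JSON object per line keeps the log append-only and easy to filter by user or time during an audit.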

[0029] Step 2: Perform a difference analysis (Diff) on the initial draft of the document before and after user editing, and use preset candidate knowledge recognition rules and natural language processing models to perform semantic analysis on the extracted difference text blocks to identify potential candidate knowledge points.

[0030] The difference analysis on the initial draft of the document before and after user editing is specifically as follows. The initial draft generated by the large language model is captured as pre-edit version A, and the document state at the moment the user triggers the save operation is captured as post-edit version B. The structured content of both versions is converted into plain text, and whitespace and line breaks are handled uniformly. The minimum sequence of editing operations from version A to version B is calculated using a difference algorithm (the Myers difference algorithm or a similarly efficient algorithm), where the editing operations include insertion, deletion, and unchanged. The operation sequence output by the algorithm is then traversed: text marked as inserted or deleted is identified as difference text blocks, and the unchanged text before and after each difference text block is attached to form difference data carrying context.
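The extraction of difference text blocks with surrounding context can be sketched with Python's standard `difflib`. Note that `difflib.SequenceMatcher` is not the Myers algorithm and does not guarantee a minimal edit sequence; it is swapped in here only because it ships with the standard library. Comparing at word granularity, the three-word context window, and the name `extract_diff_blocks` are further assumptions.

```python
import difflib
from typing import Dict, List


def extract_diff_blocks(before: str, after: str, context_words: int = 3) -> List[Dict[str, str]]:
    """Return changed word spans together with the unchanged words around them."""
    # split() treats spaces, tabs, and line breaks alike, which gives the
    # uniform whitespace handling described in the text.
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    blocks = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text is used only as context
        # difflib reports 'replace' as one opcode; it is kept here as a
        # deletion and an insertion at the same position.
        blocks.append({
            "deleted": " ".join(a[a1:a2]),
            "inserted": " ".join(b[b1:b2]),
            "context_before": " ".join(b[max(0, b1 - context_words):b1]),
            "context_after": " ".join(b[b2:b2 + context_words]),
        })
    return blocks


version_a = "Project Manager Zhang San,\nwith 10 years of experience"
version_b = "Project Manager Zhang San, with 12 years of experience"
diff_blocks = extract_diff_blocks(version_a, version_b)
```

For these two versions the only change is the word "10" becoming "12"; the line break in version A produces no difference block because both texts are split on any whitespace.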

[0031] The semantic analysis of the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points is specifically as follows. The difference text blocks are first filtered quickly using the preset candidate knowledge recognition rules (such as regular expression-based detection of changes in numbers and units, and dictionary-based detection of changes in proper nouns): obvious formatting adjustments and punctuation modifications are removed, and high-value potential knowledge is preliminarily marked. For difference text blocks that the rules cannot classify accurately, a text classification model fine-tuned on the Transformer architecture (such as a lightweight BERT model) is invoked to output a probability distribution over intent classes, where the intent classes include fact update, information supplementation, and grammatical error correction. When the probability output by the model for either the fact update or the information supplementation class exceeds a preset threshold, the current difference text block is formally identified as a potential candidate knowledge point.
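The two-stage decision (rules first, classifier second) can be sketched as follows. The specific rules, the label strings, the threshold value of 0.6, and the function names are all hypothetical; the classifier itself is not implemented, and its output is represented by a plain dictionary of probabilities.

```python
import re
import string
from typing import Dict

# Hypothetical label names for the two knowledge-bearing intent classes.
KNOWLEDGE_INTENTS = ("fact_update", "information_supplement")


def strip_format(text: str) -> str:
    """Drop whitespace and ASCII punctuation so only substantive characters remain."""
    return "".join(
        ch for ch in text if ch not in string.punctuation and not ch.isspace()
    )


def rule_filter(deleted: str, inserted: str) -> str:
    """Return 'discard', 'candidate', or 'undecided' for one difference block."""
    if strip_format(deleted) == strip_format(inserted):
        return "discard"    # only formatting or punctuation changed
    if re.findall(r"\d+", deleted) != re.findall(r"\d+", inserted):
        return "candidate"  # a number changed: mark as high-value knowledge
    return "undecided"      # defer to the classification model


def is_candidate(intent_probs: Dict[str, float], threshold: float = 0.6) -> bool:
    """True when either knowledge-bearing intent exceeds the threshold."""
    return any(intent_probs.get(name, 0.0) > threshold for name in KNOWLEDGE_INTENTS)


decision_punct = rule_filter("experience", "experience.")
decision_number = rule_filter("10 years", "12 years")
decision_wording = rule_filter("an expert", "a senior expert")
model_says_yes = is_candidate(
    {"fact_update": 0.72, "information_supplement": 0.20, "grammar_correction": 0.08}
)
```

Running the cheap rules first means the classification model is invoked only for the blocks the rules leave undecided, which matches the ordering described in the text.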

[0032] Step 3: Vectorize the candidate knowledge points confirmed by the user (by calling the Embedding model) and store them in the PGVector vector knowledge base along with the metadata.

[0033] The candidate knowledge points are shown to the user for confirmation in the "Knowledge Adoption Panel" that pops up on the right side of the editor (for example: "New information about 'Zhang San' has been detected. Do you want to update it to the knowledge base?"). The candidate knowledge points confirmed by the user are then vectorized and stored in the vector knowledge base together with metadata. Specifically: the vector embedding model based on the encoder-only Transformer architecture is invoked to transform each candidate knowledge point into a single high-dimensional semantic vector through tokenization, encoding, pooling, and normalization. The high-dimensional semantic vector and the metadata are stored in a vector database, and the index is updated for subsequent retrieval. The metadata includes the modifier ID, the modification time, and the associated document context.
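The storage step can be illustrated with an in-memory stand-in for the vector database. The classes `KnowledgeEntry` and `InMemoryVectorStore`, the metadata key names, and the two-dimensional placeholder vector are hypothetical; a real deployment on PGVector would issue database writes instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class KnowledgeEntry:
    text: str
    vector: List[float]
    metadata: Dict[str, str]


@dataclass
class InMemoryVectorStore:
    """Hypothetical in-memory stand-in for a vector database such as PGVector."""
    entries: List[KnowledgeEntry] = field(default_factory=list)

    def add(self, text: str, vector: List[float], modifier_id: str,
            document_context: str) -> KnowledgeEntry:
        """Store a confirmed knowledge point with the metadata named in the text."""
        entry = KnowledgeEntry(
            text=text,
            vector=vector,
            metadata={
                "modifier_id": modifier_id,
                "modification_time": datetime.now(timezone.utc).isoformat(),
                "document_context": document_context,
            },
        )
        # A real vector database would also refresh its similarity index here.
        self.entries.append(entry)
        return entry


store = InMemoryVectorStore()
saved = store.add(
    text="Zhang San has 12 years of experience and won the 2024 Gold Medal Project Manager award",
    vector=[0.6, 0.8],  # placeholder for an L2-normalized embedding
    modifier_id="user-001",
    document_context="Project Manager Zhang San, a senior expert with 12 years of experience",
)
```

Keeping the modifier ID and modification time next to each vector is what later allows a stored knowledge point to be traced back to the edit that produced it.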

[0034] The system implementation architecture of the method disclosed in this invention is as follows: (1) Front-end interaction layer: integrates a rich text editor such as ProseMirror or Tiptap, and provides operation monitoring and Diff visualization functions.

[0035] (2) Business Logic Layer (Backend): a. Generation service: Encapsulates the RAG process and manages Prompt templates.

[0036] b. Difference Calculation Service: Based on text line or character-level difference comparison algorithms.

[0037] c. Knowledge extraction service: includes a heuristic rule base (e.g., identifying changes in numbers and proper nouns) and a discriminative model.

[0038] (3) Data storage layer: a. Document database: Stores snapshots of various versions of documents.

[0039] b. Vector Knowledge Base (Vector DB): Stores sliced knowledge vectors and supports semantic similarity retrieval.

[0040] Example 2: Embodiment 2 of this invention discloses a network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback system that uses the above method, comprising:
Document draft generation module: used to retrieve relevant context from the vector knowledge base using retrieval-enhanced generation technology based on the user's generation instruction, and to invoke the large language model to generate the initial draft of the document;
Potential candidate knowledge point identification module: used to perform a difference analysis on the initial draft of the document before and after user editing, and to perform semantic analysis on the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points;
Vector knowledge base update module: used to vectorize the candidate knowledge points confirmed by the user and store them in the vector knowledge base together with metadata.

[0041] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to the method section.

[0042] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method, characterized in that it comprises:
Step 1: Based on the user's generation instruction, retrieve relevant context from the vector knowledge base using retrieval-enhanced generation technology, and invoke the large language model to generate the initial draft of the document;
Step 2: Perform a difference analysis on the initial draft of the document before and after user editing, and use preset candidate knowledge recognition rules and a natural language processing model to perform semantic analysis on the extracted difference text blocks to identify potential candidate knowledge points;
Step 3: Vectorize the candidate knowledge points confirmed by the user and store them in the vector knowledge base together with metadata.

2. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 1, characterized in that, in step 1, retrieving relevant context from the vector knowledge base using retrieval-enhanced generation technology based on the user's generation instruction, and invoking the large language model to generate the initial draft of the document, specifically comprises: extracting the core keywords of the generation instruction and transforming them into a query vector using a vector embedding model; performing a similarity search in the vector knowledge base based on the query vector to obtain the Top-K most relevant pieces of contextual knowledge; structurally assembling the generation instruction and the retrieved contextual knowledge according to a preset Prompt template; and submitting the assembled Prompt to the large language model, which generates the initial draft of the document word by word in an autoregressive manner based on the generation instruction and the contextual knowledge.

3. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 2, characterized in that the transformation into a query vector using a vector embedding model is specifically as follows: a vector embedding model based on an encoder-only Transformer architecture is adopted, which has been trained by contrastive learning to map text to a semantic vector space, and the specific transformation process is: the text to be converted is segmented into a sequence of tokens that the model can process; the token sequence is input into the Transformer encoder, which generates token vectors containing deep semantic information through a self-attention mechanism; an average pooling operation is performed on the token vectors output by the encoder, merging them into a single vector that represents the semantics of the entire text; and the resulting single vector is L2-normalized to produce the corresponding unit vector.

4. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 1, characterized in that step 1 further comprises, after invoking the large language model to generate the initial draft of the document: recording a generation log for security auditing, the log including the user ID, the operation time, the scope of knowledge retrieved, and the generated content.

5. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 1, characterized in that, in step 2, the difference analysis on the initial draft of the document before and after user editing is specifically as follows: the initial draft generated by the large language model is captured as the pre-edit version, and the document state at the moment the user triggers the save operation is captured as the post-edit version; the structured content of both versions is converted into plain text, and whitespace and line breaks are handled uniformly; the minimum sequence of editing operations from the pre-edit version to the post-edit version is calculated using a difference algorithm, wherein the editing operations include insertion, deletion, and unchanged; and the operation sequence output by the algorithm is traversed, text marked as inserted or deleted is identified as difference text blocks, and the unchanged text before and after each difference text block is attached to form difference data carrying context.

6. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 1, characterized in that, in step 2, the semantic analysis of the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points is specifically as follows: the difference text blocks are filtered quickly using the preset candidate knowledge recognition rules, obvious formatting adjustments and punctuation modifications are removed, and high-value potential knowledge is preliminarily marked; for difference text blocks that the rules cannot classify accurately, a text classification model fine-tuned on the Transformer architecture is invoked to output a probability distribution over intent classes, wherein the intent classes include fact update, information supplementation, and grammatical error correction; and when the probability output by the model for either the fact update or the information supplementation class exceeds a preset threshold, the current difference text block is formally identified as a potential candidate knowledge point.

7. The network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback method according to claim 1, characterized in that, in step 3, vectorizing the candidate knowledge points confirmed by the user and storing them in the vector knowledge base together with metadata is specifically as follows: the vector embedding model based on the encoder-only Transformer architecture is invoked to transform each candidate knowledge point into a single high-dimensional semantic vector through tokenization, encoding, pooling, and normalization; the high-dimensional semantic vector and the metadata are stored in a vector database, and the index is updated for subsequent retrieval; and the metadata includes the modifier ID, the modification time, and the associated document context.

8. A network security-enhanced human-computer collaborative document content generation, optimization, and knowledge base feedback system using the method according to any one of claims 1-7, characterized in that it comprises:
Document draft generation module: used to retrieve relevant context from the vector knowledge base using retrieval-enhanced generation technology based on the user's generation instruction, and to invoke the large language model to generate the initial draft of the document;
Potential candidate knowledge point identification module: used to perform a difference analysis on the initial draft of the document before and after user editing, and to perform semantic analysis on the extracted difference text blocks using preset candidate knowledge recognition rules and a natural language processing model to identify potential candidate knowledge points;
Vector knowledge base update module: used to vectorize the candidate knowledge points confirmed by the user and store them in the vector knowledge base together with metadata.