Large language model text provenance method based on virtual prompt word embedding

By embedding virtual prompt words into a large language model and utilizing Transformer and Long Short-Term Memory networks, the robustness problem of large language models in text transformation scenarios is solved, achieving stable extraction and tracing of identity identifiers, which is suitable for applications with high real-time requirements.

CN121959528B · Active · Publication Date: 2026-06-19 · JINAN UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JINAN UNIVERSITY
Filing Date
2026-04-01
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing security control methods for text generated by large language models are not robust enough in the face of text editing and semantic restatement, cannot reliably extract verifiable identity information, and have high deployment costs, making them unsuitable for application scenarios with high real-time requirements.

Method used

By embedding virtual prompts into the system prompts of a large language model, the Transformer module is used to generate watermarked text with user identity information, and the identity information is extracted through a long short-term memory network, thus achieving instant embedding and stable extraction of identity identifiers.

Benefits of technology

Without modifying the main parameters of the model, robust embedding of watermark information and reliable extraction of identity information are achieved, making it suitable for application scenarios with high real-time requirements and providing verifiable traceability evidence.

Abstract

This invention provides a method for tracing the source of large language model text based on virtual prompt word embedding, belonging to the fields of artificial intelligence natural language processing technology and digital content security technology. The method includes: (1) constructing chat input command text with role format; (2) obtaining embedded vector input; (3) obtaining system prompt word embedding representation; (4) obtaining generated text with user identity information watermark; (5) restoring user identity information to achieve source tracing of generated text. This invention achieves real-time embedding of traceable identity information in the large language model generation process without modifying the main parameters of the large language model or affecting the model text generation quality and response efficiency. At the same time, it ensures the robustness of the embedded watermark information in text transformation scenarios, and can stably extract verifiable and reproducible identity information from the generated text, providing reliable technical support for tracing the source of generated text.

Description

Technical Field

[0001] This invention relates to the fields of artificial intelligence natural language processing technology and digital content security technology, and in particular to a method for tracing the source of large language model text based on virtual prompt word embedding.

Background Technology

[0002] With the rapid development of artificial intelligence (AI) technology, large language models, with their powerful natural language reasoning and text generation capabilities, have been widely applied in scenarios such as text creation, code generation, and intelligent question answering, greatly improving the efficiency of information production and processing. At the same time, the generation capabilities of large language models have significantly lowered the production threshold for harmful information such as false information, fraudulent content, academic plagiarism, and spam, exacerbating the risk of harmful information dissemination and posing a serious challenge to the governance of online content ecosystems and information security supervision. To address the security control needs of AI-generated synthetic content, the Measures for the Identification of AI-Generated Synthetic Content explicitly require that AI-generated content contain verifiable tagging information. This involves embedding tagging payloads into the generated synthetic content using watermarking technology to achieve content legitimacy verification and source traceability.

[0003] Current research on the security management of text generated by large language models mainly focuses on the detection of generated content. Mainstream technical solutions either add biases to the probability distribution of the predicted vocabulary of the large language model or embed implicit tags based on the statistical analysis of the language features of the generated text. While these methods can achieve automated detection of generated text to some extent, they still have many insurmountable technical shortcomings. On the one hand, existing methods heavily rely on specific generation strategies or pre-existing assumptions. When faced with common text transformation operations such as text editing, content rewriting, and semantic restatement, the robustness of the tags is insufficient, easily leading to the loss of tag information and detection failure. On the other hand, most existing detection schemes can only achieve qualitative identification of whether the text was generated by the model; they cannot extract verifiable and reproducible identity information, making it difficult to provide legally valid traceability evidence and failing to meet the application needs of complex security and judicial supervision scenarios. In addition, some existing watermark embedding solutions require modification of the main parameters of the large language model or the introduction of additional model post-processing procedures. This not only results in high deployment costs and poor adaptability, but also affects the text generation quality and response efficiency of the large language model, making it unsuitable for scenarios with high real-time requirements such as intelligent question answering and intelligent agent interaction.

Summary of the Invention

[0004] This invention provides a method for tracing the source of large language model text based on virtual prompt word embedding. Without modifying the main parameters of the large language model or affecting the quality and response efficiency of the generated text, it enables the real-time embedding of traceable identity information during the generation process of the large language model. At the same time, it ensures the robustness of the embedded watermark information in text transformation scenarios. It can stably extract verifiable and reproducible identity information from the generated text, providing reliable technical support for tracing the source of the generated text and meeting the needs of security supervision and judicial evidence for AI-generated content.

[0005] To achieve the above objectives, the present invention adopts the following technical solution:

[0006] A method for tracing the source of large language model text based on virtual prompt word embedding, comprising:

[0007] (1) Obtain system prompts and user input, and use a fixed template to combine system prompts and user input to construct chat input command text with role format;

[0008] (2) Obtain the word segmenter and embedding matrix corresponding to the target chat language model, and use the word segmenter and embedding matrix to convert the chat input command text into the corresponding embedding vector input. The embedding vector input includes the embedding representation corresponding to the system prompt word and the embedding representation corresponding to the user input.

[0009] (3) Obtain user identity information, convert the user identity information into a fixed-length bit string, input the bit string and the embedding representation corresponding to the system prompt word into the pre-trained watermark encoder, generate a virtual prompt word representation containing m embedding vectors, where m is the length of the predefined watermark message, insert the virtual prompt word representation into the end position of the embedding representation corresponding to the system prompt word, and obtain the updated system prompt word embedding representation;

[0010] (4) Concatenate the updated system prompt word embedding representation with the embedding representation corresponding to the user input, and input it into the target chat large language model to obtain the generated text with user identity information watermark;

[0011] (5) Input the generated text into a pre-trained watermark extractor to recover the user identity information embedded in the generated text, and trace the source of the generated text based on the recovered user identity information.
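Steps (3) and (5) both depend on a reversible mapping between the user identity information and a fixed-length bit string. The following is a minimal sketch of one such mapping, assuming the identity is a non-negative integer ID and the watermark message length m is 16 bits. Both choices, and the function names `id_to_bits` and `bits_to_id`, are illustrative assumptions; the specification does not fix the identity format or the value of m.

```python
from typing import List


def id_to_bits(user_id: int, m: int = 16) -> List[int]:
    """Encode a non-negative integer ID as m bits, most significant bit first."""
    if not 0 <= user_id < 2 ** m:
        raise ValueError(f"user_id must fit in {m} bits")
    return [(user_id >> (m - 1 - i)) & 1 for i in range(m)]


def bits_to_id(bits: List[int]) -> int:
    """Inverse of id_to_bits: rebuild the integer ID from its bit list."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# Hypothetical user ID, used only to show the round trip.
bits = id_to_bits(1234, m=16)
recovered = bits_to_id(bits)
```

Because the mapping is lossless, any bit string that the watermark extractor restores correctly converts back to exactly one user ID.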

[0012] In this specification, in step (1), the fixed template contains predefined instruction words and role definition information. The role definition information includes the identification content corresponding to three types of roles: system, user and model. When constructing chat input instruction text, the system prompt words and user input are encapsulated according to the identification content corresponding to the system role and user role, respectively.
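The encapsulation described in this paragraph can be sketched as follows. The role markers shown here (`<|system|>`, `<|user|>`, `<|model|>`) are hypothetical placeholders; in practice the identification content is whatever chat format the target model was trained with, which the specification does not name.

```python
# Hypothetical role markers; a real deployment would use the target model's own chat format.
ROLE_MARKERS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "model": "<|model|>",
}


def build_chat_input(system_prompt: str, user_input: str) -> str:
    """Wrap the system prompt and user input in role markers, then open the model turn."""
    return (
        f"{ROLE_MARKERS['system']}\n{system_prompt}\n"
        f"{ROLE_MARKERS['user']}\n{user_input}\n"
        f"{ROLE_MARKERS['model']}\n"
    )


chat_text = build_chat_input(
    "You are a helpful assistant.",
    "Summarize the water cycle in two sentences.",
)
```

Ending the text with the model role marker tells the model that the next tokens it produces belong to its own reply.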

[0013] In this specification, in step (2), the specific process of converting the chat input command text into the corresponding embedded vector input is as follows:

[0014] (2.1) The chat input command text is segmented and sliced using the word segmenter to obtain a text sequence composed of several word units;

[0015] (2.2) Look up the encoding table corresponding to the word segmenter, convert each word unit in the text sequence into the corresponding token identifier, and obtain the encoding sequence composed of token identifiers;

[0016] (2.3) Match the corresponding embedding vector from the embedding matrix according to each token identifier to obtain the embedding vector input corresponding to the chat input command text.
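Steps (2.1) to (2.3) amount to segmentation, identifier lookup, and a row lookup in the embedding matrix. The sketch below illustrates the three stages with a toy whitespace segmenter, a seven-entry vocabulary, and a random 4-dimensional embedding matrix. All of these are assumptions made for illustration; a real target model supplies its own subword segmenter and trained embedding matrix.

```python
import random
from typing import List

# Toy encoding table (assumption); a real word segmenter has tens of thousands of entries.
VOCAB = {"<unk>": 0, "you": 1, "are": 2, "a": 3, "helpful": 4, "assistant": 5, "hello": 6}
EMBED_DIM = 4

# Toy embedding matrix with one row per vocabulary entry, filled with fixed random values.
_rng = random.Random(0)
EMBEDDING_MATRIX = [[_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def segment(text: str) -> List[str]:
    """Step (2.1): split the text into word units (whitespace split as a stand-in)."""
    return text.lower().split()


def encode(units: List[str]) -> List[int]:
    """Step (2.2): map each word unit to its token identifier, or <unk> if unseen."""
    return [VOCAB.get(unit, VOCAB["<unk>"]) for unit in units]


def embed(token_ids: List[int]) -> List[List[float]]:
    """Step (2.3): fetch the embedding row for each token identifier."""
    return [EMBEDDING_MATRIX[token_id] for token_id in token_ids]


token_ids = encode(segment("You are a helpful assistant"))
embeddings = embed(token_ids)
```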

[0017] In this specification, in step (3), the watermark encoder is composed of several Transformer modules connected in sequence. The watermark encoder encodes the bit string corresponding to the user identity information into the embedding space aligned with the embedding representation corresponding to the system prompt word through the attention mechanism, thereby generating the virtual prompt word representation.
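To make the attention-based alignment concrete, the following sketch maps each watermark bit to a query vector, lets that query attend over the system prompt embeddings with scaled dot-product attention, and adds the result back to the query. This is a deliberately reduced stand-in: it uses a single attention head, no learned projections, no feed-forward layers, and hand-picked bit vectors, whereas the encoder in the specification is a trained stack of Transformer modules. The names `encode_bits`, `bit0_vec`, and `bit1_vec` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention, one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def encode_bits(bits: List[int], system_emb: List[Vector],
                bit0_vec: Vector, bit1_vec: Vector) -> List[Vector]:
    """Produce one virtual prompt vector per bit, in the system prompt's embedding space."""
    queries = [bit1_vec if bit else bit0_vec for bit in bits]
    attended = attention(queries, system_emb, system_emb)
    # Residual connection keeps each output dependent on its own bit value.
    return [[q + a for q, a in zip(query, att)] for query, att in zip(queries, attended)]


system_emb = [[1.0, 2.0], [1.0, 2.0]]          # toy system prompt embeddings
virtual = encode_bits([1, 0], system_emb, bit0_vec=[0.0, 0.0], bit1_vec=[1.0, 1.0])
```

The output has one vector per watermark bit, each with the same dimension as the system prompt embeddings, which is what allows it to be inserted directly into the model input.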

[0018] In this specification, in step (3), after the virtual prompt word representation is inserted, the updated system prompt word embedding representation and the embedding representation corresponding to the user input are concatenated in sequence, and the dimension of the obtained complete embedding representation matches the input dimension of the target chat language model.
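The insertion and concatenation order can be sketched as simple list operations on embedding sequences. The vectors below are placeholders rather than outputs of a trained encoder, and `build_model_input` is a hypothetical helper name. The dimension check reflects the requirement that the complete representation match the input dimension of the target model.

```python
from typing import List

Vector = List[float]


def build_model_input(system_emb: List[Vector], virtual_emb: List[Vector],
                      user_emb: List[Vector]) -> List[Vector]:
    """Append the virtual prompt vectors to the system prompt, then append the user input."""
    dims = {len(vec) for vec in system_emb + virtual_emb + user_emb}
    if len(dims) != 1:
        raise ValueError("all embeddings must share the model's embedding dimension")
    updated_system_emb = system_emb + virtual_emb   # insert at the end of the system prompt
    return updated_system_emb + user_emb            # then concatenate the user input


# Placeholder embeddings (assumed 3-dimensional for brevity).
system_emb = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
virtual_emb = [[9.0, 9.0, 9.0]]                     # m = 1 virtual prompt vector
user_emb = [[0.7, 0.8, 0.9]]

full_input = build_model_input(system_emb, virtual_emb, user_emb)
```

Only the input sequence grows by m positions; the parameters of the target model are not touched.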

[0019] In this specification, the specific process of obtaining the generated text with the user identity information watermark in step (4) is as follows:

[0020] (4.1) Input the concatenated embedding representation into the target chat language model, and calculate the predicted probability distribution of the next word in the text through the multi-layer self-attention mechanism of the target chat language model;

[0021] (4.2) Select a word from the predicted probability distribution obtained in step (4.1) as the predicted text for the next position using a pre-defined continuous sampling strategy;

[0022] (4.3) After concatenating the lexical units obtained in step (4.2) to the input sequence in step (4.1), repeat the process from step (4.1) to step (4.3) until the output length reaches the preset maximum value, or the predefined terminator in the target chat language model vocabulary is obtained by sampling.

[0023] (4.4) Combine all the resulting lexical units into the generated text with user identity information watermark output by the target chat large language model.
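The loop in steps (4.1) to (4.4) is ordinary autoregressive sampling with two stop conditions. The sketch below shows its control flow with a toy stand-in for the frozen model: `toy_next_token_probs` is a hypothetical function that returns a fixed-shape distribution over a four-word vocabulary, and the sequence is tracked as words rather than embeddings for readability.

```python
import random
from typing import List

EOS = "<eos>"
VOCAB = ["the", "cat", "sat", EOS]


def toy_next_token_probs(sequence: List[str]) -> List[float]:
    """Stand-in for the frozen LLM: the terminator becomes likelier as the sequence grows."""
    weights = [1.0, 1.0, 1.0, 0.1 * (len(sequence) + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(prompt: List[str], max_new_tokens: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    sequence = list(prompt)
    output: List[str] = []
    while len(output) < max_new_tokens:                        # stop at the preset maximum
        probs = toy_next_token_probs(sequence)                 # step (4.1)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]      # step (4.2)
        if token == EOS:                                       # stop at the terminator
            break
        output.append(token)
        sequence.append(token)                                 # step (4.3)
    return " ".join(output)                                    # step (4.4)


text = generate(["hello"], max_new_tokens=5, seed=1)
```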

[0024] In this specification, in step (4), during the process of obtaining the generated text, the distribution loss between the word prediction probability distribution obtained in step (4.1) and the expected answer probability distribution is calculated simultaneously; after obtaining the generated text, the semantic representation of the generated text is extracted through the pre-trained semantic representation model, and the semantic loss between the generated text and the expected answer is calculated. The distribution loss and the semantic loss are both used for the training and optimization of the watermark encoder.
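The two quality-preserving losses can be sketched as below. The forms used here follow the later embodiment description, which constructs the distribution loss with cross-entropy and bases the semantic loss on cosine similarity. Writing the semantic loss as one minus the cosine similarity is an assumption, since the specification does not give the exact formula, and the input vectors are illustrative values rather than outputs of a real semantic representation model.

```python
import math
from typing import List


def distribution_loss(expected: List[float], predicted: List[float],
                      eps: float = 1e-12) -> float:
    """Cross-entropy between the expected-answer distribution and the predicted distribution."""
    return -sum(p * math.log(q + eps) for p, q in zip(expected, predicted))


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_loss(generated_vec: List[float], expected_vec: List[float]) -> float:
    """Assumed form: 1 - cosine similarity, so identical directions give zero loss."""
    return 1.0 - cosine_similarity(generated_vec, expected_vec)


d_loss = distribution_loss([1.0, 0.0], [0.5, 0.5])
s_loss_same = semantic_loss([1.0, 0.0], [2.0, 0.0])
s_loss_orthogonal = semantic_loss([1.0, 0.0], [0.0, 1.0])
```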

[0025] In this specification, in step (5), the watermark extractor is composed of a long short-term memory network and several fully connected layers connected in sequence, and the latent space dimension of the long short-term memory network is consistent with the embedding space dimension of the target chat large language model.

[0026] In this specification, the specific process of restoring user identity information in step (5) is as follows:

[0027] (5.1) The generated text is converted into the corresponding continuous embedding representation by using a word segmenter and an embedding matrix;

[0028] (5.2) Input the continuous embedding representations corresponding to the generated text into the Long Short-Term Memory network one by one in the order of the lexical units, and extract the sequence features of the generated text;

[0029] (5.3) Input the extracted sequence features into the fully connected layer to obtain the prediction results corresponding to each bit of the watermark information. Based on the prediction results, restore the bit string corresponding to the user identity information and finally convert it to obtain the user identity information.
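The data flow of steps (5.2) and (5.3) can be sketched with a single-layer LSTM cell followed by one fully connected layer. The weights below are random and untrained, the cell omits bias terms, and the class name `TinyLSTMExtractor` is hypothetical, so the recovered bits carry no meaning here; the sketch only shows how embeddings are consumed one by one and how per-bit predictions are thresholded. Following the specification, the hidden size is set equal to the embedding size.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyLSTMExtractor:
    """Untrained single-layer LSTM plus one fully connected layer (illustration only)."""

    def __init__(self, embed_dim: int, num_bits: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = embed_dim  # hidden size matches the embedding size

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        in_dim = embed_dim + self.hidden_dim
        self.gates = {name: mat(self.hidden_dim, in_dim) for name in ("i", "f", "o", "g")}
        self.fc = mat(num_bits, self.hidden_dim)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(v) for v in matvec(self.gates["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(self.gates["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.gates["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(self.gates["g"], z)]  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def extract_bits(self, embeddings: List[Vector]) -> List[int]:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:                                    # step (5.2)
            h, c = self.step(x, h, c)
        probs = [sigmoid(v) for v in matvec(self.fc, h)]        # step (5.3)
        return [1 if p >= 0.5 else 0 for p in probs]


extractor = TinyLSTMExtractor(embed_dim=4, num_bits=8, seed=0)
toy_embeddings = [[0.1 * t, -0.1 * t, 0.05, 0.0] for t in range(5)]
predicted_bits = extractor.extract_bits(toy_embeddings)
```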

[0030] In this specification, a text rewriting attack simulation module is set up during the training phase of the watermark encoder and watermark extractor. The text rewriting attack simulation module is built based on a pre-trained large language model. The watermarked text generated by the target chat large language model is input into the text rewriting attack simulation module to obtain the rewritten attack text. The attack text and the original watermarked text are input together into the watermark extractor to train and optimize the watermark recovery capability of the watermark extractor.
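One way to read this paragraph is that every watermarked training text contributes two examples to the extractor, the original and its rewritten version, both labeled with the same bit string. The sketch below builds such a batch. The rewriting step is replaced by a toy synonym substitution (`simulate_rewrite` and the `SYNONYMS` table are hypothetical), whereas the specification uses a pre-trained large language model as the rewriter.

```python
from typing import Dict, List, Tuple

# Toy substitution table standing in for an LLM-based paraphraser (assumption).
SYNONYMS: Dict[str, str] = {"quick": "fast", "big": "large", "happy": "glad"}


def simulate_rewrite(text: str) -> str:
    """Stand-in rewriting attack: swap listed words for synonyms."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


def build_extractor_batch(watermarked_texts: List[str],
                          bit_strings: List[List[int]]) -> List[Tuple[str, List[int]]]:
    """Pair each original text and its rewritten version with the same identity bits."""
    batch = []
    for text, bits in zip(watermarked_texts, bit_strings):
        batch.append((text, bits))                    # original watermarked text
        batch.append((simulate_rewrite(text), bits))  # attacked text, same label
    return batch


batch = build_extractor_batch(
    ["the quick fox is happy", "a big river"],
    [[1, 0, 1, 1], [0, 0, 1, 0]],
)
```

Training on both versions with an identical label pushes the extractor to rely on features that survive rewriting.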

[0031] In summary, the present invention has at least the following beneficial effects:

[0032] This invention requires no modification to the main parameters of the target large language model. It only needs to inject a virtual prompt word embedding vector of fixed length after the system prompt word during the model inference stage to complete the embedding of watermark information. This greatly reduces the deployment cost and adaptation difficulty of the solution, and can be quickly adapted to different open source or commercial large language models, with strong versatility.

[0033] This invention eliminates the need for additional model rewriting or post-processing steps during watermark embedding, thus avoiding any additional burden on the response speed of large language models. While ensuring text generation quality, it can meet the needs of application scenarios with extremely high real-time requirements, such as real-time intelligent question answering and intelligent agent interaction.

[0034] This invention applies a global semantic bias to the generation distribution of a large language model by using virtual prompt words, which deeply integrates the watermark signal into the semantic and expressive features of the generated text, rather than relying solely on surface text statistical features. Even when faced with minor text editing, rewriting, semantic restatement, or other transformation operations, it can still stably extract watermark information, demonstrating excellent anti-interference capabilities and robustness.

[0035] This invention can accurately recover embedded user identity information from generated text. This identity information is verifiable and reproducible, providing reliable technical evidence for tracing the source of AI-generated content and determining responsibility. It effectively adapts to the application needs of complex scenarios such as content security supervision and judicial evidence collection, filling the gap left by existing technologies, which cannot simultaneously meet the requirements of real-time watermark embedding and traceable identity information extraction.

Description of the Drawings

[0036] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the following description of the embodiments will be briefly introduced. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0037] Figure 1 is a schematic diagram of the framework of the large language model text tracing method based on virtual prompt word embedding involved in this invention.

[0038] Figure 2 is a schematic diagram illustrating an example of an input constructed via a template as described in this invention.

[0039] Figure 3 is a schematic diagram of the Transformer module architecture involved in this invention.

[0040] Figure 4 is a schematic diagram of the long short-term memory network module architecture involved in this invention.

Detailed Implementation

[0041] In the following description, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments can be modified in various ways without departing from the spirit or scope of the embodiments of the invention. Therefore, the drawings and description are considered to be exemplary in nature and not restrictive.

[0042] The following disclosure provides many different implementations or examples for carrying out different structures of the embodiments of the present invention. To simplify the disclosure of the embodiments of the present invention, specific examples of components and arrangements are described below. Of course, these are merely examples and are not intended to limit the embodiments of the present invention. Furthermore, reference numerals and / or reference letters may be repeated in different examples of the embodiments of the present invention; such repetition is for simplification and clarity and does not in itself indicate a relationship between the various implementations and / or arrangements discussed.

[0043] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0044] Figure 1 illustrates the overall framework of the method, showing the complete logic from watermark embedding and watermarked text generation to watermark extraction and loss optimization. It covers five core stages: input processing, watermark encoding, large language model generation, watermark decoding, and multi-dimensional loss constraints. It demonstrates the core inventive concept of this invention: "achieving watermark embedding and source tracing without modifying the main parameters of the large language model, but only by injecting virtual prompt words through the embedding layer."

[0045] 1. Input fragment

[0046] System prompt: a sequence of tokens, each of which is one token obtained after the system prompt word is segmented. The system prompt is the predefined text specifying the model's role, rules, and instructions.

[0047] User input: a sequence of tokens, each of which is one word element obtained after the user-input question and answer text is segmented. The user input is the specific request content initiated by the user.

[0048] 2. Embedding Module

[0049] It is a text-to-vector mapping module. Its function is to convert discrete text words into continuous embeddings that can be processed by the large language model, using the word segmenter and embedding matrix that come with the target large language model.

[0050] Output: two core representations, namely the system prompt word embedding representation and the user input embedding representation.

[0051] 3. Watermark Encoder

[0052] This is the core watermark encoding unit of the present invention, which is composed of multiple cascaded Transformer modules.

[0053] Input: the system prompt word embedding representation and the user identity information M (i.e., the user identity bit string to be embedded).

[0054] Output: the updated system prompt word embedding representation, into which the virtual prompt word vectors containing user identity information have been inserted.

[0055] 4. LLM

[0056] The target chat language model is the core carrier of text generation. In this invention, its main parameters are frozen throughout the process without any modification.

[0057] Input: the concatenated complete embedding vector (the watermarked system prompt word embedding followed by the user input embedding).

[0058] Core process: the LLM calculates the predicted distribution of the next word through a multi-layer self-attention mechanism and then selects the next word through a continuous sampling strategy. Through an autoregressive loop of repeated sampling, the complete text is generated word by word. The final output is the generated text fragment, a sequence of words that forms the target text embedded with the user's identity watermark.

[0059] 5. Watermark Decoder (Watermark Extractor)

[0060] This is the core watermark extraction unit of the present invention, which is composed of a multi-layer LSTM network and a fully connected layer.

[0061] Input: an embedding representation of the generated text fragment. Output: the restored user identity information.

[0062] Function: Extract watermark signals from the sequence features of generated text, restore the embedded user identity bit string, and ultimately achieve source tracing of the generated text.

[0063] 6. Three major loss constraints

[0064] The three loss terms together constitute the total loss during the training phase, which is used for backpropagation to optimize the parameters of the watermark encoder and decoder, while ensuring both text generation quality and watermark extraction accuracy.

[0065] Distribution Loss: Calculates the difference between the word prediction probability distribution of the LLM output and the expected response probability distribution. It is constructed using cross-entropy loss to constrain the distribution of the watermarked text to be consistent with the original model output, thus avoiding the impact of watermark embedding on the generation quality.

[0066] Semantic Loss: Extracts semantic vectors of the generated text and the expected answer through a pre-trained semantic representation model, calculates the cosine similarity difference between the two, constrains the core semantic consistency between the watermarked text and the original text, and ensures that the watermark embedding does not change the original meaning of the text.

[0067] Watermark Recovery Loss: Calculates the difference between the identity information restored by the decoder and the original identity information M. It is constructed using a binary cross-entropy loss to constrain the accuracy of watermark encoding and decoding, ensuring that the identity information can be stably restored.
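The watermark recovery loss and its combination with the other two terms can be sketched as follows. The binary cross-entropy is averaged over the bits of M. The weighted sum in `total_loss` and its default weights of 1.0 are assumptions, since the specification says the three terms together form the total loss but does not state how they are weighted.

```python
import math
from typing import List


def watermark_recovery_loss(predicted_probs: List[float], true_bits: List[int],
                            eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between per-bit predictions and the original bits of M."""
    total = 0.0
    for p, bit in zip(predicted_probs, true_bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= bit * math.log(p) + (1 - bit) * math.log(1.0 - p)
    return total / len(true_bits)


def total_loss(distribution: float, semantic: float, recovery: float,
               w_dist: float = 1.0, w_sem: float = 1.0, w_rec: float = 1.0) -> float:
    """Assumed combination: a weighted sum of the three loss terms."""
    return w_dist * distribution + w_sem * semantic + w_rec * recovery


uncertain = watermark_recovery_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1])
confident = watermark_recovery_loss([0.99, 0.01, 0.99, 0.99], [1, 0, 1, 1])
combined = total_loss(distribution=0.7, semantic=0.2, recovery=confident)
```

A decoder that outputs 0.5 for every bit scores ln 2 per bit, while confident correct predictions drive the loss toward zero.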

[0068] As shown in Figure 1, this embodiment provides a method for tracing the source of large language model text based on virtual prompt word embedding, including the following steps:

[0069] (1) Obtain system prompts and user input, and use a fixed template to combine system prompts and user input to construct chat input command text with role format;

[0070] (2) Obtain the word segmenter and embedding matrix corresponding to the target chat language model. Use the word segmenter and embedding matrix to convert the chat input command text obtained in step (1) into the corresponding embedding vector input. The embedding vector input includes the embedding representation corresponding to the system prompt word and the embedding representation corresponding to the user input.

[0071] (3) Obtain user identity information, convert the user identity information into a fixed-length bit string, input the bit string and the embedding representation corresponding to the system prompt word obtained in step (2) into the pre-trained watermark encoder to generate a virtual prompt word representation containing m embedding vectors, where m is the length of the predefined watermark message, insert the virtual prompt word representation into the end position of the embedding representation corresponding to the system prompt word obtained in step (2) to obtain the updated system prompt word embedding representation;

[0072] (4) The updated system prompt word embedding representation obtained in step (3) is concatenated with the user input embedding representation obtained in step (2), and input into the target chat language model described in step (2) to obtain the generated text with user identity information watermark;

[0073] (5) Input the generated text obtained in step (4) into the pre-trained watermark extractor to recover the user identity information embedded in the generated text, and realize the source tracing of the generated text based on the recovered user identity information.

[0074] In some embodiments, in step (1), the fixed template includes predefined instruction words and role definition information. The role definition information includes the identification content corresponding to three types of roles: system, user, and model. When constructing chat input instruction text, the system prompt words and user input are encapsulated according to the identification content corresponding to the system role and user role, respectively.

[0075] In some embodiments, the specific process of converting the chat input command text into the corresponding embedded vector input in step (2) is as follows:

[0076] (2.1) The chat input command text is segmented and sliced using the word segmenter to obtain a text sequence composed of several word units;

[0077] (2.2) Look up the encoding table corresponding to the word segmenter, convert each word unit in the text sequence into the corresponding token identifier, and obtain the encoding sequence composed of token identifiers;

[0078] (2.3) Match the corresponding embedding vector from the embedding matrix according to each token identifier to obtain the embedding vector input corresponding to the chat input command text.

[0079] In some embodiments, in step (3), the watermark encoder is composed of several transformer modules connected in sequence. The watermark encoder encodes the bit string corresponding to the user identity information into the embedding space aligned with the embedding representation corresponding to the system prompt word through an attention mechanism, thereby generating the virtual prompt word representation.

[0080] In some embodiments, in step (3), after the virtual prompt word representation is inserted, the updated system prompt word embedding representation and the embedding representation corresponding to the user input are concatenated in sequence, and the dimension of the obtained complete embedding representation matches the input dimension of the target chat large language model.

[0081] In some embodiments, the specific process of obtaining the generated text with the user identity information watermark in step (4) is as follows:

[0082] (4.1) Input the concatenated embedding representation into the target chat language model, and calculate the predicted probability distribution of the next word in the text through the multi-layer self-attention mechanism of the target chat language model;

[0083] (4.2) Select a word from the predicted probability distribution obtained in step (4.1) as the predicted text for the next position using a pre-defined continuous sampling strategy;

[0084] (4.3) After concatenating the lexical units obtained in step (4.2) to the input sequence in step (4.1), repeat the process from step (4.1) to step (4.3) until the output length reaches the preset maximum value, or the predefined terminator in the target chat language model vocabulary is obtained by sampling.

[0085] (4.4) Combine all the resulting lexical units into the generated text with user identity information watermark output by the target chat large language model.

[0086] In some embodiments, in step (4), during the process of generating text, the distribution loss between the word prediction probability distribution obtained in step (4.1) and the expected answer probability distribution is calculated simultaneously; after the text is generated, the semantic representation of the generated text is extracted through a pre-trained semantic representation model, and the semantic loss between the generated text and the expected answer is calculated. The distribution loss and the semantic loss are both used for the training optimization of the watermark encoder.

[0087] In some embodiments, in step (5), the watermark extractor is composed of a long short-term memory network and several fully connected layers connected sequentially, and the latent space dimension of the long short-term memory network is consistent with the embedding space dimension of the target chat large language model in step (2).

[0088] In some embodiments, the specific process of restoring user identity information in step (5) is as follows:

[0089] (5.1) Using the word segmenter and embedding matrix described in step (2), the generated text is converted into the corresponding continuous embedding representation;

[0090] (5.2) Input the continuous embedding representations corresponding to the generated text into the Long Short-Term Memory network one by one in the order of the lexical units, and extract the sequence features of the generated text;

[0091] (5.3) Input the extracted sequence features into the fully connected layer to obtain the prediction results corresponding to each bit of the watermark information. Based on the prediction results, restore the bit string corresponding to the user identity information and finally convert it to obtain the user identity information.

[0092] In some embodiments, during the training phase of the watermark encoder and watermark extractor, a text rewriting attack simulation module is provided. The text rewriting attack simulation module is built based on a pre-trained large language model. The watermarked text generated by the target chat large language model is input into the text rewriting attack simulation module to obtain the rewritten attack text. The attack text and the original watermarked text are input together into the watermark extractor to train and optimize the watermark recovery capability of the watermark extractor.

[0093] The technical concept of this invention is as follows:

[0094] The core of this invention is to dynamically adjust the embedding representation of prompt words in a large language model, encode user identity watermark information into virtual prompt word embedding vectors during the model inference stage, guide the large language model to output text with watermark features, and recover identity information from the generated text through a matching watermark extractor, ultimately achieving full-process traceability of the generated text.

[0095] The core implementation process of this method mainly includes five core steps:

[0096] The first step is to construct the input text for the large language model based on a fixed template. The predefined system prompt words and the user-input question text are standardized and encapsulated according to the role format supported by the large language model to form chat command text that meets the model's input requirements.

[0097] The second step involves using the word segmenter and embedding matrix provided with the target large language model to convert the standardized instruction text constructed in the first step into a continuous embedding vector representation that the model can process, thus completing the mapping from text input to the embedding space.

[0098] The third step involves generating a continuous virtual prompt word embedding vector based on a fixed-length bit string corresponding to the user's identity information using a watermark encoder. This virtual prompt word embedding vector is then inserted into the end of the system prompt word embedding representation and the front of the user input embedding representation, thus completing the embedding of watermark information in the model input stage.

[0099] The fourth step is to input the complete embedded representation containing system prompts, virtual prompts, and user input into the target large language model. Through the model's self-attention calculation and continuous sampling, target text embedded with user identity watermark information is generated.

[0100] The fifth step involves inputting the generated watermarked text into the accompanying watermark extractor. Through feature extraction and probability prediction, the user identity bit string information embedded in the text is recovered, and the source of the generated text is traced based on this identity information.

[0101] The watermark encoder adopts a Transformer module architecture, which can align and encode user identity bit information with the semantic features of system prompt words, ensuring the global guiding role of watermark information in the model generation process. The watermark extractor adopts a long short-term memory network with a fully connected layer architecture, which can stably extract watermark bit information from the sequence features of the generated text. At the same time, this scheme introduces a text rewriting attack simulation module during the training phase to further improve the anti-interference ability and robustness of watermark extraction.

[0102] Specifically, the following steps are included:

[0103] (1) The input of the question-and-answer (QA) data construction method includes two parts: System (system prompts) and User (user input), and the corresponding text input is constructed using a fixed template;

[0104] (2) Use the tokenizer and embedding matrix of the corresponding Chat model (this scheme uses Llama-3.1-8B-Instruct) to convert the text input in (1) into the corresponding embedding input;

[0105] (3) Use the bit string corresponding to the user's identity information to generate a "virtual prompt word" representation containing m Embedding vectors (m is the length of the predefined watermark message, which is 8 bits in this scheme), and insert it into the position at the end of the system prompt word in the Embedding vector obtained in (2);

[0106] (4) Concatenate the system prompt words obtained in (3) and the Embedding vector input by the user and input them into the Chat model to obtain the generated text with the identity information watermark;

[0107] (5) Input the generated text obtained in (4) into the watermark extractor to recover the user's identity information and realize the traceability of the generated text.

[0108] In some embodiments, the specific method for converting discrete text input into a continuous representation in the LLM embedding space in step (2) is as follows:

[0109] (2.1) The system prompt words and user input text are segmented and sliced using the Tokenizer of the target model to obtain a sequence of n tokens;

[0110] (2.2) Look up the encoding table and match each Token with its ID in the encoding table to obtain a sequence composed of Token ID encodings;

[0111] (2.3) Obtain the embedding matrix of the model, and extract the embedding vector from the corresponding row of the embedding matrix according to each Token ID. This yields an embedding representation of size n × d, which is the embedding input corresponding to the text input, where n is the number of tokens and d is the dimension of the target model's embedding space.
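The lookup in steps (2.2) and (2.3) can be illustrated with a minimal sketch. The vocabulary, the 4 × 3 embedding matrix, and the function names below are toy assumptions for illustration only; they are not the tokenizer or embedding matrix of the target model.

```python
from typing import Dict, List

def tokens_to_ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Step (2.2): map each token to its ID in the encoding table."""
    return [vocab[tok] for tok in tokens]

def ids_to_embeddings(ids: List[int], matrix: List[List[float]]) -> List[List[float]]:
    """Step (2.3): take row i of the embedding matrix for each token ID i."""
    return [list(matrix[i]) for i in ids]

# Toy vocabulary and embedding matrix with d = 3 (illustrative values only).
VOCAB = {"you": 0, "are": 1, "helpful": 2, "hello": 3}
EMBEDDING_MATRIX = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

tokens = ["you", "are", "helpful", "hello"]  # assumed output of step (2.1), n = 4
token_ids = tokens_to_ids(tokens, VOCAB)
embedded = ids_to_embeddings(token_ids, EMBEDDING_MATRIX)  # n rows, d columns
```

The result has one row per token and d columns, matching the n × d embedding input described in step (2.3).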

[0112] In some embodiments, the specific process of obtaining the "virtual prompt word" that can be embedded in the LLM-generated text through user identity information in step (3) is as follows:

[0113] (3.1) Obtain the user's identity representation and represent it using a fixed-length bit string;

[0114] (3.2) Input the identity bit string and system prompt words into the watermark encoder. Encode the identity information into the embedding space and align it with the system prompt words through the attention mechanism to obtain a "virtual prompt word" representation containing m embedding vectors. Then, insert it directly into the position after the system prompt words in the input template constructed by (2) and before the user input begins. This will result in an input representation that can guide the LLM to generate watermarked text. The specific calculation process is as follows:

[0115] $$\tilde{E}_{sys} = E_{sys} \oplus \mathcal{F}_{enc}(E_{sys}, w);$$

[0116] where $\tilde{E}_{sys}$ is the watermarked system prompt word embedding representation, i.e., the updated system prompt word embedding obtained after the watermark embedding operation, which is concatenated with the embedding representation of the user input and fed into the target chat large language model; $E_{sys}$ is the embedding representation of the original system prompt words; $\oplus$ denotes the concatenation operation; $w$ is the watermark bit information to be encoded; and $\mathcal{F}_{enc}$ is the watermark encoder, which is composed of several Transformer modules connected in sequence (this scheme uses 4 Transformer modules, each with 16 attention heads).
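Step (3.1) only requires that the user identity be represented by a fixed-length bit string. As a minimal sketch, assuming the identity is a non-negative integer ID (an assumption; the source does not fix the identity format), the conversion to and from an 8-bit string could look as follows. The function names are hypothetical.

```python
from typing import List

def id_to_bits(user_id: int, m: int = 8) -> List[int]:
    """Encode an integer ID as a fixed-length bit string, most significant bit first."""
    if not 0 <= user_id < 2 ** m:
        raise ValueError(f"user_id must fit in {m} bits")
    return [(user_id >> (m - 1 - i)) & 1 for i in range(m)]

def bits_to_id(bits: List[int]) -> int:
    """Inverse of id_to_bits: rebuild the integer ID from its bit string."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

example_bits = id_to_bits(5)            # 8-bit watermark message for user 5
recovered_id = bits_to_id(example_bits) # round trip back to the integer ID
```

With m = 8 this distinguishes at most 256 identities, which is why a larger m is needed when more users must be told apart.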

[0117] In some embodiments, the specific process of obtaining the generated text from the LLM in step (4) is as follows:

[0118] (4.1) Input the embedding representation obtained in (3) into the LLM, calculate the predicted probability distribution of the next token in the text through the multi-layer self-attention mechanism of the LLM, and calculate the distribution loss of the token;

[0119] (4.2) Select a token from the prediction distribution of (4.1) as the predicted text for the next position using a pre-defined continuous sampling strategy;

[0120] (4.3) Concatenate the text sampled in (4.2) after the input in (4.1) and repeat the above process until the output length reaches the preset maximum value (200 tokens in this scheme) or the terminator “<|eot_id|>” is sampled.

[0121] (4.4) All the new tokens obtained form the generated text of the LLM, and a pre-trained SentenceTransformer model is used to extract semantic representations and calculate the semantic loss between the generated text and the expected answer.
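The loop in steps (4.1) to (4.3) can be sketched without a real model. In the sketch below, `next_dist` is a hypothetical stand-in for the LLM's next-token distribution, and `toy_next_dist` is an invented example; only the stopping rule (maximum length or the terminator "<|eot_id|>") comes from the text above.

```python
import random
from typing import Callable, Dict, List

EOT = "<|eot_id|>"

def sample_token(dist: Dict[str, float], rng: random.Random) -> str:
    """Step (4.2): draw one token from the predicted distribution."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(next_dist: Callable[[List[str]], Dict[str, float]],
             prompt: List[str], max_new_tokens: int = 200, seed: int = 0) -> List[str]:
    """Steps (4.1)-(4.3): sample, append, repeat until the limit or the terminator."""
    rng = random.Random(seed)
    context = list(prompt)
    new_tokens: List[str] = []
    while len(new_tokens) < max_new_tokens:
        token = sample_token(next_dist(context), rng)
        if token == EOT:
            break  # terminator sampled: stop without keeping it
        new_tokens.append(token)
        context.append(token)
    return new_tokens

def toy_next_dist(context: List[str]) -> Dict[str, float]:
    """Hypothetical model: emits the terminator once the context holds 5 tokens."""
    if len(context) >= 5:
        return {EOT: 1.0}
    return {"a": 0.5, "b": 0.5}

generated = generate(toy_next_dist, ["q1", "q2"])
```

The terminator itself is dropped from the returned tokens here; whether it is kept is a design choice that the source does not specify.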

[0122] In some embodiments, the specific method for recovering the watermark bit information from the generated text of the LLM using a watermark extractor in step (5) is as follows:

[0123] (5.1) The generated text is converted from discrete text into a continuous embedding representation using the same tokenizer and embedding matrix as in (2);

[0124] (5.2) Input the embedded representation of the generated text into the watermark extractor to recover the original watermark representation and calculate the watermark recovery loss. The watermark extractor consists of a Long Short-Term Memory (LSTM) network and several fully connected layers. In this scheme, the number of LSTM layers is set to 3, the latent space dimension is d, which is consistent with the embedding space size of the target model, and the number of fully connected layers is 2. The input and output dimensions of the first fully connected layer are both d, the input dimension of the second fully connected layer is d, and the final output dimension is the message length m.

[0125] In one specific embodiment:

[0126] (1) Using a fixed template, combine the system prompts and the question input to construct chat input command text with a role format. The specific command text constructed is shown in Figure 2. The content added to the original input includes predefined instruction tokens and role definitions. "<|begin_of_text|>" and "<|eot_id|>" are special tokens defined in the vocabulary, representing the start and end of the input content, respectively. "<|start_header_id|>" and "<|end_header_id|>" are predefined special tokens in the vocabulary; the content between them is the role definition, which includes three categories: system, user, and assistant. This content is generally consistent across different languages and models. The text following "<|start_header_id|>system<|end_header_id|>" up to the first "<|eot_id|>" is the system prompt text, which contains predefined information and instructions from the service provider.
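Figure 2 is not reproduced in this text, so the following is only a sketch of what such a template might look like. It uses the special tokens named above; the exact ordering and the blank line after each role header follow the commonly documented Llama 3 instruct layout and should be treated as assumptions. The function names are hypothetical.

```python
def build_chat_prompt(system_prompt: str, user_question: str) -> str:
    """Wrap the system prompt and user question in a role-formatted chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def extract_system_prompt(prompt: str) -> str:
    """Return the text between the system header and the first <|eot_id|>."""
    header = "<|start_header_id|>system<|end_header_id|>"
    start = prompt.index(header) + len(header)
    end = prompt.index("<|eot_id|>", start)
    return prompt[start:end].strip()

chat_text = build_chat_prompt("You are a helpful assistant.", "What is a watermark?")
system_text = extract_system_prompt(chat_text)
```

The second function locates the system prompt span, which is the position after which the virtual prompt word embeddings are later inserted.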

[0127] (2) For the input instruction text constructed in step (1), input it into the word segmenter and embedding model corresponding to the large language model to obtain the instruction embedding representation;

[0128] (2.1) Use a word segmenter to segment the input text to obtain the input text sequence, and look up the word segmenter’s word list to obtain the word list index ID corresponding to each word segment, and represent the text sequence with the word list index ID sequence;

[0129] (2.2) Obtain the embedding matrix from the large language model as a standalone matrix, process the index IDs obtained in (2.1) one by one, and replace each index ID with the embedding vector in the row of the embedding matrix whose row number equals that index ID.

[0130] (3) Use the bit string $w$ corresponding to the user's identity information to generate a "virtual prompt word" representation containing $m$ embedding vectors, and insert it at the end of the system prompt word in the embedding vector obtained in (2);

[0131] (3.1) Input the bit string corresponding to the user's identity and the system prompt word into the watermark encoder. Encode the identity information into the embedding space through the attention mechanism and concatenate it after the embedding representation of the system prompt word. The specific calculation method is as follows:

[0132] $$\tilde{E}_{sys} = E_{sys} \oplus \mathcal{F}_{enc}(E_{sys}, w);$$

[0133] where $\tilde{E}_{sys}$ is the watermarked system prompt word embedding representation; $E_{sys}$ is the embedding representation of the original system prompt words; $\oplus$ denotes the vector concatenation operation; $w$ is the watermark bit information to be encoded; and $\mathcal{F}_{enc}$ is the watermark encoder, which is composed of several Transformer modules connected sequentially. The Transformer module architecture is shown in Figure 3;

[0134] Figure 3 is a structural diagram of the Transformer module, the core component of the watermark encoder. The watermark encoder of this invention is composed of multiple such modules cascaded together. This module captures the contextual dependencies of the sequence through a multi-head attention mechanism, realizes the alignment encoding of the user identity bit string and the semantics of the system prompt words, and encodes the identity information into virtual prompt word vectors that conform to the embedding space specification of a large language model.

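[0135] (3.2) Concatenate the system prompt, virtual prompt, and user input embeddings in sequence to obtain the continuous embedding input to the LLM: $E_{in} = \tilde{E}_{sys} \oplus E_{user}$, where $E_{in}$ is the LLM input, $\tilde{E}_{sys}$ is the system prompt word embedding with the virtual prompt word embedding appended, and $E_{user}$ is the embedding of the question entered by the user.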
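The ordering of the three segments can be made concrete with a small sketch. Embeddings are plain lists of rows here, and the values and the function name are illustrative assumptions; the virtual prompt rows stand in for the watermark encoder output, which is not modeled.

```python
from typing import List

Rows = List[List[float]]

def splice_inputs(sys_emb: Rows, virt_emb: Rows, user_emb: Rows) -> Rows:
    """Place the virtual prompt rows after the system prompt and before the user input."""
    rows = sys_emb + virt_emb + user_emb
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all embedding rows must share the same dimension d")
    return rows

d = 2
sys_emb = [[0.1] * d, [0.2] * d]            # 2 system prompt tokens
virt_emb = [[float(i)] * d for i in range(8)]  # m = 8 virtual prompt word vectors
user_emb = [[0.3] * d, [0.4] * d, [0.5] * d]   # 3 user input tokens

llm_input = splice_inputs(sys_emb, virt_emb, user_emb)
```

The total sequence length grows by exactly m rows, and the system and user segments are left unchanged.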

[0136] (4) Input the embedded representation obtained in (3) into the large language model, and obtain the output text sequence with watermark information through continuous sampling under the guidance of the "virtual prompt word";

[0137] (4.1) Input the embedded representation into the LLM, predict the probability distribution of the next text segment from the input text through a multi-layer self-attention mechanism, and calculate the distribution loss between the probability distribution and the expected response.

[0138] (4.2) The next text segment is sampled from the probability distribution predicted by the LLM using the Gumbel-Softmax algorithm, which keeps the sampling process continuously differentiable so that gradients can be backpropagated through it;

[0139] (4.3) Concatenate the text sampled in (4.2) after the input in (4.1) and repeat the above process until the output length reaches the preset maximum value or a specific terminator in the vocabulary is sampled.

[0140] (4.4) Save the new text obtained through the above process as the generated text of the LLM, and use a pre-trained Sentence Transformer model to extract semantic representations and calculate the semantic loss between the generated text and the expected answer;
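The Gumbel-Softmax relaxation mentioned in step (4.2) can be sketched in a few lines. This is a minimal scalar illustration, assuming a temperature parameter `tau` and a small clamp constant that are not specified in the source; a training implementation would use a tensor library with automatic differentiation.

```python
import math
import random
from typing import List, Optional

def gumbel_softmax(probs: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return a relaxed (soft) one-hot sample from a categorical distribution."""
    rng = rng or random.Random()
    noisy = []
    for p in probs:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        noisy.append((math.log(p + 1e-12) + g) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

soft_sample = gumbel_softmax([0.7, 0.2, 0.1], tau=0.5, rng=random.Random(0))
smooth_sample = gumbel_softmax([0.7, 0.2, 0.1], tau=5.0, rng=random.Random(0))
```

With the same noise, a lower `tau` pushes the output toward a one-hot vector while a higher `tau` flattens it; the index of the largest entry does not change, which is what ties the relaxation to ordinary sampling.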

[0141] (5) Input the generated text into the watermark extractor to recover the user's identity information, and use the identity information to trace the origin of the generated text;

[0142] (5.1) Input the embedded representations corresponding to the generated text one by one into the Long Short-Term Memory (LSTM) network. The LSTM network module architecture is shown in Figure 4, and the specific calculation process is as follows:

[0143] $$(h_t, c_t) = \mathrm{LSTM}\left(x_t, (h_{t-1}, c_{t-1})\right);$$

[0144] where $(h_t, c_t)$ are the features corresponding to time step t, with $h_t$ the hidden state output at the current time step and $c_t$ the cell state updated at the current time step; $x_t$ is the embedded representation corresponding to the t-th token in the generated text; and $(h_{t-1}, c_{t-1})$ are the features corresponding to time step t-1. This process loops until all the tokens in the generated text have been sequentially input into the LSTM.

[0145] Figure 4 is a schematic diagram of the LSTM (Long Short-Term Memory) unit, the core component of the watermark extractor. The watermark extractor of this invention is composed of multiple cascaded LSTM units. This unit is an improved version of the recurrent neural network (RNN), which mitigates the vanishing gradient problem of traditional RNNs on long sequences. It can effectively capture the contextual dependencies of long text sequences, extract the globally distributed watermark signal in the generated text, and ultimately achieve stable restoration of user identity information.

[0146] The LSTM unit is the core sequence feature extraction unit of the watermark extractor. The symbols in Figure 4 are as follows: σ is the Sigmoid activation function, which maps the input value to the interval 0 to 1 and controls the "on / off" state of a gate, where 0 is completely closed and 1 is completely open; T is the hyperbolic tangent (Tanh) activation function, which maps the input value to the interval -1 to 1, generating candidate cell states and performing a nonlinear transformation on the features; W is the weight matrix, a learnable parameter during network training, which performs a linear transformation on the input features; b is the bias term, a learnable parameter during network training, which adjusts the output baseline of the linear transformation; $x_t$ (Input) is the input at the current time step, i.e., the embedding vector corresponding to the t-th token in the generated text, input token by token in order; $h_{t-1}$ is the hidden state output of the previous time step, i.e., the feature output after processing the previous token, which conveys the context information of the sequence; $c_{t-1}$ is the cell state of the previous time step, which acts as the "memory conveyor belt" of the LSTM and carries the global memory and watermark features of long sequences; $c_t$ is the cell state updated at the current time step, i.e., the global memory and watermark features after the current token has been processed; and $h_t$ is the hidden state output at the current time step, i.e., the sequence feature extracted after processing the current token, which is finally input into the fully connected layer for watermark bit prediction.
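A single LSTM step built from the components listed above can be sketched with plain lists. This uses the standard LSTM gate equations (forget, input, candidate, output); the parameter names such as `W_f` and the constant toy weights are assumptions for illustration, not the trained extractor.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(x: float) -> float:
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, p: Dict) -> Tuple[Vector, Vector]:
    """One time step: gates read [h_{t-1}; x_t], then update the cell and hidden state."""
    z = h_prev + x_t
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    g = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate cell state
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

def run_lstm(sequence: List[Vector], p: Dict, hidden: int) -> Tuple[Vector, Vector]:
    """Feed the token embeddings one by one, starting from zero states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
    return h, c

def constant_params(value: float, hidden: int, inputs: int) -> Dict:
    """Toy parameters: every weight equals `value`, every bias is zero."""
    W = [[value] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return {"W_f": W, "b_f": b, "W_i": W, "b_i": b, "W_c": W, "b_c": b, "W_o": W, "b_o": b}

toy_sequence = [[1.0, -1.0], [0.5, 0.5]]
h_final, c_final = run_lstm(toy_sequence, constant_params(0.1, 2, 2), hidden=2)
```

The final hidden state `h_final` is the kind of sequence feature that would be passed on to the fully connected layers for bit prediction.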

[0147] (5.2) After the features extracted by LSTM are passed through a fully connected layer, the corresponding logits for each bit of the watermark information are obtained. These logits are then fed into the Softmax function to obtain the probability prediction for each bit. The specific calculation method is as follows:

[0148] $$\hat{w} = \mathrm{Softmax}\left(\mathrm{FC}(h)\right);$$

[0149] where $\hat{w}$ is the predicted distribution of the bit string output by the watermark extractor, $\mathrm{FC}$ is a fully connected network of n successive layers, and $h$ is the output of the last layer of the LSTM;

[0150] (5.3) For each bit probability prediction, take the one with the highest probability as the code corresponding to the current bit, and finally obtain the encoded watermark bit information.
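Steps (5.2) and (5.3) can be sketched as follows, assuming the extractor emits two logits per bit (one for the value 0, one for the value 1). That layout is an assumption made for this illustration; the source only states that a Softmax is applied to the logits of each bit and that the most probable value is kept.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    """Standard softmax with the maximum subtracted for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_bits(bit_logits: Sequence[Tuple[float, float]]) -> List[int]:
    """For each bit, keep the value (0 or 1) with the higher predicted probability."""
    bits = []
    for logit0, logit1 in bit_logits:
        p0, p1 = softmax([logit0, logit1])
        bits.append(1 if p1 > p0 else 0)
    return bits

def bits_to_int(bits: Sequence[int]) -> int:
    """Interpret the recovered bit string as an integer identity (MSB first)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

toy_logits = [(2.0, -1.0), (0.1, 3.0), (-0.5, 0.5), (1.0, 0.0)]
recovered_bits = decode_bits(toy_logits)
recovered_identity = bits_to_int(recovered_bits)
```

Mapping the bit string back to an integer is only one possible convention for the final conversion to user identity information.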

[0151] Since only the core semantics of the generated text are preserved in scenarios such as deep rewriting, watermarking methods that rely solely on semantic stability are easily lost after rewriting. Therefore, during the training process of the watermark encoder and extractor, after obtaining the generated text through (4), this method designs a rewriting module (based on a pre-trained rewriting LLM, this scheme uses the Chat model consistent with the target model) to simulate rewriting attacks, and inputs the rewritten text under attack into the watermark extractor to improve the extraction accuracy of the watermark extractor in the attack scenario and enhance the robustness of the watermarking method in the attack scenario.

[0152] In some embodiments, the total loss function during the training phase of this method is constructed using a weighted summation method to simultaneously constrain the text generation quality and the watermark embedding and extraction effect. The specific expression of the total loss function is as follows:

[0153] $$L_{total} = \lambda_1 L_{dist} + \lambda_2 L_{sem} + \lambda_3 L_{wm};$$

[0154] where $L_{total}$ is the total loss value during the training phase; $L_{dist}$ is the distribution loss obtained in step (4); $L_{sem}$ is the semantic loss obtained in step (4); $L_{wm}$ is the watermark recovery loss obtained in step (5); and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the preset weight coefficients for the distribution loss, semantic loss, and watermark recovery loss, respectively. All weight coefficients are constants greater than 0, and the sum of the weights is 1.

[0155] In some embodiments, the distribution loss $L_{dist}$ in step (4) uses a cross-entropy loss function to constrain the consistency between the token probability distribution of the watermark-guided generated text and that of the expected response, thus avoiding the impact of watermark embedding on the accuracy of text generation. Its specific expression is as follows:

[0156] $$L_{dist} = -\frac{1}{T}\sum_{t=1}^{T}\sum_{v=1}^{V} y_{t,v}\,\log p_{t,v};$$

[0157] where $T$ is the total number of tokens in the generated text; $V$ is the total vocabulary size of the target chat large language model; $y_{t,v}$ is the one-hot label of the v-th vocabulary entry for the token at position t in the expected answer; and $p_{t,v}$ is the predicted probability, output by the target chat large language model in step (4.1), of the v-th vocabulary token at position t.
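A token-level cross-entropy of the kind described above can be sketched as follows. Because the labels are one-hot, the inner sum over the vocabulary reduces to the log-probability of the expected token. Averaging over the T positions and the clamp constant are assumptions of this sketch.

```python
import math
from typing import List

def distribution_loss(target_ids: List[int], predicted_probs: List[List[float]]) -> float:
    """Mean negative log-probability assigned to the expected token at each position."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for t, target in enumerate(target_ids):
        total += math.log(max(predicted_probs[t][target], eps))
    return -total / len(target_ids)

# Two positions, vocabulary of size 4, uniform predictions.
uniform = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
loss_uniform = distribution_loss([1, 3], uniform)

# Predictions that put all mass on the expected tokens.
perfect = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
loss_perfect = distribution_loss([1, 3], perfect)
```

A uniform prediction over V entries gives a loss of log V, and the loss falls to zero as the model concentrates probability on the expected tokens.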

[0158] In some embodiments, the semantic loss $L_{sem}$ described in step (4) uses a cosine similarity loss function to constrain the global semantic consistency between the watermarked generated text and the expected answer, ensuring that the watermark embedding does not change the core semantic information of the text. Its specific expression is as follows:

[0159] $$L_{sem} = 1 - \frac{s_{gen}\cdot s_{exp}}{\lVert s_{gen}\rVert_2\,\lVert s_{exp}\rVert_2};$$

[0160] where $s_{gen}$ is the semantic vector of the watermarked text extracted in step (4.4) through the pre-trained semantic representation model; $s_{exp}$ is the semantic vector of the expected answer extracted through the same pre-trained semantic representation model; and $\lVert\cdot\rVert_2$ is the L2 norm of a vector.
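A cosine similarity loss can be illustrated on small vectors. The "one minus cosine similarity" form used here is a common convention and an assumption of this sketch; the vectors stand in for sentence embeddings, which are not computed.

```python
import math
from typing import Sequence

def semantic_loss(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v): 0 for identical directions, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

loss_same = semantic_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same direction
loss_orthogonal = semantic_loss([1.0, 0.0], [0.0, 1.0])       # unrelated
loss_opposite = semantic_loss([1.0, 0.0], [-1.0, 0.0])        # opposite
```

The loss depends only on direction, so rescaling either semantic vector leaves it unchanged.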

[0161] In some embodiments, the watermark recovery loss $L_{wm}$ in step (5) uses a binary cross-entropy loss function to constrain the bit prediction accuracy of the watermark extractor, ensuring that the embedded user identity information can be stably restored. Its specific expression is as follows:

[0162] $$L_{wm} = -\frac{1}{m}\sum_{i=1}^{m}\left[w_i\log\hat{w}_i + (1-w_i)\log\left(1-\hat{w}_i\right)\right];$$

[0163] where $m$ is the predefined length of the watermark message; $w_i$ is the actual label value of the i-th bit of the bit string corresponding to the user identity information; and $\hat{w}_i$ is the predicted probability value of the i-th watermark bit output by the watermark extractor in step (5).
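The binary cross-entropy over watermark bits can be sketched as follows. Averaging over the m bits and the clamp constant `eps` are assumptions of this sketch.

```python
import math
from typing import Sequence

def watermark_recovery_loss(bits: Sequence[int], probs: Sequence[float]) -> float:
    """Mean binary cross-entropy between true bits and predicted P(bit = 1)."""
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    for w, p in zip(bits, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += w * math.log(p) + (1 - w) * math.log(1.0 - p)
    return -total / len(bits)

true_bits = [1, 0, 1, 1, 0, 0, 1, 0]                 # m = 8
loss_guess = watermark_recovery_loss(true_bits, [0.5] * 8)
loss_good = watermark_recovery_loss(true_bits, [0.9, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.1])
```

An extractor that outputs 0.5 for every bit scores log 2, so any useful extractor should fall well below that value.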

[0164] In some embodiments, to balance text generation quality and watermark embedding / extraction effectiveness, the weighting coefficients are constrained as follows: the value range of $\lambda_1$ is 0.2 to 0.4, the value range of $\lambda_2$ is 0.3 to 0.5, and the value range of $\lambda_3$ is 0.2 to 0.4. Preferably, $\lambda_1$ is 0.3, $\lambda_2$ is 0.4, and $\lambda_3$ is 0.3, which maximizes the embedding depth and extraction accuracy of watermark information while ensuring that the semantic accuracy and fluency of the generated text are not significantly different from the original model output.
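The weighted summation with the preferred weights 0.3, 0.4, and 0.3 (distribution, semantic, and watermark recovery loss, in that order) can be sketched as follows. The function name and the validation of the constraints are illustrative assumptions.

```python
from typing import Tuple

def total_loss(l_dist: float, l_sem: float, l_wm: float,
               weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Weighted sum of the three losses; weights must be positive and sum to 1."""
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    w_dist, w_sem, w_wm = weights
    return w_dist * l_dist + w_sem * l_sem + w_wm * l_wm

combined = total_loss(1.0, 2.0, 3.0)  # 0.3*1 + 0.4*2 + 0.3*3
```

Because the weights sum to 1, the total stays on the same scale as the individual losses, which makes different weight settings easier to compare.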

[0165] In some embodiments, during training, all network parameters of the target chat large language model are frozen, and training is performed solely based on the aforementioned total loss function. The backpropagation results are used to iteratively update the network parameters of the watermark encoder and watermark extractor, avoiding interference with the native text generation capability of the target chat large language model during the training process, while significantly reducing the computational cost and labeled data requirements for training.

[0166] In some embodiments, during the training phase of the watermark encoder and watermark extractor, all parameters of the target chat language model are frozen. No weight parameters of the target chat language model are updated during training. Only the network parameters of the watermark encoder and watermark extractor are iteratively optimized to avoid affecting the text generation capability of the target chat language model during training, while significantly reducing the computational overhead and data requirements for training.

[0167] In some embodiments, the QA dataset used in the training phase is selected from publicly available general question-and-answer datasets, including but not limited to one or more combinations of the SQuAD dataset, WebQuestions dataset, and TriviaQA dataset. The dataset has a sample size of no less than 100,000, and each sample contains standardized question text and corresponding standard answer text to simulate real user question-and-answer scenarios, ensuring that the trained watermark encoder and watermark extractor can be adapted to general question-and-answer interaction scenarios.

[0168] In some embodiments, the optimizer used in the training phase is the AdamW optimizer, the learning rate ranges from 1e-5 to 5e-4, preferably 2e-5; the batch size ranges from 8 to 64, preferably 32; the number of iterations ranges from 10 to 50, preferably 20; a learning rate decay strategy is adopted during training, and the learning rate decays to 0.8 of the current value after every 5 iterations, so as to avoid overfitting during training and improve the generalization ability of the network.
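The step decay described above can be written as a closed-form schedule. This sketch assumes iterations are counted from 0 and that the decay is applied once iterations 0 to 4 have completed, then again after every further 5; the source does not state the indexing convention.

```python
def learning_rate(iteration: int, base_lr: float = 2e-5,
                  decay: float = 0.8, every: int = 5) -> float:
    """Learning rate in effect at a given 0-indexed iteration."""
    return base_lr * decay ** (iteration // every)

# Preferred setting from the text: base rate 2e-5, 20 iterations in total.
schedule = [learning_rate(i) for i in range(20)]
```

Over 20 iterations the rate is reduced three times, ending at 0.8 cubed (about 0.51) of its starting value.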

[0169] In some embodiments, the watermark extraction effect of this method is verified. The target chat language model used for verification is Llama-3.1-8B-Instruct, the watermark bit length is 8 bits, and the maximum length of the generated text is 200 tokens. In scenarios without text editing or rewriting attacks, the average error rate of watermark extraction is no higher than 0.1%, achieving 100% accurate restoration of user identity information. In scenarios with mild editing attacks such as word replacement and sentence structure adjustment within 10% of the generated text, the average error rate of watermark extraction is no higher than 1%, still stably achieving accurate restoration of user identity information. In scenarios with deep rewriting attacks such as synonym rewriting and semantic restatement of the generated text, the average error rate of watermark extraction is no higher than 5%, and complete restoration of user identity information can be achieved through error correction coding, demonstrating excellent anti-attack robustness.

[0170] In some embodiments, the watermark encoder can replace the pure transformer module structure with a structure of transformer module and convolutional neural network cascaded together. Specifically, it is composed of two transformer modules followed by two one-dimensional convolutional neural networks, wherein the convolution kernel size of the one-dimensional convolutional neural network is 3 and the stride is 1. The local features of the embedded representation are further extracted through the convolutional neural network to improve the encoding adaptation capability of the watermark encoder for short text system prompt words.

[0171] In some embodiments, the long short-term memory network in the watermark extractor can be replaced by a gated recurrent unit network. The number of layers in the gated recurrent unit network is set to 3, and the latent space dimension is consistent with the embedding space dimension of the target chat large language model. The simplified gating structure reduces the computational overhead of the watermark extractor, improves the speed of watermark extraction, and adapts to the deployment requirements of low computing power devices.

[0172] In some embodiments, the predefined watermark message length m can be adapted and adjusted according to the security requirements of the tracing scenario. When the tracing scenario is a high-security judicial evidence scenario, the value of m can be set to 16 bits, 32 bits, or 64 bits. Increasing the watermark bit length improves the uniqueness of the identity information and the anti-collision capability. When the tracing scenario is a low-security general content identification scenario, the value of m can be set to 4 bits. Shortening the watermark bit length reduces the impact of watermark embedding on the generated text and further improves the speed of watermark extraction.

[0173] In some embodiments, to address the issues of diminishing guidance effect of the virtual prompt words injected once beforehand on the latter half of the generated text and the failure of long text tracing due to watermark signal loss in long text generation scenarios, a dynamic virtual prompt word iterative injection mechanism is set during the text generation process in step (4). Specifically, a token step size threshold for the generated text is preset. During the autoregressive generation process in step (4), the current generation process is paused after generating a number of tokens corresponding to the step size threshold. All generated text is then converted into an embedded representation of the generated text through the word segmenter and embedding matrix in step (2). The text embedding representation, the original system prompt word embedding representation, and the bit string corresponding to the user's identity information are input into the watermark encoder to generate an updated virtual prompt word representation. The updated virtual prompt word representation is then concatenated to the end of the currently generated text embedding representation before continuing the subsequent autoregressive generation process. Through this mechanism, the watermarking guidance effect of the virtual prompt words continues to be effective throughout the entire long text generation process, avoiding the attenuation and loss of the watermark signal in the latter half of the long text. At the same time, no parameters of the target chat large language model need to be modified. The embedding layer is updated only during the intervals of autoregressive generation, without affecting the model's generation logic and text quality.

[0174] In some embodiments, to improve the robustness of watermark information against targeted erasure and deep semantic rewriting attacks, a watermark sparse coding mechanism with semantic anchor binding is introduced in the process of generating the virtual prompt word representation in step (3). Specifically, the core entities and core predicates are extracted from the system prompt words and user input as a set of semantic anchors through a pre-trained entity recognition model and semantic role labeling model. The bit string corresponding to the user identity information is split into multiple groups of bit substrings, one group per semantic anchor. During encoding, the watermark encoder assigns corresponding attention weights to each semantic anchor through a multi-head attention mechanism and encodes each group of bit substrings into the embedding representation subspace of the corresponding semantic anchor, finally generating the virtual prompt word representation. In this virtual prompt word representation, each group of embedding vectors is deeply bound to the semantic features of one semantic anchor. In the watermark extraction process in step (5), the semantic anchors are first extracted from the generated text, the corresponding bit substrings are then extracted based on the embedding representations of the semantic anchors, and the complete user identity information is finally restored by concatenating the substrings. Through this mechanism, the watermark information is converted from a globally distributed bias into a sparse encoding that is deeply bound to the core semantic anchors of the text. Even if the text is deeply rewritten and the non-core content is significantly modified, the watermark information can be completely extracted as long as the core semantic anchors are retained, which significantly improves resistance to attacks. Moreover, this mechanism is implemented entirely in the encoding logic of the watermark encoder, without modifying the structure or parameters of the target chat large language model.

[0175] In some embodiments, to address the issues of fixed identity bit strings being easily forged, watermark collisions in multi-user scenarios, and the inability to achieve non-repudiation of generation behavior in judicial scenarios, the user identity information is first subjected to chaotic encryption and timestamp binding before the virtual prompt word representation is generated in step (3). Specifically, the precise timestamp of the current text generation, the hash value of the user input text, and the global mean feature of the original system prompt word embedding representation are obtained, and the three are concatenated as the initial value of the chaotic mapping system. Logistic chaotic mapping is used to perform chaotic encryption on the fixed bit string corresponding to the user identity information to generate a unique dynamic watermark bit string, and the dynamic watermark bit string is then input into the watermark encoder to generate the virtual prompt word representation. In the watermark extraction process of step (5), the timestamp, user input hash value, and system prompt word feature verification bits are first extracted from the restored dynamic watermark bit string, and the user identity information is restored after all elements of the generation behavior have been verified. Through this mechanism, the watermark bit string corresponding to each generation behavior is unique. Even for the same user and the same system prompt words, the watermarks generated at different times and under different user inputs are different, which avoids watermark collision and forgery and realizes non-repudiation of the entire generation behavior, providing an immutable and complete evidence chain for judicial scenarios.
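One way to picture the logistic-map encryption is as a keystream that is XORed with the identity bit string. The sketch below makes several assumptions that the source does not specify: the seed is derived by hashing the three inputs with SHA-256, the map parameter is r = 3.99, keystream bits are obtained by thresholding at 0.5, and encryption is a bitwise XOR. It also omits the verification bits mentioned above.

```python
import hashlib
from typing import List

def derive_seed(timestamp: str, user_input: str, prompt_mean: float) -> float:
    """Combine the three inputs into an initial value strictly inside (0, 1)."""
    input_hash = hashlib.sha256(user_input.encode("utf-8")).hexdigest()
    material = f"{timestamp}|{input_hash}|{prompt_mean:.6f}"
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2 ** 64
    return 0.01 + 0.98 * fraction  # stay away from the end points 0 and 1

def logistic_keystream(x0: float, n: int, r: float = 3.99, burn_in: int = 100) -> List[int]:
    """Iterate x <- r * x * (1 - x) and threshold each value to one key bit."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits

def chaotic_xor(bits: List[int], x0: float) -> List[int]:
    """XOR with the keystream; applying it twice with the same seed restores the input."""
    key = logistic_keystream(x0, len(bits))
    return [b ^ k for b, k in zip(bits, key)]

identity_bits = [0, 0, 0, 0, 0, 1, 0, 1]
seed = derive_seed("2026-04-01T12:00:00Z", "What is a watermark?", 0.0123)
dynamic_bits = chaotic_xor(identity_bits, seed)
restored_bits = chaotic_xor(dynamic_bits, seed)
```

Decryption requires the same seed, so in this sketch the verifier would need the timestamp, the input hash, and the prompt feature to be available at extraction time.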

[0176] In some embodiments, to address the problems of maliciously truncated generated text, splicing and mixing of multiple generated text segments, watermark extraction failure caused by partial content replacement, and the inability to distinguish the source entity, a self-synchronizing encoding mechanism for block watermarking is introduced when generating the virtual prompt word representation in step (3). Specifically, the bit string corresponding to the user identity information is split into a synchronization header bit segment and an identity information bit segment. The synchronization header bit segment is a predefined, unique pseudo-random sequence, and the identity information bit segment is the core encoded content corresponding to the user identity information. The watermark encoder encodes the synchronization header bit segment and the identity information bit segment separately, generating a synchronization header embedding vector and an identity information embedding vector, which together form the virtual prompt word representation. In the text generation process of step (4), each time a token block of a preset length is generated, the synchronization header embedding vector in the virtual prompt word representation applies a synchronization bias to the generation distribution of the current token block, so that each token block is embedded with the corresponding synchronization header signal. In the watermark extraction process of step (5), the generated text is first scanned in its entirety against the preset synchronization header pseudo-random sequence to locate the boundaries of all token blocks carrying the synchronization header signal, and the identity information bit segment is then extracted from each token block, which completes the differentiation of source entities and the restoration of complete identity information across multiple text segments. Through this mechanism, even if the generated text is truncated, spliced, or mixed, the valid watermark blocks can be accurately located by the synchronization header signal, enabling source tracing of fragmented text. At the same time, content from different generating entities and different generation batches within mixed text can be distinguished, filling the gap in the existing technology, which cannot handle source tracing of fragmented and mixed text.
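
The block location step can be illustrated at the bit level. The following is a minimal sketch, assuming the extractor has already turned the text into a stream of recovered bits; the header pattern `SYNC`, the payload length `PAYLOAD_LEN`, and the function name `find_blocks` are hypothetical and are not taken from the specification.

```python
SYNC = [1, 1, 1, 0, 0, 1, 0, 1]   # hypothetical synchronization header pattern
PAYLOAD_LEN = 4                    # hypothetical identity-segment length in bits


def find_blocks(bits, sync=SYNC, payload_len=PAYLOAD_LEN):
    """Scan a recovered bit stream and return (offset, payload) per complete block."""
    blocks = []
    n = len(sync)
    i = 0
    while i + n + payload_len <= len(bits):
        if bits[i:i + n] == sync:
            # Header found: the identity bits follow it directly.
            blocks.append((i, bits[i + n:i + n + payload_len]))
            i += n + payload_len   # jump past the whole block
        else:
            i += 1                 # slide by one bit to resynchronize
    return blocks


# A stream that starts with leftover bits from a truncated block.
stream = [0, 0, 1] + SYNC + [1, 0, 1, 0] + SYNC + [0, 1, 1, 0] + [1, 1]
print(find_blocks(stream))
```

Because the scan slides one bit at a time until a header matches, leading or trailing fragments are skipped rather than misread. A practical design would also have to keep the header pattern from appearing inside a payload, which this sketch does not address.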

[0177] In some embodiments, to address the problem that watermark information is lost after the generated text is rewritten, polished, or restated by a third-party large language model, which causes source tracing to fail in cross-model scenarios, a watermark embedding constraint mechanism based on cross-model semantic space alignment is introduced during the training phase of the watermark encoder. Specifically, three or more mainstream large language models with different architectures and parameter counts are pre-selected as alignment models, and the embedding matrix of each alignment model is obtained. During training, the virtual prompt word representations output by the watermark encoder are input into the embedding layer of each alignment model, and the semantic features of the generated text output by each alignment model are extracted. The cosine similarity loss between the semantic features of the texts generated by the alignment models is then calculated and added to the total loss function as a semantic invariance constraint term for iterative optimization. Through this mechanism, the watermark encoder encodes the watermark information into a semantically invariant deep feature space that is common to different large language models, rather than into the vocabulary probability distribution or surface language features of a single target model. Even if the generated text is rewritten, polished, or restated by other large language models, the watermark information can be completely preserved as long as the core semantics do not change, achieving stable source tracing in cross-model scenarios and breaking through the technical bottleneck that existing text watermarks can only adapt to a single target model.

[0178] In some embodiments, to address the problem that existing watermark embedding methods are easily detected by steganalysis tools because of abnormal statistical features in the text, which allows the watermark to be located and erased, an adaptive smoothing mechanism for the embedding distribution is introduced in step (3) during the generation of the virtual prompt word representation. Specifically, after the watermark encoder outputs the virtual prompt word representation, the original system prompt word embedding representation and the system prompt word embedding representation with the virtual prompt word added are each input into the target chat large language model to obtain the two corresponding first-token prediction probability distributions. The KL divergence between the two probability distributions is calculated and compared with a preset divergence threshold. If the KL divergence exceeds the divergence threshold, the virtual prompt word representation is iteratively corrected by gradient descent so as to minimize the KL divergence between the two probability distributions while ensuring that the watermark information remains extractable. The divergence threshold is adjusted adaptively and is set dynamically according to the semantic complexity of the user input: the higher the semantic complexity of the user input, the larger the divergence threshold, and the lower the semantic complexity, the smaller the divergence threshold. Through this mechanism, the token probability distribution of the watermarked text is not statistically different from that of text generated by the original model. This effectively resists detection by mainstream text steganalysis tools and prevents the watermark from being located and selectively erased, while the adaptive divergence threshold balances the robustness of the watermark against the naturalness of the generation distribution without affecting the quality of the generated text.
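
The comparison between the two first-token distributions can be sketched in a few lines. This is an illustrative sketch only: it assumes the KL divergence is taken from the original distribution to the watermarked one, and the function names `softmax`, `kl_divergence`, and `needs_smoothing` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0). The direction is an assumption."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q))


def needs_smoothing(orig_logits, wm_logits, threshold):
    """True when the watermarked distribution drifts beyond the threshold."""
    p = softmax(orig_logits)   # first-token distribution without the virtual prompt
    q = softmax(wm_logits)     # first-token distribution with the virtual prompt
    return kl_divergence(p, q) > threshold


print(needs_smoothing([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 0.02))
print(needs_smoothing([5.0, 0.0, 0.0], [0.0, 5.0, 0.0], 0.02))
```

In a full implementation the correction itself would be a gradient step on the virtual prompt word representation; only the stopping test is shown here.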

[0179] In some embodiments, to achieve lightweight extraction and rapid on-device verification of watermark information while also meeting the need for accurate tracing in judicial scenarios, a dual-modal watermark encoding mechanism is introduced when generating the virtual prompt word representation in step (3). Specifically, the bit string corresponding to the user identity information is simultaneously encoded into a robust deep semantic watermark and a lightweight surface statistical watermark. The deep semantic watermark is encoded into the deep semantic features of the text through the global guidance of the virtual prompt words and is used for accurate tracing in judicial scenarios. The surface statistical watermark is encoded, through a low-dimensional mapping of the embedding vectors of the virtual prompt words, into surface features of the generated text that can be analyzed statistically at low cost, such as word length distribution, sentence length distribution, and punctuation usage habits, and is used for rapid verification in edge scenarios. In the watermark extraction process of step (5), an initial screening can first be completed using the surface statistical features, which requires only simple statistical calculation and does not require running the complete long short-term memory network extraction model. The deep semantic watermark is then accurately extracted from the text that passes the initial screening to restore the complete user identity information. This mechanism reconciles the judicial-level accuracy of the watermark with the lightweight requirements of edge deployment, greatly reduces the computing overhead of watermark verification, and adapts to the differing traceability requirements of different scenarios.
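
The surface features named above are cheap to compute, which is what makes the initial screening lightweight. The sketch below only computes three such features for English text; how watermark bits are mapped onto them is not specified here, and the function name `surface_features` and the chosen feature set are illustrative assumptions.

```python
import re
from collections import Counter


def surface_features(text):
    """Compute simple surface statistics of a text (illustrative feature set)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punct = Counter(ch for ch in text if ch in ",;:.!?")
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    return {
        "mean_word_len": sum(len(w) for w in words) / n_words,
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "comma_rate": punct[","] / n_words,
    }


print(surface_features("Hi there, friend. Go now!"))
```

A screening step would compare such statistics against the values expected for a watermarked text and pass only matching texts on to the full extractor.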

[0180] In some embodiments, for the chaotic encryption and timestamp binding processing of user identity information, a Logistic chaotic mapping is used to generate the dynamic watermark bit string. The iterative expression of the Logistic chaotic mapping is $x_{n+1} = \mu x_n (1 - x_n)$, where $\mu$ is the chaos control parameter and $x_n$ is the chaotic sequence value of the $n$-th iteration. The specific encryption process is as follows. First, the precise timestamp of the current text generation is obtained, accurate to the millisecond level, and converted into a 16-bit binary string. At the same time, the SHA-256 hash value of the user input text is calculated, and the first 16 bits of the hash value are extracted as a binary string. The global mean feature of the original system prompt word embedding representation is then calculated and quantized into a 16-bit binary string. The three 16-bit binary strings are concatenated to obtain a 48-bit initial value sequence, which is converted into an initial value $x_0$ in the interval [0,1]. The chaos control parameter $\mu$ is set to 3.999 to ensure that the chaotic map is in a fully chaotic state. The Logistic chaotic map is pre-iterated 1000 times to eliminate transient effects, and the iteration then continues for $m$ more times, where $m$ is the predefined length of the watermark message. The chaotic sequence value obtained in each iteration is binarized to obtain a chaotic key sequence, with the binarization rule being: if the chaotic sequence value is greater than 0.5, output 1; otherwise, output 0. The fixed bit string corresponding to the user identity information is XORed bit by bit with the chaotic key sequence to generate the encrypted dynamic watermark bit string, thus completing the chaotic encryption and timestamp binding of the user identity information.
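
The encryption procedure can be sketched as follows. Several details are assumptions made only for illustration and are not stated in the text: the timestamp is reduced to its low 16 bits, the global mean is quantized by clipping to [-1, 1] and scaling, and the 48-bit seed is offset by one half before normalization so that the map never starts at exactly 0 or 1. All function names are hypothetical.

```python
import hashlib

MU = 3.999        # chaos control parameter given in the text
PRE_ITER = 1000   # pre-iterations used to discard transients


def to_bits16(value):
    """Keep the low 16 bits of an integer as a binary string (assumed truncation)."""
    return format(value & 0xFFFF, "016b")


def seed_from_context(timestamp_ms, user_text, embed_mean):
    """Build the 48-bit seed and map it into the open interval (0, 1)."""
    ts_bits = to_bits16(timestamp_ms)
    digest = hashlib.sha256(user_text.encode("utf-8")).digest()
    hash_bits = format(int.from_bytes(digest[:2], "big"), "016b")  # first 16 bits
    # Assumed quantizer: clip the mean to [-1, 1], then scale to 0..65535.
    clipped = max(-1.0, min(1.0, embed_mean))
    mean_bits = to_bits16(round((clipped + 1.0) / 2.0 * 0xFFFF))
    seed_int = int(ts_bits + hash_bits + mean_bits, 2)
    return (seed_int + 0.5) / 2 ** 48   # the 0.5 offset is an assumption


def chaotic_key(x0, m):
    """Iterate the Logistic map and binarize m values into a key sequence."""
    x = x0
    for _ in range(PRE_ITER):
        x = MU * x * (1.0 - x)
    key = []
    for _ in range(m):
        x = MU * x * (1.0 - x)
        key.append(1 if x > 0.5 else 0)
    return key


def xor_bits(bits, key):
    """Bitwise XOR; applying it twice with the same key restores the input."""
    return [b ^ k for b, k in zip(bits, key)]


identity = [1, 0, 1, 1, 0, 0, 1, 0]
x0 = seed_from_context(1750000000123, "hello", 0.01)
key = chaotic_key(x0, len(identity))
dynamic = xor_bits(identity, key)
print(xor_bits(dynamic, key) == identity)
```

Since XOR is its own inverse, the extractor can restore the fixed identity bit string once it regenerates the same key sequence from the same three inputs.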

[0181] In some embodiments, for the watermark sparse coding mechanism of semantic anchor binding, the matching logic between semantic anchors and bit substrings is as follows. First, core entities and core predicates are extracted from the system prompts and the user input using a pre-trained entity recognition model and a semantic role labeling model. Core entities include proper nouns, numerical identifiers, and core topic subjects; core predicates include core predicate verbs and action command words. The extracted core entities and core predicates are deduplicated to form a semantic anchor set, denoted as $A = \{a_1, a_2, \dots, a_K\}$, where $K$ is the total number of semantic anchors. The bit string corresponding to the user identity information is split into $K$ groups of bit substrings. The splitting rule is: if the total length $L$ of the bit string is divisible by $K$, the length of each bit substring is $L/K$; if it is not divisible, the first ... groups of bit substrings have length ... and the remaining bit substring has length ..., ensuring that all bits of the user identity information are completely split without information loss. During encoding, the watermark encoder calculates, through a multi-head attention mechanism, the attention weight between the embedding representation corresponding to each semantic anchor and the virtual prompt word embedding space. An independent encoding subspace is allocated to each semantic anchor, and each group of bit substrings is encoded into the encoding subspace of the corresponding semantic anchor, generating an embedding vector group that is bound one-to-one with that semantic anchor. All embedding vector groups are then concatenated to form the complete virtual prompt word representation. During watermark extraction, the same entity recognition model and semantic role labeling model as in the encoding stage are used to extract the corresponding set of semantic anchors from the generated text. Then, following the splitting rule of the encoding stage, the extracted sequence features are grouped according to the number of semantic anchors, and the corresponding bit substring is extracted from each group of features. Finally, all bit substrings are concatenated in encoding order to restore the complete user identity information bit string.
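
The grouping of bits by semantic anchor can be sketched as below. The even case follows the rule stated above. For the uneven case this sketch assumes that the leading groups each take one extra bit, which is only one possible rule and may differ from the one intended; the function name `split_bits` is hypothetical.

```python
def split_bits(bits, k):
    """Split a bit string into k contiguous groups, one per semantic anchor.

    Assumption: when len(bits) is not divisible by k, the first
    (len(bits) % k) groups each receive one extra bit.
    """
    base, extra = divmod(len(bits), k)
    groups, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        groups.append(bits[start:start + size])
        start += size
    return groups


print(split_bits("10110010", 4))
print(split_bits("1011001", 3))
```

Whatever rule is used, concatenating the groups in order must reproduce the original bit string, which is the property the extraction stage relies on.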

[0182] In some embodiments, for the watermark embedding constraint mechanism of cross-model semantic space alignment, the weighting rule of the cross-model cosine similarity loss is as follows. Three or more mainstream large language models with different architectures and parameter counts are pre-selected as alignment models. The selected alignment models cover the Transformer decoder-only architecture, the encoder-decoder architecture, and lightweight open-source large language models, and are denoted as $M_1, M_2, \dots, M_N$, where $N$ is the total number of alignment models. A corresponding weight coefficient $w_i$ is assigned to each alignment model. The weight coefficient is positively correlated with the architectural difference between the alignment model and the target chat large language model: the greater the architectural difference, the higher the corresponding weight coefficient. The sum of all weight coefficients is 1. During training, the virtual prompt word representations output by the watermark encoder are input into the embedding layer of each alignment model, and the generated text output by each alignment model is obtained through forward inference. The semantic feature vector corresponding to each generated text is extracted through a pre-trained semantic representation model, and the vectors are denoted as $s_1, s_2, \dots, s_N$. The cosine similarity between every pair of semantic feature vectors is calculated, and the mean of all cosine similarities is taken as the cross-model semantic consistency index. The cross-model cosine similarity loss function is then constructed, the expression of which is:

[0183] $$\mathcal{L}_{\text{cross}} = 1 - \frac{1}{P} \sum_{1 \le i < j \le N} \cos(s_i, s_j);$$

[0184] where $\cos(s_i, s_j)$ is the cosine similarity between semantic feature vectors $s_i$ and $s_j$, and $P$ is the total number of pairwise combinations of the alignment models. The constructed cross-model cosine similarity loss $\mathcal{L}_{\text{cross}}$ is added to the total loss function as a semantic invariance constraint, and the total loss function is updated to $\mathcal{L}_{\text{total}}' = \mathcal{L}_{\text{total}} + \lambda \mathcal{L}_{\text{cross}}$, where $\lambda$ is the preset weight coefficient corresponding to the cross-model cosine similarity loss, with a value range of 0.1 to 0.3, preferably 0.2.
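
A minimal sketch of this constraint term follows. It assumes the loss is one minus the mean pairwise cosine similarity, a form suggested by the description of the consistency index but an assumption here, and it leaves out the per-model weight coefficients because their place in the expression is not stated. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_model_loss(features):
    """Assumed form: 1 minus the mean cosine similarity over all model pairs."""
    pairs = list(combinations(features, 2))   # N * (N - 1) / 2 pairs
    mean_cos = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_cos


def total_loss(base_loss, features, lam=0.2):
    """Add the constraint term with the preferred weight of 0.2."""
    return base_loss + lam * cross_model_loss(features)


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # toy semantic vectors, 3 models
print(cross_model_loss(feats), total_loss(1.0, feats))
```

With identical feature vectors the term vanishes, so the constraint only penalizes disagreement between the alignment models.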

[0185] In some embodiments, the pre-trained semantic representation model adopts the Sentence-BERT open-source model, specifically the all-MiniLM-L6-v2 version. The input dimension of this model matches the embedding dimension of the target chat large language model, with a maximum input length of 512 tokens and an output semantic vector dimension of 384. The input and output processing rules are as follows: the text whose semantics are to be extracted is input into the model and processed by the model's tokenizer, and the token-level embedding vectors output by the model are then aggregated through mean pooling to obtain a fixed-length semantic vector for the text. This semantic vector is used for semantic loss calculation and semantic consistency verification.
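
The mean pooling step can be sketched without any model dependency. The example below uses toy two-dimensional vectors in place of the 384-dimensional token embeddings, and the padding mask is an assumption that reflects common practice rather than something stated above; the function name `mean_pool` is hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average token vectors over the positions whose mask value is 1."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    count = max(len(kept), 1)   # avoid division by zero if everything is masked
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]   # the last row stands for padding
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

Excluding padded positions keeps the pooled vector independent of how much padding a batch happens to contain.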

[0186] In some embodiments, the text rewriting attack simulation module is built on a pre-trained open-source large language model, preferably the Llama-3.1-8B-Instruct model, consistent with the target chat large language model. The module's input and output processing rules are as follows. A fixed rewriting attack prompt template is constructed, with the content "Please rewrite the following text using synonyms, retaining the core semantics and information of the original text. Sentence structure can be adjusted, synonyms replaced, and paragraph order adjusted. The rewritten text cannot be completely identical to the original text. The rewriting degree is divided into three levels: mild, moderate, and deep. The rewriting level in this case is {level}, original content: {text}", where {level} is the preset rewriting level and {text} is the watermarked text generated by the target chat large language model. During the training phase, three rewriting levels are set: mild, moderate, and deep. Mild rewriting corresponds to word replacement and sentence adjustment within 10%, moderate rewriting corresponds to content restatement and structural adjustment within 30%, and deep rewriting corresponds to full-text and semantic paraphrasing. The attack texts obtained at the different rewriting levels and the original watermarked text are input into the watermark extractor, and the watermark recovery loss is calculated for all of them to complete the training and optimization of the watermark extractor.
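
Filling the template is a plain string substitution. The sketch below reproduces the template wording given above; the names `REWRITE_TEMPLATE`, `LEVELS`, and `build_attack_prompt` are hypothetical, and the call to the rewriting model itself is omitted.

```python
REWRITE_TEMPLATE = (
    "Please rewrite the following text using synonyms, retaining the core "
    "semantics and information of the original text. Sentence structure can be "
    "adjusted, synonyms replaced, and paragraph order adjusted. The rewritten "
    "text cannot be completely identical to the original text. The rewriting "
    "degree is divided into three levels: mild, moderate, and deep. The "
    "rewriting level in this case is {level}, original content: {text}"
)
LEVELS = ("mild", "moderate", "deep")


def build_attack_prompt(text, level):
    """Fill the fixed template for one watermarked text and one rewriting level."""
    if level not in LEVELS:
        raise ValueError(f"unknown rewriting level: {level!r}")
    return REWRITE_TEMPLATE.format(level=level, text=text)


# One prompt per level for the same watermarked text.
prompts = [build_attack_prompt("Hello world", lv) for lv in LEVELS]
print(prompts[2][-60:])
```

Generating all three levels for each training sample gives the extractor progressively harder rewrites of the same watermark.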

[0187] In some embodiments, for the step size threshold in the dynamic virtual prompt word iterative injection mechanism, a corresponding preferred value range is set according to the preset maximum length of the generated text: when the preset maximum length of the generated text is less than or equal to 500 Tokens, the step size threshold ranges from 50 to 100 Tokens, preferably 50 Tokens; when the preset maximum length of the generated text is 500 to 2000 Tokens, the step size threshold ranges from 100 to 200 Tokens, preferably 100 Tokens; when the preset maximum length of the generated text is greater than 2000 Tokens, the step size threshold ranges from 200 to 500 Tokens, preferably 200 Tokens; during the text generation process, each time a number of Tokens corresponding to the step size threshold is generated, a virtual prompt word update and injection operation is performed once, ensuring that the watermark guidance effect continues to be effective throughout the entire long text generation process.
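
The preferred step sizes amount to a small lookup on the maximum generation length. The sketch below uses only the preferred values stated above; the function names are hypothetical, and the choice to skip an injection at the very last position is an assumption, since no further tokens would be generated after it.

```python
def preferred_step(max_len):
    """Preferred step size threshold, in tokens, for a given maximum length."""
    if max_len <= 500:
        return 50
    if max_len <= 2000:
        return 100
    return 200


def injection_points(max_len):
    """Token counts at which the virtual prompt word is updated and re-injected."""
    step = preferred_step(max_len)
    return list(range(step, max_len, step))   # stops before max_len (assumption)


print(preferred_step(1200), injection_points(200))
```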

[0188] In some embodiments, for the divergence threshold in the embedded distribution adaptive smoothing mechanism, an adaptive adjustment method based on the semantic complexity of user input is adopted. First, the semantic vector of the user input text is extracted through a pre-trained semantic representation model, and the information entropy value of the semantic vector is calculated as the semantic complexity index of the user input. The higher the information entropy value, the higher the semantic complexity of the user input. The rules for setting the divergence threshold are as follows: when the semantic complexity information entropy value is less than 2, the divergence threshold ranges from 0.01 to 0.05, preferably 0.02; when the semantic complexity information entropy value is 2 to 5, the divergence threshold ranges from 0.05 to 0.1, preferably 0.08; when the semantic complexity information entropy value is greater than 5, the divergence threshold ranges from 0.1 to 0.2, preferably 0.15. During the iterative correction process of the virtual prompt word representation, when the KL divergence value between the original generation distribution and the watermarked generation distribution is less than or equal to the current divergence threshold, the iterative correction is stopped, and the final virtual prompt word representation is output.
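
The threshold rule and the stopping condition can be sketched together. Only the preferred values are used, the entropy value is taken as given because its exact computation from the semantic vector is not detailed here, and the function names are hypothetical.

```python
def divergence_threshold(entropy):
    """Preferred divergence threshold for a given semantic-complexity entropy."""
    if entropy < 2:
        return 0.02
    if entropy <= 5:
        return 0.08
    return 0.15


def should_stop(kl_value, entropy):
    """Stop iterative correction once the KL divergence is within the threshold."""
    return kl_value <= divergence_threshold(entropy)


print(divergence_threshold(3.4), should_stop(0.05, 3.4))
```

A more complex input therefore tolerates a larger drift between the original and watermarked distributions before correction is triggered.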

[0189] The embodiments described above are for illustrative purposes only and are not intended to limit the invention. Therefore, any changes in numerical values or substitutions of equivalent elements should still fall within the scope of this invention.

[0190] The above detailed description will enable those skilled in the art to understand that the present invention can indeed achieve the aforementioned objectives and has complied with the provisions of the Patent Law.

[0191] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the invention. The above descriptions are merely preferred embodiments of the invention and are not intended to limit the invention. It should be noted that any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention should be included within the scope of protection of the invention.

[0192] It should be noted that the above description of the process is for illustrative purposes only and does not limit the scope of this specification. Those skilled in the art can make various modifications and changes to the process under the guidance of this specification. However, these modifications and changes remain within the scope of this specification.

[0193] The basic concepts have been described above. Obviously, for those skilled in the art who have read this application, the above disclosure is merely illustrative and does not constitute a limitation of this application. Although not explicitly stated herein, those skilled in the art may make various modifications, improvements, and corrections to this application. Such modifications, improvements, and corrections are suggested in this application, and therefore, such modifications, improvements, and corrections still fall within the spirit and scope of the exemplary embodiments of this application.

[0194] Furthermore, this application uses specific terms to describe its embodiments. For example, "an embodiment," "one embodiment," and / or "some embodiments" refer to a particular feature, structure, or characteristic related to at least one embodiment of this application. Therefore, it should be emphasized and noted that "an embodiment," "one embodiment," or "an alternative embodiment" mentioned twice or more in different positions in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics in one or more embodiments of this application can be appropriately combined.

Claims

1. A large language model text provenance method based on virtual prompt word embedding, characterized by comprising: (1) obtaining system prompt words and user input, and using a fixed template to combine the system prompt words and the user input to construct chat input command text with role format; (2) obtaining the word segmenter and embedding matrix corresponding to the target chat large language model, and using the word segmenter and the embedding matrix to convert the chat input command text into the corresponding embedding vector input, wherein the embedding vector input includes the embedding representation corresponding to the system prompt words and the embedding representation corresponding to the user input; (3) obtaining user identity information, converting the user identity information into a fixed-length bit string, inputting the bit string and the embedding representation corresponding to the system prompt words into a pre-trained watermark encoder to generate a virtual prompt word representation containing m embedding vectors, where m is the length of the predefined watermark message, and inserting the virtual prompt word representation at the end of the embedding representation corresponding to the system prompt words to obtain an updated system prompt word embedding representation; wherein the watermark encoder is composed of several Transformer modules connected in sequence, and the watermark encoder encodes, through an attention mechanism, the bit string corresponding to the user identity information into an embedding space aligned with the embedding representation corresponding to the system prompt words, to generate the virtual prompt word representation; (4) concatenating the updated system prompt word embedding representation with the embedding representation corresponding to the user input, and inputting the result into the target chat large language model to obtain generated text carrying a user identity information watermark; (5) inputting the generated text into a pre-trained watermark extractor to recover the user identity information embedded in the generated text, and tracing the source of the generated text based on the recovered user identity information.

2. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, in step (1), the fixed template contains predefined instruction words and role definition information; the role definition information includes the identification content corresponding to three types of roles: system, user, and model; and when constructing the chat input command text, the system prompt words and the user input are encapsulated according to the identification content corresponding to the system role and the user role, respectively.

3. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, in step (2), the specific process of converting the chat input command text into the corresponding embedding vector input is as follows: (2.1) the chat input command text is segmented using the word segmenter to obtain a text sequence composed of several word units; (2.2) the encoding table corresponding to the word segmenter is looked up, and each word unit in the text sequence is converted into the corresponding word identifier, obtaining an encoding sequence composed of word identifiers; (2.3) the corresponding embedding vectors are matched from the embedding matrix according to the word identifiers, obtaining the embedding vector input corresponding to the chat input command text.

4. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, in step (3), after the virtual prompt word representation is inserted, the updated system prompt word embedding representation and the embedding representation corresponding to the user input are concatenated in sequence, and the dimension of the resulting complete embedding representation matches the input dimension of the target chat large language model.

5. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, in step (4), the specific process of obtaining the generated text carrying the user identity information watermark is as follows: (4.1) the concatenated embedding representation is input into the target chat large language model, and the predicted probability distribution of the next lexical unit is calculated through the multi-layer self-attention mechanism of the target chat large language model; (4.2) a lexical unit is selected from the predicted probability distribution obtained in step (4.1) as the predicted lexical unit for the next position, using a predefined continuous sampling strategy; (4.3) after the lexical unit obtained in step (4.2) is concatenated to the input sequence of step (4.1), steps (4.1) to (4.3) are repeated until the output length reaches the preset maximum value or the predefined terminator in the vocabulary of the target chat large language model is sampled; (4.4) all the resulting lexical units are combined into the generated text carrying the user identity information watermark that is output by the target chat large language model.

6. The large language model text provenance method based on virtual prompt word embedding according to claim 5, characterized in that, in step (4), during the process of obtaining the generated text, the distribution loss between the prediction probability distribution obtained in step (4.1) and the probability distribution of the expected answer is calculated at the same time; after the generated text is obtained, the semantic representation of the generated text is extracted through the pre-trained semantic representation model, and the semantic loss between the generated text and the expected answer is calculated; and the distribution loss and the semantic loss are both used for the training and optimization of the watermark encoder.

7. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, in step (5), the watermark extractor is composed of a long short-term memory network and several fully connected layers connected in sequence, and the latent space dimension of the long short-term memory network is consistent with the embedding space dimension of the target chat large language model.

8. The large language model text provenance method based on virtual prompt word embedding according to claim 7, characterized in that, in step (5), the specific process of restoring the user identity information is as follows: (5.1) the generated text is converted into the corresponding continuous embedding representation using the word segmenter and the embedding matrix; (5.2) the continuous embedding representations corresponding to the generated text are input into the long short-term memory network one by one in the order of the lexical units, and the sequence features of the generated text are extracted; (5.3) the extracted sequence features are input into the fully connected layers to obtain the prediction result corresponding to each bit of the watermark information, the bit string corresponding to the user identity information is restored based on the prediction results, and the bit string is finally converted to obtain the user identity information.

9. The large language model text provenance method based on virtual prompt word embedding according to claim 1, characterized in that, during the training phase of the watermark encoder and the watermark extractor, a text rewriting attack simulation module is provided; the text rewriting attack simulation module is built on a pre-trained large language model; the watermarked text generated by the target chat large language model is input into the text rewriting attack simulation module to obtain rewritten attack text; and the attack text and the original watermarked text are input together into the watermark extractor to train and optimize the watermark recovery capability of the watermark extractor.