LLM dynamic word segmentation optimization method and device fusing a decision model and a hypernetwork

By constructing a lightweight merging decision model and a bidirectional long short-term memory network-based hypernetwork prediction model, the problems of lexical rigidity and semantic fragmentation in static word segmentation techniques for large language models are solved, dynamic word segmentation optimization is achieved, and the computational efficiency and semantic understanding ability of the model are improved, making it suitable for long texts and professional domain texts.

CN122242502A · Pending · Publication Date: 2026-06-19 · QINGDAO OCEAN SHIPPING MARINERS COLLEGE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
QINGDAO OCEAN SHIPPING MARINERS COLLEGE
Filing Date
2026-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing static word segmentation techniques for large language models suffer from lexical rigidity, low computational efficiency, semantic fragmentation, and a lack of context-adaptive ability. These problems increase the computational load and degrade the output quality of the model, particularly when it handles technical terms and new vocabulary.

Method used

The technical solution is as follows: Step 1: The input text is standardized and segmented using a static word segmenter to generate an initial word sequence; Step 2: A lightweight merging decision model is constructed, which dynamically calculates the semantic relevance between adjacent words based on the word embeddings within the context window, and iteratively merges highly relevant words; Step 3: A hypernetwork prediction model based on a bidirectional long short-term memory network is constructed, which splits each word in the new word list into a character sequence to generate a high-quality set of embedding vectors that are semantically aligned with the original large language model vector space; Step 4: The embedding vectors of the original words in the sequence are concatenated with the embedding vectors of the new words to form the final enhanced embedding representation, which is used as the input to the downstream large language model.

Benefits of technology

It significantly improves the word segmentation quality and adaptability of large language models, reduces the length of input sequences, improves computational efficiency, and enhances the accuracy and generalization ability of models in semantic understanding and text generation tasks. It is suitable for long texts and professional domain texts.



Abstract

This invention relates to an LLM dynamic word segmentation optimization method and apparatus that integrates a decision model and a hypernetwork, belonging to the field of natural language processing technology. The method first uses a static word segmenter for standardized segmentation, generating a static sub-word sequence. Then, a lightweight sub-word merging decision model is constructed, dynamically calculating the semantic relevance between adjacent sub-words based on sub-word embeddings within a context window, and merging highly relevant sub-words in real time to generate new vocabulary. Next, a hypernetwork prediction module is constructed to transform the iteratively generated new words into new embedding vectors. Finally, the statically and dynamically generated embedding vectors are concatenated to output a shorter, semantically stronger embedding representation. This method, by constructing a learnable dynamic word segmenter, effectively solves the problems of excessive word segmentation, weak cross-linguistic capabilities, and poor adaptability to noise and new words caused by fixed segmentation mechanisms in traditional large language models, significantly improving the computational efficiency and generalization ability of the model.

Description

Technical Field

[0001] This invention belongs to the field of natural language processing technology, and in particular relates to an LLM dynamic word segmentation optimization method and device that integrates a decision model and a hypernetwork.

Background Technology

[0002] With the rapid development of natural language processing technology, large language models (LLMs) have been widely used in tasks such as text generation and text classification. Word segmentation, as a core and fundamental step in the preprocessing of large language models, plays a crucial role in dividing continuous raw text into semantically complete and granularly appropriate sub-words or lexical units. The quality and adaptability of word segmentation directly determine the accuracy of the large language model in capturing text semantics, its processing efficiency, and the quality of its final output. Currently, the word segmentation preprocessing stage of mainstream large language models mainly adopts static word segmentation technology. The core characteristic of this method is its reliance on a pre-defined fixed vocabulary. Once the segmentation rules are determined, they remain fixed, and the input text is segmented in a standardized manner strictly according to the granularity of the pre-defined vocabulary.

[0003] Although such methods are relatively simple to train, their static segmentation and rigid vocabulary are increasingly becoming a bottleneck that restricts the performance of large language models in practical deployments. The main problems are as follows. Rigid vocabulary and low computational efficiency: static word segmenters, which segment text according to a preset vocabulary, may mechanically over-segment specialized terms, new words, and cross-linguistic words, unnecessarily increasing the sequence length. This increases the input computation of large language models and reduces inference speed, and the meaningless sub-words generated by over-segmentation can interfere with the model's capture of core semantics, indirectly degrading output quality. This problem is particularly prominent when processing long texts and specialized domain texts.

[0004] Semantic fragmentation and lack of context-adaptive capabilities: Static word segmentation is based solely on vocabulary statistical features and lacks the ability to perceive semantic relationships within context. It easily breaks down semantically close compound terms and fixed collocations into isolated sub-words, thus destroying semantic integrity. At the same time, the segmentation rules are fixed and cannot be dynamically adjusted according to different contexts, making it difficult for the model to accurately capture the core semantics of the text, directly affecting the accuracy and coherence of semantic understanding and text generation.

[0005] Therefore, there is an urgent need for a word segmentation optimization method that can overcome the limitations of static word segmentation technology, dynamically perceive contextual semantics during the word segmentation stage, intelligently decide on merging adjacent sub-words, and adaptively generate new word embeddings.

Summary of the Invention

[0006] (I) Purpose of the Invention: To overcome the above shortcomings, the present invention aims to provide an LLM dynamic word segmentation optimization method and apparatus that integrates a decision model and a hypernetwork to solve the above technical problems.

[0007] (II) Technical Solution: To achieve the above objective, the technical solution provided in this application is as follows. An LLM dynamic word segmentation optimization method that integrates a decision model and a hypernetwork includes the following steps:
Step 1: After cleaning the input text dataset D, use a static word segmenter to perform standardized segmentation, generate static sub-word sequences, and construct an initial sequence set, which provides the basic processing units for dynamic optimization;
Step 2: Construct a lightweight merging decision model, dynamically calculate the semantic relevance between adjacent sub-words based on the sub-word embeddings within the context window, and iteratively merge highly relevant sub-words until the optimal sequence S' that fits the current text is obtained;
Step 3: Construct a hypernetwork prediction model based on a bidirectional long short-term memory network. This model splits each word in the new word list into a character sequence and, through bidirectional encoding, generates a high-quality set of embedding vectors that are semantically aligned with the vector space of the original large language model;
Step 4: Concatenate the embedding vectors of the original words in sequence S' with the embedding vectors of the new words, reconstruct the positional encoding, and form the final enhanced embedding representation, which serves as the input to the downstream large language model.

[0008] Preferably, step 1 specifically includes:
Step 1.1: Define T as a single input text to be processed, consisting of a unique identifier for the text and the text content obtained from the original data;
Step 1.2: Define D as the text dataset to be processed, satisfying $D = \{T_1, T_2, \ldots, T_a, \ldots, T_{L(D)}\}$, where $T_a$ is the a-th text to be processed in the dataset, $L(D)$ is the dataset length, and the variable $a \in [1, L(D)]$;
Step 1.3: Perform data cleaning on D, including removing null values, deduplicating sentences, and filtering meaningless characters, to obtain the cleaned standard dataset $D_{clean} = \{T_1, T_2, \ldots, T_{a1}, \ldots, T_{L(D_{clean})}\}$, where $T_{a1}$ is the a1-th cleaned text in the dataset, $L(D_{clean})$ is the dataset length, and the variable $a1 \in [1, L(D_{clean})]$;
Step 1.4: Load the static tokenizer and perform static word segmentation on each text in $D_{clean}$ to generate the initial set of sub-word sequences $S_{static} = \{S_1, S_2, \ldots, S_{a2}, \ldots, S_{L(S_{static})}\}$, where $S_{a2}$ is the word segmentation sequence of the a2-th text, $L(S_{static}) = L(D_{clean})$, and the variable $a2 \in [1, L(S_{static})]$.
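Steps 1.3 and 1.4 can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the function names are hypothetical, "meaningless characters" is assumed to mean control characters, and `static_tokenize` is a simple regular-expression stand-in for whatever static tokenizer the target model actually uses.

```python
import re


def clean_dataset(texts):
    """Step 1.3 sketch: drop null/empty entries, filter characters, deduplicate."""
    seen = set()
    cleaned = []
    for text in texts:
        if text is None:
            continue
        # Assumption: "meaningless characters" are control characters.
        text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
        text = re.sub(r"\s+", " ", text).strip()
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def static_tokenize(text):
    """Hypothetical stand-in for a static tokenizer: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


D = ["Dynamic  word segmentation.", None, "", "Dynamic word segmentation.", "LLM\tinput"]
D_clean = clean_dataset(D)
S_static = [static_tokenize(t) for t in D_clean]
print(D_clean)
print(S_static)
```

The sketch preserves the property stated in step 1.4 that the number of sub-word sequences equals the number of cleaned texts.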

[0009] Preferably, the specific steps of step 2 include:
Step 2.1: Initialize the dynamic word segmentation result set $S_{dynamic}$ by letting $S_{dynamic} = S_{static}$, with $S_{dynamic} = \{S_1, S_2, \ldots, S_k, \ldots, S_{L(S_{dynamic})}\}$;
Step 2.2: Define the merging decision threshold $\theta \in (0, 1)$, and initialize the batch sequence index k = 1;
Step 2.3: If $k \le L(S_{dynamic})$, execute the next step; otherwise jump to step 2.17;
Step 2.4: Obtain the k-th sequence in $S_{dynamic}$, $S_k = \{t_1, t_2, \ldots, t_j, \ldots, t_n\}$, where $t_j$ represents the j-th sub-word in the sequence and n represents the current number of sub-words; initialize the position index j = 1, and initialize the set $M_k$ of pending merge marks to be empty;
Step 2.5: If j < n, execute the next step; otherwise jump to step 2.15;
Step 2.6: Extract the sub-word at the j-th position of the sequence $S_k$ and its next adjacent sub-word to form the pair $(t_j, t_{j+1})$, construct a local context window centered on this pair, look up the vectors corresponding to all sub-words in the window from the pre-trained static embedding table, and stack them in order to form the local context window matrix $F_j \in \mathbb{R}^{w \times d}$, where $w$ is the window size and $d$ is the word embedding dimension;
Step 2.7: Input the feature matrix $F_j$ into the CNN decision model;
Step 2.8: Perform the convolution calculation in the convolutional layer to obtain the convolved feature map: $Z_c = \mathrm{ReLU}(K_c * F_j + b_c)$, where $K_c$ is the weight matrix of the c-th convolutional kernel, $b_c$ is its bias, $*$ denotes the convolution operation, ReLU is the activation function, and $Z_c$ is the output feature map;
Step 2.9: Perform global max pooling in the pooling layer to capture salient features: $p_c = \max(Z_c)$, $P = [p_1, p_2, \ldots, p_C]$, where $p_c$ is the maximum value of the feature map $Z_c$ and P is the feature vector concatenated from the pooling results;
Step 2.10: Perform feature integration and decision-making in the fully connected layer, outputting the decision score between the current sub-word and its next neighbor: $h = \mathrm{ReLU}(W_1 P + b_1)$, $z = W_2 h + b_2$, where $W_1$ and $W_2$ are weight matrices, $b_1$ and $b_2$ are biases, ReLU is the activation function, h is the hidden layer output, and z is the decision score;
Step 2.11: Map the decision score z to the merging probability $Y_j$ in the output layer: $Y_j = \mathrm{Sigmoid}(z)$, where Sigmoid is the activation function and $Y_j \in [0, 1]$;
Step 2.12: If $Y_j \ge \theta$, proceed to the next step; otherwise, go to step 2.14;
Step 2.13: Add the position index j to the set of pending merge marks $M_k$, update the loop variable j, and return to step 2.5 to continue scanning the next sub-word pair;
Step 2.14: Retain the pair as separate sub-words, update the loop variable j = j + 1, and return to step 2.5;
Step 2.15: Perform the merge operation by checking the merge mark set $M_k$. If $M_k$ is empty, there is nothing to merge in the current sequence, so proceed to step 2.16. If $M_k$ is not empty, sequentially combine each pair $(t_j, t_{j+1})$ in $S_k$ at a marked position j into a new word $t_{new} = \mathrm{concat}(t_j, t_{j+1})$, replace the original sub-words with $t_{new}$ to generate the updated sequence, assign the updated sequence to $S_k$, reset j = 1, clear $M_k$, and return to step 2.5 for the next iteration, repeating until $M_k$ is empty;
Step 2.16: Let k = k + 1, then return to step 2.3 to process the next sequence;
Step 2.17: End all iterative merging processes and output the optimized sequence set $S' = S_{dynamic}$, $S' = \{S'_1, S'_2, \ldots, S'_k, \ldots, S'_m\}$;
Step 2.18: Output S' as the final optimized sequence set to the next step.
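The scoring path of steps 2.8 to 2.12 (convolution, global max pooling, fully connected layers, sigmoid, threshold) can be sketched in plain Python. This is an illustration under stated assumptions, not the trained model from the application: the weights are random, the sizes `w`, `d`, `C`, `H`, and `ksize` are arbitrary, and each kernel is assumed to span the full embedding dimension while sliding over window positions.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def conv_feature_map(F, K, b):
    """Z_c = ReLU(K_c * F_j + b_c): slide an (h x d) kernel over the window rows."""
    h = len(K)
    out = []
    for start in range(len(F) - h + 1):
        s = b
        for r in range(h):
            s += sum(k * f for k, f in zip(K[r], F[start + r]))
        out.append(relu(s))
    return out


def merge_probability(F, kernels, biases, W1, b1, W2, b2):
    # Step 2.9: global max pooling, one value p_c per kernel.
    P = [max(conv_feature_map(F, K, b)) for K, b in zip(kernels, biases)]
    # Step 2.10: hidden layer h = ReLU(W1 P + b1), then scalar score z = W2 h + b2.
    hidden = [relu(sum(w * p for w, p in zip(row, P)) + bias) for row, bias in zip(W1, b1)]
    z = sum(w * x for w, x in zip(W2, hidden)) + b2
    # Step 2.11: merging probability Y_j = Sigmoid(z).
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
w, d, C, H, ksize = 4, 3, 2, 5, 2          # illustrative sizes only
F_j = rand_matrix(rng, w, d)               # stands in for looked-up embeddings
kernels = [rand_matrix(rng, ksize, d) for _ in range(C)]
biases = [0.0] * C
W1, b1 = rand_matrix(rng, H, C), [0.0] * H
W2, b2 = [rng.uniform(-1, 1) for _ in range(H)], 0.0

Y_j = merge_probability(F_j, kernels, biases, W1, b1, W2, b2)
theta = 0.5
should_merge = Y_j >= theta                # step 2.12
print(Y_j, should_merge)
```

With untrained weights the probability itself is meaningless; the point is only the shape of the computation that feeds the threshold test.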

[0010] Preferably, step 3 specifically includes:
Step 3.1: Obtain the sequence set S', and establish and initialize the embedding mapping table $E_{map}$, which is used to store the generated dynamic word embeddings; initialize the index k = 1;
Step 3.2: If $k \le m$, proceed to the next step; otherwise, go to step 3.17;
Step 3.3: Obtain the k-th sequence $S'_k$ of S', and initialize the index j = 1;
Step 3.4: If $j \le \mathrm{len}(S'_k)$, proceed to the next step; otherwise, skip to step 3.16;
Step 3.5: Check the j-th word $t_j$ of $S'_k$. If $t_j$ is a merged new word, proceed to the next step; otherwise, skip to step 3.15;
Step 3.6: Query the embedding mapping table $E_{map}$. If the table already contains an embedding vector for the current word $t_j$, take that vector directly; if it does not, execute the embedding prediction process in the next step;
Step 3.7: Split the new word $t_j$ into a character sequence at character granularity, $\{c_1, c_2, \ldots, c_L\}$, where L is the number of characters;
Step 3.8: Map each character $c_i$ to an embedding vector $e_i$;
Step 3.9: Input the character embedding sequence into the hypernetwork based on a bidirectional long short-term memory network;
Step 3.10: Calculate the forward LSTM output at the current time: $\overrightarrow{h}_i = \overrightarrow{\mathrm{LSTM}}(e_i, \overrightarrow{h}_{i-1})$, where $e_i$ is the input vector at time i and $\overrightarrow{h}_{i-1}$ is the forward hidden state at time i-1;
Step 3.11: Calculate the backward LSTM output at the current time: $\overleftarrow{h}_i = \overleftarrow{\mathrm{LSTM}}(e_i, \overleftarrow{h}_{i+1})$, where $\overleftarrow{h}_{i+1}$ is the backward hidden state at time i+1;
Step 3.12: Concatenate the outputs of the final step of each direction to generate $h_{final} = [\overrightarrow{h}_L ; \overleftarrow{h}_1]$, where [;] indicates concatenation;
Step 3.13: Pass $h_{final}$ through a projection layer that maps it to the dimension of the original language model's embedding vectors, generating the new embedding vector $e_{new\_j} = W h_{final} + b$, where W is the weight matrix and b is the bias;
Step 3.14: Store the key-value pair $(t_j, e_{new\_j})$ in the embedding mapping table $E_{map}$;
Step 3.15: Let j = j + 1 and return to step 3.5 to process the next word;
Step 3.16: Let k = k + 1 and return to step 3.2 to process the next sequence;
Step 3.17: End the hypernetwork embedding prediction and output the mapping table $E_{map}$, which now holds embedding vectors for all new words, for integration in the next step.
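Steps 3.7 to 3.13 can be sketched as a character-level bidirectional LSTM followed by a linear projection. The code below is a plain-Python illustration with randomly initialized, untrained weights. The class names, dimensions, and on-demand character table are all assumptions made for the example, and no alignment with a real model's embedding space is achieved without training.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LSTMCell:
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate) over [x; h_prev].
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)] for _ in range(4)]
        self.b = [[0.0] * hidden_dim for _ in range(4)]

    def step(self, x, h_prev, c_prev):
        xh = x + h_prev  # list concatenation
        pre = [[a + b for a, b in zip(matvec(W, xh), bias)]
               for W, bias in zip(self.W, self.b)]
        i_gate = [sigmoid(v) for v in pre[0]]
        f_gate = [sigmoid(v) for v in pre[1]]
        o_gate = [sigmoid(v) for v in pre[2]]
        cand = [math.tanh(v) for v in pre[3]]
        c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c_prev, i_gate, cand)]
        h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
        return h, c


def last_hidden(cell, inputs):
    """Run the cell over the inputs and return the final hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in inputs:
        h, c = cell.step(x, h, c)
    return h


class CharHypernetwork:
    def __init__(self, char_dim, hidden_dim, llm_dim, seed=0):
        self.rng = random.Random(seed)
        self.char_dim = char_dim
        self.char_table = {}  # character -> e_i, created on first use (assumption)
        self.fwd = LSTMCell(char_dim, hidden_dim, self.rng)
        self.bwd = LSTMCell(char_dim, hidden_dim, self.rng)
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
                  for _ in range(llm_dim)]
        self.b = [0.0] * llm_dim

    def char_embedding(self, ch):
        if ch not in self.char_table:
            self.char_table[ch] = [self.rng.uniform(-1, 1) for _ in range(self.char_dim)]
        return self.char_table[ch]

    def predict(self, word):
        e = [self.char_embedding(ch) for ch in word]   # steps 3.7-3.8
        h_fwd = last_hidden(self.fwd, e)               # forward state at i = L
        h_bwd = last_hidden(self.bwd, e[::-1])         # backward state at i = 1
        h_final = h_fwd + h_bwd                        # step 3.12: [h_fwd ; h_bwd]
        return [a + b for a, b in zip(matvec(self.W, h_final), self.b)]  # step 3.13


net = CharHypernetwork(char_dim=4, hidden_dim=6, llm_dim=8)
e_new = net.predict("hypernetwork")
print(len(e_new))
```

The output length equals `llm_dim`, which is the role the projection layer plays in step 3.13: whatever the BiLSTM hidden size, the result has the same dimension as the language model's own embeddings.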

[0011] Preferably, step 4 specifically includes:
Step 4.1: Receive the embedding mapping table $E_{map}$ from step 3.17 and the optimized sequence set S' from step 2.18; initialize the sequence set $S_{final}$, and initialize the sequence index k = 1;
Step 4.2: If $k \le m$, proceed to the next step; otherwise, go to step 4.11;
Step 4.3: Obtain the k-th sequence $S'_k$ in S', initialize the corresponding embedding matrix $E_k$, and initialize the index j = 1;
Step 4.4: If $j \le n$, proceed to the next step; otherwise, go to step 4.9;
Step 4.5: Look up the word $t_j$. If $t_j$ exists in the mapping table $E_{map}$, meaning it is a newly generated word, retrieve the corresponding embedding vector $e_{new\_j}$; if it does not exist, it is an original static word, and its embedding vector $e_j$ is retrieved from the model's pre-trained embedding table. The generated or retrieved vector is denoted $v_j$;
Step 4.6: Based on the new length n of the current sequence $S'_k$, regenerate the corresponding position code $p_j$;
Step 4.7: Calculate the final input by combining $v_j$ with $p_j$, and add it to matrix $E_k$;
Step 4.8: Let j = j + 1 and return to step 4.4;
Step 4.9: Add the sequence embedding matrix $E_k$ and $S'_k$ to the enhanced sequence set $S_{final}$;
Step 4.10: Let k = k + 1 and return to step 4.2;
Step 4.11: End the embedding integration process and output the final enhanced sequence set $S_{final}$ with its complete embedding information;
Step 4.12: Feed $S_{final}$, as the optimized input, directly into the downstream large language model for subsequent tasks.

[0012] An LLM dynamic word segmentation optimization device that integrates a decision model and a hypernetwork includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is loaded onto the processor, it implements the aforementioned LLM dynamic word segmentation optimization method that integrates a decision model and a hypernetwork.

[0013] Beneficial effects: This invention discloses an LLM dynamic word segmentation optimization method and apparatus based on a fusion decision model and a hypernetwork. It innovatively addresses the shortcomings of static word segmentation by constructing a dynamic word segmentation optimization system for large language models through a fusion decision merging model and a hypernetwork prediction module. This effectively overcomes the technical limitations of static word segmentation, significantly improves the quality and adaptability of large language model preprocessing, and thus enhances the overall processing efficiency of large language models. Specific beneficial effects are as follows: 1. This invention achieves dynamic determination of sub-word merging through semantic features within the context window. This effectively solves the problems of rigid vocabulary and lack of adaptability in existing static word segmentation technology. It can also accurately adapt to the segmentation needs of professional terms, new words, and texts in various vertical fields, avoiding the redundancy of sequence length caused by mechanical over-segmentation, significantly compressing the length of the input sequence, and improving the inference and computation efficiency of large language models. Furthermore, it makes up for the lack of adaptive capability of static word segmentation by merging sub-words into semantically closely related lexical units through merging decisions, allowing large language models to accurately capture the core semantics of the text and improve their processing accuracy in tasks such as semantic understanding and text generation.

[0014] 2. This invention transforms new words generated during dynamic word segmentation into high-quality embedding representations by constructing a hypernetwork. This module captures the morphological structure of new words through character-level encoding and accurately maps them into the original semantic vector space of the large language model. This not only eliminates the problems of "embedding missing" or "representation noise" caused by word segmentation changes and ensures the overall consistency and accuracy of the input vector sequence, but also enables the large language model to dynamically adapt to various emerging words and professional terms without retraining, greatly enhancing the model's generalization ability and practicality in various fields.

Attached Figure Description

[0015] Figure 1 is the overall flowchart of the present invention; Figure 2 is an overall structural diagram of the present invention.

Detailed Implementation

[0016] To make the objectives, technical solutions, and advantages of this invention clearer, the present invention is described in further detail below through specific embodiments and with reference to the accompanying Figures 1-2. It should be understood that these descriptions are merely exemplary and not intended to limit the scope of the invention. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concept of the invention.

[0017] This invention provides an LLM dynamic word segmentation optimization method that integrates a decision model and a hypernetwork, comprising the following steps:
Step 1: After cleaning the input text dataset D, use a static word segmenter to perform standardized segmentation, generate static sub-word sequences, and construct an initial sequence set, which provides the basic processing units for dynamic optimization;
Preferably, step 1 specifically includes:
Step 1.1: Define T as a single input text to be processed, consisting of a unique identifier for the text and the text content obtained from the original data;
Step 1.2: Define D as the text dataset to be processed, satisfying $D = \{T_1, T_2, \ldots, T_a, \ldots, T_{L(D)}\}$, where $T_a$ is the a-th text to be processed in the dataset, $L(D)$ is the dataset length, and the variable $a \in [1, L(D)]$;
Step 1.3: Perform data cleaning on D, including removing null values, deduplicating sentences, and filtering meaningless characters, to obtain the cleaned standard dataset $D_{clean} = \{T_1, T_2, \ldots, T_{a1}, \ldots, T_{L(D_{clean})}\}$, where $T_{a1}$ is the a1-th cleaned text in the dataset, $L(D_{clean})$ is the dataset length, and the variable $a1 \in [1, L(D_{clean})]$;
Step 1.4: Load the static tokenizer and perform static word segmentation on each text in $D_{clean}$ to generate the initial set of sub-word sequences $S_{static} = \{S_1, S_2, \ldots, S_{a2}, \ldots, S_{L(S_{static})}\}$, where $S_{a2}$ is the word segmentation sequence of the a2-th text, $L(S_{static}) = L(D_{clean})$, and the variable $a2 \in [1, L(S_{static})]$.

[0018] Step 2: Construct a lightweight merging decision model, dynamically calculate the semantic relevance between adjacent sub-words based on the sub-word embeddings within the context window, and iteratively merge highly relevant sub-words until the optimal sequence S' that fits the current text is obtained;
Preferably, the specific steps of step 2 include:
Step 2.1: Initialize the dynamic word segmentation result set $S_{dynamic}$ by letting $S_{dynamic} = S_{static}$, with $S_{dynamic} = \{S_1, S_2, \ldots, S_k, \ldots, S_{L(S_{dynamic})}\}$;
Step 2.2: Define the merging decision threshold $\theta \in (0, 1)$, and initialize the batch sequence index k = 1;
Step 2.3: If $k \le L(S_{dynamic})$, execute the next step; otherwise jump to step 2.17;
Step 2.4: Obtain the k-th sequence in $S_{dynamic}$, $S_k = \{t_1, t_2, \ldots, t_j, \ldots, t_n\}$, where $t_j$ represents the j-th sub-word in the sequence and n represents the current number of sub-words; initialize the position index j = 1, and initialize the set $M_k$ of pending merge marks to be empty;
Step 2.5: If j < n, execute the next step; otherwise jump to step 2.15;
Step 2.6: Extract the sub-word at the j-th position of the sequence $S_k$ and its next adjacent sub-word to form the pair $(t_j, t_{j+1})$, construct a local context window centered on this pair, look up the vectors corresponding to all sub-words in the window from the pre-trained static embedding table, and stack them in order to form the local context window matrix $F_j \in \mathbb{R}^{w \times d}$, where $w$ is the window size and $d$ is the word embedding dimension;
Step 2.7: Input the feature matrix $F_j$ into the CNN decision model;
Step 2.8: Perform the convolution calculation in the convolutional layer to obtain the convolved feature map: $Z_c = \mathrm{ReLU}(K_c * F_j + b_c)$, where $K_c$ is the weight matrix of the c-th convolutional kernel, $b_c$ is its bias, $*$ denotes the convolution operation, ReLU is the activation function, and $Z_c$ is the output feature map;
Step 2.9: Perform global max pooling in the pooling layer to capture salient features: $p_c = \max(Z_c)$, $P = [p_1, p_2, \ldots, p_C]$, where $p_c$ is the maximum value of the feature map $Z_c$ and P is the feature vector concatenated from the pooling results;
Step 2.10: Perform feature integration and decision-making in the fully connected layer, outputting the decision score between the current sub-word and its next neighbor: $h = \mathrm{ReLU}(W_1 P + b_1)$, $z = W_2 h + b_2$, where $W_1$ and $W_2$ are weight matrices, $b_1$ and $b_2$ are biases, ReLU is the activation function, h is the hidden layer output, and z is the decision score;
Step 2.11: Map the decision score z to the merging probability $Y_j$ in the output layer: $Y_j = \mathrm{Sigmoid}(z)$, where Sigmoid is the activation function and $Y_j \in [0, 1]$;
Step 2.12: If $Y_j \ge \theta$, proceed to the next step; otherwise, go to step 2.14;
Step 2.13: Add the position index j to the set of pending merge marks $M_k$, update the loop variable j, and return to step 2.5 to continue scanning the next sub-word pair;
Step 2.14: Retain the pair as separate sub-words, update the loop variable j = j + 1, and return to step 2.5;
Step 2.15: Perform the merge operation by checking the merge mark set $M_k$. If $M_k$ is empty, there is nothing to merge in the current sequence, so proceed to step 2.16. If $M_k$ is not empty, sequentially combine each pair $(t_j, t_{j+1})$ in $S_k$ at a marked position j into a new word $t_{new} = \mathrm{concat}(t_j, t_{j+1})$, replace the original sub-words with $t_{new}$ to generate the updated sequence, assign the updated sequence to $S_k$, reset j = 1, clear $M_k$, and return to step 2.5 for the next iteration, repeating until $M_k$ is empty;
Step 2.16: Let k = k + 1, then return to step 2.3 to process the next sequence;
Step 2.17: End all iterative merging processes and output the optimized sequence set $S' = S_{dynamic}$, $S' = \{S'_1, S'_2, \ldots, S'_k, \ldots, S'_m\}$;
Step 2.18: Output S' as the final optimized sequence set to the next step.
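The control flow of steps 2.4 to 2.15 (scan adjacent pairs, mark those whose score reaches θ, merge, and rescan until a pass marks nothing) can be sketched as follows. The scoring function is passed in as a parameter, and `toy_score` with its `RELATED` table is a hypothetical stand-in for the CNN decision model. One detail is an assumption rather than something the text confirms: after marking a pair the sketch advances j by two, so that marked pairs never overlap within a pass.

```python
def merge_sequence(tokens, score, theta):
    """Repeat scan-and-merge passes until a pass marks no pair (M_k stays empty)."""
    tokens = list(tokens)
    while True:
        marks = []   # M_k: left index of each pair to merge
        j = 0        # 0-based here; the text uses 1-based positions
        while j < len(tokens) - 1:
            if score(tokens, j) >= theta:
                marks.append(j)
                j += 2   # assumption: skip the partner so marked pairs never overlap
            else:
                j += 1
        if not marks:
            return tokens
        marked = set(marks)
        merged, i = [], 0
        while i < len(tokens):
            if i in marked:
                merged.append(tokens[i] + tokens[i + 1])   # t_new from (t_j, t_j+1)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged


# Hypothetical relevance table standing in for the decision model's output.
RELATED = {("hyper", "net"): 0.9, ("hypernet", "work"): 0.8}


def toy_score(tokens, j):
    return RELATED.get((tokens[j], tokens[j + 1]), 0.1)


result = merge_sequence(["a", "hyper", "net", "work", "model"], toy_score, theta=0.5)
print(result)  # ['a', 'hypernetwork', 'model']
```

The example needs two merging passes: the first produces "hypernet", and only then does the pair ("hypernet", "work") exist to be scored, which is why the procedure rescans from the start after every merge. Each merging pass shortens the sequence by at least one item, so the loop always terminates.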

[0019] Step 3: Construct a hypernetwork prediction model based on a bidirectional long short-term memory network. This model splits each word in the new word list into a character sequence and, through bidirectional encoding, generates a high-quality set of embedding vectors that are semantically aligned with the vector space of the original large language model;
Preferably, step 3 specifically includes:
Step 3.1: Obtain the sequence set S', and establish and initialize the embedding mapping table $E_{map}$, which is used to store the generated dynamic word embeddings; initialize the index k = 1;
Step 3.2: If $k \le m$, proceed to the next step; otherwise, go to step 3.17;
Step 3.3: Obtain the k-th sequence $S'_k$ of S', and initialize the index j = 1;
Step 3.4: If $j \le \mathrm{len}(S'_k)$, proceed to the next step; otherwise, skip to step 3.16;
Step 3.5: Check the j-th word $t_j$ of $S'_k$. If $t_j$ is a merged new word, proceed to the next step; otherwise, skip to step 3.15;
Step 3.6: Query the embedding mapping table $E_{map}$. If the table already contains an embedding vector for the current word $t_j$, take that vector directly; if it does not, execute the embedding prediction process in the next step;
Step 3.7: Split the new word $t_j$ into a character sequence at character granularity, $\{c_1, c_2, \ldots, c_L\}$, where L is the number of characters;
Step 3.8: Map each character $c_i$ to an embedding vector $e_i$;
Step 3.9: Input the character embedding sequence into the hypernetwork based on a bidirectional long short-term memory network;
Step 3.10: Calculate the forward LSTM output at the current time: $\overrightarrow{h}_i = \overrightarrow{\mathrm{LSTM}}(e_i, \overrightarrow{h}_{i-1})$, where $e_i$ is the input vector at time i and $\overrightarrow{h}_{i-1}$ is the forward hidden state at time i-1;
Step 3.11: Calculate the backward LSTM output at the current time: $\overleftarrow{h}_i = \overleftarrow{\mathrm{LSTM}}(e_i, \overleftarrow{h}_{i+1})$, where $\overleftarrow{h}_{i+1}$ is the backward hidden state at time i+1;
Step 3.12: Concatenate the outputs of the final step of each direction to generate $h_{final} = [\overrightarrow{h}_L ; \overleftarrow{h}_1]$, where [;] indicates concatenation;
Step 3.13: Pass $h_{final}$ through a projection layer that maps it to the dimension of the original language model's embedding vectors, generating the new embedding vector $e_{new\_j} = W h_{final} + b$, where W is the weight matrix and b is the bias;
Step 3.14: Store the key-value pair $(t_j, e_{new\_j})$ in the embedding mapping table $E_{map}$;
Step 3.15: Let j = j + 1 and return to step 3.5 to process the next word;
Step 3.16: Let k = k + 1 and return to step 3.2 to process the next sequence;
Step 3.17: End the hypernetwork embedding prediction and output the mapping table $E_{map}$, which now holds embedding vectors for all new words, for integration in the next step.
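The outer loops of step 3 amount to building a cache keyed by new word, so that each merged word is passed to the hypernetwork at most once (step 3.6). A minimal sketch follows. It assumes a word counts as "new" when it is absent from the static vocabulary, and `fake_predict` is a hypothetical placeholder for the BiLSTM prediction that records its calls for illustration.

```python
def build_embedding_map(sequences, static_vocab, predict):
    """Return E_map: one predicted embedding per distinct new word."""
    E_map = {}
    for seq in sequences:                  # loop over k (steps 3.2, 3.16)
        for token in seq:                  # loop over j (steps 3.4, 3.15)
            if token in static_vocab:      # step 3.5: not a merged new word
                continue
            if token not in E_map:         # step 3.6: reuse an existing vector
                E_map[token] = predict(token)
    return E_map


calls = []


def fake_predict(token):
    """Hypothetical predictor; a real one would run the character-level BiLSTM."""
    calls.append(token)
    return [float(len(token)), 0.0]


S_prime = [["a", "hypernetwork", "model"], ["hypernetwork", "tokenizer"]]
static_vocab = {"a", "model", "tokenizer"}
E_map = build_embedding_map(S_prime, static_vocab, fake_predict)
print(E_map)   # {'hypernetwork': [12.0, 0.0]}
print(calls)   # ['hypernetwork']
```

Although "hypernetwork" appears in both sequences, the predictor is called once, which is the saving the mapping table is meant to provide.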

[0020] Step 4: Concatenate the embedding vectors of the original words in sequence S' with the embedding vectors of the new words, reconstruct the positional encoding, and form the final enhanced embedding representation, which serves as the input to the downstream large language model.

[0021] Preferably, step 4 specifically includes:
Step 4.1: Receive the embedding mapping table $E_{map}$ from step 3.17 and the optimized sequence set S' from step 2.18; initialize the sequence set $S_{final}$, and initialize the sequence index k = 1;
Step 4.2: If $k \le m$, proceed to the next step; otherwise, go to step 4.11;
Step 4.3: Obtain the k-th sequence $S'_k$ in S', initialize the corresponding embedding matrix $E_k$, and initialize the index j = 1;
Step 4.4: If $j \le n$, proceed to the next step; otherwise, go to step 4.9;
Step 4.5: Look up the word $t_j$. If $t_j$ exists in the mapping table $E_{map}$, meaning it is a newly generated word, retrieve the corresponding embedding vector $e_{new\_j}$; if it does not exist, it is an original static word, and its embedding vector $e_j$ is retrieved from the model's pre-trained embedding table. The generated or retrieved vector is denoted $v_j$;
Step 4.6: Based on the new length n of the current sequence $S'_k$, regenerate the corresponding position code $p_j$;
Step 4.7: Calculate the final input by combining $v_j$ with $p_j$, and add it to matrix $E_k$;
Step 4.8: Let j = j + 1 and return to step 4.4;
Step 4.9: Add the sequence embedding matrix $E_k$ and $S'_k$ to the enhanced sequence set $S_{final}$;
Step 4.10: Let k = k + 1 and return to step 4.2;
Step 4.11: End the embedding integration process and output the final enhanced sequence set $S_{final}$ with its complete embedding information;
Step 4.12: Feed $S_{final}$, as the optimized input, directly into the downstream large language model for subsequent tasks.
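Steps 4.5 to 4.7 can be sketched as a lookup followed by a positional term. Two choices below are assumptions that the text does not fix: the position code is the sinusoidal encoding commonly used with Transformer models, and it is combined with $v_j$ by addition. Positions are 0-based in the code, whereas the text counts from 1.

```python
import math


def positional_encoding(pos, dim):
    """Sinusoidal position code (an assumption; the text does not name a scheme)."""
    return [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / dim))
            for i in range(dim)]


def integrate(sequence, E_map, static_table):
    """Build E_k for one merged sequence S'_k."""
    E_k = []
    for j, token in enumerate(sequence):
        # Step 4.5: new words come from E_map, original sub-words from the static table.
        v_j = E_map[token] if token in E_map else static_table[token]
        # Step 4.6: positions are regenerated over the shortened sequence.
        p_j = positional_encoding(j, len(v_j))
        # Step 4.7: combine v_j and p_j (additive combination is an assumption).
        E_k.append([v + p for v, p in zip(v_j, p_j)])
    return E_k


static_table = {"a": [1.0, 2.0], "model": [0.5, 0.5]}   # toy 2-dimensional embeddings
E_map = {"hypernetwork": [0.0, 0.0]}
E_k = integrate(["a", "hypernetwork", "model"], E_map, static_table)
print(len(E_k), E_k[0])   # 3 [1.0, 3.0]
```

Because positions are recomputed after merging, the three-item sequence receives positions 0 to 2 rather than the positions its sub-words held in the longer static sequence. Models that inject position information elsewhere, for example inside the attention layers, would need a different integration point.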

[0022] An LLM dynamic word segmentation optimization device that integrates a decision model and a hypernetwork includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is loaded onto the processor, it implements the aforementioned LLM dynamic word segmentation optimization method that integrates a decision model and a hypernetwork.

[0023] Table 1: Variable Description Table

This invention constructs a complete LLM dynamic word segmentation optimization framework by integrating a merging decision model and a hypernetwork prediction model, effectively solving the limitations of traditional static word segmentation in terms of semantic coherence, out-of-vocabulary (OOV) processing, and embedding vector alignment. Specifically, the merging decision model dynamically merges sub-words based on local contextual semantic relevance, preserving the semantic integrity of the text and improving the inference efficiency of downstream models by reducing the number of sub-words. The hypernetwork prediction model, for dynamically generated new words, generates embedding vectors aligned with the original model's vector space through character-level bidirectional encoding, ensuring semantic consistency of new word embeddings and avoiding the embedding vector mismatch problem in traditional dynamic word segmentation. The synergistic effect of the two models enables the LLM to maintain semantic understanding accuracy while improving processing speed when processing complex text, making it particularly suitable for scenarios with high word segmentation accuracy requirements, such as long texts and professional domain texts.

[0024] In practical applications, this method can be seamlessly integrated into various LLM architectures (such as Transformer-based models). By replacing the original static word segmentation module, it does not require large-scale modifications to the main structure of the model, exhibiting strong compatibility and scalability. For example, in natural language understanding tasks, the optimized word segmentation sequence can more accurately capture long-distance semantic dependencies; in machine translation tasks, dynamically merged sub-words can reduce ambiguity during translation and improve translation quality. Furthermore, the device described in this invention is implemented using a general-purpose memory and processor architecture, and can be deployed on cloud servers or local computing devices to meet the computing power requirements of different scenarios.

[0025] In summary, this invention provides an efficient and flexible solution for LLM word segmentation optimization through innovative technical means. Its technical solution not only has theoretical innovation but also good engineering implementation value, and can provide strong support for improving the performance of models in the field of natural language processing.

[0026] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0027] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork, characterized in that it includes the following steps:
Step 1: After cleaning the input text D, use a static word segmenter to perform standardized segmentation, generate static subword sequences, and construct an initial sequence set S_static, which provides the basic processing units for dynamic optimization;
Step 2: Construct a lightweight merging decision model, dynamically calculate the semantic relevance between adjacent subwords based on the subword embeddings within the context window, and iteratively merge highly relevant subwords until the optimal sequence S' for the current text is obtained;
Step 3: Construct a hypernetwork prediction model based on a bidirectional long short-term memory network, which splits each word in the new word list into a character sequence and, through bidirectional encoding, generates a high-quality set of embedding vectors that are semantically aligned with the vector space of the original large language model;
Step 4: Concatenate the embedding vectors of the original subwords in the sequence S' with the embedding vectors of the new words, reconstruct the position encoding, form the final enhanced embedding representation, and use it as the input to the downstream large language model.

2. The LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork as described in claim 1, characterized in that the specific steps of Step 1 include:
Step 1.1: Define T as a single input text to be processed, comprising a unique identifier of the text and the text content obtained from the original data;
Step 1.2: Define D as the text dataset to be processed, D = {T_1, T_2, …, T_a, …, T_L(D)}, where T_a is the a-th text to be processed in the dataset, L(D) is the dataset length, and the variable a ∈ [1, L(D)];
Step 1.3: Perform data cleaning on D, including removing null values, de-duplicating sentences, and filtering meaningless characters, to obtain the cleaned standard dataset D_clean = {T_1, T_2, …, T_a1, …, T_L(D_clean)}, where T_a1 is the a1-th cleaned text in the dataset, L(D_clean) is the dataset length, and the variable a1 ∈ [1, L(D_clean)];
Step 1.4: Load the static tokenizer and apply static word segmentation to each text in D_clean to generate the initial set of subword sequences S_static = {S_1, S_2, …, S_a2, …, S_L(S_static)}, where S_a2 is the word segmentation sequence of the a2-th text, L(S_static) = L(D_clean), and the variable a2 ∈ [1, L(S_static)].
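Steps 1.3 and 1.4 can be illustrated with a minimal Python sketch, given under stated assumptions. The claim does not define "meaningless characters" or name a particular static tokenizer, so the sketch treats control characters as meaningless and uses a toy greedy longest-match segmenter as a stand-in. The names `clean_dataset` and `greedy_segment` are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set


def clean_dataset(texts: Iterable[Optional[str]]) -> List[str]:
    """Step 1.3: drop null values, filter characters, de-duplicate (order kept)."""
    seen: Set[str] = set()
    cleaned: List[str] = []
    for text in texts:
        if text is None:
            continue
        # Assumption: "meaningless characters" are ASCII control characters.
        text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
        text = re.sub(r"\s+", " ", text).strip()
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def greedy_segment(text: str, vocab: Set[str]) -> List[str]:
    """Step 1.4 stand-in: toy longest-match segmenter, not a real LLM tokenizer."""
    max_len = max((len(v) for v in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def build_static_sequences(texts: Iterable[Optional[str]], vocab: Set[str]) -> List[List[str]]:
    """Returns S_static with one sequence per cleaned text, so L(S_static) = L(D_clean)."""
    return [greedy_segment(t, vocab) for t in clean_dataset(texts)]
```

In practice the stand-in segmenter would be replaced by the tokenizer shipped with the target large language model; only the cleaning order and the one-to-one relation between D_clean and S_static are taken from the claim.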

3. The LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork as described in claim 1, characterized in that the specific steps of Step 2 include:
Step 2.1: Initialize the dynamic word segmentation result set S_dynamic = S_static, where S_dynamic = {S_1, S_2, …, S_k, …, S_L(S_dynamic)};
Step 2.2: Define the merging decision threshold θ, where θ ∈ (0, 1), and initialize the batch sequence index k = 1;
Step 2.3: If k ≤ L(S_dynamic), proceed to the next step; otherwise, skip to step 2.17;
Step 2.4: Obtain the k-th sequence S_k = {t_1, t_2, …, t_j, …, t_n} of S_dynamic, where t_j is the j-th word in the sequence and n is the current number of words; initialize the position index j = 1 and initialize the merge mark set M_k as empty;
Step 2.5: If j < n, proceed to the next step; otherwise, skip to step 2.15;
Step 2.6: Extract from sequence S_k the word at position j and its next adjacent word as the pair (t_j, t_j+1), and construct a local context window centered on this pair. The vectors of all subwords within the window are retrieved from the pre-trained static embedding table and stacked in order to form the local context window matrix F_j, whose dimensions are the window size by the word embedding dimension;
Step 2.7: Input the feature matrix F_j into the CNN decision model;
Step 2.8: Perform the convolution calculation in the convolutional layer to obtain the convolutional feature map: Z_c = ReLU(K_c * F_j + b_c), where K_c is the weight matrix of the c-th convolution kernel, b_c is its bias, * denotes the convolution operation, ReLU is the activation function, and Z_c is the output feature map;
Step 2.9: Perform global max pooling in the pooling layer to capture salient features: P = [p_1, p_2, …, p_C], where p_c is the maximum value of the feature map Z_c and P is the feature vector formed by concatenating the pooling results;
Step 2.10: Perform feature integration and decision-making in the fully connected layers, outputting the decision score between the current word and its next adjacent word: the hidden layer output h is obtained by applying a weight matrix and a bias to P followed by the ReLU activation function, and the decision score z is obtained by applying a second weight matrix and bias to h;
Step 2.11: Map the decision score z to the merging probability Y_j in the output layer: Y_j = Sigmoid(z), where Sigmoid is the activation function and Y_j ∈ [0, 1];
Step 2.12: If Y_j ≥ θ, proceed to the next step; otherwise, proceed to step 2.14;
Step 2.13: Add the position index j to the merge mark set M_k, update the loop variable j, and return to step 2.5 to continue scanning the next subword pair;
Step 2.14: Retain the pair as separate atomic words, update the loop variable j = j + 1, and return to step 2.5;
Step 2.15: Perform the merge operation by checking the merge mark set M_k. If M_k is empty, the current sequence has nothing to merge, so proceed to step 2.16. If M_k is not empty, merge, in order, each word pair (t_j, t_j+1) of S_k at a marked position j into a new word t_new; replace the original subwords with t_new to generate the updated sequence and take it as the new S_k; reset j = 1, clear M_k, and return to step 2.5 for the next iteration, until M_k is empty;
Step 2.16: Set k = k + 1 and return to step 2.3 to process the next sequence;
Step 2.17: End all iterative merging processes and output the optimized sequence set S';
Step 2.18: Output S' as the final optimized sequence set to the next step.
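Steps 2.4 to 2.15 can be illustrated with a minimal Python sketch, offered as one possible reading rather than the claimed implementation. Several details are assumptions: the convolution kernel slides along the window axis only (text-CNN style), the fully connected parameters are named `W1`, `b1`, `w2`, `b2` for illustration, and the unspecified loop update of step 2.13 is taken to be j = j + 2 so that marked pairs never overlap. Indices are 0-based in the code.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def merge_probability(F: Matrix, kernels: List[Matrix], kernel_bias: List[float],
                      W1: Matrix, b1: List[float], w2: List[float], b2: float) -> float:
    """Steps 2.8 to 2.11 for one window matrix F_j (rows: subwords, columns: dims)."""
    pooled = []
    for K, b in zip(kernels, kernel_bias):
        height = len(K)
        # Step 2.8: Z_c = ReLU(K_c * F_j + b_c), kernel slid along the window axis.
        z_c = [
            max(0.0, b + sum(K[r][c] * F[i + r][c]
                             for r in range(height) for c in range(len(F[0]))))
            for i in range(len(F) - height + 1)
        ]
        pooled.append(max(z_c))  # step 2.9: global max pooling gives p_c
    # Step 2.10: hidden layer with ReLU, then a scalar decision score z.
    hidden = [max(0.0, bias + sum(w * p for w, p in zip(row, pooled)))
              for row, bias in zip(W1, b1)]
    z = b2 + sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-z))  # step 2.11: Y_j = Sigmoid(z)


def merge_sequence(tokens: Sequence[str],
                   score: Callable[[List[str], int], float],
                   theta: float) -> List[str]:
    """Steps 2.4 to 2.15: rescan and merge until no pair reaches the threshold."""
    current = list(tokens)
    while True:
        marks, j = set(), 0
        while j < len(current) - 1:          # step 2.5
            if score(current, j) >= theta:   # step 2.12
                marks.add(j)                 # step 2.13
                j += 2                       # assumed update: skip the merged partner
            else:
                j += 1                       # step 2.14
        if not marks:                        # step 2.15: M_k empty, sequence is final
            return current
        merged, j = [], 0
        while j < len(current):
            if j in marks:
                merged.append(current[j] + current[j + 1])  # t_new
                j += 2
            else:
                merged.append(current[j])
                j += 1
        current = merged
```

The scorer is passed in as a callable so the loop can be exercised without trained weights; in a full pipeline it would build F_j from the static embedding table and call `merge_probability`.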

4. The LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork as described in claim 1, characterized in that the specific steps of Step 3 include:
Step 3.1: Obtain the sequence set S', and establish and initialize the embedding mapping table E_map, which stores the generated dynamic word embeddings; initialize the index k = 1;
Step 3.2: If k ≤ m, proceed to the next step; otherwise, skip to step 3.17;
Step 3.3: Obtain the k-th sequence S'_k of S' and initialize the index j = 1;
Step 3.4: If j ≤ len(S'_k), proceed to the next step; otherwise, skip to step 3.16;
Step 3.5: Check the j-th word t_j of S'_k. If t_j is a merged new word, proceed to the next step; otherwise, skip to step 3.15;
Step 3.6: Query the embedding mapping table E_map. If the current word t_j already exists in the table, its embedding vector is taken directly; if it does not exist, the embedding prediction process is executed in the next step;
Step 3.7: Split the new word t_j at character granularity into a character sequence {c_1, c_2, …, c_L}, where L is the number of characters;
Step 3.8: Map each character c_i to an embedding vector e_i;
Step 3.9: Input the character embedding sequence into the hypernetwork based on a bidirectional long short-term memory network;
Step 3.10: Calculate the forward LSTM output at the current time i from the input vector at time i and the forward hidden state at time i-1;
Step 3.11: Calculate the backward LSTM output at the current time i from the input vector at time i and the backward hidden state at time i+1;
Step 3.12: Concatenate the final-step outputs of the two directions to generate h_final, where [;] denotes concatenation;
Step 3.13: Pass h_final through a projection layer, with weight matrix W and bias b, that maps it to the dimension of the original language model's embedding vectors, generating the new embedding vector e_new_j;
Step 3.14: Store the key-value pair (t_j, e_new_j) in the embedding mapping table E_map;
Step 3.15: Set j = j + 1 and return to step 3.4 to process the next word;
Step 3.16: Set k = k + 1 and return to step 3.2 to process the next sequence;
Step 3.17: End the hypernetwork embedding prediction and output the mapping table E_map, which holds the generated embedding vectors for all new words, for integration in the next step.
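Steps 3.5 to 3.14 can be illustrated with a minimal pure-Python sketch of a character-level bidirectional LSTM followed by a linear projection. It is a sketch under stated assumptions, not the claimed network: the gate layout, the random initialization, the linear form of the projection, and the rule that a "new word" is any token absent from the static vocabulary are all assumptions, and the names `LSTMCell`, `predict_embedding`, and `build_embedding_map` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single-layer LSTM with stacked gate weights (input, forget, output, cell)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        self.hid_dim = hid_dim
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                  for _ in range(4 * hid_dim)]
        self.b = [0.0] * (4 * hid_dim)

    def run(self, inputs: Sequence[Vector]) -> Vector:
        n = self.hid_dim
        h, c = [0.0] * n, [0.0] * n
        for x in inputs:
            gates = [g + b for g, b in zip(matvec(self.W, list(x) + h), self.b)]
            i_g = [sigmoid(g) for g in gates[:n]]
            f_g = [sigmoid(g) for g in gates[n:2 * n]]
            o_g = [sigmoid(g) for g in gates[2 * n:3 * n]]
            c_hat = [math.tanh(g) for g in gates[3 * n:]]
            c = [f * cp + i * ch for f, cp, i, ch in zip(f_g, c, i_g, c_hat)]
            h = [o * math.tanh(cv) for o, cv in zip(o_g, c)]
        return h  # hidden state after the last processed character


def predict_embedding(word: str, char_table: Dict[str, Vector],
                      fwd: LSTMCell, bwd: LSTMCell,
                      W_proj: Matrix, b_proj: Vector) -> Vector:
    chars = [char_table[ch] for ch in word]  # steps 3.7 and 3.8
    h_fwd = fwd.run(chars)                   # step 3.10: left to right
    h_bwd = bwd.run(chars[::-1])             # step 3.11: right to left
    h_final = h_fwd + h_bwd                  # step 3.12: [h_fwd ; h_bwd]
    # Step 3.13 (assumption): a plain linear projection W * h_final + b.
    return [p + b for p, b in zip(matvec(W_proj, h_final), b_proj)]


def build_embedding_map(sequences: Sequence[Sequence[str]], static_vocab: Set[str],
                        predict: Callable[[str], Vector]) -> Dict[str, Vector]:
    e_map: Dict[str, Vector] = {}
    for seq in sequences:
        for token in seq:
            # Steps 3.5 and 3.6: skip static subwords and already cached new words.
            if token in static_vocab or token in e_map:
                continue
            e_map[token] = predict(token)    # step 3.14
    return e_map
```

Caching in `e_map` means each distinct new word is encoded once, however often it recurs across the sequence set.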

5. The LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork as described in claim 1, characterized in that the specific steps of Step 4 include:
Step 4.1: Receive the embedding mapping table E_map from step 3.17 and the optimized sequence set S' from step 2.18; initialize the enhanced sequence set S_final; initialize the sequence index k = 1;
Step 4.2: If k ≤ m, proceed to the next step; otherwise, skip to step 4.11;
Step 4.3: Obtain the k-th sequence S'_k in S'; initialize the corresponding embedding matrix E_k; initialize the index j = 1;
Step 4.4: If j ≤ n, proceed to the next step; otherwise, skip to step 4.9;
Step 4.5: Look up the word t_j. If t_j exists in the mapping table E_map, it is a newly generated word, and the corresponding embedding vector e_new_j is retrieved from E_map; if it does not exist, it is an original static subword, and the embedding vector e_j is retrieved from the model's pre-trained embedding table. The retrieved vector is denoted v_j;
Step 4.6: Based on the new length n of the current sequence S'_k, regenerate the corresponding position encoding p_j;
Step 4.7: Calculate the final input for position j and add it to matrix E_k;
Step 4.8: Set j = j + 1 and return to step 4.4;
Step 4.9: Add the sequence embedding matrix E_k and S'_k to the enhanced sequence set S_final;
Step 4.10: Set k = k + 1 and return to step 4.2;
Step 4.11: End the embedding integration process and output the final enhanced sequence set S_final together with its complete embedding information;
Step 4.12: Feed S_final, as the optimized input, directly into the downstream large language model for subsequent tasks.

6. An LLM dynamic word segmentation optimization device integrating a decision model and a hypernetwork, characterized in that it includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above-mentioned LLM dynamic word segmentation optimization method integrating a decision model and a hypernetwork.