Semantic reasoning method and device, electronic equipment, and storage medium
By dividing text information into sentences and performing reasoning within the sentence-level latent space, and utilizing dynamic semantic autoencoders and gating fusion mechanisms, the problem of low computational efficiency of large language models on edge devices is solved, achieving an efficient reasoning process.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-19
AI Technical Summary
The reasoning process of large language models is computationally inefficient on edge devices, resulting in resource constraints and making them unsuitable for effective application in real-world business.
Text information is divided into sentences, encoded, and semantically extracted. Reasoning is performed in the sentence-level latent space using a large language model. Sentence-level reasoning is achieved through dynamic semantic autoencoders and gating fusion mechanisms, thus shortening the input and reasoning sequences.
It improves the inference efficiency of large language models, reduces the computational resource requirements, and is suitable for resource-constrained edge devices.
Abstract
Description
Technical Field
[0001] This application belongs to the field of artificial intelligence technology, and more specifically, relates to a semantic reasoning method and apparatus, electronic device, and storage medium.
Background Technology
[0002] A large language model-based reasoning system is a computer processing system that utilizes deep learning architecture and natural language processing methods to understand and generate text sequences. With the maturity of large language model technology, its application scenarios are expanding. Applying large model reasoning systems to real-world business applications allows for the capture, extraction, association, and prediction of linguistic features such as intent and entities by analyzing user input prompt word sequences. Furthermore, it enables the analysis and judgment of logical patterns in the extracted, associated, and predicted semantic information, thereby completing the generation and interactive response of various complex text contents. Simultaneously, it provides various decision support and intelligent scheduling related to business needs, achieving intelligent human-computer collaboration.
[0003] However, large language models perform inference by generating text token by token. This inference pattern, in which only a single token is generated at each step, consumes a significant amount of memory and inference time. The high computational cost hinders efficiency and scalability in practical applications, especially on edge devices, whose resource constraints are often incompatible with such inefficiency. Therefore, it is necessary to further improve the computational efficiency of large language model inference.
Summary of the Invention
[0004] The purpose of this application is to provide a semantic reasoning method, apparatus, electronic device, and storage medium to improve the computational efficiency of reasoning for large language models.
[0005] A first aspect of this application provides a semantic reasoning method, including:
obtaining target text information, dividing the target text information into segments to obtain each sentence in the target text information, and encoding each sentence to obtain multiple lexical features of each sentence;
performing semantic extraction based on the multiple lexical features of each sentence to obtain the first semantic feature of each sentence;
performing reasoning, based on the large language model, on the first semantic features corresponding to the multiple sentences to obtain the second semantic feature corresponding to each sentence;
decoding the second semantic feature corresponding to each sentence to obtain the semantic reasoning result for each sentence; and
using the semantic reasoning results corresponding to the multiple sentences as the semantic reasoning result for the target text information.
[0006] A second aspect of this application provides a semantic reasoning apparatus, comprising:
an encoding module, used to acquire target text information, divide the target text information into segments to obtain each sentence in the target text information, and encode each sentence to obtain multiple lexical features for each sentence;
a semantic extraction module, used to perform semantic extraction based on the multiple lexical features of each sentence to obtain the first semantic feature of each sentence;
a semantic reasoning module, used to reason about the first semantic features of the multiple sentences based on the large language model to obtain the second semantic feature of each sentence;
a decoding module, used to decode the second semantic feature corresponding to each sentence to obtain the semantic reasoning result corresponding to each sentence; and
a result output module, used to take the semantic reasoning results corresponding to the multiple sentences as the semantic reasoning result for the target text information.
[0007] A third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to implement the steps of the semantic reasoning method described above.
[0008] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the semantic reasoning method described above.
[0009] The beneficial effects of the semantic reasoning method, apparatus, electronic device, and storage medium provided in this application are as follows: this embodiment divides the target text information into sentences, encodes each sentence into multiple lexical features, compresses these lexical features into a single latent representation per sentence (i.e., the first semantic feature), and inputs the latent representations into a large language model to achieve sentence-level reasoning within the latent space. The method in this embodiment leverages the dense nature of latent representations to shorten the input and reasoning sequences, thereby improving the reasoning efficiency of the large language model.
Attached Figure Description
[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a flowchart illustrating a semantic reasoning method provided in an embodiment of this application; Figure 2 is a schematic diagram illustrating the principle of a semantic reasoning method provided in an embodiment of this application; Figure 3 is a structural block diagram of a semantic reasoning device provided in an embodiment of this application; Figure 4 is a schematic block diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0012] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.
[0013] To make the objectives, technical solutions, and advantages of this application clearer, the following description will be provided in conjunction with the accompanying drawings and specific embodiments.
[0014] In related technologies, given a base large language model and an input text consisting of n sentences, the text is tokenized and embedded to obtain m token-level embedding vectors. The large language model processes these embedding vectors and predicts the next token, generating token by token an output of q sentences composed of p tokens in total.
[0015] To address the issue of excessively long input and output lengths in large language models, this embodiment proposes enabling the large language model to perform inference within a sentence-level latent space, thereby reducing the input length from m to n and the output length from p to q. To achieve this goal, the following two functions need to be implemented: i) Extract sentence semantics and efficiently compress them into the latent space; ii) Enable large language models to understand sentence-level latent representations and perform sentence-by-sentence reasoning.
[0016] Please refer to Figure 1, which is a flowchart illustrating a semantic reasoning method provided in an embodiment of this application. The method can be executed by an electronic device and may include:
S101: Obtain target text information, divide the target text information into segments to obtain each sentence in the target text information, and encode each sentence to obtain multiple lexical features for each sentence.
[0017] In this embodiment, the target text information can be input text containing N (N≥1) sentences. By using a pattern matching method based on punctuation marks (such as commas, ".", "?", and "!"), the target text information can be segmented to obtain each sentence in the target text information.
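The punctuation-based segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_sentences` is hypothetical, and the sketch assumes that only the sentence-final marks ".", "?", and "!" end a sentence (adding "," to the character class would give the finer, comma-level segments the text also mentions).

```python
import re

# Assumption: a sentence ends at ".", "?" or "!" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at whitespace that follows ., ? or !."""
    parts = _SENTENCE_END.split(text.strip())
    # Drop empty fragments (e.g. when the input is empty).
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("It works. Does it? Yes!"):
        print(sentence)
```

Because the split point is the whitespace after the punctuation mark, each sentence keeps its terminating mark, which a downstream tokenizer may rely on.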
[0018] Furthermore, after segmenting and embedding each sentence, an embedding vector corresponding to each sentence is obtained. The embedding vectors of each sentence are length aligned and padded to make the embedding vectors of all sentences have the same length. The embedding vectors of multiple sentences after alignment and padding are combined to obtain a word embedding matrix, with the embedding vector of each sentence being a row of the word embedding matrix. The word embedding matrix is input into the encoder to encode the embedding vector of each sentence, thereby obtaining multiple word features corresponding to each sentence.
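The length alignment and padding step can be illustrated with a small sketch. It is a simplification under stated assumptions: sentences are represented as lists of token IDs rather than embedding vectors, and the padding ID `PAD_ID = 0` and the function name `pad_sentences` are hypothetical.

```python
PAD_ID = 0  # hypothetical padding token ID


def pad_sentences(token_ids: list[list[int]], pad_id: int = PAD_ID) -> list[list[int]]:
    """Right-pad every sentence to the length of the longest one.

    Each returned row corresponds to one sentence, so the result can be
    read as the rows of a (sentences x max_length) matrix.
    """
    width = max((len(sentence) for sentence in token_ids), default=0)
    return [sentence + [pad_id] * (width - len(sentence)) for sentence in token_ids]


if __name__ == "__main__":
    print(pad_sentences([[5, 6, 7], [8]]))
```

In a real encoder the padded positions would normally be masked so that they do not influence self-attention; that detail is not stated in the text and is omitted here.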
[0019] S102: Semantic extraction is performed based on multiple lexical features of each sentence to obtain the first semantic feature of each sentence.
[0020] Referring to Figure 2, the multiple lexical features of each sentence output by the encoder can be input into the semantic extraction layer to obtain the first semantic feature of each sentence. In Figure 2, the input target text information contains two sentences, and the semantic extraction layer outputs the first semantic feature corresponding to each of the two sentences.
[0021] S103: Based on the large language model, reason about the first semantic features corresponding to each of the multiple sentences to obtain the second semantic features corresponding to each sentence.
[0022] Referring to Figure 2, the first semantic features corresponding to the multiple sentences can be input into the large language model, which performs inference at the sentence level to obtain the second semantic feature for each sentence.
[0023] S104: Decode the second semantic feature corresponding to each sentence to obtain the semantic reasoning result corresponding to each sentence.
[0024] In this embodiment, the second semantic feature corresponding to each sentence is input into the decoder. The decoder generates a natural language response based on the second semantic feature corresponding to each sentence, which serves as the semantic reasoning result for each sentence.
[0025] Referring to Figure 2, the input target text information contains two sentences. Each sentence is tokenized and embedded, encoded by the encoder, passed through semantic extraction, reasoned over by the large language model, and decoded by the decoder, and the semantic reasoning result corresponding to each sentence is then output.
[0026] S105: Use the semantic reasoning results corresponding to multiple sentences as the semantic reasoning results for the target text information.
[0027] In this embodiment, the semantic reasoning results corresponding to multiple sentences are output as semantic reasoning results for the target text information.
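The data flow of steps S101 to S105 can be summarized in a short skeleton. This is only a structural sketch: the five callables (`split`, `encode`, `extract`, `reason`, `decode`) are hypothetical placeholders for the segmentation step, encoder, semantic extraction layer, large language model, and decoder, and the toy stand-ins in the usage example carry no semantic meaning.

```python
def semantic_reasoning(text, split, encode, extract, reason, decode):
    """Chain the five steps; every callable is a hypothetical placeholder."""
    sentences = split(text)                              # S101: segmentation
    lexical = [encode(sentence) for sentence in sentences]  # S101: lexical features per sentence
    first = [extract(features) for features in lexical]  # S102: one latent vector per sentence
    second = reason(first)                               # S103: sentence-level reasoning
    return [decode(latent) for latent in second]         # S104 and S105: per-sentence results


if __name__ == "__main__":
    # Toy stand-ins that only demonstrate the shape of the data flow.
    result = semantic_reasoning(
        "ab|cde",
        split=lambda t: t.split("|"),
        encode=list,
        extract=len,
        reason=lambda xs: [x * 2 for x in xs],
        decode=str,
    )
    print(result)
```

The point of the structure is that `reason` receives one item per sentence rather than one item per token, which is where the shortening of the input and reasoning sequences comes from.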
[0028] As can be seen from the above, this embodiment divides the target text information into sentences, encodes each sentence into multiple lexical features, compresses these lexical features into a single latent representation per sentence (i.e., the first semantic feature), and inputs the latent representations into a large language model to achieve sentence-level reasoning within the latent space. The method in this embodiment utilizes the dense characteristics of latent representations to shorten the input and reasoning sequences, thereby improving the reasoning efficiency of the large language model.
[0029] In one embodiment of this application, for each sentence, performing semantic extraction based on the multiple lexical features of the sentence to obtain the first semantic feature of the sentence includes: summing the multiple lexical features to obtain a static semantic feature; mapping the multiple lexical features to one dimension to obtain a weight sequence; using each weight in the weight sequence as the weight of the corresponding lexical feature and computing the weighted sum of the multiple lexical features to obtain a dynamic semantic feature; and fusing the static and dynamic semantic features to obtain the first semantic feature of the sentence.
[0030] In this embodiment, a semantic extraction layer is trained to extract semantic features from the multiple lexical features of each sentence, thereby obtaining the first semantic feature of the sentence. Referring to Figure 2, in this embodiment the encoder, decoder, and semantic extraction layer are trained together.
[0031] The encoder consists of stacked Transformer encoder layers, each containing a self-attention module. These multiple self-attention modules extract sequence context features. The encoder layers map the embedding vector of each sentence to hidden states, which serve as the multiple lexical features of each sentence. As shown in Figure 2, the multiple lexical features can be h = [h1, h2, h3, h4].
[0032] Based on this, semantic extraction is performed on the multiple lexical features of each sentence through the semantic extraction layer. For each sentence, the first semantic feature is obtained from its multiple lexical features as follows. First, the multiple lexical features of the sentence are summed, and the resulting latent representation is used as the static semantic feature. While the static semantic feature preserves global information, compressing hidden states that carry varying amounts of information into a single vector introduces redundancy. To address this issue, this embodiment employs a fully connected layer called a semantic detector (denoted as the first fully connected layer) to map the multiple lexical features onto a one-dimensional vector, obtaining a sequence of lexical weights, for example w = [w1, w2, w3, w4] in Figure 2. Since the hidden states obtained through the self-attention mechanism are context-dependent, w can dynamically identify the semantic information of each token. Therefore, by weighting the lexical features with their corresponding weights and summing them, the dynamic semantic feature can be extracted, which contains the key information of the sentence.
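The static and dynamic extraction steps can be sketched in plain Python. This is an illustrative sketch under assumptions, not the trained layer: the lexical features h are small lists of floats, the semantic detector is reduced to a single linear map with hypothetical parameters `detector_w` and `detector_b`, and no normalization of the weights is applied because the text does not specify one.

```python
def static_feature(h):
    """Element-wise sum of all lexical features of one sentence."""
    return [sum(column) for column in zip(*h)]


def lexical_weights(h, detector_w, detector_b=0.0):
    """Map each lexical feature to one scalar weight (a linear 'semantic detector')."""
    return [sum(x * w for x, w in zip(vec, detector_w)) + detector_b for vec in h]


def dynamic_feature(h, weights):
    """Weighted sum of the lexical features, one weight per token."""
    dim = len(h[0])
    return [sum(wt * vec[d] for wt, vec in zip(weights, h)) for d in range(dim)]


if __name__ == "__main__":
    h = [[1.0, 2.0], [3.0, 4.0]]           # two tokens, hidden size 2
    w = lexical_weights(h, [1.0, 0.0])     # hypothetical detector parameters
    print(static_feature(h), w, dynamic_feature(h, w))
```

The static feature treats every token equally, while the dynamic feature lets tokens with larger detector weights dominate, which matches the contrast drawn in the paragraph above.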
[0033] However, the signal provided by the dynamically extracted semantic feature is often too sparse, making it difficult for the subsequent decoder to predict a reconstructed output consistent with the input. Therefore, this embodiment fuses the static semantic feature and the dynamic semantic feature to obtain the sentence-level latent representation s, which integrates global information while reducing redundancy.
[0034] In one embodiment of this application, the static semantic features and dynamic semantic features are fused based on a gating unit, specifically including: normalizing the learnable gating scalar parameter to obtain the first weighting coefficient; determining the second weighting coefficient based on the first weighting coefficient, wherein the sum of the first weighting coefficient and the second weighting coefficient is 1; and using the first weighting coefficient as the weight of the static semantic feature and the second weighting coefficient as the weight of the dynamic semantic feature to perform weighted fusion of the static and dynamic semantic features.
[0035] In this embodiment, a learnable gating scalar parameter can be set and updated during the training phase. By mapping the gating scalar parameter to a value between 0 and 1 using the Sigmoid function, a simple yet effective gating mechanism is implemented to balance the contributions of the static and dynamic semantic features. Formally, the first weighting coefficient is the Sigmoid of the gating scalar parameter, the second weighting coefficient is 1 minus the first weighting coefficient, and the sentence-level latent representation s is the sum of the static semantic feature scaled by the first weighting coefficient and the dynamic semantic feature scaled by the second weighting coefficient.
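The gating fusion can be written out as a short sketch. The names `gated_fusion`, `gate_param`, `alpha`, and `beta` are hypothetical; only the structure (Sigmoid of a scalar, complementary second coefficient, weighted sum) is taken from the description.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(static, dynamic, gate_param):
    """Fuse static and dynamic features with one learnable scalar gate."""
    alpha = sigmoid(gate_param)   # first weighting coefficient (static)
    beta = 1.0 - alpha            # second weighting coefficient (dynamic)
    return [alpha * s + beta * d for s, d in zip(static, dynamic)]


if __name__ == "__main__":
    # With the gate at 0, both features contribute equally (alpha = 0.5).
    print(gated_fusion([2.0, 4.0], [0.0, 0.0], 0.0))
```

A gate value of 0 gives both coefficients the value 0.5, so training starts from an even mix when the gating scalar is initialized to 0, as in the hyperparameter settings reported in the experiments.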
[0036] In one embodiment of this application, each sentence is encoded based on an encoder, the multiple lexical features of each sentence are semantically extracted based on a semantic extraction layer, and the second semantic feature corresponding to each sentence is decoded based on a decoder. The semantic extraction layer includes a first fully connected layer and a gating unit; the first fully connected layer is used to map the multiple lexical features of each sentence to one dimension to obtain a weight sequence. The training process for the encoder, semantic extraction layer, decoder, and large language model includes: obtaining a sample dataset that includes multiple text training samples; jointly training, based on the sample dataset, the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer to obtain a staged encoder, staged decoder, and staged semantic extraction layer; using the staged encoder and staged semantic extraction layer as the input layer of the pre-trained large language model and the staged decoder as the output layer of the pre-trained large language model; and jointly training, based on the sample dataset, the staged encoder, staged decoder, staged semantic extraction layer, and pre-trained large language model to obtain the encoder, semantic extraction layer, decoder, and large language model.
[0037] Referring to Figure 2, the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer can first be jointly trained to obtain a staged encoder, staged decoder, and staged semantic extraction layer. The pre-trained encoder and pre-trained semantic extraction layer extract the latent representation s from a text training sample, and the pre-trained decoder is responsible for reconstructing the sentence from the latent representation s, thereby achieving self-supervised training. The decoder consists of stacked Transformer decoder layers, in which the latent representation s serves as the keys and values for cross-attention. The initial input to the decoder is a preset start token (e.g., <start>), and the decoder autoregressively generates, for each reconstructed word, the probability of each token in the preset dictionary. These probabilities, together with the input, are used to calculate the focus loss, which serves as the training objective for the first stage.
[0038] In the focus loss of the first stage, y denotes the number of tokens contained in the preset dictionary, and the predicted probability is the probability that the reconstructed word output by the decoder is the i-th token in the dictionary.
[0039] It should be noted that the preset dictionary can use the standard vocabulary that comes with publicly available pre-trained natural language processing models (such as BERT, GPT, Transformer, etc.). The standard vocabulary includes various tokens, such as text tokens (Chinese characters, words), punctuation marks, and control tokens such as start tokens, end tokens, and padding tokens, and is used to convert the input text into the corresponding sequence of token IDs.
[0040] Furthermore, to enable the large language model to perform sentence-by-sentence reasoning in the latent space, we integrate the staged encoder, staged decoder, and staged semantic extraction layer obtained from the first stage of training into the large language model for end-to-end training. Specifically, the embedding layer of the original large language model is replaced with the staged encoder and staged semantic extraction layer, while the output of the large language model is decoded into natural language by the staged decoder.
[0041] In one embodiment of this application, when jointly training the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer based on a sample dataset, the parameters of the first M layers of the large language model are used as the parameters of the encoder, and the parameters of the last M layers of the large language model are used as the parameters of the decoder; where M is the number of layers in the encoder and decoder.
[0042] In this embodiment, when jointly training the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer based on the sample dataset, the initial parameters of both the pre-trained encoder and pre-trained decoder are transferred from the backbone network of the large language model, and the pre-trained encoder and pre-trained decoder maintain the same number of layers. For example, for an encoder and decoder with M layers, the parameters of the first M layers of the large language model are used as the parameters of the encoder, and the parameters of the last M layers of the large language model are used as the parameters of the decoder.
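The layer selection for parameter transfer amounts to two slices over the layer stack of the large language model. The sketch below assumes the layers are available as an ordered Python sequence and uses a hypothetical function name `transfer_layers`; how the parameters are actually copied into the encoder and decoder modules is not shown.

```python
def transfer_layers(llm_layers, m):
    """Return (encoder_init, decoder_init) taken from an ordered layer list.

    The first m layers initialize the encoder and the last m layers
    initialize the decoder, so both end up with the same depth m.
    """
    if not 0 < m <= len(llm_layers):
        raise ValueError("m must be between 1 and the number of LLM layers")
    return list(llm_layers[:m]), list(llm_layers[-m:])


if __name__ == "__main__":
    encoder_init, decoder_init = transfer_layers(["l0", "l1", "l2", "l3"], 1)
    print(encoder_init, decoder_init)
```

When m is larger than half the depth of the model, the two slices overlap, which the description does not rule out.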
[0043] Compared to random initialization, in this embodiment, parameter transfer can endow the pre-trained encoder and pre-trained decoder with semantic understanding capabilities, so that training does not need to start from the construction of basic grammatical structures, which can effectively improve the training efficiency of semantic extraction capabilities and thus achieve better convergence performance.
[0044] In one embodiment of this application, performing reasoning on the first semantic features corresponding to the multiple sentences based on the large language model to obtain the second semantic feature corresponding to each sentence includes: performing inference across multiple time steps based on the large language model; at each time step, mapping the latent space features obtained by reasoning at that time step to a two-dimensional Boolean vector through the second fully connected layer; if the two-dimensional Boolean vector indicates a non-end state, performing reasoning at the next time step; and if it indicates the end state, using the latent space features obtained from reasoning at that time step as the second semantic feature corresponding to a sentence.
[0045] In this embodiment, to determine when the large language model should stop generating sentence-level latent representations, a fully connected layer called the termination head (denoted as the second fully connected layer) can be trained. At each time step, the latent space features inferred at that time step are mapped to a two-dimensional Boolean vector through the second fully connected layer. This is used to determine whether the sentence has ended. If the flag is an end flag (e.g., a value of 1), the current iteration of the generation process terminates, and the latent representation generated at the current time step is used as the second semantic feature corresponding to a complete sentence. If the flag is not an end flag (e.g., a value of 0), the large language model continues to generate the latent representation for the next time step until an end flag is generated.
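The stopping logic of the termination head can be sketched as a simple loop. This is a simplified illustration under assumptions: `step_fn` is a hypothetical stand-in for one reasoning step of the large language model, `stop_fn` stands in for the second fully connected layer and is reduced to returning a 0/1 flag instead of a two-dimensional vector, and `max_steps` is a hypothetical safety bound that the text does not mention.

```python
def generate_latents(step_fn, stop_fn, first_input, max_steps=16):
    """Run latent-space reasoning until the termination head signals the end."""
    latents = []
    current = first_input
    for _ in range(max_steps):
        current = step_fn(current)      # latent features for this time step
        latents.append(current)
        if stop_fn(current) == 1:       # end flag: stop generating
            break
    return latents


if __name__ == "__main__":
    # Toy stand-ins: the "latent" is an integer and generation stops at 3.
    print(generate_latents(lambda x: x + 1, lambda x: 1 if x >= 3 else 0, 0))
```

The last element of the returned list is the latent representation produced at the step where the end flag was raised.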
[0046] In one embodiment of this application, when jointly training the staged encoder, staged decoder, staged semantic extraction layer, and pre-trained large language model based on the sample dataset, an inference loss is used as the loss function. In this loss, j denotes the total number of time steps, the two-dimensional Boolean vector is the one obtained at time step t, k denotes the number of tokens contained in the ground truth of the inference result of the large language model, and the predicted probability is the probability that the reconstructed word output by the decoder is the i-th token in the preset dictionary.
[0047] The inference loss is still based on the focus loss. It comprises a stopping term, which characterizes the accuracy of sentence termination judgment, and a generation term, which characterizes the accuracy of the reasoning content.
[0048] The encoder, decoder, and semantic extraction layer in this embodiment are collectively referred to as Dynamic Semantic Autoencoder (DSAE) and the large language model integrating the Dynamic Semantic Autoencoder is abbreviated as DSEI. An autoencoder that uses a static fusion mechanism for sentence compression and reconstruction (SVAE) and a sentence-level large language model integrating SVAE (SLLM) are used as baseline methods. The methods of this embodiment (DSAE and DSEI) are compared with the baseline methods and token-by-token large language models (token-by-token LLMs) to explore dynamic semantic features and analyze the contribution of each component.
[0049] In SVAE, the sentence-level latent representation is obtained by accumulating the hidden states of the last layer of the encoder, while SLLM performs sentence-level inference based on this static latent representation. In addition, the methods are compared with the OPT series of token-by-token large language models.
[0050] (1) Experimental setup
Datasets and evaluation metrics: For both DSAE and DSEI, all experiments were conducted on the English subset of the Wanjuan-1.0 dataset. The DSAE training set contains approximately 155 million sentence samples, and the validation set consists of 1000 non-overlapping sentences. The DSEI training set contains approximately 6.4 million paragraph samples, and the validation set also consists of 1000 non-overlapping paragraphs. This embodiment uses perplexity (PPL) as the evaluation metric for both DSAE and DSEI. Furthermore, the computational efficiency of DSEI is evaluated using average input throughput and average GPU memory usage.
[0051] (2) Implementation details
(2.1) Base model: In this embodiment, three OPT series models (125M, 350M, and 1.3B) are used as the base large language models to verify the effectiveness of this method at different model scales. The hidden state dimension of DSAE is consistent with that of the base large language model. During the weight transfer process, the encoder only uses the weights of the self-attention module of the base large language model.
[0052] (2.2) Hyperparameters: In DSAE, the maximum input token length was set to 64, the batch size to 512, the learning rate to 1e-7, and the gating scalar was initialized to 0. When the base large language model was 125M, the number of layers in DSAE was set to 1, 2, and 4 in turn; for the other scales of base large language model, 1 layer was used. In DSEI, the maximum input sentence length was set to 16, the batch size to 4, and the learning rate to 1e-6. All experiments used the AdamW optimizer, with weight decay set to 1e-2 and gradients clipped to a maximum L2 norm of 1. The learning rate followed a linear schedule for the first 5000 iterations and cosine annealing for subsequent iterations. All experiments were performed on a single RTX 5880 Ada Generation GPU.
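The two-phase learning-rate schedule can be sketched as a function of the iteration index. Several details are assumptions that the text does not state: the linear phase is taken to be a warm-up rising to the base learning rate, the cosine phase is taken to decay to zero, and `total_steps` is a hypothetical training length.

```python
import math


def learning_rate(step, base_lr, warmup_steps=5000, total_steps=100_000):
    """Linear phase for the first warmup_steps iterations, then cosine annealing."""
    if step < warmup_steps:
        # Assumed linear warm-up from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the cosine phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 4999, 5000, 52_500, 100_000):
        print(step, learning_rate(step, 1e-6))
```

With the DSEI base learning rate of 1e-6, this sketch reaches the base rate at iteration 4999 and decays smoothly afterwards.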
[0053] Table 1 below compares the performance of DSAE and the baseline method SVAE at different model sizes, where PPL denotes perplexity (lower is better) and the reduction is the percentage decrease in PPL relative to SVAE. DSAE outperforms the semantic extraction method based on static representation fusion across various model depths (number of layers) and widths (hidden layer size). Notably, in the smallest model configuration, DSAE achieves a 13.04% reduction in PPL compared to the baseline method, the largest performance gain across all model sizes. This is particularly beneficial for resource-constrained edge devices, which typically employ smaller-scale models.
[0054] Table 1 - Performance of DSAE and baseline method SVAE at different model sizes
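For reference, the evaluation metric and the relative reduction can be computed as sketched below. The sketch uses the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood; the function names and the example numbers are hypothetical and are not taken from Table 1.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(baseline_ppl, model_ppl):
    """Percentage reduction in PPL of a model relative to a baseline."""
    return 100.0 * (baseline_ppl - model_ppl) / baseline_ppl


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(perplexity([0.0, 0.0]))
    print(relative_reduction(10.0, 8.0))
```

A positive reduction means the model has lower perplexity than the baseline, which corresponds to better reconstruction or prediction.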
[0055] Table 2 compares the computational efficiency of DSEI in this embodiment with existing models.
Table 2 - Comparison of computational efficiency between DSEI and existing models
[0056] Table 3 shows the sentence reconstruction performance of DSAE compared to the baseline method in multiple test cases. DSAE's sentence reconstruction accuracy significantly outperforms the baseline method. Lexical weight analysis of the input sentences shows that DSAE assigns higher weights to semantically rich content words (such as "various" in Sample 1 and "pulsed" in Sample 2), while assigning lower weights to function words lacking substantial meaning (such as "for," "a," and "of"). This indicates that DSAE can efficiently identify key sentence information based on context and compress it into a compact latent representation, showing a significant advantage, especially when processing long sentences.
[0057] The advantage also holds for samples containing rare words. For example, in Sample 2, SVAE reconstructed "515" as "660", while DSAE gave this token a higher weight during semantic extraction and successfully restored the original token. Sample 3 contained the proper noun "Schylling", to which DSAE gave the highest weight and which it generated accurately.
[0058] Table 3. Sentence reconstruction performance comparison between DSAE and baseline methods in multiple test cases.
[0059] To verify the effectiveness of each component in DSAE, we conducted an ablation study on a model with a hidden layer size of 768 and only one layer. The results are shown in Table 4.
[0060] Table 4 - Ablation Study Results
[0061] This leads to three key findings: (1) Static and dynamic semantic features provide complementary semantic information. In these ablations, this embodiment directly uses the static semantic feature and the dynamic semantic feature, respectively, as the sole semantic input for the decoder to reconstruct the sentence. When static semantic features are lacking, the reconstructed sentence is prone to missing words; when dynamic semantic features are lacking, word inconsistencies occur. Both situations therefore lead to a decrease in performance.
[0062] (2) The gating fusion mechanism is crucial. In the gating ablation experiment, this embodiment directly sums the static latent representation and the dynamic latent representation to form the latent representation of the sentence. Compared with the gating method, this fusion approach performs 9.2% worse in terms of PPL. This confirms the importance of gating the fusion of global and feature information when extracting sentence semantics.
[0063] (3) The parameters of the pre-trained large language model are well-suited for semantic extraction tasks. When the model parameters were randomly initialized instead of being transferred directly from the large language model, the performance decreased by 11.4%. This indicates that the language understanding capabilities obtained from pre-training the large language model are well-suited for semantic extraction tasks. In addition, we investigated the applicability of different large language model layers. When the encoder and decoder parameters were initialized using the first and last layers of the large language model, respectively, the performance decreased by 2.8% and 7.1%, respectively. This shows that shallow layers are better suited to the encoder, since both are responsible for extracting features from the text, while deep layers are better suited to the decoder, since both are responsible for generating text based on the extracted features.
[0064] As can be seen from the above, this embodiment proposes an innovative framework for dynamic semantic extraction reasoning (denoted as DSEI). DSEI integrates a dynamic semantic autoencoder (DSAE) into the reasoning process of a large language model (LLM), compressing the reasoning process into a latent space by dynamically extracting sentence semantics. Specifically, it includes: (1) First, DSAE is constructed through self-supervised training: inside the encoder, weights are assigned according to the hidden state of each token, and dynamic semantics are extracted by weighted averaging of these states. Subsequently, these dynamic semantics are combined, through a gating mechanism, with the statically fused base semantics to generate rich sentence-level latent representations, and the decoder reconstructs sentences based on these representations.
[0065] (2) Next, the encoder and decoder are connected to the input and output layers of the large language model respectively for end-to-end training, so that the large language model can complete reasoning in the sentence-level latent space.
[0066] The method described in this embodiment can achieve the following functions: (1) The dynamic semantic autoencoder extracts semantics dynamically through weight allocation and gating mechanisms, compressing sentences into a single latent representation. By eliminating redundant information while retaining global information, it achieves efficient semantic extraction; (2) The dynamic semantic autoencoder is integrated into the large language model to realize end-to-end sentence-level reasoning in the latent space. This framework utilizes the dense characteristics of latent representations to shorten the input and reasoning sequences, thereby significantly improving reasoning efficiency.
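The shortening of the input sequence can be illustrated with simple arithmetic. The following is a minimal sketch, assuming whitespace tokenization (a simplification; a real large language model would use its own tokenizer) and the hypothetical helper name `sequence_lengths`. It compares the number of positions the model processes at token level with the number it processes when each sentence is compressed into a single latent representation.

```python
def sequence_lengths(sentences):
    """Return (token-level length, sentence-level latent length).

    Whitespace tokenization is an assumption of this sketch.
    """
    token_len = sum(len(s.split()) for s in sentences)
    latent_len = len(sentences)  # one latent representation per sentence
    return token_len, latent_len


sentences = [
    "Tom has three apples .",
    "He eats one .",
    "How many are left ?",
]
token_len, latent_len = sequence_lengths(sentences)
compression = token_len / latent_len  # average tokens replaced by one latent
```

In this toy input, 14 token positions are replaced by 3 latent positions. Because the cost of standard self-attention grows with the square of the sequence length, shorter sequences of this kind are where the efficiency gain would be expected to come from.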
[0067] Corresponding to the semantic reasoning method in the above embodiments, Figure 3 is a structural block diagram of a semantic reasoning apparatus provided according to an embodiment of this application. For ease of explanation, only the parts relevant to the embodiment of this application are shown. Referring to Figure 3, the semantic reasoning device 20 includes: an encoding module 21, a semantic extraction module 22, a semantic reasoning module 23, a decoding module 24, and a result output module 25. The encoding module 21 is used to acquire target text information, divide the target text information to obtain each sentence in the target text information, and encode each sentence to obtain multiple word features of each sentence; the semantic extraction module 22 is used to extract semantics based on the multiple word features of each sentence to obtain the first semantic feature of each sentence; the semantic reasoning module 23 is used to reason about the first semantic features corresponding to the multiple sentences based on the large language model to obtain the second semantic feature corresponding to each sentence; the decoding module 24 is used to decode the second semantic feature corresponding to each sentence to obtain the semantic reasoning result corresponding to each sentence; and the result output module 25 is used to take the semantic reasoning results corresponding to the multiple sentences as the semantic reasoning result for the target text information.
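The data flow between the five modules can be sketched as a simple pipeline. This is an illustrative sketch only: the function names `split_sentences` and `semantic_reasoning`, the punctuation-based sentence splitting rule, and the toy stand-ins for the encoder, semantic extraction layer, large language model, and decoder are all assumptions made for the example, not the components of the embodiment.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' (a simplistic, assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def semantic_reasoning(text, encode, extract, reason, decode):
    """Chain the stages: encode -> extract -> reason -> decode -> collect."""
    sentences = split_sentences(text)
    # Encoding module + semantic extraction module: one feature per sentence.
    first_features = [extract(encode(s)) for s in sentences]
    # Semantic reasoning module: operates on the sentence-level sequence.
    second_features = reason(first_features)
    # Decoding module + result output module.
    return [decode(z) for z in second_features]


# Toy stand-ins so the sketch runs end to end.
result = semantic_reasoning(
    "A b. C d e! F?",
    encode=lambda s: s.split(),              # "word features" = word list
    extract=len,                             # "semantic feature" = word count
    reason=lambda zs: [z * 2 for z in zs],   # placeholder for the LLM
    decode=str,
)
```

The point of the sketch is the shape of the interfaces: the reasoning stage receives one feature per sentence rather than one per token.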
[0068] In one embodiment of this application, the semantic extraction module 22 is specifically used for: summing the multiple word features to obtain static semantic features; mapping the multiple word features to one dimension to obtain a weight sequence; using each weight in the weight sequence as the weight of the corresponding word feature and computing the weighted sum of the multiple word features to obtain dynamic semantic features; and fusing the static semantic features and the dynamic semantic features to obtain the first semantic feature of the sentence.
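The static and dynamic features described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming that the mapping to one dimension is a single linear unit with hypothetical parameters `w` and `b`, and that the resulting scores are normalized with a softmax so that the weighted sum acts as a weighted average. The text only states that the features are mapped to one dimension, so the softmax step is an assumption.

```python
import math


def static_feature(word_feats):
    """Element-wise sum of the word feature vectors of one sentence."""
    dim = len(word_feats[0])
    return [sum(vec[d] for vec in word_feats) for d in range(dim)]


def weight_sequence(word_feats, w, b=0.0):
    """Map each word feature to one scalar, then normalize.

    The linear unit (w, b) and the softmax are assumptions of this sketch.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, vec)) + b for vec in word_feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_feature(word_feats, weights):
    """Weighted sum of the word features, one weight per word."""
    dim = len(word_feats[0])
    return [sum(a * vec[d] for a, vec in zip(weights, word_feats))
            for d in range(dim)]


word_feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]  # three words, dimension 2
w = [0.5, -0.5]                                     # hypothetical parameters
static = static_feature(word_feats)
weights = weight_sequence(word_feats, w)
dynamic = dynamic_feature(word_feats, weights)
```

Here the third word receives the largest weight because its score under `w` is the highest, which mirrors how content words would dominate the dynamic feature while the static feature treats all words equally.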
[0069] In one embodiment of this application, the semantic extraction module 22 is further configured to: normalize the learnable gating scalar parameter to obtain a first weight coefficient; determine a second weight coefficient based on the first weight coefficient, wherein the sum of the first weight coefficient and the second weight coefficient is 1; and use the first weight coefficient as the weight of the static semantic features and the second weight coefficient as the weight of the dynamic semantic features, so as to perform weighted fusion of the static and dynamic semantic features.
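The gated fusion can be sketched as follows. The example assumes that the normalization of the learnable gating scalar is a sigmoid, which is a common choice but is not stated in the text; the names `gate_coefficients`, `gated_fusion`, and `g` are hypothetical.

```python
import math


def gate_coefficients(g):
    """Return (first, second) weight coefficients that sum to 1.

    Using a sigmoid to normalize the gating scalar g is an assumption.
    """
    alpha = 1.0 / (1.0 + math.exp(-g))
    return alpha, 1.0 - alpha


def gated_fusion(static, dynamic, g):
    """Weighted fusion: alpha * static + (1 - alpha) * dynamic."""
    alpha, beta = gate_coefficients(g)
    return [alpha * s + beta * d for s, d in zip(static, dynamic)]


alpha, beta = gate_coefficients(0.0)           # g = 0 gives equal weights
fused = gated_fusion([4.0, 2.0], [1.0, 0.0], g=0.0)
```

Because the second coefficient is derived from the first, a single learnable scalar controls the balance between the two features, and the fused vector always remains a convex combination of them.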
[0070] In one embodiment of this application, each sentence is encoded based on an encoder, semantics are extracted from multiple lexical features of each sentence based on a semantic extraction layer, and the second semantic features corresponding to each sentence are decoded based on a decoder; the semantic extraction layer includes a first fully connected layer and a gating unit, the first fully connected layer being used to map multiple lexical features of each sentence to one dimension to obtain a weight sequence; the semantic extraction module 22 is further specifically used for: Obtain the sample dataset; the sample dataset includes multiple text training samples; Based on the sample dataset, the pre-trained encoder, pre-trained decoder and pre-trained semantic extraction layer are jointly trained to obtain the staged encoder, staged decoder and staged semantic extraction layer. The staged encoder and staged semantic extraction layer are used as the input layer of the pre-trained large language model, and the staged decoder is used as the output layer of the pre-trained large language model. The staged encoder, staged decoder, staged semantic extraction layer and pre-trained large language model are jointly trained based on the sample dataset to obtain the encoder, semantic extraction layer, decoder and large language model.
[0071] In one embodiment of this application, the semantic extraction module 22 is further configured to: When jointly training the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer based on the sample dataset, the parameters of the first M layers of the large language model are used as the parameters of the encoder, and the parameters of the last M layers of the large language model are used as the parameters of the decoder; where M is the number of layers in the encoder and decoder.
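The layer-wise parameter transfer can be sketched as a slicing operation over the layers of the large language model. The sketch below represents each layer as a plain dictionary of parameters and uses the hypothetical function name `init_from_llm`; in a real system the layers would be the model's own transformer blocks.

```python
import copy


def init_from_llm(llm_layers, m):
    """Copy the first m layers for the encoder and the last m for the decoder."""
    if not 0 < m <= len(llm_layers):
        raise ValueError("m must be between 1 and the number of LLM layers")
    # Deep copies keep later training of the encoder and decoder from
    # modifying the parameters of the large language model itself.
    encoder_layers = copy.deepcopy(llm_layers[:m])
    decoder_layers = copy.deepcopy(llm_layers[-m:])
    return encoder_layers, decoder_layers


# Toy 12-layer model; each layer is a dict of placeholder parameters.
llm_layers = [{"name": f"layer_{i}", "weight": [float(i)]} for i in range(12)]
encoder_layers, decoder_layers = init_from_llm(llm_layers, m=1)
```

With `m=1`, as in the single-layer configuration used for the ablation study, the encoder starts from the shallowest layer and the decoder from the deepest.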
[0072] In one embodiment of this application, the semantic reasoning module 23 is specifically used for: Inference is performed across multiple time steps based on a large language model; In each time step, the latent space features obtained by reasoning at that time step are mapped to a two-dimensional Boolean vector through the second fully connected layer. If the two-dimensional Boolean vector is the end state, then reasoning is performed at the next time step. The latent space features obtained from reasoning at a time step are used as the second semantic features corresponding to a sentence.
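The multi-step reasoning with a two-dimensional state vector can be sketched as a loop. This sketch rests on several assumptions: the second fully connected layer is modeled as a 2-row weight matrix `w` with bias `b`, the Boolean state is taken from the larger of the two outputs, and the loop is assumed to stop once the end state is signalled and otherwise to continue to the next time step. The stopping rule in particular is one reading of the text. The names `is_end_state`, `latent_reasoning`, `step_fn`, and `max_steps` are hypothetical.

```python
def is_end_state(latent, w, b):
    """Map a latent feature to two logits; index 1 is read as 'end'."""
    logits = [sum(wi * xi for wi, xi in zip(row, latent)) + bi
              for row, bi in zip(w, b)]
    return logits[1] > logits[0]


def latent_reasoning(step_fn, first_latent, w, b, max_steps=16):
    """Run one reasoning step per time step and collect the latent features.

    Each collected latent feature plays the role of the second semantic
    feature of one sentence. max_steps is a safety bound added here.
    """
    outputs = []
    latent = first_latent
    for _ in range(max_steps):
        latent = step_fn(latent)  # placeholder for one LLM reasoning step
        outputs.append(latent)
        if is_end_state(latent, w, b):
            break
    return outputs


# Toy setup: each step adds 1; the end state fires once the value exceeds 2.5.
w = [[0.0], [1.0]]
b = [2.5, 0.0]
outputs = latent_reasoning(lambda z: [z[0] + 1.0], [0.0], w, b)
```

In the toy run, three time steps are executed, so three sentence-level latent features are produced before reasoning ends.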
[0073] In one embodiment of this application, the semantic reasoning module 23 is further configured to: when jointly training the staged encoder, staged decoder, staged semantic extraction layer, and pre-trained large language model based on the sample dataset, use the following loss function: ; where the terms of the loss function denote, respectively, the inference loss, the total number of time steps, the two-dimensional Boolean vector obtained at time step t, the number k of tokens contained in the ground truth of the inference result of the large language model, and the predicted probability that the reconstructed word output by the decoder is the i-th token in the preset dictionary.
[0074] Figure 4 is a schematic block diagram of an electronic device provided according to an embodiment of this application. As shown in Figure 4, the electronic device 300 in this embodiment may include one or more processors 301, one or more input devices 302, one or more output devices 303, and one or more memories 304. The processors 301, input devices 302, output devices 303, and memories 304 communicate with each other via a communication bus 305. The memories 304 store computer programs, including program instructions. The processors 301 execute the program instructions stored in the memories 304. Specifically, the processors 301 are configured to invoke the program instructions to perform the functions of each module / unit in the above-described device embodiments, for example, the functions of the encoding module 21, semantic extraction module 22, semantic reasoning module 23, decoding module 24, and result output module 25 shown in Figure 3.
[0075] It should be understood that, in the embodiments of this application, the processor 301 may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.
[0076] Input device 302 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint orientation information), a microphone, etc., and output device 303 may include a display (LCD, etc.), a speaker, etc.
[0077] The memory 304 may include read-only memory and random access memory, and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store preset information such as dictionaries.
[0078] In specific implementations, the processor 301, input device 302, and output device 303 described in the embodiments of this application can execute the implementation methods described in the semantic reasoning methods provided in the embodiments of this application, or they can execute the implementation methods of the electronic devices described in the embodiments of this application, which will not be elaborated here.
[0079] In another embodiment of this application, a computer-readable storage medium is provided. This computer-readable storage medium stores a computer program, which includes program instructions. When executed by a processor, the program instructions implement all or part of the processes in the methods described above. Alternatively, the computer program can instruct related hardware to complete the process. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, etc.
[0080] The computer-readable storage medium can be an internal storage unit of the electronic device in any of the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium can also be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital card (SD), flash card, etc., equipped on the electronic device. Furthermore, the computer-readable storage medium can include both internal and external storage units of the electronic device. The computer-readable storage medium is used to store computer programs and other programs and data required by the electronic device. The computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
[0081] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
[0082] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the electronic devices and units described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.
[0083] In the several embodiments provided in this application, it should be understood that the disclosed electronic devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces or units, or it may be an electrical, mechanical, or other form of connection.
[0084] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of the embodiments of this application, depending on actual needs.
[0085] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0086] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A semantic reasoning method, characterized in that it comprises: obtaining target text information, dividing the target text information to obtain each sentence in the target text information, and encoding each sentence to obtain multiple word features of each sentence; performing semantic extraction based on the multiple word features of each sentence to obtain the first semantic feature of each sentence; reasoning, based on a large language model, on the first semantic features corresponding to the multiple sentences to obtain the second semantic feature corresponding to each sentence; decoding the second semantic feature corresponding to each sentence to obtain the semantic reasoning result corresponding to each sentence; and taking the semantic reasoning results corresponding to the multiple sentences as the semantic reasoning result for the target text information.
2. The semantic reasoning method as described in claim 1, characterized in that, for each sentence, performing semantic extraction based on the multiple word features of the sentence to obtain the first semantic feature of the sentence comprises: summing the multiple word features to obtain static semantic features; mapping the multiple word features to one dimension to obtain a weight sequence; using each weight in the weight sequence as the weight of the corresponding word feature and computing the weighted sum of the multiple word features to obtain dynamic semantic features; and fusing the static semantic features and the dynamic semantic features to obtain the first semantic feature of the sentence.
3. The semantic reasoning method as described in claim 2, characterized in that the fusing of the static semantic features and the dynamic semantic features is performed based on a gating unit and specifically comprises: normalizing a learnable gating scalar parameter to obtain a first weight coefficient; determining a second weight coefficient based on the first weight coefficient, wherein the sum of the first weight coefficient and the second weight coefficient is 1; and using the first weight coefficient as the weight of the static semantic features and the second weight coefficient as the weight of the dynamic semantic features, so as to perform weighted fusion of the static semantic features and the dynamic semantic features.
4. The semantic reasoning method as described in claim 3, characterized in that, Each sentence is encoded based on an encoder, and multiple lexical features of each sentence are semantically extracted based on a semantic extraction layer. The second semantic features corresponding to each sentence are decoded based on a decoder. The semantic extraction layer includes a first fully connected layer and the gating unit. The first fully connected layer is used to map the multiple lexical features of each sentence to one dimension to obtain a weight sequence. The training process of the encoder, the semantic extraction layer, the decoder, and the large language model includes: Obtain a sample dataset; the sample dataset includes multiple text training samples; Based on the sample dataset, the pre-trained encoder, pre-trained decoder and pre-trained semantic extraction layer are jointly trained to obtain the staged encoder, staged decoder and staged semantic extraction layer. The staged encoder and the staged semantic extraction layer are used as the input layer of the pre-trained large language model, and the staged decoder is used as the output layer of the pre-trained large language model. The staged encoder, the staged decoder, the staged semantic extraction layer and the pre-trained large language model are jointly trained based on the sample dataset to obtain the encoder, the semantic extraction layer, the decoder and the large language model.
5. The semantic reasoning method as described in claim 4, characterized in that, When jointly training the pre-trained encoder, pre-trained decoder, and pre-trained semantic extraction layer based on the sample dataset, the parameters of the first M layers of the large language model are used as the parameters of the encoder, and the parameters of the last M layers of the large language model are used as the parameters of the decoder; where M is the number of layers of the encoder and the decoder.
6. The semantic reasoning method as described in claim 4, characterized in that, The method involves reasoning about the first semantic features corresponding to multiple sentences based on a large language model to obtain the second semantic features corresponding to each sentence, including: Inference is performed across multiple time steps based on a large language model; In each time step, the latent space features obtained by reasoning at that time step are mapped to a two-dimensional Boolean vector through the second fully connected layer. If the two-dimensional Boolean vector is the end state, then reasoning is performed at the next time step. The latent space features obtained from reasoning at a time step are used as the second semantic features corresponding to a sentence.
7. The semantic reasoning method as described in claim 6, characterized in that, when jointly training the staged encoder, the staged decoder, the staged semantic extraction layer, and the pre-trained large language model based on the sample dataset, the loss function used is: ; where the terms of the loss function denote, respectively, the inference loss, the total number of time steps, the two-dimensional Boolean vector obtained at time step t, the number k of tokens contained in the ground truth of the inference result of the large language model, and the predicted probability that the reconstructed word output by the decoder is the i-th token in the preset dictionary.
8. A semantic reasoning device, characterized in that it comprises: an encoding module, used to acquire target text information, divide the target text information to obtain each sentence in the target text information, and encode each sentence to obtain multiple word features of each sentence; a semantic extraction module, used to extract semantics based on the multiple word features of each sentence to obtain the first semantic feature of each sentence; a semantic reasoning module, used to reason about the first semantic features corresponding to the multiple sentences based on a large language model to obtain the second semantic feature corresponding to each sentence; a decoding module, used to decode the second semantic feature corresponding to each sentence to obtain the semantic reasoning result corresponding to each sentence; and a result output module, used to take the semantic reasoning results corresponding to the multiple sentences as the semantic reasoning result for the target text information.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the method as described in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 7.