A shared embedding method for large language model recommendation
By assigning shared embeddings to similar IDs and combining text aggregation embedding and whole word embedding techniques, the problems of ID semantic fragmentation and data sparsity in large language model recommendation are solved, thereby improving the performance and efficiency of the recommendation system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUILIN UNIV OF ELECTRONIC TECH
- Filing Date
- 2026-03-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing generative recommendation methods based on large language models suffer from semantic fragmentation caused by splitting ID text into sub-tokens, and consequently have limited ability to model data-sparse and long-tail scenarios.
By clustering and assigning shared static embeddings to similar IDs, and combining text aggregation embedding and whole word embedding techniques, the overall semantics of the IDs are formed, thereby improving the recommendation effect.
It significantly alleviates the problems of data sparsity and long tail, enhances the model's ability to recommend low-frequency items, improves the interpretability and generalization performance of the recommendation system, and reduces the model's dependence on verbose and discrete prompts.

Figure CN122241287A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, specifically to the intersection of recommender systems and large language models (LLMs), and particularly to a shared embedding method for recommendation using large language models.
Background Technology
[0002] With the explosive growth of digital content, recommender systems have become a key technology for alleviating information overload and providing personalized services. Sequence recommendation, which predicts a user's future interests by analyzing their historical interaction sequences, is an important branch of recommender systems.
[0003] In recent years, large language models (LLMs) have demonstrated powerful semantic understanding and generation capabilities, and researchers have begun to explore their application to generative sequence recommendation tasks, that is, to construct recommendation tasks as "text-to-text" generation problems.
[0004] However, feeding user IDs and item IDs into large language models as plain text presents several challenges. First, user and item IDs are usually split into multiple sub-tokens by the tokenizer, which fragments their overall semantics (the "semantic fragmentation" problem). Second, the large number of discrete IDs combined with limited interaction data leads to severe data sparsity and long-tail problems, making it difficult for the model to learn effective representations of low-frequency IDs. Finally, existing methods lack explicit modeling of latent similarities or associations between IDs, which limits the model's ability to exploit collaborative information.
[0005] Therefore, there is an urgent need for a recommendation method that can effectively integrate ID information, alleviate data sparsity, and strengthen the modeling of relationships between IDs within a large language model recommendation framework.
Summary of the Invention
[0006] The technical problem to be solved by this invention is to overcome the semantic fragmentation caused by ID text segmentation in existing generative recommendation methods based on large language models, and the resulting weakness in modeling data-sparse and long-tail scenarios.
[0007] To address the aforementioned technical issues, this invention proposes a shared embedding method for large language model recommendations. This method assigns shared static embeddings to similar IDs through clustering and combines text aggregation embedding and whole word embedding techniques to form the overall semantics of the IDs, thereby enhancing the model's understanding and generalization ability of IDs and improving recommendation performance.
[0008] The present invention adopts the following technical solution. A shared embedding method for recommendation in large language models comprises: obtaining historical interaction data of a user set and an item set to build user-item historical interaction sequences, training a traditional sequence recommendation model, and extracting a user ID embedding matrix and an item ID embedding matrix from the trained model; normalizing the extracted user ID and item ID embedding matrices and clustering them with a clustering algorithm to obtain a set of user ID clusters and a set of item ID clusters; for each user cluster and item cluster, selecting a representative text identifier, converting it into a token embedding sequence with the tokenizer (word segmentation) and embedding layer of a pre-trained large language model, and pooling the sequence to generate a shared static embedding vector representing the cluster; establishing a mapping function from user IDs and item IDs to their corresponding shared static embeddings; constructing a natural language prompt template that contains placeholders for the shared embeddings; applying a whole-word embedding mechanism so that the multiple tokens of the same ID are identified as a whole; applying a prompt distillation strategy to distill discrete prompt information into continuous prompt vectors; and finally training the large language model in a sequence-to-sequence manner and using beam search to generate recommended item sequences during inference.
[0009] The beneficial effects of this invention include: 1. By sharing embeddings, multiple similar IDs are mapped to the same vector, which significantly alleviates the problems of data sparsity and long tail, and improves the model's ability to recommend low-frequency items.
[0010] 2. Implicit similarity relationships between IDs are explicitly modeled through clustering, and this collaborative information is injected into the large language model through shared embeddings, enhancing the interpretability and generalization performance of the recommender system.
[0011] 3. By using text aggregation embedding and whole word embedding, the overall semantic consistency of the ID is ensured, effectively avoiding the semantic fragmentation problem caused by word segmentation.
[0012] 4. By combining prompt distillation, the model's reliance on lengthy discrete prompts is reduced, improving inference efficiency and the ability to capture ID information.
Attached Figure Description
[0013] Figure 1 is the overall flowchart of the shared embedding method for large language model recommendation of the present invention.
[0014] Figure 2 is a schematic diagram of the shared embedding system framework for large language model recommendation of the present invention, where (a) is the overall architecture diagram, (b) is the shared static embedding generation diagram, and (c) is the ID-to-embedding mapping diagram.
[0015] Figure 3 is a schematic diagram of the prompt template in the embodiment.
[0016] Figure 4 is a schematic diagram of the prompt distillation process in the embodiment.
Detailed Implementation
[0017] To make the objectives, technical solutions, and beneficial effects of this invention clearer, the invention is further described in detail below with reference to Figures 1-4 and the embodiments, which are not intended to limit the present invention.
[0018] This invention presents a shared embedding method for large language model recommendations, using generative sequence recommendation as an application scenario. It constructs natural language prompts from users' historical interaction sequences and injects user/item ID-related information into the large language model input through a shared static embedding method. This alleviates the semantic fragmentation caused by ID text segmentation and also mitigates the problems of data sparsity and long tail.
[0019] Referring to Figure 1, the shared embedding method for large language model recommendation of this invention starts with data preparation: user-item interaction data are acquired, and a chronologically ordered historical interaction sequence is constructed for each user. The users constitute a user set U and the items constitute an item set I, and each user's historical interaction sequence is used to predict the next item that user will interact with. The shared embedding method includes the following steps:
S1. Embedding extraction: extract the user ID embedding matrix and the item ID embedding matrix from the trained traditional sequence recommendation model.
S2. Clustering and grouping: normalize the extracted user ID and item ID embedding matrices separately, then cluster them with the K-Means algorithm to obtain k user ID clusters and k item ID clusters.
S3. Shared static embedding generation: for each user cluster and item cluster, select the text identifier with the smallest ID value in the cluster, convert it into a token embedding sequence through the tokenizer (word segmentation) and embedding layer of the pre-trained large language model (LLM), and pool the sequence to generate a shared static embedding vector representing the cluster.
S4. Mapping: define a mapping function that maps any user ID or item ID to the shared static embedding vector of its cluster.
S5. Hybrid prompt construction: build a hybrid prompt template that contains placeholders.
S6. Whole-word embedding and prompt distillation: in the model input layer, apply whole-word embedding to the multiple tokens belonging to the same ID so that they share the same whole-word embedding vector; adopt a prompt distillation strategy in which trainable continuous prompt vectors are added to the input of the prompt template and fed, together with the discrete prompt template containing placeholders, into the large language model for joint optimization.
S7. Model training and inference: during training, the hybrid prompts are input into the large language model, the placeholders are replaced with the corresponding shared static embedding vectors, and training follows the sequence-to-sequence paradigm; during inference, recommended item sequences are generated with the trained model and a beam search algorithm, and the recommendation list is output.
Example 1
[0020] Referring to Figures 1-2, this embodiment describes a shared embedding method and system for large language models. Figure 2(a) shows the overall system architecture: the original ID sequence is mapped to shared embeddings, which are combined with whole-word embeddings and prompt vectors and input into the LLM to obtain recommendations. The shared embedding method includes the following steps. Step 1: Data preparation and task definition: a user-item interaction dataset is collected, and a chronologically sorted historical interaction sequence is constructed for each user. The dataset is split with the leave-one-out method: the last item in each user's sequence is used for testing, the second-to-last item for validation, and the remaining items for training. The sequence recommendation task is defined as follows: given a user's historical sequence, predict the item the user is most likely to interact with next.
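The leave-one-out split in Step 1 can be sketched as follows. This is a minimal illustration, assuming each user's items are already sorted by timestamp and stored as text identifiers; the function name and the handling of users with fewer than three interactions are illustrative assumptions, not part of the described method.

```python
from typing import Dict, List, Tuple

History = List[str]


def leave_one_out_split(
    sequences: Dict[str, History],
) -> Tuple[Dict[str, History], Dict[str, Tuple[History, str]], Dict[str, Tuple[History, str]]]:
    """Leave-one-out split of chronologically sorted per-user item sequences.

    Last item -> test target, second-to-last -> validation target,
    everything earlier -> training sequence.
    """
    train: Dict[str, History] = {}
    valid: Dict[str, Tuple[History, str]] = {}
    test: Dict[str, Tuple[History, str]] = {}
    for user, items in sequences.items():
        if len(items) < 3:
            # Assumption: too short to hold out two items, so keep it for training only.
            train[user] = list(items)
            continue
        train[user] = items[:-2]
        valid[user] = (items[:-2], items[-2])  # predict 2nd-to-last from the prefix
        test[user] = (items[:-1], items[-1])   # predict last from everything before it
    return train, valid, test


seqs = {"user_1": ["item_3", "item_7", "item_2", "item_9", "item_4"]}
train, valid, test = leave_one_out_split(seqs)
print(train["user_1"])  # ['item_3', 'item_7', 'item_2']
print(valid["user_1"])  # (['item_3', 'item_7', 'item_2'], 'item_9')
print(test["user_1"])   # (['item_3', 'item_7', 'item_2', 'item_9'], 'item_4')
```

Note that the test history includes the validation item, so the test prediction always conditions on every earlier interaction.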
[0021] Step 2: Extract collaborative ID embeddings: train a traditional sequence recommendation model (such as SASRec) on the training set. After training, extract the embedding vectors of all user IDs and item IDs from the model to form the user ID embedding matrix $E_U$ and the item ID embedding matrix $E_I$. These vectors encode collaborative filtering information about user preferences and items.
[0022] Specifically, for embedding extraction, let $U = \{u_1, u_2, \ldots, u_w\}$ denote the set of users and $I = \{i_1, i_2, \ldots, i_n\}$ denote the set of items, where $w$ and $n$ are the numbers of users and items, respectively. For each user $u \in U$, the interaction history can be represented as a chronologically ordered sequence $S_u = [i^u_1, i^u_2, \ldots, i^u_t]$, where $i^u_j \in I$ and $t$ is the sequence length. Each interaction is accompanied by a timestamp to ensure the temporal consistency of the sequence. Given a user $u$ and its interaction history $S_u$, the goal of sequence recommendation is to predict the item $i^u_{t+1}$ that the user is most likely to interact with next, based on the currently known interaction sequence.
[0023] For the user set and the item set, the ID embeddings extracted by a pre-trained sequence recommendation model can be formalized as:
$$E_U = f_\theta(U) \in \mathbb{R}^{w \times d}, \qquad E_I = f_\theta(I) \in \mathbb{R}^{n \times d},$$
where the function $f_\theta(\cdot)$ denotes obtaining user and item ID embeddings from the sequence recommendation model, $U$ and $I$ denote the user set and the item set, $E_U$ and $E_I$ denote the ID embedding matrices of users and items, $w$ and $n$ denote the numbers of users and items, $d$ is the embedding dimension of the traditional sequence recommendation model, and $\theta$ denotes the parameters of the sequence recommendation model, optimized by training on a specific dataset.
[0024] Step 3: Cluster to generate ID clusters: L2 normalization is applied separately to the user ID embedding matrix $E_U$ and the item ID embedding matrix $E_I$. For unit-norm vectors, the squared Euclidean distance equals $2 - 2\cos(\cdot,\cdot)$, so Euclidean distance becomes a monotonic function of cosine similarity. The K-Means clustering algorithm is then applied separately to the normalized user embeddings and item embeddings, yielding $k$ clusters each, denoted $C_U = \{C^U_1, \ldots, C^U_k\}$ and $C_I = \{C^I_1, \ldots, C^I_k\}$. The number of clusters $k$ can be adjusted according to the size of the dataset: when there are few users, $k$ can be larger, and when there are many users, $k$ can be smaller, so as to balance the number of samples per cluster against the sharing effect.
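[0025] The normalization and clustering can be formally described as:
$$\hat{\mathbf{e}}_u = \frac{\mathbf{e}_u}{\lVert \mathbf{e}_u \rVert_2}, \qquad \hat{\mathbf{e}}_i = \frac{\mathbf{e}_i}{\lVert \mathbf{e}_i \rVert_2},$$
$$C_U = \mathrm{Cluster}(\hat{E}_U, k), \qquad C_I = \mathrm{Cluster}(\hat{E}_I, k),$$
where $\mathbf{e}_u$ and $\mathbf{e}_i$ are the rows of the user and item ID embedding matrices $E_U$ and $E_I$, $\hat{E}_U$ and $\hat{E}_I$ are the row-normalized matrices, $\mathrm{Cluster}(\cdot)$ denotes K-Means clustering, and $C_U$ and $C_I$ denote the $k$ user clusters and $k$ item clusters, respectively. The choice of $k$ has a significant impact on recommendation performance and is determined according to the number of users in the dataset.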
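A minimal, dependency-free sketch of Step 3, assuming small in-memory embeddings: rows are L2-normalized and then clustered with plain Lloyd's K-Means using randomly sampled initial centroids. The initialization scheme, iteration cap, and toy data are assumptions; a practical system would more likely use a library implementation such as scikit-learn's `KMeans`. On unit vectors the squared Euclidean distance equals 2 - 2·cos, which is why normalizing first makes K-Means group IDs by cosine similarity.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale each row (one ID embedding) to unit L2 norm; all-zero rows are kept as-is."""
    out: List[Vector] = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's K-Means with random initial centroids; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy user ID embeddings (illustrative values, not from any trained model).
E_U = [[2.0, 0.1], [4.0, 0.3], [0.1, 3.0], [0.2, 5.0]]
E_hat = l2_normalize(E_U)
labels, _ = kmeans(E_hat, k=2)
print(labels)  # users 0 and 1 share one cluster, users 2 and 3 the other
```

Users 0 and 1 end up together even though their raw vectors differ in length by a factor of two, because after normalization only the direction matters.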
[0026] Step 4: Generate shared static embeddings: a shared static embedding vector is generated for each cluster. Figure 2(b) shows this process (ID text → tokenization → token embeddings → average pooling → shared static embedding), which proceeds as follows:
4.1 Select a representative text identifier for each cluster (e.g., the identifier with the smallest ID value in the cluster, such as "user_123");
4.2 Input the representative identifier into the tokenizer of a pre-trained large language model (such as T5) to obtain a sequence of tokens;
4.3 Convert these tokens into the corresponding token embedding sequence through the embedding layer of the large language model;
4.4 Apply average pooling to the token embedding sequence to obtain a global text aggregation embedding vector, which serves as the shared static embedding vector of the cluster (pemb_user or pemb_item).
[0027] Specifically, the vectorization of a text identifier relies on a pre-trained large language model (such as T5). The text is first tokenized into basic units (tokens), which are then mapped into a vector space by the embedding layer. Let $\mathrm{Tok}(\cdot)$ denote the tokenization operation and $\mathrm{Emb}(\cdot)$ the embedding operation. The text embeddings of users and items are then:
$$\mathbf{T}_u = \mathrm{Emb}(\mathrm{Tok}(x_u)) \in \mathbb{R}^{g \times d_l}, \qquad \mathbf{T}_i = \mathrm{Emb}(\mathrm{Tok}(x_i)) \in \mathbb{R}^{g' \times d_l},$$
where $x_u$ and $x_i$ are the user and item text identifiers, $\mathbf{T}_u$ and $\mathbf{T}_i$ are their token embedding sequences, $g$ and $g'$ are the numbers of tokens, and $d_l$ is the embedding dimension of a single token produced by the large language model (LLM).
[0028] Average pooling yields text aggregation embedding vectors (static embeddings). To obtain an overall semantic representation, the embedding vectors of all tokens are averaged to obtain a unified text representation. This method can retain the main semantic features while reducing dimensionality and computational complexity.
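[0029]
$$\mathbf{p}_u = \frac{1}{g}\sum_{z=1}^{g} \mathbf{t}_{u,z}, \qquad \mathbf{p}_i = \frac{1}{g'}\sum_{z=1}^{g'} \mathbf{t}_{i,z},$$
where $\mathbf{p}_u$ and $\mathbf{p}_i$ are the pooled representations of the user text embedding and the item text embedding, respectively, $g$ is the number of tokens in the user text, and $z$ is the index of the $z$-th token in the user text. Specifically, the user text $x_u$ is decomposed into $g$ tokens whose embedding vectors are $\mathbf{t}_{u,1}, \mathbf{t}_{u,2}, \ldots, \mathbf{t}_{u,g}$, where the subscript indicates the position of each token in the text. Summing and averaging these $g$ token embeddings yields the user embedding $\mathbf{p}_u$, which has the same dimension as a single token embedding and represents the global features of the entire user text. The item text is handled in the same way, with $g'$ tokens. The resulting sets of static vectors for users and items, $P_U = \{\mathbf{p}^U_1, \ldots, \mathbf{p}^U_k\}$ and $P_I = \{\mathbf{p}^I_1, \ldots, \mathbf{p}^I_k\}$, correspond to the $k$ user clusters and the $k$ item clusters, respectively.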
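Steps 4.1 to 4.4 and the average-pooling equation can be illustrated with the hypothetical sketch below. The tokenizer and the 2-dimensional embedding table are stand-ins (a real system would use the pre-trained LLM's own tokenizer and embedding layer, for example 512-dimensional for T5-small), and parsing the numeric suffix of identifiers such as "user_123" to find the smallest ID value is an assumption about the ID format. Comparing numerically matters: as strings, "user_10" would sort before "user_9".

```python
import re
from typing import Callable, Dict, List

Vector = List[float]


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for an LLM tokenizer: 'user_123' -> ['user', '_', '123']."""
    return re.findall(r"[A-Za-z]+|\d+|_", text)


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average pooling over the token axis: p = (1/g) * sum_z t_z."""
    g = len(token_vectors)
    return [sum(col) / g for col in zip(*token_vectors)]


def shared_static_embedding(cluster_ids: List[str],
                            tokenize: Callable[[str], List[str]],
                            embed: Callable[[str], Vector]) -> Vector:
    # 4.1: representative identifier = the ID with the smallest numeric value (assumed format 'prefix_N').
    rep = min(cluster_ids, key=lambda s: int(s.rsplit("_", 1)[1]))
    # 4.2 + 4.3: tokenize, then look up one embedding per token.
    token_vectors = [embed(tok) for tok in tokenize(rep)]
    # 4.4: average pooling gives the cluster's shared static embedding.
    return mean_pool(token_vectors)


# Hypothetical 2-d embedding table standing in for the LLM embedding layer.
TABLE: Dict[str, Vector] = {"user": [1.0, 0.0], "_": [0.0, 0.0], "123": [0.5, 1.5], "456": [2.0, 2.0]}
emb = shared_static_embedding(["user_456", "user_123"], toy_tokenize, TABLE.__getitem__)
print(emb)  # [0.5, 0.5]
```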
[0030] Step 5: Establish the mapping from IDs to static embeddings: with reference to Figure 2(c) (which shows the process of extracting ID embeddings from a traditional recommendation model → L2 normalization → K-Means clustering → mapping each ID to its cluster), a mapping function $\phi$ is defined. Given a specific user ID or item ID, the function first determines the cluster to which the ID belongs from the clustering results of Step 3, and then returns the shared static embedding vector generated for that cluster in Step 4, i.e.:
$$\phi(x) = \mathbf{p}_{c(x)},$$
where $c(x) \in \{1, \ldots, k\}$ is the index of the cluster containing ID $x$, and $\mathbf{p}_{c(x)}$ is the shared static embedding of that cluster (taken from the user clusters for user IDs and from the item clusters for item IDs).
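The mapping function ϕ amounts to a lookup from ID to cluster index followed by a lookup from cluster index to that cluster's shared static embedding. A minimal sketch, assuming cluster labels and pooled cluster embeddings are already available from Steps 3 and 4; `build_phi` and the toy values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def build_phi(ids: Sequence[str], labels: Sequence[int],
              cluster_embs: Sequence[Vector]) -> Callable[[str], Vector]:
    """Return phi(id) = shared static embedding of the cluster that contains id."""
    id_to_cluster: Dict[str, int] = dict(zip(ids, labels))

    def phi(entity_id: str) -> Vector:
        return cluster_embs[id_to_cluster[entity_id]]

    return phi


user_ids = ["user_1", "user_2", "user_3"]
labels = [0, 1, 0]                       # e.g. K-Means output from Step 3
cluster_embs = [[0.1, 0.2], [0.9, 0.8]]  # e.g. pooled text embeddings from Step 4 (toy values)
phi = build_phi(user_ids, labels, cluster_embs)
print(phi("user_3"))  # [0.1, 0.2], the same vector as user_1
```

IDs not seen during clustering raise a `KeyError` in this sketch; the text does not specify how unseen IDs are handled.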
[0031] Step 6: Construct a hybrid prompt template: referring to Figure 3, design a natural language prompt template that describes the sequence recommendation task, with placeholders in the template for inserting the shared static embedding vectors.
[0032] For example: "User <pu> has the historical interaction sequence: item_a, item_b, item_c. Based on this sequence, predict the next item the user is likely to interact with." Here, <pu> and <ph> (shown in Figure 3 as <pemb_u_user> and <pemb_i_item>) are special placeholders that are filled in subsequent steps with the shared static embedding vectors of the user and of the currently predicted item, respectively. The text identifiers of the user and items (such as "user_123" and "item_456") are also retained in the prompt.
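A hypothetical prompt builder in the spirit of the example above; the exact template wording is an assumption modeled on that example. It keeps the user's text identifier and inserts the `<pu>` slot marker. The position of the item placeholder `<ph>` is not shown in the example, so it is omitted here.

```python
from typing import List


def build_prompt(user_id: str, history: List[str]) -> str:
    """Fill a hypothetical hybrid prompt template.

    The user's text identifier stays in the prompt, and '<pu>' marks the slot
    whose embedding is later overwritten with the user's shared static embedding.
    """
    template = ("User {uid} <pu> has the historical interaction sequence: {hist}. "
                "Based on this sequence, predict the next item the user is likely to interact with.")
    return template.format(uid=user_id, hist=", ".join(history))


print(build_prompt("user_123", ["item_a", "item_b", "item_c"]))
```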
[0033] Step 7: Apply whole word embedding and prompt distillation: 7.1 Whole-word embedding: Since an ID (e.g., "user_123") can be segmented into multiple tokens (e.g., ["user", "_", "123"]), this invention employs a whole-word embedding mechanism. In the model input layer, all tokens belonging to the same ID are assigned the same, learnable whole-word embedding vector, enabling the model to recognize and distinguish them as a whole.
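One plausible way to compute whole-word indices is sketched below: the input is processed word by word, every sub-token of an ID word receives the same index, and non-ID tokens receive 0. Those indices would then select rows of a learnable whole-word embedding table that is added to the token embeddings. The ID pattern, the toy tokenizer, and the choice to number ID occurrences 1, 2, 3, ... in order are assumptions, not details given in the text.

```python
import re
from typing import Callable, List, Tuple

# Assumption: ID words look like 'user_123' or 'item_456'.
ID_PATTERN = re.compile(r"^(user|item)_\d+$")


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical sub-word tokenizer: 'user_123' -> ['user', '_', '123']."""
    return re.findall(r"[A-Za-z]+|\d+|_", word)


def whole_word_ids(words: List[str],
                   tokenize: Callable[[str], List[str]]) -> Tuple[List[str], List[int]]:
    """Tokenize word by word; all sub-tokens of one ID share one whole-word index.

    Non-ID tokens get index 0. Indexing a learnable embedding table with these
    values gives every sub-token of an ID the same whole-word vector.
    """
    tokens: List[str] = []
    ww: List[int] = []
    next_index = 1
    for word in words:
        pieces = tokenize(word)
        if ID_PATTERN.match(word):
            ww.extend([next_index] * len(pieces))
            next_index += 1
        else:
            ww.extend([0] * len(pieces))
        tokens.extend(pieces)
    return tokens, ww


tokens, ww = whole_word_ids(["user_123", "bought", "item_45"], toy_tokenize)
print(list(zip(tokens, ww)))
# [('user', 1), ('_', 1), ('123', 1), ('bought', 0), ('item', 2), ('_', 2), ('45', 2)]
```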
[0034] 7.2 Prompt distillation: referring to Figure 4, during the training phase, a set of trainable continuous prompt vectors is prepended to the discrete prompt template text, and the model jointly optimizes these continuous vectors so that they learn the task information contained in the discrete prompt. During the inference phase, only the continuous prompt vectors are used and the complete discrete prompt text is not required, which improves efficiency.
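The difference between training and inference inputs under prompt distillation can be sketched as follows. This assumes m = 3 continuous prompt vectors (the value used in Example 2), a placeholder embedding lookup, and an inference input that keeps only the ID and history tokens; which tokens remain at inference time is an assumption here, and in a real model the prompt vectors would be trainable parameters updated by backpropagation.

```python
from typing import List

Vector = List[float]


def assemble_input(prompt_vecs: List[Vector], token_embs: List[Vector]) -> List[Vector]:
    """Prepend the m continuous prompt vectors to the token-embedding sequence.

    Plain lists are used for illustration; in a real model prompt_vecs would be
    learnable parameters.
    """
    return [list(v) for v in prompt_vecs] + [list(t) for t in token_embs]


def embed(tokens: List[str], dim: int = 2) -> List[Vector]:
    """Hypothetical embedding lookup: every token maps to a zero vector here."""
    return [[0.0] * dim for _ in tokens]


m = 3  # number of continuous prompt vectors, as in the embodiment
prompt_vecs = [[0.1 * (j + 1)] * 2 for j in range(m)]

# Training: continuous prompt + full discrete template (with IDs) are fed together.
train_tokens = "User user_123 <pu> history : item_a item_b item_c . Predict the next item".split()
# Inference (assumed form): continuous prompt + only the ID/history tokens, no template words.
infer_tokens = "user_123 <pu> item_a item_b item_c".split()

train_input = assemble_input(prompt_vecs, embed(train_tokens))
infer_input = assemble_input(prompt_vecs, embed(infer_tokens))
print(len(train_input), len(infer_input))  # 16 8
```

The shorter inference input is where the efficiency gain comes from: the template words are no longer encoded at every request.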
[0035] Step 8: Model training and inference: 8.1 Training: input the prompts constructed in Step 6 into a large language model (e.g., T5). After the embedding layer, the vectors at the placeholder positions <pu> and <ph> in the prompt are replaced with the actual shared static embedding vectors obtained through the mapping function ϕ. The model is trained autoregressively in a sequence-to-sequence manner, predicting each output token from the preceding tokens, with negative log-likelihood as the loss function.
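Replacing placeholder embeddings after the embedding layer can be sketched as a position-wise overwrite. A minimal illustration, assuming each placeholder appears as its own token and that `slot_vectors` (a hypothetical name) already holds ϕ(user ID) for the current sample; the tokens and vectors are toy values.

```python
from typing import Dict, List

Vector = List[float]


def inject_shared_embeddings(tokens: List[str], token_embs: List[Vector],
                             slot_vectors: Dict[str, Vector]) -> List[Vector]:
    """Overwrite the embedding at each placeholder position with its shared static embedding.

    slot_vectors maps a placeholder token (e.g. '<pu>') to phi(user_id) for the current sample.
    All other positions keep their ordinary token embeddings.
    """
    out: List[Vector] = []
    for tok, vec in zip(tokens, token_embs):
        out.append(list(slot_vectors[tok]) if tok in slot_vectors else list(vec))
    return out


tokens = ["User", "user_123", "<pu>", "history", ":"]
token_embs = [[0.0, 0.0] for _ in tokens]
embs = inject_shared_embeddings(tokens, token_embs, {"<pu>": [0.1, 0.2]})
print(embs[2])  # [0.1, 0.2]
```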
[0036] 8.2 Inference: For a given user history sequence, the model uses a beam search algorithm based on the processed input to generate the most likely item ID text sequence. The top b candidate sequences with the highest scores are retained to form the user's Top-N recommendation list.
[0037] Through the above steps, this invention combines the collaborative information of traditional recommendation models, the textual semantic information of IDs, and the powerful generation capabilities of large language models, thereby improving recommendation accuracy while effectively solving the problems of ID semantic fragmentation and data sparsity.
Example 2
[0038] The pre-trained large language model T5-small is chosen as the backbone, with 6 layers each in the encoder and decoder, a model dimension of 512, and 8 attention heads. Tokenization uses T5's default SentencePiece tokenizer; the T5 vocabulary V contains 32,128 tokens, with an embedding dimension of 512. Sequence recommendation prompts similar to those used in P5 and POD are used to convert sequence information into text. The model is trained on the training set with the AdamW optimizer, and results are reported on the test set. The number of continuous prompt vectors for each task is set to 3, consistent with POD. The number of user and item clusters k is set to 75, 90, and 100 for the Sports, Beauty, and Toys datasets, respectively. The training batch size is 64, and the learning rate is 0.001 for the Sports dataset and 0.0005 for the Beauty and Toys datasets.
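For reference, the hyperparameters listed above can be collected in a small configuration object. This is a sketch only: the field names and the `t5-small` checkpoint string are assumptions, while the values are those stated in this example.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class TrainConfig:
    backbone: str = "t5-small"     # assumed checkpoint name for the T5-small backbone
    num_layers: int = 6            # encoder and decoder layers each
    d_model: int = 512
    num_heads: int = 8
    vocab_size: int = 32128
    num_prompt_vectors: int = 3    # continuous prompt vectors per task
    batch_size: int = 64
    num_clusters: int = 75         # k, used for both user and item clustering
    learning_rate: float = 1e-3    # AdamW learning rate


BASE = TrainConfig()
CONFIGS: Dict[str, TrainConfig] = {
    "Sports": replace(BASE, num_clusters=75, learning_rate=1e-3),
    "Beauty": replace(BASE, num_clusters=90, learning_rate=5e-4),
    "Toys": replace(BASE, num_clusters=100, learning_rate=5e-4),
}
for name, cfg in CONFIGS.items():
    print(name, cfg.num_clusters, cfg.learning_rate)
```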
[0039] During training, a checkpoint is saved whenever the model's validation loss on the sequence recommendation task is the lowest so far. If no new lowest validation loss is reached within 5 consecutive epochs, training terminates and the best checkpoint is loaded for evaluation. During inference, the beam size b for sequence recommendation is set to 20, and the inference batch size is set to 32 for all tasks.
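The early-stopping rule can be sketched as a loop with a patience counter. The `run_epoch` callback and the toy loss curve are hypothetical, and the sketch assumes that "lowest" means lower than every earlier epoch's validation loss.

```python
from typing import Callable, Tuple


def train_with_early_stopping(run_epoch: Callable[[int], float], max_epochs: int,
                              patience: int = 5) -> Tuple[int, int]:
    """Return (best_epoch, epochs_run).

    run_epoch(e) is a hypothetical callback that trains for one epoch and
    returns the validation loss on the sequence recommendation task.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0        # epochs since the last improvement
    epochs_run = 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        epochs_run += 1
        if val_loss < best_loss:
            best_loss, best_epoch, stale = val_loss, epoch, 0
            # A real training loop would save a checkpoint here.
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs: stop
    return best_epoch, epochs_run


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.70, 0.68, 0.69, 0.60]
print(train_with_early_stopping(lambda e: losses[e], max_epochs=len(losses)))  # (2, 8)
```

With patience 5, training stops after epoch index 7, so the later drop to 0.60 is never seen and the checkpoint from epoch index 2 is used for evaluation.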
[0040] The model is based on T5-Small and adopts an encoder-decoder structure. Under the prompt distillation setting, the input and output are defined as a pair of token sequences $X = [x_1, \ldots, x_{|X|}]$ and $Y = [y_1, \ldots, y_{|Y|}]$. The tokens of the input sequence are then concatenated with the continuous prompt vectors $\mathbf{q}_1, \ldots, \mathbf{q}_m$ to obtain $X' = [\mathbf{q}_1, \ldots, \mathbf{q}_m, x_1, \ldots, x_{|X|}]$, where $m$ is the number of continuous prompt vectors.
[0041] Whole-word embeddings are added to $X'$, which is then input into the T5-Small model. Since all generative recommendation tasks are sequence-to-sequence tasks, negative log-likelihood loss is used to optimize the model parameters.
[0042]
$$\mathcal{L}_\Theta = \frac{1}{|D|} \sum_{(X, Y) \in D} \frac{1}{|Y|} \sum_{t=1}^{|Y|} -\log p_\Theta\left(y_t \mid Y_{<t}, X\right)$$
[0043] where $D$ is the training set consisting of all input-output pairs $(X, Y)$, $|D|$ and $|Y|$ denote the number of training samples and the number of tokens in the output sequence, respectively, $\Theta$ denotes the model parameters, and $p_\Theta(y_t \mid Y_{<t}, X)$ is the probability of generating token $y_t$ given the input sequence $X$ and the tokens $Y_{<t}$ generated before time step $t$.
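A small worked instance of the loss, assuming the model's probabilities for each ground-truth output token are already known (teacher forcing). Each sample's token losses are averaged over its |Y| tokens, and the sample losses are then averaged over |D|; the probabilities are made-up illustration values.

```python
import math
from typing import List


def nll_loss(batch_token_probs: List[List[float]]) -> float:
    """Mean over samples of the per-sample mean of -log p(y_t | Y_<t, X).

    batch_token_probs[s][t] is the probability the model assigned to the
    ground-truth token y_t of sample s.
    """
    per_sample = [sum(-math.log(p) for p in probs) / len(probs) for probs in batch_token_probs]
    return sum(per_sample) / len(per_sample)


# Sample 1: two tokens at p = 0.5 -> ln 2 per token.
# Sample 2: four tokens, two certain (p = 1) and two at p = 0.25 -> (2 * ln 4) / 4 = ln 2.
loss = nll_loss([[0.5, 0.5], [1.0, 0.25, 1.0, 0.25]])
print(round(loss, 4))  # 0.6931
```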
[0044] Because the LLM output is a text sequence, the commonly used beam search is chosen for its effectiveness in sequence-to-sequence generation. Assume the beam size is b, so that b candidate sequences are kept at each time step. In the next step, any token from the vocabulary V can be appended to the end of each candidate sequence, producing b × |V| combinations, from which the b sequences with the highest log-likelihood are selected. The LLM repeats this process until the candidate sequences reach a predefined maximum length. For sequence recommendation, the b candidate sequences constitute the recommendation list. Furthermore, validation can be performed on several publicly available recommendation datasets, with training hyperparameters such as the number of clusters k, the batch size, and the learning rate set according to the data size. The model selection and parameter settings described above are for illustrative purposes only and do not limit the scope of protection of this invention.
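The beam search described above can be sketched with a toy scoring function standing in for the LLM decoder. The toy model, its split of "item_7" into the tokens "item", "_", "7", and the end-of-sequence marker are assumptions for illustration; with a real T5 model, a library decoder such as Hugging Face Transformers' `generate` with `num_beams` and `num_return_sequences` would normally be used instead.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
EOS = "</s>"


def beam_search(next_log_probs: Callable[[Seq], Dict[str, float]],
                beam_size: int, max_len: int) -> List[Tuple[Seq, float]]:
    """Keep the beam_size highest log-likelihood sequences at every step."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:
                candidates.append((seq, score))  # finished sequences are carried over
                continue
            # Expand with every token the model scores (up to |V| per beam).
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq and seq[-1] == EOS for seq, _ in beams):
            break
    return beams


# Hypothetical toy "model": after the prefix "item _" it scores three item numbers.
TOY: Dict[Seq, Dict[str, float]] = {
    (): {"item": 0.0},
    ("item",): {"_": 0.0},
    ("item", "_"): {"7": math.log(0.5), "3": math.log(0.3), "9": math.log(0.2)},
}


def toy_model(prefix: Seq) -> Dict[str, float]:
    return TOY.get(prefix, {EOS: 0.0})


beams = beam_search(toy_model, beam_size=2, max_len=5)
recs = ["".join(t for t in seq if t != EOS) for seq, _ in beams]
print(recs)  # ['item_7', 'item_3']
```

With b = 2 the lowest-scoring candidate "item_9" is pruned, and the two surviving sequences form the ranked recommendation list.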
Claims
1. A shared embedding method for recommendation in large language models, characterized in that it comprises the following steps: S1. Embedding extraction: obtaining historical interaction data of a user set and an item set to obtain user-item historical interaction sequences, training a traditional sequence recommendation model, and extracting a user ID embedding matrix and an item ID embedding matrix from the trained traditional sequence recommendation model; S2. Clustering and grouping: normalizing the extracted user ID embedding matrix and item ID embedding matrix respectively, and then clustering them with a clustering algorithm to obtain a user ID cluster set and an item ID cluster set; S3. Shared static embedding generation: for each user cluster and item cluster, selecting a representative text identifier, converting it into a token embedding sequence through the word segmentation (tokenizer) and embedding layer of a pre-trained large language model, and then performing pooling aggregation on the sequence to generate a shared static embedding vector representing the cluster; S4. Mapping: defining a mapping function that maps any user ID or item ID to the shared static embedding vector corresponding to its cluster; S5. Hybrid prompt construction: building a natural language prompt template containing the user's historical interaction sequence, and setting placeholders in the template for inserting the shared static embedding vectors of users and items; S6. Whole word embedding and prompt distillation: in the model input layer, applying whole word embedding to the multiple tokens belonging to the same ID so that they share the same whole word embedding vector, and adopting a prompt distillation strategy in which trainable continuous prompt vectors are added to the input of the prompt template and fed, together with the discrete prompt template containing placeholders, into the large language model for joint optimization; S7. Model training and inference: during the training phase, inputting the hybrid prompts into the large language model, replacing the placeholders with the corresponding shared static embedding vectors, and training in a sequence-to-sequence paradigm; during the inference phase, generating a recommended item sequence using the trained model and a beam search algorithm.
2. The method according to claim 1, characterized in that in step S2, the normalization is L2 normalization, the clustering algorithm is the K-Means clustering algorithm, and the number of clusters k is dynamically adjusted according to the size of the dataset.
3. The method according to claim 1, characterized in that in step S3, the representative text identifier is the text identifier with the smallest ID value in the corresponding cluster, and the pooling aggregation operation is an average pooling operation.
4. The method according to claim 1, characterized in that in step S5, the placeholders include a first placeholder and a second placeholder, which are used to insert the user's shared static embedding vector and the item's shared static embedding vector into the prompt, respectively.
5. The method according to claim 1, characterized in that in step S6, the whole word embedding mechanism assigns the same whole word embedding vector to all tokens belonging to the same user ID or item ID, to distinguish them from other non-ID tokens.
6. The method according to claim 1, characterized in that in step S6, the prompt distillation strategy specifically comprises: during the training phase, using the discrete prompt and the continuous prompt vectors simultaneously and jointly training the continuous prompt vectors to learn task information; during the inference phase, generating recommendations using only the learned continuous prompt vectors.
7. The method according to claim 1, characterized in that in step S1, the traditional sequence recommendation model is a sequence recommendation model based on a self-attention mechanism, and it is trained using only training set data before the embeddings are extracted.
8. The method according to claim 1, characterized in that in step S7, the loss function used for model training is negative log-likelihood loss, and the beam search algorithm used in the inference phase retains b candidate sequences to form a recommendation list.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the program, it implements the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 8.