An Enhanced Cross-Domain Sequence Recommendation Method Combining Large Language Models and Contrastive Learning

By combining large language models with contrastive learning, the method decouples and then integrates the long-term and short-term interest features of user behavior sequences. This addresses insufficient semantic expression and feature sparsity in cross-domain sequence recommendation systems and improves recommendation accuracy and cold-start capability.

CN118445477B · Active · Publication Date: 2026-06-30 · YANSHAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
YANSHAN UNIV
Filing Date
2024-04-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing cross-domain sequence recommendation systems cannot fully express the semantic information of items, suffer from feature sparsity, and cannot handle complex user behavior data, resulting in insufficient recommendation accuracy, especially in cold start and long-tail item recommendations.

Method used

By combining large language models and contrastive learning methods, the BERT model is used to convert item names and descriptions into embeddings, decoupling long-term and short-term interest features. Contrastive learning and attention mechanisms are then used to fuse features and dynamically adjust feature ratios to improve recommendation accuracy.

Benefits of technology

By decoupling and fusing short-term and long-term interests, the model can better understand and predict user behavior patterns, improving the performance and accuracy of the recommendation system, especially in cold start and long-tail item recommendations.

✦ Generated by Eureka AI based on patent content.

Abstract figure: Figure CN118445477B_ABST

Abstract

This invention discloses an enhanced cross-domain sequential recommendation method combining a large language model and contrastive learning, belonging to the field of cross-domain sequence recommendation. The method comprises the following steps: integrating a large language model, fine-tuning the fusion ratio, decoupling the long-term and short-term features of the sequence, fusing the long-term and short-term features from different domains to obtain cross-domain long-term and short-term interests, and combining these fused interests to predict the next item. By effectively fusing long-term and short-term interests, the invention improves the accuracy of cross-domain sequential recommendation and provides a new perspective and technical path for the research and practice of recommendation systems.

Description

Technical Field

[0001] This invention relates to the field of cross-domain sequence recommendation, and in particular to an enhanced cross-domain sequence recommendation method that combines large language models and contrastive learning.

Background Technology

[0002] Cross-domain sequence recommendation is a recommendation system technique that aims to recommend a series of items from a target domain or category to a user, while taking into account the user's historical behavior and preferences in the source domain. In the recommendation system, the user's behavior sequence spans different domains or platforms, and the recommendation system needs to predict the user's next behavior or interest based on this cross-domain sequence data.

[0003] By comprehensively extracting features from users' historical behavior and preferences with models such as GNNs and RNNs, cross-domain sequence recommendation can uncover users' potential interests and needs in different domains. Integrating features from different domains improves the model's predictive ability and helps alleviate data sparsity and cold-start problems.

[0004] However, most current cross-domain sequence recommendation systems rely primarily on numerical representations of items, i.e., item IDs or feature vectors, which causes several problems. First, ID-based recommendation systems may fail to adequately express the semantic information of items. Item IDs are typically unique identifiers that carry no specific information about the item, such as its attributes, functions, or appearance. As a result, the recommendation system may fail to accurately understand the user's query intent or preferences and may recommend items that do not meet the user's needs. Second, feature-based recommendation systems may suffer from feature sparsity. With a large number of items, each item's feature vector may contain only a few non-zero elements, which may prevent the recommendation system from fully capturing the similarities or relationships between items. This limits the performance of the recommendation system, especially for cold-start problems or long-tail item recommendation. Furthermore, item-based recommendation systems may struggle to handle complex user behavior data. User behavior data may contain diverse information, such as clicks, purchases, and favorites, and relying solely on the numerical representation of items may not capture the sequential patterns of these behaviors or the user's intent. Consequently, the recommendation system may fail to accurately predict the user's next action or preference.

Summary of the Invention

[0005] The technical problem to be solved by this invention is to provide an enhanced cross-domain sequential recommendation method combining a large language model and contrastive learning. Integrating a large language model exploits the rich information contained in item text, while contrastive learning effectively decouples the long-term and short-term interests in a sequence. Long-term interests are information that persists for a relatively long period within a sequence, whereas short-term interests are information that appears only briefly. This decoupling helps the model better understand and predict the patterns and information in the sequences. By fusing long-term and short-term interests separately, the complex structure within the sequence can be captured more precisely, addressing the problem that existing systems cannot accurately recommend items based on item IDs alone.

[0006] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows:

[0007] An enhanced cross-domain sequence recommendation method combining large language models and contrastive learning is proposed, with the following specific steps:

[0008] Step 1: Integrate the large language model; input the item name into the large language model, let it generate relevant text descriptions in the form of keyword information, and convert them into embeddings through the BERT model, adding position encoding, type encoding information, and the original text embedding;

[0009] Step 2: Fine-tune the fusion ratio; a parametric whitening layer fine-tunes the feature encoding and controls the fusion ratio of the different components. The parameters of the whitening layer are learnable.

[0010] Step 3: Decouple the long and short features of the sequence; decouple the sequence features from different domains to obtain the long-term and short-term features of the sequence respectively;

[0011] Step 4: Fuse the long-term and short-term features of different domains separately to obtain the long-term and short-term interests after fusion of different domains;

[0012] Step 5: Combine the long-term and short-term interests of different domains after fusion to predict the next item.

[0013] A further improvement to the technical solution of the present invention is as follows: Step 1 is specifically performed as follows:

[0014] Step 1.1: Input the item name and use prompts that specify the key attributes the large language model should return, so that it generates additional information in natural language form, specifically:

[0015] A = P1(K)

[0016] B = P2(K)

[0017] Where A is the generated text containing the item's keywords, B is the generated text containing item-related descriptors, K is the item name, P1 is the prompt that generates keywords, and P2 is the prompt that generates the item description.
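The two generation calls A = P1(K) and B = P2(K) can be sketched in Python as below. This is a minimal illustration under stated assumptions. The prompt wording is hypothetical, because the patent does not disclose it. `llm` stands for any function that sends a prompt to a large language model and returns its text reply, and no particular LLM service or API is implied.

```python
from typing import Callable, Tuple

# Hypothetical prompt templates standing in for P1 and P2; the patent does
# not publish the wording of its prompts.
KEYWORD_PROMPT = (
    "List the key attributes of the item '{name}' (for example brand, "
    "category and material) as comma-separated keywords."
)
DESCRIPTION_PROMPT = "Write a short, factual description of the item '{name}'."


def keyword_prompt(name: str) -> str:
    """Fill the keyword template (the prompt part of P1) with item name K."""
    return KEYWORD_PROMPT.format(name=name)


def description_prompt(name: str) -> str:
    """Fill the description template (the prompt part of P2) with item name K."""
    return DESCRIPTION_PROMPT.format(name=name)


def generate_item_text(name: str, llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (A, B): LLM-generated keyword text and description text for item K."""
    a = llm(keyword_prompt(name))
    b = llm(description_prompt(name))
    return a, b
```

For example, `generate_item_text("Trail running shoe", client)` returns the pair (A, B). Here `client` is a hypothetical callable that wraps whichever language model is used. Keeping the prompt builders separate from the model call lets the same templates be reused with any backend.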

[0018] Step 1.2: Convert the natural language information into embeddings using the pre-trained BERT model, specifically as follows:

[0019] $A_w = \mathrm{BERT}(A)$

[0020] $B_w = \mathrm{BERT}(B)$

[0021] Step 1.3: Merge multiple pieces of information to transform a single item into an embedding, specifically:

[0022] $E_w = \mathrm{LayerNorm}(A_w + B_w + C_w + D_w + O_w)$

[0023] Where $A_w$ is the keyword embedding, $B_w$ is the embedding of the description information, $C_w$ is the position information, $D_w$ is the attribute information, and $O_w$ is the embedding of the original text. The entire item sequence is converted into an embedding sequence, and a start marker $E_{cls}$ is added before the sequence:

[0024]

[0025] Where $E_{cls}$ is the start marker.
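Below is a minimal plain-Python sketch of Step 1.3. It assumes the five component embeddings are already available as equal-length lists of floats. The code sums them element-wise and applies layer normalization. It leaves out the learned gain and bias that a framework layer such as PyTorch's `nn.LayerNorm` would add, and then it prepends a start-marker vector. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (learned gain and bias omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def item_embedding(a_w: Sequence[float], b_w: Sequence[float], c_w: Sequence[float],
                   d_w: Sequence[float], o_w: Sequence[float]) -> Vector:
    """E_w = LayerNorm(A_w + B_w + C_w + D_w + O_w), summed element-wise."""
    parts = (a_w, b_w, c_w, d_w, o_w)
    if len({len(p) for p in parts}) != 1:
        raise ValueError("all component embeddings must have the same dimension")
    return layer_norm([sum(vals) for vals in zip(*parts)])


def with_start_marker(item_embs: List[Vector], e_cls: Sequence[float]) -> List[Vector]:
    """Prepend the start marker E_cls to the sequence of item embeddings."""
    return [list(e_cls)] + [list(e) for e in item_embs]
```

Summing before normalizing mirrors the formula in [0022]. The dimension check guards against `zip` silently truncating mismatched vectors.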

[0026] A further improvement of the technical solution of the present invention is that: in step 2, the original BERT representation is linearly transformed to obtain an isotropic semantic representation;

[0027] Incorporating learnable parameters into the whitening transform yields better generalization to unseen domains.

[0028] Specifically:

[0029] $E_w = (E_w - b) \cdot W$

[0030] Where W and b are learnable parameters.
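The parametric whitening transform E_w = (E_w - b)·W centers the embedding by a bias vector and then multiplies it by a matrix. The sketch below assumes a row-vector convention (the embedding times W) and uses plain Python lists. In a trained model, b and W would be learned by back-propagation. In the non-learnable BERT-whitening technique, by contrast, b is the mean embedding and W is derived from the embedding covariance.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def parametric_whitening(e_w: Sequence[float], b: Sequence[float], w: Matrix) -> Vector:
    """Compute (E_w - b) . W for one d-dim embedding and a d x d' matrix W.

    In training, b and W would be learnable parameters; here they are plain
    lists so the arithmetic is easy to follow.
    """
    if len(e_w) != len(b) or len(e_w) != len(w):
        raise ValueError("E_w, b and the rows of W must share dimension d")
    centered = [x - bi for x, bi in zip(e_w, b)]
    out_dim = len(w[0])
    # Row vector times matrix: out[j] = sum_i centered[i] * W[i][j].
    return [sum(centered[i] * w[i][j] for i in range(len(centered)))
            for j in range(out_dim)]
```

For example, with E_w = [3, 5], b = [1, 1], and W = [[1, 0], [0, 0.5]], the centered vector is [2, 4] and the output is [2, 2].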

[0031] A further improvement to the technical solution of the present invention is as follows: Step 3 is specifically as follows:

[0032] Step 3.1: Feature extraction using different feature encoders

[0033]

[0034]

[0035] Where φ is the information encoder used to obtain long-term features and a separate encoder is used to obtain short-term features; the two encoders produce different long-term and short-term features, where $u_l$ is the long-term feature and $u_s$ is the short-term feature;

[0036] Step 3.2: To better learn the long-term and short-term features, contrastive learning is used. The contrastive objective contrasts the encoder outputs with the sequence prototypes: the learned representations of the long-term and short-term features must each be closer to their corresponding prototype than to the opposite prototype:

[0037] $\mathrm{sim}(u_l, p_l) > \mathrm{sim}(u_l, p_s)$

[0038] $\mathrm{sim}(p_l, u_l) > \mathrm{sim}(p_l, u_s)$

[0039] $\mathrm{sim}(u_s, p_s) > \mathrm{sim}(u_s, p_l)$

[0040] $\mathrm{sim}(p_s, u_s) > \mathrm{sim}(p_s, u_l)$

[0041] Where

[0042] sim denotes the similarity function, and the sequence prototypes are computed as follows:

[0043]

[0044]

[0045] Where MEAN denotes the mean, $p_s$ is the short-term prototype of the sequence, $p_l$ is the long-term prototype of the sequence, and E denotes the item embeddings.

[0046] The first part of the objective function is $L_1 = f(u_l, p_l, p_s) + f(p_l, u_l, u_s) + f(u_s, p_s, p_l) + f(p_s, u_s, u_l)$, where f is computed with the vector cosine similarity function; in each term, the first argument is compared with its corresponding counterpart (second argument) and with the opposite counterpart (third argument).
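The prototype formulas and the exact form of f are not reproduced in this text, so the following is only a sketch under explicit assumptions. The long-term prototype p_l is taken as the mean of all item embeddings in the sequence. The short-term prototype p_s is taken as the mean of the last k items, where k is a hypothetical window size. f(anchor, positive, negative) is assumed to be a two-way softmax loss over cosine similarities with a hypothetical temperature tau. Each f term shrinks as the anchor becomes more similar to its positive than to its negative, which matches the four inequalities in [0037] to [0040].

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny + eps)


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prototypes(item_embs: Sequence[Sequence[float]], k: int) -> Tuple[Vector, Vector]:
    """Assumed prototypes: p_l over the whole history, p_s over the last k items."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return mean_vector(item_embs), mean_vector(item_embs[-k:])


def f(anchor, positive, negative, tau: float = 0.1) -> float:
    """Assumed term: -log softmax of cos(anchor, positive) against cos(anchor, negative)."""
    s_pos = cosine(anchor, positive) / tau
    s_neg = cosine(anchor, negative) / tau
    # -log(e^s_pos / (e^s_pos + e^s_neg)) == log(1 + e^(s_neg - s_pos))
    return math.log1p(math.exp(s_neg - s_pos))


def contrastive_l1(u_l, u_s, p_l, p_s, tau: float = 0.1) -> float:
    """L1 = f(u_l,p_l,p_s) + f(p_l,u_l,u_s) + f(u_s,p_s,p_l) + f(p_s,u_s,u_l)."""
    return (f(u_l, p_l, p_s, tau) + f(p_l, u_l, u_s, tau)
            + f(u_s, p_s, p_l, tau) + f(p_s, u_s, u_l, tau))
```

A small `tau` sharply penalizes any violation of the required ordering. This is a design choice of the sketch, not a value taken from the patent.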

[0047] A further improvement to the technical solution of the present invention is as follows: Step 4 is specifically as follows:

[0048] Step 4.1: Fusion between short-term features

[0049] $u_s = \mathrm{AttEncoder}(u_{s,1}, u_{s,2})$

[0050] Step 4.2: Fusion between long-term features

[0051] $u_l = \mathrm{AttEncoder}(u_{l,1}, u_{l,2})$

[0052] Where $u_{l,1}$ is the long-term interest in domain 1, $u_l$ is the long-term interest obtained by fusing the different domains, $u_{s,1}$ is the short-term interest in domain 1, $u_s$ is the short-term interest obtained by fusing the different domains, and AttEncoder denotes the attention mechanism.
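The internals of AttEncoder are not specified. As one hypothetical instance, the sketch below scores each domain's interest vector against a query vector with a scaled dot product. It turns the scores into softmax weights and returns the weighted sum. In a real model, the query would be a learnable parameter. For example, u_s would be computed as `att_encoder([u_s1, u_s2], q_s)` with a hypothetical query `q_s`.

```python
import math
from typing import List, Sequence

Vector = List[float]


def att_encoder(domain_vectors: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    """Hypothetical AttEncoder: attention-weighted sum of per-domain interest vectors."""
    d = len(query)
    if any(len(v) != d for v in domain_vectors):
        raise ValueError("query and domain vectors must share dimension")
    # Scaled dot-product score of each domain vector against the query.
    scores = [sum(q * x for q, x in zip(query, v)) / math.sqrt(d) for v in domain_vectors]
    peak = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, domain_vectors)) for i in range(d)]
```

A zero query weights both domains equally. A query aligned with one domain's vector shifts the fused interest toward that domain.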

[0053] A further improvement to the technical solution of the present invention is as follows: Step 5 is specifically as follows:

[0054] Step 5.1: Both the historical sequence and the target item are used as input to the fusion unit, where the historical sequence is compressed using a GRU and an attention mechanism is used for fusion. The model dynamically determines the fusion ratio of the long-term and short-term features. Formally, the final fused features are as follows:

[0055]

[0056] $a = \sigma(\mathrm{MLP}(h_t \,\|\, E(x_{t+1}) \,\|\, u_l \,\|\, u_s))$

[0057]

[0058] Where σ is the activation function, MLP is a multi-layer neural network, $h_t$ is the GRU-compressed representation of the historical sequence, $E(x_{t+1})$ is the embedding of the target item, ‖ denotes concatenation, a is the fusion weight estimated from the historical interactions, the target item, and the user's short-term and long-term interests, and $u_t$ is the fused feature;
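The fusion weight a can be sketched as follows, under clearly stated assumptions. σ is taken to be the sigmoid, and the MLP is a hypothetical network with one ReLU hidden layer. The formula for the fused feature u_t is missing from this text. The sketch therefore assumes the convex combination u_t = a·u_l + (1 - a)·u_s, a common way to mix two interest vectors with a learned gate.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def fusion_gate(h_t, e_next, u_l, u_s, w1: Matrix, b1, w2: Matrix, b2) -> float:
    """a = sigmoid(MLP(h_t || E(x_{t+1}) || u_l || u_s)) with a hypothetical 1-hidden-layer MLP."""
    z = list(h_t) + list(e_next) + list(u_l) + list(u_s)  # concatenation "||"
    hidden = [max(0.0, v) for v in linear(z, w1, b1)]      # ReLU hidden layer
    return sigmoid(linear(hidden, w2, b2)[0])              # single output unit


def fuse(a: float, u_l: Sequence[float], u_s: Sequence[float]) -> Vector:
    """Assumed fused feature u_t = a * u_l + (1 - a) * u_s."""
    return [a * lv + (1.0 - a) * sv for lv, sv in zip(u_l, u_s)]
```

Because a lies in (0, 1), the gate can lean toward stable long-term interests or toward recent short-term interests for each target item. This matches the dynamic fusion ratio described in [0054].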

[0059] Step 5.2: Predict and calculate the loss for the target sequence items.

[0060]

[0061] Where MLP denotes a multi-layer neural network whose output is the prediction for the target item.

[0062] The likelihood loss $L_2$ between the predicted and actual results is then calculated:

[0063]

[0064] Where y is the actual result.

[0065] The final loss function is $L = L_1 + \beta L_2$, where β is a hyperparameter.
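The prediction and loss formulas are also missing from this text. The sketch below assumes that the predicted output is a click probability in (0, 1). It also assumes that the likelihood loss L2 is the mean binary negative log-likelihood (binary cross-entropy). Only the final combination L = L1 + βL2 is taken directly from the patent.

```python
import math
from typing import Sequence


def nll_loss(y_true: Sequence[int], y_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Assumed L2: mean binary negative log-likelihood (binary cross-entropy)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def total_loss(l1: float, l2: float, beta: float) -> float:
    """L = L1 + beta * L2, as stated in [0065]."""
    return l1 + beta * l2
```

The hyperparameter β trades off the contrastive decoupling term against next-item prediction accuracy. Its value is not given in the patent.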

[0066] The technical solutions above give this invention the following technical advances. Compared with traditional methods that predict from item IDs alone, this invention uses a large language model to generate item text, leveraging the model's strong prior knowledge. Because large language models are not well suited to converting generated content into embeddings, this invention uses the BERT model for the embedding conversion. This introduces rich textual information that improves on predictions based solely on numerical item identifiers. In handling user interests, this invention decouples long-term and short-term interests. Rather than relying solely on conventional feature extractors such as GNNs, it separates the coupled information and then fuses it. The rationale is that users' long-term interests are relatively stable, whereas short-term interests change easily. Distinguishing between long-term and short-term interests during prediction and fusing them accordingly therefore makes the prediction results more reasonable and accurate.

Attached Figure Description

[0067] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0068] Figure 1 is a flowchart of the enhanced cross-domain sequence recommendation method of the present invention;

[0069] Figure 2 is a schematic diagram illustrating how the large language model of this invention is integrated into the item representation;

[0070] Figure 3 is a diagram of the algorithm architecture of the present invention.

Detailed Implementation

[0071] The present invention will be further described in detail below with reference to embodiments:

[0072] Figure 1 is a flowchart of the enhanced cross-domain sequence recommendation method combining a large language model and contrastive learning, and Figure 3 is a framework diagram illustrating the principle of this invention. The large language model is trained on a large-scale text corpus to learn language patterns and knowledge. It can generate natural language text or understand the meaning of text, and can handle various natural language tasks, such as text classification, question answering, and dialogue.

[0073] The specific steps are as follows:

[0074] Step 1: Integrate the large language model; as shown in Figure 2, the item name is input into the large language model, which generates a related text description in the form of keyword information. This text is then converted into an embedding by the BERT model, while position encoding, type encoding information, and the original text embedding are added.

[0075] Step 1.1: Prompts constrain the large language model to generate additional information in natural language form, where K is the item name. The formulas are as follows:

[0076] A = P1(K)

[0077] A is the generated keyword text for the item, such as the item's brand name and other crucial information.

[0078] B = P2(K)

[0079] B is the generated text containing item-related descriptors. P1 is the prompt that generates keywords, and P2 is the prompt that generates item descriptions.

[0080] Step 1.2: Convert the natural language information into embeddings using the pre-trained BERT model.

[0081] The formulas are as follows:

[0082] $A_w = \mathrm{BERT}(A)$

[0083] $B_w = \mathrm{BERT}(B)$

[0084] Step 1.3: Merge multiple pieces of information, transforming a single item into an embedding.

[0085] $E_w = \mathrm{LayerNorm}(A_w + B_w + C_w + D_w + O_w)$

[0086] Where $A_w$ is the keyword embedding, $B_w$ is the embedding of the description information, $C_w$ is the position information, $D_w$ is the attribute information, and $O_w$ is the embedding of the original text. The entire item sequence is converted into an embedding sequence, and a start marker $E_{cls}$ is added before the sequence:

[0087]

[0088] Where $E_{cls}$ is the start marker.

[0089] Step 2: Fine-tune the fusion ratio; a parametric whitening layer fine-tunes the feature encoding and controls the fusion ratio of the different components. The parameters of the whitening layer are learnable.

[0090] Semantic representations obtained from BERT are not directly suitable for recommendation tasks. BERT induces a non-smooth, anisotropic semantic space for general text, and this becomes more pronounced when this invention mixes item texts from multiple domains with significant semantic differences. A simple linear transformation is therefore applied to the original BERT representations to obtain isotropic semantic representations.

[0091] Incorporating learnable parameters into the whitening transform yields better generalization to unseen domains.

[0092] The formula is as follows:

[0093] $E_w = (E_w - b) \cdot W$

[0094] Where W and b are learnable parameters.

[0095] Step 3: Decouple the long and short features of the sequence; decouple the sequence features from different domains to obtain the long-term and short-term features of the sequence respectively. Decoupling means separating the coupled information in the feature representation so that each feature independently represents one part of the data rather than depending on other features. This improves the diversity and robustness of the feature representation and enables the model to generalize better to new data.

[0096] Step 3.1: Feature extraction using different feature encoders

[0097]

[0098]

[0099] Where φ is the information encoder used to obtain long-term features and a separate encoder is used to obtain short-term features; the two encoders produce different long-term and short-term features, where $u_l$ is the long-term feature and $u_s$ is the short-term feature;

[0100] Step 3.2: To better learn the long-term and short-term features, this invention uses contrastive learning, a self-supervised learning paradigm. The core idea of contrastive learning is to learn feature representations by comparing the similarities and differences between samples. Similar or related samples are drawn close together in the feature space, while dissimilar or unrelated samples are pushed far apart.

[0101] The contrastive objective is designed as follows: contrastive learning is performed between the encoder outputs and the sequence prototypes, requiring the learned representations of the long-term and short-term features to be closer to their corresponding prototypes than to the opposite prototypes:

[0102] $\mathrm{sim}(u_l, p_l) > \mathrm{sim}(u_l, p_s)$

[0103] $\mathrm{sim}(p_l, u_l) > \mathrm{sim}(p_l, u_s)$

[0104] $\mathrm{sim}(u_s, p_s) > \mathrm{sim}(u_s, p_l)$

[0105] $\mathrm{sim}(p_s, u_s) > \mathrm{sim}(p_s, u_l)$

[0106] Where

[0107] sim denotes the similarity function, and the sequence prototypes are computed as follows:

[0108]

[0109]

[0110] Where MEAN denotes the mean, $p_s$ is the short-term prototype of the sequence, $p_l$ is the long-term prototype of the sequence, and E denotes the item embeddings.

[0111] The first part of the objective function is $L_1 = f(u_l, p_l, p_s) + f(p_l, u_l, u_s) + f(u_s, p_s, p_l) + f(p_s, u_s, u_l)$, where f is computed with the vector cosine similarity function; in each term, the first argument is compared with its corresponding counterpart (second argument) and with the opposite counterpart (third argument).

[0112] Step 4: Fuse the long-term and short-term features of different domains separately to obtain the long-term and short-term interests after fusion of different domains;

[0113] Step 4.1: Fusion between short-term features

[0114] $u_s = \mathrm{AttEncoder}(u_{s,1}, u_{s,2})$

[0115] Step 4.2: Fusion between long-term features

[0116] $u_l = \mathrm{AttEncoder}(u_{l,1}, u_{l,2})$

[0117] Where $u_{l,1}$ is the long-term interest in domain 1, $u_l$ is the long-term interest obtained by fusing the different domains, $u_{s,1}$ is the short-term interest in domain 1, $u_s$ is the short-term interest obtained by fusing the different domains, and AttEncoder denotes the attention mechanism.

[0118] Step 5: Combine the long-term and short-term interests of different domains after fusion to predict the next item.

[0119] Step 5.1: Both the historical sequence and the target item are used as input to the fusion unit, where the historical sequence is compressed using a GRU and an attention mechanism is used for fusion. The model dynamically determines the fusion ratio of the long-term and short-term features. Formally, the final fused features are as follows:

[0120]

[0121] $a = \sigma(\mathrm{MLP}(h_t \,\|\, E(x_{t+1}) \,\|\, u_l \,\|\, u_s))$

[0122]

[0123] Where σ is the activation function, MLP is a multi-layer neural network, $h_t$ is the GRU-compressed representation of the historical sequence, $E(x_{t+1})$ is the embedding of the target item, ‖ denotes concatenation, a is the fusion weight estimated from the historical interactions, the target item, and the user's short-term and long-term interests, and $u_t$ is the fused feature;
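Step 5.1 compresses the historical sequence with a GRU to obtain h_t. The following standard-library sketch runs one GRU cell over the item embeddings and returns the final hidden state. Its final update h' = (1 - z)·n + z·h matches the one in the PyTorch `torch.nn.GRU` documentation. Here, however, the reset gate is applied before the recurrent matrix product, as in the original GRU formulation. Bias terms are omitted for brevity, and the parameter dictionary layout is hypothetical. The patent does not specify these details.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product with the matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x: Sequence[float], h: Sequence[float], p: Dict[str, Matrix]) -> Vector:
    """One GRU step; W_* are hidden x input matrices, U_* are hidden x hidden."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["W_n"], x), matvec(p["U_n"], rh))]  # candidate
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def compress_history(item_embs: Sequence[Sequence[float]], p: Dict[str, Matrix],
                     hidden: int) -> Vector:
    """Run the GRU over the item embeddings and return the final state h_t."""
    h = [0.0] * hidden
    for x in item_embs:
        h = gru_step(x, h, p)
    return h
```

The returned vector plays the role of h_t in the fusion-weight formula of [0121]. The remaining inputs to the fusion unit are the target-item embedding and the fused interests u_l and u_s.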

[0124] Step 5.2: Predict and calculate the loss for the target sequence items.

[0125]

[0126] Where MLP denotes a multi-layer neural network whose output is the prediction for the target item.

[0127] The likelihood loss $L_2$ between the predicted and actual results is then calculated:

[0128]

[0129] Where y is the actual result.

[0130] The final loss function is $L = L_1 + \beta L_2$, where β is a hyperparameter.

[0131] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims

1. An enhanced cross-domain sequence recommendation method combining large language models and contrastive learning, characterized in that the specific steps are as follows: Step 1: Integrate the large language model; input the item name into the large language model, let the large language model generate relevant text descriptions in the form of keyword information, and then convert them into embeddings through the BERT model, while adding position encoding and type encoding information; Step 2: Fine-tune the fusion ratio; a parametric whitening layer fine-tunes the feature encoding and controls the fusion ratio of the different components, and the parameters of the whitening layer are learnable; Step 3: Decouple the long and short features of the sequence; decouple the sequence features from different domains to obtain the long-term and short-term features of the sequence respectively, the specific steps being as follows: Step 3.1: Extract features using different feature encoders, where φ is an information encoder used to obtain long-term features and a separate encoder is used to obtain short-term features, $u_l$ being the long-term feature and $u_s$ the short-term feature; Step 3.2: To better learn long-term and short-term features, contrastive learning is used; the contrastive objective contrasts the encoder outputs with the sequence prototypes and requires the learned representations of the long-term and short-term features to be closer to their corresponding prototypes than to the opposite prototypes, where sim denotes the similarity function and the sequence prototypes are computed using MEAN, which denotes the mean; $p_s$ is the short-term prototype of the sequence; $p_l$ is the long-term prototype of the sequence; E denotes the item embeddings; the first part of the objective function is $L_1 = f(u_l, p_l, p_s) + f(p_l, u_l, u_s) + f(u_s, p_s, p_l) + f(p_s, u_s, u_l)$, where f is computed with the vector cosine similarity function; Step 4: Fuse the long-term and short-term features of different domains separately to obtain the long-term interest and short-term interest after fusion of different domains; Step 5: Combine the fused long-term and short-term interests from different domains to predict the next item, the specific steps being as follows: Step 5.1: Both the historical sequence and the target item are used as input to the fusion unit, where the historical sequence is compressed using a GRU and an attention mechanism is used for fusion; the model dynamically determines the fusion ratio of the long-term and short-term features, the fusion weight being $a = \sigma(\mathrm{MLP}(h_t \,\|\, E(x_{t+1}) \,\|\, u_l \,\|\, u_s))$, from which the final fused feature is obtained, where σ is the activation function; MLP is the classification layer; a is the fusion weight estimated from the historical interactions, the target item, and the user's short-term and long-term interests; $u_l$ is the long-term interest obtained by fusing the different domains; $u_s$ is the short-term interest obtained by fusing the different domains; Step 5.2: Calculate the loss for predicting the target sequence items, where MLP denotes a multi-layer neural network whose output is the prediction; the likelihood loss $L_2$ is calculated between the predicted and actual sequences, where y is the actual result; the final loss function is $L = L_1 + \beta L_2$, where β is a hyperparameter.

2. The enhanced cross-domain sequence recommendation method combining large language models and contrastive learning according to claim 1, characterized in that the specific steps of step 1 are as follows: Step 1.1: Input the item name and use prompts that specify the key attributes the large language model should return, so that it generates additional information in natural language form, specifically: A = P1(K), B = P2(K), where A is the generated text containing the item's keywords; B is the generated text containing the item's descriptors; K is the item name; P1 is a prompt that generates keywords; P2 is a prompt that generates item descriptions; Step 1.2: Convert the natural language information into embeddings using the pre-trained BERT model, specifically: $A_w = \mathrm{BERT}(A)$, $B_w = \mathrm{BERT}(B)$; Step 1.3: Merge multiple pieces of information to transform a single item into an embedding, specifically: $E_w = \mathrm{LayerNorm}(A_w + B_w + C_w + D_w + O_w)$, where $A_w$ is the keyword embedding; $B_w$ is the embedding of the description information; $C_w$ is the position information; $D_w$ is the attribute information; $O_w$ is the embedding of the original text; all item sequences are converted into embedding sequences, and a start marker $E_{cls}$ is added to the beginning of the sequence.

3. The enhanced cross-domain sequence recommendation method combining large language models and contrastive learning according to claim 1, characterized in that in step 2, the original BERT representation is linearly transformed to obtain an isotropic semantic representation, and incorporating learnable parameters into the whitening transform yields better generalization to unseen domains, specifically: $E_w = (E_w - b) \cdot W$, where W and b are learnable parameters.

4. The enhanced cross-domain sequence recommendation method combining large language models and contrastive learning according to claim 1, characterized in that the specific steps of step 4 are as follows: Step 4.1: Fusion of short-term features, specifically: $u_s = \mathrm{AttEncoder}(u_{s,1}, u_{s,2})$; Step 4.2: Fusion of long-term features, specifically: $u_l = \mathrm{AttEncoder}(u_{l,1}, u_{l,2})$, where $u_{l,1}$ is the long-term interest in domain 1; $u_l$ is the long-term interest obtained by fusing the different domains; $u_s$ is the short-term interest obtained by fusing the different domains; $u_{s,1}$ is the short-term interest in domain 1; AttEncoder denotes the attention mechanism.