A sequence recommendation method based on a sequential dependency-enhanced self-attention network

By combining GRU and self-attention mechanisms in a sequential dependency-enhanced self-attention network, the method addresses the difficulty existing techniques have in capturing high-order dependencies and preference changes in user interaction sequences, with the aim of higher recommendation accuracy and better model generalization.

CN115687772B · Active · Publication Date: 2026-06-16 · ANHUI UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ANHUI UNIV
Filing Date: 2022-11-09
Publication Date: 2026-06-16

AI Technical Summary

⚠Technical Problem

Existing sequence recommendation algorithms struggle to effectively capture high-order dependencies and changes in user preferences within user interaction sequences, resulting in low recommendation accuracy.

⚗Method used

We employ a sequential dependency-enhanced self-attention network approach, combining GRU and self-attention mechanisms. Through an embedding layer, a location information layer, a GRU module, a self-attention module, a feedforward layer, and a prediction layer, we capture the sequential dependency information and user preference change information of user interaction sequences. Furthermore, by stacking multiple layers of self-attention and feedforward layers, we improve the model's capture capability and prevent overfitting.

🎯Benefits of technology

It improves recommendation accuracy, enhances the model's generalization ability, adapts better to changes in user preferences, reduces noise interference, and prevents overfitting.

✦ Generated by Eureka AI based on patent content.

Figure CN115687772B_ABST

Abstract

The application discloses a sequence recommendation method based on a sequential dependency-enhanced self-attention network. The method comprises the following steps: 1, constructing and representing a sequence recommendation data set; 2, obtaining the feature representation of an interaction sequence; 3, obtaining the sequential dependency information and user preference change information of the interaction sequence; 4, capturing the various feature information of the interaction sequence through the sequential dependency-enhanced self-attention network; and 5, performing sequence recommendation using the finally captured representation of the interaction sequence. In modeling the relationships among interaction items, the constructed sequential dependency-enhanced self-attention network model takes into account both the sequential dependency information of the user's interaction items and the user preference change information, so that recommendation precision can be improved.

Description

Technical Field

[0001] This invention relates to the field of recommendation, and more specifically to a sequence recommendation method based on a sequential dependency-enhanced self-attention network.

Background Technology

[0002] Sequence recommendation is an important research topic in the field of recommender systems. Traditional recommender systems model user-item interactions in a static way, but we know that user interactions are often continuous, and user preferences and item popularity often change over time. Sequence recommendation emerged to address this, treating user-item interactions as a dynamic sequence of interactions and uncovering user preferences by considering the relationships between user interactions.

[0003] Traditional sequence recommendation algorithms primarily utilize Markov chain-based methods (MC), which assume that the next item a user will interact with depends on several recently interacted items. Due to this assumption, MC-based methods fail to capture high-order dependencies between items. In recent years, with the development of deep learning, models such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have been widely applied to sequence recommendation. RNNs capture sequential dependencies between items through recursive structures, but suffer from low efficiency and difficulty in preserving long-term dependencies. CNN models capture local features of the sequence by performing convolution operations on the input sequence, but their inherent limitations mean they can only capture local information and not global information. Later, with the widespread adoption of Transformer models in various fields, researchers began introducing self-attention methods into sequence recommendation. These methods can effectively capture global dependencies and allow the model to focus more on previous interactions that have a greater impact on future interactions. However, user behavior is often sequential, and user preferences tend to change over time. For example, after purchasing concert tickets, users might immediately buy plane tickets to the concert venue, and then book a hotel; users who previously preferred carbonated drinks might now prefer fruit-flavored carbonated beverages. Previous methods are not well-suited for these scenarios, resulting in low recommendation accuracy.

Summary of the Invention

[0004] To overcome the shortcomings of existing technologies, this invention proposes a sequence recommendation method based on a sequential dependency-enhanced self-attention network, aiming to better capture the sequential dependencies of user interaction sequences and information on changes in user preferences, thereby improving the accuracy of recommended items to users.

[0005] To achieve the above-mentioned objectives, the present invention adopts the following technical solution:

[0006] The sequence recommendation method based on a sequential dependency-enhanced self-attention network of this invention is characterized by the following steps:

[0007] Step 1: Obtain the user set $U$ and the interaction sequence $S_u = (s_1^u, s_2^u, \ldots, s_{|S_u|}^u)$ of any user $u$ in $U$, where $s_t^u$ represents the t-th interaction item of user $u$ and $|S_u|$ represents the length of the interaction sequence of user $u$; the item set $I$ is thus composed of the interaction sequences of all users;

[0008] Step 2: Construct a sequential dependency-enhanced self-attention network, including: an embedding layer, a location information layer, a GRU module, a self-attention module, a feedforward layer, and a prediction layer;

[0009] Step 2.1: The embedding layer uses the item vector matrix $M \in \mathbb{R}^{|I| \times d}$ to convert the interaction sequence $S_u$ of user $u$ into the embedding vector matrix $E_u$, whose t-th row is the embedding vector representation of item $s_t^u$; $|I|$ represents the size of the item set $I$; $d$ represents the dimension of the item embedding vector;

[0010] Step 2.2: The location information layer uses the position matrix $P$ to perform addition on the embedding vector matrix $E_u$, obtaining the position-embedded vector matrix $X_u$, whose t-th row $x_t$ is the vector obtained by adding the embedding vector of item $s_t^u$ to the t-th position vector of the position matrix $P$;

[0011] Step 2.3: The GRU module uses equations (1)-(4) to process the vector matrix $X_u$, obtaining the sequence dependency information and user preference change information matrix $H_u$:

[0012]

[0013]

[0014]

[0015]

[0016] In equations (1)-(4), $W_r$ and $U_r$ represent the weight matrices of the reset gate; $h_{t-1}$ represents the state information at position t-1; $r_t$ represents the intermediate vector of the reset gate at position t; $\sigma$ represents the sigmoid activation function; $\odot$ represents element-wise multiplication; $z_t$ represents the intermediate vector of the update gate at position t; $W_z$ and $U_z$ represent the weight matrices of the update gate; $\tanh$ is the hyperbolic tangent function; $\tilde{h}_t$ represents the candidate state information at position t; $W_h$ and $U_h$ represent the weight matrices to be learned for the candidate state; $h_t$ represents the sequence dependency information and user preference change information captured at position t;
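The bodies of equations (1)-(4) do not survive in this text. Assuming the module follows the standard GRU formulation, which matches the symbols defined above (with $x_t$ the t-th row of $X_u$ and bias terms omitted), the four equations would read roughly as follows. The exact form of equation (4), in particular which of the two states the update gate weights, is an assumption:

```latex
\begin{aligned}
r_t &= \sigma\bigl(W_r x_t + U_r h_{t-1}\bigr) && (1)\\
z_t &= \sigma\bigl(W_z x_t + U_z h_{t-1}\bigr) && (2)\\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1})\bigr) && (3)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && (4)
\end{aligned}
```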

[0017] Step 2.4: The self-attention module uses equations (5)-(7) to process the vector matrix $X_u$ together with the sequence dependency information and user preference change information matrix $H_u$, obtaining the self-attention information matrix:

[0018]

[0019]

[0020]

[0021] In equations (5)-(7), two weight matrices are used for the self-attention query vector and two for the self-attention value vector; $e_{ti}$ represents the attention score between the interaction item at position t and the interaction item at position i; $a_{ti}$ is the weight value obtained from the attention score between the interaction item at position t and the interaction item at position i; two further weight matrices are applied, respectively, to the embedding vector at the i-th position and to the sequence dependency and preference change information vector at the i-th position; $y_t$ represents the output self-attention information vector at position t;

[0022] Step 2.5: The feedforward layer uses equation (8) to process the self-attention information vector $y_i$, obtaining the feedforward information $\mathrm{FFN}(y_i)$:

[0023] $\mathrm{FFN}(y_i) = \mathrm{ReLU}(W_1 y_i + b_1) W_2 + b_2$ (8)

[0024] In equation (8), $W_1$ and $W_2$ are weight matrices; $b_1$ and $b_2$ are biases; and ReLU is the activation function.

[0025] Step 2.6: Equation (9) is used to normalize the self-attention information matrix, yielding the normalized information matrix, which is then input into the feedforward layer to obtain the feedforward information matrix; a residual connection is then applied to obtain the intermediate sequence representation at position t:

[0026]

[0027] In equation (9), the mean and variance are those of the vector being normalized; $\alpha$ and $\beta$ are the scaling factor and the bias; $\epsilon$ represents a constant;
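The body of equation (9) is also missing from this text. A standard layer normalization consistent with the definitions above would be the following, where the input symbol $y$ and the names $\mu$ and $\sigma^2$ for its mean and variance are assumptions introduced here:

```latex
\mathrm{LayerNorm}(y) = \alpha \odot \frac{y - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta \qquad (9)
```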

[0028] Step 2.7: The intermediate sequence representation at position t is input into the self-attention module and processed together with the sequence dependency information and user preference change information matrix $H_u$ to obtain the stacked self-attention information matrix $Y_u$; $Y_u$ is then processed as in steps 2.5 and 2.6 to obtain the final sequence representation at position t;

[0029] Step 2.8: The prediction layer uses equation (10) to process the final sequence representation at position t, computing the score $r_{i,t}$ that the network output for the t-th interaction item of user $u$ assigns to the i-th item:

[0030]

[0031] In equation (10), $M_i$ represents the embedding vector of the i-th item in the item vector matrix $M$; $r_{i,t}$ represents the relevance score of the i-th item; $T$ represents the transpose;

[0032] Step 2.9: Construct the binary cross-entropy objective function Loss using equation (11):

[0033]

[0034] In equation (11), $S$ represents the set of all user interaction sequences; $o_z$ represents the index of the expected positive sample corresponding to the z-th position, where a positive sample is the next item with which user $u$ is expected to interact; $o'_z$ represents the index of the negative sample corresponding to the z-th position, where a negative sample is an item that does not appear in the interaction sequence of user $u$; $r_{o_z,z}$ represents the score of the positive sample $o_z$ at position z; $r_{o'_z,z}$ represents the score of the negative sample $o'_z$ at position z.
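The body of equation (11) is not reproduced in this text. As a minimal sketch, assuming the common form of the binary cross-entropy objective with one sampled negative per position, the loss for one interaction sequence could be computed as below. The function names and the small `eps` guard are assumptions of this sketch, not part of the patent:

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce_loss(pos_scores, neg_scores, eps=1e-12):
    """Sum of -[log sigmoid(r_pos) + log(1 - sigmoid(r_neg))] over positions.

    pos_scores[z] is the score of the positive item at position z,
    neg_scores[z] is the score of the sampled negative item at position z.
    eps (an assumption) keeps the logarithm away from zero.
    """
    total = 0.0
    for r_pos, r_neg in zip(pos_scores, neg_scores):
        total -= math.log(sigmoid(r_pos) + eps)
        total -= math.log(1.0 - sigmoid(r_neg) + eps)
    return total
```

The loss shrinks as positive items score higher and negative items score lower, which is the behaviour the training step relies on.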

[0035] Step 2.10: Train the sequential dependency-enhanced self-attention network using backpropagation and gradient descent, updating the network parameters to minimize the objective function Loss. Training stops when the number of iterations reaches the maximum number of iterations, yielding the optimal recommendation model, which outputs scores of candidate items for an input interaction sequence; the items with the highest scores are selected for recommendation.
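The final selection step can be sketched as follows. The function name, the cutoff `k`, and the optional exclusion of already-seen items are assumptions of this sketch; the text only states that the highest-scoring items are recommended:

```python
import heapq

def recommend_top_k(scores, k, seen=()):
    """Return the ids of the k highest-scoring candidate items.

    scores maps item id -> score. Skipping items in `seen` is an
    assumption of this sketch, not something the text specifies.
    """
    seen = set(seen)
    candidates = ((score, item) for item, score in scores.items() if item not in seen)
    return [item for _, item in heapq.nlargest(k, candidates)]
```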

[0036] The present invention provides an electronic device, including a memory and a processor, characterized in that the memory is used to store a program supporting the processor in executing the sequence recommendation method, and the processor is configured to execute the program stored in the memory.

[0037] The present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the sequence recommendation method.

[0038] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0039] 1. Based on the absolute location information of the items, this invention further utilizes a gated recurrent neural network (GRU) to capture the sequential dependency information of user interaction sequences and the information of changes in user preferences. This enhances the model's ability to capture multi-faceted information, thereby better identifying changes in user preferences when making recommendations to users and thus providing them with better recommendations of the items they need.

[0040] 2. This invention combines the self-attention method and the GRU method. By incorporating the multifaceted information captured by GRU into the self-attention method, the attention mechanism also enables the model to focus more on information useful for future interactions, thereby filtering out the interference of noisy items and providing users with better useful suggestions.

[0041] 3. This invention employs a feedforward layer, which effectively compensates for the model's deficiency in capturing nonlinear features through two linear transformations and activation functions, thereby improving the model's fault tolerance and enhancing the generalization ability of this method, enabling it to adapt to various application scenarios.

[0042] 4. This invention can capture more comprehensive dependency information of interaction sequences by stacking multiple self-attention networks. At the same time, it adopts layer normalization, residual connection and dropout regularization techniques to prevent overfitting during model training, thereby improving the training effect of the model and thus improving the accuracy of recommendations to users.

Attached Figure Description

[0043] Figure 1 is a model diagram of the sequence recommendation method based on a sequential dependency-enhanced self-attention network proposed in this invention.

Detailed Implementation

[0044] In this embodiment, a sequence recommendation method based on a sequential dependency-enhanced self-attention network mainly utilizes a gated recurrent neural network (GRU) and a self-attention network to extract various information and dependencies from the interaction sequence. As shown in Figure 1, the model's input is the user's historical interaction sequence. The sequence input is processed through an embedding layer to obtain the embedding vector representation of each interaction item. Then, each interaction item's embedding is summed with the embedding vector of its position. The sum is then fed into a GRU to extract sequential dependency information and user preference change information. The GRU output and the original information are then combined using self-attention to extract multifaceted information from the interaction sequence. A feedforward layer is then used to improve the model's ability to capture non-linearity. Multiple layers of self-attention and feedforward are stacked to further enhance the model's ability to capture diverse information. Layer normalization and residual connections are used to prevent overfitting. Finally, the extracted interaction sequence representation is used to calculate the scores of candidate items, and the top-scoring items are recommended to the user. Specifically, the process is as follows:

[0045] Step 1: Obtain the user set $U$ and the interaction sequence $S_u = (s_1^u, s_2^u, \ldots, s_{|S_u|}^u)$ of any user $u$ in $U$, sorting the interaction sequence of user $u$ by timestamp; here $s_t^u$ represents the t-th interaction item of user $u$ and $|S_u|$ represents the length of the interaction sequence of user $u$; the item set $I$ is thus composed of the interaction sequences of all users; users with fewer than 5 interactions and items with fewer than 5 interactions are not considered.
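A minimal sketch of this preprocessing, assuming the raw data is a list of `(user, item, timestamp)` tuples. The function name and the choice to repeat the filter until no user or item falls below the threshold are assumptions; the text only says that users and items with fewer than 5 interactions are not considered:

```python
from collections import Counter, defaultdict

def build_sequences(interactions, min_count=5):
    """Filter sparse users and items, then build time-ordered sequences.

    interactions: iterable of (user, item, timestamp) tuples.
    Returns {user: [items sorted by timestamp]}.
    """
    rows = list(interactions)
    while True:
        user_n = Counter(u for u, _, _ in rows)
        item_n = Counter(i for _, i, _ in rows)
        kept = [r for r in rows
                if user_n[r[0]] >= min_count and item_n[r[1]] >= min_count]
        if len(kept) == len(rows):   # nothing removed, so the filter has converged
            break
        rows = kept
    sequences = defaultdict(list)
    for user, item, _ in sorted(rows, key=lambda r: r[2]):
        sequences[user].append(item)
    return dict(sequences)
```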

[0046] Step 2: Construct a sequential dependency-enhanced self-attention network which, as shown in Figure 1, includes: an embedding layer, a location information layer, a GRU module, a self-attention module, a feedforward layer, and a prediction layer;

[0047] Step 2.1: The embedding layer uses the item vector matrix $M \in \mathbb{R}^{|I| \times d}$ to convert the interaction sequence $S_u$ of user $u$ into the embedding vector matrix $E_u$, whose t-th row is the embedding vector representation of item $s_t^u$; $|I|$ represents the size of the item set $I$; $d$ represents the dimension of the item embedding vector;

[0048] Step 2.2: The location information layer uses the position matrix $P$ to perform superposition on the embedding vector matrix $E_u$, obtaining the position-embedded vector matrix $X_u$. Position information directly represents the order of the items in the user's interaction sequence. The t-th row $x_t$ of $X_u$ is the vector obtained by adding the embedding vector of item $s_t^u$ to the t-th position vector of the position matrix $P$.
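Steps 2.1 and 2.2 amount to a row lookup followed by an element-wise addition. The sketch below assumes item ids are 0-based row indices into the item matrix and that the position matrix has at least as many rows as the sequence; both are assumptions, as is the function name:

```python
def embed_with_positions(sequence, item_matrix, position_matrix):
    """Return X_u: row t is the embedding of sequence[t] plus position vector t."""
    rows = []
    for t, item in enumerate(sequence):
        e_t = item_matrix[item]       # row of M for the t-th interaction item
        p_t = position_matrix[t]      # t-th row of P
        rows.append([e + p for e, p in zip(e_t, p_t)])
    return rows
```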

[0049] Step 2.3: The GRU module uses equations (1)-(4) to process the position-embedded vector matrix $X_u$, obtaining the sequence dependency information and user preference change information matrix $H_u$:

[0050]

[0051]

[0052]

[0053]

[0054] In equations (1)-(4), $W_r$ and $U_r$ represent the weight matrices of the reset gate; $h_{t-1}$ represents the state information at position t-1; $r_t$ represents the intermediate vector of the reset gate at position t; $\sigma$ represents the sigmoid activation function; $\odot$ represents element-wise multiplication; $z_t$ represents the intermediate vector of the update gate at position t; $W_z$ and $U_z$ represent the weight matrices of the update gate; $\tanh$ is the hyperbolic tangent function; $\tilde{h}_t$ represents the candidate state information at position t; $W_h$ and $U_h$ represent the weight matrices to be learned for the candidate state; $h_t$ represents the sequence dependency information and user preference change information captured at position t. The GRU module effectively captures the sequential dependencies between user interaction items and promptly perceives changes in user preferences, thereby enabling better recommendations.
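As an illustration of step 2.3, here is a minimal pure-Python GRU under the standard formulation. The zero initial state, the absence of bias terms, the dictionary keys, and the convention that the update gate weights the candidate state are all assumptions of this sketch:

```python
import math

def matvec(W, v):
    """Matrix W (list of rows) times column vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds the matrices W_r, U_r, W_z, U_z, W_h, U_h."""
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev))]
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]            # reset gate applied to old state
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x_t), matvec(p["U_h"], gated))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def gru_sequence(X_u, p, d):
    """Run the GRU over the rows of X_u; returns H_u, one state per position."""
    h, H_u = [0.0] * d, []
    for x_t in X_u:
        h = gru_step(x_t, h, p)
        H_u.append(h)
    return H_u
```

Because each state depends on the previous one, the rows of `H_u` carry order information that plain self-attention over `X_u` would only see through the position vectors.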

[0055] Step 2.4: The self-attention module uses equations (5)-(7) to process the position-embedded vector matrix $X_u$ together with the sequence dependency information and user preference change information matrix $H_u$, obtaining the self-attention information matrix:

[0056]

[0057]

[0058]

[0059] In equations (5)-(7), separate weight matrices are used for the self-attention query vector and for the self-attention value vector; $e_{ti}$ represents the attention score between the interaction item at position t and the interaction item at position i; $a_{ti}$ is the weight value obtained from the attention score between the interaction item at position t and the interaction item at position i; two further weight matrices are applied, respectively, to the embedding vector at the i-th position and to the vector containing the sequence dependency information and user preference change information of the i-th position; $y_t$ represents the output self-attention information vector at position t; the scaling factor is mainly used to keep the attention scores entering equation (6) from becoming too large, which would drive the partial derivatives toward 0. A user's interaction sequence contains some items that are unrelated to the items the user will interact with in the future. The attention mechanism effectively captures information about the interaction items that strongly influence future interactions, reducing the interference of such noisy data.
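Equations (5)-(7) are not reproduced in this text, so the sketch below is only one plausible reading of the definitions above: queries, keys and values each combine a projection of the position-embedded vector with a projection of the GRU state, scores are scaled by the square root of the dimension, and a softmax produces the weights. The matrix names `Q1`, `Q2`, `K1`, `K2`, `V1`, `V2`, the scaling by `sqrt(d)`, and the causal mask are all assumptions:

```python
import math

def vecmat(v, W):
    """Row vector v times matrix W (list of rows)."""
    return [sum(v[k] * W[k][j] for k in range(len(v))) for j in range(len(W[0]))]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def dual_self_attention(X_u, H_u, p, causal=True):
    """Attention whose queries, keys and values each mix x_i and h_i.

    p maps the hypothetical names "Q1", "Q2", "K1", "K2", "V1", "V2"
    to d x d matrices. With causal=True (an assumption) position t
    only attends to positions i <= t.
    """
    d = len(X_u[0])
    q = [add(vecmat(x, p["Q1"]), vecmat(h, p["Q2"])) for x, h in zip(X_u, H_u)]
    k = [add(vecmat(x, p["K1"]), vecmat(h, p["K2"])) for x, h in zip(X_u, H_u)]
    v = [add(vecmat(x, p["V1"]), vecmat(h, p["V2"])) for x, h in zip(X_u, H_u)]
    Y_u = []
    for t in range(len(X_u)):
        last = t if causal else len(X_u) - 1
        e = [sum(a * b for a, b in zip(q[t], k[i])) / math.sqrt(d) for i in range(last + 1)]
        m = max(e)                                   # subtract the max for a stable softmax
        w = [math.exp(s - m) for s in e]
        total = sum(w)
        a_t = [x / total for x in w]
        Y_u.append([sum(a_t[i] * v[i][j] for i in range(last + 1)) for j in range(d)])
    return Y_u
```

Dividing the scores by `sqrt(d)` is the usual way to keep the softmax out of its saturated region, which is the concern the paragraph above raises about vanishing partial derivatives.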

[0060] Step 2.5: The feedforward layer uses equation (8) to process the self-attention information vector $y_i$, obtaining the feedforward information $\mathrm{FFN}(y_i)$:

[0061] $\mathrm{FFN}(y_i) = \mathrm{ReLU}(W_1 y_i + b_1) W_2 + b_2$ (8)

[0062] In equation (8), $W_1$ and $W_2$ are weight matrices; $b_1$ and $b_2$ are biases; ReLU is an activation function; the feedforward layer consists of linear transformations and an activation function, which add nonlinearity to the model and improve its fault tolerance and generalization ability.
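Equation (8) can be sketched for a single position as two linear maps with a ReLU in between. The row-vector convention used here (vector times matrix) and the function names are assumptions of this sketch:

```python
def linear(v, W, b):
    """Row vector v times matrix W (list of rows), plus bias b."""
    return [sum(v[k] * W[k][j] for k in range(len(v))) + b[j] for j in range(len(b))]

def feed_forward(y_i, W1, b1, W2, b2):
    """Position-wise feedforward layer: ReLU between two linear maps."""
    hidden = [max(0.0, s) for s in linear(y_i, W1, b1)]
    return linear(hidden, W2, b2)
```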

[0063] Step 2.6: Equation (9) is used to normalize the self-attention information matrix, yielding the normalized information matrix, which is then fed into the feedforward layer to obtain the feedforward information matrix; a residual connection is then applied to obtain the intermediate sequence representation at position t:

[0064]

[0065] In equation (9), the mean and variance are those of the vector being normalized; $\alpha$ and $\beta$ are the scaling factor and the bias; $\epsilon = 10^{-8}$ is a constant. Layer normalization and residual connections effectively reduce overfitting during model training and the information loss that occurs as network depth increases.
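A minimal sketch of step 2.6 for one position, assuming the standard layer normalization formula and a residual connection that adds the sublayer output back to the un-normalized input. The operands of the residual connection are not recoverable from this text, so that wiring and the function names are assumptions:

```python
import math

def layer_norm(y, alpha, beta, eps=1e-8):
    """Normalize y to zero mean and unit variance, then scale and shift."""
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    return [a * (v - mean) / math.sqrt(var + eps) + b for v, a, b in zip(y, alpha, beta)]

def sublayer(y, fn, alpha, beta):
    """Apply fn to the normalized input and add the result back to y."""
    return [v + f for v, f in zip(y, fn(layer_norm(y, alpha, beta)))]
```

Here `fn` would be the feedforward layer; passing it as an argument keeps the normalization and residual logic reusable for the stacked block in step 2.7.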

[0066] Step 2.7: The intermediate sequence representation at position t is input into the self-attention module and processed together with the sequence dependency information and user preference change information matrix $H_u$ to obtain the stacked self-attention information matrix $Y_u$; $Y_u$ is then processed as in steps 2.5 and 2.6 to obtain the final sequence representation at position t. Applying this processing twice allows the model to better capture multifaceted information from user interaction sequences, thereby improving the model's recommendation accuracy.

[0067] Step 2.8: The prediction layer uses equation (10) to process the final sequence representation at position t, computing the score $r_{i,t}$ that the network output for the t-th interaction item of user $u$ assigns to the i-th item:

[0068]

[0069] In equation (10), $M_i$ represents the embedding vector of the i-th item in the item vector matrix $M$; $r_{i,t}$ represents the relevance score of the i-th item; $T$ represents the transpose.
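Equation (10) is not reproduced here, but the definitions above (an item embedding, a transpose, and a relevance score) suggest an inner product between the final sequence representation and each item embedding. A minimal sketch under that assumption, where `f_t` is a hypothetical name for the final sequence representation at position t:

```python
def score_items(f_t, item_matrix):
    """Score every item as the dot product of f_t with its embedding row."""
    return [sum(a * b for a, b in zip(f_t, m_i)) for m_i in item_matrix]
```

Reusing the item vector matrix from the embedding layer means no separate output weights are needed for the prediction layer.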

[0070] Step 2.9: Construct the binary cross-entropy objective function Loss using equation (11):

[0071]

[0072] In equation (11), $S$ represents the set of all user interaction sequences; $o_z$ represents the index of the expected positive sample corresponding to the z-th position, where a positive sample is the next item with which user $u$ is expected to interact; $o'_z$ represents the index of the negative sample corresponding to the z-th position, where a negative sample is an item that does not appear in the interaction sequence of user $u$; $r_{o_z,z}$ represents the score of the positive sample $o_z$ at position z; $r_{o'_z,z}$ represents the score of the negative sample $o'_z$ at position z. In this embodiment, the dataset is divided into a training set, a validation set, and a test set. Each user's most recent interaction item is used for the test set, the second most recent interaction item is used for the validation set, and the rest are used for the training set.
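The split described above is a per-user leave-one-out split, which can be sketched as follows. The function name and the handling of sequences shorter than three items are assumptions of this sketch:

```python
def leave_one_out_split(sequence):
    """Split one user's time-ordered sequence into (train, valid, test).

    The last item is the test target, the second-to-last item is the
    validation target, and everything before them is training data.
    Sequences with fewer than 3 items are kept whole for training
    (an assumption of this sketch).
    """
    if len(sequence) < 3:
        return list(sequence), None, None
    return list(sequence[:-2]), sequence[-2], sequence[-1]
```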

[0073] Step 2.10: Train the sequential dependency-enhanced self-attention network using backpropagation and gradient descent. The Adam optimization algorithm is used with a learning rate of 0.001 and exponential decay rates β1 = 0.9 and β2 = 0.98 to minimize the objective function Loss and update the network parameters. Training stops when the number of iterations reaches the maximum of 600, yielding the optimal recommendation model, which outputs scores for candidate items based on the input interaction sequence; the top-scoring items are selected for recommendation.
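The optimizer settings above can be illustrated with a minimal Adam update over a flat list of parameters. The learning rate, β1, β2 and the iteration limit come from the text; the Adam stability constant `eps`, the `grad_fn` callback, and the function names are assumptions of this sketch, and a real implementation would obtain gradients by backpropagation through the network:

```python
import math

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.98, eps=1e-8):
    """Apply one Adam update in place to a flat list of parameters."""
    state["t"] += 1
    t = state["t"]
    for k, g in enumerate(grads):
        state["m"][k] = beta1 * state["m"][k] + (1 - beta1) * g
        state["v"][k] = beta2 * state["v"][k] + (1 - beta2) * g * g
        m_hat = state["m"][k] / (1 - beta1 ** t)    # bias-corrected first moment
        v_hat = state["v"][k] / (1 - beta2 ** t)    # bias-corrected second moment
        params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

def minimize(grad_fn, params, max_iters=600):
    """Run Adam for a fixed number of iterations, then stop."""
    state = {"t": 0, "m": [0.0] * len(params), "v": [0.0] * len(params)}
    for _ in range(max_iters):
        adam_step(params, grad_fn(params), state)
    return params
```

For example, `minimize(lambda p: [2 * (p[0] - 0.3)], [0.0])` moves a single parameter toward the minimum of a simple quadratic within the 600 iterations.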

[0074] In this embodiment, an electronic device includes a memory and a processor. The memory stores a program that supports the processor in executing the sequence recommendation method described above. The processor is configured to execute the program stored in the memory.

[0075] In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the above-described sequence recommendation method.

Claims

1. A sequence recommendation method based on a sequential dependency-enhanced self-attention network, characterized by the following steps:

Step 1: Obtain the user set $U$ and the interaction sequence $S_u$ of any user $u$ in $U$, where $s_t^u$ represents the t-th interaction item of user $u$ and $|S_u|$ represents the length of the interaction sequence of user $u$; the item set $I$ is thus composed of the interaction sequences of all users;

Step 2: Construct a sequential dependency-enhanced self-attention network, including: an embedding layer, a location information layer, a GRU module, a self-attention module, a feedforward layer, and a prediction layer;

Step 2.1: The embedding layer uses the item vector matrix $M$ to convert the interaction sequence $S_u$ of user $u$ into the embedding vector matrix $E_u$, whose t-th row is the embedding vector representation of item $s_t^u$; $|I|$ represents the size of the item set $I$; $d$ represents the dimension of the item embedding vector;

Step 2.2: The location information layer uses the position matrix $P$ to perform addition on the embedding vector matrix $E_u$, obtaining the position-embedded vector matrix $X_u$, whose t-th row $x_t$ is the vector obtained by adding the embedding vector of item $s_t^u$ to the t-th position vector of the position matrix $P$;

Step 2.3: The GRU module uses equations (1)-(4) to process the vector matrix $X_u$, obtaining the sequence dependency information and user preference change information matrix $H_u$:

(1) (2) (3) (4)

In equations (1)-(4), $W_r$ and $U_r$ represent the weight matrices of the reset gate; $h_{t-1}$ represents the state information at position t-1; $r_t$ represents the intermediate vector of the reset gate at position t; $\sigma$ represents the sigmoid activation function; $\odot$ represents element-wise multiplication; $z_t$ represents the intermediate vector of the update gate at position t; $W_z$ and $U_z$ represent the weight matrices of the update gate; $\tanh$ represents the hyperbolic tangent function; $\tilde{h}_t$ represents the candidate state information at position t; $W_h$ and $U_h$ represent the weight matrices to be learned for the candidate state; $h_t$ represents the sequence dependency information and user preference change information captured at position t;

Step 2.4: The self-attention module uses equations (5)-(7) to process the vector matrix $X_u$ together with the sequence dependency information and user preference change information matrix $H_u$, obtaining the self-attention information matrix:

(5) (6) (7)

In equations (5)-(7), two weight matrices are used for the self-attention query vector and two for the self-attention value vector; $e_{ti}$ represents the attention score between the interaction item at position t and the interaction item at position i; $a_{ti}$ is the weight value obtained from the attention score between the interaction item at position t and the interaction item at position i; two further weight matrices are applied, respectively, to the embedding vector at the i-th position and to the sequence dependency and preference change information vector at the i-th position; $y_t$ represents the output self-attention information vector at position t; $h_i$ represents the sequence dependency information and user preference change information captured at the i-th position; $e_{tk}$ represents the attention score between the interaction item at position t and the interaction item at position k;

Step 2.5: The feedforward layer uses equation (8) to process the self-attention information vector $y_i$, obtaining the feedforward information $\mathrm{FFN}(y_i)$:

$\mathrm{FFN}(y_i) = \mathrm{ReLU}(W_1 y_i + b_1) W_2 + b_2$ (8)

In equation (8), $W_1$ and $W_2$ are weight matrices; $b_1$ and $b_2$ are biases; ReLU is the activation function;

Step 2.6: Equation (9) is used to normalize the self-attention information matrix, yielding the normalized information matrix, which is then input into the feedforward layer to obtain the feedforward information matrix; a residual connection is then applied to obtain the intermediate sequence representation at position t:

(9)

In equation (9), the mean and variance are those of the vector being normalized; $\alpha$ and $\beta$ are the scaling factor and the bias; $\epsilon$ represents a constant;

Step 2.7: The intermediate sequence representation at position t is input into the self-attention module and processed together with the sequence dependency information and user preference change information matrix $H_u$ to obtain the stacked self-attention information matrix $Y_u$; $Y_u$ is then processed as in steps 2.5 and 2.6 to obtain the final sequence representation at position t;

Step 2.8: The prediction layer uses equation (10) to process the final sequence representation at position t, computing the score $r_{i,t}$ that the network output for the t-th interaction item of user $u$ assigns to the i-th item:

(10)

In equation (10), $M_i$ represents the embedding vector of the i-th item in the item vector matrix $M$; $r_{i,t}$ represents the relevance score of the i-th item; $T$ represents the transpose;

Step 2.9: Construct the binary cross-entropy objective function Loss using equation (11):

(11)

In equation (11), $S$ represents the set of all user interaction sequences; $o_z$ represents the index of the expected positive sample corresponding to the z-th position, where a positive sample is the next item with which user $u$ is expected to interact; $o'_z$ represents the index of the negative sample corresponding to the z-th position, where a negative sample is an item that does not appear in the interaction sequence of user $u$; $r_{o_z,z}$ represents the score of the positive sample $o_z$ at position z; $r_{o'_z,z}$ represents the score of the negative sample $o'_z$ at position z;

Step 2.10: Train the sequential dependency-enhanced self-attention network using backpropagation and gradient descent, updating the network parameters to minimize the objective function Loss. Training stops when the number of iterations reaches the maximum number of iterations, yielding the optimal recommendation model, which outputs scores for candidate items based on the input interaction sequence; the top-scoring items are selected for recommendation.

2. An electronic device, comprising a memory and a processor, characterized in that the memory is used to store a program that supports the processor in executing the sequence recommendation method of claim 1, and the processor is configured to execute the program stored in the memory.

3. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the sequence recommendation method of claim 1.