User profile construction method based on behavior sequence modeling and graph neural networks
By constructing user profiles with behavioral sequence modeling and graph neural networks, the method resolves the difficulty of reconciling the temporal dependencies of user behavior sequences with the structural dependencies of interaction graphs, generates dynamic user profiles, and improves the accuracy and cold-start capability of recommendation systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- ZHONGKE MICRO DOT TECH CO LTD
- Filing Date
- 2026-05-18
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies struggle to balance the temporal dependencies of user behavior sequences with the structural dependencies of user-item interaction graphs in complex business scenarios, resulting in low recommendation accuracy, difficulties in cold starts, and a lack of diversity in user profile representation.
Using behavior sequence modeling and graph neural networks, the method obtains users' historical interaction behavior logs, constructs a user-item interaction bipartite graph, and generates dynamic user profile vectors with a multi-head self-attention network and a multi-layer graph attention network, thereby fusing behavioral temporal information with graph structure information.
It achieves high-fidelity extraction and dynamic expression of user interests, improves recommendation accuracy and cold start capability, and provides more timely and robust personalized decision support.
Figure CN122221908A_ABST
Description
Technical Field
[0001] This invention belongs to the field of data processing and in particular relates to a method for constructing user profiles based on behavioral sequence modeling and graph neural networks.
Background Technology
[0002] In complex business scenarios, user behavior patterns are highly dynamic, sparse, and multimodal. For example, on large e-commerce platforms, user interests may switch rapidly among behavior types such as "browsing-clicking-favoriting-purchasing" in response to promotional activities, seasonal changes, or trending events. In short-video and news recommendation, users' immediate interests and long-term preferences are often interwoven within millisecond-level interaction sequences. In cross-domain recommendation scenarios, user behavior data from different business lines is heterogeneous at the attribute metadata level. These factors require the system to handle everything from minute-level real-time intent drift to month-level long-term interest evolution, while also accounting for correlations between static user attributes, such as age and region, and item attributes, such as category and price. Furthermore, the higher-order interaction relationships between users and items may be dynamically restructured as a session progresses.
[0003] In existing technologies, a common approach constructs user profiles with traditional collaborative filtering or logistic regression models, either by statistically analyzing the frequency of users' historical interactions or by manually constructing cross-feature combinations. This type of method is effective in scenarios with dense user behavior data and relatively static business models. However, when faced with high-dimensional, sparse, real-time behavior streams, static statistical features struggle to capture the temporal evolution of user interests, and manual feature engineering can neither exhaustively enumerate high-order interaction patterns nor adequately exploit the ordering and intent information within behavior sequences. As a result, profile updates lag behind real-time changes in user interests, and the system cannot respond effectively to sudden shifts or unexpected user needs.
[0004] Another approach employs deep neural networks for sequence modeling, for example encoding user behavior logs with recurrent neural networks or their variants in an attempt to uncover dependencies between behaviors. While this improves the representation of short-term user interests to some extent, existing sequence models still have shortcomings when integrating multi-source heterogeneous information. First, the truncated sequence modeling used to reduce computational latency may lose long-term and periodic interest patterns, making it difficult to preserve cross-session behavioral dependencies while maintaining real-time inference efficiency. Second, the inherent chain structure of sequence models limits their ability to capture long-distance dependencies between non-adjacent nodes in the user-item interaction graph, so higher-order community influences cannot be uncovered. Third, existing solutions often use simple concatenation or summation to jointly embed behavior types and item identifiers, ignoring both the essential differences in semantic strength between behavior types and the attenuating or reinforcing effect of a behavior's temporal context on the interest representation; consequently, the generated user vectors struggle to achieve accurate decoupling and dynamic balance when facing multi-peaked interest distributions.
[0005] Therefore, the technical problem that existing technologies urgently need to solve is how to balance the temporal dependence of user behavior sequences with the structural dependence of user-item interaction graphs, and thereby achieve high-fidelity, vector-level extraction and expression of user interests ranging from micro-level behavioral patterns to macro-level community influences. Solving this problem would address the low recommendation accuracy, cold-start difficulties, and one-dimensional profile representations of existing recommendation systems in scenarios with sparse behavior, drifting interests, and complex relationships.
Summary of the Invention
[0006] To address the shortcomings of existing technologies, this invention proposes a user profile construction method based on behavioral sequence modeling and graph neural networks. The method includes: acquiring user historical interaction behavior logs and associated metadata; segmenting the data into session sequences based on timestamps; jointly embedding and encoding behavior types and item identifiers to generate behavior vector sequences with temporal context; initializing attribute feature vectors for user and item nodes; constructing a weighted directed user-item interaction bipartite graph based on the session sequences, with edge weights determined by both behavior type and time; constructing a fusion modeling module; encoding the behavior vector sequences using a multi-head self-attention network to output sequence representation vectors representing long-term interests and short-term intentions; performing message passing on the interaction bipartite graph using a multi-layer graph attention network to output user node representation vectors incorporating neighborhood information; and generating a unified dynamic user profile vector based on dynamic fusion weights. Finally, the profile vector is input into the downstream task adaptation layer to generate task parameters and send them to the business engine. This invention balances behavioral temporal dependence and graph structure dependence, achieving high-fidelity extraction and dynamic expression of user interests, effectively improving recommendation accuracy and cold-start capability in complex scenarios.
[0007] To achieve the above objectives, the present invention provides the following technical solution:
[0008] The user profile construction method based on behavioral sequence modeling and graph neural networks includes:
[0009] Obtain the target user's historical interaction behavior logs and the user attribute metadata and item attribute metadata associated with the historical interaction behavior logs; based on the timestamps in the historical interaction behavior logs, segment the user's interaction events into a session sequence with a temporal order, and perform joint embedding encoding on the behavior type and item identifier in the session sequence to generate a behavior vector sequence with temporal context;
[0010] Based on the user attribute metadata and item attribute metadata, initialize the attribute feature vectors of user nodes and item nodes, and construct a user-item interaction bipartite graph according to the behavior order and behavior type in the session sequence; wherein, the weight of the directed edge between user nodes and item nodes in the user-item interaction bipartite graph is determined by the behavior type and the behavior occurrence time.
[0011] Construct a fusion modeling module, which performs the following operations:
[0012] The behavior vector sequence is input into a multi-head self-attention network for encoding, and the output is a sequence representation vector representing the user's interests and intentions;
[0013] The user-item interaction bipartite graph is input into a multi-layer graph attention network for message passing and node representation update, and the output is a user node representation vector that integrates neighborhood information.
[0014] Based on the sequence representation vector and the user node representation vector combined with a preset dynamic fusion weight, a unified dynamic user profile vector is generated.
[0015] The dynamic user profile vector is input into the downstream task adaptation layer to generate a set of task parameters for the corresponding business scenario. The set of task parameters is then sent to the business engine through the service interface to drive personalized decision execution.
[0016] Specifically, segmenting the user's interaction events into session sequences with a temporal order includes:
[0017] Based on the timestamps in the historical interaction behavior logs, the interaction events are divided into candidate session sequences according to the preset initial session segmentation parameters;
[0018] The behavior type and item identifier in the candidate session sequence are jointly embedded and encoded to generate a behavior type embedding vector and an item identifier embedding vector, and a time decay factor is calculated based on the time interval between adjacent events in the candidate session sequence.
[0019] The time decay factor is fused with the behavior type embedding vector and the item identifier embedding vector to obtain a candidate behavior vector sequence with temporal context.
[0020] Calculate the intent cohesion index of the candidate session sequence, which is used to characterize the comprehensive consistency of behaviors within the sequence in terms of semantic similarity and temporal continuity;
[0021] Determine whether the intent cohesion index is less than a first preset threshold;
[0022] If the intent cohesion index is less than the first preset threshold, the candidate session sequence is determined as the final session sequence, and the candidate behavior vector sequence is output as a behavior vector sequence with temporal context.
[0023] If the intent cohesion index is greater than or equal to the first preset threshold, the initial session segmentation parameters are adjusted according to the intent cohesion index, and the process returns to the step of dividing the interaction events into candidate session sequences based on the timestamps in the historical interaction behavior logs and the preset initial session segmentation parameters, repeating until the intent cohesion index is less than the first preset threshold.
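The segmentation loop of [0017] to [0023] can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the initial session segmentation parameter is assumed to be an idle-gap threshold in seconds, `cohesion_index` is a hypothetical caller-supplied function (the source gives no formula for the intent cohesion index), and shrinking the gap in proportion to threshold / index is an assumed adjustment rule. The timestamps are the three U1001 events described in [0085].

```python
from datetime import datetime


def split_sessions(timestamps, gap_seconds):
    """Split chronologically sorted datetimes into sessions of event indices.

    Assumption: the segmentation parameter is an idle-gap threshold; a new
    session starts when two adjacent events are more than gap_seconds apart.
    """
    sessions, current = [], []
    for idx, ts in enumerate(timestamps):
        if current and (ts - timestamps[current[-1]]).total_seconds() > gap_seconds:
            sessions.append(current)
            current = []
        current.append(idx)
    if current:
        sessions.append(current)
    return sessions


def segment_until_accepted(timestamps, gap_seconds, cohesion_index, threshold, max_iter=50):
    """Re-segment until the largest per-session index is below `threshold`.

    `cohesion_index` is a hypothetical function (session -> float). As in the
    source, sessions are accepted when the index is below the threshold.
    `timestamps` must be non-empty and `threshold` positive.
    """
    for _ in range(max_iter):
        sessions = split_sessions(timestamps, gap_seconds)
        worst = max(cohesion_index(s) for s in sessions)
        if worst < threshold:
            break
        # worst >= threshold > 0, so the ratio lies in (0, 1]; clamp to force progress.
        gap_seconds *= max(0.1, min(0.9, threshold / worst))
    return sessions, gap_seconds


log = [datetime(2025, 3, 1, 14, 23, 10),   # click P1001
       datetime(2025, 3, 2, 9, 15, 30),    # favorite P1001
       datetime(2025, 3, 2, 15, 40, 20)]   # purchase P1001
print(split_sessions(log, 8 * 3600))  # [[0], [1, 2]]
```

With an 8-hour gap, the click on March 1 forms its own session, while the favorite and purchase on March 2 (about 6.4 hours apart) fall into one session.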
[0024] Specifically, calculating the time decay factor based on the time interval between adjacent events in the candidate session sequence includes:
[0025] Obtain a candidate session sequence, which includes interactive events arranged in chronological order and the absolute timestamps corresponding to each event;
[0026] Based on the absolute timestamp, the time interval between adjacent events is calculated to obtain a time interval sequence;
[0027] Initialize the decay parameters of the time decay function, wherein the decay parameters include a reference decay rate and a time scale factor;
[0028] Based on the time interval sequence and the decay parameter, the time decay factor of each event relative to the current time is calculated using the time decay function to obtain the time decay factor sequence.
[0029] Based on the decay parameter of the current iteration, the time decay factor sequence of the current iteration is fused with the behavior type embedding vector and the item identifier embedding vector of the corresponding event to generate the first candidate behavior vector sequence.
[0030] Calculate the temporal continuity index of the first candidate behavior vector sequence. The temporal continuity index quantifies the consistency between the mean cosine similarity of adjacent behavior vectors in the semantic space and the time decay factors.
[0031] Determine whether the temporal continuity index is less than a second preset threshold; if the temporal continuity index is less than the second preset threshold, then output the first candidate behavior vector sequence as a behavior vector sequence with temporal context;
[0032] If the temporal continuity index is greater than or equal to the second preset threshold, the decay parameters are adjusted according to the difference between the temporal continuity index and the second preset threshold, and the process returns to the step of calculating the time decay factor of each event relative to the current time using the time decay function based on the time interval sequence and the decay parameters, repeating until the temporal continuity index is less than the second preset threshold.
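A hedged Python sketch of [0025] to [0029] follows. The source names a reference decay rate and a time scale factor but not the decay function itself, so the exponential form exp(-rate * age / scale) is an assumption, as is fusing an event by scaling the sum of its behavior type and item identifier embeddings with its decay factor. The iterative adjustment driven by the temporal continuity index is omitted because its formula is not given.

```python
import math


def interval_sequence(timestamps):
    """Time intervals (seconds) between adjacent events."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def time_decay_factors(timestamps, now, base_rate, time_scale):
    """Decay factor of each event relative to `now`.

    Assumed decay function: exp(-base_rate * age / time_scale), where each
    event's age is rebuilt from the interval sequence plus the gap between
    the last event and `now`.
    """
    age = now - timestamps[-1]
    ages = [age]
    for gap in reversed(interval_sequence(timestamps)):
        age += gap
        ages.append(age)
    ages.reverse()
    return [math.exp(-base_rate * a / time_scale) for a in ages]


def fuse_event(behavior_emb, item_emb, decay):
    """Assumed fusion: element-wise sum of both embeddings, scaled by the decay."""
    return [decay * (b + i) for b, i in zip(behavior_emb, item_emb)]


# Three events one hour apart, evaluated at the time of the last event.
factors = time_decay_factors([0, 3600, 7200], now=7200, base_rate=1.0, time_scale=3600)
print([round(f, 4) for f in factors])  # [0.1353, 0.3679, 1.0]
```

Here the time scale factor sets the unit in which age is measured (one hour), and the reference decay rate sets how quickly older events lose influence.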
[0033] Specifically, constructing the user-item interaction bipartite graph includes:
[0034] Obtain user attribute metadata and item attribute metadata. The user attribute metadata includes user numerical features, user category features, and user text features. The item attribute metadata includes item numerical features, item category features, and item text features.
[0035] The user numerical features and item numerical features are normalized respectively to obtain normalized user numerical vectors and item numerical vectors.
[0036] Construct a categorical embedding matrix to map the user categorical features and item categorical features into corresponding user categorical embedding vectors and item categorical embedding vectors;
[0037] A pre-trained BERT model is used to extract semantic vectors of the user text features as user text embedding vectors and semantic vectors of the item text features as item text embedding vectors.
[0038] The normalized user numerical vector, user category embedding vector, and user text embedding vector are fused to generate the initial attribute feature vector of the user node; the normalized item numerical vector, item category embedding vector, and item text embedding vector are fused to generate the initial attribute feature vector of the item node.
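The node initialization of [0034] to [0038] might look like the following sketch. Min-max normalization, concatenation as the fusion operator, the embedding tables, the normalization bounds, and the text vector (a stand-in for a precomputed BERT sentence embedding) are all hypothetical; only the attribute values of user U1001 come from the embodiment in [0085].

```python
def min_max(values, mins, maxs):
    """Min-max normalize numerical features with dataset-level bounds (assumed scheme)."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(values, mins, maxs)]


def init_node_vector(numeric, mins, maxs, categories, cat_tables, text_vec):
    """Concatenate normalized numerics, categorical embeddings and a text embedding.

    Concatenation is an assumed fusion operator; `text_vec` stands in for a
    precomputed BERT sentence vector.
    """
    vec = min_max(numeric, mins, maxs)
    for field, value in categories.items():
        vec.extend(cat_tables[field][value])  # row lookup in the categorical embedding matrix
    vec.extend(text_vec)
    return vec


# Hypothetical bounds, embedding tables and text vector for user U1001.
cat_tables = {
    "membership": {"Gold": [0.2, -0.1], "Silver": [0.0, 0.3]},
    "channel": {"Official Website": [0.5, 0.1], "App": [-0.4, 0.2]},
}
u1001 = init_node_vector(
    numeric=[128, 2350], mins=[0, 0], maxs=[365, 10000],
    categories={"membership": "Gold", "channel": "Official Website"},
    cat_tables=cat_tables, text_vec=[0.12, -0.03, 0.4],
)
print(len(u1001))  # 9
```

Item nodes such as P1001 would be initialized the same way from their own numerical, categorical, and text features.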
[0039] Specifically, constructing a user-item interaction bipartite graph also includes:
[0040] Obtain the final session sequence, which contains interaction events arranged in chronological order. Each interaction event records the user identifier, item identifier, behavior type, and absolute timestamp.
[0041] For each pair of users and items that interact, extract all events involving the corresponding user and the corresponding item from the final session sequence, and arrange them in chronological order to obtain an event list;
[0042] Initialize a learnable parameter set, which includes intensity weight parameters and edge weight time decay coefficients corresponding to behavior types;
[0043] Based on the event list, the weight of the directed edge from the corresponding user to the corresponding item is calculated by the edge weight aggregation function. The edge weight aggregation function is used to combine the behavior type intensity weight parameter of each event in the event list with the time decay effect of the corresponding event occurrence time relative to the current time, and to accumulate the contribution of all events.
[0044] Set the weight of the directed edge from the corresponding item to the corresponding user to be equal to the weight of the directed edge from the corresponding user to the corresponding item.
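One plausible form of the edge weight aggregation function in [0043] and [0044] is w(u, i) = sum over k of s(b_k) * exp(-lambda * (t_now - t_k)), where s(b_k) is the intensity weight of the k-th event's behavior type and lambda is the edge weight time decay coefficient. Both the exponential form and the initial intensity values below are assumptions; the event times are the U1001/P1001 events of [0085], expressed in seconds after the first click.

```python
import math

# Hypothetical initial values of the learnable per-behavior intensity weights.
INTENSITY = {"click": 1.0, "favorite": 2.0, "purchase": 4.0}


def edge_weight(events, now, decay_coeff, intensity=INTENSITY):
    """Sum of intensity[behavior] * exp(-decay_coeff * (now - t)) over all events."""
    return sum(intensity[b] * math.exp(-decay_coeff * (now - t)) for b, t in events)


def build_edges(interactions, now, decay_coeff):
    """interactions: {(user, item): [(behavior, timestamp), ...]} in time order.

    Returns directed edge weights; the item->user edge copies the user->item weight.
    """
    edges = {}
    for (user, item), events in interactions.items():
        w = edge_weight(events, now, decay_coeff)
        edges[(user, item)] = w
        edges[(item, user)] = w
    return edges


# U1001's three events on P1001, evaluated at the purchase time with a
# decay coefficient of 1 per day (expressed per second).
events = [("click", 0), ("favorite", 67940), ("purchase", 91030)]
edges = build_edges({("U1001", "P1001"): events}, now=91030, decay_coeff=1 / 86400)
print(round(edges[("U1001", "P1001")], 3))
```

Because recent, high-intensity behaviors dominate the sum, the purchase contributes most of the weight while the day-old click contributes the least.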
[0045] Specifically, constructing a user-item interaction bipartite graph also includes:
[0046] Based on all user nodes, item nodes, and the calculated directed edge weights, construct a bipartite graph of user-item interactions;
[0047] Calculate the node discrimination index of the user-item interaction bipartite graph. The node discrimination index is used to quantify the discriminability of different user nodes and item nodes in the feature space and graph structure.
[0048] Determine whether the node discrimination index is less than a fourth preset threshold;
[0049] If the node discrimination index is less than the fourth preset threshold, then the currently constructed user-item interaction bipartite graph is determined as the final user-item interaction bipartite graph.
[0050] If the node discrimination index is greater than or equal to the fourth preset threshold, then based on the difference between the node discrimination index and the fourth preset threshold, update the intensity weight parameter and edge weight time decay coefficient corresponding to the behavior type in the learnable parameter set, and return to execute the step of calculating the directed edge weight based on the event list through the edge weight aggregation function, until the node discrimination index is less than the fourth preset threshold.
[0051] Specifically, outputting the sequence representation vector representing the user's interests and intentions includes:
[0052] Obtain the behavior vector sequence, which contains N behavior vectors arranged in chronological order, each behavior vector corresponding to an interaction event;
[0053] Add position encoding to the behavior vector sequence to generate a behavior vector sequence with position information;
[0054] The behavior vector sequence with position information is input into a multi-head self-attention network, which calculates attention weights through multiple parallel attention heads. Each attention head linearly transforms the input to obtain a query matrix, a key matrix, and a value matrix, and calculates scaled dot-product attention. The outputs of all attention heads are concatenated and then linearly transformed to obtain the self-attention output sequence.
[0055] The self-attention output sequence is input into a feedforward neural network, and after two layers of linear transformation and activation function processing, a feedforward output sequence is obtained.
[0056] The feedforward output sequence is subjected to layer normalization and residual connection with the self-attention output sequence to obtain the encoded vector sequence.
[0057] Extract the vector of the last time step from the encoded vector sequence as a short-term vector representing the user's intent, and perform average pooling on the entire encoded vector sequence to obtain a long-term vector representing the user's interest.
[0058] The long-term vector is concatenated with the short-term vector to obtain a sequence representation vector that integrates interest and intent.
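The encoder of [0052] to [0058] can be sketched in dependency-free Python as below. Assumptions: sinusoidal positional encoding, a ReLU in the feedforward network, a single encoder layer, and randomly initialized weights standing in for trained ones. The normalization step follows [0056] literally, applying layer normalization to the residual sum of the feedforward and self-attention outputs.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v, eps=1e-6):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def positional_encoding(n, d):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones (assumed scheme)."""
    return [[math.sin(p / 10000 ** (i / d)) if i % 2 == 0
             else math.cos(p / 10000 ** ((i - 1) / d)) for i in range(d)]
            for p in range(n)]


def init_params(d, heads, d_ff, seed=0):
    """Random stand-ins for trained weights; d must be divisible by heads."""
    rng = random.Random(seed)
    dh = d // heads

    def mat(r, c):
        return [[rng.gauss(0, 0.1) for _ in range(c)] for _ in range(r)]

    return {"heads": [(mat(d, dh), mat(d, dh), mat(d, dh)) for _ in range(heads)],
            "wo": mat(heads * dh, d), "w1": mat(d, d_ff), "w2": mat(d_ff, d)}


def encode_sequence(x, params):
    """x: N behavior vectors (N x d) in time order -> (encoded sequence, sequence repr)."""
    n, d = len(x), len(x[0])
    h = [[a + b for a, b in zip(row, pe)] for row, pe in zip(x, positional_encoding(n, d))]
    head_outputs = []
    for wq, wk, wv in params["heads"]:
        q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
        scale = math.sqrt(len(q[0]))
        scores = matmul(q, [list(col) for col in zip(*k)])  # Q K^T
        weights = [softmax([s / scale for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    concat = [[val for ho in head_outputs for val in ho[t]] for t in range(n)]
    attn = matmul(concat, params["wo"])  # self-attention output sequence
    hidden = [[max(0.0, z) for z in row] for row in matmul(attn, params["w1"])]  # ReLU
    ffn = matmul(hidden, params["w2"])
    # Layer normalization over the residual sum of FFN and attention outputs.
    encoded = [layer_norm([a + f for a, f in zip(ra, rf)]) for ra, rf in zip(attn, ffn)]
    short_term = encoded[-1]  # last time step: short-term intent
    long_term = [sum(col) / n for col in zip(*encoded)]  # mean pooling: long-term interest
    return encoded, long_term + short_term


params = init_params(d=4, heads=2, d_ff=8)
x = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1], [0.5, 0.4, -0.2, 0.0]]
encoded, seq_repr = encode_sequence(x, params)
print(len(seq_repr))  # 8: long-term (4 values) followed by short-term (4 values)
```

Placing the long-term vector before the short-term vector mirrors the concatenation order stated in [0058].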
[0059] Specifically, outputting the user node representation vector that incorporates neighborhood information includes:
[0060] Obtain the final user-item interaction bipartite graph, which includes a set of user nodes, a set of item nodes, and directed edge weights between nodes. Each user node and item node includes the initial attribute feature vector.
[0061] Initialize the current layer node representation vector to the initial attribute feature vector;
[0062] Set the number of layers L of the graph attention network and initialize the learnable parameters of each layer. The learnable parameters include the query, key, and value transformation matrices used for attention weight calculation and the linear transformation matrix applied after multi-head concatenation.
[0063] For each layer of the graph attention network, perform the following operations in sequence:
[0064] For each target node, collect all its neighbor nodes and the corresponding directed edge weights;
[0065] Calculate the attention coefficient between the target node and each neighbor node. The attention coefficient is computed by a single-layer feedforward network from the current representation of the target node, the current representation of the neighbor node, and the weight of the directed edge between them. The attention coefficients of all neighbors of the same target node are normalized using the softmax function.
[0066] Specifically, outputting the user node representation vector that incorporates neighborhood information further includes:
[0067] Based on the normalized attention coefficients, the value vectors of neighboring nodes are weighted and summed to obtain the single-head aggregated vector of the target node;
[0068] Multiple independent attention heads are computed in parallel. The aggregated vectors obtained from each attention head are concatenated along the feature dimension and then subjected to a linear transformation to obtain the layer output vector of the target node.
[0069] The layer output vector of the target node output by the current layer is normalized to obtain a normalized vector; the normalized vector is added element by element to the node representation vector of the target node input by the current layer to obtain the updated node representation vector of the current layer.
[0070] The node representation vectors of all user nodes output by the last layer of the graph attention network are determined as user node representation vectors that incorporate neighborhood information.
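A sketch of the graph attention update in [0061] to [0070] follows, assuming all node attribute vectors have already been projected to a common dimension d. The source says the attention coefficient comes from a single-layer feedforward network over the target representation, the neighbor representation, and the edge weight; the scorer LeakyReLU(a_q . (W_q h_i) + a_k . (W_k h_j) + a_w * w_ij) used here is one assumed reading of that. Keeping isolated nodes unchanged and the example graph are also hypothetical.

```python
import math
import random


def proj(v, w):
    """Row vector v (len d_in) times matrix w (d_in x d_out)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v, eps=1e-6):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def gat_layer(h, neighbors, params):
    """One edge-weighted, multi-head graph attention layer.

    h: {node: vector}; neighbors: {node: [(neighbor, edge_weight), ...]}.
    """
    out = {}
    for node, vec in h.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            out[node] = vec  # isolated node keeps its representation (assumption)
            continue
        concat = []
        for wq, wk, wv, a_q, a_k, a_w in params["heads"]:
            q = proj(vec, wq)
            scores = []
            for nb, w in nbrs:
                s = dot(a_q, q) + dot(a_k, proj(h[nb], wk)) + a_w * w
                scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU
            alphas = softmax(scores)  # normalized over this node's neighbors
            agg = [0.0] * len(wv[0])
            for alpha, (nb, _) in zip(alphas, nbrs):
                agg = [g + alpha * val for g, val in zip(agg, proj(h[nb], wv))]
            concat.extend(agg)  # concatenate heads along the feature dimension
        normed = layer_norm(proj(concat, params["wo"]))
        out[node] = [a + b for a, b in zip(normed, vec)]  # residual connection
    return out


def init_layer(d, heads, rng):
    dh = d // heads

    def mat(r, c):
        return [[rng.gauss(0, 0.1) for _ in range(c)] for _ in range(r)]

    def vec(n):
        return [rng.gauss(0, 0.1) for _ in range(n)]

    return {"heads": [(mat(d, dh), mat(d, dh), mat(d, dh), vec(dh), vec(dh), rng.gauss(0, 0.1))
                      for _ in range(heads)],
            "wo": mat(heads * dh, d)}


def user_representations(h0, neighbors, layers, user_nodes):
    h = h0
    for params in layers:
        h = gat_layer(h, neighbors, params)
    return {u: h[u] for u in user_nodes}


rng = random.Random(0)
h0 = {"U1": [0.1, 0.2, 0.0, 0.3], "U2": [0.0, 0.1, 0.4, 0.2],
      "P1": [0.3, 0.0, 0.1, 0.1], "P2": [0.2, 0.2, 0.2, 0.0]}
neighbors = {"U1": [("P1", 5.88)], "P1": [("U1", 5.88), ("U2", 1.0)],
             "U2": [("P1", 1.0), ("P2", 2.0)], "P2": [("U2", 2.0)]}
layers = [init_layer(4, 2, rng) for _ in range(2)]
users = user_representations(h0, neighbors, layers, ["U1", "U2"])
```

With L = 2 layers, each user representation absorbs information from items it interacted with and, through them, from other users of the same items.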
[0071] Specifically, generating the unified dynamic user profile vector includes:
[0072] Obtain the sequence representation vector and the user node representation vector that incorporates neighborhood information;
[0073] Initialize the parameters of the fusion gated network, which includes a first fully connected layer, a second fully connected layer, and a softmax output layer;
[0074] The sequence representation vector is concatenated with the user node representation vector to obtain a joint feature vector;
[0075] The joint feature vector is input into the fusion gated network, mapped to the hidden layer dimension through the first fully connected layer, and then mapped to the two-dimensional space through the second fully connected layer. The two dynamic fusion weights are obtained by normalization through the softmax output layer, which correspond to the weight α of the sequence representation vector and the weight β of the user node representation vector, respectively, and α+β=1.
[0076] Multiply the sequence representation vector by the weight α to obtain the weighted sequence representation components;
[0077] Multiply the user node representation vector by the weight β to obtain the weighted user node representation components.
[0078] The weighted sequence representation component and the weighted user node representation component are added element by element to generate a unified dynamic user profile vector.
[0079] The dynamic user profile vector is output to the downstream task adaptation layer.
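The gated fusion of [0073] to [0078] reduces to a few lines. In this sketch the ReLU between the two fully connected layers and the random initialization are assumptions, and the sequence representation and user node representation are assumed to share one dimension so that they can be added element by element.

```python
import math
import random


def dense(v, w, b):
    """Fully connected layer: v (len d_in) @ w (d_in x d_out) + b."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) + b[j] for j in range(len(b))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def init_gate(dim, hidden, seed=0):
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(2 * dim)],
        "b1": [0.0] * hidden,
        "w2": [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(hidden)],
        "b2": [0.0, 0.0],
    }


def fuse_profile(seq_repr, user_repr, params):
    """[alpha, beta] = softmax(FC2(ReLU(FC1([s || u])))); profile = alpha*s + beta*u."""
    if len(seq_repr) != len(user_repr):
        raise ValueError("both representations must share one dimension")
    joint = seq_repr + user_repr  # joint feature vector
    hidden = [max(0.0, z) for z in dense(joint, params["w1"], params["b1"])]  # ReLU assumed
    alpha, beta = softmax(dense(hidden, params["w2"], params["b2"]))
    profile = [alpha * s + beta * u for s, u in zip(seq_repr, user_repr)]
    return profile, alpha, beta


seq_repr = [0.4, -0.2, 0.1, 0.3]
user_repr = [0.1, 0.0, 0.5, -0.1]
profile, alpha, beta = fuse_profile(seq_repr, user_repr, init_gate(dim=4, hidden=8))
print(round(alpha + beta, 6))  # 1.0
```

Because the softmax output layer produces the weights, alpha + beta = 1 holds by construction, and the gate can shift weight toward the graph view for users with short behavior histories.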
[0080] Compared with the prior art, the beneficial effects of the present invention are:
[0081] This invention addresses the fragmentation of temporal and structural dependencies in the user interest representations of existing technologies by integrating behavioral sequence modeling with graph neural networks. First, it encodes behavior vector sequences with temporal context using a multi-head self-attention network, accurately capturing real-time intent drift within short sessions as well as long-term stable preferences across sessions, and thereby overcoming the insufficient use of long-range dependency information by traditional sequence models. Second, it uses a multi-layer graph attention network for message passing over a weighted, directed user-item interaction bipartite graph, fully exploring high-order neighborhood associations and community influences between users and items, and thereby overcoming the limited ability of graph neural networks to perceive the temporal evolution of behavior. Third, it adaptively fuses the sequence representation vector and the user node representation vector using dynamic fusion weights, generating dynamic user profile vectors that contain both micro-level behavioral patterns and macro-level structural information and achieving high-fidelity extraction and expression of multi-peaked user interests. As a result, this application improves recommendation accuracy and cold-start capability in scenarios with sparse behavior and drifting interests, providing downstream business engines with more timely and robust personalized decision support.
Attached Figure Description
[0082] Figure 1 is a flowchart of a user profile construction method based on behavioral sequence modeling and graph neural networks according to an embodiment of the present invention;
[0083] Figure 2 is a flowchart illustrating the segmentation of interaction events in an embodiment of the present invention.
Detailed Implementation
[0084] Referring to Figure 1, the present invention provides an embodiment of a user profile construction method based on behavioral sequence modeling and graph neural networks, the steps of which include:
[0085] S1. Obtain the target user's historical interaction behavior logs and the associated user attribute metadata and item attribute metadata. As an example, this embodiment takes user U1001 of an e-commerce platform and obtains their historical interaction behavior logs and associated metadata. The behavior logs contain three records: clicking on product P1001 (e.g., a brand of sports shoes) at 14:23:10 on March 1, 2025; adding product P1001 to favorites at 09:15:30 on March 2, 2025; and purchasing product P1001 at 15:40:20 on March 2, 2025. The associated user attribute metadata includes: the numerical features of user U1001, namely 128 active days and 2,350 yuan of spending in the past 30 days; the categorical features, namely membership level "Gold" and registration channel "Official Website"; and the text features, namely the interest tags "Fitness, Outdoors, Running". The associated item attribute metadata includes: the numerical features of product P1001, namely 23,000 favorites and a price of 899 yuan; the categorical features, namely "shoes" and "a well-known brand"; and the text features, namely the product title "professional cushioning running shoes".
[0086] S2. Based on the timestamps in the historical interaction behavior logs, the user's interaction events are segmented into a session sequence with a temporal order, and the behavior type and item identifier in the session sequence are jointly embedded and encoded to generate a behavior vector sequence with temporal context.
[0087] S3. Based on the user attribute metadata and item attribute metadata, initialize the attribute feature vectors of user nodes and item nodes, and construct a user-item interaction bipartite graph according to the behavior order and behavior type in the session sequence; wherein, the weight of the directed edge between user nodes and item nodes in the user-item interaction bipartite graph is determined by the behavior type and the behavior occurrence time.
[0088] S4. Construct the fusion modeling module, which performs the following operations:
[0089] S41. Input the behavior vector sequence into a multi-head self-attention network for encoding, and output a sequence representation vector that represents the user's long-term interests and short-term intentions.
[0090] S42. Input the user-item interaction bipartite graph into a multi-layer graph attention network for message passing and node representation update, and output a user node representation vector that integrates neighborhood information.
[0091] S43. Based on the sequence representation vector and the user node representation vector combined with a preset dynamic fusion weight, a unified dynamic user profile vector is generated.
[0092] S5. Input the dynamic user profile vector into the downstream task adaptation layer to generate a task parameter set corresponding to the business scenario, and send the task parameter set to the business engine through the service interface to drive personalized decision execution.
[0093] In the complex and ever-changing e-commerce recommendation scenarios, the raw behavior logs lack an adaptive session segmentation mechanism, making it difficult to balance intent integrity and noise removal under dynamic user activity. This results in inherent accuracy bottlenecks in the boundary delineation of behavior sequences. Furthermore, since behavior types and item identifiers belong to heterogeneous semantic spaces, simple concatenation cannot quantify the difference in interest intensity between "clicking an item" and "purchasing an item," nor can it model implicit behavior patterns across items, leading to semantic alignment difficulties in joint embedding encoding. Simultaneously, existing positional encoding can only represent relative order and cannot accurately quantify the nonlinear impact of the actual time interval between events on interest decay, resulting in the loss of temporal context. When these biased sequences are input into a multi-head self-attention network, early behavioral information is easily diluted by recent behavioral interactions, causing distortion of long-term interest representation. These problems collectively lead to defects in the generated behavior vector sequences at the source, including ambiguous intent boundaries, semantic alignment deviations, inaccurate temporal quantification, and loss of long-term interest, thus affecting the completeness and accuracy of subsequent user profile construction.
[0094] For further explanation, refer to Figure 2. In this embodiment, segmenting the user's interaction events into a session sequence with a temporal order includes:
[0095] S201. Based on the timestamps in the historical interaction behavior log, the interaction events are divided into candidate session sequences according to the preset initial session segmentation parameters;
[0096] S202. Jointly embed and encode the behavior type and item identifier in the candidate session sequence to generate a behavior type embedding vector and an item identifier embedding vector, and calculate the time decay factor based on the time interval between adjacent events in the candidate session sequence.
[0097] It should be further explained that the step of jointly embedding and encoding the behavior type and item identifier in the candidate session sequence in this embodiment includes:
[0098] Obtain a candidate session sequence, which contains interaction events arranged in chronological order, with each interaction event corresponding to a behavior type label and an item identifier ID;
[0099] Construct a behavior type embedding matrix and an item identifier embedding matrix, and randomly initialize the row vectors of the behavior type embedding matrix as the initial embedding vectors of the behavior type and the row vectors of the item identifier embedding matrix as the initial embedding vectors of the item identifier.
[0100] For each interaction event in the candidate session sequence, the initial embedding vector of the behavior type of the event is obtained by querying the behavior type embedding matrix, and the initial embedding vector of the item identifier of the event is obtained by querying the item identifier embedding matrix.
[0101] Initialize the learnable parameters of the joint interaction layer, which includes a bilinear interaction matrix and a nonlinear activation function;
[0102] The initial embedding vector of the behavior type and the initial embedding vector of the item identifier for each event are input into the joint interaction layer. The cross-coupling features of the two are calculated through the bilinear interaction matrix, and then mapped by the nonlinear activation function to generate the joint embedding vector of the event.
[0103] Calculate the semantic similarity matrix between the joint embedding vectors of all events in the candidate session sequence, and calculate a semantic separation index from this matrix. The semantic separation index quantifies the ratio of the intra-class distance to the inter-class distance of the joint embedding vectors grouped by behavior type, so a smaller value indicates better separation between behavior types.
[0104] Determine whether the semantic separation index is less than a third preset threshold;
[0105] If the semantic separation index is less than the third preset threshold, the joint embedding vector is determined as the final joint embedding encoding result of behavior type and item identifier, and output to subsequent steps;
[0106] If the semantic separation index is greater than or equal to the third preset threshold, then based on the difference between the semantic separation index and the third preset threshold, the learnable parameters of the joint interaction layer are updated using gradient descent, and the process returns to the step of inputting the initial embedding vector of the behavior type and the initial embedding vector of the item identifier of each event into the joint interaction layer, until the semantic separation index is less than the third preset threshold.
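The joint encoding loop of [0098] to [0106] can be sketched in NumPy. The text specifies a bilinear interaction matrix and a nonlinear activation but not the exact bilinear form, so this sketch assumes a FiBiNET-style interaction ReLU((b · W) ⊙ i), where b and i are an event's behavior type and item identifier embeddings, and assumes cosine distance for the separation index. The gradient update of W is omitted because plain NumPy has no automatic differentiation. All function names are hypothetical.

```python
import numpy as np

def joint_embed(type_vecs: np.ndarray, item_vecs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Assumed bilinear interaction per event (row): ReLU((b @ W) * i)."""
    return np.maximum((type_vecs @ W) * item_vecs, 0.0)

def cosine_distance_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance 1 - cos(x_i, x_j); all-zero rows are guarded against."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.clip(norms, 1e-12, None)
    return 1.0 - U @ U.T

def separation_index(X: np.ndarray, labels) -> float:
    """Mean intra-class distance / mean inter-class distance over all event pairs.

    Pairs are grouped by behavior type; smaller values mean better separation.
    Singleton classes contribute no intra-class pairs.
    """
    D = cosine_distance_matrix(X)
    labels = np.asarray(labels)
    iu, ju = np.triu_indices(len(labels), k=1)
    same = labels[iu] == labels[ju]
    intra = D[iu[same], ju[same]].mean() if same.any() else 0.0
    inter = D[iu[~same], ju[~same]].mean()
    return float(intra / inter)

# Five-event example; random tables stand in for learned embedding matrices.
rng = np.random.default_rng(0)
dim = 64
type_table = rng.normal(size=(3, dim))              # rows: click, favorite, purchase
item_table = rng.normal(size=(3, dim))              # rows: items A, B, C
W = rng.normal(scale=dim ** -0.5, size=(dim, dim))  # 64x64 bilinear interaction matrix
labels = ["click", "click", "favorite", "purchase", "click"]
type_idx, item_idx = [0, 0, 1, 2, 0], [0, 1, 1, 1, 2]
joint = joint_embed(type_table[type_idx], item_table[item_idx], W)
sep = separation_index(joint, labels)
needs_update = sep >= 0.3  # third preset threshold from the example; if True, update W and repeat
```

In a training framework, W would be updated by gradient descent (the example uses Adam) until `needs_update` becomes false.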
[0107] It should be further explained that in this embodiment, the time decay factor is calculated based on the time interval between adjacent events in the candidate session sequence, specifically as follows:
[0108] S2021. Obtain a candidate session sequence, wherein the candidate session sequence includes interactive events arranged in chronological order and the absolute timestamp corresponding to each event;
[0109] S2022. Based on the absolute timestamp, calculate the time interval between adjacent events to obtain a time interval sequence;
[0110] S2023. Initialize the decay parameters of the time decay function, wherein the decay parameters include the reference decay rate and the time scale factor;
[0111] S2024. Based on the time interval sequence and the decay parameter, calculate the time decay factor of each event relative to the current time using the time decay function to obtain the time decay factor sequence.
[0112] S2025. Using the decay parameters of the current iteration, fuse the time decay factor sequence of the current iteration with the behavior type embedding vector and the item identifier embedding vector of the corresponding events to generate a first candidate behavior vector sequence for calculating the temporal continuity index.
[0113] S2026. Calculate the temporal continuity index of the first candidate behavior vector sequence. The temporal continuity index is used to quantify the consistency between the mean cosine of the angle between adjacent behavior vectors in the semantic space and the time decay factor.
[0114] S2027. Determine whether the temporal continuity index is less than a second preset threshold; if it is, output the first candidate behavior vector sequence as a behavior vector sequence with temporal context. In this embodiment, the temporal continuity index is a global scalar that measures how consistently the time decay factor sequence tracks the semantic continuity of the behavior vectors. It is obtained by calculating the Pearson correlation coefficient between the sequence of cosine similarities of adjacent vectors and the corresponding sequence of products of adjacent time decay factors, and it serves as feedback for adjusting the decay parameters (the reference decay rate and the time scale factor) so that the time decay accurately matches the pattern of semantic evolution.
[0115] S2028. If the temporal continuity index is greater than or equal to the second preset threshold, adjust the decay parameters according to the difference between the temporal continuity index and the second preset threshold, and return to the step of calculating, through the time decay function, the time decay factor of each event relative to the current time based on the time interval sequence and the decay parameters, until the temporal continuity index is less than the second preset threshold.
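Steps S2024 to S2028 form a closed loop over the decay parameters. The sketch below assumes the decay function f = exp(-a·Δt / b) used in the worked example, element-wise fusion, and the Pearson-based temporal continuity index described in S2027. The update rule (raise a and shrink b in proportion to the excess over the threshold) is a hypothetical choice, since the text says only that the parameters are adjusted according to the difference; `tune_decay` and `step` are illustrative names.

```python
import numpy as np

def decay_factors(ts, now, a, b):
    """f_i = exp(-a * (now - t_i) / b), with all times in seconds."""
    return np.exp(-a * (now - np.asarray(ts, dtype=float)) / b)

def continuity_index(emb, f):
    """Pearson correlation between adjacent cosine similarities of the fused
    vectors and the products of adjacent decay factors (S2027)."""
    fused = emb * f[:, None]                       # element-wise fusion
    unit = fused / np.linalg.norm(fused, axis=1, keepdims=True)
    cos_adj = np.sum(unit[:-1] * unit[1:], axis=1)
    prod_adj = f[:-1] * f[1:]
    return float(np.corrcoef(cos_adj, prod_adj)[0, 1])

def tune_decay(emb, ts, now, a, b, threshold, step=0.5, max_iter=20):
    """Adjust (a, b) until the index is below threshold or max_iter is reached."""
    f = decay_factors(ts, now, a, b)
    idx = continuity_index(emb, f)
    for _ in range(max_iter):
        if np.isnan(idx) or idx < threshold:
            break
        ratio = 1.0 + step * (idx - threshold)     # hypothetical proportional rule
        a, b = a * ratio, b / ratio
        f = decay_factors(ts, now, a, b)
        idx = continuity_index(emb, f)
    return a, b, f, idx

rng = np.random.default_rng(1)
emb = rng.normal(size=(5, 64))                     # stand-in joint embeddings
ts = [0, 300, 1200, 1320, 3600]                    # 09:00, 09:05, 09:20, 09:22, 10:00
a, b, f, idx = tune_decay(emb, ts, now=3900, a=0.1, b=3600.0, threshold=0.1)
```

Because f_i depends on a and b only through the ratio a / b, adjusting both parameters acts through that single ratio. And because each decay factor is a positive scalar, element-wise fusion leaves the adjacent cosine similarities unchanged, so the index responds to the decay parameters only through the product sequence.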
[0116] For example, this embodiment takes the user behavior log of an e-commerce platform as an example to obtain a candidate session sequence containing 5 events: "clicking product A - clicking product B - adding product B to favorites - purchasing product B - clicking product C". The corresponding behavior type labels are [click, click, favorite, purchase, click], and the item IDs are [A, B, B, B, C]. When performing joint embedding encoding, the behavior type embedding matrix and item identifier embedding matrix are initialized with a dimension of 64, and the joint interaction layer is set to contain a 64×64 bilinear interaction matrix and a ReLU activation function. After the first forward computation, the joint embedding vectors of the five events are obtained. The average intra-class distance for clicks (events 1, 2, and 5) is 0.32. The intra-class distance for favorites (event 3) and purchases (event 4), each with only one event, is 0. The average inter-class distance for all different behavior types is 0.71. Therefore, the semantic separation index (intra-class / inter-class) is 0.45, which is greater than the preset third threshold of 0.3, failing to meet the requirements. Thus, the gradient is calculated based on the difference between the separation index and the threshold (0.15), and the bilinear interaction matrix parameters are updated using the Adam optimizer with a learning rate of 0.01. After three iterations, the average intra-class distance decreases to 0.21, the average inter-class distance increases to 0.78, and the semantic separation index is 0.27, less than 0.3. The joint embedding vector is then output.
[0117] The time decay factor is then calculated: based on the absolute timestamps of each event, assuming they are 09:00, 09:05, 09:20, 09:22, and 10:00, and the current time is 10:05, the adjacent intervals are calculated to be 5 minutes, 15 minutes, 2 minutes, and 38 minutes, respectively. The decay parameters are initialized, with a baseline decay rate a = 0.1 and a time scale factor b = 3600 seconds. The time decay function f = exp(-a·Δt / b) is used to calculate the time decay factor of each event relative to the current time, resulting in a time decay factor sequence [0.61, 0.63, 0.72, 0.71, 0.92]. In this example, Δt represents the time interval between the occurrence time of each interaction event and the current time, used to calculate the time decay factor to quantify the contribution of the event to the current user's interest. The longer the interval, the smaller the decay factor, reflecting a higher weight for recent behavior, thus accurately characterizing the nonlinear law of interest decay over time in the behavior vector sequence.
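The decay computation of S2022 to S2024 with the example's function f = exp(-a·Δt / b), where Δt runs from each event to the current time, can be sketched in a few lines of Python (function names are hypothetical). Evaluated literally with a = 0.1 and b = 3600 s, the formula gives factors of roughly 0.90 to 0.99, increasing toward the most recent event, rather than the sequence [0.61, 0.63, 0.72, 0.71, 0.92] quoted above, so the quoted values are best read as illustrative.

```python
import math
from datetime import datetime

FMT = "%H:%M"

def gaps_minutes(times):
    """Intervals between adjacent events, in minutes."""
    ts = [datetime.strptime(t, FMT) for t in times]
    return [(t2 - t1).total_seconds() / 60 for t1, t2 in zip(ts, ts[1:])]

def decay_to_now(times, now, a=0.1, b=3600.0):
    """f = exp(-a * dt / b), where dt is the number of seconds from each event to `now`."""
    t_now = datetime.strptime(now, FMT)
    return [math.exp(-a * (t_now - datetime.strptime(t, FMT)).total_seconds() / b)
            for t in times]

times = ["09:00", "09:05", "09:20", "09:22", "10:00"]
gaps = gaps_minutes(times)              # [5.0, 15.0, 2.0, 38.0]
factors = decay_to_now(times, "10:05")  # ≈ [0.897, 0.905, 0.928, 0.931, 0.992]
```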
[0118] The time decay factor is multiplied element-wise with the joint embedding vector of the corresponding event to generate a candidate behavior vector sequence with time decay weights. The mean square error between the cosine similarities of adjacent vectors and the corresponding time decay factors (excluding the last event) is calculated as the temporal continuity index. Its initial value is 0.18, which is greater than the second preset threshold of 0.1, so the decay parameters are adjusted according to the error: a is increased to 0.15 and b is reduced to 3000 seconds. The recalculated time decay factor sequence is [0.46, 0.49, 0.61, 0.60, 0.88], and the recalculated temporal continuity index is 0.09, which is less than 0.1. The output is a behavior vector sequence with temporal context. This sequence not only separates the behavior types semantically, with click, favorite, and purchase vectors clearly separated in space, but also uses the time decay factor to characterize the nonlinear decay of interest over time.
[0119] S203. Fuse the time decay factor with the behavior type embedding vector and the item identifier embedding vector to obtain a candidate behavior vector sequence with temporal context. In this embodiment, both the "first candidate behavior vector sequence for calculating the temporal continuity index" and the "candidate behavior vector sequence with temporal context" involve fusing embedding vectors with time decay factors, but they differ in their role in the process, purpose, parameter status, and subsequent use. The former is an intermediate sequence generated temporarily from the decay parameters of the current iteration within the loop that optimizes the temporal continuity index. Its sole function is to determine, via the temporal continuity index, whether the current decay parameters meet the requirement; if they do not, the decay parameters are adjusted and the sequence is regenerated. This sequence is therefore temporary and variable and serves only the internal closed-loop optimization of the decay parameters. The latter is the stable vector sequence fused in the main session segmentation process after the temporal continuity index optimization has confirmed that the decay parameters satisfy the second preset threshold. It serves as the basis for calculating the intent cohesion index and for outputting the final session sequence; it is therefore deterministic, is actually output, and is the formal input to the downstream modules.
[0120] It should be further explained that, in this embodiment, obtaining the candidate behavior vector sequence with temporal context includes:
[0121] S2031. Obtain the joint embedding vector of each event produced by joint embedding encoding, and calculate the time decay factor of each event relative to the current time through the time decay function;
[0122] S2032. Initialize the parameters of the fusion method, wherein the candidate fusion methods are element-wise multiplication, and concatenation followed by mapping through a fully connected layer;
[0123] S2033. Fuse the time decay factor with the joint embedding vector according to the current fusion method to obtain a candidate behavior vector sequence with time decay weights: if the fusion method is element-wise multiplication, multiply each dimension of each event's joint embedding vector by that event's time decay factor; if the fusion method is concatenation followed by a fully connected layer, concatenate the time decay factor with the joint embedding vector, input the result into the fully connected layer, and obtain the fused vector through the learnable weight mapping.
[0124] S2034. Calculate the cosine similarity of adjacent vectors in the candidate behavior vector sequence to obtain a similarity sequence; simultaneously calculate the product of the time decay factors corresponding to adjacent vectors to obtain a product sequence; calculate the Pearson correlation coefficient between the similarity sequence and the product sequence as a fusion quality index.
[0125] S2035. Determine whether the fusion quality index is greater than or equal to the preset fusion quality threshold.
[0126] S2036. If the fusion quality index is greater than or equal to the fusion quality threshold, then the current fusion method and parameters are determined to be valid, and a candidate behavior vector sequence is output.
[0127] S2037. If the fusion quality index is less than the fusion quality threshold, adjust the fusion method or update the fully connected layer parameters according to the difference between the fusion quality index and the fusion quality threshold: if the current method is element-wise multiplication, switch to the concatenation-plus-fully-connected method and randomly initialize the fully connected layer weights; if the current method is already concatenation-plus-fully-connected, update the fully connected layer weights through gradient descent. Then return to S2033, until the fusion quality index is greater than or equal to the fusion quality threshold.
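The two fusion methods of S2032 and S2033, plus the random re-initialization of S2037, can be sketched in NumPy. The fully connected layer is assumed to be a plain affine map with no activation, and the function names are hypothetical; the gradient update of S2037 is left to a training framework.

```python
import numpy as np

def fuse_multiply(joint: np.ndarray, decay: np.ndarray) -> np.ndarray:
    """Element-wise method: scale every dimension of event i by its decay factor."""
    return joint * decay[:, None]

def fuse_concat_fc(joint: np.ndarray, decay: np.ndarray, W: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Concatenation method: append the decay factor as one extra feature, then
    apply an affine fully connected layer; W has shape (d + 1, d_out)."""
    x = np.concatenate([joint, decay[:, None]], axis=1)
    return x @ W + bias

def init_fc(d_in: int, d_out: int, rng: np.random.Generator):
    """Random initialization used when switching to the concatenation method."""
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out)), np.zeros(d_out)

rng = np.random.default_rng(0)
joint = rng.normal(size=(5, 64))             # stand-in 64-dimensional joint embeddings
joint[0, 0], joint[2, 0] = 0.52, 0.78         # first dimensions quoted in the example
decay = np.array([0.46, 0.49, 0.61, 0.60, 0.88])
fused = fuse_multiply(joint, decay)           # fused[0, 0] = 0.52 * 0.46
W, bias = init_fc(65, 64, rng)
fused_fc = fuse_concat_fc(joint, decay, W, bias)
```

Because each decay factor is a positive scalar, element-wise multiplication rescales a vector without changing its direction, so the adjacent cosine similarities checked in S2034 are identical before and after this fusion; only the concatenation method can change them.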
[0128] For example, take the candidate session sequence from the e-commerce scenario above: Click A, Click B, Favorite B, Purchase B, Click C. A 64-dimensional joint embedding vector for each event, fusing its behavior type embedding vector and item identifier embedding vector, has already been obtained through joint embedding encoding, and the time decay function has given the time decay factor sequence [0.46, 0.49, 0.61, 0.60, 0.88] relative to the current time. The fusion layer uses the element-wise multiplication strategy: the decay factor of the i-th event multiplies every dimension of its 64-dimensional joint embedding vector, producing a candidate behavior vector sequence with time decay weights. For example, if the first dimension of the joint embedding vector of event 1 (Click A) is 0.52, it becomes 0.52 × 0.46 = 0.2392; if the first dimension of the joint embedding vector of event 3 (Favorite B) is 0.78, it becomes 0.78 × 0.61 = 0.4758. Applying this to all five events yields five new 64-dimensional vectors that form the candidate behavior vector sequence. To verify the fusion effect, the cosine similarities of adjacent vectors are calculated: 0.82 between events 1 and 2, 0.79 between events 2 and 3, 0.88 between events 3 and 4, and 0.61 between events 4 and 5, for an average of 0.775. The products of the decay factors of the same adjacent pairs are 0.46 × 0.49 = 0.225, 0.49 × 0.61 = 0.299, 0.61 × 0.60 = 0.366, and 0.60 × 0.88 = 0.528, and their Pearson correlation coefficient with the similarities is calculated to be 0.93, indicating that the time decay factor is highly consistent with semantic continuity. If this correlation coefficient were lower than the preset fusion quality threshold, such as 0.90, the fusion strategy would need adjustment, for example by switching to concatenation followed by a fully connected layer and adjusting the layer's weights until the correlation coefficient reaches the target. In the final output behavior vector sequence, event 5 (Click C) is closest to the current time, with a time decay factor of 0.88, yet it differs greatly in semantics from the preceding purchase, with a cosine similarity of 0.61. It therefore shows "high weight, low continuity" in the vector space, capturing the instantaneous switch in the user's interest.
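S2034 defines the fusion quality index as the Pearson correlation between adjacent cosine similarities and adjacent decay-factor products (S2027 uses the same construction for the temporal continuity index). A plain-Python sketch with the example's values follows; `pearson` is a hypothetical helper. Evaluated on the four similarities and four products listed in [0128], it returns about -0.76 rather than the quoted 0.93, because the last pair combines the lowest similarity (0.61) with the highest product (0.528), so the quoted coefficient is best read as illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

decay = [0.46, 0.49, 0.61, 0.60, 0.88]
similarities = [0.82, 0.79, 0.88, 0.61]               # adjacent cosine similarities
products = [p * q for p, q in zip(decay, decay[1:])]  # adjacent decay-factor products
quality = pearson(similarities, products)             # ≈ -0.76
accept = quality >= 0.90                              # fusion quality threshold from the example
```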
[0129] S204. Calculate the intent cohesion index of the candidate session sequence. The intent cohesion index is used to characterize the comprehensive consistency of behaviors within the sequence in terms of semantic similarity and temporal continuity, specifically:
[0130] Calculate the semantic similarity between any two events in the candidate session sequence to generate an N×N semantic similarity matrix S, where S_ij represents the semantic similarity between the i-th event and the j-th event, and the semantic similarity is determined based on the cosine distance between the combined vectors of the behavior type embedding vector and the item identifier embedding vector;
[0131] Based on the absolute timestamps, calculate the time intervals between adjacent events to obtain a time interval sequence of length N - 1;
[0132] According to the time interval sequence, calculate the temporal continuity weight between each event and each subsequent event through a time decay function to obtain a temporal continuity matrix T, where T_ij is the temporal continuity coefficient between the i-th event and the j-th event: when i = j, T_ij = 1; when i < j, T_ij = exp(-v · Δt_i · (j - i)), where v is a preset time decay constant and Δt_i is the time interval between the i-th event and the (i + 1)-th event; and when i > j, T_ij = T_ji. In this embodiment, the temporal continuity coefficient is a fine-grained pairwise metric representing the strength of the temporal association between two events in the same session sequence. It decays exponentially as the interval between adjacent events grows or as the two events lie farther apart in the sequence, and it is used in the subsequent calculation of the intent cohesion index.
[0133] Based on the semantic similarity matrix S and the temporal continuity matrix T, calculate the intent cohesion index C = [1 / (N(N - 1))] · Σ_{i=1..N} Σ_{j=1..N} S_ij · T_ij, where N is the total number of events in the sequence. S_ij, the semantic similarity term between the i-th event and the j-th event, is determined from the cosine distance of their joint embedding vectors and ranges from 0 to 1; smaller values indicate more similar semantics, so S_ii = 0 and the diagonal terms contribute nothing to C. The temporal continuity coefficient T_ij between the i-th event and the j-th event is defined as follows: when i = j, T_ij = 1; when i < j, T_ij is e raised to the power of the negative of the preset time decay constant multiplied by the time interval between the i-th event and the (i + 1)-th event and then by (j - i); and when i > j, T_ij equals T_ji, the temporal continuity coefficient of the j-th event with respect to the i-th event. A smaller C therefore indicates a more cohesive sequence.
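One form of C that is consistent with the definitions of S and T above, and that reproduces both values in the worked example of [0141] (C = 0.192 for five events and C = 0.158 for the first four), is the mean of S_ij · T_ij over all ordered pairs i ≠ j. The NumPy sketch below assumes that form; treat it as a reconstruction rather than a verbatim transcription of the patent's formula. Function names are hypothetical.

```python
import numpy as np

def continuity_matrix(gaps, v):
    """T_ii = 1; T_ij = exp(-v * gaps[i] * (j - i)) for i < j; T_ji = T_ij.
    gaps[i] is the interval between event i and event i + 1 (0-based)."""
    n = len(gaps) + 1
    T = np.eye(n)
    for i in range(n - 1):
        for j in range(i + 1, n):
            T[i, j] = T[j, i] = np.exp(-v * gaps[i] * (j - i))
    return T

def cohesion_index(S, T):
    """C = sum_ij S_ij * T_ij / (N (N - 1)); S holds cosine distances, so S_ii = 0."""
    n = S.shape[0]
    return float(np.sum(S * T) / (n * (n - 1)))

# Pairwise cosine distances from the example in [0141] (1-based event numbers).
pairs = {(1, 2): 0.2, (1, 3): 0.5, (1, 4): 0.7, (1, 5): 0.8, (2, 3): 0.1,
         (2, 4): 0.3, (2, 5): 0.8, (3, 4): 0.05, (3, 5): 0.6, (4, 5): 0.7}
S = np.zeros((5, 5))
for (i, j), s in pairs.items():
    S[i - 1, j - 1] = S[j - 1, i - 1] = s
gaps = [5, 15, 2, 38]                                               # minutes
c5 = cohesion_index(S, continuity_matrix(gaps, v=0.05))             # ≈ 0.192
c4 = cohesion_index(S[:4, :4], continuity_matrix(gaps[:3], v=0.05)) # ≈ 0.158
```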
[0134] S205. Determine whether the intent cohesion index is less than a first preset threshold;
[0135] S2051. If the intent cohesion index is less than the first preset threshold, determine the candidate session sequence as the final session sequence and output the candidate behavior vector sequence as a behavior vector sequence with temporal context;
[0136] S2052. If the intent cohesion index is greater than or equal to the first preset threshold, then adjust the initial session segmentation parameters according to the intent cohesion index, and return to the step of segmenting the interaction event into a candidate session sequence based on the timestamp in the historical interaction behavior log according to the preset initial session segmentation parameters, until the intent cohesion index is less than the first preset threshold.
[0137] It should be further explained that, in this embodiment, adjusting the initial session segmentation parameters based on the intent cohesion index specifically includes:
[0138] S2052a. If the intent cohesion index is greater than or equal to the first preset threshold, calculate the difference between the intent cohesion index and the first preset threshold.
[0139] S2052b. Dynamically adjust the initial session segmentation parameters according to the magnitude of the difference. The initial session segmentation parameters include a maximum time interval threshold and a minimum sequence length threshold. The adjustment rule is to proportionally decrease the maximum time interval threshold or proportionally increase the minimum sequence length threshold, wherein the proportional coefficient is determined by the difference and a preset adjustment step size factor.
[0140] S2052c: Re-segment the original behavior log using the adjusted segmentation parameters to obtain new candidate session sequences, and return to perform joint embedding encoding on the candidate session sequences and subsequent steps until the intent cohesion index is less than the first preset threshold.
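The segmentation and re-segmentation loop of S201 and S2052a to S2052c can be sketched in Python under two assumptions that the text leaves open: a session is split wherever the gap to the previous event exceeds the maximum time interval, and sessions shorter than the minimum length are dropped. The scaling rule 1 + step · (C - θ1) is a hypothetical reading of "proportional coefficient determined by the difference and a preset adjustment step size factor". Under this strict gap rule, the 38-minute gap in the example would already exceed the initial 30-minute threshold, so the patent's actual segmentation rule may involve more than the gap test.

```python
import math

def segment(times, max_gap, min_len):
    """Split event indices wherever the gap to the previous event exceeds max_gap;
    sessions shorter than min_len are dropped (assumed handling)."""
    if not times:
        return []
    sessions, current = [], [0]
    for k in range(1, len(times)):
        if times[k] - times[k - 1] > max_gap:
            sessions.append(current)
            current = [k]
        else:
            current.append(k)
    sessions.append(current)
    return [s for s in sessions if len(s) >= min_len]

def adjust_params(max_gap, min_len, cohesion, theta1, step=1.0, shrink_gap=True):
    """Hypothetical S2052b rule: scale by 1 + step * (C - theta1) when C >= theta1,
    either shrinking the maximum gap or growing the minimum length."""
    diff = cohesion - theta1
    if diff < 0:
        return max_gap, min_len
    ratio = 1.0 + step * diff
    if shrink_gap:
        return max_gap / ratio, min_len
    return max_gap, math.ceil(min_len * ratio)

times = [0, 5, 20, 22, 60]                    # minutes after 09:00
print(segment(times, max_gap=40, min_len=2))  # [[0, 1, 2, 3, 4]]
print(segment(times, max_gap=25, min_len=2))  # [[0, 1, 2, 3]]
```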
[0141] For example, this embodiment takes the user behavior log of an e-commerce platform. The original behavior stream contains a large number of events. Segmenting it with the initial session segmentation parameters (a maximum time interval of 30 minutes and a minimum sequence length of 2) yields the candidate session sequence: event 1 (09:00, click A), event 2 (09:05, click B), event 3 (09:20, favorite B), event 4 (09:22, purchase B), and event 5 (10:00, click C). First, the semantic similarity matrix S is calculated from the joint embedding vectors of behavior type and item identifier using cosine distance, so smaller values indicate greater semantic similarity; for example, S12 = 0.2, S13 = 0.5, S14 = 0.7, S15 = 0.8, S23 = 0.1, S24 = 0.3, S25 = 0.8, S34 = 0.05, S35 = 0.6, and S45 = 0.7. From the absolute timestamps, the time interval sequence is Δt = [5, 15, 2, 38] minutes. Assuming a time decay constant v = 0.05, the temporal continuity matrix T is calculated; for example, T12 = exp(-0.05 × 5 × 1) = 0.7788 and T13 = exp(-0.05 × 5 × 2) = 0.6065. Applying the formula for the intent cohesion index with N = 5 gives C = 0.192. The first preset threshold is set to θ1 = 0.18. Since C > 0.18, the intent consistency within the sequence is insufficient, and the segmentation parameters need adjustment. The maximum time interval threshold is reduced from 30 minutes to 25 minutes. After re-segmentation, event 5 is assigned to the next session because its 38-minute interval from event 4 exceeds the new maximum time interval threshold, so the new sequence contains only the first four events. Recalculating gives C = 0.158, which is less than 0.18 and meets the requirement. Finally, this sequence is output as a session-level behavioral unit.
[0142] This application addresses technical issues such as ambiguous intent boundaries, semantic alignment deviations, and inaccurate time decay that arise in user behavior logs during session segmentation, semantic encoding, and time quantization in e-commerce recommendation scenarios. By constructing an adaptive session segmentation mechanism and an iterative optimization framework that combines joint embedding encoding with time decay factors, it achieves high-fidelity extraction and accurate representation of user interests. Specifically, first, an intent cohesion index jointly quantifies the semantic similarity and temporal continuity of candidate session sequences, and the segmentation parameters are adjusted dynamically until the index falls below a preset threshold. This resolves the intent fragmentation and noise contamination caused by fixed-threshold segmentation and ensures that session units are pure and complete in terms of behavioral intent. Second, by iteratively optimizing the bilinear interaction matrix of the joint interaction layer against the semantic separation index, behavior types and item identifiers are mapped into a unified semantic space, so that differences in interest intensity between behavior types such as "click" and "purchase" can be distinguished quantitatively, overcoming the semantic alignment difficulties caused by simply concatenating heterogeneous information. Third, a closed-loop feedback mechanism between the time decay factor and the temporal continuity index adaptively adjusts the decay parameters so that the time decay matches how behavior vectors evolve in the semantic space, solving the problem that traditional positional encoding cannot quantify the nonlinear effect of real time intervals on interest decay.
The final output behavior vector sequence achieves the technical effect of clear intent boundaries, high semantic discriminability, and accurate time decay from the source, providing high-quality input data for subsequent multi-head self-attention networks to capture long-term interests and short-term intents, fundamentally improving the completeness and accuracy of user profile construction, and thus significantly improving the recommendation accuracy and cold start capability of the recommendation system in scenarios of sparse behavior and interest drift.
[0143] In this embodiment, when constructing the user-item interaction bipartite graph, user attribute metadata and item attribute metadata are often heterogeneous: numerical, categorical, and textual types coexist, the data are high-dimensional and sparse, and they contain varying degrees of missing values and noise. A robust initialization strategy that uniformly encodes these metadata into semantically aligned node feature vectors while preserving the discriminability of the original attributes is therefore fundamental to the quality of the subsequent graph neural network representations. Second, when the bipartite graph is constructed from the session sequences, the same user may interact with the same item multiple times, and these events differ in behavior type (such as click, favorite, or purchase) and in occurrence time. A critical challenge is how to quantify the combined impact of multiple interactions on a single directed edge and design a reasonable weight aggregation function to fuse them. Capturing the semantic intensity differences between behavior types and the time decay effect, while avoiding the signal cancellation or information loss caused by naive aggregation, is key to accurate representation in the graph structure. Furthermore, the causal dependencies implicit in behavior sequences, such as clicking before purchasing, are difficult to represent directly in a graph: edge direction can express the direction of an interaction but not the temporal order of events, which limits the graph's ability to depict the evolution of user interests. At the same time, bipartite graphs are inherently long-tailed and sparse, with many user nodes connected by only a few edges.
Enhancing the distinguishability of weakly connected nodes through refined edge weight design, and providing high-quality initial structure and weight information for subsequent message passing in the graph neural network, are critical challenges that need to be addressed during the construction phase. Therefore, this embodiment constructs a user-item interaction bipartite graph, including:
[0144] S301. Obtain user attribute metadata and item attribute metadata. The user attribute metadata includes user numerical features, user category features, and user text features. The item attribute metadata includes item numerical features, item category features, and item text features.
[0145] S302. Normalize the user numerical features and the item numerical features respectively to obtain normalized user numerical vectors and item numerical vectors.
[0146] S303. Construct a categorical embedding matrix to map the user categorical features and item categorical features into corresponding user categorical embedding vectors and item categorical embedding vectors;
[0147] S304. Extract the semantic vector of the user textual features as the user text embedding vector through a pre-trained BERT model, and extract the semantic vector of the item textual features as the item text embedding vector.
[0148] For example, this embodiment takes user profile construction on an e-commerce platform. Assume that node attribute feature vectors need to be initialized for user U1001 (a newly registered member) and product P2024 (a sports shoe). User U1001's attribute metadata includes numerical user features such as 28 active days and 1250 yuan of spending in the last 30 days; categorical user features such as membership level "Silver" and registration channel "App Store"; and textual user features such as the interest tags "outdoor sports, running". Product P2024's attribute metadata includes numerical item features such as 12,000 favorites and a price of 549 yuan; categorical item features such as category "shoes" and brand "a sports brand"; and textual item features such as the product title "lightweight cushioning running shoes". First, the numerical features are normalized: assuming a maximum of 365 active days and a maximum spending of 10,000 yuan, the normalized user numerical vector is [0.0767, 0.1250]; assuming a maximum of 100,000 favorites and a maximum price of 2,000 yuan, the normalized item numerical vector is [0.1200, 0.2745]. Next, a 32-dimensional categorical embedding matrix is constructed; it maps the user category "Silver" to a user category embedding vector whose first three dimensions are assumed to be [0.12, -0.05, 0.08], and the item category "Shoes" to an item category embedding vector whose first three dimensions are assumed to be [0.21, 0.15, -0.03]. Then a pre-trained BERT model extracts the semantic vector of the user text "outdoor sports, running" as the user text embedding vector, with first three dimensions [0.35, 0.42, 0.28], and the semantic vector of the product title "lightweight cushioning running shoes" as the item text embedding vector, with first three dimensions [0.58, 0.63, 0.71].
Finally, the normalized numerical vector, category embedding vector, and text embedding vector are concatenated to obtain the initial attribute feature vector of user U1001 (dimension = 2 + 32 + 768 = 802 dimensions, with the first three dimensions shown as [0.0767, 0.1250, 0.12]) and the initial attribute feature vector of item P2024 (the first three dimensions shown as [0.1200, 0.2745, 0.21]). This initialization process, through the normalization, embedding mapping, and semantic extraction of heterogeneous features, uniformly encodes multi-source metadata into semantically aligned node feature vectors, providing initial node representations containing complete attribute information for subsequent message passing in the graph neural network. In this embodiment, the 802-dimensional vector is composed of three parts: 2 dimensions are from the normalized result of the user's numerical features; 32 dimensions are from the embedding vector obtained by mapping the user's categorical features through a categorical embedding matrix; and 768 dimensions are from the semantic vector extracted by the user's textual features through a pre-trained BERT model. The three parts are concatenated to form a complete initial attribute feature vector containing multi-source heterogeneous information about the user.
[0149] S305. The normalized user numerical vector, user category embedding vector, and user text embedding vector are fused to generate the initial attribute feature vector of the user node; the normalized item numerical vector, item category embedding vector, and item text embedding vector are fused to generate the initial attribute feature vector of the item node.
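Steps S302 to S305 can be sketched in NumPy under these assumptions: numerical features are max-scaled (x / x_max, which reproduces the normalized values in [0148]); when a node has several categorical fields, their 32-dimensional embeddings are averaged into one vector (the text does not say how the membership-level and channel embeddings are combined); and a random 768-dimensional vector stands in for the BERT text embedding. The vocabulary size, category IDs, and function names are hypothetical.

```python
import numpy as np

def normalize_by_max(values, maxima):
    """Max scaling as in the example: x / x_max."""
    return np.asarray(values, dtype=float) / np.asarray(maxima, dtype=float)

def node_vector(num_values, num_maxima, cat_ids, cat_table, text_vec):
    """Concatenate [normalized numerics | categorical embedding | text embedding].
    Several categorical fields are averaged into one vector (an assumption)."""
    num_vec = normalize_by_max(num_values, num_maxima)
    cat_vec = cat_table[cat_ids].mean(axis=0)
    return np.concatenate([num_vec, cat_vec, text_vec])

rng = np.random.default_rng(0)
cat_table = rng.normal(scale=0.1, size=(100, 32))  # hypothetical 100-value vocabulary
text_vec = rng.normal(size=768)                    # stand-in for a BERT sentence vector
# User U1001: 28 active days, 1250 yuan; hypothetical IDs 3 ("Silver") and 17 ("App Store").
user = node_vector([28, 1250], [365, 10_000], [3, 17], cat_table, text_vec)
# user.shape == (802,): 2 numerical + 32 categorical + 768 text dimensions
```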
[0150] S306. Obtain the final session sequence, which contains interactive events arranged in chronological order, and each interactive event records the user identifier, item identifier, behavior type and absolute timestamp;
[0151] S307. For each interacting user-item pair, extract all events involving that user and that item from the final session sequence and arrange them in chronological order to obtain an event list. For example, continuing with user U1001 and item P2024, assume the final session sequence contains several interaction events between U1001 and P2024: event 1 at 10:15:23 on March 2, 2025, with behavior type "click"; event 2 at 10:18:45 on March 2, 2025, with behavior type "favorite"; and event 3 at 10:22:10 on March 2, 2025, with behavior type "purchase". The event list then contains these three events in chronological order.
[0152] S308. Initialize the learnable parameter set, which includes the intensity weight parameters and edge weight time decay coefficients corresponding to the behavior types. For example, assume there are three behavior types: click, favorite, and purchase. Initialize the behavior type intensity weight vector, setting the intensity weight of click to 0.5, the intensity weight of favorite to 0.8, and the intensity weight of purchase to 1.0. Initialize the edge weight time decay coefficient to 0.001 per second. These parameters will be adjusted through optimization in subsequent iterations.
[0153] S309. Based on the event list, calculate the directed edge weight from the user to the item using an edge weight aggregation function, which combines, for each event in the event list, the intensity weight parameter of its behavior type with the time decay effect of its occurrence time relative to the current moment, and accumulates the contributions of all events. Specifically, in this embodiment the edge weight aggregation function is defined as follows: for each event in the user-item event list, the product of the event's behavior type intensity weight and the time decay factor of its occurrence time relative to the current moment is calculated, and the products of all events are summed to obtain the directed edge weight from the user to the item. The time decay factor is calculated from a dedicated edge weight time decay coefficient and the time interval. In this way, multiple interaction events of the same user-item pair, such as clicks, favorites, and purchases, are weighted by their recency and aggregated into a single directed edge weight. The parameters of the function are iteratively optimized based on the node discrimination index, so that the resulting interaction graph structure reflects both behavior type intensity and timeliness.
[0154] The time impact decay factor of each event is calculated from the time interval between the event's occurrence time and the current time and from the edge weight time decay coefficient; the longer the interval, the smaller the factor. Specifically, an exponential decay function with base e is used, whose exponent is the negative of the edge weight time decay coefficient multiplied by the time interval, i.e., decay factor = e^(-λ·Δt), where λ is the edge weight time decay coefficient and Δt is the time interval in seconds.
[0155] For example, suppose the current time is 11:00:00 on March 2, 2025. The time intervals between the events and the current time are 2677 seconds for event 1, 2475 seconds for event 2, and 2270 seconds for event 3. With an edge weight time decay coefficient of 0.001 per second, the time impact decay factors are approximately e^(-2.677) ≈ 0.069 for event 1, e^(-2.475) ≈ 0.084 for event 2, and e^(-2.27) ≈ 0.103 for event 3. Multiplying each event's behavior type intensity weight by its decay factor gives contributions of 0.5 × 0.069 = 0.0345 for the click, 0.8 × 0.084 = 0.0672 for the favorite, and 1.0 × 0.103 = 0.103 for the purchase. Summing these contributions, the weight of the directed edge from user U1001 to item P2024 is 0.0345 + 0.0672 + 0.103 = 0.2047.
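The aggregation in S309 and the example above can be reproduced with a short Python sketch. The function name `edge_weight` and the dictionary layout of the learnable parameters are illustrative assumptions; the intensity weights, decay coefficient, timestamps, and current time are taken from the example.

```python
import math
from datetime import datetime

# Learnable parameter set initialized as in S308 (values from the example).
INTENSITY = {"click": 0.5, "favorite": 0.8, "purchase": 1.0}
DECAY_PER_SECOND = 0.001

def edge_weight(events, now, intensity=INTENSITY, decay=DECAY_PER_SECOND):
    """S309: sum over events of intensity[behavior] * exp(-decay * elapsed_seconds)."""
    total = 0.0
    for timestamp, behavior in events:
        elapsed = (now - timestamp).total_seconds()
        total += intensity[behavior] * math.exp(-decay * elapsed)
    return total

# Chronological event list for the pair (U1001, P2024), as in S307.
events = [
    (datetime(2025, 3, 2, 10, 15, 23), "click"),
    (datetime(2025, 3, 2, 10, 18, 45), "favorite"),
    (datetime(2025, 3, 2, 10, 22, 10), "purchase"),
]
now = datetime(2025, 3, 2, 11, 0, 0)

w_user_to_item = edge_weight(events, now)
w_item_to_user = w_user_to_item  # S310: both directions share one weight
print(round(w_user_to_item, 4))  # about 0.2050 with unrounded decay factors
```

The worked example rounds each decay factor to three decimals before multiplying, which gives 0.2047; keeping full precision gives about 0.2050. Either is acceptable as long as the same convention is used for every edge.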
[0156] S310. Set the weight of the directed edge from the item to the user equal to the weight of the directed edge from the user to the item, ensuring bidirectional symmetry of the edges; in the example, the weight of the directed edge from the item to the user is therefore also 0.2047.
[0157] S311. Construct a user-item interaction bipartite graph from all user nodes, item nodes, and the calculated directed edge weights: all users form one node type and all items form the other, and two directed edges with opposite directions but equal weights are established between each interacting user-item pair, forming the full bipartite graph structure.
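One possible in-memory form of the graph built in S310 and S311 is an adjacency map keyed by typed node identifiers. This is a minimal sketch; the helper names and the extra pairs involving U1002 and P3001 are hypothetical and serve only to give the toy graph more than one edge.

```python
from collections import defaultdict

def build_bipartite_graph(edge_weights):
    """S310-S311: directed, weighted user-item bipartite graph.

    edge_weights maps (user_id, item_id) -> weight from the aggregation function.
    Each interacting pair gets two opposite directed edges with the same weight.
    Nodes are tagged with their type so user and item IDs can never collide.
    """
    graph = defaultdict(dict)  # node -> {neighbor: weight}
    for (user, item), w in edge_weights.items():
        graph[("user", user)][("item", item)] = w
        graph[("item", item)][("user", user)] = w
    return graph

def degrees(graph):
    """Degree of a node = number of distinct neighbors."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

graph = build_bipartite_graph({
    ("U1001", "P2024"): 0.2047,
    ("U1001", "P3001"): 0.12,   # hypothetical extra interaction
    ("U1002", "P2024"): 0.31,   # hypothetical extra interaction
})
print(degrees(graph))
```

Tagging nodes with their type keeps the two node sets of the bipartite graph disjoint even if identifiers overlap across business lines.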
[0158] S312. Calculate the node discrimination index of the user-item interaction bipartite graph. The node discrimination index quantifies how distinguishable different user nodes and item nodes are in the feature space and graph structure, and consists of two parts:
[0159] The first part is the degree distribution entropy of all nodes. A node's degree is defined as the number of distinct neighboring nodes connected to it: for a user node, the number of different items it has interacted with; for an item node, the number of different users that have interacted with it. The degrees of all user and item nodes are collected into a degree set. The frequency of each distinct degree value is then counted and divided by the total number of nodes to obtain the probability of that degree value, and the entropy is calculated from these probabilities. A larger entropy indicates a more dispersed degree distribution, meaning the graph contains both high-degree and low-degree nodes, which gives higher structural diversity and better structural distinguishability of the nodes.
[0160] The second part is the Gini coefficient of the weights of all edges. First, the weights of all directed edges in the bipartite graph are collected into a weight dataset. The unevenness of the weight distribution is then calculated according to the definition of the Gini coefficient: the weights are sorted, the absolute differences between the weights of every pair of edges are summed, and the sum is divided by twice the average edge weight multiplied by the square of the total number of edges, i.e., G = Σ_i Σ_j |w_i - w_j| / (2·n²·μ), where n is the number of edges and μ is the average edge weight. The Gini coefficient ranges from 0 to 1; a larger coefficient indicates larger differences between edge weights, a more uneven weight distribution, and thus greater distinguishability of nodes along the weight dimension.
[0161] Finally, the two indicators mentioned above are weighted and summed to obtain the node discrimination index. Here, a first weight coefficient is preset for the degree distribution entropy value, and a second weight coefficient is preset for the edge weight Gini coefficient; their sum is usually 1. The degree distribution entropy value is multiplied by the first weight coefficient, and the edge weight Gini coefficient is multiplied by the second weight coefficient; then the two products are added together to obtain the final node discrimination index. This index comprehensively reflects the distinguishability of nodes at both the graph structure and edge weight levels.
[0162] For example, assume the constructed bipartite graph contains 10 user nodes and 20 item nodes, 30 nodes in total, with the following degree distribution: 5 nodes have degree 1, 10 have degree 2, 8 have degree 3, 4 have degree 4, and 3 have degree 5. The probability of each degree value is p(1) = 5 / 30 ≈ 0.1667, p(2) = 10 / 30 ≈ 0.3333, p(3) = 8 / 30 ≈ 0.2667, p(4) = 4 / 30 ≈ 0.1333, and p(5) = 3 / 30 = 0.1. Computing the entropy from these probabilities with a base-2 logarithm gives a degree distribution entropy of approximately 2.187. Suppose the Gini coefficient of all directed edge weights is calculated to be 0.4. With a weight coefficient of 0.5 for the degree distribution entropy and 0.5 for the edge weight Gini coefficient, the node discrimination index is 0.5 × 2.187 + 0.5 × 0.4 ≈ 1.294. This value is used to determine whether the currently constructed bipartite graph meets the discrimination requirement: if it is less than the preset threshold, the graph is output; otherwise, the parameters are adjusted and re-optimized.
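The two components of the node discrimination index and their weighted sum can be sketched in Python as follows. The text does not name the logarithm base, so base 2 is an assumption here (the natural logarithm would give about 1.52 for the same distribution); the Gini formula follows the definition in [0160].

```python
import math
from collections import Counter

def degree_entropy(degrees):
    """Shannon entropy of the node degree distribution (base-2 logarithm assumed)."""
    n = len(degrees)
    return -sum((c / n) * math.log2(c / n) for c in Counter(degrees).values())

def gini(weights):
    """Gini coefficient: sum of |w_i - w_j| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(weights)
    mean = sum(weights) / n  # edge weights are positive, so mean > 0
    return sum(abs(a - b) for a in weights for b in weights) / (2 * n * n * mean)

def discrimination_index(degrees, weights, w_entropy=0.5, w_gini=0.5):
    """S312: weighted sum of degree-distribution entropy and edge-weight Gini coefficient."""
    return w_entropy * degree_entropy(degrees) + w_gini * gini(weights)

# Degree distribution from the worked example (30 nodes).
degs = [1] * 5 + [2] * 10 + [3] * 8 + [4] * 4 + [5] * 3
h = degree_entropy(degs)
print(round(h, 3))                    # 2.187
print(round(0.5 * h + 0.5 * 0.4, 3))  # 1.294, using the example's assumed Gini of 0.4
print(gini([0.0, 0.0, 0.0, 1.0]))     # 0.75: one edge carries all the weight
```

The pairwise Gini form is O(n²) in the number of edges; for large graphs the equivalent sorted-weights formula is the usual choice.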
[0163] S313. Determine whether the node discrimination index is less than the fourth preset threshold;
[0164] S314. If the node discrimination index is less than the fourth preset threshold, then the currently constructed user-item interaction bipartite graph is determined as the final user-item interaction bipartite graph.
[0165] S315. If the node discrimination index is greater than or equal to the fourth preset threshold, update the behavior type intensity weight parameters and the edge weight time decay coefficient in the learnable parameter set based on the difference between the node discrimination index and the fourth preset threshold, and return to the step of calculating the directed edge weights from the event list through the edge weight aggregation function, repeating until the node discrimination index is less than the fourth preset threshold.
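Steps S313 to S315 form a closed loop. The text specifies only that the parameters are updated "based on the difference", so `hypothetical_update` below, which divides the decay coefficient by (2 + gap), is an illustrative assumption and not the method's rule. Because the degree entropy depends only on which pairs interact and not on the edge weight parameters, the toy index in this sketch is the Gini coefficient alone; in the full index the entropy term is a constant offset, which is why the loop is capped at `max_iter`.

```python
import math

def gini(weights):
    """Gini coefficient of positive edge weights."""
    n = len(weights)
    mean = sum(weights) / n
    return sum(abs(a - b) for a in weights for b in weights) / (2 * n * n * mean)

def build_weights(edge_events, params):
    """Recompute every edge weight from its (behavior, elapsed_seconds) events."""
    return [
        sum(params["intensity"][b] * math.exp(-params["decay"] * dt) for b, dt in events)
        for events in edge_events
    ]

def hypothetical_update(params, gap):
    """Hypothetical rule: shrink the decay coefficient more when the index overshoots more."""
    new = dict(params)
    new["decay"] = params["decay"] / (2.0 + gap)
    return new

def optimize_graph(edge_events, params, threshold, index_fn, update_fn, max_iter=50):
    """S313-S315: recompute weights until the index falls below the threshold."""
    for _ in range(max_iter):
        weights = build_weights(edge_events, params)
        index = index_fn(weights)
        if index < threshold:
            return weights, params, index
        params = update_fn(params, index - threshold)
    raise RuntimeError("node discrimination index never fell below the threshold")

# Toy graph: three edges, each with a single click of a different age (seconds).
edge_events = [[("click", 100)], [("click", 1000)], [("click", 3000)]]
params = {"intensity": {"click": 0.5, "favorite": 0.8, "purchase": 1.0}, "decay": 0.001}
weights, params, index = optimize_graph(edge_events, params, 0.05, gini, hypothetical_update)
print(index < 0.05, params["decay"])
```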
[0166] For example, this embodiment gives a complete explanation using the construction of a user profile on an e-commerce platform. First, the node attribute feature vectors of user U1001 and item P2024 are initialized. User U1001 has been active for 28 days and has spent 1250 yuan in the past 30 days; after normalization, the user numerical vector is [0.0767, 0.1250]. The user's categorical features are a silver membership level and an App Store registration channel, which are mapped to a 32-dimensional user category embedding vector through the constructed categorical embedding matrix, with first three dimensions 0.12, -0.05, and 0.08. The user's textual features are the interest tags "outdoor sports, running," which a pre-trained BERT model encodes into a 768-dimensional user text embedding vector, with first three dimensions 0.35, 0.42, and 0.28. Concatenating the normalized numerical vector, category embedding vector, and text embedding vector yields the 802-dimensional initial attribute feature vector of user U1001. Item P2024 has 12,000 favorites and a price of 549 yuan; after normalization, the item numerical vector is [0.1200, 0.2745]. The item's categorical features, "shoes" and "a sports brand," are mapped to a 32-dimensional category embedding vector through the categorical embedding matrix, with first three dimensions 0.21, 0.15, and -0.03. The item's textual feature is the product title "Lightweight Cushioning Running Shoes," which the pre-trained BERT model encodes into a 768-dimensional text embedding vector, with first three dimensions 0.58, 0.63, and 0.71. Concatenating the normalized numerical vector, category embedding vector, and text embedding vector yields the initial attribute feature vector of item P2024.
[0167] Obtain the list of interaction events between user U1001 and item P2024 in the final session sequence, arranged in chronological order as follows: Event 1 occurs at 10:15:23 on March 2, 2025, with the behavior type being click; Event 2 occurs at 10:18:45 on March 2, 2025, with the behavior type being favorite; Event 3 occurs at 10:22:10 on March 2, 2025, with the behavior type being purchase. Initialize the learnable parameter set, where the behavior type intensity weight parameters are set to click intensity 0.5, favorite intensity 0.8, and purchase intensity 1.0, and the edge weight time decay coefficient is set to 0.001 per second.
[0168] Using the current time of 11:00:00 on March 2, 2025 as the baseline, the time intervals between each event and the current time are calculated: Event 1 has a time interval of 2677 seconds, Event 2 has a time interval of 2475 seconds, and Event 3 has a time interval of 2270 seconds. The time impact decay factor for each event is calculated based on the edge weight time decay coefficient: Event 1 is approximately e^(-2.677) = 0.069, Event 2 is approximately e^(-2.475) = 0.084, and Event 3 is approximately e^(-2.27) = 0.103. Multiplying the behavior type intensity weight of each event by the time impact decay factor yields the following contributions: Click event contribution: 0.5 x 0.069 = 0.0345; Favorite event contribution: 0.8 x 0.084 = 0.0672; Purchase event contribution: 1.0 x 0.103 = 0.103. The contributions of all events are summed up to obtain a directed edge weight of 0.2047 from user U1001 to item P2024, and the weight of the directed edge from item P2024 to user U1001 is set to be equal to this weight.
[0169] Based on all user nodes, item nodes, and the calculated directed edge weights, a user-item interaction bipartite graph containing 10 user nodes and 20 item nodes is constructed. The degree distribution of all nodes is as follows: 5 nodes have degree 1, 10 nodes have degree 2, 8 nodes have degree 3, 4 nodes have degree 4, and 3 nodes have degree 5. Calculating the probability of each degree value from this distribution gives a degree distribution entropy of approximately 2.187 (base-2 logarithm). At the same time, the weights of all directed edges are collected, and their Gini coefficient is taken to be 0.4. With the weight coefficients of the degree distribution entropy and the edge weight Gini coefficient both set to 0.5, the weighted sum gives a node discrimination index of approximately 1.294. If the node discrimination index is less than the fourth preset threshold, the currently constructed bipartite graph is output as the final bipartite graph; if it is greater than or equal to the fourth preset threshold, the behavior type intensity weight parameters and the edge weight time decay coefficient are updated according to the difference between the index and the threshold, and the edge weights and the node discrimination index are recalculated until the index meets the requirement.
[0170] By constructing a complete technical solution for the user-item interaction bipartite graph, this process systematically addresses three major technical challenges: unified encoding of multi-source heterogeneous metadata, composite quantification of multiple interaction behaviors, and enhanced node distinguishability in sparse graph structures. First, to address the heterogeneity of user and item attribute metadata, a divide-and-conquer strategy is adopted: numerical features are normalized, categorical features are mapped through embeddings, and semantic features are extracted from text with BERT. This encodes multi-source heterogeneous data such as active days, spending amount, membership level, and interest tags into unified, semantically aligned initial feature vectors, eliminating differences in scale while preserving the rich semantics of the original attributes, and provides high-quality initial node representations for the graph neural network. Second, to quantify the composite impact of multiple interactions on the same user-item pair, a learnable edge weight aggregation function is designed: by multiplying each behavior type's intensity weight by an exponential time impact decay factor and accumulating the products, behavioral signals of different intensities, such as clicks, favorites, and purchases, are fused with their occurrence times into directed edge weights. This avoids the information cancellation or loss caused by simple aggregation and ensures that the edge weights reflect both the semantic differences between behavior types and the nonlinear decay of interest over time. Finally, to address long-tail sparsity in the interaction bipartite graph, a node discrimination index is introduced, in which the degree distribution entropy quantifies the structural diversity of nodes and the Gini coefficient of the edge weights quantifies the unevenness of the weight distribution.
A closed loop of iterative parameter optimization driven by feedback from this index is constructed, dynamically updating the behavior type intensity weight parameters and the edge weight time decay coefficient according to the difference between the index and the threshold until the index falls below the fourth preset threshold. This allows the edge weight distribution to adaptively enhance the distinguishability of weakly connected nodes. The resulting user-item interaction bipartite graph has semantically aligned node attributes, edge weights that accurately reflect the combined effects of multiple interactions, and good distinguishability of sparse nodes. It thus provides high-quality input, with both structural integrity and refined weights, for the subsequent high-order neighborhood information fusion in the graph attention network, resolving the biased representation of user community influence caused by insufficient graph structure quality.
[0171] Further, in this embodiment, outputting the sequence representation vector that represents the user's long-term interests and short-term intentions includes:
[0172] S411. Obtain the behavior vector sequence, which contains N behavior vectors arranged in chronological order, each behavior vector corresponding to an interaction event; for example, taking the aforementioned e-commerce scenario, the behavior vector sequence with temporal context contains behavior vectors corresponding to five events, namely clicking product A, clicking product B, adding product B to favorites, purchasing product B, and clicking product C. Each behavior vector is 64-dimensional, and the specific values are generated by the previous steps;
[0173] S412. Add positional encoding to the behavior vector sequence to generate a behavior vector sequence with positional information. For example, sinusoidal positional encoding is used to generate, for each position in the sequence, a positional encoding vector with the same dimension as the behavior vector; adding the positional encoding element-wise to the behavior vector at the corresponding position yields a behavior vector sequence with positional information, still consisting of five 64-dimensional vectors, denoted P1, P2, P3, P4, and P5;
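The text names sinusoidal positional encoding without giving its formula. This Python sketch assumes the standard form, sin(pos / 10000^(2i/d)) on even dimensions and cos on odd dimensions; the zero behavior vectors are placeholders for the 64-dimensional vectors produced by the preceding steps.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positional_encoding(behavior_vectors):
    """S412: element-wise addition of positional encodings to behavior vectors."""
    pe = sinusoidal_positional_encoding(len(behavior_vectors), len(behavior_vectors[0]))
    return [[x + p for x, p in zip(vec, enc)] for vec, enc in zip(behavior_vectors, pe)]

behavior_vectors = [[0.0] * 64 for _ in range(5)]  # placeholders for the five 64-d behavior vectors
P = add_positional_encoding(behavior_vectors)
print(len(P), len(P[0]), P[0][:4])  # 5 64 [0.0, 1.0, 0.0, 1.0]
```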
[0174] S413. Input the behavior vector sequence with positional information into a multi-head self-attention network, which calculates attention weights with multiple parallel attention heads. Each attention head linearly transforms the input into a query matrix, a key matrix, and a value matrix and calculates scaled dot-product attention; the outputs of all attention heads are concatenated and then linearly transformed to obtain the self-attention output sequence. For example, the number of attention heads is set to 8. For the input sequence P1 to P5, each head maps the input to a query matrix Q, a key matrix K, and a value matrix V using three different linear transformation matrices, each projecting to 8 dimensions (64 / 8 heads). Each head then calculates scaled dot-product attention and outputs a sequence of 8-dimensional vectors. The outputs of the 8 heads are concatenated along the feature dimension into 64-dimensional vectors, which a linear transformation layer maps to 64 dimensions to obtain the self-attention output sequence, still five 64-dimensional vectors, denoted A1 to A5.
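A dependency-free Python sketch of S413 follows, showing per-head Q/K/V projections to 8 dimensions, scaled dot-product attention, concatenation of the 8 heads, and the output projection. The weights are randomly initialized purely for illustration (in the method they are learned), and the random input stands in for P1 to P5.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, num_heads=8, seed=0):
    """Multi-head scaled dot-product self-attention over a sequence x (n x d_model)."""
    rng = random.Random(seed)
    d_model = len(x[0])
    d_k = d_model // num_heads  # 64 / 8 = 8 dimensions per head
    head_outputs = []
    for _ in range(num_heads):
        q = matmul(x, random_matrix(d_model, d_k, rng))
        k = matmul(x, random_matrix(d_model, d_k, rng))
        v = matmul(x, random_matrix(d_model, d_k, rng))
        out = []
        for qi in q:
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
            weights = softmax(scores)
            out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d_k)])
        head_outputs.append(out)
    # Concatenate the heads along the feature dimension, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, random_matrix(num_heads * d_k, d_model, rng))

rng = random.Random(1)
P = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(5)]  # stand-in for P1..P5
A = multi_head_self_attention(P)
print(len(A), len(A[0]))  # 5 64
```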
[0175] S414. Input the self-attention output sequence into a feedforward neural network; after two linear transformations with an activation function, a feedforward output sequence is obtained. For example, the feedforward neural network contains two fully connected layers: the first maps the 64-dimensional input to 256 dimensions followed by a ReLU activation, and the second maps the 256-dimensional result back to 64 dimensions. The self-attention outputs A1 to A5 are passed through the network position by position to obtain the feedforward output sequence F1 to F5, each 64-dimensional.
[0176] S415. Perform layer normalization on the feedforward output sequence and add a residual connection with the self-attention output sequence to obtain an encoded vector sequence. For example, each feedforward output vector is layer-normalized, and the normalized vector is added element-wise to the corresponding self-attention output vector to obtain the encoded vectors E1 to E5, each still 64-dimensional;
[0177] S416. Extract the vector of the last time step from the encoded vector sequence as a short-term vector representing the user's short-term intent, and perform average pooling on the entire encoded vector sequence to obtain a long-term vector representing the user's long-term interests; for example, extract the encoded vector E5 corresponding to the last event (clicking product C) as a short-term vector, assuming its specific value is [0.52, 0.33, -0.21, 0.18, 0.07, ...] (64 dimensions in total). Average the five encoded vectors E1 to E5 of the entire sequence along the time dimension to obtain the long-term vector, assuming its value is [0.28, 0.41, 0.15, 0.09, 0.12, ...] (64 dimensions);
[0178] S417. Concatenate the long-term vector and the short-term vector to obtain a sequence representation vector that integrates long-term interests and short-term intentions. For example, the 64-dimensional long-term vector and the 64-dimensional short-term vector are concatenated along the feature dimension into a 128-dimensional vector, which is the final output sequence representation vector. This vector contains both the user's long-term behavior pattern from the start of the sequence and the immediate intention reflected by the latest click.
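Steps S414 to S417 can be sketched together in Python. The sketch follows the order stated in S415 (layer-normalize the feedforward output, then add the self-attention output). The learnable scale and shift of layer normalization are left at 1 and 0, and the weights and input are random placeholders; all of these are simplifying assumptions.

```python
import random

def linear(x, w, b):
    """y = x W + b for a single vector x, with W given as (d_in x d_out) rows."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def relu(v):
    return [max(0.0, t) for t in v]

def layer_norm(v, eps=1e-5):
    """Zero-mean, unit-variance normalization (learnable scale/shift left at 1 and 0)."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    return [(t - mean) / (var + eps) ** 0.5 for t in v]

def encode(attn_out, d_model=64, d_ff=256, seed=0):
    """S414-S415: position-wise FFN (64 -> 256 -> 64), layer norm, residual add."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_ff)] for _ in range(d_model)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
    b1, b2 = [0.0] * d_ff, [0.0] * d_model
    encoded = []
    for a in attn_out:
        f = linear(relu(linear(a, w1, b1)), w2, b2)
        encoded.append([n + r for n, r in zip(layer_norm(f), a)])
    return encoded

def sequence_representation(encoded):
    """S416-S417: last step is the short-term vector, the mean is the long-term vector."""
    short_term = encoded[-1]
    long_term = [sum(col) / len(encoded) for col in zip(*encoded)]
    return long_term + short_term  # [long-term || short-term], 128 dimensions

rng = random.Random(2)
A = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(5)]  # stand-in for A1..A5
E = encode(A)
s = sequence_representation(E)
print(len(E), len(s))  # 5 128
```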
[0179] This embodiment encodes the sequence of behavior vectors with temporal context using a multi-head self-attention network. First, sinusoidal positional encoding integrates the temporal order into the behavior vectors; combined with the temporal context already carried by those vectors, this addresses the problem that positional encoding alone represents only relative order and cannot quantify the impact of real time intervals on interest decay. Second, the multi-head self-attention mechanism captures long-distance dependencies between any two events in the sequence in parallel, overcoming the dilution of long-term information caused by the chain structure of recurrent neural networks; early behaviors at the start of the sequence participate directly in the attention computation of later events, preserving a stable memory of the user's long-term interests. Third, residual connections and layer normalization keep training of the deep network stable, so that the encoded vector sequence retains the original behavioral semantics while integrating global contextual information. Finally, the vector of the last time step is extracted as the short-term intent representation, and the entire sequence is average-pooled into the long-term interest representation; concatenating the two decouples and jointly expresses user interests at two levels: micro-level real-time intent and macro-level stable preference. The resulting sequence representation vector captures the instantaneous interest shift reflected in the user's most recent click on product C while fully preserving the long-term preferences implied by the whole behavioral pattern from clicking A to purchasing B.
This provides a high-quality, multi-scale user interest representation foundation for subsequent collaborative fusion with the high-order neighborhood information output by the graph neural network, thereby significantly improving the completeness of user profiles and decision-making accuracy of the recommendation system in scenarios of sparse behavior and interest drift.
[0180] In this embodiment, in the step of fusing high-order neighborhood information of the user-item interaction bipartite graph with a graph neural network, the bipartite graph has already quantified the combined influence of behavior type and time decay through the edge weight aggregation function. The core technical problem of this step is therefore how to use the graph attention mechanism to distinguish the importance of different neighbor nodes, avoid the oversmoothing caused by multi-layer propagation, and adaptively fuse a node's own attribute features with neighborhood information, so as to generate user node representations that both retain individual characteristics and reflect community influence. Accordingly, in this embodiment, outputting the user node representation vector that fuses neighborhood information includes:
[0181] S421. Obtain the final user-item interaction bipartite graph, which includes a set of user nodes, a set of item nodes, and directed edge weights between nodes. Each user node and item node includes the initial attribute feature vector.
[0182] S422. Initialize the current layer node representation vector to the initial attribute feature vector;
[0183] S423. Set the number of layers L of the graph attention network and initialize the learnable parameters of each layer, including the query, key, and value transformation matrices used for attention weight calculation and the linear transformation matrix applied after multi-head concatenation.
[0184] S424. For each layer of the graph attention network, perform the following operations in sequence:
[0185] For each target node, collect all its neighbor nodes and the corresponding directed edge weights;
[0186] The attention coefficient between the target node and each neighbor node is calculated through a single-layer feedforward network based on the current representation of the target node, the current representation of the neighbor node, and the weight of the directed edge between them. The attention coefficients of all neighbors of the same target node are then normalized with the softmax function.
[0187] S425. Based on the normalized attention coefficients, perform a weighted summation of the value vectors of the neighboring nodes to obtain the single-head aggregation vector of the target node.
[0188] S426. A multi-head attention mechanism is adopted to compute multiple independent attention heads in parallel. The aggregated vectors obtained by each attention head are concatenated along the feature dimension and then subjected to a linear transformation to obtain the layer output vector of the target node.
[0189] S427. Perform layer normalization on the layer output vector of the target node from the current layer to obtain a normalized vector, and add the normalized vector element-wise to the node representation vector of the target node input to the current layer to obtain the node representation vector updated by the current layer. For example, taking the processing of user U1001 by the first layer of the graph attention network: first, the layer output vector of user U1001 is obtained through the multi-head attention mechanism; this 802-dimensional vector is produced by concatenating the 64-dimensional aggregated vectors output by the four attention heads and applying a linear transformation. Next, the layer output vector is normalized: the mean and standard deviation over all its dimensions are computed, the vector is standardized to zero mean and unit variance, and learnable scaling and shift parameters are applied, yielding the normalized vector. Finally, the normalized vector is added element-wise to the node representation vector of user U1001 input to the current layer (i.e., the initial 802-dimensional attribute feature vector), forming a residual connection and yielding the first-layer updated node representation vector. This vector retains the original attribute features of user U1001, such as its 28 active days and membership level, while incorporating the behavioral information propagated through the attention mechanism from its first-order neighbor item P2024, such as the compound "click-favorite-purchase" semantics, providing a high-quality input representation for the second layer to aggregate further neighborhood information.
[0190] S428. Determine the node representation vectors of all user nodes output by the last layer graph attention network as user node representation vectors that incorporate neighborhood information.
[0191] For example, in the aforementioned e-commerce scenario, the final bipartite graph contains 10 user nodes and 20 item nodes, each with an 802-dimensional initial attribute feature vector. The graph attention network has L = 2 layers with 4 attention heads per layer, each head outputting 64 dimensions. In the first layer, taking user node U1001 as an example, its neighbors include item node P2024 and others, each represented by an 802-dimensional vector. To calculate the attention coefficient between U1001 and its neighbor P2024, the representation of U1001 (802 dimensions), the representation of P2024 (802 dimensions), and the edge weight (a scalar) are concatenated into a 1605-dimensional vector, which is input into a single-layer feedforward network and passed through LeakyReLU to obtain a scalar score. The scores of all neighbors are normalized with the softmax function. Each head maps neighbor representations to 64-dimensional value vectors with a 64×802 value transformation matrix; these are weighted by the attention coefficients to obtain four 64-dimensional aggregated vectors, which are concatenated into a 256-dimensional vector and mapped back to 802 dimensions by an 802×256 linear transformation matrix to obtain the layer output vector. Layer normalization and the residual connection then yield the first-layer updated representation of U1001. The second layer works in the same way: the representation of U1001 now already incorporates first-order neighbor information, and aggregating over its item neighbors, whose representations now carry information from their other users (U1001's second-order neighbors), yields higher-order representations. The user node representations output by the final layer are the user node representation vectors that incorporate higher-order neighborhood information.
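A scaled-down Python sketch of one graph attention layer as described in S424 to S427 follows: each head scores a neighbor from the concatenation [target || neighbor || edge weight] through a single-layer network with LeakyReLU, normalizes the scores with softmax, and aggregates value vectors; the heads are concatenated and projected back, then layer-normalized and added residually. To keep it runnable without libraries it uses 6-dimensional features and 2 heads of 3 dimensions instead of 802 dimensions and 4 heads of 64. The LeakyReLU slope of 0.2, the random weights, and the extra nodes U1002 and P3001 are assumptions.

```python
import math
import random

def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [t / s for t in e]

def matvec(m, v):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def layer_norm(v, eps=1e-5):
    """Standardize a vector (learnable scale/shift left at 1 and 0)."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    return [(t - mean) / math.sqrt(var + eps) for t in v]

def init_layer(d, num_heads, d_head, rng):
    """Random layer parameters: per-head scoring vector a and value matrix W_v, plus W_o."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    heads = [{"a": mat(1, 2 * d + 1)[0], "W_v": mat(d_head, d)} for _ in range(num_heads)]
    return {"heads": heads, "W_o": mat(d, num_heads * d_head)}

def gat_layer(h, graph, params):
    """One layer; h: node -> vector, graph: node -> {neighbor: directed edge weight}."""
    updated = {}
    for node, nbrs in graph.items():
        neighbors = list(nbrs)
        head_vecs = []
        for head in params["heads"]:
            # Score each neighbor from [h_target || h_neighbor || edge_weight], then softmax.
            scores = [leaky_relu(sum(a * x for a, x in zip(head["a"], h[node] + h[n] + [nbrs[n]])))
                      for n in neighbors]
            alphas = softmax(scores)
            values = [matvec(head["W_v"], h[n]) for n in neighbors]
            head_vecs.append([sum(w * v[c] for w, v in zip(alphas, values)) for c in range(len(values[0]))])
        layer_out = matvec(params["W_o"], sum(head_vecs, []))  # concatenate heads, project to d
        updated[node] = [x + r for x, r in zip(layer_norm(layer_out), h[node])]  # LN, then residual
    return updated

rng = random.Random(0)
graph = {
    "U1001": {"P2024": 0.2047, "P3001": 0.12},
    "U1002": {"P2024": 0.31},
    "P2024": {"U1001": 0.2047, "U1002": 0.31},
    "P3001": {"U1001": 0.12},
}
d = 6
h = {n: [rng.gauss(0.0, 1.0) for _ in range(d)] for n in graph}
for params in [init_layer(d, num_heads=2, d_head=3, rng=rng) for _ in range(2)]:  # L = 2
    h = gat_layer(h, graph, params)
user_reprs = {n: v for n, v in h.items() if n.startswith("U")}
print(sorted(user_reprs), len(user_reprs["U1001"]))
```

After two layers, U1001's vector has been influenced by U1002 through the shared item P2024, which is the second-order "community" signal the text describes.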
[0192] By introducing a multi-layer graph attention network for message passing and node representation updates in the user-item interaction bipartite graph, this process addresses several technical challenges in graph structure information fusion: the difficulty of distinguishing the importance of neighboring nodes, oversmoothing due to multi-layer propagation, and the difficulty of balancing individual characteristics against community influence. Specifically, in each layer of the graph attention network, the attention coefficient is calculated by concatenating the target node representation, the neighbor node representation, and the edge weight and inputting them into a single-layer feedforward network. This models neighbor importance differentially according to behavioral semantics and temporal weights: neighbors with strong-intent signals such as purchases receive higher attention weights, while neighbors with weak signals such as accidental clicks receive lower attention weights, suppressing their contribution. A multi-head attention mechanism captures neighborhood aggregation features in different semantic subspaces in parallel, enriching the expressive power of node representations. Residual connections add the layer output and input representations element-wise, effectively alleviating the convergence of node representations (oversmoothing) caused by multi-layer propagation, so that each node retains its own initial attribute features while aggregating high-order neighborhood information. After two layers of propagation, the user node representation integrates the behavioral information of directly interacting (first-order) items and aggregates the community influence propagated by second-order and higher-order neighbors, such as the preferences of other users who interacted with the same item, achieving a unified expression of individual characteristics and macro-level community influence.
The final output user node representation vector provides a high-quality input with rich structural semantics for the subsequent dynamic fusion with the sequence representation vector, improving the robustness of user profiles in behavior-sparse scenarios and the representation ability in the cold start stage.
[0193] Building on the sequence representation vector, which reflects the user's long-term interests and short-term intentions, and the user node representation vector fused with neighborhood information, this embodiment addresses a key issue in profile construction: how to dynamically weigh the contributions of these two types of heterogeneous information according to the user's current real-time behavioral context and stable social influence, thereby avoiding the imbalance in interest expression caused by fixed-weight fusion. It should be further noted that, in this embodiment, generating the unified dynamic user profile vector includes:
[0194] S431. Obtain the sequence representation vector, wherein the sequence representation vector is a vector representation that integrates the user's long-term interests and short-term intentions;
[0195] S432. Obtain the user node representation vector fused with neighborhood information, wherein the user node representation vector is the final user node vector obtained after updating by the multi-layer graph attention network;
[0196] S433. Initialize the parameters of the fusion gated network, which includes a first fully connected layer, a second fully connected layer, and a softmax output layer;
[0197] S434. Concatenate the sequence representation vector with the user node representation vector to obtain a joint feature vector;
[0198] S435. The joint feature vector is input into the fusion gated network, mapped to the hidden layer dimension through the first fully connected layer, then mapped to the two-dimensional space through the second fully connected layer, and finally normalized through the softmax output layer to obtain two dynamic fusion weights, which correspond to the weight α of the sequence representation vector and the weight β of the user node representation vector, respectively, and α+β=1.
[0199] S436. Multiply the sequence representation vector by the weight α to obtain the weighted sequence representation components;
[0200] S437. Multiply the user node representation vector by the weight β to obtain the weighted user node representation components.
[0201] S438. The weighted sequence representation component and the weighted user node representation component are added element by element to generate a unified dynamic user profile vector.
[0202] S439. Output the dynamic user profile vector to the downstream task adaptation layer.
[0203] For example, taking the aforementioned e-commerce scenario, the sequence representation vector is 128-dimensional and the user node representation vector is 802-dimensional. Concatenating the two yields a 930-dimensional joint feature vector. The first fully connected layer of the fusion gated network maps the 930-dimensional vector to 128 dimensions, the second fully connected layer maps the 128-dimensional vector to 2 dimensions, and the softmax output layer yields the weights α=0.3 and β=0.7. Because element-wise addition requires both vectors to have the same dimension, they must be aligned by a linear transformation before weighting: in this example, the user node representation dimension is taken as the final output dimension, and the 128-dimensional sequence representation vector is mapped to 802 dimensions through a fully connected layer (alignment in the opposite direction is equally possible). Multiplying the aligned 802-dimensional sequence representation vector by 0.3 and the 802-dimensional user node representation vector by 0.7 and adding the results yields an 802-dimensional dynamic user profile vector. This vector integrates the user's immediate intent and long-term preferences extracted from sequential behavior with the high-order community influence aggregated from the interaction graph, providing a comprehensive, adaptive user representation for downstream recommendation tasks.
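Steps S433 to S438 and the dimensions in the example above can be sketched as follows. This is an illustrative sketch only: the source does not name the activation between the two fully connected layers (ReLU is assumed here), the 128-to-802 alignment matrix `P` is a hypothetical stand-in for the alignment layer described above, and all parameters are random placeholders for trained weights, so the resulting α and β will not equal 0.3 and 0.7.

```python
import numpy as np

rng = np.random.default_rng(1)
d_seq, d_node, d_hidden = 128, 802, 128

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Stand-in parameters; in practice these are learned.
W1 = rng.normal(scale=(d_seq + d_node) ** -0.5, size=(d_hidden, d_seq + d_node))
b1 = np.zeros(d_hidden)
W2 = rng.normal(scale=d_hidden ** -0.5, size=(2, d_hidden))
b2 = np.zeros(2)
P = rng.normal(scale=d_seq ** -0.5, size=(d_node, d_seq))  # hypothetical 128 -> 802 alignment layer

def fuse(seq_vec, node_vec):
    joint = np.concatenate([seq_vec, node_vec])          # S434: 930-d joint feature vector
    hidden = np.maximum(0.0, W1 @ joint + b1)             # S435: first FC layer (ReLU assumed)
    alpha, beta = softmax(W2 @ hidden + b2)               # S435: second FC layer + softmax, sums to 1
    profile = alpha * (P @ seq_vec) + beta * node_vec     # S436-S438 after aligning dimensions
    return profile, alpha, beta

seq_vec = rng.normal(size=d_seq)     # stand-in sequence representation vector
node_vec = rng.normal(size=d_node)   # stand-in user node representation vector
profile, alpha, beta = fuse(seq_vec, node_vec)
```

Note that the gate sees the raw 930-dimensional concatenation, while alignment is applied only to the weighted sum; this mirrors the order described in [0203].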
[0204] Through the fusion gated network, this process calculates fusion weights dynamically from the joint features of the sequence representation vector and the user node representation vector, achieving adaptive fusion of user interests along both the temporal-evolution and community-structure dimensions. Specifically, because fixed-weight fusion struggles to balance changes in a user's immediate intent against stable social influence, the gated network generates dynamic weights for these two types of heterogeneous information based on the user's current behavioral context (reflected by the sequence representation vector) and the user's higher-order neighborhood information in the interaction graph (reflected by the user node representation vector). As a result, the contribution weight of sequence information increases when user interests shift rapidly, for example from browsing to purchasing, while the weight of community influence from the graph structure increases during periods of sparse user behavior or cold start. The resulting dynamic user profile vector retains the micro-level interest evolution trajectory extracted from sequential behavior and incorporates the macro-level community association features aggregated from the interaction graph, overcoming both the limited representational capacity of a single data source and the rigid use of information caused by fixed-weight fusion. Once input into the downstream task adaptation layer, this vector can provide more targeted and timely user representations for business stages such as recall, coarse ranking, and fine ranking, thereby improving the prediction accuracy and cold-start responsiveness of the recommendation system in complex scenarios and moving personalized decision-making from static profiling to dynamic adaptation.
[0205] For example, in the aforementioned e-commerce scenario, the generated 802-dimensional dynamic user profile vector is input into the downstream task adaptation layer, which generates a task parameter set according to the recommendation business requirements. The recall task head is a fully connected layer with a weight matrix and a bias that maps the dynamic user profile vector to a 64-dimensional recall embedding vector, for example [0.23, -0.15, 0.67, ..., 0.08]. The coarse-ranking task head is a two-layer fully connected network. The first layer, parameterized by a weight matrix and a bias, maps an 866-dimensional input (the dynamic user profile vector concatenated with a candidate item vector, assumed to be 64-dimensional) to 128 dimensions; the second layer maps this to 1 dimension and outputs a coarse-ranking score through a sigmoid function, with the threshold set to 0.6. The fine-ranking task head is a more complex three-layer neural network, with a weight matrix and a corresponding bias for each layer, that outputs the predicted click-through rate (CTR) and conversion rate (CVR). The adaptation layer packages the recall embedding vector, the coarse-ranking threshold of 0.6, and all weight parameters of the fine-ranking model into a task parameter package, which is sent to the online recommendation engine through a service interface. The engine uses the recall embedding vector to quickly retrieve 500 candidate products from the inverted index; the coarse-ranking network scores them and selects the top 100, which are input into the fine-ranking network to obtain the CTR and CVR of each product. Finally, the products are sorted in descending order of a fused score, for example CTR × 0.7 + CVR × 0.3.
The top 10 products with the highest scores are pushed to the user as personalized recommendation results, realizing a complete closed loop from user profiling to real-time recommendation decision-making.
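The recall, coarse-ranking and fine-ranking cascade described in [0205] can be sketched end to end as below. This is a hedged illustration with random stand-in weights and item vectors: a brute-force dot-product search replaces the inverted index, the hidden sizes of the fine-ranking network (128 and 64) and the ReLU activations are assumptions, and the way the 0.6 threshold and the top-100 cut are combined (filter first, then take the best 100) is one possible reading of the text. The helper names `dense` and `user_item_inputs` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d_profile, d_item, n_catalog = 802, 64, 5000

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense(n_out, n_in):
    """Hypothetical randomly initialized (weight, bias) pair for one FC layer."""
    return rng.normal(scale=n_in ** -0.5, size=(n_out, n_in)), np.zeros(n_out)

profile = rng.normal(size=d_profile)                 # dynamic user profile vector
item_vecs = rng.normal(size=(n_catalog, d_item))     # hypothetical candidate item vectors

# Recall head: one FC layer, 802 -> 64; retrieve the 500 best-matching items.
W_r, b_r = dense(d_item, d_profile)
recall_vec = W_r @ profile + b_r
recall_ids = np.argsort(-(item_vecs @ recall_vec))[:500]  # brute-force stand-in for the index

def user_item_inputs(ids):
    # Concatenate the profile with each item vector: 802 + 64 = 866 dims.
    return np.hstack([np.tile(profile, (len(ids), 1)), item_vecs[ids]])

# Coarse head: 866 -> 128 -> 1 with sigmoid; keep scores >= 0.6, then the top 100.
(Wc1, bc1), (Wc2, bc2) = dense(128, d_profile + d_item), dense(1, 128)
x = user_item_inputs(recall_ids)
coarse = sigmoid(np.maximum(0.0, x @ Wc1.T + bc1) @ Wc2.T + bc2).ravel()
mask = coarse >= 0.6
coarse_ids = recall_ids[mask][np.argsort(-coarse[mask])][:100]

# Fine head: three FC layers (866 -> 128 -> 64 -> 2); the two sigmoid outputs are CTR and CVR.
(Wf1, bf1), (Wf2, bf2), (Wf3, bf3) = dense(128, d_profile + d_item), dense(64, 128), dense(2, 64)
x = user_item_inputs(coarse_ids)
hidden = np.maximum(0.0, np.maximum(0.0, x @ Wf1.T + bf1) @ Wf2.T + bf2)
ctr, cvr = sigmoid(hidden @ Wf3.T + bf3).T
fused = 0.7 * ctr + 0.3 * cvr                        # fused score from [0205]
top10 = coarse_ids[np.argsort(-fused)][:10]          # products pushed to the user
```

In a production engine the recall step would use an approximate nearest-neighbor or inverted index rather than scoring the whole catalog; the brute-force search is used here only to keep the sketch self-contained.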
[0206] In the process above, once the dynamic user profile vector is generated, it is input directly into the downstream task adaptation layer to generate task parameters; however, there is no mechanism for evaluating the timeliness of the dynamic user profile vector itself or for updating it adaptively. Specifically, user interests evolve continuously, so the effectiveness of a dynamic user profile vector decays over time after it is distributed through the service interface; if it is not updated for a long period, the profile becomes disconnected from the user's actual interests. Different business scenarios also have different timeliness requirements for the profile vector: for example, real-time recommendation requires second-level updates, while offline mining can accept hour-level updates. The existing solution provides no lifecycle management or update-trigger mechanism for the profile vector. Moreover, once the profile vector is distributed to the business engine, there is no feedback loop on its actual effect, so the profile generation model cannot be continuously optimized based on business execution results such as click-through rate and conversion rate. It should be further explained that this embodiment therefore also includes a lifecycle management and adaptive update mechanism for the dynamic user profile vector after it is distributed to the business engine, specifically including:
[0207] S51. Obtain the generation timestamp T_generate of the dynamic user profile vector, and preset the validity period threshold ΔT_valid of the dynamic user profile vector. The validity period threshold is dynamically configured according to the business scenario, such as ΔT_valid=300 seconds for real-time recommendation scenario and ΔT_valid=3600 seconds for offline mining scenario.
[0208] S52. When the business engine receives the dynamic user profile vector, it records the current time T_current and calculates the existing duration of the dynamic user profile vector ΔT_existed=T_current-T_generate.
[0209] S53. Determine whether the existing duration ΔT_existed is less than the validity period threshold ΔT_valid;
[0210] If ΔT_existed < ΔT_valid, then the dynamic user profile vector is deemed valid and can be directly used for business decisions.
[0211] If ΔT_existed ≥ ΔT_valid, the dynamic user profile vector is determined to be invalid, triggering a profile update request.
[0212] S54. In response to the profile update request, obtain the user interaction behavior logs newly added from T_generate to T_current, and record them as incremental behavior logs;
[0213] S55. Based on the incremental behavior log, generate an incremental dynamic user profile vector and merge and update it with the original dynamic user profile vector, specifically including:
[0214] S551. Obtain the original dynamic user profile vector V_old and the incremental dynamic user profile vector V_increment generated by the incremental behavior log;
[0215] S552. Initialize the fusion coefficient λ, which is dynamically determined based on the ratio of the number of events N_increment in the incremental behavior log to the total number of historical events N_total, λ=min(1,N_increment / (N_total+N_increment));
[0216] S553. Calculate the updated dynamic user profile vector V_new=(1-λ)×V_old+λ×V_increment;
[0217] S554. Update the timestamp of the updated dynamic user profile vector V_new to T_current and store it in the profile library;
[0218] S56. The updated dynamic user profile vector V_new is reissued to the business engine to replace the original invalid profile vector and continue to drive personalized decision execution.
[0219] S57. Periodically collect business performance indicators from the business engine, such as order conversion rate (CVR) and click-through rate (CTR), and build a performance monitoring dataset.
[0220] S58. When the business performance indicator falls below the preset threshold for K consecutive periods, trigger the model retraining mechanism: return to the step of generating the unified dynamic user profile vector and globally update the parameters of the joint embedding encoding, the graph attention network, and the fusion gated network.
[0221] For example, taking the aforementioned e-commerce scenario, suppose the dynamic user profile vector is generated at 11:00:00 on March 2, 2025, and the validity period threshold in the real-time recommendation scenario is set to 300 seconds. When the business engine receives the profile vector at 11:05:30, it calculates that the existing duration of 330 seconds is greater than 300 seconds, determines it to be invalid, and triggers an update request. It retrieves the user behavior logs added between 11:00:00 and 11:05:30, including the event where user U1001 clicked on product P2025 at 11:03:20. Based on this incremental log, an incremental dynamic user profile vector V_increment is generated. Meanwhile, the total number of historical events N_total=3, corresponding to clicking A, favoriting B, and purchasing B, and the number of incremental events N_increment=1. The calculation is λ=1 / (3+1)=0.25. The original user profile vector V_old is 802-dimensional, with the first three dimensions set to [0.23, 0.45, -0.12]. The increment vector V_increment has the first three dimensions set to [0.56, 0.31, 0.08]. Therefore, the updated dynamic user profile vector V_new has the first three dimensions set to (1-0.25)×[0.23, 0.45, -0.12]+0.25×[0.56, 0.31, 0.08]=[0.1725+0.14, 0.3375+0.0775, -0.09+0.02]=[0.3125, 0.415, -0.07]. The updated dynamic user profile vector is then re-distributed to the recommendation engine for subsequent recommendation decisions. Additionally, if the conversion rate drops by more than 10% for seven consecutive days, a full model retraining is triggered.
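The validity check (S52 to S53), the incremental fusion (S552 to S553) and the retraining trigger (S58) can be sketched in a few lines, reproducing the numbers of the worked example above. The scenario keys, the helper names, and the daily conversion-rate values used for the retraining check are hypothetical; reading "drops by more than 10%" as "below 90% of a baseline" is likewise an assumption.

```python
from datetime import datetime
import numpy as np

# Hypothetical scenario keys; the thresholds are the values given in S51.
VALIDITY_SECONDS = {"realtime": 300, "offline": 3600}

def is_valid(t_generate, t_current, scenario):
    """S52-S53: the profile stays valid while its age is strictly below ΔT_valid."""
    return (t_current - t_generate).total_seconds() < VALIDITY_SECONDS[scenario]

def incremental_update(v_old, v_increment, n_total, n_increment):
    """S552-S553: λ = min(1, N_inc / (N_total + N_inc)); V_new = (1 - λ)·V_old + λ·V_inc."""
    lam = min(1.0, n_increment / (n_total + n_increment))
    return (1.0 - lam) * np.asarray(v_old) + lam * np.asarray(v_increment), lam

def needs_retraining(metric_history, threshold, k):
    """S58: retrain when the metric is below the threshold for K consecutive periods."""
    return len(metric_history) >= k and all(m < threshold for m in metric_history[-k:])

# Worked example from [0221] (first three dimensions only).
t_gen = datetime(2025, 3, 2, 11, 0, 0)
t_cur = datetime(2025, 3, 2, 11, 5, 30)
valid = is_valid(t_gen, t_cur, "realtime")   # 330 s >= 300 s, so an update is triggered
v_new, lam = incremental_update([0.23, 0.45, -0.12], [0.56, 0.31, 0.08], n_total=3, n_increment=1)

# Hypothetical retraining check: 7 consecutive days more than 10% below a baseline CVR.
baseline_cvr = 0.05
daily_cvr = [0.044, 0.043, 0.044, 0.042, 0.041, 0.043, 0.044]
retrain = needs_retraining(daily_cvr, threshold=0.9 * baseline_cvr, k=7)
```

Running this gives `valid == False`, `lam == 0.25`, and `v_new ≈ [0.3125, 0.415, -0.07]`, matching the figures in [0221].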
[0222] This embodiment effectively solves the problem of timeliness decay caused by the continuous evolution of user interests after the distribution of user profile vectors, and the problem of model optimization lag caused by the lack of a closed loop of business effect feedback, by introducing a lifecycle management and adaptive update mechanism for dynamic user profile vectors. Specifically, the timeliness of profile vectors is evaluated by setting a preset validity period threshold. When an incremental update is triggered after the timeout, the fusion coefficient is dynamically calculated based on the ratio of the number of new behavior logs to the number of historical events. The incremental profile vector is then weighted and fused with the original vector. This avoids the resource overhead caused by frequent full recalculation and ensures that the profile can quickly respond to the latest changes in user interests and accurately capture interest migration. At the same time, by periodically monitoring business effect indicators, such as order conversion rate, the model is globally retrained when the indicator declines continuously. This forms a complete closed loop from profile generation, business application to effect feedback. The parameters of joint embedding encoding, graph attention network and fusion gating network can be continuously optimized based on real business data, which improves the adaptability, robustness and long-term effectiveness of the profile model in dynamic environments. This provides the recommendation system with user representation capabilities that are both responsive in real time and continuously evolving.
[0223] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments under the guidance of the present invention without departing from the spirit and scope of the claims. All of these variations are within the protection scope of the present invention.
Claims
1. A user profile construction method based on behavioral sequence modeling and graph neural networks, characterized in that it comprises: Obtain the target user's historical interaction behavior logs and the user attribute metadata and item attribute metadata associated with the historical interaction behavior logs; Based on the timestamps in the historical interaction behavior logs, segment the user's interaction events into a temporally ordered session sequence, and jointly embed and encode the behavior type and item identifier in the session sequence to generate a behavior vector sequence with temporal context; Based on the user attribute metadata and item attribute metadata, initialize the attribute feature vectors of user nodes and item nodes, and construct a user-item interaction bipartite graph according to the behavior order and behavior types in the session sequence, wherein the weight of each directed edge between a user node and an item node in the user-item interaction bipartite graph is determined by the behavior type and the behavior occurrence time; Construct a fusion modeling module, which performs the following operations: input the behavior vector sequence into a multi-head self-attention network for encoding and output a sequence representation vector representing the user's interests and intentions; input the user-item interaction bipartite graph into a multi-layer graph attention network for message passing and node representation updates and output a user node representation vector that integrates neighborhood information; and generate a unified dynamic user profile vector from the sequence representation vector and the user node representation vector combined with a preset dynamic fusion weight; Input the dynamic user profile vector into the downstream task adaptation layer to generate a set of task parameters for the corresponding business scenario.
The set of task parameters is then sent to the business engine through the service interface to drive personalized decision execution.
2. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 1, characterized in that the step of segmenting user interaction events into a temporally ordered session sequence includes: Based on the timestamps in the historical interaction behavior logs, divide the interaction events into candidate session sequences according to preset initial session segmentation parameters; Jointly embed and encode the behavior type and item identifier in the candidate session sequence to generate a behavior type embedding vector and an item identifier embedding vector, calculate a time decay factor based on the time interval between adjacent events in the candidate session sequence, and fuse the time decay factor with the behavior type embedding vector and the item identifier embedding vector to obtain a candidate behavior vector sequence with temporal context; Calculate the intent cohesion index of the candidate session sequence, which characterizes the overall consistency of the behaviors within the sequence in terms of semantic similarity and temporal continuity; Determine whether the intent cohesion index is less than a first preset threshold; If the intent cohesion index is less than the first preset threshold, determine the candidate session sequence as the final session sequence and output the candidate behavior vector sequence as the behavior vector sequence with temporal context; If the intent cohesion index is greater than or equal to the first preset threshold, adjust the initial session segmentation parameters according to the intent cohesion index and return to the step of dividing the interaction events into candidate session sequences based on the timestamps in the historical interaction behavior logs and the preset initial session segmentation parameters, until the intent cohesion index is less than the first preset threshold.
3. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 2, characterized in that the step of calculating the time decay factor based on the time interval between adjacent events in the candidate session sequence includes: Obtain a candidate session sequence, which includes interaction events arranged in chronological order and the absolute timestamp of each event; Based on the absolute timestamps, calculate the time interval between adjacent events to obtain a time interval sequence; Initialize the decay parameters of the time decay function, wherein the decay parameters include a reference decay rate and a time scale factor; Based on the time interval sequence and the decay parameters, calculate the time decay factor of each event relative to the current time using the time decay function to obtain a time decay factor sequence; Based on the decay parameters of the current iteration, fuse the time decay factor sequence of the current iteration with the behavior type embedding vector and the item identifier embedding vector of the corresponding events to generate a first candidate behavior vector sequence; Calculate the temporal continuity index of the first candidate behavior vector sequence, which quantifies the consistency between the mean cosine similarity of adjacent behavior vectors in the semantic space and the time decay factors.
Determine whether the temporal continuity index is less than a second preset threshold; If the temporal continuity index is less than the second preset threshold, output the first candidate behavior vector sequence as the behavior vector sequence with temporal context; If the temporal continuity index is greater than or equal to the second preset threshold, adjust the decay parameters according to the difference between the temporal continuity index and the second preset threshold, and return to the step of calculating the time decay factor of each event relative to the current time using the time decay function based on the time interval sequence and the decay parameters, until the temporal continuity index is less than the second preset threshold.
4. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 3, characterized in that the construction of the user-item interaction bipartite graph includes: Obtain user attribute metadata and item attribute metadata, wherein the user attribute metadata includes user numerical features, user categorical features, and user text features, and the item attribute metadata includes item numerical features, item categorical features, and item text features; Normalize the user numerical features and the item numerical features respectively to obtain a normalized user numerical vector and a normalized item numerical vector; Construct a categorical embedding matrix to map the user categorical features and item categorical features to the corresponding user categorical embedding vectors and item categorical embedding vectors; Use a pre-trained BERT model to extract semantic vectors of the user text features as user text embedding vectors and semantic vectors of the item text features as item text embedding vectors; Fuse the normalized user numerical vector, user categorical embedding vector, and user text embedding vector to generate the initial attribute feature vector of the user node; fuse the normalized item numerical vector, item categorical embedding vector, and item text embedding vector to generate the initial attribute feature vector of the item node.
5. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 4, characterized in that the construction of the user-item interaction bipartite graph further includes: Obtain the final session sequence, which contains interaction events arranged in chronological order, each recording the user identifier, item identifier, behavior type, and absolute timestamp; For each interacting user-item pair, extract from the final session sequence all events involving that user and that item and arrange them in chronological order to obtain an event list; Initialize a learnable parameter set, which includes intensity weight parameters and edge weight time decay coefficients corresponding to behavior types; Based on the event list, calculate the weight of the directed edge from the user to the item through an edge weight aggregation function, which combines the behavior-type intensity weight of each event in the event list with the time decay effect of that event's occurrence time relative to the current time and accumulates the contributions of all events; Set the weight of the directed edge from the item to the user equal to the weight of the directed edge from the user to the item.
6. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 5, characterized in that the construction of the user-item interaction bipartite graph further includes: Construct the user-item interaction bipartite graph from all user nodes, item nodes, and the calculated directed edge weights; Calculate the node discrimination index of the user-item interaction bipartite graph, which quantifies how distinguishable different user nodes and item nodes are in the feature space and graph structure; Determine whether the node discrimination index is less than a fourth preset threshold; If the node discrimination index is less than the fourth preset threshold, determine the currently constructed user-item interaction bipartite graph as the final user-item interaction bipartite graph; If the node discrimination index is greater than or equal to the fourth preset threshold, update the intensity weight parameters and edge weight time decay coefficients corresponding to behavior types in the learnable parameter set based on the difference between the node discrimination index and the fourth preset threshold, and return to the step of calculating the directed edge weights from the event list through the edge weight aggregation function, until the node discrimination index is less than the fourth preset threshold.
7. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 6, characterized in that outputting the sequence representation vector representing user interests and intentions includes: Obtain the behavior vector sequence, which contains N behavior vectors arranged in chronological order, each corresponding to one interaction event; Add position encoding to the behavior vector sequence to generate a behavior vector sequence with position information; Input the behavior vector sequence with position information into a multi-head self-attention network and calculate attention weights through multiple parallel attention heads, wherein each attention head applies linear transformations to the input to obtain a query matrix, a key matrix, and a value matrix and calculates scaled dot-product attention, and the outputs of all attention heads are concatenated and linearly transformed to obtain the self-attention output sequence; Input the self-attention output sequence into a feedforward neural network and obtain a feedforward output sequence after two layers of linear transformation and activation function processing; Apply layer normalization and a residual connection with the self-attention output sequence to the feedforward output sequence to obtain the encoded vector sequence; Extract the vector of the last time step from the encoded vector sequence as a short-term vector representing the user's intent, and average-pool the entire encoded vector sequence to obtain a long-term vector representing the user's interests; Concatenate the long-term vector with the short-term vector to obtain a sequence representation vector that integrates interests and intent.
8. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 7, characterized in that outputting the user node representation vector fused with neighborhood information includes: Obtain the final user-item interaction bipartite graph, which includes a set of user nodes, a set of item nodes, and directed edge weights between nodes, each user node and item node carrying its initial attribute feature vector; Initialize the current-layer node representation vectors to the initial attribute feature vectors; Set the number of layers L of the graph attention network and initialize the learnable parameters of each layer, which include the query transformation matrix, key transformation matrix, and value transformation matrix used for attention weight calculation and the linear transformation matrix applied after multi-head concatenation; For each layer of the graph attention network, perform the following operations in sequence: For each target node, collect all its neighbor nodes and the corresponding directed edge weights; Calculate the attention coefficient between the target node and each neighbor node through a single-layer feedforward network, based on the current representation of the target node, the current representation of the neighbor node, and the weight of the directed edge between them; Normalize the attention coefficients of all neighbors of the same target node using the softmax function.
9. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 8, characterized in that outputting the user node representation vector fused with neighborhood information further includes: Based on the normalized attention coefficients, compute the weighted sum of the value vectors of the neighbor nodes to obtain the single-head aggregated vector of the target node; Compute multiple independent attention heads in parallel, concatenate the aggregated vectors from all heads along the feature dimension, and apply a linear transformation to obtain the layer output vector of the target node; Normalize the layer output vector of the target node output by the current layer to obtain a normalized vector, and add the normalized vector element by element to the node representation vector of the target node input to the current layer to obtain the updated node representation vector of the current layer; Determine the node representation vectors of all user nodes output by the last layer of the graph attention network as the user node representation vectors fused with neighborhood information.
10. The user profile construction method based on behavioral sequence modeling and graph neural networks as described in claim 9, characterized in that generating the unified dynamic user profile vector includes: Obtain the sequence representation vector and the user node representation vector fused with neighborhood information; Initialize the parameters of the fusion gated network, which includes a first fully connected layer, a second fully connected layer, and a softmax output layer; Concatenate the sequence representation vector with the user node representation vector to obtain a joint feature vector; Input the joint feature vector into the fusion gated network, map it to the hidden layer dimension through the first fully connected layer and then to a two-dimensional space through the second fully connected layer, and normalize it through the softmax output layer to obtain two dynamic fusion weights, which correspond respectively to the weight α of the sequence representation vector and the weight β of the user node representation vector, with α+β=1; Multiply the sequence representation vector by the weight α to obtain the weighted sequence representation component; Multiply the user node representation vector by the weight β to obtain the weighted user node representation component; Add the weighted sequence representation component and the weighted user node representation component element by element to generate the unified dynamic user profile vector; Output the dynamic user profile vector to the downstream task adaptation layer.
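Claim 5 specifies only that the edge weight aggregation function combines a behavior-type intensity weight with the time decay of each event and accumulates the contributions. The sketch below is one possible instance, assuming an exponential decay measured in hours, a single decay coefficient shared by all behavior types, and hypothetical intensity values and events; none of these numbers come from the source.

```python
import math
from collections import defaultdict

# Hypothetical values for the learnable parameter set of claim 5.
INTENSITY = {"click": 1.0, "favorite": 2.0, "purchase": 3.0}
DECAY_PER_HOUR = 0.1   # assumed single edge-weight time decay coefficient

def edge_weights(events, t_now):
    """events: (user_id, item_id, behavior_type, timestamp_in_seconds) tuples.

    Returns directed edge weights keyed by (source, target). User and item ids
    are assumed to come from disjoint namespaces (e.g. "U1001" vs "P2024").
    """
    weights = defaultdict(float)
    for user, item, behavior, ts in sorted(events, key=lambda e: e[3]):
        hours_ago = (t_now - ts) / 3600.0
        # Each event contributes intensity x exponential time decay; contributions accumulate.
        weights[(user, item)] += INTENSITY[behavior] * math.exp(-DECAY_PER_HOUR * hours_ago)
    # The item -> user edge takes the same weight as the user -> item edge.
    reverse = {(item, user): w for (user, item), w in weights.items()}
    return {**weights, **reverse}

t_now = 1_000_000
events = [                                           # hypothetical event list
    ("U1001", "P2024", "click", t_now - 36_000),    # 10 hours ago, decay factor exp(-1)
    ("U1001", "P2025", "favorite", t_now),
    ("U1001", "P2025", "purchase", t_now),
]
w = edge_weights(events, t_now)
```

Here the U1001 to P2025 edge receives 2.0 + 3.0 = 5.0 from the two same-moment events, and the older click on P2024 contributes exp(-1); a per-behavior decay coefficient, as the claim's plural wording may allow, would be a straightforward variation.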