An e-commerce interactive data processing method based on artificial intelligence

By constructing user-item interaction graphs and social graphs, dynamically adjusting attention weights, and generating counterfactual samples, the method addresses oversmoothing in GNN propagation and inaccurate causal debiasing, thereby improving the accuracy and robustness of e-commerce interactive data processing.

CN122243608A · Pending · Publication Date: 2026-06-19 · SHENZHEN CHONGZHENG IND CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN CHONGZHENG IND CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-19

Figure CN122243608A_ABST

Abstract

This invention relates to the field of e-commerce data processing technology and discloses an artificial intelligence-based method for processing e-commerce interactive data. The method includes: collecting historical interaction data to construct a user-item interaction graph and a user social graph; performing information propagation in a graph neural network to obtain the final user representation and the final spectral radius; comparing the final spectral radius with a preset threshold; if it is lower than the preset threshold, calculating the interaction probability distribution to obtain the user behavior entropy; if it is not lower than the preset threshold, generating a recommendation list; determining the counterfactual sampling rate and generating counterfactual samples and their sampling weights; inputting the counterfactual samples and their sampling weights into the graph neural network, blocking causal paths through a backdoor adjustment strategy to obtain the debiased user representation and item representation; calculating the matching score between users and items, ranking candidate items, and generating a recommendation list. This invention improves the accuracy and robustness of e-commerce interactive data processing.

Description

Technical Field

[0001] This invention relates to the field of e-commerce data processing technology, and in particular to an artificial intelligence-based method for processing interactive e-commerce data.

Background Technology

[0002] With the rapid development of e-commerce, the volume of interaction data between users and e-commerce platforms has grown explosively. Accurately extracting user intent from massive amounts of clicks, browsing, comments, purchases, and other interaction data to achieve personalized recommendations has become a core competitive advantage for e-commerce platforms. Currently, recommendation methods based on graph neural networks (GNNs) are widely used in the field of e-commerce interaction data processing because they can effectively handle complex data structures such as user-item interaction graphs and user social graphs.

[0003] However, existing GNN-based e-commerce interactive data processing methods still have the following technical problems in practical applications:

[0004] First, the oversmoothing problem during GNN propagation leads to convergence of node representations. In existing GNN methods, the spectral radius of node representations gradually decreases after multiple propagation layers, causing representations from different users to become homogenized, making it difficult to distinguish individual user preferences. When the spectral radius falls below a certain threshold, user discrimination significantly decreases, and the model's recommendation performance suffers. Current technologies have not yet established a linkage mechanism between spectral radius monitoring and propagation process control, making it impossible to effectively intervene in the propagation process when the spectral radius is too low.

[0005] Second, the disconnect between user behavior characteristics and model state characteristics leads to low counterfactual sampling efficiency. Existing counterfactual inference methods often use fixed sampling rates or simple rules when generating counterfactual samples, failing to organically combine the diversity of user behavior with the model propagation state. When user behavior is highly diverse and the model is overly smooth, more intensive counterfactual sampling is needed to explore potential user interests, but existing methods cannot dynamically adjust sampling strategies based on these two types of characteristics, resulting in insufficient representativeness and efficiency of counterfactual samples.

[0006] Third, there is a disconnect between the quality of counterfactual samples and the effectiveness of causal debiasing. Existing causal inference methods typically treat all counterfactual samples equally when blocking confounding factor paths, failing to consider the quality differences among different counterfactual samples. In reality, counterfactual samples generated with different sampling weights exhibit significant differences in reliability and representativeness, and low-quality counterfactual samples can contaminate the causal debiasing process. Current technologies lack an effective mechanism to feed back the quality assessment results of counterfactual samples to the causal debiasing stage.

[0007] Therefore, this invention proposes an artificial intelligence-based method for processing interactive e-commerce data.

Summary of the Invention

[0008] The purpose of this invention is to solve the problems of oversmoothing during propagation, low counterfactual sampling efficiency, and poor causal debiasing accuracy in existing GNN-based techniques. To this end, an artificial intelligence-based method for processing interactive e-commerce data is proposed.

[0009] To achieve the above objectives, the present invention adopts the following technical solution: an e-commerce interactive data processing method based on artificial intelligence, comprising the following steps:

[0010] Step S1: Collect users' historical interaction data on e-commerce platforms, and construct a user-item interaction graph and a user social graph based on the historical interaction data;

[0011] Step S2: Perform information propagation of the graph neural network on the user-item interaction graph and the user social graph, and dynamically adjust the attention weights according to the spectral radius of the node representation during the propagation process to obtain the final user representation and the corresponding final spectral radius.

[0012] Step S3: Compare the final spectral radius with a preset threshold. If it is lower than the preset threshold, calculate the user's interaction probability distribution on different product categories to obtain the user behavior entropy, and proceed to step S4. If it is not lower than the preset threshold, jump to step S6 to generate a recommendation list.

[0013] Step S4: Determine the counterfactual sampling rate based on the user behavior entropy, and sample the user's historical interaction data using the counterfactual sampling rate to generate counterfactual samples and corresponding sampling weights;

[0014] Step S5: Input the counterfactual samples and their sampling weights into the graph neural network, and block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes through a backdoor adjustment strategy to obtain the debiased user representation and item representation.

[0015] Step S6: Calculate the matching score between users and items based on the debiased user representation and item representation, and sort the candidate items according to the matching score to generate a recommendation list.

[0016] The beneficial effects of the technical solution provided by this invention include at least the following:

[0017] This invention monitors the spectral radius of node representations during information propagation in a graph neural network. When the spectral radius falls below a preset threshold, it triggers user behavior entropy calculation. Based on the user behavior entropy, it determines the counterfactual sampling rate to generate counterfactual samples. Furthermore, it uses a backdoor adjustment strategy to block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes. This effectively solves the problem of decreased user discriminability caused by excessive smoothing in GNNs, achieves synergistic optimization of user behavior features and model state features, and improves the relevance of counterfactual samples and the accuracy of causal debiasing.

[0018] This invention constructs user-item interaction graphs and user social graphs, and dynamically adjusts attention weights based on spectral radius during propagation. It can detect the degree of model oversmoothing in real time and intervene in a timely manner to prevent node representation convergence, maintain user personalization characteristics, and thus improve the diversity and accuracy of recommendation results.

[0019] This invention determines the counterfactual sampling rate based on user behavior entropy and nonlinearly fuses the diversity of user behavior with the model propagation state. This enables an adaptive counterfactual sampling strategy, which increases sampling intensity when user interests are broad and the model is overly smooth, and reduces sampling redundancy when user interests are concentrated and the model is in good condition, thereby improving the generation efficiency and representativeness of counterfactual samples.

[0020] This invention uses the sampling weight of counterfactual samples as confidence levels to perform weighted fusion of the counterfactual world representation and the original propagation results. It can differentiate the processing based on the quality differences of counterfactual samples, allowing high-quality samples to play a greater role in the debiasing process, while effectively suppressing the influence of low-quality samples, thereby improving the stability and reliability of causal debiasing.

[0021] This invention constructs a complete technical closed loop, organically combining spectral radius monitoring, behavioral entropy calculation, counterfactual sampling, and causal bias correction. This comprehensively improves the accuracy, robustness, and interpretability of e-commerce interactive data processing, providing users with more personalized and high-quality recommendation services.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a schematic diagram of the method flow provided in an embodiment of the present invention;

[0024] Figure 2 is a schematic diagram of the technical process provided in an embodiment of the present invention.

Detailed Implementation

[0025] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of an artificial intelligence-based e-commerce interactive data processing method proposed according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.

[0026] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0027] The following examples are for illustrative purposes only and are not intended to limit the scope of the invention.

[0028] The following description, in conjunction with the accompanying drawings, details a specific scheme for an artificial intelligence-based e-commerce interactive data processing method provided by the present invention.

[0029] Please refer to Figure 1 and Figure 2, which illustrate the method flowchart and the technical flowchart of an artificial intelligence-based e-commerce interactive data processing method according to an embodiment of the present invention. The method includes the following steps:

[0030] Step S1: Collect users' historical interaction data on e-commerce platforms, and construct a user-item interaction graph and a user social graph based on the historical interaction data;

[0031] Step S2: Perform information propagation of the graph neural network on the user-item interaction graph and the user social graph, and dynamically adjust the attention weights according to the spectral radius of the node representation during the propagation process to obtain the final user representation and the corresponding final spectral radius.

[0032] Step S3: Compare the final spectral radius with a preset threshold. If it is lower than the preset threshold, calculate the user's interaction probability distribution on different product categories to obtain the user behavior entropy, and proceed to step S4. If it is not lower than the preset threshold, jump to step S6 to generate a recommendation list.

[0033] Step S4: Determine the counterfactual sampling rate based on the user behavior entropy, and sample the user's historical interaction data using the counterfactual sampling rate to generate counterfactual samples and corresponding sampling weights;

[0034] Step S5: Input the counterfactual samples and their sampling weights into the graph neural network, and block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes through a backdoor adjustment strategy to obtain the debiased user representation and item representation.

[0035] Step S6: Calculate the matching score between users and items based on the debiased user representation and item representation, and sort the candidate items according to the matching score to generate a recommendation list.

[0036] It should be noted that historical interaction data refers to the interaction-related data generated by users while using e-commerce platforms, including user rating data, comment text data, social relationship data, and spatiotemporal context data. It covers the interaction traces between users and products, and between users themselves, and provides the basic data for subsequent graph construction, feature extraction, modeling, and analysis.

[0037] User-item interaction graph refers to a graph structure constructed with users and product items as nodes and user-corrected rating data for product items as edge weights. Nodes are used to identify different users and different product items, and edge weights are used to reflect the degree of user preference for product items. In this invention, it is used to intuitively present the interactive relationship between users and product items.

[0038] A user social graph is a graph structure built with users as nodes and social relationship data between users as edges. The edges are used to represent social connections such as friends and followings between users. In this invention, it is used to capture the social influence relationships between users.

[0039] Information propagation in graph neural networks refers to the process of transmitting and aggregating the feature information of a node to its neighboring nodes through inter-layer computation in the constructed user-item interaction graph and user social graph. In this invention, it aims to integrate multi-dimensional related information and enrich the feature dimensions of node representation.

[0040] The spectral radius of a node representation refers to the spectral radius of the matrix constructed from the node feature vectors generated during the propagation of the graph neural network. The spectral radius of this matrix is the maximum modulus of the matrix's eigenvalues, and it is mainly used to quantify the stability and propagation effect of the node representation.

[0041] Attention weights refer to the weight coefficients assigned to the feature information of different adjacent nodes during the information propagation process of a graph neural network. The magnitude of the weight reflects the degree of influence of the corresponding node's features on the representation of the current node. In this invention, the purpose is to highlight important related information and suppress the interference of irrelevant or secondary information.

[0042] The final user representation refers to the vector representation obtained after multiple rounds of information propagation and dynamic adjustment of attention weights in a graph neural network, which can comprehensively reflect user preferences, social influence, and other characteristics. It is mainly used as the core user feature carrier for various subsequent analysis and prediction tasks.

[0043] Product categories refer to the category system into which the e-commerce platform divides its products according to preset classification rules. These categories include coarse-grained categories and fine-grained categories. Coarse-grained categories are general classifications of products, while fine-grained categories are further subdivisions of coarse-grained categories. In this invention, the purpose is to standardize product classification and facilitate the statistical analysis of user interaction preferences across different product categories.

[0044] Interaction probability distribution refers to the probability distribution formed by the proportion of user interactions in each product category to the total number of interactions. It is mainly used to quantify the distribution characteristics of user preferences in different categories.

[0045] User behavior entropy refers to a value calculated using the information entropy formula based on the probability distribution of user interactions across different product categories. It is used to measure the degree of dispersion of user behavior preferences. The larger the value, the more dispersed the user preferences are; the smaller the value, the more concentrated the user preferences are. In this invention, it is used to determine the diversity of user needs and to provide a basis for counterfactual sampling.

[0046] The counterfactual sampling rate refers to the proportion of counterfactual samples selected from user historical interaction data, which is dynamically determined based on user behavior entropy. In this invention, it is used to adaptively adjust the number of counterfactual samples, balancing sampling efficiency and model performance.

[0047] Counterfactual samples are samples generated through counterfactual sampling that have virtual differences from the user's actual historical interaction data. They are used to simulate possible user interactions in different scenarios. In this invention, they are used to uncover causal relationships in the data and assist the model in correcting biases.

[0048] Sampling weights refer to the weight values assigned to each counterfactual sample, which are used to reflect the importance of the sample in the model training and inference process. In this invention, they are used to highlight the influence of key counterfactual samples and improve the model's bias removal effect.

[0049] The backdoor adjustment strategy refers to a processing strategy that, within the framework of causal inference, controls confounding factors to block the influence of non-causal paths on the outcome. In this invention, it aims to eliminate interference from irrelevant factors and accurately uncover the true causal relationship between user and product interactions.

[0050] Social relationship nodes refer to nodes that represent users' social relationships in a causal graph. Spatiotemporal context nodes refer to nodes that represent environmental information such as time and geographical location in a causal graph. Interaction result nodes refer to nodes that represent the results of user interaction with products (such as clicks or purchases) in a causal graph. In this invention, these nodes are used to clearly define the different elements in the causal relationship.

[0051] The debiased user and item representations refer to the vector representations of users and product items that have been processed by a backdoor adjustment strategy to eliminate the interference of confounding factors such as social relationships and spatiotemporal context. These representations can more realistically reflect the matching relationship between users and products. In this invention, they are used to improve the accuracy of subsequent matching score calculations.

[0052] The matching score is a value calculated by a preset matching function based on the debiased user representation and item representation. It is used to quantify the degree of fit between users and product items, and in this invention, it serves as the core basis for product ranking.

[0053] The recommendation list refers to the list formed by sorting candidate products in descending order according to their matching scores and selecting the top N products. In this invention, it is used to present personalized recommendation results to users, improve the user shopping experience, and increase platform conversion efficiency.

[0054] In one specific implementation, this method can be applied to personalized product recommendation scenarios on comprehensive e-commerce platforms, and the specific implementation details are as follows:

[0055] The specific implementation of step S1: The e-commerce platform collects users' historical interaction data in real time through the data collection module, including users' product rating data on the platform (quantitative ratings from 1 to 5 points), comment text data (users' textual evaluations of products), social relationship data (users' friend lists and following relationships), and spatiotemporal context data (the time and city where the user interacts).

[0056] The comment text data is processed. The number of characters in each comment text is counted as the text length. The proportion of sentiment words in the pre-set sentiment word dictionary in the comment text to the total number of words in the text is counted as the sentiment word density. A sentiment analysis model based on BERT is used to perform sentiment analysis on the comment text data and generate a sentiment tendency score between -1 and 1 (-1 represents strong negative sentiment and 1 represents strong positive sentiment).

[0057] The confidence level of the sentiment tendency score is determined based on the text length and sentiment word density. The specific calculation method is: confidence level = 0.6 × (text length / preset maximum text length) + 0.4 × sentiment word density, where the preset maximum text length is set to 500 characters.

[0058] The confidence level is compared with a preset confidence threshold of 0.5. When the confidence level is higher than 0.5, the sentiment score is normalized to 1 to 5 and then weighted and merged with the user rating data according to weights of 0.3 and 0.7 to obtain the corrected user rating data. When the confidence level is lower than or equal to 0.5, the user rating data is directly used as the corrected user rating data.
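The confidence calculation in [0057] and the rating correction in [0058] can be sketched as follows. This is a minimal illustration, not the applicant's implementation. The function names are hypothetical, the cap of the length ratio at 1.0 and the linear mapping of the sentiment score from [-1, 1] to [1, 5] are assumptions, and so is the assignment of weight 0.3 to the sentiment term and 0.7 to the rating (the text gives the two weights without stating which is which).

```python
def sentiment_confidence(text_length, sentiment_word_density, max_length=500):
    """Confidence = 0.6 * (text length / max length) + 0.4 * sentiment word density."""
    # Assumption: texts longer than max_length are capped so the ratio stays <= 1.
    length_ratio = min(text_length / max_length, 1.0)
    return 0.6 * length_ratio + 0.4 * sentiment_word_density


def corrected_rating(rating, sentiment_score, confidence, threshold=0.5):
    """Return the corrected user rating on the 1-5 scale."""
    if confidence <= threshold:
        # Low-confidence sentiment is ignored; the raw rating is used directly.
        return rating
    # Assumption: linear map from [-1, 1] to [1, 5].
    sentiment_rating = 1.0 + 2.0 * (sentiment_score + 1.0)
    # Assumption: 0.3 weights the sentiment term and 0.7 the original rating.
    return 0.3 * sentiment_rating + 0.7 * rating
```

Under these assumptions, a 4-point rating with a strongly positive comment (score 1.0) and confidence 0.8 becomes 0.3 × 5 + 0.7 × 4 = 4.3.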

[0059] With each user and each product item as a node, and the corrected user rating data as the edge weight, a user-item interaction graph is constructed; with users as nodes, if there is a friend or following relationship between two users, an edge is established to construct a user social graph.

[0060] The specific implementation of step S2: GraphSAGE is used as the graph neural network model to propagate information on the user-item interaction graph and the user social graph.

[0061] In the current layer's graph neural network propagation, user-item interaction information and social relationship information are aggregated based on the attention mechanism to generate the user representation of the current layer.

[0062] Construct an adjacency matrix based on the user representation of the current layer, calculate the spectral radius ρ_current of the adjacency matrix, and obtain the preset first spectral radius threshold ρ_adjust=0.8.

[0063] The current spectral radius ρ_current is compared with the first spectral radius threshold ρ_adjust. When the current spectral radius ρ_current is lower than 0.8, the attention weights of the attention mechanism are updated: based on the current spectral radius ρ_current and the preset spectral radius threshold ρ_threshold=0.9, the spectral radius deviation δ=(0.9-ρ_current) / 0.9 is calculated; the preset intimacy threshold θ_social=0.6 is obtained, and weights greater than 0.6 are selected from the attention weights corresponding to social relationships as social weights to be adjusted; the preset amplification coefficient γ_social=0.5 is obtained, the amplification factor 1+0.5×δ is calculated based on δ and γ_social, and the social weights to be adjusted are multiplied by this amplification factor to update the attention weights corresponding to social relationships; the preset confidence threshold θ_confidence=0.5 for sentiment tendency scores is obtained, and weights with confidence less than 0.5 are selected from the attention weights corresponding to item interactions as the item weights to be adjusted; the preset reduction coefficient γ_item=0.4 is obtained, the reduction factor 1-0.4×δ is calculated based on δ and γ_item, and the item weights to be adjusted are multiplied by this reduction factor to update the attention weights corresponding to item interactions.
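The attention-weight update described in [0063] can be sketched as below. The function name and the flat-list data layout are hypothetical. The spectral radius is taken as an input, since the text does not detail how the adjacency matrix is built from the user representations. No renormalization of the weights is applied, because the text does not mention one.

```python
def adjust_attention(rho_current, social_weights, item_weights, item_confidences,
                     rho_adjust=0.8, rho_threshold=0.9,
                     theta_social=0.6, gamma_social=0.5,
                     theta_confidence=0.5, gamma_item=0.4):
    """Amplify strong social weights and shrink low-confidence item weights."""
    if rho_current >= rho_adjust:
        # Spectral radius is not below the first threshold: leave weights as-is.
        return list(social_weights), list(item_weights)

    # Spectral radius deviation: delta = (0.9 - rho_current) / 0.9
    delta = (rho_threshold - rho_current) / rho_threshold
    amplification = 1.0 + gamma_social * delta
    reduction = 1.0 - gamma_item * delta

    # Only social weights above the intimacy threshold are amplified.
    new_social = [w * amplification if w > theta_social else w
                  for w in social_weights]
    # Only item weights whose sentiment confidence is below 0.5 are reduced.
    new_item = [w * reduction if c < theta_confidence else w
                for w, c in zip(item_weights, item_confidences)]
    return new_social, new_item
```

For example, with ρ_current = 0.72 the deviation is δ = 0.2, so a social weight of 0.7 is scaled by 1.1 and an item weight with confidence 0.4 is scaled by 0.92.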

[0064] The user representation is regenerated based on the updated attention weights, and the graph neural network continues to propagate until the preset propagation layer number of 5 is reached. Then the propagation stops, and the final user vector is used as the final user representation. The corresponding final spectral radius ρ_final is obtained.

[0065] The specific implementation of step S3 is as follows: compare the final spectral radius with the preset threshold. If it is lower than the preset threshold, the model is determined to be in an oversmooth state. Calculate the user interaction probability distribution on different product categories to obtain the user behavior entropy, and then proceed to step S4 to perform behavior entropy calculation and counterfactual sampling.

[0066] Extract users' historical behavior sequences (such as browsing, favorites, adding to cart, and purchasing records) from historical user interaction data, and map each product in the historical behavior sequence according to the platform's preset multi-level category system. For example, coarse-grained categories are clothing, electronic products, and food, while fine-grained categories are dresses, T-shirts, and pants under clothing.

[0067] The number of first interactions by users in each coarse-grained category and the number of second interactions in each fine-grained category are counted separately. The probability of a user's first interaction in each coarse-grained category is calculated based on the number of first interactions (number of interactions in a coarse-grained category / total number of interactions). The probability of a user's second interaction in each fine-grained category is calculated based on the number of second interactions (number of interactions in a fine-grained category / total number of interactions).

[0068] Based on the first interaction probability, the coarse-grained behavior entropy is calculated using the information entropy formula H=-Σp(x)log_2p(x). Based on the second interaction probability, the fine-grained behavior entropy is calculated using the same information entropy formula. The coarse-grained behavior entropy and the fine-grained behavior entropy are weighted and fused with weights of 0.4 and 0.6 respectively to obtain the multi-granularity behavior entropy, which is used as the user behavior entropy H.
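The multi-granularity behavior entropy of [0067] and [0068] can be sketched as follows. The function names and the representation of each interaction as a (coarse category, fine category) pair are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy H = -sum p(x) * log2 p(x) over interaction counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def behavior_entropy(interactions, w_coarse=0.4, w_fine=0.6):
    """Fuse coarse- and fine-grained entropy with weights 0.4 and 0.6.

    `interactions` is a list of (coarse_category, fine_category) pairs,
    one per record in the user's historical behavior sequence.
    """
    coarse_counts = Counter(coarse for coarse, _ in interactions)
    fine_counts = Counter(interactions)
    h_coarse = entropy(coarse_counts.values())
    h_fine = entropy(fine_counts.values())
    return w_coarse * h_coarse + w_fine * h_fine


history = [("clothing", "dress"), ("clothing", "t-shirt"),
           ("electronics", "phone"), ("electronics", "phone")]
H = behavior_entropy(history)  # 0.4 * 1.0 + 0.6 * 1.5 = 1.3
```

In this example the coarse distribution is (0.5, 0.5), giving 1 bit, and the fine distribution is (0.25, 0.25, 0.5), giving 1.5 bits.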

[0069] The specific implementation of step S4 is as follows: Based on the user behavior entropy H, determine the basic demand degree of counterfactual sampling D_behavior=H / H_max, where H_max is the preset maximum behavior entropy, with a value of 3 (corresponding to the case where the user interacts evenly across multiple categories); based on the current spectral radius ρ_current, determine the compensation demand degree of counterfactual sampling D_model=(0.9-ρ_current) / 0.9, where D_model is 0 when ρ_current≥0.9; nonlinearly fuse the basic demand degree and the compensation demand degree to obtain the comprehensive sampling demand degree D=1-(1-D_behavior)×(1-D_model); based on the preset minimum sampling rate ρ_min=0.1, maximum sampling rate ρ_max=0.5 and comprehensive sampling demand degree D, calculate the dynamic sampling rate ρ_dynamic=0.1+(0.5-0.1)×D.
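The dynamic sampling rate of [0069] can be sketched as a single function. The function and parameter names are hypothetical, and capping D_behavior at 1 when H exceeds H_max is an assumption that the text does not state.

```python
def dynamic_sampling_rate(H, rho_current, H_max=3.0, rho_threshold=0.9,
                          rate_min=0.1, rate_max=0.5):
    """rho_dynamic = rate_min + (rate_max - rate_min) * D."""
    # Basic demand from behavior diversity (assumption: capped at 1).
    d_behavior = min(H / H_max, 1.0)
    # Compensation demand from the model state; 0 when rho_current >= 0.9.
    d_model = max((rho_threshold - rho_current) / rho_threshold, 0.0)
    # Nonlinear fusion: D is high if either demand is high.
    d = 1.0 - (1.0 - d_behavior) * (1.0 - d_model)
    return rate_min + (rate_max - rate_min) * d
```

With H = 1.5 and ρ_current = 0.45, both demand degrees are 0.5, so D = 0.75 and the sampling rate is 0.1 + 0.4 × 0.75 = 0.4. The fusion form 1-(1-a)(1-b) means either a diverse user or an oversmoothed model alone is enough to raise the rate.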

[0070] The system acquires multi-dimensional features for each interaction record in the user's historical behavior sequence, including the confidence level of sentiment score, the intimacy of social relationship, and the interaction time decay factor (the closer the interaction time, the closer the decay factor is to 1; the further the interaction time, the closer the decay factor is to 0). It calculates the sampling weight for each interaction record based on these multi-dimensional features, comprehensively considering the influence of each feature. A weighted random sampling of the user's historical behavior sequence is performed using a dynamic sampling rate and sampling weight. For each sampled interaction record, counterfactual samples with causal labels are generated based on the corresponding product category and interaction type. For example, a record of a user actually purchasing product A is hypothetically presented as a record of purchasing other products within the same category, and the sample is labeled with a counterfactual attribute.
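One possible reading of the weighted sampling and counterfactual generation in [0070] is sketched below. The text does not give the formula that combines the three features into a sampling weight, so the simple mean used here is an assumption, as are the function names, the record fields, and the `catalog` mapping from category to products. Sampling without replacement uses the Efraimidis-Spirakis key method, which is a substitute technique chosen for illustration rather than something the source specifies.

```python
import random


def sampling_weight(confidence, intimacy, time_decay):
    # Assumption: the text only says all three features are considered;
    # an unweighted mean is used here as a placeholder.
    return (confidence + intimacy + time_decay) / 3.0


def weighted_sample(records, weights, rate, rng):
    """Pick round(rate * n) records without replacement, favoring high weights."""
    k = min(len(records), round(rate * len(records)))
    # Efraimidis-Spirakis: key = u ** (1 / w); keep the k largest keys.
    keyed = [(rng.random() ** (1.0 / w), rec)
             for rec, w in zip(records, weights) if w > 0]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in keyed[:k]]


def make_counterfactual(record, catalog, rng):
    """Swap the product for another one in the same category and label it."""
    alternatives = [p for p in catalog.get(record["category"], [])
                    if p != record["product"]]
    if not alternatives:
        return None  # no same-category substitute available
    return {**record, "product": rng.choice(alternatives), "counterfactual": True}
```

Passing an explicit `random.Random` instance keeps the sampling reproducible for a given seed.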

[0071] The specific implementation of step S5 is as follows: Construct a structural causal model to identify the confounding factors affecting user interaction results, including item popularity (reflected by metrics such as product exposure and sales volume) and spatiotemporal context (interaction time, location, etc.); input the counterfactual samples and their sampling weights into the graph neural network to generate user and item representations in the counterfactual world; normalize the sampling weights of the counterfactual samples and use the normalized sampling weights as the confidence level of the counterfactual world representation; employ a backdoor adjustment strategy to intervene on the user and item representations in the counterfactual world, specifically by controlling the two confounding factors of item popularity and spatiotemporal context to block the causal paths social relationship node → user representation node → interaction result node and spatiotemporal context node → item representation node → interaction result node; calculate the population confidence score of the counterfactual samples from their sampling weights as C_population = average(W) / max(W), where W is the set of sampling weights of the counterfactual samples, average is the mean function, and max is the maximum function. The confidence level of the counterfactual world representation is used as the individual confidence score C, which is fused with the population confidence score to obtain the comprehensive confidence score C_comprehensive = 0.6 × C + (1 - 0.6) × C_population, where 0.6 is the preset fusion coefficient. The fusion weight is then set to w = C_comprehensive. For each user or item, its counterfactual world representation H_cf and its original propagation result H_ori are obtained, and weighted fusion is performed according to the formula H_final = w × H_cf + (1 - w) × H_ori to obtain the debiased user representation and item representation.
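The confidence fusion and representation fusion formulas in this step can be illustrated with a short sketch. The function name, the list-based vectors, and the sample values in the usage note are assumptions for illustration; the formulas and the fusion coefficient 0.6 come from the text.

```python
def fuse_representations(h_cf, h_ori, weights, c_individual, alpha=0.6):
    """Return (C_population, C_comprehensive, H_final) for one user or item.

    h_cf / h_ori : counterfactual and original representations (equal length)
    weights      : sampling weights W of the counterfactual samples
    c_individual : individual confidence score C
    alpha        : preset fusion coefficient (0.6 in the text)
    """
    c_population = (sum(weights) / len(weights)) / max(weights)
    c_comprehensive = alpha * c_individual + (1 - alpha) * c_population
    w = c_comprehensive  # fusion weight
    h_final = [w * cf + (1 - w) * ori for cf, ori in zip(h_cf, h_ori)]
    return c_population, c_comprehensive, h_final
```

For instance, calling `fuse_representations([1.0, 0.0], [0.0, 1.0], [0.2, 0.4, 0.8], 0.5)` blends the two vectors with a weight derived from both the individual and the population confidence.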

[0072] The specific implementation of step S6 is as follows: Input the debiased user representation and item representation into a preset matching function (such as an inner product function or a cosine similarity function) to calculate the initial matching score S_initial for each candidate item; correct the initial matching score according to the confidence level C using the correction formula S_final = S_initial × (1 + 0.5 × (1 - C)), where 0.5 is a preset correction coefficient; sort the candidate items in descending order of the corrected matching score S_final to generate a sorted list of candidate items; obtain the preset recommendation list length N = 10 and select the top 10 candidate items from the sorted list as the final recommended items; display the final recommended items as a recommendation list in the "You May Like" section on the homepage of the e-commerce platform.
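A minimal sketch of the scoring and ranking in step S6, assuming the inner product as the matching function and a single confidence level C shared by all candidates of one user (the text does not say whether C varies per item). Function and parameter names are hypothetical.

```python
def rank_candidates(user_vec, item_vecs, confidence, beta=0.5, top_n=10):
    """Score candidates with an inner product, apply the confidence
    correction S_final = S_initial * (1 + beta * (1 - C)), return the top N."""
    scored = []
    for item_id, vec in item_vecs.items():
        s_initial = sum(u * v for u, v in zip(user_vec, vec))
        s_final = s_initial * (1 + beta * (1 - confidence))
        scored.append((item_id, s_final))
    # Descending order of corrected matching score.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With a shared C the correction rescales every score by the same factor, so it changes the score values but not the ordering; the ordering is affected only if C differs between candidates.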

[0073] Step S1 further includes the following sub-steps:

[0074] Step S1-1: Collect users' historical interaction data on the e-commerce platform. The historical interaction data includes user rating data, comment text data, social relationship data, and spatiotemporal context data.

[0075] Step S1-2: Count the number of characters in the comment text data as the text length, calculate the proportion of sentiment words in the comment text data to the total number of words in the text as the sentiment word density, and perform sentiment analysis on the comment text data to generate a sentiment tendency score;

[0076] Step S1-3: Determine the confidence level of the sentiment tendency score based on the text length and sentiment word density, and compare the confidence level with the preset confidence threshold for the sentiment tendency score;

[0077] Step S1-4: When the confidence level is higher than the preset confidence threshold, the sentiment tendency score and the user rating data are weighted and fused to obtain the corrected user rating data. When the confidence level is lower than or equal to the preset confidence threshold, the user rating data is used as the corrected user rating data.

[0078] Step S1-5: Construct a user-item interaction graph with users and items as nodes and the corrected user rating data as edge weights.

[0079] Step S1-6: Construct a user social graph with users as nodes and social relationship data as edges.

[0080] It should be noted that user rating data refers to the quantitative evaluation data given by users on e-commerce platforms for products they have interacted with. It is usually presented as a score of 1 to 5 points and directly reflects the user's initial preference for the product. It mainly provides basic preference data for the subsequent rating correction and graph construction.

[0081] Comment text data refers to the text evaluation content posted by users after completing product interaction, including users' descriptions of product quality, user experience, appearance design, etc. In this invention, it is used to explore users' hidden emotional tendencies and deep needs.

[0082] Social relationship data refers to the interpersonal connections established by users within an e-commerce platform, including friend relationships, follow relationships, and interaction records. In this invention, it is used to capture the social influence between users and provide a basis for subsequent social graph construction.

[0083] Spatiotemporal context data refers to the time and geographic location information when a user engages in interactive behavior. The time information includes the moment and time period in which the interaction occurs, and the geographic location information includes the user's city and region. In this invention, it is used to enrich the scene dimension of interactive data and help to accurately characterize user preferences.

[0084] Text length refers to the numerical value obtained after counting the characters in a single comment text. Characters include Chinese characters, letters, numbers, and punctuation marks. In this invention, it is used to measure the richness of information in the comment text and serves as one of the bases for confidence calculation.

[0085] Sentiment words refer to specific words that express positive or negative sentiment and are included in a pre-defined sentiment word dictionary. Positive sentiment words include words like "high-quality," "easy to use," and "satisfied," while negative sentiment words include words like "inferior," "stuck," and "disappointed." In this invention, they are used to quantify the intensity of sentiment tendencies in comment texts.

[0086] Sentiment word density refers to the ratio of the number of sentiment words in a single comment text to the total number of words in the text. The total number of words is the total number of words in the comment text after being separated by spaces or punctuation. In this invention, it is used to reflect the concentration of sentiment expression and to help determine the reliability of the sentiment tendency score.

[0087] Sentiment score refers to a quantitative value obtained after processing comment text data through sentiment analysis algorithms. It is used to intuitively reflect the user's sentiment tendency. The higher the score, the stronger the positive sentiment, and the lower the score, the stronger the negative sentiment. In this invention, it is used to transform text sentiment into a calculable numerical feature.

[0088] Confidence level refers to a numerical value calculated based on text length and sentiment word density to measure the reliability of sentiment tendency score. The longer the text length and the higher the sentiment word density, the higher the confidence level. In this invention, it is used to determine whether the sentiment tendency score has value in fusion with user rating data.

[0089] The preset threshold refers to a baseline value preset based on the quality of historical review texts and model performance of the e-commerce platform. It is used to classify the reliability of sentiment scores. In this invention, it is used to determine whether to use sentiment scores to correct user rating data.

[0090] The corrected user rating data refers to user rating data that has either undergone weighted fusion or been carried over unchanged. Compared with the original user rating data, it more accurately reflects users' actual preferences for product items, and it mainly provides accurate edge weights for the user-item interaction graph.

[0091] In one specific implementation, this sub-step can be applied to the personalized recommendation data preprocessing scenario of a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0092] The specific implementation of step S1-1: Through the e-commerce platform's backend data collection system, historical user interaction data within the platform is collected in real time. User rating data consists of 1-5 point quantitative ratings submitted by users after purchasing or browsing products, for example, a user giving a 4-point rating after purchasing a certain brand of headphones; comment text data consists of text reviews posted by users, such as "The headphone sound quality is clear, but the battery life is shorter than expected"; social relationship data consists of the association information established by users through the platform's "Add Friends" and "Follow Influencers" functions, including the user's friend list, follower list, and interaction records (such as commenting on product links shared by friends); spatiotemporal context data consists of the time and geographical location information when the user performed the above interactions, with time information accurate to the minute, for example, 19:35 on October 20, 2025, and geographical location information obtained based on the user's authorized IP address or location function, for example, Chaoyang District, Beijing.

[0093] The specific implementation of steps S1-2 is as follows: A text processing tool is used to count the characters in each comment text to obtain the text length. For example, the text length of "The headphone sound quality is clear, but the battery life is shorter than expected" is 22 characters. A preset sentiment word dictionary (containing 5000 commonly used positive and negative sentiment words) is used to count the number of sentiment words in each comment text. For example, in the above comment, "clear" is a positive sentiment word, and "short" is a negative sentiment word, totaling 2 sentiment words. The total number of words in this comment is 10, so the sentiment word density is 2 / 10 = 0.2. A BERT-based sentiment analysis model is used to process the comment text data, generating a sentiment tendency score between -1 and 1. The sentiment tendency score of the above comment is 0.3, indicating an overall positive bias but with some negative feedback.

[0094] The specific implementation of steps S1-3 is as follows: Calculate the confidence level of the sentiment tendency score based on the text length and sentiment word density. The calculation formula is: Confidence Level = 0.5 × (Text Length / Preset Maximum Text Length) + 0.5 × Sentiment Word Density. The preset maximum text length is set to 500 characters. For example, the text length of the above comment is 22 characters, and the corresponding score is 0.5 × (22 / 500) = 0.022. The sentiment word density is 0.2, and the corresponding score is 0.5 × 0.2 = 0.1. The final confidence level is 0.022 + 0.1 = 0.122. The preset confidence threshold is 0.3. Compare the calculated confidence level of 0.122 with the threshold of 0.3 to determine the reliability of the sentiment tendency score.

[0095] The specific implementation of steps S1-4: Since the confidence score of the above comment is 0.122, which is lower than the preset threshold of 0.3, it indicates that the reliability of the sentiment score is insufficient. Therefore, the user rating data of 4 is directly used as the corrected user rating data. If a user's comment on a certain laptop has a text length of 150 characters and a sentiment word density of 0.4, the calculated confidence score is 0.5×(150 / 500)+0.5×0.4=0.15+0.2=0.35, which is higher than the preset threshold of 0.3. The sentiment score is 0.8 and the user rating is 5 points. The user rating data is weighted and fused according to the sentiment score weight of 0.3 and the user rating weight of 0.7. The corrected user rating data is 0.3×(0.8×5)+0.7×5=0.3×4+3.5=1.2+3.5=4.7 points (rounded to one decimal place).
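The confidence calculation of step S1-3 and the rating correction of step S1-4 can be reproduced with the short sketch below. The function names are hypothetical; the coefficients (0.5/0.5, maximum length 500, threshold 0.3, fusion weights 0.3/0.7) and the scaling of the sentiment score by the rating follow the worked example in the text.

```python
def confidence_level(text_length, sentiment_density, max_length=500):
    """Confidence = 0.5 * (length / max length) + 0.5 * sentiment word density."""
    return 0.5 * (text_length / max_length) + 0.5 * sentiment_density

def corrected_rating(rating, sentiment_score, confidence,
                     threshold=0.3, w_sentiment=0.3, w_rating=0.7):
    """Fuse sentiment and rating only when the confidence exceeds the threshold."""
    if confidence > threshold:
        # As in the worked example, the sentiment score is scaled by the rating
        # before fusion: 0.3 * (0.8 * 5) + 0.7 * 5 = 4.7.
        fused = w_sentiment * (sentiment_score * rating) + w_rating * rating
        return round(fused, 1)
    return rating

# Headphone comment: confidence below the threshold, rating stays 4.
headphone_conf = confidence_level(22, 0.2)
headphone_rating = corrected_rating(4, 0.3, headphone_conf)

# Laptop comment: confidence above the threshold, rating is fused.
laptop_conf = confidence_level(150, 0.4)
laptop_rating = corrected_rating(5, 0.8, laptop_conf)
```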

[0096] The specific implementation of steps S1-5 is as follows: Each registered user and each product item within the platform is treated as an independent node. User nodes are identified by user IDs, and product item nodes are identified by product IDs. The corrected user rating data is used as the weight of the connection edge between the corresponding user node and the product item node. For example, if user ID U1001 gives a corrected rating of 4 points for headphones with product ID G2003, then an edge with a weight of 4 is established between node U1001 and node G2003 to construct a complete user-item interaction graph.

[0097] The specific implementation of steps S1-6 is as follows: Each registered user in the platform is an independent node, and the user node is identified by the user ID. If there is a friend relationship or a following relationship between two users, an undirected edge is established between the two corresponding user nodes. For example, if the user with user ID U1001 follows the expert user with user ID U1002, then an edge is established between the U1001 node and the U1002 node, thereby constructing a complete user social graph.
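As a rough illustration of steps S1-5 and S1-6, both graphs can be held in adjacency dictionaries. This is a sketch under assumptions: the input formats (rating triples and user pairs) and the function name are hypothetical, and a production system would more likely use a graph library or sparse matrices.

```python
from collections import defaultdict

def build_graphs(ratings, social_pairs):
    """ratings: iterable of (user_id, product_id, corrected_rating)
    social_pairs: iterable of (user_id, user_id) friend/follow relations"""
    # User-item interaction graph: weighted edges stored in both directions.
    interaction = defaultdict(dict)
    for user_id, product_id, rating in ratings:
        interaction[user_id][product_id] = rating
        interaction[product_id][user_id] = rating

    # User social graph: undirected, unweighted edges.
    social = defaultdict(set)
    for a, b in social_pairs:
        social[a].add(b)
        social[b].add(a)
    return interaction, social

interaction_graph, social_graph = build_graphs(
    [("U1001", "G2003", 4)],
    [("U1001", "U1002")],
)
```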

[0098] Step S2 further includes the following sub-steps:

[0099] Step S2-1: In the information propagation of the graph neural network in the current layer, the user's item interaction information and social relationship information are aggregated based on the attention mechanism to generate the user representation of the current layer;

[0100] Step S2-2: Based on the user representation of the current layer, calculate the current spectral radius ρ_current and obtain the preset first spectral radius threshold ρ_adjust;

[0101] Step S2-3: Compare the current spectral radius ρ_current with the first spectral radius threshold ρ_adjust;

[0102] Step S2-4: When the current spectral radius ρ_current is lower than the first spectral radius threshold ρ_adjust, update the attention weights of the attention mechanism and regenerate the user representation of the current layer based on the updated attention weights; otherwise, keep the current attention weights unchanged.

[0103] Step S2-5: Continue propagation until the propagation termination condition is met, then output the final user representation and obtain the corresponding final spectral radius ρ_final.

[0104] Furthermore, in sub-step S2-4, the attention weights of the attention mechanism are updated according to the following steps:

[0105] The spectral radius deviation is calculated based on the current spectral radius ρ_current and the preset spectral radius threshold ρ_threshold. The specific formula is: δ=(ρ_threshold-ρ_current) / ρ_threshold, where δ is the spectral radius deviation and ρ_current<ρ_threshold.

[0106] Obtain a preset intimacy threshold θ_social, and select weights greater than θ_social from the attention weights corresponding to social relationships as the social weights to be adjusted;

[0107] Obtain the preset amplification coefficient γ_social, calculate the amplification factor 1+γ_social×δ based on δ and γ_social, and multiply the social weights to be adjusted by this amplification factor to update the attention weights corresponding to social relationships;

[0108] Obtain the preset confidence threshold θ_confidence for sentiment tendency scores, and select the weights whose confidence is less than θ_confidence from the attention weights corresponding to item interactions as the item weights to be adjusted.

[0109] Obtain the preset reduction coefficient γ_item, calculate the reduction factor 1-γ_item×δ based on δ and γ_item, and multiply the item weights to be adjusted by this reduction factor to update the attention weights corresponding to item interactions.

[0110] It should be noted that the attention mechanism refers to the mechanism of assigning different weights to information from different sources during the information aggregation process. The weight reflects the importance of the corresponding information. In this invention, it is used to highlight key information and suppress the interference of secondary information.

[0111] Item interaction information refers to data related to the interaction between users and product items, including corrected user rating data, interaction type, interaction frequency, etc., which are used in this invention to reflect the user's preference characteristics for product items.

[0112] Social relationship information refers to data related to the social connections between users, including social intimacy, interaction frequency, relationship type, etc., which are used in this invention to reflect the characteristics of social influence between users.

[0113] The user representation of the current layer refers to the user feature vector that reflects the fusion effect of the current layer after information propagation and attention mechanism of the current layer graph neural network. In this invention, it is used as the basis for subsequent spectral radius calculation and weight update.

[0114] The current spectral radius ρ_current refers to the largest modulus among the eigenvalues of the matrix constructed from the user representation of the current layer. It is used to quantify the stability and propagation effect of the user representation of the current layer. In this invention, it is used to determine the information fusion quality of the current layer.

[0115] The first spectral radius threshold ρ_adjust refers to a preset baseline value based on the performance requirements of the graph neural network model. It is used to determine whether the current spectral radius has achieved the expected propagation effect. In this invention, it is a trigger condition for updating attention weights.

[0116] The propagation termination condition refers to the preset criteria for determining when the propagation of information in the graph neural network stops, including preset propagation layers, spectral radius stability thresholds, etc. In this invention, it is used to control the timing of the termination of the propagation process and avoid over-propagation or under-propagation.

[0117] The final spectral radius ρ_final refers to the spectral radius of the matrix constructed from the final user representation. It is used to quantify the stability of the final user representation and, in this invention, to determine whether the final user representation meets the requirements of subsequent processing.

[0118] The spectral radius threshold ρ_threshold is a preset benchmark value used to calculate the deviation of the spectral radius. It reflects the ideal standard for the stability of the user representation. In this invention, it is used to quantify the difference between the current spectral radius and the ideal state.

[0119] The spectral radius deviation δ refers to the ratio of the difference between the spectral radius threshold and the current spectral radius to the spectral radius threshold. It is used to measure the degree of deviation of the current spectral radius and, in this invention, to determine the adjustment range of the attention weights.

[0120] The intimacy threshold θ_social refers to a preset benchmark value used to filter the weight of important social relationships. In this invention, it is used to distinguish between social relationships that have a greater or lesser impact on users and to adjust the weight accordingly.

[0121] The attention weight corresponding to social relationship refers to the weight coefficient assigned to social relationship information in the attention mechanism, which is used to reflect the degree of influence of social relationship on user representation. In this invention, it is used to adjust the role of social information in the fusion process.

[0122] The social weights to be adjusted refer to the weights that are greater than the intimacy threshold selected from the attention weights corresponding to social relationships. These weights correspond to social relationships that have a significant impact on users, and in this invention, they are adjusted in a targeted manner.

[0123] The amplification coefficient γ_social refers to a preset coefficient used to control the amplification of the social weights. In this invention, it is used to dynamically adjust the amplification intensity according to the spectral radius deviation, so as to avoid over-adjustment or under-adjustment.

[0124] The amplification factor refers to the factor used to update the social weights to be adjusted, calculated from the spectral radius deviation and the amplification coefficient. In this invention, it is used to achieve dynamic amplification of social weights and enhance the influence of important social relationships.

[0125] The confidence threshold θ_confidence for sentiment tendency score refers to a preset benchmark value used to filter the interaction weights of low-reliability items. In this invention, it is used to identify item interaction information based on low-confidence sentiment scores and adjust the weights accordingly.

[0126] The attention weight corresponding to item interaction refers to the weight coefficient assigned to item interaction information in the attention mechanism, which reflects the degree of influence of the item interaction on the user representation. In this invention, it is used to adjust the role of item interaction information in the fusion process.

[0127] The item weights to be adjusted refer to the weights selected from the attention weights corresponding to item interactions whose confidence level is less than θ_confidence. The item interaction information corresponding to these weights has low reliability, and in this invention these weights are adjusted in a targeted manner.

[0128] The reduction coefficient γ_item is a preset coefficient used to control the reduction magnitude of the item weight. In this invention, it is used to dynamically adjust the reduction intensity according to the deviation of the spectral radius to balance the influence of the interactive information of the items.

[0129] The reduction factor refers to the factor used to update the item weights to be adjusted, calculated from the spectral radius deviation and the reduction coefficient. In this invention, it is used to achieve dynamic reduction of item weights and reduce the interference of low-reliability item interaction information.

[0130] In one specific implementation, this sub-step can be applied to the personalized recommendation user feature extraction scenario of a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0131] The specific implementation of step S2-1: GraphSAGE is used as the graph neural network model, with a total of 5 propagation layers. In the information propagation of the current layer (taking layer 2 as an example), the user's item interaction information and social relationship information are aggregated based on the attention mechanism. Item interaction information includes the user's corrected rating data for each product and the interaction frequency. Social relationship information includes the user's social intimacy with friends (calculated from the interaction frequency, ranging from 0 to 1) and the relationship type. The attention mechanism initially assigns equal weights to the two kinds of information (item interaction information weight 0.5, social relationship information weight 0.5). The user representation of the current layer is generated through the aggregation calculation. For example, the current-layer user representation of user U1001 is [0.32, 0.45, 0.28, ..., 0.51] (128 dimensions).

[0132] The specific implementation of step S2-2 is as follows: Construct an adjacency matrix based on the user representation of the current layer, calculate the spectral radius of the matrix through eigenvalue decomposition, and obtain the current spectral radius ρ_current=0.72; the preset first spectral radius threshold ρ_adjust=0.8 is obtained from the model parameter configuration.
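For reference, a spectral radius can be estimated without a linear algebra library by power iteration. This is only a sketch under assumptions: the text does not state how the matrix is built from the user representations, so the function simply accepts any square matrix, and power iteration is a stand-in for the full eigenvalue decomposition mentioned above. It is reliable when the matrix has a single dominant eigenvalue and the start vector is not orthogonal to the corresponding eigenvector.

```python
import math

def spectral_radius(matrix, iterations=500, tol=1e-12):
    """Estimate the largest eigenvalue modulus of a square matrix
    (list of lists) by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-length start vector
    estimate = 0.0
    for _ in range(iterations):
        # w = A v
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Because v has unit length, ||A v|| approaches |lambda_max|.
        if abs(norm - estimate) < tol:
            return norm
        estimate = norm
    return estimate
```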

[0133] The specific implementation of step S2-3 is as follows: compare the current spectral radius ρ_current=0.72 with the first spectral radius threshold ρ_adjust=0.8, and determine that the current spectral radius is lower than the first spectral radius threshold, so attention weight update is required.

[0134] The specific implementation of step S2-4 is as follows: Based on the current spectral radius ρ_current=0.72 and the preset spectral radius threshold ρ_threshold=0.9, calculate the spectral radius deviation δ=(0.9-0.72) / 0.9=0.2; obtain the preset intimacy threshold θ_social=0.6, and filter the attention weights corresponding to social relationships. For example, in the social relationships of user U1001, the weight of friend U1002 is 0.65 and the weight of friend U1003 is 0.58. The weights greater than 0.6, here 0.65, are selected as the social weights to be adjusted; obtain the preset amplification coefficient γ_social=0.5, and calculate the amplification factor 1+0.5×0.2=1.1 based on δ=0.2 and γ_social=0.5. Multiply the social weight to be adjusted, 0.65, by 1.1 to obtain the updated social weight 0.715, and update the attention weights corresponding to social relationships; obtain the preset confidence threshold θ_confidence=0.5 for sentiment tendency scores, and filter the attention weights corresponding to item interactions. For example, user U1001's interaction weight for product G2003 is 0.48 (corresponding to a sentiment score confidence of 0.42 < 0.5), and the interaction weight for product G2004 is 0.55 (corresponding to a sentiment score confidence of 0.63 > 0.5). The weight 0.48, whose confidence is less than 0.5, is selected as the item weight to be adjusted. The preset reduction coefficient γ_item=0.4 is obtained. Based on δ=0.2 and γ_item=0.4, the reduction factor is calculated as 1-0.4×0.2=0.92. The item weight to be adjusted, 0.48, is multiplied by 0.92 to obtain the updated item weight 0.4416, and the attention weights corresponding to item interactions are updated. Based on the updated attention weights (social relationship weight adjusted to 0.715, item interaction weight adjusted to 0.4416, etc.), the user's item interaction information and social relationship information are re-aggregated to generate a new user representation for the current layer.
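The weight update in this worked example can be checked with a short sketch. The function name and data layout (dictionaries keyed by friend or product ID, with item entries holding a weight and a sentiment score confidence) are assumptions for illustration; the thresholds, coefficients, and formulas are those given in the text.

```python
def update_attention_weights(rho_current, social, items,
                             rho_adjust=0.8, rho_threshold=0.9,
                             theta_social=0.6, gamma_social=0.5,
                             theta_confidence=0.5, gamma_item=0.4):
    """social: {friend_id: weight}; items: {product_id: (weight, confidence)}.
    Returns updated copies; inputs are left unchanged."""
    if rho_current >= rho_adjust:
        return dict(social), dict(items)  # no update is triggered

    delta = (rho_threshold - rho_current) / rho_threshold
    amplify = 1 + gamma_social * delta   # amplification factor
    reduce_ = 1 - gamma_item * delta     # reduction factor

    new_social = {k: w * amplify if w > theta_social else w
                  for k, w in social.items()}
    new_items = {k: (w * reduce_ if conf < theta_confidence else w, conf)
                 for k, (w, conf) in items.items()}
    return new_social, new_items

new_social, new_items = update_attention_weights(
    0.72,
    {"U1002": 0.65, "U1003": 0.58},
    {"G2003": (0.48, 0.42), "G2004": (0.55, 0.63)},
)
```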

[0135] The specific implementation of step S2-5 is as follows: the updated user representation of the current layer is used as the input of the next layer, and the information propagation of the graph neural network continues. Steps S2-1 to S2-4 are repeated until the preset number of propagation layers (5 layers) is reached (propagation termination condition), then the propagation stops, and the final user representation is output. For example, the final user representation of user U1001 is [0.38,0.42,0.35,...,0.56], and the corresponding final spectral radius ρ_final=0.85 is calculated.

[0136] In step S3, calculating the user's interaction probability distribution across different product categories to obtain the user behavior entropy further includes the following sub-steps:

[0137] Step S3-1: Obtain the user's historical behavior sequence from historical interaction data, and map each product in the historical behavior sequence according to the preset multi-level category system to obtain the affiliation of each product in coarse-grained category and fine-grained category.

[0138] Step S3-2: Count the user's first interaction count on each coarse-grained category and second interaction count on each fine-grained category, respectively.

[0139] Step S3-3: Calculate the user's first interaction probability on each coarse-grained category based on the first interaction counts, and calculate the user's second interaction probability on each fine-grained category based on the second interaction counts.

[0140] Step S3-4: Calculate the coarse-grained behavioral entropy from the first interaction probabilities using the information entropy formula, and calculate the fine-grained behavioral entropy from the second interaction probabilities using the information entropy formula.

[0141] Step S3-5: Weighted fusion of coarse-grained behavior entropy and fine-grained behavior entropy to obtain multi-granularity behavior entropy, which is used as user behavior entropy.

[0142] It should be noted that the historical behavior sequence refers to a series of continuous interactive behavior records generated by users on e-commerce platforms, arranged in chronological order, including product information corresponding to behaviors such as browsing, favorites, adding to cart, and purchasing. In this invention, it is used to fully present the user's behavioral trajectory and provide a basis for category mapping and interaction statistics.

[0143] The pre-defined multi-level category system refers to the hierarchical classification rules set by e-commerce platforms for product classification. It is divided into two core levels: coarse-grained and fine-grained. In this invention, it is used to standardize product classification from different dimensions, so as to facilitate accurate statistics on user preferences in different categories.

[0144] Coarse-grained categories refer to higher-level classifications in a multi-level category system, covering a wide range, such as clothing, electronic products, and food. In this invention, they are used to capture users' major category preferences from a macro perspective.

[0145] Fine-grained categories refer to lower-level classifications in a multi-level category system. They are further subdivisions of coarse-grained categories, such as dresses and T-shirts under clothing, and headphones and mobile phones under electronic products. In this invention, they are used to explore users' specific category preferences from a micro-level perspective.

[0146] The first interaction count refers to the cumulative number of times a user interacts with products under each coarse-grained category. These interactions include browsing, adding to favorites, adding to cart, and purchasing. In this invention, it is used to quantify the degree of user attention to the major categories.

[0147] The second interaction count refers to the cumulative number of times a user interacts with products under each fine-grained category. The interaction behaviors counted are the same as for the first interaction count. In this invention, it is used to quantify the degree of user attention to a specific product category.

[0148] The first interaction probability refers to the proportion of the number of first interactions in a certain coarse-grained category to the total number of interactions in all coarse-grained categories. In this invention, it is used to reflect the proportion of user preference distribution across major categories.

[0149] The second interaction probability refers to the proportion of the number of second interactions in a certain fine-grained category to the total number of interactions in all fine-grained categories. In this invention, it is used to reflect the distribution of user preferences in specific categories.

[0150] The information entropy formula refers to a mathematical formula used to calculate the uncertainty or dispersion of a system. In this invention, it is the core formula for calculating behavioral entropy: H = -Σ p(x) log_2 p(x), where p(x) is the interaction probability of category x and the sum runs over all categories. It is used to transform the preference distribution into a quantifiable entropy value.

[0151] Coarse-grained behavioral entropy refers to the value calculated by the information entropy formula based on the first interaction probability corresponding to the coarse-grained category. It is used to measure the degree of dispersion of users' preferences in major categories. The larger the value, the more dispersed the preferences in major categories. In this invention, it is used to reflect the user preference characteristics from a macro perspective.

[0152] Fine-grained behavioral entropy refers to the value calculated by the information entropy formula based on the second interaction probability corresponding to the fine-grained category. It is used to measure the degree of dispersion of users' preferences in specific categories. The larger the value, the more dispersed the preferences in specific categories. In this invention, it is used to reflect user preference characteristics from a micro perspective.

[0153] Multi-granularity behavioral entropy refers to the comprehensive entropy value obtained after weighted fusion, i.e., the user behavior entropy, which comprehensively reflects the dispersion of user preferences across category levels. In this invention, it is used to provide a core basis for the subsequent determination of the counterfactual sampling rate.

[0154] In one specific implementation, this sub-step can be applied to the personalized recommendation preference analysis scenario of a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0155] The specific implementation of step S3-1: Extract the historical behavior sequence of user U1001 from the user historical interaction database of the e-commerce platform. This sequence records the user's interaction behavior over the past three months in chronological order, such as browsing dresses on October 1, 2025, purchasing headphones on October 5, 2025, and adding casual pants to the cart on October 8, 2025. In the platform's preset multi-level category system, coarse-grained categories include clothing, electronic products, food, and home furnishings, while fine-grained categories are subcategories of coarse-grained categories, such as dresses, T-shirts, and casual pants under clothing, and headphones, mobile phones, and laptops under electronic products. Map each product in the historical behavior sequence according to this system. For example, dresses are mapped to the coarse-grained category "clothing" and the fine-grained category "dresses"; headphones are mapped to the coarse-grained category "electronic products" and the fine-grained category "headphones"; and casual pants are mapped to the coarse-grained category "clothing" and the fine-grained category "casual pants".

[0156] The specific implementation of step S3-2: Count the number of first interactions of user U1001 in each coarse-grained category, where interaction behaviors include browsing, adding to favorites, adding to cart, and purchasing. Specifically, the clothing category has 15 interactions (8 browsing, 2 adding to favorites, 3 adding to cart, 2 purchasing), the electronic products category has 8 (3 browsing, 1 adding to favorites, 2 adding to cart, 2 purchasing), the food category has 3 (2 browsing, 1 purchasing), and the home furnishings category has 2 (2 browsing). The total number of first interactions is 28. Then count the number of second interactions in each fine-grained category: dresses 5, T-shirts 3, casual pants 4, headphones 4, mobile phones 2, laptops 2, snacks 2, drinking water 1, pillows 1, and towels 1. The total number of second interactions is 28.

[0157] The specific implementation of step S3-3: Calculate the first interaction probability based on the number of first interactions: clothing category probability = 15 / 28 ≈ 0.536, electronic product category probability = 8 / 28 ≈ 0.286, food category probability = 3 / 28 ≈ 0.107, home furnishing category probability = 2 / 28 ≈ 0.071; calculate the second interaction probability based on the number of second interactions: dress probability = 5 / 28 ≈ 0.179, T-shirt probability = 3 / 28 ≈ 0.107, casual pants probability = 4 / 28 ≈ 0.143, headphones probability = 4 / 28 ≈ 0.143, mobile phone probability = 2 / 28 ≈ 0.071, laptop probability = 2 / 28 ≈ 0.071, snack probability = 2 / 28 ≈ 0.071, drinking water probability = 1 / 28 ≈ 0.036, pillow probability = 1 / 28 ≈ 0.036, towel probability = 1 / 28 ≈ 0.036.

[0158] The specific implementation of step S3-4: Based on the first interaction probabilities, the coarse-grained behavioral entropy is calculated using the information entropy formula, giving H_coarse≈1.56; based on the second interaction probabilities, the fine-grained behavioral entropy is calculated using the same formula, giving H_fine≈2.83.

[0159] The specific implementation of step S3-5: The weight of coarse-grained behavior entropy is preset to 0.3, and the weight of fine-grained behavior entropy is 0.7. The two are weighted and merged. User behavior entropy = 0.3×1.56+0.7×2.83≈0.468+1.981≈2.449. This value indicates that user preferences are relatively dispersed, especially the diversity of preferences in specific categories.
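As a minimal sketch, steps S3-4 and S3-5 may be expressed in Python as follows. The function names `behavior_entropy` and `fuse_entropy` are illustrative and are not defined by this invention; the weights 0.3 and 0.7 are the preset values of the example above. The entropy computed directly from a set of counts depends on those exact counts and need not coincide with the rounded figures quoted in the text.

```python
import math


def behavior_entropy(counts):
    """Shannon entropy (base 2) of a mapping: category -> interaction count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0  # categories with no interactions contribute nothing
    )


def fuse_entropy(h_coarse, h_fine, w_coarse=0.3, w_fine=0.7):
    """Weighted fusion of coarse- and fine-grained entropy (step S3-5)."""
    return w_coarse * h_coarse + w_fine * h_fine


# Coarse-grained counts taken from the example of step S3-2.
coarse_counts = {"clothing": 15, "electronics": 8, "food": 3, "home furnishings": 2}
h_coarse = behavior_entropy(coarse_counts)

# Fusion applied to the entropy figures quoted in steps S3-4 and S3-5.
h_user = fuse_entropy(1.56, 2.83)
```

Four equally used categories give an entropy of exactly 2 bits, which is the upper bound for four categories; more concentrated preferences yield a smaller value.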

[0160] Step S4 further includes the following sub-steps:

[0161] Step S4-1: Determine the basic demand level for counterfactual sampling based on user behavior entropy. The specific formula is: D_behavior=H / H_max, where H is user behavior entropy, H_max is the preset maximum behavior entropy, and D_behavior is the basic demand level.

[0162] Step S4-2: Determine the compensation requirement for counterfactual sampling based on the current spectral radius. The specific formula is: D_model = (ρ_threshold - ρ_current) / ρ_threshold, where D_model is the compensation requirement, ρ_current is the current spectral radius, and ρ_threshold is the preset spectral radius threshold. When ρ_current ≥ ρ_threshold, D_model takes the value of 0.

[0163] Step S4-3: Nonlinearly fuse the basic demand degree and the compensation demand degree to obtain the comprehensive sampling demand degree. The specific formula is: D=1-(1-D_behavior)×(1-D_model), where D is the comprehensive sampling demand degree.

[0164] Step S4-4: Calculate the dynamic sampling rate based on the preset minimum sampling rate, maximum sampling rate, and comprehensive sampling demand. The specific formula is: ρ_dynamic = ρ_min + (ρ_max - ρ_min) × D, where ρ_dynamic is the dynamic sampling rate, ρ_max is the preset maximum sampling rate, and ρ_min is the preset minimum sampling rate.

[0165] Step S4-5: Adaptive importance sampling is performed on the user's historical behavior sequence based on the dynamic sampling rate to generate a lightweight counterfactual sequence as a counterfactual sample, and the sampling weight corresponding to each counterfactual sample is recorded.

[0166] Furthermore, in sub-step S4-5, the step of generating a lightweight counterfactual sequence as a counterfactual sample includes:

[0167] Obtain multi-dimensional features for each interaction record in the user's historical behavior sequence. These multi-dimensional features include the confidence of the sentiment tendency score, the social relationship intimacy, and the interaction time decay factor.

[0168] The sampling weight of each interaction record is calculated based on multi-dimensional features. The specific formula is: W=w_base×(1+α×S_intimacy / S_max)×(1-β×(1-C_conf / C_max))×T_decay, where W is the sampling weight, w_base is the basic sampling weight, α is the intimacy amplification coefficient, S_intimacy is the social relationship intimacy, S_max is the preset maximum intimacy, β is the confidence suppression coefficient, C_conf is the confidence of the sentiment tendency score, C_max is the preset maximum confidence, and T_decay is the interaction time decay factor.

[0169] Weighted random sampling is performed on the user's historical behavior sequence using the dynamic sampling rate and the sampling weights;

[0170] For the sampled interaction records, counterfactual samples with causal labels are generated based on their corresponding product categories and interaction types.

[0171] It should be noted that the basic demand degree refers to a quantitative indicator that reflects the degree of dispersion of user preferences and the counterfactual sampling demand, which is calculated by user behavior entropy. The more dispersed the user preferences are, the higher the basic demand degree is. In this invention, it is used to determine the basic intensity of sampling based on the user's own preference characteristics.

[0172] Maximum behavioral entropy refers to the preset upper limit of user behavioral entropy, which corresponds to the ideal state of users interacting evenly across all categories. In this invention, it is used to normalize the basic demand degree so that its value ranges between 0 and 1.

[0173] The compensation demand degree refers to the sampling demand index calculated based on the difference between the current spectral radius and the spectral radius threshold, which is used to compensate for the insufficient stability of the user representation. The greater the deviation of the spectral radius, the higher the compensation demand degree. In this invention, it is used to dynamically adjust the sampling intensity according to the model propagation effect.

[0174] Nonlinear fusion refers to the process of combining the basic demand degree and the compensation demand degree using a preset nonlinear formula, which can highlight the synergistic effect of the two demand degrees. In this invention, it is used to obtain a comprehensive sampling demand degree that better matches actual needs.

[0175] The comprehensive sampling demand degree refers to the final quantitative index of sampling demand obtained by integrating the basic demand degree and the compensation demand degree. Its value ranges from 0 to 1. In this invention, it is used as the core basis for calculating the dynamic sampling rate.

[0176] The minimum sampling rate refers to a preset lower limit of the counterfactual sampling rate, which is used to ensure the minimum sampling amount and avoid model bias caused by insufficient sampling. In this invention, it is used to control the minimum threshold of the sampling rate.

[0177] The maximum sampling rate refers to the preset upper limit of the counterfactual sampling rate, which is used to limit the maximum sampling amount and avoid increased computational overhead caused by oversampling. In this invention, it is used to control the maximum threshold of the sampling rate.

[0178] Dynamic sampling rate refers to a sampling rate that can be adaptively adjusted based on the comprehensive sampling demand. The higher the demand, the higher the sampling rate. In this invention, it is used to achieve dynamic adaptation of the sampling amount and balance the sampling effect with the calculation efficiency.

[0179] Adaptive importance sampling refers to a sampling method that assigns different sampling probabilities based on the importance of interaction records. Records with higher importance have a higher probability of being sampled. In this invention, the purpose is to retain key interaction information while controlling the amount of sampling.

[0180] Lightweight counterfactual sequences refer to counterfactual behavior sequences that are shortened in length and contain key information, generated through adaptive importance sampling. In this invention, they are used to reduce the overhead of subsequent model computation and improve processing efficiency.

[0181] The interaction time decay factor is a decay coefficient calculated from the interval between the interaction time and the current time. The shorter the interval, the closer the decay factor is to 1. In this invention, it is used to highlight the importance of recent interaction records.

[0182] The basic sampling weight refers to the preset sampling weight benchmark value. The initial weight of all interactive records is this value. In this invention, it is used to unify the calculation benchmark of the sampling weight.

[0183] The intimacy amplification coefficient refers to a preset coefficient used to amplify the influence of social relationship intimacy on sampling weight. In this invention, it is used to increase the sampling priority of interaction records corresponding to high-intimacy social relationships.

[0184] Maximum intimacy refers to the preset upper limit of social relationship intimacy, which is used to normalize the social relationship intimacy. In this invention, it is used to ensure the rationality of the sampling weight calculation.

[0185] The confidence suppression coefficient is a preset coefficient used to suppress the influence of low confidence sentiment scores on sampling weights. In this invention, it is used to reduce the sampling priority of low-reliability interaction records.

[0186] Maximum confidence refers to the preset upper limit of the confidence of the sentiment tendency score, which is used to normalize that confidence. In this invention, it is used to ensure the stability of the sampling weight calculation.

[0187] Weighted random sampling refers to a random sampling process that combines dynamic sampling rate and sampling weight. The sampling probability is positively correlated with the sampling weight. In this invention, it is used to achieve sampling based on importance and improve sample quality.

[0188] Causal labels are tags added to counterfactual samples to identify their causal differences from real samples, such as virtual interaction types and virtual category associations. In this invention, they are used to help the model distinguish between real samples and counterfactual samples and accurately learn causal relationships.

[0189] In one specific implementation, this sub-step can be applied to the counterfactual sampling scenario of personalized recommendations on a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0190] The specific implementation of step S4-1: Given that the user behavior entropy H of user U1001 is 2.449 and the preset maximum behavior entropy H_max is 3.0, the basic demand degree is calculated according to the formula D_behavior=H / H_max=2.449 / 3.0≈0.816, which indicates that the user's preferences are scattered and the basic sampling demand is high.

[0191] The specific implementation of step S4-2: The current spectral radius ρ_current=0.72, the preset spectral radius threshold ρ_threshold=0.9, according to the formula D_model=(ρ_threshold-ρ_current) / ρ_threshold, the compensation requirement is calculated as (0.9-0.72) / 0.9=0.2, indicating that the current user representation is not stable enough and needs to be compensated by sampling.

[0192] The specific implementation of step S4-3: According to the nonlinear fusion formula D=1-(1-D_behavior)×(1-D_model), calculate the comprehensive sampling requirement degree=1-(1-0.816)×(1-0.2)=1-0.184×0.8≈1-0.147=0.853.

[0193] The specific implementation of step S4-4: The preset minimum sampling rate ρ_min=0.1 and maximum sampling rate ρ_max=0.5. According to the formula ρ_dynamic=ρ_min+(ρ_max-ρ_min)×D, the dynamic sampling rate is calculated as 0.1+(0.5-0.1)×0.853=0.1+0.4×0.853≈0.1+0.341=0.441, that is, the sampling rate is approximately 44.1%.
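The formulas of steps S4-1 to S4-4 can be sketched as a single Python function. This is an illustrative sketch only: the function name is hypothetical, clamping D_behavior to the range 0 to 1 is an added assumption for the case H > H_max, and H_max and ρ_threshold are assumed to be positive.

```python
def dynamic_sampling_rate(h, h_max, rho_current, rho_threshold, rate_min, rate_max):
    """Return (D, rho_dynamic) following steps S4-1 to S4-4."""
    # S4-1: basic demand degree (clamp is an assumption, not stated in the text).
    d_behavior = min(max(h / h_max, 0.0), 1.0)
    # S4-2: compensation demand degree, 0 when rho_current >= rho_threshold.
    d_model = max((rho_threshold - rho_current) / rho_threshold, 0.0)
    # S4-3: nonlinear fusion.
    d = 1.0 - (1.0 - d_behavior) * (1.0 - d_model)
    # S4-4: linear interpolation between the minimum and maximum sampling rate.
    return d, rate_min + (rate_max - rate_min) * d


# Values of user U1001 from the example above.
d, rate = dynamic_sampling_rate(2.449, 3.0, 0.72, 0.9, 0.1, 0.5)
```

With the example values, `d` is approximately 0.853 and `rate` approximately 0.441, in agreement with the figures of steps S4-3 and S4-4.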

[0194] The specific implementation of step S4-5 is as follows: Obtain the multi-dimensional features of each interaction record in the historical behavior sequence of user U1001, including the confidence of the sentiment tendency score C_conf (e.g., the confidence of the dress interaction record is 0.122, and that of the headphone interaction record is 0.35); the social relationship intimacy S_intimacy (e.g., the intimacy of the interaction record influenced by friend U1002 is 0.7, and the intimacy of an interaction record without social influence is 0); and the interaction time decay factor T_decay (e.g., the decay factor of an interaction record from 3 days ago is 0.9, and the decay factor of an interaction record from 30 days ago is 0.3).

[0195] The preset base sampling weight w_base=1.0, intimacy amplification coefficient α=0.5, maximum intimacy S_max=1.0, confidence suppression coefficient β=0.3, and maximum confidence C_max=1.0.

[0196] Taking the dress interaction record as an example, according to the formula W=w_base×(1+α×S_intimacy / S_max)×(1-β×(1-C_conf / C_max))×T_decay, the sampling weight is calculated as 1.0×(1+0.5×0 / 1.0)×(1-0.3×(1-0.122 / 1.0))×0.9≈1.0×1.0×(1-0.3×0.878)×0.9≈1.0×0.737×0.9≈0.663; taking the headphone interaction record as an example, the sampling weight =1.0×(1+0.5×0.7 / 1.0)×(1-0.3×(1-0.35 / 1.0))×0.8≈1.0×1.35×(1-0.3×0.65)×0.8≈1.35×0.805×0.8≈0.869.
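The sampling weight formula can be sketched in Python as follows. The function and parameter names are illustrative; the default values are the presets given in the example above.

```python
def sampling_weight(s_intimacy, c_conf, t_decay,
                    w_base=1.0, alpha=0.5, beta=0.3, s_max=1.0, c_max=1.0):
    """W = w_base * (1 + alpha*S/S_max) * (1 - beta*(1 - C/C_max)) * T_decay."""
    social_boost = 1.0 + alpha * s_intimacy / s_max            # raised by intimacy
    confidence_penalty = 1.0 - beta * (1.0 - c_conf / c_max)   # lowered by low confidence
    return w_base * social_boost * confidence_penalty * t_decay


# Dress record: no social influence, confidence 0.122, decay factor 0.9.
w_dress = sampling_weight(0.0, 0.122, 0.9)
# Headphone record: intimacy 0.7, confidence 0.35, decay factor 0.8.
w_headphones = sampling_weight(0.7, 0.35, 0.8)
```

These calls evaluate to approximately 0.663 and 0.869 respectively, matching the worked calculation.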

[0197] Using the dynamic sampling rate of 44.1% and the calculated sampling weights, weighted random sampling is performed on the user's historical behavior sequence. Because the sampling probability is positively correlated with the sampling weight, the headphone interaction record (weight 0.869) is more likely to be sampled than the dress interaction record (weight 0.663).
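The text does not fix a particular weighted sampling algorithm. One possible realization, given here as a hedged sketch, is weighted sampling without replacement using random keys u^(1/w) (the Efraimidis-Spirakis scheme). The function name, the rounding of the sample count with `ceil`, and the `seed` parameter are assumptions made for illustration.

```python
import math
import random


def weighted_sample(records, weights, rate, seed=None):
    """Draw ceil(rate * n) records without replacement, favoring large weights."""
    rng = random.Random(seed)
    k = min(len(records), math.ceil(rate * len(records)))
    # Each record with positive weight w gets the key u ** (1 / w), u ~ U[0, 1);
    # the k largest keys are kept, so larger weights are more likely to survive.
    keyed = [
        (rng.random() ** (1.0 / w), idx)
        for idx, w in enumerate(weights)
        if w > 0
    ]
    keyed.sort(reverse=True)
    chosen = sorted(idx for _, idx in keyed[:k])  # restore chronological order
    # Return each sampled record together with its sampling weight.
    return [(records[i], weights[i]) for i in chosen]
```

For a sequence of 28 records and a rate of 0.441, this sketch keeps 13 records and returns each with its sampling weight, so that the weight can be recorded alongside the counterfactual sample derived from it.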

[0198] For the sampled interaction records, counterfactual samples with causal labels are generated based on their product category and interaction type. For example, for the sampled "buy headphones" record (coarse-grained category: electronics, fine-grained category: headphones), the counterfactual sample "buy laptop" (coarse-grained category: electronics, fine-grained category: laptop) is generated, and the causal label "category replacement" is added. For the sampled "add dress to cart" record, the counterfactual sample "add T-shirt to cart" is generated, and the causal label "fine category replacement" is added.
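The category replacement described above may be sketched as follows. The category tree, the record fields, and the function name are hypothetical, and choosing the replacement category at random among the sibling fine-grained categories is an assumption; the text does not state how the replacement category is selected.

```python
import random

# Hypothetical excerpt of the platform's multi-level category system.
CATEGORY_TREE = {
    "clothing": ["dresses", "T-shirts", "casual pants"],
    "electronics": ["headphones", "mobile phones", "laptops"],
}


def make_counterfactual(record, rng=None):
    """Replace the fine-grained category with a sibling and attach a causal label."""
    rng = rng or random.Random()
    siblings = [c for c in CATEGORY_TREE[record["coarse"]] if c != record["fine"]]
    if not siblings:
        return None  # no alternative category available
    return {
        "action": record["action"],        # interaction type is kept
        "coarse": record["coarse"],        # coarse-grained category is kept
        "fine": rng.choice(siblings),      # fine-grained category is replaced
        "weight": record["weight"],        # sampling weight is carried over
        "causal_label": "fine category replacement",
    }


sampled = {"action": "purchase", "coarse": "electronics",
           "fine": "headphones", "weight": 0.869}
cf = make_counterfactual(sampled, random.Random(0))
```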

[0199] Step S5 further includes the following sub-steps:

[0200] Step S5-1: Construct a structural causal model and identify confounding factors that affect user interaction outcomes. Confounding factors include item popularity and spatiotemporal context.

[0201] Step S5-2: Input the counterfactual samples and their sampling weights into the graph neural network to generate user representations and item representations in the counterfactual world;

[0202] Step S5-3: Normalize the sampling weights of the counterfactual samples and use the normalized sampling weights as the confidence level of the counterfactual world representation;

[0203] Step S5-4: Using a backdoor adjustment strategy, intervene in the user representation and item representation in the counterfactual world to block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes;

[0204] Step S5-5: Based on the confidence level of the counterfactual world representation, the intervened counterfactual world representation is weighted and fused with the original propagation results to obtain the debiased user representation and item representation.

[0205] Furthermore, in sub-step S5-5, the step of performing weighted fusion of the post-intervention counterfactual world representation with the original propagation results includes:

[0206] The population confidence of the counterfactual samples is calculated based on the sampling weights of the counterfactual samples. The specific formula is: C_population=average(W) / max(W), where C_population is the population confidence, W is the sampling weight of the counterfactual samples, average() is the average function, and max() is the maximum function.

[0207] The confidence level of the counterfactual world representation is used as the individual confidence level, which is then fused with the population confidence to obtain the comprehensive confidence level. The specific formula is: C_comprehensive=γ×C+(1-γ)×C_population, where C_comprehensive is the comprehensive confidence level, γ is the preset fusion coefficient, and C is the individual confidence level.

[0208] The fusion weight is calculated based on the comprehensive confidence level, and the specific formula is: w = C_comprehensive, where w is the fusion weight;

[0209] For each user or item, obtain its counterfactual world representation and the original propagation result, and perform a weighted fusion according to the following formula: H_final=w×H_cf+(1-w)×H_ori, where H_final is the debiased user representation and item representation, H_cf is the counterfactual world representation, and H_ori is the original propagation result.

[0210] It should be noted that the structural causal model refers to a model framework that clarifies the relationships between causes, effects, and confounding factors by constructing a causal relationship diagram between variables. In this invention, it is used to clearly define the various factors and their paths of action that affect the results of user interaction, thus providing a theoretical basis for debiasing.

[0211] Confounding factors refer to interfering factors that simultaneously affect causal variables and outcome variables, leading to misjudgments of causal relationships. In this invention, they are specifically item popularity and spatiotemporal context. They are identified so that irrelevant interference can be eliminated and the true causal relationship restored.

[0212] Item popularity refers to the degree of attention a product or item receives on an e-commerce platform. It is measured by a combination of indicators such as exposure, sales volume, and clicks. In this invention, it is used to quantify the interference of product popularity on user interaction behavior.

[0213] Spatiotemporal context refers to the time and geographical location characteristics when a user generates an interactive behavior. Time characteristics include time periods, holidays, etc., and geographical location characteristics include the city and region. In this invention, it is used to quantify the interference of scene factors on user interactive behavior.

[0214] The user representation in the counterfactual world refers to the vector generated by inputting counterfactual samples into a graph neural network, which reflects the user characteristics in the virtual scene. In this invention, it is used to provide a benchmark for comparison and to assist in the debiasing calculation.

[0215] The item representation in the counterfactual world refers to the vector generated by inputting counterfactual samples into a graph neural network, which reflects the characteristics of goods and items in a virtual scene. In this invention, it is used to match the counterfactual user representation to construct a virtual interactive scene.

[0216] The confidence level of the counterfactual world representation refers to the normalized sampling weights, which are used to measure the reliability of the counterfactual world representation. The higher the weights, the higher the confidence level. In this invention, it is used to determine the reference value of counterfactual samples in the debiasing process.

[0217] A causal path refers to the associated path from a causal variable to an outcome variable in a structural causal model. In this invention, it specifically refers to the path from social relationship nodes and spatiotemporal context nodes to interaction result nodes. In this invention, it is used to identify non-causal paths that need to be blocked.

[0218] The original propagation results refer to the user representations and item representations obtained through graph neural network information propagation without the introduction of counterfactual samples. In this invention, they are used to compare and fuse with the counterfactual representations after intervention to obtain the final debiased results.

[0219] The debiased user representation refers to the user feature vector that has been weighted and fused to eliminate the interference of confounding factors, and can truly reflect the user's core preferences. In this invention, it is used to provide accurate user features for subsequent matching score calculation.

[0220] The debiased item representation refers to the product item feature vector that has been weighted and fused to eliminate the interference of confounding factors. It can truly reflect the core attributes of the product and is used in this invention to provide accurate product features for subsequent matching score calculation.

[0221] Population confidence (also referred to as group confidence) is a quantitative indicator that reflects the overall reliability of counterfactual samples, calculated based on the sampling weights of all counterfactual samples. In this invention, it is used to judge the reference value of counterfactual samples from an overall perspective.

[0222] Individual confidence refers to the confidence level corresponding to a single counterfactual world representation, i.e., the normalized weight of a single sample. In this invention, it is used to determine the reference value of a single counterfactual sample at the individual level.

[0223] The fusion coefficient γ is a preset coefficient used to adjust the relative influence of individual confidence and population confidence. In this invention, it is used to balance the reliability of individual samples against the reliability of the samples as a whole.

[0224] The comprehensive confidence level refers to the final confidence index obtained by integrating the individual confidence and the population confidence. It comprehensively reflects the reliability of counterfactual representations and, in this invention, is the core basis for determining the fusion weight.

[0225] Fusion weights are coefficients determined from the comprehensive confidence level and used to weight and fuse the counterfactual representation with the original propagation result. In this invention, they control the contribution ratio of the two representations in the final result.

[0226] In one specific implementation, this sub-step can be applied to the personalized recommendation bias removal scenario of a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0227] The specific implementation of step S5-1 involves constructing a structural causal model. This model includes five categories of variables: user characteristics, product characteristics, item popularity, spatiotemporal context, and interaction results. A causal graph clarifies the relationships between these variables: user characteristics and product characteristics are causal variables, interaction results are outcome variables, and item popularity and spatiotemporal context are confounding factors that simultaneously influence both causes and outcomes. For example, a highly popular headphone product (item popularity) will simultaneously increase the probability of user clicks (interaction results) and product exposure (causal variable association), creating confounding interference.

[0228] The specific implementation of step S5-2 is as follows: The counterfactual samples generated in step S4 (such as samples with causal labels like "buy laptop" and "add T-shirt to cart") and their sampling weights (such as laptop sample weight 0.869, T-shirt sample weight 0.721, etc.) are input into the GraphSAGE graph neural network to generate the user representation H_cf_user and item representation H_cf_item in the counterfactual world. For example, the counterfactual user representation of user U1001 is H_cf_user=[0.32,0.41,0.38,...,0.53], and the counterfactual item representation of the laptop is H_cf_item=[0.29,0.37,0.45,...,0.51].

[0229] The specific implementation of step S5-3 is as follows: The sampling weights of the counterfactual samples are normalized by dividing each weight by the maximum weight, which maps the weights into the 0-1 interval. For example, the original sampling weights 0.869, 0.721, and 0.663 are normalized to 1.0, 0.830, and 0.763. The normalized weights serve as the confidence level C of the counterfactual world representation. For instance, the confidence level C for the laptop sample is 1.0, and the confidence level C for the T-shirt sample is 0.830.
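The normalized values 1.0, 0.830, and 0.763 in this example are reproduced by dividing each weight by the largest weight. The sketch below, with illustrative function names, shows that scaling next to textbook min-max scaling, which would instead map the smallest weight to 0.

```python
def normalize_by_max(weights):
    """Scale weights so that the largest weight becomes 1."""
    top = max(weights)
    if top <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / top for w in weights]


def min_max(weights):
    """Textbook min-max scaling, shown only for comparison."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [1.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


confidence = normalize_by_max([0.869, 0.721, 0.663])
```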

[0230] The specific implementation of step S5-4: Intervention is carried out using a backdoor adjustment strategy, controlling the values of the two confounding factors, item popularity and spatiotemporal context, to a fixed mean (mean of item popularity 0.5, mean of spatiotemporal context 0.3), blocking the non-causal paths of "social relationship node → user representation → interaction result node" and "spatiotemporal context node → item representation → interaction result node", and obtaining the counterfactual user representation H_cf_user_intervene and item representation H_cf_item_intervene after intervention.
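The intervention above fixes each confounding factor at a constant value. For comparison, the textbook discrete form of backdoor adjustment averages the outcome over the distribution of the confounding factor, P(Y | do(X)) = Σ_z P(Y | X, z) × P(z). A minimal sketch of that form follows; the strata names, the probabilities, and the function name are hypothetical illustrations and do not come from this invention.

```python
def backdoor_adjust(p_outcome_given_z, p_z):
    """P(Y | do(X)) = sum over z of P(Y | X, z) * P(z) for a discrete confounder z."""
    if abs(sum(p_z.values()) - 1.0) > 1e-9:
        raise ValueError("P(z) must sum to 1")
    return sum(p_outcome_given_z[z] * p_z[z] for z in p_z)


# Hypothetical click probabilities for one user-item pair in each popularity stratum.
p_click_given_popularity = {"popular": 0.8, "niche": 0.4}
# Hypothetical marginal distribution of the popularity stratum.
p_popularity = {"popular": 0.3, "niche": 0.7}

p_click_do = backdoor_adjust(p_click_given_popularity, p_popularity)
```

Because the confounder is weighted by its marginal distribution rather than by its distribution among exposed items, the adjusted probability no longer rewards an item merely for being popular.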

[0231] The specific implementation of step S5-5: First, calculate the population confidence level. Given the sampling weights of the counterfactual samples W=[0.869,0.721,0.663], according to the formula C_population=average(W) / max(W), calculate the average value average(W)=(0.869+0.721+0.663) / 3≈0.751, and the maximum value max(W)=0.869. Therefore, the population confidence level C_population=0.751 / 0.869≈0.864.

[0232] With a preset fusion coefficient γ=0.6, taking the laptop sample as an example, its individual confidence level C=1.0; according to the formula C_comprehensive=γ×C+(1-γ)×C_population, the comprehensive confidence level is calculated as 0.6×1.0+(1-0.6)×0.864=0.6+0.346≈0.946; taking the T-shirt sample as an example, the comprehensive confidence level is 0.6×0.830+0.4×0.864≈0.498+0.346≈0.844. The fusion weight is then set as w=C_comprehensive: the fusion weight w for the laptop sample is 0.946, and the fusion weight w for the T-shirt sample is 0.844.

[0233] Obtain the original propagation result H_ori_user=[0.35,0.43,0.36,...,0.55] for user U1001 and the original propagation result H_ori_item=[0.31,0.39,0.42,...,0.54] for the laptop.

[0234] According to the formula H_final=w×H_cf+(1-w)×H_ori, the debiased user representation H_final_user=0.946×[0.32,0.41,0.38,...,0.53]+(1-0.946)×[0.35,0.43,0.36,...,0.55]≈[0.322,0.411,0.379,...,0.531]; the debiased laptop item representation H_final_item=0.946×[0.29,0.37,0.45,...,0.51]+0.054×[0.31,0.39,0.42,...,0.54]≈[0.291,0.371,0.448,...,0.512].
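The confidence fusion and representation fusion of step S5-5 can be sketched as follows. The function names are illustrative, and only the first three vector components are used because the remaining components are elided in the example.

```python
def population_confidence(weights):
    """C_population = average(W) / max(W)."""
    return (sum(weights) / len(weights)) / max(weights)


def comprehensive_confidence(c_individual, c_population, gamma=0.6):
    """C_comprehensive = gamma * C + (1 - gamma) * C_population."""
    return gamma * c_individual + (1.0 - gamma) * c_population


def fuse_representations(h_cf, h_ori, w):
    """H_final = w * H_cf + (1 - w) * H_ori, applied element-wise."""
    return [w * a + (1.0 - w) * b for a, b in zip(h_cf, h_ori)]


weights = [0.869, 0.721, 0.663]
c_pop = population_confidence(weights)
w_laptop = comprehensive_confidence(1.0, c_pop)   # fusion weight w = C_comprehensive

# Leading components of H_cf_user and H_ori_user, fused with w = 0.946.
h_final_user = fuse_representations([0.32, 0.41, 0.38], [0.35, 0.43, 0.36], 0.946)
```

Since w is close to 1 here, the fused vector stays close to the counterfactual representation and moves only slightly toward the original propagation result.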

[0235] Step S6 further includes the following sub-steps:

[0236] Step S6-1: Input the debiased user representation and item representation into a preset matching function to calculate the initial matching score of the user for each candidate item;

[0237] Step S6-2: Correct the initial matching score according to the confidence level C. The correction formula is: S_final=S_initial×(1+α×(1-C)), where S_initial is the initial matching score, S_final is the corrected matching score, and α is the preset correction coefficient, which takes values in the range of [0,1].

[0238] Step S6-3: Sort the candidate items in descending order according to the corrected matching scores to generate a sorted list of candidate items;

[0239] Step S6-4: Obtain the preset recommended list length, select the top N candidate items from the sorted list as the final recommended items, and display the final recommended items in the form of a recommended list.

[0240] It should be noted that the preset matching function refers to a pre-defined mathematical function used to calculate the degree of fit between the user representation and the item representation, including the inner product function, cosine similarity function, etc. In this invention, it is used to convert high-dimensional feature vectors into quantifiable matching values.

[0241] The initial matching score refers to the numerical value of the degree of fit between the user and the candidate item, which is calculated by the matching function and has not been adjusted for confidence. In this invention, it is used to provide a basic quantitative indicator for matching.

[0242] The confidence level C refers to the normalized sampling weight corresponding to the counterfactual world representation, which is used to measure the reliability of the counterfactual sample. In this invention, it is used to determine the credibility of the initial matching score and provide a basis for correction.

[0243] The correction coefficient α is a preset coefficient used to control the magnitude of the confidence level correction to the matching score. Its value ranges from 0 to 1. In this invention, it is used to balance the strength of the confidence level correction and avoid over-correction or under-correction.

[0244] The corrected matching score refers to the final quantitative matching index obtained after confidence correction, which can more realistically reflect the degree of fit between the user and the candidate items. In this invention, it is used to improve the accuracy of recommendation ranking.

[0245] Descending order sorting refers to arranging candidate items in descending order of their corrected matching scores. In this invention, it is used to prioritize presenting items that have a higher degree of relevance to the user.

[0246] The candidate item ranking list refers to a list containing all candidate items and their corresponding corrected matching scores after being sorted in descending order. In this invention, it is used to clarify the recommendation priority of candidate items.

[0247] The preset recommendation list length refers to the pre-defined number N of final recommended items, which is set according to the display requirements and user experience of the e-commerce platform. In this invention, it is used to control the number of recommended results and ensure the display effect.

[0248] The final recommended items refer to the top N candidate items selected from the sorted list, that is, the items with the highest relevance to the user. In this invention, they are used to provide users with accurate and personalized recommended content.

[0249] The recommended list format refers to the way the e-commerce platform displays the final recommended items, including image and text lists, card-style displays, etc. In this invention, it is used to facilitate users' browsing and selection of recommended items.

[0250] In one specific implementation, this sub-step can be applied to the personalized recommendation scenario of the "You May Like" section of a comprehensive e-commerce platform, and the specific implementation details are as follows:

[0251] The specific implementation of step S6-1: The preset matching function is the cosine similarity function. The bias-reduced user representation obtained in step S5 (e.g., H_final_user=[0.322,0.411,0.379,...,0.531] for user U1001) and the candidate item representations (e.g., the bias-reduced item representations of candidate items such as a laptop, a T-shirt, casual pants, and a mobile phone) are input into this function to calculate the initial matching scores. For example, for user U1001, S_initial=0.82 with the laptop, S_initial=0.75 with the T-shirt, S_initial=0.68 with the casual pants, and S_initial=0.79 with the mobile phone.
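
As a non-limiting illustration, the cosine similarity matching of step S6-1 can be sketched as follows. The function name `cosine_similarity`, the handling of zero vectors, and the three-dimensional toy vectors are assumptions made for this sketch; the actual representations of the embodiment are higher-dimensional and are not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption of this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not the representations of the embodiment.
user_vec = [1.0, 2.0, 2.0]
item_vec = [2.0, 1.0, 2.0]
s_initial = cosine_similarity(user_vec, item_vec)  # 8 / (3 * 3)
```

An inner product function could be substituted for `cosine_similarity` without changing the later sub-steps, since they only consume the resulting score.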

[0252] The specific implementation of step S6-2: Preset the correction coefficient α=0.5, and obtain the confidence level C (correlated with the reliability of counterfactual samples) for each candidate item. For example, the confidence level C for a laptop is 1.0, the confidence level C for a T-shirt is 0.830, the confidence level C for casual pants is 0.763, and the confidence level C for a mobile phone is 0.864. The corrected matching score is calculated using the formula S_final = S_initial × (1 + α × (1 - C)): Laptop S_final = 0.82 × (1 + 0.5 × (1 - 1.0)) = 0.82; T-shirt S_final = 0.75 × (1 + 0.5 × (1 - 0.830)) = 0.75 × 1.085 ≈ 0.814; Casual pants S_final = 0.68 × (1 + 0.5 × (1 - 0.763)) = 0.68 × 1.1185 ≈ 0.761; Mobile phone S_final = 0.79 × (1 + 0.5 × (1 - 0.864)) = 0.79 × 1.068 ≈ 0.844.
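
As a non-limiting illustration, the arithmetic of step S6-2 can be reproduced with the short sketch below. The function name `corrected_score` and the dictionary layout are assumptions of this sketch; the numerical values are those given in the paragraph above.

```python
def corrected_score(s_initial, confidence, alpha=0.5):
    """Apply S_final = S_initial * (1 + alpha * (1 - C))."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return s_initial * (1.0 + alpha * (1.0 - confidence))


# (S_initial, C) pairs taken from the embodiment above.
candidates = {
    "laptop": (0.82, 1.0),
    "T-shirt": (0.75, 0.830),
    "casual pants": (0.68, 0.763),
    "mobile phone": (0.79, 0.864),
}

# Round to three decimals, as in the worked example.
s_final = {
    name: round(corrected_score(s, c, alpha=0.5), 3)
    for name, (s, c) in candidates.items()
}
```

With these inputs, the rounded results agree with the values stated above (0.82, 0.814, 0.761, and 0.844). As the formula is written, the multiplicative factor equals 1 when C = 1 and increases as C decreases.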

[0253] The specific implementation of step S6-3: Sort the candidate items in descending order according to the corrected matching score. The sorting result is: mobile phone (0.844) > laptop (0.82) > T-shirt (0.814) > casual pants (0.761), and generate a sorted list of candidate items.

[0254] The specific implementation of step S6-4: The preset recommendation list length N=10, the top 10 candidate items are selected from the sorted list as the final recommended items (in addition to the above 4 items, other high matching score items such as headphones and dresses are also included); the final recommended items are displayed in the card-style image and text list format unique to the "You May Like" section, each card contains a product image, name, price and "Reason for Recommendation" tag (such as "Matches your electronic product preferences"), and are presented in a prominent position on the homepage of the e-commerce platform for easy browsing and clicking by users.
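
As a non-limiting illustration, the descending sort of step S6-3 and the top-N selection of step S6-4 can be sketched as follows. The function name `top_n` is an assumption of this sketch; the scores are the corrected matching scores of the four items named above, and the remaining candidates of the embodiment are omitted.

```python
def top_n(scores, n):
    """Sort candidates by corrected score (descending) and keep the first n."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Corrected matching scores from step S6-2 of the embodiment.
corrected = {
    "laptop": 0.82,
    "T-shirt": 0.814,
    "casual pants": 0.761,
    "mobile phone": 0.844,
}

# N = 10 as in the embodiment; only four candidates are listed here,
# so all four are returned in ranked order.
recommendation_list = top_n(corrected, 10)
```

When fewer than N candidates are available, the slice simply returns all of them, so no special handling is needed for short candidate lists.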

[0255] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.

Claims

1. A method for processing interactive e-commerce data based on artificial intelligence, characterized in that it includes the following steps: Step S1: Collect users' historical interaction data on an e-commerce platform, and construct a user-item interaction graph and a user social graph based on the historical interaction data; Step S2: Perform graph neural network information propagation on the user-item interaction graph and the user social graph, and dynamically adjust the attention weights according to the spectral radius of the node representations during propagation, to obtain the final user representation and the corresponding final spectral radius; Step S3: Compare the final spectral radius with a preset threshold; if it is lower than the preset threshold, calculate the user's interaction probability distribution over different product categories to obtain the user behavior entropy, and proceed to step S4; if it is not lower than the preset threshold, jump to step S6 to generate a recommendation list; Step S4: Determine the counterfactual sampling rate based on the user behavior entropy, and sample the user's historical interaction data using the counterfactual sampling rate to generate counterfactual samples and corresponding sampling weights; Step S5: Input the counterfactual samples and their sampling weights into the graph neural network, and block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes through a backdoor adjustment strategy to obtain the debiased user representation and item representation; Step S6: Calculate the matching score between users and items based on the debiased user representation and item representation, and sort the candidate items according to the matching score to generate a recommendation list.

2. The e-commerce interactive data processing method based on artificial intelligence according to claim 1, characterized in that: Step S1 further includes the following sub-steps: Step S1-1: Collect users' historical interaction data on the e-commerce platform, the historical interaction data including user rating data, comment text data, social relationship data, and spatiotemporal context data; Step S1-2: Count the number of characters in the comment text data as the text length, calculate the proportion of sentiment words in the comment text data to the total number of words in the text as the sentiment word density, and perform sentiment analysis on the comment text data to generate a sentiment tendency score; Step S1-3: Determine the confidence level of the sentiment tendency score based on the text length and the sentiment word density, and compare the confidence level with the preset confidence threshold for the sentiment tendency score; Step S1-4: When the confidence level is higher than the preset confidence threshold, perform weighted fusion of the sentiment tendency score and the user rating data to obtain the corrected user rating data; when the confidence level is lower than or equal to the preset confidence threshold, use the user rating data as the corrected user rating data; Step S1-5: Construct a user-item interaction graph with users and items as nodes and the corrected user rating data as edge weights; Step S1-6: Construct a user social graph with users as nodes and social relationship data as edges.

3. The method for processing interactive e-commerce data based on artificial intelligence according to claim 1, characterized in that: Step S2 further includes the following sub-steps: Step S2-1: In the information propagation of the current layer of the graph neural network, aggregate the user's item interaction information and social relationship information based on the attention mechanism to generate the user representation of the current layer; Step S2-2: Based on the user representation of the current layer, calculate the current spectral radius ρ_current and obtain the preset first spectral radius threshold ρ_adjust; Step S2-3: Compare the current spectral radius ρ_current with the first spectral radius threshold ρ_adjust; Step S2-4: When the current spectral radius ρ_current is lower than the first spectral radius threshold ρ_adjust, update the attention weights of the attention mechanism and regenerate the user representation based on the updated attention weights; otherwise, keep the current attention weights unchanged; Step S2-5: Continue propagation until the propagation termination condition is met, then output the final user representation and obtain the corresponding final spectral radius ρ_final.

4. The AI-based e-commerce interactive data processing method according to claim 3, characterized in that, in sub-step S2-4, the attention weights of the attention mechanism are updated according to the following steps: Calculate the spectral radius deviation based on the current spectral radius ρ_current and the preset spectral radius threshold ρ_threshold, the specific formula being: δ=(ρ_threshold-ρ_current) / ρ_threshold, where δ is the spectral radius deviation and ρ_current<ρ_threshold; Obtain a preset intimacy threshold θ_social, and select the weights greater than θ_social from the attention weights corresponding to social relationships as the social weights to be adjusted; Obtain the preset amplification coefficient γ_social, calculate the amplification factor 1+γ_social×δ based on δ and γ_social, and multiply the social weights to be adjusted by this amplification factor to update the attention weights corresponding to social relationships; Obtain the preset confidence threshold θ_confidence for sentiment tendency scores, and select, from the attention weights corresponding to item interactions, the weights whose sentiment tendency score confidence is less than θ_confidence as the item weights to be adjusted; Obtain the preset reduction coefficient γ_item, calculate the reduction factor 1-γ_item×δ based on δ and γ_item, and multiply the item weights to be adjusted by this reduction factor to update the attention weights corresponding to item interactions.
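
As a non-limiting illustration, the weight update recited in claim 4 can be sketched as follows. The function name `update_attention_weights`, the list-based data layout, and the numerical values are assumptions of this sketch. The claim does not state whether the weights are renormalized after the update, so no renormalization is performed here.

```python
def update_attention_weights(social_w, item_w, item_conf,
                             rho_current, rho_threshold,
                             theta_social, theta_confidence,
                             gamma_social, gamma_item):
    """Amplify strong social weights and damp low-confidence item weights."""
    if rho_current >= rho_threshold:
        # The update only applies when rho_current < rho_threshold.
        return list(social_w), list(item_w)
    delta = (rho_threshold - rho_current) / rho_threshold
    amplify = 1.0 + gamma_social * delta
    reduce_ = 1.0 - gamma_item * delta
    new_social = [w * amplify if w > theta_social else w for w in social_w]
    new_item = [w * reduce_ if c < theta_confidence else w
                for w, c in zip(item_w, item_conf)]
    return new_social, new_item


# Hypothetical values: delta = (1.0 - 0.75) / 1.0 = 0.25.
new_social, new_item = update_attention_weights(
    social_w=[0.5, 0.2], item_w=[0.4, 0.6], item_conf=[0.3, 0.9],
    rho_current=0.75, rho_threshold=1.0,
    theta_social=0.3, theta_confidence=0.5,
    gamma_social=0.4, gamma_item=0.4,
)
```

In this example only the first social weight exceeds θ_social and only the first item weight has confidence below θ_confidence, so those two entries are scaled by 1.1 and 0.9 respectively.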

5. The AI-based e-commerce interactive data processing method according to claim 1, characterized in that: In step S3, calculating the user's interaction probability distribution across different product categories to obtain the user behavior entropy further includes the following sub-steps: Step S3-1: Obtain the user's historical behavior sequence from the historical interaction data, and map each product in the historical behavior sequence according to a preset multi-level category system to obtain the coarse-grained category and the fine-grained category to which each product belongs; Step S3-2: Count the user's first number of interactions on each coarse-grained category and second number of interactions on each fine-grained category, respectively; Step S3-3: Calculate the user's first interaction probability on each coarse-grained category based on the first number of interactions, and calculate the user's second interaction probability on each fine-grained category based on the second number of interactions; Step S3-4: Calculate the coarse-grained behavior entropy from the first interaction probabilities using the information entropy formula, and calculate the fine-grained behavior entropy from the second interaction probabilities using the information entropy formula; Step S3-5: Perform weighted fusion of the coarse-grained behavior entropy and the fine-grained behavior entropy to obtain the multi-granularity behavior entropy, which is used as the user behavior entropy.
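
As a non-limiting illustration, the multi-granularity behavior entropy of claim 5 can be sketched as follows. The function names, the base-2 logarithm, the fusion weight `w_coarse`, and the example history are assumptions of this sketch; the claim specifies neither the logarithm base nor the fusion weights.

```python
import math
from collections import Counter


def category_entropy(labels):
    """Shannon entropy (in bits) of the empirical category distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def behavior_entropy(history, w_coarse=0.5):
    """history: list of (coarse_category, fine_category) pairs."""
    h_coarse = category_entropy([coarse for coarse, _ in history])
    h_fine = category_entropy([fine for _, fine in history])
    # Weighted fusion; w_coarse is a hypothetical preset weight.
    return w_coarse * h_coarse + (1.0 - w_coarse) * h_fine


# Hypothetical behavior sequence mapped to a two-level category system.
history = [
    ("electronics", "laptop"),
    ("electronics", "mobile phone"),
    ("clothing", "T-shirt"),
    ("clothing", "T-shirt"),
]
h_user = behavior_entropy(history)
```

For this history the coarse-grained distribution is (0.5, 0.5), giving 1 bit, and the fine-grained distribution is (0.25, 0.25, 0.5), giving 1.5 bits, so the fused value with equal weights is 1.25.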

6. The e-commerce interactive data processing method based on artificial intelligence according to claim 1, characterized in that: Step S4 further includes the following sub-steps: Step S4-1: Determine the basic demand degree for counterfactual sampling based on the user behavior entropy, the specific formula being: D_behavior=H / H_max, where H is the user behavior entropy, H_max is the preset maximum behavior entropy, and D_behavior is the basic demand degree; Step S4-2: Determine the compensation demand degree for counterfactual sampling based on the current spectral radius, the specific formula being: D_model = (ρ_threshold - ρ_current) / ρ_threshold, where D_model is the compensation demand degree, ρ_current is the current spectral radius, and ρ_threshold is the preset spectral radius threshold; when ρ_current ≥ ρ_threshold, D_model takes the value 0; Step S4-3: Nonlinearly fuse the basic demand degree and the compensation demand degree to obtain the comprehensive sampling demand degree, the specific formula being: D=1-(1-D_behavior)×(1-D_model), where D is the comprehensive sampling demand degree; Step S4-4: Calculate the dynamic sampling rate based on the preset minimum sampling rate, the preset maximum sampling rate, and the comprehensive sampling demand degree, the specific formula being: ρ_dynamic = ρ_min + (ρ_max - ρ_min) × D, where ρ_dynamic is the dynamic sampling rate, ρ_max is the preset maximum sampling rate, and ρ_min is the preset minimum sampling rate; Step S4-5: Perform adaptive importance sampling on the user's historical behavior sequence based on the dynamic sampling rate to generate a lightweight counterfactual sequence as the counterfactual samples, and record the sampling weight corresponding to each counterfactual sample.
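
As a non-limiting illustration, the calculation of steps S4-1 to S4-4 can be sketched as follows. The function name `dynamic_sampling_rate`, the parameter names `rate_min` and `rate_max` (standing for ρ_min and ρ_max), and all numerical values are assumptions of this sketch. The sketch also assumes H ≤ H_max, so that D_behavior lies in [0, 1].

```python
def dynamic_sampling_rate(h, h_max, rho_current, rho_threshold,
                          rate_min, rate_max):
    """Map behavior entropy and spectral radius to a sampling rate."""
    d_behavior = h / h_max                                    # step S4-1
    if rho_current >= rho_threshold:                          # step S4-2
        d_model = 0.0
    else:
        d_model = (rho_threshold - rho_current) / rho_threshold
    d = 1.0 - (1.0 - d_behavior) * (1.0 - d_model)            # step S4-3
    return rate_min + (rate_max - rate_min) * d               # step S4-4


# Hypothetical values: D_behavior = 0.5, D_model = 0.25, D = 0.625.
rate = dynamic_sampling_rate(
    h=1.25, h_max=2.5, rho_current=0.75, rho_threshold=1.0,
    rate_min=0.1, rate_max=0.5,
)
```

Because D = 1-(1-D_behavior)×(1-D_model) is never smaller than either of its two inputs when both lie in [0, 1], either a high behavior entropy or a large spectral radius shortfall is enough on its own to raise the sampling rate.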

7. The AI-based e-commerce interactive data processing method according to claim 6, characterized in that, in sub-step S4-5, the step of generating a lightweight counterfactual sequence as the counterfactual samples includes: Obtain multi-dimensional features for each interaction record in the user's historical behavior sequence, the multi-dimensional features including the confidence level of the sentiment tendency score, the social relationship intimacy, and the interaction time decay factor; Calculate the sampling weight of each interaction record based on the multi-dimensional features, the specific formula being: W=w_base×(1+α×S_intimacy / S_max)×(1-β×(1-C_conf / C_max))×T_decay, where W is the sampling weight, w_base is the basic sampling weight, α is the intimacy amplification coefficient, S_intimacy is the social relationship intimacy, S_max is the preset maximum intimacy, β is the confidence suppression coefficient, C_conf is the confidence level of the sentiment tendency score, C_max is the preset maximum confidence level, and T_decay is the interaction time decay factor; Perform weighted random sampling on the user's historical behavior sequence using the dynamic sampling rate and the sampling weights; For the sampled interaction records, generate counterfactual samples with causal labels based on their corresponding product categories and interaction types.
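
As a non-limiting illustration, the sampling weight formula and the weighted random sampling of claim 7 can be sketched as follows. The function names, the choice of sampling without replacement using exponent-based random keys (Efraimidis-Spirakis style), the rule that the number of draws is the sampling rate times the sequence length, and all numerical values are assumptions of this sketch; the claim does not fix a particular sampling algorithm.

```python
import random


def sampling_weight(w_base, s_intimacy, s_max, c_conf, c_max,
                    t_decay, alpha, beta):
    """W = w_base*(1 + a*S/S_max)*(1 - b*(1 - C/C_max))*T_decay."""
    return (w_base
            * (1.0 + alpha * s_intimacy / s_max)
            * (1.0 - beta * (1.0 - c_conf / c_max))
            * t_decay)


def sample_records(records, weights, rate, seed=None):
    """Weighted sampling without replacement of about rate*len(records) items."""
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights must be positive")
    rng = random.Random(seed)
    k = min(len(records), max(1, round(rate * len(records))))
    # Each record gets the key u**(1/w); the k largest keys are selected.
    keyed = [(rng.random() ** (1.0 / w), r) for r, w in zip(records, weights)]
    keyed.sort(key=lambda kr: kr[0], reverse=True)
    return [r for _, r in keyed[:k]]


# Hypothetical feature values for one interaction record.
w = sampling_weight(w_base=1.0, s_intimacy=0.8, s_max=1.0,
                    c_conf=0.6, c_max=1.0, t_decay=0.9,
                    alpha=0.5, beta=0.5)   # 1.4 * 0.8 * 0.9
picked = sample_records(list(range(10)), [1.0] * 10, rate=0.3, seed=0)
```

Sampling without replacement is used here so that no interaction record is duplicated in the counterfactual sequence; sampling with replacement would be an equally plausible reading of the claim.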

8. The artificial intelligence-based e-commerce interactive data processing method according to claim 1, characterized in that: Step S5 further includes the following sub-steps: Step S5-1: Construct a structural causal model and identify confounding factors that affect user interaction results. These confounding factors include item popularity and spatiotemporal context. Step S5-2: Input the counterfactual samples and their sampling weights into the graph neural network to generate user representations and item representations in the counterfactual world; Step S5-3: Normalize the sampling weights of the counterfactual samples and use the normalized sampling weights as the confidence level of the counterfactual world representation; Step S5-4: Using a backdoor adjustment strategy, intervene in the user representation and item representation in the counterfactual world to block the causal path from social relationship nodes and spatiotemporal context nodes to interaction result nodes; Step S5-5: Based on the confidence level of the counterfactual world representation, the intervened counterfactual world representation is weighted and fused with the original propagation results to obtain the debiased user representation and item representation.

9. The artificial intelligence-based e-commerce interactive data processing method according to claim 8, characterized in that, in sub-step S5-5, the step of performing weighted fusion of the intervened counterfactual world representation with the original propagation result includes: Calculate the population confidence of the counterfactual samples based on the sampling weights of the counterfactual samples, the specific formula being: C_population=average(W) / max(W), where C_population is the population confidence, W is the sampling weight of the counterfactual samples, average() is the average function, and max() is the maximum function; Use the confidence level of the counterfactual world representation as the individual confidence, and fuse it with the population confidence to obtain the comprehensive confidence, the specific formula being: C_comprehensive=γ×C+(1-γ)×C_population, where C_comprehensive is the comprehensive confidence, γ is the preset fusion coefficient, and C is the individual confidence; Calculate the fusion weight based on the comprehensive confidence, the specific formula being: w = C_comprehensive, where w is the fusion weight; For each user or item, obtain its counterfactual world representation and its original propagation result, and perform weighted fusion according to the following formula: H_final=w×H_cf+(1-w)×H_ori, where H_final is the debiased user representation or item representation, H_cf is the counterfactual world representation, and H_ori is the original propagation result.
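
As a non-limiting illustration, the confidence fusion of claim 9 can be sketched as follows. The function name `fuse_representations`, the list-based vectors, and the numerical values are assumptions of this sketch.

```python
def fuse_representations(h_cf, h_ori, c_individual, sampling_weights,
                         gamma=0.5):
    """Blend counterfactual and original representations by confidence."""
    # C_population = average(W) / max(W)
    c_population = (sum(sampling_weights) / len(sampling_weights)
                    / max(sampling_weights))
    # C_comprehensive = gamma * C + (1 - gamma) * C_population
    c_comprehensive = gamma * c_individual + (1.0 - gamma) * c_population
    w = c_comprehensive  # fusion weight, as recited in the claim
    # H_final = w * H_cf + (1 - w) * H_ori, element by element
    return [w * a + (1.0 - w) * b for a, b in zip(h_cf, h_ori)]


# Hypothetical values: C_population = 0.75, C_comprehensive = 0.5.
h_final = fuse_representations(
    h_cf=[1.0, 0.0], h_ori=[0.0, 1.0],
    c_individual=0.25, sampling_weights=[1.0, 0.5, 0.5, 1.0],
    gamma=0.5,
)
```

With a fusion weight of 0.5, the result is the element-wise midpoint of the two representations; a higher comprehensive confidence moves H_final toward the counterfactual world representation.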

10. The e-commerce interactive data processing method based on artificial intelligence according to claim 1, characterized in that: Step S6 further includes the following sub-steps: Step S6-1: Input the bias-reduced user representation and item representation into a preset matching function to calculate the initial matching score of the user for each candidate item; Step S6-2: Correct the initial matching score according to the confidence level C, the correction formula being: S_final=S_initial×(1+α×(1-C)), where S_initial is the initial matching score, S_final is the corrected matching score, and α is a preset correction coefficient with a value range of [0,1]; Step S6-3: Sort the candidate items in descending order according to the corrected matching scores to generate a sorted list of candidate items; Step S6-4: Obtain the preset recommendation list length N, select the top N candidate items from the sorted list as the final recommended items, and display the final recommended items in the form of a recommendation list.