Short video recommendation algorithm based on causal inference and graph convolution network
By combining causal inference with graph convolutional networks, video duration bias is eliminated, achieving fairness and accuracy in short video recommendation systems, and improving user experience and content diversity.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- LIAONING UNIVERSITY
- Filing Date
- 2026-04-02
- Publication Date
- 2026-06-19
AI Technical Summary
In scenarios where short and long videos are recommended together, existing recommendation systems are prone to duration bias, leading to homogenized content production by creators and a decline in user experience, making it difficult to build a fair and stable recommendation mechanism.
A recommendation algorithm based on causal inference and graph convolutional networks is adopted. By eliminating the influence of video duration confounding factors through causal debiasing and dual-channel feature learning, the user's true interest and surface relevance signals are separated, and a differentiated aggregation mechanism is designed for personalized recommendations.
It improves the fairness and accuracy of the recommendation system, promotes content diversity, optimizes user experience, and increases platform activity.
[Figure: abstract drawing, CN122240876A_ABST]
Abstract
Description
Technical Field
[0001] This invention belongs to the field of short video recommendation and specifically concerns a short video recommendation algorithm based on causal inference and graph convolutional networks.
Background Technology
[0002] In recent years, with the improvement of mobile internet infrastructure and the popularization of smart terminals, short video platforms such as Douyin, Kuaishou, and Bilibili have rapidly emerged, profoundly changing the way people obtain information, enjoy entertainment, and interact socially. Fragmented time is fully utilized, the threshold for content production has been significantly lowered, and algorithm-driven distribution mechanisms have replaced traditional subscription and search models. Users continuously receive personalized recommendations while enjoying an "endless scrolling" browsing experience. This algorithm-centric content distribution model enables platforms to quickly match user interests among massive video resources, achieving high-frequency, high-stickiness content consumption, and driving exponential growth in the user base and activity of video platforms.
[0003] Against this backdrop, recommendation systems have become a key technological infrastructure for video platforms. Unlike traditional e-commerce or news recommendation systems, which primarily rely on explicit feedback signals such as clicks, purchases, and favorites, video platforms place greater emphasis on implicit user behavioral feedback, especially metrics like viewing time, completion rate, and number of repeat views. These behavioral signals can more precisely characterize the user's true interest in content, thus providing data support for personalized ranking and precise distribution. However, problems arise when platforms simultaneously offer short and long video content and rank them within the same recommendation stream.
[0004] First, the core metric of viewing time is inherently highly correlated with video length. Longer videos have an advantage in absolute viewing time, while shorter videos tend to perform better in terms of completion rate. When recommendation models directly use viewing time or completion rate as optimization targets, the model may tend to recommend videos of a certain length, thus creating a "duration bias." For example, if total viewing time is the primary optimization target, the system may prefer to push longer videos because even if users only watch a portion of them, they still contribute a higher viewing time; while if completion rate is the primary metric, shorter videos are more likely to receive higher ratings and higher ranking weights because they are more easily watched in their entirety. This bias does not necessarily reflect users' true preferences for content quality, but may simply be driven by the structural factor of video length.
[0005] Secondly, duration bias can further impact the healthy development of the content ecosystem. If the recommendation system consistently favors videos of a certain length, it will lead to homogenization among creators in content production, who will actively cater to algorithmic preferences by compressing or lengthening videos to gain more traffic. This will not only weaken content diversity but may also harm user experience, causing users to be unknowingly "surrounded" by a single type of content. From the perspective of the platform's long-term development, how to ensure user engagement while avoiding the unfair influence of duration factors on recommendation results has become a critical issue that urgently needs to be addressed.
[0006] Therefore, in the context of mixed short and long video recommendation, how to reasonably model behavioral signals such as viewing time, eliminate or mitigate duration bias, and construct a fairer, more stable, and more interpretable recommendation mechanism has become an important research direction in recommender systems.
[0007] This direction concerns not only the accurate portrayal of user interests but also the sustainable development of the platform's content ecosystem and the improvement of algorithmic fairness.
Summary of the Invention
[0008] This invention proposes a short video recommendation algorithm based on causal inference and graph convolutional networks. By eliminating confounding factors, it separates users' true interests from superficial correlation signals. The method uses causal intervention or reweighting to weaken the spurious "duration → viewing behavior" path, thereby mitigating duration bias and improving the fairness and accuracy of the recommendation system.
[0009] To achieve the above objectives, the present invention adopts the following technical solution. A short video recommendation algorithm based on causal inference and graph convolutional networks includes the following steps:
Step 1) For the short video-user interaction samples input into the recommendation system, first model the causal relationships among the item variables, then use causal inference to filter out the confounding factor (video duration) in the samples and restore the user's true preferences.
[0010] The causal relationship modeling of the items is as follows. Let $X$ denote the video features other than duration, $U$ the user features, and $Y$ the user preference outcome. The causal effect of $X$ on $Y$ is the change in $Y$ when $X$ is forcibly changed from a reference value to a target value. The key to estimating the causal effect therefore lies in obtaining the outcome after intervention, $P(Y \mid U, do(X))$. In practice, the gold standard for obtaining causal intervention outcomes is a randomized controlled trial. However, such experiments are extremely costly in recommender systems and are impractical for problems involving confounding features, because items are typically created by third parties. Intervention outcomes must therefore be estimated from existing data.
[0011] Step 2) Use the $do$ operator and the backdoor criterion (backdoor adjustment) to calculate and eliminate the interference of video duration as a confounding factor, obtaining the debiased user-item rating matrix. The specific method is as follows: the $do$ operator is introduced to represent forced intervention on a variable, distinguishing the observational probability $P(Y \mid U, X)$ from the causal probability $P(Y \mid U, do(X))$. The backdoor criterion cuts off the interference path of the confounding factor $Z$ (video duration), and the backdoor adjustment formula is derived to achieve causal debiasing. The calculation is:
$$P(Y \mid U, do(X)) = \sum_{z} P(z)\, P(Y \mid U, X, z)$$
[0012] where $P(z)$ is the natural distribution probability that the confounding factor $Z$ takes the value $z$, and $P(Y \mid U, X, z)$ is the conditional probability of user interaction given the user $U$, the video features $X$, and the confounding factor value $z$.
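The backdoor adjustment described above can be illustrated with a minimal sketch. It assumes that video duration has been discretized into buckets (written z here), that the natural distribution probability is the empirical bucket frequency, and that some fitted model supplies the conditional interaction probability. The names `duration_prior`, `backdoor_score`, and `toy_cond_prob`, the bucket labels, and the `topic_match` feature are hypothetical and are not taken from the patent.

```python
from collections import Counter


def duration_prior(duration_buckets):
    """Estimate P(z) as the empirical frequency of each duration bucket."""
    counts = Counter(duration_buckets)
    total = sum(counts.values())
    return {z: c / total for z, c in counts.items()}


def backdoor_score(user, item_features, prior, cond_prob):
    """Backdoor adjustment: sum over z of P(z) * P(Y=1 | U, X, z)."""
    return sum(p_z * cond_prob(user, item_features, z) for z, p_z in prior.items())


def toy_cond_prob(user, item_features, z):
    """Hypothetical stand-in for the fitted conditional model P(Y=1 | U, X, z)."""
    base = 0.2 + 0.5 * item_features["topic_match"]
    bonus = {"short": 0.0, "medium": 0.1, "long": 0.2}[z]
    return min(1.0, base + bonus)


# Duration buckets observed in a (made-up) interaction log.
logged = ["short", "short", "medium", "long"]
prior = duration_prior(logged)
score = backdoor_score("u1", {"topic_match": 0.6}, prior, toy_cond_prob)
```

Because every candidate video is scored against the same duration prior rather than its own duration, the duration-dependent part of the score no longer favors one video over another, which is the intent of the adjustment.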
[0013] Step 3) Divide the debiased user-item rating matrix into positive and negative samples, then use a graph convolutional network (PCNN) with a differentiated aggregation mechanism to model and aggregate the features of positive and negative samples separately. The specific methods for positive and negative sample partitioning and differentiated graph convolution aggregation are as follows:
Step 3.1: Thresholding to divide positive and negative samples. A thresholding mechanism is introduced. Based on the debiased user-item rating matrix, the short video items around the user are divided into positive samples (videos the user prefers) and negative samples (videos the user dislikes):
$$\text{label}(u, i) = \begin{cases} \text{positive}, & \hat{r}_{ui} \ge \tau \\ \text{negative}, & \hat{r}_{ui} < \tau \end{cases}$$
[0014] where $\hat{r}_{ui}$ is the debiased rating of user $u$ for video $i$, and $\tau$ is a preset rating threshold.
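The threshold-based division into positive and negative samples can be sketched as follows. This is an illustration only: the function name `split_by_threshold`, the dictionary layout of the debiased ratings, and the choice to treat a rating equal to the threshold as positive are assumptions, not details stated in the patent.

```python
def split_by_threshold(debiased_scores, tau):
    """Split (user, video) pairs into positive and negative sets.

    debiased_scores maps (user, video) -> debiased rating.
    A rating >= tau counts as positive (assumed tie-breaking rule).
    """
    positives, negatives = {}, {}
    for (user, video), rating in debiased_scores.items():
        target = positives if rating >= tau else negatives
        target.setdefault(user, []).append(video)
    return positives, negatives


# Toy debiased ratings for two users.
scores = {("u1", "v1"): 0.9, ("u1", "v2"): 0.2, ("u2", "v1"): 0.5}
pos, neg = split_by_threshold(scores, tau=0.5)
```

The two resulting dictionaries correspond to the separate positive and negative sample sets that the later aggregation channels consume.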
[0015] Step 3.2: Differentiated neighborhood aggregation settings. When aggregating positive samples, merge the user's own positive samples with positive-sample information from users with similar preferences, set the number of aggregation layers as a hyperparameter, and dynamically select neighbors for aggregation through an attention mechanism. When aggregating negative samples, aggregate only the user's own negative samples and use a shallower aggregation depth to avoid excessive interference from negative signals.
Step 3.3: Data augmentation and neighborhood balancing. For each target video, a fixed number of nodes is sampled from its positive and negative sample neighbors. Padding and truncation keep the neighborhood size consistent, eliminating the bias caused by an insufficient number of neighbors.
Step 3.4: Dual-channel feature learning. Independent graph convolutional channels are constructed for positive and negative samples. The channels have the same network architecture but do not share parameters, and they learn the feature patterns and attention distributions of the positive and negative neighborhoods separately. The calculation is:
$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\Big)$$
[0016] where $\alpha_{ij}$ is the attention weight, $W$ is the feature transformation matrix, $\mathcal{N}(i)$ is the neighborhood of node $i$, $h_j$ is the feature of neighbor $j$, $\sigma$ is the activation function, and $h_i'$ is the aggregated feature of node $i$.
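A single attention-weighted aggregation step of the kind described above (attention weights, a feature transformation matrix, a neighborhood, and an activation function) might look like the following sketch. The patent text does not say how the attention weights are computed or which activation is used; the dot-product scoring, the softmax normalization, and the ReLU activation here are assumptions chosen for brevity, and all function names are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(h_center, h_neighbors, W):
    """Return (aggregated feature, attention weights) for one node.

    Attention score = dot product of transformed center and neighbor
    (assumed); activation = ReLU (assumed).
    """
    z_center = matvec(W, h_center)
    z_neighbors = [matvec(W, h) for h in h_neighbors]
    scores = [sum(a * b for a, b in zip(z_center, z)) for z in z_neighbors]
    alpha = softmax(scores)
    agg = [
        sum(a * z[d] for a, z in zip(alpha, z_neighbors))
        for d in range(len(z_center))
    ]
    return [max(0.0, x) for x in agg], alpha


identity = [[1.0, 0.0], [0.0, 1.0]]
features, alpha = attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity)
```

In a dual-channel setup, this function would be called once with the positive neighborhood and its own matrix and once with the negative neighborhood and a separate matrix, so that the two channels share the architecture but not the parameters.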
[0017] Step 4) The positive and negative sample features output by the dual channels in Step 3) are aggregated and fused, and the predicted interaction probability between the user and the short video is obtained through the aggregation layer and the output layer, completing the personalized recommendation of short videos.
Step 4.1: Feature aggregation. A differentiated aggregation strategy (summation, concatenation, etc.) is applied to the output features of the positive and negative sample channels, among which summation-based aggregation is the optimal strategy:
$$h = \mathrm{AGG}(h^{+}, h^{-})$$
[0018] where $h^{+}$ is the positive-sample aggregated feature, $h^{-}$ is the negative-sample aggregated feature, and $h$ is the final representation.
[0019] Step 4.2: Interaction probability prediction. The final aggregated feature $h$ is input to the output layer and mapped by the sigmoid activation function to an interaction probability between 0 and 1:
$$\hat{y}_{ui} = \sigma(w^{\top} h + b)$$
[0020] where $w$ is the output-layer weight vector, $b$ is the bias term, $\sigma$ is the sigmoid activation function, and $\hat{y}_{ui}$ is the predicted interaction probability of user $u$ for video $i$.
Step 4.3: Personalized recommendation. The short videos are sorted in descending order of predicted interaction probability, and those with the highest probabilities are pushed to the user.
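The fusion, prediction, and ranking steps can be sketched together. The example assumes plain element-wise summation as the fusion rule and a sigmoid output, which is one reading of the text rather than a confirmed detail; the names `predict` and `recommend`, the toy feature vectors, and the weights are hypothetical.

```python
import math


def sigmoid(x):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(h_pos, h_neg, w, b):
    """Fuse both channels by summation (assumed) and score with a sigmoid."""
    h = [p + n for p, n in zip(h_pos, h_neg)]
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def recommend(candidates, w, b, k):
    """Rank candidate videos by predicted probability, highest first."""
    scored = [(vid, predict(hp, hn, w, b)) for vid, (hp, hn) in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# video id -> (positive-channel feature, negative-channel feature), toy values
candidates = {
    "v1": ([1.0, 0.2], [0.0, -0.2]),
    "v2": ([0.0, 0.5], [0.0, 0.5]),
    "v3": ([0.5, 0.25], [0.0, 0.25]),
}
top = recommend(candidates, w=[2.0, -1.0], b=0.0, k=2)
```

Only the ordering matters for the final push list, so any monotonic output function would give the same ranking; the sigmoid is useful because it also yields a probability that can be trained against interaction labels.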
[0021] The beneficial effects of this invention are as follows. It deeply integrates causal inference with graph convolutional networks. First, the $do$ operator and the backdoor criterion eliminate the confounding interference of video duration and restore the user's true preference data. Then, dual-channel graph convolutional networks model and aggregate positive and negative samples in a differentiated manner to fully capture users' positive interests and negative feedback. While addressing duration bias in short video recommendation, this approach strengthens the system's ability to characterize user preferences, improving recommendation accuracy and robustness. It also promotes platform content diversity, optimizes user experience, and increases user activity on the platform.
Brief Description of the Drawings
[0022] Figure 1 is a flowchart of the causal debiasing module; Figure 2 is a simplified diagram of the expert network module design; Figure 3 is a flowchart of the preference modeling module; Figure 4 is a simplified diagram of the positive sample aggregation module; Figure 5 is a simplified diagram of the final aggregation module; Figure 6 is an RI comparison chart.
Detailed Implementation
[0023] The present invention will be further described below with reference to the accompanying drawings and examples.
[0024] This invention provides a short video recommendation method based on causal inference and graph convolutional networks. It addresses the duration bias caused by confounding factors such as video length in short video recommendation systems and achieves more accurate user preference modeling and personalized recommendation. The method uses a real recommendation log dataset from Kuaishou for model training and validation. As shown in Figure 1 and Figure 3, it comprises four stages: causal debiasing, positive and negative sample separation, differentiated graph aggregation modeling, and final recommendation prediction.
1) For the short video-user interaction samples input into the recommendation system, the causal relationships among the item variables are first modeled, and causal inference is then used to filter out the confounding factor (video duration) in the samples and restore the user's true preferences. The causal relationship modeling is as follows: define $X$ as the video features other than duration, $U$ as the user features, and $Y$ as the user preference outcome. The causal effect of $X$ on $Y$ is the change in $Y$ when $X$ is forcibly changed from a reference value to a target value. The key to estimating the causal effect therefore lies in obtaining the outcome after intervention, $P(Y \mid U, do(X))$. In practice, the gold standard for obtaining causal intervention outcomes is a randomized controlled trial. However, such experiments are extremely costly in recommender systems and are impractical for problems involving confounding features, because short video items are typically created by third-party creators. This embodiment therefore estimates intervention outcomes from the training set of the dataset, which is sparse and affected by the confounding feature, completing the basic modeling for causal debiasing.
[0025] As shown in Figure 1, causal relationship modeling is first performed on the short video-user interaction samples. Video duration is explicitly identified as a confounding factor, and a causal analysis framework is constructed from user features, video content features, and interaction outcomes, providing the theoretical basis for the subsequent debiasing calculation.
[0026] 2) The $do$ operator and the backdoor criterion (backdoor adjustment) are used to calculate and eliminate the interference of video duration as a confounding factor, yielding a debiased user-item rating matrix. The specific method is as follows: the $do$ operator is introduced to represent forced intervention on a variable, distinguishing the observational probability $P(Y \mid U, X)$ from the causal probability $P(Y \mid U, do(X))$. The backdoor criterion cuts off the interference path of the confounding factor $Z$ (video duration), and the backdoor adjustment formula is derived to achieve causal debiasing.
[0027] The calculation is:
$$P(Y \mid U, do(X)) = \sum_{z} P(z)\, P(Y \mid U, X, z)$$
[0028] As shown in Figure 2, this embodiment further uses an expert network structure to carry out the causal debiasing calculation. Each expert models a different feature subspace, and the gating unit adaptively assigns expert weights according to the feature distribution of the input sample, improving the accuracy of both the conditional probability estimation and the debiasing calculation.
[0029] Here $P(z)$, the natural distribution probability that the confounding factor $Z$ takes the value $z$, is obtained statistically: it is estimated from the frequency of each video duration value in the dataset. $P(Y \mid U, X, z)$, the conditional probability of user interaction given the user $U$, the video features $X$, and the confounding factor value $z$, is computed with a model designed for the sparsity of the dataset. A neural network serves as the backbone and is trained by maximum likelihood estimation to fit the conditional probability. An additional network is introduced into the expert module so that experts can adaptively focus on relevant duration features, and residual connections keep gradients flowing smoothly. Finally, the formula is evaluated through an architecture of shared base module + gating unit + multi-expert module, and the debiased user-item rating matrix is output.
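The gating idea, in which a gating unit assigns weights to several experts and the weighted expert outputs form the conditional probability estimate, can be sketched as a small mixture of experts. This is a generic illustration under stated assumptions: linear experts with sigmoid outputs, a linear gate with softmax normalization, and the names `moe_interaction_prob`, `gate_weights`, and `expert_weights` are all hypothetical; the patent's actual backbone, expert internals, and residual connections are not reproduced here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_interaction_prob(features, gate_weights, expert_weights):
    """Return (probability, gate weights) for one input sample.

    Each expert is a linear scorer followed by a sigmoid (assumed);
    the gate is a linear layer followed by a softmax (assumed).
    """
    gates = softmax([dot(g, features) for g in gate_weights])
    expert_probs = [sigmoid(dot(w, features)) for w in expert_weights]
    return sum(g * p for g, p in zip(gates, expert_probs)), gates


x = [1.0, 0.5]  # toy input features (user, video, and duration parts concatenated)
prob, gates = moe_interaction_prob(
    x,
    gate_weights=[[1.0, 0.0], [0.0, 1.0]],
    expert_weights=[[2.0, 0.0], [-2.0, 0.0]],
)
uniform_prob, _ = moe_interaction_prob(x, [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Because the gate weights sum to one and every expert outputs a value in (0, 1), the mixture is itself a valid probability, so it can be substituted directly for the conditional term in the backdoor sum.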
[0030] 3) As shown in Figure 3, after causal debiasing the user-item rating matrix is divided into positive and negative samples and a preference modeling module is constructed. Based on the debiased ratings, this module divides the user's historical interactions into positive and negative sample sets and feeds them into a differentiated graph neural network for feature propagation and aggregation, strengthening the characterization of the user's true preference structure.
[0031] After causal debiasing, the user-item rating matrix is divided into positive and negative samples, and a graph convolutional network with a differentiated aggregation mechanism models and aggregates the features of the positive and negative samples separately. The specific methods for positive and negative sample partitioning and differentiated graph convolutional aggregation are as follows:
Step 1: Thresholding to separate positive and negative samples. A thresholding mechanism is introduced. Based on the debiased user-item rating matrix, the short video items around the user are divided into positive samples (videos the user prefers) and negative samples (videos the user dislikes), i.e.:
$$\text{label}(u, i) = \begin{cases} \text{positive}, & \hat{r}_{ui} \ge \tau \\ \text{negative}, & \hat{r}_{ui} < \tau \end{cases}$$
[0032] where $\hat{r}_{ui}$ is the debiased rating of user $u$ for video $i$, and $\tau$ is a preset rating threshold.
[0033] Step 2: Differentiated neighborhood aggregation settings. When aggregating positive samples, merge the user's own positive samples with positive-sample information from users with similar preferences, set the number of aggregation layers as a hyperparameter, and dynamically select neighbors for aggregation through an attention mechanism. When aggregating negative samples, aggregate only the user's own negative samples and use a shallower aggregation depth to avoid excessive interference from negative signals. As shown in Figure 4, the positive sample aggregation module takes the target node as the center and performs multi-layer propagation and feature aggregation over its positive sample neighborhood. The module integrates the user's own positive feedback and the local neighborhood information of users with similar preferences, and dynamically assigns importance weights to different neighbors through the attention mechanism, extracting a more representative positive interest representation.
[0034] Step 3: Data augmentation and neighborhood balancing. For each target video, a fixed number of nodes is sampled from its positive and negative sample neighbors. Padding and truncation keep the neighborhood size consistent and eliminate the bias caused by an insufficient number of neighbors in the dataset.
Step 4: Dual-channel feature learning. Independent graph convolutional channels are constructed for positive and negative samples. The channels have the same network architecture but do not share parameters, and they learn the feature patterns and attention distributions of the positive and negative neighborhoods separately. The calculation is:
$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\Big)$$
[0035] where $\alpha_{ij}$ is the attention weight, $W$ is the feature transformation matrix, $\mathcal{N}(i)$ is the neighborhood of node $i$, $h_j$ is the feature of neighbor $j$, $\sigma$ is the activation function, and $h_i'$ is the aggregated feature of node $i$.
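The neighborhood balancing step, which samples a fixed number of neighbors and uses padding and truncation to keep neighborhood sizes consistent, could be implemented along these lines. The padding rule is not specified in the text; resampling existing neighbors with replacement, and filling with a placeholder when a node has no neighbors at all, are assumptions made for this sketch, as is the function name `balance_neighbors`.

```python
import random


def balance_neighbors(neighbors, size, rng, pad_id=None):
    """Return exactly `size` neighbor ids.

    Too many neighbors: sample `size` of them without replacement (truncation).
    Too few: keep all and pad by resampling with replacement (assumed rule).
    None at all: fill with `pad_id` (assumed placeholder).
    """
    if len(neighbors) >= size:
        return rng.sample(neighbors, size)
    if not neighbors:
        return [pad_id] * size
    extra = [rng.choice(neighbors) for _ in range(size - len(neighbors))]
    return list(neighbors) + extra


rng = random.Random(0)  # fixed seed so the sketch is reproducible
truncated = balance_neighbors(["a", "b", "c", "d", "e", "f"], 4, rng)
padded = balance_neighbors(["a", "b"], 4, rng)
empty = balance_neighbors([], 4, rng)
```

With every neighborhood brought to the same size, the attention-weighted sums in the two channels are computed over equally sized sets, so sparsely connected videos are not penalized simply for having fewer neighbors.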
[0036] 4) The positive and negative sample features output by the dual channels in step 3) are aggregated and fused. The aggregation layer and output layer then produce the predicted user-short video interaction probability, the model is validated on the test set of the dataset (with the confounding feature removed), and the final output is the personalized short video recommendation result. As shown in Figure 5, after the aggregated representations of the positive and negative sample channels are obtained, the two are input into the final aggregation module for fusion. This module integrates the dual-channel features through summation or concatenation and calculates the interaction probability between the user and the short video at the output layer, generating the final recommendation result.
[0037] Step 1: Feature aggregation. A differentiated aggregation strategy (summation, concatenation, etc.) is applied to the output features of the positive and negative sample channels, among which summation-based aggregation is the optimal strategy:
$$h = \mathrm{AGG}(h^{+}, h^{-})$$
[0038] where $h^{+}$ is the positive-sample aggregated feature, $h^{-}$ is the negative-sample aggregated feature, and $h$ is the final representation.
[0039] Step 2: Interaction probability prediction. The final aggregated feature $h$ is input to the output layer and mapped by the sigmoid activation function to an interaction probability between 0 and 1, that is:
$$\hat{y}_{ui} = \sigma(w^{\top} h + b)$$
[0040] where $w$ is the output-layer weight vector, $b$ is the bias term, $\sigma$ is the sigmoid activation function, and $\hat{y}_{ui}$ is the predicted interaction probability of user $u$ for video $i$.
Step 3: Personalized recommendation and model validation. The short videos are sorted in descending order of predicted interaction probability, and those with the highest probabilities are pushed to the user. The predictions are also compared with the real feedback in the dataset, and model performance is verified on four metrics (Recall, MAP, NDCG, and AUC), completing the entire process of personalized short video recommendation.
[0041] 5) Experimental data analysis: Table 1 shows the recommendation performance of different models on this dataset. Overall, the proposed series of models significantly outperforms the baseline models on all four metrics, validating the effectiveness of the proposed method in user interest modeling and preference identification.
[0042] As shown in Figure 6, the RI of the proposed method is compared with several baseline models on the test set in terms of Recall, MAP, NDCG, and AUC. Overall, the proposed model outperforms the comparison methods on all four metrics, indicating that the proposed method is effective in user interest modeling and preference identification.
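Two of the ranking metrics named above, Recall and NDCG, can be computed for a single user's top-K list as in the sketch below. It uses the common binary-relevance definitions, which are assumed here because the patent does not spell out its exact metric formulas or its cutoff K; MAP and AUC are omitted for brevity, and the function names are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant videos that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for video in ranked[:k] if video in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, video in enumerate(ranked[:k])
        if video in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model's ranking for one user, best first
relevant = {"a", "c"}           # videos the user actually liked
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Dataset-level figures such as those reported in Table 1 would typically be the average of these per-user values over all test users.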
[0043] The traditional recommendation model cannot characterize the structured semantic relationships in user behavior and achieves a score of only 0.1611. The knowledge-enhanced models exploit external semantic information to some extent, but their performance is still weaker than that of the proposed method because they do not model positive and negative preference signals separately.
[0044] In comparison, both variants of the proposed model achieve significant performance improvements. The best variant reaches Recall, MAP, NDCG, and AUC values of 0.1833, 0.0326, 0.0946, and 0.833, respectively. This result indicates that obtaining a more realistic preference signal through causal debiasing and modeling positive and negative behaviors separately with dual-channel graph convolutional networks effectively strengthens the representation of user interests, thereby improving recommendation accuracy.
[0045] The different aggregation strategies also show the expected performance differences. Summation-based aggregation performs best overall, indicating that explicit weighting and fusion of positive and negative preference information helps retain important interest cues. Concatenation, although its high-dimensional splicing increases the number of parameters, yields no effective performance gain, which suggests that direct splicing cannot fully exploit the dual-channel information structure.
[0046] In summary, the experimental results verify the effectiveness and design rationale of the proposed combination of causal debiasing and a dual-channel graph neural network.
[0047] Example 1
[0048] S100. Select the short video-user interaction samples input into the recommendation system, using one split of the dataset as the training set and another as the test set. Perform causal relationship modeling on the training samples, defining $Z$ as the video duration (confounding factor), $X$ as the other features of the video, $U$ as the user features, and $Y$ as the user preference outcome, and establish the causal effect of $X$ on $Y$ as the core research objective. As shown in Figure 1, stage S100 mainly completes sample input, variable definition, and causal relationship modeling, laying the foundation for the subsequent backdoor adjustment and debiasing calculations.
[0049] S200. Based on the $do$ operator and the backdoor criterion, perform the causal debiasing calculation on the training set. First, count the frequency of each video duration in the dataset to estimate $P(z)$. Then build the backbone network, combine it with expert modules and a gating mechanism, and train the model by maximum likelihood estimation to fit $P(Y \mid U, X, z)$. Finally, substitute both into the backdoor adjustment formula to complete the calculation and output the debiased user-item rating matrix, eliminating the confounding interference of video duration on user preferences. As shown in Figure 2, in stage S200 the conditional probability is fitted with an expert network and a gating mechanism, and causal weighted debiasing is carried out with the backdoor adjustment formula, finally yielding the debiased user-item rating matrix.
[0050] S300. Perform positive and negative sample partitioning and differentiated aggregation on the debiased user-item rating matrix. A preset rating threshold divides the videos around the user into positive and negative samples. For positive samples, the number of aggregation layers and the number of aggregated neighbors are set, and positive-sample information from the user and from similar users is integrated. For negative samples, the number of aggregation layers is set and only the user's own negative samples are aggregated. A data augmentation mechanism keeps the neighborhood size consistent. Two independent graph convolutional channels then learn the neighborhood features and attention distributions of the positive and negative samples respectively and complete feature extraction and aggregation. As shown in Figure 3 and Figure 4, in stage S300 positive and negative samples are first divided based on the debiased rating, and neighborhood information is then propagated and aggregated through a differentiated graph neural network. Figure 3 shows the overall workflow of the preference modeling module, and Figure 4 shows the specific structure of the positive sample aggregation module.
[0051] S400. Fuse the aggregated features of the positive and negative sample channels, input the result to the output layer, and obtain the predicted user-short video interaction probability through the activation function. A recommendation list is generated by sorting the predicted values in descending order. The recommendation results are then validated on the test set, and performance is compared with traditional recommendation models and knowledge-enhanced models. As shown in Figure 5, stage S400 fuses the output features of the positive and negative channels and calculates the user's interaction probability with the candidate short videos through the output layer, generating a sorted recommendation list.
[0052] S100 includes the following steps:
(1) Preprocess the training and test splits of the dataset to extract user features, video features, and interaction features, and construct a short video-user interaction sample set.
(2) Based on the preprocessed sample set, draw the causal graph of the confounding factor, clarify the relationships between variables, and identify video duration as the core confounding factor, laying the foundation for subsequent causal debiasing.
[0053] S200 includes the following steps:
(1) Introduce the $do$ operator to distinguish observational probability from causal probability, use the backdoor criterion to cut off the interference path from video duration $Z$ to the video features $X$, and derive the backdoor adjustment formula:
$$P(Y \mid U, do(X)) = \sum_{z} P(z)\, P(Y \mid U, X, z)$$
[0054] (2) Estimate $P(z)$: count the frequency of each video duration in the training set and use it as the probability estimate.
(3) Estimate $P(Y \mid U, X, z)$: build the backbone network model, add after the embedding layer an operation that models second-order feature combinations, introduce expert modules, and use a gating mechanism to select the corresponding experts for inference. Train the model by minimizing the negative log-likelihood of the interaction labels so that it fits the conditional probability of user interaction.
(4) Substitute both estimates into the backdoor adjustment formula, complete the causal weighted debiasing calculation through the architecture of shared base module + gating unit + multi-expert module, and output the debiased user-item rating matrix.
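The training objective mentioned in step (3), minimizing the negative log-likelihood of the interaction labels, is equivalent to binary cross-entropy when labels are 0 or 1. A minimal sketch follows; the function name, the averaging over samples, and the clipping constant `eps` are assumptions added for numerical safety and are not taken from the patent.

```python
import math


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 1]
uninformed = negative_log_likelihood([0.5, 0.5, 0.5], labels)
confident = negative_log_likelihood([0.9, 0.1, 0.8], labels)
```

A model whose predicted probabilities agree with the observed interactions gets a lower loss, so minimizing this quantity over the network parameters is the maximum likelihood fit of the conditional interaction probability.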
[0055] S300 includes the following steps:
(1) Thresholding to divide positive and negative samples: based on the rating distribution of the dataset, a reasonable threshold is preset to divide the videos in the user-item rating matrix into positive samples (videos the user prefers) and negative samples (videos the user dislikes), which are stored as two separate sample sets.
(2) Differentiated neighborhood aggregation settings: for the positive sample set, merge the user's own positive samples with positive-sample information from users with similar preferences, set the number of aggregation layers, and dynamically select four neighbors for aggregation using an attention mechanism. For the negative sample set, aggregate only the user's own negative samples and set the number of aggregation layers so as to avoid excessive interference from negative signals.
(3) Data augmentation and neighborhood balancing: for each target video, a fixed number of nodes is sampled from the positive and negative sample neighbors. Samples with too few neighbors are padded and samples with too many neighbors are truncated, so that the neighborhood size is consistent.
(4) Dual-channel feature learning: independent graph convolutional networks are constructed for positive and negative samples. The networks have the same architecture but do not share parameters. Each calculates the weights of neighboring nodes through an attention mechanism, completes feature aggregation, and obtains the feature representations of the positive and negative samples.
[0056] S400 includes the following steps:
(1) Feature fusion: using the summation-based aggregation strategy, the output feature representations of the positive and negative sample channels are weighted and summed to obtain the final feature representation of the target video.
[0057] (2) Interaction probability prediction: the final feature representation is input to the output layer and mapped by the activation function to an interaction probability between 0 and 1, giving the predicted interaction probability of each user for each short video.
(3) Generate a recommendation list: sort the short videos in descending order of predicted interaction probability and push the short video content with the highest probability values to each user.
(4) Model performance verification and comparison: validate model performance on the test set and compare it with the baseline models on four metrics: Recall, MAP, NDCG, and AUC.
[0058] Experimental conclusions
[0059] As shown in Figure 6 and Table 1, the proposed model demonstrates superior performance across multiple evaluation metrics, validating the effectiveness of the method in mitigating duration bias and improving recommendation accuracy. In this embodiment, the proposed model reaches 0.1833, 0.0326, 0.0946, and 0.833 on the four evaluation indicators on the test set, significantly better than the comparison models. In contrast, the traditional model struggles to characterize the structured semantic relationships in user behavior and does not address duration bias, so it is prone to learning superficial correlations influenced by confounding factors, resulting in relatively low recommendation accuracy. Other comparison models introduce external semantic information to enhance their representation capability, but they do not differentiate positive and negative preference signals and do not eliminate the confounding effect of duration at the causal level. Their overall performance is therefore still weaker than that of the method of this invention.
[0060] Table 1. Top-N recommendation performance of different methods on KuaiRec
[0061] This invention obtains more accurate and reliable user preference signals through a causal bias correction mechanism and combines it with dual-channel differentiated modeling of positive and negative samples, which effectively enhances the ability to represent user interests. Comparative experiments on multiple mainstream evaluation metrics show that the overall performance of this method is superior to that of the baseline models; the experiments also show that the proposed method exhibits stronger stability and generalization ability in scenarios involving a mix of long and short videos.
[0062] Furthermore, analysis of the distribution of recommendation results shows that this method effectively alleviates the unfairness caused by duration bias while improving recommendation accuracy, and enhances the balance and diversity of content distribution.
Claims
1. A short video recommendation algorithm based on causal inference and graph convolutional networks, characterized in that the steps are as follows: Step 1) Perform causal bias correction on the user behavior data input into the system: based on an intervention operator, construct a causal intervention model to explicitly eliminate the confounding effect of video duration on user interest signals and obtain debiased user preference data; Step 2) Divide the debiased data into positive and negative samples: based on a behavior intensity threshold, distinguish the neighborhood behaviors, divide the relationships between users and items into a positive sample set and a negative sample set, and store them separately for differentiated modeling; Step 3) Perform graph neural network modeling and feature aggregation on the positive and negative samples respectively: construct a differentiated graph neural network structure, perform feature extraction and aggregation on the positive and negative samples respectively, and then fuse the two representations into the central node to achieve a multi-angle characterization of user preference signals; Step 4) Optimize recommendation prediction based on the differentiated representation: use the fused preference features at the central node to perform the recommendation prediction task and output the recommendation results.
2. The short video recommendation algorithm based on causal inference and graph convolutional networks according to claim 1, characterized in that: in step 1), by introducing causal intervention calculation, the confounding effect of the video duration variable on the user preference signal is isolated, and debiased user behavior data is obtained, which is then used as the basic input for the subsequent positive and negative sample construction and graph aggregation modeling; wherein the symbols of the formula denote the item features, the user features, the user interest, and the confounding factor, respectively, and the formula expresses an intervention operation that cuts off the confounding path, i.e., the recommended item is forced to appear rather than appearing naturally; the corresponding output is then passed through a selected backbone network, which models second-order features through its embedding layer so that the information expressed by the input becomes richer, greatly improving the ability of the subsequent hidden layers to learn higher-order nonlinear feature combinations; wherein the input features are processed in two parts: the first part captures the combined information obtained by multiplying all features pairwise, and the second part subtracts the square of each feature itself, ultimately yielding the pure interaction terms between the features.
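The two-part second-order term described above matches the well-known identity in which the sum of all pairwise element-wise products equals half of the squared sum minus the sum of squares. A minimal sketch follows, assuming that form; the factor of 0.5, the function names, and the example embeddings are assumptions, since the formula itself is not recoverable from the text.

```python
from itertools import combinations


def bi_interaction(embeddings):
    """Second-order pooling: 0.5 * ((sum of vectors)^2 - sum of squared vectors).

    Computed per dimension. The squared sum contains every pairwise product
    twice plus each feature's own square; subtracting the squares and halving
    leaves only the pure cross terms.
    """
    dim = len(embeddings[0])
    pooled = []
    for d in range(dim):
        total = sum(e[d] for e in embeddings)
        squares = sum(e[d] ** 2 for e in embeddings)
        pooled.append(0.5 * (total ** 2 - squares))
    return pooled


def pairwise_reference(embeddings):
    """Direct sum over all distinct feature pairs, used as a cross-check."""
    dim = len(embeddings[0])
    return [
        sum(a[d] * b[d] for a, b in combinations(embeddings, 2))
        for d in range(dim)
    ]


# Three hypothetical 2-dimensional feature embeddings.
embeddings = [[1.0, 2.0], [3.0, 0.5], [2.0, 4.0]]
pooled = bi_interaction(embeddings)
reference = pairwise_reference(embeddings)
print(pooled, reference)  # both are [11.0, 11.0]
```

The pooled form needs one pass over the features instead of a loop over all pairs, which is the usual reason for writing the interaction this way.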
3. The short video recommendation algorithm based on causal inference and graph convolutional networks according to claim 1, characterized in that: in step 2), the specific method for dividing positive and negative samples is as follows: based on the intensity of the user's behavior after debiasing, a threshold is set to divide the neighborhood behaviors related to the central node into a set of positive samples and a set of negative samples, which are stored separately for differentiated graph neural network modeling; wherein the symbols of the formula denote, respectively, the number of layers in the graph structure, the vector representation of an item at a given layer, and the set of all positive samples around the corresponding user; the formula performs an aggregation operation over all positive-sample neighbors and then divides the result by a total count to prevent the aggregated result from becoming too large.
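The normalized aggregation over positive-sample neighbors described above can be sketched as a per-dimension mean. This assumes that the aggregation operation is a sum and that the divisor is the number of positive neighbors; the function name and example vectors are hypothetical.

```python
def mean_aggregate(neighbor_vectors):
    """Sum the neighbor vectors per dimension, then divide by their count.

    Dividing by the count keeps the result on the same scale no matter how
    many positive neighbors a node has.
    """
    count = len(neighbor_vectors)
    dim = len(neighbor_vectors[0])
    return [sum(v[d] for v in neighbor_vectors) / count for d in range(dim)]


# Hypothetical layer-l vectors of three positive-sample neighbors.
positive_neighbors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
next_layer = mean_aggregate(positive_neighbors)
print(next_layer)  # [3.0, 4.0]
```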
4. The short video recommendation algorithm based on causal inference and graph convolutional networks according to claim 1, characterized in that: in step 3), both the positive and the negative samples use a graph attention network structure for propagation; the network architectures of the two are consistent, but the parameters are not shared, so that the model can independently learn the feature patterns and attention distributions of the positive and negative neighborhoods; first, because each round of the aggregation process requires a guaranteed number of neighbors, a data augmentation algorithm is designed, wherein the symbols of its formula denote, respectively, the number of neighboring nodes to be used, the number of currently existing neighboring nodes, and the dynamically and randomly generated nodes of interest.
5. The short video recommendation algorithm based on causal inference and graph convolutional networks according to claim 1, characterized in that: in step 4), the final node aggregation of the fused features uses two different aggregation formulas, wherein the symbols denote, respectively, the final representation obtained from the aggregation of positive-sample neighbors and the final representation obtained from the aggregation of negative-sample neighbors, and the two aggregation operators represent the summation and concatenation methods used to calculate the final score.