Item recommendation method and system based on hybrid collaborative filtering
By combining hybrid collaborative filtering with residual preprocessing for bias removal, time decay factor correction, and an anti-popularity penalty index, the method addresses the bias and diversity problems of collaborative filtering systems, producing personalized and diverse recommendation results and improving the accuracy and efficiency of the recommendation system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- COMMUNICATION UNIVERSITY OF CHINA
- Filing Date
- 2026-01-27
- Publication Date
- 2026-06-19
AI Technical Summary
Existing collaborative filtering recommendation systems suffer from bias problems, time-effect problems, and diversity and redundancy problems, which lead to recommendation results that are biased towards popular items, fail to reflect changes in user interests in a timely manner, and have high computational costs and poor interpretability due to their complexity.
A hybrid collaborative filtering method is adopted. Global and individual biases are removed through residual preprocessing, and neighborhood similarity is calculated by introducing a time decay factor and an inverse frequency factor. The results are then re-ranked with an anti-popularity penalty index to generate personalized and diverse recommendation results.
It improves the accuracy and diversity of recommendations, maintains the simplicity, transparency and low latency of the algorithm, and enhances the practicality of the recommendation system.
Figure CN122240936A_ABST
Description
Technical Field
[0001] This invention relates to the field of data processing model technology, particularly to the field of Internet recommendation system technology, and specifically to an item recommendation method and system based on hybrid collaborative filtering.
Background Technology
[0002] With the development of internet applications, personalized recommendation systems have been widely used in e-commerce, content media, and other fields to help users discover items and other objects of interest. Collaborative filtering (CF) is one of the most classic methods in recommendation systems and includes neighborhood collaborative filtering techniques such as user-based collaborative filtering (UserCF) and item-based collaborative filtering (ItemCF). Traditional neighborhood collaborative filtering techniques have long been favored in industry for their simplicity, efficiency, and ease of understanding. However, they have the following shortcomings in practical applications: 1) Bias issue: Traditional UserCF / ItemCF typically calculates similarity and preference scores directly from users' historical behavior, without accounting for factors such as popular items generally receiving higher scores (global bias), active users tending to give higher scores (user-specific bias), and popular items naturally receiving more interactions (item-specific bias). As a result, recommendations are biased towards top-ranked popular items, long-tail items are rarely exposed, and users get a poor novelty experience. Although some industry methods apply baseline score (bias) correction, they are still insufficient to fully address the impact of bias.
[0003] 2) Time effect issue: User interests change dynamically over time, and earlier historical behaviors have low relevance to current preferences. Traditional collaborative filtering does not model the time factor, so expired preferences can easily dominate recommendation results, which then fail to reflect recent shifts in user interests in a timely manner. Some existing time-aware recommendation methods (such as time decay-based and session-based algorithms) show that considering the time factor can improve recommendation performance, but classic neighborhood methods (such as UserCF and ItemCF) do not provide sufficient support for this.
[0004] 3) Diversity and Redundancy Issues: Recommendation lists are often dominated by a few similar popular items, resulting in high redundancy and insufficient diversity. Users may receive recommendations that are all similar in style, failing to cover a wide range of user interests and limiting the exposure of long-tail content. This problem is particularly prominent in traditional collaborative filtering, because the top N items selected solely based on similarity scores are likely to belong to similar popular categories.
[0005] On the other hand, model-based collaborative filtering methods (such as matrix factorization, deep learning models, and graph neural networks) have developed rapidly in recent years, achieving significant improvements in accuracy. For example, graph models such as LightGCN propagate information over the user-item graph to improve recommendation performance. However, these complex models often have high computational and maintenance costs and poor interpretability, and they struggle to meet the requirements of low latency and high interpretability in real-world systems.
[0006] Given the shortcomings of the existing technologies, there is an urgent need for a lightweight personalized item recommendation method to effectively address the issues of bias correction, time decay, and diversity reordering, while maintaining the simplicity and transparency of the algorithm and improving the accuracy and diversity of the recommendations.
Summary of the Invention
[0007] In view of this, embodiments of the present invention provide a method and system for item recommendation based on hybrid collaborative filtering, in order to eliminate or improve one or more defects existing in the prior art.
[0008] One aspect of the present invention provides an item recommendation method based on hybrid collaborative filtering, the method comprising the following steps:
acquiring historical user interaction behavior data, and performing residual preprocessing on the interaction behaviors in that data to remove global bias and individual bias, thereby obtaining each user's residual preference value for the items the user has interacted with; wherein the historical user interaction behavior data includes a user set, an item set, and a user-item interaction record set, and the individual bias includes user bias and item bias;
calculating the time decay factor of each interaction behavior based on the interval between the current recommendation time and the interaction time of that behavior; calculating user-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the item, to obtain the neighborhood set of each user; and calculating item-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the user, to obtain the neighborhood set of each item;
calculating the user-side prediction score of the target user for the target item based on the neighborhood similarity of the neighboring users in the target user's neighborhood set and the residual preference values of those neighboring users for the target item;
calculating the item-side prediction score of the target user for items similar to the items already interacted with, based on the neighborhood similarity of the neighboring items in the neighborhood sets of the items the target user has interacted with and the target user's residual preference values for those interacted items;
generating the target user's initial score for the target item by weighting and combining the user-side prediction score and the item-side prediction score and adding back the global bias and individual bias removed in the residual preprocessing, and generating, based on the initial scores, an initially ranked initial candidate item set for the target user;
re-ranking the candidate items in the initial candidate item set by introducing an anti-popularity penalty index to obtain a re-ranked first candidate item set; and
obtaining the item recommendation result for the target user based on the first candidate item set.
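The fusion step described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names, the dictionary-based inputs, and the fixed fusion weight `alpha` are assumptions made here, and an item missing from one side is treated as contributing a residual prediction of zero.

```python
def initial_scores(user_side, item_side, mu, b_u, item_bias, alpha=0.5):
    """Blend residual-space predictions and add the removed biases back.

    user_side, item_side: dict item -> residual prediction for one target user.
    mu: global average bias; b_u: bias of the target user.
    item_bias: dict item -> item bias.
    alpha: hypothetical fusion weight in [0, 1] (1.0 means user-side only).
    """
    scores = {}
    for item in set(user_side) | set(item_side):
        blended = (alpha * user_side.get(item, 0.0)
                   + (1 - alpha) * item_side.get(item, 0.0))
        # Restore the global, user, and item biases removed in preprocessing.
        scores[item] = blended + mu + b_u + item_bias.get(item, 0.0)
    return scores


def top_candidates(scores, n):
    """Return the n highest-scoring items (ties broken by item id)."""
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

For example, `top_candidates(initial_scores(...), 50)` would give an initially ranked candidate set of 50 items that a later re-ranking stage can reorder.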
[0009] In some embodiments of the present invention, the step of performing residual preprocessing on the interaction behavior in the user's historical interaction behavior data to remove global and individual biases and obtain the user's residual preference value for the interacted items includes: calculating the user's residual preference value for the interacted items based on the following formula: $\tilde{r}_{ui} = r_{ui} - \mu - b_u - b_i$; where $\tilde{r}_{ui}$ represents the residual preference value of user $u$ for item $i$ after removing global and individual biases, $r_{ui}$ indicates the interaction intensity or explicit rating, $\mu$ represents the global average bias, $b_u$ indicates the user bias, and $b_i$ indicates the item bias.
[0010] In some embodiments of the present invention, the time decay factor is calculated based on the following formula: $w_{ui} = \exp(-\Delta t_{ui} / \tau)$; where $w_{ui}$ indicates the time decay factor of the interaction behavior between user $u$ and item $i$, $\Delta t_{ui}$ indicates the time interval between the current recommendation time and the most recent interaction of user $u$ with item $i$, and $\tau$ indicates the decay constant. The inverse frequency factor of the item, denoted $f_i$, is calculated by a formula in $|U_i|$, the number of users who have interacted with item $i$. The user's inverse frequency factor, denoted $f_u$, is calculated by a formula in $|I_u|$, the number of items that user $u$ has interacted with.
[0011] In some embodiments of the present invention, the method further includes: capturing interest shifts by comparing the similarity between the user's latest behavior and its historical preferences, adjusting the time decay factor to give higher weight to the user's latest behavior when interest shifts are captured, and updating the current user's neighborhood set and the user's residual preference value for interacted items in real time. The fusion weight coefficient for the weighted combination of the user-side predicted score and the item-side predicted score is dynamically calculated based on the preset fusion weight coefficient and the matching relationship between the user-item historical interaction density.
[0012] In some embodiments of the present invention, calculating user-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the item includes: calculating the user-based neighborhood similarity $s_{uv}$ between user $u$ and user $v$ by a formula that combines the time decay factor, the inverse frequency factor of the item, and a first shrinkage factor; where $I_u$ and $I_v$ represent the item sets of user $u$ and user $v$, respectively; $w_{ui}$ and $w_{vi}$ indicate the time decay factors of the interaction behaviors between user $u$ and item $i$ and between user $v$ and item $i$, respectively; $|I_u|$ and $|I_v|$ represent the numbers of items that user $u$ and user $v$ have interacted with, respectively; $f_i$ represents the inverse frequency factor of item $i$; and $\lambda_1$ indicates the first shrinkage factor. Calculating item-based neighborhood similarity by introducing the time decay factor and the user's inverse frequency factor includes: calculating the item-based neighborhood similarity $s_{ij}$ between item $i$ and item $j$ by a formula that combines the time decay factor, the user's inverse frequency factor, and a second shrinkage factor; where $U_i$ and $U_j$ represent the user sets of item $i$ and item $j$, respectively; $w_{ui}$ and $w_{uj}$ indicate the time decay factors of the interaction behaviors between user $u$ and item $i$ and between user $u$ and item $j$, respectively; $|U_i|$ and $|U_j|$ represent the numbers of users who have interacted with item $i$ and item $j$, respectively; $f_u$ indicates the inverse frequency factor of user $u$; and $\lambda_2$ represents the second shrinkage factor.
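As an illustration only, one plausible way to combine the quantities named above into a user-based similarity is sketched here. The exact formula is not given in this text, so the specific form used (a decay-weighted, inverse-frequency-weighted overlap divided by the geometric mean of the two users' item counts plus a shrinkage constant, with a logarithmic inverse frequency) is an assumption, as are all function and parameter names.

```python
import math


def user_similarity(w_u, w_v, item_user_counts, shrink=1.0):
    """Assumed form of a time-decayed, shrunk user-user similarity.

    w_u, w_v: dict item -> time decay weight of each user's interaction.
    item_user_counts: dict item -> number of users who interacted with it (>= 1).
    shrink: hypothetical shrinkage constant that damps similarities
            supported by few interactions.
    """
    common = set(w_u) & set(w_v)
    # Assumed inverse frequency factor: 1 / log(1 + |U_i|), so that items
    # shared by many users contribute less to the overlap.
    overlap = sum(w_u[i] * w_v[i] / math.log(1 + item_user_counts[i])
                  for i in common)
    return overlap / (math.sqrt(len(w_u) * len(w_v)) + shrink)
```

An item-based similarity can be sketched symmetrically by swapping the roles of users and items. The shrinkage constant keeps the denominator positive even for users with no interactions.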
[0013] In some embodiments of the present invention, calculating the user-side prediction score of the target user for the target item based on the neighborhood similarity of neighboring users in the target user's neighborhood set and the residual preference values of neighboring users for the target item includes: calculating the user-side prediction score $\hat{r}^{U}_{ui}$ of target user $u$ for target item $i$ by a formula in the following quantities: $\tilde{r}_{vi}$, the residual preference value of neighboring user $v$ for target item $i$; $s_{uv}$, the neighborhood similarity between user $u$ and user $v$; and $N(u)$, the neighborhood set of user $u$. Calculating the item-side prediction score of the target user for items similar to those already interacted with, based on the neighborhood similarity of neighboring items in the neighborhood sets of the items already interacted with and the target user's residual preference values for those items, includes: calculating the item-side prediction score $\hat{r}^{I}_{uj}$ of target user $u$ for an item $j$ similar to an already interacted item by a formula in the following quantities: $\tilde{r}_{ui}$, the residual preference value of user $u$ for item $i$; $s_{ij}$, the neighborhood similarity between item $i$ and item $j$; and $N(i)$, the neighborhood set of item $i$.
[0014] In some embodiments of the present invention, reordering the candidate items in the initial candidate item set by introducing an anti-popularity penalty index to obtain a reordered candidate item set includes: determining the penalty value corresponding to the popularity index of each item in the initial candidate item set; subtracting the corresponding penalty value from the initial score of each item in the initial candidate item set to obtain an updated score for each item; and reordering based on the updated scores to obtain the reordered candidate item set.
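The subtract-and-resort step can be sketched as follows. The text does not specify how the penalty value is derived from the popularity index, so the logarithmic penalty `alpha * log(1 + popularity)` and the parameter name `alpha` are assumptions chosen for illustration.

```python
import math


def rerank_with_popularity_penalty(scores, popularity, alpha=0.1):
    """Subtract a popularity penalty from each score, then re-sort.

    scores: dict item -> initial score.
    popularity: dict item -> interaction count (the popularity index here).
    alpha: hypothetical penalty strength; 0 leaves the ranking unchanged.
    """
    adjusted = {
        item: score - alpha * math.log1p(popularity.get(item, 0))
        for item, score in scores.items()
    }
    # Highest adjusted score first.
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
```

With a penalty that grows slowly in popularity, a slightly lower-scored long-tail item can overtake a very popular one, which is the intended effect of the re-ranking.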
[0015] In some embodiments of the present invention, obtaining the item recommendation result for the target user based on the reordered candidate item set includes: selecting a first number of items with the highest scores from the reordered candidate item set to form a final candidate item set, which serves as the item recommendation result for the target user. Alternatively, reordering the candidate items in the initial candidate item set using the anti-popularity penalty index and obtaining the item recommendation result for the target user based on the reordered candidate item set includes: determining the penalty value corresponding to the popularity index of each item in the initial candidate item set; subtracting the corresponding penalty value from the initial score of each item to obtain an updated score for each item, and re-sorting the items based on the updated scores; and using the maximum marginal relevance (MMR) algorithm to perform a secondary screening of the items in the reordered candidate item set to obtain a recommendation list of the first number of items, which is used as the item recommendation result for the target user. Each iteration of the secondary screening is based on the following formula: $\mathrm{MMR}(i) = \lambda \, \mathrm{Rel}(i, u) - (1 - \lambda) \, \mathrm{Sim}(i, S_u)$; where $i$ indicates a candidate item in the initial candidate item set that has not yet been selected into the currently selected item set $S_u$ of user $u$; $\mathrm{MMR}(i)$ is the scoring function of item $i$ with respect to the currently selected item set; $\mathrm{Rel}(i, u)$ represents the relevance score between item $i$ and user $u$; $\mathrm{Sim}(i, S_u)$ represents the similarity between item $i$ and the items in user $u$'s currently selected item set; and $\lambda$ is the trade-off factor.
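A greedy selection loop in the standard MMR style is sketched here to make the secondary screening concrete. It assumes the common MMR form in which redundancy is the maximum similarity to any already selected item; the function name, the callable `similarity`, and the default trade-off `lam` are illustrative choices, not taken from the text.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy maximum marginal relevance selection (standard form, assumed).

    relevance: dict item -> relevance score for the target user.
    similarity: callable (item, item) -> similarity in [0, 1].
    k: number of items to select.
    lam: trade-off; 1.0 ranks purely by relevance.
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def mmr(item):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(item, j) for j in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each iteration rescans the remaining candidates, so the cost is roughly proportional to the candidate count times `k` squared, which stays small when screening is applied only to a short re-ranked list.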
[0016] In some embodiments of the present invention, after reordering based on the updated score values to obtain a reordered set of candidate items, the method further includes: adjusting the current ordering using pre-set diversity and fairness constraints to obtain an updated reordered set of candidate items.
[0017] Another aspect of the present invention provides an item recommendation system based on hybrid collaborative filtering. The system includes a processor, a memory, and a computer program / instructions stored in the memory. The processor is used to execute the computer program / instructions. When the computer program / instructions are executed, the system performs the steps of the method described above.
[0018] Another aspect of the present invention provides a computer-readable storage medium having a computer program / instructions stored thereon, which, when executed by a processor, implement the steps of the method as described above.
[0019] The item recommendation method and system based on hybrid collaborative filtering of the present invention can output recommendation results that not only meet user interests but also take into account novelty and diversity.
[0020] Additional advantages, objects, and features of the invention will be set forth in part in the description which follows, and will also become apparent in part to those skilled in the art upon studying the description, or may be learned by practice of the invention. The objects and other advantages of the invention can be realized and obtained by means of the structures specifically pointed out in the description and drawings.
[0021] Those skilled in the art will understand that the objectives and advantages achievable with the present invention are not limited to those specifically described above, and that the above and other objectives achievable with the present invention will become clearer from the following detailed description.
Attached Figure Description
[0022] The accompanying drawings, which are provided to further illustrate the invention and form part of this application, are not intended to limit the scope of the invention.
[0023] Figure 1 is a flowchart illustrating an item recommendation method based on hybrid collaborative filtering in one embodiment of the present invention.
[0024] Figure 2 is a schematic diagram of the architecture of an item recommendation system based on hybrid collaborative filtering in one embodiment of the present invention.
Detailed Implementation
[0025] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the embodiments and accompanying drawings. Here, the illustrative embodiments and descriptions of this invention are used to explain the invention, but are not intended to limit the invention.
[0026] It should also be noted that, in order to avoid obscuring the invention with unnecessary details, only the structures and / or processing steps closely related to the solution according to the invention are shown in the accompanying drawings, while other details that are not closely related to the invention are omitted.
[0027] It should be emphasized that the term "including / comprises" as used herein refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
[0028] To address the problems existing in the prior art, this invention provides an improved collaborative filtering recommendation method. This method takes into account the advantages of traditional neighborhood collaborative filtering technology in engineering applications due to its high computational efficiency and strong interpretability. At the same time, it introduces a bias correction mechanism, time decay modeling, and diversity reordering strategy to improve the accuracy and diversity of recommendations while maintaining low latency and interpretability, thereby improving the practicality of the recommendation results.
[0029] To address the issue of high redundancy and insufficient diversity in existing recommendation methods, methods such as result re-ranking and diversity algorithms (e.g., Maximum Marginal Relevance, MMR) have been used to alleviate this problem. However, effectively combining these methods with collaborative filtering techniques remains a complex challenge. This invention proposes an improved collaborative filtering recommendation method that constructs a hybrid collaborative filtering recommendation framework combining bias awareness and time awareness, referred to herein as the "HybridCFPlus" framework. Therefore, this improved collaborative filtering recommendation method is also called a hybrid collaborative filtering-based item recommendation method, or a bias-aware and time-aware item recommendation method. In this application, "items" can include not only commodities but also other items recommended to users through online platforms. As examples, "items" can include: 1) commodities recommended to users on e-commerce platforms; 2) films, songs, or videos recommended to users on film, music, or short video platforms; 3) news headlines, articles, or social posts recommended to users on news and social platforms; 4) books or advertisements recommended to users by digital libraries or some app stores, etc. These examples of "items" are not intended to limit the invention, and any content suitable for recommending to users based on their interaction preferences is within the scope of this application. Therefore, the item recommendation method based on hybrid collaborative filtering of the present invention can be applied to multiple platforms or multiple application scenarios.
[0030] As an example, the hybrid collaborative filtering recommendation framework of this invention may mainly include an interaction residual modeling module, a time decay similarity calculation module, a user-side and item-side neighborhood fusion module, and a diversity re-ranking module, as shown in Figure 2. Furthermore, the framework may also include other modules, and this invention is not limited thereto.
[0031] Figure 1 shows a flowchart of the item recommendation method based on bias-aware and time-aware hybrid collaborative filtering provided by the present invention. As shown in Figure 1, the method includes the following steps (steps S110-S160):
Step S110: Obtain user historical interaction behavior data, perform residual preprocessing on the interaction behavior in the user historical interaction behavior data to remove global bias and individual bias, and obtain the user's residual preference value for the interacted items.
[0032] As an example, the user historical interaction behavior data may include a user set $U$, an item set $I$, and a user-item interaction record set $D$. Each record $(u, i, t_{ui})$ indicates that user $u$ interacted with item $i$ at time $t_{ui}$. When the interaction record set contains implicit feedback data, $r_{ui}$ can represent the strength of interaction; usually $r_{ui} = 1$ indicates that there is interaction and $r_{ui} = 0$ indicates no interaction. These values are merely examples, and the invention is not limited thereto. When the interaction record set contains explicit feedback data, $r_{ui}$ can represent an explicit rating.
[0033] In this embodiment of the invention, residual preprocessing removes the global bias, user bias, and item bias from the original interaction signal to obtain the user's residual preference value for the interacted items (i.e., the residual signal). The "residual signal" refers to the portion of the signal remaining after the global bias, user bias, and item bias are removed from the original user-item interaction data. This portion represents the pure interaction effect between the user and the item, i.e., the user's personalized preference for the item. The interaction residual modeling module can be used to perform residual preprocessing (or bias removal) on the interaction behavior in the user's historical interaction behavior data to eliminate the influence of global bias and individual bias on subsequent scoring, thereby obtaining the user's residual preference value for the interacted items. The global bias can be, for example, the global average bias $\mu$, which represents the global average interaction intensity or global average rating. User bias and item bias are individual biases, representing respectively the deviation of a user from the global average (e.g., deviation caused by active users) and the deviation of an item from the global average (e.g., deviation caused by the high interaction volume of popular items). User bias and item bias can be denoted $b_u$ and $b_i$, respectively.
[0034] As an example, the global average bias $\mu$, the user bias $b_u$, and the item bias $b_i$ can be obtained by statistically analyzing or solving a user-item matrix constructed from the user historical interaction data, to support residual preprocessing. When the user historical interaction data is explicit feedback data, each interaction record includes, in addition to user $u$, item $i$, and interaction time $t$, the user's explicit rating of the item, for example on a scale from one to five. In this case, a user-item rating matrix $R$ can be constructed, whose rows correspond to users, whose columns correspond to items, and whose elements are the observed ratings; unobserved positions are represented by null or missing values. The global average bias $\mu$ can be obtained as the arithmetic mean of all observed ratings, that is, the sum of all observed ratings $r_{ui}$ divided by the number $N$ of observed ratings. The user bias $b_u$ characterizes how the rating habits of user $u$ deviate from the global average; it can be obtained as the difference between the mean of user $u$'s observed ratings and $\mu$, or as the mean of user $u$'s residuals over the set of items the user has rated. The item bias $b_i$ characterizes how the ratings of item $i$ deviate from the global average; it can be obtained as the difference between the mean of item $i$'s observed ratings and $\mu$, or as the mean of the residuals over the set of users who rated item $i$. To avoid small-sample bias caused by rating sparsity, a regularization constant can be added to the denominator of the above mean calculations for smoothing, which pulls the biases of users or items with few interactions closer to zero. Alternatively, $\mu$, $b_u$, and $b_i$ can be solved jointly by minimizing the squared error between each observed rating $r_{ui}$ and the baseline prediction $\mu + b_u + b_i$, combined with a regularization term. The baseline prediction is the predicted value at each observed position in matrix $R$, and the solution can be obtained iteratively with alternating updates until convergence. When the user historical interaction data is implicit feedback data and only a marker indicating whether an interaction occurred is provided, $r_{ui}$ is defined as the interaction strength obtained by mapping implicit behaviors, and it is used in place of the explicit rating when calculating $\mu$, $b_u$, and $b_i$. As an example, the interaction strength can be obtained by weighting various behaviors, such as assigning a lower weight to browsing, a medium weight to clicking, a higher weight to adding to favorites or to the shopping cart, and the highest weight to purchasing or completing a viewing session. The number of interactions or the duration of each interaction can be normalized before being used as the basis for the interaction strength. In this case, a user-item interaction strength matrix $R$ is likewise constructed, whose elements represent the interaction strength; positions without interaction can be treated as missing or zero. The global average bias $\mu$ can be taken as the mean of all observed interaction strengths, the user bias $b_u$ as the deviation of user $u$'s average interaction strength from $\mu$, and the item bias $b_i$ as the deviation of item $i$'s average interaction strength from $\mu$; these deviations can also be smoothed by adding a regularization constant to the denominator. Furthermore, because unobserved feedback is not necessarily negative feedback in the implicit setting, user-item pairs that have never interacted can be sampled proportionally and their $r_{ui}$ set to zero or a small value, forming a training set of positive samples and sampled negative samples on which $\mu$, $b_u$, and $b_i$ are then calculated. This allows the bias terms to characterize differences in user activity and item popularity, so that the residual preference value retains a purer personalized deviation signal.
[0035] In this step, the residual preference value obtained from the residual preprocessing can be expressed as: $\tilde{r}_{ui} = r_{ui} - \mu - b_u - b_i$ (1); where $\tilde{r}_{ui}$ indicates the residual preference value of user $u$ for item $i$ after removing the global average bias, user bias, and item bias; $r_{ui}$ indicates the interaction intensity or explicit rating; $\mu$ represents the global average bias; $b_u$ indicates the user bias; and $b_i$ indicates the item bias.
[0036] This step uses the interaction residual modeling module to ensure that only the individualized deviation portion of a user's item preference is retained, preventing popular items or active users from dominating subsequent calculations through their inherently high scores. In this step, the global bias, individual biases, and residual preference values can be computed offline in advance.
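A minimal sketch of the residual preprocessing in step S110 is given here, using the damped-mean variant described above: the user bias is a smoothed mean of deviations from the global average, and the item bias is a smoothed mean of what remains after the user bias is removed. The names `mu`, `b_u`, `b_i`, `residual_preferences`, and the regularization constant `reg` are chosen here for illustration, and the ordering (user bias first, then item bias) is one of several reasonable choices.

```python
from collections import defaultdict


def residual_preferences(records, reg=5.0):
    """Estimate biases and residuals from (user, item, strength) records.

    reg: regularization constant added to each denominator so that users
         or items with few interactions get biases closer to zero.
    Returns (mu, b_u, b_i, residuals).
    """
    # Global average bias: mean of all observed ratings or strengths.
    mu = sum(r for _, _, r in records) / len(records)

    # User bias: damped mean of (r - mu) over the user's observed items.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, _, r in records:
        user_sum[u] += r - mu
        user_cnt[u] += 1
    b_u = {u: user_sum[u] / (user_cnt[u] + reg) for u in user_sum}

    # Item bias: damped mean of (r - mu - b_u) over the item's users.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in records:
        item_sum[i] += r - mu - b_u[u]
        item_cnt[i] += 1
    b_i = {i: item_sum[i] / (item_cnt[i] + reg) for i in item_sum}

    # Residual preference: what is left after removing all three biases.
    residuals = {(u, i): r - mu - b_u[u] - b_i[i] for u, i, r in records}
    return mu, b_u, b_i, residuals
```

Because the residual is a plain subtraction, adding `mu + b_u[u] + b_i[i]` back to a residual recovers the original rating, which is what the later fusion step relies on.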
[0037] Step S120: Calculate the time decay factor of the interaction behavior based on the interval between the current recommendation time and the interaction time of the interaction behavior. Calculate the neighborhood similarity based on the user by introducing the time decay factor and the inverse frequency factor of the item to obtain the neighborhood set of each user. Calculate the neighborhood similarity based on the item by introducing the time decay factor and the inverse frequency factor of the user to obtain the neighborhood set of each item.
[0038] This step can be completed by the time-decay similarity calculation module. To make the recommendation focus more on recent interactions, this invention introduces a time-decay factor in similarity calculation and score aggregation. More specifically, a time-decay factor is defined for each interaction record as a weight based on time difference. Through time decay, older interactions are assigned smaller weights to highlight recent user behavior, which is crucial for capturing shifts in user interests. This weight will be applied to subsequent neighborhood similarity calculations and residual preference aggregation, making the entire framework sensitive to "recentness".
[0039] As an example, the time decay factor can be calculated as follows:
$$w_{ui} = \exp\left(-\frac{\Delta t_{ui}}{\tau}\right) \quad (2)$$
where $w_{ui}$ denotes the time decay factor of the interaction between user $u$ and item $i$; $\Delta t_{ui}$ denotes the time interval between the current recommendation time and the most recent interaction of user $u$ with item $i$; and $\tau$ is a time-scale parameter (decay constant) that controls the rate of decay over time, is expressed in the same time unit as $\Delta t_{ui}$, and satisfies $\tau > 0$. The smaller $\tau$ is, the faster the decay and the higher the relative weight of interactions close to the current recommendation time, indicating a greater emphasis on recent behavior; the larger $\tau$ is, the more influence interactions from a longer historical period retain, meaning that the effects of historical behavior persist longer. The above formula is merely an example, and the invention is not limited to it. The value range of $\tau$ can be set according to the interest change cycle and the time granularity of the data in the business scenario. As an example, when $\Delta t_{ui}$ is measured in days, $\tau$ can range from 1 day to 365 days; when $\Delta t_{ui}$ is measured in hours, $\tau$ can range from 1 hour to 8760 hours. This range is not limiting and can be adjusted according to the content lifecycle, repurchase cycle, or user interest drift rate of different platforms. The criterion for determining $\tau$ can be one of the following methods or a combination thereof. First, a half-life criterion: a predetermined interest half-life $T_{\mathrm{half}}$ is chosen such that the time decay weight is 0.5 when $\Delta t_{ui}$ equals $T_{\mathrm{half}}$. Setting $\tau = T_{\mathrm{half}} / \ln 2$ satisfies this condition and maps the business expectation directly to the decay intensity. Second, a data distribution criterion: the distribution of historical interaction time intervals $\Delta t_{ui}$ is analyzed statistically, and $\tau$ is set near the median of that distribution or within the interval between a lower and a higher quantile, so that extremely recent noise is not overemphasized and outdated preferences are not over-retained. Third, a validation-set criterion: a grid search or segmented search over a set of candidate values of $\tau$ is performed on an offline validation set, and the $\tau$ that optimizes the ranking metric is selected. This $\tau$ can be re-estimated periodically on new data to accommodate interest shifts.
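The half-life criterion can be illustrated with a minimal Python sketch. It assumes the exponential form exp(-Δt/τ), which is the form consistent with a weight of 0.5 at the half-life when τ = T_half / ln 2; the function names and the 30-day half-life are hypothetical choices for illustration only.

```python
import math


def tau_from_half_life(t_half: float) -> float:
    """Decay constant tau such that the weight is 0.5 when delta_t == t_half."""
    if t_half <= 0:
        raise ValueError("t_half must be positive")
    return t_half / math.log(2)


def time_decay(delta_t: float, tau: float) -> float:
    """Exponential time-decay weight exp(-delta_t / tau) (assumed form)."""
    if tau <= 0 or delta_t < 0:
        raise ValueError("tau must be positive and delta_t non-negative")
    return math.exp(-delta_t / tau)


# Hypothetical 30-day interest half-life, with delta_t measured in days.
tau = tau_from_half_life(30.0)
weights = [time_decay(d, tau) for d in (0.0, 30.0, 60.0)]
```

Under these assumptions an interaction from today keeps full weight, while each additional half-life halves the weight again.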
[0040] Furthermore, in this embodiment of the application, to address the issue that user interests may shift over time, the present invention further proposes an interest drift detection and online incremental update mechanism to enhance the system's adaptability to changes in interests. More specifically, the method of the present invention also includes: capturing interest shifts by comparing the similarity between the user's latest behavior and their historical preferences; adjusting the time decay factor to assign higher weight to the user's latest behavior when an interest shift is detected; and updating the current user's neighborhood set and the user's residual preference values for interacted items in real time.
[0041] Interest shifts are determined by comparing the similarity between a user's latest behavior and their historical preferences. By capturing sudden interest shifts in real time, a shift in user interest is identified when a significant discrepancy is found between recent and historical preferences, triggering a lightweight incremental update for that user in the model. For example, adjusting the decay constant to adjust the time decay factor assigns higher weight to recent behaviors, naturally diluting older interests. After detecting an interest shift, the system recalculates the neighborhood set and the user's residual preference values for interacted items only for that user, updating the recommendation results immediately to reflect new interests. This online incremental update avoids the need to retrain the entire model for every interest change, improving system real-time performance. Experimental results show that this mechanism can quickly adjust recommended content when interests change, effectively reducing user churn and ensuring recommendation accuracy in high-real-time scenarios such as news and information. The system maintains a summary of recent behavior for each user, performs low-cost comparisons with historical preferences, and asynchronously performs local updates when conditions are met, ensuring overall service stability and efficiency.
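As a rough illustration of the drift check described above, the following Python sketch compares a summary of recent behavior with a summary of historical preferences and shortens the decay constant when they diverge. The category-count representation, the cosine measure, the threshold of 0.3, and the shrink ratio of 0.5 are all assumptions made for this example and are not taken from the description.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjust_tau(recent: Iterable[str], history: Iterable[str], tau: float,
               threshold: float = 0.3, shrink: float = 0.5) -> Tuple[float, bool]:
    """Return (tau, drifted); tau is shortened when recent behavior diverges."""
    drifted = cosine(Counter(recent), Counter(history)) < threshold
    return (tau * shrink if drifted else tau), drifted


# Hypothetical category labels of the items a user interacted with.
history = ["action"] * 8 + ["comedy"] * 2
new_tau, drifted = adjust_tau(["documentary"] * 3, history, tau=30.0)
```

When `drifted` is true, only that user's neighborhood set and residual preferences would need to be recomputed, which keeps the update local.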
[0042] Furthermore, in this embodiment of the invention, to improve reliability under sparse data, an inverse item frequency (IIF) factor is introduced when calculating user-user similarity and an inverse user frequency (IUF) factor is introduced when calculating item-item similarity. The inverse item frequency factor (also called the item inverse frequency factor) and the inverse user frequency factor (also called the user inverse frequency factor) can be used to reduce the disproportionate impact of popular items or high-frequency users on similarity. For ease of description, in the following $\mathrm{IIF}(i)$ denotes the inverse frequency factor of item $i$ and $\mathrm{IUF}(u)$ denotes the inverse frequency factor of user $u$.
[0043] As an example, the inverse frequency factor $\mathrm{IIF}(i)$ of item $i$ can be calculated from $|U_i|$, the number of users who have interacted with item $i$; the inverse frequency factor $\mathrm{IUF}(u)$ of user $u$ can be calculated from $|I_u|$, the number of items with which user $u$ has interacted.
[0044] As can be seen above, the user's inverse frequency factor and the item's inverse frequency factor can be represented by a unified formula (3) expressed in terms of $d(x)$, the degree of node $x$, that is, the number of objects with which the node has interacted. For example, for a user $u$, $d(u)$ is the number of items with which the user has interacted; for an item $i$, $d(i)$ is the number of users who have interacted with it. By reducing the co-occurrence contribution of users or items with high degree, the factor prevents "users with a lot of interaction" or "very popular items" from carrying excessive weight in similarity calculation. In other words, the inverse frequency factors of users and items make the similarity measure more robust.
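The idea of a degree-based inverse frequency factor can be sketched in Python as follows. The logarithmic form 1 / log(1 + d) is a common choice in the collaborative filtering literature and is used here purely as an assumption; the exact expression of formula (3) may differ, and the function name is hypothetical.

```python
import math


def inverse_frequency(degree: int) -> float:
    """Assumed inverse frequency factor 1 / log(1 + degree) for degree >= 1.

    For a user, degree is the number of items the user interacted with;
    for an item, it is the number of users who interacted with the item.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return 1.0 / math.log(1.0 + degree)


# A very popular item contributes less per co-occurrence than a niche one.
popular, niche = inverse_frequency(10_000), inverse_frequency(10)
```

Any monotonically decreasing function of the degree would serve the same damping purpose; the logarithm simply makes the damping gentle.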
[0045] Furthermore, this invention can also introduce a shrinkage factor to normalize and smooth the similarity. Specifically, a constant term is added when normalizing the co-occurrence count during similarity calculation, to prevent overestimation of similarity when the number of shared interaction samples is small. As an example, a shrinkage factor $\lambda_U$ can be introduced into the denominator of the user-user similarity, whose normalization is based on $|I_u|$, the total number of items with which user $u$ has interacted; a shrinkage factor $\lambda_I$ can be introduced into the denominator of the item-item similarity, whose normalization is based on $|U_i|$, the number of users who have interacted with item $i$. With the shrinkage factors $\lambda_U$ and $\lambda_I$ included, when two users or two items have few interactions, the enlarged denominator shrinks the similarity value towards 0, avoiding similarity distortion caused by accidental co-occurrence under sparse data. Through the combined use of the inverse frequency factors and the shrinkage process, similarity calculation becomes more robust and reliable in sparse, long-tail regions. In the embodiments of the present invention, $\lambda_U$ and $\lambda_I$ can be the same or different. Preferably, ... .
[0046] After preprocessing the residual signal, time decay factor, and inverse frequency factor, this invention can calculate user-based neighborhood similarity and item-based neighborhood similarity.
[0047] First, regarding the neighborhood similarity of users: in one embodiment of the present invention, for any two users $u$ and $v$, their similarity $\mathrm{sim}(u,v)$ is defined in a cosine-similarity form that is weighted by the time decay factor and corrected by the inverse frequency factor. For example, by introducing the time decay factor, the item inverse frequency factor, and the shrinkage factor, the user-based neighborhood similarity can be calculated according to formula (4), in which $\mathrm{sim}(u,v)$ denotes the neighborhood similarity between user $u$ and user $v$; $I_u \cap I_v$ denotes the set of items with which both user $u$ and user $v$ have interacted; $I_u$ and $I_v$ denote the item sets of user $u$ and user $v$, respectively; $w_{ui}$ denotes the time decay factor of the interaction between user $u$ and item $i$; $w_{vi}$ denotes the time decay factor of the interaction between user $v$ and item $i$; $|I_u|$ and $|I_v|$ denote the numbers of items with which user $u$ and user $v$ have interacted, respectively; $\mathrm{IIF}(i)$ denotes the inverse frequency factor of item $i$; and $\lambda_U$ denotes the shrinkage factor. For each common item $i$, the time decay weights $w_{ui}$ and $w_{vi}$ of users $u$ and $v$ on that item and the inverse frequency factor of the item are taken into account together, and the denominator is subjected to shrinkage normalization. The resulting $\mathrm{sim}(u,v)$ can be regarded as the interest similarity between user $u$ and user $v$ with bias removed and recent behavior emphasized.
[0048] Based on the aforementioned user-based neighborhood similarity, the present invention can find, for each user $u$, the top $k$ most similar neighboring users, yielding the neighborhood set of user $u$, which can be denoted $N(u)$.
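A minimal Python sketch of the user-side similarity and neighbor selection is given below. It assumes one plausible reading of the description: the numerator sums, over common items, the product of both users' time decay weights and the item's inverse frequency factor, and the denominator is the square root of the product of the two users' item counts plus a shrinkage constant. The exact form of formula (4) may differ, and all names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[str, float]  # item id -> time decay weight for one user


def user_similarity(w_u: Weights, w_v: Weights,
                    iif: Dict[str, float], shrink: float) -> float:
    """Time-decayed, IIF-corrected similarity with a shrunk denominator."""
    common = w_u.keys() & w_v.keys()
    numerator = sum(w_u[i] * w_v[i] * iif[i] for i in common)
    denominator = math.sqrt(len(w_u) * len(w_v)) + shrink
    return numerator / denominator if denominator > 0 else 0.0


def top_k_neighbors(target: str, weights: Dict[str, Weights],
                    iif: Dict[str, float], k: int,
                    shrink: float) -> List[Tuple[str, float]]:
    """Return up to k (neighbor, similarity) pairs with positive similarity."""
    sims = [(v, user_similarity(weights[target], w_v, iif, shrink))
            for v, w_v in weights.items() if v != target]
    sims = [(v, s) for v, s in sims if s > 0.0]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:k]


weights = {"u1": {"a": 1.0, "b": 0.5},
           "u2": {"a": 1.0, "b": 0.5},
           "u3": {"c": 1.0}}
iif = {"a": 1.0, "b": 1.0, "c": 1.0}
neighbors = top_k_neighbors("u1", weights, iif, k=2, shrink=1.0)
```

Raising `shrink` pulls similarities computed from few interactions towards 0, which is the stated purpose of the shrinkage factor.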
[0049] The formula for the neighborhood similarity of items is similar in structure to that for user similarity. In one embodiment of the present invention, for any two items $i$ and $j$, the item similarity $\mathrm{sim}(i,j)$ is defined in a cosine-similarity form that is weighted by the time decay factor and corrected by the inverse frequency factor. For example, by introducing the time decay factor, the user inverse frequency factor, and the shrinkage factor, the item-based neighborhood similarity can be calculated according to formula (5), in which $\mathrm{sim}(i,j)$ denotes the neighborhood similarity between item $i$ and item $j$; $U_i$ and $U_j$ denote the user sets of item $i$ and item $j$, respectively; $U_i \cap U_j$ denotes the set of users who have interacted with both item $i$ and item $j$; $w_{ui}$ denotes the time decay factor of the interaction between user $u$ and item $i$; $w_{uj}$ denotes the time decay factor of the interaction between user $u$ and item $j$; $|U_i|$ and $|U_j|$ denote the numbers of users who have interacted with item $i$ and item $j$, respectively; $\mathrm{IUF}(u)$ denotes the inverse frequency factor of user $u$; and $\lambda_I$ denotes the shrinkage factor.
[0050] For each user $u$ who has interacted with both items, the time decay weights $w_{ui}$ and $w_{uj}$ of that user on items $i$ and $j$ are multiplied by the user's IUF value (that is, users with lower activity levels contribute relatively more), and the denominator, which is based on the interaction counts of the two items, is shrunk and normalized. This yields the similarity $\mathrm{sim}(i,j)$ between items, which reflects the degree to which the two items have recently been favored by their common users.
[0051] Similarly, for each item $i$, the top $k$ most similar neighboring items are pre-selected, yielding the neighborhood set of item $i$, which can be denoted $N(i)$.
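As a hedged illustration, one form consistent with the descriptions of formulas (4) and (5) is written out below. Here $w_{ui}$ is the time decay factor of user $u$ on item $i$, $I_u$ is the item set of user $u$, $U_i$ is the user set of item $i$, $\mathrm{IIF}$ and $\mathrm{IUF}$ are the inverse frequency factors, and $\lambda_U$, $\lambda_I$ are the shrinkage factors. The notation and the exact placement of the shrinkage constants in the denominators are assumptions and may differ from the original expressions.

```latex
\mathrm{sim}(u,v) =
  \frac{\sum_{i \in I_u \cap I_v} w_{ui}\, w_{vi}\, \mathrm{IIF}(i)}
       {\sqrt{|I_u|\,|I_v|} + \lambda_U},
\qquad
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U_i \cap U_j} w_{ui}\, w_{uj}\, \mathrm{IUF}(u)}
       {\sqrt{|U_i|\,|U_j|} + \lambda_I}
```

Both expressions share the same structure: a weighted co-occurrence sum in the numerator and a cosine-style normalization, enlarged by a constant, in the denominator.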
[0052] In this step S120, the determination of the time decay factor, the inverse frequency factors (IIF and IUF), and the shrinkage factors, as well as the calculation of each neighborhood set, can be completed offline in advance.
[0053] Step S130: Calculate the user-side prediction score of the target user for the target item based on the neighborhood similarity of neighboring users in the target user's neighborhood set and the residual preference value of neighboring users for the target item.
[0054] The residual preference values of the neighboring users in the target user's neighborhood set for the target item are aggregated to calculate the user-side prediction score. For example, the user-side prediction score for the target item can be calculated according to formula (6), in which $s^{U}(u,i)$ denotes the user-side prediction score of target user $u$ for target item $i$; $\tilde{r}_{vi}$ denotes the residual preference value of neighboring user $v$ for target item $i$; $\mathrm{sim}(u,v)$ denotes the neighborhood similarity between user $u$ and user $v$; and $N(u)$ denotes the neighborhood set of user $u$.
[0055] Formula (6) embodies the following idea: if a group of neighbors $v$ whose recent behavioral patterns are similar to those of user $u$ exhibit interest in item $i$ that is higher than their usual preferences (that is, the residual $\tilde{r}_{vi}$ is positive and large), this can be taken as evidence that user $u$ may also like item $i$. Through a weighted summation of these neighbors' residual signals on item $i$, the estimated preference score of user $u$ for item $i$ is obtained from the perspective of the user's neighbors.
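The weighted summation over neighbors can be sketched in Python as follows. The sketch assumes an unnormalized similarity-weighted sum in which neighbors without an interaction on the target item contribute zero; whether formula (6) also normalizes by the sum of similarities is not assumed here. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def user_side_score(neighbors: List[Tuple[str, float]],
                    residuals: Dict[str, Dict[str, float]],
                    item: str) -> float:
    """Sum of sim(u, v) * residual(v, item) over the neighbors v of user u."""
    return sum(sim * residuals.get(v, {}).get(item, 0.0)
               for v, sim in neighbors)


# Hypothetical neighborhood of a target user and neighbor residuals.
neighbors = [("u2", 0.8), ("u3", 0.2)]
residuals = {"u2": {"x": 0.5}, "u3": {"x": -0.25, "y": 0.4}}
score_x = user_side_score(neighbors, residuals, "x")
```

A close neighbor with a strongly positive residual dominates the score, while distant or indifferent neighbors have little effect.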
[0056] Step S140: Based on the neighborhood similarity of neighboring items in the neighborhood set of items already interacted by the target user and the residual preference value of the target user for the items already interacted with, calculate the item-side prediction score of the target user for items similar to the items already interacted with.
[0057] For a target user $u$ and a candidate item $i$, the residual preference values of the items $j$ previously consumed by user $u$ that are similar to item $i$ are aggregated to calculate the item-side prediction score. For example, the item-side prediction score of a target user for items similar to those already interacted with can be calculated according to formula (7), in which $s^{I}(u,i)$ denotes the item-side prediction score of target user $u$ for item $i$, an item similar to those already interacted with; $\tilde{r}_{uj}$ denotes the residual preference value of user $u$ for item $j$; $\mathrm{sim}(i,j)$ denotes the neighborhood similarity between item $i$ and item $j$; and $N(\cdot)$ denotes the neighborhood set of an item.
[0058] Formula (7) means the following: if item $i$ is highly similar to an item $j$ that user $u$ previously liked, and the residual preference of user $u$ for item $j$ is positive (that is, the user prefers item $j$ more than the general trend would suggest), then this preference can be transferred to item $i$ in proportion to the similarity weight. This approach captures the influence of latent item similarity, compensating for preference signals that may be missed from the perspective of the user's neighbors alone. In practice, an item similarity index can be built offline so that the neighbor list of each item can be retrieved quickly and item-side scores can be calculated efficiently in the online phase.
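The transfer of residual preference through item similarity can be sketched in Python as follows. The sketch assumes that the score of a candidate is the sum, over items the user has interacted with, of the user's residual on that item times the similarity between that item and the candidate, using a precomputed neighbor index. The function name, the index layout, and the sample values are hypothetical.

```python
from typing import Dict


def item_side_score(user_residuals: Dict[str, float],
                    item_neighbors: Dict[str, Dict[str, float]],
                    candidate: str) -> float:
    """Sum of residual(u, j) * sim(j, candidate) over interacted items j."""
    return sum(residual * item_neighbors.get(j, {}).get(candidate, 0.0)
               for j, residual in user_residuals.items())


# Residuals of one user on watched items, and an offline-built neighbor index.
user_residuals = {"m1": 0.5, "m2": -0.1}
item_neighbors = {"m1": {"z": 0.8, "y": 0.1}, "m2": {"z": 0.5}}
score_z = item_side_score(user_residuals, item_neighbors, "z")
```

Because the neighbor index is read-only at serving time, each candidate score costs only a few dictionary lookups.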
[0059] In steps S130 and S140, the calculation of the user-side prediction score and the item-side prediction score can be pre-calculated offline by the time decay similarity calculation module, or it can be calculated online by the user-side and item-side neighborhood fusion module.
[0060] Step S150: By weighting and combining the user-side predicted score and the item-side predicted score, and adding the global bias and individual bias removed in the residual preprocessing, an initial score for the target user on the initial item is generated, and an initial candidate item set with initial ranking for the target user is generated based on the initial score.
[0061] Considering that user-side scores and item-side scores each have their advantages and are complementary, this invention integrates the two to obtain a more comprehensive recommendation score. Specifically, the user-side predicted score obtained in step S130 and the item-side predicted score obtained in step S140 are weighted and combined, and the previously removed bias terms are added back to form the initial score of target user $u$ for item $i$:
[0062] $$\hat{s}(u,i) = \mu + b_u + b_i + \alpha\, s^{U}(u,i) + (1-\alpha)\, s^{I}(u,i) \quad (8)$$
where $\hat{s}(u,i)$ is the initial score of user $u$ for item $i$; $s^{U}(u,i)$ and $s^{I}(u,i)$ are the user-side and item-side prediction scores; $\mu$, $b_u$, and $b_i$ are the global bias, user bias, and item bias defined earlier; and $\alpha$ is a fusion weight coefficient used to balance the contributions of the user neighborhood and the item neighborhood. When $\alpha$ is larger, user similarity is emphasized; when it is smaller, item similarity is emphasized. Experiments have shown that good results can be achieved with a fixed $\alpha$ (for example, $\alpha = 0.5$, which represents equal-weighted fusion of user-side and item-side information); of course, dynamic adjustment based on neighborhood density can also be considered.
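The fusion step can be sketched in Python as a convex combination of the two neighborhood scores plus the bias terms that were removed during residual preprocessing. The convex form with weights α and 1 - α is inferred from the description of equal-weighted fusion; the function name and the sample values are hypothetical.

```python
def fused_score(mu: float, b_u: float, b_i: float,
                s_user: float, s_item: float, alpha: float = 0.5) -> float:
    """Baseline (mu + b_u + b_i) plus a convex mix of the two scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return mu + b_u + b_i + alpha * s_user + (1.0 - alpha) * s_item


# Hypothetical values: baseline 0.25, both neighborhood scores 0.35.
initial = fused_score(mu=0.2, b_u=0.1, b_i=-0.05, s_user=0.35, s_item=0.35)
```

Setting `alpha` to 1 or 0 reduces the score to a pure user-side or pure item-side prediction on top of the baseline.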
[0063] In this embodiment, the fusion weight coefficient for the weighted combination of the user-side predicted score and the item-side predicted score is dynamically calculated based on a preset fusion weight coefficient and the matching relationship between the historical interaction density of users and items. That is, the fusion weight of the user-side (UserCF) and item-side (ItemCF) predicted scores can be adaptively adjusted according to the historical interaction density of users and items. When user interaction behavior is low (e.g., for cold-start users), the algorithm increases the weight of the item-side predicted score (e.g., the score calculated based on item similarity) to compensate for insufficient user preference information; conversely, when item interaction records are sparse (e.g., for long-tail items), the influence of user-side preference prediction is increased to ensure that such items still have a chance to be recommended when matched with a specific user.
[0064] In implementation, an adaptive weighting strategy can be used to dynamically adjust the fusion weight coefficient in the fusion scoring formula (8) of step S150, so that the coefficient depends on interaction counts. As an example, the correspondence between fusion weight coefficients and user interaction counts and item interaction frequencies can be preset, and the fusion weight coefficient can then be determined automatically from the user's interaction count (such as click count) and the item's interaction frequency. During system deployment, this correspondence can also be adjusted to balance the effect under different sparsity levels. During online prediction, the fusion weight can be obtained quickly from the interaction density without significant additional overhead. Experiments have verified that this mechanism can effectively improve the relevance and coverage of recommendation results in user cold-start and long-tail item scenarios.
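One possible preset correspondence between interaction density and the fusion weight can be sketched in Python as follows. The rule, the sparsity thresholds, and the step size are hypothetical; the description only requires that the item side gains weight for sparse users and the user side gains weight for sparse items.

```python
def adaptive_alpha(n_user: int, n_item: int, base: float = 0.5,
                   step: float = 0.2, sparse_user: int = 5,
                   sparse_item: int = 5) -> float:
    """Fusion weight of the user-side score, adjusted for sparsity.

    n_user: number of interactions of the target user.
    n_item: number of interactions of the candidate item.
    """
    alpha = base
    if n_user < sparse_user:   # cold-start user: lean on item similarity
        alpha -= step
    if n_item < sparse_item:   # long-tail item: lean on user similarity
        alpha += step
    return min(1.0, max(0.0, alpha))


alpha_cold_user = adaptive_alpha(n_user=2, n_item=100)
alpha_tail_item = adaptive_alpha(n_user=100, n_item=2)
```

Because the rule is a constant-time lookup, it adds no meaningful cost to online scoring.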
[0065] In this embodiment of the invention, the global bias and individual bias are subtracted first when calculating the residual signal so that the neighborhood learns only the true individual residuals. The global bias and individual bias are then added back in this step to restore the prediction result to the original scoring scale, which is sortable and provides a fallback prior. The fused score in formula (8) combines baseline preference correction with evidence from the user and item neighborhoods, so the resulting score is more accurate.
[0066] After obtaining initial scores for all candidate items, the item recommendation system based on hybrid collaborative filtering selects, for each user, the top $K$ items with the highest scores to form the initial candidate item set. Here $K$ is typically much larger than the final number $N$ of recommendations displayed (for example, $K$ is 50 or 100 while the top 10 items are ultimately shown to the user), with the aim of preserving as many potentially relevant items as possible at high recall. It is important to note that items the user has already interacted with (the exclusion set) should be filtered out before generating the candidate item set, which avoids recommending historical items again. Similarly, the candidate item set is built strictly from historical data to ensure that no "information leakage" occurs through the use of future data. Through the above fusion and filtering steps, an initial candidate item set (list) is obtained that integrates bilateral neighbor signals while maintaining a strict temporal order. This lays the foundation for the final re-ranking.
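Candidate generation with an exclusion set can be sketched in Python as follows. The function name and sample scores are hypothetical; `heapq.nlargest` is used so that only the top K entries are kept rather than sorting the full catalog.

```python
import heapq
from typing import Dict, List, Set, Tuple


def top_k_candidates(scores: Dict[str, float], seen: Set[str],
                     k: int) -> List[Tuple[str, float]]:
    """Top-k unseen items by initial score, highest first."""
    unseen = ((item, s) for item, s in scores.items() if item not in seen)
    return heapq.nlargest(k, unseen, key=lambda pair: pair[1])


# Item "a" was already interacted with, so it is excluded.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
candidates = top_k_candidates(scores, seen={"a"}, k=2)
```

In a deployment K would be set well above the number of displayed items so that the later re-ranking stage has room to trade relevance for diversity.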
[0067] Step S150 is preferably completed online by the user-side and item-side neighborhood fusion module.
[0068] Step S160: The candidate items in the initial candidate item set are re-ranked by introducing at least an anti-popularity penalty index to obtain a re-ranked first candidate item set, and the item recommendation result for the target user is obtained based on the first candidate item set.
[0069] In this step, an anti-popularity penalty index is introduced to re-rank the initial candidate item set obtained in step S150. Additionally, the maximum marginal relevance (MMR) algorithm can be used to perform a secondary screening of the re-ranked items in the candidate item set, resulting in the final recommendation list. These steps are described below.
[0070] In this step, a lightweight re-ranking module that executes a diversity re-ranking algorithm is applied to the initial candidate item set in order to balance relevance and diversity. It reduces the dominance of popular content and list redundancy while preserving most of the relevance, thereby increasing the exposure of long-tail items.
[0071] 1) Anti-popularity penalty: a penalty is applied to the items in the initial candidate item set. Each item is assigned a penalty score based on its popularity in historical interaction data. This penalty causes particularly popular items to drop in the ranking, thereby improving the relative ranking of long-tail items. The penalty value $\mathrm{pen}(i)$ of item $i$ can be defined according to formula (9), in which $\mathrm{pop}(i)$ denotes a popularity metric of item $i$ (such as the number of users who have interacted with it) and $\beta$ is a non-negative parameter that controls the severity of the penalty. The penalty increases with the item's popularity, so popular items receive a larger deduction. The adjusted effective score is the initial score minus the penalty. When $\beta$ is 0, no penalty is applied; a larger $\beta$ significantly lowers the ranking of highly popular items. This lightweight anti-popularity processing prevents top items from being overexposed and guides the recommendation results towards the long tail, but because the penalty is uniform and smooth, it does not completely disrupt the ranking of highly relevant items. After the penalty adjustment is applied to the initial score of each item in the initial candidate item set, an updated score is obtained for each item. The items are re-ranked by their updated scores to obtain the re-ranked first candidate item set. The top $N$ items are selected from the re-ranked first candidate item set to form the second candidate item set, which can be used as the final item recommendation result for the target user. After applying the penalty adjustment, this invention can also continue with the MMR greedy diversity selection algorithm for secondary screening and take the result of that screening as the final item recommendation result.
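The penalty adjustment can be sketched in Python as follows. The logarithmic penalty β · log(1 + pop) is an assumption chosen because it grows smoothly with popularity; the exact expression of formula (9) may differ. The function name, β value, and sample data are hypothetical.

```python
import math
from typing import Dict


def penalized_scores(scores: Dict[str, float], popularity: Dict[str, int],
                     beta: float) -> Dict[str, float]:
    """Subtract an assumed penalty beta * log(1 + popularity) from each score."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    return {item: s - beta * math.log1p(popularity.get(item, 0))
            for item, s in scores.items()}


# A blockbuster with 10,000 viewers versus a niche item with 10 viewers.
scores = {"hit": 1.0, "niche": 0.9}
popularity = {"hit": 10_000, "niche": 10}
adjusted = penalized_scores(scores, popularity, beta=0.05)
reranked = sorted(adjusted, key=adjusted.get, reverse=True)
```

With these sample values the niche item overtakes the blockbuster after the deduction, while setting `beta` to 0 leaves the original order untouched.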
[0072] 2) MMR greedy diversity selection: after applying the above penalty adjustment, this invention can also use the Maximum Marginal Relevance (MMR) algorithm to perform a second round of filtering on the re-ranked initial candidate item set, yielding a list of $N$ recommended items that serves as the final Top-$N$ item recommendation result. MMR employs a greedy iterative selection mechanism, choosing at each step an item that is both highly relevant to the user and maximally different from the items already selected, thereby improving the overall diversity of the list. Specifically, for each candidate item $i$ in the initial candidate item set that has not yet been selected into the recommended item list (the list can be called the selected item set $S$), MMR defines the following scoring function with respect to the currently selected item set $S$:
$$\mathrm{MMR}_u(i \mid S) = \gamma\, \mathrm{rel}(u,i) - (1-\gamma)\, \mathrm{sim}(i,S) \quad (10)$$
where $\mathrm{MMR}_u(i \mid S)$ denotes the score, for user $u$, of a candidate item $i$ that is not yet included in the currently selected item set $S$; $\mathrm{rel}(u,i)$ denotes the relevance score of item $i$ for user $u$; $\mathrm{sim}(i,S)$ denotes the similarity between item $i$ and the items in the currently selected item set $S$, which measures the redundancy that adding the item would introduce into the list (the higher the similarity, the greater the redundancy); and $\gamma \in [0,1]$ is a trade-off factor. The larger $\gamma$ is, the more emphasis is placed on relevance (higher weight of the first term); conversely, the smaller it is, the more emphasis is placed on diversity (higher weight of the second term). Each iteration of the MMR algorithm selects the item with the highest score according to formula (10), adds it to the set $S$, and removes it from the initial candidate item set, until $S$ contains $N$ recommendation results. In the first iteration, the selected item set $S$ may contain only the item ranked first in the initial candidate item set after the penalty adjustment. The item-to-item similarity in the formula can directly reuse the similarity calculated in formula (5), or other similarity measures can be used as needed (such as similarity based on item content attributes, if additional content information is available). Through the greedy selection of MMR, the final recommendation list significantly reduces internal similarity while maintaining accuracy, thereby covering more diverse content. The time complexity of the re-ranking algorithm is approximately O(n) (stated in terms of the size of the candidate item set), which is acceptable in a real-world system.
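The greedy MMR selection can be sketched in Python as follows. The sketch takes the redundancy of a candidate to be its maximum similarity to any already selected item, which is the usual MMR convention and is an assumption here; the trade-off parameter `gamma`, the similarity callback, and the sample data are hypothetical.

```python
from typing import Callable, Dict, List


def mmr_select(relevance: Dict[str, float],
               similarity: Callable[[str, str], float],
               n: int, gamma: float = 0.7) -> List[str]:
    """Greedily pick n items maximizing gamma*rel - (1-gamma)*redundancy."""
    remaining = dict(relevance)
    selected: List[str] = []
    while remaining and len(selected) < n:
        def mmr(item: str) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(item, j) for j in selected),
                             default=0.0)
            return gamma * remaining[item] - (1.0 - gamma) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected


# X and Y are near-duplicates; Z is different from both.
pairs = {frozenset(("X", "Y")): 0.9}
sim = lambda i, j: pairs.get(frozenset((i, j)), 0.1)
picked = mmr_select({"X": 1.0, "Y": 0.95, "Z": 0.8}, sim, n=2, gamma=0.5)
```

In this example Y is more relevant than Z but is passed over in the second step because it is too similar to the already selected X; with `gamma=1.0` the selection falls back to pure relevance order.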
[0073] Furthermore, in some embodiments of this invention, before or after performing the MMR greedy diversity selection, the weight parameters of each objective in the re-ranking module can be adjusted using pre-set diversity and fairness constraints, i.e., the ranking can be adjusted to obtain an updated re-ranked set of candidate items. The business side can control the exposure ratio of different types of results in the recommendation list. For example, by adding diversity and fairness constraints during the recommendation result output stage to optimize the final list, long-tail content can be required to occupy at least a certain proportion in the Top-N list, limiting the repetition of similar items and achieving a more balanced recommendation. Experimental results show that this method improves the coverage and type richness of recommendation results while maintaining the Precision metric essentially unchanged, verifying the effectiveness of the re-ranking mechanism described in the patent in improving recommendation fairness and user satisfaction.
[0074] Step S160 is preferably completed online by the diversity reordering module.
[0075] After the above steps, the HybridCFPlus framework constructed in this invention can output the top $N$ recommended items that both match user interests and are novel and diverse, serving as the Top-$N$ recommendation list. It should be noted that, in order to ensure a strict chronological order in the recommendation process, this method uses only historical data when calculating bias, similarity, and popularity statistics. This ensures that the model does not utilize future information, making the evaluation results closer to real-world online performance.
[0076] Compared with the prior art, the beneficial effects of the present invention are mainly reflected in the following aspects: 1) Improved recommendation accuracy and timeliness: by removing interfering global and individual biases through residual modeling and introducing time decay to focus on the latest user behavior, the accuracy of neighborhood collaborative filtering is significantly improved. Experimental results show that, compared with the traditional UserCF / ItemCF methods, this invention improves precision, recall, and NDCG (Normalized Discounted Cumulative Gain). For example, on a real dataset (Yelp implicit feedback), precision is improved by approximately 40%, recall by approximately 40%, and NDCG by approximately 8%. This indicates that the method captures the user's current interests more accurately, resulting in more relevant recommendation results.
[0077] 2) Increased diversity of recommendation results and mitigated popularity bias: by incorporating the anti-popularity penalty and MMR diversity re-ranking into the scoring, this invention significantly reduces over-concentration in the recommendation list, thereby increasing the exposure of long-tail items and the novelty of the list. In experiments, this method achieved higher or comparable coverage and novelty metrics compared with existing baselines while maintaining accuracy. For example, compared with the catalog coverage of approximately 38.7% achieved by UserCF, this invention achieved approximately 39.9%, and its recommendation lists cover a wider variety of items. Users receive more diverse recommendations, which avoids monotony and helps them discover less popular items within a broad spectrum of interests. At the same time, the system reduces the bias of being dominated by a few popular items, improving overall fairness.
[0078] 3) Transparent and efficient model that is easy to deploy and extend: this invention is based on a memory-based neighborhood method, eliminating the need to train complex deep models. The main computations can be completed offline through preprocessing, with only a small amount of computation and greedy sorting required in the online phase, offering advantages such as low latency and low computational resource consumption. At the same time, each component (bias terms, similarity, and weight parameters) has a clear physical meaning, facilitating parameter tuning and interpretation and meeting industry requirements for interpretability and controllability. Furthermore, the framework has a clear structure and is easily integrated with existing recommendation systems, and the integration of user-side and item-side signals provides a new approach to solving cold-start and data sparsity problems. In summary, this invention achieves a comprehensive improvement in recommendation performance while ensuring simplicity and efficiency.
[0079] The following example of an item recommendation system based on hybrid collaborative filtering illustrates a specific implementation of this invention. Suppose an online video platform wants to use the method of this invention to provide movie recommendations to users. Scenario assumption: user U1 has watched multiple movies over a period of time. For example, U1 watched movie M1 in the past week and movie M2 several months ago. The item recommendation system based on hybrid collaborative filtering first calculates, from historical data, the global average rating and the bias values for each user and item (for example, finding that user U1's average rating is relatively high and that some popular movies have a positive bias). For the interaction between user U1 and movie M1, because M1 was watched recently and U1 has a high preference for it, it is possible that ... and ... is slightly below 1, so the residual is positive, indicating that U1's preference for M1 is higher than average; conversely, for M2, which was watched earlier, the residual may be close to 0 or negative (indicating that interest has cooled or that M2 is already popular among the general public).
[0080] Offline preparation: based on historical data statistics, the system precomputes the data needed to calculate similarity for all users and items. For example, it computes the set $I_u$ of movies each user has watched and the audience set $U_i$ of each film. Based on these, the user-user and item-item similarity matrices are calculated (applying the time decay factor, the inverse frequency factors, and the shrinkage factor). The neighborhood size is also set, for example by retaining a preset number of most similar neighbor users for each user and a preset number of most similar neighbor items for each item, and this neighborhood information is stored for online queries. In this process, the system uses only past interaction records to ensure that the calculated biases and similarities do not leak future information.
[0081] Online recommendation calculation: when a recommendation needs to be generated for user U1, the item recommendation system based on hybrid collaborative filtering performs the following steps. Residual calculation: user U1's most recent interactions (e.g., with M1 and M2) are acquired in real time, and the residual preferences of U1 for each watched film (here M1 and M2) are calculated based on the pre-calculated global, user, and item biases. At the same time, their time weights are calculated from the difference between the current time and the viewing time (since M1 was watched recently, its weight is larger; M2 was watched longer ago, so its weight is smaller).
[0082] User neighborhood contribution calculation: the system retrieves the neighbor user set of user U1 from the pre-stored list of user neighbors (these neighbors are users whose viewing preferences are similar to those of U1, for example users who have recently watched similar types of movies). For each candidate film, the system aggregates the preference contributions of all neighbors for that film, that is, it calculates the user-side prediction score of user U1 for the film. For example, suppose user U2 is a neighbor of U1 and U2 recently showed a high residual preference for a movie X. Then movie X receives a significant positive contribution from user U2, increasing its recommendation score for user U1. Conversely, if the high rating of a movie Y mainly comes from users whose interests are dissimilar to those of user U1, it does not gain a significant score for user U1 through neighbor aggregation.
[0083] Item neighborhood contribution calculation: at the same time, the system uses a pre-stored list of similar items to obtain a set of similar movies for each movie watched by user U1. For example, for movie M1 watched by user U1, it finds the movies most similar to M1 (for instance, movies of the same genre or with the same lead actors). For each candidate movie among these similar movies, the prediction scores of user U1 for movies similar to those already watched are accumulated according to formula (7). Thus, if video Z is very similar to video M1, which user U1 likes, and user U1 has a positive residual preference for video M1, then video Z receives a higher item-side score. The system performs this calculation for all candidates similar to the videos watched by user U1, obtaining an array of item-side scores.
[0084] Score fusion and candidate item set generation: The system fuses the user-side scores and item-side scores with weights according to formula (8) and adds the bias terms back to obtain U1's initial overall score for each candidate movie. The system then selects the K highest-scoring movies from all movies that U1 has not yet watched to form the candidate item set. With K=100, for example, the candidate list includes movies X, Y, Z, etc. (content that the neighborhoods suggest may interest U1). At this point, if U1 has watched few movies, the contribution from the item side may be greater; conversely, if U1 has a larger user neighborhood, the contribution from the user side will be greater. The fused score ensures that the two signals complement each other.
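The fusion and top-K selection can be illustrated as follows. Formula (8) is not reproduced in the source text; this sketch assumes a convex combination with a single hypothetical weight `alpha`, and all bias values and scores are made up for the example.

```python
import heapq

def fuse(user_score, item_score, alpha, mu, b_u, b_i):
    """Assumed fusion: weighted sum of the two signals plus the removed biases."""
    return alpha * user_score + (1.0 - alpha) * item_score + mu + b_u + b_i

def top_k_candidates(user_scores, item_scores, item_bias, watched, k,
                     alpha=0.5, mu=3.5, b_u=0.2):
    """Score every unwatched item seen by either side and keep the k best."""
    items = (set(user_scores) | set(item_scores)) - set(watched)
    fused = {
        i: fuse(user_scores.get(i, 0.0), item_scores.get(i, 0.0),
                alpha, mu, b_u, item_bias.get(i, 0.0))
        for i in items
    }
    return heapq.nlargest(k, fused.items(), key=lambda kv: kv[1])

# Hypothetical inputs for user U1
user_scores = {"X": 0.9, "Y": 0.15}
item_scores = {"Z": 0.36, "Y": -0.22, "M1": 0.5}
item_bias = {"X": 0.1, "Y": 0.0, "Z": -0.2}

candidates = top_k_candidates(user_scores, item_scores, item_bias,
                              watched={"M1", "M2"}, k=2)
```

With these numbers the already-watched M1 is excluded and the two retained candidates are X followed by Z.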
[0085] Diversity re-ranking: For the candidate item set, the system further calculates the popularity of each movie and applies formula (9) to adjust the scores. Suppose the candidates include a popular blockbuster H with a very large number of viewers in the training set; its popularity penalty is then relatively high, and after the deduction its effective score will be slightly lower than that of some niche movies. Subsequently, the system uses the MMR (maximum marginal relevance) algorithm to select the final top-N recommendations (e.g., N=10). First, the movie with the highest effective score is selected as the first item in the result list, for example movie X (assuming X has both high relevance and some novelty). Next, the MMR scores of the remaining candidates are calculated. If a movie Y is very similar in genre to the selected movie X (resulting in a high similarity score), then even if Y's relevance score is not low, its overall score may be reduced by the similarity term and fall below that of a movie Z with a different style. In this case, the second step might select Z instead of Y so that the recommendation list covers different genres. This process iterates, with the third movie potentially differing significantly from the first two, and so on, until all 10 movies are selected. After MMR processing, U1's final recommendation list includes movies X, Z, etc., with diverse themes, which greatly reduces homogenization. For example, if the top 3 candidates before re-ranking were all action movies, MMR re-ranking might retain the most relevant action movie X while introducing a comedy Z and a documentary W, enriching the list. These top 10 movies are ultimately presented to the user as the recommendation result.
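The two re-ranking stages can be sketched in Python. Formula (9) is not reproduced in the source text, so the logarithmic penalty with coefficient `gamma` is an assumption; the MMR step uses the standard textbook form (relevance weighted by `lam`, minus the maximum similarity to already selected items), which may differ in detail from the patent's formula. Genres, scores, and viewer counts are hypothetical.

```python
import math

def penalized(scores, popularity, gamma):
    """Assumed popularity penalty: subtract gamma * log(1 + viewer count)."""
    return {i: s - gamma * math.log1p(popularity[i]) for i, s in scores.items()}

def mmr(rel, sim, n, lam):
    """Greedy MMR: trade relevance against similarity to items already picked."""
    selected, remaining = [], set(rel)
    while remaining and len(selected) < n:
        def mmr_score(i):
            redundancy = max((sim(i, j) for j in selected), default=0.0)
            return lam * rel[i] - (1.0 - lam) * redundancy
        best = max(sorted(remaining), key=mmr_score)  # sorted for stable ties
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidates: H is a blockbuster, Z is the only comedy
scores = {"X": 4.3, "Y": 4.2, "Z": 4.0, "H": 4.4}
popularity = {"X": 500, "Y": 400, "Z": 300, "H": 100000}
genre = {"X": "action", "Y": "action", "Z": "comedy", "H": "action"}

def genre_sim(i, j):
    return 0.9 if genre[i] == genre[j] else 0.1

effective = penalized(scores, popularity, gamma=0.05)
picks = mmr(effective, genre_sim, n=3, lam=0.7)
```

In this example the blockbuster H starts with the highest raw score but drops below X after the penalty, and MMR places the comedy Z ahead of the second action movie Y.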
[0086] Through the above implementation steps, user U1 not only receives movies closely matched to their interests, but also sees a list with diverse genres and styles that includes both popular movies and niche gems, thereby improving user satisfaction and the platform's long-tail benefits. It should be emphasized that the parameters of each module in this invention can be adjusted according to application requirements; for example, the optimal parameter values can be selected on a validation set to adapt to different datasets and business objectives. In practical implementation, each step can also be optimized (e.g., using inverted indexes to accelerate neighborhood calculations, batch calculations to reduce latency, etc.) to ensure the algorithm runs efficiently in real-world systems.
[0087] The HybridCFPlus recommendation framework proposed in this invention is applicable to a wide range of personalized recommendation scenarios. For example: 1) E-commerce product recommendations: This feature can be used on e-commerce websites to recommend products based on users' browsing and purchase history. Bias correction avoids recommending only popular items, time decay ensures recommendations keep pace with users' recent interests (such as seasonal preferences), and diversity reordering allows recommendations to cover different categories and brands, enhancing the user's shopping exploration experience.
[0088] 2) Audio-visual content recommendation: This method is applied to film, music, and short video platforms to recommend movies, songs, or videos based on users' viewing and liking records. Time-based optimization ensures newly released content appears promptly, while anti-popularity mechanisms prevent top-ranked songs / TV series from monopolizing recommendations, increasing exposure for niche content. Diverse lists provide users with comprehensive entertainment content recommendations.
[0089] 3) News and Social Media Feeds: The platform can recommend news articles or social media posts based on users' reading history using this method. Bias processing reduces the overwhelming effect of headlines on all users, time decay highlights the latest trending topics, and MMR ensures that recommended news is diverse in its themes, not limited to a single topic, thus satisfying users' diverse reading interests.
[0090] 4) Other fields: For example, in any data-driven scenario involving user-item interaction, such as digital book recommendations, app store recommendations, and online advertising recommendations, the method of this invention can be used to improve the quality and diversity of recommendations.
[0091] In terms of extensibility, the HybridCFPlus framework can be improved according to actual needs: 1) Incorporating content features: When calculating the similarity between items, the content attribute similarity of the items can be fused with the collaborative filtering signal, or content similarity can be used directly as the similarity in MMR. This further enhances the diversity of recommendation results (especially in scenarios with rich attribute information).
[0092] 2) Adaptive fusion: This invention uses fixed weights to fuse user-side and item-side scores, but it can easily be extended to adaptive weighting, for example by adjusting the weights dynamically based on the user's historical behavior count. To mitigate the cold start problem, the item-side weight is increased for users with few interactions, and the user-side weight is increased for users with many interactions so as to use more personalized information. This improvement allows the model to perform better for different user groups.
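One possible shape for such an adaptive weight is sketched below. The saturating form n / (n + c), the constant `c`, and the clamping bounds are all hypothetical choices made for illustration; the source text only states that the user-side weight should grow with the user's interaction count.

```python
def adaptive_alpha(n_interactions, c=20.0, alpha_min=0.1, alpha_max=0.9):
    """User-side fusion weight that grows with the user's interaction count.

    The item-side weight is taken to be 1 - alpha, so sparse users lean on
    item neighborhoods and active users lean on user neighborhoods.
    """
    raw = n_interactions / (n_interactions + c)
    return min(alpha_max, max(alpha_min, raw))

cold_user = adaptive_alpha(0)        # new user: mostly item-side signal
typical_user = adaptive_alpha(20)    # balanced
heavy_user = adaptive_alpha(1000)    # mostly user-side signal
```

The clamping keeps both signals active for every user, so neither side is ever switched off entirely.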
[0093] 3) Advanced normalization strategy: Other mature normalization methods, such as the BM25 formula from the field of information retrieval, can be introduced into the similarity calculation to replace the current IUF (inverse user frequency) plus shrinkage scheme, so as to obtain a more refined smoothing of co-occurrence counts.
[0094] 4) Exposure bias correction: By combining causal inference or techniques such as IPS (Inverse Propensity Scoring) and CRM (Counterfactual Risk Minimization), the item exposure rate is taken into account when calculating the bias or the objective function, so that high-quality items with a small audience receive more attention and recommendation fairness improves. This is naturally compatible with the framework of this invention and can be incorporated into the residual stage or the objective optimization.
[0095] 5) Integrating Lightweight Graph Models: This invention focuses on neighborhood methods, but can also be combined with new technologies such as graph neural networks or contrastive learning. For example, contrastive learning on the user-item graph can enhance the representation to a more robust form, which can be used as additional features for MMR re-ranking and may further improve the results. While ensuring controllable real-time computation, these self-supervised signals can improve the model's noise resistance.
[0096] In summary, HybridCFPlus provides a concise and modular recommendation framework that, in its current form, improves on the performance of traditional collaborative filtering. Its components can be independently improved or replaced to adapt to different application scenarios. For example, in large-scale systems, the neighborhood size and parallel computing scheme can be adjusted to meet performance requirements; in applications that emphasize novelty, the relevant re-ranking parameters can be raised or lowered to place further emphasis on diversity. Given its flexibility, the method of this invention is expected to be widely used in industry and can serve as a foundation for further research and development in the field of recommender systems.
[0097] Corresponding to the above method, the present invention also provides an item recommendation system based on hybrid collaborative filtering. The system includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method described above.
[0098] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the aforementioned item recommendation method based on hybrid collaborative filtering. The computer-readable storage medium can be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, register, floppy disk, hard disk, removable storage disk, CD-ROM, or any other form of storage medium known in the art.
[0099] The item recommendation method and system based on hybrid collaborative filtering of this invention can incorporate time decay factors to reflect recent changes in user interests after removing global and individual biases. It also improves the recall rate of recommendation results by combining neighborhood signals from both the user and item sides, and effectively alleviates the problem of over-concentration in recommendation results through diversity reordering. Thus, it improves novelty and long-tail coverage while ensuring recommendation relevance. In short, this invention aims to overcome the three major limitations of traditional neighborhood recommendation: preference bias, insufficient timeliness, and list redundancy.
[0100] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this invention are programs or code segments used to perform the desired tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried in a carrier wave.
[0101] It should be clarified that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of the present invention.
[0102] In this invention, features described and / or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and / or combined with or in place of features of other embodiments.
[0103] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations of the embodiments of the present invention are possible. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for item recommendation based on hybrid collaborative filtering, characterized in that the method includes the following steps: acquiring historical user interaction behavior data, performing residual preprocessing on the interaction behaviors in the historical user interaction behavior data to remove global bias and individual bias, and obtaining each user's residual preference values for the items that the user has interacted with; wherein the historical user interaction behavior data includes a user set, an item set, and a user-item interaction record set, and the individual bias includes user bias and item bias; calculating the time decay factor of each interaction behavior based on the interval between the current recommendation time and the interaction time of the interaction behavior; calculating user-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the item, and obtaining the neighborhood set of each user based on the user-based neighborhood similarity; calculating item-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the user, and obtaining the neighborhood set of each item based on the item-based neighborhood similarity; calculating the target user's user-side prediction score for a target item based on the neighborhood similarity of neighboring users in the target user's neighborhood set and the residual preference values of the neighboring users for the target item; calculating, based on the neighborhood similarity of neighboring items in the neighborhood sets of the items that the target user has interacted with and the residual preference values of the target user for the items that have been interacted with, the target user's item-side prediction score for items similar to the items that have been interacted with; generating the target user's initial score for each item by weighting and combining the user-side prediction score and the item-side prediction score and adding back the global bias and individual bias removed in the residual preprocessing; generating, based on the initial scores, an initial candidate item set with an initial ranking for the target user; re-ranking the candidate items in the initial candidate item set by introducing an anti-popularity penalty index to obtain a re-ranked candidate item set; and obtaining the item recommendation result for the target user based on the re-ranked candidate item set.
2. The method according to claim 1, characterized in that the step of performing residual preprocessing on the historical user interaction behavior data to remove global and individual biases, and obtaining the user's residual preference value for the interacted items, includes: calculating the user's residual preference value for the interacted items based on the following formula: $\tilde{r}_{ui} = r_{ui} - \mu - b_u - b_i$; where $\tilde{r}_{ui}$ represents the residual preference value after removing the global and individual biases, $r_{ui}$ represents the interaction intensity or explicit rating, $\mu$ represents the global average bias, $b_u$ represents the user bias, and $b_i$ represents the item bias.
3. The method according to claim 1, characterized in that the time decay factor is calculated based on the following formula: [formula not reproduced]; where the terms of the formula denote, respectively, the time decay factor of the interaction behavior between a user and an item, the time interval between the current recommendation time and the user's most recent interaction with the item, and the decay constant; the inverse frequency factor of the item is calculated based on the following formula: [formula not reproduced]; where the terms denote, respectively, the inverse frequency factor of the item and the number of users who have interacted with the item; and the inverse frequency factor of the user is calculated based on the following formula: [formula not reproduced]; where the terms denote, respectively, the inverse frequency factor of the user and the number of items that the user has interacted with.
4. The method according to any one of claims 1-3, characterized in that the method further includes: capturing interest shifts by comparing the similarity between the user's latest behavior and the user's historical preferences; when an interest shift is captured, adjusting the time decay factor to give higher weight to the user's latest behavior; and updating the current user's neighborhood set and the user's residual preference values for interacted items in real time; wherein the fusion weight coefficient for the weighted combination of the user-side prediction score and the item-side prediction score is dynamically calculated based on a preset fusion weight coefficient and its matching relationship with the user-item historical interaction density.
5. The method according to claim 1, characterized in that calculating user-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the item includes: calculating the user-based neighborhood similarity based on the following formula, which introduces the time decay factor, the inverse frequency factor of the item, and a first shrinkage factor: [formula not reproduced]; where the terms of the formula denote, respectively: the neighborhood similarity between a first user and a second user; the sets of items interacted with by the first user and by the second user; the time decay factor of the interaction behavior between the first user and an item; the time decay factor of the interaction behavior between the second user and that item; the numbers of items that the first user and the second user have interacted with; the inverse frequency factor of the item; and the first shrinkage factor; and calculating item-based neighborhood similarity by introducing the time decay factor and the inverse frequency factor of the user includes: calculating the item-based neighborhood similarity based on the following formula, which introduces the time decay factor, the inverse frequency factor of the user, and a second shrinkage factor: [formula not reproduced]; where the terms of the formula denote, respectively: the neighborhood similarity between a first item and a second item; the sets of users who have interacted with the first item and with the second item; the time decay factor of the interaction behavior between a user and the first item; the time decay factor of the interaction behavior between that user and the second item; the numbers of users who have interacted with the first item and with the second item; the inverse frequency factor of the user; and the second shrinkage factor.
6. The method according to claim 1, characterized in that calculating the target user's user-side prediction score for the target item based on the neighborhood similarity of neighboring users in the target user's neighborhood set and the residual preference values of the neighboring users for the target item includes: calculating the target user's user-side prediction score for the target item based on the following formula: [formula not reproduced]; where the terms of the formula denote, respectively: the target user's user-side prediction score for the target item; a neighboring user's residual preference value for the target item; the neighborhood similarity between the target user and the neighboring user; and the neighborhood set of the target user; and calculating the target user's item-side prediction score for items similar to those already interacted with, based on the neighborhood similarity of neighboring items in the neighborhood sets of the items already interacted with and the residual preference values of the target user for those items, includes: calculating the item-side prediction score based on the following formula: [formula not reproduced]; where the terms of the formula denote, respectively: the target user's item-side prediction score for an item similar to an item already interacted with; the target user's residual preference value for the item already interacted with; the neighborhood similarity between the two items; and the neighborhood set of the item.
7. The method according to claim 1, characterized in that re-ranking the candidate items in the initial candidate item set by introducing an anti-popularity penalty index to obtain a re-ranked candidate item set includes: determining the penalty value corresponding to the popularity index of each item in the initial candidate item set; subtracting the corresponding penalty value from the initial score of each item in the initial candidate item set to obtain the updated score of each item; and re-ranking the items based on the updated scores to obtain the re-ranked candidate item set.
8. The method according to claim 7, characterized in that obtaining the item recommendation result for the target user based on the re-ranked candidate item set includes: selecting a first number of highest-scoring items from the re-ranked candidate item set to form the final candidate item set, which serves as the item recommendation result for the target user.
9. The method according to claim 1, characterized in that re-ranking the candidate items in the initial candidate item set by introducing an anti-popularity penalty index to obtain a re-ranked candidate item set, and obtaining the item recommendation result for the target user based on the re-ranked candidate item set, includes: determining the penalty value corresponding to the popularity index of each item in the initial candidate item set; subtracting the corresponding penalty value from the initial score of each item in the initial candidate item set to obtain the updated score of each item, and re-ranking the items based on the updated scores; and using the maximum marginal relevance algorithm to perform a secondary screening of the items in the re-ranked candidate item set to obtain an item recommendation list containing a first number of items, which serves as the item recommendation result for the target user; wherein each iteration of the secondary screening is based on the following formula: [formula not reproduced]; where the terms of the formula denote, respectively: a candidate item in the initial candidate item set that has not yet been selected into the user's currently selected item set; the scoring function evaluated against the currently selected item set; the relevance score between the item and the user; the similarity between the item and the items in the user's currently selected item set; and the trade-off factor.
10. The method according to claim 7, characterized in that, after re-ranking the candidate items based on the updated scores to obtain the re-ranked candidate item set, the method further includes: adjusting the current ranking using preset diversity and fairness constraints to obtain an updated re-ranked candidate item set.
11. An item recommendation system based on hybrid collaborative filtering, comprising a processor, a memory, and a computer program / instructions stored in the memory, characterized in that the processor is configured to execute the computer program / instructions, and when the computer program / instructions are executed, the system implements the steps of the method as described in any one of claims 1 to 10.
12. A computer-readable storage medium having a computer program / instructions stored thereon, characterized in that, when the computer program / instructions are executed by a processor, they implement the steps of the method as described in any one of claims 1 to 10.