Negative sample selection method against single-type collaborative filtering problems

A collaborative filtering negative-sample technology, applied in the field of Internet recommendation, which addresses the problems that existing methods consider neither the influence of the user's social relationships on interests nor the user's interest characteristics.

Active Publication Date: 2017-12-01
UNIV OF ELECTRONIC SCI & TECH OF CHINA


Problems solved by technology

The problem with these negative sample selection methods is that they basically select items randomly, without considering the user's interest characteristics or the influence of the user's social relationships.

Abstract

The invention provides a negative sample selection method against single-type collaborative filtering problems. The method aims to help select negative samples for each user in implicit-feedback scenarios, where no explicit negative samples exist, for recommendation algorithms based on machine learning models. The method comprises the specific steps that (1) the weight of each item being selected as a negative sample is calculated based on the item's popularity; (2) the weight of each item being selected as a negative sample is calculated based on the social relations between users; (3) the weight of each item being selected as a negative sample is calculated based on the user's preferences for item features; (4) the popularity weight, social relation weight, and feature weight of each item are fused to calculate the probability that the item is selected as a negative sample of a target user; and (5) the items with the highest negative sample probabilities are selected as the negative samples of the target user, in a certain proportion to the number of the target user's positive samples.

Application Domain

Marketing; Special data processing applications

Technology Topic

Selection method; Data mining +5


Examples


Example Embodiment

[0062] Example
[0063] Suppose a user set {a,b,c,d,e,f} of 6 users has behaviors on an item set {item1,item2,...,item10} of 10 items; the user behavior records are shown in Table 1. The social relationship here is assumed to be a one-way following relationship: if user a follows user b, then b is a friend of a, but a is not a friend of b. This embodiment describes in detail the process of selecting negative samples for user a. Figure 6 is a schematic diagram of the friends obtained from the following relationships in this embodiment.
[0064] Table 1 User behavior data
[0066] Step 1: For each user u, calculate the popularity weights of the items that u has not acted on. The specific steps are:
[0067] First, count the total number of actions performed by users for each item in the data set. The results are shown in Table 2:
[0068] Then, sort the items in ascending order according to the number of behaviors: item6, item8, item9, item10, item3, item4, item5, item7, item2, item1;
[0069] Table 2 Statistics of the number of actions performed on each item
[0071] Third, distribute the 10 items evenly over 4 levels according to the number of actions; the number of items in each level is ⌈10/4⌉ = 3. The resulting level assignment is: item6, item8, and item9 belong to level1; item10, item3, and item4 belong to level2; item5, item7, and item2 belong to level3; item1 belongs to level4;
[0072] Fourth, mark the popularity of each item according to the level to which each item belongs, as shown in Table 3:
[0073] Table 3 Popularity of items
[0074] popularity 4: item1; popularity 3: item2, item5, item7; popularity 2: item3, item4, item10; popularity 1: item6, item8, item9
[0075] Fifth, use formula (1) to calculate the popularity weight w_p(i) of each item, where α = 0.5 and k is the item's level:
[0076] w_p(item1) = 1 + α·k = 1 + 0.5×4 = 3, and in the same way:
[0077] w_p(item2) = w_p(item5) = w_p(item7) = 1 + 0.5×3 = 2.5
[0078] w_p(item3) = w_p(item4) = w_p(item10) = 1 + 0.5×2 = 2
[0079] w_p(item6) = w_p(item8) = w_p(item9) = 1 + 0.5×1 = 1.5
[0080] Sixth, normalize the popularity weight of each item to the range [0,1] by formula (2):
[0081] w_p(item1) = (3 − 1.5)/(3 − 1.5) = 1
[0082] w_p(item2) = w_p(item5) = w_p(item7) = (2.5 − 1.5)/(3 − 1.5) ≈ 0.667
[0083] w_p(item3) = w_p(item4) = w_p(item10) = (2 − 1.5)/(3 − 1.5) ≈ 0.333
[0084] w_p(item6) = w_p(item8) = w_p(item9) = (1.5 − 1.5)/(3 − 1.5) = 0
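Step 1 above can be sketched as follows; the equal-level partition and the min-max form of the normalization in formula (2) are assumptions inferred from the worked numbers:

```python
import math

def popularity_weights(action_counts, num_levels=4, alpha=0.5):
    """Popularity weight per item: sort items by action count (ascending),
    split them evenly into num_levels levels, set w_p(i) = 1 + alpha * level,
    then min-max normalize to [0, 1]. The level split and the normalization
    are assumed forms of the patent's formulas (1) and (2)."""
    items = sorted(action_counts, key=lambda i: action_counts[i])
    per_level = math.ceil(len(items) / num_levels)
    level = {item: idx // per_level + 1 for idx, item in enumerate(items)}
    raw = {item: 1 + alpha * level[item] for item in items}
    lo, hi = min(raw.values()), max(raw.values())
    return {item: (raw[item] - lo) / (hi - lo) for item in items}
```

With 10 items split over 4 levels this reproduces the raw weights 3, 2.5, 2, and 1.5 of the example before normalization.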
[0085] Step 2: According to each user's social relationships, calculate the weight w_s(u,i) with which an item i that user u has not acted on is selected as a negative sample of u, considering the influence of social relationships. The specific steps are:
[0086] First, for a specific user, say user a in the data set, calculate the set item_DIFF(a) of items that user a's friends have acted on but user a has not. From the data set, the friends followed by user a are b, c, and d, and the statistics give item_DIFF(a) = {item1, item4, item6, item7};
[0087] Then, create an inverted index table from each item in item_DIFF(a) to user a's friends: for item i in item_DIFF(a), the table element a[i][x] = 1 if friend x has acted on i, otherwise a[i][x] = 0. The inverted index table is shown in Table 4:
[0088] Table 4 Items-Friends inverted list
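The construction of the inverted index of Table 4 can be sketched as below; beyond what the text states (friend b acted on item1 and item7), which friend acted on which item is not recoverable here, so fuller behavior data is hypothetical:

```python
def inverted_index(user_items, friends, u):
    """Items-to-friends inverted table for item_DIFF(u): entry [i][x] is 1
    if friend x of user u acted on item i, else 0, over the items that
    friends acted on but u did not."""
    mine = user_items[u]
    diff = {i for x in friends[u] for i in user_items[x]} - mine
    return {i: {x: int(i in user_items[x]) for x in friends[u]} for i in diff}
```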
[0090] Third, use formula (4) to calculate the weight with which item i in item_DIFF(a) is selected as a negative sample of user a based on social relationships, where the influence of friend x on user a is defined by formula (5). First calculate the number of items user a has in common with each of his friends:
[0091] overlap(a,b)=|{item2,item3,item5}∩{item1,item2,item7}|=|{item2}|=1
[0092] overlap(a,c)=|{item2,item3}|=2
[0093] overlap(a,d)=|{item2,item5}|=2
[0094] Then, calculate the influence of each of user a's friends on a according to formula (5);
[0098] Finally, calculate the weight with which each item in item_DIFF(a) = {item1, item4, item6, item7} is selected as a negative sample of user a according to the social relationships;
[0103] Repeat the above steps to obtain, for each user, the weights with which the items the user has not acted on are selected as negative samples based on that user's social relationships;
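Step 2 can be sketched as follows; the exact forms of formulas (4) and (5) are not legible in this extraction, so the sketch assumes a friend's influence is the friend's item overlap with the user normalized by the total overlap, and that w_s(u, i) sums the influence of the friends who acted on item i:

```python
def social_weights(user_items, friends, u):
    """Social-relationship weight w_s(u, i) for each item that friends of u
    acted on but u did not. Assumed forms: influence(u, x) proportional to
    |items(u) & items(x)|, and w_s summed over friends who acted on the item."""
    mine = user_items[u]
    diff = {i for x in friends[u] for i in user_items[x]} - mine
    total = sum(len(mine & user_items[x]) for x in friends[u]) or 1
    influence = {x: len(mine & user_items[x]) / total for x in friends[u]}
    return {i: sum(influence[x] for x in friends[u] if i in user_items[x])
            for i in diff}
```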
[0104] Step 3: Based on item features and user a's historical behavior, use a logistic regression model to calculate the weight w_f(a,i) with which an item i that user a has not acted on is selected as a negative sample of a. In this embodiment, assuming the items contain only text, the specific steps are:
[0105] First, for the item text content, apply the latent-feature extraction method based on the topic model LDA: all 10 item texts are input as one corpus, and LDA extracts k = 4 latent topics, yielding each item i's distribution probabilities over the 4 topics. In this embodiment the 4 latent topics are taken as the item's 4 features, and the topic distribution probabilities as the feature values. The value of each item on the 4 features is shown in Table 5:
[0106] Table 5 Item feature values
[0108] Then, train the user's preference model for item features with logistic regression. Since training requires both positive and negative samples, in this embodiment items that user a has not acted on are randomly selected as negative samples at a 1:1 ratio to the positives, giving the item-feature preference training sample set of user a shown in Table 6;
[0109] Table 6: User a's item feature preference training sample set
[0111] Based on the above training set, train the logistic regression model, shown as formula (6), to obtain user a's preference weights for the different item features. In this embodiment the weights obtained for the four features are: w_f1 = −0.04807, w_f2 = 0.1457, w_f3 = 0.0941, w_f4 = −0.1961;
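The training of the preference model can be sketched with plain stochastic gradient descent; the patent does not state the solver, learning rate, or whether a bias term is used, so those choices here are illustrative:

```python
import math

def train_lr(samples, lr=0.5, epochs=200):
    """Minimal SGD training of a logistic-regression model (formula (6)):
    samples is a list of (feature_vector, label) pairs with 0/1 labels.
    No bias term; the hyperparameters are illustrative assumptions."""
    k = len(samples[0][0])
    w = [0.0] * k
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted like probability
            for j in range(k):
                w[j] += lr * (y - p) * x[j]  # gradient ascent on log-likelihood
    return w
```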
[0112] Second, use the trained item-feature preference model of user a to calculate a's preference for the items {item1, item4, item6, item7, item8, item9, item10} that a has not acted on, as in formula (6):
[0113] like(a,item6) = 0.4916
[0114] like(a,item7) = 0.5
[0115] like(a,item8) = 0.5084
[0116] like(a,item10) = 0.5013
[0117] In the same way, the preference degrees of item1, item4, and item9 can be calculated from the four feature weights:
[0118] like(a,item1) = 0.4868, like(a,item4) = 0.4963, like(a,item9) = 0.4954
[0119] Third, calculate the feature weight with which each of the items {item1, item4, item6, item7, item8, item9, item10} that user a has not acted on is selected as a negative sample of a, as in formula (7):
[0120] w_f(a,item1) = 1 − like(a,item1) = 1 − 0.4868 = 0.5132
[0121] In the same way:
[0122] w_f(a,item4) = 0.5037, w_f(a,item6) = 0.5084, w_f(a,item7) = 0.5
[0123] w_f(a,item8) = 0.4916, w_f(a,item9) = 0.5046, w_f(a,item10) = 0.4987
[0124] Repeat the above steps to train each user's item-feature preference model and calculate, for each user, the feature weights with which items are selected as negative samples considering item-feature factors;
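Scoring in Step 3 reduces to a sigmoid over the learned feature weights; a minimal sketch, assuming no bias term in formula (6):

```python
import math

def feature_weight(pref_weights, item_features):
    """w_f(u, i) = 1 - like(u, i), where like(u, i) = sigmoid(w . f(i)) is the
    logistic-regression preference score over the item's LDA topic features
    (formulas (6) and (7)); a bias term is assumed absent."""
    z = sum(w * f for w, f in zip(pref_weights, item_features))
    return 1.0 - 1.0 / (1.0 + math.exp(-z))
```

An item whose features the user prefers gets a higher like score and therefore a lower chance of being picked as a negative sample.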
[0125] Step 4: Fuse the three weights of item popularity, user social relationship, and item-feature preference, and calculate the probability with which each of {item1, item4, item6, item7, item8, item9, item10} is selected as a negative sample of user a, as in formula (8). Assuming η1 = 0.5, η2 = 0.2, η3 = 0.3, the fused probability of each candidate item is obtained.
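The fusion of formula (8) is assumed here to be a weighted linear combination of the three weights:

```python
def negative_sample_prob(w_p, w_s, w_f, etas=(0.5, 0.2, 0.3)):
    """Fused probability that an item is selected as a negative sample
    (formula (8), assumed linear): eta1*w_p + eta2*w_s + eta3*w_f."""
    e1, e2, e3 = etas
    return e1 * w_p + e2 * w_s + e3 * w_f
```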
[0133] Step 5: For user a, sort the items a has not acted on in descending order of the calculated negative-sample probability, giving {item1, item7, item4, item10, item6, item9, item8}; at a 1:1 ratio to the number of positive samples, select the negative sample item set {item1, item7, item4}. Repeat steps 4 and 5 to obtain the negative sample sets of all users.
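Step 5's selection is a top-N cut matched to the positive-sample count:

```python
def select_negatives(probs, positives, ratio=1.0):
    """Rank candidate items by fused negative-sample probability (descending)
    and keep ratio * |positives| of them as the user's negative samples."""
    n = int(ratio * len(positives))
    return sorted(probs, key=probs.get, reverse=True)[:n]
```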
