[0062] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
[0063] Figure 1 is a schematic flowchart of the recommendation method based on multi-category joint soft clustering in an embodiment of the present invention. As shown in Figure 1, the recommendation method includes:
[0064] S11: Obtain user-item interaction information, and construct a scoring matrix and a classification matrix according to the user-item interaction information;
[0065] S12: Perform multi-category soft clustering processing on the scoring matrix and the classification matrix to obtain a multi-category soft clustering result;
[0066] S13: Use weighted non-negative matrix factorization to predict user preferences from the multi-category soft clustering result, obtaining a prediction result;
[0067] S14: Recommend the item with the highest prediction score to the user according to the prediction result.
[0068] To further explain S11:
[0069] Obtain the user-item relationship, the user-user relationship, and the item-item relationship according to the user-item interaction information; construct the scoring matrix and the classification matrix based on the user-item relationship, the user-user relationship, and the item-item relationship; the classification matrix includes a user classification matrix and an item classification matrix.
[0070] In the user-item interaction information, there are three different types of internal relationships: user-item relationship, user-user relationship, and item-item relationship.
[0071] Suppose there are n users and m items, and the only information known is the user-item rating matrix $T \in \mathbb{R}^{n \times m}$, where $T_{ij}$ represents the rating of item j by user i, $u_i$ represents the i-th user, and $y_j$ represents the j-th item.
[0072] Our goal is to classify users and items into c subcategories at the same time, where users/items can appear in multiple subcategories.
[0073] The result of MCoC can be represented by a classification matrix $P \in [0,1]^{(m+n) \times c}$, where $P_{ij}$ is the indicator value of element i (a user or an item) for the j-th subcategory, $P_{ij} \in [0,1]$. $P_{ij} > 0$ means that the i-th element belongs to the j-th subcategory, and $P_{ij} = 0$ means that it does not; the magnitude of $P_{ij}$ is the relative weight of the membership, and the weights in each row sum to 1. If the number of subcategories to which each element belongs is fixed, for example, k(1
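[0074] $P = \begin{bmatrix} Q \\ R \end{bmatrix}$
[0075] where $Q \in \mathbb{R}^{n \times c}$ is the user classification matrix and $R \in \mathbb{R}^{m \times c}$ is the item classification matrix.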
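As a minimal sketch of the data structures introduced so far, the following Python code builds a rating matrix T from interaction records and stacks a user block Q on an item block R to form P. The `(user, item, rating)` triple format, the use of 0 for "not rated", and the function names are assumptions made for illustration only; they are not specified in the text.

```python
def build_rating_matrix(interactions, n_users, n_items):
    """Return an n_users x n_items matrix T (list of lists).

    T[i][j] is the rating of item j by user i; 0.0 marks an unrated pair
    (an assumption of this sketch).
    """
    T = [[0.0] * n_items for _ in range(n_users)]
    for user, item, rating in interactions:
        T[user][item] = float(rating)
    return T


def stack_classification(Q, R):
    """Stack the user block Q (n x c) on top of the item block R (m x c)."""
    return [list(row) for row in Q] + [list(row) for row in R]


# Hypothetical interaction records: (user index, item index, rating).
interactions = [(0, 0, 5), (0, 2, 1), (1, 1, 4), (2, 0, 4), (2, 1, 2)]
T = build_rating_matrix(interactions, n_users=3, n_items=3)
```

The stacked matrix has one row per user followed by one row per item, which matches the (m+n) rows of P.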
[0076] To further explain S12:
[0077] Construct a shared low-order space matrix according to the scoring matrix and the classification matrix; perform iterative multi-category clustering on the low-order space matrix by minimizing an objective function, obtaining an objective function iteration value; compare the objective function iteration value with the iteration threshold; if the objective function iteration value is less than the iteration threshold, stop the iteration and obtain the clustering low-order space matrix; otherwise, continue the iterative calculation; normalize each row of the clustering low-order space matrix to obtain the multi-category soft clustering result.
[0078] The first step is to construct the shared low-order space matrix. The three relationships are considered simultaneously, and a loss function is formulated for each, converting the clustering problem into a loss minimization problem:
[0079] 1) User-item relationship
[0080] If a user gives an item a high rating, the user and the item are more likely to appear together in the same subcategory; to group such strongly related elements, the following loss function is proposed for the user-item relationship:
[0081]
[0082] where $q_i$ is the i-th row of Q, $r_j$ is the j-th row of R, and the two diagonal matrices are the degree matrix of the users and the degree matrix of the items, respectively.
[0083] This loss function is easy to interpret, given that only user-item information is known: minimizing it means that, for a user-item pair with a high rating, the indicator vector of user i and the indicator vector of item j in the result matrix P must be very close.
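The behavior described above can be illustrated with a small sketch. The exact normalization is an assumption here: each indicator vector is divided by the square root of its degree (the row sum of T for a user, the column sum for an item), and the squared distance is weighted by the rating $T_{ij}$. The function name `user_item_loss` is hypothetical.

```python
import math


def user_item_loss(T, Q, R):
    """Rating-weighted squared distance between degree-normalized indicator vectors.

    Assumed form: sum_ij T[i][j] * || q_i / sqrt(d_i) - r_j / sqrt(e_j) ||^2,
    where d_i is the row sum of T for user i and e_j the column sum for item j.
    Ratings are assumed non-negative, with 0 meaning "not rated".
    """
    n, m = len(T), len(T[0])
    row_deg = [sum(T[i]) for i in range(n)]
    col_deg = [sum(T[i][j] for i in range(n)) for j in range(m)]
    loss = 0.0
    for i in range(n):
        for j in range(m):
            if T[i][j] == 0:
                continue  # unrated pairs contribute nothing
            dist2 = sum(
                (q / math.sqrt(row_deg[i]) - r / math.sqrt(col_deg[j])) ** 2
                for q, r in zip(Q[i], R[j])
            )
            loss += T[i][j] * dist2
    return loss


T = [[4.0, 0.0], [0.0, 2.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
aligned = user_item_loss(T, Q, [[1.0, 0.0], [0.0, 1.0]])
misaligned = user_item_loss(T, Q, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is zero when each user shares a subcategory with the item they rated, and positive when the item indicators are swapped.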
[0084] 2) User-user relationship
[0085] This step uses the rating matrix T to model the user-user relationship. First, the similarity between each pair of users is calculated, for example with the Euclidean distance or the Pearson correlation coefficient; then, with a loss function analogous to the user-item loss above:
[0086]
[0087] This loss function means that two users with high similarity have similar indicator vectors in the result matrix.
[0088] 3) Item-item relationship
[0089] This step uses the rating matrix T to model the item-item relationship, analogously to the user-user relationship above. First, the similarity between each pair of items is calculated; the loss function is then:
[0090]
[0091] This loss function means that two items with high similarity have similar indicator vectors in the result matrix.
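The two similarity-based losses can be sketched as follows. Several choices are assumptions made for illustration: the Pearson correlation is computed over co-rated positions only, negative correlations are clipped to zero so that the similarities can serve as non-negative weights, and the loss takes the unnormalized form $\frac{1}{2}\sum_{i,j} W_{ij}\,\lVert z_i - z_j\rVert^2$. All function names are hypothetical.

```python
import math


def pearson_similarity(a, b):
    """Pearson correlation over positions rated (non-zero) in both vectors."""
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_weights(rows):
    """Symmetric weight matrix; negative correlations are clipped to 0 (assumption)."""
    n = len(rows)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = max(0.0, pearson_similarity(rows[i], rows[j]))
    return W


def pairwise_loss(W, Z):
    """0.5 * sum_ij W[i][j] * ||z_i - z_j||^2 for indicator rows Z."""
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += W[i][j] * sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
    return 0.5 * total


T = [[5, 3, 1], [4, 2, 1], [1, 3, 5]]
W_user = similarity_weights(T)                                # user-user: rows of T
W_item = similarity_weights([list(col) for col in zip(*T)])   # item-item: columns of T
```

The same two functions serve both relationships: the user-user weights come from the rows of T and the item-item weights from its columns.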
[0092] 4) Objective function
[0093] Combining the above three loss functions, the loss function for solving the classification matrix P is obtained:
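[0094] $\epsilon(P) = \epsilon(Q, R) + \epsilon(Q) + \epsilon(R)$
[0095] s.t.
[0096] $P \in [0,1]^{(m+n) \times c}$
[0097] $P \mathbf{1}_c = \mathbf{1}_{m+n}$
[0098] $|P_i| = k,\; i = 1, \ldots, (m+n)$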
[0099] The parameter c is the number of all sub-categories, k is the number of sub-categories allowed for each user or item (1≤k≤c), and |·| is the cardinality constraint, representing the number of non-zero values of a vector.
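The constraints can be read as a simple feasibility test on a candidate matrix P. The following check is only an illustrative sketch; the function name and the numerical tolerance are assumptions.

```python
def satisfies_constraints(P, k, tol=1e-9):
    """Check that every row of P has entries in [0, 1], sums to 1,
    and has exactly k non-zero entries."""
    for row in P:
        if any(v < 0 or v > 1 for v in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
        if sum(1 for v in row if v > 0) != k:
            return False
    return True
```

The cardinality test is what makes the problem combinatorial: the first two conditions describe a convex set, while fixing the number of non-zero entries per row does not.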
[0100] Because of these constraints, the objective function is very difficult to minimize exactly, so an approximate solution is sought instead; the approach is similar to spectral clustering and is divided into two stages:
[0101] a) Map all users and items to a shared low-order space, and construct a shared low-order space matrix:
[0102] This step is to obtain an r-dimensional approximate expression of P; first, simplify the loss function to obtain:
[0103]
[0104] where $L_Q$ is the difference between the users' degree diagonal matrix and the user-user weight matrix $W$, and $L_R$ is the difference between the items' degree diagonal matrix and the item-item weight matrix; the matrix $S$ is:
[0105]
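[0106] Tr(·) denotes the trace of a matrix, i.e., the sum of its diagonal elements. The result matrix is approximated in an r-dimensional space and the constraints are relaxed, which converts the problem into minimizing the following objective function:
[0107] $\min_X \operatorname{Tr}(X^T M X)$
[0108] s.t. $X \in \mathbb{R}^{(m+n) \times r},\; X^T X = I$
[0109] The solution is obtained from the eigenproblem $MX = \lambda X$: the columns of $X = [x_1, \ldots, x_r]$ are the eigenvectors corresponding to the r smallest eigenvalues.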
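Stage a) can be sketched with NumPy as follows. The block structure of M used here is an assumption, since it depends on the exact form of the three losses: S is taken as the rating matrix normalized by the square roots of the user and item degrees, and the graph Laplacians of the user-user and item-item weights are added to the diagonal blocks. The function names are hypothetical.

```python
import numpy as np


def _inv_sqrt(d):
    """Element-wise 1/sqrt(d), with 0 where d == 0 (no ratings)."""
    out = np.zeros_like(d, dtype=float)
    mask = d > 0
    out[mask] = 1.0 / np.sqrt(d[mask])
    return out


def shared_embedding(T, W_user, W_item, r):
    """Map n users and m items to a shared r-dimensional space.

    Assumed form: M = [[I + L_Q, -S], [-S^T, I + L_R]], where
    S = D_row^{-1/2} T D_col^{-1/2} and L_Q, L_R are graph Laplacians.
    Returns X of shape (n + m, r) and the r smallest eigenvalues of M.
    """
    T = np.asarray(T, dtype=float)
    W_user = np.asarray(W_user, dtype=float)
    W_item = np.asarray(W_item, dtype=float)
    n, m = T.shape
    S = _inv_sqrt(T.sum(axis=1))[:, None] * T * _inv_sqrt(T.sum(axis=0))[None, :]
    L_user = np.diag(W_user.sum(axis=1)) - W_user
    L_item = np.diag(W_item.sum(axis=1)) - W_item
    M = np.block([[np.eye(n) + L_user, -S],
                  [-S.T, np.eye(m) + L_item]])
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, :r], eigvals[:r]
```

Because `numpy.linalg.eigh` returns orthonormal eigenvectors, the relaxed constraint $X^T X = I$ holds automatically for the returned X.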
[0110] b) Iterate on category clustering:
[0111] Either hard clustering or soft clustering could be used; the embodiment of the present invention adopts soft clustering, in which an object can appear in multiple categories. Fuzzy c-means is therefore selected as the clustering method; it minimizes the following objective function:
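[0112] $J = \sum_{i=1}^{m+n} \sum_{j=1}^{c} P_{ij}^{\,l}\, d(X_i, v_j)^2$
[0113] where $X_i$ is the low-order representation of object i (the i-th row of X), $P_{ij}$ is the membership of object i (a user or an item) in subcategory j in the result matrix P, $v_j$ is the center of subcategory j, d(·,·) is the distance function, and l > 1 is the fuzzification parameter. P and V are updated iteratively:
[0114] $P_{ij} = \dfrac{1}{\sum_{s=1}^{c} \left( d(X_i, v_j) / d(X_i, v_s) \right)^{2/(l-1)}}$
[0115] $v_j = \dfrac{\sum_{i=1}^{m+n} P_{ij}^{\,l}\, X_i}{\sum_{i=1}^{m+n} P_{ij}^{\,l}}$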
[0116] The objective function is evaluated after each iteration, and the iteration stops when the improvement in the objective function is less than a threshold. After the iteration stops, only the k largest elements of each row of P are retained, and each row is normalized so that it sums to 1. In the embodiment of the present invention, the iteration threshold may be 0.5; the specific threshold can be set according to the user's needs and is not limited in this embodiment.
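Stage b) can be sketched with the standard fuzzy c-means updates, followed by the top-k retention and row normalization described above. The Euclidean distance, the random initialization with a fixed seed, the iteration cap, and the function names are assumptions of this sketch; the default threshold of 0.5 is the example value mentioned in the text.

```python
import numpy as np


def fuzzy_c_means(X, c, l=2.0, threshold=0.5, max_iter=200, seed=0):
    """Standard fuzzy c-means on the rows of X; returns memberships P and centers V."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    P = rng.random((X.shape[0], c))
    P /= P.sum(axis=1, keepdims=True)
    prev = np.inf
    for _ in range(max_iter):
        Pl = P ** l
        V = (Pl.T @ X) / Pl.sum(axis=0)[:, None]            # update the centers
        dist = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)                      # avoid division by zero
        inv = dist ** (-2.0 / (l - 1.0))
        P = inv / inv.sum(axis=1, keepdims=True)            # update the memberships
        objective = float(np.sum((P ** l) * dist ** 2))
        if prev - objective < threshold:                    # improvement below threshold
            break
        prev = objective
    return P, V


def keep_top_k(P, k):
    """Keep the k largest memberships in each row and renormalize rows to sum to 1."""
    P = np.asarray(P, dtype=float)
    out = np.zeros_like(P)
    idx = np.argsort(P, axis=1)[:, -k:]
    rows = np.arange(P.shape[0])[:, None]
    out[rows, idx] = P[rows, idx]
    return out / out.sum(axis=1, keepdims=True)
```

A typical call chain would be `P, V = fuzzy_c_means(X, c)` on the embedded rows, followed by `keep_top_k(P, k)` to obtain the final soft clustering result.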
[0117] To further explain S13:
[0118] The multi-category soft clustering result is used to divide the scoring matrix into sub-category matrices; non-negative matrix factorization and prediction are performed on each sub-category matrix to obtain the prediction result of that sub-category matrix; the prediction results of the sub-category matrices are combined by weighted summation to obtain the prediction result.
[0119] The subcategories obtained above yield a number of small matrices extracted from the original rating matrix. Weighted non-negative matrix factorization (weighted NMF, WNMF) is performed on each sub-matrix.
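Extracting a sub-matrix for one subcategory can be sketched as follows, assuming that the first rows of P correspond to users and the remaining rows to items, and that membership means a strictly positive weight. The function names are hypothetical.

```python
def subcategory_members(P, n_users, category):
    """Return (user indices, item indices) whose weight in `category` is positive.

    Assumes the first n_users rows of P are users and the rest are items.
    """
    users = [i for i in range(n_users) if P[i][category] > 0]
    items = [i - n_users for i in range(n_users, len(P)) if P[i][category] > 0]
    return users, items


def sub_rating_matrix(T, users, items):
    """Restrict the rating matrix T to the given users (rows) and items (columns)."""
    return [[T[u][j] for j in items] for u in users]


P = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0],   # three users
     [1.0, 0.0], [0.4, 0.6]]               # two items
T = [[5, 3], [4, 0], [0, 2]]
users_1, items_1 = subcategory_members(P, n_users=3, category=1)
T_1 = sub_rating_matrix(T, users_1, items_1)
```

Because memberships overlap, the same user or item can appear in several sub-matrices.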
[0120] A simple improvement is made to weighted NMF. The factor matrices of NMF are usually initialized randomly. However, the following phenomenon is observed: when most of user A's existing ratings are low and most of user B's existing ratings are high, WNMF with random initialization often predicts higher scores for A's unrated items than for B's. The main reason is that random initialization gives all users and items interest vectors of similar norm, so when user i gives item j a low rating, the factorization tends to make the corresponding user vector and item vector orthogonal. Therefore, the average ratings of the users and the items are taken into account when initializing the two factor matrices.
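A minimal sketch of weighted NMF with a mean-aware initialization follows. The text does not give the initialization formula, so the scheme here is an assumption: each user factor is scaled by the square root of the user's mean rating and each item factor by the square root of the item's mean rating, with a small random perturbation, so that the initial product is close to the geometric mean of the two averages. The multiplicative updates are the usual ones for NMF restricted to observed entries; the function names are hypothetical.

```python
import numpy as np


def masked_rmse(T, U, V):
    """Root-mean-square error of U @ V on the observed (non-zero) entries of T."""
    mask = T > 0
    return float(np.sqrt(np.mean((T[mask] - (U @ V)[mask]) ** 2)))


def wnmf(T, rank, n_iter=200, seed=0, eps=1e-9):
    """Weighted NMF: fit T ~ U @ V on observed entries only (0 = unobserved)."""
    T = np.asarray(T, dtype=float)
    mask = (T > 0).astype(float)
    n, m = T.shape
    rng = np.random.default_rng(seed)
    global_mean = T.sum() / max(mask.sum(), 1.0)
    user_cnt, item_cnt = mask.sum(axis=1), mask.sum(axis=0)
    user_mean = np.where(user_cnt > 0, T.sum(axis=1) / np.maximum(user_cnt, 1.0), global_mean)
    item_mean = np.where(item_cnt > 0, T.sum(axis=0) / np.maximum(item_cnt, 1.0), global_mean)
    # Mean-aware initialization (assumed scheme): U @ V starts near sqrt(user_mean * item_mean).
    U = np.sqrt(user_mean / rank)[:, None] * (1.0 + 0.1 * rng.random((n, rank)))
    V = np.sqrt(item_mean / rank)[None, :] * (1.0 + 0.1 * rng.random((rank, m)))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= ((mask * T) @ V.T) / ((mask * (U @ V)) @ V.T + eps)
        V *= (U.T @ (mask * T)) / (U.T @ (mask * (U @ V)) + eps)
    return U, V
```

With this initialization, a user whose ratings are mostly low starts with a shorter factor vector than a user whose ratings are mostly high, which is one way to address the phenomenon described above.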
[0121] After making predictions in each sub-matrix, use a weighted sum method to calculate the final prediction score:
[0122]
[0123] where $Pr(u_i, y_j, k)$ represents the predicted score of user i for item j in subcategory k.
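The weighted sum can be sketched as below. The weighting scheme is an assumption, since the formula is not reproduced in the text: the weight of subcategory k is taken as the product of the user's and the item's membership weights in k, normalized over the subcategories that contain both. The function name is hypothetical.

```python
def combine_predictions(user_weights, item_weights, sub_predictions):
    """Weighted average of per-subcategory predictions Pr(u_i, y_j, k).

    user_weights[k] and item_weights[k] are the membership weights of the user
    and the item in subcategory k; sub_predictions[k] is the prediction made in
    subcategory k, or None if the pair does not occur there.
    Returns None if no subcategory contains both the user and the item.
    """
    numerator = 0.0
    denominator = 0.0
    for w_user, w_item, prediction in zip(user_weights, item_weights, sub_predictions):
        weight = w_user * w_item  # assumed weighting scheme
        if prediction is None or weight == 0:
            continue
        numerator += weight * prediction
        denominator += weight
    return numerator / denominator if denominator > 0 else None
```

For example, a user and an item that both belong equally to two subcategories with sub-predictions 4 and 2 receive a final score of 3.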
[0124] To further explain S14:
[0125] The obtained prediction results are sorted by prediction score from high to low to obtain a sorting result; the 10 highest-ranked items are recommended to the user.
[0126] In the embodiment of the present invention, the scores are sorted from high to low, and the top 10 items are then recommended to the user according to the sorting result; other sorting methods may also be used and can be chosen according to the user's preference.
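The sorting and top-10 selection can be sketched as follows. Excluding items the user has already rated is an assumption added for illustration, and the function name and the dictionary input format are hypothetical.

```python
def recommend_top_n(predicted_scores, already_rated=(), n=10):
    """Return up to n item identifiers sorted by predicted score, highest first.

    predicted_scores maps item identifier -> predicted score for one user.
    Items in already_rated are skipped (an assumption of this sketch).
    """
    rated = set(already_rated)
    candidates = [(item, score) for item, score in predicted_scores.items()
                  if item not in rated]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n]]


scores = {"a": 4.5, "b": 3.0, "c": 4.9, "d": 1.2}
top_two = recommend_top_n(scores, n=2)
```

A different ranking rule can be substituted by changing the sort key.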
[0127] Figure 2 is a schematic structural diagram of a recommendation system based on multi-category joint soft clustering in an embodiment of the present invention. As shown in Figure 2, the recommendation system includes:
[0128] Matrix construction module 11: used to obtain user-item interaction information, and construct a scoring matrix and a classification matrix according to the user-item interaction information;
[0129] Clustering module 12: used to perform multi-category soft clustering processing on the scoring matrix and the classification matrix to obtain a multi-category soft clustering result;
[0130] Prediction module 13: configured to use weighted non-negative matrix factorization to predict the user preference of the multi-category soft clustering result, and obtain the prediction result;
[0131] Recommendation module 14: used to recommend the item with the highest prediction score to the user according to the prediction result.
[0132] Preferably, the matrix construction module 11 includes:
[0133] Relationship acquisition unit: used to acquire user-item relationship, user-user relationship, and item-item relationship according to the user-item interaction information;
[0134] Matrix construction unit: used to construct the scoring matrix and the classification matrix according to the user-item relationship, the user-user relationship, and the item-item relationship;
[0135] The classification matrix includes a user classification matrix and an item classification matrix.
[0136] Preferably, the clustering module 12 includes:
[0137] The second matrix construction unit: used to construct a shared low-order spatial matrix according to the scoring matrix and the classification matrix;
[0138] Clustering iteration unit: used to perform multi-category clustering iterative calculation processing on the low-order spatial matrix by using a minimized objective function to obtain the objective function iteration value;
[0139] Judging unit: used to compare the objective function iteration value with the iteration threshold, if the objective function iteration value is less than the iteration threshold, stop the iteration and obtain the clustering low-order spatial matrix; otherwise, continue the iterative calculation;
[0140] Normalization unit: used to normalize each row of the clustering low-order spatial matrix to obtain a multi-category soft clustering result.
[0141] Preferably, the prediction module 13 includes:
[0142] Matrix division unit: used to divide the scoring matrix into sub-category matrices according to the multi-category soft clustering result;
[0143] Sub-category matrix prediction unit: used to perform non-negative matrix factorization and prediction on each sub-category matrix to obtain the sub-category matrix prediction results;
[0144] Weighted calculation unit: used to combine the sub-category matrix prediction results by weighted summation to obtain the prediction result.
[0145] Preferably, the recommendation module 14 includes:
[0146] Sorting unit: used to sort the obtained prediction results by prediction score from high to low to obtain the sorting result;
[0147] Recommendation unit: used to recommend the top 10 items with the highest ranking to users.
[0148] Specifically, for the working principle of the system-related functional modules in the embodiment of the present invention, refer to the related description of the method embodiment, and details are not described herein again.
[0149] In the embodiment of the present invention, multi-category soft clustering classifies users and items into multiple overlapping subcategories. Users and items share the same clustering space, so the clustering is in effect a process of discovering domains of interest: items of a certain type and the users who like that type are grouped into one subcategory. For example, electronic products and the users who like them form one subcategory, and daily necessities and the users who like them form another; a user who likes both daily necessities and electronic products can belong to both subcategories. The resulting subcategories therefore represent the users' domains of interest well. Sub-matrices are generated from the original matrix according to these subcategories, and the matrix factorization algorithm is applied to each sub-matrix. On the one hand, this greatly reduces the sparsity of the matrices and improves the performance of matrix factorization prediction. On the other hand, predictions are made from the ratings of users in the same domain of interest, which is more reliable than considering the ratings of all users, because ratings of items a user is not interested in interfere with predicting ratings for items within the user's domain of interest. The degree of preference can thus be predicted as a rating on the basis of the multi-category soft clustering of users and items, and items are recommended to the user based on the predicted ratings, with high prediction accuracy.
[0150] A person of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, etc.
[0151] The recommendation method and system based on multi-category joint soft clustering provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementation of the present invention, and the description of the embodiments is intended only to help understand the method and core idea of the present invention. For those skilled in the art, the specific implementation and scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.