A user cold start recommendation system and method based on similarity matching
By constructing a knowledge graph database and applying a similarity matching algorithm, user ratings of individual products are converted into product type scores. The system then finds the existing user most similar to each new user and bases its recommendations on that match, addressing the cold start problem of recommendation systems and enabling accurate recommendations for new users.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: NANJING TECH UNIV
- Filing Date: 2022-12-12
- Publication Date: 2026-06-26
AI Technical Summary
When faced with new users and new products, recommendation systems cannot make effective recommendations due to insufficient data, leading to the cold start problem.
By constructing a knowledge graph database, user ratings for individual products are transformed into product type ratings. A similarity matching algorithm is used to find the most similar existing users, and a factorization machine model is used for recommendations.
It improves the accuracy and efficiency of the recommendation system for new users, expands the matching range, reduces the bias of recommendation results, and solves the cold start problem.
Figure CN115907828B_ABST
Description
Technical Field
[0001] This invention relates to a user cold start recommendation system and method based on similarity matching, belonging to the field of intelligent recommendation technology.
Background Technology
[0002] With the advent of the information age, the amount of information has grown explosively. Extracting useful information from this vast amount of data has become a crucial task for promoting the development of the Industrial Internet. Recommendation systems, based on massive amounts of data and user-product interaction information, perform targeted matching between products and users, and they hold a pivotal position in the Industrial Internet. However, recommendation systems constantly receive a large influx of new users and products for which data is often insufficient; this is the "cold start problem" faced by recommendation systems. How to solve the cold start problem and match the best possible results for users and products entering at different times is one of the most pressing issues that recommendation systems need to address.
Summary of the Invention
[0003] This invention provides a user cold-start recommendation method based on similarity matching. It targets the user cold-start problem in recommendation systems, in which insufficient data on new users prevents the system from making effective recommendations. To this end, the invention designs a data transformation algorithm that converts user ratings of individual products into ratings of product types. A similarity matching algorithm then matches each new user with the most similar existing user. Finally, the features of the two users are associated and fed into the recommendation system, reducing potential bias in the recommendation results and enabling the system to provide recommendations that are as accurate as possible for new users.
[0004] Meanwhile, this invention provides a system that employs a user cold start recommendation method based on similarity matching.
[0005] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows:
[0006] A user cold start recommendation method based on similarity matching includes the following steps:
[0007] Step 1: Obtain all existing user and product information to build a knowledge graph database $\langle U, R, I \rangle$, where $U = \{u_1, u_2, u_3, \ldots, u_b, \ldots, u_n\}$ is the set of existing-user attributes, $n$ is the number of existing users, and $u_b$ denotes the attributes of the b-th existing user; $I = \{i_1, i_2, i_3, \ldots, i_c, \ldots, i_m\}$ is the set of product attributes, $m$ is the number of products, and $i_c$ denotes the attributes of the c-th product; $R = \{r_1, r_2, r_3, \ldots, r_d, \ldots, r_l\}$ is the set of interaction relationships between existing users and products, $l$ is the number of interaction relationships, and $r_d$ denotes the d-th interaction relationship. The constructed knowledge graph database contains multiple triples of the form $\langle u, r, i \rangle$;
[0008] The set R of interaction relationships between existing users and products can be chosen according to the specific situation. It can be the existing user's rating records, click records, or search records for products, among others. The method proposed in this invention uses only the rating records.
[0009] Step 2: Extract the product rating vector of a single existing user b;
[0010] Step 3: Generate the product like/dislike rating vector for this existing user;
[0011] Step 4: Generate the product type like/dislike rating vectors for all existing users, following steps 2 and 3;
[0012] Step 5: After the new user inputs their information, follow steps 1, 2, and 3 to obtain the new user's product type like/dislike rating vector;
[0013] Step 6: Perform similarity matching between the new user's product type like/dislike rating vector and the product type like/dislike rating vectors of all existing users to obtain the existing user whose like/dislike ratings are most similar to the new user's;
[0014] Step 7: Associate the new user's product type like/dislike ratings with those of that existing user to obtain the new user's similarity prediction;
[0015] Step 8: Use the factorization machine model to calculate the relevance between new users and products, and complete the product recommendation for new users.
[0016] In step 1, $u_b$ = {UserID, age, gender, profession, …};
[0017] $i_c$ = {ItemID, $type_1$, $type_2$, …, $type_g$}, where $type_g$ denotes the g-th product type; product types include the product's color, size, material, and function. All product type information is extracted and one-hot encoded to obtain the product type attribute vector: a type that the product belongs to is encoded as 1, and every other type as 0;
[0018] $r_d$ = {UserID, ItemID, score}.
[0019] In step 2, the product ratings of a single existing user b are extracted to give that user's rating vector, whose m-th component is the rating given by existing user b to the m-th product. Each rating lies within the product rating range [1, 5].
[0020] Step 3 specifically includes the following steps:
[0021] S301: Establish a mapping rule from the product rating space to the user like/dislike discrimination space: ratings in the [1, 5] rating range are mapped proportionally onto the [-2, 2] user like/dislike discrimination space. Negative scores indicate that the user is not interested in the product, and positive scores indicate that the user likes the product to some degree.
[0022] S302: Map all rating records of existing user b into the user like/dislike discrimination space according to this mapping rule, generating that user's product like/dislike rating vector, whose m-th component is the mapped rating of existing user b for the m-th product;
[0023] S303: Process the product type tags of each product rated by existing user b according to the following formula.
[0024]
[0025] where the result is the rating of existing user b for the g-th product type. Collecting these ratings over all product types yields the product type like/dislike rating vector of existing user b.
[0026] In step 4, the product type like/dislike rating vectors of all existing users are obtained, where * denotes any one of the existing users b.
[0027] In step 5, after the new user's data is input into the knowledge graph database, the new user's product rating features are extracted and a rating vector is generated as in step 2, where $score^{x}_{new}$ denotes the new user's rating of the x-th product. Following step 3 then yields the new user's like/dislike rating vector over product types, each component of which is the new user's like/dislike rating for the g-th product type.
[0028] New user data likewise consists of multiple triples of the form $\langle u, r, i \rangle$.
[0029] In step 6, a similarity matching function is established to perform similarity matching between the new user's product type preference rating vector and the product type preference rating vectors of all existing users, thereby obtaining the existing users whose ratings are most similar to the new user's. The similarity matching function is shown in the following formula.
[0030]
[0031] where $sim(u_{new}, u)$ denotes the similarity between the new user's product type like/dislike ratings and an existing user's product type like/dislike ratings, a higher value indicating greater similarity. The two quantities compared are the existing user's like/dislike rating for the q-th product type and the new user's like/dislike rating for the q-th product type. The existing user with the highest similarity value is the one most similar to the new user.
[0032] In step 7, the product type preference rating vector of new users is associated with that of their most similar existing users using the following formula:
[0033]
[0034] where the result is the new user association vector, computed from the product type like/dislike rating vector of the existing user most similar to the new user and from $sim$, the similarity between the new user's and that existing user's product type like/dislike ratings.
[0035] In step 8, the knowledge graph database is input into the factorization machine (FM) model. The FM model predicts the recommendation results and outputs the relevance $y_{FM}$ between users and products; the products with the highest relevance $y_{FM}$ are recommended to users;
[0036]
[0037] where $\omega_0$ is a constant term representing the global bias of the FM model, adjusted according to actual conditions to correct the recommendation results; $\omega_u \in \mathbb{R}$ is the weight of the user rating; the remaining weights are the second-order weight parameters of the FM model; and one input vector represents the ratings a user gives to products.
[0038] The other input vector represents the ratings a product receives.
[0039] A user cold start recommendation system based on similarity matching, which employs the user cold start recommendation method based on similarity matching of the present invention.
[0040] The present invention has the following beneficial effects:
[0041] This invention relates to a user cold-start recommendation system based on similarity matching. It employs a data transformation algorithm to convert user ratings of individual products into ratings of product types, and uses a similarity matching algorithm to match new users with the most similar existing users. Finally, the two users' ratings are associated and fed into the recommendation system, enabling the system to provide new users with recommendation results that are as accurate as possible. This system offers the following advantages:
[0042] 1. This invention proposes an optimized matching conversion mechanism that converts users' direct ratings of products into ratings of product types, thereby expanding the matching range and improving the system's operating speed.
[0043] 2. Compared with traditional recommendation system models, the similarity matching function used in this invention compares and correlates the ratings of similar product types among users, thus better finding the most similar existing users for new users and expanding the new user data to facilitate a more accurate analysis of new users' preferences.
[0044] 3. The solution to the user cold start problem provides an effective solution for the injection of new users into the system, giving the system a certain advantage in handling real industrial scenarios.
[0045] This invention discloses a user cold-start recommendation system and method based on similarity matching. It constructs a knowledge graph database from users, products, and the interaction relationships between users and products, and standardizes the data. Rating information is extracted from the processed data and mapped to a user like/dislike space, thereby determining each user's preferred product types. A similarity matching algorithm is designed to find the existing user with the highest similarity to the new user. The product type rating vectors of the existing user and the new user are combined through an association function and used as the basis for system recommendations, reasonably expanding the new user's data and effectively alleviating the cold-start problem of the recommendation system. This invention addresses the user cold-start problem caused by introducing new users into industrial recommendation systems and can be used in industrial scenarios where a recommendation system must recommend product types that match the preferences of new users.
Attached Figure Description
[0046] Figure 1 is a schematic diagram of the user cold start process based on similarity matching;
[0047] Figure 2 shows the basic framework of an intelligent recommendation system that handles user cold start based on similarity matching;
[0048] Figure 3 is an example of a knowledge graph database.
Detailed Implementation
[0049] The present invention will be further explained in detail below with reference to the accompanying drawings and embodiments. The specific embodiments described herein are only for explaining the present invention and are not intended to limit the present invention.
[0050] Referring to Figures 1-3, this embodiment provides a user cold start recommendation method based on similarity matching, including the following steps:
[0051] Step 1: Obtain all existing user and product information to build a knowledge graph database $\langle U, R, I \rangle$, where $U = \{u_1, u_2, u_3, \ldots, u_b, \ldots, u_n\}$ is the set of existing-user attributes, $n$ is the number of existing users, and $u_b$ denotes the attributes of the b-th existing user; $I = \{i_1, i_2, i_3, \ldots, i_c, \ldots, i_m\}$ is the set of product attributes, $m$ is the number of products, and $i_c$ denotes the attributes of the c-th product; $R = \{r_1, r_2, r_3, \ldots, r_d, \ldots, r_l\}$ is the set of interaction relationships between existing users and products, $l$ is the number of interaction relationships, and $r_d$ denotes the d-th interaction relationship. The constructed knowledge graph database contains multiple triples of the form $\langle u, r, i \rangle$;
[0052] Step 2: Extract the product rating vector of a single existing user b;
[0053] Step 3: Generate the product attribute relationship interaction vector for this existing user, that is, the product like/dislike rating vector;
[0054] Step 4: Generate the product attribute relationship interaction vectors for all existing users, that is, the product type like/dislike rating vectors, according to steps 2 and 3;
[0055] Step 5: After the new user inputs their information, follow steps 1, 2, and 3 to obtain the new user's product type like/dislike rating vector;
[0056] Step 6: Perform similarity matching between the new user's product type like/dislike rating vector and the product type like/dislike rating vectors of all existing users to obtain the existing user whose like/dislike ratings are most similar to the new user's;
[0057] Step 7: Associate the new user's product type like/dislike ratings with those of that existing user to obtain the new user's similarity prediction;
[0058] Step 8: Use the factorization machine model to calculate the relevance between new users and products, and complete the product recommendation for new users.
[0059] In step 1, $u_b$ = {UserID, age, gender, profession, …};
[0060] $i_c$ = {ItemID, $type_1$, $type_2$, …, $type_g$}, where $type_g$ denotes the g-th product type; product types include the product's color, size, material, and function. All product type information is extracted and one-hot encoded to obtain the product type attribute vector: a type that the product belongs to is encoded as 1, and every other type as 0;
[0061] $r_d$ = {UserID, ItemID, score}.
[0062] In step 2, the product ratings of a single existing user b are extracted to give that user's rating vector, whose m-th component is the rating given by existing user b to the m-th product. Each rating lies within the product rating range [1, 5].
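As an illustration of the data layout in step 1, the following minimal Python sketch stores rating interactions as user-rating-item records and encodes product types as 0/1 vectors. The class name `Rating`, the helper `one_hot_types`, and all sample IDs and type labels are hypothetical; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One interaction r_d = {UserID, ItemID, score} (hypothetical container)."""
    user_id: str
    item_id: str
    score: int

    def __post_init__(self):
        # The source states that ratings lie in [1, 5].
        if not 1 <= self.score <= 5:
            raise ValueError("score must lie in [1, 5]")


def one_hot_types(item_types, all_types):
    """Return a 0/1 vector: 1 where the product has the type, 0 otherwise."""
    return [1 if t in item_types else 0 for t in all_types]


# Hypothetical catalogue: the type labels and item IDs are made up for illustration.
ALL_TYPES = ["red", "large", "cotton", "waterproof"]
ITEM_TYPES = {"i1": {"red", "cotton"}, "i2": {"large", "waterproof"}}
ITEM_VECTORS = {item: one_hot_types(types, ALL_TYPES)
                for item, types in ITEM_TYPES.items()}

# Rating triples linking users to items.
TRIPLES = [Rating("u1", "i1", 5), Rating("u1", "i2", 2)]
```

A product that belongs to several types gets several 1 entries, which matches the description of coding each type the product has as 1.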
[0063] Step 3 specifically includes the following steps:
[0064] S301: Establish a mapping rule from the product rating space to the user like/dislike discrimination space: ratings in the [1, 5] rating range are mapped proportionally onto the [-2, 2] user like/dislike discrimination space. Negative scores indicate that the user is not interested in the product, and positive scores indicate that the user likes the product to some degree.
[0065] S302: Map all rating records of existing user b into the user like/dislike discrimination space according to this mapping rule, generating that user's product like/dislike rating vector, whose m-th component is the mapped rating of existing user b for the m-th product;
[0066] S303: Process the product type tags of each product rated by existing user b according to the following formula.
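The mapping in S301 and S302 can be sketched as follows, assuming that "proportionally mapped" means the linear map that sends 1 to -2 and 5 to 2, which reduces to subtracting 3. The function names are hypothetical.

```python
def to_preference(score):
    """Map a rating in [1, 5] linearly onto the like/dislike range [-2, 2].

    Assumption: the proportional mapping is linear, so that
    1 -> -2, 3 -> 0, 5 -> 2, i.e. score - 3.
    """
    if not 1 <= score <= 5:
        raise ValueError("score must lie in [1, 5]")
    return score - 3


def preference_vector(scores):
    """Apply the mapping to every rating of one user."""
    return [to_preference(s) for s in scores]
```

Under this assumption a neutral rating of 3 maps to 0, so the sign of the mapped value separates liked from disliked products.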
[0067]
[0068] where the result is the rating of existing user b for the g-th product type. Collecting these ratings over all product types yields the product type like/dislike rating vector of existing user b.
[0069] In step 4, the product type like/dislike rating vectors of all existing users are obtained, where * denotes any one of the existing users b.
[0070] In step 5, after the new user's data is input into the knowledge graph database, the new user's product rating features are extracted and a rating vector is generated as in step 2, where $score^{x}_{new}$ denotes the new user's rating of the x-th product. Following step 3 then yields the new user's like/dislike rating vector over product types, each component of which is the new user's like/dislike rating for the g-th product type.
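The S303 formula is not reproduced in this text, so the following Python sketch is only one plausible reading: it assumes that the rating for a product type is the sum of the user's mapped product ratings over all rated products carrying that type. The summation rule and the function name `type_preference_vector` are assumptions, not the patent's stated formula.

```python
def type_preference_vector(prefs, item_vectors):
    """Aggregate per-product like/dislike scores into per-type scores.

    prefs:        dict item_id -> mapped score in [-2, 2] for one user
    item_vectors: dict item_id -> 0/1 product type vector
    Assumption: type score = sum of mapped scores of products having that type.
    """
    n_types = len(next(iter(item_vectors.values())))
    totals = [0.0] * n_types
    for item_id, pref in prefs.items():
        for g, flag in enumerate(item_vectors[item_id]):
            totals[g] += pref * flag
    return totals
```

Whatever the exact aggregation, the effect described in the text is the same: users no longer need to have rated the same products to be comparable, only products of the same types.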
[0071] In step 6, a similarity matching function is established to perform similarity matching between the new user's product type preference rating vector and the product type preference rating vectors of all existing users, thereby obtaining the existing users whose ratings are most similar to the new user's. The similarity matching function is shown in the following formula.
[0072]
[0073] where $sim(u_{new}, u)$ denotes the similarity between the new user's product type like/dislike ratings and an existing user's product type like/dislike ratings, a higher value indicating greater similarity. The two quantities compared are the existing user's like/dislike rating for the q-th product type and the new user's like/dislike rating for the q-th product type. The existing user with the highest similarity value is the one most similar to the new user.
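The similarity matching function itself is not reproduced in this text. As a stand-in, the sketch below uses cosine similarity over the product type like/dislike vectors, which has the stated property that a higher value means greater similarity; the patent's actual function may differ. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed stand-in for sim)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no information about one of the users
    return dot / (norm_a * norm_b)


def most_similar_user(new_vec, existing):
    """Return (user_id, similarity) of the existing user closest to new_vec.

    existing: dict user_id -> product type like/dislike vector
    """
    best = max(existing, key=lambda uid: cosine_similarity(new_vec, existing[uid]))
    return best, cosine_similarity(new_vec, existing[best])
```

For example, `most_similar_user([2, -1, 0], {"u1": [4, -2, 0], "u2": [-2, 1, 0]})` selects `"u1"`, whose preferences point in the same direction as the new user's.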
[0074] In step 7, the product type preference rating vector of new users is associated with that of their most similar existing users using the following formula:
[0075]
[0076] where the result is the new user association vector, computed from the product type like/dislike rating vector of the existing user most similar to the new user and from $sim$, the similarity between the new user's and that existing user's product type like/dislike ratings.
[0077] In step 8, the knowledge graph database is input into the factorization machine (FM) model. The FM model predicts the recommendation results and outputs the relevance $y_{FM}$ between users and products; the products with the highest relevance $y_{FM}$ are recommended to users;
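The association formula is likewise missing from this text. The sketch below assumes one simple form consistent with the inputs listed above: the new user's vector plus the most similar existing user's vector weighted by the similarity. Both this form and the name `associate` are assumptions for illustration only.

```python
def associate(new_vec, nearest_vec, sim):
    """Blend the new user's type preferences with the nearest existing user's.

    Assumption: association = new_vec + sim * nearest_vec, componentwise.
    A higher similarity lets the existing user's data contribute more.
    """
    if len(new_vec) != len(nearest_vec):
        raise ValueError("vectors must have the same length")
    return [n + sim * e for n, e in zip(new_vec, nearest_vec)]
```

With this form, a similarity of 0 leaves the new user's vector unchanged, so a poorly matched existing user adds nothing to the new user's data.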
[0078]
[0079] where $\omega_0$ is a constant term representing the global bias of the FM model, adjusted according to actual conditions to correct the recommendation results; $\omega_u \in \mathbb{R}$ is the weight of the user rating; the remaining weights are the second-order weight parameters of the FM model; one input vector represents the ratings a user gives to products, and the other represents the ratings a product receives.
[0080] This embodiment also provides a user cold start recommendation system based on similarity matching, which employs the user cold start recommendation method based on similarity matching described above.
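The patent's own FM expression is not reproduced in this text. For orientation, the sketch below implements the standard second-order factorization machine, $y = \omega_0 + \sum_i \omega_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$, using the usual linear-time rewriting of the pairwise term. How the patent splits the features into user and product parts is not shown here, so the feature vector `x` and the parameter names are assumptions.

```python
def fm_predict(x, w0, w, V):
    """Standard second-order factorization machine prediction.

    x:  feature vector (e.g. user and product features concatenated)
    w0: global bias
    w:  first-order weights, one per feature
    V:  latent factors, V[i] is the k-dimensional vector of feature i
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if v_jf x_i x_j = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise
```

Candidate products can then be ranked for a new user by calling `fm_predict` once per product and sorting by the returned relevance.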
[0081] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0082] Similarly, it should be understood that, in order to streamline this disclosure and aid in understanding one or more of the various aspects of the invention, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than expressly recited in each claim. Rather, as reflected in the claims, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of the invention.
[0083] Those skilled in the art will understand that the modules, units, or groups of devices in the examples disclosed herein can be arranged in the device as described in this embodiment, or alternatively, can be located in one or more devices different from the device in this example. The modules in the foregoing examples can be combined into a single module or, in addition, can be divided into multiple sub-modules.
[0084] Those skilled in the art will understand that modules in the device of the embodiments can be adaptively changed and placed in one or more devices different from that embodiment. Modules, units, or groups in the embodiments can be combined into a single module, unit, or group, and further, they can be divided into multiple sub-modules, sub-units, or sub-groups. Except where at least some of such features and / or processes or units are mutually exclusive, any combination can be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature that serves the same, equivalent, or similar purpose.
[0085] Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features but not others included in other embodiments, combinations of features from different embodiments are intended to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0086] Furthermore, some of the embodiments described herein are methods or combinations of method elements that can be implemented by a processor of a computer system or by other means of performing the functions. Therefore, a processor having the necessary instructions for implementing the methods or method elements forms means for implementing the methods or method elements. Furthermore, the elements described herein in the apparatus embodiments are examples of means for implementing the functions performed by elements for the purposes of carrying out the invention.
[0087] The various techniques described herein can be implemented in combination with hardware or software, or a combination thereof. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, can take the form of program code (i.e., instructions) embedded in a tangible medium, such as a floppy disk, CD-ROM, hard disk, or any other machine-readable storage medium, wherein when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the present invention.
[0088] When the program code is executed on a programmable computer, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and / or storage elements), at least one input device, and at least one output device. The memory is configured to store program code; the processor is configured to execute the method of the present invention according to instructions in the program code stored in the memory.
[0089] By way of example, and not limitation, computer-readable media include computer storage media and communication media. Computer storage media stores information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embodies computer-readable instructions, data structures, program modules, or other data in the form of modulated data signals such as carrier waves or other transmission mechanisms, and includes any information delivery medium. Any combination of the above is also included within the scope of computer-readable media.
[0090] As used herein, unless otherwise specified, the use of ordinal numbers such as “first,” “second,” “third,” etc., to describe ordinary objects merely indicates different instances of similar objects and is not intended to imply that the objects being described must have a given order in time, space, ordering, or any other manner.
[0091] Although the invention has been described with reference to a limited number of embodiments, those skilled in the art will understand from the foregoing description that other embodiments are conceivable within the scope of the invention described herein. Furthermore, it should be noted that the language used in this specification has been chosen primarily for readability and instructional purposes, and not for the purpose of interpreting or limiting the subject matter of the invention. Therefore, many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the appended claims. The disclosure of the invention is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.
Claims
1. A user cold start recommendation method based on similarity matching, characterized in that it includes the following steps:
Step 1: Obtain all existing user and product information to build a knowledge graph database $\langle U, R, I \rangle$, where $U$ is the set of existing-user attributes, $n$ is the number of existing users, and $u_b$ denotes the attributes of the b-th existing user; $I$ is the set of product attributes, $m$ is the number of products, and $i_c$ denotes the attributes of the c-th product; $R$ is the set of interaction relationships between existing users and products, $l$ is the number of interaction relationships, and $r_d$ denotes the d-th interaction relationship; the constructed knowledge graph database contains multiple triples of the form $\langle u, r, i \rangle$;
Step 2: Extract the product rating vector of a single existing user b;
Step 3: Generate the product like/dislike rating vector for this existing user;
Step 4: Generate the product type like/dislike rating vectors for all existing users, following steps 2 and 3;
Step 5: After the new user inputs their information, follow steps 1, 2, and 3 to obtain the new user's product type like/dislike rating vector;
Step 6: Perform similarity matching between the new user's product type like/dislike rating vector and the product type like/dislike rating vectors of all existing users to obtain the existing user whose like/dislike ratings are most similar to the new user's;
Step 7: Associate the new user's product type like/dislike ratings with those of that existing user to obtain the new user's similarity prediction;
Step 8: Use the factorization machine model to calculate the relevance between new users and products, and complete the product recommendation for new users;
wherein step 3 specifically includes the following steps:
S301: Establish a mapping rule from the product rating space to the user like/dislike discrimination space: ratings in the [1, 5] rating range are mapped proportionally onto the [-2, 2] user like/dislike discrimination space; negative scores indicate that the user is not interested in the product, and positive scores indicate that the user likes the product to some degree;
S302: Map all rating records of existing user b into the user like/dislike discrimination space according to this mapping rule, generating that user's product like/dislike rating vector, whose m-th component is the mapped rating of existing user b for the m-th product.
2. The user cold start recommendation method based on similarity matching according to claim 1, characterized in that, in step 1, $u_b$ = {UserID, age, gender, profession, …}; $i_c$ = {ItemID, $type_1$, $type_2$, …, $type_g$}, where $type_g$ denotes the g-th product type, and product types include the product's color, size, material, and function; all product type information is extracted and one-hot encoded to obtain the product type attribute vector, in which a type that the product belongs to is encoded as 1 and every other type as 0; $r_d$ = {UserID, ItemID, score}.
3. The user cold start recommendation method based on similarity matching according to claim 2, characterized in that, in step 2, the product ratings of a single existing user b are extracted to give that user's rating vector, whose m-th component is the rating given by existing user b to the m-th product, each rating lying within the product rating range [1, 5].
4. The user cold start recommendation method based on similarity matching according to claim 3, characterized in that step 3 also includes the following step: S303: Process the product type tags of each product rated by existing user b according to a formula whose result is the rating of existing user b for the g-th product type, ultimately yielding the product type like/dislike rating vector of existing user b.
5. The user cold start recommendation method based on similarity matching according to claim 4, characterized in that, in step 4, the product type like/dislike rating vectors of all existing users are obtained, where * denotes any one of the existing users b.
6. The user cold start recommendation method based on similarity matching according to claim 5, characterized in that, in step 5, after the new user's data is input into the knowledge graph database, the new user's product rating features are extracted and a rating vector is generated as in step 2, where $score^{x}_{new}$ denotes the new user's rating of the x-th product; following step 3 then yields the new user's like/dislike rating vector over product types, each component of which is the new user's like/dislike rating for the g-th product type.
7. The user cold start recommendation method based on similarity matching according to claim 6, characterized in that, in step 6, a similarity matching function is established to perform similarity matching between the new user's product type like/dislike rating vector and the product type like/dislike rating vectors of all existing users, thereby obtaining the existing user whose ratings are most similar to the new user's; in the similarity matching function, $sim(u_{new}, u)$ denotes the similarity between the new user's and an existing user's product type like/dislike ratings, a higher value indicating greater similarity; the quantities compared are the existing user's like/dislike rating for the q-th product type and the new user's like/dislike rating for the q-th product type; and the existing user with the highest similarity value is the one most similar to the new user.
8. The user cold start recommendation method based on similarity matching according to claim 7, characterized in that, in step 7, the new user's product type like/dislike rating vector is associated with that of the most similar existing user by a formula whose result is the new user association vector, computed from the product type like/dislike rating vector of the existing user most similar to the new user and from $sim$, the similarity between the new user's and that existing user's product type like/dislike ratings.
9. The user cold start recommendation method based on similarity matching according to claim 8, characterized in that, in step 8, the knowledge graph database is input into the factorization machine (FM) model; the FM model predicts the recommendation results and outputs the relevance $y_{FM}$ between users and products, and the products with the highest relevance $y_{FM}$ are recommended to users; in the FM model, $\omega_0$ is a constant term representing the global bias of the FM model, adjusted according to actual conditions to correct the recommendation results; $\omega_u \in \mathbb{R}$ is the weight of the user rating; the remaining weights are the second-order weight parameters of the FM model; one input vector represents the ratings a user gives to products, and the other represents the ratings a product receives.
10. A user cold start recommendation system based on similarity matching, employing the user cold start recommendation method based on similarity matching according to any one of claims 1 to 9.