Item recommendation method and device, computer device, and storage medium
By acquiring users' historical click data and calculating and optimizing item similarity, the problem of high complexity in existing item recommendation algorithms is solved, achieving more efficient item recommendation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN LIFE INSURANCE CO LTD
- Filing Date
- 2022-11-14
- Publication Date
- 2026-06-26
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence, and in particular discloses a method, apparatus, computer device, and storage medium for recommending items.
Background Technology
[0002] Many recommendation algorithms exist for product recommendation. Among them, ItemCF is a classic collaborative filtering algorithm whose implementation mainly comprises two steps: 1. calculate the similarity between items; 2. generate a recommendation list for the user based on the item similarity and the user's historical behavior. The existing ItemCF algorithm has the following shortcomings. In the first step, calculating the similarity sim between N items eventually produces an N*N similarity matrix or hash table; that is, each item A corresponds to an itemList of size N, giving a space complexity of O(N*N). In the second step, given the sequence of M items clicked by each user in the historical period, each clicked item B must be traversed and its N similar items obtained, forming a recall list of size M*N with a space complexity of O(M*N). Finally, the items in the recall list are sorted in descending order of similarity sim to obtain the top K similar items, which are then used for product recommendation; this sort has a time complexity of O(MNlogMN).
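The baseline recall step described above can be sketched as follows. This is a minimal illustration only: the function name `naive_recall`, the nested-dictionary layout of `sim`, and the sample values are assumptions made for the example, not part of the source.

```python
def naive_recall(clicked_items, sim, k):
    """Baseline ItemCF recall: collect every neighbour, then sort the whole list."""
    recall = []
    for b in clicked_items:                    # M clicked items
        for j, s in sim.get(b, {}).items():    # up to N similar items each
            recall.append((s, j))              # recall list grows to size M*N
    recall.sort(reverse=True)                  # full sort: O(MN log MN)
    return [j for _, j in recall[:k]]


# Hypothetical similarity table: sim[item][other_item] = similarity
sim = {"a": {"b": 0.5, "c": 0.2}, "b": {"a": 0.5, "c": 0.9}}
top = naive_recall(["a", "b"], sim, 2)
```

The full sort over the M*N recall list is the cost that the heap-based preprocessing described later is intended to avoid.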
[0003] It is evident that existing recommendation algorithms have high time and space complexity, which affects the efficiency of item recommendation. Therefore, those skilled in the art urgently need to find a new technical solution to address these issues.
Summary of the Invention
[0004] Therefore, it is necessary to provide a method, apparatus, computer equipment, and storage medium for recommending items to address the aforementioned technical problems, thereby reducing the time and space complexity of the recommendation process and improving the operational efficiency of item recommendation.
[0005] A method for recommending items, the method comprising:
[0006] Retrieve historical click data, including user clicks on items;
[0007] The similarity between any two clicked items is calculated based on the historical click data; each pair of clicked items includes a first item and a second item.
[0008] The items with the aforementioned similarity are optimized using a preset optimization algorithm, and the second item that corresponds to the first item and meets the preset similarity is selected.
[0009] Obtain the click data corresponding to the first item selected by the user, and recommend a second item corresponding to the first item to the user based on the click data.
[0010] An item recommendation device, the device comprising:
[0011] The acquisition module is used to acquire historical click data containing user clicks on items;
[0012] The calculation module is used to calculate the similarity between every two clicked items based on the historical click data; every two clicked items include a first item and a second item;
[0013] The filtering module is used to optimize each item with the aforementioned similarity using a preset optimization algorithm, and to filter out the second item that corresponds to the first item and satisfies the preset similarity.
[0014] The recommendation module is used to obtain the click data corresponding to the first item selected by the user, and recommend a second item corresponding to the first item to the user based on the click data.
[0015] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the aforementioned method for recommending items.
[0016] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned item recommendation method.
[0017] The aforementioned item recommendation method, apparatus, computer equipment, and storage medium recall the most similar items for users. This not only effectively reduces the time and space complexity of existing recall algorithms (for example, reducing the time complexity from O(MNlogMN) to O(MKlogK) and the space complexity from O(M*N) to O(M*K), where K is the total number of items recalled, M is the total number of click sequences for each user, and N is the total number of items clicked by the user, with K being much smaller than M), but also improves the running efficiency of current recall algorithms.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0019] Figure 1 is a schematic diagram of an application environment for an item recommendation method according to an embodiment of the present invention;
[0020] Figure 2 is a flowchart illustrating an item recommendation method according to an embodiment of the present invention;
[0021] Figure 3 is a schematic diagram of the structure of an item recommendation device according to an embodiment of the present invention;
[0022] Figure 4 is a schematic diagram of a preset data structure for an item recommendation method according to an embodiment of the present invention;
[0023] Figure 5 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Implementation
[0024] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0025] The present invention provides a method for recommending items, which can be applied in the application environment shown in Figure 1, in which the client communicates with the server via a network. The client can be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be a standalone server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms. In one embodiment, as shown in Figure 2, an item recommendation method is provided. Taking its application to the server in Figure 1 as an example, the method is illustrated by the following steps S10-S40:
[0026] S10, Obtain historical click data including user clicks on items;
[0027] Understandably, historical click data is data collected through a user-pre-set tracking table. It can be the click information of each user clicking on items within a preset time period. Here, "item" refers to the data of items clicked by the user on the page. The item data can be displayed in a module of the page dimension, such as the item data displayed in a button module of a certain page. Click information refers to the information formed after a user clicks on an item. The historical click data formed by the items clicked by the user can be regarded as data in a two-dimensional data dimension, which can be represented by (user_id, item_id).
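As a rough illustration of the (user_id, item_id) representation, the following sketch groups such records into one item list per user. The function name `build_item_lists` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_item_lists(clicks):
    """Group (user_id, item_id) click records into one item list per user."""
    item_lists = defaultdict(list)
    for user_id, item_id in clicks:
        item_lists[user_id].append(item_id)
    return dict(item_lists)


# Hypothetical click records in (user_id, item_id) form
clicks = [("u1", "i"), ("u1", "j"), ("u2", "g")]
item_lists = build_item_lists(clicks)
```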
[0028] S20, calculate the similarity between every two clicked items based on the historical click data; every two clicked items include a first item and a second item;
[0029] Understandably, historical click data contains data on items clicked by multiple users. Therefore, an item set (represented by `itemList`) can be formed from the item data clicked by each user. Since each item set contains at least two clicked items, the similarity between any two items clicked simultaneously by each user can be calculated using a preset similarity calculation formula. The first item and the second item refer to items clicked by the user within a preset time threshold (i.e., clicked simultaneously). In one embodiment, the first item can refer to clicked item i and the second item to clicked item j. In another embodiment, the first item can refer to clicked item j and the second item to clicked item i; that is, the items referred to by the first item and the second item are interchangeable. However, for a first item and a second item to form a corresponding relationship, they must have been clicked simultaneously. For example, items i and j above can form a corresponding relationship, but an item f that is not clicked simultaneously with items i and j cannot form a simultaneous click relationship with them. Instead, item f can only form a simultaneous click relationship with an item g that is clicked simultaneously with it, in which case items g and f are respectively the first item and the second item. In addition, after the similarity between every two items in the item set is calculated, a similarity matrix of items is formed. This similarity matrix is used as a whole for subsequent optimization processing.
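The simultaneous-click relationship can be illustrated with the sketch below. It assumes each click carries a numeric timestamp and that "within a preset time threshold" means the absolute time difference between two clicks does not exceed the threshold; the source does not specify either detail, and the name `co_clicked_pairs` is hypothetical.

```python
from itertools import combinations


def co_clicked_pairs(timed_clicks, threshold):
    """Return unordered pairs of distinct items one user clicked within `threshold`."""
    pairs = set()
    for (item_a, t_a), (item_b, t_b) in combinations(timed_clicks, 2):
        if item_a != item_b and abs(t_a - t_b) <= threshold:
            pairs.add(frozenset((item_a, item_b)))
    return pairs


# Hypothetical (item, timestamp-in-seconds) clicks for a single user
timed_clicks = [("i", 0), ("j", 5), ("f", 100), ("g", 103)]
pairs = co_clicked_pairs(timed_clicks, threshold=10)
```

With these sample values, i pairs with j and f pairs with g, while f pairs with neither i nor j, which mirrors the example in the text.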
[0030] S30, optimize each item with the aforementioned similarity using a preset optimization algorithm, and select the second item that corresponds to the first item and satisfies the preset similarity.
[0031] Understandably, the preset optimization algorithm refers to the algorithm used to optimize the similarity matrix composed of various similar items. The optimized result will be used in the subsequent recall algorithm. It mainly uses a preset data structure to filter the items corresponding to each node and obtain the filtered second items corresponding to the first item. The preset similarity refers to the highest similarity specified in the preset optimization algorithm. The number of second items corresponding to the first item filtered by the preset similarity is the top K.
[0032] S40, obtain the click data corresponding to the first item selected by the user, and recommend a second item corresponding to the first item to the user based on the click data.
[0033] Understandably, the click data corresponding to the first item selected by the user is the data of items in the user's click sequence. Here, the click sequence refers to the items that the user clicks on the page in order. At the same time, the click sequence does not limit the number of items clicked. Thus, even if there is only one clicked item, a click sequence can be formed, which includes a sequence formed by an item being clicked by the user once or the same item being clicked multiple times by the same user. After the user clicks on an item, the click data is generated, and the top K most similar items can be recommended to the user through the above-mentioned optimized algorithm. In this embodiment, the second item is recalled by the recall algorithm (item recommendation algorithm), that is, the recommended item. In another embodiment, the click data corresponding to the second item selected by the user is obtained, and the recommended item is the first item.
[0034] In the embodiments of steps S10 to S40, based on artificial intelligence, the similarity matching in the existing recall algorithm is improved to recall the items with the highest similarity to the user. This not only effectively reduces the time and space complexity of the existing recall algorithm, such as reducing the time complexity of the recall algorithm from O(MNlogMN) to O(MKlogK) and the space complexity from O(M*N) to O(M*K), where K is the total number of items recalled, M is the total number of click sequences for each user, N is the total number of items clicked by the user, and K is much smaller than M, but also further improves the running efficiency of the recall algorithm, that is, it can further improve the recommendation efficiency of the recall algorithm.
[0035] Furthermore, the step of calculating the similarity between every two clicked items based on the historical click data includes:
[0036] The items clicked by each user in the historical click data are grouped into an item set;
[0037] The frequency of each item in the item set is recorded using a preset first hash table;
[0038] The number of times each pair of items in the item set is clicked together within a preset time threshold is recorded using a preset second hash table;
[0039] The similarity between each pair of items corresponding to each user is calculated using a preset similarity calculation formula. The preset similarity calculation formula is D=A / (B*C), where D is the similarity between each pair of items, A is the number of times each pair of items is clicked together within a preset time threshold, B is the number of times the first item is clicked, and C is the number of times the second item is clicked.
[0040] Understandably, a hash table is a data structure that allows direct access based on a key value: it accesses records by mapping key values to a location in the table, which speeds up lookup. The first hash table records the frequency of each item in the item set. A hash table cnt is created for this purpose (the following examples use i and j, without specific limitation): cnt[i] is the number of occurrences of the first item i and cnt[j] is the number of occurrences of the second item j, updated as cnt[item] += 1. Here, cnt[item] can represent the number of times the same user clicks the same item, with each click counted once. The second hash table records the number of times each pair of items in the item set is clicked together within a preset time threshold. A hash table `mp` can be created to record the co-occurrence between items: `mp[i][j]` is the number of times the first item `i` and the second item `j` are clicked together, or the number of users who clicked the first item `i` and the second item `j` simultaneously, updated as `mp[i][j] += 1`. Items are clicked together when the first item `i` and the second item `j` are clicked simultaneously by the same user, with each such click counted once. The preset similarity calculation formula is D = A / (B * C), that is, similarity between each pair of items = number of times the pair is clicked together within the preset time threshold / (number of times the first item appears * number of times the second item appears). This can be further expressed mathematically as $\mathrm{sim}[i][j] = \dfrac{\mathrm{mp}[i][j]}{\mathrm{cnt}[i] \cdot \mathrm{cnt}[j]}$, where $\mathrm{sim}[i][j]$ represents the similarity between any two items, that is, the similarity between the first item and the second item. The other parameters are as described above.
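A minimal sketch of the two hash tables and the formula D = A / (B * C) follows. It makes two simplifying assumptions that the source does not state: the time-threshold check is omitted, so all distinct items in one user's item list are treated as clicked together, and each co-clicked pair is counted once per user. The function name `item_similarity` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def item_similarity(item_lists):
    """Compute sim[i][j] = mp[i][j] / (cnt[i] * cnt[j]) from per-user item lists."""
    cnt = defaultdict(int)                       # first hash table: item frequency
    mp = defaultdict(lambda: defaultdict(int))   # second hash table: co-click counts
    for items in item_lists.values():
        for item in items:
            cnt[item] += 1                       # each click counts once
        # Assumption: every pair of distinct items in a list is co-clicked
        for i, j in permutations(set(items), 2):
            mp[i][j] += 1
    return {
        i: {j: a / (cnt[i] * cnt[j]) for j, a in row.items()}
        for i, row in mp.items()
    }


# Hypothetical item lists keyed by user
item_lists = {"u1": ["i", "j"], "u2": ["i", "j", "g"]}
sim = item_similarity(item_lists)
```

Because both orderings of each pair are recorded, the resulting table is symmetric, so sim["i"]["j"] equals sim["j"]["i"].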
[0041] Furthermore, the step of recording the number of times each pair of items in the item set is clicked together within a preset time threshold using a preset second hash table includes:
[0042] The initial number of times each pair of items is clicked together within a preset time threshold is recorded using a preset second hash table. This initial number of clicks is then corrected using a preset correction formula to obtain the total number of times each pair of items is clicked together within the preset time threshold. The preset correction formula is $G = E + \dfrac{1}{\log(1 + F)}$, where G is the number of times each pair of items is clicked together within a preset time threshold, E is the initial number of times each pair of items is clicked together within a preset time threshold, and F is the number of items in the item set.
[0043] Understandably, the preset correction formula is G = E + 1 / log(1 + F), that is, number of times each pair of items is clicked together within the preset time threshold = initial number of times the pair is clicked together within the preset time threshold + 1 / log(1 + number of items in the item set), which can be further expressed mathematically as $\mathrm{mp}[i][j] \mathrel{+}= \dfrac{1}{\log(1 + |\mathrm{itemList}|)}$. The initial number of times each pair of items is clicked together within a preset time threshold is equivalent to the initial number of times the pair is clicked simultaneously, and the same applies to the corrected number. The preset correction formula is used in this embodiment because the contribution of active users to the similarity between items should be smaller than that of inactive users, so a penalty for active users needs to be included. Therefore, this embodiment corrects the similarity calculation between each pair of items by adding the IUF (Inverse User Frequency) parameter.
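The correction can be sketched as an increment of 1 / log(1 + F) per co-clicked pair, where F is the size of the user's item set. The sketch assumes a natural logarithm (the source does not state the base), skips the time-threshold check, and uses the hypothetical name `co_click_with_iuf`.

```python
import math
from collections import defaultdict
from itertools import permutations


def co_click_with_iuf(item_lists):
    """Accumulate IUF-weighted co-click counts: mp[i][j] += 1 / log(1 + F)."""
    mp = defaultdict(lambda: defaultdict(float))
    for items in item_lists.values():
        distinct = set(items)
        if len(distinct) < 2:
            continue                       # no pairs to record for this user
        # Assumption: natural log; F is the number of distinct items in the set
        weight = 1.0 / math.log(1 + len(distinct))
        for i, j in permutations(distinct, 2):
            mp[i][j] += weight
    return mp


# Hypothetical data: u2 is more active than u1, so u2's pairs get a smaller weight
item_lists = {"u1": ["i", "j"], "u2": ["i", "j", "g", "h"]}
mp = co_click_with_iuf(item_lists)
```

In this example the pair (i, j) receives a larger increment from the less active user u1 than from u2, which is the penalty on active users that the text describes.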
[0044] Further, the step of optimizing items of each similarity level using a preset optimization algorithm and selecting the second item that corresponds to the first item and satisfies the preset similarity includes:
[0045] Create a preset data structure and add the similarity between each pair of items as a node to the data structure;
[0046] Determine whether the total number of nodes in the data structure exceeds a preset threshold.
[0047] If the number of nodes exceeds a preset threshold, the first node in the data structure is deleted, and the root node in the data structure is used to replace the original first node as the new first node.
[0048] Determine the similarity of the new first node and the similarity of the other child nodes connected to the new first node. Change the connection order of the new first node and the other child nodes according to the similarity. Select the second item that corresponds to the first item and meets the preset similarity.
[0049] Understandably, the preset data structure refers to the structure built by the heap. The heap is a complete binary tree, divided into max-heap and min-heap. As the name suggests, the value of the root node of the max-heap is the maximum value in the heap, and the value of the parent node of each subtree is greater than or equal to the value of the child node. The min-heap is the opposite, the value of the root node is the minimum value in the heap, and the value of the parent node of each subtree is less than or equal to the value of the child node. Since the higher the similarity value, the more similar the two items are, this embodiment uses a min-heap. More specifically, when the number of nodes in the heap exceeds K, a pop operation is performed to pop the item with the smallest similarity at the top of the heap, and retain the K items with the largest similarity. The time complexity of building the heap is O(K), and the time complexity of deleting the top element and adding an element is O(logK), which is the height of the heap.
[0050] Specifically, a heap data structure is created. After the similarity sim[i][j] between the first item i and the second item j is calculated, the tuple (sim[i][j], j) is added as a new node at the end of the heap. When the first item is j and the second item is i, the tuple is (sim[j][i], i); i and j are symmetrical, so sim[i][j] = sim[j][i]. It is then determined whether the number of nodes in the data structure exceeds the preset number threshold, which can be preset to K, the size of the heap. When the size of the heap exceeds K, the top element of the heap (the similarity corresponding to the first node) is deleted: the last element of the heap (the similarity corresponding to the root node) is assigned to the top element, and the last element is then deleted. Since heaps are generally implemented using arrays, deleting the first element of the array directly would shift all subsequent elements forward, which has a time complexity of O(N), whereas deleting the last element has a time complexity of only O(1). For this reason, this embodiment assigns the value of the last element to the first element and deletes the last element. The new top element of the heap (the similarity corresponding to the new first node) is then compared with the elements of its connected child nodes (the similarities corresponding to the child nodes), and the connection order of the new first node and the other child nodes is changed according to the similarity. The items in the nodes left in the heap are the K second items j with the highest similarity to the first item i (equivalent to the top K). That is, the nodes of the second items corresponding to the node of the first item and satisfying the preset similarity are selected. Finally, the K second items j corresponding to the first item i are stored in the created hash table dic, i.e., dic[i]=heap, which is used for the subsequent recall algorithm.
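The size-K min-heap described here can be sketched with Python's standard heapq module. The function name `top_k_neighbours` and the layout of `sim` as a nested dictionary are assumptions made for illustration.

```python
import heapq


def top_k_neighbours(sim, k):
    """For each first item i, keep the k most similar second items in a min-heap."""
    dic = {}
    for i, row in sim.items():
        heap = []
        for j, s in row.items():
            heapq.heappush(heap, (s, j))   # add the tuple (sim[i][j], j)
            if len(heap) > k:
                heapq.heappop(heap)        # drop the least similar item at the top
        dic[i] = heap                      # dic[i] = heap, as in the text
    return dic


# Hypothetical similarities for a single first item "i"
sim = {"i": {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}}
dic = top_k_neighbours(sim, 2)
```

Each push or pop costs O(logK), so processing the N candidates of one item costs O(NlogK) while only K tuples are retained per item.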
[0051] More specifically, since this is a min-heap, one of the two child nodes connected to the new first node must hold the second smallest value in the heap. The child node containing that value is selected and swapped with the new first node. The swapped node is then compared with its new child nodes, and this process is executed recursively until a new heap is formed. That is, the final heap consists of the nodes of the second items that correspond to the node of the first item and satisfy the preset similarity. For example, as shown in Figure 4, the original first node 1 is replaced by the root node 3, and the root node 3 becomes the new first node. After the new first node 3 is compared with its connected child node 2, the new first node 3 and child node 2 are swapped, and child node 2 becomes the new first node. Here, the value in a node represents the similarity, and the child node holding the second smallest value is child node 2.
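The delete-and-reorder step can be sketched on an array-based min-heap as follows. The function name `pop_min` is hypothetical, and the sample values 1, 2, 3 only loosely mirror the node values mentioned for Figure 4, whose exact layout is not reproduced here.

```python
def pop_min(heap):
    """Remove and return the smallest element of a non-empty array-based min-heap."""
    top = heap[0]
    last = heap.pop()                  # O(1): delete the last element
    if heap:
        heap[0] = last                 # the last element becomes the new top
        pos, n = 0, len(heap)
        while True:                    # sift the new top down
            left, right = 2 * pos + 1, 2 * pos + 2
            smallest = pos
            if left < n and heap[left] < heap[smallest]:
                smallest = left
            if right < n and heap[right] < heap[smallest]:
                smallest = right
            if smallest == pos:
                break                  # heap property restored
            heap[pos], heap[smallest] = heap[smallest], heap[pos]
            pos = smallest
    return top


heap = [1, 2, 3]        # node 1 at the top, children 2 and 3
removed = pop_min(heap) # 3 moves to the top, then swaps with 2
```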
[0052] To understand it further, each node in a heap contains a value. The value being compared is the value sim of the first element in the tuple, which is the similarity. Since it is a min-heap, the sim of the first element in the tuple of the top element of the heap must be the smallest, that is, the second item j least similar to the first item i. And the size of the heap is set to K, so the items finally selected are the K second items j most similar to item i. The purpose of using tuples is: after obtaining the tuples corresponding to the K second items j most similar to the first item i using the heap, such as the obtained tuples [(sim1,j1),(sim2,j2),(sim3,j3)], the second items [j1,j2,j3] are extracted and used for recall during the recall process.
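How the stored tuples might be used at recall time is sketched below. The source does not describe how candidates from several clicked items are merged, so this sketch assumes that a candidate reached from more than one clicked item keeps its highest similarity; the function name `recall` and the sample data are hypothetical.

```python
import heapq


def recall(click_sequence, dic, k):
    """Merge the pre-filtered neighbours of each clicked item and return the top k."""
    candidates = {}
    for i in click_sequence:                   # M clicked items
        for s, j in dic.get(i, []):            # at most K tuples (sim, j) each
            # Assumption: keep the highest similarity seen for each candidate
            if s > candidates.get(j, float("-inf")):
                candidates[j] = s
    best = heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
    return [j for j, _ in best]


# Hypothetical preprocessed table: dic[i] holds (sim, j) tuples for item i
dic = {"i": [(0.7, "d"), (0.9, "b")], "g": [(0.8, "d"), (0.3, "e")]}
recommended = recall(["i", "g"], dic, 2)
```

Because each clicked item contributes at most K tuples, the candidate pool holds at most M*K entries instead of M*N.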
[0053] In this embodiment, the time complexity of calculating the K most similar second items j corresponding to the first item i is O(NlogK), while the time complexity without using the heap is O(N). Although the time complexity of using the heap for preprocessing in this embodiment is slightly higher than that without using the heap, this time is negligible compared to the huge benefits brought by heap preprocessing in the recall, and the space complexity can be reduced from O(N) to O(K).
[0054] Furthermore, after determining whether the number of all nodes in the data structure exceeds a preset threshold, the method further includes:
[0055] If the number of nodes does not exceed a preset threshold, the similarity between each pair of items is added as a node to the root node of the data structure.
[0056] Determine the similarity between the root node and the similarity between other child nodes connected to the root node, change the connection order of the root node and other child nodes according to the similarity, and filter out the second item that corresponds to the first item and meets the preset similarity.
[0057] Understandably, when the number of nodes does not exceed the preset threshold, it means there is no need to delete the top element of the heap. In this case, we only need to add the similarity between each pair of items as a node to the root node in the data structure, and change the connection order of the root node and other child nodes according to the similarity. Finally, the items in the nodes left in the heap are the top K second items j with the highest similarity to the first item i. That is, we select the node containing the second item that corresponds to the node containing the first item and satisfies the preset similarity.
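Insertion while the heap is still below the threshold can be sketched as follows. Note that this sketch uses the conventional array-based insertion (append at the end, then swap upward with the parent while smaller), which is an assumption about what the text means by adding a node and changing the connection order; the function name `push` is hypothetical.

```python
def push(heap, node):
    """Insert `node` into an array-based min-heap and restore the heap order."""
    heap.append(node)                      # new node goes to the end of the array
    pos = len(heap) - 1
    while pos > 0:
        parent = (pos - 1) // 2
        if heap[pos] < heap[parent]:       # smaller than parent: swap upward
            heap[pos], heap[parent] = heap[parent], heap[pos]
            pos = parent
        else:
            break                          # heap property holds


heap = []
for node in [(0.5, "a"), (0.2, "b"), (0.9, "c"), (0.1, "d")]:
    push(heap, node)                       # tuples are (similarity, item)
```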
[0058] Furthermore, the step of obtaining historical click data including user clicks on items, where the historical click data includes item click information, includes:
[0059] Pre-acquire user item click events and set up a tracking table based on the item click events;
[0060] Based on the aforementioned tracking data table, historical click data of users clicking on items within a preset time period is obtained.
[0061] Understandably, item click events can be click events set according to user needs, such as the click event of the module where item i is located, or the click event of the module where item j is located. The tracking table can be designed based on item click events to collect historical click data of users on items on the page over a period of time.
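Selecting historical click data for a preset time period from tracking records could look like the sketch below. The record fields `user_id`, `item_id`, and `ts`, the half-open time interval, and the function name `clicks_in_period` are all hypothetical, since the source does not describe the layout of the tracking table.

```python
from datetime import datetime


def clicks_in_period(events, start, end):
    """Return (user_id, item_id) pairs for click events with start <= ts < end."""
    return [
        (e["user_id"], e["item_id"])
        for e in events
        if start <= e["ts"] < end
    ]


# Hypothetical tracking records
events = [
    {"user_id": "u1", "item_id": "i", "ts": datetime(2022, 11, 1)},
    {"user_id": "u1", "item_id": "j", "ts": datetime(2022, 12, 1)},
]
history = clicks_in_period(events, datetime(2022, 10, 15), datetime(2022, 11, 14))
```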
[0062] In summary, the above provides an item recommendation method based on artificial intelligence, which recalls the items with the highest similarity match for users. This method not only effectively reduces the time and space complexity of existing recall algorithms—for example, reducing the time complexity from O(MNlogMN) to O(MKlogK) and the space complexity from O(M*N) to O(M*K), where K is the total number of items recalled, M is the total number of click sequences for each user, and N is the total number of items clicked by the user (with K being much smaller than M)—but also improves the running efficiency of the recall algorithm. This approach can further improve the recommendation efficiency of the recall algorithm. Furthermore, it is simple to implement, utilizing Python's heapq library directly or a min-heap implemented using an array to calculate the top K items with the highest similarity. Additionally, this method is highly scalable; the heap-optimized algorithm can be applied not only to existing itemCF recall algorithms but also to various item similarity-based recall algorithms, such as embedding recall and maximum likelihood recall, thereby effectively improving the running efficiency of the recall algorithm and consequently, its recommendation efficiency.
[0063] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0064] In one embodiment, the present invention also provides an item recommendation device, which corresponds one-to-one with the item recommendation method described in the above embodiments. As shown in Figure 3, this item recommendation device includes an acquisition module 11, a calculation module 12, a filtering module 13, and a recommendation module 14. Detailed descriptions of each functional module are as follows:
[0065] The acquisition module 11 is used to acquire historical click data containing user clicks on items;
[0066] Calculation module 12 is used to calculate the similarity between every two clicked items based on the historical click data; every two clicked items include a first item and a second item;
[0067] The filtering module 13 is used to optimize each item with the aforementioned similarity using a preset optimization algorithm, and to filter out the second item that corresponds to the first item and satisfies the preset similarity.
[0068] Recommendation module 14 is used to obtain click data corresponding to the first item selected by the user, and recommend a second item corresponding to the first item to the user based on the click data.
[0069] Furthermore, the calculation module includes:
[0070] The collection submodule is used to collect each item clicked by the user in the historical click data into an item collection;
[0071] The first recording submodule is used to record the number of times each item appears in the item set through a preset first hash table;
[0072] The second recording submodule is used to record the number of times each pair of items in the item set is clicked together within a preset time threshold using a preset second hash table;
[0073] The calculation submodule is used to calculate the similarity between each pair of items corresponding to each user using a preset similarity calculation formula; wherein, the preset similarity calculation formula is D=A / (B*C), where D is the similarity between each pair of items, A is the number of times each pair of items is clicked together within a preset time threshold, B is the number of times the first item is clicked, and C is the number of times the second item is clicked.
[0074] Furthermore, the second recording submodule includes:
[0075] The correction unit is used to record the initial number of times each pair of items is clicked together within a preset time threshold using a preset second hash table, and then correct this initial number using a preset correction formula, thus obtaining the total number of times each pair of items is clicked together within the preset time threshold; wherein, the preset correction formula is $G = E + \dfrac{1}{\log(1 + F)}$, where G is the number of times each pair of items is clicked together within a preset time threshold, E is the initial number of times each pair of items is clicked together within a preset time threshold, and F is the number of items in the item set.
[0076] Furthermore, the filtering module includes:
[0077] The creation submodule is used to create a preset data structure and add the similarity between each pair of items as a node to the data structure;
[0078] The judgment submodule is used to determine whether the number of all nodes in the data structure exceeds a preset number threshold.
[0079] The replacement submodule is used to delete the first node in the data structure if the number of nodes exceeds a preset threshold, and replace the original first node with the root node in the data structure as the new first node.
[0080] The first filtering submodule is used to determine the similarity of the new first node and the similarity of the other child nodes connected to the new first node, change the connection order of the new first node and the other child nodes according to the similarity, and filter out the second items that correspond to the first items and meet the preset similarity.
[0081] Furthermore, the judgment submodule also includes:
[0082] The adding submodule is used to add the similarity between each pair of items as a node to the root node of the data structure if the number of nodes does not exceed a preset threshold.
[0083] The second filtering submodule is used to determine the similarity between the root node and the similarity between other child nodes connected to the root node, change the connection order of the root node and other child nodes according to the similarity, and filter out the second item that corresponds to the first item and satisfies the preset similarity.
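The node insertion, threshold check, and reordering performed by the filtering module can be illustrated with a bounded min-heap. The Python sketch below is only one possible reading: it uses the standard `heapq` module rather than the embodiment's own data structure, and the names `top_k_similar`, `similarities`, and `k` are hypothetical.

```python
import heapq


def top_k_similar(similarities, k):
    """Keep the k most similar second items for one first item.

    similarities: iterable of (similarity, second_item) tuples.
    """
    heap = []  # min-heap: the least similar retained item sits at the top
    for sim, item in similarities:
        # Append as a new node and restore heap order (sift up).
        heapq.heappush(heap, (sim, item))
        if len(heap) > k:
            # Remove the top (smallest similarity); the heap moves its last
            # element to the top and reorders against child nodes (sift down).
            heapq.heappop(heap)
    # Nodes remaining in the heap are the k most similar items.
    return sorted(heap, reverse=True)
```

For example, `top_k_similar([(0.1, "a"), (0.5, "b"), (0.3, "c"), (0.9, "d")], 2)` returns `[(0.9, "d"), (0.5, "b")]`. Because the heap never holds more than k + 1 nodes, each first item needs O(K) space and O(N log K) time for N candidate items, instead of sorting a full list.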
[0084] Furthermore, the acquisition module includes:
[0085] The setting submodule is used to pre-acquire user item click events and set up a tracking table based on the item click events.
[0086] The acquisition submodule is used to acquire historical click data of users clicking on items within a preset time period based on the tracking table.
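A minimal sketch of the acquisition step is given below, assuming the tracking table is an in-memory list of click events and the preset time period is a half-open interval of timestamps. The `ClickEvent` fields and the function `historical_clicks` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    """One row of the (assumed) tracking table."""
    user_id: str
    item_id: str
    timestamp: float  # e.g. seconds since the epoch


def historical_clicks(tracking_table, start, end):
    """Group clicks with start <= timestamp < end by user, in time order."""
    result = {}
    for event in sorted(tracking_table, key=lambda e: e.timestamp):
        if start <= event.timestamp < end:
            clicks = result.setdefault(event.user_id, [])
            clicks.append((event.item_id, event.timestamp))
    return result
```

The result maps each user to a time-ordered list of `(item, timestamp)` clicks, a shape that the later similarity calculation can consume directly.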
[0087] For specific limitations regarding an item recommendation device, please refer to the limitations regarding an item recommendation method described above, which will not be repeated here. Each module in the aforementioned item recommendation device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
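As one example of a software implementation of the calculation and recommendation modules, the Python sketch below applies the similarity formula D = A / (B*C) stated in the claims and then looks up the second items for a clicked first item. The function names and the dictionary layout are assumptions made for illustration, not the embodiment's actual interfaces.

```python
def similarity(co_clicks, clicks_first, clicks_second):
    """D = A / (B * C), with A the co-click count of the pair and
    B, C the click counts of the first and second item."""
    return co_clicks / (clicks_first * clicks_second)


def build_similarity_table(item_count, pair_count):
    """Map each first item to a list of (similarity, second_item) tuples."""
    table = {}
    for (a, b), co in pair_count.items():
        d = similarity(co, item_count[a], item_count[b])
        # The formula is symmetric, so record the pair in both directions.
        table.setdefault(a, []).append((d, b))
        table.setdefault(b, []).append((d, a))
    return table


def recommend(table, clicked_item):
    """Return second items for the clicked first item, most similar first."""
    ranked = sorted(table.get(clicked_item, []), reverse=True)
    return [item for _, item in ranked]
```

In a fuller pipeline, the list stored for each first item would be limited to the K most similar second items before it is written to the hash table, so that a lookup at recommendation time touches at most K entries.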
[0088] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 5. The computer device includes a processor, memory, interface, and database connected via a system bus. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database stores data involved in an item recommendation method. The interface connects and communicates with external terminals. The computer program, when executed by the processor, implements an item recommendation method.
[0089] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the item recommendation method described in the above embodiments, for example steps S10 to S40 shown in Figure 2. Alternatively, when the processor executes the computer program, it implements the functions of each module / unit of the item recommendation device in the above embodiments, for example the functions of modules 11 to 14 shown in Figure 3. To avoid repetition, they will not be described again here.
[0090] In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the item recommendation method described in the above embodiments, for example steps S10 to S40 shown in Figure 2. Alternatively, when the computer program is executed by a processor, it implements the functions of each module / unit of the item recommendation device in the above embodiments, for example the functions of modules 11 to 14 shown in Figure 3. To avoid repetition, they will not be described again here.
[0091] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and / or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0092] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
[0093] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A method for recommending items, characterized in that the method includes: acquiring historical click data containing user clicks on items; calculating the similarity between every two clicked items based on the historical click data, where every two clicked items include a first item and a second item; optimizing the items of each similarity using a preset optimization algorithm, and filtering out the second item that corresponds to the first item and satisfies the preset similarity; and obtaining the click data corresponding to the first item selected by the user, and recommending the second item corresponding to the first item to the user based on the click data; wherein calculating the similarity between every two clicked items based on the historical click data includes: grouping the items clicked by each user in the historical click data into an item set; recording the frequency of each item in the item set using a preset first hash table; recording the number of times each pair of items in the item set is clicked together within a preset time threshold using a preset second hash table; and calculating the similarity between each pair of items corresponding to each user using a preset similarity calculation formula, the preset similarity calculation formula being D = A / (B*C), where D is the similarity between each pair of items, A is the number of times each pair of items is clicked together within the preset time threshold, B is the number of times the first item is clicked, and C is the number of times the second item is clicked; wherein recording the number of times each pair of items in the item set is clicked together within the preset time threshold using the preset second hash table includes: recording the initial number of times each pair of items is clicked together within the preset time threshold using the preset second hash table; and correcting this initial number using a preset correction formula to obtain the total number of times each pair of items is clicked together within the preset time threshold, the preset correction formula being G = ..., where G is the number of times each pair of items is clicked together within the preset time threshold, E is the initial number of times each pair of items is clicked together within the preset time threshold, and F is the number of items in the item set; and wherein the method further includes: creating a heap data structure; after calculating the similarity between the first and second items, adding the similarity as a tuple to the end of the heap as a new node; when the size of the heap exceeds a preset threshold K, deleting the similarity corresponding to the first and second nodes, assigning the last element in the heap to the top element, and then deleting the last element; determining whether the number of all nodes in the data structure exceeds the preset threshold K, which represents the size of the heap; comparing the new top element of the heap with the elements of its connected child nodes, and changing the connection order of the new first and second nodes and the other child nodes according to the similarity, the items in the nodes remaining in the heap being the K second items with the highest similarity to the first item; and finally storing the K second items corresponding to the first item through a hash function.
2. The item recommendation method according to claim 1, characterized in that the step of optimizing the items of each similarity using a preset optimization algorithm and filtering out the second item that corresponds to the first item and satisfies the preset similarity includes: creating a preset data structure and adding the similarity between each pair of items as a node to the data structure; determining whether the number of all nodes in the data structure exceeds a preset threshold; if the number of nodes exceeds the preset threshold, deleting the first node in the data structure and using the root node in the data structure to replace the original first node as the new first node; and comparing the similarity of the new first and second nodes with the similarities of the other child nodes connected to them, changing the connection order of the new first and second nodes and the other child nodes according to the similarity, and filtering out the second item that corresponds to the first item and satisfies the preset similarity.
3. The item recommendation method according to claim 2, characterized in that, after determining whether the number of all nodes in the data structure exceeds the preset threshold, the method further includes: if the number of nodes does not exceed the preset threshold, adding the similarity between each pair of items as a node to the root node of the data structure; and comparing the similarity of the root node with the similarities of the other child nodes connected to the root node, changing the connection order of the root node and the other child nodes according to the similarity, and filtering out the second item that corresponds to the first item and satisfies the preset similarity.
4. The item recommendation method according to claim 1, characterized in that acquiring historical click data containing user clicks on items includes: pre-acquiring user item click events and setting up a tracking table based on the item click events; and, based on the tracking table, obtaining historical click data of users clicking on items within a preset time period.
5. An item recommendation device, characterized in that the device includes: an acquisition module, used to acquire historical click data containing user clicks on items; a calculation module, used to calculate the similarity between every two clicked items based on the historical click data, where every two clicked items include a first item and a second item; a filtering module, used to optimize the items of each similarity using a preset optimization algorithm, and to filter out the second item that corresponds to the first item and satisfies the preset similarity; and a recommendation module, used to obtain the click data corresponding to the first item selected by the user, and to recommend the second item corresponding to the first item to the user based on the click data; wherein the calculation module includes: a collection submodule, used to collect the items clicked by each user in the historical click data into an item set; a first recording submodule, used to record the number of times each item appears in the item set using a preset first hash table; a second recording submodule, used to record the number of times each pair of items in the item set is clicked together within a preset time threshold using a preset second hash table; and a calculation submodule, used to calculate the similarity between each pair of items corresponding to each user using a preset similarity calculation formula, the preset similarity calculation formula being D = A / (B*C), where D is the similarity between each pair of items, A is the number of times each pair of items is clicked together within the preset time threshold, B is the number of times the first item is clicked, and C is the number of times the second item is clicked; wherein the second recording submodule includes: a correction unit, used to record the initial number of times each pair of items is clicked together within the preset time threshold using the preset second hash table, and then to correct that initial number using a preset correction formula, thereby obtaining the total number of times each pair of items is clicked together within the preset time threshold, the preset correction formula being G = ..., where G is the number of times each pair of items is clicked together within the preset time threshold, E is the initial number of times each pair of items is clicked together within the preset time threshold, and F is the number of items in the item set; and wherein the item recommendation device is further used to: create a heap data structure; after calculating the similarity between the first and second items, add the similarity as a tuple to the end of the heap as a new node; when the size of the heap exceeds a preset threshold K, delete the similarity corresponding to the first and second nodes, assign the last element in the heap to the top element, and then delete the last element; determine whether the number of all nodes in the data structure exceeds the preset threshold K, which represents the size of the heap; compare the new top element of the heap with the elements of its connected child nodes, and change the connection order of the new first and second nodes and the other child nodes according to the similarity, the items in the nodes remaining in the heap being the K second items with the highest similarity to the first item; and finally store the K second items corresponding to the first item through a hash function.
6. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the item recommendation method described in any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the item recommendation method described in any one of claims 1 to 4.