An operation prediction method and related apparatus

By using a first feature extraction network to extract unbiased features in the recommendation system and fusing them with scene-related features, combined with gradient orthogonalization, the problem of insufficient capture of user behavior features in different scenarios is solved, achieving higher prediction accuracy and resource efficiency.

CN115237732B · Active · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2022-06-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing recommendation systems fail to effectively capture user behavior features in different scenarios, resulting in poor accuracy in predicting operational information, and modeling each scenario independently consumes a lot of resources.

Method used

Unbiased features are extracted through the first feature extraction network and fused with scene-related features. The gradient is then updated using orthogonalization to improve the model's generalization ability.

Benefits of technology

It improves the accuracy of operational information prediction, reduces resource consumption, and enhances the model's generalization ability in different scenarios.

✦ Generated by Eureka AI based on patent content.

Figure CN115237732B_ABST

Abstract

An operation prediction method can be applied to the field of artificial intelligence, and the method comprises: obtaining first embedding representation and second embedding representation of attribute information of a user and an article through a first feature extraction network and a second feature extraction network respectively, the first embedding representation being a feature irrelevant to recommendation scene information, and the second embedding representation being a feature related to a target recommendation scene; the first embedding representation and the second embedding representation are used for fusion to obtain fused embedding representation, and target operation information of the user to the article is predicted according to the fused embedding representation. The fused feature obtained by the application can represent the behavior characteristics specific to the user in each scene, and can also represent the behavior characteristics specific to the user between different scenes, thereby improving the prediction accuracy of subsequent operation information prediction.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence, and more particularly to an operation prediction method and related apparatus.

Background Technology

[0002] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess the functions of perception, reasoning, and decision-making.

[0003] Click-through rate (CTR) prediction refers to predicting the probability of a user selecting an item under specific circumstances. For example, CTR prediction plays a crucial role in recommendation systems for applications such as app stores and online advertising. Through CTR prediction, businesses can maximize revenue and improve user satisfaction. Recommendation systems need to consider both the user's selection rate and the item's bid. The selection rate is predicted by the recommendation system based on the user's historical behavior, while the item's bid represents the system's revenue after the item is selected / downloaded. For instance, a function can be constructed that calculates a value based on the predicted user selection rate and item bid, and the recommendation system can then sort items in descending order according to this function value.
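The ranking step described above can be sketched as follows. This is a minimal illustration, assuming the scoring function is simply the product of the predicted selection rate and the bid; the text only says "a function", so the product, the function name `rank_items`, and the sample items are all hypothetical.

```python
from typing import List, Tuple


def rank_items(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Return item ids sorted by score, highest first.

    Each candidate is (item_id, predicted_selection_rate, bid).
    The score used here (rate * bid) is an assumed example of the
    "function" mentioned in the text, not the patented formula.
    """
    scored = [(rate * bid, item_id) for item_id, rate, bid in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]


# Hypothetical candidates: (item id, predicted selection rate, bid).
ranking = rank_items([
    ("app_a", 0.10, 2.0),
    ("app_b", 0.30, 1.0),
    ("app_c", 0.05, 3.0),
])
```

An item with a modest bid can still rank first when its predicted selection rate is high enough, which is why the accuracy of the predicted rate matters for the final ordering.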

[0004] To meet users' personalized needs, recommendation systems cover various recommendation scenarios: browsers, the minus-one screen, video streams, etc. Users exhibit different behaviors in different scenarios based on their preferences, and each scenario has user-specific behavioral characteristics. Typically, each scenario is modeled independently. However, modeling a single scenario independently is insufficient because the same user behaves differently in different scenarios, making it difficult to effectively capture the user's behavioral characteristics across different scenarios. Furthermore, when there are many scenarios, independently modeling and maintaining each scenario results in significant manpower and resource consumption. Moreover, if a single feature extraction network is used to extract features from multiple scenarios, the network cannot learn common behavioral characteristics due to the differences in features between scenarios, leading to poor prediction accuracy of operation information.

Summary of the Invention

[0005] This application can extract common features (i.e., scene-independent features) in different scenarios through a first feature extraction network, and use the fusion result of these features and scene-related features to predict operation information, thereby improving the prediction accuracy of operation information.

[0006] In a first aspect, this application provides an operation prediction method, the method comprising: acquiring attribute information of users and items in a target recommendation scenario; obtaining, based on the attribute information, a first embedding representation through a first feature extraction network and a second embedding representation through a second feature extraction network, wherein the first embedding representation is a feature unrelated to the recommendation scenario information and the second embedding representation is a feature related to the target recommendation scenario; fusing the first embedding representation and the second embedding representation to obtain a fused embedding representation (e.g., by matrix multiplication or other fusion methods); and predicting the user's target operation information on the item based on the fused embedding representation.

[0007] The first feature extraction network extracts features unrelated to the recommendation scenario information, namely the unbiased feature representation of each scenario, and fuses them with features related to the recommendation scenario information (which can be called scenario representation in this application embodiment). This can represent the user-specific behavioral characteristics of each scenario, as well as the user-specific behavioral characteristics between different scenarios, thereby improving the prediction accuracy of subsequent operation information prediction.
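The fusion and prediction steps of the first aspect can be sketched as follows. This is a simplified illustration, not the claimed implementation: the two embeddings are plain lists of floats, fusion is done either by concatenation (splicing) or by an element-wise product chosen here as one example of "other fusion methods", and the prediction network is reduced to a single logistic unit. The names `fuse` and `predict_probability` are hypothetical.

```python
import math
from typing import List


def fuse(first: List[float], second: List[float], mode: str = "concat") -> List[float]:
    """Fuse a scenario-independent and a scenario-related embedding."""
    if mode == "concat":
        # Splicing: keep both embeddings side by side.
        return first + second
    if mode == "product":
        # Element-wise product (an assumed alternative fusion method).
        if len(first) != len(second):
            raise ValueError("embeddings must have equal length")
        return [a * b for a, b in zip(first, second)]
    raise ValueError(f"unknown fusion mode: {mode}")


def predict_probability(fused: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Stand-in prediction head: one logistic unit over the fused embedding."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


fused_example = fuse([1.0, 2.0], [3.0, 4.0])
```

In a real system the prediction head would be a trained network specific to the target recommendation scenario; the logistic unit only shows where the fused embedding enters the prediction.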

[0008] In one possible implementation, the attribute information includes the user's operation data in the target recommendation scenario, and the operation data also includes the user's first operation information on the item;

[0009] The method further includes: predicting second operation information of the user on the item using a first neural network based on the attribute information; predicting a first recommendation scenario of the operation data using a second neural network based on the attribute information; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine a first loss; based on the first loss, orthogonalizing the gradient corresponding to the third feature extraction network in the first neural network and the gradient corresponding to the fourth feature extraction network in the second neural network to obtain a first gradient corresponding to the third feature extraction network; and updating the third feature extraction network based on the first gradient to obtain the first feature extraction network.

[0010] The approach of this application embodiment is as follows. By training the third feature extraction network, the trained network can identify unbiased feature representations shared across scenarios. This embodiment also uses a second neural network. Because the second neural network identifies the recommendation scenario in which the operation data was generated, the embedding representation obtained by the fourth feature extraction network in the second neural network carries semantic information strongly related to the recommendation scenario. That information is not needed in the unbiased feature representation. Therefore, so that the third feature extraction network learns to produce embedding representations without semantic information strongly related to the recommendation scenario (which this application embodiment can call the unbiased representation), the gradients of the third and fourth feature extraction networks are orthogonalized when determining the gradients used to update them. Orthogonalization constrains the gradient directions (i.e., the parameter update directions) of the two networks to be mutually orthogonal or nearly so. This lets the embedding representations extracted by the two networks carry different information, achieving separation of the embedding representations. Because the trained second neural network distinguishes the recommendation scenario of the operation data well, the embedding representation extracted by the updated fourth feature extraction network carries semantic information strongly correlated with the recommendation scenario. Furthermore, the first neural network is used to predict operation information, and the trained first neural network has good predictive ability for user operations. Therefore, the trained third feature extraction network can identify information used for operation information prediction (i.e., the outer edge of the information), and this information does not carry semantic information strongly correlated with the recommendation scenario. This improves the generalization ability of the recommendation model across scenarios.

[0011] In one possible implementation, an additional neural network can be deployed to orthogonalize the gradients corresponding to the third feature extraction network and the fourth feature extraction network.

[0012] In one possible implementation, a constraint term can be added to the first loss to orthogonalize the gradients of the third and fourth feature extraction networks.
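One way such a constraint term could look is sketched below. This is an assumption for illustration only: the text does not give the form of the term, so the squared cosine similarity between the two gradients is used here because it is zero exactly when the gradients are orthogonal. The names `squared_cosine`, `loss_with_constraint`, and the `strength` coefficient are hypothetical.

```python
from typing import Sequence


def squared_cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Squared cosine similarity; 0.0 when u and v are orthogonal."""
    dot_uv = sum(a * b for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    if norm_u_sq < eps or norm_v_sq < eps:
        # A zero gradient has no direction, so nothing is penalized.
        return 0.0
    return (dot_uv * dot_uv) / (norm_u_sq * norm_v_sq)


def loss_with_constraint(
    base_loss: float,
    grad_third: Sequence[float],
    grad_fourth: Sequence[float],
    strength: float = 0.1,
) -> float:
    """First loss plus an assumed orthogonality penalty on the two gradients."""
    return base_loss + strength * squared_cosine(grad_third, grad_fourth)
```

In an actual training framework, minimizing a penalty on gradients requires differentiating through the gradient computation; the sketch only shows the value of the penalty.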

[0013] In one possible implementation, after obtaining the initial gradients corresponding to the third and fourth feature extraction networks based on the first loss, these initial gradients can be orthogonalized so that the directions of the resulting gradients of the third and fourth feature extraction networks are orthogonal (or nearly orthogonal).
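Direct orthogonalization of two initial gradients can be illustrated with a vector projection. This is a minimal sketch under the assumption that each network's gradient is flattened into a single vector of the same length and that orthogonalization means removing from one gradient its component along the other; the text does not fix the exact procedure, and the function names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(g_target: Sequence[float], g_reference: Sequence[float],
                  eps: float = 1e-12) -> List[float]:
    """Remove from g_target its component along g_reference.

    The result is orthogonal to g_reference (up to rounding error).
    """
    ref_norm_sq = dot(g_reference, g_reference)
    if ref_norm_sq < eps:
        # Nothing to project onto; return the gradient unchanged.
        return list(g_target)
    scale = dot(g_target, g_reference) / ref_norm_sq
    return [a - scale * b for a, b in zip(g_target, g_reference)]


# Hypothetical flattened gradients of the third and fourth networks.
grad_third = [1.0, 1.0]
grad_fourth = [1.0, 0.0]
first_gradient = orthogonalize(grad_third, grad_fourth)
```

After the projection, an update along `first_gradient` no longer moves the third network in the direction that the scenario-classification gradient favors.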

[0014] In one possible implementation, while the user's second operation information on the item is predicted through the first neural network based on the attribute information, and the first recommendation scenario of the operation data is predicted through the second neural network based on the attribute information, information indicating the target recommendation scenario is not used as input to the third and fourth feature extraction networks.

[0015] Since the first operation information is used as the ground truth during training of the third feature extraction network, it does not need to be input into the third feature extraction network during the feedforward process. Similarly, since the information indicating the target recommendation scenario is used as the ground truth during training of the fourth feature extraction network, it does not need to be input into the fourth feature extraction network during the feedforward process.

[0016] In this embodiment of the application, in order to improve the generalization of the model, the unbiased representation obtained by the third feature extraction network and the biased representation obtained by the fourth feature extraction network can be combined (or fused), so that the combined representation can still have a high prediction ability after being processed by the neural network (for operation information prediction).

[0017] In one possible implementation, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be input into the fourth neural network to predict the user's fifth operation information on the item; the difference between the fifth operation information and the first operation information is used to determine the first loss. For example, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be fused (e.g., by splicing), and the fused result can be input into the fourth neural network. Optionally, the fourth neural network and the first neural network can have the same or similar network structures. For details, please refer to the description of the first neural network in the above embodiments, which will not be repeated here.

[0018] In one possible implementation, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be input into the fourth neural network to obtain the user's fifth operation information on the item. The first loss is constructed based on the difference between the fifth operation information and the first operation information (i.e., the truth value). In other words, the first loss includes not only the loss terms that include the difference between the first and second operation information and the difference between the first recommendation scenario and the target recommendation scenario, but also the difference between the fifth operation information and the first operation information.
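The composition of the first loss described above can be sketched as a sum of three terms. The loss functions are assumptions: binary cross-entropy is used for the two operation predictions and categorical cross-entropy for the scenario prediction, combined with equal weights, none of which is specified in the text. All names are hypothetical.

```python
import math
from typing import Sequence

_EPS = 1e-7  # clamp to avoid log(0)


def binary_cross_entropy(predicted: float, label: int) -> float:
    p = min(max(predicted, _EPS), 1.0 - _EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def cross_entropy(scenario_probs: Sequence[float], target_index: int) -> float:
    return -math.log(max(scenario_probs[target_index], _EPS))


def first_loss(
    second_operation_prob: float,    # output of the first neural network
    fifth_operation_prob: float,     # output of the fourth neural network
    scenario_probs: Sequence[float],  # output of the second neural network
    first_operation_label: int,      # ground truth: 1 if the operation occurred
    target_scenario_index: int,      # ground truth recommendation scenario
) -> float:
    """Assumed unweighted sum of the three difference terms."""
    return (
        binary_cross_entropy(second_operation_prob, first_operation_label)
        + cross_entropy(scenario_probs, target_scenario_index)
        + binary_cross_entropy(fifth_operation_prob, first_operation_label)
    )
```

With completely uninformative predictions (probability 0.5 everywhere and two scenarios), each term equals ln 2, so the total is 3 ln 2.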

[0019] In one possible implementation, the difference between the target operation information and the first operation information is used to determine the second loss; the method also includes updating the first feature extraction network based on the second loss.

[0020] In one possible implementation, during the actual model inference process, the trained third feature extraction network needs to be connected to the scene-related operation information prediction network. Each scene corresponds to an operation information prediction network related to that scene. During inference, in order to predict the user's operation information on items in a certain recommendation scene (recommendation scene A), the attribute information of the user and the item is input into the trained third feature extraction network to obtain an embedded representation (e.g., a first embedded representation). The attribute information of the user and the item is also input into the feature extraction network related to recommendation scene A (or input into the feature extraction networks related to each scene, and then weighted based on the scene weights) to obtain an embedded representation (e.g., a second embedded representation). The first and second embedded representations can be fused and input into the operation information prediction network corresponding to recommendation scene A to obtain the predicted operation information (e.g., target operation information).

[0021] Therefore, during training, updating the third feature extraction network requires not only the gradient obtained by backpropagation from the output of the first neural network (e.g., the first gradient), but also the gradient obtained by backpropagation from the output of the operation information prediction network corresponding to each scenario (e.g., the target operation information), referred to as the second gradient.

[0022] However, the gradients mentioned above are scenario-specific gradients (e.g., the second gradient). These gradients (e.g., the second gradient) can have a negative impact on the gradients obtained based on unbiased representations (e.g., the first gradient). The parameter update directions of these gradients will conflict. For example, if gradients A and B are gradients with opposite directions, directly superimposing gradients A and B and then updating them is equivalent to not updating the parameters at all. This means that the effective information between them cannot be used effectively to improve the performance of the corresponding scenario.
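The cancellation effect in this example can be shown numerically. The sketch below uses made-up two-dimensional gradients and a plain gradient-descent step; the values and the helper `sgd_step` are hypothetical.

```python
from typing import List, Sequence


def sgd_step(params: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """One plain gradient-descent step."""
    return [p - lr * g for p, g in zip(params, grad)]


params = [1.0, 2.0]
grad_a = [0.5, -1.0]   # e.g. gradient from the unbiased representation
grad_b = [-0.5, 1.0]   # e.g. scenario-specific gradient, opposite direction

# Directly superimposing the two gradients gives a zero vector.
summed = [a + b for a, b in zip(grad_a, grad_b)]
updated = sgd_step(params, summed, lr=0.1)
```

Here `updated` equals `params`: neither gradient's information reaches the parameters, which is the problem the subsequent paragraphs address.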

[0023] In this embodiment, to address the aforementioned issues, the third feature extraction network is first updated based on the gradient obtained from the unbiased representation. Then, on one hand, the attribute information of users and items is processed by a feature extraction network related to the recommendation scenario (e.g., a second feature extraction network), or is input into the feature extraction networks related to each scenario and the outputs are weighted according to scenario weights, to obtain an embedding representation (e.g., a second embedding representation). On the other hand, the updated third feature extraction network is used to process the attribute information of users and items to obtain an embedding representation (e.g., a first embedding representation). The first and second embedding representations can be fused and input into the operation information prediction network corresponding to recommendation scenario A to obtain predicted operation information (e.g., target operation information). A loss (e.g., a second loss) is obtained based on the target operation information, and a gradient (e.g., a second gradient) is determined based on the second loss. The first feature extraction network is then updated according to the second gradient.

[0024] In this way, instead of combining the gradient obtained from the unbiased representation with the gradient obtained from scene-related operation information and applying both to the third feature extraction network at once, this application first updates the third feature extraction network based on the gradient obtained from the unbiased representation (to obtain the first feature extraction network), and then updates the first feature extraction network based on the gradient obtained from scene-related operation information. This avoids negative interference between the scene-specific gradient and the gradient obtained from the unbiased representation, and makes good use of the effective information in both to improve performance in the corresponding scenario.
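The two-stage update can be sketched on a single scalar parameter. Everything here is a toy assumption: the two quadratic losses stand in for the first loss (unbiased representation) and the second loss (scenario-specific prediction), and the point illustrated is only that the second gradient is evaluated after the first update has been applied, rather than summed with the first gradient beforehand.

```python
def grad_first_loss(w: float) -> float:
    """Gradient of the toy first loss (w - 1)^2."""
    return 2.0 * (w - 1.0)


def grad_second_loss(w: float) -> float:
    """Gradient of the toy second loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def two_stage_update(w: float, lr: float) -> float:
    # Stage 1: update with the gradient from the unbiased representation.
    w_after_first = w - lr * grad_first_loss(w)
    # Stage 2: recompute the scenario-specific gradient on the updated
    # parameter, then apply it.
    return w_after_first - lr * grad_second_loss(w_after_first)


w_final = two_stage_update(0.0, lr=0.25)
```

Starting from 0.0 with a learning rate of 0.25, stage 1 moves the parameter to 0.5 and stage 2 moves it to 1.75.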

[0025] In one possible implementation, obtaining the second embedding representation based on the attribute information through the second feature extraction network includes: obtaining multiple embedding representations based on the attribute information through multiple feature extraction networks, including the second feature extraction network, wherein each feature extraction network corresponds to a recommendation scenario and the second feature extraction network corresponds to the target recommendation scenario; and fusing the multiple embedding representations to obtain the second embedding representation.

[0026] In one possible implementation, fusing the multiple embedding representations includes: predicting, based on the attribute information, the probability value that the attribute information corresponds to each recommendation scenario; and fusing the multiple embedding representations using each probability value as the weight of the corresponding recommendation scenario.

[0027] In other words, a corresponding feature extraction network can be set up for each recommendation scenario. During the feedforward process, the attribute information is input into each feature extraction network, and each network outputs an embedding representation. The embedding representations output by the multiple feature extraction networks can then be fused: the weight (or probability value) corresponding to each recommendation scenario is determined from the attribute information, and the embedding representations are fused using these probability values as weights to obtain the second embedding representation. For example, a weighted summation can be used.
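The weighted summation mentioned above can be sketched as follows. This is an illustration under stated assumptions: each per-scenario embedding is a list of floats of the same length, and the scenario probabilities are obtained from hypothetical scenario logits with a softmax, which the text does not prescribe.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn scenario logits into probability values that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scenario_embeddings(
    embeddings: Sequence[Sequence[float]],
    probabilities: Sequence[float],
) -> List[float]:
    """Weighted sum of per-scenario embeddings -> second embedding representation."""
    if len(embeddings) != len(probabilities):
        raise ValueError("one probability value is needed per scenario")
    fused = [0.0] * len(embeddings[0])
    for embedding, weight in zip(embeddings, probabilities):
        for i, value in enumerate(embedding):
            fused[i] += weight * value
    return fused


# Two hypothetical scenarios with 2-dimensional embeddings.
second_embedding = fuse_scenario_embeddings(
    [[1.0, 0.0], [0.0, 1.0]],
    [0.25, 0.75],
)
```

The scenario with the higher probability value contributes more to the fused embedding, so the result leans toward the target recommendation scenario while retaining some information from the others.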

[0028] In one possible implementation, a fourth neural network can be used to obtain the probability values of attribute information corresponding to each recommendation scenario. The fourth neural network can reuse the second neural network and use the output probability for each recommendation scenario as the weight of multiple embedding representations. Alternatively, an end-to-end retraining of a fourth neural network with recommendation scenario prediction capabilities can be selected to achieve efficient fusion of information from multiple scenarios.

[0029] In one possible implementation, during each iteration, the model can be updated based on the gradients obtained from a batch of data. This batch of data can contain operational data from different recommendation scenarios; for example, it can include operational data from a second recommendation scenario. A loss (e.g., the third loss in this embodiment) and a gradient for updating the first feature extraction network can also be obtained based on the operational data from the second recommendation scenario. However, since the gradients obtained based on the second loss and the third loss are from different recommendation scenarios, they may have negative influences on each other (their parameter update directions may conflict; for example, gradients A and B are gradients with opposite directions. If gradients A and B are directly superimposed and then updated, it is equivalent to not updating the parameters at all). This prevents the effective use of their information to improve the performance of the corresponding scenario.

[0030] To address the aforementioned issues, in this embodiment of the application, the gradients obtained based on the second loss and the gradients obtained based on the third loss are orthogonalized, thereby reducing the mutual negative impact between gradients obtained from different recommendation scenarios.

[0031] In one possible implementation, user operation data (including attribute information) in the second recommendation scenario can be obtained, and the user's operation information on items in the second recommendation scenario can be predicted based on that operation data. For example, the attribute information in the operation data of the second recommendation scenario can be input into the feature extraction network corresponding to the second recommendation scenario (see the description of the second feature extraction network in the above embodiments) to obtain an embedding representation, which is then input into the neural network corresponding to the second recommendation scenario (used for predicting operation information in that scenario) to obtain the prediction result. The third loss can be determined from the difference between the prediction result and the true value of the operation information in the operation data of the second recommendation scenario, and backpropagating the third loss yields the gradient (a third gradient) corresponding to the first feature extraction network.

[0032] In one possible implementation, multiple third gradients of the first feature extraction network can be obtained by orthogonalizing multiple gradients corresponding to the first feature extraction network based on the second loss and the third loss. One of the multiple third gradients is obtained based on the second loss, and another of the multiple third gradients is obtained based on the third loss. The multiple third gradients are then fused (e.g., by vector summation) to obtain the second gradient corresponding to the first feature extraction network. The first feature extraction network is then updated based on the second gradient.
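The orthogonalize-then-sum procedure above can be sketched as follows. The projection rule is an assumption borrowed from PCGrad-style gradient surgery, which the text does not name: a gradient is projected only when it conflicts with another one (negative dot product), and the adjusted gradients are then fused by vector summation. Gradients are flattened lists of floats, and all names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def resolve_conflicts(gradients: Sequence[Sequence[float]]) -> List[List[float]]:
    """Project each gradient away from the original gradients it conflicts with."""
    adjusted = []
    for i, grad in enumerate(gradients):
        g_new = list(grad)
        for j, other in enumerate(gradients):
            if i == j:
                continue
            overlap = dot(g_new, other)
            norm_sq = dot(other, other)
            if overlap < 0 and norm_sq > 0:
                # Remove the component that points against `other`.
                scale = overlap / norm_sq
                g_new = [a - scale * b for a, b in zip(g_new, other)]
        adjusted.append(g_new)
    return adjusted


def fuse_by_sum(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Vector summation of the adjusted gradients."""
    return [sum(components) for components in zip(*gradients)]


# Hypothetical gradients of the first feature extraction network.
grad_from_second_loss = [1.0, 0.0]
grad_from_third_loss = [-1.0, 1.0]
third_gradients = resolve_conflicts([grad_from_second_loss, grad_from_third_loss])
second_gradient = fuse_by_sum(third_gradients)
```

In this example the two raw gradients partly cancel along the first coordinate; after projection, each adjusted gradient is orthogonal to the other raw gradient, and their sum still moves both coordinates.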

[0033] In one possible implementation, the operational data includes information indicating the target recommendation scenario; the method further includes: obtaining a third embedding representation based on the operational data through a second feature extraction network; predicting third operational information of the user on the item based on the third embedding representation through a third neural network; wherein the difference between the third operational information and the first operational information is used to determine a fourth loss; and updating the third neural network and the second feature extraction network based on the fourth loss.

[0034] It should be understood that while the weighted fusion method based on multiple feature extraction networks described above can result in a second embedded representation that contains more information about the corresponding recommendation scenario and less information about non-corresponding recommendation scenarios, the attribute information (excluding information indicating the target recommendation scenario) is input into multiple feature extraction networks. Therefore, the embedded representations output by these networks lack accurate semantic information about the corresponding recommendation scenario. Thus, during the training of multiple feature extraction networks, information indicating the recommendation scenario and attribute information can be additionally input into the feature extraction networks to participate in the feedforward process of network training.

[0035] In one possible implementation, the operation information indicates whether the user has performed a target operation on the item, the target operation including at least one of the following: a click operation, a browse operation, an add-to-cart operation, and a purchase operation.

[0036] In one possible implementation, the attribute information includes user attributes, which include at least one of the following: gender, age, occupation, income, hobbies, and education level.

[0037] In one possible implementation, the attribute information includes the item's attributes, which include at least one of the following: item name, developer, installation package size, category, and rating.

[0038] The user's attribute information can be attributes related to the user's preferences, including at least one of gender, age, occupation, income, hobbies, and education level. Gender can be male or female, age can be a number between 0 and 100, occupation can be teacher, programmer, chef, etc., hobbies can be basketball, tennis, running, etc., and education level can be primary school, junior high school, high school, university, etc. This application does not limit the specific type of user attribute information.

[0039] The items can be physical or virtual, such as apps, audio / video files, web pages, and news articles. The attribute information of the items can include at least one of the following: item name, developer, installation package size, category, and rating. For example, if an item is an application, the category can be chat, parkour, office, etc., and the rating can be a score or comment on the item. This application does not limit the specific type of attribute information of the items.

[0040] In one possible implementation, different recommendation scenarios are different applications, or different recommendation scenarios are different types of applications (e.g., video applications and browser applications are different applications), or different recommendation scenarios are different functions of the same application (e.g., different channels of the same application, such as news channels, technology channels, etc.), and the above different functions can be divided according to recommendation categories.

[0041] In one possible implementation, the method further includes: determining to recommend items to the user when the target operation information meets preset conditions.
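A preset condition of this kind can be as simple as a probability threshold. The sketch below assumes the target operation information is a predicted probability and that the condition is "probability at or above a threshold"; the threshold value and function names are hypothetical, and other conditions (such as keeping the top-ranked items) are equally compatible with the text.

```python
from typing import Dict, List


def should_recommend(predicted_probability: float, threshold: float = 0.5) -> bool:
    """Assumed preset condition: predicted probability reaches the threshold."""
    return predicted_probability >= threshold


def select_items(predictions: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep the items whose prediction meets the preset condition."""
    return [item for item, prob in predictions.items() if should_recommend(prob, threshold)]


selected = select_items({"app_a": 0.8, "app_b": 0.2, "app_c": 0.5})
```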

[0042] In a second aspect, this application provides an operation prediction device, the device comprising:

[0043] The acquisition module is used to acquire attribute information of users and items in the target recommendation scenario;

[0044] The feature extraction module is used to obtain a first embedding representation and a second embedding representation based on attribute information through a first feature extraction network and a second feature extraction network, respectively. The first embedding representation is a feature that is unrelated to the recommendation scenario information, and the second embedding representation is a feature that is related to the target recommendation scenario. The first embedding representation and the second embedding representation are used to fuse to obtain a fused embedding representation.

[0045] The prediction module is used to predict the user's target operation information on the item based on the fused embedded representation.

[0046] In one possible implementation, the attribute information includes the user's operation data in the target recommendation scenario, and the operation data also includes the user's first operation information on the item;

[0047] The prediction module is also used for:

[0048] Based on the attribute information, predicting the user's second operation information on the item through the first neural network;

[0049] Based on the attribute information, predicting a first recommendation scenario of the operation data through a second neural network; wherein the difference between the first operation information and the second operation information, as well as the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss.

[0050] The device also includes:

[0051] The model update module is used to orthogonalize, based on the first loss, the gradient corresponding to the third feature extraction network in the first neural network and the gradient corresponding to the fourth feature extraction network in the second neural network, so as to obtain the first gradient corresponding to the third feature extraction network.

[0052] Based on the first gradient, the third feature extraction network is updated to obtain the first feature extraction network.
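One common way to orthogonalize one gradient against another is to subtract its projection onto the other. The sketch below illustrates that idea only; it assumes the two gradients are flattened into vectors of equal length, and the projection rule, the plain gradient-descent step and the names `orthogonalize` and `sgd_step` are illustrative assumptions rather than the procedure fixed by this application.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(grad, reference, eps=1e-12):
    """Remove from `grad` its component along `reference` (vector projection)."""
    denom = dot(reference, reference)
    if denom < eps:
        # A zero reference gradient leaves `grad` unchanged.
        return list(grad)
    scale = dot(grad, reference) / denom
    return [g - scale * r for g, r in zip(grad, reference)]

def sgd_step(params, grad, lr=0.1):
    """Plain gradient-descent update (assumed optimizer)."""
    return [p - lr * g for p, g in zip(params, grad)]

# Hypothetical flattened gradients of the first loss.
task_grad = [1.0, 2.0, 0.0]    # gradient for the third feature extraction network
scene_grad = [0.0, 1.0, 0.0]   # gradient for the fourth feature extraction network

first_grad = orthogonalize(task_grad, scene_grad)
updated_params = sgd_step([0.5, 0.5, 0.5], first_grad)
```

After the projection, `first_grad` has no component along `scene_grad`, so the update does not move the parameters in the direction that helps identify the scenario.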

[0053] In one possible implementation, the difference between the target operation information and the first operation information is used to determine the second loss; the model update module is also used for:

[0054] The first feature extraction network is updated based on the second loss.

[0055] In one possible implementation, the feature extraction module is specifically used to obtain a second embedding representation based on attribute information through a second feature extraction network;

[0056] Based on the attribute information, a second embedding representation is obtained through a second feature extraction network, including:

[0057] Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including a second feature extraction network; where each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario.

[0058] Multiple embedding representations are fused to obtain a second embedding representation.

[0059] In one possible implementation, the feature extraction module is specifically used to predict the probability value of the attribute information corresponding to each recommendation scenario based on the attribute information.

[0060] Each probability value is used as a weight for the corresponding recommendation scenario, and multiple embedding representations are fused.
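The probability-weighted fusion described above can be sketched as follows. The softmax over scenario logits, the toy embeddings and the names `softmax` and `fuse_scene_embeddings` are assumptions for illustration; the text only states that a probability value per recommendation scenario is used as the weight of the corresponding embedding.

```python
import math

def softmax(logits):
    """Turn scenario logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scene_embeddings(embeddings, scene_logits):
    """Weighted sum of per-scenario embeddings, weights = scenario probabilities."""
    weights = softmax(scene_logits)
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for w, emb in zip(weights, embeddings):
        for i in range(dim):
            fused[i] += w * emb[i]
    return fused, weights

# One embedding per scenario-specific feature extraction network (toy values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical logits predicted from the attribute information.
fused, weights = fuse_scene_embeddings(embeddings, [2.0, 0.5, 0.5])
```

Here `fused` stands for the second embedding representation; the scenario with the highest predicted probability contributes most to it.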

[0061] In one possible implementation, the acquisition module is also used for:

[0062] Obtain the user's operation data in a second recommendation scenario;

[0063] The prediction module is also used to predict the user's operation information on items in the second recommendation scenario based on the operation data in the second recommendation scenario; wherein the user's operation information on items in the second recommendation scenario is used to determine the third loss;

[0064] The model update module is specifically used to: based on the second loss and the third loss, orthogonalize the multiple gradients corresponding to the first feature extraction network to obtain multiple third gradients of the first feature extraction network;

[0065] The multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network;

[0066] Update the first feature extraction network based on the second gradient.
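A possible reading of this multi-scenario update is sketched below: each per-scenario gradient of the shared network has its conflicting component along the other gradients removed, and the adjusted gradients are then averaged. The conflict test (negative inner product), which resembles the published PCGrad gradient-surgery technique, the averaging used as fusion, and the names `deconflict` and `fuse` are all assumptions; the application does not specify them in this passage.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def deconflict(grads):
    """For each gradient, project out the component that opposes any other gradient."""
    adjusted = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            overlap = dot(g, other)
            norm_sq = dot(other, other)
            if overlap < 0 and norm_sq > 0:
                g = [a - overlap / norm_sq * b for a, b in zip(g, other)]
        adjusted.append(g)
    return adjusted

def fuse(grads):
    """Fuse the adjusted gradients by element-wise averaging (assumed fusion rule)."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]

grad_target = [1.0, 0.0]    # from the second loss (target scenario), toy values
grad_second = [-1.0, 1.0]   # from the third loss (second scenario), toy values

adjusted_grads = deconflict([grad_target, grad_second])
second_grad = fuse(adjusted_grads)
```

With these toy values the two adjusted gradients no longer oppose the original gradient of the other scenario, and `second_grad` is the single gradient used to update the first feature extraction network.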

[0067] In one possible implementation, the operational data includes information indicating the target recommendation scenario; the feature extraction module is further configured to obtain a third embedding representation based on the operational data through a second feature extraction network;

[0068] According to the third embedding representation, a third neural network is used to predict the user's third operation information on the item; wherein the difference between the third operation information and the first operation information is used to determine the fourth loss.

[0069] The model update module is also used to update the third neural network and the second feature extraction network based on the fourth loss.

[0070] In one possible implementation, the target operation information indicates whether the user has performed a target operation on the item, and the target operation includes at least one of the following:

[0071] Clicking, browsing, adding to cart, and purchasing.

[0072] In one possible implementation, the attribute information includes user attributes, which include at least one of the following: gender, age, occupation, income, hobbies, and education level.

[0073] In one possible implementation, the attribute information includes the item's attributes, which include at least one of the following: item name, developer, installation package size, category, and rating.

[0074] In one possible implementation, different recommendation scenarios are for different applications; or,

[0075] Different recommendation scenarios are for different types of applications; or,

[0076] Different recommendation scenarios represent different functions of the same application.

[0077] In one possible implementation, the device further includes:

[0078] When the target operation information meets a preset condition, it is determined that the item is to be recommended to the user.

[0079] In a third aspect, embodiments of this application provide a model training method, the method comprising:

[0080] The user's operation data in the target recommendation scenario is obtained, and the operation data includes the attribute information of the user and the item, as well as the user's first operation information on the item;

[0081] Based on the attribute information, a first neural network is used to predict the user's second operation information on the item;

[0082] Based on the attribute information, a first recommended scenario for the operation data is predicted using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommended scenario and the target recommended scenario, are used to determine the first loss;

[0083] Based on the first loss, the first gradient corresponding to the initial feature extraction network is obtained by orthogonalizing the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network.

[0084] The third feature extraction network is updated based on the first gradient to obtain the first feature extraction network.
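The first loss combines two differences: predicted versus recorded operation information, and predicted versus actual recommendation scenario. A minimal sketch of one way to compute such a loss follows. The use of binary cross-entropy for the operation term, categorical cross-entropy for the scenario term, the `scene_weight` factor and all function names are assumptions for illustration.

```python
import math

def binary_cross_entropy(predicted_prob, label, eps=1e-7):
    """Difference between predicted and recorded operation (label is 0 or 1)."""
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def cross_entropy(scene_probs, true_scene, eps=1e-7):
    """Difference between the predicted scenario distribution and the target scenario."""
    return -math.log(max(scene_probs[true_scene], eps))

def first_loss(pred_operation, true_operation, scene_probs, target_scene, scene_weight=1.0):
    op_term = binary_cross_entropy(pred_operation, true_operation)
    scene_term = cross_entropy(scene_probs, target_scene)
    # Weighted sum of the two terms (the weighting is an assumption).
    return op_term + scene_weight * scene_term

# Both predictions close to the recorded data: small loss.
loss_good = first_loss(0.9, 1, [0.8, 0.1, 0.1], 0)
# Both predictions far from the recorded data: large loss.
loss_bad = first_loss(0.2, 1, [0.1, 0.8, 0.1], 0)
```

The gradients of such a loss with respect to the third and fourth feature extraction networks are the quantities that the subsequent orthogonalization step operates on.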

[0085] In one possible implementation, the method further includes:

[0086] Based on the attribute information, a first embedding representation and a second embedding representation are obtained through the first feature extraction network and the second feature extraction network, respectively; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation;

[0087] Based on the fused embedded representation, the user's target operation information for the item is predicted; the difference between the target operation information and the first operation information is used to determine the second loss.

[0088] The first feature extraction network is updated based on the second loss.

[0089] In one possible implementation, obtaining the first embedding representation and the second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network respectively includes:

[0090] Based on the attribute information, a second embedding representation is obtained through a second feature extraction network;

[0091] The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes:

[0092] Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario;

[0093] The multiple embedded representations are fused to obtain the second embedded representation.

[0094] In one possible implementation, the method further includes:

[0095] Obtain the user's operation data in a second recommendation scenario;

[0096] Based on the operation data in the second recommendation scenario, predict the user's operation information on the item in the second recommendation scenario; wherein, the user's operation information on the item in the second recommendation scenario is used to determine the third loss;

[0097] The step of updating the first feature extraction network based on the second loss includes:

[0098] Based on the second loss and the third loss, multiple third gradients of the first feature extraction network are obtained by orthogonalizing multiple gradients corresponding to the first feature extraction network.

[0099] The multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network;

[0100] The first feature extraction network is updated based on the second gradient.

[0101] In a fourth aspect, embodiments of this application provide a model training apparatus, the apparatus comprising:

[0102] The acquisition module is used to acquire user operation data in the target recommendation scenario. The operation data includes the attribute information of the user and the item, as well as the user's first operation information on the item.

[0103] The prediction module is used to predict the user's second operation information on the item based on the attribute information and through a first neural network.

[0104] Based on the attribute information, a first recommended scenario for the operation data is predicted using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommended scenario and the target recommended scenario, are used to determine the first loss;

[0105] Based on the first loss, the first gradient corresponding to the initial feature extraction network is obtained by orthogonalizing the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network.

[0106] The model update module is used to update the third feature extraction network according to the first gradient to obtain the first feature extraction network.

[0107] In one possible implementation, the device further includes:

[0108] The feature extraction module is used to obtain a first embedding representation and a second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network, respectively; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation;

[0109] The prediction module is further configured to predict the user's target operation information on the item based on the fused embedded representation; the difference between the target operation information and the first operation information is used to determine a second loss.

[0110] The model update module is also used to update the first feature extraction network based on the second loss.

[0111] In one possible implementation, obtaining the first embedding representation and the second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network respectively includes:

[0112] Based on the attribute information, a second embedding representation is obtained through a second feature extraction network;

[0113] The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes:

[0114] Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario;

[0115] The multiple embedded representations are fused to obtain the second embedded representation.

[0116] In one possible implementation, the acquisition module is further configured to:

[0117] Obtain the user's operation data in a second recommendation scenario;

[0118] The prediction module is further configured to predict the user's operation information on the item in the second recommendation scenario based on the operation data in the second recommendation scenario; wherein the user's operation information on the item in the second recommendation scenario is used to determine the third loss;

[0119] The model update module is specifically used to obtain multiple third gradients of the first feature extraction network by orthogonalizing multiple gradients corresponding to the first feature extraction network based on the second loss and the third loss.

[0120] The multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network;

[0121] The first feature extraction network is updated based on the second gradient.

[0122] In a fifth aspect, embodiments of this application provide an operation prediction apparatus, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory to perform any of the optional methods described in the first aspect above.

[0123] In a sixth aspect, embodiments of this application provide a model training apparatus, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to execute the program in the memory to perform any of the optional methods described in the third aspect above.

[0124] In a seventh aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the first aspect and any optional method thereof, and the third aspect and any optional method thereof.

[0125] In an eighth aspect, embodiments of this application provide a computer program product including code which, when executed, implements the first aspect and any optional method thereof, and the third aspect and any optional method thereof.

[0126] In a ninth aspect, this application provides a chip system including a processor for supporting an execution device or training device in implementing the functions involved in the foregoing aspects, such as sending or processing the data or information involved in the foregoing methods. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the execution device or training device. The chip system may consist of chips, or may include chips and other discrete devices.

Description of the Drawings

[0127] Figure 1 is a structural diagram of the main framework of artificial intelligence;

[0128] Figure 2 is a schematic diagram of a system architecture provided in an embodiment of this application;

[0129] Figure 3 is a schematic diagram of a system architecture provided in an embodiment of this application;

[0130] Figure 4 is a schematic diagram of a recommendation scenario provided in an embodiment of this application;

[0131] Figure 5 is a flowchart of an operation prediction method provided in an embodiment of this application;

[0132] Figure 6 is a schematic diagram of a recommendation model;

[0133] Figure 7 is a schematic diagram of a recommendation model;

[0134] Figure 8 is a schematic diagram of a recommendation model;

[0135] Figure 9 is a schematic diagram of a recommendation model;

[0136] Figure 10 is a flowchart of an operation prediction method provided in an embodiment of this application;

[0137] Figure 11 is a schematic structural diagram of a recommendation apparatus provided in an embodiment of this application;

[0138] Figure 12 is a schematic diagram of an execution device provided in an embodiment of this application;

[0139] Figure 13 is a schematic diagram of a training device provided in an embodiment of this application;

[0140] Figure 14 is a schematic diagram of a chip provided in an embodiment of this application.

Detailed Description of Embodiments

[0141] The embodiments of the present invention will now be described with reference to the accompanying drawings. The terminology used in the embodiments section is for illustrative purposes only and is not intended to limit the scope of the invention.

[0142] The embodiments of this application will now be described with reference to the accompanying drawings. Those skilled in the art will recognize that, with technological advancements and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

[0143] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.

[0144] First, the overall workflow of an artificial intelligence system is described. Referring to Figure 1, Figure 1 is a structural diagram of the main framework of artificial intelligence (AI). The framework is elaborated below along two dimensions: the "Intelligent Information Chain" (horizontal axis) and the "IT Value Chain" (vertical axis). The "Intelligent Information Chain" reflects a series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data, information, knowledge, wisdom." The "IT Value Chain" reflects the value that AI brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provision and processing technology implementations) to the industrial ecosystem of the system.

[0145] (1) Infrastructure

[0146] Infrastructure provides computing power to support artificial intelligence systems, enabling communication with the external world and providing support through a basic platform. This communication occurs through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks and related platform guarantees and support, which may include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to acquire data, and this data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.

[0147] (2) Data

[0148] The data at the next layer of infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice, text, and IoT data from traditional devices, including business data from existing systems and sensor data such as force, displacement, liquid level, temperature, and humidity.

[0149] (3) Data processing

[0150] Data processing typically includes methods such as data training, machine learning, deep learning, search, reasoning, and decision-making.

[0151] Among them, machine learning and deep learning can perform intelligent information modeling, extraction, preprocessing, and training on data, including symbolization and formalization.

[0152] Reasoning refers to the process in which, in a computer or intelligent system, the machine thinks and solves problems by simulating human intelligent reasoning, based on reasoning control strategies and using formalized information. Typical functions include search and matching.

[0153] Decision-making refers to the process of making decisions based on intelligent information after reasoning, and it typically provides functions such as classification, sorting, and prediction.

[0154] (4) General capabilities

[0155] After the data processing mentioned above, the results of the data processing can be used to form some general capabilities, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.

[0156] (5) Smart Products and Industry Applications

[0157] Intelligent products and industry applications refer to products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application areas mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.

[0158] The embodiments of this application can be applied to the field of information recommendation, including but not limited to e-commerce product recommendation, search engine result recommendation, app store recommendation, music recommendation, and video recommendation. For ease of subsequent description, the items recommended in the various application scenarios are also referred to as "objects." That is, in different recommendation scenarios, the recommended object can be an app, a video, music, or a specific product (for example, the presentation interface of an online shopping platform displays different products to different users, which is essentially a presentation of the recommendation results of a recommendation model). These recommendation scenarios typically involve collecting user behavior logs, preprocessing the log data (e.g., quantization, sampling), training on a sample set to obtain a recommendation model, and using the recommendation model to analyze and process the objects (such as apps or music) involved in the scenario to which the training samples correspond. For example, if the samples selected in the training stage come from users' operations on recommended apps in a mobile app store, the resulting recommendation model is applicable to that mobile app store, or can be used to recommend apps in app stores of other types of terminals. The recommendation model ultimately calculates a recommendation probability or score for each object to be recommended. The recommendation system then selects the recommendation results according to a selection rule, such as sorting by recommendation probability or score, and presents them to the user through the corresponding application or terminal device. The user interacts with the objects in the recommendation results, which in turn generates user behavior logs.

[0159] Referring to Figure 4, in the recommendation process, when a user interacts with the recommendation system, a recommendation request is triggered. The system inputs this request and its related feature information into the deployed recommendation model, which predicts the user's click-through rate (CTR) for all candidate items. The candidate items are then sorted in descending order of predicted CTR and displayed sequentially in different positions as the recommendation result for the user. Users browse the displayed items and perform user actions, such as browsing, clicking, and downloading. These user actions are stored in logs as training data, and the parameters of the recommendation model are periodically updated through an offline training module to improve the model's recommendation performance.

[0160] For example, when a user opens the app store, the recommendation module is triggered. This module predicts the likelihood of the user downloading a given set of candidate apps based on the user's download history, click history, app characteristics, and environmental factors such as time and location. Based on the predictions, the app store displays apps in descending order of probability, thus increasing the likelihood of app downloads. Specifically, apps more likely to be downloaded are listed first, while those less likely are listed last. User behavior is also logged and used to train and update the prediction model's parameters through an offline training module.
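The ranking step in the two paragraphs above amounts to scoring every candidate and sorting in descending order of the predicted probability. A minimal sketch follows; the candidate names, the toy probabilities and the function name `rank_candidates` are hypothetical, and a lookup table stands in for the deployed recommendation model.

```python
def rank_candidates(candidates, predict_ctr, top_k=None):
    """Sort candidates by predicted click-through rate, highest first."""
    scored = [(predict_ctr(item), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [item for _, item in scored]
    # Optionally keep only the first `top_k` display positions.
    return ranked if top_k is None else ranked[:top_k]

# Toy predictions standing in for the output of the recommendation model.
toy_ctr = {"app_a": 0.12, "app_b": 0.47, "app_c": 0.30}

ranking = rank_candidates(list(toy_ctr), toy_ctr.get)
top_two = rank_candidates(list(toy_ctr), toy_ctr.get, top_k=2)
```

The first element of `ranking` would be shown in the first display position, and the user's subsequent actions on the displayed items would be logged as new training data.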

[0161] For example, in applications related to lifelong partners, historical data from users across domains such as video, music, and news can be used to construct a cognitive brain, mimicking the mechanisms of the human brain, and build a framework for a lifelong learning system. Lifelong partners can record past events based on system and application data, understand current intentions, predict future actions or behaviors, and ultimately provide intelligent services. In the current first phase, user behavior data (including information such as SMS messages, photos, and email events) obtained from music apps, video apps, and browser apps is used to build a user profile system and implement learning and memory modules based on user information filtering, association analysis, cross-domain recommendation, and causal reasoning to construct a personal knowledge graph for each user.

[0162] The application architecture of this application embodiment will be described next.

[0163] Referring to Figure 2, an embodiment of the present invention provides a recommendation system architecture 200. A data acquisition device 260 is used to collect samples. A training sample can consist of multiple pieces of feature information (also described as attribute information, such as user attributes and item attributes). The feature information can be of various types, specifically including user feature information, object feature information, and tag features. User feature information characterizes the user, for example gender, age, occupation, and hobbies. Object feature information characterizes the objects pushed to the user. Different recommendation systems correspond to different objects, and the types of features to be extracted differ from object to object. For example, the object features extracted from the training samples of an app market can be the app's name (identifier), type, size, etc., while the object features in the training samples of an e-commerce app can include the product name, category, price range, etc. Tag features indicate whether a sample is a positive or negative example. Typically, the tag feature of a sample is obtained from the user's actions on the recommended object. Samples in which the user has performed an action on the recommended object are positive examples, while samples in which the user has performed no action, or has only browsed the recommended object, are negative examples. For example, if a user clicks, downloads, or purchases a recommended object, the tag feature is 1, indicating a positive example; if the user has not performed any action on the recommended object, the tag feature is 0, indicating a negative example. After collection, the samples can be stored in database 230.
Some or all of the feature information in the samples in database 230 can also be directly obtained from the client device 240, such as user feature information, user action information on objects (used to determine type identification), and object feature information (such as object identification). The training device 220 trains and obtains a model parameter matrix based on the samples in database 230 to generate a recommendation model 201 (e.g., the feature extraction network and neural network in this embodiment). The following describes in more detail how the training device 220 trains to obtain the model parameter matrix used to generate the recommendation model 201. The recommendation model 201 can be used to evaluate a large number of objects to obtain the score of each object to be recommended. Furthermore, it can recommend a specified or preset number of objects from the evaluation results of a large number of objects. The calculation module 211 obtains the recommendation results based on the evaluation results of the recommendation model 201 and recommends them to the client device through the I / O interface 212.
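The construction of a training sample with its tag feature can be sketched as follows. The action names, the dictionary layout and the function names `label_sample` and `build_sample` are hypothetical; only the labelling rule (click, download or purchase gives 1; no action or browsing only gives 0) is taken from the description above.

```python
# Actions that make a sample a positive example, per the description above.
POSITIVE_ACTIONS = {"click", "download", "purchase"}

def label_sample(actions):
    """Return 1 if any positive action was performed, otherwise 0."""
    return 1 if POSITIVE_ACTIONS & set(actions) else 0

def build_sample(user_features, object_features, actions):
    """Bundle user features, object features and the tag feature into one sample."""
    return {
        "user": user_features,
        "object": object_features,
        "label": label_sample(actions),
    }

samples = [
    build_sample({"age": 30}, {"category": "game"}, ["browse", "click"]),
    build_sample({"age": 30}, {"category": "news"}, ["browse"]),
    build_sample({"age": 30}, {"category": "tool"}, []),
]
labels = [s["label"] for s in samples]
```

In this toy data only the first sample is a positive example, because browsing alone does not count as a positive action.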

[0164] In this embodiment, the training device 220 can select positive and negative samples from the sample set in the database 230 and add them to the training set, and then train the recommendation model on the samples in the training set to obtain the trained recommendation model. For implementation details of the calculation module 211, refer to the detailed description of the method embodiment shown in Figure 5.

[0165] After training the model parameter matrix based on the samples, the training device 220 uses it to construct the recommendation model 201 and then sends the recommendation model 201 to the execution device 210. Alternatively, the model parameter matrix can be directly sent to the execution device 210, where the recommendation model is constructed for use in the corresponding system. For example, a recommendation model trained based on video-related samples can be used to recommend videos to users on video websites or apps, while a recommendation model trained based on app-related samples can be used to recommend apps to users in app stores.

[0166] The execution device 210 is equipped with an I / O interface 212 for data interaction with external devices. The execution device 210 can obtain user characteristic information from the client device 240 through the I / O interface 212, such as user identifier, user identity, gender, occupation, and hobbies. This information can also be obtained from the system database. The recommendation model 201 recommends target objects to the user based on the user characteristic information and the characteristic information of the objects to be recommended. The execution device 210 can be located on a cloud server or on the user client.

[0167] The execution device 210 can access data, code, etc., in the data storage system 250, and can also store output data into the data storage system 250. The data storage system 250 can be located within the execution device 210, can be set up independently, or can be located in other network entities; there can be one or multiple such systems.

[0168] The calculation module 211 uses the recommendation model 201 to process the user feature information and the feature information of the object to be recommended. For example, the calculation module 211 uses the recommendation model 201 to analyze and process the user feature information and the feature information of the object to be recommended, thereby obtaining the score of the object to be recommended. The objects to be recommended are sorted according to the score, and the objects ranked higher will be recommended to the client device 240.

[0169] Finally, I / O interface 212 returns the recommendation results to client device 240 and presents them to the user.

[0170] At a deeper level, the training device 220 can generate corresponding recommendation models 201 based on different sample feature information for different targets, so as to provide users with better results.

[0171] It is worth noting that Figure 2 is merely a schematic diagram of a system architecture provided by an embodiment of the present invention. The positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in Figure 2, the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 may also be placed within the execution device 210.

[0172] In this embodiment, the training device 220, the execution device 210, and the client device 240 may be three different physical devices. Alternatively, the training device 220 and the execution device 210 may be on the same physical device or a cluster, or the execution device 210 and the client device 240 may be on the same physical device or a cluster.

[0173] Figure 3 shows a system architecture 300 proposed in an embodiment of the present invention. In this architecture, the execution device 210 is implemented by one or more servers, optionally in conjunction with other computing devices, such as data storage, routers, load balancers, etc. The execution device 210 can be deployed on a single physical site or distributed across multiple physical sites. The execution device 210 can use data in the data storage system 250 or call program code in the data storage system 250 to implement the object recommendation function. Specifically, the information of the objects to be recommended is input into the recommendation model. The recommendation model generates an estimated score for each object to be recommended, and the objects are then sorted in descending order of their estimated scores. Objects are recommended to the user according to the sorting result. For example, the top 10 objects in the sorting result are recommended to the user.
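The score-then-sort step described above can be illustrated with a minimal sketch. The function name `recommend_top_k` and the example scores are hypothetical and are not part of the application; the sketch only assumes that the recommendation model has already produced one estimated score per candidate object.

```python
from typing import Dict, List, Tuple


def recommend_top_k(scores: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Sort candidate objects by estimated score (descending) and keep the first k."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical estimated scores, standing in for the output of a recommendation model.
estimated_scores = {"app_a": 0.12, "app_b": 0.87, "app_c": 0.45, "app_d": 0.66}

top_two = recommend_top_k(estimated_scores, k=2)
print(top_two)
```

With the default `k = 10`, the function returns at most ten objects, which matches the "top 10" example in the text; fewer are returned when there are fewer candidates.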

[0174] The data storage system 250 is used to receive and store the parameters of the recommendation model sent by the training device, as well as the recommendation results obtained through the recommendation model. It may also store the program code (or instructions) required for the normal operation of the data storage system 250. The data storage system 250 can be a distributed storage cluster consisting of one or more devices deployed outside the execution device 210. In this case, when the execution device 210 needs to use data on the data storage system 250, the data storage system 250 can send the required data to the execution device 210, and the execution device 210 receives and stores (or caches) the data. Alternatively, the data storage system 250 can be deployed within the execution device 210. When deployed within the execution device 210, the data storage system 250 can include one or more storage devices. Optionally, when multiple storage devices exist, different storage devices are used to store different types of data. For example, the model parameters of the recommendation model generated by the training device and the recommendation results obtained through the recommendation model can be stored on two different storage devices.

[0175] Users can interact with execution device 210 by operating their respective user devices (e.g., local device 301 and local device 302). Each local device can represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, game console, etc.

[0176] Each user's local device can interact with the execution device 210 through a communication network of any communication mechanism / standard. The communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.

[0177] In another implementation, execution device 210 can be implemented by a local device. For example, local device 301 can implement the recommendation function of execution device 210 based on the recommendation model to obtain user feature information and provide recommendation results to the user, or provide services to the user of local device 302.

[0178] Since the embodiments of this application involve a large number of neural network applications, for ease of understanding, the relevant terms and concepts such as neural networks involved in the embodiments of this application will be introduced below.

[0179] 1. Click-through rate (CTR)

[0180] Click probability, also known as click-through rate, refers to the ratio of the number of clicks to the number of impressions of recommended information (e.g., recommended items) on a website or application. Click-through rate is usually an important metric for evaluating recommendation systems.
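As a small worked example of the ratio defined above, the following sketch computes a click-through rate from raw counts. The function name and the counts are hypothetical, chosen only for illustration.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions for a piece of recommended information."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if clicks < 0 or clicks > impressions:
        raise ValueError("clicks must be between 0 and impressions")
    return clicks / impressions


# Hypothetical counts: 30 clicks out of 1000 impressions.
ctr = click_through_rate(clicks=30, impressions=1000)
print(f"CTR = {ctr:.3f}")
```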

[0181] 2. Personalized Recommendation System

[0182] A personalized recommendation system refers to a system that analyzes a user's historical data (such as the operation information in the embodiments of this application) using machine learning algorithms, and uses this data to predict new requests and provide personalized recommendation results.

[0183] 3. Offline training

[0184] Offline training refers to a module in a personalized recommendation system that iteratively updates the parameters of the recommendation model according to a machine learning algorithm, based on the user's historical data (such as the operation information in the embodiments of this application), until the set requirements are met.

[0185] 4. Online Inference

[0186] Online inference refers to using a model trained offline to predict a user's preference for recommended items in the current context, based on the characteristics of the user, the items, and the context, and to predict the probability of the user selecting the recommended items.

[0187] For example, Figure 3 is a schematic diagram of the recommendation system provided in an embodiment of this application. As shown in Figure 3, when a user enters the system, a recommendation request is triggered. The recommendation system inputs this request and its related information (such as the operation information in this embodiment) into the recommendation model, and then predicts the user's selection rate for items within the system. The items are then sorted in descending order of the predicted selection rate, or of a function based on that selection rate; that is, the recommendation system can display items in different positions, in order, as recommendations to the user. The user browses items in different positions and performs user actions, such as browsing, selecting, and downloading. At the same time, the user's actual behavior is stored in a log as training data, and the parameters of the recommendation model are continuously updated through an offline training module to improve the model's predictive performance.

[0188] For example, a user opening the app store on a smart device (e.g., a mobile phone) triggers the app store's recommendation system. The app store's recommendation system predicts the probability of the user downloading each recommended candidate app based on the user's historical behavior logs, such as historical download records and user selection records, as well as the app store's own characteristics, such as environmental features like time and location. Based on the calculation results, the app store's recommendation system can display candidate apps in descending order of predicted probability values, thereby increasing the download probability of candidate apps.

[0189] For example, apps with a predicted high user selection rate can be displayed in the top recommendation positions, while apps with a predicted low user selection rate can be displayed in the bottom recommendation positions.

[0190] The recommendation model mentioned above can be a neural network model. The following is an introduction to the relevant terms and concepts of neural networks that may be involved in the embodiments of this application.

[0191] (1) Neural Network

[0192] A neural network can be composed of neural units, which can be defined as a computational unit that takes xs (i.e., input data) and an intercept of 1 as input. The output of this computational unit can be:

[0193] $$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

[0194] Here s = 1, 2, ..., n, where n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network by converting the input signal of the neural unit into an output signal. The output signal of this activation function can be used as the input of the next layer, and the activation function can be the sigmoid function. A neural network is a network formed by connecting multiple such neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field, which can be a region composed of several neural units.
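The computation of a single neural unit can be sketched as follows. This is a minimal illustration that assumes the sigmoid activation mentioned above; the function names and the input values are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Compute f(sum_s W_s * x_s + b) for one neural unit."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    weighted_sum = sum(ws * xs for ws, xs in zip(w, x)) + b
    return sigmoid(weighted_sum)


# Hypothetical inputs: the weighted sum is 0.5 - 0.5 + 0.0 = 0.0, so the output is sigmoid(0).
output = neural_unit(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(output)
```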

[0195] (2) Deep Neural Networks

[0196] Deep neural networks (DNNs), also known as multilayer neural networks, can be understood as neural networks with many hidden layers; there is no specific threshold for "many." Based on position, the layers of a DNN fall into three types: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. All layers are fully connected, meaning that any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. Although DNNs appear complex, the operation of each layer is actually quite simple and can be expressed by the following linear relationship: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because DNNs have many layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is quite large. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ resides, while the subscript corresponds to the output index 2 in the third layer and the input index 4 in the second layer. In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W^{L}_{jk}$. Note that the input layer has no $W$ parameter. In deep neural networks, more hidden layers allow the network to better represent complex real-world situations. Theoretically, the more parameters a model has, the higher its complexity and "capacity," meaning it can perform more complex learning tasks.
Training a deep neural network is essentially the process of learning the weight matrices, with the ultimate goal of obtaining the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
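The per-layer operation and the indexing convention for the coefficients can be sketched as below. This is an illustration only: it assumes a sigmoid activation and uses small hypothetical weights, and `W[j][k]` follows the convention above (row j is the output neuron, column k is the input neuron), with zero-based indices in code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Apply one fully connected layer: activation(W x + b)."""
    return [
        sigmoid(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
        for j in range(len(W))
    ]


def dnn_forward(layers: List[Tuple[Matrix, Vector]], x: Vector) -> Vector:
    """Feed x through each (W, b) pair in order; the input layer has no W."""
    for W, b in layers:
        x = layer_forward(W, x, b)
    return x


# Hypothetical three-layer network: 3 inputs -> 2 hidden neurons -> 1 output.
hidden = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
output_layer = ([[1.0, -1.0]], [0.0])
prediction = dnn_forward([hidden, output_layer], [0.2, 0.4, 0.6])
print(prediction)
```

In this toy network both hidden neurons output 0.5, so the output neuron sees a weighted sum of 0.5 - 0.5 = 0.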

[0197] (3) Loss Function

[0198] In training a deep neural network, to ensure the output closely approximates the desired predicted value, we compare the network's prediction with the target value. Based on the difference, we update the weight vector of each layer (usually pre-configuring parameters before the initial update). For example, if the prediction is too high, the weight vector is adjusted to predict a lower value. This adjustment continues until the deep neural network predicts the target value or a value very close to it. Therefore, we need to predefine "how to compare the difference between the predicted and target values," which is the loss function or objective function. These are important equations used to measure the difference between the predicted and target values. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, and training the deep neural network becomes a process of minimizing this loss.
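As an illustration of a loss function that grows with the gap between prediction and target, the sketch below uses binary cross-entropy. The choice of binary cross-entropy is an assumption made for this example; the text above does not name a specific loss function.

```python
import math


def binary_cross_entropy(predicted: float, target: float, eps: float = 1e-12) -> float:
    """Loss between a predicted probability and a 0/1 target (assumed loss form)."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Hypothetical predictions for a sample whose target value is 1.
close_loss = binary_cross_entropy(predicted=0.9, target=1.0)
far_loss = binary_cross_entropy(predicted=0.2, target=1.0)
print(close_loss, far_loss)
```

The prediction farther from the target yields the larger loss, which is the property that training exploits when it minimizes the loss.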

[0199] (4) Backpropagation algorithm

[0200] Backpropagation (BP) can be used during training to correct the parameters in the initial model, thereby reducing the model's error loss. Specifically, forward propagation of the input signal to the output generates error loss; this error loss information is then propagated back to update the parameters in the initial model, leading to convergence of the error loss. The backpropagation algorithm is an error-loss-driven backpropagation process aimed at obtaining optimal model parameters, such as the weight matrix.
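The forward and backward passes described above can be sketched for a single sigmoid unit trained by gradient descent. The loss (binary cross-entropy), the learning rate, and the sample values are assumptions chosen for illustration, not details taken from the application.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w: List[float], b: float, x: List[float]) -> float:
    """Forward propagation: weighted sum followed by the sigmoid activation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(p: float, y: float) -> float:
    """Binary cross-entropy between prediction p and target y (assumed loss)."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def backward_step(
    w: List[float], b: float, x: List[float], y: float, lr: float
) -> Tuple[List[float], float]:
    """Propagate the error back and take one gradient-descent step."""
    error = forward(w, b, x) - y  # derivative of the loss w.r.t. the weighted sum
    new_w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    new_b = b - lr * error
    return new_w, new_b


# Hypothetical training sample and initial parameters.
x, y = [1.0, 2.0], 1.0
w, b = [0.0, 0.0], 0.0
initial_loss = loss(forward(w, b, x), y)
for _ in range(100):
    w, b = backward_step(w, b, x, y, lr=0.5)
final_loss = loss(forward(w, b, x), y)
print(initial_loss, final_loss)
```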

[0201] (5) Machine Learning Systems

[0202] Based on input data and labels, the parameters of a machine learning model are trained using optimization methods such as gradient descent, and the trained model is then used to predict unknown data.

[0203] (6) Personalized recommendation system

[0204] This system uses machine learning algorithms to analyze and model users' historical data, and then uses this data to predict new user requests and provide personalized recommendations.

[0205] (7) Recommendation scenarios

[0206] A recommendation scenario can refer to an application (APP) that serves a specific need, such as Huawei Browser or Huawei Video, or to a specific channel, such as the entertainment channel, news channel, or technology channel in a browser's information feed.

[0207] (8) Multi-scene modeling

[0208] By integrating data from multiple scenarios, a single model is trained to serve multiple scenarios.

[0209] Machine learning systems, including personalized recommendation systems, train the parameters of a machine learning model based on input data and labels using optimization methods such as gradient descent. Once the model parameters converge, the model can be used to predict unknown data. Taking click-through rate prediction in a personalized recommendation system as an example, its input data includes user attributes and product attributes. How to predict personalized recommendation lists based on user preferences has a significant impact on improving the recommendation accuracy of the system.

[0210] To meet users' personalized needs, recommendation systems include various recommendation scenarios: browsers, the minus-one screen, video streams, etc. Users exhibit different behaviors in different scenarios based on their preferences. Each scenario has behavioral characteristics specific to that scenario as well as behavioral characteristics shared with other scenarios. Typically, each scenario is modeled separately.

[0211] However, modeling a single scenario independently is not feasible because the same user will behave differently in different scenarios. This makes it difficult to effectively capture the behavioral characteristics of users in different scenarios and thus learn user preferences more fully. Furthermore, when there are many scenarios, modeling and maintaining each scenario independently will result in a large consumption of manpower and resources.

[0212] In existing implementations, multi-scenario models are used to capture the common behavioral features of users in different scenarios. For example, in the STAR (Star Topology Adaptive Recommender) model, a common feature extraction network is trained to adapt to various scenarios. However, the common feature extraction network in the existing technology cannot extract an embedding representation that accurately represents the common behavioral features of users in different scenarios, resulting in poor generalization of the recommendation model across scenarios.

[0213] To address the aforementioned issues, this application provides an operation prediction method, which can be either a feedforward process for model training or an inference process.

[0214] Figure 5 illustrates an embodiment of an operation prediction method provided in this application. As shown in Figure 5, the operation prediction method provided in this application includes:

[0215] 501. Obtain the attribute information of users and items in the target recommendation scenario.

[0216] In this embodiment of the application, the execution subject of step 501 can be a terminal device, which can be a portable mobile device, such as, but not limited to, mobile or portable computing devices (such as smartphones), personal computers, server computers, handheld devices (such as tablets) or laptop devices, multiprocessor systems, game consoles or controllers, microprocessor-based systems, set-top boxes, programmable consumer electronics, mobile phones, mobile computing and / or communication devices with wearable or accessory form factors (such as watches, glasses, headphones or earphones), network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, etc.

[0217] In this embodiment of the application, the entity executing step 501 can be a cloud-side server. The server can receive user operation data sent from the terminal device, and thus the server can obtain the user's operation data.

[0218] For ease of description, the form of the executing entity will not be distinguished below, and it will be described as a training device.

[0219] During model training, the attribute information can serve as the user's operation data.

[0220] The user's operation data can be obtained based on the interaction records between the user and the items (such as the user's behavior log). This operation data can include the user's actual operation records on each item, and can include the user's attribute information, the attribute information of each item, and the operation type of the user's operation on the multiple items (such as clicking, downloading, etc.).

[0221] The user's attribute information can be attributes related to the user's preferences, such as at least one of gender, age, occupation, income, hobbies, and education level. Gender can be male or female, age can be a number between 0 and 100, occupation can be teacher, programmer, chef, etc., hobbies can be basketball, tennis, running, etc., and education level can be primary school, junior high school, high school, university, etc. This application does not limit the specific type of user attribute information.

[0222] The items can be physical or virtual, such as applications (APPs), audio and video, web pages, and news information. The attribute information of the items can be at least one of the following: item name, developer, installation package size, category, and rating. For example, if the item is an application, the category can be chat, parkour game, office, etc., and the rating can be a score or comment on the item. This application does not limit the specific type of attribute information of the items.

[0223] In one possible implementation, the training device can acquire user operation data, which includes attribute information of the user and the item, as well as the user's first operation information on the item in the target recommendation scenario.

[0224] In one possible implementation, the target recommendation scenario can be an application that serves specific needs, such as Huawei Browser or Huawei Video, or it can refer to a specific channel, such as the entertainment channel, news channel, or technology channel in the browser's information stream.

[0225] In one possible implementation, different recommendation scenarios are different applications, or different recommendation scenarios are different types of applications (e.g., video applications and browser applications are different applications), or different recommendation scenarios are different functions of the same application (e.g., different channels of the same application, such as news channels, technology channels, etc.), and the above different functions can be divided according to recommendation categories.

[0226] 502. Based on the attribute information, obtain a first embedding representation and a second embedding representation through a first feature extraction network and a second feature extraction network, respectively. The first embedding representation is a feature that is not related to the recommendation scenario information, and the second embedding representation is a feature that is related to the target recommendation scenario. The first embedding representation and the second embedding representation are fused to obtain a fused embedding representation.

[0227] Next, we will first introduce how to obtain the first feature extraction network:

[0228] In one possible implementation, based on the attribute information, a first neural network can be used to predict the user's second operation information on the item, and a second neural network can be used to predict a first recommendation scenario for the operation data. The difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine a first loss. Based on the first loss, the gradient corresponding to the third feature extraction network in the first neural network and the gradient corresponding to the fourth feature extraction network in the second neural network are orthogonalized to obtain a first gradient corresponding to the third feature extraction network. Based on the first gradient, the third feature extraction network is updated to obtain the first feature extraction network.

[0229] In one possible implementation, when the user's second operation information on the item is predicted through the first neural network based on the attribute information, and the first recommendation scenario of the operation data is predicted through the second neural network based on the attribute information, information indicating the target recommendation scenario is not used as input to the third and fourth feature extraction networks.

[0230] Since the first operation information is the ground truth used when training the third feature extraction network, it does not need to be input into the third feature extraction network during the feedforward process. Similarly, since the information indicating the target recommendation scenario is the ground truth used when training the fourth feature extraction network, it also does not need to be input into the fourth feature extraction network during the feedforward process.

[0231] In one possible implementation, the second operation information can indicate whether the user has performed a target operation. The target operation can be a type of user behavior. On network platforms and applications, users often have various forms of interaction with items (that is, multiple operation types), such as browsing, clicking, adding to cart, and purchasing in e-commerce platforms.

[0232] In one possible implementation, the second operation information could be the probability that the user will perform a target operation on the item.

[0233] For example, the second operation information could be whether the user clicked, or the probability of clicking.

[0234] In one possible implementation, the first neural network may include a multilayer perceptron (MLP) and an output layer. The first neural network may output second operation information of the user on the item, which may indicate whether the user will perform a target operation on the item.

[0235] In one possible implementation, a first recommendation scenario for the operation data can be predicted using a second neural network; the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine a first loss.

[0236] In one possible implementation, the first recommendation scenario can be represented by an identifier, for example, 1 represents application A, 2 represents application B, etc.

[0237] In one possible implementation, the second neural network may include a multilayer perceptron (MLP) and an output layer, which can output a first recommendation scenario for the operation data.

[0238] In one possible implementation, after the second operation information and the first recommendation scenario are predicted by the first neural network and the second neural network, a loss function (e.g., the first loss in the embodiments of this application) can be constructed based on the ground truth of the second operation information and the first recommendation scenario, respectively. For example, the first loss can be determined based on the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario.

[0239] It should be understood that, in addition to the difference between the first operation information and the second operation information and the difference between the first recommendation scenario and the target recommendation scenario, the first loss may include other loss terms, which are not limited here.
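One way to picture how the two differences could be combined into a first loss is sketched below. The use of binary cross-entropy for the operation prediction, categorical cross-entropy for the scenario prediction, and an unweighted sum are all assumptions made for this illustration; the application does not fix the form of the loss. Scenario identifiers are zero-based indices here purely for convenience.

```python
import math
from typing import Sequence


def operation_loss(predicted_prob: float, actual: float, eps: float = 1e-12) -> float:
    """Difference between predicted (second) and actual (first) operation information."""
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(actual * math.log(p) + (1.0 - actual) * math.log(1.0 - p))


def scenario_loss(scenario_probs: Sequence[float], target_scenario: int, eps: float = 1e-12) -> float:
    """Difference between the predicted scenario distribution and the target scenario."""
    return -math.log(max(scenario_probs[target_scenario], eps))


def first_loss(
    predicted_prob: float,
    actual: float,
    scenario_probs: Sequence[float],
    target_scenario: int,
) -> float:
    """Hypothetical first loss: unweighted sum of the two difference terms."""
    return operation_loss(predicted_prob, actual) + scenario_loss(scenario_probs, target_scenario)


# Hypothetical outputs of the first and second neural networks for one sample.
loss_value = first_loss(
    predicted_prob=0.8,            # predicted click probability
    actual=1.0,                    # the user actually clicked
    scenario_probs=[0.7, 0.2, 0.1],
    target_scenario=0,
)
print(loss_value)
```

Further loss terms, such as the one based on the fifth operation information described later, could be added to the same sum.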

[0240] In one possible implementation, the first gradient corresponding to the third feature extraction network can be obtained by orthogonalizing the gradients corresponding to the third feature extraction network and the fourth feature extraction network based on the first loss.

[0241] Referring to Figure 6, the approach of this application embodiment is as follows. The third feature extraction network is trained so that, once trained, it can identify unbiased feature representations shared across various scenarios. To this end, this application embodiment uses a second neural network. Since the second neural network is used to identify the recommendation scenario to which the operation data belongs, the embedding representation obtained from the fourth feature extraction network in the second neural network carries semantic information strongly related to the recommendation scenario. This semantic information is not needed in the unbiased feature representation. Therefore, in order for the third feature extraction network to be able to identify embedding representations that do not carry semantic information strongly related to the recommendation scenario (which this application embodiment can call an unbiased representation), the gradients of the third and fourth feature extraction networks are orthogonalized when determining the gradients used to update these networks. Orthogonalization constrains the gradient directions (i.e., the parameter update directions) of the third and fourth feature extraction networks to be mutually orthogonal or nearly mutually orthogonal. This allows the embedding representations extracted by the third and fourth feature extraction networks to carry different information, achieving separation of the embedding representations. Since the trained second neural network can accurately distinguish the recommendation scenario of the operation data, the embedding representation extracted by the updated fourth feature extraction network carries semantic information strongly correlated with the recommendation scenario.
Furthermore, the first neural network is used to identify operation information, and the trained first neural network has good predictive ability for user actions. Therefore, the trained third feature extraction network can identify information used for operation information recognition (i.e., the outer edge of the information), and this information does not carry semantic information strongly correlated with the recommendation scenario. This improves the generalization ability of the recommendation model across various scenarios.

[0242] In one possible implementation, an additional neural network can be deployed to orthogonalize the gradients corresponding to the third feature extraction network and the fourth feature extraction network.

[0243] In one possible implementation, a constraint term can be added to the first loss to orthogonalize the gradients of the third and fourth feature extraction networks.

[0244] In one possible implementation, after the initial gradients corresponding to the third and fourth feature extraction networks are obtained based on the first loss, these initial gradients can be orthogonalized so that the direction of the resulting first gradient and the direction of the gradient corresponding to the fourth feature extraction network are orthogonal (or nearly orthogonal).
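A common way to make one gradient orthogonal to another is to subtract its projection onto the other. The sketch below shows that projection step as one possible reading of the orthogonalization described above; the application does not state that this exact formula is used. It assumes both initial gradients have been flattened into vectors of equal length (for example, gradients with respect to parameters of the same shape), and the names `g_third` and `g_fourth` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def orthogonalize(g_third: Sequence[float], g_fourth: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Remove from g_third its component along g_fourth (projection subtraction)."""
    if len(g_third) != len(g_fourth):
        raise ValueError("gradients must have the same length")
    denom = dot(g_fourth, g_fourth)
    if denom < eps:  # g_fourth is (almost) zero: nothing to project out
        return list(g_third)
    scale = dot(g_third, g_fourth) / denom
    return [a - scale * b for a, b in zip(g_third, g_fourth)]


# Hypothetical initial gradients of the third and fourth feature extraction networks.
g_third = [1.0, 1.0]
g_fourth = [1.0, 0.0]
first_gradient = orthogonalize(g_third, g_fourth)
print(first_gradient, dot(first_gradient, g_fourth))
```

After the projection is removed, the inner product of the two gradients is zero, so updating along `first_gradient` does not move the parameters in the scenario-discriminating direction given by `g_fourth`.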

[0245] In one possible implementation, the third feature extraction network can be updated based on the first gradient to obtain the first feature extraction network.

[0246] In this embodiment of the application, in order to improve the generalization of the model, the unbiased representation obtained by the third feature extraction network and the biased representation obtained by the fourth feature extraction network can be combined (or fused), so that the combined representation can still have a high prediction ability after being processed by the neural network (for operation information prediction).

[0247] Referring to Figure 7, in one possible implementation, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be input into a fourth neural network to predict the user's fifth operation information on the item; the difference between the fifth operation information and the first operation information is used to determine the first loss. For example, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be fused (e.g., by concatenation), and the fusion result can be input into the fourth neural network. Optionally, the fourth neural network and the first neural network can have the same or similar network structures. For details, please refer to the description of the first neural network in the above embodiments, which will not be repeated here.

[0248] In one possible implementation, the unbiased representation obtained from the third feature extraction network and the biased representation obtained from the fourth feature extraction network can be input into the fourth neural network to obtain the user's fifth operation information on the item. The first loss is constructed based on the difference between the fifth operation information and the first operation information (i.e., the truth value). In other words, the first loss includes not only the loss terms that include the difference between the first and second operation information and the difference between the first recommendation scenario and the target recommendation scenario, but also the difference between the fifth operation information and the first operation information.

[0249] This application uses a first feature extraction network to extract features unrelated to the recommendation scenario information, namely the unbiased feature representation shared by the scenarios, and fuses them with features related to the recommendation scenario information (which can be referred to as the scenario representation in this application embodiment). The fused result can represent both the user's behavioral characteristics specific to each scenario and the behavioral characteristics the user shares across different scenarios, thereby improving the accuracy of subsequent operation information prediction.

[0250] 503. Based on the fused embedding representation, predict the user's target operation information on the item.

[0251] In one possible implementation, the difference between the target operation information and the first operation information is used to determine a second loss; the method further includes updating the first feature extraction network based on the second loss.

[0252] In one possible implementation, during the actual model inference process, the trained third feature extraction network needs to be connected to the scene-related operation information prediction network. Each scene corresponds to an operation information prediction network related to that scene. During inference, in order to predict the user's operation information on items in a certain recommendation scene (recommendation scene A), the attribute information of the user and the item is input into the trained third feature extraction network to obtain an embedded representation (e.g., a first embedded representation). The attribute information of the user and the item is also input into the feature extraction network related to recommendation scene A (or input into the feature extraction networks related to each scene, and then weighted based on the scene weights) to obtain an embedded representation (e.g., a second embedded representation). The first and second embedded representations can be fused and input into the operation information prediction network corresponding to recommendation scene A to obtain the predicted operation information (e.g., target operation information).

[0253] Therefore, during training, updating the third feature extraction network requires not only the gradient obtained by backpropagation from the output of the first neural network (e.g., the first gradient), but also the gradient obtained by backpropagation from the output (e.g., target operation information) of the operation information prediction network corresponding to each scenario (e.g., the second gradient).

[0254] However, the latter gradients (e.g., the second gradient) are scenario-specific, and they can negatively affect the gradients obtained based on unbiased representations (e.g., the first gradient) because their parameter update directions may conflict. For example, if gradients A and B have opposite directions and equal magnitudes, directly summing them and then applying the update is equivalent to not updating the parameters at all. As a result, the useful information they carry cannot be exploited to improve performance in the corresponding scenario.
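The cancellation effect can be checked numerically. The following Python sketch uses made-up parameter and gradient values and a plain gradient-descent step, all of which are illustrative assumptions rather than details of this application.

```python
def add_vectors(u, v):
    """Element-wise sum of two gradient vectors."""
    return [a + b for a, b in zip(u, v)]

def sgd_step(params, grad, lr=0.1):
    """One plain gradient-descent update."""
    return [p - lr * g for p, g in zip(params, grad)]

params = [1.0, -2.0]
grad_a = [0.5, -0.25]    # e.g., gradient from the unbiased objective
grad_b = [-0.5, 0.25]    # e.g., a scenario-specific gradient, exactly opposite

combined = add_vectors(grad_a, grad_b)
updated = sgd_step(params, combined)
# The summed gradient is zero, so the parameters do not move.
```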

[0255] Referring to Figure 8, to address the aforementioned issues, in the embodiments of this application the third feature extraction network is first updated based on the gradient obtained from the unbiased representation. Then, on the one hand, the attribute information of users and items is processed using a feature extraction network related to the recommendation scenario (e.g., the second feature extraction network), or is input into the feature extraction networks related to each scenario with the outputs then weighted based on scenario weights, to obtain an embedded representation (e.g., a second embedded representation). On the other hand, the updated third feature extraction network (i.e., the first feature extraction network) processes the attribute information of users and items to obtain an embedded representation (e.g., a first embedded representation). The first and second embedded representations can be fused and input into the operation information prediction network corresponding to recommendation scenario A to obtain predicted operation information (e.g., target operation information). A loss (e.g., a second loss) is obtained based on the target operation information, and a gradient (e.g., a second gradient) is determined based on the second loss. The first feature extraction network is then updated based on the second gradient.

[0256] In this way, instead of combining the gradient obtained from the unbiased representation with the gradient obtained from scene-related operation information and updating the third feature extraction network in a single step, this application first updates the third feature extraction network based on the gradient obtained from the unbiased representation (yielding the first feature extraction network), and then updates the first feature extraction network based on the gradient obtained from scene-related operation information. This prevents the scene-specific gradient and the gradient obtained from the unbiased representation from negatively influencing each other, and allows the useful information in both to be exploited to improve performance in the corresponding scene.
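The ordering of the two updates can be sketched as follows. This is a toy Python illustration only: the quadratic stand-in losses, their targets, the learning rate, and all function names are hypothetical and are not taken from this application. The point it shows is that the second gradient is computed on the already-updated parameters rather than being summed with the first gradient.

```python
def grad_quadratic(params, target):
    """Gradient of the stand-in loss 0.5 * ||params - target||^2."""
    return [p - t for p, t in zip(params, target)]

def sgd_step(params, grad, lr):
    """One plain gradient-descent update."""
    return [p - lr * g for p, g in zip(params, grad)]

def two_stage_update(params, unbiased_target, scene_target, lr=0.5):
    # Stage 1: update with the gradient from the unbiased objective
    # (third feature extraction network -> first feature extraction network).
    first_grad = grad_quadratic(params, unbiased_target)
    after_stage1 = sgd_step(params, first_grad, lr)
    # Stage 2: recompute the scene-related gradient on the UPDATED parameters,
    # then apply it.
    second_grad = grad_quadratic(after_stage1, scene_target)
    after_stage2 = sgd_step(after_stage1, second_grad, lr)
    return after_stage1, after_stage2

stage1, stage2 = two_stage_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```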

[0257] Referring to Figure 9, in one possible implementation, the step of obtaining a first embedding representation and a second embedding representation based on the attribute information through a first feature extraction network and a second feature extraction network, respectively, includes: obtaining the second embedding representation based on the attribute information through the second feature extraction network. The step of obtaining the second embedding representation based on the attribute information through the second feature extraction network includes: obtaining multiple embedding representations based on the attribute information through multiple feature extraction networks, including the second feature extraction network, wherein each feature extraction network corresponds to a recommendation scenario and the second feature extraction network corresponds to the target recommendation scenario; and fusing the multiple embedding representations to obtain the second embedding representation.

[0258] In one possible implementation, fusing the multiple embedded representations includes: predicting the probability value of the attribute information corresponding to each recommendation scenario based on the attribute information; and fusing the multiple embedded representations by using each probability value as a weight for the corresponding recommendation scenario.

[0259] In other words, a corresponding feature extraction network can be set up for each recommendation scenario. During the feedforward process, the attribute information is input into each feature extraction network, and each feature extraction network outputs an embedding representation. The multiple embedding representations can then be fused: a probability value corresponding to each recommendation scenario is determined based on the attribute information, each probability value is used as the weight of the corresponding recommendation scenario, and the multiple embedding representations are fused based on these weights (for example, by weighted summation) to obtain the second embedding representation.
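The weighted summation can be sketched in a few lines of Python. The softmax used to turn scenario scores into probability values, the function names, and the numbers are assumptions for illustration; this application only states that a probability value per scenario is used as the weight.

```python
import math

def softmax(logits):
    """Turn per-scenario scores into probability values that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scenario_embeddings(embeddings, probs):
    """Weighted sum of per-scenario embeddings, one weight per scenario."""
    if len(embeddings) != len(probs):
        raise ValueError("need exactly one probability per scenario embedding")
    dim = len(embeddings[0])
    return [sum(p * emb[i] for p, emb in zip(probs, embeddings))
            for i in range(dim)]

# Two scenarios; each feature extraction network outputs a 2-d embedding.
scenario_embeddings = [[4.0, 0.0], [0.0, 4.0]]
scenario_probs = softmax([0.0, 1.0])   # scenario 2 is judged more likely
second_embedding = fuse_scenario_embeddings(scenario_embeddings, scenario_probs)
```

With these weights the fused embedding leans toward the second scenario's output while still retaining some information from the first.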

[0260] In one possible implementation, a fourth neural network can be used to obtain the probability value of the attribute information corresponding to each recommendation scenario. The fourth neural network can reuse the second neural network, with the probability it outputs for each recommendation scenario used as the weight of the corresponding embedding representation. Alternatively, a fourth neural network with recommendation scenario prediction capability can be retrained end to end to achieve efficient fusion of information from multiple scenarios.

[0261] The above describes how to obtain a second embedding representation based on the attribute information through a second feature extraction network. This second embedding representation is a scene-related embedding representation. A feedforward process is also required based on a first feature extraction network. Specifically, a first embedding representation can be obtained based on the attribute information through the first feature extraction network. Then, the first embedding representation and the second embedding representation can be fused (e.g., through matrix multiplication) to obtain a fused embedding representation. Finally, based on the fused embedding representation, the user's target operation information on the item in the target recommendation scenario can be predicted.

[0262] It should be understood that the weighted fusion over multiple feature extraction networks described above yields a second embedded representation that contains more information about the corresponding recommendation scenario and less information about the other recommendation scenarios. However, the attribute information input into the multiple feature extraction networks does not include information indicating the target recommendation scenario, so the embedded representations output by these networks lack accurate semantic information about the corresponding recommendation scenario. Therefore, during training of the multiple feature extraction networks, information indicating the recommendation scenario can be input into the feature extraction networks together with the attribute information to participate in the feedforward process of network training.

[0263] In one possible implementation, the operation data includes information indicating the target recommendation scenario; the method further includes: obtaining a third embedding representation based on the operation data through a second feature extraction network; predicting third operation information of the user on the item based on the third embedding representation through a third neural network; wherein the difference between the third operation information and the first operation information is used to determine a fourth loss; and updating the third neural network and the second feature extraction network based on the fourth loss.

[0264] In one possible implementation, during each iteration, the model can be updated based on the gradients obtained from a batch of data. This batch of data can contain operational data from different recommendation scenarios; for example, it can include operational data from a second recommendation scenario. A loss (e.g., the third loss in this embodiment) and a gradient for updating the first feature extraction network can also be obtained based on the operational data from the second recommendation scenario. However, since the gradients obtained based on the second loss and the third loss are from different recommendation scenarios, they may have negative influences on each other (their parameter update directions may conflict; for example, gradients A and B are gradients with opposite directions. If gradients A and B are directly superimposed and then updated, it is equivalent to not updating the parameters at all). This prevents the effective use of their information to improve the performance of the corresponding scenario.

[0265] To address the aforementioned issues, in this embodiment of the application, the gradients obtained based on the second loss and the gradients obtained based on the third loss are orthogonalized, thereby reducing the mutual negative impact between gradients obtained from different recommendation scenarios.

[0266] In one possible implementation, the user's operation data (including attribute information) in the second recommendation scenario can be obtained, and the user's operation information on items in the second recommendation scenario can be predicted based on that operation data. For example, the attribute information in the user's operation data in the second recommendation scenario can be input into the feature extraction network corresponding to the second recommendation scenario (see the description of the second feature extraction network in the above embodiments) to obtain an embedding representation, which can then be input into the neural network corresponding to the second recommendation scenario (used to predict operation information in the second recommendation scenario) to obtain the corresponding prediction result. The third loss can be determined from the prediction result and the true value of the operation information in the operation data of the second recommendation scenario, and backpropagating the third loss yields a gradient (a third gradient) corresponding to the first feature extraction network.

[0267] In one possible implementation, multiple third gradients of the first feature extraction network can be obtained by orthogonalizing multiple gradients corresponding to the first feature extraction network based on the second loss and the third loss. One of the multiple third gradients is obtained based on the second loss, and another of the multiple third gradients is obtained based on the third loss. The multiple third gradients are then fused (e.g., by vector summation) to obtain the second gradient corresponding to the first feature extraction network. The first feature extraction network is then updated based on the second gradient.
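One way to realize the orthogonalize-then-sum step is sketched below in Python. This application does not specify the orthogonalization procedure; the sketch assumes a projection scheme in the spirit of PCGrad-style gradient surgery, in which a gradient that conflicts with another (negative dot product) has its conflicting component removed. The function names and values are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(grads):
    """Assumed scheme: for each gradient, remove its component along any
    other ORIGINAL gradient it conflicts with (negative dot product)."""
    result = []
    for i, g in enumerate(grads):
        projected = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(projected, other)
            norm_sq = dot(other, other)
            if d < 0 and norm_sq > 0:
                projected = [p - (d / norm_sq) * o
                             for p, o in zip(projected, other)]
        result.append(projected)
    return result

def fuse_by_sum(grads):
    """Vector summation of the orthogonalized gradients."""
    return [sum(components) for components in zip(*grads)]

grad_from_second_loss = [1.0, 0.0]    # target recommendation scenario
grad_from_third_loss = [-1.0, 1.0]    # second recommendation scenario
orthogonalized = orthogonalize([grad_from_second_loss, grad_from_third_loss])
second_gradient = fuse_by_sum(orthogonalized)
```

After projection, neither gradient has a component that opposes the other, so the summed update no longer cancels along the conflicting direction.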

[0268] In one possible implementation, during the model's inference process, when the target operation information meets preset conditions, it is determined that the item is to be recommended to the user.

[0269] By using the above method, the probability of a user performing an operation on an item can be obtained, and recommendations can be made based on this probability. Specifically, when the target operation information meets preset conditions, the item can be recommended to the user.

[0270] The recommended information can be presented to users in the form of a list page, in the expectation that users will act on it.
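As a minimal Python sketch of this decision step, the preset condition is assumed here to be a probability threshold, and the list page is assumed to be ordered by descending probability. The threshold value, the `top_k` parameter, and the item identifiers are hypothetical; this application does not define the preset conditions.

```python
def select_recommendations(predicted_probs, threshold=0.5, top_k=None):
    """Keep items whose predicted operation probability meets the assumed
    threshold, ordered from most to least likely, optionally truncated."""
    eligible = [(item, p) for item, p in predicted_probs.items()
                if p >= threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    items = [item for item, _ in eligible]
    return items if top_k is None else items[:top_k]

predicted = {"app_a": 0.9, "app_b": 0.2, "app_c": 0.6}
list_page = select_recommendations(predicted, threshold=0.5)
```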

[0271] The beneficial effects of the embodiments of this application will be described below in conjunction with experimental results:

[0272] The embodiments of this application were verified on publicly available commercial datasets and private company datasets, and the data statistics are as follows:

[0273] Table 1. Statistics of Dataset Information

[0274]

[0275] The offline evaluation metric is AUC, while the online evaluation metrics are CTR and ECPM.
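For reference, the three metrics can be computed as in the following Python sketch, which uses their standard definitions: AUC as the fraction of positive/negative pairs ranked correctly (ties counted as half), CTR as clicks per impression, and eCPM as revenue per thousand impressions. The sample values are illustrative and are not experimental data from this application.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def ctr(clicks, impressions):
    """Click-through rate."""
    return clicks / impressions

def ecpm(revenue, impressions):
    """Effective cost per mille: revenue per 1000 impressions."""
    return revenue * 1000.0 / impressions

example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```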

[0276] Compared to the existing baseline, the offline experimental results are as follows:

[0277] Table 2. Overall performance on public datasets

[0278]

[0279] Table 3. Overall performance on the company's private dataset

[0280]

[0281] It can be seen that this solution outperforms the baseline models (including single-scenario modeling solutions, heuristic solutions, multi-task solutions, and existing multi-scenario modeling solutions) on both the public datasets and the company's private dataset.

[0282] The three parts of this solution can be used independently or in combination. Table 4 shows the incremental effects of the modules involved in this solution.

[0283] Table 4. Incremental stacking effect of different modules

[0284]

[0285] As can be seen, based on the simplest skeleton model Shared Bottom, the performance of the model gradually increases as three modules are added, meaning that the improvement relative to the baseline model becomes increasingly significant.

[0286] The following describes an operation prediction device provided in the embodiments of this application from the perspective of the device. Referring to Figure 11, which is a schematic structural diagram of an operation prediction device provided in an embodiment of this application, the operation prediction device 1100 includes:

[0287] The acquisition module 1101 is used to acquire attribute information of users and items in the target recommendation scenario;

[0288] For a detailed description of the acquisition module 1101, please refer to the description of step 501 in the above embodiment, which will not be repeated here.

[0289] Feature extraction module 1102 is used to obtain a first embedding representation and a second embedding representation based on the attribute information through a first feature extraction network and a second feature extraction network, respectively. The first embedding representation is a feature unrelated to the recommendation scene information, and the second embedding representation is a feature related to the target recommendation scene. The first embedding representation and the second embedding representation are fused to obtain a fused embedding representation.

[0290] For a detailed description of the feature extraction module 1102, please refer to the description of step 502 in the above embodiment, which will not be repeated here.

[0291] The prediction module 1103 is used to predict the user's target operation information on the item based on the fused embedded representation.

[0292] For a detailed description of the prediction module 1103, please refer to the descriptions of steps 503 and 504 in the above embodiments, which will not be repeated here.

[0293] In one possible implementation, the attribute information includes the user's operation data in the target recommendation scenario, and the operation data further includes the user's first operation information on the item;

[0294] The prediction module is also used for:

[0295] Based on the attribute information, a first neural network is used to predict the user's second operation information on the item;

[0296] Based on the attribute information, a first recommendation scenario of the operation data is predicted using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss;

[0297] The device further includes:

[0298] The model update module 1104 is further configured to, based on the first loss, orthogonalize the gradient corresponding to the third feature extraction network in the first neural network and the gradient corresponding to the fourth feature extraction network in the second neural network to obtain the first gradient corresponding to the initial feature extraction network.

[0299] The third feature extraction network is updated based on the first gradient to obtain the first feature extraction network.

[0300] In one possible implementation, the difference between the target operation information and the first operation information is used to determine the second loss; the model update module 1104 is further configured to:

[0301] The first feature extraction network is updated based on the second loss.

[0302] In one possible implementation, the feature extraction module is specifically used to obtain a second embedding representation based on the attribute information through a second feature extraction network;

[0303] The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes:

[0304] Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario;

[0305] The multiple embedded representations are fused to obtain the second embedded representation.

[0306] In one possible implementation, the feature extraction module is specifically used to predict the probability value of the attribute information corresponding to each recommendation scenario based on the attribute information.

[0307] Each probability value is used as a weight for the corresponding recommendation scenario, and the multiple embedded representations are fused.

[0308] In one possible implementation, the acquisition module is further configured to:

[0309] Obtain the user's operation data in the second recommendation scenario;

[0310] The prediction module is further configured to predict the user's operation information on the item in the second recommendation scenario based on the operation data in the second recommendation scenario; wherein the user's operation information on the item in the second recommendation scenario is used to determine the third loss;

[0311] The model update module 1104 is specifically used to: based on the second loss and the third loss, orthogonalize multiple gradients corresponding to the first feature extraction network to obtain multiple third gradients of the first feature extraction network;

[0312] The multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network;

[0313] The first feature extraction network is updated based on the second gradient.

[0314] In one possible implementation, the operation data includes information indicating the target recommendation scenario; the feature extraction module is further configured to obtain a third embedding representation based on the operation data through the second feature extraction network.

[0315] Based on the third embedding representation, the third neural network predicts the user's third operation information on the item; wherein the difference between the third operation information and the first operation information is used to determine the fourth loss.

[0316] The model update module 1104 is further configured to: update the third neural network and the second feature extraction network according to the fourth loss.

[0317] In one possible implementation, the target operation information indicates whether the user has performed a target operation on the item, the target operation including at least one of the following:

[0318] Clicking, browsing, adding to cart, and purchasing.

[0319] In one possible implementation, the attribute information includes the user's user attributes, which include at least one of the following: gender, age, occupation, income, hobbies, and education level.

[0320] In one possible implementation, the attribute information includes the item attributes of the item, which include at least one of the following: item name, developer, installation package size, category, and rating.

[0321] In one possible implementation, different recommendation scenarios are for different applications; or,

[0322] Different recommendation scenarios are for different types of applications; or,

[0323] Different recommendation scenarios represent different functions of the same application.

[0324] In one possible implementation, the device further includes:

[0325] The recommendation module is used to recommend the item to the user when the target operation information meets preset conditions.

[0326] This application embodiment also provides a model training apparatus, the apparatus comprising:

[0327] The acquisition module is used to acquire user operation data in the target recommendation scenario. The operation data includes the attribute information of the user and the item, as well as the user's first operation information on the item.

[0328] The prediction module is used to predict the user's second operation information on the item based on the attribute information and through a first neural network.

[0329] Based on the attribute information, a first recommendation scenario of the operation data is predicted using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss;

[0330] Based on the first loss, the first gradient corresponding to the initial feature extraction network is obtained by orthogonalizing the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network.

[0331] The model update module is used to update the third feature extraction network according to the first gradient to obtain the first feature extraction network.

[0332] In one possible implementation, the device further includes:

[0333] The feature extraction module is used to obtain a first embedding representation and a second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network, respectively; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation;

[0334] The prediction module is further configured to predict the user's target operation information on the item based on the fused embedded representation; the difference between the target operation information and the first operation information is used to determine a second loss.

[0335] The model update module is also used to update the first feature extraction network based on the second loss.

[0336] In one possible implementation, obtaining the first embedding representation and the second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network respectively includes:

[0337] Based on the attribute information, a second embedding representation is obtained through a second feature extraction network;

[0338] The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes:

[0339] Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario;

[0340] The multiple embedded representations are fused to obtain the second embedded representation.

[0341] In one possible implementation, the acquisition module is further configured to:

[0342] Obtain the user's operation data in the second recommendation scenario;

[0343] The prediction module is further configured to predict the user's operation information on the item in the second recommendation scenario based on the operation data in the second recommendation scenario; wherein the user's operation information on the item in the second recommendation scenario is used to determine the third loss;

[0344] The model update module is specifically used to obtain multiple third gradients of the first feature extraction network by orthogonalizing multiple gradients corresponding to the first feature extraction network based on the second loss and the third loss.

[0345] The multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network;

[0346] The first feature extraction network is updated based on the second gradient.

[0347] The following describes an execution device provided in an embodiment of this application. Referring to Figure 12, which is a schematic structural diagram of an execution device provided in an embodiment of this application, the execution device 1200 can specifically be a mobile phone, tablet, laptop, smart wearable device, server, etc., and is not limited thereto. The execution device 1200 implements the functions of the operation prediction method described in the embodiment corresponding to Figure 5. Specifically, the execution device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the execution device 1200 may have one or more processors 1203), wherein the processor 1203 may include an application processor 12031 and a communication processor 12032. In some embodiments of this application, the receiver 1201, transmitter 1202, processor 1203, and memory 1204 may be connected via a bus or other means.

[0348] Memory 1204 may include read-only memory and random access memory, and provides instructions and data to processor 1203. A portion of memory 1204 may also include non-volatile random access memory (NVRAM). Memory 1204 stores processor and operation instructions, executable modules, or data structures, or subsets thereof, or extended sets thereof, wherein the operation instructions may include various operation instructions for implementing various operations.

[0349] Processor 1203 controls the operation of the execution device. In specific applications, the various components of the execution device are coupled together through a bus system, which may include not only the data bus, but also power buses, control buses, and status signal buses. However, for clarity, all buses in the diagram are referred to as the bus system.

[0350] The methods disclosed in the embodiments of this application can be applied to or implemented by the processor 1203. The processor 1203 can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above methods can be completed by the integrated logic circuitry in the hardware of the processor 1203 or by instructions in software form. The processor 1203 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or other processors suitable for AI computation. It may further include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1203 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of this application can be directly manifested as being executed by a hardware decoding processor, or being executed by a combination of hardware and software modules in the decoding processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory 1204. Processor 1203 reads information from memory 1204 and, in conjunction with its hardware, completes steps 501 to 503 in the above embodiments.

[0351] Receiver 1201 can be used to receive input digital or character information, and to generate signal inputs related to the settings and function control of the execution device. Transmitter 1202 can be used to output digital or character information through the first interface; transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; transmitter 1202 may also include a display device such as a display screen.

[0352] This application also provides a training device. Referring to Figure 13, which is a schematic structural diagram of a training device provided in an embodiment of this application, the training device 1300 is implemented by one or more servers. The training device 1300 can vary significantly due to different configurations or performance. It may include one or more central processing units (CPUs) 1313 (e.g., one or more processors) and memory 1332, and one or more storage media 1330 (e.g., one or more mass storage devices) for storing application programs 1342 or data 1344. The memory 1332 and storage media 1330 can be temporary or persistent storage. The program stored in the storage media 1330 may include one or more modules (not shown in the diagram), and each module may include a series of instruction operations on the training device. Furthermore, the CPU 1313 may be configured to communicate with the storage media 1330 and execute the series of instruction operations in the storage media 1330 on the training device 1300.

[0353] The training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input / output interfaces 1358, and / or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

[0354] Specifically, the training device can perform steps 501 to 503 in the above embodiments.

[0355] This application also provides a computer program product that, when run on a computer, causes the computer to perform steps as performed by the aforementioned execution device, or causes the computer to perform steps as performed by the aforementioned training device.

[0356] This application also provides a computer-readable storage medium storing a program for signal processing, which, when run on a computer, causes the computer to perform steps as performed by the aforementioned execution device, or causes the computer to perform steps as performed by the aforementioned training device.

[0357] The execution device, training device, or terminal device provided in the embodiments of this application can specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit can be, for example, a processor, and the communication unit can be, for example, an input / output interface, pins, or circuits. The processing unit can execute computer-executable instructions stored in the storage unit to cause the chip within the execution device to execute the data processing method described in the above embodiments, or to cause the chip within the training device to execute the data processing method described in the above embodiments. Optionally, the storage unit can be a storage unit within the chip, such as a register or cache. Alternatively, the storage unit can be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM), another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).

[0358] For details, please refer to Figure 14, which is a schematic diagram of a chip provided in an embodiment of this application. The chip can be represented as a neural network processor (NPU) 1400. The NPU 1400 is mounted as a coprocessor on the host CPU, and tasks are assigned by the host CPU. The core part of the NPU is the arithmetic circuit 1403, which is controlled by a controller 1404 to extract matrix data from the memory and perform multiplication operations.

[0359] Through the cooperation of its various internal components, the NPU 1400 can implement the operation prediction method provided in the embodiment described in Figure 5.

[0360] More specifically, in some implementations, the arithmetic circuit 1403 within the NPU 1400 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array. The arithmetic circuit 1403 can also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.

[0361] For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit retrieves the corresponding data of matrix B from the weight memory 1402 and caches it in each PE of the arithmetic circuit. The arithmetic circuit retrieves the data of matrix A from the input memory 1401 and performs matrix operations with matrix B. The partial or final results of the resulting matrix are stored in the accumulator 1408.
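The data flow of this example can be sketched in software. The following is a minimal illustration only, not a description of the actual circuit: the function name, the tile size, and the use of plain Python lists in place of the weight memory 1402, the input memory 1401, and the accumulator 1408 are all assumptions made for the sketch.

```python
def matmul_with_accumulator(a, b, tile=2):
    """Compute C = A x B by accumulating partial products, one tile of B at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Stand-in for the accumulator: holds partial results until all tiles are done.
    acc = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        # "Cache" one slice of the weight matrix B, as the PEs would hold weights.
        b_tile = b[k0:k1]
        for i in range(rows):
            for j in range(cols):
                for k in range(k0, k1):
                    # Multiply-accumulate: input element times cached weight.
                    acc[i][j] += a[i][k] * b_tile[k - k0][j]
    return acc


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # input matrix (hypothetical values)
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # weight matrix (hypothetical values)
C = matmul_with_accumulator(A, B)
```

After the first tile the accumulator holds only a partial result; the final matrix C is available once the last tile has been processed.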

[0362] Unified memory 1406 is used to store input and output data. Weight data is directly transferred to weight memory 1402 via Direct Memory Access Controller (DMAC) 1405. Input data is also transferred to unified memory 1406 via DMAC.

[0363] The bus interface unit (BIU) 1410 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1409.

[0364] The bus interface unit 1410 is used by the instruction fetch buffer 1409 to fetch instructions from external memory, and by the DMAC 1405 to fetch the original data of the input matrix A or the weight matrix B from external memory.

[0365] The DMAC is mainly used to move input data from the external memory (DDR) to the unified memory 1406, to move weight data to the weight memory 1402, or to move input data to the input memory 1401.

[0366] The vector computation unit 1407 includes multiple processing units that, when needed, further process the output of the arithmetic circuit 1403, for example through vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparisons. It is mainly used for computation in non-convolutional / fully connected layers of neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.

[0367] In some implementations, the vector computation unit 1407 can store the processed output vector in the unified memory 1406. For example, the vector computation unit 1407 can apply a linear or nonlinear function to the output of the arithmetic circuit 1403, such as performing linear interpolation on feature planes extracted from a convolutional layer, or accumulating a vector of values to generate activation values. In some implementations, the vector computation unit 1407 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as activation input to the arithmetic circuit 1403, for example, for use in subsequent layers of the neural network.
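The kind of post-processing attributed to the vector computation unit can be illustrated as follows. This is a hedged sketch: the function names are hypothetical, the normalization omits the learned scale and shift of a full batch normalization layer, and ReLU is chosen here only as one example of a nonlinear function, since the text does not name a specific one.

```python
import math


def normalize(values, eps=1e-5):
    """Shift and scale a vector to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def relu(values):
    """Assumed nonlinear function: clamp negative entries to zero."""
    return [max(0.0, v) for v in values]


# One row of matrix-operation output (hypothetical values).
matmul_row = [2.0, 4.0, 6.0, 8.0]
normalized = normalize(matmul_row)
# The resulting activations could be fed back as input to a subsequent layer.
activations = relu(normalized)
```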

[0368] The instruction fetch buffer 1409, which is connected to the controller 1404, is used to store the instructions used by the controller 1404.

[0369] The unified memory 1406, input memory 1401, weight memory 1402, and instruction fetch buffer 1409 are all on-chip memories. The external memory is private to this NPU hardware architecture.

[0370] The processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the above program.

[0371] It should also be noted that the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. In addition, in the device embodiment drawings provided in this application, the connection relationship between modules indicates that they have a communication connection, which can be implemented as one or more communication buses or signal lines.

[0372] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus necessary general-purpose hardware, or by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memory, special-purpose components, etc. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, a software implementation is in most cases the preferred implementation. Based on this understanding, the technical solution of this application, in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. This computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk, and includes several instructions to cause a computer device (which may be a personal computer, training device, or network device, etc.) to execute the methods described in the various embodiments of this application.

[0373] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product.

[0374] The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims

1. An operation prediction method, characterized in that the method includes:
Obtain attribute information of users and items in the target recommendation scenario; the attribute information includes the user's operation data in the target recommendation scenario, and the operation data also includes the user's first operation information on the item;
Based on the attribute information, predict the user's second operation information on the item using a first neural network;
Based on the attribute information, predict a first recommendation scenario for the operation data using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss;
Based on the first loss, obtain the first gradient corresponding to the initial feature extraction network by orthogonalizing the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network;
Update the third feature extraction network based on the first gradient to obtain the first feature extraction network;
Based on the attribute information, obtain a first embedding representation and a second embedding representation through the first feature extraction network and a second feature extraction network, respectively; the first embedding representation consists of features that are unrelated to the recommendation scenario information and are common across different recommendation scenarios, and the second embedding representation consists of features related to the target recommendation scenario; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation;
Based on the fused embedding representation, predict the user's target operation information for the item.
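The last two steps of claim 1 (fusion and prediction) can be pictured with a small sketch. The claim does not fix the fusion operator or the form of the predictor, so concatenation, a single linear layer, and a sigmoid output are assumptions chosen for illustration, and all names and values below are hypothetical.

```python
import math


def fuse(first_embedding, second_embedding):
    """Fuse by concatenation (an assumed choice; addition or weighting are alternatives)."""
    return list(first_embedding) + list(second_embedding)


def predict_operation(fused, weights, bias=0.0):
    """Map the fused embedding to a probability via a linear layer and a sigmoid."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


first_embedding = [0.5, -0.5]   # scenario-independent features (hypothetical)
second_embedding = [1.0, 0.0]   # scenario-specific features (hypothetical)
fused = fuse(first_embedding, second_embedding)
probability = predict_operation(fused, weights=[1.0, 1.0, 0.0, 0.0])
```

Here `probability` plays the role of the target operation information, for example the estimated likelihood that the user clicks the item.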

2. The method of claim 1, wherein, The difference between the target operation information and the first operation information is used to determine the second loss; the method further includes: The first feature extraction network is updated based on the second loss.

3. The method of claim 2, wherein, The step of obtaining a first embedding representation and a second embedding representation based on the attribute information through a first feature extraction network and a second feature extraction network, respectively, includes: Based on the attribute information, a second embedding representation is obtained through a second feature extraction network; The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes: Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario; The multiple embedded representations are fused to obtain the second embedded representation.

4. The method of claim 3, wherein, The fusion of the multiple embedded representations includes: Based on the attribute information, predict the probability value of the attribute information corresponding to each recommendation scenario; Each probability value is used as a weight for the corresponding recommendation scenario, and the multiple embedded representations are fused.
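The weighted fusion of claim 4 can be sketched as below. This is an illustration under stated assumptions: the claim only requires that predicted probability values be used as weights, so obtaining them with a softmax over hypothetical per-scenario scores is an assumption, as are the function names.

```python
import math


def softmax(scores):
    """Turn per-scenario scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_embeddings(embeddings, probabilities):
    """Weighted sum of per-scenario embeddings, one weight per recommendation scenario."""
    dim = len(embeddings[0])
    return [
        sum(p * emb[d] for p, emb in zip(probabilities, embeddings))
        for d in range(dim)
    ]


scenario_scores = [0.0, 0.0]                     # hypothetical scores for two scenarios
scenario_embeddings = [[1.0, 2.0], [3.0, 4.0]]   # one embedding per scenario network
weights = softmax(scenario_scores)
second_embedding = fuse_embeddings(scenario_embeddings, weights)
```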

5. The method according to any one of claims 2 to 4, characterized in that, The method further includes: Obtain the user's operation data in the second recommendation scenario; Based on the operation data in the second recommendation scenario, predict the user's operation information on the item in the second recommendation scenario; wherein, the user's operation information on the item in the second recommendation scenario is used to determine the third loss; The step of updating the first feature extraction network based on the second loss includes: Based on the second loss and the third loss, multiple third gradients of the first feature extraction network are obtained by orthogonalizing multiple gradients corresponding to the first feature extraction network; Multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network; The first feature extraction network is updated based on the second gradient.
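One possible reading of the orthogonalize-then-fuse step in claim 5 is sketched below. The claim does not give the formulas, so the projection used for orthogonalization and the summation used for fusion are both assumptions, and every name and value is hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def project_out(grad, reference):
    """Remove from `grad` its component along `reference`."""
    denom = dot(reference, reference)
    if denom == 0.0:
        return list(grad)
    scale = dot(grad, reference) / denom
    return [g - scale * r for g, r in zip(grad, reference)]


def orthogonalize_and_fuse(gradients):
    """Orthogonalize each per-scenario gradient against the others, then sum them."""
    third_gradients = []
    for i, grad in enumerate(gradients):
        adjusted = list(grad)
        for j, other in enumerate(gradients):
            if i != j:
                # Sequential projection; exact for two gradients, approximate for more.
                adjusted = project_out(adjusted, other)
        third_gradients.append(adjusted)
    dim = len(gradients[0])
    fused = [sum(g[d] for g in third_gradients) for d in range(dim)]
    return third_gradients, fused


# Gradients from the second loss and the third loss (hypothetical values).
grads = [[1.0, 1.0], [1.0, 0.0]]
third, second_gradient = orthogonalize_and_fuse(grads)
```

With two scenarios, each adjusted gradient is orthogonal to the other scenario's original gradient, so an update along the fused gradient does not directly follow the other scenario's direction of change.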

6. The method according to any one of claims 2 to 4, characterized in that, The operational data includes information indicating the target recommendation scenario; the method further includes: Based on the operation data, a third embedding representation is obtained through the second feature extraction network; Based on the third embedding representation, a third neural network is used to predict the user's third operation information on the item; wherein the difference between the third operation information and the first operation information is used to determine the fourth loss. The third neural network and the second feature extraction network are updated based on the fourth loss.

7. The method according to any one of claims 1 to 4, characterized in that, The target operation information indicates whether the user has performed a target operation on the item, and the target operation includes at least one of the following: Clicking, browsing, adding to cart, and purchasing.

8. The method according to any one of claims 1 to 4, characterized in that, The attribute information includes the user's user attributes, which include at least one of the following: gender, age, occupation, income, hobbies, and education level.

9. The method according to any one of claims 1 to 4, characterized in that, The attribute information includes the item attributes of the item, and the item attributes include at least one of the following: item name, developer, installation package size, category, and rating.

10. The method according to any one of claims 1 to 4, characterized in that, Different recommendation scenarios are for different applications; or, Different recommendation scenarios are for different types of applications; or, Different recommendation scenarios represent different functions of the same application.

11. The method according to any one of claims 1 to 4, characterized in that, The method further includes: When the target operation information meets the preset conditions, the item is recommended to the user.
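As a simple illustration of claim 11, the preset condition could be a probability threshold. The threshold value, the function name, and the candidate items below are assumptions; the claim itself does not specify the condition.

```python
def should_recommend(target_operation_probability, threshold=0.5):
    """Assumed preset condition: predicted probability reaches the threshold."""
    return target_operation_probability >= threshold


# Predicted target operation information per item (hypothetical values).
candidates = {"item_a": 0.82, "item_b": 0.31, "item_c": 0.5}
recommended = [item for item, p in candidates.items() if should_recommend(p)]
```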

12. A model training method, characterized in that the method includes:
Obtain the user's operation data in the target recommendation scenario; the operation data includes the attribute information of the user and the item, as well as the user's first operation information on the item;
Based on the attribute information, predict the user's second operation information on the item using a first neural network;
Based on the attribute information, predict a first recommendation scenario for the operation data using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss;
Based on the first loss, obtain the first gradient corresponding to the initial feature extraction network by orthogonalizing the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network;
Update the third feature extraction network based on the first gradient to obtain the first feature extraction network.
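The first loss and the orthogonalized first gradient of claim 12 can be sketched numerically. This is an illustration only: combining the two differences as an unweighted sum of a binary cross-entropy and a cross-entropy, and orthogonalizing by projection, are assumptions not stated in the claim, and the gradient vectors are hypothetical stand-ins for gradients that a training framework would compute.

```python
import math


def binary_cross_entropy(label, predicted):
    """Difference between the first (actual) and second (predicted) operation information."""
    return -(label * math.log(predicted) + (1 - label) * math.log(1 - predicted))


def cross_entropy(true_index, probabilities):
    """Difference between the predicted scenario distribution and the target scenario."""
    return -math.log(probabilities[true_index])


def first_loss(first_op, second_op, target_scenario, scenario_probs):
    # Assumed combination: unweighted sum of the two terms.
    return binary_cross_entropy(first_op, second_op) + cross_entropy(
        target_scenario, scenario_probs
    )


def orthogonalize(grad, reference):
    """Remove from `grad` its component along `reference` (assumed projection form)."""
    denom = sum(r * r for r in reference)
    if denom == 0.0:
        return list(grad)
    scale = sum(g * r for g, r in zip(grad, reference)) / denom
    return [g - scale * r for g, r in zip(grad, reference)]


loss = first_loss(first_op=1, second_op=0.5, target_scenario=0, scenario_probs=[0.5, 0.5])

grad_operation = [3.0, 1.0]  # gradient from the operation-prediction branch (hypothetical)
grad_scenario = [1.0, 0.0]   # gradient from the scenario-prediction branch (hypothetical)
first_gradient = orthogonalize(grad_operation, grad_scenario)
```

The resulting gradient has no component along the scenario-prediction gradient, which matches the stated aim of learning features that do not depend on the recommendation scenario.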

13. The method according to claim 12, characterized in that, The method further includes: Based on the attribute information, a first embedding representation and a second embedding representation are obtained through the first feature extraction network and the second feature extraction network, respectively; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation; Based on the fused embedded representation, the user's target operation information for the item is predicted; the difference between the target operation information and the first operation information is used to determine the second loss. The first feature extraction network is updated based on the second loss.

14. The method according to claim 12 or 13, characterized in that, The step of obtaining a first embedding representation and a second embedding representation based on the attribute information through the first feature extraction network and the second feature extraction network, respectively, includes: Based on the attribute information, a second embedding representation is obtained through a second feature extraction network; The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes: Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario; The multiple embedded representations are fused to obtain the second embedded representation.

15. The method according to claim 13, characterized in that, The method further includes: Obtain the user's operation data in the second recommendation scenario; Based on the operation data in the second recommendation scenario, predict the user's operation information on the item in the second recommendation scenario; wherein, the user's operation information on the item in the second recommendation scenario is used to determine the third loss; The step of updating the first feature extraction network based on the second loss includes: Based on the second loss and the third loss, multiple third gradients of the first feature extraction network are obtained by orthogonalizing multiple gradients corresponding to the first feature extraction network; Multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network; The first feature extraction network is updated based on the second gradient.

16. An operation prediction device, characterized in that the device includes:
An acquisition module, used to acquire attribute information of users and items in the target recommendation scenario; the attribute information includes the user's operation data in the target recommendation scenario, and the operation data also includes the user's first operation information on the item;
A feature extraction module, used to obtain a first embedding representation and a second embedding representation based on the attribute information through a first feature extraction network and a second feature extraction network, respectively; the first embedding representation consists of features that are unrelated to the recommendation scenario information and are common across different recommendation scenarios, and the second embedding representation consists of features related to the target recommendation scenario; the first embedding representation and the second embedding representation are fused to obtain a fused embedding representation;
A prediction module, used to predict the user's target operation information on the item based on the fused embedding representation; the prediction module is also used to: based on the attribute information, predict the user's second operation information on the item using a first neural network; and, based on the attribute information, predict a first recommendation scenario for the operation data using a second neural network; wherein the difference between the first operation information and the second operation information, and the difference between the first recommendation scenario and the target recommendation scenario, are used to determine the first loss;
A model update module, used to, based on the first loss, orthogonalize the gradients corresponding to the third feature extraction network in the first neural network and the fourth feature extraction network in the second neural network to obtain the first gradient corresponding to the initial feature extraction network, and to update the third feature extraction network based on the first gradient to obtain the first feature extraction network.

17. The apparatus according to claim 16, characterized in that, The difference between the target operation information and the first operation information is used to determine the second loss; the model update module is further used to: The first feature extraction network is updated based on the second loss.

18. The apparatus according to claim 17, characterized in that, The feature extraction module is specifically used to obtain a second embedding representation based on the attribute information through a second feature extraction network; The step of obtaining the second embedding representation through the second feature extraction network based on the attribute information includes: Based on the attribute information, multiple embedding representations are obtained through multiple feature extraction networks, including the second feature extraction network; wherein each feature extraction network corresponds to a recommendation scenario, and the second feature extraction network corresponds to the target recommendation scenario; The multiple embedded representations are fused to obtain the second embedded representation.

19. The apparatus according to claim 18, characterized in that, The feature extraction module is specifically used to predict the probability value of the attribute information corresponding to each recommendation scenario based on the attribute information. Each probability value is used as a weight for the corresponding recommendation scenario, and the multiple embedded representations are fused.

20. The apparatus according to any one of claims 17 to 19, characterized in that, The acquisition module is also used for: Obtain user action data in the second recommendation scenario; The prediction module is further configured to predict the user's operation information on the item in the second recommendation scenario based on the operation data in the second recommendation scenario; wherein the user's operation information on the item in the second recommendation scenario is used to determine the third loss; The model update module is specifically used to: based on the second loss and the third loss, orthogonalize multiple gradients corresponding to the first feature extraction network to obtain multiple third gradients of the first feature extraction network; Multiple third gradients are fused to obtain the second gradient corresponding to the first feature extraction network; The first feature extraction network is updated based on the second gradient.

21. The apparatus according to any one of claims 17 to 19, characterized in that, The operation data includes information indicating the target recommendation scenario; the feature extraction module is further configured to obtain a third embedding representation based on the operation data through the second feature extraction network; Based on the third embedding representation, a third neural network is used to predict the user's third operation information on the item; wherein the difference between the third operation information and the first operation information is used to determine the fourth loss. The model update module is further configured to: update the third neural network and the second feature extraction network according to the fourth loss.

22. The apparatus according to any one of claims 17 to 19, characterized in that, The target operation information indicates whether the user has performed a target operation on the item, and the target operation includes at least one of the following: Clicking, browsing, adding to cart, and purchasing.

23. The apparatus according to any one of claims 16 to 19, characterized in that, The attribute information includes the user's user attributes, which include at least one of the following: gender, age, occupation, income, hobbies, and education level.

24. The apparatus according to any one of claims 16 to 19, characterized in that, The attribute information includes the item attributes of the item, and the item attributes include at least one of the following: item name, developer, installation package size, category, and rating.

25. The apparatus according to any one of claims 16 to 19, characterized in that, Different recommendation scenarios are for different applications; or, Different recommendation scenarios are for different types of applications; or, Different recommendation scenarios represent different functions of the same application.

26. The apparatus according to any one of claims 16 to 19, characterized in that, The device further includes: When the target operation information meets the preset conditions, the item is recommended to the user.

27. A computing device, characterized in that, The computing device includes a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and execute the method as described in any one of claims 1 to 15.

28. A computer storage medium, characterized in that, The computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to perform the method of any one of claims 1 to 15.

29. A computer program product, comprising code, characterized in that, When the code is executed, it is used to implement the method as described in any one of claims 1 to 15.