A method, apparatus, computer device, and storage medium for training a prediction model.

By employing a feature extraction method that combines multiple rounds of iterative training and nonlinear functions, the problem of overfitting in the prediction model is solved, thereby improving training efficiency and accuracy.

CN116796181B · Active · Publication Date: 2026-06-30 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2022-03-15
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing predictive model training methods suffer from overfitting due to limited sample data, resulting in low prediction accuracy and reliability.

Method used

By performing multiple rounds of iterative training on the sample data, feature extraction is performed using a combination of nonlinear functions to obtain various generalized features, and model parameters are adjusted based on these features to avoid overfitting.

Benefits of technology

It improves the training efficiency and accuracy of the prediction model, reduces the probability of misjudgment, and enhances the model's ability to learn from sample data.

✦ Generated by Eureka AI based on patent content.

Figure CN116796181B_ABST

Abstract

This application provides a method, apparatus, computer device, and storage medium for training a prediction model, applicable to fields such as artificial intelligence and intelligent transportation, to address the problem of low prediction accuracy and reliability of trained prediction models. The method includes at least the following steps during each round of iterative training: performing nonlinear transformations on multiple initial features based on various function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features, wherein each generalization feature represents an object type related to the sample object and a content type related to the sample recommended content; and determining the training loss of the prediction model to be trained based on the multiple generalization features, and adjusting the model parameters. This avoids overfitting and improves the prediction accuracy and reliability of the prediction model.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, computer device, and storage medium for training a prediction model.

Background Technology

[0002] With the continuous development of technology, more and more devices can provide intelligent search services for target objects. For example, as the content that devices can present in response to a target object's search operation grows increasingly rich, when a large amount of content is found for the target object, the device can prioritize displaying the target content that the target object is more likely to choose.

[0003] When a device prioritizes displaying target content with a higher probability of being selected by the target object, it typically uses a trained prediction model obtained through multiple rounds of iterative training to predict the probability of the target object selecting different content.

[0004] In traditional methods for obtaining a trained prediction model, the amount of sample data is limited, so the prediction model to be trained usually has to be trained repeatedly, over multiple rounds, on the same set of sample data until its training loss reaches the training objective and a trained prediction model is obtained.

[0005] However, when the prediction model is repeatedly trained on the same set of sample data, it focuses only on learning the main features represented by that set of sample data, so the trained prediction model overfits. As a result, when the trained prediction model is used to predict the probability of the target object choosing different content, misjudgments may occur, causing the device to prioritize content other than the target content the target object actually needs to choose. The prediction accuracy and reliability of the trained prediction model are therefore low.

[0006] Evidently, in the related art, the prediction accuracy and reliability of trained prediction models are relatively low.

Summary of the Invention

[0007] This application provides a method, apparatus, computer device, and storage medium for training a prediction model, which addresses the problem of low prediction accuracy and reliability of the trained prediction model.

[0008] Firstly, a method for training a prediction model is provided, including:

[0009] Obtain each sample data, wherein each sample data includes: a sample object and sample recommended content, and the sample probability that the sample object selects the sample recommended content;

[0010] Based on the aforementioned sample data, the prediction model to be trained undergoes multiple rounds of iterative training, outputting the trained target prediction model; wherein, in each round of iterative training, at least the following steps are performed:

[0011] Feature extraction is performed on the sample objects and sample recommendation content contained in a sample data to obtain multiple initial features. Then, based on multiple function combinations obtained by permuting and combining various pre-stored nonlinear functions, nonlinear transformation is performed on the multiple initial features to obtain multiple generalized features. Each generalized feature represents an object type related to the sample object and a content type related to the sample recommendation content.

[0012] Based on the aforementioned generalization features, the training loss of the prediction model to be trained is determined, and the model parameters are adjusted.

[0013] Secondly, an apparatus for training a prediction model is provided, comprising:

[0014] Acquisition module: used to acquire each sample data, wherein each sample data includes: a sample object and sample recommended content, and the sample probability of the sample object selecting the sample recommended content;

[0015] Processing module: used to perform multiple rounds of iterative training on the prediction model to be trained based on the various sample data, and output the trained target prediction model; wherein, in each round of iterative training, at least the following steps are performed:

[0016] The processing module is further configured to: extract features from the sample objects and sample recommendation content contained in a sample data to obtain multiple initial features, and perform nonlinear transformation on the multiple initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions to obtain multiple generalized features; wherein each generalized feature represents an object type related to the sample object and a content type related to the sample recommendation content;

[0017] The processing module is also used to: determine the training loss of the prediction model to be trained based on the multiple generalization features, and adjust the model parameters.

[0018] Optionally, the processing module is specifically used for:

[0019] From the pre-stored sparse feature dictionary, select each sparse feature that matches the sample object and each sparse feature that matches the sample recommendation content.

[0020] The selected sparse features are adjusted to a specified dimension to obtain the multiple initial features.
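As a rough illustration of the two steps above, the sketch below looks up sparse features in a dictionary and maps each matched one to a vector of a fixed dimension. It assumes that the sparse feature dictionary is keyed by (field, value) pairs and that "adjusting to a specified dimension" is done with an embedding table; `SPARSE_DICT`, `build_embedding_table`, and `extract_initial_features` are hypothetical names, not taken from the application.

```python
import random

# Hypothetical pre-stored sparse feature dictionary: (field, value) -> sparse id.
SPARSE_DICT = {
    ("gender", "female"): 0,
    ("age_bucket", "25-34"): 1,
    ("content_type", "video"): 2,
    ("author", "author_42"): 3,
}


def build_embedding_table(vocab_size, dim, seed=0):
    """One dim-sized vector per sparse id (randomly initialised; trainable in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def extract_initial_features(sample_object, sample_content, table):
    """Match object and content attributes against the dictionary, then map
    each matched sparse id to its vector of the specified dimension."""
    features = []
    for attrs in (sample_object, sample_content):
        for field, value in attrs.items():
            sparse_id = SPARSE_DICT.get((field, value))
            if sparse_id is not None:  # attributes absent from the dictionary are skipped
                features.append(table[sparse_id])
    return features


table = build_embedding_table(len(SPARSE_DICT), dim=4)
initial = extract_initial_features(
    {"gender": "female", "age_bucket": "25-34"},
    {"content_type": "video", "author": "unknown"},
    table,
)
```

Here the unknown author matches nothing, so three initial features of dimension 4 are produced.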

[0021] Optionally, the processing module is specifically used for:

[0022] For each combination of functions, perform the following operations:

[0023] Based on the nonlinear functions contained in a function combination, nonlinear transformations are performed on the multiple initial features respectively to obtain multiple nonlinear transformation results;

[0024] By combining the results of the multiple nonlinear transformations, a generalization feature is obtained.
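A minimal sketch of the per-combination procedure, under assumptions the text does not fix: the pre-stored pool holds three element-wise functions, a function combination is an unordered pair drawn with `itertools.combinations`, each function in the combination is applied to every initial feature, and the transformation results are merged by element-wise averaging. A real model would likely add learnable weights; `NONLINEAR_FUNCS`, `function_combinations`, and `generalization_feature` are hypothetical.

```python
import itertools
import math

# Hypothetical pool of pre-stored element-wise nonlinear functions.
NONLINEAR_FUNCS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def function_combinations(size=2):
    """Enumerate unordered combinations of `size` function names from the pool."""
    return list(itertools.combinations(NONLINEAR_FUNCS, size))


def generalization_feature(initial_features, combo):
    """Apply every function in `combo` to every initial feature, then merge
    all transformation results into one vector by element-wise averaging."""
    results = [
        [NONLINEAR_FUNCS[name](x) for x in feature]
        for name in combo
        for feature in initial_features
    ]
    return [sum(column) / len(results) for column in zip(*results)]


initial = [[0.5, -1.0], [2.0, 0.0]]
combos = function_combinations()  # 3 pairs from a pool of 3 functions
features = [generalization_feature(initial, c) for c in combos]
```

Each function combination yields one generalization feature, so one round of training sees as many generalization features as there are combinations.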

[0025] Optionally, each function combination is a sequence of multiple sub-combinations, each sub-combination containing at least one nonlinear function; the processing module is specifically used for:

[0026] For each of the aforementioned function combinations, perform the following operations respectively:

[0027] According to the arrangement order of multiple sub-combinations contained in a function combination, based on at least one nonlinear function contained in the first sub-combination in the arrangement order, the multiple initial features are subjected to nonlinear transformation, and at least one corresponding intermediate feature is output.

[0028] Following the arrangement order, for each subsequent sub-combination, the at least one intermediate feature output by the immediately preceding sub-combination is nonlinearly transformed based on the at least one nonlinear function contained in that sub-combination, until the last sub-combination in the arrangement order has performed its transformation. The at least one intermediate feature output by the last sub-combination is then used as the generalization feature.
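The ordered variant can be sketched as a small pipeline in which each stage consumes the previous stage's output. As assumptions for illustration only, each function in a sub-combination produces one intermediate feature from the element-wise mean of that stage's inputs, and the outputs of the last sub-combination are averaged into a single generalization feature; all names are hypothetical.

```python
import math

# Hypothetical pool of pre-stored element-wise nonlinear functions.
FUNCS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def apply_sub_combination(inputs, sub_combo):
    """One stage: each function in the sub-combination emits one intermediate
    feature computed from the element-wise mean of the stage inputs."""
    merged = mean_vector(inputs)
    return [[FUNCS[name](x) for x in merged] for name in sub_combo]


def sequential_generalization_feature(initial_features, function_combo):
    """Run sub-combinations in arrangement order; stage k consumes the
    intermediate features emitted by stage k-1."""
    features = initial_features
    for sub_combo in function_combo:
        features = apply_sub_combination(features, sub_combo)
    # Merge the outputs of the last sub-combination into one generalization feature.
    return mean_vector(features)


combo = [("relu", "tanh"), ("sigmoid",)]
g = sequential_generalization_feature([[0.5, -1.0], [2.0, 0.0]], combo)
```

Because the stages run in arrangement order, reversing the sub-combinations generally yields a different generalization feature, which is why a function combination here is a sequence rather than a set.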

[0029] Optionally, the processing module is specifically used for:

[0030] Multiple linear functions are used to perform linear transformations on the multiple initial features to obtain multiple linear transformation features, wherein each of the linear functions is a first-order linear function or a second-order linear function;

[0031] Based on the multiple linear transformation features and the multiple generalization features, the training loss of the prediction model to be trained is determined;

[0032] When the training loss does not meet the training objective, the model parameters of the prediction model to be trained are adjusted, and the next round of iterative training begins.
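The text does not define the first-order and second-order linear functions. One plausible reading, assumed here, follows the factorization machine (FM): the first-order feature is a weighted sum of the initial features, and the second-order feature is the sum of element-wise products over all feature pairs, computed per dimension with the identity sum over i<j of v_i * v_j = 0.5 * ((sum of v_i)^2 - sum of v_i^2). The function names and weights are hypothetical.

```python
from itertools import combinations


def first_order_feature(initial_features, weights):
    """Weighted element-wise sum of the initial features (a first-order term)."""
    dim = len(initial_features[0])
    return [sum(w * f[i] for w, f in zip(weights, initial_features)) for i in range(dim)]


def second_order_feature(initial_features):
    """Sum of element-wise products over all feature pairs, using the FM identity
    sum_{i<j} v_i * v_j = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2) per dimension,
    which avoids iterating over every pair."""
    dim = len(initial_features[0])
    out = []
    for k in range(dim):
        s = sum(f[k] for f in initial_features)
        sq = sum(f[k] ** 2 for f in initial_features)
        out.append(0.5 * (s * s - sq))
    return out


feats = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
lin1 = first_order_feature(feats, weights=[0.2, 0.3, 0.5])
lin2 = second_order_feature(feats)
```

The identity makes the pairwise term linear in the number of features; the brute-force pairwise sum gives the same values.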

[0033] Optionally, the processing module is specifically used for:

[0034] For the various generalization features, the following operations are performed respectively: based on the multiple linear transformation features and one generalization feature, predict the training probability of the sample object selecting the sample recommendation content;

[0035] The training loss of the prediction model to be trained is determined based on the errors between the predicted multiple training probabilities and the sample probabilities contained in the sample data.
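A sketch of how one training probability per generalization feature might be produced, assuming a logistic output: the linear transformation features are shared across all predictions, each generalization feature is scored with a weight vector `w`, and the total passes through a sigmoid. The scoring scheme and the names are assumptions, not a formula stated in the application.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def training_probabilities(linear_features, generalization_features, w, bias=0.0):
    """One training probability per generalization feature: the linear
    transformation features contribute a shared logit, and only the
    generalization feature changes between predictions."""
    linear_logit = sum(sum(f) for f in linear_features)
    probs = []
    for g in generalization_features:
        logit = linear_logit + sum(wi * gi for wi, gi in zip(w, g)) + bias
        probs.append(sigmoid(logit))
    return probs


probs = training_probabilities(
    linear_features=[[0.1, -0.2], [0.05, 0.0]],
    generalization_features=[[0.3, 0.7], [0.9, 0.1], [0.0, 0.0]],
    w=[0.5, -0.5],
)
```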

[0036] Optionally, the processing module is specifically used for:

[0037] The cross-entropy function is used to calculate the error between the multiple training probabilities and the sample probabilities contained in a single sample data, thereby obtaining multiple cross-entropy losses.

[0038] Using a similarity evaluation function, the error between every two training probabilities among the plurality of training probabilities is calculated to obtain at least one probability loss;

[0039] The training loss of the prediction model to be trained is determined based on the multiple cross-entropy losses and the at least one probability loss.
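The loss terms in [0037] to [0039] can be sketched as follows. Binary cross-entropy is standard; the application does not name its similarity evaluation function, so the squared difference between each pair of training probabilities is used here as a stand-in, and summing all terms is likewise an assumption.

```python
import math
from itertools import combinations


def binary_cross_entropy(p_pred, p_true, eps=1e-12):
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_true * math.log(p) + (1.0 - p_true) * math.log(1.0 - p))


def squared_difference(p_a, p_b):
    # Stand-in for the unspecified "similarity evaluation function".
    return (p_a - p_b) ** 2


def ce_and_probability_losses(training_probs, sample_prob):
    """Cross-entropy of each training probability against the sample probability,
    plus one probability loss for every pair of training probabilities."""
    ce_losses = [binary_cross_entropy(p, sample_prob) for p in training_probs]
    prob_losses = [squared_difference(a, b) for a, b in combinations(training_probs, 2)]
    return ce_losses, prob_losses


ce, pl = ce_and_probability_losses([0.8, 0.6, 0.7], sample_prob=1.0)
loss = sum(ce) + sum(pl)
```

The pairwise term penalizes disagreement between predictions made from different generalization features of the same sample.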

[0040] Optionally, the processing module is specifically used for:

[0041] Using a norm function, the error between every two generalization features among the various generalization features is calculated to obtain at least one generalization loss;

[0042] The multiple cross-entropy losses, the at least one probability loss, and the at least one generalization loss are used as the training losses for the prediction model to be trained.
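A sketch of the generalization loss and the combined training loss. The norm function is assumed to be the Euclidean (L2) distance between each pair of generalization features, and the three loss groups are combined by an unweighted sum; both choices are illustrative assumptions.

```python
import math
from itertools import combinations


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generalization_losses(gen_features):
    """L2-norm error between every pair of generalization features."""
    return [l2_distance(a, b) for a, b in combinations(gen_features, 2)]


def total_training_loss(ce_losses, prob_losses, gen_losses):
    # Unweighted sum of the three loss groups (the weighting is an assumption).
    return sum(ce_losses) + sum(prob_losses) + sum(gen_losses)


gl = generalization_losses([[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
total = total_training_loss([0.2, 0.5], [0.01], gl)
```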

[0043] Thirdly, a computer program product is provided, including a computer program that, when executed by a processor, implements the method described in the first aspect.

[0044] Fourthly, a computer device is provided, comprising:

[0045] Memory, used to store program instructions;

[0046] A processor is configured to invoke program instructions stored in the memory and execute the method described in the first aspect according to the obtained program instructions.

[0047] Fifthly, a computer-readable storage medium is provided, the computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method as described in the first aspect.

[0048] In this embodiment, various pre-stored nonlinear functions are arranged and combined to obtain multiple function combinations. For multiple initial features corresponding to a sample data, multiple initial features are nonlinearly transformed based on multiple function combinations to obtain multiple generalized features. Each generalized feature represents an object type related to the sample object and a content type related to the sample recommendation content.

[0049] On the one hand, during one round of iterative training on a sample data, multiple generalization features contained in the sample data are extracted. The prediction model to be trained is trained based on multiple generalization features. Compared with the method of extracting only one generalization feature for training during one round of iterative training, this simplifies the training process and improves the training efficiency of the prediction model to be trained.

[0050] On the other hand, during a round of iterative training, the extracted generalization features can represent multiple object types related to the sample object and multiple content types related to the sample recommendation content. This allows the prediction model to be trained to learn fully from the sample data, avoiding the problem of overfitting caused by the prediction model ignoring some features contained in the sample data during the training process. This improves the prediction accuracy and reliability of the trained prediction model.

Attached Figure Description

[0051] Figure 1a is a schematic diagram illustrating the principle of a training and prediction method in the related art;

[0052] Figure 1b shows an application scenario of the training and prediction method provided in the embodiments of this application;

[0053] Figure 2 is a flowchart of a training and prediction method provided in an embodiment of this application;

[0054] Figure 3a is schematic diagram one illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0055] Figure 3b is schematic diagram two illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0056] Figure 4a is schematic diagram three illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0057] Figure 4b is schematic diagram four illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0058] Figure 5a is schematic diagram five illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0059] Figure 5b is schematic diagram six illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0060] Figure 5c is schematic diagram seven illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0061] Figure 6 is schematic diagram eight illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0062] Figure 7a is schematic diagram nine illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0063] Figure 7b is schematic diagram ten illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0064] Figure 7c is schematic diagram eleven illustrating the principle of the training and prediction method provided in the embodiments of this application;

[0065] Figure 8 is schematic diagram one of a training prediction apparatus provided in an embodiment of this application;

[0066] Figure 9 is schematic diagram two of the training prediction apparatus provided in the embodiments of this application.

Detailed Implementation

[0067] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings.

[0068] The following explanations of some terms used in the embodiments of this application are provided to facilitate understanding by those skilled in the art.

[0069] (1) Deep Factorization Machine (DeepFM) model:

[0070] Building on the classic Wide & Deep Learning paper, DeepFM replaces the wide part (a logistic regression, LR, model) with a factorization machine (FM), removing the original model's reliance on manual feature engineering and yielding an end-to-end deep learning model. The DeepFM model has been widely adopted in the recommendation and advertising systems of many internet companies.
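To make the structure concrete, here is a toy DeepFM forward pass for a single sample, written from the published model rather than from this application: an FM component (bias, first-order weights, and pairwise inner products of field embeddings) and a one-hidden-layer ReLU network share the same embeddings, and their outputs are summed before a sigmoid. The hidden layer omits bias terms for brevity, and all weights are made-up values.

```python
import math


def relu(x):
    return max(0.0, x)


def deepfm_forward(first_order_w, embeddings, hidden_w, out_w, bias=0.0):
    """Toy DeepFM forward pass for one sample whose active fields have
    first-order weights `first_order_w` and embeddings `embeddings`."""
    # FM component: bias + first-order terms + pairwise inner products of embeddings.
    y_first = bias + sum(first_order_w)
    y_second = sum(
        sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
    )
    # Deep component: one hidden ReLU layer over the concatenated embeddings.
    x = [v for emb in embeddings for v in emb]
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    y_deep = sum(w * h for w, h in zip(out_w, hidden))
    # Both components share the embeddings; their outputs are summed before the sigmoid.
    return 1.0 / (1.0 + math.exp(-(y_first + y_second + y_deep)))


p = deepfm_forward(
    first_order_w=[0.1, -0.2],
    embeddings=[[0.5, 0.5], [1.0, -1.0]],
    hidden_w=[[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.0]],
    out_w=[1.0, 0.5],
)
```

Sharing the embeddings between the FM and deep parts is what lets DeepFM learn low- and high-order feature interactions without hand-built cross features.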

[0071] (2) Embedding:

[0072] The task of deep learning is to map high-dimensional raw data (such as images and sentences) to low-dimensional manifolds, so that the high-dimensional raw data becomes separable after being mapped to the low-dimensional manifold. This mapping is called embedding.
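For categorical ids, which dominate recommendation data, an embedding can be viewed as multiplying a high-dimensional one-hot vector by a V x d matrix. The sketch below checks that this product equals simply reading row `id` of the matrix, which is how embedding layers are usually implemented. The matrix values are arbitrary.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_times_matrix(vec, matrix):
    """Row vector (length V) times matrix (V x d) -> length-d vector."""
    d = len(matrix[0])
    return [sum(vec[v] * matrix[v][k] for v in range(len(vec))) for k in range(d)]


# Vocabulary of 5 ids mapped into a 2-dimensional embedding space.
E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

dense = vec_times_matrix(one_hot(3, 5), E)  # explicit projection of the one-hot vector
lookup = E[3]                               # the equivalent direct row lookup
```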

[0073] (3) Dropout algorithm:

[0074] Dropout refers to the process of temporarily removing neural network units from the network with a certain probability during the training of a deep learning network. It is equivalent to finding a thinner network from the original network.
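A minimal sketch of dropout, using the common "inverted" variant (an assumption; the text does not specify a variant): during training each unit is zeroed with probability `p` and the surviving units are scaled by `1/(1-p)`, so the expected activation is unchanged and no rescaling is needed at inference.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and
    rescale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # inference: the full network is used as-is
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10, p=0.5, rng=rng)
```

Each training step samples a different mask, which is what makes the procedure equivalent to training a different "thinner" sub-network each time.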

[0075] This application relates to the field of Artificial Intelligence (AI), and is designed based on Computer Vision (CV) and Machine Learning (ML) technologies. It can be applied to fields such as cloud computing, intelligent transportation, assisted driving, and mapping.

[0076] Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that studies the design principles and implementation methods of various machines, attempting to understand the essence of intelligence and produce new intelligent machines that can react in a way similar to human intelligence, enabling machines to have perception, reasoning, and decision-making functions.

[0077] Artificial intelligence (AI) is a comprehensive discipline encompassing a wide range of fields, including both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, interactive operating systems, and mechatronics. AI software technologies primarily include computer vision, speech processing, natural language processing, machine learning / deep learning, autonomous driving, and intelligent transportation. With the development and progress of AI, it has been researched and applied in numerous fields, such as smart homes, intelligent customer service, virtual assistants, smart speakers, intelligent marketing, wearable devices, autonomous driving, drones, robots, smart healthcare, vehicle networking, and intelligent transportation. It is believed that with further technological advancements, AI will be applied in even more fields, playing an increasingly important role. The solutions provided in this application's embodiments relate to deep learning and augmented reality technologies in AI, which are further illustrated by the following examples.

[0078] Computer vision is the science that studies how to enable machines to "see." More specifically, it refers to machine vision, which uses cameras and computers to replace human eyes in identifying, tracking, and measuring targets, and then performs image processing to create images more suitable for human observation or transmission to instruments. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content / behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), autonomous driving, intelligent transportation, and other technologies, as well as common biometric recognition technologies such as facial recognition and fingerprint recognition.

[0079] Machine learning is a multidisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and many other disciplines. It specifically studies how computers acquire new knowledge or skills by simulating human learning behavior, reorganize existing knowledge structures, and continuously improve their performance.

[0080] Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence. Its applications span all areas of artificial intelligence. Deep learning, on the other hand, is the core of machine learning and a technology for realizing machine learning. Machine learning typically includes techniques such as deep learning, reinforcement learning, transfer learning, inductive learning, artificial neural networks, and instructional learning. Deep learning includes techniques such as convolutional neural networks (CNNs), deep belief networks, recurrent neural networks, autoencoders, and generative adversarial networks.

[0081] It should be noted that in the embodiments of this application, data related to objects or content are involved. When the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0082] The following is a brief introduction to the application areas of the training and prediction model method provided in the embodiments of this application.

[0083] With the continuous development of technology, more and more devices can provide intelligent search services for target objects. As the recommended content that devices can present in response to a target object's search operation becomes increasingly rich, when there are many recommendation results for a target object, the device can prioritize presenting the recommended content that the target object is more likely to choose.

[0084] For example, in communication software installed on the device, the target object can obtain different articles or videos through different subscription accounts, as shown in Figure 1a. The subscription account platform is a self-media publishing platform with over 300 million daily active users, so the recommended content that the device can display is very rich.

[0085] The device can predict the probability of the target object selecting each article or video and then present the articles or videos in descending order of probability, so that the target object can quickly obtain recommended content of interest.

[0086] When a device prioritizes displaying recommended content that a target user is more likely to select, a trained prediction model obtained through multiple rounds of iterative training is typically used to predict the probability of the target user selecting different recommended content. For example, a click-through rate (CTR) prediction model can be used to predict the probability of a user clicking on a certain item.

[0087] In traditional methods for obtaining a trained prediction model, the amount of sample data is limited, so the prediction model to be trained usually has to be trained repeatedly, over multiple rounds, on the same set of sample data until its training loss reaches the training objective and a trained prediction model is obtained.

[0088] However, when the prediction model is repeatedly trained based on the same set of sample data, the prediction model will only focus on learning the main features represented by that set of sample data, while ignoring the relatively sparse long-tail data. This results in insufficient fitting ability of long-tail features, causing the obtained trained prediction model to have overfitting problems.

[0089] Therefore, when a trained prediction model is used to predict the probability of a target user choosing different recommended content, misjudgments may occur. For example, it might assign a high selection probability to recommended content the target user is not interested in, or a low selection probability to recommended content the target user is interested in. The device then prioritizes recommended content other than what the target user actually needs, so the prediction accuracy and reliability of the trained prediction model are low.

[0090] Evidently, in the related art, the prediction accuracy and reliability of trained prediction models are relatively low.

[0091] To address the issue of low prediction accuracy and reliability in trained prediction models, this application proposes a method for training prediction models. In this method, after obtaining each sample data set, the prediction model to be trained undergoes multiple rounds of iterative training based on each sample data set, outputting the trained target prediction model. Each sample data set includes: a sample object, sample recommended content, and the sample probability of the sample object selecting the sample recommended content.

[0092] In each round of iterative training, at least the following steps are performed: Feature extraction is performed on the sample objects and recommended content contained in a sample dataset to obtain multiple initial features. Then, based on various combinations of pre-stored nonlinear functions, nonlinear transformations are applied to these initial features to obtain multiple generalized features. Each generalized feature represents an object type related to the sample object and a content type related to the recommended content. Based on these multiple generalized features, the training loss of the prediction model to be trained is determined, and the model parameters are adjusted.

[0093] In this embodiment, various pre-stored nonlinear functions are arranged and combined to obtain multiple function combinations. For multiple initial features corresponding to a sample data, multiple initial features are nonlinearly transformed based on multiple function combinations to obtain multiple generalized features. Each generalized feature represents an object type related to the sample object and a content type related to the sample recommendation content.

[0094] On the one hand, during one round of iterative training on a sample data, multiple generalization features contained in the sample data are extracted. The prediction model to be trained is trained based on multiple generalization features. Compared with the method of extracting only one generalization feature for training during one round of iterative training, this simplifies the training process and improves the training efficiency of the prediction model to be trained.

[0095] On the other hand, during a round of iterative training, the extracted generalization features can represent multiple object types related to the sample object and multiple content types related to the sample recommendation content. This allows the prediction model to be trained to learn fully from the sample data, avoiding the problem of overfitting caused by the prediction model ignoring some features contained in the sample data during the training process. This improves the prediction accuracy and reliability of the trained prediction model.

[0096] The following describes the application scenarios of the training and prediction model method provided in this application.

[0097] Refer to Figure 1b, which illustrates an application scenario of the method for training a prediction model provided in this application. The application scenario includes a client 101 and a server 102, which can communicate with each other. The communication may be wired, for example through a network cable or serial cable, or wireless, for example through Bluetooth or Wi-Fi; no specific limitation is imposed.

[0098] Client 101 generally refers to a device that can provide sample data to server 102 or use a trained prediction model, such as a terminal device, a third-party application accessible by the terminal device, or a webpage accessible by the terminal device. Terminal devices include, but are not limited to, mobile phones, computers, intelligent transportation equipment, and smart appliances. Server 102 generally refers to a device that can train a prediction model, such as a terminal device or a server. Servers include, but are not limited to, cloud servers, local servers, or associated third-party servers. Both client 101 and server 102 can use cloud computing to reduce the consumption of local computing resources; similarly, they can also use cloud storage to reduce the consumption of local storage resources.

[0099] As one embodiment, the client 101 and the server 102 can be the same device, and there is no specific limitation. In this embodiment, the client 101 and the server 102 are described as different devices.

[0100] The following takes server 102 in Figure 1b as the executing entity and describes in detail the method for training a prediction model provided in the embodiments of this application. Refer to Figure 2, which is a flowchart of a method for training a prediction model provided in an embodiment of this application.

[0101] S201: Obtain each sample data.

[0102] When training the prediction model, the server first acquires various sample data. Each sample data includes a sample object, sample recommended content, and the sample probability of the sample object selecting the sample recommended content. The trained prediction model is used to predict the probability of the target object selecting a certain candidate recommended content based on the target object and information such as various candidate recommended content. Therefore, the server can sort the candidate recommended content in descending order of probability according to the predicted probability of the target object selecting each candidate recommended content, and then present the target recommended content to the target object in the order of probability.
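The ranking step can be sketched as follows; `toy_model` is a hypothetical stand-in for the trained prediction model that returns a selection probability for a (target object, candidate) pair, and the candidate identifiers are made up.

```python
def rank_candidates(predict_prob, target_object, candidates):
    """Score each candidate with the prediction model and sort by descending probability."""
    scored = [(predict_prob(target_object, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Hypothetical stand-in for the trained prediction model.
def toy_model(target_object, content):
    return {"article_a": 0.12, "video_b": 0.57, "article_c": 0.33}[content]


order = rank_candidates(toy_model, "account_1", ["article_a", "video_b", "article_c"])
```

Sorting on the probability alone keeps the ranking independent of how candidates are otherwise compared.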

[0103] The sample objects can include objects such as virtual accounts; the sample objects can also include the time when the sample recommended content was selected, and the attribute information of the sample objects, such as the gender, age, geographical location, occupation, account usage time, etc., without any specific restrictions.

[0104] The sample recommended content can include articles, videos, voice messages, or chat logs. The sample recommended content can also include content attribute information, such as the author of the content, the amount of interaction with the content, the main keywords contained in the comments of the content, and the richness of the content, such as the length of the article, the number of images in the article, or the number of scene transitions in the video, etc. There are no specific restrictions.

[0105] The sample object can select the sample recommended content in various ways, such as viewing its details, playing it, viewing its context information, forwarding it, or tagging it. This is not specifically limited.

[0106] The server can retrieve sample data from a storage device, download sample data from network resources, or generate virtual sample data according to data standards, etc. This is not specifically limited.

[0107] S202: Based on each piece of sample data, perform multiple rounds of iterative training on the prediction model to be trained, and output the trained target prediction model.

[0108] After obtaining the sample data, the server can perform multiple rounds of iterative training on the prediction model to be trained based on each piece of sample data and output the trained target prediction model. In each round of iterative training, the server can train the prediction model to be trained on one piece of sample data and adjust the model parameters. The server can output the trained target prediction model once the prediction model to be trained has been trained on all sample data, or once the training loss of the prediction model to be trained is determined to meet the training objective, etc. This is not specifically limited.

[0109] The following describes the process of training the prediction model on a single piece of sample data. Training on other pieces of sample data is similar and is not repeated here. For the training process of the prediction model to be trained, see S203 to S204.

[0110] S203: Extract features from the sample object and sample recommended content contained in one piece of sample data to obtain multiple initial features. Then, based on the function combinations obtained by permuting and combining pre-stored nonlinear functions, perform nonlinear transformations on the multiple initial features to obtain multiple generalized features.

[0111] For one piece of sample data, the server can use the prediction model to be trained to extract features from both the sample object and the sample recommended content it contains, obtaining multiple initial features. Each initial feature characterizes a concrete, shallow feature of the sample object or sample recommended content, and each individual initial feature characterizes only part of the sample object or sample recommended content. Each initial feature can be a feature vector or a feature matrix. The feature dimensions of different initial features can be the same or different; this is not specifically limited.

[0112] After obtaining the initial features, the server continues to use the prediction model to be trained. It permutes and combines the pre-stored nonlinear functions into multiple function combinations, then applies nonlinear transformations to the initial features based on these function combinations to obtain multiple generalized features. Each generalized feature represents an object type related to the sample object and a content type related to the sample recommended content. The nonlinear transformations enhance the representational power of the initial features. They merge the separate parts of the sample object or sample recommended content into a whole and form abstract, deep features of the sample object or sample recommended content. This is why each resulting generalized feature can represent such an object type and content type.

[0113] Figure 3a is a schematic diagram illustrating the principle of a method for training a prediction model provided in an embodiment of this application. For one piece of sample data containing a sample object and sample recommended content, feature extraction with the prediction model to be trained yields initial features A, B, ..., N. The prediction model to be trained then performs nonlinear transformations on initial features A, B, ..., N based on different function combinations. Transforming them with function combination a yields generalized feature a, and transforming them with function combination b yields generalized feature b. This continues until transforming them with function combination n yields generalized feature n, giving multiple generalized features.

[0114] As one implementation, the initial features can be embeddings. The server can use the prediction model to be trained to directly extract the embeddings contained in the sample object and the sample recommendation content. Alternatively, the server can first select sparse features that match the sample object and sparse features that match the sample recommendation content from a pre-stored sparse feature dictionary. The sparse features can be in one-hot encoding form. Thus, all features representing the sample object and the sample recommendation content can be obtained.

[0115] After obtaining the sparse features, the server continues to use the prediction model to be trained, adjusting each sparse feature to a specified dimension to obtain multiple initial features. The dimension adjustment of the sparse features using the prediction model to be trained can be achieved through hidden layers within the model.

[0116] The hidden layer used to adjust the dimension in the prediction model to be trained can contain multiple hidden nodes. For each sparse feature, the same number of hidden nodes as the specified dimension can be selected from the multiple hidden nodes to adjust the dimension of the sparse feature. Each selected hidden node can perform a weighted summation of the elements contained in the sparse feature based on the corresponding weight parameters to be trained, thereby adjusting the sparse feature to the specified dimension.

[0117] Referring to Figure 3b, the specified dimension is 3. The sparse features are [0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], and [1, 0, 0, 0, 0, 0], each with a dimension of 6. Take the dimension adjustment of the sparse feature [0, 0, 1, 0, 0, 0] as an example. Three hidden nodes, namely hidden node A, hidden node B, and hidden node C, are selected from the multiple hidden nodes. The weight parameters to be trained that hidden nodes A, B, and C apply to the third element (the only nonzero element of this sparse feature) are 2, 5, and 3, respectively.

[0118] Hidden node A performs a weighted summation of the elements in the sparse feature [0, 0, 1, 0, 0, 0], resulting in 2; hidden node B performs a weighted summation of the elements in the sparse feature [0, 0, 1, 0, 0, 0], resulting in 5; hidden node C performs a weighted summation of the elements in the sparse feature [0, 0, 1, 0, 0, 0], resulting in 3. Thus, through the hidden layers, the sparse feature [0, 0, 1, 0, 0, 0] can be adjusted to [2, 5, 3], achieving the goal of adjusting dimension 6 to 3. Since the sparse feature contains a large number of zeros, dimension adjustment not only densifies the sparse feature but also allows the prediction model to process features based on a specified dimension during subsequent feature processing, avoiding the problem of high training complexity caused by diverse feature dimensions.
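The dimension adjustment in [0117] and [0118] can be sketched in Python as below. This is a minimal illustration, not the application's implementation. The function name `adjust_dimension` is hypothetical, and so is every weight other than the (2, 5, 3) row taken from the Figure 3b example.

```python
from typing import List

Vector = List[float]


def adjust_dimension(sparse: Vector, weights: List[Vector]) -> Vector:
    """Densify a sparse (e.g. one-hot) feature into len(weights[0]) dimensions.

    weights[i][j] is the trainable weight that hidden node j applies to element i
    of the sparse feature; each output element is one hidden node's weighted sum.
    """
    num_nodes = len(weights[0])
    return [sum(x * weights[i][j] for i, x in enumerate(sparse)) for j in range(num_nodes)]


# Hypothetical 6 x 3 weight matrix for hidden nodes A, B, C. Row 2 holds the
# weights (2, 5, 3) applied to the third element, as in the Figure 3b example;
# the other rows are arbitrary placeholders.
W = [
    [1.0, 0.0, 4.0],
    [0.5, 1.5, 2.0],
    [2.0, 5.0, 3.0],
    [0.0, 1.0, 1.0],
    [3.0, 2.0, 0.0],
    [1.0, 1.0, 1.0],
]

print(adjust_dimension([0, 0, 1, 0, 0, 0], W))  # [2.0, 5.0, 3.0]
```

A one-hot input has a single nonzero element, so each weighted sum simply picks out one row of the weight matrix. This is why embedding layers are commonly implemented as table lookups rather than full matrix products.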

[0119] As one example, when the pre-stored nonlinear functions are permuted and combined to obtain multiple function combinations, any two function combinations can contain the same number or different numbers of nonlinear functions. For example, the first function combination contains nonlinear functions A and B, the second contains nonlinear functions A and C, and the third contains nonlinear functions A, B, and C.

[0120] In two combinations of functions containing the same number of nonlinear functions, at least one of the nonlinear functions must be different. For example, the first combination of functions contains nonlinear functions A and B, and the second combination contains nonlinear functions A and C.
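One simple way to enumerate function combinations that satisfy [0119] and [0120] is to take every subset of the pre-stored nonlinear functions, as in the Python sketch below. The minimum combination size of 2 and the function names A, B, and C are assumptions for illustration. This is only one possible enumeration, and the name `function_combinations` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def function_combinations(names: List[str], min_size: int = 2) -> List[Tuple[str, ...]]:
    """Return every subset of the pre-stored nonlinear functions with at least
    min_size members. Subsets of equal size always differ in at least one function."""
    result: List[Tuple[str, ...]] = []
    for k in range(min_size, len(names) + 1):
        result.extend(combinations(names, k))
    return result


print(function_combinations(["A", "B", "C"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```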

[0121] Based on multiple function combinations, nonlinear transformations are applied to multiple initial features to obtain corresponding generalized features. Taking one function combination as an example, each nonlinear function contained in the function combination is used to perform nonlinear transformations on multiple initial features, resulting in multiple nonlinear transformation results. By combining these nonlinear transformation results, a generalized feature is obtained.

[0122] Referring to Figure 4a, a function combination includes nonlinear functions A, B, and C, and the multiple initial features include initial features a, b, ..., n. Nonlinear function A transforms initial features a, b, ..., n to produce nonlinear transformation result A. Nonlinear function B transforms the same initial features to produce nonlinear transformation result B, and nonlinear function C transforms them to produce nonlinear transformation result C. Nonlinear transformation results A, B, and C are then combined into one generalized feature.
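The Figure 4a arrangement can be sketched as follows: every function in a combination transforms all initial features, and the results are combined. The sketch makes two assumptions. First, each nonlinear function is a single neuron computing activation(w · x + b) over the concatenated initial features. Second, "combining" the results means collecting them into one vector. `make_neuron`, the weights, and the choice of activations are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
NonlinearFn = Callable[[Vector], float]


def make_neuron(weights: Sequence[float], bias: float,
                activation: Callable[[float], float]) -> NonlinearFn:
    """Build a hypothetical nonlinear function: activation(w . x + b)."""
    def f(x: Vector) -> float:
        return activation(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return f


def relu(z: float) -> float:
    return max(0.0, z)


def parallel_generalized_feature(initial_features: List[Vector],
                                 combo: List[NonlinearFn]) -> Vector:
    # Concatenate initial features a, b, ..., n into one input vector.
    x = [v for feat in initial_features for v in feat]
    # Each function in the combination transforms all initial features;
    # results A, B, C, ... are collected into a single generalized feature.
    return [f(x) for f in combo]


combo = [
    make_neuron([1, 0, 0, 0], 0.0, relu),       # nonlinear function A
    make_neuron([0, 1, 0, 1], 0.0, relu),       # nonlinear function B
    make_neuron([0, 0, 2, 0], 0.0, math.tanh),  # nonlinear function C
]
print(parallel_generalized_feature([[1.0, 2.0], [0.5, -1.0]], combo))
```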

[0123] As one embodiment, each function combination can also be a sequence of multiple sub-combinations, each sub-combination containing at least one nonlinear function. For example, a function combination may contain sub-combinations A, B, and C, with the sub-combinations arranged in the order of sub-combinations A, C, and B.

[0124] Taking a single function combination as an example, the server can use the prediction model to be trained to process the sub-combinations in their arranged order. First, based on the at least one nonlinear function in the first sub-combination, it performs nonlinear transformations on the multiple initial features and outputs at least one corresponding intermediate feature. Then, for each subsequent sub-combination in turn, it uses the at least one nonlinear function in that sub-combination to transform the at least one intermediate feature output by the preceding sub-combination. After the last sub-combination has been applied, the at least one intermediate feature it outputs is used as the generalization feature.

[0125] Referring to Figure 4b, a function combination comprises subcombinations A and B, arranged in the order A, B. Subcombination A contains nonlinear functions a, b, and c, and subcombination B contains nonlinear functions d and e. The multiple initial features include initial feature a, initial feature b, ..., and initial feature n.

[0126] Following the arranged order, nonlinear function a in subcombination A first transforms initial features a, b, ..., n to obtain intermediate feature a. Nonlinear function b in subcombination A then transforms initial features a, b, ..., n to obtain intermediate feature b. Finally, nonlinear function c in subcombination A transforms initial features a, b, ..., n to obtain intermediate feature c.

[0127] Following the order of arrangement, the nonlinear function d contained in subcombination B is used to perform nonlinear transformation on intermediate features a, b, and c to obtain intermediate feature d; the nonlinear function e contained in subcombination B is used to perform nonlinear transformation on intermediate features a, b, and c to obtain intermediate feature e.

[0128] Since subcombination B is the last subcombination, the intermediate features d and e are combined into a generalized feature.
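The ordered sub-combinations of [0123] to [0128] behave like stacked layers. Each sub-combination transforms the previous outputs, and the last sub-combination's outputs form the generalization feature. The Python sketch below mirrors Figure 4b. It assumes that each nonlinear function is a hypothetical ReLU neuron, ReLU(w · x + b). `make_neuron`, `sequential_generalized_feature`, and all weights are illustrative.

```python
from typing import Callable, List, Sequence

Vector = List[float]
NonlinearFn = Callable[[Vector], float]


def make_neuron(weights: Sequence[float], bias: float = 0.0) -> NonlinearFn:
    """Hypothetical nonlinear function: ReLU(w . x + b)."""
    def f(x: Vector) -> float:
        return max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)
    return f


def sequential_generalized_feature(initial_features: List[Vector],
                                   sub_combinations: List[List[NonlinearFn]]) -> Vector:
    x = [v for feat in initial_features for v in feat]  # concatenated initial features
    for sub in sub_combinations:      # follow the arranged order of sub-combinations
        x = [f(x) for f in sub]       # intermediate features output by this sub-combination
    return x                          # outputs of the last sub-combination


# Subcombination A (functions a, b, c) reads the two initial values;
# subcombination B (functions d, e) reads intermediate features a, b, c.
sub_a = [make_neuron([1, 0]), make_neuron([0, -1]), make_neuron([1, 1])]
sub_b = [make_neuron([1, 1, 1]), make_neuron([0, 0, 1], bias=0.5)]
print(sequential_generalized_feature([[1.0], [-2.0]], [sub_a, sub_b]))  # [3.0, 0.5]
```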

[0129] As one implementation, the server can use the Dropout algorithm to permute and combine the multiple nonlinear functions and obtain various function combinations. Because Dropout is random, many different permutations and combinations can be drawn from the pre-stored nonlinear functions, yielding a variety of function combinations. Fitting one piece of sample data with these function combinations produces multiple generalization features of that sample data. This amounts to training on that single piece of sample data multiple times and thus achieves data augmentation.

[0130] Take the use of Dropout to obtain two function combinations as an example. Because Dropout is random, the two function combinations contain different nonlinear functions. Both function combinations are formed from nonlinear functions selected from the same set of pre-stored nonlinear functions. Therefore, any nonlinear function that appears in both combinations has the same node parameters in each; that is, the parameters are shared.

[0131] After obtaining the multiple initial features, the server can use Dropout to select from the multiple nonlinear functions. Referring to Figure 5a, each circle represents a nonlinear function, and the prediction model contains multiple nonlinear functions.

[0132] Referring to Figure 5b, two rounds of random selection from the multiple nonlinear functions yield two corresponding function combinations, function combination A and function combination B. In function combinations A and B, solid circles represent selected nonlinear functions and dashed circles represent unselected nonlinear functions.

[0133] After function combinations A and B are obtained, and referring to Figure 5c, the server performs nonlinear transformations on the multiple initial features based on function combination A and function combination B respectively. This yields two corresponding generalized features, generalized feature A and generalized feature B.
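The Dropout-based selection in [0129] to [0133] can be sketched as two stochastic passes through the same layer of nonlinear functions. Each pass draws its own keep mask, so each pass uses a different function combination. Every function kept in both passes uses the same weights, which is the parameter sharing described in [0130]. The sketch assumes ReLU neurons and standard inverted dropout, in which kept outputs are scaled by 1/keep_prob. `dropout_pass`, `WEIGHTS`, `keep_prob=0.5`, and the seed are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def relu(z: float) -> float:
    return max(0.0, z)


def dropout_pass(x: Vector, weights: List[Vector], keep_prob: float,
                 rng: random.Random) -> Vector:
    """One pass through a layer of ReLU neurons with inverted dropout.

    The neurons kept in this pass form its function combination. `weights`
    is the same list in every pass, so a neuron kept in two passes uses
    identical parameters (parameter sharing).
    """
    out: Vector = []
    for w in weights:
        if rng.random() < keep_prob:
            out.append(relu(sum(wi * xi for wi, xi in zip(w, x))) / keep_prob)
        else:
            out.append(0.0)  # dropped: not part of this function combination
    return out


WEIGHTS = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.1], [0.6, 0.6, 0.6]]
x = [1.0, 2.0, 0.5]
rng = random.Random(42)
feature_a = dropout_pass(x, WEIGHTS, keep_prob=0.5, rng=rng)  # generalized feature A
feature_b = dropout_pass(x, WEIGHTS, keep_prob=0.5, rng=rng)  # generalized feature B
print(feature_a, feature_b)
```

At inference time, Dropout is normally disabled (keep_prob = 1), so every nonlinear function contributes.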

[0134] S204: Based on the multiple generalization features, determine the training loss of the prediction model to be trained, and adjust the model parameters.

[0135] After obtaining the multiple generalization features, the server can determine the training loss of the prediction model to be trained based on these features and adjust the model parameters. The server can adjust the model parameters when it determines that the training loss does not meet the training objective. When the training loss meets the training objective, it outputs the prediction model to be trained as the trained target prediction model. Alternatively, the server can adjust the model parameters based on the training loss when some sample data has not yet been used for training. Once all sample data has been used, it outputs the prediction model to be trained as the trained target prediction model. The specific implementation is not limited.

[0136] Based on each of the multiple generalization features, the server can predict a training probability that the sample object selects the sample recommended content, thus obtaining multiple training probabilities. The server can determine the training loss in several ways:

- based on the error between the average of the multiple training probabilities and the corresponding sample probability;
- based on the maximum of the errors between the individual training probabilities and the sample probability;
- based on the minimum of those errors; or
- based on the distribution of those errors, etc.

This is not specifically limited.
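The loss options listed in [0136] differ only in how the per-feature errors are aggregated. The Python sketch below covers the mean, maximum, and minimum options and leaves out the distribution-based option. It assumes absolute error as the error measure, which the application does not specify. `aggregate_error` is a hypothetical name.

```python
from statistics import mean
from typing import List


def aggregate_error(train_probs: List[float], sample_prob: float, mode: str = "mean") -> float:
    """Aggregate the errors of several training probabilities against one sample probability."""
    if mode == "mean":
        # Error between the average training probability and the sample probability.
        return abs(mean(train_probs) - sample_prob)
    errors = [abs(p - sample_prob) for p in train_probs]
    if mode == "max":
        return max(errors)
    if mode == "min":
        return min(errors)
    raise ValueError(f"unknown mode: {mode!r}")


probs = [0.5, 0.75]  # one training probability per generalization feature
print(aggregate_error(probs, 1.0, "mean"), aggregate_error(probs, 1.0, "max"),
      aggregate_error(probs, 1.0, "min"))  # 0.375 0.5 0.25
```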

[0137] By predicting multiple training probabilities based on various generalization features, multiple training iterations of a single sample data can be performed in a single training round, simplifying the training process and improving training efficiency.

[0138] As one example, since the initial features characterize the concrete features of the sample data and the generalized features characterize the abstract features of the sample data, the server can predict the training probability of a sample object selecting the recommended sample content based on the initial features and the generalized features.

[0139] The server can use multiple linear functions to perform linear transformations on multiple initial features, obtaining multiple linearly transformed features. Each linear function can be either a first-order or second-order linear function. Specifically, the server can use a first-order linear function to perform linear transformations on multiple initial features to obtain first-order representations of the initial features (i.e., first-order features), and use a second-order linear function to perform linear transformations on multiple initial features to obtain second-order representations of the initial features (i.e., second-order features). The server can then use the obtained first-order and second-order features as linearly transformed features, respectively.

[0140] When the initial features are obtained based on sparse features, the server can use a first-order linear function to perform a linear transformation on each sparse feature to obtain first-order features, and use a second-order linear function to perform a linear transformation on each initial feature to obtain second-order features. The obtained first-order features and second-order features are then used as linear transformation features.
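The embodiment in [0158] builds on DeepFM. One plausible reading of the first-order and second-order linear functions in [0139] and [0140] is therefore the Factorization Machine terms: a weighted sum over the sparse features, and the sum of pairwise inner products of the initial (embedding) features. The Python sketch below assumes that reading, and its weights are hypothetical. The second-order term uses the standard FM identity: sum over i<j of <e_i, e_j> equals 1/2 times the sum over k of [(sum_i e_ik)^2 - sum_i e_ik^2]. A naive pairwise version is included for comparison.

```python
from itertools import combinations
from typing import List

Vector = List[float]


def first_order(sparse_features: List[Vector], weights: List[Vector]) -> float:
    """Weighted sum over all elements of all sparse features (FM first-order term)."""
    return sum(w * x
               for feat, feat_w in zip(sparse_features, weights)
               for w, x in zip(feat_w, feat))


def second_order(embeddings: List[Vector]) -> float:
    """Sum of pairwise inner products of embeddings, computed in O(n * dim)."""
    total = 0.0
    for k in range(len(embeddings[0])):
        s = sum(e[k] for e in embeddings)
        sq = sum(e[k] * e[k] for e in embeddings)
        total += 0.5 * (s * s - sq)
    return total


def second_order_naive(embeddings: List[Vector]) -> float:
    """Reference O(n^2 * dim) version of the same quantity."""
    return sum(sum(a * b for a, b in zip(u, v)) for u, v in combinations(embeddings, 2))


sparse = [[0, 1, 0], [1, 0]]
w1 = [[0.1, 0.25, 0.3], [0.5, 0.7]]  # hypothetical first-order weights
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(first_order(sparse, w1), second_order(emb), second_order_naive(emb))  # 0.75 67.0 67.0
```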

[0141] When performing a linear transformation on multiple initial features, the server can either select a subset of initial features from all initial features for the linear transformation, or it can directly perform a linear transformation on all initial features.

[0142] The linear function selected by the server for each linear transformation can be learned through multiple rounds of iterative training, and the parameters contained in each linear function can also be learned through multiple rounds of iterative training, etc., without any specific restrictions.

[0143] After obtaining multiple linear transformation features, the server can determine the training loss of the prediction model to be trained based on these features and various generalization features. When the training loss does not meet the training objective, the model parameters of the prediction model to be trained are adjusted, and the next round of iterative training begins. When the training loss meets the training objective, the prediction model to be trained is output as the trained target prediction model.

[0144] As one example, when determining the training loss of the prediction model to be trained based on multiple linear transformation features and multiple generalization features, for each generalization feature, the server can predict a training probability of a sample object selecting the recommended sample content based on multiple linear transformation features and one generalization feature, thereby obtaining multiple training probabilities. The server can determine the training loss of the prediction model to be trained based on the errors between the predicted multiple training probabilities and the corresponding sample probabilities.

[0145] As one example, when determining the training loss of the prediction model to be trained based on the errors between multiple predicted training probabilities and their corresponding sample probabilities, the server can use a cross-entropy function to calculate the errors between the multiple training probabilities and their corresponding sample probabilities, thus obtaining multiple cross-entropy losses. The server can then use these multiple cross-entropy losses as the training loss of the prediction model to be trained.
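The per-probability cross-entropy losses in [0145] can be sketched with binary cross-entropy, which also accepts a soft sample probability between 0 and 1. The clipping constant `eps` is an assumption added to avoid log(0). The function names are hypothetical.

```python
import math
from typing import List


def binary_cross_entropy(p: float, y: float, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability p and a sample probability y."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def cross_entropy_losses(train_probs: List[float], sample_prob: float) -> List[float]:
    """One cross-entropy loss per training probability (one per generalization feature)."""
    return [binary_cross_entropy(p, sample_prob) for p in train_probs]


print(cross_entropy_losses([0.5, 0.9], 1.0))
```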

[0146] Since multiple training probabilities can be obtained based on various generalization features, and the predicted training probabilities should tend to be consistent regardless of the perspective from which the generalization features are obtained, constraints can be set during training to prevent the multiple training probabilities from becoming too divergent, which could lead to inaccurate training loss or failure to meet the training objective. These constraints, along with cross-entropy loss, can be used as the training loss for the prediction model to be trained, enabling the model to learn more accurately from the sample data.

[0147] There can be multiple types of constraints; two of them will be introduced below as examples.

[0148] Constraint 1:

[0149] Constrain the error between any two of the multiple training probabilities.

[0150] The server can use a similarity evaluation function to calculate the error between every two training probabilities, obtaining at least one probability loss. The probability loss measures the similarity between two training probabilities; the smaller the probability loss, the greater the similarity.

[0151] The server can determine the training loss of the prediction model to be trained based on multiple cross-entropy losses and at least one probability loss. By constraining the probability loss, the model can more accurately extract the generalization features of the sample data during training, and more accurately predict the training probability based on the generalization features, thereby improving the generalization ability and making the prediction accuracy and reliability of the trained target prediction model higher.
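A similarity evaluation function for Constraint 1 can be sketched with the KL divergence between the Bernoulli distributions defined by two training probabilities; the embodiment in [0163] names KL divergence. This sketch averages the two directions so that the loss does not depend on which probability comes first. That symmetrization is an assumption, as are the function names and the clipping constant.

```python
import math
from itertools import combinations
from typing import List


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def probability_losses(train_probs: List[float]) -> List[float]:
    """Symmetric KL for every pair of training probabilities; smaller means more similar."""
    return [0.5 * (bernoulli_kl(p, q) + bernoulli_kl(q, p))
            for p, q in combinations(train_probs, 2)]


print(probability_losses([0.3, 0.4, 0.9]))  # three pairwise losses
```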

[0152] Constraint 2:

[0153] Constrain the error between any two of the multiple generalization features.

[0154] The server can use a norm function to calculate the error between every two of the multiple generalization features, obtaining at least one generalization loss. The norm function can be the L1 norm, the L2 norm, etc.; this is not specifically limited. Constraining every pair of generalization features prevents the object types and content types they represent from diverging too far. This allows the model to extract generalization features from the sample data more accurately during training, and thereby improves the prediction accuracy and reliability of the trained prediction model.
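Constraint 2 can be sketched as a pairwise norm distance between generalization features. The Python sketch below supports the L1 and L2 norms mentioned in [0154]. The function names are hypothetical. Many training codebases use the squared L2 distance instead, a choice this sketch does not make.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def norm_distance(a: Vector, b: Vector, order: int = 2) -> float:
    """L1 or L2 norm of the difference between two generalization features."""
    diffs = [x - y for x, y in zip(a, b)]
    if order == 1:
        return sum(abs(d) for d in diffs)
    if order == 2:
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError("order must be 1 or 2")


def generalization_losses(features: List[Vector], order: int = 2) -> List[float]:
    """One generalization loss per pair of generalization features."""
    return [norm_distance(a, b, order) for a, b in combinations(features, 2)]


print(generalization_losses([[0.0, 0.0], [3.0, 4.0]]))           # [5.0]
print(generalization_losses([[0.0, 0.0], [3.0, 4.0]], order=1))  # [7.0]
```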

[0155] The server can use the multiple cross-entropy losses and the at least one generalization loss as the training loss of the prediction model to be trained. In combination with Constraint 1, the server can also use the multiple cross-entropy losses, the at least one probability loss, and the at least one generalization loss together as the training loss, etc. This is not specifically limited.

[0156] Take two generalization features, generalization feature A and generalization feature B, as an example, and refer to Figure 6. The server first uses a norm function to calculate the error between generalization feature A and generalization feature B, obtaining the generalization loss. It then makes predictions based on generalization features A and B respectively to obtain training probabilities A and B, and uses a similarity evaluation function to calculate the error between training probabilities A and B, obtaining the probability loss. The server also uses a cross-entropy function to calculate the error between each of training probabilities A and B and the sample probability, obtaining cross-entropy loss A and cross-entropy loss B. The generalization loss, the probability loss, cross-entropy loss A, and cross-entropy loss B together form the training loss.
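The Figure 6 flow for two generalization features can be put together in one function. In this Python sketch, the four terms are summed, with hypothetical weights `alpha` and `beta` on the probability loss and the generalization loss. The application lists the terms but does not state how they are weighted. The use of KL divergence (symmetrized here, an assumption) and the L2 norm follows the embodiment in [0163]. All names are assumptions.

```python
import math
from typing import List


def _clip(p: float, eps: float = 1e-7) -> float:
    return min(max(p, eps), 1.0 - eps)


def bce(p: float, y: float) -> float:
    """Binary cross-entropy between training probability p and sample probability y."""
    p = _clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bernoulli_kl(p: float, q: float) -> float:
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def l2_distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def training_loss(feat_a: List[float], feat_b: List[float], prob_a: float, prob_b: float,
                  sample_prob: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    generalization_loss = l2_distance(feat_a, feat_b)            # norm function
    probability_loss = 0.5 * (bernoulli_kl(prob_a, prob_b)       # similarity evaluation
                              + bernoulli_kl(prob_b, prob_a))
    ce_a = bce(prob_a, sample_prob)                              # cross-entropy loss A
    ce_b = bce(prob_b, sample_prob)                              # cross-entropy loss B
    return ce_a + ce_b + alpha * probability_loss + beta * generalization_loss


print(training_loss([1.0, 2.0], [1.0, 2.5], 0.6, 0.7, 1.0))
```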

[0157] The following is an example of the method for training and predicting models provided in the embodiments of this application.

[0158] Referring to Figure 7a, and taking the training process for a single piece of sample data as an example, the prediction model to be trained uses the model architecture of the DeepFM model as its basis.

[0159] The server uses the prediction model to be trained to extract features from the sample objects and sample recommendation content contained in the sample data, and obtains multiple sparse features. Each sparse feature is represented by a triangle, and different triangles represent different sparse features.

[0160] After the multiple sparse features are obtained, a hidden layer in the prediction model to be trained serves as the embedding layer. It densifies the multiple sparse features and adjusts them to the specified dimension to obtain multiple initial features.

[0161] After the multiple initial features are obtained, the Dropout algorithm permutes and combines the multiple nonlinear functions contained in the prediction model to be trained, selecting two function combinations: function combination A and function combination B. In Figure 7a, solid circles represent selected nonlinear functions and dashed circles represent unselected nonlinear functions. The two function combinations each contain multiple sub-combinations, and each sub-combination contains multiple nonlinear functions. Identical nonlinear functions in the two function combinations share parameters.

[0162] The two function combinations are used to perform nonlinear transformations on the multiple initial features, resulting in two corresponding generalization features, generalization feature A and generalization feature B, which are represented by squares in Figure 7a. At the same time, the Wide&FM linear memory module in the prediction model to be trained applies multiple linear functions to the multiple initial features to obtain the corresponding linearly transformed features. Based on the linearly transformed features and each generalization feature, the prediction model to be trained predicts the training probability that the sample object selects the sample recommended content. This yields two training probabilities, training probability A and training probability B.

[0163] After the two generalization features are obtained, the L2 norm is used to calculate the generalization loss between them. After the two training probabilities are obtained, KL divergence is used to measure the similarity between them to obtain the probability loss. The cross-entropy function is then used to calculate the cross-entropy loss between each training probability and the sample probability, obtaining two cross-entropy losses.

[0164] The server can use the obtained generalization loss, probability loss, and two cross-entropy losses as training losses to determine whether the generalization loss, probability loss, and two cross-entropy losses satisfy their respective training objectives.

[0165] When the training loss is determined not to meet the training objective, the model parameters of the prediction model to be trained are adjusted and the next round of iterative training begins. When the training loss is determined to meet the training objective, the prediction model to be trained is output as the trained target prediction model.

[0166] Compared with mainstream click-through rate (CTR) prediction models in related technologies, the prediction model of this embodiment shows a significant improvement in the model evaluation metric AUC (Area Under Curve) when tested on offline datasets from the subscription account business. Table 1 compares the AUC improvement of the prediction model provided in this embodiment against the LR, FM, Wide&Deep, DeepFM, AutoInt, and xDeepFM models in related technologies.

[0167] Table 1

[0168]

[0169] In this embodiment, no additional model structure or parameters are added to the DeepFM model. Although each piece of sample data is trained multiple times, training time increases by only 30%. This is far less than the increase brought by more complex models such as AutoInt and xDeepFM.

[0170] As one example, after obtaining a trained target prediction model, the server can use the target prediction model to predict the probability of a target object selecting target content.

[0171] The target object accesses an application through a client, and the application can periodically push content published by various subscription accounts to the target object. For example, the candidate contents are:

- candidate content A, published by ID1, introducing the recently released movie A;
- candidate content B, published by ID2, introducing a specific recipe for a type of bread;
- candidate content C, published by ID3, introducing the ten highest-rated movies of the past five years;
- candidate content D, published by ID4, introducing ten baking-related film and television programs; and
- candidate content E, published by ID5, introducing the film and television programs featuring the target actor that are scheduled to air soon.

[0172] Referring to Figure 7b, a first target object and a second target object enter the application through their respective clients. The clients hold relevant information about each target object. For example, the first target object is female, a pastry chef, and a friend of ID1, and the second target object is a movie lover and a fan of the target actor.

[0173] When the first target object enters the application through the client, the server uses the target prediction model to predict the probability that the first target object selects each candidate content. The server inputs the relevant information of the first target object and the relevant information of each candidate content into the target prediction model.

[0174] The target prediction model extracts features of the first target object and of each candidate content. It outputs the following predicted probabilities that the first target object selects each candidate content: 0.6 for candidate content A, 0.8 for candidate content B, 0.2 for candidate content C, 0.7 for candidate content D, and 0.1 for candidate content E.

[0175] The server can therefore select the candidate contents whose predicted probability is greater than 0.5, namely candidate contents A, B, and D. Sorting them by predicted probability from highest to lowest gives the target sequence candidate content B, candidate content D, candidate content A. The server sends the target sequence to the client, which presents the contents to the first target object in the order of the target sequence.
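The selection and ordering step in [0175] amounts to a threshold filter followed by a descending sort. Below is a minimal Python sketch using the probabilities from [0174]. It treats the 0.5 threshold as a configurable parameter, an assumption; `rank_above_threshold` is a hypothetical name.

```python
from typing import Dict, List


def rank_above_threshold(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep candidates whose predicted probability exceeds threshold, highest first."""
    selected = [(c, p) for c, p in probs.items() if p > threshold]
    selected.sort(key=lambda cp: cp[1], reverse=True)
    return [c for c, _ in selected]


first_object = {"A": 0.6, "B": 0.8, "C": 0.2, "D": 0.7, "E": 0.1}
print(rank_above_threshold(first_object))  # ['B', 'D', 'A']
```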

[0176] Referring to Figure 7c, when the second target object enters the application through a client, the server uses the target prediction model to predict the probability that the second target object selects each candidate content. The server inputs the relevant information of the second target object and the relevant information of each candidate content into the target prediction model.

[0177] The target prediction model extracts the features of the second target object and the features of each candidate content, and outputs the following predicted probabilities for the second target object selecting each candidate content: 0.9 for candidate content A, 0.1 for candidate content B, 0.6 for candidate content C, 0.3 for candidate content D, and 0.8 for candidate content E.

[0178] The server then selects the three candidate contents with the highest predicted probabilities, namely candidate contents A, C, and E. The server sorts candidate contents A, C, and E by predicted probability from highest to lowest, obtaining a target sequence in the order A, E, C. The server sends the target sequence to the client, which presents the contents to the second target object in the order of the target sequence.
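The top-three selection for the second target object differs from the threshold rule in paragraph [0175] only in how candidates are chosen. A minimal sketch, assuming the probabilities from paragraph [0177] are held in a dictionary; `rank_top_k` is a hypothetical name, and `heapq.nlargest` is used because it already returns the k largest items in descending order.

```python
import heapq


def rank_top_k(probabilities, k=3):
    """Return the k candidates with the highest predicted probabilities,
    ordered from highest to lowest."""
    # nlargest iterates over the dict keys and ranks them by their probability.
    return heapq.nlargest(k, probabilities, key=probabilities.get)


# Predicted probabilities for the second target object (paragraph [0177]).
second_target = {"A": 0.9, "B": 0.1, "C": 0.6, "D": 0.3, "E": 0.8}
print(rank_top_k(second_target))  # ['A', 'E', 'C']
```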

[0179] Based on the same inventive concept, embodiments of this application provide an apparatus for training a prediction model, capable of implementing the functions corresponding to the aforementioned method for training a prediction model. Referring to Figure 8, the apparatus includes an acquisition module 801 and a processing module 802, wherein:

[0180] Acquisition module 801: configured to acquire multiple pieces of sample data, wherein each piece of sample data includes a sample object, sample recommended content, and the sample probability that the sample object selects the sample recommended content;

[0181] Processing module 802: configured to perform multiple rounds of iterative training on the prediction model to be trained based on the sample data, and to output the trained target prediction model; wherein, in each round of iterative training, at least the following steps are performed:

[0182] The processing module 802 is further configured to: extract features from the sample object and the sample recommended content contained in one piece of sample data to obtain multiple initial features, and perform nonlinear transformations on the multiple initial features based on multiple function combinations obtained by permuting and combining various pre-stored nonlinear functions, to obtain multiple generalization features; wherein each generalization feature represents an object type related to the sample object and a content type related to the sample recommended content.
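One way to obtain function combinations by "permuting and combining" the pre-stored nonlinear functions, so that every two combinations either differ in size or, at equal size, each contains a function the other lacks (the condition recited in claim 1), is to enumerate all non-empty subsets of the function pool. The sketch below assumes a hypothetical pool of three functions (ReLU, sigmoid, tanh); the pool, the `max_size` parameter, and all function names are illustrative assumptions rather than the set specified by the application.

```python
import itertools
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical pool of pre-stored nonlinear functions.
NONLINEAR_FUNCTIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}


def build_function_combinations(names, max_size=None):
    """Enumerate every non-empty subset of the pre-stored functions.

    Distinct subsets automatically satisfy the pairwise condition: they
    either differ in size, or (at equal size) one contains a function
    that the other does not.
    """
    max_size = max_size or len(names)
    combos = []
    for size in range(1, max_size + 1):
        combos.extend(itertools.combinations(names, size))
    return combos


def satisfies_pairwise_condition(a, b):
    """Check the condition stated for every two function combinations."""
    if len(a) != len(b):
        return True
    return bool(set(a) - set(b))


combos = build_function_combinations(sorted(NONLINEAR_FUNCTIONS))
# Resolve names to callables so each combination can transform features.
callable_combos = [tuple(NONLINEAR_FUNCTIONS[n] for n in c) for c in combos]
print(len(combos))  # 7
print(all(satisfies_pairwise_condition(a, b)
          for a, b in itertools.combinations(combos, 2)))  # True
```

Note that reordering the same functions does not produce a new valid combination under this condition, which is why subsets rather than ordered permutations are enumerated here.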

[0183] The processing module 802 is further configured to: determine the training loss of the prediction model to be trained based on the multiple generalization features, and adjust the model parameters accordingly.

[0184] In one possible embodiment, the processing module 802 is specifically configured to:

[0185] Select, from the pre-stored sparse feature dictionary, each sparse feature that matches the sample object and each sparse feature that matches the sample recommended content;

[0186] Adjust the selected sparse features to a specified dimension to obtain the multiple initial features.
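Paragraphs [0185] and [0186] can be illustrated with a small lookup table. This sketch assumes that the sparse feature dictionary maps each discrete attribute (for example the "female" and "pastry chef" attributes of the first target object in paragraph [0172]) to an integer index, and that "adjusting to a specified dimension" is done with an embedding table of width `dim`. The dictionary contents, the embedding approach, and all names are hypothetical.

```python
import random

# Hypothetical pre-stored sparse feature dictionary: attribute -> sparse ID.
SPARSE_FEATURE_DICT = {
    "gender=female": 0,
    "occupation=pastry_chef": 1,
    "interest=movies": 2,
    "content_type=video": 3,
    "content_type=article": 4,
}


def make_embedding_table(num_ids, dim, seed=0):
    """One dense vector of length `dim` per sparse ID (randomly initialised;
    in training these values would be learned model parameters)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def initial_features(object_attrs, content_attrs, table):
    """Select the sparse features matching the sample object and the sample
    recommended content, then map each one to a `dim`-length dense vector."""
    matched = [a for a in object_attrs + content_attrs if a in SPARSE_FEATURE_DICT]
    return [table[SPARSE_FEATURE_DICT[a]] for a in matched]


table = make_embedding_table(len(SPARSE_FEATURE_DICT), dim=4)
feats = initial_features(["gender=female", "occupation=pastry_chef"],
                         ["content_type=video"], table)
print(len(feats), len(feats[0]))  # 3 4
```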

[0187] In one possible embodiment, the processing module 802 is specifically configured to:

[0188] For each function combination, perform the following operations:

[0189] Based on the nonlinear functions contained in the function combination, perform nonlinear transformations on the multiple initial features respectively to obtain multiple nonlinear transformation results;

[0190] Combine the multiple nonlinear transformation results to obtain one generalization feature.
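A minimal sketch of paragraphs [0188] to [0190], assuming that the initial features are first concatenated into one vector, that each nonlinear function is applied element-wise, and that "combining" the results means element-wise averaging. The application does not fix the combination operator; averaging is an assumption, and concatenation would be an equally plausible reading. Function names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def flatten(initial_features):
    """Concatenate the per-field initial feature vectors into one vector."""
    return [v for vec in initial_features for v in vec]


def generalization_feature(initial_features, combination):
    """Apply each nonlinear function in one combination to the initial
    features, then average the transformation results element-wise."""
    x = flatten(initial_features)
    results = [[f(v) for v in x] for f in combination]
    return [sum(col) / len(results) for col in zip(*results)]


initial = [[1.0, -2.0], [0.5, 0.0]]
combinations = [(relu,), (relu, math.tanh), (relu, sigmoid, math.tanh)]
features = [generalization_feature(initial, c) for c in combinations]
print(features[0])  # [1.0, 0.0, 0.5, 0.0]
```

Each function combination yields one generalization feature, so three combinations here produce three generalization features of the same length.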

[0191] In one possible embodiment, each function combination is a sequence of multiple sub-combinations, each sub-combination containing at least one nonlinear function; the processing module 802 is specifically configured to:

[0192] For each of the multiple function combinations, perform the following operations:

[0193] Following the arrangement order of the multiple sub-combinations contained in the function combination, perform nonlinear transformations on the multiple initial features based on the at least one nonlinear function contained in the first sub-combination, and output at least one corresponding intermediate feature;

[0194] Continuing in the arrangement order, perform nonlinear transformations on the at least one intermediate feature output by the immediately preceding sub-combination, based on the at least one nonlinear function contained in each subsequent sub-combination, until the last sub-combination in the arrangement order has performed its nonlinear transformation; then use the at least one intermediate feature output by the last sub-combination as a generalization feature.
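The chained variant in paragraphs [0191] to [0194] can be sketched as a pipeline of stages. This sketch assumes that the first sub-combination transforms the concatenated initial features, that each later sub-combination applies its functions element-wise to the average of the previous stage's intermediate features, and that the last stage's intermediate features are averaged into one generalization feature. It deliberately omits the learnable weights a trainable model would normally place between stages; all names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def elementwise_mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def chained_generalization_feature(initial_features, sub_combinations):
    """Pass the features through the ordered sub-combinations of one
    function combination and return one generalization feature."""
    if not sub_combinations:
        raise ValueError("a function combination needs at least one sub-combination")
    # The first sub-combination receives the concatenated initial features.
    x = [v for vec in initial_features for v in vec]
    intermediates = None
    for sub in sub_combinations:
        if intermediates is not None:
            # Later sub-combinations consume the intermediate features output
            # by the immediately preceding sub-combination (averaged here).
            x = elementwise_mean(intermediates)
        intermediates = [[f(v) for v in x] for f in sub]
    # Intermediate features of the last sub-combination -> generalization feature.
    return elementwise_mean(intermediates)


initial = [[1.0, -1.0], [3.0, 1.0]]
combination = [(relu,), (math.tanh, relu)]
print(chained_generalization_feature(initial, combination))
```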

[0195] In one possible embodiment, the processing module 802 is specifically configured to:

[0196] Perform linear transformations on the multiple initial features using multiple linear functions to obtain multiple linear transformation features, wherein each linear function is a first-order linear function or a second-order linear function;

[0197] Determine the training loss of the prediction model to be trained based on the multiple linear transformation features and the various generalization features;

[0198] When the training loss does not meet the training objective, adjust the model parameters of the prediction model to be trained and begin the next round of iterative training.
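Paragraph [0196] does not define the first-order and second-order linear functions further. One common reading in click-through-rate models, assumed here, is a factorization-machine (FM) style pair: a first-order term that is a weighted sum of the inputs, and a second-order term that sums the pairwise inner products of the field vectors. The sketch computes both for a set of initial features; the weights are illustrative values rather than learned parameters, and the function names are hypothetical.

```python
def first_order_term(initial_features, weights):
    """Weighted sum of every element of every initial feature vector."""
    return sum(w * v
               for vec, w_vec in zip(initial_features, weights)
               for v, w in zip(vec, w_vec))


def second_order_term(initial_features):
    """Sum of pairwise inner products <e_i, e_j> for i < j, computed with the
    FM identity 0.5 * ((sum of e)^2 - sum of e^2) per dimension."""
    total = 0.0
    for column in zip(*initial_features):
        s = sum(column)
        total += 0.5 * (s * s - sum(v * v for v in column))
    return total


def linear_transformation_features(initial_features, weights):
    return [first_order_term(initial_features, weights),
            second_order_term(initial_features)]


initial = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
weights = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
print(linear_transformation_features(initial, weights))
```

The identity avoids an explicit double loop over field pairs, which matters when the number of fields is large.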

[0199] In one possible embodiment, the processing module 802 is specifically configured to:

[0200] For each of the various generalization features, perform the following operation: based on the multiple linear transformation features and that generalization feature, predict the training probability of the sample object selecting the sample recommended content;

[0201] Determine the training loss of the prediction model to be trained based on the errors between the multiple predicted training probabilities and the sample probability contained in the piece of sample data.
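For paragraph [0200], a minimal sketch assuming that each training probability is a sigmoid over the sum of the shared linear transformation features plus a weighted projection of one generalization feature, similar in spirit to a DeepFM-style output layer. The per-feature projection weights, the input values, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def training_probabilities(linear_features, generalization_features, head_weights):
    """Predict one training probability per generalization feature.

    Every prediction shares the same linear transformation features; only
    the generalization feature (and its projection weights) changes.
    """
    shared = sum(linear_features)
    probs = []
    for g, w in zip(generalization_features, head_weights):
        logit = shared + sum(gi * wi for gi, wi in zip(g, w))
        probs.append(sigmoid(logit))
    return probs


linear = [0.2, -0.2]
gen_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
heads = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(training_probabilities(linear, gen_feats, heads))
```

With several generalization features, one sample yields several training probabilities, each of which can then be compared with the same sample probability.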

[0202] In one possible embodiment, the processing module 802 is specifically configured to:

[0203] Use a cross-entropy function to calculate the error between each of the multiple training probabilities and the sample probability contained in the piece of sample data, obtaining multiple cross-entropy losses;

[0204] Use a similarity evaluation function to calculate the error between every two of the multiple training probabilities, obtaining at least one probability loss;

[0205] Determine the training loss of the prediction model to be trained based on the multiple cross-entropy losses and the at least one probability loss.
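Paragraphs [0203] to [0205] combine two kinds of error. The sketch below assumes binary cross-entropy for the first kind and squared difference as the similarity evaluation function for the second, and sums all losses with equal weight; the application names neither the specific similarity function nor the weighting, so these choices, like the function names, are assumptions.

```python
import itertools
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy between a training probability p and sample probability y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def probability_losses(probs):
    """Squared difference between every two training probabilities."""
    return [(a - b) ** 2 for a, b in itertools.combinations(probs, 2)]


def training_loss(probs, sample_prob):
    """Sum of the cross-entropy losses and the pairwise probability losses."""
    ce_losses = [binary_cross_entropy(p, sample_prob) for p in probs]
    return sum(ce_losses) + sum(probability_losses(probs))


probs = [0.7, 0.6, 0.8]
print(training_loss(probs, sample_prob=1.0))
```

The pairwise term is smallest when the predictions from different generalization features agree with one another.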

[0206] In one possible embodiment, the processing module 802 is specifically configured to:

[0207] Use a norm function to calculate the error between every two of the various generalization features, obtaining at least one generalization loss;

[0208] Use the multiple cross-entropy losses, the at least one probability loss, and the at least one generalization loss as the training losses of the prediction model to be trained.
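Paragraphs [0207] and [0208] add a third group of losses. This sketch assumes the L2 (Euclidean) norm of the difference between every two generalization features, and again assumes equal weighting when the three groups of losses are summed; the application leaves both choices open, and the function names are hypothetical.

```python
import itertools
import math


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def generalization_losses(gen_features):
    """L2 norm of the difference between every two generalization features."""
    return [l2_norm([a - b for a, b in zip(g1, g2)])
            for g1, g2 in itertools.combinations(gen_features, 2)]


def total_training_loss(ce_losses, prob_losses, gen_losses):
    """Sum the three groups of losses with equal (assumed) weights."""
    return sum(ce_losses) + sum(prob_losses) + sum(gen_losses)


gen_feats = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]
print(generalization_losses(gen_feats))  # [5.0, 4.0, 3.0]
```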

[0209] Please refer to Figure 9. The aforementioned training and prediction apparatus can run on a computer device 900. Current and historical versions of the data storage program, as well as the application software corresponding to the data storage program, can be installed on the computer device 900, which includes a processor 980 and a memory 920. In some embodiments, the computer device 900 may include a display unit 940, which includes a display panel 941 for displaying a user-interaction interface and the like.

[0210] In one possible embodiment, the display panel 941 may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).

[0211] The processor 980 is used to read a computer program and then execute the methods defined by the computer program. For example, the processor 980 reads a data storage program or file, thereby running the data storage program on the computer device 900 and displaying the corresponding interface on the display unit 940. The processor 980 may include one or more general-purpose processors, and may also include one or more DSPs (Digital Signal Processors) for performing related operations to implement the technical solutions provided in the embodiments of this application.

[0212] The memory 920 generally includes main memory and secondary storage. Main memory can be random access memory (RAM), read-only memory (ROM), and cache, etc. Secondary storage can be a hard disk, optical disk, USB flash drive, floppy disk, or magnetic tape drive, etc. The memory 920 is used to store computer programs and other data. The computer programs include applications corresponding to each client, and other data may include data generated after the operating system or applications are run, including system data (e.g., operating system configuration parameters) and user data. In this embodiment, program instructions are stored in the memory 920, and the processor 980 executes the program instructions in the memory 920 to implement any of the methods described in the preceding figures.

[0213] The aforementioned display unit 940 is used to receive input numeric information, character information, or touch operations / contactless gestures, and to generate signal inputs related to user settings and function control of the computer device 900. Specifically, in this embodiment, the display unit 940 may include a display panel 941. The display panel 941 is, for example, a touch screen, which can collect touch operations performed by the user on or near it (such as operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the display panel 941) and drive the corresponding connected devices according to a preset program.

[0214] In one possible embodiment, the display panel 941 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal generated by the touch operation, and transmits the signal to the touch controller. The touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 980; it can also receive and execute commands sent by the processor 980.

[0215] The display panel 941 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 940, in some embodiments, the computer device 900 may also include an input unit 930. The input unit 930 may include an image input device 931 and other input devices 932, wherein the other input devices 932 may include, but are not limited to, one or more of the following: a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.

[0216] In addition to the above, the computer device 900 may also include a power supply 990 for powering the other modules, an audio circuit 960, a near-field communication module 970, and an RF circuit 910. The computer device 900 may also include one or more sensors 950, such as an accelerometer, a light sensor, and a pressure sensor. The audio circuit 960 specifically includes a speaker 961 and a microphone 962; for example, the computer device 900 can use the microphone 962 to collect the user's voice and perform corresponding operations.

[0217] As one embodiment, the number of processors 980 can be one or more, and the processors 980 and the memory 920 can be coupled together or relatively independent.

[0218] As one example, the processor 980 in Figure 9 can be used to implement the functions of the acquisition module 801 and the processing module 802 in Figure 8.

[0219] As one example, the processor 980 in Figure 9 can be used to implement the functions of the server or terminal device discussed above.

[0220] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as removable storage devices, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0221] Alternatively, if the integrated units of this invention are implemented as software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this invention, or the parts that contribute to the prior art, can be embodied in the form of software products, for example, through a computer program product. This computer program product is stored in a storage medium and includes several instructions to cause a computer device to execute all or part of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as removable storage devices, ROM, RAM, magnetic disks, or optical disks.

[0222] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A method of training a prediction model, comprising: obtaining multiple pieces of sample data, wherein each piece of sample data includes a sample object, sample recommended content, and the sample probability that the sample object selects the sample recommended content, and the sample recommended content includes at least one of the following: articles, videos, voice messages, and chat logs; and performing multiple rounds of iterative training on the prediction model to be trained based on the sample data, and outputting the trained target prediction model; wherein, in each round of iterative training, at least the following steps are performed: extracting features from the sample object and the sample recommended content contained in one piece of sample data to obtain multiple initial features, and performing nonlinear transformations on the multiple initial features based on multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features, wherein each initial feature describes a local shallow feature of the sample object or the sample recommended content, each generalization feature represents an object type related to the sample object and a content type related to the sample recommended content, and every two function combinations satisfy one of the following conditions: they contain different numbers of nonlinear functions, or, when they contain the same number of nonlinear functions, at least one nonlinear function in one combination differs from all nonlinear functions in the other combination; performing linear transformations on the multiple initial features using multiple linear functions to obtain multiple linear transformation features, wherein each linear function is a first-order linear function or a second-order linear function; for each of the generalization features, predicting, based on the multiple linear transformation features and that generalization feature, a training probability of the sample object selecting the sample recommended content; and determining the training loss of the prediction model to be trained based on the errors between the multiple predicted training probabilities and the sample probability contained in the piece of sample data, so as to adjust the model parameters.

2. The method according to claim 1, characterized in that extracting features from the sample object and the sample recommended content contained in one piece of sample data to obtain multiple initial features comprises: selecting, from a pre-stored sparse feature dictionary, each sparse feature that matches the sample object and each sparse feature that matches the sample recommended content; and adjusting the selected sparse features to a specified dimension to obtain the multiple initial features.

3. The method according to claim 1, characterized in that performing nonlinear transformations on the multiple initial features based on the multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features, comprises: for each function combination, performing the following operations: performing nonlinear transformations on the multiple initial features based on the nonlinear functions contained in the function combination, respectively, to obtain multiple nonlinear transformation results; and combining the multiple nonlinear transformation results to obtain one generalization feature.

4. The method according to claim 3, characterized in that each function combination is a sequence of multiple sub-combinations, and each sub-combination contains at least one nonlinear function; and performing nonlinear transformations on the multiple initial features based on the multiple function combinations obtained by permuting and combining pre-stored nonlinear functions, to obtain multiple generalization features, comprises: for each of the function combinations, performing the following operations: following the arrangement order of the multiple sub-combinations contained in the function combination, performing nonlinear transformations on the multiple initial features based on the at least one nonlinear function contained in the first sub-combination, and outputting at least one corresponding intermediate feature; and, continuing in the arrangement order, performing nonlinear transformations on the at least one intermediate feature output by the immediately preceding sub-combination based on the at least one nonlinear function contained in each subsequent sub-combination, until the last sub-combination in the arrangement order has performed its nonlinear transformation, and then using the at least one intermediate feature output by the last sub-combination as a generalization feature.

5. The method according to claim 1, characterized in that determining the training loss of the prediction model to be trained based on the errors between the multiple predicted training probabilities and the sample probability contained in the piece of sample data comprises: calculating, using a cross-entropy function, the error between each of the multiple training probabilities and the sample probability contained in the piece of sample data, to obtain multiple cross-entropy losses; calculating, using a similarity evaluation function, the error between every two of the multiple training probabilities, to obtain at least one probability loss; and determining the training loss of the prediction model to be trained based on the multiple cross-entropy losses and the at least one probability loss.

6. The method according to claim 5, characterized in that determining the training loss of the prediction model to be trained based on the multiple cross-entropy losses and the at least one probability loss comprises: calculating, using a norm function, the error between every two of the generalization features, to obtain at least one generalization loss; and using the multiple cross-entropy losses, the at least one probability loss, and the at least one generalization loss as the training losses of the prediction model to be trained.

7. An apparatus for training a prediction model, characterized in that it comprises: an acquisition module, configured to acquire multiple pieces of sample data, wherein each piece of sample data includes a sample object, sample recommended content, and the sample probability that the sample object selects the sample recommended content, and the sample recommended content includes at least one of the following: articles, videos, voice messages, and chat logs; and a processing module, configured to perform multiple rounds of iterative training on the prediction model to be trained based on the sample data, and to output the trained target prediction model; wherein, in each round of iterative training, at least the following steps are performed: the processing module is further configured to extract features from the sample object and the sample recommended content contained in one piece of sample data to obtain multiple initial features, and to perform nonlinear transformations on the multiple initial features based on multiple function combinations obtained by permuting and combining various pre-stored nonlinear functions, to obtain multiple generalization features, wherein each initial feature describes a local shallow feature of the sample object or the sample recommended content, each generalization feature represents an object type related to the sample object and a content type related to the sample recommended content, and every two function combinations satisfy either of the following conditions: they contain different numbers of nonlinear functions, or, when they contain the same number of nonlinear functions, at least one nonlinear function in one function combination differs from all nonlinear functions in the other function combination; and the processing module is further configured to: perform linear transformations on the multiple initial features using multiple linear functions to obtain multiple linear transformation features, wherein each linear function is a first-order linear function or a second-order linear function; for each of the generalization features, predict, based on the multiple linear transformation features and that generalization feature, a training probability of the sample object selecting the sample recommended content; and determine the training loss of the prediction model to be trained based on the errors between the multiple predicted training probabilities and the sample probability contained in the piece of sample data, so as to adjust the model parameters.

8. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.

9. A computer device, characterized in that it comprises: a memory, configured to store program instructions; and a processor, configured to invoke the program instructions stored in the memory and to execute, according to the obtained program instructions, the method according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the method according to any one of claims 1 to 6.