A model training and content display method, device, equipment and medium
By generating adversarial content samples, training the CTR prediction model on them together with the original samples, and optimizing the gradient update direction, the method addresses the inaccurate scoring of long-tail and difficult data by the CTR prediction model, thereby improving the accuracy of content recommendation and users' search efficiency.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD
- Filing Date: 2024-04-22
- Publication Date: 2026-06-23
AI Technical Summary
Existing CTR prediction models are not accurate enough in scoring long-tail and difficult data, resulting in poor content recommendation performance and affecting users' search efficiency.
The method acquires a training sample set and adds perturbation information to generate adversarial content samples. It then trains the CTR prediction model on both, which optimizes the gradient update direction and improves the model's learning on target data.
This improves the accuracy of the CTR prediction model in scoring target data, thereby enhancing the accuracy of content recommendations and improving user search efficiency.

Figure CN118427432B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a model training and content display method, apparatus, device and medium.
Background Technology
[0002] Currently, in content recommendation scenarios, multiple pieces of content to be recommended are retrieved based on keywords entered by the user. The retrieved content is then scored and ranked using a click-through rate (CTR) prediction model, and recommended content is displayed to the user according to the ranking results.
[0003] The CTR prediction model is trained and updated using exposed content, that is, content that has already been shown to users. However, the exposed content includes long-tail data and difficult data, which have low click-through rates, and the CTR prediction model does not score these data accurately. For example, it assigns low scores to long-tail data that meets users' real needs while assigning high scores to difficult data whose actual click-through rate is low. This degrades content recommendation performance and seriously reduces users' search efficiency.
Summary of the Invention
[0004] The purpose of this application is to provide a model training and content display method, apparatus, device, and medium to improve content recommendation effectiveness and enhance user search efficiency. The specific technical solution is as follows:
[0005] In a first aspect of this application, a model training method is provided, the method comprising:
[0006] Obtain a training sample set, which includes multiple original content samples carrying labels. The multiple original content samples are generated from target data. The number of clicks on the target data is less than a preset threshold. The labels represent the click probability of the target data.
[0007] Perturbation information is added to the multiple original content samples to obtain multiple adversarial content samples;
[0008] The multiple original content samples and the multiple adversarial content samples are respectively input into the click-through rate prediction model to obtain multiple first prediction values corresponding to the multiple original content samples and multiple second prediction values corresponding to the multiple adversarial content samples.
[0009] Using the plurality of first prediction values, the plurality of second prediction values, and the labels carried by the plurality of original content samples, calculate a plurality of first loss values corresponding to the original content samples and a plurality of second loss values corresponding to the adversarial content samples, respectively;
[0010] Based on the plurality of first loss values and the plurality of second loss values, verify whether the click-through rate prediction model has converged;
[0011] If the click-through rate prediction model has not converged, calculate a plurality of first gradients corresponding to the original content samples and a plurality of second gradients corresponding to the adversarial content samples, respectively;
[0012] The parameters of the click-through rate prediction model are adjusted based on the plurality of first gradients and the plurality of second gradients.
[0013] In some embodiments, the step of adding perturbation information to the plurality of original content samples to obtain a plurality of adversarial content samples includes:
[0014] Based on the first gradient corresponding to the plurality of original content samples, determine the gradient sign corresponding to the plurality of original content samples;
[0015] Based on the gradient sign and the preset perturbation factor, calculate the perturbation information corresponding to the multiple original content samples;
[0016] By adding corresponding perturbation information to the multiple original content samples, multiple adversarial content samples are obtained.
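The three steps above resemble the fast gradient sign method (FGSM). The minimal sketch below makes two assumptions. First, each original content sample is already a numeric feature vector. Second, the "first gradient" used here is the gradient of the loss with respect to the input features, because the text does not say which gradient is meant. The names `sign` and `make_adversarial` are hypothetical. The value of `epsilon`, which stands in for the preset perturbation factor, is also hypothetical.

```python
from typing import List


def sign(v: float) -> float:
    """Return 1.0, -1.0, or 0.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def make_adversarial(features: List[float], input_grad: List[float],
                     epsilon: float) -> List[float]:
    """FGSM-style perturbation: x_adv = x + epsilon * sign(dLoss/dx).

    `input_grad` is assumed (hypothetically) to hold the gradient of the loss
    with respect to each input feature of one original content sample.
    """
    if len(features) != len(input_grad):
        raise ValueError("features and gradient must have the same length")
    # Perturbation information: preset factor times the gradient sign.
    perturbation = [epsilon * sign(g) for g in input_grad]
    # Adversarial content sample: add the perturbation feature by feature.
    return [x + p for x, p in zip(features, perturbation)]


# Usage: the adversarial sample keeps the label of the original sample.
x_adv = make_adversarial([0.5, 1.0, -0.2], [0.3, -0.1, 0.0], epsilon=0.05)
print(x_adv)
```

Taking only the sign bounds each feature's change by `epsilon`, regardless of the gradient's magnitude.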
[0017] In some embodiments, the step of adding perturbation information to the plurality of original content samples to obtain a plurality of adversarial content samples includes:
[0018] Obtain a first preset number of target features from the plurality of original content samples;
[0019] Perturbation information is added to the target features in the multiple original content samples to obtain multiple adversarial content samples.
[0020] In some embodiments, the step of obtaining a first preset number of target features from the plurality of original content samples includes:
[0021] Singular value decomposition is performed on the original feature matrices corresponding to the multiple original content samples to obtain the initial singular value matrices corresponding to the multiple original feature matrices.
[0022] From the initial singular value matrices corresponding to the plurality of original feature matrices, a first preset number of target singular values are determined to obtain the target singular value matrices corresponding to the plurality of original feature matrices;
[0023] Obtain the target features indicated by the target singular value matrices corresponding to the plurality of original feature matrices.
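The text does not specify how the target singular value matrix maps back to features. The sketch below assumes one reading:

- Keep the first preset number k of largest singular values.
- Take the matching right singular vectors.
- Treat each sample's coordinates in that subspace as its target features.

To stay dependency-free, the sketch approximates the SVD with power iteration and deflation on AᵀA. In practice a library routine such as `numpy.linalg.svd` would normally be used. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def gram(a: Matrix) -> Matrix:
    """A^T A for an m x n matrix A (rows = samples, columns = features)."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_k_singular(a: Matrix, k: int, iters: int = 200) -> Tuple[Vector, List[Vector]]:
    """Approximate the k largest singular values of A and the matching right
    singular vectors via power iteration with deflation on A^T A."""
    m = gram(a)
    n = len(m)
    values: Vector = []
    vectors: List[Vector] = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-degenerate start
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = mat_vec(m, v)
        lam = sum(x * y for x, y in zip(v, mv)) / sum(x * x for x in v)  # Rayleigh quotient
        values.append(math.sqrt(max(lam, 0.0)))  # singular value = sqrt(eigenvalue)
        vectors.append(v)
        # Deflate so the next pass converges to the next-largest component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values, vectors


def target_features(a: Matrix, vectors: List[Vector]) -> Matrix:
    """Hypothetical target features: each sample's coordinates in the top-k subspace."""
    return [[sum(x * y for x, y in zip(row, vec)) for vec in vectors] for row in a]


# Usage: keep the first preset number (k = 1) of singular directions.
features = [[3.0, 0.0], [0.0, 1.0]]
sigmas, dirs = top_k_singular(features, k=1)
print(sigmas, target_features(features, dirs))
```

Power iteration is chosen only to keep the sketch self-contained. It converges slowly when singular values are close together.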
[0024] In some embodiments, the step of adjusting the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients includes:
[0025] Sum the first gradient of each original content sample and the second gradient of its corresponding adversarial content sample to obtain a third gradient corresponding to that original content sample;
[0026] Based on the third gradient corresponding to the multiple original content samples, the parameters of the click-through rate prediction model are adjusted using the gradient descent algorithm.
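Written out, steps [0025] and [0026] amount to the update below for one original content sample i, assuming plain gradient descent. The following symbols are notation introduced here, not taken from the text:

- θ: the model parameters.
- η: the learning rate.
- L^{(1)}_i, L^{(2)}_i: the first and second loss values.

```latex
g^{(3)}_i = g^{(1)}_i + g^{(2)}_i
          = \nabla_\theta L^{(1)}_i + \nabla_\theta L^{(2)}_i,
\qquad
\theta \leftarrow \theta - \eta \, g^{(3)}_i
```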
[0027] In a second aspect of this application, a content display method is also provided, the method comprising:
[0028] Receive search keywords sent by the client;
[0029] Obtain multiple search contents corresponding to the search keywords;
[0030] Input the search keywords and the multiple search contents into the click-through rate prediction model to obtain a score for each search content, where the click-through rate prediction model is trained according to any of the methods described in the first aspect above;
[0031] Sort the multiple search contents based on the score of each search content, and determine a second preset number of target search contents;
[0032] Feed back the target search contents to the client, so that the client displays them and, in response to an interaction command on a target search content, sends the search command corresponding to that interaction command.
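Steps [0030] and [0031] reduce to scoring, sorting, and truncating. The minimal sketch below assumes the trained model is exposed as a hypothetical `score_fn(keyword, content)` callable. The demo scores are made-up values chosen only to reproduce the ordering s2 > s1 > s3 from the Figure 1 example.

```python
from typing import Callable, List, Tuple


def rank_search_contents(keyword: str, contents: List[str],
                         score_fn: Callable[[str, str], float],
                         top_n: int) -> List[str]:
    """Score each (keyword, content) pair, sort by score in descending order,
    and keep the second preset number (top_n) of target search contents."""
    scored: List[Tuple[float, str]] = [(score_fn(keyword, c), c) for c in contents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


# Usage with hypothetical scores s1 = 0.6 (A), s2 = 0.9 (B), s3 = 0.3 (C).
demo_scores = {"content A": 0.6, "content B": 0.9, "content C": 0.3}
result = rank_search_contents("dragon", list(demo_scores),
                              lambda kw, c: demo_scores[c], top_n=3)
print(result)  # ['content B', 'content A', 'content C']
```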
[0033] In a third aspect of this application, a model training apparatus is also provided, the apparatus comprising:
[0034] The first acquisition module is used to acquire a training sample set, which includes multiple original content samples carrying labels. The multiple original content samples are generated from target data. The number of clicks on the target data is less than a preset threshold. The labels represent the click probability of the target data.
[0035] An addition module is used to add perturbation information to the multiple original content samples to obtain multiple adversarial content samples;
[0036] The first input module is used to input the plurality of original content samples and the plurality of adversarial content samples into the click-through rate prediction model respectively to obtain a plurality of first prediction values corresponding to the plurality of original content samples and a plurality of second prediction values corresponding to the plurality of adversarial content samples.
[0037] The first calculation module is used to calculate, using the plurality of first prediction values, the plurality of second prediction values and the labels carried by the plurality of original content samples, a plurality of first loss values corresponding to the plurality of original content samples and a plurality of second loss values corresponding to the plurality of adversarial content samples, respectively.
[0038] The verification module is used to verify whether the click-through rate prediction model has converged based on the plurality of first loss values and the plurality of second loss values.
[0039] The second calculation module is used to calculate multiple first gradients corresponding to the multiple original content samples and multiple second gradients corresponding to the multiple adversarial content samples when the click-through rate prediction model has not converged.
[0040] An adjustment module is used to adjust the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients.
[0041] In some embodiments, the adding module is specifically used for:
[0042] Based on the first gradient corresponding to the plurality of original content samples, determine the gradient sign corresponding to the plurality of original content samples;
[0043] Based on the gradient sign and the preset perturbation factor, calculate the perturbation information corresponding to the multiple original content samples;
[0044] By adding corresponding perturbation information to the multiple original content samples, multiple adversarial content samples are obtained.
[0045] In some embodiments, the adding module is specifically used for:
[0046] Obtain a first preset number of target features from the plurality of original content samples;
[0047] Perturbation information is added to the target features in the multiple original content samples to obtain multiple adversarial content samples.
[0048] In some embodiments, the adding module is specifically used for:
[0049] Singular value decomposition is performed on the original feature matrices corresponding to the multiple original content samples to obtain the initial singular value matrices corresponding to the multiple original feature matrices.
[0050] From the initial singular value matrices corresponding to the plurality of original feature matrices, a first preset number of target singular values are determined to obtain the target singular value matrices corresponding to the plurality of original feature matrices;
[0051] Obtain the target features indicated by the target singular value matrices corresponding to the plurality of original feature matrices.
[0052] In some embodiments, the adjustment module is specifically used for:
[0053] Sum the first gradient of each original content sample and the second gradient of its corresponding adversarial content sample to obtain a third gradient corresponding to that original content sample;
[0054] Based on the third gradient corresponding to the multiple original content samples, the parameters of the click-through rate prediction model are adjusted using the gradient descent algorithm.
[0055] In a fourth aspect of this application, a content display device is also provided, the device comprising:
[0056] The receiving module is used to receive search keywords sent by the client;
[0057] The second acquisition module is used to acquire multiple search contents corresponding to the search keywords;
[0058] The second input module is used to input the search keywords and the multiple search contents into the click-through rate prediction model to obtain the score corresponding to each search content, where the click-through rate prediction model is trained by the apparatus described in the third aspect above.
[0059] The sorting module is used to sort the multiple search contents according to the score corresponding to each search content, and determine a second preset number of target search contents;
[0060] The feedback module is used to feed back the target search content to the client, so that the client can display the target search content and, in response to the interaction command of the target search content, send the search command corresponding to the interaction command.
[0061] In a fifth aspect of this application, an electronic device is also provided, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
[0062] The memory is configured to store a computer program;
[0063] The processor, when executing a program stored in memory, implements any of the model training or content display methods described above.
[0064] In another aspect of this application, a computer-readable storage medium is provided, wherein a computer program is stored therein, and when the computer program is executed by a processor, it implements any of the model training or content display methods described above.
[0065] In another aspect of this application, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to perform any of the model training or content display methods described above.
[0066] In the technical solution provided in this application, for original content samples generated from target data with a low number of clicks, the electronic device adds perturbation information to the original content samples to obtain adversarial content samples. These adversarial content samples and the original content samples are then used to train the CTR prediction model. In other words, adversarial training is added to optimize the gradient update direction, which can significantly improve the learning effect of the CTR prediction model on the target data. During content recommendation and display based on the trained CTR prediction model, the CTR prediction model can accurately score the target data, improving the accuracy and effectiveness of content recommendation, and thus improving the user's search efficiency.
Attached Figure Description
[0067] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below.
[0068] Figure 1 is a schematic diagram of a content recommendation scenario provided in an embodiment of this application;
[0069] Figure 2 is a first schematic flowchart of the model training method provided in an embodiment of this application;
[0070] Figure 3 is a first detailed schematic diagram of step S22 provided in an embodiment of this application;
[0071] Figure 4 is a detailed schematic diagram of step S31 provided in an embodiment of this application;
[0072] Figure 5 is a second detailed schematic diagram of step S22 provided in an embodiment of this application;
[0073] Figure 6 is a schematic flowchart of the content display method provided in an embodiment of this application;
[0074] Figure 7 is a schematic flowchart of a model training and content display method provided in an embodiment of this application;
[0075] Figure 8 is a schematic structural diagram of the model training device provided in an embodiment of this application;
[0076] Figure 9 is a schematic structural diagram of the content display device provided in an embodiment of this application;
[0077] Figure 10 is a first schematic structural diagram of an electronic device provided in an embodiment of this application;
[0078] Figure 11 is a second schematic structural diagram of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0079] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings.
[0080] Currently, in content recommendation scenarios, specifically query suggestion scenarios, a recommendation system can be divided into two stages: a recall stage and a ranking stage. The recall stage retrieves multiple pieces of content to be recommended based on information such as the keywords (tokens) the user enters in the search box and the user's search history. The ranking stage uses a CTR prediction model to score and rank the retrieved content, and then displays recommended content to the user according to the ranking results.
[0081] As shown in Figure 1, when a user enters the keyword "dragon" in the search box on the current page, the system retrieves multiple content items {content A, content B, content C} for recommendation in the recall stage. In the ranking stage, the CTR prediction model scores the retrieved content items. Suppose it scores content A, content B, and content C as s1, s2, and s3 respectively, with s2 > s1 > s3. Ranking the items from highest to lowest score gives content B, content A, content C, so the recommendation system provides the user with the query list {content B, content A, content C}. If content matching the user's intent receives a higher score and is ranked higher, so that the user can click and consume it, the user's search efficiency improves significantly.
[0082] The CTR prediction model is trained and updated using exposed content, that is, content that has already been shown to users. However, the exposed content includes long-tail data and difficult data, which have low click-through rates, and the CTR prediction model does not score these data accurately. For example, it assigns low scores to long-tail data that meets users' real needs while assigning high scores to difficult data whose actual click-through rate is low. This degrades content recommendation performance and seriously reduces users' search efficiency.
[0083] For example, long-tail data receives few exposures, so relatively few samples of it are available. The CTR prediction model therefore cannot learn from it sufficiently and assigns it low scores. Consequently, long-tail data that truly meets user needs is ranked low, making it difficult for users to quickly find the content they need and degrading content recommendation performance. Furthermore, the low scores lead to low rankings and limited exposure opportunities. This exacerbates the Matthew effect and further reduces the visibility of long-tail data, which further diminishes the effectiveness of content recommendation and severely impacts user search efficiency.
[0084] In the example of Figure 1 above, a user enters the keyword "dragon" while actually intending to consume content D. However, content D is long-tail data with limited exposure. The CTR prediction model has not fully learned from it, so content D receives a low score. Because exposure slots are limited, when the recommendation system uses the CTR prediction model to recommend content from the retrieved results, it recommends the higher-scored content {content B, content A, content C}. Content D receives no exposure, which harms the user's search experience and efficiency.
[0085] For difficult data, which has high exposure but a low click-through rate, the CTR prediction model assigns scores that are too high, so the data is ranked high and exposed. However, its actual click-through rate is low, which also degrades content recommendation performance and affects users' search efficiency.
[0086] To improve content recommendation performance and enhance user search efficiency, this application provides a model training method for training a CTR prediction model. This training method can be applied to electronic devices such as computers and servers. For ease of description, an electronic device is used as the execution subject in the following description, without limitation. The CTR prediction model can be a Wide & Deep (W&D) model, which fuses a linear model with a deep neural network, or a deep neural network (DNN) model, without limitation.
[0087] Referring to Figure 2, which is a first schematic flowchart of the model training method provided in an embodiment of this application, the model training method includes the following steps.
[0088] Step S21: Obtain a training sample set. The training sample set includes multiple original content samples carrying labels. The multiple original content samples are generated from the target data. The number of clicks on the target data is less than a preset threshold. The labels represent the click probability of the target data.
[0089] Step S22: Add perturbation information to multiple original content samples to obtain multiple adversarial content samples.
[0090] Step S23: Input multiple original content samples and multiple adversarial content samples into the click-through rate prediction model to obtain multiple first prediction values corresponding to multiple original content samples and multiple second prediction values corresponding to multiple adversarial content samples.
[0091] Step S24: Using multiple first predicted values, multiple second predicted values, and the labels carried by multiple original content samples, calculate multiple first loss values corresponding to multiple original content samples and multiple second loss values corresponding to multiple adversarial content samples.
[0092] Step S25: Verify whether the click-through rate prediction model has converged based on multiple first loss values and multiple second loss values.
[0093] Step S26: If the click-through rate prediction model has not converged, calculate the first gradients corresponding to the original content samples and the second gradients corresponding to the adversarial content samples respectively.
[0094] Step S27: Adjust the parameters of the click-through rate prediction model based on multiple first gradients and multiple second gradients.
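Steps S21 to S27 can be sketched end to end with a logistic-regression stand-in for the CTR prediction model. The text allows W&D or DNN models, so this is a simplification, not the application's implementation. The sketch rests on the following hypothetical assumptions:

- Features are numeric vectors.
- The loss is log loss.
- Adversarial samples are produced FGSM-style from the input gradient.
- Convergence means both mean loss values are at or below `tol`.
- The summed (third) gradient drives one gradient-descent update per original/adversarial sample pair.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 clicked, 0 not clicked)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Logistic-regression stand-in for the CTR prediction model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(p: float, y: int) -> float:
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def train_adversarial_ctr(samples: List[Sample], dim: int, epsilon: float = 0.05,
                          lr: float = 0.2, tol: float = 0.1, max_epochs: int = 1000):
    w, b = [0.0] * dim, 0.0
    for epoch in range(max_epochs):
        # S22: FGSM-style adversarial copy of each sample. For logistic regression
        # dLoss/dx_j = (p - y) * w_j; the adversarial sample keeps the label y.
        adv = []
        for x, y in samples:
            p = predict(w, b, x)
            adv.append([xj + epsilon * sign((p - y) * wj) for xj, wj in zip(x, w)])
        # S23-S24: first/second prediction values -> mean first/second loss values.
        loss1 = sum(log_loss(predict(w, b, x), y) for x, y in samples) / len(samples)
        loss2 = sum(log_loss(predict(w, b, xa), y)
                    for xa, (_, y) in zip(adv, samples)) / len(samples)
        # S25: hypothetical convergence rule.
        if loss1 <= tol and loss2 <= tol:
            return w, b, epoch
        # S26-S27: first and second gradients, summed into a third gradient.
        for (x, y), xa in zip(samples, adv):
            e1 = predict(w, b, x) - y   # dLoss/dz for the original sample
            e2 = predict(w, b, xa) - y  # dLoss/dz for the adversarial sample
            g3 = [e1 * xj + e2 * xaj for xj, xaj in zip(x, xa)]
            w = [wj - lr * gj for wj, gj in zip(w, g3)]
            b -= lr * (e1 + e2)
    return w, b, max_epochs


# Usage with toy data: feature 0 signals a click, feature 1 signals no click.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1), ([0.1, 1.0], 0)]
w, b, epochs = train_adversarial_ctr(data, dim=2)
print(epochs, predict(w, b, [1.0, 0.0]), predict(w, b, [0.0, 1.0]))
```

Adversarial samples are regenerated once per epoch from the current parameters. This is one of several reasonable schedules; the text does not fix it.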
[0095] In the technical solution provided in this application, for original content samples generated from target data with a low number of clicks, the electronic device adds perturbation information to the original content samples to obtain adversarial content samples. These adversarial content samples and the original content samples are then used to train the CTR prediction model. In other words, adversarial training is added to optimize the gradient update direction, which can significantly improve the learning effect of the CTR prediction model on the target data. During content recommendation and display based on the trained CTR prediction model, the CTR prediction model can accurately score the target data, improving the accuracy and effectiveness of content recommendation, and thus improving the user's search efficiency.
[0096] In step S21 above, the target data is content that has been exposed to users and whose number of clicks is less than a preset threshold. The content can be audio, images, text, etc.; its form is not limited here.
[0097] In this embodiment of the application, the target data can be long-tail data or difficult data. That is, the target data can be data that meets any of the following conditions: Condition 1, the number of exposures is less than or equal to a preset first threshold; Condition 2, the number of exposures is greater than a preset second threshold and the click-through rate is less than or equal to a preset third threshold; Condition 3, the publication duration is less than the duration threshold.
[0098] The preset threshold, first threshold, second threshold, third threshold, and duration threshold can be set according to the actual scenario. For example, the preset threshold could be 1,000 records, the first threshold could be 10,000 records, and the duration threshold could be one week. The electronic device can pre-set a database, and when data belongs to the preset database, it determines that the data's publication duration is less than the duration threshold. There are no restrictions on the conditions that the target data must meet or the method by which the electronic device determines the target data.
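The target-data conditions can be sketched as a predicate, under the following assumptions:

- The click threshold from [0096] applies together with any one of the three conditions.
- The preset-database check for condition 3 is replaced with a publication age in days.

The values 1,000, 10,000, and one week come from the example above. The second and third thresholds, and the class name, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ContentStats:
    clicks: int
    exposures: int
    published_days: float  # hypothetical: age of the content in days

    @property
    def ctr(self) -> float:
        return self.clicks / self.exposures if self.exposures else 0.0


CLICK_THRESHOLD = 1_000              # preset threshold on clicks (example value)
FIRST_EXPOSURE_THRESHOLD = 10_000    # condition 1, long-tail (example value)
SECOND_EXPOSURE_THRESHOLD = 100_000  # condition 2, hypothetical value
CTR_THRESHOLD = 0.01                 # condition 2 (third threshold), hypothetical value
DURATION_THRESHOLD_DAYS = 7          # condition 3, one week (example value)


def is_target_data(s: ContentStats) -> bool:
    if s.clicks >= CLICK_THRESHOLD:
        return False
    long_tail = s.exposures <= FIRST_EXPOSURE_THRESHOLD
    difficult = s.exposures > SECOND_EXPOSURE_THRESHOLD and s.ctr <= CTR_THRESHOLD
    newly_published = s.published_days < DURATION_THRESHOLD_DAYS
    return long_tail or difficult or newly_published


# Usage: long-tail content with few exposures qualifies as target data.
print(is_target_data(ContentStats(clicks=50, exposures=5_000, published_days=30)))
```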
[0099] The electronic device can generate original content samples from the target data to construct the training sample set. Each original content sample includes the keyword entered by the user, the target data, content-related features, and user behavior features. Keywords are the words or phrases the user enters for a query. Content-related features include historical data such as the number of times the content has been exposed, clicked, and viewed, and its click-through rate. This historical data can be for the current user or for all users; for example, the number of exposures can include both the number of exposures for the current user and the number of exposures for all users. The data types included in the content-related features are not limited here. User behavior features are user action data, including user click records, viewing records, and favorites data. The data types included in the user behavior features are not limited here.
[0100] Taking content A in Figure 1 above as an example of target data, the original content sample generated from content A includes the keyword "dragon" entered by the user, content A, the related features of content A, and the user behavior features.
[0101] In this embodiment, the original content sample carries a label indicating the click probability of the target data, that is, whether the target data was clicked by the user after being exposed to the user. The label can be represented by 0 or 1: for example, a label of 1 indicates that the target data was clicked, and a label of 0 indicates that it was not clicked. The form of the label is not limited here.
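One possible in-memory layout for an original content sample is sketched below. The class and field names are hypothetical; the application lists the kinds of information a sample carries, not a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OriginalContentSample:
    keyword: str     # query entered by the user, e.g. "dragon"
    content_id: str  # the target data, e.g. "content A"
    # Content-related features: exposures, clicks, views, CTR (current user or all users).
    content_features: Dict[str, float] = field(default_factory=dict)
    # User behavior features: click records, viewing records, favorites.
    user_behavior: Dict[str, List[str]] = field(default_factory=dict)
    label: int = 0   # 1 = clicked after exposure, 0 = not clicked

    def __post_init__(self) -> None:
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")


# Usage: the sample for content A from the Figure 1 example (feature values are made up).
sample = OriginalContentSample(
    keyword="dragon",
    content_id="content A",
    content_features={"exposures_all_users": 800.0, "ctr_all_users": 0.02},
    user_behavior={"clicks": ["content B"]},
    label=1,
)
print(sample.label)
```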
[0102] In this embodiment of the application, during the model training process, the electronic device can acquire the constructed training sample set to obtain multiple original content samples and train the CTR prediction model.
[0103] In step S22 above, the perturbation information is a small perturbation added to the original content sample. The perturbation information can be pre-generated or generated based on the original content sample after it is obtained; the method of generating the perturbation information is not limited here.
[0104] In this embodiment, the electronic device can add perturbation information to each original content sample. That is, it adds the perturbation value corresponding to that feature in the perturbation information to each feature in the original content sample to obtain an adversarial content sample corresponding to each original content sample. The label of the obtained adversarial content sample is the same as the label of the original content sample. The electronic device can also select a portion of the original content samples from multiple original content samples and add perturbation information to the portion of the original content samples to obtain the corresponding adversarial content samples. There is no limitation on this.
[0105] In step S23 above, after acquiring the training sample set, the electronic device inputs the original content samples in the training sample set into the CTR prediction model, and uses the CTR prediction model to perform forward propagation on the original content samples to obtain the first prediction value corresponding to the original content samples.
[0106] After obtaining adversarial content samples from the original content samples, the electronic device inputs the adversarial content samples into the CTR prediction model and uses the CTR prediction model to perform forward propagation on the adversarial content samples to obtain the second prediction value corresponding to the adversarial content samples.
[0107] In this embodiment of the application, after the electronic device obtains the training sample set in step S21, it can input the original content sample into the CTR prediction model to calculate the first prediction value, without needing to input the original content sample into the CTR prediction model after obtaining the adversarial content sample.
[0108] In step S24 above, after obtaining the first prediction value, the electronic device calculates the first loss value corresponding to the original content sample with the loss function, using the first prediction value and the label carried by the original content sample.
[0109] After obtaining the second prediction value, the electronic device calculates the second loss value corresponding to the adversarial content sample with the loss function, using the second prediction value and the label carried by the adversarial content sample.
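The application does not name the loss function. Assume binary cross-entropy (log loss), a common choice for CTR prediction. For sample i with label y_i ∈ {0, 1}, first prediction value ŷ^{(1)}_i, and second prediction value ŷ^{(2)}_i, the two loss values would then be as follows. The adversarial sample reuses the original label, as stated in [0104].

```latex
L^{(1)}_i = -\left[ y_i \log \hat{y}^{(1)}_i + (1 - y_i) \log\left(1 - \hat{y}^{(1)}_i\right) \right],
\qquad
L^{(2)}_i = -\left[ y_i \log \hat{y}^{(2)}_i + (1 - y_i) \log\left(1 - \hat{y}^{(2)}_i\right) \right]
```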
[0110] In step S25 above, after obtaining the first loss value and the second loss value, the electronic device can perform convergence verification on the CTR prediction model based on the first loss value and the second loss value to determine whether the CTR prediction model has converged.
[0111] For example, after obtaining the first loss value, the electronic device can determine whether the first loss value is greater than a preset value, and then determine whether the CTR prediction model has converged. If the first loss value is greater than the preset value, it is determined that the CTR prediction model has not converged; if the first loss value is less than or equal to the preset value, it is determined that the CTR prediction model has converged.
[0112] Similarly, after obtaining the second loss value, the electronic device can determine whether the second loss value is greater than the preset value, and then determine whether the CTR prediction model has converged. This is similar to the process of judging based on the first loss value, and will not be described in detail here.
[0113] In this embodiment, the electronic device can also verify the convergence of the CTR prediction model based on the number of iterations. For example, when the electronic device trains the CTR prediction model using the training sample set more than a preset number of times, and the obtained loss value (i.e., the first loss value or the second loss value) is less than or equal to a preset value, the electronic device can determine that the CTR prediction model has converged; otherwise, the electronic device determines that the CTR prediction model has not converged. The method by which the electronic device verifies the convergence of the CTR prediction model is not limited here.
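The convergence rules in [0111] to [0113] can be sketched as one hypothetical helper:

- With `min_iterations` unset, it applies the loss-threshold rule alone.
- With `min_iterations` set, it also requires training to have run more than that many times.

```python
from typing import Optional


def has_converged(loss: float, loss_threshold: float, iterations: int = 0,
                  min_iterations: Optional[int] = None) -> bool:
    """Hypothetical convergence check for a first or second loss value."""
    if min_iterations is not None and iterations <= min_iterations:
        return False  # not yet trained more than the preset number of times
    return loss <= loss_threshold


# Usage: the loss is low enough, but training has not run long enough yet.
print(has_converged(0.05, 0.1, iterations=3, min_iterations=10))
```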
[0114] In this embodiment of the application, if the electronic device determines that the CTR prediction model has not converged, it executes step S26; if it determines that the CTR prediction model has converged, it determines that the model training has ended.
[0115] In step S26 above, when the electronic device obtains the first loss value and determines that the CTR prediction model has not converged, it performs backpropagation based on the first loss value to calculate the gradient of the loss function with respect to the parameters of the CTR prediction model, thereby obtaining the first gradient corresponding to the original content sample.
[0116] If the electronic device obtains the second loss value and determines that the CTR prediction model has not converged, it performs backpropagation based on the second loss value to calculate the gradient of the loss function with respect to the parameters of the CTR prediction model, thereby obtaining the second gradient corresponding to the adversarial content sample.
[0117] In step S27 above, after calculating the first gradient and the second gradient, the electronic device adjusts the parameters of the CTR prediction model based on the first gradient and the second gradient, respectively, using gradient descent or another optimization algorithm. The method by which the electronic device adjusts the parameters is not limited here.
[0118] In this embodiment of the application, the electronic device can repeatedly execute steps S22-S27, using multiple original content samples and multiple adversarial content samples to adjust the parameters of the CTR prediction model and train the CTR prediction model.
[0119] In one example, an electronic device can alternately train a CTR prediction model using original content samples and corresponding adversarial content samples. That is, the electronic device first trains using an original content sample from the training sample set, and then trains using the adversarial content sample corresponding to that original content sample.
[0120] In another example, the electronic device can first train the CTR prediction model using multiple original content samples sequentially, and then train the CTR prediction model using adversarial content samples corresponding to these original content samples sequentially. The method by which the electronic device trains the CTR prediction model is not limited here.
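The two training orders in [0119] and [0120] differ only in how original and adversarial samples are interleaved. The sketch below uses hypothetical function names and string placeholders for samples, and shows the order in which samples would be fed to the model under each option.

```python
from typing import Iterator, List, Tuple

# Hypothetical pair: (original content sample, its adversarial content sample).
Pair = Tuple[str, str]


def alternating_schedule(pairs: List[Pair]) -> Iterator[str]:
    """[0119]: each original sample is immediately followed by its adversarial sample."""
    for original, adversarial in pairs:
        yield original
        yield adversarial


def sequential_schedule(pairs: List[Pair]) -> Iterator[str]:
    """[0120]: all original samples first, then the corresponding adversarial samples."""
    for original, _ in pairs:
        yield original
    for _, adversarial in pairs:
        yield adversarial
```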
[0121] In this embodiment, the electronic device can featurize the original content samples in advance, construct the training sample set from the featurized original content samples, and then input the featurized samples into the CTR prediction model for training. Alternatively, the electronic device can input the original content samples directly into the CTR prediction model, featurize them within the model, and then train the model. The approach is not limited here.
[0122] In this embodiment of the application, the training sample set may also include other content samples generated from other data (i.e., non-target data), and the content samples included in the training sample set are not limited here.
[0123] The electronic device can attach target identifiers to the original content samples generated from target data and use these identifiers to distinguish content samples in the training sample set. For original content samples carrying target identifiers, the electronic device can execute steps S22-S27 to obtain the corresponding adversarial content samples and use them to adversarially train the CTR prediction model. For other content samples without target identifiers, the electronic device can input these samples directly into the CTR prediction model, obtain the corresponding gradients using a method similar to steps S23-S27, and adjust the parameters of the CTR prediction model to train the model. By applying adversarial training only to the original content samples generated from target data, and not to other content samples, the electronic device improves both the learning effect of the CTR prediction model on the target data and the efficiency of model training.
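A minimal sketch of the routing in [0123], assuming the target identifier is represented as a boolean field on each sample. The `Sample` class and the `is_target` flag are hypothetical and not part of the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    features: List[float]
    label: float
    is_target: bool = False  # hypothetical stand-in for the "target identifier"


def split_by_target(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Split the training set into samples that receive adversarial training
    (those carrying the target identifier) and samples trained normally."""
    adversarial_pool = [s for s in samples if s.is_target]
    plain_pool = [s for s in samples if not s.is_target]
    return adversarial_pool, plain_pool
```

Samples in the first pool would go through steps S22-S27; samples in the second pool would be trained without perturbation.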
[0124] Alternatively, the electronic device can perform adversarial training on the CTR prediction model using both the original content samples and the other content samples. In this case, the electronic device does not need to identify the original content samples; after obtaining the training sample set, it can execute steps S22-S27 on the original content samples and the other content samples alike. This improves the learning effect of the CTR prediction model on all data and further enhances its robustness and generalization ability.
[0125] In some embodiments, referring to Figure 3, which is a first detailed schematic diagram of step S22 provided in the embodiments of this application, step S22 may include the following steps.
[0126] Step S31: Obtain a first preset number of target features from multiple original content samples.
[0127] Step S32: Add perturbation information to the target features in multiple original content samples to obtain multiple adversarial content samples.
[0128] In the technical solution provided in this embodiment of the application, the electronic device adds perturbations only to some of the features in the original content sample. This narrows the scope of the perturbation and reduces its impact on the CTR prediction model as a whole, making the interference more controllable, improving the accuracy of adversarial training, and enhancing its potential and effectiveness.
[0129] In step S31 above, a target feature can be any feature in the original content sample. The first preset number can be a fixed value, such as 100 or 200, or it can be determined from a preset percentage, such as 80% or 90%. Taking a preset percentage of 80% as an example, if the total number of features is 100, the first preset number is 100 * 0.8 = 80. The first preset number is not limited here.
[0130] For each original content sample, the electronic device can randomly select a first preset number of features from the original content sample as target features.
[0131] In this embodiment, the target feature can be an important feature in the original content sample. The electronic device can perform feature analysis and feature extraction on the original content sample to determine the importance of each feature in the original content sample, and obtain a first preset number of highly important features from the original content sample as target features. The method by which the electronic device obtains the target features is not limited here.
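The selection of target features in [0129] to [0131] can be sketched as follows. This assumes features are addressed by index and that a per-feature importance score is supplied by the caller; the text does not specify how importance is computed, so `importance` is a hypothetical input, as are the function names.

```python
import random
from typing import List, Optional, Sequence


def first_preset_number(total_features: int, percentage: float) -> int:
    """Derive the first preset number from a preset percentage, e.g. 100 * 0.8 -> 80."""
    return round(total_features * percentage)


def select_target_features(
    num_features: int,
    k: int,
    importance: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the sorted indices of k target features.

    Without importance scores, k features are drawn at random ([0130]);
    with them, the k highest-scoring features are kept ([0131]).
    """
    if not 0 < k <= num_features:
        raise ValueError("k must be between 1 and num_features")
    if importance is None:
        rng = rng or random.Random()
        return sorted(rng.sample(range(num_features), k))
    ranked = sorted(range(num_features), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])
```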
[0132] In step S32 above, the electronic device adds perturbation information only to the target features. That is, it adds to each target feature in the original content sample the corresponding perturbation value in the perturbation information, and keeps all other features unchanged, to obtain the adversarial content sample.
[0133] In some embodiments, referring to Figure 4, which is a detailed schematic diagram of step S31 provided in the embodiment of this application, step S31 may include the following steps.
[0134] Step S41: Perform singular value decomposition on the original feature matrices corresponding to multiple original content samples to obtain the initial singular value matrices corresponding to multiple original feature matrices.
[0135] Step S42: Determine a first preset number of target singular values from the initial singular value matrices corresponding to the multiple original feature matrices to obtain the target singular value matrices corresponding to the multiple original feature matrices.
[0136] Step S43: Obtain the target features indicated by the target singular value matrix corresponding to multiple original feature matrices.
[0137] In the technical solution provided in this embodiment of the application, the electronic device decomposes and reconstructs the original feature matrix corresponding to the original content sample through singular value decomposition. By filtering on the singular values and ignoring the smaller ones, it determines the important features of the original feature matrix as target features. This achieves feature filtering and dimensionality reduction of the original feature matrix and improves the accuracy with which target features are obtained.
[0138] In step S41 above, after the original content sample is characterized, the electronic device can obtain the original feature matrix corresponding to the original content sample. For each original feature matrix, the electronic device can use Singular Value Decomposition (SVD) to decompose the original feature matrix into the product of three matrices, obtaining the left singular matrix, the diagonal singular value matrix, and the right singular matrix. The diagonal singular value matrix is the initial singular value matrix.
[0139] In step S42 above, the electronic device selects the first preset number of singular values from the initial singular value matrix as target singular values to obtain the target singular value matrix; that is, it reduces the dimension of the initial singular value matrix. For example, if the first preset number is k and the initial singular value matrix has dimension m*m, the first k (k≤m) singular values are retained (these are the k largest, because SVD arranges singular values in descending order), giving a target singular value matrix of dimension k*k. The value of the first preset number and the dimension of the singular value matrix are not limited here.
[0140] In this embodiment of the application, the electronic device can also regularize the initial singular value matrix obtained by SVD before reconstructing the original feature matrix. This can further reduce the overfitting of the CTR prediction model to head data and improve its generalization ability.
[0141] In step S43 above, after determining the target singular value matrix, the electronic device can obtain the features indicated by the target singular value matrix, which are the important features in the original feature matrix. The electronic device can use the features indicated by the target singular value matrix as target features, and then obtain the target feature matrix, thereby reconstructing the original feature matrix.
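One plausible reading of steps S41 to S43 is a truncated SVD that keeps the k largest singular values and rebuilds a low-rank version of the feature matrix, which is consistent with the reconstruction of r_matrix described in step A3. The sketch below assumes the original feature matrix is a dense 2-D NumPy array; the function name is hypothetical, and `numpy.linalg.svd` returns singular values in descending order.

```python
import numpy as np


def truncated_svd_reconstruct(feature_matrix: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest singular values and rebuild the matrix (steps S41 to S43)."""
    # S41: decompose X = U @ diag(s) @ Vt; diag(s) is the initial singular value matrix.
    u, s, vt = np.linalg.svd(feature_matrix, full_matrices=False)
    k = min(k, s.shape[0])
    # S42: target singular value matrix = the k largest singular values (k x k).
    s_k = np.diag(s[:k])
    # S43: reconstruct from the leading k components; stands in for r_matrix.
    return u[:, :k] @ s_k @ vt[:k, :]
```

Keeping every singular value returns the original matrix, while a smaller k discards the directions with the smallest singular values.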
[0142] In this embodiment of the application, the electronic device may also use other feature dimensionality reduction methods such as principal component analysis (PCA) to determine the target features. The method of determining the target features is not limited here.
[0143] In some embodiments, referring to Figure 5, which is a second detailed schematic diagram of step S22 provided in the embodiments of this application, step S22 may include the following steps.
[0144] Step S51: Based on the first gradient corresponding to multiple original content samples, determine the gradient sign corresponding to multiple original content samples.
[0145] Step S52: Calculate the perturbation information corresponding to multiple original content samples based on the gradient sign and the preset perturbation factor.
[0146] Step S53: Add corresponding perturbation information to multiple original content samples to obtain multiple adversarial content samples.
[0147] In the technical solution provided in this embodiment of the application, the electronic device calculates the perturbation information from a preset perturbation factor and the gradient sign corresponding to the original content sample. This guides the adversarial training process more accurately and thereby enhances the robustness of the CTR prediction model to adversarial perturbations.
[0148] In step S51 above, the gradient sign indicates the gradient direction. For each original content sample, after inputting the sample into the CTR prediction model and calculating its first gradient, the electronic device determines the sign of the first gradient and uses it as the gradient sign corresponding to the original content sample.
[0149] In step S52 above, the preset perturbation factor can be a preset value between 0 and 1, such as 0.1, 0.01, etc., which can be set according to the actual situation. Here, the value of the preset perturbation factor is not limited. For each original content sample, the electronic device can calculate the product of the preset perturbation factor and the gradient sign of the corresponding first gradient to obtain the corresponding perturbation matrix, which serves as the perturbation information.
[0150] In this embodiment, the electronic device may instead take the sign of the feature value of each feature in the original content sample and calculate the product of the preset perturbation factor and that sign to obtain a perturbation matrix as the perturbation information. The method by which the electronic device obtains the perturbation information is not limited here.
[0151] In step S53 above, for each original content sample, the electronic device adds the perturbation value corresponding to the feature in the corresponding perturbation information to each feature in the original content sample to obtain the corresponding adversarial content sample.
[0152] In this embodiment, the electronic device can calculate perturbation information based on the acquired target features. After acquiring the target features and obtaining the first gradient corresponding to the original content sample, the electronic device can determine the gradient sign of the gradient corresponding to the target features in the first gradient, and use it as the gradient sign corresponding to the original content sample. Then, the electronic device can calculate the product of a preset perturbation factor and the gradient sign corresponding to the target features to obtain the perturbation information corresponding to the target features, and add the perturbation information corresponding to the target features to the target features in the original content sample to obtain the adversarial content sample. The method by which the electronic device acquires the target features can be found in the above description of step S31.
[0153] In this embodiment, the electronic device can determine the gradient sign in the following two ways.
[0154] Method 1: After the electronic device calculates the first gradient grad and obtains the target feature (also known as the target feature matrix) r_matrix, it can first determine the gradient r_matrix.grad corresponding to the target feature from the first gradient, and then take the sign of the gradient r_matrix.grad corresponding to the target feature to obtain the gradient sign (also known as the gradient sign matrix) sign(r_matrix.grad) corresponding to the target feature.
[0155] Method 2: The electronic device first takes the sign of the first gradient grad to obtain the gradient sign sign(grad), and then selects from sign(grad) the entries corresponding to the target feature, giving the gradient sign r_matrix.sign(grad) corresponding to the target feature. The method for determining the gradient sign corresponding to the target feature is not limited here.
[0156] The following explanation uses the example in which the electronic device first determines the gradient corresponding to the target feature and then takes its sign; this is not intended to be limiting. After determining the gradient sign sign(r_matrix.grad) corresponding to the target feature, the electronic device multiplies each value in the gradient sign by the preset perturbation factor epsilon to obtain the perturbation matrix, as shown in the following formula.
[0157] r_at=epsilon*sign(r_matrix.grad)
[0158] Here, r_at is the perturbation matrix, i.e., the perturbation information; epsilon is the preset perturbation factor; sign(·) is the sign function; r_matrix is the target feature; and r_matrix.grad is the part of the first gradient grad that corresponds to the target feature.
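The formula r_at=epsilon*sign(r_matrix.grad) and the two ways of obtaining the gradient sign in [0154] and [0155] can be sketched on plain Python lists. Representing the target feature as a list of feature indices is an assumption made for illustration, and all function names are hypothetical.

```python
from typing import List, Sequence


def sign(v: float) -> float:
    """Sign function: -1.0, 0.0 or 1.0."""
    return float((v > 0) - (v < 0))


def perturbation_method1(grad: Sequence[float], target_idx: Sequence[int], epsilon: float) -> List[float]:
    # Method 1: extract r_matrix.grad from grad first, then take its sign.
    r_matrix_grad = [grad[i] for i in target_idx]
    return [epsilon * sign(g) for g in r_matrix_grad]


def perturbation_method2(grad: Sequence[float], target_idx: Sequence[int], epsilon: float) -> List[float]:
    # Method 2: take sign(grad) over all features first, then pick the target entries.
    sign_grad = [sign(g) for g in grad]
    return [epsilon * sign_grad[i] for i in target_idx]


def add_perturbation(features: Sequence[float], target_idx: Sequence[int], r_at: Sequence[float]) -> List[float]:
    """Add r_at to the target features only; all other features stay unchanged ([0132])."""
    adversarial = list(features)
    for i, delta in zip(target_idx, r_at):
        adversarial[i] += delta
    return adversarial
```

Because the sign function acts element by element, selecting the target entries before or after taking the sign gives the same perturbation, which is why the text treats Methods 1 and 2 as interchangeable.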
[0159] In some embodiments, the electronic device may implement step S27 above as follows: sum the first gradient and the second gradient corresponding to each of the multiple original content samples to obtain the corresponding third gradients; and adjust the parameters of the click-through rate prediction model using a gradient descent algorithm based on the third gradients.
[0160] For each original content sample, the electronic device obtains the first gradient corresponding to the original content sample and the second gradient corresponding to the adversarial content sample. Then, it accumulates the first and second gradients to obtain the third gradient corresponding to the original content sample. This third gradient is used as the gradient for gradient descent in this training iteration to adjust the parameters of the CTR prediction model. By accumulating the first and second gradients and adjusting the CTR prediction model parameters based on the accumulated gradient (i.e., the third gradient), the model parameters only need to be adjusted once when using original content samples and corresponding adversarial content samples for adversarial training of the CTR prediction model. This improves both the robustness and generalization ability of the CTR prediction model and training efficiency.
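A minimal sketch of [0159] and [0160], assuming parameters and gradients are flat lists of floats and using plain gradient descent with a hypothetical learning rate. The function name is illustrative only.

```python
from typing import List, Sequence, Tuple


def accumulate_and_step(
    params: Sequence[float],
    first_grad: Sequence[float],
    second_grad: Sequence[float],
    learning_rate: float,
) -> Tuple[List[float], List[float]]:
    """Sum the first and second gradients into the third gradient,
    then apply a single gradient-descent update."""
    third_grad = [g1 + g2 for g1, g2 in zip(first_grad, second_grad)]
    new_params = [p - learning_rate * g for p, g in zip(params, third_grad)]
    return new_params, third_grad
```

Summing before the update means the parameters change once per pair of original and adversarial samples rather than twice.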
[0161] Corresponding to the model training method described above, this application also provides a content display method. Figure 6 is a flowchart of a content display method provided in an embodiment of this application. The content display method can be applied to electronic devices such as computers and servers. In this embodiment, the electronic device used for model training and the electronic device used for content display can be the same device or different devices. The following description uses a server as the execution entity, which is not intended to be limiting. The content display method includes the following steps.
[0162] Step S61: Receive search keywords sent by the client.
[0163] Step S62: Obtain multiple search results corresponding to the search keywords.
[0164] Step S63: Input the search keywords and the multiple search results into the click-through rate prediction model to obtain a score for each search result. The click-through rate prediction model is trained according to any of the above model training methods.
[0165] Step S64: Sort the multiple search results according to their scores and determine a second preset number of target search results.
[0166] Step S65: Return the target search results to the client, so that the client displays them and, in response to an interaction instruction on a target search result, sends the search instruction corresponding to that interaction instruction.
[0167] In the technical solution provided in this application, for original content samples generated from target data with a low number of clicks, the electronic device adds perturbation information to the original content samples to obtain adversarial content samples. These adversarial content samples and the original content samples are then used to train the CTR prediction model. In other words, adversarial training is added to optimize the gradient update direction, which can significantly improve the learning effect of the CTR prediction model on the target data. During content recommendation and display based on the trained CTR prediction model, the CTR prediction model can accurately score the target data, improving the accuracy and effectiveness of content recommendation, and thus improving the user's search efficiency.
[0168] In step S61 above, the client obtains the search keywords entered by the user and sends them to the server. The server receives the search keywords sent by the client, such as the word or phrase to be queried.
[0169] In step S62 above, the server recalls multiple pieces of content to be recommended based on the search keywords and on data related to the search keywords or historical data, and uses them as the multiple search results corresponding to the search keywords.
[0170] In step S63 above, the CTR prediction model is a model trained using any of the above model training methods. The server inputs the search keywords and multiple search results into the trained CTR prediction model, and uses the CTR prediction model to score each search result, obtaining a score corresponding to each search result.
[0171] The server can also obtain user behavior features of the user who entered the search keywords and content-related features of each search result, and input the search keywords, the multiple search results, the content-related features of each search result, and the user behavior features into the CTR prediction model to obtain a score for each search result. For details of the content-related features and user behavior features, refer to the description of step S21 above.
[0172] In step S64 above, the second preset number represents the number of exposure positions and is a positive integer; its value is not limited here. The server ranks the multiple search results by the scores given by the CTR prediction model and determines the second preset number of highest-scoring search results as the target search results.
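Step S64 reduces to sorting by score and keeping the first N results. A minimal sketch, assuming the scores have already been produced by the CTR prediction model and are keyed by a hypothetical result identifier:

```python
from typing import Dict, List


def top_n_results(scores: Dict[str, float], n: int) -> List[str]:
    """Sort search results by CTR score (descending) and keep the top n,
    where n is the second preset number of exposure positions."""
    if n <= 0:
        raise ValueError("the second preset number must be a positive integer")
    ranked = sorted(scores, key=lambda result: scores[result], reverse=True)
    return ranked[:n]
```

Python's sort is stable, so results with equal scores keep their original recall order; the text does not specify a tie-breaking rule.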
[0173] In step S65 above, the server returns the second preset number of target search results to the client, and the client displays them to the user in ranked order, highest score first. When the client receives an interaction instruction from the user, such as a click on a target search result, it responds by generating a search instruction and sending it to the server. The server executes the search instruction and returns the search results to the client.
[0174] The model training and content display methods provided in the embodiments of this application are described in detail below with reference to Figure 7.
[0175] Figure 7 is a flowchart of a model training and content display method provided in an embodiment of this application. The description uses an electronic device as the execution subject, which is not intended to be limiting.
[0176] Step S71: Construct training samples and label and identify the target data samples.
[0177] In this embodiment, the electronic device constructs training samples (i.e., the training sample set), identifies from them the target data samples (i.e., original content samples) generated from target data, and labels the target data samples (i.e., attaches target identifiers to them). The target data here can be data with insufficient exposure (i.e., exposure less than or equal to a first threshold), data with high exposure but a low click-through rate (i.e., exposure greater than a second threshold and click-through rate less than or equal to a third threshold), newly uploaded data (i.e., publication duration less than a duration threshold), and so on, and can be set according to the actual scenario.
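The target-data criteria in [0177] can be expressed as a single predicate. The `ContentStats` record, the threshold values, and the use of hours for publication duration are hypothetical choices for illustration; the text leaves all of them to the actual scenario.

```python
from dataclasses import dataclass


@dataclass
class ContentStats:
    exposures: int
    clicks: int
    hours_since_publish: float  # hypothetical unit for "publication duration"


def is_target_data(
    stats: ContentStats,
    first_threshold: int,
    second_threshold: int,
    third_threshold: float,
    duration_threshold: float,
) -> bool:
    """True if the content matches any of the target-data criteria in [0177]."""
    ctr = stats.clicks / stats.exposures if stats.exposures else 0.0
    insufficient_exposure = stats.exposures <= first_threshold
    high_exposure_low_ctr = stats.exposures > second_threshold and ctr <= third_threshold
    newly_uploaded = stats.hours_since_publish < duration_threshold
    return insufficient_exposure or high_exposure_low_ctr or newly_uploaded
```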
[0178] Step S72: During the training of the CTR prediction model, perturbation is added to the feature matrix of the target data sample to perform adversarial training.
[0179] In this embodiment, the electronic device selects a suitable CTR prediction model for training, such as a DNN model. During model training, the electronic device identifies target data samples by whether they carry a target identifier, adds a perturbation to the original feature matrix of each target data sample to obtain the perturbed feature matrix (i.e., the adversarial feature matrix) and the corresponding adversarial content sample, and updates the gradients based on both the original and the perturbed feature matrices. The specific steps are as follows:
[0180] Step A1: The electronic device acquires an input sample x (i.e., the target data sample) and its corresponding label y.
[0181] Step A2: The electronic device uses the CTR prediction model to perform forward propagation on the original feature matrix of the input sample x to obtain the output class (i.e., the first prediction value) of the input sample x.
[0182] Step A3: The electronic device performs SVD on the original feature matrix of input sample x, retains the largest 80% to 90% of the singular values (i.e., the preset percentage of the total number of features), and reconstructs the original feature matrix of input sample x to obtain the reconstructed matrix r_matrix (i.e., the target feature matrix).
[0183] Step A4: The electronic device calculates the gradient (i.e., the first gradient) of the loss function of the CTR prediction model with respect to the input sample x.
[0184] The execution order of steps A3 and A4 is not specified here.
[0185] Step A5: Based on the reconstructed matrix r_matrix and the gradient grad corresponding to input sample x, the electronic device determines the gradient r_matrix.grad corresponding to r_matrix, and multiplies the sign of r_matrix.grad by a small perturbation factor epsilon to obtain the perturbation matrix r_at, computed as r_at=epsilon*sign(r_matrix.grad).
[0186] Step A6: The electronic device adds the perturbation matrix r_at to the original feature matrix of the input sample x to obtain the adversarial feature matrix and the adversarial content sample x'.
[0187] Step A7: The electronic device uses the CTR prediction model to perform forward propagation on the adversarial feature matrix of the adversarial sample x' to obtain the output class of the adversarial sample x' (i.e., the second prediction value). The loss function is calculated again using the adversarial feature matrix of the adversarial sample x', and backpropagation is performed to calculate the gradient of the loss function of the CTR prediction model with respect to the adversarial sample x' (i.e., the second gradient). The gradient of the adversarial training is accumulated on the gradient of the original training (i.e., the gradient corresponding to the input sample x and the gradient corresponding to the adversarial sample x' are accumulated), and the gradient information is updated.
[0188] Step A8: The electronic device performs gradient descent based on the updated gradient information to update the parameters of the CTR prediction model.
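Steps A1 to A8 can be traced end to end on a toy model. The sketch below is not the application's DNN: it substitutes a logistic-regression CTR model with binary cross-entropy loss, whose gradient with respect to the input has the closed form (p - y) * w, and it takes the target features as a given index list instead of deriving them by SVD in step A3. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy loss, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def adversarial_step(
    w: Sequence[float], b: float, x: Sequence[float], y: float,
    target_idx: Sequence[int], epsilon: float, lr: float,
) -> Tuple[List[float], float, float, float]:
    """One pass of steps A1 to A8; returns (new_w, new_b, first_loss, second_loss)."""
    # A2: forward pass on the original features -> first prediction value.
    p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # A4: gradient of the loss w.r.t. the input features, (p - y) * w.
    grad_x = [(p1 - y) * wi for wi in w]
    # A5 and A6: add r_at = epsilon * sign(gradient) to the target features only.
    x_adv = list(x)
    for i in target_idx:
        g = grad_x[i]
        x_adv[i] += epsilon * ((g > 0) - (g < 0))
    # A7: forward pass on the adversarial sample -> second prediction value.
    p2 = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
    # A7: parameter gradients of both samples, accumulated into one gradient.
    grad_w = [(p1 - y) * xi + (p2 - y) * xai for xi, xai in zip(x, x_adv)]
    grad_b = (p1 - y) + (p2 - y)
    # A8: a single gradient-descent update with the accumulated gradient.
    new_w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b, bce(p1, y), bce(p2, y)
```

For this linear stand-in, moving each target feature by epsilon in the sign of its input gradient never decreases the loss, so the second loss value is never smaller than the first; the accumulated update then trains the model on both samples.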
[0189] Step S73: Deploy the CTR prediction model online for content recommendation.
[0190] In this embodiment of the application, the electronic device deploys the CTR prediction model online. When a user enters a keyword (i.e., a query), the CTR prediction model scores the recalled content, the recalled content is sorted by score from high to low, and the top N items of the sorted content (top-N, where N is the second preset number) are displayed to the user.
[0191] In the technical solution provided in this application, an adversarial training method is introduced when the CTR prediction model is trained using target data samples. The CTR prediction model not only learns to accurately classify the original input samples but also to accurately classify adversarial samples. Adversarial training can improve the robustness of text classification and language models, enabling the CTR prediction model to better learn from target data such as long-tailed and difficult data, and thus better handle target data.
[0192] The technical solution provided in this application introduces a singular value decomposition method to decompose the feature matrix, obtain important target features, and then calculate a more reasonable perturbation value based on the target features. This can reduce the impact of perturbation on the entire CTR prediction model, make the perturbation more controllable, reduce the degree of damage of perturbation to the entire model, and improve the potential and effect of adversarial training.
[0193] By applying the technical solutions provided in the embodiments of this application, the model's learning of long-tail or difficult data can be improved; overfitting to head data (i.e., content data with high exposure and a high click-through rate) can be reduced; the model's generalization ability and robustness can be improved; the recommendation of long-tail resources and the distribution and traffic-driving efficiency of high-quality resources can be effectively improved; and the user's search experience can be improved. This also addresses the problem that some long-tail data that can actually meet users' real needs is ranked low because of a low CTR prediction score and therefore fails to meet users' actual consumption intentions.
[0194] Corresponding to the above model training method embodiments, this application also provides a model training apparatus. Figure 8 is a schematic diagram of a model training apparatus provided in an embodiment of this application. The apparatus includes:
[0195] The first acquisition module 81 is used to acquire a training sample set, which includes multiple original content samples carrying labels. The multiple original content samples are generated from target data. The number of clicks on the target data is less than a preset threshold. The label represents the click probability of the target data.
[0196] The adding module 82 is used to add perturbation information to the multiple original content samples to obtain multiple adversarial content samples.
[0197] The first input module 83 is used to input the plurality of original content samples and the plurality of adversarial content samples into the click-through rate prediction model respectively to obtain a plurality of first prediction values corresponding to the plurality of original content samples and a plurality of second prediction values corresponding to the plurality of adversarial content samples.
[0198] The first calculation module 84 is used to calculate, using the plurality of first prediction values, the plurality of second prediction values and the labels carried by the plurality of original content samples, a plurality of first loss values corresponding to the plurality of original content samples and a plurality of second loss values corresponding to the plurality of adversarial content samples, respectively.
[0199] The verification module 85 is used to verify whether the click-through rate prediction model has converged based on the plurality of first loss values and the plurality of second loss values.
[0200] The second calculation module 86 is used to calculate, respectively, multiple first gradients corresponding to the multiple original content samples and multiple second gradients corresponding to the multiple adversarial content samples when the click-through rate prediction model has not converged.
[0201] The adjustment module 87 is used to adjust the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients.
[0202] In the technical solution provided in this application, for original content samples generated from target data with a low number of clicks, the electronic device adds perturbation information to the original content samples to obtain adversarial content samples. These adversarial content samples and the original content samples are then used to train the CTR prediction model. In other words, adversarial training is added to optimize the gradient update direction, which can significantly improve the learning effect of the CTR prediction model on the target data. During content recommendation and display based on the trained CTR prediction model, the CTR prediction model can accurately score the target data, improving the accuracy and effectiveness of content recommendation, and thus improving the user's search efficiency.
[0203] In some embodiments, the adding module 82 is specifically used for:
[0204] Based on the first gradient corresponding to the plurality of original content samples, determine the gradient sign corresponding to the plurality of original content samples;
[0205] Based on the gradient sign and the preset perturbation factor, calculate the perturbation information corresponding to the multiple original content samples;
[0206] By adding corresponding perturbation information to the multiple original content samples, multiple adversarial content samples are obtained.
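The sign-based perturbation in [0204]-[0206] has the same shape as the fast gradient sign method (FGSM): perturbation information = perturbation factor × sign(gradient). The sketch below assumes, purely for illustration, a logistic-regression CTR model p = sigmoid(w·x + b) trained with binary cross-entropy, and takes the gradient with respect to the input features; the application does not specify the model or which representation is perturbed. The names sigmoid, sign, input_gradient, make_adversarial and epsilon (standing in for the preset perturbation factor) are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v: float) -> int:
    # +1, -1 or 0; features whose gradient is exactly zero stay unperturbed
    return (v > 0) - (v < 0)


def input_gradient(x: List[float], w: List[float], b: float, y: float) -> List[float]:
    # Hypothetical CTR model: p = sigmoid(w.x + b) with binary cross-entropy.
    # For this model, dLoss/dx_j = (p - y) * w_j.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def make_adversarial(
    x: List[float], w: List[float], b: float, y: float, epsilon: float
) -> List[float]:
    grad = input_gradient(x, w, b, y)          # gradient of the original sample
    delta = [epsilon * sign(g) for g in grad]  # perturbation information
    return [xi + di for xi, di in zip(x, delta)]  # adversarial content sample
```

For a batch, apply make_adversarial to each (sample, label) pair. With x = [1.0, 0.5], w = [0.8, -0.4], b = 0, y = 1 and epsilon = 0.1, the gradient signs are (-1, +1), so the adversarial sample is approximately [0.9, 0.6]; its logit drops from 0.6 to 0.48, which raises the loss for this clicked sample.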
[0207] In some embodiments, the adding module 82 is specifically used for:
[0208] Obtain a first preset number of target features from the plurality of original content samples;
[0209] Perturbation information is added to the target features in the multiple original content samples to obtain multiple adversarial content samples.
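When only a first preset number of target features is perturbed ([0208]-[0209]), the perturbation can be masked so that every other feature is copied unchanged. In this hypothetical sketch, samples and deltas are plain lists of floats (the deltas could come from a sign-of-gradient step), and target_idx holds the k selected feature indices; how those indices are chosen is left to a separate selection step, for example the singular value decomposition described in [0211]-[0213].

```python
from typing import List, Sequence


def perturb_target_features(
    samples: List[List[float]],
    deltas: List[List[float]],
    target_idx: Sequence[int],
    k: int,
) -> List[List[float]]:
    # target_idx: indices of the first preset number (k) of target features
    targets = set(target_idx)
    if len(targets) != k:
        raise ValueError("expected exactly k distinct target feature indices")
    adversarial = []
    for x, delta in zip(samples, deltas):
        # Add perturbation only to target features; keep the rest as-is
        adversarial.append(
            [xi + di if j in targets else xi for j, (xi, di) in enumerate(zip(x, delta))]
        )
    return adversarial
```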
[0210] In some embodiments, the adding module 82 is specifically used for:
[0211] Singular value decomposition is performed on the original feature matrices corresponding to the multiple original content samples to obtain the initial singular value matrices corresponding to the multiple original feature matrices.
[0212] From the initial singular value matrices corresponding to the plurality of original feature matrices, a first preset number of target singular values are determined to obtain the target singular value matrices corresponding to the plurality of original feature matrices;
[0213] Obtain the target features indicated by the target singular value matrix corresponding to the multiple original feature matrices.
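The selection in [0211]-[0213] admits more than one reading. One plausible reading, assumed here, is a truncated SVD: keep the first preset number k of largest singular values (the target singular value matrix) and treat the rank-k reconstruction U @ diag(S_k) @ Vt as the target features. The sketch relies on numpy.linalg.svd, which returns singular values in descending order; the function names are hypothetical.

```python
from typing import List

import numpy as np


def target_features_via_svd(feature_matrix: np.ndarray, k: int) -> np.ndarray:
    # Reduced SVD of one original feature matrix; S is sorted in descending order
    U, S, Vt = np.linalg.svd(feature_matrix, full_matrices=False)
    if not 0 < k <= S.size:
        raise ValueError("k must be between 1 and the number of singular values")
    # Target singular value matrix: keep the k largest singular values, zero the rest
    S_target = np.zeros_like(S)
    S_target[:k] = S[:k]
    # Assumed reading of "target features": the rank-k reconstruction
    return U @ np.diag(S_target) @ Vt


def target_features_for_batch(matrices: List[np.ndarray], k: int) -> List[np.ndarray]:
    # One original feature matrix per original content sample
    return [target_features_via_svd(m, k) for m in matrices]
```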
[0214] In some embodiments, the adjustment module 87 is specifically used for:
[0215] By summing, for each original content sample, its first gradient and the second gradient of the adversarial content sample derived from it, a third gradient corresponding to each of the multiple original content samples is obtained;
[0216] Based on the third gradient corresponding to the multiple original content samples, the parameters of the click-through rate prediction model are adjusted using the gradient descent algorithm.
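The update in [0215]-[0216] pairs each original sample's first gradient with the second gradient of the adversarial sample derived from it. A minimal sketch, assuming parameter gradients are flat lists of floats, that the per-sample third gradients are averaged over the batch (the application does not say how they are aggregated), and plain gradient descent with learning rate lr; adjust_parameters is a hypothetical name.

```python
from typing import List


def adjust_parameters(
    params: List[float],
    first_grads: List[List[float]],
    second_grads: List[List[float]],
    lr: float,
) -> List[float]:
    # first_grads[i]: parameter gradient from original content sample i
    # second_grads[i]: parameter gradient from the adversarial sample derived from i
    third_grads = [
        [g1 + g2 for g1, g2 in zip(a, b)] for a, b in zip(first_grads, second_grads)
    ]
    n = len(third_grads)
    # Average the third gradients over the batch (assumption)
    mean_grad = [sum(g[j] for g in third_grads) / n for j in range(len(params))]
    # One gradient-descent step
    return [p - lr * g for p, g in zip(params, mean_grad)]
```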
[0217] Corresponding to the above-described content display method embodiments, this application also provides a content display device. Referring to Figure 9, which is a schematic diagram of a content display device provided in an embodiment of this application, the device includes:
[0218] The receiving module 91 is used to receive search keywords sent by the client;
[0219] The second acquisition module 92 is used to acquire multiple search contents corresponding to the search keywords;
[0220] The second input module 93 is used to input the search keywords and the multiple search contents into the click-through rate prediction model to obtain the score corresponding to each search content. The click-through rate prediction model is a model trained according to any of the above-described model training devices.
[0221] The sorting module 94 is used to sort the multiple search contents according to the score corresponding to each search content and determine a second preset number of target search contents;
[0222] The feedback module 95 is used to feed back the target search content to the client, so that the client displays the target search content and, in response to an interaction command on the target search content, sends the search command corresponding to that interaction command.
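The ranking performed by modules 93 to 95 amounts to scoring each search content, sorting in descending order of score, and truncating to the second preset number. In this sketch score_fn is a hypothetical stand-in for the trained CTR model, and n stands for the second preset number.

```python
from typing import Callable, List, Tuple


def top_search_contents(
    keyword: str,
    contents: List[str],
    score_fn: Callable[[str, str], float],
    n: int,
) -> List[str]:
    # Score each (keyword, content) pair with the CTR model stand-in
    scored: List[Tuple[str, float]] = [(c, score_fn(keyword, c)) for c in contents]
    # Highest score first; Python's sort is stable, so ties keep retrieval order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the second preset number (n) of target search contents
    return [c for c, _ in scored[:n]]
```

Usage: with toy scores {"episode 1": 0.12, "trailer": 0.47, "behind the scenes": 0.31} and n = 2, the function returns ["trailer", "behind the scenes"].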
[0223] In the technical solution provided in this application, for original content samples generated from target data with a low number of clicks, the electronic device adds perturbation information to the original content samples to obtain adversarial content samples. These adversarial content samples and the original content samples are then used to train the CTR prediction model. In other words, adversarial training is added to optimize the gradient update direction, which can significantly improve the learning effect of the CTR prediction model on the target data. During content recommendation and display based on the trained CTR prediction model, the CTR prediction model can accurately score the target data, improving the accuracy and effectiveness of content recommendation, and thus improving the user's search efficiency.
[0224] This application also provides an electronic device. As shown in Figure 10, it includes a processor 101, a communication interface 102, a memory 103, and a communication bus 104, wherein the processor 101, the communication interface 102, and the memory 103 communicate with each other through the communication bus 104.
[0225] Memory 103 is used to store computer programs;
[0226] When processor 101 executes a program stored in memory 103, it performs the following steps:
[0227] Obtain a training sample set, which includes multiple original content samples with labels. These original content samples are generated from the target data. The number of clicks on the target data is less than a preset threshold. The labels represent the click probability of the target data.
[0228] By adding perturbation information to multiple original content samples, multiple adversarial content samples are obtained.
[0229] Input the multiple original content samples and the multiple adversarial content samples into the click-through rate prediction model, respectively, to obtain multiple first estimated values corresponding to the original content samples and multiple second estimated values corresponding to the adversarial content samples;
[0230] Using the multiple first estimated values, the multiple second estimated values, and the labels carried by the multiple original content samples, calculate multiple first loss values corresponding to the original content samples and multiple second loss values corresponding to the adversarial content samples, respectively;
[0231] Based on multiple first loss values and multiple second loss values, verify whether the click-through rate prediction model has converged;
[0232] In the case where the click-through rate prediction model has not converged, calculate multiple first gradients corresponding to multiple original content samples and multiple second gradients corresponding to multiple adversarial content samples respectively.
[0233] The parameters of the click-through rate prediction model are adjusted based on multiple first gradients and multiple second gradients.
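Putting steps [0227]-[0233] together, one training loop might look like the following. This is an illustrative sketch under several assumptions that the application does not state: a logistic-regression CTR model over dense features, binary cross-entropy for both loss values, sign-of-gradient perturbation with respect to the inputs, convergence declared when the mean combined loss changes by less than tol between iterations, and per-sample third gradients averaged over the batch. All function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def predict(w: List[float], b: float, x: List[float]) -> float:
    # Hypothetical CTR model: logistic regression over a dense feature vector
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    # Binary cross-entropy between an estimated value p and a label y
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def train_with_adversarial(
    samples: List[List[float]],
    labels: List[float],
    epsilon: float = 0.05,  # stand-in for the preset perturbation factor
    lr: float = 0.5,
    tol: float = 1e-6,      # assumed convergence tolerance on the mean loss
    max_iters: int = 1000,
) -> Tuple[List[float], float]:
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    prev_loss = float("inf")
    for _ in range(max_iters):
        # Adversarial samples: sign of dLoss/dx = (p - y) * w for the original sample
        adv = []
        for x, y in zip(samples, labels):
            r = predict(w, b, x) - y
            adv.append([xi + epsilon * sign(r * wi) for xi, wi in zip(x, w)])
        # First and second estimated values
        q1 = [predict(w, b, x) for x in samples]
        q2 = [predict(w, b, x) for x in adv]
        # First and second loss values; adversarial samples reuse the original labels
        loss1 = [bce(p, y) for p, y in zip(q1, labels)]
        loss2 = [bce(p, y) for p, y in zip(q2, labels)]
        # Convergence check on the mean combined loss
        cur_loss = (sum(loss1) + sum(loss2)) / n
        if abs(prev_loss - cur_loss) < tol:
            break
        prev_loss = cur_loss
        # First + second parameter gradients per original sample (the third gradient);
        # for sigmoid + BCE the gradient w.r.t. (w, b) is (p - y) * [x, 1]
        gw, gb = [0.0] * dim, 0.0
        for x, xa, p1, p2, y in zip(samples, adv, q1, q2, labels):
            for j in range(dim):
                gw[j] += (p1 - y) * x[j] + (p2 - y) * xa[j]
            gb += (p1 - y) + (p2 - y)
        # Gradient descent on the batch-averaged third gradient
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b
```

Because each adversarial sample keeps the label of the original sample it came from, the second loss values penalize the model whenever a small perturbation moves its score away from that label.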
[0234] This application also provides an electronic device. As shown in Figure 11, it includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114, wherein the processor 111, the communication interface 112, and the memory 113 communicate with each other through the communication bus 114.
[0235] Memory 113 is used to store computer programs;
[0236] When processor 111 executes a program stored in memory 113, it performs the following steps:
[0237] Receive search keywords sent by the client;
[0238] Obtain multiple search contents corresponding to the search keywords;
[0239] Input the search keywords and the multiple search contents into the click-through rate (CTR) prediction model to obtain a score for each search content, where the CTR prediction model is a model trained by any of the model training methods shown in Figures 2-5;
[0240] Based on the score corresponding to each search content, sort the multiple search contents to determine a second preset number of target search contents;
[0241] The target search content is fed back to the client, enabling the client to display the target search content and respond to the interaction command of the target search content by sending the corresponding search command.
[0242] The communication bus mentioned above can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This communication bus can be divided into address bus, data bus, control bus, etc. For ease of illustration, only one thick line is used to represent it in the diagram, but this does not mean that there is only one bus or one type of bus.
[0243] The communication interface is used for communication between the aforementioned terminal and other devices.
[0244] The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
[0245] The processor mentioned above can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
[0246] In another embodiment provided in this application, a computer-readable storage medium is also provided, wherein a computer program is stored therein, and when the computer program is executed by a processor, it implements any of the model training or content display methods described in the above embodiments.
[0247] In another embodiment provided in this application, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to perform any of the model training or content display methods described in the above embodiments.
[0248] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)).
[0249] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0250] The various embodiments in this specification are described in a related manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, the embodiments of apparatus, electronic devices, computer storage media, and computer program products are basically similar to the method embodiments, so the descriptions are relatively simple; relevant parts can be referred to the descriptions of the method embodiments.
[0251] The above description is merely a preferred embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application are included within the scope of protection of this application.
Claims
1. A model training method, characterized in that the method includes: obtaining a training sample set, which includes multiple original content samples carrying labels, wherein the multiple original content samples are generated from target data, the number of clicks on the target data is less than a preset threshold, and the labels represent the click probability of the target data; adding perturbation information to the multiple original content samples to obtain multiple adversarial content samples; inputting the multiple original content samples and the multiple adversarial content samples into the click-through rate prediction model, respectively, to obtain multiple first estimated values corresponding to the multiple original content samples and multiple second estimated values corresponding to the multiple adversarial content samples; using the multiple first estimated values, the multiple second estimated values, and the labels carried by the multiple original content samples, calculating multiple first loss values corresponding to the multiple original content samples and multiple second loss values corresponding to the multiple adversarial content samples, respectively; verifying, based on the plurality of first loss values and the plurality of second loss values, whether the click-through rate prediction model has converged; if the click-through rate prediction model has not converged, calculating multiple first gradients corresponding to the multiple original content samples and multiple second gradients corresponding to the multiple adversarial content samples, respectively; and adjusting the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients.
2. The method according to claim 1, characterized in that, The step of adding perturbation information to the multiple original content samples to obtain multiple adversarial content samples includes: Based on the first gradient corresponding to the plurality of original content samples, determine the gradient sign corresponding to the plurality of original content samples; Based on the gradient sign and the preset perturbation factor, calculate the perturbation information corresponding to the multiple original content samples; By adding corresponding perturbation information to the multiple original content samples, multiple adversarial content samples are obtained.
3. The method according to claim 1, characterized in that, The step of adding perturbation information to the multiple original content samples to obtain multiple adversarial content samples includes: Obtain a first preset number of target features from the plurality of original content samples; Perturbation information is added to the target features in the multiple original content samples to obtain multiple adversarial content samples.
4. The method according to claim 3, characterized in that, The step of obtaining a first preset number of target features from the plurality of original content samples includes: Singular value decomposition is performed on the original feature matrices corresponding to the multiple original content samples to obtain the initial singular value matrices corresponding to the multiple original feature matrices. From the initial singular value matrices corresponding to the plurality of original feature matrices, a first preset number of target singular values are determined to obtain the target singular value matrices corresponding to the plurality of original feature matrices; Obtain the target features indicated by the target singular value matrix corresponding to the multiple original feature matrices.
5. The method according to claim 1, characterized in that the step of adjusting the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients includes: summing, for each of the multiple original content samples, its first gradient and the second gradient of the corresponding adversarial content sample to obtain a third gradient corresponding to that original content sample; and adjusting, based on the third gradients corresponding to the multiple original content samples, the parameters of the click-through rate prediction model using a gradient descent algorithm.
6. A content display method, characterized in that the method includes: receiving search keywords sent by a client; obtaining multiple search contents corresponding to the search keywords; inputting the search keywords and the multiple search contents into the click-through rate prediction model, respectively, to obtain a score corresponding to each search content, wherein the click-through rate prediction model is a model trained according to the method described in any one of claims 1-5; sorting the multiple search contents based on the score corresponding to each search content to determine a second preset number of target search contents; and feeding back the target search contents to the client, so that the client displays the target search contents and, in response to an interaction command on the target search contents, sends the search command corresponding to the interaction command.
7. A model training device, characterized in that the device includes: a first acquisition module, used to acquire a training sample set, which includes multiple original content samples carrying labels, wherein the multiple original content samples are generated from target data, the number of clicks on the target data is less than a preset threshold, and the labels represent the click probability of the target data; an addition module, used to add perturbation information to the multiple original content samples to obtain multiple adversarial content samples; a first input module, used to input the plurality of original content samples and the plurality of adversarial content samples into the click-through rate prediction model, respectively, to obtain a plurality of first estimated values corresponding to the plurality of original content samples and a plurality of second estimated values corresponding to the plurality of adversarial content samples; a first calculation module, used to calculate, using the plurality of first estimated values, the plurality of second estimated values and the labels carried by the plurality of original content samples, a plurality of first loss values corresponding to the plurality of original content samples and a plurality of second loss values corresponding to the plurality of adversarial content samples, respectively; a verification module, used to verify whether the click-through rate prediction model has converged based on the plurality of first loss values and the plurality of second loss values; a second calculation module, used to calculate multiple first gradients corresponding to the multiple original content samples and multiple second gradients corresponding to the multiple adversarial content samples when the click-through rate prediction model has not converged; and an adjustment module, used to adjust the parameters of the click-through rate prediction model based on the plurality of first gradients and the plurality of second gradients.
8. A content display device, characterized in that the device includes: a receiving module, used to receive search keywords sent by a client; a second acquisition module, used to acquire multiple search contents corresponding to the search keywords; a second input module, used to input the search keywords and the multiple search contents into the click-through rate prediction model to obtain a score corresponding to each search content, wherein the click-through rate prediction model is a model trained by the device according to claim 7; a sorting module, used to sort the multiple search contents according to the score corresponding to each search content and determine a second preset number of target search contents; and a feedback module, used to feed back the target search contents to the client, so that the client can display the target search contents and, in response to an interaction command on the target search contents, send the search command corresponding to the interaction command.
9. An electronic device, characterized in that it includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program; and the processor, when executing the program stored in the memory, implements the steps of the method according to any one of claims 1-5 or claim 6.
10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-5 or 6.