Ensemble learning model, model determination method, prediction method, medium and product
By using an ensemble learning model based on user segmentation, weights adapted to different user groups are generated through user segmentation and weight generation. This solves the robustness problem of machine learning models under uneven user distribution and enables the model to adapt efficiently and predict accurately among different user groups.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
- Filing Date
- 2022-11-29
- Publication Date
- 2026-06-19
AI Technical Summary
Machine learning models exhibit significant differences in performance between offline and online environments, primarily due to uneven user distribution and variations in user behavior, leading to insufficient model robustness.
An ensemble learning model based on user segmentation is adopted. The user segmentation part determines the user-group characteristics of the target user. The weight generation part then generates, from these user-group characteristics, a weight for each of the N sub-models, which controls each sub-model's contribution to the ensemble prediction.
This improves the robustness of the model, enabling it to adapt to changes in user distribution across different demographics, reducing performance differences between online and offline environments, and simplifying model deployment and maintenance.
Figure CN115796311B_ABST
Description
Technical Field
[0001] This specification relates to the field of ensemble learning technology, and in particular to an ensemble learning model based on population stratification, a method and apparatus for determining an ensemble learning model based on population stratification, a prediction method and apparatus based on an ensemble learning model, a computer-readable storage medium, an electronic device, and a computer program product.
Background Technology
[0002] Machine learning models often perform differently in offline training and in online prediction, primarily because users are unevenly distributed and user behavior varies across user groups. Specifically, offline training data is collected from user data over a specific period and is concentrated on a subset of users. During model training, the empirical loss is minimized over the whole sample, so user groups influence one another, and the model that is optimal overall may not be optimal for each individual user group. In online prediction, shifts in user distribution then cause the overall predictive performance of the model to decline. Therefore, the robustness of models provided by related technologies needs improvement.
[0003] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this specification, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0004] The purpose of this specification is to provide an ensemble learning model based on population stratification, a method and apparatus for determining an ensemble learning model based on population stratification, a prediction method and apparatus based on an ensemble learning model, a computer-readable storage medium, an electronic device, and a computer program product, which can at least improve the robustness of the model to a certain extent.
[0005] Other features and advantages of this specification will become apparent from the following detailed description, or may be learned in part by practice of this specification.
[0006] According to one aspect of this specification, an ensemble learning model based on population stratification is provided, comprising: a user segmentation part for determining user-group characteristics corresponding to the target user based on user characteristics of the target user; an ensemble learning part comprising N sub-models, where N is an integer greater than 1; and a weight generation part for generating an i-th weight corresponding to the i-th sub-model based on the user-group characteristics corresponding to the target user, where i is a positive integer not greater than N; wherein the i-th sub-model is used to determine an i-th prediction branch value for a learning objective based on the user characteristics of the target user and the product characteristics of the marketing product; and the ensemble learning part is used to determine a prediction value based on the i-th prediction branch value and the i-th weight.
[0007] According to another aspect of this specification, a method for determining an ensemble learning model based on audience segmentation is provided. The ensemble learning model based on audience segmentation includes: an audience segmentation part, an ensemble learning part comprising N sub-models, and a weight generation part. The method includes: inputting user features of sample users into the audience segmentation part to determine audience features corresponding to the sample users; wherein the user features of the sample users and the audience features corresponding to the sample users are used to determine a first loss function; inputting the audience features corresponding to the sample users into the weight generation part to generate an i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N; using the i-th sub-model, determining an i-th prediction branch value for the learning objective based on the user features of the sample users and the product features of the marketing products; using the ensemble learning part, determining fine-grained sample prediction values based on the i-th prediction branch value and the i-th weight; wherein a second loss function is determined based on the fine-grained sample prediction values and the actual values corresponding to the sample users; and determining the ensemble learning model based on audience segmentation based on the first loss function and the second loss function.
[0008] According to another aspect of this specification, a prediction method based on an ensemble learning model is provided. The method includes: inputting user characteristics of a user to be tested into the population segmentation part of an ensemble learning model based on population segmentation to obtain population characteristics corresponding to the user to be tested, wherein the ensemble learning model based on population segmentation is trained according to the determination method for the ensemble learning model based on population segmentation provided in the above embodiments; inputting the user characteristics of the user to be tested and the product characteristics of marketing products (such as advertisements, goods, etc.) into N sub-models of the ensemble learning part, wherein the i-th sub-model outputs the i-th prediction branch value regarding the learning objective, where N is an integer greater than 1 and i is a positive integer not greater than N; determining a predicted value for the user to be tested based on the i-th prediction branch value and the i-th weight, wherein the i-th weight is generated based on the population characteristics corresponding to the user to be tested.
[0009] According to another aspect of this specification, an apparatus for determining an ensemble learning model based on population stratification is provided, wherein the ensemble learning model based on population stratification includes: a population segmentation part, an ensemble learning part including N sub-models, and a weight generation part; the apparatus includes: a first loss determination module, an input module, a branch value determination module, a second loss determination module, and a model training module.
[0010] The first loss determination module is used to input the user characteristics of the sample users into the population segmentation part, and determine the population characteristics corresponding to the sample users through the population segmentation part; wherein the user characteristics of the sample users and the population characteristics corresponding to the sample users are used to determine the first loss function; the input module is used to input the population characteristics corresponding to the sample users into the weight generation part, and generate the i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N; the branch value determination module is used to determine the i-th predicted branch value about the learning objective through the i-th sub-model, based on the user characteristics of the sample users and the product characteristics of the marketing products; the second loss determination module is used to determine the fine-grained sample prediction value through the ensemble learning part, based on the i-th predicted branch value and the i-th weight; wherein the second loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample users; and the model training module is used to determine the ensemble learning model based on population segmentation based on the first loss function and the second loss function.
[0011] According to another aspect of this specification, a prediction apparatus based on an ensemble learning model is provided, wherein the apparatus includes: a first input module, a second input module, and a prediction module.
[0012] The first input module is used to input the user characteristics of the user to be tested into the audience segmentation part of the audience segmentation-based ensemble learning model to obtain the audience characteristics corresponding to the user to be tested. The audience segmentation-based ensemble learning model is trained according to the determination method of the audience segmentation-based ensemble learning model. The second input module is used to input the user characteristics of the user to be tested and the product characteristics of the marketing product into the N sub-models of the ensemble learning part, respectively. The i-th sub-model outputs the i-th prediction branch value with respect to the learning objective. N is an integer greater than 1, and i is a positive integer not greater than N. The prediction module is used to determine the prediction value of the user to be tested based on the i-th prediction branch value and the i-th weight. The i-th weight is generated based on the audience characteristics corresponding to the user to be tested.
[0013] According to another aspect of this specification, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the method for determining the ensemble learning model based on population stratification as described in the above embodiments, or implements the prediction method based on an ensemble learning model as described in the above embodiments.
[0014] According to another aspect of this specification, a computer-readable storage medium is provided, wherein instructions are stored therein, which, when executed on a computer or processor, cause the computer or processor to perform the method for determining the ensemble learning model based on population stratification as described in the above embodiments, or to implement the prediction method based on an ensemble learning model as described in the above embodiments.
[0015] According to another aspect of this specification, a computer program product containing instructions is provided that, when run on a computer or processor, causes the computer or processor to perform the method for determining the ensemble learning model based on population stratification as described in the above embodiments, or to implement the prediction method based on an ensemble learning model as described in the above embodiments.
[0016] The ensemble learning model based on population stratification, the method and apparatus for determining the ensemble learning model based on population stratification, the prediction method and apparatus based on the ensemble learning model, the computer-readable storage medium, the electronic device, and the computer program product provided in the embodiments of this specification have the following technical effects:
[0017] The ensemble learning model based on user segmentation provided in this specification includes: a user segmentation part, an ensemble learning part comprising N sub-models, and a weight generation part. The user segmentation part determines the user-group characteristics corresponding to the target user. The weight generation part then generates a set of weights based on these user-group characteristics, where the N weight values correspond to the N sub-models. Since different user groups have different characteristics, different sets of weights can be determined for them. Because the N weights correspond to the N sub-models in the ensemble learning part, the contribution of each sub-model to the prediction value differs for two users who do not belong to the same user group. Therefore, by segmenting the users to be tested through the user segmentation part and generating different sets of weights for different user-group characteristics, the model can be applied to users belonging to different user groups. In other words, changes in user distribution do not lead to a decrease in the overall predictive performance of the model. Thus, the ensemble learning model based on user segmentation provided in this specification has high robustness.
[0018] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this specification.
Description of the Drawings
[0019] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain the principles of this specification. It is obvious that the drawings described below are merely some embodiments of this specification, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0020] Figure 1 is a schematic diagram of the structure of the ensemble learning model based on population stratification provided in the embodiments of this specification.
[0021] Figure 2 is a flowchart of the method for determining an ensemble learning model based on population stratification provided in the embodiments of this specification.
[0022] Figure 3 is a flowchart of a method for determining an ensemble learning model based on population stratification provided in another embodiment of this specification.
[0023] Figure 4 is a flowchart of the prediction method based on an ensemble learning model provided in the embodiments of this specification.
[0024] Figure 5 is a schematic diagram of the structure of an ensemble learning model based on population stratification provided in another embodiment of this specification.
[0025] Figure 6 is a schematic diagram of the structure of the apparatus for determining an ensemble learning model based on population stratification provided in the embodiments of this specification.
[0026] Figure 7 is a schematic diagram of the structure of an apparatus for determining an ensemble learning model based on population stratification provided in another embodiment of this specification.
[0027] Figure 8 is a schematic diagram of the structure of the prediction apparatus based on the ensemble learning model provided in the embodiments of this specification.
[0028] Figure 9 is a schematic diagram of the structure of the electronic device provided in the embodiments of this specification.
Detailed Description
[0029] To make the objectives, technical solutions, and advantages of this specification clearer, the embodiments of this specification will be described in further detail below with reference to the accompanying drawings.
[0030] In the following description, when referring to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.
[0031] Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided to make this specification more comprehensive and complete, and to fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments described herein. However, those skilled in the art will recognize that the technical solutions described herein may be practiced with one or more of the specific details omitted, or other methods, components, apparatus, steps, etc., may be employed. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring various aspects of this specification.
[0032] Furthermore, the accompanying drawings are merely illustrative diagrams of this specification and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted. Some block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.
[0033] This specification provides embodiments of an ensemble learning model based on population stratification, a method and apparatus for determining an ensemble learning model based on population stratification, a prediction method and apparatus based on an ensemble learning model, a computer-readable storage medium, an electronic device, and a computer program product, which can solve the problems existing in related technologies.
[0034] Figure 1 is a schematic diagram of the structure of an ensemble learning model based on population stratification provided in an embodiment of this specification. Referring to Figure 1, the ensemble learning model 100 based on population segmentation includes: a population segmentation part 10, an ensemble learning part 30, and a weight generation part 20.
[0035] Referring to Figure 1, the ensemble learning part 30 comprises N sub-models, where N is an integer greater than 1. For example, these sub-models can be neural networks. The neural network structures of the N sub-models are identical, but their parameters may differ after training. Because of these differences in model parameters, the output vectors of the sub-models vary, reflecting their different emphases on the input. Therefore, in predicting one or more learning objectives, different sub-models excel at different aspects. Combining the predicted branch values of at least two sub-models is thus more accurate than using the predicted branch value of a single sub-model as the total prediction.
[0036] In related technologies, to achieve high accuracy in ensemble learning predictions, a separate ensemble learning model is set up for each population group. For example, users belonging to population A use ensemble learning model 1, which includes the 1st, 2nd, and 4th of the aforementioned N sub-models; users belonging to population B use ensemble learning model 2, which includes the 3rd, 4th, and 6th of the aforementioned N sub-models, and so on. It is evident that to be applicable to multiple population groups, multiple models are required, necessitating multiple model training and maintenance processes, resulting in resource waste.
[0037] In contrast, the ensemble learning model based on population stratification provided in the embodiments of this specification further includes a population segmentation part 10 and a weight generation part 20. The population segmentation part 10 determines the population to which the user to be tested belongs and outputs the population characteristics corresponding to that population. The weight generation part 20 then generates a set of weights based on the population characteristics corresponding to the user to be tested, and this set of weights controls the contribution of each of the N sub-models in the ensemble learning.
[0038] The population segmentation part 10 in the embodiments of this specification is introduced first:
[0039] It can be understood that user characteristics include age, gender, education level, and occupation. Based on these characteristics, multiple users can be divided into multiple groups. For example, the characteristics e1 corresponding to group E1 are: female, aged 25-35, and with a college degree or above; the characteristics e2 corresponding to group E2 are: male, aged 30-45, and with a college degree or above, and so on.
[0040] In the embodiments of this specification, the population segmentation part 10 is used to determine, based on the user feature $u_0$ of any user (target user U), the population feature $e_x$ corresponding to the target user U. For example, if the target user U's own user characteristics are: male, 28 years old, and with a bachelor's degree, then the population segmentation part 10 can determine that the target user U belongs to population E2. Further, the population characteristics e2 corresponding to E2 can be obtained as: male, aged between 30 and 45 years old, and with a college degree or above. The population feature corresponding to the target user U is therefore determined as $e_x$ = e2.
[0041] The weight generation part 20 in the embodiments of this specification is introduced next:
[0042] The weight generation part 20 is used to generate weights corresponding to the N sub-models of the ensemble learning part based on the population characteristics of the target user. Specifically, the i-th weight corresponds to the i-th sub-model, where i is a positive integer not greater than N. The i-th sub-model is used to determine the i-th prediction branch value for the learning objective based on the user characteristics of the target user and the product characteristics of the marketing product (such as advertisements, goods, etc.). The predicted value is then determined based on the i-th prediction branch value and the i-th weight. For example, suppose the ensemble learning part 30 includes 4 sub-models, and the predicted branch values output for target user A are [y1, y2, y3, y4]. The weights generated by the weight generation part 20 based on the population characteristics corresponding to target user A can be represented as [0, 0.5, 0, 0.5]. Therefore, the total predicted value is 0×y1+0.5×y2+0×y3+0.5×y4. Likewise, the predicted branch values output for target user B are [y1', y2', y3', y4'], and the weights generated by the weight generation part 20 based on the population characteristics corresponding to target user B can be expressed as [0.1, 0.2, 0.7, 0]. Therefore, the total predicted value is 0.1×y1'+0.2×y2'+0.7×y3'+0×y4'.
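The weighted combination in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `ensemble_predict` and the numeric branch values are hypothetical, while the two weight vectors are the ones given in the text.

```python
import numpy as np

def ensemble_predict(branch_values, weights):
    """Weighted sum of the N sub-model branch values for one user."""
    branch_values = np.asarray(branch_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if branch_values.shape != weights.shape:
        raise ValueError("need exactly one weight per sub-model")
    return float(np.dot(weights, branch_values))

# Hypothetical branch values for target users A and B (illustrative numbers only).
y_a = [0.2, 0.4, 0.6, 0.8]
y_b = [0.5, 0.5, 0.1, 0.9]

# Weight vectors taken from the example in the text.
pred_a = ensemble_predict(y_a, [0.0, 0.5, 0.0, 0.5])  # only sub-models 2 and 4 contribute
pred_b = ensemble_predict(y_b, [0.1, 0.2, 0.7, 0.0])  # sub-model 4 is switched off
```

A weight of 0 removes a sub-model from the prediction for that user group, which is how a single model can reproduce the per-group sub-model selection of the related technology.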
[0043] With the ensemble learning model 100 based on user segmentation shown in Figure 1, users can be divided into different groups, and the characteristics of the group to which each user belongs can be determined. Different sets of weights can then be determined for different group characteristics, allowing explicit control over the contribution of each sub-model in the ensemble learning part. The embodiments in this specification therefore require training and maintaining only one model to be applicable to users belonging to different groups. In other words, changes in user distribution do not lead to a decrease in the overall predictive performance of the model. Therefore, the ensemble learning model based on user segmentation provided in the embodiments of this specification has high robustness.
[0044] In an exemplary embodiment, Figure 2 is a flowchart of the method for determining an ensemble learning model based on population stratification provided in the embodiments of this specification, that is, the training process of the model shown in Figure 1. Referring to Figure 2, the embodiment shown includes S210-S240.
[0045] In S210, the user characteristics of the sample user are input into the crowd segmentation part, and the crowd segmentation part determines the crowd characteristics corresponding to the sample user; wherein, the user characteristics of the sample user and the crowd characteristics corresponding to the sample user are used to determine the first loss function.
[0046] In an exemplary embodiment, referring to Figure 1, the population segmentation part 10 includes a feature processing network 101 and a discrete latent variable space 102. For example, the feature processing network 101 can be implemented using a neural network f(). The discrete latent variable space 102 contains multiple latent variables $[e_1, \dots, e_k]$, which represent the population characteristics corresponding to k groups. Here, k is a hyperparameter that can be determined according to the user scale corresponding to the input sample features; for example, it can be on the order of hundreds to thousands.
[0047] In an exemplary embodiment, before the user characteristics of the sample users are input into the population segmentation part 10, the model parameters are first randomly initialized; that is, the parameters of the neural network f() and the k latent vectors in the discrete latent variable space 102 are initialized.
[0048] In an exemplary embodiment, referring to Figure 1, the user features $u_i$ of the i-th sample user are input into the feature processing network 101 to obtain the sample latent vector $f(u_i)$. The sample latent vector $f(u_i)$ is then compared with the vectors in the discrete latent variable space 102 to obtain the first population feature $z(u_i)$ corresponding to the i-th sample user; that is, $e_s$ is determined from $[e_1, \dots, e_k]$.
[0049] $$z(u_i) = e_s, \qquad s = \arg\min_{j} \lVert f(u_i) - e_j \rVert$$
[0050] Here, j takes integer values from 1 to k, inclusive. Specifically, the distance between the sample latent vector $f(u_i)$ and each latent vector center $e_j$ in the discrete latent variable space 102 is computed, and the vector $e_s$ corresponding to the minimum distance is determined to be the population center that matches the sample latent vector $f(u_i)$. The vector $e_s$ thus reflects the characteristics of the population to which the i-th sample user belongs.
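The nearest-center lookup described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance (the text only says "distance") and a tiny hypothetical codebook with k = 3; the name `nearest_population_feature` is not from the source.

```python
import numpy as np

def nearest_population_feature(latent, codebook):
    """Return (s, e_s): index and vector of the codebook entry closest to `latent`."""
    latent = np.asarray(latent, dtype=float)      # sample latent vector f(u_i), shape (d,)
    codebook = np.asarray(codebook, dtype=float)  # latent vectors e_1..e_k, shape (k, d)
    distances = np.linalg.norm(codebook - latent, axis=1)  # assumed Euclidean distance
    s = int(np.argmin(distances))
    return s, codebook[s]

# Hypothetical codebook with k = 3 population centers in a 2-dimensional latent space.
codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [4.0, 0.0]])
f_u = np.array([0.9, 1.2])  # hypothetical output of the feature processing network
s, e_s = nearest_population_feature(f_u, codebook)
```

Note that `s` here is a zero-based index, whereas the text numbers the centers from 1 to k.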
[0051] Furthermore, the first loss function Loss1 is determined based on the sample latent vector $f(u_i)$ and the first population feature $e_s$. For example, Loss1 can be represented as:
[0052]
[0053] Here, M represents the total number of samples used to train the model 100. It can be understood that during model training, the user segmentation part can divide the M sample users into at most k groups, so that user segmentation is achieved within a single machine learning model. This enables a hierarchical ensemble learning scheme based on user segmentation and introduces explicit control over user segmentation, which improves the model's adaptability to differences in user distribution, makes it robust to changes in online user distribution, and reduces performance differences between online and offline environments. In addition, compared with general hierarchical ensemble learning, the model provided in this embodiment uses only a single network, which simplifies model deployment and maintenance and yields an end-to-end hierarchical ensemble learning solution.
[0054] In an exemplary embodiment, to avoid overfitting of the population segmentation part 10, features that are unimportant for population segmentation should be blocked from entering this part of the network, while features that are important for population segmentation are allowed in. Therefore, in this embodiment, a fifth loss function (denoted as Sparsity loss) is determined based on the parameters $(w_1, w_2, \dots, w_h, \dots, w_S)$ of the first layer of the feature processing network 101, where h is a positive integer not greater than S, and S is a positive integer. Sparsity loss can be expressed as:
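The expression for Loss1 is not reproduced in the text above, so the following is only a sketch under stated assumptions: it takes Loss1 to be the mean squared Euclidean distance between each sample latent vector and its matched population center, averaged over the M samples. The function name `loss1_sketch` and the data are hypothetical, and the actual Loss1 may differ (for example, by including additional terms).

```python
import numpy as np

def loss1_sketch(latents, codebook):
    """Assumed form of Loss1: mean squared distance to the matched center.

    Returns (loss, assignments), where assignments[i] is the zero-based
    index of the center matched to sample i.
    """
    latents = np.asarray(latents, dtype=float)    # shape (M, d)
    codebook = np.asarray(codebook, dtype=float)  # shape (k, d)
    diffs = latents[:, None, :] - codebook[None, :, :]  # shape (M, k, d)
    sq_dist = np.sum(diffs ** 2, axis=2)                # shape (M, k)
    assignments = np.argmin(sq_dist, axis=1)
    matched = sq_dist[np.arange(len(latents)), assignments]
    return float(np.mean(matched)), assignments

# Hypothetical data: M = 3 samples, k = 2 centers.
latents = np.array([[0.0, 1.0], [3.0, 3.0], [0.0, 0.0]])
codebook = np.array([[0.0, 0.0], [3.0, 4.0]])
loss, groups = loss1_sketch(latents, codebook)
```

Because every sample is matched to one of the k centers, the M samples fall into at most k groups, which matches the statement above.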
[0055]
[0056] Specifically, the Sparsity loss constrains the population segmentation part: the larger the parameter $w_h$, the greater the Sparsity loss. Under this constraint, the parameters of the first layer of the feature processing network 101 tend toward weights of 0 for certain input features. Here, sparsity means that the input is effectively sparse (because when $w_h$ is 0, the output is unaffected by whether the corresponding input has a value), so the loss acts as a filter on the features of the input samples.
[0057] In an exemplary embodiment, to enhance the stability of the model and avoid overfitting in the population segmentation part 10, an L1 penalty can be added to the parameters of the first layer of the feature processing network 101. This also serves to filter features, penalizing features that do not contribute significantly to population segmentation, with the aim of obtaining sparse input.
[0058] Continuing to refer to Figure 2, in S220, the population features corresponding to the sample users are input into the weight generation part to generate the i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N.
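The feature-filtering effect of an L1 penalty on the first layer can be illustrated with a short sketch. The weight matrix, the penalty strength, and the soft-thresholding step are all assumptions for illustration; soft-thresholding is a standard way to apply an L1 penalty and is not stated in the source.

```python
import numpy as np

def l1_penalty(first_layer_weights, strength=0.01):
    """L1 penalty: strength times the sum of absolute first-layer weights."""
    return strength * float(np.sum(np.abs(first_layer_weights)))

def soft_threshold(weights, step):
    """Proximal step for an L1 penalty: shrink toward 0 and zero out small weights."""
    return np.sign(weights) * np.maximum(np.abs(weights) - step, 0.0)

# Hypothetical first-layer weights: rows are hidden units, columns are input features.
W = np.array([[0.8, -0.02, 0.5],
              [-0.6, 0.01, 0.03]])

penalty = l1_penalty(W, strength=0.1)
W_sparse = soft_threshold(W, step=0.05)

# An input feature is filtered out when every weight attached to it is 0.
kept_features = np.any(W_sparse != 0.0, axis=0)
```

In this toy case the second input feature has only small weights, so it is zeroed out entirely and no longer influences the population segmentation part.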
[0059] In an exemplary embodiment, referring to Figure 1, before the population feature $z(u_i)$ corresponding to the i-th sample user is input into the weight generation part 20, it can first be processed by the semantic addition part 40. In the scheme provided in the embodiments of this specification, the vector input into the weight generation part 20 should not only reflect the characteristics of the user's group but also be related to the learning objective (in a product marketing scenario, the learning objective is, for example, CTR (click-through rate) or CVR (conversion rate)). However, the vector $z(u_i)$ determined above from the discrete latent variable space 102 (denoted as the first population feature) reflects only the characteristics of the group to which the user belongs. Therefore, in this embodiment, a semantic addition part 40 is provided. It predicts the learning objective from the first population feature, and a corresponding loss function (the auxiliary prediction loss function) regularizes the vector representation of the first population feature. This ensures that the second population feature output by the semantic addition part 40 reflects the learning objective.
[0060] Specifically, referring to Figure 1, the first population feature $z(u_i)$ is input into the semantic addition part 40 (represented as h()) to make a prediction about the learning objective and obtain the second population feature corresponding to the target user. The second population feature reflects the characteristics of the target user's population and also contains semantic features related to the learning objective.
[0061] For example, the above auxiliary prediction loss function (also known as the fourth loss function) can be expressed as:
[0062] Loss =
[0063] Here, the predicted label (coarse-grained sample prediction value) for the learning objective is obtained after the first population feature z(u_i) is input into the semantic addition part 40, and the actual label is that of the i-th sample user for the learning objective. The auxiliary prediction loss function Loss4 described above can be used to prevent collapse of the network model in the population segmentation strategy (including the population segmentation part 10 and the semantic addition part 40).
[0064] It can be understood that the semantic addition part 40 can use a simple neural network to make an auxiliary (coarse-grained) prediction of the learning target (CTR, CVR). Since the input of this part is only the feature of the population to which the sample user belongs, it cannot predict the learning target accurately. As mentioned above, its main purpose is to bring the constraint semantics of the learning target into the first population feature to obtain the second population feature.
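As a rough illustration of how a simple network could yield both a coarse-grained prediction and a second population feature, the sketch below assumes a single ReLU hidden layer whose activations are reused as the second population feature, followed by a sigmoid output. This architecture, the function name `semantic_addition`, and all weights are assumptions made for illustration; the specification does not state the structure of the semantic addition part 40.

```python
import math


def semantic_addition(z, w_hidden, w_out):
    """Return (second population feature, coarse-grained prediction).

    Assumed structure: one ReLU hidden layer, then a sigmoid output unit.
    """
    # Hidden activations, reused here as the second population feature h(z).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    coarse = 1.0 / (1.0 + math.exp(-logit))  # probability-like CTR/CVR estimate
    return hidden, coarse


z = [1.0, 0.0]                        # hypothetical first population feature
w_hidden = [[1.0, 0.0], [-1.0, 0.0]]  # hypothetical hidden-layer weights
w_out = [0.0, 0.0]                    # hypothetical output weights
second_feature, coarse_pred = semantic_addition(z, w_hidden, w_out)
```

With zero output weights the coarse prediction is 0.5 regardless of the population, which shows why the auxiliary loss is needed to push semantic information into the hidden representation.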
[0065] In an exemplary embodiment, during gradient backpropagation, the target gradient determined based on the fourth loss function is assigned to the user feature u_i input to the feature processing network 101. For example, if the target gradient determined based on the fourth loss function is [0.2, 0.1, 0.3, 0.4], then during gradient backpropagation the gradient of u_i is also set to [0.2, 0.1, 0.3, 0.4]. This helps strengthen the added semantic features related to the learning target. The weights of the feature processing network 101 are then adjusted by gradient descent based on this gradient, so that the latent variables predicted by the feature processing network 101 have corresponding representations that serve the learning target.
[0066] Furthermore, the aforementioned second population feature h(z(u_i)) is input into the weight generation part 20 to generate the weights corresponding to each sub-model in the ensemble learning part 30.
[0067] In an exemplary embodiment, referring to Figure 1, the aforementioned weight generation part 20 employs a gating network. On this basis, Figure 3 illustrates a flowchart of a method for determining an ensemble learning model based on population stratification provided in another embodiment of this specification, which is implemented on the basis of Figure 2. S220' can be a specific implementation of the above-mentioned S220.
[0068] In S220', the population features corresponding to the sample users are input into the gating network to obtain a sample gate vector of dimension N, which contains the i-th weight corresponding to the i-th sub-model.
[0069] In addition, S300 is executed: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function and the third loss function.
[0070] In the embodiments provided in this specification, a gating network is used to achieve explicit control over the combination of sub-models in the ensemble learning part, which helps to avoid the repetition of sub-model combinations corresponding to different groups. Specifically, a loss function (i.e., the third loss function, also known as the diversity loss function) is constructed for the above-mentioned gating network. Based on this loss function, the goal is to ensure that all sub-models in the ensemble learning part are used equally, balancing the efficiency of each sub-model. The diversity loss function is expressed as:
[0071]
[0072] Here, the batch size (batch_size) is assumed to be 128, and N is the dimension of the gate vectors output by the gating network. Assuming the gate vector dimension is 10, one batch corresponds to a group of 128 vectors of dimension 10, i.e. a 128 × 10 matrix. The dimension is tied to the number of sub-models in the ensemble learning part: if there are 10 sub-models, the dimension is set to 10. In the above loss function, the vector group is summed and averaged column by column to obtain an average vector of dimension 10, each dimension of which corresponds to the probability of the corresponding sub-model being selected. In this embodiment, every sub-model is expected to be selected with equal probability; the loss function therefore drives each dimension of the average vector as close as possible to τ = 1/10. Here τ is the mean value, depends on the gate vector dimension N, and can be estimated as 1/N.
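The column-averaging step described above can be sketched as follows. The squared-deviation form of the penalty is an assumption made for illustration, since the exact formula is not reproduced in this text; the function name `diversity_loss` and the toy batches are likewise hypothetical.

```python
def diversity_loss(gate_vectors):
    """Penalize deviation of each sub-model's mean gate value from tau = 1/N.

    The squared-error form is an assumption; the text only states that each
    averaged dimension should be as close as possible to tau.
    """
    batch_size = len(gate_vectors)
    n = len(gate_vectors[0])
    tau = 1.0 / n
    # Average each column over the batch: mean usage of each sub-model.
    mean_gate = [sum(g[j] for g in gate_vectors) / batch_size for j in range(n)]
    return sum((m - tau) ** 2 for m in mean_gate)


# Balanced usage: across the batch, each sub-model is selected equally often.
balanced = [[1.0, 0.0], [0.0, 1.0]]
# Collapsed usage: every sample selects the first sub-model.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
```

Under this assumed form, the balanced batch incurs no penalty while the collapsed batch does, which matches the stated goal of using all sub-models equally.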
[0073] In an exemplary embodiment, to enhance model stability, a small amount of random noise is added to the input of the gating network, thereby improving the model's robustness to disturbances after training. Specifically, a small amount of noise can be generated using a Gaussian distribution and added to the crowd feature z(u), which is then passed to the weight generation part G().
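A minimal sketch of adding Gaussian noise before the gating network is given below. It assumes the gating network is a single linear layer followed by a softmax, and the noise scale `noise_std`, the matrix `gate_weights`, and the feature values are all hypothetical; the specification does not fix any of these.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_gate(crowd_feature, gate_weights, noise_std=0.01, rng=None):
    """Add Gaussian noise to the crowd feature, then compute N gate weights.

    gate_weights is a hypothetical N x D matrix of a single linear layer.
    """
    rng = rng or random.Random()
    noisy = [x + rng.gauss(0.0, noise_std) for x in crowd_feature]
    logits = [sum(w * x for w, x in zip(row, noisy)) for row in gate_weights]
    return softmax(logits)


z_u = [0.2, -0.1, 0.4]  # hypothetical crowd feature z(u)
gate_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]
gate = noisy_gate(z_u, gate_weights, rng=random.Random(0))
```

The noise would normally be applied only during training; at prediction time `noise_std` can be set to 0.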
[0074] Referring to Figure 2 and Figure 3, the model training scheme provided in the embodiments of this specification further includes, in S230, determining the i-th prediction branch value for the learning objective through the i-th sub-model based on the user characteristics of the sample users and the product characteristics of the marketing products.
[0075] For example, suppose the ensemble learning part contains 10 sub-models and the input features of every sub-model are the same, namely the user features u_i of the i-th sample user (the same u_i that is input to the population segmentation part) and the product features v_i of the marketing product. The 10 sub-models predict the learning objective as [y1, y2, y3, y4, y5, y6, y7, y8, y9, y10]. If each prediction branch value is a 128-dimensional vector, the output of the 10 sub-models is a 10 × 128 vector group E.
[0076] On the other hand, after the aforementioned second population feature h(z(u_i)) is input into the weight generation part 20, a set of weights of dimension 10 is output, for example G(h(z(u_i))) = [0.84, 0, 0, 0.11, 0, 0.02, 0.01, 0, 0, 0].
[0077] Combining the outputs of S220 and S230, the output G(h(z(u_i))) of the weight generation part 20 is multiplied by E, thus achieving a weighted aggregation of the N sub-models. It is understood that the weight generation part 20 can also be other network structures capable of generating N weights, not limited to gating networks.
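The weighted aggregation of the N sub-model outputs can be sketched as a weighted sum over the rows of E. The function name `aggregate` and the small example values (N = 3, 2-dimensional branch values) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def aggregate(weights, branch_values):
    """Weighted sum of N branch vectors: sum_i weights[i] * branch_values[i]."""
    dim = len(branch_values[0])
    return [
        sum(w * branch[d] for w, branch in zip(weights, branch_values))
        for d in range(dim)
    ]


weights = [0.5, 0.0, 0.5]                 # hypothetical gate output, N = 3
E = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]  # hypothetical branch values, N x 2
fine_grained = aggregate(weights, E)
```

The second sub-model has weight 0 and therefore contributes nothing to the result, however large its branch value is.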
[0078] The model training method provided in the embodiments of this specification further includes S240: determining the fine-grained sample prediction value based on the i-th prediction branch value and the i-th weight; wherein, the second loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample user.
[0079] It can be understood that the above fine-grained sample prediction value can be expressed as G(h(z(u_i))) × E, that is, the output of the weight generation part 20 multiplied by E, determined as in the above embodiment; the details are not repeated here.
[0080] The second loss function can be expressed as:
[0081]
[0082] Here, the predicted value is the above-mentioned fine-grained sample prediction value G(h(z(u_i))) × E. For the same sample, the learning objective of the coarse-grained prediction in the auxiliary prediction loss function and that of the fine-grained prediction in the second loss function can be consistent; in that case, the actual label in the two loss functions is the same, namely the actual label of the i-th sample user for the learning objective.
[0083] It can be understood that, whereas the input of the coarse-grained prediction in the above embodiments is the population feature z(u_i), the input of the fine-grained prediction in this embodiment is the user feature u_i of the sample user and the product feature v_i. Because these input features are richer and more comprehensive, the prediction accuracy is higher, hence the name fine-grained prediction.
[0084] Continuing to refer to Figure 3, as one implementation of S250, S250' is executed: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the third loss function.
[0085] According to the foregoing embodiments, five loss functions are used for model parameter optimization. For example, the formula for determining the total loss function (LOSS) based on the five loss functions can be expressed as:
[0086] LOSS = loss2 + α × (loss4 + loss1) + β × loss5 + γ × loss3
[0087] For example, α takes a value between 0.1 and 0.2, β takes a value between 0.05 and 0.1, and γ takes a value between 0.05 and 0.1.
[0088] Through backward gradient propagation of LOSS, the parameters of each network component in the population-stratified ensemble learning model shown in Figure 1 are updated simultaneously.
[0089] It should be noted that LOSS includes loss2 and loss1. Loss3, loss4, and loss5 can be set according to actual needs. For example, if LOSS includes loss2, loss1, and loss3, then the above LOSS = loss2 + α × loss1 + γ × loss3. Or, if LOSS includes loss2, loss1, and loss4, then the above LOSS = loss2 + α × (loss4 + loss1).
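The combination of the loss terms, including the case where some optional terms are omitted, can be sketched as a single function. The function name `total_loss` and the default coefficient values are illustrative choices within the ranges given above, not values prescribed by the specification; an omitted optional loss is simply treated as 0.

```python
def total_loss(loss1, loss2, loss3=0.0, loss4=0.0, loss5=0.0,
               alpha=0.1, beta=0.05, gamma=0.05):
    """LOSS = loss2 + alpha * (loss4 + loss1) + beta * loss5 + gamma * loss3.

    loss1 and loss2 are always required; loss3, loss4 and loss5 are optional
    and default to 0 when they are not used.
    """
    return loss2 + alpha * (loss4 + loss1) + beta * loss5 + gamma * loss3


# All five losses, with hypothetical values.
full = total_loss(loss1=1.0, loss2=2.0, loss3=4.0, loss4=3.0, loss5=2.0)

# Only loss2, loss1 and loss3: LOSS = loss2 + alpha * loss1 + gamma * loss3.
partial = total_loss(loss1=1.0, loss2=2.0, loss3=4.0)
```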
[0090] In the scheme provided in the embodiments of this specification, the concept of user grouping is explicitly constructed by setting a discrete latent variable space E. By constraining the loss1 of group clustering, it is expected that users with similar performance will be clustered. By adding a loss4 that imposes semantic constraints on the learning objective, it is expected that users from different groups will also have differences in their latent variables due to differences in auxiliary prediction performance, thus avoiding model collapse in the group segmentation part. It can be seen that through loss1 and loss4, the meta-group (or group center) behind the users can be determined, that is, the group features containing the semantics of the learning objective can be obtained.
[0091] Furthermore, by incorporating user demographic features that include the semantics of the learning objectives, the combination of sub-models in the ensemble learning process is controlled. Specifically, the loss3 applied to the weight generation part ensures that different sub-models are used as evenly as possible. Considering the differences in user demographic features, this allows for explicit control over different sub-model combinations for different user groups, preventing model collapse in the weight generation part.
[0092] The solutions provided in the embodiments of this specification, on the one hand, achieve explicit construction of user groups by setting a user segmentation part and a discrete latent variable space E; on the other hand, they explicitly control the combination of sub-models in the ensemble learning part by setting a weight generation part. This results in good adaptability to differences in user distribution and good robustness to changes in online user distribution, reducing performance differences between online and offline models.
[0093] In an exemplary embodiment, Figure 4 is a flowchart of the prediction method based on an ensemble learning model provided in the embodiments of this specification. Referring to Figure 4, the embodiment shown includes S410-S430.
[0094] In S410, the user characteristics of the user to be tested are input into the population segmentation part of the ensemble learning model based on population stratification to obtain the population characteristics corresponding to the user to be tested.
[0095] The aforementioned ensemble learning model based on population stratification was trained using the model training method described in the above embodiments.
[0096] For example, referring to Figure 5, the user features u' of the user to be tested are input into the population segmentation part 10 of the ensemble learning model based on population stratification to obtain the population feature z(u') = e_s' of the user to be tested.
[0097] For example, referring to Figure 5, the user features u' of the user to be tested are input into the feature processing network 101 of the population segmentation part 10 of the ensemble learning model based on population stratification, to obtain the latent vector f(u') of the user to be tested. Then, based on the distance between the latent vector f(u') and each discrete latent variable in the discrete latent variable space E, the discrete latent variable e_s' with the smallest distance is taken as the population feature z(u') corresponding to the user to be tested.
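The nearest-neighbor lookup in the discrete latent variable space can be sketched as follows. Squared Euclidean distance is assumed here because the text does not name the distance measure; `nearest_code`, `E_space`, and the example vectors are hypothetical.

```python
def nearest_code(latent, codebook):
    """Return (index, vector) of the codebook entry closest to latent.

    Uses squared Euclidean distance, which is an assumption of this sketch.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    s = min(range(len(codebook)), key=lambda k: sq_dist(latent, codebook[k]))
    return s, codebook[s]


# Hypothetical discrete latent variable space E with three population centers.
E_space = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
f_u = [0.9, 1.2]  # hypothetical latent vector f(u')
s, z_u = nearest_code(f_u, E_space)
```

Here the latent vector is closest to the second center, so the user is assigned to that population and its center is used as the population feature.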
[0098] In S420, the user characteristics of the user to be tested and the product characteristics of the marketing product are respectively input into the N sub-models of the ensemble learning part, where the i-th sub-model outputs the i-th prediction branch value about the learning objective.
[0099] Where N is an integer greater than 1, and i is a positive integer not greater than N.
[0100] In an exemplary embodiment, the product feature v may include features such as product name, shape, ingredients, and color. It is understood that the product features of the marketing product are not limited and can be set according to the actual needs of the product.
[0101] In this embodiment, the user features u' of the user to be tested are input into the population segmentation part 10 and into each sub-model of the ensemble learning part 30. The population segmentation part 10, together with its discrete latent variable space 102, determines the population to which the user belongs and the corresponding population center (the population feature, i.e., z(u') = e_s'). Each of the N sub-models determines its prediction branch value for the learning objective based on its own input. Specifically, the i-th sub-model makes a prediction about the learning objective based on the input features (v and u') and outputs the i-th prediction branch value.
[0102] In an exemplary embodiment, the ensemble learning model based on population segmentation also includes a semantic addition part 40. The semantic addition part 40 (denoted h()) is used for coarse-grained prediction of the learning objective. It ensures that the vector input into the weight generation part 20 not only reflects the characteristics of the user's population but is also related to the learning objective (in a product marketing scenario, the learning objective may be CTR or CVR). The vector z(u') determined from the discrete latent variable space 102 (denoted as the first population feature) reflects only the characteristics of the user's population. Therefore, in this embodiment, the semantic addition part 40 predicts the learning objective from the first population feature z(u'), which helps constrain the vector representation of the first population feature.
[0103] It is understandable that the feature used by the semantic addition part 40 to predict the learning target is the population center, while the features used by the above sub-models to predict the learning target are richer (v and u'). As a result, the accuracy of the prediction results of the semantic addition part is relatively poor. Therefore, the prediction of the semantic addition part can be referred to as "coarse-grained" prediction, and the prediction of the ensemble learning part can be referred to as "fine-grained" prediction.
[0104] Specifically, after the discrete latent variable z(u') corresponding to the user's population (i.e., the first feature of the user's population) is input into the semantic addition part 40, the semantic addition part 40 performs a coarse-grained prediction on it regarding the learning target. The semantic addition part 40 outputs a coarse-grained prediction value regarding the learning target. Simultaneously, after processing by the semantic addition part 40, the second feature of the user's population h(z(u')) can be obtained. The second feature of the user's population h(z(u')) reflects the characteristics of the user's population and also includes semantic features related to the aforementioned learning target.
[0105] Furthermore, through the weight generation part 20, a set of weights is generated based on the second test population feature h(z(u')) mentioned above. This set of weights can control the contribution of each of the N sub-models in the ensemble learning.
[0106] Continuing to refer to Figure 4, in S430, the predicted value for the user to be tested is determined based on the i-th predicted branch value and the i-th weight, wherein the i-th weight is generated based on the population characteristics corresponding to the user to be tested.
[0107] In an exemplary embodiment, if the weights generated by the weight generation part 20 are [0.1, 0.2, 0.7, 0], and the predicted branch values output by the N sub-models of the ensemble learning part are [y1', y2', y3', y4'], then the total predicted value is: 0.1×y1' + 0.2×y2' + 0.7×y3' + 0×y4'.
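Steps S410 to S430 can be tied together in a short sketch. Every component below is a toy stand-in introduced only so that the example runs: `toy_segment`, `toy_semantics`, `toy_gate`, and `make_constant_model` are hypothetical and do not correspond to trained networks. Only the weights [0.1, 0.2, 0.7, 0] are taken from the example above; the branch values are made up.

```python
def predict(u, v, segment, add_semantics, gate, sub_models):
    """Run S410-S430 for one user u and one product v."""
    z = segment(u)                                    # S410: population feature z(u')
    branches = [model(u, v) for model in sub_models]  # S420: N prediction branch values
    weights = gate(add_semantics(z))                  # weights generated from h(z(u'))
    return sum(w * y for w, y in zip(weights, branches))  # S430: weighted sum


def make_constant_model(value):
    """Hypothetical sub-model that ignores its inputs and returns a fixed score."""
    def model(u, v):
        return value
    return model


def toy_segment(u):
    return [1.0, 0.0]  # stand-in population center


def toy_semantics(z):
    return z  # stand-in for h(): passes the feature through unchanged


def toy_gate(h):
    return [0.1, 0.2, 0.7, 0.0]  # weights from the example above


sub_models = [make_constant_model(y) for y in (0.5, 0.4, 0.2, 0.9)]
y_hat = predict([0.3], [0.8], toy_segment, toy_semantics, toy_gate, sub_models)
```

With these stand-ins the result is 0.1×0.5 + 0.2×0.4 + 0.7×0.2 + 0×0.9, so the fourth sub-model's high score has no influence on the prediction.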
[0108] It should be noted that the above figures are merely illustrative of the processes included in the methods according to exemplary embodiments of this specification, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Furthermore, it is readily understood that these processes may, for example, be executed synchronously or asynchronously in multiple modules.
[0109] The following are embodiments of the apparatus described in this specification, which can be used to execute the embodiments of the methods described in this specification. For details not disclosed in the apparatus embodiments of this specification, please refer to the embodiments of the methods described in this specification.
[0110] Figure 6 is a schematic diagram of a device for determining an ensemble learning model based on population stratification to which an embodiment of this specification can be applied. Referring to Figure 6, the device for determining the ensemble learning model based on population segmentation can be implemented as all or part of an electronic device through software, hardware, or a combination of both. It can also be integrated as an independent module on a server or within an electronic device. The aforementioned ensemble learning model based on population segmentation includes: a population segmentation part, an ensemble learning part comprising N sub-models, and a weight generation part.
[0111] The device 600 for determining the ensemble learning model based on population stratification in the embodiments of this specification includes: a first loss determination module 610, an input module 620, a branch value determination module 630, a second loss determination module 640, and a model training module 650.
[0112] The first loss determination module 610 is used to input the user characteristics of the sample users into the population segmentation part, and determine the population characteristics corresponding to the sample users through the population segmentation part; wherein the user characteristics of the sample users and the population characteristics corresponding to the sample users are used to determine the first loss function; the input module 620 is used to input the population characteristics corresponding to the sample users into the weight generation part, and generate the i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N; the branch value determination module 630 is used to determine the i-th predicted branch value about the learning objective through the i-th sub-model, based on the user characteristics of the sample users and the product characteristics of the marketing products; the second loss determination module 640 is used to determine the fine-grained sample prediction value through the ensemble learning part, based on the i-th predicted branch value and the i-th weight; wherein the second loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample users; and the model training module 650 is used to determine the ensemble learning model based on population segmentation based on the first loss function and the second loss function.
[0113] In an exemplary embodiment, Figure 7 is a schematic structural diagram of a device for determining an ensemble learning model based on population stratification according to another exemplary embodiment of this specification. Referring to Figure 7:
[0114] In an exemplary embodiment, based on the above scheme, the weight generation part is a gating network; the input module 620 is specifically used to: input the population features corresponding to the sample users into the gating network to obtain a sample gate vector of dimension N, wherein the sample gate vector contains the i-th weight corresponding to the i-th sub-model.
[0115] The above-mentioned device 600 for determining the ensemble learning model based on population stratification further includes: a third loss determination module 660, used to determine a third loss function based on the above-mentioned sample gate vector of a preset number of samples;
[0116] The aforementioned model training module 650 is specifically used to: determine the aforementioned ensemble learning model based on population stratification according to the aforementioned first loss function, the aforementioned second loss function, and the aforementioned third loss function.
[0117] In an exemplary embodiment, based on the above scheme, the above-mentioned population segmentation part includes: a feature processing network and a discrete latent variable space;
[0118] The aforementioned first loss determination module 610 is specifically used for: inputting the user features of the sample user into the aforementioned feature processing network for feature processing to obtain the sample latent vector; and comparing the aforementioned sample latent vector with the vector in the aforementioned discrete latent variable space to obtain the first population feature corresponding to the aforementioned sample user; wherein, the aforementioned first loss function is determined based on the aforementioned sample latent vector and the aforementioned first population feature.
[0119] In an exemplary embodiment, based on the above scheme, the model further includes: a semantic addition part; the input module 620 is specifically used to: predict the learning target based on the first group feature through the semantic addition part to obtain the second group feature corresponding to the target user, wherein the second group feature includes semantic features related to the learning target; and input the second group feature into the weight generation part to generate the i-th weight corresponding to the i-th sub-model.
[0120] In an exemplary embodiment, based on the above scheme, the device 600 for determining the ensemble learning model based on population stratification further includes: a fourth loss determination module 670, used to predict the first population features with respect to the learning target to obtain a coarse-grained sample prediction value; and to determine a fourth loss function based on the coarse-grained sample prediction value and the actual value corresponding to the sample user.
[0121] The aforementioned model training module 650 is specifically used to: determine the aforementioned ensemble learning model based on population stratification according to the aforementioned first loss function, the aforementioned second loss function, and the aforementioned fourth loss function.
[0122] In an exemplary embodiment, based on the above scheme, the device 600 for determining the ensemble learning model based on population stratification further includes: an assignment module 680.
[0123] The assignment module 680 is used to: after the fourth loss determination module 670 determines the fourth loss function based on the coarse-grained sample prediction value and the actual value corresponding to the sample user, assign the target gradient determined by the fourth loss function to the user features input to the feature processing network.
[0124] In an exemplary embodiment, based on the above scheme, the device 600 for determining the ensemble learning model based on population stratification further includes: a fifth loss determination module 690, used to determine a fifth loss function based on the parameters of the first layer of the neural network of the feature processing network.
[0125] The aforementioned model training module 650 is specifically used to: determine the aforementioned ensemble learning model based on population stratification according to the aforementioned first loss function, the aforementioned second loss function, and the aforementioned fifth loss function.
[0126] It should be noted that the device for determining the ensemble learning model based on population stratification provided in the above embodiments is only illustrated by the division of the above functional modules when executing the method for determining the ensemble learning model based on population stratification. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0127] Furthermore, the apparatus for determining an ensemble learning model based on population stratification and the method for determining an ensemble learning model based on population stratification provided in the above embodiments belong to the same concept. Therefore, for details not disclosed in the apparatus embodiments of this specification, please refer to the embodiments of the method for determining an ensemble learning model based on population stratification described above, which will not be repeated here.
[0128] Figure 8 is a schematic diagram of a prediction device based on an ensemble learning model to which an embodiment of this specification can be applied. Referring to Figure 8, the prediction device based on the ensemble learning model can be implemented as all or part of an electronic device through software, hardware, or a combination of both. It can also be integrated as a standalone module on a server or within an electronic device. The aforementioned ensemble learning model includes: a population segmentation component, an ensemble learning component comprising N sub-models, and a weight generation component.
[0129] The prediction device 800 based on the ensemble learning model described in the embodiments of this specification includes: a first input module 810, a second input module 820, and a prediction module 830.
[0130] The first input module 810 is used to input the user characteristics of the user to be tested into the audience segmentation part of the audience segmentation-based ensemble learning model to obtain the audience characteristics corresponding to the user to be tested. The audience segmentation-based ensemble learning model is trained according to the determination method of the audience segmentation-based ensemble learning model. The second input module 820 is used to input the user characteristics of the user to be tested and the product characteristics of the marketing product into the N sub-models of the ensemble learning part, respectively. The i-th sub-model outputs the i-th prediction branch value with respect to the learning objective. N is an integer greater than 1, and i is a positive integer not greater than N. The prediction module 830 is used to determine the prediction value of the user to be tested based on the i-th prediction branch value and the i-th weight. The i-th weight is generated based on the audience characteristics corresponding to the user to be tested.
[0131] It should be noted that the prediction device based on the ensemble learning model provided in the above embodiments is only illustrated by the division of the above functional modules when executing the prediction method based on the ensemble learning model. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
[0132] Furthermore, the prediction device based on the ensemble learning model and the prediction method based on the ensemble learning model provided in the above embodiments belong to the same concept. Therefore, for details not disclosed in the device embodiments of this specification, please refer to the above embodiments of the prediction method based on the ensemble learning model in this specification, which will not be repeated here.
[0133] Figure 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of this specification. Referring to Figure 9, the electronic device 900 includes a processor 901 and a memory 902.
[0134] In this embodiment, processor 901 is the control center of the computer system and can be a processor of a physical machine or a processor of a virtual machine. Processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. Processor 901 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). Processor 901 may also include a main processor and a coprocessor; the main processor is used to process data in the wake-up state, and the coprocessor is a low-power processor used to process data in the standby state.
[0135] In the embodiments described in this specification, the processor 901 is specifically used for:
[0136] Executing the method for determining the ensemble learning model based on population segmentation, wherein the model comprises a population segmentation part, an ensemble learning part including N sub-models, and a weight generation part; the method includes: inputting the user features of sample users into the population segmentation part, and determining the population features corresponding to the sample users through the population segmentation part; wherein the user features of the sample users and the population features corresponding to the sample users are used to determine a first loss function; inputting the population features corresponding to the sample users into the weight generation part to generate the i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N; determining, through the i-th sub-model, the i-th prediction branch value for the learning objective based on the user features of the sample users and the product features of the marketing products; determining, through the ensemble learning part, fine-grained sample prediction values based on the i-th prediction branch value and the i-th weight; wherein a second loss function is determined based on the fine-grained sample prediction values and the actual values corresponding to the sample users; and determining the ensemble learning model based on population segmentation based on the first loss function and the second loss function.
[0137] Furthermore, the aforementioned weight generation part is a gating network;
[0138] Inputting the population features corresponding to the sample users into the weight generation part to generate the i-th weight corresponding to the i-th sub-model includes: inputting the population features corresponding to the sample users into the gating network to obtain a sample gate vector of dimension N, wherein the sample gate vector contains the i-th weight corresponding to the i-th sub-model;
[0139] The processor 901 is also specifically used to: determine a third loss function based on the sample gate vectors of a preset number of samples;
[0140] Determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the third loss function.
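A minimal sketch of a gating network and one possible third loss function is given below. The specification states only that the gate vector has dimension N and that the third loss is determined from the gate vectors of a preset number of samples; the linear layer followed by softmax, the balance penalty (squared coefficient of variation of average gate usage), and all function names are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax; the output sums to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate_vector(population_feature, gate_matrix):
    """Assumed gating network: one linear row per sub-model, then softmax."""
    logits = [sum(w * x for w, x in zip(row, population_feature))
              for row in gate_matrix]
    return softmax(logits)

def balance_loss(gate_vectors):
    """Assumed third loss: squared coefficient of variation of the
    average weight each sub-model receives over a batch of samples."""
    n = len(gate_vectors[0])
    usage = [sum(g[i] for g in gate_vectors) / len(gate_vectors)
             for i in range(n)]
    mean = sum(usage) / n
    variance = sum((u - mean) ** 2 for u in usage) / n
    return variance / (mean ** 2)

# N = 3 sub-models, population features of dimension 2 (toy values).
gate_matrix = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
batch = [[1.0, 0.0], [0.0, 1.0]]
gates = [gate_vector(f, gate_matrix) for f in batch]
third_loss = balance_loss(gates)
```

Under this assumed form, the penalty is zero when every sub-model receives the same average weight and grows as the gate concentrates on a few sub-models.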
[0141] Furthermore, the aforementioned population segmentation part includes: a feature processing network and a discrete latent variable space;
[0142] Inputting the user features of the sample users into the population segmentation part and determining, through the population segmentation part, the population features corresponding to the sample users includes: inputting the user features of the sample users into the feature processing network for feature processing to obtain a sample latent vector; and comparing the sample latent vector with the vectors in the discrete latent variable space to obtain the first population feature corresponding to the sample users, wherein the first loss function is determined based on the sample latent vector and the first population feature.
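The comparison against the discrete latent variable space can be sketched as a nearest-neighbour lookup in a small codebook. This is only an illustration: the single linear layer standing in for the feature processing network, the squared Euclidean distance, the use of that distance as the first loss, and the names `feature_network` and `nearest_code` are all assumptions not stated in the specification.

```python
def feature_network(user_features, layer_weights):
    """Hypothetical one-layer stand-in for the feature processing network."""
    return [sum(w * x for w, x in zip(row, user_features))
            for row in layer_weights]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_code(latent, codebook):
    """Return (index, vector) of the codebook entry closest to the latent vector."""
    idx = min(range(len(codebook)),
              key=lambda k: squared_distance(latent, codebook[k]))
    return idx, codebook[idx]

# Toy discrete latent variable space with three population groups.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
layer_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

latent = feature_network([0.9, 0.8, 5.0], layer_weights)
segment, first_population_feature = nearest_code(latent, codebook)

# Assumed first loss: distance between the latent vector and its matched entry.
first_loss = squared_distance(latent, first_population_feature)
```

In this sketch the matched codebook vector plays the role of the first population feature, and the index identifies the population group.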
[0143] Furthermore, the above model also includes: a semantic addition part;
[0144] Inputting the population features corresponding to the sample users into the weight generation part includes: using the semantic addition part to predict the learning target based on the first population features to obtain the second population features corresponding to the target users, wherein the second population features contain semantic features related to the learning target; and inputting the second population features into the weight generation part to generate the i-th weight corresponding to the i-th sub-model.
[0145] Furthermore, the processor 901 is specifically used to: predict the learning target based on the first population features to obtain coarse-grained sample prediction values; and determine a fourth loss function based on the coarse-grained sample prediction values and the actual values corresponding to the sample users.
[0146] Determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the fourth loss function.
[0147] Furthermore, the processor 901 is specifically used to: after the fourth loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample user, assign the target gradient determined based on the fourth loss function to the user features input to the feature processing network.
[0148] Furthermore, the processor 901 is specifically used to: determine a fifth loss function based on the parameters of the first neural network layer of the feature processing network;
[0149] Determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the fifth loss function.
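One way a fifth loss function and the joint training objective could look is sketched below. The specification says only that the fifth loss is determined from the first-layer parameters and that the model is determined from the listed losses; the L1 penalty, the weighted sum, the coefficient values, and the names `l1_penalty` and `total_loss` are assumptions for illustration.

```python
def l1_penalty(first_layer_weights):
    """Assumed fifth loss: sum of absolute values of the first-layer parameters."""
    return sum(abs(w) for row in first_layer_weights for w in row)

def total_loss(losses, coefficients):
    """Assumed joint objective: weighted sum of the individual loss terms."""
    return sum(coefficients[name] * value for name, value in losses.items())

# Toy first-layer parameters of the feature processing network.
first_layer = [[0.5, -0.5], [0.0, 1.0]]

losses = {
    "first": 0.05,                      # placeholder value
    "second": 0.51,                     # placeholder value
    "fifth": l1_penalty(first_layer),   # 0.5 + 0.5 + 0.0 + 1.0
}
# Hypothetical coefficients; the specification does not give any.
coefficients = {"first": 1.0, "second": 1.0, "fifth": 0.01}

objective = total_loss(losses, coefficients)
```

A sparsity-style penalty on the first layer is one common way to discourage reliance on many input features, which is why it is used here as the stand-in.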
[0150] In the embodiments described in this specification, the processor 901 is further specifically used for:
[0151] The user features of a test user are input into the population segmentation part of the ensemble learning model based on population stratification to obtain the population features corresponding to the test user, wherein the ensemble learning model based on population stratification is trained according to the determination method provided in the above scheme. The user features of the test user and the product features of the marketing product are respectively input into the N sub-models of the ensemble learning part, and the i-th sub-model outputs the i-th prediction branch value with respect to the learning objective, where N is an integer greater than 1 and i is a positive integer not greater than N. The predicted value for the test user is determined according to the i-th prediction branch value and the i-th weight, wherein the i-th weight is generated according to the population features corresponding to the test user.
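The prediction flow described above can be sketched end to end with stand-in components. The function `predict` and the toy `toy_segment`, `toy_gate`, and `sub_models` below are hypothetical placeholders for the trained parts of the model, and the weighted sum is an assumed form of the combination.

```python
def predict(user_features, product_features, segment_fn, gate_fn, sub_models):
    """Run the prediction flow: segmentation, weight generation, N branches, combination."""
    population_feature = segment_fn(user_features)
    weights = gate_fn(population_feature)
    if len(weights) != len(sub_models):
        raise ValueError("expected exactly one weight per sub-model")
    branches = [model(user_features, product_features) for model in sub_models]
    return sum(w * b for w, b in zip(weights, branches))

def toy_segment(user_features):
    """Toy segmentation: two population groups split on the first feature."""
    return [1.0, 0.0] if user_features[0] > 0.5 else [0.0, 1.0]

def toy_gate(population_feature):
    """Toy weight generation: route fully to the sub-model of the matched group."""
    return list(population_feature)

# N = 2 toy sub-models that return constant branch values.
sub_models = [lambda user, product: 0.9, lambda user, product: 0.2]

value_a = predict([0.7], [1.0], toy_segment, toy_gate, sub_models)
value_b = predict([0.1], [1.0], toy_segment, toy_gate, sub_models)
```

With these stand-ins, users in different population groups receive different sub-model weights, so the same product yields different predicted values.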
[0152] Memory 902 may include one or more computer-readable storage media, which may be non-transitory. Memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments of this specification, the non-transitory computer-readable storage media in memory 902 is used to store at least one instruction for execution by processor 901 to implement the methods in the embodiments of this specification.
[0153] In some embodiments, the electronic device 900 further includes a peripheral device interface 903 and at least one peripheral device. The processor 901, memory 902, and peripheral device interface 903 are connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of a display screen 904, a camera 905, and an audio circuit 906.
[0154] Peripheral interface 903 can be used to connect at least one input / output (I / O) related peripheral device to processor 901 and memory 902. In some embodiments of this specification, processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in other embodiments of this specification, any one or two of processor 901, memory 902, and peripheral interface 903 can be implemented on separate chips or circuit boards. This specification does not specifically limit the embodiments in this regard.
[0155] Display screen 904 is used to display a user interface (UI). The UI may include graphics, text, icons, videos, and any combination thereof. When display screen 904 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 901 for processing. In this case, display screen 904 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments of this specification, there may be one display screen 904, which is disposed on the front panel of electronic device 900; in other embodiments, there may be at least two display screens 904, respectively disposed on different surfaces of electronic device 900 or in a folded design; in still other embodiments, display screen 904 may be a flexible display screen, disposed on a curved or folded surface of electronic device 900. Furthermore, display screen 904 may be configured as a non-rectangular irregular shape, i.e., a non-rectangular screen. Display screen 904 may be made of materials such as Liquid Crystal Display (LCD) or Organic Light-Emitting Diode (OLED).
[0156] Camera 905 is used to capture images or videos. Optionally, camera 905 includes a front-facing camera and a rear-facing camera. Typically, the front-facing camera is located on the front panel of the electronic device, and the rear-facing camera is located on the back of the electronic device. In some embodiments, there are at least two rear-facing cameras, each being any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, to achieve background blurring by fusion of the main camera and the depth-sensing camera, panoramic shooting by fusion of the main camera and the wide-angle camera, virtual reality (VR) shooting, or other fusion shooting functions. In some embodiments of this specification, camera 905 may also include a flash. The flash can be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm light flash and a cool light flash, which can be used for light compensation at different color temperatures.
[0157] The audio circuit 906 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, and convert the sound waves into electrical signals that are input to the processor 901 for processing. For stereo sound acquisition or noise reduction purposes, there may be multiple microphones, each located in a different part of the electronic device 900. The microphone may also be an array microphone or an omnidirectional microphone.
[0158] Power supply 907 is used to supply power to various components in electronic device 900. Power supply 907 can be AC power, DC power, a disposable battery, or a rechargeable battery. When power supply 907 includes a rechargeable battery, the rechargeable battery can be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery that is charged via a wired line, while a wireless rechargeable battery is a battery that is charged via a wireless coil. The rechargeable battery can also be used to support fast charging technology.
[0159] The block diagrams of the electronic device shown in the embodiments of this specification do not constitute a limitation on the electronic device 900. The electronic device 900 may include more or fewer components than shown, or combine certain components, or use different component arrangements.
[0160] In the description of this specification, it should be understood that the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Those skilled in the art can understand the specific meaning of these terms in this specification based on the specific circumstances. Furthermore, in the description of this specification, unless otherwise stated, "multiple" means two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship.
[0161] This specification also provides a computer-readable storage medium storing instructions that, when executed on a computer or processor, cause the computer or processor to perform one or more steps in the above embodiments. The constituent modules of the above-described population-stratified ensemble learning model determination device and the above-described ensemble learning model prediction device, if implemented as software functional units and sold or used as independent products, can be stored in the above-described computer-readable storage medium.
[0162] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this specification are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in or transmitted through a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The aforementioned available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., Digital Versatile Discs (DVDs)), or semiconductor media (e.g., Solid State Disks (SSDs)).
[0163] It should be noted that the above description describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recorded in the claims may be performed in a different order than that shown in the embodiments and still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0164] The above description is merely a specific embodiment of this specification, but the scope of protection of this specification is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this specification should be included within the scope of protection of this specification. Therefore, equivalent variations made in accordance with the claims of this specification are still within the scope of this specification.
Claims
1. An ensemble learning method based on population stratification, wherein the method is applied to an ensemble learning model based on population stratification, the model including: a population segmentation part, used to determine the population characteristics corresponding to a target user based on the user characteristics of the target user; an ensemble learning part consisting of N sub-models, where N is an integer greater than 1; and a weight generation part, used to generate the i-th weight corresponding to the i-th sub-model based on the population characteristics of the target user, where i is a positive integer not greater than N; the i-th sub-model is used to determine the i-th prediction branch value for the learning objective based on the user characteristics of the target user and the product characteristics of the marketing product; the ensemble learning part is used to determine the prediction value based on the i-th prediction branch value and the i-th weight; the population segmentation part includes a feature processing network and a discrete latent variable space; the feature processing network is used to process the user features of the target user to obtain target latent variables; the discrete latent variable space is used to determine the first population characteristics corresponding to the target user based on the target latent variables; the model also includes a semantic addition part; the semantic addition part is used to predict the learning target based on the first population characteristics to obtain the second population characteristics corresponding to the target user, wherein the second population characteristics include semantic features related to the learning target; and the weight generation part is specifically used to generate the i-th weight corresponding to the i-th sub-model based on the second population characteristics.
2. The method of claim 1, wherein, The ensemble learning model based on population segmentation includes: a population segmentation part, an ensemble learning part comprising N sub-models, and a weight generation part; the method includes: The user characteristics of the sample users are input into the population segmentation part, and the population segmentation part determines the population characteristics corresponding to the sample users; wherein, the user characteristics of the sample users and the population characteristics corresponding to the sample users are used to determine the first loss function; The population characteristics corresponding to the sample users are input into the weight generation part to generate the i-th weight corresponding to the i-th sub-model, where i takes the value of a positive integer not greater than N; Based on the user characteristics of the sample users and the product characteristics of the marketing products, the i-th prediction branch value for the learning objective is determined using the i-th sub-model. Through the ensemble learning component, a fine-grained sample prediction value is determined based on the i-th prediction branch value and the i-th weight; wherein, a second loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample user; The ensemble learning model based on population stratification is determined based on the first loss function and the second loss function.
3. The method of claim 2, wherein the weight generation part is a gating network; the step of inputting the population characteristics corresponding to the sample users into the weight generation part to generate the i-th weight corresponding to the i-th sub-model includes: inputting the population characteristics corresponding to the sample users into the gating network to obtain a sample gate vector of dimension N, wherein the sample gate vector contains the i-th weight corresponding to the i-th sub-model; the method further includes: determining a third loss function based on the sample gate vectors of a preset number of samples; and determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the third loss function.
4. The method of claim 2, wherein the population segmentation part includes: a feature processing network and a discrete latent variable space; the step of inputting the user characteristics of the sample users into the population segmentation part and determining the population characteristics corresponding to the sample users through the population segmentation part includes: inputting the user features of the sample users into the feature processing network for feature processing to obtain a sample latent vector; and comparing the sample latent vector with the vectors in the discrete latent variable space to obtain the first population characteristics corresponding to the sample users; wherein the first loss function is determined based on the sample latent vector and the first population characteristics.
5. The method of claim 4, wherein, The model also includes: a semantic addition component; The step of inputting the population characteristics corresponding to the sample users into the weight generation part includes: The semantic addition part is used to predict the learning target based on the first group characteristics to obtain the second group characteristics corresponding to the target user, wherein the second group characteristics include semantic features related to the learning target; The second population feature is input into the weight generation part to generate the i-th weight corresponding to the i-th sub-model.
6. The method of claim 5, wherein the method further includes: predicting the learning target based on the first population characteristics to obtain coarse-grained sample prediction values; and determining a fourth loss function based on the coarse-grained sample prediction values and the actual values corresponding to the sample users; and determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: determining the ensemble learning model based on population stratification according to the first loss function, the second loss function, and the fourth loss function.
7. The method of claim 6, wherein, After determining the fourth loss function based on the fine-grained sample predicted values and the actual values corresponding to the sample users, the method further includes: The target gradient determined based on the fourth loss function is assigned to the user features input into the feature processing network.
8. The method of claim 4, wherein, The method further includes: The fifth loss function is determined based on the parameters of the first layer of the feature processing network. Determining the ensemble learning model based on population stratification according to the first loss function and the second loss function includes: The ensemble learning model based on population stratification is determined based on the first loss function, the second loss function, and the fifth loss function.
9. The method of claim 2, wherein the method includes: inputting the user characteristics of a user to be tested into the population segmentation part of the ensemble learning model based on population stratification to obtain the population characteristics corresponding to the user to be tested; inputting the user characteristics of the user to be tested and the product characteristics of the marketing product respectively into the N sub-models of the ensemble learning part, wherein the i-th sub-model outputs the i-th prediction branch value for the learning objective, N is an integer greater than 1, and i is a positive integer not greater than N; and determining the predicted value for the user to be tested based on the i-th prediction branch value and the i-th weight, wherein the i-th weight is generated based on the population characteristics corresponding to the user to be tested.
10. A device for determining an ensemble learning model based on population stratification, wherein the ensemble learning model based on population stratification is the model to which the method according to claim 1 is applied and includes: a population segmentation part, an ensemble learning part including N sub-models, and a weight generation part; the device includes: a first loss determination module, used to input the user features of the sample users into the population segmentation part and determine the population features corresponding to the sample users through the population segmentation part, wherein the user features of the sample users and the population features corresponding to the sample users are used to determine the first loss function; an input module, used to input the population characteristics corresponding to the sample users into the weight generation part to generate the i-th weight corresponding to the i-th sub-model, where i is a positive integer not greater than N; a branch value determination module, used to determine, through the i-th sub-model, the i-th prediction branch value for the learning objective based on the user characteristics of the sample users and the product characteristics of the marketing products; a second loss determination module, used to determine, through the ensemble learning part, the fine-grained sample prediction value based on the i-th prediction branch value and the i-th weight, wherein the second loss function is determined based on the fine-grained sample prediction value and the actual value corresponding to the sample user; and a model training module, used to determine the ensemble learning model based on population stratification according to the first loss function and the second loss function.
11. A prediction device based on an ensemble learning model, wherein the device includes: a first input module, used to input the user characteristics of a user to be tested into the population segmentation part of the ensemble learning model based on population stratification to obtain the population characteristics corresponding to the user to be tested, wherein the ensemble learning model based on population stratification is obtained by the method according to any one of claims 1 to 8; a second input module, used to input the user characteristics of the user to be tested and the product characteristics of the marketing product respectively into the N sub-models of the ensemble learning part, wherein the i-th sub-model outputs the i-th prediction branch value for the learning objective, N is an integer greater than 1, and i is a positive integer not greater than N; and a prediction module, used to determine the predicted value for the user to be tested based on the i-th prediction branch value and the i-th weight, wherein the i-th weight is generated based on the population characteristics corresponding to the user to be tested.
12. A computer-readable storage medium having stored therein instructions, wherein, When the instructions are executed on a computer or processor, the computer or processor performs the ensemble learning method based on population stratification as described in any one of claims 1 to 9.
13. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, When the processor executes the computer program, it implements the ensemble learning method based on population stratification as described in any one of claims 1 to 9.
14. A computer program product comprising instructions, wherein, When the computer program product is run on a computer or processor, it causes the computer or processor to perform the ensemble learning method based on population stratification as described in any one of claims 1 to 9.