Debiasing causal inference method and system for advertisement synergistic modeling scenarios
By using deep neural network models and feature balancing techniques, the method addresses confounding bias and sample selection bias in advertising efficiency modeling, thereby reducing advertising costs and supporting rational allocation of resources.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2023-06-16
- Publication Date
- 2026-06-16
AI Technical Summary
Existing advertising efficiency modeling methods have failed to effectively remove confounding bias and sample selection bias in the observation data, resulting in unreasonable allocation of advertising resources and increased advertising costs.
A deep neural network model is used to extract information from store and consumer features. The model is trained with an integral probability metric module and a mean squared error function to eliminate confounding bias and sample selection bias. A balanced feature representation and result prediction network model is constructed to accurately estimate the transaction amount after advertising.
It enables accurate estimation of advertising efficiency in advertising business scenarios, reduces placement costs, and improves the efficiency of rational allocation of advertising resources.
Figure CN116777523B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of causal inference and advertising efficiency modeling, and in particular to a bias-free causal inference method and system for advertising efficiency modeling scenarios.
Background Technology
[0002] With the decline of internet population and traffic dividends, advertisers face challenges in increasing their scale and budgets. Given limited budgets, shifting from a traditional absolute value perspective to an incremental value perspective to help stores and platforms uncover incremental value becomes increasingly important. Efficiency modeling aims to estimate the incremental value of sales generated after advertising campaigns compared to when advertising was not running, based on observational data. This can guide targeted advertising and save on advertising costs. Traditional advertising efficiency modeling methods simply use deep learning and other technologies to build predictive models of consumer and store characteristics and the impact of advertising campaigns on sales, ignoring the complex biases inherent in the observational data. This leads to a significant discrepancy between the estimated advertising efficiency and the actual advertising efficiency in practical applications, severely impacting the rational allocation of advertising resources and increasing advertising costs. Therefore, how to remove the complex biases in the observational data and establish a stable and interpretable advertising efficiency model is currently a key focus and challenge in the field of efficiency modeling.
[0003] In observational data, associations primarily originate from three sources: causal relationships, confounding bias, and sample selection bias. Only causal relationships constitute stable, interpretable, and genuine associations, while the latter two are unstable, uninterpretable, and spurious associations. Causal inference aims to eliminate spurious associations in observational data and uncover reliable causal relationships within the data, making it a powerful statistical modeling tool for interpretability analysis and stable learning. Specifically, a major challenge in causal inference from observational data is that only the outcomes of factual data are observable, while the outcomes of counterfactual data are unobservable. If confounding bias exists due to confounding variables, the distribution of the unobservable counterfactual data will be inconsistent with the distribution of the observable factual data, leading to biased estimates based on observational data. Another major challenge is that sample data is drawn from the entire population under study. If, due to human or uncontrollable factors, the sample selection process is not completely random, sample selection bias will occur, causing the distribution of the sample data to be inconsistent with the overall data distribution and thus biasing the estimates based on sample data. In complex scenarios such as advertising efficiency modeling with massive amounts of high-dimensional data, these two biases often coexist.
Summary of the Invention
[0004] The purpose of this invention is to overcome the shortcomings of existing technologies and provide a causal inference technique for jointly resolving confounding bias and sample selection bias in advertising efficiency modeling scenarios. Compared with general causal inference algorithms, this invention fully considers the objective fact that confounding bias and sample selection bias coexist in advertising efficiency modeling scenarios, and that many assumptions of existing methods are difficult to meet in this scenario. It eliminates confounding bias and sample selection bias without introducing strong assumptions and while ensuring practicality in advertising business scenarios.
[0005] To achieve the above-mentioned objectives, the technical solution adopted by this invention is as follows:
[0006] In a first aspect, the present invention provides a bias-free causal inference method for advertising efficiency improvement modeling scenarios, which includes the following steps:
[0007] S1: Extract the current advertising efficiency logs from the target platform, and based on whether the consumer and store information is recorded in the current advertising efficiency logs, divide the store and consumer feature data into two parts: selected sample data and non-selected sample data. Then, perform sample selection and labeling on the two parts of data respectively. The selected sample data is further divided into advertising delivery groups and blocking groups, thereby constructing a dataset.
[0008] S2: Construct a deep neural network model to extract information from store and consumer features, and train the deep neural network model using the dataset. During training, an integral probability metric module is used to limit the distribution similarity of the extracted information in order to obtain a balanced feature representation between the unselected sample data and the selected sample data, as well as between the advertising group and the advertising blocking group of the selected sample data. After training, a balanced feature extraction model is obtained.
[0009] S3: Construct a post-delivery result prediction network model and a post-blocking result prediction network model based on deep neural networks. Train them using the balanced feature representations of the delivery group and the blocking group of the selected sample data, respectively, to obtain post-delivery and post-blocking result prediction network models from which confounding bias and sample selection bias have been removed.
[0010] S4: Based on the trained balanced feature extraction model, the post-advertising result prediction network model, and the post-blocking result prediction network model, the transaction amount after advertising for target stores and target consumers and the transaction amount after advertising blocking are predicted respectively, and the advertising efficiency corresponding to target stores and target consumers is estimated.
[0011] As a preferred embodiment of the first aspect above, the specific implementation steps of S1 are as follows:
[0012] S101: Based on whether consumer and store information appears in the current period's advertising efficiency log, the store and consumer feature data are divided into two parts: selected sample data and unselected sample data. The current period's advertising efficiency log is extracted from the target platform and records whether the store's advertisements were delivered to consumers, and each consumer's transaction amount at the corresponding store before the end of the current period. Whether a consumer received the advertisement is defined as the treatment variable and denoted as t, where t=1 indicates that the consumer received the advertisement and belongs to the delivery group, and t=0 indicates that the consumer did not receive the advertisement and belongs to the blocking group. The consumer's transaction amount at the corresponding store before the end of the current period is defined as the outcome variable and denoted as y. The set of store and consumer feature variables is represented as $x = \{x_1, x_2, \ldots, x_d\}$, where $x_j$ represents the j-th variable in the feature variable set and d represents the total number of variables in the feature variable set.
[0013] Using s as the sample selection label, store and consumer data that are recorded in the current advertising efficiency log from all stores and all consumers on the target platform are marked as selected sample data, denoted as s=1; store and consumer data that are not recorded in the current advertising efficiency log from all stores and consumers are marked as unselected sample data, denoted as s=0;
[0014] S102: Based on the current advertising efficiency log data and store and consumer characteristic data, represent each data point as a quadruple (x, t, y, s) and construct a dataset; where x, t, y, and s of data with s = 1 are observable, while t and y of data with s = 0 are unobservable, and their values are set to NaN in the dataset.
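The dataset construction in S101 and S102 can be illustrated with a minimal sketch. The `Record` class, the `build_dataset` function, the dictionary layout of `features` and `log`, and the toy store and consumer identifiers are all hypothetical choices made for illustration; the only parts taken from the text are the quadruple (x, t, y, s) and the use of NaN for the unobservable t and y when s = 0.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    x: list    # store and consumer feature variables
    t: float   # treatment: 1.0 = ad delivered, 0.0 = ad blocked, NaN if unobserved
    y: float   # transaction amount in the current period, NaN if unobserved
    s: int     # sample selection label: 1 = in the log, 0 = not in the log


def build_dataset(features, log):
    """features: {(store, consumer): feature list}; log: {(store, consumer): (t, y)}."""
    dataset = []
    for key, x in features.items():
        if key in log:
            # Pair recorded in the efficiency log: selected sample, t and y observed.
            t, y = log[key]
            dataset.append(Record(x=x, t=float(t), y=float(y), s=1))
        else:
            # Pair missing from the log: unselected sample, t and y set to NaN.
            dataset.append(Record(x=x, t=math.nan, y=math.nan, s=0))
    return dataset


# Toy data (hypothetical): three store-consumer pairs, two of them logged.
features = {
    ("shop_a", "u1"): [0.2, 1.0],
    ("shop_a", "u2"): [0.7, 0.0],
    ("shop_a", "u3"): [0.5, 1.0],
}
log = {("shop_a", "u1"): (1, 35.0), ("shop_a", "u2"): (0, 0.0)}
dataset = build_dataset(features, log)
```

Keeping the unselected rows in the dataset, rather than dropping them, is what later allows their features to take part in the balancing step even though they carry no outcome.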
[0015] As a preferred embodiment of the first aspect above, the specific implementation steps of S2 are as follows:
[0016] S201: Construct a deep neural network model F1 for feature learning. Input the set of store and consumer feature variables x from the dataset into the deep neural network model F1 to obtain its vectorized feature representation.
[0017] S202: Based on the treatment variable t and the sample selection label s, the vectorized feature representations output by F1, denoted as $F_1(x)$, are divided into four groups: the feature representations of the unselected sample data, consisting of the representations of all data with s=0 and denoted as $F_1(x)_{s=0}$; the feature representations of the selected sample data, consisting of the representations of all data with s=1 and denoted as $F_1(x)_{s=1}$; the delivery-group feature representations of the selected sample data, consisting of the representations of all data with s=1 and t=1 and denoted as $F_1(x)_{s=1,t=1}$; and the blocking-group feature representations of the selected sample data, consisting of the representations of all data with s=1 and t=0 and denoted as $F_1(x)_{s=1,t=0}$.
[0018] S203: Use an integral probability metric module to calculate the distribution distance $D(F_1(x)_{s=0}, F_1(x)_{s=1})$ between $F_1(x)_{s=0}$ and $F_1(x)_{s=1}$, as well as the distribution distance $D(F_1(x)_{s=1,t=1}, F_1(x)_{s=1,t=0})$ between $F_1(x)_{s=1,t=1}$ and $F_1(x)_{s=1,t=0}$. Train the deep neural network model F1 using the dataset, constraining these distances during training to achieve the following training objective:
[0019] $$\min\ \alpha \cdot D\big(F_1(x)_{s=0},\, F_1(x)_{s=1}\big) + \beta \cdot D\big(F_1(x)_{s=1,t=1},\, F_1(x)_{s=1,t=0}\big)$$
[0020] Where $\min f(\cdot)$ means minimizing the function $f(\cdot)$, $D(\cdot,\cdot)$ represents the integral probability metric function used during training, and α and β are adjustable hyperparameters during training.
[0021] After the deep neural network model F1 is trained, the balanced feature extraction model F'1 is obtained.
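The grouping in S202 can be sketched as follows. The function name `split_representations`, the dictionary keys, and the toy vectors are hypothetical; the sketch only assumes that each representation vector comes with its treatment value t (NaN when unobserved) and its selection label s.

```python
import math


def split_representations(reps, t, s):
    """Partition feature representations into the four groups used for balancing."""
    groups = {"s=0": [], "s=1": [], "s=1,t=1": [], "s=1,t=0": []}
    for rep, t_i, s_i in zip(reps, t, s):
        if s_i == 0:
            # Unselected sample: t is unobserved, so only the s=0 group applies.
            groups["s=0"].append(rep)
            continue
        groups["s=1"].append(rep)
        if t_i == 1:
            groups["s=1,t=1"].append(rep)   # delivery group
        elif t_i == 0:
            groups["s=1,t=0"].append(rep)   # blocking group
    return groups


# Toy representations (hypothetical values).
reps = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
t = [1, 0, math.nan, 1]
s = [1, 1, 0, 1]
groups = split_representations(reps, t, s)
```

The two distribution distances in S203 are then computed between `groups["s=0"]` and `groups["s=1"]`, and between `groups["s=1,t=1"]` and `groups["s=1,t=0"]`.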
[0022] As a preferred embodiment of the first aspect above, the specific implementation steps of S3 are as follows:
[0023] S301: Construct a pair of deep neural network models F2 and F3. The two network models have the same structure and are used to predict the results after deployment and the results after blocking, respectively.
[0024] S302: Input the delivery-group feature representations $F_1(x)_{s=1,t=1}$ of the selected sample data into the network model F2 for predicting post-delivery results to obtain the predicted post-delivery transaction amount $\hat{y}_{t=1}$. Input the blocking-group feature representations $F_1(x)_{s=1,t=0}$ of the selected sample data into the network model F3 for predicting post-blocking results to obtain the predicted post-blocking transaction amount $\hat{y}_{t=0}$.
[0025] S303: Use the mean squared error function to calculate the difference between $\hat{y}_{t=1}$ and the true outcome variable value $y_{t=1}$ in the delivery group (t=1) of the selected sample data, and the difference between $\hat{y}_{t=0}$ and the true outcome variable value $y_{t=0}$ in the blocking group (t=0) of the selected sample data. Train the deep neural network models F2 and F3 using the dataset, constraining these errors during training to achieve the following training objective:
[0026] $$\min\ MSE\big(y_{t=1},\, \hat{y}_{t=1}\big) + \gamma \cdot MSE\big(y_{t=0},\, \hat{y}_{t=0}\big)$$
[0027] Where MSE(·,·) represents the mean squared error function, and γ is an adjustable hyperparameter during the training process, used to balance the difference in the number of data points in the delivery group and the shielded group in the selected sample data;
[0028] After the deep neural network models F2 and F3 are trained, they are used as the network model F'2 for predicting the results after deployment and the network model F'3 for predicting the results after shielding, respectively.
[0029] As a preferred embodiment of the first aspect above, the specific implementation steps of S4 are as follows:
[0030] S401: Input the set of feature variables x of the target store and the target consumer into the trained balanced feature extraction model F'1 for feature learning to obtain its balanced feature representation;
[0031] S402: Input the balanced feature representation obtained in S401 into the post-launch result prediction network model F'2 for predicting post-launch results and the post-blocking result prediction network model F'3 for predicting post-blocking results, respectively, to obtain the predicted values of post-launch transaction amount and post-blocking transaction amount, and use the difference between the two as the estimated value of advertising efficiency.
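The estimation in S401 and S402 amounts to one feature extraction followed by two predictions and a subtraction. In the sketch below, `estimate_uplift` and the three toy callables are hypothetical stand-ins for the trained models F'1, F'2 and F'3; real models would be neural networks rather than the arithmetic used here.

```python
def estimate_uplift(x, extract, predict_delivered, predict_blocked):
    """Estimate advertising efficiency for one store-consumer feature vector x."""
    rep = extract(x)                        # balanced feature representation (F'1)
    y_delivered = predict_delivered(rep)    # predicted amount after delivery (F'2)
    y_blocked = predict_blocked(rep)        # predicted amount after blocking (F'3)
    return y_delivered - y_blocked


# Toy stand-ins for the trained models (hypothetical, for illustration only).
def toy_extract(x):
    return [2.0 * v for v in x]


def toy_f2(rep):
    return sum(rep) + 5.0


def toy_f3(rep):
    return sum(rep)


uplift = estimate_uplift([1.0, 2.0], toy_extract, toy_f2, toy_f3)
```

Both predictors receive the same representation, so the difference isolates the predicted effect of delivering the advertisement for that store and consumer.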
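[0032] As a preferred embodiment of the first aspect above, in S203, the following loss function over the feature representations $F_1(x)$ of the groups defined in S202 is used during the training process to achieve the training objective:
[0033] $$L_{IPM} = \alpha \cdot D\big(F_1(x)_{s=0},\, F_1(x)_{s=1}\big) + \beta \cdot D\big(F_1(x)_{s=1,t=1},\, F_1(x)_{s=1,t=0}\big)$$
[0034] In step S303, the following loss function over the true values $y$ and predicted values $\hat{y}$ of the delivery group (t=1) and the blocking group (t=0) is used during the training process to achieve the training objective:
[0035] $$L_{MSE} = MSE\big(y_{t=1},\, \hat{y}_{t=1}\big) + \gamma \cdot MSE\big(y_{t=0},\, \hat{y}_{t=0}\big)$$
[0036] The deep neural network models F1, F2, and F3 are trained together as a whole. The total loss function used in the training process of the entire network model is as follows:
[0037] $$L = L_{IPM} + L_{MSE}$$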
[0038] As a preferred embodiment of the first aspect mentioned above, both α and β are set to 0.1, and the integral probability metric D(·,·) is the square root of the linear maximum mean discrepancy, calculated using the following formula:
[0039] $$D(a, b) = \sqrt{\sum_{j=1}^{m_d} \left( \frac{1}{m_a} \sum_{i_a=1}^{m_a} a_{i_a,j} - \frac{1}{m_b} \sum_{i_b=1}^{m_b} b_{i_b,j} \right)^2}$$
[0040] Where a and b represent any two sets of data with the same number of variables, $m_d$ represents the number of variables, $m_a$ and $m_b$ represent the number of data entries in a and in b, respectively, $a_{i_a,j}$ represents the value of the j-th variable in the $i_a$-th data entry of a, and $b_{i_b,j}$ represents the value of the j-th variable in the $i_b$-th data entry of b.
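A minimal sketch of this metric is given below, assuming it is the Euclidean distance between the per-variable means of a and b, which is the usual square-root linear maximum mean discrepancy form. The function name `sqrt_linear_mmd` and the toy inputs are hypothetical.

```python
import math


def sqrt_linear_mmd(a, b):
    """Distance between two groups of rows, each row holding m_d variable values."""
    if not a or not b:
        raise ValueError("both groups must contain at least one row")
    m_d = len(a[0])
    if any(len(row) != m_d for row in a + b):
        raise ValueError("all rows must have the same number of variables")
    total = 0.0
    for j in range(m_d):
        mean_a = sum(row[j] for row in a) / len(a)   # mean of variable j in a
        mean_b = sum(row[j] for row in b) / len(b)   # mean of variable j in b
        total += (mean_a - mean_b) ** 2
    return math.sqrt(total)


# Toy groups: means are (1, 1) and (4, 5), so the distance is sqrt(9 + 16).
distance = sqrt_linear_mmd([[0.0, 0.0], [2.0, 2.0]], [[4.0, 5.0]])
```

Because only first moments are compared, this distance is cheap to compute on every training batch, but it is zero for any two groups that share the same per-variable means, whatever else differs between them.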
[0041] As a preferred embodiment of the first aspect mentioned above, γ is taken as the ratio of the number of data entries in the delivery group to the number of data entries in the blocking group in the selected sample data; the mean squared error function MSE(·,·) is calculated using the following formula:
[0042] $$MSE(a, \hat{a}) = \frac{1}{m} \sum_{i=1}^{m} \left( a_i - \hat{a}_i \right)^2$$
[0043] Where $a$ represents the observed value of a certain variable, $\hat{a}$ represents the predicted value of the variable, m represents the number of data points, $a_i$ represents the observed value of $a$ in the i-th data point, and $\hat{a}_i$ represents the predicted value of $a$ in the i-th data point.
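The prediction loss can be sketched as below. The text fixes γ as the ratio of delivery-group size to blocking-group size, but applying γ to the blocking-group error term is an assumption of this sketch, as are the function names and the toy values.

```python
def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(observed, predicted)) / len(observed)


def prediction_loss(y1, y1_hat, y0, y0_hat):
    """Delivery-group MSE plus gamma-weighted blocking-group MSE.

    gamma = (delivery-group size) / (blocking-group size), as in the text;
    attaching gamma to the blocking-group term is an assumption.
    """
    gamma = len(y1) / len(y0)
    return mse(y1, y1_hat) + gamma * mse(y0, y0_hat), gamma


# Toy values: four delivery-group and two blocking-group data points.
loss, gamma = prediction_loss(
    [10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 30.0, 40.0],
    [5.0, 7.0], [4.0, 9.0],
)
```

Here the delivery-group error is 2.0, the blocking-group error is 2.5, and γ is 2.0, so the smaller blocking group is weighted up rather than being dominated by the larger delivery group.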
[0044] In a second aspect, the present invention provides a method for selecting target users for advertising efficiency enhancement on online shopping platforms. Specifically, for a target store, the method described in any of the solutions in the first aspect above is used to obtain an estimated advertising efficiency value between the target store and each consumer on the platform. Then, based on these estimated values, consumers on the platform are screened, and the screening results are used as the target users for advertising efficiency enhancement of the target store.
[0045] In a third aspect, the present invention provides a target user screening system for advertising efficiency enhancement on online shopping platforms, comprising:
[0046] The advertising efficiency estimation module, which is used to obtain an estimated advertising efficiency value between the target store and each consumer on the platform, based on the bias-free causal inference method described in any of the solutions in the first aspect above.
[0047] The user filtering module is used to filter consumers on the platform based on the advertising efficiency estimate according to the preset filtering strategy, and use the filtering results as the target users for advertising efficiency enhancement of the target store.
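The text leaves the preset filtering strategy open. The sketch below shows two plausible strategies, a minimum estimated efficiency and a top-k cut, purely as an illustration; `select_target_users`, its parameters, and the toy estimates are hypothetical.

```python
def select_target_users(uplift_by_consumer, top_k=None, min_uplift=None):
    """Rank consumers by estimated advertising efficiency and filter them."""
    ranked = sorted(uplift_by_consumer.items(), key=lambda kv: kv[1], reverse=True)
    if min_uplift is not None:
        # Keep only consumers whose estimate reaches the threshold.
        ranked = [(c, u) for c, u in ranked if u >= min_uplift]
    if top_k is not None:
        # Keep only the k consumers with the highest estimates.
        ranked = ranked[:top_k]
    return [consumer for consumer, _ in ranked]


# Toy estimates for one target store (hypothetical values).
estimates = {"u1": 3.5, "u2": -1.0, "u3": 8.2, "u4": 0.4}
top_two = select_target_users(estimates, top_k=2)
positive = select_target_users(estimates, min_uplift=0.0)
```

A threshold of zero drops consumers for whom delivery is predicted to add nothing, while a top-k cut suits a fixed advertising budget.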
[0048] Compared with the prior art, the present invention has the following beneficial effects:
[0049] This invention addresses advertising efficiency modeling in which confounding bias and sample selection bias coexist. It points out that sample selection bias arises from the inconsistency between the distribution of the store and consumer features in the observed subset (for which ad delivery and / or blocking was recorded) and the distribution of all store and consumer features, while confounding bias arises from the inconsistency between the observed distributions of store and consumer features in the ad delivery group and the ad blocking group. Therefore, this invention proposes a method that jointly optimizes a balanced feature representation learning model, which constrains the store and consumer feature distributions of the delivery and blocking groups to be similar and those of the selected and unselected samples to be similar, together with a pair of prediction network models for post-delivery and post-blocking results, thereby accurately estimating advertising efficiency. The invention can further be applied to target user screening for advertising efficiency enhancement on online shopping platforms, selecting target users with high consumption potential based on the estimated advertising efficiency. Practical results demonstrate the superior performance of the proposed method. This invention can also be directly applied to other causal inference tasks on observational data exhibiting confounding bias and / or sample selection bias.
Attached Figure Description
[0050] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0051] Figure 1 is a schematic diagram of the causal inference process for jointly resolving confounding bias and sample selection bias in advertising efficiency modeling scenarios, provided by an embodiment of the present invention.
[0052] Figure 2 is a logic diagram of the collaborative learning model for balanced representation and transaction amount prediction in an embodiment of the present invention.
Detailed Implementation
[0053] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0054] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0055] To address the problems existing in current technologies, this invention focuses on simultaneously resolving confounding bias and sample selection bias in observed data without introducing strong assumptions and while ensuring practicality in advertising business scenarios. Confounding bias arises from the difference in data distribution between the treatment group and the control group in the observed data, while sample selection bias arises from the difference between the distribution of the overall data and the distribution of the sample data. Recently, some researchers have studied causal inference methods to address confounding bias, using methods such as reweighting and feature extraction to make the distributions of the treatment and control groups in the sample data similar. However, they ignore sample selection bias and only use subsample data with observable results for treatment effect estimation, making the estimation results still affected by sample selection bias. Other researchers have studied causal inference methods to address sample selection bias, but they often make strong assumptions about the distribution of data and noise, the functional form of the outcome variable generation process, the functional form of the sample selection process, and the causal graph structure. These assumptions are often difficult to meet in reality, thus limiting their widespread application in advertising business scenarios. In summary, there is currently no mature method that can simultaneously resolve confounding bias and sample selection bias without introducing strong assumptions.
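A small worked example may help show why a difference in distribution between the treatment and control groups biases a naive comparison. All numbers below are invented for illustration, and the stratified correction is a textbook adjustment shown only for contrast; it is not the method of this invention.

```python
def mean(values):
    return sum(values) / len(values)


TRUE_UPLIFT = 10.0  # every consumer spends 10 more when shown the ad (toy assumption)

# (baseline spend, number shown the ad, number not shown the ad)
# High-spend consumers are much more likely to be shown the ad: a confounder.
strata = [(100.0, 8, 2), (20.0, 2, 8)]

treated, control, per_stratum = [], [], []
for baseline, n_treated, n_control in strata:
    t_vals = [baseline + TRUE_UPLIFT] * n_treated
    c_vals = [baseline] * n_control
    treated += t_vals
    control += c_vals
    # Within one stratum the two groups are comparable.
    per_stratum.append((mean(t_vals) - mean(c_vals), n_treated + n_control))

# Naive estimate: compare the raw group means, ignoring the confounder.
naive = mean(treated) - mean(control)

# Stratified estimate: average the within-stratum differences by stratum size.
adjusted = sum(d * n for d, n in per_stratum) / sum(n for _, n in per_stratum)
```

In this toy setting the naive estimate is 58.0 while the true effect is 10.0, because the treated group is dominated by consumers who would have spent more anyway. Sample selection bias distorts estimates in a similar way when the logged subset is not representative of all stores and consumers.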
[0056] As shown in Figure 1, in a preferred embodiment of the present invention, a bias-free causal inference method for advertising efficiency improvement modeling scenarios is provided, comprising the following steps:
[0057] S1: Extract the current advertising efficiency logs from the target platform, and based on whether the consumer and store information is recorded in the current advertising efficiency logs, divide the store and consumer feature data into two parts: selected sample data and non-selected sample data. Then, perform sample selection labeling on the two parts of data. The selected sample data is further divided into advertising delivery group and blocking group, thereby constructing a dataset.
[0058] In this embodiment, the specific implementation steps of S1 are as follows:
[0059] S101: Based on whether consumer and store information appears in the current period's advertising efficiency log, the store and consumer feature data are divided into two parts: selected sample data and unselected sample data. The current period's advertising efficiency log is extracted from the target platform and records whether the store's advertisements were delivered to consumers, and each consumer's transaction amount at the corresponding store before the end of the current period. Whether the consumer received the advertisement is defined as the treatment variable and denoted as t, where t=1 indicates that the consumer received the advertisement (i.e., belongs to the delivery group), and t=0 indicates that the consumer did not receive the advertisement (i.e., belongs to the blocking group). The consumer's transaction amount at the corresponding store before the end of the current period is defined as the outcome variable and denoted as y. The set of store and consumer feature variables is represented as $x = \{x_1, x_2, \ldots, x_d\}$, where $x_j$ represents the j-th variable in the feature variable set and d represents the total number of variables in the feature variable set.
[0060] In this embodiment, due to time and cost constraints, only a sub-sample of all stores and consumers is monitored and recorded in the current advertising effectiveness log. Therefore, the current advertising effectiveness log records whether advertisements from some stores were delivered to some consumers during the current period, and the transaction amount of consumers in the corresponding stores before the end of the current period. In this embodiment, s is used as the sample selection marker. The store and consumer data recorded in the current advertising effectiveness log from all store and consumer data on the target platform are marked as selected sample data, denoted as s=1; the store and consumer data not recorded in the current advertising effectiveness log from all store and consumer data are marked as unselected sample data, denoted as s=0.
[0061] S102: Based on the current advertising efficiency log data and store and consumer characteristic data, represent each data point used for training and testing as a quadruple (x, t, y, s) and construct a dataset; where x, t, y, and s of data with s = 1 are observable, while t and y of data with s = 0 are unobservable, and their values are set to NaN in the dataset.
[0062] Table 1 below shows the sample selection labels and dataset construction results for some data in the embodiments of the present invention.
[0063] Table 1 shows the sample selection tags and dataset construction results in the examples.
[0064]
[0065] S2: Construct a deep neural network model to extract information from store and consumer features, and train the deep neural network model using the dataset. During the training process, an integral probability metric module is used to limit the distribution similarity of the extracted information in order to obtain a balanced feature representation between the unselected sample data and the selected sample data, as well as between the advertising group and the advertising blocking group of the selected sample data. After training, a balanced feature extraction model is obtained.
[0066] In this embodiment, the specific implementation steps of S2 are as follows:
[0067] S201: Construct a deep neural network model F1 for feature learning. Input the set of store and consumer feature variables x from the dataset into the deep neural network model F1 to obtain its vectorized feature representation.
[0068] S202: Based on the treatment variable t and the sample selection label s, the vectorized feature representations output by F1, denoted as $F_1(x)$, are divided into four groups: the feature representations of the unselected sample data, consisting of the representations of all data with s=0 and denoted as $F_1(x)_{s=0}$; the feature representations of the selected sample data, consisting of the representations of all data with s=1 and denoted as $F_1(x)_{s=1}$; the delivery-group feature representations of the selected sample data, consisting of the representations of all data with s=1 and t=1 and denoted as $F_1(x)_{s=1,t=1}$; and the blocking-group feature representations of the selected sample data, consisting of the representations of all data with s=1 and t=0 and denoted as $F_1(x)_{s=1,t=0}$.
[0069] S203: Use an integral probability metric module to calculate the distribution distance $D(F_1(x)_{s=0}, F_1(x)_{s=1})$ between $F_1(x)_{s=0}$ and $F_1(x)_{s=1}$, as well as the distribution distance $D(F_1(x)_{s=1,t=1}, F_1(x)_{s=1,t=0})$ between $F_1(x)_{s=1,t=1}$ and $F_1(x)_{s=1,t=0}$. Train the deep neural network model F1 using the dataset, constraining these distances during training to achieve the following training objective:
[0070] $$\min\ \alpha \cdot D\big(F_1(x)_{s=0},\, F_1(x)_{s=1}\big) + \beta \cdot D\big(F_1(x)_{s=1,t=1},\, F_1(x)_{s=1,t=0}\big)$$
[0071] Where $\min f(\cdot)$ means minimizing the function $f(\cdot)$, $D(\cdot,\cdot)$ represents the integral probability metric function used during training, and α and β are adjustable hyperparameters during training.
[0072] After the deep neural network model F1 is trained, the balanced feature extraction model F'1 is obtained.
[0073] S3: Construct a post-delivery result prediction network model and a post-blocking result prediction network model based on deep neural networks. Train them using the balanced feature representations of the delivery group and the blocking group of the selected sample data, respectively, to obtain post-delivery and post-blocking result prediction network models from which confounding bias and sample selection bias have been removed.
[0074] In this embodiment, the specific implementation steps of S3 are as follows:
[0075] S301: Construct a pair of deep neural network models F2 and F3. The two network models have the same structure and are used to predict the results after deployment and the results after blocking, respectively.
[0076] It should be noted that the internal network structure and processing flow of the two models F2 and F3 are the same, but their training data are different, so they give the predicted values of the outcome variable under different values of the treatment variable. See S302 to S303 for details.
[0077] S302: Input the delivery-group feature representations $F_1(x)_{s=1,t=1}$ of the selected sample data into the network model F2 for predicting post-delivery results to obtain the predicted post-delivery transaction amount $\hat{y}_{t=1}$. Input the blocking-group feature representations $F_1(x)_{s=1,t=0}$ of the selected sample data into the network model F3 for predicting post-blocking results to obtain the predicted post-blocking transaction amount $\hat{y}_{t=0}$.
[0078] In the above steps, only the delivery-group data of the selected sample is used to train the post-delivery result prediction model, and only the blocking-group data of the selected sample is used to train the post-blocking result prediction model. Because this invention replaces the original feature variables with the balanced vectorized feature representation, the influence of the distribution inconsistency caused by confounding bias and sample selection bias is eliminated, and the trained model is unbiased.
[0079] S303: Use the mean squared error function to calculate the difference between $\hat{y}_{t=1}$ and the true outcome variable value $y_{t=1}$ in the delivery group (t=1) of the selected sample data, and the difference between $\hat{y}_{t=0}$ and the true outcome variable value $y_{t=0}$ in the blocking group (t=0) of the selected sample data. Train the deep neural network models F2 and F3 using the dataset, constraining these errors during training to achieve the following training objective:
[0080] $$\min\ MSE\big(y_{t=1},\, \hat{y}_{t=1}\big) + \gamma \cdot MSE\big(y_{t=0},\, \hat{y}_{t=0}\big)$$
[0081] Where MSE(·,·) represents the mean squared error function, and γ is an adjustable hyperparameter during the training process, used to balance the difference in the number of data points in the delivery group and the shielded group in the selected sample data;
[0082] After the deep neural network models F2 and F3 are trained, they are used as the network model F'2 for predicting the results after deployment and the network model F'3 for predicting the results after shielding, respectively.
[0083] It should be noted that this invention requires training three deep neural network models, namely F1, F2, and F3. In this embodiment, the three deep neural network models F1, F2, and F3 can be trained together as a whole; therefore, a general loss function needs to be constructed to achieve the joint training of the three models. In this embodiment, in S203 above, the following loss function is used during the training process to achieve the training objective:
[0084]
[0085] In S303 above, the following loss function is used during training to achieve the training objective:
[0086]
[0087] Therefore, the total loss function used during the training of the entire network model is as follows:
[0088]
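The three loss functions referenced in [0084], [0086] and [0088] can be sketched as follows. This is a hedged reconstruction based only on the symbol definitions in the surrounding paragraphs: the names $L_{\mathrm{IPM}}$, $L_{\mathrm{MSE}}$ and $L$, the representation symbols $Z$, and the assignment of α, β and γ to particular terms are assumptions, not confirmed notation of the patent.

```latex
% Assumed notation (not from the source):
%   Z_{s=0}, Z_{s=1}         representations of unselected / selected sample data
%   Z_{s=1,t=1}, Z_{s=1,t=0} representations of the delivery / blocking group
%   alpha on the selection-balance term, beta on the treatment-balance term,
%   gamma on the blocking-group error
\begin{align}
L_{\mathrm{IPM}} &= \alpha\, D\!\left(Z_{s=0},\, Z_{s=1}\right)
                  + \beta\, D\!\left(Z_{s=1,t=1},\, Z_{s=1,t=0}\right) \\
L_{\mathrm{MSE}} &= \mathrm{MSE}\!\left(y_{t=1},\, \hat{y}_{t=1}\right)
                  + \gamma\, \mathrm{MSE}\!\left(y_{t=0},\, \hat{y}_{t=0}\right) \\
L &= L_{\mathrm{IPM}} + L_{\mathrm{MSE}}
\end{align}
```

Under this reading, minimizing $L$ updates F1 through both terms, while F2 and F3 are updated only through $L_{\mathrm{MSE}}$, which matches the joint training described in [0083].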
[0089] In this embodiment, when computing the loss term of S203, both α and β are set to 0.1, and the integral probability metric D(·,·) is the square-root linear maximum mean discrepancy metric, whose general formula is as follows:
[0090]
[0091] where a and b denote any two data sets with the same number of variables; $m_d$ denotes the number of variables; $m_a$ and $m_b$ denote the number of data entries in a and in b, respectively; $a_{i_a j}$ denotes the value of the j-th variable in the $i_a$-th data entry of a; and $b_{i_b j}$ denotes the value of the j-th variable in the $i_b$-th data entry of b.
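A minimal sketch of a linear maximum mean discrepancy between two sets of feature representations, using only the quantities defined above (per-variable means over the data entries of a and b). Whether the patent's formula keeps the square root cannot be recovered from the text, so the `squared` flag and all function names here are assumptions for illustration.

```python
from math import sqrt
from typing import List, Sequence

Row = Sequence[float]


def column_means(rows: Sequence[Row]) -> List[float]:
    """Mean of each variable (column) over all data entries (rows)."""
    m = len(rows)
    return [sum(col) / m for col in zip(*rows)]


def linear_mmd(a: Sequence[Row], b: Sequence[Row], squared: bool = True) -> float:
    """Linear MMD between data sets a (m_a rows) and b (m_b rows).

    ASSUMPTION: the distance is the sum, over the m_d variables, of the
    squared difference between the per-variable means of a and b.
    With squared=False the square root of that sum is returned instead.
    """
    if len(a[0]) != len(b[0]):
        raise ValueError("a and b must have the same number of variables")
    mean_a, mean_b = column_means(a), column_means(b)
    d2 = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    return d2 if squared else sqrt(d2)


# Toy representations: 2 delivery-group rows and 3 blocking-group rows,
# each with m_d = 2 variables.
z_delivery = [[1.0, 2.0], [3.0, 4.0]]
z_blocking = [[1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
print(linear_mmd(z_delivery, z_blocking))
print(linear_mmd(z_delivery, z_blocking, squared=False))
```

The two sets may contain different numbers of entries, which is why the means are taken separately before they are compared.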
[0092] In this embodiment, when computing the loss term of S303, γ is set to the ratio of the number of data entries in the delivery group to the number of data entries in the blocking group of the selected sample data. The general formula of the mean square error function MSE(·,·) is as follows:
[0093] $$\mathrm{MSE}(a,\hat{a})=\frac{1}{m}\sum_{i=1}^{m}\left(a_i-\hat{a}_i\right)^2$$
[0094] where $a$ denotes the observed values of a variable, $\hat{a}$ denotes the predicted values of that variable, $m$ denotes the number of data entries, $a_i$ denotes the observed value of $a$ in the i-th data entry, and $\hat{a}_i$ denotes the predicted value of $a$ in the i-th data entry.
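The mean square error and the role of γ can be illustrated with a short sketch. The function names are hypothetical, and attaching γ to the blocking-group error is an assumption; the text only states that γ is the delivery-to-blocking ratio of data entry counts.

```python
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error: (1/m) * sum over i of (a_i - a_hat_i)^2."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(observed)
    return sum((a - p) ** 2 for a, p in zip(observed, predicted)) / m


def outcome_loss(y1: Sequence[float], y1_hat: Sequence[float],
                 y0: Sequence[float], y0_hat: Sequence[float]) -> float:
    """Prediction loss over the delivery (t=1) and blocking (t=0) groups."""
    # gamma = (# delivery-group entries) / (# blocking-group entries)
    gamma = len(y1) / len(y0)
    # ASSUMPTION: gamma multiplies the blocking-group error term.
    return mse(y1, y1_hat) + gamma * mse(y0, y0_hat)


# Toy transaction amounts: 2 delivery-group and 4 blocking-group entries.
y1, y1_hat = [10.0, 20.0], [12.0, 18.0]
y0, y0_hat = [5.0, 7.0, 9.0, 11.0], [5.0, 8.0, 9.0, 10.0]
print(outcome_loss(y1, y1_hat, y0, y0_hat))
```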
[0095] After training, the three deep neural network models F1, F2 and F3 become the balanced feature extraction model F'1, the post-delivery result prediction network model F'2 and the post-blocking result prediction network model F'3, respectively. For any store and consumer, these three trained models can then predict the transaction amount with advertising delivered and with advertising blocked, as described in detail below.
[0096] S4: Using the trained balanced feature extraction model, post-delivery result prediction network model and post-blocking result prediction network model, predict the transaction amount of the target store and target consumer after ad delivery and after ad blocking, respectively, and estimate the corresponding advertising efficiency.
[0097] In this embodiment, the specific implementation steps of S4 are as follows:
[0098] S401: Input the feature variable set x of the target store and target consumer into the trained balanced feature extraction model F'1 to obtain its balanced feature representation;
[0099] S402: Input the balanced feature representation obtained in S401 into the post-delivery result prediction network model F'2 and the post-blocking result prediction network model F'3, respectively, to obtain the predicted post-delivery and post-blocking transaction amounts, and take the difference between the two as the estimated advertising efficiency.
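Steps S401 and S402 can be sketched as a small function that chains the three trained models. The models are passed in as plain callables, and the `toy_*` stand-ins below are hypothetical placeholders, not the networks described in the patent.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def estimate_ad_efficiency(
    x: Vector,
    f1: Callable[[Vector], Vector],   # balanced feature extraction model F'1
    f2: Callable[[Vector], float],    # post-delivery prediction model F'2
    f3: Callable[[Vector], float],    # post-blocking prediction model F'3
) -> float:
    """Estimated advertising efficiency for one store-consumer pair."""
    z = f1(x)                # S401: balanced feature representation
    y_delivery = f2(z)       # S402: predicted amount after ad delivery
    y_blocking = f3(z)       # S402: predicted amount after ad blocking
    return y_delivery - y_blocking


# Hypothetical stand-ins for trained models (illustration only).
def toy_f1(x: Vector) -> List[float]:
    return [0.5 * v for v in x]


def toy_f2(z: Vector) -> float:
    return 3.0 * sum(z) + 1.0


def toy_f3(z: Vector) -> float:
    return 2.0 * sum(z)


x_pair = [2.0, 4.0]  # feature variable set x of one store-consumer pair
print(estimate_ad_efficiency(x_pair, toy_f1, toy_f2, toy_f3))
```

Both predictions share the same representation z, so the difference isolates the predicted effect of delivery versus blocking for that pair.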
[0100] Thus, for each store-consumer pair, the advertising efficiency can be estimated with S401 and S402 above. The counterfactual outcomes originally missing from the data (the outcome at t=0 for data observed at t=1, and the outcome at t=1 for data observed at t=0), as well as the unobserved factual and counterfactual outcomes of data not selected into the sample (the outcomes at t=0 and t=1 for data with s=0), can all be predicted without bias.
[0101] Furthermore, this invention can also be used to estimate the average treatment effect over all data. That is, to estimate the advertising efficiency across all stores and all consumers, the post-delivery and post-blocking transaction amounts are predicted for every store and consumer, and the corresponding average is calculated according to the following formula:
[0102] $$\widehat{\mathrm{ATE}}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}^{(i)}_{t=1}-\hat{y}^{(i)}_{t=0}\right)$$
[0103] where $n$ denotes the number of data entries in the entire dataset, $\hat{y}^{(i)}_{t=1}$ denotes the estimated outcome of the i-th data entry after ad delivery, and $\hat{y}^{(i)}_{t=0}$ denotes the estimated outcome of the i-th data entry after ad blocking.
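As a small worked example of this average, assuming the per-entry predictions have already been produced by the trained models (the function name is hypothetical):

```python
from typing import Sequence


def average_treatment_effect(y_delivery_hat: Sequence[float],
                             y_blocking_hat: Sequence[float]) -> float:
    """Mean over all n entries of (post-delivery minus post-blocking prediction)."""
    if len(y_delivery_hat) != len(y_blocking_hat) or len(y_delivery_hat) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_delivery_hat)
    return sum(d - b for d, b in zip(y_delivery_hat, y_blocking_hat)) / n


# Toy predictions for n = 4 store-consumer pairs.
pred_delivery = [10.0, 6.0, 8.0, 4.0]
pred_blocking = [7.0, 5.0, 8.0, 4.0]
print(average_treatment_effect(pred_delivery, pred_blocking))
```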
[0104] The overall framework corresponding to S1 to S4 above is shown in Figure 2. The deep neural network model for feature representation learning and the deep neural network models for predicting the post-delivery and post-blocking results are combined into a whole and trained together. That is, all three models must be trained and optimized jointly to obtain the balanced feature extraction model F'1, the post-delivery result prediction network model F'2, and the post-blocking result prediction network model F'3.
[0105] In this invention, a deep neural network model for feature representation learning extracts information from the feature variable set of all stores and consumers. It constrains the feature distributions of the delivery group and the blocking group, and those of the selected and unselected samples, to be similar, yielding a balanced feature representation. This effectively removes the distribution differences caused by confounding bias and sample selection bias. As a result, the post-delivery and post-blocking transaction amounts of all stores and consumers can be predicted using only the observed transaction amounts and balanced vectorized features of the selected sample data. In this way, the invention can estimate the advertising efficiency of all stores and consumers without bias.
[0106] The performance of the bias-free causal inference method for advertising efficiency modeling scenarios shown in S1 to S4 can be evaluated objectively with the following indicators:
[0107] 1) Bias. This indicator compares the estimated average advertising efficiency in each experiment with the true value and takes the absolute value of the mean difference, measuring the correctness of the estimate.
[0108] $$\mathrm{Bias}=\left|\frac{1}{k}\sum_{i=1}^{k}\left(\widehat{\mathrm{ATE}}_i-\mathrm{ATE}_i\right)\right|$$
[0109] where $k$ denotes the number of experiments, set to 50; $\widehat{\mathrm{ATE}}_i$ denotes the estimated average advertising efficiency in the i-th experiment; and $\mathrm{ATE}_i$ denotes the true average advertising efficiency in the i-th experiment.
[0110] 2) Standard Deviation (SD). This indicator is the standard deviation of the estimated average advertising efficiency across the experiments, measuring the stability of the estimate.
[0111] $$\mathrm{SD}=\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\widehat{\mathrm{ATE}}_i-\frac{1}{k}\sum_{j=1}^{k}\widehat{\mathrm{ATE}}_j\right)^2}$$
[0112] 3) Mean Absolute Error (MAE). This indicator is the mean of the absolute differences between the estimated and true average advertising efficiency, measuring the accuracy of the estimate.
[0113] $$\mathrm{MAE}=\frac{1}{k}\sum_{i=1}^{k}\left|\widehat{\mathrm{ATE}}_i-\mathrm{ATE}_i\right|$$
[0114] 4) Root Mean Square Error (RMSE). This indicator is the root mean square of the differences between the estimated and true average advertising efficiency, measuring both the accuracy and the stability of the estimate.
[0115] $$\mathrm{RMSE}=\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\widehat{\mathrm{ATE}}_i-\mathrm{ATE}_i\right)^2}$$
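The four indicators can be computed from the per-experiment estimates as in the sketch below. It follows the verbal definitions above; dividing by k (rather than k-1) in the standard deviation is an assumption, as are the function and key names.

```python
from math import sqrt
from typing import Dict, Sequence


def evaluate_ate_estimates(ate_hat: Sequence[float],
                           ate_true: Sequence[float]) -> Dict[str, float]:
    """Bias, SD, MAE and RMSE of estimated vs. true average efficiency over k experiments."""
    if len(ate_hat) != len(ate_true) or len(ate_hat) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    k = len(ate_hat)
    errors = [est - true for est, true in zip(ate_hat, ate_true)]
    mean_hat = sum(ate_hat) / k
    return {
        # absolute value of the mean difference
        "bias": abs(sum(errors) / k),
        # ASSUMPTION: population standard deviation (divide by k)
        "sd": sqrt(sum((est - mean_hat) ** 2 for est in ate_hat) / k),
        # mean of the absolute differences
        "mae": sum(abs(e) for e in errors) / k,
        # root of the mean squared difference
        "rmse": sqrt(sum(e * e for e in errors) / k),
    }


# Toy example with k = 4 experiments (the embodiment uses k = 50).
metrics = evaluate_ate_estimates([1.0, 3.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0])
print(metrics)
```

In this toy case the errors cancel, so Bias is zero while MAE and RMSE are not, which is why the indicators are reported together.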
[0116] The results show that the estimates of this invention are essentially consistent with the true advertising efficiency and meet application requirements, and that the estimation method is both stable and accurate. In another embodiment of this invention, a method for screening target users for advertising efficiency enhancement on online shopping platforms is further provided, the specific steps of which are as follows:
[0117] For the target store, the bias-free causal inference method for advertising efficiency modeling scenarios shown in S1 to S4 above is used to obtain the advertising efficiency estimate between the target store and each consumer on the platform. Consumers on the platform are then screened based on these estimates, and the screening results are used as the target users for advertising efficiency enhancement of the target store.
[0118] Note that the advertising efficiency for each pairing of the target store with a consumer on the platform is estimated by step S4 described above: the feature variable set x of the target store and the corresponding consumer is input into the trained balanced feature extraction model F'1 to obtain its balanced feature representation, which is then input into the post-delivery result prediction network model F'2 and the post-blocking result prediction network model F'3, respectively, to obtain the predicted post-delivery and post-blocking transaction amounts; the difference between the two is used as the advertising efficiency estimate.
[0119] In another embodiment of the present invention, a target user screening system for advertising efficiency enhancement on online shopping platforms is further provided, comprising:
[0120] The advertising efficiency estimation module, which obtains the advertising efficiency estimate between the target store and each consumer on the platform by using the bias-free causal inference method for advertising efficiency modeling scenarios shown in S1 to S4 above.
[0121] The user filtering module is used to filter consumers on the platform based on the advertising efficiency estimate according to the preset filtering strategy, and use the filtering results as the target users for advertising efficiency enhancement of the target store.
[0122] Note that the specific screening strategy in this invention can be adjusted according to actual needs. For example, a minimum advertising efficiency threshold can be set, and all consumers on the platform whose estimated advertising efficiency exceeds this threshold are selected as target users. Alternatively, given the number of users to whom the target store intends to deliver advertising, consumers can be ranked in descending order of estimated advertising efficiency and the specified number with the highest estimates selected as target users, thereby maximizing the increase in transaction amount.
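The two screening strategies just described can be sketched as follows, assuming the per-consumer advertising efficiency estimates for one target store are already available as a mapping. The consumer identifiers and function names are hypothetical.

```python
from typing import Dict, List


def rank_consumers(estimates: Dict[str, float]) -> List[str]:
    """Consumer ids in descending order of estimated advertising efficiency."""
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    return [consumer for consumer, _ in ranked]


def select_by_threshold(estimates: Dict[str, float],
                        min_efficiency: float) -> List[str]:
    """All consumers whose estimate exceeds the minimum threshold."""
    return [c for c in rank_consumers(estimates) if estimates[c] > min_efficiency]


def select_top_k(estimates: Dict[str, float], k: int) -> List[str]:
    """The k consumers with the highest estimates."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return rank_consumers(estimates)[:k]


# Toy estimates for one target store (illustration only).
estimates = {"c1": 4.0, "c2": -1.0, "c3": 9.5, "c4": 0.5}
print(select_by_threshold(estimates, min_efficiency=1.0))
print(select_top_k(estimates, k=3))
```

The threshold strategy bounds the minimum expected gain per delivered ad, while the top-k strategy fixes the delivery volume; which one fits depends on whether the budget or the per-user return is the binding constraint.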
[0123] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, all technical solutions obtained through equivalent substitution or transformation fall within the protection scope of the present invention.
Claims
1. A bias-free causal inference method for advertising efficiency modeling scenarios, characterized by comprising the following steps: S1: Extract the current-period advertising efficiency log from the target platform and, according to whether the consumer and store information is recorded in that log, divide the store and consumer feature data into selected sample data and unselected sample data, and mark both parts with a sample selection marker; further divide the selected sample data into a delivery group and a blocking group, thereby constructing a dataset. S2: Construct a deep neural network model for extracting information from store and consumer features, and train it with the dataset; during training, use an integral probability metric module to constrain the distribution similarity of the extracted information, so as to obtain a feature representation that is balanced between the unselected sample data and the selected sample data, as well as between the delivery group and the blocking group of the selected sample data; after training, a balanced feature extraction model is obtained. S3: Construct a post-delivery result prediction network model and a post-blocking result prediction network model based on deep neural networks, and train them with the balanced feature representations of the delivery group and the blocking group of the selected sample data, respectively, to obtain a post-delivery result prediction network model and a post-blocking result prediction network model free of confounding bias and sample selection bias.
S4: Using the trained balanced feature extraction model, post-delivery result prediction network model and post-blocking result prediction network model, predict the transaction amount of the target store and target consumer after ad delivery and after ad blocking, respectively, and estimate the corresponding advertising efficiency. The specific implementation steps of S3 are as follows: S301: Construct a pair of deep neural network models F2 and F3 with the same structure, used to predict the post-delivery result and the post-blocking result, respectively. S302: Input the balanced feature representation of the delivery group of the selected sample data into the network model F2 to obtain the predicted post-delivery transaction amount $\hat{y}_{t=1}$; input the balanced feature representation of the blocking group of the selected sample data into the network model F3 to obtain the predicted post-blocking transaction amount $\hat{y}_{t=0}$. S303: Use the mean square error function to compute the difference between $\hat{y}_{t=1}$ and the true outcome variable value $y_{t=1}$ in the delivery group of the selected sample data, and the difference between $\hat{y}_{t=0}$ and the true outcome variable value $y_{t=0}$ in the blocking group of the selected sample data; train the deep neural network models F2 and F3 with the dataset, constraining these errors during training to achieve the following training objective: where MSE(·,·) denotes the mean square error function, and γ is an adjustable training hyperparameter that compensates for the difference between the number of data entries in the delivery group and in the blocking group of the selected sample data. After training, the deep neural network models F2 and F3 serve as the post-delivery result prediction network model F'2 and the post-blocking result prediction network model F'3, respectively.
2. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 1, characterized in that the specific implementation steps of S1 are as follows: S101: According to whether the consumer and store information is recorded in the current-period advertising efficiency log, divide the store and consumer feature data into selected sample data and unselected sample data. The current-period advertising efficiency log is extracted from the target platform and records whether each store's advertisement was delivered to each consumer, together with the consumer's transaction amount at the corresponding store up to the end of the current period. Whether the consumer received the ad delivery is defined as the treatment variable t: t=1 means the consumer received the ad delivery and belongs to the delivery group, and t=0 means the consumer did not receive it and belongs to the blocking group. The consumer's total transaction amount at the corresponding store up to the end of the current period is defined as the outcome variable y. The set of store and consumer feature variables is denoted x and consists of the individual feature variables. With s as the sample selection marker, the store and consumer data recorded in the current-period advertising efficiency log, out of all stores and consumers on the target platform, are marked as selected sample data, denoted s=1, and the store and consumer data not recorded in the log are marked as unselected sample data, denoted s=0; S102: Based on the current-period advertising efficiency log and the store and consumer feature data, represent each data entry as a quadruple of x, t, y and s, and construct the dataset; in data with s=1, t and y are both observable, while in data with s=0, t and y are unobservable and are set to a preset value in the dataset.
3. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 1, characterized in that the specific implementation steps of S2 are as follows: S201: Construct a deep neural network model F1 for feature learning, and input the store and consumer feature variables x of the dataset into F1 to obtain their vectorized feature representation. S202: According to the treatment variable t and the sample selection marker s of the data, divide the vectorized feature representation into four groups: the feature representation of the unselected sample data, composed of the representations of all data with s=0; the feature representation of the selected sample data, composed of the representations of all data with s=1; the feature representation of the delivery group of the selected sample data, composed of the representations of all data with s=1 and t=1; and the feature representation of the blocking group of the selected sample data, composed of the representations of all data with s=1 and t=0. S203: Use the integral probability metric module to compute the distribution distance between the feature representations of the unselected and the selected sample data, and the distribution distance between the feature representations of the delivery group and the blocking group of the selected sample data; train the deep neural network model F1 with the dataset, constraining these distances during training to achieve the following training objective: where min denotes taking the minimum value of the function, D(·,·) denotes the integral probability metric function used in training, and α and β are adjustable training hyperparameters. After training, the deep neural network model F1 serves as the balanced feature extraction model F'1.
4. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 1, characterized in that the specific implementation steps of S4 are as follows: S401: Input the feature variable set x of the target store and target consumer into the trained balanced feature extraction model F'1 to obtain its balanced feature representation; S402: Input the balanced feature representation obtained in S401 into the post-delivery result prediction network model F'2 and the post-blocking result prediction network model F'3, respectively, to obtain the predicted post-delivery and post-blocking transaction amounts, and take the difference between the two as the estimated advertising efficiency.
5. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 3, characterized in that, in S203, the following loss function is used during training to achieve the training objective: In S303, the following loss function is used during training to achieve the training objective: The deep neural network models F1, F2 and F3 are trained jointly as a whole, and the total loss function used in training the entire network model is as follows:
6. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 3, characterized in that α and β are both set to 0.1, and the integral probability metric function D(·,·) used is the square-root linear maximum mean discrepancy metric, calculated by the following formula: where a and b denote any two data sets with the same number of variables; $m_d$ denotes the number of variables; $m_a$ and $m_b$ denote the number of data entries in a and in b, respectively; $a_{i_a j}$ denotes the value of the j-th variable in the $i_a$-th data entry of a; and $b_{i_b j}$ denotes the value of the j-th variable in the $i_b$-th data entry of b.
7. The bias-free causal inference method for advertising efficiency modeling scenarios as described in claim 1, characterized in that γ is set to the ratio of the number of data entries in the delivery group to the number of data entries in the blocking group of the selected sample data, and the mean square error function MSE(·,·) used is calculated as $$\mathrm{MSE}(a,\hat{a})=\frac{1}{m}\sum_{i=1}^{m}\left(a_i-\hat{a}_i\right)^2$$ where $a$ denotes the observed values of a variable, $\hat{a}$ denotes the predicted values of that variable, $m$ denotes the number of data entries, $a_i$ denotes the observed value of $a$ in the i-th data entry, and $\hat{a}_i$ denotes the predicted value of $a$ in the i-th data entry.
8. A method for screening target users for advertising efficiency enhancement on online shopping platforms, characterized in that, for a target store, the advertising efficiency estimate between the target store and each consumer on the platform is obtained by using the bias-free causal inference method described in any one of claims 1 to 7; consumers on the platform are then screened based on the advertising efficiency estimates, and the screening results are used as the target users for advertising efficiency enhancement of the target store.
9. A target user screening system for advertising efficiency enhancement on online shopping platforms, characterized by comprising: an advertising efficiency estimation module, used to obtain the advertising efficiency estimate between a target store and each consumer on the platform according to the bias-free causal inference method described in any one of claims 1 to 7; and a user filtering module, used to filter consumers on the platform based on the advertising efficiency estimates according to a preset filtering strategy, and to use the filtering results as the target users for advertising efficiency enhancement of the target store.